FAIR data decisions: Lossy or lossless #27
Comments
"Unfortunately there is no easy way of specifying the degree of data loss in any aspiring FAIR dataset as metadata information." Do you mean that once only the cleaned data are presented (e.g. category 4), it is impossible for another person to quantify the loss relative to category 1? Although this would not preserve the lost information, metadata for a shared, cleaned dataset should ideally describe the cleaning process, up to and including any scripts that were used to do the cleaning. Besides scripts, a narrative description of the cleaning process and a reasonable explanation of what information has been lost are good practice. I'm a strong proponent of not letting the perfect get in the way of the practical (or of any improvement upon the status quo). Where sharing and preserving large datasets is impractical, sharing category 4 is a vast improvement. Do you recommend revising the categories or the standards defined for each?
NASA EOSDIS data products use a defined classification of Data Processing Levels. If such a classification is available for other data products, then maybe it is enough to include that level specification in the metadata.
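As a rough illustration of the two suggestions above (recording a level or category in the metadata, and pointing to the cleaning scripts plus a narrative), here is a minimal sketch. The field names, the scheme name, and the script name are all hypothetical and not part of any FAIR or EOSDIS standard; the only assumption taken from this thread is that a lower category number means less data loss.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class LossMetadata:
    """Hypothetical metadata record describing how lossy a shared dataset is."""
    category: int          # position in the community hierarchy (1 = least lossy)
    scheme: str            # name of the classification scheme the category refers to
    cleaning_scripts: list = field(default_factory=list)  # scripts used for cleaning
    narrative: str = ""    # free-text account of what information was lost

    def to_json(self) -> str:
        # Serialize so the record can travel alongside the deposited dataset.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


def is_adequate(meta: LossMetadata, max_category: int) -> bool:
    """Return True if the dataset is no lossier than the consumer can tolerate."""
    return meta.category <= max_category


# Illustrative values only; the scheme and script names are made up.
record = LossMetadata(
    category=4,
    scheme="example-hierarchy",
    cleaning_scripts=["clean_data.py"],
    narrative="Only the cleaned data are shared; earlier-stage data were not deposited.",
)

print(record.to_json())
print(is_adequate(record, max_category=4))  # a typical consumer
print(is_adequate(record, max_category=1))  # a specialist who needs the rawest data
```

A record like this does not restore lost information, but it lets a consumer decide quickly whether a deposit is adequate for their needs before downloading it.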
I would say the whole point is that there is no one FAIR. FAIR is a landscape of degrees, or levels.
The challenge will be distilling the “in common” without enforcing one view or need.
One of the issues often confronted by depositors of aspiring FAIR data is how much data loss to tolerate. I give just one example: crystallographic data in chemistry (often described as the gold standard in chemical data). There are the following hierarchy levels, with increasing data loss:
So most consumers of, say, category 4 data would find it adequately FAIR for their needs, but some specialist users would find it too lossy and might need to go as high as category 1. The trouble is that this type of data might be as much as 10,000 times larger than the minimal set.
Unfortunately there is no easy way of specifying the degree of data loss in any aspiring FAIR dataset as metadata information. This, remember, is considered the "gold" standard. One finds similar situations in other types of chemical data.