Requirement: Include both a format and a data/publication version number #13
Comments
I support the addition of both a data format version and a data publication version. The motivation for the format version is to make the format self-describing and to aid identification, i.e., as a signature to match. It would also allow future evolution of the fundamental portions of the format. The data publication version was discussed during the previous evaluation. With miniSEED 2.x there is no versioning built into the format. Some data centers used the "data quality" identifier as a crude form of versioning, but this is extremely limited, with only 4 "levels" and a vague implication of "quality". Tracking data versions, which are a reality in modern data management and use, is especially important for scientific data. Including the capability to identify versions directly in the format allows for basic versioning and can be used by systems external to the format for extended, version-specific metadata.
The data/publication version number should be an optional (IRIS) extension. Linear version numbers do not support "forks", where data has been modified in multiple datacentres. I would not hardcode this feature into the standard, because something more clever might be needed in the future.
Format version is critical for sure. Andres has a good point about linear version numbers not working well with forks. If two data centers both receive version 7 of the data, each does something and then has a different version 8. The alternatives are to either namespace the data version (perhaps within the additional headers) or to declare that the data version has no meaning beyond the context of the data center where it was created.
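To make the fork problem concrete, here is a minimal Python sketch of the first alternative, a namespaced data version. The class name, the field names, and the `DC-A`/`DC-B` identifiers are all hypothetical and not taken from any miniSEED draft; the point is only that versions become comparable within one data center and explicitly not comparable across data centers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PublicationVersion:
    """Hypothetical namespaced data version; not part of any miniSEED spec."""
    namespace: str  # assumed identifier of the data center that assigned it
    number: int     # linear counter, meaningful only inside `namespace`


def is_newer(a: PublicationVersion, b: PublicationVersion) -> Optional[bool]:
    """Return True/False when both versions share a namespace, else None."""
    if a.namespace != b.namespace:
        return None  # versions from different data centers are not comparable
    return a.number > b.number


# Two data centers both receive version 7 and each publishes its own version 8.
original = PublicationVersion("DC-A", 7)
fork_a = PublicationVersion("DC-A", 8)
fork_b = PublicationVersion("DC-B", 8)
```

With a bare linear number the two forks would both be "8" and look identical. With the namespace attached they are distinct values, and `is_newer(fork_a, fork_b)` declines to order them, which matches the second alternative's statement that a version has no meaning outside the data center that created it.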
That is exactly the conclusion we reached in the previous conversation on this topic last July. In the case of the IRIS DMC, I think we would work with those who contribute data so that the versioning is done by the owner whenever possible. A system that identifies relative relationships between versions across forks and data centers would require some sort of central registry or much more complexity. I suspect a data publication version in a record would be useful for many data centers, justifying the fixed 1 byte it would use, but it would be OK to use an optional header for this if that's where the consensus lands.
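As a rough illustration of what a fixed 1-byte publication version next to a format version could look like, here is a Python sketch. The layout (a 2-byte signature, then one byte each for format version and publication version) and the `MS` signature value are assumptions made up for this example, not the actual or proposed miniSEED header.

```python
import struct
from typing import Tuple

# Hypothetical layout for illustration only: 2-byte signature, 1-byte format
# version, 1-byte data publication version. Not the actual miniSEED header.
HEADER = struct.Struct("<2sBB")
SIGNATURE = b"MS"  # assumed signature value


def pack_header(format_version: int, publication_version: int) -> bytes:
    """Build the 4-byte hypothetical header; each version must be 0..255."""
    return HEADER.pack(SIGNATURE, format_version, publication_version)


def parse_header(raw: bytes) -> Tuple[int, int]:
    """Check the signature, then return (format_version, publication_version)."""
    signature, format_version, publication_version = HEADER.unpack_from(raw)
    if signature != SIGNATURE:
        raise ValueError("not a record in the assumed format")
    return format_version, publication_version
```

A single unsigned byte gives 256 possible values, compared with the 4 "quality" levels mentioned above. The signature and format version together let a reader reject foreign data before interpreting anything else in the record.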
Summary (please let me know if I missed a point or misunderstood something). Please vote on:
1 yes
Yes
Yes
No
Include both a format and a data/publication version number.