Spec version vs zarr_format #299

Open
d-v-b opened this issue Jun 13, 2024 · 4 comments

@d-v-b
Contributor

d-v-b commented Jun 13, 2024

The zarr_format metadata is an integer, but the spec document uses a string identifier that can represent major and minor versions. So, unlike the spec document, the zarr_format metadata cannot ever represent a minor version. Is this a problem? It seems like skew between the spec version and zarr_format is a recipe for trouble, but I don't see how to fix this without some disruption.
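A minimal sketch of the mismatch, assuming the spec document uses a dotted "major.minor" identifier such as "3.0". The helper `zarr_format_for` and the "3.1" value are hypothetical. Any mapping from that identifier to the integer `zarr_format` has to drop the minor component:

```python
def zarr_format_for(spec_version: str) -> int:
    """Hypothetical helper: derive the integer zarr_format from a spec version string."""
    major, _minor = spec_version.split(".")
    # Only the major component fits in an integer zarr_format;
    # the minor component is discarded.
    return int(major)


# "3.1" is a hypothetical future minor release of the spec.
print(zarr_format_for("3.0"), zarr_format_for("3.1"))  # prints "3 3"
```

Both spec versions land on the same `zarr_format`, which is the skew described above.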

cc @WardF, as this relates to some of our conversations from the community meeting the other day, and I think the netcdf perspective would be useful here.

@jbms
Contributor

jbms commented Jun 13, 2024

It was intentional that zarr_format not include a precise version number of the spec. Instead, it is intended that the spec defines some broad compatibility guarantees, and zarr_format only needs to be updated when we need to step outside of those guarantees.

The rationale is:

  1. If we include a precise spec version in the metadata, then when creating the metadata the implementation will need to choose which version to specify. For maximum compatibility, we want this version to be as low as possible, but it also needs to be high enough to support all of the features that are used. Therefore we need some logic to figure out the minimum version of the zarr spec that supports all of the features that are used. Furthermore, if some of these features are experimental or still in the process of being standardized, then the minimum version in some sense doesn't even exist yet.
  2. When reading the metadata, if the implementation encounters a newer version number than is known, there is also the question of what to do. One option is to just fail with an error immediately. However, it seems likely that at least some writers may not carefully choose the minimum version that supports all features actually used in the metadata, and instead may just pick the latest known version. In that case, it would be better when reading to just ignore the version number and attempt to parse the metadata anyway, and only fail if an unsupported feature is encountered.
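The two problems above can be sketched for a hypothetical scheme that stores a precise (major, minor) version in the metadata. The feature names, the version each one was introduced in, and the function names are all invented for illustration; Zarr v3 defines no such table.

```python
# Hypothetical table: the spec version that first defined each feature.
# Zarr v3 has no such table; names and versions are invented for illustration.
FEATURE_MIN_VERSION = {
    "codec:bytes": (3, 0),
    "codec:gzip": (3, 0),
    "codec:hypothetical_new_codec": (3, 1),
}
BASE_VERSION = (3, 0)
LATEST_KNOWN_VERSION = (3, 0)  # newest version this reader implements


def minimum_version(features):
    """Writer side (point 1): lowest version that supports every feature used."""
    unstandardized = [f for f in features if f not in FEATURE_MIN_VERSION]
    if unstandardized:
        # Experimental features have no minimum version yet.
        raise ValueError(f"no spec version defines {unstandardized}")
    return max((FEATURE_MIN_VERSION[f] for f in features), default=BASE_VERSION)


def on_read(declared, strict):
    """Reader side (point 2): what to do with a declared (major, minor) version."""
    if declared <= LATEST_KNOWN_VERSION:
        return "parse"
    if strict:
        # Option A: fail immediately on any newer version.
        raise ValueError(f"unsupported spec version {declared}")
    # Option B: the writer may simply have stamped its latest version,
    # so parse anyway and fail only if an unsupported feature turns up.
    return "parse anyway"
```

For example, `minimum_version(["codec:bytes", "codec:hypothetical_new_codec"])` returns `(3, 1)`, and `on_read((3, 1), strict=False)` returns `"parse anyway"`. In the lenient branch the declared version no longer decides anything, which is the argument for leaving it out.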

If we don't include a precise version number, then when creating an array we don't have to worry about picking a version number, and when reading an array, we can still just validate the metadata according to the actual features in use.
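A minimal sketch of the feature-based approach, assuming a reader that knows a small, hypothetical set of data types and codecs. The `zarr_format`, `node_type`, `data_type`, and `codecs` keys follow the Zarr v3 `zarr.json` layout; `read_array_metadata`, `UnsupportedFeature`, and the capability sets are illustrative names, not part of any Zarr library.

```python
import json

# Hypothetical capability sets for this reader; a real implementation would
# consult its own codec and data-type registries.
SUPPORTED_DATA_TYPES = {"bool", "int32", "float64"}
SUPPORTED_CODECS = {"bytes", "transpose", "gzip"}


class UnsupportedFeature(Exception):
    """Hypothetical error raised when metadata uses something we can't handle."""


def read_array_metadata(text):
    meta = json.loads(text)
    # zarr_format changes only when the broad compatibility guarantees are
    # broken, so this is the one version check the reader needs.
    if meta.get("zarr_format") != 3:
        raise UnsupportedFeature(f"zarr_format {meta.get('zarr_format')!r}")
    # No finer-grained version to compare: validate the features in use.
    if meta.get("data_type") not in SUPPORTED_DATA_TYPES:
        raise UnsupportedFeature(f"data_type {meta.get('data_type')!r}")
    for codec in meta.get("codecs", []):
        if codec.get("name") not in SUPPORTED_CODECS:
            raise UnsupportedFeature(f"codec {codec.get('name')!r}")
    return meta


# Trimmed example: real v3 array metadata has more required fields.
doc = json.dumps({
    "zarr_format": 3,
    "node_type": "array",
    "shape": [10],
    "data_type": "float64",
    "codecs": [{"name": "bytes", "configuration": {"endian": "little"}}],
})
print(read_array_metadata(doc)["data_type"])  # prints "float64"
```

The writer never picks a version beyond the major format, and the reader fails only on a feature it actually lacks.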

@d-v-b
Contributor Author

d-v-b commented Jun 13, 2024

thanks @jbms, that's helpful. I think it would be good to write this logic into the spec. I will ping you if I submit a PR to that effect.

@yarikoptic

if I got it right, this relates to

as a formalization of those "features" used/present in any given Zarr of a "major" zarr_format version. Is my understanding correct?

@d-v-b
Copy link
Contributor Author

d-v-b commented Sep 11, 2024

I think this discussion and #262 concern different levels of abstraction.

The properties of a particular version of zarr are formalized by the relevant zarr specification. See the specification for zarr version 2, or the specification for zarr version 3. I raised this issue to discuss a particular detail about how the zarr v3 specification defines the metadata that declares which version of zarr it is.

By contrast, ZEP 4 is at a higher level of abstraction: it concerns formalizing specifications of conventions that contain zarr data. To quote that ZEP, "A Zarr implementation itself should not even be aware of the existence of the convention."
