v3 codec structure in zarr.json #298
Comments
My take is that this kind of change is hard to get in at this point. Also, after having implemented codec pipelines in Python, Java and Scala, it doesn't feel like an undue burden to implement the validation logic. In zarr-python the validation code is really just a few lines of code.
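For comparison, here is a minimal sketch of the kind of validation this comment refers to: checking a flat `codecs` array against the v3 ordering rules. The `CODEC_KINDS` table and the `validate_codec_list` function are hypothetical and are not zarr-python's code. The codec names are real v3 codecs, but a real implementation would look up each codec's kind in its own codec registry.

```python
from typing import Any

# Hypothetical, partial registry mapping a codec "name" to its kind.
# A real implementation would consult its own codec registry instead.
CODEC_KINDS = {
    "transpose": "array_array",
    "bytes": "array_bytes",
    "sharding_indexed": "array_bytes",
    "gzip": "bytes_bytes",
    "blosc": "bytes_bytes",
    "crc32c": "bytes_bytes",
}

# Position of each kind in the required ordering.
KIND_RANK = {"array_array": 0, "array_bytes": 1, "bytes_bytes": 2}


def validate_codec_list(codecs: list[dict[str, Any]]) -> None:
    """Raise ValueError unless `codecs` follows the v3 ordering rules."""
    kinds = []
    for codec in codecs:
        name = codec["name"]
        if name not in CODEC_KINDS:
            raise ValueError(f"unknown codec: {name!r}")
        kinds.append(CODEC_KINDS[name])

    # Kinds must appear in non-decreasing rank order.
    ranks = [KIND_RANK[k] for k in kinds]
    if ranks != sorted(ranks):
        raise ValueError(
            "codecs must be ordered array->array, array->bytes, bytes->bytes"
        )

    # Exactly one array->bytes codec is required.
    count = kinds.count("array_bytes")
    if count != 1:
        raise ValueError(f"expected exactly one array->bytes codec, found {count}")


# Usage: a valid pipeline passes silently.
validate_codec_list([
    {"name": "transpose", "configuration": {"order": [1, 0]}},
    {"name": "bytes", "configuration": {"endian": "little"}},
    {"name": "gzip", "configuration": {"level": 5}},
])
```

The check is indeed short. However, it depends on knowing each codec's kind, which the JSON array itself does not record.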
It's also not an undue burden for us to make material improvements to the v3 spec when we can, and I think the changes I'm proposing here are definitely improvements. If the spec is effectively immutable, then we can table this for zarr v4 :)
Even if it's too late to change course, I would prefer the additional properties under
The v3 spec states that codecs are stored in a JSON array under the key `codecs`. But the spec also states that the list of codecs is structured: it must contain zero or more array -> array codecs, followed by exactly one array -> bytes codec, followed by zero or more bytes -> bytes codecs.

This is actually a lot of semantic load for something as simple as a JSON array. Instead of a JSON array, I believe the above structure could be expressed much better (where "expressed better" means "conveys intent more clearly, with no loss of information, and minimal added complexity") by using a JSON object with the following structure:
I am noting this because over in the zarr-python v3 implementation effort, we have written something like the above data structure as part of the basic parsing of the contents of `zarr.json`. In fact, I think this data structure will arise in any implementation, because implementations must represent the structure of the codecs, and that structure is not captured at all by the JSON array representation. But, as I show here, it is trivial to describe the codec structure explicitly with JSON. A corollary benefit is that the proposed data structure expresses the constraint that there be exactly one array -> bytes codec much more clearly, which would relieve implementations of some validation burden.

So, if we care about making this easier for implementations (and I think making it easy for implementations also makes it easier for users), we should consider this change to `zarr.json`. There is no change to the semantics of the spec, but it makes `zarr.json` clearer. I understand that people may not want to change the spec. But I consider that a separate question from whether the current spec has defects that could in principle be fixed, such as the one described here.
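To illustrate the point about reduced validation burden, here is a sketch of parsing the object form into an in-memory structure. It assumes the same hypothetical key names (`array_array`, `array_bytes`, `bytes_bytes`). It also assumes, purely for illustration, that the two list-valued keys may be omitted. `CodecPipeline` is a made-up name. Only shape checks are needed here: ordering and the single array -> bytes codec follow from the keys.

```python
from dataclasses import dataclass
from typing import Any

Codec = dict[str, Any]  # a {"name": ..., "configuration": ...} entry


@dataclass(frozen=True)
class CodecPipeline:
    """Hypothetical in-memory form mirroring the proposed object structure."""
    array_array: tuple[Codec, ...]
    array_bytes: Codec
    bytes_bytes: tuple[Codec, ...]

    @classmethod
    def from_json(cls, data: dict[str, Any]) -> "CodecPipeline":
        # Assumption: the list-valued keys may be omitted and default to empty.
        aa = data.get("array_array", [])
        ab = data["array_bytes"]
        bb = data.get("bytes_bytes", [])
        # Shape checks only: ordering and "exactly one array -> bytes codec"
        # are implied by which key each codec lives under.
        if not isinstance(aa, list) or not isinstance(bb, list):
            raise TypeError("array_array and bytes_bytes must be JSON arrays")
        if not isinstance(ab, dict):
            raise TypeError("array_bytes must be a single JSON object")
        return cls(tuple(aa), ab, tuple(bb))


pipeline = CodecPipeline.from_json({
    "array_bytes": {"name": "bytes", "configuration": {"endian": "little"}},
    "bytes_bytes": [{"name": "gzip", "configuration": {"level": 5}}],
})
```

An implementation would still need to confirm that each named codec is of the kind its key implies (for example, that `gzip` does not appear under `array_bytes`). The ordering and counting logic, however, disappears.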