-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define the list of codecs in the v3 spec #312
Changes from 5 commits
9216892
92f3b12
163a867
f273390
4c10158
04f99c7
c70bcb1
02d5aa8
3519fe5
4a5e522
a0f469f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -295,7 +295,7 @@ The following figure illustrates the first part of the terminology: | |||||||||||||||||||||
|
||||||||||||||||||||||
*Codec* | ||||||||||||||||||||||
|
||||||||||||||||||||||
The list of *codecs* specified for an array_ determine the encoded byte | ||||||||||||||||||||||
The list of *codecs* specified for an array_ determines the encoded byte | ||||||||||||||||||||||
representation of each chunk in the store_. | ||||||||||||||||||||||
|
||||||||||||||||||||||
.. _metadata document: | ||||||||||||||||||||||
|
@@ -634,13 +634,12 @@ mandatory names: | |||||||||||||||||||||
``codecs`` | ||||||||||||||||||||||
^^^^^^^^^^ | ||||||||||||||||||||||
|
||||||||||||||||||||||
Specifies a list of codecs to be used for encoding and decoding chunks. The | ||||||||||||||||||||||
value must be an array of objects, each object containing a member with | ||||||||||||||||||||||
``name`` whose value is a string referring to a v3 codec specification. The | ||||||||||||||||||||||
codec object may also contain a ``configuration`` object which consists of | ||||||||||||||||||||||
the parameter names and values as defined by the corresponding codec | ||||||||||||||||||||||
specification. Since an ``array -> bytes`` codec must be specified, the | ||||||||||||||||||||||
list cannot be empty. | ||||||||||||||||||||||
Specifies a list of codecs to be used for encoding and decoding chunks. The | ||||||||||||||||||||||
value MUST be an array of objects, where each object contains a member with the name | ||||||||||||||||||||||
``name`` whose value is a string. The ``name`` member identifies the specification for the codec. A | ||||||||||||||||||||||
codec object MAY contain a member named ``configuration``, which is an object defined by the respective codec | ||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this should be changed back to "parameter names and values" instead of "object", since this is a) out of scope of this PR, and b) I think parameter names and values makes more sense because this section is taking about a JSON document (I think??). |
||||||||||||||||||||||
specification. Because ``codecs`` MUST contain an ``array -> bytes`` codec, the | ||||||||||||||||||||||
list cannot be empty (See :ref:`codecs <codecs>`). | ||||||||||||||||||||||
|
||||||||||||||||||||||
The following members are optional: | ||||||||||||||||||||||
|
||||||||||||||||||||||
|
@@ -1211,40 +1210,42 @@ the following procedure: | |||||||||||||||||||||
|
||||||||||||||||||||||
4. The chunk array ``A`` is equal to ``EC[0]``. | ||||||||||||||||||||||
|
||||||||||||||||||||||
.. _codec-specification: | ||||||||||||||||||||||
|
||||||||||||||||||||||
Specifying codecs | ||||||||||||||||||||||
----------------- | ||||||||||||||||||||||
|
||||||||||||||||||||||
To allow for flexibility to define and implement new codecs, this | ||||||||||||||||||||||
specification does not define any codecs, nor restrict the set of | ||||||||||||||||||||||
codecs that may be used. Each codec must be defined via a separate | ||||||||||||||||||||||
specification. In order to refer to codecs in array metadata | ||||||||||||||||||||||
documents, each codec must have a unique identifier, which is a URI | ||||||||||||||||||||||
that dereferences to a human-readable specification of the codec. A | ||||||||||||||||||||||
codec specification must declare the codec identifier, and describe | ||||||||||||||||||||||
(or cite documents that describe) the encoding and decoding algorithms | ||||||||||||||||||||||
and the format of the encoded data. | ||||||||||||||||||||||
|
||||||||||||||||||||||
A codec may have configuration parameters which modify the behaviour | ||||||||||||||||||||||
of the codec in some way. For example, a compression codec may have a | ||||||||||||||||||||||
compression level parameter, which is an integer that affects the | ||||||||||||||||||||||
resulting compression ratio of the data. Configuration parameters must | ||||||||||||||||||||||
be declared in the codec specification, including a definition of how | ||||||||||||||||||||||
configuration parameters are represented as JSON. | ||||||||||||||||||||||
|
||||||||||||||||||||||
The Zarr core development team maintains a repository of codec | ||||||||||||||||||||||
specifications, which are hosted alongside this specification in the | ||||||||||||||||||||||
`zarr-specs GitHub repository`_, and which are | ||||||||||||||||||||||
published on the `zarr-specs documentation Web site | ||||||||||||||||||||||
<https://zarr-specs.readthedocs.io/>`_. For ease of discovery, it is | ||||||||||||||||||||||
recommended that codec specifications are contributed to the | ||||||||||||||||||||||
zarr-specs GitHub repository. However, codec specifications may be | ||||||||||||||||||||||
maintained by any group or organisation and published in any location | ||||||||||||||||||||||
on the Web. For further details of the process for contributing a | ||||||||||||||||||||||
codec specification to the zarr-specs GitHub repository, see | ||||||||||||||||||||||
`ZEP 0 <https://zarr.dev/zeps/active/ZEP0000.html>`_ which describes | ||||||||||||||||||||||
the process for Zarr specification changes. | ||||||||||||||||||||||
|
||||||||||||||||||||||
Further details of how codecs are configured for an array are given in the `Array metadata`_ section. | ||||||||||||||||||||||
.. _core-codecs: | ||||||||||||||||||||||
|
||||||||||||||||||||||
Core codecs | ||||||||||||||||||||||
----------- | ||||||||||||||||||||||
|
||||||||||||||||||||||
This spec defines a set of codecs ("core codecs") which all Zarr implementations SHOULD implement in | ||||||||||||||||||||||
order to ensure a minimal level of interoperability between Zarr implementations. | ||||||||||||||||||||||
Each core codec is each defined by specification documents which are hosted in the | ||||||||||||||||||||||
`zarr-specs GitHub repository`_, and are published on the `zarr-specs documentation web site | ||||||||||||||||||||||
<https://zarr-specs.readthedocs.io/>`_. The list of core codecs is part of the Zarr v3 specification. | ||||||||||||||||||||||
Changes to the list of core codecs MUST be made via the same protocol used for | ||||||||||||||||||||||
changing the Zarr v3 specification. Changes to the list of core codecs SHOULD be made carefully and | ||||||||||||||||||||||
in close collaboration with extant Zarr v3 implementations. A new core codec SHOULD be added to the | ||||||||||||||||||||||
d-v-b marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||||||
list when a sufficient number of Zarr implementations support or intend to support that codec. | ||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is sufficient? 50%? 80%? 100%? This should be clarified. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Any number we pick will be arbitrary. I think we have to rely on people's judgment here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think picking and agreeing on an arbitrary number is better than not picking a number and then leaving the door open to discussions/disagreements in the future on what a "sufficient number" means. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess because it's a "SHOULD" we can put in a completely arbitrary number and then ignore it. How about 60% There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. or "an overwhelming majority"? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe it's worth removing this sentence completely? I don't particularly like adding an arbitrary number either as per your comments 😄 . I think it would be fine to remove this sentence, and if someone wants to add a core codec they can justify it as part of the normal ZEP process. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. remove text completely -- I like this solution :) |
||||||||||||||||||||||
An existing core codec SHOULD be removed from the list when a sufficient number of implementation | ||||||||||||||||||||||
developers and Zarr users deem the codec worth removing, e.g. because of a technical flaw in the | ||||||||||||||||||||||
algorithm underlying the codec. | ||||||||||||||||||||||
Comment on lines
+1232
to
+1234
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. One of the main goals (to me) of specifying core codecs is to ensure backwards compatibility long into the future. Removing a codec reduces backwards compatibility, so I think there should be very strict conditions under which a codec should be removed.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think we need a reference to the ZIC here, since elsewhere in the PR I state that the list of core codecs should be changed according to the same process for changing the spec itself There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "To ensure backwards compatibility, existing core codecs SHOULD NOT be removed from the list." reads better to me There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. also something I thought of last night: if a codec is removed from the list of core codecs, it is still available as an extension codec. So I don't think there's actually a large backwards compatibility risk. |
||||||||||||||||||||||
|
||||||||||||||||||||||
Community codecs | ||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is "Community" used elsewhere? Maybe "Extension", which I think is used in other places ("extension data types" at least). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I forget how I picked the word "community"... I don't think it's a term of art in the spec. If "extension" is a better fit I'm happy to switch to that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I favor "extension" over community because it fits with the existing Zarr extension framework. |
||||||||||||||||||||||
---------------- | ||||||||||||||||||||||
|
||||||||||||||||||||||
Zarr implementations MAY support a codec that is not in the list of core codecs | ||||||||||||||||||||||
(hereafter termed a "community codec"), provided the community codec does not use an identifier | ||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it worth namespacing codec identifiers somehow? I'm slightly against, since ideally codecs can graduate from "community" (or "extension") to "core" (after careful consideration). It'd be nice to avoid needing to rewrite metadata files when that happens. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we do any namespacing of the identifiers, we should not use any information about whether the codec is core or community / extension, for the reasons you note. Otherwise I don't have a preference. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think what is written here is fine. Namespacing codecs sounds complicated. |
||||||||||||||||||||||
that is already used by a core codec, as the identifiers of core codecs are reserved. | ||||||||||||||||||||||
|
||||||||||||||||||||||
This specification places no other constraints on community codecs. It is possible that separate | ||||||||||||||||||||||
developers may define distinct codecs that use the same identifier. | ||||||||||||||||||||||
d-v-b marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||||||
To minimize the impact of such name collisions, codec developers are strongly encouraged | ||||||||||||||||||||||
to publish their codec specifications as additions to the "community codecs" section of Zarr v3 specification. | ||||||||||||||||||||||
Publication in the "community codecs" section does not confer primacy or an official designation to a codec. | ||||||||||||||||||||||
The list of community codecs exists expressly as a tool to enable coordinated codec development. | ||||||||||||||||||||||
|
||||||||||||||||||||||
Stores | ||||||||||||||||||||||
====== | ||||||||||||||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously the requirement was
name
refers to a v3 codec spec, so I guess there would be a catalogue of v3 codecs where I could go an look up the specification.Now the name is abitrary, so given the name how do I know where to find the codec specification, so if I get a bit of zarr data I can find out how to decode it.
And how do we avoid naming conflicts between community codecs? Or is this not our concern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my concerns are resolved by changes under "Extension codecs" below, so maybe worth either repeating a bit of that or signposting that section here?