feature(stores): draft zip file store specification #311

152 changes: 152 additions & 0 deletions docs/v3/stores/zipfile/v1.0.rst
@@ -0,0 +1,152 @@
.. _zip-file-store-v1:

=============================
ZIP file store (version 1.0)
=============================

Specification URI:
    https://zarr-specs.readthedocs.io/en/latest/v3/stores/zipfile/v1.0.html
Corresponding ZEP:
    `ZEP0001 — Zarr specification version 3 <https://zarr.dev/zeps/accepted/TODO.html>`_
Issue tracking:
    `GitHub issues <https://github.com/zarr-developers/zarr-specs/labels/stores-filesystem-v1.0>`_
Suggest an edit for this spec:
    `GitHub editor <https://github.com/zarr-developers/zarr-specs/blob/main/docs/v3/stores/zipfile/v1.0.rst>`_

Copyright 2024-Present Zarr core development team. This work is
licensed under a `Creative Commons Attribution 3.0 Unported License
<https://creativecommons.org/licenses/by/3.0/>`_.

----


Abstract
========

The ZIP file store provides a way to store Zarr arrays and metadata in a
single, compressed ZIP archive. This specification defines the organization
and structure of the ZIP file to ensure compatibility across implementations.

Status of this document
=======================

This is a working draft.

Notes about design decisions for the native ZIP File Store
==========================================================

The ZIP file store is designed for simplicity and for easy conversion to and
from the filesystem store. The ability to archive a filesystem store using
standard command line tools (e.g. ``zip``) is a key virtue of the design below.
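
As a rough illustration of that conversion, the sketch below packs a directory
store into a ZIP archive with Python's standard ``zipfile`` module, much as a
recursive ``zip`` invocation would. The function name ``zip_directory_store``
and the choice of ``ZIP_STORED`` (no archive-level compression) are assumptions
made for this example, not requirements of this specification.

```python
import zipfile
from pathlib import Path


def zip_directory_store(store_dir: str, archive_path: str) -> list[str]:
    """Pack every file under store_dir into archive_path; return the keys written."""
    root = Path(store_dir)
    keys = []
    # ZIP_STORED is an assumption of this sketch, not something the spec mandates.
    with zipfile.ZipFile(archive_path, mode="w", compression=zipfile.ZIP_STORED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Member names are relative, "/"-separated paths with no leading "/".
                key = path.relative_to(root).as_posix()
                zf.write(path, arcname=key)
                keys.append(key)
    return keys
```

The archive should be written outside ``store_dir``; otherwise the walk could
pick up the partially written archive itself.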

Document conventions
====================

Conformance requirements are expressed with a combination of
descriptive assertions and [RFC2119]_ terminology. The key words
"MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative
parts of this document are to be interpreted as described in
[RFC2119]_. However, for readability, these words do not appear in all
uppercase letters in this specification.

All of the text of this specification is normative except sections
explicitly marked as non-normative, examples, and notes. Examples in
this specification are introduced with the words "for example".


Native storage operations
=========================

Here we consider a ZIP file to be a standard ZIP archive, where:

* Each key has a name (sequence of characters) and contents
  (sequence of bytes).
Comment on lines +63 to +64
Member Author

Note that the keys are relative paths (not prefixed with a /).


* Each directory has a name (sequence of characters) and children (set
  of zero or more files and/or directories).

* Each file or directory can be addressed by a path, composed of its
  name and the names of all ancestor directories, which uniquely
  identifies it within the archive.

… and where the following native operations are supported:

* Create a file.

* Write the contents of a file.

* Read the contents of a file.

* Create a directory.

* List the children of a directory, returning the name and type (file
  or directory) of each child.

Note that the following operations may not be supported by ZIP file stores:

* Delete a file.

* Delete a directory.
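
A minimal sketch of how these native operations might map onto Python's
standard ``zipfile`` module, using an in-memory archive for brevity. The member
names are made up for illustration. ``zipfile`` has historically provided no
method for removing a member from an archive, which is one practical reason
deletion may be unsupported.

```python
import io
import zipfile

buffer = io.BytesIO()  # stands in for a ZIP file on disk

with zipfile.ZipFile(buffer, mode="w") as zf:
    # Create a file and write its contents in one step.
    zf.writestr("group/zarr.json", b'{"node_type": "group"}')
    # Create an explicit directory entry: a member whose name ends with "/".
    zf.writestr("group/array/", b"")
    zf.writestr("group/array/zarr.json", b"{}")

with zipfile.ZipFile(buffer, mode="r") as zf:
    # Read the contents of a file.
    contents = zf.read("group/zarr.json")

    # List the children of "group/", recording the name and type of each child.
    children = {}
    for name in zf.namelist():
        rest = name[len("group/"):] if name.startswith("group/") else ""
        if rest:
            child, sep, _ = rest.partition("/")
            # A "/" after the child name means the child is a directory.
            if sep or child not in children:
                children[child] = "directory" if sep else "file"
```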
Comment on lines +88 to +90
Member Author

See #103


Key translation
===============

The Zarr store interface is defined in terms of `keys` and `values`,
where a `key` is a sequence of characters and a `value` is a sequence
of bytes. A ZIP file store represents keys as paths within a ZIP
archive. No further translation of keys is required.
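
For illustration, the identity mapping between store keys and archive member
names could be written as follows. The helper ``key_to_member_name`` and its
validation rule are hypothetical; the sketch assumes only that keys are used
as paths unchanged and that they are relative (no leading ``/``).

```python
def key_to_member_name(key: str) -> str:
    """Return the ZIP member name for a store key (identity mapping)."""
    # Assumption of this sketch: reject absolute keys, since member names
    # inside the archive are relative paths.
    if key.startswith("/"):
        raise ValueError(f"key must be a relative path, got {key!r}")
    return key
```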

Store API implementation
========================

The section below defines an implementation of the Zarr
:ref:`abstract-store-interface` in terms of the native operations of this
storage system.

* ``get(key) -> value`` : Read and return the contents of the file
  within the archive at path ``key``.

* ``set(key, value)`` : Write ``value`` as the contents of the file
  within the archive at path ``key``.
Comment on lines +107 to +111
Contributor

Is the use of at within and at into in these lines intentional? Sounds like a typo


* ``list()`` : List all keys in the archive.

* ``list_prefix(prefix)`` : List all keys within the archive that begin
  with ``prefix``.

* ``list_dir(prefix)`` : List the keys and prefixes that are immediate
  children of the directory ``prefix`` within the archive.
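
The operations above can be sketched with Python's standard ``zipfile`` module
as shown below. This is an illustration under stated assumptions, not a
reference implementation: the class name ``ZipStoreSketch`` is hypothetical,
refusing to overwrite an existing key is a choice made for this example, and
the ``list_dir`` behaviour (immediate children only, with sub-directories
returned as prefixes ending in ``/``) reflects one reading of the abstract
store interface.

```python
import zipfile


class ZipStoreSketch:
    """Illustrative store over a ZIP archive; not a reference implementation."""

    def __init__(self, path, mode="a"):
        # Mode "a" appends to an existing archive or creates a new one.
        self._zf = zipfile.ZipFile(path, mode=mode)

    def get(self, key):
        # Raises KeyError if no member is named `key`.
        return self._zf.read(key)

    def set(self, key, value):
        # Assumption: refuse overwrites, because writing an existing name
        # would add a duplicate member rather than replace the old one.
        if key in self._zf.namelist():
            raise ValueError(f"key already exists in archive: {key!r}")
        self._zf.writestr(key, value)

    def list(self):
        # Skip explicit directory entries (member names ending in "/").
        return [name for name in self._zf.namelist() if not name.endswith("/")]

    def list_prefix(self, prefix):
        return [key for key in self.list() if key.startswith(prefix)]

    def list_dir(self, prefix):
        # Keep only the first path segment after `prefix`; a trailing "/"
        # marks a sub-directory rather than a key.
        children = set()
        for key in self.list_prefix(prefix):
            head, sep, _ = key[len(prefix):].partition("/")
            children.add(prefix + head + sep)
        return sorted(children)

    def close(self):
        self._zf.close()
```

For example, after setting the keys ``a/zarr.json``, ``a/c/0`` and ``a/c/1``,
``list_dir("a/")`` in this sketch yields ``a/c/`` and ``a/zarr.json``.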


Canonical URI
=============

The canonical URI format for this store follows the file URI scheme of the
archive itself, as defined in [RFC8089]_. For a Windows archive path
"c:\\my data.zip" the canonical URI would be "file:///c:/my%20data.zip";
for a POSIX archive path "/my data.zip" it would be "file:///my%20data.zip".
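
The two examples above can be reproduced with a small helper. The function
``archive_uri`` is hypothetical and handles only absolute POSIX paths and
Windows drive-letter paths; other forms (such as UNC paths) are outside the
scope of this sketch.

```python
from pathlib import PurePosixPath, PureWindowsPath
from urllib.parse import quote


def archive_uri(archive_path: str, windows: bool = False) -> str:
    """Build a file URI for an absolute archive path (hypothetical helper)."""
    pure = PureWindowsPath(archive_path) if windows else PurePosixPath(archive_path)
    if not pure.is_absolute():
        raise ValueError(f"expected an absolute path, got {archive_path!r}")
    posix = pure.as_posix()  # e.g. "c:/my data.zip" or "/my data.zip"
    if not posix.startswith("/"):
        posix = "/" + posix  # the drive letter follows the empty authority
    # Percent-encode everything except "/" and the drive-letter ":".
    return "file://" + quote(posix, safe="/:")


print(archive_uri("c:\\my data.zip", windows=True))  # file:///c:/my%20data.zip
print(archive_uri("/my data.zip"))                   # file:///my%20data.zip
```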


Store limitations
=================

The following limitations for this store are known:
Contributor

Suggested change
The following limitations for this store are know:
The following limitations for this store are known:


* ZIP file stores may not implement delete or rename operations.


References
==========

.. [RFC2119] S. Bradner. Key words for use in RFCs to Indicate
   Requirement Levels. March 1997. Best Current Practice. URL:
   https://tools.ietf.org/html/rfc2119

.. [RFC8089] M. Kerwin. The "file" URI Scheme. February 2017. Proposed Standard.
   URL: https://tools.ietf.org/html/rfc8089


Change log
==========

@@TODO