Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add base methods for iterator serialization #924

Merged
merged 15 commits into from
Aug 8, 2023

Conversation

CodyCBakerPhD
Copy link
Contributor

@CodyCBakerPhD CodyCBakerPhD commented Jul 30, 2023

Motivation

Zarr supports efficient parallelization, and a sister PR (hdmf-dev/hdmf-zarr#111) adds this capability but requires the argument number_of_jobs to be propagated from a top level of the HDMFIO.write due to the super call at this line

It also has a subtle condition in that our iterators (which in this role primarily define how to access data from a source) must be pickleable, so I've defined methods related to that for the GenericDataChunkIterator (which for several reasons should be the only type of iterator allowed for usage in parallelization)

An example in practice can be found downstream: catalystneuro/neuroconv#536

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Have you checked our Contributing document?
  • Have you ensured the PR clearly describes the problem and the solution?
  • Is your contribution compliant with our coding style? This can be checked running ruff from the source directory.
  • Have you checked to ensure that there aren't other open Pull Requests for the same change?
  • Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.

pyproject.toml Outdated Show resolved Hide resolved
src/hdmf/data_utils.py Outdated Show resolved Hide resolved
src/hdmf/backends/io.py Outdated Show resolved Hide resolved
src/hdmf/backends/io.py Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Aug 4, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.01% 🎉

Comparison is base (ca7722f) 88.33% compared to head (434a21f) 88.34%.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #924      +/-   ##
==========================================
+ Coverage   88.33%   88.34%   +0.01%     
==========================================
  Files          45       45              
  Lines        9284     9292       +8     
  Branches     2651     2653       +2     
==========================================
+ Hits         8201     8209       +8     
  Misses        765      765              
  Partials      318      318              
Files Changed Coverage Δ
src/hdmf/backends/io.py 96.47% <100.00%> (ø)
src/hdmf/data_utils.py 90.42% <100.00%> (+0.17%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@CodyCBakerPhD CodyCBakerPhD changed the title Add 'number_of_jobs' to top-level io.write() for Zarr parallel support Add base methods for iterator serialization Aug 4, 2023
@CodyCBakerPhD CodyCBakerPhD marked this pull request as ready for review August 4, 2023 21:58
@CodyCBakerPhD
Copy link
Contributor Author

@rly @oruebel Ready for review

@CodyCBakerPhD CodyCBakerPhD requested a review from rly August 7, 2023 18:59
@rly rly merged commit 3f3586a into hdmf-dev:dev Aug 8, 2023
@rly rly deleted the add_parallel_zarr branch August 8, 2023 21:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants