h5py file writing is slow #16
Comments
MPI must be available.
If a fold contains all images (slow/fast) of medulloblastoma and pilocytic astrocytoma, it takes 6+ hours for only the training set. This is infeasible. The transfer is slow for the following reasons:
Other considerations:
New solution:
How?:
Limitations:
Fixes part of #16, namely the writing part.
Although the feature compilation has been sped up, compiling features on the CPU still takes approximately 2 hours for all images.
Concerning reading from the file,
Using the GPU, it takes 20 minutes now.
Compiling features to their own dataset (so one HDF5 file per image) and later creating a virtual dataset that combines the separate HDF5 files will allow for concurrent writing. Multiple instances of the model fit on a single GPU, so each instance can concurrently compute embeddings per image and store the embeddings of one image in its own HDF5 file before moving on to the next; see the sketch below.
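A minimal sketch of this per-image-file approach, assuming each worker has already written its embeddings to a per-image HDF5 file under a dataset named `embeddings`; the file names, dataset name, and shapes are illustrative, not the actual project layout:

```python
import h5py

# Per-image HDF5 files written concurrently by separate model instances
# (hypothetical names and shapes for illustration).
image_files = ["image_000.h5", "image_001.h5", "image_002.h5"]
n_patches_per_image = 1024   # embeddings (rows) stored per image
embedding_dim = 512

# Virtual layout spanning all per-image datasets.
layout = h5py.VirtualLayout(
    shape=(len(image_files) * n_patches_per_image, embedding_dim),
    dtype="f4",
)

for i, path in enumerate(image_files):
    source = h5py.VirtualSource(path, "embeddings",
                                shape=(n_patches_per_image, embedding_dim))
    layout[i * n_patches_per_image:(i + 1) * n_patches_per_image] = source

# The combined file only stores references; reads are forwarded to the
# per-image files, so the embeddings are never copied.
with h5py.File("fold_embeddings.h5", "w") as f:
    f.create_virtual_dataset("embeddings", layout, fillvalue=0)
```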
Is your feature request related to a problem? Please describe.
Writing to the HDF5 file takes very long: about an hour per fold.
Describe the solution you'd like
Concurrent file writing.
Describe alternatives you've considered
HDF5 for Python (h5py) supports concurrent writes through its MPI-IO driver [1].
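A minimal sketch of that MPI-based alternative, following the h5py documentation [1]; it assumes h5py was built against parallel HDF5 and that mpi4py is installed, and the file name, dataset name, and shape are illustrative:

```python
from mpi4py import MPI
import h5py

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Every rank opens the same file collectively with the MPI-IO driver.
with h5py.File("embeddings.h5", "w", driver="mpio", comm=comm) as f:
    dset = f.create_dataset("embeddings", (comm.Get_size(), 512), dtype="f4")
    # Each rank writes only its own row, so the writes happen concurrently.
    dset[rank] = rank  # placeholder data; real embeddings would go here
```

Run with e.g. `mpiexec -n 4 python write_embeddings.py` (script name assumed).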
Additional context
[1] https://docs.h5py.org/en/stable/mpi.html