Replies: 3 comments 6 replies
-
For a little more context: the error I get after reading back a combined parquet reference file, built from a list of vds objects created with multiprocessing, is zarr.errors.MetadataError: error decoding metadata. I do not get this when using a list comprehension.
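Roughly, the failing step looks like the following sketch (the file names, combine call, and read-back route here are simplified placeholders for what I'm actually doing):

```python
import fsspec
import xarray as xr
from virtualizarr import open_virtual_dataset

# Placeholder file list; in my case it's a month of hourly files.
files = [f"data/hourly_{i:03d}.nc" for i in range(3)]
vds_list = [open_virtual_dataset(f, indexes={}) for f in files]  # or built with multiprocessing

# Combine the per-file virtual datasets and write the references out as parquet.
combined = xr.combine_nested(vds_list, concat_dim="time")
combined.virtualize.to_kerchunk("combined_refs.parquet", format="parquet")

# Reading the combined references back is where the error appears:
fs = fsspec.filesystem("reference", fo="combined_refs.parquet")
ds = xr.open_dataset(
    fs.get_mapper(""), engine="zarr", backend_kwargs={"consolidated": False}
)
# -> zarr.errors.MetadataError: error decoding metadata
#    (only when vds_list was built with multiprocessing; the
#     list-comprehension version reads back fine)
```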
-
Hi Callum! Note that what you're trying to do is similar to #95; #95 (comment) might also be relevant ("I'm planning to use dask / cubed to parallelize a really big set of ...").
These sound like unrelated issues, though - once you have an in-memory vds, writing out the metadata doesn't depend on how you opened it. In general, I can't offer more help on your specific issue without an MCVE.
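For reference, a minimal sketch of that dask-based pattern might look like this (file paths and the open_virtual_dataset arguments are only illustrative):

```python
import dask
import xarray as xr
from virtualizarr import open_virtual_dataset

files = [f"data/hourly_{i:03d}.nc" for i in range(3)]  # illustrative file list

# One delayed task per file, so reference generation runs in parallel on
# whatever dask scheduler is active (threads, processes, or a cluster).
tasks = [dask.delayed(open_virtual_dataset)(f, indexes={}) for f in files]
vds_list = list(dask.compute(*tasks))

# The resulting virtual datasets can then be combined as usual.
combined = xr.combine_nested(vds_list, concat_dim="time")
```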
-
Just adding on in case it's helpful for you @CRWayman: we have two examples so far of parallel reference generation with VirtualiZarr.
-
Hello!
I am trying to build a vds list for a month of hourly data, and I thought I'd try to speed up that process by using multiprocessing to map open_virtual_dataset() over a list of files. Doing this with a list comprehension definitely works, but I was surprised to see that it doesn't seem to work cleanly with multiprocessing. The only way I can see to do it is by passing the function and the indexes argument into functools.partial, and then applying pool.map() to the resulting partial object and the list of files. For some reason this results in the metadata not writing out correctly.
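To make that concrete, here is a simplified sketch of the two approaches (the file paths are placeholders):

```python
from functools import partial
from multiprocessing import Pool

from virtualizarr import open_virtual_dataset

# Placeholder list standing in for a month of hourly files.
files = [f"data/hourly_{i:03d}.nc" for i in range(744)]
indexes = {}

# List-comprehension version - this works fine:
# vds_list = [open_virtual_dataset(f, indexes=indexes) for f in files]

# Multiprocessing version - the references built from this later fail to decode:
open_virtual = partial(open_virtual_dataset, indexes=indexes)
with Pool() as pool:
    vds_list = pool.map(open_virtual, files)
```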
Any ideas or advice would be greatly appreciated! If sticking with the list comprehension is my best bet, then I'll do that.
Thanks,
Callum