-
Without knowing too much about the details of the format, the "parquet" kerchunk format copies the structure of

Do you have a specific reason why you can't upload the references as multiple files to object storage? I have very little experience with object storage, but so far just writing the entire directory there appeared to work great. Otherwise I believe you can simply pack the directory into an archive (

As for why the json didn't work, I believe you need this (same for kerchunk parquet, by the way):

import xarray as xr

# the same options are passed for the reference file and for the referenced data
so = {"endpoint_url": "https://projects.pawsey.org.au", "anon": True}
ds = xr.open_dataset(
    "s3://vzarr/virt_oisst.json",
    engine="kerchunk",
    storage_options={"target_options": so, "remote_options": so},
)

i.e. we need to specify how to access the reference file (
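The archive suggestion above could look roughly like the following. This is only a sketch of the packing step using the Python standard library. The helper name `pack_reference_dir` and the file names in the demo are made up for illustration and are not meant to describe the real kerchunk parquet layout. Whether fsspec or kerchunk can then read the references straight out of such an archive is not shown here and would need to be checked separately.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def pack_reference_dir(ref_dir: str, archive_base: str) -> str:
    """Pack a directory of reference files into a single .zip archive.

    `archive_base` is the output path without the ".zip" suffix.
    Returns the path of the archive that was written.
    """
    ref_path = Path(ref_dir)
    if not ref_path.is_dir():
        raise NotADirectoryError(ref_dir)
    # make_archive appends ".zip" to archive_base and returns the final path.
    return shutil.make_archive(archive_base, "zip", root_dir=str(ref_path))


def demo() -> list[str]:
    """Pack a tiny stand-in directory and list the files inside the archive."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file names, used only so there is something to pack.
        ref_dir = Path(tmp) / "refs.parq"
        (ref_dir / "var").mkdir(parents=True)
        (ref_dir / ".zmetadata").write_text("{}")
        (ref_dir / "var" / "refs.0.parq").write_bytes(b"placeholder")

        archive = pack_reference_dir(str(ref_dir), str(Path(tmp) / "refs"))
        with zipfile.ZipFile(archive) as zf:
            # Skip directory entries, keep only files.
            return sorted(n for n in zf.namelist() if not n.endswith("/"))
```

The archive is written next to the directory rather than inside it, so it does not end up packing itself.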
-
You can only write out what fsspec can interpret. At this point though I would recommend experimenting with icechunk instead of kerchunk. AFAIK the only advantage kerchunk parquet has over icechunk's model is that icechunk is newer and therefore less stable. (I should double-check this and add it to the FAQ.)
I don't know why this would be the case though. If that's really a restriction then your only option is kerchunk json, as both kerchunk parquet ("zarrquet") and icechunk are really many files together.
-
This is working! https://gist.github.com/mdsumner/c72ff510bf41c433662ef703a635daf8 So now for Icechunk! Thanks again for the assistance, it's really helpful for me in getting these things tied together.
-
Can we write unsharded parquet as the format? i.e. I'm doing
and end up with a directory of parq fragments:
but I need a single file for hosting that in object store. Can I bundle that into one .parquet? (I may very well be missing something about how to share the parquet form in object store ... so very happy to take pointers there!) Thanks!!
In the bigger picture I'm having problems connecting s3 to
(it opens fine with xarray from https://projects.pawsey.org.au/vzarr/oisst.json)
which led me to try parquet, but I expect I want that instead of json ultimately for very large collections.
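For orientation, here is a small sketch of how an `s3://` address relates to the HTTPS form mentioned above. It assumes the endpoint serves buckets with path-style addressing (`<endpoint>/<bucket>/<key>`), which matches the shape of the HTTPS URL in this post. The function name `s3_to_path_style_url` is made up for illustration, and the sketch says nothing about whether a given object is publicly readable.

```python
from urllib.parse import urlsplit


def s3_to_path_style_url(s3_url: str, endpoint_url: str) -> str:
    """Map s3://bucket/key to <endpoint>/<bucket>/<key> (path-style addressing)."""
    parts = urlsplit(s3_url)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an s3:// URL with a bucket: {s3_url!r}")
    bucket = parts.netloc          # e.g. "vzarr"
    key = parts.path.lstrip("/")   # e.g. "oisst.json"
    return f"{endpoint_url.rstrip('/')}/{bucket}/{key}"


url = s3_to_path_style_url("s3://vzarr/oisst.json", "https://projects.pawsey.org.au")
```

With the bucket and endpoint from this thread, `url` comes out as the same HTTPS address that opens fine with xarray, which suggests that the missing piece on the s3 side is the endpoint and anonymous-access configuration rather than the path itself.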