Adding a new array to a dataset #1000
Replies: 3 comments 5 replies
-
You should be able to do it with the |
-
I ended up with this, which seems to work but was not obvious at first:
ds.update({"variant_mask": xr.DataArray(data=selected_variants, dims=["variants"], name="variant_mask")})
sgkit.save_dataset(ds.drop_vars(set(ds.data_vars) - {"variant_mask"}), zarr_path, mode="a")
This is a very common pattern. I wonder if we should have a function such as: |
-
+1 to |
-
It's a common use case to want to add an array to a dataset - in my case adding some variant masks based on duplicated sites and quality metrics. However, I don't want to have to rewrite the whole dataset every time I add an array.
pydata/xarray#6700 suggests this is not possible, and https://stackoverflow.com/questions/58042559/adding-new-xarray-dataarray-to-an-existing-zarr-store-without-re-writing-the-who suggests some clever use of the 'a' mode in to_zarr with only the new arrays in a dataset, and specified regions:
ds2.to_zarr("test.zarr/", mode="a", region={"x": slice(0, ds2.x.size), "z": slice(0, ds2.z.size)})
The other option is to write to the zarr files directly, but that seems a bit hacky.