Guidance on best practice for saving DD arrays? #860
Comments
JLD2 is bad long term, as it's locked to a DD (DimensionalData.jl) version. I would use NetCDF via Rasters.jl or YAX. We can't promise not to break JLD2 files: even with 1.0 we need the freedom to add fields to structs and to add type parameters.
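As a rough illustration of the NetCDF route suggested above, here is a minimal sketch that wraps a `DimArray` in a `Raster` and writes it with Rasters.jl. The array contents, the `:checkpoint` name, and the file path are made up for the example, and it assumes NCDatasets.jl is installed so that Rasters.jl can handle `.nc` files. Treat it as a starting point under those assumptions rather than an officially recommended workflow.

```julia
using DimensionalData
using Rasters
import NCDatasets  # assumption: loaded so Rasters.jl can read/write NetCDF files

# Hypothetical checkpoint data: a small 4×3 DimArray with X and Y lookups.
A = DimArray(rand(Float32, 4, 3), (X(1.0:4.0), Y(10.0:10.0:30.0)); name=:checkpoint)

# Wrap the DimArray as a Raster and write it to a NetCDF file.
path = joinpath(mktempdir(), "checkpoint.nc")  # illustrative location only
write(path, Raster(A))

# Later, possibly with a newer DimensionalData.jl, read the checkpoint back.
back = Raster(path)
```

Because the file is plain NetCDF, it does not depend on the internal struct layout of any particular DimensionalData.jl version, which is the concern raised about JLD2.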
Could we use Zarr to save plain DimArrays? Because nothing in the Zarr spec is geo-specific, it should be possible. We would instead have to map the DimArray layout to Zarr directly. But I am not sure how this is going to interact with the Zarr handling code in YAXArrays or Rasters.
I'm realizing that Julia's powerful and flexible type system becomes its Achilles heel when it comes to saving data. With Matlab you can just save everything in your workspace to a .MAT file and it will be backward compatible if the .MAT format is updated. It seems that JLD2 is Julia's best attempt at this, but the flexibility for each package to define its own types makes archival saving nearly impossible without conforming to an external data standard. This means no mixing of DataFrames and DD arrays and other such unique and creative data combinations. Regardless, we should probably have some guidance on what one's options are for saving a DD array to disk.
In most languages, saving the workspace like that is bad practice except for short-term personal use. You want some real standardised serialisation. Felix is right that Zarr is probably the best array format, but Rasters doesn't write it yet, so use YAX. But forget JLD2 as anything but short-term personal use.
Is there a best practice for saving DD arrays? For my large workflows I need checkpoints where I can save the state from which the processing can be restarted. Right now I'm using JLD2. Is that the best approach? Does DD want to explicitly "support" saving in this format, such that DD minimizes changes that could break loading of data archived with JLD2? Would it be helpful to add "Saving DD data" to the documentation?