Guidance on best practice for saving DD arrays? #860

alex-s-gardner · 2024-11-14T00:43:45Z

Is there a best practice for saving DD arrays? For my large workflows I need checkpoints where I can save the state from which the processing can be restarted. Right now I'm using JLD2. Is that the best approach? Does DD want to explicitly "support" saving in this format, such that DD minimizes changes that could break loading of data archived with JLD2? Would it be helpful to add "Saving DD data" to the documentation?

rafaqz · 2024-11-14T11:29:05Z

JLD2 is bad long term as it's locked to a DD version. I would use NetCDF via Rasters.jl or YAXArrays.jl.

We can't promise not to break JLD2 loading. Even with 1.0 we need the freedom to add fields to structs and to add type parameters.

felixcremer · 2024-11-14T22:49:36Z

Could we use Zarr to save plain DimArrays? Nothing in the Zarr spec is geo-specific, so it should be possible. We would instead have to map the DimArray layout to Zarr directly. But I am not sure how this is going to interact with the Zarr handling code in YAXArrays or Rasters.

alex-s-gardner · 2024-11-14T23:58:30Z

I'm realizing that Julia's powerful and flexible type system becomes its Achilles heel when it comes to saving data. With Matlab you can just save everything in your workspace to a .MAT file and it will be backward compatible if the .MAT format is updated. It seems that JLD2 is Julia's best attempt at this, but the flexibility for each package to define its own types makes archival saving nearly impossible without conforming to an external data standard. This means no mixing of DataFrames and DD arrays and other such unique and creative data combinations.

Regardless, we should probably have some guidance on what one's options are for saving a DD array to disk.

rafaqz · 2024-11-15T09:30:55Z

In most languages, saving the workspace like that is bad practice except for short-term personal use. You want some real standardised serialisation.

Felix is right, Zarr is probably the best array format, but Rasters doesn't write it yet, so use YAXArrays.

But forget JLD2 as anything but short-term personal use.
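
For reference, the NetCDF route via Rasters.jl suggested earlier in the thread might look roughly like the sketch below. It is not an officially documented DD checkpoint workflow. It assumes Rasters.jl and NCDatasets.jl are installed, and the array contents, the name `:checkpoint` and the file path are made up for illustration. The element type, lookups and metadata may not round-trip exactly, so check the reloaded array before relying on it.

```julia
using DimensionalData   # DimArray, X, Y
using Rasters           # Raster and write
using NCDatasets        # enables the NetCDF backend in Rasters

# Hypothetical checkpoint data: a 2-D DimArray with X and Y dimensions
A = DimArray(rand(10, 5), (X(1.0:10.0), Y(1.0:5.0)); name=:checkpoint)

# Illustrative path in a fresh temporary directory
path = joinpath(mktempdir(), "checkpoint.nc")

# Wrap the DimArray as a Raster and write it to a NetCDF file
write(path, Raster(A))

# Later, possibly with a newer DD version: read back and convert to a DimArray
B = DimArray(Raster(path))
```

Unlike a JLD2 file, the NetCDF file stores the values and dimension coordinates in a standard format rather than DD's struct layout, so reading it back does not depend on the DD version that wrote it.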
