Non-kerchunk backend for GRIB files. #312

sharkinsspatial · 2024-11-21T18:11:25Z

I did a bit of investigation into wrapping the existing kerchunk GRIB reader to create a GRIB backend, but discovered that kerchunk forces inlining of derived coordinates that are not stored at the message level. VirtualiZarr does not support reading inlined refs. This might be a good reason to kick off work on a VirtualiZarr-specific backend without a direct kerchunk dependency (which was an eventual goal).
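
For readers less familiar with the distinction, here is a minimal sketch of byte-range refs versus inlined refs. It assumes the kerchunk version 1 reference layout, where a list value points at a file (optionally with offset and length) and a string value carries the data itself, optionally `base64:`-prefixed. The keys, bucket, and file name are made up for illustration.

```python
import base64

# Hypothetical kerchunk-style references for a single GRIB file.
refs = {
    ".zgroup": '{"zarr_format": 2}',                           # Zarr metadata (JSON text)
    "t2m/0.0": ["s3://example-bucket/file.grib2", 0, 1024],    # byte range in the file
    "latitude/0": "base64:" + base64.b64encode(b"\x00\x01").decode(),  # inlined chunk
}

def classify_refs(refs):
    """Split references into metadata, byte-range chunks, and inlined chunks."""
    metadata, byte_ranges, inlined = {}, {}, {}
    for key, value in refs.items():
        if key.split("/")[-1].startswith(".z"):
            metadata[key] = value       # .zgroup, .zarray, .zattrs
        elif isinstance(value, list):
            byte_ranges[key] = value    # [url] or [url, offset, length]
        else:
            inlined[key] = value        # data embedded in the reference itself
    return metadata, byte_ranges, inlined

metadata, byte_ranges, inlined = classify_refs(refs)
```

In this sketch, `latitude/0` is the kind of entry that a byte-range-only manifest cannot represent.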

I personally have limited experience working with GRIB internals, so it would be valuable to get input here from someone with deeper experience like @mpiannucci. A few questions:

Should we consider https://github.com/mpiannucci/gribberish for ChunkManifest generation (will it see continued development/maintenance)?
I'm leaning towards a model where we use gribberish as an optional dependency in VirtualiZarr and place the backend code in this project, rather than generating ChunkManifests via gribberish as is currently done for kerchunk refs via scan_gribberish, but I don't have strong opinions on this.
Can we assume coordinate alignment across all messages in a GRIB file and use open_virtual_datatree, or should we also include an open_virtual_groups method for problematic datasets? @mpiannucci we'd also probably need some recommendations from you about documentation we can include on the types of concat and grouping operations users would perform on the dict returned by open_virtual_groups.
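
One way to picture the grouping question: messages that share a level type and grid shape could be stacked together, while messages on a different grid or level type would land in a separate group. The sketch below is only an illustration. The per-message summaries are hypothetical (they are not gribberish or cfgrib output), and the returned dict is not a proposed signature for open_virtual_groups.

```python
from collections import defaultdict

# Hypothetical per-message summaries; a real scanner would derive these
# from each GRIB message's metadata sections.
messages = [
    {"var": "t", "level_type": "isobaric", "shape": (721, 1440), "offset": 0, "length": 100},
    {"var": "u", "level_type": "isobaric", "shape": (721, 1440), "offset": 100, "length": 120},
    {"var": "t2m", "level_type": "surface", "shape": (721, 1440), "offset": 220, "length": 90},
]

def group_messages(messages):
    """Group messages whose coordinates are assumed to align (same level type and grid shape)."""
    groups = defaultdict(list)
    for msg in messages:
        groups[(msg["level_type"], msg["shape"])].append(msg)
    return dict(groups)

groups = group_messages(messages)
```

Each group could then be concatenated on its own, which is where guidance on the expected concat operations would help.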

ref #11 ref #238


mpiannucci · 2024-11-21T21:40:17Z

I am happy to support building a gribberish backend for VirtualiZarr. I personally rely on gribberish for three different production apps, so while it does not have the CF compliance of cfgrib, I am motivated to improve it tactically.
I would make gribberish optional if you want to use it.
The kerchunk backend for gribberish just flat-maps coordinates, which is probably not what a lot of people want (I wanted to avoid data trees).
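
A minimal sketch of the optional-dependency pattern, using only the standard library. The helper names are hypothetical, and the install hint assumes the package is published under the same name as its import.

```python
import importlib
import importlib.util

def has_module(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None

def require_module(name: str):
    """Import an optional dependency, raising a clear error if it is missing."""
    if not has_module(name):
        raise ImportError(
            f"The GRIB backend needs the optional dependency '{name}'. "
            f"Install it (for example: pip install {name}) to use this reader."
        )
    return importlib.import_module(name)

# A backend would call require_module("gribberish") only when a GRIB file
# is actually opened, so the base install stays lightweight.
```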

An alternative is to build a non-kerchunk cfgrib backend, or to update the kerchunk GRIB backend to grab the latitude and longitude from the codec. This is already supported, but it is not how kerchunk works by default.

The reason it works like this is that GRIB2 encodes coordinates, and in many cases the coords are generated from metadata. So if you can, doing it up front and shoving it into bytes is smart. But you can just as easily force generation of the coord data from the codec. It doesn't even matter which GRIB message you ask; they will all be able to do so, because every GRIB message includes all the metadata needed to give back the coordinates.
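
To illustrate what "generated from metadata" means, here is a rough sketch for the simplest case, a regular latitude/longitude grid. The parameter names are illustrative rather than actual GRIB2 template keys, and this is not how gribberish or cfgrib implement it. Projected grids would need an extra projection step.

```python
def regular_latlon_coords(lat_first, lon_first, dlat, dlon, nlat, nlon):
    """Build 1-D latitude and longitude axes from grid-definition values.

    dlat is negative when rows are stored north to south.
    Longitudes are wrapped into the range [0, 360).
    """
    lats = [lat_first + j * dlat for j in range(nlat)]
    lons = [(lon_first + i * dlon) % 360.0 for i in range(nlon)]
    return lats, lons

# Example: a 0.25 degree global grid, scanning north to south.
lats, lons = regular_latlon_coords(90.0, 0.0, -0.25, 0.25, 721, 1440)
```

Since only a handful of numbers are needed, the same axes can be rebuilt from any message that shares the grid definition.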

To summarize:

I think the best case would be to build a simple wrapper around cfgrib to start, because it is the most compliant.
I support the idea of using gribberish. I can try to help as much as my time allows, but I cannot promise anything.
Whichever way you choose, you can always use the GRIB codec to get the coordinates instead of always forcing them to be inline.

I hope this was helpful.

sharkinsspatial mentioned this issue Nov 21, 2024

Updates for numpy 2 compatibility. mpiannucci/gribberish#64

Merged

mpiannucci mentioned this issue Dec 1, 2024

Zarr 3 codec support mpiannucci/gribberish#69

Merged
