Non-kerchunk backend for GRIB files. #312

Open
sharkinsspatial opened this issue Nov 21, 2024 · 1 comment

Comments

@sharkinsspatial
Collaborator

I did a bit of investigation into wrapping the existing kerchunk GRIB reader to create a GRIB backend, but discovered that kerchunk forces inlining of derived coordinates that are not stored at the message level. VirtualiZarr does not support reading inlined refs. This might be a good reason to kick off work on a VirtualiZarr-specific backend without a direct kerchunk dependency (which was an eventual goal).
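
To make the inlining problem concrete, here is a minimal sketch of how one might spot inlined chunks in a kerchunk-style reference dict. It assumes the usual reference layout, where a virtual reference is a `[url, offset, length]` list and inlined data is stored as a string. The helper name `find_inlined_chunks` and the sample refs are hypothetical, not part of kerchunk or VirtualiZarr.

```python
def find_inlined_chunks(refs):
    """Return chunk keys whose data is stored inline rather than as a byte range."""
    inlined = []
    for key, value in refs.items():
        basename = key.rsplit("/", 1)[-1]
        # Zarr metadata documents (.zgroup, .zarray, .zattrs) are always strings,
        # so they are skipped here and not counted as inlined chunk data.
        if basename.startswith(".z"):
            continue
        # Inlined chunk data is a plain or "base64:"-prefixed string,
        # while a virtual reference is a [url, offset, length] list.
        if isinstance(value, str):
            inlined.append(key)
    return sorted(inlined)


# Hypothetical refs: one virtual data chunk and one inlined coordinate chunk.
sample_refs = {
    ".zgroup": '{"zarr_format": 2}',
    "t2m/.zarray": "{}",
    "t2m/0.0": ["s3://bucket/file.grib2", 0, 1024],
    "latitude/.zarray": "{}",
    "latitude/0": "base64:AAAAAAAA8D8=",
}

print(find_inlined_chunks(sample_refs))
```

Any key reported by a check like this is a chunk that has no byte range in the source file, which is the case a virtual backend cannot represent as a manifest entry.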

I personally have limited experience working with GRIB internals, so it would be valuable to get input here from someone with deeper experience like @mpiannucci. A few questions:

  1. Should we consider https://github.com/mpiannucci/gribberish for ChunkManifest generation (will it see continued development/maintenance)?
  2. I'm leaning towards a model where we use gribberish as an optional dependency in VirtualiZarr and place the backend code in this project, rather than generating ChunkManifests via gribberish as is currently done for kerchunk refs via scan_gribberish, but I don't have strong opinions on this.
  3. Can we assume coordinate alignment across all messages in a GRIB file and use open_virtual_datatree, or should we also include an open_virtual_groups method for problematic datasets? @mpiannucci we'd also probably need some recommendations from you about documentation we can include on the types of concat and grouping operations users would perform on the dict returned by open_virtual_groups.
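
As a rough illustration of what manifest generation needs from a GRIB file, the sketch below locates each GRIB2 message and records its byte range. It relies only on the GRIB2 indicator section layout (the `GRIB` marker, the edition number in octet 8, the total message length in octets 9-16) and the `7777` end marker. The function name `scan_grib2_messages`, the `{"path", "offset", "length"}` entry shape, and the fake test messages are assumptions for illustration. It does not use the gribberish or VirtualiZarr APIs, and a real backend would also need to decode variable and level metadata to assign each message to an array and chunk index.

```python
import struct


def scan_grib2_messages(data, path):
    """Yield one {"path", "offset", "length"} entry per GRIB2 message in `data`."""
    pos = 0
    while True:
        start = data.find(b"GRIB", pos)
        if start == -1 or start + 16 > len(data):
            return
        edition = data[start + 7]
        if edition != 2:
            # Only the edition 2 indicator section layout is handled here.
            pos = start + 4
            continue
        # Octets 9-16 of the indicator section hold the total message length.
        (length,) = struct.unpack(">Q", data[start + 8 : start + 16])
        end = start + length
        if length < 20 or end > len(data) or data[end - 4 : end] != b"7777":
            # Probably a false match on the marker bytes; keep searching.
            pos = start + 4
            continue
        yield {"path": path, "offset": start, "length": length}
        pos = end


def _fake_message(payload):
    """Build a structurally minimal, non-decodable GRIB2-like message for the demo."""
    total = 16 + len(payload) + 4
    header = b"GRIB" + b"\x00\x00" + b"\x00" + b"\x02" + struct.pack(">Q", total)
    return header + payload + b"7777"


data = _fake_message(b"a" * 10) + _fake_message(b"b" * 4)
entries = list(scan_grib2_messages(data, "example.grib2"))
print(entries)
```

Keeping this scan in the backend rather than in the GRIB library is one way to read the second question above: the library decodes message metadata, while the project owns how byte ranges become manifest entries.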

ref #11 ref #238

@mpiannucci
Contributor

  1. I am happy to support building a gribberish backend for VirtualiZarr. I personally rely on gribberish for 3 different production apps, so while it does not have the CF compliance of cfgrib, I am motivated to improve it tactically.
  2. I would make gribberish optional if you want to use it.
  3. The kerchunk backend for gribberish just flat-maps coordinates, which is probably not what a lot of people want (I wanted to avoid data trees).

An alternative is to build a non-kerchunk cfgrib backend or update the kerchunk GRIB backend to grab the latitude and longitude from the codec. This is already supported, but it is not how kerchunk works by default.

The reason it works like this is that GRIB2 encodes coordinates, and in many cases the coords are generated from metadata. So if you can, doing it up front and shoving it into bytes is smart. But you can just as easily force generation of the coord data from the codec. It doesn't even matter which GRIB message you ask, because every GRIB message includes all the metadata needed to give back the coordinates.
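
A small sketch of what "generated from metadata" means for the simplest case, a regular latitude/longitude grid: the axes follow from the first grid point, the increments, and the point counts. The parameter names follow the GRIB2 regular lat/lon grid definition (La1, Lo1, Di, Dj, Ni, Nj). The function `regular_latlon_coords` is hypothetical and is not the gribberish codec. It assumes longitudes increase west to east and reduces the scanning mode to a single flag for row direction.

```python
def regular_latlon_coords(la1, lo1, di, dj, ni, nj, j_positive=False):
    """Build 1-D latitude and longitude axes for a regular lat/lon grid.

    la1, lo1: first grid point in degrees.
    di, dj: longitude and latitude increments in degrees.
    ni, nj: number of points along a parallel and along a meridian.
    j_positive: True when rows run south to north, False for north to south.
    """
    lat_step = dj if j_positive else -dj
    lats = [la1 + j * lat_step for j in range(nj)]
    # Wrap longitudes into the [0, 360) range.
    lons = [(lo1 + i * di) % 360.0 for i in range(ni)]
    return lats, lons


# Hypothetical 1-degree global grid starting at 90N, 0E.
lats, lons = regular_latlon_coords(90.0, 0.0, 1.0, 1.0, ni=360, nj=181)
print(lats[0], lats[-1], lons[0], lons[-1])
```

Because the inputs are a handful of numbers present in every message, the same axes can be rebuilt from any message in the file, which is why the coordinates do not have to be inlined up front.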

To summarize:

  1. I think the best case would be to build a simple wrapper around cfgrib to start, because it is the most compliant.
  2. I support the idea of using gribberish. I can try to help as much as my time allows, but I cannot promise anything.
  3. Whichever way you choose, you can always use the GRIB codec to get the coordinates instead of always forcing them to be inline.

I hope this was helpful
