I personally have limited experience working with GRIB internals, so it would be valuable to get input here from someone with deeper experience like @mpiannucci. A few questions:
I'm leaning towards a model where we use gribberish as an optional dependency in Virtualizarr and place the backend code in this project, rather than generating ChunkManifests via gribberish as is currently done for kerchunk refs via scan_gribberish, but I don't have strong opinions on this.
Can we assume coordinate alignment across all messages in a GRIB file and use open_virtual_datatree, or should we also include an open_virtual_groups method for problematic datasets? @mpiannucci we'd also probably need some recommendations from you about documentation we can include on the types of concat and grouping operations users would perform on the dict returned by open_virtual_groups.
I am happy to support building a gribberish backend for Virtualizarr. I personally rely on gribberish for 3 different production apps, so while it does not have the CF compliance of cfgrib, I am motivated to improve it tactically.
I would make gribberish optional if you want to use it.
The kerchunk backend for gribberish just flatmaps coordinates, which is probably not what a lot of people want (I wanted to avoid data trees).
An alternative is to build a non-kerchunk cf_grib backend, or to update the kerchunk grib backend to grab the latitude and longitude from the codec. This is already supported, but it is not how kerchunk works by default.
The reason it works like this is that GRIB2 encodes coordinates, and in many cases the coords are generated from metadata. So if you can, doing it up front and shoving it into bytes is smart. But you can just as easily force generation of the coord data from the codec. It doesn't even matter which GRIB message you ask, because every GRIB message includes all the metadata needed to give back the coordinates.
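As a rough illustration of coordinates being generated from metadata, here is a minimal sketch for a regular latitude/longitude grid. It is not the gribberish codec API: the function name, the dictionary keys, and the 0.25 degree global grid values are all assumptions chosen for the example. It only shows that a first point, an increment, and a point count are enough to rebuild the 1-D coordinates, and that two messages carrying the same grid metadata give the same result.

```python
from typing import Dict, List, Tuple


def regular_latlon_coords(grid: Dict[str, float]) -> Tuple[List[float], List[float]]:
    """Rebuild 1-D lat/lon coordinates from regular-grid metadata.

    `grid` is a hypothetical dict standing in for the grid definition
    that each GRIB2 message carries (first point, increments, counts).
    """
    lats = [grid["lat_first"] + j * grid["dlat"] for j in range(int(grid["nj"]))]
    lons = [grid["lon_first"] + i * grid["dlon"] for i in range(int(grid["ni"]))]
    return lats, lons


# Assumed metadata for a 0.25 degree global grid scanning north to south.
grid_meta = {
    "lat_first": 90.0, "dlat": -0.25, "nj": 721,
    "lon_first": 0.0, "dlon": 0.25, "ni": 1440,
}

# Two hypothetical messages (different variables) sharing the same grid.
message_a = {"var": "t2m", "grid": grid_meta}
message_b = {"var": "u10", "grid": dict(grid_meta)}

lats, lons = regular_latlon_coords(message_a["grid"])
same_from_other_message = regular_latlon_coords(message_b["grid"]) == (lats, lons)
```

With a helper like this, the coordinates never need to be stored inline: any message in the file can act as the source.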
To summarize:
I think the best case would be to build a simple wrapper around cfgrib to start, because it is the most compliant
I support the idea of using gribberish. I can try to help as much as my time allows, but I cannot promise anything
Either way you choose, you can always use the Grib Codec to get the coordinates instead of always forcing them to be inline.
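To make the inline versus referenced distinction concrete, here is a small sketch based on the kerchunk reference layout, where a chunk key maps either to a string (inline data, optionally prefixed with `base64:`) or to a `[url, offset, length]` list. The helper name `split_refs`, the example keys, and the file path are hypothetical. The path/offset/length dict is only meant to resemble the kind of entry a chunk manifest holds, not to reproduce any library's exact API.

```python
import base64
from typing import Dict, Tuple


def split_refs(refs: Dict[str, object]) -> Tuple[Dict[str, bytes], Dict[str, dict]]:
    """Separate kerchunk-style refs into inline payloads and byte ranges."""
    inline: Dict[str, bytes] = {}
    ranged: Dict[str, dict] = {}
    for key, value in refs.items():
        if isinstance(value, str):
            # Inline data: binary payloads carry a "base64:" prefix.
            if value.startswith("base64:"):
                inline[key] = base64.b64decode(value[len("base64:"):])
            else:
                inline[key] = value.encode()
        elif isinstance(value, (list, tuple)) and len(value) == 3:
            # Byte-range reference into a source file.
            path, offset, length = value
            ranged[key] = {"path": path, "offset": offset, "length": length}
    return inline, ranged


# Hypothetical refs: one inlined coordinate chunk, one referenced data chunk.
example_refs = {
    "latitude/0": "base64:" + base64.b64encode(b"\x00\x01").decode(),
    "t2m/0.0": ["s3://example-bucket/file.grib2", 0, 1024],
}

inline_chunks, ranged_chunks = split_refs(example_refs)
```

Only the byte-range entries map cleanly onto a path/offset/length manifest. The inline entries are the ones that would have to be regenerated from the codec instead.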
I did a bit of investigation into wrapping the existing kerchunk grib reader to create a GRIB backend but discovered that kerchunk forces inlining of derived coordinates that are not stored at the message level. Virtualizarr does not support reading inlined refs. This might be a good reason to kick off work on a Virtualizarr specific backend without a direct kerchunk dependency (which was an eventual goal).
ChunkManifest generation (will it see continued development/maintenance)?
ref #11 ref #238