Replies: 4 comments 3 replies
-
This sounds really exciting! Many Xarray users, including myself, would find this useful. I'd love to see a prototype -- have you thought about what the implementation could look like yet? |
Beta Was this translation helpful? Give feedback.
-
I'll be honest, as a programmer who hasn't done anything with vectors beyond converting wind u/v components to wind direction and speed, a lot of this proposal is over my head. That said, if we're talking about CRS, pyproj, and rioxarray then @snowman2 is another person who has be brought into the conversation. Regarding geoxarray, I wanted geoxarray to serve two purposes:
A lot of the above is based on my limited 2D single-coordinate-per-pixel satellite data experience. If geoxarray needs to be limited to "only" purpose 1 and maybe the pyproj transformation in 2 and resampling gets moved to another package or absorbed into rioxarray that is fine by me, especially if it serves the community better. I say this because it wasn't clear to me but it sounds like what you're proposing would require at least 2 new libraries: the geolocation/geoxarray-style and the vector stuff with some of those being built on or being part of xarray core. Does that sound right? |
Beta Was this translation helpful? Give feedback.
-
@jthielen This sounds indeed very interesting. Our current approach is indeed somewhat hacky at the moment. From my xgcm perspective I am unsure about
Does this mean the DataVectorField would 'know' how to compute certain vector calculus operators? I haven't completely absorbed this logic here in detail, but would love to talk more. I am also curious as to what @TomNicholas and @rabernat think? |
Beta Was this translation helpful? Give feedback.
-
I am trying to implement some atmospheric sciences physics formulas and I just hit a weird quirk with my current workflow (using metpy to calculate divergence and vorticity of an Xarray.Dataset and then using SHTools to calculate the Spherical Harmonics). I would love to contribute a bunch of calculation functions to Metpy or Xarray but I am not sure how to handle basic partial derivatives on Xarray Vector fields (say ds.UU with coords lon, lat, pres and time) or more complicated ones with 2 dimensional coords (lon=(rlon,rlat)). Anyone has any updates on this? I would not mind writing a documentation page for such use cases 🙂 |
Beta Was this translation helpful? Give feedback.
-
Related to an earlier "ideas" discussion I posted on vectors in xarray, I've spent some time trying to brainstorm improved ways of representing scalar and vector fields (like those frequently encountered in models in the geosciences) and enabling easy vector calculus operations with them. I wanted to share some of those thoughts to see if such functionality already exists in the broader xarray ecosystem that I'm not yet aware of, and if not, if there is community interest for the kind of extension package I had in mind.
Tagging some folks who may have thoughts on this based on prior related discussions and overlapping tools/libraries: @keewis @rabernat @michaelavs @erogluorhan @benbovy @willirath @dcherian @eteq @djhoese @jbusecke @TomNicholas @dopplershift
Also, if there is a better forum for this type of brainstorming/proposal over xarray's Discussions, please let me know!
The Problem
Overview
xarray.DataArray
provides a wonderful data model for labeling dimensions of an n-dimensional array and providing coordinates/indexes associated with those dimensions. However, when working with scalar and vector fields, dimensions no longer have just one semantic meaning--instead, we haveWhen working with scalars alone (e.g., like those considered in the CDM or CF Conventions), this distinction is often unnecessary at the data model level. However, it becomes critical when representing vector fields and/or performing vector calculus in non-Cartesian coordinates (like those that occur with nearly all geospatial data, such as rectilinear and projected grids). In the latter case, the coordinates associated with the field dimensions carry with them basis vectors and scale factors that would ideally have a high-level representation in code.
Motivating Example
Given an
xarray.Dataset
containing model data over four dimensions (time, vertical, y, and x) with both scalars and vector field components, how can we represent the operations that make up something like vertical vorticity advection?With existing xarray functionality, a package could definitely take the approach of bundling all such calculations into their own functions in terms of scalars and vector components,
however, this is rather inflexible. Beyond not representing vectors as vectors (only components, which could be subtly error-prone), it relies upon the library developers to enumerate out all such fundamental calculations that users may wish to have. Instead, I'd love to see the operations themselves be able to be represented in code, so something like:
To do this, we would need:
DataArray
to make a "Scalar Field" type (I think a customFieldIndex
and axfield
accessor should do the trick!)Envisioned Use Cases
MetPy
A long requested enhancement to MetPy's suite of atmospheric kinematics calculations is the robust handling of projection scale factors. Presently (v1.2 and earlier), MetPy's function signatures look a bit like
and are designed to flexibly handle either
pint.Quantity
s orxarray.DataArray
s (in the latter case, optional arguments are automatically determined from coordinate information). However, to handle scale factors in the same way (while also making it as easy as possible for users to provide information they already have rather requiring prerequisite boilerplate code), this "blows up" to something likeBeyond that, the underlying conditional inclusion of scale factors throughout the calculation stack significantly increases the code complexity and verbosity of internal implementations. If the vector calculus operations and data representations themselves could be effectively encapsulated, this would be much easier to implement, test, maintain, and extend.
xgcm
xgcm does a lot of great things with scalar and vector handling on staggered grids, however, it seems to suffer due to the lack of a true vector data representation (e.g., xgcm/xgcm#410). If a
xfield
package containing aDataVectorField
type were to exist, I'd think much of xgcm's API could be cleaned-up/streamlined. What would such aDataVectorField
look like for a staggered grid? This is the kind of repr I'd imagine for a 2D vector on a C-grid implicitly in 3D space (like the common case of horizontal wind in the atmosphere):However, in memory, it would look a lot like a
Dataset
(or aDataArray
group-by along a stack dimension, as in #5775), but withFieldIndex
PandasIndex
and the likely futurexgcmIndex
to support true indexing operations, rather than just field alignment and scaling propertiesdata_vars
/components
are aligned/expressed in terms of the axes of the field coordinate system (stored in theFieldIndex
)geoxarray /
CRSIndex
/WCSIndex
This project would also aim to handle the "coordinate system representation problem" expressed in #3620 and related packages such as geoxarray, and potentially be able to subsume some of their functionality depending on the approach. However, I don't think this
xfield
idea would fit itself within the existinggeoxarray
, because this is meant to also work on scalar and vector fields in a more general sense. Neither wouldgeoxarray
be obviated, since it has intended functionality outside of just "carrying the CRS around," such as format/representation conversions.xoak
If xoak (or similar package) were to implement the long-coveted "2D longitude/latitude"
Index
, it would be natural to wrap that with the coordinate system handling provided here. For example, a horizontal wind vector with west-east and south-north components, but with data existing on an unspecified projected grid, could look likeLimits of Scope (at least to begin)
Tensor Fields
If this handles scalar fields and vector fields, would it not make sense to generalize to tensor fields? With popularity of things like
einops
, there's likely a place for it within the community.That being said, vectors map to lists, so it's easy to conceptualize as a data structure, and in that vein, keeping array dimension <-> field coordinate <-> vector component associations is straightforward. That no longer holds with tensors in general (especially mixed tensors), so that would be a whole other level of abstraction to handle. Also, tensors in curvilinear coordinates is way beyond my areas of expertise, so I can't even conceptualize what this would look like in a user-friendly way, which puts this functionality beyond practicality unless someone else with that expertise takes the lead on it.
Unstructured Grids (uxarray)
There is definitely some overlap with the "Application of calculus operations, including divergence, curl, Laplacian and gradient" stated goal. However, unless we can find a suficiently robust way to arbitrarily relate
then I don't think the approach used here could work. The "differential operations along orthogonal coordinates" is kind of central to the outlined approach to vector calculus here, and trying to generalize that to differential operators on meshes (where discritization is often based on cell geometry and integral-limit definitions of operators, rather than building up from differences in orthogonal directions) is a much more difficult problem. That being said, I could see collaboration to establish harmonious APIs to be a good goal! And perhaps there is something I'm missing that would allow unstructured grid handling here as well.
Integral Calculus, not just Differential
This really should be within scope for this kind of package, particularly with integration with
xgcm
. However, my mesoscale meteorology bias shines through in that I don't really use much in the way of integration-type operations, so it's hard to motivate (or test from experience) such functionality from my own use cases.This is definitely an area (pun intended) where it'd be great to have community engagement to develop.
CRS Format Conversions, Interpolations, Re-Projection, and other GIS-type operations
This seems like a better fit for
geoxarray
,xesmf
,rioxarray
, or other community packages.With CRS format/storage, this package should either pick a single, consensus-based CRS representation to use internally, or be generic. Given the need to "automagically" compute scale factors, it almost certainly is going to be the former, and that representation is going to be
pyproj.CRS
.Other GIS operations would definitely bloat the scope of this package too much. I think here we'd just need to be careful to transparently pass through everything that the indexes wrapped within the
FieldIndex
need to work properly, and then let those other packages do all the work.Beta Was this translation helpful? Give feedback.
All reactions