-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for custom coordinate names #94
Conversation
Glad to see this moving forward before a full I have one early suggestion: instead of having the basic CF-based check being against the
|
Codecov Report
@@ Coverage Diff @@
## master #94 +/- ##
==========================================
- Coverage 96.55% 95.83% -0.72%
==========================================
Files 6 6
Lines 319 384 +65
==========================================
+ Hits 308 368 +60
- Misses 11 16 +5
Continue to review full report at Codecov.
|
Done. Just wondering whether this is a one-to-one relationship. Lats and lons have to have those units, but is any variable with those units guaranteed to be --the-- grid coordinate ? |
Another question: there is a potential for confusion between two types of "bounds":
Suggestions about the terminology to use to make this distinction clear ? The code currently uses bounds and boundaries interchangeably. |
From my reading of the conventions document, while it is not explicitly stated to be one-to-one/uniquely identifying, it is implied since this is how the conventions instruct applications to interpret those coordinate types (and such an instruction wouldn't be very useful if it wasn't uniquely identifying). For example:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR will help users to use xesmf with their own data formats in most cases.
Minor changes are proposed, at least to clarify a few things.
xesmf/util.py
Outdated
def is_longitude(var): | ||
"""Return True if variable is a longitude. | ||
|
||
A variable is considered a longitude if its `units` are 'degrees_east' or a synonym. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. That's part of the Cf conventions.
Other criteria from cf paragraph: standard_name and axis attributes, despite the checking with axis attribute may lead to errors with 2D coordinates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now filters on
has_units_of_longitude(var) or has_standard_name_of_longitude(var)
xesmf/util.py
Outdated
return has_units_of_latitude(var) | ||
|
||
|
||
def has_name_of_longitude(var): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name -> standard_name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
xesmf/util.py
Outdated
|
||
def has_name_of_longitude(var): | ||
key = "standard_name" | ||
return var.attrs.get(key) == "longitude" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Be careful, models with staggered grids use standard_name like "longitude_at_u_location".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should then this be
var.attrs.get(key).startswith("longitude")
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without regexp, I would suggest an equality comparison to "longitude" and, in addition, that the string starts with "longitude_at_".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
xesmf/util.py
Outdated
np.ndarray, np.ndarray | ||
Longitude and latitude arrays. | ||
""" | ||
if var_names is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it support the case of an incomplete dictionary, for instance with missing lat or missing names for bounds.
If the dict is provided without name for bounds only, lon and lat are found explicitly and bounds are found from CF checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like
if var_names is None or "lon" not in var_names or "lat" not in var_names:
?
xesmf/util.py
Outdated
|
||
# TODO: This is probably not true in general, and depends on the convention used, starting point, clockwise, etc. | ||
def cf_to_esm_bnds_2d(vertices): | ||
"""Convert 3D CF-compliant bounds to 2D ESM boundaries.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure this implementation is not robust. It assumes coordinates are given starting from the upper left corner and going anti-clockwise.
xesmf/util.py
Outdated
return var.attrs.get(key) == "longitude" | ||
|
||
|
||
def has_name_of_latitude(var): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A minimal doc? Well, the name is explicit. Perhaps just to say that the input is a datarray.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
def cf_to_esm_bnds_1d(vertices): | ||
"""Convert 2D CF-compliant bounds to 1D ESM boundaries.""" | ||
v = vertices | ||
return np.concatenate((v[:, 0], v[-1:, 1])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps add in the docstring that the cell bounds are contiguous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, please check that this is what you meant.
xesmf/util.py
Outdated
|
||
|
||
def cf_lon_lat_bnds(ds): | ||
"""Return longitude and latitude boundaries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a reminder about the array shapes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
Return | ||
------ | ||
np.ndarray, np.ndarray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the shapes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Codecov Report
@@ Coverage Diff @@
## master #94 +/- ##
==========================================
+ Coverage 96.55% 97.15% +0.59%
==========================================
Files 6 6
Lines 319 386 +67
==========================================
+ Hits 308 375 +67
Misses 11 11
Continue to review full report at Codecov.
|
Thanks @stefraynaud for the review. The one thing that still bothers me with the PR is the naming convention. There are various terms used that I think we should use more consistently:
I don't have any actual suggestion to make to clear this up. |
As for the terms, maybe it can be clarified in two generic steps:
|
The code to convert nxmx4 bounds to 2D grid corners should make respect this convention. I don't think it does at the moment: https://docs.google.com/document/d/1ueYsYrPob8Gd3HxQVHC0vKfpj_Jjh0iO4Pe70_QLpDk/edit |
If this PR is ready to merge, perhaps it can be moved to https://github.com/pangeo-data/xESMF? Unfortunately this has to be done manually. |
Also, given that cf-xarray has been released, would it be worth just switching over to depending on that at this point? |
Sounds good! That's definitely reasonable for it to not be worth it in this case, and I'm glad to hear it was discussed. I'll be using xESMF and cf-xarray in combination for an upcoming project or two, so if there are any sticking points, I'll be sure to put in a PR to help. |
So this PR was a main reason I wrote it up, so that logic to figure lat/lon didn't have to be written 5-10x over. The logic is pretty self contained so cf-xarray's utility functions should be easily copy/pastable. |
FWIW, I also favor outsourcing this to cf-xarray. However, I am not the author of the PR and so must defer to his judgement. Perhaps we could merge this as is, with good test coverage, and then refactor to use cf-xarray in a future PR without touching the tests? |
Historically, this PR came after PR #73.
This means: if the names are not specified, then search for them according to the default names, and if not found, use cf-xarray, |
One concern was weighing the maintenance costs of adding a dependency versus maintaining code. I thought the code was so basic, it was not worth an additional dependency. As xesmf and cf-xarray both evolve, I suspect the tradeoff will lean in favor of relying on cf-xarray. I think this PR still has an issue with the conversion from 2D vertices to grid corners however, and I'm short on time to investigate this. Is this in-scope for cf-xarray ? |
another approach is to make the code in cf-xarray easily copy-pastable which it may be already.
Sure. I added a function to guess 1D bounds (edges) already: https://cf-xarray.readthedocs.io/en/latest/generated/xarray.Dataset.cf.add_bounds.html#xarray.Dataset.cf.add_bounds. We can add a 2D version too. |
Great ! If xesmf can rely on cf-xarray for the variable identification and corner grid calculations, I think it would make sense to ditch this PR in favor of a new one using cf-xarray as a dependency. |
…eights Set add_nans_to_weights as default as discussed in JiaweiZhuang#56
As proposed in #74, this PR uses a dictionary of variable names to indicate which variables hold latitude and longitude coordinates and bounds. If this dictionary is not given, the code uses very basic heuristics based on the CF-Convention to find the coordinates and their bounds. The bounds are then converted into the format required by ESMF.
Note that the conversion from vertices (nx, ny, 4) is probably wrong and needs additional work and testing.