Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for custom coordinate names #94

Closed
wants to merge 9 commits into from

Conversation

huard
Copy link

@huard huard commented Apr 23, 2020

As proposed in #74, this PR uses a dictionary of variable names to indicate which variables hold latitude and longitude coordinates and bounds. If this dictionary is not given, the code uses very basic heuristics based on the CF-Convention to find the coordinates and their bounds. The bounds are then converted into the format required by ESMF.

Note that the conversion from vertices (nx, ny, 4) is probably wrong and needs additional work and testing.

@jthielen
Copy link

Glad to see this moving forward before a full cf-xarray package is ready to be integrated!

I have one early suggestion: instead of having the basic CF-based check being against the long_name attribute (which is not standardized in the current conventions), have it check for the required units attribute

@codecov-io
Copy link

codecov-io commented Apr 24, 2020

Codecov Report

Merging #94 into master will decrease coverage by 0.71%.
The diff coverage is 93.42%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #94      +/-   ##
==========================================
- Coverage   96.55%   95.83%   -0.72%     
==========================================
  Files           6        6              
  Lines         319      384      +65     
==========================================
+ Hits          308      368      +60     
- Misses         11       16       +5     
Impacted Files Coverage Δ
xesmf/util.py 94.18% <92.53%> (-5.82%) ⬇️
xesmf/frontend.py 95.55% <100.00%> (-0.03%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2d2e365...d9b1062. Read the comment docs.

@huard
Copy link
Author

huard commented Apr 24, 2020

Done. Just wondering whether this is a one-to-one relationship. Lats and lons have to have those units, but is any variable with those units guaranteed to be --the-- grid coordinate ?

@huard
Copy link
Author

huard commented Apr 24, 2020

Another question: there is a potential for confusion between two types of "bounds":

  • the grid corners required by xESMF (n+1, m+1),
  • the grid vertices defined in the CF convention (n, 2) or (n, m, 4)

Suggestions about the terminology to use to make this distinction clear ? The code currently uses bounds and boundaries interchangeably.

@jthielen
Copy link

Just wondering whether this is a one-to-one relationship. Lats and lons have to have those units, but is any variable with those units guaranteed to be --the-- grid coordinate ?

From my reading of the conventions document, while it is not explicitly stated to be one-to-one/uniquely identifying, it is implied since this is how the conventions instruct applications to interpret those coordinate types (and such an instruction wouldn't be very useful if it wasn't uniquely identifying). For example:

Coordinates of latitude with respect to a rotated pole should be given units of degrees, not degrees_north or equivalents, because applications which use the units to identify axes would have no means of distinguishing such an axis from real latitude, and might draw incorrect coastlines, for instance.

Copy link

@stefraynaud stefraynaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR will help users to use xesmf with their own data formats in most cases.
Minor changes are proposed, at least to clarify a few things.

xesmf/util.py Outdated
def is_longitude(var):
"""Return True if variable is a longitude.

A variable is considered a longitude if its `units` are 'degrees_east' or a synonym.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. That's part of the Cf conventions.
Other criteria from cf paragraph: standard_name and axis attributes, despite the checking with axis attribute may lead to errors with 2D coordinates.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now filters on

has_units_of_longitude(var) or has_standard_name_of_longitude(var)

xesmf/util.py Outdated
return has_units_of_latitude(var)


def has_name_of_longitude(var):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name -> standard_name?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

xesmf/util.py Outdated

def has_name_of_longitude(var):
key = "standard_name"
return var.attrs.get(key) == "longitude"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be careful, models with staggered grids use standard_name like "longitude_at_u_location".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should then this be

var.attrs.get(key).startswith("longitude")

?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without regexp, I would suggest an equality comparison to "longitude" and, in addition, that the string starts with "longitude_at_".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

xesmf/util.py Outdated
np.ndarray, np.ndarray
Longitude and latitude arrays.
"""
if var_names is None:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it support the case of an incomplete dictionary, for instance with missing lat or missing names for bounds.
If the dict is provided without name for bounds only, lon and lat are found explicitly and bounds are found from CF checking.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like

if var_names is None or "lon" not in var_names or "lat" not in var_names:

?

xesmf/frontend.py Show resolved Hide resolved
xesmf/util.py Outdated

# TODO: This is probably not true in general, and depends on the convention used, starting point, clockwise, etc.
def cf_to_esm_bnds_2d(vertices):
"""Convert 3D CF-compliant bounds to 2D ESM boundaries."""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure this implementation is not robust. It assumes coordinates are given starting from the upper left corner and going anti-clockwise.

xesmf/util.py Outdated
return var.attrs.get(key) == "longitude"


def has_name_of_latitude(var):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A minimal doc? Well, the name is explicit. Perhaps just to say that the input is a datarray.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

def cf_to_esm_bnds_1d(vertices):
"""Convert 2D CF-compliant bounds to 1D ESM boundaries."""
v = vertices
return np.concatenate((v[:, 0], v[-1:, 1]))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add in the docstring that the cell bounds are contiguous.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, please check that this is what you meant.

xesmf/util.py Outdated


def cf_lon_lat_bnds(ds):
"""Return longitude and latitude boundaries.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a reminder about the array shapes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


Return
------
np.ndarray, np.ndarray

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the shapes?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@codecov-commenter
Copy link

codecov-commenter commented May 26, 2020

Codecov Report

Merging #94 into master will increase coverage by 0.59%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #94      +/-   ##
==========================================
+ Coverage   96.55%   97.15%   +0.59%     
==========================================
  Files           6        6              
  Lines         319      386      +67     
==========================================
+ Hits          308      375      +67     
  Misses         11       11              
Impacted Files Coverage Δ
xesmf/frontend.py 95.55% <100.00%> (-0.03%) ⬇️
xesmf/util.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2d2e365...6b67e6c. Read the comment docs.

@huard
Copy link
Author

huard commented May 26, 2020

Thanks @stefraynaud for the review. The one thing that still bothers me with the PR is the naming convention. There are various terms used that I think we should use more consistently:

  • bounds
  • boundaries
  • vertices
  • corners

I don't have any actual suggestion to make to clear this up.

@stefraynaud
Copy link

As for the terms, maybe it can be clarified in two generic steps:

  1. Make sure that the docstrings mentions the shape of arrays.
  2. Clarify the user guide about the possible terms.

@huard
Copy link
Author

huard commented Jun 5, 2020

The code to convert nxmx4 bounds to 2D grid corners should make respect this convention. I don't think it does at the moment: https://docs.google.com/document/d/1ueYsYrPob8Gd3HxQVHC0vKfpj_Jjh0iO4Pe70_QLpDk/edit

@rabernat
Copy link

If this PR is ready to merge, perhaps it can be moved to https://github.com/pangeo-data/xESMF? Unfortunately this has to be done manually.

@jthielen
Copy link

Also, given that cf-xarray has been released, would it be worth just switching over to depending on that at this point?

@rabernat
Copy link

We discussed that at the last dev meeting (which you're welcome to join any time; see #98). @huard concluded it might not be worth it in this specific case.

@jthielen
Copy link

jthielen commented Jul 24, 2020

We discussed that at the last dev meeting (which you're welcome to join any time; see #98). @huard concluded it might not be worth it in this specific case.

Sounds good! That's definitely reasonable for it to not be worth it in this case, and I'm glad to hear it was discussed. I'll be using xESMF and cf-xarray in combination for an upcoming project or two, so if there are any sticking points, I'll be sure to put in a PR to help.

@dcherian
Copy link

it might not be worth it in this specific case.

So this PR was a main reason I wrote it up, so that logic to figure lat/lon didn't have to be written 5-10x over.

The logic is pretty self contained so cf-xarray's utility functions should be easily copy/pastable.

@rabernat
Copy link

FWIW, I also favor outsourcing this to cf-xarray. However, I am not the author of the PR and so must defer to his judgement. Perhaps we could merge this as is, with good test coverage, and then refactor to use cf-xarray in a future PR without touching the tests?

@stefraynaud
Copy link

Historically, this PR came after PR #73.
This latter was proposed to make the job of cf-xarray, and it has been chosen to keep only #94 because it simpler implementation.
Now that cf-xarray is here, a new PR must be proposed to merge to two capabilities:

  • Let the user specify the lon/lat names.
  • Find the lon/lat names according to CF conventions.

This means: if the names are not specified, then search for them according to the default names, and if not found, use cf-xarray,

@huard
Copy link
Author

huard commented Aug 3, 2020

One concern was weighing the maintenance costs of adding a dependency versus maintaining code. I thought the code was so basic, it was not worth an additional dependency. As xesmf and cf-xarray both evolve, I suspect the tradeoff will lean in favor of relying on cf-xarray.

I think this PR still has an issue with the conversion from 2D vertices to grid corners however, and I'm short on time to investigate this. Is this in-scope for cf-xarray ?

@dcherian
Copy link

dcherian commented Aug 3, 2020

One concern was weighing the maintenance costs of adding a dependency versus maintaining code. I thought the code was so basic, it was not worth an additional dependency. As xesmf and cf-xarray both evolve, I suspect the tradeoff will lean in favor of relying on cf-xarray.

another approach is to make the code in cf-xarray easily copy-pastable which it may be already.

I think this PR still has an issue with the conversion from 2D vertices to grid corners however, and I'm short on time to investigate this. Is this in-scope for cf-xarray ?

Sure. I added a function to guess 1D bounds (edges) already: https://cf-xarray.readthedocs.io/en/latest/generated/xarray.Dataset.cf.add_bounds.html#xarray.Dataset.cf.add_bounds. We can add a 2D version too.

@huard
Copy link
Author

huard commented Aug 3, 2020

Great ! If xesmf can rely on cf-xarray for the variable identification and corner grid calculations, I think it would make sense to ditch this PR in favor of a new one using cf-xarray as a dependency.

@huard huard closed this May 7, 2021
raphaeldussin added a commit to raphaeldussin/xESMF that referenced this pull request Jul 21, 2021
…eights

Set add_nans_to_weights as default as discussed in JiaweiZhuang#56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants