Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Selection of a variable based on values of another variable that shares one dimension #404

Closed
aasdelat opened this issue Jun 8, 2024 · 19 comments
Labels
documentation Improvements or additions to documentation

Comments

@aasdelat
Copy link
Contributor

aasdelat commented Jun 8, 2024

Hello:
I have to select data within a geographical window given as longitudes east and west and latitudes north and south. But the spatial coordinate of my data is "values", and I have the variables "msl" (mean sea level pressure), "longitude" and "latitude".
I think that the name of the spatial dimension, "values", is a bit confusing, because you can think that they are values of a meteorological variable, but they are not. "values" is a spatial dimension taking values from 1 to the total number of grid points.
The dimensions of the msl variable are: "values" and "Ti". The longitude and latitude variables, both have only one dimension: "values".
So, each datum in msl can be assigned a unique longitude and latitude via the common dimension: "values".
So, how can accomplish this?

@aasdelat aasdelat changed the title Selection of a variable based on values of another value that shares one dimension Selection of a variable based on values of another variable that shares one dimension Jun 8, 2024
@felixcremer
Copy link
Member

It seems as if your dimensions and the values of your array are flipped.
Could you copy paste the output of printing the YAXArray in the REPL here?

Could you provide an example file that I could look at?

@aasdelat
Copy link
Contributor Author

Ok. The file is too large, so I send it by we transfer:
https://we.tl/t-AC7DBQp7vE

Here is the output of my datset, "ds":

julia> ds
YAXArray Dataset
Shared Axes: 
↓ values Sampled{Int64} 1:338880 ForwardOrdered Regular Points
Variables: 
latitude, 
msl
  ↓ Ti Sampled{DateTime} [1989-01-01T00:00:00, …, 1989-01-31T23:00:00] ForwardOrdered Irregular Points
longitude, 
Properties: Dict{String, Any}("edition" => "1", "source" => "/home/antonio/Documentos/datos/ERA5/ERA5_N320_1h_sl_sfc_NH15S_1978_2023_msl/ERA5_N320_1h_sl_sfc_NH15S_1989-01-01_1989-01-31_msl.grib", "centreDescription" => "European Centre for Medium-Range Weather Forecasts", "centre" => "ecmf", "subCentre" => "0", "Conventions" => "CF-1.7")

As a side note, I have noted that it does not print the name of the last variable ("longitude") in a new line, but at the end of the previous line (I have introduced a carriage return in the text after pasting it here). Perhaps it is an issue for DimensionalData.

@felixcremer
Copy link
Member

I had a look at the data and it seems that you have converted the dataset from GRIB to netcdf?
How did you do that and might the problem be related to this conversion?

Have you tried opening the original GRIB data with Rasters and the GRIBDatasets package?
If that works you could then convert the Raster to a YAXArray to continue working with YAXArrays.

@aasdelat
Copy link
Contributor Author

aasdelat commented Jun 10, 2024

Yes, they are converted from Grib. I can open the gribs with GRIBDatasets with no problem. In fact, I converted the grib files to netcdf by opeing the gribs in julia with NCDatasets and saving them with NCDatasets.
The structure of the data is correct, perhaps it is not the most common one, but it is correct, and YAXArrays reads and interprets it correctly. The only thing, is that I need a means for selecting a variable (msl) by the values of another variable (latitude and/or longitude) with which it shares one dimension (values).
I tested Raster, but it does not correctly interpret the data structure.

@aasdelat
Copy link
Contributor Author

aasdelat commented Jun 11, 2024

I have to correct myself: Raters correctly interprets the structure. I tested it a long time ago and it seem not to do it, but now it does!.
So, I will give it a try, and convert the raster to YAXArrays, if necessary.

@aasdelat
Copy link
Contributor Author

aasdelat commented Jun 11, 2024

I have the answer to my question. In order to select the variable msl depending on the values of latitude (which is not a dimension, but another variable that shares the dimension "values" with the variable msl), I have to do (supposing I want to select data at longitudes below 40º):
ds.msl[values=Where(v->ds.longitude[v] < 40)]
But it seems to last forever ... but it finally ends with the error:

julia> ds.msl[values=Where(v->ds.longitude[v] < 40)]
ERROR: ArgumentError: Unable to determine chunksize of non-range views.
Stacktrace:
  [1] eachchunk_view(::DiskArrays.Chunked{…}, vv::SubArray{…})
    @ DiskArrays ~/.julia/packages/DiskArrays/MpOpv/src/subarray.jl:29
  [2] eachchunk
    @ ~/.julia/packages/DiskArrays/MpOpv/src/subarray.jl:25 [inlined]
  [3] YAXArray
    @ ~/.julia/packages/YAXArrays/jjg2q/src/Cubes/Cubes.jl:136 [inlined]
  [4] rebuild(A::YAXArray{…}, data::DiskArrays.SubDiskArray{…}, dims::Tuple{…}, refdims::Tuple{}, name::DimensionalData.NoName, metadata::Dict{…})
    @ YAXArrays.Cubes ~/.julia/packages/YAXArrays/jjg2q/src/Cubes/Cubes.jl:200
  [5] rebuild
    @ ~/.julia/packages/DimensionalData/BoJag/src/array/array.jl:85 [inlined]
  [6] rebuildsliced
    @ ~/.julia/packages/DimensionalData/BoJag/src/array/array.jl:100 [inlined]
  [7] rebuildsliced
    @ ~/.julia/packages/DimensionalData/BoJag/src/array/array.jl:99 [inlined]
  [8] view
    @ ~/.julia/packages/DimensionalData/BoJag/src/array/indexing.jl:125 [inlined]
  [9] _dim_view
    @ ~/.julia/packages/DimensionalData/BoJag/src/array/indexing.jl:110 [inlined]
 [10] #view#110
    @ ~/.julia/packages/DimensionalData/BoJag/src/array/indexing.jl:81 [inlined]
 [11] getindex(::YAXArray{Float64, 2, YAXArrayBase.NetCDFVariable{…}, Tuple{…}, Dict{…}}; kwargs::@Kwargs{values::Where{…}})
    @ YAXArrays.Cubes ~/.julia/packages/YAXArrays/jjg2q/src/Cubes/Cubes.jl:487
 [12] top-level scope
    @ REPL[18]:1
Some type information was truncated. Use `show(err)` to see complete types.

However, if I read the data with Rasters, it returns the result instantly and correctly.

@felixcremer
Copy link
Member

I guess that you opened the data in Rasters with lazy=false which is the default.
And with YAXArrays we hit a very inefficient access pattern which we should fix in YAXArrays. Thanks for the report.
You could use YAXArrays.readcubedata(arr) to first load the whole data into memory to have a faster access pattern.

@aasdelat
Copy link
Contributor Author

aasdelat commented Jun 11, 2024

Wow! It runs now instantaneously. But, due to my particular data structure, I have to read the variables msl and longitude in different variables:

ds_m = readcubedata(ds.msl)
ds_l = readcubedata(ds.longitude)
# And the selection:
ds_m[values=Where(v->ds_l[v] < 40)]

because, if not:

julia> readcubedata(ds)
ERROR: MethodError: no method matching ndims(::Dataset)

Closest candidates are:
  ndims(::Type{Union{}}, Any...)
   @ Base abstractarray.jl:276
  ndims(::Type{<:Ref})
   @ Base refpointer.jl:96
  ndims(::Type{<:Base.Broadcast.Broadcasted{<:Base.Broadcast.AbstractArrayStyle{N}, Nothing}}) where N<:Integer
   @ Base broadcast.jl:278
  ...

Stacktrace:
 [1] dimnames(x::Dataset)
   @ YAXArrayBase ~/.julia/packages/YAXArrayBase/R6Frw/src/axisarrays/axisinterface.jl:46
 [2] caxes(x::Dataset)
   @ YAXArrays.Cubes ~/.julia/packages/YAXArrays/jjg2q/src/Cubes/Cubes.jl:209
 [3] readcubedata(x::Dataset)
   @ YAXArrays.Cubes ~/.julia/packages/YAXArrays/jjg2q/src/Cubes/Cubes.jl:230
 [4] top-level scope
   @ REPL[26]:1

@felixcremer
Copy link
Member

readcubedata was designed to be used on single arrays you have a dataset so it does not fit well but I think we should be able to make it work on Datasets as well.

@aasdelat
Copy link
Contributor Author

aasdelat commented Jun 12, 2024

But, is there a way to load a dataset into memory, other then readcubedata, or only readcubedata can load into memory?

@aasdelat
Copy link
Contributor Author

aasdelat commented Jul 1, 2024

I will summarize the answer here: The selection of one variable based on the values of another variable with which it shares a dimension, can be done via:

# Read data structure
ds = open_dataset(input_file_name)

# The structure of my data is:
julia> ds
YAXArray Dataset
Shared Axes: 
↓ values Sampled{Int64} 1:338880 ForwardOrdered Regular Points
Variables: 
latitude, 
msl
  ↓ Ti Sampled{DateTime} [1989-01-01T00:00:00, …, 1989-01-31T23:00:00] ForwardOrdered Irregular Points
longitude, 
Properties: Dict{String, Any}("edition" => "1", "source" => "/home/antonio/Documentos/datos/ERA5/ERA5_N320_1h_sl_sfc_NH15S_1978_2023_msl/ERA5_N320_1h_sl_sfc_NH15S_1989-01-01_1989-01-31_msl.grib", "centreDescription" => "European Centre for Medium-Range Weather Forecasts", "centre" => "ecmf", "subCentre" => "0", "Conventions" => "CF-1.7")

# Read data into memory
ds_m = readcubedata(ds.msl)
ds_l = readcubedata(ds.longitude)

# And the selection:
ds_m[values=Where(v->ds_l[v] < 40)]

Where msl is the main variable in which values I am interested, and longitude is the variable with which I have to make the selection.
It is necessary to read both variables into memory by means of readcubedata. If they are loaded just by open_dataset, you will get an error and this will not work.
On the other hand, Cube cannot be used in place of open_dataset in this case, because of the structure of my data (see the difference between a dataset and a cube).

@aasdelat aasdelat closed this as completed Jul 1, 2024
@felixcremer
Copy link
Member

Thanks for the write up. Do you want to open a pull request to add this info to the docs? This would fit well into the how to section.

@aasdelat
Copy link
Contributor Author

Ok, but I am learning github right now. So I am going to take a while...

@felixcremer
Copy link
Member

That is no problem, feel free to ask questions if you feel stuck in the process of opening a PR.

@felixcremer felixcremer reopened this Jul 11, 2024
@felixcremer felixcremer added the documentation Improvements or additions to documentation label Jul 11, 2024
@aasdelat
Copy link
Contributor Author

Ok, I am doing it.

@aasdelat
Copy link
Contributor Author

Pull request done!
I have proposed a large rearrange of the help, so I think it has to be studied.

@aasdelat
Copy link
Contributor Author

Hi, @felixcremer , I have opened a pull request. Could you tell me if I have to do anything else or If I have made the pull request correctly?

@lazarusA
Copy link
Collaborator

what PR was it? should we close this?

@aasdelat
Copy link
Contributor Author

The pull request is #414 and it has been already accepted and closed. So, yes, this issue can be closed too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

3 participants