Collection with monthly CESM output files (history files) #55

Open
AJueling opened this issue Jan 29, 2020 · 6 comments

@AJueling

We have many different CESM simulations and I would like to create an intake-esm collection for them. The output files are monthly-mean NetCDF files, each containing many variables.
I have created a collection.json file:

{
    "esmcat_version": "0.1.0",
    "id": "CESM_simulations",
    "description": "This is an ESM collection for CESM1 simulations.",
    "catalog_file": "simulations.csv",
    "attributes": [
      { "column_name": "component",  "vocabulary": ""},
      { "column_name": "frequency",  "vocabulary": ""},
      { "column_name": "experiment", "vocabulary": ""},
      { "column_name": "variable",   "vocabulary": ""}
    ],
    "assets": {
      "column_name": "path",
      "format": "netcdf"
    }
}

and with a simulations.csv:

component,frequency,experiment,path
ocn,monthly,CTRL,simulation1.pop.h.0001-01.nc
ocn,monthly,CTRL,simulation1.pop.h.0001-02.nc
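
Note that the attributes list in collection.json declares a variable column, while the CSV header above has no such column. A minimal sketch of a consistency check for this kind of mismatch follows. It uses only the Python standard library; missing_columns is a hypothetical helper written for illustration, not part of intake-esm.

```python
import csv
import io
import json


def missing_columns(collection, csv_text):
    """Return column names referenced by the collection but absent from the CSV header.

    Hypothetical helper for illustration; it only compares names and does not
    validate the collection against the ESM collection specification.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    referenced = [attr["column_name"] for attr in collection.get("attributes", [])]
    referenced.append(collection["assets"]["column_name"])
    return [name for name in referenced if name not in header]


# The relevant parts of the collection.json and simulations.csv shown above.
collection = json.loads("""
{
  "attributes": [
    {"column_name": "component", "vocabulary": ""},
    {"column_name": "frequency", "vocabulary": ""},
    {"column_name": "experiment", "vocabulary": ""},
    {"column_name": "variable", "vocabulary": ""}
  ],
  "assets": {"column_name": "path", "format": "netcdf"}
}
""")
csv_text = (
    "component,frequency,experiment,path\n"
    "ocn,monthly,CTRL,simulation1.pop.h.0001-01.nc\n"
    "ocn,monthly,CTRL,simulation1.pop.h.0001-02.nc\n"
)

print(missing_columns(collection, csv_text))  # ['variable']
```

An empty list would mean every declared column is present in the CSV header.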

I can create a catalogue with cat = intake.open_esm_datastore('collection.json').search(experiment=['CTRL']), which results in

CESM_simulations-ESM Collection with 2 entries:
	> 1 component(s)
	> 1 frequency(s)
	> 1 experiment(s)
	> 2 path(s)

but when I create a dataset with dset_dict = cat.to_dataset_dict(cdf_kwargs={'decode_times': False}), the result contains only a single time step (time: 1) instead of both months:

resulting xarray dataset

calling dset_dict['ocn.monthly.CTRL'] yields

<xarray.Dataset>
Dimensions:             (bnds: 2, d2: 2, nlat: 2400, nlon: 3600, time: 1, z_t: 42, z_t_150m: 12, z_w: 42, z_w_bot: 42, z_w_top: 42)
Coordinates:
  * time                (time) float64 7.302e+04
  * z_t                 (z_t) float32 500.622 1506.873 ... 562499.9 587499.9
  * z_t_150m            (z_t_150m) float32 500.622 1506.873 ... 14895.824
  * z_w                 (z_w) float32 0.0 1001.244 ... 549999.9 574999.9
  * z_w_top             (z_w_top) float32 0.0 1001.244 ... 549999.9 574999.9
  * z_w_bot             (z_w_bot) float32 1001.244 2012.502 ... 599999.9
    ULONG               (nlat, nlon) float64 ...
    ULAT                (nlat, nlon) float64 ...
    TLONG               (nlat, nlon) float64 ...
    TLAT                (nlat, nlon) float64 ...
Dimensions without coordinates: bnds, d2, nlat, nlon
Data variables:
    time_bound          (time, d2) float64 ...
    dz                  (z_t) float32 ...
    dzw                 (z_w) float32 ...
    KMT                 (nlat, nlon) float64 ...
    KMU                 (nlat, nlon) float64 ...
    REGION_MASK         (nlat, nlon) float64 ...
    UAREA               (nlat, nlon) float64 ...
    TAREA               (nlat, nlon) float64 ...
    HU                  (nlat, nlon) float64 ...
    HT                  (nlat, nlon) float64 ...
    DXU                 (nlat, nlon) float64 ...
    DYU                 (nlat, nlon) float64 ...
    DXT                 (nlat, nlon) float64 ...
    DYT                 (nlat, nlon) float64 ...
    HTN                 (nlat, nlon) float64 ...
    HTE                 (nlat, nlon) float64 ...
    HUS                 (nlat, nlon) float64 ...
    HUW                 (nlat, nlon) float64 ...
    ANGLE               (nlat, nlon) float64 ...
    ANGLET              (nlat, nlon) float64 ...
    days_in_norm_year   float64 ...
    grav                float64 ...
    omega               float64 ...
    radius              float64 ...
    cp_sw               float64 ...
    sound               float64 ...
    vonkar              float64 ...
    cp_air              float64 ...
    rho_air             float64 ...
    rho_sw              float64 ...
    rho_fw              float64 ...
    stefan_boltzmann    float64 ...
    latent_heat_vapor   float64 ...
    latent_heat_fusion  float64 ...
    ocn_ref_salinity    float64 ...
    sea_ice_salinity    float64 ...
    T0_Kelvin           float64 ...
    salt_to_ppt         float64 ...
    ppt_to_salt         float64 ...
    mass_to_Sv          float64 ...
    heat_to_PW          float64 ...
    salt_to_Svppt       float64 ...
    salt_to_mmday       float64 ...
    momentum_factor     float64 ...
    hflux_factor        float64 ...
    fwflux_factor       float64 ...
    salinity_factor     float64 ...
    sflux_factor        float64 ...
    nsurface_t          float64 ...
    nsurface_u          float64 ...
    KE                  (time, z_t, nlat, nlon) float32 ...
    TEMP                (time, z_t, nlat, nlon) float32 ...
    SALT                (time, z_t, nlat, nlon) float32 ...
    SSH2                (time, nlat, nlon) float32 ...
    SHF                 (time, nlat, nlon) float32 ...
    SFWF                (time, nlat, nlon) float32 ...
    EVAP_F              (time, nlat, nlon) float32 ...
    PREC_F              (time, nlat, nlon) float32 ...
    SNOW_F              (time, nlat, nlon) float32 ...
    MELT_F              (time, nlat, nlon) float32 ...
    ROFF_F              (time, nlat, nlon) float32 ...
    SALT_F              (time, nlat, nlon) float32 ...
    SENH_F              (time, nlat, nlon) float32 ...
    LWUP_F              (time, nlat, nlon) float32 ...
    LWDN_F              (time, nlat, nlon) float32 ...
    MELTH_F             (time, nlat, nlon) float32 ...
    IAGE                (time, z_t, nlat, nlon) float32 ...
    WVEL                (time, z_w_top, nlat, nlon) float32 ...
    UET                 (time, z_t, nlat, nlon) float32 ...
    VNT                 (time, z_t, nlat, nlon) float32 ...
    UES                 (time, z_t, nlat, nlon) float32 ...
    VNS                 (time, z_t, nlat, nlon) float32 ...
    PD                  (time, z_t, nlat, nlon) float32 ...
    HMXL                (time, nlat, nlon) float32 ...
    XMXL                (time, nlat, nlon) float32 ...
    TMXL                (time, nlat, nlon) float32 ...
    HBLT                (time, nlat, nlon) float32 ...
    XBLT                (time, nlat, nlon) float32 ...
    TBLT                (time, nlat, nlon) float32 ...
    SSH                 (time, nlat, nlon) float64 ...
    time_bnds           (time, bnds) float64 ...
    TAUX                (time, nlat, nlon) float64 ...
    TAUY                (time, nlat, nlon) float64 ...
    UVEL                (time, z_t, nlat, nlon) float64 ...
    VVEL                (time, z_t, nlat, nlon) float64 ...
Attributes:
    title:                      spinup_pd_maxcores_f05_t12
    history:                    Thu Sep 14 23:06:30 2017: ncks -A /projects/0...
    Conventions:                CF-1.0; http://www.cgd.ucar.edu/cms/eaton/net...
    contents:                   Diagnostic and Prognostic Variables
    source:                     CCSM POP2, the CCSM Ocean Component
    revision:                   $Id: tavg.F90 34115 2012-01-25 22:35:19Z njn01 $
    calendar:                   All years have exactly  365 days.
    start_time:                 This dataset was created on 2017-04-15 at 12:...
    cell_methods:               cell_methods = time: mean ==> the variable va...
    nsteps_total:               25052952
    tavg_sum:                   86399.99999999974
    CDI:                        Climate Data Interface version 1.7.0 (http://...
    CDO:                        Climate Data Operators version 1.7.0 (http://...
    NCO:                        "4.6.0"
    history_of_appended_files:  Thu Sep 14 23:06:30 2017: Appended file /proj...
    intake_esm_varname:         None

How do I concatenate along the time axis?

@andersy005
Contributor

@AJueling, do you mind if I transfer this issue to the https://github.com/NCAR/intake-esm-datastore repo instead? I plan to comment once it's there.

@AJueling
Author

Thanks for the quick reply! I don't mind if you move it, of course. (I was not sure where to ask this in the first place.)

andersy005 transferred this issue from NCAR/esm-collection-spec on Jan 29, 2020
@andersy005
Contributor

@AJueling,

Are you working with time slices (history files, i.e. one time step per file with many data variables) or time series (multiple time steps per file with a single data variable)?

As @matt-long pointed out in intake/intake-esm#112

There is a widespread assumption in intake-esm that there is one variable per file. This precludes using the package with multi-variable files, such as those written directly by CESM.

Unfortunately, this issue of multi-variable files is still unresolved :(

How do I concatenate along the time axis?

If you were working with time-series (single data variable per file), the following would address the issue:

  • Add a time_range column to the CSV that specifies the date range covered by each file.

  • Add an aggregation_control section to your collection.json:

{
  "esmcat_version": "0.1.0",
  "id": "CESM_simulations",
  "description": "This is an ESM collection for CESM1 simulations.",
  "catalog_file": "simulations.csv",
  "attributes": [
    {
      "column_name": "component",
      "vocabulary": ""
    },
    {
      "column_name": "frequency",
      "vocabulary": ""
    },
    {
      "column_name": "experiment",
      "vocabulary": ""
    },
    {
      "column_name": "variable",
      "vocabulary": ""
    },
    {
      "column_name": "time_range",
      "vocabulary": ""
    }
  ],
  "assets": {
    "column_name": "path",
    "format": "netcdf"
  },
  "aggregation_control": {
    "variable_column_name": "variable",
    "groupby_attrs": [
      "component",
      "experiment",
      "frequency"
    ],
    "aggregations": [
      {
        "type": "union",
        "attribute_name": "variable"
      },
      {
        "type": "join_existing",
        "attribute_name": "time_range",
        "options": {
          "dim": "time",
          "coords": "minimal",
          "compat": "override"
        }
      }
    ]
  }
}

For reference, take a look at the collection for CESM2 runs (timeseries): https://github.com/NCAR/intake-esm-datastore/blob/master/catalogs/campaign-cesm2-cmip6-timeseries.json.
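
The first bullet above (adding a time_range column) could be sketched roughly as follows. This is an illustration under stated assumptions, not the way the NCAR catalogs are built: the file name pattern <case>.<model>.<stream>.<YYYY-MM>.nc is inferred from the two example paths in this thread, and the pop-to-ocn mapping, catalog_rows and to_csv are hypothetical names. A time-series catalog would also need a variable column, which this sketch does not derive, and a time_range column alone does not make multi-variable history files work.

```python
import csv
import io
import re

# Assumed file name layout, inferred from e.g. "simulation1.pop.h.0001-01.nc".
HISTORY_NAME = re.compile(
    r"^(?P<case>.+)\.(?P<model>[a-z]+)\.(?P<stream>h\d*)\.(?P<date>\d{4}-\d{2})\.nc$"
)

# Assumed mapping from model name to component; extend as needed.
MODEL_TO_COMPONENT = {"pop": "ocn"}

FIELDS = ["component", "frequency", "experiment", "time_range", "path"]


def catalog_rows(filenames, experiment, frequency="monthly"):
    """Build one catalog row per file whose name matches the assumed pattern."""
    rows = []
    for name in sorted(filenames):
        match = HISTORY_NAME.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed naming scheme
        rows.append(
            {
                "component": MODEL_TO_COMPONENT.get(match["model"], match["model"]),
                "frequency": frequency,
                "experiment": experiment,
                "time_range": match["date"],
                "path": name,
            }
        )
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = catalog_rows(
    ["simulation1.pop.h.0001-02.nc", "simulation1.pop.h.0001-01.nc"],
    experiment="CTRL",
)
print(to_csv(rows))
```

Writing the returned text to simulations.csv would give a catalog file whose time_range column matches the join_existing aggregation in the JSON above.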

@AJueling
Author

@andersy005 thank you for the reply. I am indeed working with time-slice files that contain many variables, which is the standard output format of CESM as far as I know. It's good to know that intake-esm does not support my use case yet, so I will use a different approach. I suppose we can close this for now, and I will follow @matt-long's issue for any updates.
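
The thread does not say which different approach was used. One possibility, sketched here as an assumption, is to bypass the catalog and let xarray concatenate the history files along time with xarray.open_mfdataset. The glob pattern and the helper names history_files and open_history are hypothetical; the keyword arguments mirror the options in the join_existing aggregation above.

```python
from pathlib import Path


def history_files(directory, pattern="*.pop.h.*.nc"):
    """Return matching history files sorted by name.

    With zero-padded YYYY-MM stamps in the file names (an assumption based on
    the paths in this thread), name order is also chronological order.
    """
    return sorted(str(path) for path in Path(directory).glob(pattern))


def open_history(paths):
    """Open multi-variable history files as one dataset concatenated along time."""
    # Imported here so that the file-listing helper works without xarray installed.
    import xarray as xr

    return xr.open_mfdataset(
        paths,
        combine="nested",      # concatenate in the order the paths are given
        concat_dim="time",
        data_vars="minimal",   # only concatenate variables that have a time dimension
        coords="minimal",
        compat="override",     # take time-independent variables from the first file
        decode_times=False,    # same setting as the cdf_kwargs used in the question
    )


# Example usage (the directory name is a placeholder):
# ds = open_history(history_files("/path/to/simulation1"))
```

With data_vars="minimal" and compat="override", grid fields and constants such as TAREA or grav are read once rather than stacked for every month.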

andersy005 changed the title from "Collection with monthly CESM output files" to "Collection with monthly CESM output files (history files)" on Jan 30, 2020
@andersy005
Contributor

This issue is likely to be of interest to other users, so let's leave it open as a reference until multi-variable files are supported.

@andersy005
Contributor

@AJueling, just wanted to let you know that we've been working on functionality for building and using catalogs for CESM runs. Recently, @mgrover1 put together a great blog post with details on how to build a catalog from CESM history files: https://ncar.github.io/esds/posts/ecgtools-history-files-example/

@andersy005 andersy005 added this to Xdev Oct 22, 2021