Gen cesm catalog #5

mnlevy1981 · 2020-06-04T23:42:02Z

Added a script that generates an intake catalog for time series data generated by CESM if pyReshaper was run in post-processing.

$ ./gen_CESM_catalog.py -c /glade/p/cgd/oce/people/mlevy/cases/b.e22b05.B1850.f09_g17.timeseries_output_for_intake/
INFO (_gen_timeseries_catalog): Will catalog files in /glade/p/cgd/oce/people/mlevy/archive/b.e22b05.B1850.f09_g17.timeseries_output_for_intake
INFO (_gen_timeseries_catalog): Creating /glade/p/cgd/oce/people/mlevy/archive/b.e22b05.B1850.f09_g17.timeseries_output_for_intake/intake/cesm_catalog.csv.gz...

$ zcat /glade/p/cgd/oce/people/mlevy/archive/b.e22b05.B1850.f09_g17.timeseries_output_for_intake/intake/cesm_catalog.csv.gz | head -n 4
case,component,stream,variable,start_date,end_date,path,parent_branch_year,child_branch_year,parent_case
b.e22b05.B1850.f09_g17.timeseries_output_for_intake,atm,cam.h0,TAUBLJY,000101,000112,../atm/proc/tseries/month_1/b.e22b05.B1850.f09_g17.timeseries_output_for_intake.cam.h0.TAUBLJY.000101-000112.nc,-1,-1,-
b.e22b05.B1850.f09_g17.timeseries_output_for_intake,atm,cam.h0,num_c2SFWET,000101,000112,../atm/proc/tseries/month_1/b.e22b05.B1850.f09_g17.timeseries_output_for_intake.cam.h0.num_c2SFWET.000101-000112.nc,-1,-1,-
b.e22b05.B1850.f09_g17.timeseries_output_for_intake,atm,cam.h0,dst_c3,000101,000112,../atm/proc/tseries/month_1/b.e22b05.B1850.f09_g17.timeseries_output_for_intake.cam.h0.dst_c3.000101-000112.nc,-1,-1,-

Note that this file assumes intake-esm can handle a relative path from the csv.gz file to the netCDF data.
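If intake-esm turns out not to resolve relative paths, a consumer of the catalog could resolve them itself. The following is a minimal sketch using only the standard library; `read_catalog` is a hypothetical helper, not part of this PR or of intake-esm, and it assumes the `path` column is relative to the directory that holds the csv.gz file.

```python
import csv
import gzip
import os


def read_catalog(catalog_path):
    """Yield catalog rows with 'path' resolved against the catalog's directory."""
    catalog_dir = os.path.dirname(os.path.abspath(catalog_path))
    with gzip.open(catalog_path, mode="rt", newline="") as f:
        for row in csv.DictReader(f):
            # Leave absolute paths alone; anchor relative ones at the catalog
            if not os.path.isabs(row["path"]):
                row["path"] = os.path.normpath(
                    os.path.join(catalog_dir, row["path"])
                )
            yield row
```

With the catalog shown above, `../atm/proc/tseries/month_1/...` would resolve to a directory that is a sibling of `intake/`.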

I can think of several improvements this script needs, some of which might belong in this PR and others that might spawn new issue tickets.

Error handling if a file name doesn't fit the {casename}.{stream}.{variable}.{start_date}-{end_date}.nc template (e.g. if history files and time series are collocated, which is the default behavior of CESM postprocessing)
Backup plan for determining location of time series if pp_config is not available
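For the first item, one possible shape for the file name check is sketched below. `parse_tseries_filename` is a hypothetical name, and the date test (both dates all digits and of equal width) is an assumption about how time series files are named rather than something this PR establishes.

```python
import os


def parse_tseries_filename(path, casename):
    """Split a file name into (stream, variable, start_date, end_date).

    Returns None if the name does not fit
    {casename}.{stream}.{variable}.{start_date}-{end_date}.nc
    """
    name = os.path.basename(path)
    prefix = casename + "."
    if not (name.startswith(prefix) and name.endswith(".nc")):
        return None
    remainder = name[len(prefix):-len(".nc")]
    # The stream may contain dots (e.g. cam.h0), so split from the right
    parts = remainder.rsplit(".", 2)
    if len(parts) != 3:
        return None
    stream, variable, date_range = parts
    dates = date_range.split("-")
    # Assumption: a time series has exactly two all-digit dates of equal width
    if len(dates) != 2 or not all(d.isdigit() for d in dates):
        return None
    if len(dates[0]) != len(dates[1]):
        return None
    return stream, variable, dates[0], dates[1]
```

Under that assumption, a monthly history file such as `{casename}.cam.h0.0001-01.nc` is rejected because `0001` and `01` differ in width, so the caller can skip it or log a warning instead of adding a bad row.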

I don't think

 catalog['parent_branch_year'] = entry_cnt*[run_config['RUN_REFDATE']]
 catalog['child_branch_year'] = entry_cnt*[run_config['RUN_STARTDATE']]

will work for determining the branch point if a run is based off a reference case, as these variables aren't in run_config yet

And I'm sure additional issues will come up, but I wanted to open this PR to advertise that this script is working in tightly controlled instances.

Closes #2

First pass just prints to screen data needed from xml files

andersy005 · 2020-06-05T01:05:04Z

cesmcatalog/gen_CESM_catalog.py

+  for var in ['GET_REFCASE', 'RUN_REFCASE']:
+    run_config[var] = subprocess.check_output('./xmlquery --value {}'.format(var), shell=True)
+  DOUT_S = subprocess.check_output('./xmlquery --value DOUT_S', shell=True)
+  if DOUT_S == 'TRUE':
+    DOUT_S_ROOT = subprocess.check_output('./xmlquery --value DOUT_S_ROOT', shell=True)


To be safe, we may need some error handling here by wrapping these lines in a try block. What do you think?

Yeah, error handling would be great. I'll add it to the list :)

Now that we're relying on the CIME.Case object instead of running these via subprocess, the failure mode is returning None rather than the expected variable value; that's worth checking for, but it'll be an if statement instead of a try / except block.
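A rough sketch of that if-statement check follows. `get_case_values` is a hypothetical helper; it only assumes that the object passed in has a `get_value(name)` method that returns None when the lookup fails, as described above.

```python
import logging

logger = logging.getLogger(__name__)


def get_case_values(case, names):
    """Return {name: value} from case.get_value(), or None if any lookup failed."""
    values = {}
    missing = []
    for name in names:
        value = case.get_value(name)
        if value is None:
            missing.append(name)
        else:
            values[name] = value
    if missing:
        # Report every missing variable at once rather than stopping at the first
        logger.error("Could not determine: %s", ", ".join(missing))
        return None
    return values
```

The caller can then exit with a non-zero status when the result is None, which keeps the failure handling in one place.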

cesmcatalog/gen_CESM_catalog.py

I didn't like having {filename}... because it looked like we were creating a .csv.gz... file

Also, first time running with the pre-commit hooks picked up some formatting changes (unclear why this didn't happen in the CI framework)

Only look for {CASE}*.nc instead of *.nc
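As an illustration of the narrower search, a sketch along these lines would do it. `find_case_files` is a hypothetical name, and escaping the case name is a precaution added here, not something the commit is known to do.

```python
import glob
import os


def find_case_files(directory, case):
    """Return sorted netCDF files in `directory` whose names start with `case`."""
    # glob.escape guards against glob metacharacters in the directory or case name
    pattern = os.path.join(glob.escape(directory), glob.escape(case) + "*.nc")
    return sorted(glob.glob(pattern))
```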

This entails opening netCDF files to get the long_name attribute. The current implementation opens one file per variable name per component; if a variable is spread across multiple files, the long_name is assumed not to change. Also created local copies of some of the other catalog data to make it easier to reference between columns (e.g. storing path locally so I don't need to access catalog['path'][-1] to get the most recent path value)
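The one-file-per-variable-per-component behavior can be sketched as a small cache. Everything here is illustrative: `collect_long_names` and the `read_long_name` callable are hypothetical, and the callable stands in for whatever netCDF reader the script actually uses.

```python
def collect_long_names(entries, read_long_name):
    """Map (component, variable) -> long_name, reading one file per pair.

    entries: iterable of (component, variable, path) tuples
    read_long_name: callable(path, variable) -> str that opens the netCDF file
    """
    long_names = {}
    for component, variable, path in entries:
        key = (component, variable)
        if key not in long_names:
            # Only the first file for each pair is opened; later files are
            # assumed to carry the same long_name
            long_names[key] = read_long_name(path, variable)
    return long_names
```

Passing the reader in as an argument keeps the caching logic testable without any netCDF files on disk.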

Right now, the script itself does not use the debug message level but running with -d will add some environment information to the output.

andersy005 · 2020-06-09T16:10:19Z

cesmcatalog/gen_CESM_catalog.py

+    try:
+        os.chdir(case_root)
+    except:
+        # TODO: set up logger instead of print statements
+        logger.error('{} does not exist'.format(case_root))
+        sys.exit(1)


We may want to replace this with a contextmanager approach:

    import os
    from contextlib import contextmanager

    @contextmanager
    def chdir(path):
        """
        Change working directory to `path` and restore it again

        This context manager is useful if `path` stops existing during your operations.
        """
        old_dir = os.getcwd()
        os.chdir(path)
        try:
            yield
        finally:
            os.chdir(old_dir)

You can then use it as follows:

    with chdir(case_root):
        ...  # DO SOME WORK

Nice, I'll look into this tomorrow (comment is "outdated" but hasn't been resolved yet)
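One way the suggested context manager could be combined with the existing "does not exist" error message is sketched below. `list_case_root` is a hypothetical stand-in for the real work done inside the case directory.

```python
import logging
import os
from contextlib import contextmanager

logger = logging.getLogger(__name__)


@contextmanager
def chdir(path):
    """Change working directory to `path` and restore the old one on exit."""
    old_dir = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(old_dir)


def list_case_root(case_root):
    """Return the entries in case_root, or None if it cannot be entered."""
    try:
        with chdir(case_root):
            return sorted(os.listdir("."))
    except (FileNotFoundError, NotADirectoryError):
        # Note: this also catches the same errors raised inside the with block
        logger.error("%s does not exist or is not a directory", case_root)
        return None
```

Because `os.chdir(path)` runs before the `yield`, a missing directory raises before the working directory changes, so there is nothing to restore in that case. Python 3.11 and later also provide `contextlib.chdir` in the standard library.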

cesmcatalog/gen_CESM_catalog.py

Still use xmlquery to get CIMEROOT, which is needed to import Case.

If we already know cimeroot, no need for xmlquery at all -- this will be useful once this script is part of the post-processing suite, but for now we fall back to using xmlquery to set up the path to the CIME python libraries.
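The fallback described here might look roughly like the following. This is a sketch under assumptions: the long option name `--caseroot` and the function names are hypothetical (only `-c`, `-d`/`--debug`, and `--cimeroot` appear in this PR), and the error handling is one possible choice.

```python
import argparse
import subprocess


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Generate an intake catalog for a CESM case"
    )
    # The long name --caseroot is an assumption; the PR only shows -c
    parser.add_argument("-c", "--caseroot", required=True)
    parser.add_argument("--cimeroot", default=None)
    parser.add_argument("-d", "--debug", action="store_true")
    return parser.parse_args(argv)


def find_cimeroot(args):
    """Return CIMEROOT from the command line, else from xmlquery, else None."""
    if args.cimeroot is not None:
        return args.cimeroot
    try:
        output = subprocess.check_output(
            ["./xmlquery", "--value", "CIMEROOT"], cwd=args.caseroot
        )
    except (OSError, subprocess.CalledProcessError):
        # xmlquery is missing, not executable, or exited with an error
        return None
    # check_output returns bytes in Python 3, so decode before use
    return output.decode().strip() or None
```

Passing the command as a list with `cwd=` avoids both `shell=True` and a separate `os.chdir` just to run xmlquery.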

jedwards4b

Looks good, thanks.

mnlevy1981 added 3 commits June 4, 2020 13:34

Script that queries CASEROOT for some info

cee6371

Update how data is stored in memory

55338dd

Generate catalog from time series!

22d2a6d

andersy005 reviewed Jun 5, 2020

mnlevy1981 added 8 commits June 4, 2020 22:23

Should pass black checks now

426d99c

Should pass flake8 checks now

77573c7

Should pass isort checks now

c4c17f4

Updated wording on generation log message

ea36019

Added RUN_REFDATE and RUN_STARTDATE to run_config

239c182

Get CASE from xml file

357abae

Add --debug option

05a88fb

andersy005 reviewed Jun 9, 2020

jedwards4b suggested changes Jun 9, 2020

mnlevy1981 added 3 commits June 9, 2020 16:37

Get CIME variables from Case object

e091854

Drop "import *" and make case read_only

a5ce885

Add optional --cimeroot argument

27f6860

jedwards4b approved these changes Jun 9, 2020

Base automatically changed from master to main February 3, 2021 17:47
