Gen cesm catalog #5
base: main
First pass just prints to screen data needed from xml files
cesmcatalog/gen_CESM_catalog.py
Outdated
for var in ['GET_REFCASE', 'RUN_REFCASE']:
    run_config[var] = subprocess.check_output('./xmlquery --value {}'.format(var), shell=True)
DOUT_S = subprocess.check_output('./xmlquery --value DOUT_S', shell=True)
if DOUT_S == 'TRUE':
    DOUT_S_ROOT = subprocess.check_output('./xmlquery --value DOUT_S_ROOT', shell=True)
To be safe, we may need some error handling here by wrapping these lines in a try block. What do you think?
Yeah, error handling would be great. I'll add it to the list :)
Now that we're relying on the CIME.Case object instead of running these via subprocess, the failure mode is returning None rather than the expected variable value; that's worth checking for, but it'll be an if statement instead of a try / except block.
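A minimal sketch of the None check described in this comment. `FakeCase` is a hypothetical stand-in so the example runs without CIME installed; it only mimics a `get_value` lookup that returns None for unknown variables. The helper name `get_required_values` is also made up for illustration.

```python
class FakeCase:
    """Hypothetical stand-in for a CIME case object (assumption, not CIME code)."""

    def __init__(self, values):
        self._values = values

    def get_value(self, name):
        # Mirrors the failure mode described above: None instead of an exception
        return self._values.get(name)


def get_required_values(case, names):
    """Return (found, missing): values that were set, and names that came back None."""
    found = {}
    missing = []
    for name in names:
        value = case.get_value(name)
        if value is None:
            # An if statement is enough here; nothing is raised on a bad lookup
            missing.append(name)
        else:
            found[name] = value
    return found, missing
```

The caller can then decide whether a non-empty `missing` list is fatal or just worth a warning.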
I didn't like having {filename}... because it looked like we were creating a .csv.gz... file
Also, first time running with the pre-commit hooks picked up some formatting changes (unclear why this didn't happen in the CI framework)
Only look for {CASE}*.nc instead of *.nc
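One way the narrower `{CASE}*.nc` search could look, as a sketch only. The function name `find_case_files` and its arguments are assumptions for illustration and are not taken from the script.

```python
import glob
import os


def find_case_files(directory, casename):
    """List netCDF files in `directory` whose names start with `casename`."""
    # glob.escape keeps characters such as '[' in a case name from being
    # treated as wildcard syntax
    pattern = os.path.join(glob.escape(directory), glob.escape(casename) + '*.nc')
    return sorted(glob.glob(pattern))
```

Files from other cases in the same directory, and non-netCDF files such as logs, are skipped.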
This entails opening netCDF files to get the long_name attribute; the current implementation opens one file per variable name per component, but if a variable is spread across multiple files it is assumed that the long_name does not change. Also created local copies of some of the other data in the catalog to make it easier to reference between columns (e.g. storing path locally so I don't need to access catalog['path'][-1] to get the most recent path value)
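The "one file per variable name per component" lookup can be sketched as a small cache keyed on (component, variable). This is an illustration under assumptions, not the script's code: the function names are hypothetical, and the netCDF4 reader is one possible choice since the commit message does not say which library opens the files.

```python
def collect_long_names(entries, read_long_name):
    """Map (component, variable) to long_name, opening one file per key.

    `entries` is an iterable of (component, variable, path) tuples.
    `read_long_name(path, variable)` does the actual file access.
    """
    cache = {}
    for component, variable, path in entries:
        key = (component, variable)
        if key not in cache:
            # Later files for the same key are skipped on the assumption
            # that long_name does not change between files
            cache[key] = read_long_name(path, variable)
    return cache


def read_long_name_netcdf4(path, variable):
    """Possible reader, assuming the netCDF4 package is installed."""
    from netCDF4 import Dataset  # imported lazily so the cache logic has no hard dependency

    with Dataset(path) as ds:
        return getattr(ds.variables[variable], 'long_name', None)
```

Passing the reader in as an argument keeps the caching logic testable without any netCDF files on disk.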
Right now, the script itself does not use the debug message level but running with -d will add some environment information to the output.
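A rough sketch of how a `-d` flag might switch on debug-level output. Only `-d` comes from the commit message; the `--debug` long option, the logger name, and the choice of Python version and platform as the "environment information" are all assumptions.

```python
import argparse
import logging
import platform


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description='Generate a catalog (illustrative)')
    # '--debug' is an assumed long form; only '-d' is mentioned in the PR
    parser.add_argument('-d', '--debug', action='store_true',
                        help='add environment information to the output')
    return parser.parse_args(argv)


def configure_logging(debug):
    level = logging.DEBUG if debug else logging.INFO
    logging.basicConfig(level=level, format='%(levelname)s: %(message)s')
    logger = logging.getLogger('gen_CESM_catalog')  # assumed logger name
    logger.setLevel(level)
    # Only emitted when running with -d
    logger.debug('python %s on %s', platform.python_version(), platform.platform())
    return logger
```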
cesmcatalog/gen_CESM_catalog.py
Outdated
try:
    os.chdir(case_root)
except:
    # TODO: set up logger instead of print statements
    logger.error('{} does not exist'.format(case_root))
    sys.exit(1)
We may want to replace this with a contextmanager approach:
import os
from contextlib import contextmanager

@contextmanager
def chdir(path):
    """
    Change working directory to `path` and restore it again

    This context manager is useful if `path` stops existing during your
    operations.
    """
    old_dir = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(old_dir)

You can then use it as follows:

with chdir(case_root):
    ...  # DO SOME WORK
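A sketch of how the suggested context manager could be combined with the missing-directory check from the diff above. The wrapper `run_in_case_root` and the logger name are hypothetical; the directory is checked up front so that errors raised by the work itself are not mistaken for a missing case root.

```python
import logging
import os
import sys
from contextlib import contextmanager

logger = logging.getLogger('gen_CESM_catalog')  # assumed logger name


@contextmanager
def chdir(path):
    """Change working directory to `path` and restore it afterwards."""
    old_dir = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(old_dir)


def run_in_case_root(case_root, work):
    """Call `work()` from inside `case_root`, exiting if the directory is missing."""
    if not os.path.isdir(case_root):
        logger.error('%s does not exist', case_root)
        sys.exit(1)
    with chdir(case_root):
        return work()
```

On Python 3.11 and later, `contextlib.chdir` in the standard library provides the same change-and-restore behavior.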
Nice, I'll look into this tomorrow (comment is "outdated" but hasn't been resolved yet)
Still use xmlquery to get CIMEROOT, which is needed to import Case.
If we already know cimeroot, no need for xmlquery at all -- this will be useful once this script is part of the post-processing suite, but for now we fall back to using xmlquery to set up the path to the CIME python libraries.
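The fallback described here might look roughly like the following. The function name is hypothetical, and `scripts/lib` as the location of the CIME Python libraries under CIMEROOT is an assumption that depends on the CIME version.

```python
import os
import subprocess
import sys


def add_cime_to_path(cimeroot=None, case_root='.'):
    """Put the CIME Python libraries on sys.path and return the directory used."""
    if cimeroot is None:
        # Fallback: ask the case directory's xmlquery for CIMEROOT
        case_root = os.path.abspath(case_root)
        cimeroot = subprocess.check_output(
            [os.path.join(case_root, 'xmlquery'), '--value', 'CIMEROOT'],
            cwd=case_root,
        ).decode().strip()
    # ASSUMPTION: library location relative to CIMEROOT; adjust for your CIME version
    lib_dir = os.path.join(cimeroot, 'scripts', 'lib')
    if lib_dir not in sys.path:
        sys.path.insert(0, lib_dir)
    return lib_dir
```

When `cimeroot` is passed in, no subprocess is started at all, which matches the post-processing use case mentioned above.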
Looks good, thanks.
Added a script that generates an intake catalog for time series data generated by CESM if pyReshaper was run in post-processing.

Note that this file assumes intake-esm can handle a relative path from the csv.gz file to the netCDF data.

I can think of several improvements this script needs, some of which might belong in this PR and others that might spawn new issue tickets.

- Error handling if file name doesn't fit the {casename}.{stream}.{variable}.{start_date}-{end_date}.nc template (e.g. if history files and time series are collocated, which is default behavior of CESM postprocessing)
- Backup plan for determining location of time series if pp_config is not available
- I don't think will work for determining branch point if a run is based off a reference case, as they aren't in run_config yet

And I'm sure additional issues will come up, but I wanted to open this PR to advertise that this script is working in tightly controlled instances.
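For the first item in the list above, a sketch of one way to reject file names that do not fit the time series template. The function name is hypothetical, and two assumptions are baked in: the case name is already known, and start and end dates are digit strings of equal length (which is what separates a time series file from a collocated monthly history file in this sketch).

```python
import re


def parse_timeseries_name(filename, casename):
    """Return template fields as a dict, or None if `filename` does not fit.

    Template: {casename}.{stream}.{variable}.{start_date}-{end_date}.nc
    """
    pattern = re.compile(
        r'^' + re.escape(casename)
        + r'\.(?P<stream>.+)\.(?P<variable>[^.]+)'
        + r'\.(?P<start_date>\d+)-(?P<end_date>\d+)\.nc$'
    )
    match = pattern.match(filename)
    if match is None:
        return None
    fields = match.groupdict()
    # ASSUMPTION: both dates use the same format, so their lengths agree;
    # a history file ending in e.g. 0001-01.nc fails this check
    if len(fields['start_date']) != len(fields['end_date']):
        return None
    return fields
```

Returning None keeps this consistent with an if-based check at the call site, where non-matching files can be skipped or logged.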
Closes #2