This repository contains code to reproduce the results in the paper WOMBAT: A fully Bayesian global flux-inversion framework. This README assumes at least a passing familiarity with the WOMBAT framework, or of Bayesian-synthesis inversions more generally.
Unless stated otherwise, all commands are to be run in the root directory of the repository.
This workflow requires both Python version 3+ and R version 4+, along with a variety of dependency packages in both languages. The easiest way to set up an environment in which to run this code is to use Anaconda. Instructions for setting up an appropriate conda environment are provided below. If, for some reason, you don't want to do that, you can adapt the instructions below to a local installation using pip
and your local copy of R - everything should work okay. For what remains, we assume you have Anaconda installed.
Create a conda environment and set it to use conda-forge:
conda create --yes --prefix .conda_env
conda activate ./.conda_env
conda config --env --add channels conda-forge
conda config --env --set channel_priority strict
Install conda packages
conda install python==3.8 r-base pkg-config udunits2 libgdal xarray scipy \
netcdf4 numpy==1.19.2 jinja2 netcdf-fortran cdo nco fortran-compiler==1.0.4 \
gfortran_linux-64==7.3.0
Set your CRAN mirror. WOMBAT was made in Australia, so we choose the mirror run by CSIRO, but you can pick your favourite:
cat > .conda_env/lib/R/etc/Rprofile.site <<- EOF
local({
r <- getOption('repos')
r['CRAN'] <- 'https://cran.csiro.au'
options(repos = r)
})
EOF
Install CRAN dependencies (this might take awhile):
Rscript -e "install.packages(c(
'devtools', 'raster', 'argparser', 'codetools', 'ncdf4', 'fst', 'matrixStats',
'readr', 'argparse', 'scoringRules', 'patchwork', 'lifecycle', 'sf',
'ggplot2', 'dplyr', 'tidyr', 'RcppEigen', 'rnaturalearth',
'rnaturalearthdata', 'rgeos'
))"
Install the required local R packages:
Rscript -e "devtools::install('fastsparse'); devtools::install('wombat')"
Check out the required version of GEOS-Chem:
(cd external/GEOS_Chem && git clone --branch 12.3.2 https://github.com/geoschem/geos-chem.git Code.v12.3.2)
If you have access to a GPU, it's worth installing tensorflow to speed up inversions considerably. This can be done with
pip3 install tensorflow-gpu
Rscript -e "install.packages('tensorflow')"
All input datasets go into the data
directory. There are a few files already there, but the rest will need to be retrieved from their primary sources.
Several publicly-available datasets are required to run any part of the code.
- The OCO-2 MIPv7 data files:
- The OCO-2 10 s averages, in a file named
OCO2_b70b_10sec_WL15_GOOD_v10.nc
. These are available from the OCO-2 MIPv7 website. Place this file in thedata
directory. - TCCON 30 minute averages, available from the OCO-2 MIPv7 website. The NetCDF files should be placed in the directory
data/TCCON/20170706
. - The ObsPack measurements used by the MIPv7. At present, these are not available from the MIPv7 website. Please ask the OCO-2 MIPv7 team for assistance, or reach out to us and we can help. The two relevant datasets are
obspack_co2_1_CT2016_prepped_inputs_RESTRICTED_2017-04-26
andobspack_co2_1_NRT_v3.3_prepped_inputs_2017-04-26
, and go into thedata
directory.
- The OCO-2 10 s averages, in a file named
- Some of the results of OCO-2 MIPv7:
- The posterior fluxes, available from the OCO-2 MIPv7 website under "Level 4 product download". These should be placed in the
data/mip-fluxes
directory. - The posterior fits to TCCON retrievals. These are available from the MIPv7 website, and can be downloaded by running
bash data/download-mip-tccon-fits.sh
- The posterior fluxes, available from the OCO-2 MIPv7 website under "Level 4 product download". These should be placed in the
GEOS-Chem requires meteorological fields and CO2 emission to run. These go into the data/GEOS_Chem
directory. There are instructions for how to download these files on the GEOS-Chem wiki.
The directories you need to download are:
ExtData/GEOS_2x2.5/GEOS_FP
ExtData/CHEM_INPUTS/MODIS_LAI_201204
ExtData/CHEM_INPUTS/Olson_Land_Map_201203
ExtData/HEMCO/AEIC/v2015-01
ExtData/HEMCO/CO2/v2015-04/OCEAN
ExtData/HEMCO/CO2/v2015-04/BIOFUEL
ExtData/HEMCO/QFED/v2018-07/2014
ExtData/HEMCO/QFED/v2018-07/2015
ExtData/HEMCO/QFED/v2018-07/2016
ExtData/HEMCO/QFED/v2018-07/2017
There are some other datasets required. These are
- The CarbonTracker 2019 biosphere prior fluxes. These are available at the CarbonTracker website. The files in particular you want are named
CT2019.prior_b4_3hrly_YYYY.nc
, whereYYYY
is a year. These should be placed intodata/CT2019_b4_prior
. - ODIAC 2018 fossil-fuel fluxes. These are available from the ODIAC website. Place the yearly files
odiac2018_1x1d_YYYY.nc
intodata/ODIAC_Dataset_2018
.
To avoid the need to run GEOS-Chem, we have placed some intermediate files online at the NIASRA CKAN website.
To use these, download the tar archive, then extract the files into the root directory of this repository with
tar xzf ~/path/to/wombat-inversion-intermediates-20211110.tar.gz
Then run
for run in random1 random2; do
mkdir -p 3_inversion/intermediates/GEOS_Chem/runs/run.v12.3.2.osse.$run
(
cd 3_inversion/intermediates/GEOS_Chem/runs/run.v12.3.2.osse.$run \
&& touch LAST_PREPARED RUN_COMPLETE
)
mkdir -p 3_inversion/intermediates/GEOS_Chem/matched-runs/run.v12.3.2.osse.$run
(
cd 3_inversion/intermediates/GEOS_Chem/matched-runs/run.v12.3.2.osse.$run \
&& touch LAST_SUBSETTED LAST_COMPUTED_XCO2 LAST_COMBINED
)
done
(cd 3_inversion/intermediates && touch *)
(cd 4_results/intermediates && touch *)
to bring the timestamps up to date. Then you can run the workflow from step 3 onwards as described in Running inversions and generating the figures.
First, get the required data as listed in Data required for all steps.
There are four main steps in the workflow, corresponding to four numbered directories. These are:
1_transport
: Sets up and performs GEOS-Chem runs for the prior mean of the CO2 mole-fraction field, and for the mole-fraction basis functions which arise from perturbing the flux field. The outputs of these runs are netcdf files containing the space/time field of the CO2 fluxes and the space/height/time field of the CO2 mole fractions. This is the most computationally intensive step in the workflow by far.2_matching
: Postprocesses the outputs in the previous step. In particular, monthly averages of the flux field are computed, and the mole-fraction field is subsetted to certain locations and times of interest (corresponding to locations/times with data).3_inversion
: The meat of the inversions. Using the reduced fields computed in the previous step, constructs a transport-model matrix. Preprocesses data to a suitable form for inversions. Performs the inversions using the WOMBAT statistical framework, and calculates posterior fluxes in a usable form.4_results
: Generates the figures and tables presented in the WOMBAT paper.
If you want to reproduce all the results, you need to run all the steps. The slowest is the first, running the transport model. If you just want to run inversions, you can find a link the required intermediate files in the Intermediate output of matching step to run inversions section of this document. These contain the results of the 2_matching
step. If for some reason you need the results of the full GEOS-Chem runs, which are several terabytes, please either run them yourself (see below, preferable), or get in contact and we will see if we can arrange access to them from our archives.
This step will complete the GEOS-Chem runs, and is by far the most computationally intensive part of the workflow. If you just want to do inversions on the same time period and data as we did, you can skip this step! That aside...
First, follow the steps in Data required for all steps and Data required for the GEOS-Chem runs. Then run
make setup_runs
will create the directories for the base and sensitivities runs of GEOS-Chem. These exist as
1_transport/intermediates/GEOS_Chem/runs/run.v12.3.2.base
- the base run.1_transport/intermediates/GEOS_Chem/runs/yyyy/run.v12.3.2.RrrMmm
- sensitivity runs over years (yyyy), regions (rr), and months (mm).
The base run needs to be completed before the sensitivity runs can complete, because the latter need restart files produced by the former.
How you actually run these depends on your system. On a single core machine, completing these runs takes several years of computation. For the WOMBAT paper, we completed the runs partly on our local HPC, and partly on the NCI's Gadi supercomputer. The absolute simplest way is to change into the run directory and run:
bash run_geos.mp
Alternatively, if your system has SLURM, you can change into the directory and schedule a single run with
sbatch run_geos.sbatch
or multiple runs with
make submit_geos_chem_base_run
for the base run and
make submit_geos_chem_sensitivity_runs
for the sensitivity runs. Keep in mind, you may need to edit the file run_geos.sbatch
for compatibility with your system. You can do this for all runs by editing 1_transport/src/GEOS_Chem/run.v12.3.2.base.template/run_geos.sbatch
and 1_transport/src/GEOS_Chem/run.v12.3.2.sensitivity.template/run_geos.sbatch
and running make setup_runs
again.
Finally, for the sensitivity runs there is a script run_geos.gadi
for NCI's Gadi, which may be useful if you want to use the same system.
Once all the runs are complete, run
make mark_runs_complete
which will allow subsequent steps to run.
If you have completed the GEOS-Chem runs, you can run the matching steps with
WOMBAT_MAX_WORKERS=8 make -j2 2_matching_targets
You can modify the WOMBAT_MAX_WORKERS
variable and the -j
option to suit your local system.
Alternatively, you can skip this step by downloading the intermediate files, see Intermediate output of matching step to run inversions.
If you have completed the matching step, or downloaded the required intermediate files (see Intermediate output of matching step to run inversions above), you can run
make -j4 4_results_targets
You can modify the -j
option to suit your local system. If you have access to a GPU and have installed tensorflow
(see Install tensorflow), you can run inversions using
WOMBAT_TENSORFLOW=1 make -j1 3_inversion_targets
make -j4 4_results_targets
Note that the above is split into two parts, the first with make parallelism turned off, because most GPUs only have enough memory to run one inversion at a time. So we complete the inversions before generating the figures with parallelism turned back on.