Running SMOKE using EPA's Emissions Modeling Platforms

Huy Tran edited this page Sep 25, 2024 · 38 revisions

Introduction

This documentation is intended to help users run SMOKE via EPA's Emissions Modeling Platforms. The SMOKE User's Manual (v5.0) details the individual SMOKE programs including input and output file formats, and program settings. The sections below discuss how the run scripts provided by the modeling platform are structured to run SMOKE and how users might need to modify those run scripts and their settings depending on the modeling being done.

(This document is still under development and is incomplete at this time.)

Overview

EPA releases their own Emissions Modeling Platforms (EMP) which provide a full set of emissions inventories, other data files, run scripts, and helper tools for running SMOKE. A modeling platform is for a particular base year and may include future years.

Emissions are processed by sector. Sectors are groupings of emission inventories based on how the emissions are produced. Some examples of sectors include:

| Sector Name | Description |
| --- | --- |
| airports | Aircraft and ground support equipment |
| livestock | Agricultural livestock |
| onroad | Vehicles driving on roads |
| ptegu | Point source electric generating units (EGUs) |

Command-line run scripts are used to

  • Manage running the individual SMOKE programs
    • Some SMOKE programs only need to be run once per sector, others need to be run multiple times (e.g. Smkmerge is run once per day in the modeling episode)
    • Make sure dependent programs are run in sequence
      • Smkinven must be run first
      • Spcmat, Grdmat, and Temporal depend on Smkinven intermediate files
      • Smkmerge combines intermediate files from all the prior programs
  • Coordinate inputs and settings across multiple SMOKE programs
  • Ensure consistent naming of intermediate and output files
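
The ordering constraints listed above can be pictured as a small dependency graph. The Python sketch below is only an illustration; the platform's run scripts are written in C-shell and do not use this code. The lowercase program names and the `DEPENDS_ON` mapping are assumptions taken from the list above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9 or later

# Hypothetical mapping: each program -> the programs it depends on,
# following the dependencies described in the list above.
DEPENDS_ON = {
    "smkinven": set(),
    "spcmat": {"smkinven"},
    "grdmat": {"smkinven"},
    "temporal": {"smkinven"},
    "smkmerge": {"spcmat", "grdmat", "temporal"},
}


def run_order(depends_on):
    """Return one valid execution order that respects the dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


order = run_order(DEPENDS_ON)
print(order)
```

Any order produced this way starts with Smkinven and ends with Smkmerge; the relative order of Spcmat, Grdmat, and Temporal is not constrained by the list above.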

System Requirements

  • Linux operating system (e.g., Ubuntu, CentOS, Fedora)
  • The run scripts included in EPA's EMP use C-shell, referenced at the top of all scripts as "#!/bin/csh". If /bin/csh is not available on your system but /bin/tcsh is, either edit all run scripts to reference /bin/tcsh at the top, or create a symbolic link so that /bin/csh points to /bin/tcsh.
  • Python version 3.0 or later, along with select Python libraries. Many of the helper scripts included in the EMP package use Python. These scripts reference '#!/usr/bin/env python', which may need to be changed for a new computing platform. All of the scripts have been updated for Python 3.5 to match the configuration on EPA's cluster. Using Python 2.7 requires the "future" module, which in most cases can be installed by running: sudo pip install future.
  • (Optional) SMOKE and IOAPI executables: EPA's EMP is provided with pre-compiled versions of SMOKE and IOAPI (SMOKE v5.0 and IOAPI v3.2 for EMP 2020). These executables were compiled specifically for EPA's cluster system and may not run on a different system (see Incompatible SMOKE's pre-compiled executables in the Troubleshooting section for more information). If they do not run, SMOKE's executables must be re-compiled on the system where they will be used (see SMOKE Installation Instructions for how to do this).
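
As a quick pre-flight check for the C-shell requirement above, a short Python sketch such as the following can report whether csh or tcsh is on the PATH. The function name `find_c_shell` is hypothetical and not part of the EMP package. Note that the run scripts reference /bin/csh specifically, so finding a shell elsewhere on the PATH may still require editing the scripts or creating a link.

```python
import shutil


def find_c_shell(which=shutil.which):
    """Return the path of csh if present, otherwise tcsh, otherwise None.

    `which` is injectable so the lookup can be exercised without a real shell.
    """
    for name in ("csh", "tcsh"):
        path = which(name)
        if path:
            return path
    return None


shell = find_c_shell()
if shell is None:
    print("No C-shell found; install csh or tcsh before using the run scripts.")
else:
    print("C-shell available at:", shell)
```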

Installing an Emissions Modeling Platform

For a particular EMP, EPA provides the data files and run scripts to download. This documentation uses EMP 2020 as an illustrative example. From the 2020 Emissions Modeling Platform page, click 2020 Data Files and Summaries to go to the download area. Each platform includes a documentation file with details about the platform, requirements, and installation instructions. For 2020, this file is named info_2020ha2_package_22sep2023.txt.

Obtaining meteorology input files

Meteorology input files processed by the Meteorology-Chemistry Interface Processor (MCIP) are required for processing emissions from several sectors, including biogenics (beis4), onroad, onroad_ca_adj, dust (afdust/canada_afdust/canada_ptdust), and point sources if the Laypoint program is used to calculate plume rise. MCIP data is not included in the package but is provided separately on the CMAS Data Warehouse for various CMAQ modeling platforms. The MCIP files for the modeling year are provided in the CMAQ 2020 modeling platform.

NOTE: The inventory year does not have to be the same as the modeling year (e.g., EMP 2020 can be processed to model air quality in 2023 using MCIP files for 2023). See the notes on Customizing directory_definitions.csh for new system for settings that might be needed for SMOKE to find and read meteorology input files.

Directory Structure

Using the 2020 EMP as an example, after installation, the INSTALL_DIR contains the following directories:

2020ha2_cb6_20k

  • This directory contains the input inventories and run scripts for the "2020ha2_cb6_20k" case. A case is a way of referring to the inputs and settings used for your modeling scenario. This particular case naming scheme is used in the EPA's emissions modeling platforms.

ge_dat

  • This directory contains general ancillary files including those related to speciation, spatial allocation (gridding), temporalization profiles and profile cross-reference files. This data is not case-specific.

ioapi

smoke5.0

  • This directory contains the individual SMOKE executable programs (e.g. Smkinven, Grdmat, Spcmat, etc.) and various helper scripts.

Case-specific directories

[discuss types of files in each sub-directory]

2020ha2_cb6_20k

  • inputs: contains emission inventories for the year 2020 including national CEMS emissions, nonroad inventories and onroad activity data, and point and nonpoint inventories
  • intermed: contains intermediate output files and log files from SMOKE core programs
  • premerged: contains output files from the smkmerge program for the nonpoint source sectors and the airports sector. These output files can subsequently be merged using the Mrggrid program
  • reports: contains report files in ASCII format created by the smkreport program
  • scripts: contains directory_definitions.csh, helper files and SMOKE running scripts for nonpoint, point, biogenic and onroad sectors
  • smoke_out: contains SMOKE output files for point sectors and merged files from Mrggrid for nonpoint source sectors. The output files are grouped by model domain and chemistry mechanism

Customizing directory_definitions.csh for new system

The following variables may need to be changed in order to run EPA's EMP, or to adapt the EMP for a new modeling study, on a new system:

  • INSTALL_DIR: Path to top-level directory that contains directories 2020ha2_cb6_20k, ge_dat, smoke5.0, ioapi
  • MET_ROOT: Location of MCIP input files (also see note on ASSIGNS_FILE)
  • CASE: Specifies the top-level directory where SMOKE input and output files are located. If replicating EPA's EMP, the value of CASE should not be changed (e.g., "2020ha2_cb6_20k").
  • REGION_IOAPI_GRIDNAME: Specifies the model grid ID in the GRIDDESC file. If replicating EPA's EMP, the value of REGION_IOAPI_GRIDNAME should not be changed (e.g., "12US1_459X299").
  • (Optional) SMOKE_LOCATION: By default, SMOKE uses the pre-compiled executables under INSTALL_DIR/smk${smkversion}/Linux2_x86_64ifort, where smk${smkversion} is defined by ASSIGNS_FILE based on the setting of ${MODEL_LABEL} in directory_definitions.csh. Set SMOKE_LOCATION to use custom SMOKE executables on the current system.
  • (Optional) IOAPI_LOCATION: By default, SMOKE uses the pre-compiled IOAPI executables under INSTALL_DIR/ioapi. Set IOAPI_LOCATION to use custom IOAPI executables on the current system.
  • ASSIGNS_FILE: The path to ASSIGNS_FILE should not be changed. However, the ASSIGNS_FILE itself may need to be modified for SMOKE to recognize the MCIP input files. The ASSIGNS_FILE uses Linux commands to infer the filenames of the MCIP input files from the episode date, and the inferred names may not match the actual input files (e.g., GRIDCRO2D.12US1.35L.YYYYMMDD.nc vs GRIDCRO2D.12US1.35L.YYMMDD.ncf).
  • The EMF* variables are not relevant on non-EPA systems and can be ignored.
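
To see which MCIP naming convention matches the files on disk before editing ASSIGNS_FILE, a small sketch like the one below can help. It is not part of the EMP package: the `TEMPLATES` tuple simply encodes the two example patterns from the ASSIGNS_FILE note above, and the function names and the example directory are hypothetical.

```python
from datetime import date
from pathlib import Path

# Hypothetical templates mirroring the two naming styles mentioned above.
TEMPLATES = (
    "GRIDCRO2D.12US1.35L.{d:%Y%m%d}.nc",   # 4-digit year, .nc extension
    "GRIDCRO2D.12US1.35L.{d:%y%m%d}.ncf",  # 2-digit year, .ncf extension
)


def candidate_names(day, templates=TEMPLATES):
    """Expand each filename template for the given episode date."""
    return [template.format(d=day) for template in templates]


def find_met_file(met_root, day, templates=TEMPLATES):
    """Return the first candidate that exists under met_root, or None."""
    for name in candidate_names(day, templates):
        path = Path(met_root) / name
        if path.is_file():
            return path
    return None


print(candidate_names(date(2020, 1, 5)))
```

Calling `find_met_file` with your MET_ROOT value and an episode date shows which pattern, if any, matches your files; the date format used in ASSIGNS_FILE can then be adjusted accordingly.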

Run Script Structure

Run scripts are located in the case directory $CASESCRIPTS and are organized by sector: biogenics, nonpoint, onroad, and point. The merge directory contains scripts for merging multiple sectors together.

$CASESCRIPTS/nonpoint

  • Contains run scripts for nonpoint sectors including afdust, rail, np_oilgas, np_solvents, rwc, nonpt, livestock, fertilizer, and nonroad, as well as other nonpoint sectors in Canada and Mexico including canmex_area, canada_afdust, canada_onroad, mexico_onroad, and canmex_ag
  • Each source sector has a single run script, either annual or monthly (unlike point source sectors, where the run scripts are split into two parts).
  • Annual scripts: emission inputs are estimated annually
  • Monthly scripts: emission inputs are estimated monthly
  • By default, outputs of nonpoint sectors are written to $PROJECT_ROOT/$CASE/premerged/$SECTOR, and they can be subsequently merged with 2D or 3D gridded outputs of other source sectors
  • Notes on nonpoint sectors (taken from info_2020ha2_package_22sep2023.txt):
    • afdust: Particulate emissions from fugitive dust sources. This sector is processed in two steps. The first (Annual_afdust_12US1_) processes the annual inventory, and the second (Annual_afdust_adj_12US1) applies adjustments (transportable fraction and meteorologically-based) and outputs the adjusted emissions under the sector name "afdust_adj". The afdust scripts must be run in that order.
    • canada_afdust: Particulate emissions from fugitive dust sources in Canada. This sector was called "othafdust" in modeling platforms prior to 2020. Just like with afdust, this sector is processed in two steps. The first (Annual_canada_afdust_12US1_) processes the annual inventory, and the second (Annual_canada_afdust_adj_12US1) applies adjustments (transportable fraction and meteorologically-based) and outputs the adjusted emissions under the sector name "canada_afdust_adj". The canada_afdust scripts must be run in that order. Fugitive dust emissions in Mexico are included in the othar sector and do not need the same transportable fraction and meteorological adjustments that the Canada fugitive dust emissions in canada_afdust do.
    • canada_onroad: Mobile onroad source emissions from Canada. This sector was called "onroad_can" in modeling platforms prior to 2020.
    • canmex_ag: Area source emissions from Canada and Mexico agricultural sources. In emissions modeling platforms prior to 2020, only Canada ag sources were processed separately, but now both Canada and Mexico ag sources are processed together, all as area sources.
    • canmex_area: Area source emissions from Canada and Mexico, including mobile nonroad. This sector was called "othar" in modeling platforms prior to 2020.
    • fertilizer: Agricultural emissions from fertilizer (ammonia). Fertilizer emissions are not included in the sector merge because CMAQ was run with the bidirectional flux option for this case, but fertilizer emissions and scripts are provided in this package for those who wish to run this sector. The fertilizer inventory in this package is from 2020NEI and may not match the actual fertilizer emissions calculated inline by CMAQ.
    • livestock: Agricultural emissions from livestock. This is mostly ammonia, but includes other pollutants from livestock sources as well.
    • mexico_onroad: Mobile onroad source emissions from Mexico. This sector was called "onroad_mex" in modeling platforms prior to 2020. The onroad Mexico emissions inventory includes pre-speciated VOC emissions for an older CB6 mechanism, so there is an extra script for this sector to convert those emissions to the newer CB6 mechanism needed for CMAQ. This extra script is called *_part2_combine.csh and uses the combine utility to perform the CB6 mechanism conversion. The combine program is included, pre-compiled, in the SMOKE package along with pre-compiled SMOKE executables and I/O API utilities. To help make the distinction between emissions for the two CB6 mechanisms, older CB6 emissions use the sector name "mexico_onroad_cb6orig". The combine post-processing step creates emissions files with the final sector name "mexico_onroad".
    • nonpt: Area source emissions not included in other sectors.
    • nonroad: Off highway mobile source emissions. EPA processed this sector using a new SMOKE feature supporting output of multiple subsector emissions files in a single SMOKE run. Specifically, SMOKE outputs diesel and gas (including all non-diesel fuels such as LPG and CNG) in separate emissions files as if they are different sectors, in addition to the total nonroad emissions. By default, the scripts included in this package will output separate emissions for gas and diesel (in addition to the total nonroad emissions). To turn off this option and have SMOKE only generate a single nonroad emissions file per day, set environment variable SMK_SUB_SECTOR_OUTPUT_YN to N.
    • np_oilgas: Area source oil and gas emissions.
    • np_solvents: Area source emissions from solvents.
    • onroad: On highway mobile source emissions, excluding California. This sector is processed using SMOKE-MOVES with multiple scripts as described in section 4B of info_2020ha2_package_22sep2023.txt. EPA processed this sector using a new SMOKE feature supporting output of multiple subsector emissions files in a single SMOKE run. Specifically, SMOKE-MOVES outputs diesel and gas (including all non-diesel fuels such as E-85) as separate emissions files as if they are different sectors, in addition to the total onroad emissions. By default, the scripts included in this package will output separate emissions for gas and diesel (in addition to the total sector emissions). To turn off this option and have SMOKE only generate a single emissions file per day, set environment variable SMK_SUB_SECTOR_OUTPUT_YN to N.
    • onroad_ca_adj: On highway mobile source emissions, California only. This sector is processed using SMOKE-MOVES with multiple scripts as described in section 4B of info_2020ha2_package_22sep2023.txt. EPA processed this sector using a new SMOKE feature supporting output of multiple subsector emissions files in a single SMOKE run. Specifically, SMOKE-MOVES outputs diesel and gas (including all non-diesel fuels such as E-85) as separate emissions files as if they are different sectors, in addition to the total onroad emissions. By default, the scripts included in this package will output separate emissions for gas and diesel (in addition to the total sector emissions). To turn off this option and have SMOKE only generate a single emissions file per day, set environment variable SMK_SUB_SECTOR_OUTPUT_YN to N.
    • rail: Area source railway emissions
    • rwc: Area source residential wood combustion emissions

$CASESCRIPTS/point

  • Contains run scripts for point sectors including ptnonipm, pt_oilgas, ptfire, ptegu, ptagfire, ptfire-rx, ptfire-wild, cmv_c1c2, cmv_c3_12, and airports, as well as other point sectors in Canada and Mexico including canada_og2D, canada_ptdust, and canmex_point
  • Run scripts for each point sector are split into two parts: 'onetime' and 'daily' run scripts; the 'onetime' script must be run first.
  • Onetime run scripts:
  • Daily run scripts:
  • By default, outputs of point sectors are written to $PROJECT_ROOT/$CASE/smoke_out/$CASE/$REGION_ABBREV/$EMF_SPC/$SECTOR
  • Notes on sectors (taken from info_2020ha2_package_22sep2023.txt):
    • airports: unlike other point source sectors, airports are treated as low-level point sources (no inline files), and their outputs are written to $PROJECT_ROOT/$CASE/premerged/$SECTOR as 2D gridded emission files
    • canada_og2d: Point source emissions from low-level Canada upstream oil and gas sources. All emissions in this sector are low-level only (no inline files)
    • canada_ptdust: Point source particulate emissions from fugitive dust sources in Canada. This sector was called "othptdust" in modeling platforms prior to 2020. In Canada, dust emissions are in area source format for some sources (canada_afdust sector) and point source format for other sources (canada_ptdust sector). This is a 'point' sector with additional adjustments, and is processed via THREE scripts: the 'onetime' script, the 'daily' script, and then the adjust script (canada_ptdust_adj), in that order. All emissions in this sector are low-level only (no inline files).
    • canmex_point: Point source emissions from Canada and Mexico. This sector was called "othpt" in modeling platforms prior to 2020. This is a 'point' sector, and like all 'point' sectors, is processed via two scripts: the 'onetime' script, and the 'daily' script. The 'onetime' script must be run first. All emissions in this sector are elevated (no low-level contribution).
    • cmv_c1c2: Emissions from C1 and C2 commercial marine sources, including ports and navigable waterways. Includes C1/C2 marine emissions in the entire domain, including US, Canada, Mexico, and all bodies of water which lie outside the boundaries of those countries. Inventories for this sector are grid-specific and designed for the 12US1 grid, or 12km grids which are a subset of 12US1 (e.g. 12US2). Therefore, emissions are output under the sector name "cmv_c1c2_12". All emissions in this sector are elevated (no low-level contribution).
    • cmv_c3: Emissions from C3 commercial marine sources, including ports and navigable waterways. Includes C3 marine emissions in the entire domain, including US, Canada, Mexico, and all bodies of water which lie outside the boundaries of those countries. Inventories for this sector are grid-specific and designed for the 12US1 grid, or 12km grids which are a subset of 12US1 (e.g. 12US2). Therefore, emissions are output under the sector name "cmv_c3_12". All emissions in this sector are elevated (no low-level contribution).
    • ptagfire: Point source agricultural burning emissions. The ptagfire sector uses a daily point source inventory. All emissions in this sector are elevated (no low-level contribution).
    • ptegu: Electric generating unit emissions. This sector incorporates CEM (Continuous Emissions Monitoring) hourly emissions for a majority of sources. For ptegu there are three 'daily' scripts for different months of the year: 'summer' (May through September), 'winter' (December through February), and 'wintershld' ("shoulder" months; March/April/October/November). For sources without hourly CEM emissions, summer and winter use different hourly temporalization, and so they are run with separate inputs. All emissions in this sector are elevated (no low-level contribution).
    • ptnonipm: Point source emissions from industrial activities. All emissions in this sector are elevated (no low-level contribution).
    • ptfire-rx: Point source emissions from year specific controlled (prescribed) burning. Fires are processed in the 'inline' format for CMAQ, and are all elevated (no low-level contribution). The two daily scripts, "lowactivity" and "highactivity", are run for different months of the year, and both must be run.
    • ptfire-wild: Point source emissions from year specific wild fires. Fires are processed in the 'inline' format for CMAQ, and are all elevated (no low-level contribution). The two daily scripts, "lowactivity" and "highactivity", are run for different months of the year, and both must be run.
    • ptfire_othna: Point source emissions from year specific controlled burning and wild fires in the rest of North America ('OTHNA' = OTHer North America), including Canada and Mexico. In addition to Canada and Mexico, fire emissions for Central America and the Caribbean are also included. Emissions from those areas are ultimately not modeled due to being outside of the 12US1 modeling domain, but they are provided for possible use in larger grids. These fires are processed in the 'inline' format for CMAQ, and are all elevated (no low-level contribution).
    • pt_oilgas: Point source oil and gas emissions, including emissions from offshore oil rigs in the Gulf of Mexico. All emissions in this sector are elevated (no low-level contribution).
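
The month groupings given in the ptegu note above can be written as a simple lookup. The sketch below is illustrative only; the function name `ptegu_daily_script` is hypothetical, and the returned labels are the script group names quoted in the note, not full script filenames.

```python
def ptegu_daily_script(month):
    """Map a calendar month (1-12) to the ptegu 'daily' script group."""
    if month in (5, 6, 7, 8, 9):
        return "summer"
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 10, 11):
        return "wintershld"  # "shoulder" months
    raise ValueError("month must be between 1 and 12, got {!r}".format(month))


# Group all twelve months by the script that covers them.
groups = {}
for m in range(1, 13):
    groups.setdefault(ptegu_daily_script(m), []).append(m)
print(groups)
```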

Run Script Variables

[document variables specific to run scripts (not SMOKE program inputs)]

  • SECTOR: name of the target sector. The helper files run_settings.txt and sectorlist apply the specified settings to the corresponding SECTOR.

  • EMISINV_*: One or more emission input files for point, nonpoint, or onroad sectors. The '*' can be any letter or number and does not need to follow alphabetical or numerical order (e.g., a run script can have EMISINV_E, rather than EMISINV_A, as its only emission input file). The helper script combine_data.csh combines all EMISINV_* variables specified in the run script into a list file containing the paths to all emission input files. This list file is written to $CASEINPUTS/$SECTOR.

  • EMISLST: As an alternative to using EMISINV_* to specify individual emission input files, a list file containing the paths to all emission input files can be specified as EMISLST.

  • GSREFTMP_* and GSPROTMP_*: Similar to EMISINV_*, the helper script combine_data.csh combines all speciation assignment (GSREFTMP_*) and speciation profile (GSPROTMP_*) variables.

  • REPCONFIG_INV

  • REPCONFIG_GRID

  • REPCONFIG_TEMP

  • RUN_MONTHS: Specifies the months for which SMOKE will process emissions. To process an entire calendar year, list the numbers 1 through 12. By default the EMP is set up to process emissions for full months, but it can be configured to process only part of a month (see Section 12 in info_2020ha2_package_22sep2023.txt).

  • SPINUP_DURATION: Specifies the number of spin-up days, preceding the months specified in RUN_MONTHS, for which emissions will be processed. More information is in Section 11 of info_2020ha2_package_22sep2023.txt.

  • SPINUP_MONTH_END: Specifies whether the last $SPINUP_DURATION days of quarter 2/3/4 should be run at the end of a quarter (Y), or at the start of the next quarter (N). For example, if running with SPINUP_DURATION = 10: When N (old behavior), Q1 will include 10 day spinup and end on 3/21; Q2 will cover 3/22 through 6/20. When Y, Q1 will include 10 day spinup and end on 3/31 (including all of March), remaining quarters will function as if spinup = 0.

  • L_TYPE and M_TYPE:

    • L_TYPE: controls how the Temporal program processes daily emissions

    • M_TYPE: controls how the Smkmerge program merges intermediate outputs from the other core programs, including Temporal, to generate the final emission output files at the desired frequency

    • Values of L_TYPE and M_TYPE:

      • all: Hourly emissions are calculated for every day of the modeling period. The output hourly emissions can vary between every day of the modeling period.
      • week: Hourly emissions are calculated for all days in one "representative" week of a month. That week is then duplicated for all weeks in the month. The output hourly emissions can vary between each day of the week, but will not vary week-to-week within the month.
      • mwdss: Hourly emissions are calculated for one representative Monday, representative weekday (Tuesday through Friday), representative Saturday, and representative Sunday for the month. These days are then duplicated throughout the month. The output hourly emissions can vary between Mondays, other weekdays, Saturdays and Sundays within the month, but will not vary week-to-week within the month.
      • aveday: Hourly emissions are calculated for one representative day of each month, meaning emissions for all days within a month are the same.
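
To make the four L_TYPE/M_TYPE values concrete, the sketch below assigns each date a "representative key": within a month, dates with the same key would share the same hourly emissions. This is an illustration of the descriptions above, not the logic used by the run scripts. The function names are hypothetical, and holiday handling (the _Y/_N suffix seen in the sectorlist file) is deliberately ignored.

```python
from datetime import date


def mwdss_category(day):
    """Classify a date as monday, weekday (Tue-Fri), saturday, or sunday."""
    weekday = day.weekday()  # Monday is 0, Sunday is 6
    if weekday == 0:
        return "monday"
    if weekday <= 4:
        return "weekday"
    if weekday == 5:
        return "saturday"
    return "sunday"


def representative_key(day, approach):
    """Dates in the same month with equal keys share one set of hourly emissions."""
    if approach == "all":
        return day.isoformat()  # every day is processed separately
    if approach == "week":
        return "dow-{}".format(day.weekday())  # one representative week
    if approach == "mwdss":
        return mwdss_category(day)
    if approach == "aveday":
        return "aveday"  # a single representative day
    raise ValueError("unknown approach: {!r}".format(approach))


# 6 January 2020 was a Monday; 7 and 10 January were a Tuesday and a Friday.
for d in (date(2020, 1, 6), date(2020, 1, 7), date(2020, 1, 10)):
    print(d, representative_key(d, "mwdss"), representative_key(d, "week"))
```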

Helper Files

run_settings.txt

The columns of run_settings.txt are described below:

| Column | Description |
| --- | --- |
| Sector | Sector abbreviation or "all" (e.g., SECTOR in SMOKE run script) |
| Grid | Grid abbreviation (e.g., GRID in ASSIGNS_FILE or in SMOKE run script) |
| Environment Variable Name | Lowercase name of SMOKE program (e.g., smkinven, spcmat, smkmerge) |
| Part | Part number corresponding to the RUN_PART* settings in the SMOKE scripts |
| Start Date | Start date or 0 for programs not affected by dates in the scripts |
| End Date | End date or 0 for programs not affected by dates in the scripts |
| Value | Control: set to Y to run the program and N to not run the program. |
  • For 'Sector' column, sector-specific entries will override "all" entries.
  • All rows not matching the script's sector, grid, program, and part arguments will be ignored for a given call to the script.
  • If either columns 'Start Date' or 'End Date' are set to 0, both will be ignored.
  • Lowercase values in column 'Value' are automatically converted to uppercase.
  • Full-line comments can use a # sign in the first character of the line.
  • End-of-line comments can use a ! sign before the comment
  • run_settings.txt should contain no blank lines unless a comment character is provided
  • A sector does not need entries in run_settings.txt with "Y" in the 'Value' column for its SMOKE programs to run, since the default is assumed to be "Y" for all SMOKE programs and all sectors. However, including "Y" entries does no harm.
  • Sector-specific "Y" entries will override Sector = "all" "N" entries for the same grid, program, part, and date range.
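
The rules above can be summarized in a short Python sketch. This is not the code the run scripts use; it is a simplified reading of the rules that ignores the Start Date and End Date columns, and the function names `parse_run_settings` and `should_run` are hypothetical.

```python
def parse_run_settings(text):
    """Parse run_settings.txt content into a list of row dictionaries."""
    rows = []
    for raw in text.splitlines():
        if raw.startswith("#"):  # full-line comment
            continue
        line = raw.split("!", 1)[0].strip()  # drop end-of-line comment
        if not line:
            continue
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 7:
            continue  # not a well-formed row; skipped in this sketch
        sector, grid, program, part, start, end, value = fields
        rows.append({
            "sector": sector, "grid": grid, "program": program.lower(),
            "part": part.upper(), "start": start, "end": end,
            "value": value.upper(),  # lowercase values are converted
        })
    return rows


def should_run(rows, sector, grid, program, part):
    """Return True if the program should run (dates are ignored here)."""
    from_all = None
    from_sector = None
    for row in rows:
        if row["grid"] != grid or row["program"] != program.lower():
            continue
        if row["part"] not in (part.upper(), "ALL"):
            continue
        if row["sector"] == sector:
            from_sector = row["value"]
        elif row["sector"].lower() == "all":
            from_all = row["value"]
    if from_sector is not None:  # sector-specific entries win
        return from_sector == "Y"
    if from_all is not None:
        return from_all == "Y"
    return True  # default is "Y" when nothing matches


EXAMPLE = """# Sector, Grid, Environment Variable Name, Part, Start Date, End Date, Value
beis4,12US1,normbeis4,PART1,0,0,N
all,12US1,smkreport,all,0,0,n ! skip reports everywhere
rwc,12US1,smkreport,all,0,0,Y
"""
rows = parse_run_settings(EXAMPLE)
print(should_run(rows, "beis4", "12US1", "normbeis4", "PART1"))
```

In this example the rwc entry overrides the "all" entry for smkreport, and any program with no matching row defaults to running.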

The table below shows which PART number each SMOKE program is tied to.

Area Biogenic Mobile Point
PART1 Smkinven, Spcmat, Grdmat, Cntlmat, Grwinven Normbeis3, Normbeis4, Metscan Smkinven, Spcmat, Grdmat, Cntlmat, Grwinven
PART2 Temporal Tmpbeis3, Tmpbeis4 Temporal
PART3      
PART4 Smkmerge, Mrggrid, Smk2emis Mrggrid, Smk2emis Movesmrg, Mrggrid, Smk2emis

NOTE: As of EMP 2018v2 (SMOKE v4.8.1), run_settings.txt was made to support "all" or "ALL" in place of PART1, PART2, etc.

List of SMOKE programs controlled by run_settings.txt: smkinven, spcmat, grdmat, smkreport, elevpoint, temporal, laypoint, smkmerge, mrggrid, smk2emis, emisfac, mbsetup, premobl, cemscan, cntlmat, grwinven, mrgelev, normbeis3, tmpbeis3, m3stat, domain, movesmrg, m3xtract, layalloc, normbeis4, tmpbeis4

Example content 2020ha2_cb6_20k/scripts/run_settings.txt

# Sector, Grid, Environment Variable Name, Part, Start Date, End Date, Value
beis4,12US1,normbeis4,PART1,0,0,N

In the above example, run_settings.txt instructs SMOKE to not run (Value = "N") program normbeis4 (PART1) for sector beis4 of grid-ID 12US1. The 2020ha2_cb6_20k platform comes with intermediate output from normbeis4 (under 2020ha2_cb6_20k/intermed/beis4/beis_norm_emis_12US1_2020ha2_cb6_20k.ncf) and therefore there's no need to run this program again.

Hint: run_settings.txt can be handy when testing a new SMOKE emissions modeling platform. For example, one can configure run_settings.txt so that only one SMOKE program runs at a time, to check for errors in the input data or to generate a specific output. In the example below, only smkreport is set to run, generating inventory report files for QA/QC purposes without the need to rerun smkinven, grdmat, and spcmat (assuming that these programs were successfully executed earlier).

rwc, 12US1, smkinven, all, 0, 0, N
rwc, 12US1, spcmat, all, 0, 0, N
rwc, 12US1, grdmat, all, 0, 0, N
rwc, 12US1, cntlmat, all, 0, 0, N
rwc, 12US1, temporal, all, 0, 0, N
rwc, 12US1, smkmerge, all, 0, 0, N
rwc, 12US1, smkreport, all, 0, 0, Y
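
Rows like the ones above are easy to generate when testing several sectors. The following Python sketch is a hypothetical convenience, not something shipped with the EMP; the function name `only_run` is an assumption, and the output simply follows the column order described earlier.

```python
def only_run(sector, grid, programs, keep):
    """Build run_settings.txt rows that disable every program except `keep`."""
    lines = []
    for program in programs:
        value = "Y" if program == keep else "N"
        lines.append("{}, {}, {}, all, 0, 0, {}".format(sector, grid, program, value))
    return lines


PROGRAMS = ["smkinven", "spcmat", "grdmat", "cntlmat",
            "temporal", "smkmerge", "smkreport"]
generated = only_run("rwc", "12US1", PROGRAMS, keep="smkreport")
print("\n".join(generated))
```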

sectorlist

Example of sectorlist content (excerpted from sectorlist_2020ha2_cb6_20k)

sector,sectorcase,sectbaseyr,mrgapproach,prevyrspinup,endzip,speciation,mergesector,projectroot
"ocean_cl2","none","2020","$OCL2ROOT$GRID$EXT","SectBaseYr","N","","Y",""
"afdust","2020ha2_cb6_20k","2020","week_Y","SectBaseYr","N","","N",""
"afdust_adj","2020ha2_cb6_20k","2020","all","SectBaseYr","Y","","Y",""
"airports","2020ha2_cb6_20k","2020","week_Y","SectBaseYr","N","","Y",""
"beis4","2020ha2_cb6_20k","2020","all","actualMet","N","","N",""
"canmex_ag","2020ha2_cb6_20k","2020","mwdss_N","SectBaseYr","N","","Y",""
"canada_og2D","2020ha2_cb6_20k","2020","mwdss_N","SectBaseYr","N","","Y",""
"fertilizer","2020ha2_cb6_20k","2020","all","SectBaseYr","Y","","N",""
"livestock","2020ha2_cb6_20k","2020","all","SectBaseYr","Y","","Y",""
"nonpt","2020ha2_cb6_20k","2020","week_Y","SectBaseYr","N","","Y",""
"nonroad_gas","2020ha2_cb6_20k","2020","mwdss_Y","SectBaseYr","N","","Y",""
"nonroad_diesel","2020ha2_cb6_20k","2020","mwdss_Y","SectBaseYr","N","","Y",""
"np_oilgas","2020ha2_cb6_20k","2020","aveday_N","SectBaseYr","N","","Y",""
"np_solvents","2020ha2_cb6_20k","2020","aveday_N","SectBaseYr","Y","","Y",""
"onroad_gas","2020ha2_cb6_20k","2020","all","SectBaseYr","Y","","Y",""
"onroad_diesel","2020ha2_cb6_20k","2020","all","SectBaseYr","Y","","Y",""
"onroad_ca_adj_gas","2020ha2_cb6_20k","2020","all","SectBaseYr","Y","","Y",""
"onroad_ca_adj_diesel","2020ha2_cb6_20k","2020","all","SectBaseYr","Y","","Y",""
"canada_onroad","2020ha2_cb6_20k","2020","week_N","SectBaseYr","N","","Y",""
"mexico_onroad","2020ha2_cb6_20k","2020","week_N","SectBaseYr","N","","Y",""
"pt_oilgas","2020ha2_cb6_20k","2020","mwdss_Y","SectBaseYr","N","","N",""
"ptnonipm","2020ha2_cb6_20k","2020","mwdss_Y","SectBaseYr","N","","N",""
"rwc","2020ha2_cb6_20k","2020","all","SectBaseYr","Y","","N",""

The sectorlist file controls which sectors are included in the sector merge. This file is included in the package along with the run scripts, and is located in the $CASE/scripts directory for each case. Within the sectorlist file, the 'mergesector' column controls which sectors are merged (Y or N). The 'mrgapproach' column indicates the temporal approach used for each sector, e.g. nonroad uses the mwdss approach including holidays (mwdss_Y) while np_solvents uses aveday without holidays (aveday_N).
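
For a quick look at what a sectorlist file will merge, the file can be read with Python's csv module. The sketch below is only an illustration and is not the logic of the merge scripts. It uses a few rows copied from the excerpt above; the function names are hypothetical, and the interpretation of the _Y/_N suffix as a holiday flag follows the description in the preceding paragraph.

```python
import csv
import io

# A few rows copied from the sectorlist excerpt above.
SECTORLIST_EXCERPT = "\n".join([
    'sector,sectorcase,sectbaseyr,mrgapproach,prevyrspinup,endzip,speciation,mergesector,projectroot',
    '"afdust","2020ha2_cb6_20k","2020","week_Y","SectBaseYr","N","","N",""',
    '"afdust_adj","2020ha2_cb6_20k","2020","all","SectBaseYr","Y","","Y",""',
    '"beis4","2020ha2_cb6_20k","2020","all","actualMet","N","","N",""',
    '"nonroad_gas","2020ha2_cb6_20k","2020","mwdss_Y","SectBaseYr","N","","Y",""',
    '"np_solvents","2020ha2_cb6_20k","2020","aveday_N","SectBaseYr","Y","","Y",""',
    '"rwc","2020ha2_cb6_20k","2020","all","SectBaseYr","Y","","N",""',
])


def split_mrgapproach(value):
    """Split e.g. 'mwdss_Y' into ('mwdss', True); 'all' gives ('all', None)."""
    base, sep, flag = value.rpartition("_")
    if sep and flag in ("Y", "N"):
        return base, flag == "Y"
    return value, None


def merged_sectors(text):
    """Return the sectors whose mergesector column is Y."""
    reader = csv.DictReader(io.StringIO(text))
    return [row["sector"] for row in reader
            if row["mergesector"].strip().upper() == "Y"]


print(merged_sectors(SECTORLIST_EXCERPT))
print(split_mrgapproach("mwdss_Y"), split_mrgapproach("all"))
```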

The merging run script will read the sectorlist file and determine which input files to provide to the SMOKE program Mrggrid for each day of the modeling period. The mapping of representative days for each day of the modeling period is specified in the smk_merge_dates file under smoke5.0 directory provided with the 2020 EMP package (e.g., smoke5.0/scripts/run/2020/smk_merge_dates_202001.txt)

By default, the sector merge scripts are configured to exclude biogenics and RWC. Some CMAQ modelers may wish to include biogenic emissions in the gridded emissions files and not have CMAQ compute biogenics inline. The newest versions of CMAQ include features which require the RWC sector emissions to be passed into CMAQ separately. So, both the output directory and the individual filenames of the merged emissions indicate whether beis and RWC are included. By default they are both excluded so the files are output to a directory called "merged_nobeis_norwc" and each filename says "nobeis_norwc". To run the sector merge with biogenics or RWC, set the mergesector column to Y for the 'beis4' and 'rwc' sectors.

To merge in alternative biogenic emissions files, edit the sectorlist by changing the 'beis4' sector name to the sector name of your choice, and make sure your biogenic emissions files exist in the $CASE/premerged/[sector name] directory with filenames adhering to the file name convention used by other sectors.

Many point sectors have mergesector set to N (e.g. ptegu, othpt) because they do not have any 2-D gridded emissions.

Fertilizer is also excluded from the sector merge by default, because ammonia emissions from fertilizer are generated within CMAQ using bidirectional flux.

Helper Scripts

[discuss scripts in the SMOKE/scripts/emf directory, log_analyzer]

Scripts Descriptions
smoke5.0/scripts/run/emf_cleanup.csh Check for all needed environment variables. If any aren't available, print a warning message and abort
smoke5.0/scripts/run/set_months_v4.csh Sets the months needed for a given script run and determines which months have spinup
smoke5.0/scripts/run/timetracker_v2.csh Writes date/time information about SMOKE to a file
smoke5.0/scripts/run/combine_data_v6.csh Creates .lst files for inventory or concatenates datasets into a single dataset
smoke5.0/scripts/run/smk_run_v9.csh Runs the SMOKE processors
smoke5.0/scripts/run/qa_run_v10.csh Runs the SMOKE QA processors
smoke5.0/scripts/run/m3stat_chk_v6.csh Runs m3stat on SMKMERGE output to check for NaN and negative values
smoke5.0/scripts/run/set_days_v5.csh Sets the dates needed for a given script run and stores the information for those days
smoke5.0/scripts/log_analyzer/log_analyzer.py Parses and characterizes warnings and error messages from the SMOKE logs (More details in section 9 of info_2020ha2_package_22sep2023.txt)
smoke5.0/scripts/log_analyzer/known_messages.txt List of known error and warning messages in SMOKE log files (More details in section 9 of info_2020ha2_package_22sep2023.txt)
smoke5.0/scripts/run/duplicate_check.csh Checks a speciation, gridding, and temporal cross-reference file for duplicates
smoke5.0/scripts/annual_report/annual_report_v2.py Takes smkmerge state/province summaries and aggregates over them to create an annual report
smoke5.0/scripts/run/path_parser.py Takes the path to a file and parses out the directory

[add warning that combine_data_v6.csh casts a wide net with its filename pattern when creating the .lst file, which may pull in unwanted files, e.g., pthour_*]
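The effect of an overly broad filename pattern can be sketched in a few lines of Python. The pattern `pt*` and the filenames below are hypothetical and chosen only to illustrate the problem; they are not the pattern that combine_data_v6.csh actually uses.

```python
import fnmatch

# Hypothetical directory listing and patterns, for illustration only.
files = [
    "ptegu_2020_inventory.csv",
    "ptnonipm_2020_inventory.csv",
    "pthour_2020_cems.txt",
]
broad_pattern = "pt*"          # matches more than intended
unwanted_pattern = "pthour_*"  # files that should stay out of the .lst

# Everything the broad pattern picks up, including the unwanted file.
matched = fnmatch.filter(files, broad_pattern)

# The same list with the unwanted files filtered back out.
kept = [f for f in matched if not fnmatch.fnmatch(f, unwanted_pattern)]

print(matched)
print(kept)
```

Reviewing the generated .lst file for unexpected entries before running SMOKE serves the same purpose.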

How-To Guides

Running part of a month

See Section 12 in info_2020ha2_package_22sep2023.txt.

Running for a different grid

See general_for_running_other_grids.txt

Note: Processing meteorologically adjusted fugitive dust (afdust_adj) requires a grid-specific afdust transportable fraction (XPORTFAC) file. This file can be created either with the package that EPA provides or by windowing the XPORTFAC file provided with the EMP for the 12US1 domain to the target domain using the I/O API m3wndw tool.

Customizing emissions reports

By default, the run scripts run the Smkreport program and write its output to the $PROJECT_ROOT/$CASE/reports/inv directory. For sectors with annual inventories, reports are annual. For sectors with monthly inventories (e.g. nonroad), reports are monthly.

Most reports include all inventory pollutants and model species, although PM10 usually appears as zero due to a SMOKE quirk; to get PM10, sum PM2_5 and PMC in the report. For onroad, these reports reflect activity, not emissions, and include some double counting due to how SMOKE allocates activity to different processes; therefore, the reports/inv reports should not be used for the onroad or onroad_ca_adj sectors.
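The PM10 workaround described above amounts to a simple sum. A minimal sketch, assuming the report values for one row have already been read into a dictionary keyed by pollutant name (the function name `pm10_from_report` and the numbers are made up for illustration):

```python
def pm10_from_report(row):
    """Return PM10 as PM2_5 + PMC for one report row (a dict of totals)."""
    return row["PM2_5"] + row["PMC"]


# Hypothetical report row; PM10 appears as zero, as described above.
row = {"PM10": 0.0, "PM2_5": 12.5, "PMC": 7.5}
pm10 = pm10_from_report(row)
print(pm10)
```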

The following types of reports are generated. Note that not all types of reports are generated for all sectors:

  • *state.txt: State totals.
  • *county.txt: County totals.
  • *state_scc.txt: State/SCC totals.
  • *county_scc.txt: County/SCC totals.
  • *state_naics.txt: State/NAICS totals.
  • *cell_${GRID}.txt: Totals by grid cell.
  • *cell_county_${GRID}.txt: Totals by grid cell and county.
  • *state_grid_${GRID}.txt: State totals after gridding.
  • *srgid_${GRID}.txt: Emissions totals at various resolutions after gridding, and also including the spatial surrogate assignment.
  • *pm25prof.txt: Totals of PM2.5 at various resolutions, and also including the PM2.5 speciation profile assignment.
  • *vocprof.txt: Totals of VOC at various resolutions, and also including the VOC speciation profile assignment. For sectors which are integrated and have both NONHAPVOC and VOC, or have multiple modes of VOC, there may be multiple VOC profile reports for NONHAPVOC and (no-integrate) VOC and/or for each mode.

For all sectors except those processed with SMOKE-MOVES, smkmerge generates daily county total reports in the $PROJECT_ROOT/$CASE/reports/smkmerge directory. These are then summed to annual by state and county, output to the $PROJECT_ROOT/$CASE/reports/annual_report directory. Emissions totals in the reports/annual_report directory (post-temporalization) should be within 1-2% of the totals in the reports/inv directory (pre-temporalization).
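The 1-2% consistency check can be sketched as follows. The function names, the 2% default tolerance, and the totals are illustrative assumptions; reading the actual report files is left out because their layout is not described here.

```python
def percent_difference(inv_total, annual_total):
    """Percent difference of the annual_report total relative to the inv total."""
    if inv_total == 0:
        raise ValueError("inventory total is zero; percent difference is undefined")
    return 100.0 * (annual_total - inv_total) / inv_total


def within_tolerance(inv_total, annual_total, tolerance_pct=2.0):
    """True when the two totals differ by no more than tolerance_pct percent."""
    return abs(percent_difference(inv_total, annual_total)) <= tolerance_pct


# Made-up totals for one pollutant, in the same units in both reports.
print(within_tolerance(1000.0, 1012.0))  # differs by about 1.2 percent
print(within_tolerance(1000.0, 1050.0))  # differs by about 5 percent
```

A difference well outside the tolerance is worth investigating, for example by checking the temporal profiles applied to that sector.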

For SMOKE-MOVES sectors, the $PROJECT_ROOT/$CASE/reports/smkmerge directory includes daily (or weekly if DAYS_PER_RUN=7) totals by county/SCC. Scripts to aggregate these totals to monthly or annual by state, county, state/SCC, and county/SCC are provided in the SMOKE utilities zip, movesmrg_report_postproc/ directory.

Troubleshooting

1. Error Message "Source: Too many arguments"

(Similar error and solution previously reported on CMAS forum: https://forum.cmascenter.org/t/source-too-many-arguments-while-running-daily-ptnonipm-script-2016v3/4129)

Solution: Check that the shell currently in use (e.g., tcsh or csh; run "echo $SHELL" to check) is consistent with the header of the emf run script that is called at the bottom of the SMOKE run script. For example, $RUNSCRIPTS/emf/smk_ar_annual_emf.csh has the header #!/bin/tcsh -f, so that script uses tcsh.
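This comparison can be sketched in Python. The helper names `shebang_interpreter` and `shells_match` are hypothetical; the sketch only compares the interpreter named in a script's first line with the program named by a shell path such as the value of $SHELL.

```python
import os


def shebang_interpreter(first_line):
    """Return the interpreter name from a '#!' line, or None if absent."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    name = os.path.basename(parts[0])
    # Handle the '#!/usr/bin/env tcsh' form by taking the next word.
    if name == "env" and len(parts) > 1:
        name = os.path.basename(parts[1])
    return name


def shells_match(first_line, shell_path):
    """True when the script's interpreter is the same program as shell_path."""
    return shebang_interpreter(first_line) == os.path.basename(shell_path)


print(shebang_interpreter("#!/bin/tcsh -f"))
print(shells_match("#!/bin/tcsh -f", "/bin/csh"))
```

To apply it, read the first line of the emf run script and pass it together with os.environ.get("SHELL", ""). A mismatch does not always cause an error, but it is the first thing to rule out for this message.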

2. Incompatible pre-compiled SMOKE executables

Examples of error messages:

./platform_2020ha2/smoke5.0/Linux2_x86_64ifort/smkinven/error while loading shared libraries: libiomp5.so: cannot open shared object file: No such file or directory

Please verify that both the operating system and the processor support Intel(R) X87, CMOV, MMX, FXSAVE, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, MOVBE, POPCNT, F16C, AVX, FMA, BMI, LZCNT and AVX2 instructions.

Solution: SMOKE and IOAPI need to be compiled for the system on which they run. Follow the SMOKE Installation Instructions, then modify directory_definitions.csh so that SMOKE_LOCATION and IOAPI_LOCATION point to the custom installation paths.
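Before recompiling, it can help to confirm which shared libraries an executable cannot find. The sketch below parses output in the style of the Linux ldd command and lists the libraries reported as "not found". The function name `missing_shared_libs` and the sample text are illustrative; the sample was not captured from a real system.

```python
def missing_shared_libs(ldd_output):
    """Return the library names that ldd-style output reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            # The library name is the text before '=>'.
            missing.append(line.split("=>")[0].strip())
    return missing


# Illustrative ldd-style output, for demonstration only.
sample = (
    "\tlibiomp5.so => not found\n"
    "\tlibm.so.6 => /lib64/libm.so.6 (0x00007f0000000000)\n"
)
missing = missing_shared_libs(sample)
print(missing)
```

On a real system, the input text could come from running ldd on the executable, for example with subprocess.run(["ldd", path], capture_output=True, text=True).stdout. An empty result does not rule out the unsupported-instruction error, which depends on the processor rather than on missing libraries.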