Open Source Data

The dGen team at NREL has compiled and uploaded data needed to run dGen, along with descriptions of the data, to this Open Energy Data Initiative (OEDI) submission: https://data.openei.org/submissions/1931

High-Level Data Description:

Template PostgreSQL Database:

diffusion_config - This schema contains tables governing the possible configurations present in the input sheet.
diffusion_iso_rto_data - This schema contains tables related to county-state-ISO/RTO topology mapping, ISO/RTO load profiles by zone and sector, as well as participation factors.
diffusion_load_profiles - This schema contains tables relating to the load profiles used by agents generated by the NREL Buildings team.
diffusion_mapping - This schema contains additional tables related to county-state-ISO/RTO-NERC topology mapping as well as existing installed capacity.
diffusion_resource_solar - This schema contains a table, solar_resource_hourly, which contains the solar capacity factor for a given geographic-azimuth-tilt combination that matches to the same geographic-azimuth-tilt combination found in the pre-generated agents pickle file.
diffusion_shared - This schema contains tables used for inputs in the input sheet. The table names describe the data they hold, so browsing them is a good way to see what is available.
diffusion_solar - This schema contains tables with additional data pertaining to modeling solar constraints, incentives, and costs.
diffusion_template - This schema contains tables that are copied to make a new schema upon completing a dGen model run. Many of these are populated with data from the input sheet, from various joins/functions done within the database, and from the model run itself.
diffusion_wind - This schema contains tables with additional data pertaining to modeling wind constraints, incentives, and costs. This schema is unsupported and is not used in the alpha version of the model.
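
To browse what a schema contains, one option is to query PostgreSQL's standard information_schema.tables view. The sketch below only builds the query and its parameters; the helper name list_tables_query is hypothetical, and running the query assumes you have a connection to your restored copy of the database through whichever DB-API driver you use.

```python
# Schema names taken from the template database description above.
DGEN_SCHEMAS = (
    "diffusion_config",
    "diffusion_iso_rto_data",
    "diffusion_load_profiles",
    "diffusion_mapping",
    "diffusion_resource_solar",
    "diffusion_shared",
    "diffusion_solar",
    "diffusion_template",
    "diffusion_wind",
)


def list_tables_query(schema):
    """Return (sql, params) that list the tables in one schema.

    Hypothetical helper, not part of dGen. information_schema.tables
    is a standard PostgreSQL view.
    """
    if schema not in DGEN_SCHEMAS:
        raise ValueError(f"unknown dGen schema: {schema!r}")
    sql = (
        "SELECT table_name FROM information_schema.tables "
        "WHERE table_schema = %s ORDER BY table_name"
    )
    return sql, (schema,)


sql, params = list_tables_query("diffusion_shared")
# With a DB-API cursor that uses %s placeholders: cursor.execute(sql, params)
```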

Pre-Generated Agents & Load Profiles:

Every dGen analysis starts with a base agent file of statistically sampled agents meant to be comprehensive and representative of the modeled population. They are comprehensive in the sense that they are intended to represent the summation of underlying statistics, e.g., the total retail electricity consumed in the state. They are representative in that agents are sampled to represent the heterogeneity of the population, e.g., variance in the cost of electricity. As described in (Sigrin et al. 2018), “during agent creation, each county in United States is seeded with sets of residential, commercial, and industrial agents, each instantiated at population-weighted random locations within the county’s geographic boundaries. Agents are referenced against geographic data sets to establish a load profile, solar resource availability, a feasible utility rate structure, and other techno-economic attributes specific to the agent’s location. Each agent is assigned a weight that is proportional to the number of customers the agent represents in its county. In this context, agents can be understood as statistically representative population clusters and do not represent individual entities.”

Overview of variables/column names in agent files:

Pgid
- A unique geographic block identifier specific to how dGen agents are generated
Tract_id_alias
- An additional geographic code corresponding to pgid
County_id
- County identifier; a reference spreadsheet is to be included on GitHub or the wiki
Bin_id
- A unique ID corresponding to a unique agent_id, used for initial sampling of agents.
State_abbr
- The state abbreviation for the state that an agent_id is within (e.g. CA for California, DE for Delaware, etc.)
Census_division_abbr
- Abbreviations of EIA U.S. Census Divisions. https://www.eia.gov/outlooks/aeo/pdf/f1.pdf
- Not used in model, but this was kept to provide geographic context to the agents
customers_in_bin_initial
- Number of buildings represented by a particular agent
load_kwh_per_customer_in_bin_initial
- The total load, in kWh, for a single building represented by a particular agent
load_kwh_in_bin_initial
- Total load, in kWh, for all of the buildings represented by a particular agent
- I.e., load_kwh_per_customer_in_bin_initial multiplied by customers_in_bin_initial (the agent weight).
max_demand_kw
- The maximum demand, in kW, assessed from the agent's 8760-hour load profile
avg_monthly_kwh
- Average load consumed by an agent on a monthly basis
crb_model
- The building class (e.g. hospital, single family home, multi-family home, office building, etc.)
Hdf_load_index
- Internal code formerly used for binning agents in specific load regions
- No longer used, but kept for internal context
owner_occupancy_status
- Integer
- Either a 1 or a 2, indicating owner or renter, respectively
cap_cost_multiplier
- Regional capital cost added to capital cost assessed in the Annual Technology Baseline (ATB).
- Regional system cost estimates are sourced from https://www.nrel.gov/docs/fy19osti/72399.pdf
solar_re_9809_gid
- Internal geographic identifier used to assess the hourly solar resource for an agent
- Data source: NSRDB (National Solar Radiation Database)
Tilt
- Tilt angle of roof/solar system
Azimuth
- Azimuth angle of the roof/solar system
developable_roof_sqft
- Final assessed area of an agent’s roof that is potentially developable for PV
pct_of_bldgs_developable
- Percentage of buildings that have roofs that are potentially developable
bldg_size_class
- Small is classified as less than 5000 square feet
- Medium is classified as greater than or equal to 5000 and less than 25000 square feet
- Large is classified as greater than or equal to 25000 square feet
Sector_abbr
- The sector that an agent is representing (commercial or residential)
Sector_tech
- The technology the agent is representing. Defaults to ‘solar’ for the beta release.
roof_adjustment
- The adjustment factor used to assess the final developable_roof_sqft value. This incorporates adjustments for setting back PV systems half a meter from the edge of a roof as well as shading and obstructions of roofs as assessed by LiDAR data.
Tariff_name
- Name of applied tariff
- Data Source: Utility Rate Database https://openei.org/apps/USURDB/rate/view/5580540c5457a372516c88a6
tariff_dict
- Data structure (a dictionary) used to store tariff related information
Tariff_id
- A dGen specific identifier used for various tariffs
Eia_id
- Corresponds to the unique utility id associated with a region that a given agent is in.
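
The relationships among a few of these columns can be checked directly. The sketch below uses a plain dictionary with made-up values in place of a row from the agent file. The function names are hypothetical, the lowercase size-class labels are an assumption, and the thresholds are the ones listed under bldg_size_class.

```python
import math


def total_bin_load(agent):
    """Per-building load multiplied by the agent weight (buildings represented)."""
    return (
        agent["load_kwh_per_customer_in_bin_initial"]
        * agent["customers_in_bin_initial"]
    )


def size_class(sqft):
    """Classify a building area in square feet using the documented thresholds."""
    if sqft >= 25000:
        return "large"
    if sqft >= 5000:
        return "medium"
    return "small"


# Made-up example row; real rows come from the pre-generated agent file.
agent = {
    "customers_in_bin_initial": 120.0,
    "load_kwh_per_customer_in_bin_initial": 10500.0,
    "load_kwh_in_bin_initial": 1260000.0,
}

load_is_consistent = math.isclose(
    total_bin_load(agent), agent["load_kwh_in_bin_initial"]
)
```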

Load Profiles:

Sampling methods were used to select representative buildings for different regions of the U.S. The sampled buildings’ energy use was calculated using EnergyPlus, a physics-based building energy simulation software. The EnergyPlus simulation results were then validated by comparing against the EIA 2009 Residential Energy Consumption Survey (RECS) and the 2012 Commercial Building Energy Consumption Survey (CBECS) datasets and state energy use estimates from Federal Energy Regulatory Commission filings.

After geographic locations are sampled, each agent is assigned an annual electricity consumption as well as a building type, roof style, owner occupancy status, and weight. As described in Sigrin et al. 2018, “electrical load is further constrained to match county-level customer counts and annual loads from (Ventyx 2012). To ensure county-level load constraints are maintained, the sampled electric-load values are treated as intensity measures rather than absolute values, and agent-level demand is scaled appropriately. Though this method is inexact, it preserves agent-level variability in consumption and ensures county-level aggregate consumption is accurate. Hourly normalized residential and commercial load profiles are used to scale the agent’s annual consumption based on local weather patterns. Profiles are simulated by weather station (n = 79) for 15 commercial building types and one residential building type (Davidson et al. 2015; Ong, Denholm, and Clark 2012).” Annual retail consumption in each ISO/RTO territory was calibrated based on 2018 data published by each respective ISO/RTO, including future load growth.
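
The county-level constraint described in the quoted passage can be illustrated as a simple rescaling: sampled loads are treated as relative intensities and multiplied by a common factor so that the weighted sum matches the county total. This is a minimal sketch with made-up numbers, not the dGen implementation; the function name and inputs are hypothetical.

```python
import math


def scale_to_county_total(sampled_kwh, weights, county_total_kwh):
    """Rescale per-customer loads so the weighted sum equals the county total.

    sampled_kwh: sampled annual load per customer for each agent
    weights: number of customers each agent represents
    """
    weighted_sum = sum(k * w for k, w in zip(sampled_kwh, weights))
    if weighted_sum <= 0:
        raise ValueError("weighted sampled load must be positive")
    factor = county_total_kwh / weighted_sum
    # A single common factor preserves the relative variation between agents.
    return [k * factor for k in sampled_kwh]


sampled = [8000.0, 12000.0, 20000.0]   # made-up sampled intensities
weights = [50.0, 30.0, 20.0]           # made-up agent weights
county_total = 2_320_000.0             # made-up county load constraint

scaled = scale_to_county_total(sampled, weights, county_total)
scaled_total = sum(k * w for k, w in zip(scaled, weights))
```

Because every agent is multiplied by the same factor, the ratio between any two agents' loads is unchanged, which is how agent-level variability survives the adjustment.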

Developable Roof Characteristics:

Roof characteristics, including unshaded area, tilt, and azimuth, vary by building and are an important input to rooftop solar technical potential. The dGen model relies on LiDAR-based aerial imagery (Gagnon et al. 2016) to attribute agent roof characteristics. As described further in (Sigrin et al. 2016), the LiDAR scans are processed to develop joint probability distributions of developable area, tilt, and azimuth for small, medium, and large buildings in the 128 cities canvassed by (Gagnon et al. 2016). Weighted random samples are drawn from these joint distributions to assign each agent its developable roof area, tilt, and azimuth. Finally, in a process similar to the load profile sampling, total developable roof area is linearly scaled to constrain county-sector developable area to estimates made by (Gagnon et al. 2016).
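
The two steps in this paragraph, a weighted draw from a joint distribution followed by linear scaling, might look like the following. The distribution rows, probabilities, agent weights, and county target are all invented for illustration, and the function names are hypothetical; only the overall procedure follows the text.

```python
import math
import random

# Invented joint distribution: ((developable sqft, tilt, azimuth), probability).
JOINT = [
    ((400.0, 25, 180), 0.5),
    ((250.0, 15, 90), 0.3),
    ((600.0, 35, 270), 0.2),
]


def sample_roofs(n, rng):
    """Draw n (area, tilt, azimuth) rows, weighted by their probabilities."""
    rows = [row for row, _ in JOINT]
    probs = [p for _, p in JOINT]
    return rng.choices(rows, weights=probs, k=n)


def scale_roof_area(areas, agent_weights, county_target_sqft):
    """Linearly scale areas so the weighted total equals the county target."""
    total = sum(a * w for a, w in zip(areas, agent_weights))
    factor = county_target_sqft / total
    return [a * factor for a in areas]


rng = random.Random(0)                      # seeded for repeatability
draws = sample_roofs(4, rng)
areas = [area for area, _tilt, _azimuth in draws]
agent_weights = [10.0, 20.0, 5.0, 15.0]     # made-up agent weights
county_target = 30000.0                     # made-up county-sector estimate

scaled_areas = scale_roof_area(areas, agent_weights, county_target)
scaled_total = sum(a * w for a, w in zip(scaled_areas, agent_weights))
```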

Solar Generation Profile:

As described in (Sigrin et al. 2018), “Solar resource data are sourced from the NSRDB 10-km Gridded Hourly Solar Database (George, R et al. 2007), which consists of hourly solar radiation estimates for approximately 91,500 grid cells in the continental United States at a 10-km2 spatial resolution. The hourly radiation values for each grid cell are based on typical meteorological year data (TMY3)(DOE 2013) from 1998 through 2005.” The hourly irradiance data are processed in the NREL PVWatts v5 tool (Dobos 2014) to develop hourly solar generation profiles using the agent’s tilt, azimuth, and location.

Model Calibration:

Aggregates from the refUSA consumer dataset and American Community Survey (ACS) data were used to calibrate estimates of PV system adoption. The data dictionary for the refUSA dataset can be found in the "2020 Full Historical Consumer Layout 5.12.20.xlsx" Excel sheet in the OEDI data submission. Variable descriptions and ACS data can be found at https://api.census.gov/data/2018/acs/acs5/groups.html.

Retail Tariffs:

As described in (Sigrin et al. 2018), “utility rates are sourced from the NREL’s Utility Rate Database. Rates are assigned to agents based on a ranking algorithm that considers their location, sector, and voltage limits. If multiple tariffs are available, agents are assigned the tariff with the lowest cost of energy from those available to their class. Also, in the case of missing utility coverage, agents are supplied rates from the nearest covered utility.” The set of retail rates was curated by NREL staff and represents tariffs as of March 2019. Curation steps include transcribing utility tariff sheets into a consistent digital format, indicating the most-subscribed residential and commercial tariff for each utility, and checking for errors. Despite NREL staff's best efforts, some rates are erroneously transcribed; for this reason, rates whose average cost ($/kWh) differs by more than ±50% from the state average price by sector (EIA 2018) are excluded.
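
As a rough illustration of the exclusion rule and the lowest-cost selection, the sketch below filters candidate tariffs against a ±50% band around the state average and then picks the cheapest survivor. It covers only these two steps and not the location, sector, or voltage ranking; the function names, record fields, and prices are hypothetical.

```python
def within_band(avg_cost, state_avg, tolerance=0.5):
    """True if avg_cost is within +/- tolerance of the state average price."""
    return abs(avg_cost - state_avg) <= tolerance * state_avg


def pick_tariff(candidates, state_avg):
    """Drop out-of-band tariffs, then return the one with the lowest cost."""
    kept = [t for t in candidates if within_band(t["avg_cost"], state_avg)]
    if not kept:
        return None
    return min(kept, key=lambda t: t["avg_cost"])


# Made-up candidates for one agent, average cost in $/kWh.
candidates = [
    {"tariff_name": "A", "avg_cost": 0.04},  # more than 50% below: excluded
    {"tariff_name": "B", "avg_cost": 0.10},
    {"tariff_name": "C", "avg_cost": 0.15},
    {"tariff_name": "D", "avg_cost": 0.20},  # more than 50% above: excluded
]
state_avg = 0.12  # made-up state average price for the agent's sector

chosen = pick_tariff(candidates, state_avg)
```

Note that tariff A is the cheapest overall but is rejected as a likely transcription error, so the filter has to run before the lowest-cost selection.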

Using the Data:

After navigating to https://data.openei.org/submissions/1931 and downloading the data one file at a time, unzip any zipped files and move them to an easily accessible location in your home directory. Do not put these files in the cloned repository directory: because of their size, you will not be able to push changes to GitHub. For the same reason, the agent file that you move into the input_agents folder within the cloned repository for a model run must be removed before you push any other changes to your branch.
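
One way to keep this workflow tidy is to copy the agent file into input_agents for a run and delete the copy afterwards, leaving the original in your data directory. The sketch below demonstrates this with temporary directories standing in for the real locations; the helper names and the file name are hypothetical, and the actual path to input_agents depends on where you cloned the repository.

```python
import shutil
import tempfile
from pathlib import Path


def stage_agent_file(agent_file, input_agents_dir):
    """Copy an agent file into the input_agents folder; return the copy's path."""
    dest_dir = Path(input_agents_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(agent_file, dest_dir))


def unstage_agent_file(staged_path):
    """Remove the staged copy so it is not pushed with other changes."""
    Path(staged_path).unlink(missing_ok=True)


# Demonstration only: temporary directories replace the real folders.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "dgen_data"          # stands in for your download folder
    input_agents = Path(tmp) / "repo" / "input_agents"
    data_dir.mkdir()
    source = data_dir / "agents_example.pkl"    # hypothetical file name
    source.write_bytes(b"placeholder")

    staged = stage_agent_file(source, input_agents)
    existed_after_staging = staged.exists()

    # ... the model run would happen here ...

    unstage_agent_file(staged)
    removed_before_push = not staged.exists()
    original_kept = source.exists()
```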
