This data repository hosts datasets covering China's road and rail transportation networks. These datasets are compiled in The Distributional Impacts of Transportation Networks in China by Lin Ma and Yang Tang, published in the Journal of International Economics. The published version of the paper is here, and the ungated working paper version is here.
To use the data, please cite:
"Lin Ma and Yang Tang. The Distributional Impacts of Transportation Networks in China. Journal of International Economics (2024): 103873."
BibTeX:
@article{MT2024_Transportation_China,
title = {The Distributional Impacts of Transportation Networks in China},
journal = {Journal of International Economics},
pages = {103873},
year = {2024},
issn = {0022-1996},
doi = {https://doi.org/10.1016/j.jinteco.2023.103873},
url = {https://www.sciencedirect.com/science/article/pii/S0022199623001599},
author = {Lin Ma and Yang Tang},
keywords = {Regional trade, Migration, Welfare, Economic Geography},
abstract = {We document that the quality of roads and railroads vary substantially over time and space in China, and neglecting these variations biases the distributional impacts of transportation networks. To account for quality differences, we construct a new panel dataset and approximate quality using the design speed of roads and railroads that varies by vintage, class, and terrain at the pixel level. We then build a dynamic spatial general equilibrium model for multiple modes, transportation routes, and forward-looking migration decisions. Our findings demonstrate that disregarding quality differences leads to a median bias of approximately 31% in estimating real wage growth rates at the prefecture level. Moreover, this bias is non-random and correlates with the initial conditions of the prefectures, resulting in significant errors when predicting the distributional effects of transportation networks.}
}
This dataset contains three components:
- Prefecture-to-Prefecture Travel Time;
- Pixel-Level Information: Design Speed and Travel Time;
- Segment-Level Information: Name, Rate, Year of Construction, and Applicable Design Code.
The current version covers the transportation network between 1994 and 2017.
The folder sample_codes contains sample code for computing the travel time between any two pixels using the Fast Marching algorithm. Please refer to the readme file in that folder for details.
This data set records the prefecture-to-prefecture travel time in units of hours for three modes of transportation: road, railroad (freight), and railroad (passenger). The current version covers the years 1994 to 2017 and contains 279 prefectures.
All the files are stored in the folder pref_pair.
The travel time is computed using the Fast Marching algorithm based on the pixel-level design speed. Ma and Tang (2024) provide more details on how to infer the pixel-level design speed and prefecture-to-prefecture travel time.
The dataset contains the following files.
`cityinfo.csv` contains basic information about the 279 prefectures:
- `id`: the index of a prefecture.
- `dzcode`: the four-digit administrative division code.
- `coord_long`: the longitude.
- `coord_lat`: the latitude.
- `pos_x`: the $x$ index in the 12669-by-8829 pixel-level matrix dataset.
- `pos_y`: the $y$ index in the 12669-by-8829 pixel-level matrix dataset.
- `cityname_chn`: prefecture name in Chinese.
- `cityname_eng`: prefecture name in English.
- `cpop2000`: total population in units of ten thousand people, 2000 census.
- `cpop2010`: total population in units of ten thousand people, 2010 census.
- `upop2000`: urban population in units of ten thousand people, 2000 census.
- `upop2010`: urban population in units of ten thousand people, 2010 census.
- `cityclass`: official city size classification based on the 2010 census:
  - `cityclass` = 7: Mega City (超大城市), with an urban population greater than 10 million.
  - `cityclass` = 6: Major City (特大城市), with an urban population between 5 and 10 million.
  - `cityclass` = 5: Type-I Large City (I型大城市), with an urban population between 3 and 5 million.
  - `cityclass` = 4: Type-II Large City (II型大城市), with an urban population between 1 and 3 million.
  - `cityclass` = 3: Medium City (中等城市), with an urban population between 500 thousand and 1 million.
  - `cityclass` = 2: Type-I Small City (I型小城市), with an urban population between 200 and 500 thousand.
  - `cityclass` = 1: Type-II Small City (II型小城市), with an urban population smaller than 200 thousand.
The following files contain the travel-time matrices:
- `pref_pair/time_cost_prefecture_pair_rail_good.csv` contains the travel time for freight transportation on the rail network.
- `pref_pair/time_cost_prefecture_pair_rail_pass.csv` contains the travel time for passenger transportation on the rail network.
- `pref_pair/time_cost_prefecture_pair_road.csv` contains the travel time for both freight and passenger transportation on the road network.
The travel time data files share the same structure. Each file contains:
- `origin` and `destination`: the first two columns, which hold the four-digit admin codes of the origin and destination prefectures.
- `year_yyyy`: the travel time, in hours, between the two prefectures in the year `yyyy`.
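As a minimal sketch of how this wide layout can be read, the Python snippet below parses a tiny inline sample with the same columns and looks up one origin-destination-year entry. The sample rows, the admin codes, and the helper names `load_travel_time` and `travel_time` are made up for illustration; with the real data, the `io.StringIO` object would be replaced by an open handle to one of the files in pref_pair.

```python
import csv
import io

# Made-up sample in the documented layout (origin, destination, year_yyyy).
# The codes and hours are illustrative and are NOT taken from the dataset.
SAMPLE = """origin,destination,year_1994,year_2017
1100,3100,20.5,12.0
3100,1100,20.5,12.0
"""


def load_travel_time(handle):
    """Return {(origin, destination): {year: hours}} from a wide CSV file."""
    table = {}
    for row in csv.DictReader(handle):
        key = (row["origin"], row["destination"])
        table[key] = {
            int(col[len("year_"):]): float(val)
            for col, val in row.items()
            if col.startswith("year_")
        }
    return table


def travel_time(table, origin, destination, year):
    """Look up the travel time in hours for one prefecture pair and year."""
    return table[(origin, destination)][year]


times = load_travel_time(io.StringIO(SAMPLE))
print(travel_time(times, "1100", "3100", 2017))
```

The admin codes are kept as strings so that any leading zeros survive the round trip.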
This data set contains the design speed of the roads and railroads on each pixel and the travel time to traverse these pixels.
The travel time estimations between any two points, including the prefecture-to-prefecture travel time database reported above, are computed based on this data set using the Fast Marching Algorithm. The folder sample_codes contains examples of using these data to compute travel time between any two pixels.
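To give a feel for how per-pixel traverse times add up to a point-to-point travel time, the sketch below runs Dijkstra's algorithm on a toy grid. This is not the Fast Marching algorithm used by the authors (their implementation is in sample_codes); Dijkstra on a 4-connected grid is swapped in here for brevity and will generally give different numbers. The function name `shortest_times`, the toy grid, and the modeling choices (only listed pixels are passable, entering a pixel costs its `time`, the source pixel costs nothing) are all assumptions.

```python
import heapq


def shortest_times(pixel_time, source):
    """Dijkstra over 4-connected pixels.

    pixel_time maps (pos_x, pos_y) -> hours needed to traverse that pixel.
    Returns {pixel: hours from source} for every reachable pixel.
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        elapsed, (x, y) = heapq.heappop(queue)
        if elapsed > best[(x, y)]:
            continue  # stale queue entry, a shorter route was already found
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt not in pixel_time:
                continue  # no infrastructure recorded on that pixel
            candidate = elapsed + pixel_time[nxt]
            if candidate < best.get(nxt, float("inf")):
                best[nxt] = candidate
                heapq.heappush(queue, (candidate, nxt))
    return best


# Toy grid (hours per pixel): a fast corridor along pos_y = 1, slower pixels
# at pos_y = 2, and one isolated pixel that cannot be reached.
grid = {
    (1, 1): 0.01, (2, 1): 0.01, (3, 1): 0.01,
    (1, 2): 0.05, (2, 2): 0.05, (3, 2): 0.01,
    (9, 9): 0.01,
}
result = shortest_times(grid, (1, 1))
print(result[(3, 2)])
```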
All the files are stored in the folder pixel_info. The data files are named `pixel_info_MMMM_YYYY.csv`, where `MMMM` refers to one of the three modes of transportation (road, railroad (freight), or railroad (passenger)) and `YYYY` refers to the year.
Each row of the data file refers to a pixel with infrastructure build-up. The columns contain the following variables:
- `seg_id`: the unique index of the segment to which the pixel belongs. The `seg_id` is the same as in the segment-level dataset; see that dataset's description for details.
- `long`: the longitude of the pixel.
- `lat`: the latitude of the pixel.
- `pos_x`: the $x$ index in the 12669-by-8829 pixel-level matrix dataset.
- `pos_y`: the $y$ index in the 12669-by-8829 pixel-level matrix dataset.
- `index`: the index number of the pixel in the 12669-by-8829 matrix. The index number is the output of the following MATLAB function: `index=sub2ind([8829 12669],pos_y,pos_x)`
- `speed`: the design speed of the infrastructure on the pixel in kilometers per hour.
- `time`: the time required to traverse the pixel, in hours. See the note below on its computation.
- `usage_type`: the usage type of the infrastructure, which takes three values:
  - `both` refers to mixed freight and passenger transportation usage. All road transportation and the majority of railroads fall into this category.
  - `good` refers to freight-only railroad transportation.
  - `pass` refers to passenger-only railroad transportation.
- `terrain`: the terrain type of the pixel, which takes five values. Refer to the Appendix to Ma and Tang (2024) for the terrain definition:
  - `0`: coastal areas.
  - `1`: plains.
  - `2`: low-rolling hills.
  - `3`: hills.
  - `4`: mountains.
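For readers who do not work in MATLAB, the `index` variable can be reproduced with plain arithmetic, because `sub2ind` uses 1-based, column-major ordering. The Python sketch below is an illustration; the function names mirror MATLAB's for readability but are defined here, and it assumes `pos_x` and `pos_y` are 1-based, as `sub2ind` requires.

```python
N_ROWS = 8829    # matrix rows: pos_y runs from 1 to 8829
N_COLS = 12669   # matrix columns: pos_x runs from 1 to 12669


def sub2ind(pos_y, pos_x):
    """1-based, column-major linear index, as in sub2ind([8829 12669], pos_y, pos_x)."""
    if not (1 <= pos_y <= N_ROWS and 1 <= pos_x <= N_COLS):
        raise ValueError("subscript out of range")
    return (pos_x - 1) * N_ROWS + pos_y


def ind2sub(index):
    """Inverse of sub2ind: return (pos_y, pos_x) for a 1-based linear index."""
    if not 1 <= index <= N_ROWS * N_COLS:
        raise ValueError("index out of range")
    col_offset, row_offset = divmod(index - 1, N_ROWS)
    return row_offset + 1, col_offset + 1


# Illustrative subscripts, not taken from the dataset.
idx = sub2ind(123, 4567)
print(idx, ind2sub(idx))
```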
Notes:
- To compute `time` from `speed`, the authors used the following equation: $time = 0.5(1+\sqrt{2}) \times distance / speed$. The variable $distance$ is computed based on the average distance to move to the four adjacent pixels. In most cases, the distance equals 0.5097 km. The term $0.5(1+\sqrt{2})$ corrects for the fact that around half the time, travelers cross a pixel along the diagonal.
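The equation in the note can be evaluated directly. The Python sketch below is only an illustration: the function name `traverse_time` is hypothetical, the default distance is the typical 0.5097 km mentioned above (the actual distance varies by pixel), and the 120 km/h speed is an example value.

```python
import math

DIAGONAL_FACTOR = 0.5 * (1 + math.sqrt(2))  # the 0.5(1 + sqrt(2)) term
TYPICAL_DISTANCE_KM = 0.5097                # typical value; varies by pixel


def traverse_time(speed_kmh, distance_km=TYPICAL_DISTANCE_KM):
    """Hours needed to traverse one pixel: 0.5(1 + sqrt(2)) * distance / speed."""
    if speed_kmh <= 0:
        raise ValueError("speed must be positive")
    return DIAGONAL_FACTOR * distance_km / speed_kmh


# Example: a pixel with a design speed of 120 km/h (illustrative value).
print(traverse_time(120.0))
```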
This data set records the information for roads and railroads at the segment level. A segment is a group of pixels that form part of a named road or railroad referred to as a path (such as the Beijing-Shanghai Railway or Beijing-Shijiazhuang Highway) constructed in a given year. Information that varies at the segment level includes rate, the year of construction, usage type, and design codes. See the detailed definitions of "segment" and "path" in Ma and Tang (2024).
All the files are stored in the folder seg_info. We create three data files for each mode of transportation. These files are as follows:
- segment-level information: year of construction, applicable standard, rates, and parent path.
- segment-year level information: usage type by year.
- segment-pixel mapping: the coordinates of each pixel within a segment.
The file `seg_info_MMMM.csv` contains the segment-level information for the mode (`MMMM` = road or rail). Each row in the file refers to a segment. The columns are as follows:
- `seg_id`: The unique id of the segment, which is the same as the one used in the pixel-level dataset.
- `rate`: The rate of the segment, such as "National I" for railroads or "First-Rate" for roads. See the subsection below for more details.
- `year`: The year of construction.
- `year_std`: The publication year of the applicable design code. For example, highways built in 2010 were subject to the highway design codes published in 2003. In this case, `year` = 2010 and `year_std` = 2003. We did not assign `year_std` for HSR because we collected the design speed of each high-speed railway by hand. See the subsection below for more details.
- `path_name`: (in Chinese) The name of the path to which the segment belongs. For example, segment `rail_10` is part of the "滨绥铁路," and therefore we record the name of the path in this variable.
- `path_supplement`: (in Chinese) Additional information about the path, such as original names, original usage, phases, etc. In the example above, the supplementary information for `rail_10` is "原东清铁路东线."
- `notes`: (in Chinese) Additional information about the segment, usually regarding the segment's endpoints. We record this information for quality control purposes. In the example above, the note for `rail_10` is "海林到牡丹江."
The variable `rate` in the dataset above records a codename for the rate of railroads and roads. The details regarding the railroad rates in the dataset are as follows:

`rate` | Full Name | Full Name in Chinese | Usage Type |
---|---|---|---|
GT1 | National I | 国家I级铁路,客货共线 | mixed |
GT2 | National II | 国家II级铁路,客货共线 | mixed |
GT3 | National III | 国家III级铁路,客货共线 | mixed |
GT4 | National IV | 国家IV级铁路, 客货共线(2012标准后) | mixed |
GT1G | National I (Freights) | 国家I级铁路,货运 | freights |
GT2G | National II (Freights) | 国家II级铁路,货运 | freights |
GT3G | National III (Freights) | 国家III级铁路,货运 | freights |
GT | National Rail, Unknown Rate | 国家铁路, 等级未知 | mixed |
HSR | High-Speed Rail | 高速铁路 | passenger |
DT1 | Local I | 地方I级铁路,客货共线 | mixed |
DT2 | Local II | 地方II级铁路,客货共线 | mixed |
DT3 | Local III | 地方III级铁路,客货共线 | mixed |
DT1G | Local I (Freights) | 地方I级铁路,货运 | freights |
DT | Local Rail, Unknown Rate | 地方铁路, 等级未知 | mixed |
IT1 | Industrial I | 工业企业I级铁路 | freights |
IT2 | Industrial II | 工业企业II级铁路 | freights |
IT3 | Industrial III | 工业企业III级铁路 | freights |
IT | Industrial Rail, Unknown Rate | 工业企业铁路, 等级未知 | freights |
UN | Unknown Rate | 等级未知 | mixed |
NA | Non-Active | 废置铁路 | - |
The road rates in the dataset are as follows:
`rate` | Full Name | Full Name in Chinese | Usage Type |
---|---|---|---|
highway | Highway | 高速公路 | mixed |
first-rate | First-Rate Road | 国家I级公路 | mixed |
The variable `year_std` records the publication year of the applicable design standard for a given segment. The detailed design standards are as follows:

`year_std` | Full Name | Full Name in Chinese | Standard Code | Mode |
---|---|---|---|---|
1985 | Code for Design of Railway Line (1985) | 铁路线路设计规范 | GBJ90-85 | rail |
1987 | Code for Design of Standard Railway Line for Industrial Firms | 工业企业标准轨距铁路线路设计规范 | GBJ12-87 | rail |
1999 | Code for Design of Railway Line (1999) | 铁路线路设计规范 | GB50090-99 | rail |
2006 | Code for Design of Railway Line (2006) | 铁路线路设计规范 | GB50090-2006 | rail |
2012 | Code for Design of III and IV Rated Railway Line | III、IV级铁路设计规范 | GB50012-2012 | rail |
2017 | Code for Design of Railway Line (2017) | 铁路线路设计规范 | TB10098-2017 | rail |
1988 | Technical Standard of Highway Engineering (1988) | 公路工程技术标准 | JTJ01-88 | road |
1997 | Technical Standard of Highway Engineering (1997) | 公路工程技术标准 | JTJ01-97 | road |
2003 | Technical Standard of Highway Engineering (2003) | 公路工程技术标准 | JTG B01-2003 | road |
2014 | Technical Standard of Highway Engineering (2014) | 公路工程技术标准 | JTG B01-2014 | road |
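The road rows of the table, together with the earlier example (a highway built in 2010 falls under the 2003 standard), suggest a simple lookup rule. The Python sketch below assumes that the applicable standard is the most recent one published in or before the year of construction. That rule and the function name `applicable_road_standard` are assumptions made for illustration; the `year_std` column in the data is the authoritative value and may differ, for instance around publication years.

```python
import bisect

# Publication years of the road standards listed in the table above.
ROAD_STANDARD_YEARS = [1988, 1997, 2003, 2014]


def applicable_road_standard(year_built):
    """Latest road standard published in or before year_built, or None if there is none.

    Assumption: a standard applies from its publication year onward.
    """
    pos = bisect.bisect_right(ROAD_STANDARD_YEARS, year_built)
    return ROAD_STANDARD_YEARS[pos - 1] if pos else None


print(applicable_road_standard(2010))
```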
The file `seg_year_MMMM.csv` contains the segment-year level information. In the current version, the only variable that varies at this level is the usage type of railroads. Several railroads switched between "mixed-use" and "freight-only" during our sample period. Each row in the file refers to a segment. The columns are as follows:
- `seg_id`: the unique segment id.
- `year_type_YYYY`: the usage type of the segment in the year `YYYY`. `type = none` indicates that the road or railroad was no longer used that year.
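One use of this file is to list the segments whose usage type changed during the sample period. The Python sketch below shows one way to do so on a tiny inline sample. The sample rows, the segment ids, and the function name `usage_changes` are made up, and the cell values (`both`, `good`, `none`) are assumed to follow the codes described in this README.

```python
import csv
import io

# Made-up sample in the documented layout (seg_id, year_type_YYYY).
SAMPLE = """seg_id,year_type_1994,year_type_1995,year_type_1996
rail_1,both,both,good
rail_2,good,good,good
rail_3,both,none,none
"""


def usage_changes(handle):
    """Return {seg_id: [(year, old_type, new_type), ...]} for segments that changed."""
    changes = {}
    for row in csv.DictReader(handle):
        # Collect (year, usage type) pairs in chronological order.
        series = sorted(
            (int(col[len("year_type_"):]), val)
            for col, val in row.items()
            if col.startswith("year_type_")
        )
        switched = [
            (year, prev_type, cur_type)
            for (_, prev_type), (year, cur_type) in zip(series, series[1:])
            if prev_type != cur_type
        ]
        if switched:
            changes[row["seg_id"]] = switched
    return changes


changes = usage_changes(io.StringIO(SAMPLE))
print(changes)
```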
The file `seg_pixel_MMMM.csv` contains the mapping between segments and pixels for mode `MMMM`. Each row in the file refers to a pixel. The columns are as follows:
- `seg_id`: the unique id of the segment to which the pixel belongs.
- `long`: the longitude of the pixel.
- `lat`: the latitude of the pixel.
- `pos_x`: the $x$ index in the 12669-by-8829 raster map.
- `pos_y`: the $y$ index in the 12669-by-8829 raster map.
- `index`: the index number of the pixel in the 12669-by-8829 raster map. The index number is the output of the following MATLAB function: `index=sub2ind([8829 12669],pos_y,pos_x)`
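Combining the segment-level file with the segment-pixel mapping gives the pixels that carry infrastructure in a given year. The Python sketch below illustrates the join on made-up inline samples that include only the columns it needs. The segment ids, years, coordinates, and the function name `pixels_built_by` are hypothetical, and the sketch ignores segments that later fell out of use (`none` in the segment-year file).

```python
import csv
import io

# Made-up samples; only the columns used below are included.
SEG_INFO = """seg_id,rate,year
road_1,highway,1996
road_2,first-rate,2005
"""

SEG_PIXEL = """seg_id,pos_x,pos_y
road_1,5,7
road_1,6,7
road_2,7,7
"""


def pixels_built_by(seg_info, seg_pixel, year):
    """Return the set of (pos_x, pos_y) on segments constructed in or before `year`."""
    built = {
        row["seg_id"]
        for row in csv.DictReader(seg_info)
        if int(row["year"]) <= year
    }
    return {
        (int(row["pos_x"]), int(row["pos_y"]))
        for row in csv.DictReader(seg_pixel)
        if row["seg_id"] in built
    }


network_2000 = pixels_built_by(io.StringIO(SEG_INFO), io.StringIO(SEG_PIXEL), 2000)
print(sorted(network_2000))
```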