This repository presents a set of large-scale ridesharing Dial-a-Ride Problem (DARP) instances. The instances were created as a standardized set of ridesharing DARP problems for the purpose of benchmarking and comparing different solution methods.
The instances are based on real demand and realistic travel time data from 3 different US cities, Chicago, New York City and Washington, DC. The instances consist of real travel requests from the selected period, positions of vehicles with their capacities and realistic shortest travel times between all pairs of locations in each city.
The instances and results of two solution methods, the Insertion Heuristic and the optimal Vehicle-group Assignment method, can be found in the linked dataset.
The dataset and methodology used to create it are described in the paper Large-scale Ridesharing DARP Instances Based on Real Travel Demand. This paper was accepted to the Intelligent Transportation Systems Conference 2023 (see the PowerPoint presentation).
- Instances and Results download
- Instances structure
- Results structure
- Instance creation
- Citation
- License
The dataset of instances and associated results are available through the dataset repository Zenodo. The dataset is compressed by 7zip to adhere to the Zenodo dataset size limits, with some of the archives split into multiple parts. The distance matrices, instances, and results are in separate archives. However, the folder structure inside the archives follows the schema described below. Thus, unpacking the distance matrix archives places them in an appropriate directory in the Instances
folder. The dataset is licensed under the Creative Commons Attribution 4.0 International license.
The instances are organized into directories based on their parameters. That is, an instance in an area, with a given start time, duration and max delay
📁 Instances/<area>/
├── 🗎 dm.hd5
└── 📁instances/start_<start time>/duration_<duration>/max_delay_<max delay>/
├── 🗎 vehicles.csv
└── 🗎 requests.csv
and consists of three files, vehicles.csv
, requests.csv
and dm.h5
. The vehicles.csv
and requests.csv
files are the main instance files, while the dm.h5
file is the distance matrix file that represents the travel time model
The instance folder contains the two main instance files:
📁Instances/<area>/instances/start_<start time>/duration_<duration>/max_delay_<max delay>/
-
🗎 requests.csv
- a 3 (4) column<tab>
separated file containing the list of requests$R$ with a header defining the following columns:-
time_ms
- a request time in milliseconds from the start of the day$t$ -
origin
- index of the origin node$o$ . Used for indexing into the distance matrix -
dest
- index of the destination node$d$ -
min_travel_time
(optional) - direct, minimal travel time in seconds between origin and destination nodes
-
-
🗎 vehicles.csv
- a 2-column<tab>
separated file containing the set of vehicles$V$ with no header row and the following column meaning:- vehicle starting node
$s$ - vehicle capacity
$c$
- vehicle starting node
A concrete example of an instance path is Instances/NYC/instances/start_18-00/duration_05_min/max_delay_03_min/
.
🗎 Instances/<area>/dm.hd5
The travel time model hdf5
format. To load the distance matrix into Python, use h5py
python package. The loading of the distance matrix is implemented in the MatrixTravelTimeProvider.from_hdf
. Method get_travel_time(from_index, to_index)
implements the access to the distance matrix and is equivalent to
In addition to the main instance files, the instance and area folders contain several additional files holding metadata about the instance used for instance generation, visualization, or analysis. The list of the files with their location in the directory tree is below.
📁Instances/
├── 📁NYC/
│ └── ...
├── 📁Manhattan/
│ └── ...
├── 📁DC/
│ └── ...
└── 📁Chicago/
├── 🗎 dm.h5 # Area-specific distance matrix
├── 📁map/
│ ├── 🖺 nodes.csv # List of nodes present in the area
│ ├── 🖺 edges.csv # List of edges present in the area
│ └── 📁shapefiles/ # Area shapefiles for visualization
│ ├── 🗺 nodes.[shx, shp, prh, dbf, cpg]
│ └── 🗺 edges.[shx, shp, prh, dbf, cpg]
└── 📁instances/
├── 📁start_<time>/
│ ├── 📁duration_<duration>/
│ │ ├── 📁max_delay_<delay>/
│ │ │ ├── 🖺 config.yaml # Instance generation config file
│ │ │ ├── 🗎 requests.csv # Requests file
│ │ │ ├── 🗎 vehicles.csv # Vehicles file
│ │ │ ├── 🖺 sizing.csv # (optional) - file holding data on the instance sizing process
│ │ │ ├── 🖺 vehicles_pre_sizing.csv # (optional) - file holding data on the vehicles before the sizing process
│ │ │ └── 📁shapefiles/ # Instance shapefiles for visualization
│ │ │ ├── 🗺 vehicles.[shx, shp, prh, dbf, cpg]
│ │ │ ├── 🗺 pickup.[shx, shp, prh, dbf, cpg]
│ │ │ └── 🗺 dropoff.[shx, shp, prh, dbf, cpg]
│ │ └── ...
│ └── ...
└── ...
📁 Instances/<area>/instances/start_<start time>/duration_<duration>/max_delay_<max delay>/
-
🖺 config.yaml
contains metadata used in the instance generation. Notable fields are-
demand:
-
min_time
anddemand: max_time
give the interval for the demand used in the instance. The format isyy-mm-dd HH:MM:SS
in the local timezone.
-
-
max_prolongation
: the maximum delay$\Delta$ (in seconds) -
vehicles:
-
start_time
: The datetime of the start of the vehicle operation. The format is the same as for the demand interval. -
vehicle_capacity
- sets the capacity parameter$c$ for the instance generation -
vehicle_count
- sets the number of vehicles for the instance generation
-
-
map
: the object for the map configuration-
SRID
: The SRID of the map projection. Example:4326
(GPS) -
SRID_plane
: The SRID of the map planar projection. Example:32618
(UTM zone 18N)
-
-
-
🖺 sizing.csv
contains the results of the instance sizing, which is the step in the instance generation process that selects the number of vehicles for the instance so that the solution found by the insertion heuristic can service all requests in the instance. See the article for details. The file uses a comma as a separator and contains three columns with a header:-
vehicle_count
- the number of vehicles used at a given step of the sizing process -
dropped_requests
- the number of requests that cannot be serviced by the given number of vehicles when solved by the insertion heuristic -
interval_size
- the size of the interval-halving step used in the sizing process
-
📁 Instances/<area>/map/
🖺 nodes.csv
contains information about processed road network nodes in the area. The file uses<tab>
as a separator and contains four columns with a header:id
- node index in the distance matrixdb_id
- node id in the database that was used for the instance generationx
- node x coordinate in the map planar projectiony
- node y coordinate in the map planar projection
🖺 edges.csv
contains information about processed road network edges in the area, including the speed. The file uses<tab>
as a separator and contains six columns with a header:u
- from nodeid
v
- to nodeid
db_id_from
- from nodedb_id
db_id_to
- to nodedb_id
length
- length of the edge in metersspeed
- speed of the edge used in travel time calculations, in km/h.
Contains area and instance files for visualization in e.g. Q-GIS
📁 Instances/<area>/map/shapefiles/
🗺 nodes.[shx, shp, prh, dbf, cpg]
🗺 edges.[shx, shp, prh, dbf, cpg]
📁 Instances/<area>/instances/start_<start time>/duration_<duration>/max_delay_<max delay>/shapefiles/
🗺 vehicles.[shx, shp, prh, dbf, cpg]
- starting vehicle locations🗺 pickup.[shx, shp, prh, dbf, cpg]
- request pickup points🗺 dropoff.[shx, shp, prh, dbf, cpg]
- request dropoff points
The results are stored in the 📁 Results/
folder. The folder structure follows a similar pattern as the 📁 Instance/
folder:
📁 Results/<area>/start_<start time>/duration_<duration>/max_delay_<max delay>/<method>/
├── 🖺 config.yaml # experiment config file
├── 🗎 config.yaml-solution.json # results from experiment defined by `config.yaml`
└── 🖺 config.yaml-performance.json # performance metrics from experiment defined by `config.yaml`
The <method>
folders are ih
for Insertion Heuristic and vga
for Vehicle Group Assignment method.
The solution is stored in 🗎 config.yaml-solution.json
and contains the following fields:
🗎 config.yaml-solution.json
cost
- total cost (total travel time of all vehicles) of the solution in seconds.cost_minutes
- total cost of the solution in minutes, rounded.dropped_requests
- list of requests that were dropped in this solution.plans
- list of vehicle plans; each plan contains a list of actions determining which requests are served by the given vehicle and in which order. The actions are "pickup" and "drop_off".
All locations in the solution file are node IDs from the road network. The node IDs are the same as in the 🖺 nodes.csv
file in the instance folder. All times are in seconds from the start of the day.
A complete description of the solution format is given by the JSON schema in this repository.
There are two files with meta-data for the solution, 🖺 config.yaml
and 🖺 config.yaml-performance.json
🖺 config.yaml
file contains the experiment configuration, such as the relative path to the instance, method-specific configuration and so on.
🖺 config.yaml-performance.json
file contains logged information on the run of the solver. The JSON has the following fields
total_time
- total time of the solver run in secondspeak_memory_KiB
- peak memory usage of the solver in KiBsolver_stats
- solver-specific statistics, if available. For example, for the VGA method,group_generation_time
andvehicle_assignment_time
are logged separately.
The methodology for the instance creation is described in the article. The process is divided into the following steps:
Many of the steps are implemented in the associated repository, but some of them rely on external binaries. That is why the published dataset contains full distance matrices for every area instead of the instance-specific, smaller distance matrices.
The sizing of the instances is performed with the insertion heuristic (IH):
- We find the lowest number of vehicles for an instance for which the IH solution drops 0 requests.
- We multiply this number by 1.05, adding a buffer of 5% of vehicles.
This number is then used in all the experimental results.
The following data sources were used to generate demand and travel time data:
Area | Demand Dataset | Zone Dataset | Request times |
---|---|---|---|
New York City and Manhattan | NYC Taxi and Limousine Commission | NYC taxi zones | exact |
Chicago | City of Chicago | Census tracts and community areas | generated |
Washington, DC | City of Washington, DC | Master Address Repository | generated |
When using the instances or the code, please cite the following paper:
[1] D. Fiedler and J. Mrkos, “Large-scale Ridesharing DARP Instances Based on Real Travel Demand.” arXiv, May 30, 2023. doi: 10.48550/arXiv.2305.18859.
Bibtex entry:
@misc{fiedler2023largescale,
title={Large-scale Ridesharing DARP Instances Based on Real Travel Demand},
author={David Fiedler and Jan Mrkos},
year={2023},
eprint={2305.18859},
archivePrefix={arXiv},
primaryClass={cs.AI}
}
The code in the repository used to generate the instances is licensed using the GNU GENERAL PUBLIC LICENSE v3.
The dataset is licensed using the Creative Commons Attribution 4.0 International (CC BY 4.0) license.