initial documentation #63
@@ -0,0 +1 @@
The config.toml file has an explanation for each parameter. You can copy the toml file, give it a name that is relevant to your project, and modify the parameters as needed.
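As a minimal sketch of working with such a copy (the file name and parameter names below are hypothetical, not taken from the repository), the config can be loaded in Python with the standard-library `tomllib` module:

```python
import tomllib  # Python 3.11+; on older versions the third-party "tomli" package provides the same API

# Hypothetical file name: a project-specific copy of config.toml
with open("config_my_project.toml", "rb") as f:  # tomllib requires binary mode
    config = tomllib.load(f)

# The keys are whatever the real config.toml defines; this just prints them
print(config)
```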
@@ -0,0 +1,113 @@
The folder contains all external datasets necessary to run the pipeline. Some can be downloaded, while others need to be generated. The README.md file in this folder provides a guide on where to find or generate each dataset.

## Folder Structure

The structure of the folder is as follows:
```md
.
├── data
│   ├── external
│   │   ├── boundaries
│   │   │   ├── MSOA_DEC_2021_EW_NC_v3.geojson
│   │   │   ├── oa_england.geojson
│   │   │   └── study_area_zones.geojson
│   │   ├── census_2011_rural_urban.csv
│   │   ├── centroids
│   │   │   ├── LSOA_Dec_2011_PWC_in_England_and_Wales_2022.csv
│   │   │   └── Output_Areas_Dec_2011_PWC_2022.csv
│   │   ├── MSOA_2011_MSOA_2021_Lookup_for_England_and_Wales.csv
│   │   ├── nts
│   │   │   ├── filtered
│   │   │   │   ├── nts_households.parquet
│   │   │   │   ├── nts_individuals.parquet
│   │   │   │   └── nts_trips.parquet
│   │   │   └── UKDA-5340-tab
│   │   │       ├── 5340_file_information.rtf
│   │   │       ├── mrdoc
│   │   │       │   ├── excel
│   │   │       │   ├── UKDA
│   │   │       │   └── ukda_data_dictionaries.zip
│   │   │       └── tab
│   │   │           ├── household_eul_2002-2022.tab
│   │   │           ├── individual_eul_2002-2022.tab
│   │   │           ├── psu_eul_2002-2022.tab
│   │   │           ├── trip_eul_2002-2022.tab
│   │   │           └── <other_nts_tables>.tab
│   │   ├── travel_times
│   │   │   ├── oa
│   │   │   │   └── travel_time_matrix.parquet
│   │   │   └── msoa
│   │   │       └── travel_time_matrix.parquet
│   │   ├── ODWP01EW_OA.zip
│   │   ├── ODWP15EW_MSOA_v1.zip
│   │   └── spc_output
│   │       ├── <region>_people_hh.parquet (Generated in Script 1)
│   │       ├── <region>_people_tu.parquet (Generated in Script 1)
│   │       └── raw
│   │           ├── <region>_households.parquet
│   │           ├── <region>_info_per_msoa.json
│   │           ├── <region>.pb
│   │           ├── <region>_people.parquet
│   │           ├── <region>_time_use_diaries.parquet
│   │           ├── <region>_venues.parquet
│   │           └── README.md
```

## Data Sources

`spc_output/`

Use the code in the `Quickstart` [here](https://github.com/alan-turing-institute/uatk-spc/blob/55-output-formats-python/python/README.md)
to get a parquet file and convert it to JSON.

You have two options:
1. Slow and memory-hungry: download the `.pb` file directly from [here](https://alan-turing-institute.github.io/uatk-spc/using_england_outputs.html)
   and load it with the Python package.
2. Faster: run SPC to generate parquet outputs, and then load them using the SPC toolkit Python package. To generate the parquet files, you need to:
   1. Clone [uatk-spc](https://github.com/alan-turing-institute/uatk-spc/tree/main/docs)
   2. Run:
      ```shell
      cargo run --release -- \
        --rng-seed 0 \
        --flat-output \
        --year 2020 \
        config/England/west-yorkshire.txt
      ```
      and replace `west-yorkshire` and `2020` with your preferred region and year.
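Once the flat outputs exist they are plain parquet files, so they can be inspected with pandas. This is a minimal sketch only: the paths follow the folder structure above with `<region>` substituted, and it does not use the SPC toolkit's own loader.

```python
from pathlib import Path

import pandas as pd

# Assumed location, following the folder structure above; adjust for your region
raw_dir = Path("data/external/spc_output/raw")
region = "west-yorkshire"  # example region, matching the cargo run example above

# Read the per-person and per-household tables produced by SPC's --flat-output
people = pd.read_parquet(raw_dir / f"{region}_people.parquet")
households = pd.read_parquet(raw_dir / f"{region}_households.parquet")

print(people.shape, households.shape)
```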

`boundaries/`

- MSOA_DEC_2021_EW_NC_v3.geojson
- oa_england.geojson
- study_area_zones.geojson

`centroids/`

- LSOA_Dec_2011_PWC_in_England_and_Wales_2022.csv
- Output_Areas_Dec_2011_PWC_2022.csv

`nts/`

UKDA-5340-tab:
- Download the UKDA-5340-tab dataset from the UK Data Service [here](https://beta.ukdataservice.ac.uk/datacatalogue/studies/study?id=5340)
- Step 1: Create an account
- Step 2: Create a project and request access to the data
  - We use the `National Travel Survey, 2002-2023` dataset (SN: 5340)
- Step 3: Download the TAB file format
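The files under `nts/filtered/` are derived from the raw `.tab` tables. A minimal sketch of that kind of conversion with pandas is shown below; the actual filtering logic lives in the pipeline scripts, so treat this as an illustration of the file formats only.

```python
import pandas as pd

# Read one of the raw NTS tables (tab-separated) and write it back out as parquet.
# Paths follow the folder structure above; the pipeline applies its own filtering,
# so this is illustrative rather than the real preprocessing.
households = pd.read_csv(
    "data/external/nts/UKDA-5340-tab/tab/household_eul_2002-2022.tab",
    sep="\t",
    low_memory=False,
)
households.to_parquet("data/external/nts/filtered/nts_households.parquet", index=False)
```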

`travel_times/`

- OPTIONAL dataset: if it does not exist, it will be generated in the pipeline. The matrices are added under `oa/` or `msoa/` subdirectories.
- e.g. `oa/travel_time_matrix.parquet` or `msoa/travel_time_matrix.parquet`
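If you supply your own matrix, it can be sanity-checked with pandas before running the pipeline. This sketch does not assume any particular column names; the schema the pipeline expects is defined by the assignment scripts, so check those before relying on a user-supplied file.

```python
import pandas as pd

# Quick structural check of a user-supplied travel time matrix:
# print the columns and a few rows so they can be compared against
# what the assignment scripts expect.
ttm = pd.read_parquet("data/external/travel_times/oa/travel_time_matrix.parquet")
print(ttm.columns.tolist())
print(ttm.head())
```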

`ODWP01EW_OA.zip`
`ODWP15EW_MSOA_v1.zip`
`MSOA_2011_MSOA_2021_Lookup_for_England_and_Wales.csv`
`census_2011_rural_urban.csv`
@@ -1,37 +1,22 @@
# Preparing synthetic population scripts

## Datasets
- [Synthetic Population Catalyst](https://github.com/alan-turing-institute/uatk-spc/blob/55-output-formats-python/python/README.md)
- [National Travel Survey](https://beta.ukdataservice.ac.uk/datacatalogue/studies/study?id=5340)
- [Rural Urban Classification 2011 classification](https://geoportal.statistics.gov.uk/datasets/53360acabd1e4567bc4b8d35081b36ff/about)
- [OA centroids](): TODO

## Loading in the SPC synthetic population

Use the code in the `Quickstart` [here](https://github.com/alan-turing-institute/uatk-spc/blob/55-output-formats-python/python/README.md)
to get a parquet file and convert it to JSON.

You have two options:
1. Slow and memory-hungry: download the `.pb` file directly from [here](https://alan-turing-institute.github.io/uatk-spc/using_england_outputs.html)
   and load it with the Python package.
2. Faster: run SPC to generate parquet outputs, and then load them using the SPC toolkit Python package. To generate the parquet files, you need to:
   1. Clone [uatk-spc](https://github.com/alan-turing-institute/uatk-spc/tree/main/docs)
   2. Run:
      ```shell
      cargo run --release -- \
        --rng-seed 0 \
        --flat-output \
        --year 2020 \
        config/England/west-yorkshire.txt
      ```
      and replace `west-yorkshire` and `2020` with your preferred region and year.

## Matching
### Adding activity chains to synthetic populations
The purpose of this script is to match each individual in the synthetic population to a respondent from the [National Travel Survey (NTS)](https://beta.ukdataservice.ac.uk/datacatalogue/studies/study?id=5340).

### Methods
We will try two methods (a minimal sketch of the first is shown below):
1. Categorical matching: joining on relevant socio-demographic variables.
2. Statistical matching, as described in [An unconstrained statistical matching algorithm for combining individual and household level geo-specific census and survey data](https://doi.org/10.1016/j.compenvurbsys.2016.11.003).
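A minimal sketch of categorical matching with pandas; the join keys below are illustrative placeholders, not the variables actually used by the pipeline.

```python
import pandas as pd

# Illustrative socio-demographic join keys; the pipeline defines its own set.
MATCH_COLUMNS = ["age_group", "sex", "household_size"]


def categorical_match(spc_individuals: pd.DataFrame, nts_individuals: pd.DataFrame) -> pd.DataFrame:
    """Attach candidate NTS respondents to each synthetic individual by exact category match.

    Both inputs are hypothetical DataFrames that contain the MATCH_COLUMNS
    plus an identifier column each; the merge keeps all synthetic individuals.
    """
    return spc_individuals.merge(
        nts_individuals,
        on=MATCH_COLUMNS,
        how="left",
        suffixes=("_spc", "_nts"),
    )
```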

# Scripts

## Synthetic Population Generation

- 1_prep_synthpop.py: Create a synthetic population using the SPC

## Adding Activity Patterns to Population

- 2_match_households_and_individuals.py: Match individuals in the synthetic population to travel diaries in the NTS. This is based on the statistical matching approach described in [An unconstrained statistical matching algorithm for combining individual and household level geo-specific census and survey data](https://doi.org/10.1016/j.compenvurbsys.2016.11.003).

## Location Assignment

- 3.1_assign_primary_feasible_zones.py: This script obtains, for each activity, the feasible destination zones in which the activity could take place. It uses a travel time matrix between zones to identify the zones that can be reached given the travel time and travel mode reported in the NTS. A travel time matrix should be provided before running the pipeline (in the correct format). If a travel time matrix does not exist, the code can create travel time estimates based on mode average speeds and crow-fly distance (see the sketch after this list). For tips on creating a travel time matrix, see the comment here: https://github.com/Urban-Analytics-Technology-Platform/acbm/issues/20#issuecomment-2317037441
- [3.2.1_assign_primary_zone_edu.py](https://github.com/Urban-Analytics-Technology-Platform/acbm/blob/main/scripts/3.2.1_assign_primary_zone_edu.py)
- [3.2.2_assign_primary_zone_work.py](https://github.com/Urban-Analytics-Technology-Platform/acbm/blob/main/scripts/3.2.2_assign_primary_zone_work.py)
- [3.2.3_assign_secondary_zone.py](https://github.com/Urban-Analytics-Technology-Platform/acbm/blob/main/scripts/3.2.3_assign_secondary_zone.py)
- [3.3_assign_facility_all.py](https://github.com/Urban-Analytics-Technology-Platform/acbm/blob/main/scripts/3.3_assign_facility_all.py)
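A minimal sketch of the crow-fly fallback estimate (distance between zone centroids divided by an assumed mode speed); the speed values and the indexing of the zones table are placeholders, not the pipeline's actual parameters.

```python
import geopandas as gpd

# Placeholder average speeds in km/h; the pipeline defines its own values.
MODE_SPEEDS_KMH = {"walk": 4.8, "cycle": 15.0, "car": 45.0, "pt": 25.0}


def crow_fly_travel_time_minutes(
    zones: gpd.GeoDataFrame, origin_id: str, destination_id: str, mode: str
) -> float:
    """Estimate zone-to-zone travel time from centroid distance and an assumed mode speed.

    Assumes `zones` is indexed by zone id and uses a projected CRS in metres
    (e.g. British National Grid, EPSG:27700), so distances come out in metres.
    """
    origin = zones.loc[origin_id].geometry.centroid
    destination = zones.loc[destination_id].geometry.centroid
    distance_km = origin.distance(destination) / 1000
    return distance_km / MODE_SPEEDS_KMH[mode] * 60
```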

## Validation

- 4_validate.py: Validate the synthetic population by comparing the distribution of activity chains in the NTS to our model outputs.
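A minimal sketch of that kind of comparison; the column names are placeholders, and the real script defines how activity chains are actually encoded.

```python
import pandas as pd


def chain_distribution(trips: pd.DataFrame, person_col: str = "person_id", purpose_col: str = "purpose") -> pd.Series:
    """Share of each activity-chain pattern.

    A chain is the ordered sequence of trip purposes for one person; purposes are
    assumed to be strings and trips assumed to be sorted in travel order.
    """
    chains = trips.groupby(person_col)[purpose_col].apply(lambda purposes: "-".join(purposes))
    return chains.value_counts(normalize=True)


# Compare NTS chains against model output chains (both hypothetical DataFrames here):
# nts_dist = chain_distribution(nts_trips)
# model_dist = chain_distribution(model_trips)
# print((nts_dist - model_dist).abs().sort_values(ascending=False).head())
```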

## Output