Commit
Merge pull request #9 from Urban-Analytics-Technology-Platform/8-task-1-add-activity-patterns-to-synthetic-population
Add activity patterns to synthetic population
Showing 8 changed files with 12,091 additions and 790 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,220 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Preparing the Synthetic Population" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We will use the spc package for our synthetic population. To add it as a dependency in this virtual environment, I ran `poetry add git+https://github.com/alan-turing-institute/uatk-spc.git@55-output-formats-python#subdirectory=python`. The branch may change if the Python package is merged into the main spc branch." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"#import json\n", | ||
"import pandas as pd\n", | ||
"\n", | ||
"# Example usage: https://github.com/alan-turing-institute/uatk-spc/blob/55-output-formats-python/python/examples/spc_builder_example.ipynb\n", | ||
"from uatk_spc.builder import Builder" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Loading in the SPC synthetic population\n", | ||
"\n", | ||
"I use the code in the `Quickstart` [here](https://github.com/alan-turing-institute/uatk-spc/blob/55-output-formats-python/python/README.md) to get a parquet file and convert it to JSON. \n", | ||
"\n", | ||
"You have two options:\n", | ||
"\n", | ||
"\n", | ||
"1- Slow and memory-hungry: Download the pbf file directly from [here](https://alan-turing-institute.github.io/uatk-spc/using_england_outputs.html) and load it with the Python package\n", | ||
"\n", | ||
"2- Faster: Convert the pbf file to parquet, and then load it using the Python package. To convert to parquet, you need to:\n", | ||
"\n", | ||
"a. Clone the [uatk-spc](https://github.com/alan-turing-institute/uatk-spc/tree/main/docs) repository\n", | ||
"\n", | ||
"b. Run `cargo run --release -- --rng-seed 0 --flat-output config/England/west-yorkshire.txt --year 2020`, replacing `west-yorkshire` and `2020` with your preferred region and year\n", | ||
" " | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Pick a region with SPC output saved\n", | ||
"path = \"../data/external/spc_output/raw/\"\n", | ||
"region = \"west-yorkshire\"" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"#### People and household data" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# add people and households\n", | ||
"spc_people_hh = (\n", | ||
" Builder(path, region, backend=\"pandas\", input_type=\"parquet\")\n", | ||
" .add_households()\n", | ||
" .unnest([\"health\", \"employment\", \"details\"])\n", | ||
" # \"demographics\" is unnested in the next cell to avoid a duplicate \"nssec8\" column\n", | ||
" .build()\n", | ||
")\n", | ||
"\n", | ||
"spc_people_hh.head(5)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Unnest the demographics column separately. Doing this in the Builder call\n", | ||
"# above raises an error because there would be two \"nssec8\" columns.\n", | ||
"\n", | ||
"# Flatten the nested demographics column into one column per field\n", | ||
"demographics = pd.json_normalize(spc_people_hh['demographics'])\n", | ||
"\n", | ||
"# Drop the nested column and the top-level nssec8 that would be duplicated\n", | ||
"spc_people_hh = spc_people_hh.drop(['demographics', 'nssec8'], axis=1)\n", | ||
"# Add the unnested demographics columns\n", | ||
"spc_people_hh = pd.concat([spc_people_hh, demographics], axis=1)\n", | ||
"\n", | ||
"spc_people_hh.head()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# save the output\n", | ||
"spc_people_hh.to_parquet('../data/external/spc_output/' + region + '_people_hh.parquet')\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"spc_people_hh['salary_yearly'].hist(bins=100)\n", | ||
"\n", | ||
"\n", | ||
"#plt.show()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"spc_people_hh['salary_yearly'].unique()\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"#### People and time-use data" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"\n", | ||
"# Subset of (non-time-use) features to include and unnest\n", | ||
"\n", | ||
"# The features can be found here: https://github.com/alan-turing-institute/uatk-spc/blob/main/synthpop.proto\n", | ||
"features = {\n", | ||
" \"health\": [\n", | ||
" \"bmi\",\n", | ||
" \"has_cardiovascular_disease\",\n", | ||
" \"has_diabetes\",\n", | ||
" \"has_high_blood_pressure\",\n", | ||
" \"self_assessed_health\",\n", | ||
" \"life_satisfaction\",\n", | ||
" ],\n", | ||
" \"demographics\": [\"age_years\",\n", | ||
" \"ethnicity\",\n", | ||
" \"sex\",\n", | ||
" \"nssec8\"\n", | ||
" ],\n", | ||
" \"employment\": [\"sic1d2007\",\n", | ||
" \"sic2d2007\",\n", | ||
" \"pwkstat\",\n", | ||
" \"salary_yearly\"\n", | ||
" ]\n", | ||
"\n", | ||
"}\n", | ||
"\n", | ||
"# build the table\n", | ||
"spc_people_tu = (\n", | ||
" Builder(path, region, backend=\"polars\", input_type=\"parquet\")\n", | ||
" .add_households()\n", | ||
" .add_time_use_diaries(features, diary_type=\"weekday_diaries\")\n", | ||
" .build()\n", | ||
")\n", | ||
"spc_people_tu.head()\n", | ||
"\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# save the output\n", | ||
"spc_people_tu.write_parquet('../data/external/spc_output/' + region + '_people_tu.parquet')" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "acbm-7iKwKWLy-py3.10", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.12" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
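The demographics unnesting step in the notebook can be tried in isolation with a minimal sketch like the one below. The toy table and its values are invented for illustration and are not SPC data. The index reassignment is an addition that the notebook does not make; it only matters if the people table does not have a default 0..n-1 index, because `pd.concat(..., axis=1)` aligns rows by index label.

```python
import pandas as pd

# Toy stand-in for the SPC people table (values are made up): "demographics"
# holds one dict per person and a top-level "nssec8" column already exists.
people = pd.DataFrame(
    {
        "id": [10, 11, 12],
        "nssec8": [1, 2, 3],
        "demographics": [
            {"age_years": 34, "sex": 1, "nssec8": 1},
            {"age_years": 61, "sex": 2, "nssec8": 2},
            {"age_years": 8, "sex": 1, "nssec8": 3},
        ],
    },
    index=[5, 6, 7],  # deliberately not 0..n-1
)

# Flatten the dicts into columns; the result gets a fresh 0..n-1 index.
demographics = pd.json_normalize(people["demographics"].tolist())

# Assumption beyond the notebook: reuse the original index so that
# concat(axis=1) pairs each person with their own demographics row.
demographics.index = people.index

# Drop the nested column and the duplicate "nssec8" before joining.
flat = pd.concat(
    [people.drop(columns=["demographics", "nssec8"]), demographics],
    axis=1,
)
```

Without the index reassignment, this toy table would come out with extra rows and missing values, since labels 5 to 7 do not match 0 to 2.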