In this readme we detail the specific steps for rebuilding the GeoGCDF dataset, as well as the accompanying products created during the build pipeline. If you have not already set up your environment and configuration settings, be sure to read the overview readme first.
python input_data/gcdf_v3/gcdf_v3_prep.py
python src/main.py
Note the timestamp (e.g., 2023_12_04_13_25) of the main processing run; all subsequent scripts must be updated with this timestamp to use the correct output data.
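If you need to recover the timestamp later, the run directories are named by timestamp, so the newest run sorts last. A minimal sketch, assuming run outputs land in one directory per timestamp (the path below is an assumption; adjust it to your configured output location):

```shell
# Hypothetical output layout: one directory per run, named by its timestamp.
# The YYYY_MM_DD_HH_MM naming sorts lexicographically, so the newest run sorts last.
RESULTS_DIR="output_data/gcdf_v3/results"   # assumption: adjust to your config
LATEST=$(ls -1 "$RESULTS_DIR" | sort | tail -n 1)
echo "Latest run timestamp: $LATEST"
```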
- Mark the run as the latest
bash scripts/set_latest.sh gcdf_v3 2023_12_04_13_25
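As a rough sketch of the idea (not the actual contents of scripts/set_latest.sh), a "set latest" helper only needs to record the chosen timestamp somewhere downstream scripts can read it:

```shell
#!/bin/bash
# Hypothetical sketch of a set-latest helper -- see scripts/set_latest.sh for the real logic.
# Usage: set_latest.sh <dataset> <timestamp>
DATASET="$1"      # e.g., gcdf_v3
TIMESTAMP="$2"    # e.g., 2023_12_04_13_25
mkdir -p "output_data/${DATASET}"                     # assumed output location
echo "$TIMESTAMP" > "output_data/${DATASET}/latest"   # assumed pointer-file convention
```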
- Commit changes and push to development GitHub repo (gcdf-geospatial-data-rc)
- Create a new release in the development GitHub repo
- Upload the all_combined_global.gpkg.zip and osm_geojsons/OSM_grouped.zip files to the release assets
- Add the production repo (gcdf-geospatial-data) as a remote (e.g., "live") of your release candidate repo (gcdf-geospatial-data-rc, typically "origin")
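For example, from a clone of the release candidate repo (where the RC repo is already "origin"), the production remote can be added as follows. The `<org>` placeholder stands in for the GitHub organization that owns the repos and must be substituted:

```shell
# Add the production repo as a second remote named "live".
# "<org>" is a placeholder -- substitute the GitHub organization hosting the repos.
git remote add live "https://github.com/<org>/gcdf-geospatial-data.git"
git remote -v   # should now list both "origin" (RC) and "live" (production)
```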
- Create a new branch in the production repo (e.g., "v301rc") from the main branch
- Push the release candidate repo code to the new production branch
git fetch live
git checkout -b live/v301rc
git push live HEAD:v301rc
- Create a PR in the production repo from the new branch to the main branch
- Merge the PR
- Create a new release in the production GitHub repo for the main branch
- Upload the all_combined_global.gpkg.zip and osm_geojsons/OSM_grouped.zip files to the release assets
- Edit the adm lookup timestamp in the Python script, if needed
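The timestamp to edit is presumably a constant near the top of the script; a hypothetical example of the kind of line to update (the variable name and path pattern here are illustrative, not necessarily what scripts/adm_lookup.py actually uses):

```python
# Hypothetical constant in scripts/adm_lookup.py -- set to your main run's timestamp.
TIMESTAMP = "2023_12_04_13_25"

# Downstream paths would then be built from it, e.g. (illustrative path only):
results_dir = f"output_data/gcdf_v3/results/{TIMESTAMP}"
```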
- Run
python scripts/adm_lookup.py
- Run
python scripts/generate_project_join.py
- Run
python stats/stats.py
- First run the individual dataset extractions:
python esg/critical_habitats/extract.py
python esg/protected_areas/extract.py
python esg/indigenous_lands/extract.py
python esg/PLAD/main.py
- Then combine into a single output:
python esg/gcdf_v3_combine_outputs.py
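The general shape of the extract-then-combine pattern above, sketched with the standard library: each extraction writes a table keyed by a shared feature id, and the combine step joins them into one output. This is a minimal sketch only; the real esg/gcdf_v3_combine_outputs.py, its column names, and its file formats may differ.

```python
import csv
from collections import defaultdict

def combine_outputs(paths, key="id"):
    """Merge several CSVs row-wise on a shared key column (illustrative sketch)."""
    merged = defaultdict(dict)
    for path in paths:
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                # Rows from later files add their columns to the same keyed record.
                merged[row[key]].update(row)
    return dict(merged)
```

Each dataset's columns end up side by side per feature id; ids missing from a given extraction simply lack that dataset's columns.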