Commit

Edit DVC deps to include cmd run files (#240)
* Add files as dvc deps

* Switch deps

* Adjust refs

* R formatting

* Revert deletion

* Update refs for new cols

* Update new deps with hash
wagnerlmichael authored Aug 19, 2024
1 parent 3de0486 commit dd0dcc3
Showing 2 changed files with 44 additions and 1 deletion.
34 changes: 34 additions & 0 deletions dvc.lock
@@ -2,6 +2,11 @@ schema: '2.0'
 stages:
   ingest:
     cmd: Rscript pipeline/00-ingest.R
+    deps:
+    - path: pipeline/00-ingest.R
+      hash: md5
+      md5: c04f9224e873b1ee29a64fa68aa6c8d9
+      size: 23355
     params:
       params.yaml:
         assessment:
@@ -61,6 +66,10 @@ stages:
   train:
     cmd: Rscript pipeline/01-train.R
     deps:
+    - path: pipeline/01-train.R
+      hash: md5
+      md5: 46115d48cf066d35b0db14dc13a8d9b3
+      size: 17448
     - path: input/training_data.parquet
       hash: md5
       md5: 680e07bdb2a55166b7070155c4ff5a38
@@ -332,6 +341,10 @@ stages:
   assess:
     cmd: Rscript pipeline/02-assess.R
     deps:
+    - path: pipeline/02-assess.R
+      hash: md5
+      md5: 5e8c9b7d547ea41d9ec9441465e6e275
+      size: 22749
     - path: input/assessment_data.parquet
       hash: md5
       md5: 5450bfd412c9b552a1a2722b04e49706
@@ -520,6 +533,10 @@ stages:
   evaluate:
     cmd: Rscript pipeline/03-evaluate.R
     deps:
+    - path: pipeline/03-evaluate.R
+      hash: md5
+      md5: d33c8e642e5e29a0683463ce885771f8
+      size: 16292
     - path: output/assessment_pin/model_assessment_pin.parquet
       hash: md5
       md5: f5641cb4506847814181996692064b6e
@@ -577,6 +594,10 @@ stages:
   interpret:
     cmd: Rscript pipeline/04-interpret.R
     deps:
+    - path: pipeline/04-interpret.R
+      hash: md5
+      md5: 1cc57c0bcdaf2725fa343c6d88c1592c
+      size: 9619
     - path: input/assessment_data.parquet
       hash: md5
       md5: 582a6197429e99ee24271a3d4f9e9323
@@ -700,6 +721,10 @@ stages:
   finalize:
     cmd: Rscript pipeline/05-finalize.R
     deps:
+    - path: pipeline/05-finalize.R
+      hash: md5
+      md5: 5c5a5100ebae2013bc24e8f9333d136b
+      size: 8762
     - path: output/intermediate/timing/model_timing_assess.parquet
       hash: md5
       md5: 5f93cb109c073d91a9c9b55b3a56755b
@@ -991,6 +1016,11 @@ stages:
       size: 73
   export:
     cmd: Rscript pipeline/07-export.R
+    deps:
+    - path: pipeline/07-export.R
+      hash: md5
+      md5: b4615315b52165eed4a030c94def015b
+      size: 33718
     params:
       params.yaml:
         assessment.year: '2023'
@@ -1024,6 +1054,10 @@ stages:
   upload:
     cmd: Rscript pipeline/06-upload.R
     deps:
+    - path: pipeline/06-upload.R
+      hash: md5
+      md5: 3b7d11c518447cf6c14ec7668c488968
+      size: 11733
     - path: output/assessment_card/model_assessment_card.parquet
       hash: md5
       md5: 7f558cd27ce54a39390180383a0af3fc
11 changes: 10 additions & 1 deletion dvc.yaml
@@ -4,6 +4,8 @@ stages:
     desc: >
       Ingest training and assessment data from Athena + generate townhome
       complex identifiers
+    deps:
+      - pipeline/00-ingest.R
     params:
       - assessment
       - input
Expand All @@ -15,14 +17,14 @@ stages:
- input/land_nbhd_rate_data.parquet
- input/land_site_rate_data.parquet
- input/training_data.parquet
frozen: true

train:
cmd: Rscript pipeline/01-train.R
desc: >
Train a LightGBM model with cross-validation. Generate model objects,
data recipes, and predictions on the test set (most recent 10% of sales)
deps:
- pipeline/01-train.R
- input/training_data.parquet
params:
- cv
@@ -58,6 +60,7 @@ stages:
       County. Also generate flags, calculate land values, and make any
       post-modeling changes
     deps:
+      - pipeline/02-assess.R
       - input/training_data.parquet
       - input/assessment_data.parquet
       - input/complex_id_data.parquet
@@ -86,6 +89,7 @@ stages:
         2. An assessor-specific ratio study comparing estimated assessments to
            the previous year's sales
     deps:
+      - pipeline/03-evaluate.R
       - output/test_card/model_test_card.parquet
       - output/assessment_pin/model_assessment_pin.parquet
     params:
Expand All @@ -109,6 +113,7 @@ stages:
Generate SHAP values for each card and feature as well as feature
importance metrics for each feature
deps:
- pipeline/04-interpret.R
- input/assessment_data.parquet
- input/training_data.parquet
- output/assessment_card/model_assessment_card.parquet
Expand All @@ -134,6 +139,7 @@ stages:
Save run timings and run metadata to disk and render a performance report
using Quarto.
deps:
- pipeline/05-finalize.R
- output/intermediate/timing/model_timing_train.parquet
- output/intermediate/timing/model_timing_assess.parquet
- output/intermediate/timing/model_timing_evaluate.parquet
@@ -164,6 +170,7 @@ stages:
       outputs prior to upload and attach a unique run ID. This step requires
       access to the CCAO Data AWS account, and so is assumed to be internal-only
     deps:
+      - pipeline/06-upload.R
       - output/parameter_final/model_parameter_final.parquet
       - output/parameter_range/model_parameter_range.parquet
       - output/parameter_search/model_parameter_search.parquet
Expand All @@ -189,6 +196,8 @@ stages:
Generate Desk Review spreadsheets and iasWorld upload CSVs from a finished
run. NOT automatically run since it is typically only run once. Manually
run once a model is selected
deps:
- pipeline/07-export.R
params:
- assessment.year
- input.min_sale_year
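
Each new `dvc.lock` entry above records a script's `path`, `md5`, and `size`. The following is a minimal sketch of how such an entry could be reproduced by hand to check a script against the lock file. It assumes that `hash: md5` denotes a plain MD5 of the file's raw bytes (the behavior of recent DVC releases, as far as I know); the helper name `dvc_file_entry` is hypothetical and not part of DVC or this repository.

```python
import hashlib
import sys
from pathlib import Path


def dvc_file_entry(path: str, chunk_size: int = 1 << 20) -> dict:
    """Build a dict shaped like one file entry under `deps:` in dvc.lock.

    Hypothetical helper: assumes `hash: md5` is an MD5 over the raw bytes.
    """
    file_path = Path(path)
    digest = hashlib.md5()
    # Read in chunks so large files are not loaded into memory at once.
    with file_path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "path": file_path.as_posix(),
        "hash": "md5",
        "md5": digest.hexdigest(),
        "size": file_path.stat().st_size,
    }


if __name__ == "__main__":
    # Usage: python check_deps.py pipeline/00-ingest.R pipeline/01-train.R
    for script in sys.argv[1:]:
        print(dvc_file_entry(script))
```

If the printed `md5` or `size` for a script differs from its `dvc.lock` entry, the script has changed since the lock file was written, which is the condition that should cause DVC to treat the stage as out of date now that the script is a dependency.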
