Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update packages, drop unused code, ingest stage running w/o error #61

Merged
merged 47 commits into from
Dec 4, 2023
Merged
Show file tree
Hide file tree
Changes from 45 commits
Commits
Show all changes
47 commits
Select commit Hold shift + click to select a range
9a09d7c
Update packages, ingest stage running w/o error
wrridgeway Nov 16, 2023
91d1862
Linting
wrridgeway Nov 16, 2023
8723659
Linting
wrridgeway Nov 16, 2023
b16a3e6
Generate and upload model performance report in finalize pipeline step
jeancochrane Nov 21, 2023
186fe1e
Merge branch 'master' into 24-infra-updates-generate-and-publish-a-qu…
jeancochrane Nov 21, 2023
331b241
Include .html files in model_get_s3_artifacts_for_run
jeancochrane Nov 21, 2023
0144f28
Refactor repo to support reports/renv.lock lockfile
jeancochrane Nov 22, 2023
af287d2
Remove unnecessary changes to renv/activate.R
jeancochrane Nov 24, 2023
b616399
Fix missing column in performance report row of misc/file_dict.csv
jeancochrane Nov 24, 2023
25f2850
Update README with instructions on updating R dependencies
jeancochrane Nov 24, 2023
9f1c1ca
Add quarto to DESCRIPTION dependencies
jeancochrane Nov 24, 2023
aedd58b
Move reports/renv.lock -> renv/profiles/reporting/renv.lock
jeancochrane Nov 24, 2023
7ae4f6b
Properly style R/helpers.R
jeancochrane Nov 24, 2023
fd6538b
Install Quarto in Dockerfile
jeancochrane Nov 24, 2023
7b39d2f
Use the correct path to performance.qmd in 05-finalize.R step
jeancochrane Nov 27, 2023
65948dd
Move performance.qmd to the top level of the `reports/` subdir
jeancochrane Nov 29, 2023
dab451c
Factor out report generation into 05-report.R pipeline stage
jeancochrane Nov 29, 2023
d41f6b9
Presign the Quarto report URL in 05-finalize.R
jeancochrane Nov 29, 2023
626d34d
Temporarily adjust Dockerfile CMD to test paws
jeancochrane Nov 29, 2023
904e2a3
Merge branch 'master' into refactor-sales-flagging-out-of-model-repos
dfsnow Nov 30, 2023
1eba9ca
Revert "Temporarily adjust Dockerfile CMD to test paws"
jeancochrane Nov 30, 2023
dde61a6
Revert "Presign the Quarto report URL in 05-finalize.R"
jeancochrane Nov 30, 2023
24cf223
Factor S3/SNS operations out into new 06-upload.R stage
jeancochrane Nov 30, 2023
6772f45
Fix typo in README.Rmd and regenerate README
jeancochrane Nov 30, 2023
ec1c35f
Fix mixed up deps/outputs between finalize and upload stages
jeancochrane Nov 30, 2023
ce0a293
Update main pipeline dependencies
dfsnow Nov 30, 2023
ae750d5
Add missing run_id variable to upload pipeline stage
jeancochrane Dec 1, 2023
57bb779
Update pipeline to use new dictionary from CCAO package
dfsnow Dec 1, 2023
3efb3eb
Move sales validation code to dedicated repository
dfsnow Dec 1, 2023
64cc14d
Drop sales val info from README
dfsnow Dec 1, 2023
87a6671
Added changelog for 2024
dfsnow Dec 1, 2023
3140824
Partition Quarto performance report S3 uploads by year
jeancochrane Dec 1, 2023
ba7b677
Simplify Dockerfile and remove Pipenv dep
dfsnow Dec 1, 2023
1411406
Merge remote-tracking branch 'origin/24-infra-updates-generate-and-pu…
dfsnow Dec 1, 2023
10809f9
Update Dockerfile setup
dfsnow Dec 1, 2023
e724755
Merge branch 'master' into refactor-sales-flagging-out-of-model-repos
dfsnow Dec 1, 2023
9336108
Drop explicit DVC deps from pip install
dfsnow Dec 1, 2023
71e44a8
Update R version and deps
dfsnow Dec 1, 2023
4578c72
Bump reporting profile versions
dfsnow Dec 1, 2023
7f9587f
Ignore unnecessary renv files
dfsnow Dec 1, 2023
81824d9
Update files for new lintr rules
dfsnow Dec 2, 2023
78f1307
Include .dvc and .git stuff in container
dfsnow Dec 2, 2023
fb6da59
Suppress library load messages
dfsnow Dec 2, 2023
8cfecac
Update params to test 2023 run
dfsnow Dec 4, 2023
6bd4ffb
Add missing ccao dep to upload stage
dfsnow Dec 4, 2023
fc503f3
Make params.yaml consistent with 2023 assessment year
dfsnow Dec 4, 2023
2cf6ceb
Ignore quarto, snapshot finalize
dfsnow Dec 4, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions .dockerignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
README.Rmd
README.md
docs/
input/
renv/library/
renv/sandbox/
renv/staging/
Comment on lines +5 to +7
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignoring this when using COPY in Docker since it's manually replaced with the installed libraries.

12 changes: 2 additions & 10 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,8 @@
.Rproj.user/

# knitr and R markdown default cache directories
/*_cache/
/cache/
*_cache/
cache/

# Temporary files created by R markdown
*.utf8.md
Expand All @@ -21,11 +21,3 @@
*.xlsx
*.xlsm
*.html

# Ignore python environment
pipenv/
pipenv

# Reporting files
*.html

Comment on lines -24 to -31
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropping these since we dropped all Python deps in this repo.

4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
# R specific hooks: https://github.com/lorenzwalthert/precommit
repos:
- repo: https://github.com/lorenzwalthert/precommit
rev: v0.3.2.9013
rev: v0.3.2.9027
hooks:
- id: style-files
args: [--style_pkg=styler, --style_fun=tidyverse_style]
Expand All @@ -12,7 +12,7 @@ repos:
- id: no-browser-statement
- id: no-debug-statement
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
rev: v4.5.0
hooks:
- id: check-added-large-files
args: ['--maxkb=200']
Expand Down
8 changes: 5 additions & 3 deletions .renvignore
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
README.Rmd
pipeline/00-ingest.R
pipeline/06-export.R
pipeline/07-api.R
reports/
pipeline/05-finalize.R
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Thought, non-blocking] Instead of ignoring the finalize script completely, I wonder if there's a way for us to specifically ignore the library(quarto) call, which is the only reporting dependency in that script? That seems to me like it would be the cleanest way to isolate the finalize step writ large from its report generation substep. But this is a speculative ask, so I don't think it needs to block merging! Just something for us to consider.

Copy link
Member

@dfsnow dfsnow Dec 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately it seems like having renv look at finalize also means it looks into the Quarto doc itself i.e. there's no way to ignore finalize while also ignoring the doc. I think it's fine since all the finalize dependencies (except git2r are covered by prior steps and are unlikely to change.

But let me play around with it some more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2cf6ceb!

pipeline/07-export.R
pipeline/08-api.R
Comment on lines +3 to +4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Praise] Oops, I missed this during #62, thanks for catching!

reports/*
cache/*
55 changes: 26 additions & 29 deletions Dockerfile
Original file line number Diff line number Diff line change
@@ -1,53 +1,50 @@
FROM rocker/r-ver:4.3.1
FROM rocker/r-ver:4.3.2

# Set the working directory to setup. Uses a dedicated directory instead of
# root since otherwise renv will try to scan every subdirectory
WORKDIR /setup
Comment on lines +3 to +5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slight re-configuration of the Dockerfile here. renv was stupidly scanning the entire filesystem when using the root directory and thus timing out. I created a dedicated working directory just for setup stuff, sort of following the multi-stage build pattern.


# Use PPM for binary installs
ENV RENV_CONFIG_REPOS_OVERRIDE "https://packagemanager.posit.co/cran/__linux__/jammy/latest"
ENV RENV_CONFIG_SANDBOX_ENABLED FALSE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not clear what this does but starting R inside the container recommends settings it ~for speed~.

ENV RENV_PATHS_LIBRARY renv/library
ENV RENV_PATHS_CACHE /setup/cache
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the actual location in the container the packages get installed to. They're then symlinked in the actual working dir.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Praise] Thanks for the clear explanations of the renv choices here!


# Install system dependencies
RUN apt-get update && apt-get install --no-install-recommends -y \
libcurl4-openssl-dev libssl-dev libxml2-dev libgit2-dev git \
libudunits2-dev python3-dev python3-pip libgdal-dev libgeos-dev \
libproj-dev libfontconfig1-dev libharfbuzz-dev libfribidi-dev pandoc \
curl gdebi-core
RUN apt-get update && \
apt-get install --no-install-recommends -y \
libcurl4-openssl-dev libssl-dev libxml2-dev libgit2-dev git \
libudunits2-dev python3-dev python3-pip libgdal-dev libgeos-dev \
libproj-dev libfontconfig1-dev libharfbuzz-dev libfribidi-dev pandoc \
curl gdebi-core && \
rm -rf /var/lib/apt/lists/*

# Install Quarto
RUN curl -o quarto-linux-amd64.deb -L \
https://github.com/quarto-dev/quarto-cli/releases/download/v1.3.450/quarto-1.3.450-linux-amd64.deb
RUN gdebi -n quarto-linux-amd64.deb

# Install pipenv for Python dependencies
RUN pip install pipenv

# Copy pipenv files into the image. The reason this is a separate step from
# the later step that adds files from the working directory is because we want
# to avoid having to reinstall dependencies every time a file in the directory
# changes, as Docker will bust the cache of every layer following a layer that
# needs to change
COPY Pipfile .
COPY Pipfile.lock .

# Install Python dependencies
RUN pipenv install --system --deploy
Comment on lines -19 to -31
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need any of this now. We only need DVC.

# Install pipeline Python dependencies globally
RUN pip install --no-cache-dir dvc[s3]

# Copy R bootstrap files into the image
COPY renv.lock .
COPY renv.lock .Rprofile ./
COPY renv/profiles/reporting/renv.lock reporting-renv.lock
COPY .Rprofile .
COPY renv/ renv/

# Install R dependencies
RUN Rscript -e 'renv::restore()'
# Install R dependencies. Restoring renv first ensures that it's
# using the same version as recorded in the lockfile
RUN Rscript -e 'renv::restore(packages = "renv"); renv::restore()'
RUN Rscript -e 'renv::restore(lockfile = "reporting-renv.lock")'

# Set the working directory to the app dir
WORKDIR /model-res-avm/

# Copy the directory into the container
ADD ./ model-res-avm/
COPY ./ .

# Copy R dependencies into the app directory
RUN rm -Rf model-res-avm/renv
RUN mv renv model-res-avm/

# Set the working directory to the app dir
WORKDIR model-res-avm/
RUN rm -Rf /model-res-avm/renv && \
mv /setup/renv /model-res-avm/renv

CMD dvc pull && dvc repro
12 changes: 0 additions & 12 deletions Pipfile
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both Pipfile things were removed since they existed mostly to support sales validation. DVC is installed directly in the Dockerfile without a lock file.

This file was deleted.

Loading