Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update packages, drop unused code, ingest stage running w/o error #61

Merged
merged 47 commits into from
Dec 4, 2023

Conversation

wrridgeway
Copy link
Member

@wrridgeway wrridgeway commented Nov 16, 2023

This PR:

  • Removes all sales validation code and related Python dependencies
  • Bumps R package versions (and the R minor version)
  • Slightly reconfigures the Dockerfile setup
  • Lints pipeline files for consistency with new lintr rules

@wrridgeway wrridgeway linked an issue Nov 16, 2023 that may be closed by this pull request
@dfsnow dfsnow changed the title Update packages, ingest stage running w/o error Update packages, drop unused code, ingest stage running w/o error Dec 4, 2023
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything in here is just a dependency update (+ R version bump).

renv/activate.R Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the changes here are automated changes from renv and can be safely ignored.

@@ -5,3 +5,4 @@ local/
lock/
python/
staging/
profile
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to commit the profile setting file OR include it in the Docker build, else the container will use the reporting profile.

renv.lock Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All version bumps + an R version bump.

py/flagging.py Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entirety of this script has been moved to model-sales-val.

@@ -10,7 +10,9 @@ suppressPackageStartupMessages({
library(arrow)
library(aws.s3)
library(aws.ec2metadata)
library(ccao)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was missing this as a dependency for get_town_triad() later in the script.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only changes for compatibility with the latest version of lintr.

@@ -267,71 +267,15 @@ assessment_data_w_hie <- assessment_data %>%


#- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# 5. Validate Sales ------------------------------------------------------------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything here is moved to the dedicated sales val repo.

library(glue)
library(here)
library(magrittr)
suppressPackageStartupMessages({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this just to suppress annoying output in the logs.

Pipfile Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both Pipfile things were removed since they existed mostly to support sales validation. DVC is installed directly in the Dockerfile without a lock file.

Comment on lines +5 to +7
renv/library/
renv/sandbox/
renv/staging/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignoring this when using COPY in Docker since it's manually replaced with the installed libraries.

Comment on lines +3 to +5
# Set the working directory to setup. Uses a dedicated directory instead of
# root since otherwise renv will try to scan every subdirectory
WORKDIR /setup
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slight re-configuration of the Dockerfile here. renv was stupidly scanning the entire filesystem when using the root directory and thus timing out. I created a dedicated working directory just for setup stuff, sort of following the multi-stage build pattern.


# Use PPM for binary installs
ENV RENV_CONFIG_REPOS_OVERRIDE "https://packagemanager.posit.co/cran/__linux__/jammy/latest"
ENV RENV_CONFIG_SANDBOX_ENABLED FALSE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not clear what this does but starting R inside the container recommends settings it ~for speed~.

ENV RENV_PATHS_LIBRARY renv/library
ENV RENV_PATHS_CACHE /setup/cache
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the actual location in the container the packages get installed to. They're then symlinked in the actual working dir.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Praise] Thanks for the clear explanations of the renv choices here!

Comment on lines -19 to -31
# Install pipenv for Python dependencies
RUN pip install pipenv

# Copy pipenv files into the image. The reason this is a separate step from
# the later step that adds files from the working directory is because we want
# to avoid having to reinstall dependencies every time a file in the directory
# changes, as Docker will bust the cache of every layer following a layer that
# needs to change
COPY Pipfile .
COPY Pipfile.lock .

# Install Python dependencies
RUN pipenv install --system --deploy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need any of this now. We only need DVC.

@dfsnow dfsnow marked this pull request as ready for review December 4, 2023 16:02
@wrridgeway
Copy link
Member Author

I've reviewed the R scripts and params.yaml and nothing stuck out to me. I'll leave docker and workflow review to @jeancochrane

Copy link
Contributor

@jeancochrane jeancochrane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent cleanup!

.renvignore Outdated
pipeline/05-finalize.R
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Thought, non-blocking] Instead of ignoring the finalize script completely, I wonder if there's a way for us to specifically ignore the library(quarto) call, which is the only reporting dependency in that script? That seems to me like it would be the cleanest way to isolate the finalize step writ large from its report generation substep. But this is a speculative ask, so I don't think it needs to block merging! Just something for us to consider.

Copy link
Member

@dfsnow dfsnow Dec 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately it seems like having renv look at finalize also means it looks into the Quarto doc itself i.e. there's no way to ignore finalize while also ignoring the doc. I think it's fine since all the finalize dependencies (except git2r are covered by prior steps and are unlikely to change.

But let me play around with it some more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2cf6ceb!

Comment on lines +4 to +5
pipeline/08-api.R
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Praise] Oops, I missed this during #62, thanks for catching!

ENV RENV_PATHS_LIBRARY renv/library
ENV RENV_PATHS_CACHE /setup/cache
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Praise] Thanks for the clear explanations of the renv choices here!

library(dplyr)
library(git2r)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Question, non-blocking] Where is git2r being used in the upload stage?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's being used in finalize, see above comment.

@dfsnow dfsnow merged commit 80db35f into master Dec 4, 2023
3 checks passed
@dfsnow dfsnow deleted the refactor-sales-flagging-out-of-model-repos branch December 4, 2023 17:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor sales flagging out of model repos
3 participants