-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[IA-4839] [DO NOT MERGE] Terra on Azure (ToA) base jupyter docker image #483
base: master
Are you sure you want to change the base?
Conversation
…with nbextensions
docker run \ | ||
--env GOOGLE_PROJECT \ | ||
--volume "${{ steps.auth.outputs.credentials_file_path }}":/tmp/credentials.json:ro \ | ||
--env GOOGLE_APPLICATION_CREDENTIALS="/tmp/credentials.json" \ | ||
--volume $GITHUB_WORKSPACE/terra-base-jupyter/tests:/tests \ | ||
--workdir=/tests \ | ||
--entrypoint="" \ | ||
terra-base-jupyter:smoke-test \ | ||
/bin/sh -c "pip3 install pytest; pytest" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be possible to either drop this into a scripts/run tests
(or something like this) and/or in combination use docker compose
to set this up for the next person working on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I would love to brainstorm that with you because that might slightly fall outside of this PR scope. But the question here is how can we easily test these docker images? I am not convinced that the current GHAs that we have is the way to go
# `brew install eslint` | ||
# `brew install npm` | ||
# `npm i` | ||
# `eslint --fix *.js` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it make sense to either:
- tie this in to a pre-commit local hook?, or
- make the build (GHA) fail if these are not run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likely yes. These are very outdated scripts that I left in there for now, but i am not sure if we even need it. They might be gone in a sprint or two once I am done setting this up on a BEE 🤞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks sound and well-commented! Just left a few questions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for the heroic work in putting all these together @LizBaldo !
I am no expert in the Terra IA system, so only focused on the dockerfile itself.
The biggest concern I have, which I don't have a solution for, is python 3.10 is hardcoded in many places, which might make maintenance a little difficult down the road.
## create new conda environments on top of it. The important part is that jupyter IS NOT installed | ||
## in the base environment to provide isolation between the user environment, and the jupyter server | ||
## to avoid cross-contamination | ||
COPY conda-environment.yml . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly the goal here, maybe using conda-pack can help further reduce the image size.
When we build dockers that uses conda to manage envs, we use conda-pack combined with multi-stage builds to reduce image sizes.
Example here:
https://github.com/broadinstitute/long-read-pipelines/blob/4b50b3857d33fd195461e5eb5c8a83d7fe6dda27/docker/lr-papermill-base/Dockerfile#L8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To add to this, it'd be good to have an understanding how large each layer is, before optimizing the sizes further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good call out. I will add a detail of how big each layer is in the Readme. Regarding conda-pack, I definitely get why you are using it, but since the plan is to have the majority of Terra user use this base image, I was thinking about setting up the base conda environment directly.
The use case of building a custom image on top of it is a bit of an edge case, and so the base image might not be perfectly curated for your need. Would this be a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this would not be a problem, just an optimization (which could be evil as it might be pre-mature). OTH, our experience with conda-pack
is that it reduces the image size further.
This is where we borrowed the lesson. https://pythonspeed.com/articles/conda-docker-image-size/
I can give it a try once the other components are steady, actually since I'm the one mostly interested in it.
ENV PATH "${PATH}:${JUPYTER_HOME}/bin" | ||
|
||
# Install Poetry, set up the virtual environment for jupyter to run and then cleanup / uninstall poetry | ||
RUN curl -sSL https://install.python-poetry.org | POETRY_HOME=$POETRY_HOME $BASE_PYTHON_PATH \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing that I was thinking about is maybe not using the user python to install the jupyter server, but instead use the system one to provide another layer of isolation
|
||
# Set the python version and corresponding conda installer | ||
ENV PYTHON_VERSION 3.10 | ||
ENV CONDA_INSTALLER https://repo.anaconda.com/miniconda/Miniconda3-py310_23.5.1-0-Linux-x86_64.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look into this as well: https://www.anaconda.com/blog/a-faster-conda-for-a-growing-community
JIRA Ticket: https://broadworkbench.atlassian.net/browse/IA-4839
Our current offering of Jupyter Notebook on Terra on Azure is currently done through the Azure DSVM (Data Science VM). This PR is proposing a base docker image (similar to the one provided on Google) as an alternative to the DSVM to provide the following improvements:
Note: Even though this container is meant to work for both Azure and Google, the focus is to make it work on Azure first. This means that it might not work immediately out of the box as a custom image on Google. I am working on deploying this container via Terra next, and might still iterate on the dockerfile if (more like when) I find bugs.