Image build failures - could be caught in CI before PR merge #107
Labels
post-mortem-followup
Valuable to follow up for the open source ecosystem.
wontfix
This will not be worked on
I opted not to set up a full staging environment alongside the production environment, but it would have been nice to verify in a CI system that the build of the Docker image succeeded before we merged something. Otherwise, once a broken image build is merged, CI deployments will either fail or lead to a broken JupyterHub.
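As a rough illustration of such a pre-merge check, a CI job could run a small script like the sketch below and fail the job when the image does not build. This is only a sketch under assumptions: the image tag, the build context path, and the function names `build_command` and `verify_build` are hypothetical and not taken from this repository. It relies only on the standard `docker build --tag` command being available on the CI runner.

```python
import subprocess


def build_command(image, context="."):
    """Return the docker CLI invocation that builds `image` from `context`."""
    return ["docker", "build", "--tag", image, context]


def verify_build(image, context=".", run=subprocess.run):
    """Return True if the image builds successfully.

    `run` is injectable so the check can be exercised without docker;
    by default it executes the real `docker build`.
    """
    result = run(build_command(image, context))
    return result.returncode == 0


# Hypothetical usage in a CI step (image name and path are assumptions):
#   import sys
#   sys.exit(0 if verify_build("example/user-image:ci-check", "image/") else 1)
```

Because the script exits non-zero on a failed build, the CI system would mark the pull request as failing before it can be merged and deployed.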
When we had prePuller.hook.enabled set by default, a failed image build would make the deployment fail without disrupting the JupyterHub. With it disabled, as we did after it failed because the limit of pods per node was reached, a failed build instead makes the hub fail to start new servers because the image is not found.
We experienced a build failure in #106, and when #103 was merged afterwards it led to the JupyterHub attempting to spawn user servers with an image that was not available. I happened to notice this 30 seconds after the JupyterHub had broken and did a helm rollback operation to restore it to its previous working state.