21.09
DeepOps 21.09 Release Notes
What's New
Release 21.09 is mostly a bug-fix release.
General
- Support for DGX OS 5 in nvidia-dgx role
Slurm
- Slurm version 21.08.1
- HPC SDK 21.9
- Open OnDemand v2.0.9
- CUDA toolkit 11.4
- Slurm Pyxis plugin 0.11.1
- Enroot container runtime v3.2.0
- hwloc 2.5.0, PMIx 3.2.3
- Spack v0.16.2
K8s
- Kubernetes version v1.20.7 (kubespray v2.16.0)
- Helm version v3.5.4
- GPU Operator v1.8.2 (GPU driver 470.57.02)
- GPU Device Plugin v0.9.0
- GPU Feature Discovery v0.4.1
- NFS Client Provisioner v4.0.13
Changes
- Docker version 20.10
Bugs/Enhancements
- Improved cleanup in Slurm epilog (#965)
- Fix disabling NVIDIA driver install on Slurm cluster install (#948)
- Permit SFTP in default SSHD config (#980)
- Address different possible DCGM service names depending on version (#983)
- Fix PAM Slurm adopt/login (#989)
- Enroot: adjust cache directory to be per-user (#997)
- Add proxy support for downloading hwloc, pmix, nhc, and Slurm (#1002)
- Remove broken offline deployment support and clarify documentation (#1012)
- Grafana: add var for custom config template (#994)
- EasyBuild: Enable both shells on all distros (#993)
- Default to building Slurm with dynamic libs (#1021)
- ood-wrapper: Don't install python3-passlib on CentOS 7 (#995)
- Update ansible-role-enroot to 0.5.0 (#1030)
Upgrade steps
If you are upgrading to this version of DeepOps from a previous release, follow the upgrade section of the Slurm or Kubernetes Deployment Guides. In addition, re-run the ./scripts/setup.sh script and add any new variables from the config.example files to your existing config. For a full diff from release 21.06, run:
git diff 21.06 21.09 -- config.example/
If you encounter problems, please open a GitHub issue. See the update guide for additional guidance.
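As a rough aid for the "add any new variables" step, the sketch below lists top-level YAML keys that appear in an example file but not in your own copy. The function name missing_vars is hypothetical and not part of DeepOps, and the check is deliberately simple: it only sees uncommented, unindented keys, so variables that ship commented out in config.example will not be reported.

```shell
# Hypothetical helper (not part of DeepOps): print top-level YAML keys that
# exist in an example file but are absent from the local config file.
# Only uncommented keys that start at column 1 are considered.
missing_vars() {
    example="$1"
    current="$2"
    # Extract "key" from lines of the form "key: ..." in the example file.
    sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\):.*/\1/p' "$example" | sort -u |
    while IFS= read -r key; do
        # Report the key if no line in the local config starts with "key:".
        grep -q "^${key}:" "$current" || printf '%s\n' "$key"
    done
}
```

Assuming your local configuration lives in config/ next to config.example/, a call such as missing_vars config.example/group_vars/all.yml config/group_vars/all.yml would print one candidate variable per line. Treat the output as a starting point and confirm it against the git diff above.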