Releases: GoogleCloudPlatform/cluster-toolkit

v1.42.0: Filestore deletion protection, GCP maintenance as Slurm job, Docker daemon configuration

20 Nov 19:27
1a1e22a

What's Changed

Key New Features 🎉

Module Improvements 🔨

Improvements 🛠

Deprecations 💤

Version Updates ⏫

Bug fixes 🐞

  • Refactor mount/mode setting for local SSD RAID by @tpdownes in #3214
  • Fix a bug where try was hiding extraction of gpu driver version by @ankitkinra in #3257
  • Fix the gpu_installation_config default for the case where no customer input is provided by @ankitkinra in #3259
  • SlurmGCP. Fix bug that prevents resourcePolicies clean up. by @mr0re1 in #3266

New Contributors

Full Changelog: v1.41.0...v1.42.0

v1.41.0: Adoption of Slurm 24.05 and Improvements to GKE Support

25 Oct 16:58
26fafe0

What's Changed

Key New Features 🎉

New Modules 🧱

Module Improvements 🔨

Improvements 🛠

  • Create and use non-default service accounts in GKE by @annuay-google in #3123
  • Added documentation on cloud-ops-agent installation and stackdriver removal by @jrossthomson in #3029
  • Ensure local SSD filesystem is assembled into a RAID even upon power off/on cycles by @tpdownes in #3129

Deprecations 💤

Version Updates ⏫

Bug fixes 🐞

  • Fixed the exact number constraint problem for additional vpcs in gpu_direct checks by @sharabiani in #3078
  • Provide explicit project information by @wiktorn in #3060
  • Chrome Remote Desktop: increase resilience of apt operations by @tpdownes in #3093
  • Add mount parallelstore service to mount parallelstore for every reboot by @harshthakkar01 in #3125

New Contributors

Full Changelog: v1.40.1...v1.41.0

v1.40.1: Fix issue that affected GKE blueprints due to dynamic provisioning

10 Oct 01:20
eb00254

What's Changed

Other changes

  • Revert PR#3046 and add more line breaks for readability by @ankitkinra in #3115

Full Changelog: v1.40.0...v1.40.1

v1.40.0: A3 Mega and A3 High families supported in GKE

03 Oct 21:13
f9f9256

What's Changed

Important

All HPC VM images based upon CentOS 7 have been deprecated. This means that
referring to the "hpc-centos-7" family in the "cloud-hpc-image-public"
project will fail. We recommend migrating to the "hpc-rocky-linux-8" family
that is the new default throughout the Toolkit. If CentOS 7 is truly needed,
the final HPC CentOS 7 image can be used by its name: "hpc-centos-7-v20240712".
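
As a rough illustration of the migration described above, the sketch below rewrites an image reference that still points at the deprecated family. The family, project, and image names come from the note; the dictionary shape (`project`, `family`, `name` keys) and the `migrate_image` helper are assumptions made for this example and are not part of the Toolkit.

```python
# Values taken from the deprecation note above.
IMAGE_PROJECT = "cloud-hpc-image-public"
DEPRECATED_FAMILY = "hpc-centos-7"
RECOMMENDED_FAMILY = "hpc-rocky-linux-8"
FINAL_CENTOS7_IMAGE = "hpc-centos-7-v20240712"


def migrate_image(image: dict, keep_centos7: bool = False) -> dict:
    """Return a copy of a (hypothetical) image reference without the deprecated family.

    References that do not use the deprecated family are returned unchanged.
    With keep_centos7=True the family is replaced by the final CentOS 7 image name.
    """
    migrated = dict(image)
    uses_deprecated = (
        image.get("project") == IMAGE_PROJECT
        and image.get("family") == DEPRECATED_FAMILY
    )
    if not uses_deprecated:
        return migrated
    if keep_centos7:
        # Pin the last published image by name instead of by family.
        del migrated["family"]
        migrated["name"] = FINAL_CENTOS7_IMAGE
    else:
        migrated["family"] = RECOMMENDED_FAMILY
    return migrated


old = {"project": IMAGE_PROJECT, "family": DEPRECATED_FAMILY}
print(migrate_image(old))
print(migrate_image(old, keep_centos7=True))
```

Pinning by name is only worth doing when CentOS 7 is truly required, since a fixed image name receives no further updates.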

Key New Features 🎉

New Modules 🧱

Module Improvements 🔨

Improvements 🛠

Deprecations 💤

Version Updates ⏫

Bug fixes 🐞

Other changes

  • NeMo readme instructions for preloading gpt2 tokenizer by @koallison in #3075

New Contributors

Full Changelog: v1.39.0...v1.40.0

v1.39.0: Slurm reservations during maintenance windows, Improved GKE Support, removed CentOS 7 references

12 Sep 19:38
7699f5d

What's Changed

Key New Features 🎉

Module Improvements 🔨

Improvements 🛠

Bug fixes 🐞

  • Add slurmgcp-managed infix to resource policy name by @mr0re1 in #2892
  • Move pytest and other package installation to make by @annuay-google in #2890
  • Prevent use of google provider 6.0 where breaking changes are in use by @tpdownes in #2978
  • Fix local_ssd_config issue that forces node-pool recreation by @sharabiani in #2968
  • kubernetes provider added to gke-cluster module by @sharabiani in #2985
  • Fix for cleanup script. The last input is optional by @cdunbar13 in #2993
  • Catch "None" fields in slurm job datetime data for BigQuery by @fdmalone in #2992

Other changes

New Contributors

Full Changelog: v1.38.0...v1.39.0

v1.38.0: Slurm GCP v6 for a3-highgpu-8g and added ability to disable automatic updates

15 Aug 23:20
1e38ce0

What's Changed

Key New Features 🎉

New Modules 🧱

Module Improvements 🔨

Improvements 🛠

Deprecations 💤

Version Updates ⏫

Bug fixes 🐞

New Contributors

Full Changelog: v1.37.2...v1.38.0

v1.37.2: Fix SlurmGCP cleanup of resource policies

09 Aug 21:23
229803f

What's Changed

Bug fixes 🐞

  • Delete at most one resource policy at a time by @mr0re1 in #2895

Full Changelog: v1.37.1...v1.37.2

v1.37.1: Documentation update

02 Aug 18:13
9e68ecc

Fix minor typographical errors in documentation

Full Changelog: v1.37.0...v1.37.1

v1.37.0

31 Jul 21:14
54da9b7

The HPC Toolkit has been rebranded to Cluster Toolkit. More details will follow shortly. The GitHub repository has been renamed to match. This should not break existing workflows: references to the old name should redirect seamlessly to the updated repository. The binary has been renamed to gcluster (formerly ghpc); ghpc remains available as a symlink and will continue to work. If you notice any unexpected behavior as part of this transition, please report it.

What's Changed

Key New Features 🎉

Other changes

Full Changelog: v1.36.1...v1.37.0

v1.36.1: Fix Slurm GCP Cloud Parameter Defaults

26 Jul 22:45
493308e

What's Changed

Bug fixes 🐞

Full Changelog: v1.36.0...v1.36.1