Releases: GoogleCloudPlatform/cluster-toolkit

v1.36.0 - Parallelstore support

19 Jul 16:59
da56862

What's Changed

Key New Features 🎉

  • Add support for parallelstore in pre-existing-network-storage by @harshthakkar01 in #2701
  • Develop and adopt boot-time fix for EOL CentOS 7 repositories by @tpdownes in #2738

New Modules 🧱

Module Improvements 🔨

  • Add 'source' argument for path to prolog or epilog scripts by @andybubu in #2670
  • Allow users to turn on access to cluster via GCP public IP address space by @ankitkinra in #2687
  • Add known gpu types and their accelerators to gke module by @ankitkinra in #2680
  • Add disk_type for HTCondor's EP template by @aneo-ssam in #2705

Improvements 🛠

  • Update A3 mega blueprint to use Slurm-GCP 6.5.12 by @tpdownes in #2763

Bug fixes 🐞

  • Revert "Remove installation of enroot and pyxis from a3-highgpu-8g blueprint" by @samskillman in #2722
  • Only enable gpu taints if guest_accelerator list is not empty by @ankitkinra in #2727
  • Move GCESysPrep to provisioner in Windows scripts by @tpdownes in #2728
  • Modify a3-highgpu-8g image-building blueprint network by @tpdownes in #2744
  • Update image to new centos image for both login and builder nodes by @ankitkinra in #2780

Other changes

  • Add validator for Terraform version and SlurmGCP6 by @mr0re1 in #2772

New Contributors

Full Changelog: v1.35.1...v1.36.0

v1.35.1: Fix SlurmGCP prolog/epilog scripts bug

26 Jun 23:51
dbe05ee

v1.35.0: Shared reservations, TF provider configuration, and targeted group deployment

20 Jun 23:35
eaeacfb

What's Changed

Key New Features 🎉

  • Ability to configure the Terraform provider in blueprint by @cdunbar13 in #2635
  • Add --skip and --only to deploy and destroy commands by @mr0re1 in #2658
  • Add support for shared reservations by @mr0re1 in #2640
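The targeted group deployment named in this release's title can be pictured with a short Python sketch. It is only an illustration of the selection idea behind --skip and --only, not the toolkit's implementation: the `select_groups` helper, its parameters, and the group names are hypothetical, and the assumed semantics are that --only keeps just the listed groups while --skip drops the listed ones.

```python
from typing import Iterable, List, Optional


def select_groups(
    groups: Iterable[str],
    only: Optional[Iterable[str]] = None,
    skip: Optional[Iterable[str]] = None,
) -> List[str]:
    """Hypothetical helper: pick which deployment groups to act on.

    Assumed semantics (not confirmed by the release notes):
    - `only`, when non-empty, restricts the selection to the listed groups.
    - `skip` removes the listed groups from the selection.
    Original ordering of `groups` is preserved.
    """
    only_set = set(only or [])
    skip_set = set(skip or [])
    selected = []
    for name in groups:
        if only_set and name not in only_set:
            continue  # not among the groups explicitly requested
        if name in skip_set:
            continue  # explicitly excluded
        selected.append(name)
    return selected


# Example group names are made up for illustration.
all_groups = ["primary", "image", "cluster"]
only_cluster = select_groups(all_groups, only=["cluster"])
without_image = select_groups(all_groups, skip=["image"])
```

Preserving the original order matters in a sketch like this because later groups typically depend on earlier ones.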

New Modules 🧱

Module Improvements 🔨

Improvements 🛠

  • Add topologically-aware NCCL tests solution to A3 Mega by @tpdownes in #2657

Deprecations 💤

  • SlurmGCP V6 remove support for custom instance templates by @mr0re1 in #2664
  • SlurmGCP V6 remove support for custom instance templates by @mr0re1 in #2667

Bug fixes 🐞

New Contributors

Full Changelog: v1.34.3...v1.35.0

v1.34.3 Documentation update

10 Jun 19:21
627b43a

What's Changed

Other changes

  • Add link to newly published A3 Mega documentation by @tpdownes in #2677

Full Changelog: v1.34.2...v1.34.3

v1.34.2: Documentation update

03 Jun 21:30
2af3c42

What's Changed

Other changes

  • Update A3 Mega instructions pending publication of documentation by @tpdownes in #2655

Full Changelog: v1.34.1...v1.34.2

v1.34.1: A3 Mega Slurm Clusters

30 May 17:27
e078007

What's Changed

Key New Features 🎉

  • New Blueprint to provision Slurm clusters with A3 Mega (a3-megagpu-8g) compute nodes
  • Simplification of a3-highgpu-8g blueprint by using recently added support for Enroot/Pyxis, PMIx in Slurm images and the new multivpc module for managing multiple GPU networks

Module Improvements 🔨

  • Update database version variable in slurm-cloudsql-federation module by @tfhartmann in #2606

Version Updates ⏫

Bug fixes 🐞

  • Modify a3-highgpu-8g cluster blueprint network by @tpdownes in #2648
  • Re-organize a3-highgpu-8g documentation by @tpdownes in #2647

New Contributors

Full Changelog: v1.34.0...v1.34.1

v1.34.0: Slurm-GCP v6 Generally Available

24 May 00:37
5b360ae

What's Changed

In this release, we promote Slurm-GCP V6 to GA, making it the recommended version of Slurm-GCP. Find out more in the announcement.

Key New Features 🎉

Module Improvements 🔨

Improvements 🛠

Deprecations 💤

Version Updates ⏫

  • Update a3-highgpu-8g blueprint to use latest v5 tag by @tpdownes in #2572
  • Update Slurm-GCP v5 modules and examples to 5.11.1 by @tpdownes in #2595
  • Update Slurm-GCP v6 modules and examples to 6.5.2 by @tpdownes in #2594

Bug fixes 🐞

Other changes

  • Revert "Allow specific reservation for node-group in slurm-gcp v5" by @harshthakkar01 in #2621
  • Revert "Revert "Allow specific reservation for node-group in slurm-gcp v5"" by @harshthakkar01 in #2622

Full Changelog: v1.33.0...v1.34.0

v1.33.0: "ghpc_stage" function; Slurm-GCP v6 improvements

16 May 13:57
146ebbe

What's Changed

Key New Features 🎉

  • Add docs about ghpc_stage and other functions by @mr0re1 in #2485
  • Add startup-script option to automatically install Docker at boot by @tpdownes in #2489

New Modules 🧱

Module Improvements 🔨

  • Address feature requests for HTCondor functionality in Windows by @tpdownes in #2469
  • Slurm6. Replace service_account with service_account_email|scopes by @mr0re1 in #2495
  • Slurm6. Replace vars disable_X -> enable_X by @mr0re1 in #2486
  • Remove "hard" dependency between login instance and controller instance by @mr0re1 in #2413
  • Allow the wait-for-startup module to take a list of instance names by @rohitramu in #2515
  • Simplify "cleanup compute" by @mr0re1 in #2479
  • Copy labels from the batch-job-template module to the actual Batch job spec by @aaronegolden in #2514
  • Slurm6. Automatically set login instances name, don't put role into it by @mr0re1 in #2531
  • Adopt Slurm-GCP 6.4.6 by @tpdownes in #2511

Improvements 🛠

Bug fixes 🐞

Full Changelog: v1.32.1...v1.33.0

v1.32.1: Fix version number in modules

19 Apr 18:40
bec99bb

What's Changed

Fix version number in modules

Full Changelog: v1.32.0...v1.32.1

v1.32.0: Deployment files and Slurm-GCP v6 examples

18 Apr 18:20
d4754d4

What's Changed

Key New Features 🎉

  • Deployment files allow merging generic blueprints with configurations specific to single deployments
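
As a rough illustration of the merging idea, the following Python sketch overlays deployment-specific values on the variables of a generic blueprint. The dictionary layout, the `vars` key, the sample values, and the `merge_deployment` helper are all assumptions made for this example; these notes do not describe the toolkit's actual merge rules.

```python
from copy import deepcopy


def merge_deployment(blueprint: dict, deployment: dict) -> dict:
    """Hypothetical helper: overlay deployment-specific vars on a blueprint.

    Assumption: both inputs carry a flat "vars" mapping, and values from the
    deployment file win when a key appears in both.
    """
    merged = deepcopy(blueprint)  # leave the generic blueprint untouched
    merged_vars = dict(merged.get("vars", {}))
    merged_vars.update(deployment.get("vars", {}))
    merged["vars"] = merged_vars
    return merged


# Sample data invented for illustration only.
generic = {
    "blueprint_name": "example",
    "vars": {"region": "us-central1", "zone": "us-central1-a"},
}
specific = {"vars": {"project_id": "my-project", "zone": "us-central1-c"}}

result = merge_deployment(generic, specific)
```

In this sketch the generic blueprint stays reusable across deployments because the merge works on a copy.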

New Modules 🧱

  • Decoupling private access from Cloud SQL to allow multiple instances in same VPC by @cboneti in #2397

Improvements 🛠

Bug fixes 🐞

  • Revert "Add example using Slurm static compute nodes" by @nick-stroud in #2404
  • Fixes for workstation creation - new extension added for yaml by @cdunbar13 in #2421
  • Updating gather startup script and integration test by @cdunbar13 in #2449

Full Changelog: v1.31.1...v1.32.0