Releases: GoogleCloudPlatform/cluster-toolkit
v1.36.0: Parallelstore support
What's Changed
Key New Features 🎉
- Add support for Parallelstore in pre-existing-network-storage by @harshthakkar01 in #2701
- Develop and adopt boot-time fix for EOL CentOS 7 repositories by @tpdownes in #2738
New Modules 🧱
- Create 'pre-existing-gke-cluster' module by @sharabiani in #2704
- Add Parallelstore module and support for Rocky 8, Ubuntu 22.04 and Debian 12 by @harshthakkar01 in #2695
- Add schedmd-slurm-gcp-v6-nodeset-dynamic module by @mr0re1 in #2696
Module Improvements 🔨
- Add 'source' argument for path to prolog or epilog scripts by @andybubu in #2670
- Allow users to turn on access to cluster via GCP public IP address space by @ankitkinra in #2687
- Add known GPU types and their accelerators to gke module by @ankitkinra in #2680
- Add disk_type for HTCondor's EP template by @aneo-ssam in #2705
Improvements 🛠
Bug fixes 🐞
- Revert "Remove installation of enroot and pyxis from a3-highgpu-8g blueprint" by @samskillman in #2722
- Only enable GPU taints if the guest_accelerator list is not empty by @ankitkinra in #2727
- Move GCESysPrep to provisioner in Windows scripts by @tpdownes in #2728
- Modify a3-highgpu-8g image-building blueprint network by @tpdownes in #2744
- Update image to new centos image for both login and builder nodes by @ankitkinra in #2780
Other changes
New Contributors
- @sharabiani made their first contribution in #2704
- @aneo-ssam made their first contribution in #2705
Full Changelog: v1.35.1...v1.36.0
v1.35.1: Fix SlurmGCP prolog/epilog scripts bug
Full Changelog: v1.35.0...v1.35.1
v1.35.0: Shared reservations, TF provider configuration, and targeted group deployment
What's Changed
Key New Features 🎉
- Ability to configure the Terraform provider in blueprints by @cdunbar13 in #2635
- Add --skip and --only to the deploy and destroy commands by @mr0re1 in #2658
- Add support for shared reservations by @mr0re1 in #2640
New Modules 🧱
- Add standalone MIG module by @mr0re1 in #2682
- Add pre-existing-subnetwork module by @LasseHjorth in #2597
Module Improvements 🔨
- NFS export options for Filestore by @wiktorn in #2615
- Add option to specify address range for psa by @andybubu in #2633
- Add options to specify authorized networks for slurm-cloudsql by @andybubu in #2631
- Update GKE ML blueprint to use native GKE driver by @ankitkinra in #2662
- Exposes TreeWidth and TopologyPlugin in blueprints by @nick-stroud in #2683
- OFE: new features - edited history by @ek-nag in #2700
Improvements 🛠
Deprecations 💤
- SlurmGCP V6 remove support for custom instance templates by @mr0re1 in #2664
- SlurmGCP V6 remove support for custom instance templates by @mr0re1 in #2667
Bug fixes 🐞
- Update to fix OFE filestore by @cdunbar13 in #2688
New Contributors
- @andybubu made their first contribution in #2633
- @LasseHjorth made their first contribution in #2597
Full Changelog: v1.34.3...v1.35.0
v1.34.3: Documentation update
v1.34.2: Documentation update
v1.34.1: A3 Mega Slurm Clusters
What's Changed
Key New Features 🎉
- New Blueprint to provision Slurm clusters with A3 Mega (a3-megagpu-8g) compute nodes
- Simplification of the a3-highgpu-8g blueprint using the recently added Enroot/Pyxis and PMIx support in Slurm images and the new multivpc module for managing multiple GPU networks
Module Improvements 🔨
- Update database version variable in slurm-cloudsql-federation module by @tfhartmann in #2606
Version Updates ⏫
- Update Slurm-GCP modules to v6.5.4 by @tpdownes in #2618
- Adopt Slurm-GCP 6.5.5 by @tpdownes in #2641
- Adopt Slurm-GCP 6.5.6 to work around long hostname issues in Debian 12 by @tpdownes in #2644
Bug fixes 🐞
- Modify a3-highgpu-8g cluster blueprint network by @tpdownes in #2648
- Re-organize a3-highgpu-8g documentation by @tpdownes in #2647
New Contributors
- @wiktorn made their first contribution in #2607
- @tfhartmann made their first contribution in #2606
Full Changelog: v1.34.0...v1.34.1
v1.34.0: Slurm-GCP v6 Generally Available
What's Changed
In this release, we promote Slurm-GCP v6 to GA, making it the recommended version of Slurm-GCP. Find out more in the announcement.
Key New Features 🎉
- Roll TPG (Terraform Google provider) to v5.28.0 by @alyssa-sm in #2535
Module Improvements 🔨
- Slurm6. Add support for nodeset network_storage by @mr0re1 in #2522
- Add documentation about how to use max-distance with Slurm v6 by @nick-stroud in #2551
- Remove login infix from instance name by @mr0re1 in #2540
- New output for private-service-access by @cdunbar13 in #2562
- Make batch job runnables a list instead of single string by @aaronegolden in #2516
- Implement recovery of HTCondor spool (job queue) by @tpdownes in #2500
- Slurm6. Stop using SlurmGCP remote modules for partition and nodeset_dyn by @mr0re1 in #2558
- Fix deployment of multiple Batch jobs by @aaronegolden in #2543
- Add debugging output to spack and ramble installations by @cdunbar13 in #2568
Improvements 🛠
- High-level changes in Slurm-GCP v6 by @nick-stroud in #2620
- OFE: new features and fixes. by @ek-nag in #2512
- Update A3 blueprint guidance for reservations by @tpdownes in #2573
- Proceed with destruction of other groups even if current failed by @mr0re1 in #2575
- Add blueprint to support Apptainer by @wkharold in #2565
Deprecations 💤
- Mark Slurm-GCP v5 examples and references as legacy by @harshthakkar01 in #2567
Version Updates ⏫
- Update a3-highgpu-8g blueprint to use latest v5 tag by @tpdownes in #2572
- Update Slurm-GCP v5 modules and examples to 5.11.1 by @tpdownes in #2595
- Update Slurm-GCP v6 modules and examples to 6.5.2 by @tpdownes in #2594
Bug fixes 🐞
- OFE: fixing broken partitions update logic. by @ek-nag in #2563
- Allow only specific reservation for nodeset in slurm-gcp v6 nodeset by @harshthakkar01 in #2612
- Allow specific reservation for node-group in slurm-gcp v5 by @harshthakkar01 in #2614
Other changes
- Revert "Allow specific reservation for node-group in slurm-gcp v5" by @harshthakkar01 in #2621
- Revert "Revert "Allow specific reservation for node-group in slurm-gcp v5"" by @harshthakkar01 in #2622
Full Changelog: v1.33.0...v1.34.0
v1.33.0: "ghpc_stage" function; Slurm-GCP v6 improvements
What's Changed
Key New Features 🎉
- Add docs about ghpc_stage and other functions by @mr0re1 in #2485
- Add startup-script option to automatically install Docker at boot by @tpdownes in #2489
New Modules 🧱
- MultiVPC Module by @cdunbar13 in #2450
Module Improvements 🔨
- Address feature requests for HTCondor functionality in Windows by @tpdownes in #2469
- Slurm6. Replace service_account with service_account_email|scopes by @mr0re1 in #2495
- Slurm6. Replace vars disable_X -> enable_X by @mr0re1 in #2486
- Remove "hard" dependency between login instance and controller instance by @mr0re1 in #2413
- Allow the wait-for-startup module to take a list of instance names by @rohitramu in #2515
- Simplify "cleanup compute" by @mr0re1 in #2479
- Copy labels from the batch-job-template module to the actual Batch job spec by @aaronegolden in #2514
- Slurm6. Automatically set login instance names without including the role by @mr0re1 in #2531
- Adopt Slurm-GCP 6.4.6 by @tpdownes in #2511
Improvements 🛠
- Update hpc-slurm-ramble-gromacs example to use Slurm-GCP v6 modules by @harshthakkar01 in #2407
- Add Slurm-GCP v6 example for hpc-amd-slurm blueprint and references by @harshthakkar01 in #2411
Bug fixes 🐞
- Go binary downloads must include patch version by @nick-stroud in #2465
- Resolve Ansible crash in Spack installation by @tpdownes in #2462
Full Changelog: v1.32.1...v1.33.0
v1.32.1: Fix version number in modules
v1.32.0: Deployment files and Slurm-GCP v6 examples
What's Changed
Key New Features 🎉
- Deployment files allow merging generic blueprints with configurations specific to single deployments
New Modules 🧱
- Decoupling private access from Cloud SQL to allow multiple instances in same VPC by @cboneti in #2397
Improvements 🛠
- Possible breaking change to workflows that call ghpc create: Set default validation level to ERROR by @mr0re1 in #2383
- Add example using Slurm static compute nodes by @nick-stroud in #2393
- Add lustre slurm blueprint for v6 version and integration test by @harshthakkar01 in #2329
- Add v6 version of ml-slurm blueprint and integration test by @harshthakkar01 in #2337
- Add Slurm-GCP v6 version of zone policies blueprint by @alyssa-sm in #2390
- In-place update of hpc-slurm-qwiklab blueprint to use Slurm-GCP v6 by @harshthakkar01 in #2402
- Add SlurmGCP v6 version of CAE blueprint by @harshthakkar01 in #2325
- Roll forward "Add example using Slurm static compute nodes" by @nick-stroud in #2408
- Update gcc to 13.1.0 on Rocky 8 Gromacs example by @tpdownes in #2445
- Add SlurmGCP v6 version of hcls blueprint and integration test by @harshthakkar01 in #2366
- Update MPI & Gromacs for cache hit and better compatibility w/ Rocky8 default image by @nick-stroud in #2456
Bug fixes 🐞
- Revert "Add example using Slurm static compute nodes" by @nick-stroud in #2404
- Fixes for workstation creation - new extension added for yaml by @cdunbar13 in #2421
- Updating garther startup script and integration test by @cdunbar13 in #2449
Full Changelog: v1.31.1...v1.32.0