
Releases: GoogleCloudPlatform/cluster-toolkit

v1.23.0 Added Shielded VM support and improved HTCondor module

18 Sep 22:42
0a30105

What's Changed

Key New Features 🎉

Module Improvements 🔨

Improvements 🛠

Version Updates ⏫

Other changes

Full Changelog: v1.22.1...v1.23.0

v1.22.1 Fix Chrome Remote Desktop with updated NVIDIA Grid driver for Ubuntu

06 Sep 17:14
27f24fc

v1.22.0: H3 VM family, Spack module redesign and public build cache support, HTCondor improvements

16 Aug 19:11
9a698ef

What's Changed

Key New Features 🎉

Module Improvements 🛠

  • Add G2 family support to GPU-normalizing code by @tpdownes in #1608
  • Increase default Packer VM scopes to cloud-platform by @tpdownes in #1627
  • Support specification of MIG target shape in HTCondor modules by @tpdownes in #1654

Improvements

Deprecations

Version Updates

Other changes

Full Changelog: v1.21.0...v1.22.0

v1.21.0: Improved compact placement support and error reporting

31 Jul 18:35
6113058

Key New Features

  • Add user guide for image building
  • "ghpc create" now provides line and column number hints for many errors that it encounters in blueprints
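
To illustrate what a line and column hint involves, here is a minimal Python sketch that converts a character offset in blueprint text into a 1-based line and column. It is not how ghpc implements the feature (the release notes do not describe that); the function name `offset_to_line_col` and the sample blueprint text are hypothetical and exist only for this example.

```python
from typing import Tuple


def offset_to_line_col(text: str, offset: int) -> Tuple[int, int]:
    """Return the 1-based (line, column) of a character offset in text."""
    if not 0 <= offset <= len(text):
        raise ValueError("offset out of range")
    # The line number is one more than the count of newlines before the offset.
    line = text.count("\n", 0, offset) + 1
    # rfind returns -1 when there is no earlier newline, which makes the
    # column of the very first character come out as 1.
    last_newline = text.rfind("\n", 0, offset)
    column = offset - last_newline
    return line, column


# Hypothetical blueprint snippet, used only to exercise the function.
blueprint = "blueprint_name: demo\nvars:\n  project_id: my-project\n"
offset = blueprint.index("project_id")
print(offset_to_line_col(blueprint, offset))  # (3, 3)
```

A tool that tracks offsets while parsing can attach such a pair to each error message so that the user can jump straight to the offending spot in the file.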

Module Improvements

  • gke-node-pool: support for ephemeral local SSD
  • vm-instance:
    • allow specification of max_distance when using compact placement
    • improve network block validation to prevent accidental use of default network

Deprecations

  • Deprecate all var.source_image_* variables in Slurm modules (#1524)

Community Contributions

What's Changed

  • Minor refactoring of modulewriter by @mr0re1 in #1522
  • Allow setting kubernetes labels on node group instances by @issacg in #1504
  • Deprecate source_image* fields in Slurm modules by @mr0re1 in #1524
  • Add support for Ephemeral Storage Local SSD API by @issacg in #1506
  • Disable dependabot OFE version update PRs by @mr0re1 in #1547
  • Update Slurm var.disk_type validation by @mr0re1 in #1532
  • Remove "null by omitting value" from blueprints by @mr0re1 in #1555
  • Restrict VM-instance network variables validation by @mr0re1 in #1553
  • Merge release v1.20.0 back into develop by @mr0re1 in #1561
  • Enable expressions in Blueprint.Vars by @mr0re1 in #1563
  • Do not use network when explicitly setting network_interfaces by @tpdownes in #1566
  • Update google provider -> 4.73.0 by @mr0re1 in #1564
  • Packer best practices and networking documentation by @tpdownes in #1562
  • Output multiple errors instead of first one by @mr0re1 in #1569
  • Allow to specify vm-instance.placement_policy.max_distance by @mr0re1 in #1567
  • Group errors of all sub-checks in validateConfig individually. by @mr0re1 in #1573
  • gvnic not supported by ubuntu_containerd by @issacg in #1542
  • Bump google.golang.org/api from 0.128.0 to 0.130.0 by @dependabot in #1557
  • Update list of supported VM images by @rohitramu in #1495
  • Insert a basic Ubuntu node pool into the GKE integration test by @nick-stroud in #1577
  • GKE node-pool provider version requirement by @nick-stroud in #1575
  • Update vm-instance.placement_policy documentation by @mr0re1 in #1580
  • Deprecate DeploymentGroup.Kind by @mr0re1 in #1576
  • NVMe block storage for GKE by @issacg in #1559
  • Add precondition blocking use of c3 machine with pd-standard by @nick-stroud in #1579
  • Add enable_secure_boot var for GKE + docs by @issacg in #1582
  • Bump version to v1.21.0 by @tpdownes in #1628
  • Release v1.21.0 to main branch by @tpdownes in #1629

Full Changelog: v1.20.0...v1.21.0

v1.20.0 Filestore for GKE, Improved Windows support, & git-hosted Packer Modules

10 Jul 19:00
252694a

Key New Features

  • Native GKE support for Filestore: storage-gke example
  • Improved support for Windows: Packer and windows-startup-script
  • Packer "packages": treat remote (git-hosted) Packer modules as packages when using Terraform's "//" notation
  • Automate DAOS server/client images

New Modules

Module Improvements

  • vm-instance:
    • do not swap boot disk (and VM) each time a new disk image is available;
  • vpc:
    • enable TCP tunneling to the WinRM port used by PowerShell;
    • add firewall rule for SSH from arbitrary IP ranges;
  • gke-node-pool:
    • add option for static node count;
    • add option to enable gcfs;
  • gke-cluster:
    • expose the option to not create a system node pool;
    • add options to set create and update timeouts;
    • update service account variable to separate email and scopes;
  • gke-job-template: add templating for persistent volume claims
  • custom-image:
    • add disk_type support;
    • add PowerShell script support;
    • improve support for Windows;
    • treat remote (git-hosted) Packer modules as packages when using Terraform's "//" notation
  • htcondor-install:
    • add support for fixed version of HTCondor;
    • improve resilience;
  • schedmd-slurm-gcp-v5-controller: allow providing short references for image project
  • batch-job-template: use Batch HPC CentOS images as default image

Version updates

  • Update to slurm-gcp 5.7.4
  • Update google-cloud-daos from v0.4.0 to v0.4.1

What's Changed


v1.19.1 Fix panic on null fields in terraform outputs

20 Jun 14:36
d6a3ef4

What's Changed

  • Hotfix: eliminate panic on null value in terraform outputs by @tpdownes in #1471

Full Changelog: v1.19.0...v1.19.1

v1.19.0: ghpc destroy command, automatic ssh configuration, and Ramble integration

15 Jun 17:46
5b01711

Key New Features

  • New destroy command that automates deletion of all infrastructure from a deployment
  • New ramble-execute module. Example blueprint: ramble.yaml.
  • Automated SSH configuration using startup-script module with configure_ssh_host_patterns setting.

Module Improvements

Improvements

Version updates

  • Upgraded Intel DAOS from 0.3.0 to 0.4.0
  • Upgraded Terraform provider from 4.63.1 to 4.65.2
  • Upgraded Spack default version from 0.19.0 to 0.20.0
  • Update to slurm-gcp 5.7.3
    • Allow metadata key slurmd_feature to initiate dynamic node setup.
    • Disable TreeWidth when dynamic nodes are configured.
    • Fix NVIDIA driver install after kernel upgrade for rocky-linux-8.

What's Changed


v1.18.1: Update Package Requirements for Open Front End

08 Jun 20:01
96c09b3

What's Changed

  • Bump cryptography from 40.0.2 to 41.0.0 in /community/front-end/ofe by @dependabot in #1418

Full Changelog: v1.18.0...v1.18.1

v1.18.0: ghpc deploy, new examples, better examples names, slurm-gcp 5.7.2

19 May 04:04
f1e2ec1

Key New Features

  • ghpc deploy is now the recommended way of deploying your environments
  • Multigroup blueprints may now use module outputs from one group in another
    • e.g., a network may be dynamically created in group 1 and its name will be available directly in group 2
  • New hpc-enterprise blueprint with various high performance options
  • New ML blueprints: ml-slurm.yaml and ml-gke.yaml
  • Blueprints renamed for more clarity
  • Ability to communicate variables across deployment groups with ghpc deploy or ghpc export-outputs and ghpc import-inputs
  • Slurm on GCP V4.x is now deprecated; all core examples have moved to V5.7.2

Examples

  • htc-slurm.yaml: shows how to provision a cluster with configuration tuned for many short-duration, loosely coupled jobs.
  • client-google-cloud-storage.yaml: demonstrates different ways to use Google Cloud Storage (GCS) buckets in the HPC Toolkit.

New Modules

  • gke-job-template: Creates a templated Kubernetes job file that can be used to submit jobs.
  • kubernetes-operations: Performs pre-defined operations on Kubernetes resources that would otherwise be executed using kubectl.

Module Improvements

  • gke-cluster: Added GPU support and automated installation of NVIDIA drivers.

Deprecations

Version updates

  • schedmd-slurm-gcp-v5-controller: update SchedMD modules to 5.7.2
  • Min required Terraform version bumped 1.0 -> 1.2
  • Min required Packer version bumped 1.6 -> 1.7.9
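
As a rough aid for checking a local environment against these new minimums, the following Python sketch compares an installed version string with the versions stated above. It is an illustration only and not part of the Toolkit; the names `MIN_VERSIONS`, `parse_version`, and `meets_minimum` are hypothetical, and the sketch assumes purely numeric versions (it does not handle pre-release suffixes such as `-rc1`).

```python
from typing import Tuple

# Minimum versions taken from the release notes above.
MIN_VERSIONS = {"terraform": "1.2", "packer": "1.7.9"}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.7.9' or 'v1.2' into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def meets_minimum(tool: str, installed: str) -> bool:
    """Return True if the installed version is at least the stated minimum."""
    required = parse_version(MIN_VERSIONS[tool])
    found = parse_version(installed)
    # Pad the shorter tuple with zeros so that 1.2 compares equal to 1.2.0.
    width = max(len(required), len(found))
    required += (0,) * (width - len(required))
    found += (0,) * (width - len(found))
    return found >= required


print(meets_minimum("terraform", "1.1.9"))  # False
print(meets_minimum("packer", "1.7.9"))     # True
```

Comparing tuples of integers avoids the pitfall of comparing version strings lexically, where "1.10" would sort before "1.9".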

What's Changed


v1.17.0: Initial Support for GKE, Slurm v5.6.3

04 May 19:05
f7abfbe

Key New Features

  • Initial Support for Kubernetes with GKE (example).
  • Enable specification of all fields of module outputs
  • Instructions to run the toolkit from Cloud Workstations

New Modules

Module Improvements

Improvements

  • Added support for OFE deployment from a configuration file

Version updates

What's Changed

  • Replace startup-script examples with bool inputs by @mr0re1 in #1100
  • Copy all embedded modules into deployment, use unique source for locals by @mr0re1 in #1086
  • Close copy file descriptor in EmbeddedSourceReader by @mr0re1 in #1114
  • Improve error match in embedded_test by @mr0re1 in #1115
  • Adds a gke-cluster module to community by @nick-stroud in #1113
  • DAOS docs update by @cboneti in #1116
  • Simplify and relax type constraints for variables.tf by @mr0re1 in #1111
  • Make every integration test into individual build config by @mr0re1 in #1112
  • Fix validator test_deployment_variable_not_used by @mr0re1 in #1120
  • Add basic documentation for gke-cluster module and example by @nick-stroud in #1117
  • Updating packer documentation to make usage easier to find by @cboneti in #1118
  • Add image_storage_locations input to modules/packer/custom-image by @mr0re1 in #1123
  • Add TF definition for DAILY-test-X,PR-test-X, and PR-validation by @mr0re1 in #1119
  • Add "babysit_tests" tool to automatically approve PR tests by @mr0re1 in #1106
  • Solve state/world discrepancies in TF dev infra. by @mr0re1 in #1126
  • Move SlurmV5 tests affected by stockouts to us-west4-c by @mr0re1 in #1124
  • Improve variable references by @tpdownes in #1127
  • Remove test groups, update documentation by @mr0re1 in #1128
  • Fix bug in check for mixing module kinds within a group by @mr0re1 in #1130
  • Update GitHub bug report template by @mr0re1 in #1131
  • Remove deprecated pod_security_policy by @nick-stroud in #1133
  • Add test selectors to babysit tool by @mr0re1 in #1136
  • Add TF for legacy PR tests. To be removed after release by @mr0re1 in #1135
  • Add SPACK_CACHE secret to spack-gromacs test by @mr0re1 in #1132
  • Add instructions for connecting to the gke-cluster by @nick-stroud in #1138
  • Address need for SystemD override in HTCondor module by @tpdownes in #1139
  • Update TFLint and rules plugin for Google Cloud Platform by @tpdownes in #1146
  • Add double quotes on variables: SC2086 – ShellCheck by @nick-stroud in #1148
  • Add support for sensitive output values by @tpdownes in #1129
  • Represent TerraformBackend.Config with cty.Value by @mr0re1 in #1141
  • Bump github.com/otiai10/copy from 1.9.0 to 1.10.0 by @dependabot in #1143
  • Bump github.com/spf13/cobra from 1.6.1 to 1.7.0 by @dependabot in #1145
  • Truncate short sha length to 7 chars when filtering from cloud build by @nick-stroud in #1151
  • Bump google.golang.org/api from 0.114.0 to 0.117.0 by @dependabot in #1150
  • Bring develop up to date with release of v1.16.0 by @nick-stroud in #1153
  • Pin google terraform provider to latest version by @nick-stroud in #1154
  • Add selectors for batch and spack tests to babysit_tests tool by @nick-stroud in #1155
  • Reduce the number of execution hosts in pbs test to reduce the change… by @nick-stroud in #1149
  • Ensure that PBS test config explicitly uses network module by @tpdownes in #1159
  • Align internal use of Toolkit GitHub refs by @tpdownes in #1160
  • Move Ubuntu test and example to reduce chance of stockout by @nick-stroud in #1163
  • Fix HTCondor central manager configuration by @tpdownes in #1162
  • Add specialized tokenizer to handle ((HCL literals)) by @mr0re1 in #1167
  • Move Slurm v5 high io test to reduce stockouts by @nick-stroud in #1168
  • Gke node pool by @nick-stroud in #1140
  • Make babysit_tests compatible with Python3.7 (VertexAI) by @mr0re1 in #1173
  • Instructions to run the toolkit from Cloud Workstations by @cboneti in #1170
  • Write group metadata to deployment folder by @tpdownes in #1169
  • Update quantum example with new build instructions by @tpdownes in #1176
  • Add TransformSimpleToHcl for cty.Value by @mr0re1 in #1165
  • Developer setup on login is causing workstation to crash on startup by @nick-stroud in #1177
  • Add conditions on Slurm partition enable_placement, exclusive, Oversu… by @mr0re1 in #1174
  • Move tests to avoid stockouts by @nick-stroud in #1179
  • Use a unique Packer SSH username to avoid clashes with previous Packer builds by @nick-stroud in #1184
  • Bump google.golang.org/api from 0.117.0 to 0.118.0 by @dependabot in #1183
  • Bump cloud.google.com/go/compute from 1.19.0 to 1.19.1 by @dependabot in #1182
  • Update SchedMD modules to 5.6.3 (from 5.6.2) by @SkylerMalinowski in #1171
  • Updated chrome rem...