Releases: GoogleCloudPlatform/cluster-toolkit
v1.23.0 Added Shielded VM support and improve HTCondor module
What's Changed
Key New Features 🎉
- Star ccm update to include Slurm by @jrossthomson in #1644
Module Improvements 🔨
- Add Shielded VM support to Packer module by @tpdownes in #1682
- Add Shielded VM features to HTCondor modules by @tpdownes in #1704
Improvements 🛠
- Support network_storage runners in HTCondor modules by @tpdownes in #1696
- hpc-enterprise-slurm gpu fix by @cboneti in #1691
Version Updates ⏫
Other changes
- Update reference version to v1.23.0 by @harshthakkar01 in #1761
- Release v1.23.0 by @harshthakkar01 in #1762
Full Changelog: v1.22.1...v1.23.0
v1.22.1 Fix Chrome Remote Desktop with updated NVIDIA Grid driver for Ubuntu
- Hotfix: Update NVIDIA Grid drivers to 15.3 by @nick-stroud in #1737
Full Changelog: v1.22.0...v1.22.1
V1.22.0: H3 VM family, Spack module redesign and public build cache support, HTCondor improvements
What's Changed
Key New Features 🎉
- HTCondor blueprint simplification by @tpdownes in #1612
- Add MIG ID as output/input in HTCondor modules by @tpdownes in #1617
- Support N>2 groups of HTCondor execute points by @tpdownes in #1626
- Setup Spack to pull from Google's Spack binary cache by default by @nick-stroud in #1650
- Breaking Changes - Spack Module Redesign by @nick-stroud in #1631
- Custom image functionality for Open Front End by @ek-nag in #1594
- Add H3 to examples/hpc-enterprise-slurm by @mr0re1 in #1657
- Add H3 partition to the hpc-slurm example by @mr0re1 in #1667
Module Improvements 🛠
- Add G2 family support to GPU-normalizing code by @tpdownes in #1608
- Increase default Packer VM scopes to cloud-platform by @tpdownes in #1627
- Support specification of MIG target shape in HTCondor modules by @tpdownes in #1654
Improvements
- Make the "Apply Changes" prompt clearer by @rohitramu in #1625
Deprecations
Version Updates
- Update DDN EXAscaler to 6.2 by @rohitramu in #1606
- Update to latest tf provider and resolve GKE conflict by @nick-stroud in #1641
- Update DDN EXAScaler image to 6.2 by @rohitramu in #1656
- Bump slurm-gcp to 5.7.5 (from 5.7.4) by @SkylerMalinowski in #1661
Other changes
- Make execute command ansible script generic so it can be reused by @nick-stroud in #1487
- Add commands feature to spack-install to allow execution of arbitrary commands by @nick-stroud in #1490
- Deprecate caches_to_populate on spack-install module in favor of commands by @nick-stroud in #1491
- Add functionality for Spack module to take data runners by @nick-stroud in #1496
- Make spack output runner destination deterministic to fix known at apply error by @nick-stroud in #1500
- DEPRECATE spack-install.environments in favor of using data_files and commands by @nick-stroud in #1497
- DEPRECATE spack-install.packages in favor of using commands by @nick-stroud in #1501
- DEPRECATE spack-install.licenses in favor of using commands by @nick-stroud in #1518
- DEPRECATE spack-install.compilers in favor of using commands by @nick-stroud in #1502
- DEPRECATE install_flags and concretize_flags in spack-install module by @nick-stroud in #1526
- DEPRECATE spack_cache_url and gpg_keys from spack-install by @nick-stroud in #1525
- DEPRECATE spack-install.configs in favor of using data_files and commands by @nick-stroud in #1528
- Add blueprint validation to tutorials and video blueprints by @nick-stroud in #1527
- Convert spack-install script to ansible by @nick-stroud in #1531
- Split spack functionality into setup and execute modules by @nick-stroud in #1583
- Refactor validateConfig by @mr0re1 in #1589
- Add explicit zone to batch-login-node module by @tpdownes in #1603
- Make output of execute_commands more user friendly by @nick-stroud in #1587
- Rename spack-install to spack-setup by @nick-stroud in #1630
- Add documentation for usage of spack-execute module by @nick-stroud in #1632
- Notify users that spack-install has moved to spack-setup by @nick-stroud in #1636
- Make machines to wait for Spack install lock by @nick-stroud in #1640
- Spack module tweaks to make compatible across supported images by @nick-stroud in #1645
- Provide info about logging to the user as ansible output will hang by @nick-stroud in #1647
- Add information about deprecation and breaking changes by @nick-stroud in #1648
- Remove duplicated and outdated outputs from spack-setup by @nick-stroud in #1649
- Release v1.22.0 by @tpdownes in #1686
Full Changelog: v1.21.0...v1.22.0
V1.21.0: Improved compact placement support and error reporting
Key New Features
- Add user guide for image building
- "ghpc create" now provides line and column number hints for many errors that it encounters in blueprints
Module Improvements
- gke-node-pool: support for ephemeral local SSD
- vm-instance:
  - allow specification of max_distance when using compact placement
  - improve network block validation to prevent accidental use of default network
Deprecations
- Deprecate all var.source_image_* variables in Slurm modules (#1524)
Community Contributions
What's Changed
- Minor refactoring of modulewriter by @mr0re1 in #1522
- Allow setting kubernetes labels on node group instances by @issacg in #1504
- Deprecate source_image* fields in Slurm modules by @mr0re1 in #1524
- Add support for Ephemeral Storage Local SSD API by @issacg in #1506
- Disable dependabot OFE version update PRs by @mr0re1 in #1547
- Update Slurm var.disk_type validation by @mr0re1 in #1532
- Remove "null by omitting value" from blueprints by @mr0re1 in #1555
- Restrict VM-instance network variables validation by @mr0re1 in #1553
- Merge release v1.20.0 back into develop by @mr0re1 in #1561
- Enable expressions in Blueprint.Vars by @mr0re1 in #1563
- Do not use network when explicitly setting network_interfaces by @tpdownes in #1566
- Update google provider -> 4.73.0 by @mr0re1 in #1564
- Packer best practices and networking documentation by @tpdownes in #1562
- Output multiple errors instead of first one by @mr0re1 in #1569
- Allow to specify vm-instance.placement_policy.max_distance by @mr0re1 in #1567
- Group errors of all sub-checks in validateConfig individually. by @mr0re1 in #1573
- gvnic not supported by ubuntu_containerd by @issacg in #1542
- Bump google.golang.org/api from 0.128.0 to 0.130.0 by @dependabot in #1557
- Update list of supported VM images by @rohitramu in #1495
- Insert a basic Ubuntu node pool into the GKE integration test by @nick-stroud in #1577
- GKE node-pool provider version requirement by @nick-stroud in #1575
- Update vm-instance.placement_policy documentation by @mr0re1 in #1580
- Deprecate DeploymentGroup.Kind by @mr0re1 in #1576
- NVMe block storage for GKE by @issacg in #1559
- Add precondition blocking use of c3 machine with pd-standard by @nick-stroud in #1579
- Add enable_secure_boot var for GKE + docs by @issacg in #1582
- Bump version to v1.21.0 by @tpdownes in #1628
- Release v1.21.0 to main branch by @tpdownes in #1629
Full Changelog: v1.20.0...v1.21.0
V1.20.0 Filestore for GKE, Improved Windows support, & git-hosted Packer Modules
Key New Features
- Native GKE support for Filestore: storage-gke example.
- Improved support for Windows - Packer and windows-startup-script;
- Packer "packages" - treat remote (git-hosted) Packer modules as packages when using Terraform's "//" notation;
- Automate DAOS server/client images
New Modules
- gke-persistent-volume: automatically creates persistent volumes and persistent volume claims for shared storage.
- windows-startup-script: a simple module that curates scripts for customizing Windows VMs.
Module Improvements
-
  - do not swap boot disk (and VM) each time a new disk image is available;
- vpc:
  - enabling TCP tunneling to the WinRM port used by PowerShell;
  - add firewall rule for SSH from arbitrary IP ranges;
-
  - add option for static node count;
  - add option to enable gcfs;
-
  - expose the option to not create a system node pool;
  - add option to create and update timeouts;
  - update service account variable to separate email and scopes;
- gke-job-template: add templating for persistent volume claims
-
  - add disk_type support;
  - add Powershell script support;
  - improved support for Windows;
  - treat remote (git-hosted) Packer modules as packages when using Terraform's "//" notation
  - add
-
  - add support for fixed version of HTCondor;
  - improve resilience;
- schedmd-slurm-gcp-v5-controller: allow providing short references for image project
- batch-job-template: use Batch HPC CentOS images as default image
Version updates
- Update to slurm-gcp 5.7.4
- Update google-cloud-daos from v0.4.0 to v0.4.1
What's Changed
- Merge v1.18.1 back to develop by @nick-stroud in #1421
- Improve vpc module by @tpdownes in #1422
- Expose the option on gke-cluster to not create a system node pool by @nick-stroud in #1425
- Add option to gke-node-pool for static node count by @nick-stroud in #1424
- Add options for create and update timeouts to gke modules by @nick-stroud in #1426
- Update gke service account variable to separate email and scopes by @nick-stroud in #1427
- Update HTCondor example to use Rocky Linux 8 by @tpdownes in #1432
- Update develop with release-candidate: Fix Ansible installation upon re-run by @rohitramu in #1439
- Improve Packer support for Windows by @tpdownes in #1431
- Update HTCondor execute point module by @tpdownes in #1434
- Add option to enable gcfs on gke-node-pool by @nick-stroud in #1428
- resreader.go code clean up by @mr0re1 in #1445
- Improve HTCondor example and integration tests by @tpdownes in #1440
- Do not use log.Fatal in pkg/config by @mr0re1 in #1444
- Remove Module.RequiredApis by @mr0re1 in #1446
- Conditionally exclude nodeSelector when not needed by @nick-stroud in #1441
- Adopt latest release of startup-script modules by @tpdownes in #1411
- Add docs/module-guidelines.md by @mr0re1 in #1423
- Improve Packer experience for Windows by @tpdownes in #1447
- Add inert Module.RequiresApis for backward compatibility by @mr0re1 in #1450
- Bump google.golang.org/api from 0.125.0 to 0.126.0 by @dependabot in #1430
- Fix regex for validating GroupName by @mr0re1 in #1449
- Reduce size of expanded blueprint by adding omitempty where applicable by @mr0re1 in #1452
- Remove Module.DeploymentSource, compute it on demand by @mr0re1 in #1453
- Merging v1.19.0 from main back into develop by @rohitramu in #1462
- Fix static_check warnings in cmd/root*.go by @mr0re1 in #1460
- Slurm gcp 5.7.4 by @SkylerMalinowski in #1459
- Fix panic while attempting to tokenize Null-value by @mr0re1 in #1468
- Update "google", "google-beta" providers to 4.69.1 by @rohitramu in #1464
- Adds gke-persistent-volume module by @nick-stroud in #1442
- Add documentation for gke-persistent-volume-module by @nick-stroud in #1478
- Merge v1.19.1 hotfix release into develop by @tpdownes in #1480
- Bump golang.org/x/sys from 0.8.0 to 0.9.0 by @dependabot in #1475
- Bump github.com/otiai10/copy from 1.11.0 to 1.12.0 by @dependabot in #1476
- Bump google.golang.org/api from 0.126.0 to 0.128.0 by @dependabot in #1477
- Address minor warnings and lint issues by @tpdownes in #1481
- Relax TestNetworkStorage to accommodate gke-persistent-volume by @mr0re1 in #1483
- Deprecated WrapSettingsWith by @mr0re1 in #1466
- Print advanced instructions after ghpc deploy by @mr0re1 in #1463
- Add terraform_backend_defaults section to some examples by @mr0re1 in #1469
- Use consistent order in "product of module use" mark. by @mr0re1 in #1484
- Add support for Packer packages by @tpdownes in #1467
- Add community example of how to use filestore with gke by @nick-stroud in #1443
- Remove excessive error messages by @mr0re1 in #1485
- Add rich error messages with position and snippet by @mr0re1 in #1448
- Update "google" provider in OFE from 3.x to 4.x by @rohitramu in #1470
- Use strict Path builder to reduce human error. by @mr0re1 in #1489
- Remove settingsToIgnore from useModule by @mr0re1 in #1486
- Allow providing short names for image project by @rohitramu in #1472
- Remove debug output from create command by @tpdownes in #1492
- Add custom unmarshaler for Module.Use for better error messaging by @mr0re1 in #1473
- Use regexall instead of strcontains to stay compatible with terraform 1.2 by @rohitramu in #1493
- Don't swap VM boot disk (and VM) each time a new disk image is available by @issacg in #1474
- Module documentation update and improved DAOS examples by @tpdownes in #1488
- Add automatic_restart to vm-instance by @mr0re1 in #1288
- Add pipefail to Makefile to prevent swallowing failed tests by @mr0re1 in #1498
- Vm instance boot disk lifecycle changes by @cboneti in #1494
- Remove "failed tests" check from enforce_coverage since it doesn't … by @mr0re1 in #1499
- Drop coverage requirement for pkg/shell by @tpdownes in #1507
- Update google-cloud-daos version from v0.4.0 to v0.4.1 b...
v1.19.1 Fix panic on null fields in terraform outputs
What's Changed
Full Changelog: v1.19.0...v1.19.1
v1.19.0: ghpc destroy command, automatic ssh configuration, and Ramble integration
Key New Features
- New destroy command that automates deletion of all infrastructure from a deployment
- New ramble-execute module. Example blueprint: ramble.yaml.
- Automated SSH configuration using startup-script module with configure_ssh_host_patterns setting.
Module Improvements
- ramble-setup: Made the module idempotent.
- Blueprint labels are now added to all resources in these modules:
- packer/custom-image: Remove temporary users from the final image.
- project/service-account: Simplified Service Account usage.
- startup-script: Enable custom service accounts with startup-script
- gke-cluster: Exposed Container Storage Interface drivers addons for several different GKE storage types.
- Eliminated the need to activate a Python virtual environment to run Ansible.
Improvements
- Add support for indexing to "simple blueprint expressions"
- Added community wrapper blueprint for LLNL flux-framework example
Version updates
- Intel DAOS from 0.3.0 to 0.4.0:
  - hpc-slurm-daos.yaml: Server update
  - pfs-daos.yaml: Client update
- Upgraded Terraform provider from 4.63.1 to 4.65.2
- Upgraded Spack default version from 0.19.0 to 0.20.0
- Update to slurm-gcp 5.7.3
- Allow metadata key slurmd_feature to initiate dynamic node setup.
- Disable TreeWidth when dynamic nodes are configured.
- Fix NVIDIA driver install after kernel upgrade for rocky-linux-8.
What's Changed
- Add integration tests runs for release-candidate branch by @mr0re1 in #1335
- Non-exclusive debug partition for hpc-slurm by @cboneti in #1345
- Reword module descriptions so that they fit on single line by @nick-stroud in #1343
- Improve instance ID printout in tests by @tpdownes in #1346
- Add hpc-enterprise-slurm integration test by @mr0re1 in #1331
- Add pre-commit to check for ghpc_module label by @nick-stroud in #1344
- Add example for the lustre file system. by @rohitramu in #1348
- Fix label value validation. by @rohitramu in #1349
- making hpc-slurm-ubuntu debug partition non-exclusive by @cboneti in #1350
- Merge main into develop by @cboneti in #1358
- Change GCP API packages cloud.google.com/go > google.golang.org/api by @mr0re1 in #1356
- Upgrading terraform provider to 4.65.2 by @cboneti in #1359
- Add a ramble execute module by @douglasjacobsen in #1310
- Disable release tests by @mr0re1 in #1361
- DAOSGCP-175 Updates for google-cloud-daos v0.4.0 by @mark-olson in #1351
- Bump google.golang.org/api from 0.122.0 to 0.123.0 by @dependabot in #1364
- Simplify adoption of Spack build caches in Google Cloud Storage by @tpdownes in #1352
- Remove modulereader.ModuleFS use sourcereader.ModuleFS instead. by @mr0re1 in #1365
- Improve test coverage by @tpdownes in #1368
- Bump github.com/cloudflare/circl from 1.1.0 to 1.3.3 by @dependabot in #1372
- Update Django to 4.1.9 to address CVE-2023-31047 by @tpdownes in #1370
- Address CVE-2023-32681 by upgrading requests by @tpdownes in #1371
- Add test that all file-system mods output network_storage by @mr0re1 in #1373
- Expose csi driver addons in gke-cluster by @nick-stroud in #1374
- Eliminate need to activate virtual environment to run Ansible by @tpdownes in #1353
- Update spack default version to v0.20.0 by @saltysoup in #1367
- Add all Ansible binaries to default PATH by @tpdownes in #1379
- Fix batch mpi example by @tpdownes in #1380
- Update ramble-setup module to be idempotent by @douglasjacobsen in #1375
- Ensure that all modules take labels if they create resources by @rohitramu in #1362
- Add support for indexing to "simple blueprint expressions" by @mr0re1 in #1377
- Remove omnia dependencies from GHPC virtual environment by @tpdownes in #1381
- Add validation for deprecated input variables by @rohitramu in #1390
- Add destroy command for deployments by @tpdownes in #1382
- Bump google.golang.org/api from 0.123.0 to 0.124.0 by @dependabot in #1385
- Disable auto_activate_base in conda examples by @tpdownes in #1392
- Ensure local users are not present in final image by @tpdownes in #1393
- add config-ssh as a startup script option by @cboneti in #1378
- Bump github.com/zclconf/go-cty from 1.13.1 to 1.13.2 by @dependabot in #1384
- Bump tomlkit from 0.11.7 to 0.11.8 in /community/front-end/ofe by @dependabot in #1395
- Bump google-cloud-storage from 2.8.0 to 2.9.0 in /community/front-end/ofe by @dependabot in #1396
- Bump github.com/go-git/go-git/v5 from 5.6.1 to 5.7.0 by @dependabot in #1383
- Bump cachetools from 5.3.0 to 5.3.1 in /community/front-end/ofe by @dependabot in #1397
- Bump typing-inspect from 0.8.0 to 0.9.0 in /community/front-end/ofe by @dependabot in #1398
- Fix typos by @tpdownes in #1402
- Bump urllib3 from 1.26.15 to 2.0.2 in /community/front-end/ofe by @dependabot in #1399
- Add community wrapper blueprint for LLNL flux-framework by @wkharold in #1369
- Enable remote git Packer by @tpdownes in #1401
- Update READMEs about "literal variables" by @mr0re1 in #1391
- Bump google.golang.org/api from 0.124.0 to 0.125.0 by @dependabot in #1409
- Simplify service-account module by @tpdownes in #1400
- Bump github.com/hashicorp/hcl/v2 from 2.16.2 to 2.17.0 by @dependabot in #1408
- Enable custom service accounts with startup-script by @tpdownes in #1404
- Update to slurm-gcp 5.7.3 by @SkylerMalinowski in #1410
- Validate a blueprint's top-level "labels" variable by @rohitramu in #1394
- Update vm-instance instructions to handle the case that there are zero instances by @nick-stroud in #1413
- Identify G2 family as having accelerators by @tpdownes in #1415
- Fix CRD Slurm example by @nick-stroud in #1414
- updating examples and documentation to point to newer SchedMD images by @cboneti in #1412
- Move hpc-enterprise-slurm test to avoid stockouts by @nick-stroud in #1416
- Fix nfs-server attached disk mounting. by @mr0re1 in #1406
- Do not swallow error during expanded.yaml write by @mr0re1 in #1417
- Update Open Front...
v1.18.1: Update Package Requirements for Open Front End
What's Changed
- Bump cryptography from 40.0.2 to 41.0.0 in /community/front-end/ofe by @dependabot in #1418
Full Changelog: v1.18.0...v1.18.1
v1.18.0: ghpc deploy, new examples, better examples names, slurm-gcp 5.7.2
Key New Features
- ghpc deploy is now the recommended way of deploying your environments
- multigroup blueprints may now use module outputs from one group to another
  - e.g., a network may be dynamically created in group 1 and its name will be available directly in group 2
- New hpc-enterprise blueprint with various high performance options
- New ML blueprints: ml-slurm.yaml and ml-gke.yaml
- Blueprints renamed for more clarity
- Ability to communicate variables across deployment groups with ghpc deploy or ghpc export-outputs and ghpc import-inputs
- Slurm on GCP V4.x is now deprecated, all core examples are moved to V5.7.2
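
The idea of passing outputs from one deployment group to the next can be illustrated with a toy model. The sketch below is not how ghpc is implemented; the function names, the group tuples, and the "demo-net" value are all hypothetical. It only shows the ordering constraint: a later group can consume a value only after an earlier group has exported it.

```python
# Toy model only: NOT the ghpc implementation. All names here are hypothetical.
from typing import Callable, Dict, List, Tuple

Outputs = Dict[str, str]
# Each group: (name, names of the inputs it needs, function that "deploys" it)
Group = Tuple[str, List[str], Callable[[Outputs], Outputs]]


def deploy_in_order(groups: List[Group]) -> Dict[str, Outputs]:
    """Deploy groups in order, handing earlier outputs to later groups."""
    exported: Outputs = {}             # everything exported so far
    results: Dict[str, Outputs] = {}
    for name, needs, deploy in groups:
        missing = [key for key in needs if key not in exported]
        if missing:
            raise KeyError(f"group {name!r} is missing inputs: {missing}")
        inputs = {key: exported[key] for key in needs}  # analogous to importing inputs
        outputs = deploy(inputs)
        results[name] = outputs
        exported.update(outputs)                        # analogous to exporting outputs
    return results


def make_network(_: Outputs) -> Outputs:
    # Group 1 creates a network and exports its name.
    return {"network_name": "demo-net"}


def make_cluster(inputs: Outputs) -> Outputs:
    # Group 2 uses the network name produced by group 1.
    return {"cluster_network": inputs["network_name"]}


groups = [
    ("group1", [], make_network),
    ("group2", ["network_name"], make_cluster),
]
results = deploy_in_order(groups)
print(results["group2"]["cluster_network"])  # prints "demo-net"
```

Reversing the two groups in the list raises a KeyError, which mirrors why group 2 cannot be deployed before group 1 has produced the network name.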
Examples
- htc-slurm.yaml: shows how to provision a cluster with configuration tuned for many short-duration, loosely coupled jobs.
- client-google-cloud-storage.yaml: demonstrates different ways to use Google Cloud Storage (GCS) buckets in the HPC Toolkit.
New Modules
- gke-job-template: Creates a Kubernetes job templated file that can be used to submit jobs.
- kubernetes-operations: Performs pre-defined operations on Kubernetes resources that would otherwise be executed using kubectl.
Module Improvements
- gke-cluster: Added GPU support and automated installation of Nvidia drivers.
Deprecations
- Slurm V4.x modules: partition, controller and login-node.
Version updates
- schedmd-slurm-gcp-v5-controller: update SchedMD modules to 5.7.2
- Min required Terraform version bumped 1.0 -> 1.2
- Min required Packer version bumped 1.6 -> 1.7.9
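
When checking a locally installed tool against minimums such as these, version strings need to be compared numerically, component by component, rather than as text. The following is a minimal sketch of such a check and is not the Toolkit's own logic; parse_version and meets_minimum are hypothetical helper names, and pre-release suffixes (for example "1.2.0-rc1") are not handled.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.7.9' or 'v1.7.9' into (1, 7, 9). Raises ValueError on suffixes."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` is at least `minimum`."""
    have, need = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '1.2' and '1.2.0' compare equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


print(meets_minimum("1.2.0", "1.2"))     # True
print(meets_minimum("1.1.9", "1.2"))     # False
print(meets_minimum("1.10.0", "1.7.9"))  # True
```

The last call is the case a plain string comparison gets wrong, since "1.10.0" sorts before "1.7.9" as text.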
What's Changed
- Include group kind in deployment metadata by @tpdownes in #1213
- Bump google.golang.org/api from 0.118.0 to 0.119.0 by @dependabot in #1209
- Bump github.com/otiai10/copy from 1.10.0 to 1.11.0 by @dependabot in #1210
- Increase get URL timeout for CRD module by @tpdownes in #1211
- Use optimize utilization autoscaling profile by @nick-stroud in #1214
- Retry project cleanup up to 4 times each night by @tpdownes in #1217
- Use deadline instead of retries in wait-for-startup by @mr0re1 in #1216
- Silence daily cleanup notifications and enable retries for other builds by @tpdownes in #1218
- Bump minimal Terraform version 1.0 -> 1.2 by @mr0re1 in #1178
- Implement stub export-outputs command by @tpdownes in #1219
- Bump minimum Terraform in golden copy deployments by @tpdownes in #1222
- Use Dict for Module.Settings, derive connectivity from it by @mr0re1 in #1205
- Initial implementation of export-outputs command by @tpdownes in #1225
- Minor refactoring config.go by @mr0re1 in #1223
- Implement stub import-inputs command by @tpdownes in #1226
- Add better version comparator for Makefile by @mr0re1 in #1215
- Fix whitespace in deployment directories by @tpdownes in #1227
- Update git clone instruction to use HTTPS instead of SSH by @mr0re1 in #1233
- Add ghpc version to expanded blueprint by @mr0re1 in #1224
- Update GKE settings to match recommendations from GKE team by @nick-stroud in #1231
- Bump min packer version to 1.7.9 by @mr0re1 in #1232
- Fail wait-for-startup fast if log can not be fetched by @mr0re1 in #1220
- Remove typo in README heading by @nick-stroud in #1237
- Fix missing command to print out by @mr0re1 in #1238
- Handle "wrong-type-of-packer" in make warn-packer-missing by @mr0re1 in #1239
- Fix Chrome Remote Desktop NVIDIA Grid installation by @tpdownes in #1240
- Address shellcheck -o all wait-for-startup-status.sh by @mr0re1 in #1242
- Fix retry configuration for daily integration tests by @tpdownes in #1236
- Do not store ModuleInfo in DeploymentConfig by @mr0re1 in #1230
- Create a gke-job-template module, which creates a Kubernetes job file by @nick-stroud in #1234
- Ensure that terraform cleanup always runs by @tpdownes in #1235
- Remove unused method HasKind by @mr0re1 in #1246
- Add option to select zones for gke-node-pool by @nick-stroud in #1245
- Add the gke-job-template module to the list of modules by @nick-stroud in #1243
- Initial implementation of import-inputs command by @tpdownes in #1228
- Remove ansible-lint to unblock PRs by @mr0re1 in #1257
- Skip TestFindTerraform if no terraform is installed by @mr0re1 in #1255
- Unify shared code of create and expand commands by @mr0re1 in #1244
- Bump google.golang.org/api from 0.119.0 to 0.120.0 by @dependabot in #1253
- Add documentation warning about lustre license cost by @nick-stroud in #1254
- Remove modReference by @mr0re1 in #1247
- Bump cryptography from 40.0.1 to 40.0.2 in /community/front-end/ofe by @dependabot in #1252
- Bump protobuf from 4.22.1 to 4.22.3 in /community/front-end/ofe by @dependabot in #1250
- Bump pyasn1-modules from 0.2.8 to 0.3.0 in /community/front-end/ofe by @dependabot in #1251
- Bump pyasn1 from 0.4.8 to 0.5.0 in /community/front-end/ofe by @dependabot in #1248
- Make Expression into interface by @mr0re1 in #1260
- Refactor create_deployment.sh by @nick-stroud in #1258
- Address usability suggestions for multi-group deployments by @tpdownes in #1262
- Eliminate deployment metadata by @tpdownes in #1265
- Use dedicated dtype ModuleID and GroupName instead of string by @mr0re1 in #1264
- Adds a basic gke test which provisions and destroys a cluster by @nick-stroud in #1259
- Fix link in image builder example by @tpdownes in #1269
- Eliminate warnings by @tpdownes in #1277
- Add Terraform state download command to stdout of integration tests by @tpdownes in #1278
- Write Packer intergroup input values by @tpdownes in #1268
- Resolve conflicts before merging main ...
v1.17.0: Initial Support for GKE, Slurm v5.6.3
Key New Features
- Initial Support for Kubernetes with GKE (example).
- Enable specification of all fields of module outputs
- Instructions to run the toolkit from Cloud Workstations
New Modules
- gke-cluster: module to create a Google Kubernetes Engine (GKE) cluster
- gke-node-pool: module to create a Google Kubernetes Engine (GKE) node pool
Module Improvements
- startup-script: replace example scripts with bool inputs
- custom-image: added image_storage_locations input
- custom-image: use a unique Packer SSH username to avoid clashes with previous Packer builds
- htcondor-configure: address need for SystemD override
- htcondor-configure: ensure that a central manager optimization is configured even when high availability is not enabled
- chrome-remote-desktop: updated for Slurm image support
Improvements
- Added support for OFE deployment from a configuration file
Version updates
- schedmd-slurm-gcp-v5-controller: update SchedMD modules to 5.6.3
What's Changed
- Replace startup-script examples with bool inputs by @mr0re1 in #1100
- Copy all embedded modules into deployment, use unique source for locals by @mr0re1 in #1086
- Close copy file descriptor in EmbeddedSourceReader by @mr0re1 in #1114
- Improve error match in embedded_test by @mr0re1 in #1115
- Adds a gke-cluster module to community by @nick-stroud in #1113
- DAOS docs update by @cboneti in #1116
- Simplify and relax type constraints for variables.tf by @mr0re1 in #1111
- Make every integration test into individual build config by @mr0re1 in #1112
- Fix validator test_deployment_variable_not_used by @mr0re1 in #1120
- Add basic documentation for gke-cluster module and example by @nick-stroud in #1117
- Updating packer documentation to make usage easier to find by @cboneti in #1118
- Add image_storage_locations input to modules/packer/custom-image by @mr0re1 in #1123
- Add TF definition for DAILY-test-X,PR-test-X, and PR-validation by @mr0re1 in #1119
- Add "babysit_tests" tool to automatically approve PR tests by @mr0re1 in #1106
- Solve state/world discrepancies in TF dev infra. by @mr0re1 in #1126
- Move SlurmV5 tests affected by stockouts to us-west4-c by @mr0re1 in #1124
- Improve variable references by @tpdownes in #1127
- Remove test groups, update documentation by @mr0re1 in #1128
- Fix bug in check for mixing module kinds within a group by @mr0re1 in #1130
- Update GitHub bug report template by @mr0re1 in #1131
- Remove deprecated pod_security_policy by @nick-stroud in #1133
- Add test selectors to babysit tool by @mr0re1 in #1136
- Add TF for legacy PR tests. To be removed after release by @mr0re1 in #1135
- Add SPACK_CACHE secret to spack-gromacs test by @mr0re1 in #1132
- Add instructions for connecting to the gke-cluster by @nick-stroud in #1138
- Address need for SystemD override in HTCondor module by @tpdownes in #1139
- Update TFLint and rules plugin for Google Cloud Platform by @tpdownes in #1146
- Add double quotes on variables: SC2086 – ShellCheck by @nick-stroud in #1148
- Add support for sensitive output values by @tpdownes in #1129
- Represent TerraformBackend.Config with cty.Value by @mr0re1 in #1141
- Bump github.com/otiai10/copy from 1.9.0 to 1.10.0 by @dependabot in #1143
- Bump github.com/spf13/cobra from 1.6.1 to 1.7.0 by @dependabot in #1145
- Truncate short sha length to 7 chars when filtering from cloud build by @nick-stroud in #1151
- Bump google.golang.org/api from 0.114.0 to 0.117.0 by @dependabot in #1150
- Bring develop up to date with release of v1.16.0 by @nick-stroud in #1153
- Pin google terraform provider to latest version by @nick-stroud in #1154
- Add selectors for batch and spack tests to babysit_tests tool by @nick-stroud in #1155
- Reduce the number of execution hosts in pbs test to reduce the change… by @nick-stroud in #1149
- Ensure that PBS test config explicitly uses network module by @tpdownes in #1159
- Align internal use of Toolkit GitHub refs by @tpdownes in #1160
- Move Ubuntu test and example to reduce chance of stockout by @nick-stroud in #1163
- Fix HTCondor central manager configuration by @tpdownes in #1162
- Add specialized tokenizer to handle ((HCL literals)) by @mr0re1 in #1167
- Move Slurm v5 high io test to reduce stockouts by @nick-stroud in #1168
- Gke node pool by @nick-stroud in #1140
- Make babysit_tests compatible with Python3.7 (VertexAI) by @mr0re1 in #1173
- Instructions to run the toolkit from Cloud Workstations by @cboneti in #1170
- Write group metadata to deployment folder by @tpdownes in #1169
- Update quantum example with new build instructions by @tpdownes in #1176
- Add TransformSimpleToHcl for cty.Value by @mr0re1 in #1165
- Developer setup on login is causing workstation to crash on startup by @nick-stroud in #1177
- Add conditions on Slurm partition enable_placement, exclusive, Oversu… by @mr0re1 in #1174
- Move tests to avoid stockouts by @nick-stroud in #1179
- Use a unique Packer SSH username to avoid clashes with previous Packer builds by @nick-stroud in #1184
- Bump google.golang.org/api from 0.117.0 to 0.118.0 by @dependabot in #1183
- Bump cloud.google.com/go/compute from 1.19.0 to 1.19.1 by @dependabot in #1182
- Update SchedMD modules to 5.6.3 (from 5.6.2) by @SkylerMalinowski in #1171
- Updated chrome rem...