Custom Images in the HPC Toolkit

Introduction

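This module uses Packer to create an image within an HPC Toolkit deployment. Packer operates by provisioning a short-lived VM in Google Cloud on which it executes scripts to customize the boot disk for repeated use. The VM's boot disk is specified from a source image that defaults to the HPC VM Image. This Packer "template" supports the following customization approaches, subject to the recommended use described below:

  • A startup script supplied through VM metadata, either as a string (startup_script) or as a file (startup_script_file)
  • Shell scripts (shell_scripts) uploaded from the Packer execution environment to the VM
  • Ansible playbooks (ansible_playbooks) uploaded from the Packer execution environment to the VM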

They can be specified independently of one another, so that anywhere from 1 to 3 solutions can be used simultaneously. In the case that 0 scripts are supplied, the source boot disk is effectively copied to your project without customization. This can be useful in scenarios where increased control over the image maintenance lifecycle is desired or when policies restrict the use of images to internal projects.
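
As a sketch of the zero-script case, a Packer variables file might set only the required inputs from the Inputs table and leave every customization setting at its default. The file name and all values below (deployment name, project, zone, subnetwork) are hypothetical placeholders; source_image_family is repeated at its documented default only to make the source explicit.

```hcl
# image-copy.pkrvars.hcl (hypothetical file name)
# Required inputs; every value below is a placeholder.
deployment_name = "image-copy-demo"
project_id      = "my-project-id"
zone            = "us-central1-a"
subnetwork_name = "my-subnetwork"

# Documented default, repeated here only to make the source explicit.
source_image_family = "hpc-centos-7"

# startup_script, startup_script_file, shell_scripts, and ansible_playbooks
# are all left unset, so the source boot disk is saved without customization.
```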

Order of execution

The startup script specified in metadata executes in parallel with the other supported methods. However, the remaining methods execute in a well-defined order relative to one another.

  1. All shell scripts will execute in the configured order
  2. After shell scripts complete, all Ansible playbooks will execute in the configured order

NOTE: if both startup_script and startup_script_file are specified, then startup_script_file takes precedence.

Recommended use

Because the metadata startup script executes in parallel with the other solutions, conflicts can arise, especially when package managers (yum or apt) lock their databases during package installation. Therefore, it is recommended to choose one of the following approaches:

  1. Specify either startup_script or startup_script_file and do not specify shell_scripts or ansible_playbooks.
  2. Specify any combination of shell_scripts and ansible_playbooks and do not specify startup_script or startup_script_file.

If any of the shell_scripts or ansible_playbooks fail by returning a code other than 0, Packer will determine that the build has failed and refuse to save the resulting disk.
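
The second approach can be sketched in a Packer variables file as follows. The script and playbook paths are hypothetical, and the -vv argument is only an illustration of passing extra arguments to Ansible; each playbook object sets the three attributes listed for ansible_playbooks in the Inputs table.

```hcl
# Approach 2: SSH-based customization only.
# Shell scripts run first, in list order (paths are hypothetical).
shell_scripts = [
  "./scripts/install-packages.sh",
  "./scripts/configure-system.sh",
]

# Ansible playbooks run after all shell scripts, also in list order.
ansible_playbooks = [
  {
    playbook_file   = "./playbooks/site.yml"
    galaxy_file     = "./playbooks/requirements.yml"
    extra_arguments = ["-vv"]
  },
]

# startup_script and startup_script_file are left unset (default null) so
# that no metadata script runs in parallel and competes for package locks.
```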

NOTE: there is an existing issue that can cause failures of the startup_script or startup_script_file not to be detected as failures by Packer.

External access with SSH

The shell script and Ansible playbook customization solutions both require SSH access to the VM from the Packer execution environment. SSH access can be enabled in one of two ways:

  1. The VM is created without a public IP address and SSH tunnels are created using Identity-Aware Proxy (IAP).
    • Allow use_iap to take on its default value of true
  2. The VM is created with an IP address on the public internet and firewall rules allow SSH access from the Packer execution environment.
    • Set omit_external_ip = false (or omit_external_ip: false in a blueprint)
    • Add firewall rules that open SSH to the VM

The Packer template defaults to the first, IAP-based solution because it is more secure (no exposure to the public internet) and because the Toolkit VPC module automatically sets up all necessary firewall rules for SSH tunneling and outbound-only access to the internet through Cloud NAT.

In either SSH solution, customization scripts should be supplied as files in the shell_scripts and ansible_playbooks settings.
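
A minimal sketch of the second, public-IP option, changing only the setting named in the list above. The network tag and script path are hypothetical; the tag is assumed to match the target tag of a firewall rule that you create separately to allow SSH from the Packer execution environment.

```hcl
# Option 2: give the build VM a public IP address.
omit_external_ip = false

# Hypothetical network tag; it must match the target tag of a firewall
# rule (created separately) that allows SSH from the Packer environment.
tags = ["allow-packer-ssh"]

# Customization scripts are supplied as files in either SSH solution.
shell_scripts = ["./scripts/install-packages.sh"]
```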

Environments without SSH access

Many network environments disallow SSH access to VMs. In these environments, the metadata-based startup scripts are appropriate because they execute entirely independently of the Packer execution environment.

In this scenario, a single script should be supplied in the form of a string to the startup_script input variable. This solution integrates well with Toolkit runners. Runners operate by using a single startup script whose behavior is extended by downloading and executing a customizable set of runners from Cloud Storage at startup.

NOTE: Packer will attempt to use SSH if either shell_scripts or ansible_playbooks are set to non-empty values. Leave them at their default, empty values to ensure access by SSH is disabled.
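
The settings involved can be sketched as below. This variant supplies the script through startup_script_file rather than as a string, purely for illustration; the path is hypothetical. The two list settings are written out at their documented empty defaults to make the intent explicit.

```hcl
# Metadata-only customization: Packer does not need SSH access to the VM.
# The path is hypothetical; startup_script (a string) could be used instead.
startup_script_file = "./scripts/customize-image.sh"

# Left at their empty defaults so that Packer does not attempt to use SSH.
shell_scripts     = []
ansible_playbooks = []
```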

Supplying startup script as a string

The startup_script parameter accepts scripts formatted as strings. In Packer and Terraform, multi-line strings can be specified using heredoc syntax in an input Packer variables file (*.pkrvars.hcl). For example, the following snippet defines a multi-line bash script followed by an integer representing the size, in GiB, of the resulting image:

startup_script = <<-EOT
  #!/bin/bash
  yum install -y epel-release
  yum install -y jq
  EOT

disk_size = 100

In a blueprint, the equivalent syntax is:

...
    settings:
      startup_script: |
        #!/bin/bash
        yum install -y epel-release
        yum install -y jq
      disk_size: 100
...

Monitoring startup script execution

When using startup script customization, Packer will print very limited output to the console. For example:

==> example.googlecompute.toolkit_image: Waiting for any running startup script to finish...
==> example.googlecompute.toolkit_image: Startup script not finished yet. Waiting...
==> example.googlecompute.toolkit_image: Startup script not finished yet. Waiting...
==> example.googlecompute.toolkit_image: Startup script, if any, has finished running.

Using the default value for var.scopes, the output of startup script execution will be stored in Cloud Logging. It can be examined using the Cloud Logging Console or with a gcloud logging read command (replacing <<PROJECT_ID>> with your project ID):

$ gcloud logging --project <<PROJECT_ID>> read \
    'logName="projects/<<PROJECT_ID>>/logs/GCEMetadataScripts" AND jsonPayload.message=~"^startup-script: "' \
    --format="table[box](timestamp, resource.labels.instance_id, jsonPayload.message)" --freshness 2h

Note that this command prints every startup script entry in the project that falls within the "freshness" window, newest first. You may need to identify the instance ID of the Packer VM and filter further by that value using gcloud or grep. To print the entries in the order they would have appeared on your console, we recommend piping the output of this command to the standard Linux utility tac.
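
If you override scopes, startup script output is assumed to reach Cloud Logging only while the VM keeps the logging.write scope from the default list. The sketch below narrows the Cloud Storage scope as an illustrative assumption about your needs, not a recommendation, and keeps the other default entries unchanged.

```hcl
# Narrowed scope list (illustrative assumption, not a recommendation).
scopes = [
  "https://www.googleapis.com/auth/userinfo.email",
  "https://www.googleapis.com/auth/compute",
  # Read-only replacement for the default devstorage.full_control scope.
  "https://www.googleapis.com/auth/devstorage.read_only",
  # Kept from the default list so startup script logs can still be written.
  "https://www.googleapis.com/auth/logging.write",
]
```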

Example

The included blueprint demonstrates a solution that builds an image using:

  • The HPC VM Image as a base upon which to customize
  • A VPC network with firewall rules that allow IAP-based SSH tunnels
  • A Toolkit runner that installs a custom script

Please review the examples README for usage instructions.

License

Copyright 2022 Google LLC

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Requirements

No requirements.

Providers

No providers.

Modules

No modules.

Resources

No resources.

Inputs

| Name | Description | Type | Default | Required |
| --- | --- | --- | --- | --- |
| accelerator_count | Number of accelerator cards to attach to the VM; not necessary for families that always include GPUs (A2). | number | null | no |
| accelerator_type | Type of accelerator cards to attach to the VM; not necessary for families that always include GPUs (A2). | string | null | no |
| ansible_playbooks | A list of Ansible playbook configurations that will be uploaded to customize the VM image | list(object({ playbook_file = string, galaxy_file = string, extra_arguments = list(string) })) | [] | no |
| deployment_name | HPC Toolkit deployment name | string | n/a | yes |
| disk_size | Size of disk image in GB | number | null | no |
| image_family | The family name of the image to be built. Image name will also be derived from this value. Defaults to deployment_name | string | null | no |
| labels | Labels to apply to the short-lived VM | map(string) | null | no |
| machine_type | VM machine type on which to build new image | string | "n2-standard-4" | no |
| network_project_id | Project ID of Shared VPC network | string | null | no |
| omit_external_ip | Provision the image building VM without a public IP address | bool | true | no |
| on_host_maintenance | Describes maintenance behavior for the instance. If left blank, this defaults to MIGRATE, except that the use of GPUs requires it to be TERMINATE | string | null | no |
| project_id | Project in which to create VM and image | string | n/a | yes |
| scopes | Service account scopes to attach to the instance. See https://cloud.google.com/compute/docs/access/service-accounts. | list(string) | [ "https://www.googleapis.com/auth/userinfo.email", "https://www.googleapis.com/auth/compute", "https://www.googleapis.com/auth/devstorage.full_control", "https://www.googleapis.com/auth/logging.write" ] | no |
| service_account_email | The service account email to use. If null or 'default', then the default Compute Engine service account will be used. | string | null | no |
| shell_scripts | A list of paths to local shell scripts which will be uploaded to customize the VM image | list(string) | [] | no |
| source_image | Source OS image to build from | string | null | no |
| source_image_family | Alternative to source_image. Specify image family to build from latest image in family | string | "hpc-centos-7" | no |
| source_image_project_id | A list of project IDs to search for the source image. Packer will search the first project ID in the list first, and fall back to the next in the list, until it finds the source image. | list(string) | null | no |
| ssh_username | Username to use for SSH access to VM | string | "packer" | no |
| startup_script | Startup script (as raw string) used to build the custom VM image (overridden by var.startup_script_file if both are supplied) | string | null | no |
| startup_script_file | Path to local shell script that will be uploaded as a startup script to customize the VM image | string | null | no |
| subnetwork_name | Name of subnetwork in which to provision image building VM | string | n/a | yes |
| tags | Assign network tags to apply firewall rules to VM instance | list(string) | null | no |
| use_iap | Use IAP proxy when connecting by SSH | bool | true | no |
| use_os_login | Use OS Login when connecting by SSH | bool | false | no |
| wrap_startup_script | Wrap startup script with Packer-generated wrapper | bool | true | no |
| zone | Cloud zone in which to provision image building VM | string | n/a | yes |

Outputs

No outputs.