Karpenter Setup on Existing EKS Cluster

This repository contains the code and configuration files for setting up Karpenter on an existing Amazon EKS cluster, as described in the blog post Karpenter Setup on Existing EKS cluster: Optimizing Kubernetes Node Management.

Table of Contents

  1. Introduction
  2. Brief Overview of Karpenter
  3. Key Differences from Cluster Autoscaler
  4. Prerequisites
  5. Setup Steps
  6. Testing the Setup
  7. Repository Contents
  8. Usage
  9. Contributing
  10. License
  11. Acknowledgments
  12. References and Resources

Introduction

This guide walks you through the process of setting up Karpenter on an existing Amazon EKS cluster. Karpenter is an open-source node provisioning project that automatically launches just the right compute resources to handle your Kubernetes cluster's applications.

Brief Overview of Karpenter

Karpenter is designed to improve the efficiency and cost of running workloads on Kubernetes by:

  • Just-in-Time Provisioning: Rapidly launching nodes in response to pending pods, often in seconds.
  • Bin-Packing: Efficiently packing pods onto nodes to maximize resource utilization.
  • Diverse Instance Types: Provisioning a wide range of instance types, including spot instances, to optimize for cost and performance.
  • Custom Resource Provisioning: Allowing fine-grained control over node provisioning based on pod requirements.
  • Automatic Node Termination: Removing nodes that are no longer needed, helping to reduce costs.

Key Differences from Cluster Autoscaler

  • Karpenter doesn't rely on node groups, offering more flexibility in instance selection.
  • It can provision nodes faster and more efficiently, reducing scheduling latency.
  • Karpenter integrates directly with the AWS APIs, allowing for more efficient cloud resource management.
  • For more details, see this comparison.

Prerequisites

  • AWS CLI and kubectl installed and configured
  • Helm installed
  • Terraform - for IAM role and policy setup
  • Existing EKS cluster setup
  • Existing VPC and subnets
  • Existing security groups
  • Nodes in one or more node groups
  • OIDC provider for service accounts in your cluster
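
A quick, optional way to confirm the required tooling is installed before you start (the output simply shows whatever versions you have locally):

aws --version
kubectl version --client
helm version --short
terraform -version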

Setup Steps

  1. Set Environment Variables
  2. Create OpenID Connect (OIDC) provider
  3. Set Up IAM Roles and Policies
  4. Tag Subnets and Security Groups
  5. Update aws-auth ConfigMap
  6. Deploy Karpenter
  7. Create NodePool

1. Set Environment Variables

Define cluster-specific information and AWS account details that will be used throughout the migration process.

KARPENTER_NAMESPACE=kube-system
CLUSTER_NAME=<your cluster name>

AWS_PARTITION="aws"
AWS_REGION="$(aws configure list | grep region | tr -s " " | cut -d" " -f3)"
OIDC_ENDPOINT="$(aws eks describe-cluster --name "${CLUSTER_NAME}" \
    --query "cluster.identity.oidc.issuer" --output text)"
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query 'Account' \
    --output text)
K8S_VERSION=1.28
ARM_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2-arm64/recommended/image_id --query Parameter.Value --output text)"
AMD_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2/recommended/image_id --query Parameter.Value --output text)"
GPU_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2-gpu/recommended/image_id --query Parameter.Value --output text)"

Make sure to replace <your cluster name> with your actual EKS cluster name.
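
As an optional sanity check, confirm the variables resolved before moving on:

echo "${CLUSTER_NAME} ${AWS_REGION} ${AWS_ACCOUNT_ID}"
echo "${OIDC_ENDPOINT}"
echo "${ARM_AMI_ID} ${AMD_AMI_ID} ${GPU_AMI_ID}"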

2. Create OpenID Connect (OIDC) provider

Check whether an IAM OIDC provider already exists for your cluster.

If none exists, create an IAM OIDC provider for your cluster; a sketch of both steps is shown below.
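
A minimal sketch of both steps, assuming eksctl is available for the creation part:

# The last path segment of the issuer URL is the provider ID
OIDC_ID=$(echo "${OIDC_ENDPOINT}" | cut -d '/' -f 5)

# If this returns nothing, no provider exists yet
aws iam list-open-id-connect-providers | grep "${OIDC_ID}"

# Create the provider (requires eksctl)
eksctl utils associate-iam-oidc-provider --cluster "${CLUSTER_NAME}" --approve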

3. Set Up IAM Roles and Policies

Set up the necessary permissions for Karpenter to interact with AWS services. This includes a role for Karpenter-provisioned nodes and a role for the Karpenter controller itself.

In this step, Terraform is used to create the IAM roles and policies Karpenter needs to work with your EKS cluster. Specifically, it creates:

  • A Karpenter Node IAM Role: This role is assumed by the EC2 instances that Karpenter creates, allowing them to join the EKS cluster and access necessary AWS resources.

  • A Karpenter Controller IAM Role: This role is assumed by the Karpenter controller when it provisions new instances. The controller uses IAM Roles for Service Accounts (IRSA), which is why the cluster needs an OIDC provider.

The Terraform configuration uses the variables defined in the variables.tf file (such as aws_region, cluster_name, karpenter_namespace, and k8s_version) to customize the IAM resources for your specific EKS cluster setup.

By using Terraform for this step, you ensure that all the necessary IAM resources are created consistently and can be easily managed as infrastructure as code.

terraform init
terraform apply

Applying the Terraform configuration creates two IAM roles:

IAM Roles
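
Assuming the roles follow the KarpenterNodeRole-<cluster> and KarpenterControllerRole-<cluster> naming convention used later in this guide, you can optionally confirm they exist after terraform apply:

aws iam get-role --role-name "KarpenterNodeRole-${CLUSTER_NAME}" --query 'Role.Arn' --output text
aws iam get-role --role-name "KarpenterControllerRole-${CLUSTER_NAME}" --query 'Role.Arn' --output text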

4. Tag Subnets and Security Groups

After applying the Terraform configuration, tag the subnets and security groups to enable Karpenter to discover and use the appropriate subnets and security groups when provisioning new nodes.

Use the following AWS CLI commands to tag the relevant resources:

for NODEGROUP in $(aws eks list-nodegroups --cluster-name "${CLUSTER_NAME}" --query 'nodegroups' --output text); do
    aws ec2 create-tags \
        --tags "Key=karpenter.sh/discovery,Value=${CLUSTER_NAME}" \
        --resources $(aws eks describe-nodegroup --cluster-name "${CLUSTER_NAME}" \
        --nodegroup-name "${NODEGROUP}" --query 'nodegroup.subnets' --output text )
done

NODEGROUP=$(aws eks list-nodegroups --cluster-name "${CLUSTER_NAME}" \
    --query 'nodegroups[0]' --output text)

LAUNCH_TEMPLATE=$(aws eks describe-nodegroup --cluster-name "${CLUSTER_NAME}" \
    --nodegroup-name "${NODEGROUP}" --query 'nodegroup.launchTemplate.{id:id,version:version}' \
    --output text | tr -s "\t" ",")

# For EKS setups using only Cluster security group:
SECURITY_GROUPS=$(aws eks describe-cluster \
    --name "${CLUSTER_NAME}" --query "cluster.resourcesVpcConfig.clusterSecurityGroupId" --output text)

# For setups using security groups in the Launch template of a managed node group:
SECURITY_GROUPS="$(aws ec2 describe-launch-template-versions \
    --launch-template-id "${LAUNCH_TEMPLATE%,*}" --versions "${LAUNCH_TEMPLATE#*,}" \
    --query 'LaunchTemplateVersions[0].LaunchTemplateData.[NetworkInterfaces[0].Groups||SecurityGroupIds]' \
    --output text)"

aws ec2 create-tags \
    --tags "Key=karpenter.sh/discovery,Value=${CLUSTER_NAME}" \
    --resources "${SECURITY_GROUPS}"
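
To confirm the discovery tags landed, an optional check:

aws ec2 describe-subnets \
    --filters "Name=tag:karpenter.sh/discovery,Values=${CLUSTER_NAME}" \
    --query 'Subnets[].SubnetId' --output text

aws ec2 describe-security-groups \
    --filters "Name=tag:karpenter.sh/discovery,Values=${CLUSTER_NAME}" \
    --query 'SecurityGroups[].GroupId' --output text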

5. Update aws-auth ConfigMap

Update aws-auth ConfigMap to allow nodes created by Karpenter to join the EKS cluster by granting the necessary permissions to the Karpenter node IAM role.

kubectl edit configmap aws-auth -n kube-system

Add the following section to the mapRoles:

- groups:
  - system:bootstrappers
  - system:nodes
  rolearn: arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}
  username: system:node:{{EC2PrivateDNSName}}

Replace ${AWS_PARTITION}, ${AWS_ACCOUNT_ID}, and ${CLUSTER_NAME} with your actual values.
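
If you have eksctl available and prefer not to edit the ConfigMap by hand, an equivalent mapping can be created like this (a sketch; verify the resulting aws-auth entry afterwards):

eksctl create iamidentitymapping \
    --cluster "${CLUSTER_NAME}" \
    --arn "arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}" \
    --group system:bootstrappers \
    --group system:nodes \
    --username 'system:node:{{EC2PrivateDNSName}}'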

6. Deploy Karpenter

Install the Karpenter controller and its associated resources in the cluster, configuring it to work with your specific EKS setup.

export KARPENTER_VERSION="1.0.1"

helm template karpenter oci://public.ecr.aws/karpenter/karpenter --version "${KARPENTER_VERSION}" --namespace "${KARPENTER_NAMESPACE}" \
    --set "settings.clusterName=${CLUSTER_NAME}" \
    --set "serviceAccount.annotations.eks\.amazonaws\.com/role-arn=arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/KarpenterControllerRole-${CLUSTER_NAME}" \
    --set controller.resources.requests.cpu=1 \
    --set controller.resources.requests.memory=1Gi \
    --set controller.resources.limits.cpu=1 \
    --set controller.resources.limits.memory=1Gi > karpenter.yaml

6.1 Set Node Affinity for Karpenter

Ensure Karpenter runs on existing node group nodes, maintaining high availability and preventing it from scheduling itself on nodes it manages.

Edit the karpenter.yaml file to set node affinity:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: karpenter.sh/nodepool
          operator: DoesNotExist
        - key: eks.amazonaws.com/nodegroup
          operator: In
          values:
          - ${NODEGROUP}
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: "kubernetes.io/hostname"

Replace ${NODEGROUP} with your actual node group name.
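
If you pasted the block above verbatim, one way to fill in the placeholder is with sed, using the NODEGROUP variable captured in step 4 (this assumes GNU sed; on macOS use sed -i ''):

# Check which node group name will be substituted
echo "${NODEGROUP}"

# Replace the literal ${NODEGROUP} placeholder in karpenter.yaml
sed -i "s|\${NODEGROUP}|${NODEGROUP}|g" karpenter.yaml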

6.2 Apply Karpenter Configuration

kubectl create namespace "${KARPENTER_NAMESPACE}" || true
kubectl create -f \
    "https://raw.githubusercontent.com/aws/karpenter-provider-aws/v${KARPENTER_VERSION}/pkg/apis/crds/karpenter.sh_nodepools.yaml"
kubectl create -f \
    "https://raw.githubusercontent.com/aws/karpenter-provider-aws/v${KARPENTER_VERSION}/pkg/apis/crds/karpenter.k8s.aws_ec2nodeclasses.yaml"
kubectl create -f \
    "https://raw.githubusercontent.com/aws/karpenter-provider-aws/v${KARPENTER_VERSION}/pkg/apis/crds/karpenter.sh_nodeclaims.yaml"
kubectl apply -f karpenter.yaml
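
Karpenter should now be running on the existing node group nodes. As an optional check (the Helm chart labels its pods with app.kubernetes.io/name=karpenter):

kubectl get pods -n "${KARPENTER_NAMESPACE}" -l app.kubernetes.io/name=karpenter
kubectl logs -n "${KARPENTER_NAMESPACE}" -l app.kubernetes.io/name=karpenter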

7. Create NodePool

Define the default configuration for nodes that Karpenter will provision, including instance types, capacity type, and other constraints.

Create a nodepool.yaml file with your desired configuration and apply it:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default #can change 
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"] 
        - key: kubernetes.io/os
          operator: In  
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"] #on-demand #prioritizes spot
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 1000
  # disruption:
  #   consolidationPolicy: WhenUnderutilized 
  #   expireAfter: 720h # 30 * 24h = 720h
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2 # Amazon Linux 2
  role: "KarpenterNodeRole-karpenter-cluster" # replace with your cluster name
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: karpenter-cluster # replace with your cluster name
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: karpenter-cluster # replace with your cluster name
  amiSelectorTerms:
    - id: ami-0d1d7c8c32a476XXX
    - id: ami-0f91781eb6d40dXXX
    - id: ami-058b426ded9842XXX # <- GPU Optimized AMD AMI 
#   - name: "amazon-eks-node-${K8S_VERSION}-*" # <- automatically upgrade when a new AL2 EKS Optimized AMI is released. This is unsafe for production workloads. Validate AMIs in lower environments before deploying them to production.

kubectl apply -f nodepool.yaml
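
You can optionally confirm the custom resources were accepted:

kubectl get nodepools
kubectl get ec2nodeclasses
kubectl describe nodepool default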

Testing the Setup

Deploy a sample application to test Karpenter's autoscaling capabilities:

Create a file named inflate.yaml with the following content:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 4
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      terminationGracePeriodSeconds: 0
      securityContext:
        runAsUser: 1000
        runAsGroup: 3000
        fsGroup: 2000
      containers:
      - name: inflate
        image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
        resources:
          requests:
            # cpu: 100m
            cpu: 1
        securityContext:
          allowPrivilegeEscalation: false

This deployment creates 4 replicas of a pod that requests 1 CPU core each. The pause container is a minimal container that does nothing but sleep, making it perfect for testing resource allocation without actually consuming resources.

▪︎ Initially, the cluster had 2 nodes:

Initial Nodes

kubectl apply -f inflate.yaml

▪︎ After deploying the inflate application:

If Karpenter is working correctly, you should see new nodes being added to your cluster to accommodate the inflate pods. This demonstrates Karpenter's ability to automatically scale your cluster based on resource demands.
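
One way to watch this happen (Karpenter labels the nodes it creates with karpenter.sh/nodepool, the same label used in the affinity rule above):

kubectl get nodeclaims
kubectl get nodes -l karpenter.sh/nodepool --watch

The controller logs (see the command in step 6.2) also show the scheduling and launch decisions as they happen.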

After Deployment 1

After Deployment 2

After Deployment 3

After Deployment 4

▪︎ Delete the Deployment

kubectl delete -f inflate.yaml

Once the inflate pods are gone, the nodes Karpenter provisioned become empty and are removed after the 30-second consolidateAfter window configured in the NodePool.

Repository Contents

  • terraform/: Directory containing Terraform configurations
    • main.tf: Main Terraform configuration
    • variables.tf: Terraform variables
    • outputs.tf: Terraform outputs
    • versions.tf: Terraform version constraints
    • node_role.tf: IAM role for Karpenter nodes
    • controller_role.tf: IAM role for Karpenter controller
  • karpenter.yaml: Karpenter Helm chart values
  • nodepool.yaml: Karpenter NodePool and EC2NodeClass configuration
  • inflate.yaml: Sample application for testing Karpenter

Usage

  1. Clone this repository
  2. Set up your AWS credentials
  3. Modify the Terraform variables in terraform/variables.tf as needed
  4. Follow the steps in the blog post to deploy Karpenter

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

References and Resources

For more detailed instructions, please refer to the full blog post.
