
CA scales only once (OCI - Instance Pools) #7518

Open
LaeraFelipe opened this issue Nov 21, 2024 · 2 comments
Labels
area/cluster-autoscaler kind/bug Categorizes issue or PR as related to a bug.

Comments

@LaeraFelipe

Kubernetes version: 1.31.2
CA version: 1.31.1

My pods are pending:

[screenshot: pending pods]

But CA is giving these logs:
[screenshot: cluster-autoscaler logs]

There is no node called:
template-node-for-ocid1.instancepool.oc1.sa-saopaulo-1.aaaaaaaa4kcfnj536kix5nq7bwljcgtu7afesrmryoe77mfiyolk736jdqna-2342242805848131422-upcoming-0

[screenshot: node list]
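
Side note: names of the form template-node-for-<node-group-id>-...-upcoming-N appear to be the virtual node templates CA builds in memory to simulate upcoming (not yet registered) instances during scale-up, so a real node with that name is not expected to exist. A quick sanity check (a sketch, assuming kubectl access to the cluster):

kubectl get nodes --no-headers | grep template-node || echo "no template nodes (expected)"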

@LaeraFelipe added the kind/bug label Nov 21, 2024
@adrianmoisey
Member

/area cluster-autoscaler

@LaeraFelipe
Author

LaeraFelipe commented Nov 22, 2024

After installing the CA it worked normally. It scaled several nodes according to demand, but now it only scales one node.
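
For context, the instance pool is registered with CA through static node-group flags along these lines (a sketch; the 0:10 min/max is inferred from the status below):

--nodes=0:10:ocid1.instancepool.oc1.sa-saopaulo-1.aaaaaaaamyyjuqltv3mso7g5opnsu4z5avey7sdiwjzpmy6oztunzbhoc4na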

This is the cluster-autoscaler status ConfigMap before increasing the cluster workload:

Name:         cluster-autoscaler-status
Namespace:    kube-system
Labels:       <none>
Annotations:  cluster-autoscaler.kubernetes.io/last-updated: 2024-11-22 14:30:52.869056102 +0000 UTC

Data
====
status:
----
time: 2024-11-22 14:30:52.869056102 +0000 UTC
autoscalerStatus: Running
clusterWide:
  health:
    status: Healthy
    nodeCounts:
      registered:
        total: 2
        ready: 2
        notStarted: 0
      longUnregistered: 0
      unregistered: 0
    lastProbeTime: "2024-11-22T14:30:52.869056102Z"
    lastTransitionTime: "2024-11-22T01:36:00.574554404Z"
  scaleUp:
    status: NoActivity
    lastProbeTime: "2024-11-22T14:30:52.869056102Z"
    lastTransitionTime: "2024-11-22T01:38:27.014129855Z"
  scaleDown:
    status: NoCandidates
    lastProbeTime: "2024-11-22T14:30:52.869056102Z"
    lastTransitionTime: "2024-11-22T01:52:05.35997395Z"
nodeGroups:
- name: ocid1.instancepool.oc1.sa-saopaulo-1.aaaaaaaamyyjuqltv3mso7g5opnsu4z5avey7sdiwjzpmy6oztunzbhoc4na
  health:
    status: Healthy
    cloudProviderTarget: 0
    minSize: 0
    maxSize: 10
  scaleUp:
    status: NoActivity
    lastTransitionTime: "2024-11-22T01:38:27.014129855Z"
  scaleDown:
    status: NoCandidates
    lastProbeTime: "2024-11-22T14:30:52.869056102Z"
    lastTransitionTime: "2024-11-22T01:52:05.35997395Z"

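(This output format matches kubectl describe; the status can be fetched with:

kubectl -n kube-system describe configmap cluster-autoscaler-status
)
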
After increasing the workload, CA scaled one instance and gave me these logs:

[screenshot: cluster-autoscaler logs]

And the status ConfigMap says:

Name:         cluster-autoscaler-status
Namespace:    kube-system
Labels:       <none>
Annotations:  cluster-autoscaler.kubernetes.io/last-updated: 2024-11-22 14:39:26.864104577 +0000 UTC

Data
====
status:
----
time: 2024-11-22 14:39:26.864104577 +0000 UTC
autoscalerStatus: Running
clusterWide:
  health:
    status: Healthy
    nodeCounts:
      registered:
        total: 3
        ready: 2
        notStarted: 0
        beingDeleted: 1
      longUnregistered: 0
      unregistered: 0
    lastProbeTime: "2024-11-22T14:39:26.864104577Z"
    lastTransitionTime: "2024-11-22T01:36:00.574554404Z"
  scaleUp:
    status: NoActivity
    lastProbeTime: "2024-11-22T14:39:26.864104577Z"
    lastTransitionTime: "2024-11-22T14:36:59.819105176Z"
  scaleDown:
    status: NoCandidates
    lastProbeTime: "2024-11-22T14:39:26.864104577Z"
    lastTransitionTime: "2024-11-22T01:52:05.35997395Z"
nodeGroups:
- name: ocid1.instancepool.oc1.sa-saopaulo-1.aaaaaaaamyyjuqltv3mso7g5opnsu4z5avey7sdiwjzpmy6oztunzbhoc4na
  health:
    status: Healthy
    nodeCounts:
      registered:
        total: 1
        ready: 0
        notStarted: 0
        beingDeleted: 1
      longUnregistered: 0
      unregistered: 0
    cloudProviderTarget: 1
    minSize: 0
    maxSize: 10
    lastProbeTime: "2024-11-22T14:39:26.864104577Z"
  scaleUp:
    status: NoActivity
    lastProbeTime: "2024-11-22T14:39:26.864104577Z"
    lastTransitionTime: "2024-11-22T14:36:59.819105176Z"
  scaleDown:
    status: NoCandidates
    lastProbeTime: "2024-11-22T14:39:26.864104577Z"
    lastTransitionTime: "2024-11-22T01:52:05.35997395Z"

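Note that the node group now reports ready: 0 and beingDeleted: 1, i.e. the only registered node in the pool is still counted as being deleted. If I read the status fields correctly, beingDeleted corresponds to nodes carrying the ToBeDeletedByClusterAutoscaler taint, which can be checked with (a sketch, assuming kubectl access):

kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
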
The nodes:

[screenshot: node list]

After the first scaled node, CA doesn't scale anymore. As you can see, there are a lot of pending pods:

[screenshot: pending pods]
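
(The pending pods can be listed with:

kubectl get pods -A --field-selector=status.phase=Pending
)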

And these are the CA logs:

[screenshot: cluster-autoscaler logs]
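
(Collected with something along these lines; the exact label selector depends on how CA was deployed, so app=cluster-autoscaler is an assumption:

kubectl -n kube-system logs -l app=cluster-autoscaler --tail=200
)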

The auto scaled node description:

Name:               inst-lpfp2-laerus-cloud-k8s-worker-default-pool
Roles:              worker
Labels:             beta.kubernetes.io/arch=arm64
                    beta.kubernetes.io/instance-type=VM.Standard.A1.Flex
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/zone=SA-SAOPAULO-1-AD-1
                    kubernetes.io/arch=arm64
                    kubernetes.io/hostname=inst-lpfp2-laerus-cloud-k8s-worker-default-pool
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=worker
                    node.kubernetes.io/instance-type=VM.Standard.A1.Flex
                    topology.kubernetes.io/zone=SA-SAOPAULO-1-AD-1
Annotations:        csi.volume.kubernetes.io/nodeid: {"csi.tigera.io":"inst-lpfp2-laerus-cloud-k8s-worker-default-pool"}
                    kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    oci.oraclecloud.com/compartment-id: ocid1.compartment.oc1..aaaaaaaanor7eo3fz2m3jxwwnq2ipzek57c3wlui4me72c7wdwbtlbju5idq
                    oci.oraclecloud.com/instance-id: ocid1.instance.oc1.sa-saopaulo-1.antxeljrtbrujwyc5cy3apjzec5wh6pqazfly572fewc6hxhhcaikh6fuj5a
                    oci.oraclecloud.com/instancepool-id: ocid1.instancepool.oc1.sa-saopaulo-1.aaaaaaaamyyjuqltv3mso7g5opnsu4z5avey7sdiwjzpmy6oztunzbhoc4na
                    projectcalico.org/IPv4Address: 10.0.1.238/24
                    projectcalico.org/IPv4VXLANTunnelAddr: 192.168.245.64
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 22 Nov 2024 14:36:03 +0000
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  inst-lpfp2-laerus-cloud-k8s-worker-default-pool
  AcquireTime:     <unset>
  RenewTime:       Fri, 22 Nov 2024 14:48:07 +0000
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Fri, 22 Nov 2024 14:37:04 +0000   Fri, 22 Nov 2024 14:37:04 +0000   CalicoIsUp                   Calico is running on this node
  MemoryPressure       False   Fri, 22 Nov 2024 14:43:11 +0000   Fri, 22 Nov 2024 14:36:03 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Fri, 22 Nov 2024 14:43:11 +0000   Fri, 22 Nov 2024 14:36:03 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Fri, 22 Nov 2024 14:43:11 +0000   Fri, 22 Nov 2024 14:36:03 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Fri, 22 Nov 2024 14:43:11 +0000   Fri, 22 Nov 2024 14:36:47 +0000   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  10.0.1.238
  Hostname:    inst-lpfp2-laerus-cloud-k8s-worker-default-pool
Capacity:
  cpu:                2
  ephemeral-storage:  46212176Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  hugepages-32Mi:     0
  hugepages-64Ki:     0
  memory:             8111088Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  42589141332
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  hugepages-32Mi:     0
  hugepages-64Ki:     0
  memory:             8008688Ki
  pods:               110
System Info:
  Machine ID:                 e605b8db9b3e4cb6beac8dfd01f8aae0
  System UUID:                e605b8db-9b3e-4cb6-beac-8dfd01f8aae0
  Boot ID:                    06afcad5-60e5-4c9e-84c7-b710fbaf7be2
  Kernel Version:             6.8.0-1013-oracle
  OS Image:                   Ubuntu 24.04.1 LTS
  Operating System:           linux
  Architecture:               arm64
  Container Runtime Version:  containerd://1.7.23
  Kubelet Version:            v1.31.3
  Kube-Proxy Version:         v1.31.3
PodCIDR:                      192.168.8.0/24
PodCIDRs:                     192.168.8.0/24
ProviderID:                   ocid1.instance.oc1.sa-saopaulo-1.antxeljrtbrujwyc5cy3apjzec5wh6pqazfly572fewc6hxhhcaikh6fuj5a
Non-terminated Pods:          (8 in total)
  Namespace                   Name                               CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                               ------------  ----------  ---------------  -------------  ---
  calico-system               calico-node-bldsl                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         12m
  calico-system               calico-typha-5cb64fcdd6-drd24      0 (0%)        0 (0%)      0 (0%)           0 (0%)         11m
  calico-system               csi-node-driver-thzhj              0 (0%)        0 (0%)      0 (0%)           0 (0%)         12m
  integrinha                  integrinha-api-5854766f74-nshbt    500m (25%)    500m (25%)  500Mi (6%)       500Mi (6%)     14m
  integrinha                  integrinha-api-5854766f74-pq2jq    500m (25%)    500m (25%)  500Mi (6%)       500Mi (6%)     14m
  integrinha                  integrinha-api-5854766f74-px5nq    500m (25%)    500m (25%)  500Mi (6%)       500Mi (6%)     12m
  integrinha                  integrinha-api-5854766f74-xvgbd    500m (25%)    500m (25%)  500Mi (6%)       500Mi (6%)     10m
  kube-system                 kube-proxy-bbxsf                   0 (0%)        0 (0%)      0 (0%)           0 (0%)         12m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                2 (100%)      2 (100%)
  memory             2000Mi (25%)  2000Mi (25%)
  ephemeral-storage  0 (0%)        0 (0%)
  hugepages-1Gi      0 (0%)        0 (0%)
  hugepages-2Mi      0 (0%)        0 (0%)
  hugepages-32Mi     0 (0%)        0 (0%)
  hugepages-64Ki     0 (0%)        0 (0%)
Events:
  Type     Reason                   Age                From             Message
  ----     ------                   ----               ----             -------
  Normal   Starting                 11m                kube-proxy
  Warning  InvalidDiskCapacity      12m                kubelet          invalid capacity 0 on image filesystem
  Normal   NodeAllocatableEnforced  12m                kubelet          Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientMemory  12m (x2 over 12m)  kubelet          Node inst-lpfp2-laerus-cloud-k8s-worker-default-pool status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    12m (x2 over 12m)  kubelet          Node inst-lpfp2-laerus-cloud-k8s-worker-default-pool status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     12m (x2 over 12m)  kubelet          Node inst-lpfp2-laerus-cloud-k8s-worker-default-pool status is now: NodeHasSufficientPID
  Normal   RegisteredNode           12m                node-controller  Node inst-lpfp2-laerus-cloud-k8s-worker-default-pool event: Registered Node inst-lpfp2-laerus-cloud-k8s-worker-default-pool in Controller
  Normal   NodeReady                11m                kubelet          Node inst-lpfp2-laerus-cloud-k8s-worker-default-pool status is now: NodeReady
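
Worth noting from the description above: the autoscaled node is already CPU-saturated. Each integrinha-api pod requests 500m, so the four running replicas account for 4 x 500m = 2000m, i.e. the node's entire 2-CPU allocatable (hence cpu requests at 2 (100%)). Additional replicas cannot fit on this node and stay Pending, which should be exactly the condition that triggers another scale-up.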
