
CA scales only once (OCI - Instance Pools) #7518

Open
LaeraFelipe opened this issue Nov 21, 2024 · 2 comments
Labels
area/cluster-autoscaler kind/bug Categorizes issue or PR as related to a bug.

Comments

@LaeraFelipe

Kubernetes version: 1.31.2
CA version: 1.31.1

My pods are pending:

[screenshot: pending pods]

But CA is giving these logs:
[screenshot: cluster-autoscaler logs]

There is no node called:
template-node-for-ocid1.instancepool.oc1.sa-saopaulo-1.aaaaaaaa4kcfnj536kix5nq7bwljcgtu7afesrmryoe77mfiyolk736jdqna-2342242805848131422-upcoming-0

[screenshot: node list]
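
Side note: names of the form template-node-for-<node-group-id>-...-upcoming-N appear to be the virtual node templates CA builds in memory to simulate upcoming (not yet registered) instances during scale-up, so a real node with that name is not expected to exist. A quick sanity check (a sketch, assuming kubectl access to the cluster):

kubectl get nodes --no-headers | grep template-node || echo "no template nodes (expected)"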

@LaeraFelipe added the kind/bug label Nov 21, 2024
@adrianmoisey
Member

/area cluster-autoscaler

@LaeraFelipe
Author

LaeraFelipe commented Nov 22, 2024

After installing the CA it worked normally. It scaled several nodes according to demand, but now it only scales one node.
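
For context, the instance pool is registered with CA through static node-group flags along these lines (a sketch; the 0:10 min/max is inferred from the status below):

--nodes=0:10:ocid1.instancepool.oc1.sa-saopaulo-1.aaaaaaaamyyjuqltv3mso7g5opnsu4z5avey7sdiwjzpmy6oztunzbhoc4na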

This is the cluster-autoscaler status ConfigMap before increasing the cluster workload:

Name:         cluster-autoscaler-status
Namespace:    kube-system
Labels:       <none>
Annotations:  cluster-autoscaler.kubernetes.io/last-updated: 2024-11-22 14:30:52.869056102 +0000 UTC

Data
====
status:
----
time: 2024-11-22 14:30:52.869056102 +0000 UTC
autoscalerStatus: Running
clusterWide:
  health:
    status: Healthy
    nodeCounts:
      registered:
        total: 2
        ready: 2
        notStarted: 0
      longUnregistered: 0
      unregistered: 0
    lastProbeTime: "2024-11-22T14:30:52.869056102Z"
    lastTransitionTime: "2024-11-22T01:36:00.574554404Z"
  scaleUp:
    status: NoActivity
    lastProbeTime: "2024-11-22T14:30:52.869056102Z"
    lastTransitionTime: "2024-11-22T01:38:27.014129855Z"
  scaleDown:
    status: NoCandidates
    lastProbeTime: "2024-11-22T14:30:52.869056102Z"
    lastTransitionTime: "2024-11-22T01:52:05.35997395Z"
nodeGroups:
- name: ocid1.instancepool.oc1.sa-saopaulo-1.aaaaaaaamyyjuqltv3mso7g5opnsu4z5avey7sdiwjzpmy6oztunzbhoc4na
  health:
    status: Healthy
    cloudProviderTarget: 0
    minSize: 0
    maxSize: 10
  scaleUp:
    status: NoActivity
    lastTransitionTime: "2024-11-22T01:38:27.014129855Z"
  scaleDown:
    status: NoCandidates
    lastProbeTime: "2024-11-22T14:30:52.869056102Z"
    lastTransitionTime: "2024-11-22T01:52:05.35997395Z"

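(This output format matches kubectl describe; the status can be fetched with:

kubectl -n kube-system describe configmap cluster-autoscaler-status
)
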
After increasing the workload, CA scaled one instance and gave me these logs:

[screenshot: cluster-autoscaler logs]

And the status ConfigMap says:

Name:         cluster-autoscaler-status
Namespace:    kube-system
Labels:       <none>
Annotations:  cluster-autoscaler.kubernetes.io/last-updated: 2024-11-22 14:39:26.864104577 +0000 UTC

Data
====
status:
----
time: 2024-11-22 14:39:26.864104577 +0000 UTC
autoscalerStatus: Running
clusterWide:
  health:
    status: Healthy
    nodeCounts:
      registered:
        total: 3
        ready: 2
        notStarted: 0
        beingDeleted: 1
      longUnregistered: 0
      unregistered: 0
    lastProbeTime: "2024-11-22T14:39:26.864104577Z"
    lastTransitionTime: "2024-11-22T01:36:00.574554404Z"
  scaleUp:
    status: NoActivity
    lastProbeTime: "2024-11-22T14:39:26.864104577Z"
    lastTransitionTime: "2024-11-22T14:36:59.819105176Z"
  scaleDown:
    status: NoCandidates
    lastProbeTime: "2024-11-22T14:39:26.864104577Z"
    lastTransitionTime: "2024-11-22T01:52:05.35997395Z"
nodeGroups:
- name: ocid1.instancepool.oc1.sa-saopaulo-1.aaaaaaaamyyjuqltv3mso7g5opnsu4z5avey7sdiwjzpmy6oztunzbhoc4na
  health:
    status: Healthy
    nodeCounts:
      registered:
        total: 1
        ready: 0
        notStarted: 0
        beingDeleted: 1
      longUnregistered: 0
      unregistered: 0
    cloudProviderTarget: 1
    minSize: 0
    maxSize: 10
    lastProbeTime: "2024-11-22T14:39:26.864104577Z"
  scaleUp:
    status: NoActivity
    lastProbeTime: "2024-11-22T14:39:26.864104577Z"
    lastTransitionTime: "2024-11-22T14:36:59.819105176Z"
  scaleDown:
    status: NoCandidates
    lastProbeTime: "2024-11-22T14:39:26.864104577Z"
    lastTransitionTime: "2024-11-22T01:52:05.35997395Z"

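Note that the node group now reports ready: 0 and beingDeleted: 1, i.e. the only registered node in the pool is still counted as being deleted. If I read the status fields correctly, beingDeleted corresponds to nodes carrying the ToBeDeletedByClusterAutoscaler taint, which can be checked with (a sketch, assuming kubectl access):

kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
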
The nodes:

[screenshot: node list]

After the first scaled node, CA doesn't scale anymore. As you can see, there are a lot of pending pods:

[screenshot: pending pods]
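
(The pending pods can be listed with:

kubectl get pods -A --field-selector=status.phase=Pending
)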

And these are the CA logs:

[screenshot: cluster-autoscaler logs]
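
(Collected with something along these lines; the exact label selector depends on how CA was deployed, so app=cluster-autoscaler is an assumption:

kubectl -n kube-system logs -l app=cluster-autoscaler --tail=200
)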

The auto scaled node description:

Name:               inst-lpfp2-laerus-cloud-k8s-worker-default-pool
Roles:              worker
Labels:             beta.kubernetes.io/arch=arm64
                    beta.kubernetes.io/instance-type=VM.Standard.A1.Flex
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/zone=SA-SAOPAULO-1-AD-1
                    kubernetes.io/arch=arm64
                    kubernetes.io/hostname=inst-lpfp2-laerus-cloud-k8s-worker-default-pool
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=worker
                    node.kubernetes.io/instance-type=VM.Standard.A1.Flex
                    topology.kubernetes.io/zone=SA-SAOPAULO-1-AD-1
Annotations:        csi.volume.kubernetes.io/nodeid: {"csi.tigera.io":"inst-lpfp2-laerus-cloud-k8s-worker-default-pool"}
                    kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    oci.oraclecloud.com/compartment-id: ocid1.compartment.oc1..aaaaaaaanor7eo3fz2m3jxwwnq2ipzek57c3wlui4me72c7wdwbtlbju5idq
                    oci.oraclecloud.com/instance-id: ocid1.instance.oc1.sa-saopaulo-1.antxeljrtbrujwyc5cy3apjzec5wh6pqazfly572fewc6hxhhcaikh6fuj5a
                    oci.oraclecloud.com/instancepool-id: ocid1.instancepool.oc1.sa-saopaulo-1.aaaaaaaamyyjuqltv3mso7g5opnsu4z5avey7sdiwjzpmy6oztunzbhoc4na
                    projectcalico.org/IPv4Address: 10.0.1.238/24
                    projectcalico.org/IPv4VXLANTunnelAddr: 192.168.245.64
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 22 Nov 2024 14:36:03 +0000
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  inst-lpfp2-laerus-cloud-k8s-worker-default-pool
  AcquireTime:     <unset>
  RenewTime:       Fri, 22 Nov 2024 14:48:07 +0000
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Fri, 22 Nov 2024 14:37:04 +0000   Fri, 22 Nov 2024 14:37:04 +0000   CalicoIsUp                   Calico is running on this node
  MemoryPressure       False   Fri, 22 Nov 2024 14:43:11 +0000   Fri, 22 Nov 2024 14:36:03 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Fri, 22 Nov 2024 14:43:11 +0000   Fri, 22 Nov 2024 14:36:03 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Fri, 22 Nov 2024 14:43:11 +0000   Fri, 22 Nov 2024 14:36:03 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Fri, 22 Nov 2024 14:43:11 +0000   Fri, 22 Nov 2024 14:36:47 +0000   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  10.0.1.238
  Hostname:    inst-lpfp2-laerus-cloud-k8s-worker-default-pool
Capacity:
  cpu:                2
  ephemeral-storage:  46212176Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  hugepages-32Mi:     0
  hugepages-64Ki:     0
  memory:             8111088Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  42589141332
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  hugepages-32Mi:     0
  hugepages-64Ki:     0
  memory:             8008688Ki
  pods:               110
System Info:
  Machine ID:                 e605b8db9b3e4cb6beac8dfd01f8aae0
  System UUID:                e605b8db-9b3e-4cb6-beac-8dfd01f8aae0
  Boot ID:                    06afcad5-60e5-4c9e-84c7-b710fbaf7be2
  Kernel Version:             6.8.0-1013-oracle
  OS Image:                   Ubuntu 24.04.1 LTS
  Operating System:           linux
  Architecture:               arm64
  Container Runtime Version:  containerd://1.7.23
  Kubelet Version:            v1.31.3
  Kube-Proxy Version:         v1.31.3
PodCIDR:                      192.168.8.0/24
PodCIDRs:                     192.168.8.0/24
ProviderID:                   ocid1.instance.oc1.sa-saopaulo-1.antxeljrtbrujwyc5cy3apjzec5wh6pqazfly572fewc6hxhhcaikh6fuj5a
Non-terminated Pods:          (8 in total)
  Namespace                   Name                               CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                               ------------  ----------  ---------------  -------------  ---
  calico-system               calico-node-bldsl                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         12m
  calico-system               calico-typha-5cb64fcdd6-drd24      0 (0%)        0 (0%)      0 (0%)           0 (0%)         11m
  calico-system               csi-node-driver-thzhj              0 (0%)        0 (0%)      0 (0%)           0 (0%)         12m
  integrinha                  integrinha-api-5854766f74-nshbt    500m (25%)    500m (25%)  500Mi (6%)       500Mi (6%)     14m
  integrinha                  integrinha-api-5854766f74-pq2jq    500m (25%)    500m (25%)  500Mi (6%)       500Mi (6%)     14m
  integrinha                  integrinha-api-5854766f74-px5nq    500m (25%)    500m (25%)  500Mi (6%)       500Mi (6%)     12m
  integrinha                  integrinha-api-5854766f74-xvgbd    500m (25%)    500m (25%)  500Mi (6%)       500Mi (6%)     10m
  kube-system                 kube-proxy-bbxsf                   0 (0%)        0 (0%)      0 (0%)           0 (0%)         12m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                2 (100%)      2 (100%)
  memory             2000Mi (25%)  2000Mi (25%)
  ephemeral-storage  0 (0%)        0 (0%)
  hugepages-1Gi      0 (0%)        0 (0%)
  hugepages-2Mi      0 (0%)        0 (0%)
  hugepages-32Mi     0 (0%)        0 (0%)
  hugepages-64Ki     0 (0%)        0 (0%)
Events:
  Type     Reason                   Age                From             Message
  ----     ------                   ----               ----             -------
  Normal   Starting                 11m                kube-proxy
  Warning  InvalidDiskCapacity      12m                kubelet          invalid capacity 0 on image filesystem
  Normal   NodeAllocatableEnforced  12m                kubelet          Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientMemory  12m (x2 over 12m)  kubelet          Node inst-lpfp2-laerus-cloud-k8s-worker-default-pool status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    12m (x2 over 12m)  kubelet          Node inst-lpfp2-laerus-cloud-k8s-worker-default-pool status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     12m (x2 over 12m)  kubelet          Node inst-lpfp2-laerus-cloud-k8s-worker-default-pool status is now: NodeHasSufficientPID
  Normal   RegisteredNode           12m                node-controller  Node inst-lpfp2-laerus-cloud-k8s-worker-default-pool event: Registered Node inst-lpfp2-laerus-cloud-k8s-worker-default-pool in Controller
  Normal   NodeReady                11m                kubelet          Node inst-lpfp2-laerus-cloud-k8s-worker-default-pool status is now: NodeReady
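
Worth noting from the description above: the autoscaled node is already CPU-saturated. Each integrinha-api pod requests 500m, so the four running replicas account for 4 x 500m = 2000m, i.e. the node's entire 2-CPU allocatable (hence cpu requests at 2 (100%)). Additional replicas cannot fit on this node and stay Pending, which should be exactly the condition that triggers another scale-up.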
