
[k8s-keystone-auth] kubeadm init phase fails when webhook not ready #2575

Open
heytrav opened this issue Apr 17, 2024 · 8 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@heytrav

heytrav commented Apr 17, 2024

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

kubeadm is unable to bootstrap the admin user when --authorization-mode includes Webhook at cluster init.

For background, we're running a ClusterAPI-based platform for deploying clusters and have been using the following configuration to set up the kube-apiserver to use the k8s-keystone-auth webhook.

  clusterConfiguration:
    apiServer:
      extraArgs:
        cloud-provider: external
        authorization-mode: Node,Webhook,RBAC
        authentication-token-webhook-config-file: /etc/kubernetes/webhooks/keystone_webhook_config.yaml
        authorization-webhook-config-file: /etc/kubernetes/webhooks/keystone_webhook_config.yaml

This is applied while kubeadm init runs and bootstraps the control plane, and it has the net effect of creating /etc/kubernetes/manifests/kube-apiserver.yaml with the necessary arguments for the auth webhook.
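
For illustration, a trimmed sketch of what the generated manifest ends up containing; only the webhook-related flags are shown and the rest of the kubeadm-rendered manifest is omitted:

  # /etc/kubernetes/manifests/kube-apiserver.yaml (excerpt)
  spec:
    containers:
      - name: kube-apiserver
        command:
          - kube-apiserver
          - --authorization-mode=Node,Webhook,RBAC
          - --authentication-token-webhook-config-file=/etc/kubernetes/webhooks/keystone_webhook_config.yaml
          - --authorization-webhook-config-file=/etc/kubernetes/webhooks/keystone_webhook_config.yaml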

Once the control plane has been bootstrapped, we use a Helm chart to deploy k8s-keystone-auth similar to the example code. Up until Kubernetes v1.28.9 this has worked fine, even though the kube-apiserver initially can't reach the k8s-keystone-auth Pod behind the webhook endpoint.

In v1.29, however, kubeadm init fails, presumably because the webhook is not responding and because the admin user is no longer in system:masters and cannot authenticate (see kubernetes/kubernetes#121305).

Apr 10 02:00:38 gitlab-1-29-3-42132-nv6dtqdoo7wv-control-plane-63f90477-qjv27 kubeadm.sh[1666]: error execution phase upload-config/kubeadm: could not bootstrap the admin user in file admin.conf: unable to create ClusterRoleBinding: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline

What you expected to happen:

kubeadm init completes and bootstraps the admin user even though the k8s-keystone-auth webhook backend is not yet reachable, as it did up to v1.28.

How to reproduce it:

Create a v1.29 cluster with the following kube-apiserver arguments set when kubeadm init first initialises the control plane (a sketch of the referenced webhook config file follows the list):

     - --authorization-mode=Node,Webhook,RBAC
     - --authentication-token-webhook-config-file=/etc/kubernetes/webhooks/keystone_webhook_config.yaml
     - --authorization-webhook-config-file=/etc/kubernetes/webhooks/keystone_webhook_config.yaml
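
A minimal sketch of what /etc/kubernetes/webhooks/keystone_webhook_config.yaml can look like, modelled on the k8s-keystone-auth examples; the server address and the insecure-skip-tls-verify setting are assumptions and should match your webhook deployment:

  apiVersion: v1
  kind: Config
  preferences: {}
  clusters:
    - cluster:
        insecure-skip-tls-verify: true
        server: https://127.0.0.1:8443/webhook
      name: webhook
  users:
    - name: webhook
  contexts:
    - context:
        cluster: webhook
        user: webhook
      name: webhook
  current-context: webhook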

Anything else we need to know?:

I have a workaround in our ClusterAPI setup in which I do not add the --authorization-* arguments until after kubeadm init has run. Once kubeadm init has finished, I run kustomize to add the arguments to the kube-apiserver.yaml manifest.

It seems to work, but we're concerned about a potential race condition and also wondering if there is a cleaner approach.

Your docs suggest using static pods; however, I am not sure how to set up the needed ServiceAccount, ClusterRoleBindings, etc., and it seems like the pod would also just not run until those are in place. So far my attempts to use a static pod at cluster init have also failed.

Environment:

  • Kubernetes v1.29.3
  • k8s-keystone-auth: v1.29.0
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Apr 17, 2024
@heytrav heytrav changed the title [k8s-keystone-auth] kubeadm init phase fail when webhook not ready [k8s-keystone-auth] kubeadm init phase fails when webhook not ready Apr 17, 2024
@dulek
Contributor

dulek commented Apr 24, 2024

Well, it sounds like a static pod is the way to go, but as we can read in the docs:

Note: The spec of a static Pod cannot refer to other API objects (e.g., ServiceAccount, ConfigMap, Secret, etc).

I see that the SA is only needed to fetch the ConfigMap from the K8s API. I think you can use --keystone-policy-file instead of --policy-configmap-name and just feed this file directly to the pod as a hostPath? The same should be done with the certs.
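
A rough static Pod sketch of that idea, for reference only. The image tag, host paths, Keystone URL and TLS file names are assumptions based on the k8s-keystone-auth example manifests; adjust them to your environment:

  # /etc/kubernetes/manifests/k8s-keystone-auth.yaml (sketch)
  apiVersion: v1
  kind: Pod
  metadata:
    name: k8s-keystone-auth
    namespace: kube-system
  spec:
    hostNetwork: true
    containers:
      - name: k8s-keystone-auth
        image: registry.k8s.io/provider-os/k8s-keystone-auth:v1.29.0
        args:
          - ./bin/k8s-keystone-auth
          - --tls-cert-file=/etc/kubernetes/keystone-auth/tls.crt
          - --tls-private-key-file=/etc/kubernetes/keystone-auth/tls.key
          - --keystone-policy-file=/etc/kubernetes/keystone-auth/policy.json
          - --keystone-url=https://keystone.example.com/identity/v3
        volumeMounts:
          - name: keystone-auth
            mountPath: /etc/kubernetes/keystone-auth
            readOnly: true
    volumes:
      - name: keystone-auth
        hostPath:
          path: /etc/kubernetes/keystone-auth
          type: Directory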

@heytrav
Author

heytrav commented Apr 24, 2024

Well, it sounds like a static pod is the way to go, but as we can read in the docs:

Note: The spec of a static Pod cannot refer to other API objects (e.g., ServiceAccount, ConfigMap, Secret, etc).

I see that the SA is only needed to fetch the ConfigMap from the K8s API. I think you can use --keystone-policy-file instead of --policy-configmap-name and just feed this file directly to the pod as a hostPath? The same should be done with the certs.

@dulek Ok that sounds promising. I'll give that a try. Thank you!

@heytrav
Author

heytrav commented May 1, 2024

I've encountered the same issue running the k8s-keystone-auth application as a static pod.

The static pod starts, but kubeadm init still fails with the same error message.

kubeadm.sh[1666]: error execution phase upload-config/kubeadm: could not bootstrap the admin user in file admin.conf: unable to create ClusterRoleBinding: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline

The pod logs are full of:

Failed to make webhook authorizer request: Post "https://127.0.0.1:8443/webhook?timeout=30s": dial tcp 127.0.0.1:8443: connect: connection refused

It appears that even though the pod runs, it is not actually listening for connections on the host.

Without a serviceAccount in the static pod, k8s-keystone-auth requires access to a kubeconfig in order to run. At init time the options are to:

  • mount one of the existing kubeconfig files
    • admin.conf
    • super-admin.conf
  • use kubeadm kubeconfig user --client-name=whatever > /etc/kubernetes/keystone-auth/kubeconfig to generate one

Using admin.conf doesn't work because kubeadm needs the admin user's credentials in order to bootstrap that same user. Generating a custom kubeconfig doesn't work for the same reason. The only conf that works is super-admin.conf, which is not ideal.
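
For reference, a hedged fragment of how super-admin.conf could be fed to the static pod from the host. The mount path is illustrative and the --kubeconfig flag name is an assumption; check k8s-keystone-auth --help for the exact option:

  # merge into the static pod sketch from the earlier comment
  containers:
    - name: k8s-keystone-auth
      args:
        - ./bin/k8s-keystone-auth
        - --kubeconfig=/etc/kubernetes/super-admin.conf   # flag name is an assumption
        # ...remaining flags as in the earlier sketch
      volumeMounts:
        - name: super-admin
          mountPath: /etc/kubernetes/super-admin.conf
          readOnly: true
  volumes:
    - name: super-admin
      hostPath:
        path: /etc/kubernetes/super-admin.conf
        type: File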

So far the only solution that has worked for me is to create the cluster without the Webhook authorization mode or the webhook config file paths, and to add them after kubeadm init.

Create the following kustomize definition:

  files:
    - path: /etc/kubernetes/keystone-kustomization/kustomization.yml
      permissions: "0644"
      owner: root:root
      content: |
        resources:
        - kube-apiserver.yaml
        patches:
        - patch: |-
            - op: add
              path: /spec/containers/0/command/-
              value: --authentication-token-webhook-config-file=/etc/kubernetes/webhooks/keystone_webhook_config.yaml
            - op: add
              path: /spec/containers/0/command/-
              value: --authorization-webhook-config-file=/etc/kubernetes/webhooks/keystone_webhook_config.yaml
            - op: add
              path: /spec/containers/0/command/-
              value: --authorization-mode=Webhook
          target:
            kind: Pod

Add the following pre and postKubeadmCommands to modify kube-apiserver.yaml after kubeadm init has completed:

  preKubeadmCommands:
    - mkdir /etc/kubernetes/keystone-kustomization
  postKubeadmCommands:
    - cp /etc/kubernetes/manifests/kube-apiserver.yaml /etc/kubernetes/keystone-kustomization/kube-apiserver.yaml
    - kubectl kustomize /etc/kubernetes/keystone-kustomization -o /etc/kubernetes/manifests/kube-apiserver.yaml

With this setup kubeadm init completes the bootstrap of the admin user, and after kubeadm init kustomize modifies the kube-apiserver manifest to use the webhook. A combined sketch of how this fits into a Cluster API KubeadmControlPlane follows.
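
A minimal sketch (not a complete object) of where the pieces above sit in a KubeadmControlPlane; replicas, version, machineTemplate and the file content are omitted, and metadata.name is illustrative:

  apiVersion: controlplane.cluster.x-k8s.io/v1beta1
  kind: KubeadmControlPlane
  metadata:
    name: example-control-plane
  spec:
    kubeadmConfigSpec:
      clusterConfiguration:
        apiServer:
          extraArgs:
            # no Webhook mode and no webhook config file flags here; they are
            # appended after init by the kustomize step above
            cloud-provider: external
      files:
        - path: /etc/kubernetes/keystone-kustomization/kustomization.yml
          permissions: "0644"
          owner: root:root
          # content: the kustomization.yml shown earlier in this comment
      preKubeadmCommands:
        - mkdir /etc/kubernetes/keystone-kustomization
      postKubeadmCommands:
        - cp /etc/kubernetes/manifests/kube-apiserver.yaml /etc/kubernetes/keystone-kustomization/kube-apiserver.yaml
        - kubectl kustomize /etc/kubernetes/keystone-kustomization -o /etc/kubernetes/manifests/kube-apiserver.yaml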

@mnaser
Contributor

mnaser commented Jul 3, 2024

I think the issue with this is that when/if the cluster is upgraded, the configuration will get wiped... no?

@heytrav
Author

heytrav commented Jul 3, 2024

The pre/postKubeadmCommands run for an upgrade the same as they do for cluster initialization, so kustomize modifies the kube-apiserver.yaml manifest to add the auth webhook.

@mnaser
Contributor

mnaser commented Jul 13, 2024

The pre/postKubeadmCommands run for an upgrade the same as they do for cluster initialization, so kustomize modifies the kube-apiserver.yaml manifest to add the auth webhook.

Perfect, alright, so that sounds like a not ideal but functional solution.

okozachenko1203 added a commit to 0x00ace/magnum-cluster-api that referenced this issue Aug 2, 2024
It does not add the --authorization-* arguments until after kubeadm init has run.
Once kubeadm init has finished, run kustomize to add the arguments to the kube-apiserver.yaml manifest.

ref: kubernetes/cloud-provider-openstack#2575
mnaser added a commit to vexxhost/magnum-cluster-api that referenced this issue Aug 6, 2024
* update patch versions and add zuul CI jobs for new versions

* Use cloud images as base

Signed-off-by: Mohammed Naser <mnaser@vexxhost.com>

* Update versions to build cleanly

* use kustomize to enable keystone webhook after kubeadm init

It does not add the --authorization-* arguments until after kubeadm init has run.
Once kubeadm init has finished, run kustomize to add the arguments to the kube-apiserver.yaml manifest.

ref: kubernetes/cloud-provider-openstack#2575

* fix lint error and add 1.29 and 1.30 jobs

* append webhook authz mode only to avoid duplication with defaults

The api-server sets Node and RBAC as default authz modes in its command args
and does not allow a mode to be specified more than once.

* fix typo

* fix lint error

* make a workaround for cilium conformance test failures

cilium/cilium#29913
kubernetes/kubernetes#120069
cilium/cilium#9207

* fix flake8 errors

---------

Signed-off-by: Mohammed Naser <mnaser@vexxhost.com>
Co-authored-by: okozachenko1203 <okozachenko@vexxhost.com>
Co-authored-by: Mohammed Naser <mnaser@vexxhost.com>
Co-authored-by: Oleksandr K. <okozachenko1203@gmail.com>
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 11, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 10, 2024