Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added webhook-annotations flag to VPA admission-controller #7472

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

wcarlsen
Copy link
Contributor

@wcarlsen wcarlsen commented Nov 7, 2024

What type of PR is this?

/kind bug
/kind feature

What this PR does / why we need it:

The VPA admission controller when deployed is bundled with a certificate. The responsibility of certificate renewal can be automatically handled with cert-manager, upon renewal the admission controller reloads the certificate, but the caBundle on the mutatingwebhook managed by the admission controller isn't updated.

Cert-manager comes with a ca-injector that exactly handles updating the CA certificates for mutating webhooks via an annotation cert-manager.io/inject-ca-from: kube-system/vpa-cert-by-certmanager see ca-injector docs. Thanks to @abstrask for coming up with the idea.

Which issue(s) this PR fixes:

Related to PR #7454 and issue #6665

Special notes for your reviewer:

I still think the PR #7454 is in the end the right approach, but this at least offer a low hanging alternative solution.

Does this PR introduce a user-facing change?

Added webhook-annotations flag to VPA admission-controller. This is specially useful for having cert-managers ca-injector automatically updating  the CA certificates on the mutating webhook.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


Signed-off-by: Willi Carlsen <carlsenwilli@gmail.com>
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. kind/feature Categorizes issue or PR as related to a new feature. labels Nov 7, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: wcarlsen
Once this PR has been reviewed and has the lgtm label, please assign voelzmo for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 7, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @wcarlsen. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Nov 7, 2024
@adrianmoisey
Copy link
Member

/ok-to-test

Forgive me for not understanding, but if cert-manager is handling the CA bundle in the mutating webhook, doesn't the VPA also need to reload when the CA bundle changes?

Or is this handling the case when the CA stays static, and only the certificates themselves are rotating?

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 7, 2024
@wcarlsen
Copy link
Contributor Author

wcarlsen commented Nov 7, 2024

/ok-to-test

Forgive me for not understanding, but if cert-manager is handling the CA bundle in the mutating webhook, doesn't the VPA also need to reload when the CA bundle changes?

Or is this handling the case when the CA stays static, and only the certificates themselves are rotating?

It already has this capability as far as I understand

or am I missing something here?

@adrianmoisey
Copy link
Member

At the moment it only reloads the cert and server key. If the CA bundle changes, my understanding is that it won't reload.

If cert-manager is rotating the CA too (I don't know if it does), then this PR depends on #7454.

If cert-manager is not rotating the CA, and is only rotating the server cert and key, then this doesn't depend on #7454

But, I'm not too familiar with this flow, so I could be wrong.

@adrianmoisey
Copy link
Member

Ok, I had a quick look at the code.

The VPA uses clientCaFile in order to register the webhook.
It uses client-ca-file and tls-cert-file to server HTTPS traffic.

This PR will allow cert-manager to manage both CA and certificate/key. But cert-manager will be responsible for writing it into the webhook's configuration and VPA will be responsible for reloading the certificate/key for itself.

That makes me think that this change will conflict with #7454, since setting --reload-cert=true will mean that CA bundle and certificate/key will be handled by VPA.
I guess the VPA needs a reload-cert and a reload-ca option

@wcarlsen
Copy link
Contributor Author

wcarlsen commented Nov 7, 2024

Ok, I had a quick look at the code.

The VPA uses clientCaFile in order to register the webhook. It uses client-ca-file and tls-cert-file to server HTTPS traffic.

This PR will allow cert-manager to manage both CA and certificate/key. But cert-manager will be responsible for writing it into the webhook's configuration and VPA will be responsible for reloading the certificate/key for itself.

That makes me think that this change will conflict with #7454, since setting --reload-cert=true will mean that CA bundle and certificate/key will be handled by VPA. I guess the VPA needs a reload-cert and a reload-ca option

At least now as soon as a certificate is renewed, by e.g. cert-manager, you get TLS handshake error and all pods with resources managed by a VPA resource if restarted will end up restarting endlessly. And according to my testing setting that annotation fixes that the admission controller do not really fully manage the lifecycle of the webhook.

My take is that #7454 is the long term goal and the right solution in the end, but annotating the webhook offer an easier reviewable solution right now.

But sure if one wanted split responsibility of updating the webhook it could make sense to introduce reload-ca option too.

@adrianmoisey
Copy link
Member

It seems like #7454 will solve this issue, since when cert-manager rotates the cert and ca, VPA will update the webhook and it will reload the certs.

Any reason to not wait for that change?

@wcarlsen
Copy link
Contributor Author

wcarlsen commented Nov 7, 2024

It seems like #7454 will solve this issue, since when cert-manager rotates the cert and ca, VPA will update the webhook and it will reload the certs.

Any reason to not wait for that change?

You are right and I'm totally cool with that. It is just hard for me to predict where we are at in terms of progress on that PR and a release. The experience is not great once you finally hit the edge case and it is kind of hard to debug. But it can be mitigated by manually adding the ca-injector annotation, since it won't be reconciled by the admission controller and then wait for an upstream fix.

You are welcome to close the PR unless we can up with another argument on why it would be nice with an annotation flag.

@adrianmoisey
Copy link
Member

You are right and I'm totally cool with that. It is just hard for me to predict where we are at in terms of progress on that PR and a release.

I'm unsure about a release, but I don't think this PR will shortcut the other one and be released first.
That other PR is next on my list to try get merged.

You are welcome to close the PR unless we can up with another argument on why it would be nice with an annotation flag.

I agree that an annotation may be useful, I just don't see this use case as being one of them

@wcarlsen
Copy link
Contributor Author

wcarlsen commented Nov 7, 2024

You are right and I'm totally cool with that. It is just hard for me to predict where we are at in terms of progress on that PR and a release.

I'm unsure about a release, but I don't think this PR will shortcut the other one and be released first. That other PR is next on my list to try get merged.

That is awesome news.

You are welcome to close the PR unless we can up with another argument on why it would be nice with an annotation flag.

I agree that an annotation may be useful, I just don't see this use case as being one of them

I will let that being up to you to decide and I'm totally fine with updating the release notes section too. No pressure and thanks 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/vertical-pod-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/feature Categorizes issue or PR as related to a new feature. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants