Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow VPA admission controller to reload the caBundle certificate and patch the webhook #7454

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

maxcao13
Copy link

@maxcao13 maxcao13 commented Nov 2, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

There is currently no functionality to reload the caBundle in the vpa webhook if it was changed. Currently only the server cert and keys are reloaded.

Additionallity, this PR changes the way the files are watched. Previously, the watcher was watching the entire directory (i.e. /etc/tls-certs which resulted in three fsnotify.Create events on 1 change because of the way Kubernetes updates mounted Secret/ConfigMaps. Here's a blog that explains it more thoroughly, but the mounted files are actually symbolic links that are linked to another symbolic link.

This PR changes the implementation to watch the files directly, which only results in fsnotify.Remove events. This was changed so that we can figure out the names of the watched files being changed. From the blog:

If there’s an update to the Secret/ConfigMap, kubelet will create a new timestamped directory, write files to it, update ..data symlink to the new timestamped directory (remember, it’s something you can do atomically, and finally “delete” the old timestamped directory. It’s how the files from a Secret/ConfigMap volume are always complete and consistent with one another.

This behavior is also why you only get a IN_DELETE_SELF (file deleted) event when the file on a Kubernetes Secret/ConfigMap volume you’re watching is updated.

Which issue(s) this PR fixes:

Related to: #6665

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Probably needs an update to the reload-cert flag e.g.

Added `reload-cert` flag to the VPA admission-controller, which will reload the certificates and certificate authority certs it uses, when they change.

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 2, 2024
Copy link

linux-foundation-easycla bot commented Nov 2, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Nov 2, 2024
@k8s-ci-robot
Copy link
Contributor

Welcome @maxcao13!

It looks like this is your first PR to kubernetes/autoscaler 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/autoscaler has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: maxcao13
Once this PR has been reviewed and has the lgtm label, please assign jbartosik for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link
Contributor

Hi @maxcao13. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. area/vertical-pod-autoscaler labels Nov 2, 2024
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Nov 2, 2024
@adrianmoisey
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 2, 2024
@maxcao13 maxcao13 force-pushed the reload-ca branch 2 times, most recently from cb06d72 to 5681a01 Compare November 4, 2024 17:56
@@ -92,6 +119,36 @@ func (cr *certReloader) load() error {
return nil
}

func (cr *certReloader) reloadWebhookCA() error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you try add a test for this?

Copy link
Author

@maxcao13 maxcao13 Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrote some in 4a09aa3.

EDIT: Hmm... some failed because of race-like conditions, but yes I am writing races in an effort to poll the reloadCA function call. Is there any way to get around this without polling it? I guess mutex locking the variables may work.

EDIT 2: Fixed in 8c9d3fa

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 4, 2024
Comment on lines 143 to 145
_, err2 := client.Patch(context.TODO(), webhookConfigName, types.StrategicMergePatchType, patch, metav1.PatchOptions{})
if err2 != nil {
return err2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to use err2?

Suggested change
_, err2 := client.Patch(context.TODO(), webhookConfigName, types.StrategicMergePatchType, patch, metav1.PatchOptions{})
if err2 != nil {
return err2
_, err := client.Patch(context.TODO(), webhookConfigName, types.StrategicMergePatchType, patch, metav1.PatchOptions{})
if err != nil {
return err

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in latest. Thanks!

…anged.

Also changes the way the certReloader watches the certs/keys by watchign the files individually instead of the mounted directory.

Signed-off-by: Max Cao <macao@redhat.com>
Signed-off-by: Max Cao <macao@redhat.com>
@adrianmoisey
Copy link
Member

This seems good to me. Just two changes:

  1. The description of --reload-cert needs an update here
  2. Can you update the e2e test (or add a new one) to handle this case? Some changed have just landed in master to help making running e2e locally easier. I'm happy to help out with this if needed since I've recently worked on this area of the VPA.

@adrianmoisey
Copy link
Member

Oh, and the release note needs changing:

Added reload-cert flag to the VPA admission-controller, which will reload the certificates and certificate authority certs it uses, when they change.

This PR extends the reload-cert flag to include the CA cert, it doesn't add it.

Comment on lines +81 to +98
if event.Has(fsnotify.Remove) || event.Has(fsnotify.Create) || event.Has(fsnotify.Write) {
if event.Name == cr.tlsCertPath || event.Name == cr.tlsKeyPath {
klog.V(2).InfoS("New certificate found, reloading")
if err := cr.load(); err != nil {
klog.ErrorS(err, "Failed to reload certificate")
}
} else if event.Name == cr.clientCaPath {
if err := cr.reloadWebhookCA(); err != nil {
klog.ErrorS(err, "Failed to reload client CA")
}
} else {
continue
}
// watches get removed along with the symlinks, so we need to add them back
if event.Has(fsnotify.Remove) {
if err = watcher.Add(event.Name); err != nil {
klog.ErrorS(err, "Failed to add watcher for file", "filename", event.Name)
}
Copy link
Member

@omerap12 omerap12 Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we replace all those if statements with a switch? The current code is hard to follow, and using a switch could improve readability. For example:

if !event.Has(fsnotify.Remove) && !event.Has(fsnotify.Create) && !event.Has(fsnotify.Write) {
    return
}
switch event.Name {
case cr.tlsCertPath, cr.tlsKeyPath:
    klog.V(2).InfoS("New certificate found, reloading")
    if err := cr.load(); err != nil {
        klog.ErrorS(err, "Failed to reload certificate")
    }
case cr.clientCaPath:
    if err := cr.reloadWebhookCA(); err != nil {
        klog.ErrorS(err, "Failed to reload client CA")
    }
default:
    return
}

if event.Has(fsnotify.Remove) {
    if err := watcher.Add(event.Name); err != nil {
        klog.ErrorS(err, "Failed to add watcher for file", "filename", event.Name)
    }
}

if webhook == nil {
return fmt.Errorf("webhook not found")
}
if len(webhook.Webhooks) > 0 {
Copy link
Member

@omerap12 omerap12 Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we rearrange the statements?

if len(webhook.Webhooks) == 0 {
   return fmt.Errorf("webhook configuration has no webhooks")
}
currentBundle := webhook.Webhooks[0].ClientConfig.CABundle[:]
...

Copy link
Member

@omerap12 omerap12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to say, this is beyond my knowledge base, so my review is mostly just small notes about the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/vertical-pod-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants