
[BUG] Regression in Env Injector breaks postgres deployments managed by CloudNativePg #605

Open
johanra opened this issue Sep 7, 2023 · 3 comments
Labels
bug Something isn't working

Comments

johanra commented Sep 7, 2023

Components and versions
[X] Env-Injector (webhook), version: 1.5.0
[X] Helm Release (2.5.0)

Describe the bug
The latest release (1.5.0) of the env injector causes a problem when running postgres clusters with the CloudNativePG operator. These postgres clusters work fine with the previous release (1.4.0), which was installed with Helm chart release 2.4.2.

To Reproduce

  1. Install the CloudNativePG operator using Helm chart version 0.18.2 with the default values.
  2. Set up a namespace "postgres-test":

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: postgres-test
  labels:
    azure-key-vault-env-injection: enabled
```

  3. Then set up a postgres cluster in the postgres-test namespace:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres
spec:
  imageName: ghcr.io/cloudnative-pg/postgresql:13.12-6
  instances: 1
  storage:
    size: 1Gi
```

  4. The CloudNativePG operator will create a pod (postgres-1-initdb-......) which fails to start with the following error message:

```
Error: container has runAsNonRoot and image has non-numeric user (nonroot), cannot verify user is non-root (pod: "postgres-1-initdb-5pq6x_bynubian-dev-02(edf17686-61ad-4165-b280-2a19c9400eda)", container: bootstrap-controller)
```

Expected behavior
The pods started by the CloudNativePG operator should start.

Additional context
Reverting the akv2k8s Helm chart back to 2.4.2, which also reverts the env injector to 1.4.0, fixes the issue and postgres deployments are possible again.


mischavandenburg commented Oct 13, 2023

We were having exactly the same problem, and it took a lot of time to figure out why the security context of the pods was empty even though it was configured in the Deployment and ReplicaSet 😅 Downgrading fixed the issue for now, but I hope it can be addressed in a future release.

@dougalII

Hey guys,

I think I found the issue: 1e464ad

```go
podSpec.SecurityContext = &corev1.PodSecurityContext{
	RunAsNonRoot: &[]bool{viper.GetBool("webhook_pod_spec_security_context_non_root")}[0],
}
```

I've updated that to:

```go
if viper.GetBool("webhook_pod_spec_security_context_non_root") {
	// Guard against a nil SecurityContext: assigning to RunAsNonRoot
	// directly would panic when the pod did not set one at all.
	if podSpec.SecurityContext == nil {
		podSpec.SecurityContext = &corev1.PodSecurityContext{}
	}
	podSpec.SecurityContext.RunAsNonRoot = &[]bool{true}[0]
}
```

This respects the original pod securityContext unless we force webhook_pod_spec_security_context_non_root to true, in which case only that one field is edited.
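To see why overwriting the whole struct breaks CloudNativePG, here is a minimal sketch with simplified stand-in types (not the real `corev1.PodSecurityContext`): replacing `SecurityContext` wholesale discards fields like `RunAsUser` that the operator set, while mutating only `RunAsNonRoot` preserves them.

```go
package main

import "fmt"

// Simplified stand-ins for the Kubernetes types used by the webhook.
type PodSecurityContext struct {
	RunAsUser    *int64
	RunAsNonRoot *bool
}

type PodSpec struct {
	SecurityContext *PodSecurityContext
}

// overwrite mimics the 1.5.0 behaviour: the whole SecurityContext is
// replaced, so RunAsUser (and every other field) is lost.
func overwrite(spec *PodSpec, nonRoot bool) {
	spec.SecurityContext = &PodSecurityContext{RunAsNonRoot: &nonRoot}
}

// mutateOnly mimics the proposed fix: RunAsNonRoot is touched only when
// the option is enabled, with a nil guard so pods without any
// securityContext do not cause a panic.
func mutateOnly(spec *PodSpec, nonRoot bool) {
	if !nonRoot {
		return
	}
	if spec.SecurityContext == nil {
		spec.SecurityContext = &PodSecurityContext{}
	}
	spec.SecurityContext.RunAsNonRoot = &nonRoot
}

func main() {
	uid := int64(26) // hypothetical UID set by the operator
	a := &PodSpec{SecurityContext: &PodSecurityContext{RunAsUser: &uid}}
	overwrite(a, false)
	fmt.Println("overwrite keeps RunAsUser:", a.SecurityContext.RunAsUser != nil)

	b := &PodSpec{SecurityContext: &PodSecurityContext{RunAsUser: &uid}}
	mutateOnly(b, false)
	fmt.Println("mutateOnly keeps RunAsUser:", b.SecurityContext.RunAsUser != nil)
}
```

With the option disabled, `overwrite` still wipes `RunAsUser` (printing `false`), which matches the empty security context people saw on their pods, whereas `mutateOnly` leaves the spec untouched (printing `true`).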

I've posted a message on the dev slack to try see if i can get a dev environment going for this, but if anyone is interested:

```diff
diff --git a/cmd/azure-keyvault-secrets-webhook/pod.go b/cmd/azure-keyvault-secrets-webhook/pod.go
index f94ef5b..cf313b6 100644
--- a/cmd/azure-keyvault-secrets-webhook/pod.go
+++ b/cmd/azure-keyvault-secrets-webhook/pod.go
@@ -278,8 +278,11 @@ func (p podWebHook) mutatePodSpec(ctx context.Context, pod *corev1.Pod) error {
        var authServiceSecret *corev1.Secret
        var err error
        podSpec := &pod.Spec
-       podSpec.SecurityContext = &corev1.PodSecurityContext{
-               RunAsNonRoot: &[]bool{viper.GetBool("webhook_pod_spec_security_context_non_root")}[0],
+       if viper.GetBool("webhook_pod_spec_security_context_non_root") {
+               if podSpec.SecurityContext == nil {
+                       podSpec.SecurityContext = &corev1.PodSecurityContext{}
+               }
+               podSpec.SecurityContext.RunAsNonRoot = &[]bool{true}[0]
        }
 
        if p.useAuthService {
```

To make a new image:

```shell
make build images
```

danfinn commented Aug 1, 2024

This is broken for us as well and keeping us from being able to move to workload identity. Could we get some attention on this?
