You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
A regression in version 1.6.0 of the Env-Injector prevents the use of imagePullSecrets as valid registry authentication when using authType="azureCloudConfig". More specifically, the logic present in the current 1.6.0 version (covered below in detail) assumes that both the Key Vault and container registry must be using MSIs as the authentication method. My use case involves utilizing a third-party container registry, and not an ACR. As a result, I cannot set up any form of MSI to our container registry, which is why we use imagePullSecrets. My understanding from reading through the documentation and code from 1.5.0 and below is that this use case should be supported.
After investigating, I've found that the changes introduced by #631 break the previous logic that allowed imagePullSecrets. The function getContainerRegistryRemoteOptions() in registry.go (a) uses authType to decide how the container registry authentication should be handled, and (b) assumes that an ACR will be used when authType="azureCloudConfig":
klog.InfoS("using imagePullSecrets for registry authentication", "config.authType", authType)
*authChain, err=k8schain.New(
ctx,
client,
k8schain.Options{
Namespace: container.Namespace,
ServiceAccountName: container.ServiceAccountName,
ImagePullSecrets: container.ImagePullSecrets,
},
)
iferr!=nil {
returnnil, err
}
}
For context, the previous approach used a getImageConfig() function whose only branch includes imagePullSecrets (for clarity, this same logic is preserved in the default case statement in 1.6.0):
returnnil, fmt.Errorf("cannot convert image descriptor to v1.Image: %w", err)
}
configFile, err:=image.ConfigFile()
iferr!=nil {
returnnil, fmt.Errorf("cannot extract config file of image: %w", err)
}
return&configFile.Config, nil
}
This seems to be a restrictive approach, for several reasons:
authType seems to be documented as an option to specify key vault authentication only. I haven't seen this option mention container registries (Chart README.md, Chart env-injector-deployment.yaml). Aside from the issues being discussed here, this alone is cause for potential confusion.
Making the container registry authentication option tied to the Key Vault authentication option is unnecessarily restrictive. Aside from the current use case, another use case that is excluded is that of custom authentication (i.e. authType="environment") with ACR MSI authentication. The newly added code branch in registry.go will never be taken because authType does not equal azureCloudConfig.
To Reproduce
Important
For ease of testing within the context of this project, these replication steps will include creating an ACR. To be clear, the two use cases (ACR without MSI, and third party registry entirely) fail the same way and for the purposes of this issue can be regarded as equivalent.
Note
I've replaced all the instances of identifiable names with their <pseudo> equivalents. Example: <registryName>.azurecr.io
Provision an AKS Cluster
Kubernetes Version: 1.26.10
Tier: Free
AAD Pod Identity: Disabled
Authentication and Authorization: Local accounts with Kubernetes RBAC
Networking: Azure CNI, Calico
Provision a Key Vault
With access to the cluster via an access policy on the agentpool MSI
Add a secret to this vault to be accessed later by a workload pod
For the purposes of these steps, name it admin-password
Provision a container registry. We'll be using token based authentication, not MSIs.
Deploy a workload pod that uses the pull secret and requests the secret variable with admin-password@azurekeyvault
apiVersion: apps/v1kind: Deploymentmetadata:
name: alpine-deploymentlabels:
app: alpinespec:
replicas: 1selector:
matchLabels:
app: alpinetemplate:
metadata:
labels:
app: alpinespec:
containers:
- name: alpineimage: <registryName>.azurecr.io/alpine-test:latestimagePullPolicy: Always # This is here to eliminate node cache from the variables in this testenv:
- name: "MY_SECRET"value: "admin-password@azurekeyvault"imagePullSecrets:
- name: regcred
Notice that Pods are not deploying. Describe the ReplicaSet to show the unauthorized message:
$ kubectl describe rs alpine-deployment-<hash>Events: Type Reason Age From Message ---- ------ ---- ---- ------- Warning FailedCreate 13s replicaset-controller Error creating: Internal error occurred: failed calling webhook "pods.env-injector.admission.spv.no": failed to call webhook: an error on the server ("{\"response\":{\"uid\":\"4946a53f-76c1-4f90-82ab-ab0f76272a79\",\"allowed\":false,\"status\":{\"metadata\":{},\"status\":\"Failure\",\"message\":\"failed to get auto cmd, error: cannot fetch image descriptor: GET https://<registryName>.azurecr.io/oauth2/token?scope=repository%3Aalpine-test%3Apull\\u0026service=<registryName>.azurecr.io: UNAUTHORIZED: authentication required, visit https://aka.ms/acr/authorization for more information.\"}}}") has prevented the request from succeeding
Expected behavior
Based on the above replication steps, I would expect to see the env-injector use the imagePullSecrets in the deployment specification to pull the image from the registry. Instead, it tries to pull it using invalid/anonymous credentials.
As a point of comparison, installing akv2k8s at application version 1.5.0 (chart version 2.5.0) and reinstalling the deployment will show the intended behavior. Of course, the container is nonfunctional (it crash loops because it's default Alpine), which is beside the point.
Logs
Full ReplicaSet describe log:
Name: alpine-deployment-bbd99dbf5
Namespace: default
Selector: app=alpine,pod-template-hash=bbd99dbf5
Labels: app=alpine
pod-template-hash=bbd99dbf5
Annotations: deployment.kubernetes.io/desired-replicas: 1
deployment.kubernetes.io/max-replicas: 2
deployment.kubernetes.io/revision: 1
Controlled By: Deployment/alpine-deployment
Replicas: 0 current / 1 desired
Pods Status: 0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: app=alpine
pod-template-hash=bbd99dbf5
Containers:
alpine:
Image: <registry>.azurecr.io/alpine-test:latest
Port: <none>
Host Port: <none>
Environment:
MY_SECRET: admin-password@azurekeyvault
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
ReplicaFailure True FailedCreate
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreate 16s replicaset-controller Error creating: Internal error occurred: failed calling webhook "pods.env-injector.admission.spv.no": failed to call webhook: an error on the server ("{\"response\":{\"uid\":\"e0ac9a89-737b-4a68-b348-4d4017c09858\",\"allowed\":false,\"status\":{\"metadata\":{},\"status\":\"Failure\",\"message\":\"failed to get auto cmd, error: cannot fetch image descriptor: GET https://<registry>.azurecr.io/oauth2/token?scope=repository%3Aalpine-test%3Apull\\u0026service=<registry>.azurecr.io: UNAUTHORIZED: authentication required, visit https://aka.ms/acr/authorization for more information.\"}}}") has prevented the request from succeeding
# <snip> (the same event is repeated about 20 times)
env-injector logs at time of failure:
2024/01/09 06:10:51 [DEBUG] reviewing request 67fe5091-2a8f-4239-a599-d2236f22a837, named: default/
I0109 06:10:51.012438 1 main.go:139] "found pod to mutate" pod="default/"
I0109 06:10:51.012454 1 pod.go:295] "creating client certificate to use with auth service" default/="(MISSING)"
I0109 06:10:51.012462 1 clientCert.go:25] "creating x509 key pair for ca cert and key"
I0109 06:10:51.012752 1 clientCert.go:32] "parse certificate"
I0109 06:10:51.012772 1 clientCert.go:38] "generating client key"
I0109 06:10:51.029883 1 clientCert.go:44] "generating serial number"
I0109 06:10:51.029909 1 clientCert.go:66] "crating x509 certificate"
I0109 06:10:51.032023 1 pod.go:303] "create authentication service secret" default/="(MISSING)"
I0109 06:10:51.045707 1 round_trippers.go:553] POST https://10.0.0.1:443/api/v1/namespaces/default/secrets 409 Conflict in 13 milliseconds
I0109 06:10:51.051601 1 round_trippers.go:553] PUT https://10.0.0.1:443/api/v1/namespaces/default/secrets/akv2k8s-alpine-deployment 200 OK in 5 milliseconds
I0109 06:10:51.052088 1 pod.go:317] "mutate init-containers" default/="(MISSING)"
I0109 06:10:51.052106 1 pod.go:323] "mutate containers" default/="(MISSING)"
I0109 06:10:51.052113 1 pod.go:138] "found container to mutate" container="default/alpine"
I0109 06:10:51.052119 1 pod.go:141] "checking for env vars to inject" container="default/alpine"
I0109 06:10:51.052145 1 pod.go:144] "found env var to inject" env="admin-password@azurekeyvault" container="default/alpine"
I0109 06:10:51.052170 1 registry.go:30] "getting container command for container" container="default/alpine"
I0109 06:10:51.052232 1 registry.go:36] "no cmd override in kubernetes for container, checking docker image configuration for entrypoint and cmd" image="<registryName>.azurecr.io/alpine-test:latest" container="default/alpine"
I0109 06:10:51.052256 1 registry.go:130] "using registry" imageRegistry="<registryName>.azurecr.io"
I0109 06:10:51.052279 1 registry.go:135] "using cloudConfig for registry authentication" config.authType="azureCloudConfig"
I0109 06:10:51.052319 1 acr.go:78] using managed identity for acr credentials
I0109 06:10:51.052349 1 provider.go:285] "azure: using managed identity extension to retrieve access token" id="076f5645-0759-4528-b014-7cf8ac51f90f"
I0109 06:10:51.052359 1 provider.go:292] "azure: using managed identity extension to retrieve access token" id="076f5645-0759-4528-b014-7cf8ac51f90f"
I0109 06:10:51.063450 1 acr.go:184] "discovering auth redirects" url="<registryName>.azurecr.io"
I0109 06:10:51.096246 1 acr.go:190] exchanging an acr refresh_token
I0109 06:10:51.424071 1 round_trippers.go:553] GET https://10.0.0.1:443/api/v1/namespaces/default/serviceaccounts/default 200 OK in 3 milliseconds
E0109 06:10:51.657377 1 main.go:162] "failed to mutate" err="failed to get auto cmd, error: cannot fetch image descriptor: GET https://<registryName>.azurecr.io/oauth2/token?scope=repository%3Aalpine-test%3Apull&service=<registryName>.azurecr.io: UNAUTHORIZED: authentication required, visit https://aka.ms/acr/authorization for more information." pod="default/"
2024/01/09 06:10:51 [ERROR] admission webhook error: failed to get auto cmd, error: cannot fetch image descriptor: GET https://<registryName>.azurecr.io/oauth2/token?scope=repository%3Aalpine-test%3Apull&service=<registryName>.azurecr.io: UNAUTHORIZED: authentication required, visit https://aka.ms/acr/authorization for more information.
Note the two new (as of #631) INFO logs near the bottom, as it helps to track code execution.
using registry... shows execution is in getContainerRegistryRemoteOptions(), before the switch statement
using cloudConfig for registry authentication... shows execution is in the first branch of the switch statement
Additional context
In terms of potential direction for resolution, I'd probably be most in favor of the following:
Adding separate configuration options (probably in the form of a global and overrides just like the Key Vault option) that configures whether MSI should be used for the container registry.
I just bumped into same issue trying pull container images from a third-party registry using imagePullSecrets.
Would be super happy for any pointers, directions, or workarround ideas.
I am also running into this. I am a little confused though. I've got 3 pods all from the same registry in the same namespace using the same pullsecret, but I am only having issues pulling 1 of them.
Note: Make sure to check out known issues (https://akv2k8s.io/troubleshooting/known-issues/) before submitting
Note: Tangentially related to #495, through #631.
Components and versions
[ ]
Controller, version:x.x.x
(docker image tag)[X] Env-Injector (webhook), version:
1.6.0
(docker image tag)[ ]
OtherDescribe the bug
A regression in version 1.6.0 of the Env-Injector prevents the use of
imagePullSecrets
as valid registry authentication when usingauthType="azureCloudConfig"
. More specifically, the logic present in the current 1.6.0 version (covered below in detail) assumes that both the Key Vault and container registry must be using MSIs as the authentication method. My use case involves utilizing a third-party container registry, and not an ACR. As a result, I cannot set up any form of MSI to our container registry, which is why we useimagePullSecrets
. My understanding from reading through the documentation and code from 1.5.0 and below is that this use case should be supported.After investigating, I've found that the changes introduced by #631 break the previous logic that allowed
imagePullSecrets
. The functiongetContainerRegistryRemoteOptions()
inregistry.go
(a) usesauthType
to decide how the container registry authentication should be handled, and (b) assumes that an ACR will be used whenauthType="azureCloudConfig"
:azure-key-vault-to-kubernetes/pkg/docker/registry/registry.go
Lines 123 to 182 in 647b715
For context, the previous approach used a
getImageConfig()
function whose only branch includesimagePullSecrets
(for clarity, this same logic is preserved in thedefault
case statement in 1.6.0):azure-key-vault-to-kubernetes/pkg/docker/registry/registry.go
Lines 112 to 158 in 7141e5f
This seems to be a restrictive approach, for several reasons:
authType
seems to be documented as an option to specify key vault authentication only. I haven't seen this option mention container registries (Chart README.md, Chartenv-injector-deployment.yaml
). Aside from the issues being discussed here, this alone is cause for potential confusion.authType="environment"
) with ACR MSI authentication. The newly added code branch inregistry.go
will never be taken becauseauthType
does not equalazureCloudConfig
.To Reproduce
Important
For ease of testing within the context of this project, these replication steps will include creating an ACR. To be clear, the two use cases (ACR without MSI, and third party registry entirely) fail the same way and for the purposes of this issue can be regarded as equivalent.
Note
I've replaced all the instances of identifiable names with their
<pseudo>
equivalents. Example:<registryName>.azurecr.io
1.26.10
Free
Disabled
Local accounts with Kubernetes RBAC
Azure CNI
,Calico
admin-password
mytoken
_repositories_push_metadata_write
akv2k8s
at application version1.6.0
(Helm chart version2.6.0
)azure-key-vault-env-injection
annotationdefault
, for ease.AzureKeyVaultSecret
in that namespaceadmin-password@azurekeyvault
Expected behavior
Based on the above replication steps, I would expect to see the
env-injector
use theimagePullSecrets
in the deployment specification to pull the image from the registry. Instead, it tries to pull it using invalid/anonymous credentials.As a point of comparison, installing
akv2k8s
at application version1.5.0
(chart version2.5.0
) and reinstalling the deployment will show the intended behavior. Of course, the container is nonfunctional (it crash loops because it's default Alpine), which is beside the point.Logs
Full
ReplicaSet
describe log:env-injector
logs at time of failure:Note the two new (as of #631)
INFO
logs near the bottom, as it helps to track code execution.using registry...
shows execution is ingetContainerRegistryRemoteOptions()
, before theswitch
statementusing cloudConfig for registry authentication...
shows execution is in the first branch of theswitch
statementAdditional context
In terms of potential direction for resolution, I'd probably be most in favor of the following:
azureCloudConfig
); andimagePullSecrets
, regardless of the option's value)imagePullSecrets
, to avoid making this a breaking change. Otherwise, the breaking change should be communicated.The text was updated successfully, but these errors were encountered: