Updated AKS error codes and created a new category just for error codes #1729

Open
wants to merge 5 commits into
base: main
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
---
title: Troubleshoot the VMExtensionError_CNIDownloadTimeout error code
description: Learn how to troubleshoot the VMExtensionError_CNIDownloadTimeout error when you try to create and deploy an Azure Kubernetes Service (AKS) cluster.
ms.date: 05/03/2023
editor: v-jsitser
ms.reviewer: axelg, chiragpa, v-leedennis
ms.service: azure-kubernetes-service
#Customer intent: As an Azure Kubernetes user, I want to troubleshoot the VMExtensionError_CNIDownloadTimeout error code so that I can successfully create and deploy an Azure Kubernetes Service (AKS) cluster.
ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)
---
# Troubleshoot the VMExtensionError_CNIDownloadTimeout error code

This article discusses how to identify and resolve the `VMExtensionError_CNIDownloadTimeout` error (also known as error code `ERR_CNI_DOWNLOAD_TIMEOUT`, error number 41) that occurs when you try to create and deploy a Microsoft Azure Kubernetes Service (AKS) cluster.

## Prerequisites

- The [Curl](https://curl.se/download.html) command-line tool

## Symptoms

When you try to create an AKS cluster, you receive the following error message:

> **Code**: VMExtensionError_CNIDownloadTimeout
>
> **Message**: Agents are unable to connect to the endpoint that's used to download the container network interface libraries. It's likely that a network virtual appliance is blocking SSL communication or an SSL certificate, please see https://aka.ms/aks-error/VMExtensionError_CNIDownloadTimeout for more information
>
>
> **Details** <br/>
> &ensp;**Code**: VMExtensionProvisioningError<br/>
> &ensp;**Message**: VM has reported a failure when processing extension 'vmssCSE' (publisher 'Microsoft.Azure.Extensions' and type 'CustomScript'). Error message: 'Enable failed: failed to execute command: command terminated with exit status=41\n[stdout]...


## Cause

Your cluster nodes can't connect to the endpoint that's used to download the container network interface (CNI) libraries. In most cases, this issue occurs because a network virtual appliance is blocking Secure Sockets Layer (SSL) communication or an SSL certificate.

## Solution

Run Curl commands to verify that your nodes can download the binaries:

```bash
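# Check that the node can reach the endpoint (the downloaded file is discarded).
curl --fail --silent --show-error --output /dev/null https://acs-mirror.azureedge.net/cni/azure-vnet-cni-linux-amd64-v1.0.25.tgz

# Download the package to the /opt/cni/downloads directory.
curl --fail --ssl https://acs-mirror.azureedge.net/cni/azure-vnet-cni-linux-amd64-v1.0.25.tgz --output /opt/cni/downloads/azure-vnet-cni-linux-amd64-v1.0.25.tgz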
```

If you can't download these files, make sure that traffic is allowed to the download endpoint. For more information, see [Azure Global required FQDN / application rules](/azure/aks/outbound-rules-control-egress#azure-global-required-fqdn--application-rules).
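When the download fails, the exit code that curl returns can hint at where the traffic is being stopped. The following sketch is illustrative only: `explain_curl_exit` and `check_cni_download` are hypothetical helper names (they aren't part of curl or AKS), and the hint text is an assumption. The exit codes themselves are the ones that curl documents.

```bash
# Hypothetical helper (not part of curl or AKS): map a curl exit code to a likely cause.
# The exit codes are documented by curl; the hint text is illustrative only.
explain_curl_exit() {
  case "$1" in
    0)  echo "OK: endpoint reachable" ;;
    6)  echo "DNS: host name could not be resolved" ;;
    7)  echo "CONNECT: TCP connection failed" ;;
    22) echo "HTTP: server returned status 400 or higher" ;;
    28) echo "TIMEOUT: operation timed out" ;;
    35) echo "TLS: SSL/TLS handshake failed" ;;
    60) echo "CERT: server certificate could not be verified" ;;
    *)  echo "OTHER: curl exit code $1" ;;
  esac
}

# Hypothetical helper: request the file, discard the body, and explain the result.
check_cni_download() {
  local url="$1" rc=0
  curl --fail --silent --show-error --max-time 30 --output /dev/null "$url" || rc=$?
  explain_curl_exit "$rc"
  return "$rc"
}

# Example (run on the node):
# check_cni_download https://acs-mirror.azureedge.net/cni/azure-vnet-cni-linux-amd64-v1.0.25.tgz
```

Under these assumptions, a `TLS` or `CERT` result would be consistent with a network virtual appliance that intercepts SSL traffic, whereas `DNS`, `CONNECT`, or `TIMEOUT` would more likely point to a DNS or egress filtering problem.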

## More information

- [General troubleshooting of AKS cluster creation issues](troubleshoot-aks-cluster-creation-issues.md)

[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]
@@ -0,0 +1,51 @@
---
title: Troubleshoot the VMExtensionError_K8SAPIServerConnFail error code
description: Learn how to troubleshoot the VMExtensionError_K8SAPIServerConnFail error when you try to start or create and deploy an Azure Kubernetes Service (AKS) cluster.
ms.date: 01/24/2024
ms.reviewer: rissing, chiragpa, erbookbi, v-leedennis, jovieir
ms.service: azure-kubernetes-service
#Customer intent: As an Azure Kubernetes user, I want to troubleshoot the VMExtensionError_K8SAPIServerConnFail error code so that I can successfully start or create and deploy an Azure Kubernetes Service (AKS) cluster.
ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)
---
# Troubleshoot the VMExtensionError_K8SAPIServerConnFail error code

This article discusses how to identify and resolve the `VMExtensionError_K8SAPIServerConnFail` error (also known as error code `ERR_K8S_API_SERVER_CONN_FAIL`, error number 51) that occurs when you try to start or create and deploy a Microsoft Azure Kubernetes Service (AKS) cluster.

## Prerequisites

- The [Netcat](https://linuxcommandlibrary.com/man/netcat) (nc) command-line tool

## Symptoms

When you try to start or create an AKS cluster, you receive the following error message:

> **Code**: VMExtensionError_K8SAPIServerConnFail
>
> **Message**: Unable to establish connection from agents to Kubernetes API server. Please see https://aka.ms/aks-error/VMExtensionError_K8SAPIServerConnFail and https://aka.ms/aks-required-ports-and-addresses for more information.
>
>
> **Details** <br/>
> &ensp;**Code**: VMExtensionProvisioningError<br/>
> &ensp;**Message**: VM has reported a failure when processing extension 'vmssCSE' (publisher 'Microsoft.Azure.Extensions' and type 'CustomScript'). Error message: 'Enable failed: failed to execute command: command terminated with exit status=51\n[stdout]...

## Cause

Your cluster nodes can't connect to your cluster's API server pod.

## Solution

Run a Netcat command to verify that your nodes can resolve the cluster's fully qualified domain name (FQDN) and connect to it on port 443:

```shell
nc -vz <cluster-fqdn> 443
```
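If the connection is silently dropped, `nc` can wait a long time before it gives up. The following sketch adds a timeout and a one-line result. The `check_api_server` name is a hypothetical helper, not an AKS tool, and the five-second timeout is an arbitrary assumption.

```bash
# Hypothetical helper: check_api_server <cluster-fqdn> [port]
# -z: connect without sending data; -v: verbose; -w 5: give up after 5 seconds.
check_api_server() {
  local fqdn="$1" port="${2:-443}"
  if nc -vz -w 5 "$fqdn" "$port"; then
    echo "reachable: ${fqdn}:${port}"
  else
    echo "unreachable: ${fqdn}:${port}"
    return 1
  fi
}

# Example (run on the node). You can look up the FQDN from a workstation by running:
#   az aks show --resource-group <resource-group-name> --name <cluster-name> --query fqdn --output tsv
# check_api_server <cluster-fqdn>
```

For a private cluster, the `privateFqdn` property is probably the one to query instead of `fqdn`.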

If you're using egress filtering through a firewall, make sure that traffic is allowed to your cluster FQDN.

In rare cases, the firewall's outbound IP address can be blocked if API server authorized IP ranges are enabled on your cluster. In this scenario, you must add the outbound IP address of your firewall to the list of authorized IP ranges for the cluster. For more information, see [Secure access to the API server using authorized IP address ranges in AKS](/azure/aks/api-server-authorized-ip-ranges).
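The `--api-server-authorized-ip-ranges` parameter of `az aks update` sets the complete list instead of appending to it, so the existing ranges have to be passed again together with the firewall address. The following sketch shows one way to build that list. The `merge_ip_ranges` name is a hypothetical helper, and the placeholders in the commented usage are assumptions that you would replace with your own values.

```bash
# Hypothetical helper: merge_ip_ranges "<existing ranges>" <new-range>
# Accepts existing ranges separated by newlines, commas, tabs, or spaces.
# Prints one comma-separated list that contains each range only once.
merge_ip_ranges() {
  printf '%s\n%s\n' "$1" "$2" \
    | tr ',\t ' '\n\n\n' \
    | awk 'NF && !seen[$0]++' \
    | paste -sd, -
}

# Example usage (placeholders are assumptions):
# existing="$(az aks show --resource-group <resource-group-name> --name <cluster-name> \
#   --query apiServerAccessProfile.authorizedIpRanges --output tsv)"
# az aks update --resource-group <resource-group-name> --name <cluster-name> \
#   --api-server-authorized-ip-ranges "$(merge_ip_ranges "$existing" <firewall-outbound-ip>/32)"
```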

## More information

- [General troubleshooting of AKS cluster creation issues](troubleshoot-aks-cluster-creation-issues.md)

[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]
@@ -0,0 +1,68 @@
---
title: Troubleshoot the VMExtensionError_K8SAPIServerDNSLookupFail error code
description: Learn how to troubleshoot the VMExtensionError_K8SAPIServerDNSLookupFail error when you try to start or create and deploy an Azure Kubernetes Service (AKS) cluster.
ms.date: 01/24/2024
ms.reviewer: rissing, chiragpa, erbookbi, v-leedennis, jovieir
ms.service: azure-kubernetes-service
#Customer intent: As an Azure Kubernetes user, I want to troubleshoot the VMExtensionError_K8SAPIServerDNSLookupFail error code so that I can successfully start or create and deploy an Azure Kubernetes Service (AKS) cluster.
ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)
---
# Troubleshoot the VMExtensionError_K8SAPIServerDNSLookupFail error code

This article discusses how to identify and resolve the `VMExtensionError_K8SAPIServerDNSLookupFail` error (also known as error code `ERR_K8S_API_SERVER_DNS_LOOKUP_FAIL`, error number 52) that occurs when you try to start or create and deploy a Microsoft Azure Kubernetes Service (AKS) cluster.

## Prerequisites

- The [nslookup](/windows-server/administration/windows-commands/nslookup) DNS lookup tool for Windows nodes or the [dig](https://linuxize.com/post/how-to-use-dig-command-to-query-dns-in-linux/) tool for Linux nodes.

- [Azure CLI](/cli/azure/install-azure-cli), version 2.0.59 or a later version. If Azure CLI is already installed, you can find the version number by running `az --version`.

## Symptoms

When you try to start or create an AKS cluster, you receive the following error message:

> **Code**: VMExtensionError_K8SAPIServerDNSLookupFail
>
> **Message**: Agents are unable to resolve Kubernetes API server name. It's likely custom DNS server is not correctly configured, please see https://aka.ms/aks-error/VMExtensionError_K8SAPIServerDNSLookupFail and https://aka.ms/aks/private-cluster#hub-and-spoke-with-custom-dns for more information.
>
>
> **Details** <br/>
> &ensp;**Code**: VMExtensionProvisioningError<br/>
> &ensp;**Message**: VM has reported a failure when processing extension 'vmssCSE' (publisher 'Microsoft.Azure.Extensions' and type 'CustomScript'). Error message: 'Enable failed: failed to execute command: command terminated with exit status=52\n[stdout]...

## Cause

The cluster nodes can't resolve the fully qualified domain name (FQDN) of the cluster in Azure DNS. To check whether the FQDN resolves to a valid address, run the following DNS lookup command on the failed cluster node.

| Node OS | Command |
| ------- | ------------------------- |
| Linux | `dig <cluster-fqdn>` |
| Windows | `nslookup <cluster-fqdn>` |

## Solution

On your DNS servers and firewall, make sure that nothing blocks the resolution of your cluster's FQDN. If resolution still fails after you run the `nslookup` or `dig` command and apply any necessary fixes, your custom DNS server might be configured incorrectly. For help in configuring your custom DNS server, review the following articles:

- [Create a private AKS cluster](/azure/aks/private-clusters)
- [Private Azure Kubernetes service with custom DNS server](https://github.com/Azure/terraform/tree/00d15e09c54f25fb6387330c36aa4366122c5aaa/quickstart/301-aks-private-cluster)
- [What is IP address 168.63.129.16?](/azure/virtual-network/what-is-ip-address-168-63-129-16)

When you use a private cluster that has a custom DNS server, a DNS zone is created. The DNS zone must be linked to the virtual network, but this linking occurs only after the cluster is created. As a result, the initial creation of a private cluster that has a custom DNS server fails. However, you can restore the cluster to a "success" state by reconciling it. To do this, run the [az resource update](/cli/azure/resource#az-resource-update) command in Azure CLI, as follows:

```azurecli-interactive
az resource update --resource-group <resource-group-name> \
--name <cluster-name> \
--namespace Microsoft.ContainerService \
--resource-type ManagedClusters
```

Also verify that your DNS server is configured correctly for your private cluster, as described earlier.

> [!NOTE]
> Conditional forwarding doesn't support subdomains.
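Before or after you reconcile the cluster, you might want to confirm that the DNS zone is linked to the virtual network that you expect. The following sketch uses `az network private-dns link vnet list` for that check. The `zone_is_linked` name is a hypothetical helper, and the sketch assumes that the zone is an Azure private DNS zone whose resource group and name you already know (for AKS-managed zones, this is typically the node resource group).

```bash
# Hypothetical helper: zone_is_linked <resource-group> <private-dns-zone-name> <vnet-resource-id>
# Succeeds if one of the zone's virtual network links points to the given virtual network.
# The comparison ignores case because resource IDs can differ in casing.
zone_is_linked() {
  az network private-dns link vnet list \
    --resource-group "$1" \
    --zone-name "$2" \
    --query "[].virtualNetwork.id" \
    --output tsv | grep -i -q -x -F "$3"
}

# Example usage (placeholders are assumptions):
# if zone_is_linked <resource-group-name> <private-dns-zone-name> <vnet-resource-id>; then
#   echo "linked"
# else
#   echo "not linked"
# fi
```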

## More information

- [General troubleshooting of AKS cluster creation issues](troubleshoot-aks-cluster-creation-issues.md)

[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]
@@ -0,0 +1,132 @@
---
title: Troubleshoot the VMExtensionError_OutboundConnFail error code
description: Learn how to troubleshoot the VMExtensionError_OutboundConnFail error (50) when you try to start or create and deploy an Azure Kubernetes Service (AKS) cluster.
ms.date: 01/24/2024
ms.reviewer: rissing, chiragpa, v-leedennis, jovieir
ms.service: azure-kubernetes-service
ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool)
---
# Troubleshoot the VMExtensionError_OutboundConnFail error code

This article describes how to identify and resolve the `VMExtensionError_OutboundConnFail` error (also known as error code `ERR_OUTBOUND_CONN_FAIL`, error number 50) that might occur if you try to start or create and deploy a Microsoft Azure Kubernetes Service (AKS) cluster.

## Prerequisites

- The [Netcat](https://linuxcommandlibrary.com/man/netcat) (nc) command-line tool

- The [dig](https://linux.die.net/man/1/dig) command-line tool

- The Client URL ([cURL](https://curl.se/download.html)) tool

## Symptoms

When you try to start or create an AKS cluster, you receive the following error message:

> **Code**: VMExtensionError_OutboundConnFail
>
> **Message**: Unable to establish outbound connection from agents, please see https://aka.ms/aks-error/VMExtensionError_OutboundConnFail and https://aka.ms/aks-required-ports-and-addresses for more information.
>
>
> **Details** <br/>
> &ensp;**Code**: VMExtensionProvisioningError<br/>
> &ensp;**Message**: VM has reported a failure when processing extension 'vmssCSE' (publisher 'Microsoft.Azure.Extensions' and type 'CustomScript'). Error message: 'Enable failed: failed to execute command: command terminated with exit status=50\n[stdout]...

## Cause

The custom script extension that downloads the necessary components to provision the nodes couldn't establish the outbound connectivity that's required to obtain packages. For public clusters, the nodes try to communicate with the Microsoft Container Registry (MCR) endpoint (`mcr.microsoft.com`) on port 443.

There are many reasons why the traffic might be blocked. In any of these situations, the best way to test connectivity is to use the Secure Shell protocol (SSH) to connect to the node. To make the connection, follow the instructions in [Connect to Azure Kubernetes Service (AKS) cluster nodes for maintenance or troubleshooting](/azure/aks/node-access). Then, test the connectivity on the cluster by following these steps:

1. After you connect to the node, run the `nc` and `dig` commands:

```bash
nc -vz mcr.microsoft.com 443
dig mcr.microsoft.com
```

> [!NOTE]
> If you can't access the node through SSH, you can test the outbound connectivity by running the [az vmss run-command invoke](/cli/azure/vmss/run-command#az-vmss-run-command-invoke) command against the Virtual Machine Scale Set instance:
>
> ```azurecli
> # Get the VMSS instance IDs.
> az vmss list-instances --resource-group <mc-resource-group-name> \
> --name <vmss-name> \
> --output table
>
> # Use an instance ID to test outbound connectivity.
> az vmss run-command invoke --resource-group <mc-resource-group-name> \
> --name <vmss-name> \
> --command-id RunShellScript \
> --instance-id <vmss-instance-id> \
> --output json \
> --scripts "nc -vz mcr.microsoft.com 443"
> ```

1. If you try to create an AKS cluster by using an HTTP proxy, run the `nc`, `curl`, and `dig` commands after you connect to the node:

```bash
# Test connectivity to the HTTP proxy server from the AKS node.
nc -vz <http-s-proxy-address> <port>

# Test HTTPS traffic through the HTTP proxy server.
curl --proxy http://<http-proxy-address>:<port>/ --head https://mcr.microsoft.com

# Test HTTPS traffic through the HTTPS proxy server.
curl --proxy https://<https-proxy-address>:<port>/ --head https://mcr.microsoft.com

# Test DNS functionality.
dig mcr.microsoft.com
```

> [!NOTE]
> If you can't access the node through SSH, you can test the outbound connectivity by running the `az vmss run-command invoke` command against the Virtual Machine Scale Set instance:
>
> ```azurecli
> # Get the VMSS instance IDs.
> az vmss list-instances --resource-group <mc-resource-group-name> \
> --name <vmss-name> \
> --output table
>
> # Use an instance ID to test HTTPS traffic through the HTTP proxy server.
> az vmss run-command invoke --resource-group <mc-resource-group-name> \
> --name <vmss-name> \
> --command-id RunShellScript \
> --instance-id <vmss-instance-id> \
> --output json \
> --scripts "curl --proxy http://<http-proxy-address>:<port>/ --head https://mcr.microsoft.com"
>
> # Use an instance ID to test HTTPS traffic through the HTTPS proxy server.
> az vmss run-command invoke --resource-group <mc-resource-group-name> \
> --name <vmss-name> \
> --command-id RunShellScript \
> --instance-id <vmss-instance-id> \
> --output json \
> --scripts "curl --proxy https://<https-proxy-address>:<port>/ --head https://mcr.microsoft.com"
>
> # Use an instance ID to test DNS functionality.
> az vmss run-command invoke --resource-group <mc-resource-group-name> \
> --name <vmss-name> \
> --command-id RunShellScript \
> --instance-id <vmss-instance-id> \
> --output json \
> --scripts "dig mcr.microsoft.com"
> ```
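The DNS and TCP checks above can be combined so that the first failing stage is reported. The following sketch is one way to do that on a node. The `check_endpoint` name is a hypothetical helper, the five-second timeout is an arbitrary assumption, and the return codes (1 for DNS, 2 for TCP) are a convention chosen for this example only.

```bash
# Hypothetical helper: check_endpoint <host> [port]
# Returns 1 if DNS gives no answer, 2 if DNS works but the TCP port is unreachable.
check_endpoint() {
  local host="$1" port="${2:-443}" answer
  # Lines that start with ";" are dig status comments (for example, timeouts), not answers.
  answer="$(dig +short "$host" 2>/dev/null | grep -v '^;' || true)"
  if [ -z "$answer" ]; then
    echo "${host}: DNS lookup returned no records"
    return 1
  fi
  # -z: connect without sending data; -w 5: give up after 5 seconds.
  if ! nc -z -w 5 "$host" "$port" 2>/dev/null; then
    echo "${host}: DNS OK, but TCP port ${port} is unreachable"
    return 2
  fi
  echo "${host}: DNS and TCP port ${port} OK"
}

# Example (run on the node):
# check_endpoint mcr.microsoft.com 443
```

A DNS failure would suggest looking at the custom DNS or firewall DNS rules first, whereas a TCP failure would suggest looking at firewall, proxy, or NSG rules for port 443.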

## Solution

The following table lists specific reasons why traffic might be blocked, and the corresponding solution for each reason.

| Issue | Solution |
| ----- | -------- |
| Traffic is blocked by firewall rules or a proxy server | In this scenario, a firewall or a proxy server does egress filtering. To verify that all required domains and ports are allowed, see [Control egress traffic for cluster nodes in Azure Kubernetes Service (AKS)](/azure/aks/limit-egress-traffic). |
| Traffic is blocked by a cluster network security group (NSG) | On any NSGs that are attached to your cluster, verify that there's no blocking on port 443, port 53, or any other port that might have to be used to connect to the endpoint. For more information, see [Control egress traffic for cluster nodes in Azure Kubernetes Service (AKS)](/azure/aks/limit-egress-traffic). |
| The AAAA (IPv6) record is blocked on the firewall | On your firewall, verify that nothing exists that would block the endpoint from resolving in Azure DNS. |
| Private cluster can't resolve internal Azure resources | In private clusters, the Azure DNS IP address (`168.63.129.16`) must be added as an upstream DNS server if custom DNS is used. Verify that the address is set on your DNS servers. For more information, see [Create a private AKS cluster](/azure/aks/private-clusters) and [What is IP address 168.63.129.16?](/azure/virtual-network/what-is-ip-address-168-63-129-16) |

## More information

- [General troubleshooting of AKS cluster creation issues](troubleshoot-aks-cluster-creation-issues.md)

[!INCLUDE [Third-party disclaimer](../../../includes/third-party-contact-disclaimer.md)]

[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]
10 changes: 10 additions & 0 deletions support/azure/azure-kubernetes/toc.yml
@@ -365,3 +365,13 @@
href: extensions/troubleshoot-kubernetes-event-driven-autoscaling-add-on.md
- name: Breaking changes in KEDA add-on 2.15 and 2.14
href: extensions/changes-in-kubernetes-event-driven-autoscaling-add-on-214-215.md
- name: Troubleshoot by error code
items:
- name: VMExtensionError_CNIDownloadTimeout
href: error-codes/VMExtensionError_CNIDownloadTimeout.md
- name: VMExtensionError_OutboundConnFail
href: error-codes/VMExtensionError_OutboundConnFail.md
- name: VMExtensionError_K8SAPIServerConnFail
href: error-codes/VMExtensionError_K8SAPIServerConnFail.md
- name: VMExtensionError_K8SAPIServerDNSLookupFail
href: error-codes/VMExtensionError_K8SAPIServerDNSLookupFail.md