diff --git a/support/azure/azure-kubernetes/errorcodes/VMExtensionError_CNIDownloadTimeout.md b/support/azure/azure-kubernetes/errorcodes/VMExtensionError_CNIDownloadTimeout.md new file mode 100644 index 0000000000..e5e62081b2 --- /dev/null +++ b/support/azure/azure-kubernetes/errorcodes/VMExtensionError_CNIDownloadTimeout.md @@ -0,0 +1,53 @@ +--- +title: Troubleshoot the VMExtensionError_CNIDownloadTimeout error code +description: Learn how to troubleshoot the VMExtensionError_CNIDownloadTimeout error when you try to create and deploy an Azure Kubernetes Service (AKS) cluster. +ms.date: 05/03/2023 +editor: v-jsitser +ms.reviewer: axelg, chiragpa, v-leedennis +ms.service: azure-kubernetes-service +#Customer intent: As an Azure Kubernetes user, I want to troubleshoot the VMExtensionError_CNIDownloadTimeout error code so that I can successfully create and deploy an Azure Kubernetes Service (AKS) cluster. +ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool) +--- +# Troubleshoot the VMExtensionError_CNIDownloadTimeout error code + +This article discusses how to identify and resolve the `VMExtensionError_CNIDownloadTimeout` error (also known as error code `ERR_CNI_DOWNLOAD_TIMEOUT`, error number 41) that occurs when you try to create and deploy a Microsoft Azure Kubernetes Service (AKS) cluster. + +## Prerequisites + +- The [Curl](https://curl.se/download.html) command-line tool + +## Symptoms + +When you try to create an AKS cluster, you receive the following error message: + +> **Code**: VMExtensionError_CNIDownloadTimeout +> +> **Message**: Agents are unable to connect to the endpoint that's used to download the container network interface libraries. It's likely that a network virtual appliance is blocking SSL communication or an SSL certificate, please see https://aka.ms/aks-error/VMExtensionError_CNIDownloadTimeout for more information +> +> +> **Details**
+>  **Code**: VMExtensionProvisioningError
+>  **Message**: VM has reported a failure when processing extension 'vmssCSE' (publisher 'Microsoft.Azure.Extensions' and type 'CustomScript'). Error message: 'Enable failed: failed to execute command: command terminated with exit status=41\n[stdout]... + + +## Cause + +Your cluster nodes cannot connect to the endpoint that's used to download the container network interface (CNI) libraries. In most cases, this issue occurs because a network virtual appliance is blocking Secure Sockets Layer (SSL) communication or an SSL certificate. + +## Solution + +Run Curl commands to verify that your nodes can download the binaries: + +```bash +curl https://acs-mirror.azureedge.net/cni/azure-vnet-cni-linux-amd64-v1.0.25.tgz + +curl --fail --ssl https://acs-mirror.azureedge.net/cni/azure-vnet-cni-linux-amd64-v1.0.25.tgz --output /opt/cni/downloads/azure-vnet-cni-linux-amd64-v1.0.25.tgz +``` + +If you cannot download these files, make sure that traffic is allowed to the downloading endpoint. For more information, see [Azure Global required FQDN / application rules](/azure/aks/outbound-rules-control-egress#azure-global-required-fqdn--application-rules). + +## References + +- [General troubleshooting of AKS cluster creation issues](troubleshoot-aks-cluster-creation-issues.md) + +[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)] diff --git a/support/azure/azure-kubernetes/errorcodes/VMExtensionError_K8SAPIServerConnFail.md b/support/azure/azure-kubernetes/errorcodes/VMExtensionError_K8SAPIServerConnFail.md new file mode 100644 index 0000000000..53277bd395 --- /dev/null +++ b/support/azure/azure-kubernetes/errorcodes/VMExtensionError_K8SAPIServerConnFail.md @@ -0,0 +1,51 @@ +--- +title: VMExtensionError_K8SAPIServerConnFail error code +description: Learn how to troubleshoot the VMExtensionError_K8SAPIServerConnFail error when you try to start or create and deploy an Azure Kubernetes Service (AKS) cluster. 
+ms.date: 01/24/2024 +ms.reviewer: rissing, chiragpa, erbookbi, v-leedennis, jovieir +ms.service: azure-kubernetes-service +#Customer intent: As an Azure Kubernetes user, I want to troubleshoot the VMExtensionError_K8SAPIServerConnFail error code so that I can successfully start or create and deploy an Azure Kubernetes Service (AKS) cluster. +ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool) +--- +# Troubleshoot the VMExtensionError_K8SAPIServerConnFail error code + +This article discusses how to identify and resolve the `VMExtensionError_K8SAPIServerConnFail` error (also known as error code `ERR_K8S_API_SERVER_CONN_FAIL`, error number 51) that occurs when you try to start or create and deploy a Microsoft Azure Kubernetes Service (AKS) cluster. + +## Prerequisites + +- The [Netcat](https://linuxcommandlibrary.com/man/netcat) (nc) command-line tool + +## Symptoms + +When you try to start or create an AKS cluster, you receive the following error message: + +> **Code**: VMExtensionError_K8SAPIServerConnFail +> +> **Message**: Unable to establish connection from agents to Kubernetes API server. Please see https://aka.ms/aks-error/VMExtensionError_K8SAPIServerConnFail and https://aka.ms/aks-required-ports-and-addresses for more information. +> +> +> **Details**
+>  **Code**: VMExtensionProvisioningError
+>  **Message**: VM has reported a failure when processing extension 'vmssCSE' (publisher 'Microsoft.Azure.Extensions' and type 'CustomScript'). Error message: 'Enable failed: failed to execute command: command terminated with exit status=51\n[stdout]... + +## Cause + +Your cluster nodes cannot connect to your cluster API server pod. + +## Solution + +Run a Netcat command to verify that your nodes can resolve the cluster's fully qualified domain name (FQDN) and connect to it on port 443: + +```shell +nc -vz <cluster-fqdn> 443 +``` + +If you're using egress filtering through a firewall, make sure that traffic is allowed to your cluster FQDN. + +In rare cases, the firewall's outbound IP address can be blocked if authorized IP address ranges are enabled on your cluster. In this scenario, you must add the outbound IP address of your firewall to the list of authorized IP ranges for the cluster. For more information, see [Secure access to the API server using authorized IP address ranges in AKS](/azure/aks/api-server-authorized-ip-ranges). + +## More information + +- [General troubleshooting of AKS cluster creation issues](troubleshoot-aks-cluster-creation-issues.md) + +[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)] diff --git a/support/azure/azure-kubernetes/errorcodes/VMExtensionError_K8SAPIServerDNSLookupFail.md b/support/azure/azure-kubernetes/errorcodes/VMExtensionError_K8SAPIServerDNSLookupFail.md new file mode 100644 index 0000000000..3204c3552a --- /dev/null +++ b/support/azure/azure-kubernetes/errorcodes/VMExtensionError_K8SAPIServerDNSLookupFail.md @@ -0,0 +1,68 @@ +--- +title: Troubleshoot the VMExtensionError_K8SAPIServerDNSLookupFail error code +description: Learn how to troubleshoot the VMExtensionError_K8SAPIServerDNSLookupFail error when you try to start or create and deploy an Azure Kubernetes Service (AKS) cluster. 
+ms.date: 01/24/2024 +ms.reviewer: rissing, chiragpa, erbookbi, v-leedennis, jovieir +ms.service: azure-kubernetes-service +#Customer intent: As an Azure Kubernetes user, I want to troubleshoot the VMExtensionError_K8SAPIServerDNSLookupFail error code so that I can successfully start or create and deploy an Azure Kubernetes Service (AKS) cluster. +ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool) +--- +# Troubleshoot the VMExtensionError_K8SAPIServerDNSLookupFail error code + +This article discusses how to identify and resolve the `VMExtensionError_K8SAPIServerDNSLookupFail` error (also known as error code `ERR_K8S_API_SERVER_DNS_LOOKUP_FAIL`, error number 52) that occurs when you try to start or create and deploy a Microsoft Azure Kubernetes Service (AKS) cluster. + +## Prerequisites + +- The [nslookup](/windows-server/administration/windows-commands/nslookup) DNS lookup tool for Windows nodes or the [dig](https://linuxize.com/post/how-to-use-dig-command-to-query-dns-in-linux/) tool for Linux nodes. + +- [Azure CLI](/cli/azure/install-azure-cli), version 2.0.59 or a later version. If Azure CLI is already installed, you can find the version number by running `az --version`. + +## Symptoms + +When you try to start or create an AKS cluster, you receive the following error message: + +> **Code**: VMExtensionError_K8SAPIServerDNSLookupFail +> +> **Message**: Agents are unable to resolve Kubernetes API server name. It's likely custom DNS server is not correctly configured, please see https://aka.ms/aks-error/VMExtensionError_K8SAPIServerDNSLookupFail and https://aka.ms/aks/private-cluster#hub-and-spoke-with-custom-dns for more information. +> +> +> **Details**
+>  **Code**: VMExtensionProvisioningError
+>  **Message**: VM has reported a failure when processing extension 'vmssCSE' (publisher 'Microsoft.Azure.Extensions' and type 'CustomScript'). Error message: 'Enable failed: failed to execute command: command terminated with exit status=52\n[stdout]... + +## Cause + +The cluster nodes cannot resolve the fully qualified domain name (FQDN) of the cluster in Azure DNS. Run the following DNS lookup command on the failed cluster node to check whether the FQDN resolves to a valid address. + +| Node OS | Command | +| ------- | ------------------------- | +| Linux | `dig <cluster-fqdn>` | +| Windows | `nslookup <cluster-fqdn>` | + +## Solution + +On your DNS servers and firewall, make sure that nothing blocks the resolution of your cluster's FQDN. If resolution still fails after you run the `nslookup` or `dig` command and apply any necessary fixes, your custom DNS server might be incorrectly configured. For help configuring your custom DNS server, review the following articles: + +- [Create a private AKS cluster](/azure/aks/private-clusters) +- [Private Azure Kubernetes service with custom DNS server](https://github.com/Azure/terraform/tree/00d15e09c54f25fb6387330c36aa4366122c5aaa/quickstart/301-aks-private-cluster) +- [What is IP address 168.63.129.16?](/azure/virtual-network/what-is-ip-address-168-63-129-16) + +When you use a private cluster that has a custom DNS, a DNS zone is created. The DNS zone must be linked to the virtual network, but this link is made only after the cluster is created. As a result, creating a private cluster that has a custom DNS fails during creation. However, you can restore the creation process to a "success" state by reconciling the cluster. 
To do this, run the [az resource update](/cli/azure/resource#az-resource-update) command in Azure CLI, as follows: + +```azurecli-interactive +az resource update --resource-group <resource-group-name> \ + --name <cluster-name> \ + --namespace Microsoft.ContainerService \ + --resource-type ManagedClusters +``` + +Also verify that your DNS server is configured correctly for your private cluster, as described earlier. + +> [!NOTE] +> Conditional Forwarding doesn't support subdomains. + +## More information + +- [General troubleshooting of AKS cluster creation issues](troubleshoot-aks-cluster-creation-issues.md) + +[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)] diff --git a/support/azure/azure-kubernetes/errorcodes/VMExtensionError_OutboundConnFail.md b/support/azure/azure-kubernetes/errorcodes/VMExtensionError_OutboundConnFail.md new file mode 100644 index 0000000000..514ad95b4a --- /dev/null +++ b/support/azure/azure-kubernetes/errorcodes/VMExtensionError_OutboundConnFail.md @@ -0,0 +1,132 @@ +--- +title: Troubleshoot the VMExtensionError_OutboundConnFail error code +description: Learn how to troubleshoot the VMExtensionError_OutboundConnFail error (50) when you try to start or create and deploy an Azure Kubernetes Service (AKS) cluster. +ms.date: 01/24/2024 +ms.reviewer: rissing, chiragpa, v-leedennis, jovieir +ms.service: azure-kubernetes-service +ms.custom: sap:Create, Upgrade, Scale and Delete operations (cluster or nodepool) +--- +# Troubleshoot the VMExtensionError_OutboundConnFail error code + +This article describes how to identify and resolve the `VMExtensionError_OutboundConnFail` error (also known as error code `ERR_OUTBOUND_CONN_FAIL`, error number 50) that might occur if you try to start or create and deploy a Microsoft Azure Kubernetes Service (AKS) cluster. 
+ +## Prerequisites + +- The [Netcat](https://linuxcommandlibrary.com/man/netcat) (nc) command-line tool + +- The [dig](https://linux.die.net/man/1/dig) command-line tool + +- The Client URL ([cURL](https://curl.se/download.html)) tool + +## Symptoms + +When you try to start or create an AKS cluster, you receive the following error message: + +> **Code**: VMExtensionError_OutboundConnFail +> +> **Message**: Unable to establish outbound connection from agents, please see https://aka.ms/aks-error/VMExtensionError_OutboundConnFail and https://aka.ms/aks-required-ports-and-addresses for more information. +> +> +> **Details**
+>  **Code**: VMExtensionProvisioningError
+>  **Message**: VM has reported a failure when processing extension 'vmssCSE' (publisher 'Microsoft.Azure.Extensions' and type 'CustomScript'). Error message: 'Enable failed: failed to execute command: command terminated with exit status=50\n[stdout]... + +## Cause + +The custom script extension that downloads the necessary components to provision the nodes could not establish the necessary outbound connectivity to obtain packages. For public clusters, the nodes try to communicate with the Microsoft Container Registry (MCR) endpoint (`mcr.microsoft.com`) on port 443. + +There are many reasons why the traffic might be blocked. In any of these situations, the best way to test connectivity is to use the Secure Shell protocol (SSH) to connect to the node. To make the connection, follow the instructions in [Connect to Azure Kubernetes Service (AKS) cluster nodes for maintenance or troubleshooting](/azure/aks/node-access). Then, test the connectivity on the cluster by following these steps: + +1. After you connect to the node, run the `nc` and `dig` commands: + + ```bash + nc -vz mcr.microsoft.com 443 + dig mcr.microsoft.com + ``` + + > [!NOTE] + > If you can't access the node through SSH, you can test the outbound connectivity by running the [az vmss run-command invoke](/cli/azure/vmss/run-command#az-vmss-run-command-invoke) command against the Virtual Machine Scale Set instance: + > + > ```azurecli + > # Get the VMSS instance IDs. + > az vmss list-instances --resource-group <mc-resource-group-name> \ + > --name <vmss-name> \ + > --output table + > + > # Use an instance ID to test outbound connectivity. + > az vmss run-command invoke --resource-group <mc-resource-group-name> \ + > --name <vmss-name> \ + > --command-id RunShellScript \ + > --instance-id <vmss-instance-id> \ + > --output json \ + > --scripts "nc -vz mcr.microsoft.com 443" + > ``` + +1. If you try to create an AKS cluster by using an HTTP proxy, run the `nc`, `curl`, and `dig` commands after you connect to the node: + + ```bash + # Test connectivity to the HTTP proxy server from the AKS node. 
+ nc -vz <proxy-address> <proxy-port> + + # Test HTTPS traffic through the HTTP proxy server. + curl --proxy http://<proxy-address>:<proxy-port>/ --head https://mcr.microsoft.com + + # Test HTTPS traffic through the HTTPS proxy server. + curl --proxy https://<proxy-address>:<proxy-port>/ --head https://mcr.microsoft.com + + # Test DNS functionality. + dig mcr.microsoft.com + ``` + + > [!NOTE] + > If you can't access the node through SSH, you can test the outbound connectivity by running the `az vmss run-command invoke` command against the Virtual Machine Scale Set instance: + > + > ```azurecli + > # Get the VMSS instance IDs. + > az vmss list-instances --resource-group <mc-resource-group-name> \ + > --name <vmss-name> \ + > --output table + > + > # Use an instance ID to test HTTPS connectivity through the HTTP proxy server. + > az vmss run-command invoke --resource-group <mc-resource-group-name> \ + > --name <vmss-name> \ + > --command-id RunShellScript \ + > --instance-id <vmss-instance-id> \ + > --output json \ + > --scripts "curl --proxy http://<proxy-address>:<proxy-port>/ --head https://mcr.microsoft.com" + > + > # Use an instance ID to test HTTPS connectivity through the HTTPS proxy server. + > az vmss run-command invoke --resource-group <mc-resource-group-name> \ + > --name <vmss-name> \ + > --command-id RunShellScript \ + > --instance-id <vmss-instance-id> \ + > --output json \ + > --scripts "curl --proxy https://<proxy-address>:<proxy-port>/ --head https://mcr.microsoft.com" + > + > # Use an instance ID to test DNS functionality. + > az vmss run-command invoke --resource-group <mc-resource-group-name> \ + > --name <vmss-name> \ + > --command-id RunShellScript \ + > --instance-id <vmss-instance-id> \ + > --output json \ + > --scripts "dig mcr.microsoft.com" + > ``` + +## Solution + +The following table lists specific reasons why traffic might be blocked, and the corresponding solution for each reason. + +| Issue | Solution | +| ----- | -------- | +| Traffic is blocked by firewall rules or a proxy server | In this scenario, a firewall or a proxy server does egress filtering. To verify that all required domains and ports are allowed, see [Control egress traffic for cluster nodes in Azure Kubernetes Service (AKS)](/azure/aks/limit-egress-traffic). 
| +| Traffic is blocked by a cluster network security group (NSG) | On any NSGs that are attached to your cluster, verify that there's no blocking on port 443, port 53, or any other port that might have to be used to connect to the endpoint. For more information, see [Control egress traffic for cluster nodes in Azure Kubernetes Service (AKS)](/azure/aks/limit-egress-traffic). | +| The AAAA (IPv6) record is blocked on the firewall | On your firewall, verify that nothing exists that would block the endpoint from resolving in Azure DNS. | +| Private cluster can't resolve internal Azure resources | In private clusters, the Azure DNS IP address (`168.63.129.16`) must be added as an upstream DNS server if custom DNS is used. Verify that the address is set on your DNS servers. For more information, see [Create a private AKS cluster](/azure/aks/private-clusters) and [What is IP address 168.63.129.16?](/azure/virtual-network/what-is-ip-address-168-63-129-16) | + +## More information + +- [General troubleshooting of AKS cluster creation issues](troubleshoot-aks-cluster-creation-issues.md) + +[!INCLUDE [Third-party disclaimer](../../../includes/third-party-contact-disclaimer.md)] + +[!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)] diff --git a/support/azure/azure-kubernetes/toc.yml b/support/azure/azure-kubernetes/toc.yml index b9ca13d9ee..1684f37dc9 100644 --- a/support/azure/azure-kubernetes/toc.yml +++ b/support/azure/azure-kubernetes/toc.yml @@ -365,3 +365,13 @@ href: extensions/troubleshoot-kubernetes-event-driven-autoscaling-add-on.md - name: Breaking changes in KEDA add-on 2.15 and 2.14 href: extensions/changes-in-kubernetes-event-driven-autoscaling-add-on-214-215.md + - name: Troubleshoot by Error codes + items: + - name: VMExtensionError_CNIDownloadTimeout + href: error-codes/VMExtensionError_CNIDownloadTimeout.md + - name: VMExtensionError_OutboundConnFail + href: error-codes/VMExtensionError_OutboundConnFail.md + - name: 
VMExtensionError_K8SAPIServerConnFail + href: error-codes/VMExtensionError_K8SAPIServerConnFail.md + - name: VMExtensionError_K8SAPIServerDNSLookupFail + href: error-codes/VMExtensionError_K8SAPIServerDNSLookupFail.md \ No newline at end of file