Troubleshooting EKS Managed Node Group Ec2SubnetInvalidConfiguration Error

Amazon Elastic Kubernetes Service (EKS) is a managed Kubernetes service that lets administrators create and run Kubernetes clusters on AWS within minutes using a few commands. In 2019, EKS released API support for Managed Node Groups [1], which automatically create and manage EC2 instances and join them to a Kubernetes cluster. This makes it easier to add and scale the compute nodes a cluster needs, and to upgrade node versions through the API or with one click in the console.

However, creating or upgrading a Managed Node Group can fail with the Ec2SubnetInvalidConfiguration error. This article analyzes the cause of this error, the common scenarios in which it appears, and how to resolve it.

How to identify this issue?

To check whether an EKS Managed Node Group has an Ec2SubnetInvalidConfiguration error, look at its health issues in the EKS Console or with the AWS CLI. In the console, open the cluster's “Compute” tab, select the node group under Node groups to open its detail page, and check the Health Issues tab for error messages:

(Image: Nodegroup create failed)

According to the EKS Managed Node Group upgrade process [2], if a node upgrade or the creation of new nodes is still stuck after 15-20 minutes, the worker nodes have probably run into a problem. After a period of time, the health issues reported for the node group give you a starting point for troubleshooting. The example below uses the AWS CLI:

$ aws eks describe-nodegroup --nodegroup-name broken-nodegroup --cluster-name eks --region eu-west-1
{
    "nodegroup": {
        "nodegroupName": "broken-nodegroup",
        "clusterName": "eks",
        "version": "1.25",
        "releaseVersion": "1.25.9-20230526",
        "status": "CREATE_FAILED",
        "capacityType": "ON_DEMAND",
        "subnets": [
            "subnet-AAAAAAAAAAAAAAAAA",
            "subnet-BBBBBBBBBBBBBBBBB"
        ],
        "amiType": "AL2_x86_64",
        "health": {
            "issues": [
                {
                    "code": "Ec2SubnetInvalidConfiguration",
                    "message": "One or more Amazon EC2 Subnets of [subnet-AAAAAAAAAAAAAAAAA, subnet-BBBBBBBBBBBBBBBBB] for node group broken-nodegroup does not automatically assign public IP addresses to instances launched into it. If you want your instances to be assigned a public IP address, then you need to enable auto-assign public IP address for the subnet. See IP addressing in VPC guide: <https://docs.aws.amazon.com/vpc/latest/userguide/vpc-ip-addressing.html#subnet-public-ip>",
                    "resourceIds": [
                        "subnet-AAAAAAAAAAAAAAAAA",
                        "subnet-BBBBBBBBBBBBBBBBB"
                    ]
                }
            ]
        },
        "updateConfig": {
            "maxUnavailable": 1
        },
        ...
    }
}

In the example above, creating the node group failed, and the node group is in the CREATE_FAILED state.
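When several node groups need checking, the same inspection can be scripted. The following is a minimal Python sketch, assuming the JSON printed by `aws eks describe-nodegroup` has already been parsed into a dictionary. The helper name `subnet_config_issues` and the trimmed sample are illustrative assumptions; only fields visible in the output above are read.

```python
import json


def subnet_config_issues(describe_nodegroup_output):
    """Return (subnet IDs, message) pairs for Ec2SubnetInvalidConfiguration issues.

    `describe_nodegroup_output` is the parsed JSON printed by
    `aws eks describe-nodegroup`. This helper is hypothetical and is not
    part of any AWS SDK.
    """
    nodegroup = describe_nodegroup_output.get("nodegroup", {})
    issues = nodegroup.get("health", {}).get("issues", [])
    return [
        (issue.get("resourceIds", []), issue.get("message", ""))
        for issue in issues
        if issue.get("code") == "Ec2SubnetInvalidConfiguration"
    ]


# Trimmed sample shaped like the CLI output shown earlier (message shortened).
nodegroup_sample = json.loads("""
{
  "nodegroup": {
    "nodegroupName": "broken-nodegroup",
    "status": "CREATE_FAILED",
    "health": {
      "issues": [
        {
          "code": "Ec2SubnetInvalidConfiguration",
          "message": "does not automatically assign public IP addresses",
          "resourceIds": ["subnet-AAAAAAAAAAAAAAAAA", "subnet-BBBBBBBBBBBBBBBBB"]
        }
      ]
    }
  }
}
""")

for subnet_ids, message in subnet_config_issues(nodegroup_sample):
    print(subnet_ids, message)
```

An empty result means the node group has no issue with this code, although other health issues may still be present.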

Common scenarios and causes

According to the definition of this error [3], it is usually caused by a subnet specified for the EKS Managed Node Group that does not have auto-assign public IP address enabled.

By default, when Managed Node Groups create EC2 instances, they rely on the subnet's own setting to assign public IP addresses. If the subnet does not have auto-assign public IP address enabled, the EC2 instances do not obtain public IPv4 addresses and cannot communicate with the Internet directly, which leads to this error. This gives rise to two usage scenarios: Public Subnet and Private Subnet.

Public Subnet and Private Subnet are two kinds of subnets in Amazon Virtual Private Cloud (VPC). Resources in a Public Subnet can communicate directly with the public Internet, while resources in a Private Subnet cannot:

(Image: Subnet diagram)

(Image Source: Amazon Virtual Private Cloud User Guide [4])

In short, if a subnet's route table has a default route (0.0.0.0/0) pointing to an Internet Gateway, the subnet can usually be considered able to connect directly to the Internet. Otherwise, the subnet is expected to be a restricted, private network environment.
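This rule of thumb can be expressed as a small check. The sketch below assumes a route table entry parsed from the output of `aws ec2 describe-route-tables`; the function name and the sample IDs (`igw-EXAMPLE`, `nat-EXAMPLE`) are hypothetical. It looks only at the IPv4 default route and does not handle IPv6 (`::/0`) routes.

```python
def has_internet_gateway_default_route(route_table):
    """Return True if the route table sends 0.0.0.0/0 to an Internet Gateway.

    `route_table` is one entry of "RouteTables" in the parsed JSON from
    `aws ec2 describe-route-tables`.
    """
    for route in route_table.get("Routes", []):
        is_default = route.get("DestinationCidrBlock") == "0.0.0.0/0"
        # Internet Gateway IDs start with "igw-"; the VPC-local route uses "local".
        via_igw = route.get("GatewayId", "").startswith("igw-")
        if is_default and via_igw:
            return True
    return False


# Illustrative route tables with placeholder IDs.
public_table = {
    "Routes": [
        {"DestinationCidrBlock": "192.168.0.0/16", "GatewayId": "local"},
        {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-EXAMPLE"},
    ]
}
private_table = {
    "Routes": [
        {"DestinationCidrBlock": "192.168.0.0/16", "GatewayId": "local"},
        {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-EXAMPLE"},
    ]
}

print(has_internet_gateway_default_route(public_table))   # True
print(has_internet_gateway_default_route(private_table))  # False
```

Note that a subnet without an explicit route table association uses the VPC's main route table, so that is the table to inspect in that case.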

Encountering this error in Public Subnet

Previously, EKS enabled auto-assign public IPv4 addresses for every node group created through Managed Node Groups, regardless of whether it was in a Public Subnet or a Private Subnet. This behavior was changed in 2020 [5]. Therefore, if you expect the nodes in your EKS Managed Node Groups to be assigned public IPv4 addresses and to connect directly to the Internet, make sure that the Public Subnet you use to create or upgrade the Managed Node Group has auto-assign public IPv4 address (MapPublicIpOnLaunch) enabled:

$ aws ec2 describe-subnets --subnet-ids subnet-XXXXXXXXXXX
{
    "Subnets": [
        {
            "CidrBlock": "192.168.64.0/19",
            "MapPublicIpOnLaunch": true,
            "State": "available",
            "SubnetId": "subnet-XXXXXXXXXXX",
            "VpcId": "vpc-XXXXXXXXXXX",
            ...
        }
    ]
}
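To check several subnets at once, the MapPublicIpOnLaunch flag can be read from the parsed output of `aws ec2 describe-subnets`. This is a minimal sketch; the function name and the sample data are assumptions for illustration.

```python
def subnets_without_auto_assign(describe_subnets_output):
    """Return IDs of subnets whose MapPublicIpOnLaunch is not enabled.

    `describe_subnets_output` is the parsed JSON printed by
    `aws ec2 describe-subnets`.
    """
    return [
        subnet["SubnetId"]
        for subnet in describe_subnets_output.get("Subnets", [])
        if not subnet.get("MapPublicIpOnLaunch", False)
    ]


# Illustrative input with placeholder subnet IDs.
subnets_sample = {
    "Subnets": [
        {"SubnetId": "subnet-AAAAAAAAAAAAAAAAA", "MapPublicIpOnLaunch": True},
        {"SubnetId": "subnet-BBBBBBBBBBBBBBBBB", "MapPublicIpOnLaunch": False},
    ]
}

print(subnets_without_auto_assign(subnets_sample))  # ['subnet-BBBBBBBBBBBBBBBBB']
```

Any subnet ID returned here is a candidate for the error if the node group is meant to run in a Public Subnet.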

Encountering this error in Private Subnet

You may say, “I intended this subnet to be a Private Subnet, and everything worked when I first created the node group.” Why, then, would EKS Managed Node Groups report during runtime or an upgrade that the subnet specified for the node group does not automatically assign public IPv4 addresses?

For Amazon EKS, the principles for VPC subnets mentioned earlier still apply [6], so in many cases the cause lies in the route table settings associated with the subnet:

  • A public subnet is a subnet with a route table that includes a route to an Internet gateway, whereas a private subnet is a subnet with a route table that doesn’t include a route to an Internet gateway.

For example, the following shows a node group that entered the DEGRADED state because of a subnet configuration issue after running for a period of time, together with the error message:

(Image: Nodegroup degraded)

Solutions and Steps

Public Subnet

To solve this problem when you deploy Managed Node Groups in a Public Subnet, make sure the subnet has auto-assign public IP address enabled. Follow these steps:

  1. Log in to the AWS Console and go to the VPC Management Console.
  2. Select the Subnet you want to use, then choose Edit Subnet Settings.
  3. Check “Enable auto-assign public IPv4 address” and click Save.

(Image: Enable auto assign settings)
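The same setting can also be changed programmatically. The sketch below assumes a client object that behaves like `boto3.client("ec2")` and calls its `modify_subnet_attribute` operation. The helper name and the `RecordingClient` stand-in are hypothetical; the stand-in exists only so the example runs without AWS credentials.

```python
def enable_auto_assign_public_ip(ec2_client, subnet_ids):
    """Enable MapPublicIpOnLaunch on each subnet and return the IDs processed.

    `ec2_client` is assumed to behave like boto3.client("ec2").
    """
    updated = []
    for subnet_id in subnet_ids:
        ec2_client.modify_subnet_attribute(
            SubnetId=subnet_id,
            MapPublicIpOnLaunch={"Value": True},
        )
        updated.append(subnet_id)
    return updated


class RecordingClient:
    """Stand-in client that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def modify_subnet_attribute(self, **kwargs):
        self.calls.append(kwargs)


client = RecordingClient()
print(enable_auto_assign_public_ip(client, ["subnet-AAAAAAAAAAAAAAAAA"]))
print(client.calls)
```

Against a real account, the stand-in would be replaced with `boto3.client("ec2")`, which requires credentials that are allowed to modify subnet attributes.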

After completing these steps, if the node group is still in a failed state, you can delete the failed Managed Node Group and create a new one that uses the subnet above. When EKS launches the EC2 instances, they obtain public IP addresses according to the subnet settings.

If the problem you encounter is not caused by a subnet without auto-assign public IP address enabled, please refer to the error information [3].

Private Subnet

The earlier section described how this error can occur for a subnet that is expected to be private. This usually happens because, when EKS launches instances into or checks the subnet used by the Managed Node Group, it treats the subnet as a Public Subnet because its route table has a route pointing to an Internet Gateway, rather than as the expected Private Subnet. The result is an error like the following:

(Image: Incorrect private subnet route)

To resolve this issue, configure the Route Table associated with the Private Subnet correctly: keep VPC-internal traffic inside the VPC, and route non-VPC traffic to a target other than the Internet Gateway (for example, replace the Internet Gateway target of the 0.0.0.0/0 route with a NAT Gateway). A NAT Gateway lets instances inside the VPC initiate outbound connections to the Internet while preventing connections initiated from outside, so the worker nodes can still download images from external sources (such as Docker Hub) and interact with the EKS API Server when starting up.
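As a rough illustration of that route change, the sketch below repoints the 0.0.0.0/0 route of a route table at a NAT Gateway. It assumes a client that behaves like `boto3.client("ec2")` and uses its `replace_route` operation. The helper name, the `RouteRecorder` stand-in, and the IDs are hypothetical placeholders.

```python
DEFAULT_IPV4_ROUTE = "0.0.0.0/0"


def point_default_route_at_nat(ec2_client, route_table_id, nat_gateway_id):
    """Replace the target of the 0.0.0.0/0 route with a NAT Gateway.

    `ec2_client` is assumed to behave like boto3.client("ec2"). The route
    table must already contain a 0.0.0.0/0 route for a replace to apply.
    """
    request = {
        "RouteTableId": route_table_id,
        "DestinationCidrBlock": DEFAULT_IPV4_ROUTE,
        "NatGatewayId": nat_gateway_id,
    }
    ec2_client.replace_route(**request)
    return request


class RouteRecorder:
    """Stand-in client that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def replace_route(self, **kwargs):
        self.calls.append(kwargs)


recorder = RouteRecorder()
print(point_default_route_at_nat(recorder, "rtb-EXAMPLE", "nat-EXAMPLE"))
```

The NAT Gateway itself needs to sit in a subnet that does have a route to the Internet Gateway, otherwise the private subnet loses outbound connectivity after the change.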

Summary

In this article, we covered the common scenarios and causes of the Ec2SubnetInvalidConfiguration error when creating or upgrading an EKS Managed Node Group. For Public Subnets, make sure the Subnet has the “Auto-assign Public IP address” feature enabled. For Private Subnets, configure the associated Route Table so that VPC-internal traffic stays inside the VPC and non-VPC traffic is routed to another target, such as a NAT Gateway.

We hope these solutions and steps help you troubleshoot and resolve this error more effectively and keep your Managed Node Groups running smoothly.

Reference Resources

Eason Cao
Eason is an engineer working at FANNG and living in Europe. He was accredited as AWS Professional Solution Architect, AWS Professional DevOps Engineer and CNCF Certified Kubernetes Administrator. He started his Kubernetes journey in 2017 and enjoys solving real-world business problems.