AWS Instance Metadata Service v2 can cause outages. Make sure you are safe!

Introduction

AWS recently made IMDSv2 the default for EC2 instances (see "Amazon EC2 Instance Metadata Service IMDSv2 by default", Amazon Web Services). The change was meant to improve security, but it caused real trouble for many AWS customers whose Auto Scaling groups relied on older launch templates and whose scripts called the metadata service the IMDSv1 way.

In this post, I’ll walk you through what the Instance Metadata Service (IMDS) is, why AWS made this change, and most importantly, how you can avoid running into the same issues.

What is the Instance Metadata Service (IMDS)?

Let’s start with the basics. The Instance Metadata Service, or IMDS, is available to every EC2 instance. It provides important details about the instance, like its ID, IP address, and the IAM role credentials assigned to it. This information is served from a link-local address (169.254.169.254) that is only reachable from inside the instance, and it’s crucial for many tasks, from identifying the instance itself to fetching data that applications running on it might need.

For example, if you wanted to get the instance’s ID, you could simply run a command like this:

INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
echo "$INSTANCE_ID"

And just like that, you’d have the instance ID ready to use in your scripts or applications.

The Change from IMDSv1 to IMDSv2

So, what changed? In mid-2024, AWS made IMDSv2 the default for newly launched EC2 instances. IMDSv2 itself is not new (AWS introduced it in 2019), but it adds an extra step for security: before you can read any metadata, you must first obtain a session token.

Here’s how it works now with IMDSv2:

  1. Get a session token:
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
  2. Use the token to get the instance ID:
INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-id)

While this may seem like just a small change, it has significant security implications.
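To see which mode an instance is enforcing, one option is to send a request without a token and look at the HTTP status code. The sketch below assumes that a 200 means IMDSv1-style requests are still accepted and a 401 means a session token is required. The function name imds_probe and the 2-second timeout are illustrative choices, not part of any AWS tooling.

```shell
# Hypothetical helper: probe the metadata service WITHOUT a token and
# classify the HTTP status code that comes back.
imds_probe() {
  local code
  # -o /dev/null discards the body; -w prints only the status code.
  # If curl itself fails (timeout, no route), fall back to "000".
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 \
    http://169.254.169.254/latest/meta-data/instance-id)" || code="000"

  case "$code" in
    200) echo "imdsv1-allowed" ;;   # token-less requests still work
    401) echo "token-required" ;;   # instance requires IMDSv2 tokens
    *)   echo "unknown ($code)" ;;  # disabled, unreachable, or not on EC2
  esac
}
```

Running imds_probe on an instance and getting token-required is a sign that any remaining IMDSv1-style calls on that instance will fail.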

How can this change cause an outage?

Here’s where things go wrong. If your existing code was written for IMDSv1, it does not request a session token. Once an instance requires IMDSv2, the call that was supposed to fetch the instance ID stops working: instead of the instance ID, the variable ends up empty. Worse, the failure is silent. Without the --fail option, curl exits with status 0 even when the server returns an HTTP error, so your script carries on as if nothing had gone wrong.
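One way to turn this failure mode into a loud one is to make curl fail on HTTP errors and to validate the value before using it. The following is a minimal sketch: the function names are hypothetical, the 2-second timeout is an arbitrary choice, and the check assumes instance IDs look like "i-" followed by 8 or 17 lowercase hex characters.

```shell
# Hypothetical helper: succeed only if the argument looks like an instance ID
# (assumed format: "i-" plus 8 or 17 lowercase hex characters).
is_valid_instance_id() {
  printf '%s\n' "$1" | grep -Eq '^i-([0-9a-f]{8}|[0-9a-f]{17})$'
}

# Hypothetical wrapper around an IMDSv1-style call.
# --fail makes curl exit non-zero on HTTP errors such as 401.
fetch_instance_id_v1() {
  curl -s --fail --max-time 2 \
    http://169.254.169.254/latest/meta-data/instance-id
}

# Print the instance ID, or return non-zero with a message on stderr.
require_instance_id() {
  local id
  id="$(fetch_instance_id_v1)" || {
    echo "metadata request failed" >&2
    return 1
  }
  is_valid_instance_id "$id" || {
    echo "unexpected instance ID: '$id'" >&2
    return 1
  }
  printf '%s\n' "$id"
}
```

A startup script could then use INSTANCE_ID="$(require_instance_id)" || exit 1, so a rejected request stops the script instead of letting it continue with a bad value.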

How did our client see an outage?

Our client’s instance startup script called the metadata service, and the instance ID it returned was a critical input. Without it, the script couldn’t complete successfully. As a result, new instances in the Auto Scaling group couldn’t start up properly, which meant that the whole system couldn’t scale as needed. This led to downtime and a lot of stress as we scrambled to figure out what had gone wrong.

The Benefits of IMDSv2

Now, you might be wondering why AWS made this change in the first place, especially if it can cause such issues. The answer lies in security.

  • Stronger Security: IMDSv2 makes it much harder for attackers to exploit the metadata service. By requiring a session token, it adds a barrier that protects sensitive information like IAM role credentials from unauthorized access.
  • Lower Risk of Attacks: One common vulnerability is SSRF (Server-Side Request Forgery), where an attacker tricks a server into making a request on their behalf. IMDSv2’s token requirement helps mitigate this risk, making it harder for attackers to misuse the service.
  • Meeting Security Standards: With security standards constantly evolving, adopting IMDSv2 ensures that your infrastructure aligns with the latest best practices, helping you meet compliance requirements.
  • Preparation for Future Updates: By adopting IMDSv2, you’re also preparing your systems for future AWS updates, reducing the risk of running into unexpected issues down the line.

How to Update Your Code for IMDSv2

With IMDSv2, accessing instance metadata is slightly more involved. But don’t worry: it’s a simple adjustment once you know what to do. Let’s look at a before-and-after example to make it clear.

Before (Using IMDSv1):

In IMDSv1, fetching the instance ID was straightforward. Here’s how you might have done it:

INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
echo "$INSTANCE_ID"

This command would directly return the instance ID without any additional steps.

After (Using IMDSv2):

With IMDSv2, you need to add one more step: obtaining a session token. Here’s the updated process:

# Step 1: Get a session token (valid for 21600 seconds)
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
# Step 2: Use the token to get the instance ID
INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-id)
echo "$INSTANCE_ID"

In this updated version, you first request a session token with a specific time-to-live (in this case, 6 hours). You then use this token to securely access the instance metadata, including the instance ID.

By adding this token, you’re boosting the security of your instance’s metadata, making it much harder for unauthorized processes or attackers to access this sensitive information.
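If several scripts need metadata, the two steps can be wrapped in a small helper so the token handling lives in one place. This is a sketch rather than an official client: imds_token, imds_get, and the IMDS_BASE variable are names made up for this example, and the 2-second timeout is an assumption.

```shell
# Base URL of the metadata service (variable name is illustrative).
IMDS_BASE="http://169.254.169.254/latest"

# Hypothetical helper: request a session token.
# Optional first argument: TTL in seconds (defaults to 21600, i.e. 6 hours).
imds_token() {
  curl -s --fail --max-time 2 -X PUT "$IMDS_BASE/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: ${1:-21600}"
}

# Hypothetical helper: fetch one metadata path, e.g. "imds_get instance-id".
# Returns non-zero if either the token request or the metadata request fails.
imds_get() {
  local path token
  path="$1"
  token="$(imds_token)" || {
    echo "could not obtain IMDSv2 token" >&2
    return 1
  }
  [ -n "$token" ] || {
    echo "IMDSv2 token was empty" >&2
    return 1
  }
  curl -s --fail --max-time 2 -H "X-aws-ec2-metadata-token: $token" "$IMDS_BASE/meta-data/$path"
}
```

With this in place, INSTANCE_ID="$(imds_get instance-id)" || exit 1 replaces both steps. The helper requests a fresh token on every call, which keeps it simple. A script that reads many values could fetch the token once and reuse it until it expires.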

Steps to Update Your Code:

  1. Audit Your Scripts: Review any scripts that use IMDSv1 and identify where metadata is being accessed.
  2. Implement the IMDSv2 Process: Replace old IMDSv1 commands with the two-step IMDSv2 process as shown above.
  3. Test Thoroughly: Run your updated scripts in a staging environment to ensure they work correctly before deploying them in production.
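For the audit step, a simple text search can give a first list of candidates. The sketch below is a heuristic, not a complete scanner: it looks for lines that mention the metadata address but carry no token header on the same line, so it can miss or misreport calls that are split across lines or built from variables. The function name find_imdsv1_calls is made up for this example.

```shell
# Hypothetical helper: list lines under a directory (default: current one)
# that reference the metadata address without a token header on the same line.
find_imdsv1_calls() {
  grep -rn '169\.254\.169\.254' "${1:-.}" \
    | grep -vi 'X-aws-ec2-metadata-token' \
    | grep -v '/latest/api/token' || true   # no matches is not an error
}
```

Calling find_imdsv1_calls /path/to/scripts prints file:line:content for each suspicious line, which you can then review by hand.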

This small change will go a long way in ensuring your infrastructure is secure and compliant with AWS’s latest standards.

Lessons Learned

Here are a few takeaways from this experience:

  • Stay Updated: Even small changes from cloud providers like AWS can have a big impact. It’s important to stay informed about these updates and how they might affect your infrastructure.
  • Test in a Staging Environment: Always test updates in a non-production environment first. This can save you from unexpected outages and help you catch issues early.
  • Be Proactive: Regularly review and update your code to ensure it’s following the latest best practices. This way, you’ll be better prepared for any future changes.

Conclusion

In summary, while the switch to IMDSv2 caused some headaches, it’s ultimately a positive change that improves security and aligns with modern best practices. By updating your scripts and being proactive about these kinds of changes, you can avoid the pitfalls and keep your AWS environment running smoothly.

If you haven’t already, take the time to review your current setup and make sure you’re ready for IMDSv2. It’s a small effort that can make a big difference in keeping your infrastructure secure and reliable.