Running out of IP addresses in your subnet can bring your infrastructure to a halt. When your applications can't get new IP addresses, deployments fail, auto-scaling breaks, and new services won't start.
This problem hits hardest during traffic spikes or when teams deploy multiple services simultaneously. By the time you notice the issue, it is often too late: your applications are already failing.
AWS Lambda offers a simple solution to monitor subnet IP availability and alert you before you hit the limit. This guide shows you how to build an automated monitoring system that tracks available IPs and sends alerts when capacity drops below your threshold.
Architecture Overview
graph TD
A[CloudWatch Events] --> B[Lambda Function]
B --> C[EC2 API]
C --> D[Subnet Analysis]
D --> E{IPs Below Threshold?}
E -->|Yes| F[SNS Topic]
E -->|No| G[CloudWatch Metrics]
F --> H[Email/Slack Alert]
G --> I[Dashboard]
Implementation
Step 1: Create the Lambda Function
Create a new Lambda function with the following configuration:
- Runtime: Python 3.9+
- Memory: 128 MB
- Timeout: 30 seconds
import json
import os

import boto3


def lambda_handler(event, context):
    ec2 = boto3.client('ec2')
    sns = boto3.client('sns')
    cloudwatch = boto3.client('cloudwatch')

    # Configuration from environment variables
    subnet_ids = os.environ['SUBNET_IDS'].split(',')
    threshold = int(os.environ['THRESHOLD'])
    sns_topic_arn = os.environ['SNS_TOPIC_ARN']

    for subnet_id in subnet_ids:
        subnet_id = subnet_id.strip()
        available_ips = get_available_ips(ec2, subnet_id)

        # Skip subnets that could not be read, so a failed lookup
        # is not reported as "0 IPs available"
        if available_ips is None:
            continue

        # Send metrics to CloudWatch
        send_cloudwatch_metric(cloudwatch, subnet_id, available_ips)

        # Check threshold and alert if needed
        if available_ips < threshold:
            send_alert(sns, sns_topic_arn, subnet_id, available_ips, threshold)

    return {
        'statusCode': 200,
        'body': json.dumps('IP monitoring completed successfully')
    }


def get_available_ips(ec2, subnet_id):
    try:
        response = ec2.describe_subnets(SubnetIds=[subnet_id])
        subnet = response['Subnets'][0]

        # EC2 reports the number of unused private IPv4 addresses directly.
        # Counting network interfaces instead would undercount usage,
        # because a single interface can hold several private IPs.
        return subnet['AvailableIpAddressCount']
    except Exception as e:
        print(f"Error processing subnet {subnet_id}: {str(e)}")
        return None


def send_cloudwatch_metric(cloudwatch, subnet_id, available_ips):
    try:
        cloudwatch.put_metric_data(
            # Custom namespaces must not start with "AWS/" (reserved for
            # AWS services); dashboards and alarms must use this same value
            Namespace='Custom/Subnet',
            MetricData=[
                {
                    'MetricName': 'AvailableIPs',
                    'Dimensions': [
                        {
                            'Name': 'SubnetId',
                            'Value': subnet_id
                        }
                    ],
                    'Value': available_ips,
                    'Unit': 'Count'
                }
            ]
        )
    except Exception as e:
        print(f"Error sending CloudWatch metric: {str(e)}")


def send_alert(sns, topic_arn, subnet_id, available_ips, threshold):
    try:
        # Build the message line by line so code indentation
        # does not leak into the notification text
        message = (
            f"ALERT: Low IP availability in subnet {subnet_id}\n"
            f"Available IPs: {available_ips}\n"
            f"Threshold: {threshold}\n"
            "Action Required: Consider expanding subnet or cleaning up unused resources."
        )
        sns.publish(
            TopicArn=topic_arn,
            Message=message,
            Subject=f'Low IP Alert - Subnet {subnet_id}'
        )
    except Exception as e:
        print(f"Error sending SNS alert: {str(e)}")
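An absolute threshold such as 10 addresses means very different things in a /28 and in a /20. As a rough sketch, the helper below turns an available-address count into a utilization percentage using only the standard library. The names usable_ips and percent_used are illustrative and are not part of the function above; the subtraction of five reflects the addresses AWS reserves in every subnet.

```python
import ipaddress

AWS_RESERVED_IPS = 5  # AWS reserves five addresses in every subnet


def usable_ips(cidr_block):
    """Return how many addresses in an IPv4 CIDR block can be assigned."""
    network = ipaddress.ip_network(cidr_block, strict=True)
    return max(0, network.num_addresses - AWS_RESERVED_IPS)


def percent_used(cidr_block, available_ips):
    """Return the share of usable addresses already taken, as a percentage."""
    total = usable_ips(cidr_block)
    if total == 0:
        return 100.0
    return round(100.0 * (total - available_ips) / total, 1)
```

A percentage like this could be compared against a relative threshold (for example, alert above 90% used) instead of, or in addition to, the fixed THRESHOLD value.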
Step 2: Configure Environment Variables
Set these environment variables in your Lambda function:
SUBNET_IDS=subnet-12345,subnet-67890 # Comma-separated list
THRESHOLD=10 # Alert when below this number
SNS_TOPIC_ARN=arn:aws:sns:region:account:topic-name
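A missing or malformed variable otherwise surfaces as a KeyError or ValueError in the middle of a run. The sketch below shows one way to validate all three settings up front. MonitorConfig and load_config are hypothetical names introduced for illustration; the function takes any mapping, so it can be exercised without a Lambda environment.

```python
from typing import Mapping, NamedTuple


class MonitorConfig(NamedTuple):
    subnet_ids: list
    threshold: int
    sns_topic_arn: str


def load_config(env: Mapping[str, str]) -> MonitorConfig:
    """Parse and validate the three settings; raise ValueError on bad input."""
    required = ("SUBNET_IDS", "THRESHOLD", "SNS_TOPIC_ARN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")

    # Tolerate stray spaces and trailing commas in the list
    subnet_ids = [s.strip() for s in env["SUBNET_IDS"].split(",") if s.strip()]
    if not subnet_ids:
        raise ValueError("SUBNET_IDS contains no subnet IDs")

    threshold = int(env["THRESHOLD"])  # raises ValueError if not an integer
    if threshold < 0:
        raise ValueError("THRESHOLD must not be negative")

    return MonitorConfig(subnet_ids, threshold, env["SNS_TOPIC_ARN"])
```

Inside the handler this would be called as load_config(os.environ).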
Step 3: Create IAM Role
Your Lambda function needs these permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeSubnets",
        "ec2:DescribeNetworkInterfaces"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "sns:Publish"
      ],
      "Resource": "arn:aws:sns:*:*:*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:PutMetricData"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
Step 4: Set Up CloudWatch Events
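Create an Amazon EventBridge (formerly CloudWatch Events) rule to run your Lambda function periodically, then grant the rule permission to invoke the function. Without that permission, the rule fires but the function never runs:
aws events put-rule \
  --name subnet-ip-monitor \
  --schedule-expression "rate(15 minutes)"

aws events put-targets \
  --rule subnet-ip-monitor \
  --targets "Id"="1","Arn"="arn:aws:lambda:region:account:function:subnet-monitor"

aws lambda add-permission \
  --function-name subnet-monitor \
  --statement-id subnet-ip-monitor-invoke \
  --action lambda:InvokeFunction \
  --principal events.amazonaws.com \
  --source-arn arn:aws:events:region:account:rule/subnet-ip-monitor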
Step 5: Create SNS Topic for Alerts
aws sns create-topic --name subnet-ip-alerts

aws sns subscribe \
  --topic-arn arn:aws:sns:region:account:subnet-ip-alerts \
  --protocol email \
  --notification-endpoint your-email@company.com
Enhanced Monitoring with CloudWatch Dashboard
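Create a dashboard to visualize IP usage across subnets. The namespace in each metric must match the Namespace the Lambda function passes to put_metric_data; custom namespaces cannot begin with AWS/, which is reserved for AWS services:
{
  "widgets": [
    {
      "type": "metric",
      "properties": {
        "metrics": [
          ["Custom/Subnet", "AvailableIPs", "SubnetId", "subnet-12345"],
          [".", ".", ".", "subnet-67890"]
        ],
        "period": 300,
        "stat": "Average",
        "region": "us-east-1",
        "title": "Available IPs by Subnet"
      }
    }
  ]
}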
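Hand-editing the metrics list becomes tedious as subnets are added. As a minimal sketch, the dashboard body can be generated from a list of subnet IDs. build_dashboard_body is a hypothetical helper, and the namespace is passed in as an argument because it has to be whatever value the Lambda function actually publishes to.

```python
import json


def build_dashboard_body(subnet_ids, region, namespace):
    """Return a dashboard body (JSON string) with one line per subnet."""
    metrics = [
        [namespace, "AvailableIPs", "SubnetId", subnet_id]
        for subnet_id in subnet_ids
    ]
    body = {
        "widgets": [
            {
                "type": "metric",
                "properties": {
                    "metrics": metrics,
                    "period": 300,
                    "stat": "Average",
                    "region": region,
                    "title": "Available IPs by Subnet",
                },
            }
        ]
    }
    return json.dumps(body, indent=2)
```

The returned string is intended to be used as the DashboardBody argument of the CloudWatch put_dashboard call in boto3.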
Advanced Features
Multi-Region Support
Extend the function to monitor subnets across regions:
import boto3


def monitor_all_regions():
    regions = ['us-east-1', 'us-west-2', 'eu-west-1']
    for region in regions:
        # Subnet IDs are regional, so each region needs its own client
        ec2 = boto3.client('ec2', region_name=region)
        # Process subnets in this region
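Because a subnet ID only exists in one region, the function also needs to know which subnets belong to which region. One possible approach, sketched below, is a JSON mapping of region to subnet IDs. The REGION_SUBNETS variable mentioned in the usage note, and the helpers parse_region_subnets and monitor_regions, are assumptions made for illustration. The client factory and the per-subnet check are passed in, so the loop can be tested without AWS access.

```python
import json


def parse_region_subnets(raw):
    """Parse a JSON object mapping region name -> list of subnet IDs."""
    mapping = json.loads(raw)
    if not isinstance(mapping, dict):
        raise ValueError("Expected a JSON object of region -> subnet list")
    return {
        region: [subnet_id.strip() for subnet_id in subnet_ids]
        for region, subnet_ids in mapping.items()
    }


def monitor_regions(region_subnets, client_factory, check_subnet):
    """Run check_subnet(ec2_client, subnet_id) for every subnet.

    One client is created per region and reused for that region's subnets.
    Returns a dict of subnet ID -> check result.
    """
    results = {}
    for region, subnet_ids in region_subnets.items():
        ec2 = client_factory(region)
        for subnet_id in subnet_ids:
            results[subnet_id] = check_subnet(ec2, subnet_id)
    return results
```

In the Lambda function this might be wired up as monitor_regions(parse_region_subnets(os.environ['REGION_SUBNETS']), lambda region: boto3.client('ec2', region_name=region), get_available_ips).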
Slack Integration
Replace SNS with direct Slack notifications:
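# requests is not bundled with the Lambda Python runtime;
# package it with the function or add it as a layer
import requests


def send_slack_alert(webhook_url, subnet_id, available_ips):
    payload = {
        "text": f"🚨 Low IP Alert: Subnet {subnet_id} has only {available_ips} IPs available"
    }
    # Fail fast instead of hanging until the Lambda timeout
    response = requests.post(webhook_url, json=payload, timeout=5)
    response.raise_for_status()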
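If packaging a third-party library is not desirable, the same webhook call can be sketched with the standard library alone. The function names below are illustrative, and the payload mirrors the single text field used above. Building the request is kept separate from sending it so the payload can be inspected without making a network call.

```python
import json
import urllib.request


def build_slack_request(webhook_url, subnet_id, available_ips):
    """Build (but do not send) the POST request for a Slack incoming webhook."""
    payload = {
        "text": f"Low IP Alert: Subnet {subnet_id} has only {available_ips} IPs available"
    }
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_alert_stdlib(webhook_url, subnet_id, available_ips, timeout=5):
    """Send the alert and return the HTTP status code."""
    request = build_slack_request(webhook_url, subnet_id, available_ips)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```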
Cost Optimization
This monitoring solution costs approximately:
- Lambda executions: $0.20/month (96 runs/day × 30 days)
- CloudWatch metrics: $0.30/month per subnet
- SNS notifications: $0.50/month per 1000 messages
Total monthly cost for monitoring 5 subnets: ~$2.20 ($0.20 + 5 × $0.30 + $0.50, assuming about 1,000 alert messages)
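The estimate can be reproduced for any number of subnets by summing the line items. The default values in this sketch are the article's own figures, not current AWS list prices, so treat them as placeholders to replace with the rates for your region.

```python
def runs_per_month(interval_minutes=15, days=30):
    """Number of scheduled invocations in a month at a fixed interval."""
    return (24 * 60 // interval_minutes) * days


def estimate_monthly_cost(subnet_count,
                          lambda_cost=0.20,
                          metric_cost_per_subnet=0.30,
                          sns_cost=0.50):
    """Sum the three line items; each subnet adds one custom metric."""
    total = lambda_cost + subnet_count * metric_cost_per_subnet + sns_cost
    return round(total, 2)


if __name__ == "__main__":
    print(runs_per_month())          # invocations at a 15-minute interval
    print(estimate_monthly_cost(5))  # estimate for five subnets
```

Only the metrics line grows with the number of subnets, so that is the figure to watch when monitoring many of them.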
Security Considerations
- Least Privilege: Use specific resource ARNs where possible
- Encryption: Enable encryption for SNS topics
- VPC Endpoints: Use VPC endpoints if Lambda runs in private subnets
- Secrets: Store sensitive values in AWS Secrets Manager
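To make the least-privilege point concrete, the sketch below generates the policy with SNS and CloudWatch Logs scoped to named resources. build_least_privilege_policy and its parameters are hypothetical. The EC2 describe actions and cloudwatch:PutMetricData keep a wildcard resource because they do not support resource-level permissions.

```python
import json


def build_least_privilege_policy(region, account_id, function_name, sns_topic_arn):
    """Return the monitoring policy as a dict, scoped where IAM allows it."""
    logs_prefix = f"arn:aws:logs:{region}:{account_id}"
    log_group_arn = f"{logs_prefix}:log-group:/aws/lambda/{function_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["ec2:DescribeSubnets", "ec2:DescribeNetworkInterfaces"],
                "Resource": "*",
            },
            {
                # Publish only to the alert topic, not to every topic
                "Effect": "Allow",
                "Action": ["sns:Publish"],
                "Resource": sns_topic_arn,
            },
            {
                "Effect": "Allow",
                "Action": ["cloudwatch:PutMetricData"],
                "Resource": "*",
            },
            {
                "Effect": "Allow",
                "Action": ["logs:CreateLogGroup"],
                "Resource": f"{logs_prefix}:*",
            },
            {
                # Write only to this function's own log group
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": f"{log_group_arn}:*",
            },
        ],
    }
```

Passing the result through json.dumps yields a document that can be attached to the function's role in place of the broader policy.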
Conclusion
Monitoring subnet IP availability helps prevent infrastructure outages and deployment failures. This Lambda-based solution provides near-real-time visibility into IP usage, refreshed at the schedule interval (every 15 minutes in this guide), with minimal cost and complexity.
The automated alerting ensures your team knows about capacity issues before they impact production. Combined with CloudWatch dashboards, you get complete visibility into your network resource utilization.
Set up this monitoring system before you need it. When you're scrambling to diagnose why deployments are failing, it's too late to implement proper monitoring.