How to Create Custom Metrics on Log Groups Using IaC

Your application logs are full of useful data, but CloudWatch doesn't automatically turn them into metrics. You need metric filters to extract numbers from log events and create alarms that actually matter.

This guide shows you how to set up custom metrics from log groups using Infrastructure as Code.

Understanding Metric Filters

Metric filters scan incoming log events and extract numerical data based on patterns you define. When a log event matches your filter pattern, CloudWatch publishes the configured value to a custom metric. Patterns come in three styles: plain terms (ERROR), space-delimited fields ([timestamp, request_id, level = ERROR]), and JSON selectors ({ $.level = "ERROR" }). One caveat: filters only apply to log events ingested after the filter is created; they are never applied retroactively.

Implementation with Terraform

Here's a complete example that creates a metric filter to track application errors and sets up an alarm:

# Create the log group
resource "aws_cloudwatch_log_group" "app_logs" {
  name              = "/aws/lambda/my-application"
  retention_in_days = 7
}

# Create metric filter for error tracking
resource "aws_cloudwatch_log_metric_filter" "error_filter" {
  name           = "application-error-count"
  log_group_name = aws_cloudwatch_log_group.app_logs.name
  # Match when the third space-delimited field equals ERROR; "..." allows any trailing fields.
  # (A bare field name like [.., ERROR, ..] would only name the field, matching anything.)
  pattern        = "[timestamp, request_id, level = ERROR, ...]"
  
  metric_transformation {
    name      = "ApplicationErrors"
    namespace = "MyApp/Errors"
    value     = "1"
  }
}

# Create alarm for error metric
resource "aws_cloudwatch_metric_alarm" "error_alarm" {
  alarm_name          = "high-error-rate"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "ApplicationErrors"
  namespace           = "MyApp/Errors"
  period              = 300
  statistic           = "Sum"
  threshold           = 10
  alarm_description   = "This metric monitors application error rate"
  
  alarm_actions = [aws_sns_topic.alerts.arn]
}

# SNS topic for notifications
resource "aws_sns_topic" "alerts" {
  name = "app-error-alerts"
}
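The SNS topic above has no subscribers yet, so an alarm would fire into the void. A subscription closes the loop; the email address below is a placeholder, and note that email endpoints must be confirmed by clicking the link AWS sends before deliveries begin:

```hcl
# Hypothetical subscription -- replace the endpoint with your on-call address
resource "aws_sns_topic_subscription" "email_alerts" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = "oncall@example.com"
}
```

For production use, an HTTPS endpoint pointed at a paging service (or an SQS queue for downstream processing) tends to be more reliable than email.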

Advanced Metric Patterns

Track specific error types with different patterns:

# Database connection errors
resource "aws_cloudwatch_log_metric_filter" "db_errors" {
  name           = "database-connection-errors"
  log_group_name = aws_cloudwatch_log_group.app_logs.name
  # Space-separated terms are ANDed; prefixing each with "?" would OR them
  # and match any event containing ERROR *or* Database *or* connection.
  pattern        = "ERROR Database connection"
  
  metric_transformation {
    name      = "DatabaseErrors"
    namespace = "MyApp/Database"
    value     = "1"
  }
}

# Response time tracking
resource "aws_cloudwatch_log_metric_filter" "response_time" {
  name           = "api-response-time"
  log_group_name = aws_cloudwatch_log_group.app_logs.name
  pattern        = "[timestamp, request_id, duration_ms > 1000]"
  
  metric_transformation {
    name      = "SlowRequestDuration"
    namespace = "MyApp/Performance"
    value     = "$duration_ms"
    unit      = "Milliseconds"
  }
}
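If your application emits JSON-structured logs, JSON selector patterns are usually more robust than positional field matching, and the metric_transformation block can pull dimension values out of each matching event. This is a hedged sketch that assumes your log events contain "level" and "service" keys:

```hcl
# Hypothetical example assuming JSON logs like {"level": "ERROR", "service": "checkout", ...}
resource "aws_cloudwatch_log_metric_filter" "json_errors" {
  name           = "json-error-count"
  log_group_name = aws_cloudwatch_log_group.app_logs.name
  pattern        = "{ $.level = \"ERROR\" }"

  metric_transformation {
    name      = "JsonErrors"
    namespace = "MyApp/Errors"
    value     = "1"

    # Split the metric by the service name taken from each matching event
    dimensions = {
      Service = "$.service"
    }
  }
}
```

Be aware that each unique dimension value creates a separate metric, so a high-cardinality field (like a request ID) as a dimension would multiply your metric count and cost.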

Testing Your Setup

Verify metric filters work correctly:

# Create a log stream (put-log-events fails if the stream doesn't exist), then write a test entry
aws logs create-log-stream \
  --log-group-name "/aws/lambda/my-application" \
  --log-stream-name "test-stream"

aws logs put-log-events \
  --log-group-name "/aws/lambda/my-application" \
  --log-stream-name "test-stream" \
  --log-events timestamp=$(date +%s000),message="2024-01-15T10:00:00Z req-123 ERROR Database connection failed"

# Check metric data (note: "date -d" is GNU date syntax; on macOS use "date -v-5M" instead)
aws cloudwatch get-metric-statistics \
  --namespace "MyApp/Errors" \
  --metric-name "ApplicationErrors" \
  --start-time $(date -u -d '5 minutes ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 \
  --statistics Sum

Key Configuration Tips

Filter Patterns: Use specific patterns to avoid false positives. Test patterns with sample log data before deployment.
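One way to test a pattern without deploying anything is the CloudWatch Logs TestMetricFilter API, which dry-runs a pattern against sample messages you supply inline. This sketch requires AWS credentials and region configuration but no log group or other resources; the messages are illustrative:

```shell
# Dry-run a filter pattern against sample log lines; the response lists which messages matched
aws logs test-metric-filter \
  --filter-pattern "ERROR" \
  --log-event-messages \
    "2024-01-15T10:00:00Z req-123 ERROR Database connection failed" \
    "2024-01-15T10:00:01Z req-124 INFO request completed"
```

Running this in CI against a file of representative log lines is a cheap way to catch pattern regressions before terraform apply.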

Metric Namespaces: Organize metrics with clear namespaces like MyApp/Errors or MyApp/Performance for easier management.

Alarm Thresholds: Start with conservative thresholds and adjust based on actual traffic patterns.
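Beyond the raw threshold, two alarm settings are worth tuning early. The sketch below shows them on a variant of the earlier alarm; the specific values are starting points, not prescriptions:

```hcl
# Illustrative alarm tuning; resource name and values are examples, not prescriptions
resource "aws_cloudwatch_metric_alarm" "error_alarm_tuned" {
  alarm_name          = "high-error-rate-tuned"
  comparison_operator = "GreaterThanThreshold"
  metric_name         = "ApplicationErrors"
  namespace           = "MyApp/Errors"
  statistic           = "Sum"
  period              = 300
  threshold           = 10

  # Require 2 breaching datapoints out of 3 periods, which damps brief spikes
  evaluation_periods  = 3
  datapoints_to_alarm = 2

  # Metric filters emit no data when nothing matches; treat those gaps as healthy
  # rather than letting the alarm flap into INSUFFICIENT_DATA
  treat_missing_data = "notBreaching"
}
```

The treat_missing_data setting matters particularly for filter-derived metrics, because a quiet period (zero matching events) produces no datapoints at all rather than datapoints of zero.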

Cost Optimization: Limit metric filter scope to essential log groups. Custom metrics cost roughly $0.30 per metric per month for the first 10,000 metrics (us-east-1 pricing; volume discounts apply above that), and each unique combination of metric name and dimension values counts as a separate metric.

Conclusion

Custom metrics from log groups provide real-time visibility into application behavior. Using IaC ensures your monitoring setup is reproducible and version-controlled. Start with basic error tracking, then expand to performance and business metrics as needed.

The combination of metric filters and alarms creates an automated monitoring system that alerts you to issues before they impact users.