A simple framework to help you rightsize your cloud resources!

Introduction

I’ve learnt a few things about autoscaling and rightsizing that tend to work well in most scenarios. No guarantee they’ll fit your use case exactly, but they can serve as helpful references to get started.

I’ve been in situations where I downsized resources for a customer and it caused an outage. I’ve also been in situations where I over-provisioned for a customer and they were unhappy with the increase in their bill.

In this blog, we’ll share simple rules of thumb you can use to make rightsizing decisions:

  • What metrics actually matter for each service
  • How to avoid overcommitting early
  • When it’s safe to scale down

Before we go further, please remember: rightsizing is not the same as autoscaling. Autoscaling happens continuously, without manual intervention. Rightsizing is a deliberate tuning of your resources that you repeat periodically. You can use the tips provided here as a reference point in your rightsizing journey.

Rightsizing Criteria by Service Type

Managed Databases (e.g., PostgreSQL, MySQL)

Key Metrics: CPU usage, connection count, read/write IOPS, query latency

Rule of Thumb:

If
- Average CPU < 40% consistently for 4 weeks
- Connection count is well below the max limit
- No sustained disk IOPS spikes
- Query performance is consistent
Then: Consider dropping to the next lower instance class

Why look for CPU utilisation below 40% for 4 weeks before downsizing? You have to be absolutely sure about the resource utilisation. Four weeks is long enough to take an application through multiple cycles, including weekly and monthly jobs. If you downsize prematurely, an unexpected cron job can bring your systems down.
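The database rule above can be written as a small check. This is only a sketch: the `DbWindow` structure and its field names are hypothetical, and treating "well below the max limit" as under 50% of the limit is an assumption you should tune for your workload.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DbWindow:
    """Metrics for one observation window (hypothetical structure)."""
    weekly_avg_cpu_pct: Sequence[float]  # one average per week
    peak_connections: int
    max_connections: int
    sustained_iops_spikes: bool
    query_latency_stable: bool


def can_downsize_db(w: DbWindow,
                    cpu_threshold_pct: float = 40.0,
                    min_weeks: int = 4,
                    connection_headroom: float = 0.5) -> bool:
    """Return True only if every condition in the rule of thumb holds."""
    if len(w.weekly_avg_cpu_pct) < min_weeks:
        return False  # not enough history to be sure
    # "Consistently" means every week is under the threshold, not just the mean.
    cpu_ok = all(cpu < cpu_threshold_pct for cpu in w.weekly_avg_cpu_pct)
    # Assumption: "well below" is read as under 50% of the connection limit.
    connections_ok = w.peak_connections < connection_headroom * w.max_connections
    return (cpu_ok and connections_ok
            and not w.sustained_iops_spikes
            and w.query_latency_stable)
```

Checking each week separately is deliberate: a single busy week (for example one heavy cron run) fails the check even when the four-week average looks low.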

Caching Services (e.g., Redis, Memcached)

Key Metrics: Memory usage, cache hit ratio, CPU utilization

Rule of Thumb:

If
- Memory usage < 60% consistently
- CPU usage < 40%
- High cache hit ratio (> 90%)
- Latency is stable
Then: A smaller cache size or instance tier is likely appropriate.
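If your cache only reports raw hit and miss counters (Redis, for instance, exposes `keyspace_hits` and `keyspace_misses` in its `INFO` output), the hit ratio has to be derived first. A minimal sketch follows; the function and parameter names are illustrative, not part of any library.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Fraction of lookups served from the cache."""
    total = hits + misses
    if total == 0:
        return 0.0  # no traffic yet, so nothing can be concluded
    return hits / total


def can_downsize_cache(memory_used_pct: float,
                       cpu_pct: float,
                       hits: int,
                       misses: int,
                       latency_stable: bool) -> bool:
    """Apply the cache thresholds from the rule of thumb above."""
    return (memory_used_pct < 60.0
            and cpu_pct < 40.0
            and cache_hit_ratio(hits, misses) > 0.90
            and latency_stable)
```

A low hit ratio is a reason to hold off: shrinking a cache that is already missing often usually pushes more load onto the database behind it.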

EC2 Instances

Pro Tip: If you are autoscaling based on EC2 CloudWatch metrics, enable “Detailed Monitoring” to get more accurate data for autoscaling decisions. The free basic metrics have 5-minute granularity, which is too coarse to catch short spikes; detailed monitoring publishes them at 1-minute granularity.

Thumb Rule to scale down:

- Average CPU < 40% for the last 30 days
- Memory usage < 50% (if monitored)
- Network and disk I/O remain low
- No reliance on burst CPU credits (for T-series)

Thumb Rule to scale up:

- Average CPU > 60% consistently
- Frequent reliance on burst CPU credits (for T-series)
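The two EC2 rules leave a gap between 40% and 60% CPU where the right call is to leave the instance alone. The sketch below makes that three-way decision explicit. The names are hypothetical, and counting credit usage as "frequent" at 10 or more days in the window is an assumption, not an AWS definition.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    SCALE_DOWN = "scale-down"
    SCALE_UP = "scale-up"
    KEEP = "keep"


def ec2_action(avg_cpu_pct: float,
               io_is_low: bool,
               memory_pct: Optional[float] = None,
               burst_days: int = 0,
               frequent_burst_days: int = 10) -> Action:
    """Pick an action from 30-day averages.

    burst_days: days in the window on which a T-series instance
    relied on burst credits (0 for non-burstable instances).
    """
    # Scale up on sustained high CPU or frequent reliance on credits.
    if avg_cpu_pct > 60.0 or burst_days >= frequent_burst_days:
        return Action.SCALE_UP
    # Memory is only checked when it is actually monitored.
    memory_ok = memory_pct is None or memory_pct < 50.0
    if avg_cpu_pct < 40.0 and memory_ok and io_is_low and burst_days == 0:
        return Action.SCALE_DOWN
    return Action.KEEP
```

For example, `ec2_action(25.0, True, memory_pct=35.0)` returns `Action.SCALE_DOWN`, while the same instance with a few days of credit usage stays at `Action.KEEP`.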

Bonus Tip:
Use AWS Compute Optimizer or Rightsizing Recommendations in Cost Explorer to identify oversized EC2 instances automatically. This can lead to significant cost savings, and the recommendations AWS provides have improved a lot over the years. If you can afford the longer commitment, strongly consider buying Compute Savings Plans to lower your bills.

AWS Lambda

Key Metrics:
Average memory used, invocation duration, error rate, concurrency

Rule of Thumb:

If:
Average memory used is < 60% of the allocated memory
Function duration is well below the timeout
No timeout or memory errors

Then:
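→ Reduce memory size. This lowers the cost per millisecond, but it also lowers the allocated CPU, so confirm the function does not slow down enough to cancel the saving.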
→ Profile your function using Lambda Power Tuning to find the sweet spot between performance and cost.

🛠 Bonus Tip:
Don’t overprovision Lambda memory "just in case." Instead, tune it using real metrics from CloudWatch Logs Insights or the Lambda console. 
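Lambda compute is billed on memory multiplied by duration (GB-seconds), so a smaller memory setting only saves money if the function does not slow down too much. The sketch below compares two configurations on that basis. The function names are illustrative, per-request charges are ignored, and the durations in the example are made-up numbers rather than measurements.

```python
def gb_seconds(memory_mb: int, avg_duration_ms: float, invocations: int) -> float:
    """Billed compute for a configuration, in GB-seconds."""
    return (memory_mb / 1024) * (avg_duration_ms / 1000) * invocations


def reduction_pays_off(current_mb: int, current_ms: float,
                       candidate_mb: int, candidate_ms: float,
                       invocations: int) -> bool:
    """True if the candidate setting uses fewer GB-seconds overall."""
    return (gb_seconds(candidate_mb, candidate_ms, invocations)
            < gb_seconds(current_mb, current_ms, invocations))


# Hypothetical numbers: halving memory while duration grows from 200 ms to 300 ms.
print(reduction_pays_off(1024, 200.0, 512, 300.0, 1_000_000))
```

In this example the smaller setting still wins, because the function gets 50% slower while the memory is halved. If the duration had more than doubled, the cheaper-looking setting would cost more.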

Additional Considerations

Understand Scaling vs. Rightsizing

Autoscaling handles temporary spikes; rightsizing is about long-term efficiency. Use 30–90 day windows for analysis.

Switch to EC2 launch type if you can

If you are running hundreds of tasks on ECS, switching to the EC2 launch type (instead of Fargate) can bring down the overall cost quite a lot. But do remember that this increases the operational overhead, so assess whether you’re ready for it.

Conclusion

Rightsizing decisions can be tough, especially when you’re the one calling the shots. When you make a decision, you also take responsibility for the outcomes that come with it. I hope this blog gave you a framework to think about rightsizing for certain AWS services. The same numbers are a reasonable starting point for equivalent services on other platforms as well.

Want to get rightsizing done for your infrastructure? Kubenine can help with a free audit.