Quick Summary: This case study details proven strategies for optimizing AWS costs in enterprise environments. I'll share the tools, techniques, and decision-making process that led to significant savings without compromising on performance or security.
💰 The Starting Point: Enterprise Cost Challenge
In many enterprise environments, AWS bills grow faster than revenue because of inefficient resource allocation. This case study examines a scenario where approximately $30,000 per month was being spent on AWS infrastructure, much of it on idle or over-provisioned resources.
Before Optimization (~$30,000/month)
- Over-provisioned instances
- No reserved instances
- Inefficient storage
- No cost monitoring

After Optimization (~$18,000/month)
- Right-sized instances
- Strategic reserved instances
- Optimized storage
- Continuous monitoring
📊 Our Cost Optimization Strategy
We approached cost optimization systematically, focusing on the highest-impact areas first. Here's our proven methodology:
Right-Sizing Instances
Analyzed actual usage patterns and downsized over-provisioned instances, saving $7,500/month on EC2.
Reserved Instances
Purchased 1-year and 3-year RIs for predictable workloads. Saved 30-60% on baseline compute.
Auto Scaling
Implemented intelligent auto-scaling to match capacity with demand. Reduced idle resources by 40%.
Storage Optimization
Optimized S3 storage classes and EBS volumes. Saved 35% on storage costs.
🔍 Step 1: Cost Analysis & Discovery
Before making any changes, we needed to understand where our money was going. Here's how we analyzed our AWS spending:
Tools We Used for Analysis
- AWS Cost Explorer: Identified spending patterns and trends
- AWS Trusted Advisor: Found immediate optimization opportunities
- CloudWatch: Analyzed resource utilization metrics
- Custom Scripts: Automated cost reporting and alerting
```bash
#!/bin/bash
# AWS CLI script for cost analysis

# Top 10 most expensive services over the last 30 days.
# Cost Explorer returns amounts as strings, so convert before sorting.
echo "Top 10 AWS Services by Cost (Last 30 days):"
aws ce get-cost-and-usage \
  --time-period Start=2024-11-01,End=2024-12-01 \
  --granularity MONTHLY \
  --metrics "BlendedCost" \
  --group-by Type=DIMENSION,Key=SERVICE \
  --query 'ResultsByTime[0].Groups | sort_by(@, &to_number(Metrics.BlendedCost.Amount)) | reverse(@) | [:10].[Keys[0], Metrics.BlendedCost.Amount]' \
  --output table

# EC2 instances with no tags at all (gaps in cost allocation)
echo "Untagged EC2 Instances:"
aws ec2 describe-instances \
  --query 'Reservations[].Instances[?!Tags].[InstanceId,InstanceType,State.Name]' \
  --output table
```
Key Findings from Our Analysis
Cost Breakdown Discovery:
- EC2 Instances: 60% of total cost ($18,000/month)
- RDS Databases: 20% of total cost ($6,000/month)
- Data Transfer: 10% of total cost ($3,000/month)
- Storage (S3, EBS): 10% of total cost ($3,000/month)
⚡ Step 2: Right-Sizing EC2 Instances
Our biggest win came from right-sizing EC2 instances. We found that 70% of our instances were over-provisioned.
Right-Sizing Process
- Monitor for 2 weeks: Collected CPU, memory, and network utilization
- Identify candidates: Found instances with <30% average utilization
- Test downsizing: Gradually reduced instance sizes in staging
- Implement changes: Applied optimizations to production
```python
# Python script to identify right-sizing opportunities
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client('cloudwatch')
ec2 = boto3.client('ec2')

def get_instance_utilization(instance_id, days=14):
    """Get average CPU utilization for an instance."""
    end_time = datetime.utcnow()
    start_time = end_time - timedelta(days=days)

    response = cloudwatch.get_metric_statistics(
        Namespace='AWS/EC2',
        MetricName='CPUUtilization',
        Dimensions=[
            {'Name': 'InstanceId', 'Value': instance_id}
        ],
        StartTime=start_time,
        EndTime=end_time,
        Period=3600,  # 1-hour periods
        Statistics=['Average']
    )

    if response['Datapoints']:
        avg_cpu = sum(d['Average'] for d in response['Datapoints']) / len(response['Datapoints'])
        return round(avg_cpu, 2)
    return 0

# Analyze all running instances
instances = ec2.describe_instances(
    Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
)

rightsizing_candidates = []
for reservation in instances['Reservations']:
    for instance in reservation['Instances']:
        instance_id = instance['InstanceId']
        instance_type = instance['InstanceType']
        avg_cpu = get_instance_utilization(instance_id)

        if avg_cpu < 30:  # Under-utilized threshold
            rightsizing_candidates.append({
                'InstanceId': instance_id,
                'InstanceType': instance_type,
                'AvgCPU': avg_cpu,
                'Recommendation': 'Consider downsizing'
            })

print(f"Found {len(rightsizing_candidates)} right-sizing candidates:")
for candidate in rightsizing_candidates:
    print(f"Instance: {candidate['InstanceId']} ({candidate['InstanceType']}) - Avg CPU: {candidate['AvgCPU']}%")
```
Right-Sizing Results
Instance Optimization Results:
- Downsized 15 instances from m5.large to m5.medium
- Switched 8 instances from m5.xlarge to m5.large
- Moved development instances to t3.medium with burstable performance
- Total Savings: $7,500/month (25% of the total monthly bill)
📅 Step 3: Strategic Reserved Instance Purchases
After right-sizing, we analyzed our stable workloads and purchased Reserved Instances strategically:
Our RI Strategy
- Production instances: 3-year All Upfront RIs (50-60% savings)
- Staging instances: 1-year Partial Upfront RIs (30-40% savings)
- Development instances: On-demand with spot instances for testing
```json
{
  "reserved_instance_strategy": {
    "production": {
      "instance_types": ["m5.large", "m5.xlarge", "c5.large"],
      "commitment": "3_year_all_upfront",
      "expected_savings": "50-60%",
      "monthly_savings": "$4500"
    },
    "staging": {
      "instance_types": ["t3.medium", "t3.large"],
      "commitment": "1_year_partial_upfront",
      "expected_savings": "30-40%",
      "monthly_savings": "$1200"
    },
    "development": {
      "strategy": "spot_instances",
      "fallback": "on_demand",
      "expected_savings": "60-70%",
      "monthly_savings": "$800"
    }
  }
}
```
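Cost Explorer can also generate RI purchase recommendations programmatically, which is a useful cross-check before locking in a multi-year commitment. Here is a minimal boto3 sketch (not part of our original tooling; the parameter values are illustrative):

```python
# Sketch: pull EC2 RI purchase recommendations from Cost Explorer.
# Requires ce:GetReservationPurchaseRecommendation permission.
import boto3

ce = boto3.client('ce')

response = ce.get_reservation_purchase_recommendation(
    Service='Amazon Elastic Compute Cloud - Compute',
    LookbackPeriodInDays='THIRTY_DAYS',
    TermInYears='THREE_YEARS',
    PaymentOption='ALL_UPFRONT'
)

for rec in response.get('Recommendations', []):
    for detail in rec.get('RecommendationDetails', []):
        ec2_details = detail['InstanceDetails']['EC2InstanceDetails']
        print(
            f"Buy {detail['RecommendedNumberOfInstancesToPurchase']} x "
            f"{ec2_details['InstanceType']} in {ec2_details['Region']}, "
            f"est. monthly savings ${detail['EstimatedMonthlySavingsAmount']}"
        )
```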
📈 Step 4: Auto Scaling Implementation
We implemented intelligent auto-scaling to automatically adjust capacity based on demand:
```yaml
# Auto Scaling Group Configuration (CloudFormation)
Resources:
  WebServerAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      VPCZoneIdentifier:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2
      LaunchTemplate:
        LaunchTemplateId: !Ref WebServerLaunchTemplate
        Version: !GetAtt WebServerLaunchTemplate.LatestVersionNumber
      MinSize: 2
      MaxSize: 10
      DesiredCapacity: 3
      TargetGroupARNs:
        - !Ref ApplicationLoadBalancerTargetGroup
      HealthCheckType: ELB
      HealthCheckGracePeriod: 300

  WebServerScaleUpPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AdjustmentType: ChangeInCapacity
      AutoScalingGroupName: !Ref WebServerAutoScalingGroup
      Cooldown: 300
      ScalingAdjustment: 2

  WebServerScaleDownPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AdjustmentType: ChangeInCapacity
      AutoScalingGroupName: !Ref WebServerAutoScalingGroup
      Cooldown: 300
      ScalingAdjustment: -1
```
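Note that the simple scaling policies above only fire when CloudWatch alarms (not shown in the template) invoke them. A lower-maintenance alternative is target tracking, where Auto Scaling creates and manages the alarms itself. A minimal boto3 sketch, assuming the ASG name from the template above:

```python
# Sketch: replace alarm-driven simple scaling with a target-tracking policy.
# The ASG name is assumed to match the CloudFormation resource above.
import boto3

autoscaling = boto3.client('autoscaling')

autoscaling.put_scaling_policy(
    AutoScalingGroupName='WebServerAutoScalingGroup',
    PolicyName='cpu-target-tracking',
    PolicyType='TargetTrackingScaling',
    TargetTrackingConfiguration={
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'ASGAverageCPUUtilization'
        },
        'TargetValue': 50.0  # keep average CPU near 50%
    }
)
```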
💾 Step 5: Storage Optimization
We optimized both S3 and EBS storage to reduce costs:
S3 Optimization Strategy
```json
{
  "Rules": [
    {
      "ID": "LogsLifecycle",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" },
        { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
      ]
    },
    {
      "ID": "BackupsLifecycle",
      "Status": "Enabled",
      "Filter": { "Prefix": "backups/" },
      "Transitions": [
        { "Days": 7, "StorageClass": "STANDARD_IA" },
        { "Days": 30, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 2555 }
    }
  ]
}
```

The 2,555-day expiration on backups corresponds to our 7-year retention requirement.
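To apply a policy like this outside the console, one option is boto3. A sketch, assuming the JSON above is saved as lifecycle.json and using a placeholder bucket name:

```python
# Sketch: apply the lifecycle configuration above to a bucket.
import json
import boto3

s3 = boto3.client('s3')

with open('lifecycle.json') as f:   # the JSON document shown above
    lifecycle = json.load(f)

s3.put_bucket_lifecycle_configuration(
    Bucket='example-logs-bucket',   # placeholder bucket name
    LifecycleConfiguration=lifecycle
)
```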
EBS Optimization
- Converted underutilized Provisioned IOPS volumes to gp3 (see the migration sketch after this list)
- Implemented automated EBS snapshot lifecycle management
- Right-sized EBS volumes based on actual usage
- Enabled EBS optimization for all instances
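Here is a rough sketch of the gp3 migration step using boto3. The dry-run guard is on by default, since each volume's IOPS and throughput needs should be verified before converting:

```python
# Sketch: find gp2/io1 volumes and migrate them to gp3 online.
# DRY_RUN is on by default; verify per-volume IOPS/throughput needs first.
import boto3

DRY_RUN = True
ec2 = boto3.client('ec2')

volumes = ec2.describe_volumes(
    Filters=[{'Name': 'volume-type', 'Values': ['gp2', 'io1']}]
)

for vol in volumes['Volumes']:
    vol_id = vol['VolumeId']
    print(f"Candidate: {vol_id} ({vol['VolumeType']}, {vol['Size']} GiB)")
    if not DRY_RUN:
        # modify_volume changes the type in place; no detach required
        ec2.modify_volume(VolumeId=vol_id, VolumeType='gp3')
```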
📊 Step 6: Continuous Monitoring & Alerting
We set up comprehensive cost monitoring to prevent cost drift:
```yaml
# CloudWatch Billing Alarm
# Note: AWS/Billing metrics are only published in us-east-1,
# so this alarm must be created in that region.
BillingAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmDescription: 'AWS Bill Alert - Monthly spend exceeds $20,000'
    AlarmActions:
      - !Ref BillingAlarmTopic
    MetricName: EstimatedCharges
    Namespace: AWS/Billing
    Statistic: Maximum
    Period: 86400  # 24 hours
    EvaluationPeriods: 1
    Threshold: 20000
    ComparisonOperator: GreaterThanThreshold
    Dimensions:
      - Name: Currency
        Value: USD
```
```yaml
# Daily Cost Report Lambda
DailyCostReportFunction:
  Type: AWS::Lambda::Function
  Properties:
    Runtime: python3.9
    Handler: index.lambda_handler
    Timeout: 30  # the Cost Explorer call can exceed the 3-second default
    # Execution role (defined elsewhere) needs ce:GetCostAndUsage and sns:Publish
    Role: !GetAtt DailyCostReportRole.Arn
    Environment:
      Variables:
        SNS_TOPIC_ARN: !Ref BillingAlarmTopic  # reuse the billing alarm topic
    Code:
      ZipFile: |
        import os
        import boto3
        from datetime import datetime, timedelta

        def lambda_handler(event, context):
            ce = boto3.client('ce')
            sns = boto3.client('sns')

            # Get yesterday's costs
            yesterday = datetime.now() - timedelta(days=1)
            start_date = yesterday.strftime('%Y-%m-%d')
            end_date = datetime.now().strftime('%Y-%m-%d')

            response = ce.get_cost_and_usage(
                TimePeriod={'Start': start_date, 'End': end_date},
                Granularity='DAILY',
                Metrics=['BlendedCost'],
                GroupBy=[
                    {'Type': 'DIMENSION', 'Key': 'SERVICE'}
                ]
            )

            # Format and send the report, largest services first
            costs = response['ResultsByTime'][0]['Groups']
            total_cost = sum(float(c['Metrics']['BlendedCost']['Amount']) for c in costs)

            message = f"Daily AWS Cost Report for {start_date}:\n"
            message += f"Total Cost: ${total_cost:.2f}\n\n"

            for cost in sorted(costs, key=lambda x: float(x['Metrics']['BlendedCost']['Amount']), reverse=True)[:10]:
                service = cost['Keys'][0]
                amount = float(cost['Metrics']['BlendedCost']['Amount'])
                message += f"{service}: ${amount:.2f}\n"

            sns.publish(
                TopicArn=os.environ['SNS_TOPIC_ARN'],
                Subject='Daily AWS Cost Report',
                Message=message
            )

            return {'statusCode': 200}
```
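The report function still needs a daily trigger. One way to wire that up is an EventBridge schedule; a boto3 sketch with placeholder names and ARNs (in practice this belongs in the same CloudFormation template):

```python
# Sketch: schedule the cost-report Lambda daily via EventBridge.
# FUNCTION_ARN / FUNCTION_NAME are placeholders for the deployed function.
import boto3

FUNCTION_ARN = 'arn:aws:lambda:us-east-1:123456789012:function:DailyCostReport'
FUNCTION_NAME = 'DailyCostReport'

events = boto3.client('events')
lambda_client = boto3.client('lambda')

rule_arn = events.put_rule(
    Name='daily-cost-report',
    ScheduleExpression='cron(0 13 * * ? *)',  # 13:00 UTC daily
    State='ENABLED'
)['RuleArn']

events.put_targets(
    Rule='daily-cost-report',
    Targets=[{'Id': 'cost-report-lambda', 'Arn': FUNCTION_ARN}]
)

# Allow EventBridge to invoke the function
lambda_client.add_permission(
    FunctionName=FUNCTION_NAME,
    StatementId='allow-eventbridge-invoke',
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com',
    SourceArn=rule_arn
)
```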
🎯 Results & Key Metrics
Final Results After 6 Months
📉 40% Cost Reduction: From $30,000 to $18,000 monthly
💰 Annual Savings: $144,000 per year
⚡ Performance: Improved by 15% through right-sizing
🔄 Automation: 90% of optimizations now automated
Breakdown of Savings by Strategy
- Right-sizing instances: $7,500/month (25% of the original bill)
- Reserved instances: $6,500/month (about 22% of the original bill)
- Storage optimization: $2,000/month (about 7% of the original bill)
- Auto-scaling & spot instances: $1,800/month (6% of the original bill)
- Network optimization: $1,200/month (4% of the original bill)
📚 Lessons Learned & Best Practices
What Worked Best
- Start with biggest spenders: Focus on services consuming 80% of your budget
- Monitor before optimizing: Collect 2+ weeks of metrics before making changes
- Automate everything: Manual processes don't scale and lead to cost drift
- Regular reviews: Monthly cost reviews prevent regression
Common Pitfalls to Avoid
- Over-optimization: Don't sacrifice performance for minor savings
- Ignoring data transfer costs: These can add up quickly
- No monitoring: Costs will drift without continuous monitoring
- Team silos: Include developers in cost optimization efforts
🚀 Next Steps & Continuous Improvement
Cost optimization is an ongoing process. Here's what we're implementing next:
Future Optimization Plans:
- Kubernetes cost optimization: Implement cluster autoscaling and resource quotas
- Serverless migration: Move appropriate workloads to Lambda
- Multi-cloud strategy: Evaluate competitive pricing for specific workloads
- Advanced monitoring: Implement cost allocation tagging
🛠️ Tools & Resources
Here are the essential tools that made our cost optimization successful:
- AWS Cost Explorer: Primary cost analysis tool
- AWS Trusted Advisor: Automated recommendations
- CloudWatch: Resource utilization monitoring
- AWS Cost Anomaly Detection: Automated cost spike alerts
- Terraform: Infrastructure as code for consistent deployments
- Custom dashboards: Real-time cost visualization
Ready to optimize your AWS costs? I'd love to help you implement similar strategies for your infrastructure. Connect with me to discuss your specific challenges and optimization opportunities.
📧 Email: mddavid11204@gmail.com
💼 LinkedIn: davidwebmaster2002
🌐 Portfolio: davidwebmaster.xyz
About the Author: David M is a DevOps & Observability Engineer at Finstein, specializing in cloud cost optimization, AWS architecture, and infrastructure automation. He has experience helping organizations reduce their cloud costs by 30-50% while improving performance and reliability.