AWS for Backend Engineers
March 31, 2026|7 min read
Lesson 11 / 15

11. Cost Optimization — Stop Wasting Money

The median startup wastes 30-35% of its AWS spend. Large enterprises waste even more. The problem is not that AWS is expensive — it is that engineers deploy resources and forget about them, pick the wrong pricing model, or ignore the data transfer tax that silently eats their budget.

This lesson covers practical, engineering-driven strategies to cut your AWS bill without sacrificing performance or reliability.

AWS Pricing Models

AWS offers four main pricing models for compute. Choosing the right one is the single highest-leverage cost decision you will make.

Model | Discount | Commitment | Best For
On-Demand | 0% | None | Development, spiky workloads
Reserved Instances | Up to 72% | 1 or 3 years | Steady-state databases, baseline compute
Savings Plans | Up to 72% | 1 or 3 years | Flexible compute (can change instance types)
Spot Instances | Up to 90% | None (can be interrupted) | Batch processing, CI/CD, stateless workers

Reserved Instances vs Savings Plans

Reserved Instances lock you to a specific instance type in a specific region. Savings Plans commit to a dollar-per-hour spend, but let you change instance types, regions (for Compute Savings Plans), and even services.

Decision guide:

  • Running RDS or ElastiCache? Reserved Instances (Compute Savings Plans do not cover these).
  • Running EC2 with stable baseline? Compute Savings Plans (flexible across instance families).
  • Running Lambda with predictable usage? Compute Savings Plans (covers Lambda too).
  • Not sure about future architecture? Start with Savings Plans — less lock-in.

# Check your Savings Plans recommendations
aws ce get-savings-plans-purchase-recommendation \
  --savings-plans-type COMPUTE_SP \
  --term-in-years ONE_YEAR \
  --payment-option NO_UPFRONT \
  --lookback-period-in-days SIXTY_DAYS
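
Before acting on a recommendation, it helps to know how much of a commitment you must actually use for it to pay off. The sketch below is a simplified model, not an AWS formula: it assumes a single flat discount against on-demand rates and that any unused commitment is simply lost. The function names and the 30% example discount are illustrative assumptions.

```python
def break_even_utilization(discount: float) -> float:
    """Fraction of a commitment that must be used before it beats on-demand.

    Covering usage U with a commitment costs U * (1 - discount) whether or not
    the capacity is used. Running only u * U of it on-demand would cost u * U,
    so the commitment wins once u exceeds 1 - discount.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount


def effective_saving(discount: float, utilization: float) -> float:
    """Saving versus paying on-demand for the usage actually consumed.

    Positive means the commitment saved money; negative means it cost more.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    on_demand_cost = utilization      # cost of the usage at on-demand rates
    committed_cost = 1 - discount     # paid regardless of utilization
    return (on_demand_cost - committed_cost) / on_demand_cost


if __name__ == "__main__":
    print(break_even_utilization(0.30))   # utilization needed at a 30% discount
    print(effective_saving(0.30, 0.50))   # half-used commitment: negative saving
```

Under this model, a commitment with a 30% discount only pays off if you use more than 70% of it, which is why covering a stable baseline and leaving the rest on-demand is usually the safer starting point.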

Spot Instances

Spot Instances give you spare EC2 capacity at a discount of up to 90%. The catch: AWS can reclaim them with two minutes' notice.

Use them for:

  • Batch processing and data pipelines
  • CI/CD build runners
  • Worker nodes in ECS/EKS clusters (with proper drain handling)
  • Load testing

Never use them for:

  • Databases
  • Single-instance applications
  • Anything that cannot tolerate interruption

# Launch a one-time Spot Instance that is terminated when AWS reclaims it
# (the AMI must be an arm64 image to match the Graviton instance type)
import boto3

ec2 = boto3.client('ec2')

response = ec2.run_instances(
    ImageId='ami-0abcdef1234567890',
    InstanceType='c6g.xlarge',  # Graviton - cheaper than x86
    MinCount=1,
    MaxCount=1,
    InstanceMarketOptions={
        'MarketType': 'spot',
        'SpotOptions': {
            'SpotInstanceType': 'one-time',
            'InstanceInterruptionBehavior': 'terminate'
        }
    },
    TagSpecifications=[{
        'ResourceType': 'instance',
        'Tags': [{'Key': 'Purpose', 'Value': 'batch-processing'}]
    }]
)
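
The launch call above only tells AWS what to do on interruption; the workload still has to notice the two-minute warning itself. One common approach is to poll the instance metadata service, which returns 404 at `spot/instance-action` until an interruption is scheduled. The following is a minimal sketch using IMDSv2 and only the standard library; the function names and the injectable `fetch` parameter are illustrative assumptions.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Optional, Tuple

IMDS = "http://169.254.169.254/latest"


def fetch_instance_action(timeout: float = 1.0) -> Optional[str]:
    """Return the raw spot/instance-action document, or None if none is scheduled."""
    # IMDSv2: request a short-lived session token first.
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()

    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption notice has been issued yet
            return None
        raise


def interruption_pending(
    fetch: Callable[[], Optional[str]] = fetch_instance_action,
) -> Optional[Tuple[str, str]]:
    """Return (action, time) if an interruption is scheduled, else None."""
    body = fetch()
    if body is None:
        return None
    notice = json.loads(body)  # e.g. {"action": "terminate", "time": "..."}
    return notice["action"], notice["time"]
```

A worker would typically call `interruption_pending()` every few seconds from a background thread and, once it returns a value, stop taking new work and checkpoint or requeue whatever is in flight.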

The Silent Cost Killers

NAT Gateway: The $32/Month Plus Per-GB Tax

NAT Gateway charges $0.045/hour ($32.40/month) plus $0.045 per GB processed. A single NAT Gateway handling 1 TB/month of traffic costs $77.40/month. Most engineers do not realize how much traffic flows through their NAT Gateway.
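
The arithmetic above can be sketched as a small calculator. It uses the rates quoted in the text, a 720-hour month (which is what the $32.40 figure implies), and treats 1 TB as 1,000 GB; the function name and the `gateways` parameter are illustrative.

```python
NAT_HOURLY_USD = 0.045   # per gateway-hour, from the text above
NAT_PER_GB_USD = 0.045   # per GB processed, from the text above
HOURS_PER_MONTH = 720    # 30-day month, matching the $32.40 figure


def nat_gateway_monthly_cost(gb_processed: float, gateways: int = 1) -> float:
    """Estimated monthly NAT Gateway bill in USD for the given traffic."""
    fixed = gateways * NAT_HOURLY_USD * HOURS_PER_MONTH  # charged even when idle
    variable = gb_processed * NAT_PER_GB_USD             # charged per GB processed
    return round(fixed + variable, 2)


if __name__ == "__main__":
    print(nat_gateway_monthly_cost(1000))              # one gateway, 1 TB/month
    print(nat_gateway_monthly_cost(1000, gateways=3))  # one gateway per AZ
```

The fixed charge multiplies with every gateway you run, so a one-gateway-per-AZ setup in three AZs pays the hourly fee three times before any traffic flows.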

Cost reduction strategies:

  1. Gateway VPC endpoints for S3 and DynamoDB — free, and they keep traffic off the NAT Gateway
  2. Interface VPC endpoints for other AWS services (about $7.20/month per endpoint per AZ, plus a per-GB processing fee, but cheaper than NAT at scale)
  3. Place Lambda functions outside VPC unless they need VPC resources
  4. Use NAT instances for dev/staging (t3.nano at $3.75/month vs $32.40 for NAT Gateway)

# Create a free S3 gateway endpoint — stops S3 traffic from hitting NAT
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-abc123 \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-private123

Data Transfer Costs

Data transfer pricing on AWS is asymmetric and confusing:

Path | Cost
Data into AWS | Free
Data out to internet | $0.09/GB (first 10 TB)
Cross-AZ within region | $0.01/GB each way ($0.02 total)
Cross-region | $0.02/GB
Same AZ | Free (using private IP)

Cross-AZ traffic adds up fast. An application doing 100 GB/day of cross-AZ traffic pays $60/month just for internal networking. Multi-AZ is critical for availability, but be intentional about which traffic actually needs to cross AZ boundaries.
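
To estimate this for your own traffic, the calculation is just volume times the per-GB rate, counted in both directions. A minimal sketch using the cross-AZ rate from the table and a 30-day month; the function name is illustrative.

```python
CROSS_AZ_PER_GB_EACH_WAY = 0.01  # USD, from the table above


def cross_az_monthly_cost(gb_per_day: float, days: int = 30) -> float:
    """Monthly cross-AZ transfer cost in USD.

    Each GB is billed twice: once leaving the source AZ and once
    entering the destination AZ.
    """
    monthly_gb = gb_per_day * days
    return round(monthly_gb * CROSS_AZ_PER_GB_EACH_WAY * 2, 2)


if __name__ == "__main__":
    print(cross_az_monthly_cost(100))  # the 100 GB/day example from the text
```

Chatty service-to-service calls and database replication are the usual suspects, so it is worth running the number per traffic flow rather than for the account as a whole.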

Unused EBS Volumes

When you terminate an EC2 instance, the root volume is deleted by default, but additional EBS volumes are not unless DeleteOnTermination is set on them. These orphaned volumes sit there costing money:

# Find all unattached EBS volumes
aws ec2 describe-volumes \
  --filters "Name=status,Values=available" \
  --query 'Volumes[*].{ID:VolumeId,Size:Size,Type:VolumeType,Created:CreateTime}' \
  --output table

A forgotten 500 GB gp3 volume costs $40/month. Multiply that across dozens of engineers over months, and you have a real budget leak.
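
To put a dollar figure on the leak, you can total the unattached volumes from a `describe_volumes` response. The sketch below works on the response dictionary so it can be tried without AWS access. It assumes every volume is priced like gp3 at $0.08 per GB-month (the rate implied by the $40 figure above), which will be off for other volume types; the function name is illustrative.

```python
GP3_PER_GB_MONTH = 0.08  # USD; implied by 500 GB costing $40/month


def orphaned_volume_cost(describe_volumes_response: dict) -> float:
    """Monthly cost in USD of unattached volumes in a describe_volumes-shaped dict."""
    total_gb = sum(
        volume["Size"]
        for volume in describe_volumes_response.get("Volumes", [])
        if volume.get("State") == "available"  # "available" means not attached
    )
    return round(total_gb * GP3_PER_GB_MONTH, 2)


if __name__ == "__main__":
    sample = {
        "Volumes": [
            {"VolumeId": "vol-1", "Size": 500, "State": "available"},
            {"VolumeId": "vol-2", "Size": 100, "State": "in-use"},
        ]
    }
    print(orphaned_volume_cost(sample))
```

With boto3 installed, the input could come from `boto3.client("ec2").describe_volumes(...)`; large accounts would need to paginate, which this sketch leaves out.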

Idle Load Balancers

ALBs cost $16.20/month minimum, plus LCU charges. If you have load balancers with zero registered targets or zero requests, you are paying for nothing:

# Find ALBs with no registered targets (or no target groups at all)
for arn in $(aws elbv2 describe-load-balancers --query 'LoadBalancers[?Type==`application`].LoadBalancerArn' --output text); do
  tg_arns=$(aws elbv2 describe-target-groups --load-balancer-arn "$arn" --query 'TargetGroups[*].TargetGroupArn' --output text)
  if [ -z "$tg_arns" ]; then
    echo "IDLE: $arn (no target groups)"
    continue
  fi
  for tg in $tg_arns; do
    # Empty output means nothing is registered; this does not check whether targets are healthy
    health=$(aws elbv2 describe-target-health --target-group-arn "$tg" --query 'TargetHealthDescriptions[*].TargetHealth.State' --output text)
    if [ -z "$health" ]; then
      echo "IDLE: $arn (target group $tg has no targets)"
    fi
  done
done

Unattached Elastic IPs

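Since February 2024, AWS bills every public IPv4 address at $0.005/hour ($3.60/month), whether or not it is attached. An Elastic IP that is not associated with anything is that charge with nothing to show for it. It is small per IP, but organizations often accumulate dozens: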

# Find unattached EIPs
aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==`null`].{IP:PublicIp,AllocId:AllocationId}' \
  --output table

S3 Storage Class Optimization

S3 offers several storage classes; the seven below cover most workloads. Using the wrong one wastes money:

Class | Storage Cost (per GB-month) | Retrieval (fee or latency) | Best For
Standard | $0.023 | Free | Frequently accessed data
Intelligent-Tiering | $0.023 (+ monitoring fee) | Free | Unknown access patterns
Standard-IA | $0.0125 | $0.01/GB | Monthly access
One Zone-IA | $0.01 | $0.01/GB | Reproducible, infrequent data
Glacier Instant | $0.004 | $0.03/GB | Quarterly access, millisecond retrieval
Glacier Flexible | $0.0036 | Minutes to hours | Annual access
Glacier Deep Archive | $0.00099 | 12-48 hours | Compliance archives

Use S3 Lifecycle policies to automatically transition objects:

{
  "Rules": [
    {
      "ID": "OptimizeStorageCosts",
      "Status": "Enabled",
      "Filter": {"Prefix": "logs/"},
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 90,
          "StorageClass": "GLACIER_IR"
        },
        {
          "Days": 365,
          "StorageClass": "DEEP_ARCHIVE"
        }
      ],
      "Expiration": {
        "Days": 2555
      }
    }
  ]
}
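
To see what a rule like this is worth, you can compare the storage cost of one GB over its lifetime with and without the transitions. The sketch below mirrors the rule above and uses the per-GB prices from the storage class table. It is a rough estimate only: it uses 30-day months and ignores transition request fees, retrieval fees, and minimum object size and duration charges. All names are illustrative.

```python
# USD per GB-month, from the storage class table above
PRICE = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER_IR": 0.004,
    "DEEP_ARCHIVE": 0.00099,
}

# (day the object enters the class, class), mirroring the lifecycle rule
SCHEDULE = [(0, "STANDARD"), (30, "STANDARD_IA"), (90, "GLACIER_IR"), (365, "DEEP_ARCHIVE")]
EXPIRATION_DAY = 2555


def storage_cost_per_gb(schedule=SCHEDULE, expiration_day=EXPIRATION_DAY) -> float:
    """Total storage cost in USD for one GB from upload to expiration."""
    total = 0.0
    for i, (start_day, storage_class) in enumerate(schedule):
        # A class lasts until the next transition, or until expiration for the last one.
        end_day = schedule[i + 1][0] if i + 1 < len(schedule) else expiration_day
        months = (end_day - start_day) / 30
        total += months * PRICE[storage_class]
    return total


if __name__ == "__main__":
    with_lifecycle = storage_cost_per_gb()
    standard_only = storage_cost_per_gb([(0, "STANDARD")])
    print(f"lifecycle: ${with_lifecycle:.4f}/GB, standard only: ${standard_only:.4f}/GB")
```

Under these assumptions the lifecycle rule stores a GB for about $0.16 over seven years, against about $1.96 if it stayed in Standard the whole time.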

For data with unpredictable access patterns, S3 Intelligent-Tiering automatically moves objects between tiers. It costs $0.0025 per 1,000 objects/month for monitoring, but saves money on anything accessed less than once a month.

Lambda Cost Optimization

Lambda billing has two dimensions: requests ($0.20 per million) and duration (based on memory allocated).

[Figure: Cost Optimization Decision Tree]

Memory Tuning

Lambda charges per GB-second. More memory means higher per-ms cost but often lower duration. The sweet spot minimizes total cost (memory x duration):

# Use AWS Lambda Power Tuning to find the optimal memory
# https://github.com/alexcasalboni/aws-lambda-power-tuning

# Example results for a data processing function:
# 128 MB  → 3200 ms → cost: $0.0000066672 (cheapest per invocation, but slowest)
# 512 MB  → 1100 ms → cost: $0.0000091850 (about 38% more cost for roughly 3x the speed)
# 1024 MB →  620 ms → cost: $0.0000103432 (faster again, diminishing returns)
# 1769 MB →  380 ms → cost: $0.0000109458 (full vCPU, fastest, most expensive)

# For this function, 128 MB minimizes cost per invocation; 512 MB is the
# better trade-off when latency matters
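
To check figures like these yourself, the duration charge per invocation is memory in GB, times duration in seconds, times the price per GB-second. A minimal sketch using the example measurements above; the price constant is the x86 list price of $0.0000166667 per GB-second (an assumption about your region), and the per-request fee is left out because it is the same for every memory setting.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # USD, x86; assumed list price, varies by region


def invocation_cost(memory_mb: int, duration_ms: float) -> float:
    """Duration charge in USD for a single invocation."""
    return (memory_mb / 1024) * (duration_ms / 1000) * PRICE_PER_GB_SECOND


# memory (MB) -> measured duration (ms), from the example results above
RESULTS = {128: 3200, 512: 1100, 1024: 620, 1769: 380}

costs = {memory: invocation_cost(memory, ms) for memory, ms in RESULTS.items()}
cheapest = min(costs, key=costs.get)

if __name__ == "__main__":
    for memory, cost in costs.items():
        print(f"{memory:>5} MB: ${cost:.10f}")
    print("lowest cost per invocation:", cheapest, "MB")
```

With these sample measurements the lowest cost per invocation is at 128 MB, and each step up buys lower latency for a higher price, so the right setting depends on how much the latency is worth to you.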

ARM/Graviton

Lambda functions on ARM (Graviton2) are 20% cheaper and often 10-15% faster than x86:

# In SAM template
MyFunction:
  Type: AWS::Serverless::Function
  Properties:
    Runtime: python3.12
    Architectures:
      - arm64  # 20% cheaper than x86_64
    MemorySize: 512
    Handler: app.handler

Unless your function uses native x86 binaries, always use ARM.

DynamoDB: On-Demand vs Provisioned

The cost crossover depends on your usage pattern:

Metric | On-Demand | Provisioned
Write cost | $1.25 per million WRU | $0.00065 per WCU/hour
Read cost | $0.25 per million RRU | $0.00013 per RCU/hour
Scaling | Instant | Auto-scaling (with delay)

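Break-even point: at the rates above, one WCU held for an hour costs $0.00065 and can absorb 3,600 writes, which works out to about $0.18 per million writes at full utilization against $1.25 on-demand. Provisioned mode is therefore cheaper once a table consistently uses more than roughly 15% of its provisioned capacity. Below that, on-demand wins. Rates vary by region and change over time, so rerun the arithmetic with current prices.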

On-demand write cost for 1M writes/day:
  1,000,000 × $1.25 / 1,000,000 = $1.25/day = $37.50/month

Provisioned equivalent (assuming even distribution):
  1,000,000 / 86,400 seconds ≈ 12 WCU
  12 WCU × $0.00065/hour × 730 hours = $5.69/month

Provisioned is 6.6x cheaper — IF traffic is steady.
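
The same comparison can be expressed as a utilization break-even. The sketch below uses the write rates from the table above, which may not match current pricing in your region, and relies on one WCU sustaining one standard write per second for items up to 1 KB. The function names are illustrative.

```python
ON_DEMAND_PER_MILLION_WRITES = 1.25  # USD, from the table above
PROVISIONED_PER_WCU_HOUR = 0.00065   # USD, from the table above
WRITES_PER_WCU_HOUR = 3600           # one write per second for a full hour


def provisioned_cost_per_million_writes(utilization: float) -> float:
    """Effective USD per million writes when only part of the capacity is used."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    writes_served = WRITES_PER_WCU_HOUR * utilization
    return PROVISIONED_PER_WCU_HOUR / writes_served * 1_000_000


def break_even_utilization() -> float:
    """Utilization at which provisioned and on-demand cost the same per write."""
    return provisioned_cost_per_million_writes(1.0) / ON_DEMAND_PER_MILLION_WRITES


if __name__ == "__main__":
    print(f"break-even utilization: {break_even_utilization():.1%}")
```

Swapping in current prices for the two constants gives the break-even for your region; spiky tables that sit well below it most of the day are the ones that belong on on-demand.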

Start with on-demand for new tables. Switch to provisioned with auto-scaling once you understand the traffic pattern.

Compute Optimizer

AWS Compute Optimizer analyzes your usage patterns and recommends right-sizing:

# Enable Compute Optimizer
aws compute-optimizer update-enrollment-status --status Active

# Get EC2 recommendations
aws compute-optimizer get-ec2-instance-recommendations \
  --query 'instanceRecommendations[*].{
    Instance:instanceArn,
    Current:currentInstanceType,
    Recommended:recommendationOptions[0].instanceType,
    Savings:recommendationOptions[0].savingsOpportunity.estimatedMonthlySavings.value
  }' \
  --output table

Compute Optimizer covers EC2 instances, Auto Scaling groups, EBS volumes, and Lambda functions. Check it monthly.

Tagging Strategy for Cost Allocation

Without tags, your bill is a single incomprehensible number. With tags, you can attribute costs to teams, services, and environments:

# Required cost allocation tags
aws ce update-cost-allocation-tags-status \
  --cost-allocation-tags-status '[
    {"TagKey": "Environment", "Status": "Active"},
    {"TagKey": "Service", "Status": "Active"},
    {"TagKey": "Team", "Status": "Active"},
    {"TagKey": "CostCenter", "Status": "Active"}
  ]'

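Enforce tagging with SCPs. Keys listed under a single condition operator are ANDed, so each required tag gets its own Deny statement; otherwise a request missing only one of the tags would still be allowed. The resources are scoped explicitly so that ec2:RunInstances is not denied for the subnets, images, and security groups it references, which never carry request tags:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RequireEnvironmentTag",
      "Effect": "Deny",
      "Action": [
        "ec2:RunInstances",
        "lambda:CreateFunction",
        "rds:CreateDBInstance"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:instance/*",
        "arn:aws:lambda:*:*:function:*",
        "arn:aws:rds:*:*:db:*"
      ],
      "Condition": {
        "Null": {"aws:RequestTag/Environment": "true"}
      }
    },
    {
      "Sid": "RequireServiceTag",
      "Effect": "Deny",
      "Action": [
        "ec2:RunInstances",
        "lambda:CreateFunction",
        "rds:CreateDBInstance"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:instance/*",
        "arn:aws:lambda:*:*:function:*",
        "arn:aws:rds:*:*:db:*"
      ],
      "Condition": {
        "Null": {"aws:RequestTag/Service": "true"}
      }
    }
  ]
}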

No tags? No resources. This prevents unattributed spending from day one.
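
An SCP rejects the request at creation time, which can be a late and confusing place to find out. The same rule can be checked earlier, for example in CI against the tags a deployment is about to apply. A minimal sketch, assuming boto3-style tag lists; the function name is illustrative, and treating an empty value as missing is a stricter choice than the SCP itself makes.

```python
REQUIRED_TAGS = ("Environment", "Service")  # the keys the SCP above insists on


def missing_required_tags(tags, required=REQUIRED_TAGS):
    """Return the required tag keys absent from a boto3-style tag list.

    `tags` looks like [{"Key": "Environment", "Value": "prod"}, ...].
    Tags with an empty value are treated as missing (an assumption of this sketch).
    """
    present = {tag["Key"] for tag in tags if tag.get("Value")}
    return [key for key in required if key not in present]


if __name__ == "__main__":
    tags = [{"Key": "Environment", "Value": "prod"}]
    print(missing_required_tags(tags))  # keys that still need to be added
```

Failing a pipeline on a non-empty result gives engineers the missing key by name instead of a generic access-denied error.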

Cost Explorer and Budgets

Cost Explorer

Cost Explorer gives you visibility into where money is going:

# Get the March 2026 cost breakdown by service
# (End is exclusive, so use the first day of the following month;
#  Amount is returned as a string, so convert it before comparing)
aws ce get-cost-and-usage \
  --time-period Start=2026-03-01,End=2026-04-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE \
  --query 'ResultsByTime[0].Groups[?to_number(Metrics.BlendedCost.Amount)>`10`] | sort_by(@, &to_number(Metrics.BlendedCost.Amount))[-10:]'
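
Two details of this API are easy to trip over when scripting it: the `End` date is exclusive, and amounts come back as strings. The sketch below handles both against a `get_cost_and_usage`-shaped response so it can be tried without AWS access; the function names are illustrative.

```python
from datetime import date


def month_period(year: int, month: int) -> dict:
    """Build a TimePeriod covering one full calendar month.

    Cost Explorer treats End as exclusive, so it is the first day of the next month.
    """
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return {"Start": start.isoformat(), "End": end.isoformat()}


def top_services(response: dict, minimum: float = 10.0, limit: int = 10):
    """Return (service, cost) pairs above `minimum`, most expensive first."""
    groups = response["ResultsByTime"][0]["Groups"]
    costs = [
        (group["Keys"][0], float(group["Metrics"]["BlendedCost"]["Amount"]))
        for group in groups
    ]
    costs = [item for item in costs if item[1] > minimum]
    return sorted(costs, key=lambda item: item[1], reverse=True)[:limit]


if __name__ == "__main__":
    print(month_period(2026, 3))
```

With boto3, `month_period(2026, 3)` can be passed as the `TimePeriod` argument of `get_cost_and_usage`, and the response handed straight to `top_services`.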

Budgets with Alerts

Set budgets and get alerted before bills surprise you:

aws budgets create-budget \
  --account-id 123456789012 \
  --budget '{
    "BudgetName": "monthly-total",
    "BudgetLimit": {"Amount": "5000", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[{
    "Notification": {
      "NotificationType": "ACTUAL",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 80,
      "ThresholdType": "PERCENTAGE"
    },
    "Subscribers": [{
      "SubscriptionType": "EMAIL",
      "Address": "<your-alert-email>"
    }]
  }]'

Set alerts at 50%, 80%, and 100% of your budget. Add a forecasted alert at 100% so you know if you are trending over before the month ends.
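
The full set of alerts described above can be generated rather than written by hand. A minimal sketch that builds the list in the shape the Budgets API expects for `NotificationsWithSubscribers`; the function name and the example address are illustrative assumptions.

```python
def budget_notifications(email, actual_thresholds=(50, 80, 100), forecast_threshold=100):
    """Build actual-spend alerts plus one forecasted alert, all as percentages."""

    def notification(kind, threshold):
        return {
            "Notification": {
                "NotificationType": kind,  # "ACTUAL" or "FORECASTED"
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": threshold,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
        }

    alerts = [notification("ACTUAL", t) for t in actual_thresholds]
    alerts.append(notification("FORECASTED", forecast_threshold))
    return alerts


if __name__ == "__main__":
    for alert in budget_notifications("alerts@example.com"):
        n = alert["Notification"]
        print(n["NotificationType"], n["Threshold"])
```

With boto3, the returned list can be passed as `NotificationsWithSubscribers` to `create_budget` alongside the `AccountId` and `Budget` arguments.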

The Cost Optimization Checklist

Run through this monthly:

  • Review Cost Explorer — any unexpected service charges?
  • Check Compute Optimizer recommendations — any oversized instances?
  • Find and delete unused EBS volumes, idle load balancers, unattached EIPs
  • Review NAT Gateway data processing — can you add VPC endpoints?
  • Check Savings Plans utilization — are you using what you committed to?
  • Review S3 storage — are lifecycle policies moving cold data to cheaper tiers?
  • Verify DynamoDB tables are on the right capacity mode
  • Check Lambda memory settings — have traffic patterns changed?
  • Audit data transfer — is cross-AZ traffic higher than expected?
  • Ensure all resources are tagged for cost allocation

What is Next

Now that you know how to keep costs under control, the next lesson tackles one of the biggest architectural decisions you will face: Serverless vs Containers — when to use Lambda, when to use ECS, and when the answer is both.