Storage Tiering & Costs: Optimizing Data Lifecycle Across Clouds
#cloud #storage #finops #data-engineering #cost-optimization
Storage is often the largest and most overlooked line item in cloud bills. The key insight: not all data deserves the same tier. Moving cold data to cheaper tiers and automating lifecycle policies can cut storage costs by 50-70% without impacting performance.
Storage Tier Comparison (per GB/month, approximate 2026 pricing)
| Tier | AWS S3 | GCP GCS | Azure Blob | Access latency | Use case |
|---|---|---|---|---|---|
| Hot / Standard | $0.023 | $0.020 | $0.018 | Milliseconds | Active applications, serving |
| Infrequent Access | $0.0125 | $0.010 | $0.010 | Milliseconds | Monthly reports, backups |
| One Zone IA | $0.010 | — | $0.010 (Cool) | Milliseconds | Reproducible data, logs |
| Archive / Coldline | $0.004 (Glacier IR) | $0.004 | $0.002 (Cold) | Minutes | Compliance, audit trails |
| Deep Archive | $0.00099 | $0.0012 | $0.00099 (Archive) | Hours (12h+) | Legal hold, raw backups |
Retrieval costs matter: deep-archive tiers are cheap to store but expensive to read back. A full restore of 10 TB from Glacier Deep Archive runs on the order of ~$100 in retrieval and request fees, depending on retrieval speed.
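To make that trade-off concrete, a small calculator helps. The per-GB fee below is an illustrative assumption (actual rates vary by provider, tier, and retrieval speed), not a quoted price:

```python
def restore_cost(tb: float, retrieval_per_gb: float,
                 request_fee_per_1000: float = 0.0, num_requests: int = 0) -> float:
    """Estimate a one-off restore cost from an archive tier.

    retrieval_per_gb: per-GB retrieval fee (assumed; varies by tier/speed)
    request_fee_per_1000 / num_requests: optional per-request charges
    """
    gb = tb * 1024
    return gb * retrieval_per_gb + (num_requests / 1000) * request_fee_per_1000

# Restoring 10 TB at an assumed ~$0.01/GB lands near the ~$100 figure above:
print(round(restore_cost(10, retrieval_per_gb=0.01), 2))  # → 102.4
```

Run the same function with your provider's published retrieval rates before committing large datasets to a deep-archive tier.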
Access Pattern Decision Matrix
| Pattern | Reads/month | Latency need | Recommended tier | Savings vs Hot |
|---|---|---|---|---|
| Real-time serving | >1000/object | <50ms | Hot / Standard | Baseline |
| Weekly analytics | 10-100/object | <1s | Infrequent Access | ~45% |
| Monthly reporting | 1-10/object | <1s | Infrequent Access | ~45% |
| Quarterly compliance | <1/object | Minutes OK | Archive / Coldline | ~80% |
| Legal retention | ~0/object | Hours OK | Deep Archive | ~95% |
| ML training data | Burst reads | <1s | IA + caching layer | ~40% |
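The matrix above can be encoded as a simple decision function. The thresholds are the ones from the table and are illustrative; tune them against your own access logs:

```python
def recommend_tier(reads_per_month: float, latency_ok_seconds: float) -> str:
    """Map an access pattern to a storage tier, mirroring the decision matrix.

    Thresholds are illustrative defaults taken from the table, not provider rules.
    """
    if reads_per_month > 1000 and latency_ok_seconds < 0.05:
        return "Hot / Standard"            # real-time serving, <50ms
    if reads_per_month >= 1:
        return "Infrequent Access"         # weekly/monthly analytics
    if latency_ok_seconds <= 15 * 60:
        return "Archive / Coldline"        # minutes-latency compliance reads
    return "Deep Archive"                  # hours-latency legal retention

print(recommend_tier(5000, 0.02))      # real-time serving → Hot / Standard
print(recommend_tier(50, 1.0))         # weekly analytics → Infrequent Access
print(recommend_tier(0.5, 600))        # quarterly compliance → Archive / Coldline
print(recommend_tier(0, 12 * 3600))    # legal retention → Deep Archive
```

In practice you would drive this function from object-level access metrics (e.g. S3 Storage Lens or GCS usage logs) rather than guesses.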
Lifecycle Automation Strategy
Day 0         Day 30          Day 90           Day 365            Day 2555 (7y)
  │             │               │                │                    │
  ▼             ▼               ▼                ▼                    ▼
┌─────────┐   ┌───────────┐   ┌────────────┐   ┌──────────────┐   ┌─────────┐
│  Hot /  │──>│Infrequent │──>│ Archive /  │──>│ Deep Archive │──>│ DELETE  │
│ Standard│   │  Access   │   │ Glacier IR │   │              │   │         │
└─────────┘   └───────────┘   └────────────┘   └──────────────┘   └─────────┘
 baseline      ~45% cheaper    ~80% cheaper     ~95% cheaper        $0
Key lifecycle rules to implement:
- Incomplete multipart uploads: Delete after 7 days (often forgotten, accumulates cost)
- Previous versions: Move to IA after 30 days, delete after 90 days
- Access logging: Archive after 30 days, deep archive after 180 days
- Data lake raw zone: IA after 60 days, archive after 180 days
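These rules translate directly into a lifecycle configuration. Below is a sketch of the dictionary you could pass to boto3's `put_bucket_lifecycle_configuration`; the day thresholds come from the list above, while the rule IDs and `logs/` / `raw/` prefixes are illustrative placeholders:

```python
# Sketch of an S3 lifecycle configuration implementing the rules above.
# Day thresholds are from the text; IDs and prefixes are placeholders.
lifecycle = {
    "Rules": [
        {   # forgotten multipart uploads quietly accumulate cost
            "ID": "abort-incomplete-multipart",
            "Status": "Enabled",
            "Filter": {},
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        },
        {   # previous object versions: IA at 30d, deleted at 90d
            "ID": "noncurrent-versions",
            "Status": "Enabled",
            "Filter": {},
            "NoncurrentVersionTransitions": [
                {"NoncurrentDays": 30, "StorageClass": "STANDARD_IA"},
            ],
            "NoncurrentVersionExpiration": {"NoncurrentDays": 90},
        },
        {   # access logs: archive at 30d, deep archive at 180d
            "ID": "access-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},   # placeholder prefix
            "Transitions": [
                {"Days": 30, "StorageClass": "GLACIER_IR"},
                {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
            ],
        },
        {   # data lake raw zone: IA at 60d, archive at 180d
            "ID": "raw-zone",
            "Status": "Enabled",
            "Filter": {"Prefix": "raw/"},    # placeholder prefix
            "Transitions": [
                {"Days": 60, "StorageClass": "STANDARD_IA"},
                {"Days": 180, "StorageClass": "GLACIER_IR"},
            ],
        },
    ]
}

# Applied with boto3 (bucket name is a placeholder):
# s3 = boto3.client("s3")
# s3.put_bucket_lifecycle_configuration(
#     Bucket="my-bucket", LifecycleConfiguration=lifecycle)
```

GCP and Azure have equivalent mechanisms (GCS lifecycle rules, Azure blob lifecycle management policies) with the same shape: a condition on object age plus a transition or delete action.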
Cost Savings Estimation
| Data category | Volume | Current tier | Target tier | Monthly before | Monthly after | Savings |
|---|---|---|---|---|---|---|
| Application logs | 5 TB | Standard | IA (30d) + Archive (90d) | $115 | $35 | $80/mo |
| Raw data lake | 20 TB | Standard | IA after 60d | $460 | $250 | $210/mo |
| ML training sets | 10 TB | Standard | IA + prefetch | $230 | $125 | $105/mo |
| Compliance archives | 50 TB | Standard | Deep Archive | $1,150 | $50 | $1,100/mo |
| Database backups | 8 TB | Standard | IA (7d) + Glacier (30d) | $184 | $48 | $136/mo |
| Total | 93 TB | — | — | $2,139 | $508 | $1,631/mo |
That is a 76% reduction on storage alone.
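The table's arithmetic is easy to reproduce and adapt to your own inventory; the before/after figures below are copied straight from the rows above:

```python
# (category, monthly cost before, monthly cost after) from the table above
rows = [
    ("Application logs",     115,  35),
    ("Raw data lake",        460, 250),
    ("ML training sets",     230, 125),
    ("Compliance archives", 1150,  50),
    ("Database backups",     184,  48),
]

before = sum(b for _, b, _ in rows)
after = sum(a for _, _, a in rows)
savings = before - after

print(before, after, savings, f"{savings / before:.0%}")  # → 2139 508 1631 76%
```

Swapping in your own storage inventory report (volume × per-GB rate per category) turns this into a quick first-pass business case.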
Intelligent Tiering: When to Use It
AWS S3 Intelligent-Tiering automatically moves objects between tiers based on access patterns. It charges a small monitoring fee ($0.0025/1000 objects/month) but eliminates the need for manual lifecycle policies.
| Scenario | Use Intelligent-Tiering? | Rationale |
|---|---|---|
| Unpredictable access patterns | Yes | Automatic optimization |
| Well-understood lifecycle | No | Manual rules are cheaper (no monitoring fee) |
| Many small objects | No | Monitoring fee dominates |
| Few large objects | Yes | Monitoring fee negligible |
| Compliance with fixed retention | No | Lifecycle rules + legal hold |
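One way to quantify the "many small objects" rule is a break-even object size: below it, the monitoring fee exceeds the potential Standard-to-IA saving. A rough sketch using the fee quoted above and the assumed per-GB prices from the tier table:

```python
MONITORING_FEE = 0.0025 / 1000   # $/object/month (Intelligent-Tiering, from text)
STANDARD = 0.023                 # $/GB/month, assumed from the tier table
IA = 0.0125                      # $/GB/month, assumed from the tier table

def breakeven_object_kb() -> float:
    """Object size (KB) where the monitoring fee equals the Standard->IA saving."""
    saving_per_gb = STANDARD - IA            # $/GB/month if demoted to IA
    gb = MONITORING_FEE / saving_per_gb      # break-even size in GB
    return gb * 1024 * 1024                  # convert GB -> KB

# Roughly a quarter of a megabyte: well below this, the fee dominates.
# (S3 also excludes objects under 128 KB from automatic tiering.)
print(round(breakeven_object_kb()))  # → 250
```

So buckets full of kilobyte-scale objects are poor Intelligent-Tiering candidates even before considering the 128 KB floor, while multi-megabyte objects make the fee negligible.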
Resources
- AWS S3 Storage Classes
- GCP Cloud Storage Classes
- Azure Blob Access Tiers
- Infracost — Shift-left Cloud Costs