
Storage Tiering & Costs: Optimizing Data Lifecycle Across Clouds

#cloud#storage#finops#data-engineering#cost-optimization

Storage is often the largest and most overlooked line item in cloud bills. The key insight: not all data deserves the same tier. Moving cold data to cheaper tiers and automating lifecycle policies can cut storage costs by 50-70% without impacting performance.

Storage Tier Comparison (per GB/month, approximate 2026 pricing)

| Tier | AWS S3 | GCP GCS | Azure Blob | Access latency | Use case |
|------|--------|---------|------------|----------------|----------|
| Hot / Standard | $0.023 | $0.020 | $0.018 | Milliseconds | Active applications, serving |
| Infrequent Access | $0.0125 | $0.010 | $0.010 | Milliseconds | Monthly reports, backups |
| One Zone IA | $0.010 | — | $0.010 (Cool) | Milliseconds | Reproducible data, logs |
| Archive / Coldline | $0.004 (Glacier IR) | $0.004 | $0.002 (Cold) | Minutes | Compliance, audit trails |
| Deep Archive | $0.00099 | $0.0012 | $0.00099 (Archive) | Hours (12h+) | Legal hold, raw backups |

Retrieval costs matter. Deep archive is cheap to store but expensive to retrieve. A full restore of 10 TB from Glacier Deep Archive costs ~$200 in retrieval fees plus ~$100 in request fees.
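The arithmetic behind that restore estimate can be sketched as follows. The per-GB retrieval rate and per-request rate used here are illustrative assumptions (roughly matching published standard-retrieval pricing), and the object count is hypothetical; check current provider pricing before budgeting.

```python
# Approximate cost to restore 10 TB from a deep-archive tier
# (standard retrieval; rates below are illustrative assumptions).

GB_PER_TB = 1024
restore_gb = 10 * GB_PER_TB

retrieval_per_gb = 0.02            # assumed $/GB for standard retrieval
retrieval_fee = restore_gb * retrieval_per_gb

# Request fees scale with object count; assuming 1 million objects
# at an illustrative $0.10 per 1,000 restore requests:
objects = 1_000_000
request_fee = objects / 1000 * 0.10

print(f"retrieval: ${retrieval_fee:,.0f}, requests: ${request_fee:,.0f}")
```

The takeaway: retrieval cost is dominated by volume, but request fees grow with object count, so millions of small objects can make a restore meaningfully more expensive than the per-GB rate alone suggests.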

Access Pattern Decision Matrix

| Pattern | Reads/month | Latency need | Recommended tier | Savings vs Hot |
|---------|-------------|--------------|------------------|----------------|
| Real-time serving | >1000/object | <50ms | Hot / Standard | Baseline |
| Weekly analytics | 10-100/object | <1s | Infrequent Access | ~45% |
| Monthly reporting | 1-10/object | <1s | Infrequent Access | ~45% |
| Quarterly compliance | <1/object | Minutes OK | Archive / Coldline | ~80% |
| Legal retention | ~0/object | Hours OK | Deep Archive | ~95% |
| ML training data | Burst reads | <1s | IA + caching layer | ~40% |
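The matrix above can be encoded as a simple lookup. This is a minimal sketch: `recommend_tier` is a hypothetical helper, not a cloud-provider API, and the thresholds are taken directly from the table.

```python
# Sketch of the decision matrix as a function. Thresholds come from
# the table above; this is illustrative, not a provider API.

def recommend_tier(reads_per_month: float, latency_ok_seconds: float) -> str:
    """Map an access pattern to a storage tier, per the matrix above."""
    if reads_per_month > 1000 and latency_ok_seconds < 0.05:
        return "Hot / Standard"
    if reads_per_month >= 1:
        return "Infrequent Access"
    if latency_ok_seconds >= 12 * 3600:   # hours-scale retrieval acceptable
        return "Deep Archive"
    return "Archive / Coldline"

print(recommend_tier(5000, 0.02))   # real-time serving -> Hot / Standard
print(recommend_tier(0.5, 600))     # quarterly compliance -> Archive / Coldline
print(recommend_tier(0, 86400))     # legal retention -> Deep Archive
```

A real policy engine would also weigh object size and retrieval fees, but even this crude mapping catches the biggest misplacements (hot data that is never read).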

Lifecycle Automation Strategy

Day 0              Day 30             Day 90             Day 365            Day 2555 (7y)
  │                  │                  │                   │                   │
  ▼                  ▼                  ▼                   ▼                   ▼
┌─────────┐    ┌───────────┐    ┌────────────┐    ┌──────────────┐    ┌─────────┐
│   Hot   │───>│Infrequent │───>│  Archive   │───>│ Deep Archive │───>│ DELETE  │
│ Standard│    │  Access   │    │ Glacier IR │    │              │    │         │
└─────────┘    └───────────┘    └────────────┘    └──────────────┘    └─────────┘
  100%            ~45% cost        ~80% cost         ~95% cost         $0

Key lifecycle rules to implement:

  • Incomplete multipart uploads: Delete after 7 days (often forgotten, accumulates cost)
  • Previous versions: Move to IA after 30 days, delete after 90 days
  • Access logging: Archive after 30 days, deep archive after 180 days
  • Data lake raw zone: IA after 60 days, archive after 180 days
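The first three rules above can be sketched as an S3 lifecycle configuration, in the dict shape that boto3's `put_bucket_lifecycle_configuration` accepts. Day thresholds come from the bullets; the rule IDs and the `logs/` prefix are illustrative.

```python
# Sketch of the lifecycle rules above as an S3 lifecycle configuration.
# Rule IDs and prefixes are illustrative; thresholds match the bullets.

lifecycle = {
    "Rules": [
        {   # clean up abandoned multipart uploads after 7 days
            "ID": "abort-incomplete-multipart",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        },
        {   # previous object versions: IA at 30d, delete at 90d
            "ID": "noncurrent-versions",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            "NoncurrentVersionTransitions": [
                {"NoncurrentDays": 30, "StorageClass": "STANDARD_IA"}
            ],
            "NoncurrentVersionExpiration": {"NoncurrentDays": 90},
        },
        {   # access logs: archive at 30d, deep archive at 180d
            "ID": "access-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "GLACIER_IR"},
                {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
            ],
        },
    ]
}

# Applying it would look like (requires AWS credentials):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-bucket", LifecycleConfiguration=lifecycle)
```

GCP and Azure have equivalent mechanisms (GCS lifecycle rules, Azure blob lifecycle management policies) with the same day-threshold structure.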

Cost Savings Estimation

| Data category | Volume | Current tier | Target tier | Monthly before | Monthly after | Savings |
|---------------|--------|--------------|-------------|----------------|---------------|---------|
| Application logs | 5 TB | Standard | IA (30d) + Archive (90d) | $115 | $35 | $80/mo |
| Raw data lake | 20 TB | Standard | IA after 60d | $460 | $250 | $210/mo |
| ML training sets | 10 TB | Standard | IA + prefetch | $230 | $125 | $105/mo |
| Compliance archives | 50 TB | Standard | Deep Archive | $1,150 | $50 | $1,100/mo |
| Database backups | 8 TB | Standard | IA (7d) + Glacier (30d) | $184 | $48 | $136/mo |
| **Total** | **93 TB** | | | **$2,139** | **$508** | **$1,631/mo** |

That is a 76% reduction on storage alone.
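The totals can be double-checked directly from the per-category figures in the table:

```python
# Recomputing the totals from the savings table (all figures $/month).
before = {"logs": 115, "raw_lake": 460, "ml": 230,
          "compliance": 1150, "backups": 184}
after = {"logs": 35, "raw_lake": 250, "ml": 125,
         "compliance": 50, "backups": 48}

total_before = sum(before.values())   # 2139
total_after = sum(after.values())     # 508
savings = total_before - total_after  # 1631
pct = savings / total_before * 100

print(f"${savings}/mo saved ({pct:.0f}%)")  # -> $1631/mo saved (76%)
```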

Intelligent Tiering: When to Use It

AWS S3 Intelligent-Tiering automatically moves objects between tiers based on access patterns. It charges a small monitoring fee ($0.0025/1000 objects/month) but eliminates the need for manual lifecycle policies.

| Scenario | Use Intelligent-Tiering? | Rationale |
|----------|--------------------------|-----------|
| Unpredictable access patterns | Yes | Automatic optimization |
| Well-understood lifecycle | No | Manual rules are cheaper (no monitoring fee) |
| Many small objects | No | Monitoring fee dominates |
| Few large objects | Yes | Monitoring fee negligible |
| Compliance with fixed retention | No | Lifecycle rules + legal hold |
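The "many small objects" row is easy to quantify. A rough sketch, using the Standard-tier rate from the comparison table and an illustrative fleet of one million 200 KB objects (note that S3 excludes objects below a minimum size, around 128 KB, from Intelligent-Tiering monitoring, so very small objects never benefit at all):

```python
# Monitoring fee vs. storage cost for many small objects.
# Object size and count are illustrative assumptions.

objects = 1_000_000
object_kb = 200
total_gb = objects * object_kb / (1024 * 1024)   # ~191 GB

monitoring = objects / 1000 * 0.0025   # $0.0025 per 1,000 objects/month
storage = total_gb * 0.023             # Standard-tier rate from the table

print(f"storage ${storage:.2f}/mo vs monitoring ${monitoring:.2f}/mo")
```

Here the monitoring fee is more than half the storage bill itself, which is why manual lifecycle rules win for small-object workloads.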
