
Database Sizing & FinOps: Managing Growth, Costs & Capacity

#databases#finops#postgresql#mongodb#cloud#cost-optimization

Databases are often the largest line item in cloud bills — and the hardest to right-size. This guide covers how to estimate, monitor, and optimize database size for both PostgreSQL and MongoDB, with practical FinOps strategies.

Why Database Sizing Matters

| Impact | What happens when you get it wrong |
|---|---|
| Cost | Oversized instances waste money; undersized ones cause outages |
| Performance | When data exceeds RAM, queries hit disk — latency jumps 100x |
| Operations | Large tables slow backups, migrations, and schema changes |
| Compliance | Retaining data beyond policy increases risk and storage costs |

PostgreSQL: Sizing & Monitoring

Table and index sizes

Key queries every DBA should know:

| What to measure | Why |
|---|---|
| Table size (data only) | Understand raw data volume per entity |
| Table total size (data + indexes + TOAST) | Actual disk footprint |
| Index size | Indexes can exceed table size — a sign of over-indexing |
| Bloat (dead tuples) | MVCC overhead — wasted space from updates/deletes |
| Row count | Operational planning, partition decisions |
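
These measurements map onto a handful of catalog queries. A sketch against a hypothetical `orders` table (substitute your own table names):

```sql
-- Data only (heap), excluding indexes and TOAST
SELECT pg_size_pretty(pg_relation_size('orders'));

-- Total on-disk footprint: data + indexes + TOAST
SELECT pg_size_pretty(pg_total_relation_size('orders'));

-- All indexes on the table
SELECT pg_size_pretty(pg_indexes_size('orders'));

-- Dead tuples (bloat indicator) alongside live rows
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

-- Fast approximate row count (avoids a full scan)
SELECT reltuples::bigint FROM pg_class WHERE relname = 'orders';
```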

Typical size reference points

| Table rows | Approx. raw size (100-byte rows) | With indexes (3 B-tree) | Notes |
|---|---|---|---|
| 100 K | ~10 MB | ~40 MB | Fits entirely in RAM, no concerns |
| 1 M | ~100 MB | ~400 MB | Still comfortable in most setups |
| 10 M | ~1 GB | ~4 GB | Monitor query plans, consider partial indexes |
| 100 M | ~10 GB | ~40 GB | Partitioning recommended, VACUUM tuning needed |
| 1 B | ~100 GB | ~400 GB | Requires partitioning, careful index strategy, dedicated hardware |
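
A minimal estimator reproducing the reference points above. It assumes each B-tree index adds roughly the raw data size, which is the table's own heuristic, not a general rule:

```python
def estimate_table_gb(rows: int, row_bytes: int = 100, n_indexes: int = 3):
    """Back-of-envelope PostgreSQL table sizing.

    Assumption (matches the reference table, not a measurement):
    each index weighs roughly as much as the raw data itself,
    so total footprint ~= raw * (1 + n_indexes).
    """
    raw_gb = rows * row_bytes / 1e9
    total_gb = raw_gb * (1 + n_indexes)
    return raw_gb, total_gb

# The 10 M row line of the reference table
raw, total = estimate_table_gb(10_000_000)
print(f"raw ~{raw:.1f} GB, with indexes ~{total:.1f} GB")  # raw ~1.0 GB, with indexes ~4.0 GB
```

Real footprints depend on tuple headers, fill factor, and index key width, so treat the output as an order-of-magnitude check, not a capacity plan.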

Column type storage costs

| Type | Size per value | Notes |
|---|---|---|
| BOOLEAN | 1 byte | |
| SMALLINT | 2 bytes | Use instead of INTEGER when values fit within ±32,767 |
| INTEGER | 4 bytes | |
| BIGINT | 8 bytes | Don't default to BIGINT if INTEGER suffices |
| REAL | 4 bytes | |
| DOUBLE PRECISION | 8 bytes | |
| NUMERIC(p,s) | Variable (5–20+ bytes) | Expensive — use only when exact precision is required |
| TEXT / VARCHAR | 1-byte header + string length | Header grows to 4 bytes for strings over 126 bytes |
| UUID | 16 bytes | 4x the size of INTEGER — consider for distributed systems only |
| TIMESTAMPTZ | 8 bytes | |
| JSONB | Variable | Convenient but expensive — denormalize hot paths to columns |
| ARRAY | Variable | Often better as a separate table for large/queried arrays |
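
These per-value sizes compound at scale. A sketch comparing two layouts of a hypothetical `events` table (alignment padding and the ~23-byte tuple header are ignored, so both numbers are lower bounds):

```python
# Column widths in bytes, taken from the type table above.
WIDE = {"id": 16, "user_id": 8, "kind": 8, "created_at": 8}    # UUID + BIGINTs
NARROW = {"id": 4, "user_id": 4, "kind": 2, "created_at": 8}   # INTEGER/SMALLINT

rows = 100_000_000  # hypothetical 100 M row table
wide_gb = sum(WIDE.values()) * rows / 1e9
narrow_gb = sum(NARROW.values()) * rows / 1e9
print(f"wide: {wide_gb:.1f} GB, narrow: {narrow_gb:.1f} GB")  # wide: 4.0 GB, narrow: 1.8 GB
```

More than half the data footprint here comes purely from type choices, and the same ratio carries over into every index that includes those columns.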

PostgreSQL bloat and MVCC overhead

PostgreSQL's MVCC creates dead tuples on every UPDATE and DELETE. Without proper VACUUM, tables can bloat to 2–5x their logical size.

Real-world example: A 10 GB table with heavy updates can grow to 30–50 GB if autovacuum is misconfigured. This directly impacts:

  • Storage costs (3–5x)
  • Backup size and duration
  • Query performance (scanning dead tuples)
  • Replication lag (more WAL to ship)
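
Why defaults misbehave on big tables: autovacuum triggers when dead tuples exceed `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples` (defaults 50 and 0.2). A quick illustration:

```python
def autovacuum_trigger(reltuples: int, threshold: int = 50, scale_factor: float = 0.2) -> int:
    """Dead tuples that must accumulate before autovacuum fires
    (PostgreSQL's formula with its default settings)."""
    return int(threshold + scale_factor * reltuples)

# With defaults, a 100 M row table tolerates ~20 M dead tuples
# before any vacuum runs; a lower per-table scale factor tightens this.
print(autovacuum_trigger(100_000_000))                     # 20000050
print(autovacuum_trigger(100_000_000, scale_factor=0.01))  # 1000050
```

In practice the fix is per-table, e.g. `ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.01);` (table name hypothetical).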

PostgreSQL FinOps levers

| Lever | Impact | Effort |
|---|---|---|
| Right-size instance | 20–40 % cost reduction typical | Low — monitor CPU/RAM usage, downsize |
| Tune autovacuum | Reduce bloat 2–5x | Medium — adjust per-table settings |
| Use appropriate column types | 10–30 % storage reduction | Medium — audit schema types |
| Partial indexes | 50–90 % smaller indexes | Medium — identify query patterns |
| Partition and drop old data | Up to 80 % storage savings | High — requires schema changes |
| TOAST compression | 50–75 % for large text/JSON | Low — enabled by default, tune threshold |
| Move to ARM instances (Graviton) | 20 % cost savings on AWS | Low — same PostgreSQL, cheaper compute |
| Reserved instances / committed use | 30–60 % vs on-demand | Low — financial commitment |
| Read replicas | Offload reads, smaller primary | Medium — application routing needed |
| Connection pooling (PgBouncer) | Smaller instance sufficient | Medium — deploy and configure pooler |

MongoDB: Sizing & Monitoring

Document and collection sizes

| What to measure | Why |
|---|---|
| Collection data size | Raw document bytes |
| Collection storage size | On-disk size after WiredTiger compression (typically 60–70 % smaller than raw data) |
| Index sizes | Each index costs RAM — must fit in memory for performance |
| Average document size | Detect document growth over time |
| Document count | Operational planning, sharding decisions |

Typical size reference points

| Documents | Raw / compressed size (1 KB avg doc) | Index size (3 indexes) | Notes |
|---|---|---|---|
| 100 K | ~100 MB / ~35 MB | ~50 MB | Fits in RAM |
| 1 M | ~1 GB / ~350 MB | ~500 MB | Comfortable |
| 10 M | ~10 GB / ~3.5 GB | ~5 GB | Monitor index memory, working set |
| 100 M | ~100 GB / ~35 GB | ~50 GB | Consider sharding |
| 1 B | ~1 TB / ~350 GB | ~500 GB | Sharding required, multi-node cluster |
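
A back-of-envelope estimator matching the reference table above. The compression ratio and the per-index fraction are assumptions inferred from the table's rows, not WiredTiger guarantees:

```python
def estimate_collection_gb(docs: int, avg_doc_bytes: int = 1000,
                           compressed_frac: float = 0.35, n_indexes: int = 3):
    """Rough MongoDB collection sizing.

    Assumptions (from the reference table): snappy leaves ~35 % of raw
    size on disk, and each index is ~1/6 of the raw data size.
    """
    raw_gb = docs * avg_doc_bytes / 1e9
    return raw_gb, raw_gb * compressed_frac, raw_gb * n_indexes / 6

raw, on_disk, idx = estimate_collection_gb(10_000_000)
print(f"~{raw:.0f} GB raw, ~{on_disk:.1f} GB on disk, ~{idx:.0f} GB indexes")
# ~10 GB raw, ~3.5 GB on disk, ~5 GB indexes
```

For real numbers, `db.collection.stats()` (or the `collStats` command) reports `size`, `storageSize`, `totalIndexSize`, and `avgObjSize` directly.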

MongoDB storage specifics

WiredTiger compression: MongoDB's default storage engine compresses data (snappy) and indexes (prefix compression). Typical compression ratios:

| Data type | Compression ratio |
|---|---|
| JSON-heavy documents | 3:1 to 5:1 |
| Binary data (images, files) | 1.1:1 to 1.5:1 |
| Highly repetitive data | 5:1 to 10:1 |
| Mixed workloads | 2:1 to 4:1 |

Document size limit: 16 MB per document. If you're near this limit, your schema needs rethinking — use references instead of embedding.

MongoDB schema patterns and their size impact

| Pattern | Size impact | When to use |
|---|---|---|
| Embedding (nested documents) | Larger documents, fewer collections | Data always accessed together, bounded growth |
| Referencing (foreign keys) | Smaller documents, more lookups | Unbounded growth (comments, logs), many-to-many |
| Bucket pattern | Fewer documents, predictable size | Time-series, IoT — group N events per document |
| Subset pattern | Keep hot data small | Large documents where only part is frequently read |
| Outlier pattern | Handle exceptions separately | Most docs small, few are huge (celebrity followers) |
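
The bucket pattern can be sketched in a few lines. Grouping hypothetical per-minute sensor readings into hourly documents cuts document (and index-entry) count 60x; field names here are illustrative, not a standard schema:

```python
from itertools import islice

def bucket(events, size: int = 60):
    """Bucket pattern sketch: fold `size` consecutive events into one
    document holding an array of readings."""
    it = iter(events)
    while chunk := list(islice(it, size)):
        yield {
            "sensor_id": chunk[0]["sensor_id"],
            "start_ts": chunk[0]["ts"],
            "readings": [e["value"] for e in chunk],
        }

# One day of per-minute readings from a single sensor
events = [{"sensor_id": "s1", "ts": t, "value": 20.0} for t in range(1440)]
buckets = list(bucket(events))
print(len(buckets))  # 24 hourly documents instead of 1440
```

Fewer, larger documents also compress better under WiredTiger, since repeated field names appear once per bucket instead of once per event.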

MongoDB FinOps levers

| Lever | Impact | Effort |
|---|---|---|
| Right-size cluster tier | 20–40 % savings | Low — Atlas auto-scaling helps |
| Use appropriate shard key | Even data distribution → fewer nodes needed | High — must choose at creation |
| TTL indexes | Automatic expiration of old data | Low — add TTL index on date field |
| Compact collections | Reclaim space after heavy deletes | Low — run compact or initial sync |
| Schema optimization | 30–60 % size reduction | Medium — shorter field names, correct types |
| Atlas cluster auto-scaling | Pay only for what you use | Low — enable in Atlas |
| Move cold data to Atlas Online Archive | 50–80 % cost savings on old data | Medium — configure archival rules |
| Reserved clusters (Atlas) | 30–50 % vs pay-as-you-go | Low — financial commitment |
| Downgrade from NVMe to standard storage | 20–40 % storage cost savings | Low — if IOPS requirements are low |

Comparative FinOps: PostgreSQL vs MongoDB

| Dimension | PostgreSQL | MongoDB |
|---|---|---|
| Storage efficiency | Row storage + TOAST compression | WiredTiger compression (typically better out-of-box) |
| Bloat risk | High (MVCC dead tuples) | Lower (WiredTiger handles internally) |
| Index overhead | Moderate (B-tree) | Higher (every collection needs _id index minimum) |
| Scaling cost model | Vertical first → expensive at scale | Horizontal (sharding) → cost grows linearly |
| Managed cost (AWS) | RDS/Aurora: instance + storage + IOPS | DocumentDB/Atlas: instance + storage + data transfer |
| Backup cost | RDS snapshots (free up to DB size) | Atlas continuous backup included, or S3 snapshots |
| Licensing | Free (open source) | Free (Community) or paid (Enterprise/Atlas) |

General FinOps Framework for Databases

1. Visibility — Know what you're spending

| Metric | How to track / what to look for |
|---|---|
| Cost per query | Track with APM (Datadog, New Relic) |
| Cost per GB stored | Compare across tiers and providers |
| Cost per environment | Dev/staging often 30–50 % of total — use smaller instances |
| Idle resource cost | Unused replicas, oversized dev databases |

2. Optimization — Reduce waste

Quick wins (week 1):

  • Downsize dev/staging instances to minimum viable
  • Enable auto-pause for non-production databases
  • Delete unused snapshots and backups beyond retention policy
  • Shut down databases in dev environments outside business hours

Medium-term (month 1–3):

  • Right-size production based on actual CPU/memory utilization (target 60–70 %)
  • Implement data retention policies — archive or delete old data
  • Move to reserved/committed pricing for stable workloads
  • Consolidate multiple small databases into fewer managed instances

Strategic (quarter 1–2):

  • Evaluate managed vs self-hosted total cost of ownership
  • Consider multi-tier storage (hot/warm/cold)
  • Implement chargeback — teams see their database costs
  • Automate scaling policies based on workload patterns

3. Governance — Prevent future waste

| Practice | What it prevents |
|---|---|
| Tagging policy | Unattributed costs (enforce team/project/env tags) |
| Provisioning guardrails | Developers spinning up oversized instances |
| Budget alerts | Unexpected cost spikes |
| Quarterly reviews | Cost drift, forgotten resources |
| Architecture reviews for new projects | Wrong database choice costing 10x long-term |

Real-World Sizing Examples

Example 1: SaaS B2B — Customer CRM

| Parameter | Value |
|---|---|
| Customers | 50,000 |
| Avg records per customer | 500 (contacts, deals, activities) |
| Total rows | 25 M |
| Avg row size | 200 bytes |
| Raw data | ~5 GB |
| With indexes (5 B-tree) | ~20 GB |
| Recommended instance | db.r6g.large (2 vCPU, 16 GB RAM) |
| Monthly cost (RDS PostgreSQL) | ~$200/month |
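
The figures above follow from simple multiplication. The jump from ~5 GB raw to ~20 GB assumes the five B-tree indexes roughly triple the footprint on top of the data, an assumption consistent with the table rather than a measurement:

```python
customers, records_per_customer, row_bytes = 50_000, 500, 200

rows = customers * records_per_customer  # total rows
raw_gb = rows * row_bytes / 1e9          # raw data
total_gb = raw_gb * 4                    # assumed 4x with 5 indexes
print(rows, raw_gb, total_gb)            # 25000000 5.0 20.0
```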

Example 2: IoT Platform — Sensor Data

| Parameter | Value |
|---|---|
| Sensors | 10,000 |
| Events per sensor per day | 1,440 (one per minute) |
| Daily events | 14.4 M |
| Monthly events | ~430 M |
| Avg event size | 150 bytes |
| Monthly raw data | ~65 GB |
| Retention | 12 months → ~780 GB |
| Recommended setup | MongoDB sharded cluster (3 shards) or TimescaleDB with partitioning |
| Monthly cost | ~$800–1,500/month (Atlas M40 or equivalent) |
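
A quick check of the arithmetic behind these figures (a 30-day month is assumed):

```python
sensors, events_per_sensor_per_day = 10_000, 1_440

daily = sensors * events_per_sensor_per_day  # 14,400,000 events/day
monthly = daily * 30                         # 432,000,000 events/month
monthly_gb = monthly * 150 / 1e9             # raw bytes at 150 B/event
retained_gb = monthly_gb * 12                # 12-month retention
print(daily, monthly, round(monthly_gb, 1), round(retained_gb))
# 14400000 432000000 64.8 778
```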

Example 3: E-commerce — Product Catalog + Orders

| Parameter | Value |
|---|---|
| Products | 500,000 (variable attributes → MongoDB) |
| Orders | 10 M/year (relational → PostgreSQL) |
| Product catalog size | ~2 GB (MongoDB, compressed) |
| Orders + line items | ~15 GB/year (PostgreSQL) |
| Recommended setup | Polyglot: PostgreSQL for orders + MongoDB for catalog |
| Monthly cost | ~$350/month combined |

Growth Planning

| Growth rate | Action |
|---|---|
| < 10 GB/month | Monitor quarterly, no urgency |
| 10–100 GB/month | Plan partitioning strategy, set up archival |
| 100 GB–1 TB/month | Active capacity planning, consider sharding/horizontal scaling |
| > 1 TB/month | Dedicated capacity team, multi-region strategy, tiered storage |

Rule of thumb: Plan for 2x your projected 12-month data size. Storage is cheap; emergency migrations are not.
