Databases are often the largest line item in cloud bills — and the hardest to right-size. This guide covers how to estimate, monitor, and optimize database size for both PostgreSQL and MongoDB, with practical FinOps strategies.
## Why Database Sizing Matters

| Impact | What happens when you get it wrong |
|---|---|
| Cost | Oversized instances waste money; undersized ones cause outages |
| Performance | When data exceeds RAM, queries hit disk — latency can jump 100x |
| Operations | Large tables slow backups, migrations, and schema changes |
| Compliance | Retaining data beyond policy increases risk and storage costs |
## PostgreSQL: Sizing & Monitoring

### Table and index sizes

Key queries every DBA should know:

| What to measure | Why |
|---|---|
| Table size (data only) | Understand raw data volume per entity |
| Table total size (data + indexes + TOAST) | Actual disk footprint |
| Index size | Indexes can exceed table size — a sign of over-indexing |
| Bloat (dead tuples) | MVCC overhead — wasted space from updates/deletes |
| Row count | Operational planning, partition decisions |
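The measurements above map directly to PostgreSQL's size functions and statistics views. A minimal set of monitoring queries (the table name `orders` is a placeholder):

```sql
-- Data only (excludes indexes and TOAST)
SELECT pg_size_pretty(pg_relation_size('orders'));

-- Total disk footprint: data + indexes + TOAST
SELECT pg_size_pretty(pg_total_relation_size('orders'));

-- Size of all indexes on the table
SELECT pg_size_pretty(pg_indexes_size('orders'));

-- Dead tuples (bloat indicator) per table
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- Fast approximate row count (avoids a full table scan)
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'orders';
```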
### Typical size reference points

| Table rows | Approximate raw size (100-byte rows) | With indexes (3 B-tree) | Notes |
|---|---|---|---|
| 100 K | ~10 MB | ~40 MB | Fits entirely in RAM, no concerns |
| 1 M | ~100 MB | ~400 MB | Still comfortable in most setups |
| 10 M | ~1 GB | ~4 GB | Monitor query plans, consider partial indexes |
| 100 M | ~10 GB | ~40 GB | Partitioning recommended, VACUUM tuning needed |
| 1 B | ~100 GB | ~400 GB | Requires partitioning, careful index strategy, dedicated hardware |
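The reference points above follow simple arithmetic. A back-of-envelope sketch (the 4x index multiplier is an assumption taken from the table, not a measurement; always verify against `pg_total_relation_size()`):

```python
def estimate_table_size(rows: int, row_bytes: int, index_factor: float = 4.0):
    """Back-of-envelope sizing: raw data plus an index multiplier.

    index_factor=4.0 mirrors the reference table above (3 B-tree
    indexes roughly quadrupling the footprint) -- an assumption,
    not a measurement.
    """
    raw = rows * row_bytes
    return raw, raw * index_factor

raw, total = estimate_table_size(10_000_000, 100)
print(f"raw: {raw / 1e9:.1f} GB, with indexes: {total / 1e9:.1f} GB")
# -> raw: 1.0 GB, with indexes: 4.0 GB
```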
### Column type storage costs

| Type | Size per value | Notes |
|---|---|---|
| BOOLEAN | 1 byte | |
| SMALLINT | 2 bytes | Use instead of INT when values fit in ±32,767 |
| INTEGER | 4 bytes | |
| BIGINT | 8 bytes | Don't default to BIGINT if INT suffices |
| REAL | 4 bytes | |
| DOUBLE PRECISION | 8 bytes | |
| NUMERIC(p,s) | Variable (5–20+ bytes) | Expensive — use only when exact precision is required |
| TEXT / VARCHAR | 1-byte header + string length | 4-byte header instead for strings > 126 bytes |
| UUID | 16 bytes | Larger than INT (4 bytes) — consider for distributed systems only |
| TIMESTAMPTZ | 8 bytes | |
| JSONB | Variable | Convenient but expensive — denormalize hot paths to columns |
| ARRAY | Variable | Often better as a separate table for large/queried arrays |
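Type choices compound at scale. Illustrative arithmetic for a hypothetical 100M-row table, using the per-value sizes above:

```python
ROWS = 100_000_000  # hypothetical 100M-row table

# Per-value sizes from the table above (bytes)
savings_bigint_to_int = ROWS * (8 - 4)   # BIGINT -> INTEGER
savings_uuid_to_int = ROWS * (16 - 4)    # UUID -> INTEGER key

print(f"BIGINT -> INTEGER: saves {savings_bigint_to_int / 1e9:.1f} GB")
print(f"UUID   -> INTEGER: saves {savings_uuid_to_int / 1e9:.1f} GB")
```

These figures are per column and before index amplification: every index on that column repeats the key, so the real savings are usually larger.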
### PostgreSQL bloat and MVCC overhead
PostgreSQL's MVCC creates dead tuples on every UPDATE and DELETE. Without proper VACUUM, tables can bloat to 2–5x their logical size.
Real-world example: A 10 GB table with heavy updates can grow to 30–50 GB if autovacuum is misconfigured. This directly impacts:
- Storage costs (3–5x)
- Backup size and duration
- Query performance (scanning dead tuples)
- Replication lag (more WAL to ship)
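To catch bloat early, check the dead-tuple ratio and tune autovacuum per table. The thresholds below are illustrative starting points for a hot, update-heavy table, not universal defaults:

```sql
-- Dead-tuple ratio per table (rough bloat indicator)
SELECT relname,
       n_dead_tup,
       round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1)
         AS dead_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

-- Make autovacuum more aggressive on an update-heavy table
-- (values are illustrative starting points)
ALTER TABLE orders SET (
  autovacuum_vacuum_scale_factor = 0.01,  -- vacuum after ~1% dead rows
  autovacuum_vacuum_cost_delay = 2        -- milliseconds
);
```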
### PostgreSQL FinOps levers

| Lever | Impact | Effort |
|---|---|---|
| Right-size instance | 20–40 % cost reduction typical | Low — monitor CPU/RAM usage, downsize |
| Tune autovacuum | Reduce bloat 2–5x | Medium — adjust per-table settings |
| Use appropriate column types | 10–30 % storage reduction | Medium — audit schema types |
| Partial indexes | 50–90 % smaller indexes | Medium — identify query patterns |
| Partition and drop old data | Up to 80 % storage savings | High — requires schema changes |
| TOAST compression | 50–75 % for large text/JSON | Low — enabled by default, tune threshold |
| Move to ARM instances (Graviton) | 20 % cost savings on AWS | Low — same PostgreSQL, cheaper compute |
| Reserved instances / committed use | 30–60 % vs on-demand | Low — financial commitment |
| Read replicas | Offload reads, smaller primary | Medium — application routing needed |
| Connection pooling (PgBouncer) | Smaller instance sufficient | Medium — deploy and configure pooler |
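The connection-pooling lever usually comes down to a small PgBouncer config. A minimal sketch (hostnames and pool sizes are placeholders; authentication settings omitted for brevity):

```ini
; Minimal PgBouncer sketch -- hostnames and sizes are placeholders
[databases]
appdb = host=db.internal port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
pool_mode = transaction      ; reuse server connections per transaction
max_client_conn = 2000       ; many application connections...
default_pool_size = 20       ; ...funneled into few server connections
```

Because each PostgreSQL backend costs several MB of RAM, collapsing thousands of client connections into a few dozen server connections often lets you run a smaller instance.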
## MongoDB: Sizing & Monitoring

### Document and collection sizes

| What to measure | Why |
|---|---|
| Collection data size | Raw document bytes |
| Collection storage size | On-disk with WiredTiger compression (often 30–50 % of data size) |
| Index sizes | Each index costs RAM — must fit in memory for performance |
| Average document size | Detect document growth over time |
| Document count | Operational planning, sharding decisions |
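In mongosh, all of these come from collection stats (the collection name `events` is a placeholder; requires a live connection):

```javascript
// Run in mongosh; collection name "events" is a placeholder
const s = db.events.stats();
print(s.size);            // raw document bytes (data size)
print(s.storageSize);     // on-disk size after WiredTiger compression
print(s.totalIndexSize);  // sum of all index sizes -- should fit in RAM
print(s.avgObjSize);      // average document size in bytes
print(db.events.estimatedDocumentCount());
```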
### Typical size reference points

| Documents | Avg doc size 1 KB | With 3 indexes | Notes |
|---|---|---|---|
| 100 K | ~100 MB (raw) / ~35 MB (compressed) | ~50 MB indexes | Fits in RAM |
| 1 M | ~1 GB / ~350 MB | ~500 MB | Comfortable |
| 10 M | ~10 GB / ~3.5 GB | ~5 GB | Monitor index memory, working set |
| 100 M | ~100 GB / ~35 GB | ~50 GB | Consider sharding |
| 1 B | ~1 TB / ~350 GB | ~500 GB | Sharding required, multi-node cluster |
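A rough estimator matching the reference table (the 3:1 compression ratio assumes JSON-heavy data under snappy, and 500 index bytes per document assumes 3 typical indexes; both are assumptions to validate against `collStats` on real data):

```python
def estimate_collection_size(docs: int, avg_doc_bytes: int,
                             compression_ratio: float = 3.0,
                             index_bytes_per_doc: int = 500):
    """Rough MongoDB sizing sketch matching the reference table above.

    compression_ratio=3.0 assumes JSON-heavy data under snappy;
    index_bytes_per_doc=500 assumes 3 typical indexes. Both are
    assumptions -- validate against collStats on real data.
    """
    raw = docs * avg_doc_bytes
    on_disk = raw / compression_ratio
    indexes = docs * index_bytes_per_doc
    return raw, on_disk, indexes

raw, disk, idx = estimate_collection_size(10_000_000, 1_000)
print(f"raw {raw/1e9:.1f} GB, on disk {disk/1e9:.1f} GB, indexes {idx/1e9:.1f} GB")
```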
### MongoDB storage specifics

WiredTiger compression: MongoDB's default storage engine compresses data (snappy) and indexes (prefix compression). Typical compression ratios:

| Data type | Compression ratio |
|---|---|
| JSON-heavy documents | 3:1 to 5:1 |
| Binary data (images, files) | 1.1:1 to 1.5:1 |
| Highly repetitive data | 5:1 to 10:1 |
| Mixed workloads | 2:1 to 4:1 |
Document size limit: 16 MB per document. If you're near this limit, your schema needs rethinking — use references instead of embedding.
### MongoDB schema patterns and their size impact

| Pattern | Size impact | When to use |
|---|---|---|
| Embedding (nested documents) | Larger documents, fewer collections | Data always accessed together, bounded growth |
| Referencing (foreign keys) | Smaller documents, more lookups | Unbounded growth (comments, logs), many-to-many |
| Bucket pattern | Fewer documents, predictable size | Time-series, IoT — group N events per document |
| Subset pattern | Keep hot data small | Large documents where only part is frequently read |
| Outlier pattern | Handle exceptions separately | Most docs small, few are huge (celebrity followers) |
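For time-series data, the bucket pattern might look like this: one document per sensor per hour instead of one per reading. All field names here are illustrative (shown as a mongosh document literal):

```javascript
// Bucket pattern sketch: one document per sensor per hour
// (all field names are illustrative)
{
  sensor_id: "s-1042",
  bucket_start: ISODate("2024-01-15T10:00:00Z"),
  count: 60,
  readings: [
    { t: ISODate("2024-01-15T10:00:00Z"), temp: 21.4 },
    { t: ISODate("2024-01-15T10:01:00Z"), temp: 21.5 }
    // ... up to 60 readings per bucket
  ]
}
```

With 60 readings per bucket you store roughly 60x fewer documents, which shrinks the `_id` index and amortizes per-document overhead across the bucket.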
### MongoDB FinOps levers

| Lever | Impact | Effort |
|---|---|---|
| Right-size cluster tier | 20–40 % savings | Low — Atlas auto-scaling helps |
| Use appropriate shard key | Even data distribution → fewer nodes needed | High — must choose at creation |
| TTL indexes | Automatic expiration of old data | Low — add TTL index on date field |
| Compact collections | Reclaim space after heavy deletes | Low — run compact or initial sync |
| Schema optimization | 30–60 % size reduction | Medium — shorter field names, correct types |
| Atlas cluster auto-scaling | Pay only for what you use | Low — enable in Atlas |
| Move cold data to Atlas Online Archive | 50–80 % cost savings on old data | Medium — configure archival rules |
| Reserved clusters (Atlas) | 30–50 % vs pay-as-you-go | Low — financial commitment |
| Downgrade from NVMe to standard storage | 20–40 % storage cost savings | Low — if IOPS requirements are low |
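The TTL lever above is a single index definition in mongosh (collection and field names are placeholders):

```javascript
// Expire documents 30 days (2,592,000 s) after their createdAt timestamp
// (collection and field names are placeholders)
db.logs.createIndex({ createdAt: 1 }, { expireAfterSeconds: 2592000 });
```

A background thread then deletes expired documents automatically, so old data never accumulates into a storage bill.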
## Comparative FinOps: PostgreSQL vs MongoDB

| Dimension | PostgreSQL | MongoDB |
|---|---|---|
| Storage efficiency | Row storage + TOAST compression | WiredTiger compression (typically better out-of-box) |
| Bloat risk | High (MVCC dead tuples) | Lower (WiredTiger handles internally) |
| Index overhead | Moderate (B-tree) | Higher (every collection needs _id index minimum) |
| Scaling cost model | Vertical first → expensive at scale | Horizontal (sharding) → cost grows linearly |
| Managed cost (AWS) | RDS/Aurora: instance + storage + IOPS | DocumentDB/Atlas: instance + storage + data transfer |
| Backup cost | RDS snapshots (free up to DB size) | Atlas continuous backup included, or S3 snapshots |
| Licensing | Free (open source) | Free (Community) or paid (Enterprise/Atlas) |
## General FinOps Framework for Databases

### 1. Visibility — Know what you're spending

| Metric | How to track / target |
|---|---|
| Cost per query | Track with APM (Datadog, New Relic) |
| Cost per GB stored | Compare across tiers and providers |
| Cost per environment | Dev/staging often 30–50 % of total — use smaller instances |
| Idle resource cost | Unused replicas, oversized dev databases |
### 2. Optimization — Reduce waste
Quick wins (week 1):
- Downsize dev/staging instances to minimum viable
- Enable auto-pause for non-production databases
- Delete unused snapshots and backups beyond retention policy
- Shut down databases in dev environments outside business hours
Medium-term (month 1–3):
- Right-size production based on actual CPU/memory utilization (target 60–70 %)
- Implement data retention policies — archive or delete old data
- Move to reserved/committed pricing for stable workloads
- Consolidate multiple small databases into fewer managed instances
Strategic (quarter 1–2):
- Evaluate managed vs self-hosted total cost of ownership
- Consider multi-tier storage (hot/warm/cold)
- Implement chargeback — teams see their database costs
- Automate scaling policies based on workload patterns
### 3. Governance — Prevent future waste

| Practice | What it prevents |
|---|---|
| Tagging policy | Unattributed costs (enforce team/project/env tags) |
| Provisioning guardrails | Developers spinning up oversized instances |
| Budget alerts | Unexpected cost spikes |
| Quarterly reviews | Cost drift, forgotten resources |
| Architecture reviews for new projects | Wrong database choice costing 10x long-term |
## Real-World Sizing Examples

### Example 1: SaaS B2B — Customer CRM

| Parameter | Value |
|---|---|
| Customers | 50,000 |
| Avg records per customer | 500 (contacts, deals, activities) |
| Total rows | 25 M |
| Avg row size | 200 bytes |
| Raw data | ~5 GB |
| With indexes (5 B-tree) | ~20 GB |
| Recommended instance | db.r6g.large (2 vCPU, 16 GB RAM) |
| Monthly cost (RDS PostgreSQL) | ~$200/month |
### Example 2: IoT Platform — Sensor Data

| Parameter | Value |
|---|---|
| Sensors | 10,000 |
| Events per sensor per day | 1,440 (one per minute) |
| Daily events | 14.4 M |
| Monthly events | ~430 M |
| Avg event size | 150 bytes |
| Monthly raw data | ~65 GB |
| Retention | 12 months → ~780 GB |
| Recommended setup | MongoDB sharded cluster (3 shards) or TimescaleDB with partitioning |
| Monthly cost | ~$800–1,500/month (Atlas M40 or equivalent) |
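The IoT numbers above can be reproduced with straightforward arithmetic, which is worth scripting so the estimate updates as sensor counts grow:

```python
sensors = 10_000
events_per_sensor_per_day = 1_440   # one per minute
event_bytes = 150
retention_months = 12

daily = sensors * events_per_sensor_per_day    # events/day
monthly = daily * 30                           # events/month
monthly_gb = monthly * event_bytes / 1e9       # GB/month
retained_gb = monthly_gb * retention_months    # GB retained

print(f"{daily/1e6:.1f}M events/day, {monthly/1e6:.0f}M events/month")
print(f"{monthly_gb:.0f} GB/month, {retained_gb:.0f} GB retained")
```

Note these are raw figures; indexes and replication add on top, while compression (TimescaleDB or WiredTiger) subtracts, so the on-disk total can differ by 2–3x in either direction.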
### Example 3: E-commerce — Product Catalog + Orders

| Parameter | Value |
|---|---|
| Products | 500,000 (variable attributes → MongoDB) |
| Orders | 10 M/year (relational → PostgreSQL) |
| Product catalog size | ~2 GB (MongoDB, compressed) |
| Orders + line items | ~15 GB/year (PostgreSQL) |
| Recommended setup | Polyglot: PostgreSQL for orders + MongoDB for catalog |
| Monthly cost | ~$350/month combined |
## Growth Planning

| Growth rate | Action |
|---|---|
| < 10 GB/month | Monitor quarterly, no urgency |
| 10–100 GB/month | Plan partitioning strategy, set up archival |
| 100 GB–1 TB/month | Active capacity planning, consider sharding/horizontal scaling |
| > 1 TB/month | Dedicated capacity team, multi-region strategy, tiered storage |
Rule of thumb: Plan for 2x your projected 12-month data size. Storage is cheap; emergency migrations are not.