
Database Optimization: Indexing, Query Plans, Partitioning & Caching

#databases#sql#performance#data-engineering#devops

A database is only as fast as the schema and queries you give it. This guide covers the core techniques for squeezing maximum performance out of any database — from index design to connection pooling.

Indexing Strategies

Indexes are the single most impactful performance lever. A missing index turns a 2 ms query into a 20 second full table scan.

B-Tree Indexes (the default)

B-trees are the workhorse index. They maintain sorted data and support equality, range queries, and prefix matching.

-- Equality lookup
CREATE INDEX idx_users_email ON users(email);
SELECT * FROM users WHERE email = 'alice@example.com';  -- index seek: O(log n)

-- Range query
CREATE INDEX idx_orders_date ON orders(created_at);
SELECT * FROM orders WHERE created_at > '2026-01-01';  -- index range scan

How B-trees work internally:

                    ┌───────────────┐
                    │  [50 | 100]   │     ← root (2 keys, 3 pointers)
                    └──┬────┬────┬──┘
                       │    │    │
              ┌────────┘    │    └────────┐
              ▼             ▼             ▼
        ┌──────────┐ ┌──────────┐ ┌──────────┐
        │ [10|30]  │ │ [60|80]  │ │[120|150] │  ← internal nodes
        └─┬──┬──┬──┘ └─┬──┬──┬──┘ └─┬──┬──┬──┘
          │  │  │       │  │  │       │  │  │
          ▼  ▼  ▼       ▼  ▼  ▼       ▼  ▼  ▼
        [leaf pages with actual row pointers]

Pages are typically 8–16 KB. A 3-level B-tree can index billions of rows with only 3 disk reads per lookup.
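The arithmetic behind that claim is easy to check. A sketch, assuming a fan-out of roughly 1,000 index entries per 8 KB page (the exact number depends on key size):

```python
def max_indexed_rows(fanout: int, levels: int) -> int:
    """Rows reachable by a B-tree of the given height: each level
    multiplies the reach by the fan-out (pointers per page)."""
    return fanout ** levels

# ~1,000 entries per page, 3 levels: a billion rows in 3 page reads
print(max_indexed_rows(1000, 3))  # 1000000000
```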

Composite (Multi-Column) Indexes

Column order matters enormously. The leftmost prefix rule applies:

CREATE INDEX idx_orders_multi ON orders(customer_id, status, created_at);

-- ✅ Uses index (leftmost prefix)
SELECT * FROM orders WHERE customer_id = 42;
SELECT * FROM orders WHERE customer_id = 42 AND status = 'shipped';
SELECT * FROM orders WHERE customer_id = 42 AND status = 'shipped' AND created_at > '2026-01-01';

-- ❌ Cannot use index (skips customer_id)
SELECT * FROM orders WHERE status = 'shipped';
SELECT * FROM orders WHERE created_at > '2026-01-01';

Rules for column ordering:

  1. Equality columns first — columns used with =
  2. Range column last — the column used with >, <, BETWEEN
  3. High selectivity first — columns that filter out the most rows
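The leftmost-prefix rule can be sketched as a simple walk over the index columns (a toy model of the rule, not what any planner actually runs):

```python
def usable_prefix(index_cols, filtered_cols):
    """Columns of a composite index a query can use: walk left to
    right, stopping at the first column the query doesn't filter on."""
    usable = []
    for col in index_cols:
        if col not in filtered_cols:
            break
        usable.append(col)
    return usable

idx = ["customer_id", "status", "created_at"]
print(usable_prefix(idx, {"customer_id", "status"}))  # ['customer_id', 'status']
print(usable_prefix(idx, {"status", "created_at"}))   # [] (skips customer_id)
```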

Covering Indexes

An index that contains all columns needed by a query — the database never touches the table.

-- If your query is:
SELECT customer_id, status, total FROM orders WHERE customer_id = 42 AND status = 'shipped';

-- A covering index:
CREATE INDEX idx_orders_covering ON orders(customer_id, status) INCLUDE (total);
-- PostgreSQL INCLUDE avoids bloating the B-tree keys while still covering the query

This is an index-only scan — dramatically faster for read-heavy workloads.

Hash Indexes

O(1) exact-match lookups, but no range queries, no ordering, no prefix matching.

-- PostgreSQL hash index
CREATE INDEX idx_sessions_token ON sessions USING hash(token);
-- Only useful for: WHERE token = 'abc123'

Use case: session tokens, UUID lookups, exact-match-only patterns.

GIN & GiST Indexes (PostgreSQL)

| Type | For | Example |
| --- | --- | --- |
| GIN (Generalized Inverted Index) | Arrays, JSONB, full-text search | CREATE INDEX ON docs USING gin(metadata jsonb_path_ops) |
| GiST (Generalized Search Tree) | Geometric data, ranges, nearest-neighbor | CREATE INDEX ON places USING gist(location) |
| BRIN (Block Range Index) | Naturally ordered data (timestamps) | CREATE INDEX ON events USING brin(created_at) — tiny index, huge table |

Partial Indexes

Index only the rows that matter:

-- Only index active orders (80% of queries, 5% of rows)
CREATE INDEX idx_active_orders ON orders(customer_id, created_at)
WHERE status != 'archived';

Smaller index → faster lookups, less storage, faster writes.

Index Anti-Patterns

| Anti-pattern | Problem | Fix |
| --- | --- | --- |
| Index on every column | Write amplification, storage bloat | Index based on actual query patterns |
| Missing composite index | Multiple single-column indexes are much slower | Create one multi-column index |
| Wrong column order | Index exists but can't be used | Equality first, range last |
| Index on low-cardinality column | WHERE gender = 'M' still scans 50% of rows | Only index selective columns or use partial indexes |
| Never analyzing | Stale statistics → bad query plans | Run ANALYZE regularly (or enable autovacuum in PostgreSQL) |

Reading Query Plans

If you can read an EXPLAIN output, you can fix 90% of slow queries.

PostgreSQL EXPLAIN

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, c.name, o.total
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'pending'
  AND o.created_at > '2026-01-01';

Nested Loop  (cost=0.85..124.50 rows=15 width=52) (actual time=0.05..0.30 rows=12 loops=1)
  Buffers: shared hit=48
  ->  Index Scan using idx_orders_status_date on orders o  (cost=0.43..62.15 rows=15 width=20)
        Index Cond: ((status = 'pending') AND (created_at > '2026-01-01'))
        Buffers: shared hit=12
  ->  Index Scan using customers_pkey on customers c  (cost=0.42..4.15 rows=1 width=36)
        Index Cond: (id = o.customer_id)
        Buffers: shared hit=36
Planning Time: 0.15 ms
Execution Time: 0.35 ms

Key things to look for

| What you see | What it means | Action |
| --- | --- | --- |
| Seq Scan on a large table | Full table scan, no index used | Add an index or check if one exists but isn't being used |
| actual rows ≫ rows (estimated) | Statistics are stale | Run ANALYZE tablename |
| Nested Loop with high outer rows | O(n×m) join | Consider a hash join (increase work_mem) or add an index on the join column |
| Sort with high cost | Sorting in memory/disk | Add an index that provides the order, or increase work_mem |
| Buffers: shared read (not hit) | Data not in cache, disk I/O | Increase shared_buffers or check whether the working set fits in RAM |
| Rows Removed by Filter ≫ actual rows | Index is too broad | Use a more selective index or add conditions to the index |

MySQL EXPLAIN

EXPLAIN
SELECT * FROM orders WHERE customer_id = 42 AND status = 'pending';

Key columns in the tabular output: type (ALL = full scan, ref = index lookup, const = PK lookup), key (which index was chosen), rows (estimated rows examined), and Extra (Using filesort, Using temporary = red flags). MySQL 8.0.16+ also supports EXPLAIN FORMAT=TREE, which prints a PostgreSQL-style plan tree.

Partitioning

Split large tables into smaller, more manageable pieces.

Horizontal Partitioning

Each partition contains a subset of rows. (When the partitions live on separate servers, the same idea is called sharding; the example below partitions within a single PostgreSQL instance.)

-- PostgreSQL declarative partitioning
CREATE TABLE events (
    id         BIGSERIAL,
    created_at TIMESTAMPTZ NOT NULL,
    event_type TEXT,
    payload    JSONB
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2026_q1 PARTITION OF events
    FOR VALUES FROM ('2026-01-01') TO ('2026-04-01');

CREATE TABLE events_2026_q2 PARTITION OF events
    FOR VALUES FROM ('2026-04-01') TO ('2026-07-01');

-- Queries automatically route to the right partition
SELECT * FROM events WHERE created_at = '2026-02-15';  -- only scans events_2026_q1
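Under the hood, range routing is a binary search over the ordered partition bounds. A minimal sketch (partition names here are hypothetical):

```python
from bisect import bisect_right
from datetime import date

# Ordered lower bounds; each partition covers [bound, next_bound)
bounds = [date(2026, 1, 1), date(2026, 4, 1), date(2026, 7, 1)]
names = ["events_default", "events_2026_q1", "events_2026_q2", "events_future"]

def route(created_at: date) -> str:
    """Pick the partition whose range contains created_at."""
    return names[bisect_right(bounds, created_at)]

print(route(date(2026, 2, 15)))  # events_2026_q1
print(route(date(2026, 4, 1)))   # events_2026_q2 (lower bound is inclusive)
```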

Partition strategies

| Strategy | How it works | Best for |
| --- | --- | --- |
| Range | Partition by value ranges (dates, IDs) | Time-series data, log tables |
| List | Partition by discrete values | Multi-tenant (by tenant_id), by region |
| Hash | Partition by hash of column | Even distribution when no natural range |

When to partition

  • Table exceeds hundreds of millions of rows
  • Queries always filter on the partition key (partition pruning works)
  • You need to drop old data cheaply (drop partition instead of DELETE)
  • Maintenance operations (VACUUM, REINDEX) on the full table are too slow

When NOT to partition

  • Table has fewer than ~10M rows (index is sufficient)
  • Queries don't filter on the partition key (all partitions scanned = worse than no partitioning)
  • You'd need to query across all partitions frequently

Connection Pooling

Opening a database connection is expensive (TCP handshake, TLS, authentication, process creation in PostgreSQL). Connection pooling reuses existing connections.

PgBouncer (PostgreSQL)

; pgbouncer.ini
[databases]
mydb = host=127.0.0.1 dbname=mydb

[pgbouncer]
listen_port = 6432
pool_mode = transaction      ; release connection after each transaction
max_client_conn = 1000       ; clients can open 1000 connections
default_pool_size = 20       ; only 20 actual PostgreSQL connections

| Pool mode | Connection released | Use case |
| --- | --- | --- |
| session | When client disconnects | Legacy apps, LISTEN/NOTIFY |
| transaction | After each transaction commits | Most applications (recommended) |
| statement | After each statement | Only for simple autocommit workloads |
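The core mechanism of any pooler can be sketched with a bounded queue. A toy that ignores health checks, reconnects, and pool modes; `connect` stands in for the driver's connect call:

```python
import queue

class Pool:
    """Toy connection pool: pre-open N connections, hand them out, reuse."""
    def __init__(self, connect, size: int):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())      # pay the connection cost once, up front

    def acquire(self, timeout: float = 30):
        # Block until a connection frees up instead of opening a new one
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        self._idle.put(conn)

pool = Pool(connect=object, size=2)
conn = pool.acquire()
pool.release(conn)   # the same connection is handed to the next caller
```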

Application-level pooling

Most ORMs and drivers support pooling:

# SQLAlchemy connection pool
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql://user:pass@localhost/db",
    pool_size=20,          # steady-state connections
    max_overflow=10,       # burst connections above pool_size
    pool_timeout=30,       # wait before raising error
    pool_recycle=1800,     # recycle connections after 30 min
)

Serverless gotcha

Lambda/Cloud Functions can create thousands of connections. Solutions:

  • RDS Proxy (AWS) — managed connection pooler for RDS/Aurora
  • PgBouncer on ECS — self-hosted pooler
  • Neon / Supabase — built-in connection pooling
  • Prisma Data Proxy — application-level pooling for serverless

Caching Strategies

Cache-Aside (Lazy Loading)

def get_user(user_id):
    # 1. Check cache
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)

    # 2. Cache miss → query database
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)

    # 3. Populate cache
    redis.setex(f"user:{user_id}", 3600, json.dumps(user))  # TTL 1 hour
    return user

Pros: Only caches what's actually read. Cache failure doesn't break the app.
Cons: First request is always a miss. Stale data until TTL expires.

Write-Through

def update_user(user_id, data):
    # 1. Write to database
    db.execute("UPDATE users SET ... WHERE id = %s", user_id)
    # 2. Immediately update cache
    redis.setex(f"user:{user_id}", 3600, json.dumps(data))

Pros: Cache is always up to date.
Cons: Writes are slower (two writes). Cache may hold data that's never read.

Write-Behind (Write-Back)

Write to cache first, asynchronously write to database later. Maximum write performance but risk of data loss if cache crashes before flush.
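A minimal write-behind sketch (a synchronous toy; real systems flush from a background worker, and the comment marks the loss window):

```python
cache, db, dirty = {}, {}, []   # in-memory stand-ins for Redis and the database

def write(key, value):
    cache[key] = value          # fast path: cache only
    dirty.append((key, value))  # queued for the database

def flush():
    # In production this runs asynchronously; anything still in `dirty`
    # when the cache process dies is lost.
    while dirty:
        key, value = dirty.pop(0)
        db[key] = value

write("user:1", {"name": "alice"})
assert "user:1" in cache and "user:1" not in db  # durable only after flush
flush()
```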

Cache Invalidation Patterns

| Pattern | How | Trade-off |
| --- | --- | --- |
| TTL | Expire after fixed time | Simple but stale for up to the TTL duration |
| Event-driven | Invalidate on write events (CDC, pub/sub) | Fresh, but complex to implement |
| Version tag | Append version to cache key | No stale reads, but requires version tracking |
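The version-tag pattern fits in a few lines. In practice the counter would live in Redis (via INCR); this toy keeps it in a dict:

```python
versions = {}  # entity id -> current version counter

def cache_key(user_id: int) -> str:
    """Embed the version in the key; bumping it orphans old entries."""
    return f"user:{user_id}:v{versions.get(user_id, 0)}"

def invalidate(user_id: int):
    versions[user_id] = versions.get(user_id, 0) + 1

print(cache_key(42))   # user:42:v0
invalidate(42)
print(cache_key(42))   # user:42:v1 (the v0 entry is never read again; TTL reaps it)
```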

"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton

Query Optimization Techniques

Avoid SELECT *

-- ❌ Fetches all columns, can't use covering index
SELECT * FROM orders WHERE customer_id = 42;

-- ✅ Fetch only needed columns
SELECT id, total, status FROM orders WHERE customer_id = 42;

Use EXISTS instead of COUNT for existence checks

-- ❌ Counts all matching rows
SELECT COUNT(*) > 0 FROM orders WHERE customer_id = 42 AND status = 'pending';

-- ✅ Stops at first match
SELECT EXISTS(SELECT 1 FROM orders WHERE customer_id = 42 AND status = 'pending');

Batch operations

# ❌ N+1 queries
for user_id in user_ids:
    db.query("SELECT * FROM users WHERE id = %s", user_id)

# ✅ Single query
db.query("SELECT * FROM users WHERE id = ANY(%s)", user_ids)

Use CTEs for readability, subqueries for performance

In PostgreSQL < 12, CTEs are optimization fences — the planner cannot push predicates into them. From PostgreSQL 12 onward, non-recursive CTEs that are referenced only once are inlined automatically. Either way, check the query plan.

Pagination: keyset vs OFFSET

-- ❌ OFFSET: database reads and discards N rows
SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 10000;  -- reads 10,020 rows

-- ✅ Keyset: seek directly to position
SELECT * FROM orders WHERE id > 10000 ORDER BY id LIMIT 20;  -- reads 20 rows
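The difference is easy to see in a toy model where the table is a sorted list of ids (keyset seeks with a binary search, OFFSET walks and discards):

```python
from bisect import bisect_right

ids = list(range(1, 101))  # stand-in for the orders table, ordered by id

def offset_page(offset: int, limit: int):
    # The slice hides it, but the database still walked `offset` rows
    return ids[offset:offset + limit]

def keyset_page(after_id: int, limit: int):
    start = bisect_right(ids, after_id)  # the WHERE id > ? index seek
    return ids[start:start + limit]

print(keyset_page(after_id=0, limit=5))    # [1, 2, 3, 4, 5]
print(keyset_page(after_id=20, limit=5))   # [21, 22, 23, 24, 25]
```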

Replication Topologies

Primary-Replica (Read Replicas)

     Writes           Reads
       │          ┌────────────┐
       ▼          │            ▼
  ┌─────────┐    │    ┌──────────────┐
  │ Primary │────┘    │  Replica 1   │
  │         │────┐    └──────────────┘
  └─────────┘    │    ┌──────────────┐
                 └───►│  Replica 2   │
                      └──────────────┘

Sync replication: primary waits for replica ACK → zero data loss, higher latency.
Async replication: primary doesn't wait → lower latency, possible data loss on failover.

Multi-Primary

Both nodes accept writes. Requires conflict resolution (last-writer-wins, CRDTs, application-level merge).

Used by: Galera Cluster (MySQL), BDR (PostgreSQL), CockroachDB, DynamoDB Global Tables.

Vacuum & Maintenance (PostgreSQL)

PostgreSQL's MVCC creates dead tuples on UPDATE/DELETE. VACUUM reclaims space.

-- Check bloat
SELECT schemaname, tablename, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

Autovacuum tuning

-- For a high-write table, make autovacuum more aggressive
ALTER TABLE events SET (
    autovacuum_vacuum_scale_factor = 0.01,    -- trigger at 1% dead tuples (default 20%)
    autovacuum_vacuum_cost_delay = 2          -- minimal throttling (default 2 ms since PostgreSQL 12, 20 ms before)
);

pg_stat_statements — find slow queries

-- Top 10 queries by total time
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

Performance Checklist

| Area | Check | Tool |
| --- | --- | --- |
| Indexes | All WHERE/JOIN columns indexed | EXPLAIN ANALYZE, pg_stat_user_indexes |
| Query plans | No unexpected Seq Scans | EXPLAIN (ANALYZE, BUFFERS) |
| N+1 queries | ORMs not generating per-row queries | Application logs, pg_stat_statements |
| Connection pool | Pool size matches workload | PgBouncer stats, connection count monitoring |
| Cache hit ratio | > 99% for OLTP | SELECT ... FROM pg_stat_database |
| Bloat | Dead tuples under control | pg_stat_user_tables, pgstattuple |
| Lock contention | No long-running transactions blocking others | pg_stat_activity, pg_locks |
| Memory | shared_buffers ≈ 25% RAM, work_mem tuned | PostgreSQL config |
