tadata
Back to home

Schema Evolution: Managing Change Without Breaking Pipelines

#data-engineering#schema#databases#migration#governance

Schema changes are inevitable. Business requirements shift, new data sources appear, and models get refactored. The question is not whether schemas will change, but whether those changes will break downstream consumers or flow safely through the system.

Schema Compatibility Matrix

Compatibility TypeWhat ChangedOld Readers → New DataNew Readers → Old DataSafe Operations
BackwardNew schema can read old dataMay breakWorksRemove fields, add optional fields
ForwardOld schema can read new dataWorksMay breakAdd fields, remove optional fields
FullBoth directions workWorksWorksAdd/remove optional fields only
NoneNo guaranteesMay breakMay breakAny change (risky)
TransitiveApplies across all versions, not just adjacentDepends on base typeDepends on base typeStrictest variant of each

In practice, backward compatibility is the most commonly enforced mode: new consumers must always be able to read data written by older producers. Schema registries (Confluent, AWS Glue, Karapace) enforce this automatically.

Migration Strategy Comparison

StrategyDowntimeRiskComplexityRollbackBest For
Big bang migrationHigh — full table rewriteHighLowDifficult — restore from backupSmall tables, dev environments
Expand-contractZeroLowMediumEasy — drop new columnsProduction systems, APIs
Blue-green schemaNear-zeroLowHighSwap pointer backCritical systems, warehouses
Shadow writesZeroVery lowHighStop shadow, keep originalHigh-risk changes, validation
Feature flagsZeroLowMediumToggle offApplication-layer changes
Versioned schemasZeroLowMediumKeep old version runningEvent streams, APIs

Expand-Contract Pattern

The expand-contract pattern (also called parallel change) is the safest approach for zero-downtime schema evolution. It splits every breaking change into three non-breaking phases.

Phase 1: EXPAND                    Phase 2: MIGRATE                  Phase 3: CONTRACT
┌──────────────────┐              ┌──────────────────┐              ┌──────────────────┐
│  Table: users    │              │  Table: users    │              │  Table: users    │
│                  │              │                  │              │                  │
│  id         INT  │              │  id         INT  │              │  id         INT  │
│  name       VARCHAR ◄── old    │  name       VARCHAR (deprecated)│  full_name  VARCHAR ◄── new
│  full_name  VARCHAR ◄── new    │  full_name  VARCHAR ◄── synced │                  │
│                  │              │                  │              │                  │
│  Dual-write both │              │  Backfill old →  │              │  Drop old column │
│  columns          │              │  new column      │              │  Remove dual-write│
└──────────────────┘              └──────────────────┘              └──────────────────┘

Timeline:  ─────────────▶          ─────────────▶                   ─────────────▶
           Deploy new code         Run migration script             Deploy cleanup code
           (writes to both)        (backfill + verify)              (drop old column)

Risk Assessment by Change Type

Change TypeRisk LevelCompatibility ImpactMitigation
Add optional columnLowForward + backward safeDefault value required
Add required columnMediumBreaks backward compatUse expand-contract
Rename columnHighBreaks both directionsExpand-contract with alias
Change data typeHighBreaks both directionsNew column + migration
Remove columnMediumBreaks forward compatDeprecate first, contract later
Change primary keyVery highBreaks joins, CDC, lookupsBlue-green or shadow writes
Add/change indexLowNo data compat impactMonitor lock contention
Change partitioningHighQuery patterns affectedDual-partition transition period
Merge/split tablesVery highAll downstream breaksView abstraction + expand-contract

Schema Governance Framework

┌─────────────────────────────────────────────────────────┐
│                Schema Change Lifecycle                   │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  1. PROPOSE   → Schema change request (PR / RFC)       │
│       │                                                 │
│  2. VALIDATE  → Compatibility check (schema registry)  │
│       │          Impact analysis (downstream consumers) │
│       │                                                 │
│  3. APPROVE   → Data steward / team lead review        │
│       │                                                 │
│  4. DEPLOY    → Expand phase (add new, keep old)       │
│       │                                                 │
│  5. MIGRATE   → Backfill data, update consumers        │
│       │                                                 │
│  6. CONTRACT  → Remove deprecated elements             │
│       │                                                 │
│  7. DOCUMENT  → Update catalog, notify stakeholders    │
│                                                         │
└─────────────────────────────────────────────────────────┘

Schema Registry Comparison

FeatureConfluent Schema RegistryAWS Glue Schema RegistryKarapaceBuf (Protobuf)
FormatsAvro, JSON Schema, ProtobufAvro, JSON SchemaAvro, JSON SchemaProtobuf only
Compatibility checksAll types + transitiveBackward, forward, fullAll typesBreaking change detection
DeploymentSelf-hosted or Confluent CloudManaged (AWS)Self-hostedCLI + BSR (cloud)
IntegrationKafka-nativeGlue + Kafka, KinesisKafka-compatiblegRPC, Connect
VersioningSubject-version pairsSchema-version pairsSubject-version pairsModule-based
CostFree (OSS) / Cloud pricingPer schema version storedFree (OSS)Free / Team pricing

Data Contract Essentials

ElementPurposeExample
Schema definitionStructure guaranteeAvro schema, JSON Schema, Protobuf
Semantic typeBusiness meaningemail, currency_usd, iso_country_code
Freshness SLADelivery timing"Updated within 15 minutes of source change"
Quality rulesData validity"null rate < 1%", "values in [A, B, C]"
OwnerAccountabilityteam-payments@company.com
Breaking change policyEvolution rules"30-day deprecation window, backward compat enforced"

Resources

:::