tadata
Back to home

Modern Data Warehouse Design: Layers, SCDs, and Incremental Models

#data-warehouse#data-modeling#analytics#data-engineering

A well-designed data warehouse is the foundation of trustworthy analytics. Despite the hype around newer paradigms, the core principles of dimensional modeling, layered architecture, and change tracking remain essential. This post covers the modern approach to warehouse design -- medallion layers, slowly changing dimensions, incremental strategies, and naming conventions that scale.

Layered Architecture: Staging to Mart

┌─────────────────────────────────────────────────────────────────┐
│                        CONSUMERS                                │
│   Dashboards    Ad-hoc SQL    ML Features    Reverse ETL        │
└────────────────────────┬────────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────────┐
│  MART LAYER (Gold)                                              │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐       │
│  │ fct_     │ │ dim_     │ │ rpt_     │ │ agg_         │       │
│  │ orders   │ │ customers│ │ revenue  │ │ daily_sales  │       │
│  └──────────┘ └──────────┘ └──────────┘ └──────────────┘       │
│  Business-ready, documented, governed, SLOs defined             │
└────────────────────────┬────────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────────┐
│  INTERMEDIATE LAYER (Silver)                                    │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐                        │
│  │ int_     │ │ int_     │ │ int_     │                        │
│  │ orders   │ │ customers│ │ products │                        │
│  │ _cleaned │ │ _deduped │ │ _enriched│                        │
│  └──────────┘ └──────────┘ └──────────┘                        │
│  Cleaned, typed, deduplicated, business logic applied           │
└────────────────────────┬────────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────────┐
│  STAGING LAYER (Bronze)                                         │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐                        │
│  │ stg_     │ │ stg_     │ │ stg_     │                        │
│  │ shopify  │ │ stripe   │ │ hubspot  │                        │
│  │ _orders  │ │ _payments│ │ _contacts│                        │
│  └──────────┘ └──────────┘ └──────────┘                        │
│  1:1 source mirror, minimal transformation, append-only         │
└─────────────────────────────────────────────────────────────────┘

Layer Design Principles

LayerPurposeMaterializationFreshnessAccess
Staging (Bronze)Raw source mirrorIncremental append / viewReal-time to hourlyData engineers only
Intermediate (Silver)Clean, dedupe, type, joinIncremental or tableHourly to dailyData engineers + analysts
Mart (Gold)Business-ready entitiesTable (materialized)Scheduled (SLO-bound)All consumers
MetricsSemantic layer definitionsVirtual / computedOn-queryBI tools + API

Slowly Changing Dimension (SCD) Type Comparison

SCD TypeDescriptionUse CaseComplexityStorage ImpactHistory
Type 0No changes trackedStatic reference data (country codes)NoneMinimalNone
Type 1Overwrite old valueCorrections, non-historical attributesLowMinimalLost
Type 2Add new row with validity datesFull history required (customer address)MediumHigh (row explosion)Full
Type 3Add column for previous valueOnly current + previous neededLowModerateLimited (1 prior)
Type 4Separate history tableHigh-velocity changesMediumModerate (separate)Full (separate)
Type 6Hybrid (1 + 2 + 3)Need current flag + history + previousHighHighFull + convenient

SCD Type 2 Implementation Pattern

dim_customers (SCD Type 2)
┌────────┬──────────┬─────────┬──────────────┬──────────────┬───────────┐
│ sk_id  │ nk_id    │ name    │ valid_from   │ valid_to     │ is_current│
├────────┼──────────┼─────────┼──────────────┼──────────────┼───────────┤
│ 1001   │ C-42     │ Alice   │ 2024-01-01   │ 2025-06-15   │ false     │
│ 1002   │ C-42     │ Alice B │ 2025-06-15   │ 9999-12-31   │ true      │
└────────┴──────────┴─────────┴──────────────┴──────────────┴───────────┘

Materialization Strategy Matrix

StrategyWhen to UseProsConsdbt Config
ViewStaging, light transformsNo storage cost, always freshSlow queries, recomputed each timematerialized='view'
TableMarts, heavy aggregationsFast queries, predictableFull rebuild, stale between runsmaterialized='table'
IncrementalLarge fact tables, append-heavyFast builds, low costComplex logic, late-arriving datamaterialized='incremental'
EphemeralCTEs, reusable logicNo object createdCan't query directly, debugging hardermaterialized='ephemeral'
SnapshotSCD Type 2 trackingAutomatic historyRequires unique key + timestampdbt snapshot

Incremental Strategy Comparison

StrategyHow It WorksBest ForRisk
AppendInsert new rows onlyImmutable event streamsDuplicates if reprocessed
Merge (upsert)Match on key, update or insertMutable source dataRequires unique key
Delete+InsertDelete partition, re-insertPartitioned fact tablesWindow must be correct
Insert overwriteReplace entire partitionLarge partitioned tablesPartition key required

Naming Convention Framework

Object TypePatternExampleNotes
Staging modelstg_{source}_{entity}stg_stripe_paymentsOne per source table
Intermediate modelint_{entity}_{verb}int_orders_enrichedVerb describes transform
Fact tablefct_{event/process}fct_ordersGrain = one event/transaction
Dimension tabledim_{entity}dim_customersGrain = one entity
Report/aggregaterpt_{subject} or agg_{subject}rpt_monthly_revenuePre-computed for BI
Metricsmetric_{measure}_{grain}metric_revenue_dailySemantic layer
Surrogate keysk_{entity}_idsk_customer_idAuto-generated
Natural keynk_{entity}_idnk_customer_idFrom source system
Timestamps{event}_atcreated_at, updated_atAlways UTC
Booleansis_{condition} or has_{thing}is_active, has_subscriptionClear intent

Resources