A well-designed data warehouse is the foundation of trustworthy analytics. Despite the hype around newer paradigms, the core principles of dimensional modeling, layered architecture, and change tracking remain essential. This post covers the modern approach to warehouse design -- medallion layers, slowly changing dimensions, incremental strategies, and naming conventions that scale.
Layered Architecture: Staging to Mart
┌─────────────────────────────────────────────────────────────────┐
│ CONSUMERS │
│ Dashboards Ad-hoc SQL ML Features Reverse ETL │
└────────────────────────┬────────────────────────────────────────┘
│
┌────────────────────────▼────────────────────────────────────────┐
│ MART LAYER (Gold) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐ │
│ │ fct_ │ │ dim_ │ │ rpt_ │ │ agg_ │ │
│ │ orders │ │ customers│ │ revenue │ │ daily_sales │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────────┘ │
│ Business-ready, documented, governed, SLOs defined │
└────────────────────────┬────────────────────────────────────────┘
│
┌────────────────────────▼────────────────────────────────────────┐
│ INTERMEDIATE LAYER (Silver) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ int_ │ │ int_ │ │ int_ │ │
│ │ orders │ │ customers│ │ products │ │
│ │ _cleaned │ │ _deduped │ │ _enriched│ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ Cleaned, typed, deduplicated, business logic applied │
└────────────────────────┬────────────────────────────────────────┘
│
┌────────────────────────▼────────────────────────────────────────┐
│ STAGING LAYER (Bronze) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ stg_ │ │ stg_ │ │ stg_ │ │
│ │ shopify │ │ stripe │ │ hubspot │ │
│ │ _orders │ │ _payments│ │ _contacts│ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ 1:1 source mirror, minimal transformation, append-only │
└─────────────────────────────────────────────────────────────────┘
Layer Design Principles
| Layer | Purpose | Materialization | Freshness | Access |
|---|
| Staging (Bronze) | Raw source mirror | Incremental append / view | Real-time to hourly | Data engineers only |
| Intermediate (Silver) | Clean, dedupe, type, join | Incremental or table | Hourly to daily | Data engineers + analysts |
| Mart (Gold) | Business-ready entities | Table (materialized) | Scheduled (SLO-bound) | All consumers |
| Metrics | Semantic layer definitions | Virtual / computed | On-query | BI tools + API |
Slowly Changing Dimension (SCD) Type Comparison
| SCD Type | Description | Use Case | Complexity | Storage Impact | History |
|---|
| Type 0 | No changes tracked | Static reference data (country codes) | None | Minimal | None |
| Type 1 | Overwrite old value | Corrections, non-historical attributes | Low | Minimal | Lost |
| Type 2 | Add new row with validity dates | Full history required (customer address) | Medium | High (row explosion) | Full |
| Type 3 | Add column for previous value | Only current + previous needed | Low | Moderate | Limited (1 prior) |
| Type 4 | Separate history table | High-velocity changes | Medium | Moderate (separate) | Full (separate) |
| Type 6 | Hybrid (1 + 2 + 3) | Need current flag + history + previous | High | High | Full + convenient |
SCD Type 2 Implementation Pattern
dim_customers (SCD Type 2)
┌────────┬──────────┬─────────┬──────────────┬──────────────┬───────────┐
│ sk_id │ nk_id │ name │ valid_from │ valid_to │ is_current│
├────────┼──────────┼─────────┼──────────────┼──────────────┼───────────┤
│ 1001 │ C-42 │ Alice │ 2024-01-01 │ 2025-06-15 │ false │
│ 1002 │ C-42 │ Alice B │ 2025-06-15 │ 9999-12-31 │ true │
└────────┴──────────┴─────────┴──────────────┴──────────────┴───────────┘
Materialization Strategy Matrix
| Strategy | When to Use | Pros | Cons | dbt Config |
|---|
| View | Staging, light transforms | No storage cost, always fresh | Slow queries, recomputed each time | materialized='view' |
| Table | Marts, heavy aggregations | Fast queries, predictable | Full rebuild, stale between runs | materialized='table' |
| Incremental | Large fact tables, append-heavy | Fast builds, low cost | Complex logic, late-arriving data | materialized='incremental' |
| Ephemeral | CTEs, reusable logic | No object created | Can't query directly, debugging harder | materialized='ephemeral' |
| Snapshot | SCD Type 2 tracking | Automatic history | Requires unique key + timestamp | dbt snapshot |
Incremental Strategy Comparison
| Strategy | How It Works | Best For | Risk |
|---|
| Append | Insert new rows only | Immutable event streams | Duplicates if reprocessed |
| Merge (upsert) | Match on key, update or insert | Mutable source data | Requires unique key |
| Delete+Insert | Delete partition, re-insert | Partitioned fact tables | Window must be correct |
| Insert overwrite | Replace entire partition | Large partitioned tables | Partition key required |
Naming Convention Framework
| Object Type | Pattern | Example | Notes |
|---|
| Staging model | stg_{source}_{entity} | stg_stripe_payments | One per source table |
| Intermediate model | int_{entity}_{verb} | int_orders_enriched | Verb describes transform |
| Fact table | fct_{event/process} | fct_orders | Grain = one event/transaction |
| Dimension table | dim_{entity} | dim_customers | Grain = one entity |
| Report/aggregate | rpt_{subject} or agg_{subject} | rpt_monthly_revenue | Pre-computed for BI |
| Metrics | metric_{measure}_{grain} | metric_revenue_daily | Semantic layer |
| Surrogate key | sk_{entity}_id | sk_customer_id | Auto-generated |
| Natural key | nk_{entity}_id | nk_customer_id | From source system |
| Timestamps | {event}_at | created_at, updated_at | Always UTC |
| Booleans | is_{condition} or has_{thing} | is_active, has_subscription | Clear intent |
Resources