Data modeling is the structural foundation of every analytics platform. The wrong model slows queries, confuses analysts, and makes schema evolution painful. The right model accelerates insight delivery and scales with the business.
Modeling Approach Comparison
| Approach | Structure | Query Performance | Flexibility | Complexity | Best For |
|---|
| Star Schema | Fact tables + denormalized dimensions | Excellent — few joins | Medium | Low | BI dashboards, known queries |
| Snowflake Schema | Fact tables + normalized dimensions | Good — more joins | High | Medium | Storage-sensitive, normalized sources |
| Data Vault 2.0 | Hubs + Links + Satellites | Variable — needs marts | Very high | High | Regulated industries, auditability |
| One Big Table (OBT) | Single wide denormalized table | Excellent — zero joins | Low | Very low | Small teams, single-domain analytics |
| Activity Schema | Entity + activity stream | Good | High | Medium | Event-driven analytics |
Kimball vs Inmon: Philosophical Divide
| Dimension | Kimball (Bottom-Up) | Inmon (Top-Down) |
|---|
| Starting point | Business process / department | Enterprise-wide data model |
| Central structure | Dimensional star schemas | 3NF enterprise data warehouse |
| Data marts | Built first, conformed over time | Derived from the central warehouse |
| Time to first value | Weeks | Months |
| Schema evolution | Moderate — add dimensions/facts | Complex — ripple effects |
| Team size needed | Small to medium | Large, dedicated modeling team |
| Governance model | Decentralized, conformed dimensions | Centralized, strict standards |
| Modern relevance | High — aligns with dbt/lakehouse | Medium — suits large enterprises |
dbt Modeling Layers
┌─────────────────────────────────────────────────────────┐
│ dbt Project Structure │
├─────────────────────────────────────────────────────────┤
│ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ Sources │──▶│ Staging │──▶│ Inter- │ │
│ │ │ │ (stg_) │ │ mediate │ │
│ │ Raw data │ │ 1:1 clean │ │ (int_) │ │
│ └───────────┘ └───────────┘ └─────┬─────┘ │
│ │ │
│ ┌─────▼─────┐ │
│ │ Marts │ │
│ │ (fct_, │ │
│ │ dim_) │ │
│ └─────┬─────┘ │
│ │ │
│ ┌─────▼─────┐ │
│ │ Metrics │ │
│ │ (mtc_) │ │
│ └───────────┘ │
└─────────────────────────────────────────────────────────┘
Layer Responsibilities:
Sources → External system references, freshness checks
Staging → Renaming, casting, deduplication. One model per source table.
Intermediate → Business logic joins, aggregations across staging models
Marts → Consumer-facing: fact and dimension tables
Metrics → Semantic layer definitions for BI tools
Pattern Selection Decision Tree
Start: What is your primary constraint?
├── "Speed to delivery"
│ ├── Small dataset (< 50 GB)? → One Big Table (OBT)
│ └── Larger dataset? → Star Schema (Kimball)
│
├── "Regulatory / auditability"
│ └── Need full history + lineage? → Data Vault 2.0
│
├── "Query flexibility"
│ ├── Known, stable queries? → Star Schema
│ └── Ad-hoc, exploratory? → Snowflake Schema or Data Vault + Marts
│
└── "Team size & skill"
├── < 3 data engineers? → OBT or Star Schema
└── Dedicated modeling team? → Data Vault or Inmon 3NF
Modeling Anti-Patterns to Avoid
| Anti-Pattern | Symptom | Fix |
|---|
| God table | 200+ columns, mixed granularity | Decompose into proper fact/dim |
| Soft deletes everywhere | is_deleted flags bloat tables | Use SCD Type 2 or separate archive |
| Business logic in BI tool | Metrics differ across dashboards | Push logic into dbt marts / semantic layer |
| No grain statement | Analysts unsure what a row represents | Document grain explicitly per model |
| Premature normalization | Excessive joins kill dashboard performance | Denormalize for read-heavy patterns |
| Missing surrogate keys | Natural keys change, breaking joins | Always add stable surrogate keys |
Slowly Changing Dimensions (SCD) Strategy
| SCD Type | Behavior | Use Case | Storage Impact |
|---|
| Type 0 | Never changes | Reference codes, constants | Minimal |
| Type 1 | Overwrite | Corrections, non-historical attributes | Minimal |
| Type 2 | Add new row with validity dates | Full history required | High |
| Type 3 | Add column for previous value | Limited history (current + previous) | Low |
| Type 6 | Hybrid (1 + 2 + 3) | Current + historical with easy access | Medium-High |
Resources
:::