Data pipelines are notoriously under-tested. Code changes are reviewed, but data changes — schema drift, NULL spikes, distribution shifts — often reach production undetected. A mature testing strategy treats data with the same rigor as application code.
## Test Type Taxonomy
| Test Type | What It Validates | When It Runs | Scope | Failure Means |
|---|---|---|---|---|
| Unit tests | Individual transformation logic | On every commit | Single function/model | Code bug |
| Integration tests | End-to-end pipeline on sample data | On PR / pre-merge | Multiple models | Wiring or logic error |
| Contract tests | Schema + semantics between producer-consumer | On schema change | Interface boundary | Breaking change |
| Data quality tests | Row counts, NULLs, ranges, uniqueness | Post-ingestion (runtime) | Table/column level | Data issue |
| Regression tests | Output stability (no unexpected drift) | Scheduled / post-deploy | Model output | Silent breakage |
| Performance tests | Query latency, throughput, resource usage | Pre-deploy / scheduled | Pipeline or query | SLA risk |
| Chaos tests | Resilience to failures (late data, duplicates) | Periodic | Full pipeline | Reliability gap |
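As a concrete example of the first row, a unit test pins down one transformation in isolation. The sketch below tests a hypothetical deduplication helper with pytest-style assertions; the function and column names are illustrative, not from any specific library:

```python
# Hypothetical staging transformation: keep only the latest record per
# primary key. Helper and column names are illustrative.
from datetime import datetime

def dedupe_latest(rows, key="user_id", ts="updated_at"):
    """Return one row per key, keeping the row with the greatest timestamp."""
    latest = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[ts] > latest[k][ts]:
            latest[k] = row
    return list(latest.values())

def test_dedupe_keeps_latest_row_per_key():
    rows = [
        {"user_id": 1, "updated_at": datetime(2024, 1, 1), "plan": "free"},
        {"user_id": 1, "updated_at": datetime(2024, 1, 5), "plan": "pro"},
        {"user_id": 2, "updated_at": datetime(2024, 1, 2), "plan": "free"},
    ]
    result = {r["user_id"]: r["plan"] for r in dedupe_latest(rows)}
    assert result == {1: "pro", 2: "free"}
```

Because the function takes plain data structures rather than a database connection, the test needs no warehouse and can run on every commit.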
## The Test Pyramid for Data

```
               ╱╲
              ╱  ╲
             ╱ E2E ╲           Few — expensive, slow, high confidence
            ╱ Tests  ╲         (full pipeline on staging data)
           ╱──────────╲
          ╱ Integration ╲      Some — medium cost, catch wiring issues
         ╱    Tests      ╲     (multi-model joins, incremental logic)
        ╱────────────────╲
       ╱  Contract Tests  ╲    Interface boundaries — schema + SLAs
      ╱────────────────────╲
     ╱ Data Quality Tests   ╲  Runtime — every pipeline run
    ╱────────────────────────╲
   ╱       Unit Tests         ╲   Many — fast, cheap, catch logic bugs
  ╱────────────────────────────╲  (SQL model tests, Python function tests)
 ╱──────────────────────────────╲
```

Principle: most coverage at the bottom (fast, cheap); fewer tests at the top (slow, expensive, high-signal).
## Tool Comparison
| Tool | Test Focus | Language | Integration | Pricing | Best For |
|---|---|---|---|---|---|
| dbt tests | Schema + custom SQL assertions | SQL (YAML + Jinja) | dbt-native | Free (OSS) | dbt-centric stacks |
| Great Expectations (GX) | Data quality + profiling | Python | Airflow, Spark, Pandas | Free (OSS) / GX Cloud | Python teams, complex validations |
| Soda | Data quality + monitoring | SodaCL (YAML) | Airflow, dbt, Spark | Free (OSS) / Soda Cloud | Accessible DQ checks |
| Elementary | dbt observability + anomaly detection | SQL (dbt package) | dbt-native | Free (OSS) / Cloud | dbt monitoring |
| Datafold | Data diff + regression testing | SQL | dbt, CI/CD | SaaS pricing | Catching regressions in PRs |
| Monte Carlo | Data observability (ML-based) | N/A (SaaS) | Warehouse-native | Enterprise SaaS | Automated anomaly detection |
| pytest + fixtures | Unit + integration tests | Python | Any Python pipeline | Free | Custom pipelines |
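Whichever tool you choose, the underlying checks reduce to simple assertions. The following is a plain-Python sketch of the row-count, uniqueness, and NULL-rate checks that dbt tests, Soda, and GX express declaratively; the table is modeled as a list of dicts, and all names are illustrative:

```python
# Plain-Python sketch of core data quality checks: row count, primary-key
# uniqueness, and per-column NULL rate. Names are illustrative.
def check_table(rows, pk, not_null_cols, max_null_rate=0.0):
    """Return a list of failure messages (an empty list means all checks pass)."""
    if not rows:
        return ["row count is 0"]
    failures = []
    # Primary-key uniqueness
    keys = [r[pk] for r in rows]
    if len(keys) != len(set(keys)):
        failures.append(f"duplicate values in {pk}")
    # NULL-rate check per column
    for col in not_null_cols:
        null_rate = sum(1 for r in rows if r.get(col) is None) / len(rows)
        if null_rate > max_null_rate:
            failures.append(f"{col}: null rate {null_rate:.0%} above threshold")
    return failures

orders = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 1, "amount": None},  # duplicate PK and a NULL amount
]
print(check_table(orders, pk="order_id", not_null_cols=["amount"]))
```

In practice the dedicated tools add scheduling, reporting, and warehouse-pushdown on top of checks like these; the assertion logic itself is this simple.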
## What to Test at Each Layer
| Pipeline Layer | Test Type | Example Test | Tool |
|---|---|---|---|
| Ingestion | Contract + quality | Schema matches contract; row count > 0; no full-NULL columns | Soda, GX |
| Staging (stg_) | Unit + quality | Deduplication works; casts are correct; PKs are unique | dbt tests |
| Intermediate (int_) | Integration | Joins produce expected grain; no fanout | dbt tests, Datafold |
| Marts (fct_, dim_) | Regression + quality | Metric values within expected range; no unexpected NULLs | Elementary, GX |
| Reverse ETL | Contract | Output schema matches destination API; record count matches | Custom assertions |
| Serving (API/BI) | E2E | Dashboard loads; API returns valid response | Selenium, pytest |
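The contract checks in the table above can be approximated with a schema diff that distinguishes breaking from additive changes. A minimal sketch, with a hypothetical contract and column names:

```python
# Contract test sketch: compare a declared schema contract against the
# observed schema and classify changes. Column names are illustrative.
def diff_schema(contract, observed):
    """Both arguments map column name -> type string."""
    missing = [c for c in contract if c not in observed]          # breaking
    retyped = [c for c in contract
               if c in observed and observed[c] != contract[c]]   # breaking
    added = [c for c in observed if c not in contract]            # additive
    return {"breaking": missing + retyped, "additive": added}

contract = {"order_id": "int", "amount": "float", "currency": "str"}
observed = {"order_id": "int", "amount": "str", "country": "str"}

diff = diff_schema(contract, observed)
assert diff["breaking"] == ["currency", "amount"]  # dropped column + type change
assert diff["additive"] == ["country"]             # new column: warn, don't block
```

Running a check like this on every schema change turns "breaking change" from a production incident into a failed CI step.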
## CI/CD Pipeline for Data
```
┌───────────┐   ┌───────────┐   ┌───────────┐   ┌───────────┐
│  Commit   │──▶│   Lint    │──▶│   Unit    │──▶│Integration│
│   (PR)    │   │ + Format  │   │   Tests   │   │   Tests   │
└───────────┘   │ (sqlfluff,│   │ (dbt test │   │ (staging  │
                │   ruff)   │   │ --select  │   │ env, data │
                └───────────┘   │ modified) │   │   diff)   │
                                └───────────┘   └─────┬─────┘
                                                      │
                                                ┌─────▼─────┐
┌───────────┐   ┌───────────┐   ┌───────────┐   │  Schema   │
│  Deploy   │◀──│  Approve  │◀──│   Data    │◀──│ Contract  │
│ to Prod   │   │  (manual  │   │   Diff    │   │   Check   │
│           │   │   gate)   │   │  Review   │   │           │
└─────┬─────┘   └───────────┘   └───────────┘   └───────────┘
      │
┌─────▼─────┐   ┌───────────┐
│  Post-    │──▶│   Alert   │
│  Deploy   │   │    on     │
│  Quality  │   │  Failure  │
│  Checks   │   └───────────┘
└───────────┘
```
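The "Data Diff Review" stage need not require a commercial tool to get started. A minimal sketch of a primary-key data diff (the kind of dev-vs-prod comparison Datafold automates), assuming both result sets fit in memory and share a primary key:

```python
# Minimal primary-key data diff between a production table and the same
# model built from a PR branch. Table and column names are illustrative.
def data_diff(prod_rows, dev_rows, pk):
    prod = {r[pk]: r for r in prod_rows}
    dev = {r[pk]: r for r in dev_rows}
    return {
        "removed": sorted(set(prod) - set(dev)),  # rows the PR drops
        "added":   sorted(set(dev) - set(prod)),  # rows the PR introduces
        "changed": sorted(k for k in prod.keys() & dev.keys()
                          if prod[k] != dev[k]),  # rows with modified values
    }

prod = [{"id": 1, "revenue": 100}, {"id": 2, "revenue": 250}]
dev  = [{"id": 2, "revenue": 260}, {"id": 3, "revenue": 90}]
diff = data_diff(prod, dev, pk="id")
assert diff == {"removed": [1], "added": [3], "changed": [2]}
```

For warehouse-scale tables the same comparison is usually pushed down as SQL (e.g. full outer join on the primary key with row hashes), but the classification logic is identical.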
## Testing Maturity Assessment
| Level | Practice | Tooling | Confidence |
|---|---|---|---|
| 0 — None | No tests, manual spot checks | None | Very low |
| 1 — Basic | dbt schema tests (not_null, unique) | dbt built-in | Low |
| 2 — Structured | Custom SQL tests, row count checks, CI integration | dbt + Soda/GX + GitHub Actions | Medium |
| 3 — Advanced | Data diff on PRs, contract tests, anomaly detection | Datafold + Elementary + schema registry | High |
| 4 — Comprehensive | Full pyramid, chaos testing, automated rollback | All above + Monte Carlo + canary deploys | Very high |
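The "anomaly detection" practice at levels 3 and 4 often reduces to a statistical threshold over pipeline metadata. A minimal sketch, assuming a trailing window of daily row counts; the 3-sigma threshold is a common default, not a universal rule:

```python
# 3-sigma anomaly check on a trailing window of daily row counts.
# Window contents and the threshold are illustrative defaults.
from statistics import mean, stdev

def is_anomalous(history, today, threshold=3.0):
    """Flag `today` if it deviates from the trailing mean by > threshold sigmas."""
    if len(history) < 2:
        return False  # too little history to estimate variance
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu  # constant history: any deviation is anomalous
    return abs(today - mu) / sigma > threshold

daily_row_counts = [100_120, 101_480, 99_760, 100_920, 100_310]
assert not is_anomalous(daily_row_counts, 100_500)  # normal day
assert is_anomalous(daily_row_counts, 12_000)       # ingestion likely broke
```

Tools like Elementary and Monte Carlo layer seasonality handling and learned baselines on top, but a simple check like this already catches the "table stopped loading" class of failures.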
## Anti-Patterns in Data Testing
| Anti-Pattern | Symptom | Fix |
|---|---|---|
| Test only in production | Bugs found by stakeholders | Add staging env + CI tests |
| Only schema tests | Logic errors pass silently | Add custom SQL + regression tests |
| Flaky tests ignored | Test suite becomes noise | Fix or quarantine flaky tests immediately |
| No test on ingestion | Bad data propagates everywhere | Add contract + quality checks at source |
| Manual QA only | Inconsistent, not repeatable | Automate everything, manual for edge cases only |
| 100% coverage obsession | Slow CI, diminishing returns | Follow the pyramid: many unit tests, few E2E |