
Data Pipeline Testing: Strategy, Tools & CI/CD Integration

#data-engineering #testing #data-quality #ci-cd

Data pipelines are notoriously under-tested. Code changes are reviewed, but data changes — schema drift, NULL spikes, distribution shifts — often reach production undetected. A mature testing strategy treats data with the same rigor as application code.
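
To make the "NULL spike" case concrete, here is a minimal sketch of a runtime check that compares a batch's NULL rate against a historical baseline. All names (`null_rate`, `check_null_spike`, the `email` column, the baseline value) are hypothetical, not from any specific tool:

```python
def null_rate(rows: list[dict], column: str) -> float:
    """Fraction of rows where `column` is None (0.0 for an empty batch)."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)


def check_null_spike(rows: list[dict], column: str,
                     baseline: float, tolerance: float = 0.05) -> float:
    """Raise if this batch's NULL rate exceeds baseline + tolerance."""
    rate = null_rate(rows, column)
    if rate > baseline + tolerance:
        raise ValueError(
            f"NULL spike in {column!r}: {rate:.1%} vs baseline {baseline:.1%}"
        )
    return rate


# Usage: 'email' is NULL in 2 of 4 rows against a 10% historical baseline.
batch = [{"email": "a@x.io"}, {"email": None},
         {"email": None}, {"email": "b@x.io"}]
```

A check like this runs at pipeline runtime, not in code review — which is exactly why it catches the class of failures that review misses.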

Test Type Taxonomy

| Test Type | What It Validates | When It Runs | Scope | Failure = |
|---|---|---|---|---|
| Unit tests | Individual transformation logic | On every commit | Single function/model | Code bug |
| Integration tests | End-to-end pipeline on sample data | On PR / pre-merge | Multiple models | Wiring or logic error |
| Contract tests | Schema + semantics between producer and consumer | On schema change | Interface boundary | Breaking change |
| Data quality tests | Row counts, NULLs, ranges, uniqueness | Post-ingestion (runtime) | Table/column level | Data issue |
| Regression tests | Output stability (no unexpected drift) | Scheduled / post-deploy | Model output | Silent breakage |
| Performance tests | Query latency, throughput, resource usage | Pre-deploy / scheduled | Pipeline or query | SLA risk |
| Chaos tests | Resilience to failures (late data, duplicates) | Periodic | Full pipeline | Reliability gap |

The Test Pyramid for Data

                    ╱╲
                   ╱  ╲
                  ╱ E2E╲          Few — expensive, slow, high confidence
                 ╱ Tests ╲         (full pipeline on staging data)
                ╱──────────╲
               ╱ Integration╲      Some — medium cost, catch wiring issues
              ╱    Tests     ╲     (multi-model joins, incremental logic)
             ╱────────────────╲
            ╱  Contract Tests  ╲   Interface boundaries — schema + SLAs
           ╱────────────────────╲
          ╱   Data Quality Tests ╲  Runtime — every pipeline run
         ╱────────────────────────╲
        ╱      Unit Tests          ╲  Many — fast, cheap, catch logic bugs
       ╱────────────────────────────╲ (SQL model tests, Python function tests)
      ╱──────────────────────────────╲

Principle: Most coverage at the bottom (fast, cheap).
           Fewer tests at the top (slow, expensive, high-signal).

Tool Comparison

| Tool | Test Focus | Language | Integration | Pricing | Best For |
|---|---|---|---|---|---|
| dbt tests | Schema + custom SQL assertions | SQL (YAML + Jinja) | dbt-native | Free (OSS) | dbt-centric stacks |
| Great Expectations (GX) | Data quality + profiling | Python | Airflow, Spark, Pandas | Free (OSS) / GX Cloud | Python teams, complex validations |
| Soda | Data quality + monitoring | SodaCL (YAML) | Airflow, dbt, Spark | Free (OSS) / Soda Cloud | Accessible DQ checks |
| Elementary | dbt observability + anomaly detection | SQL (dbt package) | dbt-native | Free (OSS) / Cloud | dbt monitoring |
| Datafold | Data diff + regression testing | SQL | dbt, CI/CD | SaaS pricing | Catching regressions in PRs |
| Monte Carlo | Data observability (ML-based) | N/A (SaaS) | Warehouse-native | Enterprise SaaS | Automated anomaly detection |
| pytest + fixtures | Unit + integration tests | Python | Any Python pipeline | Free | Custom pipelines |

What to Test at Each Layer

| Pipeline Layer | Test Type | Example Test | Tool |
|---|---|---|---|
| Ingestion | Contract + quality | Schema matches contract; row count > 0; no full-NULL columns | Soda, GX |
| Staging (stg_) | Unit + quality | Deduplication works; casts are correct; PKs are unique | dbt tests |
| Intermediate (int_) | Integration | Joins produce expected grain; no fanout | dbt tests, Datafold |
| Marts (fct_, dim_) | Regression + quality | Metric values within expected range; no unexpected NULLs | Elementary, GX |
| Reverse ETL | Contract | Output schema matches destination API; record count matches | Custom assertions |
| Serving (API/BI) | E2E | Dashboard loads; API returns valid response | Selenium, pytest |
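
The staging and ingestion rows boil down to a handful of mechanical assertions. A sketch in plain Python (the row format and the `id` primary key are hypothetical; in practice dbt's `unique`/`not_null` tests or a GX suite would express the same checks):

```python
from collections import Counter


def assert_unique_pk(rows: list[dict], pk: str) -> None:
    """Fail if any primary-key value appears more than once."""
    counts = Counter(r[pk] for r in rows)
    dupes = [k for k, n in counts.items() if n > 1]
    assert not dupes, f"duplicate {pk} values: {dupes}"


def assert_no_full_null_columns(rows: list[dict]) -> None:
    """Fail if any column is NULL in every row (a classic ingestion bug)."""
    if not rows:
        return
    for col in rows[0]:
        assert any(r.get(col) is not None for r in rows), \
            f"column {col!r} is all NULL"
```

Running these post-ingestion, before staging models build, stops bad batches from propagating downstream.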

CI/CD Pipeline for Data

┌─────────────────────────────────────────────────────────────────────┐
│                     Data Pipeline CI/CD                             │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌───────────┐   ┌───────────┐   ┌───────────┐   ┌───────────┐   │
│  │  Commit   │──▶│   Lint    │──▶│   Unit    │──▶│ Integration│   │
│  │  (PR)     │   │  + Format │   │  Tests    │   │   Tests    │   │
│  └───────────┘   │ (sqlfluff,│   │ (dbt test │   │ (staging   │   │
│                  │  ruff)    │   │  --select │   │  env, data │   │
│                  └───────────┘   │  modified)│   │  diff)     │   │
│                                  └───────────┘   └─────┬─────┘   │
│                                                        │          │
│                                                  ┌─────▼─────┐   │
│  ┌───────────┐   ┌───────────┐   ┌───────────┐ │  Schema   │   │
│  │  Deploy   │◀──│  Approve  │◀──│  Data     │◀─│ Contract  │   │
│  │  to Prod  │   │  (manual  │   │  Diff     │  │  Check    │   │
│  │           │   │   gate)   │   │  Review   │  │           │   │
│  └─────┬─────┘   └───────────┘   └───────────┘  └───────────┘   │
│        │                                                          │
│  ┌─────▼─────┐   ┌───────────┐                                   │
│  │  Post-    │──▶│  Alert    │                                   │
│  │  Deploy   │   │  on       │                                   │
│  │  Quality  │   │  Failure  │                                   │
│  │  Checks   │   │           │                                   │
│  └───────────┘   └───────────┘                                   │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
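
The "Schema Contract Check" gate above can be as simple as diffing produced column types against a declared contract. A minimal sketch — the contract format (a column-to-type mapping) and the column names are hypothetical; a schema registry or dbt model contracts would play this role in a real stack:

```python
def check_contract(produced: dict[str, str],
                   contract: dict[str, str]) -> list[str]:
    """Return a list of breaking changes; an empty list means the contract holds.

    Extra columns in `produced` are allowed (additive changes are
    non-breaking); missing columns and changed types fail the gate.
    """
    breaks = []
    for col, typ in contract.items():
        if col not in produced:
            breaks.append(f"missing column: {col}")
        elif produced[col] != typ:
            breaks.append(f"type changed: {col} {typ} -> {produced[col]}")
    return breaks


contract = {"order_id": "int", "amount": "numeric"}
produced = {"order_id": "int", "amount": "varchar", "note": "varchar"}
```

In CI, a non-empty result blocks the merge until either the producer reverts the change or the contract is versioned forward with consumers' sign-off.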

Testing Maturity Assessment

| Level | Practice | Tooling | Confidence |
|---|---|---|---|
| 0 — None | No tests, manual spot checks | None | Very low |
| 1 — Basic | dbt schema tests (not_null, unique) | dbt built-in | Low |
| 2 — Structured | Custom SQL tests, row count checks, CI integration | dbt + Soda/GX + GitHub Actions | Medium |
| 3 — Advanced | Data diff on PRs, contract tests, anomaly detection | Datafold + Elementary + schema registry | High |
| 4 — Comprehensive | Full pyramid, chaos testing, automated rollback | All above + Monte Carlo + canary deploys | Very high |

Anti-Patterns in Data Testing

| Anti-Pattern | Symptom | Fix |
|---|---|---|
| Test only in production | Bugs found by stakeholders | Add staging env + CI tests |
| Only schema tests | Logic errors pass silently | Add custom SQL + regression tests |
| Flaky tests ignored | Test suite becomes noise | Fix or quarantine flaky tests immediately |
| No tests on ingestion | Bad data propagates everywhere | Add contract + quality checks at source |
| Manual QA only | Inconsistent, not repeatable | Automate everything; manual only for edge cases |
| 100% coverage obsession | Slow CI, diminishing returns | Follow the pyramid: focus on unit tests, fewer E2E |
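
One concrete way to quarantine (rather than silently ignore) a flaky test is pytest's built-in `xfail` marker. The test name and ticket reference below are hypothetical:

```python
import pytest


# Quarantine sketch: the test keeps running and stays visible in reports,
# but cannot fail CI. With strict=False, a failure is reported as XFAIL
# and an unexpected pass as XPASS -- neither breaks the build.
@pytest.mark.xfail(reason="quarantined: flaky event ordering (tracking ticket)",
                   strict=False)
def test_event_ordering():
    ...
```

The `reason` string keeps the quarantine auditable; pairing it with a tracking ticket prevents "quarantined" from quietly becoming "deleted".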
