ETL vs ELT: Choosing the Right Data Pipeline Pattern
#data-engineering #etl #elt #data-pipelines #cloud
The shift from ETL (Extract-Transform-Load) to ELT (Extract-Load-Transform) reflects how modern cloud warehouses have changed the economics of data processing. Understanding when to use each pattern — or a hybrid — is essential for scalable data architectures.
The Core Difference
| Dimension | ETL | ELT |
|---|---|---|
| Transform location | Intermediate server / Spark cluster | Inside the target warehouse |
| Compute cost | Separate compute (Spark, Dataflow) | Warehouse compute (BigQuery, Snowflake, Redshift) |
| Raw data in warehouse | No — only transformed data lands | Yes — raw data lands first, transform later |
| Flexibility | Schema must be defined upfront | Schema-on-read, iterate freely |
| Latency | Higher (transform is a bottleneck) | Lower (load first, transform async) |
| Best for | Complex transformations, legacy systems | Analytics, cloud-native stacks |
When ETL Still Wins
- Data quality enforcement at ingestion — reject bad data before it enters the warehouse
- PII stripping — sensitive data must never land in the warehouse raw
- Complex enrichment — joining with external APIs or real-time streams during processing
- Cost control — warehouse compute is expensive; Spark on spot instances can be 5–10x cheaper for heavy transforms
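The PII case above is the clearest: because the transform runs before the load, sensitive values can be hashed on the pipeline's own compute and never reach the warehouse at all. A minimal sketch, with hypothetical field names:

```python
# Minimal ETL sketch: the transform (PII stripping) runs *before* the load,
# so raw sensitive values never land in the warehouse.
# PII_FIELDS and the record layout are hypothetical.
import hashlib

PII_FIELDS = {"email", "ssn"}

def strip_pii(record: dict) -> dict:
    """Replace PII values with a one-way hash so joins still work downstream."""
    cleaned = {}
    for key, value in record.items():
        if key in PII_FIELDS:
            cleaned[key] = hashlib.sha256(str(value).encode()).hexdigest()[:12]
        else:
            cleaned[key] = value
    return cleaned

def etl(rows: list) -> list:
    # Extract is assumed done; only the cleaned rows would be loaded.
    return [strip_pii(row) for row in rows]

raw = [{"user_id": 1, "email": "a@example.com", "plan": "pro"}]
loaded = etl(raw)
print(loaded[0])  # plan survives unchanged; email is an opaque hash
```

Hashing rather than dropping the field keeps it usable as a join key, which is often the reason PII columns exist in analytics data in the first place.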
When ELT Is the Modern Default
- Cloud warehouse with elastic compute — BigQuery, Snowflake, Databricks scale transform compute on demand
- Exploratory analytics — keep raw data, transform later as business questions evolve
- dbt-driven workflows — SQL-based transformation inside the warehouse, version-controlled, tested
- Faster time to value — data available in hours, not weeks
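The load-first, transform-later flow can be sketched with `sqlite3` standing in for a cloud warehouse; the table names and the toy revenue model are illustrative, not a real dbt project:

```python
# Minimal ELT sketch, using sqlite3 as a stand-in for a cloud warehouse:
# raw data lands first, and modeling is plain SQL run inside the "warehouse"
# afterwards, which is essentially what dbt automates.
import sqlite3

conn = sqlite3.connect(":memory:")

# 1. Load: land the raw data as-is, with no upfront modeling.
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount_cents INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 1999, "paid"), (2, 500, "refunded"), (3, 2500, "paid")],
)

# 2. Transform: build an analytics model with SQL, after the data has landed.
conn.execute("""
    CREATE TABLE fct_revenue AS
    SELECT status, SUM(amount_cents) / 100.0 AS revenue
    FROM raw_orders
    GROUP BY status
""")

revenue = dict(conn.execute("SELECT status, revenue FROM fct_revenue"))
print(revenue)  # {'paid': 44.99, 'refunded': 5.0}
```

Because `raw_orders` stays in place, a new business question next quarter means writing another `SELECT`, not re-running ingestion.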
The Modern ELT Stack
Sources → Ingestion (Fivetran/Airbyte) → Warehouse (Snowflake/BigQuery) → Transform (dbt) → BI (Looker/Metabase)
| Layer | Tools | Role |
|---|---|---|
| Extract & Load | Fivetran, Airbyte, Stitch, AWS DMS | Move raw data to warehouse |
| Transform | dbt, Dataform, SQLMesh | SQL-based modeling in the warehouse |
| Orchestration | Airflow, Dagster, Prefect | Schedule and monitor pipelines |
| Quality | dbt tests, Great Expectations, Soda | Validate transformed data |
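The orchestration layer's core job is running these layers in dependency order. A toy sketch of that idea, with task names mirroring the table (the names are illustrative; a real Airflow or Dagster DAG adds scheduling, retries, and monitoring on top):

```python
# Toy orchestration sketch: the stack's layers expressed as a dependency
# graph and executed in topological order. Task names are illustrative.
from graphlib import TopologicalSorter

tasks = {
    "extract_load": set(),             # Fivetran/Airbyte step
    "transform": {"extract_load"},     # dbt models
    "quality_checks": {"transform"},   # dbt tests / Great Expectations
    "refresh_bi": {"quality_checks"},  # Looker/Metabase
}

ran = []
for task in TopologicalSorter(tasks).static_order():
    ran.append(task)  # a real orchestrator would execute and monitor here

print(ran)
```

Putting quality checks between transform and BI refresh means a failed test blocks stale or broken data from reaching dashboards.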
Hybrid Approaches
Most real-world architectures are hybrid: ELT for analytical pipelines, ETL for operational/compliance pipelines.
| Pattern | Use case |
|---|---|
| ELT + streaming ETL | Batch analytics via ELT; real-time features via Kafka + Flink |
| ETL for ingestion, ELT for modeling | Strip PII during extract, model inside warehouse |
| Reverse ETL | Push transformed warehouse data back to operational systems (CRM, marketing) |
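Reverse ETL reduces to reading already-modeled rows out of the warehouse and upserting them into an operational system. A sketch with `sqlite3` as the warehouse and a hypothetical in-memory dict standing in for a real CRM API:

```python
# Reverse ETL sketch: read transformed rows from the warehouse and push them
# to an operational system. The dict-based "CRM" is a hypothetical stub for
# a real API client; table and column names are illustrative.
import sqlite3

def sync_to_crm(rows, crm_sink: dict) -> int:
    """Idempotent upsert keyed on account_id, so reruns are safe."""
    count = 0
    for account_id, health_score in rows:
        crm_sink[account_id] = {"health_score": health_score}
        count += 1
    return count

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE account_health (account_id TEXT, health_score REAL)")
warehouse.executemany(
    "INSERT INTO account_health VALUES (?, ?)",
    [("acme", 0.92), ("globex", 0.41)],
)

crm = {}
synced = sync_to_crm(warehouse.execute("SELECT * FROM account_health"), crm)
print(synced, crm)
```

Keying the upsert on a stable identifier is the important design choice: reverse ETL jobs rerun constantly, and non-idempotent pushes create duplicate CRM records.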
Decision Framework
| Question | If yes → | If no → |
|---|---|---|
| Is your target a cloud warehouse? | ELT | Consider ETL |
| Do you need real-time transforms? | Streaming ETL | ELT batch |
| Must PII be stripped before landing? | ETL (or hybrid) | ELT |
| Is your team SQL-first? | ELT + dbt | ETL with Spark/Python |
| Do business questions change often? | ELT (keep raw data) | ETL (fixed schema OK) |
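The table can be condensed into a small helper function. The question order here encodes a priority (real-time and PII constraints trump team preference), which is a judgment call on top of the table, not a rule from it:

```python
# The decision framework above sketched as a function. The priority order of
# the checks is an illustrative assumption, not a hard rule.
def recommend_pattern(cloud_warehouse: bool, realtime: bool,
                      pii_before_landing: bool, sql_first: bool) -> str:
    if realtime:
        return "streaming ETL"
    if pii_before_landing:
        return "ETL (or hybrid: strip PII on extract, model with ELT)"
    if not cloud_warehouse:
        return "ETL"
    return "ELT + dbt" if sql_first else "ELT"

print(recommend_pattern(cloud_warehouse=True, realtime=False,
                        pii_before_landing=False, sql_first=True))
```

A SQL-first team on a cloud warehouse with no real-time or PII constraints lands on the modern default: ELT with dbt.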
Resources
- dbt documentation — The standard for ELT transformation
- The Modern Data Stack — Vendor landscape and architecture patterns
- Airbyte vs Fivetran comparison — Open-source vs managed EL
- Fundamentals of Data Engineering — Joe Reis & Matt Housley