Analytics Engineering: The Discipline Between Data Engineering and Analysis
Analytics engineering has emerged as a distinct discipline that bridges the gap between raw data pipelines and business-ready analytics. It borrows software engineering practices (version control, testing, documentation, and CI/CD) and applies them to the transformation layer, where raw data becomes trusted business logic.
Role Comparison
| Dimension | Data Engineer | Analytics Engineer | Data Analyst |
|---|---|---|---|
| Primary Focus | Infrastructure & pipelines | Transformation & modeling | Insights & reporting |
| Key Tools | Spark, Airflow, Kafka, Terraform | dbt, SQL, Git, Jinja | Tableau, Excel, SQL, Python |
| Output | Reliable data in the warehouse | Clean, tested, documented models | Dashboards, reports, analyses |
| Languages | Python, Scala, SQL, HCL | SQL, YAML, Jinja | SQL, Python, DAX |
| Works With | Raw sources, streaming, infra | Warehouse tables, staging models | Semantic layer, BI tools |
| Quality Focus | Pipeline uptime, data freshness | Model accuracy, test coverage | Insight correctness, clarity |
| Typical Background | Software engineering | Analytics + engineering hybrid | Business, statistics, economics |
| Reports To | Platform / Engineering | Data / Analytics | Business unit / Analytics |
Modern Analytics Workflow
┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│  SOURCE  │    │ STAGING  │    │  MARTS   │    │ CONSUME  │
│          │───▶│          │───▶│          │───▶│          │
│ Raw data │    │ Cleaned  │    │ Business │    │ BI tools │
│ EL loaded│    │ Renamed  │    │ logic    │    │ ML models│
│          │    │ Typed    │    │ Joined   │    │ APIs     │
└──────────┘    └──────────┘    └──────────┘    └──────────┘
                     │               │
                ┌────▼───────────────▼────┐
                │       dbt PROJECT       │
                │ Version controlled      │
                │ Tested (schema + data)  │
                │ Documented              │
                │ CI/CD deployed          │
                └─────────────────────────┘
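To make the staging and marts boxes concrete, here is a minimal Python sketch of the same flow on in-memory rows instead of warehouse tables. The raw rows, their column names, and the functions `stg_payments`, `stg_customers`, and `fct_revenue_by_customer` are all hypothetical; in a real dbt project each function would be a SQL model. Staging only renames and casts, while the mart holds the business rule (here, assumed to be "only successful payments count as revenue") and the join.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical raw rows as an EL tool might land them: source column
# names, every value stored as a string, amounts in cents.
RAW_PAYMENTS = [
    {"id": "py_1", "customer": "cus_A", "amount": "1250", "status": "succeeded"},
    {"id": "py_2", "customer": "cus_B", "amount": "800", "status": "failed"},
    {"id": "py_3", "customer": "cus_A", "amount": "500", "status": "succeeded"},
]
RAW_CUSTOMERS = [
    {"id": "cus_A", "email": "a@example.com"},
    {"id": "cus_B", "email": "b@example.com"},
]


def stg_payments(raw):
    """Staging: 1:1 with the source table; rename columns and cast types only."""
    return [
        {
            "payment_id": r["id"],
            "customer_id": r["customer"],
            "amount_usd": Decimal(r["amount"]) / 100,  # cents -> dollars
            "status": r["status"],
        }
        for r in raw
    ]


def stg_customers(raw):
    """Staging: rename the key so it matches the payments model."""
    return [{"customer_id": r["id"], "email": r["email"]} for r in raw]


def fct_revenue_by_customer(payments, customers):
    """Mart: apply the (assumed) revenue rule, then join to customers."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for p in payments:
        if p["status"] == "succeeded":
            totals[p["customer_id"]] += p["amount_usd"]
    return [
        {
            "customer_id": c["customer_id"],
            "email": c["email"],
            "revenue_usd": totals.get(c["customer_id"], Decimal("0")),
        }
        for c in customers
    ]


mart = fct_revenue_by_customer(stg_payments(RAW_PAYMENTS), stg_customers(RAW_CUSTOMERS))
for row in mart:
    print(row["customer_id"], row["revenue_usd"])  # cus_A 17.5, then cus_B 0
```

Keeping business rules out of staging means every downstream model starts from the same cleaned columns, and a change to the revenue definition touches exactly one model.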
dbt Project Structure (Recommended)
models/
├── staging/        # 1:1 with source tables
│   ├── stripe/
│   │   ├── _stripe__sources.yml
│   │   ├── _stripe__models.yml
│   │   ├── stg_stripe__payments.sql
│   │   └── stg_stripe__customers.sql
│   └── hubspot/
│       ├── _hubspot__sources.yml
│       └── stg_hubspot__contacts.sql
├── intermediate/   # Complex joins, business logic
│   ├── finance/
│   │   └── int_payments_pivoted.sql
│   └── marketing/
│       └── int_attribution_sessionized.sql
├── marts/          # Business-ready, consumed by BI
│   ├── finance/
│   │   ├── _finance__models.yml
│   │   ├── fct_revenue.sql
│   │   └── dim_customers.sql
│   └── marketing/
│       ├── fct_campaigns.sql
│       └── dim_channels.sql
└── metrics/        # Semantic layer definitions
    └── revenue_metrics.yml
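Key conventions: stg_ prefix for staging, int_ for intermediate, fct_ for facts, dim_ for dimensions, with a double underscore separating source from entity in staging names (stg_stripe__payments). References flow in one direction only: staging models select from sources, intermediate models from staging, and marts from staging or intermediate models, never the reverse.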
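Because these conventions are mechanical, they can be checked automatically, for example as a CI step. The sketch below is a hypothetical linter, not part of dbt: it infers each model's layer from its name prefix and flags `ref()` calls that point to a layer it should not read from. The `ALLOWED_UPSTREAM` policy is an assumption (it lets marts read staging directly, since intermediate models are optional), and the regex only recognizes single-argument `ref('model_name')` calls.

```python
import re

# Prefix conventions from the text; the layer names are labels for this sketch.
LAYER_BY_PREFIX = {
    "stg_": "staging",
    "int_": "intermediate",
    "fct_": "marts",
    "dim_": "marts",
}

# Hypothetical policy: which layers each layer may ref().
ALLOWED_UPSTREAM = {
    "staging": set(),  # staging should read only via source(), never ref()
    "intermediate": {"staging"},
    "marts": {"staging", "intermediate"},
}

# Matches single-argument calls such as {{ ref('stg_stripe__payments') }}.
# Two-argument refs (package, model) are deliberately not matched.
REF_PATTERN = re.compile(r"""\bref\(\s*['"]([^'"]+)['"]\s*\)""")


def layer_of(model_name):
    """Return the layer implied by the model's prefix, or None if unknown."""
    for prefix, layer in LAYER_BY_PREFIX.items():
        if model_name.startswith(prefix):
            return layer
    return None


def lint_model(model_name, sql):
    """Return human-readable convention violations for one model's SQL."""
    layer = layer_of(model_name)
    if layer is None:
        return [f"{model_name}: name has no recognised layer prefix"]
    problems = []
    for parent in REF_PATTERN.findall(sql):
        parent_layer = layer_of(parent)
        if parent_layer not in ALLOWED_UPSTREAM[layer]:
            problems.append(f"{model_name} ({layer}) refs {parent} ({parent_layer})")
    return problems


print(lint_model("fct_revenue", "select * from {{ ref('int_payments_pivoted') }}"))
# []
print(lint_model("stg_stripe__payments", "select * from {{ ref('fct_revenue') }}"))
# ['stg_stripe__payments (staging) refs fct_revenue (marts)']
```

A real project would more likely read the dependency graph from dbt's compiled artifacts than regex-scan SQL; the regex keeps the sketch self-contained.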
Testing Taxonomy
| Test Type | What It Validates | dbt Implementation | When to Use |
|---|---|---|---|
| Schema (generic) tests | Column properties (not null, unique, accepted values) | Built-in generic tests: not_null, unique, accepted_values | Every model; at minimum unique + not_null on each primary key |
| Relationship tests | Foreign key integrity | relationships test | Every join key |
| Data (singular) tests | Business logic assertions | Custom SQL query that returns the failing rows | Complex business rules |
| Freshness tests | Source data is recent enough | source freshness | Every source |
| Volume tests | Row count within expected range | Custom or dbt-utils | Critical models |
| Distribution tests | Statistical properties hold | dbt-expectations | ML feature tables |
| Regression tests | Output matches expected baseline | Custom audit queries | After refactors |
| Contract tests | Schema matches declared contract | dbt contracts (v1.5+) | Public-facing models |
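The pass/fail logic behind several rows of the table is simple enough to sketch in plain Python. These functions are hypothetical stand-ins that operate on lists of dicts; they are not dbt's API. They mirror the dbt convention that a test fails when its query returns any rows, along with the NULL handling of the built-in unique, accepted_values, and relationships tests. The `warn_after`/`error_after` parameter names echo dbt's source freshness config; the volume bounds and example data are invented for illustration.

```python
from datetime import datetime, timedelta, timezone


def not_null(rows, column):
    """Failing rows: the column is missing or None."""
    return [r for r in rows if r.get(column) is None]


def unique(rows, column):
    """Failing values: non-null values that occur more than once."""
    seen, dupes = set(), set()
    for r in rows:
        value = r.get(column)
        if value is None:  # NULLs are excluded, as in dbt's unique test
            continue
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return sorted(dupes)


def accepted_values(rows, column, values):
    """Failing rows: non-null values outside the allowed set.

    NULLs pass, as they do under SQL's NOT IN; pair with not_null.
    """
    allowed = set(values)
    return [r for r in rows
            if r.get(column) is not None and r.get(column) not in allowed]


def relationships(child_rows, column, parent_rows, parent_column):
    """Failing rows: non-null foreign keys with no matching parent key."""
    parent_keys = {p.get(parent_column) for p in parent_rows}
    return [r for r in child_rows
            if r.get(column) is not None and r.get(column) not in parent_keys]


def freshness_status(max_loaded_at, now, warn_after, error_after):
    """Classify the age of the newest load as pass, warn, or error."""
    age = now - max_loaded_at
    if age > error_after:
        return "error"
    if age > warn_after:
        return "warn"
    return "pass"


def row_count_in_range(rows, min_rows, max_rows):
    """Volume check: the row count lies within expected bounds."""
    return min_rows <= len(rows) <= max_rows


# Hypothetical example data seeded with defects.
customers = [{"customer_id": "A"}, {"customer_id": "B"}]
orders = [
    {"order_id": 1, "customer_id": "A", "status": "paid"},
    {"order_id": 2, "customer_id": "Z", "status": "refunded"},  # orphan key
    {"order_id": 2, "customer_id": None, "status": "bogus"},    # dup id, null key, bad status
]

print(unique(orders, "order_id"))  # [2]
print(len(relationships(orders, "customer_id", customers, "customer_id")))  # 1
loaded = datetime(2024, 1, 2, 0, 0, tzinfo=timezone.utc)
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
print(freshness_status(loaded, now, timedelta(hours=6), timedelta(hours=24)))  # warn
```

Note how the third order fails not_null and accepted_values independently; splitting checks this way keeps each failure message specific, which is the same reason dbt runs one query per test.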
Analytics Engineering Maturity Model
| Level | Stage | Characteristics | Key Milestone |
|---|---|---|---|
| 0 | SQL scripts | Analysts write one-off SQL, no version control | Transformation exists |
| 1 | dbt adopted | Models in Git, basic tests, single developer | First dbt project in prod |
| 2 | Team practice | Multiple contributors, PR reviews, CI checks | CI pipeline blocks bad merges |
| 3 | Platform | Shared macros, packages, staging layer standardized | <10 min feedback loop |
| 4 | Governed | Data contracts, semantic layer, SLAs on models | Consumers trust the data |
| 5 | Scaled | Multi-project, cross-domain, automated lineage | Data mesh enablement |
The Analytics Engineer's Impact
The analytics engineer's greatest contribution is not any single model — it is the compounding effect of trust. When business users know that a metric is tested, documented, and versioned, they stop building shadow spreadsheets. When analysts know that staging models are clean and consistent, they stop writing redundant CTEs. The entire organization moves faster because the foundation is reliable.