
Analytics Engineering: The Discipline Between Data Engineering and Analysis

#analytics #data-engineering #dbt #data-modeling

Analytics engineering has emerged as a distinct discipline that bridges the gap between raw data pipelines and business-ready analytics. Rooted in software engineering best practices — version control, testing, documentation, CI/CD — it applies them to the transformation layer where raw data becomes trusted business logic.

Role Comparison

| Dimension | Data Engineer | Analytics Engineer | Data Analyst |
|---|---|---|---|
| Primary Focus | Infrastructure & pipelines | Transformation & modeling | Insights & reporting |
| Key Tools | Spark, Airflow, Kafka, Terraform | dbt, SQL, Git, Jinja | Tableau, Excel, SQL, Python |
| Output | Reliable data in the warehouse | Clean, tested, documented models | Dashboards, reports, analyses |
| Languages | Python, Scala, SQL, HCL | SQL, YAML, Jinja | SQL, Python, DAX |
| Works With | Raw sources, streaming, infra | Warehouse tables, staging models | Semantic layer, BI tools |
| Quality Focus | Pipeline uptime, data freshness | Model accuracy, test coverage | Insight correctness, clarity |
| Typical Background | Software engineering | Analytics + engineering hybrid | Business, statistics, economics |
| Reports To | Platform / Engineering | Data / Analytics | Business unit / Analytics |

Modern Analytics Workflow

┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│  SOURCE  │    │ STAGING  │    │  MARTS   │    │ CONSUME  │
│          │───▶│          │───▶│          │───▶│          │
│ Raw data │    │ Cleaned  │    │ Business │    │ BI tools │
│ EL loaded│    │ Renamed  │    │ logic    │    │ ML models│
│          │    │ Typed    │    │ Joined   │    │ APIs     │
└──────────┘    └──────────┘    └──────────┘    └──────────┘
                      │               │
                 ┌────▼───────────────▼────┐
                 │     dbt PROJECT         │
                 │  Version controlled     │
                 │  Tested (schema + data) │
                 │  Documented             │
                 │  CI/CD deployed         │
                 └─────────────────────────┘
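
To make the SOURCE → STAGING → MARTS → CONSUME flow concrete, here is a minimal sketch that emulates the layers with Python's built-in sqlite3 module rather than a real warehouse and dbt. The table name raw_payments, its columns, and the sample rows are hypothetical; only the layer roles (rename and type in staging, business logic in marts) come from the diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SOURCE: raw data as an EL tool might land it (hypothetical table and columns).
conn.executescript("""
    CREATE TABLE raw_payments (ID TEXT, CUST TEXT, AMT_CENTS TEXT, STATUS TEXT);
    INSERT INTO raw_payments VALUES
        ('p1', 'c1', '1250', 'succeeded'),
        ('p2', 'c1', '800',  'failed'),
        ('p3', 'c2', '4000', 'succeeded');
""")

# STAGING: one view per source table; rename and cast only, no business logic.
conn.execute("""
    CREATE VIEW stg_payments AS
    SELECT
        ID                                 AS payment_id,
        CUST                               AS customer_id,
        CAST(AMT_CENTS AS INTEGER) / 100.0 AS amount_usd,
        STATUS                             AS status
    FROM raw_payments
""")

# MARTS: business logic lives here (assumed rule: only succeeded payments are revenue).
conn.execute("""
    CREATE VIEW fct_revenue AS
    SELECT customer_id, SUM(amount_usd) AS revenue_usd
    FROM stg_payments
    WHERE status = 'succeeded'
    GROUP BY customer_id
""")

# CONSUME: a BI tool, ML job, or API would query the mart, never the raw table.
revenue = dict(
    conn.execute(
        "SELECT customer_id, revenue_usd FROM fct_revenue ORDER BY customer_id"
    )
)
print(revenue)
```

In a dbt project each view would be its own .sql file, and the FROM clauses would use source() and ref() so that dbt can build the dependency graph.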

dbt Project Structure (Recommended)

models/
├── staging/                    # 1:1 with source tables
│   ├── stripe/
│   │   ├── _stripe__sources.yml
│   │   ├── _stripe__models.yml
│   │   ├── stg_stripe__payments.sql
│   │   └── stg_stripe__customers.sql
│   └── hubspot/
│       ├── _hubspot__sources.yml
│       └── stg_hubspot__contacts.sql
├── intermediate/               # Complex joins, business logic
│   ├── finance/
│   │   └── int_payments_pivoted.sql
│   └── marketing/
│       └── int_attribution_sessionized.sql
├── marts/                      # Business-ready, consumed by BI
│   ├── finance/
│   │   ├── _finance__models.yml
│   │   ├── fct_revenue.sql
│   │   └── dim_customers.sql
│   └── marketing/
│       ├── fct_campaigns.sql
│       └── dim_channels.sql
└── metrics/                    # Semantic layer definitions
    └── revenue_metrics.yml

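Key conventions: stg_ prefix for staging, int_ for intermediate, fct_ for facts, dim_ for dimensions. Dependencies flow in one direction: a model references only layers upstream of it (typically the layer directly before it) and never a downstream layer.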
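
Conventions like these are easier to keep when a script checks them, for example in CI. The sketch below is one possible checker, not a dbt feature: the function names, the regular expression, and the rule that only downstream references are flagged are all assumptions made for illustration. It handles single-argument ref('model') calls only.

```python
import re
from pathlib import PurePosixPath

# Layer order and prefixes, taken from the conventions described above.
LAYER_ORDER = ["staging", "intermediate", "marts"]
PREFIXES = {
    "staging": ("stg_",),
    "intermediate": ("int_",),
    "marts": ("fct_", "dim_"),
}

# Matches {{ ref('model_name') }} style calls with a single quoted argument.
REF_PATTERN = re.compile(r"ref\(\s*['\"]([^'\"]+)['\"]\s*\)")


def layer_of_path(path):
    """Return the layer directory a model file lives in, or None."""
    return next((p for p in PurePosixPath(path).parts if p in PREFIXES), None)


def layer_of_name(name):
    """Infer a model's layer from its prefix, or None if it has no known prefix."""
    for layer, prefixes in PREFIXES.items():
        if name.startswith(prefixes):
            return layer
    return None


def check_model(path, sql):
    """Return a list of convention violations for one model file."""
    problems = []
    layer = layer_of_path(path)
    if layer is None:
        return problems  # file is outside the layered directories
    name = PurePosixPath(path).stem
    if not name.startswith(PREFIXES[layer]):
        expected = " or ".join(PREFIXES[layer])
        problems.append(f"{name}: expected prefix {expected} in {layer}/")
    for ref in REF_PATTERN.findall(sql):
        ref_layer = layer_of_name(ref)
        if ref_layer and LAYER_ORDER.index(ref_layer) > LAYER_ORDER.index(layer):
            problems.append(f"{name}: references downstream model {ref}")
    return problems


ok = check_model(
    "models/marts/finance/fct_revenue.sql",
    "select * from {{ ref('int_payments_pivoted') }}",
)
bad = check_model(
    "models/staging/stripe/payments.sql",
    "select * from {{ ref('fct_revenue') }}",
)
print(ok)
print(bad)
```

The first call passes; the second is flagged twice, once for the missing stg_ prefix and once for a staging model reaching into marts.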

Testing Taxonomy

| Test Type | What It Validates | dbt Implementation | When to Use |
|---|---|---|---|
| Schema tests | Column properties (not null, unique, accepted values) | Built-in dbt tests | Every model, every column |
| Relationship tests | Foreign key integrity | relationships test | Every join key |
| Data tests | Business logic assertions | Custom SQL tests | Complex business rules |
| Freshness tests | Source data is recent enough | source freshness | Every source |
| Volume tests | Row count within expected range | Custom or dbt-utils | Critical models |
| Distribution tests | Statistical properties hold | dbt-expectations | ML feature tables |
| Regression tests | Output matches expected baseline | Custom audit queries | After refactors |
| Contract tests | Schema matches declared contract | dbt contracts (v1.5+) | Public-facing models |
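
Several rows of this table share one mechanism: a dbt test is a query that selects failing records, and the test passes when the query returns zero rows. The sketch below imitates that idea with sqlite3 so it runs anywhere. It is not dbt's implementation; the tables, sample rows, and the 1 to 1000 row-count bound are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customers (customer_id TEXT, email TEXT);
    CREATE TABLE fct_revenue (payment_id TEXT, customer_id TEXT, amount_usd REAL);
    INSERT INTO dim_customers VALUES ('c1', 'a@example.com'), ('c2', NULL);
    INSERT INTO fct_revenue VALUES
        ('p1', 'c1', 12.5),
        ('p1', 'c1', 12.5),   -- duplicate payment_id
        ('p2', 'c9', 40.0),   -- customer missing from dim_customers
        ('p3', 'c2', -5.0);   -- negative amount
""")

# Each test is a query that returns the offending rows; no rows means pass.
TESTS = {
    "not_null: dim_customers.email":
        "SELECT customer_id FROM dim_customers WHERE email IS NULL",
    "unique: fct_revenue.payment_id":
        "SELECT payment_id FROM fct_revenue "
        "GROUP BY payment_id HAVING COUNT(*) > 1",
    "relationships: fct_revenue.customer_id":
        "SELECT f.customer_id FROM fct_revenue f "
        "LEFT JOIN dim_customers d ON f.customer_id = d.customer_id "
        "WHERE d.customer_id IS NULL",
    "data: amount_usd is never negative":
        "SELECT payment_id FROM fct_revenue WHERE amount_usd < 0",
    "volume: fct_revenue row count":
        "SELECT n FROM (SELECT COUNT(*) AS n FROM fct_revenue) "
        "WHERE n NOT BETWEEN 1 AND 1000",
}

results = {name: conn.execute(sql).fetchall() for name, sql in TESTS.items()}
failures = {name: rows for name, rows in results.items() if rows}

for name, rows in results.items():
    status = "FAIL" if rows else "PASS"
    print(f"{status}  {name}  {rows}")
```

With this sample data the volume test passes and the other four fail, each returning the exact rows that need attention. In a real project the first three would be declared in a model's YAML file and the last two written as custom SQL or pulled from a package.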

Analytics Engineering Maturity Model

| Level | Stage | Characteristics | Key Milestone |
|---|---|---|---|
| 0 | SQL scripts | Analysts write one-off SQL, no version control | Transformation exists |
| 1 | dbt adopted | Models in Git, basic tests, single developer | First dbt project in prod |
| 2 | Team practice | Multiple contributors, PR reviews, CI checks | CI pipeline blocks bad merges |
| 3 | Platform | Shared macros, packages, staging layer standardized | <10 min feedback loop |
| 4 | Governed | Data contracts, semantic layer, SLAs on models | Consumers trust the data |
| 5 | Scaled | Multi-project, cross-domain, automated lineage | Data mesh enablement |

The Analytics Engineer's Impact

The analytics engineer's greatest contribution is not any single model — it is the compounding effect of trust. When business users know that a metric is tested, documented, and versioned, they stop building shadow spreadsheets. When analysts know that staging models are clean and consistent, they stop writing redundant CTEs. The entire organization moves faster because the foundation is reliable.

Resources