The lakehouse architecture promised to unify the best of data lakes and data warehouses. Several years into its maturity, the landscape is clearer -- but the decision is far from obvious. This post provides a structured comparison for decision makers evaluating their analytical architecture.
Architecture Comparison
Traditional Warehouse Lakehouse Architecture
┌─────────────────────┐ ┌─────────────────────────┐
│ BI / Analytics │ │ BI / Analytics / ML │
├─────────────────────┤ ├─────────────────────────┤
│ Semantic Layer │ │ Query Engine(s) │
├─────────────────────┤ │ (SQL, Spark, Python) │
│ Proprietary Store │ ├─────────────────────────┤
│ (columnar, closed) │ │ Table Format Layer │
├─────────────────────┤ │ (Iceberg/Delta/Hudi) │
│ Compute + Storage │ ├─────────────────────────┤
│ (tightly coupled) │ │ Object Storage (S3, │
└─────────────────────┘ │ GCS, ADLS) - open │
└─────────────────────────┘
Feature Comparison Matrix
| Capability | Snowflake | Databricks | BigQuery | Redshift | Lakehouse (OSS) |
|---|
| Storage format | Proprietary | Delta Lake (open) | Capacitor (proprietary) | Proprietary + Spectrum | Iceberg/Delta/Hudi |
| Compute-storage separation | Yes | Yes | Yes | Partial (RA3) | Yes |
| ACID transactions | Yes | Yes | Yes | Yes | Yes (table format) |
| Time travel | 90 days | Unlimited | 7 days | N/A | Unlimited |
| Schema evolution | Yes | Yes | Yes | Limited | Yes |
| Multi-engine access | Snowflake only | Spark + SQL | BQ only | Redshift + Spectrum | Any engine |
| Streaming ingestion | Snowpipe | Structured Streaming | Streaming inserts | Kinesis integration | Kafka + table format |
| ML/DS workloads | Snowpark (limited) | Native (Spark, MLflow) | BQML + Vertex | SageMaker integration | Native (any framework) |
| Governance | Horizon | Unity Catalog | Dataplex | Lake Formation | OpenMetadata, Polaris |
| Vendor lock-in risk | High | Medium | High | Medium-High | Low |
Cost Model Comparison
| Cost Factor | Warehouse (SaaS) | Lakehouse (Managed) | Lakehouse (OSS) |
|---|
| Storage | $23-40/TB/month | $23/TB/month (object storage) | $23/TB/month (S3/GCS) |
| Compute | Per-credit/slot, idle possible | Per-cluster, can auto-scale | Self-managed, full control |
| Egress | Included (mostly) | Cloud egress fees | Cloud egress fees |
| Administration | Minimal | Moderate | Significant |
| Scaling model | Vertical (warehouse size) | Horizontal (cluster count) | Horizontal (cluster count) |
| Cost predictability | Low (usage-based) | Medium | High (fixed infra) |
| Break-even point | < 10 TB, < 5 analysts | 10-100 TB, mixed workloads | > 100 TB, strong platform team |
Workload Fit Matrix
| Workload | Warehouse | Lakehouse | Notes |
|---|
| Ad-hoc SQL analytics | Excellent | Good | Warehouse optimized for interactive SQL |
| Scheduled BI dashboards | Excellent | Excellent | Both handle well |
| Data science / ML training | Poor-Fair | Excellent | Lakehouse supports multi-engine natively |
| Real-time streaming | Fair | Excellent | Lakehouse built for streaming + batch |
| Unstructured data (logs, images) | Poor | Excellent | Object storage handles any format |
| Cross-team data sharing | Good (same vendor) | Excellent | Open formats enable multi-engine |
| Regulatory compliance | Good (vendor-managed) | Good (more control) | Trade-off: convenience vs control |
| Small team, fast start | Excellent | Fair | Warehouse = less operational burden |
Migration Path Decision Diagram
Current State Assessment
│
├── Running a cloud warehouse (Snowflake, BQ, Redshift)?
│ ├── Satisfied with cost & capabilities? → STAY, add lakehouse for ML/streaming only
│ ├── Hitting cost ceiling? → Evaluate hybrid: keep warehouse for SQL, add lakehouse tier
│ └── Need multi-engine access? → Migrate to lakehouse with SQL query layer
│
├── Running a data lake (raw files on S3/GCS)?
│ ├── Need ACID + time travel? → Add table format (Iceberg recommended)
│ ├── Need better SQL perf? → Add query engine (Trino, Athena, Starburst)
│ └── Need governance? → Add catalog (Unity, Polaris, Nessie)
│
└── Greenfield project?
├── Team < 5, SQL-focused → Start with warehouse (Snowflake or BQ)
├── Team mixed (analysts + engineers + DS) → Start with managed lakehouse
└── Strong platform team, cost-sensitive → Build OSS lakehouse
Key Takeaways
- There is no universal winner. The right answer depends on team composition, workload mix, and cost constraints.
- The market is converging -- warehouses are adding lakehouse features and vice versa.
- Open table formats (Iceberg in particular) are the safe bet for long-term flexibility.
- Start with the simplest architecture that serves current needs, then evolve.
Resources