Data Fabric Architecture: Unified Data Access Through Active Metadata
#data-architecture #data-fabric #data-integration #ai
Data fabric is an architectural approach that uses active metadata, knowledge graphs, and AI to automate data integration and governance across distributed environments. Unlike data mesh (which is organizational), data fabric is a technology layer that makes data findable, accessible, and usable regardless of where it lives.
Data Fabric vs Data Mesh
| Dimension | Data Fabric | Data Mesh |
|---|---|---|
| Nature | Technology architecture | Organizational paradigm |
| Core idea | Automated integration via metadata | Domain ownership + self-serve platform |
| Centralization | Centralized metadata, distributed data | Decentralized ownership and architecture |
| Automation | High — AI-driven discovery & integration | Low — relies on domain teams |
| Governance | Unified, metadata-driven | Federated, domain-driven |
| Skill requirement | Strong data engineering team | Strong domain teams with data skills |
| Best for | Complex legacy landscapes | Modern orgs with clear domain boundaries |
| Adoption speed | Faster (technology change) | Slower (org change required) |
| Complementary? | Yes: fabric can serve as mesh's self-serve platform layer | Yes: mesh can adopt a fabric as its self-serve platform layer |
Architecture Layers
┌─────────────────────────────────────────────────────────────┐
│  DATA CONSUMERS                                             │
│  Analysts │ Data Scientists │ Applications │ AI/ML          │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────────┐
│  UNIFIED ACCESS LAYER                                       │
│  Virtual queries │ APIs │ Semantic layer │ Search           │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────────┐
│  ACTIVE METADATA & KNOWLEDGE GRAPH                          │
│  Schema inference │ Lineage │ Quality rules │ PII           │
│  Usage analytics │ Recommendations │ Auto-cataloging        │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────────┐
│  DATA INTEGRATION & ORCHESTRATION                           │
│  ETL/ELT │ CDC │ Streaming │ Virtualization                 │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────────┐
│  DATA SOURCES                                               │
│  RDBMS │ Data Lakes │ SaaS APIs │ Files │ Streams           │
└─────────────────────────────────────────────────────────────┘
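As a rough illustration of how the unified access layer can lean on the metadata layer, the sketch below resolves a logical dataset name to a physical source and decides masking from catalog metadata alone. Every name here (`CatalogEntry`, `MetadataCatalog`, `UnifiedAccessLayer`, the dataset names and locations) is hypothetical and not taken from any product; it is a minimal sketch, assuming an in-memory catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Metadata kept about one logical dataset (hypothetical shape)."""
    logical_name: str
    source_kind: str    # e.g. "rdbms", "data_lake", "saas_api"
    location: str       # physical address inside that source
    contains_pii: bool


class MetadataCatalog:
    """Stands in for the active metadata layer: logical name -> entry."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.logical_name] = entry

    def lookup(self, logical_name: str) -> CatalogEntry:
        try:
            return self._entries[logical_name]
        except KeyError:
            raise LookupError(f"unknown dataset: {logical_name}") from None


class UnifiedAccessLayer:
    """Builds an access plan so consumers never address a source directly."""

    def __init__(self, catalog: MetadataCatalog) -> None:
        self._catalog = catalog

    def plan(self, logical_name: str, caller_may_see_pii: bool) -> dict[str, str]:
        entry = self._catalog.lookup(logical_name)
        # Governance is decided from metadata, not from per-source logic.
        masking = "mask" if entry.contains_pii and not caller_may_see_pii else "none"
        return {
            "connector": entry.source_kind,
            "location": entry.location,
            "masking": masking,
        }


# Illustrative registrations; names and locations are made up.
catalog = MetadataCatalog()
catalog.register(CatalogEntry("sales.customers", "rdbms", "crm_db.public.customers", True))
catalog.register(CatalogEntry("web.clickstream", "data_lake", "s3://lake/clickstream/", False))

access = UnifiedAccessLayer(catalog)
customers_plan = access.plan("sales.customers", caller_may_see_pii=False)
clicks_plan = access.plan("web.clickstream", caller_may_see_pii=False)
```

The point of the design is that consumers ask for `sales.customers`, not for a database host: moving the table to another source only changes the catalog entry.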
Tool Comparison
| Capability | Informatica (IDMC) | Talend (Qlik) | Denodo | Databricks Unity Catalog | Atlan |
|---|---|---|---|---|---|
| Approach | Full platform | ETL + quality | Virtualization-first | Lakehouse-native | Metadata platform |
| Metadata automation | CLAIRE AI engine | Basic | Good (semantic layer) | ML-driven | Knowledge graph |
| Data virtualization | Yes | Limited | Core strength | Limited | No (metadata only) |
| Governance | Strong | Moderate | Moderate | Strong (Unity Catalog) | Strong |
| Cloud-native | Yes (IDMC) | Yes | Yes | Yes | Yes |
| Open source option | No | Talend Open Studio (legacy) | No | Delta Lake (storage) | No |
| Pricing model | IPU-based | Per user / connector | Per core / query | DBU-based | Per seat |
| Best for | Enterprise multi-cloud | ETL-heavy pipelines | Real-time virtualization | Lakehouse ecosystem | Modern data stack |
Active Metadata Taxonomy
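Active metadata goes beyond a passive catalog. Instead of only describing datasets, it drives automated actions, such as flagging a stale table or applying a policy to a column classified as sensitive.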
Active Metadata
├── Technical Metadata
│   ├── Schema & types
│   ├── Lineage graphs
│   ├── Freshness & staleness
│   └── Query patterns & performance
├── Operational Metadata
│   ├── Pipeline run history
│   ├── Data quality scores
│   ├── SLA compliance
│   └── Cost per dataset
├── Business Metadata
│   ├── Domain ownership
│   ├── Business glossary terms
│   ├── Sensitivity classification
│   └── Regulatory requirements
└── Social Metadata
    ├── Usage frequency per user/team
    ├── Popularity rankings
    ├── Tribal knowledge (annotations)
    └── Trust scores
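To make "metadata that drives actions" concrete, here is a minimal sketch that takes one field from each branch of the taxonomy and turns it into an action. The `DatasetMetadata` fields, the thresholds (24 hours, 0.9) and the action names are illustrative assumptions, not part of any specific tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class DatasetMetadata:
    name: str
    last_refreshed: datetime   # technical: freshness
    quality_score: float       # operational: 0.0 to 1.0
    sensitivity: str           # business: "public", "internal" or "pii"
    reads_last_30_days: int    # social: usage frequency


def derive_actions(
    meta: DatasetMetadata,
    now: datetime,
    max_age: timedelta = timedelta(hours=24),   # assumed freshness SLA
    min_quality: float = 0.9,                   # assumed quality floor
) -> list[str]:
    """Return the actions that the metadata alone would trigger."""
    actions: list[str] = []
    if now - meta.last_refreshed > max_age:
        actions.append("alert_owner_stale")
    if meta.quality_score < min_quality:
        actions.append("quarantine_downstream")
    if meta.sensitivity == "pii":
        actions.append("apply_masking_policy")
    if meta.reads_last_30_days == 0:
        actions.append("propose_archival")
    return actions


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
orders = DatasetMetadata(
    name="sales.orders",
    last_refreshed=datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc),  # 30 hours old
    quality_score=0.95,
    sensitivity="pii",
    reads_last_30_days=120,
)
orders_actions = derive_actions(orders, now)
# orders_actions == ["alert_owner_stale", "apply_masking_policy"]
```

A passive catalog would store the same four fields; what makes the metadata active in this sketch is that a rule evaluates them and emits something a pipeline or policy engine can act on.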
Adoption Maturity Model
| Level | Name | Characteristics | Key metric |
|---|---|---|---|
| 1 | Siloed | No unified metadata, manual integration | Time to find data: days |
| 2 | Cataloged | Passive catalog in place, manual registration | Time to find data: hours |
| 3 | Connected | Automated discovery, lineage tracking | Time to find data: minutes |
| 4 | Intelligent | AI-driven recommendations, automated quality | Self-service adoption: >60% |
| 5 | Autonomous | Self-healing pipelines, auto-governance | Data incidents auto-resolved: >80% |
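The table can be read as a sequence of gates, which the sketch below encodes. Treating the levels as strictly cumulative, and reducing levels 2 and 3 to yes/no capabilities, are simplifying assumptions made for illustration; the `FabricSignals` fields are hypothetical, while the 60% and 80% thresholds come from the table.

```python
from dataclasses import dataclass

LEVEL_NAMES = {
    1: "Siloed",
    2: "Cataloged",
    3: "Connected",
    4: "Intelligent",
    5: "Autonomous",
}


@dataclass
class FabricSignals:
    has_catalog: bool                      # gate for level 2
    automated_discovery_and_lineage: bool  # gate for level 3
    self_service_adoption: float           # share of users, 0.0 to 1.0
    incidents_auto_resolved: float         # share of incidents, 0.0 to 1.0


def assess_level(signals: FabricSignals) -> tuple[int, str]:
    """Walk the gates in order and stop at the first one that is not met."""
    gates = [
        signals.has_catalog,
        signals.automated_discovery_and_lineage,
        signals.self_service_adoption > 0.60,
        signals.incidents_auto_resolved > 0.80,
    ]
    level = 1
    for passed in gates:
        if not passed:
            break
        level += 1
    return level, LEVEL_NAMES[level]


# Strong automation numbers do not help if self-service adoption lags.
current = assess_level(FabricSignals(True, True, 0.45, 0.90))
# current == (3, "Connected")
```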