
Data Fabric Architecture: Unified Data Access Through Active Metadata

#data-architecture #data-fabric #data-integration #ai

Data fabric is an architectural approach that uses active metadata, knowledge graphs, and AI to automate data integration and governance across distributed environments. Unlike data mesh, which is primarily an organizational paradigm, data fabric is a technology layer that makes data findable, accessible, and usable regardless of where it lives.

Data Fabric vs Data Mesh

| Dimension | Data Fabric | Data Mesh |
|---|---|---|
| Nature | Technology architecture | Organizational paradigm |
| Core idea | Automated integration via metadata | Domain ownership + self-serve platform |
| Centralization | Centralized metadata, distributed data | Decentralized ownership and architecture |
| Automation | High: AI-driven discovery & integration | Low: relies on domain teams |
| Governance | Unified, metadata-driven | Federated, domain-driven |
| Skill requirement | Strong data engineering team | Strong domain teams with data skills |
| Best for | Complex legacy landscapes | Modern orgs with clear domain boundaries |
| Adoption speed | Faster (technology change) | Slower (org change required) |

Complementary? Yes: a fabric can serve as the mesh's self-serve platform layer.

Architecture Layers

┌─────────────────────────────────────────────────────────────┐
│                    DATA CONSUMERS                            │
│  Analysts  │  Data Scientists  │  Applications  │  AI/ML    │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────────┐
│              UNIFIED ACCESS LAYER                            │
│  Virtual queries  │  APIs  │  Semantic layer  │  Search     │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────────┐
│          ACTIVE METADATA & KNOWLEDGE GRAPH                   │
│  Schema inference  │  Lineage  │  Quality rules  │  PII     │
│  Usage analytics   │  Recommendations  │  Auto-cataloging   │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────────┐
│           DATA INTEGRATION & ORCHESTRATION                   │
│  ETL/ELT  │  CDC  │  Streaming  │  Virtualization           │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────────┐
│                  DATA SOURCES                                │
│  RDBMS  │  Data Lakes  │  SaaS APIs  │  Files  │  Streams  │
└─────────────────────────────────────────────────────────────┘
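
To make the layering concrete, here is a minimal sketch of how a unified access layer might resolve a logical dataset name through a metadata catalog before reading from the physical source. Every name in it (`DatasetEntry`, `MetadataCatalog`, `UnifiedAccessLayer`, the dataset names and locations) is hypothetical and chosen for illustration; the `fetch` callables stand in for real connectors, and no vendor API is implied.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class DatasetEntry:
    """Catalog record linking a logical name to a physical source (hypothetical)."""
    logical_name: str
    source_type: str                # e.g. "rdbms" or "data_lake"
    location: str                   # illustrative connection string or path
    fetch: Callable[[], List[Row]]  # stand-in for a real connector


class MetadataCatalog:
    """Metadata layer: knows where each dataset lives."""

    def __init__(self) -> None:
        self._entries: Dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        self._entries[entry.logical_name] = entry

    def resolve(self, logical_name: str) -> DatasetEntry:
        try:
            return self._entries[logical_name]
        except KeyError:
            raise LookupError(f"No dataset registered as {logical_name!r}") from None


class UnifiedAccessLayer:
    """Access layer: consumers ask by logical name, never by location."""

    def __init__(self, catalog: MetadataCatalog) -> None:
        self._catalog = catalog

    def read(self, logical_name: str) -> List[Row]:
        entry = self._catalog.resolve(logical_name)
        return entry.fetch()


# Two sources of different types, registered under logical names.
catalog = MetadataCatalog()
catalog.register(DatasetEntry(
    "sales.orders", "rdbms", "postgres://example/orders",
    lambda: [{"order_id": 1, "amount": 120.0}],
))
catalog.register(DatasetEntry(
    "web.clicks", "data_lake", "s3://example-bucket/clicks/",
    lambda: [{"session": "a1", "clicks": 7}],
))

access = UnifiedAccessLayer(catalog)
orders = access.read("sales.orders")
```

The consumer code only ever mentions `"sales.orders"`; moving that dataset from the RDBMS to a lake would change the catalog entry, not the caller.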

Tool Comparison

| Capability | Informatica (IDMC) | Talend (Qlik) | Denodo | Databricks Unity | Atlan |
|---|---|---|---|---|---|
| Approach | Full platform | ETL + quality | Virtualization-first | Lakehouse-native | Metadata platform |
| Metadata automation | CLAIRE AI engine | Basic | Good (semantic layer) | ML-driven | Knowledge graph |
| Data virtualization | Yes | Limited | Core strength | Limited | No (metadata only) |
| Governance | Strong | Moderate | Moderate | Strong (Unity Catalog) | Strong |
| Cloud-native | Yes (IDMC) | Yes | Yes | Yes | Yes |
| Open source option | No | Talend Open Studio (legacy) | No | Delta Lake (storage) | No |
| Pricing model | IPU-based | Per user / connector | Per core / query | DBU-based | Per seat |
| Best for | Enterprise multi-cloud | ETL-heavy pipelines | Real-time virtualization | Lakehouse ecosystem | Modern data stack |

Active Metadata Taxonomy

Active metadata goes beyond a passive catalog: instead of only describing data, it drives automated actions. The taxonomy below groups it into four categories.

Active Metadata
├── Technical Metadata
│   ├── Schema & types
│   ├── Lineage graphs
│   ├── Freshness & staleness
│   └── Query patterns & performance
├── Operational Metadata
│   ├── Pipeline run history
│   ├── Data quality scores
│   ├── SLA compliance
│   └── Cost per dataset
├── Business Metadata
│   ├── Domain ownership
│   ├── Business glossary terms
│   ├── Sensitivity classification
│   └── Regulatory requirements
└── Social Metadata
    ├── Usage frequency per user/team
    ├── Popularity rankings
    ├── Tribal knowledge (annotations)
    └── Trust scores
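
As a rough illustration of metadata driving automated actions, the sketch below evaluates a few rules over one dataset's metadata and returns the actions to trigger. The field names, the 24-hour freshness SLA, the 0.8 quality threshold, and the action names are all assumptions made for this example, not part of any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class DatasetMetadata:
    name: str
    sensitivity: str           # business metadata, e.g. "pii" or "public"
    last_refreshed: datetime   # technical metadata (freshness)
    quality_score: float       # operational metadata, 0.0 to 1.0
    freshness_sla: timedelta = timedelta(hours=24)  # assumed default SLA


def evaluate(meta: DatasetMetadata, now: datetime) -> List[str]:
    """Return the automated actions this metadata should trigger."""
    actions: List[str] = []
    if meta.sensitivity == "pii":
        actions.append("apply_masking_policy")
    if now - meta.last_refreshed > meta.freshness_sla:
        actions.append("alert_owner_stale_data")
    if meta.quality_score < 0.8:  # assumed threshold
        actions.append("quarantine_downstream_jobs")
    return actions


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
customers = DatasetMetadata(
    name="crm.customers",
    sensitivity="pii",
    last_refreshed=datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc),  # 30 hours ago
    quality_score=0.95,
)
print(evaluate(customers, now))
# ['apply_masking_policy', 'alert_owner_stale_data']
```

A passive catalog would stop at storing these fields; the "active" part is the step that turns them into actions without a person reading the catalog first.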

Adoption Maturity Model

| Level | Name | Characteristics | Key metric |
|---|---|---|---|
| 1 | Siloed | No unified metadata, manual integration | Time to find data: days |
| 2 | Cataloged | Passive catalog in place, manual registration | Time to find data: hours |
| 3 | Connected | Automated discovery, lineage tracking | Time to find data: minutes |
| 4 | Intelligent | AI-driven recommendations, automated quality | Self-service adoption: >60% |
| 5 | Autonomous | Self-healing pipelines, auto-governance | Data incidents auto-resolved: >80% |
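
One way to use the model for a self-assessment is sketched below. It assumes the levels are cumulative (each level requires everything below it) and reuses the >60% and >80% thresholds from the key metrics; the `FabricCapabilities` fields and the `maturity_level` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass

LEVEL_NAMES = {
    1: "Siloed",
    2: "Cataloged",
    3: "Connected",
    4: "Intelligent",
    5: "Autonomous",
}


@dataclass
class FabricCapabilities:
    has_catalog: bool               # any catalog in place, even a passive one
    automated_discovery: bool       # discovery and lineage are automated
    self_service_adoption: float    # fraction of users, 0.0 to 1.0
    incidents_auto_resolved: float  # fraction of incidents, 0.0 to 1.0


def maturity_level(c: FabricCapabilities) -> int:
    """Map capabilities to a level, assuming levels build on each other."""
    if not c.has_catalog:
        return 1
    if not c.automated_discovery:
        return 2
    if c.self_service_adoption <= 0.60:
        return 3
    if c.incidents_auto_resolved <= 0.80:
        return 4
    return 5


org = FabricCapabilities(
    has_catalog=True,
    automated_discovery=True,
    self_service_adoption=0.72,
    incidents_auto_resolved=0.35,
)
level = maturity_level(org)
print(level, LEVEL_NAMES[level])
# 4 Intelligent
```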
