Master Data Management: Golden Records and Entity Resolution
#data-governance #mdm #data-quality #enterprise
What Is Master Data?
Master data describes the core business entities shared across an organization: customers, products, suppliers, employees, and locations. Every system references this data, but no single system owns it.
When master data is inconsistent, problems cascade:
- The same customer appears three times with different addresses
- Product hierarchies conflict between ERP and e-commerce
- Financial reporting cannot reconcile across business units
MDM Core Concepts
| Concept | Definition |
|---|---|
| Golden Record | The single, authoritative version of a master entity |
| Entity Resolution | Matching and merging records that represent the same real-world entity |
| Data Deduplication | Removing redundant records while preserving data completeness |
| Survivorship Rules | Logic that determines which field value "wins" when merging records |
| Hierarchy Management | Maintaining parent-child relationships (product categories, org charts) |
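Survivorship rules are the easiest of these concepts to show in code. The sketch below is a minimal illustration, not a reference implementation: the source names (`crm`, `erp`, `web`), the field names, and the rule set ("source priority for address, most recent non-empty value otherwise") are all assumptions made up for the example.

```python
from datetime import date

# Hypothetical source records that have already been matched to one customer.
records = [
    {"source": "crm", "updated": date(2024, 3, 1),
     "email": "j.smith@example.com", "phone": None, "address": "123 Main St"},
    {"source": "erp", "updated": date(2023, 11, 15),
     "email": None, "phone": "555-0100", "address": "123 Main Street"},
    {"source": "web", "updated": date(2024, 5, 20),
     "email": "john.smith@example.com", "phone": None, "address": None},
]

# Assumed rule: for these fields a trusted source wins, in this order.
SOURCE_PRIORITY = {"address": ["erp", "crm", "web"]}


def survive(records, field):
    """Pick the winning value for one field across matched records."""
    candidates = [r for r in records if r.get(field) is not None]
    if not candidates:
        return None
    priority = SOURCE_PRIORITY.get(field)
    if priority:
        # Source-priority rule: the lowest rank wins; unknown sources rank last.
        rank = {source: i for i, source in enumerate(priority)}
        winner = min(candidates, key=lambda r: rank.get(r["source"], len(priority)))
    else:
        # Default rule: the most recently updated non-empty value wins.
        winner = max(candidates, key=lambda r: r["updated"])
    return winner[field]


def build_golden_record(records, fields):
    """Apply the survivorship rule for each field to build one merged record."""
    return {field: survive(records, field) for field in fields}


golden = build_golden_record(records, ["email", "phone", "address"])
```

Here the email comes from the most recently updated record, the phone from the only source that has one, and the address from the ERP because it is ranked first for that attribute.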
MDM Implementation Styles
| Style | Description | Best For | Complexity |
|---|---|---|---|
| Registry | MDM stores links to source records but doesn't hold master data | Low disruption, quick wins | Low |
| Consolidation | MDM builds golden records from sources in a read-only hub | Analytics and reporting | Medium |
| Coexistence | MDM creates golden records and syncs back to source systems | Operational consistency | High |
| Transaction | MDM is the system of record; all changes go through it | Maximum control | Very high |
Most organizations start with consolidation and evolve toward coexistence as maturity grows.
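The difference between the registry style and the others is mostly about what the hub stores. As a rough sketch of the registry idea, the hub below keeps only a cross-reference between a master ID and source-system keys, with no attribute values at all. The class names, the `CUST-1` identifier, and the source keys are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceKey:
    """Identifies one record in one source system (hashable, so usable in sets)."""
    system: str
    record_id: str


class Registry:
    """Registry-style hub: stores links to source records, not the master data."""

    def __init__(self):
        self._by_master = {}  # master_id -> set of SourceKey
        self._by_source = {}  # SourceKey -> master_id

    def link(self, master_id, key):
        """Record that a source record belongs to a master entity."""
        self._by_master.setdefault(master_id, set()).add(key)
        self._by_source[key] = master_id

    def master_for(self, key):
        """Return the master ID a source record is linked to, or None."""
        return self._by_source.get(key)

    def sources_for(self, master_id):
        """Return a copy of all source keys linked to a master entity."""
        return set(self._by_master.get(master_id, set()))


registry = Registry()
registry.link("CUST-1", SourceKey("crm", "A17"))
registry.link("CUST-1", SourceKey("erp", "9004"))
```

A consolidation hub would add the merged attribute values next to each master ID, and a coexistence hub would additionally push those values back to the systems named in the links.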
Entity Resolution: The Hard Problem
Entity resolution determines whether two records represent the same entity. This is harder than it sounds:
- "John Smith" at "123 Main St" and "J. Smith" at "123 Main Street" -- same person?
- "Acme Corp" and "ACME Corporation" and "Acme Co., Ltd." -- same company?
Approaches:
- Deterministic matching: exact match on defined keys (email, tax ID)
- Probabilistic matching: scoring based on multiple fuzzy attributes
- ML-based matching: trained models that learn matching patterns
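To make the first two approaches concrete, here is a minimal sketch using only Python's standard library. The field names (`email`, `tax_id`, `name`, `address`), the weights, and the `normalize` helper are illustrative assumptions. The weighted string similarity is a simplified stand-in for probabilistic matching, not a full Fellegi-Sunter model.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, drop periods and commas, and collapse whitespace."""
    cleaned = text.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def deterministic_match(a, b, keys=("email", "tax_id")):
    """True if any key is present in both records and exactly equal."""
    return any(a.get(k) and a.get(k) == b.get(k) for k in keys)


def similarity(x, y):
    """Fuzzy similarity in [0, 1]; missing values contribute 0."""
    if not x or not y:
        return 0.0
    return SequenceMatcher(None, normalize(x), normalize(y)).ratio()


# Assumed weights: how much each fuzzy attribute contributes to the score.
WEIGHTS = {"name": 0.6, "address": 0.4}


def probabilistic_score(a, b, weights=WEIGHTS):
    """Weighted sum of per-field similarities, in [0, 1] when weights sum to 1."""
    return sum(w * similarity(a.get(f), b.get(f)) for f, w in weights.items())


rec_a = {"name": "John Smith", "address": "123 Main St", "email": None}
rec_b = {"name": "J. Smith", "address": "123 Main Street", "email": None}
rec_c = {"name": "Maria Garcia", "address": "9 Oak Ave", "email": None}

exact = deterministic_match(rec_a, rec_b)        # no shared key, so no exact match
likely_same = probabilistic_score(rec_a, rec_b)  # high score: similar name and address
likely_diff = probabilistic_score(rec_a, rec_c)  # low score: little overlap
```

The two "Smith" records fail the deterministic check because neither has an email or tax ID, yet they score high on the fuzzy attributes. That gap is why most pipelines run deterministic rules first and fall back to scoring.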
Key decisions:
- Match threshold: set it too low and unrelated records get merged (false positives, over-merging); set it too high and true duplicates stay separate (false negatives, under-merging)
- Human review workflow for uncertain matches
- Ongoing matching (new records must be matched continuously)
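One common way to combine the threshold and human-review decisions is to use two cut-offs instead of one: merge automatically above the upper bound, send the middle band to a steward, and treat everything below the lower bound as distinct. The sketch below assumes match scores in the range 0 to 1. The 0.70 and 0.90 bounds are placeholders, not recommended values; in practice they would be tuned against labeled pairs.

```python
from enum import Enum


class Decision(Enum):
    MERGE = "merge"        # confident match: merge automatically
    REVIEW = "review"      # uncertain: send to a data steward
    DISTINCT = "distinct"  # confident non-match: keep separate


def classify(score, lower=0.70, upper=0.90):
    """Map a match score to a decision using two thresholds (assumed values)."""
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= lower <= upper <= 1")
    if score >= upper:
        return Decision.MERGE
    if score >= lower:
        return Decision.REVIEW
    return Decision.DISTINCT


def route(scored_pairs, lower=0.70, upper=0.90):
    """Split (pair_id, score) tuples into one bucket per decision."""
    buckets = {decision: [] for decision in Decision}
    for pair_id, score in scored_pairs:
        buckets[classify(score, lower, upper)].append(pair_id)
    return buckets


# Hypothetical candidate pairs with scores from an earlier matching step.
buckets = route([("p1", 0.95), ("p2", 0.83), ("p3", 0.40)])
```

Raising `upper` reduces over-merging at the cost of a longer review queue; lowering `lower` catches more borderline duplicates but gives stewards more work.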
When MDM Pays Off
- M&A integration: merging customer bases from acquired companies
- Regulatory compliance: single customer view for KYC/AML
- Customer experience: consistent profile across channels
- Financial consolidation: single product and customer hierarchy for reporting
- Supply chain optimization: unified supplier and product master
When MDM Struggles
- Organizations without executive sponsorship (MDM is political, not just technical)
- Companies that underestimate data stewardship effort
- Projects that try to boil the ocean (starting with all entities at once)
- Implementations that ignore source system data entry quality
Common Failure Patterns
| Pattern | Why It Fails |
|---|---|
| Big bang approach | Trying to master all entities at once overwhelms the team |
| Technology-first | Buying a tool without defining governance processes |
| No stewardship | Golden records degrade without ongoing human review |
| Ignoring sources | Fixing master data without fixing data entry at source is futile |
| Over-engineering match rules | Past a certain point, extra rules add complexity but little additional match precision |
Implementation Roadmap
1. Pick one entity domain: start with customer or product, not everything
2. Profile the data: understand duplication rates, completeness, consistency
3. Define survivorship rules: which source wins for which attribute
4. Implement matching: start deterministic, add probabilistic as needed
5. Establish stewardship: assign data stewards for exception handling
6. Measure and iterate: track duplicate rate, golden record coverage, steward throughput
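The profiling and measurement steps need concrete definitions before they can be tracked. The sketch below shows one plausible set, under stated assumptions: each source record carries a hypothetical `master_id` field once it is linked to a golden record, duplicate rate is defined as the share of linked records that are redundant, and coverage is the share of records that are linked at all. Other definitions are equally valid as long as they stay consistent over time.

```python
def duplicate_rate(master_ids):
    """Share of linked records that are redundant: 1 - entities / records."""
    linked = [m for m in master_ids if m is not None]
    if not linked:
        return 0.0
    return 1 - len(set(linked)) / len(linked)


def golden_coverage(records):
    """Share of source records that are linked to a golden record."""
    if not records:
        return 0.0
    linked = sum(1 for r in records if r.get("master_id") is not None)
    return linked / len(records)


def completeness(records, field):
    """Share of records with a non-empty value for the given field."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


# Hypothetical source records after a matching run.
records = [
    {"id": "crm-1", "master_id": "C1", "email": "a@example.com"},
    {"id": "erp-7", "master_id": "C1", "email": None},
    {"id": "web-3", "master_id": "C2", "email": "b@example.com"},
    {"id": "web-9", "master_id": None, "email": ""},
]

metrics = {
    "duplicate_rate": duplicate_rate([r["master_id"] for r in records]),
    "golden_coverage": golden_coverage(records),
    "email_completeness": completeness(records, "email"),
}
```

Tracking these numbers per source system, not only in aggregate, makes it easier to see which data entry points are feeding duplicates into the hub.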