Master Data Management: Golden Records and Entity Resolution

#data-governance #mdm #data-quality #enterprise

What Is Master Data?

Master data is the core business entities shared across an organization: customers, products, suppliers, employees, locations. It is the data that every system references but no single system owns.

When master data is inconsistent, problems cascade:

  • The same customer appears three times with different addresses
  • Product hierarchies conflict between ERP and e-commerce
  • Financial reporting cannot reconcile across business units

MDM Core Concepts

Concept | Definition
Golden Record | The single, authoritative version of a master entity
Entity Resolution | Matching and merging records that represent the same real-world entity
Data Deduplication | Removing redundant records while preserving data completeness
Survivorship Rules | Logic that determines which field value "wins" when merging records
Hierarchy Management | Maintaining parent-child relationships (product categories, org charts)
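To make golden records and survivorship rules concrete, here is a minimal Python sketch. It assumes the matching step has already decided which source records describe the same entity, and it applies two illustrative rules: a per-attribute source priority (the hypothetical `SOURCE_PRIORITY` table), falling back to "most recently updated non-empty value wins". The record layout, system names, and rules are assumptions for illustration, not a standard.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class SourceRecord:
    source: str  # source system name (hypothetical: "crm", "erp", ...)
    record_id: str
    updated: date
    fields: dict[str, str | None]


# Hypothetical per-attribute source priority: earlier systems win.
# Attributes not listed fall back to "most recently updated wins".
SOURCE_PRIORITY: dict[str, list[str]] = {
    "email": ["crm", "ecommerce", "erp"],
    "tax_id": ["erp", "crm", "ecommerce"],
}


def survive(records: list[SourceRecord], attribute: str) -> str | None:
    """Return the winning value for one attribute across matched records."""
    # Empty or missing values never win.
    candidates = [r for r in records if r.fields.get(attribute)]
    if not candidates:
        return None
    priority = SOURCE_PRIORITY.get(attribute)
    if priority:
        ranked = [r for r in candidates if r.source in priority]
        if ranked:
            best = min(ranked, key=lambda r: priority.index(r.source))
            return best.fields[attribute]
    # Fallback rule: the most recently updated non-empty value wins.
    return max(candidates, key=lambda r: r.updated).fields[attribute]


def build_golden_record(records: list[SourceRecord]) -> dict[str, object]:
    """Merge records already judged to be the same entity into one golden record."""
    attributes = sorted({a for r in records for a in r.fields})
    golden: dict[str, object] = {a: survive(records, a) for a in attributes}
    # Keep lineage so every golden record can be traced back to its sources.
    golden["_sources"] = [f"{r.source}:{r.record_id}" for r in records]
    return golden


records = [
    SourceRecord("crm", "C-1", date(2023, 5, 1),
                 {"name": "John Smith", "email": "john@example.com", "phone": None}),
    SourceRecord("erp", "E-9", date(2024, 1, 15),
                 {"name": "J. Smith", "email": "jsmith@example.com", "phone": "555-0100"}),
]
golden = build_golden_record(records)
print(golden)
```

Here the email comes from the CRM because of its priority, while name and phone fall back to the more recent ERP record. Keeping lineage in `_sources` lets a steward see where each golden record came from.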

MDM Implementation Styles

Style | Description | Best For | Complexity
Registry | MDM stores links to source records but doesn't hold master data | Low disruption, quick wins | Low
Consolidation | MDM creates golden records from sources in a read-only hub | Analytics and reporting | Medium
Coexistence | MDM creates golden records and syncs them back to source systems | Operational consistency | High
Transaction | MDM is the system of record; all changes go through it | Maximum control | Very high

Most organizations start with consolidation and evolve toward coexistence as maturity grows.
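A registry hub can be sketched as little more than a cross-reference index. In this minimal Python sketch, the hypothetical `Registry` class stores only (system, key) links per master ID and assembles a view by reading the sources at query time; the in-memory `SOURCES` dictionary stands in for real source-system lookups. A consolidation hub would instead copy the values into the hub and merge them into a stored golden record.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical source systems; in reality these would be API or database lookups.
SOURCES: dict[str, dict[str, dict[str, str]]] = {
    "crm": {"C-1": {"name": "John Smith", "email": "john@example.com"}},
    "erp": {"E-9": {"name": "J. Smith", "phone": "555-0100"}},
}


class Registry:
    """Registry-style hub: holds cross-references only, never master data values."""

    def __init__(self) -> None:
        self._links: defaultdict[str, list[tuple[str, str]]] = defaultdict(list)
        self._master_of: dict[tuple[str, str], str] = {}

    def link(self, master_id: str, system: str, key: str) -> None:
        """Record that a source record belongs to a master entity."""
        self._links[master_id].append((system, key))
        self._master_of[(system, key)] = master_id

    def master_id_for(self, system: str, key: str) -> str | None:
        return self._master_of.get((system, key))

    def federated_view(self, master_id: str) -> dict[str, dict[str, str]]:
        """Assemble the entity at read time by querying each linked source."""
        return {f"{s}:{k}": SOURCES[s][k] for s, k in self._links.get(master_id, [])}


registry = Registry()
registry.link("M-001", "crm", "C-1")
registry.link("M-001", "erp", "E-9")
view = registry.federated_view("M-001")
print(view)
```

Because nothing is copied, source systems keep working unchanged, which is why the registry style is low disruption; the cost is that every read depends on the sources being available and consistent.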

Entity Resolution: The Hard Problem

Entity resolution determines whether two records represent the same entity. This is harder than it sounds:

  • "John Smith" at "123 Main St" and "J. Smith" at "123 Main Street" -- same person?
  • "Acme Corp" and "ACME Corporation" and "Acme Co., Ltd." -- same company?
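Much of the ambiguity in these examples comes from formatting rather than meaning, so matching pipelines commonly standardize values before comparing them. The Python sketch below lowercases text, expands a few street abbreviations, and drops common legal-entity suffixes; the tiny lookup tables are hypothetical placeholders for real reference data.

```python
import re

# Hypothetical, deliberately small lookup tables; production systems rely on
# much larger reference data (postal standards, legal-entity suffix lists).
STREET_ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}
LEGAL_SUFFIXES = {"corp", "corporation", "co", "ltd", "inc", "llc"}


def tokens(text: str) -> list[str]:
    """Lowercase and split on any run of characters that are not letters or digits."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def normalize_address(address: str) -> str:
    """Expand known street abbreviations token by token."""
    return " ".join(STREET_ABBREVIATIONS.get(t, t) for t in tokens(address))


def normalize_company(name: str) -> str:
    """Drop legal-form suffixes such as 'Corp' or 'Ltd'."""
    return " ".join(t for t in tokens(name) if t not in LEGAL_SUFFIXES)


print(normalize_address("123 Main St"), "|", normalize_address("123 Main Street"))
print({normalize_company(n) for n in ["Acme Corp", "ACME Corporation", "Acme Co., Ltd."]})
```

After normalization both addresses and all three company names compare equal. "J. Smith" versus "John Smith" still does not, which is why standardization is usually followed by fuzzy scoring rather than treated as matching on its own.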

Approaches:

  • Deterministic matching: exact match on defined keys (email, tax ID)
  • Probabilistic matching: scoring based on multiple fuzzy attributes
  • ML-based matching: trained models that learn matching patterns
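The first two approaches are often combined: try the deterministic keys first, and fall back to a fuzzy score only when none of them match. This Python sketch uses the standard library's `difflib.SequenceMatcher` as the fuzzy comparator; the key list, the attribute weights, and the weighted-average scoring are illustrative assumptions, not a calibrated model. An ML-based matcher would replace the hand-set weights with a trained classifier over similar features.

```python
from __future__ import annotations

from difflib import SequenceMatcher

# Deterministic keys: an exact, non-empty match on any of these is treated as
# the same entity (assumes these identifiers are trustworthy in the sources).
DETERMINISTIC_KEYS = ("email", "tax_id")

# Hypothetical attribute weights for the probabilistic score.
WEIGHTS = {"name": 0.5, "address": 0.3, "phone": 0.2}


def similarity(a: str, b: str) -> float:
    """Fuzzy string similarity in [0, 1] using difflib's ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(r1: dict[str, str], r2: dict[str, str]) -> float:
    # 1. Deterministic pass: exact match on a defined key.
    for key in DETERMINISTIC_KEYS:
        value = r1.get(key)
        if value and value == r2.get(key):
            return 1.0
    # 2. Probabilistic pass: weighted average over attributes both records have,
    #    so a missing value neither helps nor hurts the score.
    shared = [f for f in WEIGHTS if r1.get(f) and r2.get(f)]
    if not shared:
        return 0.0
    total = sum(WEIGHTS[f] for f in shared)
    return sum(WEIGHTS[f] * similarity(r1[f], r2[f]) for f in shared) / total


a = {"name": "John Smith", "address": "123 Main St", "email": "john@example.com"}
b = {"name": "J. Smith", "address": "123 Main Street"}
c = {"name": "Johnny Smith", "email": "john@example.com"}
print(match_score(a, b))  # probabilistic: name + address similarity
print(match_score(a, c))  # deterministic: exact email hit
```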

Key decisions:

  • Match threshold: too low = false positives (over-merging), too high = false negatives (under-merging)
  • Human review workflow for uncertain matches
  • Ongoing matching (new records must be matched continuously)
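The threshold trade-off and the review workflow fit naturally into a three-way decision with two thresholds: scores above the upper one merge automatically, scores between the two go to a steward, and scores below the lower one stay separate. In this Python sketch the thresholds, the `Decision` labels, and the reviewed sample are hypothetical; `error_counts` shows how moving a single threshold exchanges false positives for false negatives.

```python
from enum import Enum


class Decision(Enum):
    MATCH = "auto-merge"
    REVIEW = "route to a data steward"
    NON_MATCH = "keep as separate entities"


# Hypothetical thresholds; in practice they are chosen per domain from a
# labeled sample of record pairs.
UPPER = 0.90
LOWER = 0.70


def decide(score: float, upper: float = UPPER, lower: float = LOWER) -> Decision:
    """Three-way decision: confident matches merge, uncertain ones go to review."""
    if score >= upper:
        return Decision.MATCH
    if score >= lower:
        return Decision.REVIEW
    return Decision.NON_MATCH


def error_counts(labeled: list[tuple[float, bool]], threshold: float) -> tuple[int, int]:
    """Count (false positives, false negatives) if every pair >= threshold were merged.

    labeled holds (match score, truly the same entity?) pairs from a reviewed sample.
    """
    false_pos = sum(1 for score, same in labeled if score >= threshold and not same)
    false_neg = sum(1 for score, same in labeled if score < threshold and same)
    return false_pos, false_neg


# Hypothetical reviewed sample, for illustration only.
sample = [(0.95, True), (0.88, True), (0.82, False), (0.75, True), (0.60, False), (0.40, False)]
for t in (0.70, 0.90):
    print(t, error_counts(sample, t))
```

On this sample, a threshold of 0.70 over-merges one pair, while 0.90 misses two true matches. For ongoing matching, each incoming record would be scored against the existing golden records and routed through the same decision, so the review queue keeps receiving work after the initial load.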

When MDM Pays Off

  • M&A integration: merging customer bases from acquired companies
  • Regulatory compliance: single customer view for KYC/AML
  • Customer experience: consistent profile across channels
  • Financial consolidation: single product and customer hierarchy for reporting
  • Supply chain optimization: unified supplier and product master

When MDM Struggles

  • Organizations without executive sponsorship (MDM is political, not just technical)
  • Companies that underestimate data stewardship effort
  • Projects that try to boil the ocean (starting with all entities at once)
  • Implementations that ignore source system data entry quality

Common Failure Patterns

Pattern | Why It Fails
Big bang approach | Trying to master all entities at once overwhelms the team
Technology-first | Buying a tool without defining governance processes
No stewardship | Golden records degrade without ongoing human review
Ignoring sources | Fixing master data without fixing data entry at source is futile
Over-engineering match rules | Diminishing returns on match precision past a threshold

Implementation Roadmap

  1. Pick one entity domain: start with customer or product, not everything
  2. Profile the data: understand duplication rates, completeness, consistency
  3. Define survivorship rules: which source wins for which attribute
  4. Implement matching: start deterministic, add probabilistic as needed
  5. Establish stewardship: assign data stewards for exception handling
  6. Measure and iterate: track duplicate rate, golden record coverage, steward throughput
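Steps 2 and 6 largely come down to a few simple ratios. This Python sketch computes attribute completeness, a duplicate rate, and golden record coverage; the metric definitions (for example, duplicate rate as 1 minus distinct entities divided by source records) are one reasonable choice, assumed here rather than taken from any standard, and the record IDs are hypothetical. Pairwise matches are grouped into entities with union-find, since if A matches B and B matches C, all three are treated as one entity.

```python
from __future__ import annotations


def completeness(records: list[dict[str, str | None]], attribute: str) -> float:
    """Share of records with a non-empty value for one attribute."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.get(attribute)) / len(records)


def clusters_from_matches(ids: list[str], matched_pairs: list[tuple[str, str]]) -> dict[str, str]:
    """Group pairwise matches into entities with union-find (matches are transitive)."""
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for left, right in matched_pairs:
        root_l, root_r = find(left), find(right)
        if root_l != root_r:
            parent[root_r] = root_l
    return {i: find(i) for i in ids}


def duplicate_rate(cluster_of: dict[str, str]) -> float:
    """Share of source records that are redundant copies of another record."""
    if not cluster_of:
        return 0.0
    return 1 - len(set(cluster_of.values())) / len(cluster_of)


def golden_coverage(source_ids: list[str], linked_ids: set[str]) -> float:
    """Share of source records linked to some golden record."""
    if not source_ids:
        return 0.0
    return sum(1 for i in source_ids if i in linked_ids) / len(source_ids)


# Hypothetical extract from three source systems.
records = [
    {"id": "C-1", "email": "john@example.com"},
    {"id": "E-9", "email": None},
    {"id": "W-3", "email": "ann@example.com"},
    {"id": "W-4", "email": ""},
]
ids = [r["id"] for r in records]
clusters = clusters_from_matches(ids, [("C-1", "E-9")])
print(completeness(records, "email"))
print(duplicate_rate(clusters))
print(golden_coverage(ids, {"C-1", "E-9", "W-3"}))
```

That transitivity is also a risk: a single false-positive pair can chain two otherwise unrelated clusters together, which is one more reason to keep stewards reviewing borderline merges.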

Resources