Relational databases model entities; graph databases model relationships. When the questions you ask are about connections, paths, influence, or communities, graphs can outperform tabular approaches by orders of magnitude on multi-hop queries, because each hop follows stored adjacency instead of executing another join.
When to Use Graphs — Decision Matrix
| Signal | Strength | Example |
|---|---|---|
| Queries with variable-depth joins (3+) | Strong | "Find all suppliers connected to a flagged entity within 5 hops" |
| Many-to-many relationships dominate | Strong | Social networks, recommendation engines |
| Schema evolves frequently | Moderate | Knowledge graphs, R&D datasets |
| Path-finding is a core operation | Strong | Logistics routing, network topology |
| Need real-time traversal | Strong | Fraud detection during transactions |
| Data is primarily tabular with few joins | Weak | Standard analytics — use SQL |
| Fixed schema, simple lookups | Weak | CRUD applications — use RDBMS |
| Aggregations over large datasets | Weak | Data warehousing — use columnar |
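To make the first row concrete: in SQL, a "within 5 hops" question needs a recursive CTE or a chain of self-joins, while a graph engine expands neighbors level by level from the starting node. A minimal in-memory sketch of that bounded breadth-first expansion, assuming a plain adjacency dict (the supplier names are hypothetical):

```python
from collections import deque


def within_hops(adj: dict[str, list[str]], start: str, max_hops: int) -> dict[str, int]:
    """Return every node reachable from `start` in at most `max_hops` edges,
    mapped to its hop distance (breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the depth limit
        for nbr in adj.get(node, []):
            if nbr not in dist:  # first visit is the shortest hop count in BFS
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


# Hypothetical supplier network: edges point from an entity to the suppliers it deals with.
supply = {
    "flagged_co": ["s1", "s2"],
    "s1": ["s3"],
    "s3": ["s4"],
    "s4": ["s5"],
    "s5": ["s6"],
    "s6": ["s7"],
}
reachable = within_hops(supply, "flagged_co", 5)
suppliers = sorted(n for n in reachable if n != "flagged_co")
# s7 sits 6 hops away, so it is excluded.
```

The cost is proportional to the part of the graph actually visited, which is why bounded traversals stay fast even when the full dataset is large.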
Graph Use Case Taxonomy
```text
Graph Analytics
├── Fraud & Risk
│ ├── Transaction ring detection
│ ├── Identity resolution
│ ├── AML network analysis
│ └── Insurance claim networks
├── Knowledge & Search
│ ├── Enterprise knowledge graphs
│ ├── Semantic search
│ ├── Content recommendation
│ └── Drug discovery (molecule graphs)
├── Network & Infrastructure
│ ├── IT dependency mapping
│ ├── Telecom network optimization
│ ├── Supply chain visibility
│ └── Impact analysis
├── Social & Influence
│ ├── Community detection
│ ├── Influencer identification
│ ├── Organizational network analysis
│ └── Customer journey mapping
└── AI & ML
├── Graph neural networks (GNN)
├── Link prediction
├── Node classification
└── Graph-enhanced RAG
```
Algorithm Comparison
| Algorithm | Category | What it finds | Complexity | Typical use case |
|---|---|---|---|---|
| PageRank | Centrality | Important nodes by link structure | O(V + E) per iteration | Influence ranking, critical infra |
| Betweenness Centrality | Centrality | Bridge nodes between communities | O(V * E) (Brandes, unweighted) | Key person analysis, bottleneck detection |
| Louvain | Community | Dense clusters (communities) | ~O(V log V), empirical | Customer segmentation, fraud rings |
| Label Propagation | Community | Fast community assignment | O(E) per iteration | Large-scale community detection |
| Dijkstra | Path | Shortest weighted path | O(E + V log V) (Fibonacci heap) | Routing, supply chain |
| A* | Path | Shortest path with heuristic | Heuristic-dependent; worst case same as Dijkstra | Geospatial routing |
| Node2Vec | Embedding | Vector representations of nodes | O(V * walk_length) | Feature engineering for ML |
| Triangle Count | Structure | Triangles per node (basis of the clustering coefficient) | O(E^1.5) | Network density analysis |
| Weakly Connected Components | Structure | Disconnected subgraphs | O(V + E) | Data quality, entity resolution |
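The O(V + E) per-iteration figure for PageRank comes from power iteration: each round pushes every node's score along its out-edges exactly once. A minimal pure-Python sketch, assuming a dict-of-lists adjacency and the conventional damping factor of 0.85 (the four-node graph is a hypothetical example):

```python
def pagerank(adj: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100, tol: float = 1e-10) -> dict[str, float]:
    """Power-iteration PageRank over a directed adjacency dict."""
    nodes = set(adj)
    for targets in adj.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Nodes with no out-edges ("dangling") spread their score uniformly.
        dangling = sum(rank[v] for v in nodes if not adj.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Single pass over every edge: the O(V + E) cost per iteration.
        for src, targets in adj.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:  # total change is negligible: treat as converged
            break
    return rank


ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["a"], "d": ["c"]})
# "c" scores highest: it is the only node with two in-links.
```

Scores always sum to 1, so they can be read as the long-run probability that a random walker sits on each node.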
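For the Dijkstra row, a sketch using Python's `heapq` binary heap. Note that a binary heap gives O((V + E) log V); the tighter O(E + V log V) bound in the table assumes a Fibonacci heap, which is rarely used in practice. The distribution network and its costs are hypothetical:

```python
import heapq


def dijkstra(graph: dict[str, list[tuple[str, float]]], source: str):
    """Shortest weighted distances from `source`; weights must be non-negative."""
    dist: dict[str, float] = {source: 0.0}
    prev: dict[str, str] = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry; a shorter path was already found
        for nbr, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(nbr, float("inf")):
                dist[nbr] = candidate
                prev[nbr] = node
                heapq.heappush(heap, (candidate, nbr))
    return dist, prev


def path_to(prev: dict[str, str], source: str, target: str) -> list[str]:
    """Rebuild a path by walking predecessors back from target (target must be reachable)."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical distribution network: (neighbor, travel cost) pairs.
network = {
    "WH": [("A", 4), ("B", 1)],
    "B": [("A", 2), ("Store", 5)],
    "A": [("Store", 1)],
}
dist, prev = dijkstra(network, "WH")
route = path_to(prev, "WH", "Store")
# WH -> B -> A -> Store costs 4, cheaper than WH -> A -> Store (5) or WH -> B -> Store (6).
```

A* follows the same loop but orders the heap by distance plus a heuristic estimate to the target, which is why its running time depends on the heuristic.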
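Weakly connected components ignore edge direction, so a union-find (disjoint-set) structure over the edge list is enough to compute them in near-linear time. A minimal sketch, framed around the entity-resolution use case in the table; the account and identifier names are hypothetical:

```python
def weakly_connected_components(edges: list[tuple[str, str]]) -> list[set[str]]:
    """Group nodes into weakly connected components with union-find.

    Edge direction is ignored, which is what makes the components "weak".
    Only nodes that appear in at least one edge are returned.
    """
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the walk directly at the root.
        while parent[x] != root:
            nxt = parent[x]
            parent[x] = root
            x = nxt
        return root

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two sets (union by rank omitted for brevity)

    groups: dict[str, set[str]] = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# Hypothetical entity-resolution input: account records linked by shared identifiers.
components = weakly_connected_components([
    ("acct1", "email:x"),
    ("acct2", "email:x"),
    ("acct2", "phone:y"),
    ("acct3", "phone:z"),
])
# acct1 and acct2 land in one component via the shared email; acct3 stays separate.
```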
Tool & Platform Comparison
| Capability | Neo4j | Amazon Neptune | TigerGraph | JanusGraph | Memgraph |
|---|---|---|---|---|---|
| Model | Property graph (LPG) | LPG + RDF | Property graph | Property graph | Property graph |
| Query language | Cypher (GQL) | Gremlin, openCypher, SPARQL | GSQL | Gremlin | Cypher |
| Deployment | Self-hosted + Aura (cloud) | AWS managed | Self-hosted + cloud | Self-hosted | Self-hosted + cloud |
| Scalability | Vertical (CE), Sharded (EE) | Managed scaling | Distributed, MPP | Distributed (HBase/Cassandra) | Vertical, in-memory |
| Real-time | Good | Good | Excellent (deep-link analytics) | Moderate | Excellent (in-memory) |
| Graph algorithms | GDS library (40+) | Limited | Built-in analytics | Via TinkerPop | MAGE library |
| GNN integration | PyG connector | SageMaker | Built-in ML workbench | External | PyG connector |
| Pricing | Free CE / Enterprise license | Instance hours + storage + I/O requests | Free DE / Enterprise | Free (Apache 2.0) | Free CE / Enterprise |
| Best for | General purpose, developer UX | AWS-native, RDF/SPARQL | High-perf analytics at scale | OSS, existing HBase infra | Real-time streaming graphs |
Graph Data Modeling Principles
| Principle | Description |
|---|---|
| Nodes are nouns | People, products, accounts, devices |
| Edges are verbs | PURCHASED, KNOWS, DEPENDS_ON, TRANSFERRED |
| Properties are adjectives | amount, timestamp, weight, confidence |
| Favor relationships over joins | If you'd JOIN in SQL, make it an edge |
| Avoid super-nodes | Nodes with millions of edges slow every traversal that passes through them; consider bucketing their edges under intermediate nodes (for example, one per time period) |
| Design for traversal | Model edges in the direction you'll query most |
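The super-node principle mentions bucketing. One common form, shown here as an in-memory sketch, hangs a hub's edges off intermediate per-month bucket nodes so that a query scoped to one month touches only that bucket's edges. The hub name `merchant_42` and the month granularity are hypothetical choices, not a prescribed schema:

```python
from collections import defaultdict
from datetime import date


def bucket_hub_edges(hub: str, edges: list[tuple[str, date]]):
    """Split a hub's edges across hypothetical per-month bucket nodes.

    edges: (target_node, event_date) pairs that would otherwise all hang off `hub`.
    Returns (sorted hub -> bucket edges, bucket -> target adjacency).
    """
    hub_to_buckets: set[str] = set()
    bucket_to_targets: dict[str, list[str]] = defaultdict(list)
    for target, day in edges:
        bucket = f"{hub}:{day:%Y-%m}"  # one intermediate node per calendar month
        hub_to_buckets.add(bucket)
        bucket_to_targets[bucket].append(target)
    return sorted(hub_to_buckets), dict(bucket_to_targets)


# Hypothetical merchant hub with three transaction edges.
buckets, by_bucket = bucket_hub_edges(
    "merchant_42",
    [("t1", date(2024, 1, 5)), ("t2", date(2024, 1, 20)), ("t3", date(2024, 2, 3))],
)
# A January query now scans only by_bucket["merchant_42:2024-01"], not every edge on the hub.
```

The trade-off is one extra hop per traversal in exchange for a bounded fan-out at each node; the bucket key should match the filter your queries use most.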
Resources
:::