1.4 Specialized Databases
1.4.1 Graph Databases
Graph databases model data as nodes (entities) connected by edges (relationships). While the same data could live in a relational database, graph databases treat relationships as first-class citizens, making it natural to traverse connections through graph queries rather than complex SQL joins.
| Use case | Example |
|---|---|
| Recommendation systems | "Customers who bought X also bought Y" |
| Social networks | Friendship graphs, follower relationships |
| Data lineage | Tracing how data flows through pipelines |
| Fraud detection | Identifying suspicious transaction patterns |
| Knowledge graphs | Connecting entities and their relationships |
| Network/IT operations | Mapping infrastructure dependencies |
| Supply chain logistics | Simulating logistics flows |
Popular graph databases include Neo4j, ArangoDB, and Amazon Neptune. Query languages include Cypher, Gremlin, and SPARQL.
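
To make the node-and-edge model concrete, here is a minimal Python sketch, not tied to any particular graph database. It stores edges in an adjacency list and walks two hops from a starting node. The user names and the FOLLOWS-style edges are hypothetical toy data, and `two_hops` is an illustrative helper rather than part of any library.

```python
from collections import defaultdict

# Hypothetical toy data: directed "follows" edges between users.
edges = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("bob", "dave"),
    ("carol", "erin"),
]

# Adjacency list: each node maps directly to its outgoing neighbours,
# so following a relationship is a lookup rather than a join.
graph = defaultdict(list)
for src, dst in edges:
    graph[src].append(dst)


def two_hops(graph, start):
    """Return the nodes reachable from `start` in exactly two hops."""
    result = set()
    for neighbour in graph.get(start, []):
        for second in graph.get(neighbour, []):
            if second != start:
                result.add(second)
    return result


print(sorted(two_hops(graph, "alice")))  # ['carol', 'dave']
```

In a relational schema the same question would typically need a self-join on an edge table for each hop; here each hop is one more nested lookup.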
1.4.2 Vector Databases
Vector databases are optimized for similarity search: efficiently querying data based on semantic closeness rather than exact matches.
| Aspect | Details |
|---|---|
| Core concept | Store and query vector embeddings (numerical representations of data) |
| Similarity metrics | Euclidean distance, cosine distance |
| Core algorithm | K-Nearest Neighbors (KNN), optimized via Approximate Nearest Neighbors (ANN) for scale |
| Use cases | Recommendation systems, anomaly detection, text generation, semantic search |
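
The similarity search summarized in the table can be sketched as an exact, brute-force K-Nearest Neighbors lookup using cosine similarity. This is only an illustration of the idea: the three-dimensional embeddings are made up, `cosine_similarity` and `knn` are hypothetical helper names, and a production vector database would normally replace the linear scan with an ANN index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn(query, vectors, k):
    """Exact k-nearest neighbours by cosine similarity (brute-force scan)."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]


# Hypothetical embeddings; real ones usually have hundreds of dimensions.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.9, 0.4],
}

print(knn([1.0, 0.0, 0.0], embeddings, k=2))  # ['cat', 'kitten']
```

The scan compares the query against every stored vector, which is why cost grows linearly with collection size and why approximate methods are used at scale.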
1.4.3 Neo4j and the Cypher Query Language
Neo4j uses the Property Graph Model to describe graph structure:
| Concept | Description |
|---|---|
| Node label | The type/category of a node (e.g., Product, Order, Category) |
| Relationship type | The type of an edge (e.g., ORDERS, PART_OF) |
| Node properties | Key-value attributes on nodes (e.g., name, unitPrice) |
| Relationship properties | Key-value attributes on edges (e.g., quantity, unitPrice) |
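
The four concepts in the table can be mirrored with two small Python dataclasses. This is a simplified sketch, not Neo4j's internal representation: it gives each node a single label (Neo4j nodes may carry several), the property values are invented, and the assumption that ORDERS runs from an Order node to a Product node is ours.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                       # node label, e.g. "Product"
    properties: dict = field(default_factory=dict)   # node properties


@dataclass
class Relationship:
    type: str                                        # relationship type, e.g. "ORDERS"
    start: Node                                      # edges are directed: start -> end
    end: Node
    properties: dict = field(default_factory=dict)   # relationship properties


# Hypothetical values, chosen only for illustration.
order = Node("Order", {"orderID": 1})
product = Node("Product", {"productName": "Chai", "unitPrice": 18.0})
category = Node("Category", {"categoryName": "Beverages"})

orders = Relationship("ORDERS", order, product, {"quantity": 10, "unitPrice": 18.0})
part_of = Relationship("PART_OF", product, category)

# Properties on the edge itself describe this particular order line.
line_total = orders.properties["quantity"] * orders.properties["unitPrice"]
print(orders.type, line_total)  # ORDERS 180.0
```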
Cypher Query Language
Cypher uses an intuitive pattern-matching syntax: () denotes a node, [] denotes a relationship, and --> denotes a directed edge.
// return all nodes in the graph
MATCH (n) RETURN n
// count all nodes
MATCH (n) RETURN count(n)
// list all distinct label sets that appear on nodes
MATCH (n) RETURN DISTINCT labels(n)
// count all Order nodes
MATCH (n:Order) RETURN count(n)
// inspect properties of a single Order
MATCH (n:Order) RETURN properties(n) LIMIT 1
// count all relationships
MATCH ()-[r]->() RETURN count(r)
// list all distinct relationship types
MATCH ()-[r]->() RETURN DISTINCT type(r)
Traversing relationships. Cypher's real power lies in pattern matching across connected nodes. Relationship types are case-sensitive, so the queries below use ORDERS exactly as it is defined in the model:
// average line value (quantity * unitPrice) across all ORDERS relationships
MATCH ()-[r:ORDERS]->()
RETURN avg(r.quantity * r.unitPrice) AS average_price
// average line value per product category
MATCH ()-[r:ORDERS]->()-[:PART_OF]->(c:Category)
RETURN c.categoryName, avg(r.quantity * r.unitPrice) AS average_price
// find all meat/poultry products with their prices
MATCH (p:Product)-[:PART_OF]->(c:Category)
WHERE c.categoryName = "Meat/Poultry"
RETURN p.productName, p.unitPrice
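
To show what the per-category query computes, the following Python sketch reproduces the same two-hop traversal and grouping over a tiny in-memory dataset. The data values are hypothetical and `average_by_category` is an illustrative helper. Cypher groups implicitly by the non-aggregated return column (`c.categoryName`), which the dictionary keyed by category imitates here.

```python
from collections import defaultdict

# Hypothetical data. Each ORDERS edge is (product, quantity, unitPrice).
orders_edges = [
    ("Chai", 10, 18.0),
    ("Chai", 5, 18.0),
    ("Tofu", 4, 23.25),
]
# PART_OF edges: product -> category.
part_of = {"Chai": "Beverages", "Tofu": "Produce"}


def average_by_category(orders_edges, part_of):
    """Average quantity * unitPrice per category, following ORDERS then PART_OF."""
    line_values = defaultdict(list)
    for product, quantity, unit_price in orders_edges:
        category = part_of[product]  # second hop: Product -> Category
        line_values[category].append(quantity * unit_price)
    return {cat: sum(vals) / len(vals) for cat, vals in line_values.items()}


print(average_by_category(orders_edges, part_of))
# {'Beverages': 135.0, 'Produce': 93.0}
```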