1.4 Specialized Databases

1.4.1 Graph Databases

Graph databases model data as nodes (entities) connected by edges (relationships). While the same data could live in a relational database, graph databases treat relationships as first-class citizens, making it natural to traverse connections through graph queries rather than complex SQL joins.

| Use case | Example |
|---|---|
| Recommendation systems | "Customers who bought X also bought Y" |
| Social networks | Friendship graphs, follower relationships |
| Data lineage | Tracing how data flows through pipelines |
| Fraud detection | Identifying suspicious transaction patterns |
| Knowledge graphs | Connecting entities and their relationships |
| Network/IT operations | Mapping infrastructure dependencies |
| Supply chain logistics | Simulating logistics flows |
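
To make the traversal idea concrete, here is a minimal in-memory sketch of the "customers who bought X also bought Y" use case. It is not how a graph database is implemented; the customer names, product names, and the `also_bought` helper are all hypothetical, and the point is only that each hop along an edge is a direct lookup rather than a join.

```python
from collections import Counter, defaultdict

# Hypothetical purchase edges: (customer) -[BOUGHT]-> (product)
BOUGHT = [
    ("alice", "laptop"),
    ("alice", "mouse"),
    ("bob", "laptop"),
    ("bob", "mouse"),
    ("bob", "keyboard"),
    ("carol", "laptop"),
    ("carol", "monitor"),
    ("dave", "monitor"),
]

# Index the edges in both directions so each hop is a dictionary lookup.
products_of = defaultdict(set)   # customer -> products they bought
customers_of = defaultdict(set)  # product -> customers who bought it
for customer, product in BOUGHT:
    products_of[customer].add(product)
    customers_of[product].add(customer)


def also_bought(product):
    """Rank other products by how many buyers of `product` also bought them."""
    counts = Counter()
    for customer in customers_of.get(product, ()):   # hop 1: product -> customers
        for other in products_of[customer]:          # hop 2: customer -> products
            if other != product:
                counts[other] += 1
    return counts.most_common()


recommendations = also_bought("laptop")
print(recommendations)
```

The two nested loops correspond to a two-hop path in the graph (product to customer to product), which is the kind of pattern a graph query language lets you state directly.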

Popular graph databases include Neo4j, ArangoDB, and Amazon Neptune. Common query languages include Cypher (property graphs, originally from Neo4j), Gremlin (Apache TinkerPop), and SPARQL (RDF graphs).

Figure: Graph database schematic showing nodes, edges, and properties

1.4.2 Vector Databases

Vector databases are optimized for similarity search: efficiently querying data based on semantic closeness rather than exact matches.

| Aspect | Details |
|---|---|
| Core concept | Store and query vector embeddings (numerical representations of data) |
| Similarity metrics | Euclidean distance, cosine distance |
| Core algorithm | K-Nearest Neighbors (KNN), optimized via Approximate Nearest Neighbors (ANN) for scale |
| Use cases | Recommendation systems, anomaly detection, text generation, semantic search |

Figure: Vector database schematic showing embedding, vector space, and KNN search
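
The two similarity metrics and the KNN search can be sketched in a few lines of plain Python. This is an exact, brute-force search that scores every stored vector, which is the baseline that ANN indexes approximate in exchange for speed. The three-dimensional embeddings and the `knn` helper below are made up for illustration; real embeddings typically have hundreds of dimensions and come from a model.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity: 0 for the same direction, 1 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn(query, vectors, k, distance=cosine_distance):
    """Exact k-nearest neighbours: rank every stored vector by distance to the query."""
    ranked = sorted(vectors, key=lambda name: distance(query, vectors[name]))
    return ranked[:k]


# Hypothetical embeddings, keyed by the item they represent.
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "dog": [0.5, 0.5, 0.0],
    "car": [0.0, 0.1, 0.9],
}

query = [0.85, 0.15, 0.05]
nearest = knn(query, EMBEDDINGS, k=2)
print(nearest)
```

Because the search compares the query against every vector, its cost grows linearly with the collection size, which is why large deployments rely on approximate indexes instead.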

1.4.3 Neo4j and the Cypher Query Language

Neo4j uses the Property Graph Model to describe graph structure:

| Concept | Description |
|---|---|
| Node label | The type/category of a node (e.g., Product, Order, Category) |
| Relationship type | The type of an edge (e.g., ORDERS, PART_OF) |
| Node properties | Key-value attributes on nodes (e.g., name, unitPrice) |
| Relationship properties | Key-value attributes on edges (e.g., quantity, unitPrice) |
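
One way to picture the four concepts is as two small record types. The sketch below is an illustrative Python model, not Neo4j's internal representation or its driver API; the `Node` and `Relationship` classes, the `orderID` property, and all sample values are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                      # node labels, e.g. {"Product"}
    properties: dict = field(default_factory=dict)   # node properties


@dataclass
class Relationship:
    type: str                                        # relationship type, e.g. "ORDERS"
    start: Node                                      # relationships are directed:
    end: Node                                        # start -> end
    properties: dict = field(default_factory=dict)   # relationship properties


# Hypothetical sample data using the labels and types from the table above.
chai = Node({"Product"}, {"productName": "Chai", "unitPrice": 18.0})
beverages = Node({"Category"}, {"categoryName": "Beverages"})
order = Node({"Order"}, {"orderID": "A-1"})

orders_rel = Relationship("ORDERS", order, chai, {"quantity": 10, "unitPrice": 18.0})
part_of = Relationship("PART_OF", chai, beverages)

line_value = orders_rel.properties["quantity"] * orders_rel.properties["unitPrice"]
print(line_value)
```

Note that `unitPrice` can appear both on a Product node and on an ORDERS relationship: the first is the product's own price, the second belongs to that particular order line.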

Cypher Query Language

Cypher uses an intuitive pattern-matching syntax: () denotes a node, [] denotes a relationship, and --> denotes a directed edge.

// return all nodes in the graph
MATCH (n) RETURN n

// count all nodes
MATCH (n) RETURN count(n)

// list all distinct label combinations (node types)
MATCH (n) RETURN DISTINCT labels(n)

// count all Order nodes
MATCH (n:Order) RETURN count(n)

// inspect properties of a single Order
MATCH (n:Order) RETURN properties(n) LIMIT 1

// count all relationships
MATCH ()-[r]->() RETURN count(r)

// list all distinct relationship types
MATCH ()-[r]->() RETURN DISTINCT type(r)

Traversing relationships. Cypher's real power lies in pattern matching across connected nodes:

// average line-item value (quantity * unitPrice) across all ORDERS relationships
// relationship types are case-sensitive, so ORDERS must match the stored type exactly
MATCH ()-[r:ORDERS]->()
RETURN AVG(r.quantity * r.unitPrice) AS average_price

// average line-item value per product category
MATCH ()-[r:ORDERS]->()-[:PART_OF]->(c:Category)
RETURN c.categoryName, AVG(r.quantity * r.unitPrice) AS average_price

// find all meat/poultry products with their prices
MATCH (p:Product)-[:PART_OF]->(c:Category)
WHERE c.categoryName = "Meat/Poultry"
RETURN p.productName, p.unitPrice
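
For readers more comfortable with procedural code, the per-category average above can be read as "for each ORDERS relationship, follow PART_OF from the product to its category, then average quantity * unitPrice within each category". In Cypher, the non-aggregated column in RETURN (`c.categoryName`) acts as the grouping key. The Python sketch below mirrors that logic on a tiny, made-up dataset; the data and the function name are hypothetical and are not taken from the actual graph.

```python
from collections import defaultdict

# Hypothetical data shaped like the patterns above.
# product -> category, i.e. the PART_OF relationship
PART_OF = {"Chai": "Beverages", "Chang": "Beverages", "Tofu": "Produce"}

# One tuple per ORDERS relationship: (order id, product, quantity, unitPrice)
ORDERS = [
    ("o1", "Chai", 10, 18.0),
    ("o1", "Tofu", 5, 20.0),
    ("o2", "Chang", 4, 19.0),
    ("o3", "Tofu", 2, 25.0),
]


def average_line_value_by_category(orders, part_of):
    """Group order lines by product category and average quantity * unitPrice."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _order_id, product, quantity, unit_price in orders:
        category = part_of[product]          # follow PART_OF from the product
        totals[category] += quantity * unit_price
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in totals}


averages = average_line_value_by_category(ORDERS, PART_OF)
print(averages)
```

The Cypher version expresses the same traversal and grouping in two lines, which is the main appeal of pattern matching for connected data.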