1.3 Storage in Databases

1.3.1 Database Management Systems

A database management system (DBMS) is the software layer that sits between your application and the raw storage hardware. It manages both relational and non-relational databases (such as graph databases). Understanding its architecture helps explain why different databases perform differently.

DBMS Architecture

ComponentRole
Transport systemHandles client connections and network communication
Query processorParses SQL/query language and optimizes the execution plan
Execution engineRuns the optimized query plan against the storage layer
Storage engineManages how data is serialized, arranged on disk, and indexed

The storage engine is the most relevant component for understanding performance tradeoffs.

DBMS architecture layers from client to disk DBMS architecture layers from client to disk An **index** is a data structure that locates data in O(log n) time via binary search, instead of scanning all rows in O(n). Modern storage engines are optimized for SSDs, handle complex data types, and offer columnar support for analytical workloads.

In-Memory Storage Systems

In-memory stores trade durability for speed — they are fast and low-latency, but volatile. Common use cases include caching, real-time applications, and gaming.

SystemTypeUse case
MemcachedKey-value storeCaching database query results or API responses. Acceptable when data loss is tolerable.
RedisKey-value storeRicher data types (lists, sets, hashes). High-performance apps that can tolerate minor data loss.

1.3.2 Row vs. Column Storage

The way data is physically arranged on disk has a dramatic impact on query performance.

Row-oriented storage writes each row as a contiguous sequence of bytes. This layout is ideal for OLTP workloads where you frequently read or write entire rows with low latency. However, it is inefficient for analytical queries that only need a subset of columns — the system must still scan every column across every row.

Column-oriented storage writes each column contiguously. This is ideal for OLAP analytical queries that aggregate a single column across millions of rows, but inefficient for transactional workloads that read/write entire rows.


Performance Example

Consider a table with 1 billion rows, 30 columns, 100 bytes per entry, and a 200 MB/s disk transfer speed:

-- how long does this take with row vs. column storage?
SELECT SUM(price) FROM my_table
Storage typeData readCalculationTime
Row-orientedAll 30 columns, all rows1B × 30 × 100 bytes = 3,000,000 MB~4.2 hours
Column-orientedOnly the price column1B × 100 bytes = 100,000 MB~8.3 minutes
Row-OrientedColumn-Oriented
Optimized forOLTP (transactional)OLAP (analytical)
Read patternFull rows at a timeIndividual columns at a time
Write patternAppend entire rows efficientlyMust write to each column file separately
Best whenReading/writing complete recordsAggregating single columns across many rows
ExamplesPostgreSQL, MySQLRedshift, BigQuery, Snowflake