1.4 Advanced Modeling Approaches

1.4.1 Inmon vs. Kimball Modeling Approaches

The data warehouse was created to separate source systems from analytical systems. Two foundational approaches define how data should be structured inside the warehouse.

Inmon Approach

Stores data in the warehouse in highly normalized third normal form, then provides additional data marts (often as star schemas) for specific departments.

Source systems → 3NF warehouse → Star schema data marts → Business users
Prioritizes data quality and consistency - single source of truth
Slower to set up, but more robust and flexible long-term

Kimball Approach

Serves data structured as star schemas directly in the data warehouse, skipping the normalized intermediate layer.

Source systems → Star schema warehouse → Business users
Prioritizes speed of delivery - faster modeling and iteration
Introduces more redundancy, but simpler for business users to query

When to Choose

Scenario	Recommended Approach
Data quality is the highest priority	Inmon
Analysis requirements are not yet defined	Inmon
Quick insights and rapid iteration needed	Kimball
Small team with limited modeling resources	Kimball

Inmon top-down vs Kimball bottom-up warehouse data flow

1.4.2 Data Vault Modeling Approach

While Inmon and Kimball focus on how business logic is structured in the warehouse, Data Vault focuses on separating the structural aspects of data - business entities and their relationships. This separation keeps the warehouse flexible, agile, and scalable even as the business evolves.

Key characteristics:

Three layers: Staging → Enterprise Data Warehouse → Information Delivery
No notion of “good,” “bad,” or “conformed” data - only changes the storage structure
Full traceability back to source systems
Minimal restructuring when business requirements change

3 Types of Tables in a Data Vault

Table Type	Purpose	Key Columns
Hub	Stores a unique list of business keys (core entities)	Hash key (PK), business key, load date, record source
Link	Connects two or more hubs (relationships between entities)	Hash key (PK), parent hub hash keys, load date, record source
Satellite	Contains descriptive attributes that provide context for hubs and links	Parent hash key + load date (composite PK), attributes, record source

For example, an order_customer link table connects the order and customer hubs. The link’s primary key is the hash of the combined parent business keys.

1.4.3 One Big Table and Summary

The Inmon and Kimball models were developed when warehouses were expensive, on-premises, and resource-constrained with tightly coupled compute and storage. Modern cloud infrastructure has changed the calculus.

One Big Table

The One Big Table approach uses a single wide table (potentially thousands of columns) that is highly denormalized. It can contain nested and varied data types, requires no complex joins, and supports fast analytical queries.

Advantages	Disadvantages
Low storage cost on cloud columnar stores	Business logic can get lost in analytics
Nested data allows flexible schemas	Complex data structures for nested data
Reading nulls in columnar storage is free	Update and aggregation performance can suffer
No joins needed - simpler queries	Harder to maintain data integrity

Works well when you have a large volume of data that needs more flexibility than traditional modeling approaches provide.

Summary of Modeling Approaches

Four modeling approaches compared: Inmon, Kimball, Data Vault, One Big Table

1.4.4 dbt (Data Build Tool)

dbt (data build tool) wraps SQL statements that create fact and dimension tables with a CREATE statement.

It helps document and validate data within the data warehouse.

Variant	Description
dbt Core	Open-source CLI tool that communicates with databases through adapters (e.g., `dbt-postgres`, `dbt-redshift`)
dbt Cloud	Hosted environment with a browser-based interface that runs dbt Core under the hood

dbt enables version-controlled, testable, and documented data transformations - treating SQL-based modeling like software engineering.