1.4 Advanced Modeling Approaches

1.4.1 Inmon vs. Kimball Modeling Approaches

The data warehouse was created to separate source systems from analytical systems. Two foundational approaches define how data should be structured inside the warehouse.


Inmon Approach

Stores data in the warehouse in highly normalized third normal form, then provides additional data marts (often as star schemas) for specific departments.

  • Source systems โ†’ 3NF warehouse โ†’ Star schema data marts โ†’ Business users
  • Prioritizes data quality and consistency - single source of truth
  • Slower to set up, but more robust and flexible long-term

Kimball Approach

Serves data structured as star schemas directly in the data warehouse, skipping the normalized intermediate layer.

  • Source systems โ†’ Star schema warehouse โ†’ Business users
  • Prioritizes speed of delivery - faster modeling and iteration
  • Introduces more redundancy, but simpler for business users to query

When to Choose

ScenarioRecommended Approach
Data quality is the highest priorityInmon
Analysis requirements are not yet definedInmon
Quick insights and rapid iteration neededKimball
Small team with limited modeling resourcesKimball
Inmon top-down vs Kimball bottom-up warehouse data flow Inmon top-down vs Kimball bottom-up warehouse data flow

1.4.2 Data Vault Modeling Approach

While Inmon and Kimball focus on how business logic is structured in the warehouse, Data Vault focuses on separating the structural aspects of data - business entities and their relationships. This separation keeps the warehouse flexible, agile, and scalable even as the business evolves.

Key characteristics:

  • Three layers: Staging โ†’ Enterprise Data Warehouse โ†’ Information Delivery
  • No notion of โ€œgood,โ€ โ€œbad,โ€ or โ€œconformedโ€ data - only changes the storage structure
  • Full traceability back to source systems
  • Minimal restructuring when business requirements change

3 Types of Tables in a Data Vault

Table TypePurposeKey Columns
HubStores a unique list of business keys (core entities)Hash key (PK), business key, load date, record source
LinkConnects two or more hubs (relationships between entities)Hash key (PK), parent hub hash keys, load date, record source
SatelliteContains descriptive attributes that provide context for hubs and linksParent hash key + load date (composite PK), attributes, record source

For example, an order_customer link table connects the order and customer hubs. The linkโ€™s primary key is the hash of the combined parent business keys.

1.4.3 One Big Table and Summary

The Inmon and Kimball models were developed when warehouses were expensive, on-premises, and resource-constrained with tightly coupled compute and storage. Modern cloud infrastructure has changed the calculus.


One Big Table

The One Big Table approach uses a single wide table (potentially thousands of columns) that is highly denormalized. It can contain nested and varied data types, requires no complex joins, and supports fast analytical queries.

AdvantagesDisadvantages
Low storage cost on cloud columnar storesBusiness logic can get lost in analytics
Nested data allows flexible schemasComplex data structures for nested data
Reading nulls in columnar storage is freeUpdate and aggregation performance can suffer
No joins needed - simpler queriesHarder to maintain data integrity

Works well when you have a large volume of data that needs more flexibility than traditional modeling approaches provide.


Summary of Modeling Approaches

Four modeling approaches compared: Inmon, Kimball, Data Vault, One Big Table Four modeling approaches compared: Inmon, Kimball, Data Vault, One Big Table

1.4.4 dbt (Data Build Tool)

dbtdbt

dbt (data build tool) wraps SQL statements that create fact and dimension tables with a CREATE statement.


It helps document and validate data within the data warehouse.

VariantDescription
dbt CoreOpen-source CLI tool that communicates with databases through adapters (e.g., dbt-postgres, dbt-redshift)
dbt CloudHosted environment with a browser-based interface that runs dbt Core under the hood

dbt enables version-controlled, testable, and documented data transformations - treating SQL-based modeling like software engineering.