1.2 Normalization

1.2.1 Normalization Fundamentals

Normalization is a data modeling practice typically applied to relational databases to eliminate data redundancy and ensure referential integrity between tables. It was defined by Edgar Codd with two core objectives:

  1. Free relations from undesirable insertion, update, and deletion dependencies
  2. Reduce the need to restructure relations as new data types are introduced

Consider the difference: in first normal form, data lives in a single wide table - updating a customer name requires changing multiple rows, and adding a new column affects every row. In third normal form, changing a name means updating a single row in the customers table, and new attributes can be added through dedicated tables.

1.2.2 Normal Forms

Normal FormRequirementsWhat It Eliminates
DenormalizedNo rules - all data in one table, may contain nested JSONNothing - significant redundancy
1NFEach column holds a single value, unique primary key existsRepeating groups and multi-valued columns
2NFMeets 1NF + no partial dependencies (non-key columns depend on the full composite key)Partial dependencies on composite keys
3NFMeets 2NF + no transitive dependencies (non-key columns donโ€™t depend on other non-key columns)Transitive dependencies - data is fully normalized

Denormalized Form

All data sits in one table. Some columns may contain nested JSON. The table contains significant redundancy - the same customer name and address appear on every order row.


1st Normal Form (1NF)

Each column is unique and holds a single atomic value, and the table has a unique primary key. For example, order_id + line_no together form the composite primary key.


2nd Normal Form (2NF)

Builds on 1NF by removing partial dependencies - cases where a subset of non-key columns depends on only part of a composite key. However, 2NF can still contain transitive dependencies where a non-key column depends on another non-key column. For example, price and name depend on product_id in order items, while customer_name and address depend on customer_id in orders.


3rd Normal Form (3NF)

Meets all 2NF requirements and eliminates transitive dependencies. At this stage, the data is considered normalized. Each non-key column depends only on the primary key.


Choosing the Right Level

The right degree of normalization depends on the use case - there is no one-size-fits-all solution:

ApproachAdvantagesTrade-offs
More normalizationBetter data integrity, efficient writes, less redundancyMore joins needed for queries, slower reads
More denormalizationFaster reads, simpler queries, fewer joinsMore redundancy, complex updates, risk of inconsistency
Normalization progression from denormalized through 1NF, 2NF, to 3NF Normalization progression from denormalized through 1NF, 2NF, to 3NF