Database Normalization: 1NF, 2NF and 3NF on One Example

Normalization is reshaping a database structure so that every fact is stored exactly once. The rules of normalization are described as steps — normal forms: each next form eliminates its own kind of duplication. In practice the first three are enough in the vast majority of cases.

Let's walk through all three forms on a single running example that will travel from a "dump-it-all table" to a proper schema.

The Starting Point: Everything in One Table

A food delivery service stores its orders like this:

order_id	customer	phone	items	prices
1	Anna	+1 915 000-11-22	Pizza, Cola	12.99, 2.49
2	Boris	+1 903 555-66-77	Sushi Set	24.99
3	Anna	+1 915 000-11-22	Cola, Chips	2.49, 2.99

It looks compact, but working with it is impossible:

You can't count cola sales — the products are glued into a string, all you have left is LIKE '%Cola%' and the hope that "Cola Zero" never shows up.
Anna changed her phone number — you have to update every one of her orders. Miss one row and the database holds two different numbers with no way to tell which is right.
Order 2 was deleted — and Boris with his phone number vanished from the database along with it.
A new customer without an order — there is simply nowhere to record them.

These situations are called update, insertion and deletion anomalies. Normalization exists precisely so that they cannot happen.

1NF: One Value per Cell

The first normal form demands atomicity: the intersection of a row and a column holds one indivisible value — no lists, no enumerations. We unfold every order line into its own row:

order_id	customer	phone	product	price
1	Anna	+1 915 000-11-22	Pizza	12.99
1	Anna	+1 915 000-11-22	Cola	2.49
2	Boris	+1 903 555-66-77	Sushi Set	24.99
3	Anna	+1 915 000-11-22	Cola	2.49
3	Anna	+1 915 000-11-22	Chips	2.99

Cola sales are now an ordinary GROUP BY. A row is uniquely identified by the pair (order_id, product) — the table's composite key.

But there is even more duplication now: Anna's name and phone repeat in every row of every one of her orders. Let's keep going.

An employees table has a skills column with values like 'SQL, Python, Excel'. Which normalization rule is violated?

2NF: A Fact Depends on the Whole Key, Not Part of It

The second normal form looks at tables with composite keys and demands: every non-key column must depend on the entire key (order_id, product), not on a part of it. We have two violations:

customer and phone depend only on order_id — who placed the order does not change with the product in the row;
price depends only on product — the price of pizza is the same in every order.

We move each fact into a table where it depends on its key in full:

MySQL 8.1
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer VARCHAR(100),
    phone VARCHAR(20)
);

CREATE TABLE products (
    product_id INT PRIMARY KEY,
    name VARCHAR(100),
    price DECIMAL(10,2)
);

CREATE TABLE order_items (
    order_id INT REFERENCES orders(order_id),
    product_id INT REFERENCES products(product_id),
    quantity INT,
    PRIMARY KEY (order_id, product_id)
);

A product's price now lives once in products, and the contents of an order are compact identifier pairs in order_items.

3NF: Non-Key Columns Don't Depend on Each Other

The third normal form finishes off the last duplication — transitive dependencies, where a non-key column depends not on the key but on another non-key column. The orders table has exactly that left: phone depends on customer, not on the order number. Anna's phone still repeats in each of her orders.

We move the customer into a table of their own:

MySQL 8.1
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name VARCHAR(100),
    phone VARCHAR(20)
);

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT REFERENCES customers(customer_id),
    order_date TIMESTAMP
);

The classic formula captures the essence of 3NF: every non-key column depends "on the key, the whole key, and nothing but the key".

An orders table stores the columns city and city_zip, and the zip code is fully determined by the city. Which form is violated?

The Final Schema

One dump-it-all table became four connected tables — explore the schema live:

Let's check what happened to the anomalies: Anna's phone changes in one place; deleting an order doesn't touch the customer; a customer with no orders lives happily in customers; cola sales are counted from order_items without any LIKE.

The path of normalization: from a dump-it-all table to three normal forms

When Duplication Is Justified: Denormalization

Normalization is not a goal in itself but a way to protect data from drifting out of sync. Sometimes it is deliberately set aside:

A snapshot in time. The price in products changes, but the receipt of a paid order must keep the price as it was at purchase. That is why order_items often gets its own price column — not a mistake but a recorded historical fact.
Precomputed totals. An order total can be recalculated from its lines every time, but at thousands of requests per second it is cheaper to store total_amount in orders and update it when the contents change.
Analytics. In reporting marts the data is intentionally "folded back" into wide tables so queries don't have to assemble a dozen joins.

The key word is deliberately: denormalization always means that keeping the duplicates in sync is now your code's job, not the DBMS's.

What's Next

what a sound table structure is made of — in the lesson Structure of relational databases;
the primary and foreign keys that hold relations together — in the lesson Constraints;
designing schemas "from the business problem", including normalization, denormalization and data history, has a whole course — Database Design and Architecture.

SELECT *

Fundamentals

7 min read