Database Normalization: 1NF, 2NF and 3NF on One Example
Normalization is reshaping a database structure so that every fact is stored exactly once. The rules of normalization are described as steps — normal forms: each next form eliminates its own kind of duplication. In practice the first three are enough in the vast majority of cases.
Let's walk through all three forms on a single running example that will travel from a "dump-it-all table" to a proper schema.
The Starting Point: Everything in One Table
A food delivery service stores its orders like this:
It looks compact, but working with it is impossible:
- You can't count cola sales — the products are glued into a string, all you have left is LIKE '%Cola%' and the hope that "Cola Zero" never shows up.
- Anna changed her phone number — you have to update every one of her orders. Miss one row and the database holds two different numbers with no way to tell which is right.
- Order 2 was deleted — and Boris with his phone number vanished from the database along with it.
- A new customer without an order — there is simply nowhere to record them.
These situations are called update, insertion and deletion anomalies. Normalization exists precisely so that they cannot happen.
1NF: One Value per Cell
The first normal form demands atomicity: the intersection of a row and a column holds one indivisible value — no lists, no enumerations. We unfold every order line into its own row:
Cola sales are now an ordinary GROUP BY. A row is uniquely identified by the pair (order_id, product) — the table's composite key.
But there is even more duplication now: Anna's name and phone repeat in every row of every one of her orders. Let's keep going.
An employees table has a skills column with values like 'SQL, Python, Excel'. Which normalization rule is violated?
2NF: A Fact Depends on the Whole Key, Not Part of It
The second normal form looks at tables with composite keys and demands: every non-key column must depend on the entire key (order_id, product), not on a part of it. We have two violations:
- customer and phone depend only on order_id — who placed the order does not change with the product in the row;
- price depends only on product — the price of pizza is the same in every order.
We move each fact into a table where it depends on its key in full:
MySQL 8.1CREATE TABLE orders ( order_id INT PRIMARY KEY, customer VARCHAR(100), phone VARCHAR(20) ); CREATE TABLE products ( product_id INT PRIMARY KEY, name VARCHAR(100), price DECIMAL(10,2) ); CREATE TABLE order_items ( order_id INT REFERENCES orders(order_id), product_id INT REFERENCES products(product_id), quantity INT, PRIMARY KEY (order_id, product_id) );
A product's price now lives once in products, and the contents of an order are compact identifier pairs in order_items.
3NF: Non-Key Columns Don't Depend on Each Other
The third normal form finishes off the last duplication — transitive dependencies, where a non-key column depends not on the key but on another non-key column. The orders table has exactly that left: phone depends on customer, not on the order number. Anna's phone still repeats in each of her orders.
We move the customer into a table of their own:
MySQL 8.1CREATE TABLE customers ( customer_id INT PRIMARY KEY, name VARCHAR(100), phone VARCHAR(20) ); CREATE TABLE orders ( order_id INT PRIMARY KEY, customer_id INT REFERENCES customers(customer_id), order_date TIMESTAMP );
The classic formula captures the essence of 3NF: every non-key column depends "on the key, the whole key, and nothing but the key".
An orders table stores the columns city and city_zip, and the zip code is fully determined by the city. Which form is violated?
The Final Schema
One dump-it-all table became four connected tables — explore the schema live:
Let's check what happened to the anomalies: Anna's phone changes in one place; deleting an order doesn't touch the customer; a customer with no orders lives happily in customers; cola sales are counted from order_items without any LIKE.

When Duplication Is Justified: Denormalization
Normalization is not a goal in itself but a way to protect data from drifting out of sync. Sometimes it is deliberately set aside:
- A snapshot in time. The price in products changes, but the receipt of a paid order must keep the price as it was at purchase. That is why order_items often gets its own price column — not a mistake but a recorded historical fact.
- Precomputed totals. An order total can be recalculated from its lines every time, but at thousands of requests per second it is cheaper to store total_amount in orders and update it when the contents change.
- Analytics. In reporting marts the data is intentionally "folded back" into wide tables so queries don't have to assemble a dozen joins.
The key word is deliberately: denormalization always means that keeping the duplicates in sync is now your code's job, not the DBMS's.
What's Next
- what a sound table structure is made of — in the lesson Structure of relational databases;
- the primary and foreign keys that hold relations together — in the lesson Constraints;
- designing schemas "from the business problem", including normalization, denormalization and data history, has a whole course — Database Design and Architecture.
Related articles
SQL from Scratch: A Step-by-Step Learning Plan for Beginners in 2026
A 6-week roadmap — from your first SELECT to the interview
ACID: The 4 Properties of Database Transactions Explained Simply
Atomicity, consistency, isolation, durability — through a money transfer example
ROW_NUMBER vs RANK vs DENSE_RANK in SQL: The Difference in One Example
Three ranking functions, one query — and the difference is visible