Lesson 22 of 24

Database Design

Database Normalization

Normalization is the process of organizing a database to reduce redundancy and improve data integrity. It involves splitting data across multiple related tables so that each fact is stored in exactly one place.

Designing a starship requires careful planning of how all systems --- life support, propulsion, weapons --- connect and depend on each other. Good database design demands the same discipline.

First Normal Form (1NF)

A table is in 1NF when:

  • Each column contains atomic (indivisible) values.
  • There are no repeating groups or arrays stored in a single column.
  • Each row is unique (usually enforced by a primary key).

Violates 1NF:

idnamephone_numbers
1Alice555-0100, 555-0101
2Bob555-0200

The phone_numbers column contains multiple values. To fix this, move phone numbers to a separate table:

Satisfies 1NF:

idname
1Alice
2Bob
idcustomer_idphone_number
11555-0100
21555-0101
32555-0200

Second Normal Form (2NF)

A table is in 2NF when:

  • It satisfies 1NF.
  • Every non-key column depends on the entire primary key, not just part of it.

This mainly applies to tables with composite primary keys. If a column depends on only one part of the key, it should be moved to its own table.

Violates 2NF (composite key: order_id + product_id):

order_idproduct_idproduct_namequantity
1101Laptop1
1102Headphones2

product_name depends only on product_id, not the full key. It should live in a separate products table.

Third Normal Form (3NF)

A table is in 3NF when:

  • It satisfies 2NF.
  • No non-key column depends on another non-key column (no transitive dependencies).

Violates 3NF:

idnamedepartment_iddepartment_name
1Alice10Engineering
2Bob10Engineering

department_name depends on department_id, not on the primary key id. The fix is a separate departments table:

idnamedepartment_id
1Alice10
2Bob10
idname
10Engineering

Denormalized vs Normalized

Consider an e-commerce order stored in a single denormalized table:

Denormalized (single table):

order_idcustomer_namecustomer_emailproduct_nameproduct_pricequantitytotal
1Alicealice@example.comLaptop999.991999.99
2Alicealice@example.comHeadphones79.992159.98

Problems with this design:

  • Update anomaly: If Alice changes her email, every row must be updated.
  • Insert anomaly: You cannot add a new customer until they place an order.
  • Delete anomaly: If you delete Alice's orders, you lose her customer record.

Normalized (multiple tables):

customers table:

idnameemail
1Alicealice@example.com

products table:

idnameprice
1Laptop999.99
2Headphones79.99

orders table:

idcustomer_idproduct_idquantitytotal
1111999.99
2122159.98

Each fact is stored once. Changes propagate correctly. Referential integrity is maintained through foreign keys.

When to Denormalize

Normalization is not always the right choice. In some cases, controlled denormalization improves performance:

  • Read-heavy workloads: If a query constantly joins five tables, storing precomputed data in one table avoids the join overhead.
  • Reporting and analytics: Aggregated or materialized views are often denormalized for speed.
  • Caching columns: Storing a derived value (like order_total) avoids recalculating it on every read.

Tip: Start with a normalized design. Denormalize only when you have measured a performance problem that normalization causes. Premature denormalization trades correctness for speed you may not need.

Naming Conventions

Consistent naming makes a schema easier to navigate:

ConventionExampleNotes
Table names: plural, snake_caseproducts, order_itemsRepresents a collection of entities
Column names: singular, snake_casefirst_name, created_atRepresents a single attribute
Primary key: idid SERIAL PRIMARY KEYSimple and universally understood
Foreign key: table_idcustomer_id, product_idReferences id in the related table
Indexes: idx_table_columnidx_products_categoryDescribes what is indexed
Boolean columns: is_ or has_ prefixis_active, has_shippedClearly indicates a boolean
Timestamps: _at suffixcreated_at, updated_atStandard temporal column naming

Schema Design Tips

  1. Always define primary keys. Every table should have a primary key. Use SERIAL or BIGSERIAL for auto-incrementing integers, or UUID for distributed systems.

  2. Use foreign keys. Foreign key constraints enforce referential integrity and prevent orphaned records:

CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  customer_id INTEGER NOT NULL REFERENCES customers(id),
  product_id INTEGER NOT NULL REFERENCES products(id),
  quantity INTEGER NOT NULL CHECK (quantity > 0),
  total DECIMAL(10,2) NOT NULL
);
  1. Add NOT NULL where appropriate. Columns that must always have a value should be NOT NULL. This prevents accidental gaps in your data.

  2. Use CHECK constraints. Enforce data rules at the database level:

ALTER TABLE products ADD CONSTRAINT price_positive CHECK (price > 0);
  1. Plan for time. Include created_at and updated_at columns in tables that track records over time. Use TIMESTAMPTZ (timestamp with time zone) to avoid timezone issues.

  2. Think about access patterns. Design tables around how data will be queried, not just how it is conceptually organized.

Tip: A well-designed schema is the foundation of a reliable application. Spending time on design upfront saves significant effort debugging data issues later.

Your Task

Write a query that shows the normalized structure by joining customers with their orders and the product names.

SQL runtime loading...
Loading...
Click "Run" to execute your code.