Normalisation

👍 Key Requirements:

  • All columns contain atomic values (no lists, sets, or composite fields).

  • Each row is unique (typically enforced by a primary key).

  • No repeating groups or arrays within a row.

  • Each column contains values of a single data type.

Suppose you have a table tracking customer purchases, where the “Purchased Products” column contains a comma-separated list of products:

Customer ID
Customer Name
Purchased Products

101

John Doe

Laptop, Mouse

102

Jane Smith

Tablet

103

Alice Brown

Keyboard, Monitor, Pen

Why is this not in 1NF?

  • Non-atomic values: “Purchased Products” contains multiple items per cell.

  • Querying and updating are complex: Searching for customers who bought “Mouse” requires string parsing.

  • Data integrity risks: No way to enforce referential integrity between products and customers.

  • Inconsistent data entry: Different delimiters or typos can creep in.

Real-World Impact:

  • Reporting (e.g., “Who bought a Laptop?”) is error-prone.

  • Updates (e.g., renaming “Mouse” to “Wireless Mouse”) are tedious and unreliable.

  • Referential integrity cannot be enforced.

Real-World Issues:

  • Reporting challenges: Generating reports such as “How many customers bought a Laptop?” becomes complicated, as you cannot simply filter a column for “Laptop”, you must parse the string.

  • Update anomalies: If a product name changes (e.g., “Mouse” to “Wireless Mouse”), you must update every occurrence in every cell, increasing the risk of missing some entries.

  • Data integrity risks: There is no way to enforce referential integrity between products and customers, which can lead to orphaned or inconsistent data.

Transformation Steps to Achieve 1NF:

  1. Identify columns with non-atomic values: In this case, “Purchased Products” contains multiple values.

  2. Split the multi-valued column into separate rows: Each product purchased by a customer should be represented as a separate row, ensuring that every field contains only a single value.

Transformed Table in 1NF:

Customer ID
Customer Name
Product

101

John Doe

Laptop

101

John Doe

Mouse

102

Jane Smith

Tablet

103

Alice Brown

Keyboard

103

Alice Brown

Monitor

103

Alice Brown

Pen

Definition: A table is in 2NF if it is in 1NF and every non-prime attribute (i.e., non-primary key attribute) is fully functionally dependent on the entire primary key. This addresses partial dependencies, where a non-key attribute depends only on part of a composite key.

Example Transformation to 2NF

1NF Table:

(student_id, course_id) | student_name | course_name

⚠️ Issue:course_name” depends only on “course_id”, not the full primary key (“course_id”, “student_id”). This is a partial dependency.

Normalization to 2NF:

  • table students(student_id, student_name)

  • table courses(course_id, course_name)

  • table student_courses(student_id, course_id)

Definition: A table is in 3NF if it is in 2NF and all the attributes are functionally dependent only on the primary key, there are no transitive dependencies

Order ID
Customer ID
Product
Supplier

201

101

Laptop

HP

202

101

Mouse

Logitech

203

102

Tablet

Apple

Supplier does not directly depend on primary key. Normalization to 3NF:

  • Move product and supplier information to separate tables.

table orders(order_id, customer_id, product_id)

table products(product_id, product_name, supplier_id)

table suppliers(supplier_id, supplier_name)

Last updated

Was this helpful?