Introduction to Normalization
Normalization is a database design technique used to reduce redundancy and eliminate anomalies in relational databases. It involves organizing data into multiple related tables to improve data integrity, consistency, and efficiency.
Why is Normalization Important?
Without normalization, databases suffer from:
✅ Redundancy – Repetitive data consumes extra storage.
✅ Insertion Anomalies – Unable to insert data without redundant details.
✅ Update Anomalies – Modifying one record requires updating multiple rows.
✅ Deletion Anomalies – Deleting one record may remove important information.
Example of a Poorly Designed Table (Unnormalized Form – UNF)
StudentID | Name | Course | Instructor | InstructorPhone | Hobby |
---|---|---|---|---|---|
101 | Alice | Math | Prof. X | 9998887777 | Reading |
101 | Alice | Science | Prof. Y | 8887776666 | Chess |
102 | Bob | English | Prof. Z | 7776665555 | Reading |
💡 Issues in this table:
- Repeated data (Student Name, Instructor, InstructorPhone).
- Multiple values in a single column (Hobby).
- Difficult to update or delete without affecting other data.
First Normal Form (1NF) – Eliminating Repeating Groups
A table is in 1NF if:
✅ Each column contains atomic (indivisible) values.
✅ Each row has a unique identifier (Primary Key).
Applying 1NF
StudentID | Name | Course | Instructor | InstructorPhone | Hobby |
---|---|---|---|---|---|
101 | Alice | Math | Prof. X | 9998887777 | Reading |
101 | Alice | Math | Prof. X | 9998887777 | Chess |
101 | Alice | Science | Prof. Y | 8887776666 | Reading |
102 | Bob | English | Prof. Z | 7776665555 | Reading |
✅ Solution:
- Hobby values are split into separate rows.
- Each column contains atomic values.
Second Normal Form (2NF) – Removing Partial Dependency
A table is in 2NF if:
✅ It is in 1NF.
✅ Every non-key attribute depends on the entire primary key.
💡 Problem in 1NF Table:
InstructorPhone
depends only onInstructor
, not onStudentID + Course
.- Solution: Split into separate tables.
Applying 2NF
Student Table
StudentID | Name |
---|---|
101 | Alice |
102 | Bob |
Course Table
Course | Instructor |
---|---|
Math | Prof. X |
Science | Prof. Y |
English | Prof. Z |
Instructor Table
Instructor | InstructorPhone |
---|---|
Prof. X | 9998887777 |
Prof. Y | 8887776666 |
Prof. Z | 7776665555 |
Student_Course Table (Mapping Table)
StudentID | Course |
---|---|
101 | Math |
101 | Science |
102 | English |
✅ Solution:
- Instructor details are moved to a separate table.
- No partial dependencies exist.
Third Normal Form (3NF) – Removing Transitive Dependency
A table is in 3NF if:
✅ It is in 2NF.
✅ No transitive dependencies (A → B and B → C, then A → C must not exist).
💡 Problem in 2NF Table:
InstructorPhone
depends onInstructor
, which depends onCourse
.
Applying 3NF
- Move
InstructorPhone
to a separate table.
Instructor Table (After 3NF)
Instructor | InstructorPhone |
---|---|
Prof. X | 9998887777 |
Prof. Y | 8887776666 |
Prof. Z | 7776665555 |
✅ Solution:
- Transitive dependency removed.
- All non-key attributes now depend directly on the primary key.
Boyce-Codd Normal Form (BCNF) – Stronger than 3NF
A table is in BCNF if:
✅ It is in 3NF.
✅ Every determinant is a candidate key.
💡 When to Apply BCNF?
If a table has multiple candidate keys with dependencies.
Example Before BCNF
TeacherID | Course | Classroom |
---|---|---|
T101 | Math | Room 101 |
T102 | Science | Room 102 |
T101 | Physics | Room 101 |
- Issue:
Course → Classroom
, butTeacherID → Classroom
is also valid.- Not every determinant is a candidate key.
✅ Solution:
- Break the table into separate tables to remove conflicting dependencies.
Fourth Normal Form (4NF) – Removing Multi-Valued Dependencies
A table is in 4NF if:
✅ It is in BCNF.
✅ It has no multi-valued dependencies (MVDs).
💡 Example Before 4NF
StudentID | Course | Hobby |
---|---|---|
101 | Math | Reading |
101 | Math | Chess |
- Issue:
StudentID →→ Hobby
andStudentID →→ Course
are independent.
✅ Solution:
Separate Multi-Valued Data into Two Tables.
Student_Course Table
StudentID | Course |
---|---|
101 | Math |
Student_Hobby Table
StudentID | Hobby |
---|---|
101 | Reading |
101 | Chess |
Fifth Normal Form (5NF) – Eliminating Join Dependencies
A table is in 5NF if:
✅ It is in 4NF.
✅ No join dependency exists (tables can be recombined without losing information).
💡 Used for complex relationships where multiple many-to-many dependencies exist.
Domain-Key Normal Form (DKNF) – The Ultimate Normal Form
A table is in DKNF if:
✅ Every constraint is defined as a key constraint or domain constraint.
✅ No additional constraints exist at the application level.
💡 Achieving DKNF is difficult in practice.
Final Thoughts
- Normalization eliminates redundancy and improves data integrity.
- 1NF ensures atomicity, 2NF removes partial dependencies, 3NF removes transitive dependencies.
- BCNF, 4NF, 5NF, and DKNF further refine the structure for complex databases.