What is Normalization?
Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity. It involves dividing large tables into smaller, related tables and defining relationships between them using primary keys and foreign keys.
Objectives of Normalization
✅ Eliminate data redundancy – Avoid storing duplicate data.
✅ Ensure data consistency – Prevent anomalies during insert, update, and delete operations.
✅ Improve database efficiency – Optimize storage and reduce unnecessary complexity.
✅ Enhance data integrity – Enforce constraints like primary keys and foreign keys.
Problems Without Normalization
Before applying normalization, a database may have anomalies, such as:
1. Insertion Anomaly
- Occurs when a new record cannot be inserted without redundant data.
- Example: If we store student and course details in the same table, we cannot insert a new student until they enroll in a course.
2. Update Anomaly
- Occurs when modifying a single record requires updating multiple rows.
- Example: If an instructor’s phone number appears in multiple rows, updating it requires changing every occurrence.
3. Deletion Anomaly
- Occurs when deleting a record removes useful information.
- Example: If a student drops all courses, their details may also be deleted from a poorly designed table.
Normalization Process – Normal Forms (NF)
Normalization follows a step-by-step approach known as normal forms (NF). Each step refines the table structure to reduce redundancy and anomalies.
1. First Normal Form (1NF) – Eliminating Repeating Groups
A table is in 1NF if:
✅ Each column has atomic (indivisible) values.
✅ Each row has a unique identifier (Primary Key).
Example: Unnormalized Table (UNF)
StudentID | Name | Course | Phone Number |
---|---|---|---|
101 | Alice | Math, Science | 9876543210 |
102 | Bob | English | 9123456789 |
❌ Issues:
- Multiple values in the
Course
column (Math, Science). - Non-atomic values violate 1NF.
Normalized Table (1NF)
StudentID | Name | Course | Phone Number |
---|---|---|---|
101 | Alice | Math | 9876543210 |
101 | Alice | Science | 9876543210 |
102 | Bob | English | 9123456789 |
✅ Solution:
- Each column contains atomic values.
- Separate rows for each course.
2. Second Normal Form (2NF) – Removing Partial Dependency
A table is in 2NF if:
✅ It is in 1NF.
✅ No partial dependency (All non-key attributes must depend only on the primary key).
Example: 1NF Table (Before 2NF)
StudentID | Name | Course | Instructor |
---|---|---|---|
101 | Alice | Math | Prof. X |
101 | Alice | Science | Prof. Y |
102 | Bob | English | Prof. Z |
❌ Issue:
Name
depends only on StudentID, butInstructor
depends only on Course.- Partial dependency exists (A non-key attribute depends only on part of the primary key).
Solution: Break into Two Tables (2NF)
Table 1: Student Table
StudentID | Name |
---|---|
101 | Alice |
102 | Bob |
Table 2: Course Table
Course | Instructor |
---|---|
Math | Prof. X |
Science | Prof. Y |
English | Prof. Z |
Table 3: Student_Course Table (Mapping Table)
StudentID | Course |
---|---|
101 | Math |
101 | Science |
102 | English |
✅ Solution:
- Removes partial dependency by separating attributes that do not depend on the entire primary key.
3. Third Normal Form (3NF) – Removing Transitive Dependency
A table is in 3NF if:
✅ It is in 2NF.
✅ No transitive dependency (A non-key attribute should not depend on another non-key attribute).
Example: 2NF Table (Before 3NF)
StudentID | Name | Course | Instructor | InstructorPhone |
---|---|---|---|---|
101 | Alice | Math | Prof. X | 9998887777 |
101 | Alice | Science | Prof. Y | 8887776666 |
102 | Bob | English | Prof. Z | 7776665555 |
❌ Issue:
InstructorPhone
depends onInstructor
, not directly on StudentID.- Transitive dependency exists (one non-key column depends on another non-key column).
Solution: Break into Two Tables (3NF)
Table 1: Student_Course Table
StudentID | Course | Instructor |
---|---|---|
101 | Math | Prof. X |
101 | Science | Prof. Y |
102 | English | Prof. Z |
Table 2: Instructor Table
Instructor | InstructorPhone |
---|---|
Prof. X | 9998887777 |
Prof. Y | 8887776666 |
Prof. Z | 7776665555 |
✅ Solution:
- Removes transitive dependency by moving instructor details into a separate table.
Higher Normal Forms (BCNF, 4NF, 5NF)
Beyond 3NF, advanced normal forms ensure even stricter integrity:
Boyce-Codd Normal Form (BCNF)
✅ A table is in BCNF if:
- It is in 3NF.
- For every functional dependency (A → B),
A
must be a super key.
📌 Used when a table has multiple candidate keys with overlapping dependencies.
Fourth Normal Form (4NF)
✅ Eliminates multi-valued dependencies, ensuring no column contains multiple independent values.
📌 Used when a table has two or more independent one-to-many relationships.
Fifth Normal Form (5NF)
✅ Removes join dependencies by ensuring no information is lost when tables are recombined.
📌 Used in highly complex databases with multi-relational dependencies.
Conclusion
🔹 Normalization helps to reduce redundancy, anomalies, and improve database efficiency.
🔹 1NF ensures atomic values, 2NF removes partial dependency, and 3NF removes transitive dependency.
🔹 Higher normal forms (BCNF, 4NF, 5NF) further refine database design for complex systems.