Introduction to database design and normal forms

UsmanAhmed269749 8 views 28 slides Oct 28, 2025
Slide 1
Slide 1 of 28
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28

About This Presentation

first second and third normal forms


Slide Content

Database design & development

Database Normalization Database normalization is structured around a series of increasingly strict rules called  normal forms. Each normal form addresses specific types of redundancy and dependency issues, guiding toward a more robust and maintainable relational schema. The most widely applied normal forms are First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), and Boyce- Codd Normal Form (BCNF).

First Normal Form (1NF) First Normal Form (1NF) is the initial stage in the process of database normalization. It ensures that each column in a table contains only atomic, indivisible values, and that each row is uniquely identifiable. By removing repeating groups and multi-valued attributes, 1NF lays the groundwork for a more organized and consistent database structure. Key Requirements: All columns contain atomic values (no lists, sets, or composite fields). Each row is unique (typically enforced by a primary key). No repeating groups or arrays within a row. Each column contains values of a single data type.

First Normal Form (1NF) Suppose you have a table tracking customer purchases, where the “Purchased Products” column contains a comma-separated list of products : Customer ID Customer Name Purchased Products 101 John Doe Laptop, Mouse 102 Jane Smith Tablet 103 Alice Brown Keyboard, Monitor, Pen

First Normal Form (1NF) Why is this not in 1NF? Non-atomic values:  “Purchased Products” contains multiple items per cell. Querying and updating are complex:  Searching for customers who bought “Mouse” requires string parsing. Data integrity risks:  No way to enforce referential integrity between products and customers. Inconsistent data entry:  Different typos can creep in . Customer ID Customer Name Purchased Products 101 John Doe Laptop, Mouse 102 Jane Smith Tablet 103 Alice Brown Keyboard, Monitor, Pen

First Normal Form (1NF ) Real-World Impact: Reporting (e.g., “Who bought a Laptop?”) is error-prone. Updates (e.g., renaming “Mouse” to “Wireless Mouse”) are tedious and unreliable. Referential integrity cannot be enforced .

Real-World Issues: Reporting challenges:  Generating reports such as “How many customers bought a Laptop?” becomes complicated, as you cannot simply filter a column for “Laptop”, you must parse the string. Update anomalies:  If a product name changes (e.g., “Mouse” to “Wireless Mouse”), you must update every occurrence in every cell, increasing the risk of missing some entries. Data integrity risks:  There is no way to enforce referential integrity between products and customers, which can lead to inconsistent data.

First Normal Form (1NF) This unnormalized table structure is easy to read for small datasets but quickly becomes unmanageable and unreliable as the amount of data grows. To comply with First Normal Form (1NF), we must ensure that each field contains only a single value, and that the table structure supports efficient querying, updating, and data integrity .

First Normal Form (1NF) Problems with the Unnormalized Table: Non-atomic values:  The “Purchased Products” column contains multiple items in a single cell, making it difficult to query or update individual products. Data redundancy and inconsistency:  If a customer purchases more products, the list grows, increasing the risk of inconsistent data entry (e.g., different typos ). Difficulties in searching and reporting:  Queries to find all customers who purchased a specific product become complex and inefficient .

First Normal Form (1NF) Transformation Steps to Achieve 1NF: Identify columns with non-atomic values:  In this case, “Purchased Products” contains multiple values. Split the multi-valued column into separate rows:  Each product purchased by a customer should be represented as a separate row, ensuring that every field contains only a single value . Customer ID Customer Name Product 101 John Doe Laptop 101 John Doe Mouse 102 Jane Smith Tablet 103 Alice Brown Keyboard 103 Alice Brown Monitor 103 Alice Brown Pen

Second Normal Form (2NF) A table is in 2NF, if it is in 1NF and every non-prime attribute (i.e., non-primary key attribute) is fully functionally dependent on the entire primary key. This addresses  partial dependencies , where a non-key attribute depends only on part of a composite key.

Second Normal Form (2NF) Issue : “ Customer Name” depends only on “Customer ID”, not the full primary key (“Order ID”, “Customer ID”). This is a partial dependency. Normalization to 2NF: Separate customer information into its own table. Order ID Customer ID Customer Name Product 201 101 John Doe Laptop 202 101 John Doe Mouse 203 102 Jane Smith Tablet

Second Normal Form (2NF) Benefits: Eliminates redundancy of customer details. Simplifies data maintenance and updates. By moving CustomerName into a separate Customers table, we ensure it’s only dependent on CustomerID and eliminate partial dependency on a composite key. Orders Table Order ID Customer ID Product 201 101 Laptop 202 101 Mouse 203 102 Tablet Customes Table Customer ID Customer Name 101 John Doe 102 Jane Smith

Third Normal Form (3NF ) A table is in 3NF, if it is in 2NF and all the attributes are functionally dependent only on the primary key, there are no  transitive dependencies  (i.e., non-key attributes depending on other non-key attributes ).

Third Normal Form (3NF ) Example Transformation to 3NF Issue: “ Supplier” depends on “Product”, not directly on the primary key. Order ID Customer ID Product Supplier 201 101 Laptop HP 202 101 Mouse Logitech 203 102 Tablet Apple

Third Normal Form (3NF ) Normalization to 3NF: Move product and supplier information to separate tables . Orders Table Order ID Customer ID Product ID 201 101 301 202 101 302 203 102 303 Products Table Product ID Product Name Supplier ID 301 Laptop 401 302 Mouse 402 303 Tablet 403 Suppliers Table Supplier ID Supplier Name 401 HP 402 Logitech 403 Apple

Third Normal Form (3NF ) Benefits: Removes transitive dependencies. Reduces data duplication. Improves data integrity and maintainability. Supplier depends on Product and not directly on the primary key ( OrderID ), we separate it into a new Suppliers table and reference it via the Products table. This removes transitive dependencies and aligns with 3NF rules.

Boyce- Codd Normal Form (BCNF) BCNF addresses certain edge cases where 3NF does not eliminate all redundancy, particularly when there are overlapping candidate keys or complex dependencies. Example Transformation to BCNF Scenario : Suppose we have a university database that tracks which students are enrolled in which courses, and who teaches each course. The initial table structure is as follows: StudentID Course Instructor 1 Math Dr. Smith 2 Math Dr. Smith 3 History Dr. Jones 4 History Dr. Jones

Boyce- Codd Normal Form (BCNF) Explanation of Columns: StudentID :  Unique identifier for each student. Course:  The course in which the student is enrolled. Instructor:  The instructor teaching the course. StudentID Course Instructor 1 Math Dr. Smith 2 Math Dr. Smith 3 History Dr. Jones 4 History Dr. Jones

Boyce- Codd Normal Form (BCNF) Functional Dependencies in the Table: ( StudentID , Course) → Instructor Each unique combination of student and course determines the instructor for that course. Course → Instructor Each course is always taught by the same instructor . StudentID Course Instructor 1 Math Dr. Smith 2 Math Dr. Smith 3 History Dr. Jones 4 History Dr. Jones

Boyce- Codd Normal Form (BCNF) Candidate Keys: The only candidate key in this table is the composite key  ( StudentID , Course) , since both are needed to uniquely identify a row. Why is this Table in 3NF? All non-prime attributes (Instructor) are fully functionally dependent on the candidate key ( StudentID , Course). There are no transitive dependencies (i.e., no non-prime attribute depends on another non-prime attribute through the candidate key). StudentID Course Instructor 1 Math Dr. Smith 2 Math Dr. Smith 3 History Dr. Jones 4 History Dr. Jones

Boyce- Codd Normal Form (BCNF) Why is this Table  Not  in BCNF? The functional dependency  Course → Instructor  exists, but “Course” is not a superkey (it does not uniquely identify a row in the table). BCNF requires that for every non-trivial functional dependency X → Y, X must be a superkey . Here, “Course” is not a superkey , so this violates BCNF. StudentID Course Instructor 1 Math Dr. Smith 2 Math Dr. Smith 3 History Dr. Jones 4 History Dr. Jones

Boyce- Codd Normal Form (BCNF) How to Normalize to BCNF: To resolve the BCNF violation, we need to decompose the table so that every determinant is a candidate key in its respective table. This is done by splitting the original table into two separate tables :

Boyce- Codd Normal Form (BCNF) StudentCourses Table: This table records which students are enrolled in which courses . Primary Key:  ( StudentID , Course) This table no longer contains the Instructor column, so there are no functional dependencies that violate BCNF . StudentCourses   StudentID Course 1 Math 2 Math 3 History 4 History

Boyce- Codd Normal Form (BCNF) CourseInstructors Table: This table records which instructor teaches each course . Primary Key:  Course The dependency Course → Instructor is now valid, as “Course” is the primary key (and as a result a superkey ) in this table. CourseInstructors Course Instructor Math Dr. Smith History Dr. Jones

Boyce- Codd Normal Form (BCNF) Resulting Structure and Benefits: All functional dependencies in both tables have determinants that are candidate keys, so both tables are in BCNF. Data redundancy is reduced: the instructor for each course is stored only once, rather than repeated for every student enrolled in the course. Updates are easier and less error-prone: if an instructor changes for a course, you only need to update one row in the CourseInstructors table .

Normalization BCNF 3NF 2NF 1NF A relation in BCNF, is also in 3NF A relation in 3NF is also in 2NF A relation in 2NF is also in 1NF

Thanks