chapter 4-Functional Dependency and Normilization.pdf

424 views 45 slides Mar 23, 2024
Slide 1
Slide 1 of 45
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45

About This Presentation

This chapter describe about the theory that has been developed with the goal of evaluating relational schemas for design quality , that is, to measure formally why one set of groupings of attributes into relation schemas is better than
another.


Slide Content

Fundamentals of Database
system
Chapter Four:
Functional Dependency
and Normalization
Prepared by:
Misganaw Abeje
University of Gondar
College Of Informatics
Department of computer science
[email protected]

outline
⚫Goals of Functional Dependency and Normalization
⚫Functional Dependency
⚫Normalization
–1st NF
–2nd NF
–3rd NF
–BCNF
BY: MA

Goals
1.To make Each tuple in a relation should represent one entity
or relationship instance
–Only foreign keys should be used to refer to other entities
–Entity and relationship attributes should be kept apart as much as possible
–Design a schema that can be explained easily relation by relation. The
semantics of attributes should be easy to interpret.
2.Prevent cause problems due to Mixing attributes of multiple
entities may
–Information is stored redundantly wasting storage
– Problems with update anomalies:
–Insertion anomalies
–Deletion anomalies: When a project is deleted, it will result in deleting all the
employees who work on that project
BY: MA

3. It can prevent Null Values in Tuples
⚫Relations should be designed such that their tuples will
have as few NULL values as possible
–Attributes that are NULL frequently could be placed in
separate relations (with the primary key)
–Reasons for nulls:
I.attribute not applicable or invalid
II.attribute value unknown (may exist)
III.value known to exist, but unavailable
⚫Generally Functional dependency and normalization
is used to help to develop a good database model
BY: MA

Functional dependency (FD)
⚫Definition: A functional dependency (FD) on a relation
schema R is a constraint X → Y, where X and Y are subsets
of attributes of R.
⚫Definition: an FD is a relationship between an attribute
"Y" and a determinant (1 or more other attributes) "X"
such that for a given value of a determinant the value of
the attribute is uniquely defined.
1.X is a determinant
2.X determines Y
3.Y is functionally dependent on X
4.X → Y
BY: MA

⚫The determination of functional dependencies is an important
part of designing databases in the relational model, and In
Database normalization and de-normalization.
⚫The most important properties of functional dependencies are
Armstrong's axioms, which are used in database normalization:
Subset Property (Axiom of Reflexivity):
If Y is a subset of X, then X → Y
Augmentation (Axiom of Augmentation):
⚫If X → Y, then XZ → YZ
⚫Transitivity (Axiom of Transitivity):
• If X → Y and Y → Z, then X → ZBY: MA

⚫Functional Dependencies: We say an attribute, B, has
a functional dependency on another attribute, A, if for any
two records, which have the same value for A, then the
values for B in these two records must be the same. We
illustrate this as: A → B
⚫Example: Suppose we keep track of employee email addresses,
and we only track one email address for each employee.
Suppose each employee is identified by their unique employee
number. We say there is a functional dependency of email
address on employee number:
employee number → email address
BY: MA

Functional Dependencies
8
EmpNum EmpEmail EmpFname EmpLname
123 [email protected] John Doe
456 [email protected] Peter Smith
555 [email protected] Alan Lee
633 [email protected] Peter Doe
787 [email protected] Alan Lee
If EmpNum is the PK then the FDs:
EmpNum → EmpEmail
EmpNum → EmpFname
EmpNum → EmpLname
must exist.
BY: MA

Functional Dependencies
9
EmpNum → EmpEmail
EmpNum → EmpFname
EmpNum → EmpLname
EmpNum
EmpEmail
EmpFname
EmpLname
EmpNum EmpEmail EmpFname EmpLname
3 different ways
you might see FDs
depicted
BY: MA

Determinant
⚫Functional Dependency EmpNum → EmpEmail
–Attribute on the LHS is known as the determinant
• EmpNum is a determinant of EmpEmail
⚫Transitive dependency Consider attributes A, B, and C,
and where A → B and B → C.
⚫Functional dependencies are transitive, which means that
we also have the functional dependency A → C
⚫We say that C is transitively dependent on A through B.
10BY: MA

Transitive dependency
11
EmpNum EmpEmail DeptNum DeptNname
EmpNum EmpEmail DeptNum DeptNname
DeptName is transitively dependent on EmpNum via DeptNum
EmpNum → DeptName
EmpNum → DeptNum
DeptNum → DeptName
BY: MA

Partial dependency
12
A partial dependency exists when an attribute B is
functionally dependent on an attribute A, and A is a
component of a multipart candidate key.
InvNum LineNum Qty InvDate
Candidate keys: {InvNum, LineNum} InvDate is
partially dependent on {InvNum, LineNum} as
InvNum is a determinant of InvDate and InvNum is
part of a candidate key
BY: MA

Normalization
⚫A large database defined as a single relation may result in
data duplication. This repetition of data may result in:
–Making relations very large.
–It isn't easy to maintain and update data as it would involve
searching many records in relation.
–Wastage and poor utilization of disk space and resources.
–The likelihood of errors and inconsistencies increases.
⚫So to handle these problems, we should analyze and
decompose the relations with redundant data into smaller,
simpler, and well-structured relations that are satisfy
desirable properties. Normalization is a process of
decomposing the relations into relations with fewer attributes.BY: MA

What is Normalization?
⚫Normalization is the process of organizing the data in the database.
⚫Normalization is used to minimize the redundancy from a relation or
set of relations. It is also used to eliminate undesirable
characteristics like Insertion, Update, and Deletion Anomalies.
⚫Normalization divides the larger table into smaller and links them
using relationships.
⚫The normal form is used to reduce redundancy from the database
table.
⚫Why do we need Normalization? The main reason for normalizing
the relations is removing these anomalies. Failure to eliminate
anomalies leads to data redundancy and can cause data integrity
and other problems as the database grows. Normalization consists
of a series of guidelines that helps to guide you in creating a good
database structure.
BY: MA

⚫Advantages of Normalization
–Normalization helps to minimize data redundancy.
–Greater overall database organization.
–Data consistency within the database.
–Much more flexible database design.
–Enforces the concept of relational integrity.
⚫Disadvantages of Normalization
–You cannot start building the database before knowing what the user
needs.
–The performance degrades when normalizing the relations to higher
normal forms, i.e., 4NF, 5NF.
–It is very time-consuming and difficult to normalize relations of a higher
degree.
–Careless decomposition may lead to a bad database design, leading to
BY: MA

Types of Normalization
⚫We discuss different types of normal forms: first, second,
third, and Boyce-Codd normal forms
⚫1NF, 2NF, 3NF, and BCNF, 4NF,5NF
⚫Normalization is a process that “improves” a database design
by generating relations that are of higher normal forms.
⚫The objective of normalization: “to create relations where
every dependency is on the key, the whole key, and nothing
but the key”.
⚫Definition. The normal form of a relation refers to the
highest normal form condition that it meets, and hence
indicates the degree to which it has been normalized.
16BY: MA

Definitions of Keys and Attributes
Participating in Keys
⚫A superkey of a relation schema R = {A1, A2, ... , An} is a set of
attributes S ⊆ R with the property that no two tuples t1 and t2 in any
legal relation state r of R will have t 1[S] = t2[S].
⚫If a relation schema has more than one key, each is called a
candidate key. One of the candidate keys is arbitrarily designated
to be the primary key, and the others are called secondary keys.
⚫An attribute of relation schema R is called a prime attribute of R if
it is a member of some candidate key of R. An attribute is called
nonprime if it is not a prime attribute—that is, if it is not a member
of any candidate key.
BY: MA

Normalization
18
There is a sequence to normal forms:
1NF is considered the weakest,
2NF is stronger than 1NF,
3NF is stronger than 2NF, and
BCNF is considered the strongest
Also,
any relation that is in BCNF, is in 3NF;
any relation in 3NF is in 2NF; and
any relation in 2NF is in 1NF.
BY: MA

Normalization
19
BCNF
3NF
2NF
1NF a relation in BCNF, is also
in 3NF
a relation in 3NF is also in
2NF
a relation in 2NF is also in
1NF
BY: MA

Normalization
20
•We consider a relation in BCNF to be fully normalized.
•The benefit of higher normal forms is that update semantics
for the affected data are simplified.
•This means that applications required to maintain the
database are simpler.
•A design that has a lower normal form than another design
has more redundancy. Uncontrolled redundancy can lead to
data integrity problems.
BY: MA

First Normal Form
⚫We say a relation is in 1NF if all values stored in the
relation are single-valued and atomic.
⚫It states that an attribute of a table cannot hold multiple
values. It must hold only single-valued attribute.
⚫First normal form disallows the multi-valued attribute,
composite attribute, and their combinations. 1NF places
restrictions on the structure of relations Values must be
simple.
21BY: MA

First Normal Form
22
The following in not in 1NF
EmpNum EmpPhone EmpDegrees
123 233-9876
333 233-1231BA, BSc, PhD
679 233-1231BSc, MSc
EmpDegrees is a multi-valued field:
employee 679 has two degrees: BSc and MSc
employee 333 has three degrees: BA, BSc, PhD
BY: MA

First Normal Form
23
To obtain 1NF relations we must, without loss of
information, replace the above with two relations -
see next slide
EmpNum EmpPhone EmpDegrees
123 233-9876
333 233-1231BA, BSc, PhD
679 233-1231BSc, MSc
BY: MA

First Normal Form
24
EmpNumEmpDegree
333 BA
333 BSc
333 PhD
679 BSc
MSc679
EmpNum EmpPhone
123 233-9876
333 233-1231
679 233-1231
An outer join between Employee and EmployeeDegree will
produce the information we saw before
Employee
EmployeeDegree
BY: MA

Exercise: which NF is satisfy? Normalize into next
higher NF?
BY: MA
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385,
9064738238
UP
20 Harry 8574783832 Bihar
12 Sam 7390372389,
8589830302
Punjab

Second Normal Form
26
Second Normal Form
•Definition. A relation schema R is in 2NF if every nonprime
attribute in R is fully functionally dependent on the primary
key of R. A relation is in 2NF if it is in 1NF, and every non-
key attribute is fully dependent on each candidate key. (That
is, we don’t have any partial functional dependency.)
• 2NF (and 3NF) both involve the concepts of key and
non-key attributes.
• A key attribute is any attribute that is part of a key;
any attribute that is not a key attribute, is a non-key
attribute.
•A relation in 2NF will not have any partial dependencies
BY: MA

Second Normal Form
⚫Consider this InvLine table (in 1NF):
⚫InvNum, LineNum →ProdNum, Qty
⚫InvNum→ InvDate
⚫In this relation there are two candidate keys: InvNum and LineNum.
⚫ProdNum, Qty and Invdate are the only non-key attribute, and from
those non key attributes ProdNum, Qty it is fully dependent on each
candidate key (InvNum, LineNum) whereas Invdate is partial
functional dependency on key (InvNum).
⚫Therefore the relation is not 2NF since there is a partial dependency
of InvDate on InvNum, it is 1NF.
InvNum LineNumProdNumQty InvDate
BY: MA

Second Normal Form
28
LineNumProdNum QtyInvNum InvDate
InvLine
The above relation has redundancies: the invoice date is
repeated on each invoice line. We can improve the database
by decomposing the relation into two relations:
LineNumProdNum QtyInvNum
InvDateInvNum
Question: What is the highest normal form for these
relations? 2NF? 3NF? BCNF?
BY: MA

Is the following relation in 2NF?
•It is 2NFbecause of all non key attributes are full depend on
the given candidate key, but not in 3NF, nor in BCNF

30
2NF, but not in 3NF, nor in BCNF:
since dnumber is not a candidate key and we have:
dnumber → dname.
EmployeeDept
enamessnbdateaddressdnumberdname
BY: MA

Third Normal Form
31
Third Normal Form
•A relation is in 3NF if the relation is in 2NF and all
determinants of non-key attributes are candidate keys
That is, for any functional dependency: X → Y, where Y is a
non-key attribute (or a set of non-key attributes), X is a
candidate key.
•This definition of 3NF differs from BCNF only in the
specification of non-key attributes - 3NF is weaker than BCNF.
(BCNF requires all determinants to be candidate keys.)
•A relation in 3NF will not have any transitive dependencies
of non-key attribute on a candidate key through another non-
key attribute.
BY: MA

Third Normal Form
32
EmpNum EmpNameDeptNumDeptName
EmpName, DeptNum, and DeptName are non-key attributes.
DeptNum determines DeptName, a non-key attribute, and
DeptNum is not a candidate key.
Consider this Employee relation
Is the relation in BCNF? … no
Is the relation in 3NF? … no
Is the relation in 2NF? … yes
Candidate keys are?

BY: MA

Third Normal Form
33
EmpNum EmpNameDeptNumDeptName
We correct the situation by decomposing the original relation
into two 3NF relations. Note the decomposition is lossless.
EmpNumEmpNameDeptNum DeptNameDeptNum
Verify these two relations are in 3NF.
BY: MA

Boyce-Codd Normal Form(BCNF)
34
•BCNF is defined very simply: For a table to satisfy the Boyce-
Codd Normal Form, it should satisfy the following two
conditions:
–It should be in theThird Normal Form.
–And, for any dependency A → B, A should be asuper key or candidate
key.
•The second point sounds a bit tricky, right? In simple words, it
means, that for a dependency A → B, if B is a prime attribute, A
cannot be a non-prime attribute.
•An attribute that is not part of any candidate key is known
as non-prime attribute. An attribute that is a part of one of the
candidate keys is known as prime attribute.
BY: MA

35
student_id subject professor
101 Java P.Java
101 C++ P.Cpp
102 Java P.Java2
103 C# P.Chash
104 Java P.Java
Below we have a college enrolment table with columns
student_id, subject and professor.
As you can see, we have also added some sample data to the table. In the table above:
•One student can enroll for multiple subjects. For example, student
withstudent_id101, has opted for subjects - Java & C++
•For each subject, a professor is assigned to the student.
•And, there can be multiple professors teaching one subject like we have for
Java.
BY: MA

36
•Well, in the table abovestudent_id, subjecttogether form the primary
key, because usingstudent_idandsubject, we can find all the columns
of the table.
•One more important point to note here is, one professor teaches only
one subject, but one subject may have two different professors.
•Hence, there is a dependency betweensubjectandprofessorhere,
wheresubjectdepends on the professor name.
•This table satisfies the1st Normal formbecause all the values are
atomic, column names are unique and all the values stored in a
particular column are of same domain.
•This table also satisfies the2nd Normal Formas their is noPartial
Dependency.
•And, there is noTransitive Dependency, hence the table also satisfies
the3rd Normal Form.
,….BCNF
BY: MA

37
Why this table is not in BCNF?
•In the table above, student_id, subject form primary key, which
means subject column is a prime attribute.
•But, there is one more dependency, professor → subject.
•And while subject is a prime attribute, professor is a non-prime
attribute, which is not allowed by BCNF.
•How to satisfy BCNF?
•To make this relation(table) satisfy BCNF, we will decompose
this table into two tables, student table and professor table.
•Below we have the structure for both the tables.
,….BCNF
BY: MA

38
student_idp_id
101 1
101 2
and so on...
p_id professorsubject
1 P.Java Java
2 P.Cpp C++
Student Table
Professor Table
,….BCNF
BY: MA

39
course_noinstr_no
student_noinstr_no
student_nocourse_noinstr_no
{student_no, instr_no} → course_no
{student_no, course_no} → instr_no
instr_no → course_no
,….BCNF
BY: MA

Main Difference Between BCNF and 3NF
⚫Most relations in 3NF are also in BCNF, the only
time this may not be true is when there is more
than one candidate key for a relation and at least
one of composite in 3NF.
BY: MA

Summary
⚫We say a relation is in 1NF if all values stored in the
relation are single-valued and atomic.
⚫A relation is in 2NF if it is in 1NF, and every non-key
attribute is fully dependent on each candidate key.
(That is, we don’t have any partial functional
dependency.)
⚫A relation is in 3NF if the relation is in 2NF and all
determinants of non-key attributes are candidate keys
BY: MA

Exercise
1.Consider the relation R, which has attributes that hold schedules of courses
and sections at a university; R = {Course_no, Sec_no, Offering_dept,
Credit_hours, Course_level, Instructor_ssn, Semester, Year, Days_hours,
Room_no, No_of_students}. Suppose that the following functional
dependencies hold on R:
{Course_no} → {Offering_dept, Credit_hours, Course_level}
{Course_no, Sec_no, Semester, Year} → {Days_hours, Room_no,
No_of_students, Instructor_ssn}
{Room_no, Days_hours, Semester, Year} → {Instructor_ssn, Course_no,
Sec_no}
⚫Try to determine which sets of attributes form keys of R. How would you
normalize this relation?
42BY: MA

2.Suppose that we have the following requirements for a university database that
is used to keep track of students’ transcripts:
a.The university keeps track of each student’s name (Sname), student number
(Snum), Social Security number (Ssn), current address (Sc_addr) and
phone (Sc_phone), permanent address (Sp_addr) and phone
(Sp_phone),birth date (Bdate), sex (Sex), class (Class) (‘freshman’,
‘sophomore’, ... ,‘graduate’), major department (Major_code), minor
department(Minor_code) (if any), and degree program (Prog) (‘b.a.’, ‘b.s.’, ...
, ‘ph.d.’).Both Ssn and student number have unique values for each student.
b.Each department is described by a name (Dname), department code
(Dcode), office number (Doffice), office phone (Dphone), and college
(Dcollege). Both name and code have unique values for each department.
c.Each course has a course name (Cname), description (Cdesc), course
number (Cnum), number of semester hours (Credit), level (Level), and
offering department (Cdept). The course number is unique for each course.
43BY: MA

d. Each section has an instructor (Iname), semester (Semester), year
(Year), course (Sec_course), and section number (Sec_num). The
section numberdistinguishes different sections of the same course
that are taught during the same semester/year; its values are 1, 2, 3,
..., up to the total number of sections taught during each semester.
e.A grade record refers to a student (Ssn), a particular section, and a
grade (Grade).
Design a relational database schema for this database application. First
show all the functional dependencies that should hold among the
attributes. Then design relation schemas for the database that are
each in 3NF or BCNF.
Specify the key attributes of each relation. Note any unspecified
requirements, and make appropriate assumptions to render the
specification complete BY: MA

⚫Thank you
BY: MA