Introduction E-R modeling is used as a requirement analysis tool. DB design consists of inconsistency, ambiguity and redundancy. The refinement process of DB design – Normalization.
Issues – DB Design Making relations very large. Difficult to maintain and update data as it would involve searching many records in relation. Poor utilization of disk space and resources. Errors and inconsistencies increases.
Normalization Normalization is the process of organizing the data in the database. Normalization is used to minimize the redundancy from a relation or set of relations. Eliminate Insertion, Update, and Deletion Anomalies. Normalization divides the larger table into smaller and links them using relationships . The normal form is used to reduce redundancy from the database table.
Determinant Attribute X can be defined as determinant if it uniquely determines the attribute Y in a given relationship. Attribute need not be a key attribute.
Functional Dependency REPORT( Student#course# ,Cname, Room #,Marks, Grade) Student#course# : composite attributes exactly determines one value of marks. Marks is functionally dependent on student#course# Example: X->Y
Full Functional Dependency Marks is fully functional dependent on student# and course# Cannot determine marks secured by a student in a course if only one is known. Cname is not fully functionally dependent on student#course#.
Partial dependency X and Y are attribute sets. Attribute sets Y are partially dependent on attribute set X. Course# alone is enough to determine courseName, Iname and Room#.
Transitive Dependency Room# depends on Iname and in turn Iname depends on course#.
Trivial functional dependency means that the right-handside is a subset ( not necessarily a proper subset) of the left-hand side. Trival functional dependency For example: staffNo, sName 🡪 sName staffNo, sName 🡪 staffNo They do not provide any additional information about possible integrity constraints on the values held by these attributes. We are normally more interested in nontrivial dependencies because they represent integrity constraints for the relation.
FD – Example Database to track reviews of papers submitted to an academic conference. Prospective authors submit papers for review and possible acceptance in the published conference proceedings. Details of the entities Author information includes a unique author number, a name, a mailing address, and a unique (optional) email address. Paper information includes the primary author, the paper number, the title, the abstract, and review status (pending, accepted,rejected) Reviewer information includes the reviewer number, the name, the mailing address, and a unique (optional) email address A completed review includes the reviewer number, the date, the paper number, comments to the authors, comments to the program chairperson, and ratings (overall, originality, correctness, style, clarity)
Levels of normalization based on the amount of redundancy in the database. Various levels of normalization are: First Normal Form (1NF) Second Normal Form (2NF) Third Normal Form (3NF) Boyce-Codd Normal Form (BCNF) Fourth Normal Form (4NF) Fifth Normal Form (5NF) Domain Key Normal Form (DKNF) Levels of Normalization Redundancy Number of Tables Most databases should be 3NF or BCNF in order to avoid the database anomalies. Complexity
A table is considered to be in 1NF if all the fields contain only scalar values (as opposed to list of values). Example (Not 1NF) First Normal Form (1NF) Author and AuPhone columns are not scalar 0-321-32132-1 Balloon Sleepy, Snoopy, Grumpy 321-321-1111, 232-234-1234, 665-235-6532 Small House 714-000-0000 $34.00 0-55-123456-9 Main Street Jones, Smith 123-333-3333, 654-223-3455 Small House 714-000-0000 $22.95 0-123-45678-0 Ulysses Joyce 666-666-6666 Alpha Press 999-999-9999 $34.00 1-22-233700-0 Visual Basic Roman 444-444-4444 Big House 123-456-7890 $25.00 ISBN Title AuName AuPhone PubName PubPhone Price
Place all items that appear in the repeating group in a new table Designate a primary key for each new table produced. Duplicate in the new table the primary key of the table from which the repeating group was extracted or vice versa. Example (1NF) 1NF - Decomposition 0-321-32132-1 Balloon Small House 714-000-0000 $34.00 0-55-123456-9 Main Street Small House 714-000-0000 $22.95 0-123-45678-0 Ulysses Alpha Press 999-999-9999 $34.00 1-22-233700-0 Visual Basic Big House 123-456-7890 $25.00 ISBN Title PubName PubPhone Price ISBN AuName AuPhone 0-123-45678-0 Joyce 666-666-6666 1-22-233700-0 Roman 444-444-4444 0-55-123456-9 Smith 654-223-3455 0-55-123456-9 Jones 123-333-3333 0-321-32132-1 Grumpy 665-235-6532 0-321-32132-1 Snoopy 232-234-1234 0-321-32132-1 Sleepy 321-321-1111
For a table to be in 2NF, there are two requirements The database is in first normal form No partial dependency between key and non-key attributes. Note: Remember that we are dealing with non-key attributes Second Normal Form (2NF)
Sample table
Decomposing tables
Example 1 (Not 2NF) Scheme 🡪 {Title, PubId, AuId, Price, AuAddress} Key 🡪 {Title, PubId, AuId} {Title, PubId, AuID} 🡪 {Price} {AuID} 🡪 {AuAddress} AuAddress does not belong to a key AuAddress functionally depends on AuId which is a subset of a key
Example 2 (Not 2NF) Scheme 🡪 {City, Street, HouseNumber, HouseColor, CityPopulation} key 🡪 {City, Street, HouseNumber} {City, Street, HouseNumber} 🡪 {HouseColor} {City} 🡪 {CityPopulation} CityPopulation does not belong to any key. CityPopulation is functionally dependent on the City which is a proper subset of the key Second Normal Form (2NF)
If a data item is fully functionally dependent on only a part of the primary key, move that data item and that part of the primary key to a new table. If other data items are functionally dependent on the same part of the key, place them in the new table also Make the partial primary key copied from the original table the primary key for the new table. Place all items that appear in the repeating group in a new table 2NF - Decomposition
Example 1 (Convert to 2NF) Old Scheme 🡪 { Title, PubId, AuId , Price, AuAddress} New Scheme 🡪 { Title, PubId, AuId , Price} New Scheme 🡪 { AuId , AuAddress} Example 2(Convert to 2NF) Old Scheme 🡪 { City , Street , HouseNumber , HouseColor, CityPopulation} New Scheme 🡪 { City , Street , HouseNumber , HouseColor} New Scheme 🡪 { City , CityPopulation} 2NF - Decomposition
This form dictates that all non-key attributes of a table must be functionally dependent on a candidate key i.e. there can be no interdependencies among non-key attributes. For a table to be in 3NF, there are two requirements The table should be second normal form No attribute is transitively dependent on the primary key Third Normal Form (3NF)
Move all items involved in transitive dependencies to a new entity. Identify a primary key for the new entity. Place the primary key for the new entity as a foreign key on the original entity. 3NF - Decomposition
Example 1 (Not in 3NF) Scheme 🡪 { Studio , StudioCity, CityTemp} Primary Key 🡪 {Studio} {Studio} 🡪 {StudioCity} {StudioCity} 🡪 {CityTemp} {Studio} 🡪 {CityTemp} Both StudioCity and CityTemp depends on the entire key hence 2NF CityTemp transitively depends on Studio hence violates 3NF Example 2 (Not in 3NF) Scheme 🡪 {BuildingID, Contractor, Fee} Primary Key 🡪 {BuildingID} {BuildingID} 🡪 {Contractor} {Contractor} 🡪 {Fee} {BuildingID} 🡪 {Fee} Fee transitively depends on the BuildingID Both Contractor and Fee depend on the entire key hence 2NF Third Normal Form (3NF) BuildingID Contractor Fee 100 Randolph 1200 150 Ingersoll 1100 200 Randolph 1200 250 Pitkin 1100 300 Randolph 1200
Example 1(Convert to 3NF) Old Scheme 🡪 { Studio , StudioCity, CityTemp} New Scheme 🡪 { Studio , StudioCity} New Scheme 🡪 { StudioCity , CityTemp} Example 2 (Convert to 3NF) Old Scheme 🡪 {BuildingID, Contractor, Fee} New Scheme 🡪 {BuildingID, Contractor} New Scheme 🡪 {Contractor, Fee} 3NF - Decomposition
3 Normal Form
BCNF Boyce Codd normal form (BCNF) It is an advance version of 3NF that’s why it is also referred as 3.5NF. A table complies with BCNF if it is in 3NF and for every non trivial functional dependency of the form X->Y, X should be the super key of the table.
Example
Decomposing into two tables
4NF
Conditions- Multivalued dependency For a dependency A → B, if for a single value of A, multiple value of B exists, then the table may have multi-valued dependency. Also, a table should have at-least 3 columns for it to have a multi-valued dependency. And, for a relation R(A,B,C), if there is a multi-valued dependency between, A and B, then B and C should be independent of each other.
Example
Decomposition
Decomposing
5NF R is already in 4NF and not having any join dependency and joining should be lossless. R decomposed into R1,R2 and R3. (R1⋈ R2) ⋈R3 or R1⋈ (R2 ⋈ R3) should be able to reconstruct the original table R.
Given Relation
Natural join of three relations
Properties of Functional Dependency Let X , Y , and Z are sets of attributes in a relation R . There are several properties of functional dependencies which always hold in R also known as Armstrong Axioms . Reflexivity : If Y is a subset of X , then X → Y . e.g.; Let X represents {E-ID, E-NAME} and Y represents {E-ID}. {E-ID, E-NAME}->E-ID is true for the relation.
Properties of Functional Dependency Augmentation : If X → Y , then XZ → YZ . e.g.; Let X represents {E-ID}, Y represents {E-NAME} and Z represents {E-CITY}. As {E-ID}->E-NAME is true for the relation, so {E-ID,E-CITY}->{E-NAME,E-CITY} will also be true.
Properties of Functional Dependency Transitivity : If X → Y and Y → Z , then X → Z . e.g.; Let X represents {E-ID}, Y represents {E-CITY} and Z represents {E-STATE}. As {E-ID} ->{E-CITY} and {E-CITY}->{E-STATE} is true for the relation, so { E-ID }->{E-STATE} will also be true. Attribute Closure : The set of attributes that are functionally dependent on the attribute A is called Attribute Closure of A and it can be represented as A + .
Closure of a Set of Functional Dependencies Given a set F set of functional dependencies, there are certain other functional dependencies that are logically implied by F . For example: If A → B and B → C , then we can infer that A → C The set of all functional dependencies logically implied by F is the closure of F . We denote the closure of F by F + . We can find all of F + by applying Armstrong’s Axioms: if β ⊆ α, then α → β (reflexivity) if α → β, then γ α → γ β (augmentation) if α → β, and β → γ, then α → γ (transitivity) These rules are sound (generate only functional dependencies that actually hold) and complete (generate all functional dependencies that hold).
Example R = (A, B, C, G, H, I) F = { A → B A → C CG → H CG → I B → H } some members of F + A → H by transitivity from A → B and B → H AG → I by augmenting A → C with G, to get AG → CG and then transitivity with CG → I CG → HI by applying union CG → H and CG → I
Procedure for Computing F + To compute the closure of a set of functional dependencies F: F + = F repeat for each functional dependency f in F + apply reflexivity and augmentation rules on f add the resulting functional dependencies to F + for each pair of functional dependencies f 1 and f 2 in F + if f 1 and f 2 can be combined using transitivity then add the resulting functional dependency to F + until F + does not change any further NOTE : We shall see an alternative procedure for this task later
Closure of Functional Dependencies (Cont.) We can further simplify manual computation of F + by using the following additional rules. If α → β holds a nd α → γ holds, then α → β γ holds (union) If α → β γ holds, then α → β holds and α → γ holds (decomposition) If X → Y and WY → Z hold, then WX → Z also holds. This is a special case of transitivity ( Pseudotransitivity ) The above rules can be inferred from Armstrong’s axioms.
Closure of Attribute Sets Given a set of attributes α, define the closure of α under F (denoted by α + ) as the set of attributes that are functionally determined by α under F Algorithm to compute α + , the closure of α under F result := α ; while (changes to result ) do for each β → γ in F do begin if β ⊆ result then result := result ∪ γ end
Example of Attribute Set Closure R = (A, B, C, G, H, I) F = { A → B A → C CG → H CG → I B → H } ( AG) + 1. result = AG 2. result = ABCG (A → C and A → B) 3. result = ABCGH (CG → H and CG ⊆ AGBC) 4. result = ABCGHI (CG → I and CG ⊆ AGBCH) Is AG a super key? Does AG → R? == Is (AG) + ⊇ R Is any subset of AG a superkey? Does A → R ? == Is (A) + ⊇ R Does G → R ? == Is (G) + ⊇ R
CANONICAL COVER A canonical cover of a set of functional dependencies F is a simplified set of functional dependencies that has the same closure as the original set F.
Extraneous attributes Extraneous attributes: An attribute of a functional dependency is said to be extraneous if we can remove it without changing the closure of the set of functional dependencies
Conditions Canonical cover: A canonical cover of a set of functional dependencies F such that ALL the following properties are satisfied: F logically implies all dependencies in Fclosure . Fclosure logically implies all dependencies in F. No functional dependency in contains an extraneous attribute. Each left side of a functional dependency in is unique. That is, there are no two dependencies and in such that .
Algorithm repeat 1. Use the union rule to replace any dependencies in α 1 → β 1 and α 1 → β 2 with α 1 → β 1 β 2 2. Find a functional dependency with an extraneous attribute either in or in . 3. If an extraneous attribute is found, delete it from . until F does not change
Example Consider the following set F of functional dependencies: F= { A ->BC B->C A->B AB->C }
Step 1 There are two functional dependencies with the same set of attributes on the left: A->BC A->B These two can be combined to get A->BC. Now, the revised set F becomes: F= { A->BC B->C AB->C }
Step 2 There is an extraneous attribute in AB ->C because even after removing AB ->C from the set F, we get the same closures. This is because B->C is already a part of F. Now, the revised set F becomes: F= { A ->BC B ->C }
Step 3 C is an extraneous attribute in A-> BC, also A ->B is logically implied by A->B and B->C (by transitivity). F= { A ->B B ->C }
Step 4 After this step, F does not change anymore. So, Hence the required canonical cover is, = { A ->B B ->C }