05- Relational Database Design_Week 05.pptx

KanishkaKhanna11 12 views 37 slides Mar 07, 2025
Slide 1
Slide 1 of 37
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37

About This Presentation

Database, its uses, relational database


Slide Content

Unit # 3 - Relational DBMS Design Guidelines, - Functional Dependencies - Normalisation for RDBMS, - Normal Forms ( 1NF,2NF,3NF,BCNF,4NF,5NF) (Week-5)

RDBMS Design Guidelines β€œInformally , each tuple in a relation should represent one entity or relationship instance. In the design of RDBMS there are several important practices that should be kept in mind to create databases that are a stable ,efficient , scalable, and maintainable. There are four informal guidelines that may be used as measures to determine the quality of relation schema design : Making sure that the semantics of the attributes is clear in the schema Reducing the redundant information in tuples Reducing the NULL values in tuples Disallowing the possibility of generating spurious tuples GUIDELINE-1 This essentially means , Design a relation schema so that it is easy to explain its meaning. Do not combine attributes from multiple entity types and relationship types into a single relation . If the schema is complex it shall suffer from three anomalies. This guideline can be stated more formally as below : Chapter-15

RDBMS Design Guidelines (contd) Insertion Anomaly : Insertion anomaly occurs in relational databases when the structure of the table or the design of the schema makes it difficult to insert new data without introducing redundancy, inconsistency, or irrelevant data. This typically happens when the table is not properly normalized, causing dependencies between unrelated data . Consider In the COMPANY database, we have a single table called EMPLOYEE_PROJECT that stores both employee information and project information together. This table combines data about employees and the projects they work on which were above correctly separated in two different tables. 1. Insertion anomaly 2. Deletion anomaly 3. Update anomaly Now, let’s assume we want to add a new project, called Project Gamma, located at the Mechanical Engineering Department but there are yet no employees assigned to this project. In the current structure, we cannot we cannot do this task since there is no employee information. This dependency is insertion anomaly.

RDBMS Design Guidelines (contd) 2. Deletion Anomaly : A deletion anomaly shall occurs in a relational database when the deletion of data inadvertently results in the loss of other valuable information i.e deleting a record related to one entity can unintentionally delete information related to another entity . In the EMPLOYEE_PROJECT table now let’s assume Rajesh is no longer working on Project Alpha, and we decide to delete his record for Project Alpha from the EMPLOYEE_PROJECT table . Rajesh's information related to Project Alpha was deleted, but along with that, his date of birth ( Bdate ), address, and department number ( Dnumber ) related to that project were also removed . In fact , If Rajesh was only working on Project Alpha and not on any other projects, deleting this record would also remove all information about Rajesh, including his address, birthdate, and employment details, even though that information might be useful for other purposes (e.g., HR records ). This is Deletion anomaly. 3. Update Anomaly : An update anomaly occurs when updating a piece of information in one place requires updating the same data in multiple places. For ex if Rajesh moves from his old 12 MG Road address to say a new address 45,Ashok Vihar this update needs to be done in all places . If not all instances are updated correctly, the database can become inconsistent.

RDBMS Design Guidelines (contd) β€œ Design a schema that does not suffer from the insertion, deletion and update anomalies. If there are any anomalies present, then note them so that applications can be made to take them into account. β€œ GUIDELINE-2 This guideline stresses the importance of eliminating data anomaliesβ€”insertion, deletion, and update anomaliesβ€”from the database schema to ensure data integrity and efficiency. In cases where these anomalies cannot be avoided, the design should clearly document the potential issues, and database applications or update programs should handle these anomalies carefully to avoid inconsistency . More formally , β€œRelations should be designed such that their tuples will have as few NULL values as possible . Attributes that are NULL frequently could be placed in separate relations (with the primary key ).β€œ GUIDELINE-3 This guideline can be stated more formally as below : This guideline emphasizes minimizing the use of null values within relation schemas. Null values can eat up space and can introduce ambiguity because they may signify different situations such as ,the attribute does not apply to the entity. The attribute's value is unknown . The attribute's value is known but hasn't been recorded yet.

RDBMS Design Guidelines (contd) β€œDesign relation schemas so that relations do not contain matching attribute other than foreign key - primary key combination so that no spurious tuples (additional tuples) are generated in join operation.β€œ GUIDELINE-4 This guideline addresses the prevention of spurious tuples during joins. A spurious tuple is an unintended and incorrect result created when joining tables on attributes that do not ensure a meaningful or valid join. When relation schemas are not designed with primary key-foreign key constraints for joining, joining these tables may produce tuples that do not represent actual or meaningful data. More formally , For example, if only 15 percent of employees have individual offices , there is little justification for including an attribute Office_number in the EMPLOYEE relation ; rather, a relation EMP_OFFICES(Essn, Office_number ) can be created to include tuples for only the employees with individual offices.

RDBMS Design Guidelines (contd) If we join these tables on Plocation (a non-key attribute in both tables), the result might include invalid or duplicate records (spurious tuples), suggesting erroneous relationships between employees and projects. By following Guideline 4, the design would instead ensure joins are based on SSN and Pnumberβ€”attributes that act as a primary key and foreign keyβ€”maintaining the integrity of the information represented​ . Form more details , Please refer the pdf . Explanation for Figure 15.5/page# 511

Functional Dependeny Functional Dependency concept is a formal tool for analysis of relational schemas that enables us to detect and describe RDBMS Design G uidelines in precise terms i.e it is used to specify formal measures of the "goodness" of relational designs Definition : A functional dependency is defined as a constraint between two sets of attributes say X and Y of relation R such that for any two tuples t 1 and t 2 in R that have t 1 [X]=t 2 [X], we must also have t 1 [Y]=t 2 [Y]. What this means basically is that value of Y component of tuple in r depend on or determined by value of X component . We can also say that there is functional dependency from X to Y or that Y is functionally dependent on X. It is shown as X οƒ  Y In our example of relation EMPLOYEE we can see Dept_ID β†’ Dept_Name because it satisfies the condition t 1 [X]=t 2 [X], we have t 1 [Y]=t 2 [Y].

Functional Dependeny (contd) Another example: The notation t 1 [X]=t 2 [X ] β‡’ 𝑑1 [π‘Œ]=𝑑2[π‘Œ ] describes the core concept of functional dependency. It ensures that when attributes are dependent on others, any two rows with the same values for the determinant attribute(s) must have the same values for the dependent attribute(s). This property is crucial in database design for maintaining data integrity and consistency . Some other examples from COMPANY database: ssn -> eName ( Social security number determines employee name) Pnumber -> { Pname, Plocation } ( Project number determines project name and location) { ssn , Pnumber} -> hour ( Employee ssn and project number determines the hours per week that the employee works on the project )

Functional Dependeny (contd) Functional dependency can be either Full or Partial. Full Functional dependency A functional dependency X → Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more. Eg : In Figure below , {Ssn, Pnumber} → Hours is a full dependency (neither Ssn → Hours nor Pnumber→Hours holds). Partial Functional dependency A functional dependency X → Y is a partial dependency if some attribute A Ρ X can be removed from X and the dependency still holds. Eg : The dependency {Ssn, Pnumber}→Ename is partial because Ssn→Ename holds.

Functional Dependency (contd) Armstrong's Inference Rules : Armstrong's Inference Rules are a set of axioms or rules to derive all possible functional dependencies from a given set of functional dependencies. These rules help determine the closure of a set of functional dependencies, which is essential in tasks like normalization and ensuring the consistency of database design . The Armstrong’s three primary Inference rules (IRs) are : Reflexive : If Y subset-of X, then X -> Y Augmentation : If X -> Y, then XZ -> YZ (Notation : XZ stands for X U Z) Transitive : If X -> Y and Y -> Z, then X -> Z 1. IR1 - Reflexive Any attribute set (or combination) can determine its own subset. This is trivial and holds by definition i.e If Y is a subset of X , then X β†’ Y . Eg . If X = { Emp_ID , Emp_Name } and Y = { Emp_ID }, then { Emp_ID , Emp_Name } β†’ { Emp_ID }. 2. IR2 - Augmentation If an attribute set X can determine Y, then adding more attributes (denoted by Z) to both X and Y does not change the dependency.

Functional Dependency (contd) Essentially, you can "augment" both sides of a dependency with the same set of attributes without changing the meaning of the dependency . Eg : If Emp_ID β†’ Emp_Name , i.e Emp_ID uniquely determines Emp_Name (each employee has a unique identifier that determines their name ). Then , According to the augmentation rule, if Emp_ID β†’ Emp_Name , we can add any additional attribute set (say, Dept_ID ) to both sides of this dependency, and the dependency will still hold. So, using augmentation : { Emp_ID , Dept_ID } β†’ { Emp_Name , Dept_ID }. Augmentation is often used in conjunction with other rules (like transitivity) to derive additional dependencies that help with database normalization. 3. IR3 - Transitivity If X determines Y and Y determines Z, then X indirectly determines Z. This is similar to the transitive property in mathematics . Eg : If Emp_ID β†’ Dept_ID and Dept_ID β†’ Dept_Name , then Emp_ID β†’ Dept_Name . IR1, IR2, IR3 form a sound and complete set of inference rules. All other rules that hold can be deduced from these rules.

Functional Dependency (contd) Some additional inference rules that are derived using IR1 , IR2 & IR3 are Decomposition: If X -> YZ, then X -> Y and X -> Z If X determines both Y and Z together, then X can determine each one separately . Example: If Emp_ID β†’ { Emp_Name , Dept_ID }, then Emp_ID β†’ Emp_Name and Emp_ID β†’ Dept_ID . Union: If X -> Y and X -> Z, then X -> YZ Conversely , If X determines Y and X determines Z , then X determines the union of Y and Z . Example: If Emp_ID β†’ Emp_Name and Emp_ID β†’ Dept_ID , then Emp_ID β†’ { Emp_Name , Dept_ID }. Psuedotransitivity : If X -> Y and WY -> Z, then WX -> Z If X determines Y and a combination of W and Y determines Z , then the combination of W and X can determine Z . Example: If Emp_ID β†’ Dept_ID and {Role, Dept_ID } β†’ Salary, then {Role, Emp_ID } β†’ Salary.

Normalisation for RDBMS Normalization is the process of organizing the data in the database . More specifically , It is used to minimize the redundancy from a relation or set of relations . It divides the larger table into smaller and links them using relationships . The main goal for normalizing the relations is removing INSERTION, DELETION and UPDATE anomalies. Failure to eliminate anomalies leads to data redundancy and can cause data integrity and other problems as the database grows. Normalization consists of a series of guidelines that helps to guide in creating a good database structure . Normalization works through a series of stages called Normal forms . The normal forms apply to individual relations. The relation is said to be in particular normal form if it satisfies constraints. Key Goals of Normalization Eliminate Redundancy: Avoid storing the same data in multiple places, which reduces storage costs and keeps data consistent . Prevent Anomalies: Ensure that insertions, deletions, and updates can occur without causing inconsistency . Improve Data Integrity: Enforce consistency through dependencies and relationships between tables . Optimize Query Performance: A well-structured database is easier to query and maintain over time.

A quick Wrap up on Keys … Key A Key is an attribute (or a combination of attributes) used to uniquely identify a row in a table . 2. Super Key A Superkey is any combination of attributes that can uniquely identify a row in a table Superkeys include not only the primary key but also any additional attributes that can still uniquely identify each row . Example : In a table with attributes Emp_ID , Emp_Name , and Dept_ID , { Emp_ID } is a superkey , and so are { Emp_ID , Emp_Name } and { Emp_ID , Dept_ID }. 3. Candidate Key A Candidate Key is a minimal superkey β€”meaning it’s a superkey with no unnecessary attributes . Candidate keys are potential choices for the primary key, as each candidate key can uniquely identify rows in a table . A table can have multiple candidate keys, and one of them is chosen the primary key Example : In a table with Emp_ID and SSN, both could be candidate keys if each is unique for each row. Every Primary Key is a Candidate Key, but not all candidate keys are chosen as the primary key . Every Candidate Key is a Superkey , but not all superkeys are candidate keys (since superkeys can have unnecessary attributes).

Normal Forms The normalization procedure provides database designers with series of tests to certify whether a relation satisfies a certain normal form . It is a process which proceeds in a top-down fashion by evaluating each relation against the criteria for normal forms and decomposing relations as necessary. Simply put, it can be considered as a β€œ filtering ” or β€œ purification ” process to make the design have successively better quality . Unsatisfactory relation schemas that do not meet certain conditionsβ€”the normal form tests β€”are decomposed into smaller relation schemas that meet the tests and hence possess the desirable properties . The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized . Codd first proposed in 1972 ,three normal forms 1NF , 2NF , 3NF(BCNF ) . Later, a fourth normal form (4NF) and a fifth normal form (5NF) were proposed, based on the concepts of multivalued dependencies and join dependencies The database designers need not normalize a relation to the highest possible normal form. Relations may be left in a lower normalization status , such as 2NF, if performance is achieved. 1. First Normal Form A relation is said to be in 1NF if it contains only atomic (indivisible) values, and each column contains only one value per row . Thus , First normal form (1NF ) is actually considered to be part of the formal definition of a relation

Normal Forms (contd) Historically , it was defined to disallow multivalued attributes, composite attributes, and their combinations . It states that the domain of an attribute must include only atomic (simple, indivisible) values and that the value of any attribute in a tuple must be a single value from the domain of that attribute. Hence, 1NF disallows having a set of values, a tuple of values, or a combination of both as an attribute value for a single tuple. Rules : All values in each column must be atomic, meaning no repeating groups or arrays . Each column must contain only one type of data . Each row must be unique.

Normal Forms (contd) 2. Second Normal Form Second normal form (2NF) is based on the concept of full functional dependency . A relation is said to be in 2NF, if it is in 1NF and every non-key attribute is fully functionally dependent on the primary key. A functional dependency X β†’ Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more. Refer slide #10. Rules The table must be in 1NF . Every non-key attributes is fully depend on the Primary Key i.e Partial dependency should NOT exist . In the below relation , FD2 & FD3 are partial hence this relation is not 2NF. The functional dependencies FD2 and FD3 make Ename, Pname, and Plocation partially dependent on the primary key {Ssn, Pnumber} of EMP_PROJ, thus violating the 2NF test.

Normal Forms (contd) To make this relation 2NF we can split the table into three tables , EP1 with ( Ssn,Pnumber , Hours ) , EP2 with ( Ssn,Ename ) and EP3 with (Pnumber , Pname ,Plocation) to eliminate partial dependencies. 3. Third Normal Form Third normal form (3NF) is based on the concept of transitive dependency . A relation is in 3NF if it is in 2NF and there are NO transitive dependencies. A functional dependency X β†’ Z in a relation schema R is a transitive dependency if both X β†’ Y and Y β†’ Z hold . The only restriction is that the attributes Z in R is neither a candidate key nor a subset of any key of R . Refer slide#12 Rules: The table must be in 2NF . No transitive dependency should exist, which means that non-key attributes should not depend on other non-key attributes.

Normal Forms (contd) In the EMP_DEPT relation below , the dependency Ssn→ Dmgr_ssn is transitive through Dnumber because both the dependencies Ssn → Dnumber and Dnumber → Dmgr_ssn hold and Dnumber is neither a key itself nor a subset of the key of EMP_DEPT . Hence , EMP_DEPT is not in 3NF because of the transitive dependency of Dmgr_ssn (and also Dname) on Ssn via Dnumber . We can normalize EMP_DEPT by decomposing it into the two 3NF relation schemas ED1 and ED2 as shown in fig below. shown in Figure

Normal Forms (contd) 1NF, 2NF ,3NF Recap :- In general, we want to design relation schemas so that they have neither partial nor transitive dependencies because these types of dependencies cause the update Anomalies. Below table summarise the 1NF,2NF &3NF along with test and procedure of how to reduce them to required normal form.

Normal Forms (contd) 4 . Boyce- Codd Normal Form BCNF is a stricter form of 3NF. A relation is in BCNF if it is in 3NF and, for every functional dependency X β†’ Y, X is a superkey . A relation might be in 3NF but not necessarily in BCNF : This occurs when there are dependencies involving non- superkey attributes, which BCNF does not allow. This is explained below: Consider the COMPANY database with attributes { Project_ID , Project_Manager , Emp_ID , Emp_Name } Lets define some functional dependencies as below : Project_ID β†’ Project_Manager (Each project has a single manager.) Emp_ID β†’ Emp_Name (Each employee has a unique name.) Project_ID , Emp_ID β†’ Emp_Name (An employee can work on multiple projects, so Project_ID and Emp_ID together determine Emp_Name .) 3NF Check To determine if this relation is in 3NF we have rules as per slide# 19 i.e Relation should be in 2NF . This means there is No partial dependencies No transitive dependencies for non-key attributes . The table meets the requirements for 3NF because: Every non-key attribute depends directly on a candidate key. There are no transitive dependencies from non-prime attributes.

Normal Forms (contd) BCNF Check To determine if this relation is in BCNF every determinant must be a superkey . Let’s see if that’s the case here. Project_ID β†’ Project_Manager : In this dependency, Project_ID is not a superkey for the entire table (because it does not uniquely determine Emp_ID and Emp_Name ). Thus, this dependency violates BCNF . Emp_ID β†’ Emp_Name : This dependency doesn’t violate BCNF since Emp_ID uniquely determines Emp_Name . Project_ID , Emp_ID β†’ Emp_Name : This composite key dependency does not violate BCNF since Project_ID and Emp_ID together form a superkey . Thus , Since Project_ID β†’ Project_Manager does not meet BCNF requirements (because Project_ID is not a superkey ), this table is in 3NF but not in BCNF . To bring this relation into BCNF, we need to decompose it into separate tables based on dependencies . PROJECT { Project_ID , Project_Manager } EMPLOYEE_PROJECT { Project_ID , Emp_ID , Emp_Name }

After decomposition, each table now satisfies BCNF : In the PROJECT table, Project_ID is a superkey for determining Project_Manager . In the EMPLOYEE_PROJECT table, each functional dependency either has Emp_ID as a key (for Emp_Name ) or Project_ID , Emp_ID together as a composite key. Normal Forms (contd) This example illustrates that : The original table was in 3NF but not in BCNF due to a dependency where the determinant ( Project_ID ) was not a superkey . Decomposition was required to bring the table into BCNF . 3NF is often sufficient for practical applications, but BCNF eliminates further anomalies by enforcing stricter rules, especially when there are dependencies involving non- superkey attributes. Note: You can answer following two questions based on above understanding What is BCNF Show with example a relation that is in 3NF but not in BCNF. How will you convert the relation to BCNF .

Normal Forms (contd) 4 . Fourth Normal Form 4NF addresses a specific type of dependency called a multi-valued dependency (MVD). A relation is in 4NF if : It is already in Boyce- Codd Normal Form (BCNF ). It has no multi-valued dependencies other than a trivial one. 1NF , 2NF , 3NF and BCNF depends on the concept of functional dependency, which is by far the most important type of dependency in relational database design theory. However, in many cases relations have constraints that cannot be specified as functional dependencies . A M ultivalued D ependency (MVD) occurs when, in a relation, two attributes are independent of each other but both depend on a third attribute . 4NF ( fourth normal form) , is based on this type of dependency. Consider a relation 𝑅 containing information about a student's hobbies and the courses they are taking : Here , Priya has multiple hobbies: Singing and Painting . Priya is enrolled in multiple courses: Mathematics and Physics. For this relation : Hobby is independent of Course for a given student . But both dependent on Student Student β†’β†’Hobby , Studentβ†’β†’ Course This constraint is Multivalued Dependency .

Normal Forms (contd) To keep the relation state consistent, and to avoid any spurious relationship between the two independent attributes , we split the EMP table to two relations as below. 5. Fifth Normal Form A relation R is in 5NF if and only if every join dependency in R is implied by the candidate keys of R. What this actually means is a relation decomposed into two relations must have loss-less join Property, which ensures that no spurious or extra tuples are generated, when relations are reunited through a natural join based on candidate Key . Rules It is in 4NF . It cannot be decomposed into smaller relations without losing information, meaning that all join dependencies are preserved.

Normal Forms (contd) In many relations , A join dependency exists when a relation can be decomposed into multiple smaller relations that can be joined back to get the original relation without any loss of information . In simpler terms, 5NF ensures that complex relationships among data are structured in a way that does not introduce redundancy if they are joined using candidate keys.

Lossless Join and Dependency Preserving Decomposition When designing a relational database, it is often necessary to decompose large tables into smaller ones to remove redundancy and anomalies (insertion, deletion, and update). However, decomposition should meet two important criteria to ensure data integrity and usability : Lossless Join: Ensures no data is lost when decomposed tables are joined back together . Dependency Preservation: Ensures all functional dependencies in the original table are maintained in the decomposed tables. Lossless Join Decomposition A decomposition is said to be lossless if, after decomposing a relation 𝑅 into smaller relations 𝑅 1 and 𝑅 2, the original relation 𝑅 can be reconstructed by performing a natural join on 𝑅 1​ and 𝑅 2 without losing any data or introducing spurious tuples . More formally , A decomposition of a relation R into R1​ and R2​ is lossless if: R=R1β‹ˆ R2. This means the result of the natural join of R1​ and R2​ contains exactly the tuples in the original relation R . The decomposition is lossless if at least one of the following holds: Key condition for Lossless Join are : The intersection of 𝑅 1​ and 𝑅 2​ contains a candidate key of either 𝑅 1​ or 𝑅 2​ . The intersection of 𝑅1 ​ and 𝑅2 can functionally determine all attributes in one of the decomposed tables.

Lossless Join Decomposition (contd) Consider the COMPANY database with attributes from slide#22 { Project_ID , Project_Manager , Emp_ID , Emp_Name } We decompose this relation into two smaller relations : PROJECT { Project_ID , Project_Manager } EMPLOYEE_PROJECT { Project_ID , Emp_ID , Emp_Name } To check if this decomposition is lossless, we can perform a natural join on PROJECT and EMPLOYEE_PROJECT PROJECT β‹ˆ EMPLOYEE_PROJECT The result matches the original table . Hence, this decomposition is lossless. Dependency Preserving Decomposition A decomposition is said to be dependency preserving if all functional dependencies from the original relation 𝑅 are maintained in the decomposed relations 𝑅1,𝑅2,…, 𝑅𝑛. Formally , A decomposition of a relation 𝑅 into 𝑅 1,…,𝑅𝑛 is dependency preserving if the union of functional dependencies preserved in 𝑅 1,…,𝑅𝑛​ is equivalent to the functional dependencies in 𝑅.

Dependency Preserving Decomposition 𝐹 β€² = 𝐹′ Where: 𝐹: The original set of functional dependencies in 𝑅. 𝐹′ : The union of functional dependencies preserved in 𝑅 1,…,𝑅𝑛 . Consider the previous example from COMPANY database. { Project_ID , Project_Manager , Emp_ID , Emp_Name } In this relation we can see following dependencies. Project_ID β†’ Project_Manager ( Each project has a unique manager .) Emp_ID β†’ Emp_Name ( Each employee has a unique name .) Project_ID , Emp_ID β†’ Emp_Name ( For a given project, an employee’s ID determines their name.) We decompose this table into two smaller tables:

Dependency Preserving Decomposition (contd) We can now check Dependency Preservation Project_ID β†’ Project_Manager : In the PROJECT table, Project_Manager can be uniquely identified by Project_ID . Dependency Project_ID β†’ Project_Manager is preserved . 2 . Emp_ID β†’ Emp_Name : In the EMPLOYEE_ASSIGNMENT table, Emp_ID uniquely determines Emp_Name . Dependency Emp_ID β†’ Emp_Name is preserved . 3 . Project_ID , Emp_ID β†’ Emp_Name : In the EMPLOYEE_ASSIGNMENT table, the combination of Project_ID and Emp_ID uniquely determines Emp_Name . Dependency Project_ID , Emp_ID β†’ Emp_Name is preserved. The decomposition into PROJECT and EMPLOYEE_ASSIGNMENT preserves all the functional dependencies from the original relation. Hence, this decomposition is dependency preserving.

Multivalued Dependency in RDBMS A Multivalued Dependency occurs when two distinct attributes in a table are independent of each other but depend on a third attribute. Thus a multivalued dependency always requires at least three attributes. Formally , Given a relation R, a multi-valued dependency Xβ†’β†’ Y exists if, for each value of X, a set of values for Y is associated independently of the values of any other attributes (denoted as Z) in the relation . Eg . Consider a relation 𝑅 containing information about a student's hobbies and the courses they are taking : Here , Priya has multiple hobbies: Singing and Painting . Priya is enrolled in multiple courses: Mathematics and Physics. For this relation : Hobby is independent of Course for a given student . But both dependent on Student Student β†’β†’Hobby , Studentβ†’β†’ Course This leads to data redundancy, as for each hobby, all course combinations are repeated . Thus, a multivalued dependency would prevent the 4NF.

Multivalued Dependency in DBMS (contd) To achieve 4NF, decompose 𝑅 into two relations as was done on slide #: 26 This decomposition removes redundancy while preserving the multi-valued dependencies.

Join Dependency in DBMS Join decomposition is a further generalization of Multivalued dependencies . A relation R is said to have a join dependency if and only if it can be decomposed into multiple relations R1, R2, ..., Rn such that the join of these relations results in the original relation R , without any data redundancy or information loss. Formally, If the join of R1 and R2 over C is equal to relation R, then we can say that a join dependency (JD) exists. Where R1 and R2 are the decompositions R1(A, B, C) and R2(C, D) of a given relations R (A, B, C, D). Join Dependency Ensures that a relation is decomposed in a way that eliminates redundancy . Prevents anomalies like update, delete, or insert anomalies . Ensures that data is stored in its minimal form. Consider a relation R(Student , Subject, Professor ): Problem: The same Subject-Professor pair is repeated for multiple students. This redundancy can lead to update anomalies . For example, if a professor for a subject changes, multiple rows will need to be updated .

Join Dependency in DBMS (contd) We can decompose this relation R(Student, Subject, Professor) into smaller relations: The original relation R(Student, Subject, Professor) can be reconstructed by performing a natural join on R1(Student, Subject) and R2(Subject, Professor ) 𝑅= R1 β‹ˆR2 Thus this decomposition ensures that any changes in the Subject-Professor assignment do not require updating multiple rows and it can be seen that 𝑅= R1β‹ˆR2 Shall recover the original relation R. Hence Join Dependency ensures lossless decomposition.

Thanks
Tags