Informal Design Guidelines for Relation Schemas.pptx
780 views
34 slides
Aug 31, 2024
Slide 1 of 34
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
About This Presentation
Informal Design Guidelines for Relation Schemas
Size: 1003.84 KB
Language: en
Added: Aug 31, 2024
Slides: 34 pages
Slide Content
Informal Design Guidelines for Relation Schemas Dr. Gowthami V
F our informal guidelines that may be used as measures to determine the quality of relation schema design: Making sure that the semantics of the attributes is clear in the schema Minimizing the redundant information in tuples Minimizing the NULL values in tuples Disallowing the possibility of generating spurious tuples
1. Imparting Clear Semantics to Attributes in Relations Whenever we group attributes to form a relation schema, we assume that attributes belonging to one relation have certain real-world meaning and a proper interpretation associated with them. The semantics of a relation refers to its meaning resulting from the interpretation of attribute values in a tuple.
Guideline 1 Design a relation schema so that it is easy to explain its meaning. Do not combine attributes from multiple entity types and relationship types into a single relation. Intuitively, if a relation schema corresponds to one entity type or one relationship type, it is straightforward to interpret and to explain its meaning. Otherwise, if the relation corresponds to a mixture of multiple entities and relationships, semantic ambiguities will result and the relation cannot be easily explained.
each tuple relates an employee to a project but also includes the employee name ( Ename ), project name ( Pname ), and project location ( Plocation ). Although there is nothing wrong logically with these two relations, they violate Guideline 1 by mixing attributes from distinct real-world entities: EMP_DEPT mixes attributes of employees and departments, and EMP_PROJ mixes attributes of employees and projects and the WORKS_ON relationship. Hence, they fare poorly against the above measure of design quality.
2. Redundant Information in Tuples and Update Anomalies Design the base relation schemas so that no insertion, deletion, or modification anomalies are present in the relations. If any anomalies are present, note them clearly and make sure that the programs that update the database will operate correctly.
In database design, redundant information refers to the unnecessary repetition of data within a database. Redundancy can lead to several problems, often referred to as anomalies: Update Anomalies: If redundant data exists in multiple places, updating the data in one place but not in others can lead to inconsistencies. For example, if a customer's address is stored in several tables and the customer moves, failing to update all instances of the address can result in having multiple, conflicting addresses for the same customer.
Insertion Anomalies : These occur when you cannot add data to the database due to the absence of other data. For instance, if a new student must be added to a database but there is no course record yet, and if the database design requires a course record for each student, then the student cannot be added without also creating a placeholder for the course.
Deletion Anomalies : These occur when deleting some data inadvertently results in losing other valuable data. For example, if a student record is deleted, and this record also stores unique information about a course (and if this information is only stored in the student record), then deleting the student record would result in losing the course information as well.
Why Avoid Redundant Information? Consistency is maintained: Changes to data only need to be made in one place, preventing conflicting data entries. Data Integrity is preserved: The database accurately reflects real-world entities and their relationships without inconsistencies. Efficiency is improved: Reducing the size of the database by eliminating duplicate entries saves storage space and makes data management more straightforward.
There are three main types of anomalies: Update Anomalies: Definition: Occur when data is duplicated in multiple places, and a change to the data in one place does not automatically propagate to all instances of that data. Example: Suppose a customer's phone number is stored in multiple rows of a table because the customer has multiple orders. If the customer changes their phone number and the update is made in only one row, the other rows will have outdated information. This inconsistency can cause confusion and errors.
Insertion Anomalies: Definition: Occur when certain data cannot be added to the database without the presence of other data. Example: Consider a table that stores both student and course information. If the database design requires that each student must be associated with a course, then it becomes impossible to add a new student who hasn't yet enrolled in any course. This restriction is due to the poor structure of the table, where unrelated data is combined in a single table.
Deletion Anomalies: Definition: Occur when the deletion of some data inadvertently causes the loss of other data. Example: Suppose a database table stores both employee information and department details. If the last employee in a department is deleted from the table, the department information might also be lost, even though the department still exists independently of its employees. This happens because the department information is tied directly to the employee record, and there’s no separate record for departments.
3. NULL Values in Tuples In relational databases, a null value represents the absence of a value or that the value is unknown. While null values can be useful in certain situations, overusing them or allowing them to be scattered throughout the database can lead to various problems, such as complications in query results, misunderstandings in data interpretation, and difficulties in maintaining data integrity.
Guideline 3: Minimize the Use of Null Values Clarity and Meaning: Null values can make it unclear what data is missing. For instance, if an employee's " end_date " in a database is null, it could mean that the employee is still employed, the end date is not yet determined, or that it is simply unknown whether the employee has left. Minimize nulls to ensure that the meaning of each data entry is clear and unambiguous.
Data Integrity and Consistency: Having too many nulls in a database can lead to inconsistent data states. For example, if a table recording sales transactions has several null values in the "amount" column, it could lead to confusion about whether the sale amounts were missed, not applicable, or if there was a mistake in data entry. Consistent use of nulls ensures that the database integrity is maintained, making it easier to understand and analyze the data.
Complications in Queries: Null values can complicate SQL queries, particularly when performing aggregations or comparisons. Functions like SUM() or AVG() might ignore nulls, leading to results that do not accurately reflect the data's true state. Queries must be carefully written to handle nulls appropriately, often requiring additional conditions (IS NULL, IS NOT NULL), which can complicate query logic and reduce performance.
Potential for Anomalies: Null values can create anomalies, especially in operations like updates, inserts, and deletes. For example, if a database enforces certain constraints (like foreign keys), a null value might violate these constraints, leading to data anomalies and integrity issues.
Strategies to Minimize Null Values Use Default Values: Instead of using nulls, consider using default values that make sense within the context. For example, using "Unknown" or "Not Applicable" as a default for textual fields can help avoid nulls. Break Down Tables: Normalize the database to separate optional data into different tables. For instance, optional fields like a second phone number can be moved to a separate table that links back to the main table.
Use Proper Data Types and Constraints: Define columns with NOT NULL constraints where possible to enforce that values must be provided. This practice ensures that no nulls are entered accidentally. Evaluate the Necessity of Nulls: Sometimes nulls are necessary, such as in cases where a value truly is unknown or inapplicable. However, they should be used judiciously and only when there is a strong justification.
4. Generation of Spurious Tuples In the context of relational database design, spurious tuples are unintended and incorrect rows that can appear when joining tables that are not properly normalized or when the joins are based on incorrect or inadequate conditions. These tuples do not represent real-world entities or relationships and can lead to inaccurate query results and misleading data interpretations.
Guideline 4: Understanding Spurious Tuples: Spurious tuples are the result of a poor database schema design, especially when the schema allows incorrect join operations to produce meaningless or incorrect data. This often happens when tables are joined on attributes that do not uniquely identify rows in each table, causing a Cartesian product-like result rather than a meaningful combination of related data.
Causes of Spurious Tuples: Incorrect Joins: Using non-key attributes (i.e., attributes that are not primary or foreign keys) to join tables can lead to spurious tuples. For example, if two tables are joined on a common attribute that does not uniquely identify records, extra rows can appear in the result set. Redundant or Overlapping Data: When tables store overlapping or redundant data, improper normalization can lead to situations where the join conditions accidentally match unrelated rows, producing spurious tuples. Improper Decomposition: When a table is decomposed (split into two or more tables) in such a way that it loses important information about how to correctly reassemble (join) them, spurious tuples can be generated. This is often due to the loss of functional dependencies that ensure the correct relationships between data.
Impact of Spurious Tuples: Data Integrity Issues: Spurious tuples compromise data integrity, as they do not accurately represent the underlying real-world data or relationships. This can lead to incorrect conclusions, faulty reports, and errors in data analysis. Misleading Information: Spurious tuples can produce misleading results in queries, especially in aggregations or when attempting to retrieve meaningful patterns from the data.
Strategies to Avoid Spurious Tuples Normalize the Database: Proper normalization (up to at least the Third Normal Form or Boyce-Codd Normal Form) helps ensure that each table contains only related attributes and that each attribute depends on the primary key. This process reduces redundancy and minimizes the chances of generating spurious tuples during joins.
Use Proper Join Conditions: Always use primary keys and foreign keys as join conditions. These keys uniquely identify records in their respective tables and ensure that only related records are joined together, preventing spurious tuples. Avoid using non-key attributes in join conditions unless there is a specific, logical reason that these attributes are meant to join.
Maintain Referential Integrity : Ensure that foreign key constraints are properly defined and enforced. Referential integrity guarantees that relationships between tables are consistent and that every foreign key corresponds to a primary key in another table.
Careful Decomposition : When decomposing tables, ensure that the decomposition does not lead to loss of important information or functional dependencies. A well-decomposed schema should allow tables to be joined without generating spurious tuples, preserving the original data’s integrity and relationships.
Check Join Operations : Regularly review and test join operations to ensure that they produce the expected results without generating spurious tuples. Use sample data and validate results to identify and correct any issues early.