Fundamentals of Database System(CoSc2041) - Chapter 3.pptx

ssusere261f6 5 views 104 slides Oct 17, 2025
Slide 1
Slide 1 of 104
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47
Slide 48
48
Slide 49
49
Slide 50
50
Slide 51
51
Slide 52
52
Slide 53
53
Slide 54
54
Slide 55
55
Slide 56
56
Slide 57
57
Slide 58
58
Slide 59
59
Slide 60
60
Slide 61
61
Slide 62
62
Slide 63
63
Slide 64
64
Slide 65
65
Slide 66
66
Slide 67
67
Slide 68
68
Slide 69
69
Slide 70
70
Slide 71
71
Slide 72
72
Slide 73
73
Slide 74
74
Slide 75
75
Slide 76
76
Slide 77
77
Slide 78
78
Slide 79
79
Slide 80
80
Slide 81
81
Slide 82
82
Slide 83
83
Slide 84
84
Slide 85
85
Slide 86
86
Slide 87
87
Slide 88
88
Slide 89
89
Slide 90
90
Slide 91
91
Slide 92
92
Slide 93
93
Slide 94
94
Slide 95
95
Slide 96
96
Slide 97
97
Slide 98
98
Slide 99
99
Slide 100
100
Slide 101
101
Slide 102
102
Slide 103
103
Slide 104
104

About This Presentation

Fundamentals of Database System


Slide Content

Chapter 3 Database Modeling

Outline O verview of D atabase D esign P rocess E xample D atabase A pplication ( COMPANY) ER M odel C oncepts Entities and Attributes Entity Types, Value Sets, and Key Attributes Relationships and Relationship Types Weak Entity Types Roles and Attributes in Relationship Types T he ER D iagrams. M apping ER - M odels to R elational T ables 1

O verview of D atabase D esign P rocess There are several steps in designing a database system. The major steps in database or any system design are; Planning: that is identifying information gap in an organization and propose a database solution to solve the problem. Analysis : that concentrates more on fact finding about the problem or the opportunity. Feasibility analysis, requirement determination and structuring, and selection of best design method are also performed at this phase. Design : in database development more emphasis is given to this phase. The phase is further divided into three sub-phases . Implementation: the testing and deployment of the designed database for use. Operation and Support: administering and maintaining the operation of the database system and providing support to users. Tuning the database operations for best performance. 2

O verview of D atabase D esign P rocess Steps for database design process User requirement collection and analysis: the database designers interview prospective database users to understand and document their data requirements. Ensuring that a database system effectively meets the needs and expectations of its stakeholders . 2) Conceptual design: Provides an abstract view of the data. It focuses on: what data is needed. how the data elements relate to one another without delving into the implementation details . 3

O verview of D atabase D esign P rocess Steps for database design process 2 ) Conceptual design: concise description of the data, data type, relationship between data and constraints on the data. It revolves around discovering and analyzing organizational and user data requirements. The important activities are to identify:- Entities Attributes Relationships Constraints and based on these identified components then develop the ER model using ER diagrams 4

3) Logical design Define detail view of data structure. Specifies data type for attribute, table for each entity type . a higher level conceptual abstraction with selected specific data model to implement the data structure 4) Schema design: represent the entire description/skeleton of the database using schema diagram. 5) Physical design: Define how the data can be stored and accessed. i.e. file organization and access path to the data . physical implementation of the logical design of the database with respect to internal storage and file structure of the database for the selected DBMS. 5

O verview of D atabase D esign P rocess 6

Example COMPANY Database Based on the following requirements design COMPANY Database: The company is organized into DEPARTMENTs . Each department has a name , number and an employee who manages the department. We keep track of the start date of the department manager . A department may have several locations . Each department controls a number of PROJECTs . Each project has a unique name , unique number and is located at a single location . 8

Cont… We store each EMPLOYEE’s social security number, address, salary, sex, and birthdate. Each employee works for one department but may work on several projects. We keep track of the number of hours per week that an employee currently works on each project. We also keep track of the direct supervisor of each employee. Each employee may have a number of DEPENDENTs. For each dependent, we keep track of their name, sex, birthdate, and relationship to the employee. 9

Entity Relationship(ER) model High-level conceptual framework used to represent and design a database structure in a systematic way. Components of Entity-Relational (RE) Model Entities Attributes Relationships Relational constraints Represented with entity relationship diagram 10

Entities and Attributes Entities are specific objects or things in the mini-world that are represented in the database. A thing or object in the real world with an independent existence. Exist physical (person, car, house, or employee) or conceptual (a company, a job, or a university course)). Must have descriptive information. Attributes are properties used to describe an entity. For example, an EMPLOYEE entity may be described by the employee’s name, age, address, salary, and job 11

Cont... A specific entity will have a value for each of its attributes . For example a specific employee entity may have Name='John Smith', SSN='123456789', Address ='731, Fondren , Houston, TX', Sex='M', BirthDate ='09-JAN-55‘ Each attribute has a value set (or data type ) associated with it e.g. integer, string, subrange, enumerated type, … The attribute values that describe each entity become a major part of the data stored in the database. 12

Types of Attributes 1) Composite versus Simple (Atomic) Attributes Composite Divided into smaller subparts, which represent more basic attributes with independent meanings. For example: Address(Street, City, State, ZipCode , Country) Name( FirstName , MiddleName , LastName ). Composition may form a hierarchy where some components are themselves composite. Simple Attributes that are not divisible . Each entity has a single atomic value for the attribute . For example, SSN or Sex. 13

Example of composite attribute 2) Single-Valued versus Multivalued Attributes Single-Valued : Attributes that have a single value for a particular entity. For example, Age is a single-valued attribute of a person. Multivalued : Entity may have multiple values for that attribute. For example, Color of a CAR or Previous Degrees of a STUDENT. 14

3) Stored versus Derived Attributes Derived: Some attribute values can be derived from related entities. For example: the value of Age can be determined from the current (today’s) date and the value of that person’s Birth_date . So, age called derived attributes Stored: Attributes that are physically stored in the database and used to derive other attribute. For example: Birth_date that directly stored in the database. 15

Complex Attributes: Composite and multi-valued attributes may be nested arbitrarily to any number of levels, although this is rare . Components of a composite attribute between parentheses ( ) and separating the components with commas, and by displaying multivalued attributes between braces { }. For example; { Address_phone ( { Phone ( Area_code , Phone_number ) } , Address ( Street_address ( Number , Street,Apartment_number ) , City , State,Zip ) ) } 16

  It does not mean zero or an empty string. Not applicable A particular entity may not have an applicable value for an attribute. For example, a CollegeDegrees attribute applies only to persons with college degrees. For such situations, a special value called null is created. Unknown The value exist but we do not know the value or missing value. Not known whether the attribute value exists. NULL Values 17

Entity Types A collection (or set) of entities that have the same attributes . For example, the entity type STUDENT and PROJECT . Entity sets or Entity collection The collection of all entities of a particular entity type in the database at any point in time. 18

Key or uniqueness constraint attribute An entity type usually has one or more attributes whose values are distinct for each individual entity in the entity set. Its values can be used to identify each entity uniquely. For example, SSN of EMPLOYEE . A key attribute may be composite. e.g , VehicleTagNumber is a key of the CAR entity type with components (Number, State). An entity type may have more than one key . The CAR entity type may have two keys: VehicleIdentificationNumber (popularly called VIN) VehicleTagNumber (Number, State), aka license plate number . 19

Cont… Each key is underlined. It is not the property of a particular entity set. key attributes should be defined as non- nullable or not be null . Value Sets (Domains) of Attributes: the set of values that may be assigned to that attribute for each individual entity. e.g.; if the range of ages allowed for employees is between 16 and 70, we can specify the value set of the Age attribute of EMPLOYEE to be the set of integer numbers between 16 and 70 Not typically displayed in basic ER diagrams and similar to data types. 20

Initial Design of Entity Types for the COMPANY Database Based on the requirements, we can identify four initial entity types in the COMPANY database: DEPARTMENT PROJECT EMPLOYEE DEPENDENT The attributes shown are derived from the requirements description. Weak entity type is an entity that does not have a key attribute , in contrast called strong entity . 21

22

Refining the initial design by introducing relationships The initial design is typically not complete . Some aspects in the requirements will be represented as relationships ER model has three main concepts: Entities (and their entity types and entity sets) Attributes (simple, composite, multivalued) Relationships (and their relationship types and relationship sets) 23

A relationship relates two or more distinct entities with a specific meaning. E .g., EMPLOYEE John Smith works on the ProductX PROJECT, or EMPLOYEE Franklin Wong manages the Research DEPARTMENT. Relationships of the same type are grouped or typed into a relationship type . E.g., the WORKS_ON relationship type in which EMPLOYEEs and PROJECTs participate, or the MANAGES relationship type in which EMPLOYEEs and DEPARTMENTs participate. Relationships and Relationship Types 24

Degree of Relationship The degree of a relationship type is the number of participating entity types. Both MANAGES and WORKS_ON are binary relationships. 25

Relationship type between EMPLOYEE and PROJECT Relationship type of the M:N WORKS_ON 26

Relationship Type: Identifies the relationship name and the participating entity types Also identifies certain relationship constraints. Relationship Set: A collection of actual relationships that exist in the database at a certain point in time. In ER diagrams: Diamond-shaped box is used to display a relationship type. Connected to the participating entity types via straight lines. Relationship type vs. relationship set 27

Refining the COMPANY database by introducing relationships By examining six relationship types are identified All are binary relationships ( degree 2) Listed below with their participating entity types: WORKS_FOR (between EMPLOYEE, DEPARTMENT) MANAGES (also between EMPLOYEE, DEPARTMENT) CONTROLS (between DEPARTMENT, PROJECT) WORKS_ON (between EMPLOYEE, PROJECT) SUPERVISION (between EMPLOYEE ( as subordinate), EMPLOYEE (as supervisor )) DEPENDENTS_OF (between EMPLOYEE, DEPENDENT) 28

ER DIAGRAM 29

points on Relationship Types In the refined design, some attributes from the initial entity types are refined into relationships : Manager of DEPARTMENT -> MANAGES Works_on of EMPLOYEE -> WORKS_ON Department of EMPLOYEE -> WORKS_FOR In general, more than one relationship type can exist between the same participating entity types MANAGES and WORKS_FOR are distinct relationship types between EMPLOYEE and DEPARTMENT Different meanings and different relationship instances. 30

Roles name and recursive relationship Signifies the role that a participating entity from the entity type plays in each relationship instance. For example, in the WORKS_FOR relationship type, EMPLOYEE plays the role of employee or worker and DEPARTMENT plays the role of department or employer. Role names are unnecessary in relationship types with distinct participating entities, as each entity type name can serve as the role name. Recursive Relationship: An relationship type whose with the same participating entity type in distinct roles. 31

EMPLOYEE participates twice in two distinct roles: Supervisor (or boss) role. Supervisee (or subordinate) role. The lines marked ‘1’ represent the supervisor role, and those marked ‘2’ represent the supervisee role. In ER diagram, need to display role names to distinguish participations. 32

Constraints on Relationship Two main types of binary relationship constraints : cardinality ratio and participation . 1) Cardinality Ratios: specifies the maximum number of relationship instances that an entity can participate in. For example, in the WORKS_FOR binary relationship type, DEPARTMENT:EMPLOYEE is of cardinality ratio 1:N, meaning that each department can be related to (that is, employs) any number of employees (N), but an employee can be related to (work for) at most one department. 33

Cont… The possible cardinality ratios for binary relationship types are; One-to-one (1:1) One-to-many (1:N) or Many-to-one (N:1) Many-to-many (M:N) Cardinality ratios for binary relationships are represented on ER diagrams by displaying 1, M, and N on the diamonds. 2) Existence Dependency Constraint (specifies minimum participation) (also called participation constraint): specifies the minimum participation of an entity in a relationship. There are two types of participation constraints; total and partial . 34

Cont… For example; If a company policy states that every employee must work for a department, then an employee entity can exist only if it participates in at least one WORKS_FOR relationship instance. Thus, the participation of EMPLOYEE in WORKS_FOR is called total participation . Cardinality ratio and participation constraints, taken together, as the structural constraints of a relationship type. In ER diagrams, total participation is shown with a double line connecting the entity to the relationship, while partial participation is represented by a single line. 35

Weak entity vs Strong entity Weak entity type is an entity that does not have a key attribute, in contrast called strong entity . 36

(min, max) notation for relationship structural constraints: Specified on each participation of an entity type E in a relationship type R. Specifies that each entity e in E participates in at least min and at most max relationship instances in R. Default(no constraint): min =0, max=n (signifying no limit). Must have minmax , min0, max 1 Examples: A department has exactly one manager and an employee can manage at most one department . Specify (0,1) for participation of EMPLOYEE in MANAGES 37

Cont… Specify (1,1) for participation of DEPARTMENT in MANAGES An employee can work for exactly one department but a department can have any number of employees. Specify (1,1) for participation of EMPLOYEE in WORKS_FOR Specify (0,n) for participation of DEPARTMENT in WORKS_FOR 38

ER diagram with (Min, Max) 39

Notation for ER diagrams 40

Part Two Enhanced entity relationship model

EER stands for Enhanced ER or Extended ER EER Model Concepts Includes all modeling concepts of basic ER. Additional concepts: Subclasses/ Superclasses Specialization/Generalization Categories (UNION types) Attribute and Relationship inheritance The additional EER concepts are used to model applications more completely and more accurately. EER includes some object-oriented concepts, such as inheritance.

Subclasses and Superclasses An entity type may have additional meaningful subgroupings/subtype/subclasses of its entities Example : EMPLOYEE may be further grouped into: SECRETARY, ENGINEER, TECHNICIAN, … Based on the EMPLOYEE’s Job MANAGER EMPLOYEEs who are managers SALARIED_EMPLOYEE, HOURLY_EMPLOYEE Based on the EMPLOYEE’s method of pay EER diagrams extend ER diagrams to represent these additional subgroupings, called subclasses or subtypes .

Subclasses and Superclasses

Cont… Each of these subgroupings is a subset of EMPLOYEE entities. Each is called a subclass of EMPLOYEE. EMPLOYEE is the superclass for each of these subclasses. These are called superclass/subclass relationships: EMPLOYEE/SECRETARY. EMPLOYEE/TECHNICIAN. EMPLOYEE/MANAGER. …

Cont… These are also called IS-A relationships SECRETARY IS-A EMPLOYEE, TECHNICIAN IS-A EMPLOYEE, …. Note: An entity that is member of a subclass represents the same real-world entity as some member of the superclass: The subclass member is the same entity in a distinct specific role. An entity cannot exist in the database merely by being a member of a subclass; it must also be a member of the superclass. A member of the superclass can be optionally included as a member of any number of its subclasses.

Cont… Examples: A salaried employee who is also an engineer belongs to the two subclasses: ENGINEER , and SALARIED_EMPLOYEE A salaried employee who is also an engineering manager belongs to the three subclasses: MANAGER, ENGINEER , and SALARIED_EMPLOYEE It is not necessary that every entity in a superclass be a member of some subclass.

Attribute Inheritance in Superclass / Subclass Relationships An entity that is member of a subclass inherits All attributes of the entity as a member of the superclass. All relationships of the entity as a member of the superclass. Example : In the previous slide, SECRETARY (as well as TECHNICIAN and ENGINEER) inherit the attributes Name, SSN, …, from EMPLOYEE. Every SECRETARY entity will have values for the inherited attributes.

Specialization Specialization is the process of defining a set of subclasses of a superclass. The set of subclasses is based upon some distinguishing characteristics of the entities in the superclass. Example : { SECRETARY, ENGINEER, TECHNICIAN} is a specialization of EMPLOYEE based upon job type. May have several specializations of the same superclass

Cont… Example: Another specialization of EMPLOYEE based on method of pay is {SALARIED_EMPLOYEE, HOURLY_EMPLOYEE }. Superclass/subclass relationships and specialization can be diagrammatically represented in EER diagrams. Attributes of a subclass are called specific or local attributes. For example, the attribute TypingSpeed of SECRETARY The subclass can also participate in specific relationship types. For example, a relationship BELONGS_TO of HOURLY_EMPLOYEE.

Why including class/subclass relationships and specialization in EER: Certain attributes may apply to some but not all entities of the superclass entity type. For example ; the SECRETARY subclass has the specific attribute Typing_speed , whereas the ENGINEER subclass has the specific attribute Eng_type . Some relationship types may be participated in only by entities that are members of the subclass. For example, if only HOURLY_EMPLOYEES can join a trade union, we can represent this by creating the subclass HOURLY_EMPLOYEE under EMPLOYEE and linking it to the TRADE_UNION entity type through the BELONGS_TO relationship.

Generalization Generalization is the reverse of the specialization process. Several classes with common features are generalized into a superclass; Original classes become its subclasses Example: CAR, TRUCK generalized into VEHICLE; Both CAR , TRUCK become subclasses of the superclass VEHICLE. We can view {CAR, TRUCK} as a specialization of VEHICLE Alternatively , we can view VEHICLE as a generalization of CAR and TRUCK

Cont…

Cont… Data Modeling with Specialization and Generalization Superclass or subclass represents a collection (or set or grouping) of entities. It also represents a particular type of entity. Shown in rectangles in EER diagrams (as are entity types). We can call all entity types (and their corresponding collections) classes , whether they are entity types, superclasses , or subclasses.

Constraints on Specialization and Generalization If we can determine exactly those entities that will become members of each subclass by a condition, the subclasses are called predicate-defined (or condition-defined ) subclasses. Condition is a constraint that determines subclass members. Display a predicate-defined subclass by writing the predicate condition next to the line attaching the subclass to its superclass . If all subclasses in a specialization have membership condition on same attribute of the superclass, specialization is called an attribute-defined specialization

Cont… Attribute is called the defining attribute of the specialization Example : JobType is the defining attribute of the specialization {SECRETARY, TECHNICIAN, ENGINEER} of EMPLOYEE If no condition determines membership, the subclass is called user-defined . Membership in a subclass is determined by the database users by applying an operation to add an entity to the subclass. Membership in the subclass is specified individually for each entity in the superclass by the user.

Displaying an attribute-defined specialization in EER diagrams

Constraints on Specialization and Generalization Two basic constraints can apply to a specialization/generalization Disjointness Constraint . Completeness Constraint. 1) Disjointness Constraint: Specifies that the subclasses of the specialization must be disjoint : An entity can be a member of at most one of the subclasses of the specialization. Specified by d in EER diagram

Cont… If not disjoint, specialization is overlapping That is the same entity may be a member of more than one subclass of the specialization. Specified by o in EER diagram 2) Completeness Constraint: Total specifies that every entity in the superclass must be a member of some subclass in the specialization/generalization Shown in EER diagrams by a double line Partial allows an entity not to belong to any of the subclasses Shown in EER diagrams by a single line

Cont… For example, if every EMPLOYEE must be either an HOURLY_EMPLOYEE or a SALARIED_EMPLOYEE, then the specialization {HOURLY_EMPLOYEE, SALARIED_EMPLOYEE} is a total specialization of EMPLOYEE. For example, if some EMPLOYEE entities do not belong to any of the subclasses {SECRETARY, ENGINEER, TECHNICIAN }; then that specialization is partial specialization.

Cont… Hence, we have four types of specialization/generalization: Disjoint , total Disjoint , partial Overlapping , total Overlapping , partial Note: Generalization usually is total because the superclass is derived from the subclasses.

Example of disjoint total Specialization

Specialization/Generalization Hierarchies, Lattices & Shared Subclasses A subclass may itself have further subclasses specified on it forms a hierarchy or a lattice Hierarchy has a constraint that every subclass has only one superclass (called single inheritance ); this is basically a tree structure In a lattice , a subclass can be subclass of more than one superclass (called multiple inheritance )

Shared Subclass “ Engineering_Manager ” In a lattice or hierarchy, a subclass inherits attributes not only of its direct superclass, but also of all its predecessor superclasses .

Cont … A subclass with more than one superclass is called a shared subclass (multiple inheritance) Can have: Specialization hierarchies or lattices, or Generalization hierarchies or lattices,

Utilizing Specialization and Generalization in Refining Conceptual Schemas In specialization , start with an entity type and then define subclasses of the entity type by successive specialization called a top down conceptual refinement process In generalization , start with many entity types and generalize those that have common properties Called a bottom up conceptual synthesis process In practice, a combination of both processes is usually employed

Categories (UNION TYPES) Represent a collection of entities from different superclasses entity types. A subclass will represent a collection of entities that is a subset of the UNION of entities from distinct entity types. Such a subclass is called a category or UNION TYPE

Cont… Example: In a database for vehicle registration, a vehicle owner can be a PERSON, a BANK (holding a lien on a vehicle) or a COMPANY. A category (UNION type) called OWNER is created to represent a subset of the union of the three superclasses COMPANY, BANK, and PERSON. A category member must exist in at least one of its superclasses Difference from shared subclass , which is a: Subset of the intersection of its superclasses . Shared subclass member must exist in all of its superclasses .

Two categories: OWNER, REGISTERED_VEHICLE

Part Three Relational data model and relational data model constraints

Relational Model Concepts The model was first proposed by Dr. E.F. Codd of IBM Research in 1970 in the following paper: " A Relational Model for Large Shared Data Banks," Communications of the ACM, June 1970 A Relation is a mathematical concept based on the ideas of sets . Informal definition A relation looks like a table of values. A relation typically contains a set of rows. The data elements in each row represent certain facts that correspond to a real-world entity or relationship.

Cont… In the formal model, rows are called tuples . Each column has a column header that gives an indication of the meaning of the data items in that column. In the formal model, the column header is called an attribute name (or just attribute )

Cont… Key of a Relation: Each row has a value of a data item (or set of items) that uniquely identifies that row in the table. Called the key In the STUDENT table, SSN is the key Sometimes row-ids or sequential numbers are assigned as keys to identify the rows in a table Called artificial key or surrogate key.

Formal definition - Schema The Schema (or description) of a Relation: Denoted by R(A1, A2, ..... An) R is the name of the relation The attributes of the relation are A1, A2, ..., An Example: STUDENT (Stud-id , Stud-name , Address, Phone) STUDENT is the relation name. Defined over the four attributes: Stud-id , Stud-name , Address, Phone Each attribute has a domain or a set of valid values. For example, the domain of Stud-id is 6 digit numbers.

Cont… A tuple is an ordered set of values (enclosed in angled brackets ‘< … >’) Each value is derived from an appropriate domain . A row in the STUDENT relation is a 4-column and would consist of four values, for example: <632895, “ Dawit Abebe ”, “101 Main St. Atlanta, GA 30332”, “0916681940”> This is called a 4-column as it has 4 values.

A domain defines either data-type or a format . The Ethiopia_phone_numbers may have a format: ( ddd ) ddd-dddd where each d is a decimal digit. Dates have various formats such as year , month , date formatted as yyyy -mm- dd , or as dd , mm,yyyy etc The attribute name designates(labels) indicates the role played by a domain in a relation: Used to interpret the meaning of the data elements corresponding to that attribute. Example : The domain Date may be used to define two attributes named “ Birth-date ” and “ Payment-date ” with different meanings.

Relation state Data that is stored within a relation (table) at a specific point in time The relation state is a subset of the Cartesian product of the domains of its attributes. Each domain contains the set of all possible values the attribute can take. Example : attribute Stud-name is defined over the domain of character strings of maximum length 25 dom (Stud-name ) is varchar(25) The role these strings play in the STUDENT relation is that of the name of a student .

Summary of formal definitions Formally, Given R(A1, A2, .........., An) r(R )  dom (A1) X dom (A2) X ....X dom(An) R(A1, A2, …, An) is the schema of the relation. R is the name of the relation. A1 , A2, …, An are the attributes of the relation. r(R ): a specific state (or "value" or “population”) of relation R-this is a set of tuples (rows) r(R) = {t1, t2, …, tn } where each ti is an n-tuple. ti = <v1, v2, …, vn > where each vj element-of dom( Aj )

Example Let R(A1, A2) be a relation schema: Let dom(A1) = { 0,1} Let dom(A2) = { a,b,c } Then: dom(A1) X dom(A2) is all possible combinations: {<0,a> , <0,b> , <0,c>, <1,a>, <1,b>, <1,c> } The relation state r(R)  dom(A1) X dom(A2) For example: r(R) could be {<0,a> , <0,b> , <1,c> } This is one possible state (or “population” or “extension”) r of the relation R, defined over A1 and A2. It has three 3-tuples : <0,a> , <0,b> , <1,c>

Definition Summary Informal Terms Formal Terms Table Relation Column Header Attribute All possible Column Values Domain Row Tuple Table Definition Schema of a Relation Populated Table State of the Relation

Characteristics of relations The tuples are not considered to be ordered . Attributes (columns) can accept null values. All values are considered atomic (indivisible). Each value in a tuple must be from the domain of the attribute for that column. Each column in a relation has a defined data type . Each row in a relational table must be unique , This uniqueness is often enforced using a  primary key.

Example: A Relation STUDENT with different order

Relational Integrity Constraints Constraints are conditions that must hold on all valid relation states . Set of rules that ensure consistency , and validity of data in a relational database. There are three main types of constraints in the relational model: Key constraints Entity integrity constraints Referential integrity constraints Another implicit constraint is the domain constraint. Every value in a tuple must be from the domain of its attribute (or it could be null , if allowed for that attribute)

Key Constraints Key : A set of attributes (columns) that defines a uniqueness constraint for rows in a table . Types of Keys : Super Key : Any attribute or combination of columns that can uniquely identify a record, including unnecessary attributes. Example:  student_id + first_name  is a super key but not minimal . 2) Candidate Key : A minimal set of attributes that can uniquely identify a record. No unnecessary columns included. If a super key is having only one attribute, it is automatically a Candidate key Example : Both  student_id  and email can serve as candidate keys.

A candidate key has three properties Uniqueness Irreducibility/ M inimality Not null If a candidate key consists of more than one attribute it is called Composite Key. Primary key: the candidate key that is selected to identify tuples uniquely within the relation. A unique identifier for each row in a table. Cannot contain null values. Example:  student_id  in a students table .

Example: STUDENT Table Student_id First_name Last_name Email 1 John Dawit [email protected] 2 Jane Jalla [email protected] 3 Asrat Addisu [email protected] 4 Dejene Dawit [email protected] 5 Mona Lisa [email protected] 6 John Dawit [email protected]

Cont… In general: Any key is a superkey (but not vice versa) Any set of attributes that includes a key is a superkey A minimal superkey is also a key. If a relation has several candidate keys , one is chosen arbitrarily to be the primary key . The primary key attributes are underlined . Example: Consider the CAR relation schema: CAR(State, Reg #, SerialNo , Make, Model, Year) We chose SerialNo as the primary key

Cont… The primary key value is used to uniquely identify each tuple in a relation Provides the tuple identity Also used to reference the tuple from another tuple General rule: Choose as primary key the smallest of the candidate keys (in terms of size) Not always applicable – choice is sometimes subjective

Relational Database Schema Relational Database Schema: A set S of relation schemas that belong to the same database. S is the name of the whole database schema. S = {R1, R2, ..., Rn}. R1 , R2, …, Rn are the names of the individual relation schemas within the database S Following slide shows a COMPANY database schema with 6 relation schemas.

Company database schema

Entity integrity  E nsures each entity (or record) in a table can be uniquely identified. The primary key attributes PK of each relation schema R in S cannot have null values in any tuple of r(R ). This is because primary key values are used to identify the individual tuples. t[PK ]  null for any tuple t in r(R) If PK has several attributes, null is not allowed in any of these attributes. Note : Other attributes of R may be constrained to disallow null values, even though they are not members of the primary key .

Referential Integrity Department_id Department_name 1 Human Resources 2 Computer science 3 Management Employee_id Name Department_id 101 Ayano 2 102 Endris Null 103 Ahmed 1 Employee Table Department Table   Indicating that he does not belong to any department. Deparment_id is not a part of primary key in Employee table. The reason to Null value for foreign key Foreign key

Referential Integrity A   foreign key  is an attribute or a set of attributes in one table that refers to the primary key in another table. This creates a link between the two tables and enforces referential integrity . Ensures that a foreign key value in one table always points to a valid, existing primary key in another table. This means that if one table references another, the referenced data must exist to maintain the integrity of the relationship . A referential integrity constraint can be displayed in a relational database schema as a directed arc from R1.FK to R2.

Referential Integrity (or foreign key) Constraint Statement of the constraint: The value in the foreign key column (or columns) FK of the the referencing relation R1 can be either : A value of an existing primary key value of a corresponding primary key PK in the referenced relation R2, or A null . In case (2), the FK in R1 should not be a part of its own primary key.

Displaying a relational database schema and its constraints Each relation schema can be displayed as a row of attribute names. The name of the relation is written above the attribute names. The primary key attribute (or attributes) will be underlined. A foreign key (referential integrity) constraints is displayed as a directed arc (arrow) from the foreign key attributes to the referenced table Can also point the primary key of the referenced relation for clarity Next slide shows the COMPANY relational schema diagram

Referential Integrity Constraints for COMPANY database

Populated database state Each relation will have many tuples in its current relation state The relational database state is a union of all the individual relation states. Whenever the database is changed, a new state arises. Basic operations for changing the database: INSERT a new tuple in a relation DELETE an existing tuple from a relation MODIFY an attribute of an existing tuple Next slide shows an example state for the COMPANY database.

Populated database state for COMPANY

Update Operations on Relations INSERT a tuple. DELETE a tuple. MODIFY a tuple. Integrity constraints should not be violated by the update operations. Updates may propagate to cause other updates automatically. This may be necessary to maintain integrity constraints. Integrity   refers consistency , and reliability of data stored within the relational database

Cont … In case of integrity violation, several actions can be taken: Cancel the operation that causes the violation (RESTRICT or REJECT option). Perform the operation but inform the user of the violation. Trigger additional updates so the violation is corrected (CASCADE option, SET NULL option). Execute a user-specified error-correction routine.

Possible violations for each operation INSERT may violate any of the constraints: Primary key violation: Inserted a duplicate value for primary key. Domain Constraint: Inserting a record with a value that does not meet the specified domain constraints Unique constraint: Inserting a duplicate value in a column that has a unique constraint. Not null constraint:  insert a record with NULL in a NOT NULL column Referential integrity : Inserting a record with a foreign key that doesn't exist in the referenced table.

Entity integrity: If the primary key value is null in the new tuple. DELETE may violate only referential integrity If the primary key value of the tuple being deleted is referenced from other tuples in the database. Can be remedied by several actions: RESTRICT, CASCADE, SET NULL RESTRICT option: reject the deletion. CASCADE option: propagate the new primary key value into the foreign keys of the referencing tuples. SET NULL option: set the foreign keys of the referencing tuples to NULL

Cont … UPDATE may violate domain constraint and NOT NULL constraint on an attribute being modified. Any of the other constraints may also be violated, depending on the attribute being updated: Updating the primary key (PK): Similar to a DELETE followed by an INSERT Need to specify similar options to DELETE Updating a foreign key (FK): May violate referential integrity Updating an ordinary attribute (neither PK nor FK): Can only violate domain constraints

Thank You Any Question? welcome 104