Database Management system : UNit I Helping to understand basics of dbms operation

SanthanalakshmiSelva2 31 views 178 slides Aug 29, 2024
Slide 1
Slide 1 of 276
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47
Slide 48
48
Slide 49
49
Slide 50
50
Slide 51
51
Slide 52
52
Slide 53
53
Slide 54
54
Slide 55
55
Slide 56
56
Slide 57
57
Slide 58
58
Slide 59
59
Slide 60
60
Slide 61
61
Slide 62
62
Slide 63
63
Slide 64
64
Slide 65
65
Slide 66
66
Slide 67
67
Slide 68
68
Slide 69
69
Slide 70
70
Slide 71
71
Slide 72
72
Slide 73
73
Slide 74
74
Slide 75
75
Slide 76
76
Slide 77
77
Slide 78
78
Slide 79
79
Slide 80
80
Slide 81
81
Slide 82
82
Slide 83
83
Slide 84
84
Slide 85
85
Slide 86
86
Slide 87
87
Slide 88
88
Slide 89
89
Slide 90
90
Slide 91
91
Slide 92
92
Slide 93
93
Slide 94
94
Slide 95
95
Slide 96
96
Slide 97
97
Slide 98
98
Slide 99
99
Slide 100
100
Slide 101
101
Slide 102
102
Slide 103
103
Slide 104
104
Slide 105
105
Slide 106
106
Slide 107
107
Slide 108
108
Slide 109
109
Slide 110
110
Slide 111
111
Slide 112
112
Slide 113
113
Slide 114
114
Slide 115
115
Slide 116
116
Slide 117
117
Slide 118
118
Slide 119
119
Slide 120
120
Slide 121
121
Slide 122
122
Slide 123
123
Slide 124
124
Slide 125
125
Slide 126
126
Slide 127
127
Slide 128
128
Slide 129
129
Slide 130
130
Slide 131
131
Slide 132
132
Slide 133
133
Slide 134
134
Slide 135
135
Slide 136
136
Slide 137
137
Slide 138
138
Slide 139
139
Slide 140
140
Slide 141
141
Slide 142
142
Slide 143
143
Slide 144
144
Slide 145
145
Slide 146
146
Slide 147
147
Slide 148
148
Slide 149
149
Slide 150
150
Slide 151
151
Slide 152
152
Slide 153
153
Slide 154
154
Slide 155
155
Slide 156
156
Slide 157
157
Slide 158
158
Slide 159
159
Slide 160
160
Slide 161
161
Slide 162
162
Slide 163
163
Slide 164
164
Slide 165
165
Slide 166
166
Slide 167
167
Slide 168
168
Slide 169
169
Slide 170
170
Slide 171
171
Slide 172
172
Slide 173
173
Slide 174
174
Slide 175
175
Slide 176
176
Slide 177
177
Slide 178
178
Slide 179
179
Slide 180
180
Slide 181
181
Slide 182
182
Slide 183
183
Slide 184
184
Slide 185
185
Slide 186
186
Slide 187
187
Slide 188
188
Slide 189
189
Slide 190
190
Slide 191
191
Slide 192
192
Slide 193
193
Slide 194
194
Slide 195
195
Slide 196
196
Slide 197
197
Slide 198
198
Slide 199
199
Slide 200
200
Slide 201
201
Slide 202
202
Slide 203
203
Slide 204
204
Slide 205
205
Slide 206
206
Slide 207
207
Slide 208
208
Slide 209
209
Slide 210
210
Slide 211
211
Slide 212
212
Slide 213
213
Slide 214
214
Slide 215
215
Slide 216
216
Slide 217
217
Slide 218
218
Slide 219
219
Slide 220
220
Slide 221
221
Slide 222
222
Slide 223
223
Slide 224
224
Slide 225
225
Slide 226
226
Slide 227
227
Slide 228
228
Slide 229
229
Slide 230
230
Slide 231
231
Slide 232
232
Slide 233
233
Slide 234
234
Slide 235
235
Slide 236
236
Slide 237
237
Slide 238
238
Slide 239
239
Slide 240
240
Slide 241
241
Slide 242
242
Slide 243
243
Slide 244
244
Slide 245
245
Slide 246
246
Slide 247
247
Slide 248
248
Slide 249
249
Slide 250
250
Slide 251
251
Slide 252
252
Slide 253
253
Slide 254
254
Slide 255
255
Slide 256
256
Slide 257
257
Slide 258
258
Slide 259
259
Slide 260
260
Slide 261
261
Slide 262
262
Slide 263
263
Slide 264
264
Slide 265
265
Slide 266
266
Slide 267
267
Slide 268
268
Slide 269
269
Slide 270
270
Slide 271
271
Slide 272
272
Slide 273
273
Slide 274
274
Slide 275
275
Slide 276
276

About This Presentation

Database management helps to create table, query, modify table and help table which is software and databases.


Slide Content

DATABASE MANAGEMENT SYSTEMS COURSE CODE : 2 4CA 1202

UNIT - 1 INTRODUCTION, DATABASE DESIGN AND RELATIONAL MODEL Introduction Database System Applications, Purpose of Database Systems, View of Data Database Languages, Database and Application Architecture Database Users and Administrators Database Schema Keys Schema Diagrams ER diagrams Mapping Cardinalities Alternatives Notations for Modeling Data – Data Flow Diagram.

UNIT I INTRODUCTION Database & Database Users. Characteristics of the Database Approach advantages of usine DBMS. Data Models, Schemas & Instances. DBMS Architecture & Data Independence. System Architecture for DBMS and Data Dictionary, Database Users Data Base languages & Interfaces. Data Modeling using the Entity-Relationship Model -Entity types, Entity Sets, Attributes and Keys, Relationship, Relationship Types, Week Entity Types, Structural Constraints, Enhanced ER Model- Specialization Generalization, Constraints on Specialization Generalization - Data flow diagram.

DATA Data refers to raw facts Data may be numerical data which may be integers or floating point numbers, and non-numerical data such as characters, date and etc.,

Example 98 89 87 92 phy chem maths biology The above numbers may be anything: It may be distance in kms or amount in rupees or no of days or marks in each subject etc., Prepared by : M.Nirmala / AP / MCA

Information Processed Data is referred as Information. It delivers some meanings to the user 98 89 87 92 phy chem maths biology Prepared by : M.Nirmala / AP / MCA

Database Database is an organized, collection of related information Eg. telephone directory, dictionary, and college database. 98 89 87 92 phy chem maths biology 76 87 79 88 phy chem maths biology 91 67 87 77 phy chem maths biology 86 80 79 88 phy chem maths biology Prepared by : M.Nirmala / AP / MCA

Database Management Systems A database management system or DBMS is software designed to store, modify, maintaining and utilizing large collection of data . A database-management system (DBMS) is a collection of interrelated data and a set of programs to access those data   Database Applications: Banking: all transactions Airlines: reservations, schedules Universities : registration, grades Sales: customers, products, purchases Online retailers: order tracking, customized recommendations Manufacturing: production, inventory, orders, supply chain Human resources : employee records, salaries, tax deductions Databases touch all aspects of our lives Prepared by : M.Nirmala / AP / MCA

Why do u need a DBMS? A Database Management System Is an Extension of Human Logic Computers Can Quickly Answer Lots of Questions Some Questions Can Be Really Complicated  We Are Easily Overwhelmed With Information Automation Is the Key to Efficiency A DBMS Is Better Than Manual Processes in so Many Ways

DATABASE APPLICATIONS

ENTERPRISE INFORMATION Sales : For customer, product, and purchase information. Accounting : For payments, receipts, account balances, assets and other accounting information. Human resources : For information about employees, salaries, payroll taxes, and benefits, and for generation of paychecks. Manufacturing : For management of the supply chain and for tracking production of items in factories, inventories of items inwarehouses and stores, and orders for items . Online retailers : For sales data noted above plus online order tracking, generation of recommendation lists, and maintenance of online product evaluations

Banking and Finance ◦ Banking : For customer information, accounts, loans, and banking transactions. ◦ Credit card transactions : For purchases on credit cards and generation of monthly statements. ◦ Finance : For storing information about holdings, sales, and purchases of financial instruments such as stocks and bonds; also for storing real-time market data to enable online trading by customers and automated trading by the firm.

Universities For student information, course registrations, and grades ( in addition to standard enterprise information such as human resources and accounting ).

Airlines : For reservations and schedule information. Airlines were among the first to use databases in a geographically distributed manner. Telecommunication : For keeping records of calls made, generating monthly bills , maintaining balances on prepaid calling cards, and storing information about the communication networks.

Database Systems Vs File Systems FILE BASED DATA MANAGEMENT  Before advent of database , file system—manual file system was used 1) Collection of file, folders each tagged and kept in a filing cabinet 2) When data was relatively small—few reporting requirements—manual system was ok 3) Requirements grew—manual system complex (files, folders collection growing—time consuming & cumbersome Prepared by : M.Nirmala / AP / MCA

The conventional file system was much faster and efficient than manual counterparts but not flexibly than modern DBMS.   DISADVANTAGE OF FILE BASE SYSTEM Need for programming and skilled personnel Need different application programs to extract records to add records etc For Each query an application program has to be written For different reports to be generate, different programs to be written Lack of data security Password security alone is maintained and it is not sufficient when different users have access permissions to different subsets of data To safeguard confidentiality was difficult To lock part of files or part of the system not possible Prepared by : M.Nirmala / AP / MCA

Information compartmentalization Promote data ownership and thus promoting the storage of same data in different location Professional says as information islands or information compartmentalization, data at different locations are to be consistently updated , islands of information often contains different versions of data Structural Dependency Change in file structure/addition/deletion of a field requires modification of all programs using that files It exhibits structural dependency Changing from integer to decimal will make change in program Prepared by : M.Nirmala / AP / MCA

Data Dependency File data characteristic change Normal representation is ASCII Specify opening of a specific file type either random or sequential How to store and how to retrieve is to be clearly defined Data Redundancy Employee details stored in personnel dept ,account department, production planning department etc   Data Inconsistency Data stored at different location / departments and modification / deletion / addition can create data inconsistencies Eg:-employee leaves organization should be properly updated in all departments Prepared by : M.Nirmala / AP / MCA

Concurrent access by multiple users Concurrent accessed needed for performance Uncontrolled concurrent accesses can lead to inconsistencies Example: Two people reading a balance and updating it at the same time Security problems Hard to provide user access to some, but not all, data Prepared by : M.Nirmala / AP / MCA

ADVANTAGE OF DBMS Data independence Application programs should be independent as possible from details of data representation & storage . Efficient data access Data can be effectively retrieved when data is stored on external storage device Data integrity & Security DBMS can enforce integrity constraints on data DBMS can check, before inserting salary for an employee the dept budget not exceeded. Security restrictions can be applied Access controls is enforced What data is visible to different classes of users Data administration When several users share data centralizing the administration of data can offer significant improvements Prepared by : M.Nirmala / AP / MCA

Concurrent access and crash recovery Concurrent access to data, in such a manner that users can think of the data being accessed by only one user at a time DBMS protects users from effects of system failures Reduced application development time It supports many important functions that are common to much application accessing data stored in the DBMS Describing and storing data in a DBMS A data model is a collection of high level data description that hide many low level storage details. Conflicting requirements can be balanced Knowing the overall requirements helps database designers in creating a database design that is best for the organization Prepared by : M.Nirmala / AP / MCA

View of Data / Database Architecture

DBMS Architecture The DBMS architecture describes how data in the database is viewed by the users. Database abstraction also hides the implementation details of the data from the user The major purpose of a database system is to provide users with an abstract view of the data i.e. The system hides certain details of how the data are stored and maintained. Prepared by : M.Nirmala / AP / MCA

Example to Understand Abstraction User Level A layman is walking on the street in front of your college. When he looks at your college what all does he knows about the college ! He knows the name of the college then he might know that it is a Science or Arts college, it is run by so a so society etc.

Logical Level You are a student of the college You know more about the person on the street You know the number of courses held in the college, most of the teachers in the college, the course fees of the course which you are learning, the different buildings of the college etc .

Physical Level Now, consider the administrative staff of the college . These people know even more information about the college than a student of the college like how courses are conducted, how teachers are appointed, what is the fee structure of the courses what new courses are to be started etc. Now , these 3 people are looking at the same thing (i.e. the college) from 3 different levels of abstraction.

3 Levels of Abstraction The architecture is defined at 3 levels. This architecture is proposed by ANSI/SPARC (American National Standards Institute/Standards Planning and Requirements Committee) and hence, is also known as ANSI/SPARC architecture Physical level / Internal Level Logical level / Conceptual View or external level Prepared by : M.Nirmala / AP / MCA

DBMS Architecture Diagram Prepared by : M.Nirmala / AP / MCA

Physical level Lowest level of abstraction. It describes how the data are actually stored. It describes data structures in detail. At this level achieve various aspects are considered for optimal runtime performance and storage space utilization. These aspects include storage space allocation techniques for data and indexes, access paths such as indexes, data compression and encryption techniques, and record placement

Logical level   Next level of abstraction. It describes what data are stored in database and what relationship exists between those data. Defines all database entities, their attributes, and their relationships Security and integrity information

View or external level It describes only some part of entire database. It access only database and not concerned about physical storage location and data structures.   Prepared by : M.Nirmala / AP / MCA

Student information it contains   studentid stuname stuage stusex stugrade   Physical level: Student information are stored as block of consecutive storage locations. 79 memory locations are needed . Logical level: Char stuid [25], stuname [50], stusex, and stugrade; int age; External level: It uses set of application program to retrive data from database Select * from student where stuid=’1001 ’; Prepared by : M.Nirmala / AP / MCA

Instances and Schemas Similar to types and variables in programming languages Schema –Overall design of a database is called as schema Example: The database consists of information about a set of customers and accounts and the relationship between them) Analogous to type information of a variable in a program Physical schema: database design at the physical level Logical schema: database design at the logical level Instance Collection of information stored in the database at a particular moment is called an instance of a database.   Analogous to the value of a variable Physical Data Independence – the ability to modify the physical schema without changing the logical schema Applications depend on the logical schema In general, the interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others Prepared by : M.Nirmala / AP / MCA

Eg of instance & schema : In a program, various variable declarations are referred as schema. Every variable has a particular value at a given instance of database schema Database schemas 3 types of schemas : 1. Conceptual schema. 2. Physical schema/ internal. 3. Logical schema/external. Prepared by : M.Nirmala / AP / MCA

Database Schemas External schema 1 External schema 2 External schema 3 Conceptual schema Physical schema disk Prepared by : M.Nirmala / AP / MCA

Instances and Schemas DB Schema Diagram for a Company: Employee : Eno Ename Salary Address Prepared by : M.Nirmala / AP / MCA

Instances and Schemas DB Schema Diagram for a Company: Department: Dno Dname Dlocation Project: Pno Pname Hours Prepared by : M.Nirmala / AP / MCA

Instances and Schemas Instance Example: Eno Ename Salary Address 1 2 3 A B C 10,000 20,000 30,000 First street Second street Third street Prepared by : M.Nirmala / AP / MCA

CONCEPTUAL SCHEMA It describes all relations that are stored in the database Students(sid :strings,name :strings,age:integer) Faculty(fid :string,fname:string,sal:real) Courses( cid:string,cname:string Rooms(rno:integer,address:string,capacity:integer) Enrolled(sid:string,cid:string,grade:string) Teachers(fid:string,cid:string) Meets-in( cid:string,rno:integer,time:string ) Prepared by : M.Nirmala / AP / MCA

Physical Schema It specifies additional storage details. The relations described in conceptual schema are actually stored in a secondary storage devices such as disks, tapes etc. What file org is used to store the relations Prepared by : M.Nirmala / AP / MCA

External Schema It allows data access to be customized at the level of individual users. Only one conceptual schema and one physical schema but may have several external schemas each tailored to particular group of users. A view is conceptually a relation but the records in a view are not stored in DBMS. Suited for end user requirement. Prepared by : M.Nirmala / AP / MCA

Mapping Whenever a user specifies a request to generate a new external view, the DBMS must transform the request specified at external level into a request at conceptual level, and then into a request at physical level. This process of transforming the requests and results between various levels of DBMS architecture is known as mapping . Prepared by : M.Nirmala / AP / MCA

Data Independence The main advantage of three-schema architecture is that it provides data independence. Data independence is the ability to change the schema at one level of the database system without having to change the schema at the other levels. Data independence is of two types, namely, logical data independence and physical data independence. Prepared by : M.Nirmala / AP / MCA

Three Levels of online Book Database Prepared by : M.Nirmala / AP / MCA

Logical Data Independence It is the ability to change the conceptual schema without affecting the external schemas or application programs. The conceptual schema may be changed due to change in constraints or addition of new data item or removal of existing data item, etc., from the database. The separation of the external level from the conceptual level enables the users to make changes at the conceptual level without affecting the external level or the application programs. For example, if a new data item, say Edition is added to the BOOK file, the two views (view 1 and view 2 shown in the above Figure) are not affected. Prepared by : M.Nirmala / AP / MCA

Physical Data Independence Physical data independence : It is the ability to change the internal schema without affecting the conceptual or external schema. An internal schema may be changed due to several reasons such as for creating additional access structure, changing the storage structure, etc. The separation of internal schema from the conceptual schema facilitates physical data independence. Prepared by : M.Nirmala / AP / MCA

Logical data independence is more difficult to achieve than the physical data independence because the application programs are always dependent on the logical structure of the database. Therefore, the change in the logical structure of the database may require change in the application programs. Prepared by : M.Nirmala / AP / MCA

Data Models A data model is an organizing principle that specifies particular mechanisms for data storage and retrieval The primary difference between the various database models lie in the method of expressing relationships and constraints among data elements Prepared by : M.Nirmala / AP / MCA

RELATIONAL MODEL Relational Databases store data in relations i.e. tables. Each relation must have a name . Relation can be otherwise called as object or Table

Tuple/Row/Record A single entry in a table is called a  Tuple  or  Record  or  Row . A   tuple  in a table represents a set of related data. For example, the above  Employee  table has 4 tuples/records/rows. A record is a set of related data in a table.

ATTRIBUTES A table consists of several records(row), each record can be broken down into several smaller parts of data known as  Attributes An attribute is a named column of a relation. It stores a specific information about an object / Table e.g. salary . Name the Attributes The above  Employee  table consist of five attributes ,  ID ,   ENAME, SALARY, BONUS, DEPT Attribute Domain : Attribute is defined to hold only a certain type of values, which is known as  Attribute Domain . The attribute  Name  will hold the name of employee for every tuple. If we save employee's address there, it will be violation of the Relational database model.

ATTRIBUTES

CARDINALITY OF THE RELATION

DEGREE OF THE RELATION

NULL ATTRIBUTE The attribute value that is currently unknown is called Null Attribute

Attribute Domain Attribute is defined to hold only a certain type of values, which is known as  Attribute Domain .

Data Integrity and Constraints / Keys Integrity constraints  are a set of rules.  Integrity constraints  ensure that the  data  insertion, updating, and other processes have to be performed in such a way that  data integrity  is not affected. Thus ,  integrity constraint  is used to guard against accidental damage to the database. Data integrity refers to maintaining and assuring the accuracy and consistency of data over its entire life-cycle. Database Systems ensure data integrity through constraints which are used to restrict data that can be entered or modified in the database. Database Systems offer three types of integrity constraints:

Types of Integrity

Integrity Types Definition Enforced Through Entity Integrity Each table must have a column or a set of columns through which we can uniquely identify a row. These column(s) cannot have empty (null) values. PRIMARY KEY Domain Integrity All attributes in a table must have a defined domain i.e. a finite set of values which have to be used. When we assign a data type to a column we limit the values that it can contain. In addition we can also have value restriction as per business rules e.g. Gender must be M or F. DATA TYPES, CHECK CONSTRAINT Referential Integrity Every value of a column in a table must exist as a value of another column in a different (or the same) table. FOREIGN KEY

Keys Super Key Candidate Key Primary Key

Super Key A super key is a set of one or more attributes (columns), which can uniquely identify a row in a table. Often   DBMS beginners  get confused between super key and  candidate key Let’s take an example to understand this: Table: Employee Emp_SSN Emp_Number Emp_Name 123456789 226227 Steve 999999321 227 Ajeet 999999322 228 Chaitanya 777778888 229 Robert

Super keys : All of the following sets of super key are able to uniquely identify a row of the employee table . {Emp_SSN} {Emp_Number} {Emp_SSN, Emp_Number} {Emp_SSN, Emp_Name} {Emp_SSN, Emp_Number, Emp_Name} {Emp_Number, Emp_Name }

1. {Emp_Id} – No redundant attribute 2 . {Emp_Number} – No redundant attributes 3 . {Emp_Id, Emp_Number} – No Redundant attribute. Either of those attributes can be a minimal super key as both of these columns have unique values. 4 . {Emp_Id, Emp_Name} – Redundant attribute Emp_Name 5 . {Emp_Id, Emp_Number, Emp_Name} – Redundant attributes. Emp_Id or Emp_Number alone are sufficient enough to uniquely identify a row of Employee table 6 . {Emp_Number, Emp_Name} – Redundant attribute Emp_Name.

 A  super key  with no redundant attribute is known as candidate key. Candidate keys are selected from the set of super keys, the only thing we take care while selecting candidate key is that the candidate key should not have any redundant attributes. That’s the reason they are also termed as minimal super key A  Candidate Key  is a minimal set of columns/attributes that can be used to uniquely identify a single tuple in a relation.

Lets select the candidate keys from the above set of super keys. The  candidate keys  we have selected are: {Emp_Id} {Emp_Number}

Primary key Mandatory Desired must uniquely identify a tuple should not change with time must not allow NULL values should have short size e.g. numeric data types Primary key  is the candidate key that is selected to uniquely identify a tuple in a relation. A  primary key  is a minimal set of attributes (columns) in a table that uniquely identifies tuples (rows) in that table. A primary key  is selected from the set of candidate keys. That means we can either have Emp_Id or Emp_Number as primary key. The decision is made by DBA (Database administrator)

Primary Key Example in DBMS Lets take an example to understand the concept of primary key. In the following table, there are three attributes: Stu_ID, Stu_Name & Stu_Age. Out of these three attributes, one attribute or a set of more than one attributes can be a primary key. Attribute Stu_Name alone cannot be a primary key as more than one students can have same name. Attribute Stu_Age alone cannot be a primary key as more than one students can have same age . Attribute Stu_Id alone is a primary key as each student has a unique id that can identify the student record in the table.

We denote usually denote it by underlining the attribute name (column name). The value of primary key should be unique for each row of the table. The column(s) that makes the key cannot contain duplicate values. The attribute(s) that is marked as primary key is not allowed to have null values. Primary keys are not necessarily to be a single attribute (column). It can be a set of more than one attributes (columns). For example {Stu_Id, Stu_Name} collectively can identify the tuple in the above table, but we do not choose it as primary key because Stu_Id alone is enough to uniquely identifies rows in a table and we always go for minimal set. Having that said, we should choose more than one columns as primary key only when there is no single column that can uniquely identify the tuple in table. Key Points Primary key

Composite Primary Key – More than one Key Consider this table ORDER, this table keeps the daily record of the purchases made by the customer. This table has three attributes: Customer_ID, Product_ID & Order_Quantity. Customer_ID alone cannot be a primary key as a single customer can place more than one order thus more than one rows of same Customer_ID value. As we see in the following example that customer id 1011 has placed two orders with product if 9023 and 9111. Product_ID alone cannot be a primary key as more than one customers can place a order for the same product thus more than one rows with same product id. In the following table, customer id 1011 & 1122 placed an order for the same product (product id 9023).

Order_Quantity alone cannot be a primary key as more more than one customers can place the order for the same quantity. Since none of the attributes alone were able to become a primary key, lets try to make a set of attributes that plays the role of it. {Customer_ID, Product_ID} together can identify the rows uniquely in the table so this set is the primary key for this table.

Order Example

Order Customer_ID Product_ID Order_Quantity 1011 9023 10 1122 9023 15 1099 9031 20 1177 9031 18 1011 9111 50

Consider the Following Table StudentID StudentName Department DOB 1 Raja Computer Science & Engg 10-12-1990 2 Geetha Computer Science & Engg 12-12-1990 3 Dev Information Technology 5-7-1990 4 Preetha Information Technology 12-5-1998 5 Hari Computer Science & Engg 10-3-1998 6 Vani Computer Science & Engg 10-12-1990 7 Krishna Electrical&Electronics Engg 12-12-1990 8 Ajith Electrical&Electronics Engg 5-7-1990 9 Ram Information Technology 12-5-1998 10 Kumar Computer Science & Engg 10-3-1998 11 Sandhya Automobile Engg 10-12-1990 12 Uma Automobile Engg 12-12-1990

Deptname Gets Repeated more than once as more than one student belong to the same dept. Also the size of the Department name is also big Instead of the Department name with an Deptid can be represented as a separate table and the Deptid which is a primary key in the Dept table will be a Foreign key in the Student Table

Dept Table DeptID Department 1 Computer Science & Engg 2 Information Technology 3 Electrical & Electronics Engg 4 Automobile Engg 5 Mechanical Engg

StudentID StudentName DeptID DOB 1 Raja 1 10-12-1990 2 Geetha 1 12-12-1990 3 Dev 2 5-7-1990 4 Preetha 2 12-5-1998 5 Hari 1 10-3-1998 6 Vani 1 10-12-1990 7 Krishna 3 12-12-1990 8 Ajith 3 5-7-1990 9 Ram 2 12-5-1998 10 Kumar 1 10-3-1998 11 Sandhya 4 10-12-1990 12 Uma 4 12-12-1990 DeptID Department 1 Computer Science & Engg 2 Information Technology 3 Electrical & Electronics Engg 4 Automobile Engg 5 Mechanical Engg DeptID is Primary Key in the Dept Table is acting as a Foreign key in the Student Table Student Department

 Foreign keys are the columns of a table that points to the  primary key  of another table. They act as a cross-reference between tables . Note : Practically, the foreign key has nothing to do with the primary key tag of another table, if it points to a unique column (not necessarily a primary key) of another table then too, it would be a foreign key. So, a correct definition of foreign key would be: Foreign keys are the columns of a table that points to the  candidate key  of another table.

Properties of Foreign Key It states that if a foreign key exists in a relation then either the foreign key value must match a primary key value of some tuple in its home relation or the foreign key value must be null. The rules are: You can't delete a record from a primary table if matching records exist in a related table. You can't change a primary key value in the primary table if that record has related records. You can't enter a value in the foreign key field of the related table that doesn't exist in the primary key of the primary table. However, you can enter a Null value in the foreign key, specifying that the records are unrelated.

ENTITY RELATIONSHIP DIAGRAM Jack is part of database team and he needs to present the database design to business users. The business users are non-technical and it's difficult for them to read a verbose design document. Jack needs to use an Entity Relation (ER) Model. ER model  is a graphical representation of entities and their relationships which helps in understanding data independent of the actual database implementation.

The ER model consists of basic objects called entities and a relationship among these objects . Proposed by Peter Pin-Shan Chen (1976). Three basic notations in ER model are Entity sets Relationship sets Attributes Prepared by : M.Nirmala / AP / MCA

Entity Objects in the miniworld about which information has to be stored. E.g. persons, books. It must be possible to distinguish among entities An entity may be Concrete -- person /Book. Abstract -- loan/holiday/Concept Prepared by : M.Nirmala / AP / MCA

Entity set An entity set is set of entities of the same type that share the same properties/ Attributes. Eg: set of all persons who are customers at a given bank can be defined as entity set customer. Eg: set of all loans awarded by a bank is referred as entity set loan. Note: entity sets do not need to be disjoint. It is possible to define the entity set of all employees of a bank(employee) and the entity set of all customers of the bank (customer). A person entity may be an employee entity, a customer entity, both or neither Prepared by : M.Nirmala / AP / MCA

Entity set loan and Customer LOAN NO AMOUNT L17 1000 L23 3000 L15 1000 L17 1500 NAME SOCIAL SECURITYNO STREET Jones 321-12-3123 Main Smith 019-28-5611 North Hayes 912-51-1239 north Jack 335-57-1298 dupont Prepared by : M.Nirmala / AP / MCA

Attributes An entity is represented by a set of attributes. Attributes are descriptive properties possessed by each member of entity set Possible attributes of the customer entity set are customer-name social security no street Attributes of loan entity loan no amount Prepared by : M.Nirmala / AP / MCA

Domain For each attributes there is a set of permitted values called domain or value set Domain of attribute customer name set of all text strings of certain length Domain of attribute loanno all positive integer in the form “L-n” eg L1, L2 Attributes of an entity set is a function that maps from the entity set in to a domain Mapping between entity and domain Entity is described by (attributes,datavalue)pairs {(name,jones ), ( social security,312-12-5123 )…. (( street,main)} Prepared by : M.Nirmala / AP / MCA

Attributes Types Simple and composite attributes Single valued and multi valued attributes Null attributes Derived attributes Simple attributes The attributes simple in its structure, which cannot be further sub divided are referred as simple attributes. Eg: sex,etc . Prepared by : M.Nirmala / AP / MCA

Hierarchy of Composite Attributes Prepared by : M.Nirmala / AP / MCA Composite attributes: The attribute may be composed of several components. It can be further sub divided Address(Apt#, House#, Street, City, State, ZipCode, Country) Name(FirstName, MiddleName, LastName).

Single values attributes Attributes have single value for particular entity eg: loan number attribute has a single loan number Multivalued attributes Attributes has a set of values for a specific entity consider an employee entity set dependent-name an employee can have zero, one or more dependents Eg : In bio data- references-names Educational qualifications sslc,hsc,ug,pg Prepared by : M.Nirmala / AP / MCA

Null attributes A null value is used when an entity does not have a value for an attributes Eg : One may have no middle name. Derived attribute The value for this type of attribute can be derived from the values of other related attributes/entities Eg 1: entity : customer Attributes : loans-held which represents how many no of loans a customer had from the bank. To derive the value this attribute, we can count the number of loans entities associated with that customer Eg 2 entity : employee Attributes : start-date; employment length Employment length – total length of time the employee has worked. Value of employment –length can be derived from the value of start-date & current-date. Start-date is referred as base attributes Prepared by : M.Nirmala / AP / MCA

Relationship sets Relationship is an association among several entities. Relationship set is the relationship of the same type . E.g. “X teaches course Y”. Prepared by : M.Nirmala / AP / MCA

Let us understand some key terms used in ER Modelling Term Definition Examples Entity Real world objects which have an independent existence and about which we intend to collect data. Employee, Computer Attribute A property that describes an entity. Name, Salary

Different Forms ID ENAME SALARY DEPT EMPLOYEE Customer Account cusname SSN Cus-city Cus-street A/cno balance Depositor

M ore than one relationship between entities There can be more than one relationship between entities, e.g. an Employee  works in  a Department while the head of department (also an employee)  manages  a Department.

Recursive Relationship If the same entity participates more than once in a relationship it is known as a recursive relationship. In the below example an employee can be a supervisor and be supervised, so there is a recursive relationship. Eg: A women can take up the role of a wife, mother, daughter-in-law etc. Prepared by : M.Nirmala / AP / MCA

specifying that each employee has zero or one bosses (assuming the CEO doesn‘t have any boss), and that each employee may supervise zero or more other employees would look like Prepared by : M.Nirmala / AP / MCA

Degree of Relationships The number of roles in the relationship Binary – links two entity sets; set of ordered pairs (most common) Ternary – links three entity sets; ordered triples (rare). If a relationship exists among the three entities, all three must be present Ternary relationship set involving the entity sets customer, loan, branch.. customer jones, loan – L-17 and the perryridge branch. N-ary – links n entity sets; ordered n-tuples (very rare). If a relationship exists among the entities, then all must be present. Cannot represesnt subsets. Prepared by : M.Nirmala / AP / MCA

E-R Diagram with a Ternary Relationship Prepared by : M.Nirmala / AP / MCA

One - one An entity in a is associated with at most one entity in b, and an entity in b is associated with at most one entity in A. Borrower in a particular bank A loan can belong to only customer. 1 Department one HOD Prepared by : M.Nirmala / AP / MCA

1-1 Prepared by : M.Nirmala / AP / MCA

Cardinality Constraints We express cardinality constraints by drawing either a directed line ( ), signifying “one,” or an undirected line (—), signifying “many,” between the relationship set and the entity set. E.g.: One-to-one relationship: A customer is associated with at most one loan via the relationship borrower A loan is associated with at most one customer via borrower Prepared by : M.Nirmala / AP / MCA

1-Many An entity is A is associated with any number of entities in B. an entity in B, however can be associated with almost one entity in A. a1 a2 a3 b1 b2 b3 b4 b5 Depositor 1.customer may have several accounts. Prepared by : M.Nirmala / AP / MCA

One-To-Many Relationship In the one-to-many relationship a loan is associated with at most one customer via borrower , a customer is associated with several (including 0) loans via borrower Prepared by : M.Nirmala / AP / MCA

Many - 1 An entity in A is associated with atmost one entity in B. AN entity in B, however can be associated with any number of entities in A. a1 a2 a4 a3 b1 b2 a5 b3 Prepared by : M.Nirmala / AP / MCA

Many-To-One Relationships In a many-to-one relationship a loan is associated with several (including 0) customers via borrower , a customer is associated with at most one loan via borrower Prepared by : M.Nirmala / AP / MCA

Many - Many An entity in A is associated with any number of entities in B, and an entity in B is associated with any no of entities in A. a1 a2 a4 a3 b1 b2 b4 b3 Loan can belong to several customers can loans can be taken jointly by several business partners Prepared by : M.Nirmala / AP / MCA

Many-To-Many Relationship A customer is associated with several (possibly 0) loans via borrower A loan is associated with several (possibly 0) customers via borrower Prepared by : M.Nirmala / AP / MCA

Prepared by : M.Nirmala / AP / MCA

Prepared by : M.Nirmala / AP / MCA

Prepared by : M.Nirmala / AP / MCA

Prepared by : M.Nirmala / AP / MCA

Prepared by : M.Nirmala / AP / MCA

E-R Diagram With Composite, Multivalued, and Derived Attributes Composite Attribute – name, address , street Multivalued Attribute = phone number depicted by double ellipse Derived attribute = age depicted by dashed ellipse Prepared by : M.Nirmala / AP / MCA

How to Create an Entity Relationship Diagram (ERD )

Scenario In a university, a Student enrolls in Courses. A student must be assigned to at least one or more Courses. Each course is taught by a single Professor. To maintain instruction quality, a Professor can deliver only one course

Step 1) Entity Identification Student Course Professor

Step 2) Relationship Identification Identify the association between the entities The student is  assigned  a course Professor  delivers  a course

Step 3) Cardinality Identification A student can be assigned  multiple  courses A Professor can deliver only  one  course

Step 4) Identify Attributes Entity Primary Key Attribute Student Student_ID StudentName Professor Employee_ID ProfessorName Course Course_ID CourseName

ER DIAGRAM “ Patients are treated in a single ward by the doctors assigned to them. Usually each patient will be assigned a single doctor , but in rare cases they will have two. Healthcare assistants also attend to the patients, a number of these are associated with each ward .

Step 1) Entity Identification Patients Doctor healthcareAssistants ward

Step 2) Relationship Identification Identify the association between the entities Patients are treated in ward Patient are assigned to doctor Healthcare assistants attends to patients

Step 3) Cardinality Identification Patients are treated in single ward (M:1) Patient are assigned to doctor(1:M) Healthcare assistants attends to patients (M:M)

DFD Examples

Why Data Flow Diagram A picture is worth a thousand words. A Data Flow Diagram (DFD) is a traditional way to visualize the information flows within a system. It shows how information enters the System How Information leaves the system what changes the information where information is stored.

Good DFDs are critical to the process Building DFDs == understanding the system Analyzing DFDs == understanding the threats

DFD Types of Notation Yourdon & Coad Gene & Sarson are the two main methods of notation used in DFDs

Elements of DFD

External Entity An external agent is a source or destination of data The sharp cornered rectangles(or simply boxes) in a DFD indicates entities. An external entity, which are also known as terminators, sources, sinks, or actors, are an outside system or process that sends or receives data to and from the diagrammed system

Entities are people things, organizations etc of data from the system

Process A process is a unit of work that operates on the data The rounded cornered rectangles or circles in a DFD indicate processes Process is a procedure that manipulates the data and its flow by taking incoming data, changing it, and producing an output with it

Examples of Process

Data Store Opened sided rectangles in DFD indicates data store. The Data Store symbol represents data that is not moving (delayed data at rest). A Data Store is a logical repository of data. Data can be written into the data store. This is depicted by an incoming arrow. Data can be read from a data store. This is depicted by an outgoing arrow.

Data store Two data stores cannot be connected by a data flow. External entity cannot read or write to the data store.

Examples of data store

Data flow Data flow is the path the system’s information takes from external entities through processes and data stores. With arrows and succinct labels, the DFD can show you the direction of the data flow. Arrow symbol in DFD indicate data flow The Data Flow symbol represents movement of data  

Examples of Data Flow

Rules of Data Flow Data can flow from External Entity to Process Process to External Entity Process to Data Store and back Process to Process

Rules of Data Flow Data cannot flow from External entity to External Entity External entity to store Data store to External Entity Data store to Data Store

Four rules of thumb to create a valid DFD. Each process should have at least one input and one output. Each data store should have at least one data flow in and data flow out. A system’s stored data must go through a process. All processes in a DFD must link to another process or data store.

Creating Data Flow Diagrams Steps: Create a list of activities Construct Context Level DFD (identifies external entities and processes) Construct Level 0 DFD (identifies manageable sub process ) Construct Level 1- n DFD (identifies actual data flows and data stores ) Check against rules of DFD

DFD Naming Guidelines External Entity  Noun Data Flow  Names of data Process  verb phrase a system name a subsystem name Data Store  Noun

Creating Data Flow Diagrams Lemonade Stand Example

Creating Data Flow Diagrams Steps: Create a list of activities New way: use Use-Case Diagram Construct Context Level DFD (identifies sources and sink) Construct Level 0 DFD (identifies manageable sub processes ) Construct Level 1- n DFD (identifies actual data flows and data stores ) Example The operations of a simple lemonade stand will be used to demonstrate the creation of dataflow diagrams.

Creating Data Flow Diagrams Create a list of activities Example Think through the activities that take place at a lemonade stand. Customer Order Produce Product Serve Product Collect Payment Store Product

Creating Data Flow Diagrams Example Also think of the additional activities needed to support the basic activities. Customer Order Produce Product Serve Product Collect Payment Store Product Order Raw Materials Pay for Raw Materials Pay for Labor Create a list of activities

Creating Data Flow Diagrams Example Group these activities in some logical fashion, possibly functional areas. Customer Order Serve Product Collect Payment Produce Product Store Product Order Raw Materials Pay for Raw Materials Pay for Labor Create a list of activities

Process Customer Order Sale Serve Product Collect Payment Produce Product (product preparation) Production Store Product Order Raw Materials Procurement Pay for Raw Materials Pay for Labor payroll

Context Level DFD Basic form of DFD Called as Level 0 Shows how the system works at a Glance There is  only one process(Main Process)  in the system and all the data flows either into or out of this process. 

Context level DFD’s demonstrates the interactions between the process and external entities. They  do not  contain Data Stores.

Creation of Context Level DFD Identify the main system (Lemonade) Identify the external people who interact with the system Customer Employee (Seller) Vendor (for purchasing Raw Materials)

Decide what data these entities will enter into the system Determine what these entities expect as output from the system

Creating Data Flow Diagrams Context Level DFD Example Create a context level diagram identifying the sources and sinks (users). Customer Order Serve Product Collect Payment Produce Product Store Product Order Raw Materials Pay for Raw Materials Pay for Labor Construct Context Level DFD (identifies sources and sink) 0.0 Lemonade System EMPLOYEE CUSTOMER Pay Payment Order VENDOR Payment Purchase Order Produce Product Received Goods Time Worked Sales Forecast Product Served

Creating Data Flow Diagrams Level 0 DFD Example Create a level 0 diagram identifying the logical subsystems that may exist. Customer Order Serve Product Collect Payment Produce Product Store Product Order Raw Materials Pay for Raw Materials Pay for Labor Construct Level 0 DFD (identifies manageable sub processes ) 2.0 Production EMPLOYEE Produce Product 1.0 Sale 3.0 Procure-ment Sales Forecast Product Ordered CUSTOMER Pay Payment Customer Order VENDOR Payment Purchase Order Order Decisions Received Goods Time Worked Inventory Product Served 4.0 Payroll

Create a level 1 decomposing the processes in level 0 and identifying data stores In the Level 0 with respect to 1.0 Sale

Main Process Sub Process Data Stores Sale Customer Order Customer , Order Payment Order, Payment Sales forecast Payment

Creating Data Flow Diagrams Level 1 DFD Example Create a level 1 decomposing the processes in level 0 and identifying data stores. Construct Level 1- n DFD (identifies actual data flows and data stores ) 1.3 Produce Sales Forecast Sales Forecast Payment Customer Order Serve Product Collect Payment Produce Product Store Product Order Raw Materials Pay for Raw Materials Pay for Labor 1.1 Record Order Customer Order ORDER 1.2 Receive Payment PAYMENT Severed Order Request for Forecast CUSTOMER

Creating Data Flow Diagrams Level 1 DFD Example Create a level 1 decomposing the processes in level 0 and identifying data stores. Construct Level 1 (continued) Customer Order Serve Product Collect Payment Produce Product Store Product Order Raw Materials Pay for Raw Materials Pay for Labor 2.1 Serve Product Product Order ORDER 2.2 Produce Product INVENTORTY Quantity Severed Production Schedule RAW MATERIALS 2.3 Store Product Quantity Produced & Location Stored Quantity Used Production Data

Creating Data Flow Diagrams Level 1 DFD Example Create a level 1 decomposing the processes in level 0 and identifying data stores. Construct Level 1 (continued) Customer Order Serve Product Collect Payment Produce Product Store Product Order Raw Materials Pay for Raw Materials Pay for Labor 3.1 Produce Purchase Order Order Decision PURCHASE ORDER 3.2 Receive Items Received Goods RAW MATERIALS 3.3 Pay Vendor Quantity Received Quantity On-Hand RECEIVED ITEMS VENDOR Payment Approval Payment

Creating Data Flow Diagrams Level 1 DFD Example Create a level 1 decomposing the processes in level 0 and identifying data stores. Construct Level 1 (continued) Time Worked Customer Order Serve Product Collect Payment Produce Product Store Product Order Raw Materials Pay for Raw Materials Pay for Labor 4.1 Record Time Worked TIME CARDS 4.2 Calculate Payroll Payroll Request EMPLOYEE 4.3 Pay Employee Employee ID PAYROLL PAYMENTS Payment Approval Payment Unpaid time cards

Process Decomposition 4.1 Record Time Worked 4.2 Calculate Payroll 4.3 Pay Employee 3.1 Produce Purchase Order 3.2 Receive Items 3.3 Pay Vendor 2.1 Serve Product 2.2 Produce Product 2.3 Store Product 1.1 Record Order 1.2 Receive Payment 2.0 Production 1.0 Sale 3.0 Procure-ment 4.0 Payroll 0.0 Lemonade System Level 0 Level 1 Context Level

Let us try to identify some candidate keys for this relation Key Rationale EmployeeNo This seems to be a good candidate key as companies usually issue a unique number for each employee. AadharNo This seems to be a good candidate key for a company based in India. However we have assumed that every employee has an Aadhar number which may not be true. In addition for a multinational firm with employees across the globe this will not work at all. Name, DateOfBirth This might work for a small firm of 10 people as a combination of Name and Date of Birth is likely to be unique. Salary This is not a good candidate as salary is generally paid uniformly to people at same level. EmployeeNo, DateOfBirth It is not a candidate key as EmployeeNo alone is unique. By definition only minimal set of attributes can be candidate key. Thus the choice of candidate key depends upon the business context.

Key Rationale EmployeeNo Good candidate as it is numeric, cannot be null and does not change with time. AadharNo It will be null for people who do not have Aadhar number. Hence it cannot be considered as primary key. Name, DateOfBirth Both Name and DateOfBirth cannot be null. However even if uniqueness is guaranteed, it is not a good choice due to large size. When two or more columns together identify the unique row then it's referred to as  Composite Primary Key.  The combination of Name and DateOfBirth if selected as a primary key would be a composite primary key.

Database Users and Administrators  Primary goal of the database system is to retrieve information and store new information in the database 2 Types of Users Database Users Database Administrators

Database Users 4 Types of Database Users Naive users Application Programmers Sophisticated Users Specialized Users

Naïve Users They are unsophisticated users who interact with the system by invoking one of the application program that have been written previously. Simple User, who operates the applications Does not write any code

Application Programmers They are computer professionals who write application programs. Application programmers can choose from many tools to develop user interfaces.

Sophisticated Users They interact with the system without writing programs. Instead they write their requests in a database query language

Specialized Users They are sophisticated users who write specialized database applications that do not fit in to the traditional data-processing frame work.

Database Administrator DBA is the person or group that is responsible for supervising both the data base and the use of the DBMS Roles of DBA Policy Formulation and Implementation Access privileges Security Disaster planning Archives Data Dictionary Management Training DBMS Support DBMS evaluation and selection DBMS responsibility Data Base Design

DBA formulates policies and communicates these policies to users. Among these policies are those covering access privilege, security, disaster planning, archives.

AccessPrivilege Access to every table and field in the database is not a necessity for every user. User A Access denide Access permitted

Security Security refers to the prevention of unauthorized access to the database Once access privilege have been specified and security features are in place, DBA draws up policies to explain the security privileges and then distributes these policies to authorized users.

Planning for Disaster A database may be harmed by some physical problems or by some natural disasters. For this the DBA’s responsibility is to establish and implement backup and recovery procedures.

Archives This tells that data need to be kept in the database for only a limited time. A data archive is a place where a record of certain corporate data is kept Data Dictionary Management (Data About Data) Data dictionary contains a wider range of information, including at the very least, information on tables, indexes, and programs. The creation and distribution of appropriate reports from the data dictionary is another of DBA”s responsibility.

UNIT I INTRODUCTION

Data Base and Database Users A database is a collection of related data. By data, we mean known facts that can be recorded and that have implicit meaning. For example, consider the names, telephone numbers, and addresses of the people you know. Nowadays , this data is typically stored in mobile phones, which have their own simple database software. This data can also be recorded in an indexed address book or stored on a hard drive , using a personal computer and software such as Microsoft Access or Excel. This collection of related data with an implicit meaning is a database.

Properties A database represents some aspect of the real world, sometimes called the miniworld or the universe of discourse (UoD) . Changes to the miniworld are reflected in the database. A database is a logically coherent collection of data with some inherent meaning . A random assortment of data cannot correctly be referred to as a database . A database is designed, built, and populated with data for a specific purpose. It has an intended group of users and some preconceived applications in which these users are interested.

A database management system (DBMS) is a computerized system that enables users to create and maintain a database. The DBMS is a general-purpose software system that facilitates the processes of defining, constructing, manipulating, and sharing databases among various users and applications. Defining a database involves specifying the data types, structures, and constraints of the data to be stored in the database. The database definition or descriptive information is also stored by the DBMS in the form of a database catalog or dictionary; it is called meta-data .

Constructing the database is the process of storing the data on some storage medium that is controlled by the DBMS. Manipulating a database includes functions such as querying the database to retrieve specific data, updating the database to reflect changes in the mini world , and generating reports from the data. Sharing a database allows multiple users and programs to access the database simultaneously.

Database Environment

Database Example

Characteristics of the Database Approach A number of characteristics distinguish the database approach from the much older approach of writing customized programs to access data stored in files. In traditional file processing , each user defines and implements the files needed for a specific software application as part of programming the application. For example, one user, the grade reporting office, may keep files on students and their grades. Programs to print a student’s transcript and to enter new grades are implemented as part of the application. A second user, the accounting office , may keep track of students ’ fees and their payments. Although both users are interested in data about students , each user maintains separate files—and programs to manipulate these files—because each requires some data not available from the other user’s files.

The main characteristics of the database approach versus the file-processing approach are the following: Self-describing nature of a database system Insulation between programs and data, and data abstraction Support of multiple views of the data Sharing of data and multiuser transaction processing

Self-Describing Nature of a Database System A fundamental characteristic of the database approach is that the database system contains not only the database itself but also a complete definition or description of the database structure and constraints. This definition is stored in the DBMS catalog, which contains information such as the structure of each file, the type and storage format of each data item, and various constraints on the data. The information stored in the catalog is called meta-data , and it describes the structure of the primary database

It is important to note that some newer types of database systems , known as NOSQL systems, do not require meta-data. Rather the data is stored as self-describing data that includes the data item names and data values together in one structure The catalog is used by the DBMS software and also by database users who need information about the database structure. A general-purpose DBMS software package is not written for a specific database application. Therefore , it must refer to the catalog to know the structure of the files in a specific database, such as the type and format of data it will access.

Example

Insulation between Programs and Data, and Data Abstraction In traditional file processing, the structure of data files is embedded in the application programs , so any changes to the structure of a file may require changing all programs that access that file. By contrast, DBMS access programs do not require such changes in most cases. The structure of data files is stored in the DBMS catalog separately from the access programs. We call this property program-data independence .

An operation (also called a function or method ) is specified in two parts . The interface (or signature ) of an operation includes the operation name and the data types of its arguments (or parameters). The implementation (or method ) of the operation is specified separately and can be changed without affecting the interface. User application programs can operate on the data by invoking these operations through their names and arguments, regardless of how the operations are implemented . This may be termed program-operation independence .

Support of Multiple Views of the Data A database typically has many types of users, each of whom may require a different perspective or view of the database. A view may be a subset of the database or it may contain virtual data that is derived from the database files but is not explicitly stored. Some users may not need to be aware of whether the data they refer to is stored or derived . A multiuser DBMS whose users have a variety of distinct applications must provide facilities for defining multiple views.

Sharing of Data and Multiuser Transaction Processing A multiuser DBMS, as its name implies, must allow multiple users to access the database at the same time. This is essential if data for multiple applications is to be integrated and maintained in a single database. The DBMS must include concurrency control software to ensure that several users trying to update the same data do so in a controlled manner so that the result of the updates is correct.

Two views from Database

Advantages of Using the DBMS Approach Controlling Redundancy Restricting Unauthorized Access Providing Persistent Storage for Program Objects Providing Storage Structures and Search Techniques for Efficient Query Processing Providing Backup and Recovery Providing Multiple User Interfaces Representing Complex Relationships among Data Enforcing Integrity Constraints Permitting Inferencing and Actions Using Rules and Triggers Additional Implications of Using the Database Approach

Potential for Enforcing Standards. The database approach permits the DBA to define and enforce standards among database users in a large organization. This facilitates communication and cooperation among various departments, projects, and users within the organization. Standards can be defined for names and formats of data elements, display formats, report structures, terminology, and so on . Reduced Application Development Time. A prime selling feature of the database approach is that developing a new application—such as the retrieval of certain data from the database for printing a new report—takes very little time. Designing and implementing a large multiuser database from scratch may take more time than writing a single specialized file application .

Flexibility. It may be necessary to change the structure of a database as requirements change . For example, a new user group may emerge that needs information not currently in the database. In response, it may be necessary to add a file to the database or to extend the data elements in an existing file . Availability of Up-to-Date Information. A DBMS makes the database available to all users. As soon as one user’s update is applied to the database, all other users can immediately see this update. This availability of up-to-date information is essential for many transaction-processing applications, such as reservation systems or banking databases, and it is made possible by the concurrency control and recovery subsystems of a DBMS .

Economies of Scale. The DBMS approach permits consolidation of data and applications , thus reducing the amount of wasteful overlap between activities of data-processing personnel in different projects or departments as well as redundancies among applications. This enables the whole organization to invest in more powerful processors, storage devices, or networking gear, rather than having each department purchase its own (lower performance) equipment. This reduces overall costs of operation and management .`

Data Models, Schemas, and Instances A data model —a collection of concepts that can be used to describe the structure of a database—provides the necessary means to achieve this abstraction. By structure of a database we mean the data types, relationships, and constraints that apply to the data. Most data models also include a set of basic operations for specifying retrievals and updates on the database.

Categories of Data Models Many data models have been proposed, which we can categorize according to the types of concepts they use to describe the database structure. High-level or conceptual data models provide concepts that are close to the way many users perceive data , whereas low-level or physical data models provide concepts that describe the details of how data is stored on the computer storage media, typically magnetic disks . Concepts provided by physical data models are generally meant for computer specialists , not for end users. Between these two extremes is a class of representational (or implementation ) data models , which provide concepts that may be easily understood by end users but that are not too far removed from the way data is organized in computer storage.

Conceptual data models use concepts such as entities, attributes, and relationships. An entity represents a real-world object or concept, such as an employee or a project from the mini-world that is described in the database. An attribute represents some property of interest that further describes an entity, such as the employee’s name or salary . A relationship among two or more entities represents an association among the entities, for example, a works-on relationship between an employee and a project .

Representational or implementation data models are the models used most frequently in traditional commercial DBMSs. These include the widely used relational data model , as well as the so-called legacy data models—the network and hierarchical models —that have been widely used in the past. Part 3 of the text is devoted to the relational data model, and its constraints, operations, and languages . Representational data models represent data by using record structures and hence are sometimes called record-based data model

Schema diagram for the database The description of a database is called the database schema , which is specified during database design and is not expected to change frequently . Most data models have certain conventions for displaying schemas as diagrams . A displayed schema is called a schema diagram

Architecture and Data Independence

The internal level has an internal schema , which describes the physical storage structure of the database. The internal schema uses a physical data model and describes the complete details of data storage and access paths for the database . The conceptual level has a conceptual schema , which describes the structure of the whole database for a community of users. The conceptual schema hides the details of physical storage structures and concentrates on describing entities, data types, relationships, user operations, and constraints. Usually , a representational data model is used to describe the conceptual schema when a database system is implemented. This implementation conceptual schema is often based on a conceptual schema design in a high-level data model.

The external or view level includes a number of external schemas or user views . Each external schema describes the part of the database that a particular user group is interested in and hides the rest of the database from that user group. As in the previous level, each external schema is typically implemented using a representational data model, possibly based on an external schema design in a high-level conceptual data model.

The three-schema architecture is a convenient tool with which the user can visualize the schema levels in a database system. Most DBMSs do not separate the three levels completely and explicitly, but they support the three-schema architecture to some extent. Some older DBMSs may include physical-level details in the conceptual schema . The three-level ANSI architecture has an important place in database technology development because it clearly separates the users’ external level, the database’s conceptual level, and the internal storage level for designing a database. It is very much applicable in the design of DBMSs, even today. In most DBMSs that support user views, external schemas are specified in the same data model that describes the conceptual-level information

Data Independence Logical data independence is the capacity to change the conceptual schema without having to change external schemas or application programs. We may change the conceptual schema to expand the database (by adding a record type or data item), to change constraints, or to reduce the database (by removing a record type or data item). In the last case, external schemas that refer only to the remaining data should not be affected. Only the view definition and the mappings need to be changed in a DBMS that supports logical data independence. After the conceptual schema undergoes a logical reorganization, application programs that reference the external schema constructs must work as before. Changes to constraints can be applied to the conceptual schema without affecting the external schemas or application programs.

Physical data independence is the capacity to change the internal schema without having to change the conceptual schema. Hence , the external schemas need not be changed as well. Changes to the internal schema may be needed because some physical files were reorganized—for example, by creating additional access structures—to improve the performance of retrieval or update. If the same data as before remains in the database, we should not have to change the conceptual schema.

Database Languages and Interfaces Once the design of a database is completed and a DBMS is chosen to implement the database , the first step is to specify conceptual and internal schemas for the database and any mappings between the two. In many DBMSs where no strict separation of levels is maintained, one language, called the data definition language ( DDL ), is used by the DBA and by database designers to define both schemas. The DBMS will have a DDL compiler whose function is to process DDL statements in order to identify descriptions of the schema constructs and to store the schema description in the DBMS catalog.

In DBMSs where a clear separation is maintained between the conceptual and internal levels, the DDL is used to specify the conceptual schema only. Another language , the storage definition language ( SDL ), is used to specify the internal schema . The mappings between the two schemas may be specified in either one of these languages. In most relational DBMSs today, there is no specific language that performs the role of SDL. Instead , the internal schema is specified by a combination of functions, parameters, and specifications related to storage of files. These permit the DBA staff to control indexing choices and mapping of data to storage. For a true three-schema architecture, we would need a third language, the view definition language ( VDL ), to specify user views and their mappings to the conceptual schema , but in most DBMSs the DDL is used to define both conceptual and external schemas .

There are two main types of DMLs. A high-level or nonprocedural DML can be used on its own to specify complex database operations concisely. Many DBMSs allow high-level DML statements either to be entered interactively from a display monitor or terminal or to be embedded in a general-purpose programming language . A low-level or procedural DML must be embedded in a general-purpose programming language . This type of DML typically retrieves individual records or objects from the database and processes each separately. Therefore , it needs to use programming language constructs, such as looping, to retrieve and process each record from a set of records.

Low-level DMLs are also called record-at-a-time DMLs because of this property. High-level DMLs, such as SQL, can specify and retrieve many records in a single DML statement; therefore, they are called set-at-a-time or set-oriented DMLs. A query in a high-level DML often specifies which data to retrieve rather than how to retrieve it; therefore, such languages are also called declarative . Whenever DML commands, whether high level or low level, are embedded in a general-purpose programming language, that language is called the host language and the DML is called the data sublanguage. On the other hand, a high-level DML used in a standalone interactive manner is called a query language.

DBMS Interfaces Menu-based Interfaces for Web Clients or Browsing. These interfaces present the user with lists of options (called menus) that lead the user through the formulation of a request. Menus do away with the need to memorize the specific commands and syntax of a query language; rather, the query is composed step-by-step by picking options from a menu that is displayed by the system. Pull-down menus are a very popular technique in Web-based user interfaces . They are also often used in browsing interfaces , which allow a user to look through the contents of a database in an exploratory and unstructured manner.

Apps for Mobile Devices. These interfaces present mobile users with access to their data. For example, banking, reservations, and insurance companies, among many others, provide apps that allow users to access their data through a mobile phone or mobile device. The apps have built-in programmed interfaces that typically allow users to login using their account name and password; the apps then provide a limited menu of options for mobile access to the user data, as well as options such as paying bills (for banks) or making reservations (for reservation Web sites).

Forms-based Interfaces. A forms-based interface displays a form to each user. Users can fill out all of the form entries to insert new data, or they can fill out only certain entries, in which case the DBMS will retrieve matching data for the remaining entries . Forms are usually designed and programmed for naive users as interfaces to canned transactions. Many DBMSs have forms specification languages , which are special languages that help programmers specify such forms .

SQL*Forms is a form-based language that specifies queries using a form designed in conjunction with the relational database schema. Oracle Forms is a component of the Oracle product suite that provides an extensive set of features to design and build applications using forms. Some systems have utilities that define a form by letting the end user interactively construct a sample form on the screen . Graphical User Interfaces. A GUI typically displays a schema to the user in diagrammatic form . The user then can specify a query by manipulating the diagram. In many cases, GUIs utilize both menus and forms.

Natural Language Interfaces These interfaces accept requests written in English or some other language and attempt to understand them. A natural language interface usually has its own schema, which is similar to the database conceptual schema , as well as a dictionary of important words. The natural language interface refers to the words in its schema, as well as to the set of standard words in its dictionary, that are used to interpret the request. If the interpretation is successful, the interface generates a high-level query corresponding to the natural language request and submits it to the DBMS for processing; otherwise, a dialogue is started with the user to clarify the request.

Keyword-based Database Search . These are somewhat similar to Web search engines , which accept strings of natural language (like English or Spanish) words and match them with documents at specific sites (for local search engines) or Web pages on the Web at large (for engines like Google or Ask). They use predefined indexes on words and use ranking functions to retrieve and present resulting documents in a decreasing degree of match. Such “free form” textual query interfaces are not yet common in structured relational databases, although a research area called keyword-based querying has emerged recently for relational databases.

Speech Input and Output . Limited use of speech as an input query and speech as an answer to a question or result of a request is becoming commonplace. Applications with limited vocabularies, such as inquiries for telephone directory, flight arrival/departure , and credit card account information, are allowing speech for input and output to enable customers to access this information. The speech input is detected using a library of predefined words and used to set up the parameters that are supplied to the queries. For output, a similar conversion from text or numbers into speech takes place.

Interfaces for Parametric Users . Parametric users, such as bank tellers, often have a small set of operations that they must perform repeatedly. For example, a teller is able to use single function keys to invoke routine and repetitive transactions such as account deposits or withdrawals, or balance inquiries. Systems analysts and programmers design and implement a special interface for each known class of naive users. Usually a small set of abbreviated commands is included, with the goal of minimizing the number of keystrokes required for each request. Interfaces for the DBA. Most database systems contain privileged commands that can be used only by the DBA staff. These include commands for creating accounts , setting system parameters, granting account authorization, changing a schema , and reorganizing the storage structures of a database.

Data Modeling Using the Entity– Relationship (ER) Model

The first step shown is requirements collection and analysis . During this step, the database designers interview prospective database users to understand and document their data requirements . The result of this step is a concisely written set of users’ requirements. These requirements should be specified in as detailed and complete a form as possible. In parallel with specifying the data requirements, it is useful to specify the known functional requirements of the application. These consist of the user defined operations (or transactions) that will be applied to the database, including both retrievals and updates. In software design, it is common to use data flow diagrams, sequence diagrams, scenarios, and other techniques to specify functional requirements .

Once the requirements have been collected and analyzed, the next step is to create a conceptual schema for the database, using a high-level conceptual data model. This step is called conceptual design . The conceptual schema is a concise description of the data requirements of the users and includes detailed descriptions of the entity types , relationships, and constraints; these are expressed using the concepts provided by the high-level data model. Because these concepts do not include implementation details , they are usually easier to understand and can be used to communicate with nontechnical users. The high-level conceptual schema can also be used as a reference to ensure that all users’ data requirements are met and that the requirements do not conflict.

This approach enables database designers to concentrate on specifying the properties of the data, without being concerned with storage and implementation details, which makes it is easier to create a good conceptual database design . During or after the conceptual schema design, the basic data model operations can be used to specify the high-level user queries and operations identified during functional analysis. This also serves to confirm that the conceptual schema meets all the identified functional requirements. Modifications to the conceptual schema can be introduced if some functional requirements cannot be specified using the initial schema.

The next step in database design is the actual implementation of the database, using a commercial DBMS. Most current commercial DBMSs use an implementation data model—such as the relational (SQL) model—so the conceptual schema is transformed from the high-level data model into the implementation data model. This step is called logical design or data model mapping ; its result is a database schema in the implementation data model of the DBMS.

Data model mapping is often automated or semi-automated within the database design tools. The last step is the physical design phase, during which the internal storage structures, file organizations, indexes, access paths, and physical design parameters for the database files are specified. In parallel with these activities, application programs are designed and implemented as database transactions corresponding to the high-level transaction specifications .

Entity Types, Entity Sets, Attributes, and Keys

Two Entities

Entities and Attributes Entities and Their Attributes. The basic concept that the ER model represents is an entity , which is a thing or object in the real world with an independent existence. An entity may be an object with a physical existence (for example, a particular person, car , house, or employee) or it may be an object with a conceptual existence ( for instance , a company, a job, or a university course). Each entity has attributes —the particular properties that describe it.

Single-Valued versus Multivalued Attributes Most attributes have a single value for a particular entity; such attributes are called single-valued . For example, Age is a single-valued attribute of a person. In some cases an attribute can have a set of values for the same entity—for instance, a Colors attribute for a car, or a College_degrees attribute for a person. Cars with one color have a single value, whereas two-tone cars have two color values. Similarly, one person may not have any college degrees, another person may have one, and a third person may have two or more degrees; therefore, different people can have different numbers of values for the College_degrees attribute. Such attributes are called multivalued .

A multivalued attribute may have lower and upper bounds to constrain the number of values allowed for each individual entity. For example, the Colors attribute of a car may be restricted to have between one and two values, if we assume that a car can have two colors at most .

Stored versus Derived Attributes In some cases, two (or more) attribute values are related—for example, the Age and Birth_date attributes of a person. For a particular person entity, the value of Age can be determined from the current (today’s ) date and the value of that person’s Birth_date. The Age attribute is hence called a derived attribute and is said to be derivable from the Birth_date attribute, which is called a stored attribute . Some attribute values can be derived from related entities ; for example, an attribute Number_of_employees of a DEPARTMENT entity can be derived by counting the number of employees related to (working for) that department .

NULL Values In some cases, a particular entity may not have an applicable value for an attribute. For example, the Apartment_number attribute of an address applies only to addresses that are in apartment buildings and not to other types of residences, such as single-family homes. Similarly, a College_degrees attribute applies only to people with college degrees. For such situations, a special value called NULL is created . The meaning of the former type of NULL is not applicable, whereas the meaning of the latter is unknown. The unknown category of NULL can be further classified into two cases. The first case arises when it is known that the attribute value exists but is missing—for instance, if the Height attribute of a person is listed as NULL.

Complex Attributes Notice that, in general, composite and multivalued attributes can be nested arbitrarily. We can represent arbitrary nesting by grouping components of a composite attribute between parentheses ( ) and separating the components with commas, and by displaying multivalued attributes between braces { }. Such attributes are called complex attributes. For example, if a person can have more than one residence and each residence can have a single address and multiple phones, an attribute Address_phone for a person can be specified as Both Phone and Address are themselves composite attributes.

Entity Types, Entity Sets, Keys, and Value Sets Entity Types and Entity Sets. A database usually contains groups of entities that are similar. For example, a company employing hundreds of employees may want to store similar information concerning each of the employees. These employee entities share the same attributes, but each entity has its own value ( s ) for each attribute. An entity type defines a collection (or set ) of entities that have the same attributes. Each entity type in the database is described by its name and attributes.

D atabase at any point in time is called an entity set or entity collection ; the entity set is usually referred to using the same name as the entity type, even though they are two separate concepts. For example, EMPLOYEE refers to both a type of entity as well as the current collection of all employee entities in the database. It is now more common to give separate names to the entity type and entity collection; for example in object and object-relational data models

Key Attributes of an Entity Type An important constraint on the entities of an entity type is the key or uniqueness constraint on attributes. An entity type usually has one or more attributes whose values are distinct for each individual entity in the entity set. Such an attribute is called a key attribute , and its values can be used to identify each entity uniquely .

Sometimes several attributes together form a key, meaning that the combination of the attribute values must be distinct for each entity. If a set of attributes possesses this property, the proper way to represent this in the ER model that describe here is to define a composite attribute and designate it as a key attribute of the entity type . Specifying that an attribute is a key of an entity type means that the preceding uniqueness property must hold for every entity set of the entity type. Hence , it is a constraint that prohibits any two entities from having the same value for the key attribute at the same time . It is not the property of a particular entity set; rather, it is a constraint on any entity set of the entity type at any point in time.

Value Sets (Domains) of Attributes Each simple attribute of an entity type is associated with a value set (or domain of values), which specifies the set of values that may be assigned to that attribute for each individual entity. If the range of ages allowed for employees is between 16 and 70, we can specify the value set of the Age attribute of EMPLOYEE to be the set of integer numbers between 16 and 70. Similarly , we can specify the value set for the Name attribute to be the set of strings of alphabetic characters separated by blank characters, and so on. Value sets are not typically displayed in basic ER diagrams and are similar to the basic data types available in most programming languages, such as integer, string, Boolean, float , enumerated type, subrange, and so on.

Relationship Types, Relationship Sets, Roles, and Structural Constraints For example, the attribute Manager of DEPARTMENT refers to an employee who manages the department; the attribute Controlling_department of PROJECT refers to the department that controls the project; the attribute Supervisor of EMPLOYEE refers to another employee (the one who supervises this employee ); the attribute Department of EMPLOYEE refers to the department for which the employee works; and so on. In the ER model, these references should not be represented as attributes but as relationships .

A relationship type R among n entity types E 1, E 2, . . . , En defines a set of associations— or a relationship set —among entities from these entity types. Similar to the case of entity types and entity sets, a relationship type and its corresponding relationship set are customarily referred to by the same name , R . Mathematically , the relationship set R is a set of relationship instances ri, where each ri associates n individual entities ( e 1, e 2, . . . , en ), and each entity ej in ri is a member of entity set Ej , 1 ≤ j ≤ n . Hence, a relationship set is a mathematical relation on E 1, E 2, . . . , En ; alternatively , it can be defined as a subset of the Cartesian product of the entity sets E 1 × E 2 × . . . × En . Each of the entity types E 1, E 2, . . . , En is said to participate in the relationship type R ; similarly, each of the individual entities e 1, e 2, . . . , en is said to participate in the relationship instance ri = ( e 1, e 2, . . . , en ).

EMPLOYEE and DEPARTMENT, which associates each employee with the department for which the employee works. Each relationship instance in the relationship set WORKS_FOR associates one EMPLOYEE entity and one DEPARTMENT entity . Figure illustrates this example, where each relationship instance ri is shown connected to the EMPLOYEE and DEPARTMENT entities that participate in ri . In the mini-world represented by Figure, the employees e 1, e 3, and e 6 work for department d 1; the employees e 2 and e 4 work for department d 2; and the employees e 5 and e 7 work for department d 3.

Relationship Degree, Role Names, and Recursive Relationships Degree of a Relationship Type The degree of a relationship type is the number of participating entity types. Hence, the WORKS_FOR relationship is of degree two . A relationship type of degree two is called binary , and one of degree three is called ternary . An example of a ternary relationship is SUPPLY, shown in Figure where each relationship instance ri associates three entities—a supplier s , a part p , and a project j —whenever s supplies part p to project j . Relationships can generally be of any degree, but the ones most common are binary relationships. Higher-degree relationships are generally more complex than binary relationships

Relationships as Attributes One can think of an attribute called Department of the EMPLOYEE entity type, where the value of Department for each EMPLOYEE entity is (a reference to) the DEPARTMENT entity for which that employee works. Hence , the value set for this Department attribute is the set of all DEPARTMENT entities, which is the DEPARTMENT entity set. This is what we did in Figure when we specified the initial design of the entity type EMPLOYEE for the COMPANY database. However , when we think of a binary relationship as an attribute, we always have two options or two points of view.

In this example, the alternative point of view is to think of a multivalued attribute Employees of the entity type DEPARTMENT whose value for each DEPARTMENT entity is the set of EMPLOYEE entities who work for that department. The value set of this Employees attribute is the power set of the EMPLOYEE entity set. Either of these two attributes—Department of EMPLOYEE or Employees of DEPARTMENT—can represent the WORKS_FOR relationship type . Role Names and Recursive Relationships. Each entity type that participates in a relationship type plays a particular role in the relationship.

The role name signifies the role that a participating entity from the entity type plays in each relationship instance, and it helps to explain what the relationship means. For example, in the WORKS_FOR relationship type, EMPLOYEE plays the role of employee or worker and DEPARTMENT plays the role of department or employer . In such cases the role name becomes essential for distinguishing the meaning of the role that each participating entity plays . Such relationship types are called recursive relationships or self-referencing relationships .

Weak Entity Types Entity types that do not have key attributes of their own are called weak entity types . In contrast , regular entity types that do have a key attribute—which include all the examples discussed so far—are called strong entity types . Entities belonging to a weak entity type are identified by being related to specific entities from another entity type in combination with one of their attribute values. We call this other entity type the identifying or owner entity type ,10 and we call the relationship type that relates a weak entity typeto its owner the identifying relationship of the weak entity type.

A weak entity type always has a total participation constraint (existence dependency) with respect to its identifying relationship because a weak entity cannot be identified without an owner entity. However, not every existence dependency results in a weak entity type. For example, a DRIVER_LICENSE entity cannot exist unless it is related to a PERSON entity, even though it has its own key (License_number) and hence is not a weak entity.

A weak entity type normally has a partial key , which is the attribute that can uniquely identify weak entities that are related to the same owner entity . In our example , if we assume that no two dependents of the same employee ever have the same first name, the attribute Name of DEPENDENT is the partial key. In the worst case , a composite attribute of all the weak entity’s attributes will be the partial key.

ER Diagram Notations

Enhanced ER Model

The EER model includes all the modeling concepts of the ER model that were presented. In addition, it includes the concepts of subclass and superclass and the related concepts of specialization and generalization Another concept included in the EER model is that of a category or union type which is used to represent a collection of objects ( entities) that is the union of objects of different entity types. Associated with these concepts is the important mechanism of attribute and relationship inheritance . Unfortunately, no standard terminology exists for these concepts, so we use the most common terminology . Alternative terminology is given in footnotes. We also describe a diagrammatic technique for displaying these concepts when they arise in an EER schema . We call the resulting schema diagrams enhanced ER or EER diagrams .

For example, the entity type EMPLOYEE describes the type (that is, the attributes and relationships) of each employee entity, and also refers to the current set of EMPLOYEE entities in the COMPANY database. In many cases an entity type has numerous subgroupings or subtypes of its entities that are meaningful and need to be represented explicitly because of their significance to the database application . For example, the entities that are members of the EMPLOYEE entity type may be distinguished further into SECRETARY, ENGINEER, MANAGER, TECHNICIAN , SALARIED_EMPLOYEE, HOURLY_EMPLOYEE, and so on. The set or collection of entities in each of the latter groupings is a subset of the entities that belong to the EMPLOYEE entity set, meaning that every entity that is a member of one of these subgroupings is also an employee.

Specialization and Generalization Specialization is the process of defining a set of subclasses of an entity type; this entity type is called the superclass of the specialization. The set of subclasses that forms a specialization is defined on the basis of some distinguishing characteristic of the entities in the superclass. For example, the set of subclasses { SECRETARY, ENGINEER , TECHNICIAN} is a specialization of the superclass EMPLOYEE that distinguishes among employee entities based on the job type of each employee. We may have several specializations of the same entity type based on different distinguishing characteristics. For example, another specialization of the EMPLOYEE entity type may yield the set of subclasses { SALARIED_EMPLOYEE, HOURLY_EMPLOYEE }; this specialization distinguishes among employees based on the method of pay.

There are two main reasons for including class/subclass relationships and specializations. The first is that certain attributes may apply to some but not all entities of which these attributes apply. The members of the subclass may still share the majority of their attributes with the other members of the superclass. The SECRETARY subclass has the specific attribute Typing_speed, whereas the ENGINEER subclass has the specific attribute Eng_type, but SECRETARY and ENGINEER share their other inherited attributes from the EMPLOYEE entity type. The second reason for using subclasses is that some relationship types may be participated in only by entities that are members of the subclass. For example, if only HOURLY_EMPLOYEES can belong to a trade union, we can represent that fact by creating the subclass HOURLY_EMPLOYEE of EMPLOYEE and relating the subclass to an entity type TRADE_UNION via the BELONGS_TO relationship type, as illustrated in Figure

Generalization

Notice that the generalization process can be viewed as being functionally the inverse of the specialization process; we can view {CAR, TRUCK} as a specialization of VEHICLE rather than viewing VEHICLE as a generalization of CAR and TRUCK. A diagrammatic notation to distinguish between generalization and specialization is used in some design methodologies. An arrow pointing to the generalized superclass represents a generalization process, whereas arrows pointing to the specialized subclasses represent a specialization process. We will not use this notation because the decision as to which process was followed in a particular situation is often subjective.

Constraints and Characteristics of Specialization and Generalization Hierarchies If all subclasses in a specialization have their membership condition on the same attribute of the superclass, the specialization itself is called an attribute-defined specialization , and the attribute is called the defining attribute of the specialization. In this case, all the entities with the same value for the attribute belong to the same subclass. We display an attribute-defined specialization by placing the defining attribute name next to the arc from the circle to the superclass

Two other constraints may apply to a specialization. The first is the disjointness constraint , which specifies that the subclasses of the specialization must be disjoint sets . This means that an entity can be a member of at most one of the subclasses of the specialization. A specialization that is attribute-defined implies the dis jointness constraint (if the attribute used to define the membership predicate is single valued ). Figure illustrates this case, where the d in the circle stands for disjoint . The d notation also applies to user-defined subclasses of a specialization that must be disjoint , as illustrated by the specialization {HOURLY_EMPLOYEE, SALARIED_EMPLOYEE} in Figure

The second constraint on specialization is called the completeness (or totalness ) constraint , which may be total or partial. A total specialization constraint specifies that every entity in the superclass must be a member of at least one subclass in the specialization. For example, if every EMPLOYEE must be either an HOURLY_EMPLOYEE or a SALARIED_EMPLOYEE, then the specialization {HOURLY_EMPLOYEE , SALARIED_EMPLOYEE} in Figure 4.1 is a total specialization of EMPLOYEE. This is shown in EER diagrams by using a double line to connect the superclass to the circle. A single line is used to display a partial specialization , which allows an entity not to belong to any of the subclasses.

Utilizing Specialization and Generalization in Refining Conceptual Schemas In the specialization process, the database designers typically start with an entity type and then define subclasses of the entity type by successive specialization; that is, they repeatedly define more specific groupings of the entity type . For example, when designing the specialization lattice in Figure, we may first specify an entity type PERSON for a university database. Then we discover that three types of persons will be represented in the database: university employees, alumni , and students and we create the specialization {EMPLOYEE, ALUMNUS, STUDENT }. The overlapping constraint is chosen because a person may belong to more than one of the subclasses.

We specialize EMPLOYEE further into {STAFF , FACULTY, STUDENT_ASSISTANT}, and specialize STUDENT into {GRADUATE_STUDENT , UNDERGRADUATE_STUDENT}. Finally , we specialize STUDENT_ASSISTANT into {RESEARCH_ASSISTANT, TEACHING_ASSISTANT}. This process is called top-down conceptual refinement . So far, we have a hierarchy; then we realize that STUDENT_ASSISTANT is a shared subclass, since it is also a subclass of STUDENT, leading to the lattice.

The final design of hierarchies or lattices resulting from either process may be identical ; the only difference relates to the manner or order in which the schema super classes and subclasses were created during the design process. In practice, it is likely that a combination of the two processes is employed.
Tags