DATABASE MANAGEMENT SYSTEMS DR. SHEFALI ARORA CHOUHAN ASSISTANT PROFESSOR DEPT OF CSE
COURSE OUTCOMES To understand the different issues involved in the design and implementation of a database system. To study the physical and logical database designs, database modeling, relational, hierarchical, and network models To understand and use data manipulation language to query, update, and manage a database To develop an understanding of essential DBMS concepts such as: database security, integrity, concurrency, distributed database, and intelligent database, Client/Server (Database Server), Data Warehousing. To design and build a simple database system and demonstrate competence with the fundamental tasks involved with modeling, designing, and implementing a DBMS.
SYLLABUS
INTRODUCTION TO DATABASES A database is a collection of related data.1 By data ,we mean known facts that can be recorded and that have implicit meaning. For example, consider the names, telephone numbers , and addresses of the people you know A database is designed, built, and populated with data for a specific purpose. It has an intended group of users and some preconceived applications in which these users are interested.
Example Amazon has an estimated 1,000,000,000 gigabytes of data hosted across more than 1,400,000 servers . Amazon's marketplace features over 600 million products, with approximately 12 million sold directly by Amazon
Database Management Systems A database management system (DBMS) is a collection of programs that enables users to create and maintain a database . DBMS is a general-purpose software system that facilitates the processes of defining, constructing, manipulating, and sharing databases among various users and applications. The database definition or descriptive information is also stored by the DBMS in the form of a database catalog or dictionary; it is called meta-data . Manipulating a database includes functions such as querying the database to retrieve specific data
History of Databases
Facts Edgar F. Codd introduced the relational database model in 1970. He suggested using SQL for querying and arranging data into tables, or relations. The hierarchical model and the network model, two early database models, both organized data trees or graphs formats. First adopted by the American National Standards Institute (ANSI), SQL rose to prominence as the industry standard query language for relational databases in the 1980s.
Traditional File Systems vs DBMS In traditional file processing, the structure of data files is embedded in the application programs , so any changes to the structure of a file may require changing all programs DBMS access programs do not require such changes in most cases. The structure of data files is stored in the DBMS catalog separately from the access programs
Traditional File Systems vs DBMS In traditional file processing, data definition is typically part of the application programs themselves. Hence, these programs are constrained to work with only one specific database, whose structure is declared in the application programs. Whereas file-processing software can access only specific databases, DBMS software can access diverse databases by extracting the database definitions from the catalog and using these definitions.
Popular DBMS Softwares
Why use DBMS? Avoid Redundancy Avoid unauthorized access Backup & Recovery Integrity Constraints Complex relationships between data Data Sharing Multiple Views of Data
The People Managing Databases
Characteristics of DBMS Self-describing nature of a database system: A DBMS catalog stores the description of a particular database (e.g. data structures, types, and constraints) The description is called meta-data Insulation between programs and data called program-data independence. Allows changing data structures and storage organization without having to change the DBMS access programs.
Characteristics of DBMS Data Abstraction - A data model is used to hide storage details and present the users with a conceptual view of the database . Programs refer to the data model constructs rather than data storage details Support of multiple views of the data Each user may see a different view of the database, which describes only the data of interest to that user.
Database vs Information Retrieval Traditionally, database technology applies to structured and formatted data that arises in routine applications in government, business, and industry. Information Retrieval ( IR) deals with books, manuscripts, and various forms of library-based articles. Data is indexed, cataloged, and annotated using keywords. IR is concerned with searching for material based on these keywords, and with the many problems dealing with document processing and free-form text processing .
DBMS may not be suitable for.. Scenarios with High initial investment in hardware, software, and training Simple , well-defined database applications that are not expected to change at all Stringent , real-time requirements for some application programs that may not be met because of DBMS overhead
Common Terminology Client: typically designed so that it will run on a user workstation or personal computer. Server: typically handles data storage, access, search, and other functions An entity represents a real-world object or concept, such as an employee or a project from the miniworld that is described in the database. An attribute represents some property of interest that further describes an entity, such as the employee’s name or salary . A relationship among two or more entities represents an association among the entities, for example, a works-on relationship between an employee and a project. Entity-Relationship model —a popular high-level conceptual data model. Chapter 8 describes additional abstractions used for advanced Data models: widely used relational data model , as well as the so-called legacy data models—the network and hierarchical models —that have been widely used in the past.
Schema: The description of a database is called the database . A schema diagram displays only some aspects of a schema, such as the names of record types and data items, and some types of constraints. The data in the database at a particular moment in time is called a database state or snapshot . It is also called the current set of occurrences or instances Data independence : the capacity to change the schema at one level of a database system without having to change the schema at the next higher level
Three Schema / Three tier architecture
Three schema architecture The internal level has an internal schema , uses a physical data model and describes the complete details of data storage and access paths for the database . The conceptual level has a conceptual schema , which describes the structure of the whole database for a community of users. concentrates on describing entities , data types, relationships, user operations, and constraints. The external or view level includes a number of external schemas or user views . Each external schema describes the part of the database that a particular user group is interested in and hides the rest of the database from that user group.
Logical data independence : capacity to change the conceptual schema without having to change external schemas or application programs. Eg . expand the database (by adding a record type or data item), to change constraints, or to reduce the database (by removing a record type or data item ). Physical data independence :capacity to change the internal schema without having to change the conceptual schema. Eg . Changes to the internal schema may be needed because some physical files were reorganized—for example, by creating additional access structures—to improve the performance of retrieval or update .
DBMS Languages D ata definition language ( DDL ), is used by the DBA and by database designers to define both schemas. The DBMS will have a DDL compiler whose function is to process DDL statements in order to identif descriptions of the schema constructs and to store the schema description in the DBMS catalog Storage definition language ( SDL ), is used to specify the internal schema. The mappings between the two schemas may be specified in either one of these languages. V iew definition language ( VDL ), is used to specify user views and their mappings to the conceptual schema DDL is used to define both conceptual and external schemas . Typical manipulations include retrieval, insertion, deletion, and modification of the data. The DBMS provides a set of operations or a language called the data manipulation language( DML ) for these purposes.
Components of DBMS Access to the disk is controlled primarily by the operating system ( OS ), which schedules disk read/write . Many DBMSs have their own buffer management module to schedule disk read/write, because this has a considerable effect on performance . The DDL compiler processes schema definitions, specified in the DDL, and stores descriptions of the schemas (meta-data) in the DBMS catalog Casual users and persons with occasional need for information from the database interact using some form of interface, which we call the interactive query interface
Components of DBMS The precompiler extracts DML commands from an application program written in a host programming language. These commands are sent to the DML compiler for compilation into object code for database access . The rest of the program is sent to the host language compiler . It is now common to have the client program that accesses the DBMS running on a separate computer from the computer on which the database resides. The former is called the client computer running a DBMS client software and the latter is called the database server .
Components of DBMS In addition to possessing the software modules just described, most DBMSs have database utilities that help the DBA manage the database system. Eg . A loading utility is used to load existing data files—such as text files or sequential files For the hierarchical DBMS ,the vendors or third-party companies are making a variety of conversion tools available (e.g ., Cincom’s SUPRA Server SQL) to transform data into the relational model . A backup utility creates a backup copy of the database, usually by dumping the entire database onto tape or other mass storage medium . Data dictionary (or data repository ) stores catalog information about schemas and constraints, other information, such as design decisions, usage standards, application program descriptions, and user information.
Centralized DBMS vs Client/Server Architecture C entralized DBMS : All DBMS functionality, application program execution, and user interface processing are carried out on one machine . The client/server architecture was developed to deal with computing environments in which a large number of PCs, workstations, file servers, printers, data base servers, Web servers, e-mail servers, and other software and equipment are connected via a network.
Client/Server architecture The concept of client/server architecture assumes an underlying framework that consists of many PCs and workstations as well as a smaller number of mainframe machines, connected via LANs and other types of computer networks. A client in this framework is typically a user machine that provides user interface capabilities and local processing. When a client requires access to additional functionality— such as database access—that does not exist at that machine, it connects to a server that provides the needed functionality. A server is a system containing both hardware and software that can provide services to the client machines, such as file access, printing, archiving, or database access.
2-Tier Systems In such an architecture, the server is often called a query server or transaction server because it provides these two functionalities . In an RDBMS, the server is also often called an SQL server . The user interface programs and application programs can run on the client side. When DBMS access is required, the program establishes a connection to the DBMS (which is on the server side); once the connection is created, the client program can communicate with the DBMS. A standard called Open Database Connectivity ( ODBC ) provides an application programming interface ( API ), which allows client-side programs to call the DBMS, as long as both client and server machines have the necessary software installed. Most DBMS vendors provide ODBC drivers for their systems The architectures described here are called two-tier architectures because the software components are distributed over two systems: client and server
3-Tier Systems
Basic SQL commands
ALTER TABLE Customers DROP COLUMN Email; ALTER TABLE Persons ALTER COLUMN DateOfBirth year; ALTER TABLE table_name RENAME COLUMN old_name to new_name ;
ER Modelling Entity-Relationship ( ER ) model is a popular high-level conceptual data model. This model and its variations are frequently used for the conceptual design of database applications, and many database design tools employ its concepts. The diagrammatic notation associated with the ER model is known as ER diagrams
Steps for Database Design Collect the functional requirements of the application. These consist of the user-defined operations (or transactions ) that will be applied to the database, including both retrievals and updates. In software design, it is common to use data flow diagrams , sequence diagrams , scenarios , and other techniques to specify functional requirements . Once the requirements have been collected and analyzed, the next step is to create a conceptual schema for the database, using a high-level conceptual data model. This step is called conceptual design .
Components of ER models The basic object that the ER model represents is an entity , which is a thing in the real world with an independent existence . An entity may be an object with a physical existence (for example, a particular person, car, house , or employee) or it may be an object with a conceptual existence (for instance, a company, a job, or a university course ). Each entity has attributes —the particular properties that describe it. Composite attributes can be divided into smaller subparts, which represent more basic attributes with independent meanings. For example, the Address attribute of the EMPLOYEE can be subdivided into Street_address , City, State, and Zip,3 with the values ‘ 2311 Kirby’, ‘Houston’, ‘Texas’, and ‘77001.’ Attributes that are not divisible are called simple or atomic attributes .
Most attributes have a single value for a particular entity; such attributes are called single-valued . For example, Age is a single-valued attribute of a person. Similarly , one person may not have a college degree, another person may have one, and a third person may have two or more degrees; therefore, different people can have different numbers of values for the College_degrees attribute. Such attributes are called multivalued .
The Age attribute is called a derived attribute and is said to be derivable from the Birth_date attribute, which is called a stored attribute . Some attribute values can be derived from In some cases, a particular entity may not have an applicable value for an attribute. A College_degrees attribute applies only to people with college degrees. For such situations, a special value called NULL is created . Complex attributes . For example, if a person can have more than one residence and each residence can have a single address and multiple phones , an attribute Address_phone for a person can be specified . Both Phone and Address are themselves composite attributes.
Key attributes are those attributes that can uniquely identify the entity in the entity set. Example : Roll-No is the key attribute because it can uniquely identify the student. This attribute can take NULL value when entity does not have value for it. Example – The ‘Net Banking Active Bin’ attribute gives weather particular customer having net banking facility activated or not activated.
ER Diagrams and Relationships
Total Vs Partial Participation Participation Constraints in database management refer to rules that determine the minimum and maximum participation of entities or relationships in a given relationship set. While partial participation permits discretionary involvement, total participation requires every entity in one set to take part in a relationship in another set. Entire participation, sometimes known as required participation, denotes the requirement that each individual in a group participate in an activity pertaining to another group. It's similar to saying that in order to belong to one group, you must somehow be associated with another.
TYPES OF RELATIONSHIPS Cardinality defines the possible number of occurrences in one entity which is associated with the number of occurrences in another . A one-to-many relationship refers to the relationship between two entities X and Y in which an instance of X may be linked to many instances of Y, but an instance of Y is linked to only one instance of X A one-to-one relationship is mostly used to split an entity in two to provide information concisely and make it more understandable A many-to-many relationship refers to the relationship between two entities X and Y in which X may be linked to many instances of Y and vice versa
CREATING A DATABASE FROM RELATIONSHIPS For a 1: N relationship, the key is mostly migrated to the N side of the relationship. For a 1: 1 relationship, the key can be migrated to any side of the relationship For a M:N relationship, the keys of both participating entities are combined into a table
DEGREES OF RELATIONSHIP Unary (Degree 1) In this type of relationship, both the associating entity types are the same. So, we can say that unary relationships exist when both entity types are the same and we call them the degree of relationship is 1 In a Binary relationship, there are two types of entity associates. So, we can say that a Binary relationship exists when there are two types of entity and we call them a degree of relationship is 2
In the Ternary relationship, there are three types of entity associates. So, we can say that a Ternary relationship exists when there are three types of entity and we call them a degree of relationship is 3 . In the N- ary relationship, there are n types of entity that associates. So, we can say that an N- ary relationship exists when there are n types of entities.
Employee Department Manages (0,1) (1,1) Employee Department Works_for (1,1) (1,1)
KEYS A key is an attribute or set of attributes Primary key uniquely identifies tuples in a table Eg Aadhar card, Roll no Other keys are known as alternative keys Primary key is chosen from a set of candidate keys
A super key is a combination of all possible attributes that uniquely identify tuples A table can have many super keys A super key ma have additional attributes not required for unique identity
Candidate Key A candidate key is an attribute or set of attributes derived from super key Also known as minimal super key Defined as a distinct set of attributes from which primary key is selected Cannot have NULL values Primary key is chosen from among candidate keys by database designer
Other Terms A domain D is a set of atomic values. By atomic we mean that each value in the domain is indivisible as far as the formal relational model is concerned . A relation schema R , denoted by R ( A 1, A 2, ...,A n ), is made up of a relation name R and a list of attributes, A 1, A 2, ..., An . Each attribute Ai is the name of a role played by some domain D in the relation schema R . A relation (or relation state ) r of the relation schema R ( A 1, A 2, ..., An ), also denoted by r ( R ), is a set of n -tuples r = { t 1, t 2, ..., tm }. Each n -tuple t is an ordered list of n values t =<v 1 , v 2 , ..., vn > , where each value vi, 1 ≤ i ≤ n , is an element of dom (Ai ) or is a special NULL value.
The Cartesian product specifies all possible combinations of values from the underlying domains . Hence, if we denote the total number of values Cardinality represents the number of tuples The lowercase letters q, r, s denote relation states . The letters t, u, v denote tuples. In general, the name of a relation schema such as STUDENT also indicates the current set of tuples in that relation—the current relation state—whereas STUDENT(Name , Ssn , ...) refers only to the relation schema. An attribute A can be qualified with the relation name R to which it belongs by using the dot notation R.A—for example, STUDENT.Name or STUDENT.Age . This is because the same name may be used for two attributes in different relations. However, all attribute names in a particular relation must be distinct .
Constraints Domain constraints specify that within each tuple, the value of each attribute A must be an atomic value from the domain dom ( A ) The entity integrity constraint states that no primary key value can be NULL. This is because the primary key value is used to identify individual tuples in a relation. Having NULL values for the primary key implies that we cannot identify some tuples .
Referential Integrity Constraint The conditions for a foreign key, given below, specify a referential integrity constraint between the two relation schemas R 1 and R 2. A set of attributes FK in relation schema R 1 is a foreign key of R 1 that references relation R 2 if it satisfies the following rules: 1. The attributes in FK have the same domain(s) as the primary key attributes PK of R 2; the attributes FK are said to reference or refer to the relation R 2. 2. A value of FK in a tuple t 1 of the current state r 1( R 1) either occurs as a value of PK for some tuple t 2 in the current state r 2( R 2) or is NULL .
Candidate Keys
Relational Algebra Two mathematical Query Languages form the basis for “real” languages (e.g. SQL), and for implementation: Relational Algebra: More operational, very useful for representing execution plans. Relational Calculus: Lets users describe what they want, rather than how to compute it. (Nonoperational, declarative.)
Relational Algebra A query is applied to relation instances, and the result of a query is also a relation instance. Schemas of input relations for a query are fixed (but query will run regardless of instance!) The schema for the result of a given query is also fixed
Operations Selection - Selects a subset of rows from relation. Projection Deletes unwanted columns from relation. Cross-product Allows us to combine two relations. Set-difference Tuples in reln . 1, but not in reln . 2. Union Tuples in reln . 1 and in reln . 2.
Projection Deletes attributes that are not in projection list. Schema of result contains exactly the fields in the projection list, with the same names that they had in the (only) input relation. Projection operator has to eliminate duplicates!
Selection Selects rows that satisfy selection condition. V No duplicates in result Schema of result identical to schema of ( only) input relation. Result relation can be the input for another relational algebra operation!
Union , Intersection and Difference
Cross Product Each row of S1 is paired with each row of R1. Result schema has one field per field of S1 and R1, with field names `inherited’ if possible. Conflict : Both S1 and R1 have a field called sid
Join Result schema same as that of cross-product. Fewer tuples than cross-product, might be able to compute more efficiently Sometimes called a theta-join .
Rename We can also define a formal RENAME operation—which can rename either the relation name or the attribute names, or both—as a unary operator . The general RENAME operation when applied to a relation R of degree n is denoted by any of the following three forms: ρ S ( B 1, B 2, ..., Bn )( R ) or ρ S ( R ) or ρ( B 1, B 2, ..., Bn )( R ) where the symbol ρ (rho) is used to denote the RENAME operator, S is the new relation name , and B 1, B 2, ..., Bn are the new attribute names.
What will the following give as output?
Types of Join- Theta Join & Equi Join A general case of join operation Also called conditional join Denoted by symbol theta Conditional operators such as =,<,> etc can be used in theta join Only = is used in equi join
Natural Join Special case of Equi Join Denoted by Both tables should have same attribute Name and domain should be the same in both tables Does not use comparison operator Removes redundant attributes
Divsion R1/R2= Tuples of R1 associated with R2 Possible if R2 is a subset of R1
Practice on Relational Algebra Consider the following schema: Suppliers ( sid : integer, sname : string, address : string) Parts ( pid : integer, pname : string, color : string) Catalog ( sid : integer, pid : integer, cost : real) The key fields are underlined and domain of each field is listed after the field name .
Questions Find the name of suppliers who supply some red parts Find the sids of suppliers who supply some red or green parts Find the sids of suppliers who supply some red part or are at 221 packer Ave Find the sids of suppliers who supply some red part and some green part Find the sids of suppliers who supply every part Find the sids of suppliers who supply every red part Find the sids of suppliers who supply every red or green part
For every project located in ‘Stafford’, list the project number, the controlling department number, and the department manager’s last name, address , and birth date.
Find the names of employees who work on all the projects controlled by department number 5
Retrieve the names of employees who have no dependents
Make a list of project numbers for projects that involve an employee whose last name is ‘Smith’, either as a worker or as a manager of the department that controls the project.