Intro to DBMS
and its Models
Name Roll Nos.
Babli Kumari 02
D Gokul 11
Shraddha Labde 23
Ravikant Sharma 46
Prabhat Sinha 48
Basic Definitions
•Data: Known facts that can be recorded and have an
implicit meaning.
•Database: A collection of related data.
•Mini-world: Some part of the real world about which
data is stored in a database. For example, student
grades and transcripts at a university.
•Database Management System (DBMS): A software
package or system that facilitates the creation and
maintenance of a computerized database.
•Database System: The DBMS software together with
the data itself. Sometimes, the applications are also
included.
DBMS
A DBMS consists of a group of
programs that manipulate the database and
provide an interface between the database,
the user of the database and other
application programs.
History of DBMS
•1960s – First DBMS designed by Charles Bachman at
GE: the Integrated Data Store (IDS).
•1970 – E. F. Codd introduced the relational data model. IBM's
hierarchical Information Management System (IMS) was already in use.
•1980 – Relational model became popular and
accepted as the main database paradigm. SQL, ANSI
SQL, etc.
•1980 to 1990 – New data models, powerful query
languages, etc. Popular vendors are Oracle, SQL
Server, IBM's DB2, Informix, etc.
File System vs. DBMS
A company has 500 GB of data on employees,
departments, products, sales, and so on.
Data is accessed concurrently by several employees.
Queries about the data must be answered quickly.
Changes made to the data by different users must be
applied consistently.
Access to certain parts of the data must be restricted.
•Data stored in operating system files
•Many drawbacks:
500 GB of main memory is not available to hold all the data, so
data must be stored on secondary storage devices.
Even if 500 GB of main memory were available, with 32-bit
addressing we cannot refer directly to more than 4 GB of
data.
Data redundancy and inconsistency:
multiple file formats, duplication of information in
different files.
A special program is needed to answer each query a user may ask.
Integrity problems
oIntegrity constraints (e.g. account balance > 0) become
“buried” in program code rather than being stated explicitly
oHard to add new constraints or change existing ones
We must protect the data from inconsistent changes made by
different users. If application programs have to handle
concurrency themselves, their complexity increases manifold.
A consistent state of the data must be restored if the system crashes
while changes are being made.
The OS provides only a password mechanism for security, which is not
flexible enough when users have permission to access only subsets of the
data.
DBMS Functionalities
Define a database: in terms of data types, structures and
constraints
Construct or load the database on a secondary storage
medium
Manipulate the database: querying, generating reports,
insertions, deletions and modifications to its content
Concurrent processing and sharing by a set of users and
programs, while keeping all data valid and consistent
Crash recovery
Data security and integrity
Data dictionary
Performance
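DBMS Architecture
Figure: DBMS architecture.
Front end: Web forms and the SQL interface (SQL I/f), which send SQL commands to the DBMS.
SQL engine: parser, optimizer, operator evaluator, plan executor.
Files & access methods.
Buffer manager.
Disk space manager.
Transaction (Tx) manager, lock manager and recovery manager.
Database: data files and the catalog.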
People Who Work with Databases
Database Implementers/ Designers
DBA
Application Programmers
End Users
End Users
Casual users
These are people who use the database occasionally.
Naive users
These are users who constantly query and update the
database.
E.g., reservation clerks of airlines, railways, hotels, etc.;
clerks at receiving stations of courier services, insurance
agencies, etc.
Sophisticated users
People who use the database for complex requirements.
E.g., engineers, scientists, business analysts…
Standalone users
People who maintain a database for personal use.
DBA
Managing resources
Creation of user accounts
Providing security and authorization
Resolving poor system response time
System Recovery
Tuning the Database
Database Languages
DDL – Data Definition Language
SDL – Storage Definition Language
VDL – View Definition Language
DML – Data Manipulation Language
(For data manipulations like
insertion, deletion, update,
retrieval, etc.)
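The sketch below illustrates the DDL/DML split. It uses Python and drives SQLite through the standard-library sqlite3 module. SQLite is only an assumed stand-in DBMS, and the student table and its rows are hypothetical. In SQL-based systems like this one, a single language provides both DDL (CREATE TABLE) and DML (INSERT, UPDATE, DELETE, SELECT). SQLite has no separate storage definition language.

```python
import sqlite3

# Assumption: SQLite (via Python's standard sqlite3 module) stands in for "the DBMS".
conn = sqlite3.connect(":memory:")

# DDL: describe the structure, i.e. a table, its typed columns and a constraint.
conn.execute("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        grade   TEXT
    )
""")

# DML: insertion, update, deletion and retrieval of the data itself.
conn.executemany(
    "INSERT INTO student (roll_no, name, grade) VALUES (?, ?, ?)",
    [(2, "Asha", "B"), (11, "Ravi", "A"), (23, "Meera", "C")],  # hypothetical rows
)
conn.execute("UPDATE student SET grade = 'A' WHERE roll_no = 2")
conn.execute("DELETE FROM student WHERE roll_no = 23")
rows = conn.execute("SELECT roll_no, name, grade FROM student ORDER BY roll_no").fetchall()
print(rows)  # [(2, 'Asha', 'A'), (11, 'Ravi', 'A')]
```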
Applications of DBMS
Various types of data: images, text, complex queries,
data mining, etc.
Enterprise Resource Planning (ERP)
Material Requirements Planning (MRP)
Databases in Web technologies
Banking: all transactions
Airlines: reservations, schedules
Universities: registration, grades
Current database trends:
Multimedia databases
Interactive video
Streaming data
Digital libraries
Databases touch all aspects of our lives.
Advantages of a DBMS
•Program-Data Independence
Insulation between programs and data: Allows changing data
storage structures and operations without having to change the
DBMS access programs.
•Efficient Data Access
The DBMS uses a variety of techniques (e.g. indexing) to store and retrieve
data efficiently
•Data Integrity & Security
Before inserting the salary of an employee, the DBMS can check
that the department budget is not exceeded
Enforces access controls that govern what data is visible to
different classes of users
•Data Administration
When several users share data, centralizing the administration
offers significant improvements
•Concurrent Access & Crash Recovery
The DBMS schedules concurrent access to the data in such a
manner that users can think of the data as being accessed by only
one user at a time
The DBMS protects users from the ill effects of system failures
•Reduced Application Development Time
Many important tasks are handled by the DBMS, so each application
does not have to implement them
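The salary/budget check mentioned under Data Integrity & Security can be enforced inside the DBMS rather than in each application. The sketch below assumes SQLite via Python's sqlite3 module and uses hypothetical dept and employee tables. A BEFORE INSERT trigger aborts any insert that would push a department's total salary above its budget.

```python
import sqlite3

# Assumption: SQLite via Python's sqlite3, table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (
        dept_no INTEGER PRIMARY KEY,
        budget  REAL NOT NULL
    );
    CREATE TABLE employee (
        emp_no  INTEGER PRIMARY KEY,
        dept_no INTEGER NOT NULL REFERENCES dept(dept_no),
        salary  REAL NOT NULL
    );
    -- Before each insert, abort if the total salary of the department would exceed its budget
    CREATE TRIGGER check_budget
    BEFORE INSERT ON employee
    BEGIN
        SELECT CASE
            WHEN (SELECT COALESCE(SUM(salary), 0) FROM employee WHERE dept_no = NEW.dept_no)
                 + NEW.salary > (SELECT budget FROM dept WHERE dept_no = NEW.dept_no)
            THEN RAISE(ABORT, 'department budget exceeded')
        END;
    END;
""")
conn.execute("INSERT INTO dept VALUES (1, 100000)")
conn.execute("INSERT INTO employee VALUES (100, 1, 60000)")  # total 60000: within budget

try:
    conn.execute("INSERT INTO employee VALUES (101, 1, 50000)")  # total 110000: over budget
    rejected = False
except sqlite3.DatabaseError as exc:  # the trigger's RAISE(ABORT, ...) surfaces as an error
    rejected = True
    print("rejected:", exc)

headcount = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(headcount)  # 1
```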
Data Model Overview
•What is a data model?
•Why are data models important?
•Basic data-modeling building blocks.
•What are business rules and how do they
influence database design?
•How did the major data models evolve?
•How can data models be classified by level of
abstraction?
What Is a Data Model?
•Data Model: A set of concepts to describe the
structure of a database, and certain constraints that
the database should obey.
•Data Model Operations: Operations for specifying
database retrievals and updates by referring to the
concepts of the data model. Operations on the data
model may include basic operations and user-defined
operations.
Data Model Basic Building Blocks
•Entity
–Anything about which data will be collected/stored
•Attribute
–Characteristic of an entity
•Relationship
–Describes an association among entities
•One-to-one (1:1) relationship
•One-to-many (1:M) relationship
•Many-to-many (M:N or M:M) relationship
•Constraint
–A restriction placed on the data
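The sketch below shows one way these building blocks map onto relational tables. It assumes SQLite via Python's sqlite3 module, and the department, student, course and enrollment names are hypothetical. The mapping works as follows:
•entities become tables
•attributes become columns
•a 1:M relationship becomes a foreign key on the "many" side
•an M:N relationship becomes a separate junction table
•constraints become NOT NULL, UNIQUE and CHECK clauses
A 1:1 relationship could similarly be a foreign key declared UNIQUE (not shown).

```python
import sqlite3

# Assumption: SQLite via Python's sqlite3, all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    -- Entity DEPARTMENT, its attributes become columns
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL UNIQUE           -- constraint: no two departments share a name
    );
    -- Entity STUDENT. 1:M, one department has many students (fk on the many side)
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        dept_id    INTEGER NOT NULL REFERENCES department(dept_id)
    );
    -- Entity COURSE, with a CHECK constraint on one attribute
    CREATE TABLE course (
        course_id INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        credits   INTEGER NOT NULL CHECK (credits > 0)
    );
    -- M:N, a student takes many courses and a course has many students,
    -- so the relationship gets its own junction table keyed by both sides
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(student_id),
        course_id  INTEGER REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
""")
conn.execute("INSERT INTO department VALUES (1, 'Computer Science')")
conn.executemany("INSERT INTO student VALUES (?, ?, 1)", [(1, "Asha"), (2, "Ravi")])
conn.executemany("INSERT INTO course VALUES (?, ?, ?)", [(10, "DBMS", 4), (11, "Networks", 3)])
conn.executemany("INSERT INTO enrollment VALUES (?, ?)", [(1, 10), (1, 11), (2, 10)])

# Follow the M:N relationship: how many students are enrolled in each course?
per_course = conn.execute("""
    SELECT c.title, COUNT(*) FROM course AS c
    JOIN enrollment AS e ON e.course_id = c.course_id
    GROUP BY c.title ORDER BY c.title
""").fetchall()
print(per_course)  # [('DBMS', 2), ('Networks', 1)]

try:
    conn.execute("INSERT INTO course VALUES (12, 'Seminar', 0)")  # violates CHECK (credits > 0)
    zero_credit_allowed = True
except sqlite3.IntegrityError:
    zero_credit_allowed = False
```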
Importance of Data Models
Data models
Representations, usually graphical, of complex
real-world data structures
Facilitate interaction among the designer, the
applications programmer and the end user
End-users have different views and needs
for data
A data model organizes data for various users
Business Rules
•Brief, precise and unambiguous descriptions of
policies, procedures or principles within the
organization
•Apply to any organization that stores and uses
data to generate information
•Description of operations that help to create
and enforce actions within that organization’s
environment
The Evolution of Data Models
Categories of data models
•Conceptual (high-level, semantic) data models:
Provide concepts that are close to the way many users
perceive data. (Also called entity-based or object-
based data models.)
•Physical (low-level, internal) data models: Provide
concepts that describe details of how data is stored in
the computer.
•Implementation (representational) data
models: Provide concepts that fall between the above
two, balancing user views with some computer
storage details.
Degrees of Data Abstraction
The American National Standards Institute (ANSI)
Standards Planning and Requirements Committee (SPARC)
developed standards in the 1970s: a framework for data
modeling based on degrees of data abstraction:
External
Conceptual
Internal
Physical
The External Model
Each end user’s view of the data environment
Modeler subdivides requirements and constraints into
functional (business unit) modules
These can be examined within the framework of their external
models
External Model – Advantages
•Easy to identify specific data required to support each
business unit’s operations
•Facilitates designer’s job by providing feedback
about the model’s adequacy
•Creation of external models helps to identify and
ensure security constraints in the database design
•Simplifies application program development
The Conceptual Model
•Global view of the entire database
•Representation of data as viewed by the entire organization
•Basis for identification and high-level description of main
data objects, avoiding details
•Software and hardware independent
–Independent of DBMS software
–Independent of hardware to be used
–Changes in either hardware or DBMS
software have no effect on the database
design at the conceptual level
•Most widely used conceptual model is
the Entity Relationship (ER) model
–Provides a relatively easily understood
macro-level view of the data environment
The Internal Model
•The database as “seen” by the DBMS
•Maps the conceptual model to the DBMS
•The internal schema depicts a specific representation of the internal model, using the constructs supported by the chosen DBMS
•Logical independence
–Can change the internal model without affecting the conceptual
model
The Physical Model
•Lowest level of abstraction
–Describes the way data are saved on
storage media such as disks or tapes
•Software and hardware dependent
–Requires database designers to have a
detailed knowledge of the hardware and
software used to implement the database
design
•Physical independence
–Can change the physical model without
affecting the internal model
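Physical independence can be observed in a small experiment. The sketch below assumes SQLite via Python's sqlite3 module and a hypothetical emp table. Adding an index changes how the data is accessed, which is visible through EXPLAIN QUERY PLAN. The table definition, the query text and its result stay the same. The exact plan wording varies between SQLite versions.

```python
import sqlite3

# Assumption: SQLite via Python's sqlite3, table, index and sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_no INTEGER PRIMARY KEY, last_name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(100, "Rao", "2A"), (101, "Khan", "2B"), (102, "Iyer", "2C")],
)

query = "SELECT emp_no FROM emp WHERE dept = ?"

def access_plan(sql, params):
    # EXPLAIN QUERY PLAN describes how SQLite will reach the rows; the text is the last column.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

rows_before = conn.execute(query, ("2B",)).fetchall()
plan_before = access_plan(query, ("2B",))

# Physical-level change: add an index (a new access path). The query text is unchanged.
conn.execute("CREATE INDEX idx_emp_dept ON emp(dept)")

rows_after = conn.execute(query, ("2B",)).fetchall()
plan_after = access_plan(query, ("2B",))

print(rows_before, rows_after)  # [(101,)] [(101,)]
print(plan_before)  # typically a full scan of emp
print(plan_after)   # typically a search that uses idx_emp_dept
```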
Degrees of Data Abstraction - Summary
Hierarchical DB model
Salient features
•Logically represented by an upside-down tree
•Each parent can have many children
•Each child has only one parent
•The top layer is perceived as the parent of the
segment directly beneath it.
•The segments below other segments are the children
of the segment above them.
Example
Parent segment (employees):
Emp No.  First Name  Last Name  Dept Num
100      John        Dougals    2A
101      Antony      Wanton     2B
102      Mary        Queen      2C
103      David       Moorey     2D
Child segment (equipment, linked to its parent through User Emp No.):
Serial No.    Type      User Emp No.
3009734-4     Computer  100
3-23-283742   Monitor   100
2-22-723423   Monitor   100
232342        Printer   100
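The tree structure of the example can be mimicked in a few lines of Python. This is an illustrative in-memory sketch, not how a hierarchical DBMS such as IMS actually stores data. The Segment class and its field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)
class Segment:
    """One record (segment) in a tree-shaped, hierarchical database."""
    data: dict
    parent: Optional["Segment"] = field(default=None, repr=False)  # exactly one parent (None at the root)
    children: List["Segment"] = field(default_factory=list, repr=False)

    def add_child(self, data: dict) -> "Segment":
        # A child can only be created under one parent, so it gets a single parent link.
        child = Segment(data, parent=self)
        self.children.append(child)
        return child

# Parent segment: the employee record with Emp No. 100 from the example.
employee = Segment({"emp_no": 100, "first_name": "John", "last_name": "Dougals", "dept": "2A"})

# Child segments: the equipment records whose User Emp No. is 100.
for serial_no, kind in [("3009734-4", "Computer"), ("3-23-283742", "Monitor"),
                        ("2-22-723423", "Monitor"), ("232342", "Printer")]:
    employee.add_child({"serial_no": serial_no, "type": kind})

# Access is navigational: start at the parent and walk down to its children.
monitors = [c.data["serial_no"] for c in employee.children if c.data["type"] == "Monitor"]

# Walking up is unambiguous because every child has exactly one parent.
owner = employee.children[0].parent.data["emp_no"]
print(monitors, owner)  # ['3-23-283742', '2-22-723423'] 100
```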
Advantages
•Conceptual simplicity
•Data independence
•Efficiency in dealing with a large database
Disadvantages
•Complex implementation
•Difficult to manage; lack of standards
•Lacks structural independence
•Applications programming and use complexity
•Implementation limitations (no M:N relationship)
Network DB model
•Developed in the mid-1960s as part of the work of CODASYL
(Conference on Data Systems Languages), which had earlier defined
the COBOL programming language (1960) and then proposed the network
model (1971)
•The network model has greater flexibility than the hierarchical
model for handling complex relationships
•The objective of the network model is to separate data structure from
physical storage and to eliminate unnecessary duplication of data, with its
associated errors and costs
•The network database model was created for three main
purposes:
- representing a complex data relationship more effectively
- improving database performance
- imposing a database standard
•A major characteristic of this database model is that it
comprises at least two record types: the owner and
the member.
•An owner is a record type equivalent to the parent type
in the hierarchical database model, and the member
record type resembles the child type in the hierarchical
model.
•The network database model uses a data management
language that defines data characteristics and the data
structure in order to manipulate the data.
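The owner/member idea can be sketched in Python as below. The Record and SetType classes and the STUDENT, COURSE and ENROLLMENT record types are hypothetical illustrations, not the CODASYL data definition language. One member record type can take part in two sets with different owners. The sketch uses this to show how the network model can relate students and courses many-to-many through a link record.

```python
class Record:
    """A record occurrence of some record type (record types here are hypothetical)."""
    def __init__(self, record_type, **fields):
        self.record_type = record_type
        self.fields = fields
        self.owners = {}  # set name -> owner record, for navigating from member to owner

class SetType:
    """A named owner-member set; each owner has its own list of member records."""
    def __init__(self, name):
        self.name = name
        self._members = {}  # owner record -> list of member records

    def connect(self, owner, member):
        # A member is always inserted under an owner: there are no owner-less members.
        self._members.setdefault(owner, []).append(member)
        member.owners[self.name] = owner

    def members_of(self, owner):
        return self._members.get(owner, [])

takes = SetType("STUDENT_ENROLLMENT")    # owner: STUDENT, member: ENROLLMENT
offered = SetType("COURSE_ENROLLMENT")   # owner: COURSE, member: ENROLLMENT

asha, ravi = Record("STUDENT", name="Asha"), Record("STUDENT", name="Ravi")
dbms, nets = Record("COURSE", title="DBMS"), Record("COURSE", title="Networks")

# Each ENROLLMENT record is a member of two sets (one owner in each); through these
# link records, students and courses are related many-to-many.
for student, course in [(asha, dbms), (asha, nets), (ravi, dbms)]:
    link = Record("ENROLLMENT")
    takes.connect(student, link)
    offered.connect(course, link)

def courses_of(student):
    # Navigate owner -> members in one set, then member -> owner in the other set.
    return [m.owners["COURSE_ENROLLMENT"].fields["title"] for m in takes.members_of(student)]

def students_of(course):
    return [m.owners["STUDENT_ENROLLMENT"].fields["name"] for m in offered.members_of(course)]

print(courses_of(asha), students_of(dbms))  # ['DBMS', 'Networks'] ['Asha', 'Ravi']
```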
•The network model contains logical information such as
connectivity relationships among nodes and links, directions
of links, and costs of nodes and links.
Key terms in network Model
•A node represents an object of interest.
•A link represents a relationship between two nodes. Within a
directed network, any link can be bidirected (that is, able to be
traversed either from the start node to the end node or from the
end node to the start node) or unidirected (that is, able to be
traversed only from the start node to the end node). Within an
undirected network, all links are bidirected.
•A path is an alternating sequence of nodes and links, beginning
and ending with nodes, and usually with no nodes and links
appearing more than once. (Repeating nodes and links within a
path are permitted, but are rare in most network applications.)
•A network is a set of nodes and links. A network is directed if
the links that it contains are directed, and a network is
undirected if the links that it contains are undirected.
•A logical network contains connectivity information but no
geometric information. This is the model used for network
analysis. A logical network can be treated as a directed graph
or undirected graph, depending on the application.
•Cost is a non-negative numeric attribute that can be
associated with links or nodes for computing the minimum-cost
path.
•Duration is a non-negative numeric attribute that can be
associated with links or nodes to specify a duration value for
the link or node.
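The minimum-cost path mentioned under Cost is commonly computed with Dijkstra's algorithm, which relies on costs being non-negative. Below is a minimal Python sketch. It assumes the logical network is given as a hypothetical adjacency dictionary mapping each node to a list of (neighbor, cost) pairs. A bidirected link would simply be listed in both directions.

```python
import heapq

def min_cost_path(links, start, end):
    """Dijkstra's algorithm over a directed logical network.

    links: dict mapping node -> list of (neighbor, cost) pairs, costs non-negative.
    Returns (total_cost, path), or (inf, []) if end cannot be reached.
    """
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry; a cheaper route was already settled
        done.add(node)
        if node == end:
            break
        for nbr, cost in links.get(node, []):
            new_cost = d + cost
            if new_cost < dist.get(nbr, float("inf")):
                dist[nbr] = new_cost
                prev[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    if end not in dist:
        return float("inf"), []
    path = [end]
    while path[-1] != start:  # walk the predecessor links back to the start
        path.append(prev[path[-1]])
    return dist[end], path[::-1]

# Hypothetical network: nodes A..D with link costs.
links = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2), ("D", 5)],
    "B": [("D", 1)],
}
cost, path = min_cost_path(links, "A", "D")
print(cost, path)  # 4 ['A', 'C', 'B', 'D']
```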
Network Hierarchy
•A network hierarchy enables us to represent a
network with multiple levels of abstraction by
assigning a hierarchy level to each node.
•The lowest (most detailed) level in the hierarchy is
level 1, and successive higher levels are numbered 2,
3, and so on.
•Nodes at adjacent levels of a network hierarchy have
parent-child relationships.
•Each node at the higher level can be the parent node
for one or more nodes at the lower level.
•Each node at the lower level can be a child node of
one node at the higher level.
•Sibling nodes are nodes that have the same parent
node.
•Links can also have parent-child relationships.
However, because links are not assigned to a
hierarchy level, there is not necessarily a relationship
between link parent-child relationships and network
hierarchy levels.
•Sibling links are links that have the same parent link.
Applications of Network Model
•In a typical road network, the intersections of roads are
nodes and the road segments between two intersections are
links. An important operation with a road network is to find
the path from a start point to an end point, minimizing either
the travel time or distance. There may be additional constraints
on the path computation, such as having the path go through a
particular landmark or avoid a particular intersection.
•In biochemistry, metabolic pathways are networks
involved in enzymatic reactions, while regulatory
pathways represent protein-protein interactions. In this
example, a pathway is a network; genes, proteins, and
chemical compounds are nodes; and reactions among nodes
are links.
•The subway network of any major city is probably
best modeled as a logical network, assuming that
precise spatial representation of the stops and track
lines is unimportant. Important operations with a train
network include finding all stations that can be
reached from a specified station, finding the number
of stops between two specified stations, and finding
the travel time between two stations.
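For the subway case, "stations reachable from a station" and "number of stops" can both be answered by a breadth-first search over the logical network. Below is a minimal Python sketch. It assumes the network is an undirected list of hypothetical track segments. Travel time would instead need link durations and a minimum-cost search.

```python
from collections import deque

def stops_from(segments, origin):
    """Breadth-first search over an undirected logical network.

    segments: iterable of (station_a, station_b) links.
    Returns {station: minimum number of stops from origin} for every reachable station.
    """
    adjacency = {}
    for a, b in segments:
        # Undirected network: every link can be traversed in both directions.
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    stops = {origin: 0}
    queue = deque([origin])
    while queue:
        station = queue.popleft()
        for nxt in adjacency.get(station, []):
            if nxt not in stops:  # first visit is along a fewest-stops route
                stops[nxt] = stops[station] + 1
                queue.append(nxt)
    return stops

# Hypothetical lines; Airport-Depot is not connected to the rest.
segments = [("Central", "Market"), ("Market", "Harbor"), ("Central", "Park"),
            ("Park", "Museum"), ("Airport", "Depot")]
stops = stops_from(segments, "Central")
print(stops)  # {'Central': 0, 'Market': 1, 'Park': 1, 'Harbor': 2, 'Museum': 2}
```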
Advantages
•Simplicity: The network model is conceptually
simple and easy to design.
•Ability to handle more relationship types: The
network model can handle one-to-many and
many-to-many relationships.
•Ease of data access: In network database
terminology, a relationship is a set. Each set
comprises two types of records: an owner record
and member records. In a network model, an
application can access an owner record and all the
member records within a set.
•Data Integrity: In a network model, no member can
exist without an owner. A user must therefore first
define the owner record and then the member record.
This ensures integrity.
•Data Independence: The network model draws a
clear line of demarcation between programs and the
complex physical storage details. The application
programs work independently of the data. Any
changes made in the data characteristics do not affect
the application program.
Disadvantages
•System Complexity: The structure of the network
model is very difficult to change. This type of system
is very complex.
•Lack of Structural Independence: This model
lacks structural independence. This database model
should be used when it is necessary to have a flexible
way of representing objects and their relationships.
Any changes made to the database structure require
the application programs to be modified before they
can access data.
Relational DB model
Why Study the Relational Model?
•Most widely used model.
–Vendors: IBM, Informix, Microsoft, Oracle,
Sybase, etc.
•“Legacy systems” in older models
–e.g., IBM’s IMS
•Recent competitor: object-oriented model
–ObjectStore, Versant, Ontos
–A synthesis emerging: object-relational model
•Informix Universal Server, UniSQL, O2,
Oracle, DB2
Definitions
Instance
A database instance, or an ‘instance’, is made up
of the background processes needed by the
database software.
These include a process monitor, session monitor, lock
monitor, etc., and they vary from one database
vendor to another.
What is a schema?
A SCHEMA IS NOT A DATABASE, AND A DATABASE IS
NOT A SCHEMA.
A database instance controls 0 or more databases.
A database contains 0 or more database application schemas.
A database application schema is a
set of database objects that apply to a specific application.
The objects are relational in nature and are related to each other within a database to serve a
specific functionality.
Examples: payroll, purchasing, calibration, trigger, etc.
A database application schema is not a database. Usually several schemas coexist in a
database.
A database application is the code base to manipulate and
retrieve the data stored in the database application schema.
Definitions Cont.
Primary Definitions
•Table: a set of columns that contain data. In the old
days, a table was called a file.
•Row: a set of columns from a table reflecting a
record.
•Index: an object that allows fast retrieval of table
rows. Every primary key and foreign key should have
an index for retrieval speed.
•Primary key, often designated pk: one or more
columns in a table that make each record unique.
•Foreign key, often designated fk, is a column
common to two tables that defines the
relationship between those two tables.
•Foreign keys are either mandatory or optional.
Mandatory forces a child to have a parent by creating
a not null column at the child. Optional allows a child
to exist without a parent, allowing a nullable column
at the child table (not a common circumstance).
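The sketch below shows primary keys, a mandatory foreign key and an optional foreign key. It assumes SQLite via Python's sqlite3 module; SQLite enforces foreign keys only after PRAGMA foreign_keys = ON. The dept and emp tables and the self-referencing mentor column are hypothetical.

```python
import sqlite3

# Assumption: SQLite via Python's sqlite3, tables and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE dept (
        dept_no TEXT PRIMARY KEY                              -- pk
    );
    CREATE TABLE emp (
        emp_no  INTEGER PRIMARY KEY,                          -- pk
        dept_no TEXT NOT NULL REFERENCES dept(dept_no),       -- mandatory fk
        mentor  INTEGER REFERENCES emp(emp_no)                -- optional fk (nullable)
    );
    -- Index the foreign key columns to speed up joins and lookups
    CREATE INDEX idx_emp_dept ON emp(dept_no);
    CREATE INDEX idx_emp_mentor ON emp(mentor);
""")
conn.execute("INSERT INTO dept VALUES ('2A')")
conn.execute("INSERT INTO emp VALUES (100, '2A', NULL)")  # optional fk left empty: allowed
conn.execute("INSERT INTO emp VALUES (101, '2A', 100)")   # both fks point at existing parents

failures = []
for row in [(102, None, None),   # mandatory fk missing: NOT NULL violation
            (103, "9Z", None)]:  # no parent department 9Z: foreign key violation
    try:
        conn.execute("INSERT INTO emp VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        failures.append(row[0])
print(failures)  # [102, 103]
```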
An Entity Relationship Diagram (ERD) is a pictorial
representation of the application schema.
Constraints are rules residing in the database’s
data dictionary governing relationships and
dictating the ways records are manipulated,
what is a legal move vs. what is an illegal
move.
These are of the utmost importance for a secure
and consistent set of data.
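This sketch shows a rule that lives in the data dictionary rather than in program code: the earlier "account balance > 0" constraint, declared as a CHECK clause. It assumes SQLite via Python's sqlite3 module. In SQLite, the stored table definition, including the CHECK clause, can be read back from the sqlite_master catalog table. The account table and the withdraw helper are hypothetical.

```python
import sqlite3

# Assumption: SQLite via Python's sqlite3, the account table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        acct_no INTEGER PRIMARY KEY,
        balance REAL NOT NULL CHECK (balance > 0)
    )
""")
conn.execute("INSERT INTO account VALUES (1, 500.0)")

def withdraw(acct_no, amount):
    """Hypothetical helper: returns False when the DBMS rejects the update as illegal."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE acct_no = ?", (amount, acct_no)
        )
        return True
    except sqlite3.IntegrityError:
        return False

ok_small = withdraw(1, 200.0)   # 500 -> 300: legal
ok_large = withdraw(1, 1000.0)  # would leave -700: rejected by the CHECK constraint
balance = conn.execute("SELECT balance FROM account WHERE acct_no = 1").fetchone()[0]
print(ok_small, ok_large, balance)  # True False 300.0

# The rule is stored with the table definition in SQLite's catalog, not in application code.
stored = conn.execute("SELECT sql FROM sqlite_master WHERE name = 'account'").fetchone()[0]
```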
Data Manipulation Language or DML: SQL
statements that insert, update or delete
data in a database.
Data Definition Language or DDL: SQL used to
create and modify database objects used in an
application schema.
A transaction is a logical unit of work that
contains one or more SQL statements.
A transaction is an atomic unit.
The effects of all the SQL statements in a
transaction are either all committed (applied to
the database) or all rolled back (undone from the
database), ensuring data consistency.
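Atomicity can be seen in a transfer between two accounts. This sketch assumes SQLite via Python's sqlite3 module, where using the connection as a context manager commits on success and rolls back if an exception escapes. The account table and the transfer function are hypothetical. The credit is applied before the debit on purpose: a failing debit then shows the already-applied credit being undone.

```python
import sqlite3

# Assumption: SQLite via Python's sqlite3, the account table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (acct_no INTEGER PRIMARY KEY, "
    "balance REAL NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()

def transfer(src, dst, amount):
    """Hypothetical helper: both updates commit together, or neither does."""
    try:
        with conn:  # commit if the block succeeds, roll back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE acct_no = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE acct_no = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # the debit violated CHECK (balance >= 0)
        return False

first = transfer(1, 2, 30.0)    # succeeds: balances become 70 and 80
second = transfer(1, 2, 500.0)  # debit fails, so the credit to account 2 is rolled back too
balances = dict(conn.execute("SELECT acct_no, balance FROM account"))
print(first, second, balances)  # True False {1: 70.0, 2: 80.0}
```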
•A view is a selective presentation of the
structure of, and data in, one or more tables (or
other views).
•A view is a ‘virtual table’, having predefined
columns and joins to one or more tables,
reflecting a specific facet of information.
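Below is a minimal sketch of a view, assuming SQLite via Python's sqlite3 module. The emp and dept tables reuse a few names from the hierarchical example, but the salaries and department names are made up. The view does three things:
•it predefines its columns and a join
•it omits the salary column
•being virtual, it reflects later changes to its base tables

```python
import sqlite3

# Assumption: SQLite via Python's sqlite3, salaries and department names are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_no TEXT PRIMARY KEY, dept_name TEXT NOT NULL);
    CREATE TABLE emp (
        emp_no     INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name  TEXT,
        salary     REAL,
        dept_no    TEXT REFERENCES dept(dept_no)
    );
    INSERT INTO dept VALUES ('2A', 'Sales'), ('2B', 'Research');
    INSERT INTO emp VALUES
        (100, 'John', 'Dougals', 52000, '2A'),
        (101, 'Antony', 'Wanton', 61000, '2B');
    -- The view fixes its columns and a join, and leaves salary out entirely
    CREATE VIEW emp_directory AS
        SELECT e.emp_no AS emp_no,
               e.first_name || ' ' || e.last_name AS full_name,
               d.dept_name AS dept_name
        FROM emp AS e JOIN dept AS d ON d.dept_no = e.dept_no;
""")
cursor = conn.execute("SELECT * FROM emp_directory ORDER BY emp_no")
cols = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(cols)  # ['emp_no', 'full_name', 'dept_name']
print(rows)  # [(100, 'John Dougals', 'Sales'), (101, 'Antony Wanton', 'Research')]

# The view stores no data of its own: a change to a base table shows up immediately.
conn.execute("UPDATE dept SET dept_name = 'R&D' WHERE dept_no = '2B'")
renamed = conn.execute("SELECT dept_name FROM emp_directory WHERE emp_no = 101").fetchone()[0]
print(renamed)  # R&D
```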
Database triggers are PL/SQL, Java, or C procedures
that run implicitly whenever a table or view is
modified or when some user actions or database
system actions occur.
Database triggers can be used in a variety of ways for
managing your database.
For example, they can automate data generation, audit data
modifications, enforce complex integrity constraints, and
customize complex security authorizations.
Trigger methodology differs between databases.
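The sketch below illustrates the audit use case. It uses SQLite's trigger syntax through Python's sqlite3 module; SQLite triggers are written in SQL, not in PL/SQL, Java or C. The emp and salary_audit tables are hypothetical. The trigger fires implicitly after each salary update that actually changes the value, and it logs the old and new salaries.

```python
import sqlite3

# Assumption: SQLite via Python's sqlite3, both tables are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (emp_no INTEGER PRIMARY KEY, salary REAL NOT NULL);
    CREATE TABLE salary_audit (
        emp_no     INTEGER,
        old_salary REAL,
        new_salary REAL,
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP
    );
    -- Runs implicitly after every salary update, no application code calls it
    CREATE TRIGGER audit_salary
    AFTER UPDATE OF salary ON emp
    FOR EACH ROW
    WHEN OLD.salary <> NEW.salary
    BEGIN
        INSERT INTO salary_audit (emp_no, old_salary, new_salary)
        VALUES (OLD.emp_no, OLD.salary, NEW.salary);
    END;
""")
conn.execute("INSERT INTO emp VALUES (100, 50000)")
conn.execute("UPDATE emp SET salary = 55000 WHERE emp_no = 100")
conn.execute("UPDATE emp SET salary = 55000 WHERE emp_no = 100")  # no change: not logged
log = conn.execute("SELECT emp_no, old_salary, new_salary FROM salary_audit").fetchall()
print(log)  # [(100, 50000.0, 55000.0)]
```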
Terminology
Table name (relation name): Products
Attribute names: Name, Price, Category, Manufacturer
Each row is a tuple (also called a row or record).
Products:
Name          Price     Category      Manufacturer
gizmo         $19.99    gadgets       GizmoWorks
Power gizmo   $29.99    gadgets       GizmoWorks
SingleTouch   $149.99   photography   Canon
MultiTouch    $203.99   household     Hitachi
Relational DB model
A relation is subject to the following rules
•Relation (file, table) is a two-dimensional table.
•Attribute (i.e. field or data item) is a column in the
table.
•Each column in the table has a unique name within
that table.
•Each column is homogeneous. Thus the entries in any
column are all of the same type (e.g. age, name,
employee-number, etc.).
•Each column has a domain, the set of possible values
that can appear in that column.
•A Tuple (i.e. record) is a row in the table.
•The order of the rows and columns is not important.
•Values of a row all relate to something or a portion of a thing.
•Repeating groups (collections of logically related attributes
that occur multiple times within one record occurrence) are not
allowed.
•Duplicate rows are not allowed (candidate keys are designed to
prevent this).
•Cells must be single-valued (but can be variable length). Single-valued
means the following:
–Cannot contain multiple values such as 'A1,B2,C3'.
–Cannot contain combined values such as 'ABC-XYZ' where
'ABC' means one thing and 'XYZ' another
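Several of these rules follow from a relation being a set of tuples. The Python sketch below is a simplification that ignores domains and types. Each tuple is a set of (attribute, value) pairs, and the relation is a set of tuples, so row order, column order and duplicate rows make no difference. Note that SQL tables themselves may hold duplicate rows unless a key prevents them. The relation helper is hypothetical.

```python
def relation(rows):
    """Hypothetical helper: a relation as a set of tuples, each a set of (attribute, value) pairs."""
    return frozenset(frozenset(row.items()) for row in rows)

products_a = relation([
    {"Name": "gizmo", "Price": 19.99, "Category": "gadgets", "Manufacturer": "GizmoWorks"},
    {"Name": "SingleTouch", "Price": 149.99, "Category": "photography", "Manufacturer": "Canon"},
])
products_b = relation([
    # Same rows, but a different row order, a different column order and one duplicate row.
    {"Manufacturer": "Canon", "Category": "photography", "Price": 149.99, "Name": "SingleTouch"},
    {"Price": 19.99, "Name": "gizmo", "Manufacturer": "GizmoWorks", "Category": "gadgets"},
    {"Name": "gizmo", "Price": 19.99, "Category": "gadgets", "Manufacturer": "GizmoWorks"},
])
print(products_a == products_b, len(products_b))  # True 2

# A list-valued cell is not a single value; in this sketch it happens to be rejected
# because Python cannot hash a list.
try:
    relation([{"Name": "gizmo", "Category": ["gadgets", "toys"]}])
    multivalued_ok = True
except TypeError:
    multivalued_ok = False
```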