Recommended Book
  Database Systems: A Practical Approach to Design, Implementation, and Management, 6th Edition, by Thomas Connolly and Carolyn Begg
Reference Material
  1. Database Systems: The Complete Book, 2nd Edition, by Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom
  2. Database System Concepts, 6th Edition, by Avi Silberschatz, Henry F. Korth, and S. Sudarshan
  3. Database Management Systems, 3rd Edition, by Raghu Ramakrishnan and Johannes Gehrke
Chapter 1 Introduction to Databases
Objectives
  Introduction to database systems
  Why databases are needed
  History of databases
  Types of databases
  Database users
  DBMS
Why Study Databases?
  Databases are useful
    Many computing applications deal with large amounts of information
    Database systems give a set of tools for storing, searching and managing this information
  Databases in CS
    Databases are a ‘core topic’ in computer science
    Basic concepts and skills with database systems are part of the skill set you will be assumed to have as a CS graduate
What is a Database?
  “A set of information held in a computer” (Oxford English Dictionary)
  “One or more large structured sets of persistent data, usually associated with software to update and query the data” (Free On-Line Dictionary of Computing)
  “A collection of data arranged for ease and speed of search and retrieval” (Dictionary.com)
Database Definition
  A collection of self-describing and integrated data files
    System catalog
    Metadata
    Data dictionary
    Data abstraction
Databases
  Examples: web indexes, library catalogues, medical records, bank accounts, stock control, personnel systems, product catalogues, telephone directories, train timetables, airline bookings, credit card details, student records, customer histories, stock market prices, discussion boards, and so on
File-Based Systems
  An early attempt to computerize the manual filing system
  A collection of application programs that perform services for the end users (e.g. reports)
  Each program defines and manages its own data
Manual Filing Systems
  Work well while the number of items to be stored is small
  Work adequately for a large number of items only when all that is needed is storage and retrieval
File-Based Processing
Limitations of File-Based Approach
  Separation and isolation of data
    Each program maintains its own set of data.
    Users of one program may be unaware of potentially useful data held by other programs.
    For example, producing a list of all houses that match the requirements of the clients needs data held by more than one program.
  Duplication of data
    Decentralized approach taken by each department.
    Same data is held by different programs.
    Wasted space and potentially different values and/or different formats for the same item.
Limitations of File-Based Approach (continued)
  Data dependence
    File structure is defined in the program code.
  Incompatible file formats
    Programs are written in different languages, and so cannot easily access each other’s files.
  Fixed queries / proliferation of application programs
    Programs are written to satisfy particular functions.
    Any new requirement needs a new program.
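The data dependence problem above can be sketched in a few lines of Python. This is only an illustration under assumed names: the record layouts `OLD_LAYOUT` and `NEW_LAYOUT` and the function `read_name_old` are hypothetical and do not come from the slides. The point is that a reader with the file structure hard-coded stops working as soon as another program changes that structure.

```python
import struct

# Hypothetical layout hard-coded in one program: 4-byte id, 20-byte name.
OLD_LAYOUT = struct.Struct("<i20s")
# A second program later adds a 4-byte rent field to the same file.
NEW_LAYOUT = struct.Struct("<i20si")


def read_name_old(raw: bytes) -> str:
    """Read a record using the old layout that is baked into the code."""
    _record_id, name = OLD_LAYOUT.unpack(raw)
    return name.rstrip(b"\x00").decode("ascii")


old_record = OLD_LAYOUT.pack(1, b"Ann Beech")
new_record = NEW_LAYOUT.pack(1, b"Ann Beech", 450)

# The reader works with the layout it was written for.
print(read_name_old(old_record))

try:
    read_name_old(new_record)
except struct.error as exc:
    # The file structure changed, so the unchanged program breaks.
    print("reader broke:", exc)
```

In a database approach the layout lives in the DBMS catalog rather than in each program, so adding a field does not force every reader to be rewritten.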
Database Approach
  Arose because:
    Definition of data was embedded in application programs, rather than being stored separately and independently.
    No control over access and manipulation of data beyond that imposed by application programs.
  Result: the database and Database Management System (DBMS).
History of Database Systems
  Roots of the DBMS
    Apollo moon-landing project, 1960s
    NAA (North American Aviation), prime contractor for the project
    Developed the software GUAM (Generalized Update Access Method), based on a hierarchical structure
    In the mid-1960s IBM joined NAA; the result was IMS (Information Management System)
History of Database Systems (continued)
  IDS (Integrated Data Store)
    By General Electric, network model, mid-1960s
  CODASYL (Conference on Data Systems Languages)
    DBTG (Data Base Task Group)
History of Database Systems (continued)
  DBTG proposal in 1971, components:
    The network schema: the logical organization of the entire database as seen by the DBA, which includes a definition of the database name, the type of each record, and the components of each record type.
    The subschema: the part of the database as seen by the user or application program.
    A data management language to define the data characteristics and the data structure, and to manipulate the data.
History of Database Systems (continued)
  DBTG specified three languages
    A schema Data Definition Language (DDL), which enables the DBA to define the schema.
    A subschema DDL, which allows the application programs to define the parts of the database they require.
    A Data Manipulation Language (DML), to manipulate the data.
History of Database Systems (continued)
  E. F. Codd, 1970
    IBM Research Laboratory
    Relational model
  System R project by IBM’s San Jose Research Laboratory, California
  Results of this project
    Development of SQL
    Commercial relational DBMS products, e.g. DB2 and SQL/DS from IBM, Oracle from Oracle Corp.
History of Database Systems
  First generation
    Hierarchical model
      Information Management System (IMS)
    Network model
      Conference on Data Systems Languages (CODASYL)
      Data Base Task Group (DBTG)
    Limitations
      Complex programs for simple queries
      Minimal data independence
      No theoretical foundation
  Second generation
    Relational model
      E. F. Codd
      DB2, Oracle
    Limitation
      Limited data modeling
  Third generation
    Object-relational DBMS
    Object-oriented DBMS
Evolution of Databases
History of Database Systems
  File-based systems
    File-based systems appeared in the 1960s and were widely used. A file system stores information and organizes it on storage devices such as hard disks, CD-ROMs, USB drives, SSDs, and floppy disks.
  Relational model
    The relational model was introduced by E. F. Codd in 1970. In this model data is represented as tuples, grouped into one or more tables. Tables are related to each other through common attributes.
  dBASE
    Databases such as dBASE went on sale in the 1980s. dBASE was one of the first database management systems for microcomputers. It was developed by Cecil Wayne Ratliff.
  Centralized DBMS and data warehousing
    In the 1990s, centralized DBMS servers were used. The period also saw the introduction of MS Access. In addition, Internet use became widespread and data warehousing was introduced.
  NoSQL
    NoSQL and Big Data came to prominence around 2008. Big Data describes a large volume of both structured and unstructured data, so large that a traditional database cannot process it.
  Hadoop and MongoDB
    Hadoop was first released in 2006 and MongoDB in 2009. Hadoop uses a distributed file system (HDFS) to store big data and MapReduce to process it. Hadoop excels at storing and processing huge amounts of data in various formats: structured, semi-structured, and unstructured. MongoDB is a cross-platform, document-oriented database that provides high performance, high availability, and easy scalability. It works on the concepts of collections and documents.
  HBase
    HBase became a top-level Apache project in 2010 and is a database built on top of HDFS. HBase provides fast lookups for large tables.
Database Systems
  A database system consists of
    Data (the database)
    Software
    Hardware
    Users
  We focus mainly on the software
  Database systems allow users to
    Store
    Update
    Retrieve
    Organise
    Protect
  their data.
Database Users
  End users
    Use the database system to achieve some goal
  Application developers
    Write software to allow end users to interface with the database system
  Data Administrator (DA)
    Database planning
    Development and maintenance of standards, policies and procedures
  Database Administrator (DBA)
    Designs and manages the database system
  Database systems programmer
    Writes the database software itself
Database Management Systems
  A database is a collection of information
  A database management system (DBMS) is the software that controls that information
    Used to create, maintain, and access databases
  Examples: Oracle, DB2 (IBM), MS SQL Server, MS Access, Ingres, PostgreSQL, MySQL, OpenOffice Base, Corel Paradox
What Does the DBMS Do?
  Provides users with
    Data definition language (DDL)
      Permits specification of data types, structures and any data constraints.
    Data manipulation language (DML)
      General enquiry facility (query language) for the data
    Data control language (DCL)
    Often these are all the same language
  DBMS provides
    Concurrency
    Integrity
    Security
    Data independence
    Backup and recovery system
  Data dictionary
    Describes the database itself
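The split between DDL and DML can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not part of the original slides: the `student` table and its columns are assumptions, and SQLite is used only because it ships with Python. SQLite has no DCL statements such as GRANT or REVOKE, so only DDL and DML are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure, data types and constraints.
conn.execute(
    """
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        year       INTEGER CHECK (year BETWEEN 1 AND 4)
    )
    """
)

# DML: insert, update and query the data.
conn.executemany(
    "INSERT INTO student (student_id, name, year) VALUES (?, ?, ?)",
    [(1, "Ann", 1), (2, "Bob", 2), (3, "Cara", 2)],
)
conn.execute("UPDATE student SET year = 3 WHERE student_id = 2")
second_years = conn.execute(
    "SELECT name FROM student WHERE year = 2 ORDER BY name"
).fetchall()
print(second_years)

# Integrity: the DBMS, not the application, rejects data that breaks a constraint.
try:
    conn.execute("INSERT INTO student VALUES (4, 'Dev', 9)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

Both kinds of statement here are SQL, which matches the remark that DDL, DML and DCL are often the same language.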
Views
  Allow each user to have his or her own view of the database.
  A view is essentially some subset of the database.
Views - Benefits
  Reduce complexity
  Provide a level of security
  Provide a mechanism to customize the appearance of the database
  Present a consistent, unchanging picture of the structure of the database, even if the underlying database is changed
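A view as "some subset of the database" can be sketched with `sqlite3`. The `staff` table, its sample rows, and the view name `staff_b003` are assumptions made for this illustration. The view exposes only two columns of one branch, which shows both the reduced complexity and the security benefit: the salary column is not visible through it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff ("
    "staff_id INTEGER PRIMARY KEY, name TEXT, branch TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?, ?)",
    [
        (1, "Ann", "B003", 30000),
        (2, "Bob", "B005", 24000),
        (3, "Cara", "B003", 18000),
    ],
)

# The view hides the salary column and restricts rows to one branch.
conn.execute(
    "CREATE VIEW staff_b003 AS "
    "SELECT staff_id, name FROM staff WHERE branch = 'B003'"
)

cursor = conn.execute("SELECT * FROM staff_b003 ORDER BY staff_id")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
print(view_columns)
print(view_rows)
```

A user who queries `staff_b003` keeps seeing the same two columns even if further columns are later added to `staff`.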
Components of DBMS Envi r onment
  Hardware
    Client-server architecture
    Can range from a PC to a network of computers
  Software
    DBMS, operating system, network, application
  Data
    Schema, subschema, table, attribute
  People
    Data administrator and database administrator
    Database designer: logical and physical
    Application programmer
    End user: naive and sophisticated
  Procedures
    Start, stop, log on, log off, back up, recovery
Advantages of DBMS
  Control of redundancy
  Consistency
  Integrity
  Security
  Concurrency control
  Backup and recovery
  Data standards
  More information
  Data sharing and conflict control
  Productivity and accessibility
  Economy of scale
  Maintenance
Data Dictionary - Metadata
  The dictionary or catalog stores information about the database itself
  This is data about data, or ‘metadata’
  Almost every aspect of the DBMS uses the dictionary
  The dictionary holds
    Descriptions of database objects (tables, users, rules, views, indexes, ...)
    Information about who is using which data (locks)
    Schemas and mappings
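Most DBMSs let you query the dictionary like any other data. The sketch below uses SQLite, whose catalog is the `sqlite_master` table and the `PRAGMA table_info` command; other systems expose their catalogs differently. The `book` table, index and view are hypothetical objects created only for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_book_title ON book (title)")
conn.execute("CREATE VIEW book_titles AS SELECT title FROM book")

# The catalog describes the database objects; names starting with
# 'sqlite_' are internal objects and are filtered out here.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master "
    "WHERE name NOT LIKE 'sqlite_%' ORDER BY name"
).fetchall()
print(catalog)

# Column-level metadata: each row is (cid, name, type, notnull, default, pk).
columns = conn.execute("PRAGMA table_info(book)").fetchall()
for column in columns:
    print(column)
```

The application never stored these descriptions itself: the DBMS recorded them when the objects were defined.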
File-Based Systems
  Data is stored in files
  Each file has a specific format
  Programs that use these files depend on knowledge about that format
  Problems:
    No standards
    Data duplication
    Data dependence
    No way to generate ad hoc queries
    No provision for security, recovery, concurrency, etc.
Relational Systems
  Problems with early databases
    Navigating the records requires complex programs
    There is minimal data independence
    No theoretical foundations
  Then, in 1970, E. F. Codd wrote “A Relational Model of Data for Large Shared Data Banks” and introduced the relational model
Relational Systems
  Information is stored as tuples or records in relations or tables
  There is a sound mathematical theory of relations
  Most modern DBMSs are based on the relational model
  The relational model covers 3 areas:
    Data structure
    Data integrity
    Data manipulation
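The three areas of the relational model can be sketched in one short `sqlite3` session. The `branch` and `staff` tables and their rows are assumptions made for illustration. SQLite enforces foreign keys only when `PRAGMA foreign_keys = ON` is set and the library was built with foreign-key support, which is assumed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default

# Data structure: two relations (tables) whose rows are tuples.
conn.execute(
    "CREATE TABLE branch (branch_no TEXT PRIMARY KEY, city TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE staff ("
    "staff_no TEXT PRIMARY KEY, "
    "name TEXT NOT NULL, "
    "branch_no TEXT NOT NULL REFERENCES branch (branch_no))"
)
conn.execute("INSERT INTO branch VALUES ('B003', 'Glasgow')")
conn.execute("INSERT INTO staff VALUES ('SG37', 'Ann', 'B003')")

# Data integrity: a staff row must refer to a branch that exists.
try:
    conn.execute("INSERT INTO staff VALUES ('SG99', 'Bob', 'B999')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print("invalid row rejected:", fk_rejected)

# Data manipulation: a declarative join instead of record-by-record navigation.
joined = conn.execute(
    "SELECT s.name, b.city FROM staff AS s "
    "JOIN branch AS b ON b.branch_no = s.branch_no"
).fetchall()
print(joined)
```

The join states what is wanted and leaves the access path to the DBMS, in contrast to the complex navigation programs that the early hierarchical and network systems required.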
DBMS vs File System
  The differences between a DBMS and a file system are as follows: