~!!! Please Remember in my prayers May Almighty Allah or God bless you all with great happiness Ameen. !!!~
Size: 316.35 KB
Language: en
Added: Mar 09, 2016
Slides: 50 pages
Slide Content
RDBMS Relational Database Management System Fall 2016 Muhammad Adeel Rajput Scientist/Instector
Contents Introduction to Database management systems RDBMS Entity Relationship Model Normalization Introduction to SQL File Organization & Indexing Database Administration Recovery
Database Management System
Data, Database and Database Management System Data Data is numerical, character or other symbols which can be recorded in a form suitable for processing by a computer. (e.g. names and addresses of students enrolling onto a university course). Database A Database is a collection of related data (such as an enrolling students data) arranged for speedy search and retrieval. Database Management System A Database Management System is a collection of programs that allows users to specify the structure of a database, to create, query and modify the data in the database and to control access to it. (e.g. limit access to the database so that only relevant staff can access details of enrolling students).
PREVIOUS HISTORY OF DBMS Before the concept of DBMS, they used to store the data (i.e. information) in the form of written copies and store them.... This made the retrieve process very difficult. It made wastage of paper, files, storage and precious time.
What is DBMS ? A set of programs to access the interrelated data. DBMS contains information about a particular enterprise. Computerized record keeping system. Provides convenient environment to user to perform operations: -Creation, Insertion, Deletion, Updating & Retrieval of information.
Examples of DBMS Some of the common used DBMSs are: -Oracle, IBM’s DB2, Microsoft’s SQL Server, MS-Access and Informix. Some of the desktop based DBMSs are: -Microsoft FoxPro, Borland dBase and Microsoft Access.
Advantages of DBMS Controlling Data Redundancy : Data is recorded in only one place in the database and it is not duplicated. Data Consistency : Data item appears only once, and the updated value is immediately available to all users. Control Over Concurrency : In a computer file-based system in updating, one may overwrite the values recorded by the other. Backup and Recovery Procedures : automatically create the backup of data and restore data if required. Data Independence : Separation of data structure of database from application program that uses the data is called data independence.
Disadvantages of DBMS Cost of Hardware and Software : Processor with high speed of data processing and memory of large size is required. Cost of Data Conversion : Very difficult and costly method to convert data of data file into database. Cost of Staff Training : A lot of amount for the training of staff to run the DBMS. Appointing Technical Staff : Trained technical persons such as database administrator, application programmers, data entry operators etc. are required to handle the DBMS. Database Damage : All data is integrated into a single database. If database is damaged due to electric failure or database is corrupted on the storage media, then your valuable data may be lost forever.
Applications of DBMS.... Banking: for transactions Airlines: reservation and schedules Tele communications: for retrieving data of user Credit card: for transactions Universities: registration, retrieving marks, applications, grades Human resources: employee records, salaries, tax deductions,
RDBMS DATABASE SYSTEM Edger F CODD ( Turing award )
RDBMS... Most popular database system. Simple and sound theoretical basis. Developed by E F Codd in the early 1970's. The model is based on tables, rows and columns and the manipulation of data stored within. Relational database is a collection of these tables. First commercial system: MULTICS in 1978. Has overtaken Hierarchical and Network models. Main feature: Single database can be spread across several tables. Examples include: Oracle, IBM's DB2, Sybase, MySQL & Microsoft Access.
RDBMS Advantages Increases the sharing of data and faster development of new applications Support a simple data structure, namely tables or relations Limit redundancy or replication of data Better integrity as data inconsistencies are avoided by storing data in one place Provide physical data independence so users do not have to be aware of underlying objects Offer logical database independence - data can be viewed in different ways by different users. Expandability is relatively easy to achieve by adding new views of the data as they are required. Support one off queries using SQL or other appropriate language. Better backup and recovery procedures Provides multiple interfaces Solves many problems created by other data models The ability to handle efficiently simple data types Multiple users can access which is not possible in DBMS
RDBMS Disadvantages Software is expensive Complex software means expensive hardware Requires skilled knowledge to implement Certain applications are slower processing Increased vulnerability More difficult to recover if data is lost Seen as a poor representation of the real world Difficult to represent hierarchies Difficult to represent complex data type.
Schema : - Logical structure of the database. - Doesn’t show the data in database. - Classification: 1. Physical 2. Conceptual 3. External
Cont… 1. Physical Schema: -Describes the physical storage of database. -Not in terms of blocks or devices, but describes organization of files, access path etc. 2. Conceptual Schema: - Describes structure of whole database. - Describes entities their relationships and constraints. 3. External Schema: - Provides a user’s view of data. - Shows relevant info particular to user, hides rest of the info. - one or more levels. Instances: Actual data contained in database at a particular point of time.
Differences betwen DBMS and RDBMS DBMS Data is stored in a single large table Single record modification affects the whole database RDBMS ( Codd 1980) Database is 'broken down' into smaller pieces The changes will not affect the entire database
The Future of RDBMS It is very difficult to see where RDBMS’s could go because alternatives and new versions such as XML have been introduced. From delving through forums it is plain to see that many people do not only prefer standard RDBMS’s but do not even like the alternatives. Academics refer to relation models “ It's such a simple and elegant model that it can never become unfashionable”. - Andre Naess . Luke you may want to put something here about the incorporation of other software such as CAD. etc... Hope these slides are ok. Got my script written as well but you may want me to read different stuff out. X x x
Entity-Relationship ( E-R) Model
Entity-Relationship (E-R) Model College Principal College Student C Student A Student B College 3 College 2 College 1 Course C Course B Course A Student Course Admission Stud_Name Stud_Roll No Course_Id Course_Name Relationships E-R diagram
Normalization
Normalization It is a technique for designing relational database tables to minimize duplication of information. Normalization is a practice to safeguard the database against logical and structural anomalies. Normalization is also termed as canonical synthesis by the experts . It is used to keep data consistent and check that no loss of data as well as data integrity is there. Its complexity may lead to higher degree of join operations which sometimes lead to the degraded throughput times. The normal forms like 1NF, 2NF, 3NF, BCNF, 4NF, 5NF, DKNF & 6NF are in practice .
DATABASE LANGUAGE SQL
SQL… The most basic Oracle Database utility A Basic command-line interface The first thing you work with it in Oracle DBMS
What is SQL? When a user wants to get some information from a database file, he can issue a query . A query is a user–request to retrieve data or information with a certain condition. SQL is a query language that allows user to specify the conditions. (instead of algorithms)
Introduction to SQL… Concept of SQL The user specifies a certain condition. The result of the query will then be stored in form of a table. Statistical information of the data. The program will go through all the records in the database file and select those records that satisfy the condition.(searching).
Structured Query Language ... Relational Database Management Systems use the language known as SQL SQL is a simple programming language used for accessing and managing data in relational databases. Developed by IBM in 1970 to support its various relational products.
File Organization
29 File Organization The physical arrangement of data in a file into records and pages on the disk File organization determines the set of access methods for Storing and retrieving records from a file Therefore, ‘file organization’ synonymous with ‘access method’ We study three types of file organization Unordered or Heap files Ordered or sequential files Hash files We examine each of them in terms of the operations we perform on the database Insert a new record Search for a record (or update a record) Delete a record
30 Unordered Or Heap File Records are stored in the same order in which they are created Insert operation Fast – because the incoming record is written at the end of the last page of the file Search (or update) operation Slow – because linear search is performed on pages Delete Operation Slow – because the record to be deleted is first searched for Deleting the record creates a hole in the page Periodic file compacting work required to reclaim the wasted space
31 Ordered or Sequential File Records are sorted on the values of one or more fields Ordering field – the field on which the records are sorted Ordering key – the key of the file when it is used for record sorting Search (or update) Operation Fast – because binary search is performed on sorted records Update the ordering field? Delete Operation Fast – because searching the record is fast Periodic file compacting work is, of course, required Insert Operation Poor – because if we insert the new record in the correct position we need to shift all the subsequent records in the file Alternatively an ‘overflow file’ is created which contains all the new records as a heap Periodically overflow file is merged with the main file If overflow file is created search and delete operations for records in the overflow file have to be linear!
32 Hash File Is an array of buckets Given a record, r a hash function, h(r) computes the index of the bucket in which record r belongs h uses one or more fields in the record called hash fields Hash key - the key of the file when it is used by the hash function Example hash function Assume that the staff last name is used as the hash field Assume also that the hash file size is 26 buckets - each bucket corresponding to each of the letters from the alphabet Then a hash function can be defined which computes the bucket address (index) based on the first letter in the last name.
33 Hash File (2) Insert Operation Fast – because the hash function computes the index of the bucket to which the record belongs If that bucket is full you go to the next free one Search Operation Fast – because the hash function computes the index of the bucket Performance may degrade if the record is not found in the bucket suggested by hash function Delete Operation Fast – once again for the same reason of hashing function being able to locate the record quick
34 Indexing Can we do anything else to improve query performance other than selecting a good file organization? Yes, the answer lies in indexing Index - a data structure that allows the DBMS to locate particular records in a file more quickly Very similar to the index at the end of a book to locate various topics covered in the book Types of Index Primary index – one primary index per file Clustering index – one clustering index per file – data file is ordered on a non-key field and the index file is built on that non-key field Secondary index – many secondary indexes per file Sparse index – has only some of the search key values in the file Dense index – has an index corresponding to every search key value in the file
35 Primary Indexes The data file is sequentially ordered on the key field Index file stores all (dense) or some (sparse) values of the key field and the page number of the data file in which the corresponding record is stored B002 1 B003 1 B004 2 B005 2 B007 3 Branch BranchNo Street City Postcode B002 56 Clover Dr London NW10 6EU B003 163 Main St Glasgow G11 9QX B004 32 Manse Rd Bristol BS99 1NZ B005 22 Deer Rd London SW1 4EH B007 16 Argyll St Aberdeen AB2 3SU Branch B002 record Branch B003 record Branch B004 record Branch B005 record Branch B007 record 1 2 3 4
36 Indexed Sequential Access Method ISAM – Indexed sequential access method is based on primary index Default access method or table type in MySQL, MyISAM is an extension of ISAM Insert and delete operations disturb the sorting You need an overflow file which periodically needs to be merged with the main file
37 Secondary Indexes An index file that uses a non primary field as an index e.g. City field in the branch table They improve the performance of queries that use attributes other than the primary key You can use a separate index for every attribute you wish to use in the WHERE clause of your select query But there is the overhead of maintaining a large number of these indexes
Database administration
Database administration Database administration is the function of managing and maintaining database management systems (DBMS) software. Mainstream DBMS software such as Oracle , IBM DB2 and Microsoft SQL Server need ongoing management. As such, corporations that use DBMS software often hire specialized IT ( Information Technology ) personnel called Database Administrators or DBAs.
DBA Responsibilities Installation, configuration and upgrading of Database server software and related products. Evaluate Database features and Database related products. Establish and maintain sound backup and recovery policies and procedures. Take care of the Database design and implementation. Implement and maintain database security (create and maintain users and roles, assign privileges). Database tuning and performance monitoring. Application tuning and performance monitoring. Setup and maintain documentation and standards. Plan growth and changes (capacity planning). Work as part of a team and provide 24x7 support when required. Do general technical troubleshooting and give cons. Database recovery.
Types of database administration There are three types of DBAs: Systems DBAs: (also referred to as Physical DBAs, Operations DBAs or Production Support DBAs): focus on the physical aspects of database administration such as DBMS installation, configuration, patching, upgrades, backups, restores, refreshes, performance optimization, maintenance and disaster recovery. Development DBAs : focus on the logical and development aspects of database administration such as data model design and maintenance, DDL ( data definition language ) generation, SQL writing and tuning, coding stored procedures , collaborating with developers to help choose the most appropriate DBMS feature/functionality and other pre-production activities. Application DBAs: usually found in organizations that have purchased 3rd party application software such as ERP ( enterprise resource planning ) and CRM ( customer relationship management ) systems. Examples of such application software includes Oracle Applications , Siebel and PeopleSoft (both now part of Oracle Corp.) and SAP. Application DBAs straddle the fence between the DBMS and the application software and are responsible for ensuring that the application is fully optimized for the database and vice versa. They usually manage all the application components that interact with the database and carry out activities such as application installation and patching, application upgrades, database cloning, building and running data cleanup routines, data load process management , etc. While individuals usually specialize in one type of database administration, in smaller organizations, it is not uncommon to find a single individual or group performing more than one type of database administration.
Recovery
Recovery We give a short overview of how recovery might be implemented: Requirements for recovery A practical approach to recovery – keep a recovery log – must be write-ahead Example showing system components with values in DB and in-memory cache Checkpoint procedure: to aid processing of the very large recovery log Transaction categories for recovery An algorithm for the recovery manager
Requirements for Recovery Media failure , e.g. disc-head crash. Part of persistent store is lost – need to restore it. Transactions in progress may be using this area – abort uncommitted transactions. System failure e.g. crash - main memory lost. Persistent store is not lost but may have been changed by uncommitted transactions. Also, committed transactions’ effects may not yet have reached persistent objects. Transaction abort Need to undo any changes made by the aborted transaction. Our object model assumed all invocations are recorded with the object. It was not made clear how this was to be implemented – synchronously in persistent store? We need to optimise for performance reasons - not write-out every operation synchronously. We consider one method – a recovery log . i.e. update data objects in place in persistent store, as and when appropriate, and make a (recovery) log of the updates.
Recovery Log Assume a periodic (daily?) dump of the database (e.g. Op. Sys. backup) Assume that a record of every change to the database is written to a log {transaction-ID, data-object-ID, operation (arguments), old value, new value } If a failure occurs the log can be used by the Recovery manager to REDO or UNDO selected operations. UNDO and REDO must be idempotent (repeatable), e.g. contain before and after values, not just “add 3”. Further crashes might occur at any time. Transaction abort: UNDO the operations – roll back the transaction System failure AIM: REDO committed transactions, UNDO uncommitted transactions Media failure reload the database from the last dump REDO the operations of all the transactions that committed since then But the log is very large to search for this information so, to assist rapid recovery, take a CHECKPOINT at “small” time intervals e.g. after 5 mins or after n log items – see 15
Recovery Log must be “write-ahead” Two distinct operations: write a change to an object in the database write the log record of the change A failure could occur between them – in which order should they be done? If an object is updated in the database, there is no record of the previous value, so no means of UNDOing the operation on abort. The log must be written first. Also, a transaction is not allowed to commit until the log records for all its operations have been written out to the log. Note: we can’t, and needn’t, take time to update in the database on every commit the (few) objects involved in a transaction. Note: a log can be written efficiently, because: there are enough records from the many transactions in progress at any time, the writes are to one place – the log file.
Checkpoints and the checkpoint procedure From 13: The log is very large to search for this information on transactions especially for abort of a single transaction, so take a CHECKPOINT at “small” time intervals e.g. After 5 mins or after n log items. Checkpoint procedure : Force-write any log records in main memory out to the log (OS must do this) Force-write a checkpoint record to the log, containing: - list of all transactions active (started but not committed) at the time of the checkpoint - address within the log of each transaction’s most recent log record - note: the log records of a given transaction are chained Force-write database buffers (database updates still in main memory) out to the database. Write the address of the checkpoint record within the log into a restart file.
A recovery log with a checkpoint record the data manager keeps object updates and log records in its cache in main memory main memory log records T 1 : x, add(1), 2 ->3 T 2 : a, add(2) 7->9 object values x = 3 a = 9 persistent memory log file ... many previous records ... T 1 : x, add(1), 2 ->3 T 2 : a, add(2) 7->9 persistent system state object values x = 2 a = 7 checkpoint record active Txs T 1 , T 2 T 1 most recent log location T 2 most recent log location restart file has the locations of checkpoint records in the log file
Transaction categories for recovery Checkpoint record says T 2 and T 3 are active T 1 : its log records were written out before commit . Any remaining DB updates were written out at checkpoint time. No action required. T 2 : any updates made after the checkpoint are in the log and can be re-applied (REDO) T 4 : log records are written on commit – can be re-applied (REDO is idempotent) T 3 and T 5 : any changes that might have been made can be found in the log and previous state recovered (undone using UNDO operation) T 3 requires log to be searched before the checkpoint – checkpoint contains pointer to previous log record. Time checkpoint time failure time T 1 : no action T 5 T 4 T 3 T 2 T 1 T 2 : REDO from checkpoint T 3 : UNDO all T 4 : REDO T 5 : UNDO
Algorithm for recovery manager Keeps: UNDO list - initially contains all transactions listed in the checkpoint record REDO list – initially empty Searches forward through the log starting from the checkpoint record, to the end of the log If it finds a start-transaction record it adds that transaction to the UNDO list If it finds a commit record it moves that transaction from the UNDO list to the REDO list Then, works backwards through the log UNDOing transactions on the UNDO list (restores state) Finally, works forward again through the log REDOing transactions on the REDO list Thank You