Distributed DBMS - Unit 9 - Distributed Deadlock & Recovery

DhavalChandarana 8,561 views 55 slides Jan 11, 2017
Slide 1
Slide 1 of 55
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47
Slide 48
48
Slide 49
49
Slide 50
50
Slide 51
51
Slide 52
52
Slide 53
53
Slide 54
54
Slide 55
55

About This Presentation

Distributed Deadlock & Recovery Deadlock concept, Deadlock in Centralized systems, Deadlock in Distributed Systems – Detection, Prevention, Avoidance, Wait-Die Algorithm, Wound-Wait algorithm Recovery in DBMS - Types of Failure, Methods to control failure, Different techniques of recoverabilit...


Slide Content

Unit – 9 Distributed Deadlock & Recovery

Outlines.. Deadlock in Distributed Systems Recovery in DBMS Advanced recovery techniques Shadow Paging Fuzzy checkpoint ARIES RAID levels Two Phase and Three Phase commit protocols 11/01/2017 2 Prof. Dhaval R. Chandarana

Deadlock A deadlock can occur because transactions wait for one another. Informally, a deadlock situation is a set of requests that can never be granted by the concurrency control mechanism. A deadlock can be indicated by a cycle in the wait-for-graph (WFG) . In computer science, deadlock refers to a specific condition when two or more processes are each waiting for another to release a resource, or more than two processes are waiting for resources in a circular chain. 11/01/2017 3 Prof. Dhaval R. Chandarana

Deadlock 11/01/2017 4 Prof. Dhaval R. Chandarana

Necessary Condition for Deadlock Mutual exclusion : A resource may be acquired exclusively by only one process at a time. 11/01/2017 5 Prof. Dhaval R. Chandarana

Necessary Condition for Deadlock Hold and wait : Processes currently holding resources that were granted earlier can request new resources. 11/01/2017 6 Prof. Dhaval R. Chandarana

Necessary Condition for Deadlock No preemption : Once a process has obtained a resources, the system cannot remove it from the process control until the process has finished using the resource. 11/01/2017 7 Prof. Dhaval R. Chandarana

Necessary Condition for Deadlock Circular wait : A circular chain of hold and wait condition exists in the system. 11/01/2017 8 Prof. Dhaval R. Chandarana

Deadlock Detection Deadlock detection is the process of actually determining that a deadlock exists and identifying the processes and resources involved in the deadlock. Detection of a cycle in WFG proceeds concurrently with normal operation in main of deadlock detection. 11/01/2017 9 Prof. Dhaval R. Chandarana

Deadlock Prevention The deadlock prevention approach does not allow any transaction to acquire locks that will lead to deadlocks. The convention is that when more than one transactions request for locking the same data item, only one of them is granted the lock. One of the most popular deadlock prevention methods is pre-acquisition of all the locks. 11/01/2017 10 Prof. Dhaval R. Chandarana

Deadlock Avoidance The deadlock avoidance approach handles deadlocks before they occur. It analyzes the transactions and the locks to determine whether or not waiting leads to a deadlock. There are two algorithms for this purpose, namely wait-die and wound-wait . Let us assume that there are two transactions, T1 and T2, where T1 tries to lock a data item which is already locked by T2. The algorithms are as follows − Wait-Die − If T1 is older than T2, T1 is allowed to wait. Otherwise, if T1 is younger than T2, T1 is aborted and later restarted. Wound-Wait − If T1 is older than T2, T2 is aborted and later restarted. Otherwise, if T1 is younger than T2, T1 is allowed to wait. 11/01/2017 11 Prof. Dhaval R. Chandarana

Recovery in DBMS Recovery refers to restoring a system to its normal operation state. Once a failure has occurred, it is essential that the process where the failure happened can recover to correct state. Following are some solution on process recovery: Reclaim resources allocated to process. Undo modification made to database. Restart the process. Or Restart the process from point of failure and resume execution. 11/01/2017 12 Prof. Dhaval R. Chandarana

Concept of Recovery Fault Fault Error Valid state Erroneous state 11/01/2017 13 Prof. Dhaval R. Chandarana

Concept of Recovery System failure: System does not meet requirements Erroneous system state: State which could lead to a system failure by a sequence of valid state transitions. Error: the part of the system state which different from its intended value. Fault: Abnormal physical condition. Eg , design errors, manufacturing problems, damage, external disturbances. 11/01/2017 14 Prof. Dhaval R. Chandarana

Classification of Failure In a distributed database system, we need to deal with four types of failures: transaction failures (aborts), site (system) failures , media (disk) failures , and communication line failures . Some of these are due to hardware and others are due to software. Software failures are typically caused by “bugs” in the code. As stated before, most of the software failures are soft failures. 11/01/2017 15 Prof. Dhaval R. Chandarana

1 Transaction Failures Transactions can fail for a number of reasons. Failure can be due to an error in the transaction caused by incorrect input data as well as the detection of a present or potential deadlock . some concurrency control algorithms do not permit a transaction to proceed or even to wait if the data that they attempt to access are currently being accessed by another transaction. This might also be considered a failure. 11/01/2017 16 Prof. Dhaval R. Chandarana

2 Site (System) Failures A system failure is always assumed to result in the loss of main memory contents. Therefore, any part of the database that was in main memory buffers is lost as a result of a system failure. In distributed database terminology, system failures are typically referred to as site failures, since they result in the failed site being unreachable from other sites in the distributed system. Total failure refers to the simultaneous failure of all sites in the distributed system Partial failure indicates the failure of only some sites while the others remain operational. 11/01/2017 17 Prof. Dhaval R. Chandarana

3 Media Failures Media failure refers to the failures of the secondary storage devices that store the database. Such failures may be due to operating system errors, as well as to hardware faults such as head crashes or controller failures. The important point from the perspective of DBMS reliability is that all or part of the database that is on the secondary storage is considered to be destroyed and inaccessible. 11/01/2017 18 Prof. Dhaval R. Chandarana

4 Communication Failures Communication failures, however, are unique to the distributed case. There are a number of types of communication failures. The most common ones are the errors in the messages , improperly ordered messages, lost (or undeliverable) messages , and communication line failures . Lost or undeliverable messages are typically the consequence of communication line failures or (destination) site failures. The detection will be facilitated by the use of timers and a timeout mechanism that keeps track of how long it has been since the sender site has not received a confirmation from the destination site about the receipt of a message. 11/01/2017 19 Prof. Dhaval R. Chandarana

Methods to control failure Failure affected transactions must be aborted. Site failure message is broadcasted to all sites. Checking must be done periodically to see whether the failed site has recovered or not. After restarting the failure site, site must initiate a recovery procedure to abort all partial transaction that were active at the time of failure. 11/01/2017 20 Prof. Dhaval R. Chandarana

Different techniques of recoverability The additional components and abnormal algorithm can be added to a system these components and algorithms attempt to ensure that occurrences of erroneous states do not result in later system failure ideally, they removes these errors and restore them to “correct” states from which normal processing can continue. These additional component and abnormal algorithm, called recovery technique. Backward and Forward Error Recovery Log Based Recovery Write-Ahead Logging Protocol 11/01/2017 21 Prof. Dhaval R. Chandarana

Backward and Forward Error Recovery Recovery Forward-error Backward-error Operational-base State-based 11/01/2017 22 Prof. Dhaval R. Chandarana

Prof. Dhaval R. Chandarana 23 Failure recovery: restore an erroneous state to an error-free state Approaches to failure recovery: Forward-error recovery: Remove errors in process/system state (if errors can be completely assessed) Continue process/system forward execution Backward-error recovery: Restore process/system to previous error-free state and restart from there Comparison: Forward vs. Backward error recovery Backward-error recovery (+) Simple to implement (+) Can be used as general recovery mechanism (-) Performance penalty (-) No guarantee that fault does not occur again (-) Some components cannot be recovered Forward-error Recovery (+) Less overhead (-) Limited use, i.e. only when impact of faults understood (-) Cannot be used as general mechanism for error recovery Backward and Forward Error Recovery 11/01/2017

Prof. Dhaval R. Chandarana 24 Principle: restore process/system to a known, error-free “recovery point”/ “checkpoint”. System model: Approaches: (1) Operation-based approach (2) State-based approach Backward-Error Recovery: Basic approach CPU Main memory secondary storage stable storage Storage that maintains information in the event of system failure Bring object to MM to be accessed Store logs and recovery points Write object back if modified 11/01/2017

Prof. Dhaval R. Chandarana 25 Principle: Record all changes made to state of process (‘audit trail’ or ‘log’) such that process can be returned to a previous state Example: A transaction based environment where transactions update a database It is possible to commit or undo updates on a per-transaction basis A commit indicates that the transaction on the object was successful and changes are permanent (1.a) Updating-in-place Principle: every update (write) operation to an object creates a log in stable storage that can be used to ‘undo’ and ‘redo’ the operation Log content: object name, old object state, new object state Implementation of a recoverable update operation: Do operation: update object and write log record Undo operation: log(old) -> object (undoes the action performed by a do ) Redo operation: log(new) -> object (redoes the action performed by a do ) Display operation: display log record (optional) Problem: a ‘do’ cannot be recovered if system crashes after write object but before log record write (1.b) The write-ahead log protocol Principle: write log record before updating object (1) The Operation-based Approach 11/01/2017

Prof. Dhaval R. Chandarana 26 Principle: establish frequent ‘recovery points’ or ‘checkpoints’ saving the entire state of process Actions: ‘Checkpointing’ or ‘taking a checkpoint’: saving process state ‘Rolling back’ a process: restoring a process to a prior state Note: A process should be rolled back to the most recent ‘recovery point’ to minimize the overhead and delays in the completion of the process Shadow Pages: Special case of state-based approach Only a part of the system state is saved to minimize recovery When an object is modified, page containing object is first copied on stable storage (shadow page) If process successfully commits: shadow page discarded and modified page is made part of the database If process fails: shadow page used and the modified page discarded (2) State-based Approach 11/01/2017

Recovery in concurrent systems Issue: if one of a set of cooperating processes fails and has to be rolled back to a recovery point, all processes it communicated with since the recovery point have to be rolled back. Conclusion: In concurrent and/or distributed systems all cooperating processes have to establish recovery points Orphan messages and the domino effect Case 1: failure of X after x 3 : no impact on Y or Z Case 2: failure of Y after sending msg. ‘m’ Y rolled back to y 2 ‘m’ ≡ orphan massage X rolled back to x 2 Case 3: failure of Z after z 2 Y has to roll back to y 1 X has to roll back to x 1 Domino Effect Z has to roll back to z 1 X Y Z y 1 x 1 z 1 z 2 x 2 y 2 x 3 m Time 11/01/2017 27 Prof. Dhaval R. Chandarana

Lost messages Assume that x 1 and y 1 are the only recovery points for processes X and Y, respectively Assume Y fails after receiving message ‘m’ Y rolled back to y 1 , X rolled back to x 1 Message ‘m’ is lost Note: there is no distinction between this case and the case where message ‘m’ is lost in communication channel and processes X and Y are in states x 1 and y 1 , respectively X Y y 1 x 1 m Time Failure 11/01/2017 28 Prof. Dhaval R. Chandarana

Log Based Recovery When failures occur the following operation that use the log are executed. UNDO: restore database to state prior to execution. REDO: perform the changes to the database over again. 11/01/2017 29 Prof. Dhaval R. Chandarana

UNDO 11/01/2017 30 Prof. Dhaval R. Chandarana

REDO 11/01/2017 31 Prof. Dhaval R. Chandarana

Write-Ahead Logging Protocol write-ahead logging ( WAL ) is a family of techniques for providing atomicity and durability in database systems . In a system using WAL, all modifications are written to a log before they are applied. Usually both redo and undo information is stored in the log . The purpose of this can be illustrated by an example. Imagine a program that is in the middle of performing some operation when the machine it is running on loses power. Upon restart, that program might well need to know whether the operation it was performing succeeded, half-succeeded, or failed. If a write-ahead log is used, the program can check this log and compare what it was supposed to be doing when it unexpectedly lost power to what was actually done. On the basis of this comparison, the program could decide to undo what it had started, complete what it had started, or keep things as they are. 11/01/2017 32 Prof. Dhaval R. Chandarana

Write-Ahead Logging Protocol 11/01/2017 33 Prof. Dhaval R. Chandarana

Advanced recovery techniques Shadow Paging Fuzzy checkpoint ARIES RAID levels 11/01/2017 34 Prof. Dhaval R. Chandarana

Shadow Paging It is inconvenient to maintain logs of all transactions for the purposes of recovery. An alternative is to use a system of shadow paging . This is where the database is divided into pages that may be stored in any order on the disk . In order to identify the location of any given page, we use something called a page table. 11/01/2017 35 Prof. Dhaval R. Chandarana

Shadow Paging During the life of a transaction two page tables are maintained, one called a shadow page table and current page table . When a transaction begins both of these page tables point to the same locations (are identical ). However during the lifetime of a transaction changes may be made update values etc. So whenever we update a page in the database we always write the updated page to a new location . This means that when we then update our current page table to reflect the changes that have been made. 11/01/2017 36 Prof. Dhaval R. Chandarana

Shadow Paging 11/01/2017 37 Prof. Dhaval R. Chandarana

Fuzzy checkpoint In a fuzzy checkpoint, the database server does not flush the modified pages in the shared-memory buffer pool to disk for certain types of operations, called fuzzy operations. When a fuzzy checkpoint completes, the pages might not be consistent with each other, because the database server does not flush all data pages to disk. A fuzzy checkpoint completes much more quickly than a full checkpoint and reduces the amount of physical logging during heavy update activity. When necessary, the database server performs a full checkpoint to ensure the physical consistency of all data on disk. 11/01/2017 38 Prof. Dhaval R. Chandarana

Fuzzy checkpoint Fuzzy Operations Inserts Updates Deletes Important: Fuzzy checkpoints are disabled for the primary and secondary servers in a High-Availability Data Replication pair. 11/01/2017 39 Prof. Dhaval R. Chandarana

ARIES In computer science, Algorithms for Recovery and Isolation Exploiting Semantics , or ARIES is a recovery algorithm designed to work with a no-force, steal database approach; it is used by IBM DB2, Microsoft SQL Server and many other database systems. 11/01/2017 40 Prof. Dhaval R. Chandarana

Principles lie behind ARIES Three main principles lie behind ARIES Write-ahead logging: Any change to an object is first recorded in the log, and the log must be written to stable storage before changes to the object are written to disk. Repeating history during Redo: On restart after a crash, ARIES retraces the actions of a database before the crash and brings the system back to the exact state that it was in before the crash. Then it undoes the transactions still active at crash time. Logging changes during Undo: Changes made to the database while undoing transactions are logged to ensure such an action isn't repeated in the event of repeated restarts. 11/01/2017 41 Prof. Dhaval R. Chandarana

ARIES LSN 11/01/2017 42 Prof. Dhaval R. Chandarana

RAID Level Redundant Array of Inexpensive Disks RAID Level Striping RAID Level 1 Mirroring and performance improvements RAID Level 2 Byte-level parity RAID Level 3 Block-level parity RAID Level 4 Rotating parity RAID Level 5 Tolerates failure of two disk drives 11/01/2017 43 Prof. Dhaval R. Chandarana

RAID Level-0 file data block 1 block 0 block 2 block 3 block 4 Disk 0 Disk 1 0 block 0 1 block 2 2 block 4 3 4 5 sectors 0 block 1 1 block 3 2 3 4 5 sectors 11/01/2017 44 Prof. Dhaval R. Chandarana

RAID Level-1 file data block 1 block 0 block 2 block 3 block 4 Disk 0 Disk 1 0 block 0 1 block 1 2 block 2 3 block 3 4 block 4 5 sectors 0 block 0 1 block 1 2 block 2 3 block 3 4 block 4 5 sectors 11/01/2017 45 Prof. Dhaval R. Chandarana

RAID Level - 2 RAID Level 2 uses concept of parallel access technique. It works on the word(byte) level. So each strip stores one bit. It takes data striping to the extreme, writing only 1 bit per strip, instead of in arbitrary size block. For this reason, it require a minimum, of 8 surface to write data to the hard disk. In RAID level 2, strip are very small, so when a block is read, all disks are accessed in parallel. Hamming code generation is time consuming, therefore RAID level 2 is too slow for most commercial application. 11/01/2017 46 Prof. Dhaval R. Chandarana

RAID Level - 2 Error Correction Code (ECC) 11/01/2017 47 Prof. Dhaval R. Chandarana

RAID Level - 3 11/01/2017 48 Prof. Dhaval R. Chandarana

RAID Level - 4 11/01/2017 49 Prof. Dhaval R. Chandarana

RAID Level - 5 11/01/2017 50 Prof. Dhaval R. Chandarana

RAID Level - 6 11/01/2017 51 Prof. Dhaval R. Chandarana

Two Phase Commit Protocols 11/01/2017 52 Prof. Dhaval R. Chandarana

State Transitions in 2PC Protocol 11/01/2017 53 Prof. Dhaval R. Chandarana

Three Phase Commit Protocols 11/01/2017 54 Prof. Dhaval R. Chandarana

State Transitions in 3PC Protocol 11/01/2017 55 Prof. Dhaval R. Chandarana