Concurrency Control

AshishK17 · 71 slides · Jan 17, 2018
About This Presentation

DBMS CONCURRENCY CONTROL


Slide Content

Concurrency Control
Two transactions run concurrently against the same balance BAL (initially Bal = 100):
– T1: Begin(); read BAL; add 10; write BAL; Commit()
– T2: Begin(); read BAL; subtract 30; write BAL; Commit()
Depending on how their operations interleave, the stored balance can end up at 110 or 70 instead of the correct serial result of 80: one of the two updates is lost.

Concurrency control ensures the consistency and reliability properties of transactions.

Concurrency
Concurrency control is the problem of synchronizing concurrent transactions (i.e., ordering the operations of concurrent transactions) such that the following two properties are achieved:
– the consistency of the DB is maintained
– the maximum degree of concurrency of operations is achieved
Obviously, the serial execution of a set of transactions achieves consistency, provided each single transaction is consistent.

Why Concurrency Control is Needed
The Lost Update Problem: occurs when two transactions that access the same database items have their operations interleaved in a way that makes the value of some database item incorrect.
The Temporary Update (or Dirty Read) Problem: occurs when one transaction updates a database item and then the transaction fails for some reason. The updated item is accessed by another transaction before it is changed back to its original value.

Why Concurrency Control is Needed
The Incorrect Summary Problem: if one transaction is calculating an aggregate summary function on a number of records while other transactions are updating some of these records, the aggregate function may calculate some values before they are updated and others after they are updated.

(a) The lost update problem: example (figure)

(b) The temporary update problem: example (figure)

(c) The incorrect summary problem: example (figure)
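To make the lost update problem concrete, here is a minimal Python sketch (not part of the original slides; the shared balance variable, the sleep-based interleaving window, and the thread setup are assumptions for illustration). It mirrors the BAL example from the opening slide: both "transactions" read the same old balance, so one of the two updates is silently lost.

```python
# Minimal sketch (not from the slides): two unsynchronized "transactions"
# interleave read/compute/write on a shared balance, losing one update.
import threading
import time

balance = 100  # shared database item BAL

def transfer(delta):
    global balance
    local = balance          # read BAL
    time.sleep(0.1)          # widen the interleaving window
    balance = local + delta  # write BAL, overwriting the other update

t1 = threading.Thread(target=transfer, args=(10,))   # T1: add 10
t2 = threading.Thread(target=transfer, args=(-30,))  # T2: subtract 30
t1.start(); t2.start(); t1.join(); t2.join()
print(balance)  # expected 80 under serial execution; prints 110 or 70 here
```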

Concurrency Control
– A database must provide a mechanism that will ensure that all possible schedules are either conflict or view serializable, and are recoverable and preferably cascadeless.
– A policy in which only one transaction can execute at a time generates serial schedules, but provides a poor degree of concurrency.
– Testing a schedule for serializability after it has executed is a little too late!
– Goal: to develop concurrency control protocols that will assure serializability.

Concurrency Control vs. Serializability Tests
– Concurrency-control protocols allow concurrent schedules, but ensure that the schedules are conflict/view serializable, and are recoverable and cascadeless.
– Concurrency control protocols generally do not examine the precedence graph as it is being created; instead, a protocol imposes a discipline that avoids non-serializable schedules.
– Different concurrency control protocols provide different tradeoffs between the amount of concurrency they allow and the amount of overhead that they incur.
– Tests for serializability help us understand why a concurrency control protocol is correct.

Classification of Concurrency Control Algorithms
For locking-based methods, locking granularity, i.e. the size of the portion of the database to which a lock is applied, is an important issue. This portion is called the locking unit (lu).

Concurrency Control Techniques
Categories of concurrency control techniques:
– Locking techniques
– Timestamp ordering techniques
– Multiversion concurrency control techniques
– Optimistic concurrency control techniques

Locking-based Concurrency Control Algorithms
Lock modes:
– read lock (rl), also called shared lock
– write lock (wl), also called exclusive lock
Compatibility of lock modes:
– exclusive (X) mode: the data item can be both read and written. An X-lock is requested using the lock-X instruction.
– shared (S) mode: the data item can only be read. An S-lock is requested using the lock-S instruction.
Lock requests are made to the concurrency-control manager. A transaction can proceed only after its request is granted.

Locking-based Concurrency Control Algorithms
Lock-compatibility matrix:
– A transaction may be granted a lock on an item if the requested lock is compatible with locks already held on the item by other transactions.
– Any number of transactions can hold shared locks on an item, but if any transaction holds an exclusive lock on the item, no other transaction may hold any lock on the item.
– If a lock cannot be granted, the requesting transaction is made to wait until all incompatible locks held by other transactions have been released; the lock is then granted. A sketch of this grant rule follows.
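The grant rule can be sketched directly from the S/X compatibility matrix. The snippet below is illustrative (the COMPATIBLE table and can_grant helper are assumed names, not from the slides): a requested lock is granted only if it is compatible with every lock other transactions already hold on the item.

```python
# Sketch of the S/X lock-compatibility check described above.
COMPATIBLE = {           # (held mode, requested mode) -> can both be granted?
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

def can_grant(requested_mode, held_locks):
    """held_locks: list of (txn_id, mode) already granted on the item."""
    return all(COMPATIBLE[(mode, requested_mode)] for _, mode in held_locks)

# Any number of shared locks can coexist, but an exclusive lock excludes all others:
print(can_grant("S", [("T1", "S"), ("T2", "S")]))  # True
print(can_grant("X", [("T1", "S")]))               # False -> requester must wait
```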

Lock-Based Protocols …
Example of a transaction performing locking:
T2: lock-S(A); read(A); unlock(A); lock-S(B); read(B); unlock(B); display(A+B)
Locking as above is not sufficient to guarantee serializability: if A and B get updated in between the read of A and the read of B, the displayed sum would be wrong.
A locking protocol is a set of rules followed by all transactions while requesting and releasing locks. Locking protocols restrict the set of possible schedules.

Basic LM Algorithm
Problem: early release of data items. Example (next slide):

Locking alone does not ensure serializability!
Initial: x = 50, y = 20
Result of the interleaved schedule above: x = 102, y = 39
Result of serial schedule {T1, T2}: x = 102, y = 38
Result of serial schedule {T2, T1}: x = 101, y = 39
The basic LM algorithm may generate non-serializable schedules!

The Two-Phase Locking Protocol
This is a protocol which ensures conflict-serializable schedules.
– Phase 1 (Growing Phase): the transaction may obtain locks, but may not release locks.
– Phase 2 (Shrinking Phase): the transaction may release locks, but may not obtain locks.
The protocol assures serializability. It can be proved that the transactions can be serialized in the order of their lock points (i.e. the point where a transaction acquired its final lock). A sketch of this discipline follows.
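A minimal sketch of the two-phase discipline, assuming a simple in-memory transaction wrapper (TwoPhaseTxn and its methods are invented for illustration): the first unlock ends the growing phase, and any later lock request is flagged as a 2PL violation.

```python
# Sketch (assumed names, not from the slides) of a transaction wrapper that
# enforces the two-phase rule: once any lock is released, no further lock
# may be acquired.
class TwoPhaseTxn:
    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False   # becomes True at the first unlock

    def lock(self, item, mode):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: cannot acquire {mode} lock on "
                               f"{item} after releasing a lock (2PL violated)")
        self.held.add((item, mode))   # growing phase

    def unlock(self, item, mode):
        self.shrinking = True         # shrinking phase begins
        self.held.discard((item, mode))

t = TwoPhaseTxn("T1")
t.lock("A", "X"); t.lock("B", "S")    # growing phase
t.unlock("A", "X")                    # lock point has passed
try:
    t.lock("C", "S")                  # violates two-phase locking
except RuntimeError as e:
    print(e)
```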

The Two-Phase Locking Protocol … (figure)

The Two-Phase Locking Protocol …
Properties of the 2PL protocol:
– Generates conflict-serializable schedules
– But schedules may cause cascading aborts: if a transaction aborts after it releases a lock, it may cause other transactions that have accessed the unlocked data item to abort as well.
Basic 2PL: all lock operations precede the first unlock operation.
Conservative 2PL (or static 2PL): lock all the items the transaction accesses before it begins execution.
– Deadlock-free protocol
– The read-set and write-set of the transaction must be known in advance.

The Two-Phase Locking Protocol …
Strict 2PL locking protocol:
– Holds the exclusive locks till the end of the transaction
Rigorous 2PL:
– Holds all locks till the end of the transaction
– Cascading aborts are avoided
Conversion of locks:
– Upgrade: convert a lock from shared to exclusive.
– Downgrade: convert a lock from exclusive to shared.

The Two-Phase Locking Protocol …
Strict and Rigorous 2PL locking protocol (figure)

Pitfalls of Lock-Based Protocols
Consider the partial schedule: neither T3 nor T4 can make progress. Executing lock-S(B) causes T4 to wait for T3 to release its lock on B, while executing lock-X(A) causes T3 to wait for T4 to release its lock on A.
Such a situation is called a deadlock. To handle a deadlock, one of T3 or T4 must be rolled back and its locks released.

Multiple Granularity
– Allow data items to be of various sizes and define a hierarchy of data granularities, where the small granularities are nested within larger ones.
– Can be represented graphically as a tree (but do not confuse it with the tree-locking protocol).
– When a transaction locks a node in the tree explicitly, it implicitly locks all the node's descendants in the same mode.
Granularity of locking (the level in the tree where locking is done):
– fine granularity (lower in the tree): high concurrency, high locking overhead
– coarse granularity (higher in the tree): low locking overhead, low concurrency

Example of Granularity Hierarchy
The levels, starting from the coarsest (top) level, are: database, area, file, record.

Intention Lock Modes
In addition to the S and X lock modes, there are three additional lock modes with multiple granularity:
– intention-shared (IS): indicates explicit locking at a lower level of the tree, but only with shared locks.
– intention-exclusive (IX): indicates explicit locking at a lower level with exclusive or shared locks.
– shared and intention-exclusive (SIX): the subtree rooted at that node is locked explicitly in shared mode, and explicit locking is being done at a lower level with exclusive-mode locks.
Intention locks allow a higher-level node to be locked in S or X mode without having to check all descendant nodes.

Compatibility Matrix with Intention Lock Modes
The compatibility matrix for all lock modes (rows = mode already held, columns = mode requested; "yes" = compatible) is:

          IS    IX    S     SIX   X
    IS    yes   yes   yes   yes   no
    IX    yes   yes   no    no    no
    S     yes   no    yes   no    no
    SIX   yes   no    no    no    no
    X     no    no    no    no    no

Multiple Granularity Locking Scheme
Transaction Ti can lock a node Q using the following rules (a sketch of these rules appears after this list):
1. The lock compatibility matrix must be observed.
2. The root of the tree must be locked first, and may be locked in any mode.
3. A node Q can be locked by Ti in S or IS mode only if the parent of Q is currently locked by Ti in either IX or IS mode.
4. A node Q can be locked by Ti in X, SIX, or IX mode only if the parent of Q is currently locked by Ti in either IX or SIX mode.
5. Ti can lock a node only if it has not previously unlocked any node (that is, Ti is two-phase).
6. Ti can unlock a node Q only if none of the children of Q are currently locked by Ti.
Observe that locks are acquired in root-to-leaf order, whereas they are released in leaf-to-root order.
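The sketch below encodes rules 1, 3 and 4 of the multiple-granularity scheme, assuming the intention-lock compatibility matrix shown above (the dictionaries and the may_lock_child helper are illustrative names, not from the slides). Rules 2, 5 and 6 concern ordering over time and would live in the surrounding lock manager.

```python
# Sketch of the multiple-granularity rules: the intention-lock compatibility
# matrix plus the parent-mode check.
COMPAT = {  # COMPAT[held][requested] -> True if compatible
    "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
}

# Modes the parent must hold before a child may be locked in a given mode
# (rules 3 and 4). The root has no parent and is handled separately (rule 2).
REQUIRED_PARENT_MODES = {
    "IS": {"IS", "IX"}, "S": {"IS", "IX"},
    "IX": {"IX", "SIX"}, "SIX": {"IX", "SIX"}, "X": {"IX", "SIX"},
}

def may_lock_child(child_mode, parent_mode, held_on_child):
    """held_on_child: modes other transactions already hold on the node."""
    parent_ok = parent_mode in REQUIRED_PARENT_MODES[child_mode]
    compatible = all(COMPAT[h][child_mode] for h in held_on_child)
    return parent_ok and compatible

# e.g. to X-lock a record, its file must be held in IX or SIX mode:
print(may_lock_child("X", "IX", held_on_child=[]))   # True
print(may_lock_child("X", "IS", held_on_child=[]))   # False
```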

Timestamp-based Concurrency Control Algorithms

Timestamp-based Concurrency Control Algorithms
The lock method maintains serializability by mutual exclusion. The timestamp method maintains serializability by assigning a unique timestamp to every transaction and executing transactions accordingly.

What is a timestamp?
– An identifier for a transaction
– Used to permit ordering
– Monotonicity: timestamps generated by the same TM monotonically increase in value.

Basic TO Algorithm
– Each transaction is issued a timestamp when it enters the system. If an old transaction Ti has timestamp TS(Ti), a new transaction Tj is assigned timestamp TS(Tj) such that TS(Ti) < TS(Tj).
– The protocol manages concurrent execution such that the timestamps determine the serializability order.
– The protocol maintains two timestamps for each data item Q:
  – W-timestamp(Q), or WTS(Q), is the largest timestamp of any transaction that executed write(Q) successfully.
  – R-timestamp(Q), or RTS(Q), is the largest timestamp of any transaction that executed read(Q) successfully.

Basic TO Algorithm …
Suppose a transaction Ti issues a read(Q):
– If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten. Hence, the read operation is rejected, and Ti is rolled back.
– If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and RTS(Q) = max(RTS(Q), TS(Ti)).

Basic TO Algorithm …
Suppose that transaction Ti issues write(Q):
– If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and the system assumed that the value would never be produced. Hence, the write operation is rejected, and Ti is rolled back.
– If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, this write operation is rejected, and Ti is rolled back.
– Otherwise, the write operation is executed, and W-timestamp(Q) is set to TS(Ti). A sketch of these read/write rules follows.
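A compact sketch of the basic TO rules above, assuming a simple per-item table of R- and W-timestamps (the timestamps dictionary, TOError, and the helper functions are illustrative, not a prescribed implementation).

```python
# Sketch of the basic TO read/write rules.
class TOError(Exception):
    """Raised when an operation is rejected and the transaction must roll back."""

timestamps = {}          # item -> {"R": RTS(Q), "W": WTS(Q)}

def ts_of(item):
    return timestamps.setdefault(item, {"R": 0, "W": 0})

def read(ts_ti, item):
    q = ts_of(item)
    if ts_ti < q["W"]:                       # value Ti needs was overwritten
        raise TOError(f"read({item}) rejected: roll back T{ts_ti}")
    q["R"] = max(q["R"], ts_ti)              # RTS(Q) = max(RTS(Q), TS(Ti))

def write(ts_ti, item):
    q = ts_of(item)
    if ts_ti < q["R"] or ts_ti < q["W"]:     # a later txn already read/wrote Q
        raise TOError(f"write({item}) rejected: roll back T{ts_ti}")
    q["W"] = ts_ti                           # W-timestamp(Q) = TS(Ti)

read(2, "Q"); write(2, "Q")                  # T2 reads then writes Q
try:
    write(1, "Q")                            # older T1 arrives too late
except TOError as e:
    print(e)
```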

Timestamp Ordering …
Schedules generated by the basic TO protocol have the following properties:
– Serializable
– Since transactions never wait (but are rejected), the schedules are deadlock-free
– The price to pay for deadlock-free schedules is the potential restart of a transaction several times

Example Use of the Protocol
A partial schedule over several data items for transactions with timestamps 1, 2, 3, 4, 5 (figure: an interleaving of read and write operations on X, Y and Z across T1–T5, in which one transaction is aborted).

Correctness of the Timestamp-Ordering Protocol
Advantages:
– The timestamp-ordering protocol guarantees serializability, since all the arcs in the precedence graph are of the form: transaction with smaller timestamp → transaction with larger timestamp. Thus, there will be no cycles in the precedence graph.
– The timestamp protocol ensures freedom from deadlock, as no transaction ever waits.
Disadvantages:
– The schedule may not be cascade-free, and may not even be recoverable.

Thomas' Write Rule
– A modified version of the timestamp-ordering protocol in which obsolete write operations may be ignored under certain circumstances.
– When Ti attempts to write data item Q, if TS(Ti) < WTS(Q), then this write operation can be ignored. Otherwise the protocol is the same as the timestamp-ordering protocol.
– Thomas' Write Rule allows greater potential concurrency: it allows some view-serializable schedules that are not conflict-serializable.

Thomas' Write Rule
Suppose that transaction Ti issues write(Q):
– If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously. Hence, the write operation is rejected, and Ti is rolled back.
– If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, this write operation is ignored.
– Otherwise, the write operation is executed, and W-timestamp(Q) is set to TS(Ti).
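The only change relative to the basic TO write rule is the second case, where the obsolete write is skipped instead of aborting the writer. The sketch below mirrors the earlier TO sketch with that one modification (names are again illustrative).

```python
# Sketch of Thomas' write rule: compared with the basic TO write above, an
# obsolete write (TS(Ti) < W-timestamp(Q)) is silently ignored rather than
# causing a rollback.
timestamps = {}                              # item -> {"R": RTS(Q), "W": WTS(Q)}

def write_thomas(ts_ti, item):
    q = timestamps.setdefault(item, {"R": 0, "W": 0})
    if ts_ti < q["R"]:                       # a later transaction already read Q
        raise RuntimeError(f"write({item}) rejected: roll back T{ts_ti}")
    if ts_ti < q["W"]:                       # obsolete value: ignore the write
        return
    q["W"] = ts_ti                           # otherwise behave like basic TO

write_thomas(3, "Q")                         # T3 writes Q
write_thomas(2, "Q")                         # T2's obsolete write is ignored
print(timestamps["Q"]["W"])                  # still 3
```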

Optimistic Concurrency Control Algorithms

Optimistic Concurrency Control Algorithms
Pessimistic algorithms assume conflicts happen quite often. Optimistic algorithms assume conflicts are rare and delay validation until just before the write phase.

Concepts
Execution of transaction Ti is done in three phases:
1. Read and execution phase: transaction Ti writes only to temporary local variables.
2. Validation phase: transaction Ti performs a 'validation test' to determine if the local variables can be written without violating serializability.
3. Write phase: if Ti is validated, the updates are applied to the database; otherwise, Ti is rolled back.

Validation-Based Protocol (Cont.)
Each transaction Ti has three timestamps:
– Start(Ti): the time when Ti started its execution
– Validation(Ti): the time when Ti entered its validation phase
– Finish(Ti): the time when Ti finished its write phase
The serializability order is determined by the timestamp given at validation time, to increase concurrency; thus TS(Ti) is given the value of Validation(Ti).
This protocol is useful and gives a greater degree of concurrency if the probability of conflicts is low, because the serializability order is not pre-decided and relatively few transactions will have to be rolled back.

Validation Test for Transaction Tj
If for all Ti with TS(Ti) < TS(Tj) one of the following conditions holds:
– finish(Ti) < start(Tj)
– start(Tj) < finish(Ti) < validation(Tj), and the set of data items written by Ti does not intersect with the set of data items read by Tj
then validation succeeds and Tj can be committed. Otherwise, validation fails and Tj is aborted.
Justification: either the first condition is satisfied and there is no overlapped execution, or the second condition is satisfied and
– the writes of Tj do not affect the reads of Ti, since they occur after Ti has finished its reads;
– the writes of Ti do not affect the reads of Tj, since Tj does not read any item written by Ti.

Validation of Ti – Rule 1
If all Tk with ts(Tk) < ts(Ti) have completed before Ti has started, then validation succeeds. This is a serial execution order.

Validation of Ti – Rule 2
If there is any transaction Tk such that ts(Tk) < ts(Ti), Tk is in its write phase, Ti is in its read phase, and WS(Tk) ∩ RS(Ti) = ∅, then validation succeeds.
– None of the data items updated by Tk are read by Ti.
– The updates of Ti will not be overwritten by the updates of Tk.
The read and write phases overlap, but Ti does not read data items written by Tk.

Validation of Ti – Rule 3
If there is any transaction Tk such that ts(Tk) < ts(Ti), Tk completes its read phase before Ti completes its read phase, WS(Tk) ∩ RS(Ti) = ∅, and WS(Tk) ∩ WS(Ti) = ∅, then validation succeeds.
– The updates of Tk will not affect the read phase or write phase of Ti.
– The transactions overlap, but do not access any common data items. A sketch of the validation test follows.
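A sketch of the validation test from the "Validation Test for Transaction Tj" slide, assuming each transaction is represented by a small record of its phase timestamps and read/write sets (the field names and validate helper are assumptions for illustration).

```python
# Sketch of the validation test: Tj passes if, for every Ti with
# TS(Ti) < TS(Tj), one of the two conditions of the test holds.
def validate(tj, earlier):
    """Each transaction record: {'start', 'validation', 'finish',
    'read_set', 'write_set'}; 'finish' is None if Ti has not finished."""
    for ti in earlier:
        # Condition 1: Ti finished before Tj started (no overlap at all).
        if ti["finish"] is not None and ti["finish"] < tj["start"]:
            continue
        # Condition 2: Ti finished writing before Tj validates, and Ti wrote
        # nothing that Tj read.
        if (ti["finish"] is not None
                and tj["start"] < ti["finish"] < tj["validation"]
                and not (ti["write_set"] & tj["read_set"])):
            continue
        return False          # validation fails: Tj is aborted
    return True               # validation succeeds: Tj may commit

tj = {"start": 5, "validation": 9, "finish": None,
      "read_set": {"A"}, "write_set": {"B"}}
ti = {"start": 1, "validation": 4, "finish": 7,
      "read_set": {"B"}, "write_set": {"C"}}
print(validate(tj, [ti]))     # True: Ti's write of C is never read by Tj
```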

Advantages and Disadvantages
Advantages of optimistic algorithms: higher concurrency.
Disadvantages: higher storage cost for maintaining the ordering of transactions.

Deadlock Management

Deadlock Management
Deadlock: a set of transactions is in a deadlock situation if several transactions wait for each other. A deadlock requires outside intervention to be resolved.
– Any locking-based concurrency control algorithm may result in a deadlock, since there is mutually exclusive access to data items and transactions may wait for a lock.
– Some TO-based algorithms that require the waiting of transactions may also cause deadlocks.

Deadlock: Example
Consider the following two transactions:
T1: write(X); write(Y)
T2: write(Y); write(X)
Schedule with deadlock:
– T1 acquires lock-X on X and executes write(X); T2 acquires lock-X on Y and executes write(Y).
– T1 then waits for lock-X on Y, while T2 waits for lock-X on X: neither can proceed.

Deadlock Management …
There are four principal methods for dealing with the deadlock problem:
– Ignore ➠ let the application programmer deal with it, or restart the system.
– Prevention ➠ guarantee that deadlocks can never occur in the first place. Check a transaction when it is initiated. Requires no run-time support.
– Avoidance ➠ detect potential deadlocks in advance and take action to ensure that deadlock will not occur. Requires run-time support.
– Detection and Recovery ➠ allow deadlocks to form, then find and break them. As in the avoidance scheme, this requires run-time support.

Deadlock Prevention
Deadlock prevention: guarantee that deadlocks never occur.
– Check a transaction when it is initiated, and start it only if all required resources are available.
– All resources which may be needed by a transaction must be predeclared.
Advantages:
– No transaction rollback or restart is involved
– Requires no run-time support
Disadvantages:
– Reduced concurrency due to pre-allocation
– Evaluating whether an allocation is safe leads to added overhead
– Difficult to determine the required resources in advance

Deadlock Avoidance
– Transactions are not required to request resources a priori.
– Transactions are allowed to proceed unless a requested resource is unavailable.
– In case of conflict, transactions may be allowed to wait for a fixed time interval.
– Order either the data items or the sites and always request locks in that order.
– More attractive than prevention in a database environment.

Deadlock Prevention …
If Ti requests a lock on a data item which is already locked by Tj:
WAIT-DIE rule:
– If TS(Ti) < TS(Tj), then Ti (older than Tj) is permitted to wait.
– Else Ti (younger than Tj) is aborted (dies) and restarted with the same timestamp.
➠ Non-preemptive: Ti never preempts Tj; the older transaction waits.
WOUND-WAIT rule:
– If TS(Ti) < TS(Tj) (Ti is older than Tj), then Tj is aborted (Ti wounds Tj) and restarted with the same timestamp.
– Else (Ti younger than Tj) Ti is permitted to wait.
➠ Preemptive: Ti preempts Tj if Tj is younger; the younger transaction waits.

Example (timeline):
Tx starts at time 1, so TS(Tx) = 1. Ty starts at time 2, so TS(Ty) = 2. Ty has already locked item A, and now Tx attempts to lock item A.
– Under Wait-Die: TS(Tx) < TS(Ty), so Tx will wait for Ty to release the lock. In other words, since Tx started first, it gets the privilege of waiting.
– Under Wound-Wait: TS(Tx) < TS(Ty), so Ty will be aborted. In other words, since Tx started first, it has the privilege of kicking Ty out of the way by aborting it. Tx can then proceed.

The following schemes use transaction timestamps for the sake of deadlock prevention alone (a sketch of both decisions follows):
Wait-Die scheme (non-preemptive):
– An older transaction may wait for a younger one to release a data item.
– Younger transactions never wait for older ones; they are rolled back instead.
– A transaction may die several times before acquiring the needed data item.
Wound-Wait scheme (preemptive):
– An older transaction wounds (forces the rollback of) a younger transaction instead of waiting for it.
– Younger transactions may wait for older ones.
– There may be fewer rollbacks than in the wait-die scheme.
In both schemes, restarted transactions are not assigned new timestamps; this avoids starvation but may cause unnecessary rollbacks.
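Both rules reduce to a single comparison of timestamps when a lock request is blocked. The sketch below is illustrative (the resolve function and its return strings are assumptions, not part of any real lock manager) and reproduces the Tx/Ty timeline example.

```python
# Sketch of the wait-die and wound-wait decisions for a requester blocked by
# a holder.
def resolve(requester_ts, holder_ts, scheme):
    """Return what happens to the requesting transaction."""
    requester_is_older = requester_ts < holder_ts
    if scheme == "wait-die":              # non-preemptive
        return ("requester waits" if requester_is_older
                else "requester dies (restart with same timestamp)")
    if scheme == "wound-wait":            # preemptive
        return ("holder is wounded (aborted)" if requester_is_older
                else "requester waits")
    raise ValueError(scheme)

# Tx (TS=1) requests a lock held by Ty (TS=2), as in the timeline example:
print(resolve(1, 2, "wait-die"))     # requester waits
print(resolve(1, 2, "wound-wait"))   # holder is wounded (aborted)
```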

Deadlock Detection
A wait-for graph (WFG) is a useful tool to identify deadlocks:
– The nodes represent transactions
– An edge from Ti to Tj indicates that Ti is waiting for Tj
– If the WFG has a cycle, we have a deadlock situation

Deadlock Detection
Deadlocks can be described by a wait-for graph, which consists of a pair G = (V, E):
– V is a set of vertices (all the transactions in the system)
– E is a set of edges; each element is an ordered pair Ti → Tj. If Ti → Tj is in E, then there is a directed edge from Ti to Tj, implying that Ti is waiting for Tj to release a data item.
When Ti requests a data item currently being held by Tj, the edge Ti → Tj is inserted in the wait-for graph. This edge is removed only when Tj is no longer holding a data item needed by Ti.
The system is in a deadlock state if and only if the wait-for graph has a cycle. A deadlock-detection algorithm must be invoked periodically to look for cycles; a sketch follows.
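Detection itself is just cycle search on the wait-for graph. The sketch below assumes the graph is kept as a dictionary from each transaction to the set of transactions it waits for (an illustrative representation) and uses a depth-first search to report whether a cycle, and hence a deadlock, exists.

```python
# Sketch of deadlock detection on a wait-for graph: a depth-first search for
# a cycle (edge Ti -> Tj means Ti waits for Tj).
def has_deadlock(wfg):
    """wfg: dict mapping each transaction to the set it is waiting for."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {t: WHITE for t in wfg}

    def dfs(t):
        color[t] = GREY                        # t is on the current path
        for nxt in wfg.get(t, ()):
            if color.get(nxt, WHITE) == GREY:  # back edge -> cycle -> deadlock
                return True
            if color.get(nxt, WHITE) == WHITE and dfs(nxt):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and dfs(t) for t in wfg)

print(has_deadlock({"T1": {"T2"}, "T2": {"T3"}, "T3": set()}))   # False
print(has_deadlock({"T1": {"T2"}, "T2": {"T1"}}))                # True
```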

Deadlock Detection (Cont.)
(Figures: a wait-for graph without a cycle, and a wait-for graph with a cycle.)

Deadlock Detection …
Advantages:
– Allows maximal concurrency
– The most popular and best-studied method
Disadvantages:
– A considerable amount of work might be undone

Recovery from Deadlock
– Choose a victim: a transaction that can be aborted and whose abort causes the least loss of performed work is chosen as the victim.
– Abort: the chosen victim is aborted.

Deadlock Recovery
When a deadlock is detected, some transaction will have to be rolled back (made a victim) to break the deadlock. Select as victim the transaction that will incur minimum cost:
– has performed less computation
– has used fewer data items
– will cause fewer cascading rollbacks
Rollback: determine how far to roll back the transaction.
– Total rollback: abort the transaction and then restart it.
– Partial rollback: roll back the transaction only as far as necessary to break the deadlock.
Starvation happens if the same transaction is always chosen as the victim; include the number of rollbacks in the cost factor to avoid starvation (see the sketch below).
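Victim selection can be sketched as a simple cost comparison over the factors listed above. The weights and field names below are purely illustrative assumptions; real systems tune such heuristics, and the rollback counter is what keeps one transaction from being starved as the perpetual victim.

```python
# Sketch of victim selection: pick the transaction whose abort loses the
# least work, and penalize past rollbacks so the same victim is not starved.
def pick_victim(transactions):
    """transactions: list of dicts with 'id', 'work_done', 'items_locked',
    'cascades', 'times_rolled_back'."""
    def cost(t):
        return (t["work_done"] + t["items_locked"] + 10 * t["cascades"]
                + 100 * t["times_rolled_back"])   # illustrative weights
    return min(transactions, key=cost)["id"]

print(pick_victim([
    {"id": "T1", "work_done": 50, "items_locked": 4, "cascades": 2, "times_rolled_back": 0},
    {"id": "T2", "work_done": 5,  "items_locked": 1, "cascades": 0, "times_rolled_back": 0},
]))  # T2: least work to undo
```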

MULTIVERSION

Multiversion Schemes
Multiversion schemes keep old versions of data items to increase concurrency:
– Multiversion Timestamp Ordering
– Multiversion Two-Phase Locking
Each successful write results in the creation of a new version of the data item written. Timestamps are used to label versions.
When a read(Q) operation is issued, an appropriate version of Q is selected based on the timestamp of the transaction, and the value of the selected version is returned. Reads never have to wait, as an appropriate version is returned immediately.

Multiversion Timestamp Ordering
Each data item Q has a sequence of versions <Q1, Q2, ..., Qm>. Each version Qk contains three data fields:
– Content: the value of version Qk.
– W-timestamp(Qk): the timestamp of the transaction that created (wrote) version Qk.
– R-timestamp(Qk): the largest timestamp of any transaction that successfully read version Qk.
When a transaction Ti creates a new version Qk of Q, Qk's W-timestamp and R-timestamp are initialized to TS(Ti). The R-timestamp of Qk is updated whenever a transaction Tj reads Qk and TS(Tj) > R-timestamp(Qk).

Multiversion Timestamp Ordering (Cont.)
The multiversion timestamp scheme presented next ensures serializability. Suppose that transaction Ti issues a read(Q) or write(Q) operation, and let Qk denote the version of Q whose write timestamp is the largest write timestamp less than or equal to TS(Ti).
1. If transaction Ti issues a read(Q), then the value returned is the content of version Qk.
2. If transaction Ti issues a write(Q) and TS(Ti) < R-timestamp(Qk), then transaction Ti is rolled back. Otherwise, if TS(Ti) = W-timestamp(Qk), the contents of Qk are overwritten; otherwise a new version of Q is created.
Reads always succeed. A write by Ti is rejected if some other transaction Tj that (in the serialization order defined by the timestamp values) should read Ti's write has already read a version created by a transaction older than Ti.
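A sketch of the MVTO read and write rules above, assuming each item keeps its versions as (content, W-timestamp, R-timestamp) triples (the versions table and helper functions are illustrative, not a prescribed implementation).

```python
# Sketch of the MVTO rules: pick the version with the largest write timestamp
# not exceeding TS(Ti), then apply the read/write cases.
class MVTOAbort(Exception):
    pass

versions = {"Q": [("v0", 0, 0)]}    # item -> list of (content, WTS, RTS)

def pick(item, ts):
    """Index of the version with the largest write timestamp <= TS(Ti)."""
    return max((i for i, v in enumerate(versions[item]) if v[1] <= ts),
               key=lambda i: versions[item][i][1])

def read(ts, item):
    i = pick(item, ts)
    content, wts, rts = versions[item][i]
    versions[item][i] = (content, wts, max(rts, ts))   # bump R-timestamp
    return content                                     # reads never wait

def write(ts, item, content):
    i = pick(item, ts)
    _, wts, rts = versions[item][i]
    if ts < rts:                       # a younger txn already read that version
        raise MVTOAbort(f"T{ts} rolled back")
    if ts == wts:
        versions[item][i] = (content, wts, rts)        # overwrite own version
    else:
        versions[item].append((content, ts, ts))       # create a new version

print(read(2, "Q"))        # 'v0'; RTS of that version becomes 2
write(3, "Q", "v3")        # new version with WTS = RTS = 3
try:
    write(1, "Q", "v1")    # rejected: T2 already read the version T1 would update
except MVTOAbort as e:
    print(e)
```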

Multiversion Two-Phase Locking
– Differentiates between read-only transactions and update transactions.
– Update transactions acquire read and write locks and hold all locks up to the end of the transaction; that is, update transactions follow rigorous two-phase locking.
– Each successful write results in the creation of a new version of the data item written.
– Each version of a data item has a single timestamp, whose value is obtained from a counter ts-counter that is incremented during commit processing.
– Read-only transactions are assigned a timestamp by reading the current value of ts-counter before they start execution; they follow the multiversion timestamp-ordering protocol for performing reads.

Multiversion Two-Phase Locking (Cont.)
– When an update transaction wants to read a data item, it obtains a shared lock on it and reads the latest version.
– When it wants to write an item, it obtains an exclusive (X) lock on it; it then creates a new version of the item and sets this version's timestamp to ∞.
– When update transaction Ti completes, commit processing occurs: Ti sets the timestamp on the versions it has created to ts-counter + 1, then increments ts-counter by 1.
– Read-only transactions that start after Ti increments ts-counter will see the values updated by Ti; read-only transactions that start before Ti increments ts-counter will see the values before Ti's updates.
– Only serializable schedules are produced.

Any Questions?