Deadlock in Distributed Systems

PritomSahaAkash 15,783 views 22 slides Nov 07, 2016
Slide 1
Slide 1 of 22
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22

About This Presentation

Here Deadlock in distributed systems is explained.


Slide Content

IIT, University of Dhaka
Deadlock in
Distributed Systems
Course Name: Distributed Systems
Course Code: CSE 601

Submitted To
Dr. Kazi Muheymin-Us-Sakib
Professor, IIT, University of Dhaka
Submitted By
Pritom Saha Akash
BSSE 0604
11-7-2016

Introduction
A​deadlockisaconditioninasystemwhereasetofprocesses(orthreads)haverequestsfor
resourcesthatcanneverbesatisfied.Essentially,aprocesscannotproceedbecauseitneedsto
obtainaresourceheldbyanotherprocessbutititselfisholdingaresourcethattheotherprocess
needs.Moreformally,Coffmandefinedfourconditionshavetobemetforadeadlocktooccurin
a system:
1.Mutual exclusion​ A resource can be held by at most one process.
2.Hold and wait​ Processes that already hold resources can wait for another resource.
3.Non-preemptionAresource,oncegranted,cannotbetakenawayunlessitreleases
voluntarily.
4.CircularwaitTwoormoreprocessesmustformacircularchaininwhicheachprocess
is waiting for resource held that is held by the next number of the chain.
Allfourconditionsmustholdsimultaneouslyforadeadlocktooccur.Ifanyofthemisabsent,
no deadlock can occur.
Adirectedgraphmodelusedtorecordtheresourceallocationstateofasystem.Thisstate
consists of ​n ​ processes, P​
1​
… P​
n​
, and ​m ​ resources, R​
1​
… R​
m​
. In such a graph:
P​
1​
→ R​
1​
means that resource R​
1​
is allocated to process P​
1​
.
P​
1​
← R​
1​
means that resource R​
1​
is requested by process P​
1​
.
Deadlock is present when the graph has a directed cycle.
Resource Allocation Graph
Insomecases,deadlockscanbeunderstoodmoreclearlythroughtheuseof
Resource-Allocation Graphs​, having the following properties:
●Asetofresourcecategories,{R1,R2,R3,...,RN},whichappearassquarenodesonthe
graph.Dotsinsidetheresourcenodesindicatespecificinstancesoftheresource.(E.g.
two dots might represent two laser printers.)
●A set of processes, {P1, P2, P3, . . ., PN}
●RequestEdges-​AsetofdirectedarcsfromPitoRj,indicatingthatprocessPihas
requested Rj, and is currently waiting for that resource to become available.
●AssignmentEdges-​AsetofdirectedarcsfromRjtoPiindicatingthatresourceRjhas
been allocated to process Pi, and that Pi is currently holding resource Rj.
1​ | ​Page

●Notethata​requestedgecanbeconvertedintoan​assignmentedgebyreversingthe
directionofthearcwhentherequestisgranted.(Howevernotealsothatrequestedges
pointtothecategorybox,whereasassignmentedgesemanatefromaparticularinstance
dot within the box.)
●For example:


Figure 1:​ Resource allocation graph
●Ifaresource-allocationgraphcontainsnocycles,thenthesystemisnotdeadlocked.
(Whenlookingforcycles,rememberthattheseare​directedgraphs.)Seetheexamplein
Figure 2 above.
●Ifaresource-allocationgraphdoescontaincycles​ANDeachresourcecategorycontains
only a single instance, then a deadlock exists.
●Ifaresourcecategorycontainsmorethanoneinstance,thenthepresenceofacycleinthe
resource-allocationgraphindicatesthe​possibilityofadeadlock,butdoesnotguarantee
one.Inthiscase,asufficientconditionfordeadlock.Consider,forexample,Figures3
and 4 below:
2​ | ​Page

Figure 2:​ Resource allocation graph with a cycle but no deadlock

Figure 3​ - Resource allocation graph with a knot with deadlock
[​Knot:​subgraphsuchthatstartingfromanynodeinthesubgraphitisimpossibletoleavethe
knot following the edges of the graph.]
3​ | ​Page

Intermsoftheresourceallocationgraph,thenecessaryandsufficientconditionsfordeadlock
can be summarized as follows:
▪A cycle is a necessary condition for deadlock
▪Ifthereisonlysingleunitofeachresourcetypeinvolvedinthecycle,acycleisboth
a necessary and a sufficient condition for a deadlock to exists.
▪Ifoneormoreoftheresourcetypeinvolvedinthecyclehavemorethanoneunita
knot is a sufficient condition for deadlock.
Wait-for Graph
Whenalltheresourcetypeshaveonlyasingleuniteach,asimplifiedformofresourceallocation
graphisused.Thesimplifiedgraphisobtainedfromtheoriginalresourceallocationgraphby
removingtheresourcenodesandcollapsingtheappropriateedges.Thissimplificationisbased
ontheobservationthataresourcecanalwaysbeidentifiedbyitsownerprocess.Therearea
resource allocation graph and its corresponding wait-for graph below:

Figure 4: (a) resource allocation graph (b) corresponding wait for graph


4​ | ​Page

Handling Deadlocks in Distributed Systems
Thesameconditionsfordeadlockinuniprocessorsapplytodistributedsystems.Unfortunately,
asinmanyotheraspectsofdistributedsystems,theyarehardertodetect,avoid,andprevent.
Four strategies can be used to handle deadlock:
1.Ignorance​:ignoretheproblem;assumethatadeadlockwillneveroccur.Thisisa
surprisingly common approach.
2.Avoidance​:chooseresourceallocationcarefullysothatdeadlockwillnotoccur.
Resourcerequestscanbehonoredaslongasthesystemremainsinasafe(non-deadlock)
state after resources are allocated.
3.Prevention​:makeadeadlockimpossiblebygrantingrequestssothatoneofthe
necessary conditions for deadlock does not hold.
4.Detectionandrecovery​:letadeadlockoccur,detectit,andthendealwithitbyaborting
and later restarting a process that causes deadlock.

Deadlock Avoidance
Deadlockavoidancemerelyworksto​avoiddeadlock;itdoesnottotallypreventit.Thebasic
ideahereistoallocateresourcesonlyiftheresultingglobalstateisasafestate.Inotherwords,
unsafestatesareavoided,meaningthatdeadlockisavoidedaswell.Onefamousalgorithmfor
deadlockavoidanceintheuniprocessorcaseistheBanker'sAlgorithm.Similaralgorithmshave
beenattemptedforthedistributedcase.Deadlockavoidancealgorithmsareusuallyinthe
following steps:

1.Whenaprocessrequestsforaresource,iftheresourceisavailableforallocationitisnot
immediatelyallocatedtotheprocessratherthesystemassumesthattherequestis
granted.
2.Usingadvanceknowledgeofresourceusageofprocessesandassumptionsofstep1the
system analysis whether granting the process request is safe or unsafe.
3.Theresourceisallocatedtotheprocessonlyiftheanalysisofstep2showsthatitissafe
to do so otherwise the request is deferred.
●Itisimportanttolookatthenotionofsafetyinresourceallocationbecauseallthe
algorithmsfordeadlockavoidancearebasedontheconceptofsafeandunsafe
states.
5​ | ​Page

●Asystemissaidtobeinsafestateifitisnotinadeadlockstateandthereexists
someorderingoftheprocessinwhichtherequestsaregrantedtorunallofthem
to completion.
●Anyorderingoftheprocessthatcanguaranteethecompletionofalltheprocesses
is called safe sequence.
●Theformationofsafesequenceshouldsatisfytheconditionthatforanyprocess
Piinasafesequence,theresourcethatPicanstillrequestandcanbesatisfiedby
currentlyavailableresourcesplustheresourcesheldbyallprocesseslyingbefore
Pi in the safe sequence.
●ThisconditionguaranteesthattheprocessPicanberuntocompletionifthe
resourcethatPineedsarenotimmediatelyavailableandPicanwaituntilallthe
processesinthesequencelyingbeforePihavefinished.Whentheyhavefinished
Pi can obtain all its needed resource and run to completion.
●A system is said to be unsafe if no safe sequence exists for that state.
●Forexample:Assumeasystemhaving8unitsofaparticularresourcetypefor
whichthreeprocessesP1,P2andP3arecompetingandthemaximumunitsofthe
resource required by them are 4, 5, and 6 respectively.
●Eachofthethreeresourcesisholding2unitsoftheresourcethereforeinthe
currentstateofthesystem2unitsofresourcearefree.Theaimistofindwhether
statinfig(a)issafeorunsafe.Thefigureshowsthatthestateissafebecause
sequenceofallocationstoallowprocesscompletionexistsandthesafesequences
are (P1, P2, P3) and (P1, P3, P2).
●Infig(a)theschedulercouldsimplerunP1untilitaskedforandgottwomore
unitsoftheresourcethatarecurrentlyfreeleadingtostateoffig(b).WhenP1
completesandreleasestheresourceheldbyitthestateinfig(c)isachieved.Then
theschedulerchoosestorunP2whichleadstothenextstateoffig(d).WhenP2
completesandreleasesresourcesheldbyitthesystementersthestateoffig(e).
P3 can now run to completion with the available resources.
6​ | ​Page

Figure 5:​ Demonstrate that the state (a) is a safe state and has two safe sequence
●Theinitialstateoffig(a)issafestatebecausethesystemcanavoiddeadlockby
carefulscheduling.Ifresourceallocationisnotdonecarefullythesystemcan
move from a safe state to unsafe state.
●Deadlockavoidancealgorithmbasicallyperformresourceallocationissucha
manner that ensures the system will always remain in safe state.
●Sinceinitialstateisalwayssafestate,whenaprocessrequestsaresourcethatis
available,thesystemchecksifallocationcausestheshiftfromsafetounsafe
state. If no, then the request is granted or else it is deferred.


7​ | ​Page

Deadlock Prevention
Thisapproachisbasedontheideaofdesigningthesysteminsuchawaythatthedeadlock
becomeimpossible.Itdiffersfromavoidanceanddetectioninthatnoruntimetestingofpotential
allocationsneedbeperformed.Wesawthatmutualexclusion,holdandwait,no-preemptionand
circularwaitarethefournecessaryconditionsfordeadlocktooccurinasystem.Therefore,ifwe
cansomehowensurethatatleastoneoftheseconditionsisneversatisfied,deadlockwillbe
impossible.Basedonthisidea,therearethreeimportantdeadlock-preventionmethods-
collective requests, ordered requests and preemption.
Collective Requests
Thesemethodsdenytheholdandwaitconditionbyensuringthatwheneveraprocessrequestsa
resource it does not hold any other resource. Various resource allocation policies can be used.
Forinstance,inordertopreventthe​Hold&Waitconditionfromhappeningoneofthefollowing
resource allocation policies may be used:

1.Aprocessmustrequestallofitsresourcesbeforeitbeginsexecution.Ifalltheneeded
resourcesareavailable,theyareallocatedtotheprocesssothattheprocesscanrunto
completion.Ifoneormoreoftherequestedresourcesarenotavailable,nonewillbe
allocated and the process would just wait.
2.Insteadofrequiringalltheresourcesbeforeitbeginsexecution,aprocessmayrequest
resourcesduringitsexecutionifitobeystherulethatitrequestsresourcesonlywhenit
holdsnootherresources.Iftheprocessisholdingsomeresources,itcanadheretothe
rule by first releasing all of them and re-requesting all the necessary resources.

The second policy has the following advantages over the first one:

1.Inpractice,manyprocessdonotknowhowmanyresourcestheywillneeduntilthey
have started running. For such cases, the second approach is more useful.
2.Alongprocessmayrequiresomeresourcesonlytowardstheendofitsexecution.Inthe
firstpolicy,theprocesswillunnecessarilyholdtheseresourcesfortheentiredurationof
itsexecution.Inthesecondpolicy,theprocesscanrequestforthoseresourcesonlywhen
it needs them.



8​ | ​Page

The collective requests method is simple and effective but has the following problems:

●This degrades resource utilization.
●It also leads to starvation for those processes with large resource needs.
●Themethodalsorisesanaccountingquestion.Whenaprocessholdsresourcesfor
extendedperiodsduringwhichtheyarenotneeded,itisnotclearwhoshouldpaythe
charge for the idled resources.

Ordered Requests
Inthismethodcircular-waitisdeniedsuchthateachresourcetypeisassignedauniqueglobal
numbertoimposetotalorderingofallresourcetypes.Nowaresourceallocationpolicyisused
accordingtowhichaprocesscanonlyrequestresourcesinclasseshigherthantheclassesit
currentlyholds​.​Thatis,ifaprocessholdsaresourcetypewhosenumberis​i, ​itmayrequesta
resource type having a number ​j ​ only if ​j>i ​.
Notethatthisalgorithmdoesnotrequirethataprocessmustacquireallitsresourcesinstrictly
increasingsequence.Forinstance,aprocessholdingtworesourceshavingnumbers3and7may
releasetheresourcehavingnumber7beforerequestingtheresourcehavingnumber5.Thisisall
allowedbecausewhentheprocessrequestsfortheresourcehavingnumber5,itisnotholding
any resource having number larger than 5.
This method may face the following problems:
●Thenaturalorderingisnotalwayssameforalljobs.Therefore,ajobthatmatchesthe
decidedorderingcanbeexpectedtousetheresourcesefficientlybuttheotherswould
waste the resources.
●Oncetheorderinghasbeendecided,itwillstayforalongtimebecausetheorderingis
codedintoprograms.Reorderingwillrequirereprogrammingofseveraljobs.However,
reordering may become inevitable when new resources are added.
Despitethesedifficulties,themethodoforderedrequestsisoneofthemostefficientforhandling
deadlocks.

Preemption
A​preemptableresourceisonewhosestatecaneasilybesavedandrestoredlater.Sucharesource
canbetemporarilytakenawayfromtheprocesstowhichitiscurrentlyallocatedwithoutcausing
9​ | ​Page

anyharmtothecomputationperformedsofarbytheprocess.TheCPUregisterandmain
memoryaretheexamplesofpreemptableresources.Iftheresourcesarepreemptable,deadlocks
canbepreventedbyusingoneofthefollowingresourceallocationpolicythatdenythe
no-preemption condition:
1.Whenaprocessrequestsforaresourcethatisnotcurrentlyavailable,alltheresources
heldbytheprocessaretakenaway(preempted)fromitandtheprocessisblocked.The
processisunblockedwhentheresourcerequestedbyitandtheresourcespreempted
from it become available and can be allocated to it.
2.Whenaprocessrequestsforaresourcethatisnotcurrentlyavailable,thesystemchecks
iftherequestedresourceiscurrentlyheldaprocessthatisblocked,waitingforsome
otherresources.Ifso,therequestedresourceistakenaway(preempted)fromthewaiting
processandgiventotherequestingprocess.Otherwise,therequestingprocessisblocked
andwaitsfortheresourcetobecomeavailable.Someoftheresourcesthattheprocessis
alreadyholdingmaybetakenawayfromitwhileitisblocked,waitingfortherequested
resource.Theprocessisunblockedwhentherequestedresourceandanyotherresources
are preempted from it become available.

Inthetransactionbaseddeadlockpreventionmethod,eachtransactionisassignedaunique
prioritynumberbythesystemandwhentwoormoretransactioncompeteforthesameresource,
theirprioritynumbersareusedtobreakthetie.Forexample,Lamport’salgorithmmaybeused
togenerateasystemwidegloballyuniquetimestampwhenitiscreated.Atransaction
timestampmaybeusedasitsprioritynumber;atransactionhavingalowervalueoftimestamp
may have higher priority because it is older.
Rosencrantzetal.proposedthefollowingdeadlockpreventionschemesbasedontheabove
ideas:
Wait-Die Scheme
Inthisscheme,ifatransactionrequeststolockaresource(dataitem),whichisalreadyheldwith
a conflicting lock by another transaction, then one of the two possibilities may occur −
●IfTS(T​
i​
)<TS(T​
j​
)−thatisT​
i​
,whichisrequestingaconflictinglock,isolderthanT​
j

then T​
i​
is allowed to wait until the data-item is available.
●IfTS(T​
i​
)>TS(T​
j​
)−thatisT​
i
isyoungerthanT​
j
−thenT​
i
dies.T​
i
isrestartedlaterwitha
random delay but with the same timestamp.
10​ | ​Page

This scheme allows the older transaction to wait but kills the younger one.

Figure 6: ​Wait-Die Scheme

Wound- Wait Scheme
Inthisscheme,ifatransactionrequeststolockaresource(dataitem),whichisalreadyheldwith
conflicting lock by some other transaction, one of the two possibilities may occur:
●IfTS(T​
i​
)<TS(T​
j​
),thenT​
i
forcesT​
j
toberolledback−thatisT​
i
woundsT​
j​
.T​
j
isrestarted
later with a random delay but with the same timestamp.
●If TS(T​
i​
) > TS(T​
j​
), then T​
i​
is forced to wait until the resource is available.
Thisscheme,allowstheyoungertransactiontowait;butwhenanoldertransactionrequestsan
itemheldbyayoungerone,theoldertransactionforcestheyoungeronetoabortandreleasethe
item.

Figure 7: ​Wound-Wait Scheme
In both the cases, the transaction that enters the system at a later stage is aborted.
11​ | ​Page

Deadlock detection
●Inthisapproachfordeadlockdetection,thesystemdoesnotmakeanyattempttoprevent
deadlockbutallowsprocessestorequestresourcesandwaitforeachotherin
uncontrolled manner.
●Deadlock detection algorithms are same in centralized and distributed systems.
●DeadlockdetectionalgorithmsgetsimplifiedbymaintainingWait-for-graph(WFG)and
searching for cycles. The different approaches for deadlock detection are:
1. Centralized Approach for Deadlock Detection
InthisapproachalocalcoordinatorateachsitemaintainsaWFGforitslocalresourcesanda
centralcoordinatorforconstructingtheunionofalltheindividualWFGs.Thecentral
coordinatorconstructstheglobalWFGfromtheinformationreceivedfromthelocalcoordinators
of all sites. Deadlock detection is performed as follows:
1.IfacycleexistsinthelocalWFGofanysite,thenitrepresentsalocaldeadlockwhichis
detected and resolved by the local coordinator of the site.
2.Deadlocksinvolvingresourcesattwoormoresitesgetreflectedascyclesintheglobal
WFG are detected and resolved by the central coordinator.
Inthecentralizedapproach,thelocalcoordinatorssendlocalstateinformationtothecentral
coordinatorintheformofmessageswhichcanuse​Continuoustransfer,Periodictransferor
Transfer-on-request.
Although,thecentralizeddeadlockdetectionapproachisconceptuallysimple,itsuffersfromthe
following drawback:
●It is vulnerable to failures of the central coordinator.
●Thecentralizedcoordinatorcanconstituteaperformancebottleneckinlargesystems
having too many sites.
●The centralized coordinator may detect false deadlock.
False deadlock detection in centralized approach
●ConsiderasystemwithprocessesAandBrunningonmachine0,andprocessCon
machine 1.
●Three resources exist: R, S and T.
12​ | ​Page

●Initially:
▪A holds S but wants R, which it cannot have because B is using it;
▪C has T and wants S, too.
▪The coordinator's view of the world is shown in (c)
▪Thisconfigurationissafe:assoonasBfinishes,AcangetRandfinish,releasing
S for C.

●After a while:
▪B releases R and asks for T, a perfectly legal and safe swap.
▪Machine 0 sends a message to the coordinator announcing the release of R,
▪Machine1sendsamessagetothecoordinatorannouncingthefactthatBisnow
waiting for its resource, T.
▪Unfortunately,themessagefrommachine1arrivesfirst,leadingthecoordinator
to construct the graph of (d).
▪Thecoordinatorincorrectlyconcludesthatadeadlockexistsandkillssome
process.
Such a situation is called a false deadlock


Figure 8: ​(a) Initial resource graph for machine 0 (b) Initial resource graph for
machine 1 (c) The coordinator's view of the world. (d) The situation after the delayed
message
Way out using Lamport’s algorithm
●Sincethemessagefrommachine1tothecoordinatoristriggeredbytherequestfrom
machine0,themessagefrommachine1tothecoordinatorwillhavealatertimestamp
than the message from machine 0 to the coordinator.
●Whenthecoordinatorgetsthemessagefrommachine1thatleadsittosuspectdeadlock,
sendamessagetoeverymachineinthesystemsaying:“Ijustreceivedamessagewith
13​ | ​Page

timestampTwhichleadstodeadlock.Ifanyonehasamessageformewithanearlier
timestamp, please send it immediately.”
●Wheneverymachinehasreplied,positivelyornegatively,thecoordinatorwillseethat
the arc from R to B has vanished, so the system is still safe.
●Althoughthismethodeliminatesthefalsedeadlock,itrequiresglobaltimeandis
expensive


2​.​ Hierarchical Approach for Deadlock Detection
Thehierarchicalapproachovercomesdrawbacksofthecentralizedapproach.Thisapproachuses
alogicalhierarchyofdeadlockdetectorscalledascontrollers.Eachcontrollerdetectsonlythose
deadlocksthathavethesitesfallingwithintherangeofthehierarchy.GlobalWFGisdistributed
overanumberofdifferentcontrollersinthisapproach.Eachsitehasitsownlocalcontrollerthat
maintains its own local graph. WFG is maintained by a controller use the following rules:
i.EachcontrollerthatformsaleafofthehierarchytreemaintainsthelocalWFGof
a single site.
ii.Eachnon-leafcontrollermaintainsaWFGthatistheunionoftheWFGsofits
immediate children in the hierarchy tree.
ThelowestlevelcontrollerthatfindsacycleinitsWFGdetectsadeadlockanttakesnecessary
actiontoresolveit.WFGthatcontainsacyclethatwillneverbepassedasitistohigherlevel
controller.
LetA,B,andCbecontrollerssuchthatCisthelowestcommonancestorofAandB.Ifpi
appearsinthelocalwait-forgraphsofcontrollersAandB,itmustalsoappearinthewait-for
graph of
●controller C,
●every controller on the path from C to A, and
●every controller on the path from C to B.
Inotherwords,ifpiandpjareinthewait-forgraphofacontrollerDandthereisapathfrompi
to pj at any controller, then the edge (pi, pj) must also be in the graph of controller D.
14​ | ​Page

Inthediagrambelow,weomittheinputandoutputportsandinsteadshowthestateofthe
controller.

Figure 9: ​Hierarchical deadlock detection approach

3​.​ Fully Distributed Approach for Deadlock Detection
Inthisapproacheachsitesharesequalresponsibilityfordeadlockdetection.Thefirstalgorithm
is based on construction of WFG and second one is a probe-based algorithm.
WFG-Based Distributed Algorithm for Deadlock Detection
InthisalgorithmeachsitemaintainsitsownlocalWFGbutforexternalprocessesamodified
formofWGFisused.Inthisanextranode​PexisaddedtothelocalWFGofeachsiteandthis
node is connected to the WFG of the corresponding site in the following manner:
i.Anedge(​Pi,Pex ​)isaddedifprocess​Pi ​iswaitingforaresourceinanothersitebeing
held by any process.
ii.Anedge(​Pex,Pj ​)isaddedif​Pj ​isaprocessofanothersitethatiswaitingforaresource
currently being held by a process of this site.
For example, the local site 1 WFG figure-10(a) shows
●process ​P3​ waiting for a resource held by process ​P2​;
15​ | ​Page

●process ​P2​ waiting for a resource held by process ​P1​; and
●process ​P2​ waiting for a resource held by process ​P4​.


Figure 10(a)​: Local WFG at Site 1 and 2


The local site 2 figure-10(a) WFG shows
•process ​P1​ waiting for a resource held by process ​P3​; and
•process ​P3​ waiting for a resource held by process ​P4​.
These do not show which processes are external to sites 1 or 2
ThealgorithmexpandsthelocalWFGsbyaddinganode,​Pex​,torepresenttheexternal
dependencies.
In Site 1 of the figure-10(b),
●Edge ​(P1, Pex) ​ implies that ​P1​ is waiting for a resource at Site 2 that is held by ​P3​.
●Edge​(Pex,P3)impliesthat​P3isaprocessatSite2waitingforaresourcethatisheldby
P2​ of site 1.
16​ | ​Page

Figure 10(b): ​Expanded Local WFG at Site 1 and 2
In Site 2 of the figure-10(b),
●Edge ​(P3, Pex) ​ implies that ​P3​ is waiting for a resource at Site 1.
●Edge​(Pex,P1)impliesthat​P1isaprocessatSite1waitingforaresourcethatisheldby
P3​ of site 2.
At this point, site 1 recognizes that it has a cycle in its expanded local WFG that contains ​Pex​.
Soitsendsadeadlockmessagetosite2,becauseoneoftheknownexternaldependenciesofsite
1 (​P3​) is on site 2.
ThisdeadlockmessagecontainsnottheentireWFGforsite1,butjustthepathforthatpartof
the cycle that does not include ​Pex​.
From this, site 2 updates its expanded local WFG as figure-10(c):

17​ | ​Page

Figure 10(c):​ Updated Expanded WFG at Site 2
Advantages of WFG based deadlock detection approach is similar to centralized approach
Disadvantageofthisapproachisoverheadofunnecessarymessagetransfer,Duplicationof
deadlock detection jobs.

Probe-Based Distributed Algorithm for Deadlock Detection
ThisalgorithmisalsoknownasChandy-Misra-Hassalgorithmandisconsideredtobethebest
algorithmfordetectingglobaldeadlocksindistributedsystems.Thealgorithmallowsaprocess
to request for multiple resources at a time.
1.Whenaprocessrequestsforaresourceandfailstogettheresourceandtimeoutoccursit
generates a special probe message and sends it to the requested resource process.
2.Each probe message contains the following information:
•the ​id ​ of the process that is blocked ​(the one that initiates the probe message) ​;
•the ​id ​ of the process is sending this particular version of the probe message; and
•the ​id ​ of the process that should receive this probe message.
3.Whenaprocessreceivesaprobemessage,itcheckstoseeifitisalsowaitingfor
resources.
•Ifnot,itiscurrentlyusingtheneededresource andwilleventuallyfinishand
release the resource.
18​ | ​Page

•Ifit​iswaitingforresources,itpassesontheprobemessagetoallprocessesit
knowstobeholdingresourcesithasitselfrequested.Theprocessfirstmodifies
the probe message, changing the sender and receiver ids.

4.Ifaprocessreceivesaprobemessagethatitrecognizesashavinginitiated,itknowsthere
is a cycle in the system and thus, ​deadlock​.

The following example is based on the same data used in the WFG- based approach example.
Inthiscase​P1initiatestheprobemessage,sothatallthemessagesshownhave​P1asthe
initiator.Whentheprobemessageisreceivedbyprocess​P3​,itmodifiesitandsendsittotwo
more processes. Eventually, the probe message returns to process ​P1​. ​Deadlock!



Figure 11: ​Example illustrating the CMH distributed deadlock detection algorithm

The advantages of this algorithm include the following:
●It is easy to implement.
●Each probe message is of fixed length.
●There is very little computation.
●There is very little overhead.
●There is no need to construct a graph, nor to pass graph information to other sites.
●This algorithm does not find false (phantom) deadlock.
19​ | ​Page

●There is no need for special data structures.

Recovery from Deadlock
Oncedeadlockhasbeendetectedwithinadistributedsystem,theremustbeawaytorecover
from it (or why bother to detect it in the first place?).

Weneedtorollbackorabortoneormoreprocesses.(Inadatabasesystem,apartialrollbackmay
work.) Hopefully, the same situation will not re-occur.
Possible methods for recovery:

•Operatorintervention:Atonetime,thiswasafeasiblealternativeforuniprocessor
systems. However, it has little value for today's distributed systems.

•TerminationofProcess(es):Somevictimprocess(orsetofprocesses)ischosenfor
terminationfromthecycleorknotofdeadlockedprocesses.Thisprocessisterminated,
requiringalaterrestart.Alltheresourcesallocatedtothisprocessarereleased,sothat
theymaybereassignedtootherdeadlockedprocesses.Withanappropriatelychosen
victim process, this should resolve the deadlock.

•RollingBackProcess(es):Inordertorollbackavictimprocess,thereneedstohavebeen
somepreviouscheckpointatwhichtimethestateofthevictimprocesswassavedto
stable storage. This requires extra overhead.
Theremustalsobeanassurancethattherolledbackprocessisnotholdingtheresourcesneeded
bytheotherdeadlockedprocessesatthatpoint.Withanappropriatelychosenvictimprocess,
neededresourceswillbereleasedandassignedtotheotherdeadlockedprocesses.Thisshould
resolve the deadlock.
Issues in recovery methods:
•How do we choose a victim process?
By eliminating at least one process and releasing its resources, other processes should
become unblocked.
20​ | ​Page

Clearly, we would prefer to choose a victim process that
ominimizes the cost of recovery ​and
oprevents starvation of processes.
Issues concerning recovery cost include the following:
othe total number of processes affected;
othe priorities of the different processes involved;
othe natures of the different processes;
othe number of resources involved with each process;
othe types of resources involved with each process;
othe length of time a process has been running;
owhether or not a process has been victimized before.
The last item is also related to process starvation.
•What needs to be done?
Once the victim process has been chosen, all resources assigned to that process should be
released and assigned to other deadlocked processes in order to break the deadlock.
Sincethisisadistributedsystem,thereisalsoaneedtoupdatetheinformationatall
sites.Allinformationconcerningthevictimprocessshouldbecleared,sothatallsitesno
longer treat the victim process as an active process.
•What if the victim process performed some centralized function?
That function may need to move to another process.
Election algorithms should be able to help with this ​(e.g. the Bully algorithm) ​.

References
1.Pradeep K. Sinha, Distributed Operating Systems: Concepts and Design.
2.Distributed Deadlock - Computer Science at Rutgers
21​ | ​Page
Tags