Distributed Hash Tables
Amir H. Payberah [email protected]
Amirkabir University of Technology
(Tehran Polytechnic)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 1 / 62
What is the Problem?
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 2 / 62
What is the Problem?
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 3 / 62
Possible Solutions (1/3)
ICentral directory
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 4 / 62
Possible Solutions (2/3)
IFlooding
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 5 / 62
Possible Solutions (3/3)
IDistributed Hash Table (DHT)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 6 / 62
Distributed Hash Table
(DHT)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 7 / 62
Distributed Hash Table
IAn ordinary, which is ...
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 8 / 62
Distributed Hash Table
IAn ordinary, which is.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 9 / 62
Steps to Build a DHT
IStep 1: decide on.
IStep 2:IStep 3: make a strategy for.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 10 / 62
Steps to Build a DHT
IStep 1: decide on.
IStep 2:IStep 3: make a strategy for.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 10 / 62
Steps to Build a DHT
IStep 1: decide on.
IStep 2:IStep 3: make a strategy for.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 10 / 62
Chord: an Example of a DHT
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 11 / 62
Construct Chord - Step 1
IUse a, called the, consisting of identiers
f0;1;2; ; N1g.
IId space is a N.IEvery node picks a random id though.
IExample:
SpaceN= 16f0; ;15g
Five nodesa,b,c,d,e.
H(a) = 6
H(b) = 5
H(c) = 0
H(d) = 11
H(e) = 2
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 12 / 62
Construct Chord - Step 1
IUse a, called the, consisting of identiers
f0;1;2; ; N1g.
IId space is a N.IEvery node picks a random id though.
IExample:
SpaceN= 16f0; ;15g
Five nodesa,b,c,d,e.
H(a) = 6
H(b) = 5
H(c) = 0
H(d) = 11
H(e) = 2
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 12 / 62
Construct Chord - Step 1
IUse a, called the, consisting of identiers
f0;1;2; ; N1g.
IId space is a N.IEvery node picks a random id though.
IExample:
SpaceN= 16f0; ;15g
Five nodesa,b,c,d,e.
H(a) = 6
H(b) = 5
H(c) = 0
H(d) = 11
H(e) = 2
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 12 / 62
Construct Chord - Step 1
IUse a, called the, consisting of identiers
f0;1;2; ; N1g.
IId space is a N.IEvery node picks a random id though.
IExample:
SpaceN= 16f0; ;15g
Five nodesa,b,c,d,e.
H(a) = 6
H(b) = 5
H(c) = 0
H(d) = 11
H(e) = 2
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 12 / 62
Construct Chord - Step 2 (1/2)
IThe
direction starting at the id.
Isucc(x): is the
x.
succ(12) = 0
succ(1) = 2
succ(6) = 6
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 13 / 62
Construct Chord - Step 2 (2/2)
IEach node points to its.
IThe successor of a nodenissucc(n+ 1).
0's successor issucc(1) = 2.
2's successor issucc(3) = 5.
11's successor issucc(12) = 0.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 14 / 62
Construct Chord - Step 3
IWhere to?
IUse globally known hash functionH.
IEach itemhkey; valueigets identierH(key) =k.
SpaceN= 16f0; ;15g
Five nodesa,b,c,d,e.
H(F atemeh) = 12
H(Cosmin) = 2
H(Seif) = 9
H(Sarunas) = 14
H(T allat) = 4
IStore each item at its successor.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 16 / 62
Construct Chord - Step 3
IWhere to?
IUse globally known hash functionH.
IEach itemhkey; valueigets identierH(key) =k.
SpaceN= 16f0; ;15g
Five nodesa,b,c,d,e.
H(F atemeh) = 12
H(Cosmin) = 2
H(Seif) = 9
H(Sarunas) = 14
H(T allat) = 4
IStore each item at its successor.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 16 / 62
Construct Chord - Step 3
IWhere to?
IUse globally known hash functionH.
IEach itemhkey; valueigets identierH(key) =k.
SpaceN= 16f0; ;15g
Five nodesa,b,c,d,e.
H(F atemeh) = 12
H(Cosmin) = 2
H(Seif) = 9
H(Sarunas) = 14
H(T allat) = 4
IStore each item at its successor.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 17 / 62
How to Lookup?
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 18 / 62
Lookup (1/2)
ITo k:
CalculateH(k).
Follow kis found.
IExample:
Lookup Seif at node 2.
H(Seif) = 9
Traverse nodes: 2, 5, 6, 11
Return Stockholm to initiator
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 19 / 62
Lookup (1/2)
ITo k:
CalculateH(k).
Follow kis found.
IExample:
Lookup Seif at node 2.
H(Seif) = 9
Traverse nodes: 2, 5, 6, 11
Return Stockholm to initiator
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 19 / 62
Lookup (1/2)
ITo k:
CalculateH(k).
Follow kis found.
IExample:
Lookup Seif at node 2.
H(Seif) = 9
Traverse nodes: 2, 5, 6, 11
Return Stockholm to initiator
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 19 / 62
Lookup (1/2)
ITo k:
CalculateH(k).
Follow kis found.
IExample:
Lookup Seif at node 2.
H(Seif) = 9
Traverse nodes: 2, 5, 6, 11
Return Stockholm to initiator
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 19 / 62
Lookup (2/2)
Algorithm 1Ask nodento nd the successor ofid
1:proceduren.ndSuccessor(id)
2:ifpred6= andid2(pred; n]then
3:returnn
4:else ifid2(n; succ]then
5:returnsucc
6:else// forward the query around the circle
7:returnsucc.ndSuccessor(id)
8:end if
9:end procedure
I(a; b]the segment of the ring moving clockwise from but not includ-
ingauntil and includingb.
In.foo(.)denotes an RPC offoo(.)to noden.
In.bardenotes and RPC to fetch the value of the variablebarin
noden.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 20 / 62
Put and Get
Algorithm 2Storevaluewith keyidin the DHT
1:proceduren.put(id; value)
2:n= ndSuccessor(id)
3:s.store(id; value)
4:end procedure
Algorithm 3Retrieve the value of the keyidfrom the DHT
1:proceduren.get(id)
2:n= ndSuccessor(id)
3:returns.retrieve(id)
4:end procedure
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 21 / 62
Any Improvement?
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 22 / 62
Improvement
ISpeeding up lookups.
IIf only the successor pointers are used:
Worst case lookup time isN, forNnodes.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 23 / 62
Speeding up Lookups (1/2)
IFinger/routing table:
Point tosucc(n+ 1)
Point tosucc(n+ 2)
Point tosucc(n+ 4)
...
Point tosucc(n+ 2
M1
)(N= 2
M
,N: the id space size)
IDistance always
destination.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 24 / 62
Speeding up Lookups (1/2)
IFinger/routing table:
Point tosucc(n+ 1)
Point tosucc(n+ 2)
Point tosucc(n+ 4)
...
Point tosucc(n+ 2
M1
)(N= 2
M
,N: the id space size)
IDistance always
destination.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 24 / 62
Speeding up Lookups (2/2)
IEvery nodenknowssucc(n+ 2
i1
)fori= 1; ; M.
ISize of routing tables is:
Routing table size:M, whereN= 2
M
.
Routing entries =log2(N).
Example:Log2(1000000)20 Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 25 / 62
Lookup Improvement (1/3)
Algorithm 4Ask nodento nd the successor ofid
1:proceduren.ndSuccessor(id)
2:ifpred6= andid2(pred; n]then
3:returnn
4:else ifid2(n; succ]then
5:returnsucc
6:else// forward the query around the circle
7:returnsucc.ndSuccessor(id)
8:end if
9:end procedure
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 26 / 62
Lookup Improvement (2/3)
Algorithm 5Ask nodento nd the successor ofid
1:proceduren.ndSuccessor(id)
2:ifpred6= andid2(pred; n]then
3:returnn
4:else ifid2(n; succ]then
5:returnsucc
6:else// forward the query around the circle
7:p closestPrecedingNode(id)
8:returnp.ndSuccessor(id)
9:end if
10:end procedure
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 27 / 62
Lookup Improvement (3/3)
Algorithm 6Search locally for the highest predecessor of id
1:procedureclosestPrecedingNode(id)
2:fori=mdownto1do
3:iff inget[i]2(n; id)then
4: returnf inger[i]
5:end if
6:end for
7:end procedure
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 28 / 62
Lookups (1/7)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 29 / 62
Lookups (2/7)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 30 / 62
Lookups (3/7)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 31 / 62
Lookups (4/7)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 32 / 62
Lookups (5/7)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 33 / 62
Lookups (6/7)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 34 / 62
Lookups (7/7)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 35 / 62
How to Maintain the Ring?
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 36 / 62
Periodic Stabilization (1/2)
IIn Chord, in addition to the
predecessor
Predecessor nis the rst node met in anti-clockwise
direction starting atn.
IPeriodic stabilization
Pointingsuccto closest
successor.
Pointingpredto closest
predecessor.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 37 / 62
Periodic Stabilization (1/2)
IIn Chord, in addition to the
predecessor
Predecessor nis the rst node met in anti-clockwise
direction starting atn.
IPeriodic stabilization
Pointingsuccto closest
successor.
Pointingpredto closest
predecessor.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 37 / 62
Periodic Stabilization (2/2)
Algorithm 7Periodically atn
1:proceduren.stabilize()
2:v succ:pred
3:ifv6= andv2(n; succ]then
4:succ v
5:end if
6:sendnotify(n) tosucc
7:end procedure
Algorithm 8Upon receipt anotify(p)at nodem
1:on receivehnotifyjpifromndo
2:ifpred= orp2(pred; m]then
3: pred p
4:end if
5:end event
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 38 / 62
Handling Join
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 39 / 62
Handling Join
IWhennjoins:
Findn's successor withlookup(n).
Setsuccton's successor.
Stabilization xes the rest.Algorithm 9Join a Chord ring containing node m
1:proceduren.join(m)
2:pred
3:succ m.ndSuccessor(n)
4:end procedure
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 40 / 62
Join (1/5)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 41 / 62
Join (2/5)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 42 / 62
Join (3/5)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 43 / 62
Join (4/5)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 44 / 62
Join (5/5)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 45 / 62
Fix Fingers (1/4)
IPeriodically, and store the index of the
next nger to x.
Algorithm 10When receivingnotify(p)atn
1:proceduren.xFingers()
2:next next+ 1
3:ifnext > mthen
4:next 1
5:end if
6:f inger[next] ndSuccessor(n2
next1
)
7:end procedure
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 46 / 62
Fix Fingers (3/4)
Isucc(212
5
) =succ(53) =?
INew nodeN56joins and stabilizes successor pointer.
IFinger 6 of nodeN21is wrong now.
IN21eventually try to x nger 6 by looking up 53 which stops at
N48.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 48 / 62
Fix Fingers (4/4)
Isucc(212
5
) =succ(53) =N56
IN48will eventually stabilize its successor.
IThis means the ring is correct now.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 49 / 62
Handling Failure
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 50 / 62
Successor List (1/2)
IA node has a rcontaining the immediater
successors.
succ(n+ 1)
succ(succ(n+ 1) + 1)
succ(succ(succ(n+ 1) + 1) + 1)
IHow big shouldrbe?
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 51 / 62
Successor List (1/2)
IA node has a rcontaining the immediater
successors.
succ(n+ 1)
succ(succ(n+ 1) + 1)
succ(succ(succ(n+ 1) + 1) + 1)
IHow big shouldrbe?
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 51 / 62
Successor List (1/2)
IA node has a rcontaining the immediater
successors.
succ(n+ 1)
succ(succ(n+ 1) + 1)
succ(succ(succ(n+ 1) + 1) + 1)
IHow big shouldrbe?log2(N)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 51 / 62
Successor List (2/2)
Algorithm 11Join a Chord ring containing nodem
1:proceduren.join(m)
2:pred
3:succ m.ndSuccessor(n)
4:updateSuccesorList(succ.successorList)
5:end procedure
Algorithm 12Periodically atn
1:proceduren.stabilize()
2:succ nd rst alive node in successor list
3:v succ:pred
4:ifv6= andv2(n; succ]then
5:succ v
6:end if
7:sendnotify(n) tosuccupdateSuccessorList(succ.successorList)
8:end procedure
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 52 / 62
Dealing with Failure
IPeriodic stabilization
IIf: replace with closest alive successor
IIf: set predecessor to nil
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 53 / 62
Failure (1/5)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 54 / 62
Failure (2/5)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 55 / 62
Failure (3/5)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 56 / 62
Failure (4/5)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 57 / 62
Failure (5/5)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 58 / 62
Summary
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 59 / 62
Summary
IDHTs: distributedhkey; valuei
ILookup service
IPut and Get
IFinger list: improve the lookup
IPeriodically stabilization
ISuccessor list
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 60 / 62
References:
IIon Stoica et al., Chord: A scalable peer-to-peer lookup service for
internet applications, SIGCOMM, 2001.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 61 / 62
Questions?
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 62 / 62