Title: Ch10 Synchronization and Election
1 Ch10 Synchronization and Election
2 Introduction
- We shall be concerned with basic distributed algorithms for:
  - Synchronization (basic idea: ordering)
    - Required for consistent execution
    - e.g. maintenance of replicated data, consistency of distributed shared memory
  - Election (agreeing on who is the coordinator)
    - Required wherever there is a coordinator
    - If the current coordinator fails, one must elect a new coordinator
    - e.g. replicated data management, atomic commit, recovery management, etc.
3 Introduction (cont.)
Communication scenarios:
- One way: an application process sends messages and expects no reply, e.g. broadcast
- Client/server: a client sends a service request then waits for a reply from the server
- Peer: a symmetrical two-way communication
4 Distributed Mutual Exclusion
How coordination is achieved depends on which communication scenario is assumed:
- One-way communication: applications using one-way communication usually don't need synchronization
- Client/server communication: if coordination is required among the clients, it is handled by the server and there is no explicit interaction among the client processes
- Peer communication: processes need to exchange information to reach some conclusion about the system or some agreement among the cooperating processes. There is no centralized controller
5 Distributed Mutual Exclusion
- Mutual Exclusion Problem
  - Ensures that concurrent processes make serialized access to a shared resource or data (called the Critical Section, CS for short)
  - e.g. updating a database, sending a control command to a shared I/O device
- Distributed Mutual Exclusion Problem
  - Achieve mutual exclusion (with fairness and progress properties) assuming only peer communication
- Assumptions
  - We assume that no failures occur during the execution
  - Each node can communicate with every other node
6 Distributed Mutual Exclusion
Approaches to achieve distributed mutual exclusion:
- Contention-based approach: processes compete for the right to use the shared resource by using a request resolution criterion. Criteria: time of request, priorities of requesters, voting
- Control-based approach: a logical token representing the access right to the shared object is passed in a regulated fashion among the competing processes. A process holding the token can enter the CS
7 Distributed Mutual Exclusion
Design goal: find an Entry Protocol and an Exit Protocol such that each process executes
    Entry protocol
    <Critical Section>
    Exit protocol
8 Outline
- Distributed Mutual Exclusion
  - Timestamp Algorithms
  - Voting
  - Use of token on logical structures
  - Path Compression
- Election
  - The Bully Algorithm
  - The Invitation Algorithm
9 Distributed Mutual Exclusion
Timestamp Algorithms (Lamport's algorithm, modified)
Idea:
- A processor p requests the CS by sending Request(t) to every other processor (t = Lamport time when processor p requests the CS)
- A processor q replies by sending Reply(t') to p, where t' = null if q is not competing or q's request is later, and t' = the time when q made its request if q's request was earlier
- Each requesting processor p maintains a queue my_Q such that my_Q[j] is the Lamport time when processor j made its request; my_Q is kept in ascending order
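Below is a minimal sketch (not from the slides; names are illustrative) of the Lamport clock that the timestamp algorithms assume: each processor increments a local counter on every local event or send, and takes the maximum on receive, so request timestamps respect causality.

class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event (e.g. issuing a Request)."""
        self.time += 1
        return self.time

    def update(self, received_time):
        """Merge the timestamp carried by an incoming message."""
        self.time = max(self.time, received_time) + 1
        return self.time

# Example: p stamps its Request with its current Lamport time.
clock = LamportClock()
request_timestamp = clock.tick()      # the t in Request(t)
print("Request timestamped", request_timestamp)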
10 Distributed Mutual Exclusion
Lamport's algorithm, modified (intuition)
[Figure: message diagram in which p sends Request(t) to every other processor, each other processor q answers with Reply(ti), and Release messages are exchanged at exit]
- p requests the CS (my_Q initially empty); p waits until it has received a Reply from every other processor
- On receipt of Reply(t2) from q, p inserts q in my_Q only if t2 ≠ null
- When all other processors have replied, p inserts itself in my_Q; p can enter the CS only if it has received a Reply from every other processor and head(my_Q) = p
- On receipt of a Release from q, p removes q from my_Q if q is in my_Q
- When p exits the CS, it sends a Release to every other processor
11 Distributed Mutual Exclusion
The modified Lamport algorithm illustrated
[Figure: four processors p1 to p4 exchange timestamped Requests RQ(t) and Replies RY(t), with t4 < t3 < t and RY(null) sent by non-competing processors; each processor's my_Q shrinks from {p4, p3, p1} to empty as Release messages are sent and the processors enter the CS in timestamp order]
12 Distributed Mutual Exclusion
Timestamp Algorithms (Lamport's algorithm, modified)
The algorithm consists of three parts: Request_CS(), Monitor_CS() and Release_CS()

Request_CS()
    my_PQ ← ∅                       /* my priority queue */
    is_requesting ← True
    reply_needed ← M - 1
    m ← Request                     /* the Request message */
    my_timestamp ← my_current_Lamport_time
    m.timestamp ← my_timestamp      /* the request's time */
    for each processor k ≠ self do send(k, m)
    wait until reply_needed = 0 and head(my_PQ) = self
    <use CS>
13 Distributed Mutual Exclusion
Release_CS()
    for each processor k ≠ self do
        m ← Release
        send(k, m)
    is_requesting ← False
    my_PQ ← ∅
14 Distributed Mutual Exclusion
Timestamp Algorithms (Lamport's algorithm, modified, cont.)

CS_Monitor()
    wait for Reply, Request, Release from any processor
    on the receipt of Reply from p do
        if Reply.timestamp ≠ null then insert p in my_PQ
        reply_needed ← reply_needed - 1
        if reply_needed = 0 then insert self in my_PQ
    end
end
15 Distributed Mutual Exclusion
Timestamp Algorithms (Lamport's algorithm, CS_Monitor() cont.)
    on the receipt of Request from p do
        m ← Reply
        if not(is_requesting) or (my_timestamp > Request.timestamp) then
            m.timestamp ← null
        else
            m.timestamp ← my_timestamp
        send(p, m)
    end
    on the receipt of Release from p do
        if is_requesting then remove p from my_PQ
    end
end
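As a small illustration of the my_PQ bookkeeping used by Request_CS() and CS_Monitor(), the sketch below (my own, assuming integer Lamport times and string processor ids; ties between equal timestamps are not broken by processor id here) keeps the queue in ascending timestamp order with heapq.

import heapq

class RequestQueue:
    def __init__(self):
        self._heap = []          # entries are (timestamp, processor_id)

    def insert(self, timestamp, proc):
        heapq.heappush(self._heap, (timestamp, proc))

    def remove(self, proc):
        self._heap = [(t, p) for (t, p) in self._heap if p != proc]
        heapq.heapify(self._heap)

    def head(self):
        return self._heap[0][1] if self._heap else None

# p inserts the replies it got, then itself; it may enter the CS only when it
# is at the head (i.e. its request carries the smallest Lamport timestamp).
my_PQ = RequestQueue()
my_PQ.insert(7, "q")     # Reply(7) from q: q requested at time 7
my_PQ.insert(5, "self")  # all replies received: insert own request (time 5)
print(my_PQ.head())      # -> "self", so p may enter the CS
my_PQ.remove("q")        # Release received from q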
16 Distributed Mutual Exclusion
Analysis of the modified Lamport algorithm
[Figure: p and q exchanging Request, Reply and Release messages]
- p sends a Request message to every other processor: (M-1) Requests
- every other processor q sends a Reply to p: (M-1) Replies
- p might be at the end of its priority queue, so it waits for Releases: (M-1) Releases
- Thus 3(M-1) messages might be sent in order for p to enter its CS
- In addition, each processor has to manage (insert into) a priority queue
Can we do better?
17 Distributed Mutual Exclusion
Ricart and Agrawala's algorithm does better
The idea (avoid Release messages, use of silence!):
- Request: when a processor p wants to enter the CS, p sends Request(t) to every other processor, then p waits until it receives a Reply message from every other processor
- Reply: when a processor q receives a Request(t) message from p, if (q is not requesting the CS) or (q's request is later than p's) then q sends a Reply to p. Otherwise (q is using the CS or q's request is earlier), q defers the sending of a Reply message to p until Exit
- Exit: when a processor exits the CS, it sends all deferred Reply messages
18 Distributed Mutual Exclusion
Ricart and Agrawala's algorithm
Variables used:
    timestamp current_time        /* the current Lamport time */
    timestamp my_timestamp        /* timestamp of your request */
    integer reply_pending         /* number of permissions you still need to collect before entering your CS */
    boolean is_requesting         /* True iff you are requesting or you are in your CS */
    boolean reply_deferred[M]     /* reply_deferred[j] is True iff you have deferred the sending of a Reply message to processor j */
19 Distributed Mutual Exclusion
Ricart and Agrawala's algorithm (the algorithm)

Request_CS()
    my_timestamp ← current_time
    is_requesting ← True
    reply_pending ← M - 1
    m ← Request
    m.timestamp ← my_timestamp
    for each j ≠ self do send(j, m)
    wait until reply_pending = 0

Release_CS()
    is_requesting ← False
    for each j ≠ self such that reply_deferred[j] = True do
        send(j, Reply)
        reply_deferred[j] ← False
20 Distributed Mutual Exclusion
Ricart and Agrawala's algorithm (the algorithm, cont.)

Monitor_CS()
    wait for Request or Reply from any processor
    on receipt of Request from p do
        if not(is_requesting) or my_timestamp > Request.timestamp then
            send(p, Reply)
        else
            reply_deferred[p] ← True
    end
    on receipt of Reply from p do
        reply_pending ← reply_pending - 1
    end
end
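The following compact sketch (illustrative only, not the authors' reference code) transcribes the Ricart and Agrawala handlers into Python; it assumes a caller-supplied send(dest, msg) function and breaks timestamp ties by comparing (time, id) pairs.

class RicartAgrawala:
    def __init__(self, my_id, all_ids, send):
        self.my_id = my_id
        self.others = [p for p in all_ids if p != my_id]
        self.send = send
        self.clock = 0
        self.my_timestamp = None
        self.is_requesting = False
        self.reply_pending = 0
        self.reply_deferred = set()

    def request_cs(self):
        self.clock += 1
        self.my_timestamp = (self.clock, self.my_id)
        self.is_requesting = True
        self.reply_pending = len(self.others)
        for j in self.others:
            self.send(j, ("REQUEST", self.my_timestamp))
        # caller then waits until self.reply_pending == 0 before entering the CS

    def on_request(self, sender, ts):
        self.clock = max(self.clock, ts[0]) + 1
        if not self.is_requesting or self.my_timestamp > ts:
            self.send(sender, ("REPLY",))
        else:
            self.reply_deferred.add(sender)   # stay silent until release

    def on_reply(self, sender):
        self.reply_pending -= 1

    def release_cs(self):
        self.is_requesting = False
        for j in self.reply_deferred:
            self.send(j, ("REPLY",))
        self.reply_deferred.clear()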
21 Example
[Figure: an example run of Ricart and Agrawala's algorithm with three processors exchanging timestamped requests (RQ,6; RQ,8; RQ,10) and replies (OK); Lamport clock values advance along each timeline, the later requester defers its OK until it leaves the CS, and the processors enter the CS in timestamp order]
22 Distributed Mutual Exclusion
- Ricart and Agrawala's algorithm (Analysis)
  - No priority queue
  - (M-1) Request messages
  - (M-1) Reply messages
  - Thus 2(M-1) messages to enter a CS
- Properties
  - Use of symmetric information
    - priority queue: when a processor enters the CS, it knows that no other processor can enter the CS
    - deferred Reply: when a processor receives a Reply from every other processor, it knows that no other processor has the same information
23 Outline
- Distributed Mutual Exclusion
  - Timestamp Algorithms
  - Voting
  - Use of token on logical structures
  - Path Compression
- Election
  - The Bully Algorithm
  - The Invitation Algorithm
24 Distributed Mutual Exclusion
Voting-based Algorithms
Processors that want to enter the CS compete for votes.
Idea:
- Each processor has a unique vote that it can give to at most one processor (itself or some other processor)
- Whenever a processor p wants to enter the CS, p asks every other processor for its vote
- When p knows that it has received more votes than any other processor can get, p can enter the CS
- Otherwise, p must wait until the processor that is in the CS exits, releasing its votes for other contenders
25 Distributed Mutual Exclusion
Voting-based Algorithms (a naïve algorithm)

Naïve_Voting_Enter_CS()
    send a vote request to all of the processors
    when you have received at least ⌈(M+1)/2⌉ votes, enter the CS

Naïve_Voting_Exit_CS()
    send a Release message to all of the processors

Problems:
- a deadlock can occur: three competing processors might each get one-third of the votes
- no significant advantage over the timestamp-based algorithm: O(M) messages are required
26 Distributed Mutual Exclusion
Voting-based algorithms: quorums and coteries
Aim:
- Reduce the message complexity by reducing the number of votes required to enter the CS
- Avoid the possibility of deadlock
Idea:
- Every processor p has a voting district Sp (also called a quorum)
- The set {S1, ..., SM} is called a coterie
- It is assumed that for each processor p, Sp is fixed
- Whenever a processor p wants to enter the CS, p requests the votes of only the processors in Sp
- To enter the CS, p must acquire the votes of all processors in Sp
27 Distributed Mutual Exclusion
Voting-based algorithms (voting district organization)
- To ensure mutual exclusion, the intersection rule must hold: Si ∩ Sj ≠ ∅ for each 1 ≤ i, j ≤ M
- Fairness among quorums: 1) |Si| = K for each i in {1, ..., M}, and 2) every processor is in D voting districts
- How small can we make K and D and still preserve the intersection and fairness properties?
- Maekawa's solution: M = K(K-1) + 1, i.e. K = O(√M); the construction is easier if we know that M = n²
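One standard way to realize the O(√M) bound when M = n² is the grid construction sketched below (each district is the processor's whole row plus its whole column, so |S| = 2n - 1 and any two districts intersect); the figure on the next slides may use a variant, so treat this construction as an assumption.

def grid_quorum(p, n):
    """Voting district of processor p (0-based id) on an n x n grid."""
    row, col = divmod(p, n)
    same_row = {row * n + c for c in range(n)}
    same_col = {r * n + col for r in range(n)}
    return same_row | same_col

n = 6                                  # M = 36, as in the illustration
M = n * n
coterie = [grid_quorum(p, n) for p in range(M)]

# Every pair of districts intersects, so mutual exclusion is guaranteed.
assert all(coterie[i] & coterie[j] for i in range(M) for j in range(M))
print(len(coterie[13]))               # 2n - 1 = 11 processors per district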
28 Distributed Mutual Exclusion
Voting-based algorithms (quorums illustrated)
Assume M = n² and label the processors (i,j) for 1 ≤ i,j ≤ n
[Figure: a 6 × 6 grid of processors numbered 1 to 36]
29 Distributed Mutual Exclusion
Voting-based algorithms (quorums illustrated)
Assume M = n² and label the processors (i,j) for 1 ≤ i,j ≤ n
[Figure: the 6 × 6 grid (n = 6, M = 36) with the voting district S14 of processor 14 highlighted]
30 Distributed Mutual Exclusion
Voting-based Algorithms (idea of the general algorithm)
- Request_CS: when processor p wants to enter the CS, p sends a vote request to all of the processors in Sp
- Reply: a processor q in Sp sends YES only if q hasn't already cast its vote
- Enter_CS: p can enter the CS when it receives a Reply from all processors of Sp
- Exit_CS: when p exits the CS, it sends Release to all processors in Sp (enabling the members of its voting district to vote for other candidates)
Problem: a deadlock can occur
31 Distributed Mutual Exclusion
Voting-based algorithms (the deadlock illustrated)
[Figure: the 6 × 6 grid with the voting districts of p3 and p23 highlighted]
p3 and p23 compete, but p5 votes for p3 and p21 votes for p23. Neither p3 nor p23 receives all the votes of its voting district: deadlock
32 Distributed Mutual Exclusion
Voting-based Algorithms (deadlock avoidance, idea)
Use Lamport timestamps; voters will prefer to vote for the earliest candidate
- Request_CS: when processor p wants to enter the CS, p sends a timestamped vote REQUEST to all of the processors in Sp
- Reply: when processor q receives p's REQUEST, q sends its vote to p if q hasn't already cast its vote. If q has already cast its vote for processor r and p's Request is earlier than r's Request, then q tries to retrieve its vote from r by sending an INQUIRE message to r
- Many requests can arrive after you have already sent your vote, so a waiting queue is necessary
33 Distributed Mutual Exclusion
Voting-based Algorithms (deadlock avoidance, idea, cont.)
- Relinquish_Vote: when you receive an INQUIRE message from a processor q, relinquish q's vote by sending it a RELINQUISH message, provided you have not already received all the votes of your voting district. Old INQUIRE messages can be in the system, so timestamp matching is used
- Release_CS: send a RELEASE message to all members of your quorum so that each can take back its vote
- On receipt of a RELEASE message: if your waiting queue is not empty, vote for the first processor in that queue; otherwise, keep your vote
34 Distributed Mutual Exclusion
Why does the deadlock resolution mechanism presented work?
The Lamport (systemwide) timestamp imposes a total order, so either
- the candidate with the lowest timestamp eventually gets all of the votes, or
- the candidate with the lowest timestamp is blocked by a candidate that enters the CS.
35 Distributed Mutual Exclusion
Voting-based Algorithms (the algorithm)
The algorithm accounts for the fact that INQUIRE messages are generated asynchronously, so old INQUIRE messages must be handled: timestamp-based matching is used
Variables used:
    S[self]        the processor's voting district
    LTS            the systemwide (Lamport) timestamp
    my_TS          the timestamp of the current CS request
    yes_votes      the number of processors that voted YES
    have_voted     True iff you have already voted for a candidate (initially False)
    candidate      the candidate that you voted for
    candidate_TS   the timestamp of the request of the candidate that you voted for
    have_inquired  True iff you have tried to recall a vote (initially False)
    WaitingQ       the set of vote requests that you are deferring
    InCS           True iff you are in the CS
36 Distributed Mutual Exclusion
Voting-based Algorithms (the algorithm, cont.)

M_Voting_Entry()          /* algorithm for requesting the CS */
    yes_votes ← 0
    my_TS ← LTS
    for every processor r in S[self] do send(r, REQUEST my_TS)
    while (yes_votes < |S[self]|) do
        wait until a YES or INQUIRE message is received
        on YES(sender) do
            yes_votes ← yes_votes + 1
        end
        on INQUIRE(sender, inquire_ts) do
            if my_TS = inquire_ts then
                send(sender, RELINQUISH)
                yes_votes ← yes_votes - 1
        end
    end
    InCS ← True           /* models the fact that you enter the CS */
37 Distributed Mutual Exclusion
Voting-based Algorithms (the algorithm, cont.)

M_Voter()                 /* algorithm executed by the thread that monitors the CS */
    while (1)
        wait until a REQUEST, RELEASE, or RELINQUISH message is received
        on REQUEST(sender, request_ts) do
            if have_voted = False then
                send(sender, YES)
                candidate_TS ← request_ts
                candidate ← sender
                have_voted ← True
            else
                add (sender, request_ts) to WaitingQ
                if request_ts < candidate_TS and not have_inquired then
                    send(candidate, INQUIRE candidate_TS)
                    have_inquired ← True
        end
38 Distributed Mutual Exclusion
Voting-based Algorithms (M_Voter() cont.)
        on RELINQUISH(sender) do
            add (candidate, candidate_TS) to WaitingQ
            remove the (s, rts) from WaitingQ such that rts is the minimum
            send(s, YES)
            candidate_TS ← rts
            candidate ← s
            have_inquired ← False
        end
        on RELEASE(sender) do
            if WaitingQ is not empty then
                remove the (s, rts) from WaitingQ such that rts is the minimum
                send(s, YES)
                candidate_TS ← rts
                candidate ← s
            else
                have_voted ← False
                have_inquired ← False
        end
    end    /* while */
39 Distributed Mutual Exclusion
Voting-based Algorithms (the algorithm, cont.)

M_Voting_Exit()           /* algorithm for releasing the CS */
    for every processor r in S[self] do send(r, RELEASE)
    InCS ← False
    have_voted ← False
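The sketch below (my own condensation, assuming a caller-supplied send(dest, msg) and comparable request timestamps) shows only the voter-side bookkeeping of the algorithm above: one vote at a time, an INQUIRE to try to recall it when an earlier request arrives, and re-voting on RELEASE.

import heapq

class MaekawaVoter:
    def __init__(self, send):
        self.send = send
        self.have_voted = False
        self.have_inquired = False
        self.candidate = None
        self.candidate_ts = None
        self.waiting_q = []                    # heap of (request_ts, sender)

    def on_request(self, sender, request_ts):
        if not self.have_voted:
            self._vote(sender, request_ts)
        else:
            heapq.heappush(self.waiting_q, (request_ts, sender))
            if request_ts < self.candidate_ts and not self.have_inquired:
                self.send(self.candidate, ("INQUIRE", self.candidate_ts))
                self.have_inquired = True

    def on_relinquish(self, sender):
        # the current candidate gave the vote back: re-queue it, vote earliest
        heapq.heappush(self.waiting_q, (self.candidate_ts, self.candidate))
        ts, s = heapq.heappop(self.waiting_q)
        self._vote(s, ts)

    def on_release(self, sender):
        if self.waiting_q:
            ts, s = heapq.heappop(self.waiting_q)
            self._vote(s, ts)
        else:
            self.have_voted = False
        self.have_inquired = False

    def _vote(self, s, ts):
        self.send(s, ("YES",))
        self.candidate, self.candidate_ts = s, ts
        self.have_voted = True
        self.have_inquired = False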
40 Distributed Mutual Exclusion
The general voting algorithm illustrated
[Figure: candidates A, B and C, with request timestamps 3, 5 and 7, competing for the votes of processors a, b, c, d, e, f]
41 Outline
- Distributed Mutual Exclusion
  - Timestamp Algorithms
  - Voting
  - Use of token on logical structures
  - Path Compression
- Election
  - The Bully Algorithm
  - The Invitation Algorithm
42 Distributed Mutual Exclusion
Token-based algorithms
Idea:
- A logical token representing the access right to the CS is passed in a regulated manner among the processors
- The holder of the token is allowed to enter the CS
- Token-based algorithms often assume a logical structure on the processors
- Frequently used logical structures: ring, spanning tree
43 Distributed Mutual Exclusion
Token-based algorithms on a ring topology
- The ring is unidirectional; every processor knows its successor in the ring
- When you want to enter the CS, you wait for the token, take it when it arrives, and pass it on to your successor when you exit the CS. If the token arrives and you don't want it, pass it along immediately
- The ring topology is attractive because it is simple, deadlock-free and fair; it is frequently used for access control in Local Area Networks
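A toy single-process simulation (illustrative only; processor ids and wishes are made up) of token passing on a unidirectional ring: the token circulates, a processor enters the CS only when it holds the token and wants it, and otherwise forwards the token immediately.

M = 5                                   # number of processors on the ring
wants_cs = {0: False, 1: True, 2: False, 3: True, 4: False}

def successor(p):
    return (p + 1) % M

token_at = 0
for _ in range(2 * M):                  # circulate the token for two laps
    if wants_cs[token_at]:
        print(f"processor {token_at} enters the CS")
        wants_cs[token_at] = False      # exit the CS, then pass the token on
    token_at = successor(token_at)      # forward the token immediately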
44 Distributed Mutual Exclusion
Token-based algorithms on a ring topology
Problems:
- Unnecessary network traffic: the token needs to circulate even if no processor wants to enter the CS; this consumes processor resources
- The waiting time can be large: a processor can wait for a long time to get the token even if no other processor enters the CS
45 Distributed Mutual Exclusion
Token-based algorithms on a tree topology
Underlying goals:
- Reduce the communication complexity: processors explicitly request the token, and the processor holding the token moves it only if it knows of a pending request
- Exploit the structure of the tree (no cycles) to avoid deadlock
46 Distributed Mutual Exclusion
Token-based algorithms on a tree topology (Raymond's algorithm)
We can impose a tree structure on the processors as shown below. Two problems:
(1) how to let a request navigate to the token
(2) how to let the token navigate to the next processor to enter the CS
47 Distributed Mutual Exclusion
Token-based algorithms on a tree topology (Raymond's algorithm)
The processor holding the token always resides at the root of the tree
Solution: each processor maintains
- a pointer to the neighbor (called parent) that is closest to the token
- a request queue, the head of which defines the return path
[Figure: a tree of processors 1 to 16 with parent pointers oriented towards the token holder at the root]
48 Distributed Mutual Exclusion
Token-based algorithms on a tree topology (Raymond's algorithm)
Idea:
- Request: insert yourself in your queue; send a request to your parent if you have not yet sent a request to it
- On receiving a request: insert the requester in your queue. If you don't have the token and you haven't already sent a request to your parent, send a request to your parent (a forward action). If you have the token but you are not using it, remove the first processor from your queue and send the token to that processor
- On receiving/releasing the token: remove the first processor (if any) from your queue, send the token to that processor and set your parent pointer to that processor. If your queue is not empty (there is another pending request), send a request to the processor to which you have just sent the token
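The sketch below is a simplified single-process simulation of these rules (my own code, not Raymond's; processors "send" by calling each other directly and the CS is treated as instantaneous), showing a request travelling up the tree and the token travelling back down.

from collections import deque

class Node:
    def __init__(self, name):
        self.name = name
        self.parent = None          # neighbor closest to the token (None = I hold it)
        self.queue = deque()        # pending requesters (neighbors or self)
        self.asked = False          # already sent a request to my parent?
        self.holder = False

    def request_cs(self):
        self.queue.append(self)
        self._maybe_forward()

    def on_request(self, requester):
        self.queue.append(requester)
        self._maybe_forward()

    def _maybe_forward(self):
        if self.holder and self.queue:
            self._pass_token()
        elif not self.holder and not self.asked and self.queue:
            self.asked = True
            self.parent.on_request(self)

    def _pass_token(self):
        nxt = self.queue.popleft()
        if nxt is self:
            print(f"{self.name} enters the CS")
            self.release_cs()
        else:
            self.holder = False
            self.parent = nxt
            nxt.on_token()
            if self.queue:              # still have pending requests: re-ask
                self.asked = True
                self.parent.on_request(self)

    def on_token(self):
        self.holder = True
        self.asked = False
        self.parent = None
        if self.queue:
            self._pass_token()

    def release_cs(self):
        if self.queue:
            self._pass_token()

# Tiny tree: 1 holds the token, 2 is a child of 1, 3 is a child of 2.
n1, n2, n3 = Node("1"), Node("2"), Node("3")
n1.holder = True
n2.parent, n3.parent = n1, n2
n3.request_cs()     # the request travels 3 -> 2 -> 1, the token travels back 1 -> 2 -> 3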
49 Distributed Mutual Exclusion
Raymond's algorithm illustrated
Initially, the token is at processor 1; processor 4 sends a request to processor 3
[Figure: the tree of processors, with parent pointers and local request queues shown]
50 Distributed Mutual Exclusion
Raymond's algorithm illustrated
Processor 3 receives the request from processor 4 and forwards this request to processor 2
[Figure: the tree with the updated request queues]
51 Distributed Mutual Exclusion
Raymond's algorithm illustrated
Processor 3 requests the token itself while the request forwarded by processor 3 is in transit
[Figure: the tree with the updated request queues]
52 Distributed Mutual Exclusion
Raymond's algorithm illustrated
Processor 2 receives the request from processor 3 and forwards this request to processor 1
[Figure: the tree with the updated request queues]
53 Distributed Mutual Exclusion
Raymond's algorithm illustrated
Processor 1 receives the request from processor 2
[Figure: the tree with the updated request queues]
Note: by linking the local queues, we obtain a global queue, i.e. the algorithm is a distributed implementation of a global queue
54 Distributed Mutual Exclusion
Raymond's algorithm illustrated
Processor 1 releases the token, removes processor 2 from its queue, and sends the token to processor 2
[Figure: the tree with the updated request queues]
55 Distributed Mutual Exclusion
Raymond's algorithm illustrated
Processor 2 receives the token; since it does not need it, it releases the token, removes processor 3 from its queue, and sends the token to processor 3
[Figure: the tree with the updated request queues]
56 Distributed Mutual Exclusion
Raymond's algorithm illustrated
Processor 3 receives the token, removes processor 4 from its queue, and sends the token to processor 4; in addition, since its queue is not empty, processor 3 sends a request to processor 4
[Figure: the tree with the updated request queues]
57 Distributed Mutual Exclusion
Raymond's algorithm illustrated
The request sent by processor 3 is still in transit. Processor 4 receives the token; since it is at the head of its own queue, it sends the token to itself (becomes the token holder)
[Figure: the tree with the updated request queues]
58 Distributed Mutual Exclusion
Raymond's algorithm illustrated
Processor 4 receives the request from processor 3
[Figure: the tree with the updated request queues]
59 Distributed Mutual Exclusion
Raymond's algorithm illustrated
Processor 4 releases the token; since processor 3 is at the head of its queue, processor 4 sends the token to processor 3
[Figure: the tree with the updated request queues]
60 Distributed Mutual Exclusion
Raymond's algorithm illustrated
Processor 3 receives the token from processor 4. Since processor 3 is at the head of its own queue, processor 3 sends the token to itself
[Figure: the tree with the updated request queues]
61 Outline
- Distributed Mutual Exclusion
  - Timestamp Algorithms
  - Voting
  - Use of token on logical structures
  - Path Compression
- Election
  - The Bully Algorithm
  - The Invitation Algorithm
62 Distributed Mutual Exclusion
Path compression
Why? To improve performance: instead of using a fixed logical tree, one can let the processors form a tree that takes an arbitrary shape (i.e. the logical tree changes dynamically)
63 Distributed Mutual Exclusion
Path compression
Basic idea (Li and Hudak, modified):
- As in Raymond's algorithm, each processor has a direction pointer current_dir. This variable serves to indicate where the token is
- Invariant to be maintained: the paths formed by the current_dir pointers lead to a processor that is holding the token or is waiting for the token
- To achieve this, at any moment the set of all the processors can be seen as consisting of two disjoint subsets:
  - SW, composed of the token holder and the processors that are waiting for the CS
  - NW, composed of the processors that are not waiting for the CS
64 Distributed Mutual Exclusion
Path compression
Basic idea (cont.):
- As in Raymond's algorithm, each processor has a direction pointer current_dir: a guess of where the token is
- In addition, each processor also has a pointer called next that serves to indicate the next (if any) waiting processor to which the processor will pass the token after being visited by the token
- The core of the algorithm lies in the management of the current_dir and next pointers
65 Distributed Mutual Exclusion
Path compression
Basic idea (Li and Hudak, modified, cont.)
Characteristics of the set SW:
1) SW is a queue of processors
2) the head of that queue is the token holder
3) the last processor in SW is the last processor that has requested the CS
4) from the head to the end of SW, processors are linked by next pointers
5) the current_dir pointer of any processor in SW leads to the end of the queue; the current_dir pointer of the last processor in SW points to itself
6) there is no cycle of length greater than 1 when following the current_dir pointers of processors in SW
Characteristics of the set NW:
1) any processor in NW is a node of a tree rooted at a processor in SW
2) the current_dir of any processor in NW points towards the parent of that processor
66 Distributed Mutual Exclusion
Path compression
Basic idea (SW and NW illustrated)
[Figure: the processors of SW linked by next pointers, with their current_dir pointers shown; all other processors are in NW, and all the current_dir pointers form a tree rooted at the end of SW]
67 Distributed Mutual Exclusion
Path compression
Basic idea (management of the current_dir and next pointers)
- Request_CS: send a REQUEST carrying your id to the processor your current_dir points to, set your next pointer to null, set your current_dir to yourself /* add yourself at the end of SW */, then wait for the token
- Now, what does a processor do on receiving a REQUEST message?
68 Distributed Mutual Exclusion
Path compression
Basic idea (management of the current_dir and next pointers)
On the receipt of REQUEST(id):
- If you are in SW and you are not at the end of SW, forward REQUEST(id) to your current_dir /* note: the forwarded request carries the id */
- If you are in SW and you are at the end of SW, set your next pointer to id
- If you have the token but you are not using it, send the token to id
- In either case, set your current_dir to id
69 Distributed Mutual Exclusion
Path compression
Basic idea (management of the current_dir and next pointers)
Exit_CS: if you are not the last processor in SW, send the token to your next, set your next pointer to NIL, and remove yourself from SW
70 Distributed Mutual Exclusion
Path compression: the algorithm (Li and Hudak, modified)
Variables used:
    Token_hldr    True iff the processor holds the token
    InCS          True iff the processor is using the CS
    IsRequesting  True iff the processor is requesting the CS
    current_dir   points to the parent in the tree
    next          next processor in line to receive the token
71 Distributed Mutual Exclusion
Path compression: the algorithm (Li and Hudak, modified)
Algorithm for requesting the CS

Request_CS()
    IsRequesting ← True                  /* I want to use the token */
    if not Token_hldr then               /* the token is not resident */
        send(current_dir, REQUEST self)  /* send a request */
        current_dir ← self               /* I add myself at the end of SW */
        next ← NIL                       /* currently I don't know who is next after me */
        wait until Token_hldr = True     /* wait for the token */
    InCS ← True                          /* I enter the CS */
72 Distributed Mutual Exclusion
Path compression: the algorithm (Li and Hudak, modified)
Algorithm for releasing the CS

Release_CS()
    InCS ← False                         /* I leave the CS */
    IsRequesting ← False                 /* I don't want the CS */
    if next ≠ NIL then                   /* I am not at the end of SW */
        Token_hldr ← False               /* I am no longer the token holder */
        send(next, TOKEN)                /* I send the token to the next in SW */
        next ← NIL                       /* I do not know who is next */
73 Distributed Mutual Exclusion
Path compression: algorithm for monitoring the CS

Monitor_CS()
    while (True) do
        wait for a REQUEST or a TOKEN message
        on the receipt of REQUEST(requester) do
            if IsRequesting = True then
                if next = NIL then
                    next ← requester
                else
                    send(current_dir, REQUEST requester)
            elseif Token_hldr = True then
                Token_hldr ← False
                send(requester, TOKEN)
            else                          /* the processor is not requesting the CS */
                send(current_dir, REQUEST requester)
            current_dir ← requester
        end
        on the receipt of TOKEN do
            Token_hldr ← True
        end
    end
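The following is a near-literal transcription of the pseudocode above into Python (illustrative only; it assumes a caller-supplied send(dest, msg) and shows just the state updates, not the actual blocking wait for the token).

NIL = None

class PathCompressionNode:
    def __init__(self, my_id, initial_dir, has_token, send):
        self.my_id = my_id
        self.current_dir = my_id if has_token else initial_dir
        self.next = NIL
        self.token_hldr = has_token
        self.in_cs = False
        self.is_requesting = False
        self.send = send

    def request_cs(self):
        self.is_requesting = True
        if not self.token_hldr:
            self.send(self.current_dir, ("REQUEST", self.my_id))
            self.current_dir = self.my_id      # I am now the end of SW
            self.next = NIL
            # ... the caller waits here until on_token() fires ...
        self.in_cs = True

    def release_cs(self):
        self.in_cs = False
        self.is_requesting = False
        if self.next is not NIL:
            self.token_hldr = False
            self.send(self.next, ("TOKEN",))
            self.next = NIL

    def on_request(self, requester):
        if self.is_requesting:
            if self.next is NIL:
                self.next = requester          # I am the end of SW
            else:
                self.send(self.current_dir, ("REQUEST", requester))
        elif self.token_hldr:
            self.token_hldr = False
            self.send(requester, ("TOKEN",))
        else:                                  # not waiting: just forward
            self.send(self.current_dir, ("REQUEST", requester))
        self.current_dir = requester           # the path compression step

    def on_token(self):
        self.token_hldr = True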
74 Outline
- Distributed Mutual Exclusion
  - Timestamp Algorithms
  - Voting
  - Use of token on logical structures
  - Path Compression
- Election
  - The Bully Algorithm
  - The Invitation Algorithm
75 Election
The problem: getting a set of processors to agree on a leader
Terminology:
- Coordinator = leader
- Participant = any other processor
- Group = the set consisting of the coordinator and all the participants
Uses of a coordinator:
- Replicated data management
- Symmetry breaking
- Assigning subtasks to participants
76 Election
Why is this an interesting problem?
- In distributed systems, the use of a centralized coordinator simplifies synchronization issues (e.g. see mutual exclusion)
- However, a centralized controller introduces a single point of failure in a distributed system (this can limit service availability)
- Wherever a centralized coordinator processor is used, a leader election algorithm finds its place if we want to alleviate the problem of a single point of failure
- If the leader fails, the computation stops. The goal of an election is to select a new leader, which will determine the state of the system and restart the computation
77 Election
- Election vs. token passing
- Token passing
  - All processors agree about who has the token
  - A non-token-holder processor only needs to know that it does not have the token
  - Usually assumes a fault-free environment
- Election
  - All processors agree about who is the leader
  - All processors must know who is the leader
  - Usually performed only if there are failures
78 Outline
- Distributed Mutual Exclusion
  - Timestamp Algorithms
  - Voting
  - Use of token on logical structures
  - Path Compression
- Election
  - The Bully Algorithm
  - The Invitation Algorithm
79 Election
The Bully algorithm (assumptions)
- Failure assumption: processor failure; when a processor fails, it halts all processing (fail-stop)
- Safe storage assumption: each processor has access to some (local or remote) permanent storage that survives processor failures. The permanent storage serves to record version numbers (we'll soon see the role of version numbers). The survivability of the permanent storage guarantees that version numbers are strictly increasing
80 Election
The Bully algorithm (assumptions)
- Propagation time assumption: the message delivery subsystem delivers all messages within Tm seconds of the sending of the message
- Message handling time assumption: a node responds to all messages within Tp seconds of their delivery
- Resultant reliable failure detector: if a processor doesn't respond to a message within T = 2Tm + Tp seconds, that processor must have failed
- Distributed systems with the above two assumptions are called synchronous distributed systems
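As a toy illustration (my own sketch, with made-up values for Tm and Tp) of the resulting failure detector: a probe and its reply must complete within T = 2*Tm + Tp, otherwise the peer is declared failed.

import time

Tm = 0.5      # assumed worst-case one-way message delay (seconds)
Tp = 0.2      # assumed worst-case message handling time (seconds)
T = 2 * Tm + Tp

def probe(peer_responds_after):
    """Return True iff the peer is considered alive under the timeout T."""
    start = time.monotonic()
    time.sleep(peer_responds_after)          # stand-in for send + remote handling + reply
    return (time.monotonic() - start) <= T

print(probe(0.8))    # within T = 1.2 s  -> considered alive
print(probe(1.5))    # exceeds T         -> declared failed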
81 Election
The Bully algorithm (assumptions)
- Always-detectable assumption: a failed processor is always detectable
- Remember assumption: when a processor recovers from a failure, it knows that it failed
82 Election
The Bully algorithm (state transitions of a processor)
The state of a processor p is maintained in a variable p.Status that is updated as follows (in the form "precondition followed by the action"):
- On recovery: Status ← Down
- On starting an election process: Status ← Election
- On terminating an election process: Status ← Reorganization (ensures that nodes that have recovered from a failure learn of the changes to the system state)
- On terminating a reorganization: Status ← Normal (again ready to proceed)
83 Election
The Bully algorithm (state transitions of each processor)
Each processor in the system has a variable Status that is modified as follows:
- When the processor recovers from a failure: Status ← Down
- When the processor starts the election process: Status ← Election
- When the processor knows that the election is finished: Status ← Reorganization
- When the processor receives the new common state (id of the new leader): Status ← Normal
84 Election
The Bully algorithm (other variables)
Every processor also has the following variables:
    Coordinator   the current coordinator
    Definition    the state information for the task being performed by the system
    Up            a set containing the names of processors known to be in the group (non-failed)
    Halted        name of the processor that notified you of the current election
85 Election
The Bully algorithm (correctness conditions)
Correctness assertion 1 (always property, safety): in any consistent state, every pair of processors p and q satisfies
1) if p.Status ∈ {Normal, Reorganization} and q.Status ∈ {Normal, Reorganization} then p.Coordinator = q.Coordinator   /* p and q have the same knowledge about who is the leader */
2) if (p.Status = Normal) and (q.Status = Normal) then p.Definition = q.Definition   /* p and q agree on the state of the task being performed */
Trivial algorithm: one that keeps all processors in the Election state (it satisfies assertion 1 vacuously, hence the need for a liveness condition)
86 Election
The Bully algorithm (correctness conditions)
Correctness assertion 2 (eventually property, liveness): let G be a consistent state. Then, in any execution with no further failures starting in G, P1 ∧ P2 is eventually true, where
- P1 is the predicate: there is a processor p such that (p.Status = Normal) and (p.Coordinator = p)   /* there is a leader node in a Normal state */
- P2 is the predicate: for every other non-failed processor q, (q.Status = Normal) and (q.Coordinator = p)   /* every non-failed processor q is in a Normal state and q knows that p is the coordinator */
87 Election
The Bully algorithm (assumptions, cont.)
- Priority assumption: each processor has a priority, known by all the other processors
- Initiation of an election: an election process is initiated either
  - on Timeout (by a non-leader node): you are in the Normal or Reorganization state and you do not hear from the current leader for a long time, or
  - on Recovery: you recover from a failure
88 Election
The Bully algorithm (assumptions, cont.)
Initiation of an election (cont.):
- on Detection of a change in the group (by the coordinator): you are the current leader, you are in the Normal state, and either a recovered processor joins the group OR a processor leaves the group because of a failure
- It is assumed that there is some mechanism by which the coordinator learns of recovered processors in a timely manner; failures are detected by timeout
89 Election
The Bully algorithm (idea)
[Figure: the processors ordered by priority, with p marked]
Intuitively, processor p becomes leader only if there is no non-failed higher-priority processor. To achieve this, the idea is that for a processor p to be elected, it must pass through four steps. Each step can be regarded as a sequence of (question, answer) pairs; the question is always sent by the processor that initiated the election process
90 Election
The Bully algorithm (idea, intuition behind the steps' questions)
Step 1, Check step: in this step, a processor checks whether there is a non-failed higher-priority processor
[Figure: p sends AreYouUp? to every higher-priority processor]
If no answer arrives within T, processor p goes to Step 2; otherwise processor p gives up the idea that it can be the leader
91 Election
The Bully algorithm (idea, intuition behind the steps' questions)
Step 2, Bully step: in this step, processor p attempts to establish itself as the leader (p.Status ← Election)
[Figure: p sends Enter_Election? to every lower-priority processor]
Every processor that replies in time is added to the set Up of processor p. When processor p knows that it has received a timeout or an answer from every lower-priority processor, it goes to Step 3
92 Election
The Bully algorithm (idea, intuition behind the steps' questions)
Step 3, Acceptance step: in this step, p asks every processor in its set Up to accept it as the leader (p.Status ← Reorganization)
[Figure: p sends Set_Coordinator? to every processor in Up]
If every processor in the set Up of processor p replies in time, then processor p can go to Step 4. Otherwise, processor p initiates a new election process (i.e. goes back to Step 1) /* note that it is not strictly necessary to reinitiate an election when some processors in Up do not reply */
93 Election
The Bully algorithm (idea, intuition behind the steps' questions)
Step 4, View synchronization step: in this step, p asks every processor in Up to accept its definition as the correct one (p.Status = Reorganization)
[Figure: p sends New_State? to every processor in Up]
If every processor in the set Up of processor p replies in time, then processor p learns that it is the leader, so it can enter the Normal state. Otherwise, processor p initiates a new election process (i.e. goes back to Step 1)
94 Election
The Bully algorithm (idea, intuition behind the steps' responses)
- Reaction on receiving AreYouUp: reply by sending AYU_answer
- Reaction on receiving Enter_Election: stop your current activity (an election process or the task being performed), enter the Election state, record the name of the sender of the Enter_Election message, and reply by sending an EE_answer message
95 Election
The Bully algorithm (idea, intuition behind the steps' responses)
Reaction on receiving Set_Coordinator: if (your state is Election and the sender is the processor that made you enter the Election state) then accept it as the coordinator, enter the Reorganization state, and reply by sending SC_answer
96 Election
The Bully algorithm (idea, intuition behind the steps' responses)
Reaction on receiving New_State: if (your state is Reorganization and the sender is the processor you think is the coordinator) then enter the Normal state, synchronize your view of the task being performed by taking the view of the new coordinator, and reply by sending NS_answer
97 Election
The Bully algorithm (idea, cont.)
How does a coordinator detect that a processor has failed? A coordinator periodically checks the state of the other processors by sending an AreYouNormal message to every other processor; when a processor q sends a reply to this question, the coordinator can learn the exact situation of q
98 Election
The Bully algorithm (code)
Algorithms to initiate an election

Coordinator_Timeout()     /* executed when a processor suspects that the coordinator has failed */
    if State = Normal or State = Reorganization then
        send(Coordinator, AreYouUp)
        wait until Coordinator sends (AYU_answer), timeout T
            on timeout do
                Election()
            end
    end

Recovery()                /* executed by a processor that recovers from a failure */
    State ← Down
    Election()
99 Election
The Bully algorithm (code)
Algorithm used by the coordinator to check other processors

Check()                   /* executed periodically by the coordinator */
    if State = Normal and Coordinator = Self then
        for every processor j ≠ Self
            send(j, AreYouNormal)
            wait until j sends (AYN_answer, status), timeout T
            if (j is in Up and status = False) or j is not in Up then
                Election()
                return()
        end
100 Election
The Bully algorithm (code)
Algorithm for Election()

Election()                /* the different steps of the bully algorithm */
    /* Step 1 */
    highest ← True
    for every higher-priority processor p
        send(p, AreYouUp)
    wait up to T seconds for (AYU_answer) messages
        on the receipt of AYU_answer(sender) do
            highest ← False
        end
    end
    if highest = False then return()
101 Election
The Bully algorithm (code)
Algorithm for Election(), continued

    /* Step 2 */
    State ← Election
    Halted ← Self
    Up ← ∅
    for every lower-priority processor p
        send(p, Enter_Election)
    wait up to T seconds for (EE_answer) messages
        on the receipt of EE_answer(sender) do
            Up ← Up ∪ {sender}
        end
    end
102 Election
The Bully algorithm (code)
Algorithm for Election(), continued

    /* Step 3 */
    num_answers ← 0
    Coordinator ← Self
    State ← Reorganization
    for every processor p in Up do
        send(p, Set_Coordinator Self)
    wait up to T seconds for (SC_answer) messages
        on the receipt of SC_answer(sender) do
            num_answers ← num_answers + 1
        end
    end    /* end wait */
    if num_answers < |Up| then    /* not the same as in the book */
        Election()
        return()
103 Election
The Bully algorithm (code)
Algorithm for Election(), continued

    /* Step 4 */
    num_answers ← 0
    for every processor p in Up do
        send(p, New_State Definition)
    wait up to T seconds for (NS_answer) messages
        on the receipt of NS_answer(sender) do
            num_answers ← num_answers + 1
        end
    end    /* end wait */
    if num_answers < |Up| then
        Election()
        return()
    State ← Normal    /* you are elected */
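A condensed sketch of the four steps of Election() (my own restructuring, not the book's code) is given below; ask(peer, question) is assumed to return True iff that peer answered within the timeout T, and priorities are simply the processor ids.

class BullyNode:
    def __init__(self, my_id, all_ids, ask, definition=None):
        self.my_id = my_id
        self.all_ids = all_ids
        self.ask = ask                     # ask(peer, question) -> bool
        self.definition = definition
        self.state = "Down"
        self.coordinator = None
        self.up = set()

    def election(self):
        # Step 1: give up if any higher-priority processor is alive
        higher = [p for p in self.all_ids if p > self.my_id]
        if any(self.ask(p, "AreYouUp") for p in higher):
            return
        # Step 2: bully every lower-priority processor into the Election state
        self.state = "Election"
        lower = [p for p in self.all_ids if p < self.my_id]
        self.up = {p for p in lower if self.ask(p, "Enter_Election")}
        # Step 3: ask every member of Up to accept you as coordinator
        self.coordinator = self.my_id
        self.state = "Reorganization"
        if not all(self.ask(p, "Set_Coordinator") for p in self.up):
            return self.election()         # a member vanished: start over
        # Step 4: distribute the new definition (view synchronization)
        if not all(self.ask(p, "New_State") for p in self.up):
            return self.election()
        self.state = "Normal"              # you are elected

# Example: processors 1..4, but 4 has failed, so 3 should win.
alive = {1, 2, 3}
node3 = BullyNode(3, [1, 2, 3, 4], ask=lambda p, q: p in alive)
node3.election()
print(node3.state, node3.coordinator, sorted(node3.up))   # Normal 3 [1, 2]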
104 Election
The Bully algorithm (code, reaction to questions)
Algorithm for the thread that monitors the election

Monitor_Election()
    while (true)
        wait for a message
        on the receipt of AreYouUp(sender) do
            send(sender, AYU_answer)
        end
        on the receipt of AreYouNormal(sender) do
            if State = Normal then
                send(sender, AYN_answer True)
            else
                send(sender, AYN_answer False)
        end
        on the receipt of Enter_Election(sender) do
            State ← Election
            stop_processing()    /* stop the task, or the election procedure if it is executing */
            Halted ← sender
            send(sender, EE_answer)
        end
105 Election
The Bully algorithm (code, reaction to questions)
Algorithm for the thread that monitors the election (cont.)

Monitor_Election()    /* continued */
        on the receipt of Set_Coordinator(sender, newleader) do
            if (State = Election) and (Halted = newleader) then
                Coordinator ← newleader
                State ← Reorganization
                send(sender, SC_answer)
        end
        on the receipt of New_State(sender, newdef) do
            if (Coordinator = sender) and (State = Reorganization) then
                Definition ← newdef
                State ← Normal
                send(sender, NS_answer)    /* not in the book */
        end
106 Election
The Bully algorithm (concluding remarks: questions)
We can conclude the study of the bully algorithm by answering the following questions:
1) Is it necessary to have the acceptance step?
2) What happens if two processors try to become leader at the same time?
3) Is it necessary to initiate a new election when you find that a processor that replied to Enter_Election is no longer in the group?
107 Election
The Bully algorithm (concluding remarks: answers)
1) Is it necessary to have the acceptance step? The answer to this question seems to be Yes if we want to guarantee correctness condition 1. Justification?
108 Election
The Bully algorithm (concluding remarks: answers)
2) What happens if two processors try to become leader at the same time? The processor with the higher priority will win
3) Is it necessary to initiate a new election when you find that a processor that replied to Enter_Election is no longer in the group? Yes, if you want to have an accurate view of your group. However, it is not necessary for the correctness conditions
109 Outline
- Distributed Mutual Exclusion
  - Timestamp Algorithms
  - Voting
  - Use of token on logical structures
  - Path Compression
- Election
  - The Bully Algorithm
  - The Invitation Algorithm
110 Election
The Invitation algorithm (assumptions)
- Safe storage assumption: as for the bully algorithm
- Transmission time assumption: arbitrary
- Response time assumption: arbitrary
- Failure assumption: processor failures and network partitions
Instead of looking for a global coordinator, tie the coordinator to the group that it coordinates
111 Election
The Invitation algorithm (assumptions)
Role of the coordinator:
- Leading its group in a distributed computation
- Ensuring that participants in the group are working according to the same plan
Note: the coordinator might form the same group twice but with a different working plan, so sequence numbers are used to make the group identity unambiguous
A new variable is introduced: Group, the identifier of your group
112 Election
The Invitation algorithm (cont.)
- Each group identifier is unique
- All the members of the group agree on the same group number
- The group number is included in the defining state of a node
- Note: we still have priorities
113 Election
The Invitation algorithm (correctness conditions)
Correctness assertion 3 (always property, safety): in any consistent state, every pair of processors p and q satisfies
1) if p.Status ∈ {Normal, Reorganization} and q.Status ∈ {Normal, Reorganization} and p.Group = q.Group then p.Coordinator = q.Coordinator   /* p and q have the same knowledge about who is the leader */
2) if (p.Status = Normal) and (q.Status = Normal) and p.Group = q.Group then p.Definition = q.Definition   /* p and q agree on the state of the task being performed */
114 Election
The Invitation algorithm (correctness conditions)
Correctness assertion 4 (eventually property, liveness): let G0 be a consistent state and R be a maximal set of processors that can communicate in state G0. Then, in any execution with no further failures starting in G0 and such that R remains a maximal set of processors that can communicate, P1 ∧ P2 is eventually true, where
- P1 is the predicate: there is a processor p in R such that (p.Status = Normal) and (p.Coordinator = p)   /* there is a leader node from R in a Normal state */
- P2 is the predicate: for every other non-failed processor q in R, (q.Status = Normal) and (q.Coordinator = p)   /* every non-failed processor q is in a Normal state and q knows that p is the leader */
115 Election
The Invitation algorithm (idea)
Ensuring that correctness condition 3.1 is satisfied:
- When you want to establish yourself as the coordinator, you create a new group and target a set of potential participants (including yourself); you try to make this group as large as possible by inviting the potential participants to join the new group with you as the leader
- To join the new group, a participant accepts your invitation. Since the group id is unique, this ensures that correctness condition 3.1 is satisfied
- When the group is formed, you send the new definition to all the participants in the group
116 Election
The Invitation algorithm (idea)
Ensuring that correctness condition 3.2 is satisfied: a participant accepts a new definition only from its current group coordinator, and only when the participant is in the Reorganization state
117 Election
The Invitation algorithm (idea)
- Ensuring that correctness condition 4 is satisfied
- The difficulty in ensuring correctness condition 4 is that more than one coordinator might compete for participants
- How does the invitation algorithm solve this problem?
  - Let the competing coordinators agree to merge their groups into a single group
  - One coordinator invites the other to merge into a single group
  - Use of delays to avoid livelock
118 Election
The Invitation algorithm
Algorithm used by a coordinator to find the coordinators of other groups

Check()
    if Status = Normal and Coordinator = Self then
        Others ← ∅
        for every other processor p
            send(p, AreYouCoordinator)
        wait up to T seconds for (AYC_answer) messages
            on receipt of AYC_answer(sender, is_coordinator) do
                if is_coordinator = True then
                    Others ← Others ∪ {sender}
            end
        if Others = ∅ then return()
        wait for a time inversely proportional to your priority    /* avoids livelock */
        Merge(Others)
119 Election
The Invitation algorithm
Algorithm used by a non-coordinator to handle a suspected failure of the coordinator

Timeout()
    if Coordinator = Self then return()
    send(Coordinator, AreYouThere Group)
    wait up to T seconds for (AYC_answer) message
        on timeout do
            is_coordinator ← False
        end
        on AYC_answer(sender, is_coordinator) do
            /* nothing */
        end
    if is_coordinator = False then Recovery()
120 Election
The Invitation algorithm
Algorithm used for recovery after a failure, and also for declaring a new group

Recovery()
    Status ← Election
    stop_processing()
    Counter ← Counter + 1
    Group ← (Self, Counter)    /* to make the group number unique; Counter serves as a version number */
    Coordinator ← Self
    Up ← ∅
    Status ← Reorganization
    Definition ← (a single-node task description)
    Status ← Normal
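A small sketch (illustrative; the file name and layout are assumptions) of how Recovery() can form a unique group identifier: the counter lives in permanent storage that survives failures, so the pair (Self, Counter) never repeats.

import os

STATE_FILE = "counter.txt"     # stand-in for the processor's permanent storage

def next_group_id(self_id):
    counter = 0
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            counter = int(f.read() or 0)
    counter += 1
    with open(STATE_FILE, "w") as f:
        f.write(str(counter))  # persisted before the group id is used
    return (self_id, counter)

print(next_group_id("p7"))     # e.g. ('p7', 1), then ('p7', 2) after a restart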
121 Election
The Invitation algorithm
Algorithm used by a coordinator to merge other groups into one

Merge(Coordinator_set)
    if Coordinator = Self and Status = Normal then
        Status ← Election
        stop_processing()
        Counter ← Counter + 1
        Group ← (Self, Counter)
        Coordinator ← Self
        UpSet ← Up
        Up ← ∅
        for each p in Coordinator_set
            send(p, Invitation Self, Group)
        for each p in UpSet
            send(p, Invitation Self, Group)
        wait for T seconds    /* answers are collected by the Monitor_Election thread */
        Status ← Reorganization
122 Election
The Invitation algorithm
Algorithm used by a coordinator to merge other groups into one (cont.)

Merge(Coordinator_set)    /* continued */
        num_answer ← 0
        for each p in Up
            send(p, Ready Group, Definition)
        wait up to T seconds for Ready_answer messages
            on Ready_answer(sender, ingroup, new_group) do
                if ingroup = True and new_group = Group then
                    num_answer ← num_answer + 1
            end
        if num_answer < |Up| then Recovery()
        else Status ← Normal
123 Election
The Invitation algorithm
Algorithm executed by the thread that handles invitations: Invitation()
SEE BOOK