UBI529 Distributed Algorithms - PowerPoint PPT Presentation


About This Presentation

Title:

UBI529 Distributed Algorithms

Slides: 59

Provided by: kevin589

Transcript and Presenter's Notes

1
UBI529 Distributed Algorithms
Global State of Distributed Systems
2
Motivation

Goal: take a snapshot of the global computation
A snapshot of the local states of n processes taken at exactly the same time
Two terms: global state and global snapshot
Useful for debugging
Useful for backup/checkpointing
Useful for calculating a global predicate
E.g., exactly how much currency is there in the country (notice that money flows among people constantly)?
Deadlock detection
Rollback recovery
Termination detection

3
Global state

Global state: a set of local states that are concurrent with each other
Concurrent states: no two states have a happened-before relation with each other
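To make the definition concrete, the Java sketch below tests pairwise concurrency, assuming each local state carries a vector timestamp (vector clocks are not part of these slides). The class and method names are hypothetical. Entry v[q][p] counts the events of p that q's state depends on; if it exceeds the number of events p had executed in its own state, some event after p's state happened before q's state, and the two states are not concurrent.

```java
// Sketch: concurrency of local states via vector timestamps (an assumption; the
// slides do not use vector clocks). v[p] is the vector timestamp of process p's
// state: v[p][p] is the number of events p has executed, and v[q][p] is the
// number of p's events that q's state depends on.
final class ConcurrentStates {
    // States of p and q are concurrent iff neither depends on an event that the
    // other had not yet executed in its own state.
    static boolean concurrent(int[][] v, int p, int q) {
        return v[q][p] <= v[p][p] && v[p][q] <= v[q][q];
    }

    // A global state: one local state per process, pairwise concurrent.
    static boolean isGlobalState(int[][] v) {
        for (int p = 0; p < v.length; p++)
            for (int q = p + 1; q < v.length; q++)
                if (!concurrent(v, p, q)) return false;
        return true;
    }

    public static void main(String[] args) {
        // P0 sends a message as its first event; P1 receives it as its first event.
        int[] p0BeforeSend = {0, 0}, p0AfterSend = {1, 0};
        int[] p1BeforeRecv = {0, 0}, p1AfterRecv = {1, 1};
        System.out.println(isGlobalState(new int[][] {p0AfterSend, p1AfterRecv}));  // true
        System.out.println(isGlobalState(new int[][] {p0BeforeSend, p1AfterRecv})); // false
        System.out.println(isGlobalState(new int[][] {p0AfterSend, p1BeforeRecv})); // true
    }
}
```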

4
The mystery of the missing dollars
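[Figure: A holds 400 and B holds 300; A sends 100 to B]

Picture taken at A: 400
A sends 100 to B
Picture taken at B: 400
Total is 800, although only 700 exists: the 100 that moved is counted twice, once at A and once at B.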

5
Global Snapshot Problem

Determine the global system state (e.g., the total money)
Each process records its own state
No shared clock or memory
Analogy: a group of photographers, each taking snaps of a different portion of a scene, trying to combine them to get the overall picture.

6
Consistent cut

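Given a computation $(E, \rightarrow)$ and $F \subseteq E$, $F$ is a cut iff $\forall e, f \in E : (f \in F) \wedge (e \prec f) \Rightarrow e \in F$, where $e \prec f$ means that $e$ occurs before $f$ on the same process.
$F$ is a consistent cut (global snapshot) iff $\forall e, f \in E : (f \in F) \wedge (e \rightarrow f) \Rightarrow e \in F$.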

7
Consistent and inconsistent cuts
8
Consistent cut
A cut is a set of events.

For a consistent cut $C$: $(a \in C) \wedge (b \rightarrow a) \Rightarrow b \in C$, where $b \rightarrow a$ means that $b$ happened before $a$.

[Figure: events on processes P1, P2 and P3; Cut 1 is not consistent, Cut 2 is consistent]
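For cuts that contain a prefix of each process's events, the condition above can be checked message by message: a pair with $b \rightarrow a$, $a \in C$ and $b \notin C$ exists iff some message is received inside the cut but sent outside it. A minimal Java sketch under that representation; the Message record and the per-process event numbering are hypothetical. The usage replays the missing-dollars example.

```java
import java.util.List;

// Sketch: a cut contains, for each process p, its first cut[p] events
// (events are numbered 1, 2, ... on every process).
final class ConsistentCut {
    // Hypothetical message record: sent as event sendIdx of process from,
    // received as event recvIdx of process to.
    record Message(int from, int sendIdx, int to, int recvIdx) {}

    // Consistent iff no message is received inside the cut but sent outside it.
    static boolean isConsistent(int[] cut, List<Message> messages) {
        for (Message m : messages) {
            boolean receivedInside = m.recvIdx() <= cut[m.to()];
            boolean sentOutside = m.sendIdx() > cut[m.from()];
            if (receivedInside && sentOutside) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Missing dollars: A (process 0) sends 100 as its first event,
        // B (process 1) receives it as its first event.
        List<Message> msgs = List.of(new Message(0, 1, 1, 1));
        // Picture at A before the send, picture at B after the receipt.
        System.out.println(isConsistent(new int[] {0, 1}, msgs)); // false
        // Picture at A after the send, picture at B before the receipt.
        System.out.println(isConsistent(new int[] {1, 0}, msgs)); // true: 100 in transit
    }
}
```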
9
Consistent snapshot

The set of states immediately following a
consistent cut forms a consistent snapshot of a
distributed system.
A snapshot that is of practical interest is the most recent one. Let $C_1$ and $C_2$ be two consistent cuts with $C_1 \subset C_2$. Then $C_2$ is more recent than $C_1$.
Analyze why certain cuts in the one-dollar bank
are inconsistent.

10
Consistent snapshot

How to record a consistent snapshot? Note that
1. The recording must be non-invasive.
2. The recording must be done on the fly: you cannot stop the system.

11
Chandy-Lamport Algorithm

Assumes FIFO, unidirectional channels
A bidirectional channel is modelled as two unidirectional channels
Each process has an associated color. All processes are initially white.
A process records its local state just before turning red
On turning red, the process sends out a marker on all outgoing channels
On receiving a marker, a white process turns red

12
Chandy-Lamport Algorithm

Works on a
(1) strongly connected graph, where
(2) each channel is FIFO.
An initiator initiates the algorithm by sending out a marker.

13
White and red processes

Initially every process is white. When a process
receives a marker, it turns red if it has not
already done so.
Every action by a process, and every message sent by a process, gets the color of that process.

14
Two steps

Step 1. In one atomic action, the initiator (a) turns red, (b) records its own state, and (c) sends a marker along all outgoing channels.
Step 2. Every other process, upon receiving a marker for the first time (and before doing anything else), (a) turns red, (b) records its own state, and (c) sends markers along all outgoing channels.
The algorithm terminates when (1) every process has turned red, and (2) every process has received a marker through each incoming channel.
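The two steps can be exercised in a small single-threaded simulation. The Java sketch below is illustrative only and assumes a complete graph of FIFO channels, processes that merely pass tokens around, and a random schedule; all names are hypothetical. It also records channel states, which the two steps leave implicit: a message that arrives after the receiver turned red, but before the marker on that channel, is counted as in transit. If the recording is consistent, the recorded total equals the initial number of tokens.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Random;

// Sketch: Chandy-Lamport on n processes passing tokens over FIFO channels
// (complete graph). A message is a token count, or MARKER. Names are hypothetical.
final class ChandyLamportSim {
    static final int MARKER = -1;

    final int n;
    final int[] tokens;                  // current tokens held by each process
    final ArrayDeque<Integer>[][] chan;  // chan[from][to] is a FIFO channel
    final boolean[] red;
    final int[] recordedState;           // tokens held when the process turned red
    final boolean[][] markerSeen;        // markerSeen[to][from]
    final int[][] recordedChan;          // tokens recorded in transit on (from, to)

    @SuppressWarnings("unchecked")
    ChandyLamportSim(int n, int tokensEach) {
        this.n = n;
        tokens = new int[n];
        Arrays.fill(tokens, tokensEach);
        chan = new ArrayDeque[n][n];
        for (ArrayDeque<Integer>[] row : chan)
            for (int j = 0; j < n; j++) row[j] = new ArrayDeque<>();
        red = new boolean[n];
        recordedState = new int[n];
        markerSeen = new boolean[n][n];
        recordedChan = new int[n][n];
    }

    // One atomic action: turn red, record local state, marker on every outgoing channel.
    void turnRed(int p) {
        red[p] = true;
        recordedState[p] = tokens[p];
        for (int q = 0; q < n; q++) if (q != p) chan[p][q].addLast(MARKER);
    }

    void sendToken(int from, int to) {
        tokens[from]--;
        chan[from][to].addLast(1);
    }

    // Receive the message at the head of channel (from, to).
    void deliver(int from, int to) {
        int m = chan[from][to].removeFirst();
        if (m == MARKER) {
            if (!red[to]) turnRed(to);   // first marker: record before anything else
            markerSeen[to][from] = true;
        } else {
            // Received after turning red but before this channel's marker: in transit.
            if (red[to] && !markerSeen[to][from]) recordedChan[from][to] += m;
            tokens[to] += m;
        }
    }

    // Every process is red and has seen a marker on each incoming channel.
    boolean done() {
        for (int p = 0; p < n; p++) {
            if (!red[p]) return false;
            for (int q = 0; q < n; q++) if (q != p && !markerSeen[p][q]) return false;
        }
        return true;
    }

    int recordedTotal() {
        int sum = 0;
        for (int p = 0; p < n; p++) {
            sum += recordedState[p];
            for (int q = 0; q < n; q++) sum += recordedChan[q][p];
        }
        return sum;
    }

    // Random schedule; process 0 initiates the snapshot at step 20.
    static int run(int n, int tokensEach, long seed) {
        ChandyLamportSim s = new ChandyLamportSim(n, tokensEach);
        Random rnd = new Random(seed);
        for (int step = 0; !s.done(); step++) {
            if (step == 20) s.turnRed(0);
            int a = rnd.nextInt(n), b = rnd.nextInt(n);
            if (a == b) continue;
            if (rnd.nextBoolean() && s.tokens[a] > 0) s.sendToken(a, b);
            else if (!s.chan[a][b].isEmpty()) s.deliver(a, b);
        }
        return s.recordedTotal();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 5, 42L)); // 20: no token lost or counted twice
    }
}
```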

15
Why does it work?

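Lemma 1. No red message is received in a white action.
(Reason: a process sends markers on all outgoing channels when it turns red, so with FIFO channels the marker precedes every red message on each channel; the receiver therefore turns red before it can receive a red message.)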

16
Why does it work?
[Figure: all white actions, followed by the snapshot state SSS, followed by all red actions]
Easy conceptualization of the snapshot state

Theorem. The global state recorded by
Chandy-Lamport algorithm is equivalent to the
ideal snapshot state SSS.
Hint. A pair of actions (a, b) can be scheduled in any order if there is no causal order between them, so (a; b) is equivalent to (b; a).

17
Why does it work?
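Let an observer observe the following actions, where w denotes a white action and r a red action:
$w_i\, w_k\, r_k\, w_j\, r_i\, w_l\, r_j\, r_l$
$\equiv w_i\, w_k\, w_j\, r_k\, r_i\, w_l\, r_j\, r_l$ (Lemma 1)
$\equiv w_i\, w_k\, w_j\, r_k\, w_l\, r_i\, r_j\, r_l$ (Lemma 1)
$\equiv w_i\, w_k\, w_j\, w_l\, r_k\, r_i\, r_j\, r_l$ (Lemma 1): done!
Recorded state: the boundary between the white prefix and the red suffix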
18
Example 1. Count the tokens

Let us verify that the Chandy-Lamport snapshot algorithm correctly counts the tokens circulating in the system.

[Figure: tokens circulating among processes A, B, C and D]
How to account for the channel states? Use sent and received variables for each process.
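The hint about sent and received variables can be made concrete: if every process records, together with its token count, how many tokens it has sent to and received from each other process, then the tokens in transit on channel (i, j) are sent_i[j] − received_j[i], both values taken at recording time, provided the recorded states form a consistent snapshot. A minimal Java sketch of that bookkeeping; the array names are hypothetical.

```java
// Sketch: counting tokens from recorded counters instead of recorded messages.
// recordedTokens[i] : tokens held by process i when it recorded its state
// sentAt[i][j]      : tokens i had sent to j when i recorded its state
// receivedAt[j][i]  : tokens j had received from i when j recorded its state
final class TokenCount {
    static int total(int[] recordedTokens, int[][] sentAt, int[][] receivedAt) {
        int n = recordedTokens.length;
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += recordedTokens[i];
            for (int j = 0; j < n; j++)
                sum += sentAt[i][j] - receivedAt[j][i]; // tokens in transit on (i, j)
        }
        return sum;
    }

    public static void main(String[] args) {
        // A (0) and B (1) start with 5 tokens each. A sent 3 tokens to B before
        // recording (A records 2); B had received 2 of them when it recorded (B records 7).
        int[] held = {2, 7};
        int[][] sent = {{0, 3}, {0, 0}};
        int[][] received = {{0, 0}, {2, 0}};
        System.out.println(total(held, sent, received)); // 10: one token was in transit
    }
}
```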
19
Chandy-Lamport Algorithm
20
Algorithm
import java.util.LinkedList;

// Process, Camera, Linker, CamUser and Msg are defined elsewhere (not shown on the slide).
public class RecvCamera extends Process implements Camera {
    // . . . (declarations elided on the slide)
    public RecvCamera(Linker initComm, CamUser app) {
        // . . .
        for (int i = 0; i < N; i++)
            if (isNeighbor(i)) {
                closed[i] = false;
                chan[i] = new LinkedList();
            } else
                closed[i] = true;
    }

    public synchronized void globalState() {
        myColor = red;
        app.localState();                // record local state
        sendToNeighbors("marker", myId); // send markers
    }

    public synchronized void handleMsg(Msg m, int src, String tag) {
        if (tag.equals("marker")) {
            if (myColor == white) globalState();
            closed[src] = true;          // marker seen: channel src is recorded
            if (isDone()) {
                // ----- display channel state (transit messages) held in chan -----
            }
        } else { // application message
            if ((myColor == red) && (!closed[src]))
                chan[src].add(m);        // received after turning red, before src's marker
            app.handleMsg(m, src, tag);  // give it to app
        }
    }

    boolean isDone() {
        if (myColor == white) return false;
        for (int i = 0; i < N; i++)
            if (!closed[i]) return false;
        return true;
    }
}
21
Lai-Yang Algorithm

LY1. The initiator records its own state. When
it needs to send a message m to another process,
it sends a message (m, red).
LY2. When a process receives a message (m, red),
it records its state if it has not already done
so, and then accepts the message m.
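LY1 and LY2 amount to piggybacking one color bit on every message. The Java sketch below assumes a caller that moves Tagged values between processes and hands the returned payload to the application; all names are hypothetical. The inTransit list goes one step beyond LY1 and LY2 as stated: a white message received after recording was sent before its sender recorded, so it belongs to the channel state of the snapshot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Sketch of LY1 and LY2: every message carries its sender's color.
final class LaiYangProcess {
    record Tagged(Object payload, boolean red) {}

    private final Supplier<Object> localState; // reads the current local state
    boolean red = false;
    Object recordedState = null;
    // Extension beyond LY1/LY2: white messages received after recording.
    final List<Object> inTransit = new ArrayList<>();

    LaiYangProcess(Supplier<Object> localState) {
        this.localState = localState;
    }

    // LY1 for the initiator; also used by LY2. Records at most once.
    void recordIfNeeded() {
        if (!red) {
            recordedState = localState.get();
            red = true;
        }
    }

    // After recording, every outgoing message m is sent as (m, red).
    Tagged send(Object m) {
        return new Tagged(m, red);
    }

    // LY2: on (m, red), record first if not already done, then accept m.
    Object receive(Tagged t) {
        if (t.red()) recordIfNeeded();
        else if (red) inTransit.add(t.payload());
        return t.payload();
    }

    public static void main(String[] args) {
        int[] a = {400}, b = {300};
        LaiYangProcess pa = new LaiYangProcess(() -> a[0]);
        LaiYangProcess pb = new LaiYangProcess(() -> b[0]);
        pa.recordIfNeeded();               // A initiates (LY1) and records 400
        Tagged white = pb.send("from B");  // B has not recorded yet: white
        Tagged redMsg = pa.send("from A"); // A is red now
        pb.receive(redMsg);                // LY2: B records 300, then accepts
        pa.receive(white);                 // white message crossing the cut
        System.out.println(pa.recordedState + " " + pb.recordedState + " " + pa.inTransit);
        // prints 400 300 [from B]
    }
}
```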

22
Another example of distributed snapshot
Communicating State Machines
23
Something unusual

Let machine i start the Chandy-Lamport snapshot before it has sent M along ch1. Also, let machine j receive the marker after it sends out M along ch2. Observe that the snapshot state is
down ∅ up M
Doesn't this appear strange? This state was never reached during the computation!

24
Understanding snapshot
25
Understanding snapshot
The observed state is a feasible state that is
reachable from the initial configuration. It may
not actually be visited during a specific
execution. The final state of the original
computation is always reachable from the
observed state.
26
Discussions

What good is a snapshot if that state has never
been visited by the system?
- It is relevant for the detection of stable
predicates.
- Useful for checkpointing.

27
Discussions

What if the channels are not FIFO?
Study how the Lai-Yang algorithm works. It does not use any markers.
LY1. The initiator records its own state. When
it needs to send a message m to another process,
it sends a message (m, red).
LY2. When a process receives a message (m, red),
it records its state if it has not already done
so, and then accepts the message m.
Question 1. Why will it work?
Question 2. Are there any limitations of this approach?

28
Global state collection

Some applications
- computing network topology
- termination detection
- deadlock detection
The Chandy-Lamport algorithm does only part of the job: each process collects a fragment of the global state, but these pieces have to be stitched together to form a global state.

29
A simple exercise

Once the pieces of a consistent global state become available, consider collecting the global state via all-to-all broadcast.
At the end, each process will compute a set V, where $V = \{ s(i) : 0 \le i \le N-1 \}$.

[Figure: processes i, j, k, l holding local states s(i), s(j), s(k), s(l)]
30
All-to-all broadcast
Assume that the topology is a strongly connected graph.

program broadcast {for process i}
define V.i, W.i : set of values
initially V.i = {s(i)}, W.i = ∅, and every channel is empty
do V.i ≠ W.i → send (V.i \ W.i) to every outgoing channel; W.i := V.i
[] ¬empty(k, i) → receive X from channel (k, i); V.i := V.i ∪ X
od

[Figure: process i (V.i, W.i) and process k (V.k, W.k) joined by channel (i, k)]
Acts like a pump
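The pump can be simulated sequentially. In the Java sketch below (hypothetical names; values are ints; out[i] lists every k for which channel (i, k) exists), the loop keeps firing whichever guards are enabled and stops when none is, i.e., when every V.i = W.i and every channel is empty, which is the termination condition used in the proof.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: sequential simulation of program broadcast (the "pump").
final class PumpBroadcast {
    @SuppressWarnings("unchecked")
    static List<Set<Integer>> run(int[][] out, int[] s) {
        int n = s.length;
        List<Set<Integer>> v = new ArrayList<>(), w = new ArrayList<>();
        ArrayDeque<Set<Integer>>[][] chan = new ArrayDeque[n][n]; // chan[i][k] = channel (i, k)
        for (int i = 0; i < n; i++) {
            v.add(new HashSet<>(Set.of(s[i])));  // initially V.i = {s(i)}
            w.add(new HashSet<>());              // initially W.i = {}
            for (int k = 0; k < n; k++) chan[i][k] = new ArrayDeque<>();
        }
        boolean progress = true;
        while (progress) {                       // stop when no guard is enabled
            progress = false;
            for (int i = 0; i < n; i++) {
                if (!v.get(i).equals(w.get(i))) {            // guard V.i != W.i
                    Set<Integer> delta = new HashSet<>(v.get(i));
                    delta.removeAll(w.get(i));               // V.i \ W.i
                    for (int k : out[i]) chan[i][k].addLast(new HashSet<>(delta));
                    w.set(i, new HashSet<>(v.get(i)));       // W.i := V.i
                    progress = true;
                }
                for (int k = 0; k < n; k++) {                // guard: channel (k, i) not empty
                    while (!chan[k][i].isEmpty()) {
                        v.get(i).addAll(chan[k][i].removeFirst()); // V.i := V.i ∪ X
                        progress = true;
                    }
                }
            }
        }
        return v;
    }

    public static void main(String[] args) {
        int[][] ring = {{1}, {2}, {3}, {0}};     // strongly connected directed ring
        // every process ends with the set {10, 20, 30, 40}
        System.out.println(run(ring, new int[] {10, 20, 30, 40}));
    }
}
```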
31
Proof

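Lemma. $\mathrm{empty}(i, k) \Rightarrow W.i \subseteq V.k$.
(Upon termination) $\forall i : V.i = W.i$, and all channels are empty.
So, $V.i \subseteq V.k$ for every channel $(i, k)$.
On a cyclic path, $V.i = V.k$ must be true, since the subset relations chain around the cycle back to $V.i$; in a strongly connected graph, every pair of processes lies on such a cycle. Since $s(i) \in V.i$, $s(i) \in V.k$.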

[Figure: process i (V.i, W.i) and process k (V.k, W.k) joined by channel (i, k)]
32
Acknowledgements

This part is heavily based on Dr. Sukumar Ghosh's Distributed Systems course 22C166 at the University of Iowa.

33
(No Transcript)
34
Termination Detection and Deadlocks
35
Termination detection

During the progress of a distributed computation,
processes may periodically turn active or
passive.
A distributed computation terminates when
(a) every process is passive,
(b) all channels are empty, and
(c) the global state satisfies the desired
postcondition

36
Visualizing diffusing computation
[Figure: a diffusing computation started by an initiator, with nodes shown as active or passive]
Notice how one process engages another process. Eventually all processes turn white (passive), and no message is in transit; this signals termination.
How to develop a signaling mechanism to detect termination?
37
Dijkstra-Scholten algorithm
The basic scheme

Node j engages node k.

An initiator initiates termination detection
by sending signals (messages) down the
edges via which it engages other nodes.
At a suitable time, the recipient sends an
ack back.
When the initiator receives ack from every
node that it engaged, it detects termination.

[Figure: node j sends a signal to node k; later, k sends an ack back to j]
38
Dijkstra-Scholten algorithm

$\mathrm{deficit}(e) = (\text{number of signals on edge } e) - (\text{number of acks on edge } e)$
For any node, $C$ = total deficit along incoming edges and $D$ = total deficit along outgoing edges
For the initiator, by definition, $C = 0$
Dijkstra and Scholten used the following two invariants to develop their algorithm:
Invariant 1. $(C \ge 0) \wedge (D \ge 0)$
Invariant 2. $(C > 0) \vee (D = 0)$

[Figure: nodes 0 to 5]
39
Dijkstra-Scholten algorithm

The invariants must hold when an intermediate node sends an ack. Sending an ack lowers $C$ by 1, so acks will be sent when
$(C - 1 \ge 0) \wedge (C - 1 > 0 \vee D = 0)$ (follows from INV1 and INV2)
$\equiv (C > 1) \vee (C = 1 \wedge D = 0)$

[Figure: nodes 0 to 5]
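The last simplification can be checked mechanically for small values. The Java sketch below (hypothetical names, arbitrary bound) compares the ack guard obtained by substituting C − 1 into INV1 and INV2 with the simplified guard over 0 ≤ C, D ≤ bound; it illustrates the algebra rather than proving it.

```java
// Sketch: brute-force comparison of the two forms of the ack guard.
final class AckGuardCheck {
    // INV1 and INV2 with C replaced by C - 1 (an ack lowers the incoming deficit).
    static boolean invariantsAfterAck(int c, int d) {
        return (c - 1 >= 0) && (d >= 0) && ((c - 1 > 0) || (d == 0));
    }

    // The simplified guard: (C > 1) or (C = 1 and D = 0).
    static boolean simplified(int c, int d) {
        return (c > 1) || (c == 1 && d == 0);
    }

    static boolean agreeUpTo(int bound) {
        for (int c = 0; c <= bound; c++)
            for (int d = 0; d <= bound; d++)
                if (invariantsAfterAck(c, d) != simplified(c, d)) return false;
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeUpTo(10)); // true
    }
}
```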
40
Dijkstra-Scholten algorithm

program detect {for an internal node i}
initially C = 0, D = 0, parent = i
do
- m = signal ∧ (C = 0) →
  C := 1; state := active; parent := sender
  {this node can send out messages to engage other nodes (each signal sent increments D), or turn passive}
- m = ack → D := D − 1
- (C = 1 ∧ D = 0) ∧ state = passive →
  send ack to parent; C := 0; parent := i
- m = signal ∧ (C = 1) →
  send ack to the sender
od

[Figure: nodes 0 to 5]
Note that the engaged nodes induce a spanning tree
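Program detect can be exercised in a sequential simulation. The Java sketch below assumes a random workload in which active nodes send a bounded number of signals and may turn passive at any time, and that receiving any signal makes a node active; the class and field names are hypothetical. Node 0 plays the initiator, whose C stays 0. The run method checks the property the algorithm is meant to guarantee: when the initiator is passive with D = 0, every node is passive and no signal is in transit.

```java
import java.util.ArrayDeque;
import java.util.Random;

// Sketch: sequential simulation of Dijkstra-Scholten termination detection.
final class DijkstraScholtenSim {
    record Msg(boolean isSignal, int from, int to) {}

    final int n;
    final int[] c, d, parent;
    final boolean[] active;
    final ArrayDeque<Msg> inTransit = new ArrayDeque<>();
    int signalsLeft;                                   // bounds the workload

    DijkstraScholtenSim(int n, int workload) {
        this.n = n;
        c = new int[n];
        d = new int[n];
        parent = new int[n];
        active = new boolean[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        active[0] = true;                              // the initiator starts the computation
        signalsLeft = workload;
    }

    void send(boolean isSignal, int from, int to) {
        if (isSignal) d[from]++;                       // each signal sent raises D
        inTransit.addLast(new Msg(isSignal, from, to));
    }

    void deliver(Msg m) {
        int i = m.to();
        if (!m.isSignal()) { d[i]--; return; }         // m = ack -> D := D - 1
        active[i] = true;
        if (i != 0 && c[i] == 0) { c[i] = 1; parent[i] = m.from(); } // join the tree
        else send(false, i, m.from());                 // already engaged: ack at once
    }

    void step(Random rnd) {
        int i = rnd.nextInt(n), choice = rnd.nextInt(3);
        if (choice == 0 && active[i] && signalsLeft > 0) {
            int k = rnd.nextInt(n);
            if (k != i) { signalsLeft--; send(true, i, k); } // engage node k
        } else if (choice == 1 && active[i]) {
            active[i] = false;                         // turn passive
        } else if (!inTransit.isEmpty()) {
            deliver(inTransit.removeFirst());
        }
        for (int j = 1; j < n; j++)                    // (C = 1 and D = 0) and passive
            if (c[j] == 1 && d[j] == 0 && !active[j]) {
                send(false, j, parent[j]);             // ack the parent, leave the tree
                c[j] = 0;
                parent[j] = j;
            }
    }

    // Runs until the initiator is passive with D = 0, then checks that every node
    // is passive and that no signal is still in transit.
    static boolean run(int n, int workload, long seed) {
        DijkstraScholtenSim s = new DijkstraScholtenSim(n, workload);
        Random rnd = new Random(seed);
        while (s.active[0] || s.d[0] != 0) s.step(rnd);
        for (boolean a : s.active) if (a) return false;
        for (Msg m : s.inTransit) if (m.isSignal()) return false;
        return true;
    }

    public static void main(String[] args) {
        System.out.println(run(5, 30, 1L)); // true
    }
}
```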
41
Distributed deadlock

Assume each process owns a few resources, and review how resources are allocated.
Why do deadlocks occur?
- Exclusive (i.e., not shared) resources
- Non-preemptive scheduling
- Circular waiting by all or a subset of processes

42
Distributed deadlock

Three aspects of deadlock
deadlock detection
deadlock prevention
deadlock recovery

43
Distributed deadlock

May occur due to bad designs or bad strategies
Sometimes prevention is more expensive than detection and recovery, so designs may not care about deadlocks, particularly if they are rare.
May also be caused by failures or perturbations in the system

44
Wait-for Graph (WFG)

Represents who waits for whom.
No single process can see the WFG.
Review how the WFG is formed.

45
Another classification

Resource deadlock
R1 AND R2 AND R3
also known as AND deadlock
Communication deadlock
R1 OR R2 OR R3
also known as OR deadlock

46
Detection of resource deadlock

Notations
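$w(j) = \mathrm{true} \Leftrightarrow$ ($j$ is waiting)
$\mathrm{depend}[j, i] = \mathrm{true} \Leftrightarrow j \in \mathrm{succ}^n(i)$ for some $n > 0$
P(i, s, k) is a probe (i = initiator, s = sender, k = receiver)

[Figure: wait-for graph on processes 1 to 4; initiator 4 sends probe P(4,4,3) to process 3]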
initiator
47
Detection of resource deadlock

Program for process k
do P(i,s,k) received ∧ w[k] ∧ (k ≠ i) ∧ ¬depend[k, i] →
     send P(i,k,j) to each successor j; depend[k, i] := true
[] P(i,s,k) received ∧ w[k] ∧ (k = i) → process k is deadlocked
od
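A sequential Java sketch of the program above, assuming a static WFG given as successor lists and a single probe queue standing in for the network; the names are hypothetical. The depend flag ensures that process k forwards the probes of a given initiator only once, so each edge carries at most one probe per initiator.

```java
import java.util.ArrayDeque;

// Sketch: Chandy-Misra-Haas edge chasing (AND model), run sequentially.
// succ[k] lists the processes k waits for; waiting[k] is w[k].
final class EdgeChasing {
    record Probe(int initiator, int sender, int receiver) {}

    static boolean detectDeadlock(int[][] succ, boolean[] waiting, int i) {
        int n = succ.length;
        if (!waiting[i]) return false;
        boolean[][] depend = new boolean[n][n];   // depend[k][i]: k reachable from i
        ArrayDeque<Probe> probes = new ArrayDeque<>();
        for (int j : succ[i]) probes.add(new Probe(i, i, j));
        while (!probes.isEmpty()) {
            Probe p = probes.poll();
            int k = p.receiver();
            if (!waiting[k]) continue;            // a process that is not waiting drops it
            if (k == i) return true;              // probe returned: i is deadlocked
            if (!depend[k][i]) {
                depend[k][i] = true;
                for (int j : succ[k]) probes.add(new Probe(i, k, j));
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[][] succ = {{1}, {2}, {3}, {1}};      // 0 waits for 1; 1 -> 2 -> 3 -> 1 is a cycle
        boolean[] waiting = {true, true, true, true};
        System.out.println(detectDeadlock(succ, waiting, 1)); // true: 1 is on the cycle
        System.out.println(detectDeadlock(succ, waiting, 0)); // false: 0 is blocked but not on a cycle
    }
}
```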

48
Observations

To detect deadlock, the initiator must be in a cycle
Message complexity: O(|E|), where E is the set of edges (edge-chasing algorithm)

Should the links be FIFO?
49
Communication deadlock
This has a resource deadlock but no
communication deadlock
50
Detection of communication deadlock

A process ignores a probe if it is not waiting for any process. Otherwise:
- first probe → mark the sender as parent, and forward the probe to successors
- not the first probe → send ack to that sender
- ack received from every successor → send ack to the parent
Communication deadlock is detected if the initiator receives acks from every successor.

Has many similarities with Dijkstra-Scholten's termination detection algorithm
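A sequential Java sketch of the probe/ack diffusion above, under similar assumptions: a static WFG given as successor lists, one queue standing in for the network, hypothetical names, and every waiting process having at least one successor. The usage builds a hypothetical four-process cycle, then adds a non-waiting process 5 with an edge (4,5).

```java
import java.util.ArrayDeque;

// Sketch: communication (OR) deadlock detection by probe/ack diffusion.
final class OrDeadlock {
    record Msg(boolean isProbe, int from, int to) {}

    static boolean detect(int[][] succ, boolean[] waiting, int init) {
        int n = succ.length;
        if (!waiting[init]) return false;
        int[] parent = new int[n];
        int[] pendingAcks = new int[n];
        boolean[] engaged = new boolean[n];
        ArrayDeque<Msg> q = new ArrayDeque<>();
        engaged[init] = true;
        pendingAcks[init] = succ[init].length;
        for (int j : succ[init]) q.add(new Msg(true, init, j));
        while (!q.isEmpty()) {
            Msg m = q.poll();
            int k = m.to();
            if (m.isProbe()) {
                if (!waiting[k]) continue;                // not waiting: ignore the probe
                if (!engaged[k]) {                        // first probe
                    engaged[k] = true;
                    parent[k] = m.from();                 // mark the sender as parent
                    pendingAcks[k] = succ[k].length;
                    for (int j : succ[k]) q.add(new Msg(true, k, j));
                } else {
                    q.add(new Msg(false, k, m.from()));   // not the first probe: ack at once
                }
            } else if (--pendingAcks[k] == 0) {           // ack from every successor
                if (k == init) return true;               // communication deadlock
                q.add(new Msg(false, k, parent[k]));
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Index 0 unused; 1 -> 2 -> 3 -> 4 -> 1 all waiting; 5 is not waiting.
        boolean[] waiting = {false, true, true, true, true, false};
        int[][] cycle = {{}, {2}, {3}, {4}, {1}, {}};
        int[][] withEdge45 = {{}, {2}, {3}, {4}, {1, 5}, {}};
        System.out.println(detect(cycle, waiting, 1));      // true
        System.out.println(detect(withEdge45, waiting, 1)); // false: 4 may be released by 5
    }
}
```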
51
Distributed deadlock

May occur due to faulty design or resource sharing problems
Sometimes prevention is more expensive than detection and recovery, so certain designs deliberately do not care about deadlocks, particularly if they are rare.
Sometimes failures or perturbations can modify the system state and cause deadlock.

Major issues
detection
prevention
recovery
55
Detection of resource deadlock
Chandy-Misra-Haas algorithm

Program for process k
do P(i,s,k) received ∧ w[k] ∧ (k ≠ i) ∧ ¬depend[k, i] →
     send P(i,k,j) to each successor j; depend[k, i] := true
[] P(i,s,k) received ∧ w[k] ∧ (k = i) → process k is deadlocked
od

57
Communication deadlock
[Figure: wait-for graph with black nodes and edges, plus node 5 and a red edge (4,5)]
The subgraph of the WFG consisting of black nodes and black edges has a resource deadlock as well as a communication deadlock. However, if we add node 5 and the red edge (4,5), the communication deadlock disappears.
58
Detection of communication deadlock