Distributed Snapshots
- Non-blocking checkpoint coordination protocol
Uncoordinated Checkpointing
- Processes take checkpoints independently
- Domino effect! Rolling back one process can force others to roll back, possibly all the way to their initial states
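The domino effect can be made concrete with a minimal Python sketch; it is an illustration, not part of the original slides. It assumes a hypothetical representation: each process keeps a sorted list of checkpoint times (time 0 is its initial state, always present), a checkpoint taken at time c records every local event before c, and each message is a tuple (sender, send_time, receiver, recv_time). Starting from the latest checkpoints, the hypothetical function `recovery_line` rolls back any receiver that holds an orphan message (recorded as received, but its send was rolled back) until no orphan remains.

```python
def recovery_line(checkpoints, messages):
    """Roll back from the latest checkpoints until no orphan message remains.

    checkpoints: dict process -> sorted checkpoint times (0 = initial state)
    messages: list of (sender, send_time, receiver, recv_time); event times
              never coincide with checkpoint times (assumption)
    A checkpoint taken at time c records every local event before c.
    """
    line = {p: times[-1] for p, times in checkpoints.items()}
    changed = True
    while changed:
        changed = False
        for sender, sent, receiver, received in messages:
            # Orphan: the receipt is in the receiver's checkpoint,
            # but the send is not in the sender's checkpoint.
            if received < line[receiver] and sent > line[sender]:
                # Roll the receiver back to its latest checkpoint before the receipt.
                line[receiver] = max(t for t in checkpoints[receiver] if t < received)
                changed = True
    return line


# Zigzag messages between independently taken checkpoints.
checkpoints = {"p": [0, 10, 20], "q": [0, 15, 25]}
messages = [("p", 22, "q", 24), ("q", 17, "p", 19), ("p", 12, "q", 14), ("q", 5, "p", 8)]
print(recovery_line(checkpoints, messages))  # {'p': 0, 'q': 0}
```

Each rollback turns another message into an orphan, so both processes end up back at their initial states even though each took two checkpoints.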
Coordinated Blocking Checkpointing
- Processes are coordinated to form a consistent global state, and all of them block while the checkpoint is taken
[Figure: the initiator and processes p1, p2, p3 exchanging "okay, channels flushed", "Ready!" and "Go!" messages]
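The blocking exchange in the figure can be sketched in Python as below. This is a simplification under stated assumptions: channels are FIFO queues (`collections.deque`) keyed by (sender, receiver), `deliver` is a hypothetical function that applies a message to a local state, and every process has already stopped sending when the function runs. Because all channels are drained before anyone records, the recorded local states alone form the global state, and no channel state needs to be saved.

```python
from collections import deque


def blocking_checkpoint(states, channels, deliver):
    """Sketch of a coordinated blocking checkpoint (hypothetical helper).

    states: dict process -> local state
    channels: dict (src, dst) -> deque of in-flight messages (FIFO)
    deliver: function (state, message) -> new state
    Assumes all processes have stopped sending application messages.
    """
    # "okay, channels flushed": deliver everything still in flight.
    for (src, dst), chan in channels.items():
        while chan:
            states[dst] = deliver(states[dst], chan.popleft())
    # "Ready!": channels are empty, so local states form a consistent global state.
    snapshot = dict(states)
    # "Go!": the initiator releases the processes; execution resumes after this returns.
    return snapshot


# p has already sent 10 to q; the transfer is still in flight.
states = {"p": 100, "q": 50}
channels = {("p", "q"): deque([10]), ("q", "p"): deque()}
print(blocking_checkpoint(states, channels, lambda s, m: s + m))  # {'p': 100, 'q': 60}
```

The price is visible in the structure: nothing else runs until every channel is drained, which is the latency disadvantage listed for this approach.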
Coordinated Blocking Checkpointing (cont)
- Advantages
  - Always consistent
  - No domino effect
  - Less storage overhead (only the latest checkpoint of each process needs to be kept)
- Disadvantage
  - Large checkpoint latency: processes stay blocked until the checkpoint completes
Coordinated Non-blocking Checkpointing
- Processes are coordinated, but do we really need to block?
- No! K. Mani Chandy and Leslie Lamport showed that blocking is unnecessary
Global-State Recording Algorithm
From "Distributed Snapshots: Determining Global States of Distributed Systems", K. Mani Chandy and Leslie Lamport
- Step 1: process states
- Step 2: channel states
- Step 3: end of the algorithm
Model of a Distributed System
- Processes
- Channels: directed, FIFO, error-free
Step 1: process states
- Initiator
  - Save its local state
  - Send marker tokens on all outgoing edges
- All other processes
  - On receiving the first marker on any incoming edge:
    - Save state, and propagate markers on all outgoing edges
    - Resume execution
  - Further markers are discarded
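Step 1 can be simulated with a short Python sketch. The names `record_process_states` and `MARKER` are hypothetical; channels are FIFO `deque`s keyed by (sender, receiver), local states are numbers (for example, account balances), and ordinary messages are amounts added to the receiver's state. For simplicity, no new application messages are sent during the run; only messages already in flight are delivered.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical marker token


def record_process_states(states, channels, initiator):
    """Step 1 only: propagate markers and record each process's local state.

    states: dict process -> local state (a number in this sketch)
    channels: dict (src, dst) -> deque of in-flight messages (FIFO)
    """
    recorded = {}

    def save_and_flood(p):
        recorded[p] = states[p]                   # save local state
        for (src, dst), chan in channels.items():
            if src == p:
                chan.append(MARKER)               # marker on every outgoing edge

    save_and_flood(initiator)
    progress = True
    while progress:
        progress = False
        for (src, dst), chan in channels.items():
            if not chan:
                continue
            msg = chan.popleft()                  # FIFO delivery
            progress = True
            if msg == MARKER:
                if dst not in recorded:           # first marker: save and propagate
                    save_and_flood(dst)
                # any further marker is simply discarded
            else:
                states[dst] += msg                # ordinary message: apply it
    return recorded


states = {"p": 100, "q": 50}
channels = {("p", "q"): deque(), ("q", "p"): deque([5])}  # q's 5 is in flight to p
print(record_process_states(states, channels, "p"))  # {'p': 100, 'q': 50}
```

The recorded states sum to 150, while the processes actually hold 155 after the run: the 5 that q sent before its checkpoint reaches p after p's checkpoint, so it appears in neither recorded state. Process states alone miss in-flight messages.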
Example
[Figure: marker (x) propagation among processes p, q and r]
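Proof
Assume that a message m from p to q exists that makes our cut inconsistent: p sends m after saving its state, but q receives m before saving its state.
[Figure: message m sent from p to q across the cut]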
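Proof (cont)
Right after saving its state, p sends a marker x1 to q, so x1 is sent before m. Because the channel from p to q is FIFO, x1 reaches q before m.
(1) x1 is the first marker for process q: q saves its state on receiving x1, before m arrives.
(2) x1 is not the first marker for process q: q has already saved its state on an earlier marker x2, also before m arrives.
In both cases q saves its state before receiving m, which contradicts the assumption.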
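The inconsistency the proof rules out can be stated as a small check. In this Python sketch (a hypothetical representation, not from the slides), a cut maps each process to the local time at which it saved its state, and each message is (sender, send_time, receiver, recv_time). A cut is inconsistent exactly when some message is received before the receiver saves its state but sent after the sender saves its state.

```python
def is_consistent(cut, messages):
    """Return True if no message is received inside the cut but sent outside it.

    cut: dict process -> local time at which the process saved its state
    messages: list of (sender, send_time, receiver, recv_time)
    """
    return not any(
        sent > cut[sender] and received < cut[receiver]
        for sender, sent, receiver, received in messages
    )


m = [("p", 5, "q", 3)]                    # p sends at local time 5, q receives at 3
print(is_consistent({"p": 6, "q": 4}, m))  # True: both send and receipt are recorded
print(is_consistent({"p": 4, "q": 4}, m))  # False: receipt recorded, send not
```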
Step 2: channel states
In-flight messages on a channel from p to q are the messages
- Sent along the channel before the sender's checkpoint, and
- Received along the channel after the receiver's checkpoint
Example
(1) p is receiving messages
(2) p has just saved its state
[Figure, two panels: processes p, q, r, s, t and u; numbered messages 1-8 in transit; markers shown as x]
Example (cont)
p's checkpoint is triggered by a marker from q
[Figure: processes p, q, r, s, t and u; numbered messages 1-8; markers shown as x]
Algorithm (revised)
- Initiator
  - Save its local state
  - Send marker tokens on all outgoing edges
  - Save incoming messages on each channel until a marker arrives through that channel
- All other processes
  - On receiving the first marker on any incoming edge:
    - Save state, and propagate markers on all outgoing edges
    - Record the channel that delivered this first marker as empty
    - Resume execution, but also save incoming messages on each other channel until a marker arrives through that channel
- Guarantees a consistent global state!
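The revised algorithm can be simulated by extending the Step 1 sketch with channel recording. As before, this is a hypothetical Python sketch: channels are FIFO `deque`s keyed by (sender, receiver), states are numbers, ordinary messages are amounts added to the receiver, only messages already in flight are delivered (no new application sends), and every process is assumed reachable from the initiator.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical marker token


def chandy_lamport(states, channels, initiator):
    """Record process states (Step 1) and channel states (Step 2)."""
    recorded = {}       # process -> saved local state
    recording = {}      # channel -> messages saved so far (still recording)
    channel_state = {}  # channel -> final list of in-flight messages

    def save_and_flood(p, first_marker_channel=None):
        recorded[p] = states[p]
        for ch in channels:
            if ch[1] == p:                       # incoming channel of p
                if ch == first_marker_channel:
                    channel_state[ch] = []       # first marker's channel: empty
                else:
                    recording[ch] = []           # start saving incoming messages
            if ch[0] == p:                       # outgoing channel of p
                channels[ch].append(MARKER)

    save_and_flood(initiator)
    progress = True
    while progress:
        progress = False
        for ch, chan in channels.items():
            if not chan:
                continue
            msg = chan.popleft()                 # FIFO delivery
            progress = True
            dst = ch[1]
            if msg == MARKER:
                if dst not in recorded:
                    save_and_flood(dst, first_marker_channel=ch)
                else:
                    channel_state[ch] = recording.pop(ch)  # stop recording ch
            else:
                states[dst] += msg
                if ch in recording:              # after dst's checkpoint, before ch's marker
                    recording[ch].append(msg)
    return recorded, channel_state


states = {"p": 100, "q": 50}
channels = {("p", "q"): deque(), ("q", "p"): deque([5])}
rec, chans = chandy_lamport(states, channels, "p")
print(rec, chans)  # {'p': 100, 'q': 50} {('p', 'q'): [], ('q', 'p'): [5]}
```

In the same scenario where Step 1 alone lost 5, the in-flight message is now captured in the channel state of q to p, so the recorded states plus channel states again sum to 155.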
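Step 3: end of the algorithm
- Locally, a process is done once a marker has arrived on every incoming channel
- How does the initiator know that every process has saved its state and in-flight messages?
  - A direct channel from every process to the initiator?
  - A spanning tree?
  - A general solution?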
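Two pieces of Step 3 can be sketched in Python: the local termination test and one possible answer to the "spanning tree?" option, a convergecast in which each process reports its own snapshot together with its subtree's reports toward the initiator at the root. Both functions, `locally_done` and `collect`, are hypothetical illustrations; the recursion stands in for report messages sent from children to parents.

```python
def locally_done(process, channels, markers_received):
    """A process has finished once a marker has arrived on every incoming channel.

    channels: iterable of (src, dst) pairs
    markers_received: set of channels on which this process has seen a marker
    """
    incoming = {ch for ch in channels if ch[1] == process}
    return incoming <= markers_received


def collect(tree, local, root):
    """Convergecast over a spanning tree rooted at the initiator.

    tree: dict process -> list of children
    local: dict process -> that process's recorded state and channel states
    """
    report = {root: local[root]}
    for child in tree.get(root, []):
        report.update(collect(tree, local, child))  # child's whole subtree
    return report


channels = [("p", "q"), ("r", "q"), ("q", "p")]
print(locally_done("q", channels, {("p", "q")}))              # False
print(locally_done("q", channels, {("p", "q"), ("r", "q")}))  # True
```

A convergecast only needs the tree edges, but it assumes such a tree has been built in advance; a direct channel to the initiator avoids the tree but is not available in every topology.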
References
- K. Mani Chandy and Leslie Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems"