Title: Ch13 Checkpointing and Recovery
1Ch13Checkpointing and Recovery
2Outline
Introduction: What? Why? Where?
Problems in rollback
Incarnation numbers
Taxonomy of solution techniques
Uncoordinated checkpointing
Coordinated checkpointing
Synchronous logging
Asynchronous logging
Adaptive logging
3Checkpointing and Recovery
Introduction
During a computation, a node might fail and later be repaired.
After a failed processor has been repaired, how do we take the system to a consistent global state?
If every processor periodically
1) records its local state on stable storage, and
2) records the messages it receives on stable storage,
then one can take the system to a consistent global state by rolling it back to a previously recorded global state.
Terminology
checkpointing: recording the state on stable storage
logging received messages: recording received messages on stable storage
4Checkpointing and Recovery
Recovery line
A set C of local checkpoints forms a consistent state (also called a recovery line) if the following conditions are satisfied:
1) there are no lost messages in C
2) there are no orphan messages in C
3) C contains exactly one checkpoint for each processor
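The three conditions can be checked mechanically. The sketch below is only an illustration under stated assumptions: it uses the usual meaning of the two terms (a lost message is one whose send is recorded in C but whose receipt is not; an orphan message is one whose receipt is recorded but whose send is not), and it models a checkpoint as a local event index. The names Message, classify and is_recovery_line are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    name: str
    sender: str
    receiver: str
    send_time: int  # local event index of the send at the sender (assumed model)
    recv_time: int  # local event index of the receipt at the receiver


def classify(cut, messages):
    """Return (lost, orphan) message names for a cut {processor: checkpoint index}.

    An event is recorded in the cut if it happened before the checkpoint.
    """
    lost, orphan = [], []
    for m in messages:
        sent = m.send_time < cut[m.sender]
        received = m.recv_time < cut[m.receiver]
        if sent and not received:
            lost.append(m.name)      # send recorded, receipt not recorded
        elif received and not sent:
            orphan.append(m.name)    # receipt recorded, send not recorded
    return lost, orphan


def is_recovery_line(cut, processors, messages):
    """Check conditions 1) to 3) for the cut."""
    # Condition 3: exactly one checkpoint per processor (dict keys are unique).
    if set(cut) != set(processors):
        return False
    lost, orphan = classify(cut, messages)
    return not lost and not orphan
```

With two messages crossing in opposite directions, a cut can contain both a lost and an orphan message at once; moving one checkpoint later can remove both.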
5Checkpointing and Recovery
Problems in rollback
The goal of rollback is to take the system back to a consistent state.
Some precautions have to be taken for this to work properly.
For simplicity, we do not consider channel state for the rollback.
To see the problem, assume:
1) processors checkpoint from time to time
2) checkpoints are established independently, without any coordination between processors
6Checkpointing and Recovery
Problems in rollback
To see the problem, assume:
1) processors checkpoint periodically
2) checkpoints are established independently, without any coordination between processors
[Figure: processors p1, p2, p3 with checkpoints c1, c2, c3 and messages m1, m2, m3]
The global state formed by c1, c2, c3 is inconsistent. It contains:
lost messages: m2, m3
orphan messages: m1
7Checkpointing and Recovery
Problems in rollback: cascading rollbacks
[Figure: processors p, q, r with checkpoints p1..p3, q1..q4, r1..r4 and messages m1..m5]
Rolling p back to p3 requires, because of message m1, that r rolls back to r4, and so on.
{p2, q3, r3} is a recovery line.
A rollback by one processor can cause an avalanche of rollbacks.
How can we avoid this?
8Checkpointing and Recovery
Problems in rollback: I/O stuttering
[Figure: processors p, q, r; p performs an I/O event after its checkpoint pi]
Rolling processor p back to pi requires that the I/O event be re-executed: this is I/O stuttering.
How can we avoid this?
Log inputs: avoids input stuttering
Output commit: avoids output stuttering
9Checkpointing and Recovery
Problems in rollback: message duplication
[Figure, two panels: "Rollback(p)", where p sent m after checkpoint pi and q has already received it, r(m); "After p recovers", where p sends m again and q receives it a second time]
Processor p rolls back to pi. There is no need for q to roll back.
After recovery, processor p sends m again.
Processor q should recognize that message m is a duplicate.
10Checkpointing and Recovery
Incarnation numbers: handling duplicate messages
Every processor
1) maintains an incarnation number on stable storage
2) stores a guess of the incarnation number of every other processor
On every recovery from failure or rollback, the incarnation number is incremented.
Each message carries the incarnation number of the sender.
11Checkpointing and Recovery
Incarnation numbers: handling duplicate messages
The evolution of a processor is organized into periods. Incarnation numbers serve to identify these periods.
[Figure: timeline of a processor split into periods 0, 1, 2; each recovery from failure or rollback starts a new period]
When processor p receives a message m from processor q, processor p behaves as follows (incarnation_q is p's guess of q's incarnation number):
if m.incarnation < incarnation_q: message m is a duplicate, discard it
if m.incarnation = incarnation_q: deliver m
if m.incarnation > incarnation_q: m belongs to an incarnation that p does not know yet, so block the delivery of m until m.incarnation = incarnation_q
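The three-way rule can be sketched as a small receiver. This is a minimal illustration, not a prescribed implementation: the class Receiver, its fields, and the method learn_incarnation are hypothetical names, and how p learns of a newer incarnation of q is not specified on the slides, so it is modeled here as an explicit call.

```python
class Receiver:
    """Processor p's handling of incoming messages, keyed by sender incarnation."""

    def __init__(self):
        self.known = {}      # sender -> p's guess of the sender's incarnation number
        self.blocked = []    # messages held back until the guess catches up
        self.delivered = []  # payloads handed to the application, in delivery order

    def receive(self, sender, incarnation, payload):
        guess = self.known.get(sender, 0)  # assumption: incarnations start at 0
        if incarnation < guess:
            return "discard"               # duplicate from an older period
        if incarnation == guess:
            self.delivered.append(payload)
            return "deliver"
        # Message from an incarnation p does not know yet: hold it back.
        self.blocked.append((sender, incarnation, payload))
        return "block"

    def learn_incarnation(self, sender, incarnation):
        """Update the guess and re-examine every blocked message (assumed mechanism)."""
        self.known[sender] = incarnation
        pending, self.blocked = self.blocked, []
        for s, inc, payload in pending:
            self.receive(s, inc, payload)
```

Once the guess is raised to 1, a blocked message stamped with incarnation 1 is delivered, and any later copy stamped with incarnation 0 is discarded as a duplicate.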
12Checkpointing and Recovery
Choices to be made to implement a recovery scheme
To log or not to log messages?
Log messages:
+ increases flexibility at recovery time
- expensive (space)
- processes must be deterministic (which is often not the case)
13Checkpointing and Recovery
Choices to be made to implement a recovery scheme
To coordinate or not to coordinate state recording?
Uncoordinated checkpoints: sufficient information (we'll see later) must be kept for rollback
+ keeps the cost of establishing checkpoints low
- the amount of rollback may be unbounded
Coordinated checkpoints: the set of checkpoints together forms a recovery line
+ limits the amount of rollback
- increases the cost of establishing checkpoints
14Checkpointing and Recovery
Uncoordinated checkpointing
Assumptions
1. Processors checkpoint asynchronously from time to time
2. No coordination between processors for the establishment of checkpoints
3. No log of messages
Goal: find a maximal recovery line (the latest recovery line), i.e. the one that happens after every other possible recovery line
15Checkpointing and Recovery
Uncoordinated checkpointing: checkpoint interval algorithm (progressive rollback)
Notation
C_{i,j}: the j-th checkpoint at processor p_i
I_{i,j}: the interval [C_{i,j}, C_{i,j+1}], i.e. the processing interval of p_i between C_{i,j} and C_{i,j+1}
Definition
I_{k,l} depends on I_{i,j} iff there is a message m sent in I_{i,j} and received in I_{k,l}
[Figure: p_i sends m between C_{i,j} and C_{i,j+1}; p_k receives m between C_{k,l} and C_{k,l+1}]
16Checkpointing and Recovery
Uncoordinated checkpointing: checkpoint interval algorithm (progressive rollback)
Idea of the algorithm
When a processor p_i fails and is then repaired:
1. Processor p_i initiates recovery by restoring its last checkpoint, say C_{i,j}
2. Every processor p_k in an interval I_{k,l} such that I_{k,l} depends on I_{i,j} rolls back (but to which checkpoint? We'll see later)
3. This process continues recursively (transitively) until a recovery line is determined
To support recovery, the information about interval dependence must be recorded (this is the "sufficient information"!)
17Checkpointing and Recovery
Uncoordinated checkpointing: interval dependence graph, to capture rollback requirements
G_I is a graph in which
V_I: the vertices are the checkpoint intervals that exist when recovery starts
E_I: the directed edges, such that
1) for every processor p_i, (I_{i,j}, I_{i,j+1}) is in E_I
2) if I_{k,l} depends on I_{i,j}, then (I_{i,j}, I_{k,l}) is added to E_I
[Figure: the two edge rules, drawn as I_{i,j} -> I_{i,j+1} in G_I and I_{i,j} -> I_{k,l} in G_I]
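The two edge rules translate directly into a graph construction. The following is a small sketch under assumptions of my own: intervals are represented as (processor, index) pairs with indices starting at 1, and each message is given as the pair of intervals in which it was sent and received. The function name build_interval_graph is hypothetical.

```python
def build_interval_graph(num_intervals, messages):
    """Build the interval dependence graph as an adjacency dict.

    num_intervals: {processor: number of checkpoint intervals of that processor}
    messages: list of ((i, j), (k, l)), meaning "sent in I_{i,j}, received in I_{k,l}"
    """
    edges = {}
    for proc, count in num_intervals.items():
        for j in range(1, count + 1):
            edges[(proc, j)] = set()
        # Rule 1: consecutive intervals of the same processor are linked.
        for j in range(1, count):
            edges[(proc, j)].add((proc, j + 1))
    # Rule 2: the receiving interval depends on the sending interval.
    for sent_in, received_in in messages:
        edges[sent_in].add(received_in)
    return edges
```

Every vertex gets an entry, even when it has no outgoing edge, so a later traversal can look up any interval without special cases.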
18Checkpointing and Recovery
Uncoordinated checkpointing: intuition behind the interval dependence graph
If processor p_i rolls back to C_{i,j} and I_{k,l} depends on I_{i,j}, then processor p_k must roll back to C_{k,l}. This is needed to avoid orphan messages.
[Figure: message m sent by p_i in I_{i,j} and received by p_k in I_{k,l}; p_k must roll back to C_{k,l} because of m]
19Checkpointing and Recovery
Uncoordinated checkpointing: interval dependence graph illustrated
[Figure, left panel "Message passing and checkpointing": processors p1, p2, p3 with intervals I_{1,1}..I_{1,4}, I_{2,1}..I_{2,3}, I_{3,1}..I_{3,4} and messages m1..m5]
[Figure, right panel "Interval dependence graph": vertices (1,1)..(1,4), (2,1)..(2,3), (3,1)..(3,4)]
20Checkpointing and Recovery
Uncoordinated checkpointing: the checkpoint interval algorithm (progressive rollback)
When a processor p_i fails and is then repaired, p_i performs:
Step 1. Compute G_I
Step 2. Mark the node of G_I corresponding to its last checkpoint interval. Let I_{i,j} be that node. Mark all the nodes of G_I that are reachable from I_{i,j}
Step 3. Define, for each processor k, the best checkpoint of k w.r.t. the recovery of p_i to be C_{k,l} such that l = min { j : I_{k,j} is marked }. Every processor rolls back to its best checkpoint
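Steps 2 and 3 amount to a reachability search followed by a per-processor minimum. The sketch below assumes the graph is an adjacency dict over (processor, index) pairs; the function name best_checkpoints and the example graph are made up for illustration and are not the graph from the slides. A processor with no marked interval is simply left out of the result, on the reading that it does not need to roll back.

```python
from collections import deque


def best_checkpoints(graph, failed_interval):
    """Return {processor: l}, where C_{processor,l} is its best checkpoint.

    graph: {(proc, idx): set of successor intervals}
    failed_interval: last checkpoint interval of the recovering processor
    """
    # Step 2: mark the failed interval and everything reachable from it (BFS).
    marked = {failed_interval}
    queue = deque([failed_interval])
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, ()):
            if succ not in marked:
                marked.add(succ)
                queue.append(succ)
    # Step 3: for each processor, take the smallest marked interval index.
    best = {}
    for proc, idx in marked:
        best[proc] = min(idx, best.get(proc, idx))
    return best


# Hypothetical example: a message from I_{2,2} to I_{1,2}, and one from I_{1,2} to I_{3,2}.
example_graph = {
    (1, 1): {(1, 2)}, (1, 2): {(1, 3), (3, 2)}, (1, 3): set(),
    (2, 1): {(2, 2)}, (2, 2): {(1, 2)},
    (3, 1): {(3, 2)}, (3, 2): set(),
}
```

In this example, recovery of processor 2 from its last interval (2, 2) drags processors 1 and 3 back to their second checkpoints, while recovery of processor 3 from (3, 2) affects nobody else.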
21Checkpointing and Recovery
Uncoordinated checkpointing: the algorithm illustrated
Assume that p2 fails and is then repaired.
Step 1. p2 computes G_I
[Figure "Interval dependence graph": vertices (1,1)..(1,4), (2,1)..(2,3), (3,1)..(3,4)]
22Checkpointing and Recovery
Uncoordinated checkpointing: the algorithm illustrated
Assume that p2 fails and is then repaired.
Step 2. p2 marks all the nodes of G_I reachable from its last checkpoint interval
Recall: for each processor k, the best checkpoint of k w.r.t. the recovery of p2 is C_{k,l} such that l = min { j : I_{k,j} is marked }
[Figure "Interval dependence graph": vertices (1,1)..(1,4), (2,1)..(2,3), (3,1)..(3,4), with the marked nodes highlighted]
23Checkpointing and Recovery
Uncoordinated checkpointing: the algorithm illustrated
Assume that p2 fails and is then repaired.
Step 3. Each processor rolls back to its best checkpoint w.r.t. the recovery of p2
Recall: for each processor k, the best checkpoint of k w.r.t. the recovery of p2 is C_{k,l} such that l = min { j : I_{k,j} is marked }
[Figure "The recovery line determined": processors p1, p2, p3 with intervals I_{1,1}..I_{1,4}, I_{2,1}..I_{2,3}, I_{3,1}..I_{3,4} and messages m1..m5]
24Checkpointing and Recovery
Uncoordinated checkpointing: some comments about the checkpoint interval algorithm
Rollback can take the system all the way back to the initial state.
The algorithm presented is a centralized algorithm: it can be implemented on a recovery manager that directs all the participants to restart, each from its best checkpoint.
For a distributed version, recovery control messages must be used to communicate parts of G_I.
25Checkpointing and Recovery
Coordinated checkpointing
Idea: processors coordinate the checkpointing of their local states to ensure that the checkpoints taken by the different processors form a recovery line. This avoids cascading rollbacks.
Method used: similar to that used for computing a global snapshot. However, there are some differences.
26Checkpointing and Recovery
Coordinated checkpointing: subtleties
1. Only processor states are recorded (saves space)
2. Failures during checkpointing are handled
3. Store the minimum number of checkpoints (saves space)
4. Lost messages are handled by the communication protocol (a consistent set of checkpoints may now contain lost messages)
5. No orphan messages in the computed set of checkpoints
27Checkpointing and Recovery
Coordinated checkpointing: subtleties (cont.)
6. Only a minimum number of processors must checkpoint
Idea: old checkpoints, together with new checkpoints of some processors, may form a consistent set of checkpoints
28Checkpointing and Recovery
Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
Uses a two-phase protocol to ensure that either all processors checkpoint or none do.
Two types of checkpoints are used for that:
tentative checkpoint: established while global state recording is ongoing
permanent checkpoint: if the recorded state is consistent, tentative checkpoints become permanent checkpoints
29Checkpointing and Recovery
Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
Basic idea, Phase 1
Initiator q:
1. q takes a tentative checkpoint
2. q requests all other processors to take tentative checkpoints
Non-initiator p, on receiving this request:
1. p establishes, or declines to establish, a tentative checkpoint
2. p sends its decision to the initiator
3. p waits for the final decision from q (i.e. it refrains from communicating with any other processor until the second phase is over)
30Checkpointing and Recovery
Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
Basic idea (cont.), Phase 2
Initiator q:
1. q collects the decisions from all other processors
2. If all other processors have taken tentative checkpoints, then q makes its tentative checkpoint permanent; else q undoes its tentative checkpoint
3. q requests all others to carry out the same final decision
Non-initiator p, on receiving this final decision:
processor p executes the order
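The all-or-nothing outcome of the two phases can be traced with a short failure-free simulation. This is a sketch of the basic idea only: it ignores timeouts, failures and real message passing, and the function name run_basic_checkpoint, the can_checkpoint map and the outcome labels are hypothetical.

```python
def run_basic_checkpoint(initiator, others, can_checkpoint):
    """Simulate one failure-free round of the basic two-phase idea.

    can_checkpoint: {processor: bool}, whether it establishes a tentative checkpoint.
    Returns {processor: "permanent" | "undone" | "none"}.
    """
    # Phase 1: the initiator checkpoints tentatively and collects decisions.
    tentative = {initiator}
    votes = {}
    for p in others:
        votes[p] = can_checkpoint[p]
        if votes[p]:
            tentative.add(p)

    # Phase 2: permanent only if every other processor took a tentative checkpoint.
    commit = all(votes.values())

    outcome = {}
    for p in [initiator, *others]:
        if commit:
            outcome[p] = "permanent"
        elif p in tentative:
            outcome[p] = "undone"   # tentative checkpoint is discarded
        else:
            outcome[p] = "none"     # nothing was taken, nothing to undo
    return outcome
```

A single refusal is enough to make every tentative checkpoint, including the initiator's, be undone.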
31Checkpointing and Recovery
Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
The basic idea ensures that there are no orphan messages. Why?
32Checkpointing and Recovery
Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
The basic idea ensures that there are no orphan messages. Why?
Answer: no communication is allowed until the second phase is over.
33Checkpointing and Recovery
Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
It is not necessary that all processors record their state during checkpointing. Why?
34Checkpointing and Recovery
Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
It is not necessary that all processors record their state during checkpointing. Why?
[Figure: processors p1, p2, p3 with old checkpoints C_{1,1}, C_{2,1}, C_{3,1} and new checkpoints C_{1,2}, C_{2,2}, C_{3,2}; the requests sent by p1 are drawn in red]
p1 initiates checkpointing by establishing C_{1,2}, then p1 contacts p2 and p3 by sending the red messages.
Assume that everything went fine and p2, p3 establish C_{2,2} and C_{3,2} respectively as new checkpoints.
{C_{1,2}, C_{2,2}, C_{3,2}} form a consistent set of checkpoints.
However, {C_{1,2}, C_{2,1}, C_{3,2}} also form a consistent set of checkpoints (i.e. no orphan messages).
Hence, processor p2 need not take a new checkpoint.
35Checkpointing and Recovery
Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
Ensuring a minimum number of checkpoints
Every processor assigns monotonically increasing sequence numbers to the messages it sends.
Each processor p uses:
p.last_rec[1..M]: an array of sequence numbers; p.last_rec[i] is the sequence number of the last message that p received from processor p_i since p's last checkpoint
p.first_sent[1..M]: an array of sequence numbers; p.first_sent[i] is the sequence number of the first message that p sent to processor p_i since p's last checkpoint
36Checkpointing and Recovery
Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
Ensuring a minimum number of checkpoints
When an initiator processor q requests a processor p to take a tentative checkpoint, q appends q.last_rec[p] to its request.
On receiving this request from q, processor p takes the tentative checkpoint only if p.first_sent[q] ≤ q.last_rec[p].
[Figure: p sends a message after "Last checkpoint of p"; q receives it between "Last checkpoint of q" and "Current checkpoint of q"]
p takes a new checkpoint only in this case, in order to avoid orphan messages.
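The test performed by p can be written as a one-line predicate. The sketch below rests on assumptions that are not on the slides: sequence numbers start at 1, and a hypothetical sentinel NO_MESSAGE = 0 stands for "no message sent (or received) since the last checkpoint". The comparison direction is my reading of the condition: if q's tentative checkpoint records the receipt of a message that p sent after p's last checkpoint, that message would be an orphan unless p also checkpoints.

```python
NO_MESSAGE = 0  # hypothetical sentinel: nothing sent/received since the last checkpoint


def must_checkpoint(p_first_sent_q, q_last_rec_p):
    """Decide whether p has to take a tentative checkpoint on q's request.

    p_first_sent_q: sequence number of the first message p sent to q since
                    p's last checkpoint (NO_MESSAGE if none).
    q_last_rec_p:   sequence number of the last message q received from p since
                    q's last checkpoint (NO_MESSAGE if none), carried in the request.
    """
    # q has recorded the receipt of at least one message that p's current
    # checkpoint does not record as sent: p must checkpoint to avoid an orphan.
    return NO_MESSAGE < p_first_sent_q <= q_last_rec_p
```

If p's first message since its checkpoint has a higher sequence number than anything q has received, q's checkpoint records none of p's uncheckpointed sends, and p can keep its old checkpoint.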
37Checkpointing and Recovery
Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
Ensuring a minimum number of checkpoints (cont.)
Only processors that have sent messages to the initiator processor q since q's last checkpoint need to consider establishing the new checkpoint requested by q.
Hence, an initiator processor q should send requests only to those processors p from which q has received a message since its last checkpoint.
[Figure: p sends a message that q receives between "Last checkpoint of q" and "Current checkpoint of q"]
38Checkpointing and Recovery
Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
Ensuring a minimum number of checkpoints (cont.)
Every processor q maintains q.checkpoint_cohort: a set that contains those processors from which q has received some messages since q's last checkpoint.
[Figure: p sends a message that q receives between "Last checkpoint of q" and "Current checkpoint of q"; such a p belongs to q.checkpoint_cohort]
39Checkpointing and Recovery
Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
The algorithm, Phase 1, initiator processor q
1. take tentative checkpoint
2. for every processor p in q.checkpoint_cohort do
     send (Request_tentative_chkp, q.last_rec[p]) to p
40Checkpointing and Recovery
Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
The algorithm, Phase 1, non-initiator processor p
On receiving (Request_tentative_chkp, q.last_rec[p]) from q:
  if (ready to perform tentative checkpoint) and (p.first_sent[q] ≤ q.last_rec[p]) then
    take tentative checkpoint
    for every processor r in p.checkpoint_cohort do
      send (Request_tentative_chkp, p.last_rec[r]) to r
    p.replies := empty
    for every processor r in p.checkpoint_cohort do
      wait until r sends OK or KO, or timeout T
      on OK: add r to p.replies   /* set of replies */
    if p.replies ≠ p.checkpoint_cohort then
      send KO to q
    else
      send OK to q
41Checkpointing and Recovery
Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
The algorithm, Phase 2, initiator processor q
1. q.replies := empty
2. for every processor p in q.checkpoint_cohort do
     wait until p sends OK or KO, or timeout T
     on OK: add p to q.replies   /* set of replies */
   if q.replies ≠ q.checkpoint_cohort then
     undo tentative checkpoint
     send "undo tentative checkpoint" to every processor in q.checkpoint_cohort
   else
     permanent := tentative
     send "make tentative checkpoint permanent" to every processor in q.checkpoint_cohort
42Checkpointing and Recovery
Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
The algorithm, Phase 2, non-initiator processor p
  wait until q sends "undo" or "make permanent", or timeout T
  on "undo" do
    undo tentative checkpoint
  end
  on "make permanent" do
    checkpoint := tentative_checkpoint
  end
  if no timeout then
    m := message received
    for every processor r in p.checkpoint_cohort do
      send m to r
43Checkpointing and Recovery
Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
Handling failures: idea
Failures are detected by timeouts.
On recovery:
if the recovering processor was the initiator, it undoes its tentative checkpoint and sends this decision to the other processors;
else the recovered processor consults the initiator or some other processor to find out the final decision.
44Checkpointing and Recovery
Logging
Idea: processors record incoming messages
Purpose:
avoid the need for resending
reduce the amount of rollback (idea of virtual checkpoint)
Log messages:
+ flexibility
- expensive
[Figure: virtual checkpoint]
45Checkpointing and Recovery
Synchronous Logging
Idea: each message must be logged before it can be delivered.
During recovery, logged messages are replayed until the recovering processor is up to date (this guarantees replay past all sends that could cause a subsequent rollback).
Problem: expensive
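The log-before-deliver discipline and the replay on recovery can be sketched as follows. This is an illustration under simplifying assumptions: a Python list stands in for stable storage, the message handler is a deterministic addition, and a checkpoint stores the state together with the log position. The class and method names are hypothetical.

```python
class SyncLoggedProcess:
    """A process that logs every message to 'stable storage' before delivering it."""

    def __init__(self):
        self.state = 0
        self.stable_log = []      # stands in for stable storage (assumption)
        self.checkpoint = (0, 0)  # (saved state, log length at checkpoint time)

    def take_checkpoint(self):
        self.checkpoint = (self.state, len(self.stable_log))

    def receive(self, value):
        self.stable_log.append(value)  # 1. log first ...
        self._deliver(value)           # 2. ... then deliver

    def _deliver(self, value):
        self.state += value            # deterministic handler (assumption)

    def crash_and_recover(self):
        # Restore the checkpoint, then replay everything logged after it.
        self.state, replay_from = self.checkpoint
        for value in self.stable_log[replay_from:]:
            self._deliver(value)
```

Because every delivered message is already in the log, replay always brings the process back to the state it had just before the crash.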
46Checkpointing and Recovery
Asynchronous Logging
Idea: each message must be logged, but not necessarily before it is delivered.
Messages can first be saved in main memory.
Exploit idle periods to log messages: several messages can be packed together and then logged simultaneously (efficient use of I/O devices).
Problem: some messages may be lost, so replay is not always possible.
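The trade-off can be seen in a small sketch in which delivery happens immediately and the log is written in batches. The assumptions are mine: a list stands in for stable storage, another list for the main-memory buffer, the handler is a deterministic addition, and recovery restarts from the initial state. The class and method names are hypothetical.

```python
class AsyncLoggedProcess:
    """A process that delivers messages at once and logs them later, in batches."""

    def __init__(self):
        self.state = 0
        self.stable_log = []       # stands in for stable storage (assumption)
        self.volatile_buffer = []  # main-memory buffer, lost on a crash

    def receive(self, value):
        self.volatile_buffer.append(value)  # saved in main memory only
        self.state += value                 # delivered without waiting for the log

    def flush(self):
        """Called during an idle period: write the whole batch in one operation."""
        self.stable_log.extend(self.volatile_buffer)
        self.volatile_buffer.clear()

    def crash_and_recover(self):
        """Replay what reached stable storage; return how many messages were lost."""
        lost = len(self.volatile_buffer)
        self.volatile_buffer.clear()
        self.state = 0
        for value in self.stable_log:
            self.state += value
        return lost
```

A crash between two flushes loses whatever is still in the buffer, which is exactly the case where replay cannot reconstruct the pre-crash state.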