Title: Rollback-Recovery
1 Rollback-Recovery
2 Fault-Tolerance: the Good Old Days
- Target
- life-critical applications
- Primary concern
- tolerate arbitrary failures
- Secondary concerns
- performance
- resources
- transparency
3 The times they are a-changin'
- Target
- non-life-critical applications
- Primary Concerns
- tolerate common failures with few dedicated resources
- negligible impact during failure-free executions
- fast recovery
- transparency
- Secondary Concerns
- tolerate arbitrary failures
4 Replica Coordination
- Agreement: Every non-faulty replica receives every request
- Order: Every non-faulty replica processes requests in the same relative order
5 Implementing Replica Coordination
- Clients use (Causal) Atomic Broadcast to disseminate their requests
- Clients forward requests to one of the replicas
- That replica initiates the Reliable Broadcast to the other replicas
What are the differences?
6 Primary-Backup: The Idea
- One replica (primary) executes all non-deterministic events
- Primary broadcasts to the other replicas (backups)
- requests from clients
- outcome of executing non-deterministic events at the primary
7 Definitions
- Failover time of a PB service: longest time during which some client does not know the identity of the primary
- Server outage at t: some correct client sends a request at time t to the service, but does not receive a response
- (k,Δ)-bofo server: service in which all server outages can be grouped into at most k intervals of time, each of length at most Δ
8 Primary-Backup: The Spec (Budhiraja, Marzullo, Schneider, Toueg)
Safety
- PB1: There exists a local predicate Prmy_s on the state of each server s. At any time, there is at most one server s whose state satisfies Prmy_s.
- PB2: Each client i maintains a server identity Dest_i such that, to make a request, client i sends a message to Dest_i.
- PB3: If a client request arrives at a server that is not the current primary, then that request is not enqueued (and therefore is not processed).
Liveness
- PB4: There exist fixed values k and Δ such that the service behaves like a single (k,Δ)-bofo server.
9 A simple protocol
- Assume
- point-to-point communication
- non-faulty channels
- upper bound δ on message delivery time
- at most one process crashes
- Primary p1
- Backup p2
- On receipt of a request, process p1
- processes the request and updates its state
- sends info about the update to p2 (state-update message)
- without waiting for an ack from p2, sends a response to the client
- In addition
- p1 sends a heartbeat message to p2 every τ seconds
- Process p2
- updates its state upon receiving a state update from p1
- if it doesn't receive a heartbeat for τ + δ seconds, p2 becomes primary
- informs clients
- begins processing subsequent requests from clients
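The protocol above can be sketched as a minimal single-machine simulation; class names, the key/value request shape, and the explicit `now` timestamps are illustrative, and real channels, heartbeat timers, and client notification are elided:

```python
import time

class Backup:
    """Backup p2: applies state updates, takes over on missed heartbeats."""
    def __init__(self, tau, delta):
        self.state = {}
        self.timeout = tau + delta           # no heartbeat for tau + delta => takeover
        self.last_heartbeat = time.monotonic()
        self.is_primary = False

    def on_state_update(self, key, value):
        self.state[key] = value              # mirror the primary's update

    def on_heartbeat(self, now=None):
        self.last_heartbeat = now if now is not None else time.monotonic()

    def check_takeover(self, now):
        if now - self.last_heartbeat > self.timeout:
            self.is_primary = True           # would also inform the clients
        return self.is_primary

class Primary:
    """Primary p1: processes requests and pushes state updates to the backup."""
    def __init__(self, backup):
        self.state = {}
        self.backup = backup

    def on_request(self, key, value):
        self.state[key] = value                  # process request, update state
        self.backup.on_state_update(key, value)  # state-update message to p2
        return "ok"                              # reply without waiting for an ack
```

Note that the primary replies before the backup acknowledges: this is what keeps failure-free overhead low, at the cost of possibly losing requests when p1 crashes.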
10 ...that meets the PB spec
- PB1: Prmy_p2 ≡ "p2 has not received a message from p1 for τ + δ seconds"
- Failover time ≤ τ + 2δ
11 ...indeed, it does!
- k = 1 (since at most one crash)
- Δ = longest interval during which a request elicits no response
- assume p1 crashes at time tc
- any client request sent to p1 at time tc − δ or later may be lost
- p2 may not learn about p1's crash until tc + τ + 2δ
- the client may not learn that p2 is the new primary for another δ
- PB2, PB3: follow immediately from the protocol
- PB4: find k, Δ to implement a (k,Δ)-bofo server
- Δ ≤ τ + 4δ
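The bound Δ ≤ τ + 4δ follows by adding up the intervals above; a one-function check of the arithmetic (the function name is ours):

```python
def bofo_bounds(tau, delta):
    """Length of the outage window of the simple protocol, relative to crash time tc."""
    outage_start = -delta             # a request sent at tc - delta may already be lost
    detect = tau + 2 * delta          # p2 learns of the crash by tc + tau + 2*delta
    outage_end = detect + delta       # clients learn of the new primary delta later
    return outage_end - outage_start  # = tau + 4*delta
```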
12 Active Replication vs. Primary-Backup
- Active Replication
- tolerates arbitrary failures
- masks failures
- consumes lots of resources
- Primary-Backup
- does not tolerate arbitrary failures
- if the primary fails, requests may be lost
- service can become unavailable while a leader election algorithm is run to determine the new primary
- consumes fewer resources
13 Some like it hot
- Hot backups: process information from the primary as soon as they receive it
- Cold backups: log information received from the primary, and process it only if the primary fails
- Rollback recovery implements cold backups cheaply
- primary logs directly to stable storage the information needed by backups
- if the primary crashes, a newly initialized process is given the content of the logs: backups are generated on demand
14 Uncoordinated Checkpointing
- Easy to understand
- No synchronization overhead
- Flexible
- can choose when to checkpoint
- To recover from a crash
- go back to last checkpoint
- restart
15 How to (not) take a checkpoint
- Block execution, save entire process state to stable storage
- very high overhead during failure-free execution
- lots of unnecessary data saved on stable storage
16 How to take a checkpoint
- Take checkpoints incrementally
- save only pages modified since last checkpoint
- use dirty bit to determine which pages to save
- Save only interesting parts of address space
- use application hints or compiler help to avoid saving useless data (e.g. dead variables)
- Do not block application execution during checkpointing
- copy-on-write
- precopying
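A toy sketch of dirty-bit incremental checkpointing; the dict-of-pages model and class name are illustrative, and a real implementation would rely on MMU dirty bits and copy-on-write:

```python
class IncrementalCheckpointer:
    """Only pages written since the last checkpoint are saved to stable storage."""
    def __init__(self):
        self.pages = {}        # volatile address space: page id -> contents
        self.dirty = set()     # pages modified since the last checkpoint
        self.stable = {}       # stand-in for stable storage

    def write(self, page, data):
        self.pages[page] = data
        self.dirty.add(page)   # the hardware dirty bit, simulated

    def checkpoint(self):
        saved = len(self.dirty)
        for page in self.dirty:             # save only the modified pages
            self.stable[page] = self.pages[page]
        self.dirty.clear()
        return saved           # how many pages actually went to stable storage
```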
17 The Domino Effect
18 How to Avoid the Domino Effect
- Coordinated Checkpointing
- No independence
- Synchronization Overhead
- Easy Garbage Collection
- Communication-Induced Checkpointing: detect dangerous communication patterns and checkpoint appropriately
- Less synchronization
- Less independence
- Complex
19 The Output Commit Problem
- Coordinated checkpoint for every output commit
- High overhead if frequent I/O with the external environment
20 Message Logging
- Can avoid domino effect
- Works with coordinated checkpoint
- Works with uncoordinated checkpoint
- Can reduce cost of output commit
- More difficult to implement
21 How Message Logging Works
[Figure: a recovery unit consisting of the application, its checkpoints, and a log of determinants]
- To tolerate crash failures
- periodically checkpoint the application state
- log on stable storage the determinants of non-deterministic events executed after the checkpointed state
- for message delivery events:
- #m = (m.dest, m.rsn, m.source, m.ssn)
- Recovery: restore the latest checkpointed state; replay the non-deterministic events according to their determinants
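A toy recovery unit illustrating the checkpoint-plus-determinants scheme above; it is single-process, the `resend` map stands in for senders regenerating their messages, and all names are ours:

```python
from collections import namedtuple

# determinant of a message delivery, with the fields from the slide
Determinant = namedtuple("Determinant", "dest rsn source ssn")

class RecoveryUnit:
    def __init__(self, name="p"):
        self.name = name
        self.rsn = 0
        self.delivered = []        # volatile application state: delivered payloads
        self.stable_log = []       # determinants on stable storage
        self.checkpointed = (0, [])

    def deliver(self, source, ssn, payload):
        self.rsn += 1
        self.stable_log.append(Determinant(self.name, self.rsn, source, ssn))
        self.delivered.append(payload)

    def take_checkpoint(self):
        self.checkpointed = (self.rsn, list(self.delivered))
        self.stable_log = []       # determinants before the checkpoint are obsolete

    def recover(self, resend):
        # restore the checkpointed state, then replay deliveries in rsn order
        log = sorted(self.stable_log, key=lambda d: d.rsn)
        self.rsn, self.delivered = self.checkpointed
        self.delivered = list(self.delivered)
        self.stable_log = []
        for d in log:
            self.deliver(d.source, d.ssn, resend[(d.source, d.ssn)])
```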
22 Pessimistic Logging
[Figure: processes p1, p2, p3 exchanging messages m1, m2, m3]
- Never creates orphans
- may incur blocking
- straightforward recovery
23 Case study 1: Sender-Based Logging (Johnson and Zwaenepoel, FTCS '87)
- The message log is maintained in volatile storage at the sender.
- A message m is logged in two steps:
- i) before sending m, the sender logs its content: m is partially logged
- ii) the receiver tells the sender the receive sequence number of m, and the sender adds this information to its log: m is fully logged.
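The two-step logging can be sketched like this; class names are ours, volatile logs are plain dicts, and the ack is modeled as a return value rather than a message:

```python
class SBLSender:
    """Keeps the message log in its own volatile memory."""
    def __init__(self, name):
        self.name = name
        self.ssn = 0
        self.log = {}                           # ssn -> [payload, rsn]; rsn None = partial

    def send(self, payload):
        self.ssn += 1
        self.log[self.ssn] = [payload, None]    # step (i): partially logged
        return (self.name, self.ssn, payload)

    def on_ack(self, ssn, rsn):
        self.log[ssn][1] = rsn                  # step (ii): fully logged

class SBLReceiver:
    def __init__(self):
        self.rsn = 0
        self.delivered = []

    def deliver(self, msg):
        source, ssn, payload = msg
        self.rsn += 1
        self.delivered.append(payload)
        return ssn, self.rsn                    # the ack carries the rsn back
```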
24 More on SBL
- Recovery: the recovering process collects the logs from the senders, and replays the messages in ascending rsn order
- Optimistic SBL may create orphans. Assume transient link failures
25 Optimistic Logging
- p2 sends m3 without first logging determinants.
- If p2 fails before logging the determinants of m1 and m2, p3 becomes an orphan.
[Figure: p2 delivers m1 and m2, then sends m3 to p3]
- Eliminates orphans during recovery
- non-blocking during failure-free executions
- rollback of correct processes
- complex recovery
26 Causal Logging
- No blocking in failure-free executions
- No orphans
- No additional messages
- Tolerates multiple concurrent failures
- Keeps determinants in volatile memory
- Localized output commit
27 Preliminary Definitions
Given a message m sent from m.source to m.dest:
- Depend(m): set of processes whose state depends on the delivery of m
- Log(m): set of processes with a copy of the determinant of m in their volatile memory
- p is an orphan of a set C of crashed processes if p ∉ C and, for some m, p ∈ Depend(m) and Log(m) ⊆ C
28 The No-Orphans Consistency Condition
- No orphans after crash C if: ∀m: Log(m) ⊆ C ⇒ Depend(m) ⊆ C
- No orphans after any C if: ∀m: ¬stable(m) ⇒ Depend(m) ⊆ Log(m)
- The Consistency Condition: ∀m: ¬stable(m) ⇒ Depend(m) ⊆ Log(m), where stable(m) holds once the determinant of m can no longer be lost
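The definitions above can be written as predicates over sets of process names; this is a sketch, with `stable_m` a boolean standing for stable(m) and both function names ours:

```python
def orphans(depend_m, log_m, crashed):
    """Processes outside C that depend on m while #m was held only inside C."""
    if log_m <= crashed:          # every volatile copy of #m was lost in the crash
        return depend_m - crashed
    return set()

def no_orphans_after_any_crash(stable_m, depend_m, log_m):
    """The consistency condition: not stable(m) => Depend(m) subset of Log(m)."""
    return stable_m or depend_m <= log_m
```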
29 Optimistic and Pessimistic
- No orphans after crash C if: ∀m: Log(m) ⊆ C ⇒ Depend(m) ⊆ C
- Optimistic weakens it to hold only eventually: orphans may be created, but are detected and rolled back during recovery
- No orphans after any crash if: ∀m: ¬stable(m) ⇒ Depend(m) ⊆ Log(m)
- Pessimistic strengthens it to: ∀m: ¬stable(m) ⇒ |Depend(m)| ≤ 1
30 Causal Message Logging
- No orphans after any crash of size at most f if: ∀m: ¬stable(m) ⇒ Depend(m) ⊆ Log(m), with stable(m) ≡ |Log(m)| > f
- Causal strengthens it to hold at all times during failure-free execution
31 An Example: Causal Logging
- If f = 1, stable(m) ≡ |Log(m)| ≥ 2
[Figure: p1, p2, p3 exchange m1–m5; m3 piggybacks ⟨#m1, #m2⟩ and m5 piggybacks ⟨#m3⟩]
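With at most f concurrent crashes, a determinant is effectively stable once more than f processes hold it in volatile memory; as predicates (names are ours):

```python
def stable(log_m, f):
    """With at most f crashes, #m cannot be lost once more than f processes hold it."""
    return len(log_m) > f

def safe_under_f_crashes(depend_m, log_m, f):
    # the causal-logging obligation: either #m is stable,
    # or everyone who depends on m also logs #m
    return stable(log_m, f) or depend_m <= log_m
```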
32 Recovery for f = 1
[Figure: the recovering process p, its parents above, and its children below]
- The parents of p resend the messages they previously sent to p, in ssn order
- The children of p return the determinants of the messages delivered by p, which give the rsn order: who is my next parent? what is the next message from each parent?
33 Family-Based Logging
- Each process p
- maintains in a volatile log Dp all the determinants #m such that p ∈ Log(m)
- piggybacks on application messages to q all determinants #m ∈ Dp such that q ∉ ⟨Log(m)⟩p and m is not known to be stable
- upon receipt of an application message m
- adds #m to Dp
- adds to Dp any new determinant piggybacked on m
- scans the information piggybacked on m to update its estimate ⟨Log(m)⟩p for all determinants #m ∈ Dp
- caches in a volatile log Sp all the messages it sends
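A sketch of the volatile determinant log and the piggybacking rule; the piggyback condition shown is one plausible reading of the estimate-based rule, and determinant ids and class names are ours:

```python
class FBLProcess:
    """Volatile determinant log D_p plus piggybacking on application messages."""
    def __init__(self, name, f):
        self.name, self.f = name, f
        self.dets = {}            # determinant id -> estimated Log(#m)

    def deliver(self, det_id, piggyback):
        # log the determinant of this delivery...
        self.dets.setdefault(det_id, set()).add(self.name)
        # ...and merge any piggybacked determinants and Log estimates
        for d, log in piggyback.items():
            self.dets[d] = self.dets.get(d, set()) | log | {self.name}

    def piggyback_for(self, q):
        # attach the determinants q may still need:
        # q not in <Log(#m)>_p, and #m not yet known to be stable
        return {d: set(log) for d, log in self.dets.items()
                if q not in log and len(log) <= self.f}
```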
34 Estimating Log(m) and |Log(m)|
- Each process p maintains estimates ⟨Log(m)⟩p and ⟨|Log(m)|⟩p
- p piggybacks #m on m′ to q if q ∉ ⟨Log(m)⟩p and ⟨|Log(m)|⟩p ≤ f
- How can p estimate ⟨Log(m)⟩p and ⟨|Log(m)|⟩p?
- How accurate should these estimates be?
- inaccurate estimates cause useless piggybacking
- keeping estimates accurate requires extra piggybacking
35 ⟨Det⟩: Keep It Simple
- p piggybacks #m on m′ to q
- Updating Rule
- Cost
- requires no additional space over the piggybacked determinants.
36 ⟨|Log|⟩: Send the Size
- Whenever p piggybacks #m on m′ to q, it also includes |⟨Log(m)⟩p|.
- Updating Rule
- when q receives #m for the first time
- Cost
- requires 1 integer associated with each determinant.
- a similar protocol can be implemented that carries f·n additional integers with each message.
37 ⟨Log⟩: Tell All You Know
- Whenever p piggybacks #m on m′ to q, it also includes ⟨Log(m)⟩p.
- Updating Rule
- Cost
- requires up to f integers associated with each determinant.
- a similar protocol can be implemented that carries n² additional integers with each message.
38 Estimating Log(m)
- Because Depend(m) ⊆ Log(m)
- we can approximate Log(m) from below with Depend(m)
- and then use vector clocks to track Depend(m)!
39 Dependency Vectors
- Dependency Vector (DV): a vector clock that tracks causal dependencies between message delivery events.
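A dependency vector can be sketched as a vector clock that ticks only on deliveries; this is a simplification, and the indices and class name are ours:

```python
class DVProcess:
    """Vector clock advanced only on message delivery events."""
    def __init__(self, idx, n):
        self.idx = idx            # this process's position in the vector
        self.dv = [0] * n

    def deliver(self, sender_dv):
        # inherit everything the sender depended on, then count this delivery
        self.dv = [max(a, b) for a, b in zip(self.dv, sender_dv)]
        self.dv[self.idx] += 1

    def piggyback(self):
        return list(self.dv)      # attached to every outgoing message
```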
40 Weak Dependency Vectors
- Weak Dependency Vector (WDV): tracks causal dependencies on deliver(m) as long as ¬stable(m)
41 Dependency Matrix
- Use WDVs to determine if p ∈ Log(m)
- Each process p maintains a Dependency Matrix DMp, whose rows are weak dependency vectors.
- Given #m = ⟨u, s, 14, 15⟩: ⟨Log(m)⟩p = {q : DMp[q][s] ≥ 15} = {p, q, s}
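Under this scheme ⟨Log(m)⟩p can be read off the dependency matrix; a sketch matching the slide's example, where the dict-of-dicts matrix representation is ours:

```python
def log_estimate(dm, dest, rsn):
    """<Log(#m)>_p: the rows of p's dependency matrix whose entry for m.dest
    shows a dependency on the rsn-th delivery at m.dest (or a later one)."""
    return {q for q, wdv in dm.items() if wdv.get(dest, 0) >= rsn}
```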
42 Rollback-Recovery Protocols: A Success Story?
- Over 300 papers in the area
- Relatively few implementations
- Why?
- Integrating recovery protocols with applications is non-trivial
- Performance issues not understood
- One size doesn't fit all
43 Egida
- A toolkit for supporting rollback recovery
- Transparent
- seamless integration with applications
- Extensible
- can easily handle new sources of non-determinism
- can easily include new protocols
- Flexible
- allows selecting the best protocol for an application
- Smart
- don't want to implement 300 protocols...
- Powerful
- a microscope to understand rollback recovery
44 The Unifying Theme
- All rollback-recovery protocols enforce the no-orphans consistency condition
- The challenge is handling non-determinism
- A process may execute non-deterministic events
- A process may interact with other processes or with the environment and generate dependencies on these events
- Characterize a protocol according to how it handles non-determinism
- Identify relevant events
- Specify which actions to take when an event occurs
45 Handling Non-Determinism
- Five classes of relevant events
- Non-deterministic events
- Ex: message delivery, file read, clock read, lock acquire
- Failure-detection events
- Ex: time-out, message delivery
- Internal dependency-generating events
- Ex: message send, file write, lock release
- External dependency-generating events
- Ex: output to printer or screen, file write
- Checkpointing events
- Ex: timeout, explicit instruction, message delivery
46 The Architecture
- Event handlers invoked on relevant events
- Library of modules
- implement core functionalities (checkpointing, creating determinants, logging, piggybacking, detecting orphans, restarting a faulty process, etc.)
- provide basic services (stable storage, failure detection, etc.)
- single interface, multiple implementations
- Use a specification language to select desired modules and corresponding implementations
- Synthesize the protocol automatically from the specification
47 An Example of Protocol Specification
- Causal Logging
- /* non-deterministic events statement */
- receive
- determinant: source, ssn, dest, rsn
- Log determinant on volatile memory of processes
- /* internal dependency-generating events statement */
- send
- Piggyback determinants
- Log message on volatile memory of self
- /* external dependency-generating events statement */
- send
- Output Commit determinants
- Implementation: independent
- /* checkpoint statement */
- Checkpoint: independent, asynchronous on NFS disk
- Implementation: incremental
- Scheduling policy: periodic
48 Integration with MPICH
- MPICH
- 2-layered architecture
- upper layer exports MPI functions to the application
- lower layer performs data transfer using platform-specific libraries (e.g. P4)
- Modifications to MPICH
- in the upper layer, replace calls to P4 with corresponding calls to the Egida API
- Modifications to P4
- handle socket-level errors
- allow a recovering process to set up connections with correct processes
- Modifications to applications: NONE
49 Bringing the Recovery back to Rollback-Recovery
- Traditionally, high availability = active replication
- Few incentives for studying the recovery performance of rollback-recovery protocols
- Lots of qualitative arguments
- No experimental study
50 Experimental Setup
- Protocol Suite
- Pessimistic receiver-based
- Pessimistic sender-based
- Optimistic
- Causal
- Application Suite
- Benchmarks from NASA's NPB 2.3
- Methodology
- 4 Pentium-based workstations
- Solaris 2.5
- Lightly-loaded 100 Mb/s Ethernet
- Failures induced about 3 minutes after a checkpoint
- 95% confidence interval
- For the optimistic protocol, each process flushes its volatile logs to disk asynchronously once every 10 seconds
51 The stop'n'go Effect
- In sender-based and causal logging, the sender stores messages in volatile memory
- If a sender fails, we can get a stop'n'go effect
- recovery of the receiver is delayed until the sender regenerates its messages
- Impact of stop'n'go depends on how much blocking occurs during failure-free execution
[Chart: recovery time (sec.) for cg under receiver-based vs. sender-based pessimistic logging, for f = 1, 2, 3 failures]
52 Failure-free Overhead
53 Bad News?
- Receiver-based pessimistic
- Fast crash recovery
- Fault containment
- Slow failure-free execution
- Sender-based pessimistic
- Fault containment
- Slow crash recovery when f > 1
- Optimistic
- Fast crash recovery and fast failure-free execution
- No fault containment
- Causal
- Fast failure-free execution
- Fault containment
- Slow crash recovery when f > 1
54 Hybrid Protocols
- Sender logs messages in volatile memory
- Receiver logs messages and determinants asynchronously to disk
- A prefix of the recovery information is available on disk to the recovering process: no stop'n'go!
- Best of both worlds
- Low overhead during failure-free execution
- Fast crash recovery
55 Hybrid Protocols: Recovery Performance
[Chart: recovery time (sec.) for cg under receiver-based pessimistic, optimistic, and hybrid-causal protocols, for f = 1, 2, 3 failures]
56 Hybrid Protocols: Failure-free Overhead
[Chart: failure-free overhead (%) of receiver-based pessimistic, causal, and hybrid-causal protocols on bt, lu, cg, sp, mg]
- Hybrid causal imposes at most 2% higher overhead than causal
57 A Comparison of RR Protocols