Rollback-Recovery (transcript of a PowerPoint presentation; provided by lorenzoire)
1
Rollback-Recovery
2
Fault-Tolerance: the Good Old Days
  • Target
  • life-critical applications
  • Primary concern
  • tolerate arbitrary failures
  • Secondary concerns
  • performance
  • resources
  • transparency

3
The times they are a-changin'
  • Target
  • non-life-critical applications
  • Primary Concerns
  • tolerate common failures with few dedicated
    resources
  • negligible impact during failure-free executions
  • fast recovery
  • transparency
  • Secondary Concerns
  • tolerate arbitrary failures

4
Replica Coordination
  • Agreement: Every non-faulty replica receives
    every request
  • Order: Every non-faulty replica processes
    requests in the same relative order

5
Implementing Replica Coordination
  • Clients use (Causal) Atomic Broadcast to
    disseminate their requests
  • Alternatively, clients forward requests to one of
    the replicas
  • that replica initiates a Reliable Broadcast to
    the other replicas

What are the differences?
6
Primary-Backup: The Idea
  • One replica (primary) executes all
    non-deterministic events
  • Primary broadcasts to the other replicas (backups)
  • requests from clients
  • outcome of executing non-deterministic events at
    the primary

7
Definitions
  • Failover time of a PB service: longest time
    during which some client does not know the
    identity of the primary
  • Server outage at t: Some correct client sends a
    request at time t to the service, but does not
    receive a response
  • (k,Δ)-bofo server: service in which all server
    outages can be grouped into at most k intervals
    of time, each of length at most Δ

8
Primary-Backup: The Spec (Budhiraja, Marzullo,
Schneider, Toueg)
Safety
  • PB1: There exists a local predicate Prmy_s on the
    state of each server s. At any time, there is at
    most one server s whose state satisfies Prmy_s
  • PB2: Each client i maintains a server identity
    Dest_i such that, to make a request, client i sends
    a message to Dest_i
  • PB3: If a client request arrives at a server
    that is not the current primary, then that
    request is not enqueued (and therefore is not
    processed)
Liveness
  • PB4: There exist fixed values k and Δ such that
    the service behaves like a single (k,Δ)-bofo
    server
9
A simple protocol
  • Assume
  • point-to-point communication
  • non-faulty channels
  • upper bound δ on message delivery time
  • at most one process crashes
  • Primary p1
  • Backup p2
  • On receipt of a request, process p1
  • processes the request and updates its state
  • sends info about the update to p2 (state update
    message)
  • without waiting for an ack from p2, p1 sends a
    response to the client
  • In addition
  • p1 sends a heartbeat message to p2 every τ seconds
  • Process p2
  • updates its state upon receiving a state update
    from p1
  • if it doesn't receive a heartbeat for τ + δ seconds,
    p2 becomes primary
  • informs clients
  • begins processing subsequent requests from clients
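The slide's simple protocol can be sketched in a few lines of Python, simulated with logical time instead of real clocks and channels (class names, units, and the single-request driver are illustrative assumptions, not the authors' code):

```python
# Sketch of the simple primary-backup protocol: p1 applies each request,
# forwards a state update to p2, and replies without waiting for an ack;
# p2 applies updates and takes over once it has heard nothing from p1
# for tau + delta time units.

TAU, DELTA = 3, 1          # heartbeat period and max message delay (assumed units)

class Backup:
    def __init__(self):
        self.state, self.last_heard, self.is_primary = 0, 0, False

    def on_message(self, now, update=None):
        self.last_heard = now                 # heartbeat or state update from p1
        if update is not None:
            self.state = update               # mirror the primary's state

    def tick(self, now):
        # silence longer than tau + delta means p1 must have crashed
        if not self.is_primary and now - self.last_heard > TAU + DELTA:
            self.is_primary = True            # inform clients, start serving

class Primary:
    def __init__(self, backup):
        self.state, self.backup = 0, backup

    def on_request(self, now, value):
        self.state += value                   # process request, update state
        self.backup.on_message(now + DELTA, self.state)  # state update message
        return self.state                     # reply without waiting for an ack

p2 = Backup()
p1 = Primary(p2)
print(p1.on_request(now=1, value=5))          # p2's copy is refreshed at time 2
p2.tick(now=3)                                # silence of 1 <= tau + delta: still backup
print(p2.is_primary)
p2.tick(now=10)                               # silence of 8 > tau + delta: failover
print(p2.is_primary, p2.state)
```

Note how the backup's timeout of τ + δ accounts for one heartbeat period plus the worst-case delivery delay of the last heartbeat.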

10
…that meets the PB spec
  • Definition of Prmy
  • Prmy_p1: p1 has not crashed
  • Prmy_p2: p2 has not received a message from p1
    for τ + δ seconds
  • PB1: at most one of Prmy_p1 and Prmy_p2 can hold
    at any time
  • Failover Time: interval during which neither p1
    nor p2 is primary; at most τ + 2δ
11
…indeed, it does!
  • k = 1 (since at most one crash)
  • Δ ≥ longest interval during which a request
    elicits no response
  • assume p1 crashes at tc
  • any client request sent to p1 at time tc − δ or
    later may be lost
  • p2 may not learn about p1's crash until
    tc + τ + 2δ
  • client may not learn that p2 is the new primary
    for another δ
  • PB2, PB3: follow immediately from the protocol
  • PB4: find k, Δ to implement a (k,Δ)-bofo server

Δ = τ + 4δ
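The bound can be checked with a few lines of arithmetic (the helper name is invented for illustration):

```python
# Checking the slide's worked bound: with a crash at t_c, requests sent
# from t_c - delta onward may be lost, p2 detects the crash by
# t_c + tau + 2*delta, and clients learn of the new primary one more
# delta later, so the single outage interval has length tau + 4*delta.

def outage_length(tau, delta, t_c=0):
    first_lost = t_c - delta               # request in flight when p1 dies
    p2_detects = t_c + tau + 2 * delta     # heartbeat timeout expires
    clients_know = p2_detects + delta      # "I am primary" announcement arrives
    return clients_know - first_lost

assert outage_length(tau=3, delta=1) == 3 + 4 * 1
print(outage_length(tau=3, delta=1))       # -> 7
```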
12
Active Replication vs. Primary-Backup
  • Active Replication
  • tolerates arbitrary failures
  • masks failures
  • consumes lots of resources
  • Primary Backup
  • does not tolerate arbitrary failures
  • if the primary fails, requests may be lost
  • service can become unavailable while a leader
    election algorithm is run to determine the new
    primary
  • consumes fewer resources

13
Some like it hot
  • Hot Backups process information from the primary
    as soon as they receive it
  • Cold Backups log information received from
    primary, and process it only if primary fails
  • Rollback Recovery implements cold backups
    cheaply
  • primary logs directly to stable storage
    information needed by backups
  • if primary crashes, a newly initialized process
    is given the content of the logs: backups are
    generated on demand

14
Uncoordinated Checkpointing
  • Easy to understand
  • No synchronization overhead
  • Flexible
  • can choose when to checkpoint
  • To recover from a crash
  • go back to last checkpoint
  • restart

[figure: process p taking checkpoints along its execution line]
15
How to (not) take a checkpoint
  • Block execution, save entire process state to
    stable storage
  • very high overhead during failure-free execution
  • lots of unnecessary data saved on stable storage

16
How to take a checkpoint
  • Take checkpoints incrementally
  • save only pages modified since last checkpoint
  • use dirty bit to determine which pages to save
  • Save only interesting parts of address space
  • use application hints or compiler help to avoid
    saving useless data (e.g. dead variables)
  • Do not block application execution while taking
    a checkpoint
  • copy-on-write
  • precopying
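The dirty-bit idea above can be sketched as a toy in Python (an assumed design with a dict standing in for stable storage, not a real OS mechanism):

```python
# Toy sketch of incremental checkpointing: track a per-page dirty bit
# and write only the pages modified since the previous checkpoint to
# "stable storage".

class IncrementalCheckpointer:
    def __init__(self, num_pages):
        self.pages = [0] * num_pages
        self.dirty = [True] * num_pages   # everything is new initially
        self.stable = {}                  # page index -> saved contents

    def write(self, page, value):
        self.pages[page] = value
        self.dirty[page] = True           # hardware dirty bit, emulated

    def checkpoint(self):
        saved = [i for i, d in enumerate(self.dirty) if d]
        for i in saved:
            self.stable[i] = self.pages[i]   # save only modified pages
            self.dirty[i] = False
        return saved                      # which pages this increment wrote

ckpt = IncrementalCheckpointer(num_pages=4)
print(ckpt.checkpoint())                  # full first checkpoint: [0, 1, 2, 3]
ckpt.write(2, 42)
print(ckpt.checkpoint())                  # incremental: only [2]
```

In a real system the dirty bits come from the MMU (e.g. by write-protecting pages), and copy-on-write or precopying lets the application keep running while the increment is written out.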

17
The Domino Effect
18
How to Avoid the Domino Effect
  • Coordinated Checkpointing
  • No independence
  • Synchronization Overhead
  • Easy Garbage Collection
  • Communication-Induced Checkpointing: detect
    dangerous communication patterns and checkpoint
    appropriately
  • Less synchronization
  • Less independence
  • Complex

19
The Output Commit Problem
  • Coordinated checkpoint for every output commit
  • High overhead if frequent I/O with external
    environment

20
Message Logging
  • Can avoid domino effect
  • Works with coordinated checkpoint
  • Works with uncoordinated checkpoint
  • Can reduce cost of output commit
  • More difficult to implement

21
How Message Logging Works
[figure: an application plus its message log form a
recovery unit]
  • To tolerate crash failures
  • periodically checkpoint application state
  • log on stable storage the determinants of
    non-deterministic events executed after the
    checkpointed state
  • for message delivery events:
  • #m = ⟨m.dest, m.rsn, m.source, m.ssn⟩

Recovery: restore the latest checkpointed
state; replay non-deterministic events
according to their determinants
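The checkpoint-plus-determinant scheme can be sketched as follows (a minimal model; the class layout and payload handling are assumptions, and "stable storage" is just a list):

```python
# Sketch of checkpointing + determinant logging for a recovery unit:
# each delivery records #m = (m.dest, m.rsn, m.source, m.ssn); recovery
# restores the checkpoint and replays deliveries in rsn order.
from collections import namedtuple

Det = namedtuple("Det", "dest rsn source ssn")

class RecoveryUnit:
    def __init__(self, name):
        self.name, self.state, self.rsn = name, [], 0
        self.log = []                     # determinants + payloads ("stable storage")
        self.ckpt_state, self.ckpt_rsn = [], 0

    def deliver(self, source, ssn, payload):
        self.rsn += 1
        self.log.append((Det(self.name, self.rsn, source, ssn), payload))
        self.state.append(payload)        # deterministic processing of m

    def take_checkpoint(self):
        self.ckpt_state, self.ckpt_rsn = list(self.state), self.rsn

    def recover(self):
        # restore latest checkpointed state, then replay the
        # non-deterministic delivery events according to their determinants
        self.state, self.rsn = list(self.ckpt_state), self.ckpt_rsn
        for det, payload in sorted(self.log, key=lambda e: e[0].rsn):
            if det.rsn > self.rsn:
                self.rsn += 1
                self.state.append(payload)

p = RecoveryUnit("p")
p.deliver("q", 1, "a")
p.take_checkpoint()
p.deliver("r", 1, "b")
p.recover()
print(p.state)                            # -> ['a', 'b']
```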
22
Pessimistic Logging
[figure: p1, p2, p3; p2 delivers m1 and m2, logging
each determinant to stable storage before sending m3
to p3]
  • Never creates orphans
  • may incur blocking
  • straightforward recovery
23
Case study 1: Sender-Based Logging
(Johnson and Zwaenepoel, FTCS '87)
  • Message log is maintained in volatile storage at
    the sender
  • A message m is logged in two steps
  • i) before sending m, the sender logs its
    content (m is partially logged)
  • ii) the receiver tells the sender the receive
    sequence number of m, and the sender adds this
    information to its log (m is fully logged)
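The two-step protocol above can be sketched like this (class and field names are illustrative, not the paper's API):

```python
# Sketch of sender-based logging: the sender keeps each message in its
# volatile log before sending it (partially logged); the receiver
# returns its receive sequence number (rsn), which completes the log
# entry (fully logged).

class Sender:
    def __init__(self):
        self.ssn = 0
        self.log = {}                     # ssn -> {"payload": ..., "rsn": ...}

    def send(self, payload):
        self.ssn += 1
        self.log[self.ssn] = {"payload": payload, "rsn": None}  # step i: partial
        return self.ssn, payload

    def ack(self, ssn, rsn):
        self.log[ssn]["rsn"] = rsn        # step ii: fully logged

class Receiver:
    def __init__(self):
        self.rsn, self.delivered = 0, []

    def receive(self, ssn, payload):
        self.rsn += 1
        self.delivered.append(payload)
        return ssn, self.rsn              # tell the sender our rsn

s, r = Receiver(), None                   # placeholder to keep names distinct
s, r = Sender(), Receiver()
s.ack(*r.receive(*s.send("m1")))
fully = [e for e in s.log.values() if e["rsn"] is not None]
print(len(fully))                         # -> 1
```

On recovery (next slide), the recovering process collects these logs from its senders and replays the messages in ascending rsn order.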


24
More on SBL
  • Recovery: the recovering process collects the
    logs from the senders, and replays the messages
    in ascending rsn order
  • Optimistic: SBL may create orphans if we assume
    transient link failures

25
Optimistic Logging
  • p2 sends m3 without first logging
    determinants
  • If p2 fails before logging the determinants
    of m1 and m2, p3 becomes an orphan

[figure: p1, p2, p3; p2 delivers m1 and m2, then
sends m3 to p3]
  • Eliminates orphans during recovery
  • non-blocking during failure-free executions
  • rollback of correct processes
  • complex recovery

26
Causal Logging
  • No blocking in failure-free executions
  • No orphans
  • No additional messages
  • Tolerates multiple concurrent failures
  • Keeps determinants in volatile memory
  • Localized output commit

27
Preliminary Definitions
Given a message m sent from m.source to m.dest:
Depend(m): set of processes whose state depends on
the delivery of m (m.dest once it delivers m, plus
any process causally downstream of that delivery)
Log(m): set of processes with a copy of the
determinant of m in their volatile memory
p orphan of a set C of crashed processes: p ∉ C, yet
p depends on a message m whose determinant cannot be
retrieved (¬stable(m) and Log(m) ⊆ C)
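These definitions can be rendered as executable predicates (a sketch; the set-based representation of Depend, Log, and stable is an assumption in the style of the causal-logging literature):

```python
# Depend(m): processes whose state depends on the delivery of m.
# Log(m): processes holding #m in volatile memory.
# p is an orphan of crashed set C if p is correct but depends on some
# message m whose determinant survives only at crashed processes.

def stable(log_m, f):
    return len(log_m) > f                 # determinant survives any f crashes

def orphans(crashed, depend, log):
    return {p
            for m in depend
            for p in depend[m] - crashed
            if log[m] <= crashed}         # Log(m) wholly inside C

depend = {"m1": {"p2", "p3"}}             # p3 causally depends on m1's delivery
log    = {"m1": {"p2"}}                   # only p2 holds #m1 (volatile)
print(orphans({"p2"}, depend, log))       # p2's crash orphans p3
```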
28
The No-Orphans Consistency Condition
No orphans after crash C if:
  ∀m : ¬stable(m) ∧ (Log(m) ⊆ C) ⇒ (Depend(m) ⊆ C)
No orphans after any C if:
  ∀m : ¬stable(m) ⇒ (Depend(m) ⊆ Log(m))
The Consistency Condition
29
Optimistic and Pessimistic
No orphans after crash C if:
  ∀m : ¬stable(m) ∧ (Log(m) ⊆ C) ⇒ (Depend(m) ⊆ C)
Optimistic weakens it to:
  • No orphans after any crash if orphans are
    eventually eliminated:
    ∀m : ¬stable(m) ⇒ ◇(Depend(m) ⊆ Log(m))
Pessimistic strengthens it to:
  ∀m : ¬stable(m) ⇒ |Depend(m)| ≤ 1
30
Causal Message Logging
No orphans after any crash of size at most f if:
  ∀m : (|Log(m)| ≤ f) ⇒ (Depend(m) ⊆ Log(m))
Causal strengthens it to an invariant, with
stable(m) ≡ |Log(m)| > f:
  ∀m : ¬stable(m) ⇒ (Depend(m) ⊆ Log(m))
31
An Example
Causal Logging
If f = 1, stable(m) ≡ |Log(m)| ≥ 2
[figure: p1, p2, p3; p2 sends m3⟨#m1,#m2⟩,
piggybacking the determinants of m1 and m2; p1 sends
m4; m5⟨#m3⟩ piggybacks the determinant of m3]
32
Recovery for f = 1
[figure: recovering process p between its parents
(processes that sent messages to p) and its children
(processes p sent messages to). The parents resend
the messages previously sent to p, in ssn order
("what is the next message from each parent?"). The
children return the determinants of the messages p
delivered, in rsn order ("who is my next parent?")]
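The recovery step the figure describes can be sketched as follows (the data layout is invented for illustration): the determinants fix the rsn order and name, for each step, which parent's cached message comes next.

```python
# Sketch of f = 1 recovery: children supply the determinants of p's
# deliveries; parents supply the message bodies they previously sent
# to p, keyed by ssn. Replaying determinants in rsn order reconstructs
# p's pre-crash delivery sequence.

# determinants of p's deliveries, collected from p's children
dets = [{"rsn": 1, "source": "q", "ssn": 1},
        {"rsn": 2, "source": "r", "ssn": 1},
        {"rsn": 3, "source": "q", "ssn": 2}]

# messages previously sent to p, cached at the parents (ssn -> payload)
parents = {"q": {1: "a", 2: "c"}, "r": {1: "b"}}

def replay(dets, parents):
    # "who is my next parent?" is answered by the determinant with the
    # next rsn; its (source, ssn) pair locates the message body
    return [parents[d["source"]][d["ssn"]]
            for d in sorted(dets, key=lambda d: d["rsn"])]

print(replay(dets, parents))              # -> ['a', 'b', 'c']
```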
33
Family-Based Logging
  • Each process p
  • maintains in a volatile log Dp all the
    determinants #m such that p ∈ Log(m)
  • piggybacks on application messages to q all
    determinants #m ∈ Dp such that, as far as p
    knows, stable(m) does not yet hold
  • upon receipt of an application message m
  • adds #m to Dp
  • adds to Dp any new determinant piggybacked on m
  • scans the information piggybacked on m to update
    its estimate ⟨Log(m)⟩p for all determinants
    #m ∈ Dp
  • caches in a volatile log Sp all the messages it
    sends
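This send/receive path can be sketched in Python (F, the class, and the dictionary layout are illustrative assumptions, not Egida's or the paper's API):

```python
# Sketch of family-based logging: p piggybacks every determinant #m in
# its volatile log D_p that is not yet known to be stable, and the
# receiver merges the piggybacked determinants into its own log,
# refining its estimate of Log(m) as it goes.

F = 1                                     # tolerated failures

class Process:
    def __init__(self, name):
        self.name = name
        self.dets = {}                    # m -> set of processes believed to hold #m

    def send(self, q):
        # piggyback #m whenever |Log(m)| might still be <= f
        pb = {m: set(holders) for m, holders in self.dets.items()
              if len(holders) <= F}
        return self.name, pb

    def receive(self, sender, piggyback):
        for m, holders in piggyback.items():
            known = self.dets.setdefault(m, set())
            known |= holders | {sender, self.name}   # update the Log(m) estimate

p, q = Process("p"), Process("q")
p.dets["m1"] = {"p"}                      # p delivered m1 and logged #m1 locally
q.receive(*p.send(q))
print(sorted(q.dets["m1"]))               # -> ['p', 'q']
print(q.send(None)[1])                    # estimate is 2 > f: nothing to piggyback
```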

34
Estimating Log(m) and |Log(m)|
  • Each process p maintains estimates ⟨Log(m)⟩p
    and ⟨|Log(m)|⟩p
  • p piggybacks #m on m′ to q if its estimates do
    not yet imply stable(m)
  • How can p estimate Log(m) and |Log(m)|?
  • How accurate should these estimates be?
  • inaccurate estimates cause useless piggybacking
  • keeping estimates accurate requires extra
    piggybacking

35
⟨Det⟩: Keep It Simple
  • p piggybacks #m on m′ to q
  • Updating Rule
  • Cost
  • requires no additional space beyond the
    piggybacked determinants

36
⟨|Log|⟩: Send the Size
  • Whenever p piggybacks #m on m′ to q, it also
    includes ⟨|Log(m)|⟩p
  • Updating Rule
  • when q receives #m for the first time
  • Cost
  • requires 1 integer associated with each
    determinant
  • a similar protocol can be implemented that
    carries f·n additional integers with each
    message

37
⟨Log⟩: Tell All You Know
  • Whenever p piggybacks #m on m′ to q, it also
    includes ⟨Log(m)⟩p
  • Updating Rule
  • Cost
  • requires up to f integers associated with each
    determinant
  • a similar protocol can be implemented that
    carries n² additional integers with each message

38
Estimating Log(m)
  • Because causal logging guarantees
    Depend(m) ⊆ Log(m)
  • we can approximate Log(m) from below with
    Depend(m)
  • and then use vector clocks to track Depend(m)!

39
Dependency Vectors
  • Dependency Vector (DV): vector clock that tracks
    causal dependencies between message delivery
    events
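As a sketch (process count and API are invented), a dependency vector can be maintained exactly like a vector clock over delivery events:

```python
# Sketch of dependency vectors: entry i of DV_p counts the delivery
# events of process i that p causally depends on; on each delivery, p
# merges the sender's DV componentwise and bumps its own entry.

N = 3                                     # processes 0..2

class Proc:
    def __init__(self, pid):
        self.pid, self.dv = pid, [0] * N

    def deliver(self, sender_dv):
        self.dv = [max(a, b) for a, b in zip(self.dv, sender_dv)]
        self.dv[self.pid] += 1            # one more delivery at self

p0, p1, p2 = Proc(0), Proc(1), Proc(2)
p0.deliver([0] * N)                       # p0 delivers m1: DV = [1, 0, 0]
p1.deliver(p0.dv)                         # p1 delivers m2 sent after m1
print(p1.dv)                              # -> [1, 1, 0]
# p depends on the k-th delivery of process i  iff  DV_p[i] >= k
print(p1.dv[0] >= 1)                      # p1 depends on p0's first delivery
```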

40
Weak Dependency Vectors
Weak Dependency Vector (WDV): tracks causal
dependencies on deliver(m) only as long as
¬stable(m) holds
41
Dependency Matrix
  • Use WDVs to determine if p ∈ Log(m)

Each process p maintains a Dependency Matrix
(DMp), whose rows are weak dependency
vectors. Given #m = ⟨u, s, 14, 15⟩, p checks
which rows of DMp reflect the delivery of m;
here ⟨Log(m)⟩p = {p, q, s}
42
Rollback Recovery Protocols A Success Story?
  • Over 300 papers in the area
  • Relatively few implementations
  • Why?
  • Integrating recovery protocols with applications
    is non-trivial
  • Performance issues not understood
  • One size doesn't fit all

43
Egida
  • A toolkit for supporting rollback recovery
  • Transparent
  • seamless integration with applications
  • Extensible
  • can easily handle new sources of non-determinism
  • can easily include new protocols
  • Flexible
  • allows selecting the best protocol for each
    application
  • Smart
  • don't want to implement 300 protocols...
  • Powerful
  • a microscope to understand rollback recovery

44
The Unifying Theme
  • All rollback recovery protocols enforce the
    no-orphans consistency condition
  • The challenge is handling non-determinism
  • A process may execute non-deterministic events
  • A process may interact with other processes or
    with the environment and generate dependencies on
    these events
  • Characterize a protocol according to how it
    handles non-determinism
  • Identify relevant events
  • Specify which actions to take when an event occurs

45
Handling Non-Determinism
  • Five classes of relevant events
  • Non-deterministic events
  • Ex: message delivery, file read, clock read, lock
    acquire
  • Failure-detection events
  • Ex: time-out, message delivery
  • Internal dependency-generating events
  • Ex: message send, file write, lock release
  • External dependency-generating events
  • Ex: output to printer or screen, file write
  • Checkpointing events
  • Ex: timeout, explicit instruction, message
    delivery

46
The Architecture
  • Event handlers invoked on relevant events
  • Library of modules
  • implement core functionalities
  • (checkpointing, creating determinants, logging,
    piggybacking, detecting orphans, restarting a
    faulty process, etc.)
  • provide basic services
  • (stable storage, failure detection, etc)
  • single interface, multiple implementations
  • Use a specification language to select desired
    modules and corresponding implementations
  • Synthesize protocol automatically from
    specification
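The event-handler architecture described above can be sketched as a small dispatcher (module and event names here are invented for illustration; they are not Egida's actual API):

```python
# Sketch of the Egida-style architecture: relevant events are raised
# on a bus, and whichever protocol modules were selected from the
# specification run in response.

class EventBus:
    def __init__(self):
        self.handlers = {}                # event class -> list of modules

    def register(self, event, handler):
        self.handlers.setdefault(event, []).append(handler)

    def raise_event(self, event, **info):
        return [h(**info) for h in self.handlers.get(event, [])]

bus = EventBus()
actions = []
# a "create determinant" module bound to non-deterministic events
bus.register("nondeterministic", lambda **m: actions.append(("det", m["msg"])))
# a "checkpoint" module bound to checkpointing events
bus.register("checkpoint", lambda **m: actions.append(("ckpt", m["style"])))

bus.raise_event("nondeterministic", msg="m1")
bus.raise_event("checkpoint", style="incremental")
print(actions)                            # -> [('det', 'm1'), ('ckpt', 'incremental')]
```

Swapping one module for another (e.g. a different checkpointing implementation) changes only which handlers are registered, which is how one specification language can synthesize many protocols.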

47
An Example of Protocol Specification
  • Causal Logging
  • /* non-deterministic events statement */
  • receive:
  • determinant: source, ssn, dest, rsn
  • Log determinant on: volatile memory of processes
  • /* internal dependency-generating events
    statement */
  • send:
  • Piggyback: determinants
  • Log message on: volatile memory of self
  • /* external dependency-generating events
    statement */
  • send:
  • Output Commit: determinants
  • Implementation: independent
  • /* checkpoint statement */
  • Checkpoint: independent, asynchronous on NFS
    disk
  • Implementation: incremental
  • Scheduling policy: periodic

48
Integration with MPICH
  • MPICH
  • 2-layered architecture
  • upper layer exports MPI functions to application
  • lower layer performs data transfer using
    platform-specific libraries (e.g. P4)
  • Modifications to MPICH
  • In upper layer, replace calls to P4 with
    corresponding calls to Egida API
  • Modification to P4
  • Handle socket-level errors
  • Allow recovering process to set up connections
    with correct processes
  • Modifications to applications: NONE

49
Bringing the Recovery back to Rollback-Recovery
  • Traditionally, high availability has meant active
    replication
  • Few incentives for studying recovery performance
    of rollback-recovery protocols
  • Lots of qualitative arguments
  • No experimental study

50
Experimental Setup
  • Protocol Suite
  • Pessimistic receiver-based
  • Pessimistic sender-based
  • Optimistic
  • Causal
  • Application Suite
  • Benchmarks from NASA's NPB 2.3
  • Methodology
  • 4 Pentium-based workstations
  • Solaris 2.5
  • Lightly-loaded 100Mb/s Ethernet
  • Failures induced about 3 minutes after checkpoint
  • 95% confidence intervals
  • For the optimistic protocol, process flushes its
    volatile logs to disk asynchronously once every
    10 seconds

51
The stop&go Effect
  • In sender-based and causal logging, the sender
    stores messages in volatile memory
  • If the sender fails, we can get a stop&go effect
  • recovery of the receiver is delayed until the
    sender regenerates its messages
  • Impact of stop&go depends on how much blocking
    occurs during failure-free execution

[chart: recovery time (sec.) on cg for f = 1, 2, 3
failures, receiver-based pessimistic vs. sender-based
pessimistic]
52
Failure-free Overhead
53
Bad News?
  • Receiver-based pessimistic
  • Fast crash recovery
  • Fault containment
  • Slow failure-free execution
  • Sender-based pessimistic
  • Fault containment
  • Slow crash recovery when f > 1
  • Optimistic
  • Fast crash recovery and fast failure-free
    execution
  • No fault containment
  • Causal
  • Fast failure-free execution
  • Fault containment
  • Slow crash recovery when f > 1

54
Hybrid Protocols
  • Sender logs message in volatile memory
  • Receiver logs message and determinant
    asynchronously to disk
  • A prefix of the recovery information is available
    to the recovering process: no stop&go!
  • Best of both worlds
  • Low overhead during failure-free execution
  • Fast crash recovery

55
Hybrid Protocols: Recovery Performance
[chart: recovery time (sec.) on cg for f = 1, 2, 3
failures, comparing receiver-based pessimistic,
optimistic, and hybrid-causal]
56
Hybrid Protocols: Failure-free Overhead
[chart: failure-free overhead (%) on bt, lu, cg, sp,
and mg, comparing receiver-based pessimistic and
causal]
Hybrid causal imposes at most 2% higher overhead
than causal
57
A Comparison of RR Protocols