1
The weakest failure detector question in
distributed computing
Petr Kouznetsov, Distributed Programming Lab, EPFL
2
Outline
  • Impossibility results and failure detectors
  • Model: asynchronous system with failure detectors
  • The weakest failure detector question and the CHT
    proof
  • Determining the weakest failure detectors for
    various problems
  • (implementing shared memory, solving consensus,
    solving non-blocking atomic commit, boosting
    consensus power of atomic objects)

3
Centralized computing
Clients
Centralized computing unit
4
Distributed computing
Clients
Distributed computing unit
5
Redundancy and synchronization
  • The distributed implementation should create an
    illusion of a centralized one
  • The components (processes) must be synchronized
    in a consistent way.

Distributed computing unit
6
Consensus
  • Processes propose values and must agree on a
    common value in a non-trivial manner
  • Agreement: no two correct processes decide
    differently
  • Validity: every decided value is a proposed value
  • Termination: every correct process eventually
    decides
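
The three properties can be checked mechanically on a finished run. The sketch below is only an illustration: the function name `check_consensus` and the dictionary-based run format are assumptions made for this example, and "eventually decides" is approximated by "has decided by the end of the recorded run".

```python
from typing import Dict, Optional, Set


def check_consensus(proposals: Dict[str, int],
                    decisions: Dict[str, Optional[int]],
                    correct: Set[str]) -> Dict[str, bool]:
    """Check agreement, validity and termination on one recorded run."""
    decided = {p: v for p, v in decisions.items() if v is not None}
    # Agreement: no two correct processes decide differently.
    agreement = len({v for p, v in decided.items() if p in correct}) <= 1
    # Validity: every decided value is a proposed value.
    validity = all(v in proposals.values() for v in decided.values())
    # Termination (finite approximation): every correct process has decided.
    termination = all(p in decided for p in correct)
    return {"agreement": agreement, "validity": validity,
            "termination": termination}


# p3 crashed before deciding; p1 and p2 are correct and agree on 1.
good = check_consensus({"p1": 0, "p2": 1, "p3": 1},
                       {"p1": 1, "p2": 1, "p3": None},
                       correct={"p1", "p2"})
# Both processes decide their own proposal: agreement is violated.
bad = check_consensus({"p1": 0, "p2": 1},
                      {"p1": 0, "p2": 1},
                      correct={"p1", "p2"})
```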

7
Ideal computing
  • The consistency and progress of the
    implementation are preserved even if
  • Processes can fail by crashing
  • The system is asynchronous
  • Communication is not bounded
  • Processing is not bounded
  • (There is no bound Δ such that, after taking Δ
    local steps, a process is sure to have heard from
    every correct process.)

8
FLP impossibility
  • Consensus is impossible in an asynchronous system
    if at least one process might crash.
  • Fischer, Lynch and Paterson, 1985

9
Adding (some) synchrony
  • Consensus is impossible in a system with
    asynchronous processing or asynchronous
    communication if at least one process might
    crash [Dolev, Dwork, Stockmeyer, 1987].
  • (Likewise in a shared memory system
    [Loui, Abu-Amara, 1987].)

10
Why?
  • It is impossible to distinguish a crashed
    process from a sleeping one, no matter how many
    steps you take.

11
Adding partial synchrony
  • Assume that in every execution there is an
    upper bound on the time to execute a processing
    step and to communicate a message.
  • Consensus is solvable if a majority of processes
    are correct.
  • (If communication is synchronous and processing
    is partially synchronous, then consensus is
    solvable for any number of failures.)
  • Dwork, Lynch, Stockmeyer, 1988

12
Adding less synchrony
  • Assume we (eventually) have a leader, i.e.,
    eventually all processes that take enough steps
    will hear from some correct process.

13
Eventual leader abstraction Ω
  • At every process, Ω outputs a process identifier.
  • Eventually, the same correct process id is output
    at all processes.
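
A finite trace cannot prove an "eventually" property, but it can illustrate one. The sketch below takes recorded leader outputs and reports the leader on which the correct processes settle and the step from which they all agree. The name `stabilization_point` and the trace format are assumptions made for this example.

```python
from typing import Dict, List, Optional, Set, Tuple


def stabilization_point(outputs: Dict[str, List[str]],
                        correct: Set[str]) -> Optional[Tuple[str, int]]:
    """Return (leader, t) if, from step t to the end of the trace, every
    correct process outputs the same correct leader; otherwise None."""
    length = min(len(outputs[p]) for p in correct)
    final = {outputs[p][length - 1] for p in correct}
    if len(final) != 1:
        return None                 # no common leader at the end
    leader = final.pop()
    if leader not in correct:
        return None                 # the common leader must be correct
    t = length - 1
    # Walk backwards while all correct processes already agree.
    while t > 0 and all(outputs[p][t - 1] == leader for p in correct):
        t -= 1
    return leader, t


trace = {
    "p1": ["p1", "p2", "p3", "p3", "p3"],
    "p2": ["p2", "p3", "p3", "p3", "p3"],
    "p3": ["p1", "p1", "p4", "p3", "p3"],
}
result = stabilization_point(trace, correct={"p1", "p2", "p3"})
```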

14
Ω is sufficient for consensus!
  • Consensus is solvable in an asynchronous system
    equipped with Ω, where a majority of processes
    are correct.
  • [Lam90, CT91]
  • (If communication is synchronous, then consensus
    is solvable for any number of failures.)
  • [DLS88, LH94]

15
The question
  • What is the smallest amount of synchrony that
    must be introduced into the asynchronous system
    to solve an unsolvable problem?

16
Outline
  • Impossibility results and failure detectors
  • Model: asynchronous system with failure detectors
  • The weakest failure detector question and the CHT
    proof
  • Determining the weakest failure detectors for
    various problems
  • (implementing shared memory, solving consensus,
    solving non-blocking atomic commit (NBAC),
    boosting consensus power of atomic objects)

17
General system model
  • Processes p1, ..., pn communicate through reliable
    message-passing channels. (*)
  • In addition, every process can query its
    failure detector module, which produces some
    (maybe incomplete and inaccurate) information
    about failures.
  • (*) Later we also consider registers and atomic
    objects of given power.

18
Failure detector modules
[Figure: processes p, q and r, each with its own failure detector module FD]
19
Failure detectors
  • The information output to the processes depends
    only on failures

[Figure: process p queries its module FD, which returns information on failures, e.g. fail(q)]
20
Example: perfect failure detector P
  • At each process, P outputs a set of suspected
    process identifiers.
  • Eventually, every crashed process is suspected
  • No process is suspected before it crashes
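
The two properties of P can be checked on a recorded trace. The sketch below is a finite approximation under stated assumptions: `check_perfect` and the trace format are hypothetical, the trace holds the outputs seen by correct processes only, and "eventually suspected" is read as "suspected at the last recorded step".

```python
from typing import Dict, List, Set, Tuple


def check_perfect(trace: Dict[int, List[Set[int]]],
                  crash_time: Dict[int, int]) -> Tuple[bool, bool]:
    """trace[p][t] is the set suspected by correct process p at time t.
    crash_time[q] is the time at which q crashes (absent if q is correct).
    Returns (accuracy, completeness)."""
    # Accuracy: no process is suspected before it crashes.
    accuracy = all(q in crash_time and crash_time[q] <= t
                   for outputs in trace.values()
                   for t, suspected in enumerate(outputs)
                   for q in suspected)
    # Completeness (finite approximation): at the end of the trace,
    # every crashed process is suspected by every correct process.
    completeness = all(q in outputs[-1]
                       for outputs in trace.values()
                       for q in crash_time)
    return accuracy, completeness


# Process 4 crashes at time 2; processes 1 and 2 suspect it afterwards.
good_trace = {1: [set(), set(), {4}, {4}],
              2: [set(), set(), set(), {4}]}
# Process 1 suspects process 4 at time 1, before the crash.
bad_trace = {1: [set(), {4}, {4}, {4}]}

good = check_perfect(good_trace, {4: 2})
bad = check_perfect(bad_trace, {4: 2})
```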

21
Example: failure signal failure detector FS
  • At each process, FS outputs green or red.
  • If red is output, then a failure previously
    occurred.
  • If a failure occurs, then eventually red is
    output at all correct processes.

22
Environments
  • An environment E specifies when and where
    failures might occur
  • Examples:
  • Majority of processes are correct
  • At most one process crashes

23
Failure detector reductions
  • Failure detector D is weaker than failure
    detector D' if D can be extracted from D', i.e.,
    there exists an algorithm that simulates D using
    D'.
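
As a small illustration of a reduction, the failure signal detector FS can be simulated using the perfect failure detector P: output red as soon as P suspects anyone. The class name `FailureSignalFromP` and the `query_p` callback are assumptions made for this sketch. Because P never suspects a process before it crashes, red is output only after a failure. Because P eventually suspects every crashed process, red eventually appears at every correct process once a failure occurs.

```python
from typing import Callable, Set


class FailureSignalFromP:
    """Simulates FS on top of a local P module (hypothetical interface)."""

    def __init__(self, query_p: Callable[[], Set[str]]):
        self._query_p = query_p      # returns the set currently suspected by P
        self._seen_failure = False   # latched once any suspicion is seen

    def query(self) -> str:
        if self._query_p():          # non-empty suspicion set: someone crashed
            self._seen_failure = True
        return "red" if self._seen_failure else "green"


# P outputs the empty set twice, then starts suspecting p4.
samples = iter([set(), set(), {"p4"}, {"p4"}])
fs = FailureSignalFromP(lambda: next(samples))
colors = [fs.query() for _ in range(4)]
```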

24
The weakest failure detector
  • D is the weakest failure detector to solve
    problem M in an environment E if and only if:
  • D is sufficient for M in E: D can be used to
    solve M in E
  • D is necessary for M in E: D is weaker than any
    failure detector D' that can be used to solve M
    in E

25
The question
  • Given a problem M and an environment E,
  • what is the weakest failure detector for solving
    M in E?

26
Outline
  • Impossibility results and failure detectors
  • Model: asynchronous system with failure detectors
  • The weakest failure detector question and the CHT
    proof
  • Determining the weakest failure detectors for
    various problems
  • (implementing shared memory, solving consensus,
    solving non-blocking atomic commit (NBAC),
    boosting consensus power of atomic objects)

27
The CHT result
  • The CHT Theorem: If a failure detector D
    implements consensus, then D implements Ω
  • Corollary: Ω is the weakest failure detector for
    consensus with a majority of correct processes
  • [Chandra, Hadzilacos and Toueg, 1996]

31
[Figure: p1 samples failure detector values d1 and d3, p2 samples d2 and d4; the processes exchange the samples (p1,d1), (p1,d3), (p2,d2), (p2,d4)]
33
[Figure: simulated runs of p1 and p2 starting from initial configurations I0, I1 and I2, ending in Decide(0) or Decide(1)]
42
Outline
  • Impossibility results and failure detectors
  • Model: asynchronous system with failure detectors
  • The weakest failure detector question and the CHT
    proof
  • Determining the weakest failure detectors for
    various problems

43
Problem: implementing a register
  • A register is an object accessed through reads
    and writes
  • The write(v) stores v at the register and returns
    ok
  • The read returns the last value written at the
    register
  • NB: In an asynchronous system a register can be
    implemented if and only if a majority of
    processes are correct [ABD95].

44
Quorum failure detector Σ
  • At each process, Σ outputs a set of processes
  • Any two sets (output at any times and at any
    processes) intersect.
  • Eventually every set contains only correct
    processes.
  • NB: Given a majority of correct processes, Σ can
    be implemented in an asynchronous system.
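
The NB rests on a counting fact: any two majorities of the same process set share at least one process. One plausible implementation, assumed here rather than taken from the slides, outputs a majority of processes that recently replied to a query. The sketch below only verifies the intersection fact by brute force for five processes; the helper name `majorities` is hypothetical.

```python
from itertools import combinations
from typing import List, Set


def majorities(processes: List[str]) -> List[Set[str]]:
    """Enumerate every subset that contains more than half of the processes."""
    n = len(processes)
    smallest = n // 2 + 1
    return [set(c)
            for size in range(smallest, n + 1)
            for c in combinations(processes, size)]


procs = ["p1", "p2", "p3", "p4", "p5"]
quorums = majorities(procs)
# Intersection property: every pair of majority sets has a common member.
all_intersect = all(a & b for a, b in combinations(quorums, 2))
```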

45
Σ is sufficient to implement registers
  • Adapt the correct-majority-based algorithm of
    [ABD95] to implement a (1 reader, 1 writer)
    atomic register using Σ
  • Substitute
  • "process p waits until a majority of
    processes reply"
  • with
  • "process p waits until all processes in Σ
    reply"
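
The substitution only changes the condition a process waits on. The sketch below contrasts the two conditions; the function names and the `query_quorum` callback, which stands for querying the quorum failure detector, are assumptions made for this example. The quorum detector is queried again on every check, since its output may change while the process waits.

```python
from typing import Callable, Set


def majority_replied(replied: Set[str], n: int) -> bool:
    """Original wait condition: more than half of the n processes replied."""
    return 2 * len(replied) > n


def quorum_replied(replied: Set[str],
                   query_quorum: Callable[[], Set[str]]) -> bool:
    """Substituted wait condition: every process in the set currently
    output by the quorum failure detector has replied."""
    return query_quorum() <= replied


# Five processes, three of them crashed: only p1 and p2 ever reply.
replied = {"p1", "p2"}
stuck = majority_replied(replied, n=5)                  # never satisfied
ready = quorum_replied(replied, lambda: {"p1", "p2"})   # satisfied
waiting = quorum_replied(replied, lambda: {"p1", "p3"})  # p3 has not replied
```

With a correct majority the first condition is eventually satisfied; the second one does not depend on how many processes crash, which is why the adapted algorithm works in any environment.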

46
Σ is necessary to implement registers
  • Let A be any implementation of registers that
    uses some failure detector D.
  • Must show that we can extract Σ from D.
  • Each write operation involves a set of
    participants: the processes that help the
    operation take effect (w.r.t. A and D)
  • Claim: the set of participants includes at least
    one correct process

47
Extraction algorithm
  • Every process p periodically
  • writes in its register the participant sets of
    its previous writes
  • reads participant sets of other processes
  • outputs
  • the participant set of its previous write, and
  • for every known participant set S, one live
    process in S
  • All output sets intersect and eventually contain
    only correct processes

48
Emulating Σ: the reduction algorithm
  • Let Pi(k) be the set of participants in the k-th
    write operation by process i
  • Round k:
  • Ei := {Pi(j)}, j < k
  • write(Ei) to register Ri
  • Ei := Ei U Pi(k)
  • send (k,?) to all
  • for every j = 1, ..., n, wait until received (k,ack)
    from at least one process in every S read in
    register Rj
  • current output of Σ := set of all processes
    from which (k,ack) was received, plus Pi(k-1)
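
The last two steps of a round can be written as two small set computations. This is a sketch of those steps only, not of the whole reduction: the function names are hypothetical, and message passing and register access are left out.

```python
from typing import Iterable, Set


def wait_satisfied(acks: Set[str], sets_read: Iterable[Set[str]]) -> bool:
    """True once an ack has arrived from at least one process in every
    participant set read from the registers."""
    return all(acks & s for s in sets_read)


def emulated_output(acks: Set[str], previous_participants: Set[str]) -> Set[str]:
    """Current emulated output: the processes that acked in this round,
    plus the participants of the process's own previous write."""
    return acks | previous_participants


sets_read = [{"p1", "p2"}, {"p2", "p3"}]
not_yet = wait_satisfied({"p1"}, sets_read)   # no ack from {p2, p3}
done = wait_satisfied({"p2"}, sets_read)      # p2 belongs to both sets
output = emulated_output({"p2"}, previous_participants={"p1", "p2"})
```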

49
Emulating Σ: the proof intuition
  • For any round k, process i stores all Pi(k')
    (k' < k) in Ri and includes Pi(k-1) in its emulated
    set Σi
  • =>
  • Any process j that reads Ri afterwards will
    include at least one process from Pi(k-1) in its
    emulated set Σj
  • =>
  • Every two emulated sets intersect
  • Eventually, only correct processes send acks
  • =>
  • Eventually, the emulated set includes only
    correct processes

50
Registers: the weakest failure detector
  • Σ is the weakest failure detector to implement
    atomic registers, in any environment

51
Consensus <=> registers + Ω
  • Ω can be used to solve consensus with registers,
    in any environment [LH94]
  • Consensus => Registers: any consensus algorithm
    can be used to implement registers, in any
    environment [Lam86, Sch90]
  • Consensus => Ω: Ω can be extracted from any
    failure detector D that solves consensus, in any
    environment [CHT96]

52
Consensus: the weakest failure detector
  • Consensus <=> registers + Ω (in any environment)
  • Σ is the weakest FD to implement registers (in
    any environment)
  • Thus,
  • (Ω, Σ) is the weakest failure detector to solve
    consensus, in any environment

53
Problem: quittable consensus (QC)
  • QC is like consensus except that
  • if a failure occurs, then processes can agree
    either
  • on one of the proposed values (as in consensus),
  • or
  • on the special value Q ("Quit")

54
Quittable consensus (QC)
  • propose(v) (v in {0,1}) returns a value in
    {0,1,Q}
  • (Q stands for "quit")
  • Agreement: no two processes return different
    values
  • Termination: every correct process eventually
    returns a value
  • Validity: only a value v in {0,1,Q} can be
    returned
  • If v in {0,1}, then some process previously
    proposed v
  • If v = Q, then a failure previously occurred

55
Failure detector Ψ
  • For some initial period of time Ψ outputs some
    predefined value Ø
  • Eventually,
  • Ψ behaves like (Ω,Σ), or
  • (only if a failure occurs) Ψ behaves like FS
    (outputs red)
  • NB: If a failure occurs, Ψ can choose to behave
    like (Ω,Σ) or like FS (the choice is the same at
    all processes)

56
Ψ is sufficient to solve QC
  • Propose(v) // v in {0,1}
  • wait until Ψ ≠ Ø
  • if Ψ = red then return Q // If Ψ behaves like
    FS
  • d := ConsPropose(v) // If Ψ behaves like
    (Ω,Σ)
  • // run a consensus algorithm
  • return d
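
The pseudocode above translates almost line by line into Python. In this sketch the failure detector and the consensus algorithm are passed in as callbacks; the names `qc_propose`, `query_fd` and `cons_propose`, and the use of `None` for the predefined initial value Ø, are assumptions made for illustration.

```python
from typing import Any, Callable

Q = "Q"          # the special "quit" value
INITIAL = None   # stands for the predefined initial output Ø


def qc_propose(v: int,
               query_fd: Callable[[], Any],
               cons_propose: Callable[[int], int]) -> Any:
    """Quittable consensus on top of the failure detector of this slide."""
    out = query_fd()
    while out is INITIAL:        # wait until the output leaves Ø
        out = query_fd()
    if out == "red":             # the detector behaves like FS
        return Q
    return cons_propose(v)       # otherwise run a consensus algorithm


# Stub detector that switches to red: the process quits.
red_outputs = iter([None, None, "red"])
quit_result = qc_propose(1, lambda: next(red_outputs), lambda v: v)

# Stub detector that switches to a (leader, quorum) pair; the consensus
# stub simply returns the proposal.
pair_outputs = iter([None, ("p1", frozenset({"p1", "p2"}))])
cons_result = qc_propose(1, lambda: next(pair_outputs), lambda v: v)
```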

57
Ψ is necessary to solve QC
  • Let A be a QC algorithm that uses a failure
    detector D.
  • Must show that we can extract Ψ from A
    and D

58
Simulating runs of A
  • Every process periodically samples D and
    exchanges its FD samples with other processes
  • => using these FD samples, the process locally
    simulates runs of A [CHT96]
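
The sampling step can be pictured as each process maintaining a growing directed acyclic graph of failure detector samples, in the spirit of the construction in [CHT96]. The sketch below is a simplified illustration under assumptions: the class `SampleDag` and its methods are hypothetical, and a vertex is a triple (process, value, sample number) with an edge from every vertex already known to each new sample.

```python
from typing import Hashable, List, Set, Tuple

Vertex = Tuple[str, Hashable, int]


class SampleDag:
    """Local DAG of failure detector samples kept by one process."""

    def __init__(self) -> None:
        self.vertices: List[Vertex] = []
        self.edges: Set[Tuple[Vertex, Vertex]] = set()

    def add_sample(self, process: str, value: Hashable) -> Vertex:
        """Record a new local sample; it comes after everything known so far."""
        k = 1 + sum(1 for (p, _, _) in self.vertices if p == process)
        vertex = (process, value, k)
        self.edges |= {(u, vertex) for u in self.vertices}
        self.vertices.append(vertex)
        return vertex

    def merge(self, other: "SampleDag") -> None:
        """Incorporate the samples received from another process."""
        for v in other.vertices:
            if v not in self.vertices:
                self.vertices.append(v)
        self.edges |= other.edges


dag_p1, dag_p2 = SampleDag(), SampleDag()
a = dag_p1.add_sample("p1", "d1")
b = dag_p2.add_sample("p2", "d2")
dag_p1.merge(dag_p2)                # p1 receives p2's samples
c = dag_p1.add_sample("p1", "d3")   # ordered after both earlier samples
```

Paths in such a graph give sequences of samples that are consistent with real time, which is what a process needs to drive a local simulation of A.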

[Figure: processes p, q and r each sample D and locally simulate A]
59
Extracting Ψ
  • Each process pi runs the simulation until, for
    every j = 1, ..., n, there is a simulated run starting
    from Ij in which pi decides.
  • If pi decides Q in one of the simulated runs:
    propose 0 to QC.
  • Otherwise, propose 1 to QC.
  • If QC decides 0 or Q: output red.
  • Otherwise, it is possible to output (Ω,Σ).

60
Extracting (Ω,Σ)
  • If there are enough simulated runs of A in
    which non-Q values are decided, then it is
    possible to extract (Ω,Σ).
  • Extracting Ω: like in CHT, locating a critical
    index, etc. (by construction, a critical index
    exists)
  • Extracting Σ: a novel technique

61
QC: the weakest failure detector
  • Ψ is the weakest failure detector to solve QC, in
    any environment

62
Problem: NBAC
  • A set of processes need to agree on whether to
    commit or to abort a transaction.
  • Initially, each process votes Yes (I want to
    commit) or No (We must abort)
  • Eventually, processes must reach a common
    decision (Commit or Abort).

63
Problem: NBAC
  • Agreement: no two processes return different
    values
  • Termination: every correct process eventually
    returns a value
  • Validity: a value in {Commit, Abort} is returned
  • If Commit is returned, then every process voted
    Yes
  • If Abort is returned, then some process voted No
    or a failure previously occurred

64
NBAC <=> QC + FS
  • NBAC => QC: Any algorithm for NBAC
    can be used to solve QC
  • NBAC => FS: Any algorithm for
    NBAC can be used to extract FS
  • QC + FS => NBAC: given (a) any algorithm
    for QC and (b) FS, we can solve NBAC

65
(QC,FS) => NBAC
  • Given (a) any algorithm for QC and (b) FS, we can
    solve NBAC
  • send v to all
  • wait until received all votes or FS outputs red
  • // wait until all votes are received or a
    failure occurs
  • if all votes are received and are Yes then
  • proposal := 1 // propose to commit
  • else
  • proposal := 0 // propose to abort
  • if QC.Propose(proposal) returns 1 then
  • return Commit
  • else
  • return Abort
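
The steps above can be sketched in Python with the communication layer, FS and QC supplied as callbacks. All names here (`nbac_decide`, `receive_votes`, `query_fs`, `qc_propose`) are assumptions made for illustration, and the initial "send v to all" step is taken to have happened already, so that `receive_votes()` returns the votes collected so far, including the caller's own.

```python
from typing import Any, Callable, Dict


def nbac_decide(receive_votes: Callable[[], Dict[str, str]],
                query_fs: Callable[[], str],
                qc_propose: Callable[[int], Any],
                n: int) -> str:
    """Non-blocking atomic commit from quittable consensus and FS."""
    # Wait until all n votes are received or FS signals a failure.
    while True:
        votes = receive_votes()
        if len(votes) == n or query_fs() == "red":
            break
    all_yes = len(votes) == n and all(v == "Yes" for v in votes.values())
    proposal = 1 if all_yes else 0       # 1: propose commit, 0: propose abort
    # Any QC outcome other than 1 (that is, 0 or Q) leads to Abort.
    return "Commit" if qc_propose(proposal) == 1 else "Abort"


def identity(p: int) -> int:
    """Stub QC that decides the caller's own proposal."""
    return p


yes3 = {"p1": "Yes", "p2": "Yes", "p3": "Yes"}
commit = nbac_decide(lambda: yes3, lambda: "green", identity, 3)
abort_no = nbac_decide(lambda: {**yes3, "p2": "No"}, lambda: "green", identity, 3)
abort_crash = nbac_decide(lambda: {"p1": "Yes", "p2": "Yes"},
                          lambda: "red", identity, 3)
abort_quit = nbac_decide(lambda: yes3, lambda: "green", lambda p: "Q", 3)
```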

66
NBAC: the weakest failure detector
  • NBAC <=> QC + FS (in any environment)
  • Ψ is the weakest FD to solve QC (in any
    environment)
  • Thus,
  • (Ψ,FS) is the weakest failure detector to solve
    NBAC, in any environment

67
Problem: boosting consensus power
  • Assume that processes communicate through atomic
    (wait-free linearizable) objects.
  • An object type specifies the interface of the
    object
  • The set of states
  • The set of operations
  • The set of possible state transitions

68
Problem: boosting consensus power
  • Consensus power [Herlihy, 1991] of an object type
    T is the maximum number of processes that can
    solve consensus using atomic objects of type T
    and registers.
  • cons(Register) = 1
  • cons(TS) = 2
  • cons(CS) = infinity
  • By definition, given a type T with consensus
    power n, n + 1 processes cannot solve consensus
    using objects of type T and registers.
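
To make a consensus power of 2 concrete, here is the classic two-process consensus protocol built from one test-and-set object and two registers. It is a sequential sketch: each method call stands for one atomic step, and the class names are chosen for this example. Each process first announces its proposal in its own register and then calls test-and-set. The winner decides its own value, and the loser adopts the winner's announced value.

```python
from typing import Any, List, Optional


class TASObject:
    """Test-and-set object: returns the old bit and sets it to True."""

    def __init__(self) -> None:
        self._bit = False

    def test_and_set(self) -> bool:
        old = self._bit
        self._bit = True
        return old


class TwoProcessConsensus:
    """Consensus for processes 0 and 1 using registers and one TAS object."""

    def __init__(self) -> None:
        self.registers: List[Optional[Any]] = [None, None]
        self.tas = TASObject()

    def propose(self, i: int, v: Any) -> Any:
        self.registers[i] = v               # announce the proposal first
        if not self.tas.test_and_set():     # first caller wins
            return v
        # The winner wrote its register before winning, so the value is there.
        return self.registers[1 - i]


c = TwoProcessConsensus()
d1 = c.propose(1, "b")   # process 1 gets there first and decides "b"
d0 = c.propose(0, "a")   # process 0 loses and adopts "b"
```

The protocol does not extend to three processes: a loser would not know which of the other two registers holds the winner's value.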

69
Problem: boosting consensus power
  • n + 1 processes
  • Registers
  • Shared objects of type T, cons(T) = n
  • What is the weakest failure detector D to solve
    consensus?

70
Neiger's conjecture [Nei95]
  • Ω(k) outputs a set of at most k processes so
    that,
  • Eventually, all correct processes detect the same
    set that includes at least one correct process
  • Ω(k+1) is weaker than Ω(k)
  • Ω(n) is sufficient to solve (n + 1)-process
    consensus using objects of T and registers.
  • Is Ω(n) necessary?

71
Partial response
  • Yes, if T is one-shot deterministic.
  • Every operation triggers exactly one transition
  • At most one operation on an object of type T is
    allowed for every process

72
Partial response
  • Theorem: Ω(n) is necessary to implement wait-free
    (n + 1)-process consensus with registers and
    objects of a one-shot deterministic type T such
    that cons(T) = n.
  • Corollary: Ω(n) is necessary to implement
    (n + 1)-process consensus using registers and (n
    - 1)-resilient objects of any types.

73
The sources
  • C. Delporte-Gallet, H. Fauconnier, R. Guerraoui,
    V. Hadzilacos, P. Kouznetsov, and S. Toueg.
    The weakest failure detectors to solve certain
    fundamental problems in distributed computing.
    PODC 2004.
  • R. Guerraoui and P. Kouznetsov.
    Failure Detectors and Type Boosters.
    DISC 2003.
  • C. Delporte-Gallet, H. Fauconnier, R. Guerraoui,
    and P. Kouznetsov.
    Mutual Exclusion in Asynchronous Systems with
    Failure Detectors.
    To appear in JPDC 2005.

74
  • Thank you!