Title: The weakest failure detector question in distributed computing
1The weakest failure detector question in
distributed computing
Petr Kouznetsov Distributed Programming Lab EPFL
2Outline
- Impossibility results and failure detectors
- Model: asynchronous system with failure detectors
- The weakest failure detector question and the CHT proof
- Determining the weakest failure detectors for various problems
- (implementing shared memory, solving consensus, solving non-blocking atomic commit, boosting consensus power of atomic objects)
3Centralized computing
Clients
Centralized computing unit
4Distributed computing
Clients
Distributed computing unit
5Redundancy and synchronization
- The distributed implementation should create an illusion of a centralized one
- The components (processes) must be synchronized in a consistent way.
6Consensus
- Processes propose values and must agree on a common value in a non-trivial manner
- Agreement: no two correct processes decide differently
- Validity: every decided value is a proposed value
- Termination: every correct process eventually decides
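The three properties can be checked mechanically on a finished run. The following is a minimal sketch, assuming a run is summarized by three hypothetical inputs (`proposals`, `decisions`, `correct`) that are not part of the talk; it only illustrates what each property requires.

```python
def check_consensus(proposals, decisions, correct):
    """Check one finished run against the three consensus properties.

    proposals: dict process -> proposed value
    decisions: dict process -> decided value (absent if it never decided)
    correct:   set of processes that did not crash in this run
    """
    decided_by_correct = {decisions[p] for p in correct if p in decisions}
    return {
        # Agreement: correct processes decided at most one distinct value.
        "agreement": len(decided_by_correct) <= 1,
        # Validity: every decided value was proposed by someone.
        "validity": all(v in proposals.values() for v in decisions.values()),
        # Termination: every correct process has decided.
        "termination": all(p in decisions for p in correct),
    }


# Process 3 crashed before deciding; processes 1 and 2 are correct.
run = check_consensus(
    proposals={1: "a", 2: "b", 3: "b"},
    decisions={1: "b", 2: "b"},
    correct={1, 2},
)
```

Termination is a liveness property, so on a finite run the check can only confirm that the correct processes have decided by the end of the observation.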
7Ideal computing
- The consistency and progress of the implementation are preserved even if
- Processes can fail by crashing
- The system is asynchronous
- Communication is not bounded
- Processing is not bounded
- (There is no bound Δ such that, after taking Δ local steps, a process is sure to have heard from every correct process.)
8FLP impossibility
- Consensus is impossible in an asynchronous system if at least one process might crash.
- [Fischer, Lynch and Paterson, 1985]
9Adding (some) synchrony
- Consensus is impossible in a system with asynchronous processing or asynchronous communication if at least one process might crash. [Dolev, Dwork, Stockmeyer, 1987]
- (Also in a shared memory system [Loui, Abu-Amara, 1987])
10Why?
- It is impossible to distinguish a crashed
process from a sleeping one, no matter how many
steps you take.
11Adding partial synchrony
- Assume that in every execution there is an upper bound on the time to execute a processing step and to communicate a message.
- Consensus is solvable if a majority of processes are correct.
- (If communication is synchronous and processing is partially synchronous, then consensus is solvable for any number of failures.)
- [Dwork, Lynch, Stockmeyer, 1988]
12Adding less synchrony
- Assume we (eventually) have a leader, i.e.,
eventually all processes that take enough steps
will hear from some correct process.
13Eventual leader abstraction Ω
- At every process, Ω outputs a process identifier.
- Eventually, the same correct process id is output at all processes.
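To make the eventual leader property concrete, here is a small sketch that inspects a finite trace of leader outputs. The trace format and the function name `stable_leader` are assumptions made for illustration. Because a finite trace cannot witness "eventually", the check approximates it by looking at the final observation and the stable suffix leading up to it.

```python
def stable_leader(trace, correct):
    """Return (leader, start) if the trace stabilizes on a correct leader.

    trace:   list of dicts, one per observation time, mapping each live
             process to the identifier output by its leader module.
    correct: set of processes that never crash.
    Returns None if the correct processes disagree at the end of the trace
    or agree on a process that is not correct.
    """
    last = trace[-1]
    leaders = {last[p] for p in correct}
    if len(leaders) != 1:
        return None
    leader = leaders.pop()
    if leader not in correct:
        return None
    # Walk backwards to find the first step of the stable suffix.
    start = len(trace) - 1
    while start > 0 and all(trace[start - 1].get(p) == leader for p in correct):
        start -= 1
    return leader, start


trace = [
    {1: 1, 2: 2, 3: 3, 4: 4},
    {1: 4, 2: 3, 3: 3, 4: 1},
    {1: 3, 2: 3, 3: 3},        # process 4 has crashed
    {1: 3, 2: 3, 3: 3},
]
result = stable_leader(trace, correct={1, 2, 3})
```

In this hypothetical trace the outputs disagree at first and then settle on process 3 from step 2 onwards, which is all the abstraction promises: no bound is given on when stabilization happens.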
14Ω is sufficient for consensus!
- Consensus is solvable in an asynchronous system equipped with Ω, where a majority of processes are correct. [Lam90, CT91]
- (If communication is synchronous, then consensus is solvable for any number of failures.) [DLS88, LH94]
15The question
- What is the smallest amount of synchrony that
must be introduced into the asynchronous system
to solve an unsolvable problem?
16Outline
- Impossibility results and failure detectors
- Model: asynchronous system with failure detectors
- The weakest failure detector question and the CHT proof
- Determining the weakest failure detectors for various problems
- (implementing shared memory, solving consensus, solving non-blocking atomic commit (NBAC), boosting consensus power of atomic objects)
17General system model
- Processes p1, ..., pn communicate through reliable message-passing channels. (*)
- In addition, every process can query its failure detector module, which produces some (possibly incomplete and inaccurate) information about failures.
- (*) Later we also consider registers and atomic objects of given power.
18Failure detector modules
[Figure: processes p, q and r, each attached to its own FD module]
19Failure detectors
- The information output to the processes depends only on failures
[Figure: process p queries its FD module for information on failures; the event fail(q) is shown]
20Example: perfect failure detector P
- At each process, P outputs a set of suspected process identifiers.
- Eventually, every crashed process is suspected by every correct process
- No process is suspected before it crashes
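The two properties of P can be tested on a recorded trace of suspect sets. This is a sketch under assumed inputs (`trace`, `crash_time`, `correct` are hypothetical names); completeness is an "eventually" property, so on a finite trace it is approximated by inspecting the last observation.

```python
def check_perfect(trace, crash_time, correct):
    """Check a finite trace of P outputs; returns (accuracy, completeness).

    trace:      list indexed by time; trace[t][p] is the set of processes
                suspected at live process p at time t.
    crash_time: dict process -> time at which it crashed.
    correct:    set of processes that never crash.
    """
    # Accuracy: no process is suspected before it crashes.
    accuracy = all(
        q in crash_time and crash_time[q] <= t
        for t, outputs in enumerate(trace)
        for suspects in outputs.values()
        for q in suspects
    )
    # Completeness (finite approximation): at the end of the trace every
    # correct process suspects every crashed process.
    completeness = all(set(crash_time) <= trace[-1][p] for p in correct)
    return accuracy, completeness


# Process 4 crashes at time 1; the others learn about it at different times.
good_trace = [
    {1: set(), 2: set(), 3: set(), 4: set()},
    {1: set(), 2: {4}, 3: set()},
    {1: {4}, 2: {4}, 3: {4}},
]
verdict = check_perfect(good_trace, crash_time={4: 1}, correct={1, 2, 3})
```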
21Example: failure signal failure detector FS
- At each process, FS outputs green or red.
- If red is output, then a failure previously occurred.
- If a failure occurs, then eventually red is output at all correct processes.
22Environments
- An environment E specifies when and where failures might occur
- Examples:
- A majority of processes are correct
- At most one process crashes
23Failure detector reductions
- Failure detector D is weaker than failure detector D' if D can be extracted from D', i.e., there exists an algorithm that simulates D using D'.
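A simple instance of such an extraction, using the two example detectors introduced earlier: the failure signal detector FS can be simulated from the perfect failure detector P. Since P never suspects a process before it crashes, a non-empty suspect set proves that a failure occurred, and since every crash is eventually suspected at all correct processes, red is eventually output everywhere. The sketch below shows the local transformation; the function name `fs_from_p` is an assumption for illustration.

```python
def fs_from_p(suspected):
    """Emulate the FS output at one process from its current P output.

    suspected: the set of process ids that P currently suspects here.
    Returns "red" as soon as P suspects anyone, "green" otherwise.
    """
    return "red" if suspected else "green"


before_crash = fs_from_p(set())
after_crash = fs_from_p({4})
```

No communication is needed in this case: each process transforms its own P output. Extractions of this kind show that FS is weaker than P in the sense defined above.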
24The weakest failure detector
- D is the weakest failure detector to solve problem M in an environment E if and only if:
- D is sufficient for M in E: D can be used to solve M in E
- D is necessary for M in E: D is weaker than any failure detector D' that can be used to solve M in E
25The question
- Given a problem M and an environment E,
- what is the weakest failure detector for solving
M in E?
26Outline
- Impossibility results and failure detectors
- Model: asynchronous system with failure detectors
- The weakest failure detector question and the CHT proof
- Determining the weakest failure detectors for various problems
- (implementing shared memory, solving consensus, solving non-blocking atomic commit (NBAC), boosting consensus power of atomic objects)
27The CHT result
- The CHT Theorem: if a failure detector D implements consensus, then D implements Ω
- Corollary: Ω is the weakest failure detector for consensus with a majority of correct processes
- [Chandra, Hadzilacos and Toueg, 1996]
31[Figure: failure detector samples d1 and d3 taken by p1, and d2 and d4 taken by p2, combined into sequences of samples (p1,d1), (p1,d3), (p2,d2), (p2,d4)]
33[Figure: simulated runs of p1 and p2 from initial configurations I0, I1 and I2, ending in Decide(0) or Decide(1)]
42Outline
- Impossibility results and failure detectors
- Model: asynchronous system with failure detectors
- The weakest failure detector question and the CHT proof
- Determining the weakest failure detectors for various problems
43Problem: implementing a register
- A register is an object accessed through reads and writes
- The write(v) stores v at the register and returns ok
- The read returns the last value written at the register
- NB: In an asynchronous system, a register can be implemented if and only if a majority of processes are correct [ABD95].
44Quorum failure detector Σ
- At each process, Σ outputs a set of processes
- Any two sets (output at any times and at any processes) intersect.
- Eventually every set contains only correct processes.
- NB: Given a majority of correct processes, Σ can be implemented in an asynchronous system.
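The remark about a correct majority can be illustrated with the usual construction: a process repeatedly queries everyone, waits for replies from a majority, and outputs that set of responders. Any two majorities intersect, and crashed processes eventually stop replying. The sketch below covers only the local step of picking the quorum; the function name and the reply-list format are assumptions, and the message exchange itself is not modelled.

```python
from itertools import combinations


def majority_quorum(responders, n):
    """Return the first majority of responders, or None if too few replied.

    responders: process ids in the order their replies arrived.
    n:          total number of processes.
    """
    need = n // 2 + 1
    distinct = list(dict.fromkeys(responders))  # drop duplicate replies
    return set(distinct[:need]) if len(distinct) >= need else None


# Sanity check of the intersection property for n = 5:
# any two majorities of the same 5 processes share a member.
n = 5
majorities = [set(c) for c in combinations(range(1, n + 1), n // 2 + 1)]
all_intersect = all(a & b for a in majorities for b in majorities)

quorum = majority_quorum([3, 1, 3, 5, 2], n)
```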
45Σ is sufficient to implement registers
- Adapt the correct-majority-based algorithm of [ABD95] to implement a (1 reader, 1 writer) atomic register using Σ
- Substitute
- "process p waits until a majority of processes reply"
- with
- "process p waits until all processes in a set output by Σ reply"
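The substitution amounts to swapping one wait condition for another. A minimal sketch of the two predicates, with hypothetical names and with the set of processes that replied so far passed in explicitly:

```python
def majority_wait_done(replied, n):
    """Majority-based condition: replies from more than half of n processes."""
    return len(replied) > n // 2


def quorum_wait_done(replied, quorum_output):
    """Substituted condition: replies from every process in the quorum set
    currently output by the quorum failure detector."""
    return quorum_output <= replied


# Five processes, three of them crashed: a majority can never reply,
# but a quorum output containing only the two survivors can be satisfied.
replied = {1, 2}
stuck = majority_wait_done(replied, n=5)
done = quorum_wait_done(replied, quorum_output={1, 2})
```

The example shows why the substitution matters: the quorum-based wait can terminate in environments where fewer than a majority of processes are correct, provided the detector eventually outputs only correct processes.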
46Σ is necessary to implement registers
- Let A be any implementation of registers that uses some failure detector D.
- Must show that we can extract Σ from D.
- Each write operation involves a set of participants: the processes that help the operation take effect (w.r.t. A and D)
- Claim: the set of participants includes at least one correct process
47Extraction algorithm
- Every process p periodically
- writes in its register the participant sets of its previous writes
- reads participant sets of other processes
- outputs
- the participant set of its previous write, and
- for every known participant set S, one live process in S
- All output sets intersect and eventually contain only correct processes
48Emulating Σ: the reduction algorithm
- Let Pi(k) be the set of participants in the k-th write operation by process i
- Round k:
- Ei := { Pi(j) : j < k }
- write(Ei) to register Ri
- Ei := Ei U { Pi(k) }
- send (k,?) to all
- for every j = 1, ..., n, wait until (k,ack) is received from at least one process in every set S read in register Rj
- current output of Σ := the set of all processes from which (k,ack) was received, plus Pi(k-1)
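The last two steps of a round can be sketched as two small set operations. This is only an illustration of the wait condition and of how the emulated output is assembled; the function names are hypothetical, and the register accesses and message exchange are left out.

```python
def acks_suffice(ackers, sets_read):
    """Wait condition: some acker belongs to every participant set read."""
    return all(ackers & s for s in sets_read)


def emulated_output(ackers, previous_participants):
    """Emulated output at the end of a round: the processes that acked,
    plus the participants of the process's own previous write."""
    return set(ackers) | set(previous_participants)


# Process i: its previous write had participants {1, 2}.
out_i = emulated_output(ackers={1, 3}, previous_participants={1, 2})

# Process j read {1, 2} in i's register, so it must wait for an ack from
# process 1 or 2 before it may output anything.
ready = acks_suffice({2, 4}, [{1, 2}, {4}])
not_ready = acks_suffice({4}, [{1, 2}])
out_j = emulated_output(ackers={2, 4}, previous_participants={4, 5})
common = out_i & out_j
```

In this example the two outputs share process 2, which is exactly the process that j was forced to hear from because it belongs to a participant set published by i.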
49Emulating Σ: the proof intuition
- For any round k, process i stores all Pi(k') (k' < k) in Ri and includes Pi(k-1) in its emulated set Si
- =>
- Any process j that reads Ri afterwards will include at least one process from Pi(k-1) in its emulated set Sj
- =>
- Every two emulated sets intersect
- Eventually, only correct processes send acks
- =>
- Eventually, the emulated set includes only correct processes
50Registers: the weakest failure detector
- Σ is the weakest failure detector to implement atomic registers, in any environment
51Consensus ≡ registers + Ω
- Ω can be used to solve consensus with registers, in any environment [LH94]
- Consensus => Registers: any consensus algorithm can be used to implement registers, in any environment [Lam86, Sch90]
- Consensus => Ω: Ω can be extracted from any failure detector D that solves consensus, in any environment [CHT96]
52Consensus: the weakest failure detector
- Consensus ≡ registers + Ω (in any environment)
- Σ is the weakest FD to implement registers (in any environment)
- Thus,
- (Ω, Σ) is the weakest failure detector to solve consensus, in any environment
53Problem: quittable consensus (QC)
- QC is like consensus except that
- if a failure occurs, then processes can agree either
- on one of the proposed values (as in consensus),
- or
- on the special value Q ("Quit")
54Quittable consensus (QC)
- propose(v) (v in {0,1}) returns a value in {0,1,Q}
- (Q stands for "quit")
- Agreement: no two processes return different values
- Termination: every correct process eventually returns a value
- Validity: only a value v in {0,1,Q} can be returned
- If v in {0,1}, then some process previously proposed v
- If v = Q, then a failure previously occurred
55Failure detector Ψ
- For some initial period of time, Ψ outputs some predefined value Ø
- Eventually,
- Ψ behaves like (Ω, Σ), or
- (only if a failure occurs) Ψ behaves like FS (outputs red)
- NB: If a failure occurs, Ψ can choose to behave like (Ω, Σ) or like FS (the choice is the same at all processes)
56Ψ is sufficient to solve QC
- Propose(v) // v in {0,1}
- wait until Ψ ≠ Ø
- if Ψ = red then return Q // Ψ behaves like FS
- d := ConsPropose(v) // Ψ behaves like (Ω, Σ): run a consensus algorithm
- return d
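The Propose procedure above can be sketched in Python as follows. The detector is modelled as an iterator of successive outputs at one process and the consensus algorithm as a callable; both are stand-ins introduced for illustration (`psi_outputs`, `cons_propose` and the constants are hypothetical names), not an implementation of the detector or of consensus.

```python
BOTTOM = None   # stands in for the predefined initial value of the detector
RED = "red"
Q = "Q"


def qc_propose(v, psi_outputs, cons_propose):
    """Sketch of Propose(v) for quittable consensus.

    psi_outputs:  iterator over successive detector outputs at this process.
    cons_propose: callable that runs a consensus algorithm on the
                  leader/quorum output and returns the decided value.
    """
    for out in psi_outputs:          # wait until the output leaves BOTTOM
        if out is not BOTTOM:
            break
    else:
        raise RuntimeError("detector never left its initial value in this trace")
    if out == RED:                   # detector behaves like FS
        return Q
    return cons_propose(v)           # detector behaves like leader + quorum


quit_case = qc_propose(1, iter([BOTTOM, BOTTOM, RED]), lambda v: v)
decide_case = qc_propose(1, iter([BOTTOM, ("p3", {1, 2, 3})]), lambda v: 0)
```

Because the detector makes the same choice at all processes, either every process returns Q or every process runs the consensus algorithm, which is what gives agreement in this sketch.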
57Ψ is necessary to solve QC
- Let A be a QC algorithm that uses a failure detector D.
- Must show that we can extract Ψ from A and D
58Simulating runs of A
- Every process periodically samples D and exchanges its FD samples with other processes
- => using these FD samples, the process locally simulates runs of A [CHT96]
59Extracting Ψ
- Each process pi runs the simulation until, for every j = 1, ..., n, there is a simulated run starting from Ij in which pi decides.
- If pi decides Q in one of the simulated runs: propose 0 to QC.
- Otherwise, propose 1 to QC.
- If QC decides 0 or Q: output red.
- Otherwise, it is possible to output (Ω, Σ).
60Extracting (Ω, Σ)
- If there are enough simulated runs of A in which non-Q values are decided, then it is possible to extract (Ω, Σ).
- Extracting Ω: as in CHT, locating a critical index, etc. (by construction, a critical index exists)
- Extracting Σ: a novel technique
61QC: the weakest failure detector
- Ψ is the weakest failure detector to solve QC, in any environment
62Problem: NBAC
- A set of processes need to agree on whether to commit or to abort a transaction.
- Initially, each process votes Yes ("I want to commit") or No ("We must abort")
- Eventually, processes must reach a common decision (Commit or Abort).
63Problem: NBAC
- Agreement: no two processes return different values
- Termination: every correct process eventually returns a value
- Validity: a value in {Commit, Abort} is returned
- If Commit is returned, then every process voted Yes
- If Abort is returned, then some process voted No or a failure previously occurred
64NBAC ≡ QC + FS
- NBAC => QC: any algorithm for NBAC can be used to solve QC
- NBAC => FS: any algorithm for NBAC can be used to extract FS
- QC + FS => NBAC: given (a) any algorithm for QC and (b) FS, we can solve NBAC
65(QC, FS) => NBAC
- Given (a) any algorithm for QC and (b) FS, we can solve NBAC
- send v to all
- wait until received all votes or FS outputs red // wait until all votes are received or a failure occurs
- if all votes are received and are Yes then
- proposal := 1 // propose to commit
- else
- proposal := 0 // propose to abort
- if QC.Propose(proposal) returns 1 then
- return Commit
- else
- return Abort
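The decision step of this algorithm can be sketched in Python. The sketch assumes the waiting phase is over and is told which votes arrived and whether FS turned red; QC is passed in as a callable. The names `nbac_decide`, `votes_received`, `fs_red` and `qc_propose` are hypothetical, and the vote exchange itself is not modelled.

```python
def nbac_decide(votes_received, n, fs_red, qc_propose):
    """Decision step of NBAC built from QC and FS.

    votes_received: dict process -> "Yes"/"No" for the votes that arrived.
    n:              total number of processes.
    fs_red:         True if FS output red while waiting.
    qc_propose:     callable implementing QC; returns 0, 1 or "Q".
    """
    all_arrived = len(votes_received) == n
    if not (all_arrived or fs_red):
        raise ValueError("the wait condition has not been met yet")
    all_yes = all_arrived and all(v == "Yes" for v in votes_received.values())
    proposal = 1 if all_yes else 0          # 1: commit, 0: abort
    # Both 0 and Q lead to Abort; only a QC decision of 1 commits.
    return "Commit" if qc_propose(proposal) == 1 else "Abort"


commit = nbac_decide({1: "Yes", 2: "Yes", 3: "Yes"}, 3, False, lambda p: p)
abort_on_no = nbac_decide({1: "Yes", 2: "No", 3: "Yes"}, 3, False, lambda p: p)
abort_on_crash = nbac_decide({1: "Yes", 2: "Yes"}, 3, True, lambda p: p)
abort_on_quit = nbac_decide({1: "Yes", 2: "Yes", 3: "Yes"}, 3, False, lambda p: "Q")
```

Mapping Q to Abort is safe with respect to validity because QC returns Q only if a failure previously occurred.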
66NBAC: the weakest failure detector
- NBAC ≡ QC + FS (in any environment)
- Ψ is the weakest FD to solve QC (in any environment)
- Thus,
- (Ψ, FS) is the weakest failure detector to solve NBAC, in any environment
67Problem: boosting consensus power
- Assume that processes communicate through atomic (wait-free linearizable) objects.
- An object type specifies the interface of the object:
- The set of states
- The set of operations
- The set of possible state transitions
68Problem: boosting consensus power
- Consensus power [Herlihy, 1991] of an object type T is the maximum number of processes that can solve consensus using atomic objects of type T and registers.
- cons(Register) = 1
- cons(TS) = 2
- cons(CS) = infinity
- By definition, given a type T with consensus power n, n + 1 processes cannot solve consensus using objects of type T and registers.
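To illustrate a consensus power of 2, here is a sketch of the classic two-process consensus protocol built from one test-and-set object and two registers, assuming that TS above denotes test-and-set. Atomicity of the test-and-set is simulated with a lock, and the two processes are modelled as Python threads; the class and helper names are assumptions made for this example.

```python
import threading


class TestAndSet:
    """Atomic test-and-set bit, simulated here with a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._set = False

    def test_and_set(self):
        with self._lock:
            old, self._set = self._set, True
            return old


class TwoProcessConsensus:
    def __init__(self):
        self.tas = TestAndSet()
        self.proposed = [None, None]   # one register per process

    def propose(self, i, v):
        self.proposed[i] = v           # announce own value first
        if not self.tas.test_and_set():
            return v                   # winner decides its own value
        return self.proposed[1 - i]    # loser adopts the winner's value


def decide_both(values):
    """Run both processes concurrently and return their decisions."""
    cons = TwoProcessConsensus()
    results = [None, None]

    def run(i):
        results[i] = cons.propose(i, values[i])

    threads = [threading.Thread(target=run, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


decisions = decide_both(["a", "b"])
```

The loser can safely read the other register because the winner wrote its value before winning the test-and-set. With a third process this trick no longer works: a loser cannot tell which of the other two processes won.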
69Problem: boosting consensus power
- n + 1 processes
- Registers
- Shared objects of type T, cons(T) = n
- What is the weakest failure detector D to solve consensus?
70Neiger's conjecture [Nei95]
- Ω(k) outputs a set of at most k processes so that,
- Eventually, all correct processes detect the same set that includes at least one correct process
- Ω(k+1) is weaker than Ω(k)
- Ω(n) is sufficient to solve (n + 1)-process consensus using objects of type T and registers.
- Is Ω(n) necessary?
71Partial response
- Yes, if T is one-shot deterministic.
- Every operation triggers exactly one transition
- At most one operation on an object of type T is
allowed for every process
72Partial response
- Theorem: Ω(n) is necessary to implement wait-free (n + 1)-process consensus with registers and objects of a one-shot deterministic type T such that cons(T) = n.
- Corollary: Ω(n) is necessary to implement (n + 1)-process consensus using registers and (n - 1)-resilient objects of any types.
73The sources
- C. Delporte-Gallet, H. Fauconnier, R. Guerraoui, V. Hadzilacos, P. Kouznetsov, and S. Toueg. The weakest failure detectors to solve certain fundamental problems in distributed computing. PODC 2004.
- R. Guerraoui and P. Kouznetsov. Failure Detectors and Type Boosters. DISC 2003.
- C. Delporte-Gallet, H. Fauconnier, R. Guerraoui, and P. Kouznetsov. Mutual Exclusion in Asynchronous Systems with Failure Detectors. To appear in JPDC 2005.