
1
The weakest failure detector question in
distributed computing
Petr Kouznetsov, Distributed Programming Lab, EPFL
2
Outline

Impossibility results and failure detectors
Model: asynchronous system with failure detectors
The weakest failure detector question and the CHT proof
Determining the weakest failure detectors for various problems (implementing shared memory, solving consensus, solving non-blocking atomic commit, boosting consensus power of atomic objects)

3
Centralized computing
(Figure: clients accessing a centralized computing unit)
4
Distributed computing
(Figure: clients accessing a distributed computing unit)
5
Redundancy and synchronization

The distributed implementation should create an illusion of a centralized one.
The components (processes) must be synchronized in a consistent way.

(Figure: distributed computing unit)
6
Consensus

Processes propose values and must agree on a common value in a non-trivial manner.
Agreement: no two correct processes decide differently.
Validity: every decided value is a proposed value.
Termination: every correct process eventually decides.
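
The three properties can be checked mechanically on a finished, finite run. The sketch below is a hypothetical test helper (the name `check_consensus` and the dictionary encoding of a run are assumptions, not part of the slides); termination is approximated by requiring that every correct process has decided by the end of the recorded run.

```python
from typing import Dict, Set

def check_consensus(proposals: Dict[str, int],
                    decisions: Dict[str, int],
                    correct: Set[str]) -> bool:
    """Check one finished run against the consensus properties.

    proposals: value proposed by each process
    decisions: value decided by each process that decided
    correct:   processes that never crash in this run
    """
    # Agreement: no two correct processes decide differently.
    agreement = len({decisions[p] for p in correct if p in decisions}) <= 1
    # Validity: every decided value is a proposed value.
    validity = all(v in proposals.values() for v in decisions.values())
    # Termination (finite-run approximation): every correct process decided.
    termination = all(p in decisions for p in correct)
    return agreement and validity and termination


# p3 crashes; agreement as stated on the slide only constrains correct processes.
print(check_consensus({"p1": 0, "p2": 1, "p3": 1},
                      {"p1": 1, "p2": 1, "p3": 0},
                      correct={"p1", "p2"}))  # True
```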

7
Ideal computing

The consistency and progress of the implementation are preserved even if:
Processes can fail by crashing.
The system is asynchronous:
Communication is not bounded.
Processing is not bounded.
(There is no bound such that, after taking that many local steps, a process can surely hear from every correct process.)

8
FLP impossibility

Consensus is impossible in an asynchronous system
if at least one process might crash.
Fischer, Lynch and Paterson, 1985

9
Adding (some) synchrony

Consensus is impossible in a system with asynchronous processing or asynchronous communication if at least one process might crash.
Dolev, Dwork, Stockmeyer, 1987
(In a shared memory system: Loui, Abu-Amara, 1987)

10
Why?

It is impossible to distinguish a crashed
process from a sleeping one, no matter how many
steps you take.

11
Adding partial synchrony

Assume that in every execution there is an upper bound on the time to execute a processing step and to communicate a message.
Consensus is solvable if a majority of processes are correct.
(If communication is synchronous and processing is partially synchronous, then consensus is solvable for any number of failures.)
Dwork, Lynch, Stockmeyer, 1988

12
Adding less synchrony

Assume we (eventually) have a leader, i.e.,
eventually all processes that take enough steps
will hear from some correct process.

13
Eventual leader abstraction Ω

At every process, Ω outputs a process identifier.
Eventually, the same correct process id is output at all processes.

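
A finite trace can never prove an "eventually" property, but it can show where the outputs of Ω appear to stabilize. The helper below is a hypothetical sketch (its name and the list-of-outputs encoding are assumptions): it returns the first step from which all correct processes output the same correct process id until the end of the recorded trace.

```python
from typing import Dict, List, Optional, Set

def omega_stabilization(history: Dict[int, List[int]],
                        correct: Set[int]) -> Optional[int]:
    """Return the earliest step t such that, from t to the end of the finite
    trace, every correct process outputs the same correct leader; else None.

    history[p][t] is the process id output by Omega at process p at step t.
    Outputs at crashed processes are ignored.
    """
    length = min(len(history[p]) for p in correct)
    final = {history[p][length - 1] for p in correct}
    if len(final) != 1:
        return None              # correct processes disagree at the end
    leader = final.pop()
    if leader not in correct:
        return None              # the common leader has crashed
    t = length
    # Walk backwards while all correct processes still output the leader.
    while t > 0 and all(history[p][t - 1] == leader for p in correct):
        t -= 1
    return t


history = {1: [1, 2, 3, 3, 3], 2: [2, 3, 3, 3, 3], 3: [3, 1, 1, 3, 3], 4: [4, 4]}
print(omega_stabilization(history, correct={1, 2, 3}))  # 3
```
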
14
Ω is sufficient for consensus!

Consensus is solvable in an asynchronous system equipped with Ω, where a majority of processes are correct [Lam90, CT91].
(If communication is synchronous, then consensus is solvable for any number of failures [DLS88, LH94].)

15
The question

What is the smallest amount of synchrony that
must be introduced into the asynchronous system
to solve an unsolvable problem?

16
Outline

Impossibility results and failure detectors
Model: asynchronous system with failure detectors
The weakest failure detector question and the CHT proof
Determining the weakest failure detectors for various problems (implementing shared memory, solving consensus, solving non-blocking atomic commit (NBAC), boosting consensus power of atomic objects)

17
General system model

Processes p1, ..., pn communicate through reliable message-passing channels. (*)
In addition, every process can query its failure detector module, which produces some (possibly incomplete and inaccurate) information about failures.
(*) Later we also consider registers and atomic objects of a given power.

18
Failure detector modules
(Figure: processes p, q and r, each with its own failure detector module FD)
19
Failure detectors

The information output to the processes depends only on failures.

(Figure: process p queries its module FD, which returns information on failures such as fail(q))
20
Example: perfect failure detector P

At each process, P outputs a set of suspected process identifiers.
Eventually, every crashed process is suspected.
No process is suspected before it crashes.

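
The two properties of P can be tested on a recorded, finite history. In the sketch below (hypothetical names; a history is assumed to be a list of `(time, process, suspected_set)` samples plus each process's crash time, with `None` for correct processes), accuracy is checked on every sample, while completeness is approximated by inspecting the last sample of each correct process.

```python
from typing import Dict, List, Optional, Set, Tuple

Sample = Tuple[int, int, Set[int]]  # (time, querying process, suspected set)

def satisfies_p(samples: List[Sample],
                crash_time: Dict[int, Optional[int]]) -> bool:
    # Accuracy: no process is suspected before it crashes.
    for t, _, suspected in samples:
        for q in suspected:
            ct = crash_time[q]
            if ct is None or ct > t:
                return False
    # Completeness (finite-trace approximation): in its last sample, every
    # correct process suspects every crashed process.
    crashed = {q for q, ct in crash_time.items() if ct is not None}
    last: Dict[int, Set[int]] = {}
    for _, p, suspected in sorted(samples, key=lambda s: s[0]):
        last[p] = suspected
    correct = [p for p, ct in crash_time.items() if ct is None]
    return all(p in last and crashed <= last[p] for p in correct)


crash_time = {1: None, 2: None, 3: None, 4: 5}   # process 4 crashes at time 5
samples = [(0, 1, set()), (3, 2, set()), (6, 1, {4}), (7, 2, {4}), (8, 3, {4})]
print(satisfies_p(samples, crash_time))                  # True
print(satisfies_p(samples + [(2, 3, {4})], crash_time))  # False: suspected at time 2
```
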
21
Example: failure signal failure detector FS

At each process, FS outputs green or red.
If red is output, then a failure previously occurred.
If a failure occurs, then eventually red is output at all correct processes.

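
FS carries less information than P, which gives a small example of extracting one failure detector from another. The sketch below is an illustration, not taken from the slides: a process outputs red as soon as its local P module has suspected anyone, and keeps outputting red afterwards. Accuracy of P makes red safe (a suspicion implies an earlier crash), and completeness of P makes every correct process eventually output red after a failure. The class name `FSFromP` is hypothetical.

```python
from typing import Set

class FSFromP:
    """Emulate the FS module of one process from its P module (sketch)."""

    def __init__(self) -> None:
        self.red = False

    def query(self, p_output: Set[int]) -> str:
        # P never suspects a process before it crashes, so any suspicion
        # proves that a failure has occurred; latch red from then on.
        if p_output:
            self.red = True
        return "red" if self.red else "green"


fs = FSFromP()
print(fs.query(set()), fs.query({4}), fs.query(set()))  # green red red
```
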
22
Environments

An environment E specifies when and where failures might occur.
Examples:
A majority of processes are correct.
At most one process crashes.
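
One simple way to encode the two example environments is as predicates over the set of processes that crash. This is a simplification assumed only for illustration: a real environment constrains failure patterns, i.e., also when crashes happen, which the sketch ignores. The function names are hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List

Environment = Callable[[FrozenSet[int], int], bool]

def majority_correct(crashed: FrozenSet[int], n: int) -> bool:
    # Strictly more than half of the n processes never crash.
    return 2 * (n - len(crashed)) > n

def at_most_one_crash(crashed: FrozenSet[int], n: int) -> bool:
    return len(crashed) <= 1

def allowed_crash_sets(env: Environment, n: int) -> List[FrozenSet[int]]:
    """Enumerate every set of crashed processes that env allows."""
    procs = range(1, n + 1)
    return [frozenset(c) for k in range(n + 1)
            for c in combinations(procs, k) if env(frozenset(c), n)]


print(len(allowed_crash_sets(majority_correct, 5)))   # 16 (1 + 5 + 10)
print(len(allowed_crash_sets(at_most_one_crash, 5)))  # 6 (1 + 5)
```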

23
Failure detector reductions

Failure detector D is weaker than failure detector D' if D can be extracted from D', i.e., there exists an algorithm that simulates D using D'.

(Figure: at processes p, q and r, the reduction algorithm uses the local module of D' to emulate a module of D)
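
As a concrete instance of such a reduction (a standard textbook example, not one given in the slides), Ω can be extracted from P: each process outputs the smallest process id that its P module does not currently suspect. P never suspects a correct process and eventually suspects every crashed one, so eventually all correct processes output the smallest correct id. The sketch assumes integer ids and at least one correct process; `omega_from_p` is a hypothetical name.

```python
from typing import Iterable, Set

def omega_from_p(processes: Iterable[int], suspected: Set[int]) -> int:
    """One query of the emulated Omega module at some process (sketch)."""
    # Trust the smallest id that P does not suspect. Since P is accurate,
    # this never skips a correct process, and once P suspects every crashed
    # process the result is the smallest correct id at all correct processes.
    return min(p for p in processes if p not in suspected)


print(omega_from_p([1, 2, 3, 4], suspected=set()))   # 1
print(omega_from_p([1, 2, 3, 4], suspected={1, 3}))  # 2
```

In the terminology of this slide, the reduction shows that Ω is weaker than P.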
24
The weakest failure detector

D is the weakest failure detector to solve problem M in an environment E if and only if:
D is sufficient for M in E: D can be used to solve M in E;
D is necessary for M in E: D is weaker than any failure detector D' that can be used to solve M in E.

25
The question

Given a problem M and an environment E,
what is the weakest failure detector for solving
M in E?

26
Outline

Impossibility results and failure detectors
Model: asynchronous system with failure detectors
The weakest failure detector question and the CHT proof
Determining the weakest failure detectors for various problems (implementing shared memory, solving consensus, solving non-blocking atomic commit (NBAC), boosting consensus power of atomic objects)

27
The CHT result

The CHT Theorem: if a failure detector D implements consensus, then D implements Ω.
Corollary: Ω is the weakest failure detector for consensus with a majority of correct processes.
Chandra, Hadzilacos and Toueg, 1996

31
(Figure: failure detector samples d1, d3 taken by p1 and d2, d4 taken by p2, arranged at both processes as sample DAGs over the vertices (p1,d1), (p1,d3), (p2,d2), (p2,d4))
33
(Figure: simulated runs of p1 and p2 from initial configurations I0, I1 and I2, labelled with the decisions Decide(0) and Decide(1))
42
Outline

Impossibility results and failure detectors
Model: asynchronous system with failure detectors
The weakest failure detector question and the CHT proof
Determining the weakest failure detectors for various problems

43
Problem: implementing a register

A register is an object accessed through reads and writes.
The write(v) operation stores v at the register and returns ok.
The read operation returns the last value written at the register.
NB: In an asynchronous system, a register can be implemented if and only if a majority of processes are correct [ABD95].

44
Quorum failure detector Σ

At each process, Σ outputs a set of processes.
Any two sets (output at any times and at any processes) intersect.
Eventually, every set contains only correct processes.
NB: Given a majority of correct processes, Σ can be implemented in an asynchronous system.
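
The intersection requirement is what majority quorums provide: any two sets of more than n/2 processes out of n share a member. The brute-force check below (hypothetical helper names) illustrates this, which is why, when a majority of processes is correct, outputting the latest majority of processes heard from is one plausible way to implement Σ. The sketch does not cover the "eventually only correct processes" property.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence

def majorities(n: int) -> List[FrozenSet[int]]:
    """All minimal majorities (sets of n // 2 + 1 processes) of {1, ..., n}.
    Any larger set contains one of these, so it intersects them too."""
    return [frozenset(c) for c in combinations(range(1, n + 1), n // 2 + 1)]

def pairwise_intersect(sets: Sequence[FrozenSet[int]]) -> bool:
    """Sigma's intersection property for a finite collection of outputs."""
    return all(a & b for a, b in combinations(sets, 2))


print(pairwise_intersect(majorities(5)))                           # True
print(pairwise_intersect([frozenset({1, 2}), frozenset({3, 4})]))  # False
```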

45
Σ is sufficient to implement registers

Adapt the correct-majority-based algorithm of [ABD95] to implement a (1-reader, 1-writer) atomic register using Σ:
substitute
"process p waits until a majority of processes reply"
with
"process p waits until all processes in the set output by Σ reply".
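
The substitution can be illustrated with a sequential simulation of an ABD-style single-writer register whose quorum choice is pluggable. This is a simplified sketch, not the ABD95 algorithm itself: there is no concurrency or message loss, "waiting for replies" is modelled by updating every replica in the chosen set, and the `quorum` callback stands in for either a majority or the current output of Σ. All names are hypothetical.

```python
from itertools import cycle
from typing import Callable, Dict, FrozenSet, Tuple

class SimulatedABD:
    """Sequential, single-writer sketch of an ABD-style register."""

    def __init__(self, replicas: int, quorum: Callable[[], FrozenSet[int]]) -> None:
        # Each replica stores a (timestamp, value) pair.
        self.store: Dict[int, Tuple[int, object]] = {
            r: (0, None) for r in range(1, replicas + 1)}
        self.quorum = quorum   # returns the set of replicas to wait for
        self.ts = 0            # the writer's local timestamp

    def _update(self, ts: int, value: object) -> None:
        # Every replica in the quorum adopts (ts, value) if it is newer.
        for r in self.quorum():
            if ts > self.store[r][0]:
                self.store[r] = (ts, value)

    def write(self, value: object) -> str:
        self.ts += 1
        self._update(self.ts, value)
        return "ok"

    def read(self) -> object:
        # Query a quorum and keep the pair with the highest timestamp ...
        ts, value = max((self.store[r] for r in self.quorum()),
                        key=lambda pair: pair[0])
        # ... then write it back so that later reads cannot see older values.
        self._update(ts, value)
        return value


# Intersecting quorums (majorities of 5): the read sees the last write.
reg = SimulatedABD(5, cycle([frozenset({1, 2, 3}), frozenset({3, 4, 5}),
                             frozenset({1, 4, 5})]).__next__)
reg.write("a")
reg.write("b")
print(reg.read())  # b

# Disjoint quorums violate Sigma's intersection property: the read is stale.
bad = SimulatedABD(4, cycle([frozenset({1, 2}), frozenset({3, 4})]).__next__)
bad.write("a")
print(bad.read())  # None
```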

46
Σ is necessary to implement registers

Let A be any implementation of registers that uses some failure detector D.
We must show that Σ can be extracted from D.
Each write operation involves a set of participants: the processes that help the operation take effect (w.r.t. A and D).
Claim: the set of participants includes at least one correct process.

47
Extraction algorithm

Every process p periodically:
writes in its register the participant sets of its previous writes;
reads the participant sets of other processes;
outputs the participant set of its previous write and, for every known participant set S, one live process in S.
All output sets intersect and eventually contain only correct processes.

48
Emulating Σ: the reduction algorithm

Let Pi(k) be the set of participants in the k-th write operation by process i.
Round k:
  Ei := {Pi(j) : j < k}
  write(Ei) to register Ri
  Ei := Ei ∪ Pi(k)
  send (k, ?) to all
  for every j ∈ {1, ..., n}, wait until (k, ack) is received from at least one process in every set S read in register Rj
  current output of Σ := the set of all processes from which (k, ack) was received, plus Pi(k-1)
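
The wait condition and the output rule of one round can be written as two small functions. This is a hypothetical sketch of those two steps only (registers, messages and the underlying register implementation A are abstracted away): `read_sets` stands for the participant sets that process i read in R1, ..., Rn, and `acks` for the processes whose (k, ack) has arrived.

```python
from typing import FrozenSet, Iterable, Set

def round_can_finish(acks: Set[int], read_sets: Iterable[FrozenSet[int]]) -> bool:
    """True once some process in every participant set read from R1..Rn has acked."""
    return all(acks & s for s in read_sets)

def emulated_output(acks: Set[int], prev_participants: FrozenSet[int]) -> FrozenSet[int]:
    """Output of the emulated Sigma at the end of round k: all processes that
    acked, plus Pi(k-1), the participants of the previous write."""
    return frozenset(acks) | prev_participants


# Process i has published Pi(k-1) = {2, 3} in Ri and outputs it as part of Si.
p_i_prev = frozenset({2, 3})
s_i = emulated_output({1, 4}, p_i_prev)
# Process j has read {2, 3} from Ri, so acks from {4, 5} alone are not enough ...
print(round_can_finish({4, 5}, [p_i_prev]))  # False
# ... it must hear from 2 or 3, which forces Sj to intersect Si.
s_j = emulated_output({3, 5}, frozenset({5}))
print(sorted(s_i & s_j))                      # [3]
```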

49
Emulating Σ: the proof intuition

For any round k, process i stores all Pi(k') (k' < k) in Ri and includes Pi(k-1) in its emulated set Si
⇒ any process j that reads Ri afterwards will include at least one process from Pi(k-1) in its emulated set Sj
⇒ every two emulated sets intersect.
Eventually, only correct processes send acks
⇒ eventually, the emulated set includes only correct processes.

50
Registers: the weakest failure detector

Σ is the weakest failure detector to implement atomic registers, in any environment.

51
Consensus ≡ registers + Ω

Ω can be used to solve consensus with registers, in any environment [LH94].
Consensus ⇒ registers: any consensus algorithm can be used to implement registers, in any environment [Lam86, Sch90].
Consensus ⇒ Ω: Ω can be extracted from any failure detector D that solves consensus, in any environment [CHT96].

52
Consensus: the weakest failure detector

Consensus ≡ registers + Ω (in any environment)
Σ is the weakest FD to implement registers (in any environment)
Thus,
(Ω, Σ) is the weakest failure detector to solve consensus, in any environment.

53
Problem: quittable consensus (QC)

QC is like consensus except that, if a failure occurs, then processes can agree either
on one of the proposed values (as in consensus), or
on the special value Q (Quit).

54
Quittable consensus (QC)

propose(v) (v ∈ {0, 1}) returns a value in {0, 1, Q} (Q stands for quit).
Agreement: no two processes return different values.
Termination: every correct process eventually returns a value.
Validity: only a value v ∈ {0, 1, Q} can be returned;
if v ∈ {0, 1}, then some process previously proposed v;
if v = Q, then a failure previously occurred.

55
Failure detector Ψ

For some initial period of time, Ψ outputs some predefined value Ø.
Eventually,
Ψ behaves like (Ω, Σ), or
(only if a failure occurs) Ψ behaves like FS (outputs red).
NB: If a failure occurs, Ψ can choose to behave like (Ω, Σ) or like FS (the choice is the same at all processes).

56
Ψ is sufficient to solve QC

Propose(v) // v ∈ {0, 1}
  wait until Ψ ≠ Ø
  if Ψ = red then return Q // Ψ behaves like FS
  d := ConsPropose(v) // Ψ behaves like (Ω, Σ): run a consensus algorithm
  return d
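
The same logic in Python, as a minimal sketch: `query_psi` and `cons_propose` are hypothetical callbacks standing for a query of the local Ψ module and for a consensus algorithm that uses Ψ's (Ω, Σ) output; Ø is modelled as `None` and FS's red as the string "red".

```python
from typing import Callable, Union

QUIT = "Q"

def qc_propose(v: int,
               query_psi: Callable[[], object],
               cons_propose: Callable[[int], int]) -> Union[int, str]:
    """Propose(v), v in {0, 1}, for quittable consensus using Psi (sketch)."""
    out = query_psi()
    while out is None:        # wait until Psi stops outputting Ø
        out = query_psi()
    if out == "red":          # Psi behaves like FS: a failure occurred
        return QUIT
    return cons_propose(v)    # Psi behaves like (Omega, Sigma): run consensus


# `lambda v: v` is a stand-in for a consensus algorithm in a one-process run.
print(qc_propose(1, iter([None, None, "red"]).__next__, lambda v: v))  # Q
print(qc_propose(1, iter([None, ("p1", frozenset({1, 2}))]).__next__,
                 lambda v: v))                                        # 1
```

Agreement relies on the slide's NB: Ψ makes the same choice at all processes, so either every process quits or every process runs the same consensus algorithm.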

57
Ψ is necessary to solve QC

Let A be a QC algorithm that uses a failure detector D.
We must show that Ψ can be extracted from A and D.

58
Simulating runs of A

Every process periodically samples D and exchanges its FD samples with other processes
⇒ using these FD samples, the process locally simulates runs of A [CHT96].

(Figure: processes p, q and r each sample their module of D and simulate A locally)
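
The sampling and exchange step can be sketched as the directed acyclic graph of samples used in [CHT96]: when a process takes a new sample, it adds a vertex with an edge from every vertex it already knows, and on receiving another process's graph it takes the union. This is a simplified, single-threaded sketch with hypothetical names; message passing and the simulation of A along paths of the graph are left out.

```python
from typing import Hashable, Set, Tuple

Vertex = Tuple[str, Hashable, int]   # (process, FD sample, local sample number)

class SampleDAG:
    def __init__(self, me: str) -> None:
        self.me = me
        self.count = 0
        self.vertices: Set[Vertex] = set()
        self.edges: Set[Tuple[Vertex, Vertex]] = set()

    def sample(self, d: Hashable) -> Vertex:
        # Record the sample after every vertex known so far: an edge u -> v
        # means sample v was taken after this process knew about sample u.
        self.count += 1
        v = (self.me, d, self.count)
        self.edges |= {(u, v) for u in self.vertices}
        self.vertices.add(v)
        return v

    def merge(self, other: "SampleDAG") -> None:
        # On receiving another process's DAG, take the union of both graphs.
        self.vertices |= other.vertices
        self.edges |= other.edges


p1, p2 = SampleDAG("p1"), SampleDAG("p2")
a = p1.sample("d1")
b = p2.sample("d2")
p2.merge(p1)            # p2 receives p1's DAG
c = p2.sample("d4")
print((a, c) in p2.edges, (b, c) in p2.edges, (a, b) in p2.edges)  # True True False
```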
59
Extracting Ψ

Each process pi runs the simulation until, for every j ∈ {1, ..., n}, there is a simulated run starting from Ij in which pi decides.
If pi decides Q in one of the simulated runs, it proposes 0 to QC.
Otherwise, it proposes 1 to QC.
If QC decides 0 or Q, output red.
Otherwise, it is possible to output (Ω, Σ).

60
Extracting (Ω, Σ)

If there are enough simulated runs of A in which non-Q values are decided, then it is possible to extract (Ω, Σ).
Extracting Ω: as in CHT, by locating a critical index, etc. (by construction, a critical index exists).
Extracting Σ: a novel technique.
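
Locating a critical index can be sketched on the summary that the simulation provides. Following the usual presentation of the CHT proof, assume initial configurations I0, ..., In, where Ij is the configuration in which p1, ..., pj propose 1 and the other processes propose 0, and that `valences[j]` is the set of non-Q values decided in simulated runs from Ij (so I0 gives {0} and In gives {1} by validity). An index i is critical if Ii is bivalent, or if I(i-1) is 0-valent and Ii is 1-valent. The function below only finds such an index; how CHT then derives a correct leader from it is not shown, and the encoding is an assumption.

```python
from typing import List, Set

def critical_index(valences: List[Set[int]]) -> int:
    """Smallest critical index i in 1..n, given valences[j] for I0..In."""
    if valences[0] != {0} or valences[-1] != {1}:
        raise ValueError("expected I0 to be 0-valent and In to be 1-valent")
    for i in range(1, len(valences)):
        if valences[i] == {0, 1}:
            return i          # Ii is bivalent
        if valences[i - 1] == {0} and valences[i] == {1}:
            return i          # valence switches from 0 to 1 between I(i-1) and Ii
    raise ValueError("every valence set must be a non-empty subset of {0, 1}")


print(critical_index([{0}, {0}, {1}]))     # 2
print(critical_index([{0}, {0, 1}, {1}]))  # 1
```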

61
QC: the weakest failure detector

Ψ is the weakest failure detector to solve QC, in any environment.

62
Problem: NBAC

A set of processes need to agree on whether to commit or to abort a transaction.
Initially, each process votes Yes ("I want to commit") or No ("We must abort").
Eventually, processes must reach a common decision (Commit or Abort).

63
Problem: NBAC

Agreement: no two processes return different values.
Termination: every correct process eventually returns a value.
Validity: a value in {Commit, Abort} is returned;
if Commit is returned, then every process voted Yes;
if Abort is returned, then some process voted No or a failure previously occurred.

64
NBAC ≡ QC + FS

NBAC ⇒ QC: any algorithm for NBAC can be used to solve QC.
NBAC ⇒ FS: any algorithm for NBAC can be used to extract FS.
QC + FS ⇒ NBAC: given (a) any algorithm for QC and (b) FS, we can solve NBAC.

65
(QC, FS) ⇒ NBAC

Given (a) any algorithm for QC and (b) FS, we can solve NBAC:
send v to all
wait until all votes are received or FS outputs red // i.e., until all votes arrive or a failure occurs
if all votes are received and are Yes then
  proposal := 1 // propose to commit
else
  proposal := 0 // propose to abort
if QC.Propose(proposal) returns 1 then
  return Commit
else
  return Abort
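
A Python rendering of this reduction, as a sketch under stated assumptions: `send_to_all`, `receive_votes`, `query_fs` and `qc_propose` are hypothetical callbacks for broadcasting a vote, reading the votes received so far (keyed by process id, including one's own), querying FS, and invoking the QC algorithm.

```python
from typing import Callable, Dict, Union

def nbac_vote(vote: str, n: int,
              send_to_all: Callable[[str], None],
              receive_votes: Callable[[], Dict[int, str]],
              query_fs: Callable[[], str],
              qc_propose: Callable[[int], Union[int, str]]) -> str:
    send_to_all(vote)
    # Wait until all n votes are received or FS signals a failure.
    while True:
        votes = receive_votes()
        if len(votes) == n or query_fs() == "red":
            break
    # Propose to commit only if all votes arrived and all are Yes.
    if len(votes) == n and all(v == "Yes" for v in votes.values()):
        proposal = 1
    else:
        proposal = 0
    # QC may return 1, 0 or Q; only 1 leads to Commit.
    return "Commit" if qc_propose(proposal) == 1 else "Abort"


all_yes = {1: "Yes", 2: "Yes", 3: "Yes"}
print(nbac_vote("Yes", 3, lambda v: None, lambda: all_yes,
                lambda: "green", lambda p: p))   # Commit
print(nbac_vote("Yes", 3, lambda v: None, lambda: {1: "Yes"},
                lambda: "red", lambda p: p))     # Abort
```

Validity follows from QC's: Commit requires QC to return 1, so some process proposed 1 after seeing only Yes votes; QC returns Q only after a failure, and 0 only if some process saw a No vote or a red FS output.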

66
NBAC: the weakest failure detector

NBAC ≡ QC + FS (in any environment)
Ψ is the weakest FD to solve QC (in any environment)
Thus,
(Ψ, FS) is the weakest failure detector to solve NBAC, in any environment.

67
Problem: boosting consensus power

Assume that processes communicate through atomic (wait-free linearizable) objects.
An object type specifies the interface of the object:
the set of states,
the set of operations,
the set of possible state transitions.

68
Problem: boosting consensus power

The consensus power [Herlihy, 1991] of an object type T is the maximum number of processes that can solve consensus using atomic objects of type T and registers.
cons(Register) = 1
cons(TS) = 2
cons(CS) = ∞
(TS: test-and-set; CS: compare-and-swap)
By definition, given a type T with consensus power n, n + 1 processes cannot solve consensus using objects of type T and registers.
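
The claim that cons(CS) is infinite rests on a one-step protocol, sketched below under the assumption that CS denotes compare-and-swap: each process tries to swap the initial value ⊥ for its proposal, and everyone decides the value that won. Python has no atomic compare-and-swap on plain objects, so the sketch simulates atomicity with a lock; `CompareAndSwap` and `cas_consensus` are hypothetical names, and `None` plays the role of ⊥ (so proposals must not be `None`).

```python
import threading
from typing import List, Optional

class CompareAndSwap:
    """A compare-and-swap object; a lock simulates the atomicity of cas()."""

    def __init__(self) -> None:
        self._value: Optional[int] = None
        self._lock = threading.Lock()

    def cas(self, expected: Optional[int], new: int) -> Optional[int]:
        # Atomically: if the current value equals `expected`, replace it with
        # `new`. In both cases return the value held before the operation.
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new
            return old

def cas_consensus(obj: CompareAndSwap, v: int) -> int:
    # The first cas() to find None installs its proposal; every later
    # caller gets that proposal back and decides it.
    old = obj.cas(None, v)
    return v if old is None else old


obj = CompareAndSwap()
decisions: List[Optional[int]] = [None] * 8

def run(i: int) -> None:
    decisions[i] = cas_consensus(obj, i)

threads = [threading.Thread(target=run, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(set(decisions)) == 1)  # True: all 8 threads decide the same value
```

Each process takes a single step on the shared object, so the protocol is wait-free for any number of processes.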

69
Problem: boosting consensus power

Setting: n + 1 processes communicate through registers and shared objects of type T, where cons(T) = n.
What is the weakest failure detector D to solve consensus?

70
Neiger's conjecture [Nei95]

Ω(k) outputs a set of at most k processes such that eventually all correct processes detect the same set, which includes at least one correct process.
Ω(k+1) is weaker than Ω(k).
Ω(n) is sufficient to solve (n + 1)-process consensus using objects of type T and registers.
Is Ω(n) necessary?

71
Partial response

Yes, if T is one-shot deterministic:
every operation triggers exactly one transition;
at most one operation on an object of type T is allowed for every process.

72
Partial response

Theorem O(n) is necessary to implement wait-free
(n 1)-process consensus with registers and
objects of a one-shot deterministic type T such
that cons(T) n.
Corollary O(n) is necessary to implement
(n 1)-process consensus using registers and (n
- 1)-resilient objects of any types.

73
The sources

C. Delporte-Gallet, H. Fauconnier, R. Guerraoui, V. Hadzilacos, P. Kouznetsov, and S. Toueg. The weakest failure detectors to solve certain fundamental problems in distributed computing. PODC 2004.
R. Guerraoui and P. Kouznetsov. Failure Detectors and Type Boosters. DISC 2003.
C. Delporte-Gallet, H. Fauconnier, R. Guerraoui, and P. Kouznetsov. Mutual Exclusion in Asynchronous Systems with Failure Detectors. To appear in JPDC 2005.