Title: Asynchronous Consensus
1Asynchronous Consensus
2Outline of talk
- Reminder about models
- Asynchronous consensus: the impossibility result
- Solution to the problem
  - With an oracle that detects failures
  - Without oracles, using timeouts
- Big issues? Revisit from Byzantine agreement
  - Is this model realistic? In what ways is it legitimate?
  - Should we focus on impossibility, or possibility?
  - Asynchronous consensus in real-world systems
3Distributed Computing Models
- Recall that we had two models
- To reason about networks and applications we need to be precise about the setting in which our protocols run
- But real-world networks are very complex
  - They can drop packets, or reorder them
  - Intruders might be able to intercept and modify data
  - Timing is totally unpredictable
4Asynchronous network model
- Asynchronous because we lack clocks
- Network can arbitrarily delay a message
- But we assume that messages are sequenced and retransmitted (arbitrary numbers of times), so they eventually get through
- So we are free to call the network lossless and ordered
- Assumptions about process speed have no value
- Failures in the asynchronous model?
  - Usually limited to process crash faults
  - If detectable, we call this "fail-stop", but how to detect?
5An asynchronous network
Not causal!
6An asynchronous network
Time shrinks
7An asynchronous network
Time shrinks
Time stretches
8Justification?
- If we can do something in the asynchronous model, we can probably do it even better in a real network
- Clocks and a priori knowledge can only help
- But today we will focus on an impossibility result
- By definition, impossibility in this model means "xxx can't always be done"
9Paradigms
- Fundamental problems, the solution of which yields general insight into a broad class of questions
- In distributed systems:
  - Agreement (on a value proposed by a leader)
  - Consensus (everyone proposes a value; pick one)
  - Electing a leader
  - Atomic broadcast/multicast (send a message, reliably, to everyone who isn't faulty, such that concurrent messages are delivered in the same order everywhere)
  - Deadlock detection, clock or process synchronization, taking a snapshot (picture) of the system state
10Consensus problem
- Models distributed agreement
- Comes in various forms (with subtle differences in the associated results)!
- With a leader: the leader gives an order, like "attack", and non-faulty participants either attack or do nothing, despite some limited number of failures. This is Byzantine Agreement.
- Without a leader: participants each have an initial vote; the protocol runs, and eventually all non-faulty participants choose the same outcome, and it is one of the initial votes (typically 0 or 1). This is Fault-tolerant Consensus.
11Consensus problem
[Diagram: initial votes P=0, Q=0, R=1; decisions P=1, Q=1, R=1]
12Fault-tolerance
- Goal: an algorithm tolerant of one failure
- Failure: a process crashes, but this is not detectable
- So the algorithm must work both in the face of arbitrary message delay caused by the network, and in the event of a single failure
13If some process stays up
- Suppose we knew that P won't fail
- Then P could simply broadcast its input
- All would decide upon this value
- Solves the problem
14If one process stays up
- Indeed, suppose that P stays up only long enough to send one message
- But there is only one failure
- And we know that P will lead
- Then we can relay P's message, using an all-to-all broadcast
15Algorithm
- P: broadcast my input
- Q ≠ P: on receiving P's message for the first time, broadcast a copy
- Tolerates anything except failure of P in the first step, but we need to agree upon P before starting (i.e., P is the least-ranked process, using alphabetic ranking)
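The relay algorithm can be sketched as a small simulation. This is only an illustration under stated assumptions: the function name `relay_broadcast` and the parameter `leader_reaches` (the set of processes the leader's message reaches before the leader possibly crashes) are hypothetical, message delivery is modeled as a FIFO queue, and only the leader is allowed to fail.

```python
from collections import deque

def relay_broadcast(processes, leader, leader_input, leader_reaches):
    """Simulate the relay algorithm; return the value decided by each non-leader.

    leader_reaches lists the processes that receive the leader's message
    before the leader (possibly) crashes. It is a hypothetical parameter
    used to model a partial broadcast. The leader's own decision is not modeled.
    """
    decided = {}
    # Messages in flight as (recipient, value); the leader's broadcast may be partial.
    in_flight = deque((q, leader_input) for q in leader_reaches)
    while in_flight:
        recipient, value = in_flight.popleft()
        if recipient in decided:
            continue  # only the first receipt triggers a relay
        decided[recipient] = value
        # Relay a copy to every other non-leader process.
        for other in processes:
            if other != recipient and other != leader:
                in_flight.append((other, value))
    return decided

procs = ["P", "Q", "R", "S"]
one_message = relay_broadcast(procs, "P", 1, ["R"])   # P crashes after reaching only R
no_message = relay_broadcast(procs, "P", 1, [])       # P crashes in its first step
```

In this sketch a single delivered message is enough for every surviving process to decide, while a crash before the first send leaves `no_message` empty: nobody decides, which is the one case the slide says is not tolerated.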
16Another algorithm
- All processes start by broadcasting their own value to all other processes
- If we know that there is always exactly one failure, each process could wait until n-1 messages are received, then decide using any deterministic rule
- But this doesn't work if sometimes we have one failure, sometimes none
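A tiny worked example of why the wait-for-n-1 rule breaks when no process fails. The assumptions here are not from the slides: "n-1 messages" is read as n-1 values including the process's own, the deterministic rule is "take the minimum", and the arrival order is chosen by hand to play the role of the network.

```python
def decide_after_n_minus_1(received):
    """Deterministic rule (an arbitrary choice for this sketch): take the minimum."""
    return min(received.values())

inputs = {"P": 0, "Q": 1, "R": 1}
n = len(inputs)

# Nobody has failed, but each process stops waiting once it holds n-1 values
# (its own plus the first n-2 to arrive). The network decides which arrive first.
seen_by_P = {"P": inputs["P"], "Q": inputs["Q"]}  # R's message is still in transit
seen_by_R = {"R": inputs["R"], "Q": inputs["Q"]}  # P's message is still in transit

decision_P = decide_after_n_minus_1(seen_by_P)
decision_R = decide_after_n_minus_1(seen_by_R)
```

P and R each hold n-1 = 2 values yet apply the rule to different sets, so `decision_P` and `decision_R` differ. Waiting for all n values instead would block forever in runs where one process really has crashed.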
17FLP result
- Considers the general case
- Assumes an algorithm that can decide with zero or one failures
- Proves that this algorithm can be prevented from reaching a decision, indefinitely
18Basic idea
- Think of system state as a configuration
- Configuration is v-valent if the decision to pick v has become inevitable: all runs lead to v
- If not 0-valent or 1-valent, configuration is bivalent
- Initial configurations include
  - At least one 0-valent: 0,0,0,...,0
  - At least one 1-valent: 1,1,1,...,1
  - At least one bivalent: 0,0,...,1,1
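One way to write the valency definitions compactly. The notation V(C), for the set of decision values reachable from configuration C, is introduced here for illustration and does not appear in the slides.

```latex
V(C) = \{\, v \in \{0,1\} : \text{some run starting from } C \text{ decides } v \,\}

C \text{ is 0-valent} \iff V(C) = \{0\}, \qquad
C \text{ is 1-valent} \iff V(C) = \{1\}, \qquad
C \text{ is bivalent} \iff V(C) = \{0,1\}
```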
19Basic idea
[Diagram regions: 0-valent configurations; 1-valent configurations; bivalent configurations]
20Transitions between configurations
- Configuration is a set of processes and messages
- Applying a message to a process changes its state, hence it moves us to a new configuration
- Because the system is asynchronous, we can't predict which of a set of concurrent messages will be delivered next
- But because processes only communicate by messages, this is unimportant
21Basic Lemma
- Suppose that from some configuration C, the schedules σ1, σ2 lead to configurations C1 and C2, respectively
- If the sets of processes taking actions in σ1 and σ2, respectively, are disjoint, then σ2 can be applied to C1 and σ1 to C2, and both lead to the same configuration C3
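The lemma can be checked on a toy model. In this sketch a configuration is a pair (process states, multiset of pending messages), an event delivers one pending message to one process, and the local transition `state * 10 + payload` is an arbitrary, order-sensitive choice made for illustration. None of these names or encodings come from the slides.

```python
from collections import Counter

def apply_event(config, event):
    """Deliver one pending message to one process; return the new configuration.

    config is (states, buffer): states maps process -> int, buffer is a Counter
    of pending (destination, payload) messages. The toy transition
    state * 10 + payload is an assumption of this sketch.
    """
    states, buffer = config
    dest, payload = event
    assert buffer[event] > 0, "message must be pending"
    new_states = dict(states)
    new_states[dest] = new_states[dest] * 10 + payload
    new_buffer = buffer - Counter([event])
    return new_states, new_buffer

def apply_schedule(config, schedule):
    """Apply a sequence of events in order."""
    for event in schedule:
        config = apply_event(config, event)
    return config

C = ({"P": 0, "Q": 0, "R": 0}, Counter([("P", 1), ("P", 2), ("Q", 5)]))
sigma1 = [("P", 1), ("P", 2)]  # only P takes steps
sigma2 = [("Q", 5)]            # only Q takes steps

C1 = apply_schedule(C, sigma1)
C2 = apply_schedule(C, sigma2)
C3_via_C1 = apply_schedule(C1, sigma2)
C3_via_C2 = apply_schedule(C2, sigma1)
```

Because the two schedules touch disjoint processes, both paths end in the same configuration. Reordering the two events inside `sigma1` would not commute in this model, which is why the disjointness condition matters.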
22Basic Lemma
[Diagram: from C, σ1 leads to C1 and σ2 leads to C2; then σ2 applied to C1 and σ1 applied to C2 both lead to C3]
23Main result
- No consensus protocol is totally correct in spite of one fault
- Note: uses "total" in the formal sense (guarantee of termination)
24Basic FLP theorem
- Suppose we are in a bivalent configuration now and later will enter a univalent configuration
- We can draw a form of frontier, such that a single message to a single process triggers the transition from bivalent to univalent
25Basic FLP theorem
[Diagram: configurations C and C1 in the bivalent region; D0 and D1 in the univalent region; edges labeled e]
26Single step decides
- They prove that any run that goes from a bivalent state to a univalent state has a single decision step, e
- They show that it is always possible to schedule events so as to block such steps
- Eventually, e can be scheduled, but in a state where it no longer triggers a decision
27Basic FLP theorem
- They show that we can delay this "magic message" and cause the system to take at least one step, remaining in a new bivalent configuration
- Uses the diamond relation seen earlier
- But this implies that in a bivalent state there are runs of indefinite length that remain bivalent
- Proves the impossibility of fault-tolerant consensus
28Notes on FLP
- No failures actually occur in this run, just delayed messages
- Result is purely abstract. What does it mean?
- Says nothing about how probable this adversarial run might be, only that at least one such run exists
29FLP intuition
- Suppose that we start a system up with n processes
- Run for a while, until it is close to picking the value associated with process p
- Someone will do this for the first time, presumably on receiving some message from q
- If we delay that message, and yet our protocol is fault-tolerant, it will somehow reconfigure
- Now allow the delayed message to get through, but delay some other message
30Key insight
- FLP is about forcing a system to attempt a form of reconfiguration
- This takes time
- Each "unfortunate" suspected failure causes such a reconfiguration
31FLP and our first algorithm
- P is the leader and is supposed to send its input to Q
- Q times out and
  - Tells everyone that P has apparently failed
  - Then can disseminate its own value
- If P wakes up, we re-admit it to the system, but it is no longer considered least ranked
- One can make such algorithms work
- But they can be attacked by delaying first P, then Q, then R, etc.
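The attack on a timeout-based leader scheme can be sketched as a round-by-round toy. This is a deliberately simplified model, not the algorithm from the slides: `run_rounds` and `delayed_in_round` are hypothetical names, leadership simply rotates through the ranking after each timeout, and a round "decides" as soon as the current leader's message arrives in time.

```python
def run_rounds(ranking, delayed_in_round, max_rounds):
    """Return (round, leader) at which a decision is reached, or None.

    ranking: processes in rank order; leadership moves to the next one
    after each timeout and wraps around.
    delayed_in_round(r, leader) -> True if the adversary delays the leader's
    message past the timeout in round r. Both names are hypothetical.
    """
    for r in range(max_rounds):
        leader = ranking[r % len(ranking)]
        if delayed_in_round(r, leader):
            continue  # the others time out, suspect the leader, and reconfigure
        return r, leader  # the leader's value is disseminated and decided
    return None

ranking = ["P", "Q", "R"]
friendly = run_rounds(ranking, lambda r, leader: False, 10)
briefly_slow = run_rounds(ranking, lambda r, leader: r < 2, 10)
adversarial = run_rounds(ranking, lambda r, leader: True, 1000)
```

With no interference the first leader decides immediately. An adversary that always delays whoever currently leads keeps the system reconfiguring for as many rounds as we care to simulate, even though no process ever crashes.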
32FLP in the real world
- Real systems are subject to this impossibility result
- But in fact they are often subject to even more severe limitations, such as inability to tolerate network partition failures
- Also, asynchronous consensus may be too slow for our taste
- And an FLP attack is not probable in a real system
  - Requires a very smart adversary!
33Chandra/Toueg
- Showed that FLP applies to many problems, not just consensus
- In particular, they show that FLP applies to group membership and reliable multicast
- So these practical problems are impossible in asynchronous systems, in the formal sense
- But they also look at the weakest condition under which consensus can be solved
34Chandra/Toueg Idea
- Separate the problem into
  - The consensus algorithm itself
  - A "failure detector": a form of oracle that announces suspected failures
  - But it can change its mind
- Question: what is the weakest oracle for which consensus is always solvable?
35Sample properties
- Completeness: detection of every crash
  - Strong completeness: eventually, every process that crashes is permanently suspected by every correct process
  - Weak completeness: eventually, every process that crashes is permanently suspected by some correct process
36Sample properties
- Accuracy: does it make mistakes?
  - Strong accuracy: no process is suspected before it crashes
  - Weak accuracy: some correct process is never suspected
  - Eventual strong accuracy: there is a time after which correct processes are not suspected by any correct process
  - Eventual weak accuracy: there is a time after which some correct process is not suspected by any correct process
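These properties can be tested against a recorded suspicion trace. The sketch below rests on assumptions that are not in the slides: a trace is a list of snapshots, each mapping a correct observer to the set of processes it currently suspects, and "eventually ... permanently" is approximated on a finite trace as "from some snapshot until the end of the trace". All function names are hypothetical.

```python
def holds_from_some_time(trace, predicate):
    """True if predicate holds for every snapshot from some index to the end."""
    return any(all(predicate(s) for s in trace[t:]) for t in range(len(trace)))

def strong_completeness(trace, processes, crashed):
    """Every crashed process ends up suspected by every correct process."""
    correct = [p for p in processes if p not in crashed]
    return all(
        holds_from_some_time(trace, lambda s, c=c: all(c in s[q] for q in correct))
        for c in crashed
    )

def weak_completeness(trace, processes, crashed):
    """Every crashed process ends up suspected by some correct process."""
    correct = [p for p in processes if p not in crashed]
    return all(
        any(holds_from_some_time(trace, lambda s, c=c, q=q: c in s[q]) for q in correct)
        for c in crashed
    )

def eventual_weak_accuracy(trace, processes, crashed):
    """Some correct process ends up suspected by no correct process."""
    correct = [p for p in processes if p not in crashed]
    return any(
        holds_from_some_time(trace, lambda s, p=p: all(p not in s[q] for q in correct))
        for p in correct
    )

processes = ["P", "Q", "R"]
crashed = {"R"}
trace = [
    {"P": set(), "Q": {"P"}},   # Q wrongly suspects P; nobody suspects R yet
    {"P": {"R"}, "Q": {"P"}},   # P now suspects R
    {"P": {"R"}, "Q": {"R"}},   # Q retracts its mistake and suspects R
]
mutual_suspicion = [{"P": {"Q"}, "Q": {"P"}}] * 2  # two correct processes, no crashes
```

On `trace` all three checks pass even though Q made a mistake early on, which is exactly the kind of behavior the "eventual" properties permit. On `mutual_suspicion` eventual weak accuracy fails, because every correct process stays suspected by someone.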
37A sampling of failure detectors
Completeness | Accuracy: Strong | Accuracy: Weak | Accuracy: Eventually Strong | Accuracy: Eventually Weak
Strong       | Perfect P        | Strong S       | Eventually Perfect ◇P       | Eventually Strong ◇S
Weak         | D                | Weak W         | ◇D                          | Eventually Weak ◇W
38Perfect Detector?
- Named Perfect, written P
- Strong completeness and strong accuracy
- Immediately detects all failures
- Never makes mistakes
39Example of a failure detector
- The detector they call W: "eventually weak"
- More commonly written ◇W: "diamond-W"
- Defined by two properties
  - There is a time after which every process that crashes is suspected by some correct process
  - There is a time after which some correct process is never suspected by any correct process
- Think: "we can eventually agree upon a leader." If it crashes, "we eventually, accurately detect the crash"
40◇W: Weakest failure detector
- They show that ◇W is the weakest failure detector for which consensus is guaranteed to be achieved
- Algorithm is pretty simple
  - Rotate a token around a ring of processes
  - Decision can occur once the token makes it around once without a change in failure-suspicion status for any process
  - Subsequently, as the token is passed, each recipient learns the decision outcome
41Rotating a token versus 2-phase commit
[Diagram: message flow by phase: Propose v, ack, Decide v]
42Rotating a token versus 2-phase commit
- Their protocol is basically a 2-phase commit
- But with n processes, 2PC requires 2(n-1) messages for the propose/ack phase and n-1 for the decide phase, 3(n-1) total
- Passing a token only requires n messages per phase, for 2n total (when nothing fails)
- Tolerates f < ⌈n/2⌉ failures
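The message counts on this slide can be tabulated with a few lines of arithmetic. The function names are made up for this sketch. The 2PC count assumes a coordinator exchanging propose, ack, and decide messages with each of the n-1 other processes, and the failure bound is read as f < ⌈n/2⌉ (a majority of processes stays correct).

```python
def two_phase_commit_messages(n):
    # Coordinator sends "propose" to n-1 others, collects n-1 acks,
    # then sends n-1 "decide" messages.
    return 3 * (n - 1)

def token_ring_messages(n):
    # The token visits all n processes once per phase, for two phases,
    # assuming nothing fails.
    return 2 * n

def max_tolerated_failures(n):
    # Largest f with f < ceil(n / 2).
    return (n - 1) // 2

for n in (3, 5, 10):
    print(n, two_phase_commit_messages(n), token_ring_messages(n), max_tolerated_failures(n))
```

The two counts tie at n = 3 (6 messages each), and the token ring is cheaper for any larger n. The saving comes at the cost of latency, since the token visits processes one after another.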
43Set of problems solvable in
[Figure: sets of problems solvable in each model]
Models: Synchronous systems; Asynchronous using P; Asynchronous using W; Asynchronous
Problems: clock synchronization; TRB; non-blocking atomic commit; consensus; atomic broadcast; reliable broadcast
TRB: Byzantine Generals with only crash failures
44Building systems with ◇W
- Unfortunately, this failure detector is not implementable
- Using timeouts, we can make mistakes at arbitrary times
- But with long enough timeouts, we could produce a close approximation to ◇W
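A timeout-based approximation might look like the following sketch. It is a minimal illustration, not the construction from the talk: the class name `TimeoutDetector`, the heartbeat interface, and the policy of doubling a peer's timeout after each wrong suspicion are all assumptions. Time is passed in explicitly so the behavior is deterministic.

```python
class TimeoutDetector:
    """Heartbeat-based suspicion with a per-peer timeout that grows after each mistake."""

    def __init__(self, peers, initial_timeout=1.0):
        self.timeout = {p: initial_timeout for p in peers}
        self.last_heard = {p: 0.0 for p in peers}
        self.suspected = set()

    def on_heartbeat(self, peer, now):
        self.last_heard[peer] = now
        if peer in self.suspected:
            # We suspected a live process: retract and be more patient next time.
            self.suspected.discard(peer)
            self.timeout[peer] *= 2

    def check(self, now):
        """Suspect every peer whose last heartbeat is older than its timeout."""
        for peer, heard in self.last_heard.items():
            if now - heard > self.timeout[peer]:
                self.suspected.add(peer)
        return set(self.suspected)

d = TimeoutDetector(["Q", "R"])
d.on_heartbeat("Q", 0.5)
d.on_heartbeat("R", 0.5)
early = d.check(1.0)      # both heard from recently
late = d.check(2.0)       # both silent for longer than the timeout
d.on_heartbeat("Q", 2.1)  # Q was only slow, so the suspicion was a mistake
after = d.check(3.0)
```

A crashed process stops sending heartbeats and stays suspected, which gives completeness. Accuracy is only approximated: mistakes become rarer as timeouts grow, but in a truly asynchronous network nothing bounds the delay, so a wrong suspicion can still occur at any time.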
45Would we want to?
- Question: are we solving the right problem?
- Pros and cons of asynchronous consensus
  - Think about an air traffic control application
  - Find one problem for which asynchronous consensus is a good match
  - Find one problem for which the match is poor
46French ATC system (simplified)
Onboard
Radar
X.500 Directory
Controllers
Air Traffic Database (flight plans, etc)
47Potential applications
- Maintaining replicated state within console clusters
- Distributing radar data to participants
- Distributing data over wide-area links within large geographic scale
- Management and control (administration) of the overall system
- Distributing security keys to prevent unauthorized action
- Agreement when flight control handoffs occur
48Broad conclusions?
- The protocol seems unsuitable for high-availability applications
- If the core of the system must make progress, the agreement property itself is too strong
- If a process becomes unresponsive, we might not want to wait for it to recover
- Also, since we can't implement any of these failure detectors, the whole issue is abstract
- Hence real systems don't try to solve consensus as defined and used in these kinds of protocols!
49Value of FLP/Consensus
- A clear and elegant problem statement
- Highlights limitations
  - Perhaps with clocks we can overcome them
  - More likely, we need a different notion of failure
  - Crash failure is too narrow; "unreachable" is also treated as failure in many real systems
- Caused much debate about real systems
50Nature of debate
- We'll see many practical systems soon
- Do they
  - Evade FLP in some way?
  - Are they subject to FLP? If so, what problem do they solve, given that consensus (and most problems reduce to consensus) is impossible to solve?
  - Or are they subject to even more stringent limitations?
- Is fault-tolerant consensus even an issue in real systems?