Title: Computer Science 328 Distributed Systems
1 Computer Science 328: Distributed Systems
2 Consensus
- Consensus: N processes agree on a value.
  - e.g. a synchronized action (go / abort)
- Consensus may have to be reached in the presence of failure.
  - Process failure: process crash (fail-stop failure), arbitrary failure.
  - Communication failure: lost or corrupted messages.
- In a consensus algorithm:
  - All Pi start in an undecided state.
  - Each Pi proposes a value vi from a set D and communicates it to some or all other processes.
  - Consensus is reached if all non-failed processes agree on the same value, d.
  - Each non-failed Pi sets its decision variable to d and changes its state to decided.
3 Consensus Requirements
- Termination: Eventually each correct process sets its decision value.
  - This may not be possible in the presence of process crashes in asynchronous systems.
- Agreement: The decision value is the same for all correct processes, i.e., if pi and pj are correct and have entered the decided state, then di = dj.
  - Arbitrary (Byzantine) failures may cause inconsistency and prevent agreement.
- Integrity: If all correct processes Pi propose the same value, d, then any correct process in the decided state has decision value d.
- Consensus may involve a proposal stage and an agreement stage.
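The three requirements can be phrased as checks on the outcome of a finished run. The following is a minimal sketch, not part of the original slides: the function name `check_consensus` and the dictionary layout are assumptions made for illustration, and `None` is assumed to mean "still undecided" (so it cannot be used as a proposed value).

```python
def check_consensus(proposals, decisions):
    """Check a finished run against the three consensus requirements.

    proposals: dict pid -> value proposed by each CORRECT process.
    decisions: dict pid -> decided value, or None if still undecided
               (hypothetical representation, chosen for this sketch).
    """
    correct = list(proposals)

    # Termination: every correct process has set its decision value.
    termination = all(decisions.get(p) is not None for p in correct)

    # Agreement: all correct processes that decided hold the same value.
    decided = {decisions[p] for p in correct if decisions.get(p) is not None}
    agreement = len(decided) <= 1

    # Integrity: if all correct processes proposed the same value d,
    # every decided correct process must have decided d.
    proposed = set(proposals.values())
    integrity = len(proposed) != 1 or decided <= proposed

    return {"termination": termination,
            "agreement": agreement,
            "integrity": integrity}
```

Termination is a liveness property, so a check on a finished run can only confirm that every correct process had decided by the time the run was stopped.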
4 Consensus in a Synchronous System
- For a system with at most f processes crashing, the algorithm proceeds in f+1 rounds (with timeouts), using basic multicast.
- Values_i^r: the set of proposed values known to Pi at the beginning of round r.
- Initially Values_i^0 = {}; Values_i^1 = {vi}
- for round r = 1 to f+1 do
  - multicast (Values_i^r − Values_i^(r−1))
  - Values_i^(r+1) := Values_i^r
  - for each Vj received
    - Values_i^(r+1) := Values_i^(r+1) ∪ Vj
  - end
- end
- di = minimum(Values_i^(f+1))
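The round structure above can be simulated in a few lines. This is a sketch under stated assumptions rather than a definitive implementation: rounds are modelled as loop iterations instead of real timeouts, and the `crashes` parameter (a process crashes in a given round after reaching only some recipients) is a hypothetical crash model introduced for the example.

```python
def synchronous_consensus(proposals, f, crashes=None):
    """Simulate the f+1 round consensus algorithm for crash failures.

    proposals: dict pid -> proposed value (values must be comparable).
    f:         maximum number of crashes the run is designed to tolerate.
    crashes:   dict pid -> (round, recipients). The process crashes during
               that round after sending only to `recipients`
               (hypothetical crash model for this sketch).
    Returns dict pid -> decision for the processes that never crashed.
    """
    crashes = crashes or {}
    pids = sorted(proposals)
    values = {p: {proposals[p]} for p in pids}  # Values_i^1 = {v_i}
    sent = {p: set() for p in pids}             # Values_i^0 = {}
    alive = set(pids)

    for r in range(1, f + 2):                   # rounds 1 .. f+1
        inbox = {p: set() for p in pids}
        for p in sorted(alive):
            new = values[p] - sent[p]           # values not multicast before
            if p in crashes and crashes[p][0] == r:
                recipients = set(crashes[p][1])  # partial multicast, then crash
                alive.discard(p)
            else:
                recipients = set(pids)
            for q in recipients:
                inbox[q] |= new
            sent[p] |= new
        for p in alive:                         # merge everything received
            values[p] |= inbox[p]

    return {p: min(values[p]) for p in sorted(alive)}
```

For example, with proposals `{1: 3, 2: 5, 3: 1, 4: 7}` and process 3 crashing in round 1 after reaching only process 1, a single round leaves process 1 with a smaller minimum than processes 2 and 4, whereas running f+1 = 2 rounds lets process 1 forward the value 1 so that all survivors decide 1.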
5 Proof of Correctness
- Proof by contradiction.
- Assume that two processes differ in their final set of values.
- Assume that pi possesses a value v that pj does not possess.
- ⇒ A third process, pk, sent v to pi and crashed before sending v to pj.
- ⇒ Any process sending v in the previous round must have crashed; otherwise, both pk and pj would have received v.
- ⇒ Proceeding in this way, we infer at least one crash in each of the preceding rounds.
- ⇒ But we have assumed that at most f crashes can occur, and there are f+1 rounds ⇒ contradiction.
6 Interactive Consistency Requirements
- Interactive consistency is a special case of consensus where processes agree on a vector of values, one value for each process (e.g. load, current state).
- Termination: Eventually each correct process sets its decision vector.
- Agreement: The decision vector is the same for all correct processes.
- Integrity: If Pi is correct, then all correct processes decide on vi as the ith element of the decision vector.
- The vi value for failed processes may be ignored or decided by consensus.
7 Example: Consensus and Interactive Consistency
[Figure: two panels, each showing P1, P2, and P3 running a consensus algorithm while P4 has crashed.]
- Consensus: v1 = go, v2 = go, v3 = go, v4 = abort (P4 crashed); d1 = d2 = d3 = go.
- Interactive consistency: v1 = 5, v2 = 7, v3 = 2, v4 = ? (P4 crashed); d1 = d2 = d3 = (5, 7, 2, -).
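The interactive consistency example can be reproduced with a small simulation. This is only a sketch under assumptions that the slides do not state: it considers crash failures in a synchronous system, floods (process, value) pairs for f+1 rounds, and uses `None` for the "-" entry of a process whose value never reached anyone. The function name and the `crashes` crash model are hypothetical.

```python
def interactive_consistency(proposals, f, crashes=None):
    """Build a decision vector at every surviving process (crash failures).

    proposals: dict pid -> value held by that process.
    crashes:   dict pid -> (round, recipients); the process crashes in that
               round after sending only to `recipients` (hypothetical model).
    Returns dict pid -> tuple with one entry per process, None where unknown.
    """
    crashes = crashes or {}
    pids = sorted(proposals)
    known = {p: {p: proposals[p]} for p in pids}  # each process knows its own value
    alive = set(pids)

    for r in range(1, f + 2):                     # rounds 1 .. f+1
        inbox = {p: {} for p in pids}
        for p in sorted(alive):
            if p in crashes and crashes[p][0] == r:
                recipients = list(crashes[p][1])  # partial send, then crash
                alive.discard(p)
            else:
                recipients = pids
            for q in recipients:
                inbox[q].update(known[p])         # forward all known pairs
        for p in alive:
            known[p].update(inbox[p])

    # Decision vector: element i is the value learned for process i.
    return {p: tuple(known[p].get(q) for q in pids) for p in sorted(alive)}
```

With proposals 5, 7, 2 for P1 to P3 and P4 crashing before it sends anything, every surviving process ends with the vector (5, 7, 2, None), matching the (5, 7, 2, -) outcome in the example.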
8 Agreement in Light of Failure: The Byzantine Generals
- 3 or more generals need to agree to attack or to retreat.
- Problem:
  - The commander issues the order.
  - One or more of the generals (including the commander) could be a traitor who will give wrong information.
  - Each general sends his/her information to all others (assuming reliable communication).
  - Once each general has collected all values, it determines the right value (attack or retreat).
- The requirements are termination, agreement, and integrity.
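9 Byzantine Generals in Synchronous Systems
- Now a faulty process may send any message with any value at any time, or it may omit to send any message.
- In the case of arbitrary failure, no solution exists if N ≤ 3f.
Impossibility with three processes (N = 3, f = 1): compare two scenarios, one in which the commander is correct and the other lieutenant is faulty, and one (the right scenario) in which the commander is faulty. If a solution exists, process p2 is bound to decide on value v when the commander is correct, by the integrity condition. If we accept that no algorithm can possibly distinguish between the two scenarios, p2 must also choose the value sent by the commander in the right scenario. By symmetry the other lieutenant does the same, so a faulty commander that sends different values to the two lieutenants makes them decide differently, which violates agreement.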
10 Solution with One Faulty Process
- To solve the Byzantine generals problem in a synchronous system, we require N ≥ 3f + 1.
- Consider N = 4, f = 1:
  - In the first round, the commander sends a value to each of the lieutenants.
  - In the second round, each of the lieutenants sends the value it received to its peers.
  - The correct lieutenants need only apply a simple majority function on the set of values received.
  - As N − f − 1 ≥ 2f, the majority function will ignore any faulty value.
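The two rounds for N = 4, f = 1 can be sketched as follows. This is an illustrative model, not a definitive implementation: the names `byzantine_two_rounds` and `relay_lies`, and the choice of `None` as the result when no value has a majority, are assumptions made for the example. A faulty commander is modelled by giving different lieutenants different values; a faulty lieutenant is modelled by overriding what it relays to a given peer.

```python
from collections import Counter


def majority(values, default=None):
    """Return the value held by a strict majority of `values`, else `default`."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else default


def byzantine_two_rounds(commander_sends, relay_lies=None):
    """Two-round Byzantine generals sketch for one commander, three lieutenants.

    commander_sends: dict lieutenant -> value the commander sent it (round 1).
    relay_lies:      dict (faulty_lieutenant, receiver) -> value relayed in
                     round 2 instead of the value actually received
                     (hypothetical fault model).
    Returns dict lieutenant -> decision; entries for faulty lieutenants
    are meaningless and should be ignored by the caller.
    """
    relay_lies = relay_lies or {}
    lieutenants = sorted(commander_sends)
    decisions = {}
    for me in lieutenants:
        received = [commander_sends[me]]          # round 1: from the commander
        for peer in lieutenants:                  # round 2: relayed by peers
            if peer != me:
                honest = commander_sends[peer]
                received.append(relay_lies.get((peer, me), honest))
        decisions[me] = majority(received)
    return decisions
```

If the commander is correct and lieutenant 3 lies to both peers, lieutenants 1 and 2 each see two copies of the commander's value against one faulty value, so the majority is the commander's value. If the commander is faulty and sends three different values, every correct lieutenant sees the same three values, finds no majority, and falls back to the same default, so agreement still holds.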
11 Four Byzantine Generals
12 Example: Byzantine Generals
13 Generalization of Byzantine Generals Problem
- In the general case (f ≥ 1), the algorithm operates over f + 1 rounds. In each round, a process sends to a subset of the other processes the values that it received in the previous round.
- The message overhead is O(N^(f+1)).
- No algorithm can guarantee to reach consensus in an asynchronous system, even with one process crash failure.
- Instead of detecting the faulty entity, one masks any process failures that occur (redundancy is the key).
- One workaround is to use a failure detector so that an asynchronous system can be treated as a synchronous one: processes agree to deem a process that has not responded for more than a time limit to have failed.
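The timeout rule in the last bullet can be sketched as a small class. This is a minimal sketch under assumptions: the class name `TimeoutFailureDetector`, its methods, and the use of caller-supplied timestamps (instead of a real clock and network) are all hypothetical choices made to keep the example deterministic.

```python
class TimeoutFailureDetector:
    """Suspect any process that has been silent for longer than `timeout`."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_heard = {}            # pid -> time of the last message seen

    def heartbeat(self, pid, now):
        """Record that a message from `pid` arrived at time `now`."""
        self.last_heard[pid] = now

    def suspected(self, now):
        """Return the set of processes deemed to have failed at time `now`."""
        return {pid for pid, seen in self.last_heard.items()
                if now - seen > self.timeout}


fd = TimeoutFailureDetector(timeout=5)
fd.heartbeat("p1", now=0)
fd.heartbeat("p2", now=0)
fd.heartbeat("p1", now=8)               # p1 keeps responding, p2 goes silent
print(fd.suspected(now=10))             # p2 has been silent for 10 > 5
```

A detector of this kind can be wrong: a slow but correct process is treated as failed once it exceeds the limit, which is the price of making the system behave as if it were synchronous.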