Title: Byzantine Fault Tolerance
1Byzantine Fault Tolerance
- Presented by: Wade Fagen (paper [1]) and Lucas Cook (papers [2] and [3])
2The Papers
- [1] The Byzantine Generals Problem, Lamport et al.
- [2] Practical Byzantine Fault Tolerance, Castro et al.
- [3] Preserving Peer Replicas By Rate-Limited Sampled Voting, Maniatis et al.
3The Byzantine Generals
- Let us assume we have five generals
4The Byzantine Generals
- Let us assume one is malicious
5The Byzantine Generals
- Each local general decides on an attack
6The Byzantine Generals
- and accurately relays their plan
7The Byzantine Generals
- except the random malicious node
8The Byzantine Generals
- Each general collects his or her votes
9The Byzantine Generals
- Assume each general takes the majority vote
10The Byzantine Generals
- The generals now move based upon their agreed
orders
11The Byzantine Generals
- Since fewer than half of the generals attacked, the attack failed
12The Byzantine Generals
- What's interesting: the remaining loyal nodes don't know which node(s) among them are disloyal.
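The failure walked through on the preceding slides can be reproduced in a few lines. This is a minimal sketch under assumed inputs: generals 0 and 1 plan to attack, generals 2 and 3 plan to retreat, and general 4 is the traitor, who tells each loyal general whatever that general already prefers. The names `plans`, `traitor_vote`, and `decide` are illustrative and not taken from the papers.

```python
from collections import Counter

ATTACK, RETREAT = "attack", "retreat"

# Assumed plans of the four loyal generals (ids 0-3); general 4 is the traitor.
plans = {0: ATTACK, 1: ATTACK, 2: RETREAT, 3: RETREAT}

def traitor_vote(receiver):
    """The traitor echoes each receiver's own plan in order to split the vote."""
    return plans[receiver]

def decide(general):
    """Majority over the four loyal plans plus the traitor's vote to this general."""
    votes = list(plans.values()) + [traitor_vote(general)]
    return Counter(votes).most_common(1)[0][0]

decisions = {g: decide(g) for g in plans}
attackers = [g for g, d in decisions.items() if d == ATTACK]
print(decisions)
print(f"{len(attackers)} of 5 generals attack")
```

Every loyal general followed the same majority rule, yet they end up split, and none of them can tell from its own tally who lied.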
13The Byzantine Generals
14The Byzantine General Problem
- Let the generals decide for one to be the leader
and others to simply be lieutenants.
I'll be the general!
15The Byzantine General Problem
- Now the general plans the attack
16The Byzantine General Problem
- The general sends out his or her order to all
lieutenants
17The Byzantine General Problem
- Each site records the message they received
18The Byzantine General Problem
- Each site now sends the attack plan it received to the other sites
19The Byzantine General Problem
- Again, each site records all messages received
20The Byzantine General Problem
- This process may continue for any number of rounds, but we'll stop here for now
21The Byzantine General Problem
- Each site finds the majority value of its final
round
22The Byzantine General Problem
- Result: All loyal nodes agree on the same result!
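The send, record, relay, and majority steps above can be sketched as a short recursive function in the spirit of the oral-message algorithm OM(m) of [1]. This is a simulation under stated assumptions, not the authors' code: the traitor's lying strategy in `send` (tell even-numbered receivers to attack and odd-numbered ones to retreat) is made up for illustration, and RETREAT is assumed as the default when no strict majority exists.

```python
from collections import Counter

ATTACK, RETREAT = "attack", "retreat"

def majority(values):
    """Strict majority of the recorded values, or the default RETREAT."""
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else RETREAT

def send(sender, receiver, value, traitors):
    """Loyal senders relay faithfully; traitors use an assumed lying strategy."""
    if sender in traitors:
        return ATTACK if receiver % 2 == 0 else RETREAT
    return value

def om(m, commander, lieutenants, value, traitors):
    """Return {lieutenant: decided value} after OM(m)."""
    # The commander sends its order; each site records what it received.
    received = {l: send(commander, l, value, traitors) for l in lieutenants}
    if m == 0:
        return received
    # Each lieutenant j relays what it received to the others via OM(m-1).
    relayed = {
        j: om(m - 1, j, [l for l in lieutenants if l != j], received[j], traitors)
        for j in lieutenants
    }
    # Each site takes the majority of its own value and the relayed values.
    return {
        i: majority([received[i]] + [relayed[j][i] for j in lieutenants if j != i])
        for i in lieutenants
    }

result = om(1, commander=0, lieutenants=[1, 2, 3, 4], value=ATTACK, traitors={4})
print({l: result[l] for l in (1, 2, 3)})  # decisions of the loyal lieutenants
```

With a loyal commander and traitor 4, loyal lieutenants 1 to 3 all settle on the commander's order even though the traitor relays conflicting values.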
23The Byzantine General Problem
- What assumptions were made?
- A1 Every message sent was delivered correctly.
as we didn't see
24The Byzantine General Problem
- What assumptions were made?
- A1 Every message sent was delivered correctly.
- A2 The receiver of the message knows who sent it.
as we didn't see
25The Byzantine General Problem
- What assumptions were made?
- A1 Every message sent was delivered correctly.
- A2 The receiver of the message knows who sent it.
- A3 All sites sent a message.
???
as we didn't see
26The Byzantine General Problem
- What assumptions were made?
- A1 Every message sent was delivered correctly.
- A2 The receiver of the message knows who sent it.
- A3 The absence of a message can be detected.
No message for me :(
so we might see
a pre-defined default value may be used
27The Byzantine General Problem
- How many disloyal troops can we have and still
reach consensus?
28The Byzantine General Problem
- Pre-determined general creates an attack plan
29The Byzantine General Problem
- Round 1: Send out messages, record
30The Byzantine General Problem
- Round 2: Send out messages, record
31The Byzantine General Problem
- Round 3: Send out messages, record
32The Byzantine General Problem
- Seems like 1 disloyal troop with 2 loyal troops
works
33The Byzantine General Problem
34The Byzantine General Problem
- Round 1: Send out messages, record
35The Byzantine General Problem
- Round 2: Send out messages, record
36The Byzantine General Problem
- Round 3: Send out messages, record
37The Byzantine General Problem
- Round 4: Send out messages, record
38The Byzantine General Problem
- Round 5: Send out messages, record
39The Byzantine General Problem
- Round 6: Send out messages, record
40The Byzantine General Problem
- Lamport shows (by proof):
- For a system of n+1 nodes, there cannot exist more than n/3 faulty nodes.
- Alternatively:
- There must be more than 3m troops in any army with up to m traitors.
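The bound is easy to tabulate. The helper names below are illustrative; `total` counts every general, including the commander, so the condition is total > 3m.

```python
def max_traitors(total):
    """Largest m such that total > 3m (unsigned messages)."""
    return (total - 1) // 3

def min_generals(m):
    """Smallest army that tolerates m traitors: 3m + 1."""
    return 3 * m + 1

for total in (3, 4, 6, 7, 10):
    print(total, "generals tolerate", max_traitors(total), "traitor(s)")
```

Three generals tolerate no traitor at all, and a second traitor is only tolerated once the army reaches seven.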
41The Byzantine General Problem
- General Proof Outline
- Pair two loyal troops with each disloyal troop
42The Byzantine General Problem
- General Proof Outline
- There must exist one more loyal troop to sway the
balance of the majority
43The Byzantine General Problem
- General Proof Outline
- But the proof only holds if the algorithm runs
for m (or more) total rounds!
44The Byzantine General Problem
- Up until now, the node has been malicious. But is that all?
- A Byzantine failure is:
- An arbitrary failure of the node.
- Adversarial assumption (worst-case)
- The adversary is as smart as the system.
- Thus, a system prone to Byzantine failures may not always suffer a Byzantine failure; fail-stop failures may also exist.
45The Byzantine General Problem
- Tough stuff. But if we add one more assumption, we can make the problem a lot easier:
- A4 Messages are signed.
- a) A loyal general has a signature that cannot be forged
- b) A signed message cannot be altered without detection
- c) Anyone can verify the signature
46The Byzantine General Problem
- Returning to the problem that didn't work with unsigned messages
47The Byzantine General Problem
- Previously, our general sent two different orders out
48The Byzantine General Problem
- But when the algorithm runs for a second round
Conflicting Orders!
49The Byzantine General Problem
- The authors find that, by using signed messages:
- Any number of disloyal generals may exist in a system.
- Problem is trivial if n < m+2.
- All loyal generals will agree on a common result after m rounds.
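The conflicting-orders case can be sketched as follows. This is a simplified simulation, not the full signed-message algorithm of [1]: it assumes a possibly traitorous commander, loyal lieutenants, a single relay round, and unforgeable signatures (A4), so a relayed order is exactly the one the commander signed. The rule in `choice` for conflicting orders (fall back to RETREAT) is one assumed deterministic rule.

```python
ATTACK, RETREAT = "attack", "retreat"

def choice(orders):
    """Obey the order if exactly one distinct order was seen, else RETREAT."""
    return next(iter(orders)) if len(orders) == 1 else RETREAT

def signed_round(commander_orders):
    """commander_orders: {lieutenant: order the commander signed and sent to it}."""
    seen = {l: {order} for l, order in commander_orders.items()}
    for sender, order in commander_orders.items():
        for receiver in commander_orders:
            if receiver != sender:
                # The sender countersigns the commander's order and relays it.
                seen[receiver].add(order)
    return {l: choice(orders) for l, orders in seen.items()}

print(signed_round({1: ATTACK, 2: RETREAT}))  # conflicting orders are exposed
print(signed_round({1: ATTACK, 2: ATTACK}))   # consistent orders are obeyed
```

Because both lieutenants hold the same set of signed orders after the relay, they apply the same rule and reach the same decision, even with only three generals.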
50The Two Generals Problem
- Reviewing our assumptions:
- A1 Every message sent was delivered correctly.
- The Two Generals Problem showed that two generals cannot ever reach consensus with the possibility of lost messages.
- Developed by Akkoyunlu et al. in 1975.
51The Byzantine General Problem
- The core Byzantine problem is well studied and understood
- Works for systems where not all nodes can communicate with one another.
- [1] presents the case of 3-regular graphs
- Clock synchronization problems have been solved in Byzantine-prone systems
- Interactive Consistency Algorithms (Lamport, 1986)
52Useful?
- In a system with a bound on adversarial nodes, you must perform at least m rounds to reach consensus.
- Unsigned Messages: m ≤ (n-1)/3
- Signed Messages: m ≤ n
- Requires PKI or some similar system.
53Useful?
- Could you develop a practical replica server based upon:
- Message loss, reordering, duplication
- Independent node failures
- PKI and collision-resistant hashing
- Strong adversary
- Replicated service based on state machine
- As we examine [2], we will find out
54Consensus Protocol Goals
- Liveness
- Clients receive replies to requests
- Safety
- Replicated service is linearizable
- i.e. it appears centralized w/ atomic ops
- We need n > 3f nodes!
- 2f+1 to act with confidence, since f may never respond
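The arithmetic behind these numbers can be checked directly. The sketch below assumes the minimum configuration n = 3f + 1; the function name and return layout are illustrative.

```python
def pbft_sizes(f):
    """Return (n, quorum, overlap) for f faulty replicas, assuming n = 3f + 1."""
    n = 3 * f + 1
    quorum = 2 * f + 1           # replies we can wait for if f never respond
    overlap = 2 * quorum - n     # minimum intersection of any two quorums
    return n, quorum, overlap

for f in (1, 2, 3):
    n, quorum, overlap = pbft_sizes(f)
    print(f"f={f}: n={n}, quorum={quorum}, any two quorums share >= {overlap}")
```

The overlap works out to f + 1, so any two quorums share at least one non-faulty replica.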
55Consensus Protocol Views
- Leader: p = v mod n
- On failure, advance the leader: v+1
- View change needs to be coordinated
- New leader sends request
- Get replies, sends those out as proof
- Coordinates checkpoints (shown later)
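The leader rotation is plain modular arithmetic. A minimal sketch, with illustrative function names and replicas assumed to be numbered 0 to n-1:

```python
def primary(view, n):
    """Replica id of the leader for a given view."""
    return view % n

def next_view(view):
    """A view change moves to the next view, and therefore the next leader."""
    return view + 1

n = 4
view = 0
leaders = []
for _ in range(6):
    leaders.append(primary(view, n))
    view = next_view(view)
print(leaders)  # [0, 1, 2, 3, 0, 1]
```

Since at most f of the n replicas are faulty, cycling through the replicas reaches a non-faulty leader after at most f consecutive view changes.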
56Consensus Protocol Consensus
- Client receives replies from replicas
- Needs f+1 identical results
- Ensure total ordering across views
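The client-side rule can be sketched as below. The function name and the dictionary layout are assumptions for illustration; in the real protocol replies are authenticated, which this sketch omits.

```python
from collections import Counter

def accepted_result(replies, f):
    """replies: {replica_id: result}. Return the result reported by at least
    f + 1 distinct replicas, or None if no result has that much support yet."""
    if not replies:
        return None
    result, count = Counter(replies.values()).most_common(1)[0]
    return result if count >= f + 1 else None

print(accepted_result({0: "ok", 1: "ok", 2: "bad"}, f=1))  # ok
print(accepted_result({0: "ok", 1: "bad"}, f=1))           # None
```

With at most f faulty replicas, f + 1 identical replies must include at least one from a non-faulty replica, which is why that count is enough.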
57Consensus Protocol Consensus
- Pre-prepare from node i
- Contains view number, sequence number, hash of request, request
- Prepare from node j
- Contains view number, sequence number, hash, ID for j
- Commit from node j
- Contains view number, sequence number, hash, ID for j
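The three message types and the conditions a replica checks can be sketched as follows. The field names are illustrative, SHA-256 stands in for whatever collision-resistant hash the system uses, and signatures are omitted. The thresholds (2f matching prepares, then 2f+1 matching commits) follow the description in [2]; treat the code as a sketch rather than the authors' implementation.

```python
from dataclasses import dataclass
import hashlib

@dataclass(frozen=True)
class PrePrepare:
    view: int
    seq: int
    digest: str
    request: bytes

@dataclass(frozen=True)
class Prepare:
    view: int
    seq: int
    digest: str
    replica: int

@dataclass(frozen=True)
class Commit:
    view: int
    seq: int
    digest: str
    replica: int

def digest(request: bytes) -> str:
    return hashlib.sha256(request).hexdigest()

def matching(pp, msgs):
    """Distinct replica ids whose messages agree with the pre-prepare."""
    key = (pp.view, pp.seq, pp.digest)
    return {m.replica for m in msgs if (m.view, m.seq, m.digest) == key}

def prepared(pp, prepares, f):
    """Pre-prepare is well formed and 2f distinct replicas sent matching prepares."""
    return pp.digest == digest(pp.request) and len(matching(pp, prepares)) >= 2 * f

def committed_local(pp, prepares, commits, f):
    """Prepared, plus 2f + 1 distinct replicas sent matching commits."""
    return prepared(pp, prepares, f) and len(matching(pp, commits)) >= 2 * f + 1

req = b"write x=1"
pp = PrePrepare(view=0, seq=1, digest=digest(req), request=req)
prepares = [Prepare(0, 1, pp.digest, r) for r in (1, 2)]
commits = [Commit(0, 1, pp.digest, r) for r in (0, 1, 2)]
print(prepared(pp, prepares, f=1), committed_local(pp, prepares, commits, f=1))
```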
58Consensus Protocol Checkpoints
- Checkpoint to catch all nodes up to current state
- Messages used as proof (signed)
- Advance seq. space with each checkpoint
- During view change, new leader finds consistent
checkpoint to distribute.
59Optimizations and Implementation
- Messaging delays
- Tentative execution of requests
- Messaging overhead
- Only one replica actually replies with the full answer; the others reply with a hash
- Implement BFS
- Byzantine-fault-tolerant file system
60Practical?
- 133 MHz Alpha 21064
- 128 MB mem
- DEC RZ26 disks at each replica
- Tests done without view change / adversarial
elements
61Practical?
62But what about scale?
- So far
- Absolute security/safety guarantees
- Strict upper bound on faulty processes
- Byzantine in P2P?
- Limited view of entire system
- Exactly how many users can you trust?
- Issues with consensus (e.g. broadcast)
63LOCKSS 3 The Problem
- Persistent Distributed Storage
- Low cost nodes
- Cheap storage
- No long term secrets
- Long term guarantees
- Powerful and long term adversary
- e.g. Library journal storage
- Built as a P2P system
64LOCKSS The Solution
- Lots Of Copies Keep Stuff Safe
- Use inertia
- No need for speed
- Force proofs of computation
- Sybil attack
- Lengthen the adversarial commitment
- Voting among peers
65LOCKSS Lists
- Four lists of neighbors in LOCKSS
- Reference List
- Maintained
- Inner Circle
- From reference list, per poll
- Outer Circle
- Nominated by inner circle, per poll
- Friends List
- From outside the system
66LOCKSS Polling
67LOCKSS Neighbors
- Discovery process: Outer Circle
- Use their votes to prove that they have that content
- Remove disagreeing inner circle and some agreeing inner circle
- Churn with friends list
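The list maintenance described above can be sketched as a single update step after a poll. This is a loose illustration of the slide, not the LOCKSS implementation: the parameters `drop_agreeing` and `churn`, their default values, and the use of a seeded random generator are all assumptions.

```python
import random

def update_reference_list(reference, inner_votes, outer_votes, friends,
                          drop_agreeing=1, churn=1, seed=0):
    """inner_votes / outer_votes map each peer to True if its vote agreed."""
    rng = random.Random(seed)
    updated = set(reference)
    # Remove every disagreeing inner-circle peer.
    updated -= {p for p, agreed in inner_votes.items() if not agreed}
    # Remove some (assumed: drop_agreeing) randomly chosen agreeing peers.
    agreeing = sorted(p for p, agreed in inner_votes.items() if agreed)
    updated -= set(rng.sample(agreeing, min(drop_agreeing, len(agreeing))))
    # Discovery: outer-circle peers whose votes agreed have shown they hold the content.
    updated |= {p for p, agreed in outer_votes.items() if agreed}
    # Churn: mix in a few peers from the friends list.
    candidates = sorted(set(friends) - updated)
    updated |= set(rng.sample(candidates, min(churn, len(candidates))))
    return updated

new_list = update_reference_list(
    reference={"a", "b", "c", "d"},
    inner_votes={"a": True, "b": True, "c": False},
    outer_votes={"x": True, "y": False},
    friends=["f1", "f2"],
)
print(sorted(new_list))
```

Dropping even agreeing voters and mixing in friends keeps any fixed set of peers from dominating the reference list over time.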
68LOCKSS Benefits
- Benefits of the voting process
- Rate limited by loyal peers
- Encourages clusters of similar data
- Prevents freeriding and theft
- Byzantine tolerance without f!
- Probabilistic guarantee
69LOCKSS Simulation
- Up to 30 years
- 1000 peers
- 2 mins to hash an AU (archival unit)
- With 20 invitees it would take 6 hours
- Stealth Modification Adversary
- Controls some of the peers at the start
- Lurk phase: bias lists, gain foothold
- Attack phase: try to corrupt data
70LOCKSS Results for lurking
71LOCKSS Results for lurking
72LOCKSS Results for damage
73LOCKSS Results for Churn
74Discussion LOCKSS
- In the presentation of LOCKSS
- ExpectedArchivalUnitHashTime = 120s
- What would happen if?
- A malicious collective had significantly more processing power than the average of the system.
- ExpAUHashTime = 5s?
75Discussion LOCKSS
- In the presentation of LOCKSS
- ExpectedArchivalUnitHashTime = 120s
- What would happen if?
- A malicious collective had (or pretended to have) significantly less processing power than the average of the system?
- ExpAUHashTime timeout 1ms?
76Discussion LOCKSS
- In the presentation of LOCKSS
- ExpectedArchivalUnitHashTime = 120s
- What would happen if?
- LOCKSS was deployed as-is 10 years after its release?
- ExpAUHashTime = 120s / 2^(# years)
- Assumes a doubling of computing power/yr
- ExpAUHashTime = 120s / 2^10 = 0.1171875s ≈ 117ms
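The projection is a one-line calculation. The function name and the `doublings_per_year` parameter are illustrative; the slide's assumption corresponds to one doubling per year.

```python
def projected_hash_time(base_seconds, years, doublings_per_year=1):
    """Hash time after `years` if computing power doubles at the given rate."""
    return base_seconds / 2 ** (years * doublings_per_year)

print(projected_hash_time(120, 10))  # 0.1171875
```

A slower assumed growth rate changes the number but not the trend: a cost calibrated in seconds today keeps shrinking unless the parameter is re-tuned.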
77Discussion The Byzantine Generals
- In the beginning of this presentation, we began
with a problem of every general giving an initial
value and no coordinated leader.
78Discussion The Byzantine Generals
- How do we reach the end such that all loyal
generals agree on the same outcome?
79Discussion The Byzantine Generals
- Trivial Solution (Lamport et al., 1982)
- Run Byzantine Generals a total of n times, where the chosen general is a different site each of the n times.
- Take the majority vote of the total of n rounds.
- Is there a more efficient solution?
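The trivial solution can be sketched as a wrapper around any single-commander agreement routine. Here `broadcast` is a hypothetical stand-in for one full run of Byzantine Generals with the given site as commander; it must return the value that all loyal sites agree that commander sent.

```python
from collections import Counter

def agree_on_all(initial_values, broadcast):
    """Run one agreement instance per general, then take the majority.

    initial_values: {general_id: initial value}
    broadcast(commander, value): agreed value for that commander (assumed helper)
    """
    vector = [broadcast(g, v) for g, v in initial_values.items()]
    # Every loyal site computes the same vector, hence the same majority.
    # Ties go to the value seen first, which is also the same at every site.
    return Counter(vector).most_common(1)[0][0]

# With a fault-free stand-in for broadcast, the result is the plain majority.
print(agree_on_all({0: "attack", 1: "attack", 2: "retreat"}, lambda g, v: v))
```

The cost is n full runs of the agreement protocol, which is what motivates the question of a more efficient solution.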
80Discussion The Byzantine Generals
- The Byzantine Generals problem is presented in [1] in terms of only two options: attack or retreat. What if we needed an agreed-upon int?
81Discussion The Byzantine Generals
- The Byzantine Generals problem requires m rounds to protect against m disloyal troops.
- We could reduce the number of rounds if we could somehow determine how much disloyalty exists in the system.
- Could we?
82Discussion The Byzantine Generals
- With a PKI (signed messages)
- Allows m ≤ n; therefore, n rounds must be made AND requires the overhead of a PKI.
- Without a PKI (unsigned messages)
- Forces m < n/3; therefore, only n/3 rounds and no PKI.
- Therefore, are there a significant number of systems where a PKI-free system would be desirable?
83Thanks!