Title: Zeno: Eventually Consistent Byzantine Fault Tolerance
Atul Singh, Pedro Fonseca, Petr Kuznetsov, Rodrigo Rodrigues, Petros Maniatis
MPI-SWS, Rice University, TU Berlin/DT Labs, Intel Research Berkeley
1. E-commerce Storage Systems
2. Byzantine Fault Tolerance (BFT)
- High reliability and availability requirements
- Replicate state across
- Ensure high availability even when few replicas are reachable
- Most deployed systems only tolerate crash faults
- But arbitrary (Byzantine) faults happen (e.g., S3 outage caused by simple bit flip)
- Requires 3f+1 total replicas to tolerate f Byzantine faults
- Existing BFT protocols provide strong consistency
- But provide low availability (e.g., under network partitions)
- Since BFT protocols require 2/3 replicas to be available
We need a solution to achieve high availability and Byzantine fault tolerance
3. Key Idea: Relax Consistency for High Availability
- Semantics inspired by Amazon's Dynamo [SOSP '07]
- Used to store shopping cart state
- Needs to be responsive and reliable despite faults
- Example
- Add to BFT a new kind of operation (weak operations)
- Two types of operations provided to the application
- Strong operations, for strong consistency
- Weak operations, for high availability
- Strong operations are similar to traditional BFT operations
- Provide the abstraction of a single correct server
- Weak operations observe eventual consistency
- May miss effects of some concurrent operations
- Never lost, eventually committed
- Application developer decides if an operation is strong or weak
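The split between strong and weak operations can be pictured with a small sketch. The names below (`Consistency`, `Operation`, `classify_cart_operation`, `STRONG_ACTIONS`) are hypothetical and are not part of Zeno's interface. The choice of which shopping-cart actions are strong is an assumption made only to illustrate that the application developer, not the protocol, picks the level.

```python
from dataclasses import dataclass
from enum import Enum


class Consistency(Enum):
    STRONG = "strong"  # behaves like a traditional BFT operation
    WEAK = "weak"      # eventually consistent, favors availability


@dataclass(frozen=True)
class Operation:
    name: str
    payload: str
    consistency: Consistency


# Hypothetical policy chosen by the application developer (an assumption).
STRONG_ACTIONS = {"checkout"}


def classify_cart_operation(name: str, payload: str) -> Operation:
    """Tag a shopping-cart action as strong or weak under the policy above."""
    level = Consistency.STRONG if name in STRONG_ACTIONS else Consistency.WEAK
    return Operation(name, payload, level)
```

Under this illustrative policy, adding an item is issued as a weak operation and stays available when few replicas are reachable, while checkout is issued as a strong operation.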
[Figure: timeline of Put( ) requests around a fault. Labels: "Immediately gets a stale response", "Eventually gets consistent response".]
To achieve high availability and robustness (BFT)
we propose the Zeno protocol
- Key challenges
- Weak View Change
- Requires only a Weak Quorum
- Conflict detection
- Replicas periodically compare histories
- Conflict resolution for weak operations
- May require roll-back and re-execution
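A rough sketch of conflict detection and resolution, assuming each history is a plain list of unique request identifiers. The slides do not say how histories are compared or in which order divergent operations are re-executed, so the helper names and the ordering rule here (lexicographically smaller suffix first) are assumptions chosen only so that both replicas reach the same result.

```python
from typing import List


def common_prefix_length(a: List[str], b: List[str]) -> int:
    """Number of leading operations on which both histories agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def detect_conflict(a: List[str], b: List[str]) -> bool:
    """Histories conflict when neither one is a prefix of the other."""
    n = common_prefix_length(a, b)
    return n < len(a) and n < len(b)


def merge_histories(a: List[str], b: List[str]) -> List[str]:
    """Roll back to the common prefix, then re-apply both divergent suffixes.

    Assumed rule: suffixes are ordered by list comparison, so the result
    does not depend on which replica performs the merge.
    """
    n = common_prefix_length(a, b)
    merged = list(a[:n])
    first, second = sorted([a[n:], b[n:]])
    for op in first + second:
        if op not in merged:  # skip operations already applied
            merged.append(op)
    return merged
```

For example, histories `["p1", "p2", "x1"]` and `["p1", "p2", "y1", "y2"]` share the prefix `["p1", "p2"]`; both replicas roll back to it and re-execute `x1`, `y1`, `y2` in the same order.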
4. Zeno: Highly Available BFT Protocol
- Weak operations use a smaller quorum
- Primary-backup protocol (3f+1 total replicas)
- Strong operations require 2f+1 available replicas (strong quorum)
- Weak operations require only f+1 available replicas (weak quorum)
- Weak quorums are sufficient to provide eventual consistency
- Correct replica ensures state is based on past operations
- Guarantees eventual propagation to a strong quorum
- Evaluation shows high availability, good performance, and reasonable merge cost
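The quorum sizes above can be checked with a few lines of arithmetic. This is a minimal sketch: the function names are hypothetical, and it models only replica counts, not the protocol's messages.

```python
def total_replicas(f: int) -> int:
    """Replicas needed to tolerate f Byzantine faults."""
    return 3 * f + 1


def strong_quorum(f: int) -> int:
    """Replicas that must be available for a strong operation."""
    return 2 * f + 1


def weak_quorum(f: int) -> int:
    """Replicas that must be available for a weak operation."""
    return f + 1


def available_operations(reachable: int, f: int) -> set:
    """Which operation types can proceed with `reachable` replicas."""
    ops = set()
    if reachable >= weak_quorum(f):
        ops.add("weak")
    if reachable >= strong_quorum(f):
        ops.add("strong")
    return ops
```

With f = 1 there are 4 replicas. If a network partition splits them 2/2, neither side can form a strong quorum of 3, but each side still has a weak quorum of 2, so weak operations remain available. Because at most f replicas are faulty, any weak quorum of f+1 replicas contains at least one correct replica.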
[Figure: protocol timeline with weak and strong operations, execution (Exec) steps, a weak view change, and the step "Detect and merge concurrent histories".]
5. Conclusions
- Zeno protocol explores a new point in the design space of fault tolerance protocols
- Tolerates Byzantine faults
- But also provides high availability by sacrificing consistency
- Future work
- Explore other weak forms of consistency