Title: Prophecy: Using History for High-Throughput Fault Tolerance
1. Prophecy: Using History for High-Throughput Fault Tolerance
- Siddhartha Sen
- Joint work with Wyatt Lloyd and Mike Freedman
- Princeton University
2. Non-crash failures happen
Model as Byzantine (malicious)
3. Mask Byzantine faults
Service
Clients
4. Mask Byzantine faults
Throughput
Clients
Replicated service
5. Mask Byzantine faults
Throughput
Linearizability (strong consistency)
Clients
Replicated service
6. Byzantine fault tolerance (BFT)
- Low throughput
- Modifies clients
- Long-lived sessions
7. Prophecy
- High throughput + good consistency
- No free lunch
- Read-mostly workloads
- Slightly weakened consistency
8. Byzantine fault tolerance (BFT)
- Low throughput
- Modifies clients
- Long-lived sessions
D-Prophecy
Prophecy
9. Traditional BFT reads
application
Agree?
Clients
Replica Group
10. A cache solution
cache
application
Agree?
Clients
Replica Group
11. A cache solution
cache
application
- Problems
- Huge cache
- Invalidation
Agree?
Clients
Replica Group
12. A compact cache
cache
application
Requests Responses
req1 resp1
req2 resp2
req3 resp3
Clients
Replica Group
13. A compact cache
cache
application
Requests Responses
sketch(req1) sketch(resp1)
sketch(req2) sketch(resp2)
sketch(req3) sketch(resp3)
Clients
Replica Group
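The compact cache above stores sketch(req) and sketch(resp) rather than full requests and responses. A minimal Python sketch of that idea follows. SHA-1 as the sketch function, and the `SketchTable` class with its method names, are assumptions for illustration and are not taken from the slides.

```python
import hashlib


def sketch(data: bytes) -> bytes:
    """Return a short fixed-size digest of data (SHA-1 is an assumed choice)."""
    return hashlib.sha1(data).digest()


class SketchTable:
    """Hypothetical compact cache: sketch(request) -> sketch(response)."""

    def __init__(self):
        self._table = {}

    def record(self, request: bytes, response: bytes) -> None:
        # Store only the two digests, never the full request or response.
        self._table[sketch(request)] = sketch(response)

    def matches(self, request: bytes, response: bytes) -> bool:
        # True only if a sketch was recorded for this request and it
        # equals the sketch of the response being checked.
        expected = self._table.get(sketch(request))
        return expected is not None and expected == sketch(response)


table = SketchTable()
table.record(b"GET /index.html", b"<html>hello</html>")
print(table.matches(b"GET /index.html", b"<html>hello</html>"))
print(table.matches(b"GET /index.html", b"<html>tampered</html>"))
```

Each entry costs two fixed-size digests regardless of how large the webpage is, which is what addresses the "huge cache" problem.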
14. A sketcher
sketcher
application
Clients
Replica Group
15. A sketcher
sketch
webpage
Clients
Replica Group
16. Executing a read
sketch
webpage
Agree?
- Fast, load-balanced reads
Clients
Replica Group
17. Executing a read
sketch
webpage
Agree?
Clients
Replica Group
18. Executing a read
sketch
webpage
key-value store
replicated state machine
Clients
Replica Group
19. Executing a read
sketch
webpage
Agree?
Maintain a fresh cache
Clients
Replica Group
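The read path on these slides can be sketched roughly as follows: ask one replica for the webpage, compare its sketch with the cached one, and fall back to agreement across the replica group when they differ, refreshing the cache afterwards. This is a simplified illustration under stated assumptions. The `Sketcher` class, its method names, the random choice of replica, and the simple majority vote standing in for the real BFT agreement protocol are all hypothetical.

```python
import hashlib
import random
from collections import Counter


def sketch(data: bytes) -> bytes:
    """Short digest of a request or response (SHA-1 is an assumed choice)."""
    return hashlib.sha1(data).digest()


class Sketcher:
    """Hypothetical sketcher in front of a list of replica callables."""

    def __init__(self, replicas, rng=None):
        self.replicas = replicas      # each replica: request bytes -> response bytes
        self.table = {}               # sketch(request) -> sketch(response)
        self.rng = rng or random.Random()
        self.fast_reads = 0
        self.slow_reads = 0

    def replicated_read(self, request: bytes) -> bytes:
        # Stand-in for agreement: ask every replica, keep the majority answer.
        votes = Counter(replica(request) for replica in self.replicas)
        response, count = votes.most_common(1)[0]
        if count <= len(self.replicas) // 2:
            raise RuntimeError("no majority response")
        return response

    def read(self, request: bytes) -> bytes:
        key = sketch(request)
        # Fast path: a single, load-balanced replica executes the read.
        response = self.rng.choice(self.replicas)(request)
        if self.table.get(key) == sketch(response):
            self.fast_reads += 1
            return response
        # Cache miss or sketch mismatch: run the replicated read and
        # refresh the cached sketch with the agreed response.
        response = self.replicated_read(request)
        self.table[key] = sketch(response)
        self.slow_reads += 1
        return response


def honest(request: bytes) -> bytes:
    return b"page-v1"


def faulty(request: bytes) -> bytes:
    return b"forged"


sketcher = Sketcher([honest, honest, honest, faulty], rng=random.Random(0))
pages = [sketcher.read(b"GET /") for _ in range(50)]
print(set(pages), sketcher.fast_reads, sketcher.slow_reads)
```

In this toy setup a forged response never matches the cached sketch, so it is never returned to the client. The sketch comparison says nothing about freshness, however: a response that matches an outdated cache entry would still pass the fast path.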
20. Did we achieve linearizability?
NO!
21. Executing a read
sketch
webpage
Clients
Replica Group
22. Executing a read
sketch
webpage
Agree?
Clients
Replica Group
23. Executing a read
sketch
webpage
Agree?
Fast reads may be stale
Clients
Replica Group
24. Load balancing
sketch
webpage
Agree?
Pr(k stale) = g^k
Clients
Replica Group
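A small numeric illustration of the bound on this slide, assuming g denotes the fraction of replicas that are faulty and that each fast read picks a replica independently and uniformly at random (the slide does not define g, so both are assumptions). The function name `prob_k_stale` is hypothetical.

```python
def prob_k_stale(g: float, k: int) -> float:
    """Probability that k consecutive fast reads all hit a faulty replica."""
    return g ** k


# Assumed example: a group of n = 3f + 1 replicas with f = 1 faulty.
f = 1
n = 3 * f + 1
g = f / n

for k in (1, 2, 5, 10):
    print(f"k={k:2d}  Pr(k stale) = {prob_k_stale(g, k):.3e}")
```

Under these assumptions the chance of repeated stale results falls off geometrically in k, which is the point of spreading fast reads across the replica group.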
25. D-Prophecy vs. BFT
- Traditional BFT
- Each replica executes read
- Linearizability
- D-Prophecy
- One replica executes read
- Delay-once linearizability
26. Byzantine fault tolerance (BFT)
- Low throughput
- Modifies clients
- Long-lived sessions
D-Prophecy
Prophecy
27. Key-exchange overhead
11
3
28. Internet services
Clients
Replica Group
29. A proxy solution
Consolidate sketchers
Clients
Replica Group
30. A proxy solution
Sketcher must be fail-stop
Clients
Trusted
Replica Group
31. A proxy solution
Sketcher must be fail-stop
- Trust middlebox already
- Small and simple
Clients
Trusted
Replica Group
32. Executing a read
Prophecy
Fast, load-balanced reads
q
Clients
Trusted
Req Resp
s(q)
... ...
Replica Group
33. Prophecy
Fast reads may be stale
Clients
Trusted
Req Resp
s(q)
... ...
Replica Group
34. Delay-once linearizability
35. Delay-once linearizability
Read-after-write property
... W, R, W, W, R, R, W, R ...
36. Delay-once linearizability
Read-after-write property
... W, R, W, W, R, R, W, R ...
37. Example application
- Upload embarrassing photos
- 1. Remove colleagues from ACL
- 2. Upload photos
- 3. (Refresh)
- Weak consistency may reorder
- Delay-once preserves order
38. Byzantine fault tolerance (BFT)
- Low throughput
- Modifies clients
- Long-lived sessions
D-Prophecy
Prophecy
39. Implementation
- Modified PBFT
- PBFT is stable, complete
- Competitive with Zyzzyva et al.
- C++, Tamer async I/O
- Sketcher ~2000 LOC
- PBFT library ~1140 LOC
- PBFT client ~1000 LOC
40. Evaluation
- Prophecy vs. proxied-PBFT
- Proxied systems
- D-Prophecy vs. PBFT
- Non-proxied systems
41. Evaluation
- Prophecy vs. proxied-PBFT
- Proxied systems
- We will study
- Performance on null workloads
- Performance with real replicated service
- Where system bottlenecks, how to scale
42. Basic setup
(concurrent)
Clients (100)
Replica Group (PBFT)
43. Fraction of failed fast reads
Alexa top sites: < 15%
44. Small benefit on null reads
45. Apache webserver setup
Clients
Replica Group
46. Large benefit on real workload
3.7x
2.0x
47. Benefit grows with work
94 μs (Apache)
Null workloads are misleading!
48. Benefit grows with work
49. Single sketcher bottlenecks
50. Scaling out
51. Scales linearly with replicas
52. Summary
- Prophecy good for Internet services
- Fast, load-balanced reads
- D-Prophecy good for traditional services
- Prophecy scales linearly while PBFT stays flat
- Limitations
- Read-mostly workloads (meas. study corroborates)
- Delay-once linearizability (useful for many apps)
53. Thank You
54. Additional slides
55. Transitions
- Prophecy good for read-mostly workloads
- Are transitions rare in practice?
56. Measurement study
- Alexa top sites
- Access main page every 20 sec for 24 hrs
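One plausible way to turn such a crawl into a transition rate is to hash each snapshot and count how often consecutive snapshots differ. The sketch below assumes the snapshots are already collected as byte strings. The function name `transition_fraction` and the use of SHA-1 are assumptions, not details from the study.

```python
import hashlib

# Sampling every 20 seconds for 24 hours gives this many snapshots per site.
SAMPLES_PER_DAY = 24 * 60 * 60 // 20


def transition_fraction(snapshots) -> float:
    """Fraction of consecutive snapshot pairs whose content differs."""
    digests = [hashlib.sha1(page).digest() for page in snapshots]
    pairs = list(zip(digests, digests[1:]))
    if not pairs:
        return 0.0
    changed = sum(1 for before, after in pairs if before != after)
    return changed / len(pairs)


# Toy crawl: the page changes once across five snapshots.
crawl = [b"v1", b"v1", b"v2", b"v2", b"v2"]
print(SAMPLES_PER_DAY, transition_fraction(crawl))
```

A low fraction means most consecutive reads of the page return identical content, which is the read-mostly behavior Prophecy relies on.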
57. Mostly static content
58. Mostly static content
15%
59. Dynamic content
- Rabin fingerprinting on transitions
- 43% differ by a single contiguous change
- Sampled 4000 of them; over half are due to:
- Load balancing directives
- Random IDs in links, function parameters
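To inspect what changed between two snapshots, one simple approach is to trim their common prefix and suffix and look at what remains. Note that this is a plain byte comparison used for illustration, not the Rabin fingerprinting the study used, and the function name `contiguous_change` is hypothetical.

```python
def contiguous_change(old: bytes, new: bytes):
    """Return (start, old_part, new_part) for the smallest single span
    that, when replaced, turns old into new."""
    limit = min(len(old), len(new))
    # Length of the common prefix.
    start = 0
    while start < limit and old[start] == new[start]:
        start += 1
    # Length of the common suffix, not overlapping the prefix.
    end = 0
    while end < limit - start and old[len(old) - 1 - end] == new[len(new) - 1 - end]:
        end += 1
    return start, old[start:len(old) - end], new[start:len(new) - end]


before = b'<a href="/p?id=123">'
after = b'<a href="/p?id=987">'
print(contiguous_change(before, after))
```

When the remaining span is short, as with the random ID in this link, the two pages are identical apart from one small region, which matches the kind of change listed above.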