Byzantine Fault Isolation in the Farsite Distributed File System

Author: John Douceur. Last modified by: John Douceur. Created: 2/15/2006 5:52:51 PM. Presentation format: On-screen Show.

Slides: 38

Transcript and Presenter's Notes


1
Byzantine Fault Isolation in the Farsite
Distributed File System
  • John R. Douceur and Jon Howell

2
Definitions
Farsite \'fär-sit\ n (2000): a serverless
distributed file system developed at Microsoft
Research, designed to be scalable, strongly
consistent, and secure despite running on an
untrusted infrastructure of desktop PCs
3
Talk Outline
  • Context: Farsite system
  • Why BFT doesn't scale
  • Farsite's use of multiple BFT groups
  • The need for isolating Byzantine faults
  • Formal system specification
  • BFI in Farsite

4
Farsite System
[Diagram: clients and servers]
5
Farsite System
Metadata
[Diagram labels: metadata, users, clients, BFT group]
6
Farsite System
Metadata
T = tolerable faults
R = count of replicas
R > 3T
  • Using Byzantine agreement protocol, assign
    sequence numbers to messages
  • Prepare-commit among 2T + 1 servers
  • Deterministically update metadata
  • Reply to client

[Diagram labels: users, clients, BFT group]
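
The sizing rule on this slide (R > 3T, with prepare-commit among 2T + 1 servers) can be written out as a small sketch. The function names are hypothetical, chosen only for illustration; nothing here is taken from the Farsite code.

```python
def min_replicas(tolerable_faults: int) -> int:
    """Smallest replica count R that satisfies R > 3T."""
    if tolerable_faults < 0:
        raise ValueError("tolerable_faults must be non-negative")
    return 3 * tolerable_faults + 1


def prepare_commit_quorum(tolerable_faults: int) -> int:
    """Number of servers taking part in prepare-commit: 2T + 1."""
    if tolerable_faults < 0:
        raise ValueError("tolerable_faults must be non-negative")
    return 2 * tolerable_faults + 1


def max_tolerable_faults(replicas: int) -> int:
    """Largest T for which a group of `replicas` members still has R > 3T."""
    if replicas < 1:
        raise ValueError("replicas must be positive")
    return (replicas - 1) // 3


if __name__ == "__main__":
    # Group sizes that appear later in the talk: 4, 7, and 10 members.
    for r in (4, 7, 10):
        t = max_tolerable_faults(r)
        print(r, t, prepare_commit_quorum(t))
```

Under this rule the 4-, 7-, and 10-member groups discussed later tolerate 1, 2, and 3 faulty members respectively.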
7
The Cost of BFT Groups
                   single server   4-member BFT group
computation        × 1             × 4
message delays     2               5
messages           2               32
8
Throughput vs. Scale
[Chart: throughput multiple (0 to 7) vs. machine count (1 to 7); curves: ideal, typical, flat, BFT]
9
Workload Sharing
[Diagram labels: workload, client, server]
10
BFT at Scale
11
Multiple BFT Groups
12
Tree of BFT Groups
13
Tree of BFT Groups
[Diagram: namespace tree rooted at /, with nodes users, public, cruft, emacs, Alice, Bob, vi, Outlook, docs, code, C, C, Proj X, foo, bar, src, bin, src, bin]
14
Delegation to New Group
[Diagram: the same namespace tree rooted at /, with nodes users, public, cruft, emacs, Alice, Bob, vi, Outlook, docs, code, C, C, Proj X, foo, bar, src, bin, src, bin]
15
Pathname Resolution
/users/Alice/code/C/bar
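
A minimal sketch of how resolving this pathname might cross BFT groups when subtrees have been delegated. The delegation table, the group names, and the longest-prefix rule are all assumptions made for illustration; the slides do not state which directories are delegated or how Farsite actually performs the lookup.

```python
from typing import Dict, List, Tuple

Path = Tuple[str, ...]

# Hypothetical delegation table: directory prefix -> BFT group managing that subtree.
DELEGATIONS: Dict[Path, str] = {
    (): "group-root",
    ("users", "Alice"): "group-A",
    ("users", "Alice", "code"): "group-B",
}


def split_path(path: str) -> Path:
    """Split '/users/Alice' into ('users', 'Alice')."""
    return tuple(part for part in path.split("/") if part)


def owning_group(parts: Path, delegations: Dict[Path, str]) -> str:
    """Return the group that owns the longest delegated prefix of `parts`."""
    for end in range(len(parts), -1, -1):
        group = delegations.get(parts[:end])
        if group is not None:
            return group
    raise KeyError("no group owns the root")


def groups_visited(path: str, delegations: Dict[Path, str]) -> List[str]:
    """List the groups encountered while walking the path from the root down."""
    parts = split_path(path)
    visited: List[str] = []
    for end in range(len(parts) + 1):
        group = owning_group(parts[:end], delegations)
        if not visited or visited[-1] != group:
            visited.append(group)
    return visited


if __name__ == "__main__":
    print(groups_visited("/users/Alice/code/C/bar", DELEGATIONS))
```

In this sketch every group on the path from the root participates in the lookup, which is why a fault in a group near the root can affect many files beneath it.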
16
Machine Failures at Scale
17
Group Failures at Scale
18
System Failure at Scale
19
Quantitative Fault Analysis
  • Example system
      • File system distributed among interacting BFT
        groups
  • Simplifying assumptions
      • Files are partitioned evenly among BFT groups
      • Machine failures are independent
      • Machine fault probability: 0.001
  • Evaluate operational fault rate
      • Probability that an operation on a randomly
        selected file exhibits a fault
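
The analysis above can be sketched numerically. This is a rough model under stated assumptions, not the talk's actual calculation: a group of R machines is taken to fail when more than T = (R - 1) // 3 of its members are faulty; without isolation, any failed group is assumed to taint every operation; with ideal isolation, only operations on the failed group's own files are affected. The function names are hypothetical.

```python
from math import comb


def group_fault_probability(replicas: int, machine_fault_prob: float) -> float:
    """Probability that more than T = (R - 1) // 3 of R independent machines are faulty."""
    tolerable = (replicas - 1) // 3
    return sum(
        comb(replicas, k)
        * machine_fault_prob ** k
        * (1 - machine_fault_prob) ** (replicas - k)
        for k in range(tolerable + 1, replicas + 1)
    )


def fault_rate_no_isolation(group_fault_prob: float, group_count: int) -> float:
    """Assumed model: an operation is faulty if any group in the system has failed."""
    return 1 - (1 - group_fault_prob) ** group_count


def fault_rate_ideal_isolation(group_fault_prob: float, group_count: int) -> float:
    """Assumed model: an operation is faulty only if the file's own group has failed."""
    return group_fault_prob


if __name__ == "__main__":
    p = 0.001  # machine fault probability from the slide
    q4 = group_fault_probability(4, p)
    for groups in (1, 10, 100, 1_000, 10_000, 100_000):
        print(
            groups,
            fault_rate_no_isolation(q4, groups),
            fault_rate_ideal_isolation(q4, groups),
        )
```

In this model the rate without isolation grows with the number of groups, while the rate with ideal isolation stays at the single-group failure probability regardless of scale.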

20
Operational Faults vs. System Scale
[Chart: operational fault rate vs. system scale (count of BFT groups, 1 to 100,000); curves: BFT 4, no BFI; BFT 7, no BFI; BFT 10, no BFI; BFT 4, ideal BFI; BFT 4, tree (4) BFI; BFT 4, tree (16) BFI]
21
BFI versus no BFI
22
BFI versus no BFI
                       4-member BFT groups   10-member BFT groups
                       with BFI              without BFI
computation            × 4                   × 10
messages               32                    200
throughput reduction   60%                   84%
23
BFI via Formal Specification
[Diagram: distributed-system spec (state, actions, faults) and semantic spec (state, actions, faults)]
24
Farsite Semantic Spec
[Diagram: namespace tree with nodes /, tools, code, C, emacs, src, bin, a.h, a.cpp, a.exe, cl.exe, a.obj; operations read, open, move; open handles; pending operations]
25
Farsite Distributed-System Spec
26
Farsite Refinement
[Diagram label: del]
27
Actions are State Transitions
[Diagram labels: /, a.cpp, open handles, pending operations]
28
Proving Refinement Inductively
[Diagram labels: /, a.cpp, open handles, pending operations]
29
Refinement with Byzantine Faults
30
Refinement with Byzantine Faults
[Diagram: namespace tree with nodes /, tools, code, C, emacs, src, bin, a.h, a.cpp, a.exe, cl.exe, a.obj; operations read, del, move; open handles; pending operations]
31
Semantic Fault Specification
  • Safety
      • A tainted file may have arbitrary contents and
        attributes
      • A tainted file may appear not linked into
        namespace
      • A tainted file may pretend not to have children
        it actually has
      • A tainted file may pretend to have children that
        do not exist
      • A tainted file may pretend another tainted file
        is a child or parent
  • Liveness
      • Operations involving a tainted file may not
        complete
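
One way to read the child-related safety clauses above is as a predicate over what a file is allowed to claim about its children. The sketch below is an interpretation, not the TLA+ specification from the talk: the function name, the data layout, and the sample namespace are hypothetical.

```python
from typing import Dict, Set


def children_claim_allowed(
    parent: str,
    claimed: Set[str],
    true_children: Dict[str, Set[str]],
    tainted: Set[str],
) -> bool:
    """Check a claimed child set against the weakened (fault-aware) semantics.

    An untainted parent must report exactly its real children.
    A tainted parent may omit real children, and may claim a child only if
    that child is a real child, does not exist, or is itself tainted.
    """
    if parent not in tainted:
        return claimed == true_children[parent]
    existing = set(true_children)
    for child in claimed:
        is_real_child = child in true_children[parent]
        does_not_exist = child not in existing
        is_tainted = child in tainted
        if not (is_real_child or does_not_exist or is_tainted):
            return False
    return True


if __name__ == "__main__":
    # Illustrative namespace only; keys are files, values are their real children.
    namespace = {
        "/": {"code", "tools"},
        "code": {"a.cpp"},
        "tools": set(),
        "a.cpp": set(),
    }
    print(children_claim_allowed("code", {"ghost"}, namespace, {"code"}))
    print(children_claim_allowed("code", {"tools"}, namespace, {"code"}))
```

The point of the weakening is containment: under this reading a tainted file can misreport itself and other tainted files, but it cannot claim an existing untainted file as its child.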

[Diagram: namespace tree with nodes /, code, tools, emacs, src, bin, C, foo, bar, a.h, a.cpp, a.exe, a.obj, cl.exe; file contents shown as "Hello world" and as ASCII art]
32
Distributed-System Improvements
  • Maintain redundant info across BFT group
    boundaries
  • Augment messages with info that justifies
    correctness
  • Ensure unambiguous chains of authority over data
  • Carefully order messages and state updates for
    operations involving multiple BFT groups

33
Summary of BFI Methodology
  • Formally specify your system
      • Semantic spec: user's view of system
      • Distributed-system spec: designer's view of
        system
      • Refinement: interprets distributed-system spec
        in semantic terms
  • Modify distributed-system spec to express
    Byzantine faults
  • Simultaneously
      • Strategically weaken semantic spec to describe
        faults
      • Improve distributed-system spec to quarantine
        faults
  • Refinement lets you know when you are done
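
The role of refinement in this methodology can be illustrated with a toy check. This is only a bounded, exhaustive test of the step condition on a made-up two-level system, not the inductive TLA+ proof used for Farsite; the state layout and all names are hypothetical.

```python
from typing import Callable, Hashable, Iterable, Set


def check_refinement(
    states: Iterable[Hashable],
    dist_steps: Callable[[Hashable], Iterable[Hashable]],
    refine: Callable[[Hashable], Hashable],
    sem_steps: Callable[[Hashable], Set[Hashable]],
) -> bool:
    """Every distributed step must map to a semantic step or to no change (a stutter)."""
    for state in states:
        for nxt in dist_steps(state):
            before, after = refine(state), refine(nxt)
            if before != after and after not in sem_steps(before):
                return False
    return True


# Toy distributed state: (pending, value). A request marks the update pending;
# a commit applies it. The semantic state is just the committed value.
def dist_steps(state):
    pending, value = state
    if pending:
        yield (False, value + 1)  # commit
    else:
        yield (True, value)       # request; invisible at the semantic level


def buggy_dist_steps(state):
    pending, value = state
    if pending:
        yield (False, value + 2)  # commit applies the update twice
    else:
        yield (True, value)


def refine(state):
    return state[1]


def sem_steps(value):
    return {value + 1}  # the only semantic action: increment


STATES = [(pending, value) for pending in (False, True) for value in range(3)]

if __name__ == "__main__":
    print(check_refinement(STATES, dist_steps, refine, sem_steps))
    print(check_refinement(STATES, buggy_dist_steps, refine, sem_steps))
```

A failing check means either the distributed design still lets a fault escape or the semantic spec has not yet been weakened enough to describe it, which is the sense in which refinement signals when the work is done.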

34
Conclusions
  • BFT groups have negative throughput scaling
  • Scalable systems can be built from multiple BFT
    groups
  • System scale increases the probability of
    non-maskable Byzantine faults
  • If faults are not isolated, a single faulty group
    can corrupt the entire system.
  • BFI is a methodology for isolating Byzantine
    faults
  • BFI uses formal system specification
  • Improves fault tolerance without hurting
    throughput, unlike increasing BFT group size

35
Contact Information
  • JohnDo@microsoft.com
  • Howell@microsoft.com
  • http://research.microsoft.com/farsite

36
Backup Slides
37
Farsite Spec Stats
  • Semantic specification
      • 1800 lines of TLA+
      • 114 definitions
  • Distributed-system specification
      • 11,500 lines of TLA+
      • 775 definitions
  • Why so big?
      • Windows file-system semantics are complex
      • Scalability and strong consistency
      • Byzantine fault isolation