Presentazione di PowerPoint - PowerPoint PPT Presentation

Slides: 32
Provided by: Giovann61
1
Re-Configurable Byzantine Quorum System
Lei Kong, S. Arun, Mustaque Ahamad, Doug Blough
2
Related Work
  • Byzantine Quorum System (D. Malkhi and M.
    Reiter): increase the read/write quorum
    intersection size to tolerate arbitrary server
    failures.
  • Dynamic Byzantine Quorum System (L. Alvisi, D.
    Malkhi, E. Pierce, M. Reiter, and R. N. Wright):
    define the fault resilience threshold b as a
    variable, so it can be adjusted dynamically.
  • Fault Detection for Byzantine Quorum Systems
    (Alvisi, Malkhi, Pierce, and Reiter), IEEE Trans.
    on Parallel and Distributed Systems, Sept. 2001.

3
Our Contribution
  • Explicitly add a fault detection mechanism to the
    system, and remove faulty servers to get smaller
    quorum sizes and a lighter system load.
  • A proxy scheme enables servers to monitor each
    other instead of using clients to monitor
    servers.
  • A new statistical fault detection technique.

4
System Model
  • The server failure model is Byzantine; the
    number of concurrent failures is assumed to be in
    the range [bmin, bmax].
  • The data service protocol is based on a threshold
    masking quorum system, but our re-configurable
    approach could be applied to other types of
    quorum systems.
  • Quorum data operations are done through proxy
    servers.
  • Communication channels are assumed to be
    asynchronous but reliable.

5
System Architecture
[Figure: system architecture. Labels: Clients, P, P, Distributed store, Server fault detection]
6
Quorum Variables
  • Four quorum variables are defined: N, B, Qmin,
    and S.
  • Defining the system size as a variable makes it
    possible to remove faulty nodes from the system.
  • Let Q(V) be the number of servers that are
    currently in the system and belong to the quorum
    of the most recently finished write of V. Then
    Qmin is the minimum value of Q(V) over all data
    objects in the system.
  • S is a Boolean array that indicates server
    status.
  • Operations on quorum variables use a static
    quorum setting.

7
Read/Write Quorum Variables
  • Read Protocol
  • Read from a quorum of 3bmax + 1 servers.
  • Return the value that is returned by at least
    bmax + 1 servers and is not countermanded; return
    an error if no such pair exists.
  • Write Protocol
  • Write to a quorum of n - bmax servers.
  • With write quorum size n - bmax and read quorum
    size 3bmax + 1, the intersection size is at
    least (n - bmax) + (3bmax + 1) - n = 2bmax + 1.
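
The intersection bound for quorum variables is plain counting: two subsets of n servers with sizes R and W share at least R + W - n members. A minimal sketch in Python; the function and argument names are illustrative and do not come from the slides.

```python
def quorum_variable_sizes(n: int, b_max: int) -> tuple[int, int, int]:
    """Return (read_size, write_size, min_intersection) for quorum variables."""
    read_size = 3 * b_max + 1   # read quorum size from the slide
    write_size = n - b_max      # write quorum size from the slide
    if b_max < 0 or read_size > n:
        raise ValueError("need 0 <= b_max and 3*b_max + 1 <= n")
    # Two subsets of an n-element set overlap in at least |R| + |W| - n elements.
    min_intersection = read_size + write_size - n
    return read_size, write_size, min_intersection


print(quorum_variable_sizes(50, 5))  # (16, 45, 11), and 11 == 2*5 + 1
```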

8
Read Data Object V
  • 1. The client executes a read on the quorum
    variables.
  • 2. The client randomly chooses a server as the
    proxy and chooses a read quorum of size
    n + 2b + 1 - qmin.
  • 3. The client sends the read request and the
    chosen quorum to the proxy.
  • 4. The proxy first reads from
    n + b + bmin + 1 - qmin servers in the chosen
    quorum and forwards the results to the client.

9
Read Data Object V (continued)
  • 5. Among all pairs returned by at least bmin + 1
    servers, if the one with the highest timestamp
    does not have at least b + 1 representatives, the
    proxy reads from the rest of the servers in the
    read quorum and forwards the results to the
    client.
  • 6. The client considers the pairs returned by at
    least bmin + 1 servers. If the one with the
    highest timestamp has at least b + 1
    representatives, the client returns it;
    otherwise it restarts from step 2.
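
Steps 5 and 6 apply the same acceptance test to the replies collected so far. The sketch below illustrates that test under a few assumptions: each reply is a hashable (value, timestamp) pair, timestamps are comparable, and a return value of None means "not decidable yet" (read more servers or restart). The name `select_read_result` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple

Pair = Tuple[Hashable, int]  # (value, timestamp), one per responding server


def select_read_result(replies: Sequence[Pair], b: int, b_min: int) -> Optional[Pair]:
    """Return the accepted pair, or None if no pair can be accepted yet."""
    counts = Counter(replies)  # pair -> number of servers that returned it
    # Only pairs vouched for by at least b_min + 1 servers are candidates.
    candidates = [pair for pair, c in counts.items() if c >= b_min + 1]
    if not candidates:
        return None
    # Pick the candidate with the highest timestamp (ties broken arbitrarily,
    # which is an assumption of this sketch).
    newest = max(candidates, key=lambda pair: pair[1])
    # Accept it only if at least b + 1 servers returned it.
    return newest if counts[newest] >= b + 1 else None
```

For example, with three replies of ("x", 2), four of ("y", 1), and one of ("z", 9), calling `select_read_result(replies, b=2, b_min=1)` accepts ("x", 2): the lone ("z", 9) reply never becomes a candidate, and ("x", 2) has the required three representatives.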

10
Write Data Object V
  • 1. The client executes a read on V to get the
    current timestamp of V and the current quorum
    variable values.
  • 2. If the read quorum size has increased according
    to the quorum variable values received in step 1,
    the client reads the quorum variable values from
    all server nodes until at least n - bmax servers
    return the same values.
  • 3. The client generates a new timestamp for V.
  • 4. The client chooses its proxy and a write quorum
    of size (n + 2b + 1)/2.

11
Write Data Object V (continued)
  • 5. The client sends the write request and the
    write quorum to the proxy.
  • 6. The proxy writes to the servers in the write
    quorum and forwards the server confirmations
    back to the client.
  • 7. The client checks the server confirmations and
    restarts from step 2 if an error is detected.

12
Quorum Sizes for Data Objects
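  • Write quorum size for data objects:
    (n + 2b + 1)/2.
  • Read quorum size for data objects:
    n + 2b + 1 - qmin. Because at least qmin current
    servers hold the most recent write, the
    intersection size is at least
    qmin + (n + 2b + 1 - qmin) - n = 2b + 1.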
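
The data-object quorum sizes can be checked with a few lines of arithmetic. This is a sketch only: the names are illustrative, and rounding (n + 2b + 1)/2 up to an integer is an assumption, since the slides do not say how a fractional size is handled.

```python
def data_quorum_sizes(n: int, b: int, q_min: int) -> tuple[int, int, int]:
    """Return (write_size, read_size, min_overlap) for data objects."""
    # Integer ceiling of (n + 2b + 1) / 2; the rounding direction is assumed.
    write_size = (n + 2 * b + 2) // 2
    read_size = n + 2 * b + 1 - q_min
    if not 1 <= read_size <= n:
        raise ValueError("read quorum size must lie in [1, n]")
    # At least q_min current servers hold the latest write, so a read quorum
    # overlaps them in at least q_min + read_size - n servers.
    min_overlap = q_min + read_size - n
    return write_size, read_size, min_overlap


print(data_quorum_sizes(50, 5, 31))  # (31, 30, 11), and 11 == 2*5 + 1
```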

13
Misc.
  • Message authentication codes (MACs) protect
    message integrity: faulty proxy servers can only
    drop messages, not tamper with them.
  • Proxy nodes do not forward the MACs of client read
    requests to server nodes, which makes it feasible
    for server nodes to use explicit testing on each
    other.
  • To reduce the overhead of reading quorum
    variables, clients cache them.
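
To illustrate why a MAC limits a faulty proxy to dropping messages, here is a minimal sketch using Python's standard `hmac` module. The slides do not name a MAC algorithm or a key-distribution scheme; HMAC-SHA256 and a key shared between client and server are assumptions of this example.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute a MAC over the message (HMAC-SHA256 is an assumed choice)."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check the MAC in constant time; a modified message fails this check."""
    return hmac.compare_digest(make_tag(key, message), tag)


key = b"client-server shared key (hypothetical)"
tag = make_tag(key, b"write V=1 ts=7")
print(verify_tag(key, b"write V=1 ts=7", tag))  # True
print(verify_tag(key, b"write V=2 ts=7", tag))  # False: tampering is detected
```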

14
Simulation - Plot of n vs. b
15
Simulation - Comparison of Read Quorum Size
16
Simulation - Comparison of Write Quorum Size
17
Simulation - Comparison of Workload
(r·Qr + w·Qw)/n
18
Fault Detection and Diagnosis
  • Identify faulty servers, thereby enabling their
    removal.
  • Fault detection is done by monitoring a server's
    responses to read requests over time.
  • The fault detection probability is close to 1 for
    a wide range of pic, with a very low false alarm
    probability.
  • To avoid detection, a faulty server must operate
    as though it were a correct server.

19
Fault Detection Algorithm
  • Observe the correct responses returned by each
    server over several read operations.
  • Two-tiered diagnosis: proxy-node level and
    diagnosis-node level.
  • Faulty servers try to avoid being detected.
  • If a faulty server has the correct value for a
    variable, it returns an incorrect value with
    probability pic.
  • pic = 1 corresponds to a server blatantly exposing
    itself as faulty.
  • pic = 0 corresponds to a correct server.

20
Analysis based on Hypothesis Testing
  • The probability that a correct server returns u
    correct responses in r read operations is
    $\binom{r}{u} (Q_w/n)^u (1 - Q_w/n)^{r-u}$,
    where $Q_w$ is the size of the write quorum and
    n is the total number of servers in the system.
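
The binomial expression above can be evaluated directly. A minimal sketch, with a hypothetical function name; it works in log space (via `math.lgamma`) so that values such as r = 10000 do not overflow, and it assumes 0 < Qw < n.

```python
from math import exp, lgamma, log


def prob_correct_responses(u: int, r: int, q_w: int, n: int) -> float:
    """P[a correct server returns exactly u correct responses in r reads]."""
    if not (0 < q_w < n and 0 <= u <= r):
        raise ValueError("need 0 < q_w < n and 0 <= u <= r")
    p = q_w / n  # chance that the server was in the (random) write quorum
    # log of C(r, u) * p^u * (1 - p)^(r - u)
    log_prob = (lgamma(r + 1) - lgamma(u + 1) - lgamma(r - u + 1)
                + u * log(p) + (r - u) * log(1 - p))
    return exp(log_prob)
```

Summing `prob_correct_responses(u, 100, 34, 50)` over u = 0..100 gives 1 up to rounding, which is a quick sanity check that the terms form a probability distribution.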

21
Plot for n = 50, Qw = 34, r = 10000
22
Plot for n = 50, Qw = 34, r = 1000
23
Plot for n = 50, Qw = 34, r = 100
24
Faulty Server Modeling and Analysis
  • If 0.05 is the false alarm probability tolerated
    at the proxy-node level, let $u_{th}$ be the
    maximum value of u such that
    $\sum_{i=0}^{u} \binom{r}{i} (Q_w/n)^i (1 - Q_w/n)^{r-i} \le 0.05$.
  • A server is diagnosed as faulty if it returns
    $u_{th}$ or fewer correct responses in r read
    operations.
  • Let $p_{ic}$ (pic) denote the probability with
    which a faulty server returns an incorrect
    response when it has the correct value for the
    data object being read.
  • The probability that a faulty server is detected
    in r read operations is
    $\sum_{i=0}^{u_{th}} \binom{r}{i} ((1 - p_{ic}) Q_w/n)^i (1 - Q_w/n + p_{ic} Q_w/n)^{r-i}$.
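
The threshold and the detection probability can be computed numerically. The sketch below is one possible reading of the slide, not the authors' code: the function names are hypothetical, the comparison with 0.05 is taken as "at most", and a threshold of -1 is returned when no u satisfies the bound.

```python
from math import exp, lgamma, log


def binom_pmf(k: int, r: int, p: float) -> float:
    """P[X = k] for X ~ Binomial(r, p), evaluated in log space."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == r else 0.0
    return exp(lgamma(r + 1) - lgamma(k + 1) - lgamma(r - k + 1)
               + k * log(p) + (r - k) * log(1.0 - p))


def detection_threshold(r: int, q_w: int, n: int, alpha: float = 0.05) -> int:
    """Largest u whose cumulative probability for a correct server is <= alpha."""
    p = q_w / n
    cdf, u_th = 0.0, -1
    for u in range(r + 1):
        cdf += binom_pmf(u, r, p)
        if cdf > alpha:
            break
        u_th = u
    return u_th


def detection_probability(r: int, q_w: int, n: int, p_ic: float, u_th: int) -> float:
    """P[a faulty server returns u_th or fewer correct responses in r reads]."""
    p = (1.0 - p_ic) * q_w / n  # a faulty server is correct with this probability
    return sum(binom_pmf(i, r, p) for i in range(u_th + 1))


u_th = detection_threshold(r=100, q_w=34, n=50)
for p_ic in (0.0, 0.1, 0.5, 1.0):
    print(p_ic, detection_probability(100, 34, 50, p_ic, u_th))
```

With p_ic = 0 the detection probability equals the proxy-level false alarm rate (at most 0.05), and with p_ic = 1 it equals 1, matching the two extreme cases described for pic.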

25
Faulty Server Detection Prob. for n = 50, Qw = 34
26
Prob. Faulty Server is Undetected, n = 50, Qw = 34
27
Faulty Server Det. Prob. for Different Qw, n = 50
28
Fault Detection Algorithm at the Diagnosis Node
  • Proxy nodes report the detected state of servers
    (faulty or not faulty) to the diagnosis node
    every r read operations.
  • If at any time a server has been found to be
    faulty by m servers, that server is diagnosed
    as faulty and removed from the system.
  • The choice of m depends on the desired final
    false alarm probability.

29
Diagnosis-Node Fault Detection Algorithm (contd.)
  • If the final desired false alarm probability is
    $10^{-4}$, let $m_2$ be the smallest $m_1$ such that
    $\binom{n - b_{max}}{m_1} 0.05^{m_1} (1 - 0.05)^{n - b_{max} - m_1} \le 10^{-4}$,
    where 0.05 is the false alarm probability at the
    proxy-node level.
  • Then m is given by $m = m_2 + b_{max}$.
  • An assumption: the behavior of faulty servers is
    independent of the identity of the proxy servers.
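
The choice of m can be computed with a short search. This sketch follows the rule as stated on the slide, reading the lost comparison as "at most"; the function names and the example value bmax = 5 are assumptions, not values from the presentation.

```python
from math import exp, lgamma, log


def binom_pmf(k: int, trials: int, p: float) -> float:
    """P[X = k] for X ~ Binomial(trials, p), with 0 < p < 1."""
    return exp(lgamma(trials + 1) - lgamma(k + 1) - lgamma(trials - k + 1)
               + k * log(p) + (trials - k) * log(1.0 - p))


def removal_threshold(n: int, b_max: int,
                      proxy_false_alarm: float = 0.05,
                      final_false_alarm: float = 1e-4) -> int:
    """Return m = m2 + b_max, where m2 is the smallest m1 meeting the bound."""
    trials = n - b_max  # number of proxies assumed to be correct
    # Literal reading of the slide: take the first m1 whose binomial term
    # is at most the desired final false alarm probability.
    for m1 in range(trials + 1):
        if binom_pmf(m1, trials, proxy_false_alarm) <= final_false_alarm:
            return m1 + b_max  # b_max faulty proxies may accuse any server
    raise ValueError("no m1 satisfies the false alarm bound")


print(removal_threshold(n=50, b_max=5))
```

Adding bmax accounts for faulty proxies that may report a correct server as faulty regardless of its behavior.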

30
More Algorithm Specifications and Features
  • Data servers keep track of the Qw/n ratio for
    each object they store and return this value
    along with the data object during a read.
  • Several values of r are considered to increase
    the chances of detecting faulty servers as early
    as possible.
  • Statistical analysis over several read operations
    tolerates low levels of concurrency between reads
    and writes.
  • The above analysis holds only if write quorums are
    chosen randomly. It can easily be modified to
    suit other quorum-picking strategies.

31
Future Work
  • Make use of background dissemination; explore
    possible approaches to adjusting Qmin,
    termination determination, and object groups.
  • Allow new nodes to be added to the system, and add
    replacements for removed faulty servers.