1
CFS
  • MIT Laboratory for Computer Science
  • Frank Dabek, M. Frans Kaashoek, David Karger,
    Robert Morris, Ion Stoica
  • UMD CMSC818L presentation by Vasile Gaburici

2
CFS Overview
  • CFS: the Cooperative File System
  • Aims to address all the problems of peer-to-peer
    distributed file systems
  • Has three layers
  • FS: a read-only file system interface to programs
  • DHash: reliable distributed block storage
  • Chord: locates the servers that store blocks
  • Data is published using a special application

3
DHash Features
  • Load balance
  • blocks of large files are split amongst nodes
  • small files are cached along likely Chord paths
  • Fault tolerance
  • replicates each block
  • Controls the amount of data
  • that each server may inject (weak quotas)
  • that is replicated at each server (via virtual
    servers)

4
Additions to Chord
  • Server selection
  • reduces lookup latency by preferentially
    connecting to nearby servers instead of jumping
    as far as possible on the ring
  • a cost estimate is calculated for each server ni
    in the finger list (see the sketch after this
    list)
  • C(ni) = di + davg * ones((ni - key) >> (160 - log2 N)),
    where di is the measured latency to ni, davg is
    the average latency, and ones() counts the 1 bits
  • Node ID authentication
  • ID = hash(IP address || virtual node index)
  • finger updates query the IP and verify the hash
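A minimal Python sketch of the cost estimate above (not the authors' code): the ID-space distance to the key is shifted so that only about log2 N significant bits remain, and ones() of that value approximates the remaining Chord hops. The finger IDs, latencies, and N = 64 are illustrative values.

```python
import math

ID_BITS = 160

def ones(x: int) -> int:
    """The ones() function from the slide: count the 1 bits."""
    return bin(x).count("1")

def cost(node_id: int, d_i: float, key: int, d_avg: float, n: int) -> float:
    """Estimated cost C(ni) of routing toward key via finger ni."""
    dist = (node_id - key) % (1 << ID_BITS)              # clockwise ring distance
    hops = ones(dist >> (ID_BITS - int(math.log2(n))))   # ~ remaining Chord hops
    return d_i + d_avg * hops

# Pick the finger with the lowest estimated cost.
fingers = {0x1A << 152: 30.0, 0x2B << 152: 12.0}   # node ID -> measured RTT (ms)
key = 0x10 << 152
best = min(fingers, key=lambda nid: cost(nid, fingers[nid], key, d_avg=20.0, n=64))
```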

5
DHash Layer
  • Key design idea: split each file into many blocks
    and distribute them across many servers (see the
    sketch after this list)
  • the bandwidth required to make a request is small
    compared to the bandwidth required to receive the
    file
  • prefetching hides the additional latency
  • avoids the hot spots caused by storing large or
    popular files in one place
  • large but unpopular files are slower to fetch
  • small files are cached along Chord paths
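A minimal sketch of the splitting idea, assuming an in-memory dict in place of the real DHash/Chord machinery; BLOCK_SIZE and the put/get helpers are illustrative names, not CFS's API.

```python
import hashlib

BLOCK_SIZE = 8192   # illustrative fixed block size
store = {}          # stands in for the distributed block store

def put_file(data: bytes) -> list:
    """Store each block under its SHA-1 content hash; return the keys."""
    keys = []
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE]
        key = hashlib.sha1(block).digest()
        store[key] = block            # in CFS this is a DHash put via Chord
        keys.append(key)
    return keys

def get_file(keys) -> bytes:
    """Fetch blocks; a real client would prefetch several in parallel."""
    return b"".join(store[k] for k in keys)
```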

6
Replication
  • Each block is replicated on k servers
  • these servers are chosen from the r Chord
    successors of the block's key (r ≥ k)
  • such successors are unlikely to be close to each
    other in the underlying network, so replicas tend
    to fail independently
  • A client fetches the entire list of servers that
    hold a given block, then gets the block from the
    server with the lowest latency (sketched below)
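A minimal sketch of the fetch strategy, with hypothetical stand-ins: successors(key) for Chord's successor list, ping(server) for an RTT probe, and download(server, key) for the block transfer.

```python
def fetch_block(key, k, successors, ping, download):
    replicas = successors(key)[:k]      # the k successors holding replicas
    best = min(replicas, key=ping)      # pick the lowest-latency replica
    return download(best, key)
```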

7
Caching
  • Necessary to avoid hot spots on small files
  • Each node sets aside a fixed amount of disk
    storage for its cache
  • After a successful lookup, the client sends the
    block to be cached to the nodes on the path
    traversed by the lookup
  • Since hops get shorter and shorter near the
    destination, different lookups for the same block
    tend to visit the same set of servers close to it
  • LRU is the replacement policy (a per-node cache
    is sketched below)
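A minimal per-node LRU cache sketch; CACHE_SLOTS and the propagate() helper are illustrative names, not CFS interfaces.

```python
from collections import OrderedDict

CACHE_SLOTS = 1024          # illustrative fixed cache capacity

class NodeCache:
    def __init__(self):
        self.blocks = OrderedDict()

    def insert(self, key, block):
        self.blocks[key] = block
        self.blocks.move_to_end(key)            # mark most recently used
        if len(self.blocks) > CACHE_SLOTS:
            self.blocks.popitem(last=False)     # evict least recently used

    def lookup(self, key):
        if key in self.blocks:
            self.blocks.move_to_end(key)
            return self.blocks[key]
        return None

def propagate(path, key, block):
    """After a successful lookup, cache the block at every node on the path."""
    for node in path:
        node.insert(key, block)
```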

8
Load Balance
  • Claims to avoid the O(log N) consistent-hashing
    imbalance by using virtual servers (sketched
    below)
  • However, the number of virtual servers on a
    machine is a local administration decision
  • To avoid additional network hops, virtual servers
    on the same machine look at each other's
    routing tables
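A minimal sketch of virtual servers: each machine joins the ring under several IDs derived as hash(IP || index), matching the node-ID construction on slide 4. The machine list and per-machine counts are illustrative; choosing the counts is the local decision noted above.

```python
import bisect
import hashlib

def vid(ip: str, index: int) -> int:
    """Virtual-server ID = SHA-1(IP || index), as on slide 4."""
    return int.from_bytes(hashlib.sha1(f"{ip}/{index}".encode()).digest(), "big")

machines = {"10.0.0.1": 4, "10.0.0.2": 8}    # ip -> number of virtual servers
ring = sorted((vid(ip, i), ip) for ip, n in machines.items() for i in range(n))
ids = [r[0] for r in ring]

def owner(key: int) -> str:
    """Machine owning a key: first virtual ID clockwise from the key."""
    return ring[bisect.bisect_right(ids, key) % len(ring)][1]
```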

9
Weak Quotas
  • Avoids injection of data that could fill up the
    file system
  • Reliable identification of publishers is avoided,
    as it would require a central authority
  • Instead, a limit based on the publisher's IP
    address is imposed
  • To prevent IP forging, publisher machines must
    respond to a random nonce (see the sketch below)
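A minimal sketch of the per-IP weak quota, with illustrative names (QUOTA_BYTES, accept_block); the nonce exchange itself is abstracted to a boolean.

```python
import os
from collections import defaultdict

QUOTA_BYTES = 50 * 1024 * 1024     # illustrative per-IP limit
usage = defaultdict(int)           # bytes injected so far, per publisher IP

def make_nonce() -> bytes:
    return os.urandom(16)          # random challenge sent to the claimed IP

def accept_block(ip: str, block: bytes, nonce_echoed: bool) -> bool:
    if not nonce_echoed:           # claimed IP never echoed the nonce: forged
        return False
    if usage[ip] + len(block) > QUOTA_BYTES:
        return False               # over quota: reject the write
    usage[ip] += len(block)
    return True
```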

10
Updates and Deletion
  • CFS stores two types of blocks
  • content blocks
  • signed root blocks
  • A content block is accepted by a server only if
    the SHA-1 hash of the block matches the key
  • A root block must be signed by a public key whose
    SHA-1 hash matches the block's CFS key (both
    acceptance rules are sketched below)
  • There is no delete operation
  • blocks simply expire after a given interval
  • the publisher must periodically refresh them to
    prevent expiry
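A minimal sketch of the two acceptance rules; verify_signature is a hypothetical stand-in for a real public-key signature check, not a CFS function.

```python
import hashlib

def accept_content_block(key: bytes, block: bytes) -> bool:
    """Content blocks: the key must equal SHA-1(block)."""
    return hashlib.sha1(block).digest() == key

def accept_root_block(key: bytes, public_key: bytes, block: bytes,
                      signature: bytes, verify_signature) -> bool:
    """Root blocks: key must equal SHA-1(public key), signature must verify."""
    if hashlib.sha1(public_key).digest() != key:
        return False
    return verify_signature(public_key, block, signature)
```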

11
Experimental Results
  • Internet experiment
  • FTP-like performance, but with a narrower
    dispersion of download times, i.e. more
    repeatable performance
  • Controlled experiment
  • all tests ran in loopback on one machine
  • most results are close to their expected values
  • with 6 replicas per block, a 20% server failure
    rate produces no lookup failures

12
Comments / Questions
  • Limitations
  • does not address anonymity
  • does not cope with malicious participants
  • Lack of locality in Chord had to be fixed