Title: Wide-Area Cooperative Storage with CFS
1. Wide-Area Cooperative Storage with CFS
- Robert Morris
- Frank Dabek, M. Frans Kaashoek,
- David Karger, Ion Stoica
- MIT and Berkeley
2. Target CFS Uses
[Diagram: nodes connected through the Internet]
- Serving data with inexpensive hosts
- open-source distributions
- off-site backups
- tech report archive
- efficient sharing of music
3. How to mirror open-source distributions?
- Multiple independent distributions
- Each has high peak load, low average
- Individual servers are wasteful
- Solution: aggregate
- Option 1: single powerful server
- Option 2: distributed service
- But how do you find the data?
4. Design Challenges
- Avoid hot spots
- Spread storage burden evenly
- Tolerate unreliable participants
- Fetch speed comparable to whole-file TCP
- Avoid O(participants) algorithms
- Centralized mechanisms (Napster), broadcasts (Gnutella)
- CFS solves these challenges
5. Why Blocks Instead of Files?
- Cost: one lookup per block
- Can tailor cost by choosing good block size
- Benefit: load balance is simple
- For large files
- Storage cost of large files is spread out
- Popular files are served in parallel
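A minimal sketch (illustrative Python, not CFS's actual code) of how a publisher could split a file into fixed-size blocks named by the SHA-1 hash of their contents; the 8 KByte block size matches the evaluation setup later in the talk:

    import hashlib

    BLOCK_SIZE = 8 * 1024  # 8 KByte blocks, as in the evaluation setup

    def split_into_blocks(data: bytes):
        """Return a list of (block_id, block_bytes) pairs; block_id = SHA-1 of contents."""
        blocks = []
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            blocks.append((hashlib.sha1(block).hexdigest(), block))
        return blocks

Because block IDs are content hashes, the blocks of a large file scatter uniformly over the ID space, which is what makes the load balancing above simple.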
6. The Rest of the Talk
- Software structure
- Chord distributed hashing
- DHash block management
- Evaluation
7CFS Architecture
client
server
client
server
Internet
node
node
- Each node is a client and a server (like xFS)
- Clients can support different interfaces
- File system interface
- Music key-word search (like Napster and Gnutella)
8. Client-server interface
[Diagram: an FS client inserts and looks up file f; the servers insert and look up individual blocks]
- Files have unique names
- Files are read-only (single writer, many readers)
- Publishers split files into blocks
- Clients check files for authenticity
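A minimal sketch (assumed helper names) of the authenticity check: since a block's ID is the SHA-1 hash of its contents, a client can verify whatever a server returns without trusting that server:

    import hashlib

    def fetch_and_verify(block_id: str, fetch) -> bytes:
        """fetch(block_id) is any function returning candidate block bytes."""
        data = fetch(block_id)
        if hashlib.sha1(data).hexdigest() != block_id:
            raise ValueError("block contents do not match the requested content hash")
        return data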
9. Server Structure
[Diagram: each node (Node 1, Node 2) runs a DHash layer on top of a Chord layer]
- DHash stores, balances, replicates, and caches blocks
- DHash uses Chord [SIGCOMM 2001] to locate blocks
10. Chord Hashes a Block ID to its Successor
[Diagram: circular ID space; e.g. N10 holds B112, B120, ..., B10; N32 holds B11, B30; N60 holds B33, B40, B52; N80 holds B65, B70; N100 holds B100]
- Nodes and blocks have randomly distributed IDs
- Successor: the node with the next-highest ID
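A minimal sketch (illustrative Python, not Chord's routing code) of mapping a block ID to its successor on the circular ID space:

    import bisect

    def successor(block_id: int, node_ids: list[int]) -> int:
        """Return the first node ID at or after block_id, wrapping around the ring."""
        ring = sorted(node_ids)
        i = bisect.bisect_left(ring, block_id)
        return ring[i % len(ring)]  # wrap to the smallest ID when past the end

    # Examples matching the diagram: block 65 maps to N80, block 112 wraps to N10.
    assert successor(65, [10, 32, 60, 80, 100]) == 80
    assert successor(112, [10, 32, 60, 80, 100]) == 10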
11. Successor Lists Ensure Robust Lookup
[Diagram: ring of nodes N5, N10, N20, N32, N40, N60, N80, N99, N110; each node stores its next three successors, e.g. N5 -> 10, 20, 32]
- Each node stores r successors, r = 2 log N
- Lookup can skip over dead nodes to find blocks
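A minimal sketch (assumed helpers) of how a lookup can fall through the successor list when nodes have died:

    def first_live_successor(successor_list: list[int], is_alive) -> int:
        """successor_list holds a node's r successor IDs; is_alive(node_id) -> bool."""
        for node in successor_list:
            if is_alive(node):
                return node
        raise RuntimeError("all r successors unreachable; the lookup must be retried")

    # Example from the diagram: if N10 has failed, N5 forwards the lookup to N20.
    assert first_live_successor([10, 20, 32], lambda n: n != 10) == 20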
12. Finger Tables Aid Efficient Lookup
- For an m-bit key space, each node n keeps a finger table with m entries
- The target of each entry grows as a power of two: entry i points to the successor of n + 2^i
- This reduces the number of message exchanges per lookup to O(log N)
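A minimal sketch (illustrative, 0-indexed fingers) of building such a table; each hop via a finger covers at least half the remaining distance to the target, which is where the O(log N) bound comes from:

    import bisect

    def build_finger_table(n: int, node_ids: list[int], m: int) -> list[int]:
        """finger[i] = successor of (n + 2**i) mod 2**m on the ring."""
        ring = sorted(node_ids)

        def succ(ident: int) -> int:
            i = bisect.bisect_left(ring, ident % (2 ** m))
            return ring[i % len(ring)]

        return [succ(n + 2 ** i) for i in range(m)]

    # Tiny 7-bit example ring reusing the node IDs from the earlier diagram.
    print(build_finger_table(32, [10, 32, 60, 80, 100], m=7))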
13. DHash/Chord Interface
[Diagram: the DHash server calls Chord's Lookup(blockID), which returns a list of <node-ID, IP address> pairs; Chord maintains a finger table of <node ID, IP address> entries]
- lookup() returns a list of node IDs closer in ID space to the block ID
- Sorted, closest first
14. Replicate blocks at r successors
[Diagram: ring of nodes; Block 17 is stored at its successor and replicated at the following successors]
- Node IDs are SHA-1 of IP Address
- Ensures independent replica failure
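A minimal sketch (illustrative helpers) of replica placement: a block lives at its successor and at the next r-1 nodes on the ring, and because node IDs come from SHA-1 of the IP address, those replica holders are unrelated machines:

    import bisect
    import hashlib

    def node_id(ip: str, bits: int = 160) -> int:
        """Derive a node's ID from the SHA-1 hash of its IP address."""
        return int.from_bytes(hashlib.sha1(ip.encode()).digest(), "big") % (2 ** bits)

    def replica_holders(block_id: int, node_ids: list[int], r: int) -> list[int]:
        """The block's successor plus the next r-1 nodes on the ring."""
        ring = sorted(node_ids)
        start = bisect.bisect_left(ring, block_id)
        return [ring[(start + k) % len(ring)] for k in range(r)]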
15. DHash Copies to Caches Along Lookup Path
[Diagram: Lookup(BlockID = 45) traverses the ring; RPCs: 1. Chord lookup, 2. Chord lookup, 3. Block fetch, 4. Send to cache]
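A minimal sketch (assumed helpers) of the idea: after a successful fetch, a copy of the block is sent to the nodes contacted during the lookup, so later lookups for a popular block are satisfied before they reach its successor:

    def fetch_with_path_caching(block_id, path_nodes, fetch_from, send_to_cache):
        """path_nodes: the nodes contacted by the Chord lookup, successor last."""
        block = fetch_from(path_nodes[-1], block_id)   # 3. block fetch from the successor
        for node in path_nodes[:-1]:
            send_to_cache(node, block_id, block)       # 4. send copies along the lookup path
        return block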
16. Caching at Fingers Limits Load
[Diagram: node N32 and the fingers that point at it]
- Only O(log N) nodes have fingers pointing to N32
- This limits the single-block load on N32
17. Load Balance with Virtual Nodes
[Diagram: physical hosts A and B each run several virtual nodes (e.g. N5, N10, N60, N101)]
- Hosts may differ in disk/net capacity
- Hosts may advertise multiple IDs
- Chosen as SHA-1(IP Address, index)
- Each ID represents a virtual node
- Host load is proportional to its number of virtual nodes
- Manually controlled; could be made adaptive
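A minimal sketch of deriving virtual-node IDs as SHA-1(IP address, index), as on the slide; a host with twice the disk or network capacity could simply advertise twice as many IDs:

    import hashlib

    def virtual_node_ids(ip: str, count: int, bits: int = 160) -> list[int]:
        """One ID per virtual node, derived from SHA-1 of (IP address, index)."""
        ids = []
        for index in range(count):
            digest = hashlib.sha1(f"{ip},{index}".encode()).digest()
            ids.append(int.from_bytes(digest, "big") % (2 ** bits))
        return ids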
18. Quotas
- Malicious injection of large quantities of data can use up all disk space
- To prevent this, we have quotas for each publisher
- E.g., only 2% of storage space for requests from a particular IP address
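A minimal sketch (illustrative policy; the 2%-per-IP figure is just the slide's example) of the admission check a server might perform before accepting an insert:

    QUOTA_FRACTION = 0.02  # example: each publisher IP may use 2% of local storage

    def admit_insert(used_by_ip: int, block_size: int, disk_capacity: int) -> bool:
        """Accept the insert only if this IP stays within its storage quota."""
        return used_by_ip + block_size <= QUOTA_FRACTION * disk_capacity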
19. Aging and Deletion
- CFS ages out and deletes blocks that have not been refreshed recently
- Publishers must periodically refresh their blocks if they don't want CFS to delete them
20. How Things Work
- Read operation
- To get the 1st block of /foo:
  - get(public key) -> returns the root block
  - Read the content hash of foo's inode from the root block
  - get(hash(foo's inode)) -> returns foo's inode
  - Read the content hash of the 1st block from the inode
  - get(hash(1st block)) -> returns the 1st block
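A minimal sketch (assumed helper names, not the CFS client API) of that read path: the root block is named by the publisher's public key, and every other block is reached through the content hash stored in its parent:

    def read_first_block(get, public_key, root_lookup, inode_block_hash):
        """get(key) fetches a block; the two *_lookup helpers parse block contents."""
        root = get(public_key)                   # root block, signed by the publisher
        inode_hash = root_lookup(root, "/foo")   # content hash of foo's inode
        inode = get(inode_hash)                  # foo's inode block
        first_hash = inode_block_hash(inode, 0)  # content hash of the 1st data block
        return get(first_hash)                   # the 1st block itself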
21. Experimental Setup (12 nodes)
[Map: testbed hosts spread over the Internet, including links to vu.nl, lulea.se, ucl.uk, kaist.kr, and .ve]
- One virtual node per host
- 8Kbyte blocks
- RPCs use UDP
- Caching turned off
- Proximity routing turned off
22. CFS Fetch Time for 1MB File
[Plot: fetch time (seconds) vs. prefetch window (KBytes)]
- Average over the 12 hosts
- No replication, no caching; 8 KByte blocks
23. Distribution of Fetch Times for 1MB
[Plot: fraction of fetches vs. time (seconds) for 8, 24, and 40 KByte prefetch windows]
24. CFS Fetch Time vs. Whole File TCP
[Plot: fraction of fetches vs. time (seconds) for CFS with a 40 KByte prefetch window and for whole-file TCP]
25. CFS Summary
- CFS provides peer-to-peer r/o storage
- Structure: DHash and Chord
- It is efficient, robust, and load-balanced
- It uses block-level distribution
- The prototype is as fast as whole-file TCP
- http://www.pdos.lcs.mit.edu/chord
26. Thank you!