Evaluating a Defragmented DHT Filesystem

Transcript and Presenter's Notes
1
Evaluating a Defragmented DHT Filesystem
  • Jeff Pang
  • Phil Gibbons, Michael Kaminsky, Haifeng Yu,
    Srinivasan Seshan
  • Intel Research Pittsburgh, CMU

2
Problem Summary
  • TRADITIONAL DISTRIBUTED HASH TABLE (DHT)
  • Each server responsible for pseudo-random range
    of ID space
  • Objects are given pseudo-random IDs

[Diagram: objects with pseudo-random IDs 324, 987, 160 scatter across server ranges 150-210, 211-400, 401-513, 800-999]
3
Problem Summary
  • DEFRAGMENTED DHT
  • Each server responsible for dynamically balanced
    range of ID space
  • Objects are given contiguous IDs

[Diagram: objects with contiguous IDs 320, 321, 322 all land within a single server range among 150-210, 211-400, 401-513, 800-999]
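The placement contrast between the two designs can be sketched as follows. The server ranges come from the diagrams; the helper names and the 0-999 ID space are assumptions for illustration, not the actual implementation:

```python
import hashlib

# Server ranges from the diagrams (inclusive); the 0-999 ID space is assumed.
SERVER_RANGES = [(150, 210), (211, 400), (401, 513), (800, 999)]

def server_for(obj_id):
    """Return the server range responsible for obj_id, if any."""
    for lo, hi in SERVER_RANGES:
        if lo <= obj_id <= hi:
            return (lo, hi)
    return None  # no server owns this part of the ID space in the sketch

def traditional_ids(names):
    """Traditional DHT: pseudo-random IDs via hashing -> scattered servers."""
    return [int(hashlib.sha1(n.encode()).hexdigest(), 16) % 1000 for n in names]

def defragmented_ids(names, start):
    """Defragmented DHT: contiguous IDs -> objects cluster on few servers."""
    return [start + i for i in range(len(names))]

files = ["a.txt", "b.txt", "c.txt"]
scattered_servers = {server_for(i) for i in traditional_ids(files)}
contiguous_servers = {server_for(i) for i in defragmented_ids(files, 320)}
print(contiguous_servers)  # -> {(211, 400)}: one server holds all three files
```

With contiguous IDs, one user's files sit in one (or a few) server ranges, which is what drives the availability and lookup savings claimed on the next slide.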
4
Motivation
  • Better availability
  • You depend on fewer servers when accessing your
    files
  • Better end-to-end performance
  • You don't have to perform as many DHT lookups
    when accessing your files

5
Availability Setup
  • Evaluated via simulation
  • 250 nodes with 1.5Mbps each
  • Faultload: PlanetLab failure trace (2003)
  • included one 40-node failure event
  • Workload: Harvard NFS trace (2003)
  • primarily home directories used by researchers
  • Compare
  • Traditional DHT: data placed using consistent
    hashing
  • Defragmented DHT: data placed contiguously and
    load balanced dynamically (via Mercury)

6
Availability Setup
  • Metric: failure rate of user tasks
  • Task(i,m): sequence of accesses with an
    interarrival threshold of i and max duration of m
  • Task(1sec,5min): sequence of accesses that are
    spaced no more than 1 sec apart and last no more
    than 5 minutes
  • Idea: capture the notion of a useful unit of work
  • Not clear what values are right
  • Therefore we evaluated many variations

[Diagram: accesses spaced <1 sec apart are grouped into tasks; with no duration cap this yields Task(1sec,∞), with a 5 min cap Task(1sec,5min)]
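A minimal sketch of segmenting a timestamped access trace into Task(i,m) units; timestamps in seconds and the function name are assumptions for illustration:

```python
def segment_tasks(times, i_thresh, m_max=float("inf")):
    """Group sorted access timestamps into Task(i, m) units: successive
    accesses at most i_thresh apart, and each task spanning at most m_max."""
    tasks = []
    cur = []
    for t in times:
        # Start a new task on a long gap or when the span cap is exceeded.
        if cur and (t - cur[-1] > i_thresh or t - cur[0] > m_max):
            tasks.append(cur)
            cur = []
        cur.append(t)
    if cur:
        tasks.append(cur)
    return tasks

# Accesses 0.5 s apart group together; the 3 s gap starts a new task.
print(segment_tasks([0.0, 0.5, 1.0, 4.0, 4.2], i_thresh=1.0))
# -> [[0.0, 0.5, 1.0], [4.0, 4.2]]
```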
7
Availability Results
  • Failure rate over 5 trials
  • Lower is better
  • Note log scale
  • Missing bars have 0 failures
  • Explanation
  • User tasks access 10-20x fewer nodes in the
    defragmented design

8
Performance Setup
  • Deploy real implementation
  • 200-1000 virtual nodes with 1.5Mbps (Emulab)
  • Measured global e2e latencies (MIT King)
  • Workload: Harvard NFS
  • Compare
  • Traditional vs Defragmented
  • Implementation
  • Uses Symphony/Mercury DHTs, respectively
  • Both use TCP for data transport
  • Both employ a Lookup Cache: remembers recently
    contacted nodes and their DHT ranges
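The Lookup Cache might be sketched roughly as below; the class and method names are illustrative, not the actual implementation:

```python
class LookupCache:
    """Remembers recently contacted nodes and their DHT ranges (sketch)."""

    def __init__(self):
        self.entries = {}  # node -> (lo, hi) range it was last seen owning

    def remember(self, node, lo, hi):
        """Record a node's range after contacting it."""
        self.entries[node] = (lo, hi)

    def lookup(self, obj_id):
        """Return a cached node whose range covers obj_id, else None
        (a miss would fall back to a full DHT lookup)."""
        for node, (lo, hi) in self.entries.items():
            if lo <= obj_id <= hi:
                return node
        return None

cache = LookupCache()
cache.remember("node-A", 211, 400)
print(cache.lookup(320))  # -> node-A
print(cache.lookup(50))   # -> None (cache miss)
```

A cache like this helps the Defragmented design especially: contiguous IDs mean one cached range satisfies many subsequent accesses in the same task.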

9
Performance Setup
  • Metric: task(1sec,∞) speedup
  • Task t takes 200 msec in Traditional
  • Task t takes 100 msec in Defragmented
  • speedup(t) = 200/100 = 2
  • Idea: capture speedup for each unit of work that
    is independent of user think time
  • Note: 1 second interarrival threshold is
    conservative ⇒ tasks are longer
  • Defragmented does better with shorter tasks (next
    slide)
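The metric from the example above is a simple ratio; the function name is a hypothetical stand-in:

```python
def speedup(trad_ms, defrag_ms):
    """Per-task speedup: Traditional latency divided by Defragmented latency."""
    return trad_ms / defrag_ms

# The slide's example: a task taking 200 msec vs 100 msec.
print(speedup(200, 100))  # -> 2.0
```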

10
Performance Setup
  • Accesses within a task may or may not be
    inter-dependent
  • Task = (A, B, …)
  • App. may read A, then depending on contents of A,
    read B
  • App. may read A and B regardless of contents
  • Replay trace to capture both extremes
  • Sequential - Each access must complete before
    starting the next (best for Defragmented)
  • Parallel - All accesses in a task can be
    submitted in parallel (best for Traditional)
    caveat: limited to 15 outstanding
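The two replay extremes might be sketched as follows, assuming a `fetch` callable stands in for a DHT read; both function names and the thread-pool approach are illustrative, not the paper's implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def replay_sequential(accesses, fetch):
    """Each access completes before the next starts
    (models inter-dependent accesses; best case for Defragmented)."""
    return [fetch(a) for a in accesses]

def replay_parallel(accesses, fetch, max_outstanding=15):
    """All accesses in a task submitted concurrently, capped at
    max_outstanding in flight (models independent accesses; best
    case for Traditional)."""
    with ThreadPoolExecutor(max_workers=max_outstanding) as pool:
        return list(pool.map(fetch, accesses))
```

Replaying the same trace both ways brackets the real application behavior, since actual inter-access dependence is not visible in an NFS trace.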

11
Performance Results
12
Performance Results
  • Other factors
  • TCP slow start
  • Most tasks are small

13
Overhead
  • Defragmented design is not free
  • We want to maintain load balance
  • Dynamic load balance ⇒ data migration

14
Conclusions
  • Defragmented DHT Filesystem benefits
  • Reduces task failures by an order of magnitude
  • Speeds up tasks by 50-100%
  • Overhead might be reasonable: 1 byte written ⇒
    1.5 bytes transferred
  • Key assumptions
  • Most tasks are small to medium sized (file
    systems, web, etc. -- not streaming)
  • Wide area e2e latencies are tolerable
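The write-overhead figure above amounts to roughly 0.5 extra bytes of migration traffic per byte written; a back-of-the-envelope reading of the slide, with a hypothetical helper name:

```python
def bytes_transferred(bytes_written, overhead_factor=1.5):
    """Slide's figure: each byte written costs ~1.5 bytes on the wire
    (the write itself plus ~0.5 bytes of load-balancing migration)."""
    return bytes_written * overhead_factor

print(bytes_transferred(1_000_000))  # -> 1500000.0
```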

15
Tommy Maddox Slides
16
Load Balance
17
Lookup Traffic
18
Availability Breakdown
19
Performance Breakdown
20
Performance Breakdown 2
  • With parallel playback, the Defragmented design
    suffers on the small number of very long tasks

[Chart annotation: ignore - due to topology]
21
Maximum Overhead
22
Other Workloads