Replica Placement Strategy for Wide-Area Storage Systems

1
Replica Placement Strategy for Wide-Area Storage
Systems
  • Byung-Gon Chun and Hakim Weatherspoon
  • RADS Final Presentation
  • December 9, 2004

2
Environment
  • Store large quantities of data persistently and
    availably
  • Storage Strategy
  • Redundancy - duplicate data to protect against
    data loss
  • Place data throughout wide area for availability
    and durability
  • Avoid correlated failures
  • Continuously repair lost redundancy as needed
  • Detect permanent node failures and trigger data
    recovery

3
Assumptions
  • Data is maintained on nodes in the wide area,
    at well-maintained sites.
  • Sites contribute resources
  • Nodes (storage, cpu)
  • Network - bandwidth
  • Nodes collectively maintain data
  • Adaptive - Constant change, Self-organizing,
    self-maintaining
  • Costs
  • Data Recovery
  • Process of maintaining data availability
  • Limit wide area bandwidth used to maintain data

4
Challenge
  • Avoiding correlated failures/downtime with
    careful data placement
  • Minimize cost of resources used to maintain data
  • Storage
  • Bandwidth
  • Maximize
  • Data availability

5
Outline
  • Analysis of correlated failures
  • Show that correlated failures exist and are
    significant
  • Effects of common subnet (admin area, geographic
    location, etc)
  • Pick a threshold and extra redundancy
  • Effects of extra redundancy
  • Vary extra redundancy
  • Compare random, random w/ constraint, and oracle
    placement
  • Show that margin between oracle and random is
    small

6
Analysis of PlanetLab Trace characteristics
  • Trace-driven simulation
  • Model maintaining data on PlanetLab
  • Create trace using all-pairs ping
  • Collected from February 16, 2003 to October 6,
    2004
  • Measure
  • Correlated failures v. time
  • Probability of k nodes down simultaneously
  • 5th Percentile, Median number of available
    replicas v. time
  • Cumulative number of triggered data recovery v.
    time
  • Trace data: Jeremy Stribling,
    http://infospect.planet-lab.org/pings
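
One of the measurements listed above, the probability of k nodes being down simultaneously, can be estimated from an availability trace by counting down nodes per time slot. The following is a minimal sketch, assuming the trace has already been reduced to one record per time slot that maps each node to an up/down flag; the function name and the toy data are hypothetical and are not taken from the PlanetLab trace.

```python
from collections import Counter

def down_count_distribution(trace):
    """trace: list of time slots; each slot maps node name -> True if up.

    Returns {k: fraction of slots in which exactly k nodes were down}.
    """
    counts = Counter(
        sum(1 for up in slot.values() if not up) for slot in trace
    )
    total = len(trace)
    return {k: c / total for k, c in sorted(counts.items())}

# Toy trace (made-up data, for illustration only).
trace = [
    {"a": True,  "b": True,  "c": True},
    {"a": False, "b": False, "c": True},
    {"a": True,  "b": True,  "c": False},
    {"a": False, "b": False, "c": True},
]
dist = down_count_distribution(trace)
print(dist)
```

In the toy trace, nodes a and b always fail together, which is the kind of correlation that shows up as extra probability mass at higher k.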

7
Analysis of PlanetLab II Correlated failures

8
Analysis I - Node characteristics

9
Analysis II - Correlated Failures

10
Correlated Failures

11
Correlated Failures (machine with downtime 1000 slots)

12
Availability Trace

13
Replica Placement Strategies
  • Random
  • RandomSite
  • Avoid placing multiple replicas in the same site
  • A site in PlanetLab is identified by its 2-byte
    IP address prefix.
  • RandomBlacklist
  • Avoid using blacklisted machines, i.e. the top k
    machines with the longest downtime
  • RandomSiteBlacklist
  • Combine RandomSite and RandomBlacklist
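
The four strategies above differ only in which candidates they exclude before a random pick, so they can be sketched as one function with two switches. This is an illustrative sketch, not the authors' implementation: the names `site_of`, `top_k_downtime`, and `place`, the IP addresses, and the downtime figures are all hypothetical.

```python
import random

def site_of(ip):
    """Site key: the first two bytes of a dotted-quad IPv4 address."""
    return ".".join(ip.split(".")[:2])

def top_k_downtime(downtime, k):
    """Blacklist: the k nodes with the longest recorded downtime."""
    return set(sorted(downtime, key=downtime.get, reverse=True)[:k])

def place(nodes, n, rng, one_per_site=False, blacklist=frozenset()):
    """Pick n nodes at random.

    defaults            -> Random
    one_per_site=True   -> RandomSite
    non-empty blacklist -> RandomBlacklist
    both                -> RandomSiteBlacklist
    """
    candidates = [ip for ip in nodes if ip not in blacklist]
    rng.shuffle(candidates)
    chosen, used_sites = [], set()
    for ip in candidates:
        if len(chosen) == n:
            break
        if one_per_site and site_of(ip) in used_sites:
            continue  # this site already holds a replica
        chosen.append(ip)
        used_sites.add(site_of(ip))
    if len(chosen) < n:
        raise ValueError("not enough eligible nodes")
    return chosen

# Made-up nodes and downtimes (in time slots), for illustration only.
nodes = ["10.1.0.1", "10.1.0.2", "10.2.0.1", "10.3.0.1", "10.4.0.1"]
downtime = {"10.1.0.1": 5, "10.1.0.2": 50, "10.2.0.1": 900,
            "10.3.0.1": 10, "10.4.0.1": 0}
blacklist = top_k_downtime(downtime, 1)
replicas = place(nodes, 3, random.Random(0),
                 one_per_site=True, blacklist=blacklist)
print(replicas)
```

Passing a seeded `random.Random` instance keeps a simulation run repeatable, which helps when comparing strategies on the same trace.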

14
Comparison of simple strategies (m = 1, th = 9, n = 14,
blacklist = 35)

15
Simulation setup
  • Placement Algorithm
  • Random vs. Oracle
  • Oracle strategies
  • Max-Lifetime-Availability
  • Min-Max-TTR, Min-Sum-TTR, Min-Mean-TTR
  • Simulation Parameters
  • Replication m = 1, threshold th = 9, total
    replicas n = 15
  • Initial repository size 2TB
  • Write rate 1Kbps per node and 10Kbps per node
  • 300 storage nodes
  • System increases in size at a rate of 3TB and
    30TB per year, respectively.
  • Metrics
  • Number of available nodes
  • Number of data repairs
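
The two metrics above can be collected by a simple trace-driven loop. The sketch below assumes that th is the minimum number of available replicas below which a repair is triggered, and that a repair refills the replica set to n from nodes that are currently up; this is one reading of the parameters on this slide, not something the slide spells out. It also treats every unreachable replica as lost, whereas the presentation distinguishes permanent failures from downtime. The function name and the toy trace are hypothetical.

```python
import random

def simulate_repairs(trace, nodes, th, n, rng):
    """trace: list of sets, each holding the nodes that are up in a slot.

    Returns (repairs triggered, available-replica count per slot).
    """
    replicas = set(rng.sample(sorted(nodes), n))  # initial random placement
    repairs, available_per_slot = 0, []
    for up in trace:
        available = replicas & up
        available_per_slot.append(len(available))
        if len(available) < th:
            # Repair: keep reachable replicas, refill towards n from
            # other nodes that are up in this slot.
            spare = sorted(up - available)
            extra = rng.sample(spare, min(n - len(available), len(spare)))
            replicas = available | set(extra)
            repairs += 1
    return repairs, available_per_slot

# Toy run with made-up data: 6 nodes, n = 4 replicas, threshold th = 3.
all_nodes = set(range(6))
toy_trace = [all_nodes, {0, 1}, all_nodes, all_nodes]
repairs, counts = simulate_repairs(toy_trace, all_nodes, th=3, n=4,
                                   rng=random.Random(0))
print(repairs, counts)
```

In the toy run the outage in the second slot leaves too few nodes to refill to n, so a second repair follows once the other nodes return. The gap between th and n is the extra redundancy that lets short outages pass without triggering a repair.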

16
Comparison of simple strategies (m = 1, th = 9)

17
Results - Random Placement (1Kbps)

18
Results - Oracle Max-Lifetime-Avail (1Kbps)

19
Results - Breakdown of Random (1Kbps)

20
Results - Random (10Kbps)

21
Results - Breakdown of Random (10Kbps)

22
Conclusion
  • Correlated downtimes do exist.
  • Random placement is sufficient.
  • A minimum data availability threshold and extra
    redundancy are sufficient to absorb most
    correlation.