Trends and Challenges in Large-Scale Data Center Availability


1
Trends and Challenges in Large-Scale Data Center
Availability
  • Wei Chen, Microsoft Research Asia

2
Data Center Availability
  • Highly-available data center technology is
    critical to global companies and IT industries,
    as well as to average consumers.
  • Design of HA technology for large-scale data
    centers raises new challenges to industrial
    research and academia.

My top-ten list for trends and challenges in
large-scale data center availability

3
No.10. Around-the-clock data access
  • Demand from global companies and global internet
    services
  • The sun is always up somewhere in the world
  • Availability goals:
  • Four-nine availability (99.99%): about 50 minutes of downtime per
    year
  • Five-nine availability (99.999%): about 5 minutes of downtime per
    year
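
The downtime figures above follow directly from the availability percentage. A minimal sketch of the arithmetic, assuming a 365-day year (the function name is illustrative, not from the talk):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability with `nines` nines."""
    unavailability = 10.0 ** (-nines)  # e.g. 4 nines -> 0.0001
    return MINUTES_PER_YEAR * unavailability


for n in (3, 4, 5):
    print(f"{n} nines: {downtime_minutes_per_year(n):.2f} min/year")
```

Four nines works out to roughly 52.6 minutes per year and five nines to roughly 5.3 minutes, which the slide rounds to 50 and 5.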

4
No.9. Cost is king!
  • Demand to drive down total cost of ownership
    (TCO)
  • hardware cost
  • software cost
  • operation cost
  • human cost
  • electricity cost

5
No.8. Very large scale
  • Petabyte storage --- Growing digitization trend
  • Built on cheap machines --- tens or hundreds of
    thousands of them
  • large storage
  • fast parallel computation (search and mining)

6
No.7. Failures are norms, not exceptions
  • Failures everywhere
  • disk, network
  • power supply
  • software/OS bugs
  • human errors
  • If each machine fails once a year, then among 10,000 machines
    one fails about every 50 min.
  • Fault tolerance should be a first-class concern,
    not an afterthought
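
The 50-minute figure can be checked with a short calculation. This is a sketch that assumes failures are spread evenly over the year and that machines fail independently; the function and parameter names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def minutes_between_failures(machines: int,
                             failures_per_machine_per_year: float = 1.0) -> float:
    """Average gap between failures somewhere in the fleet, in minutes."""
    fleet_failures_per_year = machines * failures_per_machine_per_year
    return MINUTES_PER_YEAR / fleet_failures_per_year


print(f"{minutes_between_failures(10_000):.1f} minutes between failures")
```

With 10,000 machines the gap is about 52.6 minutes, so at this scale some machine is failing nearly every hour.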

7
No.6. New replication schemes, new replica
placement schemes
  • Replication schemes need to consider
  • performance optimizations
  • data access patterns
  • flexible consistency models
  • e.g. Niobe
  • Replica placement schemes --- greatly affect
    data availability and reliability (ICDCS05,
    SRDS07)
  • sequential vs. random placement
  • network topology aware placement
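
To make the sequential vs. random distinction concrete, here is a minimal sketch of the two placement styles. It is an illustration under simple assumptions (nodes numbered 0..N-1, a fixed replica count, a seeded random generator), not the schemes analyzed in the cited papers.

```python
import random
from typing import Iterable, Tuple


def sequential_placement(obj_id: int, num_nodes: int, replicas: int) -> Tuple[int, ...]:
    """Put replicas on consecutive nodes, starting at obj_id mod num_nodes."""
    start = obj_id % num_nodes
    return tuple(sorted((start + i) % num_nodes for i in range(replicas)))


def random_placement(rng: random.Random, num_nodes: int, replicas: int) -> Tuple[int, ...]:
    """Put replicas on distinct nodes chosen uniformly at random."""
    return tuple(sorted(rng.sample(range(num_nodes), replicas)))


def distinct_replica_sets(placements: Iterable[Tuple[int, ...]]) -> int:
    """Count how many different node groups hold a full set of replicas."""
    return len(set(placements))


num_nodes, replicas, num_objects = 20, 3, 1000
rng = random.Random(0)  # seeded so the run is repeatable
seq = [sequential_placement(o, num_nodes, replicas) for o in range(num_objects)]
rnd = [random_placement(rng, num_nodes, replicas) for _ in range(num_objects)]

print("sequential replica sets:", distinct_replica_sets(seq))
print("random replica sets:    ", distinct_replica_sets(rnd))
```

Sequential placement yields at most one replica set per node, so few combinations of simultaneous failures can destroy all copies of an object. Random placement spreads each node's data over many peers, which allows more parallel repair but exposes many more failure combinations.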

8
No.5. Proven guarantees.
  • Can you prove your protocol?
  • manual proof
  • computer aided proof
  • Model checking
  • Distributed debugging (WiDS, NSDI07)

9
No.4. Quantify the tradeoffs
  • Tradeoffs between availability, performance, and
    cost.
  • Higher availability → more sophisticated replication
    → lower performance → higher cost
  • HotDep07: dependability, access diversity, low
    cost: pick two
  • SRDS07: an analytical framework for reliability, a
    component for studying the tradeoff
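
One simple way to put numbers on the availability/cost side of this tradeoff is to model replicas as failing independently. That independence assumption, the 99% per-node availability, and the "storage cost = replica count" proxy are all illustrative choices; this is not the framework from the cited papers.

```python
import math


def replicated_availability(node_availability: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent copies is up."""
    return 1.0 - (1.0 - node_availability) ** replicas


def nines(availability: float) -> float:
    """Express availability as a number of nines (0.999 -> 3.0)."""
    return -math.log10(1.0 - availability)


for k in (1, 2, 3):
    a = replicated_availability(0.99, k)
    print(f"replicas={k}  availability={a:.6f}  nines={nines(a):.1f}  storage cost={k}x")
```

Under these assumptions each extra replica adds about two nines while multiplying storage cost, and every write must reach more copies, which is where the performance penalty enters.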

10
No.3. Autonomous management
  • Automatic failure detection and failure recovery
  • many parameters
  • how to tune them?
  • how are they translated to ultimate service level
    guarantee?
  • Delay the involvement of human assistance
  • 24x7 availability with 8x5 maintenance
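
The tuning question is easiest to see in a concrete detector. Below is a minimal heartbeat-timeout sketch; the class, its single `timeout_s` parameter, and the node names are hypothetical, and a production detector would have many more knobs.

```python
from typing import Dict, List


class HeartbeatDetector:
    """Suspect a node when no heartbeat has arrived within `timeout_s` seconds."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, node: str, now: float) -> None:
        """Record that `node` was alive at time `now`."""
        self.last_seen[node] = now

    def suspected(self, now: float) -> List[str]:
        """Return the nodes whose last heartbeat is older than the timeout."""
        return sorted(node for node, seen in self.last_seen.items()
                      if now - seen > self.timeout_s)


detector = HeartbeatDetector(timeout_s=3.0)
detector.heartbeat("node-a", now=0.0)
detector.heartbeat("node-b", now=0.0)
detector.heartbeat("node-a", now=2.0)  # node-b stays silent
print(detector.suspected(now=4.0))     # ['node-b']
```

A shorter timeout detects failures sooner but wrongly suspects slow nodes more often, and each false suspicion can trigger costly recovery. Relating such a parameter to the end-to-end service-level guarantee is the open question the slide raises.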

11
No.2. Not local, go global!
  • Geographically distributed data centers
  • disaster tolerance
  • load balancing
  • enable caching, edge computing

12
No.1. Be friendly!
  • To err is human!
  • The biggest challenge is perhaps designing
    technology that meets all previous challenges,
    yet is still human-friendly, easy to use, and able
    to reduce human errors
  • Need to work with People, Policy and Process

13
Final Remarks
  • The problems/challenges are not new, but the data
    center environment is more demanding
  • How to get it to work with security and privacy?

14
No.3. Redundancy, everywhere!
  • Not just data replication
  • Redundancy in:
  • data storage
  • computing servers
  • networking
  • power supply
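
A rough model shows why replicating data alone is not enough. The sketch below treats the service as a chain of layers that must all be up, with independent failures inside each layer. The 99% component availability and the layer names are illustrative assumptions.

```python
from typing import Dict, Tuple


def layer_availability(component_availability: float, redundancy: int) -> float:
    """A layer is up if at least one of its redundant components is up."""
    return 1.0 - (1.0 - component_availability) ** redundancy


def service_availability(layers: Dict[str, Tuple[float, int]]) -> float:
    """The service is up only if every layer is up (layers in series)."""
    total = 1.0
    for component_availability, redundancy in layers.values():
        total *= layer_availability(component_availability, redundancy)
    return total


# Each entry maps a layer to (component availability, number of redundant copies).
single = {"storage": (0.99, 1), "servers": (0.99, 1),
          "network": (0.99, 1), "power": (0.99, 1)}
data_only = {**single, "storage": (0.99, 3)}         # replicate data three ways
everywhere = {name: (0.99, 2) for name in single}    # duplicate every layer

for label, layers in (("no redundancy", single),
                      ("data replication only", data_only),
                      ("redundancy everywhere", everywhere)):
    print(f"{label}: {service_availability(layers):.6f}")
```

In this model, tripling the storage layer barely improves the whole service because the unreplicated layers dominate, while modest redundancy in every layer improves it far more.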