Transcript and Presenter's Notes

Title: SEDCL: Stanford Experimental Data Center Laboratory

1
SEDCL: Stanford Experimental Data Center Laboratory

2
Tackle Data Center Scaling Challenges
with Stanford's research depth and breadth

3
Data Center Scaling
  • Networks of data centers and web services are
    the key building blocks for future computing
  • Factors contributing to data center scaling
    challenges:
  • Explosive growth of data with no locality of any
    kind
  • Legal requirements to back up data in
    geographically separated locations, a big concern
    for the financial industry
  • Emergence of mobile and cloud computing
  • Massive interactive web applications
  • Energy as a major new factor and constraint
  • Increasing capex and opex pressures
  • Continued innovation is critical to sustain growth

4
Stanford Research Themes
  • RAMCloud: main-memory-based persistent storage
  • Extremely low-latency RPC
  • Networking
  • Large, high-bandwidth, low-latency network fabric
  • Scalable, error-free packet transport
  • Software-defined data center networking with
    OpenFlow
  • Servers and computing
  • Error- and failure-resilient design
  • Energy-aware and energy-proportional design
  • Virtualization and mobile VMs

5
Major research topics of SEDCL
  • RAMCloud: scalable DRAM-based storage
  • Scalable nvRAM
  • All data in DRAM all the time
  • Interconnect fabric
  • Bufferless networks: low-latency, high-bandwidth
    network
  • Packet transport
  • Reliable delivery of packets: R2D2 (Layer 2.5)
  • Congestion management: QCN (IEEE 802.1Qau),
    ECN-HAT, DCTCP
  • Programmable bandwidth partitioning for
    multi-tenanted DCs: AF-QCN
  • Low-latency 10GBASE-T
  • Related projects
  • OpenFlow
  • Energy-aware and energy-proportional design

6
Experimentation is Key to Success
  • Many promising ideas and technologies
  • Will need iterative evaluation at scale with real
    applications
  • Interactions of subsystems and mechanisms not
    clear
  • Experimentation is the best way to understand
    these interactions
  • Difficult to experiment with internal mechanisms
    of a DC
  • No experimental facilities exist, and that is a big
    barrier to innovation
  • Ongoing efforts to enable experimentation:
  • Facebook, Microsoft, NEC, Yahoo!, Google, Cisco,
    Intel, ...

7
Overview of Research Projects
  • RAMCloud
  • Packet transport mechanisms
  • Rapid and reliable data delivery: R2D2 (Layer 2.5)
  • ECN-HAT, DCTCP: collaboration with Microsoft
  • Data center switching fabric
  • Extremely low latency, low errors and congestion
    (bufferless)
  • High port density with very large bisection
    bandwidth
  • Project just initiated

8
RAMCloud Overview (Lead: John Ousterhout)
  • Storage for datacenters
  • 1000-10000 commodity servers
  • 64 GB DRAM/server
  • All data always in RAM
  • Durable and available
  • Low-latency access: 5 µs RPC
  • High throughput: 1M ops/sec/server (see the
    back-of-envelope sketch after this slide)

[Diagram: application servers and storage servers within a datacenter]
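The numbers on this slide multiply out to a very large aggregate store. Below is a back-of-envelope sketch in Python using only the figures quoted above (server-count range, DRAM per server, per-server operation rate, per-RPC latency); the cluster sizes are the slide's quoted range, not a specific SEDCL deployment.

```python
# Back-of-envelope scaling for a RAMCloud-style cluster, using the figures
# on this slide: 64 GB DRAM/server, 1M ops/sec/server, 5 us per RPC.
DRAM_PER_SERVER_GB = 64
OPS_PER_SERVER = 1_000_000
RPC_LATENCY_US = 5

for servers in (1_000, 10_000):   # server-count range quoted on the slide
    total_dram_tb = servers * DRAM_PER_SERVER_GB / 1024
    total_ops_per_sec = servers * OPS_PER_SERVER
    print(f"{servers:>6} servers: {total_dram_tb:,.0f} TB DRAM, "
          f"{total_ops_per_sec / 1e9:.0f} billion ops/sec aggregate, "
          f"{RPC_LATENCY_US} us per RPC")
```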
9
RAMCloud Research Issues
  • Data durability and availability
  • Low-latency RPC: 5 microseconds
  • Need suitable network!
  • Data model (see the sketch after this list)
  • Concurrency/consistency model
  • Data distribution, scaling
  • Automated management
  • Multi-tenancy
  • Client-server functional distribution
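The data model and client-server functional distribution questions above concern what interface a DRAM-resident store should expose. Below is a minimal illustrative sketch of one candidate, a table/key/value interface held entirely in memory; the class and method names (RamStore, read, write, delete) are hypothetical and are not the RAMCloud API.

```python
# Hypothetical table/key/value interface of the kind a DRAM-resident store
# might expose; RamStore and its method names are illustrative only.
class RamStore:
    def __init__(self):
        self._tables = {}                  # table name -> {key: value}, all in DRAM

    def write(self, table, key, value):
        self._tables.setdefault(table, {})[key] = value

    def read(self, table, key):
        return self._tables[table][key]    # KeyError if the object is absent

    def delete(self, table, key):
        self._tables.get(table, {}).pop(key, None)

store = RamStore()
store.write("users", "alice", {"follows": ["bob"]})
print(store.read("users", "alice"))
```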

10
Layer 2.5: Motivation and Use Cases
  • Speed up TCP performance in data centers
  • TCP performs poorly when there is a large number
    of packet drops
  • Applications like MapReduce/Hadoop and GFS cause
    the incast problem where a large number of
    packets are dropped at switches
  • L2.5 is a highly scalable method of rapidly
    retransmitting dropped packets
  • FCoE
  • Corruption losses, though rare, lead to SCSI
    timeouts.
  • Priority flow control (IEEE 802.1Qbb) lets
    Ethernet switches avoid dropping packets, but
    requires skid (PAUSE-absorption) buffers; these
    grow with the bandwidth-delay product of the
    links and are very expensive (see the worked
    example after this slide)
  • L2.5 enables FCoE to overcome corruption losses
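The skid-buffer cost mentioned above can be made concrete with a small bandwidth-delay-product calculation; the link rates and PAUSE-reaction delay below are assumed example values, not figures from the slide.

```python
# Illustrative skid-buffer sizing: with priority flow control (IEEE 802.1Qbb)
# a switch must absorb the data still in flight between sending PAUSE and the
# sender actually stopping, roughly the bandwidth-delay product of the link.
def skid_buffer_bytes(link_gbps, pause_delay_us):
    bits_in_flight = link_gbps * 1e9 * pause_delay_us * 1e-6
    return bits_in_flight / 8

# Assumed example values (not from the slide): 10 and 40 Gb/s links with a
# few microseconds of cable plus PAUSE-reaction delay.
for gbps in (10, 40):
    kib = skid_buffer_bytes(gbps, pause_delay_us=5) / 1024
    print(f"{gbps} Gb/s link, 5 us delay: ~{kib:.1f} KiB per port per priority")
```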

11
L2.5 Research Issues
  • Determine simple signaling method
  • Simplify (or get rid of) headers/tags for L2.5
    encapsulation
  • Develop and refine the basic algorithm for TCP (a
    generic retransmission sketch follows this list)
  • In the kernel
  • In hardware (NICs)
  • Develop the algorithm for storage (FC, FCoE)
  • Deploy in a large testbed
  • Collaborate on standardization
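The slides describe L2.5 only as a scalable way to rapidly retransmit dropped packets and do not give the algorithm itself. The sketch below is a generic NACK-driven retransmission shim of that general shape; all class names, message formats, and the buffer bound are invented for illustration and are not the R2D2/L2.5 design.

```python
# Generic NACK-driven rapid-retransmit shim (not the actual R2D2/L2.5
# algorithm, which these slides do not specify). The sender keeps recently
# transmitted frames; the receiver reports sequence-number gaps so that only
# the missing frame is resent, long before TCP's retransmission timeout.
from collections import OrderedDict

class RapidRetransmitSender:
    def __init__(self, window=1024):
        self.next_seq = 0
        self.in_flight = OrderedDict()          # seq -> payload, bounded buffer
        self.window = window

    def send(self, payload, transmit):
        seq = self.next_seq
        self.next_seq += 1
        self.in_flight[seq] = payload
        if len(self.in_flight) > self.window:   # forget the oldest saved frame
            self.in_flight.popitem(last=False)
        transmit(seq, payload)

    def on_nack(self, missing_seq, transmit):
        payload = self.in_flight.get(missing_seq)
        if payload is not None:                 # resend only the missing frame
            transmit(missing_seq, payload)

class RapidRetransmitReceiver:
    def __init__(self):
        self.expected = 0

    def on_frame(self, seq, send_nack):
        for lost in range(self.expected, seq):  # gap in sequence => loss
            send_nack(lost)
        self.expected = max(self.expected, seq + 1)
```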

12
DCTCP
  • DCTCP: TCP for data centers
  • Operates with really small buffers
  • Optimized for low-latency
  • Uses ECN marking
  • With Mohammad Alizadeh, and Greenberg et al. at
    Microsoft
  • Influenced by ECN-HAT (with Abdul Kabbani)

13
DCTCP: Transport Optimized for Data Centers
  • High throughput
  • Creating multi-bit feedback at TCP sources
  • Low Latency (milliseconds matter)
  • Small buffer occupancies due to early and
    aggressive ECN marking
  • Burst tolerance
  • Sources react before packets are dropped
  • Large buffer headroom for bursts

Problems addressed: queue buildup, incast
The DCTCP "sauce" (see the sketch below):
  • Use the full information in the stream of ECN marks
  • Adapt quickly and in proportion to the level of
    congestion

Result: DCTCP reduces variability and reduces queuing
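The "sauce" bullets correspond to the published DCTCP sender rule (Alizadeh et al.): keep a running estimate of the fraction of ECN-marked packets and cut the congestion window in proportion to it. Below is a minimal sketch; the EWMA gain and starting window are illustrative parameters.

```python
# DCTCP sender-side reaction (per the DCTCP paper by Alizadeh et al.):
# estimate the fraction of ECN-marked packets each RTT and cut cwnd in
# proportion to it, rather than halving on any single congestion signal.
class DctcpSender:
    def __init__(self, cwnd=10.0, g=1 / 16):
        self.cwnd = cwnd        # congestion window in packets (illustrative start)
        self.alpha = 0.0        # running estimate of the marked fraction
        self.g = g              # EWMA gain (1/16 in the DCTCP paper)

    def on_rtt_end(self, acked, marked):
        """Call once per RTT with the counts of ACKed and ECN-marked packets."""
        frac = marked / acked if acked else 0.0
        self.alpha = (1 - self.g) * self.alpha + self.g * frac
        if marked:
            # Multi-bit feedback: mild congestion gives a small cut,
            # persistent heavy marking approaches TCP's halving.
            self.cwnd *= (1 - self.alpha / 2)
        else:
            self.cwnd += 1      # standard additive increase

s = DctcpSender()
s.on_rtt_end(acked=100, marked=10)   # 10% of this RTT's packets were marked
print(round(s.cwnd, 2), round(s.alpha, 4))
```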
14
Research Themes and Teams
  • Web App Framework: J. Ousterhout, M. Rosenblum,
    S. Mitra
  • Resilient Systems: N. McKeown
  • Virtualization (server and network): M. Rosenblum,
    B. Prabhakar, K. Kozyrakis
  • Energy Aware: P. Levis, N. McKeown, J. Ousterhout
  • Storage: M. Rosenblum, D. Mazieres, N. McKeown
  • Networking: G. Parulkar, B. Prabhakar