SEDCL: Stanford Experimental Data Center Laboratory

About This Presentation

Title:

SEDCL: Stanford Experimental Data Center Laboratory

Description:

SEDCL: Stanford Experimental Data Center Laboratory. Tackle Data Center Scaling Challenges with Stanford's research depth and breadth ...

Slides: 15

Provided by: simulaSta

Transcript and Presenter's Notes

1
SEDCL: Stanford Experimental Data Center Laboratory

2
Tackle Data Center Scaling Challenges
with Stanford's research depth
and breadth

3
Data Center Scaling

Networks of data centers and web services are
the key building blocks for future computing
Factors contributing to data center scaling
challenges
Explosive growth of data with no locality of any
kind
Legal requirement to back up data in
geographically separated locations, a big concern
for the financial industry
Emergence of mobile and Cloud Computing
Massive interactive web applications
Energy as a major new factor and constraint
Increasing capex and opex pressures
Continued innovations critical to sustain growth

4
Stanford Research Themes

RAMCloud: main-memory-based persistent storage
Extremely low latency RPC
Networking
Large, high-bandwidth, low-latency network fabric
Scalable, error-free packet transport
Software defined data center networking with
OpenFlow
Servers and computing
Error and failure resilient design
Energy aware and energy proportional design
Virtualization and mobile VMs

5
Major research topics of SEDCL

RAMCloud: Scalable DRAM-based Storage
Scalable nvRAM
All data in DRAMs all the time
Interconnect fabric
Bufferless networks: low-latency, high-bandwidth
network
Packet transport
Reliable delivery of packets: R2D2 (L2.5)
Congestion management: QCN (IEEE 802.1Qau),
ECN-HAT, DCTCP
Programmable bandwidth partitioning for
multi-tenanted DCs: AF-QCN
Low-latency 10GBaseT
Related projects
OpenFlow
Energy aware and energy proportional design

6
Experimentation is Key to Success

Many promising ideas and technologies
Will need iterative evaluation at scale with real
applications
Interactions of subsystems and mechanisms not
clear
Experimentation best way to understand the
interactions
Difficult to experiment with internal mechanisms
of a DC
No experimental facilities exist, which is a big
barrier to innovation
Ongoing efforts to enable experimentation
Facebook, Microsoft, NEC, Yahoo!, Google, Cisco,
Intel,

7
Overview of Research Projects

RAMCloud
Packet transport mechanisms
Rapid and reliable data delivery: R2D2 (L2.5)
ECN-HAT, DCTCP: collaboration with Microsoft
Data center switching fabric
Extremely low latency, low errors and congestion
(bufferless)
High port density with very large bisection
bandwidth
(project just initiated)

8
RAMCloud Overview (Lead: John Ousterhout)

Storage for datacenters
1000-10000 commodity servers
64 GB DRAM/server
All data always in RAM
Durable and available
Low-latency access: 5 µs RPC
High throughput: 1M ops/sec/server
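
The per-server figures above imply aggregate cluster numbers that are easy to work out. The following is only a back-of-the-envelope sketch: it assumes perfect linear scaling across servers and decimal terabytes, and the function name `cluster_totals` is made up for illustration.

```python
# Per-server figures taken from the slide.
DRAM_PER_SERVER_GB = 64
OPS_PER_SERVER = 1_000_000  # 1M ops/sec/server


def cluster_totals(num_servers):
    """Return (total DRAM in TB, total ops/sec).

    Assumption: capacity and throughput scale linearly with server count,
    which ignores replication overhead and load imbalance.
    """
    total_tb = num_servers * DRAM_PER_SERVER_GB / 1000  # decimal TB
    total_ops = num_servers * OPS_PER_SERVER
    return total_tb, total_ops


if __name__ == "__main__":
    # The slide gives a range of 1000 to 10000 commodity servers.
    for n in (1_000, 10_000):
        tb, ops = cluster_totals(n)
        print(f"{n} servers: {tb:.0f} TB DRAM, {ops:.1e} ops/sec")
```

Under these assumptions the stated range works out to roughly 64 TB to 640 TB of DRAM and on the order of a billion to ten billion operations per second.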

[Figure: application servers and storage servers within a datacenter]

9
RAMCloud Research Issues

Data durability and availability
Low-latency RPC: 5 microseconds
Need suitable network!
Data model
Concurrency/consistency model
Data distribution, scaling
Automated management
Multi-tenancy
Client-server functional distribution

10
Layer 2.5: Motivation and use cases

Speed up TCP performance in data centers
TCP performs poorly when there is a large number
of packet drops
Applications like MapReduce/Hadoop and GFS cause
the incast problem where a large number of
packets are dropped at switches
L2.5 is a highly scalable method of rapidly
retransmitting dropped packets
FCoE
Corruption losses, though rare, lead to SCSI
timeouts.
Priority flow control (IEEE 802.1Qbb) enables
Ethernet switches not to drop packets, but
requires skid or PAUSE-absorption buffers.
But skid buffers grow as bandwidth-delay product
of links and are very expensive.
L2.5 enables FCoE to overcome corruption losses
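
To make the skid-buffer point concrete, the bandwidth-delay product of a link can be estimated as link rate times round-trip time. This is a simplified sketch, not a full PFC headroom calculation: the propagation speed of 2e8 m/s, the example link rates, and the cable lengths are assumptions, and real headroom also has to cover frames already in flight and the receiver's response time.

```python
def bdp_bytes(link_rate_bps, cable_length_m, propagation_m_per_s=2.0e8):
    """Bandwidth-delay product of one link, in bytes.

    Assumption: the delay is pure propagation delay over the cable,
    counted in both directions (PAUSE frame out, data already in flight).
    """
    rtt_s = 2 * cable_length_m / propagation_m_per_s
    return link_rate_bps * rtt_s / 8


if __name__ == "__main__":
    # Hypothetical links: the buffer needed grows with both rate and length.
    for rate_gbps, length_m in ((10, 100), (40, 100), (10, 10_000)):
        size = bdp_bytes(rate_gbps * 1e9, length_m)
        print(f"{rate_gbps} Gb/s over {length_m} m: about {size:,.0f} bytes per port")
```

Because this amount is needed per port (and per priority class when PFC is used), the total grows quickly on fast or long links.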

11
L2.5 Research Issues

Determine simple signaling method
Simplify (or get rid of) headers/tags for L2.5
encapsulation
Develop and refine the basic algorithm for TCP
In the kernel
In hardware (NICs)
Develop the algorithm for storage (FC, FCoE)
Deploy in a large testbed
Collaborate on standardization

12
DCTCP

DCTCP: TCP for data centers
Operates with really small buffers
Optimized for low-latency
Uses ECN marking
With Mohammad Alizadeh, and Greenberg et al. at
Microsoft
Influenced by ECN-HAT (with Abdul Kabbani)

13
DCTCP: Transport Optimized for Data Centers

High throughput
Creating multi-bit feedback at TCP sources
Low Latency (milliseconds matter)
Small buffer occupancies due to early and
aggressive ECN marking
Burst tolerance
Sources react before packets are dropped
Large buffer headroom for bursts

Queue buildup
Incast
Sauce

Use full info in stream of ECN marks
Adapt quickly and in proportion to level of
congestion
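
The idea of reacting in proportion to congestion can be sketched as follows. The slides do not give the formulas; this sketch follows the publicly documented DCTCP sender behaviour (a running estimate of the fraction of ECN-marked packets, and a window cut scaled by that estimate). The function names and the gain `g = 1/16` are illustrative choices, not taken from the presentation.

```python
def update_alpha(alpha, marked_acks, total_acks, g=1 / 16):
    """Update the running estimate of the fraction of ECN-marked packets.

    Called once per window of data: alpha <- (1 - g) * alpha + g * F,
    where F is the fraction of ACKs in the window that carried ECN marks.
    """
    fraction = marked_acks / total_acks if total_acks else 0.0
    return (1 - g) * alpha + g * fraction


def reduce_cwnd(cwnd, alpha):
    """Cut the window in proportion to congestion: cwnd <- cwnd * (1 - alpha / 2).

    With alpha near 0 the window barely changes; with alpha = 1 the cut
    is the same halving that standard TCP applies.
    """
    return cwnd * (1 - alpha / 2)


if __name__ == "__main__":
    alpha, cwnd = 0.0, 100.0
    # Hypothetical windows: (marked ACKs, total ACKs) observed per window.
    for marked, total in ((0, 100), (10, 100), (50, 100), (100, 100)):
        alpha = update_alpha(alpha, marked, total)
        if marked:
            cwnd = reduce_cwnd(cwnd, alpha)
        print(f"marked {marked}/{total}: alpha={alpha:.4f}, cwnd={cwnd:.2f}")
```

Using the whole stream of marks, rather than treating any single mark as a loss signal, is what lets the sender back off gently under mild congestion and sharply under heavy congestion.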

DCTCP: reduces variability, reduces queuing

14
Research Themes and Teams
WEB App Framework
J. Ousterhout
M. Rosenblum
S. Mitra
Resilient Systems
N. McKeown
Virtualization: Server and network
M. Rosenblum, B. Prabhakar
K. Kozyrakis
Energy Aware
P. Levis
N. McKeown
J. Ousterhout
Storage
M. Rosenblum, D. Mazieres
N. McKeown
Networking
G. Parulkar
B. Prabhakar
