Data Center Networks - PowerPoint PPT Presentation

About This Presentation
Title:

Data Center Networks

Description:

Data Center Networks Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101 http://www.cs.princeton.edu/courses/archive/spr12/cos461/ – PowerPoint PPT presentation

Transcript and Presenter's Notes

1
Data Center Networks
  • Jennifer Rexford
  • COS 461: Computer Networks
  • Lectures: MW 10-10:50am in Architecture N101
  • http://www.cs.princeton.edu/courses/archive/spr12/cos461/

2
Networking Case Studies
  • Data Center
  • Backbone
  • Enterprise
  • Cellular
  • Wireless

3
Cloud Computing
4
Cloud Computing
  • Elastic resources
  • Expand and contract resources
  • Pay-per-use
  • Infrastructure on demand
  • Multi-tenancy
  • Multiple independent users
  • Security and resource isolation
  • Amortize the cost of the (shared) infrastructure
  • Flexible service management

5
Cloud Service Models
  • Software as a Service
  • Provider licenses applications to users as a
    service
  • E.g., customer relationship management, e-mail,
    ...
  • Avoid costs of installation, maintenance,
    patches, ...
  • Platform as a Service
  • Provider offers platform for building
    applications
  • E.g., Google's App Engine
  • Avoid worrying about scalability of platform

6
Cloud Service Models
  • Infrastructure as a Service
  • Provider offers raw computing, storage, and
    network
  • E.g., Amazon's Elastic Compute Cloud (EC2)
  • Avoid buying servers and estimating resource needs

7
Enabling Technology: Virtualization
  • Multiple virtual machines on one physical machine
  • Applications run unmodified as on real machine
  • VM can migrate from one computer to another

8
Multi-Tier Applications
  • Applications consist of tasks
  • Many separate components
  • Running on different machines
  • Commodity computers
  • Many general-purpose computers
  • Not one big mainframe
  • Easier scaling

9
Multi-Tier Applications
[Figure: tree of application components. A front-end server at the top, aggregators in the middle tiers, and workers at the leaves.]

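The front end / aggregator / worker tree on this slide can be sketched as nested function calls. This is only an illustration: the function names, the substring "query", and the in-memory shards are assumptions made for the example, and in a real deployment each tier would run on separate machines and communicate over the network.

```python
# Hypothetical sketch of a multi-tier (partition/aggregate) application.
# All names and data are illustrative, not taken from the slides.

def worker(shard, query):
    """Leaf tier: search one shard of the data."""
    return [item for item in shard if query in item]

def aggregator(shards, query):
    """Middle tier: fan the query out to workers and merge their answers."""
    results = []
    for shard in shards:
        results.extend(worker(shard, query))
    return results

def front_end(aggregator_groups, query):
    """Top tier: fan out to aggregators and return one combined, sorted answer."""
    merged = []
    for shards in aggregator_groups:
        merged.extend(aggregator(shards, query))
    return sorted(merged)

groups = [
    [["apple", "banana"], ["cherry"]],        # shards behind aggregator 1
    [["apricot"], ["blueberry", "avocado"]],  # shards behind aggregator 2
]
answer = front_end(groups, "ap")
```

Because every request touches many workers, the slowest worker or the slowest network path tends to set the response time, which is one reason the later slides focus on per-connection diagnosis.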
10
Data Center Network
11
Virtual Switch in Server
12
Top-of-Rack Architecture
  • Rack of servers
  • Commodity servers
  • And top-of-rack switch
  • Modular design
  • Preconfigured racks
  • Power, network, and storage cabling

13
Aggregate to the Next Level
14
Modularity, Modularity, Modularity
  • Containers
  • Many containers

15
Data Center Network Topology
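[Figure: tree topology. The Internet connects to core routers (CR), which connect to access routers (AR), then to layers of Ethernet switches (S), then to racks of application servers (A). 1,000 servers/pod.]
  • Key
  • CR: Core Router
  • AR: Access Router
  • S: Ethernet Switch
  • A: Rack of app. servers
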
16
Capacity Mismatch
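[Figure: the same topology annotated with oversubscription ratios: 5:1 at the switches directly above the server racks (A), 40:1 at the next switch layer (S), and 200:1 at the router layer (AR, CR).]
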
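An oversubscription ratio compares the total capacity of a switch's downlinks to the capacity of its uplinks. The sketch below shows the arithmetic. The link counts and speeds are hypothetical, and it assumes that the ratios in the figure are cumulative from a server's point of view, so per-tier ratios multiply on the way up.

```python
# Hypothetical numbers; only the arithmetic is the point.

def oversubscription(num_downlinks, downlink_gbps, num_uplinks, uplink_gbps):
    """Ratio of total downlink capacity to total uplink capacity."""
    return (num_downlinks * downlink_gbps) / (num_uplinks * uplink_gbps)

# Assumed rack: 50 servers at 1 Gbps sharing a single 10 Gbps uplink.
tor_ratio = oversubscription(50, 1, 1, 10)

# Assumed per-tier ratios going up the tree; cumulative ratios multiply.
per_tier = [tor_ratio, 8.0, 5.0]
cumulative = []
running = 1.0
for ratio in per_tier:
    running *= ratio
    cumulative.append(running)

# Worst-case share of a 1 Gbps server link when all servers send across the core.
worst_case_mbps = 1000 / cumulative[-1]
```

Under these assumptions a server with a 1 Gbps link could be left with about 5 Mbps when all traffic crosses the top of the tree, which is the mismatch the slide title refers to.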
17
Data-Center Routing
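[Figure: the same topology split into two regions. DC-Layer 3 covers the Internet, core routers (CR), and access routers (AR). DC-Layer 2 covers the Ethernet switches (S) and the racks of application servers (A). 1,000 servers/pod = IP subnet.]
  • Key
  • CR: Core Router (L3)
  • AR: Access Router (L3)
  • S: Ethernet Switch (L2)
  • A: Rack of app. servers
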
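Since each pod of roughly 1,000 servers forms one IP subnet, it can help to see how large that subnet has to be. The following sketch uses Python's standard `ipaddress` module. The 10.0.0.0 base address and the helper name are assumptions chosen for illustration.

```python
import ipaddress

def smallest_prefix_for(hosts):
    """Smallest IPv4 prefix length whose subnet holds `hosts` usable addresses.

    Two addresses per subnet are reserved (network and broadcast).
    """
    host_bits = (hosts + 1).bit_length()
    return 32 - host_bits

prefix = smallest_prefix_for(1000)
pod_subnet = ipaddress.ip_network(f"10.0.0.0/{prefix}")  # hypothetical address block
usable_hosts = pod_subnet.num_addresses - 2
```

With these assumptions a pod fits in a /22, which leaves a small amount of headroom beyond 1,000 servers.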
18
Reminder: Layer 2 vs. Layer 3
  • Ethernet switching (layer 2)
  • Cheaper switch equipment
  • Fixed addresses and auto-configuration
  • Seamless mobility, migration, and failover
  • IP routing (layer 3)
  • Scalability through hierarchical addressing
  • Efficiency through shortest-path routing
  • Multipath routing through equal-cost multipath
  • So, like in enterprises
  • Connect layer-2 islands by IP routers
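Equal-cost multipath is commonly implemented by hashing a flow's header fields and using the hash to pick one of the equal-cost next hops, so packets of one flow stay on one path. The sketch below illustrates that idea only. The hash function, the field layout, and the router names are assumptions, and real routers use their own hardware hash functions.

```python
import hashlib

def ecmp_next_hop(flow, next_hops):
    """Pick a next hop for a flow by hashing its 5-tuple (illustrative only)."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

# (src IP, dst IP, protocol, src port, dst port); all values are made up.
flow = ("10.0.1.5", "10.0.9.7", "tcp", 43512, 80)
equal_cost_paths = ["AR1", "AR2", "AR3", "AR4"]  # hypothetical router names
chosen = ecmp_next_hop(flow, equal_cost_paths)
```

Hashing per flow rather than per packet avoids reordering within a TCP connection, at the cost of uneven load when a few flows carry most of the bytes.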

19
Case Study: Performance Diagnosis in Data Centers
  • http://www.eecs.berkeley.edu/minlanyu/writeup/nsdi11.pdf

20
Applications Inside Data Centers
[Figure: a front-end server, aggregators, and many workers.]

21
Challenges of Datacenter Diagnosis
  • Multi-tier applications
  • Hundreds of application components
  • Tens of thousands of servers
  • Evolving applications
  • Add new features, fix bugs
  • Change components while app is still in operation
  • Human factors
  • Developers may not understand network well
  • Nagle's algorithm, delayed ACK, etc.

22
Diagnosing in Today's Data Center
  • App logs: reqs/sec, response time, 1% req. > 200 ms
    delay
  • Packet trace: filter out trace for long-delay req.
  • Switch logs: bytes/pkts per minute
  • SNAP: diagnose net-app interactions
[Figure: a host running the app, the OS, and a packet sniffer.]

23
Problems of Different Logs
  • App logs: application-specific
  • Packet trace: too expensive
  • Switch logs: too coarse-grained
  • SNAP: generic, fine-grained, and lightweight; runs
    everywhere, all the time
[Figure: a host running the app, the OS, and a packet sniffer.]

24
TCP Statistics
  • Instantaneous snapshots
  • Bytes in the send buffer
  • Congestion window size, receiver window size
  • Snapshots based on random sampling
  • Cumulative counters
  • FastRetrans, Timeout
  • RTT estimation: SampleRTT, SumRTT
  • RwinLimitTime
  • Calculate difference between two polls
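Cumulative counters only become meaningful once two polls are subtracted. A minimal sketch of that step follows. The counter names come from the slide, while the dictionary layout and the sample values are assumptions for illustration.

```python
def counter_deltas(previous_poll, current_poll):
    """Per-counter increase between two polls of cumulative counters."""
    return {
        name: current_poll[name] - previous_poll[name]
        for name in current_poll
        if name in previous_poll
    }

# Made-up values for one connection at two polling instants.
poll_t0 = {"FastRetrans": 12, "Timeout": 3, "RwinLimitTime": 450}
poll_t1 = {"FastRetrans": 15, "Timeout": 3, "RwinLimitTime": 700}
deltas = counter_deltas(poll_t0, poll_t1)
```

Here the connection saw three fast retransmissions and no timeouts during the interval, which says more than the raw totals do.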

25
Identifying Performance Problems
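  • Sender App: not any other problems
  • Send Buffer: send buffer is almost full
  • Network: fast retransmission, timeout
  • Receiver: RwinLimitTime
  • Receiver: delayed ACK, flagged when
    diff(SumRTT)/diff(SampleRTT) > MaxDelay

[Figure: data path from Sender App to Send Buffer to Network to Receiver, labeled with the measurement methods Sampling, Direct measure, and Inference.]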
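The rules on this slide can be read as a small per-connection classifier. The sketch below is one possible reading, not SNAP's actual implementation: the thresholds (`full_fraction`, `max_delay_ms`), the dictionary layout, and the label strings are all assumptions. It treats SumRTT as the sum of RTT samples and SampleRTT as the number of samples, so their ratio over an interval is the average RTT.

```python
def classify(prev, curr, send_buffer_bytes, send_buffer_capacity,
             max_delay_ms=100.0, full_fraction=0.9):
    """Label which stage limited a connection between two polls (illustrative)."""
    def delta(name):
        return curr[name] - prev[name]

    problems = []
    # Snapshot rule: send buffer is almost full.
    if send_buffer_bytes >= full_fraction * send_buffer_capacity:
        problems.append("send_buffer")
    # Counter rules: loss events point at the network.
    if delta("FastRetrans") > 0 or delta("Timeout") > 0:
        problems.append("network")
    # Time spent limited by the receiver window.
    if delta("RwinLimitTime") > 0:
        problems.append("receiver_window")
    # Average RTT over the interval above MaxDelay suggests delayed ACK.
    samples = delta("SampleRTT")
    if samples > 0 and delta("SumRTT") / samples > max_delay_ms:
        problems.append("delayed_ack")
    # "Not any other problems": attribute the limit to the sender application.
    if not problems:
        problems.append("sender_app")
    return problems

zero = {"FastRetrans": 0, "Timeout": 0, "RwinLimitTime": 0,
        "SampleRTT": 0, "SumRTT": 0}
slow_acks = dict(zero, SampleRTT=10, SumRTT=1500)
labels = classify(zero, slow_acks, send_buffer_bytes=1000,
                  send_buffer_capacity=65536)
```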
26
SNAP Architecture
At each host for every connection
Collect data
  • Direct access to OS
  • Polling per-connection statistics
  • Snapshots (bytes in send buffer)
  • Cumulative counters (FastRetrans)
  • Adaptive tuning of polling rate

27
SNAP Architecture
At each host for every connection
Collect data
Performance Classifier
  • Classifying based on the life of data transfer
  • Algorithms for detecting performance problems
  • Based on direct measurement in the OS

28
SNAP Architecture
At each host for every connection
Cross-connection correlation
Collect data
Performance Classifier
  • Direct access to data center configurations
  • Input
  • Topology, routing information
  • Mapping from connections to processes/apps
  • Correlate problems across connections
  • Sharing the same switch/link, app code
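Cross-connection correlation can be pictured as counting how many problematic connections share each switch, link, or application. The sketch below shows only that counting step. The connection IDs, resource names, and mapping format are hypothetical stand-ins for the topology, routing, and process mappings listed on the slide.

```python
from collections import Counter

def correlate(problem_connections, conn_to_resources):
    """Count problem connections per shared resource, most frequent first."""
    counts = Counter()
    for conn in problem_connections:
        for resource in set(conn_to_resources.get(conn, ())):
            counts[resource] += 1
    return counts.most_common()

# Hypothetical mapping from connections to the switches and apps they touch.
conn_to_resources = {
    "c1": ["switch-7", "app-search"],
    "c2": ["switch-7", "app-mail"],
    "c3": ["switch-2", "app-search"],
    "c4": ["switch-7", "app-search"],
}
ranking = correlate(["c1", "c2", "c4"], conn_to_resources)
```

A resource that appears in most of the problem connections, such as the shared switch in this made-up data, is a natural first suspect.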

29
SNAP Deployment
  • Production data center
  • 8K machines, 700 applications
  • Ran SNAP for a week, collected petabytes of data
  • Identified 15 major performance problems
  • Operators: characterize key problems in the data
    center
  • Developers: quickly pinpoint problems in app
    software, network stack, and their interactions

30
Characterizing Perf. Limitations
Apps that are limited for > 50% of the time
  • Sender App (551 apps)
  • Bottlenecked by CPU, disk, etc.
  • Slow due to app design (small writes)
  • Send Buffer (1 app)
  • Send buffer not large enough
  • Network (6 apps)
  • Fast retransmission
  • Timeout
  • Receiver
  • Not reading fast enough (CPU, disk, etc.) (8 apps)
  • Not ACKing fast enough (Delayed ACK) (144 apps)

31
Delayed ACK
  • Delayed ACK caused significant problems
  • Delayed ACK was used to reduce bandwidth usage
    and server interruption

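[Figure: timeline between hosts A and B. A sends data to B. When B has data to send, the ACK is piggybacked on it (Data+ACK). When B doesn't have data to send, the ACK is sent after 200 ms.]
Delayed ACK should be disabled in data centers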
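The timing in the figure can be captured by a tiny model of when B's ACK leaves. This is a simplified sketch under stated assumptions: the function and parameter names are invented for the example, the 200 ms timer is taken from the slide, and other ACK triggers in real TCP stacks (such as acknowledging every second full-sized segment) are ignored.

```python
def ack_departure_ms(data_arrival_ms, reply_ready_ms=None,
                     delayed_ack=True, timer_ms=200):
    """Time at which the receiver sends its ACK (simplified model)."""
    if not delayed_ack:
        return data_arrival_ms            # ACK immediately
    deadline = data_arrival_ms + timer_ms
    if reply_ready_ms is not None:
        send_time = max(reply_ready_ms, data_arrival_ms)
        if send_time <= deadline:
            return send_time              # ACK piggybacked on B's own data
    return deadline                       # timer expires, bare ACK is sent

with_reply = ack_departure_ms(0, reply_ready_ms=5)
without_reply = ack_departure_ms(0)
disabled = ack_departure_ms(0, delayed_ack=False)
```

In this model a one-way transfer waits the full 200 ms for its ACK, which is very large compared with typical round-trip times inside a data center.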
32
Diagnosing Delayed ACK with SNAP
  • Monitor at the right place
  • Scalable, low overhead data collection at all
    hosts
  • Algorithms to identify performance problems
  • Identify delayed ACK with OS information
  • Correlate problems across connections
  • Identify the apps with significant delayed ACK
    issues
  • Fix the problem with operators and developers
  • Disable delayed ACK in data centers

33
Conclusion
  • Cloud computing
  • Major trend in IT industry
  • Today's equivalent of factories
  • Data center networking
  • Regular topologies interconnecting VMs
  • Mix of Ethernet and IP networking
  • Modular, multi-tier applications
  • New ways of building applications
  • New performance challenges

34
Load Balancing
35
Load Balancers
  • Spread load over server replicas
  • Present a single public address (VIP) for a
    service
  • Direct each request to a server replica

[Figure: a load balancer exposes Virtual IP (VIP) 192.121.10.1 and forwards requests to server replicas 10.10.10.1, 10.10.10.2, and 10.10.10.3.]
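A minimal sketch of the idea, using the addresses from the figure: clients address the VIP, and the balancer picks a replica for each request. Round-robin selection and the class name are assumptions made for the example; the slides do not say which policy is used, and production load balancers typically also track server health and keep a connection pinned to one replica.

```python
import itertools

class RoundRobinBalancer:
    """Toy load balancer: one public VIP in front of several replicas."""

    def __init__(self, vip, replicas):
        self.vip = vip
        self._next_replica = itertools.cycle(replicas)

    def route(self, destination):
        """Return the replica that should serve a request sent to the VIP."""
        if destination != self.vip:
            raise ValueError(f"{destination} is not served by this balancer")
        return next(self._next_replica)

balancer = RoundRobinBalancer(
    "192.121.10.1", ["10.10.10.1", "10.10.10.2", "10.10.10.3"]
)
chosen = [balancer.route("192.121.10.1") for _ in range(4)]
```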
36
Wide-Area Network
37
Wide-Area Network: Ingress Proxies