AOL Visit to Caltech

Transcript and Presenter's Notes
1
  • AOL Visit to Caltech
  • Discussion of Advanced Networking
  • Tuesday, September 23, 2003
  • 10:00 AM - 12:30 PM
  • 248-Lauritsen
  • ravot@caltech.edu

2
Agenda
  • Overview of LHCnet
  • High TCP performance over wide area networks
  • Problem Statement
  • Fairness
  • Solutions
  • Awards
  • Internet2 land speed record
  • Fastest and biggest in the West (CENIC award)
  • IPv6 Internet2 land speed record
  • Demos at Telecom World 2003, SC2003, WSIS

3
LHCnet (I)
  • CERN - US production traffic
  • A test-bed to experiment with massive file
    transfers across the Atlantic
  • Provide high-performance protocols for gigabit
    networks underlying data-intensive Grids
  • Guarantee interoperability between several major
    Grid projects in Europe and USA

[Diagrams: Feb. 2003 setup and Sept. 2003 setup]
4
LHCnet (II)
New setup
  • Unique Multi-platform / Multi-technology optical
    transatlantic test-bed
  • layer-2 and layer-3 capabilities
  • Cisco, Juniper, Alcatel, Extreme Networks,
    Procket
  • Powerful Linux farms
  • Native IPv6, QoS, LBE
  • New levels of service: MPLS, GMPLS
  • Get hands-on experience with the operation of
    gigabit networks
  • Stability and reliability of hardware and
    software
  • Interoperability

5
LHCnet peering & optical connectivity
  • Excellent relationships and connectivity with
    research and academic networks
  • UCAID, CENIC and NLR in particular
  • Extension of the LHCnet to Sunnyvale during
    SC2002
  • Cisco and Level(3) loan
  • Internet2 Land speed record
  • 22 TeraBytes transferred in 6 hours from
    Baltimore to Sunnyvale
  • The optical triangle will be extended to the UK,
    forming an optical quadrangle

6
  • High TCP performance over wide area networks

7
Problem Statement
  • End-user's perspective
  • Using TCP as the data-transport protocol for Grids
    leads to poor bandwidth utilization in high-speed WANs
  • Network protocol designer's perspective
  • TCP is inefficient in high bandwidth-delay product networks
  • TCP's congestion control algorithm (AIMD) is not
    suited to gigabit networks
  • When the window size is 1 -> 100% increase in window
    size per RTT
  • When the window size is 1000 -> only a 0.1% increase
    per RTT (see the sketch below)
  • Due to TCP's limited feedback mechanisms, line
    errors are interpreted as congestion
  • RFC 2581 (which gives the formula for increasing
    cwnd) forgot delayed ACKs
  • The future performance of computational grids
    looks bad if we continue to rely on the
    widely-deployed TCP RENO
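
As a rough illustration of why AIMD's relative gain collapses at large windows, here is a minimal Python sketch (ours, not from the slides):

    # AIMD congestion avoidance grows cwnd by one segment per RTT,
    # i.e. by a fraction 1/cwnd of the current window -- the relative
    # gain vanishes as the window grows.
    def relative_increase_percent(cwnd_segments: int) -> float:
        """Per-RTT window growth as a percentage of the current window."""
        return 100.0 / cwnd_segments

    for cwnd in (1, 10, 100, 1000):
        print(f"cwnd = {cwnd:4d} segments -> +{relative_increase_percent(cwnd):.1f}% per RTT")
    # cwnd =    1 segments -> +100.0% per RTT
    # cwnd = 1000 segments -> +0.1% per RTT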

8
Single TCP stream performance under periodic
losses
Mathis model parameters: MSS = 1,500 bytes, C = 1.22
  • Loss rate = 0.01%
  • LAN BW utilization = 99%
  • WAN BW utilization = 1.2%
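
These utilizations are consistent with the Mathis et al. throughput bound, which the parameter line above suggests was used (a reconstruction; the slide only shows the numbers):

    Throughput <= (C · MSS) / (RTT · sqrt(p)),   C ≈ 1.22

With MSS = 1,500 bytes and p = 0.0001, the bound is about 12 Mb/s at RTT = 120 ms (1.2% of a 1 Gb/s path), but exceeds 1 Gb/s for a sub-millisecond LAN RTT, hence the 99% figure.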

9
Single TCP stream
TCP connection between Geneva and Chicago:
C = 1 Gbit/s, MSS = 1,460 bytes, RTT = 120 ms
10
Responsiveness (I)
  • The responsiveness r measures how quickly we return
    to using the network link at full capacity after a
    loss, assuming that the congestion window size equals
    the bandwidth-delay product when the packet is lost.

    r = C · RTT² / (2 · MSS)

where C is the capacity of the link
11
Responsiveness (II)
The Linux kernel 2.4.x implements delayed
acknowledgments. Due to delayed acknowledgments,
the responsiveness is multiplied by two; the values
above therefore have to be doubled (see the sketch
below).
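
A minimal Python sketch of this computation (the function name is ours):

    def responsiveness(capacity_bps: float, rtt_s: float, mss_bytes: int,
                       delayed_acks: bool = True) -> float:
        """Seconds to return to full link utilization after one loss:
        r = C * RTT^2 / (2 * MSS), doubled when delayed ACKs halve the
        window growth rate (Linux 2.4.x)."""
        r = capacity_bps * rtt_s ** 2 / (2 * mss_bytes * 8)
        return 2 * r if delayed_acks else r

    # Geneva-Chicago stream from slide 9: C = 1 Gb/s, RTT = 120 ms, MSS = 1,460 bytes
    print(f"{responsiveness(1e9, 0.120, 1460):.0f} s")   # ~1233 s, about 20 minutes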
12
Measurements with Different MTUs
13
MTU and Fairness
[Diagram: Host 1 and Host 2 at CERN (GVA) and at Starlight (Chi), 1 GE each, through a GbE switch and a POS 2.5 Gbps transatlantic link; the shared 1 Gb/s link is the bottleneck]
  • Two TCP streams share a 1 Gb/s bottleneck
  • RTT = 117 ms
  • MTU = 1,500 bytes: avg. throughput over a period
    of 4,000 s = 50 Mb/s
  • MTU = 9,000 bytes: avg. throughput over a period
    of 4,000 s = 698 Mb/s
  • A factor of 14! (see the check below)
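
A back-of-envelope check of the MTU bias (our arithmetic, not on the slide): under the Mathis model, two flows seeing the same loss rate get throughput proportional to their MSS, so the jumbo-frame flow should already win by about a factor of 6; the measured factor of 14 shows an even stronger bias.

    print(9000 / 1500)   # 6.0   predicted jumbo/standard throughput ratio
    print(698 / 50)      # 13.96 measured ratio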

14
RTT and Fairness
[Diagram: Host 1 and Host 2 at CERN (GVA), 1 GE each, through a GbE switch; POS 2.5 Gb/s to Starlight (Chi) and POS 10 Gb/s / 10GE on to Sunnyvale (Host 1); the shared 1 Gb/s link is the bottleneck]
  • Two TCP streams share a 1 Gb/s bottleneck
  • CERN <-> Sunnyvale: RTT = 181 ms; avg. throughput
    over a period of 7,000 s = 202 Mb/s
  • CERN <-> Starlight: RTT = 117 ms; avg. throughput
    over a period of 7,000 s = 514 Mb/s
  • MTU = 9,000 bytes
  • Link utilization = 71.6% (see the check below)
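
A quick consistency check (our arithmetic, not on the slide): for synchronized AIMD flows sharing a bottleneck, the throughput ratio is often modeled as the inverse square of the RTT ratio, which matches the measurement fairly well:

    rtt_short, rtt_long = 0.117, 0.181   # CERN<->Starlight, CERN<->Sunnyvale (s)
    thr_short, thr_long = 514e6, 202e6   # measured average throughputs (bit/s)
    print(thr_short / thr_long)          # ~2.54 measured
    print((rtt_long / rtt_short) ** 2)   # ~2.39 predicted by the RTT^2 model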

15
Why does TCP perform better in a LAN?
  • Better reactivity (see previous slides)
  • Buffering capacity

[Chart: congestion window (cwnd) vs. RTT/time; the window oscillates between W/2 and W; the BDP line separates Area 1 (below) from Area 2, the buffering capacity (above)]
  • Area 1
  • Cwnd < BDP => Throughput < Bandwidth
  • RTT constant
  • Throughput = Cwnd / RTT
  • Area 2
  • Cwnd > BDP => Throughput = Bandwidth
  • RTT increases (proportionally to cwnd); see the
    sketch below
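
A minimal sketch of the two regimes (Python; names are illustrative):

    def throughput_bps(cwnd_bytes: float, base_rtt_s: float,
                       capacity_bps: float) -> float:
        """Area 1: cwnd < BDP, RTT stays at its base value, so
        throughput = cwnd / RTT.  Area 2: cwnd > BDP, the queue builds,
        RTT inflates in proportion to cwnd, and throughput is pinned at
        the link capacity."""
        bdp_bytes = capacity_bps * base_rtt_s / 8
        if cwnd_bytes <= bdp_bytes:
            return cwnd_bytes * 8 / base_rtt_s
        return capacity_bps

On a LAN the buffering capacity (Area 2) is large relative to the BDP, so after a loss the halved window usually still sits above the BDP and throughput stays at full bandwidth; on a long WAN path the halved window falls deep into Area 1.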

16
Why does TCP perform better in a LAN? (animation build of the previous slide; same chart and bullets)
17
Solutions?
  • Grid DT
  • Increase TCP responsiveness
  • Higher additive increase
  • Smaller backoff
  • Reduce the strong penalty imposed by a loss (see
    the sketch below)
  • Better fairness
  • between flows with different RTTs
  • between flows with different MTUs (virtual increase
    of the MTU)
  • FAST TCP
  • Uses end-to-end delay and loss
  • Achieves any desired fairness, expressed by a
    utility function
  • Very high utilization (99% in theory)
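
A minimal sketch of the Grid DT idea, tunable AIMD (parameter names and values are illustrative, not the actual Grid DT code):

    def aimd_step(cwnd: float, loss: bool, a: float = 1.0, b: float = 0.5) -> float:
        """One RTT of AIMD: multiplicative decrease by factor b on loss,
        otherwise additive increase by a segments.  Reno uses a = 1,
        b = 0.5; Grid DT makes both tunable."""
        return cwnd * b if loss else cwnd + a

    print(aimd_step(1000.0, loss=True))            # 500.0 -> Reno's strong penalty
    print(aimd_step(1000.0, loss=True, b=0.875))   # 875.0 -> smaller backoff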

18
Internet2 & CENIC Awards
  • Current Internet2 Land Speed Record, IPv4 class
  • On Feb. 27, a terabyte of data was transferred in
    3,700 seconds between the Level(3) PoP in Sunnyvale,
    near SLAC, and CERN, from memory to memory, as a
    single TCP/IP stream at an average rate of 2.38
    Gbps. This beat the former record by a factor of
    2.5 and used the US-CERN link at 99% efficiency.
  • Current Internet2 Land Speed Record, IPv6 class
  • On May 2, Caltech and CERN set new Internet2 Land
    Speed Records using the next-generation Internet
    Protocol (IPv6) by achieving 983 megabits per
    second with a single IPv6 stream for more than an
    hour across a distance of 7,067 kilometers (more
    than 4,000 miles) from Geneva, Switzerland to
    Chicago.
  • CENIC award
  • The Biggest, Fastest in the West Award honors the
    fastest and most scalable high-performance
    networking application/technology.

One terabyte of data transferred in less than an hour
Geneva-Sunnyvale: 10,037 km
19
Single stream TCP performance
20
  • Telecom World 2003 / Internet2 Fall Members'
    Meeting
  • SC2003
  • World Summit on the Information Society (WSIS)
  • Caltech, CERN/DataTAG, Internet2, CENIC, StarLight
  • Cisco, Intel, Level(3)

21
LHCnet: Geneva - Los Angeles 10 Gbps path
22
LHCnet at Telecom World 2003 / Internet2 Fall
Members' Meeting
23
LHCnet at SC2003
24
Conclusion
  • Transcontinental testbed: Geneva - Chicago - Los
    Angeles
  • The future performance of computational grids
    looks bad if we continue to rely on the
    widely-deployed TCP RENO
  • Grid DT
  • Virtual MTU
  • RTT bias correction
  • Achieves multi-stream performance with a single
    stream
  • How should fairness be defined?
  • Taking into account the MTU
  • Taking into account the RTT
  • Larger packet size (jumbogram: payload larger
    than 64 KB)
  • Is the standard MTU the largest bottleneck?
  • J. Cain (Cisco): "It's very difficult to build
    switches to switch large packets such as
    jumbograms"
  • Our vision of the network
  • The network, once viewed as an obstacle for
    virtual collaborations and distributed computing
    in grids, can now start to be viewed as a
    catalyst instead. Grid nodes distributed around
    the world will simply become depots for dropping
    off information for computation or storage, and
    the network will become the fundamental fabric
    for tomorrow's computational grids and virtual
    supercomputers.

25
  • Extra slides

26
UltraLight
  • Integrated packet-switched and circuit-switched
    hybrid experimental research network
  • 10 GE backbone across the US, (G)MPLS, PHY-TAG,
    larger MTU
  • End-to-end monitoring
  • Dynamic bandwidth provisioning
  • Agent-based services spanning all layers of the
    system, from the optical cross-connects to the
    applications
  • Three flagship application areas
  • Particle physics experiments exploring the
    frontiers of matter and spacetime (LHC)
  • Astrophysics projects studying the most distant
    objects and the early universe (e-VLBI)
  • Medical teams distributing high-resolution
    real-time images

27
Fast TCP
  • Equilibrium properties
  • Uses end-to-end delay and loss
  • Achieves any desired fairness, expressed by a
    utility function
  • Very high utilization (99% in theory)
  • Stability properties
  • Stability for arbitrary delay, capacity, routing
    and load
  • Robust to heterogeneity and evolution
  • Good performance
  • Negligible queueing delay and loss
  • Fast response (see the sketch below)
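
For reference, the FAST window update later published by Jin, Wei and Low has this delay-based form (a sketch; the gamma and alpha values are illustrative, not those used in the demos):

    def fast_update(w: float, base_rtt: float, rtt: float,
                    gamma: float = 0.5, alpha: float = 200.0) -> float:
        """w <- min(2w, (1 - gamma) * w + gamma * (base_rtt/rtt * w + alpha)).
        With an empty queue (rtt ~ base_rtt) the window grows by about
        alpha packets per update; as queueing delay builds, growth slows
        and w converges to an equilibrium."""
        return min(2 * w, (1 - gamma) * w + gamma * (base_rtt / rtt * w + alpha))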

28
FAST TCP vs. Reno
  • Channel 2: FAST
  • Channel 1: NewReno

Utilization = 90%
Utilization = 70%
29
FAST demo via OMNInet and DataTAG
[Diagram: FAST demo topology. Workstations at NU-E (Leverone) and in San Diego connect over 2 x GE to Nortel Passport 8600 switches on OMNInet, with 10GE into StarLight (Chicago); from there the CalTech Cisco 7609 reaches the CERN Cisco 7609 in Geneva over the OC-48 DataTAG circuit via Alcatel 1670s, ending at CERN workstations on 2 x GE; path length about 7,000 km; a FAST display monitors the demo. Credits: FAST demo by Cheng Jin and David Wei (Caltech); A. Adriaanse (Caltech), J. Mambretti and F. Yeh (Northwestern), S. Ravot (Caltech/CERN)]
30
Effect of the RTT on fairness
  • Objective: improve fairness between two TCP
    streams with different RTTs and the same MTU
  • We can adapt the model proposed by Matt Mathis
    by taking into account a higher additive
    increment
  • Assumptions
  • Approximate a packet-loss probability p by
    assuming that each flow delivers 1/p consecutive
    packets followed by one drop
  • Under these assumptions, the congestion window
    of each flow oscillates with a period T0
  • If the receiver acknowledges every packet, then
    the congestion window opens by x (the additive
    increment) packets each RTT (see the derivation
    below)

[Chart: cwnd evolution under periodic loss, oscillating between W/2 and W with period T0; annotations: number of packets delivered by each stream in one period, and the relation between throughput and time]
By modifying the congestion increment dynamically
according to the RTT, we can guarantee fairness
among TCP connections.
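
Carrying the model through (our reconstruction of the derivation the slide implies): with an increment of x packets per RTT, the window climbs from W/2 back to W in W/(2x) RTTs and the average window is 3W/4, so each period delivers

    (3W/4) · (W/2x) = 3W² / (8x) = 1/p packets,

hence W = sqrt(8x / (3p)) and

    Throughput = (3W/4) · MSS / RTT = (MSS / RTT) · sqrt(3x/2) / sqrt(p).

Two flows seeing the same loss rate p therefore get equal throughput when sqrt(x)/RTT is equal, i.e. when the additive increment is scaled as x ∝ RTT² -- the RTT bias correction mentioned in the conclusion.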
31
Linux farms (summary)
  • CENIC PoP (LA)
  • 1 x Dual Opteron 1.8 GHz with a 10GE Intel card
    (disk server, 2 TeraBytes)
  • 2 x Dual Xeon 3 GHz with 10GE Intel cards (disk
    servers, 2 TeraBytes)
  • 12 x Dual Xeon 2.4 GHz with 1 GE SysKonnect cards
  • Starlight (CHI)
  • 3 x Dual Xeon 2.4 GHz with 10GE Intel cards (disk
    servers, 2 TeraBytes)
  • 6 x Dual Xeon 2.2 GHz with 2 x 1 GE SysKonnect
    cards each
  • CERN computer center (GVA)
  • 4 x Dual Xeon 2.4 GHz with 2 x 1 GE SysKonnect
    cards each
  • OpenLab Itanium systems ???
  • Convention Center (GVA)
  • 2 x Dual Itanium 1.5 GHz with 10GE Intel cards
  • 1 x Dual Xeon 2.4 GHz with 10GE Intel cards (disk
    server, 2 TeraBytes, to be sent from Starlight)
  • 1 x Dual Xeon 2.4 GHz with 10GE Intel cards (disk
    server, 2 TeraBytes, to be sent from Caltech)
  • 1 x Dual Xeon 3 GHz with a 10GE Intel card (disk
    server, 2 TeraBytes, to be sent from Caltech)
  • 2 x Dual Xeon 2.2 GHz with 2 x 1 GE SysKonnect
    cards each