Title: BREAKING THE DATA TRANSFER BOTTLENECK
UDT: A High Performance Data Transport Protocol
Yunhong Gu, gu_at_lac.uic.edu
Laboratory for Advanced Computing
National Center for Data Mining
University of Illinois at Chicago
October 10, 2005
udt.sourceforge.net
Outline
INTRODUCTION
PROTOCOL DESIGN & IMPLEMENTATION
CONGESTION CONTROL
PERFORMANCE EVALUATION
COMPOSABLE UDT
CONCLUSIONS
INTRODUCTION
Motivations
- The widespread use of high-speed networks (1 Gb/s, 10 Gb/s, etc.) has enabled many new distributed data intensive applications
- Inexpensive fibers and advanced optical networking technologies (e.g., DWDM, Dense Wavelength Division Multiplexing)
- 10 Gb/s is common in high speed network testbeds, 40 Gb/s is emerging
- Large volumetric datasets
- Satellite weather data
- Astronomy observation
- Network monitoring
- The Internet transport protocol (TCP) does NOT scale well as network bandwidth-delay product (BDP) increases
- New transport protocol is needed!
Data Transport Protocol
- Functionalities
- Streaming, messaging
- Reliability
- Timeliness
- Unicast vs. multicast
- Congestion control
- Efficiency
- Fairness
- Convergence
- Distributedness
[Figure: protocol stack, top to bottom: Applications, Transport Layer, Network Layer, Data Link Layer, Physical Layer]
TCP
- Reliable, data streaming, unicast
- Congestion control
- Increase the congestion window size (cwnd) by one full-sized packet per RTT
- Halve the cwnd per loss event
- Poor efficiency in high bandwidth-delay product networks
- Bias against flows with larger RTT
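To make the efficiency problem concrete, the sketch below works out how long one TCP flow needs to regain full rate after a single loss when cwnd grows by one packet per RTT. The link speed, RTT, and packet size are illustrative assumptions, not figures from the talk.

```python
# Illustrative assumptions (not from the slides): 10 Gb/s link, 100 ms RTT,
# 1500-byte packets, and a single loss event when cwnd equals the BDP.
LINK_BPS = 10e9
RTT_S = 0.1
PACKET_BYTES = 1500


def bdp_packets(link_bps: float, rtt_s: float, packet_bytes: int) -> float:
    """Bandwidth-delay product expressed in full-sized packets."""
    return link_bps * rtt_s / (packet_bytes * 8)


def recovery_time_s(link_bps: float, rtt_s: float, packet_bytes: int) -> float:
    """Time for cwnd to climb from BDP/2 back to BDP at one packet per RTT."""
    window = bdp_packets(link_bps, rtt_s, packet_bytes)
    rtts_needed = window / 2  # cwnd is halved on loss, then grows +1 per RTT
    return rtts_needed * rtt_s


if __name__ == "__main__":
    window = bdp_packets(LINK_BPS, RTT_S, PACKET_BYTES)
    minutes = recovery_time_s(LINK_BPS, RTT_S, PACKET_BYTES) / 60
    print(f"BDP: {window:.0f} packets")
    print(f"Recovery after one loss: {minutes:.1f} minutes")
```

Under these assumptions the window is roughly 83,000 packets, and refilling the pipe after one loss takes on the order of an hour.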
TCP
[Figure: TCP throughput (Mb/s) as a function of packet loss and round trip time (ms)]
Related Work
- TCP variants
- HighSpeed, Scalable, BiC, FAST, H-TCP, L-TCP
- Parallel TCP
- PSockets, GridFTP
- Rate-based reliable UDP
- RBUDP, Tsunami, FOBS, FRTP (based on SABUL), Hurricane (based on UDT)
- XCP
- SABUL
Problems of Existing Work
- Hard to deploy
- TCP variants and XCP
- Need modifications in OS kernel and/or routers
- Cannot be used in shared networks
- Most reliable UDP-based protocols
- Poor fairness
- Intra-protocol fairness
- RTT fairness
- Manual parameter tuning
A New Protocol
[Figure: throughput (Mb/s) as a function of packet loss and round trip time (ms)]
UDT (UDP-based Data Transfer Protocol)
- Application level, UDP-based
- Similar functionalities to TCP
- Connection-oriented reliable duplex unicast data streaming
- New protocol design and implementation
- New congestion control algorithm
- Configurable congestion control framework
Objective & Non-objective
- Objective
- For distributed data intensive applications in high speed networks
- A small number of flows share the abundant bandwidth
- Efficient, fair, and friendly
- Configurable
- Easily deployable and usable
- Non-objective
- Replace TCP on the Internet
UDT Project
- Open source (udt.sourceforge.net)
- Design and implement the UDT protocol
- Design the UDT congestion control algorithm
- Evaluate experimentally the performance of UDT
- Design and implement a configurable protocol
framework based on UDT (Composable UDT)
PROTOCOL DESIGN & IMPLEMENTATION
UDT Overview
- Two orthogonal elements
- The UDT protocol
- The UDT congestion control algorithm
- Protocol design & implementation
- Functionality
- Efficiency
- Congestion control algorithm
- Efficiency, fairness, friendliness, and stability
Functionality
- Reliability
- Packet-based sequencing
- Acknowledgment and loss report from receiver
- ACK sub-sequencing
- Retransmission (based on loss report and timeout)
- Streaming
- Buffer/memory management
- Connection maintenance
- Handshake, keep-alive message, teardown message
- Duplex
- Each UDT instance contains both a sender and a
receiver
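The reliability bullets above (packet sequencing, acknowledgment, and loss reports from the receiver) can be sketched as a small receiver-side helper. The class and method names are hypothetical, sequence-number wrap-around is ignored, and this is not the UDT implementation.

```python
from typing import List


class LossDetector:
    """Hypothetical receiver-side helper that tracks sequence gaps."""

    def __init__(self, first_seq: int = 0) -> None:
        self.next_expected = first_seq
        self.loss_list: List[int] = []  # stays sorted: new gaps are always newer

    def on_packet(self, seq: int) -> List[int]:
        """Process an arriving packet; return newly detected losses to report."""
        if seq >= self.next_expected:
            new_losses = list(range(self.next_expected, seq))
            self.loss_list.extend(new_losses)
            self.next_expected = seq + 1
            return new_losses
        # An older sequence number is a retransmission that fills a hole.
        if seq in self.loss_list:
            self.loss_list.remove(seq)
        return []

    def ack_number(self) -> int:
        """Cumulative ACK: every packet below this number has been received."""
        return self.loss_list[0] if self.loss_list else self.next_expected
```

A gap in the arriving sequence numbers produces a loss report immediately, while the cumulative ACK only advances once the retransmissions have filled the hole.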
Protocol Architecture
[Figure: protocol architecture; surviving labels: Sender, UDP Channel]
Software Architecture
[Figure: software architecture; surviving label: CC]
Efficiency Consideration
- Fewer packets
- Timer-based acknowledging
- Less CPU time
- Reduce per packet processing time
- Reduce memory copy
- Reduce loss list processing time
- Light ACK vs. regular ACK
- Parallel processing
- Threading architecture
- Less bursty processing
- Evenly distribute the processing time
Application Programming Interface (API)
- Socket API
- New functionalities
- sendfile/recvfile
- Overlapped IO support
- Transparent to existing applications
- Recompilation needed
- Certain limitations exist
- XIO support (in Globus Toolkit 4.0)
- Wrapper for other programming languages
- Java, Python
CONGESTION CONTROL
Overview
- Congestion control vs. flow control
- Congestion control: effectively utilize the network bandwidth
- Flow control: prevent the receiver from being overwhelmed by incoming packets
- Window-based vs. rate-based
- Window-based: tune the maximum number of in-flight packets (TCP)
- Rate-based: tune the inter-packet sending time (UDT)
- AIMD: additive increases, multiplicative decreases
- Feedback
- Packet loss (most TCP variants, UDT)
- Delay (Vegas, FAST)
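The difference between the two control styles comes down to which quantity the sender tunes. The sketch below shows both knobs; the function names and the 1 Gb/s example are assumptions made for illustration.

```python
def inter_packet_interval_s(rate_bps: float, packet_bytes: int) -> float:
    """Rate-based control: time between consecutive packet departures."""
    return packet_bytes * 8 / rate_bps


def sendable_now(window_packets: int, unacked_packets: int) -> int:
    """Window-based control: how many more packets may be in flight right now."""
    return max(window_packets - unacked_packets, 0)


if __name__ == "__main__":
    # Assumed example: pacing 1500-byte packets at 1 Gb/s.
    gap = inter_packet_interval_s(1e9, 1500)
    print(f"inter-packet time: {gap * 1e6:.1f} microseconds")
```

At 1 Gb/s with 1500-byte packets the sender has to space packets about 12 microseconds apart, which is why rate-based control depends on accurate fine-grained timing.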
AIMD with Decreasing Increases
- AIMD
- $x \leftarrow x + \alpha(x)$, for every constant interval (e.g., RTT)
- $x \leftarrow (1 - \beta)\,x$, when there is a packet loss event
- where $x$ is the packet sending rate.
- TCP
- $\alpha(x) \equiv 1$, and the increase interval is RTT.
- $\beta = 0.5$
- AIMD with Decreasing Increase
- $\alpha(x)$ is non-increasing, and $\lim_{x \to \infty} \alpha(x) = 0$.
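The update rules on this slide can be exercised with a small simulation: add an increase term every interval, cut the rate multiplicatively on loss. The loss model (loss whenever the rate exceeds capacity) and the particular decreasing increase function are assumptions chosen only to illustrate the idea; they are not the UDT algorithm.

```python
from typing import Callable, List


def aimd_trace(alpha: Callable[[float], float], beta: float,
               capacity: float, x0: float, steps: int) -> List[float]:
    """Simulate one flow: add alpha(x) per interval, scale by (1 - beta) on loss."""
    x = x0
    trace: List[float] = []
    for _ in range(steps):
        if x > capacity:  # assumed loss model: loss whenever x exceeds capacity
            x = (1.0 - beta) * x
        else:
            x = x + alpha(x)
        trace.append(x)
    return trace


def tcp_alpha(x: float) -> float:
    """TCP-style increase: alpha(x) is identically 1."""
    return 1.0


def decreasing_alpha(x: float) -> float:
    """Hypothetical increase term: non-increasing in x and tending to 0."""
    return 100.0 / (1.0 + x / 10.0)


def intervals_to_reach(alpha: Callable[[float], float],
                       target: float, x0: float = 1.0) -> int:
    """Count loss-free increase intervals needed to bring x from x0 up to target."""
    x, n = x0, 0
    while x < target:
        x += alpha(x)
        n += 1
    return n
```

Comparing `intervals_to_reach(tcp_alpha, 1000.0)` with `intervals_to_reach(decreasing_alpha, 1000.0)` shows the intended effect: a large increase at low rates fills the link sooner, while the shrinking increase near capacity keeps oscillations small.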
UDT Control Algorithm
- Increase
- $\alpha(x) = f(B - x) \cdot c$
- where $B$ is the link capacity (bandwidth) and $c$ is a constant parameter
- Constant rate control interval (SYN), independent of RTT
- SYN = 0.01 seconds
- Decrease
- Randomized decrease factor
- $\beta = 1 - (8/9)^n$
The Increase Formula: an Example
Bandwidth (B) = 10 Gbps, packet size = 1500 bytes
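The table for this example did not survive, so the sketch below recomputes plausible values. It assumes the step-function increase rule from the published UDT specification (increase proportional to the next power of ten above the unused bandwidth, with constant 1.5e-6 and a floor of one byte-equivalent per packet size). Both the rule and the constant are assumptions here and should be checked against the Internet Draft cited later in the talk.

```python
import math

# Assumed value of the constant c; not stated on the slide.
BETA = 1.5e-6


def udt_increase_packets(capacity_bps: float, rate_bps: float,
                         packet_bytes: int = 1500) -> float:
    """Packets added to the per-SYN sending rate, under the assumed rule."""
    min_inc = 1.0 / packet_bytes
    remaining_bps = capacity_bps - rate_bps
    if remaining_bps <= 0:
        return min_inc
    # Round the unused bandwidth up to the next power of ten (in bits/s).
    decade = math.ceil(math.log10(remaining_bps))
    return max(10 ** decade * BETA / packet_bytes, min_inc)


if __name__ == "__main__":
    B = 10e9  # 10 Gb/s, as on the slide
    for x in (0.0, 9.5e9, 9.95e9, 9.995e9, 10e9):
        inc = udt_increase_packets(B, x)
        print(f"x = {x / 1e6:8.0f} Mb/s -> increase = {inc:.5f} packets per SYN")
```

With these assumptions the increase is 10 packets per SYN while more than 1 Gb/s is unused and drops by a factor of ten with each decade of remaining bandwidth, which matches the decreasing-increase property described earlier.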
Dealing with Packet Loss
- Loss synchronization
- Randomization method
- Non-congestion loss
- Do not decrease sending rate for the first packet
loss
Bandwidth Estimation
[Figure: packet pair P1, P2]
Packet Size / Space ≈ Bottleneck Bandwidth
- Filters
- Cross traffic
- Interrupt Coalescence
- Robust to estimation errors
- Randomized interval to send packet pair
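The packet-pair relation above can be sketched as follows. The use of a median filter is an assumption standing in for the unspecified "Filters" bullet; it is one common way to suppress samples distorted by cross traffic or interrupt coalescence.

```python
from statistics import median
from typing import Sequence


def estimate_capacity_bps(arrival_gaps_s: Sequence[float],
                          packet_bytes: int = 1500) -> float:
    """Packet-pair estimate: packet size divided by the median pair spacing."""
    if not arrival_gaps_s:
        raise ValueError("need at least one packet-pair sample")
    return packet_bytes * 8 / median(arrival_gaps_s)


if __name__ == "__main__":
    # Assumed samples: most pairs arrive 12 us apart; one was stretched by
    # cross traffic and one was compressed by interrupt coalescence.
    gaps = [12e-6, 12e-6, 50e-6, 12e-6, 1e-6]
    print(f"estimated capacity: {estimate_capacity_bps(gaps) / 1e9:.2f} Gb/s")
```

Because the median ignores the two outliers, the estimate stays near 1 Gb/s for these samples.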
PERFORMANCE EVALUATION
Performance Characteristics
- Efficiency
- Higher bandwidth utilization, less CPU usage
- Intra-protocol fairness
- Max-min fairness
- Jain's fairness index
- TCP friendliness
- Bulk TCP flow vs. bulk UDT flow
- Short-lived TCP flow (slow start phase) vs. bulk UDT flow
- Stability (oscillations)
- Stability index (standard deviation)
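Two of the metrics named above are easy to state in code. Jain's fairness index has a standard definition; the stability index below (standard deviation of throughput samples normalized by their mean) is an assumed form, since the slide only says it is based on the standard deviation. Inputs are assumed non-empty with a non-zero mean.

```python
from statistics import mean, pstdev
from typing import Sequence


def jain_fairness_index(throughputs: Sequence[float]) -> float:
    """Jain's index: (sum x)^2 / (n * sum x^2); 1.0 means perfectly equal shares."""
    total = sum(throughputs)
    squares = sum(x * x for x in throughputs)
    return total * total / (len(throughputs) * squares)


def stability_index(samples: Sequence[float]) -> float:
    """Assumed form: standard deviation of the samples divided by their mean."""
    return pstdev(samples) / mean(samples)
```

For four flows with equal throughput the fairness index is 1.0; if one flow takes everything it falls to 1/n, here 0.25. A perfectly steady flow has a stability index of 0.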
Evaluation Strategies
- Simulations vs. experiments
- NS2 network simulator, NCDM teraflow testbed
- Setup
- Network topology, bandwidth, distance, queuing, link error rate, etc.
- Concurrency (number of parallel flows)
- Comparison (against TCP)
- Real world applications
- SDSS data transfer, high performance mining of streaming data, etc.
- Independent evaluation
- SLAC, JGN2, UvA, Unipmn (Italy), etc.
Efficiency, Fairness, Stability
TCP Friendliness
- 500 1 MB TCP flows vs. 0 to 10 bulk UDT flows
- 1 Gb/s between Chicago and Amsterdam
COMPOSABLE UDT
Composable UDT - Objectives
- Easy implementation and deployment of new control algorithms
- Easy evaluation of new control algorithms
- Application awareness support and dynamic configuration
Composable UDT - Methodologies
- Packet sending control
- Window-based, rate-based, and hybrid
- Control event handling
- onACK, onLoss, onTimeout, onPktSent, onPktRecved, etc.
- Protocol parameters access
- RTT, loss rate, RTO, etc.
- Packet extension
- User-defined control packets
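The event-handler approach can be illustrated with a small callback interface. This is a hypothetical Python mirror written for this sketch: the handler names loosely follow the ones listed above, but the classes, attributes, and defaults are assumptions and not the UDT library's C++ API.

```python
from typing import List


class CongestionControl:
    """Hypothetical base class: the transport calls these hooks on each event."""

    def __init__(self) -> None:
        self.send_period_s = 1e-6  # rate-based knob: inter-packet time
        self.cwnd_packets = 16.0   # window-based knob: max packets in flight

    def on_ack(self, ack_seq: int) -> None:
        pass

    def on_loss(self, lost_seqs: List[int]) -> None:
        pass

    def on_timeout(self) -> None:
        pass


class SimpleTcpLike(CongestionControl):
    """Window-based AIMD: grow by 1/cwnd per ACK, halve on loss."""

    def on_ack(self, ack_seq: int) -> None:
        self.cwnd_packets += 1.0 / self.cwnd_packets

    def on_loss(self, lost_seqs: List[int]) -> None:
        self.cwnd_packets = max(self.cwnd_packets / 2.0, 1.0)

    def on_timeout(self) -> None:
        self.cwnd_packets = 1.0
```

A new algorithm only overrides the handlers it cares about and adjusts the window, the sending period, or both, which is what makes window-based, rate-based, and hybrid schemes expressible in one framework.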
Composable UDT - Evaluation
- Simplicity
- Can it be easily used?
- Expressiveness
- Can it be used to implement most control protocols?
- Similarity
- Can Composable UDT based implementations reproduce the performance of their native implementations?
- Overhead
- Will the overhead added by Composable UDT be too large?
Simplicity & Expressiveness
- Eight event handlers, four protocol control functions, and one performance monitoring function.
- Support a large variety of protocols
- Reliable UDT blast
- TCP and its variants (both loss and delay based)
- Group transport protocols
Simplicity & Expressiveness
CCC: Base Congestion Control Class
Similarity and Overhead
- Similarity
- How Composable UDT based implementations can simulate their native implementations
- CTCP vs. Linux TCP
- CPU usage
- Sender: CTCP uses about 100% more CPU than Linux TCP
- Receiver: CTCP uses about 20% more CPU than Linux TCP
CONCLUSIONS
Contributions
- A high performance data transport protocol and associated implementation
- The UDT protocol
- Open source UDT library (udt.sourceforge.net)
- Users include ANL, ORNL, PNNL, etc.
- An efficient and fair congestion control algorithm
- DAIMD: the UDT control algorithm
- Packet loss handling techniques
- Using bandwidth estimation technique in congestion control
- A configurable transport protocol framework
- Composable UDT
Publications
- Papers on the UDT Protocol
- Supporting Configurable Congestion Control in Data Transport Services, Yunhong Gu and Robert L. Grossman, SC 2005, Nov 12 - 18, Seattle, WA.
- Optimizing UDP-based Protocol Implementation, Yunhong Gu and Robert L. Grossman, PFLDNet 2005, Lyon, France, Feb. 2005.
- Experiences in Design and Implementation of a High Performance Transport Protocol, Yunhong Gu, Xinwei Hong, and Robert L. Grossman, SC 2004, Nov 6 - 12, Pittsburgh, PA.
- An Analysis of AIMD Algorithms with Decreasing Increases, Yunhong Gu, Xinwei Hong and Robert L. Grossman, First Workshop on Networks for Grid Applications (Gridnets 2004), Oct. 29, San Jose, CA.
- SABUL: A Transport Protocol for Grid Computing, Yunhong Gu and Robert L. Grossman, Journal of Grid Computing, 2003, Volume 1, Issue 4, pp. 377-386.
- Internet Draft
- UDT: A Transport Protocol for Data Intensive Applications, Yunhong Gu and Robert L. Grossman, draft-gg-udt-01.txt.
Publications
- Papers on Data Transfer Service using UDT
- Experimental Studies of Data Transport and Data Access of Earth Science Data over Networks with High Bandwidth Delay Products, Robert Grossman, Yunhong Gu, Dave Hanley, Xinwei Hong and Parthasarathy Krishnaswamy, Computer Networks, Volume 46, Issue 3, Oct. 2004, pp. 411-421.
- Teraflows over Gigabit WANs with UDT, Robert Grossman, Yunhong Gu, Xinwei Hong, Antony Antony, Johan Blom, Freek Dijkstra, and Cees de Laat, Journal of Future Computer Systems, Vol. 21, Issue 4, pp. 501-513, April 2005.
- The Photonic TeraStream: Enabling Next Generation Applications Through Intelligent Optical Networking at iGrid 2002, J. Mambretti, J. Weinberger, J. Chen, E. Bacon, F. Yeh, D. Lillethun, R. Grossman, Y. Gu, M. Mazzucco, Journal of Future Computer Systems, Volume 19, Number 6, pages 897-908.
- Experimental Studies Using Photonic Data Services at iGrid 2002, R. Grossman, Y. Gu, D. Hanley, X. Hong, D. Lillethun, J. Levera, J. Mambretti, M. Mazzucco, and J. Weinberger, Journal of Future Computer Systems, 2003, Volume 19, Number 6, pages 945-955.
Publications
- Papers on Applications using UDT
- Open DMIX: High Performance Web Services for Distributed Data Mining, R. Grossman, Y. Gu, C. Gupta, D. Hanley, X. Hong, and P. Krishnaswamy, 7th International Workshop on High Performance and Distributed Mining.
- Open DMIX - Data Integration and Exploration Services for Data Grids, R. Grossman, Y. Gu, D. Hanley, X. Hong, and G. Rao, First International Workshop on Knowledge Grid and Grid Intelligence (KGGI 2003).
- Global Access to Large Distributed Data Sets using Photonic Data Services, R. Grossman, Y. Gu, D. Hanley, X. Hong, D. Lillethun, J. Levera, J. Mambretti, M. Mazzucco, and J. Weinberger, 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST 2003), Los Alamitos, CA.
- Data Webs for Earth Science Data, Asvin Ananthanarayan, Rajiv Balachandran, Yunhong Gu, Robert Grossman, Xinwei Hong, Jorge Levera, Marco Mazzucco, Parallel Computing, Volume 29, 2003, pages 1363-1379.
Achievements
- SC 2002 Bandwidth Challenge Best Use of Emerging Network Infrastructure Award
- SC 2003 Bandwidth Challenge Application Foundation Award
- SC 2004 Bandwidth Challenge Best Replacement for FedEx / UDP Fairness Award
- SC 2005 ?
- Nov. 12-18, Seattle WA
- High Performance Mining of Streaming Data using UDT
- iGrid 2005
- Exploring and mining remote data at 10 Gb/s
Vision
- Short-term
- A practical solution to the distributed data intensive applications in high BDP environments
- Long-term
- Evolve with new technologies (open source, open standard)
- More functionalities and support for more use scenarios
- Network research platform (e.g., fast prototyping and evaluation of new control algorithms)
The End
- Thank You!
- Yunhong Gu, October 10, 2005