Title: A Machine Learning Approach to TCP Throughput Prediction
1A Machine Learning Approach to TCP Throughput Prediction
- Mariyam Mirza
- Joel Sommers
- Paul Barford
- Xiaojin Zhu
2Motivation: Why Predict TCP Throughput?
- Multiple paths between senders and receivers
- Select best path
- Common definition of best path: highest-throughput path
3Talk Outline
- Goals and Challenges of TCP Throughput Prediction
- Previous Work
- Our Approach
- Results
- Summary
4TCP Throughput Prediction: Challenges and Existing Approaches
- Goals or Challenges
- Accuracy
- Timeliness, i.e., responsiveness to changing conditions
- Cost: volume of probe traffic introduced
- Previous approaches
- Formula-Based, e.g., Padhye et al., 1998
- History-Based, e.g., He et al., 2005
5Overview of Our Approach
- Measure path characteristics using lightweight probes
- Use Support Vector Regression (SVR) for prediction
- Advantages over existing approaches
- Formula-Based (FB)
- Need different formulae for different flavors of TCP
- History-Based (HB)
- Heavyweight
- So far, shown to work only for bulk transfers
- Our Probe- and SVR-Based Approach (SVR)
- Lightweight: 10x less traffic than HB
- Per-path, so different flavors of TCP accommodated automatically, unlike FB
- Wide range of background traffic rates
- Level shifts
- Large and small TCP transfers
6Experimental Setup
[Figure: testbed topology. Components shown: traffic generator hosts and probe senders at each end; Cisco 6500 and Cisco 12000 routers connected by GE, OC-12, and OC-3 links; Adtech SX-14 adding 25 ms one-way delay; Endace DAG monitor with 3.5/3.8 cards, packets copied via optical splitters.]
7Path Characteristics Considered
- Highly Accurate Oracular Passive Measurements
- Practical Active Measurements
- Loss (L)
- Loss Frequency
- Loss Duration
- Active Measurements via Badabing (Sommers et al., 2005)
- Queuing Delay (Q)
- Active Measurements via Badabing
- Available Bandwidth (AB)
- Active Measurements via Yaz (Sommers et al., 2006)
8Support Vector Regression (SVR)
- State-of-the-art machine learning tool for multivariate regression
- Input features to the SVR: AB, Q, L
- Use a Radial Basis Function (RBF) Kernel
- Training produces a highly non-linear prediction function
- Non-linearity captures the complex relationship between throughput and measurements AB, Q, L
- Apply function to test input features to get predictions
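As an illustration of what the trained prediction function looks like, the sketch below evaluates the standard SVR form f(x) = sum_i c_i K(s_i, x) + b with an RBF kernel K(s, x) = exp(-gamma ||s - x||^2). The support vectors, coefficients, bias, and gamma are hypothetical placeholder values, not results from the talk, and the features are assumed to be scaled (AB, Q, L) triples.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and z)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, dual_coefs, bias, gamma):
    """Evaluate f(x) = bias + sum_i coef_i * K(sv_i, x)."""
    return bias + sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )


# Hypothetical trained model over scaled (AB, Q, L) features.
# These numbers are illustrative assumptions only.
support_vectors = [(0.8, 0.1, 0.00), (0.2, 0.6, 0.05)]
dual_coefs = [1.5, -1.0]
bias = 0.5
gamma = 2.0

# Predict for a new measurement of (AB, Q, L).
prediction = svr_predict((0.8, 0.1, 0.00), support_vectors, dual_coefs, bias, gamma)
print(prediction)
```

Training is what chooses the support vectors and coefficients; the sketch covers only the final step, applying the function to test input features.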
9Experimental Protocol
[Figure: timeline of one experiment. Labels: Yaz run for one AB measurement, 10-30 sec to converge; Badabing run for 30 sec, Q and L measured; file transferred, with oracular measurements of Q, L, and AB.]
- Training Set and Test Set
- 100 Experiments per set
- Background Traffic: 135Mbps, generated by Harpoon (Sommers et al., 2004)
- Bottleneck Bandwidth: OC-3, 150Mbps
10Results: HB Prediction (Baseline)
[Figure: predicted vs. actual throughput, Mbps]
11AB-Based Predictions, Oracular Passive Measurements
[Figure: predicted vs. actual throughput, Mbps]
12Q-based Predictions, Oracular Passive Measurements
[Figure: predicted vs. actual throughput, Mbps]
13L-Based Predictions, Oracular Passive Measurements
[Figure: predicted vs. actual throughput, Mbps]
14Best Prediction Results: Q- and L-Based, Oracular
[Figure: predicted vs. actual throughput, Mbps]
15Best Results: Q- and L-Based, Oracular and Practical
[Figure: predicted vs. actual throughput, Mbps]
16Results Summary
- Available Bandwidth not necessary for accurate throughput prediction
- Relative Error (He et al., 2005)
- Relative Error = (Predicted Throughput - Actual Throughput) / min(Predicted Throughput, Actual Throughput)
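The relative error metric can be written as a small function. This is a minimal sketch; the function name and the sample throughput values are illustrative and not taken from the talk.

```python
def relative_error(predicted, actual):
    """Relative error: (predicted - actual) / min(predicted, actual).

    Both throughputs must be positive. Dividing by the smaller of the two
    makes over- and under-prediction by the same factor symmetric.
    """
    if predicted <= 0 or actual <= 0:
        raise ValueError("throughputs must be positive")
    return (predicted - actual) / min(predicted, actual)


# Illustrative values in Mbps (assumed, not measured).
over = relative_error(20.0, 10.0)   # overestimate by a factor of 2
under = relative_error(10.0, 20.0)  # underestimate by a factor of 2
print(over, under)
```

With these inputs the overestimate yields 1.0 and the underestimate yields -1.0, so errors of the same factor have the same magnitude in either direction.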
17Results: Level Shifts, Practical Measurements
18Results: Different File Sizes
- Training sizes: 32KB, 512KB, 8MB
- Test sizes: randomly generated, between 2KB and 8MB
[Figure: predicted vs. actual throughput, Mbps]
19PathPerf: Online Tool for TCP Throughput Prediction
[Figure: timeline of one experiment. Labels: Badabing run for 30 sec, Q and L measured; make prediction; file transferred; retrain if necessary.]
- Training Set: first measurement to start out with
- Test Set: each new measurement
- Retrain if prediction error exceeds threshold
- Practical Active Measurements only; no Oracular Passive Measurements
- No AB measurements
- Will be released soon
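The predict-then-retrain loop can be sketched as follows. This is not PathPerf's code: `MeanModel` is a deliberately trivial stand-in for the SVR, the threshold value is hypothetical, and refitting on all measurements seen so far is an assumption about what "retrain" involves.

```python
class MeanModel:
    """Stand-in regressor: predicts the mean throughput seen in training.

    A real tool would fit an SVR on (Q, L) features instead.
    """

    def fit(self, samples):
        self.mean = sum(throughput for _, throughput in samples) / len(samples)

    def predict(self, features):
        return self.mean


def relative_error(predicted, actual):
    return (predicted - actual) / min(predicted, actual)


def run_online(measurements, threshold):
    """measurements: list of ((q, l), actual_throughput_mbps) in time order."""
    history = [measurements[0]]        # training set starts with the first measurement
    model = MeanModel()
    model.fit(history)
    log = []
    for features, actual in measurements[1:]:
        predicted = model.predict(features)      # predict before the transfer
        error = relative_error(predicted, actual)
        history.append((features, actual))
        retrained = abs(error) > threshold
        if retrained:
            model.fit(history)                   # assumption: refit on all data so far
        log.append((predicted, actual, retrained))
    return log


# Illustrative data containing a level shift; values are assumed.
data = [((0.01, 0.00), 10.0), ((0.01, 0.00), 10.5),
        ((0.05, 0.02), 4.0), ((0.05, 0.02), 4.2)]
log = run_online(data, threshold=0.5)
```

Each new measurement acts first as a test point and then joins the training history, which mirrors the training set and test set roles listed above.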
20PathPerf Wide-Area Experiments
- Run on the RON testbed in Dec 2006
- Paths: cross-section of 7 nodes
- Nodes in Amsterdam, London, Utah, NYC, Ithaca, New Mexico, Maryland
- Base RTT range: 10ms-150ms
- Throughput on most paths window limited
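A window-limited transfer can send at most one window per round trip, so its throughput is bounded by window size divided by RTT. The sketch below evaluates that bound at the two ends of the RTT range above. The 64 KiB window is an assumption chosen for illustration; the actual window sizes on the RON paths are not given here.

```python
def window_limited_throughput_mbps(window_bytes, rtt_seconds):
    """Upper bound on throughput when at most one window is sent per RTT."""
    return window_bytes * 8 / rtt_seconds / 1e6


WINDOW_BYTES = 64 * 1024  # assumed window size, not from the talk

# Evaluate the bound at the smallest and largest base RTTs.
bounds = {
    rtt_ms: window_limited_throughput_mbps(WINDOW_BYTES, rtt_ms / 1000)
    for rtt_ms in (10, 150)
}
for rtt_ms, mbps in bounds.items():
    print(f"RTT {rtt_ms} ms: at most {mbps:.2f} Mbps")
```

Under this assumed window the bound is roughly 52 Mbps at 10 ms and roughly 3.5 Mbps at 150 ms, which shows how strongly RTT alone can cap throughput on such paths.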
21Sample Wide-Area Result Without Retraining
22Sample Wide-Area Result With Retraining
23Summary
- Active Probing and Machine Learning Based Mechanism for TCP Throughput Prediction
- Accurate
- Lightweight
- Responsive
24Acknowledgements
- David Anderson, for help using the RON testbed
25Questions?