Title: Fast Pattern-Based Throughput Prediction for TCP Bulk Transfers
1. Fast Pattern-Based Throughput Prediction for TCP Bulk Transfers
- Tsung-i (Mark) Huang
- Jaspal Subhlok
- University of Houston
- GAN05 / May 10, 2005
2. Outline
- Background
- Problem Description
- Methodology
- Experiments and Results
- Conclusion and Future Work
3. Are we there yet?
- When do you need throughput prediction?
  - File download: "xx minutes left" (MS IE vs. Mozilla)
  - Mirror site selection: Knoppix from Florida State Univ. (fsu.edu) or TU Ilmenau, Germany (tu-ilmenau.de)
  - Resource selection in a grid environment
  - Cache selection for web content delivery services
4. Which site will give the best throughput?
- Current approaches and tools
  - Geographical distance
  - Ping (ICMP)
  - Download 512 KBytes (fixed size): NWS / iperf
  - Download for 10 seconds (fixed duration): iperf
- The last two approaches are the most accurate
  - How much data to download, or for how long?
  - Is Bandwidth × Delay the answer? Does one size fit all?
  - All or nothing: no result is available until the end of transmission
5. Problem Description
- Predicted future throughput can be used in mirror/replica site selection
- Predict throughput of a TCP bulk transfer
  - Single TCP stream
  - Input: time series of (arrival time, bytes received)
  - Output: predicted future throughput
- Make a prediction of future throughput after 10 to 100 RTTs
- Utilize knowledge of TCP flow patterns
- Assume TCP flow patterns will repeat later in the same TCP stream
6. TCP Flow Patterns
(a) Rate Control
(b) Congestion Control
(c) Rate Control with delay
(d) Mixed Congestion Control
7. Approach to Throughput Prediction
- Analyze the time series (TS1) of (arrival time, bytes received) to get a meaningful throughput time series
- Possible solutions
  - Instant throughput: throughput since the previous TCP segment
  - Fixed-interval throughput: average throughput over a fixed time period
  - Per-RTT throughput: partition using the fixed SYN-ACK RTT
    - Idea: TCP sends a window full of data segments every RTT
- Partition the time series (TS1) with the fixed SYN-ACK RTT to get the per-RTT throughput (TS2)
- Analyze the per-RTT throughput time series (TS2) to predict future throughput
- Compare different prediction methods across all traces
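The TS1-to-TS2 partitioning step can be illustrated with a minimal Python sketch. This is not the authors' Perl code. The function name `per_rtt_throughput`, the choice to start the first interval at the first arrival, and reporting empty intervals as 0 are all assumptions made for illustration. Samples are assumed to be sorted by arrival time.

```python
from typing import List, Tuple


def per_rtt_throughput(samples: List[Tuple[float, int]], rtt: float) -> List[float]:
    """Partition (arrival_time_sec, bytes_received) samples into RTT-sized bins.

    Returns one throughput value (bytes/sec) per consecutive RTT interval,
    counted from the first arrival. Intervals with no data yield 0.0.
    Assumes `samples` is sorted by arrival time (hypothetical input format).
    """
    if rtt <= 0:
        raise ValueError("rtt must be positive")
    if not samples:
        return []
    start = samples[0][0]
    n_bins = int((samples[-1][0] - start) // rtt) + 1
    bytes_per_bin = [0] * n_bins
    for arrival, nbytes in samples:
        # Index of the RTT interval this segment falls into.
        idx = int((arrival - start) // rtt)
        bytes_per_bin[idx] += nbytes
    return [b / rtt for b in bytes_per_bin]


# TS1: four segments; the fixed RTT here is 0.5 s (illustrative value).
ts1 = [(0.0, 1000), (0.1, 1000), (0.6, 500), (1.7, 2000)]
ts2 = per_rtt_throughput(ts1, rtt=0.5)
```

Using one fixed RTT keeps the partitioning independent of any running RTT estimate, which matches the "fixed SYN-ACK RTT" choice on the slide.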
8. TCP Segment Partitioning (1)
[Figure: throughput time series, log scaled. Labels: SYN-ACK RTT 176 ms; per-RTT throughput; fixed interval of 100 ms; 121 KB/sec; 40 KB/sec]
- Instant throughput shows a wide range of fluctuation.
- Fixed-interval throughput shows less fluctuation.
9. TCP Segment Partitioning (2)
- RTT estimation
  - Use the fixed SYN-ACK RTT
  - Simple and effective
- Partition TCP segments into a per-RTT throughput time series
10. Throughput Prediction (1)
- TCP Patterns
  - Rate Control limited (RC)
  - Congestion Control limited (CC)
- Identify basic elements
  - Flat regions
  - Exponential Climb regions
  - Linear Climb regions
  - Drop points
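One way to picture the four basic elements is to label each step between consecutive per-RTT throughput values. The sketch below is only an illustration. The slides do not give the detection rules, so the function name `classify_steps` and the thresholds `flat_tol`, `exp_ratio`, and `drop_ratio` are hypothetical.

```python
from typing import List


def classify_steps(ts: List[float], flat_tol: float = 0.10,
                   exp_ratio: float = 1.5, drop_ratio: float = 0.7) -> List[str]:
    """Label each step ts[i-1] -> ts[i] of a per-RTT throughput series.

    All thresholds are illustrative assumptions, not values from the slides:
      drop   : throughput falls to drop_ratio of the previous value or less
      flat   : relative change within flat_tol
      exp    : growth by a factor of exp_ratio or more (slow-start-like)
      linear : any smaller increase
      other  : anything else (e.g. a mild decrease, or a zero previous value)
    """
    labels = []
    for prev, cur in zip(ts, ts[1:]):
        if prev <= 0:
            labels.append("other")
            continue
        ratio = cur / prev
        if ratio <= drop_ratio:
            labels.append("drop")
        elif abs(ratio - 1.0) <= flat_tol:
            labels.append("flat")
        elif ratio >= exp_ratio:
            labels.append("exp")
        elif ratio > 1.0:
            labels.append("linear")
        else:
            labels.append("other")
    return labels


# Slow start, then a linear climb, a drop, and a flat region.
labels = classify_steps([10, 20, 40, 80, 92, 104, 50, 50, 51])
```

Runs of identical labels then correspond to the flat, exponential climb, and linear climb regions, and isolated "drop" labels mark the drop points.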
11. Throughput Prediction (2)
- Peak of slow start
  - Data points up to the end of the first slow start are ignored for prediction
  - The initial slow start does not repeat
- RC-based prediction
  - Use flat regions
- CC-based prediction
  - Use complete CC cycles
- Window-based prediction
  - Used if no clear pattern is observed
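The selection among RC-based, CC-based, and window-based prediction might look roughly like the following sketch. It is a simplified reading of the slide, not the authors' algorithm. The names `end_of_slow_start` and `predict`, the thresholds, the rule "two or more drops means CC", and the definition of the window fallback are all assumptions.

```python
from statistics import mean
from typing import List, Tuple


def end_of_slow_start(ts: List[float], exp_ratio: float = 1.5) -> int:
    """Index of the first point after the initial exponential climb.

    Assumption: slow start lasts while each RTT grows by exp_ratio or more.
    """
    i = 1
    while i < len(ts) and ts[i - 1] > 0 and ts[i] >= exp_ratio * ts[i - 1]:
        i += 1
    return i


def predict(ts: List[float], flat_tol: float = 0.10,
            drop_ratio: float = 0.7, window: int = 10) -> Tuple[str, float]:
    """Return (method, predicted throughput) from a per-RTT series."""
    usable = ts[end_of_slow_start(ts):]  # ignore the initial slow start
    if not usable:
        raise ValueError("no data after the initial slow start")
    drops = [j for j in range(1, len(usable))
             if usable[j] <= drop_ratio * usable[j - 1]]
    if len(drops) >= 2:
        # CC-based: average over complete cycles only (first to last drop).
        return "CC", mean(usable[drops[0]:drops[-1]])
    avg = mean(usable)
    if not drops and avg > 0 and (max(usable) - min(usable)) / avg <= flat_tol:
        # RC-based: the usable data forms one flat region.
        return "RC", avg
    # Window-based fallback: recent points, restricted to after the last drop.
    recent = usable[drops[-1]:] if drops else usable
    return "window", mean(recent[-window:])


rc = predict([10, 20, 40, 80, 100, 100, 102, 98, 100])
cc = predict([10, 20, 40, 80, 40, 50, 60, 30, 40, 50, 25, 30])
win = predict([10, 20, 40, 80, 100, 100, 100, 50, 52, 48])
```

Averaging only complete cycles in the CC case avoids biasing the estimate toward the low or high end of a partially observed sawtooth.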
12. Experiments (1): Setup
- Download data files from 290 web sites (Debian/Gentoo mirrors)
- Use tcpdump to capture the receiver's traffic
  - Record SYN-ACK RTTs
  - Include retransmitted packets (0.09%)
- Average file size is 30 MBytes
- 461 traces collected at Univ. of Houston
- Traces are analyzed using Perl scripts
13. Experiments (2): Prediction Methods
- Prediction methods compared
  - Moving Average (MA): average throughput of the previous 10 RTTs
  - Exponential Weighted Moving Average (EWMA)
  - Aggregate throughput: the average of past throughput (same as the cumulative average), used as the predicted throughput
  - TCP Pattern prediction
- Average error in predicted future throughput
  - Cut off at 100% if over, in case the measured future throughput is very small
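The three baseline predictors and the capped error can be written down in a few lines. This is a hedged sketch: the slides do not state the EWMA weight or the exact error formula, so `alpha=0.5` and the absolute relative error in percent are assumptions, as are the function names.

```python
from typing import List


def moving_average(ts: List[float], window: int = 10) -> float:
    """MA: average throughput of the previous `window` RTTs."""
    recent = ts[-window:]
    return sum(recent) / len(recent)


def ewma(ts: List[float], alpha: float = 0.5) -> float:
    """EWMA over the whole series; alpha is an assumed weight."""
    est = ts[0]
    for x in ts[1:]:
        est = alpha * x + (1 - alpha) * est
    return est


def aggregate(ts: List[float]) -> float:
    """Aggregate: cumulative average of all past throughput."""
    return sum(ts) / len(ts)


def prediction_error(predicted: float, measured: float, cap: float = 100.0) -> float:
    """Absolute relative error in percent, cut off at `cap`.

    The cap keeps a very small measured throughput from dominating averages.
    """
    if measured <= 0:
        return cap
    return min(cap, abs(predicted - measured) / measured * 100.0)


history = [10.0, 20.0, 30.0, 40.0]
ma = moving_average(history, window=2)
ew = ewma(history)
agg = aggregate(history)
```

On a rising series like this one, MA tracks the most recent values, Aggregate lags behind them, and EWMA falls in between, depending on `alpha`.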
14. Illustration of Prediction (1)
Make a prediction for the next 200 RTTs
[Figure: axis "Window size (in RTTs)"; marked points: 25th RTT, drop at 27th RTT, 40th RTT]
Prediction at 25th RTT
- Aggregate Throughput Prediction: average throughput of RTTs 0-25
- TCP Throughput Prediction: average throughput of RTTs 9-25 (RC-based prediction)
Prediction at 40th RTT
- TCP Throughput Prediction: uses Window-based prediction after the 27th RTT (a significant drop)
15. Illustration of Prediction (2)
Make a prediction for the next 200 RTTs
[Figure: axis "Window size (in RTTs)"]
The closer to 0, the better the prediction.
- Average error against the measured future throughput of the next 200 RTTs (for example, at the 20th RTT, the average throughput of RTTs 21-220 is used)
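The evaluation described here, predicting after k RTTs and comparing against the average of the following RTTs, can be sketched as below. The helper name `error_at`, the use of an absolute percentage error, and the 100% cap are assumptions for illustration. The list index 0 is taken to hold RTT 1, so `ts[20:220]` covers RTTs 21-220.

```python
from typing import Callable, List


def error_at(ts: List[float], k: int, horizon: int,
             predictor: Callable[[List[float]], float],
             cap: float = 100.0) -> float:
    """Percent error of a prediction made after k RTTs.

    The prediction uses only ts[:k]; it is compared with the average
    throughput of the next `horizon` RTTs (fewer if the trace ends early).
    """
    future = ts[k:k + horizon]
    if k <= 0 or not future:
        raise ValueError("need at least one past and one future RTT")
    predicted = predictor(ts[:k])
    measured = sum(future) / len(future)
    if measured <= 0:
        return cap
    return min(cap, abs(predicted - measured) / measured * 100.0)


# Synthetic trace: 20 RTTs at 100, then 200 RTTs at 80 (made-up numbers).
trace = [100.0] * 20 + [80.0] * 200
err = error_at(trace, k=20, horizon=200,
               predictor=lambda past: sum(past) / len(past))
```

Passing the predictor as a function makes it easy to run the same evaluation for each method at several values of k.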
16. Illustration of Prediction (3)
Make a prediction for the next 200 RTTs
[Figure labels: one complete CC cycle; prediction made at the 65th RTT using 3 complete CC cycles]
The closer to 0, the better the prediction.
Throughput prediction using Congestion-Control-based patterns.
17. Results (1): predict the next 200 RTTs at different times
[Figure marker: 30th RTT]
- Aggregate is not accurate for small window sizes (< 30 RTTs)
- MA / EWMA are generally not as accurate
18. Results (2): predict at the 15th RTT for different times in the future
- When only limited data is available:
  - Aggregate is not accurate
  - MA performs best; TCP Pattern is close
19. Results (3): predict at the 25th RTT for different times in the future
- Aggregate performs better
- TCP Pattern performs best; MA is close
20. Results (4): predict at the 50th RTT for different times in the future
- When even more data is available:
  - TCP Pattern is best and Aggregate is close
  - MA now performs worse, due to the dynamics of TCP flows
21. Summary of Results
- Aggregate is accurate with sufficient data, but not with only a few RTTs of data
- MA performs very well with a few RTTs of data
- EWMA is not a good predictor
- TCP Pattern generally performs better than or as well as other methods
22. Summary of Results (table view)
23. Conclusion and Future Work
- TCP-pattern-based throughput prediction is as good as or better than other methods.
- Good predictions within 25 RTTs (or 5 sec).
- Patterns observed: 65% Rate Control, few Congestion Control
- Methods using Aggregate (e.g. NWS) cannot be expected to work well for small test files
- What's next?
  - Identify more patterns
  - Add a degree of confidence for each prediction
  - Multiple TCP streams
24. That's all, folks!
25. Supplement Slides
26. Characteristics of collected traces (1)
27. Characteristics of collected traces (2)
- Classification: one trace presents over 50% of some type of pattern.
28. Some Trace Patterns (300 RTTs)
Under-estimated RTT 100 RTTs
29. Results (0.5): predict the next 100 RTTs at different times
30. Results (1.5): predict the next 400 RTTs at different times
31. Bandwidth
- Bandwidth
  - The amount of data that can be pushed through a link in unit time. Usually measured in bits or bytes per second.
- Bottleneck Bandwidth (BB)
- Available Bandwidth (AB)
- Throughput (T)
- T ≤ AB ≤ BB