Title: On Congestion Control in High-Speed Networks
1On Congestion Control in High-Speed Networks
D.J. Leith, Hamilton Institute, Ireland
2Outline of current TCP congestion control algorithm
Additive-Increase Multiplicative-Decrease (AIMD)
Probing seems essential in view of queue properties (there is no feedback as to link bandwidth until the queue starts to fill). TCP adopts a linear increase law: when source i receives an ACK packet it increases its window size, $cwnd_i \to cwnd_i + \alpha_i / cwnd_i$, with increase parameter $\alpha_i = 1$ in standard TCP. When the window size eventually exceeds the pipe capacity, a packet will be dropped. When the source detects this (after a delay), the window size is reduced, $cwnd_i \to \beta_i \, cwnd_i$, with decrease parameter $\beta_i = 0.5$ in standard TCP.
[Figure: $cwnd_i$ versus time]
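The probe/back-off behaviour described on this slide (add $\alpha$ per round trip, multiply by $\beta$ on a drop) can be sketched for a single source as follows. This is a minimal illustration rather than a TCP implementation: the function name `aimd_trace`, the fixed `pipe_size` threshold and the once-per-RTT update are assumptions made for brevity (adding $\alpha / cwnd$ on every ACK amounts to roughly $\alpha$ packets per RTT).

```python
def aimd_trace(pipe_size, alpha=1.0, beta=0.5, rtts=200, cwnd=1.0):
    """Return cwnd sampled once per RTT for one AIMD source.

    Assumptions (not from the slides): a drop is detected as soon as
    cwnd exceeds pipe_size, and the window grows by alpha per RTT.
    """
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd > pipe_size:
            cwnd = beta * cwnd      # multiplicative decrease on a drop
        else:
            cwnd = cwnd + alpha     # additive increase while probing
    return trace


trace = aimd_trace(pipe_size=40)
print(max(trace), min(trace[41:]))
```

The resulting sawtooth oscillates between roughly half the pipe size and the pipe size and never settles to a single value.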
3High-speed Networks
The pipe size of a link is roughly $BT + q_{max}$, where B is the link rate (packets/s), T is the propagation delay and $q_{max}$ is the queue size. On a long-distance gigabit link, B = 100,000 packets/s, T = 200ms, $q_{max}$ = 1000 and $BT + q_{max}$ = 21,000 packets. Note that the pipe size determines the peak window size of a TCP source.
[Figure: cwnd sawtooth with a peak of 21000 packets; after a back-off the window must regain roughly 11000 packets, which takes roughly 11000 RTTs = 2200 seconds, or about 36 minutes]
- TCP becomes sluggish, and requires a very low drop rate to achieve reasonable throughput.
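The arithmetic behind these figures can be checked with a short sketch. The function names are illustrative, and the calculation assumes standard TCP (growth of one packet per RTT, back-off to half the peak) with the RTT taken as the 200ms propagation delay. It gives 10500 RTTs, or 35 minutes, which the slide rounds to 11000 RTTs and 36 minutes.

```python
def pipe_size(rate_pps, prop_delay_s, queue_pkts):
    """Pipe size in packets: bandwidth-delay product plus queue size."""
    return rate_pps * prop_delay_s + queue_pkts


def recovery_time(peak_window, rtt_s, alpha=1.0, beta=0.5):
    """RTTs and seconds needed to climb from beta * peak back to peak.

    Assumption: the window grows by alpha packets per RTT and queueing
    delay is ignored, so every RTT lasts rtt_s seconds.
    """
    rtts = (1.0 - beta) * peak_window / alpha
    return rtts, rtts * rtt_s


P = pipe_size(rate_pps=100_000, prop_delay_s=0.2, queue_pkts=1000)
rtts, seconds = recovery_time(P, rtt_s=0.2)
print(P, rtts, seconds / 60.0)
```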
4High-speed Networks
Simply making the increase parameter $\alpha$ larger is inadmissible: on low-speed networks we require backward compatibility with current sources. Large $\alpha$ in high-speed regimes and $\alpha = 1$ in low-speed regimes suggests some sort of mode switch. One approach, HS-TCP, is to vary the AIMD parameters as a function of cwnd (increase $\alpha$ and back off less severely as cwnd becomes large).
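To make the cwnd-dependent parameters concrete, the sketch below evaluates the HS-TCP increase and decrease functions. It is an assumption that the flows discussed here use the default parameters of RFC 3649 (Low_Window 38, High_Window 83000, High_Decrease 0.1); the slides do not state them. Note that RFC 3649 expresses the decrease as a fraction b(w), so the back-off factor in the notation of this talk would be $\beta = 1 - b(w)$.

```python
import math

LOW_WINDOW = 38        # at or below this window, behave like standard TCP
HIGH_WINDOW = 83000    # RFC 3649 default (assumed here)
HIGH_DECREASE = 0.1    # decrease fraction at HIGH_WINDOW (assumed default)


def hstcp_decrease(w):
    """Decrease fraction b(w): on loss, cwnd <- (1 - b(w)) * cwnd."""
    if w <= LOW_WINDOW:
        return 0.5
    frac = (math.log(w) - math.log(LOW_WINDOW)) / (
        math.log(HIGH_WINDOW) - math.log(LOW_WINDOW))
    return (HIGH_DECREASE - 0.5) * frac + 0.5


def hstcp_increase(w):
    """Increase a(w), in packets per RTT."""
    if w <= LOW_WINDOW:
        return 1.0
    p = 0.078 / w ** 1.2          # loss rate on the HS-TCP response curve
    b = hstcp_decrease(w)
    return w * w * p * 2.0 * b / (2.0 - b)


for w in (38, 1000, 10000, 83000):
    print(w, round(hstcp_increase(w), 1), round(hstcp_decrease(w), 2))
```

Because both parameters depend on each flow's own window, two flows with different windows run with different AIMD parameters at the same time.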
5High-speed Networks: One alternative approach, HS-TCP
Example of two HS-TCP flows. The second flow experiences a drop early in slow-start, focusing attention on the responsiveness of the congestion avoidance algorithm. (NS simulation: 500Mb bottleneck link, 100ms delay, queue 500 packets)
6High-speed Networks: One alternative approach, HS-TCP
Persistent HS-TCP flows from node 1 → 5 and from node 4 → 5. Consider an HS-TCP cross-flow between nodes 1 and 2.
7High-speed Networks: One alternative approach, HS-TCP
(NS simulation: 500Mb bottleneck link, 100ms delay, queue 500 packets)
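8High-speed Networks: Slow convergence - intuition
[Figure: window sawtooth annotated with $W_1$, $\alpha_1$, $\beta_1$, T and $W_2$, $\alpha_2$, $\beta_2$]
$W_1 = \beta_1 W_1 + \alpha_1 T \Rightarrow W_1 = \alpha_1 T / (1 - \beta_1)$
$W_2 = \beta_2 W_2 + \alpha_2 T \Rightarrow W_2 = \alpha_2 T / (1 - \beta_2)$
$W_2 / W_1 = [\alpha_2 / (1 - \beta_2)] / [\alpha_1 / (1 - \beta_1)]$ - a moving goalpost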
9High-speed Networks: Scalable TCP
Scalable TCP also has convergence issues
10High-speed Networks: Scalable TCP
11Current TCP congestion control algorithm revisited
Note that cwnd never converges to a steady value with this probe/back-off approach. Also, we are ignoring slow-start, timeouts etc. here so as to focus on the congestion avoidance behaviour.
12Synchronisation Model
Typical congestion window evolution for a TCP source in congestion avoidance
Synchronisation assumption: $t_a$, $t_b$, $t_c$ are the same for all sources, e.g. when there is a shared bottleneck link, the RTT is the same for all sources, and each source transmits at least one packet every RTT ($\alpha_i \geq 1$).
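13Synchronisation Model
The source congestion windows are subject to constraints:
Number of packets in pipe is non-negative: $w_i(k) \geq 0$
At congestion, total number of packets in pipe matches pipe size, P: $\sum_{i=1}^{n} w_i(k) = P$
For source i we have $w_i(k+1) = \beta_i w_i(k) + \alpha_i T(k)$, where $w_i(k)$ is the window size of source i at the k-th congestion event and $T(k)$ is the duration of the k-th congestion epoch in round-trip times.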
14Synchronisation Model
- Collecting the evolution equations for all n sources yields the network dynamics $W(k+1) = A W(k)$
- where $W(k) = [w_1(k), \ldots, w_n(k)]^T$ is the vector of window sizes at congestion
- and $A = \mathrm{diag}(\beta_1, \ldots, \beta_n) + \frac{1}{\sum_{j=1}^{n} \alpha_j} [\alpha_1, \ldots, \alpha_n]^T [1 - \beta_1, \ldots, 1 - \beta_n]$
- where $\alpha_i$ is the AIMD increase parameter for source i, $\beta_i$ the decrease parameter.
- Observe that
  - The dynamics are linear
  - A is a positive matrix with very special structure
- This model incorporates important network features such as the hybrid nature of AIMD, time-varying delay and drop-tail queueing.
15Synchronisation Model
- Analysis: A network of synchronised AIMD sources
  - possesses a unique stationary point, $W_{ss} = \Theta x_p$, where $\Theta$ is a positive constant, and
  - the stationary point is globally exponentially stable. The rate of convergence depends on the second largest eigenvalue of A.
Fairness: Stationary point $W_{ss} = \Theta x_p$, where $\Theta$ is a positive constant and $x_p = [\alpha_1/(1-\beta_1), \ldots, \alpha_n/(1-\beta_n)]^T$. For standard TCP, $\alpha = 1$, $\beta = 0.5$, so $\alpha/(1-\beta) = 2$ and $\alpha_i = 2(1 - \beta_i)$ is the condition for fair co-existence of AIMD flows with TCP.
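The fairness condition can be checked numerically by iterating the synchronised model over congestion epochs. The sketch below assumes the matrix takes the form $A = \mathrm{diag}(\beta_i) + \alpha (1 - \beta)^T / \sum_j \alpha_j$, which is what the per-source update combined with the pipe-size constraint gives. The two-flow parameter values, the starting windows and the function names are hypothetical.

```python
def aimd_matrix(alpha, beta):
    """A = diag(beta) + alpha * (1 - beta)^T / sum(alpha), as nested lists."""
    n = len(alpha)
    total = sum(alpha)
    return [[(beta[i] if i == j else 0.0) + alpha[i] * (1.0 - beta[j]) / total
             for j in range(n)] for i in range(n)]


def step(A, w):
    """One congestion epoch: w(k+1) = A w(k)."""
    return [sum(a_ij * w_j for a_ij, w_j in zip(row, w)) for row in A]


# Hypothetical example: one standard TCP flow and one flow with beta = 0.75
# whose alpha is chosen from the fairness condition alpha = 2 * (1 - beta).
alpha = [1.0, 2.0 * (1.0 - 0.75)]
beta = [0.5, 0.75]
A = aimd_matrix(alpha, beta)

w = [90.0, 10.0]            # arbitrary starting windows, summing to P = 100
for _ in range(50):
    w = step(A, w)
print(w)
```

Each column of A sums to one, so the total window (the pipe size) is preserved from epoch to epoch while the split between the flows converges to the ratio of the entries of $x_p$.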
16Synchronisation Model
Fairness: Example (NS simulation: 10Mb link, 100ms delay, queue 40 packets)
17Synchronisation Model
Responsiveness: Special case. All of the sources have the same decrease parameter, $\beta_i = \beta \; \forall i$. Then the eigenvalues of A (other than the Perron eigenvalue) are equal to $\beta$ $\Rightarrow$ the rate of convergence is $\beta^k$, where k is the congestion epoch. The 95% rise time (measured in congestion epochs) is $\log(0.05)/\log\beta$, e.g. for $\beta = 0.5$, the rise time is about 4 congestion epochs ($\log(0.05)/\log(0.5) \approx 4.3$). Note that the duration of the congestion epochs depends on the increase parameters $\alpha_i$.
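The rise-time formula is easy to tabulate for different back-off factors. The helper name `rise_time_epochs` and the values other than 0.5 are illustrative choices, not taken from the slides.

```python
import math


def rise_time_epochs(beta, level=0.95):
    """Congestion epochs for the error to decay to (1 - level) of its start.

    Solves beta**k = 1 - level for k, i.e. k = log(1 - level) / log(beta).
    """
    return math.log(1.0 - level) / math.log(beta)


for beta in (0.5, 0.75, 0.875):
    print(beta, round(rise_time_epochs(beta), 1))
```

A gentler back-off (larger back-off factor) therefore costs responsiveness when measured in congestion epochs, which is the trade-off this slide points at.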
18Synchronisation Model
Responsiveness (cont) e.g.
19Synchronisation Model
Responsiveness (cont) e.g.
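20High-speed Networks: H-TCP
Simply making the increase parameter $\alpha$ larger is inadmissible: on low-speed networks we require backward compatibility with current sources. Large $\alpha$ in high-speed regimes and $\alpha = 1$ in low-speed regimes suggests some sort of mode switch. E.g.
[Equation: $\alpha_i = \alpha^L$ while the time elapsed in the current congestion epoch is below the threshold $\Delta^L$, switching towards $\alpha^H$ beyond it]
(Note that when $\alpha^L = \alpha^H$ we recover the usual AIMD algorithm)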
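One way to picture such a mode switch is sketched below. It assumes a hard switch from a low-speed increase `alpha_low` to a high-speed increase `alpha_high` once the number of RTTs since the last back-off exceeds a threshold `delta_low`. The exact switching law on the slide is not reproduced here, so this form and the function names are assumptions. The default values are the H-TCP parameters quoted in the simulation slides (1, 20 and 19).

```python
def htcp_alpha(delta, alpha_low=1.0, alpha_high=20.0, delta_low=19):
    """Increase parameter as a function of RTTs since the last back-off.

    Assumption: a hard switch from alpha_low to alpha_high once the
    elapsed time delta exceeds the threshold delta_low.
    """
    return alpha_low if delta <= delta_low else alpha_high


def htcp_epoch(w_start, rtts, **params):
    """Window trajectory over one congestion epoch lasting rtts RTTs."""
    w, trace = w_start, [w_start]
    for delta in range(1, rtts + 1):
        w += htcp_alpha(delta, **params)
        trace.append(w)
    return trace


trace = htcp_epoch(w_start=19.0, rtts=30)
print(trace[19], trace[30])
```

Starting from a window of 19 (half of 38), the flow behaves like standard TCP until the window reaches 38 and only then probes aggressively. Setting `alpha_high` equal to `alpha_low` reduces the trajectory to ordinary AIMD.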
21High-speed Networks
22High-speed Networks: Rate of convergence
Example of two H-TCP flows illustrating rapid convergence to fairness. The second flow experiences a drop early in slow-start, focusing attention on the responsiveness of the congestion avoidance algorithm. (NS simulation: 500Mb link, 100ms delay, queue 500 packets; H-TCP parameters: $\alpha^L = 1$, $\alpha^H = 20$, $\beta = 0.5$, $\Delta^L = 19$, corresponding to a window size threshold of 38)
23High-speed Networks
H-TCP
Converges in 4 congestion epochs
(NS simulation 500Mb bottleneck link, 100ms
delay, queue 500 packets)
24High-speed Networks: Backward compatibility
On low-speed links where the duration of a congestion epoch is less than $\Delta^L$, H-TCP is identical to standard TCP. As the duration increases above $\Delta^L$, the effective $\alpha$ of H-TCP increases and so does the degree of unfairness with standard TCP.
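The growth of the effective increase parameter with epoch duration can be illustrated with a short calculation. It assumes, as a simplification not confirmed by the slides, that the low-speed increase applies for the first `delta_low` RTTs of an epoch and the high-speed increase for the remainder, so that the effective value is the average increase per RTT over the epoch. The function name and the sample epoch lengths are illustrative.

```python
def effective_alpha(epoch_rtts, alpha_low=1.0, alpha_high=20.0, delta_low=19):
    """Average per-RTT increase over an epoch lasting epoch_rtts RTTs.

    Assumption: alpha_low applies for the first delta_low RTTs after a
    back-off and alpha_high for the remainder of the epoch.
    """
    low_rtts = min(epoch_rtts, delta_low)
    high_rtts = max(epoch_rtts - delta_low, 0)
    return (alpha_low * low_rtts + alpha_high * high_rtts) / epoch_rtts


for rtts in (10, 19, 38, 190):
    print(rtts, effective_alpha(rtts))
```

Since the stationary windows are proportional to $\alpha_i / (1 - \beta_i)$, an H-TCP flow with the same back-off factor as standard TCP would, under this simplification, hold a share roughly proportional to its effective increase parameter: equal shares on short epochs, increasingly unequal on long ones.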
25High-speed Networks
Example of standard TCP and H-TCP flows co-existing on a low-speed link (NS simulation, network parameters: 5Mb link, 100ms delay, queue 44 packets; H-TCP parameters: $\alpha^L = 1$, $\alpha^H = 20$, $\beta = 0.5$, $\Delta^L = 19$, corresponding to a window size threshold of 38)
26Implementing H-TCP
- Linux 2.4.20/22
- Change to AIMD rule straightforward
- For large window sizes (>2000 packets), Linux loss recovery is observed to be poor
- Linux maintains a write_queue data structure which contains packets not yet acknowledged. The SACK implementation walks write_queue on every ACK (in fact 3, sometimes 4 or 5 times on every ACK). For large window sizes, this walk seems to be interrupted by the arrival of the next ACK and the SACK algorithm misbehaves.
- Our solution: improve the efficiency of the write_queue walk, which now scales with the number of lost packets rather than cwnd size. Successfully tested up to window sizes of 10000 packets.
- Other issues: rate halving adjustment may be useful for multiple losses; is restriction of cwnd to packets_in_flight+1 optimal or not?
27Implementing H-TCP
Initial tests: CERN-Chicago. Bottleneck in NIC and, with web100, throughput maxes out regardless of congestion avoidance algorithm used.
28Implementing H-TCP
- Current test results: SLAC in late Sept/early Oct 2003.
- Further algorithmic issues: fairness, throughput. Revised H-TCP implementation should be available by end of first week in December 2003.
- How do we design good experiments for TCP proposals to bring out key issues (responsiveness, fairness, friendliness, efficiency etc.) over a range of network conditions (which topologies and traffic mixes)?
- Next proposed tests: CERN-Dublin tests over Geant. Key issue for us in testing behaviour of congestion control algorithms is that the bottleneck lies in the network rather than the NIC.
29High-speed Networks: One alternative approach, HS-TCP
We can gain an insight into this behaviour as follows. First of all, recall that we can write our synchronised model dynamics as $W(k+1) = A W(k)$. The HS-TCP algorithm uses increase and decrease parameters that are functions of cwnd. We can extend our synchronisation model to encompass this by using effective increase and decrease parameters and allowing these to vary with $w_i$ and time k, yielding $W(k+1) = A(k) W(k)$. Note that $W_{ss}(k)$ is the fixed point of $A(k)$ and thus now varies with time. So let's change co-ordinates to measure the windows relative to $W_{ss}(k)$, yielding dynamics with a term F that is determined by the update law for the algorithm parameters $\alpha$ and $\beta$.