Title: Week 11: TCP Congestion Control
2. Principles of Congestion Control
- Congestion:
  - informally: too many sources sending too much data too fast for the network to handle
  - different from flow control!
- manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
- a top-10 problem!
3. Causes/costs of congestion: scenario 1
- two senders, two receivers
- one router, infinite buffers
- no retransmission
- large delays when congested
- maximum achievable throughput
4. Causes/costs of congestion: scenario 2
- one router, finite buffers
- sender retransmission of lost packets
[Figure: Host A offers λin (original data) plus λ'in (original data plus retransmitted data) into finite shared output link buffers; λout is delivered to Host B]
5. Causes/costs of congestion: scenario 2
- always: λin = λout (goodput)
- "perfect" retransmission: retransmit only when a packet is lost
- retransmission of delayed (not lost) packets makes λ'in larger (than the perfect case) for the same λout
- costs of congestion:
  - more work (retransmissions) for a given goodput
  - unneeded retransmissions: link carries multiple copies of a packet
6. Causes/costs of congestion: scenario 3
- four senders
- multihop paths
- timeout/retransmit
- Q: what happens as λin and λ'in increase?
[Figure: hosts offer λin (original data) plus λ'in (original data plus retransmitted data) into finite shared output link buffers; λout is delivered]
7. Example
- What happens when each demand peaks at unity rate? Throughput = 1.52? (How about) twice the unity rate: T = 1.07?
8. Max-min fair allocation
- Given a network and a set of sessions, we would like to find a maximal flow that is fair
- We will see different definitions of max-min fairness and will learn a flow control algorithm
- The tutorial will give an understanding of what max-min fairness is
9. How to define fairness?
- Any session is entitled to as much network use as is any other.
- Allocate the same share to all.
10. Max-Min Flow Control Rule
- The rule: maximize the network use allocated to the sessions with the minimum allocation
- An alternative definition: maximize the allocation of each session i under the constraint that an increase in i's allocation doesn't cause a decrease in the allocation of some other session with the same or smaller rate than i
11. Example
[Figure: sessions 0, 1, 2 and session 3 sharing two links, each of capacity C = 1]
- The max-min fair flow division gives sessions 0, 1, 2 a flow rate of 1/3 each and session 3 a flow rate of 2/3
12. Notation
- G(N, A): directed network graph (N is the set of vertices and A is the set of edges)
- Ca: the capacity of link a
- Fa: the flow on link a
- P: the set of sessions
- rp: the rate of session p
- We assume a fixed, single-path routing method
13. Definitions
- We have the following constraints on the rate vector r: 0 ≤ rp for all p ∈ P, and Fa = Σ (rp over sessions p crossing a) ≤ Ca for all a ∈ A
- A vector r satisfying these constraints is said to be feasible
14. Definitions
- A vector of rates r is said to be max-min fair if it is feasible and, for each p ∈ P, rp cannot be increased while maintaining feasibility without decreasing rp′ for some session p′ for which rp′ ≤ rp
- We want to find a rate vector that is max-min fair
15. Bottleneck Link for a Session
- Given some feasible flow r, we say that a is a bottleneck link with respect to r for a session p crossing a if Fa = Ca and rp ≥ rp′ for all sessions p′ crossing link a.
- [Figure] All link capacities are 1. The bottlenecks for sessions 1, 2, 3, 4, 5 are c, a, a, d, a respectively. Note that c is not a bottleneck for 5 and b is not a bottleneck for 1.
16. Max-Min Fairness: Definition Using Bottlenecks
- Theorem: A feasible rate vector r is max-min fair if and only if each session has a bottleneck link with respect to r
17. Algorithm for Computing Max-Min Fair Rate Vectors
- The idea of the algorithm:
  - Bring all sessions to a state in which they have a bottleneck link; by the theorem, the result is the max-min fair flow
  - We start with an all-zero rate vector and increase the rates on all paths together until Fa = Ca for one or more links a
  - At this point, each session using a saturated link has the same rate as every other session using that link. Thus, these saturated links serve as bottleneck links for all sessions using them
18. Algorithm for Computing Max-Min Fair Rate Vectors
- At the next step, all sessions not using the saturated links are incremented equally in rate until one or more new links become saturated
- Note that the sessions using the previously saturated links might also be using these newly saturated links (at a lower rate)
- The algorithm continues from step to step, always equally incrementing all sessions not passing through any saturated link, until all sessions pass through at least one such link
19. Algorithm for Computing Max-Min Fair Rate Vectors
1. Init: k = 1, Fa^0 = 0, rp^0 = 0, P^1 = P and A^1 = A
2. For all a ∈ A^k: na^k = number of sessions p ∈ P^k crossing link a
3. Δr = min over a ∈ A^k of (Ca − Fa^(k−1)) / na^k  (find the increment size)
4. For all p ∈ P^k: rp^k = rp^(k−1) + Δr (increment); for other p: rp^k = rp^(k−1)
5. Fa^k = Σ (rp^k over p crossing a)  (update flows)
6. A^(k+1) = the set of unsaturated links
7. P^(k+1) = all p such that p crosses only links in A^(k+1)
8. k = k + 1
9. If P^k is empty then stop, else go to step 2
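The steps above can be sketched in Python (our own sketch, not code from the slides; the function name `max_min_fair` and the dictionary-based encoding of links and sessions are illustrative):

```python
# Sketch of the max-min fair rate algorithm: raise all unbottlenecked
# sessions equally until a link saturates, freeze the sessions that
# cross it, and repeat until every session is bottlenecked.

def max_min_fair(capacities, sessions):
    """capacities: {link: C_a}; sessions: {session: set of links it crosses}."""
    rates = {p: 0.0 for p in sessions}
    flow = {a: 0.0 for a in capacities}
    active = set(sessions)            # sessions with no saturated link yet
    while active:
        # n_a: number of active sessions crossing each link
        n = {a: sum(1 for p in active if a in sessions[p])
             for a in capacities}
        # largest equal increment before some link saturates
        dr = min((capacities[a] - flow[a]) / n[a]
                 for a in capacities if n[a] > 0)
        for p in active:
            rates[p] += dr
        flow = {a: sum(rates[p] for p in sessions if a in sessions[p])
                for a in capacities}
        saturated = {a for a in capacities
                     if abs(flow[a] - capacities[a]) < 1e-9}
        # sessions crossing a saturated link are now bottlenecked
        active = {p for p in active if not (sessions[p] & saturated)}
    return rates
```

On one plausible encoding of the slide-11 topology (sessions 0, 1, 2 crossing link a; sessions 0 and 3 sharing link b; both capacities 1), this returns 1/3 for sessions 0, 1, 2 and 2/3 for session 3.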
20. Example of Algorithm Running
- Step 1: All sessions get a rate of 1/3 because of link a, and link a is saturated.
- Step 2: Sessions 1 and 4 get an additional rate increment of 1/3 for a total of 2/3. Link c is now saturated.
- Step 3: Session 4 gets an additional rate increment of 1/3 for a total of 1. Link d is saturated.
- End
21. Example revisited
- Max-min fair vector if Tij = ∞:
  - r = (½, ½, ½, ½)
  - T = 2 > 1.52
- What if the demands are T13 = T31 = ¼, T24 = ½, T42 = ∞?
  - r = (¼, ½, ¼, ¾)
22. Causes/costs of congestion: scenario 3
- Another cost of congestion:
  - when a packet is dropped, any upstream transmission capacity used for that packet was wasted!
23. Approaches towards congestion control
Two broad approaches towards congestion control:
- Network-assisted congestion control:
  - routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate at which the sender should send
- End-end congestion control:
  - no explicit feedback from the network
  - congestion inferred from end-system observed loss, delay
  - approach taken by TCP
24. Case study: ATM ABR congestion control
- ABR: available bit rate
- "elastic service"
- if sender's path underloaded: sender should use available bandwidth
- if sender's path congested: sender throttled to minimum guaranteed rate
- RM (resource management) cells:
  - sent by sender, interspersed with data cells
  - bits in RM cell set by switches (network-assisted):
    - NI bit: no increase in rate (mild congestion)
    - CI bit: congestion indication
  - RM cells returned to sender by receiver, with bits intact
25. Case study: ATM ABR congestion control
- two-byte ER (explicit rate) field in RM cell:
  - congested switch may lower the ER value in the cell
  - sender's send rate is thus the minimum supportable rate on the path
- EFCI bit in data cells: set to 1 by a congested switch
  - if the data cell preceding an RM cell has EFCI set, the receiver sets the CI bit in the returned RM cell
26. TCP Congestion Control
- How does the sender perceive congestion?
  - loss event = timeout or 3 duplicate ACKs
- TCP sender reduces rate (CongWin) after a loss event
- three mechanisms:
  - AIMD
  - slow start
  - conservative after timeout events
- end-end control (no network assistance)
- sender limits transmission: LastByteSent − LastByteAcked ≤ CongWin
- Roughly, rate ≈ CongWin/RTT bytes/sec
- CongWin is dynamic, a function of perceived network congestion
27. TCP AIMD
- additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
- multiplicative decrease: cut CongWin in half after a loss event
[Figure: sawtooth evolution of CongWin for a long-lived TCP connection]
28. Additive Increase
- increase CongWin by 1 MSS every RTT in the absence of loss events: probing
- cwnd += SMSS*SMSS/cwnd  (*)
- This adjustment is executed on every incoming non-duplicate ACK.
- Equation (*) provides an acceptable approximation to the underlying principle of increasing cwnd by 1 full-sized segment per RTT.
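A quick sketch of how the per-ACK rule approximates one segment per RTT (the SMSS value and starting window are illustrative, not from the slides):

```python
# Per-ACK congestion-avoidance update: cwnd += SMSS*SMSS/cwnd.
# Over one RTT the sender receives about cwnd/SMSS ACKs, so cwnd
# grows by roughly one SMSS per RTT.

SMSS = 1460  # sender maximum segment size, bytes (illustrative)

def on_ack(cwnd):
    """Congestion-avoidance increase for one non-duplicate ACK."""
    return cwnd + SMSS * SMSS / cwnd

cwnd = 10 * SMSS
for _ in range(round(cwnd / SMSS)):   # one RTT's worth of ACKs
    cwnd = on_ack(cwnd)
# cwnd has now grown by approximately one SMSS
```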
29. TCP Slow Start
- When the connection begins, increase rate exponentially fast until the first loss event
- When the connection begins, CongWin = 1 MSS
- Example: MSS = 500 bytes, RTT = 200 msec
  - initial rate = 20 kbps
- available bandwidth may be >> MSS/RTT
- desirable to quickly ramp up to a respectable rate
30. TCP Slow Start (more)
- When the connection begins, increase rate exponentially until the first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin for every ACK received
- Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A sends one segment, then two, then four, one RTT apart, to Host B]
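The doubling can be sketched as follows (a sketch with MSS taken as 1 unit; the variable names are ours):

```python
# Slow start: CongWin grows by one MSS per ACK received, so it
# doubles once per RTT (1, 2, 4, 8, ... segments per RTT).

MSS = 1                               # one unit, for illustration
cwnd = 1 * MSS
history = [cwnd]
for _ in range(4):                    # simulate four round-trip times
    acks = cwnd // MSS                # one ACK per segment sent this RTT
    for _ in range(acks):
        cwnd += MSS                   # +1 MSS per ACK
    history.append(cwnd)              # doubled once per RTT
# history traces 1, 2, 4, 8, 16
```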
31. Refinement
Philosophy:
- After 3 dup ACKs:
  - CongWin is cut in half, Threshold is set to CongWin
  - window then grows linearly
- But after a timeout event:
  - Threshold is set to CongWin/2 and CongWin is instead set to 1 MSS
  - window then grows exponentially to the threshold, then grows linearly
- 3 dup ACKs indicate the network is capable of delivering some segments
- a timeout before 3 dup ACKs is more alarming
32. Refinement (more)
- Q: When should the exponential increase switch to linear?
- A: When CongWin gets to 1/2 of its value before the timeout.
- Implementation:
  - Variable: Threshold
  - At a loss event, Threshold is set to 1/2 of CongWin just before the loss event
33. Summary: TCP Congestion Control
- When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
- When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
- When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
- When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
34. TCP sender congestion control
Event / State / TCP Sender Action / Commentary:
- ACK receipt for previously unacked data / Slow Start (SS) / CongWin = CongWin + MSS; if (CongWin > Threshold), set state to Congestion Avoidance / Results in a doubling of CongWin every RTT
- ACK receipt for previously unacked data / Congestion Avoidance (CA) / CongWin = CongWin + MSS·(MSS/CongWin) / Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
- Loss event detected by triple duplicate ACK / SS or CA / Threshold = CongWin/2, CongWin = Threshold, set state to Congestion Avoidance / Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.
- Timeout / SS or CA / Threshold = CongWin/2, CongWin = 1 MSS, set state to Slow Start / Enter slow start
- Duplicate ACK / SS or CA / Increment duplicate ACK count for segment being ACKed / CongWin and Threshold not changed
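These rules can be sketched as a small state machine (our own sketch under simplifying assumptions: MSS = 1 unit, an arbitrary initial Threshold of 64 MSS, and event labels 'ack', 'dup3', 'timeout' are all illustrative choices):

```python
# Sketch of the TCP sender actions tabulated above.

MSS = 1

class TcpSender:
    def __init__(self):
        self.cwnd = 1 * MSS           # start in slow start
        self.thresh = 64 * MSS        # illustrative initial threshold
        self.state = 'SS'

    def on_event(self, ev):
        if ev == 'ack':                           # new data ACKed
            if self.state == 'SS':
                self.cwnd += MSS                  # doubles CongWin per RTT
                if self.cwnd > self.thresh:
                    self.state = 'CA'
            else:                                 # CA: additive increase
                self.cwnd += MSS * MSS / self.cwnd
        elif ev == 'dup3':                        # triple duplicate ACK
            self.thresh = self.cwnd / 2           # multiplicative decrease
            self.cwnd = self.thresh               # fast recovery
            self.state = 'CA'
        elif ev == 'timeout':
            self.thresh = self.cwnd / 2
            self.cwnd = 1 * MSS                   # back to 1 MSS
            self.state = 'SS'
```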
35. TCP Futures
- Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
- Requires window size W = 83,333 in-flight segments
- Throughput in terms of loss rate: throughput = 1.22·MSS / (RTT·√p)
- ⇒ p = 2·10⁻¹⁰
- New versions of TCP needed for high speed!
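Plugging the slide's numbers into the model reproduces both figures (a sketch; the variable names are ours):

```python
# Check of the numbers above, using the macroscopic model
#   throughput ≈ 1.22 * MSS / (RTT * sqrt(p)).

mss_bits = 1500 * 8          # segment size in bits
rtt = 0.100                  # 100 ms RTT, in seconds
target = 10e9                # desired throughput: 10 Gbps

# Required window, in segments: W = throughput * RTT / MSS
w = target * rtt / mss_bits                    # ~83,333 segments

# Loss rate the model tolerates: p = (1.22 * MSS / (RTT * throughput))^2
p = (1.22 * mss_bits / (rtt * target)) ** 2    # ~2e-10
```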
36. Macroscopic TCP model
- Deterministic packet losses
- 1/p packets transmitted in a cycle: success, then loss
[Figure: sawtooth of the window between W/2 and W over one cycle]
37. TCP Model Cont'd
- Equate the trapezoid area (3/8)W² under the sawtooth to 1/p
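Written out, this is the standard derivation (assuming the window oscillates deterministically between W/2 and W, so a cycle lasts W/2 RTTs and carries an average window of 3W/4):

```latex
% Packets per cycle = average window (3W/4) times cycle length (W/2):
\frac{3}{8}W^2 = \frac{1}{p}
\quad\Rightarrow\quad
W = \sqrt{\frac{8}{3p}},
\qquad
\text{throughput} = \frac{\tfrac{3}{4}\,W \cdot MSS}{RTT}
= \sqrt{\frac{3}{2p}}\cdot\frac{MSS}{RTT}
\approx \frac{1.22\,MSS}{RTT\,\sqrt{p}}
```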
38. TCP Fairness
- Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K
39. Why is TCP fair?
- Two competing sessions:
  - additive increase gives a slope of 1 as throughput increases
  - multiplicative decrease decreases throughput proportionally
[Figure: Connection 1 throughput vs. Connection 2 throughput, both axes up to R; the trajectory alternates between congestion-avoidance additive increase and loss-induced halving of the window, converging toward the equal-bandwidth-share line]
40. Fairness (more)
- Fairness and parallel TCP connections:
  - nothing prevents an app from opening parallel connections between 2 hosts
  - Web browsers do this
  - Example: link of rate R supporting 9 connections:
    - new app asks for 1 TCP, gets rate R/10
    - new app asks for 10 TCPs, gets R/2!
- Fairness and UDP:
  - Multimedia apps often do not use TCP:
    - do not want rate throttled by congestion control
  - Instead use UDP: pump audio/video at constant rate, tolerate packet loss
  - Research area: TCP friendly, DCCP
41. Queuing Disciplines
- Each router must implement some queuing discipline
- Queuing allocates both bandwidth and buffer space:
  - Bandwidth: which packet to serve (transmit) next
  - Buffer space: which packet to drop next (when required)
- Queuing also affects latency
42. Typical Internet Queuing
- FIFO + drop-tail:
  - Simplest choice
  - Used widely in the Internet
- FIFO (first-in-first-out):
  - Implies a single class of traffic
- Drop-tail:
  - Arriving packets get dropped when the queue is full, regardless of flow or importance
- Important distinction:
  - FIFO: scheduling discipline
  - Drop-tail: drop policy
43. FIFO + Drop-tail Problems
- Leaves responsibility for congestion control completely to the edges (e.g., TCP)
- Does not separate between different flows
- No policing: send more packets → get more service
- Synchronization: end hosts react to the same events
44. FIFO + Drop-tail Problems
- Full queues:
  - Routers are forced to have large queues to maintain high utilization
  - TCP detects congestion from loss
  - Forces the network to have long standing queues in steady state
- Lock-out problem:
  - Drop-tail routers treat bursty traffic poorly
  - Traffic gets synchronized easily → allows a few flows to monopolize the queue space
45. Active Queue Management
- Design active router queue management to aid congestion control
- Why?
  - The router has a unified view of queuing behavior
  - Routers see actual queue occupancy (can distinguish queueing delay from propagation delay)
  - Routers can decide on transient congestion, based on workload
46. Design Objectives
- Keep throughput high and delay low:
  - High power (throughput/delay)
- Accommodate bursts
- Queue size should reflect the ability to accept bursts rather than steady-state queuing
- Improve TCP performance with minimal hardware changes
47. Lock-out Problem
- Random drop:
  - A packet arriving when the queue is full causes some random packet to be dropped
- Drop front:
  - On a full queue, drop the packet at the head of the queue
- Random drop and drop front solve the lock-out problem but not the full-queues problem
48. Full Queues Problem
- Drop packets before the queue becomes full (early drop)
- Intuition: notify senders of incipient congestion
- Example: early random drop (ERD):
  - If qlen > drop-level, drop each new packet with fixed probability p
  - Does not control misbehaving users
49. Random Early Detection (RED)
- Detect incipient congestion
- Assume hosts respond to lost packets
- Avoid window synchronization:
  - Randomly mark packets
- Avoid bias against bursty traffic
50. RED Algorithm
- Maintain a running average of the queue length
- If avg < minth, do nothing:
  - Low queuing: send packets through
- If avg > maxth, drop the packet:
  - Protection from misbehaving sources
- Else mark the packet with probability proportional to the queue length:
  - Notify sources of incipient congestion
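The decision logic can be sketched as follows (a sketch, not a faithful RED implementation; the parameter values and the simple linear marking between minth and maxth are illustrative assumptions):

```python
# Sketch of the RED drop decision: EWMA average queue length, and a
# drop probability that rises linearly from 0 at minth to max_p at maxth.

import random

minth, maxth = 5.0, 15.0     # thresholds on the average queue (illustrative)
max_p = 0.1                  # marking probability at maxth (illustrative)
wq = 0.002                   # EWMA weight (illustrative)

def red_update_avg(avg, qlen):
    """Exponentially weighted moving average of the instantaneous queue."""
    return (1 - wq) * avg + wq * qlen

def red_drop(avg):
    """Return True if the arriving packet should be dropped/marked."""
    if avg < minth:
        return False                         # low queuing: let it through
    if avg >= maxth:
        return True                          # protect against overload
    # mark with probability proportional to the average queue length
    p = max_p * (avg - minth) / (maxth - minth)
    return random.random() < p
```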
51. RED Operation
[Figure: P(drop) vs. average queue length: zero below the min threshold (minth), rising linearly to maxP at the max threshold (maxth), then jumping to 1.0 beyond it]
52. Improving QoS in IP Networks
- Thus far: making the best of best-effort service
- Future: next-generation Internet with QoS guarantees:
  - RSVP: signaling for resource reservations
  - Differentiated Services: differential guarantees
  - Integrated Services: firm guarantees
- simple model for sharing and congestion studies
53. Principles for QoS Guarantees
- Example: a 1 Mbps IP phone and FTP share a 1.5 Mbps link:
  - bursts of FTP can congest the router and cause audio loss
  - want to give priority to audio over FTP
- Principle 1: packet marking is needed for the router to distinguish between different classes, and a new router policy to treat packets accordingly
54. Principles for QoS Guarantees (more)
- what if applications misbehave (audio sends at higher than its declared rate)?
  - policing: force source adherence to bandwidth allocations
- marking and policing at the network edge:
  - similar to the ATM UNI (User Network Interface)
- Principle 2: provide protection (isolation) for one class from others
55. Principles for QoS Guarantees (more)
- Allocating fixed (non-sharable) bandwidth to a flow: inefficient use of bandwidth if the flow doesn't use its allocation
- Principle 3: While providing isolation, it is desirable to use resources as efficiently as possible
56. Principles for QoS Guarantees (more)
- Basic fact of life: cannot support traffic demands beyond link capacity
- Principle 4: Call admission: a flow declares its needs; the network may block the call (e.g., busy signal) if it cannot meet them
57. Summary of QoS Principles
Let's next look at mechanisms for achieving this.
58. Scheduling and Policing Mechanisms
- scheduling: choose the next packet to send on the link
- FIFO (first in first out) scheduling: send in order of arrival to the queue
  - real-world example?
  - discard policy: if a packet arrives at a full queue, which packet to discard?
    - tail drop: drop the arriving packet
    - priority: drop/remove on a priority basis
    - random: drop/remove randomly
59. Scheduling Policies: more
- Priority scheduling: transmit the highest-priority queued packet
- multiple classes, with different priorities:
  - class may depend on marking or other header info, e.g. IP source/dest, port numbers, etc.
60. Scheduling Policies: still more
- round robin scheduling:
  - multiple classes
  - cyclically scan class queues, serving one from each class (if available)
61. Scheduling Policies: still more
- Weighted Fair Queuing:
  - generalized round robin
  - each class gets a weighted amount of service in each cycle
62. Deficit Weighted Round Robin (DWRR)
- DWRR addresses the limitations of the WRR model by accurately supporting the weighted fair distribution of bandwidth when servicing queues that contain variable-length packets.
- DWRR addresses the limitations of the WFQ model by defining a scheduling discipline that has lower computational complexity and that can be implemented in hardware. This allows DWRR to support the arbitration of output-port bandwidth on high-speed interfaces in both the core and at the edges of the network.
63. DRR
- In DWRR queuing, each queue is configured with a number of parameters:
  - A weight that defines the percentage of the output-port bandwidth allocated to the queue.
  - A DeficitCounter that specifies the total number of bytes the queue is permitted to transmit each time it is visited by the scheduler. The DeficitCounter allows a queue that was not permitted to transmit in the previous round (because the packet at the head of the queue was larger than the value of the DeficitCounter) to save transmission "credits" and use them during the next service round.
64. DWRR
- In the classic DWRR algorithm, the scheduler visits each non-empty queue and determines the number of bytes in the packet at the head of the queue.
- The variable DeficitCounter is incremented by the value of the quantum. If the size of the packet at the head of the queue is greater than the variable DeficitCounter, then the scheduler moves on to service the next queue.
- If the size of the packet at the head of the queue is less than or equal to the variable DeficitCounter, then the variable DeficitCounter is reduced by the number of bytes in the packet and the packet is transmitted on the output port.
65. DWRR
- A quantum of service that is proportional to the weight of the queue, expressed in bytes. The DeficitCounter for a queue is incremented by the quantum each time the queue is visited by the scheduler.
- The scheduler continues to dequeue packets and decrement the variable DeficitCounter by the size of the transmitted packet until either the size of the packet at the head of the queue is greater than the variable DeficitCounter or the queue is empty.
- If the queue is empty, the value of DeficitCounter is set to zero.
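The procedure described on these two slides can be sketched in Python (our own sketch; the function signature and the fixed number of service rounds are illustrative simplifications):

```python
# Sketch of classic DWRR: each visit adds the queue's quantum to its
# DeficitCounter; the queue may transmit while the head packet fits
# within the counter; an emptied queue forfeits leftover credit.

from collections import deque

def dwrr(queues, quanta, rounds):
    """queues: list of deques of packet sizes (bytes); quanta: per-queue
    quantum in bytes. Returns the transmit order as (queue, size) pairs."""
    deficit = [0] * len(queues)
    sent = []
    for _ in range(rounds):
        for i, q in enumerate(queues):
            if not q:
                continue                     # skip empty queues
            deficit[i] += quanta[i]          # add quantum on each visit
            while q and q[0] <= deficit[i]:  # head packet fits in credit
                pkt = q.popleft()
                deficit[i] -= pkt
                sent.append((i, pkt))
            if not q:
                deficit[i] = 0               # emptied queue loses its credit
    return sent
```

With queue contents like those in the example on the next slide (600/400/300-byte packets, quanta 1000/500/500), the first round lets queue 1 send 1000 bytes, queue 2 send 400, and queue 3 nothing, since its 600-byte head packet exceeds its 500-byte deficit.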
66. DWRR Example
- Queue 1: 50% BW, quantum1 = 1000; packets of 600, 400, 300 bytes
- Queue 2: 25% BW, quantum2 = 500; packets of 400, 400, 300 bytes
- Queue 3: 25% BW, quantum3 = 500; packets of 600, 400, 300 bytes
- Modified Deficit Round Robin gives priority to one class, say VoIP
67. Policing Mechanisms
- Goal: limit traffic so it does not exceed declared parameters
- Three commonly used criteria:
  - (Long-term) Average Rate: how many packets can be sent per unit time (in the long run)
    - crucial question: what is the interval length? 100 packets per sec and 6000 packets per min have the same average!
  - Peak Rate: e.g., 6000 packets per min (ppm) average; 1500 ppm peak rate
  - (Max.) Burst Size: max number of packets sent consecutively (with no intervening idle)
68. Policing Mechanisms
- Token Bucket: limit input to specified Burst Size and Average Rate
  - bucket can hold b tokens
  - tokens generated at rate r tokens/sec unless bucket is full
  - over an interval of length t, the number of packets admitted is less than or equal to (r·t + b)
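A minimal sketch of the policer (our own illustration; one token admits one packet, and the class name and parameters are ours):

```python
# Token bucket: depth b tokens, refill rate r tokens/sec. Over any
# interval of length t it admits at most r*t + b packets.

class TokenBucket:
    def __init__(self, r, b):
        self.r, self.b = r, b
        self.tokens = b              # bucket starts full
        self.t = 0.0                 # time of last arrival

    def admit(self, now):
        """Return True if a packet arriving at time `now` conforms."""
        # accumulate tokens since the last event, capped at bucket size
        self.tokens = min(self.b, self.tokens + self.r * (now - self.t))
        self.t = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                 # non-conforming: drop or mark
```

For example, with r = 2 tokens/sec and b = 5, a burst of 10 back-to-back packets at t = 0 has only its first 5 admitted; one second later, 2 more tokens have accumulated.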
69. Policing Mechanisms (more)
- token bucket and WFQ combine to provide a guaranteed upper bound on delay, i.e., a QoS guarantee!