Title: An Introduction to Computer Networks
1 An Introduction to Computer Networks
Lecture 11 Congestion Control
- University of Tehran
- Dept. of EE and Computer Engineering
- By
- Dr. Nasser Yazdani
2Outline
- Allocating resource among competing users.
- Bandwidth on the line
- Buffers on the routers
- Congestion control and resource allocation, two
side of the same coin. - Different congestion control Policies.
- Reacting to Congestion
- Avoiding Congestion
3Congestion
- Different sources compete for resources inside
network - Why is it a problem?
- Sources are unaware of current state of resource
- Sources are unaware of each other
- In many situations will result in lt 1.5 Mbps of
throughput (congestion collapse)
4Issues
- Two sides of the same coin
- pre-allocate resources so to avoid congestion
- control congestion if (and when) is occurs
- Two points of implementation
- hosts at the edges of the network (transport
protocol) - routers inside the network (queuing discipline)
- Underlying service model
- best-effort (assume for now)
- multiple qualities of service (later)
5Why is Congestion Bad?
- Wasted bandwidth retransmission of dropped
packets. - Poor user service unpredictable delay, reduced
throughput. Increased load can even result in
lower network throughput. - Switched nets heavy traffic -gt long queues
-gt lost packets -gtretransmits - Ethernet high demand -gt many collisions
- compare with highways too much traffic slows
down throughput - Solutions? Redesign the network.
- Add capacity to congested links
- Reroute traffic.
- What are the options?
6Framework
- Connectionless flows
- sequence of packets sent between
source/destination pair - maintain soft state at the routers
7Evaluation
- We need to know how good is a resource
allocation/congestion aviodness mechanism. - Fairness
- Power of network (ratio of throughput to delay),
Maximize this ratio. - Distributedness
- Efficiency
Throughput/delay
Optimal load
Load
8Evaluation (Fairness)
- What is fair resource allocation?
- Equal share of bandwidth
- How about the length of paths?
- Given a set of flow throughput (x1,x2,, xn)
- f(x1,x2,, xn) (Si1n xi)2/n Si1nxi2
- Fairness is always between 0, 1 and 1 completely
fair and 0 completely unfair.
9Possible Solutions
- Redesign the network.
- Add capacity to congested links
- Very slow solution takes days to months!
- Reroute traffic.
- Alternate paths are not always available
- Also too slow takes 10s of seconds
- In practice, most routing algorithms are static
- Adjust the load in the network. Load balancing
- What are the options?
10Principles of Congestion Control
- Congestion
- informally too many sources sending too much
data too fast for network to handle - different from flow control!
- manifestations
- lost packets (buffer overflow at routers)
- long delays (queuing in router buffers)
- a top-10 problem!
11Causes/costs of congestion scenario 1
- two senders, two receivers
- one router, infinite buffers
- no retransmission
- large delays when congested
- maximum achievable throughput
12Causes/costs of congestion scenario 3
- four senders
- multihop paths
- timeout/retransmit
Q what happens as and increase ?
13Causes/costs of congestion scenario 3
- Another cost of congestion
- when packet dropped, any upstream transmission
capacity used for that packet was wasted!
14Framework (cont)
- Taxonomy different approaches for congestion
Ctrl. - router-centric versus host-centric
- Router-Centric each router takes the
responsibility for dropping packets, and
informing the generating host the amount of
traffic allowed to be sent. - Host-Centric the end hosts observe the amount of
traffic that is successfully getting through the
network and adjust their transmission rates
accordingly. - reservation-based versus feedback-based
- Reservation-Based the end host asks the network
for a certain amount of bandwidth when requesting
a connection (or flow). If the network does not
have enough bandwidth it will reject the
connection. - Feedback-Based the end hosts send data without
first reserving any capacity and then adjust
their sending rate according to the feedback they
received - Explicit feedback
- Implicit feedback.
15Framework (cont)
- window-based versus rate-based
- Window-Based the receiver advertises a window to
the sender. The window corresponds to how much
buffer space the receiver has and it limits how
much data the sender can transmit. Network also
can do that, like X.25. - Rate-Based How many bit the sender can send or
the network can absorb. - Rate-Based can support video.
- Rate-Based still is an open problem
16Where to Prevent Collapse?
- Can end hosts prevent problem?
- Yes, but must trust end hosts to do right thing
- E.g., sending host must adjust amount of data it
puts in the network based on detected congestion - Can routers prevent collapse?
- No, not all forms of collapse
- Doesnt mean they cant help
- Sending accurate congestion signals
- Isolating well-behaved from ill-behaved sources
17Whats Really Happening?
packet loss
- Knee point after which
- throughput increases very slow
- delay increases fast
- Cliff point after which
- throughput starts to decrease very fast to zero
(congestion collapse) - delay approaches infinity
knee
cliff
Throughput
congestion collapse
Load
Delay
Load
18Goals
- Operate near the knee point
- Remain in equilibrium
- How to maintain equilibrium?
- Dont put a packet into network until another
packet leaves. How do you do it? - Use ACK send a new packet only after you receive
and ACK (self-clocking) - This maintains the number of packets in network
constant
19How Do You Do It?
- Detect when network approaches/reaches knee point
- Stay there
- Questions
- How do you get there?
- What if you overshoot (i.e., go over knee point)
? - Possible solution
- Increase window size until you notice congestion
- Decrease window size if network congested
20TCP Congestion Control
- Idea
- assumes best-effort network (FIFO or FQ
routers)each source determines network capacity
for itself - uses implicit feedback
- ACKs pace transmission (self-clocking)
- Challenge
- determining the available capacity in the first
place - adjusting to changes in the available capacity
21Additive Increase/Multiplicative Decrease
- Objective adjust to changes in the available
capacity - New state variable per connection
CongestionWindow (cwnd) - Idea
- increase Cwnd when congestion goes down
- decrease Cwnd when congestion goes up
MaxWindow min(cwnd, AdvertisedWindow)
EffectiveWindow MaxWindow (LastByteSent
LastByteAcked)
sequence number increases
LastByteSent
LastByteAcked
22AIMD (cont)
- Question how does the source determine whether
or not the network is congested? - Answer a timeout occurs
- timeout signals that a packet was lost
- packets are seldom lost due to transmission error
- lost packet implies congestion
23AIMD (cont)
- Algorithm
- increment Cwnd by one packet per RTT (linear or
additive increase) - divide Cwnd by two whenever a timeout occurs
(multiplicative decrease)
Destination
Source
- In practice increment a little for each ACK
- Increment MSS (MSS/cwnd)
- cwnd Increment
- MSS Maximum segment size
24AIMD (cont)
70
60
50
40
KB
30
20
10
1.0
2.0
3.0
4.0
5.0
6.0
7.0
8.0
9.0
10.0
T
ime (seconds)
25TCP Slow Start
- Question? How much is the initial cwnd size?
- Small cwnd implies less network utilization
- Big cwnd means congestion again
- What is optimal of cwnd.
- Solution Start with cwnd 1 and Quickly
increase cwnd until network congested ? get a
rough estimate of the optimal of cwnd - Each time a segment is acknowledged increment
cwnd by one (cwnd) - Increase of cwnd is exponential
26Slow Start Example
destination
source
- TCP slows down the increase of cwnd when cwnd
gtssthresh - Ssthresh - slow start threshold value.
- After ssthresh, TCP change from slow start to
congestion avoidance!
cwnd 2
cwnd 4
cwnd 8
27Slow Start/Congestion Avoidance Example
Client
Server
14
12
10
8
ssthresh
Cwnd (in segments)
6
4
2
0
0
2
4
6
t
t
t
t
Roundtrip times
28Putting Everything TogetherTCP Pseudocode
- Initially
- cwnd 1
- ssthresh infinite
- New ack received
- if (cwnd lt ssthresh)
- / Slow Start/
- cwnd cwnd 1
- else
- / Congestion Avoidance /
- cwnd cwnd 1/cwnd
- Timeout
- / Multiplicative decrease /
- ssthresh win/2
- cwnd 1 / Again slow start /
29The big picture
cwnd
Timeout
Congestion Avoidance
Slow Start
Time
30Slow Start (cont)
- Exponential growth, but slower than all at once
- Used
- when first starting connection
- when connection goes dead waiting for timeout
- Trace
- Problem lose up to half a Cwnds worth of data
31Packet Loss Detection
- Wait for Retransmission Time Out (RTO)
- Whats the problem with this?
- Because RTO is performance killer
- In BSD TCP implementation, RTO is usually more
than 1 second - the granularity of RTT estimate is 500 ms
- retransmission timeout is at least two times of
RTT - Solution Dont wait for RTO to expire
32Fast Retransmit
destination
source
- Resend a segment after 3 duplicate ACKs
- Remember, a duplicate ACK means that an out-of
sequence segment was received - Notes
- Duplicate ACKs due packet reordering!
- If window is small dont get duplicate ACKs!
ACK 2
cwnd 2
segment 2
segment 3
ACK 3
ACK 4
cwnd 4
segment 4
segment 5
segment 6
segment 7
ACK 4
ACK 4
3 duplicate ACKs
- Set cwnd to 1 after each retransmit!!
33Results
- Compare to slow start, it avoids long time out
waiting in 4.0 5.0 period. - How avoid slow start at each fast retransmission?
34Fast Recovery
- After a fast-retransmit set cwnd to ssthresh/2
- i.e., dont reset cwnd to 1, then, avoid slow
start again!. - But when RTO expires still do cwnd 1
- Fast Retransmit and Fast Recovery ? implemented
by TCP Reno most widely used version of TCP
today
35Fast Retransmit and Fast Recovery
cwnd
Congestion Avoidance
Slow Start
Fast retransmit
Time
- Prevent expensive timeouts
- No need to slow start again
- At steady state, cwnd oscillates around the
optimal window size.
36Congestion Control Summary
- Architecture end system detects congestion and
slow down - Starting point
- Slow start/congestion avoidance
- packet drop detected by retransmission timeout
RTO as congestion signal - Fast retransmission/fast recovery
- packet drop detected by (three) duplicate acks
- Router support
- Binary feedback scheme explicit signaling
- Today Explicit Congestion Notification RF99
37Congestion Control vs. Congestion Avoidance
- Congestion control goal
- Stay left of cliff
- Congestion avoidance goal
- Stay left of knee
knee
cliff
Throughput
congestion collapse
Load
38Congestion Avoidance
- TCPs strategy
- control congestion once it happens
- repeatedly increase load in an effort to find the
point at which congestion occurs, and then back
off - Alternative strategy
- predict when congestion is about to happen
- reduce rate before packets start being discarded
- call this congestion avoidance, instead of
congestion control - Two possibilities
- router-centric DECbit and RED Gateways
- host-centric TCP Vegas
39DECbit
- Add binary congestion bit to each packet header
- Router
- monitors average queue length over last busyidle
cycle - set congestion bit if average queue length gt 1
- attempts to balance throughout against delay
Queue length
Current
time
T
ime
Current
Previous
cycle
cycle
A
veraging
interval
40End Hosts
- Destination echoes bit back to source
- Source records how many packets resulted in set
bit - If less than 50 of last windows worth had bit
set - increase Cwnd by 1 packet
- If 50 or more of last windows worth had bit set
- decrease CongestionWindow by 0.875 times
41Random Early Detection (RED)
- Notification is implicit
- just drop the packet (TCP will timeout)
- could make explicit by marking the packet
- Early random drop
- rather than wait for queue to become full, drop
each arriving packet with some drop probability
whenever the queue length exceeds some drop level
42RED Details
- Compute average queue length AvgLen
- AvgLen (1 - Weight) AvgLen
- Weight SampleLen
- 0 lt Weight lt 1 (usually 0.002)
- SampleLen is queue length each time a packet
arrives
MaxThreshold
MinThreshold
Queue
A
vgLen
43RED Details (cont)
- Two queue length thresholds
- if AvgLen lt MinThreshold then
- enqueue the packet
- if MinThreshold lt AvgLen lt MaxThreshold then
- calculate probability P
- drop arriving packet with probability P
- if ManThreshold lt AvgLen then
- drop arriving packet
44RED Details (cont)
- Computing probability P
- TempP MaxP (AvgLen - MinThreshold)/
(MaxThreshold - MinThreshold) - P TempP/(1 - count TempP)
- Drop Probability Curve
Count is the of queued from last drop.
45Tuning RED
- P of a particular flows packet(s) is roughly
proportional to the share of the flows
bandwidth. - MaxP is typically 0.02, meaning when is halfway
between the two thresholds, router drops roughly
one out of 50 packets. - If traffic is bursty, then MinTh. should be
sufficiently large to allow link utilization to
be acceptably high. - Difference between two thresholds should be
larger than the typical increase in the
calculated average queue length in one RTT
setting MaxThreshold to twice MinThreshold is
reasonable. - Penalty Box for Offenders
46Sourced-based Cong. Avoidance
- End hosts adapt traffic before any lost in the
net. - Watch for router queues lengths by checking RTT.
- A collection of related algorithms
- In each 2 RTT check if RTT gt (minRTT maxRTT)/2
-gt decrease cwnd, cwnd - cwnd/8. Increase
otherwise. - Check (currentWind oldWind)x (CurrentRTT-
oldRTT), if the result is positive, decrease
cwnd, cwnd - cwnd/8. Increase 1 otherwise. - Check flattening of sending rate. Increase Cwnd
by 1 in each RTT, compare throughput with
previous one, if it is less than half of one
packet, decrease cwnd by one. - TCP Vegas is like the last one with a difference
to compare to expected rate.
47TCP Vegas
- Idea source watches for some sign that routers
queue is building up and congestion will happen
too e.g., - RTT grows
- sending rate flattens
- Top is cwn size
- Middle is ave. rate in source
- Bottom is ave. queue length in the bottleneck
router - In Shade, Cwnd increase, but ave. rate stay flat.
48TCP Vegas
- Keep track of extra data in the network.
- Extra data is the amount more than available
bandwidth. - Maintain Right amount of extra data.
49Algorithm
- Let BaseRTT be the minimum of all measured RTTs
(commonly the RTT of the first packet) - If not overflowing the connection, then
- ExpectRate Cwnd/BaseRTT
- Source calculates sending rate (ActualRate) once
per RTT - Source compares ActualRate with ExpectRate
- Diff ExpectedRate - ActualRate
- if Diff lt a
- increase Cwnd linearly
- else if Diff gt b
- decrease Cwnd linearly
- else
- leave Cwnd unchanged
50Algorithm (cont)
- Parameters
- a 1 packet
- b 3 packets
- Top Cwn size
- Even faster retransmit
- keep fine-grained timestamps for each packet
- check for timeout on first duplicate ACK
- black line is Actual rate
- Colored expected rate
- Shade area is ? and ?