Title: Lecture 04: Transport Layer
Lecture 04: Transport Layer
- Transport layer protocols in the Internet
- UDP: connectionless transport
- TCP: connection-oriented transport
- TCP congestion control
Provides end-to-end connectivity, but not necessarily good performance
[Figure: protocol-stack diagram; labels: name, link, session, path, address]
Internet transport-layer protocols
- Reliable, in-order delivery (TCP)
  - congestion control
  - flow control
  - connection setup
- Unreliable, unordered delivery (UDP)
  - no-frills extension of "best-effort" IP
- Services not available:
  - delay guarantees
  - bandwidth guarantees
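Two Basic Transport Features
- Demultiplexing: port numbers
- Error detection: checksums
[Figure: a client host sends a service request for 128.2.194.242:80 (i.e., the Web server); the OS on server host 128.2.194.242 uses the port number to deliver it to the Web server (port 80) rather than the Echo server (port 7)]
[Figure: the checksum covers the payload carried inside the IP packet so the receiver can detect corruption]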
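The checksum in the figure can be computed as the Internet checksum: the one's complement of the one's-complement sum of 16-bit words (RFC 1071). Below is a minimal Python sketch. It assumes the input is a plain byte string and leaves out the pseudo-header (IP addresses, protocol, length) that UDP and TCP also cover. The function names are illustrative.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over `data` (RFC 1071 style, no pseudo-header)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # add the next 16-bit big-endian word
        total = (total & 0xFFFF) + (total >> 16)  # fold any carry back in (end-around carry)
    return ~total & 0xFFFF                        # one's complement of the sum


def is_intact(data: bytes, checksum: int) -> bool:
    """Receiver side: recompute the checksum and compare it with the one that arrived."""
    return internet_checksum(data) == checksum


if __name__ == "__main__":
    payload = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])  # example bytes from RFC 1071
    csum = internet_checksum(payload)
    print(hex(csum))                                             # 0x220d
    corrupted = bytes([0x80]) + payload[1:]                      # one bit flipped in transit
    print(is_intact(payload, csum), is_intact(corrupted, csum))  # True False
```

Summing the data together with a correct checksum gives 0xFFFF. As a result, the checksum of the combined bytes comes out as 0 when nothing was corrupted.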
User Datagram Protocol (UDP)
- Datagram messaging service
  - Demultiplexing: port numbers
  - Detecting corruption: checksum
- Lightweight communication between processes
  - Send and receive messages
  - Avoid overhead of ordered, reliable delivery
[Figure: UDP datagram layout: SRC port, DST port, length, checksum (16 bits each), followed by DATA]
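The following sketch builds the 8-byte UDP header with Python's `struct` module and demultiplexes on the destination port. The helper names, the dictionary of per-port receive queues, and the port numbers in the usage lines are hypothetical. The checksum is left at 0, which IPv4 UDP reads as "no checksum", rather than being computed over the pseudo-header.

```python
import struct

# Four unsigned 16-bit fields in network (big-endian) byte order: 8 bytes in total.
UDP_HEADER = struct.Struct("!HHHH")


def build_datagram(src_port: int, dst_port: int, payload: bytes, checksum: int = 0) -> bytes:
    """Prepend a UDP header; checksum 0 means 'not computed' in IPv4 UDP."""
    length = UDP_HEADER.size + len(payload)  # the length field covers header + data
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_datagram(datagram: bytes):
    """Split a datagram into (src_port, dst_port, length, checksum, data)."""
    src, dst, length, checksum = UDP_HEADER.unpack_from(datagram)
    return src, dst, length, checksum, datagram[UDP_HEADER.size:length]


def demultiplex(datagram: bytes, sockets: dict) -> bool:
    """Deliver the payload to the receive queue bound to the destination port."""
    _, dst, _, _, data = parse_datagram(datagram)
    queue = sockets.get(dst)
    if queue is None:
        return False  # no process bound to this port: the datagram is dropped
    queue.append(data)
    return True


if __name__ == "__main__":
    sockets = {53: [], 7: []}  # hypothetical receive queues for DNS and echo
    d = build_datagram(5000, 53, b"query")
    print(len(d), demultiplex(d, sockets), sockets[53])  # 13 True [b'query']
```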
Advantages of UDP
- Fine-grain control
  - UDP sends as soon as the application writes
- No connection set-up delay
  - UDP sends without establishing a connection
- No connection state
  - No buffers, parameters, sequence #s, etc.
- Small header overhead
  - UDP header is only eight bytes long
Popular Applications That Use UDP
- Multimedia streaming
  - Retransmitting packets is not always worthwhile
  - E.g., phone calls, video conferencing, gaming, IPTV
- Simple query-response protocols
  - Overhead of connection establishment is overkill
  - E.g., Domain Name System (DNS), DHCP, etc.
[Figure: DNS exchange: "Address for www.cnn.com?" answered with "12.3.4.15"]
Transmission Control Protocol (TCP)
- Stream-of-bytes service
  - Sends and receives a stream of bytes
- Reliable, in-order delivery
  - Corruption: checksums
  - Detect loss/reordering: sequence numbers
  - Reliable delivery: acknowledgments and retransmissions
- Connection oriented
  - Explicit set-up and tear-down of TCP connection
- Flow control
  - Prevent overflow of the receiver's buffer space
- Congestion control
  - Adapt to network congestion for the greater good
Breaking a Stream of Bytes into TCP Segments
TCP "Stream of Bytes" Service
[Figure: Host A writes Byte 0, Byte 1, Byte 2, Byte 3, ..., Byte 80; Host B reads the same bytes in the same order]
Emulated Using TCP Segments
- Segment sent when:
  - Segment full (Max Segment Size),
  - Not full, but times out, or
  - "Pushed" by application.
[Figure: Host A's Byte 0 ... Byte 80 reach Host B as TCP Data carried in a series of segments]
TCP Segment
- IP packet
  - No bigger than Maximum Transmission Unit (MTU)
  - E.g., up to 1500 bytes on an Ethernet link
- TCP packet
  - IP packet with a TCP header and data inside
  - TCP header is typically 20 bytes long
- TCP segment
  - No more than Maximum Segment Size (MSS) bytes
  - E.g., up to 1460 consecutive bytes from the stream (1500-byte MTU minus a 20-byte IP header and a 20-byte TCP header)
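Sequence Number
- ISN: initial sequence number
- Sequence number = number of the 1st byte in the segment
[Figure: Host A sends TCP Data to Host B starting from its ISN; a segment beginning at Byte 81 is labeled with that byte's sequence number]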
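Segmentation and sequence numbering can be sketched as follows, under some simplifying assumptions. The sketch models only the "segment full (MSS)" rule, with no timeout or push, and `first_seq` is the number given to byte 0 of the stream. In real TCP the SYN consumes the ISN, so the first data byte is ISN + 1. The 1460-byte default MSS matches the Ethernet example above, and the names are illustrative.

```python
from dataclasses import dataclass
from typing import List

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits and wrap around


@dataclass
class Segment:
    seq: int     # sequence number = number of the segment's first byte
    data: bytes


def segment_stream(stream: bytes, first_seq: int, mss: int = 1460) -> List[Segment]:
    """Cut a byte stream into segments of at most `mss` bytes (only the 'segment full' rule)."""
    segments = []
    for offset in range(0, len(stream), mss):
        chunk = stream[offset:offset + mss]
        segments.append(Segment(seq=(first_seq + offset) % SEQ_SPACE, data=chunk))
    return segments


if __name__ == "__main__":
    for seg in segment_stream(bytes(3000), first_seq=1001):
        print(seg.seq, len(seg.data))
```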
Reliable Delivery on a Lossy Channel With Bit Errors
Challenges of Reliable Data Transfer
- Over a perfectly reliable channel
  - Easy: sender sends, and receiver receives
- Over a channel with bit errors
  - Receiver detects errors and requests retransmission
- Over a lossy channel with bit errors
  - Some data are missing, and others corrupted
  - Receiver cannot always detect loss
- Over a channel that may reorder packets
  - Receiver cannot distinguish loss from out-of-order
An Analogy
- Alice and Bob are talking
  - What if Bob couldn't understand Alice?
  - Bob asks Alice to repeat what she said
- What if Bob hasn't heard Alice for a while?
  - Is Alice just being quiet? Has she lost reception?
  - How long should Bob just keep on talking?
  - Maybe Alice should periodically say "uh huh"
  - ... or Bob should ask "Can you hear me now?"
Take-Aways from the Example
- Acknowledgments from receiver
  - Positive: "okay" or "uh huh" or "ACK"
  - Negative: "please repeat that" or "NACK"
- Retransmission by the sender
  - After not receiving an "ACK"
  - After receiving a "NACK"
- Timeout by the sender ("stop and wait")
  - Don't wait forever without some acknowledgment
TCP Support for Reliable Delivery
- Detect bit errors: checksum
  - Used to detect corrupted data at the receiver
  - ... leading the receiver to drop the packet
- Detect missing data: sequence number
  - Used to detect a gap in the stream of bytes
  - ... and for putting the data back in order
- Recover from lost data: retransmission
  - Sender retransmits lost or corrupted data
  - Two main ways to detect lost packets: retransmission timeout and duplicate ACKs
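TCP Acknowledgments
- Data segment: sequence number = 1st byte it carries
- ACK: sequence number = next expected byte
[Figure: Host A sends TCP Data to Host B starting from its ISN (initial sequence number); Host B returns ACKs naming the next byte it expects]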
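A receiver that generates cumulative ACKs (ACK = next expected byte) and buffers out-of-order data could look like the sketch below. It assumes segments never partially overlap and ignores sequence-number wraparound. The class name is illustrative.

```python
from typing import Dict


class CumulativeAckReceiver:
    """Reassembles a byte stream and returns cumulative ACKs (next expected byte)."""

    def __init__(self, first_seq: int):
        self.next_expected = first_seq
        self.pending: Dict[int, bytes] = {}  # out-of-order segments keyed by seq
        self.delivered = bytearray()         # in-order bytes handed to the application

    def on_segment(self, seq: int, data: bytes) -> int:
        if seq >= self.next_expected:
            self.pending[seq] = data         # old duplicates (seq < next_expected) are dropped
        while self.next_expected in self.pending:  # deliver any run the gap was holding back
            chunk = self.pending.pop(self.next_expected)
            self.delivered += chunk
            self.next_expected += len(chunk)
        return self.next_expected            # ACK number = next expected byte
```

Suppose a segment arrives early, for example Seq=100 while byte 92 is still expected. The receiver repeats an ACK for 92. Once Seq=92 arrives, a single ACK of 120 covers both segments.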
Automatic Repeat reQuest (ARQ)
- ACK and timeouts
  - Receiver sends ACK when it receives packet
  - Sender waits for ACK and times out
- Simplest ARQ protocol
  - Stop and wait
  - Send a packet, stop and wait until ACK arrives
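Stop-and-wait can be simulated in a few lines. This sketch uses a 1-bit alternating sequence number so the receiver can discard the duplicates caused by lost ACKs. Losses come from a seeded random generator, and each loss stands in for a sender timeout instead of modeling real time. The function and parameter names are illustrative.

```python
import random
from typing import List, Sequence, Tuple


def stop_and_wait(packets: Sequence, loss_prob: float = 0.3, seed: int = 1,
                  max_tries: int = 50) -> Tuple[List, int]:
    """Alternating-bit stop-and-wait over a channel that may drop data or ACKs.

    Returns (packets delivered to the receiver, total data transmissions).
    A lost packet or lost ACK is modeled as the sender's timer expiring.
    """
    rng = random.Random(seed)
    delivered: List = []
    expected_bit = 0                      # receiver: sequence bit it expects next
    transmissions = 0
    for i, pkt in enumerate(packets):
        bit = i % 2                       # 1-bit sequence number on each packet
        for _ in range(max_tries):
            transmissions += 1
            if rng.random() < loss_prob:
                continue                  # data lost: timeout, send again
            if bit == expected_bit:       # new packet, not a retransmitted duplicate
                delivered.append(pkt)
                expected_bit ^= 1
            if rng.random() < loss_prob:
                continue                  # ACK lost: timeout, receiver will see a duplicate
            break                         # ACK received: stop waiting, send next packet
        else:
            raise RuntimeError("no ACK after max_tries transmissions")
    return delivered, transmissions
```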
Flow Control: TCP Sliding Window
Motivation for Sliding Window
- Stop-and-wait is inefficient
  - Only one TCP segment is "in flight" at a time
  - Especially bad for high delay-bandwidth product
[Figure: the network path drawn as a pipe whose length is the delay and whose width is the bandwidth]
Numerical Example
- 1.5 Mbps link with 45 msec round-trip time (RTT)
  - Delay-bandwidth product is 67.5 Kbits (or about 8 KBytes)
- Sender can send at most one packet per RTT
  - Assuming a segment size of 1 KB (8 Kbits)
  - 8 Kbits/segment at 45 msec/segment ≈ 182 Kbps
  - That's just one-eighth of the 1.5 Mbps link capacity
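The arithmetic above can be checked with a short script. It assumes 1 KB = 1024 bytes (8192 bits). With 8000 bits per segment, the stop-and-wait rate would be about 178 Kbps rather than 182 Kbps.

```python
LINK_BPS = 1.5e6          # 1.5 Mbps link
RTT_S = 0.045             # 45 msec round-trip time
SEGMENT_BITS = 1024 * 8   # 1 KB segment, taking 1 KB = 1024 bytes


def stop_and_wait_throughput(segment_bits: float, rtt_s: float) -> float:
    """At most one segment per RTT."""
    return segment_bits / rtt_s


bdp_bits = LINK_BPS * RTT_S                      # delay-bandwidth product: 67.5 Kbits
throughput = stop_and_wait_throughput(SEGMENT_BITS, RTT_S)
utilization = throughput / LINK_BPS
segments_to_fill_pipe = bdp_bits / SEGMENT_BITS  # segments in flight needed to keep the link busy

if __name__ == "__main__":
    print(f"delay-bandwidth product: {bdp_bits / 1e3:.1f} Kbits ({bdp_bits / 8 / 1024:.1f} KBytes)")
    print(f"stop-and-wait throughput: {throughput / 1e3:.0f} Kbps ({utilization:.1%} of the link)")
    print(f"segments in flight needed to fill the pipe: {segments_to_fill_pipe:.1f}")
```

The last line spells out the step the slide implies: keeping the 1.5 Mbps link busy needs roughly eight 1 KB segments in flight per RTT, which is what a sliding window allows.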
Pipelined protocols
- Pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged packets
  - range of sequence numbers must be increased
  - buffering at sender and/or receiver
- Pipelined protocols: concurrent logical channels, sliding window protocol
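Sliding Window Protocol
- Consider an infinite array, Source, at the sender, and an infinite array, Sink, at the receiver.
[Figure: sender P1's Source array: slots 0 ... a-1 acknowledged, slots a ... s-1 sent but unacknowledged, s the next to send, with the send window starting at a. Receiver P2's Sink array: slots 0 ... r-1 delivered, r next expected, and the receive window spanning r ... r + RW - 1, where some slots may already be received.]
RW = receive window size, SW = send window size (s - a ≤ SW)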
Sliding Windows in Action
- Data unit r has just been received by P2
  - Receive window slides forward
  - P2 sends cumulative ack with sequence number it expects to receive next (r+3)
Sliding Windows in Action
- P1 has just received cumulative ack with r+3 as next expected sequence number
  - Send window slides forward
[Figure: the Source and Sink arrays after the ack, with P1's send window advanced past the newly acknowledged slots]
Sliding Window protocol
- Functions provided
  - error control (reliable delivery)
  - in-order delivery
  - flow and congestion control (by varying send window size)
- TCP (without the SACK option) uses only cumulative acks
- Other kinds of acks
  - selective nack
  - selective ack (TCP SACK)
  - bit-vector representing entire state of receive window (in addition to first sequence number of window)
Sliding Window Protocol
At the sender, a will be pointed to by SendBase, and s by NextSeqNum.
[Figure: the same Source and Sink arrays, send window, and receive window as in the earlier Sliding Window Protocol figure]
RW = receive window size, SW = send window size (s - a ≤ SW)
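Using the slide's SendBase and NextSeqNum, the sender side of the window can be sketched as below. Windows are counted in data units (array slots), as in the figure, rather than in bytes. The class and method names are illustrative.

```python
class SlidingWindowSender:
    """Sender side of the sliding window: send_base is a, next_seq_num is s."""

    def __init__(self, window_size: int):
        self.window_size = window_size   # SW, in data units
        self.send_base = 0               # a: oldest unacknowledged unit (SendBase)
        self.next_seq_num = 0            # s: next unit to send (NextSeqNum)

    def can_send(self) -> bool:
        # sending one more unit must keep s - a <= SW
        return self.next_seq_num - self.send_base < self.window_size

    def send(self) -> int:
        if not self.can_send():
            raise RuntimeError("send window is full")
        seq = self.next_seq_num
        self.next_seq_num += 1
        return seq

    def on_cumulative_ack(self, next_expected: int) -> None:
        # a cumulative ack says every unit below next_expected has arrived
        if self.send_base < next_expected <= self.next_seq_num:
            self.send_base = next_expected   # the send window slides forward
```

Because acks are cumulative, an ack that names r+3 as next expected also covers any earlier acks that were lost. Stale acks fall outside the check and are ignored.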
TCP Flow Control
- receiver explicitly informs sender of (dynamically changing) amount of free buffer space
  - RcvWindow field in TCP segment
- sender keeps amount of transmitted, unACKed data less than most recently received RcvWindow value
Flow control: sender won't overrun receiver's buffers by transmitting too much, too fast
[Figure: buffer at receive side of a TCP connection]
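The free-space calculation and the sender's check can be sketched as follows. The bookkeeping names (RcvBuffer, LastByteRcvd, LastByteRead, LastByteSent, LastByteAcked) follow a common textbook convention and are assumptions here, not TCP header fields. Only RcvWindow travels in the segment.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Free buffer space the receiver advertises in the RcvWindow field."""
    buffered = last_byte_rcvd - last_byte_read   # received but not yet read by the application
    return rcv_buffer - buffered


def sendable_bytes(last_byte_sent: int, last_byte_acked: int, advertised_window: int) -> int:
    """How many more bytes the sender may transmit without overrunning the receiver."""
    in_flight = last_byte_sent - last_byte_acked  # transmitted but unACKed
    return max(0, advertised_window - in_flight)
```

For example, a 4096-byte buffer holding 2000 unread bytes advertises 2096 bytes. A sender with 1000 bytes already in flight may then add 1096 more.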
Optimizing Retransmissions
Reasons for Retransmission
[Figure: three sender/receiver timelines, each ending in a timeout: packet lost; ACK lost (DUPLICATE PACKET); early timeout (DUPLICATE PACKETS)]
How Long Should Sender Wait?
- Sender sets a timeout to wait for an ACK
  - Too short: wasted retransmissions
  - Too long: excessive delays when packet lost
- TCP sets timeout as a function of the RTT
  - Expect ACK to arrive after a round-trip time
  - ... plus a fudge factor to account for queuing
- But, how does the sender know the RTT?
  - Running average of delay to receive an ACK
TCP Round Trip Time and Timeout
- Q: how to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
- SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT
TCP Round Trip Time and Timeout
$$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$$
- Exponential weighted moving average
- influence of past sample decreases exponentially fast
- typical value: $\alpha = 0.125$
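The sketch below implements the EWMA update. It extends the update with the deviation-based timeout of RFC 6298 (TimeoutInterval = EstimatedRTT + 4 · DevRTT) as one concrete choice of "fudge factor". The β = 0.25 weight and the first-sample initialization come from RFC 6298, not from the slides. The RFC's clock-granularity term and 1-second minimum timeout are left out. Following the earlier slide, callers should pass only samples from segments that were not retransmitted.

```python
from typing import Tuple

ALPHA = 0.125   # weight of the newest SampleRTT (the slide's typical value)
BETA = 0.25     # weight for the deviation estimate (RFC 6298 value, not on the slides)


def first_rtt_sample(sample_rtt: float) -> Tuple[float, float, float]:
    """Initialize from the first measurement, as RFC 6298 does."""
    estimated_rtt, dev_rtt = sample_rtt, sample_rtt / 2
    return estimated_rtt, dev_rtt, estimated_rtt + 4 * dev_rtt


def update_rtt(estimated_rtt: float, dev_rtt: float,
               sample_rtt: float) -> Tuple[float, float, float]:
    """One EWMA step; returns (EstimatedRTT, DevRTT, TimeoutInterval)."""
    # RFC 6298 updates the deviation first, using the previous estimate.
    dev_rtt = (1 - BETA) * dev_rtt + BETA * abs(sample_rtt - estimated_rtt)
    estimated_rtt = (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt
    timeout = estimated_rtt + 4 * dev_rtt   # "fudge factor" = 4 x deviation
    return estimated_rtt, dev_rtt, timeout


if __name__ == "__main__":
    est, dev, rto = first_rtt_sample(100.0)          # milliseconds
    for sample in [120.0, 90.0, 300.0, 110.0]:
        est, dev, rto = update_rtt(est, dev, sample)
        print(f"sample={sample:6.1f}  EstimatedRTT={est:6.1f}  timeout={rto:6.1f}")
```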
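Example RTT estimation
TCP retransmission scenarios
[Figure: timelines between Host A and Host B, including a premature timeout scenario: Seq=92 (8 bytes data) and Seq=100 (20 bytes data) are sent; ACK=100 and ACK=120 come back, but the Seq=92 timer expires first and Seq=92 is resent; SendBase moves from 100 to 120, and Host B answers the resent segment with ACK=120]
TCP retransmission scenarios (more)
[Figure: a further retransmission scenario ending with SendBase = 120]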
Fast Retransmit
- Time-out period often relatively long
  - long delay before resending lost packet
- Detect lost segments via duplicate ACKs
  - Sender often sends many segments back-to-back
  - If segment is lost, there will likely be many duplicate ACKs
- If sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost
  - fast retransmit: resend segment before timer expires
[Figure 3.37: Resending a segment after triple duplicate ACK]
Fast retransmit algorithm
event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there remains a not-yet-acknowledged segment)
            start timer
    }
    else {  /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            /* fast retransmit */
            resend segment with sequence number y
            reset timer for y
        }
    }
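Below is a runnable Python version of the fast retransmit algorithm above. Timers are reduced to a boolean flag, and retransmissions are recorded in a list instead of being sent. The class and attribute names are illustrative.

```python
from typing import Dict, List


class FastRetransmitSender:
    """ACK handling from the fast retransmit algorithm; timers are reduced to a flag."""

    DUP_ACK_THRESHOLD = 3

    def __init__(self, send_base: int, next_seq_num: int):
        self.send_base = send_base            # oldest unacknowledged byte
        self.next_seq_num = next_seq_num      # next byte to send
        self.dup_acks: Dict[int, int] = {}    # duplicate-ACK count per ACK value y
        self.timer_running = next_seq_num > send_base
        self.retransmitted: List[int] = []    # seqs resent by fast retransmit

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y                # new data acknowledged
            # keep a timer running only while unacknowledged data remains
            self.timer_running = self.next_seq_num > self.send_base
        else:
            # duplicate ACK for already-ACKed data
            self.dup_acks[y] = self.dup_acks.get(y, 0) + 1
            if self.dup_acks[y] == self.DUP_ACK_THRESHOLD:
                self.retransmitted.append(y)  # fast retransmit: resend segment starting at y
                self.timer_running = True     # and reset the timer for it
```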
Effectiveness of Fast Retransmit
- When does Fast Retransmit work best?
  - High likelihood of many packets in flight
  - Long data transfers, large window size, ...
- Implications for Web traffic
  - Most Web transfers are short (e.g., 10 packets)
  - So, often there aren't many packets in flight
  - Making fast retransmit less likely to "kick in"
  - Forcing users to click "reload" more often?
Starting and Ending a Connection: TCP Handshakes
Establishing a TCP Connection
[Figure: A sends SYN to B, B replies with SYN ACK, A sends ACK, then Data flows in both directions]
Each host tells its ISN to the other host.
- Three-way handshake to establish connection
  - Host A sends a SYN (open) to the host B
  - Host B returns a SYN acknowledgment (SYN ACK)
  - Host A sends an ACK to acknowledge the SYN ACK
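The sequence and acknowledgment numbers in the three-way handshake can be sketched as follows. The sketch assumes only what the slide states plus one standard rule: a SYN occupies one sequence number, so each side acknowledges the other's ISN + 1. The `Packet` type is a hypothetical, simplified stand-in for a TCP header.

```python
from dataclasses import dataclass
from typing import List, Optional

SEQ_SPACE = 2 ** 32  # sequence numbers wrap at 32 bits


@dataclass
class Packet:
    src: str
    syn: bool
    ack: bool
    seq: int
    ack_num: Optional[int] = None


def three_way_handshake(isn_a: int, isn_b: int) -> List[Packet]:
    """The three segments that open a connection from host A to host B."""
    syn = Packet("A", syn=True, ack=False, seq=isn_a)          # A tells B its ISN
    syn_ack = Packet("B", syn=True, ack=True, seq=isn_b,       # B tells A its ISN...
                     ack_num=(isn_a + 1) % SEQ_SPACE)          # ...and ACKs A's SYN
    ack = Packet("A", syn=False, ack=True, seq=(isn_a + 1) % SEQ_SPACE,
                 ack_num=(isn_b + 1) % SEQ_SPACE)              # A ACKs B's SYN
    return [syn, syn_ack, ack]


if __name__ == "__main__":
    import random
    for pkt in three_way_handshake(random.getrandbits(32), random.getrandbits(32)):
        print(pkt)
```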
What if the SYN Packet Gets Lost?
- Suppose the SYN packet gets lost
  - Packet is lost inside the network, or
  - Server rejects the packet (e.g., listen queue is full)
- Eventually, no SYN-ACK arrives
  - Sender sets a timer and waits for the SYN-ACK
  - ... and retransmits the SYN if needed
- How should the TCP sender set the timer?
  - Sender has no idea how far away the receiver is
  - Some TCPs use a default of 3 or 6 seconds
SYN Loss and Web Downloads
- User clicks on a hypertext link
  - Browser creates a socket and does a "connect"
  - The "connect" triggers the OS to transmit a SYN
- If the SYN is lost ...
  - The 3-6 seconds of delay is very long
  - The impatient user may click "reload"
- User triggers an "abort" of the "connect"
  - Browser "connects" on a new socket
  - Essentially, forces a fast send of a new SYN!
Principles of Congestion Control
- Congestion:
  - informally: "too many sources sending too much data too fast for network to handle"
  - different from flow control!
  - manifestations:
    - lost packets (buffer overflow at routers)
    - long delays (queueing in router buffers)
  - a top-10 problem!
Receiver Window vs. Congestion Window
- Flow control
  - Keep a fast sender from overwhelming a slow receiver
- Congestion control
  - Keep a set of senders from overloading the network
- Different concepts, but similar mechanisms
  - TCP flow control: receiver window
  - TCP congestion control: congestion window
  - Sender TCP window = min {congestion window, receiver window}
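The "min" rule above fits in a single function. The function name is illustrative, and all quantities are in bytes.

```python
def usable_window(cong_win: int, rcv_window: int, bytes_in_flight: int) -> int:
    """Bytes the sender may still transmit right now."""
    window = min(cong_win, rcv_window)       # the tighter of the two limits wins
    return max(0, window - bytes_in_flight)  # minus data already sent but not ACKed
```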
How it Looks to the End Host
- Delay: Packet experiences high delay
- Loss: Packet gets dropped along path
- How does TCP sender learn this?
  - Delay: Round-trip time estimate
  - Loss: Timeout and/or duplicate acknowledgments
Congestion Collapse
- Easily leads to congestion collapse
  - Senders retransmit the lost packets
  - Leading to even greater load
  - ... and even more packet loss
Congestion collapse: an increase in load that results in a decrease in useful work done.
[Figure: goodput vs. load; past a point, goodput falls as load keeps increasing (congestion collapse)]
Approaches towards congestion control
- Network-assisted congestion control
  - routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit sending rate for sender
- End-to-end congestion control
  - no explicit feedback from network
  - congestion inferred from end-system's observed loss and/or delay
  - approach taken by TCP
TCP Congestion control
- end-to-end control (no network assistance)
- Tradeoff
  - Pro: avoids needing explicit network feedback
  - Con: continually under- and over-shoots "right" rate
TCP Congestion control
- Each TCP sender maintains a congestion window
  - Max number of bytes to have in transit (not yet ACK'd)
- Adapting the congestion window
  - Decrease upon losing a packet: backing off
  - Increase upon success: optimistically exploring
  - Always struggling to find right transfer rate
TCP Congestion Control
- How does sender determine CongWin?
  - loss event = timeout or 3 duplicate acks
  - TCP sender reduces CongWin after loss event
- three mechanisms:
  - slow start
  - AIMD
  - reduce to 1 segment after timeout event
TCP Slow Start
- Probing for usable bandwidth
- When connection begins, CongWin = 1 MSS
  - Example: MSS = 500 bytes and RTT = 200 msec
  - initial rate = 20 kbps
- available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to a higher rate
TCP Slow Start (more)
- When connection begins, increase rate exponentially until first loss event or "threshold"
  - double CongWin every RTT
  - done by incrementing CongWin by 1 MSS for every ACK received
- Summary: initial rate is slow but ramps up exponentially fast
Congestion avoidance state: responses to loss events
- Q: If no loss, when should the exponential increase switch to linear?
- A: When CongWin gets to current value of threshold
- Implementation:
  - For initial slow start, threshold is set to a very large value (e.g., 65 Kbytes)
  - At loss event, threshold is set to 1/2 of CongWin just before loss event
Rationale for Reno's Fast Recovery
- 3 dup ACKs indicate the network is capable of delivering some segments
- timeout occurring before 3 dup ACKs is "more alarming"
- After 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
- But after timeout event:
  - CongWin is set to 1 MSS instead
  - window then grows exponentially to a threshold, then grows linearly
Summary: TCP Congestion Control
- When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
- When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
- When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
- When timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
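The four rules above can be traced per RTT with a small simulation. This is a simplified model, not a full Reno implementation. CongWin and Threshold are counted in whole MSS, slow start doubles once per RTT and is capped at Threshold, and the caller supplies the loss pattern. Real stacks (RFC 5681) differ in details. The function name and event labels are illustrative.

```python
from typing import List, Optional, Sequence


def reno_trace(events: Sequence[Optional[str]], init_threshold: int = 64) -> List[int]:
    """CongWin (in MSS) at the start of each RTT under a simplified Reno model.

    events[i] describes RTT i: None (no loss), "3dup" (triple duplicate ACK)
    or "timeout".
    """
    cong_win, threshold = 1, init_threshold      # connection begins with CongWin = 1 MSS
    trace = []
    for event in events:
        trace.append(cong_win)
        if event == "timeout":
            threshold = max(cong_win // 2, 1)
            cong_win = 1                         # restart slow start from 1 MSS
        elif event == "3dup":
            threshold = max(cong_win // 2, 1)
            cong_win = threshold                 # halve, then grow linearly
        elif cong_win < threshold:
            cong_win = min(cong_win * 2, threshold)  # slow start: double per RTT
        else:
            cong_win += 1                        # congestion avoidance: +1 MSS per RTT
    return trace
```

Consider an initial Threshold of 8 MSS, followed by six loss-free RTTs, a triple duplicate ACK, three more RTTs, a timeout, and four more RTTs. The trace is 1, 2, 4, 8, 9, 10, 11, 5, 6, 7, 8, 1, 2, 4, 5.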
AIMD in steady state
- additive increase: increase CongWin by 1 MSS every RTT in the absence of any loss event (probing)
- multiplicative decrease: cut CongWin in half after loss event (3 dup acks)
[Figure: sawtooth of the congestion window over time for a long-lived TCP connection]
Why is TCP fair?
[Figure: Connection 1 window size (x-axis, up to R) vs. Connection 2 window size (y-axis, up to R); repeated cycles of congestion avoidance (additive increase) and loss (decrease window by factor of 2) move the pair toward the equal window size line]
TCP Fairness
- Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K (AIMD only provides convergence to same window size, not necessarily same throughput rate)
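The convergence argument in the figure can be reproduced numerically. The sketch assumes two flows with equal RTTs that see loss at the same moment whenever their combined window exceeds the capacity R, measured in MSS per RTT. With unequal RTTs, equal windows would not mean equal throughput, as noted above. The function name is illustrative.

```python
from typing import List, Tuple


def aimd_two_flows(w1: float, w2: float, capacity: float,
                   rounds: int) -> List[Tuple[float, float]]:
    """Window sizes (in MSS) of two AIMD flows sharing one bottleneck, per RTT."""
    history = [(w1, w2)]
    for _ in range(rounds):
        if w1 + w2 > capacity:
            w1, w2 = w1 / 2, w2 / 2   # both detect loss: multiplicative decrease
        else:
            w1, w2 = w1 + 1, w2 + 1   # no loss: additive increase of 1 MSS per RTT
        history.append((w1, w2))
    return history


if __name__ == "__main__":
    h = aimd_two_flows(40.0, 2.0, capacity=50.0, rounds=200)
    for rtt in (0, 10, 50, 200):
        print(rtt, [round(w, 2) for w in h[rtt]])
```

Additive increase leaves the difference between the two windows unchanged, while each halving cuts it in half. As a result, the windows approach each other.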
Fairness (more)
- Fairness and UDP
  - Multimedia apps often do not use TCP
    - do not want rate throttled by congestion control
  - Instead use UDP:
    - pump audio/video at constant rate, tolerate packet loss
- TCP-friendly congestion control for apps that prefer UDP, e.g., Datagram Congestion Control Protocol (DCCP)
End of Lecture 04