Title: Transport Layer
1 Transport Layer
- Michalis Faloutsos
- Many slides from Kurose-Ross
2 Transport Layer Functionality
- Hide network from application layer
- Transport layer resides at end points
- Sees the network as a black box
3 Transport Layers of the Internet
- TCP: reliable protocol
- Guarantees end-to-end delivery
- Self-regulates its sending rate: congestion control and flow control
- Connection-oriented: handshake, state
- Ordered delivery of packets to application
- UDP: unreliable protocol
- Non-regulated sending rate
- Multiplexing-demultiplexing
4 TCP overview
5 TCP: What and How (for more, see RFCs 793, 1122, 1323, 2018, 2581)
- point-to-point
- one sender, one receiver
- reliable, in-order byte stream
- no message boundaries
- pipelined
- TCP congestion and flow control set window size
- send and receive buffers
- full duplex data
- bi-directional data flow in same connection
- MSS: maximum segment size
- connection-oriented
- handshaking (exchange of control msgs) initializes sender and receiver state before data exchange
- flow controlled
- sender will not overwhelm receiver
6 TCP segment structure
- Sequence number and acknowledgement number: counting by bytes of data (not segments!)
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection establishment (setup, teardown commands)
- Receive window: # bytes rcvr willing to accept
- Checksum: Internet checksum (as in UDP)
7 TCP overview
- TCP is a sliding window protocol
- Sender can have up to (Window) bytes in flight
- Operates with cumulative ACKs
- It includes control for the sending rate
- Flow control: receiver-set sending rate
- Congestion control: network-aware sending rate (Congwin)
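The two controls combine into a single limit on unacknowledged data. A minimal sketch, assuming the usual rule that the sender may keep at most min(Congwin, RcvWindow) bytes in flight; the function and parameter names are illustrative and do not come from the slides.

```python
def usable_window(congwin, rcv_window, bytes_in_flight):
    """Return how many more bytes the sender may put in flight.

    congwin         -- congestion window in bytes (congestion control)
    rcv_window      -- window advertised by the receiver in bytes (flow control)
    bytes_in_flight -- bytes sent but not yet cumulatively ACKed
    """
    effective = min(congwin, rcv_window)   # the tighter of the two limits wins
    return max(0, effective - bytes_in_flight)


# Receiver-limited: the receiver advertises less than the network allows.
print(usable_window(congwin=8000, rcv_window=4000, bytes_in_flight=1000))
# Network-limited: the congestion window is already full.
print(usable_window(congwin=2000, rcv_window=4000, bytes_in_flight=2000))
```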
8 TCP seq. #s and ACKs
- Seq. #s:
- byte stream number of first byte in segment's data
- ACKs:
- seq # of next byte expected from other side
- cumulative ACK
- Q: how does the receiver handle out-of-order segments?
- A: the TCP spec doesn't say; it is up to the implementor
Simple telnet scenario (Host A, Host B):
- User at Host A types 'C'
- A to B: Seq=42, ACK=79, data = 'C'
- Host B ACKs receipt of 'C', echoes back 'C'
- B to A: Seq=79, ACK=43, data = 'C'
- Host A ACKs receipt of echoed 'C'
- A to B: Seq=43, ACK=80
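The telnet exchange can be replayed with a few lines of bookkeeping. This is a sketch only: the Segment and Endpoint classes are hypothetical helpers, and the receiver accepts in-order data only.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int           # byte-stream number of the first data byte
    ack: int           # next byte expected from the other side
    data: bytes = b""


class Endpoint:
    def __init__(self, next_seq, next_expected):
        self.next_seq = next_seq              # next byte this side will send
        self.next_expected = next_expected    # next byte expected from the peer

    def send(self, data=b""):
        seg = Segment(self.next_seq, self.next_expected, data)
        self.next_seq += len(data)            # sequence numbers count bytes
        return seg

    def receive(self, seg):
        if seg.seq == self.next_expected:     # in-order data only, for simplicity
            self.next_expected += len(seg.data)


host_a = Endpoint(next_seq=42, next_expected=79)
host_b = Endpoint(next_seq=79, next_expected=42)

s1 = host_a.send(b"C")    # user types 'C'
host_b.receive(s1)
s2 = host_b.send(b"C")    # B ACKs 'C' and echoes it back
host_a.receive(s2)
s3 = host_a.send()        # A ACKs the echo, no data
for s in (s1, s2, s3):
    print(s)
```

The three printed segments carry the same Seq and ACK values as the scenario: (42, 79), (79, 43), (43, 80).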
9 TCP in a nutshell
- I. Slow start phase (actually this is a fast increase)
- Start with a window of 1 (or 2) segments
- On each successful ACK, increase the window by 1 max-size segment
- Do this up to a threshold, ssthresh
- II. Congestion control phase
- Increase window by 1 max-size segment every RTT
- Drop window in half if there is congestion
- Packet loss: duplicate ACKs
- Timer expiration
10 TCP Congestion Control
- end-end control (no network assistance)
- transmission rate limited by congestion window size, Congwin, over segments
- Congwin: w segments, each with MSS bytes, sent in one RTT
11 TCP congestion control: Intuition
- TCP is probing for usable bandwidth
- ideally: transmit as fast as possible (Congwin as large as possible) without loss
- increase Congwin until loss (congestion)
- loss: decrease Congwin, then begin probing (increasing) again
12 TCP congestion control
- TCP has two phases
- slow start
- start from small, increase quickly
- congestion avoidance
- Additive Increase Multiplicative Decrease
- important variables
- Congwin
- threshold: defines the boundary between the slow start phase and the congestion control phase
13 TCP Slowstart
Slowstart algorithm:
initialize: Congwin = 1
for (each segment ACKed)
    Congwin++
until (loss event OR Congwin > threshold)
Figure: Host A sends one segment to Host B in the first RTT, then two segments, then four segments.
- exponential increase (per RTT) in window size
- loss event: timeout (Tahoe TCP) and/or three duplicate ACKs (Reno TCP)
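To see why one increment per ACK gives exponential growth per RTT, the loop can be run without any network. A minimal sketch, assuming no loss and that every segment sent in an RTT is ACKed within that RTT; the function name is illustrative.

```python
def slow_start(threshold, initial_window=1):
    """Return Congwin (in segments) at the start of each RTT until it exceeds threshold."""
    congwin = initial_window
    per_rtt = [congwin]
    while congwin <= threshold:
        # range() is evaluated once, so this loops over the ACKs for the
        # segments that were in the window at the start of the RTT.
        for _ in range(congwin):
            congwin += 1                 # one extra segment per ACK
            if congwin > threshold:      # "until Congwin > threshold"
                break
        per_rtt.append(congwin)
    return per_rtt


print(slow_start(threshold=8))
```

With a threshold of 8 the window doubles each RTT (1, 2, 4, 8) and the loop stops as soon as the first ACK pushes Congwin past the threshold.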
14 Why Call it Slow Start?
- The original version of TCP suggested that the sender transmit as much as the Advertised Window permitted.
- Routers may not be able to cope with this burst of transmissions.
- Slow start is slower than the above version: it ensures that a transmission burst does not happen at once.
15 TCP Congestion Avoidance
Congestion avoidance:
/* slowstart is over */
/* Congwin > threshold */
Until (loss event) {
    every w segments ACKed:
        Congwin++
}
threshold = Congwin/2
Congwin = 1
perform slowstart (1)
(1) TCP Reno skips slowstart (fast recovery) after three duplicate ACKs
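The interplay of the two phases is easiest to see as a window trace. This is a sketch under simplifying assumptions: Tahoe-style behaviour (every loss restarts slowstart), per-RTT granularity, and a switch to additive increase once Congwin reaches the threshold. The function name and the loss schedule are hypothetical.

```python
def tahoe_trace(rtts, threshold, loss_at):
    """Congwin (in segments) per RTT; loss_at holds the RTT indices with a loss event."""
    congwin = 1
    trace = []
    for rtt in range(rtts):
        trace.append(congwin)
        if rtt in loss_at:
            threshold = max(congwin // 2, 1)        # threshold = Congwin/2
            congwin = 1                             # perform slowstart again
        elif congwin < threshold:
            congwin = min(congwin * 2, threshold)   # slowstart: double per RTT
        else:
            congwin += 1                            # congestion avoidance: +1 per RTT
    return trace


print(tahoe_trace(rtts=12, threshold=8, loss_at={7}))
```

The trace climbs 1, 2, 4, 8, then grows by one per RTT up to 12, drops to 1 at the loss, and slow-starts again towards the new threshold of 6.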
16 TCP Congestion: Real Life is Hairy!
Congestion avoidance
- Remember: bytes vs packets!
- CW += MSS * MSS/CW
- Thresh = Max(2 * MSS, InFlightData/2)
- MSS: max segment size
- InFlightData: un-ACK-ed data
- RFC 2581: TCP Congestion Control
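The byte-based updates can be checked numerically. A minimal sketch, assuming an MSS of 1460 bytes (a common value, not stated on the slide); the function names are illustrative.

```python
MSS = 1460  # bytes; assumed value for illustration


def on_ack_congestion_avoidance(cw, mss=MSS):
    """CW += MSS * MSS / CW, applied once per ACK."""
    return cw + mss * mss / cw


def threshold_after_loss(in_flight, mss=MSS):
    """Thresh = max(2 * MSS, InFlightData / 2)."""
    return max(2 * mss, in_flight // 2)


cw = 10 * MSS
for _ in range(10):          # one window's worth of ACKs, roughly one RTT
    cw = on_ack_congestion_avoidance(cw)
print(cw / MSS)              # slightly under 11: about one MSS gained per RTT
print(threshold_after_loss(in_flight=20 * MSS) // MSS)
```

Because each ACK adds MSS * MSS/CW bytes, a full window of ACKs adds roughly one MSS, which matches the "increase by 1 segment every RTT" rule expressed in packets.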
17 TCP Fairness and AIMD
- TCP congestion avoidance
- AIMD: additive increase, multiplicative decrease
- increase window by 1 per RTT
- decrease window by factor of 2 on loss event
- Fairness goal: if N TCP sessions share the same bottleneck link, each should get 1/N of link capacity
Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.
18 Why is TCP fair?
- Two competing sessions
- Additive increase gives slope of 1, as throughput increases
- Multiplicative decrease decreases throughput proportionally
Figure: Connection 2 throughput plotted against Connection 1 throughput, both axes running up to R, with the equal bandwidth share line. The trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by factor of 2).
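The convergence argument can be reproduced numerically: additive increase leaves the gap between the two sessions unchanged, while each halving cuts the gap in half. A sketch under idealized assumptions (synchronized rounds, both sessions see every loss); the function name and starting rates are hypothetical.

```python
def aimd(x1, x2, capacity, rounds):
    """Two sessions on one link: +1 each per round, both halve on overload."""
    for _ in range(rounds):
        if x1 + x2 > capacity:        # loss: both sessions detect congestion
            x1, x2 = x1 / 2, x2 / 2   # multiplicative decrease
        else:
            x1, x2 = x1 + 1, x2 + 1   # additive increase
    return x1, x2


x1, x2 = aimd(x1=80.0, x2=10.0, capacity=100.0, rounds=500)
print(abs(x1 - x2))   # the initial gap of 70 has almost vanished
```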
19 Macroscopic Description of Throughput
- Assume the window toggles between W/2 and W
- High rate: W * MSS / RTT
- Low rate: W * MSS / (2 * RTT)
- Rate increases linearly between the two extremes
- Average throughput:
- 0.75 * W * MSS / RTT
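The factor 0.75 is the midpoint of the two extremes: a rate that rises linearly from the low rate to the high rate averages to their mean. Written out, under the same assumption that the window moves linearly between W/2 and W:

```latex
\[
\text{average throughput}
  = \frac{1}{2}\left(\frac{W \cdot MSS}{2\,RTT} + \frac{W \cdot MSS}{RTT}\right)
  = \frac{3}{4}\cdot\frac{W \cdot MSS}{RTT}
  = 0.75\,\frac{W \cdot MSS}{RTT}
\]
```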
20 TCP reliable data transfer
Simplified sender, assuming
- one way data transfer
- no flow, congestion control
Sender state machine: wait for event, handle it, then wait for the next event
- event: data received from application above: create, send segment
- event: timer timeout for segment with seq # y: retransmit segment
- event: ACK received, with ACK # y: ACK processing
21 TCP sender
Simplified TCP sender:
sendbase = initial_sequence number
nextseqnum = initial_sequence number

loop (forever) {
  switch(event)
  event: data received from application above
      create TCP segment with sequence number nextseqnum
      start timer for segment nextseqnum
      pass segment to IP
      nextseqnum = nextseqnum + length(data)
  event: timer timeout for segment with sequence number y
      retransmit segment with sequence number y
      compute new timeout interval for segment y
      restart timer for sequence number y
  event: ACK received, with ACK field value of y
      if (y > sendbase) { /* cumulative ACK of all data up to y */
          cancel all timers for segments with sequence numbers < y
          sendbase = y
      }
      else { /* a duplicate ACK for already ACKed segment */
          increment number of duplicate ACKs received for y
          if (number of duplicate ACKs received for y == 3) {
              /* TCP fast retransmit */
              resend segment with sequence number y
              restart timer for segment y
          }
      }
} /* end of loop forever */
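The same three event handlers can be exercised in a few lines of Python. This is a sketch, not a real TCP: timers are reduced to "the segment is still in the unacked table", the wire is a plain list, and all class and attribute names are hypothetical.

```python
class SimplifiedSender:
    """Event handlers of the simplified sender (no flow or congestion control)."""

    def __init__(self, initial_seq):
        self.sendbase = initial_seq
        self.nextseqnum = initial_seq
        self.unacked = {}     # seq -> data; presence means "timer running"
        self.dup_acks = {}    # ACK value -> number of duplicates seen
        self.wire = []        # (seq, data) pairs handed to IP, for inspection

    def on_data(self, data):
        seq = self.nextseqnum
        self.unacked[seq] = data              # start timer for the segment
        self.wire.append((seq, data))         # pass segment to IP
        self.nextseqnum += len(data)

    def on_timeout(self, y):
        if y in self.unacked:
            self.wire.append((y, self.unacked[y]))   # retransmit, timer restarts

    def on_ack(self, y):
        if y > self.sendbase:                 # cumulative ACK of all data up to y
            for seq in [s for s in self.unacked if s < y]:
                del self.unacked[seq]         # cancel timers for ACKed segments
            self.sendbase = y
        else:                                 # duplicate ACK
            self.dup_acks[y] = self.dup_acks.get(y, 0) + 1
            if self.dup_acks[y] == 3 and y in self.unacked:
                self.wire.append((y, self.unacked[y]))   # fast retransmit


sender = SimplifiedSender(initial_seq=92)
sender.on_data(b"A" * 8)     # Seq=92, 8 bytes
sender.on_data(b"B" * 20)    # Seq=100, 20 bytes
sender.on_ack(100)           # first segment ACKed
for _ in range(3):
    sender.on_ack(100)       # three duplicate ACKs for byte 100
print(sender.wire[-1][0])    # sequence number of the fast-retransmitted segment
```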
22 TCP Receiver ACK generation [RFC 1122, RFC 2581]
- Event: in-order segment arrival, no gaps, everything else already ACKed. TCP Receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK.
- Event: in-order segment arrival, no gaps, one delayed ACK pending. TCP Receiver action: immediately send single cumulative ACK.
- Event: out-of-order segment arrival, higher-than-expected seq. #, gap detected. TCP Receiver action: send duplicate ACK, indicating seq. # of next expected byte.
- Event: arrival of segment that partially or completely fills gap. TCP Receiver action: immediate ACK if segment starts at lower end of gap.
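The four rows translate into a small decision function. A sketch under stated assumptions: the state dictionary and its keys are hypothetical, the 500 ms timer itself is not modelled, and the branch for already-received data is an addition the table does not cover.

```python
def receiver_action(seq, length, state):
    """Pick the ACK action for an arriving segment and update state.

    state keys (illustrative names):
      expected    -- next in-order byte expected
      ack_pending -- True if a delayed ACK is waiting to be sent
      buffered    -- {seq: length} of out-of-order segments held back
    """
    if seq > state["expected"]:                    # gap detected
        state["buffered"][seq] = length
        return "send duplicate ACK for byte %d" % state["expected"]
    if seq < state["expected"]:                    # old data (assumption: re-ACK)
        return "send duplicate ACK for byte %d" % state["expected"]
    had_gap = bool(state["buffered"])
    state["expected"] += length
    while state["expected"] in state["buffered"]:  # absorb buffered segments
        state["expected"] += state["buffered"].pop(state["expected"])
    if had_gap:                                    # segment started at lower end of gap
        state["ack_pending"] = False
        return "send immediate ACK for byte %d" % state["expected"]
    if state["ack_pending"]:                       # second in-order segment
        state["ack_pending"] = False
        return "send single cumulative ACK for byte %d" % state["expected"]
    state["ack_pending"] = True                    # first in-order segment
    return "delay ACK (up to 500 ms)"


state = {"expected": 92, "ack_pending": False, "buffered": {}}
for seq, length in [(92, 8), (100, 20), (140, 10), (120, 20)]:
    print(seq, receiver_action(seq, length, state))
```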
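23 TCP retransmission scenarios
Scenario: premature timeout, cumulative ACKs (Host A sends to Host B)
- Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
- Host B replies with ACK=100 and ACK=120
- The Seq=92 timeout expires before ACK=100 arrives, so Host A retransmits Seq=92, 8 bytes data
- Host B answers the retransmission with the cumulative ACK=120
- The figure also marks the Seq=100 timeout interval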
24 TCP Round Trip Time and Timeout
- Q: how to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
- ignore retransmissions, cumulatively ACKed segments
- SampleRTT will vary, want estimated RTT "smoother"
- use several recent measurements, not just current SampleRTT
- Q: how to set TCP timeout value?
- longer than RTT
- note: RTT will vary
- too short: premature timeout
- unnecessary retransmissions
- too long: slow reaction to segment loss
25 TCP Round Trip Time and Timeout
EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT
- Exponential weighted moving average
- influence of given sample decreases exponentially fast
- typical value of x: 0.1
- Setting the timeout
- EstimatedRTT plus "safety margin"
- large variation in EstimatedRTT -> larger safety margin
Timeout = EstimatedRTT + 4*Deviation
Deviation = (1-x)*Deviation + x*|SampleRTT-EstimatedRTT|
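The three formulas fit in one small class. A minimal sketch: seeding EstimatedRTT with the first sample, starting Deviation at zero, and updating EstimatedRTT before Deviation are all assumptions made here for illustration; the class name is hypothetical.

```python
class RttEstimator:
    def __init__(self, first_sample, x=0.1):
        self.x = x
        self.estimated_rtt = first_sample   # assumption: seed with the first sample
        self.deviation = 0.0                # assumption: start with zero deviation

    def update(self, sample_rtt):
        """Fold in one SampleRTT and return the new Timeout."""
        x = self.x
        # EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT
        self.estimated_rtt = (1 - x) * self.estimated_rtt + x * sample_rtt
        # Deviation = (1-x)*Deviation + x*|SampleRTT - EstimatedRTT|
        self.deviation = (1 - x) * self.deviation + x * abs(sample_rtt - self.estimated_rtt)
        return self.timeout()

    def timeout(self):
        # Timeout = EstimatedRTT + 4*Deviation
        return self.estimated_rtt + 4 * self.deviation


est = RttEstimator(first_sample=100.0)
print(est.update(200.0))   # one slow sample raises both the estimate and the margin
```

A single 200 ms sample after a 100 ms baseline moves EstimatedRTT only to about 110 ms, but the deviation term widens the timeout to about 146 ms.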
26 A problem
- When there are retransmissions, it is unclear if the ACK is for the original transmission or for a retransmission.
- How do we overcome this?
27 The Karn/Partridge Algorithm
- Take SampleRTT measurements only for segments that have been sent once!
- This eliminates the possibility that wrong RTT estimates are factored into the estimation.
- Another change: each time TCP retransmits, it sets the next timeout to 2 X Last timeout. This is called Exponential Back-off (primarily for avoiding congestion).
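Both rules are simple to state in code. A sketch only: the class name, the transmissions counter, and the list of accepted samples are hypothetical bookkeeping, not part of any real TCP stack's API.

```python
class KarnTimer:
    def __init__(self, initial_timeout):
        self.timeout = initial_timeout
        self.samples = []        # SampleRTT values accepted for estimation

    def on_retransmit(self):
        self.timeout *= 2        # exponential back-off: 2 X last timeout

    def on_ack(self, rtt, transmissions):
        if transmissions == 1:   # sent exactly once: the sample is unambiguous
            self.samples.append(rtt)
        # otherwise the ACK may match either copy, so the sample is skipped


timer = KarnTimer(initial_timeout=1.0)
timer.on_retransmit()
timer.on_retransmit()
timer.on_ack(rtt=0.3, transmissions=1)   # accepted
timer.on_ack(rtt=0.9, transmissions=2)   # ignored: segment was retransmitted
print(timer.timeout, timer.samples)
```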
28 Jacobson/Karels Algorithm
- An issue with the Karn/Partridge scheme is that it does not take into account the variation between RTT samples.
- New method proposed: the Jacobson/Karels Algorithm.
- Estimated RTT = Estimated RTT + d X Difference
- Difference = Sample RTT - Estimated RTT
- Deviation = Deviation + d X (|Difference| - Deviation)
- Timeout = m X Estimated RTT + f X Deviation
- Here d is a fraction between 0 and 1.
- The values of m and f are computed based on experience. Typically m = 1 and f = 4.
29 Silly Window Syndrome
- Suppose an MSS worth of data is collected and the advertised window is MSS/2.
- What should the sender do? Transmit half-full segments, or wait to send a full MSS when the window opens?
- Early implementations were aggressive: transmit MSS/2.
- Doing this aggressively would consistently result in small segment sizes, called the Silly Window Syndrome.
30 Issues
- We cannot eliminate the possibility of small segments being sent.
- However, we can introduce methods to coalesce small chunks.
- Delaying ACKs: the receiver does not send ACKs as soon as it receives segments.
- How long to delay? Not very clear.
- Ultimate solution falls to the sender: when should I transmit?
31 Nagle's Algorithm
- If the sender waits too long, it is bad for interactive connections.
- If it does not wait long enough: silly window syndrome.
- How do we solve this?
- Timer: clock based
- If both available data and Window >= MSS, send full segment.
- Else, if there is unACKed data in flight, buffer new data until ACK returns.
- Else, send new data now.
- Note: the socket interface allows applications to turn off Nagle's algorithm by setting the TCP_NODELAY option.
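The three-way decision reads naturally as a function. A minimal sketch: nagle_decision and its parameters are illustrative names, while disable_nagle uses the standard Python socket call for the TCP_NODELAY option.

```python
import socket


def nagle_decision(available, window, mss, unacked_in_flight):
    """Decide what the sender does with `available` bytes of new data."""
    if available >= mss and window >= mss:
        return "send full segment"
    if unacked_in_flight:
        return "buffer until ACK returns"
    return "send now"


def disable_nagle(sock):
    """Turn off Nagle's algorithm on an existing TCP socket."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


print(nagle_decision(available=3000, window=4000, mss=1460, unacked_in_flight=True))
print(nagle_decision(available=10, window=4000, mss=1460, unacked_in_flight=True))
print(nagle_decision(available=10, window=4000, mss=1460, unacked_in_flight=False))
```

An interactive application that cannot tolerate the buffering in the middle case would call disable_nagle on its connected socket.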
32 TCP Connection Management
- Recall: TCP sender, receiver establish a "connection" before exchanging data segments
- initialize TCP variables
- seq. #s
- buffers, flow control info (e.g. RcvWindow)
- client: connection initiator
- Socket clientSocket = new Socket("hostname", portNumber);
- server: contacted by client
- Socket connectionSocket = welcomeSocket.accept();
33 TCP Set-up
- Three way handshake
- Step 1: client end system sends TCP SYN control segment to server
- specifies initial seq #
- Step 2: server end system receives SYN, replies with SYNACK control segment
- ACKs received SYN
- allocates buffers
- specifies server -> receiver initial seq. #
- Step 3: client replies with an ACK (using server's seq number)
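The numbers carried by the three control segments can be written down directly. A sketch, assuming the standard rule that a SYN consumes one sequence number, so each side acknowledges the peer's initial seq # plus one; the function name and tuple layout are illustrative.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three control segments as (flags, seq, ack) tuples."""
    syn = ("SYN", client_isn, None)                      # step 1: client picks its initial seq #
    synack = ("SYN+ACK", server_isn, client_isn + 1)     # step 2: server ACKs the SYN
    ack = ("ACK", client_isn + 1, server_isn + 1)        # step 3: client ACKs the server's seq #
    return [syn, synack, ack]


for segment in three_way_handshake(client_isn=100, server_isn=300):
    print(segment)
```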
34 TCP Connection Management (cont.)
- Closing a connection
- client closes socket: clientSocket.close();
- Step 1: client end system sends TCP FIN control segment to server
- Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
- Last ACK is never ACK-ed!!
35 TCP Connection Management (cont.)
- Step 3: client receives FIN, replies with ACK.
- Enters "timed wait": will respond with ACK to received FINs
- Step 4: server receives ACK. Connection closed.
- Last ACK is never ACK-ed
Figure: client and server timelines. The client (closing) sends FIN and the server replies with ACK; the server (closing) sends FIN and the client replies with ACK; the client enters timed wait, after which both sides are closed.
36 TCP Connection Management (cont.)
Figures: TCP server lifecycle; TCP client lifecycle