Title: CSE 524: Lecture 12
Transport Layer
- Last class
- CIDR exam question
- Specific transport layers
- UDP
- This class
- TCP
TL: TCP and Transport Layer Functions
- Demux to upper layer
- Quality of service
- Security
- Delivery semantics
- Flow control
- Congestion control
- Reliable data transfer
TL: TCP Overview (RFCs 793, 1122, 1323, 2018, 2581)
- point-to-point
- one sender, one receiver
- reliable, in-order byte stream
- no "message boundaries"
- pipelined
- TCP congestion and flow control set window size
- send and receive buffers
- full duplex data
- bi-directional data flow in same connection
- MSS: maximum segment size
- connection-oriented
- handshaking (exchange of control msgs) inits sender, receiver state before data exchange
- protocol implemented at ends (fate-sharing)
- flow and congestion controlled
- sender will not overwhelm receiver or network
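TL: TCP header
[Figure: TCP segment header, with the following field annotations]
- Sequence and acknowledgement numbers: counting by bytes of data (not segments!)
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection estab (setup, teardown commands)
- Receive window: # bytes rcvr willing to accept
- Checksum: Internet checksum (as in UDP)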
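The annotated fields can be made concrete by decoding the fixed 20-byte header. The following is a minimal sketch, assuming the standard RFC 793 field layout; the function name `parse_tcp_header` and the example segment values are hypothetical and not part of the lecture.

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header (options are ignored)."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    flags = offset_flags & 0x3F  # low 6 bits: URG ACK PSH RST SYN FIN
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,                                # counts bytes, not segments
        "ack": ack,                                # meaningful only if ACK flag set
        "header_len": (offset_flags >> 12) * 4,    # data offset is in 32-bit words
        "URG": bool(flags & 0x20),
        "ACK": bool(flags & 0x10),
        "PSH": bool(flags & 0x08),
        "RST": bool(flags & 0x04),
        "SYN": bool(flags & 0x02),
        "FIN": bool(flags & 0x01),
        "window": window,                          # bytes receiver is willing to accept
        "checksum": checksum,
        "urgent_ptr": urgent,
    }

# Hypothetical SYN segment: port 12345 -> 80, seq 42, header length 5 words.
example = struct.pack("!HHIIHHHH", 12345, 80, 42, 0, (5 << 12) | 0x02, 65535, 0, 0)
hdr = parse_tcp_header(example)
```

Here `hdr["SYN"]` is set and `hdr["ACK"]` is clear, which is what the first segment of a connection looks like.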
TL: TCP connections
- TCP sender, receiver establish connection before exchanging data segments
- initialize TCP variables
- Initial sequence #s
- Buffers, flow control info (e.g. RcvWindow)
- Window scaling
- client: connection initiator
- server: contacted by client
- Java API
- Socket clientSocket = new Socket("hostname", port);
- Socket connectionSocket = welcomeSocket.accept();
TL: TCP connections
- Three-way handshake
- Step 1: client end system sends TCP SYN control segment to server
- specifies initial seq #
- should be random to prevent spoofing (http://www.rfc-editor.org/rfc/rfc1948.txt)
- Step 2: server end system receives SYN, replies with SYNACK control segment
- ACKs received SYN
- allocates buffers
- specifies server->receiver initial seq. #
- Step 3: client receives SYNACK control segment, replies with ACK and potentially data
- ACKs received SYNACK
- goes to established state
TL: TCP Connection Establishment
- A and B must agree on initial sequence number selection
- 3-way handshake
[Figure: A -> B: SYN + Seq A; B -> A: SYN + ACK-A + Seq B; A -> B: ACK-B]
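The exchange in the figure can be sketched as a small function that produces the three segments and their sequence/acknowledgement numbers. This is an illustration only: the tuple layout and the name `three_way_handshake` are assumptions, and it relies on the standard rule that a SYN consumes one sequence number.

```python
import random

def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the handshake segments as (sender, flags, seq, ack) tuples."""
    mod = 2 ** 32  # sequence numbers are 32-bit and wrap around
    syn = ("A", "SYN", client_isn, None)
    synack = ("B", "SYN+ACK", server_isn, (client_isn + 1) % mod)
    ack = ("A", "ACK", (client_isn + 1) % mod, (server_isn + 1) % mod)
    return [syn, synack, ack]

# Each side picks its own initial sequence number at random.
segments = three_way_handshake(random.randrange(2 ** 32), random.randrange(2 ** 32))
```

After the third segment both sides know the other's ISN, which is the agreement the slide refers to.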
TL: TCP Sequence Number Selection
- Why not simply choose 0?
- Must avoid overlap with earlier incarnation
- Client machine with seq # 0 initiates connection to server with seq # 0
- Client sends one byte and machine crashes
- Client reboots and initiates connection again
- Server thinks new incarnation is the same as old connection
TL: TCP Sequence Number Selection
- Why is selecting a random ISN important?
- Suppose machine X selects ISN based on predictable sequence
- Fred has .rhosts to allow login to X from Y
- Evil Ed attacks
- Disables host Y: denial of service attack
- Make a bunch of connections to host X
- Determine ISN pattern and guess next ISN
- Fake pkt1: <src Y><dst X>, guessed ISN
- Fake pkt2: desired command
- Attack popularized by K. Mitnick
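One defence against this kind of guessing is the scheme in RFC 1948 (cited earlier in the lecture): ISN = M + F(local host, local port, remote host, remote port, secret), where M is a 4-microsecond timer and F is a keyed hash. The sketch below is a hedged illustration of that idea; the function name, the argument encoding, and the use of SHA-256 (the RFC suggests MD5) are assumptions made here.

```python
import hashlib
import time

def rfc1948_style_isn(local_ip, local_port, remote_ip, remote_port, secret, now=None):
    """ISN = M + F(connection 4-tuple, secret), truncated to 32 bits."""
    if now is None:
        now = time.time()
    m = int(now * 1_000_000 / 4)  # M: counter that ticks every 4 microseconds
    material = f"{local_ip}|{local_port}|{remote_ip}|{remote_port}|".encode() + secret
    # F: keyed hash of the connection identifier (SHA-256 here is a substitution)
    f = int.from_bytes(hashlib.sha256(material).digest()[:4], "big")
    return (m + f) % 2 ** 32

isn = rfc1948_style_isn("10.0.0.1", 5000, "10.0.0.2", 80, b"hypothetical-secret")
```

Because F depends on a secret and on the 4-tuple, connections Ed opens to X from his own address reveal nothing useful about the ISN that X would choose for a connection claiming to come from Y.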
TL: TCP ISN selection and spoofing attacks
[Figure: hosts X and Y and attacker Ed; the .rhosts file on X trusts Y]
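TL: TCP connection setup
[Figure: connection-setup half of the TCP state diagram; transitions listed as event / action]
- CLOSED -> LISTEN: passive OPEN / create TCB
- CLOSED -> SYN SENT: active OPEN / create TCB, snd SYN
- LISTEN -> CLOSED: CLOSE / delete TCB
- LISTEN -> SYN RCVD: rcv SYN / snd SYN, ACK
- LISTEN -> SYN SENT: APP SEND / snd SYN
- SYN SENT -> CLOSED: CLOSE / delete TCB
- SYN SENT -> SYN RCVD: rcv SYN / snd ACK
- SYN SENT -> ESTAB: rcv SYN, ACK / snd ACK
- SYN RCVD -> ESTAB: rcv ACK of SYN
- SYN RCVD -> (tear-down): CLOSE / send FIN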
TL: TCP connections
- Data transfer for established connections using sequence numbers and sliding windows with cumulative ACKs
- Seq. #s
- byte stream "number" of first byte in segment's data
- ACKs
- seq # of next byte expected from other side
- cumulative ACK
- duplicate acks sent when out-of-order packet received
- See web trace
- Java API
- connectionSocket.receive()
- clientSocket.send()
[Figure: simple telnet scenario between Host A and Host B]
- User types 'C': A sends Seq=42, ACK=79, data='C'
- Host B ACKs receipt of 'C', echoes back 'C': Seq=79, ACK=43, data='C'
- Host A ACKs receipt of echoed 'C': Seq=43, ACK=80
TL: TCP connections
- Closing a connection
- Client-initiated close (reverse process for server-initiated close)
- Java API: clientSocket.close()
- Step 1: client end system sends TCP FIN control segment to server
- Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
TL: TCP connections
- Step 3: client receives FIN, replies with ACK.
- Enters "timed wait": will respond with ACK to received FINs
- Step 4: server receives ACK. Connection closed.
- Note: with small modification, can handle simultaneous FINs.
[Figure: close timeline. Client: closing, sends FIN; server: sends ACK, closing, sends FIN; client: sends ACK, timed wait, closed; server: closed]
TL: TCP Connection Tear-down
[Figure: Sender/Receiver timeline, in order: FIN, FIN-ACK, Data write, Data ack, FIN, FIN-ACK]
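TL: TCP Connection Tear-down
[Figure: tear-down half of the TCP state diagram; transitions listed as event / action]
- ESTAB -> FIN WAIT-1: CLOSE / send FIN
- ESTAB -> CLOSE WAIT: rcv FIN / send ACK
- FIN WAIT-1 -> FIN WAIT-2: rcv ACK of FIN
- FIN WAIT-1 -> CLOSING: rcv FIN / snd ACK
- FIN WAIT-1 -> TIME WAIT: rcv FIN+ACK / snd ACK
- FIN WAIT-2 -> TIME WAIT: rcv FIN / snd ACK
- CLOSING -> TIME WAIT: rcv ACK of FIN
- CLOSE WAIT -> LAST-ACK: CLOSE / snd FIN
- LAST-ACK -> CLOSED: rcv ACK of FIN
- TIME WAIT -> CLOSED: Timeout=2MSL / delete TCB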
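The tear-down diagram lends itself to a table-driven state machine. The sketch below assumes the standard RFC 793 transitions and reuses the state and event labels from the slide; the dictionary name `TEARDOWN` and the helper `run` are hypothetical.

```python
# (state, event) -> (action or None, next state)
TEARDOWN = {
    ("ESTAB", "CLOSE"): ("snd FIN", "FIN WAIT-1"),
    ("ESTAB", "rcv FIN"): ("snd ACK", "CLOSE WAIT"),
    ("FIN WAIT-1", "rcv ACK of FIN"): (None, "FIN WAIT-2"),
    ("FIN WAIT-1", "rcv FIN"): ("snd ACK", "CLOSING"),
    ("FIN WAIT-2", "rcv FIN"): ("snd ACK", "TIME WAIT"),
    ("CLOSING", "rcv ACK of FIN"): (None, "TIME WAIT"),
    ("CLOSE WAIT", "CLOSE"): ("snd FIN", "LAST-ACK"),
    ("LAST-ACK", "rcv ACK of FIN"): (None, "CLOSED"),
    ("TIME WAIT", "timeout 2MSL"): ("delete TCB", "CLOSED"),
}

def run(state, events):
    """Feed events through the table; return the final state and actions taken."""
    actions = []
    for event in events:
        action, state = TEARDOWN[(state, event)]
        if action is not None:
            actions.append(action)
    return state, actions

# Side that closes first passes through TIME WAIT; the other side does not.
active = run("ESTAB", ["CLOSE", "rcv ACK of FIN", "rcv FIN", "timeout 2MSL"])
passive = run("ESTAB", ["rcv FIN", "CLOSE", "rcv ACK of FIN"])
```

Only the side that sends the first FIN ends up in TIME WAIT, which is why it matters which end closes first.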
TL: Time Wait Issues
- Cannot close connection immediately after receiving FIN
- What if a new connection restarts and uses same sequence number?
- Web servers, not clients, close connection first
- Established -> Fin-Waits -> Time-Wait -> Closed
- Why would this be a problem?
- Time-Wait state lasts for 2 * MSL
- MSL should be 120 seconds (is often 60s)
- Servers often have order of magnitude more connections in Time-Wait
TL: TCP connections
[Figures: TCP server lifecycle; TCP client lifecycle]
TL: TCP Demux to upper layer
- Multiplexing: gathering data from multiple app processes, enveloping data with header (later used for demultiplexing)
- multiplexing/demultiplexing
- based on sender, receiver port numbers, IP addresses
- source, dest port #s in each segment
- recall: well-known port numbers for specific applications
- Servers wait on well-known ports (/etc/services)
[Figure: TCP/UDP segment format, 32 bits wide: source port #, dest port #, other header fields, application data (message)]
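Demultiplexing on ports and IP addresses can be sketched as a dictionary lookup keyed on the connection 4-tuple, falling back to a listener on the destination port. This is a simplified illustration; the class name `Demultiplexer`, its methods, and the addresses used are hypothetical.

```python
class Demultiplexer:
    """Maps incoming segments to applications by (src IP, src port, dst IP, dst port)."""

    def __init__(self):
        self.connections = {}  # full 4-tuple -> application
        self.listeners = {}    # local (well-known) port -> application

    def listen(self, port, app):
        self.listeners[port] = app

    def establish(self, src_ip, src_port, dst_ip, dst_port, app):
        self.connections[(src_ip, src_port, dst_ip, dst_port)] = app

    def deliver(self, src_ip, src_port, dst_ip, dst_port):
        key = (src_ip, src_port, dst_ip, dst_port)
        if key in self.connections:
            return self.connections[key]
        # No established connection: hand to whoever waits on the dest port.
        return self.listeners.get(dst_port)

demux = Demultiplexer()
demux.listen(80, "web server (listening)")
# Two clients use the same source port but different IP addresses.
demux.establish("10.0.0.1", 1180, "10.0.0.9", 80, "web connection from A")
demux.establish("10.0.0.3", 1180, "10.0.0.9", 80, "web connection from C")
```

The two established connections share a destination port and even a source port, yet stay distinct because the source IP address is part of the key.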
TL: TCP Demux to upper layer
[Figure: port use in a simple telnet app (host A, server B); port use by a Web server (Web clients on hosts A and C, Web server B)]
TL: TCP Flow control
- TCP is a sliding window protocol
- For window size n, can send up to n bytes without receiving an acknowledgement
- When the data is acknowledged then the window slides forward
- Each packet advertises a window size
- Indicates number of bytes the receiver has space for
- Original TCP always sent entire window
- Congestion control now limits this
TL: TCP Flow control
- receiver explicitly informs sender of (dynamically changing) amount of free buffer space
- RcvWindow field in TCP segment
- sender keeps the amount of transmitted, unACKed data less than most recently received RcvWindow
- sender won't overrun receiver's buffers by transmitting too much, too fast
- RcvBuffer = size of TCP Receive Buffer
- RcvWindow = amount of spare room in Buffer
[Figure: receiver buffering]
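The two sides of flow control reduce to two subtractions. The sketch below is one way to write them; the byte-counter names (`last_byte_rcvd`, `last_byte_read`, `last_byte_sent`, `last_byte_acked`) are assumptions introduced for illustration and do not appear on the slides.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver side: spare room = buffer size minus bytes received but not yet read."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def sendable_bytes(advertised_window, last_byte_sent, last_byte_acked):
    """Sender side: keep transmitted, unACKed data within the advertised window."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, advertised_window - in_flight)

# Receiver with a 4096-byte buffer holding 2000 unread bytes advertises 2096.
window = rcv_window(4096, last_byte_rcvd=3000, last_byte_read=1000)
# Sender with 1000 bytes in flight may send 1096 more.
allowed = sendable_bytes(window, last_byte_sent=5000, last_byte_acked=4000)
```

When `sendable_bytes` returns 0 the sender must stop until a new window advertisement arrives.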
TL: TCP Flow control
- What happens if window is 0?
- Receiver updates window when application reads data
- What if this update is lost?
- Deadlock
- TCP Persist timer
- Sender periodically sends window probe packets
- Receiver responds with ACK and up-to-date window advertisement
TL: TCP flow control enhancements
- Problem (Clark, 1982)
- If receiver advertises small increases in the receive window then the sender may waste time sending lots of small packets
- What happens if window is small?
- Small packet problem known as "Silly window syndrome"
- Receiver advertises one byte window
- Sender sends one byte packet (1 byte data, 40 byte header = 4000% overhead)
TL: TCP flow control enhancements
- Solutions to silly window syndrome
- Clark (1982)
- receiver avoidance
- prevent receiver from advertising small windows
- increase advertised receiver window by min(MSS, RecvBuffer/2)
- Nagle's algorithm (1984)
- sender avoidance
- prevent sender from unnecessarily sending small packets
- http://www.rfc-editor.org/rfc/rfc896.txt
- "Inhibit the sending of new TCP segments when new outgoing data arrives from the user if any previously transmitted data on the connection remains unacknowledged"
- Allow only one outstanding small (not full sized) segment that has not yet been acknowledged
- Works for idle connections (no deadlock)
- Works for telnet (send one-byte packets immediately)
- Works for bulk data transfer (delay sending)
TL: TCP reliable data transfer
- Segment integrity
- Acknowledgement generation
- Retransmission
TL: TCP RDT segment integrity
- Checksum included in header
- Is it sufficient to just checksum the packet contents?
- No, need to ensure correct source/destination
- Pseudoheader: portions of IP hdr that are critical
- Checksum covers Pseudoheader, transport hdr, and packet body
- Layer violation, redundant with parts of IP checksum
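A sketch of the checksum over pseudoheader, transport header, and body follows. It assumes IPv4, the usual pseudoheader layout (source address, destination address, a zero byte, protocol number 6, TCP length), and the 16-bit ones'-complement Internet checksum; the function names and the addresses are hypothetical.

```python
import socket
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement of the ones'-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                    # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                     # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over pseudoheader + TCP header + payload."""
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 6, len(segment)))
    return internet_checksum(pseudo + segment)

# Hypothetical segment: 20 zeroed header bytes plus a 3-byte payload.
segment = bytearray(20) + b"hi!"
struct.pack_into("!H", segment, 16, tcp_checksum("10.0.0.1", "10.0.0.2", bytes(segment)))
# Receiver recomputes over the same bytes; a result of 0 means "no error detected".
verified = tcp_checksum("10.0.0.1", "10.0.0.2", bytes(segment))
misdelivered = tcp_checksum("10.0.0.1", "10.0.0.3", bytes(segment))
```

Because the addresses are part of the sum, a segment delivered to the wrong destination fails verification even though its own bytes are intact.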
TL: TCP RDT acks and timeouts
- TCP's reliable data transfer approach
- Cumulative acknowledgements
- Receiver sends back the byte number it expects to receive next
- Out of order packets generate duplicate acknowledgements
- Receive 1, Ack 2
- Receive 4, Ack 2
- Receive 3, Ack 2
- Receive 2, Ack 5
- Retransmissions
- Sender sends segment and sets a timer
- Waits for an acknowledgement indicating segment was received
- Send 1
- Wait for Ack 2
- No Ack 2 and timer expires
- Send 1 again
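The receive/ack sequence above can be reproduced with a few lines. This sketch numbers whole segments as the slide's example does (real TCP counts bytes); the function name `cumulative_acks` is hypothetical.

```python
def cumulative_acks(arrivals, first_expected=1):
    """For each arriving segment number, return the ACK the receiver sends."""
    received = set()
    expected = first_expected
    acks = []
    for seg in arrivals:
        received.add(seg)
        # Advance past every segment that is now contiguously received.
        while expected in received:
            expected += 1
        acks.append(expected)  # always "next segment I expect"
    return acks

acks = cumulative_acks([1, 4, 3, 2])
```

Segments 4 and 3 arrive out of order and each produce a duplicate Ack 2; once segment 2 fills the gap, a single Ack 5 covers everything.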
TL: TCP RDT acks and timeouts
Simplified sender, assuming
- one way data transfer
- no flow, congestion control
[Figure: sender event loop; from "wait for event", one of three events is handled]
- event: data received from application above -> create, send segment
- event: timer timeout for segment with seq # y -> retransmit segment
- event: ACK received, with ACK # y -> ACK processing
TL: TCP RDT acks and timeouts
Simplified TCP sender

    sendbase = initial_sequence number
    nextseqnum = initial_sequence number

    loop (forever) {
      switch(event)
      event: data received from application above
          create TCP segment with sequence number nextseqnum
          start timer for segment nextseqnum
          pass segment to IP
          nextseqnum = nextseqnum + length(data)
      event: timer timeout for segment with sequence number y
          retransmit segment with sequence number y
          compute new timeout interval for segment y
          restart timer for sequence number y
      event: ACK received, with ACK field value of y
          if (y > sendbase) { /* cumulative ACK of all data up to y */
              cancel all timers for segments with sequence numbers < y
              sendbase = y
          }
          else { /* a duplicate ACK for already ACKed segment */
              increment number of duplicate ACKs received for y
              if (number of duplicate ACKS received for y == 3) {
                  /* TCP fast retransmit */
                  resend segment with sequence number y
                  restart timer for segment y
              }
          }
    } /* end of loop forever */
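A runnable rendering of the simplified sender is sketched below. It keeps the three events and the `sendbase`/`nextseqnum` variables but, as a simplifying assumption, does not model real timers: the caller invokes `on_timeout` explicitly, and "passing a segment to IP" just appends to a list. The class and attribute names are hypothetical.

```python
class SimplifiedSender:
    def __init__(self, initial_seq=0):
        self.sendbase = initial_seq
        self.nextseqnum = initial_seq
        self.unacked = {}    # seq -> data; stands in for per-segment timers
        self.dup_acks = {}   # ACK value -> number of duplicates seen
        self.wire = []       # (seq, data) tuples "passed to IP"

    def on_data(self, data: bytes):
        """Event: data received from application above."""
        seq = self.nextseqnum
        self.unacked[seq] = data
        self.wire.append((seq, data))
        self.nextseqnum += len(data)

    def on_timeout(self, y: int):
        """Event: timer timeout for segment with sequence number y."""
        if y in self.unacked:
            self.wire.append((y, self.unacked[y]))  # retransmit

    def on_ack(self, y: int):
        """Event: ACK received, with ACK field value of y."""
        if y > self.sendbase:
            # Cumulative ACK: everything below y is delivered.
            for seq in [s for s in self.unacked if s < y]:
                del self.unacked[seq]
            self.sendbase = y
        else:
            # Duplicate ACK for an already ACKed point in the stream.
            self.dup_acks[y] = self.dup_acks.get(y, 0) + 1
            if self.dup_acks[y] == 3 and y in self.unacked:
                self.wire.append((y, self.unacked[y]))  # fast retransmit

sender = SimplifiedSender(initial_seq=92)
sender.on_data(b"a" * 8)    # Seq=92, 8 bytes
sender.on_data(b"b" * 20)   # Seq=100, 20 bytes
sender.on_ack(100)          # first segment acknowledged
for _ in range(3):
    sender.on_ack(100)      # three duplicates trigger fast retransmit of Seq=100
```

After the third duplicate ACK, `sender.wire` holds a second copy of the Seq=100 segment even though no timeout occurred.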
TL: TCP delayed acknowledgements
- Problem
- In request/response programs, you send separate ACK and Data packets for each transaction
- Delay ACK in order to send ACK back along with data
- Solution
- Don't ACK data immediately
- Wait 200ms (must be less than 500ms; why?)
- Must ACK every other packet
- Must not delay duplicate ACKs
- Without delayed ACK: 40 byte ack + data packet
- With delayed ACK: data packet includes ACK
- See web trace example
- Extensions for asymmetric links
- See later part of lecture
TL: TCP ACK generation [RFC 1122, RFC 2581]
- Event: in-order segment arrival, no gaps, everything else already ACKed
  TCP Receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
- Event: in-order segment arrival, no gaps, one delayed ACK pending
  TCP Receiver action: immediately send single cumulative ACK
- Event: out-of-order segment arrival, higher-than-expect seq. #, gap detected
  TCP Receiver action: send duplicate ACK, indicating seq. # of next expected byte
- Event: arrival of segment that partially or completely fills gap
  TCP Receiver action: immediate ACK if segment starts at lower end of gap
TL: TCP retransmission
- Wait at least one RTT before retransmitting packet
- Importance of accurate RTT estimators
- Estimator too low -> unneeded retransmissions
- Estimator too high -> poor throughput, slow reaction to segment loss
- RTT estimator must adapt to change in RTT
- But not too fast, or too slow!
- Backing off the retransmission timeout
- Exponential backoff
- Double retransmission timer interval after every loss until successful retransmission
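TL: TCP retransmission scenarios
[Figure: premature timeout, cumulative ACKs (Host A, Host B)]
- A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
- B replies with ACK=100, then ACK=120
- Seq=92 timeout expires at A before the ACKs arrive; A resends Seq=92, 8 bytes data
- B answers the retransmission with cumulative ACK=120
- Seq=100 timeout interval shown on A's timeline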
TL: Initial Round-trip Estimator
- Round trip times exponentially averaged
  EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT
- Recommended value for x: 0.1-0.2
- 0.125 for most TCPs
- Influence of given sample decreases exponentially fast
- Retransmit timer set to b * RTT, where b = 2
- Every time timer expires, RTO exponentially backed-off
- Like Ethernet
- Not good at preventing spurious timeouts
TL: Jacobson's Retransmission Timeout
- Key observation
- At high loads round trip variance is high
- Need larger safety margin with larger variations in RTT
- Solution
- Base RTO value on RTT and standard deviation (RRTT)
TL: Jacobson's Retransmission Timeout
  EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT
- Setting the timeout
- EstimatedRTT plus "safety margin"
- large variation in EstimatedRTT -> larger safety margin
  Timeout = EstimatedRTT + 4*Deviation
  Deviation = (1-x)*Deviation + x*|SampleRTT-EstimatedRTT|
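The three update rules can be sketched as a small estimator class. Two choices below are assumptions not stated on the slide: the deviation starts at 0, and on each sample the deviation is updated before EstimatedRTT (so it uses the previous estimate). The class name `RttEstimator` is hypothetical.

```python
class RttEstimator:
    def __init__(self, first_sample: float, x: float = 0.125):
        self.x = x
        self.estimated_rtt = first_sample
        self.deviation = 0.0  # assumed starting value

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT and return the new Timeout."""
        x = self.x
        # Deviation = (1-x)*Deviation + x*|SampleRTT - EstimatedRTT|
        self.deviation = (1 - x) * self.deviation + x * abs(sample_rtt - self.estimated_rtt)
        # EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT
        self.estimated_rtt = (1 - x) * self.estimated_rtt + x * sample_rtt
        return self.timeout()

    def timeout(self) -> float:
        # Timeout = EstimatedRTT + 4*Deviation
        return self.estimated_rtt + 4 * self.deviation

est = RttEstimator(first_sample=100.0)   # milliseconds
rto = est.update(200.0)                  # one unusually slow sample
```

A single 200 ms sample moves EstimatedRTT only to 112.5 ms, but the deviation term raises the timeout to 162.5 ms, which is the larger safety margin the slide calls for.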
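TL: Retransmission Ambiguity
[Figure: timeline between A and B: original transmission is lost (X), RTO expires, retransmission is sent, ACK returns; the Sample RTT could be measured from either the original transmission or the retransmission]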
TL: Karn's algorithm
- Accounts for retransmission ambiguity
- If a segment has been retransmitted
- Don't count RTT sample on ACKs for this segment
- Keep backed off time-out for next packet
- Reuse RTT estimate only after one successful transmission
TL: Timer Granularity
- Many TCP implementations set RTO in multiples of 200, 500, 1000ms
- Why?
- Avoid spurious timeouts: RTTs can vary quickly due to cross traffic
- Make timer interrupts efficient
TL: TCP Congestion Control
- Motivated by ARPANET congestion collapse
- Flow control, but no congestion control
- Sender sends as much as the receiver resources will allow
- Go-back-N on loss, burst out advertised window
- Congestion control
- Extending control to network resources
- Underlying design principle: packet conservation
- At equilibrium, inject packet into network only when one is removed
- Basis for stability of physical systems (fluid model)
- Why was this not working before?
- No equilibrium
- Solved by self-clocking and congestion window
- Spurious retransmissions
- Solved by accurate RTO estimation (see earlier discussion)
- Resource limitations prevent equilibrium
- Solved by congestion window and congestion avoidance algorithms
TL: TCP congestion control basics
- Keep a congestion window, cwnd
- Book calls this "Congwin"
- Denotes how much network is able to absorb
- Sender's maximum window
- Min (receiver's advertised window, cwnd)
- Sender's actual window
- Max window - unacknowledged segments
TL: TCP Congestion Control
- end-end control (no network assistance)
- transmission rate limited by congestion window size, cwnd, over segments
[Figure: cwnd]
- w segments, each with MSS bytes, sent in one RTT
  throughput = (w * MSS) / RTT bytes/sec
TL: TCP congestion control
- two "phases"
- slow start
- congestion avoidance
- important variables
- cwnd
- ssthresh: defines threshold between the two phases, slow start and congestion avoidance (Book calls this threshold)
- useful reference
- http://www.aciri.org/floyd/papers/sacks.ps.Z
- "probing" for usable bandwidth
- ideally: transmit as fast as possible (cwnd as large as possible) without loss
- increase cwnd until loss (congestion)
- loss: decrease cwnd, then begin probing (increasing) again
TL: TCP slow start
- Start the self-clocking behavior of TCP
- Use acks to clock sending new data
- Do not send entire advertised window in one shot
[Figure: self-clocking between Sender and Receiver; packet spacing labeled Pb and Pr, ACK spacing labeled Ar, Ab, As]
TL: TCP slow start
Slow start algorithm

    initialize: cwnd = 1
    for (each segment ACKed)
        cwnd++
    until (loss event OR cwnd > ssthresh)

[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]
- exponential increase (per RTT) in window size
- Window actually increases to W in RTT * log2(W)
- Can overshoot window and cause packet loss
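The per-RTT doubling follows from the per-ACK increment: if every segment in a window is ACKed within one RTT, cwnd grows by cwnd. The sketch below simulates that under the assumption of no loss; the function name `slow_start_windows` is hypothetical and cwnd is counted in segments.

```python
def slow_start_windows(ssthresh: int, max_rtts: int) -> list:
    """Return cwnd (in segments) at the start of each RTT during slow start."""
    cwnd = 1
    history = []
    for _ in range(max_rtts):
        history.append(cwnd)
        if cwnd > ssthresh:
            break             # slow start ends once cwnd exceeds ssthresh
        # cwnd segments are ACKed this RTT, and each ACK adds one segment.
        cwnd += cwnd
    return history

windows = slow_start_windows(ssthresh=8, max_rtts=10)
```

The window reaches 8 after log2(8) = 3 RTTs, and the final value of 16 shows how the last doubling can overshoot the threshold.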
TL: TCP slow start example
TL: TCP slow start sequence plot
[Figure: sequence number vs. time]