Title: TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581
1TCP Overview RFCs 793, 1122, 1323, 2018, 2581
- point-to-point
- one sender, one receiver
- reliable, in-order byte stream
- no message boundaries
- pipelined
- TCP congestion and flow control set window size
- send & receive buffers
- full duplex data
- bi-directional data flow in same connection
- MSS: maximum segment size
- connection-oriented
- handshaking (exchange of control msgs) initializes sender and receiver state before data exchange
- flow controlled
- sender will not overwhelm receiver
2TCP segment structure
URG: urgent data (generally not used)
ACK: ACK # valid
PSH: push data now
RST, SYN, FIN: connection estab (setup, teardown commands)
Internet checksum (as in UDP)
sequence number, acknowledgement number: counting by bytes of data (not segments!)
receive window: # bytes rcvr willing to accept
3TCP Options
- MSS
- Timestamp
- Window scale
- NOP (no operation)
- Selective-ACK
4TCP Connection Management
- Three way handshake
- Step 1: client host sends TCP SYN segment to server
- specifies initial seq. #
- no data
- Step 2: server host receives SYN, replies with SYNACK segment
- server allocates buffers
- specifies server initial seq. #
- Step 3: client receives SYNACK, replies with ACK segment, which may contain data
- Recall: TCP sender, receiver establish connection before exchanging data segments
- initialize TCP variables:
- seq. #s
- buffers, flow control info (e.g. RcvWindow)
- client: connection initiator
- connect()
- server: contacted by client
- accept()
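The connect()/accept() pairing can be tried on a single machine. The following is a minimal sketch, assuming a loopback address and an OS-chosen port; the function name handshake_demo is hypothetical. The three-way handshake itself is carried out by the kernel when connect() is called, and accept() then hands the established connection to the server code.

```python
import socket

def handshake_demo(payload: bytes = b"hello") -> bytes:
    """Open a loopback TCP connection and return what the server side received."""
    # Server side: bind to an ephemeral loopback port and start listening.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        listener.listen(1)
        listener.settimeout(5)
        port = listener.getsockname()[1]

        # Client side: connect() makes the kernel run SYN, SYNACK, ACK.
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            # accept() returns the connection that the handshake established.
            conn, _peer = listener.accept()
            with conn:
                conn.settimeout(5)
                client.sendall(payload)
                received = b""
                while len(received) < len(payload):
                    chunk = conn.recv(1024)
                    if not chunk:
                        break
                    received += chunk
                return received
```

Calling handshake_demo(b"hello") should return the same bytes, since TCP delivers a reliable, in-order byte stream.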
5TCP connection
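- SYN, FIN each consume one sequence number
- ISN counter increases by one every 4 microseconds under RFC 793 (not every 4 ms); RFC 793 quotes a cycle of about 4.55 hours
- the 9.5 hours reuse cycle matches a counter that steps every 8 microseconds, as in BSD-derived implementations
- MSS: maximum segment size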
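The reuse cycle follows from the counter width and the tick length. Here is a quick arithmetic sketch, assuming a 32-bit ISN counter; the helper name isn_wrap_hours is hypothetical. It shows which tick interval is consistent with a cycle of about 9.5 hours.

```python
def isn_wrap_hours(tick_seconds: float, bits: int = 32) -> float:
    """Hours until a counter of `bits` bits wraps when it steps once per tick."""
    return (2 ** bits) * tick_seconds / 3600.0

print(round(isn_wrap_hours(4e-6), 2))       # 4 microsecond tick, in hours
print(round(isn_wrap_hours(8e-6), 2))       # 8 microsecond tick, in hours
print(round(isn_wrap_hours(4e-3) / 24))     # 4 millisecond tick, in days
```

An 8 microsecond tick gives roughly 9.5 hours, while a 4 millisecond tick would give a cycle of many months.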
6TCP Connection Management (cont.)
- Closing a connection
- client closes socket: close()
- Step 1: client end system sends TCP FIN control segment to server
- Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
7TCP Connection Management (cont.)
- Step 3: client receives FIN, replies with ACK.
- Enters "timed wait": will respond with ACK to received FINs
- Step 4: server receives ACK. Connection closed.
- Note: with small modification, can handle simultaneous FINs.
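[Figure: connection close timeline. Client: closing, sends FIN (FIN_WAIT_1); server replies with ACK (client in FIN_WAIT_2); server closing, sends FIN; client replies with ACK and enters TIME_WAIT (timed wait); then client closed, server closed.]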
8TCP Connection Management (cont)
TCP server lifecycle
TCP client lifecycle
9TCP Connection Management
- Allow half-close, i.e., one end terminates its output but still receives data
- Allow simultaneous open
- Allow simultaneous close
- Crashes?
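Half-close is visible at the socket API as shutdown() on the write side only. This is a minimal sketch on the loopback interface, assuming an OS-chosen port; the function name half_close_demo and the payloads are hypothetical. The client stops sending, yet still receives the server's reply on the same connection.

```python
import socket

def half_close_demo():
    """Return (bytes the server read, bytes the client read after its half-close)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        listener.settimeout(5)
        client = socket.create_connection(listener.getsockname(), timeout=5)
        server, _peer = listener.accept()
        server.settimeout(5)
        with client, server:
            client.sendall(b"request")
            client.shutdown(socket.SHUT_WR)   # client sends FIN: no more output

            # Server reads until end-of-stream (recv returns b"" after the FIN).
            request = b""
            while True:
                chunk = server.recv(1024)
                if not chunk:
                    break
                request += chunk

            # The server-to-client direction is still open, so a reply still works.
            server.sendall(b"reply")
            server.shutdown(socket.SHUT_WR)
            reply = b""
            while True:
                chunk = client.recv(1024)
                if not chunk:
                    break
                reply += chunk
            return request, reply
```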
10root@shannon liu# tcpdump -S tcp port 22
tcpdump: listening on eth0
23:01:51.363983 shannon.cs.ucdavis.edu.60042 > weasel.cs.ucdavis.edu.ssh: S 3036713598:3036713598(0) win 5840 <mss 1460,sackOK,timestamp 13989220 0,nop,wscale 0> (DF)
23:01:51.364829 weasel.cs.ucdavis.edu.ssh > shannon.cs.ucdavis.edu.60042: S 2462279815:2462279815(0) ack 3036713599 win 24616 <nop,nop,timestamp 626257407 13989220,nop,wscale 0,nop,nop,sackOK,mss 1460> (DF)
23:01:51.364844 shannon.cs.ucdavis.edu.60042 > weasel.cs.ucdavis.edu.ssh: . ack 2462279816 win 5840 <nop,nop,timestamp 13989220 626257407> (DF)
23:01:51.375451 weasel.cs.ucdavis.edu.ssh > shannon.cs.ucdavis.edu.60042: P 2462279816:2462279865(49) ack 3036713599 win 24616 <nop,nop,timestamp 626257408 13989220> (DF)
23:01:51.375478 shannon.cs.ucdavis.edu.60042 > weasel.cs.ucdavis.edu.ssh: . ack 2462279865 win 5840 <nop,nop,timestamp 13989221 626257408> (DF)
23:01:51.379319 shannon.cs.ucdavis.edu.60042 > weasel.cs.ucdavis.edu.ssh: P 3036713599:3036713621(22) ack 2462279865 win 5840 <nop,nop,timestamp 13989221 626257408> (DF)
23:01:51.379570 weasel.cs.ucdavis.edu.ssh > shannon.cs.ucdavis.edu.60042: . ack 3036713621 win 24616 <nop,nop,timestamp 626257408 13989221> (DF)
11tcpdump output (cont.)
23:01:51.941616 shannon.cs.ucdavis.edu.60042 > weasel.cs.ucdavis.edu.ssh: P 3036714373:3036714437(64) ack 2462281065 win 7680 <nop,nop,timestamp 13989277 626257462> (DF)
23:01:51.952442 weasel.cs.ucdavis.edu.ssh > shannon.cs.ucdavis.edu.60042: P 2462281065:2462282153(1088) ack 3036714437 win 24616 <nop,nop,timestamp 626257465 13989277> (DF)
23:01:51.991682 shannon.cs.ucdavis.edu.60042 > weasel.cs.ucdavis.edu.ssh: . ack 2462282153 win 9792 <nop,nop,timestamp 13989283 626257465> (DF)
23:01:54.699597 shannon.cs.ucdavis.edu.60042 > weasel.cs.ucdavis.edu.ssh: F 3036714437:3036714437(0) ack 2462282153 win 9792 <nop,nop,timestamp 13989553 626257465> (DF)
23:01:54.699880 weasel.cs.ucdavis.edu.ssh > shannon.cs.ucdavis.edu.60042: . ack 3036714438 win 24616 <nop,nop,timestamp 626257740 13989553> (DF)
23:01:54.701129 weasel.cs.ucdavis.edu.ssh > shannon.cs.ucdavis.edu.60042: F 2462282153:2462282153(0) ack 3036714438 win 24616 <nop,nop,timestamp 626257740 13989553> (DF)
23:01:54.701143 shannon.cs.ucdavis.edu.60042 > weasel.cs.ucdavis.edu.ssh: . ack 2462282154 win 9792 <nop,nop,timestamp 13989553 626257740> (DF)
26 packets received by filter
0 packets dropped by kernel
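The sequence numbers in this trace can be checked by hand. The sketch below copies a few values from the tcpdump output; the variable and function names are hypothetical. It checks that a SYN and a FIN each consume one sequence number, and that a data segment advances the sequence number by its payload length.

```python
# Values copied from the tcpdump trace (absolute sequence numbers, option -S).
client_isn = 3036713598           # seq # of the SYN from shannon
ack_in_synack = 3036713599        # ack carried by the SYN from weasel
client_fin_seq = 3036714437       # seq # of the FIN from shannon
ack_of_client_fin = 3036714438    # ack sent by weasel in response to that FIN
data_start, data_end, data_len = 2462279816, 2462279865, 49  # first P segment

def consumes_one_seq(seq: int, ack: int) -> bool:
    """True if the ACK is exactly one past the given sequence number."""
    return ack - seq == 1

print(consumes_one_seq(client_isn, ack_in_synack))          # SYN case
print(consumes_one_seq(client_fin_seq, ack_of_client_fin))  # FIN case
print(data_end - data_start == data_len)                    # data segment case
```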
12Outline
- Transport-layer services
- Multiplexing and demultiplexing
- Connectionless transport: UDP
- Principles of reliable data transfer
- Connection-oriented transport: TCP
- segment structure
- reliable data transfer
- flow control
- connection management
- Principles of congestion control
- TCP congestion control
13TCP seq. #s and ACKs
- Seq. #s:
- byte stream "number" of first byte in segment's data
- ACKs:
- seq # of next byte expected from other side
- cumulative ACK
- Q: how does receiver handle out-of-order segments?
- A: TCP spec doesn't say; up to implementor
simple telnet scenario (Host A, Host B):
User types 'C' at Host A
A -> B: Seq=42, ACK=79, data = 'C'
Host B ACKs receipt of 'C', echoes back 'C'
B -> A: Seq=79, ACK=43, data = 'C'
Host A ACKs receipt of echoed 'C'
A -> B: Seq=43, ACK=80
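The numbers in the telnet scenario can be reproduced with a small model. This is a sketch under simplifying assumptions (in-order delivery only, no loss, no timers); the Segment and Endpoint names are hypothetical. Each side tracks the seq # of the next byte it will send and the seq # of the next byte it expects.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    seq: int           # byte-stream number of first data byte
    ack: int           # seq # of next byte expected from the other side
    data: bytes = b""

class Endpoint:
    def __init__(self, next_seq: int, next_expected: int):
        self.next_seq = next_seq            # seq # of the next byte we send
        self.next_expected = next_expected  # seq # of the next byte we expect

    def send(self, data: bytes = b"") -> Segment:
        seg = Segment(self.next_seq, self.next_expected, data)
        self.next_seq += len(data)          # sequence numbers count bytes
        return seg

    def receive(self, seg: Segment) -> None:
        if seg.seq == self.next_expected:   # in-order data only in this sketch
            self.next_expected += len(seg.data)

host_a = Endpoint(next_seq=42, next_expected=79)
host_b = Endpoint(next_seq=79, next_expected=42)

s1 = host_a.send(b"C")      # user types 'C'
host_b.receive(s1)
s2 = host_b.send(b"C")      # B ACKs 'C' and echoes it back
host_a.receive(s2)
s3 = host_a.send()          # A ACKs the echoed 'C'
host_b.receive(s3)

for seg in (s1, s2, s3):
    print(seg.seq, seg.ack, seg.data)
```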
14TCP Round Trip Time and Timeout
- Q: how to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
- ignore retransmissions
- SampleRTT will vary, want estimated RTT "smoother"
- average several recent measurements, not just current SampleRTT
- Q: how to set TCP timeout value?
- longer than RTT
- but RTT varies
- too short: premature timeout
- unnecessary retransmissions
- too long: slow reaction to segment loss
15TCP Round Trip Time and Timeout
$\mathrm{EstimatedRTT} = (1 - \alpha)\cdot\mathrm{EstimatedRTT} + \alpha\cdot\mathrm{SampleRTT}$
- Exponential weighted moving average
- influence of past sample decreases exponentially fast
- typical value: $\alpha = 0.125$
16Example RTT estimation
17TCP Round Trip Time and Timeout
- Setting the timeout
- EstimatedRTT plus "safety margin"
- large variation in EstimatedRTT -> larger safety margin
- first estimate how much SampleRTT deviates from EstimatedRTT:
$\mathrm{DevRTT} = (1 - \beta)\cdot\mathrm{DevRTT} + \beta\cdot|\mathrm{SampleRTT} - \mathrm{EstimatedRTT}|$ (typically, $\beta = 0.25$)
Then set timeout interval:
$\mathrm{TimeoutInterval} = \mathrm{EstimatedRTT} + 4\cdot\mathrm{DevRTT}$
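The three update rules fit in a few lines of code. The sketch below makes two assumptions that the slides do not state: the first sample initializes EstimatedRTT to the sample and DevRTT to half the sample, and DevRTT is updated before EstimatedRTT (both choices follow RFC 6298). The class name RttEstimator is hypothetical.

```python
class RttEstimator:
    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2     # assumption: RFC 6298 style start value

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT (not from a retransmission); return the new timeout."""
        # Assumption: deviation is measured against the previous EstimatedRTT.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt

est = RttEstimator(first_sample=100.0)      # times in milliseconds
print(est.update(120.0))                    # timeout after one further sample
print(est.estimated_rtt, est.dev_rtt)
```

With a first sample of 100 ms and a second of 120 ms, EstimatedRTT moves only to 102.5 ms, which illustrates how the small weight on each new sample smooths the estimate.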
18RTT
- Timestamp option can be used to measure RTT for each segment
- Better RTT estimate
- No clock synchronization required
20TCP reliable data transfer
- TCP creates rdt service on top of IP's unreliable service
- Pipelined segments
- Cumulative acks
- TCP uses single retransmission timer
- Retransmissions are triggered by
- timeout events
- duplicate acks
- Initially consider simplified TCP sender
- ignore duplicate acks
- ignore flow control, congestion control
21TCP sender events
- data rcvd from app
- Create segment with seq #
- seq # is byte-stream number of first data byte in segment
- start timer if not already running (think of timer as for oldest unacked segment)
- expiration interval: TimeOutInterval
- timeout
- retransmit segment that caused timeout
- restart timer
- Ack rcvd
- If acknowledges previously unacked segments
- update what is known to be acked
- start timer if there are outstanding segments
22TCP sender(simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
    switch(event)

    event: data received from application above
        create TCP segment with sequence number NextSeqNum
        if (timer currently not running)
            start timer
        pass segment to IP
        NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
        retransmit not-yet-acknowledged segment with smallest sequence number
        start timer

    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
} /* end of loop forever */

- Comment:
- SendBase-1: last cumulatively acked byte
- Example:
- SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so that new data is acked
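The simplified sender can be turned into a small event-driven model. This is a sketch only: the timer is a boolean flag rather than a real clock, segments "passed to IP" are appended to a list for inspection, and sequence number wraparound is ignored. The class and attribute names are hypothetical.

```python
class SimplifiedTcpSender:
    def __init__(self, initial_seq: int):
        self.next_seq_num = initial_seq
        self.send_base = initial_seq
        self.unacked = {}            # seq # -> data for not-yet-acknowledged segments
        self.timer_running = False
        self.wire = []               # (seq, data) tuples handed to IP, in order

    def on_app_data(self, data: bytes) -> None:
        seq = self.next_seq_num
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True            # start timer
        self.wire.append((seq, data))            # pass segment to IP
        self.next_seq_num += len(data)

    def on_timeout(self) -> None:
        if self.unacked:
            seq = min(self.unacked)              # smallest unacked sequence number
            self.wire.append((seq, self.unacked[seq]))
            self.timer_running = True            # restart timer

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y
            # Cumulative ACK: drop every segment that lies entirely below y.
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)

sender = SimplifiedTcpSender(initial_seq=92)
sender.on_app_data(b"a" * 8)     # Seq=92, 8 bytes
sender.on_app_data(b"b" * 20)    # Seq=100, 20 bytes
sender.on_timeout()              # premature timeout: Seq=92 is resent
sender.on_ack(100)
sender.on_ack(120)
print(sender.send_base, [seq for seq, _ in sender.wire])
```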
23TCP retransmission scenarios
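[Figure: premature timeout scenario between Host A and Host B. A sends Seq=92 (8 bytes data) and Seq=100 (20 bytes data); B replies with ACK=100 and ACK=120. The Seq=92 timeout fires before the ACKs arrive, so A resends Seq=92 (8 bytes data) and B answers with ACK=120. SendBase moves from 100 to 120.]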
24TCP retransmission scenarios (more)
[Figure: retransmission scenario timeline; label SendBase = 120]
25TCP ACK generation RFC 1122, RFC 2581
Event at receiver: Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed.
TCP receiver action: Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK.

Event at receiver: Arrival of in-order segment with expected seq #. One other segment has ACK pending.
TCP receiver action: Immediately send single cumulative ACK, ACKing both in-order segments.

Event at receiver: Arrival of out-of-order segment with higher-than-expected seq #. Gap detected.
TCP receiver action: Immediately send duplicate ACK, indicating seq # of next expected byte.

Event at receiver: Arrival of segment that partially or completely fills gap.
TCP receiver action: Immediately send ACK, provided that segment starts at lower end of gap.
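The ACK number chosen in each case can be modeled without the timing rules. The sketch below ignores the 500 ms delayed-ACK wait and assumes the receiver buffers out-of-order segments, which is an implementation choice and not something the rules require. The class name CumulativeAckReceiver is hypothetical.

```python
class CumulativeAckReceiver:
    def __init__(self, expected_seq: int):
        self.expected_seq = expected_seq   # seq # of next in-order byte
        self.buffered = {}                 # out-of-order segments: seq # -> length

    def on_segment(self, seq: int, length: int) -> int:
        """Process one segment and return the ACK # the receiver would send."""
        if seq == self.expected_seq:
            self.expected_seq += length
            # A segment at the lower end of a gap may make buffered data contiguous.
            while self.expected_seq in self.buffered:
                self.expected_seq += self.buffered.pop(self.expected_seq)
        elif seq > self.expected_seq:
            self.buffered[seq] = length    # gap detected: result is a duplicate ACK
        return self.expected_seq

rcvr = CumulativeAckReceiver(expected_seq=92)
print(rcvr.on_segment(92, 8))     # in-order segment
print(rcvr.on_segment(120, 10))   # out-of-order segment, gap at 100..119
print(rcvr.on_segment(100, 20))   # fills the gap
```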
26Nagle algorithm
- If a TCP connection has outstanding data that has not yet been acked, small segments cannot be sent until that outstanding data is acked.
- Self-clocking: the faster the ACKs come back, the faster the data is sent
- May need to be disabled sometimes
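The rule can be written as a single predicate, and disabling it on a real socket is one option call. In the sketch below, nagle_allows_send is a hypothetical, simplified form of the test (real stacks add further conditions), while TCP_NODELAY is the standard socket option for turning the algorithm off.

```python
import socket

def nagle_allows_send(segment_len: int, mss: int, unacked_bytes: int) -> bool:
    """Simplified Nagle test: full-sized segments go at once, small ones wait for ACKs."""
    return segment_len >= mss or unacked_bytes == 0

def disable_nagle(sock: socket.socket) -> None:
    """Turn the Nagle algorithm off for this connection."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

print(nagle_allows_send(segment_len=10, mss=1460, unacked_bytes=0))     # idle connection
print(nagle_allows_send(segment_len=10, mss=1460, unacked_bytes=100))   # data in flight
```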
27Fast Retransmit
- Time-out period often relatively long
- long delay before resending lost packet
- Detect lost segments via duplicate ACKs.
- Sender often sends many segments back-to-back
- If segment is lost, there will likely be many
duplicate ACKs.
- If sender receives 3 duplicate ACKs for the same data, it supposes that segment after ACKed data was lost
- fast retransmit: resend segment before timer expires
28Fast retransmit algorithm
event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            resend segment with sequence number y    /* fast retransmit */
        }
    }
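The duplicate-ACK counting can be sketched as a small class. This is a simplified model: retransmissions are recorded in a list instead of being sent, the timer is omitted, and the counters are reset whenever new data is acknowledged. The class name FastRetransmitSender is hypothetical.

```python
from collections import defaultdict

class FastRetransmitSender:
    DUP_ACK_THRESHOLD = 3

    def __init__(self, send_base: int):
        self.send_base = send_base
        self.dup_acks = defaultdict(int)   # ACK value y -> duplicate ACKs seen
        self.retransmitted = []            # seq #s resent by fast retransmit

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y             # new data acknowledged
            self.dup_acks.clear()
        else:
            # A duplicate ACK for an already ACKed segment.
            self.dup_acks[y] += 1
            if self.dup_acks[y] == self.DUP_ACK_THRESHOLD:
                self.retransmitted.append(y)   # resend segment with seq # y

sender_fr = FastRetransmitSender(send_base=92)
for ack in (100, 100, 100, 100, 100):      # one new ACK, then four duplicates
    sender_fr.on_ack(ack)
print(sender_fr.send_base, sender_fr.retransmitted)
```

The segment starting at y is resent once, on the third duplicate, and further duplicates do not trigger another resend in this sketch.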