Slides: 63
Provided by: skb
Title: TCP


1
TCP
10
2
TCP purpose
  • TCP provides reliable data transmission over an
    unreliable network.
  • TCP provides congestion control
  • TCP provides flow control
  • TCP passes messages
  • Inputs
      • Destination address
      • Destination port
      • Source port (socket)
      • Message
  • Outputs
      • Message
      • Error reporting
  • If TCP reports that the message has been
    delivered, then we can rest assured that the
    receiving TCP has received the data. What
    the application does with it is another story.
  • At least 85% of all traffic uses TCP, but I heard
    that 50% of traffic in S. Korea uses UDP (gaming).
  • UDP
      • No flow control
      • No error reporting (little error reporting)

[Figure: protocol stack showing BGP, FTP, HTTP, SMTP, telnet, ICMP,
UDP, OSPF, TCP, and IP.]
3
TCP header
  • IP header is 20 bytes (source IP, destination IP,
    protocol, TTL, ...)
  • TCP header is 20 bytes

[Figure: TCP header layout. Fields: source port, destination port,
sequence number, ACK number, header length (4 bits), reserved (6 bits),
flags (URG, ACK, PSH, RST, SYN, FIN), receive window (16 bits),
checksum (16 bits), urgent pointer (16 bits), options and padding.]
4
  • Ports are used so a single host can have many
    connections at the same time. When a packet
    arrives, it is distinguished by the source IP,
    source port, destination IP, and destination
    port. More or less, the IPs and ports define an
    application.
  • The sequence number indicates the 1st byte of the
    data.
  • The ACK number is the next expected sequence number.
  • Header length is in 32-bit words. 4 bits means the
    max size is 60 bytes. 20 bytes are used by the
    header, so up to 40 bytes more could be in
    options.
  • Flags
  • URG: urgent ptr (urgent data and a valid urgent
    ptr, e.g., ctrl-c)
  • ACK: the ACK number is valid
  • PSH: push. The receiver should pass this
    data to the application as soon as possible. As
    opposed to what? This should be set when this
    packet will empty the outgoing buffer, so the
    receiver should not wait for a full buffer before
    passing data to the app. Just pass it on now.
  • RST: reset connection (something went wrong;
    good for detecting attacks).
  • SYN: synchronize sequence number
  • FIN: sender is finished sending data
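
The field layout above can be sketched as a small decoder. This is only an illustration, assuming the standard 20-byte fixed header; the helper name `parse_tcp_header` and the example port numbers (1234, 80) are hypothetical, and the sequence number 2197 is borrowed from the handshake slide.

```python
import struct

FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # bit 0 .. bit 5


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (hypothetical helper)."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4  # 4-bit field counts 32-bit words
    flag_bits = offset_flags & 0x3F        # low 6 bits hold the flags
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,
        "flags": {name for i, name in enumerate(FLAG_NAMES) if flag_bits & (1 << i)},
        "window": window,
        "checksum": checksum,
        "urgent_ptr": urgent,
        "options": segment[20:header_len],
    }


# A SYN with no options: header length 5 words, only the SYN bit set.
example = struct.pack("!HHIIHHHH", 1234, 80, 2197, 0, (5 << 12) | 0x02, 65535, 0, 0)
hdr = parse_tcp_header(example)
```

With a header length field of 15, `header_len` would be 60 bytes, which is the maximum mentioned above.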

5
connection establishment
Node A initiates a connection with node B => node
A performs an active open, node B a passive open
(listen).
[Figure: three-way handshake between source and destination.]
  1. Source sends SYN: SYN=1, seq=2197, ACK=0
  2. Destination sends SYN/ACK: SYN=1, seq=197, ACK=1, ack=2198
  3. Source sends ACK (for the SYN): ACK flag=1, ack=198, seq=2198
The initial sequence number depends on the implementation.
6
Connection establishment
  • If the first SYN is dropped, it is resent 3
    seconds later. If that is also dropped, it is
    resent 6 seconds later, and so on. The maximum
    waiting time is 64 seconds, but it can be as high
    as 180 seconds, depending on the
    implementation.
  • If the listener doesn't get an ACK, it will
    retransmit after 3 seconds and back off in the
    same way.
  • But if the listener gets a data packet, the ACK
    flag will be set, and this will end the connection
    establishment.
  • Often during connection establishment, connection
    setup data is included in the options.
  • E.g., the segment size is included in the options.
  • More options are discussed later.

7
Connection termination
  • The FIN flag implies no more data will be sent from
    that host.
  • A FIN from each side closes the connection.
  • A FIN from only one side puts the connection in
    the half-closed state.
  • Example
  • Node A sends first
  • A sends a pkt with FIN=1 and seq=U (A enters
    FIN_WAIT_1)
  • B responds with ACK and ack=U+1 (B enters
    CLOSE_WAIT)
  • A receives the ACK (A enters FIN_WAIT_2)
  • Now B closes
  • B sends a pkt with FIN set and seq=V (B enters
    LAST_ACK)
  • A responds with ACK and ack=V+1 (A enters
    TIME_WAIT, stays there for 120 seconds, and
    then enters CLOSED)
  • B receives the ACK and enters CLOSED.
  • Use netstat to determine the state of the TCP
    connections.

8
Sending data
  • Either side can send data. The sequence number
    indicates where the first byte is placed in the
    receiver buffer.
  • The receiver responds with an ACK; the ack
    indicates the next empty byte location in the
    buffer.

[Figure: receiver buffer indexed by sequence number, 15 to 22. The SYN
had seq=14, and bytes 15 to 19 hold "Steve".]
Sender:   Seq=20 Ack=1001 Data "Hi", size 2 (bytes)
Receiver: "Hi" is placed at 20 and 21
Receiver: Seq=1001 Ack=22 Data size 0
9
[Figure: out-of-order arrival. The SYN had seq=14, and bytes 15 to 19 of
the receiver buffer hold "Steve".]
Sender:   Seq=20 Ack=1001 Data "Hi", size 2 (bytes), delayed in the network
Sender:   Seq=22 Ack=1001 Data "Bye", size 3 (bytes), arrives first
Receiver: "Bye" is placed at 22 to 24, leaving a gap at 20 and 21
Receiver: Seq=1001 Ack=20 Data size 0
Receiver: "Hi" arrives and fills the gap
Receiver: Seq=1001 Ack=25 Data size 0
Note that here the receiver is not sending data, so
its seq num never changes and the ack in the
sender's packets never changes. But the definitions
of the ack and seq numbers remain valid. Note that
SYN and FIN packets are special cases: they carry no
data, but the ACKs still increment.
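
The cumulative ACK behavior in the last few slides can be sketched as follows. This is a minimal illustration, assuming segments never overlap partially; the `Receiver` class and its attribute names are hypothetical.

```python
class Receiver:
    """Toy receiver that returns the cumulative ack for each arriving segment."""

    def __init__(self, next_expected: int):
        self.next_expected = next_expected  # next empty byte location
        self.pending = {}                   # out-of-order segments, keyed by seq
        self.delivered = b""                # in-order bytes ready for the app

    def receive(self, seq: int, data: bytes) -> int:
        if seq >= self.next_expected:
            self.pending[seq] = data
        # Drain every segment that is now contiguous with the in-order data.
        while self.next_expected in self.pending:
            chunk = self.pending.pop(self.next_expected)
            self.delivered += chunk
            self.next_expected += len(chunk)
        return self.next_expected  # the ack number to send back


rx = Receiver(next_expected=20)
ack_after_bye = rx.receive(22, b"Bye")  # arrives first, gap at 20 and 21
ack_after_hi = rx.receive(20, b"Hi")    # fills the gap
```

The first call returns 20 (nothing new is in order) and the second returns 25, as in the out-of-order example.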
10
Retransmission time-out
  • How to decide when a packet should be
    retransmitted?
  • Two methods. Here we talk about the first: when
    the ACK has not been received in a long time, TCP
    assumes that the packet was dropped.
  • How long is a long time? No good solution.

Van Jacobson's algorithm
This does not work all that well. Really, it is
MinRTO that controls when time-outs occur. But
more analysis is required.
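
The slide's own formula did not survive, so the following is a sketch of the estimator in the form standardized in RFC 6298 (smoothed RTT plus four times the RTT variation), not necessarily what the slide showed. The class name and the `min_rto` default of 1 second are assumptions.

```python
class RtoEstimator:
    """Smoothed RTT / RTT variance estimator in the style of RFC 6298."""

    def __init__(self, min_rto: float = 1.0, alpha: float = 0.125, beta: float = 0.25):
        self.min_rto = min_rto
        self.alpha = alpha
        self.beta = beta
        self.srtt = None
        self.rttvar = None

    def update(self, rtt: float) -> float:
        """Feed one RTT sample (seconds) and return the new RTO."""
        if self.srtt is None:
            # First measurement initializes both estimates.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - rtt)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * rtt
        return max(self.min_rto, self.srtt + 4 * self.rttvar)


rto_no_floor = RtoEstimator(min_rto=0.0).update(0.1)    # 0.1 + 4 * 0.05
rto_with_floor = RtoEstimator(min_rto=1.0).update(0.1)  # the floor dominates
```

With a 100 ms RTT, the computed value is about 0.3 s, so a 1 s floor is what actually decides when the timer fires, which is the point made above about MinRTO.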
11
RTO analysis
Using the July 25, 2001 snapshot of round-trip
times from the NLANR data set, we computed the
empirical probability of spurious timeouts. The
total data set consists of nearly 13000
connections between 122 sites and 17.5 million
round-trip time measurements. The data
consists of a time series of round-trip times for
each connection, with each time series containing
1440 round-trip times (one sample per minute over
the entire day).
12
Detecting drops with triple Dup ACKs
[Figure: receiver buffer indexed by sequence number, 15 to 35. Bytes 15
to 19 hold "Steve". The exchange is:]
Sender:   Seq=20 Ack=1001 Data "Hi", size 2 (bytes)
Receiver: Seq=1001 Ack=22 Data size 0
Sender:   Seq=22 Ack=1001 Data "Bye", size 3 (bytes), dropped
Sender:   Seq=25 Ack=1001 Data "Wazup", size 5 (bytes)
Receiver: Seq=1001 Ack=22 Data size 0 Rwin=2 (first DUP ACK)
Sender:   Seq=30 Ack=1001 Data "Give", size 4 (bytes)
Receiver: Seq=1001 Ack=22 Data size 0 Rwin=2 (second DUP ACK)
Sender:   Seq=34 Ack=1001 Data "Me", size 2 (bytes)
Receiver: Seq=1001 Ack=22 Data size 0 Rwin=2 (third DUP ACK)
Sender:   Seq=22 Ack=1001 Data "Bye", size 3 (bytes), retransmitted
Receiver: Seq=1001 Ack=36 Data size 0 Rwin=2
13
Why triple dup ACK?
  • Why not one DUP ACK?
  • Bennett and Partridge, "Packet reordering is not
    pathological network behavior," 1999. This paper
    showed that packet reordering can and does occur.
    Further research into this could be a project.
  • The reason for the packet reordering is that the
    routers have parallel paths through them. So,
    depending on the order of arrival and the packet
    sizes, the incoming order will be different from
    the outgoing order.
  • Supposedly this was only a problem with older
    model Juniper routers. There are many of these
    routers out there. Cisco field day!
  • Reordering only happens when the packets arrive
    at nearly the same time. This might not happen
    that much in TCP (see ACK clocking later).
  • However, this is an active research area.
  • Load balancing can cause packets to take
    different paths. This can cause reordering. Load
    balancing is a good project topic.
  • Route flap can also cause reordering.
  • Why not a larger DupThresh (larger than 3)?
  • This causes other problems.
  • Limited transmit can help. See my papers on
    TCP-PR for details.
  • Using triple DUP ACKs instead of RTO is called
    fast retransmit because the drop is detected
    faster.
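
A minimal sketch of the duplicate-ACK counting behind fast retransmit, assuming the sender only looks at the stream of ack numbers it receives. The function name is hypothetical; the threshold of 3 is the one discussed above.

```python
def fast_retransmit_points(acks, dup_thresh=3):
    """Return the ack numbers at which a fast retransmit would be triggered."""
    triggers = []
    last_ack = None
    dups = 0
    for ack in acks:
        if ack == last_ack:
            dups += 1
            if dups == dup_thresh:
                triggers.append(ack)  # retransmit the segment starting at `ack`
        else:
            last_ack = ack
            dups = 0
    return triggers


# One original ACK for 22, then three duplicates, then the recovery ACK.
dropped = fast_retransmit_points([22, 22, 22, 22, 36])
# Mild reordering produces only one duplicate, so nothing is retransmitted.
reordered = fast_retransmit_points([22, 22, 25])
```

The second call shows why a threshold of one would be too eager: a single reordered packet would already look like a drop.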

14
Flow control so the receiver doesn't get
overwhelmed.
  • The amount of unacknowledged data must be less
    than the receiver window.
  • As the receiver's buffer fills, the receiver
    decreases the receiver window.

[Figure: receiver buffer indexed by sequence number. The SYN had seq=14,
and bytes 15 to 19 hold "Steve".]
Sender:   Seq=20 Ack=1001 Data "Hi", size 2 (bytes)
Receiver: Seq=1001 Ack=22 Data size 0 Rwin=2
Sender:   Seq=22 Ack=1001 Data "By", size 2 (bytes)
Receiver: Seq=1001 Ack=24 Data size 0 Rwin=0
Application reads the buffer; the buffer now covers seq 24 to 31
Receiver: Seq=1001 Ack=24 Data size 0 Rwin=9
Sender:   Seq=24 Ack=1001 Data "e", size 1 (bytes)
15
Flow control so the receiver doesn't get
overwhelmed.
[Figure: same exchange as on the previous slide, up to Rwin=0.]
Receiver: Seq=1001 Ack=24 Data size 0 Rwin=0
Application reads the buffer; the buffer now covers seq 24 to 31
Receiver: Seq=1001 Ack=24 Data size 0 Rwin=9
After 3 s, the sender sends a window probe:
Sender:   Seq=24 Ack=1001 Data "", size 0 (bytes)
Receiver: Seq=1001 Ack=24 Data size 0 Rwin=9
Sender:   Seq=24 Ack=1001 Data "e", size 1 (bytes)
16
Flow control so the receiver doesn't get
overwhelmed.
[Figure: same exchange as on the previous slide, up to Rwin=0, but the
application does not read the buffer.]
Receiver: Seq=1001 Ack=24 Data size 0 Rwin=0
After 3 s, the sender sends a window probe:
Sender:   Seq=24 Ack=1001 Data "", size 0 (bytes)
Receiver: Seq=1001 Ack=24 Data size 0 Rwin=0
After 6 s, the sender probes again:
Sender:   Seq=24 Ack=1001 Data "", size 0 (bytes)
Max time between probes is 60 or 64 seconds.
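
The probe timing can be sketched as a capped back-off. The slides only show 3 s, then 6 s, then a cap of 60 or 64 s; that the interval keeps doubling in between is an assumption, and `probe_intervals` is a hypothetical helper.

```python
def probe_intervals(count, first=3.0, cap=60.0):
    """Waiting times (seconds) before each of `count` window probes."""
    intervals = []
    wait = first
    for _ in range(count):
        intervals.append(min(wait, cap))
        wait *= 2  # assumed doubling between probes
    return intervals


schedule = probe_intervals(6)  # → [3.0, 6.0, 12.0, 24.0, 48.0, 60.0]
```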
17
Receiver window
  • The receiver window field is 16 bits.
  • Default receiver window
  • By default, the receiver window is in units of
    bytes.
  • Hence 64 KB is the max receiver window for any
    (default) implementation.
  • Ethernet segments are 1500 bytes (TCP data
    1460).
  • So that would give 44 packets.
  • If the bit-rate were 10 Mbps, what is the RTT such
    that this window size is equal to the bandwidth
    delay product?
  • Receiver window scale
  • During the SYN, one option is receiver window scale.
  • This option provides the amount by which to shift
    the receiver window.
  • E.g., if rec win scale = 4 and rec win = 10, then
    the real receiver window is 10 << 4 = 160 bytes.
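
The arithmetic on this slide can be checked with a few lines. The helper names are hypothetical; the numbers (16-bit window, 1460-byte payload, 10 Mbps, scale 4) come from the bullets above.

```python
def effective_window(rwin_field: int, scale: int) -> int:
    """Receiver window in bytes after applying the window scale shift."""
    return rwin_field << scale


def rtt_for_full_pipe(window_bytes: int, bit_rate_bps: float) -> float:
    """RTT (seconds) at which the window equals the bandwidth delay product."""
    return window_bytes * 8 / bit_rate_bps


max_default_window = 2**16 - 1                  # largest value of a 16-bit field
packets_per_window = max_default_window // 1460
rtt = rtt_for_full_pipe(max_default_window, 10e6)
scaled = effective_window(10, 4)
```

Under these assumptions the unscaled window fills a 10 Mbps path only if the RTT is at most about 52 ms; longer paths need the window scale option.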

18
Congestion Control
  • Make sure not to overwhelm the network.
  • How much data to put into the network?
  • The sender maintains the congestion window
    (cwnd), which is the maximum number of
    unacknowledged packets.
  • InFlight is the number of unacked packets.
  • If InFlight < cwnd, then a packet can be sent.
  • When an ACK arrives, InFlight decreases, so
    another packet can be sent.

19
Suppose that cwnd = 4 MSS.
MSS is the maximum segment size: the min of the
segment sizes of sender and receiver. It is
negotiated during the SYN.
Suppose MSS = 1000.
Sender:   Seq=20 Ack=1001 Data, size 1 MSS (bytes)    InFlight = 1 MSS
Sender:   Seq=1020 Ack=1001 Data, size 1 MSS (bytes)  InFlight = 2 MSS
Sender:   Seq=2020 Ack=1001 Data, size 1 MSS (bytes)  InFlight = 3 MSS
Sender:   Seq=3020 Ack=1001 Data, size 1 MSS (bytes)  InFlight = 4 MSS
Receiver: Seq=1001 Ack=1020 Data size 0               InFlight = 3 MSS
Sender:   Seq=4020 Ack=1001 Data, size 1 MSS (bytes)  InFlight = 4 MSS
Receiver: Seq=1001 Ack=2020 Data size 0               InFlight = 3 MSS
Sender:   Seq=5020 Ack=1001 Data, size 1 MSS (bytes)  InFlight = 4 MSS
20
[Figure: same exchange as on the previous slide, with cwnd = 4 MSS and
MSS = 1000.]
ACK clocking: what is the maximum rate at which ACKs
can arrive at the sender?
21
ACK clocking
[Figure: path with two 100 Mbps links and one 10 Mbps link.]
Packets can leave here at 100 Mbps.
22
ACK clocking
[Figure: path with two 100 Mbps links and one 10 Mbps link.]
Packets can leave here at 100 Mbps.
Packets leave here at a rate of 10 Mbps.
What rate do packets leave here?
23
ACK clocking
[Figure: path with two 100 Mbps links and one 10 Mbps link, shown once
for data packets and once for ACKs.]
What rate do packets leave here? Ans: 10 Mbps;
they arrive at 10 Mbps.
What about the ACKs?
What rate do ACKs leave here?
24
ACK clocking
[Figure: path with two 100 Mbps links and one 10 Mbps link, shown once
for data packets and once for ACKs.]
What rate do ACKs leave here? Ans: 40/1040 *
10 Mbps. Or, at a rate such that if a packet is sent
for each ACK, then the rate at which the packets are
sent is 10 Mbps.
What about the packets?
25
ACK clocking
[Figure: path with two 100 Mbps links and one 10 Mbps link, shown once
for data packets and once for ACKs.]
Packets can leave here at 100 Mbps.
Packets leave here at a rate of 10 Mbps.
What rate do packets leave here? Ans: 10 Mbps;
they arrive at 10 Mbps.
What about the ACKs?
What rate do ACKs leave here? Ans: 40/1040 *
10 Mbps. Or, at a rate such that if a packet is sent
for each ACK, then the rate at which the packets are
sent is 10 Mbps.
What about the packets? 10 Mbps. Perfect!
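
The "40/1040" answer can be reproduced numerically. This sketch assumes 1040-byte data packets (1000 bytes of data plus 40 bytes of headers) and 40-byte ACKs, which is how the fraction above is read here; `ack_clock` is a hypothetical helper.

```python
def ack_clock(bottleneck_bps: float, packet_bytes: int = 1040, ack_bytes: int = 40):
    """Return (ACK bit-rate, resulting send bit-rate) behind a bottleneck link."""
    packets_per_s = bottleneck_bps / (packet_bytes * 8)  # rate leaving the bottleneck
    ack_bps = packets_per_s * ack_bytes * 8              # one ACK per data packet
    send_bps = packets_per_s * packet_bytes * 8          # one new packet per ACK
    return ack_bps, send_bps


ack_bps, send_bps = ack_clock(10e6)
```

The ACK stream carries about 0.385 Mbps, and sending one packet per ACK reproduces the 10 Mbps bottleneck rate.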
26
Congestion control
  • ACK clocking makes the sender not send any faster
    than the bottleneck link speed.
  • But how to fill the pipe?

We only send cwnd packets in a burst. How big
should cwnd be?
[Figure: timeline alternating between sending at a burst rate of
10 Mbps and periods with no packets sent (wasted bandwidth).]
27
Congestion control
  • ACK clocking makes the sender not send any faster
    than the bottleneck link speed.
  • But how to fill the pipe?

The number of pkts sent in one RTT is cwnd. In
order not to waste bandwidth, how many
packets should be sent?
[Figure: timeline of bursts, with one RTT marked.]
28
Congestion control
  • ACK clocking makes the sender not send any faster
    than the bottleneck link speed.
  • But how to fill the pipe?

The number of pkts sent in one RTT is cwnd. In
order not to waste bandwidth, how many
packets should be sent?
cwnd (bytes) = link byte-rate (bytes/s) * RTT (s),
where the link byte-rate is the bottleneck link's speed.
Bandwidth delay product = link byte-rate (bytes/s) * RTT (s)
29
Congestion control
  • Ideally, cwnd = bandwidth delay product.
  • This ignores fairness. If there are N flows that
    also use the same link, then ideally cwnd =
    bandwidth delay product / N.
  • But how to find this value?
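
As a worked example of the ideal window, assuming the flows share the link equally: the link speed, RTT, and flow count below are made-up illustration values, and `ideal_cwnd_bytes` is a hypothetical helper.

```python
def ideal_cwnd_bytes(link_bps: float, rtt_s: float, flows: int = 1) -> float:
    """Bandwidth delay product in bytes, split evenly across `flows`."""
    bdp_bytes = link_bps / 8 * rtt_s
    return bdp_bytes / flows


single_flow = ideal_cwnd_bytes(10e6, 0.1)          # 10 Mbps, 100 ms RTT
five_flows = ideal_cwnd_bytes(10e6, 0.1, flows=5)  # same link shared by 5 flows
```

A sender does not know the link rate, the RTT of the other flows, or N, which is why TCP has to probe for this value.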

30
TCP congestion control
  • Theme: probe the system.
  • Slowly increase cwnd until there is a packet
    drop. That must imply that the cwnd size (or the sum
    of the window sizes) is larger than the BWDP.
  • Once a packet is dropped, decrease cwnd,
    and then continue to slowly increase.
  • Two phases
  • Slow start, to get to the ballpark of the correct
    cwnd
  • Congestion avoidance, to oscillate around the
    correct cwnd size.

[Figure: state diagram with the states connection establishment, slow
start, congestion avoidance, and connection termination. Transitions
are labeled "cwnd > ssthresh", "triple dup ACK", and "timeout".]
31
Slow start
  • Used when the connection first starts (and after a
    timeout for today's TCPs).
  • Cwnd starts at 1 or 2 MSS.
  • For each non-dup ACK received, the window size
    increases by one.
  • This increase continues until the window
    reaches the value of ssthresh.
  • The initial value of ssthresh is often large
    (taken as infinite), so Rwin limits the
    growth of the window.
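
Slow start can be sketched per RTT as below. This is a simplification, assuming one ACK per packet (no delayed ACKs), no losses, and windows counted in MSS; the function name and the `rwin` default are hypothetical.

```python
def slow_start(rtts: int, initial_cwnd: int = 1, ssthresh: float = float("inf"), rwin: int = 64):
    """Return cwnd (in MSS) at the start of each RTT during slow start."""
    cwnd = initial_cwnd
    history = [cwnd]
    limit = min(ssthresh, rwin)
    for _ in range(rtts):
        acks_this_rtt = cwnd  # every packet of the window is ACKed once
        for _ in range(acks_this_rtt):
            if cwnd >= limit:
                break
            cwnd += 1  # +1 MSS per non-dup ACK
        history.append(cwnd)
    return history


unlimited = slow_start(4)         # → [1, 2, 4, 8, 16]
capped = slow_start(4, rwin=10)   # → [1, 2, 4, 8, 10]
```

Adding one MSS per ACK doubles the window every RTT, and with an infinite ssthresh only the receiver window stops the growth.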

32
Slow start
[Figure: time-sequence diagram of slow start. After the handshake
(SYN Seq=20; SYN Seq=1000 Ack=21; Seq=21 Ack=1001), the sender sends
1000-byte segments starting at Seq=21, and the receiver replies with
size-0 ACKs (Seq=1001). The cwnd column grows from 1 to 8.]
The pipe is full!
33
Slow start
[Figure: time-sequence diagram of slow start with each RTT marked. The
cwnd column grows from 1 to 8.]
Cwnd doubles every RTT!
The pipe is full!
What is happening here? RTT??
34
Slow start
[Figure: time-sequence diagram of slow start with each RTT marked. The
cwnd column grows from 1 to 8.]
Cwnd doubles every RTT!
What is happening here? Now the queue is
filling. Either it will fill and drop a packet, or
the RecWin will stop cwnd from increasing.
35
  • If RecWin != inf and RecWin < bandwidth delay product
    + queue size, and there are no other packets,
    then there will never be a drop. Lots of
    conditions, but a large number of flows do not
    experience drops.
  • If RecWin = ssthresh = inf and the outgoing link of
    the sender is not the bottleneck, then eventually
    there will be a drop. If the drop is detected
    with a triple dup ACK, then cwnd = cwnd/2 and
    congestion avoidance is entered.
  • If the drop(s) is (are) detected with a timeout,
    then ssthresh = cwnd/2, cwnd = 1, and slow start
    continues.
  • If ssthresh < bandwidth delay product + queue size
    and RecWin > ssthresh, then congestion avoidance is
    entered.

36
Congestion Avoidance
Basics: additive increase, multiplicative decrease
(AIMD)! Rough view: for every cwnd's worth of
packets, cwnd is incremented by one. When there
is a drop, cwnd = cwnd/2.
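
The rough view can be simulated one RTT at a time. This sketch assumes a drop happens whenever cwnd exceeds a fixed `capacity` (a stand-in for bandwidth delay product plus queue size); the function and parameter names are hypothetical.

```python
def aimd(rounds: int, capacity: float, cwnd: float = 1.0):
    """Return cwnd at the start of each RTT under the rough AIMD rule."""
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        if cwnd > capacity:
            cwnd = cwnd / 2   # multiplicative decrease after a drop
        else:
            cwnd = cwnd + 1   # additive increase: +1 per cwnd's worth of packets
    return history


trace = aimd(8, capacity=4, cwnd=3.0)
# → [3.0, 4.0, 5.0, 2.5, 3.5, 4.5, 2.25, 3.25]
```

The trace oscillates around the capacity, which is the sawtooth usually drawn for congestion avoidance.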
[Figure: time-sequence diagram of congestion avoidance. Columns: Seq
(MSS) and cwnd. Segments 1 to 24 are sent, segment 15 is dropped and
repeatedly ACKed, and cwnd is halved after the drop.]
37
Rough view of TCP congestion control
[Figure: plot of cwnd over time, with phases labeled slow start,
congestion avoidance, and slow start again, and events labeled "drop"
and "drops".]
38
TCP - more detailed view
  • Delayed ACKs
  • The worry was that the network was going to be
    all jammed up with ACKs.
  • So instead of sending an ACK for every pkt, delay
    the ACK and maybe ACK two packets.
  • Generate an ACK for at least every other packet.
  • Don't delay an ACK by more than 500 ms. (The exact
    number depends on the implementation.)
  • If packets are out of order, generate an ACK for
    every packet.
  • Also, immediately send an ACK when a gap in the
    buffer is filled.
  • Delayed ACKs can greatly slow down a connection.
  • E.g., the ACK for the first packet is delayed by
    500 ms.
  • Depending on the implementation, cwnd will grow
    more slowly.

39
Details - Fast recovery
  • cwnd after a drop
  • Recall, TCP only sends packets when InFlight <
    cwnd.
  • InFlight only decreases when a new ACK is
    received, i.e., a DUP ACK does not cause InFlight
    to change.
  • If a DUP ACK arrives, it means that a packet
    arrived at the receiver and an ACK was sent. So
    the number of packets in the network has
    decreased, and InFlight should decrease.
  • But maybe the network has duplicated the ACK. To
    be conservative, leave InFlight as is (I guess).

40
Fast recovery
  • Upon the first two DUP ACKs, do nothing. Don't
    send any packets (InFlight is the same).
  • Upon the third DUP ACK,
  • set ssthresh = cwnd/2.
  • cwnd = cwnd/2 + 3
  • Retransmit the requested packet.
  • Upon every further DUP ACK, cwnd = cwnd + 1.
  • If InFlight < cwnd, send a packet and increment
    InFlight.
  • When a new ACK arrives, set cwnd = ssthresh (Reno).
  • When an ACK arrives that ACKs all packets that
    were outstanding when the first drop was
    detected, set cwnd = ssthresh (NewReno).
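
The Reno rules above can be sketched as a small state holder. It deliberately leaves out InFlight bookkeeping and normal window growth, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RenoSender:
    cwnd: float
    ssthresh: float = float("inf")
    dup_acks: int = 0
    in_recovery: bool = False

    def on_dup_ack(self) -> bool:
        """Process one DUP ACK; return True if the missing packet is retransmitted."""
        self.dup_acks += 1
        if self.dup_acks == 3 and not self.in_recovery:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.cwnd / 2 + 3
            self.in_recovery = True
            return True
        if self.in_recovery:
            self.cwnd += 1  # each further DUP ACK inflates the window
        return False

    def on_new_ack(self) -> None:
        """Process a non-duplicate ACK (Reno exits recovery immediately)."""
        if self.in_recovery:
            self.cwnd = self.ssthresh
            self.in_recovery = False
        self.dup_acks = 0


s = RenoSender(cwnd=6)
retransmits = [s.on_dup_ack() for _ in range(3)]  # third DUP ACK triggers
cwnd_at_third = s.cwnd                            # 6/2 + 3
s.on_dup_ack()
cwnd_at_fourth = s.cwnd
s.on_new_ack()
cwnd_after_recovery = s.cwnd
```

Starting from cwnd = 6, the window is 6 after the third DUP ACK, 7 after the fourth, and 3 once the new ACK arrives.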

41
Fast recovery
[Figure: time-sequence diagram of fast recovery. Columns: Seq (MSS),
cwnd, and Inflight. Segment 15 is dropped; at the third DUP ACK, cwnd
goes from 6 to 6/2 + 3, then grows by one per further DUP ACK, and it
is set to 3 when the new ACK arrives.]
42
Fast recovery multiple drops - RENO
[Figure: time-sequence diagram of Reno fast recovery with two drops.
Columns: Seq (MSS), cwnd, and Inflight. Segment 12 is dropped first
(cwnd goes from 6 to 6/2 + 3 and then to 3), and segment 15 is dropped
as well, so cwnd is halved a second time and ends at 2.]
Why is this bad? The first drop told us that we
were sending too fast. The second drop tells us
the same thing (again). So why react to the
same news twice? Hence NewReno.
43
Fast Recovery multiple drops - NewReno
  • The problem was that one of the packets that was
    outstanding when the drop was detected was also
    dropped.
  • Solution (NewReno)
  • When a drop is detected,
      • ssthresh = cwnd/2
      • cwnd = cwnd/2 + 3
      • recover = seq of the largest byte sent.
      • Retransmit the dropped packet
  • Upon a DUP ACK, increment cwnd and send if
    Inflight < cwnd
  • If the ACK is larger than the previous ACK, but
    smaller than recover (partial ACK)
      • Suppose that the previous ack = X and now
        ack = Y < recover
      • Retransmit the dropped packet
      • cwnd = cwnd - (Y-X) + 1
      • Of course, Inflight = Inflight - (Y-X)
      • So transmit another packet (that makes two
        transmissions)
  • If ACK > recover,
      • cwnd = ssthresh
      • Exit fast recovery
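
The partial-ACK rule can be written as one function. This is a sketch in units of MSS; the names are hypothetical, and because the slides do not say how Inflight is updated on the final ACK, it is left unchanged there as a placeholder. The example numbers (previous ack 17, new ack 21, cwnd 19, recover 29) follow the arithmetic shown on a later slide, while `ssthresh=7` is an assumed value.

```python
from typing import NamedTuple


class RecoveryStep(NamedTuple):
    cwnd: int
    inflight: int
    retransmit: bool
    in_recovery: bool


def newreno_on_new_ack(ack, prev_ack, recover, cwnd, inflight, ssthresh):
    """Handle a non-duplicate ACK (ack > prev_ack) during NewReno fast recovery."""
    if ack > recover:
        # Everything outstanding at the first drop is ACKed: leave recovery.
        return RecoveryStep(ssthresh, inflight, False, False)
    acked = ack - prev_ack  # partial ACK: Y - X
    return RecoveryStep(cwnd - acked + 1, inflight - acked, True, True)


partial = newreno_on_new_ack(ack=21, prev_ack=17, recover=29,
                             cwnd=19, inflight=19, ssthresh=7)
full = newreno_on_new_ack(ack=30, prev_ack=21, recover=29,
                          cwnd=16, inflight=15, ssthresh=7)
```

For the partial ACK this gives cwnd = 19 - 4 + 1 = 16 and Inflight = 19 - 4 = 15, so the retransmission plus one new packet can be sent.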

44
Fast Recovery single drops - NewReno
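[Figure: time-sequence diagram of NewReno fast recovery with a single
drop. Columns: cwnd and Inflight. cwnd starts at 14, segment 17 is
repeatedly ACKed, recover = 29, and cwnd grows from 10 to 16 as DUP
ACKs arrive.]
Note how the actual number outstanding is always 7.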
45
Fast Recovery multiple drops - NewReno
[Figure: time-sequence diagram of NewReno fast recovery with multiple
drops. Columns: cwnd and Inflight. cwnd starts at 14, recover = 29, and
a partial ACK moves the ack from 17 to 21:
cwnd = 16 = 19 - (21-17) + 1 and Inflight = 15 = 19 - 4. Fast recovery
exits with cwnd = 7.]
NewReno sends two packets for every ACK
indicating a multiple drop.
2 drops take 2 RTTs to recover. N drops take N
RTTs to recover. If N * RTT > RTO, then slow-and-steady
=> no time-out; impatient => time-out.
46
Other things
  • Idle restart
      • If no packet has been sent in RTO seconds
      • ssthresh = cwnd
      • cwnd = 1
      • Slow start
      • Avoids big bursts after idle times
      • E.g., get data from disk
      • HTTP 1.1
  • Timeout exponential back-off
      • If no ACK arrives before the RTO timer expires,
        then time-out
      • ssthresh = cwnd/2, cwnd = 2, slow start
      • RTO = min(2 * RTO, 64 s)
      • If the next packet is dropped, then the wait is
        longer
      • Gives up after 9-12 tries, but this is
        implementation dependent (ns never stops)
  • If a retransmitted packet is dropped, TCP times out.
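
The time-out rule can be sketched as a pure function. The restart window of 2 follows this slide (an earlier slide uses 1), so it is exposed as a parameter; the function name and the starting RTO of 1 s are assumptions.

```python
def on_timeout(cwnd: float, rto: float, restart_cwnd: float = 2, max_rto: float = 64.0):
    """Return (ssthresh, cwnd, rto) after a retransmission time-out."""
    ssthresh = cwnd / 2
    return ssthresh, restart_cwnd, min(2 * rto, max_rto)


# Repeated time-outs: the RTO doubles until it hits the 64 s cap.
rto = 1.0
cwnd = 16.0
rto_trace = []
for _ in range(8):
    ssthresh, cwnd, rto = on_timeout(cwnd, rto)
    rto_trace.append(rto)
# rto_trace → [2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 64.0, 64.0]
```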

47
Dup ACKs after timeout
[Figure: time-sequence diagram with cwnd and Inflight columns and
recover = 29. A partial ACK gives cwnd = 16 = 19 - (21-17) + 1 and
Inflight = 15 = 19 - 4. The sender eventually times out, and DUP ACKs
for segments sent before the time-out arrive afterwards.]
Set send_high to the maximum seq sent. If DUP ACKs
are received for segments less than send_high,
assume they do not indicate a drop. If there
was a drop, then there will be a time-out.
48
Selective Acknowledgment (SACK): the latest
widespread congestion control
  • Problem: when multiple packets are dropped, the
    cumulative ACK does not give information as to
    which packets were dropped. As a result, fast
    recovery is not so fast; it takes one RTT per
    lost packet.
  • Solution: embed into the ACK some information
    about which packets have successfully arrived.
  • TCP-SACK allows ACKs to contain information about
    received packets.
  • If the packets are received in order, then the
    ACK looks the same as in TCP-Reno or TCP-NewReno.
    But if the packets arrive out of order,
    then the ACK contains SACK blocks.
  • A SACK block indicates a sequence of segments
    that have been received.

[Figure: sequence-number line from 15 to 35. Segments are marked
A (ACKed), S (SACKed, in two separate blocks), or N (not sent).]
49
TCP-SACK
SACK blocks are 8 bytes long (4 bytes for each
edge). The SACK option includes 1 byte to specify
that it is a SACK option and one byte for the
length, which fixes the number of SACK blocks.
  • 1 SACK block: 10 bytes + 2 bytes padding -> 52 bytes header
  • 2 SACK blocks: 18 bytes + 2 bytes padding -> 60 bytes header
  • 3 SACK blocks: 26 bytes + 2 bytes padding -> 68 bytes header
  • 4 SACK blocks: 34 bytes + 2 bytes padding -> 76 bytes header
Max ACK is 80 bytes. If the time stamp option is used, then the max
number of SACK blocks is 3.
SACK option: kind=5, length=2
  left edge of 1st block = 20
  right edge of 1st block = 23
  left edge of 2nd block = 26
  right edge of 2nd block = 30
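
The header sizes listed above follow from simple arithmetic, sketched here. The helper names are hypothetical; the 40-byte option budget is the limit from the TCP header slide, and 12 bytes is the usual padded size of the timestamp option.

```python
def ack_header_bytes(sack_blocks: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """IP + TCP header size of an ACK carrying `sack_blocks` SACK blocks."""
    option = 2 + 8 * sack_blocks    # kind + length bytes, then 8 bytes per block
    padded = (option + 3) // 4 * 4  # options are padded to a 32-bit boundary
    return ip_header + tcp_header + padded


def max_sack_blocks(other_option_bytes: int = 0, option_space: int = 40) -> int:
    """How many SACK blocks fit next to the other TCP options."""
    return (option_space - other_option_bytes - 2) // 8


sizes = [ack_header_bytes(n) for n in (1, 2, 3, 4)]  # → [52, 60, 68, 76]
without_timestamps = max_sack_blocks()               # → 4
with_timestamps = max_sack_blocks(other_option_bytes=12)  # → 3
```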
50
Generation of SACKs
  1. No SACK blocks if there are no out-of-order packets.
  2. No delayed ACK if there are out-of-order packets
    (send an ACK for every received packet).
  3. When an out-of-order packet arrives, the first
    SACK block contains the segment that just
    arrived.
  4. The ACK should contain as many SACK blocks as fit
    and are required (no skimping to save bit-rate).
  5. The SACK blocks included should be those that
    have most recently been reported (see 3). So if
    there are at most 3 SACK blocks, then each
    continuous block of segments will be reported at
    least 3 times.
  6. If the packet that arrived has already been
    received (a duplicate reception), then the first
    SACK block should identify this packet. (This is
    the DSACK extension to SACK.) In this case, the
    next SACK block should indicate the continuous
    sequence of segments that contains the segment
    received in duplicate.

[Figure: sequence-number line from 15 to 35. Segments are marked
A (ACKed), S (SACKed, in two separate blocks), or N (not sent), and the
left and right edges of the SACKed blocks are marked.]
Now suppose that segment 21 arrives for a second
time.
SACK option: kind=5, length=2
  left edge of DUP packet = 21
  right edge of DUP packet = 22
  left edge of 1st block = 20
  right edge of 1st block = 23
  left edge of 2nd block = 26
  right edge of 2nd block = 30
51
DSACK
  • DSACK is used to identify packets that have been
    needlessly retransmitted.
  • The primary source of such retransmissions is
    packet reordering.
  • If such a retransmission occurs, it likely means
    that cwnd was divided by 2 needlessly.
  • DSACK helps identify these needless divides by
    two.
  • It is not clear what can be done once they are
    identified.
  • Many ideas have been suggested, but it remains to
    be seen if they actually improve things.
  • Ethan Blanton and Mark Allman, "On Making TCP More
    Robust to Packet Reordering" (2002), show that
    some improvement is possible.
  • Bohacek et al. show that if there is persistent
    reordering, more drastic measures are required.
  • Neither paper includes an analysis of the current
    situation in the Internet.
  • The current situation is not completely known.
  • The homework provides backbone traces with
    rampant reordering.
  • In my opinion (on 2/20/04), some sort of
    timer-based approach is necessary. The DUP ACK
    threshold approach is not appropriate because a
    burst of packets (as can be seen in the homework)
    can be very reordered. But reordering by more
    than a few milliseconds is very rare.
  • A project could examine this.

52
Eifel Detection
  • DSACK is only useful after the arrival of the
    second copy of the packet.
  • Eifel uses time-stamps to inform the sender that
    a packet that was thought to have been lost has
    actually arrived.

53
TCP-SACK (Sender side)
  • Slow start and the linear increase part of SACK
    are the same as in TCP-Reno/NewReno. The fast
    recovery part is different.
  • SACK provides more information about which
    packets have been lost. The sender can use this
    to determine
  • which packets to send
  • when to send packets
  • When to assume that a packet is lost
  • Reason 1: DupThresh continuous SACK blocks have
    been SACKed that have larger sequence numbers. The
    idea is that DupThresh packets have been SACKed
    with larger sequence numbers, but continuous SACK
    blocks are counted instead.
  • Reason 2: DupThresh * MSS bytes have been SACKed
    that have larger sequence numbers.

[Figure: SACK loss detection example with MSS = 5 bytes, DupThresh = 3,
and some "little packets" of 2 bytes. Packets cover sequence ranges such
as 15-19, 40-44, 65-69, 70-71, 72-73, 74-75, 76-77, 78-82, and 83-87.
Legend: A = ACKed, S = SACKed, N = not sent.]
  • One packet is assumed dropped because of reasons 1 and 2
  • Number of continuous SACK blocks with higher seq
    num: 4 >= DupThresh
  • Number of SACKed bytes with larger seq num: 25 >=
    MSS * DupThresh
  • One packet is assumed dropped because of reason 1 only
  • Number of continuous SACK blocks with higher seq
    num: 3 >= DupThresh
  • Number of SACKed bytes with larger seq num:
    9 < MSS * DupThresh
  • One packet is not assumed dropped.
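
The two loss rules can be sketched as a predicate over the SACK scoreboard. Blocks are represented here as half-open byte ranges `(left, right)`, which is an assumption of this sketch, and the example blocks are made up rather than read off the figure.

```python
def is_lost(seq, sacked_blocks, mss, dup_thresh=3):
    """Decide whether the segment starting at `seq` is assumed lost."""
    above = [(left, right) for (left, right) in sacked_blocks if left > seq]
    blocks_above = len(above)                               # reason 1
    bytes_above = sum(right - left for left, right in above)  # reason 2
    return blocks_above >= dup_thresh or bytes_above >= dup_thresh * mss


little_packets = [(70, 72), (74, 76), (78, 83)]  # three blocks, 9 bytes in total
by_blocks_only = is_lost(65, little_packets, mss=5)        # 3 blocks, 9 < 15 bytes
not_lost = is_lost(65, [(70, 72), (74, 76)], mss=5)        # 2 blocks, 4 bytes
by_bytes_only = is_lost(65, [(70, 85)], mss=5)             # 1 block, 15 bytes
```

Counting bytes as well as blocks matters when packets are small: three 2-byte packets form three blocks but carry far less than DupThresh * MSS bytes.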
54
Number in pipe or InFlight
  • If a packet has been sent, not lost, and not
    SACKed, then this packet is assumed to be in the
    pipe.
  • Any packet that has been retransmitted and not
    SACKed is also in the pipe.
  • Retransmissions happen in order (smallest seq num
    first, why?)
  • Let HighRX denote the highest segment that has
    been retransmitted.
  • Any lost packet that has not been SACKed and has a
    seq num less than HighRX has been retransmitted,
    so it is in the pipe.

55
Which packet to send next? (during fast recovery)
  • The next to transmit is the segment with the
    smallest seq num that satisfies all of the following:
  • The segment's seq num is greater than HighRX
  • The segment has a seq num less than the largest
    segment in a SACK block
  • The segment is assumed to be lost.

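[Figure: sequence-number line from 15 to 35 with segments marked
A (ACKed), S (SACKed), or N (not sent). HighRX marks the segment that
was already retransmitted; the next to be sent is a later unSACKed
segment below the highest SACKed block.]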
  • If the above is an empty set, then the next to be
    sent is smallest segment that has not yet been
    sent.
  • If the above is also empty (because there are no
    more packets to be sent),

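[Figure: sequence-number line from 15 to 35 with segments marked
A (ACKed), S (SACKed), or N (not sent). HighRX marks the segment that
was already retransmitted; the next to be sent is the first segment
that has not yet been sent.]
[Figure: sequence-number line from 15 to 25 ending at the end of the
file, with segments marked A (ACKed) or S (SACKed). HighRX marks the
segment that was already retransmitted, and the next to be sent is
marked.]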
56
TCP-SACK congestion control
  • When a loss is detected
  • Set RecoveryPoint = seq num of the highest segment
    sent. Fast recovery ends when this seq num is
    ACKed (SACKed is not good enough).
  • ssthresh = cwnd = Inflight/2
  • Retransmit the lost packet with the smallest seq num.
  • Set HighRX equal to the retransmitted packet
  • During recovery (until RecoveryPoint is ACKed)
  • If pipe < cwnd, then send the next to be sent.

57
TCP-SACK notes
  • After an RTO, the TCP-SACK sender starts fresh and
    erases SACK info from prior to the RTO (some of
    it might be regained in retransmissions of SACK
    blocks).
  • Like NewReno, the highest seq sent before an RTO
    is recorded, and a DUP ACK from a packet with a seq
    num less than this highest seq does not cause
    fast recovery/retransmit.
  • Like NewReno, the retransmit timer can be reset
    during recovery (slow and steady) or not
    (impatient).

58
TCP-SACK timeout
  • SACK, NewReno, etc. will time out if a
    retransmission is lost.
  • If SACK uses the same technique to increase cwnd
    as NewReno (i.e., cwnd = inflight/2 + 3) and
    more than cwnd/2 packets are lost, SACK
    will time out.
  • The ns implementation has this problem.

[Figure: time-sequence diagram with columns cwnd (NewReno), Inflight,
and pkt sent. cwnd starts at 14, segment 17 is repeatedly ACKed, and
Inflight stays at 14; eventually no more packets can be sent and the
sender times out.]
59
TCP-SACK burst
  • SACK, NewReno, etc. will time out if a
    retransmission is lost.
  • Multiple drops lead to a burst of packets being
    sent.

[Figure: time-sequence diagram with columns cwnd (SACK), pipe, and pkt
sent. After multiple drops, the sender retransmits 17, 18, 19, and 20
together: it has lost ACK clocking and sent a burst. Recovery ends once
the retransmissions are ACKed.]
60
Limited Transmit
  • When a packet is dropped and the window size is
    less than 4, TCP will always time out (not enough
    ACKs arrive to get a triple DUP ACK).
  • If, upon receiving a DUP ACK, a packet is
    transmitted, then there might be enough DUP ACKs
    to cause a fast retransmit and avoid a time-out.
  • Limited transmit allows a packet to be sent
    when the second DUP ACK is received. (In general,
    for every other DUP ACK.)
  • Even if a packet is lost, sending a packet for
    every other ACK is sending at half the bit-rate.
  • While this helps TCP avoid time-outs, it also
    makes this version of TCP far more aggressive for
    loss probabilities greater than about 1% (where
    time-outs become quite prevalent for non-limited
    transmit TCP).

[Figure: two time-sequence diagrams (columns Seq (MSS) and cwnd) for a
window of 3. Without limited transmit, the drop ends in a time-out.
With limited transmit: triple DUP ACK, no time-out.]
61
Limited Transmit
[Figure: two more limited-transmit time-sequence diagrams (columns Seq
(MSS) and cwnd). In both cases the sender reaches a triple DUP ACK.]
62
ECN
  • Sometimes the router will have a large enough
    queue to accept the packet, but the queue
    occupancy is beyond a threshold. So, in order to
    get the TCP flows to send at a slower
    rate, the router drops packets (even though
    there is room in the queue).
  • It's funny to drop packets when there is room in
    the queue, so another option is to mark the
    packets. The receiver should indicate in the ACK
    that the packet being ACKed was marked,
    and the sender should react to this marking as it
    would to a drop, except that there is no reason
    to retransmit the marked packet.
  • This approach has little impact in general,
    except, like limited transmit, when the loss
    probability is very high, where it can reduce
    timeouts.