Title: TCP: Reliable Transport Service
1. TCP: Reliable Transport Service
2. Introduction
- What was the most impressive aspect of the ENIAC computer?
- It was reliable in spite of the fact that its components (vacuum tubes) were unreliable
- Internet pioneers faced a similar challenge
  - To provide a reliable transport service over conceptually unreliable technology
3. Introduction (continued)
- IP datagram service is unreliable in that datagrams may be
  - Lost (usually dropped due to congestion)
  - Duplicated
  - Delivered out of order
- TCP provides reliable service nevertheless
4. The Service TCP Provides to Applications
- TCP is connection-oriented
- Each TCP connection has exactly two endpoints
- TCP is completely reliable
- TCP allows full duplex communication
- TCP provides a stream interface (unstructured)
- TCP uses a reliable connection startup
- And a graceful shutdown
5. TCP Connections
- Virtual
- An ordered pair of endpoints
- An endpoint is an ordered pair containing
  - An IP address
  - A TCP port number
- This information is kept by the OS
  - netstat
6. Achieving Reliability
- A major challenge
- One computer crashes during a file transfer
- It is rebooted and a second connection is established
- Messages from the two sessions get mixed up
- Packets from the first session must be rejected
7. Positive Acknowledgement with Retransmission
- Retransmission is one of many techniques
- Send a packet and start a timer
- Two possibilities
  - Packet arrives
    - Acknowledgement is sent back
    - Stop timer and proceed
  - Packet fails to arrive
    - Timer goes off
    - Resend packet and set another timer
8. Positive Acknowledgement
- http://www.calvin.edu/lave/figure-12.1.pdf
- Suppose an acknowledgement is lost.
- A duplicate packet will be sent.
- Sequence numbers allow the destination software to detect and discard the duplicate packet.
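A minimal stop-and-wait sketch of positive acknowledgement with retransmission, in Python. The network helpers (send, recv_ack, recv, send_ack, deliver) are hypothetical stand-ins for an unreliable delivery system, not part of any real TCP implementation.

    # Sketch: positive acknowledgement with retransmission (stop-and-wait).
    # recv_ack(timeout) is assumed to return an ack number, or None on timeout.

    def send_reliably(packets, send, recv_ack, timeout=1.0):
        seq = 0
        for data in packets:
            while True:
                send(seq, data)              # transmit packet, (re)start timer
                ack = recv_ack(timeout)      # wait for ack or timeout
                if ack == seq:               # ack arrived: stop timer, proceed
                    break
                # timer expired or wrong ack: resend the same packet
            seq += 1

    def receive_reliably(recv, send_ack, deliver):
        expected = 0
        while True:
            seq, data = recv()
            send_ack(seq)                    # always acknowledge what arrived
            if seq == expected:              # new packet: deliver it
                deliver(data)
                expected += 1
            # otherwise it is a duplicate (our earlier ack was lost): discard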
9. The Idea Behind Sliding Windows
- Stop-and-wait wastes network bandwidth
- Sliding window protocols allow the sender to transmit multiple packets before waiting for an acknowledgement (sketched below)
- http://www.calvin.edu/lave/figure-12.3.pdf
- http://www.calvin.edu/lave/figure-12.4.pdf
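A minimal sliding-window sender sketch, assuming a fixed window of a few packets, cumulative acknowledgements, and a Go-Back-N style retransmission on timeout. The send and poll_ack helpers are hypothetical.

    # Sketch: sliding window sender with cumulative acks.
    # poll_ack() is assumed to return the highest cumulatively acknowledged
    # sequence number so far, or -1 if nothing has been acknowledged.

    def sliding_window_send(packets, send, poll_ack, window=4):
        base = 0          # oldest unacknowledged packet
        next_seq = 0      # next packet to transmit
        while base < len(packets):
            # keep up to 'window' packets in flight
            while next_seq < len(packets) and next_seq < base + window:
                send(next_seq, packets[next_seq])
                next_seq += 1
            acked = poll_ack()
            if acked >= base:
                base = acked + 1         # window slides forward
            else:
                next_seq = base          # nothing new acked: resend the window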
10. The Transmission Control Protocol
- Very complex
- Specifies the format of data and acks
- Specifies how TCP software distinguishes among multiple destinations on a host
- How hosts recover from lost or duplicated packets
- How hosts initiate and terminate a transfer
11. The Transmission Control Protocol (continued)
- Does not include
  - The details of interaction with an application
  - E.g., sockets
- Can be used with a variety of packet delivery systems
12. Ports, Connections, and Endpoints
- Like UDP, TCP resides in the transport layer
- Like UDP, TCP uses protocol port numbers to identify the ultimate destination within a host
- TCP ports are much more complex than UDP ports
13. Ports, Connections, and Endpoints (continued)
- TCP port numbers do not correspond to a single object. TCP is built on the connection abstraction, in which the objects to be identified are connections.
- A connection is a pair of endpoints.
- An endpoint is a pair, (host, port).
- Two connections may share one endpoint.
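A small sketch of this abstraction: a connection is a pair of (IP address, port) endpoints, which is why one endpoint can take part in several connections at once. The addresses and ports below are illustrative only.

    from collections import namedtuple

    Endpoint = namedtuple("Endpoint", ["ip", "port"])
    Connection = namedtuple("Connection", ["local", "remote"])

    # A server's single endpoint (made-up documentation address) ...
    server = Endpoint("192.0.2.10", 80)

    # ... shared by several simultaneous connections, each distinguished
    # by the remote endpoint.
    conns = {
        Connection(server, Endpoint("198.51.100.7", 51512)),
        Connection(server, Endpoint("203.0.113.9", 40001)),
    }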
14. Passive and Active Opens
- An application program on one end performs a passive open by indicating to the OS that it will accept an incoming connection. The OS provides a TCP port number for its end of the connection.
- An application program on the other end uses an active open to request a connection.
15. Passive and Active Opens (continued)
- The two TCP software modules communicate to establish and verify a connection.
- Once the connection is established, the application programs begin to pass data.
- TCP modules at each end guarantee reliable delivery.
16. Segments, Streams, and Sequence Numbers
- TCP views the data as a sequence of octets that it divides into segments. Usually, each segment travels in a single IP datagram.
- TCP uses a specialized sliding window mechanism to solve two problems: efficient transmission and flow control.
17. Segments, Streams, and Sequence Numbers (continued)
- TCP allows the receiver to restrict transmission until it has sufficient buffer space to accommodate more data.
- The sliding window mechanism operates at the octet level. A sender keeps three pointers associated with each connection.
- http://www.calvin.edu/lave/figure-12.6.pdf
18. Sliding Window Protocol
19. Segments, Streams, and Sequence Numbers (continued)
- TCP software at each end maintains two windows per connection, for a total of four.
- One slides as data is sent, the other as data is received.
20. Variable Window Size and Flow Control
- TCP allows the window size to vary over time. Each ack contains a window advertisement that specifies how many octets of data the receiver is prepared to accept.
- This provides flow control. The window size may even be zero.
21. Variable Window Size and Flow Control (continued)
- Solves two independent problems.
- A hand-held PDA communicating with a fast PC may be overrun.
- A mechanism is needed to allow intermediate systems (routers) to control a source that sends more traffic than the system can tolerate.
22. Variable Window Size and Flow Control (continued)
- When intermediate machines become overloaded, the condition is called congestion, and mechanisms to solve the problem are called congestion control mechanisms.
- A poorly-designed protocol will make congestion worse.
23. TCP Segment Format
- The unit of transfer is called a segment.
- http://www.calvin.edu/lave/figure-12.7.pdf
- HLEN is measured in multiples of 32 bits. The header length varies due to the presence of options.
- http://www.calvin.edu/lave/figure-12.8.pdf
24. Out of Band Data
- Interrupting a file transfer.
- Data may be identified as urgent.
- Implementation depends on the software.
- The URG bit will be set, and the URGENT POINTER refers to the end of the urgent data.
- TCP delivers such data to the receiving application immediately, out of band. E.g., telnet.
25. TCP Options
- Each option begins with a 1-octet field specifying the type, followed by a 1-octet length field.
- Padding is used to guarantee that the header is a multiple of 32 bits.
26. Maximum Segment Size Option
- Allows a receiver to specify a maximum segment size. This allows heterogeneous systems to communicate.
- Endpoints may establish the minimum MTU along the path between them.
- Small segments imply overhead. Large segments may have to be fragmented.
27. Maximum Segment Size Option (continued)
- It is hard to achieve the optimum segment size.
- Many implementations of TCP do not include a mechanism for doing so.
- Routes may change dynamically.
- The optimum size depends on lower-level protocol headers (IP options).
28. Window Scaling Options
- The WINDOW field in the TCP header is 16 bits long, so the maximum window size is 64 Kbytes. A larger window is needed to obtain high throughput on a network such as a satellite channel, a "long fat pipe."
- The option carries a type, a length, and a shift value, S.
29. Window Scaling Options (continued)
- A receiver shifts the WINDOW field left by S bits (see the sketch below).
- The option can be negotiated when the connection is initially established, in which case all successive advertisements are assumed to use the negotiated scale, or the scaling factor can vary with each segment.
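A minimal illustration of the window-scaling arithmetic; the field value and the shift factor below are made up for the example.

    # The 16-bit WINDOW field is interpreted as (window << S) when the
    # window scale option is in effect.  Values are illustrative.
    advertised = 0xFFFF          # largest value the 16-bit field can hold
    shift = 7                    # hypothetical negotiated scale factor S

    effective_window = advertised << shift
    print(effective_window)      # 8388480 octets, close to 8 MB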
30. Timestamp Option
- Added to help TCP compute the delay on the underlying network and to handle the case where sequence numbers exceed 2^32 (Protect Against Wrapped Sequence numbers, PAWS).
- The sender includes a timestamp; the receiver returns it, allowing the sender to compute the elapsed time.
31. TCP Checksum Option
- Like UDP, and for the same reason, TCP uses a pseudo-header to compute the checksum
- http://www.calvin.edu/lave/figure-12.9.pdf
32. Acknowledgements, Retransmission, and Timeouts
- Because TCP sends data in variable length segments, and because retransmitted segments can include more data than the original, acks cannot easily refer to segments. Instead they refer to a position in the stream, using the sequence numbers.
- A TCP ack specifies the sequence number of the next octet that the receiver expects to receive.
33. Acknowledgements, etc. (continued)
- This acknowledgement scheme is called cumulative because it reports how much of the stream has accumulated.
- Advantages
  - acks are easy to generate and unambiguous
  - lost acks do not necessarily force retransmissions
34. Acknowledgements, etc. (continued)
- Disadvantages
  - The sender does not receive information about all successful transmissions, but only about a single position in the stream.
- Example
  - Sender sends 5 segments. The last 4 make it.
  - Receiver keeps saying it wants the first one.
  - Sender has no way of knowing that 80% of the data arrived.
35. Acknowledgements, etc. (continued)
- When a timeout occurs, the sender must choose between two potentially inefficient schemes.
  - Resend all 5 segments.
  - Resend only the first and wait for the ack before deciding what to do next.
36. Timeout and Retransmission
- If the timer expires, the segment is resent.
- TCP is intended for use in an internet environment. It is impossible to know in advance how quickly acks will return.
- Transmission time varies dynamically.
37. Adaptive Retransmission
- TCP monitors the performance of each connection and deduces reasonable values for timeouts.
- TCP records the time at which each segment is sent and the time an ack arrives for the data in that segment.
- TCP adjusts based on an estimated round trip time, RTT.
38. RTT
- RTT = (α × Old_RTT) + ((1 − α) × New_Round_Trip_Sample)
- Choosing α close to 1 makes the weighted average immune to changes that last a short time.
- Choosing α close to 0 makes the average respond quickly to changes in delay.
39. RTT (continued)
- http://www.calvin.edu/lave/figure-12.10.pdf
- TCP computes the timeout value as β × RTT
- Originally β was set to 2. Better techniques have been developed recently (see the sketch below).
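A minimal sketch of the original exponentially weighted estimator in Python; the constants are the classic values mentioned above (α = 0.9, β = 2) and the samples are made up.

    # Sketch: classic TCP round-trip estimation and timeout.
    # alpha close to 1 -> smooth, slow to react; close to 0 -> jumpy.

    class RttEstimator:
        def __init__(self, alpha=0.9, beta=2.0, initial_rtt=1.0):
            self.alpha = alpha
            self.beta = beta
            self.rtt = initial_rtt          # smoothed round-trip estimate (s)

        def sample(self, measured_rtt):
            # RTT = alpha * Old_RTT + (1 - alpha) * New_Round_Trip_Sample
            self.rtt = self.alpha * self.rtt + (1 - self.alpha) * measured_rtt

        def timeout(self):
            return self.beta * self.rtt     # retransmission timeout

    est = RttEstimator()
    for s in (0.20, 0.22, 0.19, 0.60):      # made-up samples, in seconds
        est.sample(s)
    print(round(est.timeout(), 3))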
40. Accurate Measurement of Round Trip Samples
- Sounds easy, but...
- Acknowledgement ambiguity. If a segment must be resent, and then an ack arrives, is it for the original segment or the retransmitted segment?
- There is no way to tell, so what should we assume?
41. Accurate Measurement of Round Trip Samples (continued)
- If you measure from the original segment, RTT will grow without bound in cases where the internet loses datagrams.
- Measuring from the most recent retransmission can also fail. Suppose the end-to-end delay suddenly increases. The original ack may be associated with the retransmission, lowering the next timeout. RTT can end up being half what it should be, so that each segment is sent twice.
42. Karn's Algorithm
- Ignore round trip estimates for retransmitted segments.
- A simplistic implementation can lead to failure. Consider a segment sent after a sharp increase in delay. The timeout is too low, but the next batch of round trip estimates will be ignored.
43. Karn's Algorithm (continued)
- Use a timer backoff strategy: if the timer expires, increase the timeout (multiply by a constant, say 2), as sketched below.
- Works well even in networks with high packet loss.
44. Responding to High Variance in Delay
- The previously mentioned computations do not adapt well to a wide range of variation in delay.
- Queueing theory suggests that the round trip time increases in proportion to 1/(1 − L), where L is the current fractional load, and that the variation, σ, is proportional to 1/(1 − L)².
45. Responding to High Variance in Delay (continued)
- If an internet is running at 50% capacity, we expect the round trip delay to vary by a factor of 4 from the mean round trip time. At 80% the factor is 25. Using the original TCP technique for estimating round trip time with β = 2 means that the round trip estimation can adapt to loads of at most 30%.
46. Responding to High Variance in Delay (continued)
- The 1989 specification for TCP requires estimating both the average round trip time and the variance, and using the estimated variance in place of the constant β.
47. Responding to High Variance in Delay (continued)
- DIFF = SAMPLE − Old_RTT
- Smoothed_RTT = Old_RTT + δ × DIFF
- DEV = Old_DEV + ρ × (|DIFF| − Old_DEV)
- Timeout = Smoothed_RTT + η × DEV
- Where DEV is the estimated mean deviation, δ and ρ are fractions, and η is a factor
48. Responding to High Variance in Delay (continued)
- δ controls how quickly the new sample affects the weighted average
- ρ controls how quickly the new sample affects the mean deviation
- η controls how much the deviation affects the round trip timeout
49. Responding to High Variance in Delay (continued)
- TCP chooses δ and ρ to be inverses of powers of 2
- Research suggests δ = 1/8, ρ = 1/4, and η = 3 will work well (see the sketch below)
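A minimal sketch of the deviation-based timeout described above, using the suggested constants δ = 1/8, ρ = 1/4, η = 3; the initial values and samples are illustrative.

    # Sketch: 1989-style retransmission timeout using mean deviation.
    #   DIFF         = SAMPLE - Old_RTT
    #   Smoothed_RTT = Old_RTT + delta * DIFF
    #   DEV          = Old_DEV + rho * (|DIFF| - Old_DEV)
    #   Timeout      = Smoothed_RTT + eta * DEV

    class DeviationRtt:
        def __init__(self, delta=1/8, rho=1/4, eta=3, initial_rtt=1.0):
            self.delta, self.rho, self.eta = delta, rho, eta
            self.srtt = initial_rtt          # smoothed RTT (seconds)
            self.dev = 0.0                   # estimated mean deviation

        def sample(self, measured_rtt):
            diff = measured_rtt - self.srtt
            self.srtt += self.delta * diff
            self.dev += self.rho * (abs(diff) - self.dev)

        def timeout(self):
            return self.srtt + self.eta * self.dev

    est = DeviationRtt()
    for s in (0.20, 0.25, 0.90, 0.22):       # made-up samples, in seconds
        est.sample(s)
    print(round(est.timeout(), 3))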
50. Responding to High Variance in Delay (continued)
- http://www.calvin.edu/lave/figure-12.11.pdf
- http://www.calvin.edu/lave/figure-12.12.pdf
- Comments on page 210
51. Response to Congestion
- Severe delay is caused by an overload of datagrams at one or more switching points, e.g., routers
- The router begins to enqueue datagrams until it can forward them. When it reaches capacity, it starts to drop datagrams.
- Endpoints do not know where or why congestion has occurred.
52. Response to Congestion (continued)
- Most transmission protocols use timeout and retransmission, so they respond to increased delay by retransmitting datagrams, increasing the congestion.
- This leads to congestion collapse.
- Therefore, TCP must reduce transmission rates when congestion occurs.
53. Response to Congestion (continued)
- Routers watch queue lengths and use techniques such as ICMP source quench to inform hosts that congestion has occurred, but transport protocols can help avoid congestion by reducing transmission rates automatically.
- Two techniques are used: slow-start and multiplicative decrease.
54. Multiplicative Decrease
- In addition to the limit imposed by the sliding window protocol, TCP uses a second limit, the congestion window limit or congestion window.
- Allowed_window is the minimum of the advertised window and the congestion window.
55. Multiplicative Decrease (continued)
- TCP assumes that most datagram loss comes from congestion (wireless!) and uses the following strategy
  - Upon loss of a segment, reduce the congestion window by half (the minimum is 1).
  - For the remaining segments, back off the retransmission timer exponentially.
56. Slow-start
- When starting traffic on a new connection or increasing traffic after congestion, increase the congestion window by one segment each time an ack arrives.
- It takes only log base 2 of N round trips to reach a congestion window of N, so slow-start is something of a misnomer.
- These techniques have improved the performance of TCP by factors of 2 to 10.
57. Slow-start (continued)
- One additional restriction.
- Once the congestion window reaches one half of its size before congestion, TCP enters a congestion avoidance phase and slows down the rate of increase. It increases the congestion window by 1 only if all segments in the window have been acknowledged.
- Additive Increase Multiplicative Decrease
- AIMD (sketched below)
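A minimal sketch of the congestion window bookkeeping described on these slides (slow-start, congestion avoidance, and multiplicative decrease). Window sizes are counted in segments rather than octets, and the thresholds are illustrative; the per-ack 1/cwnd increase is a common approximation of "+1 per fully acknowledged window."

    # Sketch: slow-start / congestion avoidance / multiplicative decrease.
    # cwnd and ssthresh are measured in segments for simplicity.

    class CongestionControl:
        def __init__(self, advertised_window=64):
            self.cwnd = 1                    # slow-start begins at one segment
            self.ssthresh = advertised_window
            self.advertised = advertised_window

        def allowed_window(self):
            # Allowed_window = min(advertised window, congestion window)
            return min(self.advertised, self.cwnd)

        def on_ack(self):
            if self.cwnd < self.ssthresh:
                self.cwnd += 1               # slow-start: +1 per ack (doubles per RTT)
            else:
                self.cwnd += 1 / self.cwnd   # congestion avoidance: ~+1 per RTT

        def on_loss(self):
            self.ssthresh = max(self.cwnd / 2, 1)   # multiplicative decrease
            self.cwnd = 1                    # Tahoe-style: restart slow-start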
58. Fast Recovery and Other Techniques
- The early version of TCP, sometimes referred to as Tahoe, used the scheme described above.
- In 1990 the Reno version of TCP appeared, introducing several changes.
- Fast recovery or fast retransmit gives higher throughput where only occasional loss occurs.
59. Fast Recovery
- The loss of a single segment
- The receiver sends an ACK for the point in the stream where the missing segment begins.
- From the sender's point of view, a series of ACKs arrive with the same sequence number.
- Fast retransmit uses a series of three duplicate ACKs to trigger a retransmission before the timer goes off (sketched below).
60. Fast Recovery (continued)
- To maintain higher throughput, fast retransmit continues to send data from the window while waiting for an ACK for the retransmitted segment.
- The congestion window is artificially inflated (increased by 1 for each duplicate ACK). TCP keeps many segments in flight.
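A minimal sketch of the duplicate-ACK trigger; retransmit(seq) is a hypothetical helper, and the threshold of three duplicates is the one named on the slide.

    # Sketch: fast retransmit triggered by three duplicate ACKs.

    DUP_ACK_THRESHOLD = 3

    class FastRetransmit:
        def __init__(self, retransmit):
            self.retransmit = retransmit     # hypothetical: resend one segment
            self.last_ack = None
            self.dup_count = 0

        def on_ack(self, ack_seq):
            if ack_seq == self.last_ack:
                self.dup_count += 1
                if self.dup_count == DUP_ACK_THRESHOLD:
                    # Three duplicates: assume the segment at ack_seq was lost
                    # and resend it without waiting for the retransmission timer.
                    self.retransmit(ack_seq)
            else:
                self.last_ack = ack_seq      # new cumulative ack: window advanced
                self.dup_count = 0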
61. Fast Recovery (continued)
- The NewReno version handles the case where two segments are lost in a single window.
- When an ACK arrives for the retransmitted segment
  - If its sequence number is at the end of the window, the retransmitted segment was the only one missing
  - If it points to the middle of the window, a second segment is missing
  - Retransmit that second segment immediately
62. Fast Recovery (continued)
- TCP reduces transmission when congestion occurs. UDP does not.
- TFRC (TCP Friendly Rate Control) has been proposed.
- It emulates TCP for UDP, slowing the rate at which UDP data is sent.
63. Explicit Feedback Mechanisms
- TCP infers congestion; it has only implicit information.
- Selective Acknowledgement (SACK)
  - The receiver specifies which data has been received and which is still missing. The sender then knows exactly which segment to resend.
  - Proposed. More work to do. Page 215.
  - Each block is described by its first sequence number and the one immediately beyond the block.
64. Explicit Congestion Notification
- ECN. Routers notify TCP when congestion occurs.
- Routers use a pair of bits in the IP header to record congestion.
- The receiver uses a pair of bits in the TCP header to inform the sender.
65. Congestion, Tail Drop, and TCP
- The impact of the lack of communication between layers.
- A choice of policy or implementation at one layer can have a dramatic effect on the performance of higher layers. (middle/upper management)
- If a router delays some datagrams more than others, TCP will back off.
66. Congestion, Tail Drop, and TCP (continued)
- A router becomes overrun and drops datagrams. Early routers used a tail-drop policy to manage queue overflow.
- If the input queue is filled when a datagram arrives, discard the datagram.
- With tail-drop, a router tends to discard one segment from each of N connections rather than N segments from one connection.
- The result is global synchronization: all the connections back off at once.
67. Random Early Detection
- RED. Avoids tail-drop whenever possible.
- Random early drop (discard).
- Let N be the number of datagrams in the queue, and let a datagram arrive.
  - N < min: add to the queue.
  - N > max: discard.
  - min < N < max: discard with probability p (sketched below).
68. RED (continued)
- The choice of min, max, and p is key.
- Min must be large enough to achieve high utilization. Max must exceed min by more than the increase in queue size during one TCP round trip.
- p is computed from the current queue size and the thresholds. (Linear)
69. RED (continued)
- Network traffic is bursty, so RED cannot use a linear scheme on the instantaneous queue size.
- Later datagrams in a burst would be dropped with higher probability, with a negative impact on TCP throughput.
- Borrow a page from TCP's book: use a weighted average queue size computed from the old average and the current queue size.
70. RED (continued)
- Avg = (1 − γ) × Old_avg + γ × Current_queue_size
- If γ is small, the average will track long term trends but remain immune to short bursts. (γ = 0.02)
- RED computations use powers of 2 and integer arithmetic.
- The queue is measured in octets.
71. RED (continued)
- Small datagrams (remote login traffic, requests to servers, ACKs) have a lower probability of being dropped.
- The IETF now recommends that routers implement RED.
- Handles congestion, avoids synchronization, and accommodates short bursts well.
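A minimal sketch of the RED decision described above. The thresholds, the weight γ, and the maximum drop probability are illustrative values, and the queue is counted in datagrams rather than octets to keep the example short.

    import random

    # Sketch: Random Early Detection drop decision, using a weighted
    # average queue size so that short bursts are tolerated:
    #   avg = (1 - gamma) * old_avg + gamma * current_queue_size

    class Red:
        def __init__(self, min_th=5, max_th=15, max_p=0.1, gamma=0.02):
            self.min_th, self.max_th = min_th, max_th
            self.max_p = max_p               # drop probability at max_th
            self.gamma = gamma
            self.avg = 0.0

        def on_arrival(self, current_queue_size):
            self.avg = (1 - self.gamma) * self.avg + self.gamma * current_queue_size
            if self.avg < self.min_th:
                return "enqueue"
            if self.avg > self.max_th:
                return "discard"
            # Between the thresholds: drop probability grows linearly.
            p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
            return "discard" if random.random() < p else "enqueue"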
72. Establishing a TCP Connection
- To establish a connection, TCP uses a three-way handshake.
- http://www.calvin.edu/lave/figure-12.13.pdf
- SYN, SYN-ACK, ACK
- Avoids getting connections mixed up
73. Initial Sequence Numbers
- A initiates a connection and sends its initial sequence number, x, in the sequence field.
- B records the sequence number and replies by sending its own sequence number as well as an ACK indicating that it expects x+1.
- ACKs use the number of the next octet expected.
- Sequence numbers are supposed to be random!
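A small sketch of the three handshake segments and how the sequence and acknowledgement numbers relate; x and y stand in for the (ideally random) initial sequence numbers.

    import random

    # Sketch: the three segments of the TCP three-way handshake.

    x = random.getrandbits(32)               # A's initial sequence number
    y = random.getrandbits(32)               # B's initial sequence number

    handshake = [
        ("A -> B", {"SYN": True, "seq": x}),
        ("B -> A", {"SYN": True, "ACK": True, "seq": y, "ack": x + 1}),
        ("A -> B", {"ACK": True, "seq": x + 1, "ack": y + 1}),
    ]

    for direction, segment in handshake:
        print(direction, segment)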
74. Closing a TCP Connection
- Connections are closed one direction at a time. Only when both directions have been closed does the TCP software delete its record of the connection.
- http://www.calvin.edu/lave/figure-12.14.pdf
75. Closing a TCP Connection (continued)
- Subtle: after receiving a FIN, TCP sends an ACK and informs the application. This may take time, even human interaction. Finally, the second FIN is sent, and the original site replies with the third message, an ACK.
76. TCP Connection Reset
- Sometimes abnormal conditions arise that force an application to break a connection.
- TCP provides a reset facility.
- One side sends a segment with the RST bit in the CODE field set.
- The other side aborts the connection and informs its application program. Transfer ceases immediately in both directions.
77. TCP State Machine
- http://www.calvin.edu/lave/figure-12.15.pdf
78. Forcing Data Delivery
- Example: a remote session, say ssh.
- TCP provides a push mechanism that forces delivery without waiting for the buffer to fill. The PSH bit in the CODE field is set so the other end will pass on the data immediately.
79. Reserved TCP Port Numbers
- http://www.calvin.edu/lave/figure-12.16.pdf
- Although TCP and UDP are independent, they use the same port numbers where appropriate, e.g., 53.
- Ports 0-1023 are reserved (well-known ports).
80. TCP Performance
- TCP tackles a complex task, yet the code is neither cumbersome nor inefficient.
- At Cray Research, Inc., researchers have demonstrated TCP throughput approaching a gigabit per second.
81. Silly Window Syndrome and Small Packets
- When the receiving application reads one octet of data, that much space becomes available in its buffer. It might send an advertisement showing a window of one octet. The sender then sends another single octet.
- The silly window syndrome (SWS) plagued early TCP implementations.
82. Avoiding Silly Window Syndrome
- TCP specifications now include heuristics to prevent SWS.
- The sending machine avoids transmitting a small amount of data in each segment.
- The receiver avoids sending small increments in window advertisements.
83. Receive-Side Silly Window Avoidance
- Suppose a receiving application extracts data slowly.
- Wait until the available space reaches one half of the total buffer size or a maximum sized segment before advertising a larger window.
84. Delayed Acknowledgements
- A second approach to SWS avoidance
- Delay sending an ACK when the window is not sufficiently large to advertise.
- Increases throughput.
- Data can be included with ACKs (piggybacking).
- Problem? Don't delay too long or the data will be resent. 500 millisecond limit; ACK at least every other segment.
85. Send-Side Silly Window Avoidance
- Surprising and elegant.
- Allow the sending application to make multiple calls to write, and collect the data before sending. This is called clumping.
- The algorithm is adaptive, something like slow-start. It is called self-clocking because it does not compute delays.
86. Send-Side SWA (continued)
- When an application generates additional data to be sent over a connection for which previous data has been sent but not acknowledged, place the data in a buffer but do not send it until you have a maximum sized segment. If still waiting to send when an ACK arrives, send all the buffered data. Apply the rule even when the user requests a push operation.
87. Send-Side SWA (continued)
- If the application is reasonably fast compared to the network, successive segments will contain many octets. If the application is slow, small segments will be sent without long delay.
- This is called the Nagle algorithm and requires little computational overhead (a sketch follows).
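A minimal sketch of the send-side rule described above (the Nagle algorithm). The transmit helper and the segment size are hypothetical, and real implementations track acknowledgements per connection rather than with a single flag.

    # Sketch: Nagle-style send-side silly window avoidance.
    # New data is buffered while earlier data is unacknowledged; a full
    # segment, or the arrival of an ACK, releases the buffered data.

    class NagleSender:
        def __init__(self, transmit, max_segment=1460):
            self.transmit = transmit         # hypothetical callable: send bytes
            self.max_segment = max_segment
            self.buffer = b""
            self.unacked = False             # data sent but not yet acknowledged?

        def write(self, data):
            self.buffer += data
            if not self.unacked or len(self.buffer) >= self.max_segment:
                self._flush()

        def on_ack(self):
            self.unacked = False
            if self.buffer:                  # ACK arrived while waiting: send it all
                self._flush()

        def _flush(self):
            self.transmit(self.buffer)
            self.buffer = b""
            self.unacked = True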