Title: The Rapid Fire Survey of IP / UDP / TCP
1The Rapid Fire Survey of IP / UDP / TCP
- Dirk Grunwald, Assoc. Professor, Dept. of Computer Science, University of Colorado, Boulder
2Review
- IP (Internet Protocol) is designed to connect networks that
- Are possibly managed by multiple organizations / people
- May have different physical connections
- May be connected via a sequence of arbitrary intermediaries
- A layered approach is used to simplify application protocol design
3Protocol Layering
[Figure: two FTP / TCP / IP protocol stacks, one over Ethernet and one over Token Ring, joined by an intermediate IP layer]
4Review
- The link layer deals with the actual transport of bits across a physical medium.
- The network layer abstracts the characteristics of the different link layers to a common layer (e.g. IP) and provides management functions at that layer.
- The transport layer adds various features
- Reliable communication (TCP)
- Arbitrary message sizes (UDP)
- The application layer is the API provided to the programmer. Protocols are defined above that.
5Problems to identify & solve
- Addressing
- How do we name applications?
- How do we name connections?
- How do we name computers?
- For humans
- Across networks
- Within a physical network
- How do we deal with a decentralized organization?
- Who arbitrates decisions?
- Who defines standards?
- How do we deal with a plurality of physical
networks?
6Naming & Addresses
- Addresses are defined across three layers
- Physical / link level
- Medium Access Control (MAC)
- Network/IP level
- IP address
- Transport/application level
- Ports
7Media Access and Control
- Media can be arbitrated or be susceptible to collision
- Arbitrated: Token Ring, or 802.11 in PCF mode
- Collision: Ethernet, 802.11 in ad hoc mode
- A collision domain includes all the nodes that may be affected by a collision
8Hubs & Switches
- A hub is a single collision domain, although it has a physical hub-and-spoke topology
- A switch is a set of distinct collision domains. Frames destined for another collision domain are switched from one domain to another
9Addressing at the physical layer
- Ethernet (or 802.3) networks specify a 48-bit physical MAC address
- 00-00-f8-75-5b-a6 -- Unique identifier for the network interface card (NIC)
- Address ranges are assigned to specific vendors. E.g., addresses beginning 00-00-f8 belong to Digital Equipment Corp.
- Certain MAC addresses mean broadcast
10Addressing at the physical/link layer
- Frames are delivered to NICs with that specific MAC address (or to all NICs for a broadcast)
- A hub presents each frame to all NICs
- A switch moves frames from one collision domain to another based on the MAC address
- A table is maintained that specifies which MAC addresses are on which collision domain.
- Frames destined for an unknown MAC address are broadcast to all collision domains
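The table a switch maintains can be sketched in a few lines of C. This is a minimal illustration, not how any particular switch is built: the names `mac_table`, `mac_learn` and `switch_forward`, the fixed table size, and the `FLOOD` sentinel are all assumptions made for the example. It learns the source address of each frame and floods frames whose destination is unknown or broadcast.

```c
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 64   /* assumed capacity for this sketch */
#define FLOOD (-1)      /* sentinel: send the frame to every port */

struct mac_entry { uint8_t mac[6]; int port; int used; };
struct mac_table { struct mac_entry e[TABLE_SIZE]; };

/* Find the entry for a MAC address, or NULL if it has not been seen. */
struct mac_entry *mac_find(struct mac_table *t, const uint8_t *mac)
{
    for (int i = 0; i < TABLE_SIZE; i++)
        if (t->e[i].used && memcmp(t->e[i].mac, mac, 6) == 0)
            return &t->e[i];
    return NULL;
}

/* Record that frames from `src` arrive on `port`. */
void mac_learn(struct mac_table *t, const uint8_t *src, int port)
{
    struct mac_entry *m = mac_find(t, src);
    if (m) { m->port = port; return; }
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (!t->e[i].used) {
            memcpy(t->e[i].mac, src, 6);
            t->e[i].port = port;
            t->e[i].used = 1;
            return;
        }
    }
    /* Table full: this sketch simply does not learn the address. */
}

/* Handle one frame: learn the source, then pick the output port. */
int switch_forward(struct mac_table *t, const uint8_t *dst,
                   const uint8_t *src, int in_port)
{
    static const uint8_t bcast[6] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
    struct mac_entry *m;

    mac_learn(t, src, in_port);
    if (memcmp(dst, bcast, 6) == 0)
        return FLOOD;
    m = mac_find(t, dst);
    return m ? m->port : FLOOD;
}
```

A real switch also ages entries out of the table; that is omitted here to keep the sketch short.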
11The reality of the world today
- A 10-BaseT Ethernet NIC runs $9 for a cheapo PCI/ISA card. 10-BaseT via USB is $40. 100BaseT via PCI is $30. A Gigabit NIC is $350.
- A 4-port hub costs $40. Switches are >$70. Gigabit is much more (>$2000).
12More Realities
- Single nodes on switches allow you to use duplex communication
- Send & receive concurrently
- You need to use high-quality cabling (Cat5) for 100 Mb/s networks
- Gigabit networks currently require fiber, but a cable standard is now available.
- Modest network bandwidth contention is a problem you throw money at, not brains.
13TokenRing / FDDI
- A token circulates among all computers. You can only transmit if you have the token.
- Variations: more than one token, based on length, or e.g. WDM or FDM.
14More Addressing
- So, at the physical layer, Ethernet/802.3 uses a MAC address
- Can locate computers within a single physical network
- You want to limit network size - broadcast packets still affect the full network.
- How do you address at the network and transport level?
15IP Addressing
- Each host in the internet has a unique 32-bit address
- I'm lying
- There are three address types
- Unicast communication -- destined for a single host
- Broadcast communication -- destined for all hosts on a network
- Multicast communication -- destined for a set of hosts that belong to a multicast group.
- Note the use of "network" and "host"
- Network IDs are assigned by the InterNIC
16IP Addressing
Class  Leading bits  Remaining fields
A      0             netid (7 bits), hostid (24 bits)
B      10            netid (14 bits), hostid (16 bits)
C      110           netid (21 bits), hostid (8 bits)
D      1110          multicast group (28 bits)
E      1111          reserved (28 bits)

Class  Range (as dotted quad)
A      0.0.0.0 to 127.255.255.255
B      128.0.0.0 to 191.255.255.255
C      192.0.0.0 to 223.255.255.255
D      224.0.0.0 to 239.255.255.255
E      240.0.0.0 to 255.255.255.255
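The class of an address follows from the leading bits of its first byte, so it can be computed with a few mask tests. The sketch below is illustrative only; `ip_class` and `quad` are names invented for this example, and addresses are assumed to be held in host byte order.

```c
#include <stdint.h>

/* Pack a dotted quad a.b.c.d into a 32-bit value (host byte order). */
uint32_t quad(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}

/* Return the class letter ('A'..'E') of an IPv4 address. */
char ip_class(uint32_t addr)
{
    uint8_t first = (uint8_t)(addr >> 24);   /* first byte of the dotted quad */

    if ((first & 0x80) == 0x00) return 'A';  /* 0xxxxxxx */
    if ((first & 0xC0) == 0x80) return 'B';  /* 10xxxxxx */
    if ((first & 0xE0) == 0xC0) return 'C';  /* 110xxxxx */
    if ((first & 0xF0) == 0xE0) return 'D';  /* 1110xxxx: multicast */
    return 'E';                              /* 1111xxxx */
}
```

For instance, `ip_class(quad(128, 138, 241, 78))` yields 'B', since 128 is 10000000 in binary.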
17Problems & Subnets
- A few companies got class A networks (e.g., Digital, Xerox)
- Many educational institutions got class B networks. E.g., my primary computer is 128.138.241.78
- Most people get class C networks. E.g., my cable modem in Palo Alto was 208.166.41.96
- Allegedly, broadcasts would go to an entire network
- Obviously impractical for a Class A network. That's 16,777,216 hosts
- We'll discuss subnetting and routing later
18Mapping names to numbers
- Obviously, it's hard to remember that 128.138.241.78 is my computer
- But, numbers are more useful when actually switching messages
- The Domain Name System (DNS) maps names to IP addresses
- A tree-structured distributed database and naming scheme
- Each separately administered subtree is a zone
- Network Solutions handles registration of domains directly under a top-level domain (e.g., colorado.edu).
- Sub-domains are then administered by individual groups
- cs.colorado.edu
- We'll discuss how names are resolved later
19Transport Level Naming
- Each NIC receives messages for a number of applications
- How do we differentiate the data intended for different apps?
- Each IP connection has an associated 16-bit port number.
- Port numbers are contained in each TCP & UDP packet
- Some port numbers are well-known services
- E.g., telnet is always port number 23
- Port numbers from 0..1023 are for well-known services. Those port numbers are assigned by the Internet Assigned Numbers Authority (IANA)
20Transport Naming in Unix
- Unix uses reserved ports for security
- Only the superuser can create ports in the range of 0..1023.
- This is used for simplistic authentication
- On most Unix systems, /etc/services lists the reserved ports

systat     11/tcp   users
daytime    13/tcp
daytime    13/udp
netstat    15/tcp
qotd       17/tcp   quote text
chargen    19/tcp   ttytst source
chargen    19/udp   ttytst source
ftp-data   20/tcp
ftp        21/tcp
ssh        22/tcp   # SSH Remote Login Server
ssh        22/udp   # SSH Remote Login Server
21Representing TCP & UDP
- UDP is a datagram or message oriented protocol
- Maps well to Ethernet, etc
- TCP is a stream oriented protocol
- Appears to be an infinite stream of bytes
- This maps to frames by packetization
[Figure: a byte stream divided into a sequence of IP packets]
22Encapsulation
- Application level communication typically has four levels of addressing
- Application information (e.g., HTML headers)
- Transport information (port)
- Network information (IP address)
- Link information (MAC address)
- Each layer is encapsulated in the layer below it.
- We multiplex or encapsulate the message when it's sent
- We demultiplex the message when it arrives
- Leads to layered software design
23Encapsulation as it goes down the protocol stack
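Layer      Unit passed down to the next layer
App.       App Hdr | User Data
TCP        TCP Hdr (20 bytes) | App Hdr | User Data
IP         IP Hdr (20 bytes) | TCP Hdr | App Hdr | User Data
Ethernet   Ethernet header (14 bytes) | IP Hdr | TCP Hdr | App Hdr | User Data | Ethernet trailer (4 bytes)
The part of the Ethernet frame between header and trailer is 46-1500 bytes.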
24Demultiplexing
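[Figure: demultiplexing an incoming Ethernet frame. The Ethernet driver passes the frame to ARP, IP, RARP or another protocol; IP uses the IP header to pass the datagram to TCP, UDP, ICMP or IGMP; TCP and UDP use the TCP/UDP header to pass data to one of several applications.]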
25Standards Bodies
- Lots of arbitrary constants here!
- Naming, IP assignment, protocol header formats, etc
- Largely volunteer organization
- Internet Society -- "We are the most public secret cabal in the history of the world." - Jon Postel
- Internet Architecture Board (IAB) - technical oversight & coordination body
- Internet Engineering Task Force (IETF) - near-term, standards-oriented. Develops specifications that become internet standards
- Internet Research Task Force (IRTF) - R&D arm
26Standards are embodied by RFCs
- Request for Comment (RFC)
- Unique, monotonically assigned numbers. RFCs cannot be revised, only re-issued.
- All RFCs are available on-line
- www.faqs.org has a nice searchable index
- www.ietf.org has information on drafts and working groups
27Standards
- Ethernet defined by Digital, Xerox and Intel
- Later, the IEEE published a different set of standards
- http://grouper.ieee.org/groups/802/
- 802.2 defines a logical link control common to all 802 nets
- 802.3 covers many CSMA/CD networks
- 802.4 covers token bus networks
- 802.5 covers token ring networks
- 802.11 covers wireless Ethernet
28Standards
- In the IP world,
- RFC 894 defines IP-in-Ethernet
- RFC 1042 defines IP-in-802
- The host requirements RFC says that all hosts connected to a 10-Mbit Ethernet cable
- Must be able to send/receive using RFC 894
- Should be able to receive a mix of RFC 1042 and 894 packets
- May be able to send packets using RFC 1042. If either can be sent, you must default to 894 packets
29Ethernet & 802.3 Encapsulation
- Destination MAC or hardware address
- Each NIC has a unique hardware address
- Source MAC or hardware address
- Protocol type to allow sharing the same physical media with several different protocols
- Type fields are defined by RFC 1700, which makes RFC 1340 obsolete
- Some data
- A checksum
30Ethernet Encapsulation (RFC 894)
Field       Size (bytes)
Dest Addr.  6
Src Addr.   6
Type        2
Payload     46-1500
CRC         4

Type  Payload
0800  IP datagram (46-1500 bytes)
0806  ARP request/reply (28 bytes) + PAD (18 bytes)
8035  RARP request/reply (28 bytes) + PAD (18 bytes)
31Variations
- Observation
- Ethernet MAC information is fixed and can be pre-computed
- Data is typically fixed size
- Other fields (IP and TCP headers) can vary in size and also carry checksum fields for end-to-end checks
- RFC 893 describes trailer encapsulation
- The IP and TCP headers move to the end of the frame
- Helps in computing IP checksum
- Allows more efficient use of scatter/gather DMA hardware
32 802.3 Encapsulation
- Explicit length - number of bytes up to but not including the CRC
- 802.2 LLC - logical link control common to all 802 networks and needed for e.g. wireless communication
- DSAP - destination service access point (0xaa)
- SSAP - source service access point (0xaa)
- Control field is set to 3
- 802.2 SNAP - sub-network access protocol
- Fixed org code (0)
- Type field, as in the Ethernet type field
33 802.3 Encapsulation
[Figure: 802.3 frame layout. 802.3 MAC header, then 802.2 LLC (DSAP AA, SSAP AA, Control), then 802.2 SNAP, then the payload.]
Payload has same format as Ethernet encapsulation
34SLIP - Serial Line IP
- Specified in RFC 1055
- IP datagram is terminated by the special END (0xc0) character. Most implementations transmit END at the start as well.
- If a byte in the IP datagram equals END, the 2-byte sequence 0xdb, 0xdc is transmitted (byte stuffing).
- 0xdb is the SLIP escape (ESC) character.
- If a byte in the IP datagram equals the SLIP ESC, the 2-byte sequence 0xdb, 0xdd is transmitted
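The byte-stuffing rules above can be sketched as a small encoder. This is an illustrative sketch rather than a reference implementation; the function name `slip_encode`, its signature, and the choice to always send a leading END are assumptions for the example.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    SLIP_END     = 0xC0,  /* frame delimiter */
    SLIP_ESC     = 0xDB,  /* escape character */
    SLIP_ESC_END = 0xDC,  /* ESC, ESC_END stands for a data byte 0xC0 */
    SLIP_ESC_ESC = 0xDD   /* ESC, ESC_ESC stands for a data byte 0xDB */
};

/* Frame `len` bytes of `in` into `out` (capacity `cap`).
   Returns the number of bytes written, or 0 if `out` is too small. */
size_t slip_encode(const uint8_t *in, size_t len, uint8_t *out, size_t cap)
{
    size_t n = 0;

    if (cap < 2)
        return 0;
    out[n++] = SLIP_END;                      /* leading END */
    for (size_t i = 0; i < len; i++) {
        size_t need = (in[i] == SLIP_END || in[i] == SLIP_ESC) ? 2 : 1;
        if (n + need + 1 > cap)               /* keep room for the trailing END */
            return 0;
        if (in[i] == SLIP_END) {
            out[n++] = SLIP_ESC;
            out[n++] = SLIP_ESC_END;
        } else if (in[i] == SLIP_ESC) {
            out[n++] = SLIP_ESC;
            out[n++] = SLIP_ESC_ESC;
        } else {
            out[n++] = in[i];
        }
    }
    out[n++] = SLIP_END;                      /* trailing END */
    return n;
}
```

Encoding the three bytes 01 C0 DB produces the seven bytes C0 01 DB DC DB DD C0.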
35SLIP Encapsulation w/Byte Stuffing
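[Figure: an IP datagram containing the bytes C0 and DB. On the wire the frame starts with C0, the data byte C0 becomes DB DC, the data byte DB becomes DB DD, and the frame ends with C0.]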
36Problems with SLIP
- Each endpoint must know the IP address of the other endpoint.
- There's no TYPE field -- thus, SLIP only supports a single protocol
- There's no checksum -- thus, corruption must be detected by higher layers and repaired by end-to-end retransmission
37PPP - Point-to-Point Protocol
- Encapsulates IP datagrams on a serial link
- A Link Control Protocol (LCP) to establish, configure and test the data-link connection.
- This allows connection feature negotiation
- A family of Network Control Protocols specific to different network layer protocols
- IP
- OSI networks (X.25)
- DECnet
- AppleTalk
38PPP Protocol
FLAG | Addr | Cntl | Proto | Payload | CRC | FLAG
- Byte stuffing as in the SLIP/CSLIP protocol
- Bytes with values less than 0x20 are also escaped to avoid problems with flow-control
- Most implementations can negotiate to eliminate the ADDR and CNTL fields and to reduce the protocol field to 1 byte.
39MTU
- Most link layers have a limit on the size of a frame, the Maximum Transmission Unit (MTU)
- If an IP datagram > MTU, then it is fragmented (Chap 11.5)

Network            MTU (bytes)
Hyperchannel       65535
16 Mb token ring   17914
4 Mb token ring    4464
FDDI               4352
Ethernet           1500
IEEE 802.3         1492
X.25               576
PPP                296
40Path MTU
- Messages traverse a route or path through a network.
- The smallest MTU along that path is called the Path MTU.
- Not always constant, since the route between two nodes in the network can vary
- Also, routing isn't necessarily symmetric, and thus the A->B MTU may differ from the B->A MTU
- RFC 1191 defines path MTU discovery, which is the process of automatically discovering the smallest MTU along a path.
- Everyone does this
41The IPv4 Protocol
- IP is a best effort connectionless protocol
- It's a datagram/packet oriented protocol
- You can get an IP packet from anyone without any setup or connection establishment
- Packets are normally routed using destination routing. You specify where the packet is to go, not how it gets there
- You can optionally specify source routing. You specify the route for the packet as part of the packet
- Each packet is routed independently
- Can be delivered out of order
- Might not be delivered at all
42Conventions in IPv4 - Network Byte Order
- IP data is laid out in big endian order. Byte transmission order: 0, 1, 2, 3
- Representing a 16-bit integer in memory
- Big endian: 0,1 (SPARC, M68k)
- Little endian: 1,0 (x86, Alpha)
- Network byte order is defined to be big endian
[Figure: a 32-bit word, bits 0-15 and 16-31, holding bytes 0, 1, 2, 3 in transmission order]
43Conventions in IPv4
- When we need to set fields in an IP header, we will need to use translation functions to be portable.
- Actually, you need this for all binary fields
- htons - host to network short (16-bit)
- port_number = htons(port_number)
- ntohs - network to host short
- htonl - host to network long (32-bit)
- htonl(interface_addr.get_ip_address())
- ntohl - network to host long
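A short sketch of how these functions are typically used when reading and writing packet buffers. The helper names `put_port` and `get_port` are made up for this example; `htons` and `ntohs` are the standard POSIX calls from `<arpa/inet.h>`.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Store a 16-bit port into a packet buffer in network byte order. */
void put_port(uint8_t *buf, uint16_t host_port)
{
    uint16_t net = htons(host_port);
    memcpy(buf, &net, sizeof net);   /* memcpy avoids unaligned access */
}

/* Read a 16-bit port from a packet buffer back into host byte order. */
uint16_t get_port(const uint8_t *buf)
{
    uint16_t net;
    memcpy(&net, buf, sizeof net);
    return ntohs(net);
}
```

Whatever the host's own byte order, `put_port(buf, 8000)` leaves the bytes 0x1F 0x40 in the buffer, because 8000 is 0x1F40 and the most significant byte goes first on the wire.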
44What's Stored in an IPv4 Packet?
- Version - 4-bit field specifying the IP version. Currently 4
- Header length - specified in 32-bit words. Range is 5..15 words, or 20..60 bytes
- Type of Service (8 bits)
- 3-bit precedence field (ignored today), one must-be-zero bit
- 4-bit field specifying desired service qualities.
- Minimize Delay
- Maximize Throughput
- Maximize Reliability
- Minimize Monetary Cost
- Only one bit can be set. None set is normal service
- Largely ignored by routers & IP implementations
45What's Stored in an IPv4 Packet
- Total length of the datagram, in bytes
- Datagram identification field that must be unique
- Used with flags & fragment offset if a message must be fragmented
- Time to live field - upper limit on the number of hops a message can go before being dropped
- Protocol - identifies TCP, UDP, ICMP, etc
- Header checksum - checksum of just the IP header
- Source address
- Destination address
- Options
46IPv4 Protocol Layout
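Version (4 bits) | Hdr Lth (4 bits) | Type of Svc (8 bits) | Total length in bytes (16 bits)
16-bit Packet Identification | Flags (3 bits) | Fragment Offset (13 bits)
Time To Live (8 bits) | Protocol (8 bits) | Header Checksum (16 bits)
Source IP Address (32 bits)
Destination IP Address (32 bits)
... (options, if any) ...
Data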
47Parsing the IPv4 Packet
- Data starts Header Length bytes into the packet; the data length is Total Length - Header Length
- Maximum IP datagram is 65535 bytes
- Hosts are not required to receive packets > 576 bytes
- Ethernet MTU is 1500 bytes
- Most implementations allow for 8192-byte IP datagrams (because of Network File System)
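The parsing rule can be sketched in C as follows. The struct and function names (`ipv4_info`, `ipv4_parse`) are invented for this example, only a handful of fields are extracted, and options are skipped rather than decoded.

```c
#include <stddef.h>
#include <stdint.h>

struct ipv4_info {
    unsigned version;     /* should be 4 */
    size_t   header_len;  /* bytes, 20..60 */
    size_t   total_len;   /* bytes, header plus data */
    size_t   data_len;    /* total_len - header_len */
    uint8_t  ttl;
    uint8_t  protocol;    /* e.g. 6 = TCP, 17 = UDP */
};

/* Parse the fixed part of an IPv4 header from `pkt` (`len` bytes available).
   Returns 0 on success, -1 if the header is malformed or truncated. */
int ipv4_parse(const uint8_t *pkt, size_t len, struct ipv4_info *out)
{
    if (len < 20)
        return -1;
    out->version    = pkt[0] >> 4;
    out->header_len = (size_t)(pkt[0] & 0x0F) * 4;    /* field counts 32-bit words */
    out->total_len  = ((size_t)pkt[2] << 8) | pkt[3]; /* big endian on the wire */
    out->ttl        = pkt[8];
    out->protocol   = pkt[9];
    if (out->version != 4 || out->header_len < 20 ||
        out->total_len < out->header_len || out->total_len > len)
        return -1;
    out->data_len = out->total_len - out->header_len; /* data begins at pkt + header_len */
    return 0;
}
```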
48IPv4 Options
- Security & handling restrictions
- Have each router record its IP address
- Have each router record its IP address and timestamp
- Loose source routing - specify a list of IP addresses that must be traversed by the packet
- Strict source routing - enforce that list
- 60-byte limit on IP headers limits utility of these options
- We need to worry about source routing when we talk about ad hoc routing
49Hop-by-Hop IP Routing
- Datagram arrives
- For this host? => Deliver to TCP, UDP, etc
- Else => Look up next hop in routing table
- If there's an entry, forward the message
- Else => discard the datagram
50Routing Tables
- Routing table contents
- Destination IP address (either HOST or NET address)
- IP address of the next hop router
- Flags (HOST/NET, Router/Direct)
- Network interface to use
- Routing lookup
- Search for an entry that matches the destination IP address
- Handles directly connected or point-to-point links
- Search for an entry that matches the destination network
- If found, send to the directly-connected router or interface
- Search for a default route
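The three-step lookup (host entry, then network entry, then default) can be sketched as a single pass that prefers the most specific match. This is one possible way to express it, not the way any particular kernel does; the `route` struct, the `IP4` macro, and the convention that a mask of all ones marks a host route and a mask of zero marks the default route are assumptions of the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Build a host-byte-order address from a dotted quad. */
#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

struct route {
    uint32_t dest;      /* host or network address */
    uint32_t mask;      /* 0xFFFFFFFF = host route, 0 = default route */
    uint32_t next_hop;  /* 0 means directly connected */
    int      ifindex;   /* network interface to use */
};

/* Return the matching route with the most specific mask, or NULL if
   nothing matches (in which case the datagram would be discarded). */
const struct route *route_lookup(const struct route *tbl, size_t n, uint32_t dst)
{
    const struct route *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if ((dst & tbl[i].mask) != tbl[i].dest)
            continue;                          /* entry does not cover dst */
        if (best == NULL || tbl[i].mask > best->mask)
            best = &tbl[i];                    /* host beats net beats default */
    }
    return best;
}
```

Comparing masks numerically works here because contiguous masks with more 1 bits are also larger as unsigned integers.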
51Problems in Routing
- Remember, IP routing is decentralized.
- How are routing tables established?
- You can specify static routes, i.e., hard-coded information about your local network
- A default route is usually specified via a static route, but it's not sufficient
- Routers share their local information using dynamic routing protocols that propagate local information across a large network
[Figure: SRC reaches DST through routers R1, R2 and R3]
52Example Host Routing
[Figure, built up over slides 52-54: hosts on Ethernet 140.252.13 with default routes; a datagram addressed to 140.252.13.33 is delivered directly over the Ethernet.]
55Example with Router
[Figure, built up over slides 55-60: bsdi (.13.35) and sun (.13.33) on Ethernet 140.252.13; netb (.1.183) and gateway (.1.4) on Ethernet 140.252.1, with a further interface labeled .1.29. A datagram addressed to 192.252.1.183 is forwarded hop by hop. The next hop chosen at each step is 140.252.13.33 (default), then 140.252.1.183 (default), then 140.252.1.4 (default), then 140.252.104.2 (default).]
61Key notes
- All the hosts and routers used a default route
- The destination IP address never changed
- All routing decisions were made based on that destination address
- Different link-layer encapsulation schemes were used as the message went from Ethernet to CSLIP to Ethernet
62Subnet Addressing
- Routing is based on networks
- Routing for all nodes in a network is handled by a single router
- Class B - single address routes traffic for 65536 addresses
- Class C - single address routes traffic for 256 addresses
- Original network field unworkable for network-like things since class A & B had too many bits devoted to the host field
- Hence, subnets - specified by RFC 950
- Imposes logical ordering, allowing many networks of fewer machines
- Hierarchical - still a single advertised router for a Class B network
63Common Subnetting Sizes
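Network ID 128.138 (16 bits) | Subnet ID (8 bits) | Host ID (8 bits)
Network ID 128.138 (16 bits) | Subnet ID (10 bits) | Host ID (6 bits)
For 128.138.241.78, the third byte is 241 and the fourth is 78; with the 8/8 split these are the subnet ID and the host ID.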
64Why different sizes?
- It's possible to have networks span multiple physical media
- VPN software
- It's possible to have multiple networks on a single physical medium
- The ideal goal is to have a single network (subnet) per physical medium
- All broadcast traffic is routed to that physical medium, so many networks on the same medium cause more traffic
- More networks allow better clustering of network traffic
65Subnet Masks - how are subnets specified?
- Subnet mask has 1s on left, zeros on right
- The 1 bits cover the network and subnet IDs; the 0 bits mark the host ID in an IP address
- Stevens corrections
- Arbitrary bitmasks are not allowed
- Subnet zero can be used

Network ID 128.138 (16 bits) | Subnet ID (8 bits) | Host ID (8 bits)     mask 255.255.255.0
Network ID 128.138 (16 bits) | Subnet ID (10 bits) | Host ID (6 bits)    mask 255.255.255.192
255.255.255.192 in binary: 11111111 11111111 11111111 11000000
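Applying a mask is plain bit arithmetic, as the following sketch shows. The function names and the `IP4` macro are invented for the example; addresses and masks are assumed to be in host byte order.

```c
#include <stdint.h>

/* Build a host-byte-order address from a dotted quad. */
#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Network-plus-subnet part of an address. */
uint32_t subnet_of(uint32_t addr, uint32_t mask) { return addr & mask; }

/* Host part of an address. */
uint32_t host_of(uint32_t addr, uint32_t mask) { return addr & ~mask; }

/* Subnet-directed broadcast address: host bits all set to 1. */
uint32_t broadcast_of(uint32_t addr, uint32_t mask) { return (addr & mask) | ~mask; }

/* 1 if the mask is a run of 1 bits followed by a run of 0 bits. */
int mask_is_contiguous(uint32_t mask)
{
    uint32_t inv = ~mask;             /* should look like 000...0111...1 */
    return (inv & (inv + 1)) == 0;
}
```

With address 128.138.241.78 and mask 255.255.255.192, the subnet is 128.138.241.64, the host part is 14, and the broadcast address is 128.138.241.127.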
66Given an IP address...
- Select router based on the IP address
- I.e., for Class B, use the upper 16 bits as a network specification
- For class C, use the upper 24 bits as a network specification
- Route to that network (using routing tables..)
- Then,
- That router uses the pre-specified subnet mask to select a subnet
- A subnet routing table is consulted and traffic is directed to that subnet
- More hierarchical structure
67Special IP Addresses
- Special source addresses as part of an initialization procedure (e.g. bootp)
- This host on this network: NetID 0, HostID 0
- Specified host on this network: NetID 0, HostID = that host's ID
- Loopback addresses
- Loopback address - allows applications on the same host to communicate using TCP/IP
- NetID 127, HostID anything
68Special IP Output Addresses
- Limited Broadcast - typically used for initialization
- Only appears on local cable/collision domain
- NetID -1, HostID -1 (all 1 bits)
- Net-directed Broadcast (to netid)
- Forwarded via router
- NetID netid, HostID -1
- Subnet-directed Broadcast (to netid, subnetid)
- NetID netid, SubnetID subnetid, HostID -1
- All-subnets-directed broadcast for netid
- Most routers don't support this - use multicast instead to do the same thing
- NetID netid, SubnetID -1, HostID -1
69Loopback Devices
- Loopback devices allow applications on the same host to talk to each other directly
- No packet directed to the loopback device can appear on any physical network
- Typical implementation
- Loopback typically implemented as another link layer below IP
- Everything sent to loopback (127.0.0.1) appears as IP input
- Datagrams sent to broadcast or multicast addresses are copied to the loopback interface and also sent on Ethernet
- Anything sent to one of the host's own IP addresses is sent to the loopback device
70Loopback Devices
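[Figure: the IP output function tests the destination IP address. If it equals a broadcast or multicast address, the datagram goes to the loopback driver, which places it on the IP input queue, and is also sent via the Ethernet driver. If it equals an interface address, it goes to the loopback driver. Otherwise ARP is used to get the destination Ethernet address and the Ethernet driver sends the frame. On input, the Ethernet driver demultiplexes on the Ethernet frame type to IP or ARP, and IP frames are placed on the IP input queue for the IP input function.]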
71IFCONFIG - determining interface configurations on Unix systems
foobar-39 ifconfig -a
tu0: flags=c22<BROADCAST,NOTRAILERS,MULTICAST,SIMPLEX>
tu1: flags=c63<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST,SIMPLEX>
     inet 128.138.241.78 netmask ffffffc0 broadcast 128.138.241.127 ipmtu 1500
sl0: flags=10<POINTOPOINT>
lo0: flags=100c89<UP,LOOPBACK,NOARP,MULTICAST,SIMPLEX,NOCHECKSUM>
     inet 127.0.0.1 netmask ff000000 ipmtu 4096
72Netstat - statistics & more
foobar-40 netstat -in
Name  Mtu   Network            Address            Ipkts      Ierrs  Opkts      Oerrs  Coll
tu0   1500  <Link>             08:00:2b:e4:c1:8c  0          0      0          0      0
tu1   1500  <Link>             00:00:f8:00:a3:f2  165378215  0      155667063  37     2792801
tu1   1500  128.138.241.64/26  128.138.241.78     165378215  0      155667063  37     2792801
sl0   296   <Link>                                0          0      0          0      0
lo0   4096  <Link>                                88293425   0      88293510   0      0
lo0   4096  127/8              127.0.0.1          88293425   0      88293510   0      0
foobar-41
73ARP / RARP / ICMP
- ARP is a protocol for mapping an IP address to a MAC address
- RARP is a protocol for managing a machine -- telling a machine what its IP address should be, based on the MAC address
- ICMP is the Internet Control Message Protocol and is used to manage (& measure) many aspects of IP
74ARP - The Problem
- Once a packet has been routed to a specific network, we need to deliver it to the appropriate host
- The host's Ethernet interface only listens for an Ethernet MAC address
- We only have an IP address
- Thus, we need to know how to map the IP address to a MAC address
75ARP - Example
- FTP uses gethostbyname to determine the IP address of an FTP server
- FTP asks TCP to establish a connection
- TCP sends a connection request to that IP address, which is on the local network
- The O/S uses ARP to determine the Ethernet MAC address
- The destination O/S replies & the reply is received
- The IP layer can now send the packet
76The sequence
77Format of an ARP request
Field                    Size (bytes)
Ethernet Dest. Address   6
Ethernet Src Address     6
Frame Type               2
Hardware Type            2
Protocol Type            2
Hardware Size            1
Protocol Size            1
Op                       2
Sender Enet Addr         6    <- Notice this! Used by Proxy ARP
Sender IP Addr           4
Target Enet Addr         6
Target IP Addr           4
78Notes
- ARP uses a physical (Ethernet) broadcast to the network
- A unicast response is used to inform the sender of the appropriate MAC address
- ARP responses are cached by the kernel
- Everyone hears the broadcast request and can cache the sender's address from it
79You can use arp to see the ARP table
dirk-linux-23 arp
Address                   HWtype  HWaddress
foobar.cs.colorado.edu    ether   00:00:F8:00:A3:F2
equium.cs.colorado.edu    ether   00:A0:C9:49:22:F4
itsydev.cs.colorado.edu   ether   00:A0:CC:50:BD:00
cs-gw3-esl.cs.colorado.   ether   00:E0:F7:94:05:80
dirk-vmware.cs.colorado   ether   00:A0:CC:50:C4:8A
80Example ARP exchange
(Columns after the timestamp: sender MAC, then destination MAC)
11:07:54.537688 0:0:f8:0:a3:f2 ff:ff:ff:ff:ff:ff arp 42: arp who-has ragtop.cs.colorado.edu tell foobar.cs.colorado.edu
11:07:54.538665 0:0:f8:75:5b:8c 0:0:f8:0:a3:f2 arp 60: arp reply ragtop.cs.colorado.edu is-at 0:0:f8:75:5b:8c
81Proxy ARP
- ARP replies are sent to the Sender Hardware Address of the request, and the requester caches the hardware address carried inside the reply
- This can be different than the Ethernet Source Address of the reply!
- Thus, host A can reply for host B, and all IP packets destined for B will be sent to A
- Host A can then ensure they get to host B
82Using Proxy ARP to Bridge Networks
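[Figure: the router example network again (gateway .1.4 and netb .1.183 on Ethernet 140.252.1; bsdi .13.35 and sun .13.33 on Ethernet 140.252.13; labels .1.29 and 192.252.1.183), illustrating proxy ARP bridging the two networks]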
83Using ARP to spot configuration problems
- At boot-up, many systems issue an ARP request for their own IP address.
- If anyone responds, something is mis-configured
- You can also use gratuitous ARP for rapid fail-over.
- Everyone (usually) snoops the sending hardware address
- Servers A & B have the same internal IP address, but A is dormant
- Server A listens for a death song from Server B
- Server A immediately sends an ARP request
- Everyone now thinks that A is the specified IP address
84RARP
- RARP is a reverse ARP request
- A host knows its MAC address, but not its IP address
- Broadcasts an RARP who-is request
- An RARP server looks up the MAC address in a table (/etc/ethers) and replies with the IP address
- DHCP provides the same functionality with better management
85ICMP - Internet Control Message Protocol
- Communicates error and exceptional conditions
- Some ICMP messages cause errors to be returned to the user process
ICMP message layout: IP Header | 8-bit type | 8-bit code | 16-bit checksum of ICMP message | (contents depend on type & code)
86ICMP Types
87Error Reporting
- ICMP never returns errors (e.g. destination unreachable) for
- ICMP error messages
- A datagram destined for an IP broadcast address
- A datagram sent as a link-layer broadcast
- A fragment other than the first
- A datagram whose source address does not define a single host (zero, loopback, broadcast or multicast)
- Avoids broadcast storms
- Implies that protocols must be able to deal with dropped ICMP packets
88ICMP Destination Unreachable Codes
89UDP Protocol
IP Datagram = IP Header (20 bytes) | UDP Datagram
UDP Datagram = UDP Header (8 bytes) | UDP Data
90UDP Header
16-bit Source Port | 16-bit Destination Port
16-bit UDP Length | 16-bit UDP Checksum (opt)
Data (if any)
91UDP Header
IP Header | UDP Header | UDP Data
92UDP Checksum
[Figure: the UDP checksum covers the IP pseudo-header, the UDP header and the UDP data]
93IP Pseudo-Header
Pseudo-header:
32-bit Source IP address
32-bit Destination IP address
MBZ (8 bits) | Protocol (8 bits) | 16-bit UDP Length
UDP header and data:
16-bit Source Port | 16-bit Destination Port
16-bit UDP Length | 16-bit UDP Checksum (opt)
Data (if any)
Possible odd byte | PAD
94UDP Checksum
- Checksum calculated like the IP checksum, but uses a pseudo-IP header to ensure the packet arrived at the proper host
- If the transmitted checksum field is zero, it means the sender didn't compute the checksum.
- If the computed checksum would be zero, it's represented as 65535
- Packets with checksum errors are silently discarded; no error is reported
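The checksum rules above can be sketched in C. This is a simplified illustration, assuming IPv4 addresses in host byte order and a UDP buffer whose checksum field has been zeroed before the call; the function names are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Add `len` bytes to a running sum as big-endian 16-bit words.
   An odd trailing byte is padded with a zero byte. */
uint32_t sum16(uint32_t sum, const uint8_t *p, size_t len)
{
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)p[0] << 8;
    return sum;
}

/* Fold the carries back in (ones-complement add) and complement. */
uint16_t fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Internet checksum of a buffer, as used for the IP header. */
uint16_t inet_checksum(const uint8_t *p, size_t len)
{
    return fold(sum16(0, p, len));
}

/* UDP checksum over pseudo-header, UDP header and data.
   `udp` points at the UDP header; `udp_len` is header plus data. */
uint16_t udp_checksum(uint32_t src, uint32_t dst, const uint8_t *udp, size_t udp_len)
{
    uint8_t pseudo[12] = {
        (uint8_t)(src >> 24), (uint8_t)(src >> 16), (uint8_t)(src >> 8), (uint8_t)src,
        (uint8_t)(dst >> 24), (uint8_t)(dst >> 16), (uint8_t)(dst >> 8), (uint8_t)dst,
        0,                                   /* must be zero */
        17,                                  /* protocol number for UDP */
        (uint8_t)(udp_len >> 8), (uint8_t)udp_len
    };
    uint16_t c = fold(sum16(sum16(0, pseudo, sizeof pseudo), udp, udp_len));

    return c == 0 ? 0xFFFF : c;              /* zero is reserved for "not computed" */
}
```

A receiver can verify a datagram by running the same sum with the transmitted checksum left in place; a correct datagram folds to zero before the final substitution.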
95IP Fragmentation
- When a router forwards a packet that is too large for the MTU of the outgoing link, the packet is fragmented
- Fragmented packets are not reassembled until they reach their final destination
- Fragments may also be fragmented
- Fragments are identified using the datagram identification field
- Typically, if any fragment is lost, a router will discard all fragments. Routers usually only discover fragment loss if they drop the fragment themselves.
- The endpoint assumes fragments are lost after 30-60 seconds
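How a payload is split can be sketched as follows. The sketch assumes the standard IPv4 rule that the fragment offset field counts 8-byte units, so every fragment except the last carries a multiple of 8 data bytes; the struct and function names are invented for the example, and header options are ignored.

```c
#include <stddef.h>
#include <stdint.h>

struct fragment {
    uint16_t offset_units;   /* fragment offset field, in units of 8 bytes */
    uint16_t data_len;       /* payload bytes carried by this fragment */
    int      more_fragments; /* 1 on every fragment except the last */
};

/* Plan the fragments for `payload_len` bytes of IP payload on a link with
   the given MTU and IP header length. Returns the number of fragments
   written to `out`, or 0 if the MTU is too small or `max` is exceeded. */
size_t fragment_plan(size_t payload_len, size_t mtu, size_t hdr_len,
                     struct fragment *out, size_t max)
{
    size_t chunk, n = 0, off = 0;

    if (mtu < hdr_len + 8)
        return 0;
    chunk = (mtu - hdr_len) & ~(size_t)7;    /* round down to a multiple of 8 */
    do {
        size_t take = payload_len - off;
        if (n == max)
            return 0;
        if (take > chunk)
            take = chunk;
        out[n].offset_units   = (uint16_t)(off / 8);
        out[n].data_len       = (uint16_t)take;
        out[n].more_fragments = (off + take < payload_len);
        off += take;
        n++;
    } while (off < payload_len);
    return n;
}
```

For a 4000-byte payload, a 1500-byte MTU and a 20-byte header, this yields three fragments carrying 1480, 1480 and 1040 bytes at offsets 0, 185 and 370.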
96Packets vs. Datagrams
- An IP datagram is the unit of end-to-end transmission at the IP layer (before fragmentation & after reassembly)
- A packet is the unit of data passed between the IP layer and the link layer.
- A packet can be a complete IP datagram or a fragment
97IP Fragmentation
[Figure, built up over slides 97-101: a datagram (IP Header | Payload) is split into fragments, each with its own IP header. More Fragments is set on every fragment except the last. Slide 98 highlights the identifying information in each fragment's header. Slide 100 shows a non-final fragment being fragmented again, with More Fragments set on the pieces. Slide 101 shows the final fragment being fragmented again: More Fragments is set on all pieces except the last, where it is NOT set.]
102Don't Fragment
- Hosts must be able to receive packets of 576 bytes, which means a 512-byte datagram won't be fragmented
- One of the IPv4 header flags specifies that this packet should not be fragmented
16-bit Packet Identification | Flags: Reserved, Don't Fragment, More Fragments | Fragment Offset
103ICMP Unreachable Error
- Attempting to fragment a packet with "don't fragment" set generates an ICMP error packet
- ICMP type: destination unreachable (type 3)
- Code: fragmentation required but don't fragment set (code 4)
Type (3) | Code (4) | Checksum
MBZ | MTU of next network hop
IP Header (including options) and first 8 bytes of original IP datagram data
104MTU Discovery Using Don't Fragment Packets
105ICMP Source Quench
- If a router / host discards datagrams due to buffer overflows, it may send an ICMP source quench message
- I tried for 15 minutes to generate this on a slow host & was unable to do so
- More likely to occur when e.g., routing to a dialup, but even that failed.
- Can be used by a protocol to slow down transmission rate (e.g., TCP)
106UDP Pragmatics (review from code)
- UDP ports and TCP ports are separate name spaces
- UDP port 80 doesn't mean the same thing as TCP port 80
- UDP ports are unique to a specific interface
- Port 80 on loopback is not the same as port 80 on eth0
- Most POSIX/UNIX systems let you specify wildcards
- INADDR_ANY is a special address (0.0.0.0) that is a wildcard interface address
107Using netstat to see ports
current-45 netstat -n -a
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address       Foreign Address     State
tcp   0      0      128.138.202.92:22   128.138.241.121813  ESTABLISHED
tcp   0      0      0.0.0.0:6000        0.0.0.0:*           LISTEN
tcp   0      0      0.0.0.0:22          0.0.0.0:*           LISTEN
tcp   0      0      0.0.0.0:1024        0.0.0.0:*           LISTEN
tcp   0      0      0.0.0.0:758         0.0.0.0:*           LISTEN
tcp   0      0      0.0.0.0:25          0.0.0.0:*           LISTEN
tcp   0      0      0.0.0.0:113         0.0.0.0:*           LISTEN
tcp   0      0      0.0.0.0:79          0.0.0.0:*           LISTEN
tcp   0      0      0.0.0.0:512         0.0.0.0:*           LISTEN
tcp   0      0      0.0.0.0:513         0.0.0.0:*           LISTEN
tcp   0      0      0.0.0.0:514         0.0.0.0:*           LISTEN
tcp   0      0      0.0.0.0:23          0.0.0.0:*           LISTEN
tcp   0      0      0.0.0.0:21          0.0.0.0:*           LISTEN
tcp   0      0      0.0.0.0:37          0.0.0.0:*           LISTEN
tcp   0      0      0.0.0.0:13          0.0.0.0:*           LISTEN
tcp   0      0      0.0.0.0:111         0.0.0.0:*           LISTEN
udp   0      0      0.0.0.0:8000        0.0.0.0:*
udp   0      0      0.0.0.0:768         0.0.0.0:*
udp   0      0      0.0.0.0:770         0.0.0.0:*
udp   0      0      0.0.0.0:177         0.0.0.0:*
108Using netstat to see interfaces
current-45 netstat -n -a
...
udp   0      0      128.138.202.92:8000 0.0.0.0:*
udp   0      0      127.0.0.1:8000      0.0.0.0:*
udp   0      0      0.0.0.0:769         0.0.0.0:*
udp   0      0      0.0.0.0:768         0.0.0.0:*
109System Calls Used
- socket
- Create an endpoint on the local system
- bind
- Specify the local interface and port for the endpoint
- connect
- Specify the remote interface and port for the endpoint
- setsockopt / getsockopt
- Modify various default properties
110Bound and Connected Sockets
- Until bind is called, a socket is not bound
- Can't receive messages (haven't specified a port)
- When you send using an unbound socket, it's
bound to an ephemeral port
- Until connect is called, a socket is not
connected
- Sending messages on an unconnected socket
requires that you specify the destination address
each time.
- If you do call connect, you can only receive
messages on the connected socket from the
specified remote endpoint
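The implicit bind can be observed with getsockname. This is a sketch only: the function name is hypothetical, and the destination (loopback, port 9) is an arbitrary assumption; nothing needs to be listening there.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send one datagram from a socket that was never bound, then ask the
 * kernel which local port it picked. Returns the port in host byte
 * order, or 0 on failure. The destination port is illustrative only. */
unsigned short ephemeral_port_after_send(void)
{
    struct sockaddr_in dst, local;
    socklen_t len = sizeof(local);
    unsigned short port = 0;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return 0;

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(9);                      /* assumed destination port */
    dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* 127.0.0.1 */

    /* No bind() yet: the first send makes the kernel bind implicitly. */
    if (sendto(fd, "x", 1, 0, (struct sockaddr *)&dst, sizeof(dst)) == 1 &&
        getsockname(fd, (struct sockaddr *)&local, &len) == 0)
        port = ntohs(local.sin_port);

    close(fd);
    return port;
}
```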
111POSIX socket interface
- send
- Send a message on a connected socket
- sendto
- Send a datagram to a specified IP address. The
socket can be unconnected.
- recv
- Receive a datagram from a bound socket
- recvfrom
- Receive a datagram and record the source IP
address
- recvmsg
- Essentially like recvfrom, but arguments packed
in a struct
112One Last POSIX call - select
- select lets you wait for multiple file descriptors
to become ready, or for a timeout to occur
    #include <sys/time.h>
    int select(
        int nfds,
        fd_set *readfds,
        fd_set *writefds,
        fd_set *exceptfds,
        struct timeval *timeout);
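A minimal sketch of the common "wait for one descriptor with a timeout" use of select. The helper name wait_readable and the millisecond timeout are assumptions made for the example.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h> /* sockets are the usual descriptors passed in */
#include <sys/time.h>

/* Wait until `fd` has data to read or `timeout_ms` elapses.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* nfds is the highest descriptor number plus one. */
    int n = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return (n > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}
```

To watch several sockets, add each to the same fd_set and pass the largest descriptor plus one as nfds.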
113Common UDP Server Pattern
socket
setsockopt
bind
recvfrom
sendto
114Common UDP Client Pattern
socket
setsockopt
sendto
recvfrom
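The two patterns can be sketched as a loopback echo. All three function names are hypothetical; the server binds to 127.0.0.1 with a kernel-chosen port so the example does not collide with a real service, and setsockopt is omitted because no option is needed here.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Server side: socket + bind on the loopback interface. Port 0 asks the
 * kernel for any free port; the chosen port is written to *port_out.
 * Returns the descriptor or -1. */
int udp_server_socket(unsigned short *port_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}

/* Server side: recvfrom one datagram and sendto it back to its source.
 * Returns the number of bytes echoed, or -1. */
ssize_t udp_echo_once(int fd)
{
    char buf[1500];
    struct sockaddr_in peer;
    socklen_t peer_len = sizeof(peer);
    ssize_t n = recvfrom(fd, buf, sizeof(buf), 0,
                         (struct sockaddr *)&peer, &peer_len);
    if (n < 0)
        return -1;
    return sendto(fd, buf, (size_t)n, 0, (struct sockaddr *)&peer, peer_len);
}

/* Client side: sendto on an unbound, unconnected socket. The reply is
 * then read from the same descriptor with recvfrom. */
ssize_t udp_client_send(int client_fd, unsigned short port, const char *msg)
{
    struct sockaddr_in dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    dst.sin_port = htons(port);
    return sendto(client_fd, msg, strlen(msg), 0,
                  (struct sockaddr *)&dst, sizeof(dst));
}
```

A real server would call udp_echo_once in a loop; the client never calls bind because its first sendto assigns an ephemeral port.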
115Using Broadcast
- UDP broadcast involves sending to explicit
broadcast addresses
- Most POSIX implementations require you to explicitly
enable broadcast:
    int on = 1;
    ret = setsockopt(sockfd, SOL_SOCKET, SO_BROADCAST,
                     &on, sizeof(on));
- Only applicable to UDP!
116Broadcast Addresses
- Limited Net Broadcast - 255.255.255.255
- Never forwarded by a router!
- Net-directed Broadcast - e.g., 128.138.255.255
- A router must forward a net-directed broadcast,
but must have an option to disable this.
- Subnet-directed Broadcast - e.g., 128.138.202.255
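The directed broadcast addresses above follow from an address and its mask: keep the network bits and set every host bit. A short sketch, with hypothetical helper names and addresses held in host byte order:

```c
#include <stdint.h>

/* Directed broadcast address: keep the network bits, set every host bit.
 * Both arguments and the result are in host byte order. */
uint32_t directed_broadcast(uint32_t addr, uint32_t netmask)
{
    return (addr & netmask) | ~netmask;
}

/* Pack four dotted-quad octets into a host-order 32-bit value. */
uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}
```

For a host at 128.138.202.92, the mask 255.255.0.0 gives the net-directed address and 255.255.255.0 (an assumed subnet mask) gives the subnet-directed one.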
117Multicast
- Class D addresses are multicast addresses
- 224.0.0.0 through 239.255.255.255
- A specific multicast address defines a network
group
- Two special network groups
- 224.0.0.xxx is never routed
- 224.0.0.1 - all hosts group
- 224.0.0.2 - all routers group
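Both ranges reduce to a prefix test. A sketch, assuming addresses in host byte order; the function names are made up for the example.

```c
#include <stdint.h>

/* Class D means the top four bits are 1110, i.e. 224.0.0.0/4.
 * `addr` is an IPv4 address in host byte order. */
int is_multicast(uint32_t addr)
{
    return (addr & 0xF0000000u) == 0xE0000000u;
}

/* 224.0.0.0/24 is the block that routers never forward. */
int is_never_routed_group(uint32_t addr)
{
    return (addr & 0xFFFFFF00u) == 0xE0000000u;
}
```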
118Well-defined multicast groups
- ntp.mcast.net is 224.0.1.1
- Network time protocol
119Joining a Multicast Group prior to receive
    //
    // Set socket option to join mcast group
    //

    struct ip_mreq mreq;
    memcpy(&mreq.imr_multiaddr,          // desired multicast group
           &from_addr.sin_addr.s_addr,
           sizeof(struct in_addr));
    mreq.imr_interface.s_addr = htonl(INADDR_ANY);
    ret = setsockopt(sockfd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                     &mreq, sizeof(mreq));
    check_and_exit(ret, "setsockopt");
120Using TTL to define multicast scope
- TTL field is used to limit propagation of
multicast packets
- In IPv4
- 0 - node local - doesn't leave machine
- 1 - link local - doesn't get routed
- <32 - site local - ...but what's a site?
- <255 - global - the world
121Setting a TTL scope
    //
    // Set socket option to limit mcast scope
    //

    u_char ttl = 16;                     // desired TTL field
    ret = setsockopt(sockfd, IPPROTO_IP, IP_MULTICAST_TTL,
                     &ttl, sizeof(ttl));
    check_and_exit(ret, "setsockopt");
122Administrative Scoping
- 239.xxx.yyy.zzz is the administratively scoped
multicast IP space
- Addresses assigned locally to an organization,
but not unique across organizations
- Border routers must not forward them
- link-local -- 224.0.0.0 to 224.0.0.255
- site-local -- 239.255.0.0 to 239.255.255.255
- organization-local -- 239.192.0.0 to
239.195.255.255
- global -- 224.0.1.0 to 238.255.255.255
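The ranges listed above can be turned into a small classifier. This is a sketch: the enum and function names are hypothetical, addresses are in host byte order, and the parts of 239.0.0.0/8 not named on the slide are reported as a separate "other" case rather than guessed.

```c
#include <stdint.h>

enum mcast_scope {
    SCOPE_NOT_MULTICAST,
    SCOPE_LINK_LOCAL,    /* 224.0.0.0   - 224.0.0.255     */
    SCOPE_GLOBAL,        /* 224.0.1.0   - 238.255.255.255 */
    SCOPE_ORG_LOCAL,     /* 239.192.0.0 - 239.195.255.255 */
    SCOPE_SITE_LOCAL,    /* 239.255.0.0 - 239.255.255.255 */
    SCOPE_OTHER_ADMIN    /* rest of 239.0.0.0/8, not listed above */
};

/* Classify a host-byte-order IPv4 address using the ranges listed above. */
enum mcast_scope classify_mcast(uint32_t a)
{
    if (a < 0xE0000000u || a > 0xEFFFFFFFu) return SCOPE_NOT_MULTICAST;
    if (a <= 0xE00000FFu) return SCOPE_LINK_LOCAL;
    if (a <= 0xEEFFFFFFu) return SCOPE_GLOBAL;
    if (a >= 0xEFC00000u && a <= 0xEFC3FFFFu) return SCOPE_ORG_LOCAL;
    if (a >= 0xEFFF0000u) return SCOPE_SITE_LOCAL;
    return SCOPE_OTHER_ADMIN;
}
```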
123Converting Multicast to Ethernet
- Multicast addresses are targeted to a number of
clients
- How does the Ethernet card know which messages to
receive?
- Could simply broadcast all packets
- Takes the same amount of network bandwidth as
selective multicast, but...
- Disturbs all machines
- Can use ARP to advertise a single MAC as resolving
multiple IP addresses
- ...but multiple machines want to receive
124Mapping a Multicast Address to Ethernet Address
- Ethernet cards can usually receive on multiple
MAC addresses
- Multicast router enters a virtual MAC address,
clients receive on that virtual MAC
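The slide does not spell out how the virtual MAC is chosen. The sketch below assumes the standard IPv4 mapping from RFC 1112: the fixed prefix 01:00:5e followed by the low 23 bits of the group address. The function name is hypothetical.

```c
#include <stdint.h>

/* Derive the Ethernet group address for an IPv4 multicast address
 * (host byte order): the fixed prefix 01:00:5e, a zero bit, then the
 * low 23 bits of the IP address. */
void mcast_ip_to_mac(uint32_t group, uint8_t mac[6])
{
    mac[0] = 0x01;
    mac[1] = 0x00;
    mac[2] = 0x5e;
    mac[3] = (uint8_t)((group >> 16) & 0x7f); /* top bit forced to 0 */
    mac[4] = (uint8_t)((group >> 8) & 0xff);
    mac[5] = (uint8_t)(group & 0xff);
}
```

Because only 23 of the 28 group bits survive, 32 different groups share each MAC address, so the IP layer still has to filter on the full group address.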
125Digression - Virtual IP Addresses
- alias alias_address/bitmask
- Establishes an additional network address for
this interface.
- Example: ifconfig eth0 alias 128.138.241.79/26
- The following aliaslist command adds network
addresses 40 through 50, inclusive, to
subnets 18.240.32, 18.240.64, and 18.240.96
- ifconfig aliaslist 18.240.32,64,96.40-50
- Doesn't require multiple MAC addresses, but often
implemented using them.
126IGMP - Internet Group Management Protocol
- Part of IP layer
- Lets hosts and routers know who belongs to what
groups
IP Header (20 bytes)
IGMP Message (8 bytes)
127IGMP Message Format
- Type
- 1 is a query sent by multicast router
- 2 is a response sent by a host
- Group address is a class D IP address
- On query, it's zero
- On response, it's the group address being reported
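A sketch of building the 8-byte message. The slide only names the type and group address fields; the rest of the layout used here (4-bit version set to 1, 4-bit type, one unused byte, 16-bit Internet checksum) is taken from IGMP version 1 in RFC 1112 and is an assumption about which version the slide describes. Function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard Internet checksum: one's-complement sum of 16-bit big-endian
 * words, then complemented. `len` must be even here. */
uint16_t inet_checksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Build an 8-byte IGMP version 1 message. type: 1 = query, 2 = report.
 * `group` is host byte order and must be 0 for a query. */
void igmp_v1_build(uint8_t msg[8], uint8_t type, uint32_t group)
{
    memset(msg, 0, 8);
    msg[0] = (uint8_t)((1u << 4) | (type & 0x0f)); /* version 1 + type */
    msg[4] = (uint8_t)(group >> 24);
    msg[5] = (uint8_t)(group >> 16);
    msg[6] = (uint8_t)(group >> 8);
    msg[7] = (uint8_t)group;
    uint16_t ck = inet_checksum(msg, 8); /* computed with checksum field 0 */
    msg[2] = (uint8_t)(ck >> 8);
    msg[3] = (uint8_t)ck;
}
```

A receiver can validate a message by running inet_checksum over all 8 bytes; a correct message yields 0.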
128IGMP Host Reports
- Host sends a report when it joins a group
- Doesn't report when it leaves the group, but
doesn't respond to the next query
Host -> Router: IGMP report, TTL = 1
IGMP group addr = group address
dest IP addr = group address
src IP addr = host's IP address
129IGMP Router Query
- Router sends a query at regular intervals to see if
anyone still belongs to any groups. Queries are sent
out each interface.
- Host responds by sending one response for each
group to which it belongs
Router -> Host: IGMP query, TTL = 1
IGMP group addr = 0.0.0.0
dest IP addr = 224.0.0.1
src IP addr = router's IP address
130Sample Query on Windows Bootup
- 05:52:32.517937 arp who-has 192.168.1.6 tell
192.168.1.6
- 05:52:32.518010 linux > 192.168.1.6: icmp: echo
request
- 05:52:33.378928 192.168.1.6 > ALL-ROUTERS.MCAST.NET:
icmp: router solicitation
- 05:52:37.511385 arp who-has 192.168.1.6 tell
linux
- 05:52:37.511664 arp reply 192.168.1.6 is-at
0:a0:cc:3b:95:4b
- 05:52:38.453193 192.168.1.6 > ALL-ROUTERS.MCAST.NET:
icmp: router solicitation
- 05:52:43.477432 192.168.1.6 > ALL-ROUTERS.MCAST.NET:
icmp: router solicitation
131Multicast Routes
132Sender Sends Datagram With Specified TTL
Pruned because no one is listening
133Receiver Starts and Joins Group
134Routers Form Destination Tree
135Non-participants prune themselves
136TCP Protocol
137TCP Protocol
IP Datagram = IP Header (20 bytes) + TCP Segment
TCP Segment = TCP Header (20 bytes) + TCP Data (variable size)
138The TCP Protocol
16-bit Source Port | 16-bit Destination Port
32-bit sequence number
32-bit acknowledgment number
4-bit header length | reserved (6 bits) | flags (6 bits) | 16-bit window size
16-bit TCP Checksum | 16-bit Urgent Pointer
Options (if any)
Data (if any)
139The TCP Flags
URG | ACK | PSH | RST | SYN | FIN
- URG - urgent pointer is valid (Stevens, 20.8)
- ACK - the acknowledgment number is valid
- PSH - the receiver should pass this data to the
application as soon as possible (push)
- RST - reset the connection (Stevens, 18.7)
- SYN - synchronize the sequence numbers to
initiate a connection
- FIN - sender is finished sending data
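A sketch of decoding the fixed 20-byte header, including the flag bits. The struct and function names are hypothetical; the bit positions follow RFC 793, where the six flags occupy the low six bits of byte 13.

```c
#include <stdint.h>

enum { TCP_FIN = 0x01, TCP_SYN = 0x02, TCP_RST = 0x04,
       TCP_PSH = 0x08, TCP_ACK = 0x10, TCP_URG = 0x20 };

struct tcp_fields {           /* hypothetical name, host byte order */
    uint16_t src_port, dst_port;
    uint32_t seq, ack;
    uint8_t  header_len;      /* in bytes */
    uint8_t  flags;
    uint16_t window, checksum, urgent;
};

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/* Decode the fixed 20-byte part of a TCP header from wire format. */
struct tcp_fields tcp_parse(const uint8_t h[20])
{
    struct tcp_fields f;
    f.src_port = rd16(h);
    f.dst_port = rd16(h + 2);
    f.seq = rd32(h + 4);
    f.ack = rd32(h + 8);
    f.header_len = (uint8_t)((h[12] >> 4) * 4); /* field counts 32-bit words */
    f.flags = h[13] & 0x3f;
    f.window = rd16(h + 14);
    f.checksum = rd16(h + 16);
    f.urgent = rd16(h + 18);
    return f;
}
```

A segment is a pure SYN when `f.flags == TCP_SYN`; the acknowledgment number is meaningful only when `f.flags & TCP_ACK` is set.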
140TCP Header
- The combination of an IP address and a port
number is called a socket in RFC 793.
- A socket pair specifies a TCP connection
- The sequence number is used to number the
starting byte of each segment. The sequence numbers
wrap around after 2^32 bytes have been
sent.
- When the SYN flag is set, the sequence number
contains the initial sequence number (ISN).
- The acknowledgment number contains the next
sequence number the sender of the ACK expects to
receive
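Because sequence numbers wrap, they cannot be compared with a plain `<`. One common approach, sketched here with hypothetical function names, is to look at the sign of the 32-bit difference; it assumes two's-complement conversion and that the two values are less than 2^31 apart.

```c
#include <stdint.h>

/* True if sequence number a comes before b, allowing for wraparound.
 * Valid while the two values are less than 2^31 apart. */
int seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Sequence number of the byte that follows a segment of `len` bytes
 * starting at `seq`; unsigned arithmetic wraps modulo 2^32. */
uint32_t seq_next(uint32_t seq, uint32_t len)
{
    return seq + len;
}
```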
141Acknowledgments
- TCP uses a sliding window protocol without
selective or negative acknowledgments.
- Selective acknowledgments would let the protocol
say it's missing a range of bytes. TCP can only
say that it has received up to byte N.
- The protocol has no way to specify a negative
acknowledgment. It can only say what has been
received
(Figure: byte stream from 0 up to Next ACK)
142Other Header Fields
- TCP's flow control is limited by a window size,
which represents space allocated by the O/S for
the connection. The sender should not transmit
more data than the window size can hold.
- The window counts bytes starting from the one
named by the acknowledgment field. It can be up to
65535 bytes, but this value can be scaled to allow
larger sizes (Stevens 24.4)
- The urgent pointer is a positive offset that
must be added to the sequence number of the
segment to yield the sequence number of the last
byte of urgent data. This is used to send
emergency data to the other end (Stevens 20.8)
143Connection Protocol: Three-way handshake
- The client sends a SYN segment specifying the
port number of the server and the client's ISN
- The server responds with a SYN and its own ISN. The
server ACKs the client's SYN using the client's ISN + 1.
- A SYN consumes one sequence number
- The client must ACK the SYN from the server using
the server's ISN + 1
- The side sending the first SYN is said to perform
an active open. The other side performs a passive
open.
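The sequence and acknowledgment numbers in the three segments can be worked out directly from the two ISNs. A sketch with a hypothetical struct that keeps only the fields of interest:

```c
#include <stdint.h>

struct segment {            /* hypothetical, only the fields of interest */
    int syn, ack_flag;
    uint32_t seq, ack;
};

/* Fill out[0..2] with the three handshake segments for the given ISNs. */
void three_way_handshake(uint32_t client_isn, uint32_t server_isn,
                         struct segment out[3])
{
    /* 1. client -> server: SYN carrying the client's ISN */
    out[0] = (struct segment){ 1, 0, client_isn, 0 };
    /* 2. server -> client: SYN with the server's ISN, ACK of client ISN + 1 */
    out[1] = (struct segment){ 1, 1, server_isn, client_isn + 1 };
    /* 3. client -> server: ACK of server ISN + 1; the client's SYN used up
     *    one sequence number, so this segment starts at client ISN + 1 */
    out[2] = (struct segment){ 0, 1, client_isn + 1, server_isn + 1 };
}
```

The additions are done in uint32_t, so an ISN of 0xFFFFFFFF is acknowledged with 0.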
144Connection Timeline
145Termination
- Four segments are needed to terminate a connection,
because of TCP's half-close.
- Receipt of a FIN only means no more data will
flow in that particular direction. The other
direction may still be active.
- The FIN sender performs an active close, the
other performs a passive close.
- When the server receives a FIN, it sends an ACK
of the received sequence number plus one (segment
5)
- O/S delivers end of file to application
- Server then closes its connection, causing a FIN
to the client, which the client ACKs using the
received sequence number + 1
146Normal Termination
147TCP State Transition Diagram
148TCP State Transition Diagram
149TCP State Transition Diagram
150Normal TCP/IP Connection Termination
151You can use netstat to see connection state
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        1      0 dialup-85-157.Col:32779 a216-200-14-151.dep:www CLOSE_WAIT
tcp        1      0 dialup-85-157.Col:32780 a216-200-14-151.dep:www CLOSE_WAIT
tcp        0    136 linux:telnet            grok:1050               ESTABLISHED
tcp        0      0 dialup-85-157.Colo:1023 foobar.cs.Colorado.:ssh ESTABLISHED
udp        0      0 localhost:32856         localhost:domain        ESTABLISHED
152The 2-MSL Wait State
- Every implementation chooses a value for the
maximum segment lifetime -- the maximum time any
segment can exist in the network before being
discarded
- Specified in RFC 793 as 2 minutes, but common
values are 30 seconds, 1 minute, or 2 minutes
- When TCP performs an ACTIVE CLOSE and sends the
final ACK, that connection must stay in the
TIME_WAIT state for 2MSL.
- This lets TCP re-send the final ACK if it's lost;
if all connection information were gone, it
couldn't retransmit
153The 2-MSL Wait State
- While a connection is in 2MSL, the socket pair
can not be re-used
- Most BSD systems also insist that the port can't
be re-used while that (local) port number is in a
2MSL state
- Use SO_REUSEADDR to override that
- Normally, the CLIENT does the active close and
the CLIENT enters the TIME_WAIT state
- 2MSL wait is not typically an issue since the CLIENT
usually picks an ephemeral port, so no one cares
if it can't be reused
154TIME-WAIT for servers
- The TIME_WAIT / 2MSL state causes problems for
servers
- An active close on a well-known port means the
server can't be restarted for 1-4 minutes,
depending on MSL.
- Allegedly true even if SO_REUSEADDR is specified
- But, BSD systems allow a new connection to be
established if the ISN is larger than the
final sequence number of the previous connection
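A sketch of a listening socket that sets SO_REUSEADDR before bind. Whether this actually avoids the restart delay depends on the implementation, as noted above. The function name and its parameters are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP listening socket on the wildcard address with
 * SO_REUSEADDR set before bind(). Returns the descriptor or -1.
 * `backlog` bounds the queue of connections waiting for accept(). */
int make_listener(unsigned short port, int backlog)
{
    struct sockaddr_in addr;
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    /* The option must be set before bind() to have any effect. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```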
155TCP Options
- End-of-option list
- No-op (used to align options on 32-bit
boundaries)
- MSS
- Window scale factor
- Timestamp
- Timestamp value echo reply
156Server Design
- Restricting local IP addresses
- Same rules as UDP
- Restricting foreign IP addresses
- Most APIs don't support a "connect" on the
server to allow it to fully specify the remote
end-point.
- Server specifies incoming connection request
queue
- Backlog of active connections
157Overview of Mobile IP
158IPv6 Design Goals
- IPv4 was very successful, but the limited
addresses pose problems
- Experience had shown that aspects of IPv4 were
problematic: option headers, fragments
- Simplifications for IPv6
- Move to 128-bit addresses
- Assign a fixed format to all headers
- Remove the header checksum
- Use extension headers rather than options
- Remove the hop-by-hop segmentation procedure
159IPv4 Header
Version
Hdr Lth
Type of Svc
Total length (in bytes)
16-bit Packet Identification
Flags
Fragment Offset
Time To Live
Protocol
Header Checksum
Source IP Address
Destination IP Address
... (options, if any)...
Data
160IPv6 Header
Version
Class
Flow Label
Payload Length
Next Header
Hop Limit
161IPv6 Header
- Version -- 6
- Class -- used to assign a service class for
real time networking
- Flow -- used to identify packets that are in a
flow, or which should get the same routing behavior
at intermediate points (not a virtual circuit
identifier or specifier!)
- Payload Length -- only includes the payload (not the
40 byte header); 16 bits, so packets < 64K
- Next Header -- the type of the next header (e.g.,
TCP, UDP or one of the extension headers)
- Hop limit -- TTL renamed for honesty
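A sketch of decoding these fields from the first 8 bytes of the header. The field widths (4-bit version, 8-bit class, 20-bit flow label) are assumed from RFC 2460; the slide does not give them. The struct and function names are hypothetical.

```c
#include <stdint.h>

struct ipv6_fixed {          /* hypothetical name; host byte order */
    uint8_t  version;        /* 4 bits, always 6 */
    uint8_t  traffic_class;  /* 8 bits */
    uint32_t flow_label;     /* 20 bits */
    uint16_t payload_len;    /* bytes after the 40-byte header */
    uint8_t  next_header;    /* e.g. 6 = TCP, 17 = UDP */
    uint8_t  hop_limit;
};

/* Decode the first 8 bytes of an IPv6 header (the addresses follow). */
struct ipv6_fixed ipv6_parse(const uint8_t h[8])
{
    uint32_t w = ((uint32_t)h[0] << 24) | ((uint32_t)h[1] << 16) |
                 ((uint32_t)h[2] << 8) | h[3];
    struct ipv6_fixed f;
    f.version = (uint8_t)(w >> 28);
    f.traffic_class = (uint8_t)((w >> 20) & 0xff);
    f.flow_label = w & 0xfffff;
    f.payload_len = (uint16_t)((h[4] << 8) | h[5]);
    f.next_header = h[6];
    f.hop_limit = h[7];
    return f;
}
```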
162(non) Coexistence
- The original intent was to have IPv4 and IPv6
deployed concurrently over the same network
fabric
- That idea has been pitched.
- IPv6 has been assigned an Ethernet content type
of 0x86DD vs. 0x0800 for IPv4
- The 6BONE provides a virtual IPv6 network using
IPv4 encapsulation, akin to the MBONE.
163Fragments
- Lesson: unit of transmission should be unit of
control
- No fragments created en route in IPv6
- If message > MTU, you get an ICMP message and should
use PMTU
- However, there is a way to fragment a datagram,
but it's done in an end-to-end fashion.
164From Options To Extension Headers
IPv6 Header (Next Header = TCP) | TCP Header + Payload
IPv6 Header (Next Header = Routing) | Routing Header (Next Header = TCP) | TCP Header + Payload
165Extension Headers
- Goal: intermediate routers don't need to look at
the headers. Unless we tell them to.
- Extension headers and protocols (e.g. TCP) share
the same 256-entry name space, so there is a limited
number of extensions
- Current IPv6 Extension Headers
- Routing Header
- Fragment Header
- Destination Options Header
- Hop-by-Hop Options Header
- Authentication Header
- Encapsulating Security Payload
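A sketch of following the Next Header chain to find the upper-layer protocol. The function name is hypothetical, the numeric values are the IANA protocol numbers, and only four of the extension headers are handled; the Authentication Header and Encapsulating Security Payload use different length rules and are left out on purpose.

```c
#include <stddef.h>
#include <stdint.h>

enum { NH_HOP_BY_HOP = 0, NH_TCP = 6, NH_UDP = 17,
       NH_ROUTING = 43, NH_FRAGMENT = 44, NH_DEST_OPTS = 60 };

/* Follow the Next Header chain through `buf`, the bytes that come right
 * after the 40-byte IPv6 header. `first` is the Next Header value from
 * that header. On success returns the first value that is not one of
 * the extension headers handled here and stores its offset in *offset;
 * returns -1 if the buffer is too short. */
int skip_extension_headers(uint8_t first, const uint8_t *buf, size_t len,
                           size_t *offset)
{
    uint8_t nh = first;
    size_t off = 0;

    for (;;) {
        size_t hdr_len;
        if (nh != NH_HOP_BY_HOP && nh != NH_ROUTING &&
            nh != NH_FRAGMENT && nh != NH_DEST_OPTS) {
            *offset = off;
            return nh;
        }
        if (len - off < 8)
            return -1;
        /* Fragment header is always 8 bytes; the others store their
         * length in 8-byte units, not counting the first 8 bytes. */
        hdr_len = (nh == NH_FRAGMENT) ? 8 : ((size_t)buf[off + 1] + 1) * 8;
        if (hdr_len > len - off)
            return -1;
        nh = buf[off];   /* first byte of every header is Next Header */
        off += hdr_len;
    }
}
```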
166Routing Extension Header
Next Header
Hdr Ext Len
Routing Type = 0
Segments Left
Reserved
...
167Routing Extension Header
- Plays same role as source routing header
- Basic idea: when a datagram reaches a
destination, the destination checks for a routing
header. If there is at least one segment left,
that address is copied from the routing header
and the packet is forwarded to that
address. Otherwise, the routing header is
removed and the next routing header is processed.
- You can have multiple routing headers if the
8-bit header length causes a problem.
- You can specify other source routing modes using
type
168Fragment Header
Next Header | Reserved | Fragment Offset (13 bits) | RES | M
Identification
- Each fragment is routed independently
- Identification identifies the original packet
that was fragmented
- The offset is the offset of the fragment within
the original packet, in 8-byte units
- The M field is a "more fragments" bit and is
set to 1 for all but the last fragment
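A sketch of decoding the 8-byte fragment header. The struct and function names are hypothetical, and the bit positions (13-bit offset, 2 reserved bits, then M as the lowest bit) are assumed from RFC 2460.

```c
#include <stdint.h>

struct frag_fields {         /* hypothetical name; host byte order */
    uint8_t  next_header;
    uint16_t offset_bytes;   /* offset within the original packet */
    int      more_fragments; /* the M bit */
    uint32_t identification;
};

/* Decode the 8-byte IPv6 fragment header. */
struct frag_fields frag_parse(const uint8_t h[8])
{
    uint16_t w = (uint16_t)((h[2] << 8) | h[3]);
    struct frag_fields f;
    f.next_header = h[0];
    f.offset_bytes = (uint16_t)((w >> 3) * 8); /* 13-bit field, 8-byte units */
    f.more_fragments = w & 1;
    f.identification = ((uint32_t)h[4] << 24) | ((uint32_t)h[5] << 16) |
                       ((uint32_t)h[6] << 8) | h[7];
    return f;
}
```

Fragments that carry the same identification (and the same source and destination) belong to one original packet; the receiver places each at offset_bytes and knows it has the tail when it sees M = 0.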
169