Title: CS 498 Lecture 14 The Internet Protocol V4
- Jennifer Hou
- Department of Computer Science
- University of Illinois at Urbana-Champaign
- Reading: Chapter 14, The Linux Networking Architecture: Design and Implementation of Network Protocols in the Linux Kernel
2. First Possible Path of an IP Packet
- Packets arrive on an interface and are stored in the input queue of the respective CPU.
- Once the layer-3 protocol in the LLC has been determined (e.g., ETH_P_IP), the packet is passed to the ip_rcv() function.
3. Recall the Packet Path in LLC
4. Internet Protocol Implementation in Linux
[Figure: call graph of the IPv4 implementation between dev.c and the higher layers]
- Source files shown: ip_input.c, ip_forward.c, ip_output.c, dev.c
- Input side: net_rx_action, ip_rcv, ip_rcv_finish, ip_route_input, ip_local_deliver, ip_mr_input
- Forwarding: ip_forward, ip_forward_finish
- Output side: ip_queue_xmit, ip_queue_xmit2, ip_output, ip_fragment, ip_finish_output, ip_finish_output2, neigh_resolve_output, dev_queue_xmit
- Netfilter hooks: IP_PRE_ROUTING, IP_LOCAL_INPUT, IP_FORWARD, IP_LOCAL_OUTPUT, IP_POST_ROUTING
- Other boxes: ROUTING (Forwarding Information Base), MULTICAST, ARP, Higher Layers
5. 2nd/3rd Possible Path of an IP Packet
- TCP/UDP packets are packed into an IP packet and passed down to IP via ip_queue_xmit().
- The IP layer generates IP packets itself, e.g., multicast packets, fragments of a large packet, or ICMP/IGMP packets.
6. Path of Incoming IP Packets
7. ip_rcv(skb,dev,pkt_type)
- Packets that are not addressed to the host (packets received in promiscuous mode) are rejected.
- A sanity check is performed:
  - Does the packet have at least the size of an IP header?
  - Is this IP version 4?
  - Is the checksum correct?
  - Does the packet have a wrong length?
- If the total length given in the IP header is less than skb->len, skb_trim(skb, iph->tot_len) is invoked to cut off the excess bytes.
- The netfilter hook NF_IP_PRE_ROUTING is invoked.
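The sanity checks listed for ip_rcv() can be sketched in user-space C. This is an illustration under stated assumptions, not the kernel code: ip_checksum(), ip_sanity_check(), and the enum values are hypothetical names, and the packet is a plain byte array rather than an sk_buff.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum over the header, read as big-endian 16-bit words.
 * A header whose checksum field is correct yields 0. */
uint16_t ip_checksum(const uint8_t *hdr, size_t hdr_len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < hdr_len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);   /* fold carries back in */
    return (uint16_t)~sum;
}

enum ip_check {
    IP_OK, IP_TOO_SHORT, IP_BAD_VERSION, IP_BAD_CHECKSUM, IP_BAD_LENGTH
};

/* Hypothetical helper that mirrors the checks on the slide.
 * On IP_OK, *trimmed_len holds the length the buffer should be cut to. */
enum ip_check ip_sanity_check(const uint8_t *pkt, size_t len,
                              size_t *trimmed_len)
{
    if (len < 20)
        return IP_TOO_SHORT;                  /* smaller than a minimal header */
    if ((pkt[0] >> 4) != 4)
        return IP_BAD_VERSION;                /* not IPv4 */
    size_t ihl = (size_t)(pkt[0] & 0x0f) * 4; /* header length in bytes */
    if (ihl < 20 || len < ihl)
        return IP_TOO_SHORT;
    if (ip_checksum(pkt, ihl) != 0)
        return IP_BAD_CHECKSUM;
    size_t tot_len = ((size_t)pkt[2] << 8) | pkt[3];
    if (tot_len < ihl || len < tot_len)
        return IP_BAD_LENGTH;                 /* header claims more than we have */
    *trimmed_len = tot_len;                   /* excess bytes (padding) are cut */
    return IP_OK;
}
```

A caller would pass the received bytes and, on IP_OK, shorten its buffer to the returned length, which corresponds to the skb_trim() step.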
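8. Packet Filtering Architecture in Linux
[Figure: netfilter hooks on the three packet paths]
- Incoming packets: device driver (input) -> CRC check, consistency check -> NF_IP_PRE_ROUTING -> routing -> NF_IP_LOCAL_IN (iptables INPUT) -> higher layers, local processes
- Forwarded packets: NF_IP_PRE_ROUTING -> routing -> NF_IP_FORWARD (iptables FORWARD) -> NF_IP_POST_ROUTING -> device driver (output)
- Outgoing packets: higher layers, local processes -> NF_IP_LOCAL_OUT (iptables OUTPUT) -> routing -> NF_IP_POST_ROUTING -> device driver (output)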
9. ip_rcv_finish(skb)
- ip_route_input() is invoked to determine the route of the packet.
- skb->dst is set to an entry in the routing cache, which stores both the destination IP address and a pointer to an entry in the hard header cache (the cache for layer-2 frame headers).
- If the IP packet header includes options, an ip_options structure is created.
- skb->dst->input() points to the function that handles the packet next, depending on whether it is delivered locally or forwarded further:
  - ip_local_deliver(), ip_forward(), ip_mr_input()
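The role of the input() function pointer can be sketched as follows. This is a minimal model, not kernel code: struct pkt and struct dst stand in for sk_buff and the routing cache entry, and route_input(), rcv_finish(), local_deliver(), and forward() are hypothetical names.

```c
#include <stddef.h>

struct pkt;                                /* stand-in for struct sk_buff */
typedef int (*input_fn)(struct pkt *);

struct dst { input_fn input; };            /* stand-in for a routing cache entry */
struct pkt { struct dst *dst; int handled_by; };

enum { BY_LOCAL = 1, BY_FORWARD = 2 };

int local_deliver(struct pkt *p) { p->handled_by = BY_LOCAL; return 0; }
int forward(struct pkt *p)       { p->handled_by = BY_FORWARD; return 0; }

/* Hypothetical route lookup: the decision is made once and stored
 * in the dst entry as a function pointer. */
void route_input(struct pkt *p, struct dst *d, int addressed_to_us)
{
    d->input = addressed_to_us ? local_deliver : forward;
    p->dst = d;
}

/* After routing, the caller does not need to know which path was chosen. */
int rcv_finish(struct pkt *p)
{
    return p->dst->input(p);
}
```

Storing the handler in the route entry keeps the receive path free of per-packet branching on the destination type.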
10. IP Forwarding
- To activate IP packet forwarding, do
  - echo 1 > /proc/sys/net/ipv4/ip_forward
11. ip_forward(skb)
- Step 1: Packets whose pkt_type is not PACKET_HOST are dropped.
- Step 2: If TTL <= 1, the packet is dropped, and an ICMP message of type ICMP_TIME_EXCEEDED is returned.
- Step 3: skb_cow(skb,headroom) is used to check whether there is still sufficient space for the MAC header of the output device. If not, skb_realloc_headroom() creates sufficient space.
12. Recall pkt_type in the sk_buff Structure
- pkt_type specifies the type of a packet:
  - PACKET_HOST: a packet sent to the local host
  - PACKET_BROADCAST: a broadcast packet
  - PACKET_MULTICAST: a multicast packet
  - PACKET_OTHERHOST: a packet not destined for the local host, but received in promiscuous mode
  - PACKET_OUTGOING: a packet leaving the host
  - PACKET_LOOPBACK: a packet sent by the local host to itself
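13. Recall the skb Structure
[Figure: an sk_buff linked into an sk_buff_head list and pointing into its packet data area]
- sk_buff fields shown: next, prev, list, stamp, dev (points to a net_device), h, nh, mac, dst, len, ..., head, data, tail, end
- Packet data area: IP header, UDP header, UDP data
- datarefp: 1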
14. Recall How skb is Managed
- skb_cow(skb,headroom) checks whether the passed socket buffer still has at least headroom bytes free at the front of the packet data space.
- skb_realloc_headroom(skb,newheadroom) creates a new socket buffer with a headroom of size newheadroom.
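The headroom idea can be illustrated with a simplified buffer that keeps the same four pointers as the packet data area (head, data, tail, end). This is a sketch under assumptions: struct buf and the buf_* functions are hypothetical, and the model ignores cloned or shared buffers, which the kernel functions also have to handle.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the packet data area of a socket buffer. */
struct buf {
    unsigned char *head;  /* start of the allocated area */
    unsigned char *data;  /* start of the packet data */
    unsigned char *tail;  /* end of the packet data */
    unsigned char *end;   /* end of the allocated area */
};

size_t buf_headroom(const struct buf *b)
{
    return (size_t)(b->data - b->head);
}

/* Allocate a buffer with the given headroom and copy len payload bytes in. */
int buf_init(struct buf *b, size_t headroom, const void *payload, size_t len)
{
    b->head = malloc(headroom + len);
    if (!b->head)
        return -1;
    b->data = b->head + headroom;
    memcpy(b->data, payload, len);
    b->tail = b->data + len;
    b->end = b->tail;
    return 0;
}

/* Ensure at least `need` bytes of headroom; reallocate only if necessary. */
int buf_ensure_headroom(struct buf *b, size_t need)
{
    if (buf_headroom(b) >= need)
        return 0;                      /* enough space already */
    struct buf nb;
    if (buf_init(&nb, need, b->data, (size_t)(b->tail - b->data)) != 0)
        return -1;
    free(b->head);
    *b = nb;
    return 0;
}

/* Claim n bytes of headroom for a new header in front of the data. */
unsigned char *buf_push(struct buf *b, size_t n)
{
    b->data -= n;
    return b->data;
}
```

For example, a buffer created with 2 bytes of headroom must be reallocated before a 14-byte Ethernet header can be prepended.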
15. ip_forward(skb)
- Step 4: The TTL field of the IP packet is decremented by 1.
- Step 5: If the packet length (including the MAC header) is too large (skb->len > mtu) and no fragmentation is allowed (the Don't Fragment bit is set in the IP header), the packet is discarded and an ICMP message with ICMP_FRAG_NEEDED is sent back.
- Step 6: The netfilter hook NF_IP_FORWARD is invoked.
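The decision logic of steps 1 to 6 can be condensed into one function. This is a sketch, not the kernel's ip_forward(): struct fwd_pkt, forward_check(), and the result codes are hypothetical, and the headroom handling of step 3 is left out.

```c
#include <stdint.h>

enum fwd_result { FWD_OK, FWD_DROP, FWD_TIME_EXCEEDED, FWD_FRAG_NEEDED };

/* Hypothetical summary of the packet fields the forwarding steps look at. */
struct fwd_pkt {
    int is_packet_host;   /* 1 if pkt_type == PACKET_HOST */
    uint8_t ttl;
    int dont_fragment;    /* DF bit of the IP header */
    unsigned len;
};

enum fwd_result forward_check(struct fwd_pkt *p, unsigned mtu)
{
    if (!p->is_packet_host)
        return FWD_DROP;               /* step 1 */
    if (p->ttl <= 1)
        return FWD_TIME_EXCEEDED;      /* step 2: reply ICMP_TIME_EXCEEDED */
    /* step 3 (headroom check) is not modelled here */
    p->ttl--;                          /* step 4 */
    if (p->len > mtu && p->dont_fragment)
        return FWD_FRAG_NEEDED;        /* step 5: reply ICMP_FRAG_NEEDED */
    return FWD_OK;                     /* step 6: hand over to NF_IP_FORWARD */
}
```

Note that a packet arriving with TTL 1 is rejected before the decrement, so a forwarded packet never leaves with TTL 0.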
16. ip_forward_finish(skb)
- If IP options exist, they are processed in ip_forward_options().
- ip_send() is invoked to check whether the packet has to be fragmented.
- Either ip_finish_output() or ip_fragment() is invoked.
17. ip_finish_output(skb)
- skb->dev is set to the output network device dev.
- The layer-2 packet type is set to ETH_P_IP.
- The netfilter hook NF_IP_POST_ROUTING is invoked.
18. ip_finish_output2(skb)
- If skb->dst already includes a pointer to the layer-2 header cache (dst->hh), the layer-2 header is copied directly into the packet data space of the skb.
- Otherwise, the neigh_resolve_output() function (which implements ARP) is invoked.
- dev_queue_xmit() is invoked to pass the packet down to the device.
19. ip_local_deliver(skb)
- The only task of ip_local_deliver(skb) is to reassemble fragmented packets by invoking ip_defrag().
- The netfilter hook NF_IP_LOCAL_IN is invoked.
20. ip_local_deliver_finish(skb)
- The protocol ID in the IP header is used to calculate the hash value for the ipprot hash table.
- If the corresponding transport protocol is found, its handler is invoked:
  - tcp_v4_rcv(): TCP
  - udp_rcv(): UDP
  - icmp_rcv(): ICMP
  - igmp_rcv(): IGMP
- If no protocol is found, the packet is passed to a RAW socket (if one exists) or dropped, and an ICMP Destination Unreachable message is returned.
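The lookup by protocol ID can be sketched as a small chained hash table. This is an illustration only: struct proto_entry, proto_add(), and proto_deliver() are hypothetical names, and both the table size of 32 and the masking hash are assumptions made for the example.

```c
#include <stddef.h>

#define MAX_INET_PROTOS 32   /* assumed table size; a power of two */

struct proto_entry {
    unsigned char protocol;            /* IP protocol number, e.g. 6 for TCP */
    const char *name;
    int (*handler)(const void *pkt);   /* receive routine of the protocol */
    struct proto_entry *next;          /* chain for entries in the same slot */
};

struct proto_entry *proto_table[MAX_INET_PROTOS];

unsigned proto_hash(unsigned char protocol)
{
    return protocol & (MAX_INET_PROTOS - 1);
}

void proto_add(struct proto_entry *e)
{
    unsigned h = proto_hash(e->protocol);
    e->next = proto_table[h];
    proto_table[h] = e;
}

/* Returns the handler's result, or -1 if no protocol is registered.
 * A real stack would then try RAW sockets or send an ICMP error. */
int proto_deliver(unsigned char protocol, const void *pkt)
{
    struct proto_entry *e;
    for (e = proto_table[proto_hash(protocol)]; e; e = e->next)
        if (e->protocol == protocol)
            return e->handler(pkt);
    return -1;
}

/* Demo handlers that just report their protocol number. */
int demo_tcp_rcv(const void *pkt) { (void)pkt; return 6; }
int demo_udp_rcv(const void *pkt) { (void)pkt; return 17; }
```

Because several protocol numbers can map to the same slot, the loop compares the stored protocol number before calling the handler.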
21. Hash Table ipprot
[Figure: the array inet_protos[MAX_INET_PROTOS], indexed 0 to MAX_INET_PROTOS, whose slots point to chained inet_protocol structures]
- inet_protocol fields shown: handler, err_handler, next, protocol, copy, data, name
- Example entry for UDP: handler udp_rcv(), err_handler udp_err(), protocol IPPROTO_UDP, name "UDP"
- Example entry for IGMP: handler igmp_rcv(), err_handler NULL, protocol IPPROTO_IGMP, name "IGMP"
22. Path of Outgoing Packets
23. ip_queue_xmit(skb)
- skb->sk->dst is checked to see whether it contains a pointer to an entry in the routing cache.
- All the packets of a socket are routed along the same path, so storing a pointer to a routing entry in sk->dst saves an expensive routing table lookup.
- If no route is present (e.g., for the first packet of a socket), ip_route_output() is invoked to determine a route.
- The fields of the IP packet are filled in (version, header length, TOS, fragment offset, TTL, addresses, and protocol).
- If IP options exist, ip_options_build() is invoked.
- The netfilter hook NF_IP_LOCAL_OUT is invoked.
24. ip_queue_xmit2(skb)
- Checks how much headroom is available in the socket buffer.
- The packet is checked for fragmentation and the checksum is computed (ip_send_check(iph)).
- sk->dst->output() causes the ip_output() function to be invoked. ip_output() invokes the netfilter hook NF_IP_POST_ROUTING.
25. IP Fragmentation
26. IP Fragmentation
- If the packet size is larger than the MTU of the transmission medium, the packet has to be split into smaller packets.
27. ip_fragment(skb,output)
- The maximum packet size is computed.
- IP fragments are created in a while loop until the datagram has been divided into smaller packets.
- For each new IP fragment:
  - alloc_skb() is used to create a new socket buffer.
  - The IP packet header is copied to the fragment, with the MF bit and the offset field properly set.
  - The corresponding payload is copied to the fragment as well.
  - If IP options exist, ip_options_fragment() is invoked.
  - ip_send_check() is invoked to compute the IP checksum.
- The original packet is released with kfree_skb().
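The fragmentation loop can be sketched by computing only the offsets, lengths, and MF bits. This is a sketch, not ip_fragment() itself: struct frag and plan_fragments() are hypothetical, and no buffers are allocated or copied. It relies on the IPv4 rule that the offset field counts units of 8 bytes, so every fragment except the last carries a payload that is a multiple of 8.

```c
#include <stddef.h>

struct frag {
    unsigned offset_units;   /* value of the offset field (units of 8 bytes) */
    unsigned payload_len;    /* payload bytes carried by this fragment */
    int more_fragments;      /* MF bit */
};

/* Plans the fragments for payload_len bytes behind a header of hdr_len bytes.
 * Returns the number of fragments written to out, or 0 if the MTU is too
 * small or out has fewer than the required number of slots. */
size_t plan_fragments(unsigned payload_len, unsigned hdr_len, unsigned mtu,
                      struct frag *out, size_t max_out)
{
    if (mtu < hdr_len + 8)
        return 0;
    unsigned chunk = (mtu - hdr_len) & ~7u;  /* largest multiple of 8 that fits */
    unsigned done = 0;
    size_t n = 0;

    while (done < payload_len) {
        if (n == max_out)
            return 0;
        unsigned left = payload_len - done;
        unsigned len = left > chunk ? chunk : left;
        out[n].offset_units = done / 8;
        out[n].payload_len = len;
        out[n].more_fragments = (done + len < payload_len);
        done += len;
        n++;
    }
    return n;
}
```

For a 4000-byte payload, a 20-byte header, and an MTU of 1500, this yields three fragments with payloads of 1480, 1480, and 1040 bytes at offsets 0, 185, and 370.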
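28. Recall What the IP Header Looks Like
IP packet format (one row per 32-bit word, bit positions 0 to 31):
- Bits 0-3: Version; bits 4-7: IHL; bits 8-15: Codepoint; bits 16-31: Total length
- Fragment ID; DF and MF flags; Fragment offset
- Time to Live; Protocol; Checksum
- Source address
- Destination address
- Options and payload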
29. Reassembling Packets
- Recall that ip_local_deliver() passes all the fragmented IP packets to ip_defrag().
- The fragments are stored in the fragment cache until either all the fragments of a datagram have arrived, or the maximum wait time (ipfrag_time, 30 seconds) has expired.
30. Fragment Cache
[Figure: the hash table ipq_hash[IPQ_HASHSZ], indexed 0 to IPQ_HASHSZ, whose slots point to chained ipq structures, each with a list of sk_buff fragments]
- ipq fields shown: next, saddr, daddr, id, protocol, last_in, len, meat, lock, refcnt, timer, pprev, iif, fragments
- The hash value is calculated from saddr, daddr, id, and protocol.
- Annotations: a flag that specifies whether all fragments have arrived; len, the length of the original packet; meat, the bytes already in the cache.
31. APIs Used for Reassembling Fragments
- ipq_unlink(qp) removes the ipq entry from the fragment cache.
- ip_frag_destroy(qp) releases an ipq fragment list. First, frag_kfree_skb() releases all the socket buffers of the fragments, and then frag_free_queue() releases the ipq structure.
- ip_expire() is the handling routine for the timer.
  - When the timer expires and not all fragments of the datagram have arrived, the entry in the fragment cache is deleted, and an ICMP message of type ICMP_TIME_EXCEEDED is sent back.
32. APIs Used for Reassembling Fragments
- ip_frag_create(hash,iph) creates a new entry in the fragment cache and uses the IP packet header, iph, of the fragment that just arrived to initialize the entry.
- ip_find(iph) searches the fragment cache for the ipq entry of the IP datagram with the packet header iph.
  - The hash value is calculated from the sender/destination address, protocol, and fragment ID.
  - If no matching entry is found, a new ipq entry is created in the fragment cache (ip_frag_create()).
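The find-or-create behaviour described for ip_find() can be sketched with a chained hash table keyed by the four header fields. This is an illustration, not the kernel implementation: struct frag_key, struct frag_entry, frag_hash(), and frag_find() are hypothetical names, and the table size of 64 and the XOR-based mixing are assumptions chosen for the example.

```c
#include <stdint.h>
#include <stdlib.h>

#define IPQ_HASHSZ 64   /* assumed table size; a power of two */

/* The four fields that identify one datagram under reassembly. */
struct frag_key {
    uint32_t saddr, daddr;
    uint16_t id;
    uint8_t protocol;
};

struct frag_entry {
    struct frag_key key;
    struct frag_entry *next;   /* chain of entries in the same slot */
};

struct frag_entry *frag_table[IPQ_HASHSZ];

/* Illustrative hash that mixes all four key fields into a slot index. */
unsigned frag_hash(const struct frag_key *k)
{
    uint32_t h = k->saddr ^ k->daddr;
    h ^= (h >> 16) ^ k->id;
    h ^= (h >> 8) ^ k->protocol;
    return h & (IPQ_HASHSZ - 1);
}

/* Returns the entry for the datagram, creating it if none exists yet.
 * Returns NULL only if memory allocation fails. */
struct frag_entry *frag_find(const struct frag_key *k)
{
    unsigned h = frag_hash(k);
    for (struct frag_entry *e = frag_table[h]; e; e = e->next)
        if (e->key.saddr == k->saddr && e->key.daddr == k->daddr &&
            e->key.id == k->id && e->key.protocol == k->protocol)
            return e;

    struct frag_entry *e = calloc(1, sizeof *e);
    if (!e)
        return NULL;
    e->key = *k;
    e->next = frag_table[h];
    frag_table[h] = e;
    return e;
}
```

Two fragments with the same addresses, protocol, and fragment ID resolve to the same entry, while a different fragment ID produces a separate entry.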
33. APIs Used for Reassembling Fragments
- ip_frag_queue(qp,skb) inserts a new fragment, skb, into the fragment list of a datagram (represented by the ipq structure pointed to by qp).
- ip_frag_reasm() reassembles all the fragments of a datagram when qp->len == qp->meat.
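The completion test that compares len and meat can be sketched with a flat reassembly buffer. This is a simplified model, not the kernel's fragment list: struct reasm, reasm_init(), and reasm_queue() are hypothetical, the buffer size REASM_MAX is an arbitrary assumption, and duplicates are handled with a per-byte map instead of a sorted list of socket buffers.

```c
#include <string.h>

#define REASM_MAX 4096   /* assumed upper bound for this sketch */

struct reasm {
    unsigned char data[REASM_MAX];  /* reassembled payload */
    unsigned char have[REASM_MAX];  /* 1 if the byte was already received */
    unsigned len;                   /* total length, known once the last
                                       fragment (MF = 0) has arrived */
    unsigned meat;                  /* distinct bytes received so far */
    int last_in;                    /* 1 once the last fragment has arrived */
};

void reasm_init(struct reasm *q)
{
    memset(q, 0, sizeof *q);
}

/* Stores one fragment. Returns 1 when the datagram is complete,
 * 0 if fragments are still missing, and -1 if the fragment does not fit. */
int reasm_queue(struct reasm *q, unsigned offset, const void *payload,
                unsigned n, int more_fragments)
{
    const unsigned char *src = payload;

    if (offset > REASM_MAX || n > REASM_MAX - offset)
        return -1;
    for (unsigned i = 0; i < n; i++) {
        if (!q->have[offset + i]) {     /* count each byte only once */
            q->have[offset + i] = 1;
            q->meat++;
        }
        q->data[offset + i] = src[i];
    }
    if (!more_fragments) {              /* last fragment fixes the length */
        q->len = offset + n;
        q->last_in = 1;
    }
    return q->last_in && q->meat == q->len;
}
```

Fragments may arrive in any order: the last fragment can come first and set len, and the datagram is reported complete only when meat has caught up with it.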