Title: Utilizing the NIC's enhancements
1Utilizing the NIC's enhancements
- A look at how driver software needs to change
when using newer features of our hardware
2theory versus practice
- The engineering designs one encounters in computer hardware
components tend to evolve over successive iterations: they start
out as a scheme that embodies simplicity, purity, and symmetry,
based upon what the designers think the device's likely uses will
be, and end up as a conglomeration of disparate add-ons as actual
practice dictates accommodations
3backward compatibility
- A historically important consideration in the marketing of
computer hardware has been the need to maintain past functions in
a transparent manner (i.e., no change is needed to run older
software on newer equipment), while offering enhancements as
options that can be selectively enabled
4Example: Intel's x86
- The current generation of Intel CPUs will still execute all of
the software written for PCs a quarter-century ago (based on a
small set of 16-bit registers, a restricted set of instructions,
and a one-megabyte memory-space), but is able, as an option, to
use more and larger registers (64-bits), richer instruction-sets,
and more memory
5Gigabit NICs
- Intel's network controller designs exhibit this same kind of
evolution over time
- The Legacy descriptor-formats are just one example of keeping
prior-generation functionality: it's simple, it's pure (i.e., not
tied to any specific network-protocols, but emphasizing mechanism,
not policy)
- But now alternatives exist -- as options!
6Legacy RX-Descriptors
Descriptor layout (16 bytes, in order of increasing memory offset):

    Base-address      (64-bits)
    Packet-length     (16-bits)
    Packet-checksum   (16-bits)
    status            (8-bits)
    errors            (8-bits)
    VLAN tag          (16-bits)

The device-driver initializes the base-address field with the
physical address of a packet-buffer, and the network hardware
does not ever modify it

The network controller will later write back values into all of
the other fields, when it has finished transferring a received
packet's data into that packet-buffer
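The legacy layout described above can be sketched as a C struct. This is only an illustration in user-space C: the names RX_DESC_LEGACY and legacy_desc_init() are hypothetical, and the fixed-width types are a choice made here so that the 16-byte size can be checked at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Legacy receive descriptor, fields in order of increasing memory offset. */
typedef struct {
    uint64_t base_address;   /* set by the driver; the NIC never modifies it */
    uint16_t packet_length;  /* this field and those below are written back by the NIC */
    uint16_t packet_chksum;
    uint8_t  desc_status;
    uint8_t  desc_errors;
    uint16_t vlan_tag;
} RX_DESC_LEGACY;

/* Each descriptor is a 16-byte unit, so the struct must contain no padding. */
_Static_assert(sizeof(RX_DESC_LEGACY) == 16, "legacy descriptor must be 16 bytes");

/* Hypothetical helper: record the buffer address and clear the write-back
   fields, so that stale status bits are not mistaken for new ones. */
static void legacy_desc_init(RX_DESC_LEGACY *desc, uint64_t buffer_phys)
{
    assert(desc != NULL);
    desc->base_address  = buffer_phys;
    desc->packet_length = 0;
    desc->packet_chksum = 0;
    desc->desc_status   = 0;
    desc->desc_errors   = 0;
    desc->vlan_tag      = 0;
}
```

A driver would call something like legacy_desc_init() once per ring entry before handing the ring to the NIC.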
7RxDesc Status-field
    Bit:    7     6      5       4       3    2      1     0
    Field:  PIF   IPCS   TCPCS   UDPCS   VP   IXSM   EOP   DD

DD    = Descriptor Done (1=yes, 0=no) shows if the NIC is finished
        with the descriptor
EOP   = End Of Packet (1=yes, 0=no) shows if this descriptor is
        logically the last one for its packet
IXSM  = Ignore Checksum Indications (1=yes, 0=no)
VP    = VLAN Packet match (1=yes, 0=no)
UDPCS = UDP Checksum calculated on packet (1=yes, 0=no)
TCPCS = TCP Checksum calculated on packet (1=yes, 0=no)
IPCS  = IPv4 Checksum calculated on packet (1=yes, 0=no)
PIF   = Passed In-exact Filter (1=yes, 0=no) shows if software
        must check the address match itself
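As a rough illustration of how driver code might test these status bits, here is a minimal sketch. The RXD_STAT_* names and the three helper functions are hypothetical (they do not come from the driver); only the bit positions are taken from the slide.

```c
#include <assert.h>
#include <stdint.h>

/* Bit masks for the Rx status byte (bit 0 = DD ... bit 7 = PIF). */
enum {
    RXD_STAT_DD    = 1 << 0,
    RXD_STAT_EOP   = 1 << 1,
    RXD_STAT_IXSM  = 1 << 2,
    RXD_STAT_VP    = 1 << 3,
    RXD_STAT_UDPCS = 1 << 4,
    RXD_STAT_TCPCS = 1 << 5,
    RXD_STAT_IPCS  = 1 << 6,
    RXD_STAT_PIF   = 1 << 7
};

/* Mnemonic for one status bit, e.g. for a /proc display. */
static const char *rx_status_name(unsigned bit)
{
    static const char *const names[8] = {
        "DD", "EOP", "IXSM", "VP", "UDPCS", "TCPCS", "IPCS", "PIF"
    };
    assert(bit < 8);
    return names[bit];
}

/* A packet is complete when the NIC is done with the descriptor (DD)
   and has marked it as the packet's last descriptor (EOP). */
static int rx_packet_complete(uint8_t status)
{
    return (status & RXD_STAT_DD) && (status & RXD_STAT_EOP);
}

/* The TCP checksum indication is usable only if the NIC calculated it
   (TCPCS) and did not ask us to ignore checksum indications (IXSM). */
static int rx_tcp_checksum_done(uint8_t status)
{
    return !(status & RXD_STAT_IXSM) && (status & RXD_STAT_TCPCS);
}
```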
8RxDesc Errors-field
    Bit:    7     6     5      4              3              2     1    0
    Field:  RXE   IPE   TCPE   reserved (0)   reserved (0)   SEQ   SE   CE

CE   = CRC Error or Alignment Error (check statistics registers to
       differentiate)
SE   = Symbol Error
SEQ  = Sequence Error
       (SE and SEQ are relevant only while the NIC is operating in
       SerDes mode)
TCPE = TCP/UDP Checksum Error
IPE  = IPv4 Checksum Error
RXE  = Rx Data Error
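One way a driver might act on the note about SerDes mode is to mask off the bits that are not meaningful in the current mode before deciding whether a packet is bad. The sketch below assumes the bit positions shown on the slide; the RXD_ERR_* names and rx_relevant_errors() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit masks for the Rx errors byte; bits 3 and 4 are reserved. */
enum {
    RXD_ERR_CE   = 1 << 0,   /* CRC or alignment error   */
    RXD_ERR_SE   = 1 << 1,   /* symbol error (SerDes)    */
    RXD_ERR_SEQ  = 1 << 2,   /* sequence error (SerDes)  */
    RXD_ERR_TCPE = 1 << 5,   /* TCP/UDP checksum error   */
    RXD_ERR_IPE  = 1 << 6,   /* IPv4 checksum error      */
    RXD_ERR_RXE  = 1 << 7    /* Rx data error            */
};

/* Keep only the error bits that matter in the current operating mode.
   serdes_mode is 1 when the NIC runs in SerDes mode, otherwise 0. */
static uint8_t rx_relevant_errors(uint8_t errors, int serdes_mode)
{
    uint8_t mask = RXD_ERR_CE | RXD_ERR_TCPE | RXD_ERR_IPE | RXD_ERR_RXE;

    assert(serdes_mode == 0 || serdes_mode == 1);
    if (serdes_mode)
        mask |= RXD_ERR_SE | RXD_ERR_SEQ;
    return errors & mask;
}
```

A nonzero return value would then mean that the packet has at least one error worth reporting.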
9Extended RX-Descriptors
Descriptor as fetched (CPU writes this, NIC reads it):

    Base-address                    (64-bits)
    reserved (0)                    (64-bits)

Descriptor as written back (NIC writes this, CPU reads it):

    MRQ (multiple receive queues)   (32-bits)
    IP identification               (16-bits)
    Packet-checksum                 (16-bits)
    Extended status                 (20-bits)
    Extended errors                 (12-bits)
    Packet-length                   (16-bits)
    VLAN tag                        (16-bits)

The device-driver initializes the base-address field with the
physical address of a packet-buffer, and it initializes the
reserved field with a zero-value; the network hardware will later
modify both fields

The network controller will write back the values for the second
layout's fields when it has transferred a received packet's data
into the packet-buffer
10An alternative option
Descriptor as fetched (CPU writes this, NIC reads it):

    Base-address                      (64-bits)
    reserved (0)                      (64-bits)

Descriptor as written back (NIC writes this, CPU reads it):

    MRQ (multiple receive queues)     (32-bits)
    RSS Hash (Receive Side Scaling)   (32-bits)
    Extended status                   (20-bits)
    Extended errors                   (12-bits)
    Packet-length                     (16-bits)
    VLAN tag                          (16-bits)

Here the RSS Hash takes the place of the IP identification and
Packet-checksum fields

Receive Side Scaling refers to an optional capability in the
network controller to assist with routing of network packets to
various CPUs within a modern multiprocessor system (see Section
3.2.13 in Intel's Software Developer's Manual)
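The two write-back variants differ only in how one 32-bit slot is interpreted, which can be sketched with a nested union. Everything named here (RX_DESC_STORE_ALT, the member names, pick_cpu()) is hypothetical, and the modulo in pick_cpu() is just a simple illustration of spreading packets across CPUs; it is not a description of how the controller itself selects a queue.

```c
#include <assert.h>
#include <stdint.h>

/* Write-back layout in which one 32-bit slot holds either the RSS hash
   or the IP-identification / packet-checksum pair. */
typedef struct {
    uint32_t mrq;
    union {
        uint32_t rss_hash;
        struct {
            uint16_t ip_identification;
            uint16_t packet_chksum;
        } csum;
    } u;
    uint32_t status_errors;   /* 20 status bits and 12 error bits */
    uint16_t packet_length;
    uint16_t vlan_tag;
} RX_DESC_STORE_ALT;

_Static_assert(sizeof(RX_DESC_STORE_ALT) == 16, "descriptor must be 16 bytes");

/* Illustration only: map a packet's hash onto one of n_cpus processors,
   so packets with the same hash always land on the same CPU. */
static unsigned pick_cpu(uint32_t rss_hash, unsigned n_cpus)
{
    assert(n_cpus > 0);
    return rss_hash % n_cpus;
}
```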
11Extended Rx-Status (20-bits)
    Bits 19-16:  0
    Bit 15:      ACK
    Bits 14-11:  0
    Bit 10:      UDPV
    Bit 9:       IPIV
    Bit 8:       0
    Bit 7:       PIF
    Bit 6:       IPCS
    Bit 5:       TCPCS
    Bit 4:       UDPCS
    Bit 3:       VP
    Bit 2:       IXSM
    Bit 1:       EOP
    Bit 0:       DD

Bits 7-0 have the same meanings as in a Legacy Rx-Status byte:

DD    = Descriptor Done
EOP   = End Of Packet
IXSM  = Ignore Checksum Indications
VP    = VLAN Packet match
UDPCS = UDP Checksum calculated
TCPCS = TCP Checksum calculated
IPCS  = IPv4 Checksum calculated
PIF   = Passed In-exact Filter

The extra status-bits provide additional hardware support to
driver software for processing ethernet packets that conform to
standard TCP/IP network protocols (with possibilities for future
expansion):

ACK   = TCP ACK-Packet identification
UDPV  = Valid UDP checksum
IPIV  = Valid IP Identification
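The extra bits can be tested with masks in the same way as the legacy ones. The following sketch assumes the 20-bit status value has already been extracted into a 32-bit integer; the RXD_XSTAT_* names and helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Masks for the status bits that exist only in the extended format. */
enum {
    RXD_XSTAT_IPIV = 1u << 9,    /* IP identification field is valid */
    RXD_XSTAT_UDPV = 1u << 10,   /* UDP checksum is valid            */
    RXD_XSTAT_ACK  = 1u << 15    /* packet identified as a TCP ACK   */
};

/* The low eight bits are laid out exactly like a legacy status byte. */
static uint8_t xstat_legacy_byte(uint32_t xstat)
{
    assert((xstat >> 20) == 0);   /* only 20 bits are defined */
    return (uint8_t)(xstat & 0xFF);
}

/* Nonzero when the descriptor's IP identification field may be used. */
static int xstat_ip_id_valid(uint32_t xstat)
{
    return (xstat & RXD_XSTAT_IPIV) != 0;
}

/* Nonzero when the NIC identified the packet as a TCP ACK. */
static int xstat_is_tcp_ack(uint32_t xstat)
{
    return (xstat & RXD_XSTAT_ACK) != 0;
}
```

Because the low byte keeps its legacy meaning, existing DD and EOP tests can be reused unchanged on the value returned by xstat_legacy_byte().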
12Extended Rx-Errors (12 bits)
    Bit 11:    RXE
    Bit 10:    IPE
    Bit 9:     TCPE
    Bits 8-7:  0
    Bit 6:     SEQ
    Bit 5:     SE
    Bit 4:     CE
    Bits 3-0:  0

Bits 11-4 have the same meanings, and they occupy the same
arrangement, as in the Legacy Rx-Errors byte
13Main device-driver changes
- If we want to utilize the NIC's Extended Receive Descriptor
format, we will need several significant changes in our driver
source-code and data-types:
  - Our module's initialization of base_address fields
  - Our new need for programming register RFCTL
  - Our typedef for the RX_DESCRIPTOR structs
  - Our get_info_rx() function for the /proc/nicrx display
  - Our interrupt-handler's treatment of rxring entries
14Use of C language union
- Each Receive-Descriptor now has a dual identity, as far as the
NIC is concerned:
  - one layout during its fetch from memory
  - another layout during write-back to memory
- The C language provides a special type construction for
accommodating this kind of programming situation: it's known as a
union, and it requires a special syntax
15Bitfields in C
- Some of the fields in the Extended RX Descriptor do not align
with the CPU's natural 8-bit, 16-bit and 32-bit data-sizes:
    Extended status = 20-bits
    Extended errors = 12-bits
- The C language provides bitfields for a situation like this
(their syntax is part of standard C, but the way they are laid
out in memory is left up to each compiler)
16Syntax for Rx-Descriptors
    typedef struct  {
                    unsigned long long  base_address;
                    unsigned long long  reserved;
                    } RX_DESC_FETCH;

    typedef struct  {
                    unsigned int        mrq;
                    unsigned short      ip_identification;
                    unsigned short      packet_chksum;
                    unsigned int        desc_status:20;
                    unsigned int        desc_errors:12;
                    unsigned short      packet_length;
                    unsigned short      vlan_tag;
                    } RX_DESC_STORE;

    typedef union   {
                    RX_DESC_FETCH       rxf;
                    RX_DESC_STORE       rxs;
                    } RX_DESCRIPTOR;
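To see why the two views cannot be used at the same time, the sketch below repeats these typedefs in a user-space program and imitates a write-back. The helper fake_writeback() is hypothetical (real hardware performs this step by DMA), and the size checks assume a compiler such as gcc that packs the two bitfields into a single 32-bit unit.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned long long base_address;
    unsigned long long reserved;
} RX_DESC_FETCH;

typedef struct {
    unsigned int   mrq;
    unsigned short ip_identification;
    unsigned short packet_chksum;
    unsigned int   desc_status:20;
    unsigned int   desc_errors:12;
    unsigned short packet_length;
    unsigned short vlan_tag;
} RX_DESC_STORE;

typedef union {
    RX_DESC_FETCH rxf;
    RX_DESC_STORE rxs;
} RX_DESCRIPTOR;

/* Both views must cover exactly the same 16 bytes. */
_Static_assert(sizeof(RX_DESC_FETCH) == 16, "fetch layout must be 16 bytes");
_Static_assert(sizeof(RX_DESC_STORE) == 16, "store layout must be 16 bytes");
_Static_assert(sizeof(RX_DESCRIPTOR) == 16, "descriptor must be 16 bytes");

/* Hypothetical stand-in for the NIC's write-back of a received packet. */
static void fake_writeback(RX_DESCRIPTOR *d, unsigned short ip_id,
                           unsigned short length)
{
    assert(d != NULL);
    d->rxs.mrq               = 0;
    d->rxs.ip_identification = ip_id;
    d->rxs.packet_chksum     = 0;
    d->rxs.desc_status       = 0x3;   /* DD and EOP */
    d->rxs.desc_errors       = 0;
    d->rxs.packet_length     = length;
    d->rxs.vlan_tag          = 0;
}
```

After fake_writeback() has run, reading rxf.base_address no longer yields the buffer address the driver stored there, which is the reason the driver must obtain packet-buffer addresses some other way.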
17RFCTL (0x5008)
The Receive Filter Control register

    Bits 31-16:  reserved (0)
    Bit 15:      EXTEN
    Bit 14:      IPFRSP_DIS
    Bit 13:      ACKD_DIS
    Bit 12:      ACKDIS
    Bit 11:      IPv6XSUM_DIS
    Bit 10:      IPv6_DIS
    Bits 9-8:    NFS_VER
    Bit 7:       NFSR_DIS
    Bit 6:       NFSW_DIS
    Bits 5-1:    iSCSI_DWC
    Bit 0:       iSCSI_DIS

EXTEN (bit 15) = Extended Status Enable (1=yes, 0=no)
This enables the NIC to write back the Extended Status
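Turning on EXTEN is a read-modify-write of RFCTL, so that the other filter-control bits keep their values. The sketch below shows the idea in user-space C against a mock register array; mock_regs, reg_read(), reg_write() and RFCTL_EXTEN are hypothetical names, and a kernel driver would use ioread32() and iowrite32() on the mapped device memory instead.

```c
#include <assert.h>
#include <stdint.h>

#define E1000_RFCTL   0x5008        /* register offset, from the slide    */
#define RFCTL_EXTEN   (1u << 15)    /* Extended Status Enable             */

/* Mock register file standing in for the NIC's memory-mapped registers. */
static uint32_t mock_regs[0x8000 / 4];

static uint32_t reg_read(unsigned offset)
{
    assert(offset % 4 == 0 && offset < sizeof mock_regs);
    return mock_regs[offset / 4];
}

static void reg_write(uint32_t value, unsigned offset)
{
    assert(offset % 4 == 0 && offset < sizeof mock_regs);
    mock_regs[offset / 4] = value;
}

/* Set EXTEN while preserving every other bit in RFCTL. */
static void enable_extended_descriptors(void)
{
    uint32_t rfctl = reg_read(E1000_RFCTL);

    rfctl |= RFCTL_EXTEN;
    reg_write(rfctl, E1000_RFCTL);
}
```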
18Modifying my_read()
- To implement use of Extended Receive Descriptors in our most
recent character-mode device-driver (i.e., zerocopy.c), we need
some changes in the read() method
- Most obvious example: a packet-buffer's memory address can no
longer be gotten from an Rx-Descriptor's base_address (which now
gets overwritten by the NIC)
19For our pseudo-file's sake
- Also our driver's read() function shouldn't prepare a current
rx-descriptor for reuse, as it did in earlier drivers, since that
would destroy all of the useful information which the NIC has
just written into that descriptor
- Instead, the preparation of a descriptor for reuse in a future
packet-receive operation should be deferred, at least temporarily
20OK, but then when?
- We can reassign the duty to refresh some Rx-Descriptors for
reuse to our driver's Interrupt Service Routine: specifically, at
the point in time when an RXDMT0 event is signaled (Rx-Descriptor
Min-Threshold)
- It might be best to create a bottom half to take care of those
re-initializations, but we haven't yet done that in our new
prototype
21Handling RXDMT0 interrupts
    irqreturn_t my_isr( int irq, void *dev_id )
    {
        int  intr_cause = ioread32( io + E1000_ICR );

        if ( intr_cause & (1<<4) )    // Rx-Descriptors Low
            {
            unsigned int  rx_buf = virt_to_phys( rxring ) + 16 * N_RX_DESC;
            unsigned int  rxtail = ioread32( io + E1000_RDT ), i, ba;

            // prepare the next eight Rx-Descriptors for reuse by the NIC
            for (i = 0; i < 8; i++)
                {
                ba = rx_buf + rxtail * RX_BUFSIZ;
                // the fields are reached via the union's FETCH member
                rxring[ rxtail ].rxf.base_address = ba;
                rxring[ rxtail ].rxf.reserved = 0LL;
                rxtail = (1 + rxtail) % N_RX_DESC;
                }

            // now give the NIC ownership of these reinitialized descriptors
            iowrite32( rxtail, io + E1000_RDT );
            }

        // (handling of the other interrupt causes is not shown here)
        return IRQ_HANDLED;
    }
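The refresh loop can be tried out away from the hardware. The following user-space sketch keeps only the ring arithmetic: refresh_descriptors() is a hypothetical helper, and the values chosen for N_RX_DESC and RX_BUFSIZ are assumptions made for this example rather than the driver's actual settings.

```c
#include <assert.h>
#include <stdint.h>

#define N_RX_DESC   16      /* assumed ring size for this sketch   */
#define RX_BUFSIZ   2048    /* assumed size of each packet-buffer  */

typedef struct {
    uint64_t base_address;
    uint64_t reserved;
} RX_DESC_FETCH;

/* Re-initialize 'count' descriptors starting at index 'tail' and return
   the new tail index; indices wrap around at the end of the ring. */
static unsigned refresh_descriptors(RX_DESC_FETCH ring[], uint64_t rx_buf,
                                    unsigned tail, unsigned count)
{
    assert(tail < N_RX_DESC && count < N_RX_DESC);
    for (unsigned i = 0; i < count; i++) {
        /* each ring entry owns the buffer with the same index */
        ring[tail].base_address = rx_buf + (uint64_t)tail * RX_BUFSIZ;
        ring[tail].reserved = 0;
        tail = (tail + 1) % N_RX_DESC;
    }
    return tail;
}
```

Starting from tail index 12 with a count of 8, the helper refreshes entries 12 through 15 and then 0 through 3, and returns 4 as the value to write to the tail register.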
22extended.c
- Here's our revision of zerocopy.c, aimed at showing how we can
incorporate use of the NIC's Extended Receive Descriptors
- It appears to function exactly as before, until a user attempts
to view the driver's Receive-Descriptor queue:
    cat /proc/nicrx
- Then we are shown descriptors having two distinct formats
(i.e., FETCH and STORE)
23Demo: bitfield.c
- Because the manner in which bitfields are handled in the C
language varies with the particular C-compiler being used, we
have created a short demo-program that shows us how our GNU
C-compiler gcc handles the layout of bitfields within a C
data-item

    typedef struct  {
                    unsigned int  desc_status:20;   // bits 0..19
                    unsigned int  desc_errors:12;   // bits 20..31
                    } RXD_ELT;
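A check of this kind might look like the sketch below, which is not the bitfield.c demo itself. It copies the bitfield struct into a plain 32-bit integer so the compiler's layout can be inspected, and it adds a shift-and-mask pair as a compiler-independent alternative; rxd_raw(), rxd_pack(), rxd_status() and rxd_errors() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned int desc_status:20;   /* bits 0..19 with gcc on x86  */
    unsigned int desc_errors:12;   /* bits 20..31 with gcc on x86 */
} RXD_ELT;

_Static_assert(sizeof(RXD_ELT) == sizeof(uint32_t), "RXD_ELT must be 32 bits");

/* Copy the struct's bytes into an integer so its layout can be examined. */
static uint32_t rxd_raw(RXD_ELT elt)
{
    uint32_t raw;

    memcpy(&raw, &elt, sizeof raw);
    return raw;
}

/* Compiler-independent packing: status in bits 0..19, errors in 20..31. */
static uint32_t rxd_pack(uint32_t status, uint32_t errors)
{
    assert(status < (1u << 20) && errors < (1u << 12));
    return (errors << 20) | status;
}

static uint32_t rxd_status(uint32_t raw) { return raw & 0xFFFFFu; }
static uint32_t rxd_errors(uint32_t raw) { return raw >> 20; }
```

Comparing rxd_raw() of a struct against rxd_pack() of the same two values shows whether a given compiler places the bitfields where the comments claim.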