Title: What's needed to transmit?
1. What's needed to transmit?
- A look at the minimum steps required for programming our 82573L nic to send packets
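2. Typical NIC hardware
[Figure: typical NIC hardware. Components shown: the CPU, main memory (holding a packet), the system BUS, and the nic with its TX FIFO, RX FIFO, buffer, and transceiver, connected to the LAN cable.]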
3. Quotation
Many companies do an excellent job of providing
information to help customers use their
products... but in the end there's no substitute
for real-life experiments putting together the
hardware, writing the program code, and watching
what happens when the code executes. Then when
the result isn't as expected -- as it often isn't
-- it means trying something else or searching
the documentation for clues.
-- Jan Axelson, author, Lakeview Research (1998)
4. Thanks, Intel!?
- Intel Corporation has kindly posted details online for programming its family of gigabit Ethernet controllers, which includes our 82573L
5. Our nictx.c module
- We've created an LKM (loadable kernel module) with minimal functionality: just enough to be sure we know how to transmit a raw Ethernet packet. We do this in a forward-looking way, though, so that our source code can later be turned into a Linux character-mode device driver (once we've also seen how to write code that allows our nic to receive packets)
6. Access to PRO1000 registers
- Device registers are hardware-mapped to a range of addresses in physical memory
- We obtain the location (and the length) of this memory-range from a BAR register in the nic device's PCI Configuration Space
- Then we ask the Linux kernel to set up an I/O remapping of this memory-range to virtual addresses within kernel-space
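The BAR lookup can be modelled in user space. This is a minimal sketch that assumes a 32-bit memory BAR. In such a BAR, bits 3..0 are flag bits, so the base address is the BAR value with those bits cleared. The region's length comes from the standard sizing probe: write all-ones to the BAR and read it back. Inside a real driver the kernel does this for you through pci_resource_start() and pci_resource_len(), followed by ioremap(). The helper names and sample values below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for decoding a 32-bit PCI memory BAR. */
#define BAR_IO_SPACE  0x1u   /* bit 0: 1 = I/O BAR, 0 = memory BAR */
#define BAR_FLAG_MASK 0xFu   /* bits 3..0 hold type and prefetch flags */

/* Physical base address of a memory BAR (flag bits stripped). */
uint32_t bar_base(uint32_t bar)
{
    assert((bar & BAR_IO_SPACE) == 0);   /* we expect a memory BAR */
    return bar & ~BAR_FLAG_MASK;
}

/* Region length, given the value read back after writing 0xFFFFFFFF to the BAR. */
uint32_t bar_length(uint32_t probe_readback)
{
    return ~(probe_readback & ~BAR_FLAG_MASK) + 1u;
}
```

For example, a probe readback of 0xFFFE0000 corresponds to a 0x20000-byte (128 KB) register window.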
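7. Tx-Desc Ring-Buffer
[Figure: the Tx-descriptor ring. It begins at the TDBA base-address and spans TDLEN (in bytes). Offsets 0x00, 0x10, ... 0x70 mark eight 16-byte descriptor slots, ending at 0x80. Descriptors from TDH (head) up to TDT (tail) are owned by hardware (nic); the rest are owned by software (cpu). Circular buffer (128 bytes minimum).]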
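A small model of the ring's bookkeeping may help. This sketch assumes 16-byte legacy descriptors. It also follows the usual convention that descriptors from TDH up to, but not including, TDT belong to the nic, and that one slot is always left empty. The function names are hypothetical.

```c
#include <assert.h>

#define DESC_SIZE 16u   /* bytes per legacy Tx descriptor */

/* Number of descriptor slots in a ring of tdlen bytes (a multiple of 128). */
unsigned ring_slots(unsigned tdlen)
{
    assert(tdlen >= 128 && tdlen % 128 == 0);
    return tdlen / DESC_SIZE;
}

/* Index that follows i, wrapping around at the end of the ring. */
unsigned ring_next(unsigned i, unsigned slots)
{
    return (i + 1) % slots;
}

/* Descriptors currently owned by the nic: from head up to (not including) tail. */
unsigned ring_owned_by_nic(unsigned head, unsigned tail, unsigned slots)
{
    return (tail + slots - head) % slots;
}

/* Slots software may still fill. One slot stays unused so that
   head == tail always means "ring empty", never "ring full". */
unsigned ring_free(unsigned head, unsigned tail, unsigned slots)
{
    return slots - 1 - ring_owned_by_nic(head, tail, slots);
}
```

With TDLEN = 128 the ring has 8 slots. If TDH = 6 and TDT = 1, the nic owns slots 6, 7 and 0.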
8. How transmit works
[Figure: in Random Access Memory, a list of buffer-descriptors (descriptor0 to descriptor3, followed by unused zero entries) whose entries point to the packet buffers Buffer0 to Buffer3.]
We set up each data-packet that we want to be transmitted in a buffer area in RAM. We also create a list of buffer-descriptors and inform the NIC of its location and size. Then, when ready, we tell the NIC to "Go!" (i.e., start transmitting), but to let us know when these transmissions are "Done".
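Informing the NIC of the list's location and size comes down to a handful of register values. TDBAL and TDBAH receive the low and high halves of the list's 64-bit physical address, and TDLEN receives its length in bytes. TDH and TDT start equal, which means the ring is empty. The sketch below assumes a hypothetical struct that holds these values before they are written to the device. It also assumes the documented 16-byte alignment of the base address and the 128-byte granularity of TDLEN.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the values destined for the Tx-ring registers. */
struct tx_ring_regs {
    uint32_t tdbal;  /* Tx-Descriptor Base-Address Low  (0x3800) */
    uint32_t tdbah;  /* Tx-Descriptor Base-Address High (0x3804) */
    uint32_t tdlen;  /* ring length in bytes            (0x3808) */
    uint32_t tdh;    /* head index                      (0x3810) */
    uint32_t tdt;    /* tail index                      (0x3818) */
};

struct tx_ring_regs tx_ring_setup(uint64_t ring_phys, unsigned ndesc)
{
    struct tx_ring_regs r;
    assert(ring_phys % 16 == 0);          /* descriptor list must be 16-byte aligned */
    assert((ndesc * 16u) % 128u == 0);    /* TDLEN must be a multiple of 128 bytes */
    r.tdbal = (uint32_t)(ring_phys & 0xFFFFFFFFu);
    r.tdbah = (uint32_t)(ring_phys >> 32);
    r.tdlen = ndesc * 16u;
    r.tdh = 0;                            /* head == tail: nothing queued yet */
    r.tdt = 0;
    return r;
}
```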
9. Allocating kernel-memory
- Our 82573L device-driver will need to use a segment of contiguous physical memory that is cache-aligned and non-pageable
- Such a memory-block can be allocated by using the kernel's kzalloc() function (and it can later be deallocated using kfree())
- You should use the GFP_KERNEL flag (and we also used the GFP_DMA flag)
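One way to organise that block is to put the descriptor ring at the start and the packet buffers after it. The sketch below models this layout in user space. In it, calloc() stands in for kzalloc(), since both return zero-filled memory. The ring of 8 descriptors, the 1536-byte buffer size, and the helper names are illustrative assumptions. In the kernel, the physical address of each buffer (for example from virt_to_phys()) is what would go into its descriptor.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define N_DESC    8u      /* 8 x 16 bytes = 128 bytes, the minimum ring size */
#define DESC_SIZE 16u
#define BUF_SIZE  1536u   /* room for one maximum-size Ethernet frame */

/* Hypothetical layout: [ descriptor ring | buffer 0 | buffer 1 | ... ] */
size_t block_size(void)
{
    return N_DESC * DESC_SIZE + N_DESC * BUF_SIZE;
}

/* Offset of packet buffer i from the start of the block. */
size_t buffer_offset(unsigned i)
{
    assert(i < N_DESC);
    return N_DESC * DESC_SIZE + (size_t)i * BUF_SIZE;
}

/* User-space stand-in for kzalloc(block_size(), GFP_KERNEL | GFP_DMA). */
unsigned char *alloc_block(void)
{
    return calloc(1, block_size());
}
```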
10. NIC registers (for transmit)
enum {
    E1000_CTRL   = 0x0000,  // Device Control
    E1000_STATUS = 0x0008,  // Device Status
    E1000_TCTL   = 0x0400,  // Transmit Control
    E1000_TDBAL  = 0x3800,  // Tx-Descriptor Base-Address Low
    E1000_TDBAH  = 0x3804,  // Tx-Descriptor Base-Address High
    E1000_TDLEN  = 0x3808,  // Tx-Descriptor queue Length
    E1000_TDH    = 0x3810,  // Tx-Descriptor Head
    E1000_TDT    = 0x3818,  // Tx-Descriptor Tail
    E1000_TXDCTL = 0x3828,  // Tx-Descriptor Control
    E1000_RA     = 0x5400,  // Receive-address Array
};
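The E1000_RA entry is where the nic keeps its station address, which the programming steps copy into a 6-byte array. The sketch below unpacks it and assumes the usual layout of the first Receive-Address pair. RAL0 (at 0x5400) holds MAC bytes 0 to 3, with byte 0 in the least-significant position. The low 16 bits of RAH0 (at 0x5404) hold bytes 4 and 5. The function name and the register values in the test are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Unpack the 6-byte MAC address from the RAL0/RAH0 register pair. */
void mac_from_ra(uint32_t ral, uint32_t rah, uint8_t mac[6])
{
    for (int i = 0; i < 4; i++)
        mac[i] = (uint8_t)(ral >> (8 * i));      /* bytes 0..3 from RAL0 */
    for (int i = 0; i < 2; i++)
        mac[4 + i] = (uint8_t)(rah >> (8 * i));  /* bytes 4..5 from RAH0[15:0] */
}
```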
11. Device Control (0x0000)
[Figure: bit layout of the 32-bit CTRL register: bit 0 FD, bit 2 GIOMD, bit 3 reserved (shown as 1), bit 6 SLU, bits 9:8 SPEED, bit 11 FRCSPD, bit 12 FRCDPLX, D/UD status and bit 20 ADVD3WUC in the upper half, bit 26 RST, bit 27 RFCE, bit 28 TFCE, bit 30 VME, bit 31 PHYRST; all other bits reserved (0).]
- FD: Full-Duplex
- SPEED: 00 = 10 Mbps, 01 = 100 Mbps, 10 = 1000 Mbps, 11 = reserved
- GIOMD: GIO Master Disable
- ADVD3WUC: Advertise Cold Wake Up Capability
- SLU: Set Link Up
- D/UD: Dock/Undock status
- RFCE: Rx Flow-Control Enable
- FRCSPD: Force Speed
- RST: Device Reset
- TFCE: Tx Flow-Control Enable
- FRCDPLX: Force Duplex
- PHYRST: PHY Reset
- VME: VLAN Mode Enable
82573L
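To make the layout concrete, here is a sketch of building a CTRL value and of the reset request. The bit positions (FD = bit 0, SLU = bit 6, SPEED = bits 9:8, RST = bit 26) follow this slide. The macro names and the example settings are illustrative and are not required values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for a few Device Control (CTRL, 0x0000) bits. */
#define CTRL_FD          (1u << 0)    /* Full-Duplex */
#define CTRL_GIOMD       (1u << 2)    /* GIO Master Disable */
#define CTRL_SLU         (1u << 6)    /* Set Link Up */
#define CTRL_SPEED_SHIFT 8            /* SPEED field occupies bits 9:8 */
#define CTRL_FRCSPD      (1u << 11)   /* Force Speed */
#define CTRL_FRCDPLX     (1u << 12)   /* Force Duplex */
#define CTRL_RST         (1u << 26)   /* Device Reset */

enum { SPEED_10 = 0, SPEED_100 = 1, SPEED_1000 = 2 };

/* Replace the SPEED field of a CTRL value, leaving the other bits alone. */
uint32_t ctrl_set_speed(uint32_t ctrl, unsigned speed)
{
    ctrl &= ~(3u << CTRL_SPEED_SHIFT);
    return ctrl | ((speed & 3u) << CTRL_SPEED_SHIFT);
}

/* Value to write to start a reset: the current settings plus RST.
   A driver then delays until the reset has completed before continuing. */
uint32_t ctrl_reset_request(uint32_t ctrl)
{
    return ctrl | CTRL_RST;
}
```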
12. Device Status (0x0008)
[Figure: bit layout of the 32-bit STATUS register: bit 0 FD, bit 1 LU, bits 3:2 Function ID, bit 4 TXOFF, bits 7:6 SPEED, bits 9:8 ASDV, bit 10 PHYRA, bit 19 GIO Master EN; bit 31 is marked "?" (some undocumented functionality?); the remaining bits are shown as 0.]
- FD: Full-Duplex
- LU: Link Up
- TXOFF: Transmission Paused
- SPEED: 00 = 10 Mbps, 01 = 100 Mbps, 10 = 1000 Mbps, 11 = reserved
- ASDV: Auto-negotiation Speed Detection Value
- PHYRA: PHY Reset Asserted
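Reading STATUS is mostly a matter of masking. This sketch assumes FD at bit 0, LU at bit 1, TXOFF at bit 4 and SPEED at bits 7:6, as shown for the Device Status register. The helper names and the sample value are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define STATUS_FD    (1u << 0)   /* Full-Duplex */
#define STATUS_LU    (1u << 1)   /* Link Up */
#define STATUS_TXOFF (1u << 4)   /* Transmission Paused */

int link_is_up(uint32_t status)  { return (status & STATUS_LU) != 0; }
int full_duplex(uint32_t status) { return (status & STATUS_FD) != 0; }
int tx_paused(uint32_t status)   { return (status & STATUS_TXOFF) != 0; }

/* Decode SPEED (bits 7:6) into Mbps; returns 0 for the reserved encoding 11. */
unsigned link_speed_mbps(uint32_t status)
{
    switch ((status >> 6) & 3u) {
    case 0:  return 10;
    case 1:  return 100;
    case 2:  return 1000;
    default: return 0;
    }
}
```

A sample value of 0x00080083 decodes as link up, full-duplex, 1000 Mbps, with the GIO Master EN bit (bit 19) also set.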
13. Transmit Control (0x0400)
[Figure: bit layout of the 32-bit TCTL register: bit 1 EN, bit 3 PSP, bits 11:4 CT, bits 21:12 COLD (lower 4 bits in 15:12, upper 6 bits in 21:16), bit 22 SWXOFF, bit 24 RTLC, bit 25 UNORTX, bits 27:26 TXCSCMT, bit 28 MULR; all other bits reserved (0).]
- EN: Transmit Enable
- SWXOFF: Software XOFF Transmission
- PSP: Pad Short Packets
- RTLC: Retransmit on Late Collision
- CT: Collision Threshold (suggested value 0xF)
- UNORTX: Underrun No Re-Transmit
- COLD: Collision Distance (suggested value 0x3F)
- TXCSCMT: Tx Descriptor Minimum Threshold
- MULR: Multiple Request Support
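As an illustration, a plausible TCTL value enables the transmitter, pads short packets, and uses the collision values listed for this register (CT = 0xF, COLD = 0x3F). This is a sketch with our own macro names. The field positions follow the slide: EN = bit 1, PSP = bit 3, CT = bits 11:4, COLD = bits 21:12.

```c
#include <assert.h>
#include <stdint.h>

#define TCTL_EN         (1u << 1)    /* Transmit Enable */
#define TCTL_PSP        (1u << 3)    /* Pad Short Packets */
#define TCTL_CT_SHIFT   4            /* Collision Threshold, bits 11:4 */
#define TCTL_COLD_SHIFT 12           /* Collision Distance, bits 21:12 */

uint32_t tctl_value(unsigned ct, unsigned cold, int enable)
{
    assert(ct <= 0xFF && cold <= 0x3FF);    /* field widths: 8 and 10 bits */
    uint32_t v = TCTL_PSP
               | ((uint32_t)ct << TCTL_CT_SHIFT)
               | ((uint32_t)cold << TCTL_COLD_SHIFT);
    return enable ? (v | TCTL_EN) : v;
}
```

Writing tctl_value(0xF, 0x3F, 0) first and setting EN later fits the programming steps, where enabling the Transmit Engine comes near the end.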
14. Tx-Descriptor Control (0x3828)
[Figure: bit layout of the 32-bit TXDCTL register: bits 5:0 PTHRESH (Prefetch Threshold), bits 13:8 HTHRESH (Host Threshold), bits 21:16 WTHRESH (Writeback Threshold), bit 24 GRAN; all other bits reserved (0).]
This register controls the fetching and write back of transmit descriptors. The three threshold values are used to determine when descriptors are read from, and written to, host memory. Their values can be in units of cache lines or of descriptors (each descriptor is 16 bytes), based on the value of the GRAN bit (0 = cache lines, 1 = descriptors). When GRAN = 1, all descriptors are written back (even if not requested). -- Intel manual
Recommended for 82573: 0x01010000 (GRAN = 1, WTHRESH = 1)
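The recommended constant can be checked by assembling it from its fields. This sketch places PTHRESH, HTHRESH and WTHRESH at bits 5:0, 13:8 and 21:16, and GRAN at bit 24, as in the register layout. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TXDCTL_GRAN (1u << 24)   /* 1 = thresholds count descriptors */

/* Assemble TXDCTL from its threshold fields (each 6 bits wide). */
uint32_t txdctl_value(unsigned pthresh, unsigned hthresh, unsigned wthresh, int gran)
{
    assert(pthresh < 64 && hthresh < 64 && wthresh < 64);
    uint32_t v = (uint32_t)pthresh           /* bits 5:0   */
               | ((uint32_t)hthresh << 8)    /* bits 13:8  */
               | ((uint32_t)wthresh << 16);  /* bits 21:16 */
    return gran ? (v | TXDCTL_GRAN) : v;
}
```

Here txdctl_value(0, 0, 1, 1) reproduces the recommended 0x01010000.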
15. An observation
- We notice that the 82573L device retains the values in many of its internal registers
- This fact reduces the programming steps that will be required to operate our nic on the anchor cluster machines, since Intel's own Linux device driver (e1000e.ko) has already initialized many nic registers
- But we MAY need to bring eth1 down!
16. Using /sbin/ifconfig
- You can use the /sbin/ifconfig command to find out whether the eth1 interface has been brought down:
    /sbin/ifconfig eth1
- If it is still operating, you can turn it off with the (privileged) command:
    sudo /sbin/ifconfig eth1 down
17. Programming steps
- Detect the presence of the 82573L network controller (VENDOR_ID, DEVICE_ID)
- Obtain the physical address-range where the nic's device-registers are mapped
- Ask the kernel to map this address range into the kernel's virtual address-space
- Copy the network controller's MAC-address into a 6-byte array for future access
- Allocate a block of kernel memory large enough for our descriptors and buffers
- Ensure that the network controller's Bus Master capability has been enabled
- Select our desired configuration-options for the DEVICE CONTROL register
- Perform a nic reset operation (by toggling bit 26), then delay until the reset completes
- Select our desired configuration-options for the TRANSMIT CONTROL register
- Initialize our array of Transmit Descriptors with the physical addresses of buffers
- Initialize the Transmit Engine's registers (for Tx-Descriptor Queue and Control)
- Set up the buffer-contents for an Ethernet packet we want to be transmitted
- Enable the Transmit Engine
- Give ownership of a Tx-Descriptor to the network controller
- Install our /proc/nictx pseudo-file (for user-diagnostic purposes)
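The first and sixth steps can be shown outside the kernel. This sketch assumes Intel's vendor ID 0x8086 and the 82573L device ID 0x109A. It also relies on Bus Master Enable being bit 2 of the 16-bit PCI Command register (offset 0x04 in configuration space). In a real module, pci_get_device() and pci_set_master() would do this work. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VENDOR_ID      0x8086u      /* Intel */
#define DEVICE_ID      0x109Au      /* 82573L */
#define PCI_CMD_MASTER (1u << 2)    /* Bus Master Enable in the Command register */

/* Does this configuration-space ID pair belong to our nic? */
int is_82573L(uint16_t vendor, uint16_t device)
{
    return vendor == VENDOR_ID && device == DEVICE_ID;
}

/* Command-register value with bus mastering turned on (other bits kept). */
uint16_t with_bus_master(uint16_t command)
{
    return (uint16_t)(command | PCI_CMD_MASTER);
}
```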
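18. Legacy Tx-Descriptor Layout
[Figure: the 16-byte legacy Tx-descriptor, drawn as four 32-bit words (bits 31..0). Offset 0x0: Buffer-Address low (bits 31..0). Offset 0x4: Buffer-Address high (bits 63..32). Offset 0x8, from most- to least-significant: CMD, CSO, Packet Length (in bytes). Offset 0xC, from most- to least-significant: special, CSS, reserved (0), status.]
- Buffer-Address: the packet-buffer's 64-bit address in physical memory
- Packet-Length: number of bytes in the data-packet to be transmitted
- CMD: Command-field
- CSO/CSS: Checksum Offset/Start (in bytes)
- STA: Status-field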
19. Suggested C syntax
typedef struct {
    unsigned long long  base_address;   // physical address of the packet buffer
    unsigned short      packet_length;  // bytes to transmit
    unsigned char       cksum_offset;   // CSO
    unsigned char       desc_command;   // CMD
    unsigned char       desc_status;    // STA
    unsigned char       cksum_origin;   // CSS
    unsigned short      special_info;   // special
} TX_DESCRIPTOR;
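Because the nic reads this structure directly from memory, its layout must match the 16-byte legacy format exactly. Below is the same struct written with fixed-width types, plus compile-time checks of its layout. The sketch assumes a C11 compiler (for static_assert) and a little-endian CPU such as x86, where byte offsets 8 through 15 line up with the fields in the descriptor diagram.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same fields as the suggested TX_DESCRIPTOR, spelled with fixed-width types. */
typedef struct {
    uint64_t base_address;   /* bytes 0-7:   physical address of the packet buffer */
    uint16_t packet_length;  /* bytes 8-9:   number of bytes to transmit */
    uint8_t  cksum_offset;   /* byte 10:     CSO */
    uint8_t  desc_command;   /* byte 11:     CMD */
    uint8_t  desc_status;    /* byte 12:     STA */
    uint8_t  cksum_origin;   /* byte 13:     CSS */
    uint16_t special_info;   /* bytes 14-15: special */
} TX_DESCRIPTOR;

static_assert(sizeof(TX_DESCRIPTOR) == 16, "legacy descriptor must be 16 bytes");
static_assert(offsetof(TX_DESCRIPTOR, packet_length) == 8, "length at byte 8");
static_assert(offsetof(TX_DESCRIPTOR, desc_command) == 11, "CMD at byte 11");
static_assert(offsetof(TX_DESCRIPTOR, desc_status) == 12, "STA at byte 12");
static_assert(offsetof(TX_DESCRIPTOR, special_info) == 14, "special at byte 14");
```

If a field were reordered or widened by mistake, the build would fail instead of the nic silently misreading the descriptor.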
20. TxDesc Command-field
[Figure: the 8-bit CMD field: bit 7 IDE, bit 6 VLE, bit 5 DEXT, bit 4 reserved (0), bit 3 RS, bit 2 IC, bit 1 IFCS, bit 0 EOP.]
- EOP: End Of Packet (1 = yes, 0 = no)
- IFCS: Insert Frame CheckSum (1 = yes, 0 = no), provided EOP is set
- IC: Insert CheckSum (1 = yes, 0 = no), as indicated by the CSO/CSS fields
- RS: Report Status (1 = yes, 0 = no)
- DEXT: Descriptor Extension (1 = yes, 0 = no); use 0 for Legacy-Mode
- VLE: VLAN-Packet Enable (1 = yes, 0 = no), provided EOP is set
- IDE: Interrupt-Delay Enable (1 = yes, 0 = no)
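For an ordinary packet held in a single buffer in legacy mode, a typical command byte sets EOP, IFCS and RS and leaves DEXT clear. The sketch below builds that byte. The macro names mirror the field names above, and the helper is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CMD_EOP  (1u << 0)   /* End Of Packet */
#define CMD_IFCS (1u << 1)   /* Insert Frame CheckSum */
#define CMD_IC   (1u << 2)   /* Insert CheckSum (CSO/CSS) */
#define CMD_RS   (1u << 3)   /* Report Status */
#define CMD_DEXT (1u << 5)   /* Descriptor Extension: must be 0 for legacy */
#define CMD_VLE  (1u << 6)   /* VLAN-Packet Enable */
#define CMD_IDE  (1u << 7)   /* Interrupt-Delay Enable */

/* Command byte for a packet held entirely in one legacy-mode descriptor. */
uint8_t legacy_cmd(int want_status)
{
    uint8_t cmd = CMD_EOP | CMD_IFCS;   /* last descriptor; nic appends the CRC */
    if (want_status)
        cmd |= CMD_RS;                  /* ask the nic to write back DD when done */
    assert((cmd & CMD_DEXT) == 0);      /* legacy descriptors keep DEXT clear */
    return cmd;
}
```

With status reporting requested, the result is 0x0B.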
21. TxDesc Status field
[Figure: the 4-bit STA field: bit 3 reserved (0), bit 2 LC, bit 1 EC, bit 0 DD.]
- DD: Descriptor Done. This bit is written back after the NIC processes the descriptor, provided the descriptor's RS-bit (Report Status) was set.
- EC: Excess Collisions. Indicates that the packet experienced more collisions than the maximum allowed (as defined by the TCTL.CT field) and therefore was not transmitted. (This bit is meaningful only in HALF-DUPLEX mode.)
- LC: Late Collision. Indicates that a late collision occurred while operating in HALF-DUPLEX mode. Note that the collision window size depends on the SPEED: 64 bytes for 10/100 Mbps, or 512 bytes for 1000 Mbps.
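Since DD is written back only when RS was set, software can test a descriptor's status byte before reusing it. The sketch below uses hypothetical helper names. A real driver would read the status with a volatile access, or after a suitable memory barrier, because the nic writes it independently of the CPU.

```c
#include <assert.h>
#include <stdint.h>

#define STA_DD (1u << 0)   /* Descriptor Done */
#define STA_EC (1u << 1)   /* Excess Collisions */
#define STA_LC (1u << 2)   /* Late Collision */

/* 1 if the nic has finished with the descriptor. */
int desc_done(uint8_t status)
{
    return (status & STA_DD) != 0;
}

/* 1 if the descriptor is done but reports a collision error
   (only meaningful in half-duplex mode). */
int desc_failed(uint8_t status)
{
    return desc_done(status) && (status & (STA_EC | STA_LC)) != 0;
}
```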
22. Bit-mask definitions
enum {
    // Status-field bits
    DD   = (1 << 0),  // Descriptor Done
    EC   = (1 << 1),  // Excess Collisions
    LC   = (1 << 2),  // Late Collision
    // Command-field bits
    EOP  = (1 << 0),  // End Of Packet
    IFCS = (1 << 1),  // Insert Frame CheckSum
    IC   = (1 << 2),  // Insert CheckSum as per CSO/CSS
    RS   = (1 << 3),  // Report Status
    DEXT = (1 << 5),  // Descriptor Extension
    VLE  = (1 << 6),  // VLAN packet
    IDE  = (1 << 7)   // Interrupt-Delay Enable
};
23. Ethernet packet layout
- Total size normally varies from 64 bytes up to 1518 bytes (unless jumbo packets and/or undersized packets are enabled)
- The NIC expects a 14-byte packet header, and it appends a 4-byte CRC checksum
[Figure: Ethernet packet layout. Bytes 0-5: destination MAC address (6 bytes); bytes 6-11: source MAC address (6 bytes); bytes 12-13: Type/length (2 bytes); from byte 14: the packet's data payload (usually 46 to 1500 bytes); at the end: Cyclic Redundancy Checksum (4 bytes).]
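Filling in a raw packet means writing the header fields in order, with the 2-byte type in network (big-endian) byte order. The nic then adds the CRC when IFCS is set. The sketch below assumes a hypothetical build_frame() helper, a caller-chosen EtherType (the test's 0x8875 is arbitrary), and a buffer of at least 1514 bytes. It zero-pads frames shorter than 60 bytes (the 64-byte minimum minus the 4-byte CRC). The nic can also do that padding itself when TCTL.PSP is set.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN    14
#define ETH_MIN_NOCRC  60     /* 64-byte minimum minus the 4-byte CRC */
#define ETH_MAX_PAYLD  1500

/* Build a frame in buf (at least 14 + 1500 bytes).
   Returns its length excluding the CRC, or 0 if the payload is too large. */
size_t build_frame(uint8_t *buf, const uint8_t dst[6], const uint8_t src[6],
                   uint16_t type, const void *payload, size_t len)
{
    if (len > ETH_MAX_PAYLD)
        return 0;
    memcpy(buf, dst, 6);                   /* bytes 0-5:   destination MAC */
    memcpy(buf + 6, src, 6);               /* bytes 6-11:  source MAC */
    buf[12] = (uint8_t)(type >> 8);        /* bytes 12-13: type, big-endian */
    buf[13] = (uint8_t)(type & 0xFF);
    memcpy(buf + ETH_HDR_LEN, payload, len);
    size_t total = ETH_HDR_LEN + len;
    if (total < ETH_MIN_NOCRC) {           /* pad short frames with zeros */
        memset(buf + total, 0, ETH_MIN_NOCRC - total);
        total = ETH_MIN_NOCRC;
    }
    return total;
}
```

The returned length is what would go into the descriptor's packet_length field.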
24. In-class exercises
- Modify the code in our nictx.c module so that it will transmit more than just one raw packet when you install it into the kernel
- Can you also modify the module_exit() function so that it will transmit a packet before it disables the Transmit Engine?