Title: Block Design Review: Substrate Decap and IPv4 Parse
1Block Design Review: Substrate Decap and IPv4 Parse
Brandon Heller, bdh4_at_cec.wustl.edu, http://www.arl.wustl.edu/projects/techX
2Revision History
- 9/26/06 (BDH)
- Released
- 9/28/06 (BDH)
- SD now at 5Gbps
3Contents
- For SD and Parse
- overview
- block diagram
- memory usage
- code locations
- test procedures
- Performance analysis
- Unexpected interactions
- Future work
- slide taken from PlanetLab_Design.ppt
4Substrate Decap
5Substrate Decap
- Main functions
- validate and consume Ethernet header
- look up code_option and slice_data_ptr based on VLAN tag
- validate and consume substrate UDP/IP headers
- pass relevant fields to IPv4 parse
- Single code path
- NN communication
- Uses 8 threads
- Name change from Demux
- slide taken from PlanetLab_Design.ppt
6IPv4 MR Functional Blocks
- Ethernet Header: DstAddr (6B), SrcAddr (6B), Type = 802.1Q (2B), VLAN (2B), Type = IP (2B)
- IP Header: Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol = UDP (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Options (0-40B)
- UDP Header: Src Port (2B), Dst Port (2B), UDP length (2B), UDP checksum (2B)
- UDP Payload (MN Packet)
- Ethernet Trailer: PAD (nB), CRC (4B)
- Slice Data Ptr (32b)
- slide taken from PlanetLab_Design.ppt
7Ethernet Validation
- No alignment necessary
- Counters kept in non-VLAN-specific region
- Tests for
- invalid Ethernet packet length
- non-VLAN tag protocol ID
- non-locally-addressed packet
- unrecognized VLAN
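The first three Ethernet tests above can be sketched in portable C as follows. This is a minimal illustration, not the microengine code: the function name, the result enum, and the 64B minimum-length rule are assumptions made for the example, and the unrecognized-VLAN test is left to the VLAN table lookup.

```c
#include <stdint.h>
#include <string.h>

#define ETH_TPID_8021Q 0x8100u  /* IEEE 802.1Q tag protocol ID */
#define ETH_MIN_FRAME  64u      /* assumed minimum frame length, CRC included */

/* Hypothetical result codes, one per drop counter named on the slide. */
typedef enum { SD_ETH_OK, SD_DROP_LEN, SD_DROP_TPID, SD_DROP_DST } sd_eth_result_t;

/* frame points at DstAddr; frame_len counts the whole frame.
 * On success the 12-bit VLAN ID is written to *vlan_id. */
sd_eth_result_t sd_validate_ethernet(const uint8_t *frame, uint32_t frame_len,
                                     const uint8_t local_mac[6],
                                     uint16_t *vlan_id)
{
    uint16_t tpid;

    if (frame_len < ETH_MIN_FRAME)
        return SD_DROP_LEN;                 /* invalid Ethernet packet length */

    tpid = (uint16_t)((frame[12] << 8) | frame[13]);
    if (tpid != ETH_TPID_8021Q)
        return SD_DROP_TPID;                /* non-VLAN tag protocol ID */

    if (memcmp(frame, local_mac, 6) != 0)
        return SD_DROP_DST;                 /* non-locally-addressed packet */

    /* The low 12 bits of the tag control field carry the VLAN ID. */
    *vlan_id = (uint16_t)(((frame[14] << 8) | frame[15]) & 0x0FFFu);
    return SD_ETH_OK;
}
```

The caller would bump the matching counter in the non-VLAN-specific region for each drop code.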
8VLAN Table
- code_option 0 implies invalid slice
- on switch for a slice in the data plane
- SD data is currently only counters
- 64B slice data
- SRAM space for all 4096 VLANs
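A minimal sketch of the table described above, assuming a two-word entry (code option plus slice data pointer) indexed directly by the 12-bit VLAN ID. The struct layout and the names example_vlan_entry_t, example_init_slice() and example_vlan_lookup() are hypothetical and are not the definitions in dl_system.h.

```c
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_VLAN_COUNT      4096u  /* one entry per possible VLAN ID */
#define EXAMPLE_SLICE_DATA_SIZE 64u    /* 64B of slice data per VLAN */

/* Hypothetical entry layout: two 32-bit words per VLAN. */
typedef struct {
    uint32_t code_option;     /* 0 implies invalid slice */
    uint32_t slice_data_ptr;  /* address of this slice's 64B data block */
} example_vlan_entry_t;

/* Turn a slice on by giving it a nonzero code option. */
void example_init_slice(example_vlan_entry_t *table, uint16_t vlan_id,
                        uint32_t code_option, uint32_t slice_data_base)
{
    uint32_t idx = vlan_id & 0x0FFFu;
    table[idx].code_option = code_option;
    table[idx].slice_data_ptr = slice_data_base + idx * EXAMPLE_SLICE_DATA_SIZE;
}

/* Returns the entry, or NULL when the VLAN is unrecognized. */
const example_vlan_entry_t *example_vlan_lookup(const example_vlan_entry_t *table,
                                                uint16_t vlan_id)
{
    const example_vlan_entry_t *e = &table[vlan_id & 0x0FFFu];
    return (e->code_option == 0u) ? NULL : e;
}
```

Because an all-zero table means every slice is off, zeroing the table at initialization is enough to disable the data plane for all VLANs.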
9Substrate UDP/IP Validation
- Header checks per RFC1812
- IP ver other than 4
- invalid header length
- length too small
- IP len doesn't match Enet-deduced IP len
- UDP len doesn't match IP-deduced UDP len
- NOTE: need to check Ethernet length, to ensure that padded 64B packets are using the correct length
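The substrate header checks listed above might look like this in portable C. It is a sketch under stated assumptions: the names are hypothetical, "length too small" is read as "too small to hold the IP header plus an 8B UDP header", and the padding rule (a minimum-size VLAN-tagged frame leaves 42B after the Ethernet and VLAN headers) is one possible reading of the NOTE, not confirmed behavior.

```c
#include <stdint.h>

#define SD_MIN_ENET_IP_LEN 42u  /* assumed: 64B frame - 14B Eth - 4B VLAN - 4B CRC */

typedef enum {
    SD_IP_OK, SD_DROP_IPVER, SD_DROP_HDRLEN,
    SD_DROP_LEN_SMALL, SD_DROP_LEN_MISMATCH, SD_DROP_UDPLEN
} sd_ip_result_t;

static uint32_t rd16(const uint8_t *p) { return ((uint32_t)p[0] << 8) | p[1]; }

/* ip points at the substrate IP header (at least 68 readable bytes);
 * enet_ip_len is the IP length deduced from the Ethernet frame length. */
sd_ip_result_t sd_validate_ip_udp(const uint8_t *ip, uint32_t enet_ip_len)
{
    uint32_t hdr_len = (uint32_t)(ip[0] & 0x0Fu) * 4u;
    uint32_t total_len = rd16(ip + 2);

    if ((ip[0] >> 4) != 4u)
        return SD_DROP_IPVER;            /* IP ver other than 4 */
    if (hdr_len < 20u)
        return SD_DROP_HDRLEN;           /* invalid header length */
    if (total_len < hdr_len + 8u)
        return SD_DROP_LEN_SMALL;        /* length too small */

    /* Padded minimum-size frames may carry a shorter IP datagram. */
    if (total_len != enet_ip_len &&
        !(enet_ip_len == SD_MIN_ENET_IP_LEN && total_len < enet_ip_len))
        return SD_DROP_LEN_MISMATCH;     /* IP len vs Enet-deduced IP len */

    if (rd16(ip + hdr_len + 4u) != total_len - hdr_len)
        return SD_DROP_UDPLEN;           /* UDP len vs IP-deduced UDP len */
    return SD_IP_OK;
}
```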
10SD Block Diagram
- Blocks: dl_source(), substrate_decap(), dl_sink()
- Steps shown: NN Dequeue, Read Eth/IP Hdrs, Validate Ethernet, Read VLAN table, Validate IP, Read UDP hdr, Validate UDP, Prepare ring data, NN Enqueue
- Thread ordering: init signal, Wait for prev ctx / Signal next ctx (shown twice)
- mem access
- Read Eth/IP Hdrs: DRAM, 5 8B reads
- Read VLAN table: SRAM, 2 4B reads
- Read UDP hdr: DRAM, 2 8B reads
- add one 4B SRAM increment per counter (none currently for common case)
11File locations (in /IPv4_MR/)
- Code
- src/substrate_decap/PL/substrate_decap.c,h
- src/dispatch_loop/PL/substrate_decap_dl.c,h
- src/dispatch_loop/PL/dl_source.c,h
- dl_source() and dl_sink() functions
- adds ordered thread synchronization if the following are defined
- DL_ORDERED
- FIRST_ORDERED_ME
- LAST_ORDERED_ME
- src/IXP2XXX_book/Chapter09/ordered_signal.c,h
- functions for ordered thread synchronization
- src/dispatch_loop/PL/nn_rings.c,h
- functions for enqueuing and dequeuing NN ring data
- Data formats
- src/PL/ipv4_common.h
- IP and UDP structure definitions
- src/PL/substrate_common.h
- Ethernet VLAN structure definitions
- src/dispatch_loop/PL/ring_formats.h
12Required Includes
- Files
- IXA_SDK_4.0\microengineC\src\intrinsic.c
- IXA_SDK_4.0\microengineC\src\rtl.c
- Directories
- IXA_SDK_4.0\src\library\microblocks_library\microc\
- IXA_SDK_4.0\MicroengineC\include\..\..\..\..\
- IXA_SDK_4.0\src\library\dataplane_library\microc\
- These are required to gain access to the buffer
libraries and intrinsic functions!
13SD Initialization
- All memory locations defined in dl_system.h, including
- locations for MAC address
- IPV4_SD_MAC_ADDR_HI32
- IPV4_SD_MAC_ADDR_LO16
- non-VLAN-specific counters
- IPV4_SD_COUNTERS_BASE
- IPV4_SD_COUNTERS_SIZE
- VLAN table
- IPV4_SD_VLAN_CODE_OPT_TABLE_x (BASE, SIZE, ENTRY_SIZE)
- VLAN-specific memory
- SLICE_DATA_TABLE_x (BASE, SIZE, ENTRY_SIZE, ENTRY_TOTAL)
- IPV4_SD_SLICE_DATA_ENTRY_OFFSET
- At least one slice must be initialized to send packets
- Call init_slice() from system_init.ind
- Currently 0xaaa initialized by default
- All counters zeroed
- SD caches MAC address in registers
- Thread 0 waits for signal from rx
14Substrate Decap Validation
- All validation tests done with 1 thread and substrate_decap_tests.tcs
- Ethernet validation/counter tests
- invalid Ethernet packet length
- non-VLAN tag protocol ID
- non-locally-addressed packet
- unrecognized VLAN
- UDP/IP validation/counter tests
- IP ver other than 4
- invalid header length
- length too small
- IP len doesn't match Enet-deduced IP len
- UDP len doesn't match IP-deduced UDP len
- Watched counters for proper number of increments
- Fully valid packet vlan_ip_udp_ip_udp/tcp (speed_test_all_valid.tcs)
- Verified all fields of output ring data were as expected
- Single-thread plus 8-thread
- Hardware testing
15SD Other
- Bugs
- substrate IP proto not checked, should correspond to UDP
- Untested
- buffer drops
- Data Structures
- substrate_decap_vlan_table_entry_t
- substrate_decap_stats_t
- substrate_decap_vlan_stats_t
- vlan_ip_header
- ipv4_header_struct
- vlan_header_struct
- udp_header
- Performance
- coming later
16IPv4 Parse
17IPv4 Parse
- Main functions
- Read/align IP header
- Validate and consume IP header (per RFC1812 5.2.2)
- Update IP header
- Dec TTL
- Recalc IP checksum
- Write updated checksum to DRAM
- Read/align L4 (UDP/TCP/other) header
- Mark exceptions for Header Format
- Extract fields for Lookup
- slide taken from PlanetLab_Design.ppt
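The "Update IP header" step (decrement TTL, recalculate checksum, write it back) can be sketched as below. This is a plain C illustration of a full checksum recomputation over a byte buffer, with hypothetical function names; the actual block works on DRAM transfer registers and is not shown here.

```c
#include <stdint.h>

/* Standard Internet checksum over an IP header: one's-complement sum of
 * 16-bit big-endian words, folded to 16 bits and inverted. hdr_len is
 * always a multiple of 4 for IPv4. */
uint16_t example_ip_checksum(const uint8_t *hdr, uint32_t hdr_len)
{
    uint32_t sum = 0;
    uint32_t i;

    for (i = 0; i + 1 < hdr_len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Decrement TTL and rewrite the header checksum in place. The caller is
 * assumed to have already flagged TTL 0 or 1 as an exception. */
void example_ip_forward_update(uint8_t *hdr)
{
    uint32_t hdr_len = (uint32_t)(hdr[0] & 0x0Fu) * 4u;
    uint16_t ck;

    hdr[8]--;                      /* TTL is byte 8 of the IPv4 header */
    hdr[10] = 0;                   /* checksum field is zero while summing */
    hdr[11] = 0;
    ck = example_ip_checksum(hdr, hdr_len);
    hdr[10] = (uint8_t)(ck >> 8);
    hdr[11] = (uint8_t)(ck & 0xFFu);
}
```

Running example_ip_checksum() over a header whose checksum field is already correct returns 0, which doubles as the validity test.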
18IPv4 MR Functional Blocks
Buf Handle (32b)
MN Frm Offset (16b)
MN Frm Length (16b)
Rx UDP DPort (16b)
Slice ID (VLAN) (16b)
Rx IP SAddr (32b)
Reserved (12b)
Rx UDP SPort (16b)
Code (4b)
Slice Data Ptr (32b)
Slice Data Ptr (32b)
Reserved (28b)
Code (4b)
- IPv4 Exception Bits
- Bit 0: TTL 0 or 1
- Bit 1: Options
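The two exception bits can be derived from the IPv4 header alone. The sketch below assumes that "Options" means a header length field greater than 5 words; the macro and function names are hypothetical.

```c
#include <stdint.h>

#define EXAMPLE_EXC_TTL     (1u << 0)  /* bit 0: TTL is 0 or 1 */
#define EXAMPLE_EXC_OPTIONS (1u << 1)  /* bit 1: IP options present */

/* ip_hdr points at the first byte of the IPv4 header. */
uint32_t example_ipv4_exception_bits(const uint8_t *ip_hdr)
{
    uint32_t exc = 0;

    if (ip_hdr[8] <= 1u)               /* TTL lives in byte 8 */
        exc |= EXAMPLE_EXC_TTL;
    if ((ip_hdr[0] & 0x0Fu) > 5u)      /* IHL above 5 words means options */
        exc |= EXAMPLE_EXC_OPTIONS;
    return exc;
}
```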
19IPv4 Internal Header Formats
Zeros (4b)
Type (6b)
Len (6b)
Rx UDP DPort (2B)
Tx UDP DPort (2B)
Tx UDP SPort (2B)
Type Dependent Data (8B)
Tx IP DAddr (4B)
- 4 bits at start discriminate between IPv4 and internal headers
- for more details see planetlab_IPv4_MR_parse_hdr_format.ppt in bdh4\techx\IPv4_MR_shared
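Since an IPv4 header starts with version nibble 4 and an internal header starts with four zero bits, the first byte is enough to tell them apart. The sketch below assumes the Zeros/Type/Len fields are packed most-significant-first into the first two bytes; that packing and all names are assumptions, and the referenced .ppt remains the authority on the real layout.

```c
#include <stdint.h>

typedef enum { EX_HDR_IPV4, EX_HDR_INTERNAL, EX_HDR_UNKNOWN } ex_hdr_kind_t;

/* Classify a meta-net frame by the top 4 bits of its first byte. */
ex_hdr_kind_t example_classify_header(const uint8_t *p)
{
    uint8_t nibble = (uint8_t)(p[0] >> 4);

    if (nibble == 4u)
        return EX_HDR_IPV4;       /* IPv4 version field */
    if (nibble == 0u)
        return EX_HDR_INTERNAL;   /* Zeros (4b) prefix */
    return EX_HDR_UNKNOWN;
}

/* Extract Type (6b) and Len (6b) from an internal header, assuming the
 * 16-bit word is laid out as Zeros(4) | Type(6) | Len(6). */
void example_internal_type_len(const uint8_t *p, uint8_t *type, uint8_t *len)
{
    uint16_t w = (uint16_t)((p[0] << 8) | p[1]);

    *type = (uint8_t)((w >> 6) & 0x3Fu);
    *len  = (uint8_t)(w & 0x3Fu);
}
```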
20Parse Validation
- IPv4_parse_tests.tcs
- Invalid internal header
- invalid len for internal header type
- internal header type unknown
- Invalid IPv4 (RFC 1812 checks)
- IP ver other than 4
- invalid header length
- length too small
- SD IP len doesn't match packet IP len
- invalid header checksum
- IPv4 Exceptions
- options flag set in packet
- TTL equals zero
- TTL equals one
- IPv4_parse_valid.tcs
- Fully valid, no-exceptions packets
- from GPE, classify
- from GPE, non-classify
- ingress, TCP
21Parse Block Diagram
- Blocks: dl_source(), ipv4_parse(), dl_sink()
- Steps shown: NN Dequeue, Read Int Hdr, Handle Internal, Read IP, Checksum, Validate IP, Read L4, Handle L4, Prepare ring data, NN Enqueue
- Thread ordering: init signal, Wait for prev ctx / Signal next ctx (shown twice)
- mem access
- Read Int Hdr: DRAM, 2 8B reads
- Handle Internal: (DRAM, 2 8B reads)
- Read IP: DRAM, 4 8B reads
- Read L4: DRAM, 4 8B reads
- add one 4B SRAM increment per counter (none currently for common case)
22File locations (in /IPv4_MR/)
- Code
- src/ipv4/PL/ipv4_parse.c,h
- src/dispatch_loop/PL/parse_dl.c,h
- src/parse/PL/parse.c,h
- src/dispatch_loop/PL/dl_source.c,h
- dl_source() and dl_sink() functions
- adds ordered thread synchronization if the following are defined
- DL_ORDERED
- FIRST_ORDERED_ME
- LAST_ORDERED_ME
- src/IXP2XXX_book/Chapter09/ordered_signal.c,h
- functions for ordered thread synchronization
- src/dispatch_loop/PL/nn_rings.c,h
- functions for enqueuing and dequeuing NN ring data
- Data formats
- src/PL/ipv4_common.h
- IP and UDP structure definitions
- src/dispatch_loop/PL/ring_formats.h
- ring data struct defs
23Parse Initialization
- All memory locations defined in dl_system.h, including
- VLAN-specific memory
- SLICE_DATA_TABLE_x (BASE, SIZE, ENTRY_SIZE, ENTRY_TOTAL)
- IPV4_PARSE_SLICE_DATA_ENTRY_OFFSET
- At least one slice must be initialized to send packets
- Call init_slice() from system_init.ind
- Currently 0xaaa initialized by default
- All counters zeroed
24Other
- Bugs
- none?
- Untested
- buffer drops
- Unimplemented
- checksum for IP options not handled yet
- Data Structures
- parse_vlan_stats_t
- ipv4_header_struct
- udp_header_struct
- tcp_header_struct
- Performance
- coming next
25Performance
26Packet Sizes
27Cycle Budget (min eth packets)
- To hit 5Gb rate
- 76B per min Ethernet packet (64B min Eth + 12B IFS)
- 1.4GHz clock rate
- 5 Gb/sec * 1B/8b * 1 packet/76B = 8.22 Mp/sec
- 1.4 Gcycle/sec * 1 sec/8.22 Mp = 170.3 cycles per packet
- compute budget: 170 cycles
- latency budget (threads * 170)
- 4 threads: 680 cycles
- 8 threads: 1360 cycles
28Cycle Budget (IPv4 MN packets)
- To hit 5Gb rate
- 90B per min IPv4 MN packet (78B min IPv4 MN + 12B IFS)
- 1.4GHz clock rate
- 5 Gb/sec * 1B/8b * 1 packet/90B = 6.94 Mp/sec
- 1.4 Gcycle/sec * 1 sec/6.94 Mp = 201.7 cycles per packet
- compute budget: 201 cycles
- latency budget (threads * 201)
- 4 threads: 804 cycles
- 8 threads: 1608 cycles
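The two cycle budgets follow from the same arithmetic, which a small helper makes easy to re-run for other packet sizes or line rates. The function names are hypothetical. Computed without rounding the packet rate first, the per-packet figures are about 170.2 and 201.6 cycles; the slides show 170.3 and 201.7 because they divide by the rounded rates 8.22 and 6.94 Mp/sec. The integer budgets are the same either way.

```c
#include <stdint.h>

/* Cycles available per packet at a given line rate and clock frequency.
 * bytes_on_wire includes the inter-frame spacing charged to each packet. */
double example_cycles_per_packet(double line_rate_bps, double clock_hz,
                                 uint32_t bytes_on_wire)
{
    double packets_per_sec = line_rate_bps / 8.0 / (double)bytes_on_wire;
    return clock_hz / packets_per_sec;
}

/* Compute budget: whole cycles per packet, rounded down. */
uint32_t example_compute_budget(double line_rate_bps, double clock_hz,
                                uint32_t bytes_on_wire)
{
    return (uint32_t)example_cycles_per_packet(line_rate_bps, clock_hz,
                                               bytes_on_wire);
}

/* Latency budget: each of the threads may take this long per packet
 * as long as the threads overlap perfectly. */
uint32_t example_latency_budget(uint32_t compute_budget, uint32_t threads)
{
    return compute_budget * threads;
}
```

For example, example_compute_budget(5e9, 1.4e9, 76) gives the 170-cycle figure for minimum Ethernet packets, and passing 90 gives the 201-cycle figure for minimum IPv4 MN packets.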
29Performance Anomalies
- Spot the issue!
- these issues have since been fixed!
Substrate Decap
more DRAM contention
unhidden DRAM latency
30Substrate Decap Performance
- Optimized common case (ingress, no options)
- Combined initial header checks
- No options assumed -> single DRAM read
- 153 cycles typical
- 650 cycles latency
- 337 control store instructions
- Expected performance
- (201/153) * 5Gb = 6.5Gb expected performance
- Simulated performance (as of 9/26/2006)
- >5 Gb, but something else slows down 6Gb input
31SD Optimizations
- possible optimizations
- caching VLAN-to-CodeOption table in Local Memory
- optimize nn_dequeue_incr() via assembly coding
- move VLAN counter computation off fast path?
- use transfer regs directly
- saves 9 cycles
- remove volatile statements
32Parse Performance
- single-threaded
- 380 cycles for computation
- 1708 cycles latency
- 556 control store insts
- Expected performance
- (201/380) * 5Gb = <3Gb expected performance
- Going to optimize a bit before adding all 8 threads
33Parse Optimizations
- possible optimizations
- incremental IPv4 checksum update per RFC1624
- checksum computation in assembler
- optimized 5LW alignment for IP read
- combined initial error-check to optimize common case
- reduces branch delays
- slows down exception path
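The first optimization listed, an incremental checksum update, avoids re-summing the whole header when only the TTL changes. RFC 1624 gives the update as HC' = ~(~HC + ~m + m'), where m is the old 16-bit word and m' the new one. The helper below is a sketch with a hypothetical name, not the planned microengine code.

```c
#include <stdint.h>

/* Incremental Internet checksum update per RFC 1624 (equation 3):
 *   HC' = ~(~HC + ~m + m')
 * old_ck is the current header checksum, old_word and new_word are the
 * 16-bit header word before and after the change (for a TTL decrement,
 * the word holding TTL and Protocol). */
uint16_t example_csum_update16(uint16_t old_ck, uint16_t old_word,
                               uint16_t new_word)
{
    uint32_t sum = (uint16_t)~old_ck;

    sum += (uint16_t)~old_word;
    sum += new_word;
    /* Fold carries back in twice; two folds always suffice for three
     * 16-bit addends. */
    sum = (sum & 0xFFFFu) + (sum >> 16);
    sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}
```

For a header with checksum 0xb861 whose TTL/Protocol word goes from 0x4011 to 0x3f11, the update yields 0xb961, matching a full recomputation.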
34Implementation Status
- Parse needs
- error testing
- IP options with checksum
- multithreading
- drop tests
37Extra Slides
38Parse Memory Usage
- Memory reads/writes
- 2 8B DRAM reads: unaligned internal header
- 2 8B DRAM reads: unaligned internal header FwdKey
- 4 8B DRAM reads: unaligned IPv4 header
- 0,6 DRAM reads: unaligned IPv4 header options
- 4 8B DRAM reads: unaligned L4 header
- 1 SRAM increment per counter
- 1 DRAM write: updated TTL and checksum
39Ethernet Validation
- First, read packet from memory, guaranteed aligned
- Counters are not specific to any VLAN (kept in separate mem area)
- For efficiency, can keep counters in LM and update to RAM when a signal is triggered
typedef struct _substrate_decap_stats_t
{
    unsigned int rx;        // received
    unsigned int pass;      // passed to next stage
    unsigned int dropLen;   // invalid Ethernet packet length
    unsigned int dropTPID;  // non-VLAN tag protocol ID
    unsigned int dropDst;   // non-locally-addressed packet
    unsigned int dropVLAN;  // unrecognized VLAN
} substrate_decap_stats_t;
40UDP/IP Validation
typedef struct _substrate_decap_slice_stats_t
{
    unsigned int dropIPVer;        // IP ver other than 4
    unsigned int dropHdrLen;       // invalid header length
    unsigned int dropLenSmall;     // length too small
    unsigned int dropLenMismatch;  // IP len doesn't match Enet IP len
    unsigned int dropUDPLen;       // UDP len doesn't match IP UDP len
    unsigned int pass;             // passed to next stage
} substrate_decap_slice_stats_t;
41RFC 1812 5.2.2 IP Header Validation
- (1) The packet length reported by the Link Layer must be large enough to hold the minimum length legal IP datagram (20 bytes).
- (2) The IP checksum must be correct.
- (3) The IP version number must be 4. If the version number is not 4 then the packet may be another version of IP, such as IPng or ST-II.
- (4) The IP header length field must be large enough to hold the minimum length legal IP datagram (20 bytes = 5 words).
- (5) The IP total length field must be large enough to hold the IP datagram header, whose length is specified in the IP header length field.
- from http://www.faqs.org/rfcs/rfc1812.html
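The five RFC 1812 checks can be sketched as one function. The names are hypothetical. The checksum test (2) is run last here because the header length must be known and in range before the header can be summed safely; the extra bounds guard before it is an addition for this example and is not one of the five RFC rules.

```c
#include <stdint.h>

typedef enum {
    EX_IP_OK, EX_IP_BAD_LINK_LEN, EX_IP_BAD_CKSUM,
    EX_IP_BAD_VERSION, EX_IP_BAD_IHL, EX_IP_BAD_TOTAL_LEN
} ex_ip_check_t;

/* Folded one's-complement sum of the header, checksum field included.
 * A header with a correct checksum sums to 0xFFFF. */
static uint16_t ex_ones_sum(const uint8_t *p, uint32_t len)
{
    uint32_t sum = 0;
    uint32_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)sum;
}

/* ip points at the IPv4 header; link_len is the length reported by the
 * link layer for the IP datagram. */
ex_ip_check_t example_rfc1812_check(const uint8_t *ip, uint32_t link_len)
{
    uint32_t ihl_bytes, total_len;

    if (link_len < 20u) return EX_IP_BAD_LINK_LEN;          /* (1) */
    if ((ip[0] >> 4) != 4u) return EX_IP_BAD_VERSION;       /* (3) */
    ihl_bytes = (uint32_t)(ip[0] & 0x0Fu) * 4u;
    if (ihl_bytes < 20u) return EX_IP_BAD_IHL;              /* (4) */
    total_len = ((uint32_t)ip[2] << 8) | ip[3];
    if (total_len < ihl_bytes) return EX_IP_BAD_TOTAL_LEN;  /* (5) */
    if (ihl_bytes > link_len) return EX_IP_BAD_LINK_LEN;    /* bounds guard */
    if (ex_ones_sum(ip, ihl_bytes) != 0xFFFFu)
        return EX_IP_BAD_CKSUM;                             /* (2) */
    return EX_IP_OK;
}
```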