Title: SPP V2 Router Design
1 SPP V2 Router Design
John DeHart and Mike Wilson
2 Revision History
- 3 June 2008
  - Initial release, presentation
- 25 June 2008
  - Updates based on feedback from presentation
3 SPP Versions
- SPP Version 0
  - What we used for the SIGCOMM paper
- SPP Version 1
  - Bare minimum we would need to release something to PlanetLab users
- SPP Version 2
  - What we would REALLY like to release to PlanetLab users
4 Objectives for SPP-NPE Version 2
- Deal with constraints imposed by switch
  - can send to only one NPU, can receive from only one NPU
  - split processing across NPUs
    - parsing, lookup on one; queueing on other
- Provide more resources for slice-specific processing
- Decouple QM schedulers from links
  - collection of largely independent schedulers
  - may use several to send to the same link
    - e.g. separate rate classes (1-10M, 10-100M, 100-1000M)
  - optionally adjust scheduler rates dynamically
- Provide support for multicast
  - requires addition of next-hop IP address after queueing
- Enable single slice to operate at 10 Gb/s
- Support slow code options
  - Use separate rate classes to limit rate to slow code options
  - LCI QMs for Parse, NPUB QMs for HdrFmt
5 SPP Version 2 System Architecture: Default Data Path
[Figure: system architecture diagram. Labeled elements: GPE Blade (2); LC 7010 Blade (LC Ingress, LC Egress); NPE 7010 Blade (NPUA: Decap, Parse, Lookup, AddShim; NPUB: Copy, QM, HdrFormat); SPI Switch (2); FIC (2); RTM; Switch Blade; 1 x 10 Gb/s or 10 x 1 Gb/s.]
6 SPP Version 2 System Architecture: Fast-Path Data
[Figure: the system architecture diagram again (GPE blades, LC 7010 blade, NPE 7010 blade with NPUA and NPUB, Switch Blade), titled for fast-path data.]
7 SPP Version 2 System Architecture: Exception Data Path, Local Delivery
[Figure: the system architecture diagram again (GPE blades, LC 7010 blade, NPE 7010 blade with NPUA and NPUB, Switch Blade), titled for the exception data path and local delivery.]
8 NPE Version 2 Block Diagram
[Figure: NPE block diagram. Labeled elements:]
- NPUA: RxA (2 ME), Decap (1 ME), Parse (8 ME), LookupA (1 ME), AddShim (1 ME), TxA (2 ME), StatsA (1 ME)
- NPUB: RxB (2 ME), LookupBCopy (2 ME), QueueManager (4 MEs), HdrFmt/SubEncap (4 MEs), TxB (2 ME), StatsB (1 ME), Scr2NN/Freelist (1 ME)
- Other labels: GPE, TCAM, SRAM (several, including SRAM/0 and SRAM/3), SPI Switch (2), Switch Blade
10 PlanetLab NPE Input Frame from LC
- Ethernet Header
  - DstAddr: MAC address of NPE
  - SrcAddr: MAC address of LC
  - VLAN: one VLAN per MR (MR/Slice)
    - Only use lower 12 bits of VLAN tag
- IP Header
  - Dst Addr: IP address of this node
    - How many IP addresses can a NODE have?
  - Src Addr: IP address of previous hop
  - Protocol: UDP
- UDP Header
  - Dst Port: identifies input tunnel
  - Src Port: with IP Src Addr identifies sending entity
Frame layout (the figure marked 8-byte boundaries, assuming no IP Options):
- Ethernet Header: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), VLAN (2B), Type=IP (2B)
- IP Header: Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol=UDP (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Options (0-40B)
- UDP Header: Src Port (2B), Dst Port (2B), UDP length (2B), UDP checksum (2B)
- UDP Payload (MN Packet)
- PAD (nB)
- Ethernet Trailer: CRC (4B)
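The frame layout above can be illustrated with a small host-side parser. This is only a sketch in Python, not the microengine code: the function name `parse_input_frame` and the returned dictionary keys are hypothetical, and the trailing PAD/CRC is ignored.

```python
import struct

def parse_input_frame(frame: bytes) -> dict:
    """Parse an 802.1Q-tagged Ethernet / IPv4 / UDP frame as laid out above."""
    # Ethernet: DstAddr(6) SrcAddr(6) Type=802.1Q(2) VLAN(2) Type=IP(2) = 18 bytes
    dst_mac, src_mac, tpid, tci, ethertype = struct.unpack_from("!6s6sHHH", frame, 0)
    if tpid != 0x8100 or ethertype != 0x0800:
        raise ValueError("not a VLAN-tagged IPv4 frame")
    ip_off = 18
    ihl = (frame[ip_off] & 0x0F) * 4          # IP header length in bytes
    if frame[ip_off + 9] != 17:               # Protocol field must be UDP
        raise ValueError("not UDP")
    src_ip, dst_ip = struct.unpack_from("!4s4s", frame, ip_off + 12)
    udp_off = ip_off + ihl
    sport, dport, udp_len, _cksum = struct.unpack_from("!HHHH", frame, udp_off)
    return {
        "slice_id": tci & 0x0FFF,             # only the lower 12 bits of the VLAN tag
        "src_ip": src_ip,
        "src_port": sport,
        "dst_port": dport,                    # identifies the input tunnel
        "mn_offset": udp_off + 8,             # start of the MN packet (UDP payload)
        "mn_length": udp_len - 8,
    }
```

The source IP address and source port together identify the sending entity, and the destination port identifies the input tunnel, which is why all three are returned.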
11 NPE Version 2 Block Diagram
[Figure: NPE block diagram repeated. Fields annotated on this copy: Port (4b), Reserved (12b), Eth. Frame Len (16b).]
12 RxA
13 NPE Version 2 Block Diagram
[Figure: NPE block diagram repeated. Fields annotated on this copy:]
- Port (4b), Reserved (12b), Eth. Frame Len (16b)
- MN Frm Offset (16b), MN Frm Length (16b)
- Rx UDP DPort (16b), Slice ID (VLAN) (16b)
- Rx IP SAddr (32b)
- Reserved (12b), Rx UDP SPort (16b), Code (4b)
- Slice Data Ptr (32b)
14 Decap
- Inputs
  - Packet from RxA
- Outputs
  - Meta-frame (handle, offset and length)
  - Slice ID (VLAN tag)
    - or is this lower 12b of VLAN tag and lower 4b of RX DA in?
  - Metainterface (Rx Saddr, Rx Sport, Rx Dport)
  - Code Option (4b, only 16 available)
  - Slice data pointer
- Initialization
  - VLAN table
- Functionality
  - Read VLAN tag from DRAM, determine correct code option.
  - Validate packet. Drop invalid, unmatched packets.
    - IP Options for NPE dropped in LC, should never arrive here!
  - Enqueue valid packets to SRAM ring.
  - Update stats
- Status
  - Change dl_sink from NN to SRAM.
15 VLAN table
VLAN     code_opt   slice_data_ptr   slice_data_size
0        0          0                0
1        0          0                0
0x0aa    1
0x7ff    0          0                0
Figure labels attached to the table: SD data, P data
- code_option 0 implies invalid slice
  - on switch for a slice in the data plane
- SD data is currently only counters
- 64B slice data
- Only use lower 12b of VLAN tag (4096 VLANs)
- Only changes from V1
  - No longer need all data on NPUA, drop HF data, per-slice buffer limits
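As a rough illustration of how Decap might consult this table, the sketch below models it as a 4096-entry array indexed by the lower 12 bits of the VLAN tag. The names `SliceEntry`, `vlan_table` and `decap_lookup`, and the pointer value 0x1000, are hypothetical; only the column names, the 64B slice data size and the "code option 0 means invalid" rule come from the slide.

```python
from typing import NamedTuple, Optional

class SliceEntry(NamedTuple):
    code_opt: int          # 0 means the slice is invalid (switched off)
    slice_data_ptr: int
    slice_data_size: int

INVALID = SliceEntry(0, 0, 0)

# One entry per VLAN; only the lower 12 bits of the tag are used (4096 VLANs).
vlan_table = [INVALID] * 4096
# Hypothetical example slice: VLAN 0x0aa, code option 1, 64B of slice data.
vlan_table[0x0AA] = SliceEntry(code_opt=1, slice_data_ptr=0x1000, slice_data_size=64)

def decap_lookup(vlan_tag: int) -> Optional[SliceEntry]:
    """Return the slice entry for a VLAN tag, or None if the packet must be dropped."""
    entry = vlan_table[vlan_tag & 0x0FFF]
    return entry if entry.code_opt != 0 else None
```

Setting `code_opt` back to 0 for an entry is then enough to turn a slice off in the data plane.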
16 NPE Version 2 Block Diagram
[Figure: NPE block diagram repeated. Fields annotated on this copy:]
- MN Frm Offset (16b), MN Frm Length (16b)
- Rx UDP DPort (16b), Slice ID (VLAN) (16b)
- Rx IP SAddr (32b)
- Reserved (12b), Rx UDP SPort (16b), Code (4b)
- Slice Data Ptr (32b)
- Lookup Key [143-112]: Type (1b) / Slice ID (15b) / Rx UDP DPort (16b)
- Lookup Key [111-80]: DA (32b)
- Lookup Key [79-48]: SA (32b)
- Lookup Key [47-16]: Ports (32b)
- Lookup Key [15-0]: Proto/TCP_Flags (16b)
- Exception Bits (12b), Code (4b)
17 Parse
- Inputs
  - Meta-frame (handle, offset and length)
  - Slice ID (VLAN tag)
  - Tunnel ID (Rx Saddr, Rx Sport, Rx Dport)
  - Code Option (4b, only 16 available)
  - Slice data pointer
- Outputs
  - Meta-frame (handle, offset and length)
  - Lookup key (Includes slice ID, Rx UDP dport)
    - Change to include lower 4b of RX DA in shave VLAN bits for the SliceID.
  - Code Option (4b, only 16 available)
  - Exception bits (MN-specific)
- Initialization
  - Slice Data
- Functionality
  - Slice-specific processing
    - Parse meta-frame.
    - Extract lookup key.
    - Raise any relevant exceptions.
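The 144-bit lookup key that Parse hands to LookupA (bit ranges as annotated on the preceding block diagram) can be sketched as a simple bit-packing routine. The function name `pack_lookup_key` is hypothetical, and the sketch assumes the Type bit is the most significant bit and treats the Ports and Proto/TCP_Flags fields as opaque 32-bit and 16-bit values, since the slides do not give their internal layout.

```python
def pack_lookup_key(key_type: int, slice_id: int, rx_dport: int,
                    da: int, sa: int, ports: int, proto_flags: int) -> bytes:
    """Pack the lookup key fields, most significant first, into 18 bytes (144 bits)."""
    fields = [
        (key_type, 1),      # bits 143      (assumed MSB)
        (slice_id, 15),     # bits 142-128
        (rx_dport, 16),     # bits 127-112
        (da, 32),           # bits 111-80
        (sa, 32),           # bits 79-48
        (ports, 32),        # bits 47-16
        (proto_flags, 16),  # bits 15-0
    ]
    key = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value:#x} does not fit in {width} bits")
        key = (key << width) | value
    return key.to_bytes(18, "big")
```

The width check matters because an oversized field would silently corrupt its neighbor in the key.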
18 NPE Version 2 Block Diagram
[Figure: NPE block diagram repeated. Fields annotated on this copy:]
- MN Frm Offset (16b), MN Frm Length (16b)
- Slice ID (VLAN) (16b), Exception Bits (12b), Code (4b)
- Lookup Key [143-112]: Type (1b) / Slice ID (15b) / Rx UDP DPort (16b)
- Lookup Key [111-80]: DA (32b)
- Lookup Key [79-48]: SA (32b)
- Lookup Key [47-16]: Ports (32b)
- Lookup Key [15-0]: Proto/TCP_Flags (16b)
- Result Index (32b)
19 LookupA
- Inputs
- Meta-frame (handle, offset and length)
- Lookup key (Includes slice ID, Rx UDP dport)
- Slice ID (VLAN tag)
- Code Option (4b, only 16 available)
- Outputs
- Meta-frame (handle, offset and length)
- Lookup Result (Index into SRAM table on NPUB)
- 32b is overkill; some of these bits are reserved.
- Slice ID (VLAN tag)
- Code Option (4b, only 16 available)
- Exception bits (from Parse)
- Stats Index (from TCAM)
- Initialization
- Filters set in TCAM by control
- Functionality
- Look up key in TCAM
- On miss, drop the packet
- Status
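The LookupA behavior (match the key against control-installed filters, drop on miss) can be modeled in software as a ternary match. This is a sketch only: `Filter` and `tcam_lookup` are hypothetical names, and first-match-wins ordering is an assumption about how the filters are prioritized, not something the slides state.

```python
from typing import NamedTuple, Optional, Sequence

class Filter(NamedTuple):
    value: int
    mask: int            # 1 bits must match, 0 bits are "don't care"
    result_index: int    # index into the SRAM result table on NPUB
    stats_index: int

def tcam_lookup(filters: Sequence[Filter], key: int) -> Optional[Filter]:
    """Return the first filter matching the key, or None (caller drops the packet)."""
    for f in filters:    # assumed priority order: earlier entries win
        if (key & f.mask) == (f.value & f.mask):
            return f
    return None
```

A caller would take `result_index` and `stats_index` from the returned filter and pass them on to AddShim.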
20 NPE Version 2 Block Diagram
[Figure: NPE block diagram repeated. Fields annotated on this copy: Slice ID (VLAN) (16b), Exception Bits (12b), Code (4b), Result Index (32b).]
21 AddShim
- Inputs
  - Meta-frame (handle, offset and length)
  - Lookup Result (Index into SRAM table on NPUB)
  - Slice ID (VLAN tag)
  - Code Option (4b, only 16 available)
  - Exception bits (from Parse)
  - Stats Index (from TCAM)
- Outputs
  - Shim Packet (buffer handle)
  - Buffer descriptor contains updated offset and length, if needed
- Initialization
  - None.
- Functionality
  - Prepend shim header to preserve packet annotations across NPUs
  - Overwrite the existing Ethernet header (up to 18B) with
    - Slice ID (16b)
    - Code Option (4b)
    - Exception Bits (12b)
    - MN Frame Offset (16b)
23 TxA
- Sends shim packet to NPUB.
- Unmodified 10 Gb/s Tx (2 MEs).
24 SPP Version 2 NPUA to NPUB Frame
- SHIM (16B)
  - Slice ID (16b)
  - Code Option (4b)
  - Exception Bits (12b)
  - Result Index (32b)
  - Stats Index (16b)
  - Offset of MN Packet (16b)
  - Memory Alignment Padding (4B)
- IP Header, UDP Header may be overwritten by opaque slice data, written in Parse
Frame layout (the figure marked 8-byte boundaries, assuming no IP Options):
- SHIM (16B), Type=IP (2B)
- IP Header: Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol=UDP (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Options (0-40B)
- UDP Header: Src Port (2B), Dst Port (2B), UDP length (2B), UDP checksum (2B)
- UDP Payload (MN Packet)
- PAD (nB)
- Ethernet Trailer: CRC (4B)
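The 16-byte shim listed above (12 bytes of fields plus 4 bytes of alignment padding) can be sketched with Python's `struct` module. The helper names `pack_shim` and `unpack_shim` are hypothetical, and placing the Code Option in the upper 4 bits of the shared 16-bit word is an assumption; the slides give the field widths but not the bit order within that word.

```python
import struct

# Slice ID (16b) | Code Option (4b) + Exception Bits (12b) | Result Index (32b)
# | Stats Index (16b) | Offset of MN Packet (16b) | 4 bytes of padding
SHIM = struct.Struct("!HHIHH4x")

def pack_shim(slice_id, code_opt, exc_bits, result_index, stats_index, mn_offset) -> bytes:
    if not (0 <= code_opt < 16 and 0 <= exc_bits < 4096):
        raise ValueError("code option is 4 bits, exception bits are 12 bits")
    # Assumed layout: code option in the high nibble, exception bits below it.
    return SHIM.pack(slice_id, (code_opt << 12) | exc_bits,
                     result_index, stats_index, mn_offset)

def unpack_shim(buf: bytes):
    slice_id, code_exc, result_index, stats_index, mn_offset = SHIM.unpack_from(buf)
    return slice_id, code_exc >> 12, code_exc & 0x0FFF, result_index, stats_index, mn_offset
```

Because the shim overwrites the 18-byte Ethernet header region, the packed size (`SHIM.size`, 16 bytes) fits without moving the IP header.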
26 RxB
27 NPE Version 2 Block Diagram
[Figure: NPE block diagram repeated. Fields annotated on this copy: Reserved (12b), Frame Length (16b), Stats Index (16b).]
28 LookupB/Copy
- Inputs
- Shim packet (buffer handle, frame length)
- Outputs
- Packet (buffer handle, frame length)
- QueueID (QM, Scheduler, Queue ID)
- Stats Index
- Initialization
- ResultTable
- Functionality (Overview)
- Copy shim header into buffer descriptor
- Look up routing information from result index
- If multicast, make the copies
- Enqueue to correct QM (from ResultTable)
29 LookupB/Copy Code Sketch
- if not currently processing mcast packet
  - read packet from SRAM ring
  - extract shim
  - load ResultTable value
  - fill buffer descriptor
  - if unicast
    - if per-slice packet limit permits
      - update per-slice packet count
      - write to SRAM ring for correct QM (by qmschedID in result table value)
    - else drop buffer
  - else
    - start mcast processing
    - if per-slice packet limit permits
      - update per-slice packet count
      - fetch first header buffer descriptor
      - if payload length != 0
        - write ref count into payload descriptor
      - else drop payload buffer
- else
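The unicast and multicast branches of the sketch above can be modeled in a few lines of Python. This is a simplified illustration under stated assumptions, not the microengine code: `Buffer`, `Result` and `lookup_copy` are hypothetical names, every copy is assumed to count once against the per-slice packet limit, and the per-copy header buffer handling that the slide leaves unfinished is omitted.

```python
from dataclasses import dataclass

@dataclass
class Buffer:
    slice_id: int
    payload_len: int
    ref_cnt: int = 0

@dataclass
class Result:
    qids: list   # one queue ID for unicast, several for multicast

def lookup_copy(buf, result, pkt_count, pkt_limit, queues):
    """Enqueue buf to each queue in result; return copies enqueued (0 means dropped)."""
    fanout = len(result.qids)
    used = pkt_count.get(buf.slice_id, 0)
    if used + fanout > pkt_limit:
        return 0                              # per-slice packet limit exceeded: drop
    pkt_count[buf.slice_id] = used + fanout   # update per-slice packet count
    if fanout > 1:
        buf.ref_cnt = fanout                  # payload is shared by all copies
    for copy, qid in enumerate(result.qids):
        queues.setdefault(qid, []).append((buf, copy))
    return fanout
```

Checking the limit before any copy is enqueued avoids leaving a multicast packet partially delivered when the slice runs out of budget.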
30 ResultTable Unicast
- Data needed to enqueue, rewrite packet
  - QID
    - QMID, SchedID, QID (20b) (Lookup Result)
  - Src MI
    - IP Saddr (32b) (Per SchedID Table)
    - UDP Sport (16b) (Lookup Result)
  - Tunnel Next Hop
    - IP DAddr (32b) (Lookup Result)
    - UDP DPort (16b) (Lookup Result)
  - Chassis Addressing
    - Ethernet Dst MAC (48b) (Per SchedID Table)
  - Slice Specific Lookup Result Data (?) (Lookup Result)
  - Ethernet Src MAC
    - Should be constant across all pkts.
Results Entry: QID (20b), IP DAddr (32b), UDP DPort (16b), UDP SPort (16b), HFIndex (16b)
31 ResultTable Multicast
- Fanout gives the number of copies (0..15)
- Data needed per copy on NPUB
  - QID
    - QMID, SchedID, QID (20b) (Lookup Result)
  - Src MI
    - IP Saddr (32b) (Per SchedID Table)
    - UDP Sport (16b) (Lookup Result)
  - Tunnel Next Hop
    - IP DAddr (32b) (Lookup Result)
    - UDP DPort (16b) (Lookup Result)
  - Chassis Addressing
    - Ethernet Dst MAC (48b) (Per SchedID Table)
  - Slice Specific Lookup Result Data (?) (Lookup Result)
  - Ethernet Src MAC
    - Should be constant across all pkts.
- Support Multicast but optimize for Unicast
Results Entry (the figure carries the label 16): Fanout (4b), QID (20b), IP DAddr (32b), UDP DPort (16b), UDP SPort (16b), HFIndex (16b)
33 QM
- No change from V1
  - Incorporates recent change to limit queues by pkts
- Some changes in how control allocates bandwidth
  - Need to ensure that slow HdrFmt blocks can't tie up the system
  - Currently looking at worst-case engineering
    - (everyone runs at slowest block speed)
35 HdrFmt / SubEncap
- Inputs
  - Buffer Handle
  - Remaining inputs from Buffer Descriptor
    - Multicast or Unicast (from buffer_next)
    - Frame length, offset
    - HFIndex (index into HFTable, a slice-specific table)
    - QMSchedID (for per-sched lookup in ResultTable)
- Outputs
  - Packet (buffer handle)
  - Buffer descriptor contains updated offset and length
- Initialization
  - HFTable, containing slice-specific data. For IPv4, this contains next-hop information (for both multicast and unicast traffic).
- Functionality
  - Substrate level
    - read buffer descriptor and pass frame offset, length, HFIndex, mcast/ucast to slice-specific HdrFmt
  - Slice level: arbitrary processing.
    - For IPv4, this writes the next-hop information.
    - Returns new offset, length of frame.
  - Substrate level
37 Scr2NN/FreelistMgr
- Inputs
  - Buffer Handle (possibly chained)
- Outputs
  - Buffer Handle (possibly chained)
- Initialization
  - None
- Functionality
  - Combines Freelist Manager with Scr2NN glue
  - FM: Read from scratch ring. Free buffers, correctly handling chained buffers and reference counts.
  - Scr2NN: Read from Scratch, write to NN.
- Status
  - Both blocks exist, but combining them is not straightforward.
  - Open question: how should we prioritize among these tasks? The author should ensure that no deadlock is possible. (TxB writes to FM; if FM ring is full, TxB stalls. If Scr2NN is writing to TxB, it stalls. Gridlock.)
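The freelist manager's job of freeing chained buffers while honoring reference counts can be sketched as follows. `Buf` and `free_chain` are hypothetical names, and the model assumes a multicast copy is a private header buffer (ref count 1) chained to a payload buffer shared by all copies.

```python
class Buf:
    def __init__(self, ref_cnt=1, next_buf=None):
        self.ref_cnt = ref_cnt
        self.next = next_buf

def free_chain(head, freelist):
    """Drop one reference on each buffer in the chain, freeing those that reach zero."""
    buf = head
    while buf is not None:
        nxt = buf.next
        buf.ref_cnt -= 1
        if buf.ref_cnt > 0:
            break              # still referenced by another copy; leave the rest alone
        buf.next = None        # unlink before returning the buffer to the freelist
        freelist.append(buf)
        buf = nxt
```

With two header buffers sharing one payload, the payload is returned to the freelist only when the second header is freed.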
39 TxB
- Must support chained buffers
  - Multicast uses header buffers and payload buffers
  - Headers are slice-specific; we can't rely on known, static lengths as we did in ONL.
  - Sends header from one buffer, payload from chained buffer.
- Can TX do this? Comments in the code seem to imply that chained (non-SOP) buffers must start at offset 0. Our payloads usually won't.
  - According to DZar, this will probably take some TX modification, but there's no reason why it won't work. Might have a performance penalty, of course.
40 SPP V2 SideB SRAM Buffer Descriptor
- LW0: Buffer_Next (32b)
- LW1: Buffer_Size (16b)
- LW2: Reserved (4b), Free_list 0000 (4b), Ref_Cnt (8b), Packet_Size (16b)
- LW3: Slice ID (xsid) (12b), Stats Index (16b), Reserved (4b)
- LW4: MR Exception Bits (16b), HFIndex (16b)
- LW5: ResultIndex (32b)
- LW6: MR Bits (optional) (32b)
- LW7: Packet_Next (32b)
- HFIndex is an index into the HFTable. For IPv4, this provides Next Hop information.
- ResultIndex is used to get tunnel header info from the ResultTable
41 Design Questions
- Small hole for abuse in HdrFmt
  - QM rate limits on payload length
  - HdrFmt (after QM) can vastly increase packet length
  - Should the LookupB table give the padding size for each entry? Enforced in SubEncap?
  - ANSWER: No, we will resort to our control of HdrFmt to force it to behave. (We write all of the code options right now.)
- What are the best places to update stats on NPUB?
  - ANSWER: Post-Q only
- Is there any remaining reason that NPUB would need the source tunnel information?
  - ANSWER: No. If a code option needs it, put it into opaque slice data.
- Still working out remaining data areas.
42 Extra Slides
- The rest of the slides are old or for extra information
43 Questions/Issues
- 4/28/08
  - How many code options?
    - Limit of 16?
  - To handle slow Code Options
    - LCI Queues would control traffic to Fast/Slow Parse Code
    - Classes of code options defined by how long their Parse code takes.
    - Scheduler assigned to a class of code option.
    - NPE Queues would control traffic to Fast/Slow HF Code
    - LCE Queues control the output rate to Interfaces.
  - Multicast Problems
    - Impact of multicast traffic overloading Lookup/Copy and becoming a bottleneck.
  - Rx on SideB, can it use SRAM output ring?
    - All our other 10G Rxs have NN output ring.
  - Option for HF to send out additional pkts?
  - How to pass MR and substrate hdrs to TxB?
    - Through Ring or through Hdr Buffer associated with Hdr Buffer descriptor.
    - If the latter then what are the constraints in Tx for buffer chaining?
44 Meeting Notes
- 1/15/08
  - QM: Add Pkt count to Queue Params, change limit from QLen to PktCount
  - Add Per Slice Pkt limit to NPUA and NPUB
  - Limit Fanout to 16
  - MCast: Control will allocate all 16 entries for a multicast result entry; result entry will be typed as multicast or unicast and will not transition from one to the other.
  - What happens to pkts in queues when there is a route change that sends that flow's pkts to a different interface and queue? Pkt ordering problems?
45 NPE Version 2 Block Diagram
[Figure: NPE block diagram. Labeled elements:]
- NPUA: RxA (2 ME), Decap/Parse/LookupA/AddShim (8 MEs), TxA (2 ME), Stats (1 ME)
- NPUB: RxB (2 ME), LookupBCopy (2 ME), QueueManager (4 MEs), HdrFmt (4 MEs), TxB (2 ME), Stats (1 ME)
- Other labels: GPE, TCAM, SRAM (several), SPI Switch (2), Switch Blade
Annotations on the figure:
- Lookup produces resultIndx, statsIndx
- slice, resultIndx, etc, passed in shim
- flow control?
- Lookup on <slice, resultIndx> yields fanout, list of QiDs; copy to queues, adding copy (slice, resultIndx remain in packet buffer)
- for unicast, resultIndx replaced by QiD, allowing output side to skip lookup
- use slice to select slice to format packet; use resultIndx to get next-hop
46 Questions/Issues
- Where are exit and entry points for packets sent to and from the GPE for exception processing?
  - Parse (NPUA) and LookupA (NPUA) are where most exceptions are generated
    - IP Options
    - No Route
    - Etc.
  - HdrFormat (NPUB) is where we do Ethernet header processing
- What needs to be in the SHIM going from NPUA to NPUB?
  - ResultIndex (32b)
  - Exception Bits (12b)
  - StatsIndex (16b)
  - Slice (12b)
  - ???
- Will we support multi-copy in a way similar to the ONL Router?
  - How big can the fanout be?
  - How many QIDs need to be stored with the LookupB Result?
  - Is there some encoding for the QIDs that can take into account support for multicast and the copy? For example:
    - Multicast QID (20b)
      - Multicast (1b): 1
      - Copy (4b)
47 NPE Version 2 Block Diagram
[Figure: NPE block diagram with the NPUA processing blocks combined as Decap, Parse, LookupA, AddShim (8 MEs), plus RxA, TxA and Stats on NPUA; RxB, LookupBCopy, QueueManager, HdrFmt, TxB and Stats on NPUB; annotated "flow control?".]
- NPUA
  - RxA: Same as Version 0
  - TxA: New 10 Gb/s
  - Decap: Same as Version 0
  - Parse: Same as Version 0
    - New code options?
  - LookupA: Results will be different from Version 0
  - AddShim: New
48 NPE Version 2 Block Diagram
[Figure: NPE block diagram with the NPUA processing blocks combined as Decap, Parse, LookupA, AddShim (8 MEs), plus RxA, TxA and Stats on NPUA; RxB, LookupBCopy, QueueManager, HdrFmt, TxB and Stats on NPUB; annotated "flow control?".]
- NPUB
  - RxB: Same as Version 0
  - TxB: New 10 Gb/s
    - with L2 Header coming in on input ring?
  - LookupB: New
  - Copy: New, may be able to use some code from ONL Copy
  - QM: New, decoupled from Links
  - HF: New, may use some code from Version 0
49 NPE Version 2 Block Diagram
[Figure: NPE block diagram with ME counts. Labeled elements:]
- NPUA: RxA (2 ME), Decap/Parse/LookupA/AddShim (8 MEs), TxA (2 ME), StatsA (1 ME), FreeList MgrA (1 ME), Sram2NN (1 ME)
- NPUB: RxB (2 ME), LookupBCopy (2 ME), QueueManager (4 MEs), HdrFmt (4 MEs), TxB (2 ME), StatsB (1 ME), FreeList MgrB (1 ME), Scr2NN (1 ME)
- Other labels: GPE, TCAM, SRAM (several), SPI Switch (2), Switch Blade, "flow control?"
- NPUB has 17 MEs currently spec'd
50 SPP V2 MR Specific Code
- Where does the MR Specific Code reside in V2?
  - Parse
  - HdrFormat
- What about LookupA and LookupB?
  - Lookup is a service provided to the MRs by the Substrate.
  - No MR specific code needed in LookupA or LookupB
- What about SideA AddShim?
  - The Exception bits that go in the shim are MR Specific, but they should be passed to AddShim and it will write them into the Shim.
  - No MR Specific code needed in AddShim.
- What about SideB Copy?
  - Is there anything MR specific about setting up multiple copies of a packet?
  - There shouldn't be. We will have the Copy block allocate a new hdr buffer descriptor and link it to the existing data buffer descriptor and take care of reference counts.
  - The actual building of the new header(s) for the copies will be left to HF.
  - No MR Specific code needed in Copy.
51 SPP V2 Hdr Format
- Lots of changes for HF
  - Move behind QM
  - More general
    - Support multiple source IP Addresses
    - General support for Tunnels
      - Eventually different kinds of tunnels (UDP/IP, GRE, ...)?
  - Support for Multicast
    - Dealing with header buffer descriptors
    - Reading Fanout table
- Substrate portion of HF will need to do Decap type table lookup
  - Slice ID -> (Code Option, Slice Memory Pointer, Slice Memory Size)
- HF gets a buffer descriptor from the QM
- The Substrate portion of HF must determine
  - Code Option (8b)
  - Slice ID (12b)
  - Location of Next Hop information (20b - 32b)
  - LD vs. FWD?
  - Stats Index (16b)
- Should HF do this or QM?
52 SPP V2 Result
- We need to be much more general in our support for Tunnels, Interfaces, MetaInterfaces, and Next Hops.
- SideB Result
  - Interface
    - IP SAddr (32b)
    - Eth MAC DAddr (48b) (LC, GPE1, GPE2, ..., GPEn)
    - SchedulerId (8b): which QM should handle pkt
  - TxMI
    - UDP SPort (16b)
  - TxNextHop
    - IP DAddr (32b)
    - UDP DPort (16b)
53 Data Areas
- Where are the tables and what data is transmitted from SideA to SideB?
  - SideA Tables
  - Shim between SideA and SideB
  - SideB Tables
54 Pkt Processing Data and Tables
- SideA
  - MR/Slice Table
    - Generated by Control
    - Used by
      - Substrate Decap to retrieve an MR/Slice's parameters
    - Indexed by SliceId (VLAN)
    - Contains
      - Code option
      - Slice Memory ptr
      - Slice Memory size
      - ???
  - TCAM
    - Generated by Control
    - Used by
      - LookupA
    - Contains
      - Key
      - Result
55 Data Areas
- Shim between SideA and SideB
  - Written to DRAM Buffer to be sent from SideA to SideB
  - Contains
    - resultIndex (32b)
      - Generated by Control
      - Result of TCAM lookup on SideA
      - Translates into an SRAM Address on SideB
    - exceptionBits (12b)
      - Generated by SideA Parse/Lookup
      - Used by
        - SideB HF
    - statsIndex (16b)
      - Generated by Control
      - Result of TCAM lookup on SideA
      - Used by
        - SideA Lookup/AddShim to increment counters
        - SideB Lookup/Copy to increment PreQ Cntrs (or perhaps SideA is the PreQ cntrs)
        - SideB HF or QM to increment PostQ Cntrs
    - sliceId (12b)
56 Data Areas
- SideB
  - Data Buffer Descriptor
  - Hdr Buffer Descriptor
    - Used for multi-copy packets
    - SPP V2 may require Tx to handle multi-buffer packets.
      - It is unclear if we can cleanly do the same thing that we do with ONL, where HF passes the Ethernet header to Tx.
    - We may also need to have support for MR specific per copy data
  - Results Table
    - Generated by Control
    - Used by
      - LookupB/Copy
      - HF
        - Should HF get its per copy info from here as well?
    - Contains
      - Fanout (if fanout is > 1 we can overload some of the following fields with a pointer into a Fanout table)
      - QID
      - InterfaceId
      - TxMI Id
        - Probably doesn't help to make it an index into a table for UDP Tunnels since UDP Port is 16 bits
57 Data Areas (continued)
- SideB (continued)
  - Fanout Table
    - Generated by Control
    - Used by
      - LookupB/Copy
      - HF
    - Contains
      - QID[Fanout]
      - InterfaceId
      - TxMI Id
      - Tx Next Hop ID[Fanout]
    - Implementation Choices
      - One contiguous block of memory
        - Fixed size or variable sized
      - Chained with one set of values per entry
      - Chained with N (N = 4?) sets of values per entry