SPP V2 Router Design - PowerPoint PPT Presentation

About This Presentation
Title:

SPP V2 Router Design

Description:

Bare minimum we would need to release something to PlanetLab Users. SPP Version 2: ... Change to include lower 4b of RX DA in; shave VLAN bits for the SliceID. ... – PowerPoint PPT presentation

Number of Views:34
Avg rating:3.0/5.0
Slides: 58
Provided by: kareny
Category:
Tags: spp | design | router | shave

less

Transcript and Presenter's Notes

Title: SPP V2 Router Design


1
SPP V2 RouterDesign
John DeHart and Mike Wilson
2
Revision History
  • 3 June 2008
  • Initial release, presentation
  • 25 June 2008
  • Updates on feedback from presentation

3
SPP Versions
  • SPP Version 0
  • What we used for SIGCOMM Paper
  • SPP Version 1
  • Bare minimum we would need to release something
    to PlanetLab Users
  • SPP Version 2
  • What we would REALLY like to release to PlanetLab
    users.

4
Objectives for SPP-NPE version 2
  • Deal with constraints imposed by switch
  • can send to only one NPU can receive from only
    one NPU
  • split processing across NPUs
  • parsing, lookup on one queueing on other
  • Provide more resources for slice-specific
    processing
  • Decouple QM schedulers from links
  • collection of largely independent schedulers
  • may use several to send to the same link
  • e.g. separate rate classes (1-10M, 10-100M,
    100-100M)
  • optionally adjust scheduler rates dynamically
  • Provide support for multicast
  • requires addition of next-hop IP address after
    queueing
  • Enable single slice to operate at 10 Gb/s
  • Support slow code options
  • Use separate rate classes to limit rate to slow
    code options
  • LCI QMs for Parse, NPUB QMs for HdrFmt

5
SPP Version 2 System Architecture
Default Data Path
GPE Blade
GPE Blade
LC Ingress
Decap Parse Lookup AddShim
NPUA
1 10Gb/s OR 10 1Gb/s
SPI Switch
SPI Switch
Switch Blade
RTM
FIC
FIC
Copy QM HdrFormat
NPUB
LC Egress
NPE 7010 Blade
LC 7010 Blade
6
SPP Version 2 System Architecture
Fast-Path Data
GPE Blade
GPE Blade
LC Ingress
Decap Parse Lookup AddShim
NPUA
1 10Gb/s OR 10 1Gb/s
SPI Switch
SPI Switch
Switch Blade
RTM
FIC
FIC
Copy QM HdrFormat
NPUB
LC Egress
NPE 7010 Blade
LC 7010 Blade
7
SPP Version 2 System Architecture
Exception Data PathLocal Delivery
GPE Blade
GPE Blade
LC Ingress
Decap Parse Lookup AddShim
NPUA
1 10Gb/s OR 10 1Gb/s
SPI Switch
SPI Switch
Switch Blade
RTM
FIC
FIC
Copy QM HdrFormat
NPUB
LC Egress
NPE 7010 Blade
LC 7010 Blade
8
NPE Version 2 Block Diagram
NPUA
SRAM
SRAM
GPE
RxA (2 ME)
AddShim (1 ME)
Decap(1 ME)
Parse (8 ME)
LookupA (1 ME)
TxA (2 ME)
StatsA (1 ME)
SRAM
TCAM
SPI Switch
SPI Switch
Switch Blade
StatsB (1 ME)
SRAM/3
SRAM/0
QueueManager (4 MEs)
RxB (2 ME)
TxB (2 ME)
LookupBCopy (2 ME)
HdrFmt/ SubEncap (4 MEs)
SRAM
Scr2NN/Freelist (1 ME)
SRAM
NPUB
9
NPE Version 2 Block Diagram
NPUA
SRAM
SRAM
GPE
RxA (2 ME)
AddShim (1 ME)
Decap(1 ME)
Parse (8 ME)
LookupA (1 ME)
TxA (2 ME)
StatsA (1 ME)
SRAM
TCAM
SPI Switch
SPI Switch
Switch Blade
StatsB (1 ME)
SRAM
SRAM
QueueManager (4 MEs)
RxB (2 ME)
TxB (2 ME)
LookupBCopy (2 ME)
HdrFmt/ SubEncap (4 MEs)
SRAM
Scr2NN/Freelist (1 ME)
SRAM
NPUB
10
PlanetLab NPE Input Frame from LC
  • Ethernet Header
  • DstAddr MAC address of NPE
  • SrcAddr MAC address of LC
  • VLAN One VLAN per MR (MR Slice)
  • Only use lower 12 bits of Vlan Tag
  • IP Header
  • Dst Addr IP address of this node
  • How many IP Addresses can a NODE have?
  • Src Addr IP address of previous hop
  • Protocol UDP
  • UDP Header
  • Dst Port Identifies input tunnel
  • Src Port with IP Src Addr identifies sending
    entity

DstAddr (6B)
Ethernet Header
SrcAddr (6B)
Type802.1Q (2B)
VLAN (2B)
TypeIP (2B)
Ver/HLen/Tos/Len (4B)
ID/Flags/FragOff (4B)
TTL (1B)
Protocol UDP (1B)
Hdr Cksum (2B)
Src Addr (4B)
IP Header
Dst Addr (4B)
IP Options (0-40B)
Src Port (2B)
UDP Header
Dst Port (2B)
UDP length (2B)
UDP checksum (2B)
UDP Payload (MN Packet)
PAD (nB)
Ethernet Trailer
CRC (4B)
Indicates 8-Byte Boundaries Assuming no IP Options
11
NPE Version 2 Block Diagram
NPUA
SRAM
SRAM
GPE
RxA (2 ME)
AddShim (1 ME)
Decap(1 ME)
Parse (8 ME)
LookupA (1 ME)
TxA (2 ME)
StatsA (1 ME)
SRAM
TCAM
SPI Switch
SPI Switch
Switch Blade
StatsB (1 ME)
SRAM
Port (4b)
Reserved (12b)
Eth. Frame Len (16b)
SRAM
QueueManager (4 MEs)
RxB (2 ME)
TxB (2 ME)
LookupBCopy (2 ME)
HdrFmt/ SubEncap (4 MEs)
SRAM
Scr2NN/Freelist (1 ME)
SRAM
NPUB
12
RxA
  • No change from V1

13
NPE Version 2 Block Diagram
NPUA
SRAM
SRAM
GPE
RxA (2 ME)
AddShim (1 ME)
Decap(1 ME)
Parse (8 ME)
LookupA (1 ME)
TxA (2 ME)
StatsA (1 ME)
SRAM
TCAM
SPI Switch
SPI Switch
Switch Blade
StatsB (1 ME)
SRAM
MN Frm Offset (16b)
MN Frm Length(16b)
Port (4b)
Reserved (12b)
Eth. Frame Len (16b)
SRAM
Rx UDP DPort (16b)
Slice ID (VLAN) (16b)
QueueManager (4 MEs)
RxB (2 ME)
TxB (2 ME)
LookupBCopy (2 ME)
HdrFmt/ SubEncap (4 MEs)
Rx IP SAddr (32b)
Reserved (12b)
Rx UDP SPort (16b)
Code (4b)
Slice Data Ptr (32b)
SRAM
Scr2NN/Freelist (1 ME)
SRAM
NPUB
14
Decap
  • Inputs
  • Packet from RxA
  • Outputs
  • Meta-frame (handle, offset and length)
  • Slice ID (VLAN tag)
  • or is this lower 12b of VLAN tag and lower 4b of
    RX DA in?
  • Metainterface (Rx Saddr, Rx Sport, Rx Dport)
  • Code Option (4b, only 16 available)
  • Slice data pointer
  • Initialization
  • VLAN table
  • Functionality
  • Read VLAN tag from DRAM, determine correct code
    option.
  • Validate packet. Drop invalid, unmatched
    packets.
  • IP Options for NPE dropped in LC, should never
    arrive here!
  • Enqueue valid packets to SRAM ring.
  • Update stats
  • Status
  • Change dl_sink from NN to SRAM.

15
VLAN table
VLAN code_opt slice_data_ptr slice_data_size
0 0 0 0
1 0 0 0

0x0aa 1

0x7ff 0 0 0

SD data
P data
  • code_option 0 implies invalid slice
  • on switch for a slice in the data plane
  • SD data is currently only counters
  • 64B slice data
  • Only use lower 12b of VLAN tag (4096 VLANs)
  • Only changes from V1
  • No longer need all data on NPUA, drop HF data,
    per-slice buffer limits

16
NPE Version 2 Block Diagram
NPUA
SRAM
SRAM
GPE
RxA (2 ME)
AddShim (1 ME)
Decap(1 ME)
Parse (8 ME)
LookupA (1 ME)
TxA (2 ME)
StatsA (1 ME)
SRAM
TCAM
SPI Switch
SPI Switch
Switch Blade
StatsB (1 ME)
SRAM
MN Frm Length (16b)
MN Frm Offset (16b)
MN Frm Offset (16b)
MN Frm Length(16b)
SRAM
Lookup Key143-112 Type(1b)/Slice ID(15b)/Rx
UDP DPort (16b)
Rx UDP DPort (16b)
Slice ID (VLAN) (16b)
Lookup Key111-80 DA (32b)
QueueManager (4 MEs)
RxB (2 ME)
TxB (2 ME)
LookupBCopy (2 ME)
HdrFmt/ SubEncap (4 MEs)
Rx IP SAddr (32b)
Lookup Key 79-48 SA (32b)
Reserved (12b)
Rx UDP SPort (16b)
Code (4b)
Lookup Key 47-16 Ports (32b)
Exception Bits (12b)
Code (4b)
Lookup Key Proto/TCP_Flags 15- 0 (16b)
Slice Data Ptr (32b)
SRAM
Scr2NN/Freelist (1 ME)
SRAM
NPUB
17
Parse
  • Inputs
  • Meta-frame (handle, offset and length)
  • Slice ID (VLAN tag)
  • Tunnel ID (Rx Saddr, Rx Sport, Rx Dport)
  • Code Option (4b, only 16 available)
  • Slice data pointer
  • Outputs
  • Meta-frame (handle, offset and length)
  • Lookup key (Includes slice ID, Rx UDP dport)
  • Change to include lower 4b of RX DA in shave
    VLAN bits for the SliceID.
  • Code Option (4b, only 16 available)
  • Exception bits (MN-specific)
  • Initialization
  • Slice Data
  • Functionality
  • Slice-specific processing
  • Parse meta-frame.
  • Extract lookup key.
  • Raise any relevant exceptions.

18
NPE Version 2 Block Diagram
NPUA
SRAM
SRAM
GPE
RxA (2 ME)
AddShim (1 ME)
Decap(1 ME)
Parse (8 ME)
LookupA (1 ME)
TxA (2 ME)
StatsA (1 ME)
SRAM
TCAM
SPI Switch
SPI Switch
Switch Blade
StatsB (1 ME)
SRAM
MN Frm Length (16b)
MN Frm Offset (16b)
SRAM
Slice ID (VLAN) (16b)
Lookup Key143-112 Type(1b)/Slice ID(15b)/Rx
UDP DPort (16b)
Exception Bits (12b)
Code (4b)
Result Index (32b)
Lookup Key111-80 DA (32b)
QueueManager (4 MEs)
RxB (2 ME)
TxB (2 ME)
LookupBCopy (2 ME)
HdrFmt/ SubEncap (4 MEs)
Lookup Key 79-48 SA (32b)
Lookup Key 47-16 Ports (32b)
Lookup Key Proto/TCP_Flags 15- 0 (16b)
Exception Bits (12b)
Code (4b)
SRAM
Scr2NN/Freelist (1 ME)
SRAM
NPUB
19
LookupA
  • Inputs
  • Meta-frame (handle, offset and length)
  • Lookup key (Includes slice ID, Rx UDP dport)
  • Slice ID (VLAN tag)
  • Code Option (4b, only 16 available)
  • Outputs
  • Meta-frame (handle, offset and length)
  • Lookup Result (Index into SRAM table on NPUB)
  • 32b is overkill some of these bits are reserved.
  • Slice ID (VLAN tag)
  • Code Option (4b, only 16 available)
  • Exception bits (from Parse)
  • Stats Index (from TCAM)
  • Initialization
  • Filters set in TCAM by control
  • Functionality
  • Look up key in TCAM
  • On miss, drop the packet
  • Status

20
NPE Version 2 Block Diagram
NPUA
SRAM
SRAM
GPE
RxA (2 ME)
AddShim (1 ME)
Decap(1 ME)
Parse (8 ME)
LookupA (1 ME)
TxA (2 ME)
StatsA (1 ME)
SRAM
TCAM
SPI Switch
SPI Switch
Switch Blade
StatsB (1 ME)
SRAM
SRAM
Slice ID (VLAN) (16b)
Exception Bits (12b)
Code (4b)
Result Index (32b)
QueueManager (4 MEs)
RxB (2 ME)
TxB (2 ME)
LookupBCopy (2 ME)
HdrFmt/ SubEncap (4 MEs)
SRAM
Scr2NN/Freelist (1 ME)
SRAM
NPUB
21
AddShim
  • Inputs
  • Meta-frame (handle, offset and length)
  • Lookup Result (Index into SRAM table on NPUB)
  • Slice ID (VLAN tag)
  • Code Option (4b, only 16 available)
  • Exception bits (from Parse)
  • Stats Index (from TCAM)
  • Outputs
  • Shim Packet (buffer handle)
  • Buffer descriptor contains updated offset and
    length, if needed
  • Initialization
  • None.
  • Functionality
  • Prepend shim header to preserve packet
    annotations across NPUs
  • Overwrite the existing ethernet header (Up to
    18B) with
  • Slice ID (16b)
  • Code Option (4b)
  • Exception Bits (12b)
  • MN Frame Offset (16b)

22
NPE Version 2 Block Diagram
NPUA
SRAM
SRAM
GPE
RxA (2 ME)
AddShim (1 ME)
Decap(1 ME)
Parse (8 ME)
LookupA (1 ME)
TxA (2 ME)
StatsA (1 ME)
SRAM
TCAM
SPI Switch
SPI Switch
Switch Blade
StatsB (1 ME)
SRAM
SRAM
QueueManager (4 MEs)
RxB (2 ME)
TxB (2 ME)
LookupBCopy (2 ME)
HdrFmt/ SubEncap (4 MEs)
SRAM
Scr2NN/Freelist (1 ME)
SRAM
NPUB
23
TxA
  • Sends shim packet to NPUB.
  • Unmodified 10 Gbps Tx 2ME.

24
SPP Version2 NPUA to NPUB Frame
  • SHIM (16B)
  • Slice ID (16b)
  • Code Option (4b)
  • Exception Bits (12b)
  • Result Index (32b)
  • Stats Index (16b)
  • Offset of MN Packet (16b)
  • Memory Alignment Padding (4B)
  • IP Header, UDP Header may be overwritten by
  • opaque slice data, written in Parse

SHIM (16B)
TypeIP (2B)
Ver/HLen/Tos/Len (4B)
ID/Flags/FragOff (4B)
TTL (1B)
Protocol UDP (1B)
Hdr Cksum (2B)
Src Addr (4B)
IP Header
Dst Addr (4B)
IP Options (0-40B)
Src Port (2B)
UDP Header
Dst Port (2B)
UDP length (2B)
UDP checksum (2B)
UDP Payload (MN Packet)
PAD (nB)
Ethernet Trailer
CRC (4B)
Indicates 8-Byte Boundaries Assuming no IP Options
25
NPE Version 2 Block Diagram
NPUA
SRAM
SRAM
GPE
RxA (2 ME)
AddShim (1 ME)
Decap(1 ME)
Parse (8 ME)
LookupA (1 ME)
TxA (2 ME)
StatsA (1 ME)
SRAM
TCAM
SPI Switch
SPI Switch
Switch Blade
StatsB (1 ME)
SRAM
SRAM
QueueManager (4 MEs)
RxB (2 ME)
TxB (2 ME)
LookupBCopy (2 ME)
HdrFmt/ SubEncap (4 MEs)
SRAM
Scr2NN/Freelist (1 ME)
SRAM
NPUB
26
RxB
  • No change from V1

27
NPE Version 2 Block Diagram
NPUA
SRAM
SRAM
GPE
RxA (2 ME)
AddShim (1 ME)
Decap(1 ME)
Parse (8 ME)
LookupA (1 ME)
TxA (2 ME)
Reserved (12b)
StatsA (1 ME)
SRAM
TCAM
Frame Length (16b)
Stats Index (16b)
SPI Switch
SPI Switch
Switch Blade
StatsB (1 ME)
SRAM
SRAM
QueueManager (4 MEs)
RxB (2 ME)
TxB (2 ME)
LookupBCopy (2 ME)
HdrFmt/ SubEncap (4 MEs)
SRAM
Scr2NN/Freelist (1 ME)
SRAM
NPUB
28
LookupB/Copy
  • Inputs
  • Shim packet (buffer handle, frame length)
  • Outputs
  • Packet (buffer handle, frame length)
  • QueueID (QM, Scheduler, Queue ID)
  • Stats Index
  • Initialization
  • ResultTable
  • Functionality (Overview)
  • Copy shim header into buffer descriptor
  • Look up routing information from result index
  • If multicast, make the copies
  • Enqueue to correct QM (from ResultTable)

29
LookupB/Copy Code Sketch
  • if not currently processing mcast packet
  • read packet from SRAM ring
  • extract shim
  • load ResultTable value
  • fill buffer descriptor
  • if unicast
  • if per-slice packet limit permits
  • update per-slice packet count
  • write to SRAM ring for correct QM. (By qmschedID
    in result table value).
  • else drop buffer
  • else
  • start mcast processing
  • if per-slice packet limit permits
  • update per-slice packet count
  • fetch first header buffer descriptor
  • if payload length ? 0
  • write ref count into payload descriptor
  • else drop payload buffer
  • else

30
ResultTable Unicast
  • Data needed to enqueue, rewrite packet
  • QID
  • QMID, SchedID, QID (20b) (Lookup Result)
  • Src MI
  • IP Saddr (32b) (Per SchedID Table)
  • UDP Sport (16b) (Lookup Result)
  • Tunnel Next Hop
  • IP DAddr (32b) (Lookup Result)
  • IP DPort (16b)(Lookup Result)
  • Chassis Addressing
  • Ethernet Dst MAC (48b) (Per SchedID Table)
  • Slice Specific Lookup Result Data (?) (Lookup
    Result)
  • Ethernet Src MAC
  • Should be constant across all pkts.

Results Entry
QID (20b)
IP DAddr (32b)
UDP DPort (16b)
UDP SPort (16b)
HFIndex (16b)
31
ResultTable Multicast
  • Fanout gives the number of copies (0..15)
  • Data needed per copy on NPUB
  • QID
  • QMID, SchedID, QID (20b) (Lookup Result)
  • Src MI
  • IP Saddr (32b) (Per SchedID Table)
  • UDP Sport (16b) (Lookup Result)
  • Tunnel Next Hop
  • IP DAddr (32b) (Lookup Result)
  • IP DPort (16b)(Lookup Result)
  • Chassis Addressing
  • Ethernet Dst MAC (48b) (Per SchedID Table)
  • Slice Specific Lookup Result Data (?) (Lookup
    Result)
  • Ethernet Src MAC
  • Should be constant across all pkts.
  • Support Multicast but optimize for Unicast

Results Entry
Fanout (4b)
QID (20b)
IP DAddr (32b)
16
UDP DPort (16b)
UDP SPort (16b)
HFIndex (16b)
32
NPE Version 2 Block Diagram
NPUA
SRAM
SRAM
GPE
RxA (2 ME)
AddShim (1 ME)
Decap(1 ME)
Parse (8 ME)
LookupA (1 ME)
TxA (2 ME)
StatsA (1 ME)
SRAM
TCAM
SPI Switch
SPI Switch
Switch Blade
StatsB (1 ME)
SRAM
SRAM
QueueManager (4 MEs)
RxB (2 ME)
TxB (2 ME)
LookupBCopy (2 ME)
HdrFmt/ SubEncap (4 MEs)
SRAM
Scr2NN/Freelist (1 ME)
SRAM
NPUB
33
QM
  • No change from V1
  • Incorporates recent change to limit queues by
    pkts
  • Some changes in how control allocates bandwidth
  • Need to ensure that slow HdrFmt blocks cant tie
    up the system
  • Currently looking at worst-case engineering
  • (everyone runs at slowest block speed)

34
NPE Version 2 Block Diagram
NPUA
SRAM
SRAM
GPE
RxA (2 ME)
AddShim (1 ME)
Decap(1 ME)
Parse (8 ME)
LookupA (1 ME)
TxA (2 ME)
StatsA (1 ME)
SRAM
TCAM
SPI Switch
SPI Switch
Switch Blade
StatsB (1 ME)
SRAM
SRAM
QueueManager (4 MEs)
RxB (2 ME)
TxB (2 ME)
LookupBCopy (2 ME)
HdrFmt/ SubEncap (4 MEs)
SRAM
Scr2NN/Freelist (1 ME)
SRAM
NPUB
35
HdrFmt / SubEncap
  • Inputs
  • Buffer Handle
  • Remaining inputs from Buffer Descriptor
  • Multicast or Unicast (from buffer_next)
  • Frame length, offset
  • HFIndex (index into HFTable, a slice-specific
    table)
  • QMSchedID (for per-sched lookup in ResultTable)
  • Outputs
  • Packet (buffer handle)
  • Buffer descriptor contains updated offset and
    length
  • Initialization
  • HFTable, containing slice-specific data. For
    IPv4, this contains next-hop information (for
    both multicast and unicast traffic).
  • Functionality
  • Substrate level
  • read buffer descriptor and pass frame offset,
    length, HFIndex, mcast/ucast to slice-specific
    HdrFmt
  • Slice level arbitrary processing.
  • For IPv4, this writes the next-hop information.
  • Returns new offset, length of frame.
  • Substrate level

36
NPE Version 2 Block Diagram
NPUA
SRAM
SRAM
GPE
RxA (2 ME)
AddShim (1 ME)
Decap(1 ME)
Parse (8 ME)
LookupA (1 ME)
TxA (2 ME)
StatsA (1 ME)
SRAM
TCAM
SPI Switch
SPI Switch
Switch Blade
StatsB (1 ME)
SRAM
SRAM
QueueManager (4 MEs)
RxB (2 ME)
TxB (2 ME)
LookupBCopy (2 ME)
HdrFmt/ SubEncap (4 MEs)
SRAM
Scr2NN/Freelist (1 ME)
SRAM
NPUB
37
Scr2NN/FreelistMgr
  • Inputs
  • Buffer Handle (possibly chained)
  • Outputs
  • Buffer Handle (possibly chained)
  • Initialization
  • None
  • Functionality
  • Combines Freelist Manager with Scr2NN glue
  • FM Read from scratch ring. Free buffers,
    correctly handling chained buffers and reference
    counts.
  • Scr2NN Read from Scratch, write to NN.
  • Status
  • Both blocks exist, but combining them is not
    straight-forward.
  • Open question how should we prioritize among
    these tasks? The author should ensure that no
    deadlock is possible. (TxB writes to FM if FM
    ring is full, TxB stalls. If Scr2NN is writing
    to TxB, it stalls. Gridlock.)

38
NPE Version 2 Block Diagram
NPUA
SRAM
SRAM
GPE
RxA (2 ME)
AddShim (1 ME)
Decap(1 ME)
Parse (8 ME)
LookupA (1 ME)
TxA (2 ME)
StatsA (1 ME)
SRAM
TCAM
SPI Switch
SPI Switch
Switch Blade
StatsB (1 ME)
SRAM
SRAM
QueueManager (4 MEs)
RxB (2 ME)
TxB (2 ME)
LookupBCopy (2 ME)
HdrFmt/ SubEncap (4 MEs)
SRAM
Scr2NN/Freelist (1 ME)
SRAM
NPUB
39
TxB
  • Must support chained buffers
  • Multicast uses header buffers and payload buffers
  • Headers are slice-specific we cant rely on
    known, static lengths as we did in ONL.
  • Sends header from one buffer, payload from
    chained buffer.
  • Can TX do this? Comments in the code seem to
    imply that chained (non-SOP) buffers must start
    at offset 0. Our payloads usually wont.
  • According to DZar, this will probably take some
    TX modification, but theres no reason why it
    wont work. Might have a performance penalty, of
    course.

40
SPP V2 SideB SRAM Buffer Descriptor
Buffer_Next (32b)
LW0
Buffer_Size (16b)
LW1
Reserved (4b)
Free_list 0000 (4b)
Ref_Cnt (8b)
Packet_Size (16b)
LW2
Slice ID(xsid)(12b)
Stats Index (16b)
Reserved (4b)
LW3
MR Exception Bits (16b)
HFIndex (16b)
LW4
ResultIndex (32b)
LW5
MR Bits (optional) (32b)
LW6
Packet_Next (32b)
LW7
  • HFIndex is an index into the HFTable. For IPv4,
    this provides Next Hop information.
  • ResultIndex is used to get tunnel header info
    from the ResultTable

41
Design Questions
  • Small hole for abuse in HdrFmt
  • QM rate limits on payload length
  • HdrFmt (after QM) can vastly increase packet
    length
  • Should the LookupB table give the padding size
    for each entry? Enforced in SubEncap?
  • ANSWER No, we will resort to our control of
    HdrFmt to force it to behave. (We write all of
    the code options right now.)
  • What are the best places to update stats on NPUB?
  • ANSWER Post-Q only
  • Is there any remaining reason that NPUB would
    need the source tunnel information?
  • ANSWER No. If a code option needs it, put it
    into opaque slice data.
  • Still working out remaining data areas.

42
Extra Slides
  • The rest of the slides are old or for extra
    information

43
Questions/Issues
  • 4/28/08
  • How many code options?
  • Limit of 16?
  • To handle slow Code Options
  • LCI Queues would control traffic to Fast/Slow
    Parse Code
  • Classes of code options defined by how long their
    Parse code takes.
  • Scheduler assigned to a class of code option.
  • NPE Queues would control traffic to Fast/Slow HF
    Code
  • LCE Queues control the output rate to Interfaces.
  • Multicast Problems
  • Impact of multicast traffic overloading
    Lookup/Copy and becoming a bottleneck.
  • Rx on SideB, can it use SRAM output ring?
  • All our other 10G Rxs have NN output ring.
  • Option for HF to send out additional pkts?
  • How to pass MR and substrate hdrs to TxB?
  • Through Ring or through Hdr Buffer associated
    with Hdr Buffer descriptor.
  • If the latter then what are the constraints in Tx
    for buffer chaining?

44
Meeting Notes
  • 1/15/08
  • QM Add Pkt count to Queue Params, change limit
    from QLen to PktCount
  • Add Per Slice Pkt limit to NPUA and NPUB
  • Limit Fanout to 16
  • MCast Control will allocate all 16 entries for a
    multicast result entry, result entry will be
    typed as multicast or unicast and will not
    transition from one to the other.
  • What happens to pkts in queues when there is a
    route change that sends that flows pkts to a
    different interface and queue? Pkt ordering
    problems?

45
NPE Version 2 Block Diagram
slice, resultIndx, etc, passed in shim
Lookup produces resultIndx, statsIndx
SRAM
SRAM
NPUA
GPE
Decap, Parse, LookupA, AddShim (8 MEs)
RxA (2 ME)
TxA (2 ME)
Stats (1 ME)
SRAM
TCAM
SPI Switch
SPI Switch
Switch Blade
Stats (1 ME)
flow control?
SRAM
QueueManager (4 MEs)
RxB (2 ME)
TxB (2 ME)
LookupBCopy (2 ME)
HdrFmt (4 MEs)
for unicast, resultIndx replaced by QiD allowing
output side to skip lookup
SRAM
NPUB
SRAM
Lookup on ltslice, resultIndxgtyields fanout,
list of QiDscopy to queues, adding
copy(slice, resultIndx remain in packet
buffer)
use slice to select slice to format packet use
resultIndx to get next-hop
46
Questions/Issues
  • Where are exit and entry points for packets sent
    to and from the GPE for exception processing?
  • Parse (NPUA) and LookupA (NPUA) are where most
    exceptions are generated
  • IP Options
  • No Route
  • Etc.
  • HdrFormat (NPUB) is where we do ethernet header
    processing
  • What needs to be in the SHIM going from NPUA to
    NPUB?
  • ResultIndex (32b)
  • Exception Bits (12b)
  • StatsIndex (16b)
  • Slice (12b)
  • ???
  • Will we support multi-copy in a way similar to
    the ONL Router?
  • How big can the fanout be?
  • How many QIDs need to be stored with the LookupB
    Result?
  • Is there some encoding for the QIDs that can take
    into account support for multicast and the copy?
    For example
  • Multicast QID(20b)
  • Multicast (1b) 1
  • Copy (4b)

47
NPE Version 2 Block Diagram
SRAM
SRAM
NPUA
GPE
Decap, Parse, LookupA, AddShim (8 MEs)
RxA (2 ME)
TxA (2 ME)
Stats (1 ME)
SRAM
TCAM
SPI Switch
SPI Switch
Switch Blade
  • NPUA
  • RxASame as Version 0
  • TxA New 10Gb/s
  • Decap Same as Version 0
  • Parse Same as Version 0
  • New code options?
  • LookupA Results will be different from Version 0
  • AddSim New

Stats (1 ME)
flow control?
SRAM
QueueManager (4 MEs)
RxB (2 ME)
TxB (2 ME)
LookupBCopy (2 ME)
HdrFmt (4 MEs)
SRAM
NPUB
SRAM
48
NPE Version 2 Block Diagram
  • NPUB
  • RxBSame as Version 0
  • TxB New 10Gb/s
  • with L2 Header coming in on input ring?
  • LookupB New
  • Copy New, may be able to use some code from ONL
    Copy
  • QM New, decoupled from Links
  • HF New, may use some code from Version 0

SRAM
SRAM
NPUA
GPE
Decap, Parse, LookupA, AddShim (8 MEs)
RxA (2 ME)
TxA (2 ME)
Stats (1 ME)
SRAM
TCAM
SPI Switch
SPI Switch
Switch Blade
Stats (1 ME)
flow control?
SRAM
QueueManager (4 MEs)
RxB (2 ME)
TxB (2 ME)
LookupBCopy (2 ME)
HdrFmt (4 MEs)
SRAM
NPUB
SRAM
49
NPE Version 2 Block Diagram
NPUA
Sram2NN (1 ME)
SRAM
SRAM
GPE
Decap, Parse, LookupA, AddShim (8 MEs)
RxA (2 ME)
TxA (2 ME)
StatsA (1 ME)
FreeList MgrA (1 ME)
SRAM
TCAM
SPI Switch
SPI Switch
Switch Blade
flow control?
FreeList MgrB (1 ME)
StatsB (1 ME)
SRAM
SRAM
QueueManager (4 MEs)
RxB (2 ME)
TxB (2 ME)
LookupBCopy (2 ME)
HdrFmt (4 MEs)
NPUB has 17 MEs currently speced
SRAM
Scr2NN (1 ME)
SRAM
NPUB
50
SPP V2 MR Specific Code
  • Where does the MR Specific Code reside in V2
  • Parse
  • HdrFormat
  • What about LookupA and LookupB?
  • Lookup is a service provided to the MRs by the
    Substrate.
  • No MR specific code needed in LookupA or LookupB
  • What about SideA AddShim?
  • The Exception bits that go in the shim are MR
    Specific but they should be passed to AddShim and
    it will write them into the Shim.
  • No MR Specific code needed in AddShim.
  • What about SideB Copy?
  • Is there anything MR specific about setting up
    multiple copies of a packet?
  • There shouldnt be. We will have the Copy block
    allocate a new hdr buffer descriptor and link it
    to the existing data buffer descriptor and take
    care of reference counts.
  • The actual building of the new header(s) for the
    copies will be left to HF.
  • No MR Specific code needed in Copy.

51
SPP V2 Hdr Format
  • Lots of changes for HF
  • Move behind QM
  • More general
  • Support multiple source IP Addresses
  • General support for Tunnels
  • Eventually different kinds of tunnels (UDP/IP,
    GRE, )?
  • Support for Multicast
  • Dealing with header buffer descriptors
  • Reading Fanout table
  • Substrate portion of HF will need to do Decap
    type table lookup
  • Slice ID ? (Code Option, Slice Memory Pointer,
    Slice Memory Size)
  • HF gets a buffer descriptor from the QM
  • The Substrate portion of HF must determine
  • Code Option (8b)
  • Slice ID (12b)
  • Location of Next Hop information (20b - 32b)
  • LD vs. FWD?
  • Stats Index (16b)
  • Should HF do this of QM?

52
SPP V2 Result
  • We need to be much more general in our support
    for Tunnels, Interfaces, MetaInterfaces, and Next
    Hops.
  • SideB Result
  • Interface
  • IP SAddr (32b)
  • Eth MAC DAddr (48b) (LC, GPE1, GPE2, , GPEn)
  • SchedulerId (8b) which QM should handle pkt
  • TxMI
  • IP Sport (16b)
  • TxNextHop
  • IP DAddr (32b)
  • IP DPort (16b)

53
Data Areas
  • Where are the tables and what data is transmitted
    from SideA to SideB?
  • SideA Tables
  • Shim between SideA and SideB
  • SideB Tables

54
Pkt Processing Data and Tables
  • SideA
  • MR/Slice Table
  • Generated by Control
  • Used by
  • Substrate Decap to retrieve a MR/Slices
    parameters
  • Indexed by SliceId VLAN
  • Contains
  • Code option
  • Slice Memory ptr
  • Slice Memory size
  • ???
  • TCAM
  • Generated by Control
  • Used by
  • LookupA
  • Contains
  • Key
  • Result

55
Data Areas
  • Shim between SideA and SideB
  • Written to DRAM Buffer to be sent from SideA to
    SideB
  • Contains
  • resultIndex (32b)
  • Generated by Control
  • Result of TCAM lookup on SideA
  • Translates into an SRAM Address on SideB
  • exceptionBits (12b)
  • Generated by SideA Parse/Lookup
  • Used by
  • SideB HF
  • statsIndex (16b)
  • Generated by Control
  • Result of TCAM lookup on SideA
  • Used by
  • SideA Lookup/AddShim to increment counters
  • SideB Lookup/Copy to increment PreQ Cntrs (or
    perhaps SideA is the PreQ cntrs)
  • SideB HF or QM to increment PostQ Cntrs
  • sliceId (12b)

56
Data Areas
  • SideB
  • Data Buffer Descriptor
  • Hdr Buffer Descriptor
  • Used for multi-copy packets
  • SPP V2 may require Tx to handle multi-buffer
    packets.
  • It is unclear if we can cleanly do that same
    thing that we do with ONL where HF passes the
    Ethernet header to Tx.
  • We may also need to have support for MR specific
    per copy data
  • Results Table
  • Generated by Control
  • Used by
  • LookupB/Copy
  • HF
  • Should HF get its per copy info from here as
    well.
  • Contains
  • Fanout (if fanout is gt 1 we can overload some of
    the following fields with a pointer into a Fanout
    table)
  • QID
  • InterfaceId
  • TxMI Id
  • Probably doesnt help to make it an index into a
    table for UDP Tunnels since UDP Port is 16 bits

57
Data Areas (continued)
  • SideB (continued)
  • Fanout Table
  • Generated by Control
  • Used by
  • LookupB/Copy
  • HF
  • Contains
  • QIDFanout
  • InterfaceId
  • TxMI Id
  • Tx Next Hop IDFanout
  • Implementation Choices
  • One contiguous block of memory
  • Fixed size or variable sized
  • Chained with one set of values per entry
  • Chained with N (N4?) sets of values per entry
Write a Comment
User Comments (0)
About PowerShow.com