CSE 58x: Networking Practicum

1
CSE 58x Networking Practicum

Instructor: Wu-chang Feng
TA: Francis Chang

2
About the course

Prerequisite: CSE 524 or the equivalent
Implementation-focused course
Intel's IXA network processor platform
Contents
Brief lecture material on network processors and the IXP
5 weeks of designed laboratories
3 weeks of final projects

3
Modern router architectures

Split into a fast path and a slow path
Control plane
High-complexity functions
Route table management
Network control and configuration
Exception handling
Data plane
Low complexity functions
Fast-path forwarding

4
Router functions

RFC 1812 plus...
Error detection and correction
Traffic measurement and policing
Frame and protocol demultiplexing
Address lookup and packet forwarding
Segmentation, fragmentation, reassembly
Packet classification
Traffic shaping
Timing and scheduling
Queuing
Security

5
Design choices for network products

General purpose processors
Embedded RISC processors
Network processors
Field-programmable gate arrays (FPGAs)
Application-specific integrated circuits (ASICs)

6
General purpose processors (GPP)

Programmable
Mature development environment
Typically used to implement control plane
Too slow to run data plane effectively
Sequential execution
CPU/network speeds: 50x increase over the last decade
Memory latencies: 2x decrease over the last decade
Gigabit Ethernet: 333-nanosecond per-packet budget
Cache miss: 150-200 nanoseconds
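
The budget argument on this slide can be checked with a little arithmetic. The following is a minimal sketch, not part of the original slides: the function names are made up for illustration, and the 64-byte packet size is an assumption (the slide does not say which packet size its 333 ns figure is based on).

```python
def packet_time_ns(packet_bytes: int, link_bps: float) -> float:
    """Time (ns) that one packet of packet_bytes occupies a link of link_bps."""
    return packet_bytes * 8 * 1e9 / link_bps


def misses_that_fit(budget_ns: float, miss_ns: float) -> int:
    """Whole cache misses that fit inside one per-packet time budget."""
    return int(budget_ns // miss_ns)


GIGABIT_BPS = 1e9

# Assumption: a 64-byte minimum Ethernet frame, ignoring preamble and
# inter-frame gap. The slide's own packet size is not stated.
print(packet_time_ns(64, GIGABIT_BPS))

# Using the slide's 333 ns budget and its 150-200 ns cache-miss range.
print(misses_that_fit(333, 150), misses_that_fit(333, 200))
```

Whatever packet size is assumed, the point of the slide holds: one or two cache misses use up the entire per-packet budget, which is why a sequential GPP struggles on the data plane.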

7
Embedded RISC processors (ERP)

Same as GPP, but
Slower
Cheaper
Smaller (require less board space)
Designed specifically for network applications
Typically used for control plane functions

8
Application-specific integrated circuits (ASIC)

Custom hardware
Long time to market
Expensive
Difficult to develop and simulate
Not programmable
Not reusable
But, the fastest of the bunch
Suitable for data plane

9
Field Programmable Gate Arrays (FPGA)

Flexible re-programmable hardware
Less dense and slower than ASICs
Cheaper than ASICs
Good for providing fast custom functionality
Suitable for data plane

10
Network processors

The speed of ASICs/FPGAs
The programmability and cost of GPPs/ERPs
Flexible
Re-usable components
Lower cost
Suitable for data plane

11
Network processors

Common features
Small, fast, on-chip instruction stores (no caching)
Custom network-specific instruction set programmed at assembler level
What instructions are needed for NPs? Open question.
Minimality, Generality
Multiple processing elements
Multiple thread contexts per element
Multiple memory interfaces to mask latency
Fast on-chip memory (headers) and slow off-chip memory (payloads)
No OS, hardware-based scheduling and thread switching

12
Why network processors?

The propaganda
Take the current vertical network device market
Commoditize horizontal slices of it
PC market
Initially, an IBM custom vertical
Now, a commodity market with Intel providing the
chip-set
Network device market
Draw your own conclusions

13
Network processing approaches

[Figure: ASIC, FPGA, network processor, GPP, and embedded RISC processor compared on two axes, speed and programming/development ease]

14
Network processor architectures

Packet path
Store and forward
Packet payload completely stored in and forwarded from off-chip memory
Allows for large packet buffers
Re-ordering problems with multiple processing elements
Intel IXP, Motorola C5
Cut-through
Packet held in an on-chip FIFO and forwarded through directly
Small packet buffers
Built-in packet ordering
AMCC

15
Network processor architectures

Processing architecture
Parallel
Each element independently performs entire processing function
Packet re-ordering problems
Larger instruction store needed per element
Pipelined
Each element performs one part of larger processing function
Communicates result to next processing element in pipeline
Smaller code space
Packet ordering retained
Deterministic behavior (no memory thrashing)
Hybrid

16
Network processor architectures

Processing hierarchy
ASICs
Embedded RISC processors
Specialized co-processors
See figure 13.7 in book

17
Network processor architectures

Memory hierarchy
Small on-chip memory
Control/Instruction store
Registers
Cache
RAM
Large off-chip memory
Cache
Static RAM
Dynamic RAM

18
Network processor architectures

Internal interconnect
Bus
Cross-bar
FIFO
Transfer registers

19
Network processor architectures

Concurrency
Hardware support for multiple thread contexts
Operating system support for multiple thread contexts
Pre-emptiveness
Migration support

20
Increasing network processor performance

Processing hierarchy
Increase clock speed
Increase elements
Memory hierarchy
Increase size
Decrease latency
Pipelining
Add hierarchies
Add memory bandwidth (parallel stores)
Add functional memory (CAMs)

21
Focus of this class...

Network processors
Intel IXA

22
IXP 1200 features

One embedded RISC processor (StrongARM)
Runs control plane (Linux)
6 programmable packet processors (m-engines)
Runs data plane (m-engine assembler or m-engine C)
Central hash unit
Multiple bus interconnects
IXBus (4.4Gbps) to overcome PCI's 2.2Gbps limit
Small on-board memory
Serial interface for control
External interfaces for memory

23
(No Transcript)
24
IXP12xx m-engine
25
IXP2xxx m-engine
26
m-engine functions

Packet ingress from physical layer interface
Checksum verification
Header processing and classification
Packet buffering in memory
Table lookup and forwarding
Header modification
Checksum computation
Packet egress to physical layer interface
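
Three of the steps above (checksum verification, header modification, checksum computation) can be sketched for an IPv4 header. This is an illustrative Python sketch, not m-engine code: it uses the standard Internet checksum (RFC 1071), the function names are hypothetical, and the TTL decrement is assumed as the header modification because the slide does not name one.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words, complemented (RFC 1071)."""
    if len(header) % 2:
        header += b"\x00"                       # pad to a whole 16-bit word
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                          # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def checksum_ok(header: bytes) -> bool:
    """A header that includes a correct checksum field sums to zero."""
    return ipv4_checksum(header) == 0


def forward_header(header: bytes) -> bytes:
    """Verify the checksum, decrement TTL, and recompute the checksum."""
    if not checksum_ok(header):
        raise ValueError("bad header checksum")
    if header[8] <= 1:                          # byte 8 is the TTL field
        raise ValueError("TTL expired: exception case, not fast path")
    out = bytearray(header)
    out[8] -= 1
    out[10:12] = b"\x00\x00"                    # zero the checksum field
    out[10:12] = struct.pack("!H", ipv4_checksum(bytes(out)))
    return bytes(out)


# Sample 20-byte header with TTL 0x40 and checksum 0xb861.
hdr = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
new_hdr = forward_header(hdr)
print(new_hdr.hex())
```

A real fast path would typically update the checksum incrementally rather than recompute it over the whole header; the full recomputation here keeps the sketch short.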

27
m-engine characteristics

Programmable microcontroller
Custom RISC instruction set
Private 2048-instruction store per m-engine (loaded by StrongARM)
5-stage execution pipeline
Hardware support for 4 threads and context switching
Each m-engine has 4 hardware contexts (mask memory latency)
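
How multiple hardware contexts mask memory latency can be seen with a simple utilization model. This is a back-of-the-envelope sketch, not a description of the IXP itself: it assumes each thread alternates a fixed burst of computation with a fixed memory wait, that context switches are free, and the cycle counts are invented for illustration.

```python
def engine_utilization(n_contexts: int, compute_cycles: float,
                       memory_cycles: float) -> float:
    """Fraction of time the engine is busy when n_contexts threads each
    compute for compute_cycles and then wait memory_cycles on memory."""
    per_thread = compute_cycles / (compute_cycles + memory_cycles)
    return min(1.0, n_contexts * per_thread)


# Hypothetical numbers: 20 cycles of work per 60-cycle memory reference.
for n in (1, 2, 4):
    print(n, engine_utilization(n, 20, 60))
```

Under these assumed numbers a single context keeps the engine busy a quarter of the time, while four contexts are enough to hide the memory wait completely.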

28
m-engine characteristics

128 general purpose registers
Can be partitioned or shared
Absolute or context-relative
128 transfer registers
Staging registers for memory transfers
4 blocks of 32 registers
SDRAM or SRAM
Read or Write
Local Control and Status Registers (CSRs)
USTORE instructions, CTX, etc. (p. 315)

29
m-engine characteristics

FBI unit
Scratchpad memory
Hash unit
FBI CSRs
IXBus control
IXBus FIFOs
Transmit and Receive FIFOs to external line cards

30
32 m-engine opcodes

ALU instructions
ALU, ALU_SHF, DBL_SHIFT
Branch/Jump instructions
BR, BR0, BR!0, BR_BSET, BRBYTE, BRCTX, BR_INP_STATE, BR_!SIGNAL, JUMP, RTN, etc.
Reference instructions
CSR, FAST_WR, LOCAL_CSR_RD, R_FIFO_RD, PCI_DMA, SCRATCH, SDRAM, SRAM, T_FIFO_WR, etc.
Local register instructions
FIND_BST, IMMED, LD_FIELD, LOAD_ADDR, LOAD_BSET_RESULT1, etc.

31
32 m-engine opcodes

Miscellaneous
CTX_ARB
NOP
HASH1_48, HASH1_64, etc.

32
1. Packet received on physical interface (MAC)
2. Ready-bus sequencer polls MAC for mpacket; updates receive-ready upon a full mpacket
3. m-engine polls for receive-ready
4. m-engine instructs FBI to move mpacket from MAC to RFIFO
5. m-engine moves mpacket directly from RFIFO to SDRAM
6. Repeat 1-5 until full packet received
7. m-engine or StrongARM processing
8. Packet header read from SDRAM or RFIFO into m-engine and classified (via SRAM tables)
9. Packet headers modified
10. mpackets sent to interface
11. Poll for space on MAC; update transmit-ready if room for mpacket
12. mpackets transferred to MAC
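
The receive loop in steps 1-6 can be mimicked in a few lines. This is a toy sketch, not IXP code: the MAC, RFIFO, and SDRAM are modeled as plain Python byte buffers, the function names are hypothetical, and the 64-byte mpacket size is an assumption that the slide does not state.

```python
MPACKET_BYTES = 64  # assumption: mpacket size is not given on the slide


def to_mpackets(frame: bytes) -> list:
    """Split a frame into mpackets, as the MAC would present them."""
    return [frame[i:i + MPACKET_BYTES]
            for i in range(0, len(frame), MPACKET_BYTES)]


def receive(frame: bytes) -> bytes:
    """Move a frame mpacket by mpacket from the MAC into an SDRAM buffer."""
    sdram = bytearray()
    for mpacket in to_mpackets(frame):  # steps 1-3: an mpacket is ready
        rfifo = mpacket                 # step 4: FBI moves MAC -> RFIFO
        sdram += rfifo                  # step 5: m-engine moves RFIFO -> SDRAM
    return bytes(sdram)                 # step 6: loop ends on the last mpacket


frame = bytes(range(150))
print(len(to_mpackets(frame)), receive(frame) == frame)
```

Steps 10-12 run the same idea in reverse on transmit, with the packet leaving SDRAM one mpacket at a time.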
33
Programming the IXP

This course focuses on steps 7, 8, and 9
2 programming frameworks
Command-line, IXA Active Computing Engine (ACE) framework
Graphical microengine C development environment

34
Programming the IXP

Command-line, IXA Active Computing Engine (ACE) framework
Re-usable function blocks chained together to build an application (Chapters 22-24)
New functions implemented as new blocks in chain
Core ACEs (StrongARM)
Written in C
Microblock ACEs (microengines)
Written in assembler

35
(No Transcript)
36
Programming the IXP