Title: CSE 58x: Networking Practicum
1CSE 58x Networking Practicum
- Instructor Wu-chang Feng
- TA Francis Chang
2About the course
- Prerequisite CSE 524 or the equivalent
- Implementation-focused course
- Intel's IXA network processor platform
- Contents
- Brief lecture material on network processors and the IXP
- 5 weeks of designed laboratories
- 3 weeks of final projects
3Modern router architectures
- Split into a fast path and a slow path
- Control plane
- High-complexity functions
- Route table management
- Network control and configuration
- Exception handling
- Data plane
- Low complexity functions
- Fast-path forwarding
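A minimal sketch of the fast-path/slow-path split above: ordinary packets are forwarded by the data plane, and anything exceptional is handed to the control plane. The `Packet` fields, the `FORWARDING_TABLE` contents, and the choice of exceptions (expired TTL, IP options, no route) are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    dst: str                  # destination address (hypothetical field)
    ttl: int                  # remaining hop count
    has_options: bool = False


# Hypothetical forwarding table: destination -> output port.
FORWARDING_TABLE = {"10.0.0.1": 1, "10.0.0.2": 2}


def dispatch(pkt: Packet) -> str:
    """Return 'fast:<port>' for ordinary packets, 'slow:<reason>' otherwise."""
    if pkt.ttl <= 1:
        return "slow:ttl-expired"   # exception handling belongs to the control plane
    if pkt.has_options:
        return "slow:ip-options"    # high-complexity case, assumed to be rare
    port = FORWARDING_TABLE.get(pkt.dst)
    if port is None:
        return "slow:no-route"
    return f"fast:{port}"           # low-complexity forwarding on the data plane
```

Keeping the fast path to a few comparisons and one table lookup is what makes it a candidate for the data plane hardware discussed later in the deck.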
4Router functions
- RFC 1812 plus...
- Error detection and correction
- Traffic measurement and policing
- Frame and protocol demultiplexing
- Address lookup and packet forwarding
- Segmentation, fragmentation, reassembly
- Packet classification
- Traffic shaping
- Timing and scheduling
- Queuing
- Security
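As one illustration of the "address lookup and packet forwarding" function in the list above, here is a minimal longest-prefix-match sketch. The `ROUTES` table and next-hop names are hypothetical, and the linear scan is chosen for clarity only; real routers use tries, hashing, or CAMs.

```python
import ipaddress

# Hypothetical routing table: (prefix, next hop).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "default-gw"),
    (ipaddress.ip_network("10.0.0.0/8"), "port-1"),
    (ipaddress.ip_network("10.1.0.0/16"), "port-2"),
]


def lookup(dst: str):
    """Return the next hop of the most specific prefix containing dst."""
    addr = ipaddress.ip_address(dst)
    best = None
    for net, hop in ROUTES:
        # Keep the matching prefix with the longest mask seen so far.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1] if best else None
```

For example, `lookup("10.1.2.3")` matches all three prefixes and selects the /16 entry.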
5Design choices for network products
- General purpose processors
- Embedded RISC processors
- Network processors
- Field-programmable gate arrays (FPGAs)
- Application-specific integrated circuits (ASICs)
6General purpose processors (GPP)
- Programmable
- Mature development environment
- Typically used to implement control plane
- Too slow to run data plane effectively
- Sequential execution
- CPU/Network 50x increase over last decade
- Memory latencies 2x decrease over last decade
- Gigabit Ethernet 333 nanosecond per-packet budget
- Cache miss 150-200 nanoseconds
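The arithmetic behind the last two bullets can be sketched as follows. The slide does not state which frame size yields 333 ns (that figure corresponds to roughly 3 million packets per second), so the 64-byte frame and the 20 bytes of preamble plus inter-frame gap below are assumptions used only to show the calculation.

```python
def packet_budget_ns(packet_bytes: int, line_rate_bps: float,
                     overhead_bytes: int = 0) -> float:
    """Time on the wire for one packet, in nanoseconds."""
    bits = (packet_bytes + overhead_bytes) * 8
    return bits * 1e9 / line_rate_bps


def misses_per_budget(budget_ns: float, miss_ns: float) -> float:
    """How many cache misses fit into one per-packet budget."""
    return budget_ns / miss_ns


GIGABIT = 1e9  # bits per second

# Assumed 64-byte frame, with and without 20 bytes of preamble + inter-frame gap.
bare = packet_budget_ns(64, GIGABIT)          # 512 ns
on_wire = packet_budget_ns(64, GIGABIT, 20)   # 672 ns

# With the slide's 333 ns budget, only one or two cache misses fit.
worst = misses_per_budget(333, 200)
best = misses_per_budget(333, 150)
```

Whatever frame size is assumed, the conclusion on the slide holds: a handful of cache misses consumes the whole per-packet budget on a sequential processor.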
7Embedded RISC processors (ERP)
- Same as GPP, but
- Slower
- Cheaper
- Smaller (require less board space)
- Designed specifically for network applications
- Typically used for control plane functions
8Application-specific integrated circuits (ASIC)
- Custom hardware
- Long time to market
- Expensive
- Difficult to develop and simulate
- Not programmable
- Not reusable
- But, the fastest of the bunch
- Suitable for data plane
9Field Programmable Gate Arrays (FPGA)
- Flexible re-programmable hardware
- Less dense and slower than ASICs
- Cheaper than ASICs
- Good for providing fast custom functionality
- Suitable for data plane
10Network processors
- The speed of ASICs/FPGAs
- The programmability and cost of GPPs/ERPs
- Flexible
- Re-usable components
- Lower cost
- Suitable for data plane
11Network processors
- Common features
- Small, fast, on-chip instruction stores (no caching)
- Custom network-specific instruction set programmed at assembler level
- What instructions are needed for NPs? Open question.
- Minimality, Generality
- Multiple processing elements
- Multiple thread contexts per element
- Multiple memory interfaces to mask latency
- Fast on-chip memory (headers) and slow off-chip memory (payloads)
- No OS, hardware-based scheduling and thread switching
12Why network processors?
- The propaganda
- Take the current vertical network device market
- Commoditize horizontal slices of it
- PC market
- Initially, an IBM custom vertical
- Now, a commodity market with Intel providing the chip-set
- Network device market
- Draw your own conclusions
13Network processing approaches
- Figure: approaches plotted by speed versus programming/development ease (ASIC, FPGA, network processor, GPP, embedded RISC processor)
14Network processor architectures
- Packet path
- Store and forward
- Packet payload completely stored in and forwarded from off-chip memory
- Allows for large packet buffers
- Re-ordering problems with multiple processing elements
- Intel IXP, Motorola C5
- Cut-through
- Packet held in an on-chip FIFO and forwarded through directly
- Small packet buffers
- Built-in packet ordering
- AMCC
15Network processor architectures
- Processing architecture
- Parallel
- Each element independently performs entire processing function
- Packet re-ordering problems
- Larger instruction store needed per element
- Pipelined
- Each element performs one part of larger processing function
- Communicates result to next processing element in pipeline
- Smaller code space
- Packet ordering retained
- Deterministic behavior (no memory thrashing)
- Hybrid
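The ordering difference between the parallel and pipelined organizations above can be illustrated with a small timing model. This is a sketch under stated assumptions: packet i arrives at time i, parallel elements are assigned round-robin, and a pipelined packet's cost is split evenly across stages. None of these details come from the slides.

```python
def parallel_order(costs, n_elements):
    """Completion order when each element runs the whole function on its packets."""
    free_at = [0.0] * n_elements
    done = []
    for i, cost in enumerate(costs):
        e = i % n_elements                    # round-robin assignment (assumed)
        start = max(float(i), free_at[e])     # packet i arrives at time i
        free_at[e] = start + cost
        done.append((free_at[e], i))
    return [i for _, i in sorted(done)]


def pipelined_order(costs, n_stages):
    """Completion order when every packet passes through all stages in FIFO order."""
    stage_free = [0.0] * n_stages
    done = []
    for i, cost in enumerate(costs):
        t = float(i)
        for s in range(n_stages):
            # A stage starts a packet once the packet and the stage are both ready.
            t = max(t, stage_free[s]) + cost / n_stages
            stage_free[s] = t
        done.append((t, i))
    return [i for _, i in sorted(done)]


costs = [5, 1, 1, 1]                 # one slow packet followed by three fast ones
print(parallel_order(costs, 2))      # [1, 3, 0, 2]: fast packets overtake the slow one
print(pipelined_order(costs, 2))     # [0, 1, 2, 3]: arrival order retained
```

In the parallel case the slow packet is overtaken by later packets on the other element, which is the re-ordering problem named on the slide; in the pipeline every packet queues behind its predecessor at each stage.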
16Network processor architectures
- Processing hierarchy
- ASICs
- Embedded RISC processors
- Specialized co-processors
- See figure 13.7 in book
17Network processor architectures
- Memory hierarchy
- Small on-chip memory
- Control/Instruction store
- Registers
- Cache
- RAM
- Large off-chip memory
- Cache
- Static RAM
- Dynamic RAM
18Network processor architectures
- Internal interconnect
- Bus
- Cross-bar
- FIFO
- Transfer registers
19Network processor architectures
- Concurrency
- Hardware support for multiple thread contexts
- Operating system support for multiple thread contexts
- Pre-emptiveness
- Migration support
20Increasing network processor performance
- Processing hierarchy
- Increase clock speed
- Increase elements
- Memory hierarchy
- Increase size
- Decrease latency
- Pipelining
- Add hierarchies
- Add memory bandwidth (parallel stores)
- Add functional memory (CAMs)
21Focus of this class...
- Network processors
- Intel IXA
22IXP 1200 features
- One embedded RISC processor (StrongARM)
- Runs control plane (Linux)
- 6 programmable packet processors (m-engines)
- Runs data plane (m-engine assembler or m-engine C)
- Central hash unit
- Multiple bus interconnects
- IXBus (4.4Gbps) to overcome PCI's 2.2Gbps limit
- Small on-board memory
- Serial interface for control
- External interfaces for memory
24IXP12xx m-engine
25IXP2xxx m-engine
26m-engine functions
- Packet ingress from physical layer interface
- Checksum verification
- Header processing and classification
- Packet buffering in memory
- Table lookup and forwarding
- Header modification
- Checksum computation
- Packet egress to physical layer interface
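Three of the functions above (checksum verification, header modification, checksum computation) can be sketched for an IPv4 header. This is a minimal Python illustration of the standard one's-complement Internet checksum, not m-engine code; the function names are hypothetical and the header is assumed to carry no options.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement of the one's-complement sum of 16-bit words."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def verify(header: bytes) -> bool:
    """A header with a correct checksum field sums to zero."""
    return ipv4_checksum(header) == 0


def decrement_ttl(header: bytes) -> bytes:
    """Header modification: decrement the TTL, then recompute the checksum."""
    hdr = bytearray(header)
    if hdr[8] == 0:                          # TTL is byte 8 of the IPv4 header
        raise ValueError("TTL already zero; packet should not be forwarded")
    hdr[8] -= 1
    hdr[10:12] = b"\x00\x00"                 # checksum field is zeroed before summing
    struct.pack_into("!H", hdr, 10, ipv4_checksum(bytes(hdr)))
    return bytes(hdr)


# 20-byte example header whose checksum field is 0xb861.
example = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
forwarded = decrement_ttl(example)
```

A full recomputation is used here for clarity; an incremental update of the checksum is a common alternative when only the TTL changes.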
27m-engine characteristics
- Programmable microcontroller
- Custom RISC instruction set
- Private 2048 instruction store per m-engine (loaded by StrongARM)
- 5-stage execution pipeline
- Hardware support for 4 threads and context switching
- Each m-engine has 4 hardware contexts (mask memory latency)
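Why several hardware contexts mask memory latency can be seen from an idealized model: each thread alternates a burst of computation with a memory reference, and the engine switches to another ready thread at no cost while one waits. The cycle counts below are hypothetical and the zero-cost switch is an assumption of this sketch.

```python
def engine_utilization(compute_cycles: float, memory_cycles: float,
                       threads: int) -> float:
    """Fraction of cycles the engine spends computing under the idealized model."""
    per_thread = compute_cycles / (compute_cycles + memory_cycles)
    # Threads fill each other's wait time until the engine is fully busy.
    return min(1.0, threads * per_thread)


# Hypothetical workload: 10 cycles of work per 30-cycle memory reference.
one_thread = engine_utilization(10, 30, 1)     # 0.25
four_threads = engine_utilization(10, 30, 4)   # 1.0
```

With longer memory latencies (for example 50 cycles in this model) four contexts no longer keep the engine fully busy, which is one reason the number of contexts per element is a design parameter.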
28m-engine characteristics
- 128 general purpose registers
- Can be partitioned or shared
- Absolute or context-relative
- 128 transfer registers
- Staging registers for memory transfers
- 4 blocks of 32 registers
- SDRAM or SRAM
- Read or Write
- Local Control and Status Registers (CSRs)
- USTORE instructions, CTX, etc. (p. 315)
29m-engine characteristics
- FBI unit
- Scratchpad memory
- Hash unit
- FBI CSRs
- IXBus control
- IXBus FIFOs
- Transmit and Receive FIFOs to external line cards
3032 m-engine opcodes
- ALU instructions
- ALU, ALU_SHF, DBL_SHIFT
- Branch/Jump instructions
- BR, BR0, BR!0, BR_BSET, BRBYTE, BRCTX, BR_INP_STATE, BR_!SIGNAL, JUMP, RTN, etc.
- Reference instructions
- CSR, FAST_WR, LOCAL_CSR_RD, R_FIFO_RD, PCI_DMA, SCRATCH, SDRAM, SRAM, T_FIFO_WR, etc.
- Local register instructions
- FIND_BST, IMMED, LD_FIELD, LOAD_ADDR, LOAD_BSET_RESULT1, etc.
3132 m-engine functions
- Miscellaneous
- CTX_ARB
- NOP
- HASH1_48, HASH1_64, etc.
32(Figure: packet path through the IXP; diagram labels omitted)
1. Packet received on physical interface (MAC)
2. Ready-bus sequencer polls MAC for mpacket; updates receive-ready upon a full mpacket
3. m-engine polls for receive-ready
4. m-engine instructs FBI to move mpacket from MAC to RFIFO
5. m-engine moves mpacket directly from RFIFO to SDRAM
6. Repeat 1-5 until full packet received
7. m-engine or StrongARM processing
8. Packet header read from SDRAM or RFIFO into m-engine and classified (via SRAM tables)
9. Packet headers modified
10. mpackets sent to interface
11. Poll for space on MAC; update transmit-ready if room for mpacket
12. mpackets transferred to MAC
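The receive side of this slide (steps 1 to 6) can be sketched as a loop that moves one mpacket at a time from the MAC through the receive FIFO into a packet buffer. This is a Python illustration only: the 64-byte mpacket size, the end-of-packet flag, and the function names are assumptions of the sketch, not the IXP programming interface.

```python
MPACKET_BYTES = 64  # assumed mpacket size for this sketch


def to_mpackets(frame: bytes):
    """MAC side: split a frame into mpackets; the last one is flagged end-of-packet."""
    chunks = [frame[i:i + MPACKET_BYTES]
              for i in range(0, len(frame), MPACKET_BYTES)]
    return [(chunk, idx == len(chunks) - 1) for idx, chunk in enumerate(chunks)]


def receive_packet(mac_queue):
    """Steps 3-6: move each ready mpacket MAC -> RFIFO -> SDRAM until the packet is whole."""
    sdram_buffer = bytearray()
    for mpacket, end_of_packet in mac_queue:   # step 3: an mpacket is ready
        rfifo = mpacket                        # step 4: MAC to RFIFO
        sdram_buffer += rfifo                  # step 5: RFIFO to SDRAM
        if end_of_packet:                      # step 6: stop once the packet is complete
            break
    return bytes(sdram_buffer)


frame = bytes(range(150))
assert receive_packet(to_mpackets(frame)) == frame   # three mpackets reassembled
```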
33Programming the IXP
- Focus of this course on steps 7, 8, and 9
- 2 programming frameworks
- Command-line, IXA Active Computing Engine (ACE) framework
- Graphical microengine C development environment
34Programming the IXP
- Command-line, IXA Active Computing Engine (ACE) framework
- Re-usable function blocks chained together to build an application (Chapters 22-24)
- New functions implemented as new blocks in chain
- Core ACEs (StrongARM)
- Written in C
- Microblock ACEs (microengines)
- Written in assembler
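The idea of re-usable blocks chained into an application can be sketched generically. The code below is not the ACE API; `run_chain` and the block names are hypothetical, and packets are plain dictionaries purely for illustration.

```python
def run_chain(blocks, packet):
    """Pass a packet through each block in order; a block returns None to drop it."""
    for block in blocks:
        packet = block(packet)
        if packet is None:
            return None
    return packet


def drop_expired(pkt):
    """Hypothetical block: discard packets whose TTL has run out."""
    return None if pkt["ttl"] <= 1 else pkt


def decrement_ttl(pkt):
    """Hypothetical block: return a copy of the packet with TTL reduced by one."""
    return {**pkt, "ttl": pkt["ttl"] - 1}


def mark_port(pkt):
    """Hypothetical block added later: a new function is just a new link in the chain."""
    return {**pkt, "port": 1}


base_app = [drop_expired, decrement_ttl]
extended_app = base_app + [mark_port]
result = run_chain(extended_app, {"ttl": 64})
```

Extending the application means appending a block rather than editing existing ones, which is the property the slide attributes to the framework.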
36Programming the IXP
- Graphical microengine C development environment
- Monolithic microengine C code (cannot be used on IXP1200 hardware)
- Demos forthcoming