Title: Scalable IP Lookup for Programmable Routers
1. Scalable IP Lookup for Programmable Routers
- David E. Taylor, Jonathan S. Turner,
- John W. Lockwood, Todd S. Sproull
- ARL, Washington University in Saint Louis
- http://www.arl.wustl.edu
- David B. Parlour
- Xilinx, Inc.
- http://www.xilinx.com
- IEEE Infocom 2002, New York
2. Motivation & Focus
- Scalability: strike a favorable balance between lookup performance, resource utilization, and update performance for high lookup rates and large databases
- Amenable to implementation in a programmable router
  - Maximize packet processing resources → minimize resource utilization by baseline functionality
- Proof of concept using open-platform research systems
- Algorithm and architecture efficiently scale to support multi-gigabit links
  - Memory usage for route table (~10 bytes per entry)
  - Hardware resource usage for search engine (~1% of FPGA CLBs per 500 Mb/s)
- Developed supporting control software with web interface
3. Route Lookup Example
Query: packet arriving on port 3 destined for 128.252.153.194. Result: anything arriving on port 3 going to 128.252.* → transmit on port 5.
4. Route Lookup Challenges
- Classless Inter-Domain Routing (CIDR) allows route table entries to be variable-length prefixes
  - Requires a Longest Prefix Match (LPM) search over the table to find the most specific route
- Backbone route tables are extremely large
  - Currently 70k to 110k entries, with an approximate doubling every two years
- Optical link rates place high throughput constraints on route lookup engines
  - 2.5 Gb/s to 40 Gb/s → 5.9M pkt/s to 94.3M pkt/s
- Must support frequent updates
  - Periodic distribution of routing information
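To make the LPM requirement concrete, here is a minimal brute-force lookup sketch in Python, reusing the slide-3 example. The port numbers and the default route are illustrative, and this linear scan is exactly the cost that structures like the Tree Bitmap are designed to avoid.

```python
def longest_prefix_match(addr, table):
    """table maps (prefix_value, prefix_length) -> next-hop port."""
    best_len, best_hop = -1, None
    for (prefix, length), hop in table.items():
        # A prefix matches if the top `length` bits of addr equal its value.
        if length > best_len and (addr >> (32 - length)) == prefix:
            best_len, best_hop = length, hop
    return best_hop

# Slide-3 example: 128.252.0.0/16 -> port 5, plus a hypothetical default route.
table = {
    (0, 0): 1,                   # 0.0.0.0/0 default, illustrative port 1
    ((128 << 8) | 252, 16): 5,   # 128.252.0.0/16 -> port 5
}
addr = (128 << 24) | (252 << 16) | (153 << 8) | 194  # 128.252.153.194
```

The /16 entry wins over the default route because it is the longer (more specific) matching prefix.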
5. Eatherton & Dittia's Tree Bitmap
6. Eatherton & Dittia's Tree Bitmap
Create a multi-bit decision trie using k-bit strides. Simultaneously compare k address bits per node. Reduces the number of memory accesses at the cost of memory space.
7. Eatherton & Dittia's Tree Bitmap
Compress multi-bit nodes using bitmaps. Extending Paths Bitmap: the set of exit points from a multi-bit node. Internal Prefix Bitmap: the set of prefixes stored in a multi-bit node.
8. Eatherton & Dittia's Tree Bitmap
- Minimize pointer storage
  - Store all children of a node contiguously with a pointer to the first child
  - Store next-hop information for internal prefixes contiguously and store a pointer to the first item in the list
- Use strides of the IP address to select a bit in the bitmap; count the 1s to its left and use the count as an index from the pointer
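The indexing rule above can be sketched in a few lines of Python. This is an illustration only: it assumes a 16-bit Extending Paths bitmap stored MSB-first (bit position 0 is the leftmost bit), and the function names are invented.

```python
WIDTH = 16  # assumed bitmap width for this sketch

def child_offset(bitmap, pos):
    """Count the 1s strictly to the left of bit position `pos` (MSB-first)."""
    left_mask = ((1 << pos) - 1) << (WIDTH - pos)  # the `pos` leftmost bits
    return bin(bitmap & left_mask).count("1")

def child_address(first_child_ptr, bitmap, pos):
    # Children are stored contiguously, so pointer arithmetic suffices.
    return first_child_ptr + child_offset(bitmap, pos)
```

For example, with bitmap 1011 0000 0000 0000 and selected bit position 3, two 1s lie to the left, so the child sits two slots past the first-child pointer.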
9. Eatherton & Dittia's Tree Bitmap
IP Address: 1000 0000 1111 1100 1010 0000
10. Scalable IP Lookup Design
- Fast IP Lookup (FIPL) Engine
  - Performs a longest prefix match (LPM) lookup on the Tree Bitmap
  - Designed with a periodic memory access pattern to facilitate parallel operation
- FIPL Engine Controller
  - Instantiates the required number of parallel lookup engines to support the link rate
  - Interleaves memory accesses of the parallel engines
- FIPL Wrapper
  - Buffers packets and modifies Layer 2 headers based on lookup results
- Control Processor
  - Handles data structure updates via an arbitrated SRAM interface
11. FIPL Engine Design
- Tree Bitmap stored in SRAM operating at 100 MHz
- Regular memory access period: 8 clock cycles
  - Interleave parallel engine memory accesses using a 3-bit cycle counter
  - Exhaust memory bandwidth with 8 FIPL engines
- Employs multicycle logic paths for area efficiency
- Relative to the Xilinx Virtex 1000-E FPGA, each FIPL Engine utilizes less than 1% of the device resources
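The 3-bit-counter interleave can be sketched as a simple round-robin grant: assuming one SRAM access slot per clock cycle, engine i owns the slot whenever the counter equals i, so each of the 8 engines gets exactly one access per 8-cycle period.

```python
def granted_engine(cycle):
    """3-bit cycle counter: engine i is granted SRAM when counter == i."""
    return cycle & 0b111  # low 3 bits of the cycle count

# Over two 8-cycle memory periods, each engine is granted exactly twice.
schedule = [granted_engine(c) for c in range(16)]
```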
12. Performance Analysis
Gate-level simulation of the FPGA running at 100 MHz. Used a sample database from Mae-West containing 16,564 routes. The Tree Bitmap required 118.8 bits per entry (with 36-bit next-hop info).
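A back-of-the-envelope check on these figures: the whole-table footprint is modest, and subtracting the 36-bit next-hop info from 118.8 bits/entry leaves roughly 10 bytes of data-structure overhead per entry, consistent with the memory-usage claim elsewhere in the talk.

```python
routes = 16_564          # Mae-West sample database
bits_per_entry = 118.8   # includes the 36-bit next-hop info

total_kib = routes * bits_per_entry / 8 / 1024   # whole-table footprint, ~240 KiB
structure_bytes = (bits_per_entry - 36) / 8      # per-entry cost excluding next hop, ~10.4 B
```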
13. Update Performance
Injected a continuous cycle of route adds, modifies, and deletes at various rates.
14. Performance on Research Platform
- Based on these results, targeted a 4-engine configuration to the WUGS/FPX research platform to support 2 Gb/s links
- Sustained 1.988 Gb/s throughput on minimum-length packets: 4.7M packets/sec
  - Limited by the 2 Gb/s switch interface of the FPX (32-bit at 62.5 MHz)
  - 12% performance degradation at 200k updates/s
- Utilizes only 8% of available logic resources and 12.5% of on-chip memory resources
  - 4 FIPL Engines and the FIPL Engine Controller utilize 6% of logic resources
  - FIPL Wrapper utilizes 2% of logic resources and 12.5% of on-chip memory resources
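The sustained figures are mutually consistent if a minimum-length packet occupies one 53-byte ATM cell; that cell-size assumption is mine (the WUGS is an ATM switch core), not stated on the slide.

```python
pkt_per_s = 4.7e6      # sustained minimum-length packet rate
cell_bits = 53 * 8     # 424 bits per ATM cell (assumed packet size)

throughput_gbps = pkt_per_s * cell_bits / 1e9  # ~1.99 Gb/s, matching the slide
```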
15. Modular Control Software
- FIPL Memory Manager
  - Manages the Tree Bitmap data structure
  - Accepts route add/modify/delete requests → generates memory read/write commands
- NCHARGE
  - Provides reliable connectivity between multiple software processes and reconfigurable hardware modules
  - Sproull et al., "Control and Configuration Software for a Reconfigurable Networking Hardware Platform," FCCM '02
- Remote User Interface
  - Download the FPGA circuit, program the FPX, configure the switch, and submit route updates remotely via a web page
- More info at http://www.arl.wustl.edu/projects/fpx/
16. Current Work: Multi-Service Router
17. Towards Better FIPL Performance
- Several options for architecture optimizations to achieve a 200 MHz clock
  - Utilize on-chip BlockRAMs for a table-based implementation of CountOnes
- Focus on reduction of off-chip memory accesses
  - Root node extension and caching
  - Asymmetrical node extension
    - Stride lengths of 12/8/4/4/4
  - Empty path pruning (a form of path compression)
- Other algorithmic optimizations
  - More data structure compression
  - Investigate intelligent node caching techniques
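The table-based CountOnes idea can be sketched as a precomputed popcount lookup: in hardware the table would live in on-chip BlockRAM, while here it is a plain Python list split into two 8-bit lookups per 16-bit bitmap.

```python
# Precomputed popcounts for all 8-bit values (the BlockRAM table in hardware).
POPCOUNT8 = [bin(v).count("1") for v in range(256)]

def count_ones16(bitmap):
    """Population count of a 16-bit bitmap via two 8-bit table lookups."""
    return POPCOUNT8[bitmap >> 8] + POPCOUNT8[bitmap & 0xFF]
```

Replacing a wide combinational CountOnes with table lookups shortens the critical logic path, which is what matters for reaching a higher clock rate.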
18. Conclusions & Lessons
- Design, simulation, and implementation of a Longest-Prefix Match (LPM) search engine
- Achieved a favorable balance between lookup performance, memory efficiency, and update performance
  - Supports 500 Mb/s per 1% of the FPGA
  - Utilizes ~10 bytes per route entry
  - Supports 100k updates per second on-the-fly
- Scalable design provides for ease of use in various research systems (IP routers, programmable MSRs, etc.)
- Great insight gained by carrying algorithmic work through to a high-performance implementation
- High-performance FPGA design is hard
  - Opinion: CAD tools have not arrived yet
19. Thank you for listening.
Questions?
20. Towards Better Performance
- Several options for design optimizations to achieve a 200 MHz clock
- Reduce off-chip memory accesses via root node extension and caching
  - Brute-force node extension causes bitmap functions to grow exponentially
  - Represent the root node as an on-chip array indexed by the first i bits of the destination address
  - Each array entry stores the next hop for the LPM in the i-bit path and a pointer to an extending sub-tree
  - Maintain a 4-bit stride length for off-chip nodes
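The root-node-as-array idea can be sketched as follows. This is a minimal illustration with invented names: i is set to 8 for brevity (the slides suggest larger initial strides, e.g. 12), and short prefixes are replicated across all array slots they cover, so shorter prefixes must be installed before longer ones for correct LPM.

```python
I_BITS = 8  # illustrative initial stride

# Each slot holds (next_hop, extending_subtree_ptr); None means no entry.
root = [(None, None)] * (1 << I_BITS)

def install(prefix, length, next_hop):
    """Replicate a prefix of length <= I_BITS across every slot it covers."""
    base = prefix << (I_BITS - length)
    for slot in range(base, base + (1 << (I_BITS - length))):
        root[slot] = (next_hop, root[slot][1])  # keep any sub-tree pointer

def root_lookup(addr32):
    """Single on-chip array read, indexed by the first I_BITS of the address."""
    return root[addr32 >> (32 - I_BITS)]

install(0b1000, 4, 5)  # a /4-style short prefix, hypothetical next hop 5
```

One array read replaces the first one or two off-chip Tree Bitmap accesses, which is where the optimization pays off.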
21. Motivation: Advanced Network Services
22Supporting Advanced Network Services
- Pressing need for high-performance programmable
routers - Flexible for rapid deployment of new services
- No need to modify end-systems
- Must scale with reasonable per-port costs
- Thousands of ports each supporting optical links
- Thousands of flows per port
- Must be computationally robust
- Support next-generation services without
modification to infrastructure - Requires additional per-port processing resources
- Minimize resources required for baseline
functionality