Title: Scalable IP Lookup for Programmable Routers
1. Scalable IP Lookup for Programmable Routers
- David E. Taylor, Jonathan S. Turner,
- John W. Lockwood, Todd S. Sproull
- ARL, Washington University in Saint Louis
- http://www.arl.wustl.edu
- David B. Parlour
- Xilinx, Inc.
- http://www.xilinx.com
- IEEE Infocom 2002, New York
2. Motivation & Focus
- Scalability: strike a favorable balance between lookup performance, resource utilization, and update performance for high lookup rates and large databases
- Amenable to implementation in a programmable router
  - Maximize packet processing resources → minimize resource utilization by baseline functionality
- Proof of concept using open-platform research systems
- Algorithm and architecture efficiently scale to support multi-gigabit links
  - Memory usage for route table (~10 bytes per entry)
  - Hardware resource usage for search engine (~1% of FPGA CLBs per 500 Mb/s)
- Developed supporting control software with web interface
3. Route Lookup Example
Query: packet arriving on port 3 destined for 128.252.153.194. Result: anything arriving on port 3 going to 128.252.* → transmit on port 5.
4. Route Lookup Challenges
- Classless Inter-Domain Routing (CIDR) allows route table entries to be variable-length prefixes
  - Requires a Longest Prefix Match (LPM) search over the table to find the most specific route
- Backbone route tables are extremely large
  - Currently 70k to 110k entries, with an approximate doubling every two years
- Optical link rates place high throughput constraints on route lookup engines
  - 2.5 Gb/s to 40 Gb/s → 5.9M pkt/s to 94.3M pkt/s
- Must support frequent updates
  - Periodic distribution of routing information
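To make the LPM requirement concrete, here is a minimal brute-force lookup sketch in Python, reusing the slide-3 example. The port numbers and the default route are illustrative, and this linear scan is exactly the cost that structures like the Tree Bitmap are designed to avoid.

```python
def longest_prefix_match(addr, table):
    """table maps (prefix_value, prefix_length) -> next-hop port."""
    best_len, best_hop = -1, None
    for (prefix, length), hop in table.items():
        # A prefix matches if the top `length` bits of addr equal its value.
        if length > best_len and (addr >> (32 - length)) == prefix:
            best_len, best_hop = length, hop
    return best_hop

# Slide-3 example: 128.252.0.0/16 -> port 5, plus a hypothetical default route.
table = {
    (0, 0): 1,                   # 0.0.0.0/0 default, illustrative port 1
    ((128 << 8) | 252, 16): 5,   # 128.252.0.0/16 -> port 5
}
addr = (128 << 24) | (252 << 16) | (153 << 8) | 194  # 128.252.153.194
```

The /16 entry wins over the default route because it is the longer (more specific) matching prefix.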
5. Eatherton & Dittia's Tree Bitmap
6. Eatherton & Dittia's Tree Bitmap
Create a multi-bit decision trie using k-bit strides. Simultaneously compare k address bits per node. Reduces the number of memory accesses at the cost of memory space.
7. Eatherton & Dittia's Tree Bitmap
Compress multi-bit nodes using bitmaps. Extending Paths Bitmap: the set of exit points from a multi-bit node. Internal Prefix Bitmap: the set of prefixes stored in a multi-bit node.
8. Eatherton & Dittia's Tree Bitmap
- Minimize pointer storage
  - Store all children of a node contiguously with a pointer to the first child
  - Store next-hop information for internal prefixes contiguously and store a pointer to the first item in the list
- Use strides of the IP address to select a bit in the bitmap; count the 1s to its left and use the count as an index from the pointer
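The indexing rule above can be sketched in a few lines of Python. This is an illustration only: it assumes a 16-bit Extending Paths bitmap stored MSB-first (bit position 0 is the leftmost bit), and the function names are invented.

```python
WIDTH = 16  # assumed bitmap width for this sketch

def child_offset(bitmap, pos):
    """Count the 1s strictly to the left of bit position `pos` (MSB-first)."""
    left_mask = ((1 << pos) - 1) << (WIDTH - pos)  # the `pos` leftmost bits
    return bin(bitmap & left_mask).count("1")

def child_address(first_child_ptr, bitmap, pos):
    # Children are stored contiguously, so pointer arithmetic suffices.
    return first_child_ptr + child_offset(bitmap, pos)
```

For example, with bitmap 1011 0000 0000 0000 and selected bit position 3, two 1s lie to the left, so the child sits two slots past the first-child pointer.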
9. Eatherton & Dittia's Tree Bitmap
IP Address: 1000 0000 1111 1100 1010 0000
10. Scalable IP Lookup Design
- Fast IP Lookup (FIPL) Engine
  - Performs a longest prefix match (LPM) lookup on the Tree Bitmap
  - Designed with a periodic memory access pattern to facilitate parallel operation
- FIPL Engine Controller
  - Instantiates the required number of parallel lookup engines to support the link rate
  - Interleaves memory accesses of the parallel engines
- FIPL Wrapper
  - Buffers packets and modifies Layer 2 headers based on lookup results
- Control Processor
  - Handles data structure updates via an arbitrated SRAM interface
11. FIPL Engine Design
- Tree Bitmap stored in SRAM operating at 100 MHz
- Regular memory access period: 8 clock cycles
  - Interleave parallel engine memory accesses using a 3-bit cycle counter
  - Exhaust memory bandwidth with 8 FIPL engines
- Employs multicycle logic paths for area efficiency
- Relative to the Xilinx Virtex 1000-E FPGA, each FIPL Engine utilizes less than 1% of the device resources
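The 3-bit-counter interleave can be sketched as a simple round-robin grant: assuming one SRAM access slot per clock cycle, engine i owns the slot whenever the counter equals i, so each of the 8 engines gets exactly one access per 8-cycle period.

```python
def granted_engine(cycle):
    """3-bit cycle counter: engine i is granted SRAM when counter == i."""
    return cycle & 0b111  # low 3 bits of the cycle count

# Over two 8-cycle memory periods, each engine is granted exactly twice.
schedule = [granted_engine(c) for c in range(16)]
```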
12. Performance Analysis
Gate-level simulation of the FPGA running at 100 MHz. Used a sample database from Mae-West containing 16,564 routes. The Tree Bitmap required 118.8 bits per entry (with 36-bit next-hop info).
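A back-of-the-envelope check on these figures: the whole-table footprint is modest, and subtracting the 36-bit next-hop info from 118.8 bits/entry leaves roughly 10 bytes of data-structure overhead per entry, consistent with the memory-usage claim elsewhere in the talk.

```python
routes = 16_564          # Mae-West sample database
bits_per_entry = 118.8   # includes the 36-bit next-hop info

total_kib = routes * bits_per_entry / 8 / 1024   # whole-table footprint, ~240 KiB
structure_bytes = (bits_per_entry - 36) / 8      # per-entry cost excluding next hop, ~10.4 B
```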
13. Update Performance
Injected a continuous cycle of route adds, modifies, and deletes at various rates.
14. Performance on Research Platform
- Based on these results, targeted a 4-engine configuration to the WUGS/FPX research platform to support 2 Gb/s links
- Sustained 1.988 Gb/s throughput on minimum-length packets: 4.7M packets/sec
  - Limited by the 2 Gb/s switch interface of the FPX (32-bit at 62.5 MHz)
  - 12% performance degradation at 200k updates/s
- Utilizes only 8% of available logic resources and 12.5% of on-chip memory resources
  - 4 FIPL Engines and the FIPL Engine Controller utilize 6% of logic resources
  - FIPL Wrapper utilizes 2% of logic resources and 12.5% of on-chip memory resources
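The sustained figures are mutually consistent if a minimum-length packet occupies one 53-byte ATM cell; that cell-size assumption is mine (the WUGS is an ATM switch core), not stated on the slide.

```python
pkt_per_s = 4.7e6      # sustained minimum-length packet rate
cell_bits = 53 * 8     # 424 bits per ATM cell (assumed packet size)

throughput_gbps = pkt_per_s * cell_bits / 1e9  # ~1.99 Gb/s, matching the slide
```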
15. Modular Control Software
- FIPL Memory Manager
  - Manages the Tree Bitmap data structure
  - Accepts route add/modify/delete requests → generates memory read/write commands
- NCHARGE
  - Provides reliable connectivity between multiple software processes and reconfigurable hardware modules
  - Sproull et al., "Control and Configuration Software for a Reconfigurable Networking Hardware Platform," FCCM '02
- Remote User Interface
  - Download the FPGA circuit, program the FPX, configure the switch, and submit route updates remotely via a web page
- More info at http://www.arl.wustl.edu/projects/fpx/
16. Current Work: Multi-Service Router
17. Towards Better FIPL Performance
- Several options for architecture optimizations to achieve a 200 MHz clock
  - Utilize on-chip BlockRAMs for a table-based implementation of CountOnes
- Focus on reduction of off-chip memory accesses
  - Root node extension and caching
  - Asymmetrical node extension
    - Stride lengths of 12/8/4/4/4
  - Empty path pruning (a form of path compression)
- Other algorithmic optimizations
  - More data structure compression
  - Investigate intelligent node caching techniques
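The table-based CountOnes idea can be sketched as a precomputed popcount lookup: in hardware the table would live in on-chip BlockRAM, while here it is a plain Python list split into two 8-bit lookups per 16-bit bitmap.

```python
# Precomputed popcounts for all 8-bit values (the BlockRAM table in hardware).
POPCOUNT8 = [bin(v).count("1") for v in range(256)]

def count_ones16(bitmap):
    """Population count of a 16-bit bitmap via two 8-bit table lookups."""
    return POPCOUNT8[bitmap >> 8] + POPCOUNT8[bitmap & 0xFF]
```

Replacing a wide combinational CountOnes with table lookups shortens the critical logic path, which is what matters for reaching a higher clock rate.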
18. Conclusions & Lessons
- Design, simulation, and implementation of a Longest-Prefix Match (LPM) search engine
- Achieved a favorable balance between lookup performance, memory efficiency, and update performance
  - Supports 500 Mb/s per 1% of the FPGA
  - Utilizes ~10 bytes per route entry
  - Supports 100k updates per second on-the-fly
- Scalable design provides for ease of use in various research systems (IP routers, programmable MSRs, etc.)
- Great insight gained by carrying algorithmic work through to a high-performance implementation
- High-performance FPGA design is hard
  - Opinion: CAD tools have not arrived yet
19. Thank you for listening.
Questions?
20. Towards Better Performance
- Several options for design optimizations to achieve a 200 MHz clock
- Reduce off-chip memory accesses via root node extension and caching
  - Brute-force node extension causes bitmap functions to grow exponentially
  - Represent the root node as an on-chip array indexed by the first i bits of the destination address
  - Each array entry stores the next hop for the LPM in the i-bit path and a pointer to an extending sub-tree
  - Maintain a 4-bit stride length for off-chip nodes
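The root-node-as-array idea can be sketched as follows. This is a minimal illustration with invented names: i is set to 8 for brevity (the slides suggest larger initial strides, e.g. 12), and short prefixes are replicated across all array slots they cover, so shorter prefixes must be installed before longer ones for correct LPM.

```python
I_BITS = 8  # illustrative initial stride

# Each slot holds (next_hop, extending_subtree_ptr); None means no entry.
root = [(None, None)] * (1 << I_BITS)

def install(prefix, length, next_hop):
    """Replicate a prefix of length <= I_BITS across every slot it covers."""
    base = prefix << (I_BITS - length)
    for slot in range(base, base + (1 << (I_BITS - length))):
        root[slot] = (next_hop, root[slot][1])  # keep any sub-tree pointer

def root_lookup(addr32):
    """Single on-chip array read, indexed by the first I_BITS of the address."""
    return root[addr32 >> (32 - I_BITS)]

install(0b1000, 4, 5)  # a /4-style short prefix, hypothetical next hop 5
```

One array read replaces the first one or two off-chip Tree Bitmap accesses, which is where the optimization pays off.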
21. Motivation: Advanced Network Services
22Supporting Advanced Network Services
- Pressing need for high-performance programmable
routers - Flexible for rapid deployment of new services
- No need to modify end-systems
- Must scale with reasonable per-port costs
- Thousands of ports each supporting optical links
- Thousands of flows per port
- Must be computationally robust
- Support next-generation services without
modification to infrastructure - Requires additional per-port processing resources
- Minimize resources required for baseline
functionality