Title: Fast Incremental Updates for Pipelined Forwarding Engines
1. Fast Incremental Updates for Pipelined Forwarding Engines
- Authors: Anindya Basu, Girija Narlikar
- Published in: IEEE/ACM Transactions on Networking, 2005
- Presenter: Yen Cheng Liu
- Date: 11/30
2. Outline
- Introduction
- Background
- Solving the pipelined architecture problem
- Route update characteristics
- Memory optimization
- Reducing write bubbles
3. Introduction
- The paper focuses on ASIC-based packet forwarding engines that utilize pipelining
- Main issues when handling route updates:
- The memory allocated to the trie must be balanced across stages
- The memory locations modified by an update must be limited in number and balanced across stages
4. Introduction
- Main contributions of the paper:
- An algorithm to build a trie with balanced memory allocation across pipeline stages
- Multiple optimizations aimed at reducing the number of modifications in each stage due to route updates
- A software-based scheme to process updates (similar to a shadow trie)
- Flexible
- Cost effective
5. Background
6. Pipelined Lookups Using Tries
- Each trie level is stored in a different pipeline stage
- A leaf-pushed trie is used
- The longest matching prefix is always at a leaf of the traversed path
- Updates are applied using write bubbles
- Each bubble consists of a sequence of (stage, location, value) triples, at most one triple per stage (see the sketch below)
- Minimizing the number of write bubbles reduces the disruption to the lookup process
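A minimal sketch of how such a write bubble might be represented in software; the class and field names here are illustrative choices, not the paper's data structures:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PipelineWrite:
    stage: int     # pipeline stage (trie level) to modify
    location: int  # memory address within that stage
    value: int     # new entry value (e.g., child pointer or next-hop index)

@dataclass
class WriteBubble:
    # One slot per stage; None means the bubble performs no write in that stage.
    # Keeping at most one write per stage lets the bubble occupy each stage for
    # only a single cycle as it flows down the pipeline.
    writes: List[Optional[PipelineWrite]]

def make_bubble(num_stages: int, triples: List[PipelineWrite]) -> WriteBubble:
    slots: List[Optional[PipelineWrite]] = [None] * num_stages
    for w in triples:
        assert slots[w.stage] is None, "at most one write per stage per bubble"
        slots[w.stage] = w
    return WriteBubble(slots)
```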
7. Solving the Pipelined Architecture Problem
- Forwarding engine model:
- A trie component that constructs and updates the routing trie
- A packing component that packs writes from a batch of consecutive route updates into write bubbles that are sent down the pipeline (a greedy sketch follows)
- A pipeline component that actually simulates the traversal of these write bubbles through a multi-stage pipeline
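A hedged sketch of what a packing component could do under the one-write-per-stage-per-bubble assumption; it ignores any ordering constraints between the writes of a single route update across stages, and the function name is mine:

```python
from collections import defaultdict, deque

def pack_into_bubbles(raw_writes, num_stages):
    """Greedily pack a batch of (stage, location, value) writes into write
    bubbles, each carrying at most one write per stage, while preserving the
    order of writes within every individual stage."""
    per_stage = defaultdict(deque)            # stage -> FIFO of (location, value)
    for stage, location, value in raw_writes:
        per_stage[stage].append((location, value))

    bubbles = []
    while any(per_stage[s] for s in range(num_stages)):
        bubble = [None] * num_stages          # one slot per pipeline stage
        for stage in range(num_stages):
            if per_stage[stage]:
                bubble[stage] = per_stage[stage].popleft()
        bubbles.append(bubble)
    return bubbles
```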
8. Forwarding engine model
9. Solving the Pipelined Architecture Problem
- Assumptions:
- The initial trie construction takes as input a snapshot of the entire routing table
- Bubbles are processed by the pipeline in the same order as they are generated by the packing component
- Only tries with fixed strides are considered
- Focus on leaf-pushed tries
- Writes to different pipeline stages can be combined into a single write bubble
- The packing component is permitted to pack pipeline writes from multiple route updates into a single write bubble
- Focus on IPv4 lookups
- The next-hop information is stored in a separate next-hop table that is distinct from the pipelined trie
10. Routing table observations
- Because 24-bit prefixes dominate today's routing tables, most route updates affect 24-bit prefixes
- The number of short prefixes is very low; however, an update to a short prefix causes a large number of modifications (the first level often has a stride of 12-16 bits)
- The address blocks allocated to an ISP's customers are sub-blocks of the address block allocated to the ISP, so the prefixes corresponding to the customers of a given ISP are typically neighboring 24-bit prefixes
- A link failure (recovery) in an ISP network disconnects (reconnects) some or all of its customer networks, which are represented by neighboring prefixes in the routing trie
- A large proportion of routes that are withdrawn get added back a few minutes later
11. Memory optimization
- Designing non-pipelined tries
- Controlled prefix expansion (CPE) constructs memory-efficient fixed-stride tries for the set of prefixes in a routing table using dynamic programming
- CPE notation:
- nodes(i): the number of nodes at level i of the 1-bit trie
- If one level terminates at bit position i and the next level terminates at bit position j (j > i), that next level requires nodes(i+1) * 2^(j-i) trie entries
- T[j, r]: the minimum memory required to cover the first j+1 bits using r trie levels
12. Designing non-pipelined tries
- The (r-1)th level is terminated at the bit position m that minimizes the total memory (a code sketch of this DP follows):
- T[j, r] = min over m in {r-2, ..., j-1} of ( T[m, r-1] + nodes(m+1) * 2^(j-m) )
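A runnable sketch of this dynamic program (the classic CPE recurrence); the function name, the boundary-recovery bookkeeping, and the input format node_counts[i] = number of 1-bit-trie nodes at level i are my choices, not taken from the paper:

```python
import math

def cpe_min_memory(node_counts, W, k):
    """CPE DP as summarized on the slides. node_counts[i] = number of nodes at
    level i of the 1-bit trie (root = level 0, at least W entries), W = address
    length in bits (32 for IPv4), k = number of trie levels. Returns the minimum
    total memory in trie entries and the terminating bit position of each level."""
    INF = math.inf
    # T[j][r]: minimum memory needed to cover bit positions 0..j with r levels.
    T = [[INF] * (k + 1) for _ in range(W)]
    choice = [[-1] * (k + 1) for _ in range(W)]

    for j in range(W):
        T[j][1] = 2 ** (j + 1)              # one level with a stride of j+1 bits
    for r in range(2, k + 1):
        for j in range(r - 1, W):           # need at least one bit per level
            for m in range(r - 2, j):       # previous level ends at bit position m
                cost = T[m][r - 1] + node_counts[m + 1] * 2 ** (j - m)
                if cost < T[j][r]:
                    T[j][r], choice[j][r] = cost, m

    # Recover the terminating bit position of each of the k levels.
    bounds, j = [], W - 1
    for r in range(k, 1, -1):
        bounds.append(j)
        j = choice[j][r]
    bounds.append(j)
    return T[W - 1][k], bounds[::-1]
```

For example, for IPv4 and an 8-level trie this would be called as cpe_min_memory(node_counts, 32, 8), with node_counts taken from the 1-bit trie built over the prefix set.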
13. Implications for memory usage and update performance
- CPE does not attempt to distribute the memory equally across pipeline stages
14. A New Algorithm for Pipelined Architectures
- The new algorithm, MinMax, is based on CPE
- Constraints:
- Each level in the fixed-stride trie must fit in a single pipeline stage
- The maximum memory allocated to a stage (over all stages) is minimized
- The total memory used is minimized, subject to the first two constraints
15. A New Algorithm for Pipelined Architectures
- The 1st and 3rd constraints are satisfied by the following equations
16. A New Algorithm for Pipelined Architectures
- Memory allocated to the rth level of the multi-bit trie: nodes(m+1) * 2^(j-m), where the (r-1)th level ends at bit position m and the rth level ends at bit position j
- Maximum memory allocated to any trie level: the maximum of these per-level allocations
- MinMax finds the split points that minimize this maximum (a code sketch follows)
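A hedged sketch of a MinMax-style dynamic program in the spirit of these slides: it minimizes the largest per-level memory and breaks ties by total memory. The lexicographic formulation, names, and input format are mine and may differ in detail from the paper's exact recurrences:

```python
import math

def minmax_strides(node_counts, W, k):
    """MinMax-style variant of the CPE DP: best[j][r] holds
    (max per-level memory, total memory) for covering bits 0..j with r levels,
    compared lexicographically so the per-level maximum dominates."""
    INF = math.inf
    best = [[(INF, INF)] * (k + 1) for _ in range(W)]
    choice = [[-1] * (k + 1) for _ in range(W)]

    for j in range(W):
        best[j][1] = (2 ** (j + 1), 2 ** (j + 1))   # one level of stride j+1 bits
    for r in range(2, k + 1):
        for j in range(r - 1, W):
            for m in range(r - 2, j):               # previous level ends at bit m
                prev_max, prev_total = best[m][r - 1]
                level_mem = node_counts[m + 1] * 2 ** (j - m)
                cand = (max(prev_max, level_mem), prev_total + level_mem)
                if cand < best[j][r]:               # lexicographic comparison
                    best[j][r], choice[j][r] = cand, m

    # Recover the terminating bit position of each level.
    bounds, j = [], W - 1
    for r in range(k, 1, -1):
        bounds.append(j)
        j = choice[j][r]
    bounds.append(j)
    return best[W - 1][k], bounds[::-1]
```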
17. A New Algorithm for Pipelined Architectures
- Applying the constraints
- Main goal: reduce the maximum memory across stages
- A memory-efficient trie typically has smaller strides and hence less replication of routes in the trie
18. A New Algorithm for Pipelined Architectures
- Worst-case memory bound
- The paper derives a bound on the maximum memory per stage for a k-level trie
19. Performance
20. Reducing write bubbles
- Four optimizations to achieve this goal:
- Separating out updates to short routes
- Node pull-ups
- Eliminating excess writes
- Caching deleted subtrees
21. Separating out updates to short routes
- Updates to short routes are separated out and handled specially (see the sketch below)
- Example: the addition of a 7-bit route can cause up to 2^11 writes (first-level stride of 16 bits)
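A minimal sketch of the idea, assuming prefixes shorter than the first-level stride are kept in a small separate structure so their updates avoid the expansion writes above; the dispatch function and the separate-table mechanism shown here are illustrative assumptions, not the paper's exact design:

```python
def apply_route_update(prefix, prefix_len, next_hop, first_stride,
                       short_table, trie_update_writes):
    """Dispatch one route update. A prefix shorter than the first-level stride
    would be leaf-pushed into up to 2**(first_stride - prefix_len) locations of
    the first stage, so it is stored in a small separate table instead
    (illustrative mechanism). Returns the number of pipeline writes generated."""
    if prefix_len < first_stride:
        short_table[(prefix, prefix_len)] = next_hop   # one software-side write
        return 0
    # Long routes follow the normal path: the trie component computes the
    # per-stage writes, which are then packed into write bubbles.
    return len(trie_update_writes(prefix, prefix_len, next_hop))
```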
22. Node pull-up
23. Node pull-up
- State trie
- The pull-up information (in the form of a changed stride length) is stored in the node where the pull-up has occurred
- A software-based state trie can store this information (a sketch follows)
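A small sketch of what such a state-trie node could look like, assuming the software keeps one bookkeeping node per hardware trie node; the class and function names are illustrative:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class StateTrieNode:
    """Software-only bookkeeping node: records the stride actually in use at
    this point of the pipelined trie, including any pull-up that changed it."""
    default_stride: int                     # stride from the fixed-stride design
    pulled_up_stride: Optional[int] = None  # set once a pull-up has occurred here
    children: Dict[int, "StateTrieNode"] = field(default_factory=dict)

    def effective_stride(self) -> int:
        if self.pulled_up_stride is not None:
            return self.pulled_up_stride
        return self.default_stride

def record_pullup(node: StateTrieNode, extra_bits: int) -> None:
    """Record in software that this node now consumes extra_bits additional bits,
    so later updates compute their pipeline writes against the new layout."""
    node.pulled_up_stride = node.effective_stride() + extra_bits
```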
24. Eliminating excess writes
- Neighboring routes are often added with the same timestamp, so a batch of updates frequently writes the same trie locations more than once; only the final values need to reach the pipeline (see the sketch below)
25. Eliminating excess writes
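A minimal sketch of this coalescing step, assuming writes are represented as (stage, location, value) triples as in the earlier sketches; the function name is mine:

```python
def eliminate_excess_writes(raw_writes):
    """Keep only the last write to each (stage, location) within a batch of
    (stage, location, value) writes; intermediate values that would be
    overwritten anyway never need to be sent down the pipeline."""
    final_value = {}
    order = []                              # first-seen order of touched locations
    for stage, location, value in raw_writes:
        key = (stage, location)
        if key not in final_value:
            order.append(key)
        final_value[key] = value            # later writes in the batch win
    return [(s, l, final_value[(s, l)]) for (s, l) in order]
```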
26. Caching deleted subtrees
27. Caching deleted subtrees
- When a route withdrawal causes a subtree to be deleted, the trie component caches the subtree in software and remembers the location of the cached subtree in the pipeline memory
- Therefore, the only information that must be stored with the cached subtree is the prefix that was pushed down and the last route in the subtree that was withdrawn
28. Caching deleted subtrees
- Memory requirements
- The cache size is limited
- FIFO replacement is applied when the cache is full (see the sketch below)
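A sketch of a bounded FIFO cache for this optimization, based on the information the slides say must be kept per cached subtree; the class layout and method names are illustrative assumptions:

```python
from collections import OrderedDict

class DeletedSubtreeCache:
    """Bounded FIFO cache: a withdrawn subtree stays where it is in pipeline
    memory, and software remembers only its location, the prefix that was
    pushed down into it, and the last route withdrawn from it."""

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self.entries = OrderedDict()   # last_withdrawn_route -> (location, pushed_prefix)

    def cache(self, last_withdrawn_route, pipeline_location, pushed_prefix):
        if len(self.entries) >= self.max_entries:
            self.entries.popitem(last=False)   # FIFO: evict the oldest cached subtree
        self.entries[last_withdrawn_route] = (pipeline_location, pushed_prefix)

    def reuse(self, added_route):
        """If a recently withdrawn route is added back, return the cached entry so
        the subtree can be reattached with a single pointer write."""
        return self.entries.pop(added_route, None)
```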
29. Reducing write bubbles
- Benefits of applying each optimization individually, and together with the other optimizations
30. Reducing write bubbles
- Three schemes are compared here
31. Reducing write bubbles
- The experiments show that 4-6 pipeline stages work best when all optimizations are applied
32. Prefix Table Dynamics
- Performing a large number of incremental updates may cause the trie to gradually become unbalanced
- MinMax may need to be re-applied