Title: 19 June 20201
1 Packet Classification for Core Routers: Is there an alternative to CAMs?
- Paper by
- Florin Baboescu, Sumeet Singh, George Varghese
- Presentation by
- Edward W. Spitznagel
2 Outline
- Introduction
- Packet Classification Problem
- Extended Grid-of-Tries (EGT)
- Grid-of-Tries
- Extending Grid-of-Tries into EGT
- Path Compression
- Results
- Summary
3 Packet Classification Problem
- Suppose you are a firewall, or QoS router, or network monitor ...
- You are given a list of rules (filters) that determine how to process incoming packets, based on the packet header fields
- Goal: when a packet arrives, find the least-cost rule that matches the packet's header fields
4 Packet Classification Problem
- Example: packet arrives with header (0101, 0010, 3, 5, UDP)
- classification result: filter c
- filter b also matches, but c has lower cost
- Easy when we have only a few rules; very hard with 100,000 rules and packets arriving at 40 Gb/s
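As a baseline, the problem statement can be sketched as a plain linear search. The slide's filter table did not survive in this transcript, so the filters a, b, and c below, their costs, and the names `Filter`, `matches`, and `classify` are hypothetical; they were chosen only so that b and c both match the example header and c is the cheaper of the two.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Header = Tuple[str, str, int, int, str]  # (src bits, dst bits, src port, dst port, protocol)


@dataclass(frozen=True)
class Filter:
    name: str
    cost: int
    src_prefix: str             # bit-string prefix; "" matches any address
    dst_prefix: str
    src_port: Tuple[int, int]   # inclusive range
    dst_port: Tuple[int, int]   # inclusive range
    protocol: Optional[str]     # None acts as a wildcard


def matches(f: Filter, header: Header) -> bool:
    """True if every field of the header satisfies the filter."""
    src, dst, sport, dport, proto = header
    return (src.startswith(f.src_prefix)
            and dst.startswith(f.dst_prefix)
            and f.src_port[0] <= sport <= f.src_port[1]
            and f.dst_port[0] <= dport <= f.dst_port[1]
            and (f.protocol is None or f.protocol == proto))


def classify(filters: Sequence[Filter], header: Header) -> Optional[Filter]:
    """Linear search: return the least-cost matching filter, or None."""
    best = None
    for f in filters:
        if matches(f, header) and (best is None or f.cost < best.cost):
            best = f
    return best


# Hypothetical filter set (not the table from the slide).
FILTERS = [
    Filter("a", 1, "1", "00", (0, 15), (0, 15), "TCP"),
    Filter("b", 3, "01", "", (0, 15), (5, 5), None),
    Filter("c", 2, "0101", "001", (3, 3), (0, 15), "UDP"),
]
HEADER = ("0101", "0010", 3, 5, "UDP")
result = classify(FILTERS, HEADER)
```

Every lookup touches every filter, which is why this approach stops being viable at 100,000 rules and 40 Gb/s.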
5 Packet Classification - Metrics
- Metrics for evaluating classification algorithms
- Time complexity of classifying a packet
- often expressed as the number of memory accesses required
- Storage requirements of data structures
- Number of fields that can be handled
6 Packet Classification in Core Routers
- Many core routers have fairly large (e.g. 2000 rule) databases
- Expected to grow; in fact, may be limited by current technology
- Classification in core routers must be done quickly
- Emerging core routers operate at 40 Gb/s. With 40-byte packets, that means one packet every 8 nsec
- Thus the general belief that brute-force hardware (TCAMs) will be necessary to support packet classification in core routers
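The 8 nsec figure is simple arithmetic: 40 bytes is 320 bits, and 320 bits at 40 Gb/s take 8 nsec. A minimal check, assuming back-to-back minimum-size packets and ignoring any framing overhead (`packet_interval_ns` is a hypothetical helper name):

```python
def packet_interval_ns(packet_bytes: int, link_gbps: float) -> float:
    """Time between back-to-back packets, in nanoseconds.

    Bits divided by Gb/s gives nanoseconds directly,
    because 1 Gb/s is one bit per nanosecond.
    """
    return packet_bytes * 8 / link_gbps


interval = packet_interval_ns(40, 40)  # 320 bits at 40 Gb/s
```

Every memory access the classifier makes for one packet has to fit inside that interval, unless lookups are pipelined or run in parallel.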
7 Packet Classification - TCAM disadvantages
- Ternary CAMs (TCAMs) have disadvantages
- Density Scaling: 10-12 transistors per bit of TCAM (vs. 4-6 transistors per bit of SRAM)
- Power Scaling: due to performing all comparisons in parallel
- Time Scaling: 5-10 nsec for a TCAM operation
- Extra Chips: requires TCAM chip(s) and bridge ASIC
- Rule Multiplication for ranges: arbitrary ranges are represented by sets of prefixes; very inefficient
- Thus, we consider an algorithmic solution...
8 Packet Classification trends
- Packet classification in 2D: several good methods
- Grid of Tries, Area-based QuadTrees, FIS-trees, Tuple-space search, range trees and fractional cascading
- Classification in K dimensions, where K > 2, is hard
- O(log^{K-1} N) time and linear space, or O(log N) time and O(N^K) space, for N filters in K dimensions
- Modern algorithms use heuristics to exploit the structure and properties that real-world filter databases tend to have
- Example: RFC and HiCuts algorithms
9 Extended Grid of Tries (EGT)
- Observation: core router tables studied have a low maximum filter depth in the 2D space defined by <Source IP Address, Destination IP Address>
- in this case, low means 20 or less
- i.e. no point in this 2D plot of filters is covered by more than 20 filters
10 Extended Grid of Tries (EGT)
- The Basic Idea
- Use an existing 2D scheme to classify with respect to Source IP and Dest. IP
- Then, do linear search over a small list of possible matches (at most 20, but typically around 5)
- EGT uses Grid-of-Tries as the 2D scheme
11 Grid of Tries - Intuition
- Imagine a search trie containing Dest. Address prefixes
- Now add a Source Address trie under each Dest. prefix
- Filters are stored in these tries, perhaps multiple times
12 Grid of Tries - Intuition
- Reduce storage by storing each filter only once
- But we now need to backtrack to ancestors' source tries during a search...
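The store-once layout and the cost it introduces can be sketched as follows. This is a minimal hierarchical trie, not Grid-of-Tries itself: it has no switch or jump pointers, and the names `Node`, `HierarchicalTrie`, and `match_all` are hypothetical. The search visits the source trie hanging off every destination prefix that matches (the longest match and each of its ancestors), which is the backtracking cost the slide refers to.

```python
class Node:
    def __init__(self):
        self.child = {}        # '0' or '1' -> Node
        self.src_trie = None   # root of a source trie (destination-trie nodes only)
        self.filters = []      # filter names stored here (source-trie nodes only)


def _descend(root, bits):
    """Walk or create the path spelled by a bit string; return its last node."""
    node = root
    for b in bits:
        node = node.child.setdefault(b, Node())
    return node


class HierarchicalTrie:
    def __init__(self):
        self.dst_root = Node()

    def insert(self, name, dst_prefix, src_prefix):
        """Store the filter exactly once, under its own destination prefix."""
        d = _descend(self.dst_root, dst_prefix)
        if d.src_trie is None:
            d.src_trie = Node()
        _descend(d.src_trie, src_prefix).filters.append(name)

    def match_all(self, dst, src):
        """Return every filter whose destination and source prefixes both match."""
        found = []
        d, depth = self.dst_root, 0
        while d is not None:
            if d.src_trie is not None:
                # Search the source trie attached to this destination prefix.
                s, i = d.src_trie, 0
                while s is not None:
                    found.extend(s.filters)
                    s = s.child.get(src[i]) if i < len(src) else None
                    i += 1
            d = d.child.get(dst[depth]) if depth < len(dst) else None
            depth += 1
        return found


trie = HierarchicalTrie()
trie.insert("f1", "0", "10")
trie.insert("f2", "00", "1")
trie.insert("f3", "", "")
trie.insert("f4", "01", "1")
hits = trie.match_all("0010", "1011")
```

In this sketch the work grows with the number of matching destination prefixes multiplied by the source trie depth, since each source trie is searched from its root.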
13 Grid of Tries
- Use switch pointers to improve search efficiency
- allows us to jump to the next source trie among
ancestors, instead of backtracking
14 Extended Grid of Tries
- EGT uses jump pointers instead of switch pointers
- EGT requires the 2D search to return all filters matching in those dimensions
- Thus, some of the nodes skipped by a switch pointer cannot be skipped in an EGT search
- So, search complexity is a bit higher than in ordinary Grid-of-Tries
- worst case search takes W + (H+1)W = (H+2)W time, where W = time to find best prefix in a single trie, and H = max trie height (H = 32 for IPv4)
- but the authors expect that it typically takes L x W, with L being a small value (reflecting the low maximum prefix containment seen in most filter databases)
15 EGT with Path Compression (EGT-PC)
- EGT-PC adds Path Compression, whereby single branching paths are removed
- Improves search time and storage requirements, particularly for small filter sets
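Removing single branching paths can be illustrated on a plain one-dimensional binary trie. This is a generic path-compression sketch, not the EGT-PC data structure: it collapses each chain of one-child, filter-free nodes into a single edge labelled with the skipped bits. The names `TrieNode`, `compress`, `lookup`, and `count_nodes` are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # edge label (bit string) -> TrieNode
        self.filters = []


def insert(root, prefix, name):
    """Insert into the uncompressed trie, one bit per edge."""
    node = root
    for bit in prefix:
        node = node.children.setdefault(bit, TrieNode())
    node.filters.append(name)


def compress(node):
    """Collapse chains of single-child nodes that hold no filters."""
    new_children = {}
    for label, child in node.children.items():
        while len(child.children) == 1 and not child.filters:
            (next_label, next_child), = child.children.items()
            label += next_label          # absorb the skipped bits into the edge
            child = next_child
        compress(child)
        new_children[label] = child
    node.children = new_children


def lookup(root, key):
    """Collect filters on the path matching key; works before or after compress."""
    found = list(root.filters)
    node, pos = root, 0
    while True:
        for label, child in node.children.items():
            if key.startswith(label, pos):
                node, pos = child, pos + len(label)
                found.extend(node.filters)
                break
        else:
            return found


def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.children.values())


root = TrieNode()
for prefix, name in [("0000", "f1"), ("0011", "f2"), ("1", "f3")]:
    insert(root, prefix, name)
nodes_before = count_nodes(root)
compress(root)
nodes_after = count_nodes(root)
```

With only three prefixes the trie drops from 8 nodes to 5. Sparse tries contain the most single-branch chains, which is consistent with the slide's remark that small filter sets benefit most.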
16 EGT-PC Results
- Storage requirements impressively low (almost as low as TCAM!)
- since we store each filter only once
- Storage, in terms of number of 32-bit words
- Classification time is good, but not as impressive
- also a result of storing each filter once; we therefore may need to traverse multiple Source tries
- Memory accesses, in terms of 32-bit word accesses
17 EGT-PC Results
- Memory usage by component
- Storage for list is proportional to number of filters
- Storage for trie is roughly proportional to number of filters
- Path compression reduces storage by a factor of 3, roughly
18 EGT-PC Results with larger databases
- Larger databases are generated using smaller ones as a core
- randomly generated prefixes for Source Address and Destination Address, using the prefix length distributions from the original databases
- Other fields are randomly derived from the distributions in the original databases
- Memory Accesses: still not bad, even for large databases
- Storage Requirements: still appear to be linear
19 EGT-PC Remarks
- May only work well with core routers
- Lookups
- faster than HiCuts; not as fast or as deterministic as RFC
- can easily be characterized by maximum 2D filter depth
- Storage requirements quite good
- using Grid-of-Tries for the 2D scheme is a wise choice (storage efficiency)
- Very nice to have results comparing several different algorithms (unlike nearly all previous papers)
- It is possible to apply the basic EGT idea, but with a different 2D scheme
- Tuple Space, FIS-trees, RFC in 2D, and perhaps Area-based QuadTrees
- The trick is that the 2D scheme must be modified to return all filters matching those 2 dimensions (rather than just the least-cost filter matching those 2 dimensions)
20 Comparison of different algorithms
[Figure: Lookup Speed, on a scale from Best to Worst. Labels: Linear Search, EGT, HiCuts-1, TCAM, EGT-PC, RFC, HiCuts-4]
[Figure: Storage Requirements, on a scale from Best to Worst. Labels: RFC, Linear Search, EGT, HiCuts-1, HiCuts-4, EGT-PC, TCAM]
21 Summary
- Packet Classification: given packet P and list of filters F, find least-cost filter in F that matches P
- Important metrics: lookup time, data structure size
- Extended Grid of Tries
- Core routers have a low maximum filter depth in the 2D space defined by <Src. Addr, Dest. Addr>
- Thus, we can perform a 2D search via Grid of Tries, and then do a linear search over the small list of possible matches
- and we can add path compression to the trie
- Lookup time is fairly good; storage requirements are very good.
22 Thanks -- Questions?
23 Backup slides to follow...
24 Geometric Representation
- Filters with K fields can be represented geometrically in K dimensions
- Example: [figure showing filters a, b, and c]
25 Ternary CAMs
- Most popular practical approach to high-performance packet classification
- Hardware compares query word (packet header) to all stored words (filters) in parallel
- each bit of a stored word can be 0, 1, or X (don't care)
- Very fast, but not without drawbacks
- High power consumption limits scalability
- inefficient representation of ranges
26 Ternary CAM - Example
(Now perform priority resolution...)
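The slide's example table is not in this transcript, so here is a small software model of the same behaviour. It assumes the usual convention that entries are stored in priority order and the lowest matching index wins. The three stored words are made up for illustration, and `ternary_match` and `tcam_lookup` are hypothetical names. Real hardware compares all entries at once; the loop below only models the result.

```python
def ternary_match(word: str, pattern: str) -> bool:
    """True if each pattern bit is X (don't care) or equals the word bit."""
    return len(word) == len(pattern) and all(
        p in ("X", w) for w, p in zip(word, pattern)
    )


def tcam_lookup(entries, word):
    """Return the index of the first (highest-priority) matching entry, or None."""
    for index, pattern in enumerate(entries):
        if ternary_match(word, pattern):
            return index
    return None


# Illustrative stored words, highest priority first.
ENTRIES = ["01XX", "0X1X", "XXXX"]
all_matches = [i for i, p in enumerate(ENTRIES) if ternary_match("0110", p)]
winner = tcam_lookup(ENTRIES, "0110")
```

For the query word 0110 all three entries match, and priority resolution selects index 0.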
27 Range Matching in TCAMs
- Convert ranges into sets of prefixes
- 1-4 becomes 001, 01, and 100
- 3-5 becomes 011 and 10
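The conversion can be sketched with the usual greedy split: starting at the low end of the range, repeatedly take the largest power-of-two-aligned block that still fits. The function name `range_to_prefixes` and the 3-bit field width are assumptions made to reproduce the slide's two examples.

```python
def range_to_prefixes(lo: int, hi: int, width: int):
    """Split the inclusive range [lo, hi] of width-bit values into prefixes.

    Each prefix is returned as a bit string; "" would mean "match everything".
    """
    prefixes = []
    while lo <= hi:
        # Largest block size allowed by the alignment of lo
        # (lo & -lo isolates the lowest set bit).
        size = lo & -lo if lo else 1 << width
        # Shrink the block until it no longer runs past hi.
        while size > hi - lo + 1:
            size >>= 1
        bits = width - (size.bit_length() - 1)   # prefix length in bits
        if bits:
            prefixes.append(format(lo >> (width - bits), "0{}b".format(bits)))
        else:
            prefixes.append("")
        lo += size
    return prefixes


first = range_to_prefixes(1, 4, 3)
second = range_to_prefixes(3, 5, 3)
```

Here `first` is 001, 01, 100 and `second` is 011, 10, matching the slide.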
28 Range Matching in TCAMs
[figure showing filters a, b, c, d, e, and f]
- With two 16-bit range fields, a single rule could require up to 900 TCAM entries!
- Typical case: entire filter set expands by a factor of 2 to 6
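The figure of 900 can be checked by counting. A W-bit range such as [1, 2^W - 2] needs 2W - 2 prefixes, which is 30 for W = 16, and a rule with two such fields needs one TCAM entry per pair of prefixes. The sketch below counts prefixes with a greedy aligned-block split. `prefix_count` is a hypothetical helper, and the chosen range is one known bad case rather than an exhaustive search for the worst.

```python
def prefix_count(lo: int, hi: int, width: int) -> int:
    """Number of prefixes needed to cover the inclusive range [lo, hi]."""
    count = 0
    while lo <= hi:
        size = lo & -lo if lo else 1 << width   # largest block aligned at lo
        while size > hi - lo + 1:               # shrink until it fits the range
            size >>= 1
        lo += size
        count += 1
    return count


W = 16
per_field = prefix_count(1, 2**W - 2, W)   # one bad-case 16-bit range
entries = per_field * per_field            # cross product over two range fields
```

This gives 30 prefixes per field and 30 x 30 = 900 entries for the single rule.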