Title: Towards a Packet Classification Benchmark
1 Towards a Packet Classification Benchmark
- ARL Current Research Talk
- 20 October 2003
2 Packet Classification Example
Query: Packet from 12.34.244.1 going to 168.92.44.32 using TCP from port 1200 to port 1450
Result: Decrypt all packets using AES; transmit packet on port 3
Query: Packet from 12.34.244.1 going to 168.92.44.32 using TCP from port 1200 to port 1450
Result: Encrypt packet using AES; send copy of header to usage accounting with userID 110; transmit packet on port 5
3 Formal Problem Statement
- Given a packet $P$ containing fields $P_j$ and a collection of filters $F$, with each filter $F_i$ containing fields $F_{ij}$, select the highest-priority exclusive filter and the $k$ highest-priority non-exclusive filters such that, for each selected filter, $F_{ij}$ matches $P_j$ for all $j$
- Performance tradeoffs are commonly characterized by the point location problem in computational geometry
- For $n$ regions defined in $j$ dimensions, with $j > 3$, a point may be located in multi-dimensional space in $O(\log n)$ time with $O(n^j)$ space, or in $O(\log^{j-1} n)$ time with $O(n)$ space
[Figure: example with n = 13 regions in j = 2 dimensions; a packet header maps to a point in 2-D space (axes: Source Address, Destination Address)]
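The problem statement can be made concrete with a brute-force sketch. It uses linear search over all filters. This is the baseline that point location and the heuristic algorithms improve on, and it is not one of those algorithms. The `Filter` and `Packet` records below are hypothetical: prefixes are CIDR strings, port ranges are inclusive, `None` is a protocol wildcard, and a larger `priority` value is assumed to mean higher priority.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Filter:
    name: str
    src: str                  # source prefix in CIDR form, e.g. "12.34.0.0/16"
    dst: str                  # destination prefix in CIDR form
    sport: Tuple[int, int]    # inclusive source port range
    dport: Tuple[int, int]    # inclusive destination port range
    proto: Optional[str]      # None means "any protocol"
    priority: int             # assumption: larger value = higher priority
    exclusive: bool


@dataclass
class Packet:
    src: str
    dst: str
    sport: int
    dport: int
    proto: str


def matches(f: Filter, p: Packet) -> bool:
    """True when every filter field F_ij matches the header field P_j."""
    return (
        ipaddress.ip_address(p.src) in ipaddress.ip_network(f.src)
        and ipaddress.ip_address(p.dst) in ipaddress.ip_network(f.dst)
        and f.sport[0] <= p.sport <= f.sport[1]
        and f.dport[0] <= p.dport <= f.dport[1]
        and (f.proto is None or f.proto == p.proto)
    )


def classify(p: Packet, filters: List[Filter], k: int):
    """Linear search: best exclusive match plus the k best non-exclusive matches."""
    hits = sorted((f for f in filters if matches(f, p)), key=lambda f: f.priority, reverse=True)
    best = next((f for f in hits if f.exclusive), None)
    return best, [f for f in hits if not f.exclusive][:k]


# Hypothetical filter set for the example packet from slide 2.
filters = [
    Filter("encrypt", "12.34.0.0/16", "168.92.0.0/16", (0, 65535), (1450, 1450), "TCP", 10, True),
    Filter("forward", "0.0.0.0/0", "168.92.44.0/24", (0, 65535), (0, 65535), None, 5, True),
    Filter("account", "12.34.244.0/24", "0.0.0.0/0", (1024, 65535), (0, 65535), "TCP", 7, False),
]
pkt = Packet("12.34.244.1", "168.92.44.32", 1200, 1450, "TCP")
best, extra = classify(pkt, filters, k=1)
print(best.name, [f.name for f in extra])  # encrypt ['account']
```

All three filters match the packet. "forward" loses to the higher-priority exclusive filter "encrypt", while "account" is reported as the single non-exclusive match.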
4 Motivation for a Benchmark
- No benchmark currently exists in industry or the research community
- Performance of the two most effective packet classification solutions (TCAMs and heuristic algorithms) depends on the composition of filters in the filter set
- TCAM capacity depends on port range specifications
- Range conversion to prefixes may cause a single filter to occupy $[2(w-1)]^k$ TCAM slots (900 slots in the worst case for TCP/UDP source/destination ports, where $w = 16$ and $k = 2$)
- $w$: number of bits required to represent a point in the range
- $k$: number of fields specified by ranges
- Observed expansion factors range from 40 to 520
- Fastest algorithms leverage heuristics and optimize average performance
- Cutting algorithms (E-TCAMs, HiCuts, HyperCuts)
- Tuple-space algorithms
- Plethora of new packet classification products
- Network processors, packet processors, traffic managers, TCAMs
- Intel, IBM, Silicon Access, Mosaid, IDT (Solidium), SiberCore, Cypress, etc.
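The $[2(w-1)]^k$ worst case can be checked with the standard greedy range-to-prefix split. This is a sketch of that well-known technique, not necessarily the conversion used by any particular TCAM product. The function name `range_to_prefixes` is illustrative. The range $[1, 2^w - 2]$ needs $2(w-1)$ prefixes per field, and a filter with $k$ such ranges expands to the cross product of its per-field prefixes.

```python
def range_to_prefixes(lo: int, hi: int, w: int = 16):
    """Split the inclusive range [lo, hi] of w-bit values into a list of
    (value, prefix_length) prefixes using the standard greedy split."""
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block aligned at lo (lo & -lo is lo's lowest set bit).
        size = lo & -lo if lo else 1 << w
        # Shrink the block until it no longer overshoots hi.
        while lo + size - 1 > hi:
            size >>= 1
        # A block of 2**b values is a prefix of length w - b.
        prefixes.append((lo, w - (size.bit_length() - 1)))
        lo += size
    return prefixes


w, k = 16, 2
per_field = len(range_to_prefixes(1, 2**w - 2, w))  # worst case: [1, 2^w - 2]
print(per_field, per_field ** k)                     # 30 900
print(range_to_prefixes(1024, 65535))  # [(1024, 6), (2048, 5), (4096, 4), (8192, 3), (16384, 2), (32768, 1)]
```

For example, the common range 1024-65535 costs six TCAM entries per field.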
5 Motivation for a Benchmark (2)
- Security and confidentiality concerns limit access to real databases for study and performance evaluation
- Well-connected researchers have gained access but are unable to share
- Lack of large real databases due to limited deployment of high-performance packet classification solutions
- Performance evaluations with real databases are limited by the size and structure of the samples
- Goal: develop a benchmark capable of capturing relevant characteristics of real databases while providing structured mechanisms for augmenting database composition and analyzing performance effects
- Should have value for three distinct communities: researchers, product vendors, and product consumers
6 Related Work
- IETF Benchmarking Methodology Working Group (BMWG) developed benchmark methodologies for Forwarding Information Base (FIB) routers and firewalls
- FIB methodology focuses on performance evaluation of routers at transmission interfaces
- Firewall methodology is a high-level testing methodology with no detailed recommendations on filter composition
- Network Processing Forum has a benchmarking initiative
- Produced IP lookup and switch fabric benchmarks
- Thus far, only IBM and Intel have published results for IP lookup
- No details or announcements regarding packet classification
- Performance evaluation by researchers
- Most randomly select prefixes from forwarding tables and use existing protocol and port range combinations
- Baboescu and Varghese added refinements for controlling the number of zero-length prefixes and prefix nesting
7 Related Work (2)
- Woo (Infocom 2000) provided strong motivation for a benchmark
- Provided a high-level overview of filter composition for various environments
- ISP Peering Router, ISP Core Router, Enterprise Edge Router, etc.
- Generated large synthetic databases but provided few details regarding database construction
- No mechanisms for varying filter composition
8 Understanding Filter Composition
- The most complex packet filters typically appear in firewall and edge router filter sets
- Heterogeneous applications: network address translation (NAT), virtual private networks (VPNs), and resource reservation
- Firewall filters are created manually by a system administrator using standard tools such as Cisco Firewall MC
- Model of filter construction: specify the communicating subnets, then specify the application (or set of applications)
- TCP and UDP identify applications via 16-bit port numbers
- Services are provided to unknown clients via contact ports in the range of well-known (or system) ports assigned by IANA
- Since 1993, the system port range has been 0-1023
- Established sessions typically use a unique port in the ephemeral port range 1024-65535
- IANA manages a list of user registered ports in the range 1024-49151
- Limited number of protocols in use, dominated by TCP and UDP
9 Analyzing Database Structure
- Engaged in an iterative process of analysis in order to identify useful metrics
- Accurately capture database structure
- Goal: identify methods and metrics useful for constructing synthetic databases
- Defined new metrics
- Joint address prefix length distributions
- Scope metric: assesses the specificity of filters on a logarithmic scale
- Skew metric: assesses the number of subnets covered by a given filter set
- Quantifies branching in the binary tree representation of address prefixes
10 Scope Definition
- From a geometric perspective, a filter defines a region in 5-d space
- Volume of the region is the product of the 1-d lengths specified by the filter fields
- e.g., number of addresses covered by the source address prefix
- Points in 5-d space correspond to packet headers
- Filter properties are commonly defined as a tuple specification, or a vector $t$ with fields:
- $t_0$: source address prefix length, $[0, 32]$
- $t_1$: destination address prefix length, $[0, 32]$
- $t_2$: source port range width, $[0, 2^{16}]$
- $t_3$: destination port range width, $[0, 2^{16}]$
- $t_4$: protocol specification, Boolean (specified, not specified)
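The slide does not spell out the scope formula. One reading fits both the volume definition (log2 of the product of field lengths) and the endpoints on the next slide (exact match = 0, default = 104). Treating the protocol as the 8-bit IPv4 protocol field, it is $\text{scope} = (32 - t_0) + (32 - t_1) + \log_2 t_2 + \log_2 t_3 + 8(1 - t_4)$. The function below is a minimal sketch of that assumed formula. The name `scope` is hypothetical.

```python
import math


def scope(t0: int, t1: int, t2: int, t3: int, t4: bool) -> float:
    """Assumed scope: log2 of the number of 5-d header points a filter covers.

    t0, t1: source/destination address prefix lengths (0..32)
    t2, t3: source/destination port range widths (1..2**16 ports)
    t4:     True if the 8-bit protocol field is specified
    """
    bits = (32 - t0) + (32 - t1)           # a /p prefix covers 2**(32 - p) addresses
    bits += math.log2(t2) + math.log2(t3)  # a range of width w covers w ports
    bits += 0 if t4 else 8                 # a wildcard protocol covers 2**8 values
    return bits


print(scope(32, 32, 1, 1, True))           # exact match: 0.0
print(scope(0, 0, 2**16, 2**16, False))    # default filter: 104.0
print(scope(24, 0, 2**16, 1, True))        # 8 + 32 + 16 + 0 + 0 = 56.0
```

Because the scale is logarithmic, each additional prefix bit lowers the scope by exactly 1, which halves the number of covered headers.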
11 Scope Distributions
- Scope distribution characterizes the specificity of filters in the database
- Exact match filters have scope 0
- Default filters have scope 104
- Notable spikes near the low end of the distribution
- Wide variance
12 Joint Prefix Length Distributions
- Observe large spikes in the joint distribution along the edges
- Unlike in forwarding tables, /0 and /32 prefixes are common in prefix length pairs
- Strong motivation for capturing the joint distribution
- Observe a correlation with port range specifications (not shown)
13 Joint Prefix Length Distributions (2)
- For synthetic database generation, we want to
- Select a prefix length pair based on total prefix length
- Total length is specified by the diagonals of the joint distribution
- Allow the distribution to be modified
- Represent the joint distribution by a collection of 1-d distributions
- Build a total length distribution over $[0, 64]$
- Bin by the sum of the prefix lengths
- For each non-empty bin in the total length distribution, build a source length distribution for the prefix pairs in the bin
- Destination address prefix length = total length - source address prefix length
- Allows for a high-level input parameter for address scope adjustment
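The decomposition above can be sketched in a few lines. The joint distribution is stored as one total-length distribution plus one source-length distribution per non-empty total bin. Sampling first draws a total length, then a source length, and derives the destination length. The seed pairs and the names `build_tables` and `sample_pair` are hypothetical illustrations, not the tool's actual parameter file format.

```python
import random
from collections import Counter, defaultdict


def build_tables(pairs):
    """Represent a joint (src_len, dst_len) distribution as a total-length
    distribution plus one source-length distribution per non-empty total bin."""
    total = Counter(s + d for s, d in pairs)
    src_given_total = defaultdict(Counter)
    for s, d in pairs:
        src_given_total[s + d][s] += 1
    return total, src_given_total


def sample_pair(total, src_given_total, rng):
    """Draw a total length, then a source length from that bin's distribution."""
    t = rng.choices(list(total), weights=list(total.values()))[0]
    src_table = src_given_total[t]
    s = rng.choices(list(src_table), weights=list(src_table.values()))[0]
    return s, t - s  # destination length = total length - source length


# Hypothetical seed pairs, for illustration only.
seed = [(32, 32)] * 5 + [(24, 24)] * 3 + [(32, 16)] * 2 + [(0, 32)] * 2
total, src_given_total = build_tables(seed)
rng = random.Random(1)
samples = [sample_pair(total, src_given_total, rng) for _ in range(1000)]
print(sorted(total.items()))  # [(32, 2), (48, 5), (64, 5)]
```

Keeping total length as the first-level distribution is what makes a single scope knob possible. Biasing only the total-length draw shifts filters toward longer or shorter combined prefixes without rebuilding the per-bin source distributions.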
14 Skew Definition
- Want a high-level characterization of address space coverage by filters (also want to anonymize IP addresses)
- A complete statistical model is infeasible
- Imagine a binary tree with a branching probability for each node
- Employ a suitable approximation to capture important characteristics such as prefix containment
- Build two binary trees from the source and destination address prefixes in the filters
- At each node, define the weight of the left child and right child as the number of filters specifying a prefix reached by taking the left child and right child, respectively
- Let heavy = max(weight of left child, weight of right child)
- Let light = min(weight of left child, weight of right child)
15 Skew Distributions
- For each level in the tree, compute the average skew for the nodes at that level
- Low skew → evenly weighted children, doubling of address space coverage
- High skew → asymmetrically weighted children, containment of address space coverage
- Skew = 1 means a node has a single path
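The skew formula itself is missing from the extracted text. This sketch assumes skew = 1 - light/heavy, so that evenly weighted children give 0 and a single path gives 1, as stated above. It builds a binary trie from IPv4 CIDR prefixes and averages skew per level. Skipping nodes with no children is a further assumption, and the function names are hypothetical.

```python
from collections import defaultdict


def prefix_bits(prefix: str) -> str:
    """'10.0.0.0/8' -> '00001010' (the first prefix-length bits of the address)."""
    addr, length = prefix.split("/")
    value = 0
    for octet in addr.split("."):
        value = (value << 8) | int(octet)
    return format(value, "032b")[: int(length)]


def skew_by_level(prefixes):
    """Average skew per trie level, assuming skew = 1 - light / heavy."""
    # weight[node] = number of filter prefixes at or below this trie node
    weight = defaultdict(int)
    for p in prefixes:
        bits = prefix_bits(p)
        for i in range(1, len(bits) + 1):
            weight[bits[:i]] += 1
    sums, counts = defaultdict(float), defaultdict(int)
    for node in {""} | set(weight):
        left, right = weight.get(node + "0", 0), weight.get(node + "1", 0)
        heavy, light = max(left, right), min(left, right)
        if heavy == 0:
            continue  # assumption: nodes without children carry no skew
        sums[len(node)] += 1 - light / heavy
        counts[len(node)] += 1
    return {level: sums[level] / counts[level] for level in sorted(counts)}


print(skew_by_level(["0.0.0.0/1", "128.0.0.0/1"]))   # {0: 0.0}
print(skew_by_level(["10.0.0.0/8"]))                  # skew 1.0 at levels 0-7
```

Two sibling /1 prefixes double the covered space at the root, giving skew 0. A lone /8 is a single path, giving skew 1 at every level.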
16 Designing a Flexible Benchmark
- Provide a mechanism for defining database structure
- Structure could be based on analysis of seed databases
- Construct a set of benchmark database structures to use as a departure point for performance evaluation
- Provide high-level controls for augmenting database structure
- Observe effects on search and capacity performance
- Scale the database while preventing redundant filters
- Adjust the specificity or scope of filters
- Introduce entropy into the database
- A structured mechanism for straying from database structure
- Difficult to provide meaningful adjustments for application specifications (protocol, port ranges)
17 Benchmark Architecture
18 Parameter Files
- Define the general database via requisite statistics
- May be extracted from seed databases using an analysis tool
- Goal: compile a set of benchmark parameter files that characterize various packet classification application environments (as proposed by Woo)
- Protocol and port pair class distribution
- Distribution of protocol specifications
- For each protocol, specify a port pair class distribution for filters specifying the given protocol
- Port pair class defines the structure of port range pairs
- 25 port pair classes: all ordered pairs of the five port classes
- WC (wildcard, 0-65535), WR1 (0-1023), WR2 (1024-65535), AR (arbitrary range), EM (exact match)
- Port range distributions
- Arbitrary range and exact port distributions
- Limited set of arbitrary ranges observed in real databases
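The port classes above can be sketched as a small classifier. It maps each inclusive port range to one of the five classes, and the 25 port pair classes are the 5 × 5 ordered pairs. The function names are hypothetical. The rule that any other non-singleton range is AR is an assumption.

```python
from itertools import product

PORT_CLASSES = ("WC", "WR1", "WR2", "AR", "EM")


def port_class(lo: int, hi: int) -> str:
    """Map an inclusive port range to one of the five port classes."""
    if (lo, hi) == (0, 65535):
        return "WC"    # wildcard
    if (lo, hi) == (0, 1023):
        return "WR1"   # well-known (system) ports
    if (lo, hi) == (1024, 65535):
        return "WR2"   # all ports from 1024 up
    if lo == hi:
        return "EM"    # exact match
    return "AR"        # any other arbitrary range


def port_pair_class(sport, dport):
    """Class of a (source range, destination range) pair."""
    return port_class(*sport), port_class(*dport)


ALL_PAIR_CLASSES = list(product(PORT_CLASSES, repeat=2))
print(len(ALL_PAIR_CLASSES))                      # 25
print(port_pair_class((0, 65535), (80, 80)))      # ('WC', 'EM')
```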
19 Parameter Files (2)
- Joint prefix length distributions for each port pair class
- 25 distributions, each containing a total length distribution and the associated source address prefix length distributions
- Preserves the correlation between port pair class and prefix length pairs in directional filters
- Address skew distributions for source and destination addresses
- Source/destination prefix correlation distribution
- Specifies the distance between the communicating subnets specified by a filter
- Probability that the address prefixes of a filter continue to be identical at a given prefix length
- Consider a filter with address prefix length pair (16, 25)
- Consider walking the source and destination address prefix trees in parallel
- Assume that the prefixes are identical for the first 8 bits
- The correlation probability at level 9 specifies the probability that the next bit in the prefixes will be the same
- Once the prefixes diverge or the shorter prefix length is reached, the distribution is irrelevant
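The parallel walk described above can be sketched as follows. While the two prefixes are still identical, each destination bit copies the source bit with probability `corr[level]` and flips it otherwise. After divergence, or past the end of the source prefix, bits are drawn at random. The function name is hypothetical, and the code only generates bit strings. It does not reproduce how the tool combines this step with the skew distributions.

```python
import random


def correlated_prefixes(src_len, dst_len, corr, rng):
    """Generate source/destination prefix bit strings whose leading bits agree
    according to corr[level], the probability that bit `level` (1-based) is the
    same in both prefixes given that all earlier bits were the same."""
    src = [rng.randint(0, 1) for _ in range(src_len)]
    dst = []
    identical = True
    for level in range(1, dst_len + 1):
        if identical and level <= src_len:
            same = rng.random() < corr[level]
            bit = src[level - 1] if same else 1 - src[level - 1]
            identical = same
        else:
            # Prefixes have diverged or the source prefix has ended:
            # the correlation distribution no longer applies.
            bit = rng.randint(0, 1)
        dst.append(bit)
    return "".join(map(str, src)), "".join(map(str, dst))


rng = random.Random(7)
s, d = correlated_prefixes(16, 25, [1.0] * 33, rng)  # always-correlated case
print(s, d)  # d starts with all 16 bits of s
```

For the (16, 25) example, only `corr[1]` through `corr[16]` can ever be consulted. Bits 17-25 of the destination prefix are unconstrained.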
20 Synthetic Database Generator
- Reads in a parameter file
- Trivial option to generate a completely random filter database
- Takes three high-level input parameters
- Size: target size for the synthetic database
- Resulting size may be less than the target
- Tool generates filters using the statistical model, then post-processes the database to remove redundant filters
- Favorable for assessing the scalability of parameter files
- Smoothing (r): number of bits by which synthetic filters may stray from points in the prefix length pair distribution
- Structured entropy mechanism for introducing new prefix length pairs
- Models aggregation and/or increased flow segregation
- Scope (s): bias to more or less specific filters
- Adjusts the shape of the address length distributions without adding or removing bins
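The redundancy post-processing is not specified in detail. As an illustration only, this sketch assumes one simple notion of redundancy: a filter is dropped when an earlier (higher-priority) filter covers it in every field. Each filter is a hypothetical tuple of five inclusive integer intervals, and the function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]
Filter5 = Tuple[Interval, Interval, Interval, Interval, Interval]


def covers(a: Filter5, b: Filter5) -> bool:
    """True if every field interval of b lies inside the matching interval of a."""
    return all(alo <= blo and bhi <= ahi for (alo, ahi), (blo, bhi) in zip(a, b))


def remove_redundant(filters: List[Filter5]) -> List[Filter5]:
    """Keep a filter only if no earlier (higher-priority) kept filter covers it."""
    kept: List[Filter5] = []
    for f in filters:  # list order is priority order, highest first
        if not any(covers(k, f) for k in kept):
            kept.append(f)
    return kept


WIDE = ((0, 2**32 - 1), (0, 2**32 - 1), (0, 65535), (0, 65535), (0, 255))
NARROW = ((5, 5), (7, 7), (80, 80), (80, 80), (6, 6))
print(len(remove_redundant([NARROW, WIDE, NARROW])))  # 2: the repeated NARROW is dropped
print(len(remove_redundant([WIDE, NARROW])))          # 1: NARROW is shadowed by WIDE
```

A fully covered filter can never be the best match, but it may still be reported under all-matching semantics. This rule is therefore one possible choice rather than the only one.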
21 Understanding Scaling Effects
- Readily scale a seed database by 30x to 40x
- Larger seed databases provide for larger synthetic databases
- rules6 (1500 filters) is approximately 6x larger than rules1 and rules5
- As the limit of the seed parameter file is reached → shift in average filter scope to more specific filters
22 Smoothing Adjustment
- Smoothing (r): number of bits by which synthetic filters may stray from points in the prefix length pair distribution
- Apply a symmetric binomial spreading to each spike in the joint prefix length distribution
- For each joint distribution in the parameter file
- Apply binomial spreading to each spike in the total length distribution
- For each source prefix length distribution
- Apply binomial spreading to each spike in the source length distribution
- Tricky details, such as adjusting the width of the source spreading as you move away from the original spike
- Truncate and normalize the distribution to allow for spreading of spikes at the edges
- Let k = 2r
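The following is a minimal sketch of the 1-d step only: spreading the spikes of a total-length distribution. Each spike is replaced by a symmetric Binomial(k = 2r, 1/2) kernel centered on it, so no mass moves more than r bits away. The result is then truncated to the valid bins and renormalized. The source-length spreading with its width adjustment is not reproduced here, and `binomial_spread` is a hypothetical name.

```python
from math import comb


def binomial_spread(dist, r, lo=0, hi=64):
    """Spread every spike of a 1-d length distribution with a symmetric
    Binomial(k = 2r, 1/2) kernel over bins [x - r, x + r], truncate to
    [lo, hi], and renormalize. Overlapping spreads simply add up."""
    k = 2 * r
    out = {}
    for x, weight in dist.items():
        for i in range(k + 1):
            b = x - r + i
            if lo <= b <= hi:  # truncate at the edges of the distribution
                out[b] = out.get(b, 0.0) + weight * comb(k, i) / 2 ** k
    total = sum(out.values())
    return {b: v / total for b, v in sorted(out.items())}


print(binomial_spread({32: 1.0}, 1))  # {31: 0.25, 32: 0.5, 33: 0.25}
print(binomial_spread({64: 1.0}, 1))  # edge spike: mass below 64 only, renormalized
```

The edge case shows the truncation effect that later slides attribute the scope drift to. A spike at the most specific edge can only spread toward less specific bins.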
23 Smoothing Example: Single Spike
- All prefix lengths are 16 bits
- Database target size: 64,000 filters
- No scope adjustment, s = 0
- Generate databases for various values of the smoothing adjustment, r
[Figure: (a) r = 0; (b) r = 0, top view]
24 Single Spike with r = 8
- r = 8 → maximum Manhattan distance from the original spike
- Observe a symmetric binomial distribution across total prefix length (diagonal) and source prefix length
[Figure: (a) r = 8; (b) r = 8, top view]
25 Single Spike with r = 32
- r = 32 → maximum Manhattan distance from the original spike
- Observe a symmetric binomial distribution across total prefix length (diagonal) and source prefix length
[Figure: (a) r = 32; (b) r = 32, top view]
26 Smoothing with Seed Parameter File
- r = 16
- Appears to be the sensible limit to smoothing for real databases
- Spreading is cumulative; adjacent spikes may spread into each other, creating new dominant spikes
27 Understanding Smoothing Effects
- High sensitivity for small values of the smoothing adjustment, r
- We believe this is due to the dominance of spikes at the more specific edges of the joint distributions in seed databases
- Truncation causes a slight drift to a larger average scope
28 Smoothing: Contrived Distributions
- Constructed two contrived distributions to verify the hypothesis
- Spikes: all joint distributions have two points, (0,0) and (32,32)
- Uniform: uniform total length distribution
- Observed identical drift for the spikes distribution and no drift for the uniform distribution
29 Scope Adjustment
- Scope (s): bias to more or less specific filters, $s \in [-1, 1]$
- Adjusts the shape of the address length distributions without adding or removing bins
- s > 0: decrease scope, increase specificity (prefix length)
- s < 0: increase scope, decrease specificity (prefix length)
- Utilize a bias function on the random number used to select from the cumulative distributions
- Bias function computes the area under a line whose slope is defined by s
- Prevents laborious recomputation of each prefix length distribution
[Figure: bias functions for s = -1, s = 0, and s = 1, each plotted on axes from 0 to 1]
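The exact bias function is not given in the extracted text. One function consistent with the description is the area under the line from $(0, 1+s)$ to $(1, 1-s)$ between 0 and $u$, which gives $b(u) = (1+s)u - s u^2$. This sketch assumes that form. It also assumes the cumulative distribution is ordered by increasing prefix length, so that s > 0 favors longer, more specific prefixes. The names `bias` and `sample_length` and the uniform example distribution are hypothetical.

```python
import bisect
import random


def bias(u: float, s: float) -> float:
    """Area under the line from (0, 1 + s) to (1, 1 - s) between 0 and u.
    For -1 <= s <= 1 this maps [0, 1] monotonically onto [0, 1];
    s = 0 is the identity and s > 0 pushes values toward 1."""
    return (1 + s) * u - s * u * u


def sample_length(lengths, cdf, s, rng):
    """Inverse-CDF draw from a distribution ordered by increasing prefix length,
    using a biased instead of a plain uniform random number."""
    u = bias(rng.random(), s)
    return lengths[bisect.bisect_right(cdf, u)]


lengths = [8, 16, 24, 32]
cdf = [0.25, 0.5, 0.75, 1.0]  # hypothetical uniform length distribution
rng = random.Random(0)
means = {}
for s in (-1.0, 0.0, 1.0):
    draws = [sample_length(lengths, cdf, s, rng) for _ in range(10000)]
    means[s] = sum(draws) / len(draws)
print(bias(0.5, 1.0), bias(0.5, 0.0), bias(0.5, -1.0))  # 0.75 0.5 0.25
```

The stored distributions are never modified. Only the random number fed into the lookup is reshaped, which matches the slide's point about avoiding recomputation of each prefix length distribution.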
30 Scope Example: Uniform Distribution
- Uniform distribution, r = 0, s = 1
- Weight is pushed to more specific address prefixes
31 Scope: Contrived Distributions
- Maximum bias of 12 bits longer or shorter in total prefix length
- Provides for a 4096x increase or decrease in the average coverage of the filters in the database
- As expected, negligible difference between the two distributions
- No change in bins, only a shift in weight
32 Scope: Real Distributions
- Observed maximum bias of 6 bits longer or shorter in total prefix length
- Provides for a 64x increase or decrease in the average coverage of the filters in the database
- Sensitivity is dependent upon the parameter file
33 Synthetic Database Generation Summary
- Solid foundation for a packet classification benchmark
- May be beneficial to have a high-level skew adjustment or skew compensation coupled with scaling
- Allow more branching for larger databases
- Need more sample databases from other application environments in order to compile a benchmark suite of parameter files
- Alternately, formulate parameter files manually from more detailed extensions of Woo's descriptions
34 Trace Generator
- Problem: given a filter database, construct an input trace of packet headers that query the database at all interesting points, and an associated output trace of best-matching (or all-matching) filters for each packet header
- We can define "interesting" in various ways
- A point in each 5-d polyhedron formed by the intersections of the 5-d rectangles specified by the filters in the database (optimal solution)
- Appears to be an $O((n \log n)^5)$ problem using fancy data structures
- Optimizations may exist and amortized performance may be better
- A random selection of points (least favorable solution)
- A pseudo-random selection of points (most feasible solution?)
- For each filter, choose a few random points covered by the filter
- Might be able to develop some heuristics to choose points that are and are not likely to be overlapped by other filters
- Post-process the input trace in order to generate the output trace
- Could feed back results of the post-process in order to choose points for filters not appearing in the output trace
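The pseudo-random option can be sketched as follows. The generator picks a few random headers inside each filter to form the input trace, then post-processes that trace by linear search to produce the best-matching output trace. It also reports filters that never appear as a best match, which are the candidates for the feedback step. Filters are hypothetical tuples of five inclusive intervals in priority order, and the function names are illustrative.

```python
import random

# Each filter: five inclusive (lo, hi) intervals, listed in priority order.
NARROW = ((5, 5), (7, 7), (80, 80), (80, 80), (6, 6))
WIDE = ((0, 2**32 - 1), (0, 2**32 - 1), (0, 65535), (0, 65535), (0, 255))


def random_point(f, rng):
    """Pick a random packet header covered by filter f."""
    return tuple(rng.randint(lo, hi) for lo, hi in f)


def best_match(header, filters):
    """Index of the first (highest-priority) filter covering the header."""
    for idx, f in enumerate(filters):
        if all(lo <= x <= hi for x, (lo, hi) in zip(header, f)):
            return idx
    return None


def generate_traces(filters, points_per_filter, rng):
    """Input trace: a few random points per filter. Output trace: best matches.
    Also return filters never reported as a best match (candidates for feedback)."""
    inputs = [random_point(f, rng) for f in filters for _ in range(points_per_filter)]
    outputs = [best_match(h, filters) for h in inputs]
    unseen = set(range(len(filters))) - set(outputs)
    return inputs, outputs, unseen


filters = [NARROW, WIDE, NARROW]  # the third filter is fully shadowed by the first
inputs, outputs, unseen = generate_traces(filters, 3, random.Random(42))
print(outputs, unseen)  # [0, 0, 0, 1, 1, 1, 0, 0, 0] {2}
```

A shadowed filter can never become a best match, however many points are drawn for it. The feedback step would therefore need to recognize such filters rather than keep sampling them.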
35 The next step
- Finalize trace generator design, implement, and analyze (if necessary)
- Run several packet classification algorithms through the benchmark
- Use results to refine tools and develop a benchmarking methodology that extracts salient features
- Investigate ways to generate broad interest in the benchmark
- Publication
- Web-based scripts
- Pitch to the IETF
- Comments, critiques, suggestions, questions?