Towards a Packet Classification Benchmark - PowerPoint PPT Presentation

About This Presentation
Title:

Towards a Packet Classification Benchmark

Description:

Applied Research Laboratory. David E. Taylor. Towards a Packet Classification Benchmark ... Applied Research Laboratory. David E. Taylor. Motivation for a ... – PowerPoint PPT presentation

Slides: 36
Provided by: davide56

Transcript and Presenter's Notes

Title: Towards a Packet Classification Benchmark


1
Towards a Packet Classification Benchmark
  • ARL Current Research Talk
  • 20 October 2003

2
Packet Classification Example
  • Query: packet from 12.34.244.1 going to
    168.92.44.32 using TCP from port 1200 to port
    1450
  • Result: decrypt all packets using AES; transmit
    packet on port 3
  • Query: packet from 12.34.244.1 going to
    168.92.44.32 using TCP from port 1200 to port
    1450
  • Result: encrypt packet using AES; send copy of
    header to usage accounting with userID 110;
    transmit packet on port 5

3
Formal Problem Statement
  • Given a packet P containing fields P_j and a
    collection of filters F with each filter F_i
    containing fields F_ij, select the highest
    priority exclusive filter and the k highest
    priority non-exclusive filters, where for each
    selected filter:
  • For all j, F_ij matches P_j
  • Performance tradeoffs are commonly characterized
    by the point location problem in computational
    geometry
  • For n regions defined in j dimensions with
    j > 3, a point may be located in
    multi-dimensional space in O(log n) time with
    O(n^j) space, or in O(log^(j-1) n) time with
    O(n) space

[Figure: example with n = 13, j = 2. The packet header maps to a point in 2-D space; axes are source address and destination address.]
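
The problem statement above can be illustrated with a minimal linear-search sketch. It models only the single best-matching filter (not the exclusive/non-exclusive distinction), and the `Filter` layout, the two sample filters, and the "smaller number means higher priority" convention are all assumptions made for illustration, not part of the original talk.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Tuple


def ip(addr: str) -> int:
    """Convert dotted-quad notation to a 32-bit integer."""
    return int(ipaddress.IPv4Address(addr))


@dataclass
class Filter:
    priority: int             # assumption: smaller number = higher priority
    src: Tuple[int, int]      # (address, prefix length)
    dst: Tuple[int, int]      # (address, prefix length)
    sport: Tuple[int, int]    # inclusive (low, high) port range
    dport: Tuple[int, int]    # inclusive (low, high) port range
    proto: Optional[int]      # None acts as a wildcard
    action: str


def prefix_matches(prefix, addr, width=32):
    """True if the first `length` bits of addr equal those of the prefix."""
    value, length = prefix
    shift = width - length
    return (value >> shift) == (addr >> shift)


def matches(f, pkt):
    """Check every field F_ij against the packet field P_j."""
    src, dst, sport, dport, proto = pkt
    return (prefix_matches(f.src, src)
            and prefix_matches(f.dst, dst)
            and f.sport[0] <= sport <= f.sport[1]
            and f.dport[0] <= dport <= f.dport[1]
            and (f.proto is None or f.proto == proto))


def classify(filters, pkt):
    """Return the highest-priority matching filter, or None."""
    hits = [f for f in filters if matches(f, pkt)]
    return min(hits, key=lambda f: f.priority, default=None)


ANY_PORT = (0, 65535)
TCP = 6
filters = [  # hypothetical filter set
    Filter(1, (ip("12.34.244.0"), 24), (ip("168.92.44.32"), 32),
           ANY_PORT, (1450, 1450), TCP, "transmit on port 3"),
    Filter(2, (0, 0), (0, 0), ANY_PORT, ANY_PORT, None, "drop"),
]
packet = (ip("12.34.244.1"), ip("168.92.44.32"), 1200, 1450, TCP)
print(classify(filters, packet).action)
```

Linear search takes O(n) time per packet; it is useful here only as a reference against which faster algorithms can be checked.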
4
Motivation for a Benchmark
  • No benchmark currently exists in industry or
    research community
  • Performance of two most effective packet
    classification solutions depends on the
    composition of filters in the filter set
  • TCAM capacity depends on port range
    specifications
  • Range conversion to prefixes may cause a single
    filter to occupy [2(w-1)]^k TCAM slots (900 slots
    in the worst case for TCP and UDP
    source/destination ports, where w = 16 and k = 2)
  • w = number of bits required to represent a point
    in the range
  • k = number of fields specified by ranges
  • Observed expansion factors range from 40 to 520
  • Fastest algorithms leverage heuristics and
    optimize average performance
  • Cutting algorithms (E-TCAMs, Hi-Cuts, Hyper-Cuts)
  • Tuple-Space algorithms
  • Plethora of new packet classification products
  • Network processors, packet processors, traffic
    managers, TCAMs
  • Intel, IBM, Silicon Access, Mosaid, IDT
    (Solidium), SiberCore, Cypress, etc.
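
The TCAM expansion bound on this slide can be checked with a short sketch of range-to-prefix conversion. This is a standard greedy split into aligned power-of-two blocks, not necessarily the method used by any particular TCAM vendor; the function name is an assumption.

```python
def range_to_prefixes(lo, hi, w=16):
    """Split the inclusive range [lo, hi] of w-bit values into prefixes.

    Returns a list of (value, prefix_length) pairs covering the range exactly.
    """
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo.
        size = lo & -lo if lo else 1 << w
        # Shrink the block until it fits inside the remaining range.
        while size > hi - lo + 1:
            size >>= 1
        prefixes.append((lo, w - (size.bit_length() - 1)))
        lo += size
    return prefixes


worst = range_to_prefixes(1, 2**16 - 2)   # worst-case 16-bit range
print(len(worst))                         # 2(w-1) prefixes for w = 16
print(len(worst) ** 2)                    # two range fields (k = 2)
print(len(range_to_prefixes(1024, 65535)))
```

A filter with worst-case ranges in both port fields needs one TCAM entry per combination of source and destination prefixes, which is where the exponent k comes from.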

5
Motivation for a Benchmark (2)
  • Security and confidentiality concerns limit
    access to real databases for study and
    performance evaluation
  • Well-connected researchers have gained access but
    are unable to share
  • Lack of large real databases due to limited
    deployment of high-performance packet
    classification solutions
  • Performance evaluations with real databases are
    limited by size and structure of samples
  • Goal: develop a benchmark capable of capturing
    relevant characteristics of real databases
    while providing structured mechanisms for
    augmenting database composition and analyzing
    performance effects
  • Should have value for three distinct communities:
    researchers, product vendors, product consumers

6
Related Work
  • IETF Benchmarking Working Group (BMWG) developed
    benchmark methodologies for Forwarding
    Information Base (FIB) routers and firewalls
  • FIB focuses on performance evaluation of routers
    at transmission interfaces
  • Firewall methodology is a high-level testing
    methodology with no detailed recommendations of
    filter composition
  • Network Processing Forum has a benchmarking
    initiative
  • Produced IP lookup and switch fabric benchmarks
  • Thus far, only IBM and Intel have published
    results for IP lookup
  • No details or announcements re packet
    classification
  • Performance evaluation by researchers
  • Most randomly select prefixes from forwarding
    tables and use existing protocol, port range
    combinations
  • Baboescu and Varghese added refinements for
    controlling the number of zero-length prefixes
    and prefix nesting

7
Related Work (2)
  • Woo (Infocom 2000) provided strong motivation for
    a benchmark
  • Provided a high-level overview of filter
    composition for various environments
  • ISP Peering Router, ISP Core Router, Enterprise
    Edge Router, etc.
  • Generated large synthetic databases but provided
    few details regarding database construction
  • No mechanisms for varying filter composition

8
Understanding Filter Composition
  • Most complex packet filters typically appear in
    firewall and edge router filter sets
  • Heterogeneous applications: network address
    translation (NAT), virtual private networks
    (VPNs), and resource reservation
  • Firewall filters are created manually by a system
    admin using standard tools such as Cisco Firewall
    MC
  • Model of filter construction: specify
    communicating subnets, specify application (or
    set of applications)
  • TCP and UDP identify applications via 16-bit port
    numbers
  • Provide services to unknown clients via contact
    ports in the range of well-known (or system)
    ports assigned by IANA
  • Since 1993, the system port range is 0-1023
  • Established sessions typically use a unique port
    in the ephemeral port range 1024-65535
  • IANA manages a list of user registered ports in
    the range 1024-49151
  • Limited number of protocols in use, dominated by
    TCP and UDP

9
Analyzing Database Structure
  • Engaged in an iterative process of analyses in
    order to identify useful metrics
  • Accurately capture database structure
  • Goal: identify methods and metrics useful for
    constructing synthetic databases
  • Defined new metrics
  • Joint address prefix length distributions
  • Scope: metric used to assess the specificity of
    filters on a logarithmic scale
  • Skew: metric used to assess the number of subnets
    covered by a given filter set
  • Quantifies branching in the binary tree
    representation of address prefixes

10
Scope Definition
  • From a geometric perspective, a filter defines a
    region in 5-d space
  • Volume of the region is the product of the 1-d
    lengths specified by the filter fields
  • e.g. Number of addresses covered by source
    address prefix
  • Points in 5-d space correspond to packet headers
  • Filter properties are commonly defined as a tuple
    specification, or a vector with fields
  • t0, source address prefix length, [0, 32]
  • t1, destination address prefix length, [0, 32]
  • t2, source port range width, [0, 2^16]
  • t3, destination port range width, [0, 2^16]
  • t4, protocol specification, Boolean: specified,
    not specified
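
A minimal sketch of how scope could be computed from the tuple above, as the base-2 logarithm of the volume a filter covers. The exact formula is an assumption: it treats an exact-match port as width 1 and counts an unspecified protocol as 8 bits, which is consistent with the values quoted on the next slide (scope 0 for exact-match filters, 104 for default filters).

```python
import math


def scope(t0, t1, t2, t3, t4):
    """Log2 of the 5-d volume covered by a filter (assumed formula).

    t0, t1: source/destination prefix lengths in [0, 32]
    t2, t3: source/destination port range widths in [1, 2**16]
    t4:     1 if the protocol is specified, 0 if it is a wildcard
    """
    return ((32 - t0) + (32 - t1)
            + math.log2(t2) + math.log2(t3)
            + 8 * (1 - t4))


print(scope(32, 32, 1, 1, 1))           # exact-match filter
print(scope(0, 0, 2**16, 2**16, 0))     # default filter
print(scope(24, 32, 2**16, 1, 1))       # /24 source, any source port
```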

11
Scope Distributions
  • Scope distribution characterizes the specificity
    of filters in the database
  • Exact match filters have scope 0
  • Default filters have scope 104
  • Notable spikes near low end of distribution
  • Wide variance

12
Joint Prefix Length Distributions
  • Observe large spikes in joint distribution along
    the edges
  • Unlike in forwarding tables, /0 and /32 prefixes
    are common in prefix length pairs
  • Strong motivation for capturing joint
    distribution
  • Observe a correlation with port range
    specifications (not shown)

13
Joint Prefix Length Distributions (2)
  • For synthetic database generation, we want to
  • Select a prefix length pair based on total prefix
    length
  • Total length specified by diagonals in joint
    distribution
  • Allow distribution to be modified
  • Represent joint distribution by a collection of
    1-d distributions
  • Build a total length distribution over [0, 64]
  • bin = sum of prefix lengths
  • For each non-empty bin in total length
    distribution, build a source length distribution
    for the prefix pairs in the bin
  • (destination address prefix length) = (total
    length) - (source address prefix length)
  • Allows for high-level input parameter for address
    scope adjustment
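
The decomposition described above can be sketched as follows: build a total-length distribution plus one source-length distribution per non-empty bin, then sample a pair by drawing the total first and the source length second. Function names and the use of raw counts as weights are assumptions for illustration.

```python
import random
from collections import Counter, defaultdict


def build_distributions(pairs):
    """pairs: iterable of (source length, destination length) tuples."""
    total_dist = Counter()             # bin = sum of prefix lengths, 0..64
    src_dists = defaultdict(Counter)   # per-bin source length distribution
    for src_len, dst_len in pairs:
        total = src_len + dst_len
        total_dist[total] += 1
        src_dists[total][src_len] += 1
    return total_dist, src_dists


def sample_pair(total_dist, src_dists, rng):
    """Draw a total length, then a source length within that bin."""
    totals = list(total_dist)
    total = rng.choices(totals, weights=[total_dist[t] for t in totals])[0]
    srcs = list(src_dists[total])
    src_len = rng.choices(srcs, weights=[src_dists[total][s] for s in srcs])[0]
    return src_len, total - src_len    # destination = total - source


seed_pairs = [(32, 32), (32, 32), (0, 0), (16, 24)]   # hypothetical seed data
total_dist, src_dists = build_distributions(seed_pairs)
rng = random.Random(0)
print([sample_pair(total_dist, src_dists, rng) for _ in range(5)])
```

Because the total length is drawn first, a single high-level adjustment to the total-length distribution shifts the address scope of every generated pair.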

14
Skew Definition
  • Want a high-level characterization of address
    space coverage by filters, (also want to
    anonymize IP addresses)
  • Complete, statistical model is infeasible
  • Imagine a binary tree with a branching
    probability for each node
  • Employ a suitable approximation to capture
    important characteristics such as prefix
    containment
  • Build two binary trees from the source and
    destination address prefixes in the filters
  • At each node, define the weight of the left child
    and right child as the number of filters
    specifying a prefix reached by taking the left
    child and right child, respectively
  • Let heavy = max(weight of left child, weight of
    right child)
  • Let light = min(weight of left child, weight of
    right child)
  • skew = 1 - (light / heavy)
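
A minimal sketch of the per-level skew computation, assuming skew = 1 - (light / heavy) at each node, prefixes given as bit strings, and levels numbered from the root at 0. Nodes with no children are skipped; a node with one child gets skew 1. These conventions are assumptions chosen to agree with the statements on the following slide.

```python
from collections import defaultdict


def skew_by_level(prefixes):
    """Average skew per tree level for a list of prefix bit strings."""
    # weight[node] = number of filters whose prefix passes through node.
    weight = defaultdict(int)
    for p in prefixes:
        for i in range(1, len(p) + 1):
            weight[p[:i]] += 1

    levels = defaultdict(list)
    for node in set(weight) | {""}:            # "" is the root
        left = weight.get(node + "0", 0)
        right = weight.get(node + "1", 0)
        heavy, light = max(left, right), min(left, right)
        if heavy == 0:
            continue                           # leaf: no branching to measure
        levels[len(node)].append(1 - light / heavy)

    return {lvl: sum(v) / len(v) for lvl, v in sorted(levels.items())}


print(skew_by_level(["00", "01", "10", "11"]))   # balanced tree
print(skew_by_level(["0000"]))                   # single path
print(skew_by_level(["00", "01", "1"]))
```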

15
Skew Distributions
  • For each level in the tree compute the average
    skew for the nodes at that level
  • Low skew → evenly weighted children, doubling
    of address space coverage
  • High skew → asymmetrically weighted children,
    containment of address space coverage
  • Skew = 1 means a node has a single path

16
Designing a Flexible Benchmark
  • Provide mechanism for defining database structure
  • Structure could be based on analysis of seed
    databases
  • Construct a set of benchmark database structures
    to use as a departure point for performance
    evaluation
  • Provide high-level controls for augmenting
    database structure
  • Observe effects on search and capacity
    performance
  • Scale the database while preventing redundant
    filters
  • Adjust the specificity or scope of filters
  • Introduce entropy into the database
  • A structured mechanism for straying from database
    structure
  • Difficult to provide meaningful adjustments for
    application specifications (protocol, port ranges)

17
Benchmark Architecture
18
Parameter Files
  • Defines the general database via requisite
    statistics
  • May be extracted from seed databases using an
    analysis tool
  • Goal: compile a set of benchmark parameter files
    that characterize various packet classification
    application environments (as proposed by Woo)
  • Protocol and port pair class distribution
  • Distribution of protocol specifications
  • For each protocol, specify a port pair class
    distribution for filters specifying the given
    protocol
  • Port pair class defines the structure of port
    range pairs
  • 25 port pair classes: all possible ordered pairs
    of the five port classes
  • WC [0-65535], WR1 [0-1023], WR2 [1024-65535],
    AR (arbitrary range), EM (exact match)
  • Port range distributions
  • Arbitrary range and exact port distributions
  • Limited set of arbitrary ranges observed in real
    databases
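
The port pair classes can be sketched as a small classifier over (low, high) port ranges. The boundary of WR2 is taken as 1024-65535 here, matching the ephemeral port range given earlier in the talk; that boundary and the function names are assumptions.

```python
from itertools import product

PORT_CLASSES = ("WC", "WR1", "WR2", "AR", "EM")


def port_class(lo, hi):
    """Classify one inclusive port range into one of five port classes."""
    if lo == hi:
        return "EM"                    # exact match
    if (lo, hi) == (0, 65535):
        return "WC"                    # wildcard
    if (lo, hi) == (0, 1023):
        return "WR1"                   # well-known (system) ports
    if (lo, hi) == (1024, 65535):
        return "WR2"                   # assumed: ephemeral port range
    return "AR"                        # any other (arbitrary) range


def port_pair_class(sport, dport):
    """Class of a (source range, destination range) pair."""
    return port_class(*sport), port_class(*dport)


ALL_PAIR_CLASSES = list(product(PORT_CLASSES, repeat=2))
print(len(ALL_PAIR_CLASSES))
print(port_pair_class((0, 65535), (80, 80)))
```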

19
Parameter Files (2)
  • Joint prefix length distributions for each port
    pair class
  • 25 distributions, each containing a total length
    distribution and the associated source address
    prefix length distributions
  • Preserves correlation between port pair class and
    prefix length pairs in directional filters
  • Address skew distributions for source and
    destination addresses
  • Source/destination prefix correlation
    distribution
  • Specifies the distance between communicating
    subnets specified by filter
  • Probability that the address prefixes of a filter
    continue to be identical at a given prefix length
  • Consider a filter with address prefix length pair
    (16,25)
  • Consider walking the source and destination
    address prefix trees in parallel
  • Assume that the prefixes are identical for the
    first 8 bits
  • The correlation probability at level 9
    specifies the probability that the next bit in
    the prefixes will be the same
  • Once prefixes diverge or prefix length is
    reached, the distribution is irrelevant
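
The parallel tree walk described above can be sketched as follows. The `corr` mapping (level to probability that the next bits agree) and the use of uniformly random bits wherever correlation no longer applies are assumptions; the actual generator would presumably pick those bits from the skew distributions instead.

```python
import random


def correlated_prefixes(src_len, dst_len, corr, rng):
    """Generate (source, destination) prefix bit strings.

    corr[level] is the probability that the two prefixes still agree at
    bit `level` (1-based), given that they agreed on all earlier bits.
    """
    src, dst = [], []
    identical = True
    shorter = min(src_len, dst_len)
    for level in range(1, max(src_len, dst_len) + 1):
        s_bit = rng.randint(0, 1)
        d_bit = rng.randint(0, 1)
        if identical and level <= shorter:
            if rng.random() < corr.get(level, 0.5):
                d_bit = s_bit              # prefixes stay identical
            else:
                d_bit = 1 - s_bit          # prefixes diverge here
                identical = False
        if level <= src_len:
            src.append(str(s_bit))
        if level <= dst_len:
            dst.append(str(d_bit))
    return "".join(src), "".join(dst)


always = {level: 1.0 for level in range(1, 33)}
print(correlated_prefixes(16, 25, always, random.Random(0)))
```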

20
Synthetic Database Generator
  • Reads in parameter file
  • Trivial option to generate a completely random
    filter database
  • Takes three high-level input parameters
  • Size: target size for synthetic database
  • Resulting size may be less than target
  • Tool generates filters using statistical model
    then post-processes database to remove redundant
    filters
  • Favorable for assessing scalability of parameter
    files
  • Smoothing (r): number of bits by which synthetic
    filters may stray from points in prefix length
    pair distribution
  • Structured entropy mechanism for introducing
    new prefix length pairs
  • Models aggregation and/or increased flow
    segregation
  • Scope (s): bias to more or less specific filters
  • Adjusts the shape of the address length
    distributions without adding or removing bins

21
Understanding Scaling Effects
  • Readily scale a seed database by 30x to 40x
  • Larger seed databases provide for larger
    synthetic databases
  • rules6 (1500 filters) is approximately 6x larger
    than rules1 and rules5
  • As the limit of the seed parameter file is
    reached → shift in average filter scope to more
    specific filters

22
Smoothing Adjustment
  • Smoothing (r): number of bits by which synthetic
    filters may stray from points in prefix length
    pair distribution
  • Apply a symmetric binomial spreading to each
    spike in the joint prefix length distribution
  • For each joint distribution in parameter file
  • Apply binomial spreading to each spike in total
    length distribution
  • For each source prefix length distribution
  • Apply binomial spreading to each spike in source
    length distribution
  • Tricky details like adjusting the width of the
    source spreading as you move away from the
    original spike
  • Truncate and normalize distribution to allow for
    spreading of spikes at the edges
  • Let k = 2r
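
A minimal sketch of symmetric binomial spreading applied to a 1-d length distribution, assuming each spike is spread over 2r + 1 points with weights C(k, i) / 2^k for k = 2r, then truncated at the edges and renormalized so the spike keeps its original weight. The adjustment of the source spreading width mentioned above is not modeled.

```python
from collections import defaultdict
from math import comb


def smooth_spike(length, weight, r, max_len=64):
    """Spread one spike over lengths length-r .. length+r."""
    k = 2 * r
    spread = {}
    for i in range(k + 1):
        pos = length - r + i
        if 0 <= pos <= max_len:                 # truncate at the edges
            spread[pos] = weight * comb(k, i) / 2 ** k
    total = sum(spread.values())
    # Renormalize so the truncated spike still carries `weight`.
    return {pos: w * weight / total for pos, w in spread.items()}


def smooth(dist, r, max_len=64):
    """Apply binomial spreading to every spike; spreading is cumulative."""
    result = defaultdict(float)
    for length, weight in dist.items():
        for pos, w in smooth_spike(length, weight, r, max_len).items():
            result[pos] += w
    return dict(result)


print(smooth({32: 1.0}, 1))
print(smooth({0: 1.0}, 1))     # spike at the edge: truncated and renormalized
```

With a spike at the edge, half of the binomial is cut off, so the mean of the smoothed distribution moves away from the edge.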

23
Smoothing Example Single Spike
  • All prefix lengths are 16 bits
  • Database target size: 64,000 filters
  • No scope adjustment, s = 0
  • Generate databases for various values of
    smoothing adjustment, r

[Figure: (a) r = 0; (b) r = 0, top view]
24
Single Spike with r = 8
  • r = 8 is the maximum Manhattan distance from
    the original spike
  • Observe symmetric binomial distribution across
    total prefix length (diagonal) and source prefix
    length

[Figure: (a) r = 8; (b) r = 8, top view]
25
Single Spike with r = 32
  • r = 32 is the maximum Manhattan distance from
    the original spike
  • Observe symmetric binomial distribution across
    total prefix length (diagonal) and source prefix
    length

[Figure: (a) r = 32; (b) r = 32, top view]
26
Smoothing with Seed Parameter File
  • r = 16
  • Appears to be the sensible limit to smoothing for
    real databases
  • Spreading is cumulative, adjacent spikes may
    spread into each other creating new dominant
    spikes

27
Understanding Smoothing Effects
  • High sensitivity for small values of smoothing
    adjustment, r
  • Believe that this is due to dominance of spikes
    at the more specific edges of the joint
    distributions in seed databases
  • Truncation causes a slight drift to a larger
    average scope

28
Smoothing Contrived Distributions
  • Constructed two contrived distributions to verify
    hypothesis
  • Spikes: all joint distributions have two points,
    (0,0) and (32,32)
  • Uniform: uniform total length distribution
  • Observed identical drift for spikes distribution
    and no drift for uniform distribution

29
Scope Adjustment
  • Scope (s): bias to more or less specific
    filters, s in [-1, 1]
  • Adjusts the shape of the address length
    distributions without adding or removing bins
  • s > 0: decrease scope, increase specificity
    (prefix length)
  • s < 0: increase scope, decrease specificity
    (prefix length)
  • Utilize a bias function on the random number used
    to select from the cumulative distributions
  • Bias function computes area under line whose
    slope is defined by s
  • Prevents laborious recomputation of each prefix
    length distribution

[Figure: bias function plotted over the unit interval for s = -1, s = 0, and s = 1]
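
A minimal sketch of a bias function of this kind. The specific line is an assumption: y(t) = 1 + s(1 - 2t), whose area from 0 to x is x(1 + s - sx). It was chosen so that, with a cumulative distribution ordered from short to long prefix lengths, s > 0 selects longer prefixes as stated above; the line used in the original tool may be oriented differently.

```python
import bisect
import random


def bias(x, s):
    """Area under the assumed line y(t) = 1 + s(1 - 2t) from 0 to x."""
    return x * (1 + s - s * x)


def select_length(lengths, cumulative, s, rng):
    """Pick a prefix length using a biased uniform random number.

    lengths:    prefix lengths in ascending order
    cumulative: matching cumulative probabilities, ending at 1.0
    """
    u = bias(rng.random(), s)
    index = bisect.bisect_left(cumulative, u)
    return lengths[min(index, len(lengths) - 1)]


lengths = list(range(33))                              # uniform over 0..32
cumulative = [(i + 1) / 33 for i in range(33)]
rng = random.Random(0)
for s in (-1, 0, 1):
    picks = [select_length(lengths, cumulative, s, rng) for _ in range(10000)]
    print(s, sum(picks) / len(picks))
```

Only the random number is transformed, so the stored distributions and their bins are left untouched.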
30
Scope Example Uniform Distribution
  • Uniform distribution, r = 0, s = 1
  • Weight is pushed to more specific address prefixes

31
Scope Contrived Distributions
  • Maximum bias of 12 bits longer or shorter in
    total prefix length
  • Provides for a 4096x increase or decrease in the
    average coverage of the filters in the database
  • As expected, negligible difference in two
    distributions
  • No change in bins, only a shift in weight

32
Scope Real Distributions
  • Observed maximum bias of 6 bits longer or
    shorter in total prefix length
  • Provides for a 64x increase or decrease in the
    average coverage of the filters in the database
  • Sensitivity is dependent upon parameter file

33
Synthetic Database Generation Summary
  • Solid foundation for a packet classification
    benchmark
  • May be beneficial to have a high-level skew
    adjustment or skew compensation coupled with
    scaling
  • Allow more branching for larger databases
  • Need more sample databases from other application
    environments in order to compile benchmark suite
    of parameter files
  • Alternately, formulate parameter files manually
    from more detailed extensions of Woo's
    descriptions

34
Trace Generator
  • Problem: given a filter database, construct an
    input trace of packet headers that query the
    database at all interesting points and an
    associated output trace of best-matching (or
    all-matching) filters for each packet header
  • We can define "interesting" in various ways
  • A point in each 5-d polyhedron formed by the
    intersections of the 5-d rectangles specified by
    the filters in the database (optimal solution)
  • Appears to be an O((n log n)^5) problem using
    fancy data structures
  • Optimizations may exist and amortized performance
    may be better
  • A random selection of points (least favorable
    solution)
  • A pseudo-random selection of points (most
    feasible solution?)
  • For each filter, choose a few random points
    covered by the filter
  • Might be able to develop some heuristics to
    choose points that are and are not likely to be
    overlapped by other filters
  • Post-process the input trace in order to generate
    the output trace
  • Could feedback results of post-process in order
    to choose points for filters not appearing in the
    output trace
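
The pseudo-random approach can be sketched as follows, assuming each filter is a 5-d rectangle given as five inclusive (low, high) ranges and that filters are listed in priority order. The output trace is produced by a linear-search post-process, and the set of filters that never appear in it is returned as the feedback signal. All names here are hypothetical.

```python
import random


def random_point(rect, rng):
    """A random packet header inside the filter's 5-d rectangle."""
    return tuple(rng.randint(lo, hi) for lo, hi in rect)


def best_match(filters, point):
    """Index of the first (highest priority) filter covering the point."""
    for index, rect in enumerate(filters):
        if all(lo <= x <= hi for (lo, hi), x in zip(rect, point)):
            return index
    return None


def generate_trace(filters, points_per_filter, rng):
    """Return (input trace, output trace, filters missing from the output)."""
    headers = [random_point(rect, rng)
               for rect in filters
               for _ in range(points_per_filter)]
    matches = [best_match(filters, h) for h in headers]   # post-process
    unhit = set(range(len(filters))) - set(matches)       # feedback candidates
    return headers, matches, unhit


ANY = ((0, 2**32 - 1), (0, 2**32 - 1), (0, 65535), (0, 65535), (0, 255))
filters = [
    ((5, 5), (7, 7), (80, 80), (0, 65535), (6, 6)),   # specific filter
    ANY,                                              # default filter
    ((1, 1), (1, 1), (1, 1), (1, 1), (1, 1)),         # shadowed by the default
]
headers, matches, unhit = generate_trace(filters, 3, random.Random(0))
print(matches, unhit)
```

In this example the third filter can never be the best match, so it is reported as unhit no matter how many points are drawn; a feedback loop would need to recognize such fully shadowed filters and stop retrying them.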

35
The next step
  • Finalize trace generator design, implement, and
    analyze (if necessary)
  • Run several packet classification algorithms
    through the benchmark
  • Use results to refine tools and develop
    benchmarking methodology that extracts salient
    features
  • Investigate ways to generate broad interest in
    the benchmark
  • Publication
  • Web-based scripts
  • Pitch to the IETF
  • Comments, critiques, suggestions, questions?