Title: Packet Classification
1Packet Classification 3
- Ozgur Ozturk
- CSE 581 Internet Technology
- Winter 2002
Packet Classification 3 CSE 581 Internet
Technology (Winter 2002) Ozgur Ozturk 02/11/02
2Introduction
- Importance
- Identify the context of packets → apply the necessary actions
- Differentiated services
- Memory and time efficiency
- Must handle thousands of rules
- Must run at wire speed (no queuing)
3Packet Classification 3: Paper List
- T. Lakshman, D. Stiliadis, "High-Speed Policy-based Packet Forwarding Using Efficient Multi-dimensional Range Matching" (Bit-Parallelism) - http://www.bell-labs.com/user/stiliadi/filter/paper.html
- F. Baboescu, G. Varghese, "Scalable Packet Classification" (ABV: Aggregated Bit Vector)
- M. Buddhikot, S. Suri, M. Waldvogel, "Space Decomposition Techniques for Fast Layer-4 Switching" (Space Decomposition)
- V. Srinivasan, G. Varghese, S. Suri, M. Waldvogel, "Fast and Scalable Layer Four Switching" (Paper4)
4Bit-Parallelism Paper-Intro.
- Presents packet classification schemes
- Traffic-independent, worst-case performance metric
- A few thousand rules, at rates of millions of packets per second, using range matches on more than 4 packet header fields
5Bit-Parallelism PaperRequirement for Real-Time
Operation
- Traditional router architectures
- Flow-cache architectures to classify packets
- Identified flows are expected to arrive in the near future
- Current backbone routers
- Number of active flows extremely high
- OC-3 links, 256K flows
- Caches implemented as hash tables
- Scales well to that size
6Bit-Parallelism PaperRequirement for Real-Time
Operation 2
- Hash-table problems
- A good hash function is non-trivial
- 100 to 200 bits of header must be randomly distributed over no more than 20 to 24 bits of hash index
- Header value distribution is unknown
- Performance of cache-based schemes is heavily traffic dependent
- Malicious users
- Limitations of hashing algorithms and caching techniques
- Packet queuing delays acceptable after classification
7Bit-Parallelism Paper Packet Classification
Constraints
- Scale to large routers with Gigabit links.
- Process at wire speed
- 75% of packets are smaller than the typical TCP packet size (552 bytes)
- Nearly half are 40 to 44 bytes (TCP ACKs)
- Rules on several fields, specifying ranges, exact matches and prefixes
- Two prefix fields in some cases
- Allow arbitrary priorities for policies, to distinguish among multiple matches
- Optimize for lookups, sacrifice update performance
- Ratio of lookup rate to update rate on the order of $10^7$
8Bit-Parallelism Paper Packet Classification
Constraints-2
- Memory access time is the dominant factor in worst-case lookup execution time
- Amenable to hardware implementation
- Time vs. space
9Bit-Parallelism Paper General Packet
Classification
- Decomposable search to perform multi-dimensional search for packet filtering
- k-dimensional query → a set of 1-dimensional queries on 1-dimensional intervals
- Exploit parallelism where possible
- Seek a poly-logarithmic solution
- Packet header fields → k dimensions
- Filters → overlapping regions in the k-dimensional space
10Bit-Parallelism Paper Efficiency of Proposed
Algorithms
- 1st Algorithm
- Memory: $kn^2 + O(n)$ bits per dimension
- Time: $\lceil \log(2n) \rceil + 1$
- Memory accesses: $\lceil n/w \rceil$
- 2nd Algorithm
- Memory reduced to $O(n \log n)$ bits
- Time increases by a constant
- Can be optimized for time and memory budget
- Exploit on-chip memory in a traffic-independent manner, to speed up the worst case.
11Notation
- Rule $r_m$ in k dimensions
- $r_m = (e_{1,m}, e_{2,m}, \ldots, e_{k,m})$
- Each $e_{i,m}$ is a range
12Bit-Parallelism Paper Algorithm demo on
2-D/Preprocessing 1
13Bit-Parallelism Paper Algorithm demo on
2-D/Preprocessing 2
Max $2n+1$ intervals for n rules
14Bit-Parallelism Paper Algorithm demo on
2-D/Preprocessing 3
Sets of rules formed corresponding to each region
15Bit-Parallelism Paper Algorithm demo on
2-D/Online 1
- Packet P1 = (x, y) to be classified
- Find the intervals that x and y belong to
- Binary search → $\lceil \log(2n+1) \rceil + 1$ comparisons per dimension
- Create the intersection of all sets
- Conjunction of the corresponding bit vectors
- Highest-priority entry in the resultant bit vector
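The preprocessing and online steps above can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: it assumes rules are tuples of inclusive integer ranges listed in priority order (index 0 is the highest priority), and the names `preprocess` and `classify` are hypothetical. Python integers stand in for the n-bit bitmaps.

```python
from bisect import bisect_right

def preprocess(rules, dim):
    """Project rules onto one dimension: sorted boundaries plus one bitmap per interval.

    Bit m of a bitmap is set when rule m covers that elementary interval.
    With n rules there are at most 2n boundaries, hence at most 2n+1 intervals.
    """
    points = sorted({p for r in rules for p in (r[dim][0], r[dim][1] + 1)})
    bitmaps = [0]  # interval 0 lies below every rule, so no rule covers it
    for start in points:
        bits = 0
        for m, r in enumerate(rules):
            lo, hi = r[dim]
            if lo <= start <= hi:
                bits |= 1 << m
        bitmaps.append(bits)
    return points, bitmaps

def classify(tables, packet, n_rules):
    """Binary-search each dimension, AND the bitmaps, return the lowest set bit."""
    result = (1 << n_rules) - 1
    for (points, bitmaps), value in zip(tables, packet):
        result &= bitmaps[bisect_right(points, value)]
    if result == 0:
        return None  # no rule matches
    return (result & -result).bit_length() - 1

# Hypothetical 2-D rule set: ((x_lo, x_hi), (y_lo, y_hi)) per rule.
rules = [((0, 3), (2, 5)), ((2, 7), (0, 3)), ((0, 7), (0, 7))]
tables = [preprocess(rules, d) for d in range(2)]
print(classify(tables, (2, 3), len(rules)))  # 0
print(classify(tables, (5, 1), len(rules)))  # 1
```

In hardware the AND would run over w-bit memory words, one processing element per dimension; here a single big-integer AND plays that role.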
16Bit-Parallelism Paper Algorithm demo on
2-D/Online 2
- Max set cardinality O(n)
- Intersection step examines all rules at least once → time complexity O(n)
- With bit-level parallelism
- The bitmaps representing the sets are stored in a $(2n+1) \times n$ array $B_j[i, 1..n]$ (the set $R_{i,j}$ is stored for each dimension)
- $\lceil kn/w \rceil$ memory accesses
- Different processing elements for each dimension in hardware implementation
- Prototype
17Different processing elements for each dimension
in hardware implementation Prototype
18Bit-Parallelism Paper- Algorithm 2 Packet Class.
based on Inc. Reads
- Algorithm utilizes incremental reads to reduce required memory
- Allows time-space optimization and increases localization for off-chip SDRAM and wide on-chip memory implementations
- Consider a specific dimension j
- Assume a maximum of $2n+1$ non-overlapping intervals
- Each interval corresponds to an n-bit bitmap, with the positions of the 1s indicating the filter rules that overlap this interval
- The bitmaps of adjacent intervals differ in only one bit
- A single bitmap and 2n pointers of size log n to the differing bits can be used to reconstruct any bitmap
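A small sketch of the single-bitmap-plus-pointers idea, under the assumption stated above that adjacent bitmaps differ in exactly one bit (which holds when all rule endpoints in the dimension are distinct). The helper names `build_incremental` and `reconstruct` are hypothetical.

```python
def build_incremental(bitmaps):
    """Keep the first bitmap and, per later interval, the index of the bit that flips."""
    base = bitmaps[0]
    pointers = []
    for prev, cur in zip(bitmaps, bitmaps[1:]):
        diff = prev ^ cur
        if diff == 0 or diff & (diff - 1):
            raise ValueError("adjacent bitmaps must differ in exactly one bit")
        pointers.append(diff.bit_length() - 1)
    return base, pointers

def reconstruct(base, pointers, i):
    """Rebuild the bitmap of interval i by replaying the first i bit flips."""
    bits = base
    for p in pointers[:i]:
        bits ^= 1 << p
    return bits

# Hypothetical example: n = 3 rules, 2n+1 = 7 intervals, 2n = 6 pointers.
bitmaps = [0b000, 0b001, 0b011, 0b111, 0b110, 0b100, 0b000]
base, pointers = build_incremental(bitmaps)
print(pointers)                                # [0, 1, 2, 0, 1, 2]
print(bin(reconstruct(base, pointers, 3)))     # 0b111
```

Each pointer needs only about log n bits, which is where the O(n log n) total comes from; the price is the extra reads needed to replay the flips.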
19Bit-Parallelism Paper- Algorithm 2 Packet Class.
based on Inc. Reads 2
- Reduces space requirement to $O(n \log n)$ from $O(n^2)$
- Further generalize
- $(2n+1)/l$ bitmaps instead of 1
- $\lceil (2n+1)/2l \rceil$ pointers needed
- Choose $l$ by need
- $l = 2n+1$ → memory reduces to $O(n \log n)$
- Memory accesses increase: $\lceil n/w \rceil$ → $\lceil 2n \log n / w \rceil$
- Trade-off decision according to on-chip/off-chip memory ratio.
20Bit-Parallelism Paper- Algorithm 2 Special Case
2-D Classification
- Necessary for best-effort traffic aggregation in the Internet backbone
- Determine next hop and resource allocations based on destination and source addresses only
- Longest prefix match lookups
- Restrict source prefix ranges to powers of 2 in order to reduce space
- Space requirement O(n) with trie implementation
- Virtual intervals
- Map intervals of prefix lengths to both dimensions, sorted by length
- Virtual intervals allow worst-case lookup time of $O(l_s \log n)$, where $l_s$ is the number of possible prefix lengths
- Multicast group identification requires only two additional memory accesses
21Bit-Parallelism Paper- Algorithm 2 Conclusions
- Packet classification, or filtering, is a useful primitive in connectionless networks to provide differentiated service and policy-based routing
- More recently, security and active processing
- Two multi-dimensional range matching algorithms allow millions of packets per second to be processed against a set of thousands of filter rules
- Robust and predictable worst-case performance
- Efficient 2-D algorithm for backbone routers with hundreds of thousands of routing entries
- Algorithms demonstrate that there may be no need to restrict filtering to edge routers
22Paper4 Layer Four Switching
- A traditional router performs lookups based on the destination address
- Layer four switching provides increased flexibility: it gives a router the capability to distinguish kinds of traffic and treat them differently
- Block traffic from dangerous sites
- Provide QoS for certain traffic
- Give preferential treatment to certain traffic (say, database flows).
- Difficulties: needs layer four header information, which may not always be available
- Any modification of the layer four header may cause problems
- Unclear how to obtain header information when packets are encrypted
- Some variants of L4S
- Firewall
- Reservation protocols such as RSVP
- Routing based on traffic type, say web traffic
23Paper4The Best Matching Filter Problem
- A packet P has k distinct header fields for lookup: H1, ..., Hk
- The filter database of a Layer 4 router consists of a finite set of filters F1, F2, ..., FN; each filter Fi has an associated directive act_i
- Match: each field of P matches the corresponding field of F
- Cost: used to determine an unambiguous match (say, the order of the filters)
- An address range can always be converted into a sequence of prefixes, so we can use prefix matching
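A filter database (table; row layout lost in extraction)
Columns: Dest, Src, DP (destination port), SP (source port), flags
Values shown: Dest: M, T1, Net; Src: S, T0, Net; DP: 25, 53, 23, 123; SP: 123; flags: UDP, TCP-ACK
A packet example: (M, S, UDP, 53, 125)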
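The remark that a range can always be converted into a sequence of prefixes can be illustrated with the usual greedy split into aligned power-of-two blocks. This is a sketch, not taken from the paper; the function name `range_to_prefixes` and the bit-string output format are assumptions made for illustration.

```python
def range_to_prefixes(lo, hi, width):
    """Cover the inclusive integer range [lo, hi] of width-bit values with prefixes.

    Each prefix is returned as a bit string followed by '*'.
    """
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo ...
        size = lo & -lo if lo else 1 << width
        # ... shrunk until it no longer overshoots hi.
        while size > hi - lo + 1:
            size >>= 1
        plen = width - (size.bit_length() - 1)
        bits = format(lo >> (width - plen), f"0{plen}b") if plen else ""
        prefixes.append(bits + "*")
        lo += size
    return prefixes

print(range_to_prefixes(1, 14, 4))
# ['0001*', '001*', '01*', '10*', '110*', '1110*']
print(range_to_prefixes(1024, 65535, 16))
# ['000001*', '00001*', '0001*', '001*', '01*', '1*']
```

The first call shows the worst case for a 4-bit field (2W - 2 = 6 prefixes); the second shows that a common port range such as "greater than 1023" needs 6 prefixes.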
24Paper4Set Pruning Trees (1)
- Build a trie on the destination prefixes in the database
- Each valid prefix in the destination trie points to a trie containing some source prefixes.
- A single filter may fit under multiple destination prefixes, and thus has copies in multiple source tries.
- Memory space $O(N^2)$
- Time complexity O(N)
25Set Pruning Trees (2)
[Figure: set-pruning trees. A Dest-Trie whose valid prefixes point to Src-Tries; nodes are labeled with filters F1 to F7, several of them copied into more than one Src-Trie. E.g. looking for (001, 001).]
26Avoid the Memory Blowup (1)
- Avoid the copying by having each destination prefix D point to a source trie that stores the filters whose destination field is exactly D
- When searching, may need to go back to the destination trie multiple times
- Time complexity $O(W^2)$
- Space complexity O(NW)
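A minimal sketch of this copy-free structure, assuming prefixes are given as bit strings (the empty string is the wildcard) and that a lower cost means a better filter. The class and function names, and the four filters used, are hypothetical and are not the database from the paper's figure.

```python
class Node:
    def __init__(self):
        self.child = {}       # '0' or '1' -> Node
        self.src_trie = None  # used only by destination-trie nodes
        self.filt = None      # (cost, name), used only by source-trie nodes

def insert(root, dst, src, cost, name):
    """Store the filter once, in the source trie hanging off its exact dest prefix."""
    node = root
    for b in dst:
        node = node.child.setdefault(b, Node())
    if node.src_trie is None:
        node.src_trie = Node()
    node = node.src_trie
    for b in src:
        node = node.child.setdefault(b, Node())
    if node.filt is None or cost < node.filt[0]:
        node.filt = (cost, name)

def lookup(root, dst_bits, src_bits):
    """Visit every matching dest prefix and search its source trie from the root."""
    best = None
    d, depth = root, 0
    while d is not None:
        s, i = d.src_trie, 0
        while s is not None:
            if s.filt is not None and (best is None or s.filt < best):
                best = s.filt
            s = s.child.get(src_bits[i]) if i < len(src_bits) else None
            i += 1
        d = d.child.get(dst_bits[depth]) if depth < len(dst_bits) else None
        depth += 1
    return best[1] if best else None

root = Node()
for dst, src, cost, name in [("0", "10", 1, "F1"), ("00", "0", 2, "F2"),
                             ("", "00", 3, "F3"), ("1", "", 4, "F4")]:
    insert(root, dst, src, cost, name)
print(lookup(root, "001", "001"))  # F2
```

The nested loops make the cost visible: up to W destination prefixes, each followed by a source search of up to W steps, which is the $O(W^2)$ worst case that switch pointers are meant to remove.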
27Avoid the Memory Blowup (2)
[Figure: Dest-Trie with one Src-Trie per destination prefix, storing filters F1 to F7 without copies. E.g. looking for (001, 001).]
Memory requirement: O(NW). Lookup worst case: $O(W^2)$.
28Improving Search Time Basic Grid-of-Tries (1)
- Basic idea
- Use pre-computation and switch pointers (in the lower-level tries) to speed up the search in a later source trie based on the search in an earlier source trie (remember the previous search result).
- Role of switch pointers
- Allow us to increase the length of the matching source prefix without having to restart at the root of the next ancestor source trie.
- Stored filter: node (D, S) stores the least-cost filter whose dest field is a prefix of D and whose src field is a prefix of S
- Time complexity 2W
- Space complexity O(NW)
29Improving Search Time Basic Grid-of-Tries (2)
[Figure: basic grid-of-tries. Dest-Trie and Src-Tries holding filters F1 to F7, with nodes labeled x and y. E.g. looking for (001, 001).]
30Further Improvement Extension
- Use some faster scheme for destination address matching
- Time complexity O(W) → O(log W)
- Use multi-bit tries for source address matching
- Time complexity O(W) → O(W/k)
- Extend grid-of-tries to handle protocol and port fields
- 3 GOT copies, for TCP, UDP and OTHER respectively
- 4 hash tables for the 4 port combinations
- Both unspecified, destination only, source only, both specified
31Cross-Producting (1)
- How-to
- Slice the filter database into columns, the i-th column storing all distinct prefixes in field i.
- Make a cross-product table of all k columns
- Pre-compute the least-cost filter that matches each cross-product entry
- When a packet comes in, do best prefix matching for each field separately
- With the matching results, find the corresponding entry in the cross-product table
- Discussion
- Very fast (for matching)
- Problem: memory explosion, $N^k$
- Solution: on-demand cross-producting
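The how-to steps can be sketched as below, assuming every field is a bit-string prefix and filters are listed in increasing cost order. The names `build`, `best_prefix` and `classify`, and the three two-field filters, are hypothetical; a real implementation would use a trie or similar for the per-field best match rather than a linear scan.

```python
from itertools import product

def build(filters):
    """filters: list of (fields, name) in increasing cost order."""
    k = len(filters[0][0])
    # Column i holds the distinct prefixes that appear in field i.
    columns = [sorted({fields[i] for fields, _ in filters}) for i in range(k)]
    table = {}
    for combo in product(*columns):
        for fields, name in filters:  # first hit is the least-cost filter
            if all(c.startswith(p) for c, p in zip(combo, fields)):
                table[combo] = name
                break
    return columns, table

def best_prefix(column, value):
    """Longest prefix in the column that matches the header value, if any."""
    matches = [p for p in column if value.startswith(p)]
    return max(matches, key=len) if matches else None

def classify(columns, table, header):
    key = tuple(best_prefix(col, v) for col, v in zip(columns, header))
    return table.get(key)

filters = [(("00", "1"), "F1"), (("0", "10"), "F2"), (("", ""), "F3")]
columns, table = build(filters)
print(len(table))                                # 9
print(classify(columns, table, ("001", "101")))  # F1
print(classify(columns, table, ("010", "100")))  # F2
```

Three filters over two fields already give a 3 x 3 table; with N filters and k fields the table can grow to $N^k$ entries, which is the memory explosion noted above.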
32Cross-Producting (2)
Filter database (table; row layout lost in extraction): columns Dest, Src, DP, SP, flags

Distinct prefixes per field:
Dest Prefix: M, T1, Net, Default
Src Prefix: S, T0, Net, Default
DestPort Prefix: 25, 53, 23, 123, Default
SrcPort Prefix: 123, Default
Flags Prefixes: UDP, TCP-ACK, Default

Num   CrossProduct                                    Matching Filter
1     M, S, 25, 123, UDP                              F1
2     M, S, 25, 123, TCP-ACK                          F1
3     M, S, 25, 123, default                          F1
4     M, S, 25, default, UDP                          F1
5     M, S, 25, default, TCP-ACK                      F1
6     M, S, 25, default, default                      F1
...
479   default, default, default, default, TCP-ACK     F8
480   default, default, default, default, default     F8

E.g. Looking for (M,S,UDP,25,57)
33Conclusions
- GOT solution: scalable (linear) storage, fast lookups for D-S filters.
- More general filters → high lookup cost
- Cross-producting solution: higher variance, but faster on average (for lookup) because it relies on caching.
- Hybrid scheme combines flexibility with efficiency.
34ABV: "Scalable Packet Classification", F. Baboescu, G. Varghese
- GOAL
- Packet classification
- Scalable (in rules, up to 100,000)
- Wire speed
- Past Work
- Linear time search
- Linear amount of TCAMs
- Lucent scheme
- Worst case doesn't scale
35SOLUTION
- Aggregated Bit Vector
- improvement on Lucent bit vector
- rule aggregation
- rule rearrangement
- Rule Aggregation
- bit vectors are sparse
- i.e., few rules match
- Some compression scheme
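The aggregation idea can be sketched as follows, assuming an aggregate size of `a` bits: one aggregate bit summarizes `a` consecutive bits of the rule bit vector, and only chunks whose aggregate bits are set in every field are fetched. The function names and the read counter are hypothetical additions for illustration; in a real design the aggregates would be precomputed and stored alongside the vectors.

```python
def aggregate(bits, n, a):
    """Aggregate bit j is 1 when chunk j (a bits wide) of the vector has any 1."""
    agg = 0
    mask = (1 << a) - 1
    for j in range((n + a - 1) // a):
        if (bits >> (j * a)) & mask:
            agg |= 1 << j
    return agg

def abv_match(vectors, n, a):
    """Return (lowest common set bit or None, number of chunk reads)."""
    mask = (1 << a) - 1
    common = (1 << ((n + a - 1) // a)) - 1
    for v in vectors:
        common &= aggregate(v, n, a)
    reads = 0
    j = 0
    while common >> j:
        if (common >> j) & 1:
            chunk = mask
            for v in vectors:
                chunk &= (v >> (j * a)) & mask
                reads += 1
            if chunk:  # otherwise a false match: aggregates agreed, bits did not
                return j * a + (chunk & -chunk).bit_length() - 1, reads
        j += 1
    return None, reads

# Hypothetical example: 16 rules, aggregate size 4, two fields.
v1 = (1 << 1) | (1 << 9)
v2 = (1 << 2) | (1 << 9)
print(abv_match([v1, v2], 16, 4))  # (9, 4)
```

In this example chunk 0 is a false match (both aggregates have bit 0 set, but rules 1 and 2 differ), so 4 chunk reads are needed instead of the 8 a plain bit-vector scan would make. Rule rearrangement aims to reduce such false matches.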
36SOLUTION continued
- Rule Rearrangement
- overlap is rare
- place rules w/ common values together
- sort out rule ordering later
37Comparing ABV w/ BV of Lucent
Packet Classification 3 CSE 581 Internet
Technology (Winter 2002) Ozgur Ozturk 02/11/02
38Results
- At least an order of magnitude faster than BV
- Scales well for memory access
39Paper 3: "Space Decomposition Techniques for Fast Layer-4 Switching", M. Buddhikot, S. Suri, M. Waldvogel
- New scheme, based on space decomposition, whose search time is comparable to the best existing schemes, but which also offers fast worst-case filter update time.
- Three key ideas
- Innovative data structure based on quadtrees for a hierarchical representation of the recursively decomposed search space
- Fractional cascading and precomputation to improve packet classification time
- Prefix partitioning to improve update time
40Space Decomposition Evaluation
- Depending on the actual requirements of the system this algorithm is deployed in, a single parameter $\alpha$ can be used to trade off search time for update time.
- Amenable to fast software and hardware implementation.
- For N two-dimensional filters specified using prefixes of up to W bits in length, the Area-based Quadtree (AQT) data structure requires O(N) space, $O(\alpha W)$ search time, and $O(\alpha N^{1/\alpha})$ update time.
- Both the average and worst-case search times and memory consumption are comparable to or better than other schemes known in the literature.