Title: Packet Level Algorithms
1Packet Level Algorithms
2Goals of the Talk
- Consider algorithms/data structures for measurement/monitoring schemes at the router level.
- Focus on packets, flows.
- Emphasis on my recent work, future plans.
- Applied theory.
- Less on experiments, more on design/analysis of data structures for applications.
- Hash-based schemes
- Bloom filters and variants.
3Vision
- Three-pronged research data.
- Low: Efficient hardware implementations of relevant algorithms and data structures.
- Medium: New, improved data structures and algorithms for old and new applications.
- High: Distributed infrastructure supporting monitoring and measurement schemes.
4Background / Building Blocks
- Multiple-choice hashing
- Bloom filters
5Multiple Choices: d-left Hashing
- Split hash table into d equal subtables.
- To insert, choose a bucket uniformly for each subtable.
- Place item in a cell in the least loaded bucket, breaking ties to the left.
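The insertion rule above can be sketched in a few lines of Python. This is a minimal illustration, not the talk's implementation: the class name `DLeftTable`, the unbounded list-per-bucket layout, and the use of salted SHA-256 as the per-subtable hash are all assumptions made for the example.

```python
import hashlib


class DLeftTable:
    """d equal subtables; each bucket is a Python list of items (illustrative)."""

    def __init__(self, d, buckets_per_subtable):
        self.d = d
        self.size = buckets_per_subtable
        self.tables = [[[] for _ in range(buckets_per_subtable)] for _ in range(d)]

    def _bucket(self, i, item):
        # Assumed hash: SHA-256 salted with the subtable index i.
        digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.size

    def insert(self, item):
        # One candidate bucket per subtable, listed left to right.
        choices = [(i, self._bucket(i, item)) for i in range(self.d)]
        # min() returns the first minimum, so ties are broken to the left.
        i, b = min(choices, key=lambda c: len(self.tables[c[0]][c[1]]))
        self.tables[i][b].append(item)
        return i, b

    def contains(self, item):
        return any(item in self.tables[i][self._bucket(i, item)]
                   for i in range(self.d))

    def max_load(self):
        return max(len(bucket) for sub in self.tables for bucket in sub)
```

For instance, inserting 12000 items into `DLeftTable(3, 1000)` gives an average load of 4 per bucket, and `max_load()` can then be compared against that average.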
6Properties of d-left Hashing
- Analyzable using both combinatorial methods and differential equations.
- Maximum load very small: O(log log n).
- Differential equations give very, very accurate performance estimates.
- Maximum load is extremely close to average load for small values of d.
7Example of d-left hashing
- Consider 3-left performance (fraction of buckets with each load).

Load   Average load 6.4   Average load 4
0      1.7e-08            2.3e-05
1      5.6e-07            6.0e-04
2      1.2e-05            1.1e-02
3      2.1e-04            1.5e-01
4      3.5e-03            6.6e-01
5      5.6e-02            1.8e-01
6      4.8e-01            2.3e-05
7      4.5e-01            5.6e-31
8      6.2e-03            -
9      4.8e-15            -
8Example of d-left hashing
- Consider 4-left performance with average load of 6, using differential equations.

Load   Insertions only   Alternating insertions/deletions (steady state)
≥ 1    1.0000            1.0000
≥ 2    1.0000            0.9999
≥ 3    1.0000            0.9990
≥ 4    0.9999            0.9920
≥ 5    0.9971            0.9505
≥ 6    0.8747            0.7669
≥ 7    0.1283            0.2894
≥ 8    1.273e-10         0.0023
≥ 9    2.460e-138        1.681e-27
9Review: Bloom Filters
- Given a set $S = \{x_1, x_2, x_3, \ldots, x_n\}$ on a universe U, want to answer queries of the form: is $y \in S$?
- Bloom filter provides an answer in
- Constant time (time to hash).
- Small amount of space.
- But with some probability of being wrong.
- Alternative to hashing with interesting tradeoffs.
10Bloom Filters
Start with an m-bit array B, filled with 0s.
Hash each item $x_j$ in S k times. If $H_i(x_j) = a$, set $B[a] = 1$.
To check if y is in S, check B at $H_i(y)$. All k values must be 1.
Possible to have a false positive: all k values are 1, but y is not in S.
n items, m = cn bits, k hash functions
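The construction on this slide can be sketched as follows. It is a minimal illustration under stated assumptions: the k hash functions are simulated by salting SHA-256 with the index i, and the array uses one byte per bit for readability. Neither choice comes from the talk.

```python
import hashlib


class BloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for clarity only

    def _positions(self, item):
        # Assumed stand-in for H_1..H_k: SHA-256 salted with the index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for a in self._positions(item):
            self.bits[a] = 1

    def __contains__(self, item):
        # All k positions must be 1; a non-member can pass by coincidence.
        return all(self.bits[a] for a in self._positions(item))
```

With `bf = BloomFilter(8000, 6)` and 1000 inserted items (m/n = 8), every inserted item tests positive, while a small fraction of other items do too: those are the false positives.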
11False Positive Probability
- Pr(specific bit of filter is 0) is $p' = (1 - 1/m)^{kn} \approx e^{-kn/m} = p$.
- If r is fraction of 0 bits in the filter then false positive probability is $(1 - r)^k \approx (1 - p')^k \approx (1 - p)^k = (1 - e^{-k/c})^k$.
- Approximations valid as r is concentrated around E[r].
- Martingale argument suffices.
- Find optimal at $k = (\ln 2)\, m/n$ by calculus.
- So optimal fpp is about $(0.6185)^{m/n}$.
n items, m = cn bits, k hash functions
12Example
m/n = 8
Opt k = 8 ln 2 = 5.54...
n items, m = cn bits, k hash functions
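The numbers in this example can be checked with a short calculation. The sketch below uses the standard approximation $(1 - e^{-kn/m})^k$ for the false positive probability; the function name `false_positive_rate` is an illustrative choice.

```python
import math


def false_positive_rate(bits_per_item, k):
    """Approximate false positive probability (1 - e^{-k/c})^k, with c = m/n."""
    return (1.0 - math.exp(-k / bits_per_item)) ** k


c = 8                      # m/n = 8 bits per item
k_opt = c * math.log(2)    # real-valued optimum; in practice k is an integer
rates = {k: false_positive_rate(c, k) for k in (4, 5, 6, 7)}
```

Since k must be an integer, the practical choice for m/n = 8 is k = 5 or k = 6, both of which give a false positive rate a little above 2%.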
13Handling Deletions
- Bloom filters can handle insertions, but not deletions.
- If deleting $x_i$ means resetting 1s to 0s, then deleting $x_i$ will delete $x_j$.
[Figure: $x_i$ and $x_j$ hashed into the bit array B = 0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0]
14Counting Bloom Filters
Start with an array B of m counters, filled with 0s.
Hash each item $x_j$ in S k times. If $H_i(x_j) = a$, add 1 to $B[a]$.
To delete $x_j$, decrement the corresponding counters.
Can obtain a corresponding Bloom filter by reducing to 0/1.
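A counting Bloom filter along these lines might look like the sketch below. The salted SHA-256 hashing and the unbounded Python integers used as counters are assumptions for illustration; a hardware version would use fixed-width counters.

```python
import hashlib


class CountingBloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.counters = [0] * m

    def _positions(self, item):
        # Assumed stand-in for H_1..H_k: SHA-256 salted with the index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for a in self._positions(item):
            self.counters[a] += 1

    def remove(self, item):
        # Only meaningful for items that were actually inserted.
        for a in self._positions(item):
            self.counters[a] -= 1

    def __contains__(self, item):
        return all(self.counters[a] > 0 for a in self._positions(item))

    def to_bloom_bits(self):
        # Reduce every counter to 0/1 to obtain the corresponding Bloom filter.
        return [1 if c > 0 else 0 for c in self.counters]
```

Deleting one item no longer removes another item that shares a cell, because the shared counter only drops from 2 to 1.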
15Counting Bloom Filters: Overflow
- Must choose counters large enough to avoid overflow.
- Poisson approximation suggests 4 bits/counter.
- Average load using $k = (\ln 2)\, m/n$ counters is ln 2.
- Probability a counter has load at least 16: $\approx e^{-\ln 2} (\ln 2)^{16} / 16! \approx 6.78 \times 10^{-17}$
- Failsafes possible.
- We assume 4 bits/counter for comparisons.
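The overflow estimate can be reproduced numerically. The sketch below assumes, as the slide does, that the load of a counter is approximately Poisson with mean ln 2; the helper name `poisson_pmf` is illustrative.

```python
import math


def poisson_pmf(lam, j):
    """Probability that a Poisson(lam) variable equals j."""
    return math.exp(-lam) * lam ** j / math.factorial(j)


lam = math.log(2)                 # average counter load at the optimal k
p16 = poisson_pmf(lam, 16)        # dominant term of the overflow probability
tail = sum(poisson_pmf(lam, j) for j in range(16, 60))  # load >= 16, truncated
```

The tail sum is only slightly larger than the single term for load exactly 16, which is why the one-term estimate is usually quoted for 4-bit counters.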
16Bloomier Filters
- Instead of set membership, keep an r-bit function value for each set element.
- Correct value should be given for each set element.
- Non-set elements should return NULL with high probability.
- Mutable version: function values can change.
- But underlying set cannot.
- First suggested in paper by Chazelle, Kilian, Rubinfeld, Tal.
17From Low to High
- Low
- Hash Tables for Hardware
- New Bloom Filter/Counting Bloom Filter Constructions (Hardware Friendly)
- Medium
- Approximate Concurrent State Machines
- Distance-Sensitive Bloom Filters
- High
- A Distributed Hashing Infrastructure
18Low Level: Better Hash Tables for Hardware
- Joint work with Adam Kirsch.
- Simple Summaries for Hashing with Choices.
- The Power of One Move: Hashing Schemes for Hardware.
19Perfect Hashing Approach
[Figure: Elements 1 through 5 are mapped one-to-one onto five cells, which store Fingerprint(4), Fingerprint(5), Fingerprint(2), Fingerprint(1), Fingerprint(3).]
20Near-Perfect Hash Functions
- Perfect hash functions are challenging.
- Require all the data up front: no insertions or deletions.
- Hard to find efficiently in hardware.
- In BM96, we note that d-left hashing can give near-perfect hash functions.
- Useful even with insertions, deletions.
- Some loss in space efficiency.
21Near-Perfect Hash Functions via d-left Hashing
- Maximum load equals 1
- Requires significant space to avoid all collisions, or some small fraction of spillovers.
- Maximum load greater than 1
- Multiple buckets must be checked, and multiple cells in a bucket must be checked.
- Not perfect in space usage.
- In practice, 75% space usage is very easy.
- In theory, can do even better.
22Hash Table Design Example
- Desired goals
- At most 1 item per bucket.
- Minimize space.
- And minimize number of hash functions.
- Small amount of spillover possible.
- We model as a constant fraction, e.g. 0.2%.
- Can be placed in a content-addressable memory
(CAM) if small enough.
23Basic d-left Scheme
- For hash table holding up to n elements, with max load 1 per bucket, use 4 choices and 2n cells.
- Spillover of approximately 0.002n elements into CAM.
24Improvements from Skew
- For hash table holding up to n elements, with max load 1 per bucket, use 4 choices and 1.8n cells.
- Subtable sizes 0.79n, 0.51n, 0.32n, 0.18n.
- Spillover still approximately 0.002n elements into CAM.
- Subtable sizes optimized using differential equations, black-box optimization.
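The two spillover figures can be sanity-checked with a simple large-n calculation. This is a simplified sketch, not the differential-equation analysis from the talk: it assumes that with one cell per bucket an item lands in the leftmost subtable whose chosen bucket is empty, and that throwing t·n items into s·n buckets fills about a 1 - e^{-t/s} fraction of them. The function name `spillover_fraction` is hypothetical.

```python
import math


def spillover_fraction(subtable_sizes):
    """Estimate the fraction of n items that find no empty cell.

    subtable_sizes are given in units of n (0.5 means 0.5n one-cell buckets).
    """
    remaining = 1.0  # items, in units of n, still looking for a cell
    for size in subtable_sizes:
        # Items reaching this subtable fill about size * (1 - e^{-remaining/size}).
        placed = size * (1.0 - math.exp(-remaining / size))
        remaining -= placed
    return remaining


balanced = spillover_fraction([0.5, 0.5, 0.5, 0.5])    # 2n cells in total
skewed = spillover_fraction([0.79, 0.51, 0.32, 0.18])  # 1.8n cells in total
```

Under these assumptions both layouts spill roughly 0.002n items, so the skewed layout reaches the same spillover with 10% fewer cells.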
25Summaries to Avoid Lookups
- In hardware, d choices of location can be done by parallelization.
- Look at d memory banks in parallel.
- But there's still a cost: pin count.
- Can we keep track of which hash function to use for each item, using a small summary?
- Yes: use a Bloom-filter-like structure to track.
- Skew impacts summary performance: more skew is better.
- Uses small amount of on-chip memory.
- Avoids multiple look-ups.
- Special case of a Bloomier filter.
26Hash Tables with Moves
- Cuckoo Hashing (Pagh, Rodler)
- Hashed items need not stay in their initial place.
- With multiple choices, can move item to another choice, without affecting lookups.
- As long as hash values can be recomputed.
- When inserting, if all spots are filled, new item kicks out an old item, which looks for another spot, and might kick out another item, and so on.
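The kick-out process described above can be sketched as follows. This is one simple variant, assuming one cell per bucket, salted SHA-256 as the hash functions, distinct keys, and an evicted item that always tries its next subtable in cyclic order; the names `cuckoo_insert`, `cuckoo_lookup`, and `max_kicks` are illustrative.

```python
import hashlib


def _h(i, key, size):
    # Assumed hash for subtable i: SHA-256 salted with i.
    digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % size


def cuckoo_insert(tables, key, max_kicks=50):
    """tables: list of d lists, one cell per bucket (None means empty).

    Returns None on success, or the key left without a cell after max_kicks.
    """
    d = len(tables)
    # First try every choice without moving anything.
    for i in range(d):
        b = _h(i, key, len(tables[i]))
        if tables[i][b] is None:
            tables[i][b] = key
            return None
    # All spots filled: the new item kicks out an old one, which moves on.
    i = 0
    for _ in range(max_kicks):
        b = _h(i, key, len(tables[i]))
        if tables[i][b] is None:
            tables[i][b] = key
            return None
        key, tables[i][b] = tables[i][b], key  # evict the current occupant
        i = (i + 1) % d                        # evicted key tries its next choice
    return key


def cuckoo_lookup(tables, key):
    return any(t[_h(i, key, len(t))] == key for i, t in enumerate(tables))
```

Lookups still probe only the d candidate cells, because an item is only ever moved to another of its own choices. A key returned by `cuckoo_insert` would go to an overflow area such as a CAM.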
27Benefits and Problems of Moves
- Benefit: much better space utilization.
- Multiple choices, multiple items per bucket, can achieve 90% with no spillover.
- Drawback: complexity.
- Moves required can grow like log n.
- Constant on average.
- Bounded maximum time per operation important in many settings.
- Moves expensive.
- Table usually in slow memory.
28Question: Power of One Move
- How much leverage do we get by just allowing one move?
- One move likely to be possible in practice.
- Simple for hardware.
- Analysis possible via differential equations.
- Cuckoo hard to analyze.
- Downside: some spillover into CAM.
29Comparison, Insertions Only
- 4 schemes
- No moves
- Conservative: Place item if possible. If not, try to move earliest item that has not already replaced another item to make room. Otherwise spill over.
- Second chance: Read all possible locations, and for each location with an item, check if it can be placed in the next subtable. Place new item as early as possible, moving up to 1 item left 1 level.
- Second chance, with 2 per bucket.
- Target of 0.2% spillover.
- Balanced (all subtables the same) and skewed compared.
- All done by differential equation analysis (and simulations match).
30Results of Moves: Insertions Only
Scheme         Space overhead, balanced   Space overhead, skewed   Fraction moved, skewed
No moves       2.00                       1.79                     0%
Conservative   1.46                       1.39                     1.6%
Standard       1.41                       1.29                     12.0%
Standard, 2    1.14                       1.06                     14.9%
31Conclusions, Moves
- Even one move saves significant space.
- More aggressive schemes, considering all possible single moves, save even more. (Harder to analyze, more hardware resources.)
- Importance of allowing small amounts of spillover in practical settings.
32From Low to High
- Low
- Hash Tables for Hardware
- New Bloom Filter/Counting Bloom Filter Constructions (Hardware Friendly)
- Medium
- Approximate Concurrent State Machines
- Distance-Sensitive Bloom Filters
- High
- A Distributed Hashing Infrastructure
33Low-Medium: New Bloom Filters / Counting Bloom Filters
- Joint work with Flavio Bonomi, Rina Panigrahy, Sushil Singh, George Varghese.
34A New Approach to Bloom Filters
- Folklore Bloom filter construction.
- Recall: Given a set $S = \{x_1, x_2, x_3, \ldots, x_n\}$ on a universe U, want to answer membership queries.
- Method: Find an n-cell perfect hash function for S.
- Maps set of n elements to n cells in a 1-1 manner.
- Then keep a $\lceil \log_2(1/\epsilon) \rceil$-bit fingerprint of item in each cell. Lookups have false positive $< \epsilon$.
- Advantage: each bit/item reduces false positives by a factor of 1/2, vs. $(1/2)^{\ln 2} \approx 0.6185$ for a standard Bloom filter.
- Negatives:
- Perfect hash functions non-trivial to find.
- Cannot handle on-line insertions.
35Near-Perfect Hash Functions
- In BM96, we note that d-left hashing can give near-perfect hash functions.
- Useful even with deletions.
- Main differences:
- Multiple buckets must be checked, and multiple cells in a bucket must be checked.
- Not perfect in space usage.
- In practice, 75% space usage is very easy.
- In theory, can do even better.
36First Design: Just d-left Hashing
- For a Bloom filter with n elements, use a 3-left hash table with average load 4, 60 bits per bucket divided into 6 fixed-size fingerprints of 10 bits.
- Overflow rare, can be ignored.
- False positive rate of roughly $3 \cdot 4 \cdot 2^{-10} \approx 0.0117$ (3 buckets checked, average load 4, 10-bit fingerprints).
- Vs. 0.000744 for a standard Bloom filter.
- Problem: Too much empty, wasted space.
- Other parametrizations similarly impractical.
- Need to avoid wasting space.
37Just Hashing Picture
[Figure: one bucket with six cells: four hold the 10-bit fingerprints 0000111111, 1010101000, 0001110101, 1011011100, and two are empty.]
38Key: Dynamic Bit Reassignment
- Use 64-bit buckets: 4-bit counter, 60 bits divided equally among actual fingerprints.
- Fingerprint size depends on bucket load.
- False positive rate of 0.0008937
- Vs. 0.0004587 for a standard Bloom filter.
- DBR: Within a factor of 2.
- And would be better for larger buckets.
- But 64 bits is a nice bucket size for hardware.
- Can we remove the cost of the counter?
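The effect of dynamic bit reassignment can be estimated from the load distribution given earlier for 3-left hashing with average load 4. The sketch below is a back-of-the-envelope estimate under stated assumptions, not the talk's exact calculation: a lookup checks one bucket per subtable, each stored fingerprint matches a non-member with probability 2^{-bits}, and a bucket with load L gets floor(60/L) bits per fingerprint. The names `LOAD_DIST` and `estimated_fpr` are hypothetical.

```python
# Fraction of buckets with each load, copied from the 3-left, average-load-4
# example earlier in the talk (rounded values).
LOAD_DIST = {0: 2.3e-05, 1: 6.0e-04, 2: 1.1e-02, 3: 1.5e-01,
             4: 6.6e-01, 5: 1.8e-01, 6: 2.3e-05, 7: 5.6e-31}


def estimated_fpr(load_dist, d, bits_for_load):
    """Union-bound estimate: d buckets checked, `load` fingerprints in each."""
    per_bucket = sum(p * load * 2.0 ** -bits_for_load(load)
                     for load, p in load_dist.items() if load > 0)
    return d * per_bucket


fixed = estimated_fpr(LOAD_DIST, 3, lambda load: 10)        # six 10-bit slots
dbr = estimated_fpr(LOAD_DIST, 3, lambda load: 60 // load)  # 60 bits shared
```

With the rounded table values, the fixed 10-bit layout comes out above 1%, while sharing the 60 bits among the fingerprints actually present brings the estimate close to the 0.0008937 quoted on this slide.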
39DBR Picture
Bucket: 000110110101 111010100001 101010101000 101010110101 010101101011
Count: 4
40Semi-Sorting
- Fingerprints in bucket can be in any order.
- Semi-sorting: keep sorted by first bit.
- Use counter to track number of fingerprints and number of fingerprints starting with 0.
- First bit can then be erased, implicitly given by counter info.
- Can extend to first two bits (or more) but added complexity.
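Semi-sorting on the first bit can be illustrated with bit strings. This is a toy sketch that represents fingerprints as Python strings of '0' and '1' rather than packed bits; the function names `semi_sort_encode` and `semi_sort_decode` are hypothetical.

```python
def semi_sort_encode(fingerprints):
    """Sort by first bit, then drop it.

    fingerprints: list of equal-length bit strings such as '0110'.
    Returns (count, zeros, tails), where zeros is the number of fingerprints
    that start with 0 and tails are the fingerprints without their first bit.
    """
    ordered = sorted(fingerprints, key=lambda fp: fp[0])  # '0...' before '1...'
    zeros = sum(1 for fp in ordered if fp[0] == "0")
    tails = [fp[1:] for fp in ordered]
    return len(ordered), zeros, tails


def semi_sort_decode(count, zeros, tails):
    """Restore the erased first bit from the two counter values."""
    return [("0" if i < zeros else "1") + tail
            for i, tail in enumerate(tails[:count])]
```

For four fingerprints of which two start with 0, the counter stores the pair (4, 2), and each stored fingerprint is one bit shorter than before.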
41DBR Semi-sorting Picture
Bucket: 000110110101 111010100001 101010101000 101010110101 010101101011
Count: 4, 2
42DBR Semi-Sorting Results
- Using 64-bit buckets, 4-bit counter.
- Semi-sorting on loads 4 and 5.
- Counter only handles up to load 6.
- False positive rate of 0.0004477
- Vs. 0.0004587 for a standard Bloom filter.
- This is the tradeoff point.
- Using 128-bit buckets, 8-bit counter, 3-left hash table with average load 6.4.
- Semi-sorting all loads: fpr of 0.00004529
- 2-bit semi-sorting for loads 6/7: fpr of 0.00002425
- Vs. 0.00006713 for a standard Bloom filter.
43Additional Issues
- Further possible improvements
- Group buckets to form super-buckets that share bits.
- Conjecture: Most further improvements are not worth it in terms of implementation cost.
- Moving items for better balance?
- Underloaded case.
- New structure maintains good performance.
44Improvements to Counting Bloom Filter
- Similar ideas can be used to develop an improved Counting Bloom Filter structure.
- Same idea: use fingerprints and a d-left hash table.
- Counting Bloom Filters waste lots of space.
- Lots of bits to record counts of 0.
- Our structure beats standard CBFs easily, by factors of 2 or more in space.
- Even without dynamic bit reassignment.
45Deletion Problem
- Suppose x and y have the same fingerprint z.
[Figure: three panels, "Insert x", "Insert y", and "Delete x?", showing the candidate locations of x and y and the stored fingerprint z.]
46Deletion Problem
- When you delete, if you see the same fingerprint at two of the location choices, you don't know which is the right one.
- Take both out: false negatives.
- Take neither out: false positives/eventual overflow.
47Handling the Deletion Problem
- Want to make sure the fingerprint for an element cannot appear in two locations.
- Solution: make sure it can't happen.
- Trick: use (pseudo)random permutations instead of hashing.
48Two Stages
- Suppose we have d subtables, each with $2^b$ buckets, and want f-bit fingerprints.
- Stage 1: Hash element x into b+f bits using a strong hash function H(x).
- Stage 2: Apply d permutations taking $[0, 2^{b+f}-1] \to [0, 2^{b+f}-1]$.
- Bucket $B_i$ and fingerprint $F_i$ for ith subtable given by ith permutation.
- Also, $B_i$ and $F_i$ completely determine H(x).
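The two stages can be sketched with very simple permutations. Everything concrete here is an assumption for illustration: b = 10 and f = 12, truncated SHA-256 as the strong hash H, and multiplication by an arbitrary odd constant modulo 2^{b+f} as the ith permutation (one kind of linear permutation), with high-order bits used for the bucket and low-order bits for the fingerprint.

```python
import hashlib

B_BITS, F_BITS = 10, 12          # hypothetical b and f
WIDTH = B_BITS + F_BITS
MASK = (1 << WIDTH) - 1
# Arbitrary odd multipliers, one per subtable (d = 4 here).
MULTIPLIERS = [0x2545F5, 0x9E377B, 0x85EBCB, 0xC2B2AF]


def stage1(x):
    """H(x): hash x down to b+f bits."""
    digest = hashlib.sha256(str(x).encode()).digest()
    return int.from_bytes(digest[:8], "big") & MASK


def stage2(h, i):
    """Apply the ith permutation; return (bucket B_i, fingerprint F_i)."""
    # Multiplying by an odd constant is a bijection modulo a power of two.
    p = (MULTIPLIERS[i] * h) & MASK
    return p >> F_BITS, p & ((1 << F_BITS) - 1)


def recover_hash(bucket, fingerprint, i):
    """Invert the ith permutation: B_i and F_i determine H(x)."""
    p = (bucket << F_BITS) | fingerprint
    inverse = pow(MULTIPLIERS[i], -1, 1 << WIDTH)  # modular inverse, Python 3.8+
    return (inverse * p) & MASK
```

Because each permutation is invertible, two elements that share a bucket and fingerprint in any one subtable must share H(x), and a stored fingerprint can be moved to another subtable without access to the original element.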
49Handling the Deletion Problem
- Lemma: if x and y yield the same fingerprint in the same bucket, then H(x) = H(y).
- Proof: because of permutation setup, fingerprint and bucket determine H(x).
- Each cell has a small counter.
- In case two elements have same hash, H(x) = H(y).
- Note they would match for all buckets/fingerprints.
- 2-bit counters generally suffice.
- Deletion problem avoided.
- Can't have two fingerprints for x in the table at the same time: handled by the counter.
50A Problem for Analysis
- Permutations imply no longer pure d-left hashing.
- Dependence.
- Analysis no longer applies.
- Some justification:
- Balanced Allocation on Graphs (SODA 2006, Kenthapadi and Panigrahy).
- Differential equations.
- Justified experimentally.
51Other Practical Issues
- Simple, linear permutations
- High order bits for bucket, low order for fingerprint.
- Not analyzed, works fine in practice.
- Invertible permutations allow moving elements if hash table overflows.
- Move element from overflow bucket to another choice.
- Powerful paradigm
- Cuckoo hashing and related schemes.
- But more expensive in implementation terms.
52Space Comparison Theory
- Standard counting Bloom filter uses c counters/element, i.e. 4c bits/element.
- The d-left CBF using r-bit remainders, 4 hash functions, 8 cells/bucket uses 4(r+2)/3 bits/element.
- Space equalized when c = (r+2)/3.
- Can change parameters to get other tradeoffs.
53Space Comparison Practice
- Everything behaves essentially according to expectations.
- Not surprising: everything is a balls-and-bins process.
- Using 4-left hashing:
- Save over a factor of 2 in space with 1% false positive rate.
- Save over a factor of 2.5 in space with 0.1% false positive rate.
54From Low to High
- Low
- Hash Tables for Hardware
- New Bloom Filter/Counting Bloom Filter Constructions (Hardware Friendly)
- Medium
- Approximate Concurrent State Machines
- Distance-Sensitive Bloom Filters
- High
- A Distributed Hashing Infrastructure
55Approximate Concurrent State Machines
- Joint work with Flavio Bonomi, Rina Panigrahy, Sushil Singh, George Varghese.
- Extending the Bloomier filter idea to handle dynamic sets, dynamic function values, in practical setting.
56Approximate Concurrent State Machines
- Model for ACSMs
- We have underlying state machine, states 1…X.
- Lots of concurrent flows.
- Want to track state per flow.
- Dynamic: Need to insert new flows and delete terminating flows.
- Can allow some errors.
- Space, hardware-level simplicity are key.
57Motivation: Router State Problem
- Suppose each flow has a state to be tracked. Applications:
- Intrusion detection
- Quality of service
- Distinguishing P2P traffic
- Video congestion control
- Potentially, lots of others!
- Want to track state for each flow.
- But compactly: routers have small space.
- Flow IDs can be 100 bits. Can't keep a big lookup table for hundreds of thousands or millions of flows!
58Problems to Be Dealt With
- Keeping state values with small space, small probability of errors.
- Handling deletions.
- Graceful reaction to adversarial/erroneous behavior.
- Invalid transitions.
- Non-terminating flows.
- Could fill structure if not eventually removed.
- Useful to consider data structures in well-behaved systems and ill-behaved systems.
59ACSM Basics
- Operations
- Insert new flow, state
- Modify flow state
- Delete a flow
- Lookup flow state
- Errors
- False positive: return state for non-extant flow
- False negative: no state for an extant flow
- False return: return wrong state for an extant flow
- Don't know: return "don't know"
- "Don't know" may be better than other types of errors for many applications, e.g., slow path vs. fast path.
60ACSM via Counting Bloom Filters
- Dynamically track a set of current (FlowID, FlowState) pairs using a CBF.
- Consider first when system is well-behaved.
- Insertion easy.
- Lookups, deletions, modifications are easy when current state is given.
- If not, have to search over all possible states. Slow, and can lead to "don't know"s for lookups, other errors for deletions.
61Direct Bloom Filter (DBF) Example
[Figure: two snapshots of a 16-cell counting Bloom filter]
0 0 1 0 2 3 0 0 2 1 0 1 1 2 0 0
0 0 0 0 1 3 0 0 3 1 1 1 1 2 0 0
62Timing-Based Deletion
- Motivation: Try to turn non-terminating flow problem into an advantage.
- Add a 1-bit flag to each cell, and a timer.
- If a cell is not touched in a phase, zero it out.
- Non-terminating flows eventually zeroed.
- Counters can be smaller or non-existent since deletions occur via timing.
- Timing-based deletion required for all of our schemes.
63Timer Example
Timer bits: 1 0 0 0 1 0 1 0
Cells:      3 0 0 2 1 0 1 1
RESET
Timer bits: 0 0 0 0 0 0 0 0
Cells:      3 0 0 0 1 0 1 0
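The reset step in this example can be written as a small function. This is a sketch based on one reading of the slide: the first row of values is taken to be the per-cell timer flags and the second the cell contents. The function name `end_of_phase` is hypothetical.

```python
def end_of_phase(cells, touched):
    """Zero every cell whose flag is 0, then clear all flags for the next phase."""
    new_cells = [value if flag else 0 for value, flag in zip(cells, touched)]
    new_flags = [0] * len(touched)
    return new_cells, new_flags


cells, flags = end_of_phase(cells=[3, 0, 0, 2, 1, 0, 1, 1],
                            touched=[1, 0, 0, 0, 1, 0, 1, 0])
```

Cells that were touched during the phase keep their values, while the untouched cells holding 2 and 1 are zeroed, which is how a non-terminating flow eventually disappears.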
64Stateful Bloom Filters
- Each flow hashed to k cells, like a Bloom filter.
- Each cell stores a state.
- If two flows collide at a cell, cell takes on "don't know" value.
- On lookup, as long as one cell has a state value, and there are not contradicting state values, return state.
- Deletions handled by timing mechanism (or counters in well-behaved systems).
- Similar in spirit to [KM], Bloom filter summaries for multiple choice hash tables.
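Insertion and lookup for a stateful Bloom filter might be sketched as below. This is a simplified illustration that covers only those two operations, with no deletion or state modification. It assumes states are nonzero integers, uses 0 for an empty cell and "?" for "don't know", and simulates the k hash functions with salted SHA-256.

```python
import hashlib

EMPTY, DONT_KNOW = 0, "?"


class StatefulBloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.cells = [EMPTY] * m

    def _positions(self, flow):
        # Assumed hashing; a set so one flow never collides with itself.
        return {int.from_bytes(hashlib.sha256(f"{i}:{flow}".encode()).digest()[:8],
                               "big") % self.m
                for i in range(self.k)}

    def insert(self, flow, state):
        for a in self._positions(flow):
            # A cell already in use by another flow becomes "don't know".
            self.cells[a] = state if self.cells[a] == EMPTY else DONT_KNOW

    def lookup(self, flow):
        values = {self.cells[a] for a in self._positions(flow)}
        if EMPTY in values:
            return None                 # flow was never inserted
        states = values - {DONT_KNOW}
        if len(states) == 1:
            return states.pop()         # one consistent state value
        return DONT_KNOW                # all cells collided, or states conflict
```

A lookup succeeds as long as at least one of the flow's cells still holds a clean state and no two cells disagree. Otherwise the answer is "don't know", which an application can route to a slow path.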
65Stateful Bloom Filter (SBF) Example
[Figure: two snapshots of a 16-cell stateful Bloom filter; "?" marks a "don't know" cell]
1 4 3 4 3 3 0 0 2 1 0 1 4 ? 0 2
1 4 5 4 5 3 0 0 2 1 0 1 4 ? 0 2
66What We Need: A New Design
- These Bloom filter generalizations were not doing the job.
- Poor performance experimentally.
- Maybe we need a new design for Bloom filters!
- In real life, things went the other way: we designed a new ACSM structure, and found that it led to the new Bloom filter/counting Bloom filter designs.
67Fingerprint Compressed Filter
- Each flow hashed to d choices in the table, placed at the least loaded.
- Fingerprint and state stored.
- Deletions handled by timing mechanism or explicitly.
- False positives/negatives can still occur (especially in ill-behaved systems).
- Lots of parameters: number of hash functions, cells per bucket, fingerprint size, etc.
- Useful for flexible design.
68Fingerprint Compressed Filter (FCF) Example
69Experiment Summary
- FCF-based ACSM is the clear winner.
- Better performance with less space than the others in test situations.
- ACSM performance seems reasonable:
- Sub-1% error rates with reasonable size.
70Distance-Sensitive Bloom Filters
- Instead of answering questions of the form: is $y \in S$?
- we would like to answer questions of the form: is $y \approx x$ for some $x \in S$?
- That is, is the query close to some element of the set, under some metric and some notion of close.
- Applications
- DNA matching
- Virus/worm matching
- Databases
71Distance-Sensitive Bloom Filters
- Goal: something in same spirit as Bloom filters.
- Don't exhaustively check set.
- Initial results for Hamming distance show it is possible [KM].
- Closely related to locality-sensitive hashing.
- Not currently practical.
- New ideas?
72From Low to High
- Low
- Hash Tables for Hardware
- New Bloom Filter/Counting Bloom Filter Constructions (Hardware Friendly)
- Medium
- Approximate Concurrent State Machines
- Distance-Sensitive Bloom Filters
- High
- A Distributed Hashing Infrastructure
73A Distributed Router Infrastructure
- Recently funded FIND proposal.
- Looking for ideas/collaborators.
74The High-Level Pitch
- Lots of hash-based schemes being designed for approximate measurement/monitoring tasks.
- But not built into the system to begin with.
- Want a flexible router architecture that allows:
- New methods to be easily added.
- Distributed cooperation using such schemes.
75What We Need
Memory: On-Chip Memory, Off-Chip Memory, CAM(s)
Computation: Hashing Computation Unit, Programming Language, Unit for Other Computation
Communication and Control: Control System, Communication Architecture
76Lots of Design Questions
- How much space for various memory levels? How can we dynamically divide memory among multiple competing applications?
- What hash functions should be included? How open should system be to new hash functions?
- What programming functionality should be included? What programming language to use?
- What communication is necessary to achieve distributed monitoring tasks given the architecture?
- Should security be a consideration? What security approaches are possible?
- And so on
77Related Theory Work
- What hash functions should be included?
- Joint work with Salil Vadhan.
- Using theory of randomness extraction, we show that for d-left hashing, Bloom filters, and other hashing methods, choosing a hash function from a pairwise independent family is enough if data has sufficient entropy.
- Behavior matches truly random hash function with high probability.
- Randomness of hash function and data combine.
- Pairwise independence enough for many applications.
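For concreteness, one textbook way to draw a hash function from a (near) pairwise independent family is the linear construction h(x) = ((ax + b) mod p) mod m. The sketch below is a generic illustration, not something taken from the talk: the prime p = 2^61 - 1, the restriction to integer keys smaller than p, and the name `make_hash` are assumptions, and the final reduction mod m makes the output only approximately uniform.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime; keys are assumed to be integers below P


def make_hash(m, rng):
    """Draw h(x) = ((a*x + b) mod P) mod m with random a, b."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)

    def h(x):
        return ((a * x + b) % P) % m

    return h


rng = random.Random(2006)     # seeded only to make the example repeatable
h = make_hash(1024, rng)
buckets = [h(x) for x in range(10)]
```

Such a function needs only two stored constants and a multiply-add, which is the kind of simplicity that matters for hardware.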
78Conclusions and Future Work
- Low: Mapping current hashing techniques to hardware is fruitful for practice.
- Medium: Big boom in hashing-based algorithms/data structures. Trend is likely to continue.
- Approximate concurrent state machines: Natural progression from set membership to functions (Bloomier filter) to state machines. What is next?
- Power of d-left hashing variants for near-perfect matchings.
- High: Wide open. Need to systematize our knowledge for next generation systems.
- Measurement and monitoring infrastructure built into the system.