Title: Cuckoo Hashing and CAMs
- For the past several years, I have had funding from Cisco to research hash tables and related data structures for approximate measuring/monitoring on routers.
- Extreme conditions:
- Limited space.
- Limited number of memory accesses.
- Amenable to hardware implementation.
- Hardware setting allows CAMs.
- Question: what are the extreme conditions for hashing applications at Google?
Theme of the Talk
How can we use CAMs (content addressable memories) to improve cuckoo hashing, a potentially breakthrough hashing approach, and make it more practical?
- CAM: content addressable memory.
- Fully associative lookup.
- Usually expensive, so must be kept small.
- Not usually considered in theoretical work, but very useful in practice.
- Can we bridge this gap?
- What can CAMs do for us?
Cuckoo Hashing (Pagh, Rodler)
- Basic scheme: each element gets two possible locations.
- To insert x, check both of x's locations. If one is empty, insert x there.
- If both are full, x kicks out an old element y. Then y moves to its other location.
- If that location is full, y kicks out z, and so on, until an empty slot is found.
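The insertion procedure above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a single array of m cells, a hypothetical salted hash `h`, and a hypothetical cap `max_moves` standing in for the $O(\log n)$ move limit. On failure it returns the element left without a cell instead of re-hashing.

```python
import hashlib
from typing import Optional


def h(i: int, key: int, m: int) -> int:
    """Hypothetical i-th hash function: salted BLAKE2b digest reduced mod m."""
    digest = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % m


class CuckooTable:
    """Two-choice cuckoo hashing over a single array of m cells (a sketch)."""

    def __init__(self, m: int, max_moves: int = 100):
        self.m = m
        self.max_moves = max_moves  # assumed cap, standing in for O(log n)
        self.cells: list = [None] * m

    def choices(self, key: int) -> tuple:
        return h(0, key, self.m), h(1, key, self.m)

    def lookup(self, key: int) -> bool:
        # Worst-case constant time: only the two candidate cells are read.
        return any(self.cells[p] == key for p in self.choices(key))

    def insert(self, key: int) -> Optional[int]:
        """Return None on success, or the element left without a cell."""
        if self.lookup(key):
            return None
        # If either location is empty, insert there.
        for p in self.choices(key):
            if self.cells[p] is None:
                self.cells[p] = key
                return None
        # Both full: kick out the occupant of the first location; the evicted
        # element then moves to its other location, and so on.
        pos = self.choices(key)[0]
        for _ in range(self.max_moves):
            key, self.cells[pos] = self.cells[pos], key
            a, b = self.choices(key)
            pos = b if pos == a else a
            if self.cells[pos] is None:
                self.cells[pos] = key
                return None
        # Too many moves (a long path or a cycle): report the homeless element.
        return key
```

Returning the homeless element, rather than silently dropping it, leaves the caller free to re-hash everything, which is the theoretical remedy for failures.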
Cuckoo Hashing Examples
Good Properties of Cuckoo Hashing
- Worst-case constant lookup time.
- Simple to build and design.
Cuckoo Hashing Failures
- Bad case 1: inserted element runs into cycles.
- Bad case 2: inserted element has a very long path before insertion completes.
- Could be on a long cycle.
- Bad cases occur with very small probability when the load is sufficiently low.
- Theoretical solution: re-hash everything if a failure occurs.
Basic Performance
- For 2 choices, load less than 50%, and n elements: failure rate of $\Theta(1/n)$; maximum insert time $O(\log n)$.
- Generalizations to more than 2 choices are possible.
- Place if possible; if not, place by kicking out a random choice, and so on.
- Random-walk multi-choice variant not fully analyzed: lots of open questions.
- Good empirical performance.
- An impractical BFS variant has failure rate $\Theta(1/n^{d-1})$ for d choices.
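The random-walk multi-choice variant (place if possible, otherwise kick out a random choice) can be sketched like this. It is an illustration under stated assumptions: a single array, a hypothetical salted hash `h`, a hypothetical move cap `max_moves`, and the common refinement (not stated on the slide) of not evicting back into the cell the element was just evicted from.

```python
import hashlib
import random
from typing import List, Optional


def h(i: int, key: int, m: int) -> int:
    """Hypothetical i-th hash function (salted BLAKE2b, reduced mod m)."""
    digest = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % m


def random_walk_insert(cells: List[Optional[int]], key: int, d: int,
                       rng: random.Random, max_moves: int = 500) -> Optional[int]:
    """d-choice random-walk cuckoo insertion.

    Returns None on success, or the element still homeless after max_moves.
    """
    m = len(cells)
    came_from = None  # cell the current element was just evicted from
    for _ in range(max_moves):
        choices = [h(i, key, m) for i in range(d)]
        # Place if possible.
        for p in choices:
            if cells[p] is None:
                cells[p] = key
                return None
        # Otherwise kick out the occupant of a random choice. Skipping the
        # cell we were just evicted from is an assumed refinement.
        options = [p for p in choices if p != came_from] or choices
        came_from = rng.choice(options)
        key, cells[came_from] = cells[came_from], key
    return key
```

With d = 3 the table can be filled well past the 50% limit of the two-choice scheme, which is why these variants perform well empirically even though their analysis is incomplete.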
Problems to be Considered
- Reduce the failure probability.
- Re-hashing is generally not an option in the router setting, and is very expensive in other settings.
- Reduce the number of moves per insert.
- Insert times may need to be bounded by a constant in the router setting.
- CAMs provide help for both problems.
Failure Probability Reduction
- Failure occurs when an element cannot be placed in one of its choices within a certain number ($O(\log n)$) of moves.
- Standard cuckoo hashing failure rate is too high for many applications.
- Even with multiple choices per element.
- Re-hashing is an expensive option, although theoretically appealing.
A CAM-Stash
- Use a CAM to stash away elements that would cause failure.
- Intuition: if failures were independent, the probability that s elements cause failures goes to $\Theta(1/n^s)$.
- Failures are not independent, but nearly so.
- A stash holding a constant number of elements greatly reduces the failure probability.
- Implemented as a CAM in hardware, or a cache line in hardware/software.
- Lookup requires also looking at the stash.
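A two-choice cuckoo table with a constant-size stash might look like this. It is a sketch, not the ESA 2008 construction: a hypothetical salted hash `h` and a hypothetical `stash_size` parameter, with a Python list modelling the CAM. An element that fails to find a cell goes into the stash, and only when the stash is full is a re-hash needed.

```python
import hashlib
from typing import List, Optional


def h(i: int, key: int, m: int) -> int:
    """Hypothetical i-th hash function (salted BLAKE2b, reduced mod m)."""
    digest = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % m


class StashedCuckoo:
    """Two-choice cuckoo hashing plus a small stash modelling the CAM."""

    def __init__(self, m: int, stash_size: int = 4, max_moves: int = 100):
        self.m, self.max_moves, self.stash_size = m, max_moves, stash_size
        self.cells: List[Optional[int]] = [None] * m
        # Models the CAM; hardware would search all entries in parallel.
        self.stash: List[int] = []

    def choices(self, key: int) -> tuple:
        return h(0, key, self.m), h(1, key, self.m)

    def lookup(self, key: int) -> bool:
        # Lookup must also look at the stash.
        return (any(self.cells[p] == key for p in self.choices(key))
                or key in self.stash)

    def _cuckoo_insert(self, key: int) -> Optional[int]:
        """Plain cuckoo insertion; returns the homeless element on failure."""
        for p in self.choices(key):
            if self.cells[p] is None:
                self.cells[p] = key
                return None
        pos = self.choices(key)[0]
        for _ in range(self.max_moves):
            key, self.cells[pos] = self.cells[pos], key
            a, b = self.choices(key)
            pos = b if pos == a else a
            if self.cells[pos] is None:
                self.cells[pos] = key
                return None
        return key

    def insert(self, key: int) -> Optional[int]:
        """Return None on success, or an element that fits nowhere."""
        if self.lookup(key):
            return None
        homeless = self._cuckoo_insert(key)
        if homeless is None:
            return None
        if len(self.stash) < self.stash_size:
            self.stash.append(homeless)  # stash it instead of re-hashing
            return None
        return homeless  # stash full: only now would a re-hash be needed
```

The lookup cost grows only by a constant-size CAM search, which is why a small stash is cheap relative to the failures it absorbs.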
Analysis Method
- Treat cells as vertices and elements as edges in a bipartite graph.
- Count components that have excess edges (more edges than vertices); the excess edges must be placed in the stash.
- Random graph analysis bounds the number of excess edges.
6 vertices, 7 edges: 1 edge must go into the stash.
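The counting step can be made concrete. A component with v vertices (cells) can hold at most v elements, and it can hold exactly that many by keeping a spanning tree plus one extra edge, so the minimum stash needed is the sum over components of max(0, edges − vertices). The sketch below computes that with union-find; the function name `required_stash_size` is hypothetical, and it accepts any multigraph, so a bipartite two-subtable layout is just a special case.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def required_stash_size(edges: Iterable[Tuple[Hashable, Hashable]]) -> int:
    """Minimum number of elements that cannot be placed in their cells.

    Each element is an edge between its two candidate cells (vertices).
    Each component contributes max(0, edges - vertices) to the stash.
    """
    edges = list(edges)
    parent: Dict[Hashable, Hashable] = {}

    def find(x: Hashable) -> Hashable:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    vertex_count: Dict[Hashable, int] = defaultdict(int)
    edge_count: Dict[Hashable, int] = defaultdict(int)
    for x in list(parent):
        vertex_count[find(x)] += 1
    for u, _ in edges:
        edge_count[find(u)] += 1
    return sum(max(0, edge_count[r] - vertex_count[r]) for r in edge_count)
```

For the slide's example, two triangles joined by one edge give 6 vertices and 7 edges, so `required_stash_size` returns 1.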
A Simple Experiment
- 10,000 items, table of size 24,000, 2 choices per element, $10^7$ trials.
Stash size needed   Number of trials
0                   9,989,861
1                   10,040
2                   97
3                   2
4                   0
- Can similarly generalize known results for cuckoo hashing with more than 2 choices and more than 1 element per bucket.
- A stash of size s reduces the failure exponent linearly in s.
- Intuition: random graph analysis exposes the bottleneck in cuckoo hashing. Stashes relieve the bottleneck.
- A CAM-stash greatly improves the potential utility of cuckoo hashing.
- Drives failures down to ignorable levels.
- Constant-sized, so cheap.
- More details in ESA 2008 paper (Kirsch/Mitzenmacher/Wieder).
- Applies to other uses of cuckoo hashing.
- History-independent cuckoo hashing,
Insertion Time Problems
- Lots of moves per insert in the worst case.
- Average is constant.
- But maximum is $\Omega(\log n)$ with non-trivial (inverse-polynomial) probability.
- Router hardware setting may need a bounded number of memory accesses per insert.
A CAM-Queue
- Insertion is a sequence of suboperations.
- Of the form "Move x to position $H_j(x)$."
- Use the CAM as a queue for pending suboperations.
- Perform suboperations from the queue as available.
- Move attempt: 1 lookup/write.
- A suboperation may cause another suboperation to go on the queue.
- Lookup: check the hash table and the CAM-queue.
- De-amortization:
- Use the queue to turn worst-case performance into average-case performance.
Queue Policy
- Can reorder suboperations and maintain correctness.
- Key point: better to give priority to new insertions over moves.
- New insertions have d choices; moves effectively have $d-1$.
- Intuition suggests older elements may be less likely to be successfully placed.
- True in practice.
- A full priority queue may be too complex.
- Simple strategy: new elements placed at the front, failed moves placed at the back.
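The CAM-queue and the simple front/back policy can be sketched together. This is an illustration, not the Allerton 2008 design: the class name `QueuedCuckoo` and parameters `ops`, `queue_capacity`, and `seed` are hypothetical; it uses one array with d hash functions rather than 4 subtables; each suboperation reads all candidate cells at once (as parallel hardware might) and, if all are full, evicts a random occupant.

```python
import hashlib
import random
from collections import deque
from typing import List, Optional


def h(i: int, key: int, m: int) -> int:
    """Hypothetical i-th hash function (salted BLAKE2b, reduced mod m)."""
    digest = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % m


class QueuedCuckoo:
    """d-choice cuckoo hashing de-amortized with a bounded queue (the CAM).

    Queue entries are pending suboperations: (element, cell it was evicted
    from, or None for a new insertion). At most `ops` run per insertion.
    """

    def __init__(self, m: int, d: int = 4, ops: int = 4,
                 queue_capacity: int = 64, seed: int = 0):
        self.m, self.d, self.ops, self.capacity = m, d, ops, queue_capacity
        self.cells: List[Optional[int]] = [None] * m
        self.queue: deque = deque()
        self.rng = random.Random(seed)
        self.max_queue = 0  # largest queue length observed

    def choices(self, key: int) -> List[int]:
        return [h(i, key, self.m) for i in range(self.d)]

    def lookup(self, key: int) -> bool:
        # Check the hash table and the CAM-queue.
        return (any(self.cells[p] == key for p in self.choices(key))
                or any(k == key for k, _ in self.queue))

    def _step(self) -> None:
        """Run one suboperation from the front of the queue."""
        key, came_from = self.queue.popleft()
        cand = self.choices(key)
        # New insertions have d choices; moved elements effectively d - 1.
        options = [p for p in cand if p != came_from] or cand
        for p in options:
            if self.cells[p] is None:
                self.cells[p] = key
                return
        # Failed move: evict a random occupant and queue it at the back.
        pos = self.rng.choice(options)
        evicted, self.cells[pos] = self.cells[pos], key
        self.queue.append((evicted, pos))

    def insert(self, key: int) -> None:
        if self.lookup(key):
            return
        if len(self.queue) >= self.capacity:
            raise OverflowError("CAM-queue full")
        self.queue.appendleft((key, None))  # new insertions get priority
        self.max_queue = max(self.max_queue, len(self.queue))
        for _ in range(self.ops):
            if not self.queue:
                break
            self._step()
```

Each insertion does a bounded amount of work (`ops` suboperations), and anything unfinished waits in the queue, where lookups can still find it; sizing `queue_capacity` is exactly the open question about queue sizes.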
Experimental Evaluation
- Table of size 32768, 4 subtables.
- Target utilization u.
- Insert 32768u elements, then alternate insertions/deletions to get to steady state.
- Allow ops queue operations (parallel memory operations) per insertion.
Moves Needed per Insertion
Probability of Success vs. Age
Queue Sizes
- Need a CAM sized so that it overflows with negligible probability.
- Maximum queue size is much bigger than average.
- Currently no analysis.
- Experiments suggest queues of size in the small 100s
possible, with 4 suboperations per insert, in
- A CAM-queue can allow effective de-amortization of cuckoo hashing.
- Insertion time is constant, at the expense of a CAM to hold pending suboperations.
- Could other data structures use this de-amortization technique?
- More details in Allerton 2008 paper.
Insertion Time Problems
- Lots of moves per insert in the worst case.
- Average is constant.
- But maximum is $\Omega(\log n)$ with non-trivial (inverse-polynomial) probability.
- Router hardware settings may need a bounded number of memory accesses per insert.
Alternative Approach: Power of One Move
- Limit to just one additional move per insert.
- One move is likely to be possible in practice.
- Simple for hardware.
- Some analysis possible via differential equations.
- Insertions-only case can be analyzed; deletions approximated.
- Easier to analyze than cuckoo hashing.
- But with moves this limited, will need a CAM to hold
a non-trivial number of elements that cannot be
Multilevel Hash Table (BK90)
- Use a multilevel hash table (MHT).
- Can store n elements with $d = \log\log n + O(1)$ levels in $O(n)$ space with high probability.
Example with $d = 4$ hash functions.
- Skew: more elements placed by early hash functions (double-exponential decay).
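A standard MHT without moves can be sketched as follows: level i is its own subtable probed with hash function i, and an element goes into the first level whose cell is free; anything that misses every level spills into a stash. The class name `MultilevelHashTable`, the hash `h`, and the geometrically shrinking sizes used in the usage note are assumptions for illustration, not the BK90 parameters.

```python
import hashlib
from typing import List, Optional


def h(i: int, key: int, m: int) -> int:
    """Hypothetical i-th hash function (salted BLAKE2b, reduced mod m)."""
    digest = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % m


class MultilevelHashTable:
    """Standard MHT (no moves): level i is a subtable probed with hash h_i."""

    def __init__(self, sizes: List[int]):
        self.levels: List[List[Optional[int]]] = [[None] * s for s in sizes]
        self.stash: List[int] = []  # elements that overflow every level

    def insert(self, key: int) -> int:
        """Place key in the first level whose cell is free; return that
        level index, or -1 if the key spills into the stash."""
        for i, table in enumerate(self.levels):
            p = h(i, key, len(table))
            if table[p] is None:
                table[p] = key
                return i
        self.stash.append(key)
        return -1

    def lookup(self, key: int) -> bool:
        return (any(t[h(i, key, len(t))] == key
                    for i, t in enumerate(self.levels))
                or key in self.stash)

    def occupancy(self) -> List[int]:
        return [sum(c is not None for c in t) for t in self.levels]
```

Inserting 1,000 keys into levels of sizes 1000, 500, 250, 125 shows the skew: most elements land in the first level, far fewer in each later one, and only a handful (if any) reach the stash.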
A CAM-Stash Redux
- In practice, want d to be a constant.
- A constant number of levels implies a constant probability of overflow per element.
- But that probability is very small.
- Need a stash to hold a constant fraction of the elements.
- Aim for a small constant fraction, e.g. an expected 0.2% of the elements overflow.
Example Schemes
- Standard MHT with no moves.
- Conservative: place the element if possible. If not, try to move the earliest element that has not already replaced another element to make room. Otherwise, spill over.
- Second chance: read all possible locations, and for each location with an element, check if it can be placed in the next subtable. Place the new element as early as possible, moving up to 1 element left 1 level.
- Second chance 2: Second chance with 2
Second Chance (SC) Scheme
- Standard MHT fills from the top down.
- Elements cascade from table to table.
- We try to slow the cascade at every step.
Standard MHT Insertion
Implementing SC in Hardware
- Read x's d hash locations in parallel.
- Hash discovered elements in parallel.
- Insert x, performing a move if necessary.
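The three hardware steps can be mirrored in software. This sketch reads this deck's description of Second chance as "place x at the earliest level whose cell is either free or holds an element that can move on to the next level", with at most one move per insert. The class name `SecondChanceMHT` and hash `h` are hypothetical, and it does not add any restriction on elements that were already moved, which the original scheme may impose.

```python
import hashlib
from typing import List, Optional


def h(i: int, key: int, m: int) -> int:
    """Hypothetical i-th hash function (salted BLAKE2b, reduced mod m)."""
    digest = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % m


class SecondChanceMHT:
    """MHT allowing at most one move per insert (a sketch of the SC idea)."""

    def __init__(self, sizes: List[int]):
        self.levels: List[List[Optional[int]]] = [[None] * s for s in sizes]
        self.stash: List[int] = []
        self.moves = 0  # number of inserts that needed a move

    def _pos(self, i: int, key: int) -> int:
        return h(i, key, len(self.levels[i]))

    def lookup(self, key: int) -> bool:
        return (any(t[self._pos(i, key)] == key
                    for i, t in enumerate(self.levels))
                or key in self.stash)

    def insert(self, key: int) -> None:
        if self.lookup(key):
            return
        d = len(self.levels)
        # Step 1: read all d hash locations of the new key.
        pos = [self._pos(i, key) for i in range(d)]
        occupants = [self.levels[i][pos[i]] for i in range(d)]
        for i in range(d):
            if occupants[i] is None:
                self.levels[i][pos[i]] = key  # earliest free location
                return
            # Step 2: hash the discovered element y; can it go to level i + 1?
            y = occupants[i]
            if i + 1 < d:
                q = self._pos(i + 1, y)
                if self.levels[i + 1][q] is None:
                    # Step 3: one move (y to the next level), then place key.
                    self.levels[i + 1][q] = y
                    self.levels[i][pos[i]] = key
                    self.moves += 1
                    return
        self.stash.append(key)
```

The loop is sequential for clarity, but every read it performs depends only on the new key's locations and their current occupants, which is what lets hardware do steps 1 and 2 in parallel.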
Results of Moves: Insertions Only
Scheme             Space overhead (balanced)   Space overhead (skewed)   Fraction moved (%, skewed)
No moves           2.00                        1.79                      0
Conservative       1.46                        1.39                      1.6
Second chance      1.41                        1.29                      12.0
Second chance, 2   1.14                        1.06                      14.9
Performance with Deletions
Stash Size Distribution
- The number of elements at each level is approximately a sum of independent Poisson trials.
- When the mean is large, approximately normal.
- When the mean is small, approximately Poisson.
- Use the Poisson distribution to approximate the stash size distribution, to roughly estimate the stash size needed for a given failure probability.
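The Poisson estimate can be turned into a small calculation: given the expected number of overflowing elements (the mean) and a target failure probability, find the smallest stash size s with Pr[Poisson(mean) > s] at most the target. The function name `poisson_stash_size` is hypothetical; the pmf is computed in log space so that moderately large means do not underflow, and targets below about 1e-12 are rejected because `1 - cdf` loses precision there.

```python
import math


def poisson_stash_size(mean: float, target: float) -> int:
    """Smallest s with Pr[Poisson(mean) > s] <= target (intended for small means)."""
    if not 1e-12 <= target < 1:
        raise ValueError("target must be in [1e-12, 1)")
    if mean <= 0:
        return 0  # no expected overflow, so no stash is needed
    s, cdf = 0, 0.0
    while True:
        # Pr[X = s] = exp(-mean) * mean**s / s!, evaluated in log space.
        cdf += math.exp(-mean + s * math.log(mean) - math.lgamma(s + 1))
        if 1.0 - cdf <= target:
            return s
        s += 1
```

For example, with an expected overflow of one element, `poisson_stash_size(1.0, 1e-3)` returns 5: the tail beyond 4 is about 0.0037, while the tail beyond 5 is about 0.0006.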
Poisson Distributed Stash
- Even one move saves significant space.
- But with deletions, things are more complex and more space is required.
- Some schemes are amenable to fluid-limit, differential-equation analysis.
- A CAM-stash has different asymptotics in this setting.
- Linear size vs. constant size.
- CAMs are a very powerful tool for hash-based data structures.
- Flexible uses: stash, queue.
- Deal effectively with low-probability events.
- Generally not considered in theoretical analysis.
- But should be!
- Scaling: linear, logarithmic, or constant-size CAMs?
- Can help give high-performance, space-efficient hash tables.
- Cuckoo hashing: constant-time lookups, good space utilization, low failure probability, simple and flexible.
Open Questions and Future Work
- Analyze practical multiple-choice cuckoo hashing variants for $d > 2$ choices.
- Analysis of the CAM-queue for cuckoo hashing.
- Better methods of dealing with settings with frequent deletions.
- Your question here.