Perfect Hashing for Network Applications - PowerPoint PPT Presentation


About This Presentation

Title:

Perfect Hashing for Network Applications

Description:

Perfect Hashing for Network Applications. Yi Lu, Balaji Prabhakar, Flavio Bonomi ... Kirsch and Mitzenmacher (05). Minimize off-chip access and space ...


Slides: 30

Provided by: luyi3


Transcript and Presenter's Notes


1
Perfect Hashing for Network Applications
Yi Lu, Balaji Prabhakar, Flavio Bonomi
Stanford University, Cisco Systems
2
Organization of Talk

Motivation
Limitations of existing algorithms
The Algorithm
Encoding size
Construction time guarantees
Simulation results

3
Organization of Talk: Motivation

4
Hash Tables

Review: hash functions
Universe $U$, subset $S \subseteq U$, $h: U \to$ a finite-range integer set
A good $h$ spreads $U$ evenly, e.g., like a uniform random number
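The review above assumes random-looking functions $h: U \to \{0, \dots, m-1\}$. As a minimal Python sketch (an assumption, not the authors' hash family), a seeded family can be built from the standard library's `hashlib.blake2b` by placing the seed in its `salt`; `make_hash` is a hypothetical helper name.

```python
import hashlib
from collections import Counter
from typing import Callable


def make_hash(seed: int, m: int) -> Callable[[bytes], int]:
    """Return a hypothetical seeded hash function h: bytes -> {0, ..., m-1}.

    The seed (0 <= seed < 2**128) fills blake2b's 16-byte salt, so each
    seed gives a different, pseudo-random-looking function.
    """
    salt = seed.to_bytes(16, "little")

    def h(key: bytes) -> int:
        digest = hashlib.blake2b(key, digest_size=8, salt=salt).digest()
        return int.from_bytes(digest, "little") % m

    return h


# Hash 8,000 keys into 8 buckets; a good h gives roughly 1,000 keys per bucket.
h = make_hash(seed=1, m=8)
keys = [f"flow-{i}".encode() for i in range(8000)]
bucket_sizes = Counter(h(k) for k in keys)
print(sorted(bucket_sizes.items()))
```

Changing `seed` yields a different function from the same family, which is what the later construction relies on when each stage uses its own hash functions.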

Hash tables
Fundamental data structure in many network
applications
Route lookup
Packet Classification
Per-flow state maintenance
Network monitoring

5
Fast AND Big

Requirements
Line speed is fast
e.g., 10 Gb/s line, 50 ns per packet
Hash tables are big
e.g., 100,000 entries, each 32 bytes, for a lookup table
Available technology
On-chip memory: fast but small
e.g., 7 ns, 5 Mbits
Off-chip memory: big but slow
e.g., 40 ns, 500 Mbits

[Figure: required speed and size vs. on-chip and off-chip memory]
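The numbers on this slide can be checked with a little arithmetic. The Python sketch below assumes minimum-size 64-byte packets (the slide does not state the packet size) and decimal megabits. Under those assumptions, a 100,000-entry table does not fit in the example on-chip memory, and the per-packet time budget leaves room for only one 40 ns off-chip access.

```python
LINE_RATE_BPS = 10e9            # 10 Gb/s line
PACKET_BITS = 64 * 8            # assumption: minimum-size 64-byte packets
ENTRIES, ENTRY_BYTES = 100_000, 32
ON_CHIP_MBITS, OFF_CHIP_MBITS = 5, 500
OFF_CHIP_NS = 40

# Time budget per packet: bits per packet / bits per second, in nanoseconds.
per_packet_ns = PACKET_BITS / LINE_RATE_BPS * 1e9

# Table size in decimal megabits.
table_mbits = ENTRIES * ENTRY_BYTES * 8 / 1e6

# How many full off-chip accesses fit in one packet time.
off_chip_accesses = int(per_packet_ns // OFF_CHIP_NS)

print(f"budget per packet: {per_packet_ns:.1f} ns")
print(f"table size: {table_mbits:.1f} Mbits; fits on-chip: {table_mbits <= ON_CHIP_MBITS}")
print(f"off-chip accesses per packet: {off_chip_accesses}")
```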
6
Minimize off-chip access and space

Solution: the hash table resides off-chip; on-chip processing reduces off-chip accesses.
Song et al. (05)
Kirsch and Mitzenmacher (05)

Why not build a minimal perfect hash table from
the beginning?

7
Definition

Perfect Hash Function
A function h mapping U into the integers is said
to be perfect for S if, when restricted to S, it
is injective.
Minimal Perfect Hash Function
Let $|S| = n$. A perfect hash function $h$ is minimal if $h(S) = \{0, \dots, n-1\}$.

Goal: minimize off-chip access (always exactly one access) and space.
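The two definitions translate directly into checks. This is a small Python sketch with hypothetical helper names; the hash functions are given as lookup tables purely for illustration.

```python
from typing import Callable, Hashable, Iterable


def is_perfect(h: Callable[[Hashable], int], S: Iterable[Hashable]) -> bool:
    """True if h, restricted to the set S, is injective (no two keys collide)."""
    keys = set(S)
    return len({h(x) for x in keys}) == len(keys)


def is_minimal_perfect(h: Callable[[Hashable], int], S: Iterable[Hashable]) -> bool:
    """True if h is perfect for S and h(S) is exactly {0, ..., n-1}, n = |S|."""
    keys = set(S)
    return is_perfect(h, keys) and {h(x) for x in keys} == set(range(len(keys)))


S = {"a", "b", "c"}
minimal = {"a": 2, "b": 0, "c": 1}.__getitem__      # perfect and minimal
sparse = {"a": 0, "b": 5, "c": 9}.__getitem__       # perfect, but leaves gaps
colliding = {"a": 0, "b": 0, "c": 1}.__getitem__    # not perfect

print(is_minimal_perfect(minimal, S), is_perfect(sparse, S), is_perfect(colliding, S))
```

A minimal perfect function needs exactly $n$ slots; `sparse` would need 10 slots to hold 3 keys.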

8
Organization of Talk: Limitations of existing algorithms

9
Slow Construction

Mehlhorn's construction (84)
Space
Construction Time
Fox et al. (92): a faster algorithm
Space: 2.5 bits per entry
Construction time: 6 hours on a NeXT station for 3.8 million entries
Our construction:
Space: 8.6 bits per entry
Construction time: 7.7 seconds on a Pentium 4 for 3.8 million entries

10
Needle in Haystack

Knuth: among all projections of $S$ to the range $\{0, \dots, n-1\}$, how many are injective? ($|S| = n$)
Total number of such projections: $n^n$
Number of injective projections: $n! \approx \sqrt{2\pi n}\, n^n / e^n$
Ratio: $\sqrt{2\pi n} / e^n$
The ratio gets exponentially small as $n$ gets large.
Existing algorithms must cleverly search for a needle in a haystack.
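The counting argument can be checked numerically. This Python sketch compares the exact fraction $n!/n^n$ with the Stirling estimate $\sqrt{2\pi n}/e^n$. It works with base-10 logarithms so the tiny values do not underflow. The function names are hypothetical.

```python
import math


def log10_injective_fraction(n: int) -> float:
    """log10 of n!/n^n: the exact fraction of maps from an n-set to {0..n-1} that are injective."""
    return (math.lgamma(n + 1) - n * math.log(n)) / math.log(10)


def log10_stirling_ratio(n: int) -> float:
    """log10 of the Stirling estimate sqrt(2*pi*n) / e^n of the same fraction."""
    return (0.5 * math.log(2 * math.pi * n) - n) / math.log(10)


# Work in log10 because the fraction underflows a float long before n = 3.8 million.
for n in (10, 100, 3_800_000):
    print(n, round(log10_injective_fraction(n), 3), round(log10_stirling_ratio(n), 3))
```

For $n = 3.8$ million, the table size used later in the talk, the fraction is about $10^{-1.65 \times 10^{6}}$.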

11
Our Approach

Avoids searching for the function.
Start with a random set of hash functions.
Algorithm randomly divides the set S into
subgroups, each becoming the domain of one of the
functions
Trade space for speed

12
Organization of Talk: The Algorithm

13
Counting Bloom Filter (CBF)

Main component: the counting Bloom filter (CBF)
Keeps track of how many entries hashed to each address
The concept of a unique address (an address that exactly one entry hashes to)
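A counting Bloom filter with the unique-address test can be sketched in a few lines of Python. This sketch assumes that, with several hash functions, each (entry, hash function) pair increments one counter, and that an entry's address is unique when its counter is exactly 1. The helper names and the hand-picked hash table in the example are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Sequence

HashFn = Callable[[Hashable], int]


def cbf_counters(entries: Sequence[Hashable], hash_fns: Sequence[HashFn], m: int) -> List[int]:
    """Counting Bloom filter: counters[a] = number of (entry, hash function) hits on address a."""
    counters = [0] * m
    for e in entries:
        for h in hash_fns:
            counters[h(e)] += 1
    return counters


def unique_addresses(entries: Sequence[Hashable], hash_fns: Sequence[HashFn],
                     counters: List[int]) -> Dict[Hashable, int]:
    """Give each entry the first of its addresses whose counter is exactly 1, if any."""
    placed = {}
    for e in entries:
        for h in hash_fns:
            if counters[h(e)] == 1:
                placed[e] = h(e)
                break
    return placed


# Toy example: six entries, one hand-picked hash function, six addresses.
entries = ["s1", "s2", "s3", "s4", "s5", "s6"]
h1 = {"s1": 0, "s2": 1, "s3": 0, "s4": 2, "s5": 3, "s6": 0}.__getitem__
counters = cbf_counters(entries, [h1], m=6)
placed = unique_addresses(entries, [h1], counters)
print(counters)                                   # [3, 1, 1, 1, 0, 0]
print(placed)                                     # {'s2': 1, 's4': 2, 's5': 3}
print([e for e in entries if e not in placed])    # ['s1', 's3', 's6']
```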

17
Construction
[Figure: a CBF and its indicator bits for entries s1 to s6; entries s1, s3, s6 are carried over to the next CBF]

Entries without a unique address in one CBF are carried over to the next.
Each CBF uses a different set of hash functions.
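The construction can be sketched as a loop over stages. This is a hypothetical Python reading of the slides, not the authors' code. `make_hash` and `build_stages` are illustrative names, and the hash functions come from `hashlib.blake2b` seeded through its salt. An entry claims the first of its addresses whose counter is 1, and everything else moves on to the next CBF.

```python
import hashlib
from typing import Callable, Dict, List, Sequence, Tuple


def make_hash(seed: int, m: int) -> Callable[[str], int]:
    """Hypothetical seeded hash function str -> {0, ..., m-1} built on blake2b."""
    salt = seed.to_bytes(16, "little")

    def h(key: str) -> int:
        digest = hashlib.blake2b(key.encode(), digest_size=8, salt=salt).digest()
        return int.from_bytes(digest, "little") % m

    return h


Stage = Tuple[List[Callable[[str], int]], List[int], Dict[int, str]]


def build_stages(entries: Sequence[str], sizes: Sequence[int],
                 num_hashes: Sequence[int]) -> Tuple[List[Stage], List[str]]:
    """Multi-stage sketch: one CBF per stage, fresh hash functions in every stage.

    Entries that find an address with counter == 1 are placed there and its
    indicator bit is set; all other entries are carried over. Returns the stages
    (hash functions, indicator bits, address -> entry) and any entries still
    unplaced after the last stage, which counts as a construction failure.
    """
    remaining = list(entries)
    stages: List[Stage] = []
    seed = 0
    for m, k in zip(sizes, num_hashes):
        fns = []
        for _ in range(k):
            fns.append(make_hash(seed, m))
            seed += 1
        counters = [0] * m
        for e in remaining:
            for h in fns:
                counters[h(e)] += 1
        indicator = [0] * m
        placed: Dict[int, str] = {}
        carried = []
        for e in remaining:
            addr = next((h(e) for h in fns if counters[h(e)] == 1), None)
            if addr is None:
                carried.append(e)
            else:
                indicator[addr] = 1
                placed[addr] = e
        stages.append((fns, indicator, placed))
        remaining = carried
    return stages, remaining


# Stage sizes scaled from the simulation slide's ratios, assuming one ratio unit
# means one counter per entry (an assumption; the slides do not say).
n = 2000
keys = [f"key{i}" for i in range(n)]
sizes = [round(r * n) for r in (1.56, 0.74, 0.35, 0.17, 1.5)]
stages, leftover = build_stages(keys, sizes, [1, 1, 1, 1, 12])
for i, (_, indicator, placed) in enumerate(stages):
    print(f"stage {i}: {sizes[i]} counters, {len(placed)} entries placed")
print("unplaced:", len(leftover))
```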

18
Construction
[Figure: indicator bits and the off-chip list]

The off-chip structure is a simple list; unlike a hash table, it has no empty slots.
On-chip structure → encoding
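"No empty slots" can be read as follows. The off-chip list stores entries in indicator-bit order, and a key's index is the rank of its set indicator bit. The Python sketch below is a hypothetical lookup under that assumption, not necessarily the authors' exact encoding. It uses two hand-built stages, and `lookup` and the stage layout are illustrative.

```python
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

Stage = Tuple[Sequence[Callable[[Hashable], int]], List[int]]


def lookup(key: Hashable, stages: Sequence[Stage]) -> Optional[int]:
    """Return the key's index in the off-chip list, or None if no indicator bit matches.

    Stages are scanned in construction order. Within a stage, a set indicator
    bit at one of the key's addresses marks its slot. The index is the number
    of set bits in all earlier stages plus the number of set bits before that
    address (a rank query), so the list has one slot per entry and no gaps.
    """
    offset = 0
    for hash_fns, indicator in stages:
        for h in hash_fns:
            a = h(key)
            if indicator[a]:
                return offset + sum(indicator[:a])
        offset += sum(indicator)
    return None


# Two hand-built stages: stage 1 placed s2, s4, s5; stage 2 placed s1, s6, s3.
h1 = {"s1": 0, "s2": 1, "s3": 0, "s4": 2, "s5": 3, "s6": 0}.__getitem__
h2 = {"s1": 0, "s3": 2, "s6": 1}.__getitem__
stages = [([h1], [0, 1, 1, 1, 0, 0]), ([h2], [1, 1, 1])]
off_chip = ["s2", "s4", "s5", "s1", "s6", "s3"]

for k in ["s1", "s2", "s3", "s4", "s5", "s6"]:
    idx = lookup(k, stages)
    print(k, idx, off_chip[idx])
```

`sum(indicator[:a])` is a linear scan for clarity. A hardware design would more plausibly keep precomputed partial counts. A key outside $S$ can land on a set bit, so the single off-chip read should compare the stored key.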

19
Organization of Talk: Encoding size

20
Minimum Encoding Size

Theorem 1
The minimum number of bits needed to provide $n$ entries with one unique address each, with random hashing, goes to $e \cdot n$ as $n$ becomes large.
It is achievable with an infinite number of CBFs of geometrically decreasing sizes, each with a single hash function.
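A back-of-the-envelope derivation shows where the constant $e$ comes from. It is a mean-field sketch under stated assumptions, not the proof of Theorem 1. The assumptions are one bit per address, one hash function per CBF, and the Poisson approximation that an entry is alone at its address with probability $e^{-\lambda}$ at load $\lambda$.

```latex
% Stage i: n_i entries enter a CBF with m_i one-bit addresses; load \lambda_i = n_i / m_i.
\begin{align*}
  \Pr[\text{entry is alone at its address}] &\approx e^{-\lambda_i}, \\
  \text{bits per placed entry} &= \frac{m_i}{n_i e^{-\lambda_i}} = \frac{e^{\lambda_i}}{\lambda_i},
  \qquad \frac{d}{d\lambda}\frac{e^{\lambda}}{\lambda} = \frac{e^{\lambda}(\lambda - 1)}{\lambda^2} = 0
  \;\Rightarrow\; \lambda_i = 1, \\
  m_i = n_i, \quad n_{i+1} &= n_i\,(1 - e^{-1}), \\
  \sum_{i \ge 0} m_i &= n \sum_{i \ge 0} (1 - e^{-1})^i = \frac{n}{e^{-1}} = e\,n .
\end{align*}
```

Each stage is a constant fraction $1 - e^{-1} \approx 0.63$ of the previous one, which is the geometric decrease in the theorem. Reaching exactly $e\,n$ requires infinitely many stages.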

21
A practical tradeoff point

Minimum encoding size requires an infinite number of CBFs in the worst case, hence an infinite number of hash evaluations.
Trade off space against the number of CBFs for 95% of the entries. Over-provision the last CBF to accommodate the remaining 5%.

A small increase in space cuts the number of CBFs in half (space $4n$).
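The tradeoff can be explored in the same mean-field model. This Python sketch (hypothetical helper names, Poisson approximation) sizes each single-hash CBF as `ratio` times the number of entries entering it. With `ratio = 1.56`, the stage sizes come out as $1.56n, 0.74n, 0.35n, 0.17n$, matching the space ratios in the simulation results, and four stages leave about 5% of entries. The space-optimal `ratio = 1` needs seven stages for the same 95%, so about 8% more space roughly halves the number of CBFs. Space here counts one unit per address and is not meant to reproduce the slide's $4n$ figure.

```python
import math
from typing import List, Tuple


def stage_sizes(ratio: float, k: int) -> Tuple[List[float], float]:
    """Sizes (in units of n) of the first k single-hash CBFs, and the fraction left over.

    Each CBF has `ratio` addresses per entry entering it, so a fraction
    exp(-1/ratio) of those entries is alone at its address and is placed.
    """
    carry = 1.0 - math.exp(-1.0 / ratio)   # fraction carried to the next CBF
    sizes, left = [], 1.0
    for _ in range(k):
        sizes.append(ratio * left)
        left *= carry
    return sizes, left


def stages_needed(ratio: float, target_left: float = 0.05) -> int:
    """Smallest number of CBFs after which at most target_left of the entries remain."""
    carry = 1.0 - math.exp(-1.0 / ratio)
    k, left = 0, 1.0
    while left > target_left:
        left *= carry
        k += 1
    return k


sizes, left = stage_sizes(1.56, 4)
print([round(s, 2) for s in sizes], round(left, 3), round(sum(sizes), 2))
sizes_opt, left_opt = stage_sizes(1.0, stages_needed(1.0))
print(stages_needed(1.0), round(left_opt, 3), round(sum(sizes_opt), 2))
```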

22
Organization of Talk: Construction time guarantees

23
Construction Time

A failure occurs when not all entries find a
unique address.
How big should the last CBF be so that the
failure probability is small?

24
Construction Time

Theorem 2
Let n be the number of entries remaining for the
last CBF, and m be the space assigned for the
section. Then the probability of failure can be
made double-exponentially small in m, and the
optimal number of hash functions in this section
is $k = (m/n) \ln 2$.
$T$: time for one pass
Average construction time: $T / (1 - P_{\text{fail}})$
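The $T/(1-P_{\text{fail}})$ formula is the mean of a geometric distribution. It assumes that each pass fails independently with probability $P_{\text{fail}}$, for example because fresh hash functions are drawn on every retry. Under that assumption, the expected number of passes is $1/(1-P_{\text{fail}})$. Here is a short Python check with hypothetical function names.

```python
import random


def expected_passes(p_fail: float) -> float:
    """Mean number of construction passes if each pass fails independently with p_fail."""
    if not 0.0 <= p_fail < 1.0:
        raise ValueError("p_fail must be in [0, 1)")
    return 1.0 / (1.0 - p_fail)


def simulate_passes(p_fail: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo check: keep retrying until a pass succeeds, and average the pass count."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        passes = 1
        while rng.random() < p_fail:
            passes += 1
        total += passes
    return total / trials


T = 7.73  # seconds per pass, taken from the simulation slide purely for illustration
print(T * expected_passes(0.0012))
print(expected_passes(0.3), simulate_passes(0.3, 100_000))
```

With $P_{\text{fail}} = 0.0012$, the expected overhead over a single pass is about 0.12%.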

25
Organization of Talk: Simulation results

26
Simulation results

Design Parameters
5 CBFs (4 optimal, plus 1 for the last 5%)
Space ratio: 1.56 : 0.74 : 0.35 : 0.17 : 1.5; total size 8.6n
Number of hash functions: 1, 1, 1, 1, 12
Results
Probability of failure: 0.0012
Construction time: 7.73 seconds for 3.8 million entries
A typical Ethernet table with 100K entries requires 125 milliseconds of construction time
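These results connect to the memory budget from the motivation through simple arithmetic. This Python sketch assumes decimal megabits and the 8.6 bits per entry reported for this design. By that estimate, a 100K-entry Ethernet table needs well under 1 Mbit on-chip, while the full table of 32-byte entries (25.6 Mbits) stays off-chip.

```python
BITS_PER_ENTRY = 8.6   # on-chip encoding size reported for this design
ON_CHIP_MBITS = 5      # example on-chip memory size from the "Fast AND Big" slide


def encoding_mbits(n_entries: int, bits_per_entry: float = BITS_PER_ENTRY) -> float:
    """On-chip encoding size in decimal megabits for n_entries."""
    return n_entries * bits_per_entry / 1e6


for n in (100_000, 3_800_000):
    size = encoding_mbits(n)
    print(f"{n:>9,} entries: {size:.2f} Mbits, fits in {ON_CHIP_MBITS} Mbits: {size <= ON_CHIP_MBITS}")
```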

27
Extension: dynamic perfect hashing

Non-minimal perfect hash function
Accommodates incremental insertion and deletion
Space remains $O(n)$; the design uses 20 bits per entry.
Constant-time insertion / deletion / lookup

28
Summary

A new approach to minimal perfect hashing via
counting Bloom filters.
Avoids searching by generating random subgroups for predetermined hash functions, and as a result speeds up construction
The resulting construction is hardware-friendly and fits the needs of high-speed network applications well

29
Backup 1: Lookup
[Figure: lookup using the indicator, counter, and off-chip list]