Perfect Hashing for Network Applications

1
Perfect Hashing for Network Applications
Yi Lu, Balaji Prabhakar, Flavio Bonomi
Stanford University, Cisco Systems
2
Organization of Talk
  • Motivation
  • Limitations of existing algorithms
  • The Algorithm
  • Encoding size
  • Construction time guarantees
  • Simulation results

4
Hash Tables
  • Review: hash functions
  • Universe U, subset S ⊆ U, h: U → finite range
    (an integer set)
  • A good h spreads U evenly over the range, e.g.
    like a uniform random number
  • Hash tables
  • Fundamental data structure in many network
    applications
  • Route lookup
  • Packet Classification
  • Per-flow state maintenance
  • Network monitoring

5
Fast AND Big
  • Requirement
  • Line speed is fast
  • e.g. 10 Gb/s line, 50 ns per packet
  • Hash tables are big
  • e.g. 100,000 entries of 32 bytes each, for a
    lookup table
  • Available technology
  • On-chip memory: fast but small
  • e.g. 7 ns, 5 Mbits
  • Off-chip memory: big but slow
  • e.g. 40 ns, 500 Mbits
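
The figures on this slide can be checked with a little arithmetic. The sketch below assumes 1 Mbit = 10^6 bits and works in integer nanoseconds; the variable names are illustrative and not from the talk.

```python
# Numbers from the slide; names and the 1 Mbit = 10**6 bits convention are assumptions.
LINE_RATE_BITS_PER_NS = 10      # 10 Gb/s line
PACKET_TIME_NS = 50             # time budget per packet
ENTRIES = 100_000
ENTRY_BYTES = 32
ON_CHIP_BITS = 5_000_000        # 5 Mbits
OFF_CHIP_BITS = 500_000_000     # 500 Mbits
ON_CHIP_ACCESS_NS = 7
OFF_CHIP_ACCESS_NS = 40

# Size of the lookup table in bits.
table_bits = ENTRIES * ENTRY_BYTES * 8

# Bits that arrive on the line during one packet time.
bits_per_packet_time = LINE_RATE_BITS_PER_NS * PACKET_TIME_NS

# Whole memory accesses that fit into one packet time.
off_chip_accesses = PACKET_TIME_NS // OFF_CHIP_ACCESS_NS
on_chip_accesses = PACKET_TIME_NS // ON_CHIP_ACCESS_NS

print(f"table size: {table_bits / 1e6:.1f} Mbits")
print(f"fits on-chip: {table_bits <= ON_CHIP_BITS}, fits off-chip: {table_bits <= OFF_CHIP_BITS}")
print(f"accesses per packet time: off-chip {off_chip_accesses}, on-chip {on_chip_accesses}")
```

Under these assumptions the table is too large for on-chip memory, and only one off-chip access fits into a packet time, which is why the number of off-chip accesses per lookup matters.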

[Figure: required, on-chip, off-chip]
6
Minimize off-chip access and space
  • Solution: the hash table resides off-chip. Use
    on-chip processing to reduce off-chip accesses.
  • Song et al. (2005)
  • Kirsch and Mitzenmacher (2005)
  • Why not build a minimal perfect hash table from
    the beginning?

7
Definition
  • Perfect Hash Function
  • A function h mapping U into the integers is said
    to be perfect for S if, when restricted to S, it
    is injective.
  • Minimal Perfect Hash Function
  • Let |S| = n. A perfect hash function h is minimal
    if h(S) equals {0, ..., n-1}.
  • Minimize off-chip access (1 access always) and
    space.
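
The two definitions can be stated as small checks. This is a minimal sketch; the set `S` and the modulus-based functions are hypothetical examples chosen only to illustrate the three cases.

```python
def is_perfect(h, S):
    """h is perfect for S if no two elements of S share an image."""
    images = [h(x) for x in S]
    return len(set(images)) == len(images)

def is_minimal_perfect(h, S):
    """h is minimal perfect for S if its images are exactly 0, ..., n-1."""
    return sorted(h(x) for x in S) == list(range(len(S)))

# Hypothetical key set and candidate functions.
S = [10, 21, 35]
mod7 = lambda x: x % 7   # 21 and 35 both map to 0: not perfect
mod4 = lambda x: x % 4   # images 2, 1, 3: perfect, but not minimal
mod3 = lambda x: x % 3   # images 1, 0, 2: minimal perfect

for name, h in [("mod7", mod7), ("mod4", mod4), ("mod3", mod3)]:
    print(name, is_perfect(h, S), is_minimal_perfect(h, S))
```

With a minimal perfect hash function the table has exactly n slots, and every lookup goes to one slot without collision handling.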

8
Organization of Talk
  • Motivation
  • Limitations of existing algorithms
  • The Algorithm
  • Encoding size
  • Construction time guarantees
  • Simulation results

9
Slow Construction
  • Mehlhorn's construction (1984)
  • Space
  • Construction Time
  • Fox et al. (1992): a faster algorithm
  • Space: 2.5 bits per entry
  • Construction time: 6 hours on a NeXT station for
    3.8 million entries
  • Our construction
  • Space: 8.6 bits per entry
  • Construction time: 7.7 seconds on a Pentium 4 for
    3.8 million entries

10
Needle in Haystack
  • Knuth: among all projections of S to the range
    {0, ..., n-1}, how many are injective? (|S| = n)
  • Total number of such projections: $n^n$
  • Number of injective projections:
    $n! \approx (2\pi n)^{1/2} n^n / e^n$
  • Ratio: $(2\pi n)^{1/2} / e^n$
  • The ratio gets exponentially small as n gets large.
  • Cleverly search for a needle in a haystack
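
To get a feel for how quickly the ratio shrinks, the sketch below evaluates $n!/n^n$ in log space and compares it with the Stirling form on the slide. The function names are illustrative.

```python
import math

def log_injective_ratio(n: int) -> float:
    """Natural log of n! / n^n, the fraction of injective mappings."""
    return math.lgamma(n + 1) - n * math.log(n)

def log_stirling_ratio(n: int) -> float:
    """Natural log of (2*pi*n)^(1/2) / e^n, the Stirling approximation."""
    return 0.5 * math.log(2 * math.pi * n) - n

for n in (10, 100, 1000):
    exact = log_injective_ratio(n) / math.log(10)
    approx = log_stirling_ratio(n) / math.log(10)
    print(f"n={n}: log10(ratio) = {exact:.2f} (Stirling: {approx:.2f})")
```

Working with logarithms avoids underflow: for even moderate n the ratio itself is far below the smallest positive floating-point number.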

11
Our Approach
  • Avoids searching for the function.
  • Start with a random set of hash functions.
  • Algorithm randomly divides the set S into
    subgroups, each becoming the domain of one of the
    functions
  • Trade space for speed

12
Organization of Talk
  • Motivation
  • Limitations of existing algorithms
  • The Algorithm
  • Encoding size
  • Construction time guarantees
  • Simulation results

13
Counting Bloom Filter (CBF)
  • Main component: the counting Bloom filter
  • Keeps track of how many entries hashed to each
    address
  • The concept of a unique address
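
A minimal sketch of the idea, assuming that a "unique address" is an address whose counter equals 1, so that exactly one entry hashes there. The seeded BLAKE2b hash and all function names are assumptions for illustration, not the hash functions used in the talk.

```python
import hashlib

def h(key: str, seed: int, m: int) -> int:
    """Hypothetical seeded hash into range(m)."""
    digest = hashlib.blake2b(key.encode(), digest_size=8,
                             salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little") % m

def build_cbf(keys, m, seeds):
    """Count, for every address, how many keys hash to it."""
    counters = [0] * m
    for key in keys:
        # A key is counted once per address, even if two seeds collide.
        for pos in {h(key, s, m) for s in seeds}:
            counters[pos] += 1
    return counters

def unique_addresses(keys, counters, seeds):
    """Map each key to the first address it occupies alone, if any."""
    m = len(counters)
    found = {}
    for key in keys:
        for s in seeds:
            pos = h(key, s, m)
            if counters[pos] == 1:
                found[key] = pos
                break
    return found

keys = [f"s{i}" for i in range(1, 7)]
counters = build_cbf(keys, m=8, seeds=[0])
unique = unique_addresses(keys, counters, seeds=[0])
print("counters:", counters)
print("keys with a unique address:", unique)
```

Keys that share an address with another key have no unique address in this filter and would need another attempt elsewhere.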

CBF
17
Construction
[Figure: CBF and indicator, with entries s2, s4, s5, s1, s3, s6 and then s1, s3, s6]
  • Entries without a unique address in one CBF are
    carried over to the next
  • A different set of hash functions for each CBF
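
The two bullets above can be sketched as a staged construction. This is an illustration under stated assumptions, not the authors' implementation: the seeded BLAKE2b hash, the stage sizes, the seeds, and the rule "mark the first address with counter 1" are all hypothetical choices.

```python
import hashlib

def h(key: str, seed: int, m: int) -> int:
    """Hypothetical seeded hash into range(m)."""
    digest = hashlib.blake2b(key.encode(), digest_size=8,
                             salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little") % m

def construct(keys, sizes, seeds_per_stage):
    """Return one indicator bit vector per stage and the keys left unplaced."""
    remaining = list(keys)
    indicators = []
    for m, seeds in zip(sizes, seeds_per_stage):
        # Build this stage's counting Bloom filter from the remaining keys only.
        counters = [0] * m
        for key in remaining:
            for pos in {h(key, s, m) for s in seeds}:
                counters[pos] += 1
        indicator = [0] * m
        carried = []
        for key in remaining:
            for s in seeds:
                pos = h(key, s, m)
                if counters[pos] == 1:   # only this key hashes here
                    indicator[pos] = 1   # mark its unique address
                    break
            else:
                carried.append(key)      # no unique address: try the next CBF
        indicators.append(indicator)
        remaining = carried
    return indicators, remaining

keys = [f"s{i}" for i in range(1, 7)]
indicators, unplaced = construct(
    keys, sizes=[8, 4, 16], seeds_per_stage=[[11], [12], [13, 14, 15]])
print("indicator bits per stage:", [sum(ind) for ind in indicators])
print("unplaced keys:", unplaced)
```

Each placed key sets exactly one indicator bit, and no two keys can share a bit, so the number of set bits plus the number of unplaced keys always equals the number of keys.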

18
Construction
[Figure: indicator and off-chip list]
  • The off-chip structure is a simple list. There
    are no empty slots as in a hash table.
  • On-chip structure -> encoding
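
One plausible way to turn indicator bits into positions in a list with no empty slots is to use the rank of a bit, that is, the number of 1 bits before it. The sketch below assumes this rank-based addressing; the indicator vector and the key positions are made-up values for illustration.

```python
from itertools import accumulate

def rank_table(indicator):
    """prefix[i] = number of 1 bits in indicator[:i]."""
    return [0] + list(accumulate(indicator))

# Hypothetical on-chip indicator and the unique address of each key.
indicator = [0, 1, 0, 0, 1, 1, 0, 1]
unique_pos = {"s2": 1, "s4": 4, "s5": 5, "s1": 7}

prefix = rank_table(indicator)

# Off-chip list: one slot per 1 bit, so there are no empty slots.
off_chip = [None] * sum(indicator)
for key, pos in unique_pos.items():
    off_chip[prefix[pos]] = key

def lookup(pos):
    """Return the entry stored for a unique address."""
    assert indicator[pos] == 1, "not a unique address"
    return off_chip[prefix[pos]]

print(off_chip)
print(lookup(5))
```

A hardware design would not store the full prefix table; keeping a running count at intervals and counting bits within a block is a common alternative, which is also an assumption here.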

19
Organization of Talk
  • Motivation
  • Limitations of existing algorithms
  • The Algorithm
  • Encoding size
  • Construction time guarantees
  • Simulation results

20
Minimum Encoding Size
  • Theorem 1
  • The minimum number of bits needed to provide n
    entries with one unique address each, with random
    hashing, approaches $e \cdot n$ as n becomes large.
  • It is achievable with an infinite number of CBFs
    of geometrically decreasing size, each with a
    single hash function.
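
A back-of-the-envelope argument shows where the constant $e$ and the geometric sizes can come from. It is a heuristic sketch and not the authors' proof; it assumes one bit per address, a single hash function per stage, and the usual approximation $(1 - 1/m)^{n-1} \approx e^{-n/m}$. The symbols $n_i$ and $m_i$ (entries and addresses at stage $i$) are introduced here for illustration.

```latex
% Heuristic sketch: n_i entries hashed by one function into m_i addresses.
\begin{align*}
\Pr[\text{entry is alone at its address}]
  &= \left(1 - \tfrac{1}{m_i}\right)^{n_i - 1} \approx e^{-n_i/m_i}, \\
\frac{\text{unique addresses}}{\text{address}}
  &\approx \frac{n_i}{m_i}\, e^{-n_i/m_i} \le \frac{1}{e}
  \quad (\text{equality at } m_i = n_i), \\
n_{i+1} &\approx n_i \left(1 - \tfrac{1}{e}\right)
  \quad \text{when } m_i = n_i, \\
\sum_{i \ge 0} m_i &\approx n \sum_{i \ge 0} \left(1 - \tfrac{1}{e}\right)^{i} = e\, n .
\end{align*}
```

Each address resolves at most about $1/e$ of an entry, so roughly $e$ addresses per entry are needed, and using the most efficient load at every stage gives sizes that shrink by the factor $1 - 1/e$.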

21
A practical tradeoff point
  • Minimum encoding size requires an infinite number
    of CBFs in the worst case, hence an infinite
    number of hash evaluations
  • Tradeoff between space and number of CBFs for 95%
    of entries. Over-provide in the last CBF to
    accommodate the remaining 5%.
  • A small increase in space cuts the number of
    CBFs in half (space 4n)
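
The tradeoff can be explored numerically. The sketch below assumes every stage uses one hash function and gets `c` addresses per remaining entry, and that a fraction of about exp(-1/c) of the entries finds a unique address per stage. The function `plan` and the parameter `c` are illustrative, and space is counted in addresses per entry rather than bits.

```python
import math

def plan(c: float, stages: int):
    """Return (total addresses per entry, fraction of entries left over)."""
    resolved = math.exp(-1.0 / c)   # approx. fraction placed in each stage
    left = 1.0                      # fraction of entries still unplaced
    space = 0.0
    for _ in range(stages):
        space += c * left           # this stage is sized for what remains
        left *= 1.0 - resolved
    return space, left

for c, stages in [(1.0, 7), (1.56, 4)]:
    space, left = plan(c, stages)
    print(f"c={c}, stages={stages}: space={space:.2f}n, left over={left:.3f}")
```

In this model the space-optimal load (c = 1) needs 7 stages to bring the leftover below 5%, while c = 1.56 reaches about 5% in 4 stages at a slightly larger total size. The per-stage sizes for c = 1.56 appear consistent with the space ratios quoted in the simulation design parameters.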

22
Organization of Talk
  • Motivation
  • Limitations of existing algorithms
  • The Algorithm
  • Encoding size
  • Construction time guarantees
  • Simulation results

23
Construction Time
  • A failure occurs when not all entries find a
    unique address.
  • How big should the last CBF be so that the
    failure probability is small?

24
Construction Time
  • Theorem 2
  • Let n be the number of entries remaining for the
    last CBF, and m be the space assigned for the
    section. Then the probability of failure can be
    made double-exponentially small in m, and the
    optimal number of hash functions in this section
    is $k = (\ln 2)\, m/n$.
  • T: time for one pass
  • Average construction time: $T / (1 - P_{\text{fail}})$
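
The average construction time follows if a failed pass is simply repeated with fresh hash functions: the number of passes is then geometric with success probability 1 - Pfail, whose mean is 1 / (1 - Pfail). The sketch below assumes independent retries; the function names are illustrative.

```python
import math

def optimal_k(m: float, n: float) -> float:
    """Number of hash functions k = (ln 2) * m / n from Theorem 2."""
    return math.log(2) * m / n

def expected_construction_time(t_pass: float, p_fail: float) -> float:
    """Mean total time when each pass takes t_pass and fails with p_fail."""
    if not 0.0 <= p_fail < 1.0:
        raise ValueError("p_fail must be in [0, 1)")
    return t_pass / (1.0 - p_fail)

# With a small failure probability the overhead of retries is negligible.
print(expected_construction_time(1.0, 0.0012))
print(optimal_k(10, 1))
```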

25
Organization of Talk
  • Motivation
  • Limitations of existing algorithms
  • The Algorithm
  • Encoding size
  • Construction time guarantees
  • Simulation results

26
Simulation results
  • Design parameters
  • 5 CBFs (4 optimal, plus 1 for the last 5%)
  • Space ratio: 1.56 : 0.74 : 0.35 : 0.17 : 1.5;
    total size 8.6n
  • Number of hash functions: 1, 1, 1, 1, 12
  • Results
  • Probability of failure: 0.0012
  • Construction time: 7.73 seconds for 3.8 million
    entries
  • A typical Ethernet table with 100K entries
    requires 125 milliseconds of construction time

27
Extension: dynamic perfect hashing
  • Non-minimal perfect hash function
  • Accommodates incremental insertion and deletion
  • Space remains O(n). Design: 20 bits per entry.
  • Constant-time insertion / deletion / lookup

28
Summary
  • A new approach to minimal perfect hashing via
    counting Bloom filters.
  • Avoid searching by generating random subgroups
    for pre-determined hash functions, and as a
    result, speed up construction
  • The resulting construction is hardware-friendly
    and fits the need of high-speed network
    applications well

29
Backup 1: Lookup
[Figure: indicator, counter, and off-chip list]
  • The off-chip structure is a simple list. There
    are no empty slots as in a hash table.
  • On-chip structure -> encoding