Hash Functions for Network Applications II - PowerPoint PPT Presentation

Provided by: yaxu9
1
Hash Functions for Network Applications (II)
  • Yaxuan Qi
  • NSLab, RIIT
  • Tsinghua University

2
Outline
  • Concept and Theory (12)
  • Hash functions
  • Bloom Filters
  • Applications (34)

3
Basic Idea
4
Technique
5
False Positive
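n: number of messages; m: number of Bloom bits; k: number of hash functions
$p(y \text{ is a false positive}) = p(y \notin X)\, p(\text{all } k \text{ bits of } y \text{ are } 1)$
$p(\text{all } k \text{ bits of } y \text{ are } 1)$: each of the k bits that y hashes to must have been set (by some element of X), so $p(\text{all } k \text{ bits of } y \text{ are } 1) = \big(p(\text{a given bit is set by } X)\big)^{k}$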
6
Math (I)
n: number of messages; m: number of Bloom bits; k: number of hash functions
Two implicit assumptions: m is large enough, and kn/m is held constant.
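The formulas on this slide were lost in extraction. Under the usual model where each hash function picks a bit uniformly at random, the standard derivation (assumed here to be what the slide showed) gives the probability p that a given bit is still 0 after all n messages are inserted, and the false-positive rate f:

```latex
\begin{aligned}
p &= \left(1 - \frac{1}{m}\right)^{kn} \approx e^{-kn/m}, \\
f &= \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k}
   \approx \left(1 - e^{-kn/m}\right)^{k} = (1 - p)^{k}.
\end{aligned}
```

The step $(1 - 1/m)^{kn} \approx e^{-kn/m}$ is where "m large enough" is used; holding kn/m constant keeps the right-hand side fixed as m and n grow.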
7
In practice
n: number of messages; m: number of Bloom bits; k: number of hash functions
If the number of 0 bits in the array is
substantially less than expected, then the
probability of a false positive will be higher
than the quantity f that we computed.
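A small simulation makes this point concrete: it builds a filter, compares the actual fraction of 0 bits with the expected $e^{-kn/m}$, and compares three false-positive figures (the formula, the rate implied by the 0 bits actually present, and the measured rate on non-members). This is a sketch under stated assumptions: the k hash functions are simulated by SHA-256 salted with the function index, and all names (`BloomFilter`, `conditional_fp`, etc.) are hypothetical.

```python
import hashlib
import math
import random


def bit_positions(item: str, k: int, m: int) -> list[int]:
    """Return k bit indexes for item; hash i is SHA-256 salted with i (an illustrative choice)."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % m
        for i in range(k)
    ]


class BloomFilter:
    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for readability rather than space

    def add(self, item: str) -> None:
        for pos in bit_positions(item, self.k, self.m):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos] for pos in bit_positions(item, self.k, self.m))

    def zero_fraction(self) -> float:
        return self.bits.count(0) / self.m


def predicted_fp(m: int, n: int, k: int) -> float:
    """f = (1 - e^{-kn/m})^k, computed from the expected fraction of 0 bits."""
    return (1 - math.exp(-k * n / m)) ** k


def conditional_fp(bf: BloomFilter) -> float:
    """False-positive rate implied by the 0 bits actually present in this filter."""
    return (1 - bf.zero_fraction()) ** bf.k


def observed_fp(bf: BloomFilter, trials: int, rng: random.Random) -> float:
    """Query strings that were never inserted and count how many test positive."""
    hits = sum(f"absent-{rng.random()}" in bf for _ in range(trials))
    return hits / trials


m, n, k = 8000, 1000, 6
bf = BloomFilter(m, k)
for i in range(n):
    bf.add(f"item-{i}")
rng = random.Random(1)
print(f"0-bit fraction: {bf.zero_fraction():.4f} (expected {math.exp(-k * n / m):.4f})")
print(f"f predicted: {predicted_fp(m, n, k):.4f}, from actual 0 bits: {conditional_fp(bf):.4f}, "
      f"observed: {observed_fp(bf, 20000, rng):.4f}")
```

`conditional_fp` is the slide's point in code form: if fewer 0 bits survive than expected, $(1 - \text{zero fraction})^k$ rises above f.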
8
Optimal Number of Hash Functions
  • Given m and n
  • find the k that minimizes f as a function of k
  • Two competing forces
  • Increasing k
  • (from the view of search) gives more chances to find a 0
    bit for an element that is not a match
  • Decreasing k
  • (from the view of construction) increases the
    fraction of 0 bits in the array

9
Math (II)
In practice, k must be an integer, and a smaller,
suboptimal k might be preferred since this
reduces the number of hash functions that have to
be computed.
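A sketch of the calculus behind this slide, assuming the approximation $f \approx (1 - e^{-kn/m})^{k}$: minimize $g = \ln f$ over real k.

```latex
\begin{aligned}
g(k) &= \ln f = k \ln\!\left(1 - e^{-kn/m}\right), \\
\frac{dg}{dk} &= \ln\!\left(1 - e^{-kn/m}\right)
  + \frac{kn}{m}\,\frac{e^{-kn/m}}{1 - e^{-kn/m}} = 0
  \;\Longrightarrow\; k^{*} = \frac{m}{n}\ln 2, \\
f(k^{*}) &= \left(\tfrac{1}{2}\right)^{k^{*}} \approx (0.6185)^{m/n}.
\end{aligned}
```

Writing $q = e^{-kn/m}$ gives $g = -\frac{m}{n}\ln q \,\ln(1 - q)$, which is symmetric in q and 1 - q, so the minimum sits at q = 1/2: at the optimum, half the bits are 0.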
10
Optimization Summary
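  • Assumption
  • We have good hash functions that look random.
  • Given m bits for the filter and n elements, choose
    the number k of hash functions to minimize false
    positives
  • Let $p = \Pr[\text{a given bit is still } 0] = (1 - 1/m)^{kn} \approx e^{-kn/m}$
  • Let $f = \Pr[\text{false positive}] = (1 - p)^{k} \approx (1 - e^{-kn/m})^{k}$
  • As k increases
  • more chances to find a 0
  • but more 1s in the array.
  • Conclusion
  • the optimum is at $k = (m/n)\ln 2$, where $f = (1/2)^{k} \approx (0.6185)^{m/n}$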
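Since k must be an integer, a minimal sketch for choosing it, assuming the approximation $f \approx (1 - e^{-kn/m})^k$: because f has a single minimum in k at $(m/n)\ln 2$, the best integer is the floor or the ceiling of that value. The function names are hypothetical.

```python
import math


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Approximate false-positive rate f = (1 - e^{-kn/m})^k."""
    return (1 - math.exp(-k * n / m)) ** k


def best_integer_k(m: int, n: int) -> int:
    """f has a single minimum in k, at (m/n) ln 2, so the best integer k is its floor or ceiling."""
    k_real = (m / n) * math.log(2)
    candidates = {max(1, math.floor(k_real)), max(1, math.ceil(k_real))}
    return min(candidates, key=lambda k: false_positive_rate(m, n, k))


m, n = 8000, 1000  # 8 bits per element
k = best_integer_k(m, n)
print(k, false_positive_rate(m, n, k))
print(k - 1, false_positive_rate(m, n, k - 1))  # one fewer hash function, slightly higher f
```

Comparing the two printed rates shows why a smaller, suboptimal k is often acceptable: near the optimum, f is flat in k.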

11
Partial Bloom Filters
  • The total number of bits is still m, but the bits
    are divided equally among the k hash functions.
  • Each hash function has a range of m/k consecutive
    bits, which allows the k array accesses to be made in parallel.
  • Though the probability of a false positive is
    actually always at least as large with this
    division, the difference is small...
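The division can be sketched as follows, assuming k divides m and simulating hash function i with SHA-256 salted by i (both illustrative choices; class and function names are hypothetical). The two formulas compare the false-positive rate of one shared array with that of k slices: in a slice of m/k bits, a bit stays 0 with probability $(1 - k/m)^n$ instead of $(1 - 1/m)^{kn}$.

```python
import hashlib


def _hash(i: int, item: str) -> int:
    """64-bit hash number i of item: SHA-256 salted with i (illustrative choice)."""
    return int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big")


class PartitionedBloomFilter:
    """m bits split into k slices of m/k consecutive bits; hash i only touches slice i."""

    def __init__(self, m: int, k: int) -> None:
        if m % k:
            raise ValueError("this sketch assumes k divides m")
        self.k, self.slice_size = k, m // k
        self.bits = bytearray(m)

    def _positions(self, item: str) -> list[int]:
        # Slice i starts at i * slice_size, so the k probes hit disjoint regions
        # (e.g. separate memory banks) and can be issued in parallel.
        return [i * self.slice_size + _hash(i, item) % self.slice_size for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


def fp_standard(m: int, n: int, k: int) -> float:
    """One shared array: a bit stays 0 with probability (1 - 1/m)^(kn)."""
    return (1 - (1 - 1 / m) ** (k * n)) ** k


def fp_partitioned(m: int, n: int, k: int) -> float:
    """k slices of m/k bits: a bit in a slice stays 0 with probability (1 - k/m)^n."""
    return (1 - (1 - k / m) ** n) ** k


pbf = PartitionedBloomFilter(6000, 6)
for i in range(750):
    pbf.add(f"item-{i}")
print(fp_standard(6000, 750, 6), fp_partitioned(6000, 750, 6))
```

Since $1 - k/m \le (1 - 1/m)^k$, the partitioned rate is never smaller, but for realistic m the two printed values differ only in the third significant digit or later.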

12
Counting Bloom Filters Idea
13
Counting Bloom filters Implementation
4 bits per counter is enough...
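A minimal counting Bloom filter sketch, assuming 4-bit counters that saturate at 15 rather than wrap (a saturated counter is never decremented, so overflow can only add false positives, never false negatives). `overflow_bound` is the standard union-bound estimate for why 4 bits suffice, assuming $k \le (m/n)\ln 2$: the chance that any counter would need to reach 16 is at most $m\,(e \ln 2 / 16)^{16}$. Hashing via salted SHA-256 and all names are illustrative assumptions.

```python
import hashlib
import math


class CountingBloomFilter:
    """Counting Bloom filter sketch: m small counters instead of m bits, so deletions work."""

    MAX_COUNT = 15  # largest value of a 4-bit counter

    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.counts = [0] * m

    def _positions(self, item: str) -> list[int]:
        # Hash i is SHA-256 salted with i (illustrative choice).
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            # Saturate instead of wrapping around on overflow.
            if self.counts[pos] < self.MAX_COUNT:
                self.counts[pos] += 1

    def remove(self, item: str) -> None:
        """Only call for items that were added; removing anything else corrupts the counts."""
        for pos in self._positions(item):
            # A saturated counter is left alone: its true value is unknown, and never
            # decrementing it can only cause false positives, not false negatives.
            if 0 < self.counts[pos] < self.MAX_COUNT:
                self.counts[pos] -= 1

    def __contains__(self, item: str) -> bool:
        return all(self.counts[pos] > 0 for pos in self._positions(item))


def overflow_bound(m: int, counter_bits: int = 4) -> float:
    """Bound m * (e ln 2 / j)^j on Pr[some counter needs j = 2^counter_bits], for k <= (m/n) ln 2."""
    j = 2 ** counter_bits
    return m * (math.e * math.log(2) / j) ** j


cbf = CountingBloomFilter(1000, 4)
cbf.add("a")
cbf.add("b")
cbf.remove("a")
print("a" in cbf, "b" in cbf, overflow_bound(1000))
```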
14
Compressed Bloom Filters Problem
15
Compressed Bloom Filters Motivation
  • Insight: a Bloom filter is not just a data
    structure; it is also a message.
  • If the Bloom filter is a message, it is worthwhile to
    compress it
  • Further reduces the traffic of URL exchange
  • Compressing bit vectors is easy.
  • Arithmetic coding gets close to entropy.
  • Can Bloom filters be compressed?
  • A Bloom filter (with optimal k) looks like a random string
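One way to quantify "looks like a random string" is the entropy per array bit, since that is roughly what an arithmetic coder can reach. A sketch, assuming the fraction of 1 bits is $1 - e^{-kn/m}$ and bits are independent (function names are hypothetical): at $k = (m/n)\ln 2$ half the bits are 1 and the entropy is 1 bit per bit, so there is nothing to compress; with fewer hash functions the array is sparser and compressible.

```python
import math


def binary_entropy(q: float) -> float:
    """Entropy in bits of a bit that is 1 with probability q."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def entropy_per_bit(m: int, n: int, k: float) -> float:
    """Approximate entropy per array bit, taking the fraction of 1s to be 1 - e^{-kn/m}."""
    return binary_entropy(1 - math.exp(-k * n / m))


m, n = 8000, 1000
for k in (1, 2, 4, m * math.log(2) / n):
    print(f"k = {k:.2f}: about {entropy_per_bit(m, n, k):.3f} bits of information per array bit")
```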

16
Compression Technique
17
Compression Results
  • At $k = m(\ln 2)/n$, false positives are maximized
    with a compressed Bloom filter (for a fixed compressed size).
  • Best case without compression is worst case with
    compression
  • compression always helps.
  • Side benefit
  • Use fewer hash functions with compression
  • possible speedup (depending on whether the bottleneck
    is memory or the link).
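These results can be reproduced numerically under an idealized model: the transmitted size is fixed at z bits, and the compressed size of an m-bit array is taken to be $m \cdot H(q)$ with $q = 1 - e^{-kn/m}$ the fraction of 1s (that is, assuming the coder reaches the entropy). For each k, the sketch picks the largest m that fits the budget and reports the resulting f; names are hypothetical.

```python
import math


def binary_entropy(q: float) -> float:
    """Entropy in bits of a bit that is 1 with probability q."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def compressed_bits(m: float, n: int, k: int) -> float:
    """Idealized compressed size m * H(q), q = 1 - e^{-kn/m} (assumes coding at the entropy)."""
    return m * binary_entropy(1 - math.exp(-k * n / m))


def fp_uncompressed(z: float, n: int, k: int) -> float:
    """f when the z transmitted bits are the raw array (m = z)."""
    return (1 - math.exp(-k * n / z)) ** k


def fp_compressed(z: float, n: int, k: int) -> float:
    """f for the largest array m whose compressed size still fits in z bits."""
    # compressed_bits grows with m (for fixed k) and never exceeds m, so m = z always fits.
    lo, hi = z, 2 * z
    while compressed_bits(hi, n, k) <= z:
        lo, hi = hi, 2 * hi
    for _ in range(100):  # bisection on the boundary between "fits" and "too big"
        mid = (lo + hi) / 2
        if compressed_bits(mid, n, k) <= z:
            lo = mid
        else:
            hi = mid
    return (1 - math.exp(-k * n / lo)) ** k


z, n = 8000, 1000  # budget of 8 transmitted bits per element
for k in range(1, 8):
    print(k, round(fp_uncompressed(z, n, k), 4), round(fp_compressed(z, n, k), 4))
```

In the printed table the compressed column is never worse than the uncompressed one, is worst near the uncompressed optimum, and is best with very few hash functions, which matches the slide's side benefit.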

18
Bloom Filter vs. Perfect Hash
  • If the set X of n elements is fixed, one can find
    a perfect hash function for X
  • plus a fully uniform random hash function
  • Then build a table with n entries of j bits each
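  • Map each element of X to its own entry with the perfect
    hash and store a j-bit fingerprint (from the random hash)
    there; the false-positive probability is then exactly $(1/2)^j$.
  • this matches the lower bound for Bloom filters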
  • HOWEVER
  • any change in the set X would require an
    expensive recomputation of a perfect hash
    function.
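A toy version of this construction, with loud assumptions: the "perfect hash" is found by brute-forcing a seed until every element lands in its own slot (only feasible for tiny n, since the expected number of tries is about $n^n/n!$; real systems use dedicated minimal-perfect-hash algorithms), and both hash functions are simulated with salted SHA-256. All names are hypothetical. A non-member matches the stored fingerprint in its slot with probability $2^{-j}$.

```python
import hashlib
from itertools import count


def _hash(salt: str, item: str) -> int:
    """64-bit hash of item under a salt (SHA-256, illustrative stand-in for a random function)."""
    return int.from_bytes(hashlib.sha256(f"{salt}:{item}".encode()).digest()[:8], "big")


def find_perfect_seed(items: list[str]) -> int:
    """Brute-force a seed for which item -> slot is a bijection onto n slots.

    Items must be distinct. Expected tries are about n^n / n!, so this is for toy sets only.
    """
    n = len(items)
    for seed in count():
        if len({_hash(f"slot{seed}", x) % n for x in items}) == n:
            return seed


class FingerprintTable:
    """n entries of j bits each: a perfect hash picks the entry, a second hash gives the fingerprint."""

    def __init__(self, items: list[str], j: int) -> None:
        self.n, self.j = len(items), j
        self.seed = find_perfect_seed(items)
        self.table = [0] * self.n
        for x in items:
            self.table[self._slot(x)] = self._fingerprint(x)

    def _slot(self, item: str) -> int:
        return _hash(f"slot{self.seed}", item) % self.n

    def _fingerprint(self, item: str) -> int:
        return _hash("fingerprint", item) & ((1 << self.j) - 1)  # keep the low j bits

    def __contains__(self, item: str) -> bool:
        return self.table[self._slot(item)] == self._fingerprint(item)


members = [f"member-{i}" for i in range(8)]
table = FingerprintTable(members, j=8)
others = [f"other-{i}" for i in range(20000)]
print(sum(x in table for x in others) / len(others), 2 ** -8)
```

The seed search also illustrates the slide's caveat: adding or removing one element changes the set, so the search (and the whole table) must be redone.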

19
Bloom Filter Tricks
  • Union (combining two BFs)
  • requires the same m and the same hash functions
  • just OR the two bit vectors of the original Bloom
    filters
  • Shrinking (halve a big BF)
  • just OR the first and second halves together
  • then mask the highest-order bit of each hash index
    (with m a power of 2)
  • Intersection (estimation)
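The three tricks can be sketched on a filter stored as a single Python integer, assuming m is a power of two (so "h mod m" keeps the low bits of h and halving masks the top index bit) and hashing via salted SHA-256. For intersection, the sketch uses one possible estimator, not necessarily the slide's: invert the expected 0-bit fraction $e^{-kn/m}$ to estimate $|A|$, $|B|$ and $|A \cup B|$ (the OR of the two filters), then apply inclusion-exclusion. All names are hypothetical.

```python
import hashlib
import math


def _hash(i: int, item: str) -> int:
    """Hash number i of item: SHA-256 salted with i (illustrative choice)."""
    return int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big")


class Bloom:
    """Bloom filter whose bit vector is one Python int; m must be a power of two."""

    def __init__(self, m: int, k: int, bits: int = 0) -> None:
        if m <= 0 or m & (m - 1):
            raise ValueError("m must be a power of two")
        self.m, self.k, self.bits = m, k, bits

    def _positions(self, item: str) -> list[int]:
        # For power-of-two m, "h mod m" just keeps the low log2(m) bits of h.
        return [_hash(i, item) & (self.m - 1) for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other: "Bloom") -> "Bloom":
        """Requires the same m and hash functions: OR the two bit vectors."""
        if (self.m, self.k) != (other.m, other.k):
            raise ValueError("union needs the same m and the same hash functions")
        return Bloom(self.m, self.k, self.bits | other.bits)

    def halve(self) -> "Bloom":
        """OR the two halves; positions then equal h mod (m/2), i.e. the top index bit is masked."""
        half = self.m // 2
        low, high = self.bits & ((1 << half) - 1), self.bits >> half
        return Bloom(half, self.k, low | high)

    def estimated_size(self) -> float:
        """Invert the expected 0-bit fraction e^{-kn/m}; needs at least one 0 bit."""
        zeros = self.m - bin(self.bits).count("1")
        return -(self.m / self.k) * math.log(zeros / self.m)


def estimate_intersection(a: Bloom, b: Bloom) -> float:
    """|A ∩ B| ≈ |A| + |B| - |A ∪ B|, each size estimated from a filter's 0 bits."""
    return a.estimated_size() + b.estimated_size() - a.union(b).estimated_size()


m, k = 4096, 4
a, b, both = Bloom(m, k), Bloom(m, k), Bloom(m, k)
for i in range(300):
    a.add(f"x{i}")
for i in range(200, 500):
    b.add(f"x{i}")
for i in range(500):
    both.add(f"x{i}")
print(round(estimate_intersection(a, b)))  # the true intersection has 100 elements
```

Union and halving are exact (they yield the same bits as building the filter directly), whereas the intersection is only an estimate, which is why the slide marks it as such.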

20
Applications
21
Questions?