Title: Parallel and Distributed Programming
1Parallel and Distributed Programming
- Tran Vinh Cuong
- Department of Computer Science
2Bloom filter
- A Bloom filter, conceived by Burton H. Bloom in 1970, is a
space-efficient probabilistic data structure used to test
whether an element is a member of a set.
- False positives are possible, but false negatives are not.
- Elements can be added to the set, but not removed.
- The more elements are added to the set, the larger the
probability of false positives.
3Bloom filter algorithm
- An empty Bloom filter is a bit array of m bits, all set to 0.
- There are k different hash functions.
- Add an element: feed it to the k hash functions to get k
array positions, and set these positions to 1.
- Check existence: feed the element to the k hash functions to
get k array positions. If any of these positions is 0, the
element is not in the set. If all are 1, the element is
either in the set or a false positive.
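The add and check steps can be sketched in a few lines of Python. This is a minimal illustration, not a tuned implementation: the class name `BloomFilter`, the method names, and the choice of deriving the k hash functions by salting SHA-256 with an index are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: a bit array of m bits and k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # m bits, all initially 0

    def _positions(self, item: str):
        # Assumption: simulate k hash functions by salting SHA-256 with i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        # Set every position selected by the k hash functions to 1.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # Any 0 bit means "definitely not in the set";
        # all 1 bits means "in the set, or a false positive".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


bf = BloomFilter(m=1024, k=3)
bf.add("state-1")
print(bf.might_contain("state-1"))  # True: false negatives are impossible
```

There is no `remove` method: clearing a bit could also erase evidence of other elements that hash to the same position.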
4Probability of false positive
Assume the hash functions select each array position
with equal probability. The probability that a
certain bit is not set to 1 by one hash function
during the insertion of an element is
  $1 - \frac{1}{m}$
The probability that the bit is not set by any of
the k hash functions is
  $\left(1 - \frac{1}{m}\right)^{k}$
After inserting r elements, the probability that
it is still 0 is
  $\left(1 - \frac{1}{m}\right)^{kr}$
The probability that it is 1 is therefore
  $1 - \left(1 - \frac{1}{m}\right)^{kr}$
5Test existence of an element that is not in the set
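The probability that all k positions are 1
(a false positive) is
  $\left(1 - \left(1 - \frac{1}{m}\right)^{kr}\right)^{k} \approx \left(1 - e^{-kr/m}\right)^{k}$
This is an upper bound on the probability of a hash
collision for the first r elements (states)
entered. It is minimal when
  $k = \frac{m}{r} \ln 2$
For $m = 10^{9}$ and $r = 10^{7}$: $P(\text{hash collision}) \approx 10^{-21}$,
$k = 69.315$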
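The numbers in this example can be checked with a short calculation. The sketch below assumes the standard approximation P ≈ (1 - e^{-kr/m})^k for the false-positive probability and the optimum k = (m/r) ln 2. The function names are hypothetical, chosen only for this illustration.

```python
import math


def false_positive_probability(m: int, r: int, k: float) -> float:
    """Approximate probability that all k probed bits are 1 after r inserts."""
    return (1.0 - math.exp(-k * r / m)) ** k


def optimal_k(m: int, r: int) -> float:
    """Number of hash functions that minimises the approximation above."""
    return (m / r) * math.log(2)


m, r = 10**9, 10**7
k_opt = optimal_k(m, r)
p_opt = false_positive_probability(m, r, k_opt)
print(f"optimal k = {k_opt:.3f}, P(hash collision) = {p_opt:.2e}")
```

In practice k has to be an integer, so the computed optimum would be rounded to a neighbouring whole number.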
7Trade-off
- In SPIN, k = 2 gives a hash-collision probability of about
$4 \cdot 10^{-4}$; the average search coverage drops from 100% to 99%.
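To make the trade-off concrete, the sketch below tabulates the approximate collision probability (1 - e^{-kr/m})^k for a few small values of k. It assumes the SPIN figure refers to the same sizes as the earlier example, m = 10^9 bits and r = 10^7 states. That is an assumption, as is the helper name `collision_probability`.

```python
import math


def collision_probability(m: int, r: int, k: int) -> float:
    """Approximate false-positive probability for k hash functions."""
    return (1.0 - math.exp(-k * r / m)) ** k


m, r = 10**9, 10**7  # assumed: same sizes as the earlier example
table = {k: collision_probability(m, r, k) for k in (1, 2, 4, 8)}
for k, p in table.items():
    print(f"k = {k}: P = {p:.1e}")
```

Under these assumptions k = 2 gives a value close to $4 \cdot 10^{-4}$. Each additional hash function lowers the collision probability but adds one more hash computation and memory access per stored state.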