Title: Stable Distributions, Pseudorandom Generators, Embeddings and Data Stream Computation
1Stable Distributions, Pseudorandom Generators,
Embeddings and Data Stream Computation
Paper by Piotr Indyk
Presentation by Andy Worms and Arik Chikvashvili
2Abstract
- Stable Distributions
- Stream computation by combining the use of stable distributions
- Pseudorandom Generators (PRG) and their use to reduce memory usage
3Introduction
Input: an n-dimensional point p, measured in the L1 norm.
Storage: a sketch C(p) of $O(\log n/\epsilon^2)$ words.
Property: given C(p) and C(q), we can estimate $\|p-q\|_1$ for points p and q up to a factor of $(1+\epsilon)$ w.h.p.
4Introduction.Stream computation
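Given a stream S of data, each chunk of data is of the form (i,a), where $i \in [n] = \{0, \dots, n-1\}$ and $a \in \{-M, \dots, M\}$. We want to approximate the quantity L1(S), where
$L_1(S) = \sum_{i} \left| \sum_{(i,a) \in S} a \right|$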
5Other results
- N. Alon, Y. Matias and M. Szegedy (1996): approximating L2(S) w.h.p. in space $O(1/\epsilon^2)$, when each i appears in at most two pairs (i,a) in the stream.
- J. Feigenbaum, S. Kannan, M. Strauss and M. Viswanathan (1999): showed the same for L1(S), also for at most two pairs per i.
6Stable Distributions
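A distribution D over R is p-stable if there exists $p \ge 0$ such that for any real numbers $a_1, \dots, a_n$ and i.i.d. variables $X_1, \dots, X_n$ with distribution D, the variable
$\sum_i a_i X_i$
has the same distribution as the variable
$\left(\sum_i |a_i|^p\right)^{1/p} X$, where X is a random variable with distribution D.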
7Some News
- The good news: there exist stable distributions for every p in (0, 2].
- The bad news: most have no closed formula (i.e. non-constructive).
- Cauchy Distribution is 1-stable.
- Gauss Distribution is 2-stable.
- Source: V.M. Zolotarev, One-dimensional Stable Distributions (518.2 ZOL in AAS)
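As a worked instance of the definition for the two named cases (standard facts about these distributions, written in notation assumed here, with X distributed like each X_i):

```latex
\begin{align*}
X_i \text{ i.i.d. Cauchy} &\;\Rightarrow\; \sum_i a_i X_i \;\sim\; \Big(\sum_i |a_i|\Big)\, X
  && (p = 1), \\
X_i \text{ i.i.d. } N(0,1) &\;\Rightarrow\; \sum_i a_i X_i \;\sim\; \Big(\sum_i a_i^2\Big)^{1/2} X
  && (p = 2).
\end{align*}
```

For example, with coefficients (3, -4) the Cauchy combination is distributed like 7X, while the Gaussian combination is distributed like 5X.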
8Reminder
9Tonight's Program
- Obvious solution.
- Algorithm for p = 1.
- The algorithm's limitations.
- Proof of correctness.
- Overcome the limitations.
- Time permitting: p = 2 and all p in (0, 2].
10The Obvious Solution
- Hold a counter for each i, and update it on each pair found in the input stream.
- Breaks the stream model: O(n) memory.
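A minimal sketch of the obvious solution, assuming the stream is any iterable of (i, a) pairs; the function name `exact_l1` is hypothetical. It keeps one counter per index, which is exactly the O(n) memory the stream model does not allow.

```python
from collections import defaultdict

def exact_l1(stream):
    """Exact L1(S): sum over i of |sum of all a that arrived with index i|."""
    counters = defaultdict(int)          # one counter per index i seen so far
    for i, a in stream:
        counters[i] += a                 # update the counter on each pair
    return sum(abs(c) for c in counters.values())
```

For the stream (2,1), (5,-2), (4,-1) this gives 1 + 2 + 1 = 4.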
11The Problem (p1)
- Input: a stream S of pairs (i, a) such that $0 \le i < n$ and $-M \le a \le M$.
- Output: an estimate of L1(S)
- Up to an error factor of $1 \pm \epsilon$
- With probability $1 - \delta$
12Definitions
$l = c \log(1/\delta)/\epsilon^2$ (c defined later)
$X_i^j$, for $0 \le i < n$ and $0 \le j < l$, are independent random variables with Cauchy distribution.
A set of buckets $S^j$, $0 \le j < l$, initially zero.
13The Algorithm
- For each new pair (i,a): for every j, $S^j \leftarrow S^j + a X_i^j$
- Return $\mathrm{median}(|S^0|, |S^1|, \dots, |S^{l-1}|)$
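The algorithm can be sketched in a few lines. This is an illustration under stated assumptions, not the presenters' code: the function name `cauchy_sketch_l1` and the `seed` parameter are hypothetical, Cauchy samples are drawn as tan(pi(U - 1/2)) for uniform U, and the whole matrix of random numbers is stored in memory with ordinary floating point.

```python
import math
import random
import statistics

def cauchy_sketch_l1(stream, n, l, seed=0):
    """Estimate L1(S) with l buckets, each a random Cauchy projection."""
    rng = random.Random(seed)
    # X[j][i]: one standard Cauchy variable per (bucket j, index i)
    X = [[math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
         for _ in range(l)]
    S = [0.0] * l                         # buckets, initially zero
    for i, a in stream:
        for j in range(l):
            S[j] += a * X[j][i]           # S^j <- S^j + a * X_i^j
    return statistics.median(abs(s) for s in S)
```

With a few thousand buckets the estimate for the single-pair stream (1, 5) typically lands close to 5; the number of buckets controls how tight that is.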
14Limitations
- It assumes infinite precision arithmetic.
- It randomly and repeatedly accesses the random numbers.
15Example: n = 7, l = 3
(2,1)
(5,-2)
(4,-1)
16Correctness Proof
Let $c_i = \sum_{(i,a) \in S} a$ ($c_i = 0$ if there is no (i,a) in S), and let $C = \sum_i |c_i| = L_1(S)$.
17Correctness proof (cont.)
- Claim 1: Each $S^j$ has the same distribution as CX for some random variable X with Cauchy Distribution.
- Proof: follows from the 1-stability of Cauchy Distribution.
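Spelled out, with $c_i$ the net sum of the updates to coordinate i and $C = \sum_i |c_i|$ (notation assumed from the surrounding slides), the step is:

```latex
S^j \;=\; \sum_{(i,a)\in S} a\, X_i^j
    \;=\; \sum_{i} c_i\, X_i^j
    \;\sim\; \Big(\sum_i |c_i|\Big) X
    \;=\; C X .
```

The third relation is the definition of 1-stability applied with coefficients $c_i$.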
18Correctness Proof (cont.)
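- Lemma 1: If X has Cauchy Distribution, then median(|X|) = 1 and median(a|X|) = a for any a > 0.
- Proof: The distribution function of |X| is $F(z) = \int_0^z \frac{2}{\pi} \frac{1}{1+x^2}\,dx = \frac{2}{\pi}\arctan(z)$
- Since $\tan(\pi/4) = 1$, $F(1) = 1/2$.
- Thus median(|X|) = 1 and median(a|X|) = a.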
19(Graphics Intermezzo)
20Correctness proof (cont.)
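- Fact:
- For any distribution D on R with distribution function F, take $l = c \log(1/\delta)/\epsilon^2$ independent samples $X_0, \dots, X_{l-1}$ of D, and let $X = \mathrm{median}(X_0, \dots, X_{l-1})$.
- Then, for a suitable constant c, $\Pr[F(X) \in [1/2 - \epsilon, 1/2 + \epsilon]] > 1 - \delta$.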
21Correctness Proof (cont.)
- Fact in simple Hebrew:
- You choose an error (small) and a probability (high).
- With enough samples, you will discover the median with high probability within small error.
22Correctness Proof (cont.)
- Lemma 2:
- Let F be the distribution function of |X|, where X has Cauchy Distribution, and let z > 0 be such that $1/2 - \epsilon \le F(z) \le 1/2 + \epsilon$.
- Then, if $\epsilon$ is small enough, $1 - 4\epsilon \le z \le 1 + 4\epsilon$.
- Proof:
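The proof itself is not reproduced in these notes. One way to see the bound, assuming $F(z) = \frac{2}{\pi}\arctan(z)$ is the distribution function of |X| (a sketch, not necessarily the presenters' argument):

```latex
\begin{align*}
F(z) = \tfrac12 + \eta,\ |\eta| \le \epsilon
  &\;\Rightarrow\; z = \tan\!\Big(\tfrac{\pi}{2} F(z)\Big)
                    = \tan\!\Big(\tfrac{\pi}{4} + \tfrac{\pi}{2}\eta\Big), \\
\tan'(\pi/4) = \frac{1}{\cos^2(\pi/4)} = 2
  &\;\Rightarrow\; z = 1 + 2 \cdot \tfrac{\pi}{2}\eta + O(\eta^2)
                    = 1 + \pi\eta + O(\eta^2).
\end{align*}
```

Since $\pi < 4$, the quadratic term is absorbed for small enough $\epsilon$, giving $|z - 1| \le 4\epsilon$.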
23Correctness Proof (last)
- Therefore we have proved
- The algorithm correctly estimates L1(S) up to the factor $(1 \pm \epsilon)$, with probability at least $(1 - \delta)$.
24Correctness Proof (review)
- For those who are lost:
- Each bucket is distributed like CX, and median(|CX|) = C.
- Enough samples approximate median(|CX|) = C well enough.
- Each bucket is a sample.
25Tonight's Program
- Obvious solution.
- Algorithm for p = 1.
- The algorithm's limitations.
- Proof of correctness.
- Overcome the limitations.
- Time permitting: p = 2 and all p in (0, 2].
- God willing: uses of the algorithm.
26Bounded Precision
- The numbers of the stream are integers; the problem is with the random variables.
- We will show it is sufficient to pick them from the set $V_L = \{p/q : p, q \in \{-L, \dots, L\}\}$
- (In Hebrew: the set of fractions of small numbers)
27Bounded Precision (cont.)
- We want to generate a r.v. X with Cauchy Distribution.
- We choose Y uniformly from [0,1).
- $X = F^{-1}(Y) = \tan(\pi Y/2)$.
- $Y'$ is the multiple of 1/L closest to Y.
- $X'$ is $F^{-1}(Y')$ rounded to a multiple of 1/L.
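The rounding steps above can be sketched as follows. The function name `rounded_abs_cauchy` is hypothetical, and the clamp that keeps the rounded Y strictly below 1 is an assumption added here so the tangent stays well behaved; it is not stated on the slide.

```python
import math
import random

def rounded_abs_cauchy(L, rng):
    """Return (X, X') where X = tan(pi*Y/2) and X' is its 1/L-precision version."""
    y = rng.random()                          # Y uniform in [0, 1)
    x = math.tan(math.pi * y / 2)             # X = F^{-1}(Y)
    y_r = round(y * L) / L                    # Y': multiple of 1/L closest to Y
    y_r = min(y_r, (L - 1) / L)               # assumption: keep Y' below 1
    x_r = round(math.tan(math.pi * y_r / 2) * L) / L   # X': multiple of 1/L
    return x, x_r
```

Away from the tail (Y not too close to 1) the two values differ by only a small multiple of 1/L; the difference grows as Y approaches 1.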
28Cauchy's Lottery
29Bounded Precision (cont.)
- Assume $Y < 1 - K/L = 1 - \gamma$.
- The derivative of $F^{-1}$ near $Y < 1 - \gamma$ is $O(1/\gamma^2)$.
- It follows that $X' = X + E$, where $|E| = O(1/(\gamma^2 L)) = \beta$.
30Bounded Precision (cont.)
(result from previous slide)
(up to $\pm \epsilon$, from the algorithm's proof)
31Memory Usage Reduction
- The naïve implementation uses O(n) memory words to store the random matrix.
- Couldn't we generate the random matrix on the fly?
- Yes, with a PRG.
- We also toss fewer coins.
32Not just for fun.
- From the Python programming language (www.python.org).
Source: http://www.python.org/doc/2.3.3/lib/module-random.html
33Review Probabilistic Algorithms
- Allow algorithm A to
- Use random bits.
- Make errors.
- Answers correctly with high probability.
- For every x, $\Pr_r[A(x,r) = P(x)] > 1 - \epsilon$.
- (for very small $\epsilon$, say $10^{-1000}$).
34Exponential time Derandomization
- After 20 years of research we only have the following trivial theorem.
- Theorem: Probabilistic poly-time algorithms can be simulated deterministically in exponential time. (Time $2^{poly(n)}$).
35Proof
- Suppose that A uses r random bits.
- Run A using all $2^r$ choices for random bits.
[Diagram: algorithm A takes the input and the random bits and produces the output.]
- Take the majority vote of outputs.
- Time: $2^r \cdot poly(n)$
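The proof is short enough to write down directly. In this sketch `derandomize` and the toy algorithm `noisy_is_even` are hypothetical names introduced only for illustration; the toy algorithm errs on exactly one of its 2^r random strings.

```python
from collections import Counter
from itertools import product

def derandomize(A, x, r):
    """Run A(x, bits) on all 2**r random strings and return the majority output."""
    votes = Counter(A(x, bits) for bits in product((0, 1), repeat=r))
    return votes.most_common(1)[0][0]

def noisy_is_even(x, bits):
    """Toy randomized algorithm: wrong only when every random bit is 1."""
    correct = (x % 2 == 0)
    return (not correct) if all(bits) else correct
```

Calling `derandomize(noisy_is_even, 10, 4)` enumerates 16 bit strings, 15 of which vote for the correct answer. The loop over `product` is what costs the factor 2^r.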
36Algorithms which use few bits
Algorithms with few random coins can be
efficiently derandomized!
[Diagram: algorithm A with its input, random bits, and output.]
37Derandomization paradigm
- Given a probabilistic algorithm that uses many random bits.
- Convert it into a probabilistic algorithm that uses few random bits.
- Derandomize it by using the previous Theorem.
38Pseudorandom Generators
Use a short seed of very few truly random bits
to generate a long string of pseudo-random bits.
Pseudo-randomness: no efficient algorithm can distinguish truly random bits from pseudo-random bits.
39Pseudo-Random Generators
New probabilistic algorithm.
[Diagram: algorithm A and its output.]
In our algorithm we need to store only the short seed, and not the whole set of pseudorandom bits.
40Remember?
- There exist efficient random access (indexable) random number generators.
41PRG definition
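- Given an FSM Q
- Given a seed which is truly random
- Convert it into k chunks of random bits, each of length b.
- Formally: $G : \{0,1\}^m \to (\{0,1\}^b)^k$
- Let Q(x) be the state of Q after input x
- G is a PRG if
- $\left\| D[Q_{x \in D^{bk}}(x)] - D[Q_{x \in D^{m}}(G(x))] \right\|_1 \le \epsilon$, where $D^t$ is the uniform distribution over $\{0,1\}^t$ and D[Y] is the distribution of Y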
42PRG properties
- There exists a PRG G for space(S) with $\epsilon = 2^{-O(S)}$ such that:
- G expands $O(S \log R)$ bits into O(R) bits
- G requires only O(S) bits of storage in addition to its random bits
- Any length-O(S) chunk of G(x) can be computed using $O(\log R)$ arithmetic operations on O(S)-bit words
43Randomness reduction
- Consider a fixed $S^j$ and the $O(\log M)$ bits of space that hold it
- O(n) for the $X_i$, when the pairs (i,a) come in increasing order of i
- So we need O(n) chunks of randomness
- => there exists a PRG that needs a random seed of size $O(\log M \log(n/\delta))$ to expand it to n pseudorandom variables $X_1, \dots, X_n$
44Randomness reduction
- The variables $X_1, \dots, X_n$ give us $S^j$ => L1(S)
- But $S^j$ does not depend on the order of the i-s; for each i the same $X_i$ will be generated => the input can be unsorted
- We use $l = O(\log(1/\delta)/\epsilon^2)$ random seeds
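To illustrate the idea of regenerating each $X_i$ from a short seed instead of storing it, here is a sketch that swaps in a SHA-256 hash of (seed, i) as a stand-in for the PRG. This is not the generator the slides refer to and carries none of its guarantees; the names `pseudo_cauchy` and `seeded_sketch_l1` are hypothetical.

```python
import hashlib
import math
import statistics

def pseudo_cauchy(seed, i):
    """Recompute X_i on demand: the same (seed, i) always yields the same value."""
    digest = hashlib.sha256(f"{seed}:{i}".encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64     # u in [0, 1]
    return math.tan(math.pi * (u - 0.5))              # Cauchy via inverse CDF

def seeded_sketch_l1(stream, seeds):
    """One bucket per seed; only the seeds and the buckets are kept in memory."""
    S = [0.0] * len(seeds)
    for i, a in stream:
        for j, seed in enumerate(seeds):
            S[j] += a * pseudo_cauchy(seed, i)
    return statistics.median(abs(s) for s in S)
```

Because every $X_i$ is recomputed from its seed, feeding the same pairs in a different order gives the same buckets up to floating-point rounding, which is the order-independence noted above.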
45Theorem 2
- There is an algorithm which estimates L1(S) up to a factor $(1 \pm \epsilon)$ with probability $1 - \delta$ and uses ($S = \log M$, $R = n/\delta$):
- $O(\log M \log(1/\delta)/\epsilon^2)$ bits of random access storage
- $O(\log(n/\delta))$ arithmetic operations per pair (i,a)
- $O(\log M \log(n/\delta) \log(1/\delta)/\epsilon^2)$ random bits
46Further Results
- When p = 2, the algorithm and analysis are the same, with Cauchy Distribution replaced by Gaussian.
- For general p in (0, 2] there are no closed formulas for densities or distribution functions.
47General p
- Fact: p-stable random variables can be generated from two independent variables that are distributed uniformly over [0,1] (Chambers, Mallows and Stuck, 1976)
- It seems that Lemma 2 and the algorithm itself could work for this case also, but there is no need to work this out, as there are no known applications with p that differs from 1 and 2.
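A sketch of the Chambers, Mallows and Stuck construction for the symmetric case, in the form it is commonly stated; the function name `stable_from_uniforms` is hypothetical, and the scale of the output is whatever the formula gives (for p = 2 it is a Gaussian with variance 2, not 1).

```python
import math

def stable_from_uniforms(p, u1, u2):
    """Map two uniforms in (0, 1) to a symmetric p-stable sample, 0 < p <= 2."""
    theta = math.pi * (u1 - 0.5)            # uniform on (-pi/2, pi/2)
    w = -math.log(u2)                       # exponential with mean 1
    return (math.sin(p * theta) / math.cos(theta) ** (1.0 / p)
            * (math.cos((1.0 - p) * theta) / w) ** ((1.0 - p) / p))
```

For p = 1 the second factor has exponent 0 and the expression collapses to tan(theta), the Cauchy case; for p = 2 it simplifies to 2 sin(theta) sqrt(w).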
48CX