Optimal Space Lower Bounds for all Frequency Moments

1
Optimal Space Lower Bounds for all Frequency Moments
  • David Woodruff

Based on a SODA '04 paper
2
The Streaming Model [AMS96]
  • Stream of elements a1, …, aq, each in {1, …, m}
  • Want to compute statistics on the stream
  • Elements arrive in adversarial order
  • Algorithms are given one pass over the stream
  • Goal: a minimum-space algorithm

3
Frequency Moments
  • Notation:
  • q = stream size, m = universe size
  • fi = # of occurrences of item i

k-th moment: Fk = Σi fi^k
  • F0 = # of distinct elements
  • F1 = q
  • F2 = repeat rate
Why are frequency moments important?
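As a concrete reference point, the moments defined above can be computed exactly (ignoring space constraints) in a few lines; a minimal Python sketch, with `frequency_moment` a hypothetical helper name:

```python
from collections import Counter

def frequency_moment(stream, k):
    """F_k = sum over items i of f_i^k, where f_i = # occurrences of i."""
    counts = Counter(stream)
    if k == 0:
        return len(counts)  # F0 = number of distinct elements
    return sum(f ** k for f in counts.values())

stream = [3, 1, 3, 2, 3, 1]          # q = 6, universe {1, 2, 3}
print(frequency_moment(stream, 0))   # F0 = 3 (distinct elements)
print(frequency_moment(stream, 1))   # F1 = 6 (stream length q)
print(frequency_moment(stream, 2))   # F2 = 3^2 + 2^2 + 1^2 = 14
```

This exact computation needs a counter per distinct item, which is precisely the Θ(m log q) space the streaming algorithms below try to beat.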
4
Applications
  • Estimating # of distinct elements with low space:
  • Estimate the selectivity of DB queries without an expensive sort
  • Routers gather # of distinct destinations with limited memory
  • Estimating F2 estimates the size of self-joins
5
The Best Deterministic Algorithm
  • Trivial algorithm for Fk:
  • Store/update fi for each item i, sum the fi^k at the end
  • Space: O(m log q) (m items i, log q bits to count each fi)
  • Negative Results [AMS96]:
  • Computing Fk exactly requires Ω(m) space
  • Any deterministic alg. outputting x with |Fk − x| ≤ εFk must use Ω(m) space

What about randomized algorithms?
6
Randomized Approx. Algs for Fk
  • A randomized alg. ε-approximates Fk if it outputs x s.t. Pr[|Fk − x| ≤ εFk] > 2/3
  • Can ε-approximate F0 [BJKST02], F2 [AMS96], and Fk for k > 2 [CK04] in sublinear space
  • (big-Oh notation suppresses polylog(1/ε, m, q) factors)
  • Ideas:
  • Hashing: O(1)-wise independence
  • Sampling

7
Example: F0 [BJKST02]
  • Idea: for a random function h: [m] → [0,1] and distinct elements b1, b2, …, bF0, expect mini h(bi) ≈ 1/F0
  • Algorithm:
  • Choose a 2-wise indep. hash function h: [m] → [m³]
  • Maintain the t = Θ(1/ε²) smallest distinct values h(bi)
  • Let v be the t-th smallest value
  • Output tm³/v as the estimate for F0
  • Success prob. up to 1 − δ ⇒ take the median of O(log 1/δ) copies
  • Space: O((log 1/δ)/ε²)
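The steps above can be sketched in Python. This is a simplified illustration, not the full BJKST algorithm: `f0_estimate` is a hypothetical helper, the 2-wise independent family is a linear hash mod a fixed prime, and the median-of-O(log 1/δ) amplification is omitted:

```python
import random

def f0_estimate(stream, m, eps=0.5, seed=1):
    """Sketch of the BJKST idea: hash items into [m^3] with a 2-wise
    independent hash, keep the t smallest distinct hash values, and
    output t * m^3 / v where v is the t-th smallest."""
    M = m ** 3
    p = 2 ** 61 - 1                      # prime modulus, larger than M for small m
    rng = random.Random(seed)
    a, b = rng.randrange(1, p), rng.randrange(p)
    t = max(1, round(1 / eps ** 2))      # t = Theta(1/eps^2)
    smallest = set()                     # the t smallest distinct hash values
    for x in stream:
        smallest.add(((a * x + b) % p) % M)
        if len(smallest) > t:
            smallest.remove(max(smallest))
    if len(smallest) < t:
        return len(smallest)             # fewer than t distinct items: exact count
    return t * M / max(smallest)         # v = t-th smallest value
```

When the stream has fewer than t distinct items the set itself gives a (near-)exact count; otherwise the t-th smallest hash value v yields the estimate tm³/v.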

8
Example: F2 [AMS99]
  • Algorithm:
  • Choose a 4-wise indep. hash function h: [m] → {−1, +1}
  • Maintain Z = Σi ∈ [m] fi h(i)
  • Output Y = Z² as the estimate for F2

Correctness: E[Y] = F2 and Var[Y] ≤ 2F2², so Chebyshev's inequality ⇒ O(1/ε²) space
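A runnable Python sketch of this estimator, under assumptions not in the slide: `ams_sketch` and `f2_estimate` are hypothetical helper names, 4-wise independence comes from a degree-3 polynomial over a prime field, and simple averaging of copies stands in for the usual median-of-means amplification:

```python
import random

P = 2 ** 31 - 1  # prime modulus for the hash family

def ams_sketch(stream, seed):
    """One AMS sketch: Z = sum_x h(x) over stream elements, i.e.
    Z = sum_i f_i h(i) with h: [m] -> {-1, +1} drawn 4-wise
    independently (degree-3 polynomial mod P); returns Y = Z^2."""
    rng = random.Random(seed)
    coeffs = [rng.randrange(P) for _ in range(4)]
    def h(x):
        v = 0
        for c in coeffs:        # Horner evaluation of the degree-3 polynomial
            v = (v * x + c) % P
        return 1 if v % 2 == 0 else -1
    z = 0
    for x in stream:            # streaming update: one counter, one hash eval
        z += h(x)
    return z * z                # E[Y] = F2

def f2_estimate(stream, copies=50, seed=0):
    """Average independent sketches to reduce the variance (Chebyshev)."""
    return sum(ams_sketch(stream, seed + i) for i in range(copies)) / copies
```

Note the space: one counter Z per copy, regardless of m, which is where the O(1/ε²) bound comes from.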
9
Previous Lower Bounds
  • [AMS96]: ∀ k, ε-approximating Fk requires Ω(log m) space
  • [Bar-Yossef]: ε-approximating F0 requires Ω(1/ε) space
  • [IW03]: ε-approximating F0 requires Ω(ε⁻²) space, for a restricted range of ε
  • Questions:
  • Does the bound hold for k ≠ 0?
  • Does it hold for F0 for smaller ε?

10
Our First Result
  • Optimal Lower Bound: ∀ k ≠ 1 and any ε = Ω(m^(−1/2)), ε-approximating Fk requires Ω(ε⁻²) bits of space.
  • F1 = q is trivial in log q space
  • Fk is trivial in O(m log q) space, so we need ε = Ω(m^(−1/2))
  • Technique: reduction from a 2-party protocol for computing the Hamming distance Δ(x,y)
  • Uses tools from communication complexity

11
Lower Bound Idea

Alice has x ∈ {0,1}^m and creates a stream s(x); Bob has y ∈ {0,1}^m and creates a stream s(y). Alice runs a (1 ± ε) Fk algorithm A on s(x) and sends its internal state S to Bob, who continues running A on s(y).
  • Together they compute (1 ± ε) Fk(s(x) ∘ s(y)) w.p. > 2/3
  • Idea: if they can decide f(x,y) w.p. > 2/3, the space used by A is at least the randomized 1-way comm. complexity of f

12
Randomized 1-way comm. complexity
  • Boolean function f: X × Y → {0,1}
  • Alice has x ∈ X, Bob has y ∈ Y; Bob wants f(x,y)
  • Only 1 message m is sent, and it must go from Alice to Bob
  • Communication cost: maxx,y Ecoins[|m|]
  • The δ-error randomized 1-way communication complexity, Rδ(f), is the cost of the optimal protocol computing f with probability 1 − δ

Ok, but how do we lower bound Rδ(f)?
13
Shatter Coefficients [KNR]
  • F = {f: X → {0,1}} a function family; each f ∈ F is a length-|X| bitstring
  • For S ⊆ X, the shatter coefficient SC(F|S) of S is
  • |{f|S : f ∈ F}|, the # of distinct bitstrings when F is restricted to S
  • SC(F, p) = max over S ⊆ X, |S| = p of SC(F|S). If SC(F|S) = 2^|S|, S is shattered
  • Treat f: X × Y → {0,1} as a function family fX:
  • fX = {fx(y): Y → {0,1} | x ∈ X}, where fx(y) = f(x,y)
  • Theorem [BJKS]: for every f: X × Y → {0,1} and every integer p, R1/3(f) = Ω(log SC(fX, p))

14
Warmup: Ω(1/ε) Lower Bound [Bar-Yossef]
  • Alice: input x ∈R {0,1}^m with wt(x) = m/2
  • Bob: input y ∈R {0,1}^m with wt(y) = εm
  • s(x), s(y): any streams with characteristic vectors x, y
  • PROMISE: either (1) wt(x ∧ y) = 0, in which case f(x,y) = 0, OR (2) wt(x ∧ y) = εm, in which case f(x,y) = 1
  • Case (1): F0(s(x) ∘ s(y)) = m/2 + εm; case (2): F0(s(x) ∘ s(y)) = m/2
  • R1/3(f) = Ω(1/ε) [Bar-Yossef] (uses shatter coeffs)
  • (1 + ε′)·m/2 < (1 − ε′)·(m/2 + εm) for ε′ = Θ(ε), so the two cases are separated
  • Hence, deciding f reduces to F0, so an F0 alg. uses Ω(1/ε) space
  • Too easy! Can replace the F0 alg. with a sampler!

15
Our Reduction: Hamming Distance Decision Problem (HDDP)

Set t = Θ(1/ε²). Alice gets x ∈ {0,1}^t, Bob gets y ∈ {0,1}^t.
Promise Problem: either
  Δ(x,y) ≤ t/2 − Θ(t^(1/2)), in which case f(x,y) = 0, OR
  Δ(x,y) > t/2, in which case f(x,y) = 1
  • Lower bound R1/3(f) via SC(fX, t), but we need a lemma

16
Main Lemma
  • ∃ S ⊆ {0,1}^n with |S| = n s.t. there exist 2^Ω(n) good sets T ⊆ S, where T is good if
  • ∃ y ∈ {0,1}^n s.t.
  • ∀ t ∈ T, Δ(y, t) ≤ n/2 − cn^(1/2) for some c > 0
  • ∀ t ∈ S − T, Δ(y, t) > n/2

17
Lemma Resolves HDDP Complexity
  • Theorem: R1/3(f) = Ω(t) = Ω(ε⁻²).
  • Proof:
  • Alice gets y_T for a random good set T, applying the main lemma with n = t.
  • Bob gets a random s ∈ S
  • Let f_{y_T}: S → {0,1} and consider the family {f_{y_T} : T good}.
  • Main Lemma ⇒ SC(f) = 2^Ω(t)
  • [BJKS] ⇒ R1/3(f) = Ω(t) = Ω(ε⁻²)
  • Corollary: Ω(1/ε²) space for any randomized 2-party protocol approximating Δ(x,y) between the inputs
  • First known lower bound in terms of ε!

18
Back to Frequency Moments

Use an ε-approximator for Fk to solve HDDP:
Alice has y ∈ {0,1}^t, Bob has s ∈ S ⊆ {0,1}^t.
The i-th universe element is included exactly once in stream a_y iff y_i = 1 (a_s is built the same way from s).
Alice runs the Fk alg. on a_y and sends its state to Bob, who continues it on a_s.
19
Solving HDDP with Fk
  • Alice/Bob compute an ε-approx to Fk(a_y ∘ a_s)
  • Fk(a_y ∘ a_s) = 2^k · wt(y ∧ s) + 1^k · Δ(y,s)
  • For k ≠ 1, the two promise cases yield Fk values that an ε-approximation separates
  • Alice also transmits wt(y) in log m space.

Conclusion: ε-approximating Fk(a_y ∘ a_s) decides HDDP, so the space for Fk is Ω(t) = Ω(ε⁻²)
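The identity Fk(a_y ∘ a_s) = 2^k·wt(y ∧ s) + Δ(y,s) just counts contributions: coordinates with y_i = s_i = 1 appear in both streams (frequency 2), while coordinates where y and s differ appear once. A quick exhaustive check in Python (`fk` and `identity_holds` are hypothetical helper names):

```python
from itertools import product
from collections import Counter

def fk(stream, k):
    counts = Counter(stream)
    return len(counts) if k == 0 else sum(f ** k for f in counts.values())

def identity_holds(y, s, k):
    a_y = [i for i, bit in enumerate(y) if bit]          # element i once iff y_i = 1
    a_s = [i for i, bit in enumerate(s) if bit]
    wt_and = sum(1 for b, c in zip(y, s) if b and c)     # wt(y AND s)
    delta = sum(1 for b, c in zip(y, s) if b != c)       # Hamming distance
    return fk(a_y + a_s, k) == 2 ** k * wt_and + delta

# exhaustive check over all pairs y, s in {0,1}^4 and several k
assert all(identity_holds(y, s, k)
           for y, s in product(product([0, 1], repeat=4), repeat=2)
           for k in (0, 2, 3))
```

For k = 1 the right-hand side collapses to the stream length, which is why F1 carries no Hamming-distance information.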
20
Back to the Main Lemma
  • Recall: show ∃ S ⊆ {0,1}^n with |S| = n s.t. there are 2^Ω(n) good sets T ⊆ S, i.e. sets T with
  • ∃ y ∈ {0,1}^n s.t.
  • 1. ∀ t ∈ T, Δ(y, t) ≤ n/2 − cn^(1/2) for some c > 0
  • 2. ∀ t ∈ S − T, Δ(y, t) > n/2
  • Probabilistic Method:
  • Choose n random elements of {0,1}^n for S
  • Show an arbitrary T ⊆ S of size n/2 is good with probability > 2^(−zn) for a constant z < 1.
  • Expected # of good T is then 2^Ω(n)
  • So there exists an S with 2^Ω(n) good T

21
Proving the Main Lemma
  • Let T = {t1, …, t_{n/2}} ⊆ S be arbitrary
  • Let y be the majority codeword of T
  • What is the probability p that both
  • 1. ∀ t ∈ T, Δ(y, t) ≤ n/2 − cn^(1/2) for some c > 0
  • 2. ∀ t ∈ S − T, Δ(y, t) > n/2 ?
  • Put x = Pr[∀ t ∈ T, Δ(y,t) ≤ n/2 − cn^(1/2)]
  • Put y = Pr[∀ t ∈ S − T, Δ(y,t) > n/2] ≈ 2^(−n/2)
  • Independence ⇒ p = xy ≈ x · 2^(−n/2)
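The majority codeword is a natural center for T: in every column it agrees with at least half the words, so the average distance from y to T is at most n/2. A small Python check of that fact (`majority_word` and `hamming` are hypothetical helper names):

```python
import random

def majority_word(T):
    """y_j = 1 iff at least half the words in T have a 1 in column j."""
    n = len(T[0])
    return [1 if 2 * sum(t[j] for t in T) >= len(T) else 0 for j in range(n)]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

rng = random.Random(0)
n = 32
T = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n // 2)]
y = majority_word(T)
avg = sum(hamming(y, t) for t in T) / len(T)
assert avg <= n / 2   # per column, y disagrees with at most half of T
```

The lemma needs more than this average statement: every t ∈ T must be cn^(1/2) closer than n/2, which is what the matrix problem below quantifies.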

22
The Matrix Problem
  • Wlog, assume y = 1^n (recall y is the majority word)
  • Want a lower bound on Pr[∀ t ∈ T, Δ(y,t) ≤ n/2 − cn^(1/2)]
  • Equivalent matrix problem:

Stack t1, …, t_{n/2} as the rows of an n/2 × n binary matrix M. For a random such M in which every column has majority 1, what is the probability that every row has ≥ n/2 + cn^(1/2) 1s?
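For intuition, this conditional probability can be estimated by rejection sampling at toy sizes. A sketch under assumed parameters (n = 6, c = 0.5, hypothetical helper `matrix_prob`); the slide's bound is asymptotic, so this is illustration only:

```python
import random

def matrix_prob(n=6, c=0.5, trials=20000, seed=0):
    """Among random n/2 x n binary matrices whose columns all have a strict
    majority of 1s, estimate the fraction whose rows all have at least
    n/2 + c*sqrt(n) ones."""
    rng = random.Random(seed)
    rows, thresh = n // 2, n / 2 + c * n ** 0.5
    conditioned = successes = 0
    for _ in range(trials):
        M = [[rng.randint(0, 1) for _ in range(n)] for _ in range(rows)]
        if all(2 * sum(col) > rows for col in zip(*M)):   # column majority 1
            conditioned += 1
            if all(sum(row) >= thresh for row in M):
                successes += 1
    return successes / conditioned if conditioned else 0.0
```

Rejection sampling is exact but wasteful here, which mirrors the analytic difficulty: the conditioning event C itself has probability exponentially small in n.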
23
A First Attempt
  • A set family A ⊆ 2^({0,1}^n) is monotone increasing if
  • S1 ∈ A and S1 ⊆ S2 ⇒ S2 ∈ A
  • For the uniform distribution on S ⊆ {0,1}^n and A, B monotone increasing families [Kleitman]:
  • Pr[A ∩ B] ≥ Pr[A] Pr[B]
  • First try:
  • Let R be the event that M has ≥ n/2 + cn^(1/2) 1s in each row, and C the event that M has majority 1 in each column
  • Pr[∀ t ∈ T, Δ(y,t) ≤ n/2 − cn^(1/2)] = Pr[R | C] = Pr[R ∩ C]/Pr[C]
  • M is the characteristic vector of a subset of [n²/2] ⇒ R, C are monotone increasing
  • ⇒ Pr[R ∩ C]/Pr[C] ≥ Pr[R]Pr[C]/Pr[C] = Pr[R], but Pr[R] < 2^(−n/2)
  • We need ≥ 2^(−zn/2) for a constant z < 1, so this bound is too weak

24
A Second Attempt
  • Second try:
  • R1 = event that M has ≥ n/2 + cn^(1/2) 1s in the first m rows
  • R2 = event that M has ≥ n/2 + cn^(1/2) 1s in the remaining n/2 − m rows
  • C = event that M has majority 1 in each column
  • Pr[∀ t ∈ T, Δ(y,t) ≤ n/2 − cn^(1/2)] = Pr[R1 ∩ R2 | C]

  • = Pr[R1 ∩ R2 ∩ C]/Pr[C]
  • R1, R2, C are monotone increasing
  • ⇒ Pr[R1 ∩ R2 ∩ C]/Pr[C] ≥ Pr[R1 ∩ C] Pr[R2]/Pr[C]

  • = Pr[R1 | C] Pr[R2]
  • Want this to be at least 2^(−zn/2) for z < 1
  • Pr[Σ Xi ≥ n/2 + cn^(1/2)] ≥ ½ − c(2/π)^(1/2) [Stirling]
  • Independence ⇒ Pr[R2] ≥ (½ − c(2/π)^(1/2))^(n/2 − m)
  • It remains to show Pr[R1 | C] is large.

25
Computing Pr[R1 | C]
  • Pr[R1 | C] = Pr[M has ≥ n/2 + cn^(1/2) 1s in the 1st m rows | C]
  • Show Pr[R1 | C] ≥ 2^(−zm) for a certain constant z < 1
  • Ingredients:
  • Given C, expect n/2 + Θ(n^(1/2)) 1s in each of the 1st m rows
  • Use negative correlation of the entries in a given row ⇒
  • show n/2 + Θ(n^(1/2)) 1s in a given row w/ good probability for small enough c
  • A simple worst-case conditioning argument on these 1st m rows then shows they all have ≥ n/2 + cn^(1/2) 1s

26
Completing the Proof
  • Recall: what is the probability p = xy, where
  • 1. x = Pr[∀ t ∈ T, Δ(y, t) ≤ n/2 − cn^(1/2)]
  • 2. y = Pr[∀ t ∈ S − T, Δ(y,t) > n/2] ≈ 2^(−n/2)
  • R1 = M has ≥ n/2 + cn^(1/2) 1s in the first m rows
  • R2 = M has ≥ n/2 + cn^(1/2) 1s in the remaining n/2 − m rows
  • C = M has majority 1 in each column
  • x ≥ Pr[R1 | C] Pr[R2] ≥ 2^(−zm) (½ − c(2/π)^(1/2))^(n/2 − m)
  • The analysis shows z is small enough that this is ≥ 2^(−zn/2) for a constant z < 1
  • Hence p = xy ≥ 2^(−(z+1)n/2)
  • Hence the expected # of good sets is ≥ 2^(n − O(log n)) p = 2^Ω(n)
  • So there exists an S with 2^Ω(n) good T

27
Bipartite Graphs
  • Matrix Problem ⇔ Bipartite Graph Counting Problem

  • How many bipartite graphs exist on n/2 by n vertices s.t. each left vertex has degree ≥ n/2 + cn^(1/2) and each right vertex has degree > n/2?

28
Our Result on # of Bipartite Graphs
  • Bipartite graph count:
  • Our argument shows there are at least 2^(n²/2 − zn/2 − n) such bipartite graphs for a constant z < 1.
  • The main lemma shows the # of bipartite graphs on n × n vertices with each vertex of degree > n/2 is ≥ 2^(n² − zn − n)
  • Can replace ≥ with ≤
  • Previously known count: ≥ 2^(n² − 2n)
  • [MW, personal comm.]
  • Follows easily from the Kleitman inequality
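At toy sizes the n × n count can be checked by brute force (exponential in n², so illustration only; `count_high_degree_bipartite` is a hypothetical helper name):

```python
from itertools import product

def count_high_degree_bipartite(n):
    """Count n x n 0/1 adjacency matrices of bipartite graphs in which
    every left and right vertex has degree > n/2 (brute force, tiny n)."""
    count = 0
    for bits in product([0, 1], repeat=n * n):
        M = [bits[i * n:(i + 1) * n] for i in range(n)]
        if all(2 * sum(row) > n for row in M) and \
           all(2 * sum(col) > n for col in zip(*M)):
            count += 1
    return count

print(count_high_degree_bipartite(2))   # only K_{2,2} qualifies -> 1
```

For n = 3 this returns 34, consistent with the previously known lower bound 2^(n² − 2n) = 8; the asymptotic statements above concern large n.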

29
Summary
  • Results:
  • Optimal Fk Lower Bound: ∀ k ≠ 1 and any ε = Ω(m^(−1/2)), any ε-approximator for Fk must use Ω(ε⁻²) bits of space.
  • Communication Lower Bound of Ω(ε⁻²) for the one-way communication complexity of (ε, δ)-approximating Δ(x, y)
  • Bipartite Graph Count: the # of bipartite graphs on n × n vertices with each vertex of degree > n/2 is at least 2^(n² − zn − n) for a constant z < 1.