Tight Lower Bounds for the Distinct Elements Problem

Provided by: davidwo1
Learn more at: http://web.mit.edu

1
Tight Lower Bounds for the Distinct Elements Problem
  • David Woodruff
  • MIT
  • dpwood_at_mit.edu
  • Joint work with Piotr Indyk

2
The Problem
  • Stream of elements $a_1, \ldots, a_n$, each in $\{1, \ldots, m\}$
  • Want $F_0$ = # of distinct elements
  • Elements in adversarial order
  • Algorithms given one pass over stream
  • Goal: minimum-space algorithm

3
A Trivial Algorithm

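Example: stream 0, 1, 1, 3, 7, 3, 4 takes $v$ from 00000000 to 10011011 (bit $j$ counted from the right, starting at 0)
  • Keep $m$-bit characteristic vector $v$ of stream
  • $j$ in stream $\Rightarrow v_j = 1$
  • $F_0 = wt(10011011) = 5$
  • Space = $m$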
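
The trivial algorithm above can be sketched in a few lines of Python. This is only an illustration: the function name `distinct_exact` and the choice to number the universe from 0 (to match the example stream) are assumptions, not part of the slides.

```python
def distinct_exact(stream, m):
    """Count distinct elements of a stream over {0, ..., m-1} using m bits."""
    v = 0  # m-bit characteristic vector packed into one Python int
    for j in stream:
        if not 0 <= j < m:
            raise ValueError(f"element {j} outside universe of size {m}")
        v |= 1 << j  # set v_j = 1 when j appears
    # F0 is the Hamming weight of v; also return v as an m-character bit string
    return bin(v).count("1"), format(v, f"0{m}b")


f0, vector = distinct_exact([0, 1, 1, 3, 7, 3, 4], m=8)
print(f0, vector)  # prints: 5 10011011
```

The space is exactly the $m$ bits of `v`, which is what the lower bounds on the next slide say cannot be avoided for exact or deterministic algorithms.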

Can we do better?
4
Negative Results
  • Any algorithm computing $F_0$ exactly must use $\Omega(m)$ space [AMS96]
  • Any deterministic alg. that outputs $x$ with $|F_0 - x| < \epsilon F_0$ must use $\Omega(m)$ space [AMS96]
  • What about randomized approximation algorithms?

5
Rand. Approx. Algorithms for $F_0$
  • $O(\log\log m/\epsilon^2 + \log m \log 1/\epsilon)$ alg. outputs $x$ with
  • $\Pr[|F_0 - x| < \epsilon F_0] > 3/4$ [BJKST02]
  • Lots of hashing tricks
  • Is this optimal?
  • Previous lower bounds:
  • $\Omega(\log m)$ [AMS96]
  • $\Omega(1/\epsilon)$ [Bar-Yossef]
  • Open problem of [BJKST02]: GAP $1/\epsilon \ll 1/\epsilon^2$

6
Idea Behind Lower Bounds
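[Figure: Alice holds $x \in \{0,1\}^m$ and runs the $(1 \pm \epsilon)$ $F_0$ algorithm $A$ on stream $s(x)$; she sends $S$, the internal state of $A$, to Bob, who holds $y \in \{0,1\}^m$ and continues $A$ on stream $s(y)$.]
  • Compute $(1 \pm \epsilon) F_0(s(x) \circ s(y))$ w.p. $> 3/4$
  • Idea: if can decide $f(x,y)$ w.p. $> 3/4$, space used by $A$ is at least $f$'s rand. 1-way comm. complexity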
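
A minimal sketch of this simulation argument, assuming a stand-in algorithm: the class `ExactF0` below is hypothetical and stores the whole set of seen elements, so it only illustrates how the message is formed, not a small-space sketch. The names `stream_of` and `one_way_protocol` are likewise illustrative.

```python
import json


class ExactF0:
    """Stand-in streaming algorithm A (exact, so its state is large)."""

    def __init__(self, seen=()):
        self.seen = set(seen)

    def process(self, stream):
        for a in stream:
            self.seen.add(a)

    def estimate(self):
        return len(self.seen)


def stream_of(bits):
    """One stream whose characteristic vector is `bits`."""
    return [i for i, b in enumerate(bits) if b]


def one_way_protocol(x, y):
    # Alice runs A on s(x); her single message is A's internal state S.
    alice = ExactF0()
    alice.process(stream_of(x))
    message = json.dumps(sorted(alice.seen)).encode()
    # Bob restores A from S, continues on s(y), and reads off F0.
    bob = ExactF0(json.loads(message))
    bob.process(stream_of(y))
    return bob.estimate(), len(message)


f0, message_bytes = one_way_protocol([1, 1, 0, 0], [0, 1, 1, 0])
print(f0, message_bytes)
```

The message length equals the size of the serialized state, which is why a lower bound on 1-way communication for $f$ turns into a space lower bound for $A$.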

7
Randomized 1-way comm. complexity
  • Boolean function $f: X \times Y \to \{0,1\}$
  • Alice has $x \in X$, Bob $y \in Y$. Bob wants $f(x,y)$
  • Only 1 message sent: must be from Alice to Bob
  • Comm. cost of protocol = expected length of longest message sent, over all inputs.
  • $\delta$-error randomized 1-way comm. complexity of $f$, $R_\delta(f)$, is comm. cost of optimal protocol computing $f$ w.p. at least $1-\delta$
  • How do we lower bound $R_\delta(f)$?

8
The VC Dimension [KNR]
  • $F = \{f: X \to \{0,1\}\}$ family of Boolean functions
  • $f \in F$ is length-$|X|$ bit string
  • For $S \subseteq X$, shatter coefficient $SC(f_S)$ of $S$ is $|\{f|_S : f \in F\}|$ = # distinct bit strings when $F$ restricted to $S$
  • $SC(F, p) = \max_{S \subseteq X,\, |S| = p} SC(f_S)$
  • If $SC(f_S) = 2^{|S|}$, $S$ shattered by $F$
  • VC Dimension of $F$, $VCD(F)$ = size of largest $S$ shattered by $F$
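
These definitions can be checked by brute force on a tiny family. The sketch below assumes each function is given as a bit tuple of length $|X|$; the function names and the threshold family are illustrative choices, not taken from the slides.

```python
from itertools import combinations


def shatter_coefficient(family, S):
    """Number of distinct bit strings when the family is restricted to S."""
    return len({tuple(f[i] for i in S) for f in family})


def sc(family, n, p):
    """SC(F, p): maximum shatter coefficient over subsets S of size p."""
    return max(shatter_coefficient(family, S) for S in combinations(range(n), p))


def vc_dimension(family, n):
    """Size of the largest S whose shatter coefficient is 2^|S|."""
    best = 0
    for p in range(1, n + 1):
        if sc(family, n, p) == 2 ** p:
            best = p
    return best


# Hypothetical example: thresholds on X = {0, 1, 2, 3}, f_k(i) = 1 iff i >= k.
thresholds = [tuple(int(i >= k) for i in range(4)) for k in range(5)]
print(vc_dimension(thresholds, 4), sc(thresholds, 4, 2))  # prints: 1 3
```

Thresholds never produce the pattern (1, 0) on an ordered pair, so no set of size 2 is shattered even though $SC(F, 2) = 3$.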

9
Shatter Coefficient Theorem
  • Notation: For $f: X \times Y \to \{0,1\}$, define
  • $f_X = \{\, f_x(y): Y \to \{0,1\} \mid x \in X \,\}$,
  • where $f_x(y) = f(x,y)$
  • Theorem [BJKS]: For every $f: X \times Y \to \{0,1\}$, every $p \ge VCD(f_X)$,
  • $R_{1/4}(f) = \Omega(\log(SC(f_X, p)))$

10
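The $\Omega(1/\epsilon)$ Lower Bound [Bar-Yossef]
  • Alice has $x \in_R \{0,1\}^m$, $wt(x) = m/2$
  • Bob has $y \in_R \{0,1\}^m$, $wt(y) = \epsilon m$, and
  • Either $wt(x \wedge y) = 0$ (then $f(x,y) = 0$) OR $wt(x \wedge y) = \epsilon m$ (then $f(x,y) = 1$)
  • $R_{1/4}(f) = \Omega(VCD(f_X)) = \Omega(1/\epsilon)$ [Bar-Yossef]
  • $s(x)$, $s(y)$ any streams w/ char. vectors $x$, $y$
  • $f(x,y) = 1 \Rightarrow F_0(s(x) \circ s(y)) = m/2$
  • $f(x,y) = 0 \Rightarrow F_0(s(x) \circ s(y)) = m/2 + \epsilon m$
  • $(1+\epsilon')m/2 < (1-\epsilon')(m/2 + \epsilon m)$ for $\epsilon' = \Theta(\epsilon)$
  • Hence, can decide $f \Rightarrow F_0$ alg. uses $\Omega(1/\epsilon)$ space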
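
A small numeric check of the two cases and of the gap condition, as a sketch. Rearranging the inequality shows it holds exactly when $\epsilon' < \epsilon/(1+\epsilon)$. The sizes and the helper names `f0_of_union` and `gap_holds` are illustrative assumptions.

```python
def f0_of_union(x, y):
    """F0 of any streams with characteristic vectors x, y: weight of x OR y."""
    return sum(a | b for a, b in zip(x, y))


def gap_holds(m, eps, eps_prime):
    """Check (1 + eps') m/2 < (1 - eps') (m/2 + eps m)."""
    return (1 + eps_prime) * m / 2 < (1 - eps_prime) * (m / 2 + eps * m)


m, k = 20, 2                                  # illustrative: eps = k/m = 0.1
x = [1] * (m // 2) + [0] * (m // 2)           # wt(x) = m/2
y_inside = [1] * k + [0] * (m - k)            # wt(x AND y) = eps*m, so f = 1
y_outside = [0] * (m - k) + [1] * k           # wt(x AND y) = 0,     so f = 0

print(f0_of_union(x, y_inside), f0_of_union(x, y_outside))  # prints: 10 12
print(gap_holds(m, 0.1, 0.05), gap_holds(m, 0.1, 0.1))      # prints: True False
```

With $\epsilon' = \epsilon/2$ the two $F_0$ values are separated by the approximation guarantee, while $\epsilon' = \epsilon$ is already too coarse.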

11
Our Results
  • Remainder of talk: $\Omega(1/\epsilon^2)$ lower bound for $\epsilon = \Omega(m^{-1/(9+k)})$ for any $k > 0$.
  • $\Rightarrow$ $O(\log\log m/\epsilon^2 + \log m \log 1/\epsilon)$ upper bound almost optimal
  • IDEA: Reduce from protocol for computing dot product

12
The Promise Problem
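  • $t = \Theta(1/\epsilon^2)$, $Y$ = basis of unit vectors of $\mathbb{R}^t$

[Figure: Alice holds $x \in [0,1]^t$ with $\|x\| = 1$; Bob holds $y \in Y$.]
Promise problem $\Pi$: $\langle x,y\rangle = 0$, so $f(x,y) = 0$, OR $\langle x,y\rangle = 2/t^{1/2}$, so $f(x,y) = 1$
  • $X = \{x \in [0,1]^t : \|x\| = 1$ and $\exists\, y \in Y$ s.t. $(x,y) \in \Pi\}$
  • We lower bound $R_{1/4}(f)$ via $SC(f_X, t)$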

13
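Bounding $SC(f_X, t)$
  • Theorem: $SC(f_X, t/4) = 2^{\Omega(t)}$
  • Proof:
  • $\forall\, T \subset Y$ s.t. $|T| = t/4$, put $x_T = (2/t^{1/2}) \sum_{e \in T} e$
  • Define $X_1 \subset X$ as $X_1 = \{x_T : T \subset Y, |T| = t/4\}$
  • Claim: $\forall\, s \in \{0,1\}^t$ w/ $wt(s) = t/4$, $s \in$ truth tab. of $f_{X_1}$
  • Proof:
  • Let $s \in \{0,1\}^t$ with 1s in positions $i_1, \ldots, i_{t/4}$
  • Put $T = \{e_{i_1}, \ldots, e_{i_{t/4}}\}$. $\forall\, e \in T$, $\langle e, x_T\rangle = 2/t^{1/2} = 2\epsilon$
  • $\forall\, e \in Y - T$, $\langle e, x_T\rangle = 0$
  • There are $2^{\Omega(t)}$ such $s$.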
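
The construction in the proof can be enumerated for a small $t$. In this sketch the basis vectors are represented by their indices, and `x_T` and `truth_table` are illustrative helper names; $t = 8$ is an arbitrary small choice.

```python
import math
from itertools import combinations


def x_T(T, t):
    """x_T = (2 / sqrt(t)) * sum of unit vectors e_i for i in T."""
    scale = 2 / math.sqrt(t)
    return [scale if i in T else 0.0 for i in range(t)]


def truth_table(x):
    """Row of f for x: entry i is 1 iff <x, e_i> = 2/sqrt(t), i.e. x_i is nonzero."""
    return tuple(int(c > 0) for c in x)


t = 8
rows = {truth_table(x_T(set(T), t)) for T in combinations(range(t), t // 4)}
norm_sq = sum(c * c for c in x_T({0, 1}, t))

print(len(rows), math.comb(t, t // 4))
print(math.isclose(norm_sq, 1.0))
```

Every weight-$t/4$ bit string appears as a row, and there are $\binom{t}{t/4} = 2^{\Omega(t)}$ of them; each $x_T$ has unit norm because $(4/t)(t/4) = 1$.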

14
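Bounding $R_{1/4}(f)$
  • Corollary: $R_{1/4}(f) = \Omega(\log SC(f_X, t/4)) = \Omega(t) = \Omega(1/\epsilon^2)$
  • Reduction: we need protocol computing $f$ with communication = space used by any $(1 \pm \epsilon)$ $F_0$ approx. alg.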

15
Reduction
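  • Recall:
  • $\langle x,y\rangle = 0$ if $f(x,y) = 0$
  • $\langle x,y\rangle = 2/t^{1/2}$ if $f(x,y) = 1$
  • Goal: Reduce separation of $\langle x,y\rangle$ to separation of $F_0(s(x) \circ s(y))$ for streams $s(x)$, $s(y)$ Alice/Bob can derive from $x$, $y$
  • Use relation $\|y-x\|^2 = \|y\|^2 + \|x\|^2 - 2\langle x,y\rangle$
  • $f(x,y) = 0 \Rightarrow \|y-x\| = 2^{1/2}$
  • $f(x,y) = 1 \Rightarrow \|y-x\| < 2^{1/2}(1 - 1/t^{1/2}) = 2^{1/2}(1 - \Theta(\epsilon))$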
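
The two distances follow from the relation above with $\|x\| = \|y\| = 1$. The worked steps below are a reconstruction, using the elementary inequality $\sqrt{1-u} < 1 - u/2$ for $0 < u \le 1$ with $u = 2/\sqrt{t}$ (so it assumes $t \ge 4$):

```latex
\begin{align*}
\|y-x\|^2 &= \|y\|^2 + \|x\|^2 - 2\langle x,y\rangle = 2 - 2\langle x,y\rangle,\\
f(x,y)=0:\quad \|y-x\| &= \sqrt{2},\\
f(x,y)=1:\quad \|y-x\| &= \sqrt{2 - 4/\sqrt{t}} = \sqrt{2}\,\sqrt{1 - 2/\sqrt{t}}
  \;<\; \sqrt{2}\,\bigl(1 - 1/\sqrt{t}\bigr).
\end{align*}
```

Since $t = \Theta(1/\epsilon^2)$, the factor $1 - 1/\sqrt{t}$ is $1 - \Theta(\epsilon)$, which is the gap the rest of the reduction has to preserve.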

16
Overview of Reduction
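Inputs: Alice holds $x \in [0,1]^t$ with $\|x\| = 1$; Bob holds $y \in E$
1. Low-distortion embedding $\phi: l_2^t \to l_1^{\mathrm{poly}(t)}$, giving $\phi(x)$, $\phi(y)$
2. Rational Approximation
3. Scale rationals to integers $s$
4. Convert integer coords to unary to get new $\{0,1\}$ vectors, again written $x$, $y$
[Figure: streams $s(x)$, $s(y)$ are fed to the $F_0$ alg., with its state passed from Alice to Bob.]
$F_0(s(x) \circ s(y))$ can decide $f(x,y)$ w.p. 3/4
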
17
Embedding $l_2^t$ into $l_1^{\mathrm{poly}(t)}$
  • A $(1+\gamma)$-distortion embedding $\phi: l_2^t \to l_1^d$ is mapping s.t. $\forall\, p,q \in l_2^t$, $\|p-q\|_2 \le \|\phi(p)-\phi(q)\|_1 \le (1+\gamma)\|p-q\|_2$
  • Theorem [FLM77]: $\forall\, \gamma$ $\exists$ a $(1+\gamma)$-distortion embedding $\phi: l_2^t \to l_1^d$ with $d = O(t (\log 1/\gamma)/\gamma^2)$

18
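Embedding $l_2^t$ into $l_1^d$
[Figure: Alice holds $x \in [0,1]^t$ with $\|x\| = 1$, Bob holds $y \in E$; the low-distortion embedding $\phi: l_2^t \to l_1^d$ maps them to $\phi(x)$, $\phi(y)$.]
  • Using Theorem [FLM77], Alice/Bob get $\phi(x)$, $\phi(y) \in \mathbb{R}^d$ with $d = O(t (\log 1/\gamma)/\gamma^2)$
  • Distortion parameter $\gamma$ specified later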

19
Rational Approximation
  • $z = z(t): \mathbb{N} \to \mathbb{N}$; assume $z \ge d$
  • Approximate each coord. of output of embedding by
    integer multiple of 1/z

20
Scaling
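  • Alice (resp. Bob) multiplies each coord. of her (resp. his) rational approximation by $z$
  • Obtains integer vector $s(\cdot)$ of her (resp. his) approximation
  • Claim: coords. are integers in range $[-2z, 2z]$
  • Proof:
  • Approximation has norm at most $\|\phi(\cdot)\| + d/z \le 2$
  • Norm of $s(\cdot)$ is $z$ times that, so at most $2z$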
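
The rational approximation and scaling steps might look like the sketch below. Rounding to the nearest multiple of $1/z$ is an assumption (the slides only say "integer multiple of $1/z$"), and the input coordinates are made up for illustration.

```python
from fractions import Fraction


def rational_approx(v, z):
    """Round each coordinate to the nearest integer multiple of 1/z (rule assumed)."""
    return [Fraction(round(c * z), z) for c in v]


def scale(v_approx, z):
    """Multiply by z; every coordinate becomes an integer."""
    scaled = [c * z for c in v_approx]
    assert all(c.denominator == 1 for c in scaled)
    return [int(c) for c in scaled]


z = 10
phi_x = [0.3, -0.74, 1.26]          # hypothetical embedded coordinates
approx = rational_approx(phi_x, z)  # multiples of 1/10
s = scale(approx, z)
print(approx, s)
```

Each coordinate moves by at most $1/(2z)$ under this rounding rule, so the total $l_1$ error over $d$ coordinates is at most $d/z$, which is where the assumption relating $z$ and $d$ enters.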

21
Converting to Unary
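  • For $i = 1$ to $d$:
  • $j \leftarrow$ $i$-th coord. of Alice's integer vector $s(\cdot)$
  • Replace that coord. with the $4z$-bit string $1^{2z+j}0^{2z-j}$
  • Bob does same for his $s(\cdot)$
  • $x$, $y$ denote new length-$4dz$ bitstrings
  • $wt(x) = 2zd$ + sum of coords. of Alice's $s(\cdot)$; likewise $wt(y)$ for Bob's
  • $\Delta(x,y)$ = $l_1$ distance between Alice's and Bob's $s(\cdot)$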
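
A short sketch of the unary conversion, checking that Hamming distance between the bitstrings equals $l_1$ distance between the integer vectors. The vectors `s_alice` and `s_bob` are hypothetical inputs, and the helper names are illustrative.

```python
def to_unary(s, z):
    """Encode each integer coordinate j in [-2z, 2z] as 1^(2z+j) 0^(2z-j)."""
    bits = []
    for j in s:
        if not -2 * z <= j <= 2 * z:
            raise ValueError(f"coordinate {j} outside [-2z, 2z] for z={z}")
        bits.extend([1] * (2 * z + j) + [0] * (2 * z - j))
    return bits


def hamming(x, y):
    """Number of positions where x and y differ."""
    return sum(a != b for a, b in zip(x, y))


z = 3
s_alice, s_bob = [4, -2, 0], [1, -6, 5]   # hypothetical integer vectors, d = 3
x, y = to_unary(s_alice, z), to_unary(s_bob, z)
l1 = sum(abs(a - b) for a, b in zip(s_alice, s_bob))

print(len(x), hamming(x, y), l1)  # prints: 36 12 12
```

Two blocks $1^{2z+j}0^{2z-j}$ and $1^{2z+j'}0^{2z-j'}$ differ in exactly $|j - j'|$ positions, which is why the distances agree coordinate by coordinate.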

22
Reducing $\Delta(x,y)$ to $F_0$
  • Alice (Bob) chooses stream $a_x$ ($a_y$) with char. vector $x$ ($y$).
  • Lemma: If $\alpha_1 < wt(x), wt(y) < \alpha_2$, then
  • $\alpha_1 + \Delta(x,y)/2 < F_0(a_x \circ a_y) < \alpha_2 + \Delta(x,y)/2$
  • Follows from fact $F_0(a_x \circ a_y) = wt(x \vee y)$
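
The lemma rests on the identity $wt(x \vee y) = (wt(x) + wt(y) + \Delta(x,y))/2$. The sketch below checks that identity and the resulting sandwich on random bit vectors; the function names and the random test data are illustrative assumptions.

```python
import random


def f0_concat(x, y):
    """F0 of a_x followed by a_y, for char. vectors x, y: weight of x OR y."""
    return sum(a | b for a, b in zip(x, y))


def check_lemma(x, y):
    wx, wy = sum(x), sum(y)
    delta = sum(a != b for a, b in zip(x, y))
    f0 = f0_concat(x, y)
    # Identity behind the lemma: 2 wt(x OR y) = wt(x) + wt(y) + Delta(x, y).
    identity = 2 * f0 == wx + wy + delta
    # Sandwich with the tightest weight bounds min(wx, wy) and max(wx, wy).
    sandwich = min(wx, wy) + delta / 2 <= f0 <= max(wx, wy) + delta / 2
    return identity and sandwich


rng = random.Random(0)
trials = [([rng.randint(0, 1) for _ in range(16)],
           [rng.randint(0, 1) for _ in range(16)]) for _ in range(200)]
print(all(check_lemma(x, y) for x, y in trials))  # prints: True
```

Strict bounds $\alpha_1 < wt(x), wt(y) < \alpha_2$ then give the strict inequalities stated in the lemma.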

23
Reducing $\Delta(x,y)$ to $F_0$
  • Use lemma to show
  • Set distortion $\gamma = \Theta(\epsilon)$, $z = \Theta(1/\epsilon^5 \log 1/\epsilon)$ so that two cases distinguished by $(1 \pm \Theta(\epsilon))$ $F_0$ alg

24
Conclusions
  • $a_x$, $a_y$ must be in universe of size $4zd = \Theta(\log(1/\epsilon)/\epsilon^9)$
  • Reduction only valid if $4zd \le m$
  • $\Omega(1/\epsilon^2)$ bound for $\epsilon = \Omega(m^{-1/(9+k)})$ $\forall\, k > 0$.
  • Recently: lower bound improved to
  • $\Omega(1/\epsilon^2)$ for $\epsilon \ge m^{-1/2}$, which is optimal
  • Find set of vectors directly in Hamming space via involved prob. method argument