Streaming Algorithms for Geometric Problems

1
Streaming Algorithms for Geometric Problems
  • Piotr Indyk
  • MIT

2
Data Streams
  • A data stream is a (massive) sequence of data
    • Too large to store (on disk, memory, cache, etc.)
  • Examples:
    • Network traffic (source/destination)
    • Sensor networks
    • Satellite data feed, etc.
  • Approaches:
    • Ignore it
    • Develop algorithms for dealing with such data

3
Talk Overview
  • Computational model
  • Example problems
  • (Short) history of streaming algorithms
  • Streaming algorithms for geometric problems
    • Insertions only
    • Insertions and deletions
  • Open problems

4
Computational Model
  • Single pass over the data: $e_1, e_2, \ldots, e_n$
  • Bounded storage
  • Fast processing time per element

5
Related Models
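  • External Memory
    • Bounded Storage
    • Data Stored on Disk
    • Random Access to Blocks of Data
  • Compact Representations of Data and Communication Complexity
  • Read-Once Branching Programs
[Figure: Memory and Disk (external memory); Alice holding x and Bob holding y, computing F(x,y) (communication complexity); a decision node labeled "e11 ?" with Y/N branches (branching program)]
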
6
Classic Examples
  • Compute the number of distinct elements
    • Exactly: $\Omega(n)$ bits of space
    • $(1 \pm \epsilon)$-approximation: $O(1/\epsilon^2 \log n)$ bits
      [Flajolet-Martin, JCSS85], ...
  • Compute the median
    • Exactly: $\Omega(n)$
    • $(50 \pm \epsilon)$-approximation: $O(1/\epsilon \ \mathrm{polylog}\ n)$
      [Paterson-Munro, TCS80], ...

7
Brief History of Streaming Algorithms
  • Ancient times: [MP80, FM85, Morris, ...]
  • Middle Ages
  • Renaissance: [Alon-Matias-Szegedy, STOC96]
    • Theory
    • DB (Aqua project in Bell Labs)
    • Networking
  • Streaming became mainstream ?

8
Theoretical History
  • Vector problems
    • Stream defines an array of numbers
    • Maintain stats of the array, e.g., median
  • Metric problems
    • Clustering
  • Graph problems, Text problems
  • Geometric Problems: this talk

9
Geometric Data Stream Algorithms as Data Structures
  • Data structures that support:
    • Insert(p) to P
    • Possibly Delete(p) from P
    • Compute(P)
  • Use space that is sub-linear in |P|

10
Insertions-only
11
Metric clustering problems
  • k-center: [Charikar-Chekuri-Feder-Motwani, STOC97]
  • k-median: [Guha-Mishra-Motwani-O'Callaghan, FOCS00],
    [Meyerson, FOCS01], [Charikar-O'Callaghan-Panigrahy, STOC03]
  • Bounds:
    • Poly(k, log n) space
    • O(1)-approximation

12
k-median/k-center
  • k is given
  • Goal: choose k medians/centers to minimize
    • k-median: the sum of the distances
    • k-center: the max distance
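
The two objectives can be written out as follows. This is a sketch in assumed notation that does not appear on the slide: $P$ is the input point set, $C$ is the chosen set of $k$ medians/centers, and $d(\cdot,\cdot)$ is the distance function.

```latex
\[
\text{k-median:}\quad \min_{|C| = k} \; \sum_{p \in P} \min_{c \in C} d(p, c)
\qquad\qquad
\text{k-center:}\quad \min_{|C| = k} \; \max_{p \in P} \min_{c \in C} d(p, c)
\]
```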

13
Geometric Problems
  • Diameter, Minimum Enclosing Ball:
    [Agarwal-Har-Peled, SODA01], [Feigenbaum-Kannan-Zhang02 (Algorithmica)],
    [Hershberger-Suri, PODS04]
  • k-center: [AHP, SODA01]
  • k-median: [Har-Peled-Mazumdar, STOC04]
  • Range searching via $\epsilon$-approximations
    • [Suri-Toth-Zhou, SoCG04]
    • [Bagchi-Chaudhary-Eppstein-Goodrich, SoCG04]

14
Dominant Approach: Merge and Reduce
  • Main ideas:
    • Design an (off-line) algorithm that computes a sketch of the input
      • Small size
      • Sufficient to solve the problem
    • A sketch of sketches is a sketch

15
Tree Computation
[Figure: tree computed over the input points p1, ..., p16]

16
Algorithm
  • Space: (sketch size) · log n
  • Time: sketch computation time
  • Question: Where do sketches come from?
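
A minimal sketch of the merge-and-reduce tree, assuming the stream is cut into fixed-size blocks and that one sketch is kept per tree level (like the digits of a binary counter). The names `MergeReduce`, `reduce_fn`, `block_size` and `extent_sketch` are illustrative and do not come from the talk; the toy sketch (min and max of 1D values) only stands in for a real sketch such as a core-set.

```python
from typing import Callable, List, Optional

Sketch = List[float]


class MergeReduce:
    """Merge-and-reduce over a stream, keeping at most one sketch per tree level."""

    def __init__(self, reduce_fn: Callable[[List[float]], Sketch], block_size: int):
        self.reduce_fn = reduce_fn
        self.block_size = block_size
        self.buffer: List[float] = []
        # levels[i] holds a sketch summarizing 2**i blocks, or None if empty.
        self.levels: List[Optional[Sketch]] = []

    def insert(self, item: float) -> None:
        self.buffer.append(item)
        if len(self.buffer) == self.block_size:
            self._carry(self.reduce_fn(self.buffer))
            self.buffer = []

    def _carry(self, sketch: Sketch) -> None:
        # Like incrementing a binary counter: two sketches at the same level
        # are merged and reduced into one sketch at the next level.
        level = 0
        while level < len(self.levels) and self.levels[level] is not None:
            sketch = self.reduce_fn(self.levels[level] + sketch)
            self.levels[level] = None
            level += 1
        if level == len(self.levels):
            self.levels.append(None)
        self.levels[level] = sketch

    def summary(self) -> Sketch:
        # "A sketch of sketches is a sketch": reduce everything that is stored.
        items = list(self.buffer)
        for sketch in self.levels:
            if sketch is not None:
                items.extend(sketch)
        return self.reduce_fn(items) if items else []


def extent_sketch(values: List[float]) -> Sketch:
    # Toy sketch (an assumption for illustration): min and max are enough
    # to answer the 1D diameter exactly.
    return [min(values), max(values)]


mr = MergeReduce(extent_sketch, block_size=8)
for i in range(1000):
    mr.insert((i * 37) % 1000)
low, high = mr.summary()
print(high - low)  # 1D diameter of the stream
```

At any time the structure stores one partial block plus at most one sketch per level, which is where the (sketch size) · log n space bound comes from.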

17
Idea 1: solution = sketch
  • Consider k-median
  • [GMMO00]: approximate k-median of approximate
    weighted k-medians is an approximate k-median
  • Result:
    • Constant depth tree
    • Space $k n^{\epsilon}$, $\epsilon > 0$
    • O(1)-approximation
    • Works for any metric space
[Figure: example with k = 3, points labeled 1, 2, 3]

18
Use the solution, ctd.
  • $\epsilon$-Approximations: find a subset $S \subset P$, such that
    for any rectangle/halfspace/etc. $R$,
    • $|R \cap S|/|S| = |R \cap P|/|P| \pm \epsilon$
  • [Matousek]: approximation of a union of
    approximations is an approximation
  • [BCEG04]: convert it into streaming algorithm,
    applications
    • $\approx 1/\epsilon^2$ space
  • [STZ04]: better/optimal bounds for rectangles
    and halfspaces

19
Idea 2: Core-Sets [AHP01]
  • Assume we want to minimize $C_P(o)$
  • $S \subset P$ is an $\epsilon$-core-set for $P$, if for any $o$, and a
    set $T$
    • $C_{P \cup T}(o) < (1+\epsilon)\, C_{S \cup T}(o)$
  • Note: this must hold for all $o$, not just the
    optimal one

20
Example: Core-set for MEB
  • Compute extremal points
    • Choose densely spaced directions $v_1, \ldots, v_k$
    • I.e., for any $u$ there is $v_i$ such that
      $u \cdot v_i \ge \|u\|_2 / (1+\epsilon)$
    • For each direction maintain extremal point
    • $k = O(1/\epsilon)^{(d-1)/2}$ suffice
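
A minimal 2D sketch of the extremal-point idea, applied here to diameter rather than MEB to keep it short. It assumes `k` evenly spaced unit directions with `k` even, so that every direction also has its opposite; the class name `DirectionalExtremes` and its methods are illustrative and not from the talk. With angular spacing $2\pi/k$, the estimate is at least $\cos(\pi/k)$ times the true diameter, which matches $k = O(1/\epsilon)^{(d-1)/2}$ for $d = 2$.

```python
import math
from typing import List, Optional, Set, Tuple

Point = Tuple[float, float]


class DirectionalExtremes:
    """Keeps, for each of k directions, the inserted point that is farthest along it."""

    def __init__(self, k: int):
        if k < 2 or k % 2 != 0:
            raise ValueError("k must be an even number >= 2")
        self.dirs: List[Point] = [
            (math.cos(2 * math.pi * i / k), math.sin(2 * math.pi * i / k))
            for i in range(k)
        ]
        # best[i] = (projection value, point) for direction i.
        self.best: List[Optional[Tuple[float, Point]]] = [None] * k

    def insert(self, p: Point) -> None:
        for i, (vx, vy) in enumerate(self.dirs):
            score = p[0] * vx + p[1] * vy
            if self.best[i] is None or score > self.best[i][0]:
                self.best[i] = (score, p)

    def coreset(self) -> Set[Point]:
        return {entry[1] for entry in self.best if entry is not None}

    def diameter_estimate(self) -> float:
        pts = list(self.coreset())
        return max((math.dist(a, b) for a in pts for b in pts), default=0.0)


de = DirectionalExtremes(k=16)
for i in range(300):
    de.insert((((i * 7919) % 101) / 10.0, ((i * 104729) % 103) / 10.0))
print(len(de.coreset()), de.diameter_estimate())
```

The stored set never exceeds `k` points regardless of the stream length, and each insertion costs `O(k)` time.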

21
Stream Algorithms via Core-sets
  • Diameter/MEB/width: $O(1/\epsilon)^{(d-1)/2} \log n$ space
    [AHP01]
  • k-center: $O(k/\epsilon^d) \log n$ [HP01]
  • k-median: $O(k/\epsilon^d) \log n$ [HPM04]
  • Faster algorithms and other results: [Chan,
    SoCG04], [Suri-Hershberger03]

22
Limitations
  • Small core-sets might not exist (see next slide)
  • Do not support deletions

23
Minimum Weight Bi-chromatic Matching
  • Estimate the cost of MWBM

24
Insertions and Deletions
25
Streaming Algorithms for Vector Problems
  • Norm estimation
    • Stream elements $(i, b)$, $i = 1 \ldots m$
    • Interpretation: $x_i = x_i + b$
    • Want to maintain $\|x\|_p$
  • Why? Examples:
    • $\|x\|_p^p = \sum_i |x_i|^p \to$ number of non-zero elements in $x$, as $p \to 0$

26
Dimensionality reduction
  • L2: Johnson-Lindenstrauss Lemma
    • $x$ is an $m$-dimensional vector
    • $A$ is a random $k \times m$ matrix, each entry
      independently drawn from e.g. Gaussian
      distribution, $k = O(\log N/\epsilon^2)$
    • Then with probability $1 - 1/N$
      • $\|x\|_2 \le \|Ax\|_2 \le (1+\epsilon)\|x\|_2$
  • $A$ can be pseudo-random [AMS96]
    • Using slightly different method for norm
      estimation

27
What it means
  • To know $\|x\|_2$, suffices to know $Ax$
  • Can maintain $Ax$ when the coordinates are
    incremented
    • $A(x + b e_i) = Ax + b A e_i$
[Figure: the sketch Ax as the product of the matrix A and the vector x]
  • Can maintain approximate L2-norm of x
  • Similar approach works for $p \in (0, 2]$
    [Indyk, FOCS00]
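
The update rule above can be sketched as follows. This is a toy under stated assumptions, not the construction from the cited papers: each column of $A$ is regenerated on demand from a seeded standard-library generator rather than from a limited-independence family, the class name `L2Sketch` is illustrative, and the estimate divides by $\sqrt{k}$ because the entries of $A$ are unscaled standard Gaussians.

```python
import math
import random
from typing import List


class L2Sketch:
    """Maintains y = A x under updates (i, b), for a k-row Gaussian matrix A."""

    def __init__(self, k: int, seed: int = 0):
        self.k = k
        self.seed = seed
        self.y: List[float] = [0.0] * k

    def _column(self, i: int) -> List[float]:
        # Column i of A, recomputed from a per-column seed so A is never stored.
        rng = random.Random(self.seed * 1_000_003 + i)
        return [rng.gauss(0.0, 1.0) for _ in range(self.k)]

    def update(self, i: int, b: float) -> None:
        # Linearity: A(x + b e_i) = Ax + b A e_i, and A e_i is column i.
        column = self._column(i)
        for j in range(self.k):
            self.y[j] += b * column[j]

    def norm_estimate(self) -> float:
        # E[||Ax||^2] = k ||x||^2 for i.i.d. standard Gaussian entries.
        return math.sqrt(sum(v * v for v in self.y) / self.k)


sketch = L2Sketch(k=400, seed=1)
sketch.update(7, 3.0)   # x_7 += 3
sketch.update(2, 4.0)   # x_2 += 4
sketch.update(7, -3.0)  # x_7 -= 3, a deletion
print(sketch.norm_estimate())  # estimate of ||x||_2, where the true value is 4
```

Because the sketch is linear, decrements are handled exactly like increments, which is what makes this approach suitable for streams with deletions.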

28
Histograms
  • View $x$ as a function $x: [1 \ldots n] \to [1 \ldots M]$
  • Approximate it using piecewise constant function
    h, with B pieces (buckets)
  • Problem can be formulated in 2D as well (buckets
    become rectangular tiles)

29
Results 1D
  • [Gilbert-Guha-Indyk-Kotidis-Muthukrishnan-Strauss, STOC02]
    • Maintains h with B pieces such that
      • $\|x - h\|_2 \le (1+\epsilon)\|x - h_{OPT}\|_2$
    • Under increments/decrements of x
    • Space: $\mathrm{poly}(B, 1/\epsilon, \log n)$
    • Time: $\mathrm{poly}(B, 1/\epsilon, \log n)$

30
Results 2D
  • [Thaper-Guha-Indyk-Koudas, SIGMOD02]
    • Maintains h with $B \log(nM)$ tiles such that
      • $\|x - h\|_2 \le (1+\epsilon)\|x - h_{OPT}\|_2$
    • Under increments/decrements of x
    • Space/Update time: $\mathrm{poly}(B, 1/\epsilon, \log n)$
    • Histogram reconstruction time: $\mathrm{poly}(B, 1/\epsilon, n)$
  • [Muthukrishnan-Strauss, FSTTCS03]
    • Maintains h with 4B tiles
    • Time: $\mathrm{poly}(B, 1/\epsilon, \log(nM))$

31
General Approach
  • Maintain sketches $Ax$ of $x$
  • This allows us to estimate the error of any given
    h, via $\|x - h\| \approx \|Ax - Ah\|$
  • Construct h
    • Enumeration
    • Greedy
    • Dynamic Programming

32
Minimum Weight Matching
  • Estimate the cost of MWM

33
Minimum Spanning Tree
  • Estimate the cost of MST

34
Facility Location
  • Goal: choose a set F of facilities to minimize
    • the sum of the distances to nearest facility, plus
    • the number of facilities times f
  • Again, report the cost

35
Approach
  • Assume $P \subset \{1 \ldots \Delta\}^2$
  • Reduce to vector problems:
    • Impose square grids $G_0 \ldots G_k$, with side lengths
      $2^0, 2^1, \ldots, 2^k$, shifted at random.
    • For each square cell $c$ in $G_i$, let $n_P(c)$ be the
      number of points from $P$ in $c$.
    • The algorithms will maintain certain statistics
      over $n_P(\cdot)$, which will allow them to approximately
      solve the problems
[Figure: grid cells labeled with their point counts]

36
Estimators
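  • MST: $\sum_i 2^i \sum_{c \in G_i} [n_P(c) > 0]$
  • MWM: $\sum_i 2^i \sum_{c \in G_i} [n_P(c) \text{ is odd}]$
  • MWBM: $\sum_i 2^i \sum_{c \in G_i} |n_G(c) - n_B(c)|$
  • Fac. Loc.: $\sum_i 2^i \sum_{c \in G_i} \min[n_P(c), T_i]$
  • k-median: $\sum_i 2^i \sum_{c \in G_i - B(Q, 2^i)} n_P(c)$
  • (const. factor)
  • Maintain number of non-zero entries in $n_P$ [FM85]
  • Maintain L1 difference [I00]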
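
A minimal sketch of the MST estimator from the list above, under a loud simplification: the cell counts $n_P(c)$ are stored exactly in dictionaries, whereas the streaming algorithm would keep only small-space summaries of them (for example, a distinct-count sketch for the number of non-zero cells). The class name `ShiftedGridCounts` is illustrative, points are assumed to have integer coordinates in $\{0, \ldots, \Delta - 1\}^2$, and only points that are currently present may be deleted.

```python
import random
from collections import Counter
from typing import List, Tuple

Point = Tuple[int, int]


class ShiftedGridCounts:
    """Exact per-cell counts n_P(c) on randomly shifted grids G_0, ..., G_k."""

    def __init__(self, delta: int, seed: int = 0):
        rng = random.Random(seed)
        # Smallest k with 2**k >= delta (at least 1).
        self.k = max(1, (delta - 1).bit_length())
        side = 2 ** self.k
        # One random shift shared by all levels keeps the grids nested.
        self.shift = (rng.randrange(side), rng.randrange(side))
        self.counts: List[Counter] = [Counter() for _ in range(self.k + 1)]

    def _cell(self, p: Point, i: int) -> Point:
        # Cell of side length 2**i containing the shifted point.
        return ((p[0] + self.shift[0]) >> i, (p[1] + self.shift[1]) >> i)

    def update(self, p: Point, sign: int = 1) -> None:
        # sign=+1 inserts p, sign=-1 deletes it.
        for i in range(self.k + 1):
            cell = self._cell(p, i)
            self.counts[i][cell] += sign
            if self.counts[i][cell] == 0:
                del self.counts[i][cell]

    def mst_estimator(self) -> int:
        # sum_i 2^i * (number of cells c in G_i with n_P(c) > 0)
        return sum(2 ** i * len(self.counts[i]) for i in range(self.k + 1))


grid = ShiftedGridCounts(delta=16, seed=3)
for p in [(3, 5), (12, 1), (4, 5)]:
    grid.update(p)
grid.update((12, 1), -1)  # deletions are handled the same way as insertions
print(grid.mst_estimator())
```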

37
Results [Indyk04]
  • Space: $(\log \Delta + \log n)^{O(1)}$
  • Follows from [Charikar, STOC02]; also
    [Agarwal-Varadarajan, SoCG04] and [Indyk-Thaper02]

38
Results: k-median
  • Space: $(k + \log \Delta + \log n)^{O(1)}$

39
Probabilistic embeddings into HSTs
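[Figure: tree T built over the grid cells, with cells labeled by point counts]
  • Known [Bartal, FOCS96], [Charikar-Chekuri-Goel-Guha-Plotkin, STOC98]:
    • $\|p - q\| \le D_{tree}(p, q)$
    • $E[D_{tree}(p, q)] \le \|p - q\| \cdot O(\log \Delta)$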

40
MST
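[Figure: grid cells labeled with their point counts]
  • $E[\mathrm{Cost}(\text{MST in } T)] \le O(\log \Delta) \cdot \mathrm{Cost}(\mathrm{MST})$
  • $\mathrm{Cost}(\text{MST in } T) \approx \mathrm{Cost}(T)$
  • How to compute Cost(T)?
    • Sum over all levels $i$, of the number of nodes at level $i$, times $2^i$
    • Node $c$ exists iff $n_i(c) > 0$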

41
Matching
  • Algorithm:
    • Match what you can at the current level
    • Odd leftovers wait for the next level
    • Repeat
  • Optimal on the HST
  • Cost: $\sum_i 2^i \sum_{c \in G_i} [n_P(c) \text{ is odd}]$
[Figure: grid cells labeled 0 or 1 according to the parity of their point counts]
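
A small sketch of why the parity estimator equals the cost of the level-by-level greedy matching, assuming nested grids given by one shared shift and charging $2^i$ for every point left unmatched at level $i$ (the convention of the formula above). The function names `greedy_hst_matching` and `parity_estimator`, and the helper `_cell`, are illustrative and not from the talk; points are assumed to have non-negative integer coordinates.

```python
from collections import defaultdict
from typing import List, Tuple

Point = Tuple[int, int]


def _cell(p: Point, i: int, shift: Point) -> Point:
    # Cell of side length 2**i containing the shifted point.
    return ((p[0] + shift[0]) >> i, (p[1] + shift[1]) >> i)


def greedy_hst_matching(points: List[Point], levels: int,
                        shift: Point = (0, 0)) -> Tuple[List[Tuple[Point, Point]], int]:
    """Match within cells level by level; leftovers move up one level."""
    unmatched = list(points)
    pairs: List[Tuple[Point, Point]] = []
    cost = 0
    for i in range(levels + 1):
        cells = defaultdict(list)
        for p in unmatched:
            cells[_cell(p, i, shift)].append(p)
        unmatched = []
        for members in cells.values():
            while len(members) >= 2:  # match what you can at this level
                pairs.append((members.pop(), members.pop()))
            unmatched.extend(members)  # zero or one odd leftover per cell
        cost += 2 ** i * len(unmatched)
    return pairs, cost


def parity_estimator(points: List[Point], levels: int,
                     shift: Point = (0, 0)) -> int:
    """sum_i 2^i * (number of cells in G_i holding an odd number of points)."""
    total = 0
    for i in range(levels + 1):
        counts = defaultdict(int)
        for p in points:
            counts[_cell(p, i, shift)] += 1
        total += 2 ** i * sum(1 for n in counts.values() if n % 2 == 1)
    return total


pts = [((i * 7) % 16, (i * 11) % 16) for i in range(9)]
pairs, cost = greedy_hst_matching(pts, levels=4, shift=(3, 5))
print(cost == parity_estimator(pts, levels=4, shift=(3, 5)))
```

The equality holds because a cell has a leftover after greedy matching exactly when the number of points in it is odd, so the estimator needs only the parities of $n_P(c)$ and never the matching itself.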

42
Conclusions
  • Algorithms for geometric data streams
    • Insertions-only: merge and reduce
    • Insertions and deletions: randomized linear
      embeddings

43
Open Problems
  • High dimensions
    • Diameter
      • $2^{1/2}$-approx, $O(d^2 n^{1/2})$ space, follows from
        [Goel-Indyk-Varadarajan, SODA01]
      • c-approx, $O(d\, n^{1/(c^2 - 1)})$ [Indyk, SODA03]
      • Conjecture: $\approx 2^{1/2}$-approx, $O(d\ \mathrm{polylog}\ n)$ space
    • Min-width cylinder: 18-approx, $O(d)$ space
      [Chan04]
    • Other problems?

44
Open Problems
  • Range queries
    • General lower bounds? (Not just for
      $\epsilon$-approximations)
    • $\Omega(1/\epsilon^2)$-bit bound for general queries follows
      from LB for dot product [Indyk-Woodruff, FOCS03],
      and is tight (for randomized algorithms)
    • What about e.g., half-space queries? $O(1/\epsilon^{4/3})$
      is known [STZ04]
    • Other problems [STZ04]

45
Open Problems
  • Matchings, Facility Location, etc.
    • Replace $\log \Delta$ by $O(1)$ or even $1+\epsilon$
    • Possible for MST [Frahling-Indyk-Sohler??]
    • Related to computing bi-chromatic matching
      [Agarwal-Varadarajan04]
  • Min-sum clustering?

46
Open Problems
  • Better core-sets
    • k-median: $1/\epsilon^d \to 1/\epsilon^{(d-1)/2}$? Possible for $d = 1$
      [Indyk]
    • k-center: $1/\epsilon^d \to 1/\epsilon^{(d-1)/2}$? Possible for $k = 1$
      (this is minimum enclosing ball)
  • Insertions and deletions?
    • k-median: $\mathrm{poly}(\log n + \log \Delta + k + 1/\epsilon)$ space/time,
      $(1+\epsilon)$-approximation?

47
The End. Thank you!