1
Heavy Hitters
  • Piotr Indyk
  • MIT

2
Last Few Lectures
  • Recap (last few lectures)
  • Update a vector x
  • Maintain a linear sketch
  • Can compute Lp norm of x
  • (in a zillion different ways)
  • Questions
  • Can we do anything else?
  • Can we do something about the linear space bound for
    $L_\infty$?

3
Heavy Hitters
  • Also called frequent elements and elephants
  • Define
  • $HH^p_\phi(x) = \{\, i : |x_i| \ge \phi \|x\|_p \,\}$
  • $L_p$ Heavy Hitter Problem
  • Parameters: $\phi$ and $\phi'$ (often $\phi' = \phi - \epsilon$)
  • Goal: return a set S of coordinates s.t.
  • S contains $HH^p_\phi(x)$
  • S is included in $HH^p_{\phi'}(x)$
  • $L_p$ Point Query Problem
  • Parameter: $\alpha$
  • Goal: at the end of the stream, given i, report
  • $x^*_i = x_i \pm \alpha \|x\|_p$
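
The heavy hitter definition and the two-threshold guarantee can be checked by brute force on a small vector. The sketch below is only an illustration: it stores x explicitly, which a streaming algorithm would not do, and the names `lp_norm`, `heavy_hitters` and `is_valid_answer` are made up for this example.

```python
def lp_norm(x, p):
    """L_p norm of a list of numbers (p >= 1)."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def heavy_hitters(x, p, phi):
    """Exact set of indices i with |x_i| >= phi * ||x||_p."""
    threshold = phi * lp_norm(x, p)
    return {i for i, v in enumerate(x) if abs(v) >= threshold}


def is_valid_answer(S, x, p, phi, phi_prime):
    """True if S contains every phi-heavy index and only phi_prime-heavy ones."""
    return heavy_hitters(x, p, phi) <= S <= heavy_hitters(x, p, phi_prime)


x = [10, 1, 1, 6, 1, 1]                  # ||x||_1 = 20
print(heavy_hitters(x, 1, 0.5))          # {0}
print(heavy_hitters(x, 1, 0.25))         # {0, 3}
print(is_valid_answer({0, 3}, x, 1, 0.5, 0.25))
```

Any set between the two exact sets is an acceptable answer, which is what gives a sketch-based algorithm room for estimation error.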

4
Which norm is better ?
  • Since $\|x\|_1 \ge \|x\|_2 \ge \dots \ge \|x\|_\infty$, we get that
    the higher $L_p$ norms are better
  • For example, for Zipfian distributions $x_i = 1/i^\beta$,
    we have
  • $\|x\|_2$: constant for $\beta > 1/2$
  • $\|x\|_1$: constant only for $\beta > 1$
  • However, estimating higher $L_p$ norms tends to
    require higher dependence on $\alpha$
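
The Zipfian example can be checked numerically. The snippet below is a small illustration, with beta = 0.75 chosen arbitrarily between 1/2 and 1: the L1 norm keeps growing with the dimension m, while the L2 norm levels off, so the largest coordinate stays heavy relative to the L2 norm but not relative to the L1 norm.

```python
def zipf_norm(m, beta, p):
    """L_p norm of the vector x_i = 1 / i**beta for i = 1..m."""
    return sum((1.0 / i ** beta) ** p for i in range(1, m + 1)) ** (1.0 / p)


beta = 0.75  # illustrative value with 1/2 < beta < 1
for m in (10**2, 10**3, 10**4, 10**5):
    l1 = zipf_norm(m, beta, 1)
    l2 = zipf_norm(m, beta, 2)
    # x_1 = 1 is the largest coordinate; 1/norm is the largest phi for which it is heavy
    print(m, round(l1, 2), round(l2, 3), round(1 / l1, 3), round(1 / l2, 3))
```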

5
A Few Facts
  • Fact 1: The size of $HH^p_\phi(x)$ is at most $1/\phi^p$ (that is, $1/\phi$ for $p = 1$)
  • Fact 2: Given an algorithm for the $L_p$ point query
    problem, with
  • parameter $\alpha$
  • probability of failure $< 1/(2m)$
  • one can obtain an algorithm for the $L_p$ heavy
    hitters problem with
  • parameters $\phi$ and $\phi' = \phi - 2\alpha$ (any $\phi$)
  • same space (plus output)
  • probability of failure $< 1/2$
  • Proof:
  • Compute all $x^*_i$ (note: this takes time O(m))
  • Report i such that $x^*_i \ge (\phi - \alpha)\|x\|_p$
6
Point query
  • We start from L2
  • A few observations
  • $x_i = x \cdot e_i$
  • For any u, v we have
  • $\|u - v\|_2^2 = \|u\|_2^2 + \|v\|_2^2 - 2\, u \cdot v$
  • Algorithm A [Gilbert-Kotidis-Muthukrishnan-Strauss'01]
  • Maintain a sketch Rx, with failure probability P
  • For simplicity assume we use JL sketches, so
  • $s = \|Rx\|_2 = (1 \pm \epsilon)\|x\|_2$
  • (other $L_2$ sketches work as well, just need
    more notation)
  • Estimator:
  • $Y = (1 - \|Rx/s - Re_i\|_2^2 / 2)\, s$
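
A small numerical sketch of this estimator follows. It assumes a dense Gaussian matrix as the JL sketch and stores R explicitly, which is an illustrative simplification (a real streaming implementation would generate R pseudorandomly); the function names and the toy stream are made up for this example.

```python
import math
import random


def make_sketch_matrix(rows, m, rng):
    """Dense Gaussian JL matrix R with independent N(0, 1/rows) entries."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(m)] for _ in range(rows)]


def l2_point_query(R, Rx, i):
    """Estimator Y = (1 - ||Rx/s - R e_i||^2 / 2) * s with s = ||Rx||_2."""
    s = math.sqrt(sum(v * v for v in Rx))
    if s == 0.0:
        return 0.0
    # R e_i is simply column i of R.
    dist_sq = sum((Rx[r] / s - R[r][i]) ** 2 for r in range(len(R)))
    return (1.0 - dist_sq / 2.0) * s


rng = random.Random(0)
m, rows = 50, 600
R = make_sketch_matrix(rows, m, rng)
x = [0.0] * m          # kept only to compare against the estimates
Rx = [0.0] * rows      # the sketch that is actually maintained
for i, delta in [(3, 10.0), (7, -4.0), (3, 5.0), (20, 1.0)]:
    x[i] += delta
    for r in range(rows):
        Rx[r] += delta * R[r][i]   # linear update: R(x + delta e_i) = Rx + delta R e_i

for i in (3, 7, 20, 0):
    print(i, x[i], round(l2_point_query(R, Rx, i), 2))
```

The estimates are only accurate up to an additive term proportional to the L2 norm of x, so small coordinates are reported with large relative error.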

7
Intuition
  • Ignoring the sketching function R, we have
  • $(1 - \|x/s - e_i\|_2^2 / 2)\, s$
  • $= (1 - \|x/s\|_2^2 / 2 - \|e_i\|_2^2 / 2 + (x/s) \cdot e_i)\, s$
  • $= (1 - 1/2 - 1/2 + (x/s) \cdot e_i)\, s = x \cdot e_i$
  • Now we just need to deal with epsilons

[Figure: triangle formed by the vectors $x/s$, $e_i$ and $x/s - e_i$]

8
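Analysis of $Y = (1 - \|Rx/s - Re_i\|_2^2 / 2)\, s$
  • $\|Rx/s - Re_i\|_2^2 / 2$
  • $= \|R(x/s - e_i)\|_2^2 / 2$
  • $= (1 \pm \epsilon)\, \|x/s - e_i\|_2^2 / 2$ (holds with prob. 1-P)
  • $= (1 \pm \epsilon)\, \|x/(\|x\|_2 (1 \pm \epsilon)) - e_i\|_2^2 / 2$
  • $= (1 \pm \epsilon)\, [\, 1/(1 \pm \epsilon)^2 + 1 - 2\, x \cdot e_i / (\|x\|_2 (1 \pm \epsilon)) \,] / 2$
  • $= (1 \pm c\epsilon)(1 - x \cdot e_i / \|x\|_2)$
  • $Y$
  • $= [\, 1 - (1 \pm c\epsilon)(1 - x \cdot e_i / \|x\|_2) \,]\, \|x\|_2 (1 \pm \epsilon)$
  • $= [\, 1 - (1 \pm c\epsilon) + (1 \pm c\epsilon)\, x \cdot e_i / \|x\|_2 \,]\, \|x\|_2 (1 \pm \epsilon)$
  • $= (\pm c\epsilon \|x\|_2 + (1 \pm c\epsilon)\, x \cdot e_i)(1 \pm \epsilon)$
  • $= \pm c'\epsilon \|x\|_2 + x \cdot e_i$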

9
Altogether
  • Can solve the $L_2$ point query problem, with parameter $\alpha$
    and failure probability P, by storing
    $O((1/\alpha^2) \log(1/P))$ numbers
  • Pros
  • General reduction to $L_2$ estimation
  • Intuitive approach (modulo epsilons)
  • In fact $e_i$ can be an arbitrary unit vector
  • Cons
  • Constants are horrible
  • There is a more direct approach using AMS
    sketches [A-Gibbons-M-S'99], with better constants

10
L1 Point Queries/Heavy Hitters
  • For starters, assume $x \ge 0$
  • (not crucial, but then the algorithm is really
    nice)
  • Point queries: algorithm A
  • Set $w = 2/\alpha$
  • Prepare a random hash function $h: \{1..m\} \to \{1..w\}$
  • Maintain an array $Z = [Z_1, \dots, Z_w]$ such that
  • $Z_j = \sum_{i : h(i) = j} x_i$
  • To estimate $x_i$ return
  • $x^*_i = Z_{h(i)}$
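
Algorithm A can be sketched in a few lines. This is a minimal illustration under simplifying assumptions: coordinates are indexed from 0, and the random hash function is stored as a lookup table of m independent bucket choices, which costs O(m) space and is only acceptable for a demo. The class name `SingleRowSketch` is made up for this example.

```python
import math
import random


class SingleRowSketch:
    """One hash function h and one array Z of w = ceil(2 / alpha) counters."""

    def __init__(self, m, alpha, seed=0):
        self.w = math.ceil(2.0 / alpha)
        rng = random.Random(seed)
        # A table of independent uniform bucket choices stands in for the hash h.
        self.h = [rng.randrange(self.w) for _ in range(m)]
        self.Z = [0] * self.w

    def update(self, i, delta):
        """Process the stream update x_i += delta."""
        self.Z[self.h[i]] += delta

    def query(self, i):
        """Return the estimate x*_i = Z[h(i)]."""
        return self.Z[self.h[i]]


m = 100
sk = SingleRowSketch(m, alpha=0.1, seed=1)
x = [0] * m
for i, delta in [(5, 40), (17, 25), (5, 10), (63, 3), (80, 2)]:
    x[i] += delta
    sk.update(i, delta)
print(x[5], sk.query(5))   # the estimate never undercounts when x >= 0
```

With nonnegative updates every bucket holds x_i plus the mass of the colliding coordinates, so the estimate can only err upward.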

[Figure: vector x with coordinate $x_i$ and its estimate $x^*_i$ in the array $Z_1 \dots Z_w$]

11
Analysis
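  • Facts:
  • $x^*_i \ge x_i$
  • $E[x^*_i - x_i] = \sum_{l \ne i} \Pr[h(l) = h(i)]\, x_l \le (\alpha/2)\|x\|_1$
  • $\Pr[\, |x^*_i - x_i| \ge \alpha \|x\|_1 \,] \le 1/2$
  • Algorithm B:
  • Maintain d vectors $Z^1 \dots Z^d$ and functions $h_1 \dots h_d$
  • Estimator:
  • $x^*_i = \min_t Z^t_{h_t(i)}$
  • Analysis:
  • $\Pr[\, |x^*_i - x_i| \ge \alpha \|x\|_1 \,] \le 1/2^d$
  • Setting $d = O(\log m)$ is sufficient for $L_1$ Heavy
    Hitters
  • Altogether, we use space $O((1/\alpha) \log m)$
  • For general x:
  • replace "min" by "median"
  • adjust parameters (by a constant)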
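
Algorithm B, with d independent rows and the min (or median) estimator, might look like the sketch below. The hash family ((a * i + b) mod prime) mod w is an assumption made for this example: it is a standard pairwise-independent-style choice whose collision probability is only approximately 1/w, and the class name, the default prime and the `general` flag are all illustrative.

```python
import math
import random
from statistics import median


class CountMinSketch:
    """d rows of w = ceil(2 / alpha) counters, one hash function per row."""

    def __init__(self, alpha, d, seed=0, prime=2_147_483_647):
        self.w = math.ceil(2.0 / alpha)
        self.d = d
        self.prime = prime
        rng = random.Random(seed)
        # One (a, b) pair per row defines h_t(i) = ((a * i + b) mod prime) mod w.
        self.ab = [(rng.randrange(1, prime), rng.randrange(prime)) for _ in range(d)]
        self.Z = [[0] * self.w for _ in range(d)]

    def _bucket(self, t, i):
        a, b = self.ab[t]
        return ((a * i + b) % self.prime) % self.w

    def update(self, i, delta=1):
        """Process x_i += delta in every row."""
        for t in range(self.d):
            self.Z[t][self._bucket(t, i)] += delta

    def query(self, i, general=False):
        """Min over rows for x >= 0; median over rows if x may have negative entries."""
        vals = [self.Z[t][self._bucket(t, i)] for t in range(self.d)]
        return median(vals) if general else min(vals)


cm = CountMinSketch(alpha=0.1, d=8, seed=1)
truth = {5: 100, 17: 50, 42: 3, 99: 1}
for i, count in truth.items():
    cm.update(i, count)
for i in (5, 17, 42, 99, 7):
    print(i, truth.get(i, 0), cm.query(i))
```

The sketch stores d * w counters regardless of the dimension m, which is where the O((1/alpha) log m) space bound comes from when d = O(log m).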

[Figure: vector x with coordinate $x_i$ and its estimate $x^*_i$ in the array $Z_1 \dots Z_{2/\alpha}$]

12
Comments
  • Can reduce the recovery time to about O(log m)
  • Other goodies as well
  • For details, see
  • [Cormode-Muthukrishnan'04]: The Count-Min
    Sketch
  • Also:
  • [Charikar-Chen-FarachColton'02]
  • (variant for the $L_2$ norm)
  • [Estan-Varghese'02]
  • Bloom filters

13
Sparse Approximations
  • Sparse approximations (w.r.t. $L_p$ norm):
  • For a vector x, find x' such that
  • x' has complexity k
  • $\|x - x'\|_p \le (1 + \alpha)\,\mathrm{Err}$, where $\mathrm{Err} = \mathrm{Err}^p_k = \min_{x''} \|x - x''\|_p$,
  • for x'' ranging over all vectors with
    complexity k
  • Sparsity (i.e., $L_0$) is a very natural measure of
    complexity
  • In this case, the best x' consists of the k coordinates
    of x that are largest in magnitude, i.e., the heavy
    hitters
  • Then the error is the $L_p$ norm of the non-heavy
    hitters, a.k.a. mice
  • Question: can we modify the previous algorithm to
    solve the sparse approximation problem?
  • Answer: YES [Cormode-Muthukrishnan'05] (for $L_2$
    norm)
  • Just set $w = (4/\alpha)k$
  • We will see it for the $L_1$ norm
  • For k = 1 the previous proof does the job, since in
    fact
  • $|x^*_i - x_i| \le \alpha \|x - x_i e_i\|_1$

14
Point Query
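  • We show how to get an estimate
  • $x^*_i = x_i \pm \alpha\,\mathrm{Err}/k$
  • Assume
  • $|x_{i_1}| \ge \dots \ge |x_{i_m}|$
  • $\Pr[\, |x^*_i - x_i| \ge \alpha\,\mathrm{Err}/k \,]$ is at most
  • $\Pr[\, h(i) \in h(\{i_1 .. i_k\}) \,]$
  • $+ \Pr[\, \sum_{l > k :\, h(i_l) = h(i)} x_{i_l} \ge \alpha\,\mathrm{Err}/k \,]$
  • $\le 1/(2/\alpha) + 1/4$
  • $< 1/2$ (if $\alpha < 1/2$)
  • Applying min/median to $d = O(\log m)$ copies of the
    algorithm ensures that w.h.p.
  • $|x^*_i - x_i| < \alpha\,\mathrm{Err}/k$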

[Figure: coordinates $x_{i_1}, x_{i_2}, \dots, x_{i_k}$ and $x_i$ hashed into the array $Z_1 \dots Z_{(4/\alpha)k}$]

15
Sparse Approximations
  • Algorithm
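  • Return a vector x' consisting of the k largest (in
    magnitude) elements of $x^*$
  • Analysis (new proof)
  • Let S (or $S^*$) be the set of k largest in
    magnitude coordinates of x (or $x^*$)
  • Note that $\|x^*_S\|_1 \le \|x^*_{S^*}\|_1$
  • We have
  • $\|x - x'\|_1 \le \|x\|_1 - \|x_{S^*}\|_1 + \|x_{S^*} - x^*_{S^*}\|_1$
  • $\le \|x\|_1 - \|x^*_{S^*}\|_1 + 2\|x_{S^*} - x^*_{S^*}\|_1$
  • $\le \|x\|_1 - \|x^*_S\|_1 + 2\|x_{S^*} - x^*_{S^*}\|_1$
  • $\le \|x\|_1 - \|x_S\|_1 + \|x^*_S - x_S\|_1 + 2\|x_{S^*} - x^*_{S^*}\|_1$
  • $\le \mathrm{Err} + 3(\alpha\,\mathrm{Err}/k)\,k$
  • $= (1 + 3\alpha)\,\mathrm{Err}$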
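
The whole pipeline (sketch with w = (4/alpha)k buckets per row, estimate every coordinate, keep the k largest estimates) can be sketched as below. The code assumes x >= 0 so that the min estimator applies, indexes coordinates from 0, and again stores each hash function as a lookup table purely for brevity; the function name, the default d = 9 and the toy vector are illustrative choices, not part of the slides.

```python
import math
import random


def sparse_approximation(updates, m, k, alpha, d=9, seed=0):
    """Return {index: estimate} for the k largest estimates, assuming x >= 0."""
    w = math.ceil(4.0 * k / alpha)
    rng = random.Random(seed)
    # Lookup tables stand in for d random hash functions (demo only, O(d * m) space).
    h = [[rng.randrange(w) for _ in range(m)] for _ in range(d)]
    Z = [[0] * w for _ in range(d)]
    for i, delta in updates:
        for t in range(d):
            Z[t][h[t][i]] += delta
    # Point query for every coordinate: min over the d rows.
    est = [min(Z[t][h[t][i]] for t in range(d)) for i in range(m)]
    top = sorted(range(m), key=lambda i: abs(est[i]), reverse=True)[:k]
    return {i: est[i] for i in top}


m, k, alpha = 200, 3, 0.5
x = [0] * m
for i in range(0, m, 7):
    x[i] = 1                       # the "mice"
x[10], x[50], x[120] = 100, 80, 60  # the heavy hitters
updates = [(i, v) for i, v in enumerate(x) if v]

approx = sparse_approximation(updates, m, k, alpha)
err = sum(abs(x[i] - approx.get(i, 0)) for i in range(m))
best = sum(sorted(x)[:-k])         # Err^1_k: L1 mass outside the k largest entries
print(sorted(approx), err, best)
```

Comparing `err` with `best` shows how close the returned vector is to the optimal k-sparse approximation; the analysis bounds the ratio by 1 + 3 * alpha with high probability.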

16
Altogether
  • Can compute a k-sparse approximation to x with
    error $(1 + \alpha)\,\mathrm{Err}^1_k$ using $O((k/\alpha) \log m)$ space
    (numbers)
  • This also gives an estimate
  • $x^*_i = x_i \pm \alpha\,\mathrm{Err}^1_k / k$