Distributed Streams Algorithms for Sliding Windows

1
Distributed Streams Algorithms for Sliding Windows
  • Phillip B. Gibbons,
  • Srikanta Tirthapura

2
Abstract
  • Algorithm for estimating aggregate functions over
    a sliding window of the N most recent data
    items in one or more streams.

3
Single stream
  • The first E-approximation scheme for the number of
    1s in a sliding window.
  • The first E-approximation scheme for the sum of
    integers in [0..R] in a sliding window.
  • Both algorithms are optimal in worst-case time
    and space.
  • Both algorithms are deterministic.

4
Distributed Streams
  • The first randomized E-approximation scheme for
    the number of 1s in a sliding window over the
    union of distributed streams.

5
Usage
  • Network Monitoring
  • Data Warehousing
  • Telecommunications
  • Sensor Networks

6
  • Multiple Data Source - Distributed Stream Model
  • Only the most recent data is important - Sliding
    Window

7
The Goal of the Algorithms
  • Approximate a function F while minimizing:
  • 1. The total memory
  • 2. The time taken by each party to process a data
    item
  • 3. The time to produce an estimate (the query time)

8
Definition 1 - An (E, δ)-approximation scheme
for a quantity X
  • A randomized procedure that, given any positive
    E < 1 and δ < 1, computes an estimate of X that is
    within a relative error of E with probability at
    least 1 - δ.
  • E-approximate: an estimate whose worst-case
    relative error is at most E.

9
An Example for the Basic Counting Problem

10
Algorithms for Distributed Stream
  • Each party observes only its own stream
  • Each party communicates with other parties only
    when an estimate is requested
  • Each party sends a message to a Referee who
    computes the estimate

11
The Idea
  • Storing a wave consisting of many random samples
    of the stream.
  • Samples that contain only the recent items are
    sampled at a high probability, while those
    containing old items are sampled at a lower
    probability

12
Contributions
  • Introducing a data structure called waves
  • Presenting the first E-approximation scheme for
    Basic Counting.
  • Presenting the first E-approximation scheme for
    the sum of integers in [0..R]. Both are optimal in
    worst-case space, processing time, and query time.

13
Contributions
  • Presenting the first randomized
    (E, δ)-approximation scheme for the number of 1s
    in a sliding window over the union of distributed
    streams

14
Related Work
  • From the paper of Datar et al.
  • Uses the Exponential Histogram data structure

15
Exponential Histogram
  • Maintain more information about recently seen
    items, less about old items.
  • The k0 most recent 1s are each assigned to an
    individual bucket (size 1).
  • The k1 next most recent 1s are assigned to
    buckets of size 2.
  • The k2 next most recent 1s are assigned to
    buckets of size 4.
  • And so on, until every 1 among the last N items
    is assigned to some bucket.

16
Exponential Histogram
  • Each ki is either or
  • The last bucket is discarded if its position no
    longer falls within the window.
  • If the new item is a 1, it is assigned to a new
    bucket of size 1.
  • If this makes k0 too large, then the two
    least recent buckets of size 1 are merged to form
    a bucket of size 2.
  • If k1 is now too large, the two least recent
    buckets of size 2 are merged.
  • And so on, resulting in a cascade of up to log N
    bucket merges in the worst case.
  • The approach using waves avoids this cascading

17
The Basic Wave
  • Assumption: 1/E is an integer.
  • Counters:
  • 1. pos - the current length of the stream
  • 2. rank - the current number of 1s in the stream
  • The wave contains the positions of the recent 1s
    in the stream, arranged at different levels.
  • For i = 0, 1, ..., l-1, level i contains the positions
    of the most recent 1-bits whose 1-rank is a
    multiple of 2^i.

18
An Example for Basic Wave
  • The crest of the wave is always over the largest
    1-rank
  • N = 48, 1/E = 3, l = 5

19
Estimation Steps
  • Let s = max(0, pos - n + 1); we estimate the number
    of 1s in [s, pos].
  • Let p1 be the maximum position in the wave less
    than s, and p2 the minimum position in the wave
    greater than or equal to s.
  • Let r1 and r2 be the 1-ranks of p1 and p2
    respectively.
  • Return rank - r + 1, where r = r2 if r2 - r1 = 1,
    and otherwise r = (r1 + r2)/2.
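The basic wave and the estimation steps above can be sketched together in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name, the integer parameter `inv_eps` (standing for 1/E), the level count ceil(log2(2EN)) (chosen to match the example N = 48, 1/E = 3, l = 5), the per-level capacity of 1/E + 1, and the dummy entry (position 0, rank 0) that seeds every level are all assumptions not stated in the transcript.

```python
import math
from collections import deque


class BasicWave:
    """Sketch of the basic wave for counting 1s in a window of size n <= N."""

    def __init__(self, N, inv_eps):
        self.N = N
        self.pos = 0     # current stream length (positions start at 1)
        self.rank = 0    # number of 1s seen so far
        # Assumed number of levels: ceil(log2(2 * E * N)).
        self.levels = max(1, math.ceil(math.log2(2 * N / inv_eps)))
        # Assumed capacity of 1/E + 1 entries per level; a dummy entry
        # (position 0, rank 0) seeds every level.
        self.queues = [deque([(0, 0)], maxlen=inv_eps + 1)
                       for _ in range(self.levels)]

    def update(self, bit):
        self.pos += 1
        if bit == 1:
            self.rank += 1
            for i in range(self.levels):
                if self.rank % (1 << i) != 0:
                    break
                # deque(maxlen=...) drops the oldest entry when full.
                self.queues[i].append((self.pos, self.rank))

    def estimate(self, n):
        s = max(0, self.pos - n + 1)
        if s <= 1:
            return self.rank          # the window covers the whole stream
        stored = {entry for q in self.queues for entry in q}
        inside = [r for p, r in stored if p >= s]
        if not inside:
            return 0                  # no 1s in the window
        r1 = max((r for p, r in stored if p < s), default=0)
        r2 = min(inside)
        r = r2 if r2 - r1 == 1 else (r1 + r2) / 2
        return self.rank - r + 1
```

With N = 48 and `inv_eps = 3` this builds the five levels of the example slide; storing a position at every level whose power of two divides its 1-rank is what the later "Improvement" slides optimize away.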

20
LEMMA 1
  • The procedure returns an estimate that is
    within a relative error of E of the actual number
    of 1s in the window.

21
Proof
  • Let j be the smallest numbered level containing
    position p1.
  • By returning the midpoint of the range [r1, r2],
    we guarantee that the absolute error is at most
    (r2 - r1)/2.
  • There is at most a 2^j gap between r1 and its next
    larger 1-rank r2.
  • Thus the absolute error in our estimate is at
    most 2^(j-1).
  • Let r3 be the earliest 1-rank at level j-1.
  • r3 > r1, so r3 >= r2.
  • by definition

22
Improvement
  • Use modulo N' counters (N' = 2N) for pos and rank,
    and store the positions in the wave as modulo N'
    numbers - each takes only log N' bits.
  • Keep track of both the largest 1-rank discarded
    (r1) and the smallest 1-rank (r2) still in the
    wave - a query for the number of 1s is then
    answered in O(1) time.
  • Instead of storing a single position in multiple
    levels, store each position only at its maximal
    level.

23
Improvement

24
Improvement
  • The positions at each level are stored in a fixed
    length queue, so that each time a new position is
    added to a full queue, the position at the end of
    the queue is removed.
  • Maintain a doubly linked list of the positions in
    the wave in increasing order.
  • Storing the differences between consecutive
    positions instead of the absolute positions
    reduces the space.
25
The deterministic wave algorithm
  • Upon receiving a stream bit b:
  • 1. Increment pos (modulo N' = 2N).
  • 2. If the head (p, r) of the linked list L has
    expired (p <= pos - N), then discard it from L
    and from its queue, and store r as the largest
    1-rank discarded.
  • 3. If b = 1 then do:
  • (a) Increment rank, and determine the
    corresponding wave level j, the largest j such
    that rank is a multiple of 2^j.
  • (b) If the level j queue is full, discard the tail
    of the queue and splice it out of L.
  • (c) Add (pos, rank) to the head of the level j
    queue and the tail of L.

26
Answering a query for a sliding window of size N
  • 1. Let r1 be the largest 1-rank discarded. (If no
    such r1, return rank as the exact answer.) Let r2 be
    the 1-rank at the head of the linked list L. (If L is
    empty, return 0.)
  • 2. Return rank - r + 1, where r = r2 if r2 - r1 = 1,
    and otherwise r = (r1 + r2)/2.
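The update and query steps of the deterministic wave can be sketched as follows. This is a sketch under stated assumptions rather than the authors' code: an `OrderedDict` stands in for the doubly linked list L, plain Python integers replace the modulo N' counters, the per-level capacity of 1/E + 1 and the level count ceil(log2(2EN)) are assumed (the transcript does not state them), and r1 starts at 0 so that a full window is tested with `pos <= N` instead of "no such r1".

```python
import math
from collections import OrderedDict, deque


class DeterministicWave:
    """Sketch of the O(1)-time wave for counting 1s in the last N items."""

    def __init__(self, N, inv_eps):
        self.N = N
        self.pos = 0
        self.rank = 0
        self.levels = max(1, math.ceil(math.log2(2 * N / inv_eps)))
        self.cap = inv_eps + 1        # assumed queue length per level
        self.queues = [deque() for _ in range(self.levels)]
        self.L = OrderedDict()        # position -> (rank, level), oldest first
        self.r1 = 0                   # largest 1-rank discarded on expiry

    def update(self, bit):
        self.pos += 1                                     # step 1
        if self.L:                                        # step 2
            p = next(iter(self.L))
            if p <= self.pos - self.N:
                r, j = self.L.pop(p)
                self.queues[j].popleft()
                self.r1 = r
        if bit == 1:                                      # step 3
            self.rank += 1
            # (a) largest j such that rank is a multiple of 2^j, capped.
            j = min((self.rank & -self.rank).bit_length() - 1,
                    self.levels - 1)
            q = self.queues[j]
            if len(q) == self.cap:                        # (b)
                del self.L[q.popleft()]
            q.append(self.pos)                            # (c)
            self.L[self.pos] = (self.rank, j)

    def query(self):
        if self.pos <= self.N:
            return self.rank          # nothing has left the window yet
        if not self.L:
            return 0
        r2 = next(iter(self.L.values()))[0]
        r = r2 if r2 - self.r1 == 1 else (self.r1 + r2) / 2
        return self.rank - r + 1
```

Every operation in `update` and `query` touches a constant number of entries, which is the point of storing each position only at its maximal level and keeping L ordered by position.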

27
  • Space -
  • Process time for each item - O(1)
  • Estimate time - O(1)
  • In related work (Datar et al)
  • Space -
  • Process time for each item - O(log(EN))

28
Sum of Bounded Integers
  • The sum over a sliding window can range from 0 to
    NR.
  • Let N' be the smallest power of 2 greater than or
    equal to 2RN.
  • Counters (modulo N'):
  • pos - the current length
  • total - the running sum
  • l = log(2ENR) levels.
  • A triple (p, v, z) is stored for each item:
  • p - the position of the data item
  • v - the value of the data item
  • z - the partial sum through this item

29
  • The answer to a query is the midpoint of the
    interval [total - z2 + v2, total - z1).

30
The Algorithm for the sum of last N items in a
data stream
  • Upon receiving a stream value v between 0 and R:
  • 1. Increment pos (modulo N' = 2N).
  • 2. If the head (p, v, z) of the linked list L has
    expired (p <= pos - N), then discard it from L and
    from its queue, and store z as the largest
    partial sum discarded.
  • 3. If v > 0 then do:
  • (a) Determine the largest j such that some number
    in (total, total + v] is a multiple of 2^j. Add v to
    total.
  • (b) If the level j queue is full, discard the tail
    of the queue and splice it out of L.
  • (c) Add (pos, v, total) to the head of the level j
    queue and the tail of L.

31
Step 3a
  • The desired wave level is the largest position j
    such that some number y in the interval
    (total, total + v] has 0s in all bit positions less
    than j.
  • y-1 and y differ in bit position j.
  • If bit j changes from 1 to 0 at any point in
    [total, total + v], then j is not the largest.
  • j is the position of the most-significant bit
    that is 0 in total and 1 in total + v.
  • j is the most-significant bit that is 1 in the
    bitwise xor of total and total + v.
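The xor shortcut on this slide is easy to check in code. In the sketch below, `wave_level` computes the level from the xor, and `wave_level_bruteforce` applies the definition from step 3(a) directly; both function names are hypothetical and introduced only for illustration.

```python
def wave_level(total: int, v: int) -> int:
    """Position of the most-significant 1 bit in total XOR (total + v)."""
    if v <= 0:
        raise ValueError("v must be positive")
    return (total ^ (total + v)).bit_length() - 1


def wave_level_bruteforce(total: int, v: int) -> int:
    """Largest j such that some y in (total, total + v] is a multiple of 2^j."""
    j = 0
    while any(y % (1 << (j + 1)) == 0
              for y in range(total + 1, total + v + 1)):
        j += 1
    return j
```

For example, with total = 5 and v = 2 the interval (5, 7] contains 6, a multiple of 2 but not of 4, and 5 xor 7 is binary 010, so both functions give level 1.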

32
Answering a query for a sliding window of size N
  • 1. Let z1 be the largest partial sum discarded
    from L. (If no such z1, return total as the exact
    answer.) Let (p2, v2, z2) be the head of the
    linked list L. (If L is empty, return 0.)
  • 2. Return total - (z1 + z2 - v2)/2.
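The sum algorithm and its query can be sketched in the same way as the counting wave. As before, this is a sketch under assumptions and not the authors' code: an `OrderedDict` plays the role of the linked list L, Python integers replace the modulo N' counters, the per-level capacity of 1/E + 1 and the level count ceil(log2(2ENR)) are assumed, the level is capped at the top level, and z1 starts at 0 with `pos <= N` used to detect a window that still holds the whole stream.

```python
import math
from collections import OrderedDict, deque


class SumWave:
    """Sketch of the wave for the sum of the last N values in [0..R]."""

    def __init__(self, N, R, inv_eps):
        self.N = N
        self.pos = 0
        self.total = 0
        self.levels = max(1, math.ceil(math.log2(2 * N * R / inv_eps)))
        self.cap = inv_eps + 1        # assumed queue length per level
        self.queues = [deque() for _ in range(self.levels)]
        self.L = OrderedDict()        # position -> (v, z, level), oldest first
        self.z1 = 0                   # largest partial sum discarded on expiry

    def update(self, v):
        self.pos += 1                                     # step 1
        if self.L:                                        # step 2
            p = next(iter(self.L))
            if p <= self.pos - self.N:
                _, z, j = self.L.pop(p)
                self.queues[j].popleft()
                self.z1 = z
        if v > 0:                                         # step 3
            # (a) level from the xor of total and total + v, capped.
            j = min((self.total ^ (self.total + v)).bit_length() - 1,
                    self.levels - 1)
            self.total += v
            q = self.queues[j]
            if len(q) == self.cap:                        # (b)
                del self.L[q.popleft()]
            q.append(self.pos)                            # (c)
            self.L[self.pos] = (v, self.total, j)

    def query(self):
        if self.pos <= self.N:
            return self.total         # nothing has left the window yet
        if not self.L:
            return 0
        v2, z2, _ = next(iter(self.L.values()))
        return self.total - (self.z1 + z2 - v2) / 2
```

The returned value is the midpoint of the interval between total - z2 + v2 and total - z1, which always brackets the true window sum.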

33
  • Space - O((1/E)(log N + log R)) memory words of
    O(log N + log R) bits
  • Process time for each item - O(1)
  • Estimate time - O(1)
  • In related work (Datar et al.)
  • Space - O((1/E)(log N + log R)) buckets of
    log N + log(log N + log R) bits
  • Process time for each item - O(log N + log R)

34
Distributed Streams
  • Three definitions of a sliding window over a
    collection of t > 1 distributed streams:
  • 1. Seeking the total number of 1s in the last N
    items in each of the t streams (tN items in total).
  • 2. A single logical stream has been split arbitrarily
    among the parties. Each party receives items that
    include a sequence number in the logical stream.
    Seeking the total number of 1s in the last N
    items in the logical stream.
  • 3. Seeking the total number of 1s in the last N
    items in the position-wise union of the t streams.

35
Solution for First Scenario
  • Applying single stream algorithm to each stream.
  • To answer a query, each party sends its count to
    the Referee.
  • The Referee sums the answers.
  • Because each individual count is within E
    relative error, so is the total.
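The last bullet follows from the triangle inequality. Writing x_j for the true count at party j and \hat{x}_j for its estimate (symbols introduced here for illustration only), and assuming each estimate satisfies |\hat{x}_j - x_j| <= E x_j:

```latex
\left|\sum_{j=1}^{t}\hat{x}_j-\sum_{j=1}^{t}x_j\right|
\;\le\; \sum_{j=1}^{t}\left|\hat{x}_j-x_j\right|
\;\le\; \sum_{j=1}^{t}E\,x_j
\;=\; E\sum_{j=1}^{t}x_j
```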

36
Solution for Second Scenario
  • To answer a query, each party sends its wave to
    the Referee.
  • The Referee computes the maximum sequence number
    over all the parties, uses each wave to obtain an
    estimate over the resulting window, and sums the
    results.
  • Because each individual count is within E
    relative error, so is the total.

37
Randomized Waves
  • Contains the positions of the recent 1s in the
    data stream, stored at different levels.
  • Each level i contains the most recently selected
    positions of the 1-bits, where a position is
    selected into level i with probability 2^(-i).
  • The deterministic wave selects 1 out of every
    2^i 1-bits, at regular intervals.
  • A randomized wave selects an expected 1 out of
    every 2^i 1-bits, at random intervals.
  • The randomized wave retains more positions per
    level.

38
The Basic Randomized Wave
  • Let N' be the smallest power of 2 that is at least 2N.
  • Let d = log N'.
  • Let E < 1 be the desired relative error.
  • Each party Pj maintains a basic randomized wave
    for its stream consisting of d + 1 queues,
    Qj(0), ..., Qj(d), one for each level.
  • A pseudo-random hash function h maps
    positions to levels according to an exponential
    distribution.

39
The Steps for Maintaining the Randomized Wave
  • Party Pj, upon receiving a stream bit b:
  • 1. Increment pos (modulo N' = 2N).
  • 2. Discard any position p in the tail of a queue
    that has expired (p <= pos - N).
  • 3. If b = 1 then for l = 0, ..., h(pos) do:
  • (a) If the level l queue Qj(l) is full, then
    discard the tail of Qj(l).
  • (b) Add pos to the head of Qj(l).
  • The sample for each level, stored in a queue,
    contains the most recent positions selected
    into the level (c = 36).

40
  • Consider a queue Qj(l) whose tail is position i.
    Then Qj(l) contains all the 1-bits in the interval
    [i, pos] whose positions hash to a value greater
    than or equal to l; [i, pos] is the range of the
    queue.
  • As we move from level l to l + 1, the range may
    increase.
  • The queues at lower numbered levels may have
    ranges that fail to contain the window, but as we
    move to higher levels, we will find a level whose
    range contains the window.







41
Answering a query for a sliding window of size
n <= N
  • After each party has observed pos bits:
  • 1. Each party j sends its wave, (Qj(0), ..., Qj(log N')),
    to the Referee. Let s = max(0, pos - n + 1). Then
    W = [s, pos] is the desired window.
  • 2. For j = 1, ..., t, let lj be the minimum level such
    that the tail of Qj(lj) is a position p <= s.
  • 3. Let l* = max lj over j = 1, ..., t. Let U be the
    union of all positions in Q1(l*), ..., Qt(l*).
  • 4. Return ...
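The maintenance and query steps for randomized waves can be sketched as below. Several parts are assumptions, because the transcript is incomplete here: the returned value is cut off, so the sketch uses the natural scaling estimate 2^(l*) times the number of positions of U inside the window; `make_level_hash` is a hypothetical stand-in for the shared hash function h; the queue capacity is left as a parameter; and each queue records the last position it dropped for lack of space, a simplification of the tail test in step 2 of the query.

```python
import random
from collections import deque


def make_level_hash(d, seed=0):
    """Hypothetical h: the level is at least l with probability 2^-l (max d)."""
    def h(pos):
        rng = random.Random(seed * 1_000_003 + pos)   # same for every party
        level = 0
        while level < d and rng.random() < 0.5:
            level += 1
        return level
    return h


class RandomizedWave:
    """Sketch of one party's randomized wave with d + 1 level queues."""

    def __init__(self, N, d, capacity, h):
        self.N, self.capacity, self.h = N, capacity, h
        self.pos = 0
        self.queues = [deque() for _ in range(d + 1)]
        # Last position each level dropped because its queue was full.
        self.evicted = [-1] * (d + 1)

    def update(self, bit):
        self.pos += 1                                     # step 1
        for q in self.queues:                             # step 2
            while q and q[0] <= self.pos - self.N:
                q.popleft()
        if bit == 1:                                      # step 3
            for l in range(self.h(self.pos) + 1):
                q = self.queues[l]
                if len(q) == self.capacity:
                    self.evicted[l] = q.popleft()
                q.append(self.pos)


def referee_estimate(waves, n):
    """Assumed estimate: 2^(l*) times the sampled positions inside the window."""
    pos = waves[0].pos
    s = max(0, pos - n + 1)
    top = len(waves[0].queues) - 1
    # l_j: lowest level whose queue still holds every selected position >= s.
    l_star = max(
        min((l for l in range(top + 1) if w.evicted[l] < s), default=top)
        for w in waves
    )
    union = set()
    for w in waves:
        union.update(p for p in w.queues[l_star] if p >= s)
    return (1 << l_star) * len(union)
```

Because every party uses the same h, a position that holds a 1 in several streams is sampled at the same levels everywhere and is counted once in the union. When the capacity is at least N, level 0 always covers the window and the result is the exact union count.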

42
  • The algorithm returns an estimate for the Union
    Counting Problem for any sliding window of size
    n <= N that is within a relative error E with
    probability greater than 2/3.
  • space -