Sketching Asynchronous Streams Over a Sliding Window

1
Sketching Asynchronous Streams Over a Sliding Window
  • Srikanta Tirthapura (Iowa State University)
  • Bojian Xu (Iowa State University)
  • Costas Busch (Rensselaer Polytechnic Institute)

2
Data Stream Processing
  • Example I: over all packets on a network link,
    maintain the number of distinct IP sources seen in
    the last hour
  • Example II: over a large database, continuously maintain
    • Frequency moments
    • Median of all the elements
  • Processing requirements
    • One-pass processing
    • Small workspace: poly-logarithmic in the size of
      the data
    • Fast processing time per element
    • Approximate answers are acceptable

3
Data Stream Model
  • Data stream: (v0,t0), (v1,t1), (v2,t2), ...
    • vi: observed value
    • ti: timestamp of creation
  • Synchronous stream: ti in ascending order
  • Asynchronous stream: no order guaranteed on ti
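The distinction can be checked mechanically. A minimal Python sketch, assuming elements are (value, timestamp) pairs and that "ascending" allows equal timestamps; `is_synchronous` is a hypothetical helper name, not something from the talk.

```python
from typing import Iterable, Tuple

Element = Tuple[float, int]  # (value, timestamp), mirroring the (v, t) pairs


def is_synchronous(stream: Iterable[Element]) -> bool:
    """Return True if timestamps arrive in ascending (non-decreasing) order."""
    last = None
    for _, t in stream:
        if last is not None and t < last:
            return False  # an older timestamp arrived after a newer one
        last = t
    return True


in_order = [(5, 2), (5, 6), (19, 7)]
out_of_order = [(5, 2), (19, 7), (5, 6)]  # timestamp 6 arrives after 7
print(is_synchronous(in_order), is_synchronous(out_of_order))
```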

4
Why Asynchronous Data Streams?
[Figure: a synchronous stream becomes asynchronous after network delay and multi-path routing; merging synchronous streams without control also produces an asynchronous stream.]

5
Recent Elements
  • More interested in elements with recent
    timestamps
  • Example: network monitoring, current time 12:03 7/24/06
  • Interesting (within the last 5 mins):
    • 129.186.9.17, 11:59 7/24/06
    • 129.186.5.63, 12:01 7/24/06
  • Not interesting (out of the last 5 mins):
    • 129.186.59.7, 11:12 7/23/06
    • 129.186.13.9, 11:45 7/23/06

6
Timestamp Sliding Window
  • Timestamp sliding window over stream S: the elements (v,t)
    of S with timestamp t in [c-W, c]
  • c: current time
  • W: window size

7
Sliding Window - example
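  • Window size: 10
  • Current time: 17
  • Stream: (5,2), (19,7), (7,8), (22,8), (5,6)
  • Current window: clock times [7, 17]
[Figure: stream elements placed on a clock-time axis running from old to recent, with the current window marked.]
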
8
Sliding Window - example
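  • Window size: 10
  • Current time: 18
  • Stream: (5,2), (19,7), (7,8), (22,8), (5,6), (9,11)
  • Current window: clock times [8, 18]
[Figure: stream elements placed on a clock-time axis running from old to recent, with the current window marked.]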
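The two example slides can be reproduced with a short filter. A minimal Python sketch, assuming each pair is (value, timestamp) and that the window is the closed interval [c-W, c]; `current_window` is a hypothetical helper name.

```python
from typing import List, Tuple

Element = Tuple[float, int]  # (value, timestamp); this reading of the pairs is an assumption


def current_window(stream: List[Element], c: int, W: int) -> List[Element]:
    """Elements whose timestamp lies in the closed interval [c - W, c]."""
    return [(v, t) for (v, t) in stream if c - W <= t <= c]


stream = [(5, 2), (19, 7), (7, 8), (22, 8), (5, 6)]
at_17 = current_window(stream, c=17, W=10)  # window [7, 17]

stream.append((9, 11))                      # one more element arrives
at_18 = current_window(stream, c=18, W=10)  # window [8, 18]

print(at_17)
print(at_18)
```

Storing the whole stream like this is only for illustration; the point of the talk is to avoid exactly that.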
9
Our Contributions
  • First study of aggregate computation over recent
    elements of an asynchronous data stream
  • Randomized algorithms for estimating the sum and
    median over a sliding window of an asynchronous
    stream
  • Workspace much smaller than size of window
  • Fast processing time per item
  • Distributed aggregation over the union of
    asynchronous streams

10
Outline
  • Problem: Sum of Recent Elements
  • Intuition and Algorithm
  • Union of Streams

11
Problem
  • Network monitoring
  • Current time: 12:03 7/24/06
  • Interesting (within the last 5 mins):
    • 129.186.9.17, 423, 11:59 7/24/06
    • 129.186.5.63, 101, 12:01 7/24/06
  • Not interesting (out of the last 5 mins):
    • 129.186.59.7, 32, 11:12 7/23/06
    • 129.186.13.9, 145, 11:45 7/23/06

12
Sum Problem
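  • Given
    • Data stream S: (v0,t0), (v1,t1), (v2,t2), ...
    • Max sliding window size W
    • User inputs $\epsilon$, $\delta$
  • Task: for all $w \le W$, continuously maintain an
    $(\epsilon,\delta)$-estimate of the sum of the elements in the
    window $[c-w, c]$, that is, $\sum_{(v,t) \in S,\; c-w \le t \le c} v$

An $(\epsilon,\delta)$-estimate for $X$ is a random variable $Y$
such that $\Pr[|Y - X| > \epsilon X] < \delta$.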
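For reference, the quantity being estimated can be computed exactly if the whole stream is stored. A minimal Python sketch of that baseline, assuming (value, timestamp) pairs and a closed window [c-w, c]; `exact_window_sum` and `within_relative_error` are hypothetical names, and the second helper only checks the error condition for a single outcome, not the probability bound.

```python
from typing import Iterable, Tuple

Element = Tuple[float, int]  # (value, timestamp)


def exact_window_sum(stream: Iterable[Element], c: int, w: int) -> float:
    """Exact sum of values with timestamp in [c - w, c]; needs the full stream."""
    return sum(v for v, t in stream if c - w <= t <= c)


def within_relative_error(y: float, x: float, eps: float) -> bool:
    """True if the single outcome y satisfies |y - x| <= eps * x."""
    return abs(y - x) <= eps * x


stream = [(2, 15), (3, 16), (2, 12), (3, 11), (2, 19)]
x = exact_window_sum(stream, c=22, w=10)  # window [12, 22] excludes (3, 11)
print(x, within_relative_error(8, x, eps=0.2))
```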
13
Previous Work
  • M. Datar, A. Gionis, P. Indyk, R. Motwani.
    Maintaining stream statistics over sliding
    windows. SIAM Journal on Computing,
    31(6):1794-1813, 2002.
  • P. Gibbons and S. Tirthapura. Distributed
    streams algorithms for sliding windows. Theory
    of Computing Systems, 37:457-478, 2004.

14
Algorithm for Sum
  • Problem: estimate the sum of elements within the
    sliding window
  • Random sampling
    • Randomly sample elements of this set
    • Compute the sum of the random sample
    • Multiply by an appropriate scaling factor
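The three steps can be illustrated with plain uniform sampling. This is a minimal Python sketch under an assumption the slide does not state: each element is kept independently with the same probability p, so the scaling factor is 1/p. `sampled_sum` is a hypothetical name.

```python
import random
from typing import List


def sampled_sum(values: List[float], p: float, rng: random.Random) -> float:
    """Keep each value independently with probability p, then scale the sample sum by 1/p."""
    sample = [v for v in values if rng.random() < p]
    return sum(sample) / p


rng = random.Random(0)
values = [2.0, 3.0, 2.0, 2.0]
exact = sum(values)

# A single estimate is noisy; the average over many runs approaches the exact sum.
trials = [sampled_sum(values, 0.5, rng) for _ in range(20000)]
mean_estimate = sum(trials) / len(trials)
print(exact, round(mean_estimate, 2))
```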

15
Intuition I
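  • To estimate the size of a set, sample the
    universe until enough elements are chosen from the set
[Figure: a population that includes people with green eyes is sampled with probability pj; the people with green eyes in the sample stand in for those in the population.]
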
16
Intuition II
  • Maintain many samples of fixed size over the
    elements within the sliding window, one sample per
    sampling probability pj
  • Each element is randomly selected into the
    samples from higher level to lower level, until
    it fails at some sample or the lowest sample is
    reached.
  • Each sample keeps the α most recent elements.

17
Intuition III
  • Items with larger values should have more weight
    to be selected into the sample.
  • Example: element (v,t) = (3,7) within the sliding window,
    offered to samples with p = 1, 1/2, 1/4, 1/8, 1/16, ..., 1/2^m
    • p = 1: (3,7) inserted
    • p = 1/2: (1.5,7) inserted
    • p = 1/4: (1,7) inserted
    • p = 1/8: failed
  • For element (v,t):
    • If vp ≥ 1, insert (vp, t) into the sample (deterministic insertion).
    • If vp < 1, insert (1, t) into the sample w.p. vp (random insertion).
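The insertion rule for a single sample can be written down directly. A minimal Python sketch, assuming the rule is applied to one sample in isolation with a given probability p; how the decision chains from one sample to the next is not modeled here, and `offer` is a hypothetical name.

```python
import random
from typing import Optional, Tuple


def offer(v: float, t: int, p: float, rng: random.Random) -> Optional[Tuple[float, int]]:
    """Apply the rule for one sample with probability p.

    Returns the (weight, timestamp) pair to store, or None if the element fails.
    """
    scaled = v * p
    if scaled >= 1:            # deterministic insertion of (vp, t)
        return (scaled, t)
    if rng.random() < scaled:  # random insertion of (1, t) with probability vp
        return (1.0, t)
    return None                # failed at this sample


rng = random.Random(7)
print(offer(3, 7, 1.0, rng))  # vp = 3, deterministic
print(offer(3, 7, 0.5, rng))  # vp = 1.5, deterministic

# With p = 1/8, vp = 0.375, so roughly 37.5% of offers should succeed.
hits = sum(offer(3, 7, 0.125, rng) is not None for _ in range(10000))
print(hits / 10000)
```

In both branches the expected stored weight is vp, which is what makes dividing by p a sensible scaling factor.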
18

Algorithm for Sum
  • Stream: (2,15), (3,16), (2,12), (3,11), (2,19), arriving at current times 17, 18, 20, 22, 22
  • Initially all samples are empty: t0 = -1, t1 = -1, t2 = -1, t3 = -1
  • c = 17, W = 10, [c-W, c] = [7, 17]
19

Algorithm for Sum
  • c = 17, W = 10, [c-W, c] = [7, 17]; element (2,15) arrives
  • Level 0: (2,15), t0 = -1 (deterministic insertion)
  • Level 1: (1,15), t1 = -1 (deterministic insertion)
  • Level 2: (1,15), t2 = -1 (random insertion)
  • Level 3: empty, t3 = -1
If vp ≥ 1, insert (vp, t) into the sample (deterministic insertion).
If vp < 1, insert (1, t) into the sample w.p. vp (random insertion).
20

Algorithm for Sum
  • c = 18, W = 10, [c-W, c] = [8, 18]; element (3,16) arrives
  • Level 0: (3,16), (2,15), t0 = -1 (deterministic insertion)
  • Level 1: (1.5,16), (1,15), t1 = -1 (deterministic insertion)
  • Level 2: (1,16), (1,15), t2 = -1 (random insertion)
  • Level 3: (1,16), t3 = -1 (random insertion)
If vp ≥ 1, insert (vp, t) into the sample (deterministic insertion).
If vp < 1, insert (1, t) into the sample w.p. vp (random insertion).
21

Algorithm for Sum
  • c = 20, W = 10, [c-W, c] = [10, 20]; element (2,12) arrives
  • Level 0: (3,16), (2,15), (2,12), t0 = -1 (deterministic insertion)
  • Level 1: (1.5,16), (1,15), (1,12), t1 = -1 (deterministic insertion)
  • Level 2: (1,16), (1,15), t2 = -1
  • Level 3: (1,16), t3 = -1
If vp ≥ 1, insert (vp, t) into the sample (deterministic insertion).
If vp < 1, insert (1, t) into the sample w.p. vp (random insertion).
22

Algorithm for Sum
  • c = 22, W = 10, [c-W, c] = [12, 22]; element (3,11) arrives
  • (3,11) is out of the current window and is not inserted
  • Level 0: (3,16), (2,15), (2,12), t0 = -1
  • Level 1: (1.5,16), (1,15), (1,12), t1 = -1
  • Level 2: (1,16), (1,15), t2 = -1
  • Level 3: (1,16), t3 = -1
23

Algorithm for Sum
  • c = 22, W = 10, [c-W, c] = [12, 22]; element (2,19) arrives
  • Level 0: (2,19), (3,16), (2,15), (2,12), t0 = -1 (deterministic insertion)
  • Level 1: (1,19), (1.5,16), (1,15), (1,12), t1 = -1 (deterministic insertion)
  • Level 2: (1,16), (1,15), t2 = -1
  • Level 3: (1,16), t3 = -1
If vp ≥ 1, insert (vp, t) into the sample (deterministic insertion).
If vp < 1, insert (1, t) into the sample w.p. vp (random insertion).
24

Algorithm for Sum
  • c = 22, W = 10, [c-W, c] = [12, 22]
  • Level 0: (2,19), (3,16), (2,15), t0 = 12 after discarding (2,12)
  • Level 1: (1,19), (1.5,16), (1,15), t1 = 12 after discarding (1,12)
  • Level 2: (1,16), (1,15), t2 = -1
  • Level 3: (1,16), t3 = -1
  • ti: largest timestamp of all the elements discarded
    from the sample at level i
If vp ≥ 1, insert (vp, t) into the sample (deterministic insertion).
If vp < 1, insert (1, t) into the sample w.p. vp (random insertion).
25

Algorithm for Sum
  • c = 22, W = 10, [c-W, c] = [12, 22]
  • Level 0: (2,19), (3,16), (2,15), t0 = 12
  • Level 1: (1,19), (1.5,16), (1,15), t1 = 12
  • Level 2: (1,16), (1,15), t2 = -1
  • Level 3: (1,16), t3 = -1
  • Levels 0 and 1 overflowed
  • Use Level 2
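The walkthrough can be approximated in code. The following is a minimal Python sketch, not the authors' implementation, under several assumptions of mine: level i uses p = 1/2^i; an element moves to the next level by halving the weight it stored at the current level and re-applying the insertion rule; a level is usable for a window when its ti is smaller than the window's left edge; and the estimate is the in-window weight at the first usable level multiplied by 2^i. `Level` and `SumSketch` are hypothetical names.

```python
import heapq
import random
from typing import List, Optional, Tuple


class Level:
    """One sample: at most `capacity` (timestamp, weight) pairs, most recent kept."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.heap: List[Tuple[int, float]] = []  # min-heap keyed on timestamp
        self.discard_t = -1                      # t_i: largest discarded timestamp

    def insert(self, weight: float, t: int) -> None:
        heapq.heappush(self.heap, (t, weight))
        if len(self.heap) > self.capacity:
            old_t, _ = heapq.heappop(self.heap)  # drop the oldest element
            self.discard_t = max(self.discard_t, old_t)


class SumSketch:
    def __init__(self, num_levels: int, capacity: int, W: int, seed: int = 0) -> None:
        self.levels = [Level(capacity) for _ in range(num_levels)]
        self.W = W
        self.rng = random.Random(seed)

    def process(self, v: float, t: int, now: int) -> None:
        if t < now - self.W:                  # outside the largest window: ignore
            return
        weight = float(v)                     # level 0 has p = 1
        for level in self.levels:
            if weight >= 1:                   # deterministic insertion
                level.insert(weight, t)
            elif self.rng.random() < weight:  # random insertion of weight 1
                weight = 1.0
                level.insert(weight, t)
            else:
                return                        # failed at this level: stop
            weight /= 2                       # assumed chaining to the next level

    def query(self, now: int, w: int) -> Optional[float]:
        lo = now - w
        for i, level in enumerate(self.levels):
            if level.discard_t < lo:          # no in-window element was discarded
                total = sum(wt for t, wt in level.heap if lo <= t <= now)
                return total * (2 ** i)
        return None                           # every level overflowed


sk = SumSketch(num_levels=4, capacity=3, W=10)
arrivals = [(2, 15, 17), (3, 16, 18), (2, 12, 20), (3, 11, 22), (2, 19, 22)]
for v, t, now in arrivals:
    sk.process(v, t, now)

print([lvl.discard_t for lvl in sk.levels[:2]])  # levels 0 and 1 each discarded timestamp 12
print(sk.query(now=22, w=9))                     # window [13, 22]: level 0 is still usable
print(sk.query(now=22, w=10))                    # window [12, 22]: needs a deeper level
```

The last query depends on the random insertions at levels 2 and 3, so its value varies with the seed; the first two lines depend only on deterministic insertions.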
26
Algorithm Complexity
  • Space complexity
  • Time complexity
    • Expected time for processing each item
    • Worst case time for processing each item
    • Time for answering a query
  • Vmax: upper bound of the sum of all items within
    the sliding window
  • m: upper bound of the value of any single item

27
Union of Streams
  • Why union of streams?
[Figure: Alice observes Stream 1 and Bob observes Stream 2. In one setting the streams themselves are sent to Carol; in the other, Alice and Bob send sketch 1 and sketch 2 instead.]
  • Sketch forwarding reduces the message
    complexity.
  • Sketch is compact and lossless.

28
Union of Streams
  • Sketch of stream 1: (3,13), (2,9), (3,6)
  • Sketch of stream 2: (9,12), (7,10), (15,6)
  • Sketch of the union of streams 1 and 2: each sample keeps the 3 most recent items.
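Merging one sample from each sketch might look like the following. A minimal Python sketch, assuming the first three pairs on this slide belong to stream 1 and the last three to stream 2, that pairs are (weight, timestamp), and that the merged sample records the largest timestamp it discards; `merge_samples` is a hypothetical name, and any requirement on how the two sketches choose their random insertions is outside this illustration.

```python
from typing import List, Tuple

Sample = List[Tuple[float, int]]  # (weight, timestamp) pairs


def merge_samples(a: Sample, t_a: int, b: Sample, t_b: int,
                  capacity: int) -> Tuple[Sample, int]:
    """Merge two samples of the same level, keeping the `capacity` most recent items.

    Returns the merged sample and the largest discarded timestamp.
    """
    merged = sorted(a + b, key=lambda item: item[1], reverse=True)
    kept, dropped = merged[:capacity], merged[capacity:]
    discard_t = max([t_a, t_b] + [t for _, t in dropped])
    return kept, discard_t


s1 = [(3, 13), (2, 9), (3, 6)]
s2 = [(9, 12), (7, 10), (15, 6)]
union, t_union = merge_samples(s1, -1, s2, -1, capacity=3)
print(union, t_union)
```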
29
Proof
  • Deterministic insertion: accurate portion
  • Random insertion: 0-1 random variables, Hoeffding bound, error bounded

If vp ≥ 1, insert (vp, t) into the sample (deterministic insertion).
If vp < 1, insert (1, t) into the sample w.p. vp (random insertion).
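One way to read this outline, offered as a hedged sketch rather than the authors' proof: the stored weight X of an element (v, t) has expectation vp under either branch of the rule, and the randomly inserted elements are 0-1 variables to which the standard Hoeffding inequality applies. The symbols X, S, n and τ are notation introduced here for illustration.

```latex
% Stored weight X of element (v, t) in a sample with probability p
\[
  \mathbb{E}[X] =
  \begin{cases}
    vp, & vp \ge 1 \quad \text{(deterministic insertion: } X = vp\text{)} \\
    1 \cdot vp + 0 \cdot (1 - vp) = vp, & vp < 1 \quad \text{(random insertion: } X \in \{0, 1\}\text{)}
  \end{cases}
\]
% So (1/p) * (sum of stored weights) has expectation equal to the sum of the v's.

% Hoeffding's inequality for independent X_1, ..., X_n in [0, 1], with S = X_1 + ... + X_n
\[
  \Pr\bigl[\, |S - \mathbb{E}[S]| \ge \tau \,\bigr] \le 2 \exp\!\left( -\frac{2 \tau^2}{n} \right)
\]
```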
30
Conclusions
  • Aggregates on a sliding window over asynchronous
    streams
  • First algorithms for the sum and median
  • Distributed aggregation over the union of
    asynchronous streams

31
Future Work
  • Deterministic algorithm
  • Lower bounds

32
  • Thank You