Probabilistic Skyline Operator over Sliding Windows

Description:

Xuemin Lin, Ying Zhang, Wei Wang (UNSW & NICTA) Jeffrey Xu Yu (CUHK) Outline. Background ... Elements continuously arrive with occurrence probabilities ...

1
  • Probabilistic Skyline Operator over Sliding
    Windows

Wenjie Zhang, University of New South Wales & NICTA, Australia
Joint work with Xuemin Lin, Ying Zhang, Wei Wang
(UNSW & NICTA) and Jeffrey Xu Yu (CUHK)
2
Outline
  • Background
  • Framework
  • Algorithms
  • Experiment
  • Conclusion

3
Background
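Figure: stream elements with their occurrence probabilities, written as id (probability): 1 (0.1), 2 (0.1), 3 (0.4), 4 (0.8), 5 (0.1), 6 (0.5)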
  • Elements continuously arrive with occurrence
    probabilities
  • Problem: how to continuously compute skylines in
    a sliding window of size N (elements)?
  • Sliding window: N = 5

4
Background
  • Multi-criteria decision making regarding
    uncertain data
  • Online auction
  • Financial market

5
Related work
Probabilistic skyline computation
  • Probabilistic skyline (VLDB07)
  • Probabilistic reverse skyline (SIGMOD08)
Uncertain stream processing
  • Probabilistic aggregates and sketches over
    uncertain streams (SIGMOD07, SODA07, PODS07)
  • Frequent items on uncertain streams (SIGMOD08)
  • Top-k queries over uncertain sliding window
    (VLDB08)
6
Models and Problem Definition
  • Model: DS is a stream of elements; each element a
    is a point in a d-dimensional space and has an
    occurrence probability P(a) ∈ (0, 1)
  • The skyline probability of an element a is
    Psky(a) = P(a) × ∏ (1 - P(a')), where the product
    runs over the elements a' among the most recent N
    that dominate a
  • Problem definition: retrieve, from the most
    recent N elements, those whose skyline probability
    is no less than a given threshold q
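
The definition above can be checked by brute force on a small window. The following is a minimal sketch, not the authors' algorithm: it assumes that smaller values are better in every dimension, and the names `dominates`, `skyline_probability` and `q_skyline` as well as the sample points are illustrative only.

```python
from math import prod
from typing import List, Sequence, Tuple

# (point, occurrence probability); this layout is an assumption of the sketch
Element = Tuple[Tuple[float, ...], float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Assumed convention: a dominates b if a <= b in every dimension and a < b in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probability(i: int, window: Sequence[Element]) -> float:
    """P(a) times the probability that no dominating element of the window occurs."""
    point, p = window[i]
    return p * prod(1.0 - p_other for other, p_other in window if dominates(other, point))


def q_skyline(stream: Sequence[Element], n: int, q: float) -> List[int]:
    """Positions (within the last n elements) whose skyline probability is at least q."""
    window = stream[-n:]
    return [i for i in range(len(window)) if skyline_probability(i, window) >= q]


# Hypothetical coordinates: the first two points both dominate the third.
window = [((1.0, 3.0), 0.3), ((3.0, 1.0), 0.4), ((4.0, 4.0), 0.9)]
print(skyline_probability(2, window))  # about 0.378 = 0.9 * 0.7 * 0.6
print(q_skyline(window, n=3, q=0.35))
```

Recomputing every probability from scratch costs quadratic time per window, which is why the rest of the deck looks for what can be maintained incrementally.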

7
Challenges and Contributions
  • Space efficiency
  • Contribution: space reduced from O(N) to
    O(ln^{d-1} N)
  • Time efficiency
  • Contribution: efficient, R-tree-based incremental
    algorithms

8
Outline
  • Background and Preliminaries
  • Framework
  • Algorithms
  • Experiment
  • Conclusion

9
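Framework: what to keep?
Figure: elements with occurrence probabilities, written as id (probability): 1 (0.1), 2 (0.1), 3 (0.4), 4 (0.8), 5 (0.1)
Pold(2) = 1 - P(1)
Pnew(2) = (1 - P(3)) × (1 - P(4))
Pnew(2) < q, so element 2 will never become a
skyline point in the window
  • window size N = 5, probability threshold q = 0.5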
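
The pruning argument can be replayed numerically: the dominators that arrived after an element expire after it, so Pnew can only shrink while the element stays in the window. The sketch below is illustrative only; it assumes smaller values are better in every dimension, the coordinates are invented so that the dominance relations match the formulas above, and `split_probabilities` and `can_never_be_skyline` are hypothetical helper names.

```python
from math import prod
from typing import Sequence, Tuple

Element = Tuple[Tuple[float, ...], float]  # (point, occurrence probability)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Assumed convention: smaller is better in every dimension."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def split_probabilities(i: int, window: Sequence[Element]) -> Tuple[float, float]:
    """Return (Pold, Pnew) for window[i]; the window is ordered oldest first."""
    point, _ = window[i]
    p_old = prod(1.0 - p for pt, p in window[:i] if dominates(pt, point))
    p_new = prod(1.0 - p for pt, p in window[i + 1:] if dominates(pt, point))
    return p_old, p_new


def can_never_be_skyline(i: int, window: Sequence[Element], q: float) -> bool:
    """Psky <= Pnew and Pnew never grows, so Pnew < q rules the element out for good."""
    return split_probabilities(i, window)[1] < q


# Probabilities from the slide; coordinates are made up for illustration.
window = [
    ((4.0, 4.0), 0.1),  # element 1, dominates element 2
    ((5.0, 5.0), 0.1),  # element 2
    ((2.0, 4.5), 0.4),  # element 3, dominates element 2
    ((4.5, 2.0), 0.8),  # element 4, dominates element 2
    ((6.0, 1.0), 0.1),  # element 5, does not dominate element 2
]
p_old, p_new = split_probabilities(1, window)
print(p_old, p_new)  # about 0.9 and 0.12
print(can_never_be_skyline(1, window, q=0.5))
```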

10
Framework: what to keep?
  • Candidate set S_{N,q}: the elements of the window
    with Pnew no less than q
  • Correctness
  • (1) no missing skyline points
  • (2) no false hits to determine S_{N,q}
  • (3) no false positives to determine skyline
    results
  • (4) no false negatives to determine skyline
    results
  • Note: the probability computed from S_{N,q} may
    not be accurate, but it satisfies the threshold
    requirement.

11
Framework
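  • Space required for S_{N,q}
  • S_{N,q} is the minimum information to be maintained
    to get a correct answer.

Figure: elements 1, 2, 3 (0.9) and 4 (0.8); the two elements that dominate element 3 have probabilities 0.4 and 0.3.
Psky(3) = 0.9 × (1 - 0.4) × (1 - 0.3) = 0.378 < q
Psky(3) = 0.9 > q
window size N = 4, probability threshold q = 0.5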
12
Space of Candidate Set
  • Theorem: the candidate set requires
    poly-logarithmic space in the average case under
    uniform distributions, O(f(q) ln^{d-1} N).

13
Outline
  • Background and Preliminaries
  • Framework
  • Algorithms
  • Experiment
  • Conclusion

14
Algorithms
  • We maintain two R-trees
  • R1: SKY_{N,q}, the current skyline points
  • R2: S_{N,q} - SKY_{N,q}, the candidate skyline
    points
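
The split between the two trees comes down to two threshold tests per element. A minimal sketch of that decision follows; the function name `classify` and the string labels are illustrative assumptions, and no R-tree is involved.

```python
def classify(p_new: float, p_sky: float, q: float) -> str:
    """Decide where an element belongs, given its Pnew and Psky and the threshold q."""
    if p_new < q:
        return "discard"  # outside the candidate set: it can never reach q again
    if p_sky >= q:
        return "R1"       # currently a skyline point
    return "R2"           # candidate that may become a skyline point later


print(classify(p_new=1.0, p_sky=0.8, q=0.2))     # R1
print(classify(p_new=0.2, p_sky=0.12, q=0.2))    # R2
print(classify(p_new=0.18, p_sky=0.018, q=0.2))  # discard
```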

15
Algorithms
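Figure: running example with elements written as id(probability): 1(.1), 6(.8), 8(.2), 5(.8), 10(.2), 7(.6), 3(.4), 9(.5), 11(.6), 13(.1), 12(.1), 2(.1), 4(.1). The elements are split into R1 (SKY_{N,q}), R2 (S_{N,q} - SKY_{N,q}) and those not in S_{N,q}.
  • window size N = 13, probability threshold q = 0.2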

16
Algorithms
  • New element arrives
  • Check Psky and Pnew on R1
  • Check Pnew on R2
  • Handle elements with Pnew < q
  • Old element expires
  • Update Pold
  • Check Psky on R2
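
The steps above can be sketched with a plain list standing in for the two R-trees. This is a simplified illustration under stated assumptions, not the authors' implementation: smaller values are better in every dimension, every occurrence probability is strictly below 1 (so dividing by 1 - P is safe), and the class `CandidateSet`, its methods and the sample coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Assumed convention: smaller is better in every dimension."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


@dataclass
class Entry:
    seq: int            # arrival position in the stream
    point: Point
    p: float            # occurrence probability, assumed < 1
    p_old: float = 1.0  # product over stored older dominators
    p_new: float = 1.0  # product over newer dominators

    @property
    def p_sky(self) -> float:
        return self.p * self.p_old * self.p_new


class CandidateSet:
    def __init__(self, n: int, q: float) -> None:
        self.n, self.q = n, q
        self.entries: List[Entry] = []  # stored candidates, oldest first
        self.arrivals = 0

    def _remove(self, victim: Entry) -> None:
        """Drop an element and undo its factor in Pold of newer elements it dominates."""
        self.entries.remove(victim)
        for e in self.entries:
            if e.seq > victim.seq and dominates(victim.point, e.point):
                e.p_old /= 1.0 - victim.p

    def insert(self, point: Point, p: float) -> None:
        self.arrivals += 1
        # New element arrives: lower Pnew of everything it dominates.
        for e in self.entries:
            if dominates(point, e.point):
                e.p_new *= 1.0 - p
        # Handle elements with Pnew < q: they leave the candidate set.
        for e in [e for e in self.entries if e.p_new < self.q]:
            self._remove(e)
        # Old element expires (if it is still stored): update Pold.
        if self.entries and self.entries[0].seq <= self.arrivals - self.n:
            self._remove(self.entries[0])
        # Insert the new element with Pnew = 1 and compute its Pold.
        new = Entry(self.arrivals, point, p)
        for e in self.entries:
            if dominates(e.point, point):
                new.p_old *= 1.0 - e.p
        self.entries.append(new)

    def skyline(self) -> List[int]:
        """Arrival numbers of stored elements with Psky >= q."""
        return [e.seq for e in self.entries if e.p_sky >= self.q]


cs = CandidateSet(n=5, q=0.5)
for point, p in [((4.0, 4.0), 0.1), ((5.0, 5.0), 0.1), ((2.0, 4.5), 0.4),
                 ((4.5, 2.0), 0.8), ((6.0, 1.0), 0.1)]:
    cs.insert(point, p)
print([e.seq for e in cs.entries], cs.skyline())
```

With these made-up coordinates the second element is dropped as soon as the fourth arrives, because its Pnew falls to about 0.12. Every insertion scans the whole list; the deck's R-trees exist to avoid exactly that scan.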

17
Algorithms: new element arrives
Delete an entry
Figure: R1 (SKY_{N,q}) and R2 (S_{N,q} - SKY_{N,q}) holding elements 6(.8), 8(.2), 5(.8), 10(.2), 7(.6), 3(.4), 9(.5), 11(.6), 13(.1), 12(.1), 2(.1), 4(.1); the new element is 14(0.8).
Before update: Pnew (min, max) = (1, 1), Psky (min, max) = (0.8, 0.8), global Pnew = 1 - 0.2
After update: global Pnew × (1 - 0.8); delete the entry from R1
  • window size N = 13, probability threshold q = 0.2

18
Algorithms: new element arrives
Move an entry from R1 to R2
Figure: R1 (SKY_{N,q}) and R2 (S_{N,q} - SKY_{N,q}) holding elements 8(.2), 10(.2), 7(.6), 3(.4), 9(.5), 11(.6), 13(.1), 12(.1), 2(.1), 4(.1); the new element is 14(0.8).
Before update: Pnew (min, max) = (1, 1), Psky (min, max) = (0.24, 0.6), global Pnew = 1
After update: global Pnew = 1 - 0.8 = 0.2, so min Pnew = 0.2 ≥ q and max Psky = 0.12 < q; move the entry from R1 to R2
  • window size N = 13, probability threshold q = 0.2

19
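Algorithms: new element arrives
Drill down and delete element 2
Figure: R1 (SKY_{N,q}) and R2 (S_{N,q} - SKY_{N,q}) holding elements 8(.2), 10(.2), 7(.6), 3(.4), 9(.5), 11(.6), 13(.1), 12(.1), 2(.1), 4(.1); the new element is 14(0.8).
Before update: Pnew (min, max) = (0.9, 1), global Pnew = 1
After update: global Pnew = 1 - 0.8 = 0.2, so min Pnew < q and max Pnew ≥ q; drill down into the entry and delete element 2
  • window size N = 13, probability threshold q = 0.2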

20
Algorithms: new element arrives
Update Pold
Figure: R1 (SKY_{N,q}) and R2 (S_{N,q} - SKY_{N,q}) holding elements 8(.2), 10(.2), 7(.6), 3(.4), 9(.5), 11(.6), 13(.1), 12(.1), 2(.1), 4(.1); the new element is 14(0.8).
Update Pold of 12 and 13: global Pold / (1 - 0.1)
  • window size N = 13, probability threshold q = 0.2

21
Algorithms: new element arrives
Insert the new element: Pnew = 1; compute Psky
Figure: R1 (SKY_{N,q}) and R2 (S_{N,q} - SKY_{N,q}) holding elements 8(.2), 10(.2), 7(.6), 3(.4), 9(.5), 11(.6), 13(.1), 12(.1), 4(.1), 14(0.8).
  • window size N = 13, probability threshold q = 0.2

22
Algorithms: old element expires
  • Delete it from R1 or R2.
  • Update Pold of remaining elements
  • Record global Pold on intermediate entries fully
    dominated by it
  • Check Psky after update
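
Recording a global Pold on an intermediate entry is a form of lazy propagation: the factor is stored once on the entry and only pushed to the children when they are visited. The sketch below illustrates that idea on a toy two-level tree. The `Leaf` and `Node` classes, the function `expire_dominator` and the starting Pold values are assumptions made for the example, not the deck's R-tree structure.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Leaf:
    name: str
    p_old: float


@dataclass
class Node:
    children: List[Union["Node", Leaf]] = field(default_factory=list)
    global_p_old: float = 1.0  # pending factor that applies to the whole subtree

    def scale(self, factor: float) -> None:
        """Constant-time update: record the factor instead of touching every leaf."""
        self.global_p_old *= factor

    def push_down(self) -> None:
        """Hand the pending factor to the children, then reset it."""
        for child in self.children:
            if isinstance(child, Node):
                child.global_p_old *= self.global_p_old
            else:
                child.p_old *= self.global_p_old
        self.global_p_old = 1.0


def expire_dominator(entry: Node, p_expired: float) -> None:
    """An element with probability p_expired (< 1) that dominated the whole entry leaves."""
    entry.scale(1.0 / (1.0 - p_expired))


# Hypothetical entry whose two leaves each carried the factor (1 - 0.1) in Pold.
entry = Node([Leaf("12", 0.9), Leaf("13", 0.9)])
expire_dominator(entry, 0.1)  # global Pold / (1 - 0.1)
entry.push_down()
print([round(leaf.p_old, 6) for leaf in entry.children])
```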

23
Algorithms: old element expires
Figure: R1 (SKY_{N,q}) and R2 (S_{N,q} - SKY_{N,q}) holding elements 8(.2), 10(.2), 7(.6), 3(.4), 9(.5), 11(.6), 13(.1), 12(.1), 4(.1), 14(0.8).
Pold(7) / (1 - P(3))
global Pold / (1 - P(4))
  • window size N = 13, probability threshold q = 0.2

24
Algorithms: handling multiple thresholds
  • Continuous queries
  • Users specify k probability thresholds q_1, ..., q_k
    (q_i < q_{i-1})
  • Solution: instead of maintaining R1, we maintain
    R1, ..., Rk, each corresponding to a confidence
    value.
  • Ad-hoc queries
  • Users issue a query: retrieve skylines with
    probability at least q (q ≥ q_k)
  • Solution: find an R_i with q_i ≤ q < q_{i-1}. Then
    all elements in R_j (j < i - 1) are results. We
    search R_{i-1} to output the qualified skylines
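
The ad-hoc lookup can be illustrated with plain lists in place of the trees. The sketch below makes its own indexing assumption, which may differ from the numbering of the R-trees on the slide: thresholds are given in decreasing order, band 0 holds the elements with Psky at or above the largest threshold, and band i holds those between thresholds[i] and thresholds[i-1]. The names `build_bands` and `ad_hoc_query` and the sample probabilities are hypothetical.

```python
from typing import Dict, List, Sequence


def build_bands(p_sky: Dict[str, float], thresholds: Sequence[float]) -> List[List[str]]:
    """Place each element in the first band whose threshold it reaches."""
    bands: List[List[str]] = [[] for _ in thresholds]
    for name, value in p_sky.items():
        for i, t in enumerate(thresholds):
            if value >= t:
                bands[i].append(name)
                break  # elements below the smallest threshold are not stored here
    return bands


def ad_hoc_query(bands: List[List[str]], p_sky: Dict[str, float],
                 thresholds: Sequence[float], q: float) -> List[str]:
    """Return the elements with Psky >= q, for any q at or above the smallest threshold."""
    if q < thresholds[-1]:
        raise ValueError("q must be at least the smallest maintained threshold")
    i = next(idx for idx, t in enumerate(thresholds) if t <= q)
    result = [name for band in bands[:i] for name in band]  # whole bands qualify
    result += [name for name in bands[i] if p_sky[name] >= q]  # only this band is searched
    return result


p_sky = {"a": 0.9, "b": 0.6, "c": 0.45, "d": 0.25, "e": 0.1}
thresholds = [0.8, 0.5, 0.2]
bands = build_bands(p_sky, thresholds)
print(bands)
print(ad_hoc_query(bands, p_sky, thresholds, q=0.4))
```

Only one band is filtered element by element; every band above it is returned wholesale, which is the saving the slide describes.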

25
Experiment
  • Data set
  • Real: stock transactions, 2-d, probabilities
    assigned randomly, size 2 million
  • Synthetic: spatial location (independent or
    anti-correlated), probability (uniform or
    normal), 2-d to 5-d, size 2 million
  • Default values: p = 0.3, d = 3, N = 1M, spatial
    distribution anti-correlated, probability
    distribution uniform

26
Experiment: space
  • About 0.1% of the sliding window size for 2-d
    data; saves around 89% of the space even for
    5-d data.

27
Experiment: space
  • Size of S_{N,q} decreases with the increase of Pu,
    while size of SKY_{N,q} increases with it.

28
Experiment: space
29
Experiment: time
30
Experiment: time
  • Maintenance time increases with the probability
    threshold; query time decreases with it.

31
Conclusion
  • We characterize a candidate set of minimum size
    and propose time-efficient techniques.
  • We extend the framework to handle multiple
    thresholds.

32
  • Thanks !