Finding k-Dominant Skylines In High Dimensional Space

1
Finding k-Dominant Skylines In High Dimensional
Space
  • Chee-Yong Chan (1), H.V. Jagadish (2),
  • Kian-Lee Tan (1), Anthony K.H. Tung (1) and
  • Zhenjie Zhang (1)
  • (1) National University of Singapore
  • (2) University of Michigan

2
Outline
  • Motivation
  • k-Dominant Skyline
  • Properties
  • Algorithms for k-dominant skyline query
  • Experimental Results
  • Conclusion

3
Definition of Skyline
  • A point p dominates another point q if
  • p is not worse than q in all dimensions
  • p is better than q in at least one dimension
  • The skyline of a data set D contains all the
    points not dominated by any other point
  • Assumption: smaller value is better
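The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming points are tuples of numbers where smaller is better; the names `dominates` and `skyline` and the hotel values are hypothetical and do not come from the slides.

```python
def dominates(p, q):
    """Return True if p dominates q (smaller values are better)."""
    not_worse_everywhere = all(a <= b for a, b in zip(p, q))
    better_somewhere = any(a < b for a, b in zip(p, q))
    return not_worse_everywhere and better_somewhere


def skyline(points):
    """Return the points not dominated by any other point (naive O(n^2) check)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels as (price, distance to beach).
hotels = [(50, 8), (80, 9), (60, 5), (100, 1), (120, 3)]
result = skyline(hotels)
```

In this made-up data set, (80, 9) is dominated by (50, 8) and (120, 3) is dominated by (100, 1), so only the other three hotels remain in the skyline.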

4
Definition of Skyline
  • Example: Hotel (price, distance to beach)
  • A dominates B, C dominates B, etc.

[Figure: hotels A to F plotted by price of the hotel and distance to beach, with the skyline marked]
5
Advantages of Skyline
  • Insensitive to scaling and shifting of the
    dimensions
  • Movie Rating
  • Different users may have different rating
    preferences
  • Meaningful for incomparable dimensions
  • Cell Phone Finder
  • Weight, size, standby time, price, etc.

6
High Dimension Problem
  • Too many skyline points in high dimensional
    spaces
  • Point A cannot dominate point B as long as B is
    better than A in one dimension
  • Example: NBA data set, 17,000 player-season
    statistics on 17 attributes
  • Over 1000 skyline points in the full space
  • Some average-skilled players are in the skyline
    if they are not bad on some attributes.

7
High Dimension Problem
  • Possible Solutions
  • Dimension Reduction Techniques
  • Requires domain knowledge
  • Subspace Skylines
  • Many subspaces need to be explored
  • Relax the notion of dominance
  • The idea of this paper

8
Outline
  • Motivation
  • k-Dominant Skyline
  • Properties
  • Algorithms for k-dominant skyline query
  • Experimental Results
  • Conclusion

9
k-Dominant Skyline
  • k-Dominate
  • If A is not worse than B on k dimensions, and
    better on at least one of those k dimensions, we
    say A k-dominates B.
  • k-Dominant Skyline
  • k-dominant skyline contains all the points that
    cannot be k-dominated by any other point
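A minimal sketch of these two definitions, assuming tuples of numbers with smaller values being better and 1 <= k <= d. The function names and the sample data are hypothetical. Because any strictly better dimension is also a "not worse" dimension, counting the "not worse" dimensions is enough to decide k-dominance.

```python
def k_dominates(p, q, k):
    """Return True if p k-dominates q (smaller values are better).

    p must be not worse than q on at least k dimensions and strictly
    better on at least one of them.
    """
    not_worse = sum(1 for a, b in zip(p, q) if a <= b)
    better = any(a < b for a, b in zip(p, q))
    return not_worse >= k and better


def k_dominant_skyline(points, k):
    """Return the points that no other point k-dominates (naive O(n^2) check)."""
    return [p for p in points if not any(k_dominates(q, p, k) for q in points)]


# Hypothetical 3-dimensional data.
data = [(1, 1, 4), (2, 3, 1), (3, 2, 2), (4, 4, 3)]
full = k_dominant_skyline(data, 3)     # k = d: the conventional skyline
relaxed = k_dominant_skyline(data, 2)  # relaxed dominance keeps fewer points
```

On this data the conventional skyline holds the first three points, while the 2-dominant skyline keeps only (1, 1, 4), which beats each of the others on two of the three dimensions.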

10
k-Dominant Skyline
  • k-Dominant Skyline Query
  • Given a data set, find the k-dominant skyline
  • When k = d, we have the conventional skyline

11
k-Dominant Skyline
  • Example

[Figure: example data with the conventional skyline, the 5-dominant skyline and the 4-dominant skyline marked]
  • Smaller k, smaller k-dominant skyline
12
Outline
  • Motivation
  • k-Dominant Skyline
  • Properties
  • Algorithms for k-dominant skyline query
  • Experimental Results
  • Conclusion

13
k-Dominance can be cyclic
  • A 3-dominates B

14
k-Dominance can be cyclic
  • B 3-dominates C

15
k-Dominance can be cyclic
  • C 3-dominates D

16
k-Dominance can be cyclic
  • D 3-dominates A
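The point values used on these slides did not survive in the transcript, so the following sketch uses hypothetical 4-dimensional points (smaller is better) that reproduce the same cycle. Each point is a rotation of the previous one, so each beats its successor on exactly three dimensions.

```python
def k_dominates(p, q, k):
    """Return True if p k-dominates q (smaller values are better)."""
    not_worse = sum(1 for a, b in zip(p, q) if a <= b)
    better = any(a < b for a, b in zip(p, q))
    return not_worse >= k and better


# Hypothetical points; not the values shown on the original slides.
A, B, C, D = (1, 2, 3, 4), (2, 3, 4, 1), (3, 4, 1, 2), (4, 1, 2, 3)

cycle = [
    k_dominates(A, B, 3),
    k_dominates(B, C, 3),
    k_dominates(C, D, 3),
    k_dominates(D, A, 3),
]
```

All four checks hold, so every point in this small set is 3-dominated by another one and its 3-dominant skyline is empty. This also shows that k-dominance is not transitive: A 3-dominates B and B 3-dominates C, yet A does not 3-dominate C.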

17
Dominance Region
  • Dominance Region
  • The dominance region of a point set P is the region
    of space S that is k-dominated by at least one
    point in P.
  • Example
  • 2-dimensional space and 1-dominance

18
Dominance Region
  • Question
  • What is the smallest subset P of D that has
    the same dominance region as D?
  • Knowing P, we can find the k-dominant skyline by
    comparing every point against points in P
  • Answer
  • P = conventional skyline
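One way to use this property is sketched below, assuming tuples with smaller values being better; the function names and data are hypothetical. The sketch first computes the conventional skyline and then tests k-dominance only against those points. It also restricts the candidates to skyline points, since a point that is dominated in the conventional sense is k-dominated for every k up to d.

```python
def dominates(p, q):
    """Conventional dominance (smaller values are better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def k_dominates(p, q, k):
    """Return True if p k-dominates q."""
    not_worse = sum(1 for a, b in zip(p, q) if a <= b)
    better = any(a < b for a, b in zip(p, q))
    return not_worse >= k and better


def conventional_skyline(points):
    return [p for p in points if not any(dominates(q, p) for q in points)]


def k_dominant_via_skyline(points, k):
    """k-dominant skyline, comparing only against conventional skyline points."""
    sky = conventional_skyline(points)
    return [p for p in sky if not any(k_dominates(s, p, k) for s in sky)]


# Hypothetical 3-dimensional data.
data = [(1, 1, 4), (2, 3, 1), (3, 2, 2), (4, 4, 3)]
via_skyline = k_dominant_via_skyline(data, 2)
```

The saving comes from the second step: each surviving point is compared against the skyline only, not against the whole data set.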

19
Outline
  • Motivation
  • k-Dominant Skyline and its Variants
  • Properties
  • Algorithms for k-dominant skyline query
  • Experimental Results
  • Conclusion

20
Algorithms Outline
  • One Scan Algorithm (OSA)
  • Scan the whole data set once
  • Two Scan Algorithm (TSA)
  • Scan the whole data set twice
  • Sorted Retrieval Algorithm (SRA)
  • Exploit the sorting result on every dimension

21
One-Scan Algorithm (OSA)
  • Scan the data set one time
  • Candidate Buffer
  • Contains potential k-dominant skyline points
  • Pruning Buffer
  • Contains conventional skyline points
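The two buffers can be sketched as follows. This is an illustrative reading of the slide, not the authors' exact procedure: it assumes smaller values are better, keeps every conventional skyline point seen so far in the pruning buffer (including points that are also candidates), and uses hypothetical function names and data.

```python
def dominates(p, q):
    """Conventional dominance (smaller values are better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def k_dominates(p, q, k):
    """Return True if p k-dominates q."""
    not_worse = sum(1 for a, b in zip(p, q) if a <= b)
    better = any(a < b for a, b in zip(p, q))
    return not_worse >= k and better


def one_scan(points, k):
    candidates = []  # potential k-dominant skyline points
    pruning = []     # conventional skyline of the points seen so far
    for p in points:
        if any(dominates(s, p) for s in pruning):
            # p is dominated, hence k-dominated; whatever p could prune is
            # already pruned by the skyline point that dominates p.
            continue
        # p is a skyline point so far: drop what it makes obsolete.
        pruning = [s for s in pruning if not dominates(p, s)]
        candidates = [c for c in candidates if not k_dominates(p, c, k)]
        if not any(k_dominates(s, p, k) for s in pruning):
            candidates.append(p)
        pruning.append(p)
    return candidates


# Hypothetical data: a 3-dimensional set and a 4-dimensional cyclic set.
data = [(1, 1, 4), (2, 3, 1), (3, 2, 2), (4, 4, 3)]
cyclic = [(1, 2, 3, 4), (2, 3, 4, 1), (3, 4, 1, 2), (4, 1, 2, 3)]
```

The pruning buffer is what makes a single pass sufficient: because k-dominance is not transitive, a point that has lost its candidate status can still be the only one able to eliminate a later arrival, so it has to stay available for comparison.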

22
One Scan Algorithm (OSA)
  • Candidate Buffer: p1
  • Pruning Buffer: p1
23
One Scan Algorithm (OSA)
  • Candidate Buffer: p1 p2
  • Pruning Buffer: p1 p2
24
One Scan Algorithm (OSA)
  • Candidate Buffer: p1 p2
  • Pruning Buffer: p1 p2 p3
25
One Scan Algorithm (OSA)
  • Candidate Buffer: p1 p2
  • Pruning Buffer: p1 p2 p3
26
Two Scan Algorithm (TSA)
  • Maintain candidate buffer only
  • Invoke a second scan to remove any false positives
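A minimal sketch of the two passes, assuming smaller values are better; the function names and data are hypothetical, and the second pass here simply rechecks every candidate against the whole data set without any of the optimizations a real implementation might apply.

```python
def k_dominates(p, q, k):
    """Return True if p k-dominates q (smaller values are better)."""
    not_worse = sum(1 for a, b in zip(p, q) if a <= b)
    better = any(a < b for a, b in zip(p, q))
    return not_worse >= k and better


def first_scan(points, k):
    """Build the candidate buffer; it may contain false positives."""
    candidates = []
    for p in points:
        p_is_dominated = any(k_dominates(c, p, k) for c in candidates)
        candidates = [c for c in candidates if not k_dominates(p, c, k)]
        if not p_is_dominated:
            candidates.append(p)
    return candidates


def two_scan(points, k):
    candidates = first_scan(points, k)
    # Second scan: a candidate may be k-dominated by a point that had
    # already left the buffer when the candidate arrived.
    return [c for c in candidates if not any(k_dominates(q, c, k) for q in points)]


# Hypothetical cyclic data: every point is 3-dominated by another one.
cyclic = [(1, 2, 3, 4), (2, 3, 4, 1), (3, 4, 1, 2), (4, 1, 2, 3)]
after_first = first_scan(cyclic, 3)
after_second = two_scan(cyclic, 3)
```

On the cyclic data the first scan ends with (3, 4, 1, 2) in the buffer, because the point that 3-dominates it, (2, 3, 4, 1), was rejected before it arrived. The second scan removes this false positive.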

27
Sorted Retrieval Algorithm (SRA)
  • P_i: list of points sorted in non-descending
    order of the i-th dimension
  • Maintain two buffers
  • Candidate buffer C of potential k-dominant
    points
  • Each point p in C is associated with a counter
    count(p)
  • Buffer B of k-dominant points
  • Iteratively choose a dimension i and consider
    the top unprocessed points T_i in P_i (those
    with the same value)
  • Use T_i to prune points in C
  • If p in T_i is in C and count(p) = d-k+1, then
    move p to B
  • Increment count(p)
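The steps above can be sketched as follows. Several details are assumptions rather than statements from the slide: the candidate buffer C starts with all points, dimensions are visited round-robin, count(p) is incremented before it is tested (so it equals the number of sorted lists in which p has been retrieved), and all names and data are hypothetical. Smaller values are better.

```python
def k_dominates(p, q, k):
    """Return True if p k-dominates q (smaller values are better)."""
    not_worse = sum(1 for a, b in zip(p, q) if a <= b)
    better = any(a < b for a, b in zip(p, q))
    return not_worse >= k and better


def sorted_retrieval(points, k):
    if not points:
        return []
    n, d = len(points), len(points[0])
    if not 1 <= k <= d:
        raise ValueError("k must be between 1 and the number of dimensions")
    # P_i: point indices in non-descending order of dimension i.
    lists = [sorted(range(n), key=lambda j, dim=dim: points[j][dim])
             for dim in range(d)]
    position = [0] * d          # next unprocessed entry of each P_i
    count = [0] * n             # count(p): lists in which p has been retrieved
    candidates = set(range(n))  # buffer C (assumed to start with all points)
    confirmed = set()           # buffer B
    used = set()                # points already used for pruning
    i = 0
    while candidates:
        if position[i] == n:    # P_i is exhausted, try the next dimension
            i = (i + 1) % d
            continue
        # T_i: the next unprocessed points sharing the same value in dimension i.
        start = position[i]
        value = points[lists[i][start]][i]
        end = start
        while end < n and points[lists[i][end]][i] == value:
            end += 1
        batch = lists[i][start:end]
        position[i] = end
        # Use T_i to prune C (each point needs to be used only once).
        for j in batch:
            if j not in used:
                used.add(j)
                candidates = {c for c in candidates
                              if not k_dominates(points[j], points[c], k)}
        # A survivor retrieved in d-k+1 lists can no longer be k-dominated.
        for j in batch:
            count[j] += 1
            if j in candidates and count[j] == d - k + 1:
                candidates.remove(j)
                confirmed.add(j)
        i = (i + 1) % d
    return [points[j] for j in sorted(confirmed)]


# Hypothetical 3-dimensional data.
data = [(1, 1, 4), (2, 3, 1), (3, 2, 2), (4, 4, 3)]
```

The threshold works because a point q that k-dominates p must be not worse than p on at least k dimensions. Once p has been retrieved in d-k+1 lists, at least one of those k dimensions is among them, so q was retrieved no later than p in that list and has already been used for pruning.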

28
Sorted Retrieval Algorithm (SRA)
4-dominant skyline query
29
Sorted Retrieval Algorithm (SRA)
4-dominant skyline query
30
Sorted Retrieval Algorithm (SRA)
4-dominant skyline query
31
Sorted Retrieval Algorithm (SRA)
4-dominant skyline query
32
Other Queries
  • Top-N Skyline Query
  • Returns the set S of k-dominant skyline points,
    where |S| > N and k is the smallest such value
  • It is more natural for the user to set N than k
  • w-Dominant Skyline Query
  • Not all dimensions are equally important
  • Every dimension has a weight assignment
  • Point A w-dominates B if A can dominate B on a
    set of dimensions whose weighted sum is larger
    than w.
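Both query variants can be sketched on top of a naive k-dominance check. The sketch assumes smaller values are better and that all weights are positive, in which case it is enough to sum the weights of every dimension on which A is not worse than B. The function names, weights and data are hypothetical.

```python
def k_dominates(p, q, k):
    """Return True if p k-dominates q (smaller values are better)."""
    not_worse = sum(1 for a, b in zip(p, q) if a <= b)
    better = any(a < b for a, b in zip(p, q))
    return not_worse >= k and better


def k_dominant_skyline(points, k):
    return [p for p in points if not any(k_dominates(q, p, k) for q in points)]


def top_n_skyline(points, n):
    """Return (k, skyline) for the smallest k whose skyline has more than n points."""
    d = len(points[0])
    for k in range(1, d + 1):
        result = k_dominant_skyline(points, k)
        if len(result) > n:
            return k, result
    return None  # even the conventional skyline has at most n points


def w_dominates(p, q, weights, w):
    """Return True if p w-dominates q, assuming positive weights."""
    weight_not_worse = sum(wt for a, b, wt in zip(p, q, weights) if a <= b)
    better = any(a < b for a, b in zip(p, q))
    return better and weight_not_worse > w


# Hypothetical 3-dimensional data and integer weights.
data = [(1, 1, 4), (2, 3, 1), (3, 2, 2), (4, 4, 3)]
weights = (5, 3, 2)
```

For example, `top_n_skyline(data, 1)` has to go up to k = 3 before more than one point qualifies, and (1, 1, 4) w-dominates (2, 3, 1) for w = 7 because the first two dimensions carry a combined weight of 8.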

33
Outline
  • Motivation
  • k-Dominant Skyline and its Variants
  • Properties
  • Algorithms for k-dominant skyline query
  • Experimental Results
  • Conclusion

34
Experimental Study
  • Data Sets
  • Synthetic Data Sets
  • Correlated, Independent and Anti-Correlated
  • Real Data Sets
  • NBA data set and MovieLens data set
  • Query Types
  • k-Dominant Skyline Query
  • Top-N Skyline Query
  • w-Dominant Skyline Query

35
Efficiency Test
  • k-dominant query efficiency on varying k

36
Real Data Sets
  • Combine top-N and w-Dominant Query
  • Top-5 w-Dominant Skyline Query Result on NBA data
    set with three different weight assignments

37
Conclusion
  • Introduced k-dominant skyline and its variants
  • Analyzed the properties of k-dominant skyline
  • Proposed three algorithms
  • Demonstrated effectiveness with empirical study

38
Question and Answer