Title: Handling Worst-Case in Skyline
1. Handling Worst-Case in Skyline
- Romil Jain
2. Introduction (Hotel Example)
Query: Find hotels that are best on stars,
distance, and price.
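3. Formal Definition
In simple words, a skyline is the set of all
non-dominated tuples.
- $T$: the set of tuples
- $n$: the number of tuples
- $k$: the number of dimensions (or columns) of each tuple
- $t_i$: the value of tuple $t$ on dimension $i$
For $t, t' \in T$, $t$ dominates $t'$ iff
$\forall i \in 1..k,\ t_i \ge t'_i \;\wedge\; \exists i \in 1..k,\ t_i > t'_i$.
The skyline of $T$ is
$\{\, t \in T \mid \neg\exists\, t' \in T,\ t' \text{ dominates } t \,\}$.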
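The definition can be made concrete with a small sketch. It assumes that larger values are better on every dimension (so attributes such as price or distance would have to be negated or inverted first); the names `dominates` and `naive_skyline` are illustrative and do not come from the slides. This is the quadratic baseline, not one of the algorithms discussed later.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(t: Sequence[float], u: Sequence[float]) -> bool:
    """Return True if t dominates u, assuming larger is better.

    t must be at least as good as u on every dimension and
    strictly better on at least one.
    """
    at_least_as_good = all(a >= b for a, b in zip(t, u))
    strictly_better = any(a > b for a, b in zip(t, u))
    return at_least_as_good and strictly_better


def naive_skyline(points: List[Point]) -> List[Point]:
    """Keep every point that no other point dominates (O(n^2) comparisons)."""
    return [t for t in points if not any(dominates(u, t) for u in points)]


if __name__ == "__main__":
    sample = [(1, 6), (3, 3), (2, 2), (6, 1), (1, 1)]
    print(naive_skyline(sample))
```

A point never dominates itself under this definition, so the inner loop does not need to skip `t`.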
4. Previous Results [GSG07]
Other prominent algorithms: FLET, LD&C, SD&C,
BBS, NN
5. Research Motivation
- Create a skyline algorithm that
- Improves worst-case complexity
- Is external (i.e., with low I/O cost)
Note: "vector", "point", and "tuple" all mean the same thing.
6. SFS [CGGL03]
[Figure: plot with x and y axes, labelled P]
- Basic approach
  - Sort the input with some topological function F
    (e.g., volume), so that no point is dominated by
    a point sorted after it.
  - Compare all points with those with the highest F,
    using a window.
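The basic approach above can be sketched in a few lines. This is a simplified in-memory illustration, not the external algorithm from [CGGL03]: it assumes larger is better, strictly positive values (so that volume, the product of the coordinates, strictly decreases from a point to anything it dominates), and an unbounded window. The name `sfs_skyline` is hypothetical.

```python
from math import prod
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(t: Sequence[float], u: Sequence[float]) -> bool:
    """True if t dominates u, assuming larger is better."""
    return (all(a >= b for a, b in zip(t, u))
            and any(a > b for a, b in zip(t, u)))


def sfs_skyline(points: List[Point]) -> List[Point]:
    """Sort by descending volume, then filter through a window.

    With positive coordinates, a dominating point always has a
    strictly larger volume, so it is seen before anything it
    dominates. Each point therefore only needs to be checked
    against the window of skyline points found so far.
    """
    ordered = sorted(points, key=prod, reverse=True)
    window: List[Point] = []
    for p in ordered:
        if not any(dominates(w, p) for w in window):
            window.append(p)  # p is confirmed as a skyline point
    return window


if __name__ == "__main__":
    print(sfs_skyline([(1, 6), (3, 3), (2, 2), (6, 1), (1, 1)]))
```

Because of the sort order, a point that enters the window is never removed later, which is what makes a single scan sufficient in this sketch.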
7. Z-Order [LZLL07]
- Z-address: interleave the bits of all values. E.g.,
<1, 6> → 010110 (1 = 001 and 6 = 110 in binary)
- Assign a Z-address to all vectors, store them in
a B tree (SRC).
- Maintain another B tree for storing skylines
(SKY).
- Compare points from SRC with points from SKY.
Update SKY.
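The bit interleaving behind a Z-address can be sketched as follows. The function name `z_address` and the `bits` parameter (the number of bits per value) are assumptions made for illustration; the tree structures SRC and SKY are not modelled here.

```python
from typing import Sequence


def z_address(values: Sequence[int], bits: int) -> str:
    """Interleave the bits of all values, most significant bit first.

    For each bit position (from high to low) the bit of every
    dimension is emitted in turn, giving a string of
    len(values) * bits binary digits.
    """
    out = []
    for b in range(bits - 1, -1, -1):
        for v in values:
            out.append(str((v >> b) & 1))
    return "".join(out)


if __name__ == "__main__":
    # 1 = 001 and 6 = 110, interleaved as 0,1 | 0,1 | 1,0
    print(z_address((1, 6), bits=3))
```

Interpreting the result as a binary number (for example with `int(z, 2)`) gives a key that can be used for ordering points along the Z-curve.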
8. Key Lessons
- Scan-based algorithms use less I/O.
- Sorting with a topological function F, and
comparing the input with the vectors with the highest F,
can eliminate a large number of vectors.
- Pair-wise comparisons can be reduced by
partitioning input into regions, and comparing
regions first.
9. A Framework
[Figure: plot with x and y axes]
- Partition the points into cells according to
some strategy.
- Compare cells with one another to eliminate dominated
cells.
- Compare points only from dependent cells.
The key idea is that the number of cells is much lower
than the number of points.
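One possible instance of this framework is sketched below. The slides leave the partitioning strategy open, so the uniform grid used here is purely an assumption, as are the names `grid_skyline`, `lo`, `hi`, and `g` (cells per dimension). Larger values are taken to be better, and all coordinates are assumed to lie in `[lo, hi]`.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Cell = Tuple[int, ...]


def dominates(t: Sequence[float], u: Sequence[float]) -> bool:
    """True if t dominates u, assuming larger is better."""
    return (all(a >= b for a, b in zip(t, u))
            and any(a > b for a, b in zip(t, u)))


def grid_skyline(points: List[Point], lo: float, hi: float, g: int) -> List[Point]:
    """Skyline via a uniform grid with g cells per dimension."""
    width = (hi - lo) / g
    cells: Dict[Cell, List[Point]] = defaultdict(list)
    for p in points:
        idx = tuple(min(int((v - lo) / width), g - 1) for v in p)
        cells[idx].append(p)

    # Step 1: a cell is eliminated if some non-empty cell has a strictly
    # larger index on every dimension; all of its points are then dominated.
    alive = {c: pts for c, pts in cells.items()
             if not any(all(a > b for a, b in zip(d, c)) for d in cells)}

    # Step 2: a point can only be dominated by points from cells whose
    # index is >= its own cell's index on every dimension.
    result: List[Point] = []
    for c, pts in alive.items():
        rivals = [q for d, qs in alive.items()
                  if all(a >= b for a, b in zip(d, c)) for q in qs]
        result.extend(p for p in pts
                      if not any(dominates(q, p) for q in rivals))
    return result


if __name__ == "__main__":
    sample = [(1, 6), (3, 3), (2, 2), (6, 1), (1, 1), (5, 5)]
    print(grid_skyline(sample, lo=0, hi=8, g=4))
```

The cell-level pass touches only non-empty cells, which is where the saving comes from when there are far fewer cells than points.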
10. Contributions
- Worked on five different heuristics:
  - Sorts
  - Pivot
  - Lattice
  - Cubes
  - Spider
- For simplicity, assume that points are:
  - distinct on any dimension
  - uniformly distributed on any dimension
  - normalized (i.e., values ranging from 1 to n)
11. Lattice
- Choose Best-Low as a pivot.
- Best-Low is very effective if it is close to
  - <n, n, ..., n>
  - <n - n/k, n - n/k, ..., n - n/k>
  - <n/k, n/k, ..., n/k>
- Requires a dependency table.
- Under UI (uniform independence), Best-Low is
estimated to be close to <b, b, ..., b>, where
$b = n\,(1 - 1/n^{1/k})$.
- Best-Low is the point whose lowest value in any
dimension is the highest among the lowest values
of all the points.
- With strict assumptions, Lattice is $O(n^{1.58})$ in
the worst case.
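A short sketch of how Best-Low could be selected and how the estimate b behaves. The function names are hypothetical. One way to arrive at the estimate, assuming values spread uniformly and independently over a range of size n: the expected number of points with every coordinate at least b is about n(1 - b/n)^k, and setting this to 1 gives b = n(1 - 1/n^(1/k)).

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def best_low(points: List[Point]) -> Point:
    """Return the point whose smallest coordinate is the largest.

    This follows the slide's description of Best-Low: take the
    lowest value of each point, then pick the point for which
    that lowest value is highest.
    """
    return max(points, key=min)


def estimated_b(n: int, k: int) -> float:
    """Estimated coordinate of Best-Low: b = n * (1 - 1 / n**(1/k))."""
    return n * (1 - 1 / n ** (1 / k))


if __name__ == "__main__":
    print(best_low([(1, 6), (3, 3), (2, 2), (6, 1)]))
    for k in (2, 3, 5):
        print(k, round(estimated_b(10_000, k), 1))
```

For a fixed n, the estimate drops as k grows, which suggests that Best-Low moves away from <n, n, ..., n> in higher dimensions.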
12. Spider
- Apply a modified form of SFS
- Split the points into cells
- Compare cells to eliminate dominated cells.
- Solve each cell individually.
- Compare points from comparable cells by
reapplying Spider over reduced dimensions.
- With strict assumptions, Lattice is $O(n^{1.58})$ in
the worst case.
13. Results (1)
14. Results (2)
15. Results (3)
16. Results (4)
17. Conclusions
- Created an algorithm that attempts to
incorporate the best features of other algorithms.
- An EF window, similar to LESS, is used to solve
the best and average cases efficiently.
- A divide-and-conquer technique, similar to DD&C,
is used to solve the worst case efficiently.
- Partitioning, similar to Z-order, is used to reduce
pair-wise comparisons.
- A scan-based approach is used, which is conducive to
externalization.
18. Future Work
- Determine the exact cause of Spider's failure in
higher numbers of dimensions and fix it if
possible.
- Conduct an experimental analysis of the Lattice
algorithm.
- Come up with a more reliable theoretical analysis
of Lattice and Spider.
19. References (1)
- [BCL90] J. L. Bentley, K. L. Clarkson and D. B.
  Levine. Fast Linear Expected-time Algorithms for
  Computing Maxima and Convex Hulls. In Proceedings
  of the 1st Annual ACM-SIAM Symposium on Discrete
  Algorithms (SODA), pages 179-187. 1990.
- [BKS01] Stephan Börzsönyi, Donald Kossmann and
  Konrad Stocker. The Skyline Operator. In
  Proceedings of the 17th International Conference
  on Data Engineering, pages 421-430. 2001.
- [BKST78] J. L. Bentley, H. T. Kung, M. Schkolnick
  and C. D. Thompson. On the Average Number of
  Maxima in a Set of Vectors and Applications. In
  Journal of the Association for Computing
  Machinery (ACM), 25(4), pages 536-543, 1978.
- [Buc89] C. Buchta. On the Average Number of
  Maxima in a Set of Vectors. In Information
  Processing Letters, 33, pages 63-65, 1989.
- [CGGL03] Jan Chomicki, Parke Godfrey, Jarek Gryz
  and Dongming Liang. Skyline with Presorting. In
  Proceedings of the 19th International Conference
  on Data Engineering (ICDE), pages 717-719.
  Bangalore, India, 2003.
- [CGGL05] Jan Chomicki, Parke Godfrey, Jarek Gryz
  and Dongming Liang. Skyline with Presorting:
  Theory and Optimization. In Proceedings of the
  Intelligent Information Systems Conference (IIS):
  New Trends in Intelligent Information Processing
  and Web Mining, pages 593-602. Gdansk, Poland,
  2005.
- [God04] Parke Godfrey. Skyline Cardinality for
  Relational Processing. In Proceedings of the 3rd
  International Symposium on Foundations of
  Information and Knowledge Systems, pages 78-97.
  Springer, Wilhelminenberg Castle, Austria, 2004.
20. References (2)
- [KLP75] H. T. Kung, F. Luccio and F. P.
  Preparata. On Finding the Maxima of a Set of
  Vectors. In Journal of the Association for
  Computing Machinery (ACM), 22(4), pages 469-476,
  1975.
- [KRR02] D. Kossmann, F. Ramsak and S. Rost.
  Shooting Stars in the Sky: An Online Algorithm
  for Skyline Queries. In Very Large Data Bases
  Conference (VLDB), pages 275-286. 2002.
- [LZLL07] Ken C. K. Lee, Baihua Zheng, Huajing Li
  and Wang-Chien Lee. Approaching the Skyline in Z
  Order. In Proceedings of the 33rd International
  Conference on Very Large Data Bases (VLDB), pages
  279-290. Vienna, Austria, 2007.
- [PTFS05] D. Papadias, Y. Tao, G. Fu and B.
  Seeger. Progressive Skyline Computation in
  Database Systems. In Association for Computing
  Machinery (ACM) TODS, 30(1), pages 41-82, 2005.
- [TC02] Riccardo Torlone and Paolo Ciaccia. Which
  Are My Preferred Items? In Workshop on
  Recommendation and Personalization in eCommerce,
  pages 1-9. Malaga, Spain, 2002.