Depth Estimation for Ranking Query Optimization - PowerPoint PPT Presentation

1 / 33

About This Presentation

Title:

Depth Estimation for Ranking Query Optimization

Description:

DEEP accesses statistics with two methods. getFreq(b): Return frequency of b ... Abstracts the physical statistics format. Allows statistics to be generated on-the-fly ... – PowerPoint PPT presentation

Number of Views:26

Avg rating:3.0/5.0

Slides: 34

Provided by: karl114

Category:

more less

Transcript and Presenter's Notes

Title: Depth Estimation for Ranking Query Optimization

1
Depth Estimation forRanking Query Optimization

Karl Schnaitter, UC Santa Cruz
Joshua Spiegel, BEA Systems, Inc.
Neoklis Polyzotis, UC Santa Cruz

2
Relational Ranking Queries

RANK BY 0.3/h.price 0.5r.rating
0.2isMusic(e)
LIMIT 10

SELECT h.hid, r.rid, e.eid FROM Hotels h,
Restaurants r, Events e WHERE h.city r.city AND
r.city e.city

A base score for each table in 0,1
Combined with a scoring function S
S(bH, bR, bE) 0.3bH 0.5bR 0.2bE
Return top k results based on S
In this case, k 10

3
Ranking Query Execution

SELECT h.hid, r.rid, e.eid
FROM Hotels h, Restaurants r, Events e
WHERE h.city r.city AND r.city e.city
RANK BY 0.3/h.price 0.5r.rating
0.2isMusic(e)
LIMIT 10

rank-aware plan
conventional plan
Fetch 10 results
Fetch 10 results
Rank join
Sort on S
Join
Rank join
E
Join
E
H
R
H
R
Ordered by score
4
Depth Estimation

Depth number of accessed tuples
Indicates execution cost
Linked to memory consumption
The problem Estimate depths for each operator in
a rank-aware plan

Rank join
H
R
left depth
right depth
5
Depth Estimation Methods

Ilyas et al. (SIGMOD 2004)
Uses probabilistic model of data
Assumes relations of equal size and a scoring
function that sums scores
Limited applicability
Li et al. (SIGMOD 2005)
Samples a subset of rows from each table
Independent samples give a poor model of join
results

6
Our Solution DEEP

DEpth Estimation for Physical plans
Strengths of DEEP
A principled methodology
Uses statistical model of data distribution
Formally computes depth over statistics
Efficient estimation algorithms
Widely applicable
Works with state-of-the-art physical plans
Realizable with common data synopses

7
(No Transcript)
8
(No Transcript)
9
Monotonic Functions

A function f(x1,...,xn) is monotonic if
?i(xiyi) ? f(x1,...,xn) f(y1,...,yn)

f(x)
x
10
Monotonic Functions

A function f(x1,...,xn) is monotonic if
?i(xiyi) ? f(x1,...,xn) f(y1,...,yn)
Most scoring functions are monotonic
E.g. sum, product, avg, max, min
Monotonicity enables bound on score
In example query, score was
0.3/h.price 0.5r.rating 0.2isMusic(e)
Given a restaurant r, upper bound is
0.31 0.5r.rating 0.21

11
Hash Rank Join IAE04

The Hash Rank Join algorithm
Joins inputs sorted by score
Returns results with highest score
Main ideas
Alternate between inputs based on pull strategy
Score bounds allow early termination

Bound 1.8
Bound 1.7
Query Top result from L R with scoring
function S(bL, bR) bL bR
Result y Score 1.8
12
HRJN IAE04

The HRJN pull strategy
Pull from the input with highest bound
If (a) is a tie, pull from input with the smaller
number of pulls so far
If (b) is a tie, pull from the left

Bound 2.0
Bound 2.0
1.8
1.9
1.7
Query Top result from L R with scoring
function S(bL, bR) bL bR
x 1.0 y 0.8
y 1.0 z 0.9 w 0.7
Result y Score 1.8
?
?
13
(No Transcript)
14
Supported Operators

Evidence in favor of HRJN
Pull strategy has strong properties
Within constant factor of optimal cost
Optimal for a significant class of inputs
More details in the paper
Efficient in experiments IAE04
? DEEP explicitly supports HRJN
Easily extended to other join operators
Selection operators too

15
DEEP Conceptual View

Formalization

Depth Computation
Statistical Data Model
defined in terms of
Implementation
Estimation Algorithms
Statistics Interface
Data Synopsis
defined in terms of
16
Statistics Model

Statistics yield the distribution of scores for
base tables and joins

FL
FL R
FR
17
Statistics Interface

DEEP accesses statistics with two methods
getFreq(b) Return frequency of b
nextScore(b,i) Return next lowest score on
dimension i

getFreq(b) 3 nextScore(b,1)0.9 nextScore(b,2)0
.5
b

The interface allows for efficient algorithms
Abstracts the physical statistics format
Allows statistics to be generated on-the-fly

18
Statistics Implementation

Interface can be implemented over common types of
data synopses
Can use a histogram if
Base score function is invertible, or
Base score measures distance
Assume uniformity independence if
Base score function is too complex, or
Sufficient statistics are not available

19
Depth Estimation Overview
Top-k query plan
1
s1
1
1
l1
r1
s2
C
2
2
2
l2
r2
B
A
20
Estimating Terminal Score

Idea
Sort by total score
Sum frequencies

Suppose we want the 10th best score

6 9 11
6 3 2 4 2
2.0 1.7 1.6 1.5 1.3
Sterm 1.6
21
Estimation Algorithm

Idea Only process necessary statistics

1
0.7
0.5
6
4
3
1
2
0.9
0.8
2
0.6
Sterm 1.6

Algorithm relies solely on getFreq and nextScore
Avoids materializing complete table
Worst-case complexity equivalent to sorting table
More efficient in practice

22
Depth Estimation Overview
Top-k query plan
1
s1
1
1
l1
r1
2
s2
C
2
2
l2
r2
B
A
23
Estimating Depth for HRJN
Theorem i ? depth of HRJN ? j
Input Scores
Example Sterm 1.6
i
j
11 depth 15