Supporting topk join queries in relational databases - PowerPoint PPT Presentation

About This Presentation
Title:

Supporting topk join queries in relational databases

Description:

Example 2 Same Video DB System - Get top 10 frames most similar to query image ... Block obtain data b tuples at a time, for classic ripple join b = 1 ... – PowerPoint PPT presentation

Number of Views:65
Avg rating:3.0/5.0
Slides: 50
Provided by: Yah964
Learn more at: https://crystal.uta.edu
Category:

less

Transcript and Presenter's Notes

Title: Supporting topk join queries in relational databases


1
Supporting top-k join queries in relational
databases
  • Ihab F. Ilyas, Walid G. Aref,
  • Ahmed K. Elmagarmid
  • Presented by Rebecca M. Atchley

Thursday, April 19, 2007
2
Examples to Consider
  • single criterion ranking
  • Example 1 Video DB System storing video
    features
  • multi-criteria ranking (or top-k join query)
  • Example 2 Same Video DB System - Get top 10
    frames most similar to query image based on color
    AND texture combined.
  • Example 3 - a user interested in finding top 5
    locations where combined cost of buying a house
    (in Houses DB) and paying school tuition (in
    Schools DB) in that location is minimum.

3
Motivation
  • Sort-merge joins (MGJN) only preserves order of
    joined column data
  • Nested-loop joins (NLJN) only orders on the outer
    relations are preserved through the join
  • Hash join (HSJN) doesnt preserve order if hash
    tables do not fit in memory

4
Example Ranking Query Q1
  • SELECT A.1, B.2
  • FROM A, B, C
  • WHERE A.1 B.1 and B.2 C.2
  • ORDER BY (0.3A.1 0.7B.2)
  • STOP AFTER 5
  • Problems???

5
What Is Needed
  • Perform basic join operation
  • Conform with current query operator interface
  • So it can be integrated into query execution
    plans
  • Utilize the orderings of its inputs
  • Avoid the unnecessary sorting of the join
    results
  • Produce 1st ranked join results ASAP
  • Adapt to input fluctuations
  • Major characteristic in applications that depend
    on ranking

6
Papers Contributions
  • Proposes a new Rank-Join algorithm satisfying
    this criteria along with its correctness proof
  • Analyzes the I/O cost of the algorithm along with
    proof of its optimality
  • Implements the proposed algorithm in pipelined
    rank-join operators (based on ripple join)
  • Integrate into QEPs as ordinary join operations
  • Retain orders of inputs avoid sort of join
    results
  • Produce top-k results incrementally
  • Proposes an optimal join strategy
  • score-guided and adaptive
  • Provides optimization mechanism to determine best
    order to perform the rank-join operations
  • Evaluates performance and compares other
    approaches

7
Related Work
  • Fagin et al. introduced the 1st set of algorithms
    to answer ranking queries
  • The TA Algorithm
  • The NRA Algorithm
  • The J Algorithm
  • The NRA-RJ Algorithm
  • Importance-based join processing

8
Overview of the Ripple Join
JOIN L.A R.A L and R are descending, ordered
by B
We get a tuple from L and a tuple from
R (L1(1,1,5) R1(1,3,5)) No valid join result
L
R
9
3. Ripple Join --contd
JOIN L.A R.A
We get a second tuple from L and a second tuple
from R and join with prior tuples, creating all
possible combinations (L2,R2) (2,2,4),(2,1,4)
L
R
10
Ripple Join --contd
JOIN L.A R.A
(L2,R2) (2,2,4),(2,1,4) (L2,R1) 2,2,4),
(1,3,5) (L1,R2) (1,1,5), (2,1,4) is a valid
join result!
L
R
11
Variations of the Ripple Join
Rectangle obtain tuples from one source at a
higher rate than from the other source Block
obtain data b tuples at a time, for classic
ripple join b 1 Hash in memory, maintain hash
tables of the samples obtained so far Faster IO
Degrades to block ripple join when hash tables
exceed memory size
12
The Rank-Join Algorithm
13
A Rank-Join Algorithm Example
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
Initial Input (1). Get a valid join combination
using some join strategy Ripple Select (L1, R1)
gt No Valid Join Result Next input (1). Get a
valid join combination using some join
strategy Ripple Select (L2, R2) (L2, R2), (L2,
R1), (L1, R2) gt (L1, R2) is a valid join result
14
Example Continued (1)
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
(2). Compute the score (J) for the result J1(L1,
R2) gt L.B R.B 5 4 9 (3). Compute a
threshold score T Max ( Last L.B First R.B,
First L.B Last R.B ) For Ripple Selection
(L2, R2) gt T Max ( L2.B R1.B, L1.B R2.B
) Max (45, 54) 9
15
Example Continued (4)
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
J1 9 T 9 (4). J1 gt T, so report J1 in top-k
results (i.e. add it to list Lk ) Since we need
top 2 (k2), continue until k2 and Min(J1, J2,
Jk) gt T
16
Example Continued (5)
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
Next input 1). Get a valid join combination using
some join strategy Ripple Select (L3, R3) (L3,
R3), (L3, R1), (L3, R2), (L1, R3), (L2, R3) gt
(L3, R3), (L2, R3) are valid join results (2).
Compute the scores (J) for the results J2(L2,
R3) 4 3 7 J3(L3, R3) 3 3 6
17
Example Continued (6)
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
(3). Calculate a NEW threshold T T Max ( Last
L.B First R.B, First L.B Last R.B ) Max
( L3.B R1.B , L1.B R3.B ) Max(3 5, 5
3) 8
18
Example Continued (7)
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
T 8 J1(L1,R2) 9 reported J2( L2, R3) 7
J3(L3, R3) 6 Note, Js are in descending
order (4). Min (J) 6 lt T so continue Comment
Calculate T before J is more efficient. Can stop
after find first Jk gt T
19
Example Continued (8)
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
Next input (1). Get a valid join combination
using some join strategy Ripple Select ( L4, R4)
gt (L4, R1), (L2, R4), (L3, R4) (2). Compute the
scores (J) for the results J(L4, R4) 7, J(L2,
R4) 6, J(L3, R4) 5 (3). Calculate a NEW
threshold T T Max( L4.BR1.B, L1.B R4.B )
Max( 7, 7 ) 7
20
Example Continued (9)
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
T 7 J1(L1,R2) 9, J2(L2, R3) 7, J3(L4, R1)
7, J3(L3, R3) 6, J4(L2, R4) 6, J5(L3, R4)
5 (4). Min(J1, J2) 7 gt T (k 2), so report
J2 and STOP Comment When reach all records, T
does not need to be calculated, unless, calculate
T first, and compare each J(i) with T immediately
21
Example With Different Strategy
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
Instead of Select Ripple, Select Rectangle
Ripple Obtain all tuples in L and only 1st tuple
in R Initial Input (1). Get a valid join
combination using some join strategy Rectangle
Ripple (L1 to L4, R1) gt (L4, R1) is a valid join
result (2). Compute the score (J) for the
result J1(L4, R1) 2 5 7 (3). Calculate a
NEW threshold T T Max( L4.BR1.B, L1.B R1.B )
Max( 7, 10 ) 10
22
Example With Different Strategy Continued (2)
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
(4). Min(J1, J2) 7 lt T 10 (k 2), so just
continue Next Input (1). Get a valid join
combination using some join strategy Rectangle
Ripple (L1 to L4, R2) gt (L1, R2) is a new valid
join result (2). Compute the score (J) for the
result J2(L1, R2) 9 (3). Calculate a NEW
threshold T T Max( L1.BR1.B, L1.B R2.B )
Max( 10, 9 ) 10
23
Example With Different Strategy Continued (3)
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
(4). Min(J1, J2) 9 lt T 10 (k 2), so
continue J2 cannot be reported because of
threshold 10, but this was the top-ranked join
result in prior strategy This suggests using
join strategies that reduce the threshold value
as quickly as possible to be able to report
top-ranked join results early on.
24
New Physical rank-join Operators
  • Hash rank join operator (HRJN)
  • - Use Hash Ripple Join
  • - Two hash table contain the two inputs
  • - A queue holds ordered join results
  • - Ltop, Rtop, Lbottom, Rbottom are used
  • to calculate T

25
Open operation of HRJN
26
GetNext operation of HRJN
27
HRJN Implementation Issues
  • Buffer problem
  • Local Ranking Problem

OP1
L3
OP2
L2
L1
28
Solving the Issues
  • Use Block Ripple Join to Solve Local Ranking
    Problem - Set p 2

29
HRJN join strategy
  • HRJN is score-guided
  • - How to select next (block) tuple
  • T1 Ltop Rbottom , T2 Lbottom Rtop
  • T Max(T1, T2)
  • If T1 gt T2, need to reduce T1. How?
  • HRJN uses XJoin to determine input availability
    and use that as a guide

30
HRJN join strategy
  • HRJN is score-guided
  • - How to select next (block) tuple
  • T1 Ltop Rbottom , T2 Lbottom Rtop
  • T Max(T1, T2)
  • If T1 gt T2, need to reduce T1. How?
  • Reduce Rbottom and not reduce Lbottom
    (descending ordered), thus more tuples should be
    retrieved from R to reduce T1
  • HRJN uses XJoin to determine input availability
    and use that as a guide

31
Effect of Join Order
  • When more than two tables join, the join order
    matters. (A and C have high similarity)

32
Determining Join Order
  • Rank-Join order heuristic
  • - Get a ranked sample of size S from L and R
  • - Calculate the similarity using footrule
    distance

Where L(i) and R(j) are the ranks of object i in
L and object j in R and i,j is a valid join result
33
The Rank-Join Order Algorithm
34
Similarity-Based Join Ordering
35
7. Generalizing the rank-join
  • Using indexes
  • an index on only one of the two inputs
  • an index on each of the two inputs.
  • Eliminate duplications
  • Faster termination

36
Performance Evaluation
37
Selectivity 0.2 and m 4
38
Selectivity 0.2 and m 4
39
Selectivity 0.2 and m 4
40
m 4 and k 50
41
m 4 and k 50
42
m 4 and k 50
43
selectivity 0.2 and k 50
44
selectivity 0.2 and k 50
45
selectivity 0.2 and k 50
46
(No Transcript)
47
Conclusions
  • New join-rank algorithm (independent of join
    strategy) is correct and optimal
  • Physical query operator HRJN (Hash Rank Join) is
    based on ripple join implements algorithm
  • Score-guided join strategy applied to HRJN is the
    HRJN operator and integrates into QEPs
  • Efficient rank-order join heuristic chooses
    near-optimal join order
  • General rank-join algorithm uses indexes for
    faster termination of ranking
  • Experimental evaluation shows significant
    performance enhancement

48
References
  • Ihab F. Ilyas, Walid G. Aref, Ahmed K.
    Elmagarmid Supporting top-k join queries in
    relational databases. VLDB J. 13(3) 207-221
    (2004)
  • Jing Chen CSE 6392 - Spring 2005, University of
    Texas at Arlington, PowerPoint slide presentation
    of Supporting top-k join queries in relational
    databases, http//ranger.uta.edu/gdas/Courses/Sp
    ring2005/DBIR/slides/top-k_join.ppt.

49
THANK YOU !
Write a Comment
User Comments (0)
About PowerShow.com