Title: Supporting top-k join queries in relational databases
1Supporting top-k join queries in relational
databases
- Ihab Ilyas, Walid Aref,
- Ahmed Elmagarmid
- Presented by Jing Chen
21. Motivation
- Select A.1, B.2
- From A, B, C
- Where A.1B.1
- and B.2 C.2
- Order By (0.3A.1 0.7B.2)
- Stop After 5
3Traditional Joins
- Nested Loop
- Merge Join
- Hash Join
42. Goals/Contribution
- Propose a pipe lined Rank-Join algorithm
- Analyze the I/O cost of the algorithm
- Implement the algorithm
- Propose a optimal join strategy
- Evaluate performance
53. Ripple Join
JOIN L.A R.A L and R are descending ordered
by B
(L1(1,1,5) R1(1,3,5))
L
R
63. Ripple Join --contd
JOIN L.A R.A
(L2,R2) (2,2,4),(2,1,4)
L
R
7Ripple Join --contd
JOIN L.A R.A
(L2,R2) (2,2,4),(2,1,4) (L2,R1) 2,2,4),
(1,3,5) (L1,R2) (1,1,5), (2,1,4)
L
R
83. Variation Of Ripple Join
Rectangle Block Hash
94. Proposed Algorithm
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
(1). Get a valid combination using any certain
algorithm Ripple Select (L1, R1) gt No Result
104. Proposed Algorithm
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
(1). Get a valid combination using any certain
algorithm Select (L2, R2) (L2, R2), (L2, R1),
(L1, R2) gt (L1, R2)
114. Proposed Algorithm -- Contd
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
(2). Compute the score (J) for the result J1(L1,
R2) gt L.B R.B 5 4 9
124. Proposed Algorithm
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
(3). Compute a water mark score (T) by Max (Last
L).B (First R.B), (First L).B (Last R).B
134. Proposed Algorithm
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
(3). Compute a water mark (T) score by Max (Last
L).B (First R.B), (First L).B (Last
R).B Selection (L1, R1) , (L2, R2) gt T Max
(L2.B R1.B, L1.B R2.B) Max (45, 54) 9
144. Proposed Algorithm
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
J1 9 T 9 J1 gt T Report J1 Since we need top
2 (k2), continue until k2 and Min(J1, J2, Jk)
gt T
154. Proposed Algorithm
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
Select (L3, R3) (L3, R3), (L3, R1), (L3, R2),
(L1, R3), (L2, R3) gt (L3, R3), (L2, R3) J2(L2,
R3) 4 3 7 J3(L3, R3) 3 3 6
164. Proposed Algorithm
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
Calculate T T Max (Last L).B (First R).B,
(First L).B (Last R).B Max L3.B R1.B ,
L1.B R3.B Max(3 5, 5 3) 8
174. Proposed Algorithm
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
T 8 J1(L1,R2) 9 reported J2( L2, R3) 7
J3(L3, R3) 6 Note, Js are in descending
order Min (J) 6 lt T Continue Comment
Calculate T before J is more efficient. Can stop
after find first Jk gt T
184. Proposed Algorithm
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
Select (L4, R4) gt (L4, R1), (L2, R4), (L3,
R4) J(L4, R4) 7, J(L2, R4) 6, J(L3, R4)
5 T Max(L4.BR1.B, L1.B R4.B) Max(7, 7) 7
194. Proposed Algorithm
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
T 7 J1(L1,R2) 9, J2(L2, R3) 7, J3(L4, R1)
7, J3(L3, R3) 6, J4(L2, R4) 6, J5(L3, R4)
5 Min(J1, J2) 7 gt T (k 2) Comment When
reach all records, T does not need to be
calculated, unless, calculate T first, and
compare each J(i) with T immediately
204. The Algorithm
215. Implementation
- Hash rank join operator (HRJN)
- - Use Hash Ripple Join
- - Two hash table contain the two inputs
- - A queue holds ordered join results
- - Ltop, Rtop, Lbottom, Rbottom are used
- to calculate T
225. Implementation contd
- Issues with HRJN
- Buffer problem
- Local Ranking Problem
-
235. Implementation contd
- Use Block Ripple Join to Solve Local Ranking
Problem. (block size 2) -
245. Implementation contd
- HRJN score-guided join strategy
- - How to select next (block) tuple
- T1 Ltop Rbottom , T2 Lbottom Rtop
- T Max(T1, T2)
- If T1 gt T2, need to reduce T1. How?
-
255. Implementation contd
- HRJN score-guided join strategy
- - How to select next (block) tuple
- T1 Ltop Rbottom , T2 Lbottom Rtop
- T Max(T1, T2)
- If T1 gt T2, need to reduce T1. How?
- Reduce Rbottom and not reduce Lbottom
(descending ordered), thus more tuples should be
retrieved from R to reduce T1
266. Join Order
- When more than two tables join, the join order
matters. (A and C have high similarity)
276. Join Order -- contd
- Rank-Join order heuristic
- - Get a ranked sample, top S ranked list from L
and R - - Calculate the similarity using footrule
Where L(i) and R(i) are the rank of object i in L
and R
286. Join Order -- contd
297. Generalizing the rank-join
- Using indexes
- Eliminate duplications
- Faster termination
308. Performance Evaluation
Selectivity 0.2 and m 4
318. Performance Evaluation
Selectivity 0.2 and m 4
328. Performance Evaluation
Selectivity 0.2 and m 4
33m 4 and K50
34m 4 and K50
35m 4 and K50