Supporting top-k join queries in relational databases - PowerPoint PPT Presentation

About This Presentation
Title:

Supporting top-k join queries in relational databases

Description:

Supporting top-k join queries in relational databases ... Presented by Jing Chen. 1. Motivation. Select A.1, B.2. From A, B, C. Where A.1=B.1. and B.2 = C.2 ... – PowerPoint PPT presentation

Number of Views:24
Avg rating:3.0/5.0
Slides: 36
Provided by: Yah964
Learn more at: https://ranger.uta.edu
Category:

less

Transcript and Presenter's Notes

Title: Supporting top-k join queries in relational databases


1
Supporting top-k join queries in relational
databases
  • Ihab Ilyas, Walid Aref,
  • Ahmed Elmagarmid
  • Presented by Jing Chen

2
1. Motivation
  • Select A.1, B.2
  • From A, B, C
  • Where A.1B.1
  • and B.2 C.2
  • Order By (0.3A.1 0.7B.2)
  • Stop After 5

3
Traditional Joins
  • Nested Loop
  • Merge Join
  • Hash Join

4
2. Goals/Contribution
  • Propose a pipe lined Rank-Join algorithm
  • Analyze the I/O cost of the algorithm
  • Implement the algorithm
  • Propose a optimal join strategy
  • Evaluate performance

5
3. Ripple Join
JOIN L.A R.A L and R are descending ordered
by B
(L1(1,1,5) R1(1,3,5))
L
R
6
3. Ripple Join --contd
JOIN L.A R.A
(L2,R2) (2,2,4),(2,1,4)
L
R
7
Ripple Join --contd
JOIN L.A R.A
(L2,R2) (2,2,4),(2,1,4) (L2,R1) 2,2,4),
(1,3,5) (L1,R2) (1,1,5), (2,1,4)
L
R
8
3. Variation Of Ripple Join
Rectangle Block Hash
9
4. Proposed Algorithm
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
(1). Get a valid combination using any certain
algorithm Ripple Select (L1, R1) gt No Result
10
4. Proposed Algorithm
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
(1). Get a valid combination using any certain
algorithm Select (L2, R2) (L2, R2), (L2, R1),
(L1, R2) gt (L1, R2)
11
4. Proposed Algorithm -- Contd
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
(2). Compute the score (J) for the result J1(L1,
R2) gt L.B R.B 5 4 9
12
4. Proposed Algorithm
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
(3). Compute a water mark score (T) by Max (Last
L).B (First R.B), (First L).B (Last R).B
13
4. Proposed Algorithm
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
(3). Compute a water mark (T) score by Max (Last
L).B (First R.B), (First L).B (Last
R).B Selection (L1, R1) , (L2, R2) gt T Max
(L2.B R1.B, L1.B R2.B) Max (45, 54) 9

14
4. Proposed Algorithm
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
J1 9 T 9 J1 gt T Report J1 Since we need top
2 (k2), continue until k2 and Min(J1, J2, Jk)
gt T
15
4. Proposed Algorithm
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
Select (L3, R3) (L3, R3), (L3, R1), (L3, R2),
(L1, R3), (L2, R3) gt (L3, R3), (L2, R3) J2(L2,
R3) 4 3 7 J3(L3, R3) 3 3 6
16
4. Proposed Algorithm
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
Calculate T T Max (Last L).B (First R).B,
(First L).B (Last R).B Max L3.B R1.B ,
L1.B R3.B Max(3 5, 5 3) 8
17
4. Proposed Algorithm
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
T 8 J1(L1,R2) 9 reported J2( L2, R3) 7
J3(L3, R3) 6 Note, Js are in descending
order Min (J) 6 lt T Continue Comment
Calculate T before J is more efficient. Can stop
after find first Jk gt T
18
4. Proposed Algorithm
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
Select (L4, R4) gt (L4, R1), (L2, R4), (L3,
R4) J(L4, R4) 7, J(L2, R4) 6, J(L3, R4)
5 T Max(L4.BR1.B, L1.B R4.B) Max(7, 7) 7
19
4. Proposed Algorithm
Select From L, R Where L.A R.A Order By L.B
R.B Stop After 2
T 7 J1(L1,R2) 9, J2(L2, R3) 7, J3(L4, R1)
7, J3(L3, R3) 6, J4(L2, R4) 6, J5(L3, R4)
5 Min(J1, J2) 7 gt T (k 2) Comment When
reach all records, T does not need to be
calculated, unless, calculate T first, and
compare each J(i) with T immediately
20
4. The Algorithm
21
5. Implementation
  • Hash rank join operator (HRJN)
  • - Use Hash Ripple Join
  • - Two hash table contain the two inputs
  • - A queue holds ordered join results
  • - Ltop, Rtop, Lbottom, Rbottom are used
  • to calculate T

22
5. Implementation contd
  • Issues with HRJN
  • Buffer problem
  • Local Ranking Problem

23
5. Implementation contd
  • Use Block Ripple Join to Solve Local Ranking
    Problem. (block size 2)

24
5. Implementation contd
  • HRJN score-guided join strategy
  • - How to select next (block) tuple
  • T1 Ltop Rbottom , T2 Lbottom Rtop
  • T Max(T1, T2)
  • If T1 gt T2, need to reduce T1. How?

25
5. Implementation contd
  • HRJN score-guided join strategy
  • - How to select next (block) tuple
  • T1 Ltop Rbottom , T2 Lbottom Rtop
  • T Max(T1, T2)
  • If T1 gt T2, need to reduce T1. How?
  • Reduce Rbottom and not reduce Lbottom
    (descending ordered), thus more tuples should be
    retrieved from R to reduce T1

26
6. Join Order
  • When more than two tables join, the join order
    matters. (A and C have high similarity)

27
6. Join Order -- contd
  • Rank-Join order heuristic
  • - Get a ranked sample, top S ranked list from L
    and R
  • - Calculate the similarity using footrule

Where L(i) and R(i) are the rank of object i in L
and R
28
6. Join Order -- contd
29
7. Generalizing the rank-join
  • Using indexes
  • Eliminate duplications
  • Faster termination

30
8. Performance Evaluation
Selectivity 0.2 and m 4
31
8. Performance Evaluation
Selectivity 0.2 and m 4
32
8. Performance Evaluation
Selectivity 0.2 and m 4
33
m 4 and K50
34
m 4 and K50
35
m 4 and K50
Write a Comment
User Comments (0)
About PowerShow.com