Generalized Hash Teams for Join and GroupBy - PowerPoint PPT Presentation

About This Presentation
Title:

Generalized Hash Teams for Join and GroupBy

Description:

Generalized Hash Teams for Joins/Grouping. False Drops Analysis ... [Graefe, Bunker, Cooper: VLDB 98] R. S. A. A. T. Result. A. A. R.A. S.A. T.A. R. A. A. S. T ... – PowerPoint PPT presentation

Number of Views:27
Avg rating:3.0/5.0
Slides: 27
Provided by: professo73
Category:

less

Transcript and Presenter's Notes

Title: Generalized Hash Teams for Join and GroupBy


1
Generalized Hash Teams for Join and Group-By
  • Alfons Kemper
  • Donald Kossmann
  • Christian Wiesner
  • Universität Passau
  • Germany

2
Outline
  • Motivating Example
  • Standard Hash Teams
  • Generalized Hash Teams for Joins
  • Generalized Hash Teams for Joins/Grouping
  • False Drops Analysis
  • Application Examples (TPC-D)
  • Performance Evaluation

3
Traditional Join Plan
R
T
S
R
A
Result
S
A
T
4
Traditional Hash Team Join PlanGraefe, Bunker,
Cooper VLDB 98
R
A
R.A
Result
S
A
S.A
T
T.A
5
Generalized Hash Teams
6
Generalized Hash Teams
6 mod 5 1
Partition on B odd yellow even green
7
Generalized Hash Team for Grouping/Aggregation
  • select c.City, sum(o.Value)from Customer c,
    Order owhere c.C o.Cgroup by c.City

Join and grouping team
Agg
Agg
Ptn on City
Bit- maps (BM)
Ptn on BM
Ptn on C
Ptn on C
Ptn on City
Order
Customer
Customer
Order
8
Group (Customer Order )
Partition on City and generate bitmaps for C
City
Customer
C
Partition with bitmaps for C
Order
C
9
Partition on City and generate bitmaps for C
City
Customer
C
Partition with bitmaps for C and generate
bitmaps for O
C
Order
O
O
Partition with bitmaps for O
Lineitem
10
False Drops
11
Overlapping Partitions
Partition on B and generate bitmaps for A
T
Customer
Partition on C and generate bitmaps for O
S
Order
Partition based on the bitmaps for A
Lineitem
R
Partition with Bitmaps
12
Applicability ofGeneralized Hash Teams
Partition on B
  • for partitioning hierarchical structures A
    B
  • but it is also correct for
    non-strict hierarchies A B (but
    performance deteriorates)

Partition on bitmaps for A
13
Non-strict hierarchyA B
T S R
14
False Drops Estimation
b cardinality of the bitmaps n number of
partitions probability that some s sets a bit
leading to a false drop of an r into a particular
partition total number of false
drops conservative approximation
15
Implementation DetailsFine Tuning the
Partitioning
Bloom-Filter Bratbergsengen Valduriez
Bitmaps
16
Implementation DetailsTeaming up Join and
Grouping
Group (Customer Order )
Partition on City and generate bitmaps for C
City
Customer
C
C
Order
Partition with bitmaps for C
17
Teaming Up Join and Grouping Build Phase
5
25
13
23
PA
M
18
Teaming Up Join and Grouping Probe Phase
5
25
13
23
10
1
PA
M
19
Performance Comparison
Group (Customer Order )
Memory MB
20
False Drops Estimation and Measurement
21
Performance Comparison
Memory MB
22
False Drops Estimation and Measurement
23
Conclusion and Future Work
  • Look-Ahead Partitioning for Joins and Grouping
  • Applicable for hierarchical data structures
  • correctness does not depend on strict hierarchies
  • Applicable for several TPC-D (TPC-H and TPC-R)
    queries e.g., Q5, Q10, Q18
  • Combining Generalized Hash Teams and Order
    Preserving Hash Joins (OHJ)

24
TPC-D Q5
SELECT N_NAME, SUM(L_EXTENDEDPRICE ( 1
- L_DISCOUNT)) AS REVENUE FROM
CUSTOMER, ORDER, LINEITEM, SUPPLIER, NATION,
REGION WHERE C_CUSTKEY O_CUSTKEY AND
O_ORDERKEY L_ORDERKEY AND L_SUPPKEY
S_SUPPKEY AND C_NATIONKEY S_NATIONKEY
AND S_NATIONKEY N_NATIONKEY AND
N_REGIONKEY R_REGIONKEY AND R_NAME
'region' AND O_ORDERDATE gt DATE 'date'
AND O_ORDERDATE lt DATE 'date' INTERVAL 1
YEAR GROUP BY N_NAME ORDER BY
REVENUE DESC
25
TPC-D Q10
SELECT C_CUSTKEY, C_NAME,
SUM(L_EXTENDEDPRICE
(1 - L_DISCOUNT)) AS REVENUE, C_ACCTBAL,
N_NAME, C_ADDRESS, C_PHONE, C_COMMENT FROM
CUSTOMER, ORDER, LINEITEM, NATION WHERE
C_CUSTKEY O_CUSTKEY AND L_ORDERKEY
O_ORDERKEY AND O_ORDERDATE gt DATE
'date' AND O_ORDERDATE lt DATE 'date'
INTERVAL 3 MONTH AND L_RETURNFLAG 'R'
AND C_NATIONKEY N_NATIONKEY GROUP BY
C_CUSTKEY, C_NAME, C_ACCTBAL, C_PHONE,
N_NAME, C_ADDRESS, C_COMMENT ORDER BY
REVENUE DESC
26
Indirectly Partitioning a Hierarchical Structure
Customer
City
C
C
Order
O
O
Lineitem
Partition 1
Partition 3
Partition 2
Write a Comment
User Comments (0)
About PowerShow.com