Title: Generalized Hash Teams for Join and GroupBy
1Generalized Hash Teams for Join and Group-By
- Alfons Kemper
- Donald Kossmann
- Christian Wiesner
- Universität Passau
- Germany
2Outline
- Motivating Example
- Standard Hash Teams
- Generalized Hash Teams for Joins
- Generalized Hash Teams for Joins/Grouping
- False Drops Analysis
- Application Examples (TPC-D)
- Performance Evaluation
3Traditional Join Plan
R
T
S
R
A
Result
S
A
T
4Traditional Hash Team Join PlanGraefe, Bunker,
Cooper VLDB 98
R
A
R.A
Result
S
A
S.A
T
T.A
5Generalized Hash Teams
6Generalized Hash Teams
6 mod 5 1
Partition on B odd yellow even green
7Generalized Hash Team for Grouping/Aggregation
- select c.City, sum(o.Value)from Customer c,
Order owhere c.C o.Cgroup by c.City
Join and grouping team
Agg
Agg
Ptn on City
Bit- maps (BM)
Ptn on BM
Ptn on C
Ptn on C
Ptn on City
Order
Customer
Customer
Order
8Group (Customer Order )
Partition on City and generate bitmaps for C
City
Customer
C
Partition with bitmaps for C
Order
C
9Partition on City and generate bitmaps for C
City
Customer
C
Partition with bitmaps for C and generate
bitmaps for O
C
Order
O
O
Partition with bitmaps for O
Lineitem
10False Drops
11Overlapping Partitions
Partition on B and generate bitmaps for A
T
Customer
Partition on C and generate bitmaps for O
S
Order
Partition based on the bitmaps for A
Lineitem
R
Partition with Bitmaps
12Applicability ofGeneralized Hash Teams
Partition on B
- for partitioning hierarchical structures A
B
- but it is also correct for
non-strict hierarchies A B (but
performance deteriorates)
Partition on bitmaps for A
13Non-strict hierarchyA B
T S R
14False Drops Estimation
b cardinality of the bitmaps n number of
partitions probability that some s sets a bit
leading to a false drop of an r into a particular
partition total number of false
drops conservative approximation
15Implementation DetailsFine Tuning the
Partitioning
Bloom-Filter Bratbergsengen Valduriez
Bitmaps
16Implementation DetailsTeaming up Join and
Grouping
Group (Customer Order )
Partition on City and generate bitmaps for C
City
Customer
C
C
Order
Partition with bitmaps for C
17Teaming Up Join and Grouping Build Phase
5
25
13
23
PA
M
18Teaming Up Join and Grouping Probe Phase
5
25
13
23
10
1
PA
M
19Performance Comparison
Group (Customer Order )
Memory MB
20False Drops Estimation and Measurement
21Performance Comparison
Memory MB
22False Drops Estimation and Measurement
23Conclusion and Future Work
- Look-Ahead Partitioning for Joins and Grouping
- Applicable for hierarchical data structures
- correctness does not depend on strict hierarchies
- Applicable for several TPC-D (TPC-H and TPC-R)
queries e.g., Q5, Q10, Q18 - Combining Generalized Hash Teams and Order
Preserving Hash Joins (OHJ)
24TPC-D Q5
SELECT N_NAME, SUM(L_EXTENDEDPRICE ( 1
- L_DISCOUNT)) AS REVENUE FROM
CUSTOMER, ORDER, LINEITEM, SUPPLIER, NATION,
REGION WHERE C_CUSTKEY O_CUSTKEY AND
O_ORDERKEY L_ORDERKEY AND L_SUPPKEY
S_SUPPKEY AND C_NATIONKEY S_NATIONKEY
AND S_NATIONKEY N_NATIONKEY AND
N_REGIONKEY R_REGIONKEY AND R_NAME
'region' AND O_ORDERDATE gt DATE 'date'
AND O_ORDERDATE lt DATE 'date' INTERVAL 1
YEAR GROUP BY N_NAME ORDER BY
REVENUE DESC
25TPC-D Q10
SELECT C_CUSTKEY, C_NAME,
SUM(L_EXTENDEDPRICE
(1 - L_DISCOUNT)) AS REVENUE, C_ACCTBAL,
N_NAME, C_ADDRESS, C_PHONE, C_COMMENT FROM
CUSTOMER, ORDER, LINEITEM, NATION WHERE
C_CUSTKEY O_CUSTKEY AND L_ORDERKEY
O_ORDERKEY AND O_ORDERDATE gt DATE
'date' AND O_ORDERDATE lt DATE 'date'
INTERVAL 3 MONTH AND L_RETURNFLAG 'R'
AND C_NATIONKEY N_NATIONKEY GROUP BY
C_CUSTKEY, C_NAME, C_ACCTBAL, C_PHONE,
N_NAME, C_ADDRESS, C_COMMENT ORDER BY
REVENUE DESC
26Indirectly Partitioning a Hierarchical Structure
Customer
City
C
C
Order
O
O
Lineitem
Partition 1
Partition 3
Partition 2