Title: Chapter 7: Data Warehousing
1- Chapter 7 Data Warehousing
- Title Data Cube - A Relational Aggregation
Operator Generalizing Group-By, Cross-Tab, and
Sub-Totals - Authors J. Gray, S. Chaudhuri, A. Bosworth, A.
Layman, D. Reichart, M. Venkatrao, F. Pellow, and
H. Pirahesh - (Microsoft and IBM Research Centers)
- Reviewers G10
- Kuo-Wei Hsu Amanuel Godefa
2Data Cube - A Relational Aggregation Operator
Generalizing Group-By, Cross-Tab, and Sub-Totals
- Outline
- Motivation and Background
- Problem Statement
- Contributions
- Key Concepts
- Methodology
- Assumptions
- Rewrite Today
3Motivation and Background
- Need of better tool or operator
- achieve analysis on large and complex datasets,
e.g., data warehousing - HISTOGRAM
- Statistical summary of groups and subgroups
sub-totals totals - rollup drilldown reports
4Problem Statement
- Given
- Multi-dimensional data or datasets.
- Find
- Summation/aggregation functions over one or more
dimensions. - Objective
- Provide a new operators
- Description
- SQL on decision making on multi-dimensional data
- Relates sub-totals and totals for drill-down and
roll-ups reports - And Shortcomings of GROUP BY
5Contributions
- Enhance traditional relational DBs
multi-dimensional data - ability to support operations over N-dimensional
- Improve 2D flat files (SQL tables) model with an
N-dimensional problem
6Contributions
- Fit cube and roll-up operators in SQL
- Define new aggregate functions for cubes
- efficient techniques to compute cube
- Improve SQL Standard.
7Key Concepts
- How to obtain the data cube / how to implement
CUBE and ROLLUP operations - Recursive algorithm / recursively computes
aggregates with referencing sub-aggregates - Categorize of aggregate functions
- Distributive (e.g. count, min, max, sum,)
- Algebraic (e.g. average, variance,)
- Holistic (e.g. median, mode)
8 Aggregate FunctionDistributive( F ) if there
exists a G such that F( Xi,j ) G ( F ( Xi,j )
) where i1,n j1,..m
9 Aggregate Function.Algebraic( F ) if there
exists a G1(),,Gp() and H() such that F( Xi,j )
H(Gq( F ( Xi,j ) ) i1,n j1,..m
q1pe.g. Average
10Aggregation up to 3-Dimension
From Data Cube- fig 3
11Example
12Example
- SELECT Model, Year,
- SUM (sales) AS YTD_Sales
- FROM autoSales
- GROUP BY Model, Year
- WITH CUBE
13CUBE
SELECT Model, Year, Color, Sum(Sales) AS
SalesFROM SalesWHERE Model in Ford,
ChevyAND Year BETWEEN 1990 AND 1992GROUP BY
CUBE Model, Year, Color
Example CUBE Operator Source Data CUBE
14Validation Methodology
- Theoretical Analysis,
- e.g., the three types of aggregate functions
- Practical Examples
- Case Studies,
- e.g., Microsoft SQL server
15Assumptions
- Relational data model
- The size of the data
- Normalized tables
- Multi - Dimensional
16To rewrite the paper
- Memory size doesnt need to be focused
- utilize memory without partitioning datasets
(participating tables) - Warehousing in a distributed environment.
- Heterogeneous DBMSs
- normalized locally but unnormalized globally
- ROLLUP and CUBE are SQL extensions
- SQL 6.5 and Oracle 8i (above).
17References
- (T7.3) J. Gray, S. Chaudhuri, A. Bosworth, A.
Layman, D. Reichart and M. Venkatrao, DataCube A
Relational Aggregation Operator Generalizing
Group-by, Cross-tab and Sub-Totals, Data Mining
and Knowledge Discovery 1(1), 1997, 29-53. - SQL Server 2005 Books Online, Microsoft
Corporation - Lecture note from Prof. S. Shekharwww.cs.umn.edu/
research/shashi-group/fbkof_datacube.ppt