Incremental Maintenance for Non-Distributive Aggregate Functions - PowerPoint PPT Presentation

About This Presentation

Title:

Incremental Maintenance for Non-Distributive Aggregate Functions

Description:

selective recomputation. no longer enough to compute delta ... recomputation step not needed when. only ... maintenance without recomputation. STDDEV, MIN/MAX ... – PowerPoint PPT presentation

Number of Views:28

Avg rating:3.0/5.0

Slides: 29

Provided by: the137

Category:

more less

Transcript and Presenter's Notes

Title: Incremental Maintenance for Non-Distributive Aggregate Functions

1
Incremental Maintenance for Non-Distributive
Aggregate Functions
Themis Palpanas (U of Toronto) Richard
Sidle Bobbie Cochrane Hamid Pirahesh

work done at IBM Almaden Research Center

2
Motivation

large amounts of data stored in databases
often times data warehouses are used
consolidate data from many sources
offer more general and descriptive view of data
queried by business intelligence tools and
decision support systems
produce expensive OLAP queries
these OLAP queries have nice properties
based on same set of tables
perform similar aggregations

3
Motivation (contd)

can efficiently support such queries with
Automatic Summary Tables (ASTs)
materialized queries defined over a set of base
tables
precomputed once, used many times
answer complex queries fast
must maintain ASTs when base tables change
inserts, updates, deletes

4
Motivation (contd)
AST
AST definition
5
Aggregate Functions

characterization of functions wrt insertion and
deletion operations
updates are series of deletions and insertions
distributive aggregate functions
new value computed based on old value and value
of operation
SUM()
non-distributive aggregate functions
above property does not hold
STDDEV()
MIN() (because of deletions)

6
Problem Statement

given ASTs with aggregate functions
distributive
SUM, COUNT
non-distributive
STDDEV, CORRELATION, REGRESSION, MIN/MAX, XMLAGG,
when base tables change
incrementally maintain affected ASTs
efficient maintenance of ASTs with
non-distributive aggregate functions

7
Outline

Current Approach
Our Solution
Experimental Evaluation
Related Work
Conclusions

8
Current Approach
AST
delta
AST definition
9
Current Approach (contd)

works for distributive
SUM, COUNT
does not work for non-distributive
STDDEV, CORRELATION, REGRESSION
MIN/MAX
XMLAGG
need new way to deal with these functions

10
Our Solution

selective recomputation
no longer enough to compute delta
must recompute some aggregation groups
minimize work to be done
choose which groups to recompute
optimize query plan

11
Our Solution (contd)
combine old and new values
recompute affected groups
AST
delta
AST definition
12
Our Solution (contd)

the 5 steps
compute new aggregate values
change column derivation
recompute only affected groups
eliminate unnecessary operations
optimize for special cases

13
Initial Query Plan

Query Graph Model (QGM)

14
1. Compute New Aggregate Values

compute delta for distributive functions
recompute non-distributive functions
get those values only for affected groups
duplicate computation for distributive functions!

15
2. Change Column Derivation
UDI

change column derivation
rewrite phase projects out unused columns
entire AST gets recomputed!

LOJ
AST
LOJ
non-distributive only
prop
AST
distributive only
16
2. Change Column Derivation

example AST
SELECT dept_id,COUNT(emp_id),MAX(age),STDDEV(salar
y)
FROM employees
GROUP BY dept_id
result of COUNT() computed from old propagate
phase
results of MAX() and STDDEV() from AST definition

17
3. Recompute Affected Groups
UDI

push join predicate down in AST
only affected groups are recomputed
special rules for super-aggregates
GROUPING SETS
ROLLUP
CUBE

non-distributive only
LOJ
AST
LOJ
distributive only
AST
J
J
prop

T1
Tk
18
3. Recompute Affected Groups

special treatment for ASTs with super-aggregates
predicates not pushdownable
caution not to compute totals of totals
build special join predicate
ensure correct aggregations
change rewrite rules
allow predicate pushdown through super aggregates
applicable only for special join predicate

19
4. Remove Unnecessary Operations

outerjoin not always needed
when changes are only inserts
reroute columns from propagate phase through AST
remove outerjoin operator
same for updates not referencing AST grouping
columns and predicates

UDI
LOJ
all columns
AST
distributive only
AST
J
J
prop

T1
Tk
20
4. Remove Unnecessary Operations

example AST
SELECT dept_id,COUNT(emp_id),MAX(age),STDDEV(salar
y)
FROM employees
GROUP BY dept_id
modification on base tables
UPDATE employees SET salary10 WHERE agegt40
outerjoin operation will not be built
update does not refer to grouping column
(dept_id), and no predicate in AST refers to
updated column (salary)
certain that no tuples in AST will be deleted
only STDDEV() will be recomputed
the rest are not affected by changes

21
5. Optimize for Special Cases

recomputation step not needed when
only insertions and only MIN/MAX functions
build predicate in apply phase
check if new min/max should replace old values
only deletions referring only to grouping columns
of AST
can only cause entire groups to be deleted
handled in apply phase

22
5. Optimize for Special Cases

example AST
SELECT dept_id,COUNT(emp_id),MAX(age),STDDEV(salar
y)
FROM employees
GROUP BY dept_id
modification on base tables
DELETE FROM employees WHERE dept_idgt40
selective recomputation step not needed
deletion refers only to grouping column (dept_id)
certain that entire groups will be deleted from
AST
no other groups will be affected

23
Experimental Evaluation

prototype implementation in IBM DB2 UDB
star schema database
sales of products over 5 year time period
fact table 10 million tuples
AST with non-distributive aggregate function
240,000 tuples
workload simulates nightly updates
add/delete data for first day of month
add/delete data for second day of month
add/delete data for full month

24
Experimental Evaluation (contd)
workload 1 workload 2 workload 3
incremental 286 294 420
full refresh 699 702 692

deletions require 40-60 of full refresh time

workload 1 workload 2 workload 3
incremental 3 n/a 31
full refresh 699 702 692

optimized deletions require 1-4 of full refresh
time

25
Experimental Evaluation (contd)
workload 1 workload 2 workload 3
incremental 151 158 180
full refresh 702 702 721

insertions/updates require 20-25 of full refresh
time

26
Related Work

incremental view maintenance
differential refresh algorithms
Lindsay et al. 1986, Blakeley et al. 1986, Qian
and Wiederhold 1991, Ceri and Widom 1991
deferred incremental maintenance
Colby et al. 1996, Salem et al. 2000
views with aggregation
Quass 1996, Mumick et al. 1997

27
Conclusions