Incremental Maintenance for Non-Distributive Aggregate Functions - PowerPoint PPT Presentation

About This Presentation
Title:

Incremental Maintenance for Non-Distributive Aggregate Functions

Description:

selective recomputation. no longer enough to compute delta ... recomputation step not needed when. only ... maintenance without recomputation. STDDEV, MIN/MAX ... – PowerPoint PPT presentation

Number of Views:28
Avg rating:3.0/5.0
Slides: 29
Provided by: the137
Category:

less

Transcript and Presenter's Notes

Title: Incremental Maintenance for Non-Distributive Aggregate Functions


1
Incremental Maintenance for Non-Distributive
Aggregate Functions
Themis Palpanas (U of Toronto) Richard
Sidle Bobbie Cochrane Hamid Pirahesh
  • work done at IBM Almaden Research Center

2
Motivation
  • large amounts of data stored in databases
  • often times data warehouses are used
  • consolidate data from many sources
  • offer more general and descriptive view of data
  • queried by business intelligence tools and
    decision support systems
  • produce expensive OLAP queries
  • these OLAP queries have nice properties
  • based on same set of tables
  • perform similar aggregations

3
Motivation (contd)
  • can efficiently support such queries with
    Automatic Summary Tables (ASTs)
  • materialized queries defined over a set of base
    tables
  • precomputed once, used many times
  • answer complex queries fast
  • must maintain ASTs when base tables change
  • inserts, updates, deletes

4
Motivation (contd)
AST
AST definition
5
Aggregate Functions
  • characterization of functions wrt insertion and
    deletion operations
  • updates are series of deletions and insertions
  • distributive aggregate functions
  • new value computed based on old value and value
    of operation
  • SUM()
  • non-distributive aggregate functions
  • above property does not hold
  • STDDEV()
  • MIN() (because of deletions)

6
Problem Statement
  • given ASTs with aggregate functions
  • distributive
  • SUM, COUNT
  • non-distributive
  • STDDEV, CORRELATION, REGRESSION, MIN/MAX, XMLAGG,
  • when base tables change
  • incrementally maintain affected ASTs
  • efficient maintenance of ASTs with
  • non-distributive aggregate functions

7
Outline
  • Current Approach
  • Our Solution
  • Experimental Evaluation
  • Related Work
  • Conclusions

8
Current Approach
AST
delta
AST definition
9
Current Approach (contd)
  • works for distributive
  • SUM, COUNT
  • does not work for non-distributive
  • STDDEV, CORRELATION, REGRESSION
  • MIN/MAX
  • XMLAGG
  • need new way to deal with these functions

10
Our Solution
  • selective recomputation
  • no longer enough to compute delta
  • must recompute some aggregation groups
  • minimize work to be done
  • choose which groups to recompute
  • optimize query plan

11
Our Solution (contd)
combine old and new values
recompute affected groups
AST
delta
AST definition
12
Our Solution (contd)
  • the 5 steps
  • compute new aggregate values
  • change column derivation
  • recompute only affected groups
  • eliminate unnecessary operations
  • optimize for special cases

13
Initial Query Plan
  • Query Graph Model (QGM)

14
1. Compute New Aggregate Values
  • compute delta for distributive functions
  • recompute non-distributive functions
  • get those values only for affected groups
  • duplicate computation for distributive functions!

15
2. Change Column Derivation
UDI
  • change column derivation
  • rewrite phase projects out unused columns
  • entire AST gets recomputed!

LOJ
AST
LOJ
non-distributive only
prop
AST
distributive only
16
2. Change Column Derivation
  • example AST
  • SELECT dept_id,COUNT(emp_id),MAX(age),STDDEV(salar
    y)
  • FROM employees
  • GROUP BY dept_id
  • result of COUNT() computed from old propagate
    phase
  • results of MAX() and STDDEV() from AST definition

17
3. Recompute Affected Groups
UDI
  • push join predicate down in AST
  • only affected groups are recomputed
  • special rules for super-aggregates
  • GROUPING SETS
  • ROLLUP
  • CUBE

non-distributive only
LOJ
AST
LOJ
distributive only
AST
J
J
prop

T1
Tk
18
3. Recompute Affected Groups
  • special treatment for ASTs with super-aggregates
  • predicates not pushdownable
  • caution not to compute totals of totals
  • build special join predicate
  • ensure correct aggregations
  • change rewrite rules
  • allow predicate pushdown through super aggregates
  • applicable only for special join predicate

19
4. Remove Unnecessary Operations
  • outerjoin not always needed
  • when changes are only inserts
  • reroute columns from propagate phase through AST
  • remove outerjoin operator
  • same for updates not referencing AST grouping
    columns and predicates

UDI
LOJ
all columns
AST
distributive only
AST
J
J
prop

T1
Tk
20
4. Remove Unnecessary Operations
  • example AST
  • SELECT dept_id,COUNT(emp_id),MAX(age),STDDEV(salar
    y)
  • FROM employees
  • GROUP BY dept_id
  • modification on base tables
  • UPDATE employees SET salary10 WHERE agegt40
  • outerjoin operation will not be built
  • update does not refer to grouping column
    (dept_id), and no predicate in AST refers to
    updated column (salary)
  • certain that no tuples in AST will be deleted
  • only STDDEV() will be recomputed
  • the rest are not affected by changes

21
5. Optimize for Special Cases
  • recomputation step not needed when
  • only insertions and only MIN/MAX functions
  • build predicate in apply phase
  • check if new min/max should replace old values
  • only deletions referring only to grouping columns
    of AST
  • can only cause entire groups to be deleted
  • handled in apply phase

22
5. Optimize for Special Cases
  • example AST
  • SELECT dept_id,COUNT(emp_id),MAX(age),STDDEV(salar
    y)
  • FROM employees
  • GROUP BY dept_id
  • modification on base tables
  • DELETE FROM employees WHERE dept_idgt40
  • selective recomputation step not needed
  • deletion refers only to grouping column (dept_id)
  • certain that entire groups will be deleted from
    AST
  • no other groups will be affected

23
Experimental Evaluation
  • prototype implementation in IBM DB2 UDB
  • star schema database
  • sales of products over 5 year time period
  • fact table 10 million tuples
  • AST with non-distributive aggregate function
  • 240,000 tuples
  • workload simulates nightly updates
  • add/delete data for first day of month
  • add/delete data for second day of month
  • add/delete data for full month

24
Experimental Evaluation (contd)
workload 1 workload 2 workload 3
incremental 286 294 420
full refresh 699 702 692
  • deletions require 40-60 of full refresh time

workload 1 workload 2 workload 3
incremental 3 n/a 31
full refresh 699 702 692
  • optimized deletions require 1-4 of full refresh
    time

25
Experimental Evaluation (contd)
workload 1 workload 2 workload 3
incremental 151 158 180
full refresh 702 702 721
  • insertions/updates require 20-25 of full refresh
    time

26
Related Work
  • incremental view maintenance
  • differential refresh algorithms
  • Lindsay et al. 1986, Blakeley et al. 1986, Qian
    and Wiederhold 1991, Ceri and Widom 1991
  • deferred incremental maintenance
  • Colby et al. 1996, Salem et al. 2000
  • views with aggregation
  • Quass 1996, Mumick et al. 1997

27
Conclusions
  • incremental maintenance for ASTs with
    non-distributive aggregate functions
  • support MIN/MAX, STDDEV, CORRELATION, REGRESSION,
    XMLAGG,
  • efficient selective recomputation
  • recompute only affected groups
  • optimize query plan
  • customize for special cases
  • significant performance improvements

28
Future Work
  • examine use of work areas
  • temporary storage space
  • store intermediate values
  • maintenance without recomputation
  • STDDEV, MIN/MAX(?),
  • very helpful for ASTs defined with
    super-aggregates
  • ASTs with HAVING clauses
  • do not know when groups will enter/leave AST
Write a Comment
User Comments (0)
About PowerShow.com