Title: Materialized View Selection and Maintenance using Multi-Query Optimization
1Materialized View Selection and Maintenance using
Multi-Query Optimization
- Hoshi Mistry
- Prasan Roy
- S. Sudarshan
- Krithi Ramamritham
2Materialized Views
- Complex results materialized in order to speed up
queries that depend on these results - Increasingly being supported by commercial
database systems (e.g. Oracle8i) - Crucial in data warehousing environments
3Materialized View Maintenance
- As underlying data changes, the materialized
views need to be refreshed - Efficient view maintenance crucial!
- Need to provide up-to-date query responses
growing - Amount of data added to data warehouses
increasing - Maintenance time window shrinking
4Focus
- Efficient techniques for maintenance of a set of
materialized views (MVs) by - Transient materialization of common
subexpressions (CSEs) - Selection of additional MVs
- Computation of the best maintenance policy and
plan for each MV
5Transient Materialization of Common Subexpressions
- CSEs materialized to reduce maintenance cost by
sharing computation, disposed after use - Motivated by Blakeley et al. SIGMOD86, Ross et
al. SIGMOD96 - Huge search space considered impractical
- Earlier work by Sellis TODS88
- Efficient heuristic algorithms proposed by Roy et
al. SIGMOD00
6Selection of Additional MVs
- Additional views materialized permanently to
reduce the overall maintenance cost - Motivated by Ross et al. SIGMOD96
- restricted to incremental maintenance only
- do not consider transient materialization
- MV selection in general addressed in Roussopolous
TODS82, Agrawal et al. VLDB00
7Best Maintenance Policy and Plan Computation
- For each MV,
- Determine the best maintenance policy
(incremental or recomputation) - Find the corresponding best plan
-
- Earlier work by Vista EDBT98
- Does not take into account transient
materialization of CSEs or presence of other MVs - Current systems need manual specification of the
maintenance policy
8Contribution
- A framework that consolidates the choice of
- CSEs to be transiently materialized
- Additional MVs
- Best maintenance plan (incremental/recomputation)
- Integrated with a state of the art query
optimizer (Volcano ICDE93)
9Example
initial set
incremental refresh
recomputation
recomputation
CDE
BCDE
ABC
permanent
permanent
permanent
merge
incremental refresh
DE
permanent
BC
merge
transient
dA
B
C
D
dE
10Approach
- Setting up the search space of maintenance plans
- Best maintenance plan computation
- Transient/Permanent materialized view selection
11Approach
- Setting up the search space of maintenance plans
- Best maintenance plan computation
- Transient/Permanent materialized view selection
12Setting Up the Maintenance Plan Space
- The Query DAG representation for recomputation
plans - Incorporating incremental plans
13Representation of the Recomputation Plan Space
BCD
ABC
Equivalence Class (OR node)
Operation (AND node)
BC
CD
AB
Best Plan
C
D
B
A
Additionally incorporates subsumption derivations
Details in Roy et al. SIGMOD00
14Incremental PlansPropagation Based Differential
Generation
- Differentials propagated one at a time
- For each differential dR
- Start at dR and compute node differentials
bottom-up along the best plan in a topological
order - Differential of a node computed as a function of
its inputs and their differentials - e.g. d(E1E2) E1 dE2 U E2dE1 U dE1dE2 where
dEi differential of Ei wrt dR - Refresh the relation R and the affected MVs wrt
dR by merging with the differentials computed as
above
Ross et al. SIGMOD96
15Incorporating Incremental PlansPropagation
Based Differential Generation
BCdA
Equivalence Class (OR node)
Operation (AND node)
BC
BdA
Best Plan
C
B
dA
16Incorporating Incremental PlansPropagation
Based Differential Generation
CDdB
ACdB
Equivalence Class (OR node)
Operation (AND node)
CdB
CD
AdB
Best Plan
C
D
dB
A
17Incorporating Incremental PlansPropagation
Based Differential Generation
BDdC
ABdC
Equivalence Class (OR node)
Operation (AND node)
BdC
DdC
AB
Best Plan
dC
D
B
A
18Incorporating Incremental PlansPropagation
Based Differential Generation
BCdD
Equivalence Class (OR node)
Operation (AND node)
BC
CdD
Best Plan
C
dD
B
19Incorporating Incremental Plans
Merge operator
AB
incremental plan
recomputation plan
BdA
AdB
A
B
dB
dA
- For each equiv node and each base differential
affecting it - Introduce a new equiv node representing its
differential - Populate with the differential plans
- Maintain statistics for the full expression after
successive merges - Large space overhead!
20Incorporating Incremental Plans
- Actual space-efficient representation
AB
BdA
AdB
B
dA
A
dB
- Reuse the same structure for successive
propagation cycles - separate best plan pointers for each cycle
- separate statistics for the full expression after
successive merges - Also incorporates sort-orders, indices, etc. Roy
et al. SIGMOD00
21Approach
- Setting up the search space of maintenance plans
- Best maintenance plan computation
- Transient/Permanent materialized view selection
22Maintenance Plan Computation
- Given
- Set of nodes Mt materialized transiently
- can include full results as well as differentials
- Set of nodes Mp materialized permanently
- includes full results but not differentials
- compute the best consolidated maintenance plan
for Mp
23Maintenance Plan Computation
- Best plan computed using a query optimizer
extended as follows - Plan accessing a materialized view (trans/perm)
does not include its computation, only its use - Cost of a maintenance plan
- totalcost(Mp, Mt) ?e?Mpmaintcost(e Mp, Mt)
?e?Mttrmatcost(e Mp, Mt) - where
- maintcost(Mp, Mt) cost of cheapest maintenance
plan for e (recomputation/incremental) - trmatcost(Mp, Mt) cost of computing and
materializing e
24Approach
- Setting up the search space of maintenance plans
- Best maintenance plan computation
- Transient/Permanent materialized view selection
25Transient/Permanent Materialized View Selection
- Given set of MVs M already materialized,
determine - Set of nodes Mt to materialize transiently
- Set of nodes Mp (? M) to materialize permanently
- such that totalcost(Mp, Mt) is minimized
- Exhaustive approach too expensive. Need
heuristics!
26Transient/Permanent Materialized View SelectionA
Greedy Heuristic
- Input Initial MVs M
- Output Mp (? M) , Mt, corresp. best plan
- Begin
- Mp M Mt
- S set of equivalence nodes in the DAG for M
- While ( S ? )
- Pick z ? S which maximizes Benefit(z Mp, Mt)
- If ( Benefit(z Mp, Mt) ? 0 )
- break
- If ( z is a full result and
maintcost(z Mp, Mt) lt
trmatcost(z Mp, Mt) ) - Mp Mp U z
- else Mt Mt U z
- S S z
- Return (Mp, Mt)
- End
How to compute Benefit(z Mp, Mt)?
27Transient/Permanent Materialized View
SelectionBenefit Computation
- Benefit(z Mp, Mt) gain(z Mp, Mt) -
investment(z Mp, Mt) - where
- gain(z Mp, Mt) ?e?Mp(maintcost(e Mp, Mt) -
maintcost(e Mp, Mt U z)) - ?e?Mt(trmatcost(e Mp, Mt) -
trmatcost(e Mp, Mt U z)) - and
- investment(z Mp, Mt) min(maintcost(z Mp,
Mt), trmatcost(z Mp, Mt)) - if z is a full result
- trmatcost(z Mp, Mt) if z is a
differential - Benefit computation expensive. Need efficient
techniques!
28Transient/Permanent Materialized View
SelectionImproving Efficiency of the Greedy
Heuristic
- Cost-propagation based incremental techniques to
efficiently compute Benefit - Monotonicity assumption
- Reduces the number of Benefit computations
- Techniques to determine if a node can be shared
across a given maintenance plan - Reduces the number of nodes considered for
transient materialization - Adapted from Roy et al. SIGMOD00. See paper for
details.
29Benchmark
- Single Views
- Same views as above, refreshed separately
- Set of Views
- 10 views (5 with aggregates, 5 without) on 8
distinct relations, refreshed together
30Effect of Transient and Permanent Materialization
Single Views
Set of Views
31Effect of Adaptive Maintenance Policy Selection
Single Views
Set of Views
32Scalability Analysis
Optimization Memory Requirements
Optimization Time
Negligible one-time costs
33Conclusion
- Presented techniques
- Automate sharing of computation
- Automate view selection
- Automate maintenance policy selection and plan
computation - Do the above in an integrated manner
- leading to benefits greater than could be
achieved by considering each dimension
individually - Are efficient and scalable
- the overall benefits greatly outweigh the
one-time cost - Integrate with state-of-the-art optimizers
(e.g. MS SQL-Server)
34Future Work
- Extend presented techniques
- To handle limited space
- To speed up a workload of queries in addition to
maintenance of a set of materialized views - To work in dynamic query result caching
environments
35Questions