Title: Materialized%20View%20Selection%20in%20a%20Multidimensional%20Database
1Materialized View Selection in a Multidimensional
Database
- Presenter Dong Wang
- 3/14/2006
2outlines
- What is multidimensional database.
- Why materialize views.
- The cost evaluation.
- The MDred-lattice.
3Multidimensional Database
- A multidimensional database (MDDB) is a data
repository that provides an integrated
environment for decision support queries that
require complex aggregations on huge amounts of
historical data. - An MDDB is a relational data warehouse where the
information is organized following the so-called
star-model.
4A Practical Example
- Consider the MDDB for a large store chain,
characterized by a large number of stores, each
of which is a supermarket selling a wide variety
of different products. We can identify the
following dimensions - Product, which can be characterized by
Product_id, Department, Manufactured_date and
Price. - Store, which can be characterized by Store_id,
store address (which can be decomposed into City,
State, and Zip). - Time, which can be characterized by Timestamp,
Date, Week, Month, Quarter, Year.
5The schema of the example
Sales
Transaction_id Timestamp Product_id Store_id
6Example queries
- Query 1 the total sales for year 2003.
- SELECT SUM (Price)
- FROM Sales, Time, Product
- WHERE Sales.Product_id Product.Product_id
- AND Sales.Timestamp Time.Timestamp
- AND Time.Year 2003
- Query 2 the total sales for store at Ohio.
- SELECT SUM (Price)
- FROM Sales, Store, Product
- WHERE Sales.Product_id Product.Product_id
- AND Sales.Store_id Store.Store_id
- AND Store.State Ohio
7How many views an MDDB can have?
- It depends on the number of attributes of the
dimensions of the MDDB - without hierarchies on the dimensional tables,
the number is - In our example database with only 3 dimension
tables of 6, 4, 4 attributes, this number is
18785, but for a real-world database with 50
attributes, this number is 2501015,
8outlines
- What is multidimensional database.
- Why materialize views.
- The cost evaluation.
- Data-cube lattice, MD-lattice and MDred-lattice.
9Materialized View
- A materialized view is the result of some
queries, which we choose to store in the
database, rather than reconstructing it as needed
in response to queries. - INSERT INTO SalesV1
- SELECT SUM (Price)
- FROM Sales, Time, Product
- WHERE Sales.Product_id Product.Product_id
- AND Sales.Timestamp Time.Timestamp
- GROUP BY (Time.Year)
The materialized view SalesV1 can answer the
query 1 directly.
10outlines
- What is multidimensional database.
- Why materialize views.
- The cost evaluation.
- MDred-lattice.
11The MDmat-Problemthe cost
- Query cost the cost of computing query qi,
given a set of materializations M. We want to
minimize this cost. - Update cost , here mi is the ith view in M
and - fmi is the frequency mj is updated and cu(mi) is
the update cost for mi. We want to minimize this
cost too. - So, given the query set and the materialized
view set, the cost of this solution is the sum of
the above two costs
12choose the right views to materialize
- Compare to the possible views we can have, the
number of queries is extremely small. Consider
the data-cube lattice we have below, among the
total 16 nodes, only 4 nodes may be used to
answer queries. So we can only select a small
number of views to materialize.
13Functional Dependence
- Functional dependency is a constraint on the
content of the dimension table - for each tuple pair t1,t2 and fd Al?Ar,
t1Alt2Al?t1Art2Ar - Examples
- 1. In the dimension table Store, we have fds1
Store_id ?Zip, fds2 Zip ? City, fds3 City
?State. - 2. In the dimension talbe Time, we have fdt1
timestamp ?week, fdt2 timestamp ?date, fdt3
date ?month, fdt4 month ?quarter, fdt5 quarter
?year. -
Use the attributes hierarchy, we can get the
multidimensional lattice.
14The MD-lattice
Timestamp
Date
Week
Month
Quarter
Year
all
all
The MD-lattice of the Store dimension
The MD-lattice of the Time dimension
15Candidate Views
- Its impossible (and no need) to materialize all
the possible views in the data cube. We only need
the views which can help us to answer the
queries. - We only consider the views that can provide some
contribution to reduce the total cost, the
candidate views. - A view vi belonging to an MD-lattice is a
candidate view if one of the following two
conditions holds - 1. View vi is associated to some query qi
- 2. There exist two candidate views vj and vk,
and vi is the least upper bound of vj and
vk.
16The materialization of a non-candidate view will
not help
- Suppose there is a non-candidate view vi and its
materialized. We consider two cases - There is no candidate view depending on vi. Since
vi will not change the query cost, and the update
cost for view vi is always positive, so
materialize vi will not help. - At least one candidate view exists depending on
vi. Say theres a candidate view vj depending on
vi. Since the size of vj is smaller than vi, we
can see the update cost of vj is always smaller
than vi. That means the materialization of vi
always costs more. - Conclusion we should always choose the candidate
view to materialize.
case 1 case 2 Both views are
materialized only the non-candidate view is
materialized
17Candidate views examples
- For query 1 on slide 5, we can choose the view
SalesV1 to materialize. - For query 2, we can do
- CREAT MATERIALIZED VIEW SalesV2
- SELECT SUM (Price)
- FROM Sales, Store, Product
- WHERE Sales.Product_id Product.Product_id
- AND Sales.Store_id Store.Store_id
- GROUP BY (Store.State)
- In both examples, we choose the view which is
associated to the query to materialize. -
18outlines
- What is multidimensional database.
- Why materialize views.
- The cost evaluation.
- The MDred-lattice.
19The MDred-lattice
- Given an MD-lattice and a set of queries Q, the
set of its candidate views forms the
MDred-lattice. - The MDred-lattice Construction Algorithm
-
20An MD-lattice construction
- Suppose we have two queries
- query 1 the total sale of the week 50.
- query 2 the total sale of the 3rd quarter of
year 2005. - From the MDred-lattice construction algorithm,
first we need to materialize the views group by
attribute Week and attribute Quarter to answer
the queries, then we need to extend the view set
by adding the least upper bound, attribute
Timestamp to the view set.
21The cost evaluation
- Suppose we have two queries qj and qk, consider
both the query cost and the update cost, we have
two options - Option 1 materialize vj and vk. The total cost
is - Option 2 only materialize vi, which is the
least upper bound of vj and vk. The total cost is
22The cost evaluation (cont.)
- For option 1, let fu0.8, cu(vj)100, fqj0.5,
cqj(vj)100, cu(vk)100, cqk(vk)100, we can get - C10.81000.51000.81000.5100260.
- For option 2, let fu0.8, the update cost will be
larger (since the cardinality of vi is larger),
say cu(vi)120, the query cost will also be
larger (since additional aggregation will be used
to answer the queries), say cqj(vi)110,
cqk(vi)110, we can get - C20.81200.51100.5110206.
- So option 2 is the better choice!
-
-
23References
- Materialized view selection in a multidimensional
database. Elena Baralis, Stefano Paraboschi and
Ernest Teniente. Proceedings of the 23rd VLDB
Conference.1997 - Designing Data Warehouses. Dimitri Theodoratos,
Timos Sellis. 1999