Title: Overview of Data Warehousing with Materialized Views
1Oracle8i Data Warehousing Guide
Overview of Data Warehousing with Materialized
Views Typically, data flows from one or more
online transaction processing (OLTP) databases
into a data warehouse on a monthly, weekly, or
daily basis. The data is usually processed in a
staging file before being added to the data
warehouse. Data warehouses typically range in
size from tens of gigabytes to a few terabytes,
usually with the vast majority of the data stored
in a few very large fact tables. One technique
employed in data warehouses to improve
performance is the creation of summaries, or
aggregates. They are a special kind of aggregate
view that improves query execution times by
precalculating expensive joins and aggregation
operations prior to execution and storing the
results in a table in the database. For example,
a table could be created to contain the sum of
sales by region and by product. Prior to
Oracle8i, organizations using summaries spent a
significant amount of time manually creating
summaries, identifying which ones to create,
indexing the summaries, updating them, and
advising their users on which ones to use. The
introduction of summary management in the Oracle
server eases the workload of the DBA and means
the end user no longer has to be aware of which
summaries have been defined. The DBA creates one
or more materialized views, which are the
equivalent of a summary. The end user queries the
tables and views in the database and the query
rewrite mechanism in the Oracle server
automatically rewrites the SQL query to use the
summary tables. This mechanism significantly
improves the response time for returning results
from the query and eliminates the need for the
end user or database application to be aware of
the materialized views that exist within the data
warehouse. The summaries or aggregates that are
referred to in this book and in literature on
data warehousing are created in Oracle using a
schema object called a materialized view.
Materialized views can perform a number of roles,
such as improving query performance or providing
replicated data, as described below. Although
materialized views are usually accessed via the
query rewrite mechanism, an end-user or database
application can construct queries that directly
access the summaries. However, serious
consideration should be given to whether users
should be allowed to do this, because, once the
summaries are directly referenced in queries, the
DBA will not be free to drop and create summaries
without affecting applications.
It is highly recommended that you read Oracles
Data Warehouse guide for Materialized Views. It
is an excellent overview of all the capabilities
and requirements.
2Typical Sales Star Schema
Time Periods Dimension Table
Promotions Dimension Table
Geographies Dimension Table
Promo_desc
Month_desc Quarter_desc Year_desc
Country_desc Region_desc City_desc
Sales Fact Table
Products Dimension Table
Customers Dimension Table
Units_Sold Price_Per_Unit Dollars_Sold
Prod_desc Prod_category Prod_size Prod_color Prod_
list_Price
Cust_name Cust_city Cust_gender Cust_age Cust_inco
me
3Typical Sales Star Schema Join
Select a.Year_desc,
b.Category_desc, c.Promo_desc,
d.Region_desc,
e.Cust_city, sum(f.Dollars_Sold Fro
m Dim_time a, Dim_prod b,
Dim_promo c, Dim_geo d,
Dim_cust e Where a.time_key
f.time_key and b.prod_key
f.prod_key and c.promo_key
f.promo_key and d.geo_key
f.geo_key and e.cust_key
f.cust_key and a.Year_desc in
(2002,2001,2000) and b.Category_desc
Shirts and d.Region_desc USA
East and e.Cust_city in
(NY,BOSTON,MIAMI) Group By a.Year_desc,
b.Category_desc,
c.Promo_desc, d.Region_desc,
e.Cust_city
Expensive join
Resubmitted over and over with different filters
Expensive sort
4Typical Sales Materlialized View
CREATE MATERIALIZED VIEW Star_Sales_mv PCTFREE 0
TABLESPACE mviews STORAGE (initial 1M next 10M
pctincrease 0) BUILD DEFERRED REFRESH COMPLETE ON
DEMAND ENABLE QUERY REWRITE AS Select
a.Year_desc, b.Category_desc,
c.Promo_desc,
d.Region_desc, e.Cust_city,
sum(f.Dollars_Sold From Dim_time a,
Dim_prod b, Dim_promo c,
Dim_geo d, Dim_cust e Where
a.time_key f.time_key and
b.prod_key f.prod_key and
c.promo_key f.promo_key and
d.geo_key f.geo_key and
e.cust_key f.cust_key Group By
a.Year_desc, b.Category_desc,
c.Promo_desc,
d.Region_desc, e.Cust_city
Select a.Year_desc,
b.Category_desc, c.Promo_desc,
d.Region_desc,
e.Cust_city, sum(f.Dollars_Sold Fro
m Dim_time a, Dim_prod b,
Dim_promo c, Dim_geo d,
Dim_cust e Where a.time_key
f.time_key and b.prod_key
f.prod_key and c.promo_key
f.promo_key and d.geo_key
f.geo_key and e.cust_key
f.cust_key and a.Year_desc in
(2002,2001,2000) and b.Category_desc
Shirts and d.Region_desc USA
East and e.Cust_city in
(NY,BOSTON,MIAMI) Group By a.Year_desc,
b.Category_desc,
c.Promo_desc, d.Region_desc,
e.Cust_city
5Typical Sales Star Schema Join
Select a.Year_desc,
b.Category_desc, c.Promo_desc,
d.Region_desc,
e.Cust_city, sum(f.Dollars_Sold Fro
m Dim_time a, Dim_prod b,
Dim_promo c, Dim_geo d,
Dim_cust e Where a.time_key
f.time_key and b.prod_key
f.prod_key and c.promo_key
f.promo_key and d.geo_key
f.geo_key and e.cust_key
f.cust_key and b.Category_desc in
(Shirts, Pants) and d.Country_desc
USA and e.Cust_city in
(NY,LA,CHICAGO,MIAMI,DALLAS) Group By
a.Year_desc, b.Category_desc,
c.Promo_desc,
d.Region_desc, e.Cust_city
New filter conditions that query rewrite will
intercept and use materialized view.
6- First Time Approach
- The first time approach should be a small step
with a reoccurring long running join. - Identify reoccurring long running joins with
filter conditions, probably generated by a third
party product - Identify base joins and selectable attributes
they have in common - Build a common Materialized View based on common
join and attribute structure - Build the Materialized View, run Analyze on it,
and build a bitmap index. - Determine how effective the materialized view has
been on workload performance
- Overview of Materialized View Management Tasks
- The motivation for using materialized views is to
improve performance, but the overhead associated
with materialized view management can become a
significant system management problem.
Materialized view management activities include - Identifying what materialized views to create
initially - Indexing the materialized views
- Ensuring that all materialized views and
materialized view indexes are refreshed properly
each time the database is updated - Checking which materialized views have been used
- Determining how effective each materialized view
has been on workload performance - Measuring the space being used by materialized
views - Determining which new materialized views should
be created - Determining which existing materialized views
should be dropped - Archiving old detail and materialized view data
that is no longer useful
Security Issues To create a materialized view,
the privilege CREATE MATERIALIZED VIEW is
required, and to create a materialized view that
references a table in another schema, you must
have SELECT privileges on that table. Moreover,
if you enable query rewrite, you must have the
QUERY REWRITE or GLOBAL QUERY REWRITE privilege
to reference tables in your own schema. To enable
query rewrite on a materialized view that
references tables outside your schema, you must
have the GLOBAL QUERY REWRITE privilege.