Title: How cool is that? Capitalizing on Oracle 9i for data warehousing
2 How cool is that? Capitalizing on Oracle 9i for data warehousing
Session ID: 36482
- Mike Ames, Oracle DBA
- SAS Institute
3 Topics
- Performance, Parallelism, and Design
- Capitalizing on Oracle 9i for DW
- External Tables
- Merge and Multiple Inserts
- Partitioning
- Materialized View Enhancements
- Bitmap Join Indexes
- SQL for Analysis
- Conclusions and Questions
4 Performance, Parallelism, and Design: serial operation
- At 20 MB/sec it takes about 14½ hours to read 1 TB serially
- With perfect parallelism and 16 parallel processes it takes about 54 minutes (assuming sustained throughput); the arithmetic is worked out below
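As a rough check on those numbers (assuming 1 TB = 1,048,576 MB and perfectly sustained, evenly divided throughput):
\[ \frac{1{,}048{,}576\ \text{MB}}{20\ \text{MB/s}} \approx 52{,}429\ \text{s} \approx 14.6\ \text{hours}, \qquad \frac{52{,}429\ \text{s}}{16} \approx 3{,}277\ \text{s} \approx 55\ \text{minutes}. \]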
5 Performance, Parallelism, and Design
- Amdahl's Law in a nutshell
- Speedup (S): the efficiency gained by executing a process in parallel
- Formula for Speedup:
- S = 1 / (% sequential + (% parallel / # processors) + overhead)
- S = Speedup
- N = Number of processors
- B = % of the process or algorithm that is serial
- O = overhead
- S = 1 / (B + (1 - B)/N + O)
- Example: 8 processors, 5% serial operations
6 Performance, Parallelism, and Design
- Why is this important?
- Dependencies created by design and complexity reduce our ability to parallelize data warehouse operations.
- S = 1 / (5% + (95% / 8) + 0) ≈ 5.9
- Assuming perfect parallelism, a query that takes 30 minutes to execute serially would take just over 5 minutes in parallel (worked out below).
- Keys to parallel performance
- Minimize dependencies
- Minimize overhead associated with complexity
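Working the slide's example through Amdahl's formula (B = 5% serial, N = 8 processors, overhead O assumed to be zero):
\[ S = \frac{1}{0.05 + \tfrac{0.95}{8} + 0} = \frac{1}{0.16875} \approx 5.9, \qquad \frac{30\ \text{min}}{5.9} \approx 5.1\ \text{min}. \]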
7 Performance, Parallelism, and Design
- Two things to note
- Incremental speedup from doubling the processors depends on the % sequential
- 5% sequential with 4 processors > 20% sequential with 8 processors (the former yields the greater speedup)
8 Performance, Parallelism, and Design: realistic expectations
9 Performance, Parallelism, and Design: measuring performance
- Database performance is generally measured by:
- Load Performance
- Ability to parallelize
- Order / number of operations and complexity to maintain integrity
- Ability to leverage RDBMS load facilities
- Query Performance
- Ability to parallelize
- Ability to leverage partitioning
- Ability to exploit query re-write and summary data
- Number of sort and join operations
- Usability
- Ability of users to capitalize on data
- Level of complexity
10 Performance, Parallelism, and Design: critical factor
- Database design is critical for performance
- Ability to parallelize is constrained by dependencies
- Data and referential integrity
- Order of operations
- Ability to leverage RDBMS features is constrained by design
- Loading
- Partitioning
- Indexing
- Query re-write
- Materialized Views
- End user satisfaction is constrained by complexity
- Ease of use
- Data quality
11 Performance, Parallelism, and Design
- The bottom line: design dictates performance
- Good design:
- Maximizes parallelism by minimizing dependencies
- Minimizes complexity
- Capitalizes on features of the RDBMS (Oracle 9i)
12 Capitalizing on Oracle 9i
- External Tables
- Merge and Multiple Inserts
- Partitioning
- Materialized View Enhancements
- Bitmap Join Indexes
- SQL for Analysis
13 Capitalizing on External Tables
- External Tables
- Enable you to reference multiple flat files as if they were a table in your database.
- Restrictions
- Read only: no DML (INSERT, UPDATE, DELETE)
- Can't be used for partition exchange
14 Capitalizing on External Tables
- Loading from flat files
- Old method
- Create a stage table in your warehouse
- Use SQL*Loader to bulk load the table
- Read from the stage table, performing operations to put the data into final format.
- New method
- Create a table that references the external file
- Read from the external table, performing operations to put the data into final format.
- Significantly reduces the number of times data has to be moved around and virtually eliminates the need to use SQL*Loader directly.
15 Capitalizing on External Tables
External Table INSERT /*+ APPEND */ Example
(Diagram: source file(s) feeding an external table, loaded into the target table.)
- Steps
- FTP the 02Nov2003_Sales extract
- ALTER TABLE to add the file(s) to the LOCATION clause
- Perform INSERT /*+ APPEND */ INTO the target
16Capitalizing on External Tables
CREATE TABLE NOV_SALES_EXTERNAL( PROD_ID
NUMBER(6), CUST_ID NUMBER, TIME_ID DATE,
CHANNEL_ID CHAR(1), PROMO_ID NUMBER(6),
QUANTITY_SOLD NUMBER(3), AMOUNT_SOLD
NUMBER(10,2), UNIT_COST NUMBER(10,2),
UNIT_PRICE NUMBER(10,2)) ORGANIZATION external (
TYPE oracle_loader DEFAULT DIRECTORY extracts
ACCESS PARAMETERS ( RECORDS DELIMITED BY
NEWLINE CHARACTERSET US7ASCII BADFILE
log_file_dir' NOV_SALES_EXTERNAL.bad'
LOGFILE log_file_dir' NOV_SALES_EXTERNAL.log'
FIELDS TERMINATED BY "" LDRTRIM ) location
( '01Nov2003Sales.dat) )REJECT LIMIT UNLIMITED
Step 0. Create a target Table and an external
extract table Step 1. FTP Nov 2nd and 3rd extracts
17 Capitalizing on External Tables
- Step 2: Alter the external table, adding the new files to the LOCATION.
ALTER TABLE NOV_SALES_EXTERNAL
  LOCATION ('01Nov2003Sales.dat', '02Nov2003Sales.dat', '03Nov2003Sales.dat');
- Step 3: Insert into the target table from the external table.
INSERT /*+ APPEND */ INTO SALES_FACT
SELECT * FROM NOV_SALES_EXTERNAL;
18 Capitalizing on External Tables
(Diagram: sales extract external table loaded into the SALES table.)
INSERT /*+ APPEND */ INTO SALES
SELECT * FROM SALES_EXTRACT_EXTERNAL;
19 Capitalizing on External Tables
- Fixed-length fields are processed faster than delimited fields or records terminated by a string.
- Single-character delimiters for record terminators and field delimiters are faster to process than multi-character delimiters.
- Using INSERT /*+ APPEND */ with either key lookup functions or joins (for surrogate key lookup) is a great method to load fact tables; see the sketch below.
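A minimal sketch of that fact-load pattern, assuming a SALES_FACT target and CUSTOMER_DIM / PRODUCT_DIM dimensions that carry surrogate keys (these names and columns are illustrative, not from the slides):
INSERT /*+ APPEND */ INTO sales_fact
  (cust_key, prod_key, time_id, quantity_sold, amount_sold)
SELECT c.cust_key,            -- surrogate keys looked up from the dimensions
       p.prod_key,
       x.time_id,
       x.quantity_sold,
       x.amount_sold
FROM   nov_sales_external x,  -- external table over the flat-file extract
       customer_dim c,
       product_dim p
WHERE  c.cust_id = x.cust_id  -- joins on the natural (production) keys
AND    p.prod_id = x.prod_id;
COMMIT;
The direct-path insert reads the extract once and resolves the surrogate keys in the same pass, so no intermediate stage table is needed.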
20 Capitalizing on merge
- Merge
- Enables you to perform updates to matched records and inserts of new records.
- Leverages parallelism
- Is a slick way of performing slowly changing dimension logic.
21 Capitalizing on merge
(Diagram: customer extract external table merged with the existing customer table to give the customer table post merge; new rows are inserted, matched rows are updated.)
MERGE INTO CUSTOMERS c
USING CUST_EXT x
ON (c.cust_id = x.cust_id)
WHEN MATCHED THEN UPDATE SET c.city = x.city
WHEN NOT MATCHED THEN INSERT (cust_id) VALUES (x.cust_id);
22 Capitalizing on merge
Typical Type 1 slowly changing dimension (SCD) logic:
(Flow diagram: Source Extract, Compare, New/Changed Records, Generate Key, Insert New, Update Existing, Master Dimension / Cross Reference.)
23 Capitalizing on merge for Type 1 SCD
- Capitalize on MERGE for Type 1 SCDs
- Steps
- Create your dimension table
- Create an external table as your extract table that contains all of the columns in your dimension except the surrogate key
- Create an Oracle sequence; this will be used for the surrogate key of your dimension
- Use MERGE to populate your dimension
24 Capitalizing on merge for Type 1 SCD
- Create our dimension (target) table
Note: cust_key is our surrogate key; cust_id is the natural key or production key.
CREATE TABLE CUSTOMER_DIM (
  CUST_KEY  NUMBER       NOT NULL,
  CUST_ID   NUMBER       NOT NULL,
  CUST_NAME VARCHAR2(20) NOT NULL,
  ZIP       CHAR(5)      NOT NULL,
  CITY      VARCHAR2(30) NOT NULL,
  STATE     VARCHAR2(40) NULL,
  CONSTRAINT CUSTOMER_PK PRIMARY KEY (CUST_KEY));
25 Capitalizing on merge for Type 1 SCD
- Create our extract or staging table
2a. Running a new extract is simply a matter of referencing a new file:
ALTER TABLE customer_extract LOCATION ('CIF_NOV_2003.dat');
CREATE TABLE CUSTOMER_EXTRACT (
  CUST_KEY  NUMBER,
  CUST_ID   NUMBER,
  CUST_NAME VARCHAR2(20),
  ZIP       CHAR(5),
  CITY      VARCHAR2(30),
  STATE     VARCHAR2(40))
ORGANIZATION EXTERNAL (
  TYPE oracle_loader
  DEFAULT DIRECTORY extracts_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
    BADFILE log_file_dir:'customer_extract.bad'
    LOGFILE log_file_dir:'customer_extract.log'
    FIELDS TERMINATED BY '|' LDRTRIM )   -- field delimiter character was lost in the original; '|' assumed
  LOCATION ('cust_extract.dat'))
REJECT LIMIT UNLIMITED
PARALLEL;
26 Capitalizing on merge for Type 1 SCD
- Create our sequence; this will be used for the surrogate key
CREATE SEQUENCE CUST_SEQ START WITH 1000 INCREMENT BY 1000;
27 Capitalizing on merge for Type 1 SCD
- Use a single MERGE statement to perform our Type 1 SCD logic
(Diagram: customer extract external table merged into the existing customer dimension; matched rows are updated, new rows are inserted into the post-merge dimension.)
MERGE INTO CUSTOMER_DIM c
USING CUSTOMER_EXTRACT x
ON (c.cust_id = x.cust_id)
WHEN MATCHED THEN UPDATE SET
  CUST_FIRST_NAME = x.CUST_FIRST_NAME,
  CUST_LAST_NAME  = x.CUST_LAST_NAME,
  ...
  CUST_EMAIL      = x.CUST_EMAIL
WHEN NOT MATCHED THEN INSERT
  (CUST_KEY, CUST_ID, ..., CUST_EMAIL)
  VALUES (CUST_SEQ.NEXTVAL, x.CUST_ID, ..., x.CUST_EMAIL);
28 Capitalizing on merge for Type 2 SCD
- Type 2 Slowly Changing Dimension (SCD)
- Type 2 SCD: a technique where a new dimension record, with a new surrogate key, is created each time a change occurs
- We can do this quite simply with a single MERGE statement: just add the change columns to the ON () portion of the merge.
MERGE INTO CUSTOMER_DIM c
USING CUSTOMER_EXTRACT x
ON (c.cust_id = x.cust_id
    AND c.city  = x.city
    AND c.state = x.state
    AND c.zip   = x.zip)                  -- change columns
WHEN MATCHED THEN UPDATE SET
  CUST_NAME = x.CUST_NAME,
  CITY      = x.CITY,
  STATE     = x.STATE,
  ZIP       = x.ZIP
WHEN NOT MATCHED THEN INSERT
  (CUST_KEY, CUST_ID, CUST_NAME, CITY, STATE, ZIP, CREATE_DATE)
  VALUES (CUST_SEQ.NEXTVAL /* surrogate key */, x.CUST_ID, x.CUST_NAME, ...,
          TRUNC(SYSDATE) /* change pointer (CREATE_DATE) */);
29 Capitalizing on merge for Type 2 SCD
- A more common approach to Type 2 slowly changing dimension logic is the addition of change pointers for reference data. Unfortunately, this requires a multi-step process.
- This is generally performed with a series of insert and update statements or procedural logic, but can be accomplished with two merge statements as well:
- One to insert new records and update existing (close out)
- One to insert new changed records
(Callouts: surrogate key, change pointers.)
30 Capitalizing on merge for Type 2 SCD
1. /* First merge: close out existing, insert new */
MERGE INTO CUST_DIM c
USING (SELECT cust_id, cust_name, city, state, zip FROM CUST_EXTRACT
       MINUS
       SELECT cust_id, cust_name, city, state, zip FROM CUST_DIM
       WHERE current_flag = 'Y') x
ON (c.cust_id = x.cust_id
    AND c.end_date = TO_DATE('01-JAN-2099','DD-MON-YYYY'))
WHEN MATCHED THEN UPDATE SET c.current_flag = 'N'
WHEN NOT MATCHED THEN INSERT
  (cust_key, cust_id, cust_name, city, state, zip, start_date, end_date, current_flag)
  VALUES (CUST_SEQ.NEXTVAL, x.cust_id, x.cust_name, x.city, x.state, x.zip,
          TRUNC(SYSDATE), TO_DATE('01-JAN-2099','DD-MON-YYYY'), 'Y');
COMMIT;

2. /* Second merge: insert new changed record */
MERGE INTO CUST_DIM c
USING (SELECT cust_id, cust_name, city, state, zip FROM CUST_EXTRACT
       MINUS
       SELECT cust_id, cust_name, city, state, zip FROM CUST_DIM) x
ON (c.cust_id = x.cust_id AND c.current_flag = 'Y')
WHEN MATCHED THEN UPDATE SET c.end_date = TRUNC(SYSDATE - 1)
WHEN NOT MATCHED THEN INSERT
  (cust_key, cust_id, cust_name, city, state, zip, start_date, end_date, current_flag)
  VALUES (CUST_SEQ.NEXTVAL, x.cust_id, x.cust_name, x.city, x.state, x.zip,
          TRUNC(SYSDATE), TO_DATE('01-JAN-2099','DD-MON-YYYY'), 'Y');
COMMIT;

3. /* Final step: date closeout */
UPDATE CUST_DIM c
SET c.end_date = TRUNC(SYSDATE - 1)
WHERE c.current_flag = 'N'
  AND c.end_date = TO_DATE('01-JAN-2099','DD-MON-YYYY');
COMMIT;
31 Capitalizing on merge for Type 2 SCD
Take note of how a single table design decision has impacted our ability to parallelize: instead of a single merge statement we now have a three-step process.
(Diagram: customer extract external table, the existing customer dimension, and the customer dimension after the Type 2 merge(s).)
32 Capitalizing on multiple inserts
- Multiple Inserts
- Enable us to conditionally insert into multiple tables in parallel.
- ALL: the result set is applied to all conditions
- FIRST: each row is applied only to the first condition it satisfies
- Leverages parallelism
- Is a slick way to segment data and load fact tables.
33Capitalizing on multiple inserts
GA_SALES
INSERT ALL WHEN state'GA or state FL THEN
INTO GA_SALES VALUES(prod_id, cust_id,sale_date,s
ale_amount qty_sold) WHEN state 'FL' THEN INTO
FL_SALES VALUES(prod_id, cust_id,sale_date,sale_a
mount qty_sold) ELSE INTO ALL_OTHER_SALES
VALUES(prod_id, cust_id,sale_date,sale_amount
qty_sold) SELECT prod_id, cust_id,sale_date,sale_a
mount qty_sold FROM sales_extract
FL_SALES
ALL_OTHER_SALES
Query
34 Capitalizing on multiple inserts
(Diagram: one query feeding GA_SALES, FL_SALES, and ALL_OTHER_SALES. With FIRST, a Florida row satisfies the first WHEN clause and goes only into GA_SALES.)
INSERT FIRST
  WHEN state = 'GA' OR state = 'FL' THEN
    INTO GA_SALES VALUES (prod_id, cust_id, sale_date, sale_amount, qty_sold)
  WHEN state = 'FL' THEN
    INTO FL_SALES VALUES (prod_id, cust_id, sale_date, sale_amount, qty_sold)
  ELSE
    INTO ALL_OTHER_SALES VALUES (prod_id, cust_id, sale_date, sale_amount, qty_sold)
SELECT prod_id, cust_id, sale_date, sale_amount, qty_sold
FROM sales_extract;
35 Capitalizing on Partitioning
- Types of Partitioning
- Range: maps rows to partitions based on ranges of column values
- List (new): enables you to explicitly control how rows map to partitions
- Hash: evenly distributes rows among partitions
- Composite
- Range-Hash: benefits of range partitioning, then further hash-distributing each sub-partition (partition pruning plus parallel processing)
- Range-List (new for 9i): benefits of range partitioning and further discrete sub-partitioning
36 Capitalizing on Partitioning
- Why is partitioning important?
- Partitioning enables you to split large volumes of data into smaller, separate buckets that can be managed independently
- Partition Pruning / Elimination
- Partition-wise Joins
- Parallel DML
- Partition Exchanging / Swapping
37 Capitalizing on Oracle 9i partitioning
- Why Partition?
- Partition pruning
- Ability to eliminate partitions that don't satisfy query conditions
(Diagram: partitions for Jan 2003, Feb 2003, and Mar 2003; only the Feb 2003 partition is scanned.)
SELECT SUM(qty_sold)
FROM sales
WHERE sale_date BETWEEN TO_DATE('01-FEB-2003') AND TO_DATE('15-FEB-2003');
38 Capitalizing on Oracle 9i partitioning
- Why Partition?
- Partition-wise joins
- Full: equi-partitioned on the join keys, i.e. the two tables are both partitioned on the same key; hash-hash is the easiest example.
- Partial: Oracle dynamically repartitions based on the reference table.
Here accounts and transactions are both hash partitioned by account_id into 32 partitions.
Note: to achieve equal work distribution, the number of partitions should always be a multiple of the degree of parallelism. Ex: here we hashed account and transaction into 32 partitions with a degree of parallelism of 8.
39 Capitalizing on Oracle 9i partitioning
(Diagram: an ACCT_ID/NAME table and an ACCT_ID/PRICE/QTY table joined partition-to-partition in a partition-wise join.)
40 Capitalizing on Partitioning
Partition Exchanging
(Diagram: a stock extracts table swapped in as partition PQ3 of stock_sales, alongside partitions PQ1 and PQ2; a fuller sketch follows below.)
ALTER TABLE stock_sales EXCHANGE PARTITION PQ3 WITH TABLE stock_extracts;
Note: this does not work with external tables.
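A slightly fuller sketch of the same swap; the INCLUDING INDEXES and WITHOUT VALIDATION clauses are standard options added here as an assumption about typical usage, not taken from the slides:
-- stock_extracts is a regular heap table (not an external table), loaded and
-- indexed to match the target partition, then swapped in as a dictionary-only operation:
ALTER TABLE stock_sales
  EXCHANGE PARTITION pq3
  WITH TABLE stock_extracts
  INCLUDING INDEXES
  WITHOUT VALIDATION;
Because the exchange only updates metadata, the data itself is never moved or copied.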
41 Capitalizing on Partitioning
Range Example
CREATE TABLE account_balance_range(
  account_key       NUMBER(7)   CONSTRAINT acct_nn NOT NULL,
  branch_key        NUMBER(7)   CONSTRAINT brch_nn NOT NULL,
  product_key       NUMBER(7)   CONSTRAINT prod_nn NOT NULL,
  snapshot_date     DATE        CONSTRAINT mnth_nn NOT NULL,
  state_key         CHAR(2)     CONSTRAINT stat_nn NOT NULL,
  ending_bal        NUMBER(7,3) CONSTRAINT ebal_nn NOT NULL,
  average_daily_bal NUMBER(7,3) CONSTRAINT abal_nn NOT NULL,
  transaction_count NUMBER(7)   CONSTRAINT txnc_nn NOT NULL,
  interest_paid     NUMBER(7,3) CONSTRAINT intp_nn NOT NULL,
  fees_charged      NUMBER(7,3) CONSTRAINT feec_nn NOT NULL)
PARTITION BY RANGE (snapshot_date)(
  PARTITION Q1_ACCT_BAL VALUES LESS THAN (TO_DATE('01-APR-2000','DD-MON-YYYY')),
  PARTITION Q2_ACCT_BAL VALUES LESS THAN (TO_DATE('01-JUL-2000','DD-MON-YYYY')),
  PARTITION Q3_ACCT_BAL VALUES LESS THAN (TO_DATE('01-OCT-2000','DD-MON-YYYY')),
  PARTITION Q4_ACCT_BAL VALUES LESS THAN (TO_DATE('01-JAN-2001','DD-MON-YYYY')));
42 Slick New Feature - Partitioning
List Example
CREATE TABLE account_balance_list(
  account_key       NUMBER(7)   CONSTRAINT acct_nn NOT NULL,
  branch_key        NUMBER(7)   CONSTRAINT brch_nn NOT NULL,
  product_key       NUMBER(7)   CONSTRAINT prod_nn NOT NULL,
  snapshot_date     DATE        CONSTRAINT mnth_nn NOT NULL,
  state_key         CHAR(2)     CONSTRAINT stat_nn NOT NULL,
  ending_bal        NUMBER(7,3) CONSTRAINT ebal_nn NOT NULL,
  average_daily_bal NUMBER(7,3) CONSTRAINT abal_nn NOT NULL,
  transaction_count NUMBER(7)   CONSTRAINT txnc_nn NOT NULL,
  interest_paid     NUMBER(7,3) CONSTRAINT intp_nn NOT NULL,
  fees_charged      NUMBER(7,3) CONSTRAINT feec_nn NOT NULL)
PARTITION BY LIST (state_key)(
  PARTITION northwest VALUES ('OR', 'WA'),
  PARTITION southwest VALUES ('AZ', 'UT', 'NM'),
  PARTITION southeast VALUES ('FL','GA','SC','AL','TN','NC'),
  PARTITION rest      VALUES (DEFAULT));
43 Capitalizing on Partitioning
Hash Example
CREATE TABLE account_balance_hash(
  account_key       NUMBER(7)   CONSTRAINT acct_nn NOT NULL,
  branch_key        NUMBER(7)   CONSTRAINT brch_nn NOT NULL,
  product_key       NUMBER(7)   CONSTRAINT prod_nn NOT NULL,
  snapshot_date     DATE        CONSTRAINT mnth_nn NOT NULL,
  state_key         CHAR(2)     CONSTRAINT stat_nn NOT NULL,
  ending_bal        NUMBER(7,3) CONSTRAINT ebal_nn NOT NULL,
  average_daily_bal NUMBER(7,3) CONSTRAINT abal_nn NOT NULL,
  transaction_count NUMBER(7)   CONSTRAINT txnc_nn NOT NULL,
  interest_paid     NUMBER(7,3) CONSTRAINT intp_nn NOT NULL,
  fees_charged      NUMBER(7,3) CONSTRAINT feec_nn NOT NULL)
PARTITION BY HASH (account_key)
PARTITIONS 16 STORE IN (TS1_DATA, TS2_DATA, TS3_DATA, TS4_DATA);
44 Capitalizing on Partitioning
Composite Range-Hash Example
CREATE TABLE account_bal_range_hash(
  account_key       NUMBER(7)   CONSTRAINT acct_nn NOT NULL,
  branch_key        NUMBER(7)   CONSTRAINT brch_nn NOT NULL,
  product_key       NUMBER(7)   CONSTRAINT prod_nn NOT NULL,
  snapshot_date     DATE        CONSTRAINT mnth_nn NOT NULL,
  state_key         CHAR(2)     CONSTRAINT stat_nn NOT NULL,
  ending_bal        NUMBER(7,3) CONSTRAINT ebal_nn NOT NULL,
  average_daily_bal NUMBER(7,3) CONSTRAINT abal_nn NOT NULL,
  transaction_count NUMBER(7)   CONSTRAINT txnc_nn NOT NULL,
  interest_paid     NUMBER(7,3) CONSTRAINT intp_nn NOT NULL,
  fees_charged      NUMBER(7,3) CONSTRAINT feec_nn NOT NULL)
PARTITION BY RANGE (snapshot_date)
SUBPARTITION BY HASH (account_key)
SUBPARTITIONS 8
( PARTITION Q1_ACCT_BAL VALUES LESS THAN (TO_DATE('01-APR-2000','DD-MON-YYYY')),
  PARTITION Q2_ACCT_BAL VALUES LESS THAN (TO_DATE('01-JUL-2000','DD-MON-YYYY')),
  PARTITION Q3_ACCT_BAL ...
45 Capitalizing on Partitioning
Composite Range-List Example
CREATE TABLE account_bal_range_list(
  account_key       NUMBER(7)   CONSTRAINT acct_nn NOT NULL,
  branch_key        NUMBER(7)   CONSTRAINT brch_nn NOT NULL,
  product_key       NUMBER(7)   CONSTRAINT prod_nn NOT NULL,
  snapshot_date     DATE        CONSTRAINT mnth_nn NOT NULL,
  state_key         CHAR(2)     CONSTRAINT stat_nn NOT NULL,
  ending_bal        NUMBER(7,3) CONSTRAINT ebal_nn NOT NULL,
  average_daily_bal NUMBER(7,3) CONSTRAINT abal_nn NOT NULL,
  transaction_count NUMBER(7)   CONSTRAINT txnc_nn NOT NULL,
  interest_paid     NUMBER(7,3) CONSTRAINT intp_nn NOT NULL,
  fees_charged      NUMBER(7,3) CONSTRAINT feec_nn NOT NULL)
PARTITION BY RANGE (snapshot_date)
SUBPARTITION BY LIST (state_key)
SUBPARTITION TEMPLATE(
  SUBPARTITION northwest VALUES ('OR', 'WA'),
  SUBPARTITION southwest VALUES ('AZ', 'UT', 'NM'),
  SUBPARTITION southeast VALUES ('FL', 'GA','SC','AL','TN','NC'),
  SUBPARTITION rest      VALUES (DEFAULT))
( PARTITION q1_2002 VALUES LESS THAN (TO_DATE('1-APR-2002','DD-MON-YYYY')),
  ...
46 Capitalizing on Partitioning
External Table to Partition using INSERT /*+ APPEND */
(Diagram: source file(s) feeding an external table, loaded into the target partition.)
- FTP the 02Nov2003_Sales extract
- ALTER TABLE to add the file(s) to the LOCATION clause
- Perform INSERT /*+ APPEND */ INTO the partitioned table
47 Capitalizing on Partitioning
(Diagram: sales extract external table loaded into partition PQ3 of SALES, alongside partitions PQ1 and PQ2.)
INSERT /*+ APPEND */ INTO SALES PARTITION (PQ3)
SELECT * FROM SALES_EXTRACT_EXTERNAL;
48 Capitalizing on Partitioning
- What to partition
- Fact Tables
- Generally Range-Hash composite
- Range on some date (partition elimination)
- Hash on the driving dimension key (partition-wise join)
- Dimensions
- Hashing on the primary key of dimension tables facilitates full and partial partition-wise joins.
- For a full partition-wise join between a fact and a dimension table you need to hash partition both on the same key into the same number of buckets (see the sketch below).
- Materialized Views
- Generally Range-Hash composite
- Generally mirror the fact table's partition scheme
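A minimal sketch of that matched-hash layout, using made-up table and column names consistent with the earlier examples; the essential points are the shared hash key (cust_key) and the identical bucket count (16):
CREATE TABLE customer_dim_hash (
  cust_key  NUMBER NOT NULL,
  cust_name VARCHAR2(20))
PARTITION BY HASH (cust_key) PARTITIONS 16;

CREATE TABLE sales_fact_rh (
  cust_key    NUMBER NOT NULL,
  time_id     DATE   NOT NULL,
  amount_sold NUMBER(10,2))
PARTITION BY RANGE (time_id)
SUBPARTITION BY HASH (cust_key) SUBPARTITIONS 16
( PARTITION q1_2003 VALUES LESS THAN (TO_DATE('01-APR-2003','DD-MON-YYYY')),
  PARTITION q2_2003 VALUES LESS THAN (TO_DATE('01-JUL-2003','DD-MON-YYYY')));

-- A parallel join on cust_key can now be executed partition-wise,
-- pairing each hash bucket of the fact with the same bucket of the dimension:
SELECT /*+ PARALLEL(s, 8) PARALLEL(c, 8) */ c.cust_name, SUM(s.amount_sold)
FROM   sales_fact_rh s, customer_dim_hash c
WHERE  s.cust_key = c.cust_key
GROUP  BY c.cust_name;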
49 Capitalizing on Materialized Views
- Materialized Views
- Enable queries to be re-written to take advantage of pre-calculated summaries, thus reducing or eliminating sorts and joins.
- Materialized views can dramatically increase the performance of queries when applied judiciously
- Reduce the number of sorts
- Reduce the number of joins
- Pre-filter data
- Can be indexed and partitioned
- Seamless to end users
- Enhancements in 9i include
- Removed restrictions, enabling them to be leveraged in more situations
- Fast refresh is now possible on materialized views containing the UNION ALL operator (a sketch follows below).
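A rough sketch of a UNION ALL materialized view of the kind that enhancement targets, assuming two regional sales tables with materialized view logs already created WITH ROWID; the table names and marker values are illustrative, and the full fast-refresh restrictions are in the 9i documentation:
CREATE MATERIALIZED VIEW all_sales_mv
  BUILD IMMEDIATE
  REFRESH FAST ON DEMAND
  ENABLE QUERY REWRITE
AS
SELECT 1 AS marker, s.rowid AS rid, s.cust_id, s.amount_sold  -- marker column distinguishes the branches
FROM   us_sales s
UNION ALL
SELECT 2 AS marker, s.rowid AS rid, s.cust_id, s.amount_sold
FROM   eu_sales s;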
50Capitalizing on Materialized Views
SELECT T.FISCAL_QUARTER_DESC, C.CUST_CITY,
COUNT() AS SALE_COUNT, SUM(S.AMOUNT_SOLD) AS
SALE_DOLLARS FROM oradata.CUSTOMERS C,
oradata.SALES S, oradata.TIMES T WHERE C.CUST_ID
S.CUST_ID AND T.TIME_ID S.TIME_ID GROUP BY
T.FISCAL_QUARTER_DESC, C.CUST_CITY
Query is re-written to be resolved from the MV
instead of from the base tables
51 Capitalizing on Materialized Views
- How to capitalize on MVs
- Identify candidate queries
- Analysis of common queries based on design
- Oracle's Summary Advisor wizard
- DBMS_OLAP
- Create MVs based on the analysis and refresh requirements
- Test
- Benchmark
- Measure utilization
- Repeat
52 Capitalizing on Materialized Views
proc sql;  /* CTAS Implicit Pass-Through */
CREATE TABLE work.quartly_city_canidate AS
SELECT T.FISCAL_QUARTER_DESC, C.CUST_CITY,
       COUNT(*) AS SALE_COUNT, SUM(S.AMOUNT_SOLD) AS SALE_DOLLARS
FROM oradata.CUSTOMERS C, oradata.SALES S, oradata.TIMES T
WHERE C.CUST_ID = S.CUST_ID AND T.TIME_ID = S.TIME_ID
GROUP BY T.FISCAL_QUARTER_DESC, C.CUST_CITY;
quit;
real time 9.62 seconds

Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=CHOOSE (Cost=5392 Card=7453 Bytes=350291)
1    0   SORT (GROUP BY) (Cost=5392 Card=7453 Bytes=350291)
2    1     HASH JOIN (Cost=1077 Card=1016271 Bytes=47764737)
3    2       TABLE ACCESS (FULL) OF 'TIMES' (Cost=6 Card=1461 Bytes=23376)
4    2       HASH JOIN (Cost=1043 Card=1016271 Bytes=31504401)
5    4         TABLE ACCESS (FULL) OF 'CUSTOMERS' (Cost=106 Card=50000 Bytes=700000)
6    4         PARTITION RANGE (ALL)
7    6           TABLE ACCESS (FULL) OF 'SALES' (Cost=469 Card=1016271 Bytes=17276607)

Statistics
----------------------------------------------------------
5855 consistent gets
   1 sorts (memory)
   0 sorts (disk)
5075 rows processed

Note the number of joins and sorts, the amount of memory, and the number of full table scans.
53Capitalizing on Materialized Views
proc sql connect to ORACLE as ORACON (usersh
passwordsh1 pathdemo.na.sas.com') / Create a
Materialized View / execute ( CREATE
MATERIALIZED VIEW QTRLY_CITY_SALES_MV compress BUI
LD IMMEDIATE REFRESH ON COMMIT ENABLE QUERY
REWRITE AS SELECT T.FISCAL_QUARTER_DESC,
C.CUST_CITY, COUNT() SALE_COUNT,
SUM(S.AMOUNT_SOLD) SALE_DOLLARS FROM CUSTOMERS
C, SALES S, TIMES T WHERE C.CUST_ID S.CUST_ID
AND T.TIME_ID S.TIME_ID GROUP BY
T.FISCAL_QUARTER_DESC, C.CUST_CITY ) by
ORACON disconnect from ORACON quit
Build Immediate Create this Now Refresh on
Commit Keep the MV current when Insert, Update,
and Deletes occur Enable Query Rewrite Enables
dynamic query re-direction. Compress
compresses redundant data i.e. makes the MV
smaller
Question did it improve the performance of our
query?
54 Capitalizing on Materialized Views
proc sql;  /* CTAS Implicit Pass-Through */
CREATE TABLE work.quartly_city_canidate AS
SELECT T.FISCAL_QUARTER_DESC, C.CUST_CITY,
       COUNT(*) AS SALE_COUNT, SUM(S.AMOUNT_SOLD) AS SALE_DOLLARS
FROM oradata.CUSTOMERS C, oradata.SALES S, oradata.TIMES T
WHERE C.CUST_ID = S.CUST_ID AND T.TIME_ID = S.TIME_ID
GROUP BY T.FISCAL_QUARTER_DESC, C.CUST_CITY;
quit;
real time 0.76 seconds

Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=2206 Bytes=114712)
1    0   TABLE ACCESS (FULL) OF 'QTRLY_CITY_SALES_MV' (Cost=4 Card=2206 Bytes=114712)

Statistics
----------------------------------------------------------
     7 recursive calls
     0 db block gets
   366 consistent gets
     0 physical reads
     0 redo size
137260 bytes sent via SQL*Net to client
  2641 bytes received via SQL*Net from client
   340 SQL*Net roundtrips to/from client
     0 sorts (memory)
     0 sorts (disk)
  5075 rows processed

No joins, no sorts, less memory.
55 Capitalizing on Alt aggregate strategies
- Aggregate building with pCTAS (parallel CREATE TABLE AS SELECT)
CREATE TABLE quartly_city_passthrough
PARALLEL NOLOGGING AS
SELECT T.FISCAL_QUARTER_DESC, C.CUST_CITY,
       COUNT(*) AS SALE_COUNT, SUM(S.AMOUNT_SOLD) AS SALE_DOLLARS
FROM CUSTOMERS C, SALES S, TIMES T
WHERE C.CUST_ID = S.CUST_ID AND T.TIME_ID = S.TIME_ID
GROUP BY T.FISCAL_QUARTER_DESC, C.CUST_CITY;
56 Capitalizing on Alt aggregate strategies
- Aggregate building with pIIAS (parallel INSERT INTO ... AS SELECT)
INSERT /*+ APPEND */ INTO quartly_city_iias
SELECT T.FISCAL_QUARTER_DESC, C.CUST_CITY,
       COUNT(*) AS SALE_COUNT, SUM(S.AMOUNT_SOLD) AS SALE_DOLLARS
FROM CUSTOMERS C, SALES S, TIMES T
WHERE C.CUST_ID = S.CUST_ID AND T.TIME_ID = S.TIME_ID
GROUP BY T.FISCAL_QUARTER_DESC, C.CUST_CITY;
57 Capitalizing on bitmap join index
- Bitmap Join Index
- Creates a bitmap index for the resolution of joins of two or more tables; works similarly to a materialized view.
- A bitmap join index is space efficient because it compresses the rowids, where a materialized view does not.
- Can be leveraged to improve the performance of snowflake schemas and common join operations across facts.
58Capitalizing on bitmap join index
SELECT countries.country_name, sum(sales.amount_so
ld) FROM sales, customers, countries WHERE
sales.cust_id customers.cust_id AND
customers.country_id countries.country_id
CREATE BITMAP INDEX bjx_sales_country ON
sales(countries.country_name)) FROM sales,
customers, countries WHERE sales.cust_id
customers.cust_id AND countries.country_id
customers.country_id LOCAL PARALLEL NOLOGGING
COMPUTE STATISTICS
Can be leveraged to improve performance of
snowflake query problems
59 Capitalizing on SQL for Analysis
- SQL for Analysis
- Especially useful for reporting and preparing data sets for statistical analysis
- Rankings and percentiles
- Cumulative distributions, percent rank, and N-tiles
- Moving window calculations
- Allow you to find moving and cumulative aggregations, such as sums and averages
- Lag/lead analysis
- Enables direct inter-row references so you can calculate period-to-period changes
- First/last analysis
- First or last value in an ordered group
60 Capitalizing on SQL for Analysis
- RANK
- RANK() OVER ( query_partition_clause order_by_clause )
- DENSE_RANK() OVER ( query_partition_clause order_by_clause )
SELECT country_id, TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES_TOTAL,
       RANK() OVER (ORDER BY SUM(amount_sold) DESC NULLS LAST) AS SALES_LEADER
FROM sales, products, customers, times, channels
WHERE sales.prod_id = products.prod_id
  AND sales.cust_id = customers.cust_id
  AND sales.time_id = times.time_id
  AND sales.channel_id = channels.channel_id
  AND times.calendar_month_desc IN ('2000-09', '2000-10')
GROUP BY country_id;

CO SALES_TOTAL    SALES_LEADER
-- -------------- ------------
US     13,333,510            1
NL      7,174,053            2
UK      6,421,240            3
DE      6,346,440            4
FR      4,404,921            5
ES      1,699,209            6
IE      1,549,407            7
IN        732,502            8
AU        632,475            9
BR        606,281           10
61 Capitalizing on SQL for Analysis
SELECT * FROM
  (SELECT country_id, TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES_TOTAL,
          RANK() OVER (ORDER BY SUM(amount_sold) DESC NULLS LAST) AS COUNTRY_RANK
   FROM sales, products, customers, times, channels
   WHERE sales.prod_id = products.prod_id
     AND sales.cust_id = customers.cust_id
     AND sales.time_id = times.time_id
     AND sales.channel_id = channels.channel_id
     AND times.calendar_month_desc IN ('2000-09', '2000-10')
   GROUP BY country_id) /* inline view */
WHERE COUNTRY_RANK <= 10;

CO SEPT_TOTAL     COUNTRY_RANK
-- -------------- ------------
US      6,517,786            1
NL      3,447,121            2
UK      3,207,243            3
DE      3,194,765            4
FR      2,125,572            5
ES        777,453            6
IE        770,758            7
IN        371,198            8
BR        317,001            9
AU        302,393           10
62 Capitalizing on SQL for Analysis
- Moving window example: running total
SELECT c.country_id AS CO, t.calendar_quarter_desc AS QUARTER,
       TO_CHAR(SUM(amount_sold), '9,999,999,999') AS Q_SALES,
       TO_CHAR(SUM(SUM(amount_sold)) OVER (PARTITION BY c.country_id
               ORDER BY c.country_id, t.calendar_quarter_desc
               ROWS UNBOUNDED PRECEDING), '9,999,999,999') AS RUNNING_TOTAL
FROM sales s, times t, customers c
WHERE s.time_id = t.time_id
  AND s.cust_id = c.cust_id
  AND t.calendar_year = 2000
GROUP BY c.country_id, t.calendar_quarter_desc
ORDER BY c.country_id, t.calendar_quarter_desc;

CO QUARTER  Q_SALES        RUNNING_TOTAL
US 2000-Q1      21,719,528    21,719,528
US 2000-Q2      21,915,534    43,635,062
US 2000-Q3      18,857,276    62,492,338
US 2000-Q4      14,970,316    77,462,654
63 Capitalizing on SQL for Analysis
Moving window example: moving average
SELECT c.country_id AS CO, t.calendar_month_desc AS CAL,
       TO_CHAR(SUM(amount_sold), '9,999,999,999') AS SALES,
       TO_CHAR(AVG(SUM(amount_sold)) OVER (ORDER BY c.country_id,
               t.calendar_month_desc ROWS 2 PRECEDING), '9,999,999,999') AS MOVING_3_MONTH
FROM sales s, times t, customers c
WHERE s.time_id = t.time_id
  AND s.cust_id = c.cust_id
  AND t.calendar_year = 2000
GROUP BY c.country_id, t.calendar_month_desc
ORDER BY c.country_id, t.calendar_month_desc;

CO CALENDAR SALES          MOVING_3_MONTH
-- -------- -------------- --------------
AR 2000-01         172,380        172,380
AR 2000-02         140,906        156,643
AR 2000-03         142,581        151,956
AR 2000-04         169,727        151,071
AR 2000-05         157,016        156,441
AR 2000-06         155,675        160,806
64 Capitalizing on SQL for Analysis
TIME_ID   SALES      LAG1       LAG2       LEAD1      LEAD2
--------- ---------- ---------- ---------- ---------- ----------
01-JAN-00    869,132                          909,726    896,626
02-JAN-00    909,726    869,132               896,626    895,204
03-JAN-00    896,626    909,726    869,132    895,204    954,066
04-JAN-00    895,204    896,626    909,726    954,066    918,154
05-JAN-00    954,066    895,204    896,626    918,154    895,849
06-JAN-00    918,154    954,066    895,204    895,849    889,525

SELECT time_id, TO_CHAR(SUM(amount_sold),'9,999,999') AS SALES,
       TO_CHAR(LAG(SUM(amount_sold),1)  OVER (ORDER BY time_id),'9,999,999') AS LAG1,
       TO_CHAR(LAG(SUM(amount_sold),2)  OVER (ORDER BY time_id),'9,999,999') AS LAG2,
       TO_CHAR(LEAD(SUM(amount_sold),1) OVER (ORDER BY time_id),'9,999,999') AS LEAD1,
       TO_CHAR(LEAD(SUM(amount_sold),2) OVER (ORDER BY time_id),'9,999,999') AS LEAD2
FROM sales
WHERE time_id BETWEEN TO_DATE('01-JAN-2000') AND TO_DATE('31-JAN-2000')
GROUP BY time_id;
65 Capitalizing on SQL for Analysis
- FIRST/LAST lets you order on column A but return a result of an aggregate applied on column B.
- List price of the product with the lowest minimum price: LPLO_MINP
- Lowest minimum price: LO_MINP
- List price of the product with the highest minimum price: LPHI_MINP
- Highest minimum price: HI_MINP
SELECT prod_subcategory,
       MIN(prod_list_price) KEEP (DENSE_RANK FIRST ORDER BY (prod_min_price)) AS LPLO_MINP,
       MIN(prod_min_price) AS LO_MINP,
       MAX(prod_list_price) KEEP (DENSE_RANK LAST ORDER BY (prod_min_price)) AS LPHI_MINP,
       MAX(prod_min_price) AS HI_MINP
FROM products
WHERE prod_category = 'Women'
GROUP BY prod_subcategory;

PROD_SUBCATEGORY            LPLO_MINP LO_MINP LPHI_MINP HI_MINP
--------------------------- --------- ------- --------- -------
Dresses - Women                    44   31.28       189     165
Easy Shapes - Women                51   39.47       149  127.39
Knit Outfits - Women               38   17.78       138   95.63
Outerwear - Women                  58   27.14       198  131.87
Shirts And Jackets - Women         19   13.68       162   145.8
66 Other Important New Features
- Table Compression
- Multiple Block Sizes
- RAC for warehousing
67 Capitalizing on Table Compression
- Table Compression
- Can improve performance by reducing both disk and memory (buffer cache) requirements.
- Note: tables with large amounts of DML operations are not good candidates for compression
- Ideal candidates are partitioned fact tables, materialized views with rollups, and tables with a high degree of redundant data
- Regular tables
- CREATE TABLE (...) COMPRESS
- ALTER TABLE ... COMPRESS
- Partitioned tables
- Can compress the entire table or do so on a partition-by-partition basis
- CREATE TABLE (...) COMPRESS PARTITION BY (...)
- PARTITION p1 VALUES ('FL', 'GA') COMPRESS
- Materialized Views
- CREATE MATERIALIZED VIEW QTRLY_SALES_MV COMPRESS ...
- ALTER MATERIALIZED VIEW ... COMPRESS
(A brief sketch of these variants follows below.)
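A brief illustrative sketch of those variants; the table, partition, and MV names are made up, and ALTER ... COMPRESS only affects data loaded after the change:
-- Compress a whole table at creation time (direct-path loads are compressed)
CREATE TABLE sales_history_comp COMPRESS
AS SELECT * FROM sales_history;

-- Compress only selected partitions of a list-partitioned table
CREATE TABLE regional_sales (
  state_key   CHAR(2),
  amount_sold NUMBER(10,2))
PARTITION BY LIST (state_key) (
  PARTITION p_southeast VALUES ('FL', 'GA') COMPRESS,
  PARTITION p_rest      VALUES (DEFAULT));

-- Compress an existing materialized view
ALTER MATERIALIZED VIEW qtrly_sales_mv COMPRESS;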
68 Capitalizing on Multiple Block Sizes
- Multiple Block Size Capitalization
- When you need to run a mix of OLTP activity and DSS activity within the same instance
- When you have an OLTP system with a smaller block size and are using transportable tablespaces to move tables to a decision support system
- Place small, static dimensions in a smaller block cache (4K or 8K) and larger dimensions and facts in a large block cache (16K); a sketch of the mechanics follows below
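A minimal sketch of the mechanics, assuming the database's standard block size is 8K (the tablespace name, file path, and cache size are illustrative): a non-standard block size tablespace can only be created once the matching buffer cache parameter has been set.
-- Size a buffer cache for 16K blocks (in addition to the standard cache)
ALTER SYSTEM SET db_16k_cache_size = 256M;

-- Create a tablespace that uses the non-standard 16K block size
CREATE TABLESPACE facts_16k
  DATAFILE '/u01/oradata/dw/facts_16k_01.dbf' SIZE 4096M
  BLOCKSIZE 16K;

-- Large fact tables (and their partitions) can then be placed in it
CREATE TABLE sales_fact_16k (
  cust_key    NUMBER,
  time_id     DATE,
  amount_sold NUMBER(10,2))
TABLESPACE facts_16k;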
69 Capitalizing on RAC
(Diagram: Node 1 through Node N connected by a high-speed interconnect, sharing a Storage Area Network (SAN).)
70 Capitalizing on RAC
- RAC provides both speed-up and scale-up
- Theoretically, to double performance, simply double the number of nodes.
- Limiting traffic over the interconnect is key to performance.
- Parallel Loading
- Multiple SQL*Loader sessions
- Collocated extracts
- Querying
- Partition key choices
- Data model design choices
- Define join dependencies
- Partition-wise joins are key to limiting the traffic over the interconnect
- Match partitions so that they are collocated on the same node
- Oracle's automatic node affinity improves the performance of DML operations by routing them to the node that has affinity for the partition.
71 Conclusion
- Parallelism is key to the performance of DSS applications
- Design is the limiting factor for parallelism and performance
- Oracle has some slick new features that enhance and simplify common warehouse operations.
- The future direction of data warehousing:
- Better performance
- Increased parallelism, reduced dependencies
- Loads
- Queries
- Reduced complexity, higher user satisfaction
- Better design paradigms and ideologies
- Enhanced features
- Increased usability
- Increased ability to capitalize
72 Next Steps
- Recommended sessions
- Optimal Usage of Oracle's Partitioning Option
- Oracle9i: The Features They Didn't Tell You About
- Advanced PL/SQL and Oracle9i ETL
- Oracle 9i RAC Concepts and Implementation - A Practical Perspective