Title: How cool is that? Capitalizing on Oracle 9i for data warehousing
2 How cool is that? Capitalizing on Oracle 9i for data warehousing
Session ID: 36482
- Mike Ames, Oracle DBA
- SAS Institute
3 Topics
- Performance, Parallelism, and Design
- Capitalizing on Oracle 9i for DW
- External Tables
- Merge and Multiple Inserts
- Partitioning
- Materialized View Enhancements
- Bitmap Join Indexes
- SQL for Analysis
- Conclusions and Questions
4 Performance, Parallelism, and Design: serial operation
- At 20 MB/sec it takes about 14½ hours to read 1 TB serially
- With perfect parallelism and 16 parallel processes it takes about 54 minutes (assuming sustained throughput); the arithmetic is worked out below
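As a rough check on those numbers (assuming 1 TB = 1,048,576 MB and perfectly sustained, evenly divided throughput):
\[ \frac{1{,}048{,}576\ \text{MB}}{20\ \text{MB/s}} \approx 52{,}429\ \text{s} \approx 14.6\ \text{hours}, \qquad \frac{52{,}429\ \text{s}}{16} \approx 3{,}277\ \text{s} \approx 55\ \text{minutes}. \]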
5 Performance, Parallelism, and Design
- Amdahl's Law in a nutshell
- Speedup (S): the efficiency gained by executing a process in parallel
- Formula for Speedup:
- S = 1 / (% sequential + (% parallel / # processors) + overhead)
- S = Speedup
- N = Number of processors
- B = % of the process or algorithm that is serial
- O = overhead
- S = 1 / (B + (1 - B)/N + O)
- Example: 8 processors, 5% serial operations
6 Performance, Parallelism, and Design
- Why is this important?
- Dependencies created by design and complexity reduce our ability to parallelize data warehouse operations.
- S = 1 / (5% + (95% / 8) + 0) ≈ 5.9
- Assuming perfect parallelism, a query that takes 30 minutes to execute serially would take just over 5 minutes in parallel (worked out below).
- Keys to parallel performance
- Minimize dependencies
- Minimize overhead associated with complexity
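Working the slide's example through Amdahl's formula (B = 5% serial, N = 8 processors, overhead O assumed to be zero):
\[ S = \frac{1}{0.05 + \tfrac{0.95}{8} + 0} = \frac{1}{0.16875} \approx 5.9, \qquad \frac{30\ \text{min}}{5.9} \approx 5.1\ \text{min}. \]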
7 Performance, Parallelism, and Design
- Two things to note
- Incremental speedup from doubling the processors depends on the % sequential
- 5% sequential with 4 processors > 20% sequential with 8 processors (the former yields the greater speedup)
8 Performance, Parallelism, and Design: realistic expectations
9 Performance, Parallelism, and Design: measuring performance
- Database performance is generally measured by:
- Load Performance
- Ability to parallelize
- Order / number of operations and complexity to maintain integrity
- Ability to leverage RDBMS load facilities
- Query Performance
- Ability to parallelize
- Ability to leverage partitioning
- Ability to exploit query re-write and summary data
- Number of sort and join operations
- Usability
- Ability of users to capitalize on data
- Level of complexity
10 Performance, Parallelism, and Design: critical factor
- Database design is critical for performance
- Ability to parallelize is constrained by dependencies
- Data and referential integrity
- Order of operations
- Ability to leverage RDBMS features is constrained by design
- Loading
- Partitioning
- Indexing
- Query re-write
- Materialized Views
- End user satisfaction is constrained by complexity
- Ease of use
- Data quality
11 Performance, Parallelism, and Design
- The bottom line: design dictates performance
- Good design:
- Maximizes parallelism by minimizing dependencies
- Minimizes complexity
- Capitalizes on features of the RDBMS (Oracle 9i)
12 Capitalizing on Oracle 9i
- External Tables
- Merge and Multiple Inserts
- Partitioning
- Materialized View Enhancements
- Bitmap Join Indexes
- SQL for Analysis
13 Capitalizing on External Tables
- External Tables
- Enable you to reference multiple flat files as if they were a table in your database.
- Restrictions
- Read only: no DML (INSERT, UPDATE, DELETE)
- Can't be used for partition exchange
14 Capitalizing on External Tables
- Loading from flat files
- Old method
- Create a stage table in your warehouse
- Use SQL*Loader to bulk load the table
- Read from the stage table, performing operations to put the data into final format.
- New method
- Create a table that references the external file
- Read from the external table, performing operations to put the data into final format.
- Significantly reduces the number of times data has to be moved around and virtually eliminates the need to use SQL*Loader directly.
15 Capitalizing on External Tables
External Table INSERT /*+ APPEND */ Example
(Diagram: source file(s) feeding an external table, loaded into the target table.)
- Steps
- FTP the 02Nov2003_Sales extract
- ALTER TABLE to add the file(s) to the LOCATION clause
- Perform INSERT /*+ APPEND */ INTO the target
16Capitalizing on External Tables
CREATE TABLE NOV_SALES_EXTERNAL( PROD_ID
NUMBER(6), CUST_ID NUMBER, TIME_ID DATE,
CHANNEL_ID CHAR(1), PROMO_ID NUMBER(6),
QUANTITY_SOLD NUMBER(3), AMOUNT_SOLD
NUMBER(10,2), UNIT_COST NUMBER(10,2),
UNIT_PRICE NUMBER(10,2)) ORGANIZATION external (
TYPE oracle_loader DEFAULT DIRECTORY extracts
ACCESS PARAMETERS ( RECORDS DELIMITED BY
NEWLINE CHARACTERSET US7ASCII BADFILE
log_file_dir' NOV_SALES_EXTERNAL.bad'
LOGFILE log_file_dir' NOV_SALES_EXTERNAL.log'
FIELDS TERMINATED BY "" LDRTRIM ) location
( '01Nov2003Sales.dat) )REJECT LIMIT UNLIMITED
Step 0. Create a target Table and an external
extract table Step 1. FTP Nov 2nd and 3rd extracts
17 Capitalizing on External Tables
- Step 2: Alter the external table, adding the new files to the LOCATION.
ALTER TABLE NOV_SALES_EXTERNAL
  LOCATION ('01Nov2003Sales.dat', '02Nov2003Sales.dat', '03Nov2003Sales.dat');
- Step 3: Insert into the target table from the external table.
INSERT /*+ APPEND */ INTO SALES_FACT
SELECT * FROM NOV_SALES_EXTERNAL;
18 Capitalizing on External Tables
(Diagram: sales extract external table loaded into the SALES table.)
INSERT /*+ APPEND */ INTO SALES
SELECT * FROM SALES_EXTRACT_EXTERNAL;
19 Capitalizing on External Tables
- Fixed-length fields are processed faster than delimited fields or records terminated by a string.
- Single-character delimiters for record terminators and field delimiters are faster to process than multi-character delimiters.
- Using INSERT /*+ APPEND */ with either key lookup functions or joins (for surrogate key lookup) is a great method to load fact tables; see the sketch below.
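A minimal sketch of that fact-load pattern, assuming a SALES_FACT target and CUSTOMER_DIM / PRODUCT_DIM dimensions that carry surrogate keys (these names and columns are illustrative, not from the slides):
INSERT /*+ APPEND */ INTO sales_fact
  (cust_key, prod_key, time_id, quantity_sold, amount_sold)
SELECT c.cust_key,            -- surrogate keys looked up from the dimensions
       p.prod_key,
       x.time_id,
       x.quantity_sold,
       x.amount_sold
FROM   nov_sales_external x,  -- external table over the flat-file extract
       customer_dim c,
       product_dim p
WHERE  c.cust_id = x.cust_id  -- joins on the natural (production) keys
AND    p.prod_id = x.prod_id;
COMMIT;
The direct-path insert reads the extract once and resolves the surrogate keys in the same pass, so no intermediate stage table is needed.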
20 Capitalizing on merge
- Merge
- Enables you to perform updates to matched records and inserts of new records.
- Leverages parallelism
- Is a slick way of performing slowly changing dimension logic.
21 Capitalizing on merge
(Diagram: customer extract external table merged with the existing customer table to give the customer table post merge; new rows are inserted, matched rows are updated.)
MERGE INTO CUSTOMERS c
USING CUST_EXT x
ON (c.cust_id = x.cust_id)
WHEN MATCHED THEN UPDATE SET c.city = x.city
WHEN NOT MATCHED THEN INSERT (cust_id) VALUES (x.cust_id);
22 Capitalizing on merge
Typical Type 1 slowly changing dimension (SCD) logic:
(Flow diagram: Source Extract, Compare, New/Changed Records, Generate Key, Insert New, Update Existing, Master Dimension / Cross Reference.)
23 Capitalizing on merge for Type 1 SCD
- Capitalize on MERGE for Type 1 SCDs
- Steps
- Create your dimension table
- Create an external table as your extract table that contains all of the columns in your dimension except the surrogate key
- Create an Oracle sequence; this will be used for the surrogate key of your dimension
- Use MERGE to populate your dimension
24 Capitalizing on merge for Type 1 SCD
- Create our dimension (target) table
Note: cust_key is our surrogate key; cust_id is the natural key or production key.
CREATE TABLE CUSTOMER_DIM (
  CUST_KEY  NUMBER       NOT NULL,
  CUST_ID   NUMBER       NOT NULL,
  CUST_NAME VARCHAR2(20) NOT NULL,
  ZIP       CHAR(5)      NOT NULL,
  CITY      VARCHAR2(30) NOT NULL,
  STATE     VARCHAR2(40) NULL,
  CONSTRAINT CUSTOMER_PK PRIMARY KEY (CUST_KEY));
25 Capitalizing on merge for Type 1 SCD
- Create our extract or staging table
2a. Running a new extract is simply a matter of referencing a new file:
ALTER TABLE customer_extract LOCATION ('CIF_NOV_2003.dat');
CREATE TABLE CUSTOMER_EXTRACT (
  CUST_KEY  NUMBER,
  CUST_ID   NUMBER,
  CUST_NAME VARCHAR2(20),
  ZIP       CHAR(5),
  CITY      VARCHAR2(30),
  STATE     VARCHAR2(40))
ORGANIZATION EXTERNAL (
  TYPE oracle_loader
  DEFAULT DIRECTORY extracts_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
    BADFILE log_file_dir:'customer_extract.bad'
    LOGFILE log_file_dir:'customer_extract.log'
    FIELDS TERMINATED BY '|' LDRTRIM )   -- field delimiter character was lost in the original; '|' assumed
  LOCATION ('cust_extract.dat'))
REJECT LIMIT UNLIMITED
PARALLEL;
26 Capitalizing on merge for Type 1 SCD
- Create our sequence; this will be used for the surrogate key
CREATE SEQUENCE CUST_SEQ START WITH 1000 INCREMENT BY 1000;
27 Capitalizing on merge for Type 1 SCD
- Use a single MERGE statement to perform our Type 1 SCD logic
(Diagram: customer extract external table merged into the existing customer dimension; matched rows are updated, new rows are inserted into the post-merge dimension.)
MERGE INTO CUSTOMER_DIM c
USING CUSTOMER_EXTRACT x
ON (c.cust_id = x.cust_id)
WHEN MATCHED THEN UPDATE SET
  CUST_FIRST_NAME = x.CUST_FIRST_NAME,
  CUST_LAST_NAME  = x.CUST_LAST_NAME,
  ...
  CUST_EMAIL      = x.CUST_EMAIL
WHEN NOT MATCHED THEN INSERT
  (CUST_KEY, CUST_ID, ..., CUST_EMAIL)
  VALUES (CUST_SEQ.NEXTVAL, x.CUST_ID, ..., x.CUST_EMAIL);
28 Capitalizing on merge for Type 2 SCD
- Type 2 Slowly Changing Dimension (SCD)
- Type 2 SCD: a technique where a new dimension record, with a new surrogate key, is created each time a change occurs
- We can do this quite simply with a single MERGE statement: just add the change columns to the ON () portion of the merge.
MERGE INTO CUSTOMER_DIM c
USING CUSTOMER_EXTRACT x
ON (c.cust_id = x.cust_id
    AND c.city  = x.city
    AND c.state = x.state
    AND c.zip   = x.zip)                  -- change columns
WHEN MATCHED THEN UPDATE SET
  CUST_NAME = x.CUST_NAME,
  CITY      = x.CITY,
  STATE     = x.STATE,
  ZIP       = x.ZIP
WHEN NOT MATCHED THEN INSERT
  (CUST_KEY, CUST_ID, CUST_NAME, CITY, STATE, ZIP, CREATE_DATE)
  VALUES (CUST_SEQ.NEXTVAL /* surrogate key */, x.CUST_ID, x.CUST_NAME, ...,
          TRUNC(SYSDATE) /* change pointer (CREATE_DATE) */);
29 Capitalizing on merge for Type 2 SCD
- A more common approach to Type 2 slowly changing dimension logic is the addition of change pointers for reference data. Unfortunately, this requires a multi-step process.
- This is generally performed with a series of insert and update statements or procedural logic, but can be accomplished with two merge statements as well:
- One to insert new records and update existing (close out)
- One to insert new changed records
(Callouts: surrogate key, change pointers.)
30 Capitalizing on merge for Type 2 SCD
1. /* First merge: close out existing, insert new */
MERGE INTO CUST_DIM c
USING (SELECT cust_id, cust_name, city, state, zip FROM CUST_EXTRACT
       MINUS
       SELECT cust_id, cust_name, city, state, zip FROM CUST_DIM
       WHERE current_flag = 'Y') x
ON (c.cust_id = x.cust_id
    AND c.end_date = TO_DATE('01-JAN-2099','DD-MON-YYYY'))
WHEN MATCHED THEN UPDATE SET c.current_flag = 'N'
WHEN NOT MATCHED THEN INSERT
  (cust_key, cust_id, cust_name, city, state, zip, start_date, end_date, current_flag)
  VALUES (CUST_SEQ.NEXTVAL, x.cust_id, x.cust_name, x.city, x.state, x.zip,
          TRUNC(SYSDATE), TO_DATE('01-JAN-2099','DD-MON-YYYY'), 'Y');
COMMIT;

2. /* Second merge: insert new changed record */
MERGE INTO CUST_DIM c
USING (SELECT cust_id, cust_name, city, state, zip FROM CUST_EXTRACT
       MINUS
       SELECT cust_id, cust_name, city, state, zip FROM CUST_DIM) x
ON (c.cust_id = x.cust_id AND c.current_flag = 'Y')
WHEN MATCHED THEN UPDATE SET c.end_date = TRUNC(SYSDATE - 1)
WHEN NOT MATCHED THEN INSERT
  (cust_key, cust_id, cust_name, city, state, zip, start_date, end_date, current_flag)
  VALUES (CUST_SEQ.NEXTVAL, x.cust_id, x.cust_name, x.city, x.state, x.zip,
          TRUNC(SYSDATE), TO_DATE('01-JAN-2099','DD-MON-YYYY'), 'Y');
COMMIT;

3. /* Final step: date closeout */
UPDATE CUST_DIM c
SET c.end_date = TRUNC(SYSDATE - 1)
WHERE c.current_flag = 'N'
  AND c.end_date = TO_DATE('01-JAN-2099','DD-MON-YYYY');
COMMIT;
31 Capitalizing on merge for Type 2 SCD
Take note of how a single table design decision has impacted our ability to parallelize: instead of a single merge statement we now have a three-step process.
(Diagram: customer extract external table, the existing customer dimension, and the customer dimension after the Type 2 merge(s).)
32 Capitalizing on multiple inserts
- Multiple Inserts
- Enable us to conditionally insert into multiple tables in parallel.
- ALL: the result set is applied to all conditions
- FIRST: each row is applied only to the first condition it satisfies
- Leverages parallelism
- Is a slick way to segment data and load fact tables.
33Capitalizing on multiple inserts
GA_SALES
INSERT ALL WHEN state'GA or state FL THEN
INTO GA_SALES VALUES(prod_id, cust_id,sale_date,s
ale_amount qty_sold) WHEN state 'FL' THEN INTO
FL_SALES VALUES(prod_id, cust_id,sale_date,sale_a
mount qty_sold) ELSE INTO ALL_OTHER_SALES
VALUES(prod_id, cust_id,sale_date,sale_amount
qty_sold) SELECT prod_id, cust_id,sale_date,sale_a
mount qty_sold FROM sales_extract
FL_SALES
ALL_OTHER_SALES
Query
34 Capitalizing on multiple inserts
(Diagram: one query feeding GA_SALES, FL_SALES, and ALL_OTHER_SALES. With FIRST, a Florida row satisfies the first WHEN clause and goes only into GA_SALES.)
INSERT FIRST
  WHEN state = 'GA' OR state = 'FL' THEN
    INTO GA_SALES VALUES (prod_id, cust_id, sale_date, sale_amount, qty_sold)
  WHEN state = 'FL' THEN
    INTO FL_SALES VALUES (prod_id, cust_id, sale_date, sale_amount, qty_sold)
  ELSE
    INTO ALL_OTHER_SALES VALUES (prod_id, cust_id, sale_date, sale_amount, qty_sold)
SELECT prod_id, cust_id, sale_date, sale_amount, qty_sold
FROM sales_extract;
35 Capitalizing on Partitioning
- Types of Partitioning
- Range: maps rows to partitions based on ranges of column values
- List (new): enables you to explicitly control how rows map to partitions
- Hash: evenly distributes rows among partitions
- Composite
- Range-Hash: benefits of range partitioning, then further hash-distributing each sub-partition (partition pruning plus parallel processing)
- Range-List (new for 9i): benefits of range partitioning and further discrete sub-partitioning
36 Capitalizing on Partitioning
- Why is partitioning important?
- Partitioning enables you to split large volumes of data into smaller, separate buckets that can be managed independently
- Partition Pruning / Elimination
- Partition-wise Joins
- Parallel DML
- Partition Exchanging / Swapping
37 Capitalizing on Oracle 9i partitioning
- Why Partition?
- Partition pruning
- Ability to eliminate partitions that don't satisfy query conditions
(Diagram: partitions for Jan 2003, Feb 2003, and Mar 2003; only the Feb 2003 partition is scanned.)
SELECT SUM(qty_sold)
FROM sales
WHERE sale_date BETWEEN TO_DATE('01-FEB-2003') AND TO_DATE('15-FEB-2003');
38 Capitalizing on Oracle 9i partitioning
- Why Partition?
- Partition-wise joins
- Full: equi-partitioned on the join keys, i.e. the two tables are both partitioned on the same key; hash-hash is the easiest example.
- Partial: Oracle dynamically repartitions based on the reference table.
Here accounts and transactions are both hash partitioned by account_id into 32 partitions.
Note: to achieve equal work distribution, the number of partitions should always be a multiple of the degree of parallelism. Ex: here we hashed account and transaction into 32 partitions with a degree of parallelism of 8.
39 Capitalizing on Oracle 9i partitioning
(Diagram: an ACCT_ID/NAME table and an ACCT_ID/PRICE/QTY table joined partition-to-partition in a partition-wise join.)
40 Capitalizing on Partitioning
Partition Exchanging
(Diagram: a stock extracts table swapped in as partition PQ3 of stock_sales, alongside partitions PQ1 and PQ2; a fuller sketch follows below.)
ALTER TABLE stock_sales EXCHANGE PARTITION PQ3 WITH TABLE stock_extracts;
Note: this does not work with external tables.
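A slightly fuller sketch of the same swap; the INCLUDING INDEXES and WITHOUT VALIDATION clauses are standard options added here as an assumption about typical usage, not taken from the slides:
-- stock_extracts is a regular heap table (not an external table), loaded and
-- indexed to match the target partition, then swapped in as a dictionary-only operation:
ALTER TABLE stock_sales
  EXCHANGE PARTITION pq3
  WITH TABLE stock_extracts
  INCLUDING INDEXES
  WITHOUT VALIDATION;
Because the exchange only updates metadata, the data itself is never moved or copied.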
41 Capitalizing on Partitioning
Range Example
CREATE TABLE account_balance_range(
  account_key       NUMBER(7)   CONSTRAINT acct_nn NOT NULL,
  branch_key        NUMBER(7)   CONSTRAINT brch_nn NOT NULL,
  product_key       NUMBER(7)   CONSTRAINT prod_nn NOT NULL,
  snapshot_date     DATE        CONSTRAINT mnth_nn NOT NULL,
  state_key         CHAR(2)     CONSTRAINT stat_nn NOT NULL,
  ending_bal        NUMBER(7,3) CONSTRAINT ebal_nn NOT NULL,
  average_daily_bal NUMBER(7,3) CONSTRAINT abal_nn NOT NULL,
  transaction_count NUMBER(7)   CONSTRAINT txnc_nn NOT NULL,
  interest_paid     NUMBER(7,3) CONSTRAINT intp_nn NOT NULL,
  fees_charged      NUMBER(7,3) CONSTRAINT feec_nn NOT NULL)
PARTITION BY RANGE (snapshot_date)(
  PARTITION Q1_ACCT_BAL VALUES LESS THAN (TO_DATE('01-APR-2000','DD-MON-YYYY')),
  PARTITION Q2_ACCT_BAL VALUES LESS THAN (TO_DATE('01-JUL-2000','DD-MON-YYYY')),
  PARTITION Q3_ACCT_BAL VALUES LESS THAN (TO_DATE('01-OCT-2000','DD-MON-YYYY')),
  PARTITION Q4_ACCT_BAL VALUES LESS THAN (TO_DATE('01-JAN-2001','DD-MON-YYYY')));
42 Slick New Feature - Partitioning
List Example
CREATE TABLE account_balance_list(
  account_key       NUMBER(7)   CONSTRAINT acct_nn NOT NULL,
  branch_key        NUMBER(7)   CONSTRAINT brch_nn NOT NULL,
  product_key       NUMBER(7)   CONSTRAINT prod_nn NOT NULL,
  snapshot_date     DATE        CONSTRAINT mnth_nn NOT NULL,
  state_key         CHAR(2)     CONSTRAINT stat_nn NOT NULL,
  ending_bal        NUMBER(7,3) CONSTRAINT ebal_nn NOT NULL,
  average_daily_bal NUMBER(7,3) CONSTRAINT abal_nn NOT NULL,
  transaction_count NUMBER(7)   CONSTRAINT txnc_nn NOT NULL,
  interest_paid     NUMBER(7,3) CONSTRAINT intp_nn NOT NULL,
  fees_charged      NUMBER(7,3) CONSTRAINT feec_nn NOT NULL)
PARTITION BY LIST (state_key)(
  PARTITION northwest VALUES ('OR', 'WA'),
  PARTITION southwest VALUES ('AZ', 'UT', 'NM'),
  PARTITION southeast VALUES ('FL','GA','SC','AL','TN','NC'),
  PARTITION rest      VALUES (DEFAULT));
43 Capitalizing on Partitioning
Hash Example
CREATE TABLE account_balance_hash(
  account_key       NUMBER(7)   CONSTRAINT acct_nn NOT NULL,
  branch_key        NUMBER(7)   CONSTRAINT brch_nn NOT NULL,
  product_key       NUMBER(7)   CONSTRAINT prod_nn NOT NULL,
  snapshot_date     DATE        CONSTRAINT mnth_nn NOT NULL,
  state_key         CHAR(2)     CONSTRAINT stat_nn NOT NULL,
  ending_bal        NUMBER(7,3) CONSTRAINT ebal_nn NOT NULL,
  average_daily_bal NUMBER(7,3) CONSTRAINT abal_nn NOT NULL,
  transaction_count NUMBER(7)   CONSTRAINT txnc_nn NOT NULL,
  interest_paid     NUMBER(7,3) CONSTRAINT intp_nn NOT NULL,
  fees_charged      NUMBER(7,3) CONSTRAINT feec_nn NOT NULL)
PARTITION BY HASH (account_key)
PARTITIONS 16 STORE IN (TS1_DATA, TS2_DATA, TS3_DATA, TS4_DATA);
44 Capitalizing on Partitioning
Composite Range-Hash Example
CREATE TABLE account_bal_range_hash(
  account_key       NUMBER(7)   CONSTRAINT acct_nn NOT NULL,
  branch_key        NUMBER(7)   CONSTRAINT brch_nn NOT NULL,
  product_key       NUMBER(7)   CONSTRAINT prod_nn NOT NULL,
  snapshot_date     DATE        CONSTRAINT mnth_nn NOT NULL,
  state_key         CHAR(2)     CONSTRAINT stat_nn NOT NULL,
  ending_bal        NUMBER(7,3) CONSTRAINT ebal_nn NOT NULL,
  average_daily_bal NUMBER(7,3) CONSTRAINT abal_nn NOT NULL,
  transaction_count NUMBER(7)   CONSTRAINT txnc_nn NOT NULL,
  interest_paid     NUMBER(7,3) CONSTRAINT intp_nn NOT NULL,
  fees_charged      NUMBER(7,3) CONSTRAINT feec_nn NOT NULL)
PARTITION BY RANGE (snapshot_date)
SUBPARTITION BY HASH (account_key)
SUBPARTITIONS 8
( PARTITION Q1_ACCT_BAL VALUES LESS THAN (TO_DATE('01-APR-2000','DD-MON-YYYY')),
  PARTITION Q2_ACCT_BAL VALUES LESS THAN (TO_DATE('01-JUL-2000','DD-MON-YYYY')),
  PARTITION Q3_ACCT_BAL ...
45 Capitalizing on Partitioning
Composite Range-List Example
CREATE TABLE account_bal_range_list(
  account_key       NUMBER(7)   CONSTRAINT acct_nn NOT NULL,
  branch_key        NUMBER(7)   CONSTRAINT brch_nn NOT NULL,
  product_key       NUMBER(7)   CONSTRAINT prod_nn NOT NULL,
  snapshot_date     DATE        CONSTRAINT mnth_nn NOT NULL,
  state_key         CHAR(2)     CONSTRAINT stat_nn NOT NULL,
  ending_bal        NUMBER(7,3) CONSTRAINT ebal_nn NOT NULL,
  average_daily_bal NUMBER(7,3) CONSTRAINT abal_nn NOT NULL,
  transaction_count NUMBER(7)   CONSTRAINT txnc_nn NOT NULL,
  interest_paid     NUMBER(7,3) CONSTRAINT intp_nn NOT NULL,
  fees_charged      NUMBER(7,3) CONSTRAINT feec_nn NOT NULL)
PARTITION BY RANGE (snapshot_date)
SUBPARTITION BY LIST (state_key)
SUBPARTITION TEMPLATE(
  SUBPARTITION northwest VALUES ('OR', 'WA'),
  SUBPARTITION southwest VALUES ('AZ', 'UT', 'NM'),
  SUBPARTITION southeast VALUES ('FL', 'GA','SC','AL','TN','NC'),
  SUBPARTITION rest      VALUES (DEFAULT))
( PARTITION q1_2002 VALUES LESS THAN (TO_DATE('1-APR-2002','DD-MON-YYYY')),
  ...
46 Capitalizing on Partitioning
External Table to Partition using INSERT /*+ APPEND */
(Diagram: source file(s) feeding an external table, loaded into the target partition.)
- FTP the 02Nov2003_Sales extract
- ALTER TABLE to add the file(s) to the LOCATION clause
- Perform INSERT /*+ APPEND */ INTO the partitioned table
47 Capitalizing on Partitioning
(Diagram: sales extract external table loaded into partition PQ3 of SALES, alongside partitions PQ1 and PQ2.)
INSERT /*+ APPEND */ INTO SALES PARTITION (PQ3)
SELECT * FROM SALES_EXTRACT_EXTERNAL;
48 Capitalizing on Partitioning
- What to partition
- Fact Tables
- Generally Range-Hash composite
- Range on some date (partition elimination)
- Hash on the driving dimension key (partition-wise join)
- Dimensions
- Hashing on the primary key of dimension tables facilitates full and partial partition-wise joins.
- For a full partition-wise join between a fact and a dimension table you need to hash partition both on the same key into the same number of buckets (see the sketch below).
- Materialized Views
- Generally Range-Hash composite
- Generally mirror the fact table's partition scheme
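A minimal sketch of that matched-hash layout, using made-up table and column names consistent with the earlier examples; the essential points are the shared hash key (cust_key) and the identical bucket count (16):
CREATE TABLE customer_dim_hash (
  cust_key  NUMBER NOT NULL,
  cust_name VARCHAR2(20))
PARTITION BY HASH (cust_key) PARTITIONS 16;

CREATE TABLE sales_fact_rh (
  cust_key    NUMBER NOT NULL,
  time_id     DATE   NOT NULL,
  amount_sold NUMBER(10,2))
PARTITION BY RANGE (time_id)
SUBPARTITION BY HASH (cust_key) SUBPARTITIONS 16
( PARTITION q1_2003 VALUES LESS THAN (TO_DATE('01-APR-2003','DD-MON-YYYY')),
  PARTITION q2_2003 VALUES LESS THAN (TO_DATE('01-JUL-2003','DD-MON-YYYY')));

-- A parallel join on cust_key can now be executed partition-wise,
-- pairing each hash bucket of the fact with the same bucket of the dimension:
SELECT /*+ PARALLEL(s, 8) PARALLEL(c, 8) */ c.cust_name, SUM(s.amount_sold)
FROM   sales_fact_rh s, customer_dim_hash c
WHERE  s.cust_key = c.cust_key
GROUP  BY c.cust_name;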
49 Capitalizing on Materialized Views
- Materialized Views
- Enable queries to be re-written to take advantage of pre-calculated summaries, thus reducing or eliminating sorts and joins.
- Materialized views can dramatically increase the performance of queries when applied judiciously
- Reduce the number of sorts
- Reduce the number of joins
- Pre-filter data
- Can be indexed and partitioned
- Seamless to end users
- Enhancements in 9i include
- Removed restrictions, enabling them to be leveraged in more situations
- Fast refresh is now possible on materialized views containing the UNION ALL operator (a sketch follows below).
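A rough sketch of a UNION ALL materialized view of the kind that enhancement targets, assuming two regional sales tables with materialized view logs already created WITH ROWID; the table names and marker values are illustrative, and the full fast-refresh restrictions are in the 9i documentation:
CREATE MATERIALIZED VIEW all_sales_mv
  BUILD IMMEDIATE
  REFRESH FAST ON DEMAND
  ENABLE QUERY REWRITE
AS
SELECT 1 AS marker, s.rowid AS rid, s.cust_id, s.amount_sold  -- marker column distinguishes the branches
FROM   us_sales s
UNION ALL
SELECT 2 AS marker, s.rowid AS rid, s.cust_id, s.amount_sold
FROM   eu_sales s;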
50Capitalizing on Materialized Views
SELECT T.FISCAL_QUARTER_DESC, C.CUST_CITY,
COUNT() AS SALE_COUNT, SUM(S.AMOUNT_SOLD) AS
SALE_DOLLARS FROM oradata.CUSTOMERS C,
oradata.SALES S, oradata.TIMES T WHERE C.CUST_ID
S.CUST_ID AND T.TIME_ID S.TIME_ID GROUP BY
T.FISCAL_QUARTER_DESC, C.CUST_CITY
Query is re-written to be resolved from the MV
instead of from the base tables
51 Capitalizing on Materialized Views
- How to capitalize on MVs
- Identify candidate queries
- Analysis of common queries based on design
- Oracle's Summary Advisor wizard
- DBMS_OLAP
- Create MVs based on the analysis and refresh requirements
- Test
- Benchmark
- Measure utilization
- Repeat
52 Capitalizing on Materialized Views
proc sql;  /* CTAS Implicit Pass-Through */
CREATE TABLE work.quartly_city_canidate AS
SELECT T.FISCAL_QUARTER_DESC, C.CUST_CITY,
       COUNT(*) AS SALE_COUNT, SUM(S.AMOUNT_SOLD) AS SALE_DOLLARS
FROM oradata.CUSTOMERS C, oradata.SALES S, oradata.TIMES T
WHERE C.CUST_ID = S.CUST_ID AND T.TIME_ID = S.TIME_ID
GROUP BY T.FISCAL_QUARTER_DESC, C.CUST_CITY;
quit;
real time 9.62 seconds

Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=CHOOSE (Cost=5392 Card=7453 Bytes=350291)
1    0   SORT (GROUP BY) (Cost=5392 Card=7453 Bytes=350291)
2    1     HASH JOIN (Cost=1077 Card=1016271 Bytes=47764737)
3    2       TABLE ACCESS (FULL) OF 'TIMES' (Cost=6 Card=1461 Bytes=23376)
4    2       HASH JOIN (Cost=1043 Card=1016271 Bytes=31504401)
5    4         TABLE ACCESS (FULL) OF 'CUSTOMERS' (Cost=106 Card=50000 Bytes=700000)
6    4         PARTITION RANGE (ALL)
7    6           TABLE ACCESS (FULL) OF 'SALES' (Cost=469 Card=1016271 Bytes=17276607)

Statistics
----------------------------------------------------------
5855 consistent gets
   1 sorts (memory)
   0 sorts (disk)
5075 rows processed

Note the number of joins and sorts, the amount of memory, and the number of full table scans.
53Capitalizing on Materialized Views
proc sql connect to ORACLE as ORACON (usersh
passwordsh1 pathdemo.na.sas.com') / Create a
Materialized View / execute ( CREATE
MATERIALIZED VIEW QTRLY_CITY_SALES_MV compress BUI
LD IMMEDIATE REFRESH ON COMMIT ENABLE QUERY
REWRITE AS SELECT T.FISCAL_QUARTER_DESC,
C.CUST_CITY, COUNT() SALE_COUNT,
SUM(S.AMOUNT_SOLD) SALE_DOLLARS FROM CUSTOMERS
C, SALES S, TIMES T WHERE C.CUST_ID S.CUST_ID
AND T.TIME_ID S.TIME_ID GROUP BY
T.FISCAL_QUARTER_DESC, C.CUST_CITY ) by
ORACON disconnect from ORACON quit
Build Immediate Create this Now Refresh on
Commit Keep the MV current when Insert, Update,
and Deletes occur Enable Query Rewrite Enables
dynamic query re-direction. Compress
compresses redundant data i.e. makes the MV
smaller
Question did it improve the performance of our
query?
54 Capitalizing on Materialized Views
proc sql;  /* CTAS Implicit Pass-Through */
CREATE TABLE work.quartly_city_canidate AS
SELECT T.FISCAL_QUARTER_DESC, C.CUST_CITY,
       COUNT(*) AS SALE_COUNT, SUM(S.AMOUNT_SOLD) AS SALE_DOLLARS
FROM oradata.CUSTOMERS C, oradata.SALES S, oradata.TIMES T
WHERE C.CUST_ID = S.CUST_ID AND T.TIME_ID = S.TIME_ID
GROUP BY T.FISCAL_QUARTER_DESC, C.CUST_CITY;
quit;
real time 0.76 seconds

Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=2206 Bytes=114712)
1    0   TABLE ACCESS (FULL) OF 'QTRLY_CITY_SALES_MV' (Cost=4 Card=2206 Bytes=114712)

Statistics
----------------------------------------------------------
     7 recursive calls
     0 db block gets
   366 consistent gets
     0 physical reads
     0 redo size
137260 bytes sent via SQL*Net to client
  2641 bytes received via SQL*Net from client
   340 SQL*Net roundtrips to/from client
     0 sorts (memory)
     0 sorts (disk)
  5075 rows processed

No joins, no sorts, less memory.
55 Capitalizing on Alt aggregate strategies
- Aggregate building with pCTAS (parallel CREATE TABLE AS SELECT)
CREATE TABLE quartly_city_passthrough
PARALLEL NOLOGGING AS
SELECT T.FISCAL_QUARTER_DESC, C.CUST_CITY,
       COUNT(*) AS SALE_COUNT, SUM(S.AMOUNT_SOLD) AS SALE_DOLLARS
FROM CUSTOMERS C, SALES S, TIMES T
WHERE C.CUST_ID = S.CUST_ID AND T.TIME_ID = S.TIME_ID
GROUP BY T.FISCAL_QUARTER_DESC, C.CUST_CITY;
56 Capitalizing on Alt aggregate strategies
- Aggregate building with pIIAS (parallel INSERT INTO ... AS SELECT)
INSERT /*+ APPEND */ INTO quartly_city_iias
SELECT T.FISCAL_QUARTER_DESC, C.CUST_CITY,
       COUNT(*) AS SALE_COUNT, SUM(S.AMOUNT_SOLD) AS SALE_DOLLARS
FROM CUSTOMERS C, SALES S, TIMES T
WHERE C.CUST_ID = S.CUST_ID AND T.TIME_ID = S.TIME_ID
GROUP BY T.FISCAL_QUARTER_DESC, C.CUST_CITY;
57 Capitalizing on bitmap join index
- Bitmap Join Index
- Creates a bitmap index for the resolution of joins of two or more tables; works similarly to a materialized view.
- A bitmap join index is space efficient because it compresses the rowids, where a materialized view does not.
- Can be leveraged to improve the performance of snowflake schemas and common join operations across facts.
58Capitalizing on bitmap join index
SELECT countries.country_name, sum(sales.amount_so
ld) FROM sales, customers, countries WHERE
sales.cust_id customers.cust_id AND
customers.country_id countries.country_id
CREATE BITMAP INDEX bjx_sales_country ON
sales(countries.country_name)) FROM sales,
customers, countries WHERE sales.cust_id
customers.cust_id AND countries.country_id
customers.country_id LOCAL PARALLEL NOLOGGING
COMPUTE STATISTICS
Can be leveraged to improve performance of
snowflake query problems
59 Capitalizing on SQL for Analysis
- SQL for Analysis
- Especially useful for reporting and preparing data sets for statistical analysis
- Rankings and percentiles
- Cumulative distributions, percent rank, and N-tiles
- Moving window calculations
- Allow you to find moving and cumulative aggregations, such as sums and averages
- Lag/lead analysis
- Enables direct inter-row references so you can calculate period-to-period changes
- First/last analysis
- First or last value in an ordered group
60 Capitalizing on SQL for Analysis
- RANK
- RANK() OVER ( query_partition_clause order_by_clause )
- DENSE_RANK() OVER ( query_partition_clause order_by_clause )
SELECT country_id, TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES_TOTAL,
       RANK() OVER (ORDER BY SUM(amount_sold) DESC NULLS LAST) AS SALES_LEADER
FROM sales, products, customers, times, channels
WHERE sales.prod_id = products.prod_id
  AND sales.cust_id = customers.cust_id
  AND sales.time_id = times.time_id
  AND sales.channel_id = channels.channel_id
  AND times.calendar_month_desc IN ('2000-09', '2000-10')
GROUP BY country_id;

CO SALES_TOTAL    SALES_LEADER
-- -------------- ------------
US     13,333,510            1
NL      7,174,053            2
UK      6,421,240            3
DE      6,346,440            4
FR      4,404,921            5
ES      1,699,209            6
IE      1,549,407            7
IN        732,502            8
AU        632,475            9
BR        606,281           10
61 Capitalizing on SQL for Analysis
SELECT * FROM
  (SELECT country_id, TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES_TOTAL,
          RANK() OVER (ORDER BY SUM(amount_sold) DESC NULLS LAST) AS COUNTRY_RANK
   FROM sales, products, customers, times, channels
   WHERE sales.prod_id = products.prod_id
     AND sales.cust_id = customers.cust_id
     AND sales.time_id = times.time_id
     AND sales.channel_id = channels.channel_id
     AND times.calendar_month_desc IN ('2000-09', '2000-10')
   GROUP BY country_id) /* inline view */
WHERE COUNTRY_RANK <= 10;

CO SEPT_TOTAL     COUNTRY_RANK
-- -------------- ------------
US      6,517,786            1
NL      3,447,121            2
UK      3,207,243            3
DE      3,194,765            4
FR      2,125,572            5
ES        777,453            6
IE        770,758            7
IN        371,198            8
BR        317,001            9
AU        302,393           10
62 Capitalizing on SQL for Analysis
- Moving window example: running total
SELECT c.country_id AS CO, t.calendar_quarter_desc AS QUARTER,
       TO_CHAR(SUM(amount_sold), '9,999,999,999') AS Q_SALES,
       TO_CHAR(SUM(SUM(amount_sold)) OVER (PARTITION BY c.country_id
               ORDER BY c.country_id, t.calendar_quarter_desc
               ROWS UNBOUNDED PRECEDING), '9,999,999,999') AS RUNNING_TOTAL
FROM sales s, times t, customers c
WHERE s.time_id = t.time_id
  AND s.cust_id = c.cust_id
  AND t.calendar_year = 2000
GROUP BY c.country_id, t.calendar_quarter_desc
ORDER BY c.country_id, t.calendar_quarter_desc;

CO QUARTER  Q_SALES        RUNNING_TOTAL
US 2000-Q1      21,719,528    21,719,528
US 2000-Q2      21,915,534    43,635,062
US 2000-Q3      18,857,276    62,492,338
US 2000-Q4      14,970,316    77,462,654
63 Capitalizing on SQL for Analysis
Moving window example: moving average
SELECT c.country_id AS CO, t.calendar_month_desc AS CAL,
       TO_CHAR(SUM(amount_sold), '9,999,999,999') AS SALES,
       TO_CHAR(AVG(SUM(amount_sold)) OVER (ORDER BY c.country_id,
               t.calendar_month_desc ROWS 2 PRECEDING), '9,999,999,999') AS MOVING_3_MONTH
FROM sales s, times t, customers c
WHERE s.time_id = t.time_id
  AND s.cust_id = c.cust_id
  AND t.calendar_year = 2000
GROUP BY c.country_id, t.calendar_month_desc
ORDER BY c.country_id, t.calendar_month_desc;

CO CALENDAR SALES          MOVING_3_MONTH
-- -------- -------------- --------------
AR 2000-01         172,380        172,380
AR 2000-02         140,906        156,643
AR 2000-03         142,581        151,956
AR 2000-04         169,727        151,071
AR 2000-05         157,016        156,441
AR 2000-06         155,675        160,806
64 Capitalizing on SQL for Analysis
TIME_ID   SALES      LAG1       LAG2       LEAD1      LEAD2
--------- ---------- ---------- ---------- ---------- ----------
01-JAN-00    869,132                          909,726    896,626
02-JAN-00    909,726    869,132               896,626    895,204
03-JAN-00    896,626    909,726    869,132    895,204    954,066
04-JAN-00    895,204    896,626    909,726    954,066    918,154
05-JAN-00    954,066    895,204    896,626    918,154    895,849
06-JAN-00    918,154    954,066    895,204    895,849    889,525

SELECT time_id, TO_CHAR(SUM(amount_sold),'9,999,999') AS SALES,
       TO_CHAR(LAG(SUM(amount_sold),1)  OVER (ORDER BY time_id),'9,999,999') AS LAG1,
       TO_CHAR(LAG(SUM(amount_sold),2)  OVER (ORDER BY time_id),'9,999,999') AS LAG2,
       TO_CHAR(LEAD(SUM(amount_sold),1) OVER (ORDER BY time_id),'9,999,999') AS LEAD1,
       TO_CHAR(LEAD(SUM(amount_sold),2) OVER (ORDER BY time_id),'9,999,999') AS LEAD2
FROM sales
WHERE time_id BETWEEN TO_DATE('01-JAN-2000') AND TO_DATE('31-JAN-2000')
GROUP BY time_id;
65 Capitalizing on SQL for Analysis
- FIRST/LAST lets you order on column A but return a result of an aggregate applied on column B.
- List price of the product with the lowest minimum price: LPLO_MINP
- Lowest minimum price: LO_MINP
- List price of the product with the highest minimum price: LPHI_MINP
- Highest minimum price: HI_MINP
SELECT prod_subcategory,
       MIN(prod_list_price) KEEP (DENSE_RANK FIRST ORDER BY (prod_min_price)) AS LPLO_MINP,
       MIN(prod_min_price) AS LO_MINP,
       MAX(prod_list_price) KEEP (DENSE_RANK LAST ORDER BY (prod_min_price)) AS LPHI_MINP,
       MAX(prod_min_price) AS HI_MINP
FROM products
WHERE prod_category = 'Women'
GROUP BY prod_subcategory;

PROD_SUBCATEGORY            LPLO_MINP LO_MINP LPHI_MINP HI_MINP
--------------------------- --------- ------- --------- -------
Dresses - Women                    44   31.28       189     165
Easy Shapes - Women                51   39.47       149  127.39
Knit Outfits - Women               38   17.78       138   95.63
Outerwear - Women                  58   27.14       198  131.87
Shirts And Jackets - Women         19   13.68       162   145.8
66 Other Important New Features
- Table Compression
- Multiple Block Sizes
- RAC for warehousing
67 Capitalizing on Table Compression
- Table Compression
- Can improve performance by reducing both disk and memory (buffer cache) requirements.
- Note: tables with large amounts of DML operations are not good candidates for compression
- Ideal candidates are partitioned fact tables, materialized views with rollups, and tables with a high degree of redundant data
- Regular tables
- CREATE TABLE (...) COMPRESS
- ALTER TABLE ... COMPRESS
- Partitioned tables
- Can compress the entire table or do so on a partition-by-partition basis
- CREATE TABLE (...) COMPRESS PARTITION BY (...)
- PARTITION p1 VALUES ('FL', 'GA') COMPRESS
- Materialized Views
- CREATE MATERIALIZED VIEW QTRLY_SALES_MV COMPRESS ...
- ALTER MATERIALIZED VIEW ... COMPRESS
(A brief sketch of these variants follows below.)
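A brief illustrative sketch of those variants; the table, partition, and MV names are made up, and ALTER ... COMPRESS only affects data loaded after the change:
-- Compress a whole table at creation time (direct-path loads are compressed)
CREATE TABLE sales_history_comp COMPRESS
AS SELECT * FROM sales_history;

-- Compress only selected partitions of a list-partitioned table
CREATE TABLE regional_sales (
  state_key   CHAR(2),
  amount_sold NUMBER(10,2))
PARTITION BY LIST (state_key) (
  PARTITION p_southeast VALUES ('FL', 'GA') COMPRESS,
  PARTITION p_rest      VALUES (DEFAULT));

-- Compress an existing materialized view
ALTER MATERIALIZED VIEW qtrly_sales_mv COMPRESS;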
68 Capitalizing on Multiple Block Sizes
- Multiple Block Size Capitalization
- When you need to run a mix of OLTP activity and DSS activity within the same instance
- When you have an OLTP system with a smaller block size and are using transportable tablespaces to move tables to a decision support system
- Place small, static dimensions in a smaller block cache (4K or 8K) and larger dimensions and facts in a large block cache (16K); a sketch of the mechanics follows below
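A minimal sketch of the mechanics, assuming the database's standard block size is 8K (the tablespace name, file path, and cache size are illustrative): a non-standard block size tablespace can only be created once the matching buffer cache parameter has been set.
-- Size a buffer cache for 16K blocks (in addition to the standard cache)
ALTER SYSTEM SET db_16k_cache_size = 256M;

-- Create a tablespace that uses the non-standard 16K block size
CREATE TABLESPACE facts_16k
  DATAFILE '/u01/oradata/dw/facts_16k_01.dbf' SIZE 4096M
  BLOCKSIZE 16K;

-- Large fact tables (and their partitions) can then be placed in it
CREATE TABLE sales_fact_16k (
  cust_key    NUMBER,
  time_id     DATE,
  amount_sold NUMBER(10,2))
TABLESPACE facts_16k;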
69 Capitalizing on RAC
(Diagram: Node 1 through Node N connected by a high-speed interconnect, sharing a Storage Area Network (SAN).)
70 Capitalizing on RAC
- RAC provides both speed-up and scale-up
- Theoretically, to double performance, simply double the number of nodes.
- Limiting traffic over the interconnect is key to performance.
- Parallel Loading
- Multiple SQL*Loader sessions
- Collocated extracts
- Querying
- Partition key choices
- Data model design choices
- Define join dependencies
- Partition-wise joins are key to limiting the traffic over the interconnect
- Match partitions so that they are collocated on the same node
- Oracle's automatic node affinity improves the performance of DML operations by routing them to the node that has affinity for the partition.
71 Conclusion
- Parallelism is key to the performance of DSS applications
- Design is the limiting factor for parallelism and performance
- Oracle has some slick new features that enhance and simplify common warehouse operations.
- The future direction of data warehousing:
- Better performance
- Increased parallelism, reduced dependencies
- Loads
- Queries
- Reduced complexity, higher user satisfaction
- Better design paradigms and ideologies
- Enhanced features
- Increased usability
- Increased ability to capitalize
72 Next Steps
- Recommended sessions
- Optimal Usage of Oracle's Partitioning Option
- Oracle9i: The Features They Didn't Tell You About
- Advanced PL/SQL and Oracle9i ETL
- Oracle 9i RAC Concepts and Implementation - A Practical Perspective