Title: Advanced%20PL/SQL%20and%20Oracle%20ETL
1Open World 2003
- Advanced PL/SQL and Oracle ETL
- Doug Cosman
- Senior Oracle DBA
- SageLogix, Inc.
- doug_cosman_at_yahoo.com
2Agenda
- Overview of Oracle 9i ETL
- Provides Fast Transformations Using Only the 9i
DB - Advanced PL/SQL Features Necessary for
Understanding Oracle 9i ETL - PL/SQL Performance Techniques for Data Warehouse
Environments
3What Is ETL?
- Extract
- Pull the Data From the Source
- Transform
- Convert the Input Format to the Target Format
- Encode any Values
- Load
- Insert the Transformed Data to the Target Tables
4Oracle 9i ETL
- Extract
- Oracle 9i External Tables
- Transform
- PL/SQL Pipelined Table Functions
- Oracle Warehouse Builder Can also be Used to
Build Pipelined Table Functions - Maps Source Data Layout and Target Schema and
Builds PL/SQL and SQL Code - Load
- Direct Path Inserts
5Performance Factors
- SQL Execution Time
- Efficiency of Execution Plan
- Hardware Resource Waits
- Code Logic Execution Time
- Speed of Host Language
- Variable Binding
- Time to Bind Values Back to Host Language
6PL/SQL Binding
- Types of Binds
- IN-Binds
- Bind Values From Host Language to SQL Engine
- OUT-Binds
- Values are Returned from SQL Objects to Host
Variables - Bind Options
- Single Row Binds
- Bulk Binds
7Single Row Binds
- Cursor FOR-LOOP
- DECLARE
- CURSOR cust_cur (p_customer_id NUMBER) IS
- SELECT FROM f_sales_detail
- WHERE customer_id p_customer_id
- Â
- v_customer_id NUMBER 1234
- Â
- BEGIN
- FOR rec IN cust_cur (v_customer_id) LOOP
- INSERT INTO sales_hist (customer_id,
detail_id, process_date) - VALUES (v_customer_id, rec.sales_id,
sysdate) - END LOOP
- END
- Â
8Context Switching
OUT-BIND
DB
PL/SQL Engine
SQL Engine
IN-BIND
9Single Row Binds
- The Most Expensive Operation by Far is the
Binding - Single Row Binding is SLOW for Large Result Sets
10Bulk Binding
- PL/SQL Bulk Bind Support added in 8i
- IN-Binds
- An Array of Values is Passed to the SQL Engine
- OUT-Binds
- SQL Engine Populates a PL/SQL Bind-Array
- Context Switch Once per Batch Instead of Once per
Row - Performance Increase of Up to 15 Times
11Bulk Operators
- BULK COLLECT
- Specifies that Bulk Fetches Should be Used
- Be Careful to Handle Last Batch
- LIMIT
- Defines the Batch Size for Bulk Collections
- FORALL
- Bulk DML Operator
- Not a Looping Construct like a Cursor-For-Loop
- PL/SQL Table is Referenced in the Statement
12DECLARE TYPE sales_t IS TABLE OF
f_sales_detail.sales_idTYPE INDEX BY
BINARY_INTEGER Â sales_ids sales_t
v_customer_id NUMBER 1234
max_rows CONSTANT NUMBER 10000 Â CURSOR
sales(p_customer_id NUMBER) IS SELECT
sales_id FROM f_sales_detail
WHERE customer_id p_customer_id  BEGIN OPEN
sales(v_customer_id) LOOP EXIT WHEN
salesNOTFOUND FETCH sales BULK COLLECT INTO
sales_ids LIMIT max_rows FORALL i IN
1..sales_ids.COUNT INSERT INTO sales_hist
(customer_id, detail_id, process_date)
VALUES (v_customer_id, sales_ids(i),
sysdate) Â END LOOP CLOSE sales END
13Native Compilation
- Allows PL/SQL to be Executed as a Compiled C
Program - Requires Native C Compiler on Host
- Enabling
- Set init.ora PLSQL_ Parameters
- Compile as Native Code
- PL/SQL is First Compiled Down to P-Code
- C Source Code is Generated from the P-Code
- Native Compiler is Invoked Creating a C Shared
Object Library - Subsequent Calls to PL/SQL Object are Run by the
C Library
14Native Compilation
- Performance
- Language Execution Speed is About Five Times
Faster when not Interacting with the Database - In Typical Code Interacting with Larger Data
Volumes Execution Speed is Very Similar to
Interpreted Code - Remember that Variable Binding can be a Bigger
Factor than Code Execution Speed - Mixing Native and Interpreted PL/SQL
- Oracle Recommends an All or None Approach for
Production - Including User-Defined and Supplied Packages
15Collection Types
- Associative Arrays (PL/SQL Tables)
- PL/SQL Type Only
- Nested Tables
- Shared Type
- Varrays
- Shared Type
16Associative Arrays
- PL/SQL Type Only
- Not a SQL Type
- Easy to Use
- Automatic Element Allocation
- No Need to Initialize
- Two Kinds in 9i Release 2
- INDEX BY BINARY_INTEGER
- INDEX BY VARCHAR2
- Similar to
- Java Hashtables
- Perl and Awk Associative Arrays
17Associative Arrays
- DECLARE
- TYPE hash_table_t IS TABLE OF NUMBER
- INDEX BY VARCHAR2(30)
- email_map hash_table_t
- CURSOR users IS
- SELECT username, user_id
- FROM dba_users
- BEGIN
- FOR user IN users LOOP
- email_map(user.username) user.user_id
- END LOOP
- END
18Multi-Dimensional Arrays
- New in 9i Release 1
- Implemented as Collection of Collections
- DECLARE
- TYPE element IS TABLE OF NUMBER
- INDEX BY BINARY_INTEGER
- TYPE twoDimensional IS TABLE OF element
- INDEX BY BINARY_INTEGER
- twoD twoDimensional
- Â
- BEGIN
- twoD(1)(1) 123
- twoD(1)(2) 456
- END
19Nested Tables
- No Maximum Size
- Harder to Use than Associative Arrays
- Need to be Initialized
- Code Must Explicitly Allocate New Elements
- Shared Type with SQL
- Two Options for Type Definition
- Local PL/SQL Definition
- Global SQL Type Declared in the Database
- Allows Variables to be Shared Between Both
Environments
20Nested Tables
- PL/SQL Scoped Type
- DECLARE
- TYPE nest_tab_t IS TABLE OF NUMBER
- nt nest_tab_t
nest_tab_t() -
- BEGIN
- FOR i IN 1..100 LOOP
- nt.EXTEND
- nt(i) i
- END LOOP
- END
21Nested Tables
- Globally Defined in SQL
- CREATE OR REPLACE TYPE email_demo_obj_t AS
OBJECT - ( email_id NUMBER,
- demo_code NUMBER,
- value VARCHAR2(30) )
- /
- Â
- CREATE OR REPLACE TYPE email_demo_nt_t AS
TABLE OF email_demo_obj_t - /
22Nested Tables
- SQL-Defined Nested Tables
- PL/SQL Variables can be Manipulated by the SQL
Engine - Local PL/SQL Variables Can Be
- Sorted
- Aggregated
- Used for Dynamic In-Lists
- Joined With SQL Tables
- Joined with Other PL/SQL Nested Tables
23Table Functions
- Nested Tables Enable Table Functions
- SELECT FROM
- TABLE( CAST(eml_dmo_nt AS
email_demo_nt_t) ) - TABLE Operator
- Tells Oracle to Treat the Variable as a SQL Table
- CAST Operator
- Explicitly Tells Oracle the Data Type to be Used
to Handle the Operation
24Table Function Example
DECLARE eml_dmo_nt
email_demo_nt_t email_demo_nt_t() Â
BEGIN -- Some logic that populates the
nested table eml_dmo_nt.EXTEND(3)
eml_dmo_nt(1) email_demo_obj_t(45, 3,
'23') eml_dmo_nt(2)
email_demo_obj_t(22, 3, '41')
eml_dmo_nt(3) email_demo_obj_t(18, 7,
'over_100k') -- Process the data in
assending order of email id. FOR r IN
(SELECT FROM
TABLE(CAST(eml_dmo_nt AS email_demo_nt_t))
ORDER BY 1) LOOP
dbms_output.put_line(r.email_id ' '
r.demo_id) END LOOP END
25Returning Result Sets
- Returning Collections Directly
- Return the Data Structure Itself
- Returning Reference Cursors
- Returns and Open Cursor to an Application
- Doesnt Return Data from PL/SQL Directly
- Calling a Table Function from the SQL Context
- Convert Function Return Value into a Cursor
26Returning Collections
- Return a Collection Type Explicitly
- Best Suited for PL/SQL Calling Programs
- FUNCTION get_email_demo(p_email_id NUMBER)
- RETURN email_demo_nt_t IS
- eml_dmo email_demo_nt_t
- BEGIN
- SELECT email_demo_obj_t(email_id,
demo_id, value) - BULK COLLECT INTO eml_dmo
- FROM email_demographic
- WHERE email_id p_email_id
- Â
- -- Apply some business logic on the
nested table here. - RETURN eml_dmo
- END
27Table Functions
- Can be Used in a SQL Context Too
- A Table Function Takes a Collection Type as an
Argument - A Function that Returns a Collection Works Too
- Allows us to Pass Out PL/SQL Collections as a
Cursor to any Host Language - SELECT FROM
- TABLE( CAST( get_email_demo(45) AS
email_demo_nt_t))
28Table Functions
- Data is Buffered in the Local Variable During
Function Execution - Cursor Returns Rows after Function Completes
- Private Memory Issues if the Result Set is Large
- Need a Way to Stream Results
- 9i Pipelined Table Functions
- Provides a Streaming Interface
- Rows are Returned as they are Produced
- Rows are Actually Buffered in Small Batches
- Remember Bulk Binding Issue?
- Can be Run in Parallel
- PIPELINED Keyword
- PIPE ROW Operator
29Pipelined Table Function
FUNCTION get_email_demo RETURN email_demo_nt_t
PIPELINED IS CURSOR email_demo_cur IS
SELECT email_demo_obj_t(email_id, demo_id,
value) FROM email_demographic Â
eml_dmo_nt email_demo_nt_t  BEGIN OPEN
email_demo_cur LOOP EXIT WHEN
email_demo_curNOTFOUND FETCH
email_demo_cur BULK COLLECT INTO eml_dmo_nt LIMIT
1000 Â FOR i IN 1..eml_dmo_nt.COUNT
LOOP / Apply some business logic
on the object here, and return a row. /
PIPE ROW (eml_dmo_nt(i)) END
LOOP Â END LOOP RETURN END
30External Tables
- One Last Piece of Background Information
- Oracle 9i External Tables
- Provides a Way for Oracle to Read Directly from
Flat Files on the Database Server - File can be Queried as if it is a Real Database
Table - Can Sort, Aggregate, Filter Rows, etc.
- External File Can be Queried in Parallel
- Only Table Definition is Stored in the Database
- Data is External
- Table Definition is Similar to SQLLoader Control
File
31External Tables
CREATE TABLE ext_tab (email
VARCHAR2(50), age NUMBER,
income VARCHAR2(20)) ORGANIZATION
EXTERNAL ( TYPE oracle_loader
DEFAULT DIRECTORY data_dir ACCESS PARAMETERS
(RECORDS DELIMITED BY NEWLINE LOGFILE
data_dir 'ext_tab.log' BADFILE
data_dir 'ext_tab.bad' FIELDS TERMINATED
BY ',' MISSING FIELD VALUES ARE NULL
(email CHAR(50), age
INTEGER EXTERNAL(2), income
CHAR(20) ) ) LOCATION ('ext_tab.dat') )
REJECT LIMIT UNLIMITED
32ETL Example
- Normalize, Encode and Pivot Input Record
EMAIL AGE INCOME
John.Doe_at_excite.com 58 over_100k
EMAIL_ID DEMO_CODE VALUE
2345 3 58
2345 7 over_100k
33PACKAGE BODY etl IS TYPE hash_table_t IS
TABLE OF NUMBER INDEX BY VARCHAR2(30)
email_map hash_table_t Â
FUNCTION transform (new_data SYS_REFCURSOR)
RETURN email_demo_nt_t PIPELINED
PARALLEL_ENABLE ( PARTITION new_data BY ANY )
IS Â TYPE ext_tab_array IS TABLE OF
ext_tabROWTYPE INDEX BY BINARY_INTEGER Â
indata ext_tab_array
email_demo_obj email_demo_obj_t
email_demo_obj_t(null,null,null)
demo_map hash_table_t  BEGIN
LOOP EXIT WHEN new_dataNOTFOUND
FETCH new_data BULK COLLECT INTO indata
LIMIT 1000 FOR i IN 1..indata.COUNT
LOOP email_demo_obj.email_id
email_map(indata(i).email)
email_demo_obj.demo_code 3
email_demo_obj.value
indata(i).age PIPE ROW
(email_demo_obj)
email_demo_obj.demo_code 7
email_demo_obj.value
indata(i).income PIPE ROW
(email_demo_obj) END LOOP
END LOOP RETURN END BEGIN FOR
email IN (SELECT email_id, email FROM email)
LOOP email_map(email.email)
email.email_id END LOOP END
34Oracle 9i ETL
- Transformation is Just a Simple INSERT as SELECT
- Elegant Solution to Parallel, Transactional
Co-processing - INSERT / append nologging / INTO
email_demographic - (SELECT / parallel( a, 4 ) /
- FROM TABLE(
- CAST( etl.transform( CURSOR(SELECT FROM
ext_tab )) AS email_demo_nt_t)) a)
35Parallel Co-processing
Extract
Transform
Load
PL/SQL
PQ Slave
Input File
PL/SQL
PQ Slave
DB
INSERT
PL/SQL
PQ Slave
PL/SQL
PQ Slave
36Performance Issues
- Speed is Respectable but There is a Performance
Bottleneck with the Table Function Mechanism - Possibly an Issue Binding Data Back from the SQL
Engine - Throughput is about Three Times Slower than
Coding with BULK COLLECT and FORALL Operators - However These Dont Support Parallel Operations
- Oracle Expects to have it Fixed in Next Release
37ETL Alternatives
- The Multi-Table INSERT Statement
- New in 9i
- Each Sub-Query Input Row Can be INSERTed to a
Different Table - or the Same Table Multiple Times
- Faster than Using PL/SQL
- Its Always Faster to do Something in Pure SQL
than Using Any Host Language - Binding is Avoided
38Multi-Table Insert
INSERT / append nologging / ALL INTO
email_demographic (email_id, demo_id,
value) VALUES (email_id, 3, age) INTO
email_demographic (email_id, demo_id,
value) VALUES (email_id, 7, income)
(SELECT / ordered index( b ) / b.email_id,
a.income, a.age FROM ext_tab a, email b
WHERE a.email b.email)
39SQL-Only Processing
Extract
Transform
Load
PQ Slave
Input File
PQ Slave
DB
INSERT
PQ Slave
PQ Slave
40Performance Solutions
- Minimize SQL Execution Time
- Exploiting Caching to Eliminate Some SQL Look-ups
and Joins - Direct Path Inserts
- Code Logic Execution Time
- Replacing Interpreted PL/SQL with Native
Compilation - Eliminating Host Language Using Multi-Table
INSERTS - Variable Binding
- Replace Single Row Binds with Bulk Binds
41Conclusion
- Oracle 9i ETL is a High Performance ETL Solution
- Especially Once the Table Function Issue is
Resolved - Already Included in the Cost of the RDBMS