Advanced%20PL/SQL%20and%20Oracle%20ETL - PowerPoint PPT Presentation

About This Presentation
Title:

Advanced%20PL/SQL%20and%20Oracle%20ETL

Description:

Transformation is Just a Simple INSERT as SELECT ... The Multi-Table INSERT Statement. New in 9i. Each Sub-Query Input Row Can be INSERT'ed to a Different Table ... – PowerPoint PPT presentation

Number of Views:816
Avg rating:3.0/5.0
Slides: 42
Provided by: timg158
Category:

less

Transcript and Presenter's Notes

Title: Advanced%20PL/SQL%20and%20Oracle%20ETL


1
Open World 2003
  • Advanced PL/SQL and Oracle ETL
  • Doug Cosman
  • Senior Oracle DBA
  • SageLogix, Inc.
  • doug_cosman_at_yahoo.com

2
Agenda
  • Overview of Oracle 9i ETL
  • Provides Fast Transformations Using Only the 9i
    DB
  • Advanced PL/SQL Features Necessary for
    Understanding Oracle 9i ETL
  • PL/SQL Performance Techniques for Data Warehouse
    Environments

3
What Is ETL?
  • Extract
  • Pull the Data From the Source
  • Transform
  • Convert the Input Format to the Target Format
  • Encode any Values
  • Load
  • Insert the Transformed Data to the Target Tables

4
Oracle 9i ETL
  • Extract
  • Oracle 9i External Tables
  • Transform
  • PL/SQL Pipelined Table Functions
  • Oracle Warehouse Builder Can also be Used to
    Build Pipelined Table Functions
  • Maps Source Data Layout and Target Schema and
    Builds PL/SQL and SQL Code
  • Load
  • Direct Path Inserts

5
Performance Factors
  • SQL Execution Time
  • Efficiency of Execution Plan
  • Hardware Resource Waits
  • Code Logic Execution Time
  • Speed of Host Language
  • Variable Binding
  • Time to Bind Values Back to Host Language

6
PL/SQL Binding
  • Types of Binds
  • IN-Binds
  • Bind Values From Host Language to SQL Engine
  • OUT-Binds
  • Values are Returned from SQL Objects to Host
    Variables
  • Bind Options
  • Single Row Binds
  • Bulk Binds

7
Single Row Binds
  • Cursor FOR-LOOP
  • DECLARE
  • CURSOR cust_cur (p_customer_id NUMBER) IS
  • SELECT FROM f_sales_detail
  • WHERE customer_id p_customer_id
  •  
  • v_customer_id NUMBER 1234
  •  
  • BEGIN
  • FOR rec IN cust_cur (v_customer_id) LOOP
  • INSERT INTO sales_hist (customer_id,
    detail_id, process_date)
  • VALUES (v_customer_id, rec.sales_id,
    sysdate)
  • END LOOP
  • END
  •  

8
Context Switching
OUT-BIND
DB
PL/SQL Engine
SQL Engine
IN-BIND
9
Single Row Binds
  • The Most Expensive Operation by Far is the
    Binding
  • Single Row Binding is SLOW for Large Result Sets

10
Bulk Binding
  • PL/SQL Bulk Bind Support added in 8i
  • IN-Binds
  • An Array of Values is Passed to the SQL Engine
  • OUT-Binds
  • SQL Engine Populates a PL/SQL Bind-Array
  • Context Switch Once per Batch Instead of Once per
    Row
  • Performance Increase of Up to 15 Times

11
Bulk Operators
  • BULK COLLECT
  • Specifies that Bulk Fetches Should be Used
  • Be Careful to Handle Last Batch
  • LIMIT
  • Defines the Batch Size for Bulk Collections
  • FORALL
  • Bulk DML Operator
  • Not a Looping Construct like a Cursor-For-Loop
  • PL/SQL Table is Referenced in the Statement

12
DECLARE TYPE sales_t IS TABLE OF
f_sales_detail.sales_idTYPE INDEX BY
BINARY_INTEGER   sales_ids sales_t
v_customer_id NUMBER 1234
max_rows CONSTANT NUMBER 10000   CURSOR
sales(p_customer_id NUMBER) IS SELECT
sales_id FROM f_sales_detail
WHERE customer_id p_customer_id   BEGIN OPEN
sales(v_customer_id) LOOP EXIT WHEN
salesNOTFOUND FETCH sales BULK COLLECT INTO
sales_ids LIMIT max_rows FORALL i IN
1..sales_ids.COUNT INSERT INTO sales_hist
(customer_id, detail_id, process_date)
VALUES (v_customer_id, sales_ids(i),
sysdate)   END LOOP CLOSE sales END
13
Native Compilation
  • Allows PL/SQL to be Executed as a Compiled C
    Program
  • Requires Native C Compiler on Host
  • Enabling
  • Set init.ora PLSQL_ Parameters
  • Compile as Native Code
  • PL/SQL is First Compiled Down to P-Code
  • C Source Code is Generated from the P-Code
  • Native Compiler is Invoked Creating a C Shared
    Object Library
  • Subsequent Calls to PL/SQL Object are Run by the
    C Library

14
Native Compilation
  • Performance
  • Language Execution Speed is About Five Times
    Faster when not Interacting with the Database
  • In Typical Code Interacting with Larger Data
    Volumes Execution Speed is Very Similar to
    Interpreted Code
  • Remember that Variable Binding can be a Bigger
    Factor than Code Execution Speed
  • Mixing Native and Interpreted PL/SQL
  • Oracle Recommends an All or None Approach for
    Production
  • Including User-Defined and Supplied Packages

15
Collection Types
  • Associative Arrays (PL/SQL Tables)
  • PL/SQL Type Only
  • Nested Tables
  • Shared Type
  • Varrays
  • Shared Type

16
Associative Arrays
  • PL/SQL Type Only
  • Not a SQL Type
  • Easy to Use
  • Automatic Element Allocation
  • No Need to Initialize
  • Two Kinds in 9i Release 2
  • INDEX BY BINARY_INTEGER
  • INDEX BY VARCHAR2
  • Similar to
  • Java Hashtables
  • Perl and Awk Associative Arrays

17
Associative Arrays
  • DECLARE
  • TYPE hash_table_t IS TABLE OF NUMBER
  • INDEX BY VARCHAR2(30)
  • email_map hash_table_t
  • CURSOR users IS
  • SELECT username, user_id
  • FROM dba_users
  • BEGIN
  • FOR user IN users LOOP
  • email_map(user.username) user.user_id
  • END LOOP
  • END

18
Multi-Dimensional Arrays
  • New in 9i Release 1
  • Implemented as Collection of Collections
  • DECLARE
  • TYPE element IS TABLE OF NUMBER
  • INDEX BY BINARY_INTEGER
  • TYPE twoDimensional IS TABLE OF element
  • INDEX BY BINARY_INTEGER
  • twoD twoDimensional
  •  
  • BEGIN
  • twoD(1)(1) 123
  • twoD(1)(2) 456
  • END

19
Nested Tables
  • No Maximum Size
  • Harder to Use than Associative Arrays
  • Need to be Initialized
  • Code Must Explicitly Allocate New Elements
  • Shared Type with SQL
  • Two Options for Type Definition
  • Local PL/SQL Definition
  • Global SQL Type Declared in the Database
  • Allows Variables to be Shared Between Both
    Environments

20
Nested Tables
  • PL/SQL Scoped Type
  • DECLARE
  • TYPE nest_tab_t IS TABLE OF NUMBER
  • nt nest_tab_t
    nest_tab_t()
  • BEGIN
  • FOR i IN 1..100 LOOP
  • nt.EXTEND
  • nt(i) i
  • END LOOP
  • END

21
Nested Tables
  • Globally Defined in SQL
  • CREATE OR REPLACE TYPE email_demo_obj_t AS
    OBJECT
  • ( email_id NUMBER,
  • demo_code NUMBER,
  • value VARCHAR2(30) )
  • /
  •  
  • CREATE OR REPLACE TYPE email_demo_nt_t AS
    TABLE OF email_demo_obj_t
  • /

22
Nested Tables
  • SQL-Defined Nested Tables
  • PL/SQL Variables can be Manipulated by the SQL
    Engine
  • Local PL/SQL Variables Can Be
  • Sorted
  • Aggregated
  • Used for Dynamic In-Lists
  • Joined With SQL Tables
  • Joined with Other PL/SQL Nested Tables

23
Table Functions
  • Nested Tables Enable Table Functions
  • SELECT FROM
  • TABLE( CAST(eml_dmo_nt AS
    email_demo_nt_t) )
  • TABLE Operator
  • Tells Oracle to Treat the Variable as a SQL Table
  • CAST Operator
  • Explicitly Tells Oracle the Data Type to be Used
    to Handle the Operation

24
Table Function Example
DECLARE eml_dmo_nt
email_demo_nt_t email_demo_nt_t()  
BEGIN -- Some logic that populates the
nested table eml_dmo_nt.EXTEND(3)
eml_dmo_nt(1) email_demo_obj_t(45, 3,
'23') eml_dmo_nt(2)
email_demo_obj_t(22, 3, '41')
eml_dmo_nt(3) email_demo_obj_t(18, 7,
'over_100k') -- Process the data in
assending order of email id. FOR r IN
(SELECT FROM
TABLE(CAST(eml_dmo_nt AS email_demo_nt_t))
ORDER BY 1) LOOP
dbms_output.put_line(r.email_id ' '
r.demo_id) END LOOP END
25
Returning Result Sets
  • Returning Collections Directly
  • Return the Data Structure Itself
  • Returning Reference Cursors
  • Returns and Open Cursor to an Application
  • Doesnt Return Data from PL/SQL Directly
  • Calling a Table Function from the SQL Context
  • Convert Function Return Value into a Cursor

26
Returning Collections
  • Return a Collection Type Explicitly
  • Best Suited for PL/SQL Calling Programs
  • FUNCTION get_email_demo(p_email_id NUMBER)
  • RETURN email_demo_nt_t IS
  • eml_dmo email_demo_nt_t
  • BEGIN
  • SELECT email_demo_obj_t(email_id,
    demo_id, value)
  • BULK COLLECT INTO eml_dmo
  • FROM email_demographic
  • WHERE email_id p_email_id
  •  
  • -- Apply some business logic on the
    nested table here.
  • RETURN eml_dmo
  • END

27
Table Functions
  • Can be Used in a SQL Context Too
  • A Table Function Takes a Collection Type as an
    Argument
  • A Function that Returns a Collection Works Too
  • Allows us to Pass Out PL/SQL Collections as a
    Cursor to any Host Language
  • SELECT FROM
  • TABLE( CAST( get_email_demo(45) AS
    email_demo_nt_t))

28
Table Functions
  • Data is Buffered in the Local Variable During
    Function Execution
  • Cursor Returns Rows after Function Completes
  • Private Memory Issues if the Result Set is Large
  • Need a Way to Stream Results
  • 9i Pipelined Table Functions
  • Provides a Streaming Interface
  • Rows are Returned as they are Produced
  • Rows are Actually Buffered in Small Batches
  • Remember Bulk Binding Issue?
  • Can be Run in Parallel
  • PIPELINED Keyword
  • PIPE ROW Operator

29
Pipelined Table Function
FUNCTION get_email_demo RETURN email_demo_nt_t
PIPELINED IS CURSOR email_demo_cur IS
SELECT email_demo_obj_t(email_id, demo_id,
value) FROM email_demographic  
eml_dmo_nt email_demo_nt_t   BEGIN OPEN
email_demo_cur LOOP EXIT WHEN
email_demo_curNOTFOUND FETCH
email_demo_cur BULK COLLECT INTO eml_dmo_nt LIMIT
1000   FOR i IN 1..eml_dmo_nt.COUNT
LOOP / Apply some business logic
on the object here, and return a row. /
PIPE ROW (eml_dmo_nt(i)) END
LOOP   END LOOP RETURN END
30
External Tables
  • One Last Piece of Background Information
  • Oracle 9i External Tables
  • Provides a Way for Oracle to Read Directly from
    Flat Files on the Database Server
  • File can be Queried as if it is a Real Database
    Table
  • Can Sort, Aggregate, Filter Rows, etc.
  • External File Can be Queried in Parallel
  • Only Table Definition is Stored in the Database
  • Data is External
  • Table Definition is Similar to SQLLoader Control
    File

31
External Tables
CREATE TABLE ext_tab (email
VARCHAR2(50), age NUMBER,
income VARCHAR2(20)) ORGANIZATION
EXTERNAL ( TYPE oracle_loader
DEFAULT DIRECTORY data_dir ACCESS PARAMETERS
(RECORDS DELIMITED BY NEWLINE LOGFILE
data_dir 'ext_tab.log' BADFILE
data_dir 'ext_tab.bad' FIELDS TERMINATED
BY ',' MISSING FIELD VALUES ARE NULL
(email CHAR(50), age
INTEGER EXTERNAL(2), income
CHAR(20) ) ) LOCATION ('ext_tab.dat') )
REJECT LIMIT UNLIMITED
32
ETL Example
  • Normalize, Encode and Pivot Input Record

EMAIL AGE INCOME
John.Doe_at_excite.com 58 over_100k
EMAIL_ID DEMO_CODE VALUE
2345 3 58
2345 7 over_100k
33
PACKAGE BODY etl IS TYPE hash_table_t IS
TABLE OF NUMBER INDEX BY VARCHAR2(30)
email_map hash_table_t  
FUNCTION transform (new_data SYS_REFCURSOR)
RETURN email_demo_nt_t PIPELINED
PARALLEL_ENABLE ( PARTITION new_data BY ANY )
IS   TYPE ext_tab_array IS TABLE OF
ext_tabROWTYPE INDEX BY BINARY_INTEGER  
indata ext_tab_array
email_demo_obj email_demo_obj_t
email_demo_obj_t(null,null,null)
demo_map hash_table_t   BEGIN
LOOP EXIT WHEN new_dataNOTFOUND
FETCH new_data BULK COLLECT INTO indata
LIMIT 1000 FOR i IN 1..indata.COUNT
LOOP email_demo_obj.email_id
email_map(indata(i).email)
email_demo_obj.demo_code 3
email_demo_obj.value
indata(i).age PIPE ROW
(email_demo_obj)
email_demo_obj.demo_code 7
email_demo_obj.value
indata(i).income PIPE ROW
(email_demo_obj) END LOOP
END LOOP RETURN END BEGIN FOR
email IN (SELECT email_id, email FROM email)
LOOP email_map(email.email)
email.email_id END LOOP END
34
Oracle 9i ETL
  • Transformation is Just a Simple INSERT as SELECT
  • Elegant Solution to Parallel, Transactional
    Co-processing
  • INSERT / append nologging / INTO
    email_demographic
  • (SELECT / parallel( a, 4 ) /
  • FROM TABLE(
  • CAST( etl.transform( CURSOR(SELECT FROM
    ext_tab )) AS email_demo_nt_t)) a)

35
Parallel Co-processing
Extract
Transform
Load
PL/SQL
PQ Slave
Input File
PL/SQL
PQ Slave
DB
INSERT
PL/SQL
PQ Slave
PL/SQL
PQ Slave
36
Performance Issues
  • Speed is Respectable but There is a Performance
    Bottleneck with the Table Function Mechanism
  • Possibly an Issue Binding Data Back from the SQL
    Engine
  • Throughput is about Three Times Slower than
    Coding with BULK COLLECT and FORALL Operators
  • However These Dont Support Parallel Operations
  • Oracle Expects to have it Fixed in Next Release

37
ETL Alternatives
  • The Multi-Table INSERT Statement
  • New in 9i
  • Each Sub-Query Input Row Can be INSERTed to a
    Different Table
  • or the Same Table Multiple Times
  • Faster than Using PL/SQL
  • Its Always Faster to do Something in Pure SQL
    than Using Any Host Language
  • Binding is Avoided

38
Multi-Table Insert
INSERT / append nologging / ALL INTO
email_demographic (email_id, demo_id,
value) VALUES (email_id, 3, age) INTO
email_demographic (email_id, demo_id,
value) VALUES (email_id, 7, income)
(SELECT / ordered index( b ) / b.email_id,
a.income, a.age FROM ext_tab a, email b
WHERE a.email b.email)
39
SQL-Only Processing
Extract
Transform
Load
PQ Slave
Input File
PQ Slave
DB
INSERT
PQ Slave
PQ Slave
40
Performance Solutions
  • Minimize SQL Execution Time
  • Exploiting Caching to Eliminate Some SQL Look-ups
    and Joins
  • Direct Path Inserts
  • Code Logic Execution Time
  • Replacing Interpreted PL/SQL with Native
    Compilation
  • Eliminating Host Language Using Multi-Table
    INSERTS
  • Variable Binding
  • Replace Single Row Binds with Bulk Binds

41
Conclusion
  • Oracle 9i ETL is a High Performance ETL Solution
  • Especially Once the Table Function Issue is
    Resolved
  • Already Included in the Cost of the RDBMS
Write a Comment
User Comments (0)
About PowerShow.com