Explain DB2 EXPLAIN Explanations - PowerPoint PPT Presentation

1 / 46
About This Presentation
Title:

Explain DB2 EXPLAIN Explanations

Description:

22 years marriage with DB2 since V1R1M0. Developer, DBA, Presales, Technical Manager, Presenter ... Host variables will be parsed out, so that you may test your SQL ... – PowerPoint PPT presentation

Number of Views:2689
Avg rating:3.0/5.0
Slides: 47
Provided by: Ca66
Category:

less

Transcript and Presenter's Notes

Title: Explain DB2 EXPLAIN Explanations


1
Explain DB2 EXPLAIN Explanations
2
Introduction
  • Who are you ?
  • Job function Application Developer / DBA
  • DB2 experience
  • Who am I ?
  • Steen Rasmussen, CA inc.
  • Principal Technical Specialist DB2 Tools
  • 22 years marriage with DB2 since V1R1M0
  • Developer, DBA, Presales, Technical Manager,
    Presenter

3
DISCLAIMER
  • DB2 Performance is a huge topic
  • You will not become performance specialists today
    ?
  • This presentation is to be considered an
    appetizer for more detailed presentations
  • Understand the fundamentals of Explain
  • A quick start to decrypt Explain output
  • What to focus on

4
Agenda the BIG one (if you like the
agenda and Roberta I will be back)
  • What is EXPLAIN and why performance is important
  • What influences EXPLAIN output (Access Path
    Selection)
  • EXPLAIN prerequisites
  • Explain table(s)
  • How to create the tables
  • History and backward compatibility
  • How can EXPLAIN be executed
  • Access Path introduction
  • What is Explain
  • OPTIMIZER overview
  • Explain Output Explain EXPLAIN

5
What can you do to save DB2 CPU (and
improve performance)
  • How to build SQL statements
  • Static ltgt Dynamic SQL differences
  • What if DB2 decides the wrong Access Path
  • OPTHINT
  • REOPT
  • RUNSTATS overview
  • Help to the Optimizer
  • Filter Factors

6
Hvad kan du gore for at spare CPU (og
forbedre performance)
  • DB2 Locking
  • Lock types
  • Recommendations
  • SQL Index design
  • STAGE1 and STAGE2 predicates does it matter and
    what is it in reality
  • How do you find tuning potential the bad
    SQLs
  • Benchmarking
  • What is it
  • How can we learn from our experiences

7
Some terms to understand when digging into DB2
Performance issues
  • RI
  • Buffer Pool (BP)
  • RID pool List Prefetch
  • Parallelism DEGREE()
  • Access Path (AP)
  • Sargable STAGE1 STAGE2
  • Getpage - GETP
  • Optimizer
  • Explain
  • Sequential Prefetch (detection)
  • Deadlock / Timeout
  • Lock Avoidance
  • Lock, Latch and Claim
  • Constraints
  • Filter Factor Cardinality
  • Correlated query
  • Firstkeycardf og Fullkeycardf

8
What is EXPLAINOptimizeroverview
9
What is EXPLAIN
  • Illustrates how DB2s Optimizer will execute a
    SQL statement, a Package or a Plan (depending on
    how EXPLAIN is executed)
  • Why is EXPLAIN a necessity
  • Cant we simply look at the SQL-statement and
    estimate if it has been coded all right
  • What if its a 12 tabeller joines
  • Or the statement is 2 MB (or just 24 KB)
  • Does performance mean anything if a statement
    executes in half a second or 5 minutes ?
  • Predict WHAT will happen when SQL changes or at a
    Package REBIND after a reorganization and
    Runstats
  • The latest example how will upgrading to DB2 V8
    impact Access Path (compare DB2 V7 Optimizer with
    DB2 V8)
  • Do you always know whether a JOIN or
    SUBSELECT/EXISTS provides the best performance ??

Lets see a cool example
10
What is EXPLAIN
  • Why is Performance so important
  • Bad performing SQL costs
  • One benchmark illustrates
  • It costs 30 to correct bad SQL in test
  • It costs 1000 to correct in production
  • If response times are not optimal
  • Fewer transactions will go through the pipe
  • Each end user will be less productive
  • Other SQL-statements will suffer due to sharing
    of resources
  • Buffer Pools, I/O channels, Locking conflicts,
    contention in shared pools like SORT area, RID
    pool, . . . .
  • Hardware upgrade to conform to SLA

11
Whats influencing the EXPLAIN output
  • Some factors to consider when comparing Explain
    between two different enviroments
  • Optimizer looks at Hardware type
  • Optimizer looks at number of processors
  • Optimizer looks at Buffer Pools
  • Host variables will be replaced by Parameter
    Markers when doing dynamic Explain this could
    be a major problem in earlier DB2 versions (pre
    DB2 V8) if host variables (or column predicate)
    was defined differently than the column defined
    in the DB2 catalog

12
Whats influencing the EXPLAIN output
  • Table size (and compression)
  • Column cardinality and Filter Factor
  • Indexes present and the columns
  • FIRSTKEYCARDF and FULLKEYCARDF (details later)
  • Different RUNSTATS methods to collect statistics
  • Clustering (Cluster Ratio) as well as clustered
    indicator
  • Number of Index Levels (NLEVELS)
  • SQL predicates (predicate analysis)
  • ORDER BY and the ability to eliminate sorts
  • . . . . . . . . . .

13
EXPLAIN OPTIMIZER overview
  • What is the purpose of the Optimizer
  • Decides how the database navigates
  • Parsing SQL statements to check tables and
    columns
  • Investigates statistics from the DB2 catalog
    (which can be updated by RUNSTATS utility or
    manual)
  • Decides what is the LEAST expensive access path
  • DB2s Optimizer is COST BASED (opposite from
    Oracle)
  • Finds statistics from the DB2 Catalog
  • Calculates Filter Factor(s) (estimated qualified
    rows)
  • Finds a potential number of Access Paths
  • Calculates the cost based on CPU and I/O cost

14
EXPLAIN OPTIMIZER overview
  • Cost based Optimizer CPU and I/O
  • CPU costs
  • Looks at predicates (STAGE-1 or STAGE-2)
    (this can be covered in another
    session next page is an appetizer )
  • How many PAGEs to look at
  • SORT any kind
  • I/O costs
  • DB2 Catalog statistics (like Clusterratio which
    also can be covered in a separate session)
  • Buffer Pool size
  • What does it cost to allocate temporary files
    etc.
  • A lot of activity spent on looking at which
    indexes exist and how these are matching the
    predicates

15
EXPLAIN OPTIMIZER overview
Result Set
Relational Data Manager
SQL statement
Apply stage 2 predicates and sort data (can be
expensive)
Optimized SQL
Data Manager
Apply stage 1 predicates
Read Buffer or get data (I/O)
Data
Buffer Manager
16
EXPLAIN Pre-Requisites
  • EXPLAIN tables
  • creator.PLAN_TABLE (minimum to do explain)
  • Records Optimizers choice of Access Path
  • Not immediately easy to decrypt many codes
    (see next slide)
  • creator.DSN_STATEMNT_TABLE (optional)
  • Shows COST estimates (this is HUGE in my opinion)
  • creator.DSN_FUNCTION_TABLE (optional)
  • Only used if UDF (User Defined Function) needs to
    be explained
  • Creator.DSN_STATEMENT_CACHE_TABLE (new in DB2 V8)
  • Used to explain DB2 Dynamic Statement Cache
    or parts of this

17
EXPLAIN TABLE content (example)
  • QUERYNO QBLOCKNO APPLNAME PROGNAME PLANNO METHOD
    CREATOR TNAME TABNO
  • 1411 1 RQATD 1 0
    SYSIBM SYSTABLES 1
  • ACCESSTYPE MATCHCOLS ACCESSCREATOR ACCESSNAME
    INDEXONLY
  • I 0 SYSIBM DSNDTX02
    N
  • SORTN_UNIQ SORTN_JOIN SORTN_ORDERBY SORTN_GROUPBY
    SORTC_UNIQ SORTC_JOIN
  • N N N N
    N N
  • SORTC_ORDERBY SORTC_GROUPBY TSLOCKMODE TIMESTAMP
    REMARKS
  • N N IS
    2007073116080531
  • PREFETCH COLUMN_FN_EVAL MIXOPSEQ VERSION
    COLLID
  • 0
    CAD72_2004-08-31-18.24.46 RQPAR110_ALL
  • COLLID ACCESS_DEGREE ACCESS_PGROUP_ID
    JOIN_DEGREE JOIN_PGROUP_ID
  • RQPAR110_ALL ------ ------
    ------ ------
  • SORTC_PGROUP_ID SORTN_PGROUP_ID PARALLELISM_MODE
    MERGE_JOIN_COLS
  • ------ ------ -
    ------
  • CORRELATION_NAME PAGE_RANGE JOIN_TYPE
    GROUP_MEMBER IBM_SERVICE_DATA
  • A YÄ.½T æ ³ INITUK15
  • WHEN_OPTIMIZE QBLOCK_TYPE BIND_TIME
    OPTHINT HINT_USED
  • SELECT 2007-07-31-16.08.03.6151
    84
  • PRIMARY_ACCESSTYPE PARENT_QBLOCKNO TABLE_TYPE

18
EXPLAIN Pre-Requisites
  • How to create the tables for EXPLAIN ?
  • Every new DB2 version adds additional columns
  • DB2 V1R0 had no EXPLAIN (afair)
  • DB2 V1R2 had 25 columns in PLAN_TABLE
  • . . . . . . . . .
  • DB2 V7 has 51 columns in PLAN_TABLE
  • DB2 V8 has 58 columns in PLAN_TABLE
  • DB2 V9 has 59 columns in PLAN_TABLE
  • Backward and forward compatible !!!
  • CREATE TABLE steen.plan_table like . . . . . . .
  • Usually the Systems Programmer creates one as
    part of the IVP
  • Look into IBM SDSNSAMP

19
What EXPLAIN DOESNT show
  • Explain only shows SELECT, DELETE, UPDATE, INSERT
  • Important not to forget the issues below when
    doing performance / tuning
  • RI Definitions
  • TRIGGERs executed as part of the SQL-statement
  • UDFs
  • Table- and Column Check Constraints
  • Not always a guarantee DB2 will USE the
    illustrated AP
  • Prefetch activities can be disabled depending on
    BP status
  • Parallelism is decided at the execution time
  • RID pool shortage

20
How to execute EXPLAIN
  • EXPLAIN
  • Explain Dynamic Statement cache will not be
    covered in detail. Normally used during DBA
    Performance/Tuning
  • Manual type-in via SPUFI
  • BIND eller REBIND PACKAGE med EXPLAIN(YES)

21
How to execute EXPLAIN
  • Manual type in

IQPSQLE1 --- (CAPS ON) --- SQL Editor --- (NULLS
ON) --- 2007/08/03 132554 COMMAND gt
SCROLL
gt CSR
Edit the
SQL statements that you wish to test below. When
you are done editing the SQL, hit PF3/15
or enter 'END' to return to the RC/SQL control
panel. Host variables will be parsed out, so
that you may test your SQL repeatedly with
various host values.

___
explain plan set queryno 190 for
___ select from
sysibm.systables
___ where name ?
___ and
creator 'PTI'
___ with UR

___
___ commit

___
___
select from plan_table where queryno190
___

BOTTOM OF
DATA
22
How to execute EXPLAIN
  • BIND / REBIND and use EXPLAIN(YES)
  • Note you will have to select from PLAN_TABLE to
    view Access path information.
  • Consequences can be less than optimal or even
    career limiting decisions

To Rebind Or Not To Rebind . That is the
QUESTION !
23
How to execute EXPLAIN
  • Edit BIND or REBIND

COMMAND gt
SCROLL gt PAGE

TOP OF DATA
.CALL DSN
PARM(S81A)
.DATA

REBIND PACKAGE(RQPAR110_ALL.RQATD.(CAD72_2004
-08-31-18.24.46)) -
OWNER(RASST02)
- QUALIFIER(RASST02)
-
CURRENTDATA(NO )
-
VALIDATE(RUN ) EXPLAIN(yes)
- ISOLATION(CS)
RELEASE(COMMIT ) -
DEGREE(1 )
-
REOPT(NONE )
- KEEPDYNAMIC(NO )
-
DBPROTOCOL(PRIVATE)
-
ENCODING(EBCDIC )
- IMMEDWRITE(NO )
-
FLAG(I)
.ENDDATA


BOTTOM OF DATA
24
BIND / REBIND with EXPLAIN warning
  • Manual EXPLAIN
  • This is a What if analysis
  • Illustrates WHICH Access Path will be chosen if a
    BIND or REBIND is executed
  • You can check the implications / consequences
  • BIND / REBIND with EXPLAIN(YES)
  • DB2 will generate a NEW Access path !!!!!
  • Is this a good idea ? Maybe !
  • What if the DB2 catalog statistics ISNT optimal
  • No RUNSTATS executed ? statistics columns in the
    catalog have -1 ? Optimizer has NO clue about
    statistics for tables, indexes, columns

25
Versioning of Explain
  • How
  • One extra step at Program promotion
  • Can be integrated with your existing Change
    Management process
  • Execute EXPLAIN
  • Save Explain output in a table
  • Each Explain will generate a new VERSION
  • Save xx versions
  • Create a SQL statement which will flag packages
    where the cost difference is greater than xx
  • SUM(PROCMS) and SUM(PROCSU) per Explained Package
    version and compare to the costs from previous
    Explain
  • Also consider saving key values from the DB2
    catalog used by the Optimizer

26
Versioning of Explain
  • Why
  • The primary reason is quickly to identify WHEN
    and WHY the performance of a program
    increased/decreased
  • It is always possible to verify if performance
    has changes
  • Primarily due to COST or AP changes
  • You can see when the change happened (every
    Explain has a unique timestamp)
  • You can see what the DB2 Catalog statistics were
    at the Explain time

27
Access Path what is good / bad AP
  • No definitive answer exists for this question
    except for
  • IT DEPENDS !
  • Lets look at some issues in order to find the
    correct answer
  • One example One SQL-statement costs 0,050
    CPU-sec while another one costs 2 CPU-minutes
    which one do you want to spend time tuning ?

28
Access Path what is good / bad AP
  • The cheap SQL-statement is executed 100,000
    times per hour in an online transaction, while
    the other is executed once every day in a batch
    job
  • One weeks consumption
  • Online SQL (0,050 x 100000 x 24 x 7) 840000
    CPU-sec.
  • Batch SQL (2 x 60 x 7)
    840 CPU-sec.

29
ExplainEXPLAIN
30
Explain EXPLAIN
  • It will take hours to describe every column
    currently existing in PLAN_TABLE here are a few
    to focus on
  • Lets use a real SQL-statement which has been
    explained to decrypt the PLAN_TABLE columns.
  • Description of all the PLAN_TABLE columns can be
    viewed in this manual
  • IBM DB2 SQL REFERENCE GUIDE (SC18-7426-04)
    http//publib.boulder.ibm.com/epubs/pdf/dsnsqj14.p
    df

31
Explain EXPLAIN
EXPLAIN PLAN SET QUERYNO75 for SELECT A.NAME ,
A.CREATOR , A.TBNAME , A.TBCREATOR , A.CREATEDBY
, B.COLNAME , B.COLSEQ , B.ORDERING ,
A.CLUSTERING , A.UNIQUERULE FROM
SYSIBM.SYSINDEXES A , SYSIBM.SYSKEYS B
WHERE ( A.CREATOR
B.IXCREATOR AND A.NAME B.IXNAME )
AND A.CREATOR hostvar1
ORDER BY
A.NAME , A.CREATOR , B.COLSEQ COMMIT SELECT
FROM PLAN_TABLE WHERE QUERYNO75
QUERYNO QBLOCKNO PROGNAME PLANNO METHOD CREATOR
TNAME ACCESSTYPE MATCHCOLS 75 1
BPASQL8 1 0 SYSIBM SYSINDEXES
I 1 75 1 BPASQL8 2
1 SYSIBM SYSKEYS I 2 75
1 BPASQL8 3 3
0
ACCESSCREATOR ACCESSNAME INDEXONLY SORTN_UNIQ
SORTN_JOIN SORTN_ORDERBY SORTN_GROUPBY SYSIBM
DSNDXX01 N N N
N N SYSIBM
DSNDKX01 N N N
N N
N N N N
N
SORTC_UNIQ SORTC_JOIN SORTC_ORDERBY
SORTC_GROUPBY PREFETCH TSLOCKMODE N
N N N L
IS N N N
N IS N N
Y N
32
Explain EXPLAIN
  • QUERYNO
  • The number specified doing manual Explain or -
    the statement number from the
    package/plan which was explained.
  • For programs this number is used to identify a
    SQL-statement in the source code.
  • In this scenario, the QUERYNO75 which also was
    specified in the manual explain.
  • Did you know you can assign a fixed QUERYNO to
    the SQL statement in your program so its easier
    to compare statements when the program is changed
    ???!!!

33
Explain EXPLAIN
  • QBLOCKNO
  • A number identifying every query block inside the
    SQL statement.
  • A typical example is if a UNION or SUBSELECT is
    present where DB2 will have to execute several
    statements within one SQL statement.
  • In this example the SQL-statement is a simply
    JOIN, which is why we only see QBLOCKNO1

34
Explain EXPLAIN
  • PROGNAME
  • Name of the program which holds the SQL-statement
    being explained
  • Package name when REBIND / BIND of a package is
    executed using EXPLAIN(YES)
  • Using dynamic explain this column will hold the
    name of the program executing the dynamic explain
  • In this scenario a dynamic explain was executed
    using program BPASQL8.

35
Explain EXPLAIN
  • PLANNO
  • Specifies which sequence the individual
    components are executed inside a QBLOCKNO.
  • Is this really interesting and do we need to
    worry about it ?
  • This will be covered in the section dealing with
    FILTER FACTOR this is one of the most
    important informations to pay attention to
  • In this scenario we can see SYSINDEXES is
    accessed first and then SYSKEYS, and finally a
    third component is performed (sorting)

36
Explain EXPLAIN
  • METHOD
  • Describes which form of JOIN being used.
  • 0 First table accessed in this step (QBLOCKNO)
  • 1 NESTED LOOP JOIN
  • 2 MERGE SCAN JOIN
  • 3 Sorting (of some kind)
  • (ORDER, GROUP, DISTINCT, UNION)
  • 4 HYBRID JOIN
  • We can see SYSINDEXES is accessed first, and then
    a NESTED LOOP JOIN with SYSKEYS and finally a
    SORT is performed to satisfy the ORDER BY

37
Explain EXPLAIN
  • CREATOR TNAME
  • Table name and creator for the table accessed
  • If METHOD3 , these columns will be spaces
  • In this scenario two tables are accessed
  • SYSIBM.SYSINDEXES
  • SYSIBM.SYSKEYS

38
Explain EXPLAIN
  • ACCESSTYPE
  • Describes how the table mentioned is accessed.
  • This is a very important information which in
    many performance exercises can be a first shop
    stop (why will be explained)
  • In DB2 9 you can find 17 different methods the
    most important ones will be covered here
  • I Index Access
  • I1 One-Fetch Index scan
  • M Superceeded by MX, MI, MU Multiple Index
    Access (can be excellent or very bad at least
    two indexes from the same table are used)
  • N Index Scan where IN is used
  • R Tablespace Scan if many pages exist for
    this tablespace, this can be extremely expensive
  • In this scenario both table accessed have
    ACCESSTYPEI, so we know indexe(s) are used and
    we dont face a tablespace scan ? ?

39
Explain EXPLAIN
  • MATCHCOLS
  • Illustrates how many columns being used in the
    index listed under ACCESSNAME.
  • Used for ACCESSTYPE I , I1 , N and MX
  • The value can be
  • o The entire index is scanned warning !!!!!
  • gt0 The number of columns being used from the
    index listed prior to Tablespace access (unless
    Index Only).
  • If the index has 4 columns and MATCHCOLS 1 ,
    this can lead to non-optimal response times
    depending on column cardinality.
  • Assume the table has 1.000.000 rows
  • The index has 4 columns
  • The first column has 2 distinct values
    (cardinality0.5)
  • DB2 will guess half the table rows qualify ?
    TS-scan

40
Explain EXPLAIN
  • ACCESSNAME and ACCESSCREATOR
  • If an index is used to access the table, these
    columns illustrates which index(es) being used.
  • Use your common sense and think about the index
    listed is it the best one based on the
    predicates used in the WHERE component of the
    SQL-statement.
  • Also think about this information in conjunction
    with MATCHCOLS if more columns exist in the
    index compared to what is described in MATCHCOLS
    maybe an additional pedicate can improve
    performance.
  • In our scenario an index is used to access both
    tables specified in our JOIN statement.

41
Explain EXPLAIN
  • INDEXONLY
  • Describes if DB2 is satisfied by ONLY scanning
    the index listed under ACCESSNAME without
    looking into the data piece (tablespace).
  • The value can be Y(es) or N(o)
  • Often it may pay off to have an additional column
    in the index to avoid the tablespace access
    (indexonly), and maybe one I/O can be spared at
    in every statement (the most expensive in the DB2
    world). Before making this decision please have
    a closer look at index
    complications advantages and disadvantages
  • Our JOIN scenario is using two indexes, and both
    have INDEXONLYN , meaning what is being
    selected can NOT be satisfied by the columns in
    the 2 indexes used.

42
Explain EXPLAIN
  • SORTN_xxxxxx hvor xxxxxx
  • UNIQUE Must the internal table ne sorted to
    remove duplicates
  • JOIN Is it necessary to sort due to METHOD 2
    or 4 ?
  • ORDERBY Must the internal table be sorted due
    to ORDER BY
  • GROUPBY Must the internal table be sorted due
    to GROUP BY

43
Explain EXPLAIN
  • SORTC_xxxxxx hvor xxxxxx
  • UNIQUE Must the composite table be sorted to
    remove duplicates ?
  • JOIN Sorting due to METHOD 2 or 4 ?
  • ORDERBY Must the composite table be sorted
    due to ORDER BY
  • GROUPBY Must the composite table be sorted
    due to GROUP BY
  • Our scenario requires the composite result table
    (from the JOIN) to be sorted due to ORDER BY in
    the SQL-statement.

44
Explain EXPLAIN
  • PREFETCH
  • Describes which PREFETCH method MIGHT be used.
  • L List Prefetch DB2 sorts RIDs from
    index(es) to avoid reading the same data page
    more than once.
  • D Dynamic Prefetch DB2 will start to read
    blocks of data into the buffer pool
    asynchronously (if it pays off)
  • S Sequential Prefetch DB2 will read all
    needed pages into the buffer pool
    asynchronously to save time doing I/O
  • This scenario illustrates index DSNDXX01 will be
    used where CREATOR fullfill the WHERE clause.
    Since the table also has to be accessed, the
    RIDs are being sorted so DB2 only needs to read
    the same page once. If this index was the
    CLUSTERING index maybe LIST PREFETCH could have
    been avoided.

45
Explain EXPLAIN
  • TSLOCKMODE
  • Describes what kind of LOCK DB2 will use for the
    tablespace being accessed. Beside the TS-locks,
    DB2 will do table or row locks (covered in a
    separate section)
  • Different LOCK types will be described later, but
    in general X og IX are not considered nice
    since these can prohibit concurrent access and
    ultimately lead to TIMEOUTs and DEADLOCKs.
  • In our scenario we have IS (Intent Share) for
    both tables being accessed, which is expected
    since its a simple SELECT statement.

46
What can YOU do to save DB2 CPU and improve
performance- - - - -Until next time ?Thank
You
Write a Comment
User Comments (0)
About PowerShow.com