Title: Explain DB2 EXPLAIN Explanations
1Explain DB2 EXPLAIN Explanations
2Introduction
- Who are you ?
- Job function Application Developer / DBA
- DB2 experience
- Who am I ?
- Steen Rasmussen, CA inc.
- Principal Technical Specialist DB2 Tools
- 22 years marriage with DB2 since V1R1M0
- Developer, DBA, Presales, Technical Manager,
Presenter
3DISCLAIMER
- DB2 Performance is a huge topic
- You will not become performance specialists today
? - This presentation is to be considered an
appetizer for more detailed presentations - Understand the fundamentals of Explain
- A quick start to decrypt Explain output
- What to focus on
4Agenda the BIG one (if you like the
agenda and Roberta I will be back)
- What is EXPLAIN and why performance is important
- What influences EXPLAIN output (Access Path
Selection) - EXPLAIN prerequisites
- Explain table(s)
- How to create the tables
- History and backward compatibility
- How can EXPLAIN be executed
- Access Path introduction
- What is Explain
- OPTIMIZER overview
- Explain Output Explain EXPLAIN
5What can you do to save DB2 CPU (and
improve performance)
- How to build SQL statements
- Static ltgt Dynamic SQL differences
- What if DB2 decides the wrong Access Path
- OPTHINT
- REOPT
- RUNSTATS overview
- Help to the Optimizer
- Filter Factors
6Hvad kan du gore for at spare CPU (og
forbedre performance)
- DB2 Locking
- Lock types
- Recommendations
- SQL Index design
- STAGE1 and STAGE2 predicates does it matter and
what is it in reality - How do you find tuning potential the bad
SQLs - Benchmarking
- What is it
- How can we learn from our experiences
7Some terms to understand when digging into DB2
Performance issues
- RI
- Buffer Pool (BP)
- RID pool List Prefetch
- Parallelism DEGREE()
- Access Path (AP)
- Sargable STAGE1 STAGE2
- Getpage - GETP
- Optimizer
- Explain
- Sequential Prefetch (detection)
- Deadlock / Timeout
- Lock Avoidance
- Lock, Latch and Claim
- Constraints
- Filter Factor Cardinality
- Correlated query
- Firstkeycardf og Fullkeycardf
8What is EXPLAINOptimizeroverview
9What is EXPLAIN
- Illustrates how DB2s Optimizer will execute a
SQL statement, a Package or a Plan (depending on
how EXPLAIN is executed) - Why is EXPLAIN a necessity
- Cant we simply look at the SQL-statement and
estimate if it has been coded all right - What if its a 12 tabeller joines
- Or the statement is 2 MB (or just 24 KB)
- Does performance mean anything if a statement
executes in half a second or 5 minutes ? - Predict WHAT will happen when SQL changes or at a
Package REBIND after a reorganization and
Runstats - The latest example how will upgrading to DB2 V8
impact Access Path (compare DB2 V7 Optimizer with
DB2 V8) - Do you always know whether a JOIN or
SUBSELECT/EXISTS provides the best performance ??
Lets see a cool example
10What is EXPLAIN
- Why is Performance so important
- Bad performing SQL costs
- One benchmark illustrates
- It costs 30 to correct bad SQL in test
- It costs 1000 to correct in production
- If response times are not optimal
- Fewer transactions will go through the pipe
- Each end user will be less productive
- Other SQL-statements will suffer due to sharing
of resources - Buffer Pools, I/O channels, Locking conflicts,
contention in shared pools like SORT area, RID
pool, . . . . - Hardware upgrade to conform to SLA
11Whats influencing the EXPLAIN output
- Some factors to consider when comparing Explain
between two different enviroments - Optimizer looks at Hardware type
- Optimizer looks at number of processors
- Optimizer looks at Buffer Pools
- Host variables will be replaced by Parameter
Markers when doing dynamic Explain this could
be a major problem in earlier DB2 versions (pre
DB2 V8) if host variables (or column predicate)
was defined differently than the column defined
in the DB2 catalog
12Whats influencing the EXPLAIN output
- Table size (and compression)
- Column cardinality and Filter Factor
- Indexes present and the columns
- FIRSTKEYCARDF and FULLKEYCARDF (details later)
- Different RUNSTATS methods to collect statistics
- Clustering (Cluster Ratio) as well as clustered
indicator - Number of Index Levels (NLEVELS)
- SQL predicates (predicate analysis)
- ORDER BY and the ability to eliminate sorts
- . . . . . . . . . .
13EXPLAIN OPTIMIZER overview
- What is the purpose of the Optimizer
- Decides how the database navigates
- Parsing SQL statements to check tables and
columns - Investigates statistics from the DB2 catalog
(which can be updated by RUNSTATS utility or
manual) - Decides what is the LEAST expensive access path
- DB2s Optimizer is COST BASED (opposite from
Oracle) - Finds statistics from the DB2 Catalog
- Calculates Filter Factor(s) (estimated qualified
rows) - Finds a potential number of Access Paths
- Calculates the cost based on CPU and I/O cost
14EXPLAIN OPTIMIZER overview
- Cost based Optimizer CPU and I/O
- CPU costs
- Looks at predicates (STAGE-1 or STAGE-2)
(this can be covered in another
session next page is an appetizer ) - How many PAGEs to look at
- SORT any kind
- I/O costs
- DB2 Catalog statistics (like Clusterratio which
also can be covered in a separate session) - Buffer Pool size
- What does it cost to allocate temporary files
etc. - A lot of activity spent on looking at which
indexes exist and how these are matching the
predicates
15EXPLAIN OPTIMIZER overview
Result Set
Relational Data Manager
SQL statement
Apply stage 2 predicates and sort data (can be
expensive)
Optimized SQL
Data Manager
Apply stage 1 predicates
Read Buffer or get data (I/O)
Data
Buffer Manager
16EXPLAIN Pre-Requisites
- EXPLAIN tables
- creator.PLAN_TABLE (minimum to do explain)
- Records Optimizers choice of Access Path
- Not immediately easy to decrypt many codes
(see next slide) - creator.DSN_STATEMNT_TABLE (optional)
- Shows COST estimates (this is HUGE in my opinion)
- creator.DSN_FUNCTION_TABLE (optional)
- Only used if UDF (User Defined Function) needs to
be explained - Creator.DSN_STATEMENT_CACHE_TABLE (new in DB2 V8)
- Used to explain DB2 Dynamic Statement Cache
or parts of this
17EXPLAIN TABLE content (example)
- QUERYNO QBLOCKNO APPLNAME PROGNAME PLANNO METHOD
CREATOR TNAME TABNO - 1411 1 RQATD 1 0
SYSIBM SYSTABLES 1 - ACCESSTYPE MATCHCOLS ACCESSCREATOR ACCESSNAME
INDEXONLY - I 0 SYSIBM DSNDTX02
N - SORTN_UNIQ SORTN_JOIN SORTN_ORDERBY SORTN_GROUPBY
SORTC_UNIQ SORTC_JOIN - N N N N
N N - SORTC_ORDERBY SORTC_GROUPBY TSLOCKMODE TIMESTAMP
REMARKS - N N IS
2007073116080531 - PREFETCH COLUMN_FN_EVAL MIXOPSEQ VERSION
COLLID - 0
CAD72_2004-08-31-18.24.46 RQPAR110_ALL - COLLID ACCESS_DEGREE ACCESS_PGROUP_ID
JOIN_DEGREE JOIN_PGROUP_ID - RQPAR110_ALL ------ ------
------ ------ - SORTC_PGROUP_ID SORTN_PGROUP_ID PARALLELISM_MODE
MERGE_JOIN_COLS - ------ ------ -
------ - CORRELATION_NAME PAGE_RANGE JOIN_TYPE
GROUP_MEMBER IBM_SERVICE_DATA - A YÄ.½T æ ³ INITUK15
- WHEN_OPTIMIZE QBLOCK_TYPE BIND_TIME
OPTHINT HINT_USED - SELECT 2007-07-31-16.08.03.6151
84 - PRIMARY_ACCESSTYPE PARENT_QBLOCKNO TABLE_TYPE
18EXPLAIN Pre-Requisites
- How to create the tables for EXPLAIN ?
- Every new DB2 version adds additional columns
- DB2 V1R0 had no EXPLAIN (afair)
- DB2 V1R2 had 25 columns in PLAN_TABLE
- . . . . . . . . .
- DB2 V7 has 51 columns in PLAN_TABLE
- DB2 V8 has 58 columns in PLAN_TABLE
- DB2 V9 has 59 columns in PLAN_TABLE
- Backward and forward compatible !!!
- CREATE TABLE steen.plan_table like . . . . . . .
- Usually the Systems Programmer creates one as
part of the IVP - Look into IBM SDSNSAMP
19What EXPLAIN DOESNT show
- Explain only shows SELECT, DELETE, UPDATE, INSERT
- Important not to forget the issues below when
doing performance / tuning - RI Definitions
- TRIGGERs executed as part of the SQL-statement
- UDFs
- Table- and Column Check Constraints
- Not always a guarantee DB2 will USE the
illustrated AP - Prefetch activities can be disabled depending on
BP status - Parallelism is decided at the execution time
- RID pool shortage
20How to execute EXPLAIN
- EXPLAIN
- Explain Dynamic Statement cache will not be
covered in detail. Normally used during DBA
Performance/Tuning - Manual type-in via SPUFI
- BIND eller REBIND PACKAGE med EXPLAIN(YES)
21How to execute EXPLAIN
IQPSQLE1 --- (CAPS ON) --- SQL Editor --- (NULLS
ON) --- 2007/08/03 132554 COMMAND gt
SCROLL
gt CSR
Edit the
SQL statements that you wish to test below. When
you are done editing the SQL, hit PF3/15
or enter 'END' to return to the RC/SQL control
panel. Host variables will be parsed out, so
that you may test your SQL repeatedly with
various host values.
___
explain plan set queryno 190 for
___ select from
sysibm.systables
___ where name ?
___ and
creator 'PTI'
___ with UR
___
___ commit
___
___
select from plan_table where queryno190
___
BOTTOM OF
DATA
22How to execute EXPLAIN
- BIND / REBIND and use EXPLAIN(YES)
- Note you will have to select from PLAN_TABLE to
view Access path information. - Consequences can be less than optimal or even
career limiting decisions
To Rebind Or Not To Rebind . That is the
QUESTION !
23How to execute EXPLAIN
COMMAND gt
SCROLL gt PAGE
TOP OF DATA
.CALL DSN
PARM(S81A)
.DATA
REBIND PACKAGE(RQPAR110_ALL.RQATD.(CAD72_2004
-08-31-18.24.46)) -
OWNER(RASST02)
- QUALIFIER(RASST02)
-
CURRENTDATA(NO )
-
VALIDATE(RUN ) EXPLAIN(yes)
- ISOLATION(CS)
RELEASE(COMMIT ) -
DEGREE(1 )
-
REOPT(NONE )
- KEEPDYNAMIC(NO )
-
DBPROTOCOL(PRIVATE)
-
ENCODING(EBCDIC )
- IMMEDWRITE(NO )
-
FLAG(I)
.ENDDATA
BOTTOM OF DATA
24BIND / REBIND with EXPLAIN warning
- Manual EXPLAIN
- This is a What if analysis
- Illustrates WHICH Access Path will be chosen if a
BIND or REBIND is executed - You can check the implications / consequences
- BIND / REBIND with EXPLAIN(YES)
- DB2 will generate a NEW Access path !!!!!
- Is this a good idea ? Maybe !
- What if the DB2 catalog statistics ISNT optimal
- No RUNSTATS executed ? statistics columns in the
catalog have -1 ? Optimizer has NO clue about
statistics for tables, indexes, columns
25Versioning of Explain
- How
- One extra step at Program promotion
- Can be integrated with your existing Change
Management process - Execute EXPLAIN
- Save Explain output in a table
- Each Explain will generate a new VERSION
- Save xx versions
- Create a SQL statement which will flag packages
where the cost difference is greater than xx - SUM(PROCMS) and SUM(PROCSU) per Explained Package
version and compare to the costs from previous
Explain - Also consider saving key values from the DB2
catalog used by the Optimizer
26Versioning of Explain
- Why
- The primary reason is quickly to identify WHEN
and WHY the performance of a program
increased/decreased - It is always possible to verify if performance
has changes - Primarily due to COST or AP changes
- You can see when the change happened (every
Explain has a unique timestamp) - You can see what the DB2 Catalog statistics were
at the Explain time
27Access Path what is good / bad AP
- No definitive answer exists for this question
except for - IT DEPENDS !
- Lets look at some issues in order to find the
correct answer - One example One SQL-statement costs 0,050
CPU-sec while another one costs 2 CPU-minutes
which one do you want to spend time tuning ?
28Access Path what is good / bad AP
- The cheap SQL-statement is executed 100,000
times per hour in an online transaction, while
the other is executed once every day in a batch
job - One weeks consumption
- Online SQL (0,050 x 100000 x 24 x 7) 840000
CPU-sec. - Batch SQL (2 x 60 x 7)
840 CPU-sec.
29ExplainEXPLAIN
30Explain EXPLAIN
- It will take hours to describe every column
currently existing in PLAN_TABLE here are a few
to focus on - Lets use a real SQL-statement which has been
explained to decrypt the PLAN_TABLE columns. - Description of all the PLAN_TABLE columns can be
viewed in this manual - IBM DB2 SQL REFERENCE GUIDE (SC18-7426-04)
http//publib.boulder.ibm.com/epubs/pdf/dsnsqj14.p
df
31Explain EXPLAIN
EXPLAIN PLAN SET QUERYNO75 for SELECT A.NAME ,
A.CREATOR , A.TBNAME , A.TBCREATOR , A.CREATEDBY
, B.COLNAME , B.COLSEQ , B.ORDERING ,
A.CLUSTERING , A.UNIQUERULE FROM
SYSIBM.SYSINDEXES A , SYSIBM.SYSKEYS B
WHERE ( A.CREATOR
B.IXCREATOR AND A.NAME B.IXNAME )
AND A.CREATOR hostvar1
ORDER BY
A.NAME , A.CREATOR , B.COLSEQ COMMIT SELECT
FROM PLAN_TABLE WHERE QUERYNO75
QUERYNO QBLOCKNO PROGNAME PLANNO METHOD CREATOR
TNAME ACCESSTYPE MATCHCOLS 75 1
BPASQL8 1 0 SYSIBM SYSINDEXES
I 1 75 1 BPASQL8 2
1 SYSIBM SYSKEYS I 2 75
1 BPASQL8 3 3
0
ACCESSCREATOR ACCESSNAME INDEXONLY SORTN_UNIQ
SORTN_JOIN SORTN_ORDERBY SORTN_GROUPBY SYSIBM
DSNDXX01 N N N
N N SYSIBM
DSNDKX01 N N N
N N
N N N N
N
SORTC_UNIQ SORTC_JOIN SORTC_ORDERBY
SORTC_GROUPBY PREFETCH TSLOCKMODE N
N N N L
IS N N N
N IS N N
Y N
32Explain EXPLAIN
- QUERYNO
- The number specified doing manual Explain or -
the statement number from the
package/plan which was explained. - For programs this number is used to identify a
SQL-statement in the source code. - In this scenario, the QUERYNO75 which also was
specified in the manual explain. - Did you know you can assign a fixed QUERYNO to
the SQL statement in your program so its easier
to compare statements when the program is changed
???!!!
33Explain EXPLAIN
- QBLOCKNO
- A number identifying every query block inside the
SQL statement. - A typical example is if a UNION or SUBSELECT is
present where DB2 will have to execute several
statements within one SQL statement. - In this example the SQL-statement is a simply
JOIN, which is why we only see QBLOCKNO1
34Explain EXPLAIN
- PROGNAME
- Name of the program which holds the SQL-statement
being explained - Package name when REBIND / BIND of a package is
executed using EXPLAIN(YES) - Using dynamic explain this column will hold the
name of the program executing the dynamic explain - In this scenario a dynamic explain was executed
using program BPASQL8.
35Explain EXPLAIN
- PLANNO
- Specifies which sequence the individual
components are executed inside a QBLOCKNO. - Is this really interesting and do we need to
worry about it ? - This will be covered in the section dealing with
FILTER FACTOR this is one of the most
important informations to pay attention to - In this scenario we can see SYSINDEXES is
accessed first and then SYSKEYS, and finally a
third component is performed (sorting)
36Explain EXPLAIN
- METHOD
- Describes which form of JOIN being used.
- 0 First table accessed in this step (QBLOCKNO)
- 1 NESTED LOOP JOIN
- 2 MERGE SCAN JOIN
- 3 Sorting (of some kind)
- (ORDER, GROUP, DISTINCT, UNION)
- 4 HYBRID JOIN
- We can see SYSINDEXES is accessed first, and then
a NESTED LOOP JOIN with SYSKEYS and finally a
SORT is performed to satisfy the ORDER BY
37Explain EXPLAIN
- CREATOR TNAME
- Table name and creator for the table accessed
- If METHOD3 , these columns will be spaces
- In this scenario two tables are accessed
- SYSIBM.SYSINDEXES
- SYSIBM.SYSKEYS
38Explain EXPLAIN
- ACCESSTYPE
- Describes how the table mentioned is accessed.
- This is a very important information which in
many performance exercises can be a first shop
stop (why will be explained) - In DB2 9 you can find 17 different methods the
most important ones will be covered here - I Index Access
- I1 One-Fetch Index scan
- M Superceeded by MX, MI, MU Multiple Index
Access (can be excellent or very bad at least
two indexes from the same table are used) - N Index Scan where IN is used
- R Tablespace Scan if many pages exist for
this tablespace, this can be extremely expensive - In this scenario both table accessed have
ACCESSTYPEI, so we know indexe(s) are used and
we dont face a tablespace scan ? ?
39Explain EXPLAIN
- MATCHCOLS
- Illustrates how many columns being used in the
index listed under ACCESSNAME. - Used for ACCESSTYPE I , I1 , N and MX
- The value can be
- o The entire index is scanned warning !!!!!
- gt0 The number of columns being used from the
index listed prior to Tablespace access (unless
Index Only). - If the index has 4 columns and MATCHCOLS 1 ,
this can lead to non-optimal response times
depending on column cardinality. - Assume the table has 1.000.000 rows
- The index has 4 columns
- The first column has 2 distinct values
(cardinality0.5) - DB2 will guess half the table rows qualify ?
TS-scan
40Explain EXPLAIN
- ACCESSNAME and ACCESSCREATOR
- If an index is used to access the table, these
columns illustrates which index(es) being used. - Use your common sense and think about the index
listed is it the best one based on the
predicates used in the WHERE component of the
SQL-statement. - Also think about this information in conjunction
with MATCHCOLS if more columns exist in the
index compared to what is described in MATCHCOLS
maybe an additional pedicate can improve
performance. - In our scenario an index is used to access both
tables specified in our JOIN statement.
41Explain EXPLAIN
- INDEXONLY
- Describes if DB2 is satisfied by ONLY scanning
the index listed under ACCESSNAME without
looking into the data piece (tablespace). - The value can be Y(es) or N(o)
- Often it may pay off to have an additional column
in the index to avoid the tablespace access
(indexonly), and maybe one I/O can be spared at
in every statement (the most expensive in the DB2
world). Before making this decision please have
a closer look at index
complications advantages and disadvantages - Our JOIN scenario is using two indexes, and both
have INDEXONLYN , meaning what is being
selected can NOT be satisfied by the columns in
the 2 indexes used. -
42Explain EXPLAIN
- SORTN_xxxxxx hvor xxxxxx
- UNIQUE Must the internal table ne sorted to
remove duplicates - JOIN Is it necessary to sort due to METHOD 2
or 4 ? - ORDERBY Must the internal table be sorted due
to ORDER BY - GROUPBY Must the internal table be sorted due
to GROUP BY
43Explain EXPLAIN
- SORTC_xxxxxx hvor xxxxxx
- UNIQUE Must the composite table be sorted to
remove duplicates ? - JOIN Sorting due to METHOD 2 or 4 ?
- ORDERBY Must the composite table be sorted
due to ORDER BY - GROUPBY Must the composite table be sorted
due to GROUP BY - Our scenario requires the composite result table
(from the JOIN) to be sorted due to ORDER BY in
the SQL-statement.
44Explain EXPLAIN
- PREFETCH
- Describes which PREFETCH method MIGHT be used.
- L List Prefetch DB2 sorts RIDs from
index(es) to avoid reading the same data page
more than once. - D Dynamic Prefetch DB2 will start to read
blocks of data into the buffer pool
asynchronously (if it pays off) - S Sequential Prefetch DB2 will read all
needed pages into the buffer pool
asynchronously to save time doing I/O - This scenario illustrates index DSNDXX01 will be
used where CREATOR fullfill the WHERE clause.
Since the table also has to be accessed, the
RIDs are being sorted so DB2 only needs to read
the same page once. If this index was the
CLUSTERING index maybe LIST PREFETCH could have
been avoided.
45Explain EXPLAIN
- TSLOCKMODE
- Describes what kind of LOCK DB2 will use for the
tablespace being accessed. Beside the TS-locks,
DB2 will do table or row locks (covered in a
separate section) - Different LOCK types will be described later, but
in general X og IX are not considered nice
since these can prohibit concurrent access and
ultimately lead to TIMEOUTs and DEADLOCKs. - In our scenario we have IS (Intent Share) for
both tables being accessed, which is expected
since its a simple SELECT statement.
46What can YOU do to save DB2 CPU and improve
performance- - - - -Until next time ?Thank
You