Title: Oracle Wait Interface 911
1 Oracle Wait Interface - 911
- Surender Sara
- Verities Solutions LLC.
- Surender.sara_at_veritiesllc.com
2 Performance Methodology Goals
- Always Consistent and Repeatable Results
- Cost Effective
- Corporate Wide Deployment
- Customizable
- Always Accurate and becomes a Standard
3 Performance Issues Categories
- Category 1
- When we dont know anything about the issue or
the exact timings on when it happens and need to
zero in to a narrow time window and eventually to
the set of problem statements. - Category 2
- We know when the problem seem to occur but not
the problem. Ex, Every day 9-12 AM things are
slow - Category 3
- We know what specific problem we need to resolve
(end user response time perspective) users know
what needs to be tuned BEST START ! , Focus
based - Category 4
- Meeting ongoing performance SLA Specific Things
Need to be tuned to be able to service SLAs
effectively - Category 5
- Building Performance History ( STATSPACK like
approach but more intelligent and specific )
Ability to capture the problem details only
4 Performance Tuning Goals (1)
- To be able to answer questions about sudden
slowness, localized or generalized - when we receive a help desk call
- someone walks in
- we get a page
- Ability to compare hourly performance
horizontally, this will enable us to answer WHY
ARE WE SLOW TODAY type questions compared to
YESTERDAY WAS FINE - Need to build historical performance data to
compare the fluctuations vertically ( today) and
horizontally ( same time frame but different days
)
5 Performance Tuning Goals (2)
- Look for signatures and patterns over time and to
know what to expect in response time and when (
PROACTIVE GOAL ) - Baseline for tuning efforts
- Translate the data into meaningful capacity
planning needs - Quick, few clicks and the problem is exposed
6Best Strategy, Always On!
- The ideal way to collect data is ALWAYS ON
strategy. This can be accomplished in two ways - 10046 always on
- Init.ora parameter
- Database logon trigger
- Sampling vsession_wait every second
- Direct SGA attach method where we can sample the
data without using database resources can be the
best way to accomplish this goal ( need c, proc
expertise)
7Issues with Best Strategy - 10046
- 10046
- Not possible to leave it on always, will have
negative effect on your performance - Will fill your space in no time on a highly
active system (USER_DUMP_DEST) - If desired can use it as follows
- Trace all sessions from 9 AM to 10 AM Use logon
Trigger to set the trace and then DROP or disable
the trigger
8Example Database Logon Trigger with 10046
- CREATE OR REPLACE TRIGGER TRC_APPS
- AFTER LOGON ON APPS.schema
- DECLARE
- event_stmt varchar2(100) 'ALTER SESSION SET
EVENTS '''''' 10046 TRACE NAME CONTEXT
FOREVER, LEVEL 12 ''''' - BEGIN
- EXECUTE IMMEDIATE event_stmt
- END
- /
9Issues with Best Strategy Sampling
vsession_wait
- If sampling is done per second on this table via
direct attached SGA method that method will work
as long as the data store is not in the same
database that is being monitored
10What to do with - 10046 ?
- 10046
- Use this for tracing programs once they are
identified as a bottleneck. - Use this to trace all sessions when problematic
time window is defined for ex. Put a database
logon trigger at 9 am and stop at 10 am. We will
get trace for all the users at that time interval - Use this to trace all given user connection for
already active sessions ( will start tracing
inactive sessions also when they become active)
using PLSQL code - Have to be prudent to trace specific programs
manually by eliminating the START and END empty
wait times - EMAIL SURENDER.SARA_at_veritiesllc.com for the
tracing paper
11What to do vsession_wait ?
- Sample data every X seconds 30 to 60 seconds
recommended - We are after big bottlenecks during the first
phase - This way we meet the goal of always on for all
sessions but not creating problems of our own
12Fitting It together - GOALS!
- Need something that will capture what each
session did in terms of WAITS and RESOURCES used
before they disconnect - Need something to sample every sessions all the
time - Meets our goal always on but is lightweight
- Provide historical data
- Provide a quick way to catch big issues every
time they occur and the details - Cost effective and corporate standard
13Program Long Response Time
14 Data Collection
- Collect Intelligent Data based on IO and Memory
Related waits - Build holding tables, one for IO and couple for
memory related waits - Build ssdba_extents and ssdba_segments tables and
build indexes on them, depending on the segment
changes may have to refresh them before running
the diagnosis script, recreate these tables once
a day - Sample data every 30 seconds to 1 minute from
vsession_wait into holding tables, this is
changeable. - Create logoff trigger to save vsession_event and
sesstat statistics per user into logoff holding
tables - Memory Related waits need to capture the
problematic constructs at the time they occur - IO related waits can be taken care of using
holding tables ( direct IO read/write can be an
exception)
15Logoff trigger
- Creating logoff stats table in the same database
that is being monitored - Careful about using this and DONT get too
aggressive on collection here as logout time will
be affected - Keep 5-7 days data at minimum
- Create one table for events and one for
statistics
16Loggoff Event table
- drop table ssdba.ssdba_logoff_event_t
- create table ssdba.ssdba_logoff_event_t as
- select
- a.SID ,
- a.EVENT ,
- a.TOTAL_WAITS ,
- a.TOTAL_TIMEOUTS ,
- a.TIME_WAITED ,
- a.AVERAGE_WAIT ,
- a.MAX_WAIT ,
- sysdate as logoff_time ,
- b.SERIAL ,
- b.PADDR ,
- b.USERNAME ,
- b.OSUSER ,
- b.PROCESS ,
- b.MACHINE ,
- b.TERMINAL ,
- b.TYPE ,
17Loggoff Stats table
- drop table ssdba.ssdba_logoff_stats_t
- create table ssdba.ssdba_logoff_stats_t as
- select
- c.username ,
- c.osuser,
- a.sid,c.serial,
- c.paddr,
- c.process,
- c.logon_time ,
- a.statistic ,
- b.name,
- a.value,
- sysdate as logoff_time
- from vsession c, vsesstat a, vstatname b
- where 12
18ssdba_io_dump (IO Capture)
- HOUR
NUMBER - DAY
NUMBER - MONTH
NUMBER - YEAR
NUMBER - SNAP_DT
DATE - SID
NUMBER - SEQ
NUMBER - EVENT
VARCHAR2(64) - P1TEXT
VARCHAR2(64) - P1
NUMBER - P1RAW
RAW(4) - P2TEXT
VARCHAR2(64) - P2
NUMBER - P2RAW
RAW(4) - P3TEXT
VARCHAR2(64) - P3
NUMBER - P3RAW
RAW(4) - WAIT_TIME
NUMBER - SECONDS_IN_WAIT
NUMBER
19ssdba_Memory_dump
- HOUR
NUMBER - DAY
NUMBER - MONTH
NUMBER - YEAR
NUMBER - SNAP_DT
DATE - SID
NUMBER - SERIAL
NUMBER - USERNAME
VARCHAR2(30) - PADDR
RAW(4) - LOGON_TIME
DATE - SQL_HASH_VALUE
NUMBER - KGLPNMOD
NUMBER - KGLNAOBJ
VARCHAR2(1000)
20ssdba_sqltext_dump
- HOUR
NUMBER - DAY
NUMBER - MONTH
NUMBER - YEAR
NUMBER - SNAP_DT
DATE - HASH_VALUE
NUMBER - ADDRESS
RAW(4) - PIECE
NUMBER - SQL_TEXT
VARCHAR2(64)
21 FILE-BLOCK Related Waits
- buffer busy due to global cache file
block id - buffer busy waits file block id
- buffer read retry file block not
used - control file parallel write files blocks
requests - control file sequential read file block
Blocks - control file single write file block
Blocks - conversion file read files
block Blocks - db file parallel read files blocks
requests - db file parallel write files blocks
requests - db file scattered read file block
Blocks - db file sequential read file block
Blocks - db file single write file block
Blocks - direct path write file number first dba
block cnt - enqueue namemode id1 id2
- free buffer waits file block
set-id - write complete waits file block id
22Memory Related Waits, need to capture wait as
well as the details
- library cache pin
- Shared Pool Latch
- Hot Blocks
23Reports
- Will tell you exactly what you waited for and
what caused the wait ! - Fry the big fish in days !
- Resort to 10046 once the TOP Bad code is found !
- Do typical performance fixes for events
discovered. - Analyzing wait event data will yield a path
toward a solution for almost any problem.
24Sample QUERY
- ---- Top Average Waited SQL per EVENT
- SELECT Event , SQL_HASH_VALUE , average_waited
- FROM (
- SELECT Event , SQL_HASH_VALUE , average_wait ,
- MAX(average_wait) OVER (PARTITION BY
event) average_waited - FROM ssdba_event e
- ) WHERE average_wait average_waited
- ---- Top Seconds Waited SQL per EVENT
- SELECT Event , SQL_HASH_VALUE , seconds_waited
- FROM (
- SELECT Event , SQL_HASH_VALUE ,SECONDS_IN_WAIT ,
- MAX(SECONDS_IN_WAIT) OVER (PARTITION BY
event) seconds_waited - FROM ssdba_event e
- ) WHERE SECONDS_IN_WAIT seconds_waited
25Client Example using this TOOL ( last 4 days data
collection)
- EVENT
SQL_HASH_VALUE AVERAGE_WAITED - ----------------------------------------
------------------ ------------ - buffer busy waits
2950132120 12.166 - db file scattered read
0 1 - db file sequential read
0 6.75 - direct path read
3975150470 .08 - direct path write
927814176 .02 - enqueue
2898243099 19440 - latch free
3208076674 3998 - library cache load lock
1736625236 7 - library cache pin
3180879507 613.5
26Lets Drill down
- select distinct p1, p2, event, sql_text
- from
- ssdba_event a ,
- ssdba_sqltext b
- where
- a.idb.id and
- event like 'enqueue'
- and
- sql_hash_valueSQLHASH
- group by
- p1, p2, event, sql_text
- order by 4 desc
- /
27Who is the CULPRIT - enqueue !
- P1 P2 EVENT SQL_TEXT
- ---------- ---------- ----------
------------------------- - 1415053318 3473416 enqueue DELETE FROM
AP_TRIAL_BAL - 1415053318 3473458 enqueue DELETE FROM
AP_TRIAL_BAL - 1415053318 3538948 enqueue DELETE FROM
AP_TRIAL_BAL
28Who is the CULPRIT Latch Free !
- select distinct event, latchname, count()
- from
- ssdba_event a , ssdba_sqltext b
- where
- a.idb.id and
- event like 'latch free'
- group by
- event, latchname
- having count() gt 1
29Who is the CULPRIT Latch Free !
- EVENT LATCHNAME
COUNT() - ---------- ---------------------------------------
- ---------- - latch free cache buffer handles
3 - latch free cache buffers chains
503 - latch free cache buffers lru chain
70 - latch free library cache
924 - latch free parallel query alloc buffer
2 - latch free row cache objects
277 - latch free session allocation
10 - latch free shared pool
205 - latch free transaction allocation
13 - latch free
22
30Who Caused it -'cache buffers chains
- select distinct event , sql_hash_value ,
count() , - sum(WAIT_TIME ),
- sum(SECONDS_IN_WAIT),
- sum(AVERAGE_WAIT)
- from
- ssdba_event a , ssdba_sqltext b
- where a.idb.id
- and latchname like 'cache buffers chains'
- group by event, segment_type , block_type,
- sql_hash_value
- /
31Who Caused it -'cache buffers chains
- SQL_HASH_VALUE COUNT() SUM(WAIT_TIME)
SUM(SECONDS_IN_WAIT) SUM(AVERAGE_WAIT) - -------------- ---------- --------------
-------------------- ----------------- - 146743156 58 95
130 9911.75 - 3356970949 7 12
0 2291.11 - 3147322109 25 44
195 1746.40 - 2582753673 4 6
0 1625.00 - 3575111500 6 1
0 1386.91 - 2803300295 9 2
0 1211.60 - select distinct p1, p2, event, sql_text
- from
- ssdba_event a ,
- ssdba_sqltext b
- where
- a.idb.id and
- event like 'enqueue'
- and
- sql_hash_value 3356970949
- group by
- insert into n_query_user_roles ( user_name,
role_name, delete_flag ) - select
- qu.user_name, r.role_name, 'Y'
- from n_query_users qu, n_roles r
- where
- qu.user_name ! 'NOETIX_SYS' and qu.
- user_type in ( 'O', 'U' ) and 'Y'
- 'Y' and not exists ( select 'role assigned
to user' - from n_query_user_roles qur
- where qur.role_name r.role_name
- and qur.user_name qu.user_name ) and
- exists ( select 'role assigned to query user'
- from sys.dba_role_privs drp where
drp.granted_role - r.role_name and drp.grantee qu.user_name
and rownum 1 )
32Q A
- Email Me For the feel tool surender.sara_at_veritiesl
lc.com