Title: Wait-Time%20Based%20Oracle%20Performance%20Management
1Wait-Time Based Oracle Performance Management
Prepared for NCOUGPresented by Matt Larson
CTO, Confio Software
2Who am I?
- Founder and CTO of database performance software
company - Former DBA consultant specializing in Oracle
performance tuning - Co-author of three Oracle books (Oracle
Development Unleashed, Oracle Unleashed - 2nd Edition, Oracle8 Server Unleashed)
- Co-author of two other database related books
3Agenda
- Foundation
- Case Study One PL/SQL Issue
- Case Study Two Full Table Scans
- Case Study Three Inefficient Indexes
- Case Study Four Locking Problems
- QA
4Working the Wrong Problems
- After spending an agonizing week tuning Oracle
buffers to minimize I/O operations, management
typically rewards you with - A. An all expense paid vacation
- B. A free lunch
- C. A stale donut
- D. Reward? Nobody even noticed!
5Tuning Success (or lack thereof)
- Your role in the rollout of a new customer facing
application results in - A. Keys to drive the CEOs Porsche
- B. Keys to use the executive restroom
- C. A mop to use in the executive restroom
- D. Your office has been moved to the restroom
6Conventional Tools Measure System Health
- Assumption If I make the database healthy,
users benefit - Symptoms
- DBA finds big problem and fixes it, users
report no impact - Lots of data to review and things to fix, not
sure which to do first - Unclear view of performance leads to
Finger-pointing
Its your Code!
Its your Database!
IT staff
Developer or vendor
7 RMM Focuses on User Wait-Time
End User
- Identify each bottleneck affecting the user
- Rank bottlenecks by user impact
- Implement proven suggestions
- Set correct expectations on impact of fix
- Show proof the fix helped users
8RMM Confios Underlying Methodology
- Resource Mapping Methodology Industry
best-practice optimizing performance tuning for
maximum business impact - Three Key Principles of RMM
- 1. SQL View All statistics at SQL statement
level - 2. Time View Measure Time, not number of times a
resource is utilized - 3. Full View Separately measure every resource
to isolate source of problems
9Illustrating example SQL View Principle
- Example CEO measuring employee output
- Averaging over entire company gives no useful
data - Must measure each job separately
- DBA must manage database similarly
- Measure and identify bottlenecks for each SQL
independently
10Illustrating example Time View Principle
- Example CEO counting tasks vs. time to
complete - Counting system statistics not meaningful
- Must measure Time to complete
- System stats (buffer size, hit ratios, I/O
counts) do not identify where database customers
are waiting - Identify and optimize Wait Time for each SQL as
best indicator of performance
11Illustrating example Full View Principle
- Example CEO measuring results with blind spot
hiding key processes - Without direct visibility, valuable info is lost
- Must have visibility to every process step
- Distinctly identify and measure each Oracle
resource for each distinct SQL
12RMM-compliant Performance Tool Types
- Two Primary Types of Tools
- Session Specific Tools
- Tools that focus on one session at a time often
by tracing the process - Examples tkprof (Oracle), OraSRP Profiler (open
source) - Continuous DB Wide Monitoring Tools
- Tools that focus on all sessions by sampling
Oracle - Example Confio Ignite
- Both tools have a place in the organization
13Tracing
- Tracing with wait events complies with RMM
- Should be used cautiously in non-batch
environments due to session statistics skew - 80 out of 100 sessions have no locking contention
issues - 20 out of 100 have spent 99 of time waiting for
locked rows - If you trace one of the 80 sessions, it appears
as if you have no locking issues (and spend time
trying to tune other items that may not be
important) - If you trace one of the 20 sessions, it appears
as if you could fix the locking problems and
reduce your wait time by 99
14Tracing (cont)
- Very precise statistics, may be only way to get
certain statistics - Bind variable information is available
- Different types of tracing available providing
detail analysis even deeper than wait events - Ideal if a known problem is going to occur in the
future (and known session) - Difficult to see trends over time
- Primary audience is technical user
15Continuous DB Wide Monitoring Tools
- 24/7 sampling provides real-time and historical
perspective - Allows DBA to go back in time and retrieve
information even if problem was not expected - Not the level of detail provided by tracing
- Most of these tools have trend reports that allow
communication with others outside of the group - What is starting to perform poorly?
- What progress have we made while tuning?
16Case Study OnePL/SQL Issue
17Problem Observed
- Critical situation application performance
unsatisfactory - Response time between 240 and 900 seconds
- Most times users shutdown application
- Very high network traffic (3x4x normal),
indicating time-outs and user refreshes - CritSit declared major effort to resolve
problem
18Wait Events During Problem
library cache lock
library cache pin
19Investigation
20What does RMM tell us?
- Which SQL CERN_PROFILE
- Truncate
- Which Resource library cache pin
- library cache lock
- How much time up to 16 Hours of wait time
per hour
21Results
- Found an invalid trigger
- Insert statement was trying to fire trigger
- Truncate was locked behind it
- Response time improvement from 60,000 seconds
(worst case) to 0 seconds - Configured alert to notify DBA when the problem
starts next time - Problem should not occur for 22 hours without
anyone knowing
22Case Study Two DB File Scattered Reads
23Problem Observed
- Problem Login taking 4 minutes for each user
everyday they started their day - High wait accumulation from 630 830 am
- 600 Users X 4 Minutes 40 Hours Every Day
- 40 Hours lost productivity every day
- Applied RMM approach to problem identification
- Identify Wait Time, offending SQL, offending
Resource
24Wait Events During Problem
25Investigation
26What does RMM tell us?
- Which SQL LoginLookup
- UpdateInventory
- Which Resource Scattered Read
- Buffer Busy Waits
- How much time 40 Hour
- Every Day
27Hypotheses Oracle Interpretations
- Two Alternative paths for optimization
- Eliminate Full Table Scan
- There isnt a need to read the whole table, so we
need to find the right shortcut - Improve response time
- We need to read most or all of the table anyway,
so lets just figure out how to do it faster - Key Questions
- Is full table scan necessary?
- What causes a full table scan for this SQL
Statement?
28I. Unnecessary Full Table Scan?
- Solutions
- Add / Modify index(es) on the table
- Update table and/or index statistics if proper
index not being used - Add hint to use existing index
- Optimize the application
29Full Table Scan is Needed
- Two alternative paths for optimization
- I. Eliminate Full Table Scan
- There isnt a need to read the whole table, so we
need to find the right shortcut - II. Improve response time
- We need to read most or all of the table anyway,
so lets just figure out how to do it faster
30II. Improve Response Time for Db File Scattered
Reads
- Solutions
- Use Parallel Reads
- Set Database Parameters
- Improve I/O Speed
- Optimize the application
- Larger Database Caches (64-bit)
311. Use Parallel Reads Faster FTS
- Parallel Reads
- Can be set at the table level (use with caution)
- Alter table customer parallel degree 4
- Normally used by hinting in the SQL Statement
- select / FULL(customer) PARALLEL(customer, 4)
/ customer_namefrom customer - A delicate tradeoff
- sacrifice the performance of others for the
running query. - Not necessarily efficient, just faster
- Parallel Reads may actually do twice the work of
a sequential query but have four workers, thus
finishing in half the time while using 8x
resource
322. Set database parameters
- DB_FILE_MULTIBLOCK_READ_COUNT
- specifies the maximum number of blocks read in
one I/O operation during a sequential scan - Impacts the optimizer
- Reduces number of I/Os required
- For OLTP, typically between 4 to 16
- Optimizer will more likely to FTS if set too high
- Ensure that the database read requests are synced
up with the O/S. - This gets tricky if different block sizes are
used in different tablespaces
333. Improve I/O speed
- Get your SA involved
- Investigate I/O sub-system
- Iostat, vmstat, sar, for potential problems
- Monitor during high activity
- Investigate contention at the disk/controller
level. - Learn which disks share common resources
- Use more disks to spread I/O and reduce hot spots
- Investigate caching on disk sub-system and
current memory usage
344. Optimizing the Application
- Review application do you have access to code
for changes? - Understand the code around the problem SQL
- Techniques to Optimize a statement
- Reduce the number of calls for a SQL
- Caching the data in the application
- Creating a summary table (perhaps via a
materialized view) - Eliminating the need for the data
- Retrieve Less Data with each statement
- Add fields to the WHERE clause
- Combine SQLs for fewer calls
- Combine several SQLs with different bind
variables into one large statement that retrieves
all the data in one shot
355. Larger Database Caches (64-bit)
- Larger cache means fewer disk reads
- May need large increase to have significant
impact
Performance Gain
of database in memory
36Results
- Added indexes to underlying tables
- Added Materialized View
Full Table Scan Fixed
37Case Study Three DB File Sequential Reads
38Problem Observed
- Data Warehouse loads were taking too long
- Noticed high wait times on db file sequential
read wait event - DBAs were confused why are data loads reading
data - Applied RMM approach to problem identification
- Identify Wait Time, offending SQL, offending
Resource
39Investigation
Sequential read time
SQL
Sequential read time by object for SQL
40What does RMM tell us?
- Which SQL 3 Insert Statements
- Which Resource DB File Sequential Read
- How much time 5 hour 90 of wait time
41Investigating db file sequential reads
- Often considered a good read
- DB file sequential reads normally occur during
index lookups - Often a single-block read although it may
retrieve more than one block. - Sequential Read may also be seen for reads from
- datafile headers
- rebuilding the control file
- dumping datafile headers
42Hypotheses Oracle Interpretations of Sequential
Reads
- Causes of excessive wait times
- Reading too many index leaf blocks
- Not finding block in buffer cache forces disk
read - Slow disk reads
- Contention for certain blocks
- High Read time on INSERT statements
43I. Reading too many index and table blocks (cont)
- Rebuild Fragmented Indexes
- alter index rebuild online
- Compress Indexes
- alter index rebuild compress
- Uses more CPU
- Multi-column indexes
- Avoid the table lookup
- Will create a larger index
- Pre-sort Table data
44II. Not finding block in buffer cache forces disk
read
- Db File sequential reads occur because the block
is not in the buffer cache. - How do we make sure more blocks are already in
the cache? - Solutions
- Increase the size of the buffer cache(s)
- Put the object in a cache where it is less likely
to get flushed out
45III. Slow disk reads
- With databases, it often comes down to this the
disk just needs to be faster - Put certain objects on the fastest disk
- O/S file caching using special software that
makes normal files perform like raw files - Increase Storage System Caching such as an EMC
cache
46Results
- Inserts were updating indexes that had low
cardinality leading columns - Reordered columns in the index and got a 50
performance improvement - Log file sync wait event was then the largest
wait event - Data was being committed too often
- Tuning is an iterative process
47Case Study Four Enqueue
48Problem Observed
- Problem High Wait on CPPFPROD
- Accumulated wait 9.5 hours (34,000 sec) during
3.00-4.00am hour - End users were complaining loudly
- Applied RMM approach to problem identification
- Identify Wait Time, offending SQL, offending
Resource
49Investigation Drill down to Top SQL Identify
likely source of Problem
50enqueue
Causes
- TX enqueue
- Locks held for the life of a transaction until a
COMMIT or ROLLBACK. - TM enqueue
- Locks being held when foreign key constraints are
not indexed properly. - ST enqueue
- Locks held during dynamic space allocations.
- HW enqueue
- Serialization for the allocation of space beyond
the high water mark
51enqueue
TX
- Generally due to application or table setup
issues - Is acquired when a transaction initiates an
UPDATE - Row is locked by the session
- Others may select from the row (read consistency)
- Others wanting to UPDATE same row must wait
- Lock is held until a COMMIT or ROLLBACK is issued
52enqueue
TX
- Waits caused by normal active transactions
- Just issue a COMMIT or ROLLBACK
- Determine what the true unit of work is
53enqueue
TX
- Waits due to Insufficient 'ITL' slots in a Block
- The ITL (interested transaction list) is an area
at the top of each data block where Oracle keeps
track of which rows are locked by which
transaction - Every transaction wanting to change a block
requires a slot in the ITL list of the block - The number of ITL slots is controlled by
- INITRANS, initial number of slots at block
creation - MAXTRANS, total allowable slots over time
- ITL list will expand to allow MAXTRANS only if
space is available
54enqueue
TX
- Waits due to rows being covered by the same
BITMAP index fragment - Bitmap indexes allow one index entry to cover
many rows within a table - If two sessions try to insert or update the same
key value the second session has to wait
55enqueue
ST
- Caused by space management operations
- Can happen with small extent sizes, allocation of
temporary segments for sorting - May get an ORA-01575 indicating a timeout
- Unnecessary Sorting
- Disk sorting requires space management and thus
contention on the ST enqueue - Eliminate as much disk sorting as possible
- Evaluate SORT_AREA_SIZE or PGA_AGGREGATE_TARGET
parameters
56enqueue
ST
- General TEMP tablespace advice
- SMON cleanup of temporary space
- Set PCTINCREASE equal to zero to stop
cleanup/coalesce - Set temporary tablespaces as TEMPORARY
- SMON in parallel environment
- SMON Cleanup operations are magnified
- Hanging or slow system
- Side effect of many processes on the ST enqueue
57enqueue
HW
- Acquired to move the HW mark
- High volume of inserts across concurrent session
will cause a wait on this contention - Recreate / modify the object with larger extents
- Pre-allocate extents
- ALTER TABLE ALLOCATE EXTENT
- VLOCK.ID1 is the tablespace number.
- VLOCK.ID2 is the relative dba of segment header
of the object for which space is being allocated.
58What is blocking session waiting on?
- Idle Session
- DB File Scattered Reads
- Another session
59Idle Session Scenario
Jim
Sally
- Update customer 147
- Goes to Lunch
Locked trying to update customer 147
Jim will needlessly wait a long time. DBA can
kill Sallys session IF they can tell that the
session is idle.
60Missing Index Scenario
Jim
Sally
- Update customer 147
- Selects from order table with missing index
Locked trying to update customer 147
DBA can tell that Jim is really waiting because
of a missing index on the order table even
though Jim isnt using the order table.
61Idle Session Scenario
Jim
Sally
- Update customer 147
- Selects from order table with missing index
Updated warehouse 22 Locked trying to update
customer 147
Bob
Locked trying to update warehouse 22
A chain of locks occurs even though both locked
users arent accessing the table with missing
indexes
62Wait Events for Development
- Tuning SQL for optimal performance
- Debug/test/integrate/pilot process
- Understand impact on existing database
- Understand Oracle impact on application
performance - View into production for better development
prioritization and feedback - Reduce finger-pointing
63Conclusion
- Conventional Tuning focus on system health and
lead to finger-pointing and confusion - Wait event tuning implemented according to RMM is
the new way to tune - Two RMM-compliant tools types
- Tracing tools
- Continuous DB-wide monitoring tools
- Questions Answers
64Who is Confio?
- Oracle product is Ignite for Oracle, fast
install, free trial at www.confio.com - Organizations who trust Confio to monitor their
most critical applications include
65Thank you for coming
- Matt Larson
- Founder/Chief Technology Officer
- Contact Information
- mattlarson_at_confio.com
- 303-938-8282 ext. 110
- Company website
- www.confio.com