Wait-Time%20Based%20Oracle%20Performance%20Management - PowerPoint PPT Presentation

About This Presentation

Title:

Wait-Time%20Based%20Oracle%20Performance%20Management

Description:

Title: PowerPoint Presentation Author: Don Bergal Created Date: 4/13/2004 3:25:01 PM Document presentation format: On-screen Show Company: Confio Other titles – PowerPoint PPT presentation

Number of Views:282

Avg rating:3.0/5.0

Slides: 66

Provided by: DonB190

Learn more at: http://www.nocoug.org

Category:

more less

Transcript and Presenter's Notes

Title: Wait-Time%20Based%20Oracle%20Performance%20Management

1
Wait-Time Based Oracle Performance Management
Prepared for NCOUGPresented by Matt Larson
CTO, Confio Software
2
Who am I?

Founder and CTO of database performance software
company
Former DBA consultant specializing in Oracle
performance tuning
Co-author of three Oracle books (Oracle
Development Unleashed, Oracle Unleashed
2nd Edition, Oracle8 Server Unleashed)
Co-author of two other database related books

3
Agenda

Foundation
Case Study One PL/SQL Issue
Case Study Two Full Table Scans
Case Study Three Inefficient Indexes
Case Study Four Locking Problems
QA

4
Working the Wrong Problems

After spending an agonizing week tuning Oracle
buffers to minimize I/O operations, management
typically rewards you with
A. An all expense paid vacation
B. A free lunch
C. A stale donut
D. Reward? Nobody even noticed!

5
Tuning Success (or lack thereof)

Your role in the rollout of a new customer facing
application results in
A. Keys to drive the CEOs Porsche
B. Keys to use the executive restroom
C. A mop to use in the executive restroom
D. Your office has been moved to the restroom

6
Conventional Tools Measure System Health

Assumption If I make the database healthy,
users benefit
Symptoms
DBA finds big problem and fixes it, users
report no impact
Lots of data to review and things to fix, not
sure which to do first
Unclear view of performance leads to
Finger-pointing

Its your Code!
Its your Database!
IT staff
Developer or vendor
7
RMM Focuses on User Wait-Time
End User

Identify each bottleneck affecting the user
Rank bottlenecks by user impact
Implement proven suggestions
Set correct expectations on impact of fix
Show proof the fix helped users

8
RMM Confios Underlying Methodology

Resource Mapping Methodology Industry
best-practice optimizing performance tuning for
maximum business impact
Three Key Principles of RMM
1. SQL View All statistics at SQL statement
level
2. Time View Measure Time, not number of times a
resource is utilized
3. Full View Separately measure every resource
to isolate source of problems

9
Illustrating example SQL View Principle

Example CEO measuring employee output
Averaging over entire company gives no useful
data
Must measure each job separately
DBA must manage database similarly
Measure and identify bottlenecks for each SQL
independently

10
Illustrating example Time View Principle

Example CEO counting tasks vs. time to
complete
Counting system statistics not meaningful
Must measure Time to complete
System stats (buffer size, hit ratios, I/O
counts) do not identify where database customers
are waiting
Identify and optimize Wait Time for each SQL as
best indicator of performance

11
Illustrating example Full View Principle

Example CEO measuring results with blind spot
hiding key processes
Without direct visibility, valuable info is lost
Must have visibility to every process step
Distinctly identify and measure each Oracle
resource for each distinct SQL

12
RMM-compliant Performance Tool Types

Two Primary Types of Tools
Session Specific Tools
Tools that focus on one session at a time often
by tracing the process
Examples tkprof (Oracle), OraSRP Profiler (open
source)
Continuous DB Wide Monitoring Tools
Tools that focus on all sessions by sampling
Oracle
Example Confio Ignite
Both tools have a place in the organization

13
Tracing

Tracing with wait events complies with RMM
Should be used cautiously in non-batch
environments due to session statistics skew
80 out of 100 sessions have no locking contention
issues
20 out of 100 have spent 99 of time waiting for
locked rows
If you trace one of the 80 sessions, it appears
as if you have no locking issues (and spend time
trying to tune other items that may not be
important)
If you trace one of the 20 sessions, it appears
as if you could fix the locking problems and
reduce your wait time by 99

14
Tracing (cont)

Very precise statistics, may be only way to get
certain statistics
Bind variable information is available
Different types of tracing available providing
detail analysis even deeper than wait events
Ideal if a known problem is going to occur in the
future (and known session)
Difficult to see trends over time
Primary audience is technical user

15
Continuous DB Wide Monitoring Tools

24/7 sampling provides real-time and historical
perspective
Allows DBA to go back in time and retrieve
information even if problem was not expected
Not the level of detail provided by tracing
Most of these tools have trend reports that allow
communication with others outside of the group
What is starting to perform poorly?
What progress have we made while tuning?

16
Case Study OnePL/SQL Issue
17
Problem Observed

Critical situation application performance
unsatisfactory
Response time between 240 and 900 seconds
Most times users shutdown application
Very high network traffic (3x4x normal),
indicating time-outs and user refreshes
CritSit declared major effort to resolve
problem

18
Wait Events During Problem
library cache lock
library cache pin
19
Investigation
20
What does RMM tell us?

Which SQL CERN_PROFILE
Truncate
Which Resource library cache pin
library cache lock
How much time up to 16 Hours of wait time
per hour

21
Results

Found an invalid trigger
Insert statement was trying to fire trigger
Truncate was locked behind it
Response time improvement from 60,000 seconds
(worst case) to 0 seconds
Configured alert to notify DBA when the problem
starts next time
Problem should not occur for 22 hours without
anyone knowing

22
Case Study Two DB File Scattered Reads
23
Problem Observed

Problem Login taking 4 minutes for each user
everyday they started their day
High wait accumulation from 630 830 am
600 Users X 4 Minutes 40 Hours Every Day
40 Hours lost productivity every day
Applied RMM approach to problem identification
Identify Wait Time, offending SQL, offending
Resource

24
Wait Events During Problem
25
Investigation
26
What does RMM tell us?

Which SQL LoginLookup
UpdateInventory
Which Resource Scattered Read
Buffer Busy Waits
How much time 40 Hour
Every Day

27
Hypotheses Oracle Interpretations

Two Alternative paths for optimization
Eliminate Full Table Scan
There isnt a need to read the whole table, so we
need to find the right shortcut
Improve response time
We need to read most or all of the table anyway,
so lets just figure out how to do it faster
Key Questions
Is full table scan necessary?
What causes a full table scan for this SQL
Statement?

28
I. Unnecessary Full Table Scan?

Solutions
Add / Modify index(es) on the table
Update table and/or index statistics if proper
index not being used
Add hint to use existing index
Optimize the application

29
Full Table Scan is Needed

Two alternative paths for optimization
I. Eliminate Full Table Scan
There isnt a need to read the whole table, so we
need to find the right shortcut
II. Improve response time
We need to read most or all of the table anyway,
so lets just figure out how to do it faster

30
II. Improve Response Time for Db File Scattered
Reads

Solutions
Use Parallel Reads
Set Database Parameters
Improve I/O Speed
Optimize the application
Larger Database Caches (64-bit)

31
1. Use Parallel Reads Faster FTS

Parallel Reads
Can be set at the table level (use with caution)
Alter table customer parallel degree 4
Normally used by hinting in the SQL Statement
select / FULL(customer) PARALLEL(customer, 4)
/ customer_namefrom customer
A delicate tradeoff
sacrifice the performance of others for the
running query.
Not necessarily efficient, just faster
Parallel Reads may actually do twice the work of
a sequential query but have four workers, thus
finishing in half the time while using 8x
resource

32
2. Set database parameters

DB_FILE_MULTIBLOCK_READ_COUNT
specifies the maximum number of blocks read in
one I/O operation during a sequential scan
Impacts the optimizer
Reduces number of I/Os required
For OLTP, typically between 4 to 16
Optimizer will more likely to FTS if set too high
Ensure that the database read requests are synced
up with the O/S.
This gets tricky if different block sizes are
used in different tablespaces

33
3. Improve I/O speed

Get your SA involved
Investigate I/O sub-system
Iostat, vmstat, sar, for potential problems
Monitor during high activity
Investigate contention at the disk/controller
level.
Learn which disks share common resources
Use more disks to spread I/O and reduce hot spots
Investigate caching on disk sub-system and
current memory usage

34
4. Optimizing the Application

Review application do you have access to code
for changes?
Understand the code around the problem SQL
Techniques to Optimize a statement
Reduce the number of calls for a SQL
Caching the data in the application
Creating a summary table (perhaps via a
materialized view)
Eliminating the need for the data
Retrieve Less Data with each statement
Add fields to the WHERE clause
Combine SQLs for fewer calls
Combine several SQLs with different bind
variables into one large statement that retrieves
all the data in one shot

35
5. Larger Database Caches (64-bit)

Larger cache means fewer disk reads
May need large increase to have significant
impact

Performance Gain
of database in memory
36
Results

Added indexes to underlying tables
Added Materialized View

Full Table Scan Fixed
37
Case Study Three DB File Sequential Reads
38
Problem Observed

Data Warehouse loads were taking too long
Noticed high wait times on db file sequential
read wait event
DBAs were confused why are data loads reading
data
Applied RMM approach to problem identification
Identify Wait Time, offending SQL, offending
Resource

39
Investigation
Sequential read time
SQL
Sequential read time by object for SQL
40
What does RMM tell us?

Which SQL 3 Insert Statements
Which Resource DB File Sequential Read
How much time 5 hour 90 of wait time

41
Investigating db file sequential reads

Often considered a good read
DB file sequential reads normally occur during
index lookups
Often a single-block read although it may
retrieve more than one block.
Sequential Read may also be seen for reads from
datafile headers
rebuilding the control file
dumping datafile headers

42
Hypotheses Oracle Interpretations of Sequential
Reads

Causes of excessive wait times
Reading too many index leaf blocks
Not finding block in buffer cache forces disk
read
Slow disk reads
Contention for certain blocks
High Read time on INSERT statements

43
I. Reading too many index and table blocks (cont)

Rebuild Fragmented Indexes
alter index rebuild online
Compress Indexes
alter index rebuild compress
Uses more CPU
Multi-column indexes
Avoid the table lookup
Will create a larger index
Pre-sort Table data

44
II. Not finding block in buffer cache forces disk
read

Db File sequential reads occur because the block
is not in the buffer cache.
How do we make sure more blocks are already in
the cache?
Solutions
Increase the size of the buffer cache(s)
Put the object in a cache where it is less likely
to get flushed out

45
III. Slow disk reads

With databases, it often comes down to this the
disk just needs to be faster
Put certain objects on the fastest disk
O/S file caching using special software that
makes normal files perform like raw files
Increase Storage System Caching such as an EMC
cache

46
Results

Inserts were updating indexes that had low
cardinality leading columns
Reordered columns in the index and got a 50
performance improvement
Log file sync wait event was then the largest
wait event
Data was being committed too often
Tuning is an iterative process

47
Case Study Four Enqueue
48
Problem Observed

Problem High Wait on CPPFPROD
Accumulated wait 9.5 hours (34,000 sec) during
3.00-4.00am hour
End users were complaining loudly
Applied RMM approach to problem identification
Identify Wait Time, offending SQL, offending
Resource

49
Investigation Drill down to Top SQL Identify
likely source of Problem
50
enqueue
Causes

TX enqueue
Locks held for the life of a transaction until a
COMMIT or ROLLBACK.
TM enqueue
Locks being held when foreign key constraints are
not indexed properly.
ST enqueue
Locks held during dynamic space allocations.
HW enqueue
Serialization for the allocation of space beyond
the high water mark

51
enqueue
TX

Generally due to application or table setup
issues
Is acquired when a transaction initiates an
UPDATE
Row is locked by the session
Others may select from the row (read consistency)
Others wanting to UPDATE same row must wait
Lock is held until a COMMIT or ROLLBACK is issued

52
enqueue
TX

Waits caused by normal active transactions
Just issue a COMMIT or ROLLBACK
Determine what the true unit of work is

53
enqueue
TX

Waits due to Insufficient 'ITL' slots in a Block
The ITL (interested transaction list) is an area
at the top of each data block where Oracle keeps
track of which rows are locked by which
transaction
Every transaction wanting to change a block
requires a slot in the ITL list of the block
The number of ITL slots is controlled by
INITRANS, initial number of slots at block
creation
MAXTRANS, total allowable slots over time
ITL list will expand to allow MAXTRANS only if
space is available

54
enqueue
TX

Waits due to rows being covered by the same
BITMAP index fragment
Bitmap indexes allow one index entry to cover
many rows within a table
If two sessions try to insert or update the same
key value the second session has to wait

55
enqueue
ST

Caused by space management operations
Can happen with small extent sizes, allocation of
temporary segments for sorting
May get an ORA-01575 indicating a timeout
Unnecessary Sorting
Disk sorting requires space management and thus
contention on the ST enqueue
Eliminate as much disk sorting as possible
Evaluate SORT_AREA_SIZE or PGA_AGGREGATE_TARGET
parameters

56
enqueue
ST

General TEMP tablespace advice
SMON cleanup of temporary space
Set PCTINCREASE equal to zero to stop
cleanup/coalesce
Set temporary tablespaces as TEMPORARY
SMON in parallel environment
SMON Cleanup operations are magnified
Hanging or slow system
Side effect of many processes on the ST enqueue

57
enqueue
HW

Acquired to move the HW mark
High volume of inserts across concurrent session
will cause a wait on this contention
Recreate / modify the object with larger extents
Pre-allocate extents
ALTER TABLE ALLOCATE EXTENT
VLOCK.ID1 is the tablespace number.
VLOCK.ID2 is the relative dba of segment header
of the object for which space is being allocated.

58
What is blocking session waiting on?

Idle Session
DB File Scattered Reads
Another session

59
Idle Session Scenario
Jim
Sally

Update customer 147
Goes to Lunch

Locked trying to update customer 147
Jim will needlessly wait a long time. DBA can
kill Sallys session IF they can tell that the
session is idle.
60
Missing Index Scenario
Jim
Sally

Update customer 147
Selects from order table with missing index

Locked trying to update customer 147
DBA can tell that Jim is really waiting because
of a missing index on the order table even
though Jim isnt using the order table.
61
Idle Session Scenario
Jim
Sally

Update customer 147
Selects from order table with missing index

Updated warehouse 22 Locked trying to update
customer 147
Bob
Locked trying to update warehouse 22
A chain of locks occurs even though both locked
users arent accessing the table with missing
indexes
62
Wait Events for Development

Tuning SQL for optimal performance
Debug/test/integrate/pilot process
Understand impact on existing database
Understand Oracle impact on application
performance
View into production for better development
prioritization and feedback
Reduce finger-pointing

63
Conclusion

Conventional Tuning focus on system health and
lead to finger-pointing and confusion
Wait event tuning implemented according to RMM is
the new way to tune
Two RMM-compliant tools types
Tracing tools
Continuous DB-wide monitoring tools
Questions Answers

64
Who is Confio?

Oracle product is Ignite for Oracle, fast
install, free trial at www.confio.com
Organizations who trust Confio to monitor their
most critical applications include

65
Thank you for coming