Wait-Time%20Based%20Oracle%20Performance%20Management - PowerPoint PPT Presentation

About This Presentation
Title:

Wait-Time%20Based%20Oracle%20Performance%20Management

Description:

Title: PowerPoint Presentation Author: Don Bergal Created Date: 4/13/2004 3:25:01 PM Document presentation format: On-screen Show Company: Confio Other titles – PowerPoint PPT presentation

Number of Views:278
Avg rating:3.0/5.0
Slides: 66
Provided by: DonB190
Learn more at: http://www.nocoug.org
Category:

less

Transcript and Presenter's Notes

Title: Wait-Time%20Based%20Oracle%20Performance%20Management


1
Wait-Time Based Oracle Performance Management
Prepared for NCOUGPresented by Matt Larson
CTO, Confio Software
2
Who am I?
  • Founder and CTO of database performance software
    company
  • Former DBA consultant specializing in Oracle
    performance tuning
  • Co-author of three Oracle books (Oracle
    Development Unleashed, Oracle Unleashed
  • 2nd Edition, Oracle8 Server Unleashed)
  • Co-author of two other database related books

3
Agenda
  • Foundation
  • Case Study One PL/SQL Issue
  • Case Study Two Full Table Scans
  • Case Study Three Inefficient Indexes
  • Case Study Four Locking Problems
  • QA

4
Working the Wrong Problems
  • After spending an agonizing week tuning Oracle
    buffers to minimize I/O operations, management
    typically rewards you with
  • A. An all expense paid vacation
  • B. A free lunch
  • C. A stale donut
  • D. Reward? Nobody even noticed!

5
Tuning Success (or lack thereof)
  • Your role in the rollout of a new customer facing
    application results in
  • A. Keys to drive the CEOs Porsche
  • B. Keys to use the executive restroom
  • C. A mop to use in the executive restroom
  • D. Your office has been moved to the restroom

6
Conventional Tools Measure System Health
  • Assumption If I make the database healthy,
    users benefit
  • Symptoms
  • DBA finds big problem and fixes it, users
    report no impact
  • Lots of data to review and things to fix, not
    sure which to do first
  • Unclear view of performance leads to
    Finger-pointing

Its your Code!
Its your Database!
IT staff
Developer or vendor
7
RMM Focuses on User Wait-Time
End User
  • Identify each bottleneck affecting the user
  • Rank bottlenecks by user impact
  • Implement proven suggestions
  • Set correct expectations on impact of fix
  • Show proof the fix helped users

8
RMM Confios Underlying Methodology
  • Resource Mapping Methodology Industry
    best-practice optimizing performance tuning for
    maximum business impact
  • Three Key Principles of RMM
  • 1. SQL View All statistics at SQL statement
    level
  • 2. Time View Measure Time, not number of times a
    resource is utilized
  • 3. Full View Separately measure every resource
    to isolate source of problems

9
Illustrating example SQL View Principle
  • Example CEO measuring employee output
  • Averaging over entire company gives no useful
    data
  • Must measure each job separately
  • DBA must manage database similarly
  • Measure and identify bottlenecks for each SQL
    independently

10
Illustrating example Time View Principle
  • Example CEO counting tasks vs. time to
    complete
  • Counting system statistics not meaningful
  • Must measure Time to complete
  • System stats (buffer size, hit ratios, I/O
    counts) do not identify where database customers
    are waiting
  • Identify and optimize Wait Time for each SQL as
    best indicator of performance

11
Illustrating example Full View Principle
  • Example CEO measuring results with blind spot
    hiding key processes
  • Without direct visibility, valuable info is lost
  • Must have visibility to every process step
  • Distinctly identify and measure each Oracle
    resource for each distinct SQL

12
RMM-compliant Performance Tool Types
  • Two Primary Types of Tools
  • Session Specific Tools
  • Tools that focus on one session at a time often
    by tracing the process
  • Examples tkprof (Oracle), OraSRP Profiler (open
    source)
  • Continuous DB Wide Monitoring Tools
  • Tools that focus on all sessions by sampling
    Oracle
  • Example Confio Ignite
  • Both tools have a place in the organization

13
Tracing
  • Tracing with wait events complies with RMM
  • Should be used cautiously in non-batch
    environments due to session statistics skew
  • 80 out of 100 sessions have no locking contention
    issues
  • 20 out of 100 have spent 99 of time waiting for
    locked rows
  • If you trace one of the 80 sessions, it appears
    as if you have no locking issues (and spend time
    trying to tune other items that may not be
    important)
  • If you trace one of the 20 sessions, it appears
    as if you could fix the locking problems and
    reduce your wait time by 99

14
Tracing (cont)
  • Very precise statistics, may be only way to get
    certain statistics
  • Bind variable information is available
  • Different types of tracing available providing
    detail analysis even deeper than wait events
  • Ideal if a known problem is going to occur in the
    future (and known session)
  • Difficult to see trends over time
  • Primary audience is technical user

15
Continuous DB Wide Monitoring Tools
  • 24/7 sampling provides real-time and historical
    perspective
  • Allows DBA to go back in time and retrieve
    information even if problem was not expected
  • Not the level of detail provided by tracing
  • Most of these tools have trend reports that allow
    communication with others outside of the group
  • What is starting to perform poorly?
  • What progress have we made while tuning?

16
Case Study OnePL/SQL Issue
17
Problem Observed
  • Critical situation application performance
    unsatisfactory
  • Response time between 240 and 900 seconds
  • Most times users shutdown application
  • Very high network traffic (3x4x normal),
    indicating time-outs and user refreshes
  • CritSit declared major effort to resolve
    problem

18
Wait Events During Problem
library cache lock
library cache pin
19
Investigation
20
What does RMM tell us?
  • Which SQL CERN_PROFILE
  • Truncate
  • Which Resource library cache pin
  • library cache lock
  • How much time up to 16 Hours of wait time
    per hour

21
Results
  • Found an invalid trigger
  • Insert statement was trying to fire trigger
  • Truncate was locked behind it
  • Response time improvement from 60,000 seconds
    (worst case) to 0 seconds
  • Configured alert to notify DBA when the problem
    starts next time
  • Problem should not occur for 22 hours without
    anyone knowing

22
Case Study Two DB File Scattered Reads
23
Problem Observed
  • Problem Login taking 4 minutes for each user
    everyday they started their day
  • High wait accumulation from 630 830 am
  • 600 Users X 4 Minutes 40 Hours Every Day
  • 40 Hours lost productivity every day
  • Applied RMM approach to problem identification
  • Identify Wait Time, offending SQL, offending
    Resource

24
Wait Events During Problem
25
Investigation
26
What does RMM tell us?
  • Which SQL LoginLookup
  • UpdateInventory
  • Which Resource Scattered Read
  • Buffer Busy Waits
  • How much time 40 Hour
  • Every Day

27
Hypotheses Oracle Interpretations
  • Two Alternative paths for optimization
  • Eliminate Full Table Scan
  • There isnt a need to read the whole table, so we
    need to find the right shortcut
  • Improve response time
  • We need to read most or all of the table anyway,
    so lets just figure out how to do it faster
  • Key Questions
  • Is full table scan necessary?
  • What causes a full table scan for this SQL
    Statement?

28
I. Unnecessary Full Table Scan?
  • Solutions
  • Add / Modify index(es) on the table
  • Update table and/or index statistics if proper
    index not being used
  • Add hint to use existing index
  • Optimize the application

29
Full Table Scan is Needed
  • Two alternative paths for optimization
  • I. Eliminate Full Table Scan
  • There isnt a need to read the whole table, so we
    need to find the right shortcut
  • II. Improve response time
  • We need to read most or all of the table anyway,
    so lets just figure out how to do it faster

30
II. Improve Response Time for Db File Scattered
Reads
  • Solutions
  • Use Parallel Reads
  • Set Database Parameters
  • Improve I/O Speed
  • Optimize the application
  • Larger Database Caches (64-bit)

31
1. Use Parallel Reads Faster FTS
  • Parallel Reads
  • Can be set at the table level (use with caution)
  • Alter table customer parallel degree 4
  • Normally used by hinting in the SQL Statement
  • select / FULL(customer) PARALLEL(customer, 4)
    / customer_namefrom  customer
  • A delicate tradeoff
  • sacrifice the performance of others for the
    running query.
  • Not necessarily efficient, just faster
  • Parallel Reads may actually do twice the work of
    a sequential query but have four workers, thus
    finishing in half the time while using 8x
    resource

32
2. Set database parameters
  • DB_FILE_MULTIBLOCK_READ_COUNT
  • specifies the maximum number of blocks read in
    one I/O operation during a sequential scan
  • Impacts the optimizer
  • Reduces number of I/Os required
  • For OLTP, typically between 4 to 16
  • Optimizer will more likely to FTS if set too high
  • Ensure that the database read requests are synced
    up with the O/S.
  • This gets tricky if different block sizes are
    used in different tablespaces

33
3. Improve I/O speed
  • Get your SA involved
  • Investigate I/O sub-system
  • Iostat, vmstat, sar, for potential problems
  • Monitor during high activity
  • Investigate contention at the disk/controller
    level.  
  • Learn which disks share common resources
  • Use more disks to spread I/O and reduce hot spots
  • Investigate caching on disk sub-system and
    current memory usage

34
4. Optimizing the Application
  • Review application do you have access to code
    for changes?
  • Understand the code around the problem SQL
  • Techniques to Optimize a statement
  • Reduce the number of calls for a SQL
  • Caching the data in the application
  • Creating a summary table (perhaps via a
    materialized view)
  • Eliminating the need for the data
  • Retrieve Less Data with each statement
  • Add fields to the WHERE clause
  • Combine SQLs for fewer calls
  • Combine several SQLs with different bind
    variables into one large statement that retrieves
    all the data in one shot

35
5. Larger Database Caches (64-bit)
  • Larger cache means fewer disk reads
  • May need large increase to have significant
    impact

Performance Gain
of database in memory
36
Results
  • Added indexes to underlying tables
  • Added Materialized View

Full Table Scan Fixed
37
Case Study Three DB File Sequential Reads
38
Problem Observed
  • Data Warehouse loads were taking too long
  • Noticed high wait times on db file sequential
    read wait event
  • DBAs were confused why are data loads reading
    data
  • Applied RMM approach to problem identification
  • Identify Wait Time, offending SQL, offending
    Resource

39
Investigation
Sequential read time
SQL
Sequential read time by object for SQL
40
What does RMM tell us?
  • Which SQL 3 Insert Statements
  • Which Resource DB File Sequential Read
  • How much time 5 hour 90 of wait time

41
Investigating db file sequential reads
  • Often considered a good read
  • DB file sequential reads normally occur during
    index lookups
  • Often a single-block read although it may
    retrieve more than one block.
  • Sequential Read may also be seen for reads from
  • datafile headers
  • rebuilding the control file
  • dumping datafile headers

42
Hypotheses Oracle Interpretations of Sequential
Reads
  • Causes of excessive wait times
  • Reading too many index leaf blocks
  • Not finding block in buffer cache forces disk
    read
  • Slow disk reads
  • Contention for certain blocks
  • High Read time on INSERT statements

43
I. Reading too many index and table blocks (cont)
  • Rebuild Fragmented Indexes
  • alter index rebuild online
  • Compress Indexes
  • alter index rebuild compress
  • Uses more CPU
  • Multi-column indexes
  • Avoid the table lookup
  • Will create a larger index
  • Pre-sort Table data

44
II. Not finding block in buffer cache forces disk
read
  • Db File sequential reads occur because the block
    is not in the buffer cache.
  • How do we make sure more blocks are already in
    the cache?
  • Solutions
  • Increase the size of the buffer cache(s)
  • Put the object in a cache where it is less likely
    to get flushed out

45
III. Slow disk reads
  • With databases, it often comes down to this the
    disk just needs to be faster
  • Put certain objects on the fastest disk
  • O/S file caching using special software that
    makes normal files perform like raw files
  • Increase Storage System Caching such as an EMC
    cache

46
Results
  • Inserts were updating indexes that had low
    cardinality leading columns
  • Reordered columns in the index and got a 50
    performance improvement
  • Log file sync wait event was then the largest
    wait event
  • Data was being committed too often
  • Tuning is an iterative process

47
Case Study Four Enqueue
48
Problem Observed
  • Problem High Wait on CPPFPROD
  • Accumulated wait 9.5 hours (34,000 sec) during
    3.00-4.00am hour
  • End users were complaining loudly
  • Applied RMM approach to problem identification
  • Identify Wait Time, offending SQL, offending
    Resource

49
Investigation Drill down to Top SQL Identify
likely source of Problem
50
enqueue
Causes
  • TX enqueue
  • Locks held for the life of a transaction until a
    COMMIT or ROLLBACK.
  • TM enqueue
  • Locks being held when foreign key constraints are
    not indexed properly.
  • ST enqueue
  • Locks held during dynamic space allocations.
  • HW enqueue
  • Serialization for the allocation of space beyond
    the high water mark

51
enqueue
TX
  • Generally due to application or table setup
    issues 
  • Is acquired when a transaction initiates an
    UPDATE
  • Row is locked by the session
  • Others may select from the row (read consistency)
  • Others wanting to UPDATE same row must wait
  • Lock is held until a COMMIT or ROLLBACK is issued

52
enqueue
TX
  • Waits caused by normal active transactions
  • Just issue a COMMIT or ROLLBACK
  • Determine what the true unit of work is

53
enqueue
TX
  • Waits due to Insufficient 'ITL' slots in a Block
  • The ITL (interested transaction list) is an area
    at the top of each data block where Oracle keeps
    track of which rows are locked by which
    transaction
  • Every transaction wanting to change a block
    requires a slot in the ITL list of the block
  • The number of ITL slots is controlled by
  • INITRANS, initial number of slots at block
    creation
  • MAXTRANS, total allowable slots over time
  • ITL list will expand to allow MAXTRANS only if
    space is available

54
enqueue
TX
  • Waits due to rows being covered by the same
    BITMAP index fragment
  • Bitmap indexes allow one index entry to cover
    many rows within a table
  • If two sessions try to insert or update the same
    key value the second session has to wait

55
enqueue
ST
  • Caused by space management operations
  • Can happen with small extent sizes, allocation of
    temporary segments for sorting
  • May get an ORA-01575 indicating a timeout
  • Unnecessary Sorting
  • Disk sorting requires space management and thus
    contention on the ST enqueue
  • Eliminate as much disk sorting as possible
  • Evaluate SORT_AREA_SIZE or PGA_AGGREGATE_TARGET
    parameters

56
enqueue
ST
  • General TEMP tablespace advice
  • SMON cleanup of temporary space
  • Set PCTINCREASE equal to zero to stop
    cleanup/coalesce
  • Set temporary tablespaces as TEMPORARY
  • SMON in parallel environment
  • SMON Cleanup operations are magnified
  • Hanging or slow system
  • Side effect of many processes on the ST enqueue

57
enqueue
HW
  • Acquired to move the HW mark
  • High volume of inserts across concurrent session
    will cause a wait on this contention
  • Recreate / modify the object with larger extents
  • Pre-allocate extents
  • ALTER TABLE ALLOCATE EXTENT
  • VLOCK.ID1 is the tablespace number.
  • VLOCK.ID2 is the relative dba of segment header
    of the object for which space is being allocated.

58
What is blocking session waiting on?
  • Idle Session
  • DB File Scattered Reads
  • Another session

59
Idle Session Scenario
Jim
Sally
  • Update customer 147
  • Goes to Lunch

Locked trying to update customer 147
Jim will needlessly wait a long time. DBA can
kill Sallys session IF they can tell that the
session is idle.
60
Missing Index Scenario
Jim
Sally
  • Update customer 147
  • Selects from order table with missing index

Locked trying to update customer 147
DBA can tell that Jim is really waiting because
of a missing index on the order table even
though Jim isnt using the order table.
61
Idle Session Scenario
Jim
Sally
  • Update customer 147
  • Selects from order table with missing index

Updated warehouse 22 Locked trying to update
customer 147
Bob
Locked trying to update warehouse 22
A chain of locks occurs even though both locked
users arent accessing the table with missing
indexes
62
Wait Events for Development
  • Tuning SQL for optimal performance
  • Debug/test/integrate/pilot process
  • Understand impact on existing database
  • Understand Oracle impact on application
    performance
  • View into production for better development
    prioritization and feedback
  • Reduce finger-pointing

63
Conclusion
  • Conventional Tuning focus on system health and
    lead to finger-pointing and confusion
  • Wait event tuning implemented according to RMM is
    the new way to tune
  • Two RMM-compliant tools types
  • Tracing tools
  • Continuous DB-wide monitoring tools
  • Questions Answers

64
Who is Confio?
  • Oracle product is Ignite for Oracle, fast
    install, free trial at www.confio.com
  • Organizations who trust Confio to monitor their
    most critical applications include

65
Thank you for coming
  • Matt Larson
  • Founder/Chief Technology Officer
  • Contact Information
  • mattlarson_at_confio.com
  • 303-938-8282 ext. 110
  • Company website
  • www.confio.com
Write a Comment
User Comments (0)
About PowerShow.com