Title: Title: Arial 28pt.
1ASE101 Building a Real-Time Database Performance
Repository Using the Sybase MDA Tables
Jeffrey Wong Consultant, Anton Ventures wongjtf_at_ao
l.com August 15-19, 2004
2Building a Real-Time Database Performance
Repository Using the Sybase MDA Tables
- Overview
- Limitations of other database performance
monitoring approaches - Limitations of this approach to database
performance monitoring - Typical uses for the real-time (RT) MDA database
system - Developmental timeline
- Architecture of the RT MDA database system
- Barrier synchronized time base and surrogate key
equivalence - Fast insertion mechanism for capturing MDA table
output - 2NF normalization and surrogate key substitution
of foreign keys - Stored procedure generation from specification
files - Restart of capture or archiving DML after
deadlock victim termination - Visualization of archived MDA performance data
- Visualization examples (3)
- Futures
- References
- Questions and answers
3Building a Real-Time Database Performance
Repository Using the Sybase MDA Tables
- Limitations of other database performance
monitoring approaches - Network Packet Sniffing Limited to
request-response measurements. No information on
Sybase internals efficiency - Sampling via Separate Monitoring Process
Separate process introduces latency effects. Can
be highly invasive for short sampling periods
(typically, anything less than 10 seconds). IPC
(inter process communication) overheads can be
significant. Typically, no synchronized time base
for collected samples - System Performance Report Typically, output is
not in data-mining-friendly (i.e. tabular)
format. Highly invasive for any sampling period
less than two minutes - Ad-hoc Measurements May not be monitoring the
correct indicators. Very heavily dependent on the
analysts skill level
4Building a Real-Time Database Performance
Repository Using the Sybase MDA Tables
- Limitations of this approach to database
performance monitoring - MDA Table Queries Hang at High CPU Utilization
Levels This is a current limitation of the CIS
proxy table mechanism upon which the MDA tables
are based - Specialized Resource Needs Ideally, both tempdb
and the RT data capture database need to be on
RAM disk, with the RT data capture database also
requiring battery backup of its RAM disk. Next
best would be configuring a named cache for
tempdb and RAM disk hosting for the tempdb
transaction log - Lower Limit to Sampling Granularity Due to
various technical considerations (including
excessive resource utilization of CPU and disk),
the current lower limit for the sampling
periodicity is 5 seconds (c.f. MDA table minimum
sampling periodicity of 1 second) - Counter Wraparound Steady-state averaging
technique was employed to estimate a counter
value where that counter value had overflowed and
wrapped around. This may or may not always be the
correct technique to use
5Building a Real-Time Database Performance
Repository Using the Sybase MDA Tables
- Typical uses for the real-time (RT) MDA database
system - System Instability Root Cause Analysis The RT
MDA database system was first deployed on a
warm-standby replication reporting subsystem
which had been experiencing intermittent
stability issues. The two root causes of system
instability were rapidly identified using this
analysis system - Capacity Planning and System Performance
Monitoring Actual resource utilization of a
business intelligence and data mining vendor
application was quantified by the RT MDA database
system, prior to its introduction on a 7x24 OLTP
production system. The new application was
subsequently monitored by the RT MDA database
system to certify that it exhibited appropriate
resource utilization in production - System Performance Characterization System
performance characteristics of 8 7x24 OLTP
production systems are being collected by the RT
MDA database system, prior to these 8 systems
being consolidated on a 4 node cluster hardware
system. The data collected by the RT MDA database
system will be used to appropriately size and
configure HA resource groups for that cluster
6Building a Real-Time Database Performance
Repository Using the Sybase MDA Tables
- Developmental Timeline
- April, 2003 Release of ASE 12.5.0.3 ESD 1 (MDA
Tables release). Commenced private RD into MDA
Tables functionality - July, 2003 Commenced private development of RT
MDA database system V1.0. This version of RT was
single channel with a capture database and time
base / surrogate key generator - September, 2003 Deployment of RT MDA V1.0 to a
production site (first commercial use). Commenced
private development of RT MDA V1.1 Multi-channel
for both capture and archiving databases, bit-map
barrier synchronization for both capture and
archiving, monitoring profiles, and
capture/archiving stored procedure generation
from specification templates - January, 2004 Deployment of RT MDA V1.1 to a
production site - March, 2004 Commenced joint development of RT
MDA V1.2 (visualization and capacity
planning/system performance characterization) - April, 2004 Deployment of RT MDA V1.2
(visualization) to a production site - June, 2004 Deployment of RT MDA V1.2 (capacity
planning/system performance characterization) to
a production site
7Building a Real-Time Database Performance
Repository Using the Sybase MDA Tables
- Architecture of the RT MDA database system
8Building a Real-Time Database Performance
Repository Using the Sybase MDA Tables
- Architecture of the RT MDA database system
(continued) - Transitions Description
- Transition A Parallel capture of MDA table data
synchronized on a standardized time base / sample
number identification pair - Transitions B, C, D Capture stored procedures
perform parallel inserts into the capture
database. Tempdb is used to perform fast
enumeration of capture data, such that for any
capture data set, each row contains a
monotonically increasing unique identifier. Each
capture stored procedure schedules itself in a
sleep-retry loop, if there is no data to process - Transition E Parallel archiving and purge of
captured MDA table data synchronized on user
configured archiving and purging time intervals - Transitions F, G, H, I Archive / purge stored
procedures perform parallel inserts into the
archive database when archiving, and parallel
deletes into the capture and archive databases
when purging. Tempdb is used to perform fast
enumeration of filtered capture data, such that
for any archive data set, each row contains a
monotonically increasing unique identifier. Each
archive / purge stored procedure schedules itself
in a sleep-retry loop, if there is no data to
process
9Building a Real-Time Database Performance
Repository Using the Sybase MDA Tables
- Architecture of the RT MDA database system
(continued) - Transitions Description (continued)
- Transitions J, K Time base trigger fires for
each generated time base / sample number
identification pair stored in the capture
database, thereby ensuring that the time base
history tables in the capture and archive
databases are kept completely in sync - Transition L Every day at 1 AM, a batch job is
executed that encapsulates the previous days
archive data into a set of time-stamped,
time-series visualization tables and views - Transitions M, N Data mining is performed on
extracted time-series visualization data at the
operations analysts convenience. Currently,
unsophisticated visualization is carried out
using Microsoft Excel (using its in-built
charting module). It is intended in the future to
extend the scope of the visualization to
encompass other OLAP/BI tools (such as Impromptu)
10Building a Real-Time Database Performance
Repository Using the Sybase MDA Tables
- Barrier Synchronized Time Base and Surrogate Key
Equivalence - Barrier synchronization can handle up to 64
parallel channels - Most capture barrier synchronization is at one
hierarchical level only, except for process
monitoring which is at two hierarchical levels
(this was done to enforce strict correlation
between MDA process monitoring tables, albeit at
the expense of introducing a one second time skew
between sampling the process lookup table, and
sampling the other process monitoring tables) - The time base generation is also barrier
synchronized, and is currently set for 15 second
sampling intervals - At the instant the current time base is
synthesized, a monotonically increasing, unique
surrogate key (identifier) is assigned to that
time base. This surrogate key is the sample
number for this particular parallel sampling
activity - Also at the instant the current time base is
synthesized, it, along with the sample number,
initial time base indicator and sampling
interval, are written to the capture time base
history table (NB This table forms the
time-series frame of reference for the parallel
capture activities of the RT MDA database system)
11Building a Real-Time Database Performance
Repository Using the Sybase MDA Tables
- Fast Insertion Mechanism for Capturing MDA Table
Output - The quantity of data output from MDA tables in a
short period of time can be quite significant.
For example, a trial deployment of RT MDA V1.0
filled up 3 GB of disk in only one hour. There
was only moderate activity on this target system,
sampling intervals were 15 seconds, and only 4
MDA tables were being monitored - Consequently, the following techniques were
employed to ensure that capture database table
insertion was as fast as possible - Most RT MDA tables are defined to use allpage
locking, with SQL command retry used to recover
from non-clustered index deadlocks - All DML to the capture or archive databases is
done through highly-optimized (TransactSQL)
stored procedures - Surrogate key updating of the control table is
done once, at the end of the set insert composite
transaction - Minimally logged insert commands (select into)
are used to insert data into the tempdb staging
tables - Set insert commands are used to insert the
staging table rows into the respective capture
database tables - Tempdb has been configured on each database
server that is being monitored, as having both a
bound named data cache and a transaction log
hosted on RAM disk
12Building a Real-Time Database Performance
Repository Using the Sybase MDA Tables
- 2NF Normalization and Surrogate Key Substitution
of Foreign Keys - As discussed earlier, the volume of information
that needs to be stored when building a database
repository from sampled MDA table data, can be
significant - Hence, for V1.1 of the RT MDA database system, an
archive database was introduced, along with a
parallel time base synchronization mechanism - This archive database contains 2NF versions of
the data tables of the capture database, along
with associated domain tables - The domain tables contain the original column
data as captured from the MDA tables, along with
associated surrogate keys - Part of the parallel archiving process involves
substitution of the original MDA column data with
the associated surrogate key. These surrogate
keys now act as foreign keys for the 2NF data
tables of the archive database - The space savings accomplished by the above
strategy are substantial. Currently, RT MDA V1.1
can handle 7x24 capture and archiving with up to
30 parallel channels on a 15 second cyclical
sampling duration and a rotating 8 day archiving
window with minimal invasiveness to the host OLTP
system and with modest disk needs (3GB capture
db, 6GB archive db)
13Building a Real-Time Database Performance
Repository Using the Sybase MDA Tables
- Stored Procedure Generation from Specification
Files - Given the performance and non-invasive
requirements for the RT MDA database system, it
is mandatory that the capture, archive,
normalization and purge operations be performed
via stored procedure executions - However, the tedious and error-prone nature with
manually encoding these stored procedures
necessitated a more accurate and efficient
process for generating these database objects - Thus initially, the manual encoding process was
performed for a representative MDA table, and for
capture, archive, normalization and purge
activities - Then, the finished set of four stored procedures
was parameterized and encoded as three perl
template programs (NB archive and normalization
activities were merged together) - Three configuration files were then constructed,
with each non-blank line of the files containing
the parameter information for a particular stored
procedure
14Building a Real-Time Database Performance
Repository Using the Sybase MDA Tables
- Stored Procedure Generation from Specification
Files (continued) - The above mentioned perl programs were next run
with the respective configuration files, thereby
producing the deployment source code for the RT
MDA database system stored procedures - Installation and unit testing of the generated
stored procedures was then carried out on a
development system - Finally, the tested stored procedures were
deployed on target production database servers
and carefully monitored for possible malfunction
15Building a Real-Time Database Performance
Repository Using the Sybase MDA Tables
- Restart of Capture or Archiving DML After
Deadlock Victim Termination - Given that most tables in the RT MDA capture
database are of allpage locking (for performance
reasons), the inclusion of deadlock retry logic
is mandatory due to the distinct possibility of
non-clustered index deadlocks occurring during
normal operation - Initial implementation involved using a C or
sybperl program per channel to re-submit the RT
MDA stored procedure for execution automatically
upon detection of the deadlock. However, this
approach was abandoned due to logistic reasons
(i.e. installing system software on multiple
production systems requires significant
commitment from outside groups for any large
organization) - Current implementation involves a perl program
per channel performing re-submittal of a shell
script for execution automatically upon detection
of the deadlock. The shell script in turn calls
up the RT MDA stored procedure via isql - Future plans involve adapting Jean-Paul Martins
asemon-logger Java program to provide the same
functionality that was initially implemented to
handle the non-clustered deadlock issue
16Building a Real-Time Database Performance
Repository Using the Sybase MDA Tables
- Visualization of Archived MDA Performance Data
- Current methodology is simply applying Microsoft
Excels charting module to RT MDA archive data
extracted using custom visualization views - Major limitation of this approach is that there
is no easy way to prevent the charting module
from auto scaling the chart axes (especially the
Y-axis), thus distorting the relevancy of the
visualization - Advantages of this approach include easy
auto-filtering of column data, easy chart type
composition, and fast charting performance (when
the chart data is taken from a flat file rather
than dynamically through a database connection) - A recent discovery has been the existence of
Jean-Paul Martins asemon Java GUI program for
display of MDA table data in a tabular format.
Consequently, the current focus for visualization
has been temporarily shifted to adaptation of
this program to access RT MDA capture and archive
databases (rather than further development of the
Microsoft Excel approach) - Three visualization examples have been included
in this presentation to show the effectiveness of
the Microsoft Excel approach to date
17Building a Real-Time Database Performance
Repository Using the Sybase MDA Tables
- Visualization Example 1 (disk activity)
18Building a Real-Time Database Performance
Repository Using the Sybase MDA Tables
- Visualization Example 2 (engine activity)
19Building a Real-Time Database Performance
Repository Using the Sybase MDA Tables
- Visualization Example 3 (buffer pool activity)
20Building a Real-Time Database Performance
Repository Using the Sybase MDA Tables
- Visualization Example 3 (buffer pool activity,
continued)
21Building a Real-Time Database Performance
Repository Using the Sybase MDA Tables
- Futures (in no particular order)
- Adaptation of the RT MDA database system to
utilize other data mining/BI GUI tools (such as
Impromptu) - Adaptation of Jean-Paul Martins asemon Java GUI
program to visualize data contained in both the
RT MDA capture and archive databases - Adaptation of Jean-Paul Martins asemon-logger
Java program to utilize the barrier
synchronization and capture/archive stored
procedures of the RT MDA database system - Implementation of missing value simulation and
time axis normalization in the RT MDA archive
database - Design and implementation of event notification
triggers in the RT MDA capture database. It is
intended to integrate these notification triggers
into the resource group failover mechanism of a
well-known hardware vendors cluster computing
system - Integration of Replication Server into the RT MDA
database, so that the RT MDA archive,
normalization and purge operations can be handled
by DSI invocation of custom function strings - Integration of Replication Server 12.5
performance counters into the RT MDA database
system
22Building a Real-Time Database Performance
Repository Using the Sybase MDA Tables
- References
- Whats new in ASE 12.5.0.3 Sybase Incorporated
(the original reference for MDA Tables
functionality) - http//www.sypron.nl (Rob Verschoors Web site
more information of MDA Tables setup and usage) - http//www.sybase.com/developer/codexchange
(source code for Jean-Paul Martins asemon and
David Owens sybmon Java GUI monitoring programs
are available at this Web site) - Tips, Tricks Recipes for Sybase ASE Rob
Verschoor (where some components of the fast
insertion mechanism for MDA table data capture
were initially discussed) - Predictive Data Mining, A Practical Guide
Sholom M. Weiss, Nitin Indurkhya (background
information on data mining techniques) - Data Mining and Statistical Analysis Using SQL
Robert P. Trueblood, John N. Lovett, Jr. (data
mining statistical analysis queries using
Transact-SQL)
23Building a Real-Time Database Performance
Repository Using the Sybase MDA Tables
- Questions and Answers
- We have 10 minutes for QA. Please feel free to
ask. Thank you - I will defer the difficult questions for when the
10 minute period has expired (your schedule
permitting) - Any questions that I cannot answer I will try to
do so later via email, time permitting. Please be
so kind as to leave me your email address in this
case - My email address is wongjtf_at_aol.com. If you still
have questions regarding this presentation,
kindly send me a note and I will respond as soon
as I can. Thanks
24Building a Real-Time Database Performance
Repository Using the Sybase MDA Tables
Jeffrey Wong Consultant, Anton Ventures wongjtf_at_ao
l.com August 17, 2004