Title: Performance Management Framework
1Performance Management Framework
2Topics
- Overview: What is it? Why have it? Where has it been? Where does it fit?
- Fundamentals: What are the underlying principles and metrics?
- Architecture: Data collection (Observation, Analysis, Action)
- Instrumentation: How are performance measurements collected?
- Analysis and Reporting: How are events analyzed? What are the reporting patterns?
- Alerts, Leveraging ROC Concepts and Next Steps
3Overview - What is it?
- Performance Engineering and Management is
the ability to ensure that applications are
designed to meet their response time and
throughput requirements, and that they continue
to meet them once in production.
4Fundamentals
Concepts and Metrics
[Diagram labels: Infrastructure, Net. Facilities, Net. Components, Resource, Application, DB, OS, Methods, Load/Usage, Users, Performance, Quality, Response Time, Stability]
5Instrumentation
- Measurements
- No Load (SoftProbes)
- Intrusive vs. Non Intrusive
- Passive vs. Active
- Data Mining
- Application Logs
- System Logs
6Architecture
- Throughout the development lifecycle, performance
statistics are asynchronously collected, analyzed
and used to influence design and implementation
decisions.
Training (Alert Rules)
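The asynchronous collection described on this slide can be sketched as a queue drained by a background worker, so that the instrumented application only pays the cost of an enqueue. This is a minimal sketch, not the framework's actual design: the class name `AsyncCollector`, its methods, and the in-memory `repository` list are all hypothetical.

```python
import queue
import threading


class AsyncCollector:
    """Hypothetical collector: callers enqueue events, a worker stores them."""

    def __init__(self):
        self._queue = queue.Queue()
        self.repository = []  # stand-in for the performance repository
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, event):
        # The caller only places the event on the queue and returns.
        self._queue.put(event)

    def _drain(self):
        # Runs on the worker thread until the stop sentinel arrives.
        while True:
            event = self._queue.get()
            if event is None:
                break
            self.repository.append(event)

    def close(self):
        # Queue the sentinel behind any pending events, then wait for the worker.
        self._queue.put(None)
        self._worker.join()
```

Calling `close()` flushes every event recorded before it, because the sentinel is queued behind them.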
7Instrumentation
Intrusive Application Instrumentation
(Performance Framework)
- GUI
- Business context, user, workstation, method, class
- Application rules
- Specific method invocations
- 3rd party/calls to external modules and/or systems
- Data base access
- Selects, inserts, updates, deletes, stored procedures, DDL commands
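Intrusive instrumentation of specific method invocations can be illustrated with a wrapper that times each call and records the outcome. This is a sketch under stated assumptions: the decorator `instrumented`, the `EVENTS` list, and the sample function `select_account` are hypothetical names, not part of the Performance Framework.

```python
import functools
import time

EVENTS = []  # hypothetical in-memory stand-in for the performance repository


def instrumented(layer, session_id="unknown"):
    """Wrap a function so every invocation is recorded as one event."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            begin = time.time()            # wall-clock begin timestamp
            started = time.perf_counter()  # monotonic clock for latency
            error = None
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                error = type(exc).__name__
                raise  # the caller still sees the original exception
            finally:
                latency_ms = (time.perf_counter() - started) * 1000.0
                EVENTS.append({
                    "layer": layer,
                    "method": func.__name__,
                    "session_id": session_id,
                    "begin": begin,
                    "latency_ms": latency_ms,
                    "error": error,
                })
        return wrapper
    return decorator


@instrumented(layer="database")
def select_account(account_id):
    # Placeholder for a real data base access.
    return {"id": account_id}
```

The event is appended in the `finally` clause, so failed invocations are recorded as well, which is what a quality measurement needs.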
8Instrumentation
Performance Framework Class Structure
(Highlighting Alert Pattern Classes)
9Performance Repository
Event attributes: Begin timestamp, End timestamp, Latency (ms), Session_id
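One way to picture a repository record with these attributes is a small immutable type. This is only a sketch: the class name, the field types, and the choice to derive latency from the two timestamps (rather than store it) are assumptions, and timestamps are assumed to be in seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerformanceEvent:
    begin_timestamp: float  # seconds since the epoch (assumed unit)
    end_timestamp: float    # seconds since the epoch (assumed unit)
    session_id: str

    @property
    def latency_ms(self) -> float:
        # Latency in milliseconds, derived from the two timestamps.
        return (self.end_timestamp - self.begin_timestamp) * 1000.0
```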
10Performance Framework Benchmark Stats
Response time: Rt_1 (felt by application), Rt_2 (internal posting)
11Analysis and Reports
12Analysis and Reports
Transaction Summary (10 bucket report)
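The report itself is not reproduced here. As an illustration of the general idea, latencies can be counted into ten buckets, with the last bucket left open-ended. The bucket width of 100 ms and the function name are assumptions; the original report's bucket boundaries are not stated.

```python
def ten_bucket_report(latencies_ms, bucket_width_ms=100.0):
    """Count non-negative latencies in 10 buckets; the last one is open-ended."""
    counts = [0] * 10
    for latency in latencies_ms:
        # Anything at or beyond 9 bucket widths lands in the final bucket.
        index = min(int(latency // bucket_width_ms), 9)
        counts[index] += 1
    return counts
```

For example, `ten_bucket_report([5, 99, 100, 250, 5000])` returns `[2, 1, 1, 0, 0, 0, 0, 0, 0, 1]`.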
13Analysis and Reports
If you cannot see the problem, you cannot fix it!
Arrival Rates and Response Times
Arrival Rate and Concurrency
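The charts for this slide are not reproduced here. One standard relationship between these quantities is Little's Law: mean concurrency equals arrival rate multiplied by mean response time. Whether the original charts were built on it is not stated, so the sketch below is offered only as a way to relate the three measurements; the function names are hypothetical.

```python
def arrival_rate(arrival_times_s, window_s):
    """Arrivals per second over an observation window of window_s seconds."""
    return len(arrival_times_s) / window_s


def mean_concurrency(rate_per_s, mean_response_time_s):
    """Little's Law: mean requests in flight = arrival rate x mean response time."""
    return rate_per_s * mean_response_time_s
```

With 4 arrivals in a 2 second window and a mean response time of 0.5 seconds, the rate is 2.0 per second and the mean concurrency is 1.0.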
14Analysis and Reports
Denial of Service Attack (Day 1)
15Analysis and Reports
Denial of Service Attack (Day 2)
16What are Alerts and why do we need them?
- The ability for an application to assess when it
can't perform its functions correctly or meet
service levels, and then report the failures to
someone who cares.
- If an application is sick and can't perform some
or all of its functions, what is a better way
(fast and precise) to be notified than having the
application tell you exactly what's wrong?
You need to walk before you can run
17Today's Application Alert Architecture (outside
looking in)
- Device Monitors for Servers via OpenView
- Application http/https Monitors via ISM
Device Monitoring OV
Event driven
Applications and Infrastructure Devices
CIC
Application Monitoring (ISM)
polling
18Tomorrow's Alert Architecture (inside looking
out)
- Application Problem Determination as presented in
2002/2003
Device Monitoring OV
Event driven
Application Alerts via OV using SNMP traps
Applications and Infrastructure Devices
CIC
Application Monitoring (ISM)
polling
19Tomorrow's Alert Architecture (inside looking
out)
- Application Problem Determination as suggested
using what's in place today!
Device Monitoring OV
Event driven
Root Cause Analysis
Application Alerts via OV
Applications and Infrastructure Devices
CIC
Corrective action
Application Monitoring (ISM)
polling
20Architecture
- Throughout the development lifecycle, performance
statistics are asynchronously collected, analyzed
and used to influence design and implementation
decisions.
Training (Alert Rules)
21Performance Framework's Alert Functionality
- Performance Framework has been integrated into
the bank's Application-WebSphere Framework so
that all applications that use it are being
monitored for response time, throughput, quality
and stability. The Performance Framework is
currently being re-written for the bank's
Application-.NET Framework.
- The main purpose of the performance framework is
to instrument applications so that performance
related statistics can be measured and
subsequently analyzed. As a by-product of data
collection, real time analysis software was added
in 2002 to identify when the target application
is not functioning as designed or within
performance tolerances.
- We chose not to implement alerting in 2002/2003 because
the bank's implementation of the problem
management software was not sophisticated enough
to assist in root cause analysis prior to
involving an operator. Raw alerts would have
overwhelmed the problem management process at the
CIC.
22Performance Framework's Functionality
- Alert Types
- Response Time: Are transactions (method
invocations) meeting their expected latency?
- Stability (throughput): Is the Application
processing transactions at the expected volume
and throughput?
- Quality: Are the transactions (method
invocations) error free? If not, what are the
errors?
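The three alert types can be read as three checks applied to one sample of events. The sketch below is an interpretation, not the framework's logic: the function name, the event fields (`latency_ms`, `error`), the use of mean latency, and the exact comparison rules are all assumptions.

```python
def check_sample(events, expected_latency_ms, min_arrivals, max_error_pct):
    """Return the alert types breached by one sample of events."""
    alerts = []
    # Stability: too few arrivals in the sample.
    if len(events) < min_arrivals:
        alerts.append("stability")
    if events:
        # Response time: mean latency above the expected value.
        mean_latency = sum(e["latency_ms"] for e in events) / len(events)
        if mean_latency > expected_latency_ms:
            alerts.append("latency")
        # Quality: share of failed invocations above the tolerated percentage.
        error_pct = 100.0 * sum(1 for e in events if e["error"]) / len(events)
        if error_pct > max_error_pct:
            alerts.append("error")
    return alerts
```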
23Performance Framework's Functionality
- Alert Rule (Attributes)
- FORMAT
- Alarm_Name (String: any unique label)
- Alarm_Type (String: latency, error, stability)
- Alarm_Layer (integer: layer identifier)
- Alarm_Method_Name (String: optional)
- Alarm_Days (N1-N2, where Sunday = 1 and Saturday = 7)
- Alarm_Times (t1-t2, range 1-24 hrs)
- Sample_Size (greater than 1, less than 15)
- Alarm_Threshold (integer; used as the minimum Arrivals for Stability Alarms)
- Alarm_Forgiveness (integer)
- Alarm_Message (Text, white space allowed)
- The attributes appear in the order listed above (top to bottom), delimited with commas:
- stab_1,stability,1,null,1-7,1-24,3,101,0.0,This is a stability alarm
- err_1,error,1,null,2-6,7-20,2,0,5.0,Errors Greater than 5 pct
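The comma-delimited rule format above can be read with a small parser. This is a minimal sketch with several assumptions: the names `AlertRule`, `parse_rule`, and `is_active` are hypothetical, `null` is treated as "no method name", Alarm_Forgiveness is parsed as a float because both sample rules use decimal values, and clock hour 0-23 is assumed to map to the rule's 1-24 range by adding one.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class AlertRule:
    name: str
    alarm_type: str
    layer: int
    method_name: Optional[str]
    days: Tuple[int, int]   # Sunday = 1 ... Saturday = 7
    hours: Tuple[int, int]  # range 1-24
    sample_size: int
    threshold: int
    forgiveness: float
    message: str


def _parse_range(text):
    low, high = text.split("-")
    return int(low), int(high)


def parse_rule(line):
    # maxsplit=9 keeps any commas inside the trailing message intact.
    f = [part.strip() for part in line.split(",", 9)]
    if len(f) != 10:
        raise ValueError("expected 10 comma-delimited attributes")
    return AlertRule(
        name=f[0], alarm_type=f[1], layer=int(f[2]),
        method_name=None if f[3] == "null" else f[3],
        days=_parse_range(f[4]), hours=_parse_range(f[5]),
        sample_size=int(f[6]), threshold=int(f[7]),
        forgiveness=float(f[8]), message=f[9],
    )


def is_active(rule, when):
    day = when.isoweekday() % 7 + 1  # convert Monday=1..Sunday=7 to Sunday=1
    hour = when.hour + 1             # assumption: hour 0-23 maps to 1-24
    return (rule.days[0] <= day <= rule.days[1]
            and rule.hours[0] <= hour <= rule.hours[1])
```

Parsing the `err_1` sample rule this way yields a rule that is active Monday through Friday and inactive on Sunday.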
24Performance Framework's Alert Functionality
Architecture Layers
25Components
26Analysis Needs to Collaborate Between Components
27What is needed to use it?
- Real-time Alert Analysis
- Collaboration between Rule-Types
- Is a stability alert real, or is it the
by-product of a latency problem?
- Is a response time alert real, or is it the
by-product of an error alert?
- Are any latency or stability alerts real, or are
they pointing in the direction of the root cause?
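The questions above suggest a precedence among rule types when several alerts fire in the same window. The sketch below assumes one such ordering (an error alert may explain a latency alert, which may in turn explain a stability alert). The ordering, the function name, and the idea of a single probable cause are illustrative assumptions, not a stated design.

```python
# Assumed precedence, most likely cause first. Illustrative only.
PRECEDENCE = ["error", "latency", "stability"]


def classify_alerts(alert_types):
    """Split alerts raised in one window into a probable cause and by-products."""
    present = [t for t in PRECEDENCE if t in alert_types]
    if not present:
        return None, []
    return present[0], present[1:]
```

Under this assumption, a window containing both stability and latency alerts reports latency as the probable cause and stability as a by-product.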
28What is needed to use it?
- Real-time Alert Analysis (continued)
- Collaboration between components
- Multi-column applications need to isolate
underlying infrastructure failures. When a
subset of app columns deliver slow response time,
is the mutual failure in the network or the
mainframe?
- When a group of applications deliver slow
response time while others are OK, is the mutual
failure in the network or the mainframe, or ?
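One way to approach the question of a mutual failure is to look for components that every slow application depends on and that no healthy application uses. The sketch below assumes a dependency map is available; the function name, the map, and the application and component names in the example are hypothetical.

```python
def suspect_components(dependencies, slow_apps):
    """Components shared by every slow application and by no healthy one."""
    slow = [set(dependencies[app]) for app in slow_apps]
    if not slow:
        return set()
    shared = set.intersection(*slow)
    # Anything a healthy application also relies on is unlikely to be the cause.
    healthy = set()
    for app, deps in dependencies.items():
        if app not in slow_apps:
            healthy |= set(deps)
    return shared - healthy
```

For example, if two slow applications both use the network and the mainframe while a healthy application uses the network and a database, only the mainframe remains as a suspect.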
29What is needed to use it?
- USBank Application Framework services that
support Alert Analysis and Communication
Root Cause Analysis
30Next Steps
- Determine whether or not the Performance Framework's approach is directional for problem determination.
- Determine the requirements for a robust Alert Analysis and Recovery process.
- Determine the role that the bank's application framework should play in providing support services.
- Determine whether to buy or build a strategic Analysis solution.