Title: Application Performance Monitoring: One Approach
1Application Performance Monitoring: One Approach
- John Slobodnik
- April 18, 2006
- 1:30 p.m.
- CMG Canada
2Introduction of Product Suite
- ServerVantage
- ApplicationVantage
- ClientVantage
- VantageAnalyzer
- VantageView
3ServerVantage (SV)
- Collects server level data.
- Multiplatform: Windows, Linux, UNIX, etc.
- CPU, memory, disk, network out of the box.
- Collects application level data.
- Oracle, SQL Server, WebLogic, iPlanet, LDAP, etc.
- One SV agent installed on each client server.
- Runs most of the time.
- Customized counters (metrics) can be written.
4ApplicationVantage (AV)
- A sniffer.
- Agent-based application analysis of packet-level communications.
- Gathers all traffic that passes through the Network Interface Cards (NICs).
- Can merge the data together from multiple servers.
- Can trace, for example, SQL Server traffic.
- One AV agent is installed per client server.
- Turned on when required.
- Most often in firefighting mode.
5ClientVantage (CV)
- Gathers data on the performance of your application.
- Done through timings of synthetic business transactions on CV workstations (robots).
- Scripting of business transactions done with a tool called QARun.
- We are doing active monitoring.
- There are two other options available here now:
- Passive monitoring using CV
- A hardware-based solution
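To illustrate what timing a synthetic business transaction means in active monitoring, here is a minimal Python sketch. It is not QARun or ClientVantage code. The function `time_transaction` and the stand-in step `fake_login` are hypothetical names used only for illustration.

```python
import time
from typing import Callable, Dict


def time_transaction(name: str, step: Callable[[], None]) -> Dict[str, object]:
    """Run one scripted transaction and record whether it worked and how long it took."""
    start = time.perf_counter()
    ok = True
    try:
        step()
    except Exception:
        ok = False  # a failed step would count against availability
    elapsed = time.perf_counter() - start
    return {"transaction": name, "ok": ok, "seconds": elapsed}


def fake_login() -> None:
    """Hypothetical stand-in for a scripted business transaction."""
    time.sleep(0.01)
```

A robot would call `time_transaction("login", fake_login)` on a schedule and ship the result to the monitoring database.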
6Vantage Analyzer for J2EE (VA)
- J2EE (Java) based tool to help pinpoint exact locations of code-level performance problems.
- Locates slow methods, SQL statements and transactions.
- The VA agent runs inside your application server's JVM, obtaining performance metrics using bytecode instrumentation.
- Data is sent in real time to the nucleus server, where it is stored and distributed to VA performance consoles.
- Supports WebLogic, WebSphere, etc.
- Customized component
- Allows a transaction to be followed in
VantageView.
7VantageView (VV)
- Web-based portal for viewing SV, CV, AV and VA data for monitoring and reporting.
- Using information from the Vantage suite of tools, VantageView users check the status of clients, servers and networks from their intranet, and get a near-real-time service-level perspective on application availability and performance.
- The flexibility of VantageView lets different levels of users view the information pertinent to them for easy problem determination and resolution.
- Customized counters (metrics) can be created in the VV database.
9A Few Easy Setup Steps
- A summary of the steps to implement the solution:
- Install the agents.
- Complete Administration
- Set Preferences
- Management
- Create tasks and apply blackout schedules.
- Create monitoring views.
- Create reports.
- Optional steps taken:
- Create dashboards.
- Create custom counters (metrics).
10Install the Agents
- This is a procedural task that is quick to complete.
- A script is run to do the install, followed by any applicable patches.
- The product keeps track of the level of agent installed on each server in a central repository.
- SV and AV agents are installed on each server (Windows, UNIX, Linux).
11Administration - Configure Databases
Set up the online database(s).
12Configure Historical Database
Define the historical database. We keep 3 months of data online. Everything older goes to the historical database.
13Control Server Configuration
Set up the control servers.
14Define Users
Define VV user profiles.
15Preferences
Business applications: 4 applications.
Business locations: Various Canadian cities.
Business transactions: An application (29 transactions) is broken down into 3 transaction groupings (14, 7, 8).
Server groupings: Production, pre-production, support, third-party, etc.
16Management - Create Tasks
Create a new task.
17Create Tasks
Select the type of server: Windows, UNIX, etc.
18Create Tasks
Select the counters you wish to see.
19Create Tasks
Add a rule for alerting.
20Create Tasks
Set up alerts if you want them. For example: System Thrashing, TCP connectivity lost from WL layer to WL layer, CPU > 90%, etc.
21Create Tasks
Alert notification via pager, email, SNMP, etc. Different audiences for different tasks: DBAs, App. Support, etc.
If you can do it from a command line, it can be automated here: shell scripts, .bat files. Perform an action based on a threshold being breached.
(1) Kick off a WL thread dump based on a WL counter below a certain level.
(2) Send an alert based on an ASCII pattern match.
(3) Previous problems can be proactively addressed with this type of instrumentation. We examine WebLogic logs.
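The two automation ideas above (run a command when a counter falls below a level, and alert on a pattern match in a log) can be sketched in a few lines of Python. This is only an illustration of the logic, not how ServerVantage implements rules. The function names, the script name `./thread_dump.sh` and the sample log text are all hypothetical.

```python
import re
from typing import Iterable, List, Optional


def action_for_counter(value: float, low_threshold: float, command: str) -> Optional[str]:
    """Return the command to run when the counter drops below the threshold, else None."""
    return command if value < low_threshold else None


def matching_lines(lines: Iterable[str], pattern: str) -> List[str]:
    """Return the log lines that match the pattern, as candidates for an alert."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


# Hypothetical usage: a counter of 2 is below the level of 5, so the script is selected.
command = action_for_counter(2, 5, "./thread_dump.sh")
alerts = matching_lines(["<Error> OutOfMemoryError in pool", "<Info> started"], r"OutOfMemoryError")
```

Keeping the decision (which command, which lines) separate from the execution makes the rule easy to test before it is wired to a pager or a shell.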
22Create Tasks
Select the appropriate data sampling
interval. Key to the size of your database.
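A rough back-of-the-envelope calculation shows why the interval matters so much. The sketch below assumes one stored row per counter per sample, which may not match the actual VV schema. The server and counter counts are made-up numbers, and 90 days stands in for the 3 months of online data.

```python
def rows_stored(servers: int, counters: int, interval_seconds: int, retention_days: int) -> int:
    """Estimate stored rows, assuming one row per counter per sample."""
    samples_per_day = 86_400 // interval_seconds
    return servers * counters * samples_per_day * retention_days


# Hypothetical estate: 20 servers, 50 counters each, 90 days kept online.
one_minute = rows_stored(20, 50, 60, 90)     # 129,600,000 rows
five_minutes = rows_stored(20, 50, 300, 90)  # 25,920,000 rows
```

Moving from a 1-minute to a 5-minute interval cuts the estimate by a factor of five.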
23Create Tasks
Start the task.
24Management - Blackout Schedule
- Apply a blackout schedule, if applicable.
- ServerVantage agents do not run during the application's daily downtime.
- ClientVantage robots are also set up to run on a blackout schedule.
- Implemented through CV, which uses the Windows scheduler.
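The core of a blackout schedule is a check of whether the current time falls inside a daily window. A minimal Python sketch follows. The function name `in_blackout` and the example window are assumptions for illustration and do not reflect how the product or the Windows scheduler stores schedules.

```python
from datetime import time


def in_blackout(now: time, start: time, end: time) -> bool:
    """Return True if 'now' falls inside the daily blackout window [start, end)."""
    if start <= end:
        return start <= now < end
    # The window wraps past midnight, for example 23:00 to 02:00.
    return now >= start or now < end


# Hypothetical nightly window from 23:00 to 02:00.
skip_sample = in_blackout(time(1, 0), time(23, 0), time(2, 0))
```

The wrap-around branch matters because nightly downtime windows often span midnight.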
25Create Monitoring Views
- Monitoring views contain all data points.
- Flexible: you can plot many different metrics on the same chart.
- Business metric vs. server performance.
- Application metrics vs. server metrics.
- TeeChart Editor gives you Excel-style chart functionality to modify the look of the chart.
26Monitoring View
Saved as a permanent monitoring view report.
27Monitoring View
28Monitoring (ad hoc)
Can drill into a data point.
29Drill into IDP
Intelligent Data Point (IDP)
30Create Reports
- Reports contain different levels of data summarization.
- From all data points to daily average.
- We have created 12-hour, 2-day, 1-week and 1-month views of all reports.
- Flexible: you can plot many different metrics on the same chart.
- TeeChart Editor gives you Excel-style chart functionality to modify the look of the chart.
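Summarizing from all data points down to a daily average amounts to grouping samples by calendar day and averaging each group. The Python sketch below shows that idea only. The function name and the sample values are hypothetical, and the product's own summarization may differ.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, Tuple


def daily_average(points: Iterable[Tuple[datetime, float]]) -> Dict[str, float]:
    """Group (timestamp, value) samples by day and average each day's values."""
    by_day = defaultdict(list)
    for timestamp, value in points:
        by_day[timestamp.date().isoformat()].append(value)
    return {day: mean(values) for day, values in sorted(by_day.items())}


# Hypothetical CPU samples across two days.
samples = [
    (datetime(2006, 4, 17, 10, 0), 40.0),
    (datetime(2006, 4, 17, 14, 0), 60.0),
    (datetime(2006, 4, 18, 9, 0), 30.0),
]
summary = daily_average(samples)  # {"2006-04-17": 50.0, "2006-04-18": 30.0}
```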
31Create Reports
Select the metric source.
32Create Reports
Select the metric(s) desired.
33Create Reports
Select the time range.
34Create Reports
Select the display format.
35Create Reports
Schedule the report.
36Create Reports
Save the report.
37Reports
38Reports
39Reports
40Then the Business asked
- How can we prove that the API calls are performing better?
- Custom program installed on WL servers.
- Gathers API call response time data, writes it to a local CSV file, and sends it by FTP to the VV database.
- API Response Time report created; it queries the VV DB.
- APIs split between internal and outsourced (for reporting purposes).
- There are a number of activities within each bean conversation.
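The CSV step of such a collector might look like the Python sketch below. The original custom program is not shown in this deck, so the column names, the API name `getOrder` and the `provider` field (internal or outsourced) are assumptions made for illustration. The FTP transfer is left out.

```python
import csv
import io
from typing import Iterable, Tuple


def timings_to_csv(rows: Iterable[Tuple[str, str, str, float]]) -> str:
    """Render (timestamp, api, provider, response_ms) rows as CSV text with a header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["timestamp", "api", "provider", "response_ms"])
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical sample: one internal API call that took 120.5 ms.
text = timings_to_csv([("2006-04-18T13:30:00", "getOrder", "internal", 120.5)])
```

Tagging each row as internal or outsourced at collection time keeps the reporting split a simple filter.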
41API Response Time Report
Sample bean conversation report.
42Then Management said
- We need to have some different dashboard views.
- Each level of dashboard gets more detailed.
- Special dashboard for outsourced infrastructure.
- Dashboards were created using the integrated
VISIO (Vantage Visualizer) piece of the product.
43Management Dashboard
44Drill down to Application Availability
45Application Availability (bottom)
46Drill down to Heat Chart
47Drill down to CNS report
48Drill down to Application Scorecard
49Application Scorecard (bottom)
50Drill down to Transaction Scorecard
51Drill down to Performance Summary
52Drill down to Orders Report
53Drill down to Session Current Count report
54Drill down to WL Serviced Requests report
56Geographic Dashboard
57We asked ourselves
- How can we make this easier to support?
- Customized metrics can be created in VV or SV.
- Make non-standard types of metrics available.
- Samples of some of the customization created:
- Disk usage of the SV log files directory.
- Automated removal of SV log files.
- Automated push of patches to all agents.
- Send a command to run on a server and return the result.
- Count the number of SV datafiles.
- Agent restart.
- Gather SV log files.
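Two of the customizations above, measuring disk usage of a log directory and removing old log files, can be sketched in Python as follows. This is an illustration under assumptions, not the scripts used on the project. The function names and the age-based removal rule are hypothetical.

```python
import os
import time
from typing import List


def directory_size_bytes(path: str) -> int:
    """Total size of all files under the directory, including subdirectories."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def remove_old_files(path: str, max_age_days: float) -> List[str]:
    """Delete files directly in 'path' older than max_age_days; return their names."""
    cutoff = time.time() - max_age_days * 86_400
    removed = []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isfile(full) and os.path.getmtime(full) < cutoff:
            os.remove(full)
            removed.append(name)
    return removed
```

Returning the removed file names gives the caller something to log or report as a custom counter.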
58More Customization
- TCP Connection test from WL layer to WL layer.
- Number of Orders.
- SQL query to XML to CSV to VV DB.
- ASCII file pattern match in WL logs (3).
- Automatic thread dumps for WebLogic.
- Average Elapsed Time
- Customer purchase at store experience.
- Individual transaction timings are from CV; adding them up is custom.
- Network Test / TCP Connection Test
- Traceroute response time for up to 10 hops, with alert.
- API Response time monitoring.
- Average, max, min, std dev
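The four API response time statistics listed above can be computed with Python's standard library, as in this sketch. The function name is hypothetical. The population standard deviation (`pstdev`) is used here as an assumption, since the deck does not say whether the sample or population formula was used.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def summarize(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Average, max, min and (population) standard deviation of response times."""
    return {
        "avg": mean(samples_ms),
        "max": max(samples_ms),
        "min": min(samples_ms),
        "std_dev": pstdev(samples_ms),
    }


# Hypothetical response times in milliseconds.
stats = summarize([100.0, 200.0, 300.0])
```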
59Network Connection Test
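A TCP connection test of the kind named on this slide can be sketched in Python: try to open a connection and report how long it took, or report failure. This is an illustration only. The function name, the default timeout, and the choice to return `None` on failure are assumptions, not the original custom counter.

```python
import socket
import time
from typing import Optional


def tcp_connect_seconds(host: str, port: int, timeout: float = 3.0) -> Optional[float]:
    """Return the time taken to open a TCP connection, or None if it fails."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:
        # Refused, unreachable or timed out: treat as connectivity lost.
        return None
```

A `None` result would map to the "TCP connectivity lost" alert, while the elapsed time can be stored as a regular metric.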
60Vantage Analyzer
- Installed on production WebLogic servers during the peak annual sales period.
- Now in the pre-prod environment.
- So bugs can be found before promoting new code to production.
61J2EE JavaScape
Paints a landscape view of your J2EE environment. This view displays component interactions between JSPs, Servlets and Web services, Session, Entity and Message-driven Beans, as well as database usage.
62Transaction Explorer
The tree is organized by the largest consumers, from top to bottom. The tree can be sorted by CPU or transaction time.
63Transaction Scope
Gives a detailed view of each individual transaction that runs through your application.
Stalled Threads: Shows thread-level detail of a transaction.
64Method HotSpots
Identifies the biggest consumers in your application. The view can be sorted by transaction or CPU time.
65SQLyzer HotSpots
Lets you pinpoint the largest SQL consumers.
SLA Monitoring: This view displays pre-configured SLA rules and when they were last violated.
66Memory HotSpots
Locates memory leaks as well as memory allocation hot spots, to help with server availability and performance.
67Summary
- Management extremely pleased.
- Customized dashboards, peak period success, want more applications instrumented.
- Business application ran at almost 99.9% availability during the peak processing period of the year, in large part due to this solution.
- Now instrumented to be more proactive than in the past.
- Being used as a model for the rest of the enterprise.
- Support teams have embraced the solution because it makes their lives easier.
- DBAs, application support, system administrators, performance and capacity planners, etc.
- Significantly less time wasted determining whose problem it is (you know, 6 teams in a room) during fire-fighting.
68- John.Slobodnik_at_bell.ca
- (905) 282-3342