Title: Machine Learning in Performance Management
1. Machine Learning in Performance Management
- Irina Rish
- IBM T.J. Watson Research Center
- January 24, 2001
2. Outline
- Introduction
- Machine learning applications in Performance Management
- Bayesian learning tools: extending ABLE
- Advancing theory
- Summary and future directions
3. Learning problems: examples
Pattern discovery, classification, diagnosis, and prediction
4. Approach: Bayesian learning
Bayesian networks: learn (probabilistic) dependency models
P(S), P(B|S), P(C|S), P(X|C,S), P(D|C,B)
- Pattern classification: P(class|data)?
- Prediction: P(symptom|cause)?
- Diagnosis: P(cause|symptom)?
Numerous important applications:
- Medicine
- Stock market
- Bio-informatics
- eCommerce
- Military
5. Outline
- Introduction
- Machine-learning applications in Performance Management
  - Transaction Recognition
  - In progress: Event Mining, Probe Placement, etc.
- Bayesian learning tools: extending ABLE
- Advancing theory
- Summary and future directions
6. End-User Transaction Recognition: why is it important?
[Diagram: Client Workstation -- Session (connection) -- Server (Web, DB, Lotus Notes); End-User Transactions (EUTs) decompose into Remote Procedure Calls (RPCs)]
Examples: Lotus Notes, Web/eBusiness (on-line stores, travel agencies, trading), database transactions, buy/sell, search, email, etc.
- Realistic workload models (for performance testing)
- Resource management (anticipating requests)
- Quantifying end-user perception of performance (response times)
7. Why is it hard? Why learn from data?
Example: EUTs and RPCs in Lotus Notes
8. Our approach: Classification + Segmentation
- Classification (similar to text classification)
- Segmentation (similar to speech understanding, image segmentation)
9. How to represent transactions? Feature vectors
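One simple representation is a vector of per-RPC occurrence counts over a fixed RPC vocabulary. This is a minimal sketch under that assumption; the RPC names below are hypothetical, not taken from the Lotus Notes trace.

```python
def rpc_count_vector(rpc_sequence, vocabulary):
    """Map a sequence of RPC names to a vector of occurrence counts."""
    counts = {rpc: 0 for rpc in vocabulary}
    for rpc in rpc_sequence:
        if rpc in counts:
            counts[rpc] += 1
    return [counts[rpc] for rpc in vocabulary]

vocab = ["OPEN_DB", "READ_NOTE", "CLOSE_DB"]        # hypothetical RPC types
vec = rpc_count_vector(["OPEN_DB", "READ_NOTE", "READ_NOTE", "CLOSE_DB"], vocab)
```

Each transaction instance then becomes a fixed-length numeric vector that a probabilistic classifier can score.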
10. Classification scheme
11. Our classifier: naïve Bayes (NB)
1. Simplifying ("naïve") assumption: feature independence given class
2. Classification: given an (unlabeled) instance, choose the most likely class
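The classification rule above (pick the class maximizing P(c) · Π_i P(x_i | c), computed in log space) can be sketched as follows. The toy class names and probability tables are invented for illustration; this is not the ABLE implementation.

```python
import math

def nb_classify(x, priors, cond):
    """Naive Bayes: argmax_c  log P(c) + sum_i log P(x_i | c).

    priors: {class: P(class)}
    cond:   {class: list (one dict per feature) mapping value -> P(value | class)}
    """
    best, best_score = None, float("-inf")
    for c, prior in priors.items():
        score = math.log(prior)
        for i, xi in enumerate(x):
            score += math.log(cond[c][i].get(xi, 1e-6))  # tiny floor avoids log(0)
        if score > best_score:
            best, best_score = c, score
    return best

priors = {"Search": 0.5, "Email": 0.5}               # hypothetical transaction classes
cond = {"Search": [{1: 0.9, 0: 0.1}, {1: 0.8, 0: 0.2}],
        "Email":  [{1: 0.2, 0: 0.8}, {1: 0.3, 0: 0.7}]}
```

Working in log space keeps the product of many small per-feature probabilities numerically stable.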
12. Classification results on Lotus CoC data
[Plot: accuracy vs. training set size for NB (Bernoulli, multinomial, or geometric), NB (shifted geometric), and a baseline classifier that always selects the most frequent transaction]
- Significant improvement over the baseline classifier (75%)
- NB is simple, efficient, and comparable to state-of-the-art classifiers: SVM 85-87%, Decision Tree 90-92%
- The best-fit distribution (shifted geometric) is not necessarily the best classifier! (?)
13. Transaction recognition = segmentation + classification
- Dynamic programming (Viterbi search)
- (Recursive) DP equation
- Naive Bayes classifier
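The recursive DP can be sketched as follows: V(j) is the best score of any segmentation of the first j RPCs, and V(j) = max over i &lt; j of V(i) + score(seq[i:j]). Here `segment_score` stands in for the naive-Bayes log-likelihood of the best class for a candidate segment, so the scoring function is an assumption, not the one from the paper.

```python
def best_segmentation(seq, segment_score, max_len=10):
    """Viterbi-style DP: split seq into segments maximizing the summed score."""
    n = len(seq)
    V = [float("-inf")] * (n + 1)  # V[j]: best score over seq[0:j]
    V[0] = 0.0
    back = [0] * (n + 1)           # back[j]: start of the last segment ending at j
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            s = V[i] + segment_score(seq[i:j])
            if s > V[j]:
                V[j], back[j] = s, i
    cuts, j = [], n                # recover boundaries by backtracking
    while j > 0:
        cuts.append((back[j], j))
        j = back[j]
    return list(reversed(cuts))

# Toy scorer that prefers length-2 segments, just to exercise the DP.
cuts = best_segmentation(list("abcd"), lambda seg: 1.0 if len(seg) == 2 else -1.0)
```

The `max_len` cap bounds the inner loop, keeping the search O(n · max_len) rather than O(n²).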
14. Transaction recognition results
[Plot: recognition accuracy vs. training set size]
- Good EUT recognition accuracy: 64% (a harder problem than classification!)
- Reversed order of results: the best classifier is not necessarily the best recognizer! (?) → further research!
15. EUT recognition: summary
- A novel approach: learning EUTs from RPCs
- Patent, conference paper (AAAI-2000), prototype system
- Successful results on Lotus Notes data (Lotus CoC)
  - Classification: naive Bayes (up to 87% accuracy)
  - EUT recognition: Viterbi + Bayes (up to 64% accuracy)
- Work in progress:
  - Better feature selection (RPC subsequences?)
  - Selecting the best classifier for the segmentation task
  - Learning more sophisticated classifiers (Bayesian networks)
  - Information-theoretic approach to segmentation (MDL)
16. Outline
- Introduction
- Machine-learning applications in Performance Management
  - Transaction Recognition
  - In progress: Event Mining, Probing Strategy, etc.
- Bayesian learning tools: extending ABLE
- Advancing theory
- Summary and future directions
17. Event Mining: analyzing system event sequences
[Plot: events from hosts over time (sec)]
- Example: USAA data
  - 858 hosts, 136 event types
  - 67,184 data points (13 days, by second)
- Event examples:
  - High-severity events: 'Cisco_Link_Down', 'chassisMinorAlarm_On', etc.
  - Low-severity events: 'tcpConnectClose', 'duplicate_ip', etc.
18. 1. Learning event dependency models
[Diagram: dynamic Bayes net over Event1, Event2, ..., EventM, EventN with unknown (???) structure]
- Current approach: learn dynamic probabilistic graphical models (temporal, or dynamic Bayes nets)
- Predict:
  - time to failure
  - event co-occurrence
  - existence of hidden nodes (root causes)
- Recognize sequences of high-level system states (an unsupervised version of the EUT recognition problem)
- Important issue: incremental learning from data streams
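As a much simplified stand-in for the dynamic models above, here is a sketch of predicting expected time to failure from a first-order Markov model over system states. The state names and transition probabilities are invented for illustration; a real dynamic Bayes net would factor the state across many event variables.

```python
def expected_time_to_failure(transition, failure, iters=1000):
    """Expected number of steps to reach `failure` from each state.

    Solves h[s] = 1 + sum_t P(s -> t) * h[t], h[failure] = 0, by value iteration.
    transition: {state: {next_state: prob}}
    """
    states = list(transition)
    h = {s: 0.0 for s in states}
    for _ in range(iters):
        h = {s: 0.0 if s == failure else
             1.0 + sum(p * h[t] for t, p in transition[s].items())
             for s in states}
    return h

chain = {  # hypothetical event-state transition probabilities
    "ok":   {"ok": 0.9, "warn": 0.1},
    "warn": {"ok": 0.5, "warn": 0.3, "down": 0.2},
    "down": {"down": 1.0},
}
hitting = expected_time_to_failure(chain, "down")
```

Fixed-count value iteration is the simplest solver here; solving the linear system directly would be exact but obscures the recurrence.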
19. 2. Clustering hosts by their history
[Plot: event histories highlighting problematic hosts and silent hosts]
- Group hosts with similar event sequences: what is an appropriate similarity (distance) metric? One example: distance between compressed sequences (event distribution models)
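One way to make "distance between compressed sequences" concrete is the normalized compression distance, sketched here with zlib as the compressor. Both the compressor choice and the toy event histories are illustrative assumptions, not details from the USAA study.

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small when x and y share structure."""
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Hypothetical per-host event histories serialized as byte strings.
host_a = b"tcpConnectClose " * 50
host_b = b"tcpConnectClose " * 49 + b"duplicate_ip "
host_c = b"Cisco_Link_Down chassisMinorAlarm_On " * 25
```

Hosts with near-identical histories compress well together, so their pairwise NCD is small; the metric then feeds any standard clustering algorithm.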
20. Probing strategy (EPP)
- Objective: find a probe frequency F that minimizes
  - E(Tprobe - Tstart) -- failure detection, or
  - E(|total failure time - total estimated failure time|) -- gives an accurate performance estimate
- Constraint on the additional load induced by probes: L(F) < MaxLoad
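A toy sketch of the constrained choice. Two assumptions not on the slide: probing is periodic, so the expected failure-detection delay is roughly 1/(2F) (the failure lands uniformly within a probe interval), and the probe-induced load grows linearly, L(F) = F · load_per_probe. Under those assumptions the load cap simply limits how often we may probe, and the largest feasible F minimizes the delay.

```python
def best_probe_frequency(candidates, load_per_probe, max_load):
    """Largest frequency satisfying L(F) < MaxLoad; delay ~ 1/(2F) falls as F grows."""
    feasible = [f for f in candidates if f * load_per_probe < max_load]
    return max(feasible) if feasible else None

f_best = best_probe_frequency([0.1, 0.5, 1.0, 2.0, 5.0],
                              load_per_probe=0.02, max_load=0.05)
expected_delay = 1.0 / (2.0 * f_best)  # seconds, under the periodic-probe assumption
```

The second objective (accurate failure-time estimation) would need a different cost model, but the same feasible-set structure applies.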
21. Outline
- Introduction
- Machine-learning applications in Performance Management
- Bayesian learning tools: extending ABLE
- Advancing theory
- Summary and future directions
22. ABLE: Agent Building and Learning Environment
23. What is ABLE? What is my contribution?
- A Java toolbox for building reasoning and learning agents
- Provides a visual environment, boolean and fuzzy rules, neural networks, genetic search
- My contributions:
  - Naïve Bayes classifier (batch and incremental)
  - Discretization
- Future releases:
  - General Bayesian learning and inference tools
- Available at:
  - AlphaWorks: www.alphaWorks.ibm.com/tech
  - Project page: w3.rchland.ibm.com/projects/ABLE
24. How does it work?
25. Who is using the Naïve Bayes tools? Impact on other IBM projects
- Video character recognition (w/ C. Dorai)
  - Naïve Bayes: 84% accuracy
  - Better than SVM on some pairs of characters (average SVM: 87%)
  - Current work: combining Naïve Bayes with SVMs
- Environmental data analysis (w/ Yuan-Chi Chang)
  - Learning mortality rates using data on air pollutants
  - Naïve Bayes is currently being evaluated
- Performance management
  - Event mining: in progress
  - EUT recognition: successful results
26. Outline
- Introduction
- Machine-learning in Performance Management
- Bayesian learning tools: extending ABLE
- Advancing theory
  - Analysis of the naïve Bayes classifier
  - Inference in Bayesian networks
- Summary and future directions
27. Why does Naïve Bayes do well? And when?
- Class-conditional feature independence: an unrealistic assumption! But why/when does it work?
- [Figure: true P(class|f) vs. the NB estimate]
- When do the independence assumptions not hurt classification?
28. Case 1: functional dependencies
Lemma 1: Naïve Bayes is optimal when features are functionally dependent given class.
Proof: …
29. Case 2: almost-functional (low-entropy) distributions
- Lemma 2: Naïve Bayes is a good approximation for almost-functional dependencies
- Formally: …
- Related practical examples:
  - RPC occurrences in EUTs are often almost-deterministic (and NB does well)
  - Successful local inference in almost-deterministic Bayesian networks (turbo coding, mini-buckets; see Dechter & Rish 2000)
30. Experimental results support the theory
Random problem generator: uniform P(class), random P(f|class):
1. A randomly selected entry in P(f|class) is assigned …
2. The rest of the entries: uniform random sampling + normalization
Findings:
- Less noise (smaller noise parameter) → NB closer to optimal
- Feature dependence does NOT correlate with NB error
31. Outline
- Introduction
- Machine-learning in Performance Management
  - Transaction Recognition
  - Event Mining
- Bayesian learning tools: extending ABLE
- Advancing theory
  - Analysis of the naïve Bayes classifier
  - Inference in Bayesian networks
- Summary and future directions
32. From Naïve Bayes to Bayesian Networks
- Naïve Bayes model: independent features given class
- Bayesian network (BN) model: any joint probability distribution, e.g.
  P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)
- Query: P(lung cancer = yes | smoking = no, dyspnoea = yes) = ?
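To make the factorization concrete, here is exact inference by brute-force enumeration over five binary variables with the slide's structure P(S,C,B,X,D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B). All CPT numbers are invented for illustration.

```python
from itertools import product

p_s = [0.7, 0.3]                                    # P(S)
p_c_s = [[0.95, 0.05], [0.80, 0.20]]                # P(C|S): row S, column C
p_b_s = [[0.90, 0.10], [0.60, 0.40]]                # P(B|S)
p_x_cs = {(c, s): ([0.8, 0.2] if c == 0 else [0.1, 0.9])
          for c in (0, 1) for s in (0, 1)}          # P(X|C,S) (here only C matters)
p_d_cb = {(c, b): ([0.9, 0.1] if (c, b) == (0, 0) else [0.3, 0.7])
          for c in (0, 1) for b in (0, 1)}          # P(D|C,B)

def joint(s, c, b, x, d):
    """The factored joint probability of one full assignment."""
    return (p_s[s] * p_c_s[s][c] * p_b_s[s][b]
            * p_x_cs[(c, s)][x] * p_d_cb[(c, b)][d])

# A query of the slide's shape, e.g. P(C=1 | S=0, D=1): sum out hidden B and X.
num = sum(joint(0, 1, b, x, 1) for b, x in product((0, 1), repeat=2))
den = sum(joint(0, c, b, x, 1) for c, b, x in product((0, 1), repeat=3))
posterior = num / den
```

Enumeration is exponential in the number of variables, which is exactly why the approximate algorithms discussed later (e.g., mini-buckets) matter for large networks.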
33. Example: Printer Troubleshooting (Microsoft Windows 95)
[Heckerman, 95]
34. How to use Bayesian networks?
- Diagnosis: P(cause|symptom)?
- Prediction: P(symptom|cause)?
- Decision-making given a utility function (MEU)
- Inference problems are NP-complete → approximate algorithms
35. Local approximation scheme: mini-buckets (paper submitted to JACM)
- Idea: reduce the complexity of inference by ignoring some dependencies
- Successfully used for approximating the Most Probable Explanation
- Very efficient on real-life (medical, decoding) and synthetic problems
[Plot: approximation accuracy vs. noise]
- Less noise → higher accuracy, similarly to naïve Bayes!
- General theory needed: independence assumptions and almost-deterministic distributions
- Potential impact: efficient inference in complex performance management models (e.g., event mining, system dependence models)
36. Summary
- Performance management
  - End-user transaction recognition (Lotus CoC): novel method, patent, paper; applied to Lotus Notes
  - In progress: event mining (USAA), probing strategies (EPP)
- Machine-learning tools (alphaWorks)
  - Extending ABLE with a Bayesian classifier
  - Applying the classifier to other IBM projects: video character recognition, environmental data analysis
- Theory and algorithms
  - Analysis of Naïve Bayes accuracy (Research Report)
  - Approximate Bayesian inference (submitted paper)
  - Patent on meta-learning
37. Future directions
Research interest: automated learning and inference, spanning practical problems, theory, and generic tools
38. Collaborations
- Transaction recognition: J. Hellerstein, T. Jayram (Watson)
- Event mining: J. Hellerstein, R. Vilalta, S. Ma, C. Perng (Watson)
- ABLE: J. Bigus, R. Vilalta (Watson)
- Video character recognition: C. Dorai (Watson)
- MDL approach to segmentation: B. Dom (Almaden)
- Approximate inference in Bayes nets: R. Dechter (UCI)
- Meta-learning: R. Vilalta (Watson)
- Environmental data analysis: Y. Chang (Watson)
39. Machine learning discussion group
- Weekly seminars: 11:30-2:30 (w/ lunch) in 1S-F40
- Active group members: Mark Brodie, Vittorio Castelli, Joe Hellerstein, Daniel Oblinger, Jayram Thathachar, Irina Rish (more people joined recently)
- Agenda:
  - Discussions of recent ML papers and book chapters (Pattern Classification by Duda, Hart, and Stork, 2000)
  - Brainstorming sessions about particular ML topics
  - Recent discussions: accuracy of Bayesian classifiers (naïve Bayes)
- Web site: http://reswat4.research.ibm.com/projects/mlreadinggroup/mlreadinggroup.nsf/main/toppage