Title: Machine Learning in Performance Management
1. Machine Learning in Performance Management
- Irina Rish
- IBM T.J. Watson Research Center
- January 24, 2001
2. Outline
- Introduction
- Machine learning applications in Performance Management
- Bayesian learning tools: extending ABLE
- Advancing theory
- Summary and future directions
3. Learning problems: examples
Pattern discovery, classification, diagnosis, and prediction
4. Approach: Bayesian learning
Bayesian networks: learn (probabilistic) dependency models
P(S), P(B|S), P(C|S), P(X|C,S), P(D|C,B)
- Pattern classification: P(class|data)?
- Prediction: P(symptom|cause)?
- Diagnosis: P(cause|symptom)?
Numerous important applications:
- Medicine
- Stock market
- Bio-informatics
- eCommerce
- Military
5. Outline
- Introduction
- Machine-learning applications in Performance Management
  - Transaction Recognition
  - In progress: Event Mining, Probe Placement, etc.
- Bayesian learning tools: extending ABLE
- Advancing theory
- Summary and future directions
6. End-User Transaction Recognition: why is it important?
[Diagram: Client Workstation -- Session (connection) -- Server (Web, DB, Lotus Notes); End-User Transactions (EUTs) decompose into Remote Procedure Calls (RPCs)]
Examples: Lotus Notes, Web/eBusiness (on-line stores, travel agencies, trading), database transactions, buy/sell, search, email, etc.
- Realistic workload models (for performance testing)
- Resource management (anticipating requests)
- Quantifying end-user perception of performance (response times)
7. Why is it hard? Why learn from data?
Example: EUTs and RPCs in Lotus Notes
8. Our approach: Classification + Segmentation
- Classification (similar to text classification)
- Segmentation (similar to speech understanding, image segmentation)
9. How to represent transactions? Feature vectors
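One simple representation is a vector of per-RPC occurrence counts over a fixed RPC vocabulary. This is a minimal sketch under that assumption; the RPC names below are hypothetical, not taken from the Lotus Notes trace.

```python
def rpc_count_vector(rpc_sequence, vocabulary):
    """Map a sequence of RPC names to a vector of occurrence counts."""
    counts = {rpc: 0 for rpc in vocabulary}
    for rpc in rpc_sequence:
        if rpc in counts:
            counts[rpc] += 1
    return [counts[rpc] for rpc in vocabulary]

vocab = ["OPEN_DB", "READ_NOTE", "CLOSE_DB"]        # hypothetical RPC types
vec = rpc_count_vector(["OPEN_DB", "READ_NOTE", "READ_NOTE", "CLOSE_DB"], vocab)
```

Each transaction instance then becomes a fixed-length numeric vector that a probabilistic classifier can score.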
10. Classification scheme
11. Our classifier: naïve Bayes (NB)
1. Simplifying ("naïve") assumption: feature independence given class
2. Classification: given an (unlabeled) instance, choose the most likely class
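The classification rule above (pick the class maximizing P(c) · Π_i P(x_i | c), computed in log space) can be sketched as follows. The toy class names and probability tables are invented for illustration; this is not the ABLE implementation.

```python
import math

def nb_classify(x, priors, cond):
    """Naive Bayes: argmax_c  log P(c) + sum_i log P(x_i | c).

    priors: {class: P(class)}
    cond:   {class: list (one dict per feature) mapping value -> P(value | class)}
    """
    best, best_score = None, float("-inf")
    for c, prior in priors.items():
        score = math.log(prior)
        for i, xi in enumerate(x):
            score += math.log(cond[c][i].get(xi, 1e-6))  # tiny floor avoids log(0)
        if score > best_score:
            best, best_score = c, score
    return best

priors = {"Search": 0.5, "Email": 0.5}               # hypothetical transaction classes
cond = {"Search": [{1: 0.9, 0: 0.1}, {1: 0.8, 0: 0.2}],
        "Email":  [{1: 0.2, 0: 0.8}, {1: 0.3, 0: 0.7}]}
```

Working in log space keeps the product of many small per-feature probabilities numerically stable.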
12. Classification results on Lotus CoC data
[Plot: accuracy vs. training set size for NB (Bernoulli, multinomial, or geometric), NB (shifted geometric), and a baseline classifier that always selects the most frequent transaction]
- Significant improvement over the baseline classifier (75%)
- NB is simple, efficient, and comparable to state-of-the-art classifiers: SVM 85-87%, Decision Tree 90-92%
- The best-fit distribution (shifted geometric) is not necessarily the best classifier! (?)
13. Transaction recognition = segmentation + classification
- Dynamic programming (Viterbi search)
- (Recursive) DP equation
- Naive Bayes classifier
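The recursive DP can be sketched as follows: V(j) is the best score of any segmentation of the first j RPCs, and V(j) = max over i &lt; j of V(i) + score(seq[i:j]). Here `segment_score` stands in for the naive-Bayes log-likelihood of the best class for a candidate segment, so the scoring function is an assumption, not the one from the paper.

```python
def best_segmentation(seq, segment_score, max_len=10):
    """Viterbi-style DP: split seq into segments maximizing the summed score."""
    n = len(seq)
    V = [float("-inf")] * (n + 1)  # V[j]: best score over seq[0:j]
    V[0] = 0.0
    back = [0] * (n + 1)           # back[j]: start of the last segment ending at j
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            s = V[i] + segment_score(seq[i:j])
            if s > V[j]:
                V[j], back[j] = s, i
    cuts, j = [], n                # recover boundaries by backtracking
    while j > 0:
        cuts.append((back[j], j))
        j = back[j]
    return list(reversed(cuts))

# Toy scorer that prefers length-2 segments, just to exercise the DP.
cuts = best_segmentation(list("abcd"), lambda seg: 1.0 if len(seg) == 2 else -1.0)
```

The `max_len` cap bounds the inner loop, keeping the search O(n · max_len) rather than O(n²).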
14. Transaction recognition results
[Plot: recognition accuracy vs. training set size]
- Good EUT recognition accuracy: 64% (a harder problem than classification!)
- Reversed order of results: the best classifier is not necessarily the best recognizer! (?) → further research!
15. EUT recognition: summary
- A novel approach: learning EUTs from RPCs
- Patent, conference paper (AAAI-2000), prototype system
- Successful results on Lotus Notes data (Lotus CoC)
  - Classification: naive Bayes (up to 87% accuracy)
  - EUT recognition: Viterbi + Bayes (up to 64% accuracy)
- Work in progress:
  - Better feature selection (RPC subsequences?)
  - Selecting the best classifier for the segmentation task
  - Learning more sophisticated classifiers (Bayesian networks)
  - Information-theoretic approach to segmentation (MDL)
16. Outline
- Introduction
- Machine-learning applications in Performance Management
  - Transaction Recognition
  - In progress: Event Mining, Probing Strategy, etc.
- Bayesian learning tools: extending ABLE
- Advancing theory
- Summary and future directions
17. Event Mining: analyzing system event sequences
[Plot: events from hosts over time (sec)]
- Example: USAA data
  - 858 hosts, 136 event types
  - 67,184 data points (13 days, by second)
- Event examples:
  - High-severity events: 'Cisco_Link_Down', 'chassisMinorAlarm_On', etc.
  - Low-severity events: 'tcpConnectClose', 'duplicate_ip', etc.
18. 1. Learning event dependency models
[Diagram: dynamic Bayes net over Event1, Event2, ..., EventM, EventN with unknown (???) structure]
- Current approach: learn dynamic probabilistic graphical models (temporal, or dynamic Bayes nets)
- Predict:
  - time to failure
  - event co-occurrence
  - existence of hidden nodes (root causes)
- Recognize sequences of high-level system states (an unsupervised version of the EUT recognition problem)
- Important issue: incremental learning from data streams
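As a much simplified stand-in for the dynamic models above, here is a sketch of predicting expected time to failure from a first-order Markov model over system states. The state names and transition probabilities are invented for illustration; a real dynamic Bayes net would factor the state across many event variables.

```python
def expected_time_to_failure(transition, failure, iters=1000):
    """Expected number of steps to reach `failure` from each state.

    Solves h[s] = 1 + sum_t P(s -> t) * h[t], h[failure] = 0, by value iteration.
    transition: {state: {next_state: prob}}
    """
    states = list(transition)
    h = {s: 0.0 for s in states}
    for _ in range(iters):
        h = {s: 0.0 if s == failure else
             1.0 + sum(p * h[t] for t, p in transition[s].items())
             for s in states}
    return h

chain = {  # hypothetical event-state transition probabilities
    "ok":   {"ok": 0.9, "warn": 0.1},
    "warn": {"ok": 0.5, "warn": 0.3, "down": 0.2},
    "down": {"down": 1.0},
}
hitting = expected_time_to_failure(chain, "down")
```

Fixed-count value iteration is the simplest solver here; solving the linear system directly would be exact but obscures the recurrence.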
19. 2. Clustering hosts by their history
[Plot: event histories highlighting problematic hosts and silent hosts]
- Group hosts with similar event sequences: what is an appropriate similarity (distance) metric? One example: distance between compressed sequences (event distribution models)
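One way to make "distance between compressed sequences" concrete is the normalized compression distance, sketched here with zlib as the compressor. Both the compressor choice and the toy event histories are illustrative assumptions, not details from the USAA study.

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small when x and y share structure."""
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Hypothetical per-host event histories serialized as byte strings.
host_a = b"tcpConnectClose " * 50
host_b = b"tcpConnectClose " * 49 + b"duplicate_ip "
host_c = b"Cisco_Link_Down chassisMinorAlarm_On " * 25
```

Hosts with near-identical histories compress well together, so their pairwise NCD is small; the metric then feeds any standard clustering algorithm.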
20. Probing strategy (EPP)
- Objective: find a probe frequency F that minimizes
  - E(Tprobe - Tstart) -- failure detection, or
  - E(|total failure time - total estimated failure time|) -- gives an accurate performance estimate
- Constraint on the additional load induced by probes: L(F) < MaxLoad
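A toy sketch of the constrained choice. Two assumptions not on the slide: probing is periodic, so the expected failure-detection delay is roughly 1/(2F) (the failure lands uniformly within a probe interval), and the probe-induced load grows linearly, L(F) = F · load_per_probe. Under those assumptions the load cap simply limits how often we may probe, and the largest feasible F minimizes the delay.

```python
def best_probe_frequency(candidates, load_per_probe, max_load):
    """Largest frequency satisfying L(F) < MaxLoad; delay ~ 1/(2F) falls as F grows."""
    feasible = [f for f in candidates if f * load_per_probe < max_load]
    return max(feasible) if feasible else None

f_best = best_probe_frequency([0.1, 0.5, 1.0, 2.0, 5.0],
                              load_per_probe=0.02, max_load=0.05)
expected_delay = 1.0 / (2.0 * f_best)  # seconds, under the periodic-probe assumption
```

The second objective (accurate failure-time estimation) would need a different cost model, but the same feasible-set structure applies.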
21. Outline
- Introduction
- Machine-learning applications in Performance Management
- Bayesian learning tools: extending ABLE
- Advancing theory
- Summary and future directions
22. ABLE: Agent Building and Learning Environment
23. What is ABLE? What is my contribution?
- A Java toolbox for building reasoning and learning agents
- Provides a visual environment, boolean and fuzzy rules, neural networks, genetic search
- My contributions:
  - Naïve Bayes classifier (batch and incremental)
  - Discretization
- Future releases:
  - General Bayesian learning and inference tools
- Available at:
  - AlphaWorks: www.alphaWorks.ibm.com/tech
  - Project page: w3.rchland.ibm.com/projects/ABLE
24. How does it work?
25. Who is using the Naïve Bayes tools? Impact on other IBM projects
- Video character recognition (w/ C. Dorai)
  - Naïve Bayes: 84% accuracy
  - Better than SVM on some pairs of characters (average SVM: 87%)
  - Current work: combining Naïve Bayes with SVMs
- Environmental data analysis (w/ Yuan-Chi Chang)
  - Learning mortality rates using data on air pollutants
  - Naïve Bayes is currently being evaluated
- Performance management
  - Event mining: in progress
  - EUT recognition: successful results
26. Outline
- Introduction
- Machine-learning in Performance Management
- Bayesian learning tools: extending ABLE
- Advancing theory
  - Analysis of the naïve Bayes classifier
  - Inference in Bayesian networks
- Summary and future directions
27. Why does Naïve Bayes do well? And when?
- Class-conditional feature independence: an unrealistic assumption! But why/when does it work?
- [Figure: true P(class|f) vs. the NB estimate]
- When do the independence assumptions not hurt classification?
28. Case 1: functional dependencies
Lemma 1: Naïve Bayes is optimal when features are functionally dependent given class.
Proof: …
29. Case 2: almost-functional (low-entropy) distributions
- Lemma 2: Naïve Bayes is a good approximation for almost-functional dependencies
- Formally: …
- Related practical examples:
  - RPC occurrences in EUTs are often almost-deterministic (and NB does well)
  - Successful local inference in almost-deterministic Bayesian networks (turbo coding, mini-buckets; see Dechter & Rish 2000)
30. Experimental results support the theory
Random problem generator: uniform P(class), random P(f|class):
1. A randomly selected entry in P(f|class) is assigned …
2. The rest of the entries: uniform random sampling + normalization
Findings:
- Less noise (smaller noise parameter) → NB closer to optimal
- Feature dependence does NOT correlate with NB error
31. Outline
- Introduction
- Machine-learning in Performance Management
  - Transaction Recognition
  - Event Mining
- Bayesian learning tools: extending ABLE
- Advancing theory
  - Analysis of the naïve Bayes classifier
  - Inference in Bayesian networks
- Summary and future directions
32. From Naïve Bayes to Bayesian Networks
- Naïve Bayes model: independent features given class
- Bayesian network (BN) model: any joint probability distribution, e.g.
  P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)
- Query: P(lung cancer = yes | smoking = no, dyspnoea = yes) = ?
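To make the factorization concrete, here is exact inference by brute-force enumeration over five binary variables with the slide's structure P(S,C,B,X,D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B). All CPT numbers are invented for illustration.

```python
from itertools import product

p_s = [0.7, 0.3]                                    # P(S)
p_c_s = [[0.95, 0.05], [0.80, 0.20]]                # P(C|S): row S, column C
p_b_s = [[0.90, 0.10], [0.60, 0.40]]                # P(B|S)
p_x_cs = {(c, s): ([0.8, 0.2] if c == 0 else [0.1, 0.9])
          for c in (0, 1) for s in (0, 1)}          # P(X|C,S) (here only C matters)
p_d_cb = {(c, b): ([0.9, 0.1] if (c, b) == (0, 0) else [0.3, 0.7])
          for c in (0, 1) for b in (0, 1)}          # P(D|C,B)

def joint(s, c, b, x, d):
    """The factored joint probability of one full assignment."""
    return (p_s[s] * p_c_s[s][c] * p_b_s[s][b]
            * p_x_cs[(c, s)][x] * p_d_cb[(c, b)][d])

# A query of the slide's shape, e.g. P(C=1 | S=0, D=1): sum out hidden B and X.
num = sum(joint(0, 1, b, x, 1) for b, x in product((0, 1), repeat=2))
den = sum(joint(0, c, b, x, 1) for c, b, x in product((0, 1), repeat=3))
posterior = num / den
```

Enumeration is exponential in the number of variables, which is exactly why the approximate algorithms discussed later (e.g., mini-buckets) matter for large networks.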
33. Example: Printer Troubleshooting (Microsoft Windows 95)
[Heckerman, 95]
34. How to use Bayesian networks?
- Diagnosis: P(cause|symptom)?
- Prediction: P(symptom|cause)?
- Decision-making given a utility function (MEU)
- Inference problems are NP-complete → approximate algorithms
35. Local approximation scheme: mini-buckets (paper submitted to JACM)
- Idea: reduce the complexity of inference by ignoring some dependencies
- Successfully used for approximating the Most Probable Explanation
- Very efficient on real-life (medical, decoding) and synthetic problems
[Plot: approximation accuracy vs. noise]
- Less noise → higher accuracy, similarly to naïve Bayes!
- General theory needed: independence assumptions and almost-deterministic distributions
- Potential impact: efficient inference in complex performance management models (e.g., event mining, system dependence models)
36. Summary
- Performance management
  - End-user transaction recognition (Lotus CoC): novel method, patent, paper; applied to Lotus Notes
  - In progress: event mining (USAA), probing strategies (EPP)
- Machine-learning tools (alphaWorks)
  - Extending ABLE with a Bayesian classifier
  - Applying the classifier to other IBM projects: video character recognition, environmental data analysis
- Theory and algorithms
  - Analysis of Naïve Bayes accuracy (Research Report)
  - Approximate Bayesian inference (submitted paper)
  - Patent on meta-learning
37. Future directions
Research interest: automated learning and inference, spanning practical problems, theory, and generic tools
38. Collaborations
- Transaction recognition: J. Hellerstein, T. Jayram (Watson)
- Event mining: J. Hellerstein, R. Vilalta, S. Ma, C. Perng (Watson)
- ABLE: J. Bigus, R. Vilalta (Watson)
- Video character recognition: C. Dorai (Watson)
- MDL approach to segmentation: B. Dom (Almaden)
- Approximate inference in Bayes nets: R. Dechter (UCI)
- Meta-learning: R. Vilalta (Watson)
- Environmental data analysis: Y. Chang (Watson)
39. Machine learning discussion group
- Weekly seminars: 11:30-2:30 (w/ lunch) in 1S-F40
- Active group members: Mark Brodie, Vittorio Castelli, Joe Hellerstein, Daniel Oblinger, Jayram Thathachar, Irina Rish (more people joined recently)
- Agenda:
  - Discussions of recent ML papers and book chapters (Pattern Classification by Duda, Hart, and Stork, 2000)
  - Brainstorming sessions about particular ML topics
  - Recent discussions: accuracy of Bayesian classifiers (naïve Bayes)
- Web site: http://reswat4.research.ibm.com/projects/mlreadinggroup/mlreadinggroup.nsf/main/toppage