ROC Statistics for the Lazy Machine Learner in All of Us

1
ROC Statistics for the Lazy Machine Learner in All of Us
  • Bradley Malin
  • Lecture for COS Lab
  • School of Computer Science
  • Carnegie Mellon University
  • 9/22/2005

2
Why Should I Care?
  • Imagine you have 2 different probabilistic
    classification models
  • e.g. logistic regression vs. neural network
  • How do you know which one is better?
  • How do you communicate your belief?
  • Can you provide quantitative evidence beyond a
    gut feeling and subjective interpretation?

3
Recall Basics: Contingencies
4
Some Terms
5
Some More Terms
6
Accuracy
  • What does this mean?
  • What is the difference between accuracy and an
    accurate prediction?
  • Contingency Table Interpretation:
  • Accuracy = (True Positives + True Negatives) /
    (True Positives + True Negatives + False Positives + False Negatives)
  • Is this a good measure? (Why or Why Not?)
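As a concrete sketch (in Python rather than Matlab, and with made-up counts), accuracy falls straight out of the four contingency-table cells:

```python
# Hypothetical contingency-table counts (illustrative values only).
tp, tn, fp, fn = 40, 45, 10, 5

# Accuracy = (TP + TN) / (TP + TN + FP + FN)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 0.85
```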

7
Note on Discrete Classes
  • TRADITION: Show a contingency table when reporting
    the predictions of a model.
  • BUT probabilistic models do not provide
    discrete calculations of the matrix cells!!!
  • IN OTHER WORDS: Regression does not report the
    number of individuals predicted positive (e.g.
    has a heart attack) ... well, not really
  • INSTEAD: report the probability that the output will
    be a certain value (e.g. 1 or 0)

8
Visual Perspective
9
ROC Curves
  • Originated from signal detection theory
  • Binary signal corrupted by Gaussian noise
  • What is the optimal threshold (i.e. operating
    point)?
  • Dependence on 3 factors
  • Signal Strength
  • Noise Variance
  • Personal tolerance in Hit / False Alarm Rate

10
ROC Curves
  • Receiver operating characteristic
  • Summarizes and presents the performance of any binary
    classification model
  • A model's ability to distinguish between false and
    true positives

11
Use Multiple Contingency Tables
  • Sample contingency tables from a range of
    threshold probabilities.
  • TRUE POSITIVE RATE (also called SENSITIVITY):
  • True Positives /
    (True Positives + False Negatives)
  • FALSE POSITIVE RATE (also called 1 - SPECIFICITY):
  • False Positives /
    (False Positives + True Negatives)
  • Plot Sensitivity vs. (1 - Specificity) for the
    sampling and you are done
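The recipe above can be sketched in Python (the scores, labels, and thresholds below are invented for illustration):

```python
def roc_points(scores, labels, thresholds):
    """For each threshold, build a contingency table and return
    (false positive rate, true positive rate) pairs."""
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        tpr = tp / (tp + fn)  # sensitivity
        fpr = fp / (fp + tn)  # 1 - specificity
        points.append((fpr, tpr))
    return points

# Toy predicted probabilities and true classes (illustrative only).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,    1,   0,   0]
for fpr, tpr in roc_points(scores, labels, [0.0, 0.5, 1.1]):
    print(fpr, tpr)
```

Sweeping many thresholds and plotting the resulting (fpr, tpr) pairs traces out the ROC curve.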

12
Data-Centric Example
13
ROC Rates
14
ROC Plot
(Figure: ROC curves for the LOGISTIC and NEURAL models)
15
Sidebar: Use More Samples
(These are plots from a much larger dataset;
see Malin 2005)
16
ROC Quantification
  • Area Under ROC Curve
  • Use quadrature to calculate the area
  • e.g. trapz (trapezoidal rule) function in Matlab
    will work
  • Example: the Neural Network model appears to be
    better.
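The slide points to Matlab's trapz; here is a minimal Python sketch of the same trapezoidal rule (the function name and the toy curve points are mine, not from the lecture):

```python
def auc_trapezoid(fprs, tprs):
    """Trapezoidal-rule area under a ROC curve whose points are
    sorted by increasing false positive rate (like Matlab's trapz)."""
    area = 0.0
    for i in range(1, len(fprs)):
        width = fprs[i] - fprs[i - 1]
        area += width * (tprs[i] + tprs[i - 1]) / 2.0
    return area

# Toy curve: (0, 0) -> (0.5, 0.75) -> (1, 1); illustrative numbers only.
print(auc_trapezoid([0.0, 0.5, 1.0], [0.0, 0.75, 1.0]))  # 0.625
```

A random classifier's diagonal gives area 0.5, so anything meaningfully above 0.5 indicates discriminative power.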

17
Theory: Model Optimality
  • Classifiers on the convex hull are always optimal
  • e.g. Neural Net and Decision Tree
  • Classifiers below the convex hull are always
    suboptimal
  • e.g. Naïve Bayes

(Figure: ROC convex hull over the Decision Tree, Neural Net, and Naïve Bayes classifiers)
18
Building Better Classifiers
  • Classifiers on convex hull can be combined to
    form a strictly dominant hybrid classifier
  • an ordered sequence of classifiers can be
    converted into a ranker

(Figure: Decision Tree and Neural Net classifiers combined on the convex hull)
19
Some Statistical Insight
  • Curve Area:
  • Take a random healthy patient → score of X
  • Take a random heart attack patient → score of Y
  • Area = estimate of P(Y > X)
  • Slope of the curve is equal to the likelihood ratio:
  • P(score | Signal) / P(score | Noise)
  • ROC graph captures all information in the contingency
    table
  • False negative and true negative rates are
    complements of the true positive and false positive
    rates, respectively.
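The P(Y > X) interpretation can be checked directly by comparing every positive/negative pair of scores, which is the rank-based view of the area; all scores below are illustrative:

```python
def auc_pairwise(pos_scores, neg_scores):
    """Estimate P(Y > X): the probability that a random positive
    (e.g. heart attack) case scores above a random negative
    (healthy) case. Ties count half."""
    wins = 0.0
    for y in pos_scores:
        for x in neg_scores:
            if y > x:
                wins += 1.0
            elif y == x:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Illustrative scores for 3 positive and 3 negative cases.
print(auc_pairwise([0.9, 0.8, 0.6], [0.7, 0.4, 0.3]))  # 8/9 ~ 0.889
```

This is the same quantity the Mann-Whitney-Wilcoxon sum of ranks estimates, which is why that statistic appears in the further-reading list.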

20
Can Always Quantify Best Operating Point
  • When misclassification costs are equal, best
    operating point is
  • 45° tangent to the curve closest to the (0,1) coordinate
  • Verify this mathematically (economic
    interpretation)
  • Why?
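Under equal misclassification costs, "closest to (0,1)" reduces to a one-line search over the curve's points; a sketch with invented ROC points:

```python
import math

def best_operating_point(points):
    """With equal misclassification costs, pick the ROC point
    closest to the ideal (0, 1) corner (fpr = 0, tpr = 1)."""
    return min(points, key=lambda p: math.hypot(p[0], 1.0 - p[1]))

# Toy (fpr, tpr) points along a ROC curve; illustrative only.
pts = [(0.0, 0.0), (0.1, 0.6), (0.3, 0.9), (0.6, 0.95), (1.0, 1.0)]
print(best_operating_point(pts))  # (0.3, 0.9)
```

With unequal costs the criterion changes, which is the utility-theory question the next slide raises.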

21
Quick Question
  • Are ROC curves always appropriate?
  • Subjective operating points?
  • Must weigh the tradeoffs between false positives
    and false negatives
  • ROC curve plot is independent of the class
    distribution or error costs
  • This leads into utility theory (not touching this
    today)

22
Much Much More on ROC
  • Oh, if only I had more time.
  • You should also look up and learn about
  • Iso-accuracy lines
  • Skew distributions and why the 45° line isn't
    always best
  • Convexity vs. non-convexity vs. concavity
  • Mann-Whitney-Wilcoxon sum of ranks
  • Gini coefficient
  • Calibrated thresholds
  • Averaging ROC curves
  • Precision-Recall (THIS IS VERY IMPORTANT)
  • Cost Curves

23
Some References
  • Good Bibliography: http://splweb.bwh.harvard.edu:8000/pages/ppl/zou/roc.html
  • Drummond C and Holte R. What ROC curves can and
    can't do (and cost curves can). In Proceedings of
    the Workshop on ROC Analysis in AI, in
    conjunction with the European Conference on AI.
    Valencia, Spain. 2004.
  • Malin B. Probabilistic prediction of myocardial
    infarction: logistic regression versus simple
    neural networks. Data Privacy Lab Working Paper
    WP-25, School of Computer Science, Carnegie
    Mellon University. Sept 2005.
  • McNeil BJ, Hanley JA. Statistical approaches to
    the analysis of receiver operating characteristic
    (ROC) curves. Medical Decision Making. 1984; 4:
    137-50.
  • Provost F and Fawcett T. The case against
    accuracy estimation for comparing induction
    algorithms. In Proceedings of the 15th
    International Conference on Machine Learning.
    Madison, Wisconsin. 1998: 445-453.
  • Swets J. Measuring the accuracy of diagnostic
    systems. Science. 1988; 240(4857): 1285-1293.
    (based on his 1967 book Information Retrieval
    Systems)