ROC Statistics for the Lazy Machine Learner in All of Us

1
ROC Statistics for the Lazy Machine Learner in All of Us
  • Bradley Malin
  • Lecture for COS Lab
  • School of Computer Science
  • Carnegie Mellon University
  • 9/22/2005

2
Why Should I Care?
  • Imagine you have 2 different probabilistic
    classification models
  • e.g. logistic regression vs. neural network
  • How do you know which one is better?
  • How do you communicate your belief?
  • Can you provide quantitative evidence beyond a
    gut feeling and subjective interpretation?

3
Recall Basics: Contingencies
4
Some Terms
5
Some More Terms
6
Accuracy
  • What does this mean?
  • What is the difference between accuracy and an
    accurate prediction?
  • Contingency Table Interpretation:
  • Accuracy = (True Positives + True Negatives) /
    (True Positives + True Negatives + False Positives + False Negatives)
  • Is this a good measure? (Why or Why Not?)
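As a concrete sketch (in Python rather than Matlab, and with made-up counts), accuracy falls straight out of the four contingency-table cells:

```python
# Hypothetical contingency-table counts (illustrative values only).
tp, tn, fp, fn = 40, 45, 10, 5

# Accuracy = (TP + TN) / (TP + TN + FP + FN)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 0.85
```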

7
Note on Discrete Classes
  • TRADITION: Show a contingency table when reporting
    the predictions of a model.
  • BUT probabilistic models do not provide
    discrete calculations of the matrix cells!!!
  • IN OTHER WORDS: Regression does not report the
    number of individuals predicted positive (e.g.
    has a heart attack) ... well, not really
  • INSTEAD: report the probability that the output will
    be a certain value (e.g. 1 or 0)

8
Visual Perspective
9
ROC Curves
  • Originated from signal detection theory
  • Binary signal corrupted by Gaussian noise
  • What is the optimal threshold (i.e. operating
    point)?
  • Dependence on 3 factors
  • Signal Strength
  • Noise Variance
  • Personal tolerance in Hit / False Alarm Rate

10
ROC Curves
  • Receiver operating characteristic
  • Summarizes and presents the performance of any binary
    classification model
  • A model's ability to distinguish between false and
    true positives

11
Use Multiple Contingency Tables
  • Sample contingency tables from a range of
    threshold probabilities.
  • TRUE POSITIVE RATE (also called SENSITIVITY):
  • True Positives /
    (True Positives + False Negatives)
  • FALSE POSITIVE RATE (also called 1 - SPECIFICITY):
  • False Positives /
    (False Positives + True Negatives)
  • Plot Sensitivity vs. (1 - Specificity) for the
    sampling and you are done
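The recipe above can be sketched in Python (the scores, labels, and thresholds below are invented for illustration):

```python
def roc_points(scores, labels, thresholds):
    """For each threshold, build a contingency table and return
    (false positive rate, true positive rate) pairs."""
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        tpr = tp / (tp + fn)  # sensitivity
        fpr = fp / (fp + tn)  # 1 - specificity
        points.append((fpr, tpr))
    return points

# Toy predicted probabilities and true classes (illustrative only).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,    1,   0,   0]
for fpr, tpr in roc_points(scores, labels, [0.0, 0.5, 1.1]):
    print(fpr, tpr)
```

Sweeping many thresholds and plotting the resulting (fpr, tpr) pairs traces out the ROC curve.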

12
Data-Centric Example
13
ROC Rates
14
ROC Plot
(Figure: ROC curves for the LOGISTIC and NEURAL models)
15
Sidebar: Use More Samples
(These are plots from a much larger dataset;
see Malin 2005)
16
ROC Quantification
  • Area Under ROC Curve
  • Use quadrature to calculate the area
  • e.g. trapz (trapezoidal rule) function in Matlab
    will work
  • Example: the Neural Network model appears to be
    better.
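The slide points to Matlab's trapz; here is a minimal Python sketch of the same trapezoidal rule (the function name and the toy curve points are mine, not from the lecture):

```python
def auc_trapezoid(fprs, tprs):
    """Trapezoidal-rule area under a ROC curve whose points are
    sorted by increasing false positive rate (like Matlab's trapz)."""
    area = 0.0
    for i in range(1, len(fprs)):
        width = fprs[i] - fprs[i - 1]
        area += width * (tprs[i] + tprs[i - 1]) / 2.0
    return area

# Toy curve: (0, 0) -> (0.5, 0.75) -> (1, 1); illustrative numbers only.
print(auc_trapezoid([0.0, 0.5, 1.0], [0.0, 0.75, 1.0]))  # 0.625
```

A random classifier's diagonal gives area 0.5, so anything meaningfully above 0.5 indicates discriminative power.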

17
Theory: Model Optimality
  • Classifiers on the convex hull are always optimal
  • e.g. Neural Net and Decision Tree
  • Classifiers below the convex hull are always
    suboptimal
  • e.g. Naïve Bayes

(Figure: ROC convex hull over the Decision Tree, Neural Net, and Naïve Bayes classifiers)
18
Building Better Classifiers
  • Classifiers on convex hull can be combined to
    form a strictly dominant hybrid classifier
  • an ordered sequence of classifiers can be
    converted into a ranker

(Figure: Decision Tree and Neural Net classifiers combined on the convex hull)
19
Some Statistical Insight
  • Curve Area:
  • Take a random healthy patient → score of X
  • Take a random heart attack patient → score of Y
  • Area = estimate of P(Y > X)
  • Slope of the curve is equal to the likelihood ratio:
  • P(score | Signal) / P(score | Noise)
  • ROC graph captures all information in the contingency
    table
  • False negative and true negative rates are
    complements of the true positive and false positive
    rates, respectively.
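The P(Y > X) interpretation can be checked directly by comparing every positive/negative pair of scores, which is the rank-based view of the area; all scores below are illustrative:

```python
def auc_pairwise(pos_scores, neg_scores):
    """Estimate P(Y > X): the probability that a random positive
    (e.g. heart attack) case scores above a random negative
    (healthy) case. Ties count half."""
    wins = 0.0
    for y in pos_scores:
        for x in neg_scores:
            if y > x:
                wins += 1.0
            elif y == x:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Illustrative scores for 3 positive and 3 negative cases.
print(auc_pairwise([0.9, 0.8, 0.6], [0.7, 0.4, 0.3]))  # 8/9 ~ 0.889
```

This is the same quantity the Mann-Whitney-Wilcoxon sum of ranks estimates, which is why that statistic appears in the further-reading list.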

20
Can Always Quantify Best Operating Point
  • When misclassification costs are equal, best
    operating point is
  • 45° tangent to the curve closest to the (0,1) coordinate
  • Verify this mathematically (economic
    interpretation)
  • Why?
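Under equal misclassification costs, "closest to (0,1)" reduces to a one-line search over the curve's points; a sketch with invented ROC points:

```python
import math

def best_operating_point(points):
    """With equal misclassification costs, pick the ROC point
    closest to the ideal (0, 1) corner (fpr = 0, tpr = 1)."""
    return min(points, key=lambda p: math.hypot(p[0], 1.0 - p[1]))

# Toy (fpr, tpr) points along a ROC curve; illustrative only.
pts = [(0.0, 0.0), (0.1, 0.6), (0.3, 0.9), (0.6, 0.95), (1.0, 1.0)]
print(best_operating_point(pts))  # (0.3, 0.9)
```

With unequal costs the criterion changes, which is the utility-theory question the next slide raises.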

21
Quick Question
  • Are ROC curves always appropriate?
  • Subjective operating points?
  • Must weigh the tradeoffs between false positives
    and false negatives
  • ROC curve plot is independent of the class
    distribution or error costs
  • This leads into utility theory (not touching this
    today)

22
Much Much More on ROC
  • Oh, if only I had more time.
  • You should also look up and learn about
  • Iso-accuracy lines
  • Skew distributions and why the 45° line isn't
    always best
  • Convexity vs. non-convexity vs. concavity
  • Mann-Whitney-Wilcoxon sum of ranks
  • Gini coefficient
  • Calibrated thresholds
  • Averaging ROC curves
  • Precision-Recall (THIS IS VERY IMPORTANT)
  • Cost Curves

23
Some References
  • Good Bibliography: http://splweb.bwh.harvard.edu:8000/pages/ppl/zou/roc.html
  • Drummond C and Holte R. What ROC curves can and
    can't do (and cost curves can). In Proceedings of
    the Workshop on ROC Analysis in AI, in
    conjunction with the European Conference on AI.
    Valencia, Spain. 2004.
  • Malin B. Probabilistic prediction of myocardial
    infarction: logistic regression versus simple
    neural networks. Data Privacy Lab Working Paper
    WP-25, School of Computer Science, Carnegie
    Mellon University. Sept 2005.
  • McNeil BJ, Hanley JA. Statistical approaches to
    the analysis of receiver operating characteristic
    (ROC) curves. Medical Decision Making. 1984; 4:
    137-50.
  • Provost F and Fawcett T. The case against
    accuracy estimation for comparing induction
    algorithms. In Proceedings of the 15th
    International Conference on Machine Learning.
    Madison, Wisconsin. 1998: 445-453.
  • Swets J. Measuring the accuracy of diagnostic
    systems. Science. 1988; 240(4857): 1285-1293.
    (based on his 1967 book Information Retrieval
    Systems)