1
Statistical Comparison of Two Learning Algorithms
  • Presented by
  • Payam Refaeilzadeh

2
Overview
  • How can we tell if one algorithm can learn better
    than another?
  • Design an experiment to measure the accuracy of
    the two algorithms.
  • Run multiple trials.
  • Compare the samples, not just their means: run a
    statistically sound test on the two samples.
  • Is any observed difference significant, i.e. due
    to a true difference between the algorithms, or
    is it just natural variation in the measurements?

3
Statistical Hypothesis Testing
  • Statistical Hypothesis: A statement about the
    parameters of one or more populations
  • Hypothesis Testing: A procedure for deciding
    whether to accept or reject the hypothesis
  • Identify the parameter of interest
  • State a null hypothesis, H0
  • Specify an alternate hypothesis, H1
  • Choose a significance level α
  • State an appropriate test statistic

4
Statistical Hypothesis Testing (cont.)
  • Null Hypothesis (H0): A statement presumed to be
    true until statistical evidence shows otherwise
  • Usually specifies an exact value for a parameter
  • Example: H0: µ = 30 kg (see the sketch after
    this list)
  • Alternate Hypothesis (H1): Accepted if the null
    hypothesis is rejected
  • Test Statistic: A statistic calculated from the
    measurements of a random sample / experiment
  • A test statistic is assumed to follow a
    particular distribution (normal, t, chi-square,
    etc.)
  • That distribution can be used to test the
    significance of the calculated test statistic.
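
To make the procedure concrete, here is a minimal Python sketch of
the µ = 30 kg example, using scipy.stats; the sample weights are
made-up illustration data, not from the presentation.

    # One-sample t-test of H0: mu = 30 kg at significance level alpha = 0.05.
    # The sample weights below are hypothetical illustration data.
    from scipy import stats

    weights = [29.1, 30.4, 28.7, 31.2, 29.8, 30.9, 28.5, 29.6]
    alpha = 0.05                                  # chosen significance level

    t_stat, p_value = stats.ttest_1samp(weights, popmean=30.0)
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
    if p_value < alpha:
        print("Reject H0: the mean differs from 30 kg.")
    else:
        print("Fail to reject H0.")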

5
Error in Hypothesis Testing
  • Type I error occurs when H0 is rejected but it is
    in fact true
  • P(Type I error)a or significance level
  • Type II error occurs when we fail to reject H0
    but it is in fact false
  • P(Type II error)ß
  • power 1-ß Probability of correctly rejecting
    H0
  • power ability to distinguish between the two
    populations
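
Both error rates can be checked empirically. Below is a hedged Monte
Carlo sketch for a one-sample t-test; the normal populations, sample
size, and trial count are arbitrary illustration choices.

    # Estimate Type I error rate and power by simulation, alpha = 0.05.
    # All distribution parameters here are arbitrary illustration values.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    alpha, n, trials = 0.05, 30, 10_000

    def rejection_rate(true_mean):
        # Fraction of simulated experiments in which H0: mu = 0 is rejected.
        rejections = 0
        for _ in range(trials):
            sample = rng.normal(loc=true_mean, scale=1.0, size=n)
            _, p = stats.ttest_1samp(sample, popmean=0.0)
            rejections += p < alpha
        return rejections / trials

    print("Type I error rate (H0 true):", rejection_rate(0.0))  # close to alpha
    print("Power (true mean = 0.5):    ", rejection_rate(0.5))  # close to 1 - beta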

6
Paired t-Test
  • Collect data in pairs
  • Example: Given a training set DTrain and a test
    set DTest, train both learning algorithms on
    DTrain and then test their accuracies on DTest.
  • Suppose n paired measurements have been made
  • Assume:
  • The n measurements are independent
  • The measurements for each algorithm follow a
    normal distribution
  • The test statistic t0 will then follow a
    t-distribution with n − 1 degrees of freedom

7
Paired t-Test (cont.)

  Trial   Algorithm 1 Accuracy (X1)   Algorithm 2 Accuracy (X2)
  1       X11                         X21
  2       X12                         X22
  ...     ...                         ...
  n       X1n                         X2n

  Assume X1 follows N(µ1, σ1²) and X2 follows N(µ2, σ2²).
  Let µD = µ1 − µ2 and Di = X1i − X2i for i = 1, 2, ..., n.

  Null Hypothesis: H0: µD = Δ0 (typically Δ0 = 0)

  Test Statistic: t0 = D̄ / (sD / √n), where D̄ and sD are the
  sample mean and sample standard deviation of the differences Di

  Rejection Criteria (at significance level α):
  H1: µD ≠ Δ0   reject H0 if |t0| > t(α/2, n−1)
  H1: µD > Δ0   reject H0 if t0 > t(α, n−1)
  H1: µD < Δ0   reject H0 if t0 < −t(α, n−1)
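
A minimal sketch of this computation in Python, assuming Δ0 = 0 and
the two-sided H1; the paired accuracy values are hypothetical.
scipy.stats.ttest_rel gives the same statistic directly.

    # Paired t-test: t0 = mean(D) / (s_D / sqrt(n)), testing H0: mu_D = 0.
    # The paired accuracies are hypothetical illustration data.
    import numpy as np
    from scipy import stats

    x1 = np.array([0.81, 0.79, 0.84, 0.80, 0.77])  # algorithm 1 accuracies
    x2 = np.array([0.78, 0.76, 0.80, 0.79, 0.75])  # algorithm 2 accuracies (paired)

    d = x1 - x2                                    # differences D_i
    n = len(d)
    t0 = d.mean() / (d.std(ddof=1) / np.sqrt(n))   # test statistic

    t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 1)   # two-sided critical value
    print(f"t0 = {t0:.3f}, reject H0: {abs(t0) > t_crit}")
    # Equivalent built-in: stats.ttest_rel(x1, x2)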
8
Cross Validated t-test
  • Paired t-test on the 10 paired accuracies
    obtained from 10-fold cross-validation (sketched
    below)
  • Advantages:
  • Large training set size
  • Most powerful (Dietterich, 1998)
  • Disadvantages:
  • Accuracy results are not independent (the
    training sets overlap)
  • Somewhat elevated probability of Type I error
    (Dietterich, 1998)

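A sketch of the procedure using scikit-learn; the dataset and the
two classifiers are arbitrary stand-ins for "algorithm 1" and
"algorithm 2", chosen only for illustration.

    # Paired t-test on the 10 per-fold accuracies from 10-fold CV.
    # Dataset and classifiers are arbitrary illustration stand-ins.
    from scipy import stats
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import KFold
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    folds = KFold(n_splits=10, shuffle=True, random_state=0)

    acc1, acc2 = [], []
    for train_idx, test_idx in folds.split(X):
        # Train both algorithms on the same fold so the accuracies are paired.
        a1 = GaussianNB().fit(X[train_idx], y[train_idx])
        a2 = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
        acc1.append(a1.score(X[test_idx], y[test_idx]))
        acc2.append(a2.score(X[test_idx], y[test_idx]))

    t0, p = stats.ttest_rel(acc1, acc2)   # df = 10 - 1 = 9
    print(f"t0 = {t0:.3f}, p = {p:.3f}")
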
9
5x2 Cross-Validated t-test
  • Run 2-fold cross-validation 5 times
  • Use the accuracy difference from the first fold
    of the first replication to estimate the mean
    difference
  • Use the results from all 10 folds to estimate
    the variance (see the sketch after this list)
  • Advantage:
  • Lowest Type I error (Dietterich, 1998)
  • Disadvantage:
  • Not as powerful as the 10-fold cross-validated
    t-test (Dietterich, 1998)
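
A sketch of the 5x2cv statistic from Dietterich (1998):
t = p1(1) / sqrt((1/5) Σ si²), which follows a t-distribution with
5 degrees of freedom under H0. Dataset and classifiers are again
arbitrary stand-ins.

    # Dietterich's 5x2cv paired t-test. Dataset and classifiers are
    # arbitrary illustration stand-ins for the two learning algorithms.
    import numpy as np
    from scipy import stats
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    def acc_diff(X_tr, y_tr, X_te, y_te):
        # Accuracy difference between the two algorithms on one fold.
        a1 = GaussianNB().fit(X_tr, y_tr).score(X_te, y_te)
        a2 = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr).score(X_te, y_te)
        return a1 - a2

    p11, variances = None, []
    for rep in range(5):
        # One replication of 2-fold CV: each half is used once for
        # training and once for testing.
        Xa, Xb, ya, yb = train_test_split(X, y, test_size=0.5, random_state=rep)
        p1 = acc_diff(Xa, ya, Xb, yb)
        p2 = acc_diff(Xb, yb, Xa, ya)
        if rep == 0:
            p11 = p1                       # difference from the very first fold
        p_bar = (p1 + p2) / 2
        variances.append((p1 - p_bar) ** 2 + (p2 - p_bar) ** 2)

    t = p11 / np.sqrt(np.mean(variances))  # ~ t with 5 df under H0
    print(f"t = {t:.3f}, p = {2 * stats.t.sf(abs(t), df=5):.3f}")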

10
Re-sampled t-test
  • Randomly divide the data into train / test sets
    (usually 2/3 train, 1/3 test)
  • Run multiple trials (usually 30)
  • Perform a paired t-test on the per-trial
    accuracies
  • This test has a very high probability of Type I
    error and should never be used.

11
Calibrated Tests
  • Bouckaert (ICML 2003)
  • It is very difficult to estimate the true degrees
    of freedom because the independence assumptions
    are violated
  • Instead of correcting the mean-difference
    estimate, calibrate the degrees of freedom
  • Recommendation: use 10-times repeated 10-fold
    cross-validation with 10 degrees of freedom (a
    hedged sketch follows below)
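
One reading of that recommendation, sketched under the assumption
that the statistic is the usual t computed over all 100 paired
per-fold differences but referred to a t-distribution with the
calibrated 10 degrees of freedom; see Bouckaert (2003) for the exact
formulation. Dataset and classifiers are again arbitrary stand-ins.

    # Hedged sketch of the calibrated test: 10x repeated 10-fold CV,
    # t statistic over all 100 paired differences, df calibrated to 10.
    import numpy as np
    from scipy import stats
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import RepeatedKFold
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    cv = RepeatedKFold(n_splits=10, n_repeats=10, random_state=0)

    diffs = []
    for train_idx, test_idx in cv.split(X):
        a1 = GaussianNB().fit(X[train_idx], y[train_idx])
        a2 = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
        diffs.append(a1.score(X[test_idx], y[test_idx])
                     - a2.score(X[test_idx], y[test_idx]))

    diffs = np.asarray(diffs)                        # 100 paired differences
    t = diffs.mean() / (diffs.std(ddof=1) / np.sqrt(len(diffs)))
    p_value = 2 * stats.t.sf(abs(t), df=10)          # calibrated df
    print(f"t = {t:.3f}, p = {p_value:.3f}")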

12
References
  • R. R. Bouckaert. Choosing between two learning
    algorithms based on calibrated tests. In
    Proceedings of ICML 2003, pp. 51-58.
  • T. G. Dietterich. Approximate statistical tests
    for comparing supervised classification learning
    algorithms. Neural Computation, 10(7):1895-1924,
    1998.
  • D. C. Montgomery et al. Engineering Statistics,
    2nd edition. Wiley, 2001.