Title: Assessing and Comparing Machine Learning Algorithms
1 Assessing and Comparing Machine Learning Algorithms
2 Learning Objectives
- Understand cross-validation and resampling methods.
- Understand how to measure error.
- Understand hypothesis testing.
- Understand how to compare the performance of classification algorithms.
- Understand how to assess the performance of a prediction algorithm.
- Understand how to assess the performance of a clustering algorithm.
3 Acknowledgements
- Some of these slides have been adapted from Ethem Alpaydin.
4 Introduction
- Questions
- Assessment of the expected error of a learning algorithm: is the error rate of 1-NN less than 2%?
- Comparing the expected errors of two algorithms: is k-NN more accurate than MLP?
- Training/validation/test sets
- Resampling methods: K-fold cross-validation
5 Algorithm Preference
- Criteria (application-dependent):
- Misclassification error, or risk (loss functions)
- Training time/space complexity
- Testing time/space complexity
- Interpretability
- Easy programmability
- Cost-sensitive learning
6 Learning Objectives
- Understand cross-validation and resampling methods.
- Understand how to measure error.
- Understand hypothesis testing.
- Understand how to compare the performance of classification algorithms.
- Understand how to assess the performance of a prediction algorithm.
- Understand how to assess the performance of a clustering algorithm.
7 Resampling and K-Fold Cross-Validation
- The need for multiple training/validation sets
- {X_i, V_i}_i: training/validation sets of fold i
- K-fold cross-validation: divide X into K parts, X_i, i = 1, ..., K
- Any two training sets T_i share K-2 parts
- Leave-one-out: |T_i| = N-1, |V_i| = 1
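A minimal sketch of K-fold cross-validation splitting and per-fold error estimation, assuming a hypothetical `train_and_score(train_idx, val_idx)` callback that fits a classifier on the training indices and returns its error rate on the validation indices:

```python
# K-fold cross-validation sketch; train_and_score is a hypothetical callback.
import numpy as np

def kfold_error_rates(n_samples, K, train_and_score, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)      # shuffle the instance indices once
    folds = np.array_split(idx, K)        # K roughly equal parts X_1, ..., X_K
    errors = []
    for i in range(K):
        val_idx = folds[i]                # V_i = X_i
        train_idx = np.concatenate([folds[j] for j in range(K) if j != i])  # T_i
        errors.append(train_and_score(train_idx, val_idx))
    return np.array(errors)               # one error rate p_i per fold
```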
8 5×2 Cross-Validation
- 5 times 2-fold cross-validation (Dietterich, 1998)
9 Bootstrapping
- Draw instances from a dataset with replacement
- Prob that we do not pick an instance after N draws: (1 - 1/N)^N ≈ e^(-1) = 0.368
- That is, only 36.8% is new!
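A quick numerical check of this limit (the numbers below are just the formula evaluated for a few N):

```python
# The probability that a given instance is never drawn in N draws with
# replacement is (1 - 1/N)^N, which approaches e^(-1) ≈ 0.368.
import math

for N in (10, 100, 1000, 10000):
    print(f"N = {N:>5}: (1 - 1/N)^N = {(1 - 1 / N) ** N:.4f}")
print(f"e^(-1)     = {math.exp(-1):.4f}")
```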
10 Learning Objectives
- Understand cross-validation and resampling methods.
- Understand how to measure error.
- Understand hypothesis testing.
- Understand how to compare the performance of classification algorithms.
- Understand how to assess the performance of a prediction algorithm.
- Understand how to assess the performance of a clustering algorithm.
11 Measuring Error
- Error rate = # of errors / # of instances = (FN + FP) / N
- Recall = # of found positives / # of positives = TP / (TP + FN) = sensitivity = hit rate
- Precision = # of found positives / # of found = TP / (TP + FP)
- Specificity = TN / (TN + FP)
- False alarm rate = FP / (FP + TN) = 1 - Specificity
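A minimal sketch computing these measures from confusion-matrix counts; the counts in the example call are made up for illustration:

```python
# Error measures computed from confusion-matrix counts (TP, FP, TN, FN).
def classification_measures(tp, fp, tn, fn):
    n = tp + fp + tn + fn
    return {
        "error_rate": (fn + fp) / n,
        "recall": tp / (tp + fn),            # sensitivity / hit rate
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "false_alarm_rate": fp / (fp + tn),  # = 1 - specificity
    }

print(classification_measures(tp=40, fp=10, tn=45, fn=5))
```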
12 ROC Curve
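The ROC curve plots the hit rate (TP rate) against the false alarm rate (FP rate) as the decision threshold on the classifier's score is varied. A minimal sketch with made-up scores and labels:

```python
# ROC sketch: sweep a threshold over made-up classifier scores and record the
# (false alarm rate, hit rate) point at each threshold.
import numpy as np

scores = np.array([0.95, 0.85, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10, 0.05])
labels = np.array([1,    1,    0,    1,    0,    1,    0,    0,    0,    0])

roc_points = []
for thr in sorted(set(scores), reverse=True):   # from strict to lenient
    pred = (scores >= thr).astype(int)
    tp = int(np.sum((pred == 1) & (labels == 1)))
    fp = int(np.sum((pred == 1) & (labels == 0)))
    fn = int(np.sum((pred == 0) & (labels == 1)))
    tn = int(np.sum((pred == 0) & (labels == 0)))
    roc_points.append((fp / (fp + tn), tp / (tp + fn)))  # (FP rate, TP rate)
print(roc_points)
```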
13 Learning Objectives
- Understand cross-validation and resampling methods.
- Understand how to measure error.
- Understand hypothesis testing.
- Understand how to compare the performance of classification algorithms.
- Understand how to assess the performance of a prediction algorithm.
- Understand how to assess the performance of a clustering algorithm.
14 Interval Estimation
- X = {x^t}_t where x^t ~ N(µ, σ²)
- m = Σ_t x^t / N ~ N(µ, σ²/N)
- 100(1 - α) percent confidence interval: P(m - z_{α/2} σ/√N < µ < m + z_{α/2} σ/√N) = 1 - α
15 When σ² is not known
- Use the sample variance S² = Σ_t (x^t - m)² / (N - 1)
- √N (m - µ)/S ~ t_{N-1}, giving the interval (m - t_{α/2,N-1} S/√N, m + t_{α/2,N-1} S/√N)
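A minimal sketch of both intervals on made-up data; the known σ used in the z interval is an assumed value:

```python
# 95% confidence intervals for the mean: z interval with an assumed known σ,
# and t interval using the sample standard deviation.
import numpy as np
from scipy import stats

x = np.array([2.1, 1.9, 2.4, 2.0, 2.2, 1.8, 2.3, 2.1])
N, m, alpha = len(x), x.mean(), 0.05

sigma = 0.2                                  # assumed known σ
z = stats.norm.ppf(1 - alpha / 2)
print("z interval:", (m - z * sigma / np.sqrt(N), m + z * sigma / np.sqrt(N)))

s = x.std(ddof=1)                            # sample standard deviation S
t = stats.t.ppf(1 - alpha / 2, df=N - 1)
print("t interval:", (m - t * s / np.sqrt(N), m + t * s / np.sqrt(N)))
```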
16 Hypothesis Testing
- Reject a null hypothesis if it is not supported by the sample with enough confidence
- X = {x^t}_t where x^t ~ N(µ, σ²)
- H0: µ = µ0 vs. H1: µ ≠ µ0
- Accept H0 with level of significance α if µ0 is in the 100(1 - α)% confidence interval, i.e., if √N (m - µ0)/σ ∈ (-z_{α/2}, z_{α/2})
- Two-sided test
17
- One-sided test: H0: µ ≤ µ0 vs. H1: µ > µ0
- Accept if √N (m - µ0)/σ ∈ (-∞, z_α)
- Variance unknown: use t instead of z
- Accept H0: µ = µ0 if √N (m - µ0)/S ∈ (-t_{α/2,N-1}, t_{α/2,N-1})
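A minimal sketch of the unknown-variance two-sided test on made-up data, checking √N (m - µ0)/S against ±t_{α/2,N-1} and cross-checking with scipy's one-sample t test:

```python
# Two-sided test with unknown variance: accept H0: µ = µ0 at significance α
# if the t statistic falls inside (-t_{α/2,N-1}, t_{α/2,N-1}).
import numpy as np
from scipy import stats

x = np.array([0.12, 0.08, 0.15, 0.10, 0.09, 0.13, 0.11, 0.14])  # made-up data
mu0, alpha = 0.10, 0.05

N, m, s = len(x), x.mean(), x.std(ddof=1)
t_stat = np.sqrt(N) * (m - mu0) / s
t_crit = stats.t.ppf(1 - alpha / 2, df=N - 1)
print("accept H0" if abs(t_stat) < t_crit else "reject H0")

print(stats.ttest_1samp(x, mu0))   # same decision via scipy's p-value
```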
18 Assessing Error: H0: p ≤ p0 vs. H1: p > p0
- Single training/validation set: Binomial Test
- If error prob is p0, the prob that there are e errors or fewer in N validation trials is P(X ≤ e) = Σ_{j=0}^{e} C(N, j) p0^j (1 - p0)^(N-j)
- Accept if this prob is less than 1 - α
- [Figure: binomial distribution of errors for N = 100, e = 20, with the 1 - α region shaded]
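A minimal sketch of this binomial test with the slide's example sizes (N = 100, e = 20); the claimed error probability p0 = 0.25 is an assumed value for illustration:

```python
# Binomial test sketch: P(X <= e) under error probability p0, compared to 1 - α.
from scipy import stats

N, e, p0, alpha = 100, 20, 0.25, 0.05     # p0 is assumed for illustration
prob = stats.binom.cdf(e, N, p0)          # P(X <= e | p = p0)
print(f"P(X <= {e}) = {prob:.3f}")
print("accept H0: p <= p0" if prob < 1 - alpha else "reject H0")
```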
19 Normal Approximation to the Binomial
- Number of errors X is approx normal with mean Np0 and var Np0(1 - p0), so (X - Np0)/√(Np0(1 - p0)) ~ Z
- Accept H0 if this statistic for X = e is less than z_{1-α}
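The same decision via the z statistic, again with N = 100, e = 20 and the assumed p0 = 0.25:

```python
# Normal approximation sketch: z = (e - N p0) / sqrt(N p0 (1 - p0)) vs. z_{1-α}.
import math
from scipy import stats

N, e, p0, alpha = 100, 20, 0.25, 0.05
z_stat = (e - N * p0) / math.sqrt(N * p0 * (1 - p0))
z_crit = stats.norm.ppf(1 - alpha)
print(f"z = {z_stat:.2f}, z_(1-alpha) = {z_crit:.2f}")
print("accept H0" if z_stat < z_crit else "reject H0")
```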
20 Paired t Test
- Multiple training/validation sets
- x_i^t = 1 if instance t is misclassified on fold i
- Error rate of fold i: p_i = Σ_t x_i^t / N
- With m and s² the average and variance of the p_i,
- we accept p0 or less error if √K (m - p0)/s is less than t_{α,K-1}
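A minimal sketch on made-up per-fold error rates:

```python
# Paired t test sketch over K fold error rates p_i: accept "p0 or less error"
# if √K (m - p0)/s < t_{α,K-1}.
import numpy as np
from scipy import stats

p = np.array([0.08, 0.11, 0.09, 0.12, 0.10, 0.07, 0.09, 0.11, 0.10, 0.08])
p0, alpha, K = 0.10, 0.05, len(p)

m, s = p.mean(), p.std(ddof=1)
t_stat = np.sqrt(K) * (m - p0) / s
t_crit = stats.t.ppf(1 - alpha, df=K - 1)
print("accept: error is p0 or less" if t_stat < t_crit else "reject")
```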
21 Learning Objectives
- Understand cross-validation and resampling methods.
- Understand how to measure error.
- Understand hypothesis testing.
- Understand how to compare the performance of classification algorithms.
- Understand how to assess the performance of a prediction algorithm.
- Understand how to assess the performance of a clustering algorithm.
22 Comparing Classifiers: H0: µ0 = µ1 vs. H1: µ0 ≠ µ1
- Single training/validation set: McNemar's Test
- e01: instances misclassified by classifier 1 but not by 2; e10: misclassified by 2 but not by 1
- Under H0, we expect e01 = e10 = (e01 + e10)/2
- Accept H0 if (|e01 - e10| - 1)² / (e01 + e10) < X²_{α,1}
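A minimal sketch with made-up disagreement counts:

```python
# McNemar's test sketch with the continuity-corrected statistic; e01 and e10
# are made-up counts of instances that exactly one classifier misclassifies.
from scipy import stats

e01, e10, alpha = 18, 7, 0.05
chi2_stat = (abs(e01 - e10) - 1) ** 2 / (e01 + e10)
chi2_crit = stats.chi2.ppf(1 - alpha, df=1)
print(f"statistic = {chi2_stat:.2f}, critical = {chi2_crit:.2f}")
print("accept H0: same error" if chi2_stat < chi2_crit else "reject H0")
```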
23 K-Fold CV Paired t Test
- Use K-fold cv to get K training/validation folds
- p_i^1, p_i^2: errors of classifiers 1 and 2 on fold i
- p_i = p_i^1 - p_i^2: paired difference on fold i
- The null hypothesis is that p_i has mean 0: accept H0 if √K · m / s is in (-t_{α/2,K-1}, t_{α/2,K-1})
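A minimal sketch on made-up per-fold error rates for two classifiers, cross-checked with scipy's paired t test:

```python
# K-fold cv paired t test sketch: paired differences p_i and a two-sided test
# of mean zero.
import numpy as np
from scipy import stats

p1 = np.array([0.12, 0.10, 0.14, 0.11, 0.13, 0.12, 0.10, 0.15, 0.11, 0.12])
p2 = np.array([0.10, 0.09, 0.12, 0.11, 0.10, 0.11, 0.09, 0.13, 0.10, 0.10])
alpha, K = 0.05, len(p1)

p = p1 - p2                                   # paired differences p_i
t_stat = np.sqrt(K) * p.mean() / p.std(ddof=1)
t_crit = stats.t.ppf(1 - alpha / 2, df=K - 1)
print("accept H0: same error" if abs(t_stat) < t_crit else "reject H0")

print(stats.ttest_rel(p1, p2))                # same decision via scipy
```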
24 5×2 cv Paired t Test
- Use 5×2 cv to get 2 folds of 5 training/validation replications (Dietterich, 1998)
- p_i^(j): difference between the errors of classifiers 1 and 2 on fold j = 1, 2 of replication i = 1, ..., 5
- t = p_1^(1) / √(Σ_i s_i² / 5) ~ t_5, where p̄_i = (p_i^(1) + p_i^(2))/2 and s_i² = (p_i^(1) - p̄_i)² + (p_i^(2) - p̄_i)²
- Two-sided test: accept H0: µ0 = µ1 if t is in (-t_{α/2,5}, t_{α/2,5})
- One-sided test: accept H0: µ0 ≤ µ1 if t < t_{α,5}
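A minimal sketch with made-up error differences p_i^(j):

```python
# 5x2 cv paired t test sketch (Dietterich, 1998); diff[i, j] holds p_i^(j).
import numpy as np
from scipy import stats

diff = np.array([[0.02, 0.01],
                 [0.03, 0.00],
                 [0.01, 0.02],
                 [0.02, 0.02],
                 [0.00, 0.01]])
alpha = 0.05

p_bar = diff.mean(axis=1)                                   # mean per replication
s2 = (diff[:, 0] - p_bar) ** 2 + (diff[:, 1] - p_bar) ** 2  # variance per replication
t_stat = diff[0, 0] / np.sqrt(s2.mean())                    # p_1^(1) / sqrt(sum s_i^2 / 5)
t_crit = stats.t.ppf(1 - alpha / 2, df=5)
print("accept H0: same error" if abs(t_stat) < t_crit else "reject H0")
```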
25 5×2 cv Paired F Test
- f = Σ_i Σ_j (p_i^(j))² / (2 Σ_i s_i²) ~ F_{10,5}
- Two-sided test: accept H0: µ0 = µ1 if f < F_{α,10,5}
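A minimal sketch using the same made-up differences as the t-test sketch above:

```python
# 5x2 cv paired F test sketch on made-up error differences p_i^(j).
import numpy as np
from scipy import stats

diff = np.array([[0.02, 0.01],
                 [0.03, 0.00],
                 [0.01, 0.02],
                 [0.02, 0.02],
                 [0.00, 0.01]])
alpha = 0.05

p_bar = diff.mean(axis=1)
s2 = (diff[:, 0] - p_bar) ** 2 + (diff[:, 1] - p_bar) ** 2
f_stat = (diff ** 2).sum() / (2 * s2.sum())
f_crit = stats.f.ppf(1 - alpha, dfn=10, dfd=5)
print("accept H0: same error" if f_stat < f_crit else "reject H0")
```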
26 Comparing L > 2 Algorithms: Analysis of Variance (ANOVA)
- Errors of L algorithms on K folds: X_ij, i = 1, ..., K, j = 1, ..., L; H0: µ1 = µ2 = ... = µL
- We construct two estimators of σ².
- One is valid only if H0 is true; the other is always valid.
- We reject H0 if the two estimators disagree.
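A minimal sketch using scipy's one-way ANOVA on made-up per-fold errors of three algorithms; the F statistic is the ratio of the between-algorithm and within-algorithm variance estimates:

```python
# One-way ANOVA sketch: per-fold errors of L = 3 algorithms on K = 10 folds.
import numpy as np
from scipy import stats

errors = [
    np.array([0.12, 0.10, 0.14, 0.11, 0.13, 0.12, 0.10, 0.15, 0.11, 0.12]),  # algorithm 1
    np.array([0.11, 0.09, 0.13, 0.12, 0.11, 0.10, 0.09, 0.14, 0.10, 0.11]),  # algorithm 2
    np.array([0.16, 0.15, 0.17, 0.14, 0.18, 0.16, 0.15, 0.19, 0.16, 0.17]),  # algorithm 3
]
f_stat, p_value = stats.f_oneway(*errors)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
print("reject H0: means differ" if p_value < 0.05 else "accept H0: means equal")
```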
29 Other Tests
- Range test (Newman-Keuls)
- Nonparametric tests (Sign test, Kruskal-Wallis)
- Contrasts: check if 1 and 2 differ from 3, 4, and 5
- Multiple comparisons require Bonferroni correction: if there are m tests, to have an overall significance of α, each test should have a significance of α/m.
- Regression: the CLT states that the sum of iid variables from any distribution is approximately normal, and the preceding methods can be used.
- Other loss functions?
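A minimal sketch of a Bonferroni correction over a list of made-up p-values:

```python
# Bonferroni correction sketch: with m tests and an overall significance of
# alpha, each individual test is judged at alpha / m.
alpha = 0.05
p_values = [0.004, 0.020, 0.049, 0.300]   # made-up p-values
m = len(p_values)
for p in p_values:
    verdict = "significant" if p < alpha / m else "not significant"
    print(f"p = {p:.3f}: {verdict} at alpha/m = {alpha / m:.4f}")
```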
30 Learning Objectives
- Understand cross-validation and resampling methods.
- Understand how to measure error.
- Understand hypothesis testing.
- Understand how to compare the performance of classification algorithms.
- Understand how to assess the performance of a prediction algorithm.
- Understand how to assess the performance of a clustering algorithm.
31 Prediction Assessment
- As in classification
- Performance evaluation
- Cross-validation
- Difference
- Error rate is not appropriate
- Performance measures for prediction
- Mean squared error
- Root mean squared error
- Mean absolute error
- Relative squared error
- Root relative squared error
- Relative absolute error
- Correlation coefficient
32 Prediction Assessment
- Performance measures for prediction (p: predicted values, a: actual values)
- Mean squared error
- Root mean squared error
- Mean absolute error
- Relative squared error
- Root relative squared error
- Relative absolute error
- Correlation coefficient
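A minimal sketch computing these measures on made-up predictions, using the usual definitions (the relative measures compare against always predicting the mean of the actual values), since the slide's formulas are not reproduced here:

```python
# Regression performance measures from predicted values p and actual values a.
import numpy as np

p = np.array([2.5, 0.0, 2.1, 7.8, 5.1])   # predicted values (made up)
a = np.array([3.0, -0.5, 2.0, 7.0, 5.5])  # actual values (made up)
a_mean = a.mean()

mse = np.mean((p - a) ** 2)                                 # mean squared error
rmse = np.sqrt(mse)                                         # root mean squared error
mae = np.mean(np.abs(p - a))                                # mean absolute error
rse = np.sum((p - a) ** 2) / np.sum((a - a_mean) ** 2)      # relative squared error
rrse = np.sqrt(rse)                                         # root relative squared error
rae = np.sum(np.abs(p - a)) / np.sum(np.abs(a - a_mean))    # relative absolute error
corr = np.corrcoef(p, a)[0, 1]                              # correlation coefficient

print(dict(mse=mse, rmse=rmse, mae=mae, rse=rse, rrse=rrse, rae=rae, corr=corr))
```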
33 Prediction Assessment
- Performance measures for prediction
- Minimize the error measures and maximize the correlation coefficient
- Significance tests are applied to a performance measure (e.g., the mean squared error)
34 Learning Objectives
- Understand cross-validation and resampling methods.
- Understand how to measure error.
- Understand hypothesis testing.
- Understand how to compare the performance of classification algorithms.
- Understand how to assess the performance of a prediction algorithm.
- Understand how to assess the performance of a clustering algorithm.
35 Minimum Description Length Principle
- MDL (the minimum description length principle) states that the best theory for some data is the one that minimizes the size of the model and also minimizes the amount of information necessary to specify the exceptions relative to the theory.
- Choose the theory that minimizes L(T) + L(E|T), where L(T) is the number of bits to code the theory and L(E|T) is the number of bits to code the training set given the theory.
36 Clustering Assessment
- Clustering assessment
- Evaluate how well the clusters found match predefined classes (supervised method).
- Evaluate usefulness in the application context.
- Evaluate by the minimum description length principle: the best clustering supports the most efficient encoding of the samples by the clusters.
- Example
- Encode the cluster centers.
- For each sample, code the cluster it belongs to and its displacement/coordinates from the cluster center.
- The better the clustering fits the data, the more compact the representation will be.
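A rough, illustrative sketch of this idea under an assumed coding scheme (log2(k) bits per sample for its cluster id, a crude fixed-precision code for its displacement from the center, and a fixed cost per center coordinate); this is only a proxy for comparing clusterings, not a precise MDL computation:

```python
# Rough MDL-style comparison of two clusterings under an assumed coding scheme.
import numpy as np

def description_length(X, centers, labels, precision=0.01):
    k = len(centers)
    id_bits = len(X) * np.log2(k) if k > 1 else 0.0       # cluster id per sample
    disp = np.linalg.norm(X - centers[labels], axis=1)
    disp_bits = np.sum(np.log2(1.0 + disp / precision))   # crude displacement code
    center_bits = centers.size * 32                       # 32 bits per coordinate
    return id_bits + disp_bits + center_bits

# Two well-separated blobs of made-up data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(3.0, 0.1, (50, 2))])

good_centers = np.array([[0.0, 0.0], [3.0, 3.0]])
good_labels = np.r_[np.zeros(50, int), np.ones(50, int)]
bad_centers = np.array([[1.5, 1.5]])
bad_labels = np.zeros(100, int)

# The clustering that fits the data better yields the shorter description.
print("2 clusters:", description_length(X, good_centers, good_labels))
print("1 cluster :", description_length(X, bad_centers, bad_labels))
```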