Title: Model Selection and Assessment Using Cross-indexing
Slide 1: Model Selection and Assessment Using Cross-indexing
- Juha Reunanen
- ABB, Web Imaging Systems, Finland
Slide 2: Model Selection Using Cross-Validation
- Choose a search algorithm, for example hill climbing, grid search, or a genetic algorithm
- Evaluate the models using cross-validation
- Select the model that gives the best CV score (a minimal sketch of this loop follows below)
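One way to picture the loop above, assuming scikit-learn; the synthetic dataset and the SVC hyper-parameter grid are illustrative stand-ins, not the talk's actual setup:

```python
# Minimal sketch of CV-guided model selection: a plain grid as the
# (trivial) search algorithm, cross-validation as the evaluation,
# and the best CV score as the selection criterion.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

best_score, best_params = -1.0, None
for C in [0.1, 1.0, 10.0]:            # the "search" is exhaustive here
    for gamma in [0.01, 0.1, 1.0]:
        score = cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()
        if score > best_score:        # keep the model with the best CV score
            best_score, best_params = score, {"C": C, "gamma": gamma}

print(best_params, best_score)
```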
Slide 3: Multiple-Comparison Procedure
(D. D. Jensen and P. R. Cohen, Multiple Comparisons in Induction Algorithms, Machine Learning, volume 38, pages 309–338, 2000)
- Example: choosing an investment advisor
- Criterion: predict the stock market change (up/down) correctly for 11 out of 14 days
- You evaluate 10 candidates
- Your friend evaluates 30 candidates
- If everyone is just guessing, your probability of accepting someone is 0.253 and your friend's is 0.583 (the arithmetic is reproduced below)
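The slide's numbers follow from a binomial tail plus the screening probability; a few lines of plain Python reproduce them:

```python
from math import comb

# P(a guesser gets >= 11 of 14 daily up/down calls right)
p_single = sum(comb(14, k) for k in range(11, 15)) / 2**14
print(p_single)                    # ~0.0287

# P(at least one of n guessers passes the screen)
print(1 - (1 - p_single) ** 10)    # ~0.253 (you)
print(1 - (1 - p_single) ** 30)    # ~0.582 (your friend; the slide's 0.583
                                   #  comes from rounding p_single to 0.0287 first)
```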
Slide 4: The Problem
- Overfitting on the first level of inference: increasing model complexity may decrease the training error while the test error goes up
- Overfitting on the second level of inference: making the search more intense may decrease the CV error estimate, even if the test error actually goes up (see the simulation below)
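The second-level effect is easy to demonstrate with a toy simulation (my own illustration, not from the talk): every candidate model is a pure guesser with true error 0.5, and its CV estimate is that error plus noise. The best CV score keeps improving as the search evaluates more models, while the true error of the winner never moves:

```python
# Toy simulation of second-level overfitting.
import random

random.seed(0)

def best_cv_score(n_models):
    # CV estimate = true error 0.5 + zero-mean noise (std 0.05)
    return min(0.5 + random.gauss(0, 0.05) for _ in range(n_models))

for n in [1, 10, 100, 1000]:
    print(n, round(best_cv_score(n), 3))
# The minimum keeps dropping with n, yet every model's test error is 0.5.
```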
Slide 5: Overfitting Visualized
[Figure: overfitting visualized; x-axis: model complexity, or number of models evaluated]
Slide 6: Solutions
- First level of inference
  - Regularization: penalize complex models (see the ridge sketch below)
  - Model selection: welcome to the second level...
- Second level of inference
  - Regularization! (G. C. Cawley and N. L. C. Talbot, Preventing over-fitting during model selection via Bayesian regularisation of the hyper-parameters, Journal of Machine Learning Research, volume 8, pages 841–861, 2007)
  - Another layer of (cross-)validation...
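As a concrete instance of first-level regularization (an illustrative sketch, not the cited Bayesian approach): ridge regression penalizes large weights, and choosing the penalty strength lam is exactly the second-level problem the slide hands off to:

```python
# Closed-form ridge regression in plain numpy.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=30)

lam = 1.0  # regularization strength -- itself a hyper-parameter,
           # so selecting it belongs to the second level of inference
w = np.linalg.solve(X.T @ X + lam * np.eye(10), X.T @ y)
print(w)
```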
Slide 7: Another Layer of Validation
- A lot of variance: the estimate related to the winner gets biased (in the MCP sense)
- Cross-validation makes it smoother, but does not remove the problem (a nested-CV sketch follows below)
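A sketch of that extra layer, assuming scikit-learn's nested cross-validation idiom (the grid is again illustrative): the inner loop does the selection, and the outer folds are only ever used for assessment:

```python
# Nested CV: outer test folds are never touched by the inner,
# selection-time cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

inner = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}, cv=3)
outer_scores = cross_val_score(inner, X, y, cv=5)  # selection re-run per outer fold
print(outer_scores.mean())  # estimates the *procedure*, not one chosen model
```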
Slide 8: The Cross-indexing Trick
- Assume an outer loop of cross-validation using five folds
- Use (for example) three folds to determine the best depth, and the remaining two to assess it
- This essentially removes the multiple-comparison effect
- Revolve, and average (or create an ensemble); see the sketch below
- Previously shown to work in feature selection (Juha Reunanen, Less Biased Measurement of Feature Selection Benefits, SLSFS 2005, LNCS 3940, pages 198–208, 2006)
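One way to read the trick as code, assuming a precomputed score matrix; the matrix here is random noise, standing in for the real per-fold validation scores the search would produce at each depth, and revolving is done over all 3-of-5 fold subsets:

```python
# Cross-indexing sketch: scores[f, d] = validation score on fold f of the
# model the search had found after d steps ("depth" d).
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n_folds, n_depths = 5, 20
scores = rng.normal(loc=0.8, scale=0.05, size=(n_folds, n_depths))

estimates = []
for select in combinations(range(n_folds), 3):               # 3 folds pick the depth
    assess = [f for f in range(n_folds) if f not in select]  # 2 folds assess it
    best_depth = scores[list(select)].mean(axis=0).argmax()
    estimates.append(scores[assess][:, best_depth].mean())

print(np.mean(estimates))  # the assessing folds never voted for best_depth,
                           # so the multiple-comparison effect is avoided
```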
Slide 9: Competition Entries
- Stochastic search guided by cross-validation
- Several candidate models (and corresponding search processes running pseudo-parallel): Prepro + naiveBayes, PCA + kernelRidge, GS + kernelRidge, Prepro + linearSVC, Prepro + nonlinearSVC, Relief + neuralNet, RF, and Boosting (with neuralNet, SVC, and kernelRidge)
- Final selection and assessment using the cross-indexing criterion
Slide 10: Milestone Results
[Table: agnostic learning ranks as of December 1st, 2006. Yellow: CLOP models. CLOP prize winner: Juha Reunanen (both ave. rank and ave. BER). Best ave. BER held by Reference (Gavin Cawley).]
Slide 11: Models Selected
Slide 12: Conclusions
- Because of multiple-comparison procedures (MCPs) on the different levels of inference, validation is often used to estimate final performance
- On the second level, the cross-indexing trick may give estimates that are less biased (compared to straightforward outer-loop CV)