Model Selection and Assessment Using Cross-indexing

1
Model Selection and Assessment Using Cross-indexing
  • Juha Reunanen
  • ABB, Web Imaging Systems, Finland

2
Model Selection Using Cross-Validation
  • Choose a search algorithm: for example,
    hill-climbing, grid search, or a genetic algorithm
  • Evaluate the candidate models using cross-validation
  • Select the model that gives the best CV score
    (a minimal sketch follows below)
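
A minimal sketch of this selection loop, assuming scikit-learn and a plain
grid search over an SVM's C parameter (an illustration only, not the CLOP
setup used in the deck):

    # Sketch: grid search over C, scored by 5-fold cross-validation;
    # the value with the best CV score is selected.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=20, random_state=0)

    best_score, best_C = -np.inf, None
    for C in [0.01, 0.1, 1.0, 10.0, 100.0]:     # the "search algorithm" (here: grid search)
        score = cross_val_score(SVC(C=C), X, y, cv=5).mean()
        if score > best_score:
            best_score, best_C = score, C

    print(f"selected C={best_C}, CV accuracy={best_score:.3f}")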

3
Multiple-Comparison Procedure (D. D. Jensen and
P. R. Cohen: Multiple Comparisons in Induction
Algorithms, Machine Learning, volume 38, pages
309-338, 2000)
  • Example: choosing an investment advisor
  • Criterion: predict the stock market change (+/-)
    correctly for 11 out of 14 days
  • You evaluate 10 candidates
  • Your friend evaluates 30 candidates
  • If everyone is just guessing, your probability of
    accepting a candidate is 0.253, your friend's is
    0.583 (the arithmetic is checked below)
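
These numbers can be checked directly: a guessing advisor passes the
criterion with probability p = P(X >= 11) for X ~ Binomial(14, 0.5), and the
chance that at least one of n evaluated candidates passes is 1 - (1 - p)^n.
A short check using only the Python standard library:

    # Verify the acceptance probabilities in Jensen & Cohen's example.
    from math import comb

    # Probability that a guessing advisor gets at least 11 of 14 days right.
    p_pass = sum(comb(14, k) for k in range(11, 15)) / 2**14   # about 0.0287

    for n in (10, 30):          # candidates evaluated by you / by your friend
        p_accept = 1 - (1 - p_pass) ** n
        print(f"{n} candidates: P(accept at least one) = {p_accept:.3f}")
    # Prints about 0.253 for 10 candidates and about 0.583 for 30.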

4
The Problem
  • Overfitting on the first level of inference:
    increasing model complexity may decrease the
    training error while the test error goes up
  • Overfitting on the second level of inference:
    making the search more intense may decrease the
    CV error estimate, even if the test error would
    actually go up

5
Overfitting Visualized
  • (Figure omitted; x-axis: model complexity, or
    number of models evaluated)

6
Solutions
  • First level of inference
    • Regularization: penalize complex models
    • Model selection: welcome to the second level...
  • Second level of inference
    • Regularization! (G. C. Cawley and N. L. C.
      Talbot: Preventing over-fitting during model
      selection via Bayesian regularisation of the
      hyper-parameters, Journal of Machine Learning
      Research, volume 8, pages 841-861, 2007)
    • Another layer of (cross-)validation...

7
Another Layer of Validation
  • A lot of variance; the estimate related to the
    winner gets biased (in the MCP sense)
  • Cross-validation makes it smoother, but does not
    remove the problem (a straightforward nested-CV
    sketch follows below)
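
For reference, the straightforward extra layer is ordinary nested
cross-validation: the selection search is repeated inside each outer
training set, and the held-out outer fold only scores the model that was
already chosen. A sketch under the same scikit-learn assumptions as before:

    # Sketch: an outer CV loop wrapped around the CV-guided selection.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import KFold, cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=20, random_state=0)

    outer_scores = []
    for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        # Inner selection: run the search (here a grid over C) on the outer training data only.
        best_score, best_C = -np.inf, None
        for C in [0.01, 0.1, 1.0, 10.0, 100.0]:
            score = cross_val_score(SVC(C=C), X[train_idx], y[train_idx], cv=3).mean()
            if score > best_score:
                best_score, best_C = score, C
        # Outer assessment: the held-out fold never influenced the selection.
        model = SVC(C=best_C).fit(X[train_idx], y[train_idx])
        outer_scores.append(model.score(X[test_idx], y[test_idx]))

    print(f"outer-loop estimate: {np.mean(outer_scores):.3f}")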

8
The Cross-indexing Trick
  • Assume an outer loop of cross-validation using
    five folds
  • Use (for example) three folds to determine the
    best search depth, and the remaining two to
    assess it
  • This essentially removes the multiple-comparison
    effect
  • Revolve the roles of the folds, and average (or
    create an ensemble); see the sketch below
  • Previously shown to work in feature selection
    (Juha Reunanen: Less Biased Measurement of
    Feature Selection Benefits, SLSFS 2005, LNCS
    3940, pages 198-208, 2006)
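
A sketch of the trick, under one assumption that is not spelled out in the
slides: the search has already produced a matrix scores[fold, depth] giving,
for every outer fold, the validation score of the model found after a given
number of search steps (the random array below merely stands in for that
output). Three folds pick the depth, the other two assess it, the roles
revolve, and the assessments are averaged:

    # Sketch of the cross-indexing criterion over a 5-fold outer loop.
    import itertools
    import numpy as np

    rng = np.random.default_rng(0)
    scores = rng.uniform(0.7, 0.9, size=(5, 20))    # placeholder for scores[fold, depth]

    n_folds = scores.shape[0]
    estimates = []
    for assess in itertools.combinations(range(n_folds), 2):      # two folds assess...
        select = [f for f in range(n_folds) if f not in assess]   # ...three folds select
        best_depth = scores[select].mean(axis=0).argmax()         # depth chosen without the assess folds
        estimates.append(scores[list(assess), best_depth].mean()) # score at that depth on held-out folds
    print(f"cross-indexing estimate: {np.mean(estimates):.3f}")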

9
Competition Entries
  • Stochastic search guided by cross-validation
  • Several candidate models (and the corresponding
    search processes running pseudo-parallel):
    Prepro+naiveBayes, PCA+kernelRidge, GS+kernelRidge,
    Prepro+linearSVC, Prepro+nonlinearSVC,
    Relief+neuralNet, RF, and Boosting (with
    neuralNet, SVC and kernelRidge)
  • Final selection and assessment using the
    cross-indexing criterion

10
Milestone Results
Agnostic learning ranks as of December 1st, 2006
(yellow: CLOP model). CLOP prize winner: Juha
Reunanen (both ave. rank and ave. BER). Best ave.
BER held by Reference (Gavin Cawley).
11
Models Selected
12
Conclusions
  • Because of multiple-comparison procedures (MCPs)
    on the different levels of inference, validation
    is often used to estimate final performance
  • On the second level, the cross-indexing trick may
    give estimates that are less biased (compared to
    straightforward outer-loop CV)