Title: RESULTS OF THE NIPS 2006
1- RESULTS OF THE NIPS 2006
- MODEL SELECTION GAME
- Isabelle Guyon, Amir Saffari, Gideon Dror,
- Gavin Cawley, Olivier Guyon,
- and many other volunteers, see http://www.agnostic.inf.ethz.ch/credits.php
2Thanks
3Part I
4Model selection
- Selecting models (neural net, decision tree, SVM, ...)
- Selecting hyperparameters (number of hidden units, weight decay/ridge, kernel parameters, ...)
- Selecting variables or features (space dimensionality reduction)
- Selecting patterns (data cleaning, data reduction, e.g. by clustering)
5Performance prediction challenge
- How good are you at predicting how good you are?
- Practically important in pilot studies.
- Good performance predictions render model
selection trivial.
6Model Selection Game
- Find which model works best in a well-controlled environment.
- A given sandbox: the CLOP Matlab toolbox.
- Focus only on devising a model selection strategy.
- Same datasets as the performance prediction challenge, but reshuffled.
- Two $500 prizes offered.
7Agnostic Learning vs. Prior Knowledge challenge
- When everything else fails, ask for additional domain knowledge.
- Two tracks:
  - Agnostic learning: preprocessed datasets in a nice feature-based representation, but no knowledge about the identity of the features.
  - Prior knowledge: raw data, sometimes not in a feature-based representation. Information given about the nature and structure of the data.
8Game rules
- Date started: October 1st, 2006.
- Date ended: December 1st, 2006.
- Duration: 2 months.
- Submit in Agnostic track only.
- Optionally use CLOP or Spider.
- Five last complete entries ranked.
- Total ALvsPK challenge entrants: 22.
- Total ALvsPK development entries: 546.
- Number of game ranked participants: 10.
- Number of game ranked submissions: 39.
9Datasets
Type           Dataset  Domain          Features  Training examples  Validation examples  Test examples
Dense          ADA      Marketing       48        4147               415                  41471
Dense          GINA     Digits          970       3153               315                  31532
Dense          HIVA     Drug discovery  1617      3845               384                  38449
Sparse binary  NOVA     Text classif.   16969     1754               175                  17537
Dense          SYLVA    Ecology         216       13086              1308                 130858
http://www.agnostic.inf.ethz.ch
10Baseline BER distribution (Performance prediction challenge, 145 entrants)
[Figure: distribution of baseline BER; axis: Test BER]
11Agnostic track on Dec. 1st 2006
- Yellow: used a CLOP model.
- CLOP prize winner: Juha Reunanen (both ave. rank and ave. BER).
- Best ave. BER still held by Reference (Gavin Cawley) with the_bad.
12Part II
13Protocol
- Data split: training/validation/test.
- Data proportions: 10/1/100.
- Online feedback on validation data.
- Validation label release: not yet; one month before the end of the challenge.
- Final ranking on test data using the five last complete submissions for each entrant.
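The 10/1/100 proportions above can be illustrated with a minimal Python sketch. The function name `split_by_proportions`, the seed, and the rounding rule are assumptions made for illustration; the challenge's actual splitting procedure is not described on the slide.

```python
import random

def split_by_proportions(n_examples, proportions=(10, 1, 100), seed=0):
    """Shuffle indices 0..n_examples-1 and split them in the given proportions.

    Hypothetical helper: sizes are rounded down for the first two parts and
    the remainder goes to the last part (the test set).
    """
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    total = sum(proportions)
    n_train = n_examples * proportions[0] // total
    n_valid = n_examples * proportions[1] // total
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test

# 46033 is the total number of ADA examples listed in the datasets table
train, valid, test = split_by_proportions(46033)
print(len(train), len(valid), len(test))
```

With this rounding rule the sizes come out close to, but not exactly equal to, the ADA counts in the datasets table, which suggests the organizers rounded differently.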
14Performance metrics
- Balanced Error Rate (BER): average of the error rates of the positive class and the negative class.
- Area Under the ROC Curve (AUC).
- Guess error (for the performance prediction challenge only): dBER = abs(testBER - guessedBER)
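The three metrics above can be sketched in a few lines of Python. This is an illustration, not the challenge's scoring code: the function names are hypothetical, labels are assumed to be coded +1/-1, and the AUC is computed with the pairwise-ranking definition (ties count one half).

```python
def balanced_error_rate(y_true, y_pred):
    """Average of the error rates on the positive (+1) and negative (-1) classes."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == -1]
    err_pos = sum(p != 1 for p in pos) / len(pos)
    err_neg = sum(p != -1 for p in neg) / len(neg)
    return 0.5 * (err_pos + err_neg)

def auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == -1]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def guess_error(test_ber, guessed_ber):
    """dBER = abs(testBER - guessedBER)."""
    return abs(test_ber - guessed_ber)

# Toy example with unbalanced classes (2 positives, 4 negatives)
y_true = [1, 1, -1, -1, -1, -1]
y_pred = [1, -1, -1, -1, -1, 1]
scores = [0.9, 0.4, 0.1, 0.2, 0.3, 0.8]

ber = balanced_error_rate(y_true, y_pred)  # (1/2 + 1/4) / 2 = 0.375
area = auc(y_true, scores)                 # 7 of 8 pairs ranked correctly
delta = guess_error(ber, 0.30)             # guessed BER of 0.30 is made up
print(ber, area, delta)
```

In the toy example the plain error rate is 2/6, while the BER is 0.375, because the BER weights the small positive class as heavily as the large negative class.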
15CLOP
- CLOP = Challenge Learning Object Package.
- Based on the Spider developed at the Max Planck Institute.
- Two basic abstractions:
  - Data object
  - Model object
http://www.agnostic.inf.ethz.ch/models.php
16CLOP tutorial
At the Matlab prompt:
- D = data(X, Y);
- hyper = {'degree=3', 'shrinkage=0.1'};
- model = kridge(hyper);
- [resu, model] = train(model, D);
- tresu = test(model, testD);
- model = chain({standardize, kridge(hyper)});
17CLOP models
18Preprocessing and FS
19Model grouping
for k = 1:10
    base_model{k} = chain({standardize, naive});
end
my_model = ensemble(base_model);
20Part III
21What did we expect?
- Learn about new competitive machine learning techniques.
- Identify competitive methods of performance prediction, model selection, and ensemble learning (theory put into practice).
- Drive research in the direction of refining such methods (on-going benchmark).
22Method comparison (PPC)
Agnostic track: no significant improvement so far.
[Figure: method comparison plot; axes: dBER and Test BER]
23LS-SVM
Gavin Cawley, July 2006
24Logitboost
Roman Lutz, July 2006
25CLOP models (best entrant)
Juha Reunanen, cross-indexing-7
sns = shiftnscale, std = standardize, norm = normalize (some details of hyperparameters not shown)
26CLOP models (2nd best entrant)
Hugo Jair Escalante Balderas, BRun2311062
sns = shiftnscale, std = standardize, norm = normalize (some details of hyperparameters not shown)
Note: entry Boosting_1_001_x900 gave better results, but was older.
27Danger of overfitting (PPC)
Full line: test BER. Dashed line: validation BER.
[Figure: BER (0 to 0.5) vs. time (0 to 160 days) for ADA, GINA, HIVA, NOVA, and SYLVA]
28Two best CLOP entrants (game)
[Figure: ave. test BER vs. time for H._Jair_Escalante and Juha Reunanen]
Statistically significant difference for 3/5 datasets.
29Stats / CV / bounds ???
30Top ranking methods
- Performance prediction
  - CV with many splits: 90% train / 10% validation
  - Nested CV loops
- Model selection
  - Performance prediction challenge
    - Use of a single model family
    - Regularized risk / Bayesian priors
    - Ensemble methods
    - Nested CV loops, computationally efficient with VLOO
  - Model selection game
    - Cross-indexing
    - Particle swarm
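As an illustration of "CV with many splits" for performance prediction, here is a minimal Python sketch that averages the validation BER over repeated random 90/10 splits. Everything in it is an assumption made for the example (the toy 1-D threshold classifier, the synthetic Gaussian data, the number of splits, the function names); it does not reproduce any entrant's method.

```python
import random

def balanced_error_rate(y_true, y_pred):
    """Mean of the per-class error rates, over the classes present in y_true."""
    rates = []
    for label in (1, -1):
        preds = [p for t, p in zip(y_true, y_pred) if t == label]
        if preds:  # a small validation split may miss a class
            rates.append(sum(p != label for p in preds) / len(preds))
    return sum(rates) / len(rates)

def fit_threshold(xs, ys):
    """Toy 1-D classifier: threshold halfway between the two class means."""
    pos = [x for x, t in zip(xs, ys) if t == 1]
    neg = [x for x, t in zip(xs, ys) if t == -1]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    direction = 1 if mean_pos >= mean_neg else -1
    return (mean_pos + mean_neg) / 2, direction

def predict_threshold(model, x):
    threshold, direction = model
    return direction if x > threshold else -direction

def repeated_split_ber(xs, ys, n_splits=20, valid_fraction=0.1, seed=0):
    """Average validation BER over many random 90/10 train/validation splits."""
    rng = random.Random(seed)
    n_valid = max(1, round(valid_fraction * len(xs)))
    bers = []
    for _ in range(n_splits):
        idx = list(range(len(xs)))
        rng.shuffle(idx)
        valid, train = idx[:n_valid], idx[n_valid:]
        model = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        preds = [predict_threshold(model, xs[i]) for i in valid]
        bers.append(balanced_error_rate([ys[i] for i in valid], preds))
    return sum(bers) / len(bers)

# Synthetic data: two Gaussian classes centred at +2 and -2 (made up)
data_rng = random.Random(1)
ys = [1] * 100 + [-1] * 100
xs = [data_rng.gauss(2.0 * t, 1.0) for t in ys]
estimate = repeated_split_ber(xs, ys)
print(f"estimated BER: {estimate:.3f}")
```

The averaged value plays the role of the "guessed BER" in the performance prediction challenge; a nested loop would wrap a second, inner split around `fit_threshold` to select hyperparameters.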
31Part IV
- COMPETE NOW in the PRIOR KNOWLEDGE TRACK
32ADA
- ADA is the marketing database
- Task: Discover high revenue people from census data. Two-class pb.
- Source: Census bureau, Adult database from the UCI machine-learning repository.
- Features: 14 original attributes including age, workclass, education, marital status, occupation, native country. Continuous, binary and categorical features.
33GINA
- GINA is the digit database
- Task: Handwritten digit recognition. Separate the odd from the even digits. Two-class pb. with heterogeneous classes.
- Source: MNIST database formatted by LeCun and Cortes.
- Features: 28x28 pixel map.
34HIVA
- HIVA is the HIV database
- Task: Find compounds active against the AIDS HIV infection. We brought it back to a two-class pb. (active vs. inactive), but provide the original labels (active, moderately active, and inactive).
- Data source: National Cancer Inst.
- Data representation: The compounds are represented by their 3D molecular structure.
35NOVA
Subject: Re: Goalie masks
Lines: 21
Tom Barrasso wore a great mask, one time, last season. He unveiled it at a game in Boston. It was all black, with Pgh city scenes on it. The "Golden Triangle" graced the top, along with a steel mill on one side and the Civic Arena on the other. On the back of the helmet was the old Pens' logo, the current (at the time) Pens logo, and a space for the "new" logo. A great mask done in by a goalie's superstition.
Lori
- NOVA is the text classification database
- Task: Classify newsgroup emails into politics or religion vs. other topics.
- Source: The 20-Newsgroup dataset from the UCI machine-learning repository.
- Data representation: The raw text with an estimated 17000 words of vocabulary.
36SYLVA
- SYLVA is the ecology database
- Task: Classify forest cover types into Ponderosa pine vs. everything else.
- Source: US Forest Service (USFS).
- Data representation: Forest cover type for 30 x 30 meter cells encoded with 108 features (elevation, hill shade, wilderness type, soil type, etc.)
37How to enter?
- Enter results on any dataset in either track until March 1st 2007 at http://www.agnostic.inf.ethz.ch.
- Only complete entries (on 5 datasets) will be ranked. The 5 last will count.
- Seven prizes:
  - Best overall agnostic entry.
  - Best overall prior knowledge entry.
  - Best prior knowledge result in each dataset (5 prizes).
  - Best paper.
38Conclusions
- Lower participation volume than in the previous challenges:
  - Entry level higher
  - Other on-going competitions
- Top methods in agnostic track as before:
  - LS-SVMs and boosted logistic trees
- Top ranking entries closely followed by CLOP entries, showing great advances in model selection.
- Todo: upgrade CLOP with LS-SVMs and logitboost.
39Open problems
- Bridge the gap between theory and practice
- What are the best estimators of the variance of CV?
- What should k be in k-fold?
- Are other cross-validation methods better than k-fold (e.g. bootstrap, 5x2CV)?
- Are there better hybrid methods?
- What search strategies are best?
- More than 2 levels of inference?
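To make the k-fold vs. 5x2CV question concrete, the following Python sketch generates the train/validation index splits for both schemes. The function names and the shuffling seed are assumptions for illustration; the sketch shows only how the two schemes partition the data and says nothing about which one estimates performance better.

```python
import random

def k_fold_splits(n_examples, k, seed=0):
    """Yield (train, valid) index lists for one shuffled k-fold partition."""
    idx = list(range(n_examples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i, valid in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, valid

def five_by_two_cv_splits(n_examples, seed=0):
    """Yield (train, valid) index lists for 5 replications of 2-fold CV."""
    rng = random.Random(seed)
    for _ in range(5):
        idx = list(range(n_examples))
        rng.shuffle(idx)
        half = n_examples // 2
        first, second = idx[:half], idx[half:]
        yield first, second  # train on the first half, validate on the second
        yield second, first  # then swap the roles

kfold = list(k_fold_splits(10, k=5))
five_two = list(five_by_two_cv_splits(10))
print(len(kfold), len(five_two))
```

In k-fold each example is validated exactly once and training sets overlap heavily; in 5x2CV each example is validated five times and the two training sets within a replication do not overlap at all.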