RESULTS OF THE NIPS 2006

Slides: 40

Provided by: Isabell47

1

RESULTS OF THE NIPS 2006
MODEL SELECTION GAME
Isabelle Guyon, Amir Saffari, Gideon Dror,
Gavin Cawley, Olivier Guyon,
and many other volunteers, see
http://www.agnostic.inf.ethz.ch/credits.php

2
Thanks
3
Part I

INTRODUCTION

4
Model selection

Selecting models (neural net, decision tree, SVM,
...)
Selecting hyperparameters (number of hidden
units, weight decay/ridge, kernel parameters, ...)
Selecting variables or features (space
dimensionality reduction).
Selecting patterns (data cleaning, data
reduction, e.g. by clustering).

5
Performance prediction challenge

How good are you at predicting
how good you are?
Practically important in pilot studies.
Good performance predictions render model
selection trivial.

6
Model Selection Game

Find which model works best
in a well controlled environment.
A given sandbox: the CLOP Matlab toolbox.
Focus only on devising model selection strategy.
Same datasets as the performance prediction
challenge, but reshuffled
Two $500 prizes offered.

7
Agnostic Learning vs. Prior Knowledge challenge

When everything else fails,
ask for additional domain knowledge
Two tracks:
Agnostic learning: Preprocessed datasets in a
nice feature-based representation, but no
knowledge about the identity of the features.
Prior knowledge: Raw data, sometimes not in a
feature-based representation. Information given
about the nature and structure of the data.

8
Game rules

Date started: October 1st, 2006.
Date ended: December 1st, 2006.
Duration: 2 months.
Submit in Agnostic track only.
Optionally use CLOP or Spider.
Five last complete entries ranked.
Total ALvsPK challenge entrants: 22.
Total ALvsPK development entries: 546.
Number of game ranked participants: 10.
Number of game ranked submissions: 39.

9
Datasets
Dataset  Domain          Type           Features  Training examples  Validation examples  Test examples
ADA      Marketing       Dense                48               4147                  415          41471
GINA     Digits          Dense               970               3153                  315          31532
HIVA     Drug discovery  Dense              1617               3845                  384          38449
NOVA     Text classif.   Sparse binary     16969               1754                  175          17537
SYLVA    Ecology         Dense               216              13086                 1308         130858
http://www.agnostic.inf.ethz.ch
10
Baseline BER distribution (Performance prediction
challenge, 145 entrants)
[Figure: distribution of test BER.]
11
Agnostic track on Dec. 1st 2006

Yellow: used a CLOP model.
CLOP prize winner: Juha Reunanen
(both ave. rank and ave. BER).
Best ave. BER still held by Reference (Gavin
Cawley) with the_bad.

12
Part II

PROTOCOL and SCORING

13
Protocol

Data split: training/validation/test.
Data proportions: 10/1/100.
Online feed-back on validation data.
Validation label release: not yet; one month
before end of challenge.
Final ranking on test data using the five last
complete submissions for each entrant.
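
The 10/1/100 proportions can be illustrated with a minimal Python sketch. This is not the organizers' code: the function name `split_indices`, the shuffling, and the fixed seed are assumptions made for illustration only.

```python
import random

def split_indices(n, proportions=(10, 1, 100), seed=0):
    """Shuffle range(n) and cut it into training/validation/test index lists.

    Hypothetical helper: the challenge data were split by the organizers;
    this only shows what a 10/1/100 ratio means in practice.
    """
    total = sum(proportions)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = n * proportions[0] // total
    n_valid = n * proportions[1] // total
    train = idx[:n_train]
    valid = idx[n_train:n_train + n_valid]
    test = idx[n_train + n_valid:]  # the remainder goes to the test set
    return train, valid, test

train, valid, test = split_indices(1110)
print(len(train), len(valid), len(test))  # 100 10 1000
```

The test set is ten times larger than the training set, so the final ranking rests on far more examples than the online validation feed-back does.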

14
Performance metrics

Balanced Error Rate (BER): average of error rates
of positive class and negative class.
Area Under the ROC Curve (AUC).
Guess error (for the performance prediction
challenge only):
dBER = abs(testBER - guessedBER)
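
A minimal Python sketch of these three quantities, assuming labels coded as +1 / -1 and real-valued scores. The function names and the toy data are illustrative, not part of the challenge software.

```python
def balanced_error_rate(y_true, y_pred):
    """Average of the error rates on the positive (+1) and negative (-1) classes."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == -1]
    err_pos = sum(p != 1 for p in pos) / len(pos)
    err_neg = sum(p != -1 for p in neg) / len(neg)
    return 0.5 * (err_pos + err_neg)

def auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == -1]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def guess_error(test_ber, guessed_ber):
    """dBER = abs(testBER - guessedBER)."""
    return abs(test_ber - guessed_ber)

# Toy unbalanced problem: 2 positives, 8 negatives.
y_true = [1, 1, -1, -1, -1, -1, -1, -1, -1, -1]
scores = [0.9, 0.4, 0.1, 0.2, 0.3, 0.35, 0.5, 0.05, 0.15, 0.25]
y_pred = [1 if s > 0.45 else -1 for s in scores]  # arbitrary threshold

ber = balanced_error_rate(y_true, y_pred)
print(ber)                     # 0.3125 (plain error rate would be 0.2)
print(auc(y_true, scores))     # 0.9375
print(guess_error(ber, 0.25))  # 0.0625
```

On unbalanced data the BER penalizes errors on the rare class more heavily than the plain error rate does, which is why the two numbers differ in the toy example.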

15
CLOP

CLOP = Challenge Learning Object Package.
Based on the Spider developed at the Max Planck
Institute.
Two basic abstractions:
Data object
Model object

http://www.agnostic.inf.ethz.ch/models.php
16
CLOP tutorial
At the Matlab prompt:

D = data(X, Y);
hyper = {'degree=3', 'shrinkage=0.1'};
model = kridge(hyper);
[resu, model] = train(model, D);
tresu = test(model, testD);
model = chain({standardize, kridge(hyper)});

17
CLOP models
18
Preprocessing and FS
19
Model grouping
for k = 1:10
    base_model{k} = chain({standardize, naive});
end
my_model = ensemble(base_model);
20
Part III

RESULT ANALYSIS

21
What did we expect?

Learn about new competitive machine learning
techniques.
Identify competitive methods of performance
prediction, model selection, and ensemble
learning (theory put into practice).
Drive research in the direction of refining such
methods (on-going benchmark).

22
Method comparison (PPC)
Agnostic track: no significant improvement so far.
[Figure axes: dBER, test BER.]
23
LS-SVM
Gavin Cawley, July 2006
24
Logitboost
Roman Lutz, July 2006
25
CLOP models (best entrant)

Juha Reunanen, cross-indexing-7

sns = shiftnscale, std = standardize, norm =
normalize (some details of hyperparameters not
shown).
26
CLOP models (2nd best entrant)

Hugo Jair Escalante Balderas, BRun2311062

sns = shiftnscale, std = standardize, norm =
normalize (some details of hyperparameters not
shown). Note: entry Boosting_1_001_x900 gave
better results, but was older.
27
Danger of overfitting (PPC)
[Figure: BER (0 to 0.5) vs. time (0 to 160 days) for HIVA, ADA, NOVA, GINA, and SYLVA. Full line: test BER. Dashed line: validation BER.]
28
Two best CLOP entrants (game)
[Figure: ave. test BER vs. time for H. Jair Escalante and Juha Reunanen.]
Statistically significant difference for 3/5
datasets.
29
Stats / CV / bounds ???
30
Top ranking methods

Performance prediction
CV with many splits: 90% train / 10% validation
Nested CV loops
Model selection
Performance prediction challenge
Use of a single model family
Regularized risk / Bayesian priors
Ensemble methods
Nested CV loops, computationally efficient
with VLOO
Model selection game
Cross-indexing
Particle swarm
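
To make "CV with many splits" concrete, here is a minimal Python sketch of repeated 90% / 10% hold-out used to guess the BER. Everything in it is an assumption for illustration: the toy 1-D nearest-mean classifier, the stratified sampling, the number of splits, and the synthetic data. It is not any entrant's actual method.

```python
import random

def ber(y_true, y_pred):
    """Balanced error rate for labels coded +1 / -1."""
    errs = []
    for c in (1, -1):
        preds = [p for t, p in zip(y_true, y_pred) if t == c]
        errs.append(sum(p != c for p in preds) / len(preds))
    return sum(errs) / 2

def fit_nearest_mean(X, y):
    """Toy 1-D classifier: the model is just the two class means."""
    mean_pos = sum(x for x, t in zip(X, y) if t == 1) / y.count(1)
    mean_neg = sum(x for x, t in zip(X, y) if t == -1) / y.count(-1)
    return mean_pos, mean_neg

def predict(model, X):
    mean_pos, mean_neg = model
    return [1 if abs(x - mean_pos) <= abs(x - mean_neg) else -1 for x in X]

def repeated_holdout_ber(X, y, n_splits=50, valid_fraction=0.1, seed=0):
    """Average validation BER over many stratified 90/10 splits."""
    rng = random.Random(seed)
    by_class = {c: [i for i, t in enumerate(y) if t == c] for c in (1, -1)}
    bers = []
    for _ in range(n_splits):
        valid = []
        for idx in by_class.values():  # hold out 10% of each class
            k = max(1, round(valid_fraction * len(idx)))
            valid += rng.sample(idx, k)
        held_out = set(valid)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit_nearest_mean([X[i] for i in train], [y[i] for i in train])
        y_hat = predict(model, [X[i] for i in valid])
        bers.append(ber([y[i] for i in valid], y_hat))
    return sum(bers) / len(bers), bers

# Synthetic data: two well-separated Gaussian classes.
data_rng = random.Random(1)
X = [data_rng.gauss(2, 1) for _ in range(100)] + [data_rng.gauss(-2, 1) for _ in range(100)]
y = [1] * 100 + [-1] * 100

guessed_ber, per_split = repeated_holdout_ber(X, y)
print(f"guessed BER over {len(per_split)} splits: {guessed_ber:.3f}")
```

Averaging over many splits reduces the variance of the guess. In a nested scheme, the same loop would be run inside each outer training set to choose hyperparameters, so that the outer validation BER stays unbiased by the selection.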

31
Part IV

COMPETE NOW
in the
PRIOR KNOWLEDGE TRACK

32
ADA

ADA is the marketing database
Task: Discover high revenue people from census
data. Two-class pb.
Source: Census bureau, Adult database from the
UCI machine-learning repository.
Features: 14 original attributes including age,
workclass, education, marital status,
occupation, native country. Continuous, binary
and categorical features.

33
GINA
GINA is the digit database

Task: Handwritten digit recognition. Separate the
odd from the even digits. Two-class pb. with
heterogeneous classes.
Source: MNIST database formatted by LeCun and
Cortes.
Features: 28x28 pixel map.

34
HIVA

HIVA is the HIV database
Task: Find compounds active against the AIDS HIV
infection. We brought it back to a two-class pb.
(active vs. inactive), but provide the original
labels (active, moderately active, and inactive).
Data source: National Cancer Inst.
Data representation: The compounds are
represented by their 3D molecular structure.

35
NOVA
Subject: Re: Goalie masks
Lines: 21
Tom Barrasso wore a great mask, one time, last
season. He unveiled it at a game in Boston.
It was all black, with Pgh city scenes on it.
The "Golden Triangle" graced the top, along with
a steel mill on one side and the Civic Arena on
the other. On the back of the helmet was the
old Pens' logo, the current (at the time)
Pens logo, and a space for the "new" logo.
A great mask done in by a goalie's superstition.
Lori

NOVA is the text classification database
Task: Classify newsgroup emails into politics or
religion vs. other topics.
Source: The 20-Newsgroup dataset from the UCI
machine-learning repository.
Data representation: The raw text with an
estimated 17000 words of vocabulary.

36
SYLVA

SYLVA is the ecology database
Task: Classify forest cover types into Ponderosa
pine vs. everything else.
Source: US Forest Service (USFS).
Data representation: Forest cover type for 30 x
30 meter cells encoded with 108 features
(elevation, hill shade, wilderness type, soil
type, etc.)

37
How to enter?

Enter results on any dataset in either track
until March 1st 2007 at
http://www.agnostic.inf.ethz.ch.
Only complete entries (on 5 datasets) will be
ranked. The 5 last will count.
Seven prizes:
Best overall agnostic entry.
Best overall prior knowledge entry.
Best prior knowledge result in each dataset (5
prizes).
Best paper.

38
Conclusions

Lower participation volume than in the previous
challenges:
Entry level higher.
Other on-going competitions.
Top methods in agnostic track as before:
LS-SVMs and boosted logistic trees.
Top ranking entries closely followed by CLOP
entries, showing great advances in model
selection.
To do: upgrade CLOP with LS-SVMs and logitboost.

39
Open problems

Bridge the gap between theory and practice
What are the best estimators of the variance of
CV?
What should k be in k-fold?
Are other cross-validation methods better than
k-fold (e.g. bootstrap, 5x2CV)?
Are there better hybrid methods?
What search strategies are best?
More than 2 levels of inference?
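
For readers unfamiliar with the schemes named above, here is a minimal Python sketch of how k-fold and 5x2CV partition the data. It only generates index splits and does not answer any of the open questions. The function names, the shuffling, and the seeds are illustrative assumptions.

```python
import random

def k_fold_splits(n, k, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def five_by_two_splits(n, seed=0):
    """5x2CV: five repetitions of 2-fold CV, each with a fresh shuffle."""
    for rep in range(5):
        yield from k_fold_splits(n, 2, seed=seed + rep)

for train, valid in k_fold_splits(10, 5):
    print(len(train), len(valid))  # 8 2 on every line

print(len(list(five_by_two_splits(10))))  # 10
```

A larger k leaves more data for training in each fold but makes the training sets overlap more. 5x2CV trades training-set size for folds that do not overlap within a repetition.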
