Title: Lab 1
1Lab 1
Getting started with Basic Learning
Machines and the Overfitting Problem
2Lab 1
Polynomial regression
3Matlab POLY_GUI
- The code implements the ridge regression algorithm:
  w = argmin_w Σi (yi − f(xi))² + γ ‖w‖²
- f(x) = w1 x + w2 x² + … + wn xⁿ = wᵀx̄
- x̄ = [x, x², …, xⁿ]ᵀ
- w = X⁺ Y
- X⁺ = Xᵀ(XXᵀ + γI)⁻¹ = (XᵀX + γI)⁻¹Xᵀ
- X = [x̄(1); x̄(2); …; x̄(p)] (a p × n matrix)
- The leave-one-out (LOO) error is obtained with the PRESS statistic (Predicted REsidual Sum of Squares):
  LOO error = (1/p) Σk [rk / (1 − (XX⁺)kk)]²
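The closed-form solution and the PRESS identity above can be checked numerically. Here is a minimal NumPy sketch (not the poly_gui code itself; the function name and variables are mine):

```python
# Ridge regression via its closed form, plus the PRESS leave-one-out error.
# Sketch only: X is the (p, n) matrix of polynomial features, Y the targets.
import numpy as np

def ridge_press(X, Y, gamma):
    p, n = X.shape
    # X+ = (X'X + gamma I)^(-1) X'  (the shrunk pseudo-inverse from the slide)
    Xpinv = np.linalg.solve(X.T @ X + gamma * np.eye(n), X.T)
    w = Xpinv @ Y                     # ridge weights, w = X+ Y
    r = Y - X @ w                     # training residuals
    H = X @ Xpinv                     # "hat" matrix X X+
    # PRESS: LOO error = (1/p) sum_k ( r_k / (1 - H_kk) )^2
    loo = np.mean((r / (1.0 - np.diag(H))) ** 2)
    return w, loo
```

The PRESS trick gives the exact leave-one-out error of ridge regression without retraining p times.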
5Matlab POLY_GUI
- At the prompt, type poly_gui
- Vary the parameters. Refrain from hitting CV. Explain what happens in the following situations:
  - Sample num. << Target degree (small noise)
  - Large noise, small sample num.
  - Target degree << Model degree
- Why is the LOO error sometimes larger than the training and test error?
- Are there local minima in the LOO error? Is the LOO error flat near the optimum?
- Propose ways of getting a better solution.
6CLOP Data Objects
The poly_gui emulates CLOP objects of type data
- X = rand(10,5);
- Y = rand(10,1);
- D = data(X, Y); % constructor
- methods(D)
- get_x(D)
- get_y(D)
- plot(D)
7CLOP Model Objects
poly_ridge is a model object.
- P = poly_ridge; h = plot(P);
- D = gene(P); plot(D, h);
- [resu, P] = train(P, D);
- mse(resu)
- Dt = gene(P);
- [tresu, P] = test(P, Dt);
- mse(tresu)
- plot(P, h);
8Lab 1
Support Vector Machines
9Support Vector Classifier
(Boser, Guyon and Vapnik, 1992)
10Matlab SVC_GUI
- At the prompt, type svc_gui
- The code implements the Support Vector Machine algorithm with kernel:
  k(s, t) = (1 + s·t)^q exp(−γ ‖s − t‖²)
- Regularization similar to ridge regression:
  - Hinge loss: L(xi) = max(0, 1 − yi f(xi))
  - Empirical risk: Σi L(xi)
  - w = argmin_w (1/C) ‖w‖² + Σi L(xi)
    (the (1/C) ‖w‖² term is the shrinkage)
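The pieces above fit together as follows; a small sketch that only evaluates the regularized objective (it does not solve the SVM problem; names are mine):

```python
# Evaluate the SVC objective (1/C)||w||^2 + sum_i max(0, 1 - y_i f(x_i))
# for a linear f(x) = w.x. Illustrative sketch only.
import numpy as np

def svc_objective(w, X, Y, C):
    margins = Y * (X @ w)                   # functional margins y_i f(x_i)
    hinge = np.maximum(0.0, 1.0 - margins)  # hinge loss per example
    return (1.0 / C) * (w @ w) + hinge.sum()
```

Only examples with margin below 1 (including misclassified ones) contribute to the loss term; the ‖w‖² term plays the same shrinkage role as γ in ridge regression.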
11Lab 1
More loss functions
12Loss Functions
13Exercise Gradient Descent
- Linear discriminant: f(x) = Σj wj xj
- Functional margin: z = y f(x), y = ±1
- Compute ∂z/∂wj
- Derive the learning rules Δwj = −η ∂L/∂wj corresponding to the following loss functions:
  - SVC loss: max(0, 1 − z)
  - Adaboost loss: e^(−z)
  - Square loss: (1 − z)²
  - Logistic loss: log(1 + e^(−z))
  - Perceptron loss: max(0, −z)
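As a worked hint, the chain rule gives Δwj = −η (dL/dz) · y xj, since ∂z/∂wj = y xj. A sketch of the resulting updates (my derivation, worth re-deriving by hand before use):

```python
# Per-example gradient updates Delta_w = -eta * dL/dz * y * x
# for the loss functions listed on the slide (z = y f(x)).
import numpy as np

def dL_dz(loss, z):
    if loss == 'svc':        # max(0, 1 - z): subgradient
        return -1.0 if z < 1 else 0.0
    if loss == 'adaboost':   # exp(-z)
        return -np.exp(-z)
    if loss == 'square':     # (1 - z)^2
        return -2.0 * (1.0 - z)
    if loss == 'logistic':   # log(1 + exp(-z))
        return -np.exp(-z) / (1.0 + np.exp(-z))
    if loss == 'perceptron': # max(0, -z): subgradient
        return -1.0 if z < 0 else 0.0

def delta_w(loss, w, x, y, eta):
    z = y * (w @ x)                       # functional margin
    return -eta * dL_dz(loss, z) * y * x  # chain rule
```

Note how the SVC and perceptron rules only fire when the margin constraint is violated, while the other losses update on every example.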
14Exercise Dual Algorithms
- From the Δwj, derive the Δw.
- w = Σi αi xi
- From the Δw, derive the Δαi of the dual algorithms.
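As one concrete instance (my worked example, assuming the perceptron loss): the primal mistake-driven step Δw = η yk xk becomes Δαk = η yk once w is written as Σi αi xi, and f(xk) can be computed from dot products alone:

```python
# Dual perceptron step: with w = sum_i alpha_i x_i, a primal update
# w += eta * y_k * x_k is equivalent to alpha_k += eta * y_k.
import numpy as np

def dual_perceptron_step(alpha, X, Y, k, eta=1.0):
    # f(x_k) = sum_i alpha_i (x_i . x_k): only dot products are needed,
    # which is what makes the kernel trick possible.
    f_k = alpha @ (X @ X[k])
    if Y[k] * f_k <= 0:           # mistake: perceptron loss is active
        alpha = alpha.copy()
        alpha[k] += eta * Y[k]    # dual counterpart of w += eta*y_k*x_k
    return alpha
```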
15Summary
- Modern ML algorithms optimize a penalized risk
functional
16Lab 2
Getting started with CLOP
17Lab 2
CLOP tutorial
18What is CLOP?
- CLOP = Challenge Learning Object Package.
- Based on the Spider package developed at the Max Planck Institute.
- Two basic abstractions:
  - Data object
  - Model object
- Put the CLOP directory in your path.
- At the prompt, type use_spider_clop
- If you have used poly_gui before, type clear classes
19CLOP Data Objects
At the Matlab prompt:
- addpath('<clop_dir>');
- use_spider_clop;
- X = rand(10,8);
- Y = [1 1 1 1 1 -1 -1 -1 -1 -1]';
- D = data(X, Y); % constructor
- [p, n] = get_dim(D)
- get_x(D)
- get_y(D)
20CLOP Model Objects
D is a data object previously defined.
- model = kridge; % constructor
- [resu, model] = train(model, D);
- resu, model.W, model.b0
- Yhat = D.X * model.W' + model.b0
- testD = data(rand(3,8), [-1 -1 1]');
- tresu = test(model, testD);
- balanced_errate(tresu.X, tresu.Y)
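balanced_errate is a CLOP function; a plausible minimal sketch of what a balanced error rate computes (my implementation, assuming ±1 labels, not the CLOP source):

```python
# Balanced error rate: average of the per-class error rates, so that
# a skewed class distribution cannot be exploited by a trivial classifier.
import numpy as np

def balanced_errate(y_pred, y_true):
    y_pred = np.sign(y_pred)           # threshold real-valued outputs at 0
    y_true = np.asarray(y_true)
    errs = [np.mean(y_pred[y_true == c] != c) for c in (-1, 1)]
    return float(np.mean(errs))
```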
21Hyperparameters and Chains
A model often has hyperparameters:
- default(kridge)
- hyper = {'degree=3', 'shrinkage=0.1'};
- model = kridge(hyper);
Models can be chained:
- model = chain({standardize, kridge(hyper)});
- [resu, model] = train(model, D);
- tresu = test(model, testD);
- balanced_errate(tresu.X, tresu.Y)
22Hyper-parameters
- Kernel methods (kridge and svc):
  - k(x, y) = (coef0 + x·y)^degree exp(−gamma ‖x − y‖²)
  - kij = k(xi, xj)
  - kii ← kii + shrinkage
- Naïve Bayes (naive): none
- Neural network (neural): units, shrinkage, maxiter
- Random Forest (rf, Windows only): mtry
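These hyperparameters enter the kernel matrix as follows; a small sketch (illustrative only, function names mine):

```python
# The kernel from the slide, k(x,y) = (coef0 + x.y)^degree * exp(-gamma ||x-y||^2),
# with shrinkage added on the diagonal of the kernel matrix.
import numpy as np

def kernel(x, y, coef0=0.0, degree=1, gamma=0.0):
    return (coef0 + x @ y) ** degree * np.exp(-gamma * np.sum((x - y) ** 2))

def kernel_matrix(X, shrinkage=0.0, **hp):
    p = len(X)
    K = np.array([[kernel(X[i], X[j], **hp) for j in range(p)] for i in range(p)])
    return K + shrinkage * np.eye(p)   # k_ii <- k_ii + shrinkage
```

Setting degree=1, gamma=0 gives a linear kernel; coef0>0 with degree>1 gives polynomials; gamma>0 with degree=0 would give a pure Gaussian kernel. The diagonal shrinkage plays the same regularizing role as γ in ridge regression.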
23Exercise
- Here are some of the pattern recognition CLOP objects:
  - @rf, @naive
  - @svc, @neural
  - @gentleboost, @lssvm
  - @gkridge, @kridge
  - @klogistic, @logitboost
- Try at the prompt: example(neural)
- Try other pattern recognition objects.
- Try different sets of hyperparameters, e.g., example(svc({'gamma=1', 'shrinkage=0.001'}))
- Remember: use default(method) to get the HP.
24Lab 2
Example Digit Recognition
Subset of the MNIST data of LeCun and Cortes used for the NIPS 2003 challenge
25data(X, Y)
- Go to the Gisette directory:
  - cd('GISETTE');
- Load the validation data:
  - Xt = load('gisette_valid.data');
  - Yt = load('gisette_valid.labels');
- Create a data object and examine it:
  - Dt = data(Xt, Yt);
  - browse(Dt, 2);
- Load the training data (longer):
  - X = load('gisette_train.data');
  - Y = load('gisette_train.labels');
  - [p, n] = get_dim(Dt);
  - D = train(subsample(['p_max=' num2str(p)]), data(X, Y));
  - clear X Y Xt Yt
26model(hyperparam)
- Define some hyperparameters:
  - hyper = {'degree=3', 'shrinkage=0.1'};
- Create a kernel ridge regression model:
  - model = kridge(hyper);
- Train it and test it:
  - [resu, Model] = train(model, D);
  - tresu = test(Model, Dt);
- Visualize the results:
  - roc(tresu);
  - idx = find(tresu.X .* tresu.Y < 0);
  - browse(get(Dt, idx), 2);
27Exercise
- Here are some pattern recognition CLOP objects:
  - @rf, @naive, @gentleboost
  - @svc, @neural, @logitboost
  - @kridge, @lssvm, @klogistic
- Instantiate a model with some hyperparameters (use default(method) to get the HP).
- Vary the HP and the number of training examples (Hint: use get(D, 1:n) to restrict the data to the first n examples).
28chain({model1, model2, ...})
- Combine preprocessing and kernel ridge regression:
  - my_prepro = normalize;
  - model = chain({my_prepro, kridge(hyper)});
- Combine replicas of a base learner:
  - for k=1:10
      base_model{k} = neural;
    end
  - model = ensemble(base_model);
- ensemble({model1, model2, ...}) also works directly.
29Exercise
- Here are some preprocessing CLOP objects:
  - @normalize, @standardize, @fourier
- Chain a preprocessing and a model, e.g.:
  - model = chain({fourier, kridge('degree=3')});
  - my_classif = svc({'coef0=1', 'degree=4', 'gamma=0', 'shrinkage=0.1'});
  - model = chain({normalize, my_classif});
- Train, test, visualize the results. Hint: you can browse the preprocessed data:
  - browse(train(standardize, D), 2);
30Summary
- After creating your complex model, just one command: train
  - model = ensemble({chain({standardize, kridge(hyper)}), chain({normalize, naive})});
  - [resu, Model] = train(model, D);
- After training your complex model, just one command: test
  - tresu = test(Model, Dt);
- You can use a cv object to perform cross-validation:
  - cv_model = cv(model);
  - [resu, Model] = train(cv_model, D);
  - roc(resu);
31Lab 3
Getting started with Feature Selection
32POLY_GUI again
- clear classes
- poly_gui
- Check the Multiplicative updates (MU) box.
- Play with the parameters.
- Try CV
- Compare with no MU
33Lab 3
Exploring feature selection methods
34Re-load the GISETTE data
- Start CLOP
- clear classes
- use_spider_clop
- Go to the Gisette directory
- cd('GISETTE')
- load('gisette')
35Visualization
- 1) Create a heatmap of the data matrix or a subset:
  - show(D);
  - show(get(D, 1:10, 1:2500));
- 2) Look at individual patterns:
  - browse(D);
  - browse(D, 2); % for 2d data
  - Display feature positions:
    browse(D, 2, [212, 463, 429, 239]);
- 3) Make a scatter plot of a few features:
  - scatter(D, [212, 463, 429, 239]);
36Example
- my_classif = svc({'coef0=1', 'degree=3', 'gamma=0', 'shrinkage=1'});
- model = chain({normalize, s2n('f_max=100'), my_classif});
- [resu, Model] = train(model, D);
- tresu = test(Model, Dt);
- roc(tresu);
- Show the misclassified first:
  - [s, idx] = sort(tresu.X .* tresu.Y);
  - browse(get(Dt, idx), 2, Model{2});
37Some Filters in CLOP
- Univariate:
  - @s2n (signal-to-noise ratio)
  - @Ttest (T statistic; similar to s2n)
  - @Pearson (uses Matlab corrcoef; gives the same results as Ttest if the classes are balanced)
  - @aucfs (ranksum test)
- Multivariate:
  - @relief (no elimination of redundancy)
  - @gs (Gram-Schmidt orthogonalization; selects complementary features)
38Exercise
- Change the feature selection algorithm.
- Visualize the features.
- What can you say of the various methods?
- Which one gives the best results for 2, 10, 100 features?
- Can you improve by changing the preprocessing? (Hint: try @pc_extract)
39Lab 3
Feature significance
40T-test
[Figure: class-conditional densities P(xi | Y=1) and P(xi | Y=−1) over xi, with means μ+, μ− and standard deviations σ+, σ−]
- Normally distributed classes, equal variance σ² unknown, estimated from data as σ²within.
- Null hypothesis H0: μ+ = μ−
- T statistic: if H0 is true,
  t = (μ+ − μ−) / (σwithin √(1/m+ + 1/m−)) ~ Student(m+ + m− − 2 d.f.)
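A worked sketch of this statistic in pure Python (illustrative only; xpos and xneg hold the values of feature xi in the two classes, and the p-value would then come from the Student distribution with m+ + m− − 2 degrees of freedom):

```python
# Two-sample t statistic with pooled within-class variance, as on the slide.
import math

def tstat(xpos, xneg):
    mp, mn = len(xpos), len(xneg)
    mu_p = sum(xpos) / mp
    mu_n = sum(xneg) / mn
    # pooled within-class variance estimate (m+ + m- - 2 d.f.)
    ss = sum((v - mu_p) ** 2 for v in xpos) + sum((v - mu_n) ** 2 for v in xneg)
    s_within = math.sqrt(ss / (mp + mn - 2))
    return (mu_p - mu_n) / (s_within * math.sqrt(1.0 / mp + 1.0 / mn))
```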
41Evaluation of pval and FDR
- Ttest object:
  - computes pval analytically
  - FDR ≈ pval × n/nsc
- probe object:
  - takes any feature ranking object as an argument (e.g., s2n, relief, Ttest)
  - pval ≈ nsp/np
  - FDR ≈ pval × n/nsc
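The probe object's internals are not shown in the slides; the counting estimates themselves can be sketched as follows (names follow the slide: n features, np probes, nsc selected candidates, nsp selected probes; the function is my illustration):

```python
# Probe estimates: rank real features together with random "probe" features;
# the fraction of probes above a threshold estimates the p-value, and
# FDR ~ pval * n / n_sc estimates the false discovery rate among selections.
import numpy as np

def probe_pval_fdr(scores_real, scores_probe, threshold):
    n, n_p = len(scores_real), len(scores_probe)
    n_sc = np.sum(scores_real >= threshold)    # selected candidate features
    n_sp = np.sum(scores_probe >= threshold)   # probes passing the threshold
    pval = n_sp / n_p
    fdr = pval * n / max(n_sc, 1)
    return pval, fdr
```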
42Analytic vs. probe
43Example
- [resu, FS] = train(Ttest, D);
- [resu, PFS] = train(probe(Ttest), D);
- figure('Name', 'pvalue');
- plot(get_pval(FS, 1), 'r');
- hold on; plot(get_pval(PFS, 1));
- figure('Name', 'FDR');
- plot(get_fdr(FS, 1), 'r');
- hold on; plot(get_fdr(PFS, 1));
44Exercise
- What could explain the differences between the pvalue and fdr obtained with the analytic and the probe method?
- Replace Ttest with chain({rmconst('w_min=0'), Ttest}).
- Recompute the pvalue and fdr curves. What do you notice?
- Choose an optimum number fnum of features based on pvalue or FDR. Visualize with browse(D, 2, FS, fnum).
- Create a model with fnum features. Is fnum optimal? Do you get something better with CV?
45Lab 3
Local feature selection
46Exercise
- Consider the 1-nearest-neighbor algorithm. We define the following score:
  [score formula shown on the slide]
- Here s(k) (resp. d(k)) is the index of the nearest neighbor of xk belonging to the same class (resp. a different class) as xk.
47Exercise
- Motivate the choice of such a cost function to approximate the generalization error (qualitative answer).
- How would you derive an embedded method to perform feature selection for 1-nearest-neighbor using this functional?
- Motivate your choice (what makes your method an embedded method and not a wrapper method?).
48Relief
Relief = ⟨Dmiss/Dhit⟩ (averaged over examples)
Local_Relief = Dmiss/Dhit
[Figure: for each example, Dhit is the distance to its nearest hit (nearest neighbor of the same class) and Dmiss the distance to its nearest miss (nearest neighbor of a different class)]
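The CLOP relief implementation is not shown here; this sketch only illustrates the nearest-hit / nearest-miss ratio idea above, with L1 distances and a per-feature ratio as my assumptions:

```python
# Relief-style feature scores: for each example find its nearest hit and
# nearest miss, accumulate the per-feature ratio Dmiss/Dhit, average over
# examples. Features that separate the classes get large scores.
import numpy as np

def relief_scores(X, Y):
    p, n = X.shape
    score = np.zeros(n)
    for k in range(p):
        d = np.abs(X - X[k]).sum(axis=1)   # L1 distances to x_k
        d[k] = np.inf                      # exclude x_k itself
        same, diff = (Y == Y[k]), (Y != Y[k])
        hit = np.argmin(np.where(same, d, np.inf))   # nearest hit
        miss = np.argmin(np.where(diff, d, np.inf))  # nearest miss
        # per-feature contribution (guard against zero Dhit)
        score += np.abs(X[k] - X[miss]) / np.maximum(np.abs(X[k] - X[hit]), 1e-12)
    return score / p
```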
49Exercise
- [resu, FS] = train(relief, D);
- browse(D, 2, FS, 20);
- [resu, LFS] = train(local_relief, D);
- browse(D, 2, LFS, 20);
- Propose a modification to the nearest neighbor algorithm that uses features relevant to individual patterns (like those provided by local_relief).
- Do you anticipate such an algorithm to perform better than the non-local version using relief?
50Epilogue
Becoming a pro and playing with other datasets
51Some CLOP objects
52http://clopinet.com/challenges/
- Challenges in
- Feature selection
- Performance prediction
- Model selection
- Causality
- Large datasets
53NIPS 2003 Feature Selection Challenge
54NIPS 2006 Model Selection Game
NOVA
First place Juha Reunanen, cross-indexing-7
Sample NOVA text: "Subject: Re: Goalie masks / Lines: 21 / Tom Barrasso wore a great mask, one time, last season. It was all black, with Pgh city scenes on it. The 'Golden Triangle' graced the top, along with a steel mill on one side and the Civic Arena on the other. On the back of the helmet was the old Pens' logo, the current (at the time) Pens logo, and a space for the 'new' logo. / Lori"
sns = shiftnscale, std = standardize, norm = normalize (some details of hyperparameters not shown)
Second place Hugo Jair Escalante Balderas,
BRun2311062
GINA
Proc. IJCNN 2007, Orlando, FL, Aug. 2007: "PSMS for Neural Networks", H. Jair Escalante, Manuel Montes y Gomez, and Luis Enrique Sucar; "Model Selection and Assessment Using Cross-indexing", Juha Reunanen.
Note: entry Boosting_1_001_x900 gave better results, but was older.