Title: Lab 1
1Lab 1
Getting started with Basic Learning
Machines and the Overfitting Problem
2Lab 1
Polynomial regression
3Matlab POLY_GUI
- The code implements the ridge regression algorithm:
  w = argmin_w Σi (yi − f(xi))² + γ ‖w‖²
- f(x) = w1 x + w2 x² + … + wn xⁿ = wᵀx̄
- x̄ = [x, x², …, xⁿ]ᵀ
- w = X⁺ Y
- X⁺ = Xᵀ(XXᵀ + γI)⁻¹ = (XᵀX + γI)⁻¹Xᵀ
- X = [x̄(1); x̄(2); …; x̄(p)] (a p × n matrix)
- The leave-one-out (LOO) error is obtained with the PRESS statistic (Predicted REsidual Sum of Squares):
  LOO error = (1/p) Σk [rk / (1 − (XX⁺)kk)]²
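The closed-form solution and the PRESS identity above can be checked numerically. Here is a minimal NumPy sketch (not the poly_gui code itself; the function name and variables are mine):

```python
# Ridge regression via its closed form, plus the PRESS leave-one-out error.
# Sketch only: X is the (p, n) matrix of polynomial features, Y the targets.
import numpy as np

def ridge_press(X, Y, gamma):
    p, n = X.shape
    # X+ = (X'X + gamma I)^(-1) X'  (the shrunk pseudo-inverse from the slide)
    Xpinv = np.linalg.solve(X.T @ X + gamma * np.eye(n), X.T)
    w = Xpinv @ Y                     # ridge weights, w = X+ Y
    r = Y - X @ w                     # training residuals
    H = X @ Xpinv                     # "hat" matrix X X+
    # PRESS: LOO error = (1/p) sum_k ( r_k / (1 - H_kk) )^2
    loo = np.mean((r / (1.0 - np.diag(H))) ** 2)
    return w, loo
```

The PRESS trick gives the exact leave-one-out error of ridge regression without retraining p times.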
5Matlab POLY_GUI
- At the prompt, type poly_gui
- Vary the parameters. Refrain from hitting CV. Explain what happens in the following situations:
  - Sample num. << Target degree (small noise)
  - Large noise, small sample num.
  - Target degree << Model degree
- Why is the LOO error sometimes larger than the training and test error?
- Are there local minima in the LOO error? Is the LOO error flat near the optimum?
- Propose ways of getting a better solution.
6CLOP Data Objects
The poly_gui emulates CLOP objects of type data
- X = rand(10,5);
- Y = rand(10,1);
- D = data(X, Y); % constructor
- methods(D)
- get_x(D)
- get_y(D)
- plot(D)
7CLOP Model Objects
poly_ridge is a model object.
- P = poly_ridge; h = plot(P);
- D = gene(P); plot(D, h);
- [resu, P] = train(P, D);
- mse(resu)
- Dt = gene(P);
- [tresu, P] = test(P, Dt);
- mse(tresu)
- plot(P, h);
8Lab 1
Support Vector Machines
9Support Vector Classifier
(Boser, Guyon and Vapnik, 1992)
10Matlab SVC_GUI
- At the prompt, type svc_gui
- The code implements the Support Vector Machine algorithm with kernel:
  k(s, t) = (1 + s·t)^q exp(−γ ‖s − t‖²)
- Regularization similar to ridge regression:
  - Hinge loss: L(xi) = max(0, 1 − yi f(xi))
  - Empirical risk: Σi L(xi)
  - w = argmin_w (1/C) ‖w‖² + Σi L(xi)
    (the (1/C) ‖w‖² term is the shrinkage)
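The pieces above fit together as follows; a small sketch that only evaluates the regularized objective (it does not solve the SVM problem; names are mine):

```python
# Evaluate the SVC objective (1/C)||w||^2 + sum_i max(0, 1 - y_i f(x_i))
# for a linear f(x) = w.x. Illustrative sketch only.
import numpy as np

def svc_objective(w, X, Y, C):
    margins = Y * (X @ w)                   # functional margins y_i f(x_i)
    hinge = np.maximum(0.0, 1.0 - margins)  # hinge loss per example
    return (1.0 / C) * (w @ w) + hinge.sum()
```

Only examples with margin below 1 (including misclassified ones) contribute to the loss term; the ‖w‖² term plays the same shrinkage role as γ in ridge regression.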
11Lab 1
More loss functions
12Loss Functions
13Exercise Gradient Descent
- Linear discriminant: f(x) = Σj wj xj
- Functional margin: z = y f(x), y = ±1
- Compute ∂z/∂wj
- Derive the learning rules Δwj = −η ∂L/∂wj corresponding to the following loss functions:
  - SVC loss: max(0, 1 − z)
  - Adaboost loss: e^(−z)
  - Square loss: (1 − z)²
  - Logistic loss: log(1 + e^(−z))
  - Perceptron loss: max(0, −z)
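As a worked hint, the chain rule gives Δwj = −η (dL/dz) · y xj, since ∂z/∂wj = y xj. A sketch of the resulting updates (my derivation, worth re-deriving by hand before use):

```python
# Per-example gradient updates Delta_w = -eta * dL/dz * y * x
# for the loss functions listed on the slide (z = y f(x)).
import numpy as np

def dL_dz(loss, z):
    if loss == 'svc':        # max(0, 1 - z): subgradient
        return -1.0 if z < 1 else 0.0
    if loss == 'adaboost':   # exp(-z)
        return -np.exp(-z)
    if loss == 'square':     # (1 - z)^2
        return -2.0 * (1.0 - z)
    if loss == 'logistic':   # log(1 + exp(-z))
        return -np.exp(-z) / (1.0 + np.exp(-z))
    if loss == 'perceptron': # max(0, -z): subgradient
        return -1.0 if z < 0 else 0.0

def delta_w(loss, w, x, y, eta):
    z = y * (w @ x)                       # functional margin
    return -eta * dL_dz(loss, z) * y * x  # chain rule
```

Note how the SVC and perceptron rules only fire when the margin constraint is violated, while the other losses update on every example.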
14Exercise Dual Algorithms
- From the Δwj, derive the Δw.
- w = Σi αi xi
- From the Δw, derive the Δαi of the dual algorithms.
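As one concrete instance (my worked example, assuming the perceptron loss): the primal mistake-driven step Δw = η yk xk becomes Δαk = η yk once w is written as Σi αi xi, and f(xk) can be computed from dot products alone:

```python
# Dual perceptron step: with w = sum_i alpha_i x_i, a primal update
# w += eta * y_k * x_k is equivalent to alpha_k += eta * y_k.
import numpy as np

def dual_perceptron_step(alpha, X, Y, k, eta=1.0):
    # f(x_k) = sum_i alpha_i (x_i . x_k): only dot products are needed,
    # which is what makes the kernel trick possible.
    f_k = alpha @ (X @ X[k])
    if Y[k] * f_k <= 0:           # mistake: perceptron loss is active
        alpha = alpha.copy()
        alpha[k] += eta * Y[k]    # dual counterpart of w += eta*y_k*x_k
    return alpha
```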
15Summary
- Modern ML algorithms optimize a penalized risk
functional
16Lab 2
Getting started with CLOP
17Lab 2
CLOP tutorial
18What is CLOP?
- CLOP = Challenge Learning Object Package.
- Based on the Spider package developed at the Max Planck Institute.
- Two basic abstractions:
  - Data object
  - Model object
- Put the CLOP directory in your path.
- At the prompt, type use_spider_clop
- If you have used poly_gui before, type clear classes
19CLOP Data Objects
At the Matlab prompt:
- addpath('<clop_dir>');
- use_spider_clop;
- X = rand(10,8);
- Y = [1 1 1 1 1 -1 -1 -1 -1 -1]';
- D = data(X, Y); % constructor
- [p, n] = get_dim(D)
- get_x(D)
- get_y(D)
20CLOP Model Objects
D is a data object previously defined.
- model = kridge; % constructor
- [resu, model] = train(model, D);
- resu, model.W, model.b0
- Yhat = D.X * model.W' + model.b0
- testD = data(rand(3,8), [-1 -1 1]');
- tresu = test(model, testD);
- balanced_errate(tresu.X, tresu.Y)
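balanced_errate is a CLOP function; a plausible minimal sketch of what a balanced error rate computes (my implementation, assuming ±1 labels, not the CLOP source):

```python
# Balanced error rate: average of the per-class error rates, so that
# a skewed class distribution cannot be exploited by a trivial classifier.
import numpy as np

def balanced_errate(y_pred, y_true):
    y_pred = np.sign(y_pred)           # threshold real-valued outputs at 0
    y_true = np.asarray(y_true)
    errs = [np.mean(y_pred[y_true == c] != c) for c in (-1, 1)]
    return float(np.mean(errs))
```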
21Hyperparameters and Chains
A model often has hyperparameters:
- default(kridge)
- hyper = {'degree=3', 'shrinkage=0.1'};
- model = kridge(hyper);
Models can be chained:
- model = chain({standardize, kridge(hyper)});
- [resu, model] = train(model, D);
- tresu = test(model, testD);
- balanced_errate(tresu.X, tresu.Y)
22Hyper-parameters
- Kernel methods (kridge and svc):
  - k(x, y) = (coef0 + x·y)^degree exp(−gamma ‖x − y‖²)
  - kij = k(xi, xj)
  - kii ← kii + shrinkage
- Naïve Bayes (naive): none
- Neural network (neural): units, shrinkage, maxiter
- Random Forest (rf, Windows only): mtry
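These hyperparameters enter the kernel matrix as follows; a small sketch (illustrative only, function names mine):

```python
# The kernel from the slide, k(x,y) = (coef0 + x.y)^degree * exp(-gamma ||x-y||^2),
# with shrinkage added on the diagonal of the kernel matrix.
import numpy as np

def kernel(x, y, coef0=0.0, degree=1, gamma=0.0):
    return (coef0 + x @ y) ** degree * np.exp(-gamma * np.sum((x - y) ** 2))

def kernel_matrix(X, shrinkage=0.0, **hp):
    p = len(X)
    K = np.array([[kernel(X[i], X[j], **hp) for j in range(p)] for i in range(p)])
    return K + shrinkage * np.eye(p)   # k_ii <- k_ii + shrinkage
```

Setting degree=1, gamma=0 gives a linear kernel; coef0>0 with degree>1 gives polynomials; gamma>0 with degree=0 would give a pure Gaussian kernel. The diagonal shrinkage plays the same regularizing role as γ in ridge regression.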
23Exercise
- Here are some of the pattern recognition CLOP objects:
  - @rf, @naive
  - @svc, @neural
  - @gentleboost, @lssvm
  - @gkridge, @kridge
  - @klogistic, @logitboost
- Try at the prompt: example(neural)
- Try other pattern recognition objects.
- Try different sets of hyperparameters, e.g., example(svc({'gamma=1', 'shrinkage=0.001'}))
- Remember: use default(method) to get the HP.
24Lab 2
Example Digit Recognition
Subset of the MNIST data of LeCun and Cortes used for the NIPS 2003 challenge
25data(X, Y)
- Go to the Gisette directory:
  - cd('GISETTE');
- Load the validation data:
  - Xt = load('gisette_valid.data');
  - Yt = load('gisette_valid.labels');
- Create a data object and examine it:
  - Dt = data(Xt, Yt);
  - browse(Dt, 2);
- Load the training data (longer):
  - X = load('gisette_train.data');
  - Y = load('gisette_train.labels');
  - [p, n] = get_dim(Dt);
  - D = train(subsample(['p_max=' num2str(p)]), data(X, Y));
  - clear X Y Xt Yt
26model(hyperparam)
- Define some hyperparameters:
  - hyper = {'degree=3', 'shrinkage=0.1'};
- Create a kernel ridge regression model:
  - model = kridge(hyper);
- Train it and test it:
  - [resu, Model] = train(model, D);
  - tresu = test(Model, Dt);
- Visualize the results:
  - roc(tresu);
  - idx = find(tresu.X .* tresu.Y < 0);
  - browse(get(Dt, idx), 2);
27Exercise
- Here are some pattern recognition CLOP objects:
  - @rf, @naive, @gentleboost
  - @svc, @neural, @logitboost
  - @kridge, @lssvm, @klogistic
- Instantiate a model with some hyperparameters (use default(method) to get the HP).
- Vary the HP and the number of training examples (Hint: use get(D, 1:n) to restrict the data to the first n examples).
28chain({model1, model2, ...})
- Combine preprocessing and kernel ridge regression:
  - my_prepro = normalize;
  - model = chain({my_prepro, kridge(hyper)});
- Combine replicas of a base learner:
  - for k=1:10
      base_model{k} = neural;
    end
  - model = ensemble(base_model);
- ensemble({model1, model2, ...}) also works directly.
29Exercise
- Here are some preprocessing CLOP objects:
  - @normalize, @standardize, @fourier
- Chain a preprocessing and a model, e.g.:
  - model = chain({fourier, kridge('degree=3')});
  - my_classif = svc({'coef0=1', 'degree=4', 'gamma=0', 'shrinkage=0.1'});
  - model = chain({normalize, my_classif});
- Train, test, visualize the results. Hint: you can browse the preprocessed data:
  - browse(train(standardize, D), 2);
30Summary
- After creating your complex model, just one command: train
  - model = ensemble({chain({standardize, kridge(hyper)}), chain({normalize, naive})});
  - [resu, Model] = train(model, D);
- After training your complex model, just one command: test
  - tresu = test(Model, Dt);
- You can use a cv object to perform cross-validation:
  - cv_model = cv(model);
  - [resu, Model] = train(cv_model, D);
  - roc(resu);
31Lab 3
Getting started with Feature Selection
32POLY_GUI again
- clear classes
- poly_gui
- Check the Multiplicative updates (MU) box.
- Play with the parameters.
- Try CV
- Compare with no MU
33Lab 3
Exploring feature selection methods
34Re-load the GISETTE data
- Start CLOP
- clear classes
- use_spider_clop
- Go to the Gisette directory
- cd('GISETTE')
- load('gisette')
35Visualization
- 1) Create a heatmap of the data matrix or a subset:
  - show(D);
  - show(get(D, 1:10, 1:2500));
- 2) Look at individual patterns:
  - browse(D);
  - browse(D, 2); % for 2d data
  - Display feature positions:
    browse(D, 2, [212, 463, 429, 239]);
- 3) Make a scatter plot of a few features:
  - scatter(D, [212, 463, 429, 239]);
36Example
- my_classif = svc({'coef0=1', 'degree=3', 'gamma=0', 'shrinkage=1'});
- model = chain({normalize, s2n('f_max=100'), my_classif});
- [resu, Model] = train(model, D);
- tresu = test(Model, Dt);
- roc(tresu);
- Show the misclassified first:
  - [s, idx] = sort(tresu.X .* tresu.Y);
  - browse(get(Dt, idx), 2, Model{2});
37Some Filters in CLOP
- Univariate:
  - @s2n (signal-to-noise ratio)
  - @Ttest (T statistic; similar to s2n)
  - @Pearson (uses Matlab corrcoef; gives the same results as Ttest if the classes are balanced)
  - @aucfs (ranksum test)
- Multivariate:
  - @relief (no elimination of redundancy)
  - @gs (Gram-Schmidt orthogonalization; selects complementary features)
38Exercise
- Change the feature selection algorithm.
- Visualize the features.
- What can you say of the various methods?
- Which one gives the best results for 2, 10, 100 features?
- Can you improve by changing the preprocessing? (Hint: try @pc_extract)
39Lab 3
Feature significance
40T-test
[Figure: class-conditional densities P(xi | Y=1) and P(xi | Y=−1) over xi, with means μ+, μ− and standard deviations σ+, σ−]
- Normally distributed classes, equal variance σ² unknown, estimated from data as σ²within.
- Null hypothesis H0: μ+ = μ−
- T statistic: if H0 is true,
  t = (μ+ − μ−) / (σwithin √(1/m+ + 1/m−)) ~ Student(m+ + m− − 2 d.f.)
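A worked sketch of this statistic in pure Python (illustrative only; xpos and xneg hold the values of feature xi in the two classes, and the p-value would then come from the Student distribution with m+ + m− − 2 degrees of freedom):

```python
# Two-sample t statistic with pooled within-class variance, as on the slide.
import math

def tstat(xpos, xneg):
    mp, mn = len(xpos), len(xneg)
    mu_p = sum(xpos) / mp
    mu_n = sum(xneg) / mn
    # pooled within-class variance estimate (m+ + m- - 2 d.f.)
    ss = sum((v - mu_p) ** 2 for v in xpos) + sum((v - mu_n) ** 2 for v in xneg)
    s_within = math.sqrt(ss / (mp + mn - 2))
    return (mu_p - mu_n) / (s_within * math.sqrt(1.0 / mp + 1.0 / mn))
```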
41Evaluation of pval and FDR
- Ttest object:
  - computes pval analytically
  - FDR ≈ pval × n/nsc
- probe object:
  - takes any feature ranking object as an argument (e.g., s2n, relief, Ttest)
  - pval ≈ nsp/np
  - FDR ≈ pval × n/nsc
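The probe object's internals are not shown in the slides; the counting estimates themselves can be sketched as follows (names follow the slide: n features, np probes, nsc selected candidates, nsp selected probes; the function is my illustration):

```python
# Probe estimates: rank real features together with random "probe" features;
# the fraction of probes above a threshold estimates the p-value, and
# FDR ~ pval * n / n_sc estimates the false discovery rate among selections.
import numpy as np

def probe_pval_fdr(scores_real, scores_probe, threshold):
    n, n_p = len(scores_real), len(scores_probe)
    n_sc = np.sum(scores_real >= threshold)    # selected candidate features
    n_sp = np.sum(scores_probe >= threshold)   # probes passing the threshold
    pval = n_sp / n_p
    fdr = pval * n / max(n_sc, 1)
    return pval, fdr
```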
42Analytic vs. probe
43Example
- [resu, FS] = train(Ttest, D);
- [resu, PFS] = train(probe(Ttest), D);
- figure('Name', 'pvalue');
- plot(get_pval(FS, 1), 'r');
- hold on; plot(get_pval(PFS, 1));
- figure('Name', 'FDR');
- plot(get_fdr(FS, 1), 'r');
- hold on; plot(get_fdr(PFS, 1));
44Exercise
- What could explain the differences between the pvalue and fdr obtained with the analytic and the probe method?
- Replace Ttest with chain({rmconst('w_min=0'), Ttest}).
- Recompute the pvalue and fdr curves. What do you notice?
- Choose an optimum number fnum of features based on pvalue or FDR. Visualize with browse(D, 2, FS, fnum).
- Create a model with fnum features. Is fnum optimal? Do you get something better with CV?
45Lab 3
Local feature selection
46Exercise
- Consider the 1-nearest-neighbor algorithm. We define the following score:
  [score formula shown on the slide]
- Here s(k) (resp. d(k)) is the index of the nearest neighbor of xk belonging to the same class (resp. a different class) as xk.
47Exercise
- Motivate the choice of such a cost function to approximate the generalization error (qualitative answer).
- How would you derive an embedded method to perform feature selection for 1-nearest-neighbor using this functional?
- Motivate your choice (what makes your method an embedded method and not a wrapper method?).
48Relief
Relief = ⟨Dmiss/Dhit⟩ (averaged over examples)
Local_Relief = Dmiss/Dhit
[Figure: for each example, Dhit is the distance to its nearest hit (nearest neighbor of the same class) and Dmiss the distance to its nearest miss (nearest neighbor of a different class)]
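The CLOP relief implementation is not shown here; this sketch only illustrates the nearest-hit / nearest-miss ratio idea above, with L1 distances and a per-feature ratio as my assumptions:

```python
# Relief-style feature scores: for each example find its nearest hit and
# nearest miss, accumulate the per-feature ratio Dmiss/Dhit, average over
# examples. Features that separate the classes get large scores.
import numpy as np

def relief_scores(X, Y):
    p, n = X.shape
    score = np.zeros(n)
    for k in range(p):
        d = np.abs(X - X[k]).sum(axis=1)   # L1 distances to x_k
        d[k] = np.inf                      # exclude x_k itself
        same, diff = (Y == Y[k]), (Y != Y[k])
        hit = np.argmin(np.where(same, d, np.inf))   # nearest hit
        miss = np.argmin(np.where(diff, d, np.inf))  # nearest miss
        # per-feature contribution (guard against zero Dhit)
        score += np.abs(X[k] - X[miss]) / np.maximum(np.abs(X[k] - X[hit]), 1e-12)
    return score / p
```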
49Exercise
- [resu, FS] = train(relief, D);
- browse(D, 2, FS, 20);
- [resu, LFS] = train(local_relief, D);
- browse(D, 2, LFS, 20);
- Propose a modification to the nearest neighbor algorithm that uses features relevant to individual patterns (like those provided by local_relief).
- Do you anticipate such an algorithm to perform better than the non-local version using relief?
50Epilogue
Becoming a pro and playing with other datasets
51Some CLOP objects
52http://clopinet.com/challenges/
- Challenges in
- Feature selection
- Performance prediction
- Model selection
- Causality
- Large datasets
53NIPS 2003 Feature Selection Challenge
54NIPS 2006 Model Selection Game
NOVA
First place Juha Reunanen, cross-indexing-7
Sample NOVA text: "Subject: Re: Goalie masks / Lines: 21 / Tom Barrasso wore a great mask, one time, last season. It was all black, with Pgh city scenes on it. The 'Golden Triangle' graced the top, along with a steel mill on one side and the Civic Arena on the other. On the back of the helmet was the old Pens' logo, the current (at the time) Pens logo, and a space for the 'new' logo. / Lori"
sns = shiftnscale, std = standardize, norm = normalize (some details of hyperparameters not shown)
Second place Hugo Jair Escalante Balderas,
BRun2311062
GINA
Proc. IJCNN 2007, Orlando, FL, Aug. 2007: "PSMS for Neural Networks", H. Jair Escalante, Manuel Montes y Gomez, and Luis Enrique Sucar; "Model Selection and Assessment Using Cross-indexing", Juha Reunanen.
Note: entry Boosting_1_001_x900 gave better results, but was older.