Title: Nature Inspired Learning: Classification and Prediction Algorithms
1. Nature Inspired Learning: Classification and Prediction Algorithms
- Šarunas Raudys
- Computational Intelligence Group
- Department of Informatics
- Vilnius University, Lithuania
- e-mail: sarunas_at_raudys.com
- Juodkrante, 2009 05 22
2. Nature inspired learning
Statics: accuracy, and the relations between sample size and complexity, e.g. for the linear discriminant with weight vector $W = S^{-1}(M_1 - M_2)$.
Dynamics: learning rapidity becomes a very important issue (the perceptron).
4. Nature inspired learning
[Figure: perceptron diagram — inputs x1, x2, …, xp, weighted sum, nonlinearity, output y]
- A non-linear Single Layer Perceptron (SLP) is a main element in ANN theory.
5. Nature inspired learning
- TRAINING THE SINGLE LAYER PERCEPTRON: OUTLINE
[Figure: a plot of 300 bivariate vectors (dots and pluses) sampled from two Gaussian pattern classes, the linear decision boundary, and the weight trajectory from START to FINISH]
Three tasks: minimization of deviations (regression), classification, and clusterization (if target2 = target1).
6.
- 1. Cost function and training the SLP used for classification.
- 2. When to stop training?
- 3. Seven types of classifiers obtained while training the SLP:
  - 1. Euclidean distance (only the means),
  - 2. Regularized,
  - 3. Fisher, or
  - 4. Fisher with pseudo-inversion of S,
  - 5. Robust,
  - 6. Minimal empirical error,
  - 7. Support vector (maximal margin).
- How to train the SLP in the best way?
CLASSIFICATION: the 2-category case; I will also speak about the multi-category case.
7. Nature inspired learning
- Training the non-linear SLP
[Figure: perceptron diagram — inputs X = (x1, x2, …, xp), weighted sum, nonlinearity, output y; training data indexed 1, 2, …, N]
output $o = f(V^T X + v_0)$, where $f(\text{net})$ is a non-linear activation function, e.g. the sigmoid function $f(\text{net}) = 1/(1 + e^{-\text{net}})$, and $v_0$, $V^T = (v_1, v_2, \ldots, v_p)$ are the weights of the DF (the STANDARD case).
8. TRAINING THE SINGLE LAYER PERCEPTRON BASED CLASSIFIER
$o = f(V^T X + v_0)$, where $f(\text{net})$ is a non-linear activation function, and $v_0$, $V^T = (v_1, v_2, \ldots, v_p)$ are the weights.
Cost function (Amari, 1967; Tsypkin, 1966): $C = \frac{1}{N} \sum_j (y_j - f(V^T X_j + v_0))^2$.
Training rule: $V_{t+1} = V_t - \eta \times \text{gradient}$, where $\eta$ is a learning step parameter and $y_j$ is the training signal (desired output).
[Figure: the cost-function landscape — the trajectory from V(0) towards V(FINISH), the minimum of the empirical cost function; the true (unknown) minimum lies elsewhere, hence optimal stopping.]
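As a concrete illustration of the cost function and training rule above, here is a minimal Python/NumPy sketch (not from the talk; the data, learning step and epoch count are hypothetical) of gradient-descent training of a sigmoid SLP with the sum-of-squares cost:

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def train_slp(X, y, eta=0.5, epochs=200):
    """Gradient descent on C = 1/N * sum_j (y_j - f(V'X_j + v0))^2
    with a sigmoid activation f; returns the weights V and the bias v0."""
    N, p = X.shape
    V, v0 = np.zeros(p), 0.0                    # V_start = 0
    for _ in range(epochs):
        o = sigmoid(X @ V + v0)                 # outputs o_j = f(V'X_j + v0)
        delta = (y - o) * o * (1.0 - o)         # (y_j - o_j) * f'(net_j)
        V += eta * (2.0 / N) * X.T @ delta      # V_{t+1} = V_t - eta * gradient
        v0 += eta * (2.0 / N) * delta.sum()
    return V, v0

# Hypothetical usage: two Gaussian pattern classes with targets 0.1 and 0.9.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (150, 2)), rng.normal(1, 1, (150, 2))])
y = np.hstack([np.full(150, 0.1), np.full(150, 0.9)])
V, v0 = train_slp(X, y)
pred = sigmoid(X @ V + v0) > 0.5
print(np.mean(pred == (y > 0.5)))               # training classification accuracy
```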
9. Training the Non-linear Single Layer Perceptron
$V_{t+1} = V_t - \eta \times \text{gradient}$
[Figure: the training-data cost landscape versus the true landscape — gradient descent heads towards the empirical minimum (Finish), while the true minimum $V_{\text{ideal}}$ lies elsewhere; optimal stopping lies between them.]
10. A general principle: early versus late stopping
$V_{t+1} = V_t - \eta \times \text{gradient}$
$V_{\text{opt}} = \alpha_{\text{opt}} V_{\text{start}} + (1 - \alpha_{\text{opt}}) V_{\text{finish}}$ (Raudys & Amari, 1998).
[Figure: accuracy as a function of the stopping moment — early stopping, optimal stopping, late stopping; the majority, who stopped too late, are here.]
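A small sketch (helper name hypothetical, not from the paper) of how the Raudys–Amari interpolation formula above could be applied in practice: search the segment between $V_{\text{start}}$ and $V_{\text{finish}}$ for the point with the lowest cost on a held-out (or pseudo-)validation set, used as a stand-in for the unknown true cost:

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def interpolated_stopping(V_start, V_finish, X_val, y_val, n_alphas=101):
    """Evaluate V_opt = a*V_start + (1-a)*V_finish on a grid of a values and
    keep the one with the lowest validation cost (X_val is assumed to carry
    a constant 1 column, so the bias is part of the weight vector)."""
    best_a, best_cost = 0.0, np.inf
    for a in np.linspace(0.0, 1.0, n_alphas):
        V = a * V_start + (1.0 - a) * V_finish
        cost = np.mean((y_val - sigmoid(X_val @ V)) ** 2)
        if cost < best_cost:
            best_a, best_cost = a, cost
    return best_a * V_start + (1.0 - best_a) * V_finish, best_a
```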
11. Nature inspired learning
- Where to use early stopping? Knowledge discovery in very large databases.
[Figure: sequential training on Data Set 1, Data Set 2, Data Set 3 — train on each new set, but stop training early in order to preserve the previously learned information.]
12. Standard sum of squares cost function: standard regression
$C = \frac{1}{N} \sum_j (y_j - f(V^T X_j + v_0))^2$. We assume that the data are normalized (zero means, unit variances).
Let the correlations between the input variables x1, x2, …, xp be zero. Then, starting from zero weights, the components of vector V will be proportional to the correlations between x1, x2, …, xp and y. We may obtain such a regression already after the first iteration.
Gradient descent training algorithm: $V_{t+1} = V_t - \eta \times \text{gradient}$.
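The claim that one gradient step from zero weights yields correlation-proportional weights is easy to check numerically; a sketch with synthetic, approximately standardized and uncorrelated data (all values hypothetical), using a linear output for simplicity:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 1000, 5
X = rng.standard_normal((N, p))            # approximately standardized, uncorrelated inputs
y = X @ np.array([0.8, -0.5, 0.3, 0.0, 0.1]) + 0.2 * rng.standard_normal(N)
y = (y - y.mean()) / y.std()               # normalize the target as well

eta = 0.5
V = np.zeros(p)                            # V_start = 0
grad = -(2.0 / N) * X.T @ (y - X @ V)      # gradient of the sum-of-squares cost
V_first = V - eta * grad                   # one gradient-descent step

corr = np.array([np.corrcoef(X[:, i], y)[0, 1] for i in range(p)])
print(V_first / corr)                      # approximately constant: weights ~ correlations
```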
13. SLP AS SIX REGRESSIONS
[Figure: the training trajectory, starting from START.]
14. Nature inspired learning. Robust regression
[Figure: the square loss $(y_j - V^T X_j)^2$ versus a robust loss, both plotted as functions of the residual $y_j - V^T X_j$.]
In order to obtain robust regression, instead of the square function we have to use a robust function of the residual.
Š. Raudys (2000). Evolution and generalization of a single neurone. III. Primitive, regularized, standard, robust and minimax regressions. Neural Networks, 13 (3/4), 507-523.
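The Raudys (2000) paper defines its own robust loss; as a generic stand-in, the sketch below (parameter values hypothetical) performs gradient descent with the Huber loss, which is quadratic for small residuals and grows only linearly for large ones, so outliers have bounded influence:

```python
import numpy as np

def huber_grad(r, delta=1.0):
    """Derivative of the Huber loss w.r.t. the residual r: proportional to r for
    small residuals, clipped to +-delta for large ones (bounded outlier influence)."""
    return np.where(np.abs(r) <= delta, r, delta * np.sign(r))

def robust_regression(X, y, eta=0.01, epochs=500, delta=1.0):
    """Gradient descent on a robust cost instead of the sum of squares."""
    V = np.zeros(X.shape[1])
    for _ in range(epochs):
        r = y - X @ V                       # residuals y_j - V'X_j
        grad = -X.T @ huber_grad(r, delta) / len(y)
        V -= eta * grad
    return V
```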
15. A real world problem. Use of robust regression to distinguish the very weak baby signal from the mother's ECG. Robust regression pays attention to the smallest deviations, not to the largest ones, which are considered outliers.
[Figure: the mother-and-fetus (baby) ECG, the two recorded signals, and the result — the extracted fetus signal.]
16. Nature inspired learning. Standard and regularized regression
Use statistical methods to perform diverse whitening data transformations, where the input variables x1, x2, …, xp are decorrelated and scaled in order to have the same variances. Then, while training the perceptron in the transformed feature space, we can obtain the standard regression after the very first iteration.
$X_{\text{new}}^T = X_{\text{old}}^T \Phi \Lambda^{-1/2}$, where $S_{XX} = \Phi \Lambda \Phi^T$ is the singular value (eigen) decomposition of the covariance matrix $S_{XX}$, and $V_{\text{start}} = 0$.
Speeding up the calculations (convergence): if $S_{XX} \leftarrow S_{XX} + \lambda I$, we obtain regularized regression. Moreover, we can equalize the eigenvalues and speed up the training process.
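A sketch (helper name hypothetical) of the whitening transform described above, with the optional $\lambda I$ term that yields the regularized variant:

```python
import numpy as np

def whiten(X, lam=0.0):
    """Whitening transform X_new = (X - mean) @ Phi @ Lambda^{-1/2}, where
    S_XX = Phi Lambda Phi^T; adding lam*I to S_XX gives the regularized
    variant (and lifts tiny eigenvalues). Assumes S_XX is non-singular
    when lam = 0."""
    Xc = X - X.mean(axis=0)
    S = np.cov(Xc, rowvar=False)
    if lam > 0.0:
        S = S + lam * np.eye(S.shape[0])    # S_XX <- S_XX + lambda*I
    eigval, Phi = np.linalg.eigh(S)         # S = Phi diag(eigval) Phi^T
    X_new = Xc @ Phi @ np.diag(eigval ** -0.5)
    return X_new, Phi, eigval
```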
17. SLP AS SEVEN STATISTICAL CLASSIFIERS
[Figure: the training trajectory from START — small weights give the simplest classifier, large weights give the more complex ones.]
18. Nature inspired learning
There are conditions under which we obtain the Euclidean distance classifier just after the first iteration. When we train further, we have regularized discriminant analysis (RDA):
$V_{t+1} \propto \left(\frac{2}{(t-1)\eta} I + S\right)^{-1} (M_1 - M_2)$,
where $\lambda = 2/((t-1)\eta)$ is the regularization parameter; $\lambda \to 0$ with an increase in the number of training iterations, giving the Fisher classifier, or the Fisher classifier with a pseudo-inverse of the covariance matrix.
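The RDA weight expression above can also be computed directly; a sketch (function name and values hypothetical) showing how the same formula interpolates between the Euclidean distance classifier (large $\lambda$, i.e. few iterations) and the Fisher classifier ($\lambda \to 0$, i.e. prolonged training):

```python
import numpy as np

def rda_weights(M1, M2, S, lam):
    """Weights (lam*I + S)^{-1} (M1 - M2): for large lam (early training) the
    direction approaches the Euclidean distance classifier M1 - M2; for
    lam -> 0 (prolonged training) it becomes the Fisher classifier, or the
    Fisher classifier with a pseudo-inverse when S is singular."""
    p = S.shape[0]
    if lam > 0.0:
        return np.linalg.solve(lam * np.eye(p) + S, M1 - M2)
    return np.linalg.pinv(S) @ (M1 - M2)

# Hypothetical illustration: lam = 2 / ((t - 1) * eta) shrinks as t grows.
eta, S = 0.1, np.array([[2.0, 0.5], [0.5, 1.0]])
M1, M2 = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
for t in (2, 20, 2000):
    print(t, rda_weights(M1, M2, S, lam=2.0 / ((t - 1) * eta)))
```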
19. Nature inspired learning. Standard approach.
- Use the diversity of statistical methods and multivariate models in order to obtain an efficient estimate of the covariance matrix. Then perform whitening data transformations, where the input variables are decorrelated and scaled in order to have the same variances.
- While training the perceptron in the transformed feature space, we can obtain the Euclidean distance classifier after the first iteration. In the original feature space it corresponds to the Fisher classifier, or to a modification of the Fisher classifier (depending on the method used to estimate the covariance matrix).
[Figure: untransformed data with the Fisher classifier versus transformed data with the Euclidean classifier — the Euclidean classifier in the transformed space equals the Fisher classifier in the original space.]
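A numerical check of the equivalence stated above (synthetic data, all values hypothetical): the Euclidean distance classifier built in the whitened feature space uses the same decision direction as the Fisher classifier in the original space:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 4, 500
S_true = np.diag([1.0, 2.0, 0.5, 3.0])
X1 = rng.multivariate_normal(np.zeros(p), S_true, n)
X2 = rng.multivariate_normal(np.ones(p), S_true, n)

S = 0.5 * (np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False))  # pooled covariance
eigval, Phi = np.linalg.eigh(S)
T = Phi @ np.diag(eigval ** -0.5)           # whitening matrix

M1, M2 = X1.mean(axis=0), X2.mean(axis=0)
V_new = T.T @ (M1 - M2)                     # EDC weights in the whitened space
V_orig = T @ V_new                          # the same decision, expressed in the original space
V_fisher = np.linalg.solve(S, M1 - M2)      # Fisher weights S^{-1}(M1 - M2)
print(np.allclose(V_orig, V_fisher))        # True
```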
20. Nature inspired learning
- [Figure: generalisation errors of the EDC, Fisher and Quadratic classifiers.]
21. A real world problem. There are dozens of ways to estimate the covariance matrix and perform the whitening data transformation. This is additional information (if correct) that can be useful in SLP training.
196-dimensional data (handwritten characters).
S. Raudys, M. Iwamura. Structures of the covariance matrix in handwritten character recognition. Lecture Notes in Computer Science, 3138, 725-733, 2004.
S. Raudys, A. Saudargiene. First-order tree-type dependence between variables and classification performance. IEEE Trans. on Pattern Analysis and Machine Intelligence, 23 (2), 233-239, 2001.
22. Covariance matrices are different.
[Figure: decision boundaries of the EDC, LDF, QDF and the Anderson-Bahadur (AB) linear DF — the AB and Fisher boundaries are different.]
If we started from the AB decision boundary rather than from the Fisher one, the result would be better. Hence, we have proposed a special method of input data transformation.
S. Raudys (2004). Integration of statistical and neural methods to design classifiers in case of unequal covariance matrices. Lecture Notes in Artificial Intelligence, Springer-Verlag, Vol. 3238, pp. 270-280.
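For reference, a sketch (function name, grid search and thresholding are my assumptions; Gaussian classes and equal priors assumed; requires SciPy) of an Anderson-Bahadur style linear DF: weights $(k S_1 + (1-k) S_2)^{-1}(M_1 - M_2)$ with the scalar k chosen to minimise the estimated Gaussian error, and the threshold placed to equalise the two class-conditional error rates:

```python
import numpy as np
from scipy.stats import norm

def anderson_bahadur(M1, M2, S1, S2, n_grid=99):
    """Search k in (0, 1): V(k) = (k*S1 + (1-k)*S2)^{-1} (M1 - M2); keep the k
    with the smallest Gaussian error estimate for the projected classes."""
    best = None
    for k in np.linspace(0.01, 0.99, n_grid):
        V = np.linalg.solve(k * S1 + (1.0 - k) * S2, M1 - M2)
        m1, m2 = V @ M1, V @ M2                       # projected class means (m1 > m2)
        s1, s2 = np.sqrt(V @ S1 @ V), np.sqrt(V @ S2 @ V)
        theta = (m1 * s2 + m2 * s1) / (s1 + s2)       # equalises the two error rates
        err = 0.5 * (norm.cdf((theta - m1) / s1) + norm.cdf((m2 - theta) / s2))
        if best is None or err < best[0]:
            best = (err, V, theta, k)
    return best[1], best[2], best[3]                  # weights, threshold, chosen k
```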
23. Non-linear discrimination. Similarity features (LNCS 3686, pp. 136-145, 2005)
[Figure, panels a-d: 100 + 100 2D two-class training vectors (pluses and circles) and the decision boundaries of Kernel Discriminant Analysis (a), the SV classifier (b), and the SLP trained in a 200D dissimilarity feature space (c). Learning curve: the generalization error of the SLP classifier as a function of the number of training epochs, with the optimal stopping point of the SLP marked (d).]
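A sketch (function name hypothetical) of the dissimilarity feature construction used in panel (c): each vector is represented by its Euclidean distances to the training vectors (100 + 100 prototypes give a 200D feature space); the SLP from the earlier sketch can then be trained, with early stopping, on these features:

```python
import numpy as np

def dissimilarity_features(X, prototypes):
    """Map each input vector to its Euclidean distances to a set of prototype
    (training) vectors; a linear SLP trained on these features produces a
    non-linear decision boundary in the original space."""
    # squared pairwise distances ||x - p||^2 for every x in X and p in prototypes
    d2 = (X ** 2).sum(1)[:, None] - 2 * X @ prototypes.T + (prototypes ** 2).sum(1)[None, :]
    return np.sqrt(np.maximum(d2, 0.0))
```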
24. Nature inspired learning. A noise injection
A coloured noise is used to form a pseudo-validation set: we add noise in the directions of the closest training vectors, so we almost do not distort the geometry of the data. In this technique we use additional information: the space between neighbouring points in the multidimensional feature space is not empty, it is filled by vectors of the same class.
The pseudo-validation data set is used to realize early stopping.
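A minimal sketch (parameters hypothetical) of the coloured-noise injection described above: each pseudo-validation vector is obtained by moving a training vector part of the way towards one of its nearest neighbours; it should be applied separately within each class so that the generated vectors stay inside the class's own region:

```python
import numpy as np

def pseudo_validation_set(X, k=2, noise_scale=0.5, copies=1, rng=None):
    """Coloured-noise injection: for each training vector of one class, add
    noise directed towards one of its k nearest neighbours, so the geometry
    of the data is almost not distorted."""
    rng = np.random.default_rng() if rng is None else rng
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nn = np.argsort(d2, axis=1)[:, :k]               # indices of the k nearest neighbours
    out = []
    for _ in range(copies):
        j = nn[np.arange(len(X)), rng.integers(0, k, len(X))]
        direction = X[j] - X                          # direction to a random near neighbour
        out.append(X + noise_scale * rng.random((len(X), 1)) * direction)
    return np.vstack(out)
```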
25. Nature inspired learning. Multi-category cases
[Figure: decision regions for classes 1 and 2.]
Pair-wise classifiers, optimally stopped (using the noise-injected pseudo-validation sets); the SLPs are combined by H-T fusion. We need to obtain a classifier (SLP) of optimal complexity: early stopping.
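A sketch of the pairwise (one-vs-one) scheme: one binary SLP per pair of classes (each could be trained with early stopping on its own pseudo-validation set), with the decisions fused by simple voting as a stand-in for the Hastie-Tibshirani (H-T) coupling named on the slide; `train_binary` and `decision` are hypothetical caller-supplied callables:

```python
import numpy as np
from itertools import combinations

def train_pairwise(X, labels, train_binary):
    """One binary classifier per pair of classes (one-vs-one)."""
    classes = np.unique(labels)
    models = {}
    for a, b in combinations(classes, 2):
        mask = (labels == a) | (labels == b)
        models[(a, b)] = train_binary(X[mask], (labels[mask] == a).astype(float))
    return classes, models

def predict_pairwise(X, classes, models, decision):
    """Fuse the pairwise decisions by voting (a simple stand-in for H-T coupling)."""
    votes = np.zeros((len(X), len(classes)))
    index = {c: i for i, c in enumerate(classes)}
    for (a, b), model in models.items():
        pred_a = decision(model, X)                  # boolean array: "class a wins"
        votes[np.arange(len(X)), np.where(pred_a, index[a], index[b])] += 1
    return classes[np.argmax(votes, axis=1)]
```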
26. Learning Rapidity. Two Pattern Recognition (PR) tasks
The time to learn the second task is restricted, say to 300 training epochs.
Parameters that affect learning rapidity: $\eta$, the learning step; the growth of the weights; $s = \text{target}_1 - \text{target}_2$.
Regularization: a) a weight decay term, b) a noise injection into the input vectors, c) a corruption of the targets; scaling of the starting weights, $W_{\text{start}} \leftarrow w \cdot W_{\text{start}}$, where w also controls learning rapidity.
The key parameters: $\eta$, s, and w.
27. Optimal values of the learning parameters
[Figure: optimal values of the learning parameters $\eta$ (the learning step), $s = \text{target}_1 - \text{target}_2$, and w as functions of the number of epochs.]
28. Collective learning. A lengthy sequence of diverse PR tasks
The angle and/or the time between two changes vary all the time.
29. The multi-agent system composed of adaptive agents: the single layer perceptrons
In order to survive, the agents should learn rapidly. Unsuccessful agents are replaced by newborn ones. Inside the group the agents help each other; in a case of emergency, they help the weakest groups. Genetic learning is combined with adaptive learning.
The moral: a single agent (SLP) cannot learn a very long sequence of PR tasks successfully.
30. The power of the PR task changes and the parameter s as functions of time
[Figure: the power of the changes between PR tasks and $s = t_1 - t_2$ plotted over time; s follows the variation of the power of the changes.]
I tried to learn s, emotions, altruism, the noise intensity, the length of the learning set, etc.
31. Integrating Statistical Methods and Neural Networks. Nature inspired learning
Regression: Neural Networks, 13 (3/4), pp. 507-523, 2000.
The theory for the equal covariance matrix case.
The theory for unequal covariance matrices and the multicategory cases: LNCS, 4432, pp. 1-10, 2007; LNCS, 4472, pp. 62-71, 2007; LNCS, 4142, pp. 47-56, 2006; LNAI, 3238, pp. 270-280, 2004; JMLR; ICNC'08.