Title: One-layer neural networks. Classification problems
- Architecture and functioning
- Applicability
- Classification problems. Linear and nonlinear separability
- The Perceptron. The learning algorithm
Architecture
- One-layer NN: one layer of input units and one layer of functional units
- N input units, plus a fictive unit whose input is fixed to -1
- M functional units (output units)
- Total connectivity: each input unit is connected to each functional unit; the weights are stored in a matrix W, which maps the input vector X to the output vector Y
Functioning
- Computing the output signal: Y = f(W·X̄), where X̄ is the input vector extended with the fictive component x0 = -1 and f is the activation function (a sketch follows below)
- Remarks
- In the following we shall use X instead of X̄ to denote the extended vector
- The output units usually have the same activation function
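A minimal NumPy sketch of this functioning (the dimensions, the tanh activation, and all names here are illustrative assumptions, not taken from the slides):

```python
import numpy as np

def forward(W, X, f=np.tanh):
    # One-layer NN functioning: Y = f(W X).
    # X is the extended input vector (its first component is the fictive -1);
    # W has M rows (one per functional unit) and N+1 columns.
    return f(W @ X)

# Hypothetical example: N = 3 input units, M = 2 functional units
rng = np.random.default_rng(0)
W = rng.uniform(-1.0, 1.0, size=(2, 4))   # M x (N+1) weight matrix
X = np.array([-1.0, 0.5, 1.0, -0.2])      # extended input vector, x0 = -1
print(forward(W, X))                      # output vector Y with M components
```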
Applicability
- Classification
- Problem: find the class corresponding to an object described by a set of features
- The training set contains examples of correctly classified objects
- Approximation (regression)
- Problem: estimate a functional dependence between two variables
- The training set contains pairs of corresponding values
Classification problems
- Input: feature vectors
- Output: class labels
- Basic notions
- Feature space (S): the set of all feature vectors (patterns). Example: all representations of letters
- Class: a subset of S containing objects with similar features. Example: the class of representations of the letter A
- Classifier: a system which decides to what class a given object belongs. Example: a method based on the nearest neighbor with respect to a standard pattern
- Example: a letter can be represented as a matrix of active/inactive pixels, read row by row into the binary vector (0,0,1,0,0,0,1,1,0,0,1,0,1,0,0,0,0,1,0,0,0,0,1,0,0), or, more compactly, by a vector recording the presence of some properties (horizontal line, vertical line, oblique right, oblique left, curves), e.g. (0,1,1,0,0); see the sketch below
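Both representations can be reproduced directly (assuming the pixel matrix is 5x5; the variable names are mine):

```python
import numpy as np

# The 25-component pixel vector from the slide, reshaped back into the
# (assumed) 5x5 matrix of active/inactive pixels
v = np.array([0,0,1,0,0, 0,1,1,0,0, 1,0,1,0,0, 0,0,1,0,0, 0,0,1,0,0])
pixels = v.reshape(5, 5)

# The property-based representation from the slide:
# (horizontal line, vertical line, oblique right, oblique left, curves)
properties = np.array([0, 1, 1, 0, 0])
```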
- A more formal approach: a classifier is equivalent to a partitioning of S into subsets based on some decision functions
- Example: N = 2, M = 2 (two-dimensional data and two classes); a decision function defines the boundary between the classes
[Figure: decision functions for linearly separable and nonlinearly separable classes]
- In the N-dimensional case, two classes are considered to be linearly separable if there exists a hyperplane which separates them (formalized below)
- The concept of linear separability can be extended to the case of more than two classes
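In symbols, a standard formalization consistent with this statement: classes $C_1, C_2 \subset \mathbb{R}^N$ are linearly separable iff there exist $w \in \mathbb{R}^N$ and $w_0 \in \mathbb{R}$ such that

$$w^T x > w_0 \ \ \forall x \in C_1 \qquad \text{and} \qquad w^T x < w_0 \ \ \forall x \in C_2.$$

The separating hyperplane is then $\{x : w^T x = w_0\}$.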
- The case of three classes
[Figure: three-class configurations: strongly linearly separable classes; linearly separable classes; regions left undefined]
- Remarks
- A classification problem is considered to be linearly separable if there exist hyperplanes which separate the classes
- Strongly linearly separable problems correspond to situations where the classes are more clearly separated than in the case of just linearly separable ones
- From the applications' point of view, linearly separable means that the classes are clearly separated
One-unit perceptron
- It is the simplest neural network for classification
- It allows classification into two linearly separable classes
- Architecture and functioning: the extended input vector X is combined with the weight vector W, and the output is y = sign(W·X)
- Interpretation of the output: if y = -1 then X belongs to Class 1; if y = 1 then X belongs to Class 2 (a sketch follows below)
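A minimal sketch of this functioning (NumPy; treating W·X = 0 as Class 2 is my arbitrary tie-breaking choice):

```python
import numpy as np

def perceptron_output(w, x):
    # One-unit perceptron: y = sign(w . x), where x is the extended input
    # vector whose first component is the fictive input -1.
    return 1 if np.dot(w, x) >= 0 else -1   # y = -1: Class 1, y = 1: Class 2
```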
- Perceptron's training algorithm (Rosenblatt)
- Training set: (X1,d1), (X2,d2), ..., (XL,dL), where Xl is a feature vector and dl is -1 if Xl should belong to Class 1 and 1 if it should belong to Class 2
- The training is an iterative process based on scanning the training set several times, until the desired behavior is obtained
- At each iteration, for each example l from the training set, the weights are adjusted based on the rule W := W + η(dl - yl)Xl, where yl is the current output for Xl
- Since both dl and yl take values in {-1, 1}, the difference dl - yl is zero whenever the network answers correctly, which means that the weights are adjusted only in the case when the network gives a wrong answer. In such a situation the adjustment is just W := W + 2η·dl·Xl
- The parameter η (correction step or learning rate) is a positive value which can be chosen such that, after the correction, the network gives the right answer for the l-th example (see the training sketch below)
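Putting the pieces together, the training loop might look as follows (a sketch: the fixed η, the random initialization, and the stopping criterion are my assumptions):

```python
import numpy as np

def train_perceptron(X, d, eta=0.1, max_epochs=1000):
    # Rosenblatt training. X: L x (N+1) array of extended feature vectors
    # (first column fixed to -1); d: array of labels in {-1, +1}.
    rng = np.random.default_rng(0)
    w = rng.uniform(-1.0, 1.0, size=X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for xl, dl in zip(X, d):
            y = 1 if np.dot(w, xl) >= 0 else -1
            if y != dl:                  # adjust only on a wrong answer
                w += 2 * eta * dl * xl   # W := W + 2*eta*dl*Xl
                errors += 1
        if errors == 0:                  # every example classified correctly
            break
    return w
```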
- Convergence of the perceptron's learning algorithm
- For a linearly separable problem, the algorithm will converge in a finite number of steps to the coefficients of a boundary hyperplane
- Remarks
- The hypothesis that the classes are linearly separable is essential
- This property of convergence in a finite number of steps is unique in the field of learning algorithms
- The initial values of the weights can influence the number of steps until convergence, but they cannot prevent convergence
Perceptrons with multiple output units
- Classification into more than two linearly separable classes (e.g. M classes)
- If the classes are strongly linearly separable, then one can use M simple perceptrons, which can be trained independently
- If the classes are just linearly separable, then the perceptrons cannot be trained separately; in this case we should use the so-called multiple perceptron
Multiple perceptron
- Architecture and functioning: N input units and M output units with linear activation function (each output unit is associated to a class); the output is computed as Y = WX
- Interpretation of the result
- The index of the maximal value in Y is the label of the class to which X belongs
- If Y is normalized (all elements are in [0,1] and their sum is 1), then the outputs can be interpreted as probabilities: yi is the probability that the input X belongs to class Ci (see the sketch below)
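A small sketch of this interpretation (the sum-based rescaling used to obtain a normalized Y is my assumption, not from the slides):

```python
import numpy as np

def classify(W, x):
    # Multiple perceptron functioning: Y = W x (linear activation);
    # the predicted class is the index of the maximal value in Y.
    return int(np.argmax(W @ x))

def as_probabilities(Y):
    # One simple way to normalize Y so that its elements lie in [0, 1]
    # and sum to 1, allowing y_i to be read as P(x belongs to C_i).
    Y = np.clip(np.asarray(Y, dtype=float), 0.0, None)
    s = Y.sum()
    return Y / s if s > 0 else np.full(Y.size, 1.0 / Y.size)
```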
- Learning algorithm
- Training set: (X1,d1), ..., (XL,dL), where dl is from {1,...,M} and is the index of the class to which Xl belongs
- Step 1: initialization
- Initialize the elements of W (M rows and N+1 columns) with randomly selected values from [-1,1]
- Initialize the iteration counter: k := 0
- Step 2: iterative adjustment
- REPEAT
- scan the training set and adjust the weights
- k := k + 1
- UNTIL k = kmax OR correct = 1 (the network has learned the training set)
- Scan the training set and adjust the weights (a Python version follows below):
- correct := 1
- FOR l := 1, L DO
- compute Y := W·Xl
- find i, the index of the maximum in Y
- IF i <> dl THEN
- Wi := Wi - η·Xl
- Wdl := Wdl + η·Xl
- correct := 0
- ENDIF
- ENDFOR
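The same algorithm in Python (a sketch; here the class indices are 0-based and the default parameter values are my assumptions):

```python
import numpy as np

def train_multiple_perceptron(X, d, M, eta=0.1, k_max=1000):
    # X: L x (N+1) float array of extended row vectors (first column -1);
    # d: array of class indices in {0, ..., M-1}; returns W of shape M x (N+1).
    rng = np.random.default_rng(0)
    W = rng.uniform(-1.0, 1.0, size=(M, X.shape[1]))  # Step 1: initialization
    for k in range(k_max):                            # Step 2: REPEAT ... UNTIL
        correct = True
        for xl, dl in zip(X, d):
            i = int(np.argmax(W @ xl))                # index of the maximum in Y
            if i != dl:                               # wrong answer: adjust
                W[i] -= eta * xl                      # weaken the winning row
                W[dl] += eta * xl                     # strengthen the target row
                correct = False
        if correct:                                   # the network learned the set
            break
    return W
```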
- Remarks
- Wi denotes row i of matrix W (X is also considered to be a row vector)
- If there is more than one maximal value in Y, then i can be any of them
- The learning rate η has a similar role as in the case of the simple perceptron
- If the classes are linearly separable, then the learning algorithm is convergent
Nonlinearly separable problems
- Classification is similar to representing boolean functions f: {0,1}^N -> {0,1}
- For linearly separable problems, one layer is enough
- For nonlinearly separable problems, introducing a hidden layer is necessary; the classic example is XOR, checked below
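As a quick check of the last claim, training the one-unit perceptron on XOR never succeeds, since no hyperplane separates the two classes (a sketch; the labels and the epoch budget are my choices):

```python
import numpy as np

# XOR examples, extended with the fictive component -1; labels in {-1, +1}
X = np.array([[-1, 0, 0], [-1, 0, 1], [-1, 1, 0], [-1, 1, 1]], dtype=float)
d = np.array([-1, 1, 1, -1])

w, eta = np.zeros(3), 0.1
errors = len(X)
for epoch in range(1000):
    errors = 0
    for xl, dl in zip(X, d):
        y = 1 if np.dot(w, xl) >= 0 else -1
        if y != dl:
            w += 2 * eta * dl * xl   # Rosenblatt adjustment
            errors += 1
    if errors == 0:
        break
print("misclassified after training:", errors)   # stays > 0: XOR is not separable
```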