Title: Pattern recognition and classification II
1 Lecture 8
- Pattern recognition and classification II
- Contents
- Neural networks
2 Classification using neural networks
Basic principles: The network transforms an input vector to an output vector. No logical/mathematical rules are known about how the output is related to the input; only examples are available.
Training: The parameters of the network are adjusted so that the network reproduces the examples as well as possible.
Recognition: The network receives input which may or may not be among the examples. The output is the recognition result.
3 Performance
- Generalization capability: correct guess for an example not included in the training
- Speed of training
- Speed of recognition
- Adaptivity: adjustment of internal parameters according to new examples presented
5 Neural networks in image analysis
Pattern classification in image analysis can be based on neural networks. The preprocessed image defines in some way the input of the net, and the output is designed to define the resulting class of the image. The simplest approach is to assign each binary pixel to an input neuron, while the output consists of K output neurons. If the input is an image of class k, then the net is adjusted to give output value 1 on output neuron k and 0 on the rest*). Adjusted means that the weights of the network are chosen to give approximately the correct output when the training examples are applied. A good, well-trained network gives a high value on the kth output neuron and low values on the remaining output neurons if an image similar to a class-k image is input.
_______________________________________________
*) Other mappings between the output vector and the class index can also be used, for example the binary representation of the index.
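The two class-index mappings mentioned above can be sketched in a few lines. This is an illustrative sketch, not from the lecture; the function names are my own.

```python
def one_hot_target(k, num_classes):
    """Target vector where output neuron k is 1 and the rest are 0."""
    return [1 if i == k else 0 for i in range(num_classes)]

def binary_target(k, num_bits):
    """Alternative mapping: the binary representation of the class index,
    most significant bit first."""
    return [(k >> b) & 1 for b in range(num_bits - 1, -1, -1)]
```

For K = 10 character classes, `one_hot_target(6, 10)` needs ten output neurons, while `binary_target(6, 4)` needs only four.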
6 OCR (optical character recognition)
Example: the 10 characters 0, 1, ..., 9, normalized to a binary bitmap of 10 x 10 pixels.
7 [Slide figure: the digits 6 and 7 rendered as 10 x 10 binary bitmaps, with 'X' marking foreground pixels and '.' marking background]
Typical example
Input: 10 x 10 binary numbers
Output: class index
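Feeding such a bitmap to the net means assigning each of the 100 pixels to one input neuron. A minimal sketch of that encoding, assuming the slide's 'X'/'.' notation (the function names are illustrative):

```python
def parse_bitmap(rows):
    """Parse rows like '. . X X' (as drawn on the slide) into 0/1 pixels."""
    return [[1 if c == "X" else 0 for c in row.split()] for row in rows]

def bitmap_to_input(bitmap):
    """Flatten the bitmap so each pixel feeds one input neuron."""
    return [pixel for row in bitmap for pixel in row]
```

A 10 x 10 bitmap thus becomes an input vector of 100 binary values.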
8 Notation concerning the net
L: total number of layers minus 1; l is the layer index; the input has l = 0, the output has l = L
n_l: the number of neurons in layer l
h_{l,i}: the summed input to the ith neuron of layer l
y_{l,i}: the output from the ith neuron of layer l
w_{l,i,j}: the weight between neuron j in layer l-1 and neuron i in layer l
g_l(h): the activation function of layer l

h_{l,i} = Σ_j w_{l,i,j} y_{l-1,j}
y_{l,i} = g_l(h_{l,i})
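The two formulas above define one forward pass through the net. A minimal sketch, assuming one weight matrix per layer and the same activation function in every layer (the lecture allows a different g_l per layer):

```python
import math

def forward(weights, x, g=math.tanh):
    """Forward pass: h_{l,i} = sum_j w_{l,i,j} * y_{l-1,j}, y_{l,i} = g(h_{l,i}).

    weights[l-1] is the weight matrix of layer l, one row per neuron i,
    one column per neuron j of the previous layer. x is the input layer
    output y_{0,j}. Returns the output layer signals y_{L,i}.
    """
    y = x  # layer 0: the input
    for W in weights:
        h = [sum(w_ij * y_j for w_ij, y_j in zip(row, y)) for row in W]
        y = [g(h_i) for h_i in h]
    return y
```

With the identity activation and an identity weight matrix, the input passes through unchanged, which is a quick sanity check on the index conventions.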
9 Notation concerning the example set
m: the example index
y^m_{0,j}: the signal on the jth input neuron for example m
t^m_j: the target (i.e. the desired) signal on the jth output neuron for example m
Combined net and example quantity:
y^m_{L,j}: the signal on the jth output neuron when the input of example m is applied
Cost function: E = ½ Σ_m Σ_i (y^m_{L,i} - t^m_i)²
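The cost function translates directly into code. A sketch, assuming `outputs[m]` holds the net outputs y^m_{L,i} for example m and `targets[m]` the corresponding targets t^m_i:

```python
def cost(outputs, targets):
    """E = 1/2 * sum over examples m and output neurons i of
    (y^m_{L,i} - t^m_i)^2."""
    return 0.5 * sum(
        (y_i - t_i) ** 2
        for y_m, t_m in zip(outputs, targets)
        for y_i, t_i in zip(y_m, t_m)
    )
```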
10 Minimize E = ½ Σ_m Σ_i (y^m_{L,i} - t^m_i)², or try solving many equations:
y^m_{L,i} - t^m_i = 0 for m = 1..M and i = 1..n_L
The equations are still NONLINEAR.
Number of equations = number of examples x output neurons = M·n_L
Number of unknowns = number of weights = N_w
Two extremes:
M·n_L >> N_w: overdetermined system of equations; many local minima, the global minimum has E > 0; good generalization capability
M·n_L < N_w: underdetermined system of equations; few local minima, the global minimum has E = 0; poor generalization capability
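The counts M·n_L and N_w are easy to compute for a concrete architecture. A sketch, assuming a fully connected net without bias terms (the lecture's notation has no biases); the OCR figures below (100 inputs, 20 hidden neurons, 10 outputs, 50 examples) are illustrative, not from the slides:

```python
def num_weights(layer_sizes):
    """N_w for a fully connected net, e.g. layer_sizes = [100, 20, 10]."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

def num_equations(num_examples, num_outputs):
    """M * n_L: one equation y^m_{L,i} - t^m_i = 0 per example and output."""
    return num_examples * num_outputs
```

Here `num_weights([100, 20, 10])` gives 2200 unknowns against `num_equations(50, 10)` = 500 equations, i.e. the underdetermined extreme.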
- Partial derivatives
- ∂E/∂w_{l,i,j} can be calculated
- Gradient descent
- w_{l,i,j,new} = w_{l,i,j,old} - η ∂E/∂w_{l,i,j}
- If η is not too large and ∂E/∂w_{l,i,j} is not zero, the cost function will decrease in such a step.
- Aim
- To reach the global minimum for E in reasonable time
- Traps
- Training ends in a poor local minimum
- The training takes too long
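The update rule above can be demonstrated on a one-dimensional toy cost function. This sketch estimates the derivative numerically instead of with backpropagation, just to show the rule itself; the function names and the example cost E(w) = (w - 3)² are illustrative:

```python
def numerical_grad(E, w, eps=1e-6):
    """Central-difference estimate of dE/dw."""
    return (E(w + eps) - E(w - eps)) / (2 * eps)

def gradient_descent_step(E, w, eta):
    """w_new = w_old - eta * dE/dw."""
    return w - eta * numerical_grad(E, w)
```

With E(w) = (w - 3)² and a moderate η, repeated steps approach the (here unique and global) minimum at w = 3; too large an η would instead make the iterates diverge.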
- Modern neural nets have more than 1000 weights
- ⇒ Search for a minimum in 1000-dimensional space
- One cannot be sure that a minimum found is global
- Tricks in the network design
- 1. Choosing good input representations
- 2. Choosing good numbers of hidden layers and neuron numbers in each hidden layer
- Tricks in the training procedure
- 1. Choosing good start guesses for the weights
- 2. Choosing good values of η (possibly dynamically adjusted)
- 3. Choosing a good stop criterion
- 4. Applying selective brain damage
13 Test: performance test of classification/recognition on examples which have not been included in the training, for the purpose of redesign/retraining
Validation: performance test of classification/recognition on examples which have not been included in the training, for the purpose of benchmarking
Adaptive adjustment: changes of weights by dynamically including new training examples
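The distinction above implies three disjoint example sets: training, test (for redesign/retraining) and validation (for benchmarking). A sketch of such a split; the fractions 60/20/20 and the function name are assumptions, not from the lecture:

```python
import random

def split_examples(examples, train_frac=0.6, test_frac=0.2, seed=0):
    """Shuffle the examples and split them into training, test and
    validation sets; the remainder after train and test goes to validation."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])
```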
14 Overtraining
[Slide figure; annotation: "Stop training here"]
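The "stop training here" rule can be sketched as early stopping: stop once the validation error no longer improves. This is an illustrative sketch; the `patience` parameter (how many epochs without improvement to tolerate) is my assumption, not from the slide:

```python
def early_stopping(validation_errors, patience=3):
    """Return the epoch with the lowest validation error seen before the
    error has failed to improve for `patience` consecutive epochs."""
    best_epoch, best_err = 0, float("inf")
    for epoch, err in enumerate(validation_errors):
        if err < best_err:
            best_epoch, best_err = epoch, err
        elif epoch - best_epoch >= patience:
            break  # overtraining: validation error keeps rising
    return best_epoch
```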
- Efficient way of calculating ∂E/∂w_{l,i,j}: backpropagation
- Here the determination of the partial derivatives has the same complexity as a forward calculation
- Modern research in neural networks: other updating schemes than backpropagation; automatic removal of outliers among the examples
- Two different principles of gradient descent
- Batch training: all examples produce an average value of ∂E/∂w_{l,i,j}, which is then used for updating the weights
- Online training: the value of ∂E/∂w_{l,i,j} is calculated for each example and used for updating the weights before going to the next example.
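The two principles can be contrasted on a single weight. A sketch, assuming `grad(w, ex)` returns the per-example derivative ∂E/∂w (the function names are illustrative):

```python
def batch_update(w, examples, grad, eta):
    """Batch training: average the per-example gradients, then one update."""
    g = sum(grad(w, ex) for ex in examples) / len(examples)
    return w - eta * g

def online_update(w, examples, grad, eta):
    """Online training: update after each example before the next one."""
    for ex in examples:
        w = w - eta * grad(w, ex)
    return w
```

With a per-example cost (w - ex)², i.e. grad(w, ex) = 2(w - ex), both schemes move w toward the mean of the examples, but they pass through different intermediate values.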
16 Advantage of neural networks over expert systems: no model, no logical/mathematical rules involved.
Disadvantage of neural networks: when a network makes a mistake, it is often against all expectations, i.e. the mistaken case seems to lie inside the generality of the training examples. When the network is redesigned or retrained with the mistaken case included as an example, new surprising mistakes may appear. In other words, there is no such thing as a perfect neural network (unless all possible inputs are trained and the network is sufficiently large).
- Applications of neural networks
- 1. Time series (say, stock prices)
- 2. Hyphenation
- 3. OCR of postal zip codes (written by hand)
- 4. Assessment of bank customers (by the bank before lending money)
18 The following is a provocative example: the input vector in the case of application 4, assessment of bank customers (by the bank before lending money).
Input vector: age, married or not, number of divorces, number of children, age of death of already-dead parents, sisters and children, did or did not do military service, hair color, zip code of residence, color blind or not, left-handed or not, smoker or not, diabetic or not, etc., etc.
Output: the risk of lending money.
Training: using the input vectors and outputs of previous customers.