Title: Pattern recognition and classification II
1 Lecture 8
- Pattern recognition and classification II
- Contents
- Neural networks
2 Classification using neural networks
Basic principles: The network transforms an input vector to an output vector. No logical/mathematical rules are known about how the output is related to the input; only examples are available.
Training: The parameters of the network are adjusted so that the network reproduces the examples as well as possible.
Recognition: The network receives input which may or may not be among the examples. The output is the recognition result.
3 Performance
- Generalization capability: correct guess for an example not included in the training
- Speed of training
- Speed of recognition
- Adaptivity: adjustment of internal parameters according to new examples presented
5 Neural networks in image analysis
Pattern classification in image analysis can be based on neural networks. The preprocessed image defines in some way the input of the net, and the output is designed to define the resulting class of the image. The simplest approach is to assign each binary pixel to an input neuron, while the output consists of K output neurons. If the input is an image of class k, then the net is adjusted to give output value 1 on output neuron k and 0 on the rest*). Adjusted means that the weights of the network are chosen to give approximately the correct output when the training examples are applied. A good, well-trained network gives a high value on the kth output neuron and low values on the remaining output neurons if an image similar to a class-k image is input.
_______________________________________________
*) Other mappings between the output vector and the class index can also be used, for example the binary representation of the index.
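The two class-index mappings mentioned above can be sketched in a few lines. This is an illustrative sketch, not from the lecture; the function names are my own.

```python
def one_hot_target(k, num_classes):
    """Target vector where output neuron k is 1 and the rest are 0."""
    return [1 if i == k else 0 for i in range(num_classes)]

def binary_target(k, num_bits):
    """Alternative mapping: the binary representation of the class index,
    most significant bit first."""
    return [(k >> b) & 1 for b in range(num_bits - 1, -1, -1)]
```

For K = 10 character classes, `one_hot_target(6, 10)` needs ten output neurons, while `binary_target(6, 4)` needs only four.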
6 OCR (optical character recognition)
Example: the 10 characters 0, 1, ..., 9, normalized to a binary bitmap of 10 x 10 pixels.
7 [Slide figure: the digits 6 and 7 rendered as 10 x 10 binary bitmaps, with 'X' marking foreground pixels and '.' marking background]
Typical example
Input: 10 x 10 binary numbers
Output: class index
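Feeding such a bitmap to the net means assigning each of the 100 pixels to one input neuron. A minimal sketch of that encoding, assuming the slide's 'X'/'.' notation (the function names are illustrative):

```python
def parse_bitmap(rows):
    """Parse rows like '. . X X' (as drawn on the slide) into 0/1 pixels."""
    return [[1 if c == "X" else 0 for c in row.split()] for row in rows]

def bitmap_to_input(bitmap):
    """Flatten the bitmap so each pixel feeds one input neuron."""
    return [pixel for row in bitmap for pixel in row]
```

A 10 x 10 bitmap thus becomes an input vector of 100 binary values.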
8 Notation concerning the net
L: total number of layers minus 1; l is the layer index; the input has l = 0, the output has l = L
n_l: the number of neurons in layer l
h_{l,i}: the summed input to the ith neuron of layer l
y_{l,i}: the output from the ith neuron of layer l
w_{l,i,j}: the weight between neuron j in layer l-1 and neuron i in layer l
g_l(h): the activation function of layer l

h_{l,i} = Σ_j w_{l,i,j} y_{l-1,j}
y_{l,i} = g_l(h_{l,i})
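The two formulas above define one forward pass through the net. A minimal sketch, assuming one weight matrix per layer and the same activation function in every layer (the lecture allows a different g_l per layer):

```python
import math

def forward(weights, x, g=math.tanh):
    """Forward pass: h_{l,i} = sum_j w_{l,i,j} * y_{l-1,j}, y_{l,i} = g(h_{l,i}).

    weights[l-1] is the weight matrix of layer l, one row per neuron i,
    one column per neuron j of the previous layer. x is the input layer
    output y_{0,j}. Returns the output layer signals y_{L,i}.
    """
    y = x  # layer 0: the input
    for W in weights:
        h = [sum(w_ij * y_j for w_ij, y_j in zip(row, y)) for row in W]
        y = [g(h_i) for h_i in h]
    return y
```

With the identity activation and an identity weight matrix, the input passes through unchanged, which is a quick sanity check on the index conventions.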
9 Notation concerning the example set
m: the example index
y^m_{0,j}: the signal on the jth input neuron for example m
t^m_j: the target (i.e. the desired) signal on the jth output neuron for example m
Combined net and example quantity:
y^m_{L,j}: the signal on the jth output neuron when the input of example m is applied
Cost function: E = ½ Σ_m Σ_i (y^m_{L,i} - t^m_i)²
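The cost function translates directly into code. A sketch, assuming `outputs[m]` holds the net outputs y^m_{L,i} for example m and `targets[m]` the corresponding targets t^m_i:

```python
def cost(outputs, targets):
    """E = 1/2 * sum over examples m and output neurons i of
    (y^m_{L,i} - t^m_i)^2."""
    return 0.5 * sum(
        (y_i - t_i) ** 2
        for y_m, t_m in zip(outputs, targets)
        for y_i, t_i in zip(y_m, t_m)
    )
```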
10 Minimize E = ½ Σ_m Σ_i (y^m_{L,i} - t^m_i)², or try solving many equations:
y^m_{L,i} - t^m_i = 0 for m = 1..M and i = 1..n_L
The equations are still NONLINEAR.
Number of equations = number of examples x output neurons = M·n_L
Number of unknowns = number of weights = N_w
Two extremes:
M·n_L >> N_w: overdetermined system of equations; many local minima, the global minimum has E > 0; good generalization capability
M·n_L < N_w: underdetermined system of equations; few local minima, the global minimum has E = 0; poor generalization capability
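The counts M·n_L and N_w are easy to compute for a concrete architecture. A sketch, assuming a fully connected net without bias terms (the lecture's notation has no biases); the OCR figures below (100 inputs, 20 hidden neurons, 10 outputs, 50 examples) are illustrative, not from the slides:

```python
def num_weights(layer_sizes):
    """N_w for a fully connected net, e.g. layer_sizes = [100, 20, 10]."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

def num_equations(num_examples, num_outputs):
    """M * n_L: one equation y^m_{L,i} - t^m_i = 0 per example and output."""
    return num_examples * num_outputs
```

Here `num_weights([100, 20, 10])` gives 2200 unknowns against `num_equations(50, 10)` = 500 equations, i.e. the underdetermined extreme.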
- Partial derivatives
- ∂E/∂w_{l,i,j} can be calculated
- Gradient descent
- w_{l,i,j,new} = w_{l,i,j,old} - η ∂E/∂w_{l,i,j}
- If η is not too large and ∂E/∂w_{l,i,j} is not zero, the cost function will decrease in such a step.
- Aim
- To reach the global minimum for E in reasonable time
- Traps
- Training ends in a poor local minimum
- The training takes too long
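The update rule above can be demonstrated on a one-dimensional toy cost function. This sketch estimates the derivative numerically instead of with backpropagation, just to show the rule itself; the function names and the example cost E(w) = (w - 3)² are illustrative:

```python
def numerical_grad(E, w, eps=1e-6):
    """Central-difference estimate of dE/dw."""
    return (E(w + eps) - E(w - eps)) / (2 * eps)

def gradient_descent_step(E, w, eta):
    """w_new = w_old - eta * dE/dw."""
    return w - eta * numerical_grad(E, w)
```

With E(w) = (w - 3)² and a moderate η, repeated steps approach the (here unique and global) minimum at w = 3; too large an η would instead make the iterates diverge.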
- Modern neural nets have more than 1000 weights
- ⇒ Search for a minimum in 1000-dimensional space
- One cannot be sure that a minimum found is global
- Tricks in the network design
- 1. Choosing good input representations
- 2. Choosing good numbers of hidden layers and neuron numbers in each hidden layer
- Tricks in the training procedure
- 1. Choosing good start guesses for the weights
- 2. Choosing good values of η (possibly dynamically adjusted)
- 3. Choosing a good stop criterion
- 4. Applying selective brain damage
13 Test: performance test of classification/recognition on examples which have not been included in the training, for the purpose of redesign/retraining
Validation: performance test of classification/recognition on examples which have not been included in the training, for the purpose of benchmarking
Adaptive adjustment: changes of weights by dynamically including new training examples
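The distinction above implies three disjoint example sets: training, test (for redesign/retraining) and validation (for benchmarking). A sketch of such a split; the fractions 60/20/20 and the function name are assumptions, not from the lecture:

```python
import random

def split_examples(examples, train_frac=0.6, test_frac=0.2, seed=0):
    """Shuffle the examples and split them into training, test and
    validation sets; the remainder after train and test goes to validation."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])
```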
14 Overtraining
[Slide figure; annotation: "Stop training here"]
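The "stop training here" rule can be sketched as early stopping: stop once the validation error no longer improves. This is an illustrative sketch; the `patience` parameter (how many epochs without improvement to tolerate) is my assumption, not from the slide:

```python
def early_stopping(validation_errors, patience=3):
    """Return the epoch with the lowest validation error seen before the
    error has failed to improve for `patience` consecutive epochs."""
    best_epoch, best_err = 0, float("inf")
    for epoch, err in enumerate(validation_errors):
        if err < best_err:
            best_epoch, best_err = epoch, err
        elif epoch - best_epoch >= patience:
            break  # overtraining: validation error keeps rising
    return best_epoch
```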
- Efficient way of calculating ∂E/∂w_{l,i,j}: backpropagation
- Here the determination of the partial derivatives has the same complexity as a forward calculation
- Modern research in neural networks: other updating schemes than backpropagation; automatic removal of outliers among the examples
- Two different principles of gradient descent
- Batch training: all examples produce an average value of ∂E/∂w_{l,i,j}, which is then used for updating the weights
- Online training: the value of ∂E/∂w_{l,i,j} is calculated for each example and used for updating the weights before going to the next example.
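The two principles can be contrasted on a single weight. A sketch, assuming `grad(w, ex)` returns the per-example derivative ∂E/∂w (the function names are illustrative):

```python
def batch_update(w, examples, grad, eta):
    """Batch training: average the per-example gradients, then one update."""
    g = sum(grad(w, ex) for ex in examples) / len(examples)
    return w - eta * g

def online_update(w, examples, grad, eta):
    """Online training: update after each example before the next one."""
    for ex in examples:
        w = w - eta * grad(w, ex)
    return w
```

With a per-example cost (w - ex)², i.e. grad(w, ex) = 2(w - ex), both schemes move w toward the mean of the examples, but they pass through different intermediate values.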
16 Advantage of neural networks over expert systems: no model, no logical/mathematical rules involved.
Disadvantage of neural networks: when a network makes a mistake, it is often against all expectations, i.e. the mistaken case seems to lie inside the generality of the training examples. When the network is redesigned or retrained with the mistaken case included as an example, new surprising mistakes may appear. In other words, there is no such thing as a perfect neural network (unless all possible inputs are trained and the network is sufficiently large).
- Applications of neural networks
- 1. Time series (say, stock prices)
- 2. Hyphenation
- 3. OCR of postal zip codes (written by hand)
- 4. Assessment of bank customers (by the bank before lending money)
18 The following is a provocative example: the input vector in the case of application 4, assessment of bank customers (by the bank before lending money).
Input vector: age, married or not, number of divorces, number of children, age of death of already-dead parents, sisters and children, did or did not do military service, hair color, zip code of residence, color blind or not, left-handed or not, smoker or not, diabetic or not, etc., etc.
Output: the risk of lending money.
Training: using the input vectors and outputs of previous customers.