Title: CS515 Neural Networks
2 Objectives
- A generalization of the LMS algorithm, called backpropagation, can be used to train multilayer networks.
- Backpropagation is an approximate steepest descent algorithm in which the performance index is mean square error.
- In order to calculate the derivatives, we need to use the chain rule of calculus.
3 Motivation
- The perceptron learning rule and the LMS algorithm were designed to train single-layer perceptron-like networks.
- They are only able to solve linearly separable classification problems.
- Parallel Distributed Processing
- The multilayer perceptron, trained by the backpropagation algorithm, is currently the most widely used neural network.
4 Three-Layer Network
Number of neurons in each layer
5 Pattern Classification: XOR Gate
- The limitations of the single-layer perceptron (Minsky & Papert, 1969)
6 Two-Layer XOR Network
(Figure: each first-layer neuron makes an individual decision, and the second layer combines them with an AND operation. A code sketch follows.)
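A minimal NumPy sketch of this idea, using hypothetical hand-picked weights (not the slide's values): each first-layer hard-limit neuron implements one decision boundary, and the second-layer neuron ANDs their outputs to realize XOR.

```python
import numpy as np

def hardlim(n):
    """Hard-limit transfer function: 1 where n >= 0, else 0."""
    return (n >= 0).astype(float)

# Assumed hand-picked weights: two first-layer decision boundaries, ANDed in layer 2.
W1 = np.array([[ 1.0,  1.0],    # boundary 1: p1 + p2 - 0.5 >= 0  (at least one input on)
               [-1.0, -1.0]])   # boundary 2: -p1 - p2 + 1.5 >= 0 (not both inputs on)
b1 = np.array([-0.5, 1.5])
W2 = np.array([[1.0, 1.0]])     # second layer ANDs the two first-layer outputs
b2 = np.array([-1.5])

def xor_net(p):
    a1 = hardlim(W1 @ p + b1)
    a2 = hardlim(W2 @ a1 + b2)
    return a2[0]

for p in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(p, "->", xor_net(np.array(p, dtype=float)))   # expected: 0, 1, 1, 0
```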
7 Solved Problem P11.1
- Design a multilayer network to distinguish the two categories shown in the figure, Class I and Class II.
- There is no hyperplane that can separate these two categories.
8 Solution of Problem P11.1
(Figure: the solution combines first-layer decision boundaries with AND and OR operations in the later layers.)
9 Function Approximation
10 Function Approximation
- The centers of the steps occur where the net input to a neuron in the first layer is zero (see the note below).
- The steepness of each step can be adjusted by changing the network weights.
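As a worked instance of the first bullet: for a first-layer neuron with scalar input p, the net input is \(n^1_i = w^1_{i,1}\,p + b^1_i\), so the center of that neuron's step is located where
\[ n^1_i = 0 \quad\Longrightarrow\quad p = -\frac{b^1_i}{w^1_{i,1}}, \]
and increasing the magnitude of \(w^1_{i,1}\) makes the step steeper.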
11-14 Effect of Parameter Changes
(Figures: network response as individual weights and biases are varied.)
15 Function Approximation
- Two-layer networks, with sigmoid transfer functions in the hidden layer and linear transfer functions in the output layer, can approximate virtually any function of interest to any degree of accuracy, provided sufficiently many hidden units are available.
16 Backpropagation Algorithm
- For multilayer networks, the output of one layer becomes the input to the following layer.
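In the standard notation for an M-layer network, this layer-by-layer operation is
\[ \mathbf{a}^{0} = \mathbf{p}, \qquad \mathbf{a}^{m+1} = \mathbf{f}^{m+1}\!\left(\mathbf{W}^{m+1}\mathbf{a}^{m} + \mathbf{b}^{m+1}\right), \quad m = 0, 1, \ldots, M-1, \qquad \mathbf{a} = \mathbf{a}^{M}. \]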
17 Performance Index
- Training Set
- Mean Square Error
- Vector Case
- Approximate Mean Square Error
- Approximate Steepest Descent Algorithm
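The formulas these bullets refer to, reconstructed in the usual notation (x is the vector of all weights and biases, and \(\{\mathbf{p}_q, \mathbf{t}_q\}\) is the training set), are
\[ F(\mathbf{x}) = E\!\left[(\mathbf{t}-\mathbf{a})^{T}(\mathbf{t}-\mathbf{a})\right], \qquad \hat{F}(\mathbf{x}) = \left(\mathbf{t}(k)-\mathbf{a}(k)\right)^{T}\!\left(\mathbf{t}(k)-\mathbf{a}(k)\right) = \mathbf{e}^{T}(k)\,\mathbf{e}(k), \]
with the approximate steepest descent updates
\[ w^{m}_{i,j}(k+1) = w^{m}_{i,j}(k) - \alpha\,\frac{\partial \hat{F}}{\partial w^{m}_{i,j}}, \qquad b^{m}_{i}(k+1) = b^{m}_{i}(k) - \alpha\,\frac{\partial \hat{F}}{\partial b^{m}_{i}}. \]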
18 Chain Rule
- Chain rule: d f(n(w))/dw = ( d f(n)/dn ) ( d n(w)/dw )
- Example: if f(n) = e^n and n = 2w, then f(n(w)) = e^{2w}, and d f(n(w))/dw = ( d f(n)/dn )( d n(w)/dw ) = e^n (2) = 2 e^{2w}.
- The chain rule is applied in the same way to the approximate mean square error to obtain its derivatives with respect to the weights and biases.
19 Sensitivity and Gradient
- The net input to the ith neuron of layer m
- The sensitivity of the approximate mean square error to changes in the ith element of the net input at layer m
- Gradient
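In symbols, these quantities are
\[ n^{m}_{i} = \sum_{j} w^{m}_{i,j}\,a^{m-1}_{j} + b^{m}_{i}, \qquad s^{m}_{i} \equiv \frac{\partial \hat{F}}{\partial n^{m}_{i}}, \]
and, by the chain rule, the gradient elements are
\[ \frac{\partial \hat{F}}{\partial w^{m}_{i,j}} = s^{m}_{i}\,a^{m-1}_{j}, \qquad \frac{\partial \hat{F}}{\partial b^{m}_{i}} = s^{m}_{i}. \]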
20 Steepest Descent Algorithm
- The steepest descent algorithm for the approximate mean square error
- Matrix form
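Written out with the sensitivities defined above, the element-wise updates are
\[ w^{m}_{i,j}(k+1) = w^{m}_{i,j}(k) - \alpha\,s^{m}_{i}\,a^{m-1}_{j}, \qquad b^{m}_{i}(k+1) = b^{m}_{i}(k) - \alpha\,s^{m}_{i}, \]
and in matrix form
\[ \mathbf{W}^{m}(k+1) = \mathbf{W}^{m}(k) - \alpha\,\mathbf{s}^{m}\left(\mathbf{a}^{m-1}\right)^{T}, \qquad \mathbf{b}^{m}(k+1) = \mathbf{b}^{m}(k) - \alpha\,\mathbf{s}^{m}. \]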
21 Backpropagating the Sensitivities
- Backpropagation describes a recurrence relationship in which the sensitivity at layer m is computed from the sensitivity at layer m+1.
- Jacobian matrix
22 Matrix Representation
- The i,j element of the Jacobian matrix:
\[ \frac{\partial n^{m+1}_{i}}{\partial n^{m}_{j}} = w^{m+1}_{i,j}\;\dot{f}^{m}\!\left(n^{m}_{j}\right). \]
23 Recurrence Relation
- The recurrence relation for the sensitivity
- The sensitivities are propagated backward through the network from the last layer to the first layer.
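In the notation above, the recurrence and its starting point at the last layer M are
\[ \mathbf{s}^{m} = \dot{\mathbf{F}}^{m}\!\left(\mathbf{n}^{m}\right)\left(\mathbf{W}^{m+1}\right)^{T}\mathbf{s}^{m+1}, \quad m = M-1, \ldots, 2, 1, \qquad \mathbf{s}^{M} = -2\,\dot{\mathbf{F}}^{M}\!\left(\mathbf{n}^{M}\right)\left(\mathbf{t}-\mathbf{a}\right), \]
where \(\dot{\mathbf{F}}^{m}(\mathbf{n}^{m})\) is the diagonal matrix whose entries are the transfer-function derivatives \(\dot{f}^{m}(n^{m}_{i})\).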
24 Backpropagation Algorithm
25 Summary
- The first step is to propagate the input forward through the network.
- The second step is to propagate the sensitivities backward through the network:
  - Output layer
  - Hidden layer
- The final step is to update the weights and biases (see the code sketch below).
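A minimal NumPy sketch of these three steps for a 1-2-1 network with a log-sigmoid hidden layer and a linear output layer. The initial weights, learning rate, and target function g are illustrative assumptions, not the values used on the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 1)), rng.normal(size=(2, 1))
W2, b2 = rng.normal(size=(1, 2)), rng.normal(size=(1, 1))
alpha = 0.1                                  # assumed learning rate

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

def train_step(p, t):
    """One backpropagation iteration for the 1-2-1 network."""
    global W1, b1, W2, b2
    # Step 1: propagate the input forward through the network.
    a0 = np.array([[p]])
    a1 = logsig(W1 @ a0 + b1)                # log-sigmoid hidden layer
    a2 = W2 @ a1 + b2                        # linear (purelin) output layer
    e = t - a2
    # Step 2: propagate the sensitivities backward.
    s2 = -2.0 * 1.0 * e                      # purelin derivative is 1
    s1 = np.diag(((1.0 - a1) * a1).ravel()) @ W2.T @ s2   # logsig derivative is (1 - a)a
    # Step 3: update the weights and biases (approximate steepest descent).
    W2 -= alpha * s2 @ a1.T;  b2 -= alpha * s2
    W1 -= alpha * s1 @ a0.T;  b1 -= alpha * s1
    return e.item()

# Example usage with an assumed target function on [-2, 2].
g = lambda p: 1.0 + np.sin(np.pi * p / 4.0)
for _ in range(2000):
    p = rng.uniform(-2.0, 2.0)
    train_step(p, g(p))
```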
26 BP Neural Network
27 Ex: Function Approximation
(Figure: a 1-2-1 network is trained to approximate an unknown function; the input p produces the network output a, which is compared with the target t to form the error e.)
28 Network Architecture
(Figure: the 1-2-1 network, with input p and output a.)
29 Initial Values
Initial Network Response
30 Forward Propagation
- Initial input: a^0 = p
- Output of the 1st layer: a^1 = f^1( W^1 a^0 + b^1 )
- Output of the 2nd layer: a^2 = f^2( W^2 a^1 + b^2 )
- Error: e = t - a^2
31 Transfer Function Derivatives
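Assuming, as in the function-approximation slides above, a log-sigmoid hidden layer and a linear output layer, the derivatives needed for backpropagation are
\[ f^{1}(n) = \frac{1}{1+e^{-n}} \;\Rightarrow\; \dot{f}^{1}(n) = \frac{e^{-n}}{\left(1+e^{-n}\right)^{2}} = \left(1 - a^{1}\right)a^{1}, \qquad f^{2}(n) = n \;\Rightarrow\; \dot{f}^{2}(n) = 1. \]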
32 Backpropagation
- The second-layer sensitivity
- The first-layer sensitivity
33 Weight Update
34 Choice of Network Structure
- Multilayer networks can be used to approximate almost any function, if we have enough neurons in the hidden layers.
- We cannot say, in general, how many layers or how many neurons are necessary for adequate performance.
35 Illustrated Example 1
(Figure: response of a 1-3-1 network.)
36 Illustrated Example 2
(Figure: responses of 1-2-1, 1-3-1, 1-4-1, and 1-5-1 networks.)
37 Convergence
(Figures: convergence to the global minimum and convergence to a local minimum. The numbers next to each curve indicate the sequence of iterations.)
38 Generalization
- In most cases the multilayer network is trained with a finite number of examples of proper network behavior.
- This training set is normally representative of a much larger class of possible input/output pairs.
- Can the network successfully generalize what it has learned to the total population?
39 Generalization Example
(Figure: a 1-2-1 network generalizes well; a 1-9-1 network does not.)
For a network to be able to generalize, it should have fewer parameters than there are data points in the training set.
40 Objectives
- Neural networks trained in a supervised manner require a target signal to define correct network behavior.
- Unsupervised learning rules give networks the ability to learn associations between patterns that occur together frequently.
- Associative learning allows networks to perform useful tasks such as pattern recognition (instar) and recall (outstar).
41 What is an Association?
- An association is any link between a system's input and output such that when a pattern A is presented to the system it will respond with pattern B.
- When two patterns are linked by an association, the input pattern is referred to as the stimulus and the output pattern is referred to as the response.
42 Classic Experiments
- Ivan Pavlov
  - He trained a dog to salivate at the sound of a bell, by ringing the bell whenever food was presented. When the bell is repeatedly paired with the food, the dog is conditioned to salivate at the sound of the bell, even when no food is present.
- B. F. Skinner
  - He trained a rat to press a bar in order to obtain a food pellet.
43 Associative Learning
- Anderson and Kohonen independently developed the linear associator in the late 1960s and early 1970s.
- Grossberg introduced nonlinear continuous-time associative networks during the same time period.
44 Simple Associative Network
- Single-input hard-limit associator
- Restrict the value of p to be either 0 or 1, indicating whether a stimulus is absent or present.
- The output a indicates the presence or absence of the network's response.
45 Two Types of Inputs
- Unconditioned stimulus
  - Analogous to the food presented to the dog in Pavlov's experiment.
- Conditioned stimulus
  - Analogous to the bell in Pavlov's experiment.
- The dog salivates only when food is presented. This is an innate response that does not have to be learned.
46 Banana Associator
- An unconditioned stimulus (banana shape) and a conditioned stimulus (banana smell)
- The network initially responds to the shape of a banana, but not to the smell.
47 Associative Learning
- Both animals and humans tend to associate things that occur simultaneously.
- If a banana smell stimulus occurs simultaneously with a banana concept response (activated by some other stimulus such as the sight of a banana shape), the network should strengthen the connection between them so that later it can activate its banana concept in response to the banana smell alone.
48 Unsupervised Hebb Rule
- Increase the weight w_ij between a neuron's input p_j and output a_i in proportion to their product.
- The Hebb rule uses only signals available within the layer containing the weight being updated, so it is a local learning rule.
- Vector form
- Learning is performed in response to the training sequence (see the equations below).
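In standard form (with learning rate α), the rule and its matrix version for a training sequence \(\mathbf{p}(1), \mathbf{p}(2), \ldots, \mathbf{p}(Q)\) are
\[ w_{ij}(q) = w_{ij}(q-1) + \alpha\,a_{i}(q)\,p_{j}(q), \qquad \mathbf{W}(q) = \mathbf{W}(q-1) + \alpha\,\mathbf{a}(q)\,\mathbf{p}^{T}(q). \]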
49 Ex: Banana Associator
- Initial weights
- Training sequence
- Learning rule
(Figure: the associator receives the sight and smell inputs and produces the banana response.)
50 Ex: Banana Associator
- First iteration (sight fails): no response
- Second iteration (sight works): banana
51 Ex: Banana Associator
- Third iteration (sight fails): banana
- From now on, the network is capable of responding to bananas that are detected by either sight or smell. Even if both detection systems suffer intermittent faults, the network will be correct most of the time. (A code sketch of these iterations follows.)
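A minimal sketch of the banana associator trained with the unsupervised Hebb rule. The fixed weight w0, the bias, and the learning rate are assumed illustrative values; the training sequence reproduces the sight failures described above.

```python
# Banana associator: a = hardlim(w0*p0 + w*p + b), trained with the Hebb rule.
hardlim = lambda n: 1.0 if n >= 0 else 0.0

w0 = 1.0      # fixed weight, unconditioned stimulus (banana shape) -- assumed value
w = 0.0       # adjustable weight, conditioned stimulus (banana smell)
b = -0.5      # assumed bias
alpha = 1.0   # assumed learning rate

# Training sequence (p0 = shape detected, p = smell detected):
# the shape sensor fails on iterations 1 and 3.
sequence = [(0.0, 1.0), (1.0, 1.0), (0.0, 1.0)]

for q, (p0, p) in enumerate(sequence, start=1):
    a = hardlim(w0 * p0 + w * p + b)      # network response
    w = w + alpha * a * p                 # unsupervised Hebb rule
    print(f"iteration {q}: {'banana' if a else 'no response'}, w = {w}")
```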
52 Problems of the Hebb Rule
- Weights will become arbitrarily large
  - Synapses cannot grow without bound.
- There is no mechanism for weights to decrease
  - If the inputs or outputs of a Hebb network experience any noise, every weight will grow (however slowly) until the network responds to any stimulus.
53 Hebb Rule with Decay
- Hebb rule with a decay term (see the equations below)
- γ, the decay rate, is a positive constant less than one.
- This keeps the weight matrix from growing without bound. The maximum weight value, which can be found by setting both a_i and p_j to 1, is determined by the decay rate γ.
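Adding the decay term to the Hebb rule gives
\[ \mathbf{W}(q) = \mathbf{W}(q-1) + \alpha\,\mathbf{a}(q)\,\mathbf{p}^{T}(q) - \gamma\,\mathbf{W}(q-1) = (1-\gamma)\,\mathbf{W}(q-1) + \alpha\,\mathbf{a}(q)\,\mathbf{p}^{T}(q). \]
Setting \(a_i = p_j = 1\) and \(w_{ij}(q) = w_{ij}(q-1) = w^{\max}_{ij}\) gives the maximum weight value
\[ w^{\max}_{ij} = \frac{\alpha}{\gamma}. \]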
54 Ex: Banana Associator
- First iteration (sight fails): no response
- Second iteration (sight works): banana
- Third iteration (sight fails): banana
55 Ex: Banana Associator
(Figure: weight growth under the plain Hebb rule versus the Hebb rule with decay.)
56 Problem of the Hebb Rule with Decay
- Associations will decay away if stimuli are not occasionally presented.
- If a_i = 0, then w_ij(q) = (1 - γ) w_ij(q-1). If γ = 0.1, this reduces to w_ij(q) = 0.9 w_ij(q-1).
- The weight decays by 10% at each iteration for which a_i = 0 (no stimulus).
57 Instar (Recognition Network)
- A neuron that has a vector input and a scalar output is referred to as an instar.
- This neuron is capable of pattern recognition.
- The instar is similar to the perceptron, ADALINE, and linear associator.
58 Instar Operation
- Input-output expression (see the equations below)
- The instar is active when the inner product of its weight vector and the input reaches a threshold,
- where θ is the angle between the two vectors.
- For weight and input vectors of fixed length, the inner product is maximized when the angle θ is 0, i.e., when the input points in the same direction as the weight vector.
- Assume that all input vectors have the same length (norm).
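Reconstructed in standard notation (with \({}_{1}\mathbf{w}\) the instar's weight row written as a vector):
\[ a = \mathrm{hardlim}\!\left(\mathbf{W}\mathbf{p} + b\right) = \mathrm{hardlim}\!\left({}_{1}\mathbf{w}^{T}\mathbf{p} + b\right), \]
so the instar is active when
\[ {}_{1}\mathbf{w}^{T}\mathbf{p} \ge -b \qquad\text{or}\qquad \lVert {}_{1}\mathbf{w} \rVert\,\lVert \mathbf{p} \rVert \cos\theta \ge -b. \]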
59 Vector Recognition
- If b = -||_1w|| ||p||, then the instar will be active only when θ = 0, i.e., only when the input points in exactly the same direction as the weight vector.
- If b > -||_1w|| ||p||, then the instar will be active for a range of angles.
- The larger the value of b, the more patterns there will be that can activate the instar, thus making it less discriminatory.
60 Instar Rule
- Hebb rule
- Hebb rule with decay
- Instar rule: a decay term proportional to a_i(q) is added, which limits the forgetting problem because weights decay only when the instar is active.
- If a_i(q) = 1, the weight vector moves toward the input vector (see the equations below).
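In standard form the instar rule is
\[ w_{ij}(q) = w_{ij}(q-1) + \alpha\,a_{i}(q)\left(p_{j}(q) - w_{ij}(q-1)\right), \]
and when the instar is active (\(a_i(q) = 1\)) the weight row \({}_{i}\mathbf{w}\) becomes
\[ {}_{i}\mathbf{w}(q) = (1-\alpha)\,{}_{i}\mathbf{w}(q-1) + \alpha\,\mathbf{p}(q), \]
i.e., it moves a fraction α of the way toward the input vector.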
61 Graphical Representation
- For the case where the instar is active (a_i = 1), the weight vector moves toward the input vector along the line between the old weight vector and the input.
- For the case where the instar is inactive (a_i = 0), the weight vector does not change.
62 Ex: Orange Recognizer
- The elements of p will be constrained to ±1 values.
63 Initialization and Training
- Initial weights
- The instar rule (α = 1)
- Training sequence
- First iteration
64 Second Training Iteration
- Second iteration
- The network can now recognize the orange by its measurements.
65 Third Training Iteration
- The orange will now be detected if either set of sensors works. (A code sketch of the whole example follows.)
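A minimal sketch of the orange recognizer trained with the instar rule (α = 1). The fixed weight w0, the bias, and the ±1 measurement vector are assumed illustrative values; the sight sensor fails on the first and third iterations.

```python
import numpy as np

hardlim = lambda n: 1.0 if n >= 0 else 0.0

w0 = 3.0                                   # fixed weight, unconditioned stimulus (sight) -- assumed
b = -2.0                                   # assumed bias
W = np.zeros(3)                            # adjustable weights for the measurement inputs
alpha = 1.0
p_orange = np.array([1.0, -1.0, -1.0])     # assumed shape/texture/weight measurements (+-1 values)

# Training sequence (p0 = sight of orange, p = measurements); sight fails on 1 and 3.
sequence = [(0.0, p_orange), (1.0, p_orange), (0.0, p_orange)]

for q, (p0, p) in enumerate(sequence, start=1):
    a = hardlim(w0 * p0 + W @ p + b)       # instar response
    W = W + alpha * a * (p - W)            # instar rule
    print(f"iteration {q}: response = {a}, W = {W}")
```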
66 Kohonen Rule
- Kohonen rule (see the equation below)
- Learning occurs when the neuron's index i is a member of the set X(q).
- The Kohonen rule can be made equivalent to the instar rule by defining X(q) as the set of all i such that a_i(q) = 1.
- The Kohonen rule allows the weights of a neuron to learn an input vector and is therefore suitable for recognition applications.
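In standard form, the Kohonen rule updates the weight rows of the selected neurons:
\[ {}_{i}\mathbf{w}(q) = {}_{i}\mathbf{w}(q-1) + \alpha\left(\mathbf{p}(q) - {}_{i}\mathbf{w}(q-1)\right), \qquad i \in X(q). \]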
67 Outstar (Recall Network)
- The outstar network has a scalar input and a vector output.
- It can perform pattern recall by associating a stimulus with a vector response.
68 Outstar Operation
- Input-output expression: a = satlins(Wp)
- If we would like the outstar network to associate a stimulus (an input of 1) with a particular output vector a, set W = a.
- If p = 1, then a = satlins(Wp) = satlins(a·p) = a, and the pattern is correctly recalled.
- The column of the weight matrix represents the pattern to be recalled.
69 Outstar Rule
- In the instar rule, the weight decay term of the Hebb rule is proportional to the output of the network, a_i.
- In the outstar rule, the weight decay term of the Hebb rule is proportional to the input of the network, p_j.
- If the decay rate γ is set equal to the learning rate α, we obtain the outstar rule (see the equations below).
- Learning occurs whenever p_j is nonzero (instead of a_i). When learning occurs, column w_j moves toward the output vector. (Complementary to the instar rule.)
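In standard form, the outstar rule and its column form are
\[ w_{ij}(q) = w_{ij}(q-1) + \alpha\left(a_{i}(q) - w_{ij}(q-1)\right) p_{j}(q), \qquad \mathbf{w}_{j}(q) = \mathbf{w}_{j}(q-1) + \alpha\left(\mathbf{a}(q) - \mathbf{w}_{j}(q-1)\right) p_{j}(q). \]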
70 Ex: Pineapple Recaller
- Any set of p0 values (with ±1 elements) will be copied to a.
71 Initialization
- The outstar rule (α = 1)
- Training sequence
- Pineapple measurements
72 First Training Iteration
73 Second Training Iteration
- Second iteration
- The network forms an association between the sight and the measurements.
74 Third Training Iteration
- Third iteration
- Even if the measurement system fails, the network is now able to recall the measurements of the pineapple when it sees it.