Title: CS515 Neural Networks
2 Objectives
- A generalization of the LMS algorithm, called backpropagation, can be used to train multilayer networks.
- Backpropagation is an approximate steepest descent algorithm in which the performance index is mean square error.
- In order to calculate the derivatives, we need to use the chain rule of calculus.
3 Motivation
- The perceptron learning rule and the LMS algorithm were designed to train single-layer perceptron-like networks.
- They are only able to solve linearly separable classification problems.
- Parallel Distributed Processing
- The multilayer perceptron, trained by the backpropagation algorithm, is currently the most widely used neural network.
4 Three-Layer Network
Number of neurons in each layer
5 Pattern Classification: XOR Gate
- The limitations of the single-layer perceptron (Minsky & Papert, 1969)
6 Two-Layer XOR Network
(Figure: each first-layer neuron makes an individual decision, and the second layer combines them with an AND operation. A code sketch follows.)
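A minimal NumPy sketch of this idea, using hypothetical hand-picked weights (not the slide's values): each first-layer hard-limit neuron implements one decision boundary, and the second-layer neuron ANDs their outputs to realize XOR.

```python
import numpy as np

def hardlim(n):
    """Hard-limit transfer function: 1 where n >= 0, else 0."""
    return (n >= 0).astype(float)

# Assumed hand-picked weights: two first-layer decision boundaries, ANDed in layer 2.
W1 = np.array([[ 1.0,  1.0],    # boundary 1: p1 + p2 - 0.5 >= 0  (at least one input on)
               [-1.0, -1.0]])   # boundary 2: -p1 - p2 + 1.5 >= 0 (not both inputs on)
b1 = np.array([-0.5, 1.5])
W2 = np.array([[1.0, 1.0]])     # second layer ANDs the two first-layer outputs
b2 = np.array([-1.5])

def xor_net(p):
    a1 = hardlim(W1 @ p + b1)
    a2 = hardlim(W2 @ a1 + b2)
    return a2[0]

for p in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(p, "->", xor_net(np.array(p, dtype=float)))   # expected: 0, 1, 1, 0
```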
7 Solved Problem P11.1
- Design a multilayer network to distinguish the two categories shown in the figure, Class I and Class II.
- There is no hyperplane that can separate these two categories.
8 Solution of Problem P11.1
(Figure: the solution combines first-layer decision boundaries with AND and OR operations in the later layers.)
9 Function Approximation
10 Function Approximation
- The centers of the steps occur where the net input to a neuron in the first layer is zero (see the note below).
- The steepness of each step can be adjusted by changing the network weights.
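As a worked instance of the first bullet: for a first-layer neuron with scalar input p, the net input is \(n^1_i = w^1_{i,1}\,p + b^1_i\), so the center of that neuron's step is located where
\[ n^1_i = 0 \quad\Longrightarrow\quad p = -\frac{b^1_i}{w^1_{i,1}}, \]
and increasing the magnitude of \(w^1_{i,1}\) makes the step steeper.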
11-14 Effect of Parameter Changes
(Figures: network response as individual weights and biases are varied.)
15 Function Approximation
- Two-layer networks, with sigmoid transfer functions in the hidden layer and linear transfer functions in the output layer, can approximate virtually any function of interest to any degree of accuracy, provided sufficiently many hidden units are available.
16 Backpropagation Algorithm
- For multilayer networks, the output of one layer becomes the input to the following layer.
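In the standard notation for an M-layer network, this layer-by-layer operation is
\[ \mathbf{a}^{0} = \mathbf{p}, \qquad \mathbf{a}^{m+1} = \mathbf{f}^{m+1}\!\left(\mathbf{W}^{m+1}\mathbf{a}^{m} + \mathbf{b}^{m+1}\right), \quad m = 0, 1, \ldots, M-1, \qquad \mathbf{a} = \mathbf{a}^{M}. \]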
17 Performance Index
- Training Set
- Mean Square Error
- Vector Case
- Approximate Mean Square Error
- Approximate Steepest Descent Algorithm
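The formulas these bullets refer to, reconstructed in the usual notation (x is the vector of all weights and biases, and \(\{\mathbf{p}_q, \mathbf{t}_q\}\) is the training set), are
\[ F(\mathbf{x}) = E\!\left[(\mathbf{t}-\mathbf{a})^{T}(\mathbf{t}-\mathbf{a})\right], \qquad \hat{F}(\mathbf{x}) = \left(\mathbf{t}(k)-\mathbf{a}(k)\right)^{T}\!\left(\mathbf{t}(k)-\mathbf{a}(k)\right) = \mathbf{e}^{T}(k)\,\mathbf{e}(k), \]
with the approximate steepest descent updates
\[ w^{m}_{i,j}(k+1) = w^{m}_{i,j}(k) - \alpha\,\frac{\partial \hat{F}}{\partial w^{m}_{i,j}}, \qquad b^{m}_{i}(k+1) = b^{m}_{i}(k) - \alpha\,\frac{\partial \hat{F}}{\partial b^{m}_{i}}. \]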
18 Chain Rule
- Chain rule: d f(n(w))/dw = ( d f(n)/dn ) ( d n(w)/dw )
- Example: if f(n) = e^n and n = 2w, then f(n(w)) = e^{2w}, and d f(n(w))/dw = ( d f(n)/dn )( d n(w)/dw ) = e^n (2) = 2 e^{2w}.
- The chain rule is applied in the same way to the approximate mean square error to obtain its derivatives with respect to the weights and biases.
19 Sensitivity and Gradient
- The net input to the ith neuron of layer m
- The sensitivity of the approximate mean square error to changes in the ith element of the net input at layer m
- Gradient
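In symbols, these quantities are
\[ n^{m}_{i} = \sum_{j} w^{m}_{i,j}\,a^{m-1}_{j} + b^{m}_{i}, \qquad s^{m}_{i} \equiv \frac{\partial \hat{F}}{\partial n^{m}_{i}}, \]
and, by the chain rule, the gradient elements are
\[ \frac{\partial \hat{F}}{\partial w^{m}_{i,j}} = s^{m}_{i}\,a^{m-1}_{j}, \qquad \frac{\partial \hat{F}}{\partial b^{m}_{i}} = s^{m}_{i}. \]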
20 Steepest Descent Algorithm
- The steepest descent algorithm for the approximate mean square error
- Matrix form
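Written out with the sensitivities defined above, the element-wise updates are
\[ w^{m}_{i,j}(k+1) = w^{m}_{i,j}(k) - \alpha\,s^{m}_{i}\,a^{m-1}_{j}, \qquad b^{m}_{i}(k+1) = b^{m}_{i}(k) - \alpha\,s^{m}_{i}, \]
and in matrix form
\[ \mathbf{W}^{m}(k+1) = \mathbf{W}^{m}(k) - \alpha\,\mathbf{s}^{m}\left(\mathbf{a}^{m-1}\right)^{T}, \qquad \mathbf{b}^{m}(k+1) = \mathbf{b}^{m}(k) - \alpha\,\mathbf{s}^{m}. \]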
21 Backpropagating the Sensitivities
- Backpropagation describes a recurrence relationship in which the sensitivity at layer m is computed from the sensitivity at layer m+1.
- Jacobian matrix
22 Matrix Representation
- The i,j element of the Jacobian matrix:
\[ \frac{\partial n^{m+1}_{i}}{\partial n^{m}_{j}} = w^{m+1}_{i,j}\;\dot{f}^{m}\!\left(n^{m}_{j}\right). \]
23 Recurrence Relation
- The recurrence relation for the sensitivity
- The sensitivities are propagated backward through the network from the last layer to the first layer.
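In the notation above, the recurrence and its starting point at the last layer M are
\[ \mathbf{s}^{m} = \dot{\mathbf{F}}^{m}\!\left(\mathbf{n}^{m}\right)\left(\mathbf{W}^{m+1}\right)^{T}\mathbf{s}^{m+1}, \quad m = M-1, \ldots, 2, 1, \qquad \mathbf{s}^{M} = -2\,\dot{\mathbf{F}}^{M}\!\left(\mathbf{n}^{M}\right)\left(\mathbf{t}-\mathbf{a}\right), \]
where \(\dot{\mathbf{F}}^{m}(\mathbf{n}^{m})\) is the diagonal matrix whose entries are the transfer-function derivatives \(\dot{f}^{m}(n^{m}_{i})\).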
24 Backpropagation Algorithm
25 Summary
- The first step is to propagate the input forward through the network.
- The second step is to propagate the sensitivities backward through the network:
  - Output layer
  - Hidden layer
- The final step is to update the weights and biases (see the code sketch below).
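A minimal NumPy sketch of these three steps for a 1-2-1 network with a log-sigmoid hidden layer and a linear output layer. The initial weights, learning rate, and target function g are illustrative assumptions, not the values used on the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 1)), rng.normal(size=(2, 1))
W2, b2 = rng.normal(size=(1, 2)), rng.normal(size=(1, 1))
alpha = 0.1                                  # assumed learning rate

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

def train_step(p, t):
    """One backpropagation iteration for the 1-2-1 network."""
    global W1, b1, W2, b2
    # Step 1: propagate the input forward through the network.
    a0 = np.array([[p]])
    a1 = logsig(W1 @ a0 + b1)                # log-sigmoid hidden layer
    a2 = W2 @ a1 + b2                        # linear (purelin) output layer
    e = t - a2
    # Step 2: propagate the sensitivities backward.
    s2 = -2.0 * 1.0 * e                      # purelin derivative is 1
    s1 = np.diag(((1.0 - a1) * a1).ravel()) @ W2.T @ s2   # logsig derivative is (1 - a)a
    # Step 3: update the weights and biases (approximate steepest descent).
    W2 -= alpha * s2 @ a1.T;  b2 -= alpha * s2
    W1 -= alpha * s1 @ a0.T;  b1 -= alpha * s1
    return e.item()

# Example usage with an assumed target function on [-2, 2].
g = lambda p: 1.0 + np.sin(np.pi * p / 4.0)
for _ in range(2000):
    p = rng.uniform(-2.0, 2.0)
    train_step(p, g(p))
```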
26 BP Neural Network
27 Ex: Function Approximation
(Figure: a 1-2-1 network is trained to approximate an unknown function; the input p produces the network output a, which is compared with the target t to form the error e.)
28 Network Architecture
(Figure: the 1-2-1 network, with input p and output a.)
29 Initial Values
Initial Network Response
30 Forward Propagation
- Initial input: a^0 = p
- Output of the 1st layer: a^1 = f^1( W^1 a^0 + b^1 )
- Output of the 2nd layer: a^2 = f^2( W^2 a^1 + b^2 )
- Error: e = t - a^2
31 Transfer Function Derivatives
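Assuming, as in the function-approximation slides above, a log-sigmoid hidden layer and a linear output layer, the derivatives needed for backpropagation are
\[ f^{1}(n) = \frac{1}{1+e^{-n}} \;\Rightarrow\; \dot{f}^{1}(n) = \frac{e^{-n}}{\left(1+e^{-n}\right)^{2}} = \left(1 - a^{1}\right)a^{1}, \qquad f^{2}(n) = n \;\Rightarrow\; \dot{f}^{2}(n) = 1. \]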
32 Backpropagation
- The second-layer sensitivity
- The first-layer sensitivity
33 Weight Update
34 Choice of Network Structure
- Multilayer networks can be used to approximate almost any function, if we have enough neurons in the hidden layers.
- We cannot say, in general, how many layers or how many neurons are necessary for adequate performance.
35 Illustrated Example 1
(Figure: response of a 1-3-1 network.)
36 Illustrated Example 2
(Figure: responses of 1-2-1, 1-3-1, 1-4-1, and 1-5-1 networks.)
37 Convergence
(Figures: convergence to the global minimum and convergence to a local minimum. The numbers next to each curve indicate the sequence of iterations.)
38 Generalization
- In most cases the multilayer network is trained with a finite number of examples of proper network behavior.
- This training set is normally representative of a much larger class of possible input/output pairs.
- Can the network successfully generalize what it has learned to the total population?
39 Generalization Example
(Figure: a 1-2-1 network generalizes well; a 1-9-1 network does not.)
For a network to be able to generalize, it should have fewer parameters than there are data points in the training set.
40 Objectives
- Neural networks trained in a supervised manner require a target signal to define correct network behavior.
- Unsupervised learning rules give networks the ability to learn associations between patterns that occur together frequently.
- Associative learning allows networks to perform useful tasks such as pattern recognition (instar) and recall (outstar).
41 What is an Association?
- An association is any link between a system's input and output such that when a pattern A is presented to the system it will respond with pattern B.
- When two patterns are linked by an association, the input pattern is referred to as the stimulus and the output pattern is referred to as the response.
42 Classic Experiments
- Ivan Pavlov
  - He trained a dog to salivate at the sound of a bell, by ringing the bell whenever food was presented. When the bell is repeatedly paired with the food, the dog is conditioned to salivate at the sound of the bell, even when no food is present.
- B. F. Skinner
  - He trained a rat to press a bar in order to obtain a food pellet.
43 Associative Learning
- Anderson and Kohonen independently developed the linear associator in the late 1960s and early 1970s.
- Grossberg introduced nonlinear continuous-time associative networks during the same time period.
44 Simple Associative Network
- Single-input hard-limit associator
- Restrict the value of p to be either 0 or 1, indicating whether a stimulus is absent or present.
- The output a indicates the presence or absence of the network's response.
45 Two Types of Inputs
- Unconditioned stimulus
  - Analogous to the food presented to the dog in Pavlov's experiment.
- Conditioned stimulus
  - Analogous to the bell in Pavlov's experiment.
- The dog salivates only when food is presented. This is an innate response that does not have to be learned.
46 Banana Associator
- An unconditioned stimulus (banana shape) and a conditioned stimulus (banana smell)
- The network initially responds to the shape of a banana, but not to the smell.
47 Associative Learning
- Both animals and humans tend to associate things that occur simultaneously.
- If a banana smell stimulus occurs simultaneously with a banana concept response (activated by some other stimulus such as the sight of a banana shape), the network should strengthen the connection between them so that later it can activate its banana concept in response to the banana smell alone.
48 Unsupervised Hebb Rule
- Increase the weight w_ij between a neuron's input p_j and output a_i in proportion to their product.
- The Hebb rule uses only signals available within the layer containing the weight being updated, so it is a local learning rule.
- Vector form
- Learning is performed in response to the training sequence (see the equations below).
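In standard form (with learning rate α), the rule and its matrix version for a training sequence \(\mathbf{p}(1), \mathbf{p}(2), \ldots, \mathbf{p}(Q)\) are
\[ w_{ij}(q) = w_{ij}(q-1) + \alpha\,a_{i}(q)\,p_{j}(q), \qquad \mathbf{W}(q) = \mathbf{W}(q-1) + \alpha\,\mathbf{a}(q)\,\mathbf{p}^{T}(q). \]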
49 Ex: Banana Associator
- Initial weights
- Training sequence
- Learning rule
(Figure: the associator receives the sight and smell inputs and produces the banana response.)
50 Ex: Banana Associator
- First iteration (sight fails): no response
- Second iteration (sight works): banana
51 Ex: Banana Associator
- Third iteration (sight fails): banana
- From now on, the network is capable of responding to bananas that are detected by either sight or smell. Even if both detection systems suffer intermittent faults, the network will be correct most of the time. (A code sketch of these iterations follows.)
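A minimal sketch of the banana associator trained with the unsupervised Hebb rule. The fixed weight w0, the bias, and the learning rate are assumed illustrative values; the training sequence reproduces the sight failures described above.

```python
# Banana associator: a = hardlim(w0*p0 + w*p + b), trained with the Hebb rule.
hardlim = lambda n: 1.0 if n >= 0 else 0.0

w0 = 1.0      # fixed weight, unconditioned stimulus (banana shape) -- assumed value
w = 0.0       # adjustable weight, conditioned stimulus (banana smell)
b = -0.5      # assumed bias
alpha = 1.0   # assumed learning rate

# Training sequence (p0 = shape detected, p = smell detected):
# the shape sensor fails on iterations 1 and 3.
sequence = [(0.0, 1.0), (1.0, 1.0), (0.0, 1.0)]

for q, (p0, p) in enumerate(sequence, start=1):
    a = hardlim(w0 * p0 + w * p + b)      # network response
    w = w + alpha * a * p                 # unsupervised Hebb rule
    print(f"iteration {q}: {'banana' if a else 'no response'}, w = {w}")
```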
52 Problems of the Hebb Rule
- Weights will become arbitrarily large
  - Synapses cannot grow without bound.
- There is no mechanism for weights to decrease
  - If the inputs or outputs of a Hebb network experience any noise, every weight will grow (however slowly) until the network responds to any stimulus.
53 Hebb Rule with Decay
- Hebb rule with a decay term (see the equations below)
- γ, the decay rate, is a positive constant less than one.
- This keeps the weight matrix from growing without bound. The maximum weight value, which can be found by setting both a_i and p_j to 1, is determined by the decay rate γ.
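Adding the decay term to the Hebb rule gives
\[ \mathbf{W}(q) = \mathbf{W}(q-1) + \alpha\,\mathbf{a}(q)\,\mathbf{p}^{T}(q) - \gamma\,\mathbf{W}(q-1) = (1-\gamma)\,\mathbf{W}(q-1) + \alpha\,\mathbf{a}(q)\,\mathbf{p}^{T}(q). \]
Setting \(a_i = p_j = 1\) and \(w_{ij}(q) = w_{ij}(q-1) = w^{\max}_{ij}\) gives the maximum weight value
\[ w^{\max}_{ij} = \frac{\alpha}{\gamma}. \]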
54 Ex: Banana Associator
- First iteration (sight fails): no response
- Second iteration (sight works): banana
- Third iteration (sight fails): banana
55 Ex: Banana Associator
(Figure: weight growth under the plain Hebb rule versus the Hebb rule with decay.)
56 Problem of the Hebb Rule with Decay
- Associations will decay away if stimuli are not occasionally presented.
- If a_i = 0, then w_ij(q) = (1 - γ) w_ij(q-1). If γ = 0.1, this reduces to w_ij(q) = 0.9 w_ij(q-1).
- The weight decays by 10% at each iteration for which a_i = 0 (no stimulus).
57 Instar (Recognition Network)
- A neuron that has a vector input and a scalar output is referred to as an instar.
- This neuron is capable of pattern recognition.
- The instar is similar to the perceptron, ADALINE, and linear associator.
58 Instar Operation
- Input-output expression (see the equations below)
- The instar is active when the inner product of its weight vector and the input reaches a threshold,
- where θ is the angle between the two vectors.
- For weight and input vectors of fixed length, the inner product is maximized when the angle θ is 0, i.e., when the input points in the same direction as the weight vector.
- Assume that all input vectors have the same length (norm).
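Reconstructed in standard notation (with \({}_{1}\mathbf{w}\) the instar's weight row written as a vector):
\[ a = \mathrm{hardlim}\!\left(\mathbf{W}\mathbf{p} + b\right) = \mathrm{hardlim}\!\left({}_{1}\mathbf{w}^{T}\mathbf{p} + b\right), \]
so the instar is active when
\[ {}_{1}\mathbf{w}^{T}\mathbf{p} \ge -b \qquad\text{or}\qquad \lVert {}_{1}\mathbf{w} \rVert\,\lVert \mathbf{p} \rVert \cos\theta \ge -b. \]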
59 Vector Recognition
- If b = -||_1w|| ||p||, then the instar will be active only when θ = 0, i.e., only when the input points in exactly the same direction as the weight vector.
- If b > -||_1w|| ||p||, then the instar will be active for a range of angles.
- The larger the value of b, the more patterns there will be that can activate the instar, thus making it less discriminatory.
60 Instar Rule
- Hebb rule
- Hebb rule with decay
- Instar rule: a decay term proportional to a_i(q) is added, which limits the forgetting problem because weights decay only when the instar is active.
- If a_i(q) = 1, the weight vector moves toward the input vector (see the equations below).
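In standard form the instar rule is
\[ w_{ij}(q) = w_{ij}(q-1) + \alpha\,a_{i}(q)\left(p_{j}(q) - w_{ij}(q-1)\right), \]
and when the instar is active (\(a_i(q) = 1\)) the weight row \({}_{i}\mathbf{w}\) becomes
\[ {}_{i}\mathbf{w}(q) = (1-\alpha)\,{}_{i}\mathbf{w}(q-1) + \alpha\,\mathbf{p}(q), \]
i.e., it moves a fraction α of the way toward the input vector.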
61 Graphical Representation
- For the case where the instar is active (a_i = 1), the weight vector moves toward the input vector along the line between the old weight vector and the input.
- For the case where the instar is inactive (a_i = 0), the weight vector does not change.
62 Ex: Orange Recognizer
- The elements of p will be constrained to ±1 values.
63 Initialization and Training
- Initial weights
- The instar rule (α = 1)
- Training sequence
- First iteration
64 Second Training Iteration
- Second iteration
- The network can now recognize the orange by its measurements.
65 Third Training Iteration
- The orange will now be detected if either set of sensors works. (A code sketch of the whole example follows.)
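A minimal sketch of the orange recognizer trained with the instar rule (α = 1). The fixed weight w0, the bias, and the ±1 measurement vector are assumed illustrative values; the sight sensor fails on the first and third iterations.

```python
import numpy as np

hardlim = lambda n: 1.0 if n >= 0 else 0.0

w0 = 3.0                                   # fixed weight, unconditioned stimulus (sight) -- assumed
b = -2.0                                   # assumed bias
W = np.zeros(3)                            # adjustable weights for the measurement inputs
alpha = 1.0
p_orange = np.array([1.0, -1.0, -1.0])     # assumed shape/texture/weight measurements (+-1 values)

# Training sequence (p0 = sight of orange, p = measurements); sight fails on 1 and 3.
sequence = [(0.0, p_orange), (1.0, p_orange), (0.0, p_orange)]

for q, (p0, p) in enumerate(sequence, start=1):
    a = hardlim(w0 * p0 + W @ p + b)       # instar response
    W = W + alpha * a * (p - W)            # instar rule
    print(f"iteration {q}: response = {a}, W = {W}")
```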
66 Kohonen Rule
- Kohonen rule (see the equation below)
- Learning occurs when the neuron's index i is a member of the set X(q).
- The Kohonen rule can be made equivalent to the instar rule by defining X(q) as the set of all i such that a_i(q) = 1.
- The Kohonen rule allows the weights of a neuron to learn an input vector and is therefore suitable for recognition applications.
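In standard form, the Kohonen rule updates the weight rows of the selected neurons:
\[ {}_{i}\mathbf{w}(q) = {}_{i}\mathbf{w}(q-1) + \alpha\left(\mathbf{p}(q) - {}_{i}\mathbf{w}(q-1)\right), \qquad i \in X(q). \]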
67 Outstar (Recall Network)
- The outstar network has a scalar input and a vector output.
- It can perform pattern recall by associating a stimulus with a vector response.
68 Outstar Operation
- Input-output expression: a = satlins(Wp)
- If we would like the outstar network to associate a stimulus (an input of 1) with a particular output vector a, set W = a.
- If p = 1, then a = satlins(Wp) = satlins(a·p) = a, and the pattern is correctly recalled.
- The column of the weight matrix represents the pattern to be recalled.
69 Outstar Rule
- In the instar rule, the weight decay term of the Hebb rule is proportional to the output of the network, a_i.
- In the outstar rule, the weight decay term of the Hebb rule is proportional to the input of the network, p_j.
- If the decay rate γ is set equal to the learning rate α, we obtain the outstar rule (see the equations below).
- Learning occurs whenever p_j is nonzero (instead of a_i). When learning occurs, column w_j moves toward the output vector. (Complementary to the instar rule.)
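In standard form, the outstar rule and its column form are
\[ w_{ij}(q) = w_{ij}(q-1) + \alpha\left(a_{i}(q) - w_{ij}(q-1)\right) p_{j}(q), \qquad \mathbf{w}_{j}(q) = \mathbf{w}_{j}(q-1) + \alpha\left(\mathbf{a}(q) - \mathbf{w}_{j}(q-1)\right) p_{j}(q). \]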
70 Ex: Pineapple Recaller
- Any set of p0 values (with ±1 elements) will be copied to a.
71 Initialization
- The outstar rule (α = 1)
- Training sequence
- Pineapple measurements
72 First Training Iteration
73 Second Training Iteration
- Second iteration
- The network forms an association between the sight and the measurements.
74 Third Training Iteration
- Third iteration
- Even if the measurement system fails, the network is now able to recall the measurements of the pineapple when it sees it.