Title: Error back-propagation (technical)
Slide 1: Error back-propagation (technical)

Slide 2: Deriving the back-prop algorithm for computing derivatives
- Will now derive the back-prop algorithm for a general network with arbitrary feed-forward structure
- Resulting formulae will then be illustrated using a simple layered network structure with
  - a single layer of sigmoidal hidden units, and
  - a sum-of-squares error
Slide 3: Feed-forward networks
Slide 4: First, what does each unit (artificial neuron) do?
- In a general feed-forward network, each unit j computes a weighted sum of its inputs:

    a_j = \sum_i w_{ji} z_i        (4.26)

  Here z_i is the activation (output) of a unit, or input, i, which sends a connection to unit j (it could be x_i if i is a network input); the summation is over all units i that send connections to unit j; and w_{ji} is the weight associated with that connection (from unit i to unit j).
Slide 5: What else?
- The sum a_j is then transformed by a (usually nonlinear) activation function g(·) (such as a sigmoid) to give the activation z_j of the form

    z_j = g(a_j)        (4.27)

  This z_j could be y_j if unit j is a network output.
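As a concrete illustration of (4.26) and (4.27), here is a minimal NumPy sketch of the forward computation for a single unit; the function and variable names (unit_forward, w_j, z_in) are illustrative assumptions, not from the slides, and the sigmoid is just one possible choice of g.

    import numpy as np

    def sigmoid(a):
        # Logistic sigmoid activation g(a) = 1 / (1 + exp(-a))
        return 1.0 / (1.0 + np.exp(-a))

    def unit_forward(w_j, z_in):
        # Forward computation for one unit j.
        # w_j  : weights w_ji from each sending unit i to unit j
        # z_in : activations z_i of the sending units (x_i for network inputs)
        a_j = np.dot(w_j, z_in)   # a_j = sum_i w_ji * z_i   (4.26)
        z_j = sigmoid(a_j)        # z_j = g(a_j)             (4.27)
        return a_j, z_j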
Slide 6: Error function
- We seek to determine suitable weights to minimize some appropriate error function
- Will consider error functions that can be written as a sum, over all patterns in the training set, of an error defined for each pattern separately:

    E = \sum_n E^n        (4.28)

  Here n is the pattern index (a superscript index, not a power!)
- Nearly all practical error functions can be written in this form
Slide 7: Error function (cont'd)
- We'll also assume that the error can be written as a function of the network output variables, the y_k's:

    E^n = E^n(y_1, ..., y_c)        (4.29)
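A minimal sketch of an error of this form, using the sum-of-squares error mentioned on Slide 2 (the 1/2 factor and the array names Y, T are conventions assumed here, not taken from the slides):

    import numpy as np

    def sum_of_squares_error(Y, T):
        # Total error E = sum_n E^n, per (4.28), with the per-pattern error
        # E^n = 1/2 * sum_k (y_k^n - t_k^n)^2 written in terms of the
        # network outputs, per (4.29).
        # Y : network outputs, one row per pattern n
        # T : corresponding target values, same shape as Y
        per_pattern = 0.5 * np.sum((Y - T) ** 2, axis=1)   # E^n for each pattern
        return np.sum(per_pattern)                          # E = sum_n E^n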
Slide 8: Our main goal (reminder)
- To find the derivatives of the error E with respect to the weights (and biases) in the network
- But from (4.28) we just need to know how to find the derivative of E for one pattern (at a time) and add up the results
- Thus we can just consider one pattern from now on
- Can drop the pattern index n
Slide 9: Consider evaluating the derivative of E (i.e., E^n) w.r.t. w_{ji}
- Note that E depends on the weight w_{ji} only via the summed input a_j to unit j.
- Thus, we apply the chain rule to get

    \partial E / \partial w_{ji} = (\partial E / \partial a_j)(\partial a_j / \partial w_{ji})        (4.30)    [chain rule]

  Introducing the notation

    \delta_j \equiv \partial E / \partial a_j        (4.31)

  and noting that, from the definition of a_j in (4.26),

    \partial a_j / \partial w_{ji} = z_i        (4.32)

  we obtain

    \partial E / \partial w_{ji} = \delta_j z_i        (4.33)
Slide 10: More on the derivative of E. Are we done?
- We just derived \partial E / \partial w_{ji} = \delta_j z_i   (4.33)
- This is Hebbian in form, since it correlates input and output
- Not quite done: we still need to evaluate \delta_j for every unit
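In vectorized form, (4.33) is just an outer product; a minimal sketch (the name weight_gradients is illustrative):

    import numpy as np

    def weight_gradients(delta, z):
        # dE/dw_ji = delta_j * z_i for one pattern, per (4.33).
        # delta : errors delta_j of the receiving units
        # z     : activations z_i of the sending units
        return np.outer(delta, z)   # entry (j, i) equals delta_j * z_i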
Slide 11: Easy for output units!
- For an output unit k, applying the chain rule once more:

    \delta_k = \partial E / \partial a_k = g'(a_k) \, \partial E / \partial y_k        (4.34)

  Here g is the activation function giving y_k as a function of a_k, i.e. y_k = g(a_k).
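For instance, with a sum-of-squares error E = 1/2 \sum_k (y_k - t_k)^2 (the example error from Slide 2), \partial E / \partial y_k = y_k - t_k, and (4.34) becomes the following sketch (g_prime is assumed to be the derivative of the chosen activation g):

    import numpy as np

    def output_deltas(a_out, y, t, g_prime):
        # delta_k = g'(a_k) * dE/dy_k for the output units, per (4.34),
        # specialized to a sum-of-squares error where dE/dy_k = y_k - t_k.
        return g_prime(a_out) * (y - t)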
Slide 12: What about hidden units?
- For a hidden unit j, using the definition of \delta and the chain rule once again:

    \delta_j = \partial E / \partial a_j = \sum_k (\partial E / \partial a_k)(\partial a_k / \partial a_j)        (4.35)

  The sum runs over all units k to which unit j sends connections. Note that units k are closer to the network output than unit j.
Slide 13: (No transcript)
Slide 14: We already had...
- From (4.35): \delta_j = \sum_k (\partial E / \partial a_k)(\partial a_k / \partial a_j)
- From (4.26) and (4.27), re-indexed for unit k: a_k = \sum_{j'} w_{kj'} z_{j'} = \sum_{j'} w_{kj'} g(a_{j'}), where the sum is over all units j' feeding into unit k, including unit j
- Thus, differentiating a_k with respect to a_j, all partial derivatives are 0 except the term with j' = j:

    \partial a_k / \partial a_j = w_{kj} g'(a_j)
Slide 15: All partial derivatives are 0 except the term with j' = j
- Substituting \partial a_k / \partial a_j = w_{kj} g'(a_j) into (4.35), we just derived the back-propagation formula

    \delta_j = g'(a_j) \sum_k w_{kj} \delta_k        (4.36)
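A minimal vectorized sketch of (4.36) for one layer of hidden units (the names W_out, delta_out, g_prime are illustrative; W_out is assumed to hold the weights w_kj with rows indexed by k and columns by j):

    import numpy as np

    def hidden_deltas(a_hidden, W_out, delta_out, g_prime):
        # delta_j = g'(a_j) * sum_k w_kj * delta_k, per (4.36).
        # a_hidden  : summed inputs a_j of the hidden units
        # W_out     : weights w_kj from hidden unit j to the units k it feeds
        # delta_out : errors delta_k already computed for those units k
        return g_prime(a_hidden) * (W_out.T @ delta_out)   # sum_k w_kj * delta_k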
Slide 16: Summary of back-propagation
- Apply an input vector x to the network and find the activations (outputs) of all neurons; this is forward propagation (4.26), (4.27)
- Evaluate \delta_k for all output units (4.34)
- Back-propagate the \delta's (4.36) to obtain the \delta_j's for hidden units
- Use (4.33) to evaluate the required derivatives
- The derivative of the total error E can be obtained by repeating the above steps for each pattern in the training set and summing over all patterns
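Putting the summary together, here is a minimal end-to-end sketch for the simple case announced on Slide 2 (a single layer of sigmoidal hidden units and a sum-of-squares error); biases are omitted and the names x, t, W1, W2 are illustrative assumptions, not from the slides:

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def sigmoid_prime(a):
        s = sigmoid(a)
        return s * (1.0 - s)

    def backprop_one_pattern(x, t, W1, W2):
        # Gradients of the per-pattern sum-of-squares error for a network
        # with one hidden layer of sigmoidal units and sigmoidal outputs.
        # W1 : hidden-layer weights w_ji (rows: hidden units j, cols: inputs i)
        # W2 : output-layer weights w_kj (rows: output units k, cols: hidden units j)

        # 1. Forward propagation (4.26), (4.27)
        a1 = W1 @ x              # summed inputs to hidden units
        z1 = sigmoid(a1)         # hidden activations
        a2 = W2 @ z1             # summed inputs to output units
        y = sigmoid(a2)          # network outputs

        # 2. Deltas for the output units (4.34), with dE/dy_k = y_k - t_k
        delta2 = sigmoid_prime(a2) * (y - t)

        # 3. Back-propagate the deltas to the hidden units (4.36)
        delta1 = sigmoid_prime(a1) * (W2.T @ delta2)

        # 4. Required derivatives (4.33): dE/dw_ji = delta_j * z_i
        dW2 = np.outer(delta2, z1)
        dW1 = np.outer(delta1, x)
        return dW1, dW2

    # Summing dW1, dW2 over all patterns gives the derivative of the
    # total error E, per (4.28).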