1
Error back-propagation (technical)
  • Bishop, Neural Networks for Pattern Recognition, p. 141

2
Deriving the back-prop algorithm for computing
derivatives
  • We will now derive the back-prop algorithm for a
    general network with arbitrary feed-forward structure
  • The resulting formulae will then be illustrated using
    a simple layered network structure with
  • a single layer of sigmoidal hidden units, and
  • a sum-of-squares error

3
Feedforward networks
4
First, what does each unit (artificial neuron) do?
  • In a general feed-forward network, each unit j
    computes a weighted sum of its inputs

    a_j = \sum_i w_{ji} z_i      (4.26)

  • z_i is the activation (output) of a unit, or input, i,
    that sends a connection to unit j (it could be x_i if
    i is a network input)
  • the summation runs over all units i that send
    connections to unit j
  • w_{ji} is the weight associated with that connection
    (from unit i to unit j)
5
What else?
  • The sum a_j is then transformed by a (usually
    nonlinear) activation function g(·) (such as a
    sigmoid) to give the activation z_j of the form

    z_j = g(a_j)      (4.27)

  • z_j could be y_j if unit j is a network output
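To make (4.26) and (4.27) concrete, here is a minimal Python sketch of one unit's computation; the function names and the choice of a logistic sigmoid are illustrative assumptions, since any differentiable g would do:

    import numpy as np

    def sigmoid(a):
        """Logistic activation: g(a) = 1 / (1 + exp(-a))."""
        return 1.0 / (1.0 + np.exp(-a))

    def unit_output(weights, inputs):
        """One unit: weighted sum (4.26), then activation (4.27)."""
        a_j = np.dot(weights, inputs)  # a_j = sum_i w_ji * z_i
        return sigmoid(a_j)            # z_j = g(a_j)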
6
Error function
  • We seek to determine suitable weights to minimize
    some appropriate error function
  • Will consider error functions that can be written
    as a sum, over all patterns in the training set,
    of an error defined for each pattern separately

    E = \sum_n E^n      (4.28)

  • n is the pattern index (an index, not a power!)
  • nearly all practical error functions can be written
    in this form
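As a concrete instance of (4.28), a sum-of-squares error written in this per-pattern form might look like the sketch below; network stands in for the forward pass and is an assumption, not something defined on the slides:

    import numpy as np

    def sum_of_squares(y, t):
        """Per-pattern error: E^n = 0.5 * sum_k (y_k - t_k)^2."""
        return 0.5 * np.sum((y - t) ** 2)

    def total_error(network, X, T):
        """Total error E = sum_n E^n over all patterns (4.28)."""
        return sum(sum_of_squares(network(x), t) for x, t in zip(X, T))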
7
Error function (cont'd)
  • We'll also assume that the per-pattern error can be
    written as a function of the network output
    variables, the y_k's:

    E^n = E^n(y_1, \ldots, y_c)      (4.29)
8
Our main goal (reminder)
  • To find the derivatives of the error E with
    respect to the weights (and biases) in the
    network
  • But from (4.28), we just need to know how to find the
    derivative of E^n for one pattern at a time and add
    up the results (a sketch follows below)
  • Thus we can consider just one pattern from now on,
    and can drop the pattern index n
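In code terms this observation just says that per-pattern gradients add; a sketch, assuming some hypothetical grad_one_pattern(x, t) that returns the derivative of E^n for a single pattern:

    def total_gradient(grad_one_pattern, X, T):
        """By linearity of (4.28): dE/dw = sum_n dE^n/dw."""
        return sum(grad_one_pattern(x, t) for x, t in zip(X, T))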

9
Consider evaluating the derivative of E^n with respect
to w_{ji}
  • Note that E^n depends on the weight w_{ji} only via
    the summed input a_j to unit j
  • Thus, we apply the chain rule to get

    \partial E^n / \partial w_{ji}
      = (\partial E^n / \partial a_j)(\partial a_j / \partial w_{ji})      (4.30)

  • defining \delta_j \equiv \partial E^n / \partial a_j      (4.31)
  • and noting, from the definition of a_j, that
    \partial a_j / \partial w_{ji} = z_i      (4.32)
  • we obtain \partial E^n / \partial w_{ji} = \delta_j z_i      (4.33)
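A quick numerical check of (4.30)-(4.33) on the smallest possible case: one sigmoidal output unit, one weight, and a sum-of-squares error (this anticipates (4.34) for the output unit's delta). All values are toy numbers chosen for illustration:

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    z_i, w_ji, t = 0.7, 0.3, 1.0       # toy input, weight, target

    def error(w):
        y = sigmoid(w * z_i)           # a_j = w * z_i, y = g(a_j)
        return 0.5 * (y - t) ** 2      # sum-of-squares, one output

    # Analytic derivative via (4.33): dE/dw_ji = delta_j * z_i,
    # where delta_j = dE/da_j = g'(a_j)*(y - t) = y*(1-y)*(y - t).
    y = sigmoid(w_ji * z_i)
    delta_j = y * (1.0 - y) * (y - t)
    analytic = delta_j * z_i

    # Central finite difference for comparison.
    eps = 1e-6
    numeric = (error(w_ji + eps) - error(w_ji - eps)) / (2 * eps)
    print(analytic, numeric)           # the two should agree closely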
10
More on the derivative of E: are we done?
We just derived

    \partial E^n / \partial w_{ji} = \delta_j z_i      (4.33)

  • Hebbian-like, since it correlates the activation z_i
    entering the connection with the quantity \delta_j at
    its output end
  • not done yet: we still need a way to evaluate
    \delta_j for every unit
11
Easy for output units!

    \delta_k \equiv \partial E^n / \partial a_k
      = g'(a_k) \, \partial E^n / \partial y_k      (4.34)

  • chain rule, with the activation function g giving
    y_k as a function of a_k: y_k = g(a_k)
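Vectorized over a whole output layer, (4.34) becomes a one-liner; this sketch assumes sigmoidal output units and a sum-of-squares error, for which \partial E^n / \partial y_k = y_k - t_k:

    def output_deltas(y, t):
        """delta_k = g'(a_k) * (y_k - t_k)   (4.34), using
        g'(a_k) = y_k * (1 - y_k) for sigmoidal outputs."""
        return y * (1.0 - y) * (y - t)

Here y and t are NumPy arrays holding the network outputs and targets.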
12
For hidden units, apply the chain rule once again:

    \delta_j \equiv \partial E^n / \partial a_j
      = \sum_k (\partial E^n / \partial a_k)(\partial a_k / \partial a_j)
      = \sum_k \delta_k \, \partial a_k / \partial a_j      (4.35)

  • the last step uses the definition of \delta
  • the sum runs over all units k to which unit j sends
    connections; note that these units k are closer to
    the network output than unit j
14
We already had

    \delta_j = \sum_k \delta_k \, \partial a_k / \partial a_j      (4.35)

and, from (4.26) and (4.27) re-indexed for unit k,

    a_k = \sum_{j'} w_{kj'} g(a_{j'})

where the sum runs over all units j' feeding into unit k,
including unit j itself. Thus

    \partial a_k / \partial a_j = w_{kj} g'(a_j)

since all terms of the sum have zero partial derivative
except the one with j' = j.
15
Substituting this back into (4.35), we have just derived
the back-propagation formula

    \delta_j = g'(a_j) \sum_k w_{kj} \delta_k      (4.36)
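The recursion (4.36), vectorized over a hidden layer; the weight-matrix layout W_out[k, j] = w_kj is an assumption of this sketch:

    import numpy as np

    def hidden_deltas(z_hidden, W_out, deltas_out):
        """delta_j = g'(a_j) * sum_k w_kj * delta_k   (4.36),
        using g'(a_j) = z_j * (1 - z_j) for sigmoidal hidden
        units."""
        return z_hidden * (1.0 - z_hidden) * (W_out.T @ deltas_out)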
16
Summary of back-propagation
  • Apply an input vector x to the network and find the
    activations (outputs) of all neurons
  • This is forward propagation (4.26, 4.27)
  • Evaluate \delta_k for all output units (4.34)
  • Back-propagate the \delta's using (4.36) to obtain
    \delta_j for every hidden unit
  • Use (4.33) to evaluate the required derivatives
  • The derivative of the total error E can be obtained
    by repeating the above steps for each pattern in the
    training set and summing over all patterns (a minimal
    end-to-end sketch follows this list)
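Putting the whole recipe together: a minimal two-layer network (sigmoidal hidden and output units, sum-of-squares error) whose back-propagated derivatives are checked against finite differences. All names, shapes, and toy data are illustrative assumptions; biases are omitted for brevity (Bishop absorbs them as weights from an extra unit clamped to 1):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def backprop(x, t, W1, W2):
        """One pattern: forward pass (4.26, 4.27), output deltas
        (4.34), hidden deltas (4.36), derivatives (4.33)."""
        z = sigmoid(W1 @ x)                    # hidden activations
        y = sigmoid(W2 @ z)                    # network outputs
        d_out = y * (1 - y) * (y - t)          # delta_k, (4.34)
        d_hid = z * (1 - z) * (W2.T @ d_out)   # delta_j, (4.36)
        return np.outer(d_out, z), np.outer(d_hid, x)  # (4.33)

    def error(x, t, W1, W2):
        y = sigmoid(W2 @ sigmoid(W1 @ x))
        return 0.5 * np.sum((y - t) ** 2)

    rng = np.random.default_rng(0)
    x, t = rng.normal(size=3), np.array([1.0, 0.0])
    W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
    g2, g1 = backprop(x, t, W1, W2)

    # Finite-difference check on one weight from each layer.
    eps = 1e-6
    for W, g in [(W1, g1), (W2, g2)]:
        W[0, 0] += eps
        e_plus = error(x, t, W1, W2)
        W[0, 0] -= 2 * eps
        e_minus = error(x, t, W1, W2)
        W[0, 0] += eps                         # restore the weight
        print(g[0, 0], (e_plus - e_minus) / (2 * eps))  # should match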