Error Back-Propagation (intro) - PowerPoint PPT Presentation

1
Error Back-Propagation (intro)
  • Bishop, Section 4.8, p. 140

2
Background and motivation
  • So far we concentrated on the representational
    capabilities of multilayer networks
  • Next we see how such a network can learn a
    suitable mapping from a given data set

3
Network with threshold units: the final (output) layer
  • The final layer of weights can be regarded as a
    perceptron with inputs given by the outputs of
    the last layer of hidden units
  • These weights can therefore be updated using the
    Perceptron Learning Rule of Chapter 3
  • If the presented pattern is misclassified, update
    the weight vector with the product of the target
    output and the input (see the sketch below)
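
A minimal sketch of this update rule, assuming bipolar targets t in {-1, +1}, a learning rate eta, and a bias folded into the weight vector (these choices are illustrative, not from the slides):

    import numpy as np

    def perceptron_update(w, z, t, eta=1.0):
        # w   : output-unit weight vector (w[0] is the bias weight)
        # z   : outputs of the last hidden layer, with z[0] = 1 for the bias
        # t   : target output, assumed here to be +1 or -1
        # eta : assumed learning rate
        y = np.sign(w @ z)          # thresholded output of the final unit
        if y != t:                  # pattern is misclassified
            w = w + eta * t * z     # add the product of target and input
        return w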

4
Problems with the hidden layers
  • Can't update weights to these layers with the
    perceptron rule
  • Because we don't know what the target outputs (of
    the hidden layers) are supposed to be!!
  • In other words, suppose you input a pattern to
    the entire network and get a misclassified
    output: you can't tell which hidden-layer units
    have the wrong output!

5
Hidden layer problem (cont'd)
  • In fact maybe all the hidden layer units give the
    right output, but the output unit doesn't!
  • This is known as the credit assignment problem
  • Should've been known as the fault assignment
    problem!
  • We don't know which hidden units (or output unit)
    are guilty of giving the wrong output
  • So we don't know which weights to adjust, or by how
    much

6
Solution to the Credit Assignment Problem
  • Is relatively simple! (in principle)
  • Hard to read at first
  • Consider network with differentiable activation
    functions
  • Then the output becomes a differentiable function
    of all the input variables and all the weights
    and biases
  • Note that operations like taking the vector
    product are already differentiable
  • Also, a composition of differentiable functions
    is differentiable (illustrated in the sketch below)
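
A minimal sketch of such a network, using the logistic sigmoid as the differentiable activation; the two-layer shape and the names are assumptions for illustration only:

    import numpy as np

    def sigmoid(a):
        # Logistic sigmoid: a smooth, differentiable stand-in for a hard threshold
        return 1.0 / (1.0 + np.exp(-a))

    def forward(x, W1, W2):
        # The output is a composition of vector products and sigmoids,
        # and therefore a differentiable function of every weight
        z = sigmoid(W1 @ x)   # hidden-layer activations
        y = sigmoid(W2 @ z)   # output activations
        return y, z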

7
What good is differentiability?
  • Ok, we chose a differentiable activation function
    (like a sigmoid), so that the output is a
    differentiable function of all the weights
  • Now suppose the error function is a
    differentiable function of the output
  • E.g. sum-of-squares error (Chapter 1)
  • Then we can evaluate the derivatives of the error
    wrt the weights
  • Finally, we use these derivatives to find weights
    that minimize the error!!
  • Gradient descent (see the sketch below)
  • Other techniques
  • We have one that uses the computer as a
    decision-maker rather than just performing
    iterations according to a mathematical formula
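
A rough sketch of the gradient-descent option only: with the sum-of-squares error E = 1/2 * sum_n ||y(x_n) - t_n||^2, each weight is moved against its own error derivative. The function below assumes the derivatives have already been evaluated, and eta is an assumed learning rate:

    def gradient_descent_step(weights, grads, eta=0.1):
        # weights, grads : matching lists of arrays (one pair per layer)
        # eta            : assumed learning rate
        return [w - eta * g for w, g in zip(weights, grads)]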

8
Back-propagation
  • Is a technique for evaluating the derivatives of
    the error function wrt weights
  • Really just an application of the chain rule
    (with partial derivatives), but made nicely
    canonical for programming convenience (see the
    sketch below)
  • The name comes from the propagation of errors
    backwards through the network
  • Popularized in a paper by Rumelhart, Hinton and
    Williams (1986)
  • Similar ideas developed earlier by a number of
    researchers including Werbos (1974) and Parker
    (1985)
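
A minimal sketch of the chain rule at work, for a two-layer sigmoid network with sum-of-squares error; the network shape, names, and error function are assumptions chosen for illustration, not the general derivation:

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def backprop(x, t, W1, W2):
        # Forward pass
        z = sigmoid(W1 @ x)                       # hidden-layer outputs
        y = sigmoid(W2 @ z)                       # network outputs
        # Backward pass: propagate errors (deltas) from the output layer back
        delta2 = (y - t) * y * (1.0 - y)          # dE/da at the output units
        delta1 = (W2.T @ delta2) * z * (1.0 - z)  # dE/da at the hidden units (chain rule)
        # Derivatives of the error wrt each weight matrix
        dW2 = np.outer(delta2, z)
        dW1 = np.outer(delta1, x)
        return dW1, dW2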

9
Meaning of the term back-propagation
  • Term is used to mean different things
  • The original meaning was as we just saw
  • Propagation of errors back through the network,
    for use in computing the derivatives of the error
    wrt the weights
  • The multi-layer perceptron is sometimes called a
    back-propagation network
  • Term is also used to refer to the training of a
    multi-layer perceptron using Gradient Descent
  • To clarify the terminology it is useful to
    consider the nature of the training process more
    carefully

10
The two stages of a training iteration
  • Weight training is done in two distinct stages
  • Computing derivatives of the error function wrt
    weights
  • Back-prop happens in this stage, so in this book
    we'll use the term to mean just this
    back-propagation of errors
  • These derivatives are then used to compute the
    weight adjustments
  • This can be done in many ways, including gradient
    descent (one full iteration is sketched below)
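
Putting the two stages together in one sketch, re-using the backprop() and gradient-descent sketches above; eta is an assumed learning rate, and gradient descent is only one possible choice for the second stage:

    def training_iteration(x, t, W1, W2, eta=0.1):
        # Stage 1: back-propagation of errors gives the derivatives dE/dW
        dW1, dW2 = backprop(x, t, W1, W2)
        # Stage 2: use the derivatives to adjust the weights (here, gradient descent)
        return W1 - eta * dW1, W2 - eta * dW2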