Transcript and Presenter's Notes

Title: Network Training: The Gradient Descent Method


1
Network Training: The Gradient Descent Method
2
Learning as an Optimization Problem
  • Minimizing the error function by optimizing the
    parameters
  • E.g. the sum-of-squares error
    E(w) = (1/2) Σ_i (y_i - t_i)²,
    where i is the index of the training example
    (computed in the sketch below)
  • All examples = training set + testing set
  • The performance of the learning algorithm
  • - the convergence
  • - the speed
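
A minimal illustration of the sum-of-squares error above; the arrays y and t are invented toy values, not from the slides:

```python
import numpy as np

def sum_of_squares_error(y, t):
    """E(w) = (1/2) * sum_i (y_i - t_i)^2, where i indexes training examples
    and y_i is the network output produced with weights w."""
    return 0.5 * np.sum((y - t) ** 2)

# Toy example: outputs vs. targets for three training examples
y = np.array([0.9, 0.2, 0.7])
t = np.array([1.0, 0.0, 1.0])
print(sum_of_squares_error(y, t))  # 0.5 * (0.01 + 0.04 + 0.09) = 0.07
```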

3
A standard strategy: Gradient Descent (1)
  • How to update/optimize w based on E(w)?
  • The gradient of the error function:
    ∇E(w) = (∂E/∂w_0, ∂E/∂w_1, ...)
  • An important observation: at a point w, the
    negative gradient -∇E(w) points in the direction
    of steepest descent of E (approximated numerically
    in the sketch below)
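
The gradient can be obtained numerically or analytically (as slide 10 notes); a minimal numerical sketch using central differences, with an invented quadratic E as a test case:

```python
import numpy as np

def numerical_gradient(E, w, eps=1e-6):
    """Approximate ∇E(w) one coordinate at a time by central differences:
    ∂E/∂w_j ≈ (E(w + eps*e_j) - E(w - eps*e_j)) / (2*eps)."""
    grad = np.zeros_like(w)
    for j in range(w.size):
        e_j = np.zeros_like(w)
        e_j[j] = eps
        grad[j] = (E(w + e_j) - E(w - e_j)) / (2 * eps)
    return grad

# Toy check: E(w) = w0^2 + 3*w1^2 has gradient (2*w0, 6*w1)
E = lambda w: w[0] ** 2 + 3 * w[1] ** 2
print(numerical_gradient(E, np.array([1.0, 2.0])))  # approx [ 2. 12.]
```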

4
Gradient Descent (2)
  • The learning rule: w_new = w_old - η ∇E(w),
    i.e. Δw(t) = -η ∇E(w(t-1))
  • where η is the learning rate (a small positive
    number) and t indexes the learning steps
    (see the sketch below)
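
A minimal sketch of one application of the rule, reusing the invented quadratic E from the previous sketch:

```python
import numpy as np

def gradient_descent_step(w, grad_E, eta):
    """One learning step: w_new = w_old - eta * ∇E(w_old)."""
    return w - eta * grad_E(w)

# Toy example: E(w) = w0^2 + 3*w1^2, so ∇E(w) = (2*w0, 6*w1)
grad_E = lambda w: np.array([2 * w[0], 6 * w[1]])
w = np.array([1.0, 2.0])
print(gradient_descent_step(w, grad_E, eta=0.1))  # [0.8 0.8]
```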
5
Illustration 1: the landscape of the error function
[Figure: the surface of E(w) over the weight space (w0, w1)]
6
Illustration 2: the current state
[Figure: the current weight vector marked on the E(w) surface over (w0, w1)]
7
Illustration 3: the descent direction
[Figure: on the E(w) surface over (w0, w1), the descent direction is the direction of the negative gradient]
8
Illustration 4: one-step learning
[Figure: the starting point in the weight space and the new point after learning, on the E(w) surface over (w0, w1)]
9
A formal justification
  • Taylor expansion:
    E(w + Δw) ≈ E(w) + ∇E(w) · Δw
  • After one-step learning, Δw = -η ∇E(w), so for η
    sufficiently small
    E(w_new) ≈ E(w) - η ‖∇E(w)‖² ≤ E(w),
    i.e. each step does not increase the error

10
Gradient Descent: the Algorithm
  • Step 0: choose the learning rate η and the
    initial value w(0)
  • Step 1: at learning step t
  • calculate ∇E(w) at w(t-1)
  • note: this can be done numerically or
    analytically
  • update all the weights:
    w(t) = w(t-1) - η ∇E(w(t-1))
  • Step 2: check whether E(w) has reached the minimum,
    i.e. the change of E(w) after one-step learning is
    smaller than a tolerable value
  • if so, stop
  • if not, go back to Step 1
    (the full loop is sketched below)
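
A minimal sketch of Steps 0-2 as a loop; the quadratic test function is an invented example:

```python
import numpy as np

def gradient_descent(E, grad_E, w0, eta=0.1, tol=1e-8, max_steps=10000):
    """Repeat w(t) = w(t-1) - eta * ∇E(w(t-1)) until the change in E(w)
    after one step is smaller than a tolerable value (Step 2)."""
    w = np.asarray(w0, dtype=float)
    prev_E = E(w)
    for t in range(1, max_steps + 1):
        w = w - eta * grad_E(w)        # Step 1: update all the weights
        cur_E = E(w)
        if abs(prev_E - cur_E) < tol:  # Step 2: convergence check
            break
        prev_E = cur_E
    return w, cur_E, t

E = lambda w: w[0] ** 2 + 3 * w[1] ** 2
grad_E = lambda w: np.array([2 * w[0], 6 * w[1]])
w_min, E_min, steps = gradient_descent(E, grad_E, w0=[1.0, 2.0])
print(w_min, E_min, steps)  # converges near the minimum at (0, 0)
```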
11
Presenting Examples
  • Sequential mode: evaluating examples one by one,
    with a weight update after each example
  • Batch mode: evaluating E(w), and hence the update,
    based on all examples (the two are contrasted below)
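
A hedged sketch contrasting the two modes; the linear unit y = w · x and the toy data are assumptions for illustration:

```python
import numpy as np

# Toy data: rows of X are input patterns, t the targets
X = np.array([[1.0, 0.5], [0.2, 1.0], [0.8, 0.3]])
t = np.array([1.0, 0.0, 1.0])
eta = 0.1

def grad_one(w, x_i, t_i):
    """Gradient of (1/2) * (w·x_i - t_i)^2 for a single example."""
    return (x_i @ w - t_i) * x_i

# Sequential mode: update the weights after every example
w_seq = np.zeros(2)
for x_i, t_i in zip(X, t):
    w_seq = w_seq - eta * grad_one(w_seq, x_i, t_i)

# Batch mode: one update from the gradient summed over all examples
w_bat = np.zeros(2)
w_bat = w_bat - eta * sum(grad_one(w_bat, x_i, t_i) for x_i, t_i in zip(X, t))
print(w_seq, w_bat)  # generally different after one pass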

12
An example
  • The input
  • The network output
  • The sum-of-squares error
  • The gradients
    (one concrete instance is worked through below)
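
A sketch of one such example, assuming a single linear unit y = w · x with the sum-of-squares error; the inputs, targets, and weights are invented values:

```python
import numpy as np

# Assumed setup: linear unit y_i = w · x_i with targets t_i
X = np.array([[1.0, 2.0], [2.0, 1.0]])  # the input patterns
t = np.array([1.0, 0.0])                # the targets
w = np.array([0.5, -0.5])               # current weights

y = X @ w                               # the network output: [-0.5, 0.5]
E = 0.5 * np.sum((y - t) ** 2)          # the sum-of-squares error: 1.25
grad = (y - t) @ X                      # the gradients sum_i (y_i - t_i) x_i: [-0.5, -2.5]
print(y, E, grad)
```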

13
The effect of learning rate
  • The learning rate controls how far to move
    along the negative gradient direction.
  • The effect of η:
  • if η is too small, learning is slow (slow
    movement)
  • if η is too large, learning is also slow
    (oscillation)
  • The idea of momentum: add a fraction of the
    previous weight change to the current one
    (sketched below)
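
A minimal sketch of the momentum idea; the coefficient name mu and its value 0.9 are conventional choices, not from the slides:

```python
import numpy as np

def gradient_descent_momentum(grad_E, w0, eta=0.1, mu=0.9, steps=100):
    """Learning rule with momentum:
    Δw(t) = -eta * ∇E(w(t-1)) + mu * Δw(t-1), damping oscillation."""
    w = np.asarray(w0, dtype=float)
    delta_w = np.zeros_like(w)
    for _ in range(steps):
        delta_w = -eta * grad_E(w) + mu * delta_w
        w = w + delta_w
    return w

# Toy quadratic from the earlier sketches: ∇E(w) = (2*w0, 6*w1)
grad_E = lambda w: np.array([2 * w[0], 6 * w[1]])
print(gradient_descent_momentum(grad_E, [1.0, 2.0]))  # approaches (0, 0)
```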

14
Beyond the standard gradient descent
  • Natural gradient (Amari)
  • Conjugate gradient
  • Newton's method
  • An intrinsic deficit of gradient-type
    methods: local minima

15
Perceptron learning (1)
  • Note that a pattern x_n with target t_n ∈ {-1, +1}
    is misclassified exactly when t_n w·x_n < 0
  • The error function (the perceptron criterion):
    E_P(w) = -Σ_{n∈M} t_n w·x_n,
    where M is the set of misclassified patterns

16
Perceptron Learning (2)
  • Pattern-by-pattern gradient descent rule:
    w(t+1) = w(t) + η t_n x_n whenever x_n is
    misclassified
  • η = 1 can be used.
  • The algorithm is guaranteed to converge in a
    finite number of steps, provided the data set is
    linearly separable (a sketch follows).
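
A minimal sketch of the pattern-by-pattern rule with η = 1; the toy data are invented and linearly separable:

```python
import numpy as np

def perceptron_train(X, t, eta=1.0, max_epochs=100):
    """Whenever t_n * (w·x_n) <= 0 (x_n misclassified), update
    w <- w + eta * t_n * x_n; targets t_n are in {-1, +1}.
    Terminates in finitely many steps on linearly separable data."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x_n, t_n in zip(X, t):
            if t_n * (x_n @ w) <= 0:   # x_n is misclassified
                w = w + eta * t_n * x_n
                errors += 1
        if errors == 0:                # all patterns correctly classified
            return w
    return w

# Invented separable data; a bias input of 1 is prepended to each pattern
X = np.array([[1, 2.0, 1.0], [1, 1.5, 2.0], [1, -1.0, -1.5], [1, -2.0, -1.0]])
t = np.array([1, 1, -1, -1])
print(perceptron_train(X, t))
```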

17
Initial step: x(3) is misclassified
[Figure: the initial weight vector w(0) and the patterns x(1)-x(4)]
18
First step: x(3) is correctly classified, but
x(2) becomes misclassified.
[Figure: w(1), obtained from w(0) by subtracting x(3), shown with the patterns x(1)-x(4)]
19
Second step: all patterns are correctly classified.
[Figure: w(2), obtained from w(1) by adding x(2), shown with the patterns x(1)-x(4)]