CS515 Neural Networks

1
  • Performance Optimization
  • Steepest Descent

2
Basic Optimization Algorithm
x_{k+1} = x_k + α_k p_k, or Δx_k = (x_{k+1} - x_k) = α_k p_k
p_k - search direction
α_k - learning rate
3
Steepest Descent
Choose the next step so that the function decreases: F(x_{k+1}) < F(x_k)
4
Steepest Descent
For small changes in x we can approximate F(x) by a first-order expansion:
F(x_{k+1}) = F(x_k + Δx_k) ≈ F(x_k) + g_k^T Δx_k,
where g_k = ∇F(x) evaluated at x = x_k.
5
Steepest Descent
If we want the function to decrease, we need g_k^T Δx_k = α_k g_k^T p_k < 0.
6
Steepest Descent
If we want the function to decrease, we need g_k^T Δx_k = α_k g_k^T p_k < 0.
We can maximize the decrease by choosing p_k = -g_k, the steepest descent direction.
7
Steepest Descent
We can maximize the decrease by choosing p_k = -g_k, which gives the update x_{k+1} = x_k - α_k g_k.
Two general methods to select α_k: (1) minimize F(x) with respect to α_k along x_k + α_k p_k, or (2) use a predetermined value (e.g. 0.2, or 1/k).
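As an illustration (not part of the original slides), a minimal Python sketch of this iteration with a fixed learning rate, on an assumed quadratic F(x) = 0.5 x^T A x + d^T x:

```python
import numpy as np

def steepest_descent(grad, x0, alpha=0.2, num_steps=50):
    """Iterate x_{k+1} = x_k - alpha * grad(x_k) with a fixed learning rate."""
    x = np.asarray(x0, dtype=float)
    trajectory = [x.copy()]
    for _ in range(num_steps):
        x = x - alpha * grad(x)          # p_k = -g_k (steepest descent direction)
        trajectory.append(x.copy())
    return np.array(trajectory)

# Example quadratic F(x) = 0.5 x^T A x + d^T x, so grad F(x) = A x + d.
A = np.array([[2.0, 0.0], [0.0, 50.0]])
d = np.array([0.0, 0.0])
traj = steepest_descent(lambda x: A @ x + d, x0=[1.0, 1.0], alpha=0.02)
print(traj[-1])   # approaches the minimum at the origin
```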
8
Example
9
Plot
10
Stable Learning Rates (Quadratic)
For a quadratic function, the steepest descent iteration is x_{k+1} = [I - αA] x_k - αd.
Stability is determined by the eigenvalues of this matrix.
The eigenvalues of [I - αA] are (1 - αλ_i), where λ_i is an eigenvalue of A.
11
Stable Learning Rates (Quadratic)
For a quadratic function, the steepest descent iteration is x_{k+1} = [I - αA] x_k - αd.
Stability is determined by the eigenvalues of this matrix.
The eigenvalues of [I - αA] are (1 - αλ_i), where λ_i is an eigenvalue of A.
Stability requirement: |1 - αλ_i| < 1 for every eigenvalue, i.e. α < 2/λ_i for each i, and therefore α < 2/λ_max.
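As an illustration (not part of the slides), a small Python check of this bound on an assumed Hessian A:

```python
import numpy as np

# Hypothetical quadratic F(x) = 0.5 x^T A x + d^T x with Hessian A.
A = np.array([[2.0, 1.0], [1.0, 2.0]])
lam_max = np.max(np.linalg.eigvalsh(A))
alpha_max = 2.0 / lam_max                      # steepest-descent stability bound
print(f"largest eigenvalue = {lam_max}, stable for 0 < alpha < {alpha_max:.3f}")

# The iteration matrix is (I - alpha*A); its spectral radius must stay below 1.
for alpha in (0.5 * alpha_max, 1.1 * alpha_max):
    rho = np.max(np.abs(np.linalg.eigvals(np.eye(2) - alpha * A)))
    print(f"alpha = {alpha:.3f}: spectral radius = {rho:.3f}",
          "(converges)" if rho < 1 else "(diverges)")
```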
12
Example
13
CHAPTER 10
  • Widrow-Hoff Learning

14
Objectives
  • Widrow-Hoff learning is an approximate steepest
    descent algorithm, in which the performance index
    is mean square error.
  • It is widely used today in many signal processing
    applications.
  • It is a precursor to the backpropagation algorithm
    for multilayer networks.

15
ADALINE Network
  • The ADALINE (Adaptive Linear Neuron) network and its
    learning rule, the LMS (Least Mean Square) algorithm,
    were proposed by Widrow and Marcian Hoff in 1960.
  • Both the ADALINE network and the perceptron suffer
    from the same inherent limitation: they can only
    solve linearly separable problems.
  • The LMS algorithm minimizes mean square error
    (MSE), and therefore tries to move the decision
    boundaries as far from the training patterns as
    possible.

16
ADALINE Network
17
Single ADALINE
  • Set n = 0; then Wp + b = 0 specifies a decision
    boundary.
  • The ADALINE can be used to classify objects into
    two categories if they are linearly separable (a
    minimal sketch follows below).
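A minimal sketch of a single ADALINE used this way; the weights, bias, and test points below are illustrative, not from the slides:

```python
import numpy as np

def adaline(p, w, b):
    """Single ADALINE output: a = purelin(n) = w . p + b (linear transfer function)."""
    return w @ p + b

# Illustrative weights and bias; the decision boundary is the set of p with w.p + b = 0.
w = np.array([1.0, 1.0])
b = -1.0

for p in (np.array([2.0, 2.0]), np.array([-1.0, -1.0])):
    a = adaline(p, w, b)
    print(p, "-> class", 1 if a >= 0 else 0)   # classify by the sign of the output
```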

18
Mean Square Error
  • The LMS algorithm is an example of supervised
    training.
  • The LMS algorithm will adjust the weights and
    biases of the ADALINE in order to minimize the
    mean square error, where the error is the
    difference between the target output (tq) and the
    network output (aq).

MSE: F(x) = E[e²] = E[(t - a)²]   (E[·]: expected value)
19
Performance Optimization
  • Develop algorithms to optimize a performance
    index F(x), where the word optimize will mean
    to find the value of x that minimizes F(x).
  • The optimization algorithms are iterative:
    x_{k+1} = x_k + α_k p_k, or Δx_k = α_k p_k, where
    p_k is a search direction, α_k > 0 is a positive
    learning rate, which determines the
    length of the step, and x_0 is an initial guess.

20
Taylor Series Expansion
  • Taylor series (scalar case):
    F(x) = F(x*) + F'(x*)(x - x*) + (1/2)F''(x*)(x - x*)² + ...
  • Vector case:
    F(x) = F(x*) + ∇F(x*)^T (x - x*)
    + (1/2)(x - x*)^T ∇²F(x*) (x - x*) + ...

21
Gradient and Hessian
  • Gradient: ∇F(x) = [∂F/∂x_1, ∂F/∂x_2, ..., ∂F/∂x_n]^T
  • Hessian: ∇²F(x), the matrix with entries [∇²F(x)]_{i,j} = ∂²F/∂x_i ∂x_j

22
Directional Derivative
  • The ith element of the gradient, ∂F(x)/∂x_i, is
    the first derivative of the performance index F along
    the x_i axis.
  • Let p be a vector in the direction along which we
    wish to know the derivative. The directional
    derivative along p is p^T ∇F(x) / ||p||.
  • Example: find the derivative of F(x) at a given
    point in a given direction p (a numerical sketch
    follows below).
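A small numerical sketch of the directional derivative p^T ∇F(x) / ||p||; the function used here is illustrative, not from the slides:

```python
import numpy as np

def directional_derivative(grad_f, x, p):
    """Derivative of F along direction p: (p^T grad F(x)) / ||p||."""
    p = np.asarray(p, dtype=float)
    return grad_f(x) @ p / np.linalg.norm(p)

# Illustrative quadratic F(x) = x1^2 + 2*x2^2, so grad F = [2*x1, 4*x2].
grad_f = lambda x: np.array([2.0 * x[0], 4.0 * x[1]])
x = np.array([1.0, 1.0])
print(directional_derivative(grad_f, x, p=[1.0, 0.0]))  # 2.0: slope along the x1 axis
print(directional_derivative(grad_f, x, p=[1.0, 1.0]))  # (2 + 4)/sqrt(2) ≈ 4.24
```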

23
Approximation-Based Formulation
  • Given input/output training data {p_1, t_1},
    {p_2, t_2}, ..., {p_Q, t_Q}, the objective of network
    training is to find the optimal weights that
    minimize the (least-squares) error
    between the target values and the actual responses.
  • Model (network) function: a = f(x, p)
  • Least-squares-error function: E(x) = Σ_q (t_q - a_q)²,
    where a_q = f(x, p_q)
  • The weight vector x can be trained by minimizing
    the error function along the gradient-descent
    direction: x(k+1) = x(k) - α ∇E(x(k))

24
Delta Learning Rule
  • ADALINE: a linear network with output a = Wp + b
  • Least-squares-error criterion: minimize the squared
    error e² = (t - a)² for each training pair
  • Gradient of the squared error: ∂e²/∂w_{1,i} = -2 e p_i,
    ∂e²/∂b = -2 e
  • Delta learning rule: w(new) = w(old) + 2α e p,
    b(new) = b(old) + 2α e

25
Mean Square Error
Writing the ADALINE output as a = x^T z, with x = [w; b] and z = [p; 1]:
F(x) = E[e²] = E[(t - x^T z)²] = c - 2 x^T h + x^T R x,
where c = E[t²], h = E[t z], and R = E[z z^T] is the input correlation matrix.
26
Mean Square Error
  • If the correlation matrix R is positive definite,
    there will be a unique stationary point x* = R⁻¹h,
    which will be a strong minimum.
  • Strong minimum: the point x* is a strong minimum
    of F(x) if a scalar δ > 0 exists such that
    F(x*) < F(x* + Δx)
    for all Δx such that 0 < ||Δx|| < δ.
  • Global minimum: the point x* is a unique global
    minimum of F(x) if F(x*) < F(x* + Δx) for all Δx ≠ 0.
  • Weak minimum: the point x* is a weak minimum of
    F(x) if it is not a strong minimum, and a scalar
    δ > 0 exists such that F(x*) ≤ F(x* + Δx)
    for all Δx such that 0 < ||Δx|| < δ.

27
LMS Algorithm
  • The goal of the LMS algorithm is to locate the
    minimum point of the mean square error.
  • It uses an approximate steepest descent algorithm, in
    which the gradient is estimated from the current sample.
  • Estimate of the mean square error F(x): the squared error
    at iteration k, e²(k) = (t(k) - a(k))².
  • Estimated gradient: ∇e²(k), used in place of ∇F(x) = ∇E[e²].

28
LMS Algorithm
Working out the estimated gradient:
∂e²(k)/∂w_{1,j} = -2 e(k) p_j(k) for the weights (j = 1, ..., R), and
∂e²(k)/∂b = -2 e(k) for the bias,
so ∇e²(k) = -2 e(k) z(k), with z(k) = [p(k); 1].
29
LMS Algorithm
Substituting into steepest descent, x(k+1) = x(k) - α ∇e²(k) = x(k) + 2α e(k) z(k); in matrix form, W(k+1) = W(k) + 2α e(k) p^T(k) and b(k+1) = b(k) + 2α e(k).
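A minimal, self-contained Python sketch of this update rule (function and variable names are mine, not from the slides), training the weights and bias one pair at a time:

```python
import numpy as np

def lms_train(patterns, targets, alpha=0.04, epochs=10):
    """Train a single-layer ADALINE with the LMS (Widrow-Hoff) rule."""
    R = len(patterns[0])                        # input dimension
    W = np.zeros((1, R))
    b = np.zeros(1)
    for _ in range(epochs):
        for p, t in zip(patterns, targets):
            p = np.asarray(p, dtype=float)
            a = W @ p + b                       # linear network output
            e = t - a                           # error
            W = W + 2 * alpha * e[:, None] * p  # W(k+1) = W(k) + 2*alpha*e*p^T
            b = b + 2 * alpha * e               # b(k+1) = b(k) + 2*alpha*e
    return W, b
```

The orange/apple sketch further below applies the same update step by step.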
30
Quadratic Functions
F(x) = c + d^T x + (1/2) x^T A x   (A: Hessian matrix)
If the eigenvalues of the Hessian matrix are all positive, the function
will have one unique global minimum; for the mean square error above,
the Hessian matrix is 2R.
31
Stable Learning Rates
32
Stable Learning Rates
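The equations for these two slides are not in the transcript. In the standard LMS analysis, the mean of the weight vector converges provided 0 < α < 1/λ_max, where λ_max is the largest eigenvalue of the input correlation matrix R (the Hessian of the mean square error is 2R, so the steepest-descent bound 2/λ becomes 1/λ_max). A small sketch of that computation on an assumed two-pattern training set, each pattern presented with equal probability:

```python
import numpy as np

# Assumed training inputs, each presented with equal probability.
patterns = [np.array([1.0, -1.0, -1.0]), np.array([1.0, 1.0, -1.0])]
R = sum(np.outer(p, p) for p in patterns) / len(patterns)   # R = E[p p^T]
lam_max = np.max(np.linalg.eigvalsh(R))
print("maximum stable LMS learning rate:", 1.0 / lam_max)   # alpha < 1/lam_max
```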
33
Analysis of Convergence
Taking the expectation of the LMS update shows that E[x(k)] converges to the minimum point x* = R⁻¹h provided the learning rate satisfies the stability condition above.
34
Orange/Apple Example
In practical applications it may not be practical to
calculate R in order to find the stable learning rate,
so α is often selected by trial and error.
35
Orange/Apple Example
Start, arbitrarily, with all the weights set to zero,
and then apply inputs p1, p2, p1, p2, etc., in that
order, calculating the new weights after each input is
presented (a sketch of the first few updates follows below).
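A sketch of these first few updates, assuming the prototype vectors, targets, and learning rate commonly used with this example (orange p1 = [1, -1, -1] with t1 = -1, apple p2 = [1, 1, -1] with t2 = 1, α = 0.2); these specifics are not given in the transcript:

```python
import numpy as np

# Assumed prototype vectors/targets for the orange/apple example (not in the transcript):
patterns = [np.array([1.0, -1.0, -1.0]),   # orange, target -1
            np.array([1.0, 1.0, -1.0])]    # apple,  target +1
targets = [-1.0, 1.0]

W = np.zeros(3)            # start with all weights set to zero (no bias used here)
alpha = 0.2
for k in range(4):         # present p1, p2, p1, p2, ...
    p, t = patterns[k % 2], targets[k % 2]
    e = t - W @ p                     # error for this presentation
    W = W + 2 * alpha * e * p         # LMS update of the weights
    print(f"after step {k + 1}: W = {W}")
```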
36
Orange/Apple Example
37
Solved Problem P10.2
Since they are linearly separable, we can design an
ADALINE network to make such a distinction.
As shown in the figure, they are NOT linearly separable,
so an ADALINE network CANNOT distinguish between them.
38
Solved Problem P10.3
39
Solved Problem P10.3
The Hessian matrix of F(x), 2R, has both eigenvalues
equal to 2, so the contours of the performance surface
will be circular. The center of the contours is the
minimum point x*.
40
Solved Problem P10.4
41
Tapped Delay Line
At the output of the tapped delay line we have an
R-dimensional vector, consisting of the input signal at
the current time and at delays of 1 to R-1 time steps.
42
Adaptive Filter
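A minimal sketch (names assumed) of the tapped delay line feeding an ADALINE used as an adaptive filter, whose output is a weighted sum of the current and delayed input samples plus a bias:

```python
import numpy as np

def tapped_delay_line(y, t, R):
    """Return [y(t), y(t-1), ..., y(t-R+1)]; values before the start of y are zero."""
    return np.array([y[t - i] if t - i >= 0 else 0.0 for i in range(R)])

def adaptive_filter_output(y, t, w, b=0.0):
    """ADALINE filter: a(t) = sum_i w[i] * y(t - i) + b."""
    return w @ tapped_delay_line(y, t, len(w)) + b

y = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([0.5, 0.25, 0.125])            # illustrative filter weights
print(adaptive_filter_output(y, t=3, w=w))  # 0.5*4 + 0.25*3 + 0.125*2 = 3.0
```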
43
Solved Problem P10.1
44
Solved Problem P10.1
45
Solved Problem P10.1
46
Solved Problem P10.6
Application of an ADALINE adaptive predictor: the
purpose of this filter is to predict the next value of
the input signal from the two previous values. Suppose
that the input signal is a stationary random process
with the autocorrelation function given in the problem
statement.
47
Solved Problem P10.6
48
Solved Problem P10.6
49
Solved Problem P10.6
ii.
The maximum stable value of the learning rate for the
LMS algorithm is α < 1/λ_max, where λ_max is the largest
eigenvalue of the input correlation matrix R.
iii.
The LMS algorithm is approximate steepest descent, so
the trajectory for small learning rates will move
perpendicular to the contour lines.
50
Applications
  • Noise cancellation system to remove 60-Hz noise
    from an EEG signal (Fig. 10.6; a rough sketch follows
    below)
  • Echo cancellation system in long-distance
    telephone lines (Fig. 10.10)
  • Filtering engine noise from a pilot's voice signal
    (Fig. P10.8)
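As an illustration of the first application (a rough sketch, not the book's Fig. 10.6): an adaptive filter driven by a reference copy of the 60-Hz interference is trained with LMS so that its output matches the noise component of the contaminated signal, and the error signal is then the cleaned EEG. All signal parameters below are assumed:

```python
import numpy as np

fs, n = 1000, 2000                               # assumed sampling rate and length
t = np.arange(n) / fs
rng = np.random.default_rng(0)

eeg = 0.2 * rng.standard_normal(n)               # stand-in for the EEG signal
noise = np.sin(2 * np.pi * 60 * t + 0.8)         # 60-Hz interference on the signal path
reference = np.sin(2 * np.pi * 60 * t)           # reference input from the power line
contaminated = eeg + noise

R, alpha = 2, 0.1                                # two-tap filter, assumed learning rate
w = np.zeros(R)
cleaned = np.zeros(n)
for k in range(R - 1, n):
    z = reference[k - R + 1:k + 1][::-1]         # tapped delay line: [y(t), y(t-1)]
    a = w @ z                                    # adaptive filter output (noise estimate)
    e = contaminated[k] - a                      # error = contaminated - estimate ~= EEG
    w += 2 * alpha * e * z                       # LMS update
    cleaned[k] = e

print("interference power before:", round(float(np.mean((contaminated - eeg) ** 2)), 4))
print("interference power after :", round(float(np.mean((cleaned[n // 2:] - eeg[n // 2:]) ** 2)), 4))
```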