Title: CS515 Neural Networks
1. Performance Optimization
- Steepest Descent
2. Basic Optimization Algorithm
x_{k+1} = x_k + \alpha_k p_k,  or  \Delta x_k = (x_{k+1} - x_k) = \alpha_k p_k
- p_k: search direction
- \alpha_k: learning rate
3. Steepest Descent
Choose the next step so that the function decreases: F(x_{k+1}) < F(x_k).
4. Steepest Descent
For small changes in x we can approximate F(x) with a first-order Taylor series expansion:
F(x_{k+1}) = F(x_k + \Delta x_k) \approx F(x_k) + g_k^T \Delta x_k,
where g_k \equiv \nabla F(x)|_{x = x_k}.
5. Steepest Descent
If we want the function to decrease, we need
g_k^T \Delta x_k = \alpha_k g_k^T p_k < 0.
6. Steepest Descent
We can maximize the decrease by choosing the steepest descent direction
p_k = -g_k, which gives the update x_{k+1} = x_k - \alpha_k g_k.
7. Steepest Descent
Two general methods to select \alpha_k:
- minimize F(x) w.r.t. \alpha_k (line search along p_k)
- use a predetermined value (e.g., 0.2, or a decreasing schedule such as 1/k)
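The following is a minimal sketch of steepest descent with a fixed, predetermined learning rate. The quadratic F(x), its Hessian, the learning rate 0.2, and the initial guess are assumed illustration values, not the slide's example.

import numpy as np

# Steepest descent with a fixed learning rate on an assumed quadratic
# F(x) = 0.5 * x^T A x + d^T x (not one of the slide examples).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
d = np.array([-1.0, -1.0])

def F(x):
    return 0.5 * x @ A @ x + d @ x

def grad_F(x):
    return A @ x + d

alpha = 0.2                     # predetermined learning rate
x = np.array([1.0, 1.5])        # initial guess x_0
for k in range(50):
    g = grad_F(x)               # g_k: gradient at x_k
    x = x - alpha * g           # x_{k+1} = x_k - alpha * g_k
print(x, F(x))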
8. Example
9. Plot
10. Stable Learning Rates (Quadratic)
For a quadratic function F(x) = (1/2) x^T A x + d^T x + c, the steepest descent update with a fixed learning rate is
x_{k+1} = x_k - \alpha (A x_k + d) = (I - \alpha A) x_k - \alpha d.
Stability is determined by the eigenvalues of this matrix, I - \alpha A.
(\lambda_i: eigenvalue of A)
The eigenvalues of I - \alpha A are (1 - \alpha \lambda_i).
11. Stable Learning Rates (Quadratic)
Stability requirement: |1 - \alpha \lambda_i| < 1 for all i. For positive real eigenvalues this gives \alpha < 2/\lambda_i for all i, i.e., \alpha < 2/\lambda_{max}.
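As a quick check of this bound, the following sketch computes the eigenvalues of an assumed Hessian A (not one of the slide's examples) and the corresponding stability limit.

import numpy as np

# Stable learning-rate check for a quadratic F(x) = 0.5 x^T A x + d^T x + c.
# The Hessian A below is an assumed example.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam = np.linalg.eigvalsh(A)          # eigenvalues of the Hessian
alpha_max = 2.0 / lam.max()          # stability bound: alpha < 2 / lambda_max
print("eigenvalues:", lam, "-> need alpha <", alpha_max)

# Eigenvalues of (I - alpha*A) for a candidate alpha; all must lie in (-1, 1).
alpha = 0.5
print("eig(I - alpha*A):", np.linalg.eigvalsh(np.eye(2) - alpha * A))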
12. Example
13. CHAPTER 10
14. Objectives
- Widrow-Hoff learning is an approximate steepest descent algorithm in which the performance index is the mean square error.
- It is widely used today in many signal-processing applications.
- It is a precursor to the backpropagation algorithm for multilayer networks.
15. ADALINE Network
- The ADALINE (ADAptive LInear NEuron) network and its learning rule, the LMS (Least Mean Square) algorithm, were proposed by Bernard Widrow and Marcian Hoff in 1960.
- Both the ADALINE network and the perceptron suffer from the same inherent limitation: they can only solve linearly separable problems.
- The LMS algorithm minimizes the mean square error (MSE), and therefore tries to move the decision boundaries as far from the training patterns as possible.
16. ADALINE Network
The ADALINE has a linear transfer function: a = purelin(Wp + b) = Wp + b.
17. Single ADALINE
- Set n = 0; then Wp + b = 0 specifies a decision boundary.
- The ADALINE can be used to classify objects into two categories if they are linearly separable.
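A tiny sketch of a single ADALINE used as a two-category classifier; the weights, bias, and test points below are assumed illustration values, not taken from the slide.

import numpy as np

# Single-ADALINE classification sketch: the decision boundary is W p + b = 0.
# Weights, bias, and test points are assumed illustration values.
W = np.array([1.0, 1.0])
b = -0.5

def classify(p):
    n = W @ p + b                 # net input; n = 0 is the decision boundary
    return 1 if n >= 0 else -1    # category 1 on one side, category 2 on the other

print(classify(np.array([1.0, 1.0])))    #  1 (one side of the boundary)
print(classify(np.array([-1.0, -1.0])))  # -1 (the other side)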
18. Mean Square Error
- The LMS algorithm is an example of supervised training.
- The LMS algorithm adjusts the weights and biases of the ADALINE in order to minimize the mean square error, where the error is the difference between the target output t_q and the network output a_q (the response to input p_q).
MSE: F(x) = E[e^2] = E[(t - a)^2], where E denotes the expected value.
19. Performance Optimization
- Develop algorithms to optimize a performance index F(x), where the word "optimize" means to find the value of x that minimizes F(x).
- The optimization algorithms are iterative:
x_{k+1} = x_k + \alpha_k p_k, or \Delta x_k = (x_{k+1} - x_k) = \alpha_k p_k,
where p_k is a search direction, \alpha_k is a positive learning rate that determines the length of the step, and x_0 is an initial guess.
20. Taylor Series Expansion
- Taylor series (scalar case): F(x) = F(x*) + F'(x*)(x - x*) + (1/2) F''(x*)(x - x*)^2 + ...
- Vector case: F(x) = F(x*) + \nabla F(x*)^T (x - x*) + (1/2)(x - x*)^T \nabla^2 F(x*) (x - x*) + ...
21. Gradient & Hessian
Gradient: \nabla F(x) = [\partial F/\partial x_1, \partial F/\partial x_2, ..., \partial F/\partial x_n]^T.
Hessian: [\nabla^2 F(x)]_{ij} = \partial^2 F / \partial x_i \partial x_j.
22. Directional Derivative
- The i-th element of the gradient, \partial F(x)/\partial x_i, is the first derivative of the performance index F along the x_i axis.
- Let p be a vector in the direction along which we wish to know the derivative. The directional derivative is (p^T \nabla F(x)) / ||p||.
- Example: find the derivative of F(x) at a specified point in a specified direction p (see the sketch below).
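A small numeric sketch of the directional derivative formula; the function F, the evaluation point, and the direction below are assumed illustration values, not the slide's example.

import numpy as np

# Directional derivative sketch: (p^T grad F(x)) / ||p||.
# F, the evaluation point, and the direction are assumed values.
def grad_F(x):
    # gradient of F(x) = x1^2 + 2*x2^2
    return np.array([2.0 * x[0], 4.0 * x[1]])

x = np.array([0.5, 0.5])
p = np.array([2.0, -1.0])

dir_deriv = p @ grad_F(x) / np.linalg.norm(p)
print(dir_deriv)   # first derivative of F along direction p at x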
23. Approximation-Based Formulation
- Given input/output training data {p_1, t_1}, {p_2, t_2}, ..., {p_Q, t_Q}, the objective of network training is to find the optimal weights that minimize the (least-squares) error between the target values and the actual network responses.
- Model (network) function: a_q = f(x, p_q), where x is the weight vector.
- Least-squares-error function: E(x) = \sum_{q=1}^{Q} (t_q - a_q)^2.
- The weight vector x can be trained by minimizing the error function along the gradient-descent direction: x_{k+1} = x_k - \alpha \nabla E(x_k).
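A minimal sketch of this gradient-descent training on the summed least-squares error, for a linear model a = w^T p + b. The training data and learning rate are assumed illustration values, not the slide's example.

import numpy as np

# Batch gradient descent on E(x) = sum_q (t_q - a_q)^2 for a linear
# model a = w^T p + b. Data and learning rate are assumed values.
P = np.array([[1.0, 1.0],
              [1.0, -1.0],
              [-1.0, 1.0]])        # one input pattern p_q per row
T = np.array([1.0, -1.0, 0.0])     # targets t_q

w = np.zeros(2)
b = 0.0
alpha = 0.05
for k in range(200):
    a = P @ w + b                  # network responses a_q
    e = T - a                      # errors t_q - a_q
    grad_w = -2 * P.T @ e          # dE/dw
    grad_b = -2 * e.sum()          # dE/db
    w, b = w - alpha * grad_w, b - alpha * grad_b
print(w, b)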
24. Delta Learning Rule
- ADALINE: a = w^T p + b.
- Least-squares-error criterion: minimize E(x) = \sum_q (t_q - a_q)^2.
- Gradient: \nabla_w E = -2 \sum_q (t_q - a_q) p_q.
- Delta learning rule: w_{new} = w_{old} + \alpha (t_q - a_q) p_q, applied as each training pair is presented.
25. Mean Square Error
Let x = [w; b] and z = [p; 1], so that the network output is a = x^T z. Then
F(x) = E[e^2] = E[(t - x^T z)^2] = E[t^2] - 2 x^T E[t z] + x^T E[z z^T] x = c - 2 x^T h + x^T R x,
where c = E[t^2], h = E[t z] is the cross-correlation vector, and R = E[z z^T] is the input correlation matrix.
26. Mean Square Error
- If the correlation matrix R is positive definite, there will be a unique stationary point x* = R^{-1} h, which will be a strong minimum.
- Strong minimum: the point x* is a strong minimum of F(x) if a scalar \delta > 0 exists such that F(x*) < F(x* + \Delta x) for all \Delta x such that \delta > ||\Delta x|| > 0.
- Global minimum: the point x* is a unique global minimum of F(x) if F(x*) < F(x* + \Delta x) for all \Delta x \neq 0.
- Weak minimum: the point x* is a weak minimum of F(x) if it is not a strong minimum, and a scalar \delta > 0 exists such that F(x*) \leq F(x* + \Delta x) for all \Delta x such that \delta > ||\Delta x|| > 0.
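A short sketch that checks R for positive definiteness and computes the stationary point x* = R^{-1} h; the R and h values below are assumed for illustration, not the slide's example.

import numpy as np

# Strong minimum x* = R^{-1} h of the quadratic MSE
# F(x) = c - 2 x^T h + x^T R x. R and h are assumed values.
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])
h = np.array([0.5, 1.0])

eigvals = np.linalg.eigvalsh(R)
assert np.all(eigvals > 0), "R must be positive definite for a unique strong minimum"

x_star = np.linalg.solve(R, h)    # x* = R^{-1} h
print("eigenvalues of R:", eigvals)
print("minimum point x*:", x_star)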
27. LMS Algorithm
- The LMS algorithm locates the minimum point of the mean square error.
- It uses an approximate steepest descent algorithm, estimating the gradient from a single sample at each iteration.
- Estimate the mean square error F(x) by the squared error at iteration k: \hat{F}(x) = (t(k) - a(k))^2 = e^2(k).
- Estimated gradient: \hat{\nabla} F(x) = \nabla e^2(k).
28. LMS Algorithm
The elements of the gradient estimate are [\nabla e^2(k)]_j = 2 e(k) \partial e(k)/\partial w_{1,j} for j = 1, ..., R, and [\nabla e^2(k)]_{R+1} = 2 e(k) \partial e(k)/\partial b.
Since e(k) = t(k) - (w^T p(k) + b), we have \partial e(k)/\partial w_{1,j} = -p_j(k) and \partial e(k)/\partial b = -1.
Therefore \hat{\nabla} F(x) = \nabla e^2(k) = -2 e(k) z(k), where z(k) = [p(k); 1].
29. LMS Algorithm
Substituting the gradient estimate into the steepest descent update gives x_{k+1} = x_k - \alpha \nabla e^2(k) = x_k + 2\alpha e(k) z(k), or, in weight/bias form,
W(k+1) = W(k) + 2\alpha e(k) p^T(k),  b(k+1) = b(k) + 2\alpha e(k).
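The following sketch applies the LMS update above to a single ADALINE trained sample by sample. The training pairs and the learning rate are assumed illustration values, not the slide's data.

import numpy as np

# Minimal LMS (Widrow-Hoff) sketch for a single ADALINE: a = W p + b.
# The training pairs and learning rate are assumed illustration values.
P = [np.array([1.0, 1.0]), np.array([1.0, -1.0])]   # inputs p_q
T = [1.0, -1.0]                                      # targets t_q
alpha = 0.1

W = np.zeros(2)
b = 0.0
for epoch in range(20):
    for p, t in zip(P, T):
        a = W @ p + b                 # network output a(k)
        e = t - a                     # error e(k) = t(k) - a(k)
        W = W + 2 * alpha * e * p     # W(k+1) = W(k) + 2*alpha*e(k)*p(k)^T
        b = b + 2 * alpha * e         # b(k+1) = b(k) + 2*alpha*e(k)
print(W, b)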
30. Quadratic Functions
General quadratic function: F(x) = (1/2) x^T A x + d^T x + c (A is the Hessian matrix).
Gradient: \nabla F(x) = A x + d.  Hessian: \nabla^2 F(x) = A.
For the mean square error F(x) = c - 2 x^T h + x^T R x, the gradient is \nabla F(x) = 2 R x - 2 h and the Hessian is 2R.
31. Stable Learning Rates
Taking the expectation of the LMS update gives E[x_{k+1}] = (I - 2\alpha R) E[x_k] + 2\alpha h, so stability is determined by the eigenvalues of (I - 2\alpha R).
32. Stable Learning Rates
Stability requires |1 - 2\alpha \lambda_i| < 1 for every eigenvalue \lambda_i of R, i.e., 0 < \alpha < 1/\lambda_{max}.
33. Analysis of Convergence
If the learning rate is stable, the steady-state solution satisfies E[x_{ss}] = (I - 2\alpha R) E[x_{ss}] + 2\alpha h, which gives E[x_{ss}] = R^{-1} h = x*.
Thus the LMS solution converges, in the mean, to the minimum mean square error solution.
34. Orange/Apple Example
In practical applications it may not be practical to calculate R (and therefore the stable learning rate), so \alpha is usually selected by trial and error.
35. Orange/Apple Example
Start, arbitrarily, with all the weights set to zero, and then apply the inputs p_1, p_2, p_1, p_2, etc., in that order, calculating the new weights after each input is presented.
36. Orange/Apple Example
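A sketch of this sequence of updates in code. The orange/apple prototype patterns, targets, and learning rate used below are assumptions for illustration and may differ from the numbers on the slide.

import numpy as np

# LMS iteration sketch for the orange/apple example.
# Patterns, targets, and alpha are assumed illustration values.
p1, t1 = np.array([1.0, -1.0, -1.0]), -1.0   # assumed orange pattern/target
p2, t2 = np.array([1.0,  1.0, -1.0]),  1.0   # assumed apple pattern/target
alpha = 0.2

W = np.zeros(3)                  # start with all weights set to zero
for k, (p, t) in enumerate([(p1, t1), (p2, t2), (p1, t1), (p2, t2)]):
    a = W @ p                    # no bias in this sketch
    e = t - a
    W = W + 2 * alpha * e * p    # LMS update after each presentation
    print(f"iteration {k+1}: e = {e:+.3f}, W = {W}")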
37. Solved Problem P10.2
Since they are linearly separable, we can design an ADALINE network to make such a distinction.
As shown in the figure, they are NOT linearly separable, so an ADALINE network CANNOT distinguish between them.
38. Solved Problem P10.3
39. Solved Problem P10.3
The Hessian matrix of F(x), 2R, has both eigenvalues at 2, so the contours of the performance surface are circular. The center of the contours (the minimum point) is x* = R^{-1} h.
40. Solved Problem P10.4
41. Tapped Delay Line
At the output of the tapped delay line we have an R-dimensional vector, consisting of the input signal at the current time and at delays of 1 to R-1 time steps.
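A small sketch of a tapped delay line; the input signal and the value of R below are assumed illustration values.

import numpy as np

# Tapped-delay-line sketch: turn a scalar signal y(k) into the R-dim.
# vector [y(k), y(k-1), ..., y(k-R+1)]. The signal is an assumed example.
def tapped_delay_line(y, R):
    """Return the delay-line output vector at each time step k."""
    buf = np.zeros(R)                # delay buffer, initially zero
    outputs = []
    for sample in y:
        buf = np.roll(buf, 1)        # shift older samples down the line
        buf[0] = sample              # current input y(k) at the first tap
        outputs.append(buf.copy())
    return np.array(outputs)

y = np.array([1.0, 2.0, 3.0, 4.0])
print(tapped_delay_line(y, R=3))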
42. Adaptive Filter
An adaptive filter is an ADALINE whose input vector comes from a tapped delay line; its output is a(k) = \sum_{i=1}^{R} w_{1,i} y(k - i + 1) + b.
43. Solved Problem P10.1
44. Solved Problem P10.1
45. Solved Problem P10.1
46. Solved Problem P10.6
Application of ADALINE: an adaptive predictor. The purpose of this filter is to predict the next value of the input signal from the two previous values. Suppose that the input signal is a stationary random process with a given autocorrelation function C_y(n) = E[y(k) y(k + n)].
47. Solved Problem P10.6
48. Solved Problem P10.6
49. Solved Problem P10.6
ii. The maximum stable value of the learning rate for the LMS algorithm is set by the largest eigenvalue of R: \alpha < 1/\lambda_{max}.
iii. The LMS algorithm is approximate steepest descent, so the trajectory for small learning rates will move perpendicular to the contour lines.
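The following sketch combines a two-tap delay line with the LMS update to form the kind of adaptive predictor described here. The input signal (a noisy sinusoid) and the learning rate are assumptions for illustration; they are not the random process specified in P10.6.

import numpy as np

# Two-tap LMS adaptive predictor: predict y(k) from y(k-1) and y(k-2).
# The sinusoidal input and alpha are assumed illustration values.
np.random.seed(0)
y = np.sin(0.3 * np.arange(200)) + 0.1 * np.random.randn(200)

alpha = 0.02
w = np.zeros(2)
errors = []
for k in range(2, len(y)):
    z = np.array([y[k - 1], y[k - 2]])   # two previous values from the delay line
    a = w @ z                            # predicted value of y(k)
    e = y[k] - a                         # prediction error
    w = w + 2 * alpha * e * z            # LMS update
    errors.append(e)

print("final weights:", w)
print("mean squared prediction error (last 50 steps):", np.mean(np.array(errors[-50:])**2))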
50. Applications
- Noise cancellation system to remove 60-Hz noise from an EEG signal (Fig. 10.6)
- Echo cancellation system in long-distance telephone lines (Fig. 10.10)
- Filtering engine noise from a pilot's voice signal (Fig. P10.8)