Title: Multilayer perceptrons
Slide 1: Multi-layer perceptrons
- second stage: weight adjustment (<--------------)
- Objective for weight adjustment ('learning'):
- minimising the error between the target and the actual output
Slide 2: Multi-layer perceptrons
For example, the RMS ("root mean square") error. For a set of data containing n examples, the RMS error is
    E_RMS = sqrt( (1/n) * sum over all examples d of (td - od)^2 )
where n is the number of examples, td is the target for example d and od is the actual output for example d.
i.e. for each example, evaluate the difference between the target and the actual output, square it and sum across the entire data set. The mean squared error is this sum divided by the number of examples; the root mean square error is the square root of this value.
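A minimal Python sketch of this RMS error calculation; the function name and the example data are illustrative, not from the slides.

    import math

    def rms_error(targets, outputs):
        """Square root of the mean squared difference between targets and actual outputs."""
        n = len(targets)
        return math.sqrt(sum((t - o) ** 2 for t, o in zip(targets, outputs)) / n)

    print(round(rms_error([0.8, 0.1, 0.5], [0.07, 0.2, 0.4]), 2))  # 0.43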
Slide 3: Multi-layer perceptrons
- Visualising an error surface
Slide 4: Multi-layer perceptrons
- Visualising an error surface
- or a one-dimensional view
[Figure: the error measure plotted against the weight matrix W, showing the slope of the error surface and the point of minimum error]
Slide 5: Multi-layer perceptrons
- gradient descent
- interested in the slope of the error surface
- the slope indicates the direction of the weight adjustment
- the slope is given by the derivative of the error function
- for each weight the adjustment is proportional to the slope
    Δwij = -η * ∂E/∂wij
where η is the learning rate and wij is the weight associated with the ith input to unit (neuron) j
[Diagram: unit i connects to unit j through the weight wij]
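A minimal sketch of a single gradient-descent step on one weight, with the slope estimated numerically; all names and the example error surface are illustrative, not from the slides.

    def gradient_descent_step(error_fn, w, eta=0.3, h=1e-6):
        slope = (error_fn(w + h) - error_fn(w - h)) / (2 * h)   # approximate dE/dw
        return w - eta * slope                                  # move against the slope

    # example error surface E(w) = (w - 2)**2, minimised at w = 2
    w = 0.0
    for _ in range(50):
        w = gradient_descent_step(lambda w: (w - 2) ** 2, w)
    print(round(w, 3))  # 2.0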
Slide 6: Multi-layer perceptrons
It can be shown for a neuron j that
    ∂E/∂wij = -(tj - oj) * oj(1 - oj) * xij
thus
    Δwij = η * (tj - oj) * oj(1 - oj) * xij
where oj is the actual output from neuron j, tj is the target (or expected) output for neuron j and xij is the input from neuron i to neuron j; also note that oj(1 - oj) is the derivative of the logistic activation.
[Diagram: unit i sends input xij across weight wij to unit j, which produces output oj]
Slide 7: Multi-layer perceptrons
It can be shown for a neuron j that
    ∂E/∂wij = -(tj - oj) * oj(1 - oj) * xij
thus
    Δwij = η * (tj - oj) * oj(1 - oj) * xij
The error term in this expression is
    δj = oj(1 - oj) * (tj - oj)
and is referred to as the error derivative, or delta j, so the update can be written Δwij = η * δj * xij.
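The rule above translates directly into code. A minimal Python sketch, with illustrative function names that are not from the slides.

    def output_delta(o_j, t_j):
        """Error derivative delta_j = o_j * (1 - o_j) * (t_j - o_j) for an output neuron."""
        return o_j * (1 - o_j) * (t_j - o_j)

    def updated_weight(w_ij, eta, delta_j, x_ij):
        """Apply delta_w_ij = eta * delta_j * x_ij and return the new weight."""
        return w_ij + eta * delta_j * x_ij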
Slide 8: Multi-layer perceptrons
Back to our example.
[Network diagram: input units 1, 2 and 3 feed hidden units 4 (bias 0.2) and 5 (bias -0.3), which feed output unit 6 (bias -0.1); the input-to-hidden weights shown are 1.2, -0.8, 4.0, 0.7, 1 and -0.2, and the hidden-to-output weights are w4,6 = -0.4 and w5,6 = -3.8; output4 = 0.9664, output5 = 0.5523, input6 = -2.5853, output6 = 0.0700, target = 0.8]
In general:
1. evaluate the error derivatives (i.e. the δ terms) for neurons 4, 5 and 6
2. use these to update the weights accordingly
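For reference, a Python sketch of this network's forward pass. The assignment of the input-to-hidden weights to particular connections is my reading of the diagram (chosen because it reproduces the outputs quoted above), so treat it as an assumption; the input values 0.5, 0.3 and 0.7 come from slide 15.

    import math

    def logistic(x):
        return 1.0 / (1.0 + math.exp(-x))

    o1, o2, o3 = 0.5, 0.3, 0.7                            # outputs of input units 1-3 (slide 15)
    o4 = logistic(0.2 + 1.2 * o1 - 0.8 * o2 + 4.0 * o3)   # hidden unit 4, bias 0.2
    o5 = logistic(-0.3 + 0.7 * o1 + 1.0 * o2 - 0.2 * o3)  # hidden unit 5, bias -0.3
    net6 = -0.1 - 0.4 * o4 - 3.8 * o5                     # net input to output unit 6, bias -0.1
    o6 = logistic(net6)

    print(round(o4, 4), round(o5, 4), round(net6, 4), round(o6, 4))
    # 0.9664 0.5523 -2.5853 0.0701  (the slide rounds output6 to 0.0700)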
Slide 9: Multi-layer perceptrons
Evaluating δ6:
    δ6 = o6(1 - o6) * (t6 - o6) = 0.0700 * (1 - 0.0700) * (0.8 - 0.0700) ≈ 0.0475
[Diagram: output unit 6 with o6 = 0.0700, t6 = 0.8 and δ6 = 0.0475; incoming weights w4,6 = -0.4 and w5,6 = -3.8]
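A quick check of the slide's δ6 value in Python, using the rounded numbers shown on the slide.

    # delta for output unit 6: o6 = 0.0700, target t6 = 0.8
    o6, t6 = 0.0700, 0.8
    d6 = o6 * (1 - o6) * (t6 - o6)
    print(round(d6, 4))  # 0.0475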
Slide 10: Multi-layer perceptrons
Evaluating δ4: there is no explicit target t4, so we have to estimate the difference between the target for 4 and the actual output of 4. The estimate is propagated back from the output layer:
    (t4 - o4) is estimated by w4,6 * δ6
or, for more than one output neuron,
    (t4 - o4) is estimated by the sum over all output neurons k of w4,k * δk
[Diagram: hidden unit 4 with o4 = 0.9664 and unknown target t4, connected to output unit 6 (δ6 = 0.0475) by w4,6 = -0.4; w5,6 = -3.8]
Slide 11: Multi-layer perceptrons
Evaluating δ4: estimating the difference between the target for 4 and the actual output of 4 in this way gives
    δ4 = o4(1 - o4) * w4,6 * δ6 = 0.9664 * (1 - 0.9664) * (-0.4) * 0.0475 ≈ -0.0006
[Diagram: hidden unit 4 with o4 = 0.9664 and δ4 = -0.0006, connected to output unit 6 (δ6 = 0.0475) by w4,6 = -0.4; w5,6 = -3.8]
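The estimate above gives the hidden-layer rule δj = oj(1 - oj) * sum over k of wj,k * δk. A minimal Python sketch; the function name is illustrative.

    def hidden_delta(o_j, downstream):
        """downstream: list of (w_jk, delta_k) pairs, one for each neuron k that j feeds."""
        return o_j * (1 - o_j) * sum(w_jk * delta_k for w_jk, delta_k in downstream)

    # neuron 4: o4 = 0.9664, one downstream connection with w4,6 = -0.4 and delta6 = 0.0475
    print(round(hidden_delta(0.9664, [(-0.4, 0.0475)]), 4))  # -0.0006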
Slide 12: Multi-layer perceptrons
Now evaluating δ5 in the same way:
    δ5 = o5(1 - o5) * w5,6 * δ6, giving δ5 ≈ -0.0448
[Diagram: hidden unit 5 with o5 = 0.5523, unknown target t5 and δ5 = -0.0448, connected to output unit 6 (δ6 = 0.0475) by w5,6 = -3.8; also shown: unit 4 with o4 = 0.9664, δ4 = -0.0006 and w4,6 = -0.4]
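The same calculation for neuron 5, using the slide's rounded values; the last digit differs slightly from the slide's -0.0448 because of rounding.

    o5, w56, d6 = 0.5523, -3.8, 0.0475
    d5 = o5 * (1 - o5) * w56 * d6
    print(round(d5, 4))  # -0.0446 (the slide quotes -0.0448)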
Slide 13: Multi-layer perceptrons
Now we will apply the learning rule and update the weights, assuming a learning rate (η) of 0.3.
For w4,6:
    Δw4,6 = η * δ6 * x4,6 = 0.3 * 0.0475 * 0.9664 ≈ 0.0138
so
    w4,6 becomes -0.4 + 0.0138 = -0.3862
[Diagram: δ4 = -0.0006, o4 = 0.9664, δ5 = -0.0448, o5 = 0.5523, δ6 = 0.0475, w4,6 = -0.4, w5,6 = -3.8]
Slide 14: Multi-layer perceptrons
Now we will apply the learning rule and update the weights, assuming a learning rate (η) of 0.3.
And for w5,6:
    Δw5,6 = η * δ6 * x5,6 = 0.3 * 0.0475 * 0.5523 ≈ 0.0079
so
    w5,6 becomes -3.8 + 0.0079 = -3.7921
[Diagram: δ4 = -0.0006, o4 = 0.9664, δ5 = -0.0448, o5 = 0.5523, δ6 = 0.0475, w4,6 = -0.4, w5,6 = -3.8]
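Both hidden-to-output updates in Python, with η = 0.3 as on the slides; a sketch using the rounded δ6.

    eta, d6 = 0.3, 0.0475
    o4, o5 = 0.9664, 0.5523

    w46 = -0.4 + eta * d6 * o4   # x4,6 is o4, the output of unit 4
    w56 = -3.8 + eta * d6 * o5   # x5,6 is o5, the output of unit 5
    print(round(w46, 4), round(w56, 4))  # -0.3862 -3.7921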
Slide 15: Multi-layer perceptrons
The weights between the input and hidden layer can be updated in the same way, using δ4 and δ5. E.g. for w3,5.
[Network diagram: inputs o1 = 0.5, o2 = 0.3, o3 = 0.7; hidden units 4 (o4 = 0.9664, δ4 = -0.0006) and 5 (o5 = 0.5523, δ5 = -0.0448); output unit 6 with δ6 = 0.0475; input-to-hidden weights 1.2, -0.8, 4.0, 0.7, 1 and -0.2 (w3,5 = -0.2); hidden-to-output weights already updated to w4,6 = -0.3862 and w5,6 = -3.7921]
Slide 16: Multi-layer perceptrons
The weights between the input and hidden layer can be updated in the same way, using δ4 and δ5. E.g. for w3,5:
    Δw3,5 = η * δ5 * x3,5 = 0.3 * (-0.0448) * 0.7 ≈ -0.0094
so
    w3,5 becomes -0.2 - 0.0094 = -0.2094
[Network diagram: as on slide 15, with w3,5 now updated to -0.2094]
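The corresponding input-to-hidden update for w3,5 in Python, using the slide's δ5 = -0.0448 and x3,5 = o3 = 0.7; a sketch with η = 0.3.

    eta, d5, o3 = 0.3, -0.0448, 0.7

    w35 = -0.2 + eta * d5 * o3   # x3,5 is o3, the output of input unit 3
    print(round(w35, 4))  # -0.2094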
Slide 17: Multi-layer perceptrons
Finally, the biases in 4, 5 and 6 can be updated. Only the error derivative (δ) and the learning rate are required, since a bias behaves like a weight whose input is fixed at 1. E.g. for Bias6.
[Diagram: unit 4 with bias 0.2 (δ4 = -0.0006), unit 5 with bias -0.3 (δ5 = -0.0448), unit 6 with bias -0.1 (δ6 = 0.0475)]
Slide 18: Multi-layer perceptrons
Finally, the biases in 4, 5 and 6 can be updated. Only the error derivative (δ) and the learning rate are required. E.g. for Bias6:
    ΔBias6 = η * δ6 = 0.3 * 0.0475 ≈ 0.0143
so
    Bias6 becomes -0.1 + 0.0143 = -0.0857
[Diagram: unit 4 with bias 0.2 (δ4 = -0.0006), unit 5 with bias -0.3 (δ5 = -0.0448), unit 6 with bias updated to -0.0857 (δ6 = 0.0475)]
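All three bias updates in Python, treating each bias as a weight on a constant input of 1; a sketch using the slides' rounded δ terms.

    eta = 0.3
    deltas = {4: -0.0006, 5: -0.0448, 6: 0.0475}
    biases = {4: 0.2, 5: -0.3, 6: -0.1}

    new_biases = {unit: biases[unit] + eta * deltas[unit] for unit in biases}
    print(new_biases)  # approximately {4: 0.1998, 5: -0.3134, 6: -0.0857}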
Slide 19: Multi-layer perceptrons
- Variations in the learning rule
- To enhance learning, a momentum term, as well as the learning rate, can be used
- momentum (α), with 0 < α < 1
- the proportion of the last weight adjustment that will contribute to this weight adjustment, i.e. the weight adjustment made to this weight in the previous learning cycle
- typical value: 0.9
- with momentum the update becomes Δwij = η * δj * xij + α * (previous Δwij)
- e.g. for w3,5, let us assume that the previous weight change was -0.0076 and that momentum is 0.9
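A sketch of the momentum update for w3,5, using η = 0.3, the previous change of -0.0076 and α = 0.9 given on the slide.

    eta, alpha = 0.3, 0.9
    d5, o3 = -0.0448, 0.7
    prev_change = -0.0076

    change = eta * d5 * o3 + alpha * prev_change   # gradient term plus momentum term
    w35 = -0.2 + change
    print(round(change, 4), round(w35, 4))  # -0.0162 -0.2162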