Title: Multilayer perceptrons
Slide 1: Multi-layer perceptrons
- second stage: weight adjustment (<--------------)
- Objective for weight adjustment ('learning'):
- minimising the error between the target and the actual output
Slide 2: Multi-layer perceptrons
For example, the RMS ("root mean square") error. For a set of data containing n examples, the RMS error is
    E_RMS = sqrt( (1/n) * sum over all examples d of (td - od)^2 )
where n is the number of examples, td is the target for example d and od is the actual output for example d.
i.e. for each example, evaluate the difference between the target and the actual output, square it and sum across the entire data set. The mean squared error is this sum divided by the number of examples; the root mean square error is the square root of this value.
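A minimal Python sketch of this RMS error calculation; the function name and the example data are illustrative, not from the slides.

    import math

    def rms_error(targets, outputs):
        """Square root of the mean squared difference between targets and actual outputs."""
        n = len(targets)
        return math.sqrt(sum((t - o) ** 2 for t, o in zip(targets, outputs)) / n)

    print(round(rms_error([0.8, 0.1, 0.5], [0.07, 0.2, 0.4]), 2))  # 0.43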
Slide 3: Multi-layer perceptrons
- Visualising an error surface
Slide 4: Multi-layer perceptrons
- Visualising an error surface
- or a one-dimensional view
[Figure: the error measure plotted against the weight matrix W, showing the slope of the error surface and the point of minimum error]
Slide 5: Multi-layer perceptrons
- gradient descent
- interested in the slope of the error surface
- the slope indicates the direction of the weight adjustment
- the slope is given by the derivative of the error function
- for each weight the adjustment is proportional to the slope
    Δwij = -η * ∂E/∂wij
where η is the learning rate and wij is the weight associated with the ith input to unit (neuron) j
[Diagram: unit i connects to unit j through the weight wij]
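A minimal sketch of a single gradient-descent step on one weight, with the slope estimated numerically; all names and the example error surface are illustrative, not from the slides.

    def gradient_descent_step(error_fn, w, eta=0.3, h=1e-6):
        slope = (error_fn(w + h) - error_fn(w - h)) / (2 * h)   # approximate dE/dw
        return w - eta * slope                                  # move against the slope

    # example error surface E(w) = (w - 2)**2, minimised at w = 2
    w = 0.0
    for _ in range(50):
        w = gradient_descent_step(lambda w: (w - 2) ** 2, w)
    print(round(w, 3))  # 2.0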
Slide 6: Multi-layer perceptrons
It can be shown for a neuron j that
    ∂E/∂wij = -(tj - oj) * oj(1 - oj) * xij
thus
    Δwij = η * (tj - oj) * oj(1 - oj) * xij
where oj is the actual output from neuron j, tj is the target (or expected) output for neuron j and xij is the input from neuron i to neuron j; also note that oj(1 - oj) is the derivative of the logistic activation.
[Diagram: unit i sends input xij across weight wij to unit j, which produces output oj]
Slide 7: Multi-layer perceptrons
It can be shown for a neuron j that
    ∂E/∂wij = -(tj - oj) * oj(1 - oj) * xij
thus
    Δwij = η * (tj - oj) * oj(1 - oj) * xij
The error term in this expression is
    δj = oj(1 - oj) * (tj - oj)
and is referred to as the error derivative, or delta j, so the update can be written Δwij = η * δj * xij.
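The rule above translates directly into code. A minimal Python sketch, with illustrative function names that are not from the slides.

    def output_delta(o_j, t_j):
        """Error derivative delta_j = o_j * (1 - o_j) * (t_j - o_j) for an output neuron."""
        return o_j * (1 - o_j) * (t_j - o_j)

    def updated_weight(w_ij, eta, delta_j, x_ij):
        """Apply delta_w_ij = eta * delta_j * x_ij and return the new weight."""
        return w_ij + eta * delta_j * x_ij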
Slide 8: Multi-layer perceptrons
Back to our example.
[Network diagram: input units 1, 2 and 3 feed hidden units 4 (bias 0.2) and 5 (bias -0.3), which feed output unit 6 (bias -0.1); the input-to-hidden weights shown are 1.2, -0.8, 4.0, 0.7, 1 and -0.2, and the hidden-to-output weights are w4,6 = -0.4 and w5,6 = -3.8; output4 = 0.9664, output5 = 0.5523, input6 = -2.5853, output6 = 0.0700, target = 0.8]
In general:
1. evaluate the error derivatives (i.e. the δ terms) for neurons 4, 5 and 6
2. use these to update the weights accordingly
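For reference, a Python sketch of this network's forward pass. The assignment of the input-to-hidden weights to particular connections is my reading of the diagram (chosen because it reproduces the outputs quoted above), so treat it as an assumption; the input values 0.5, 0.3 and 0.7 come from slide 15.

    import math

    def logistic(x):
        return 1.0 / (1.0 + math.exp(-x))

    o1, o2, o3 = 0.5, 0.3, 0.7                            # outputs of input units 1-3 (slide 15)
    o4 = logistic(0.2 + 1.2 * o1 - 0.8 * o2 + 4.0 * o3)   # hidden unit 4, bias 0.2
    o5 = logistic(-0.3 + 0.7 * o1 + 1.0 * o2 - 0.2 * o3)  # hidden unit 5, bias -0.3
    net6 = -0.1 - 0.4 * o4 - 3.8 * o5                     # net input to output unit 6, bias -0.1
    o6 = logistic(net6)

    print(round(o4, 4), round(o5, 4), round(net6, 4), round(o6, 4))
    # 0.9664 0.5523 -2.5853 0.0701  (the slide rounds output6 to 0.0700)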
Slide 9: Multi-layer perceptrons
Evaluating δ6:
    δ6 = o6(1 - o6) * (t6 - o6) = 0.0700 * (1 - 0.0700) * (0.8 - 0.0700) ≈ 0.0475
[Diagram: output unit 6 with o6 = 0.0700, t6 = 0.8 and δ6 = 0.0475; incoming weights w4,6 = -0.4 and w5,6 = -3.8]
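A quick check of the slide's δ6 value in Python, using the rounded numbers shown on the slide.

    # delta for output unit 6: o6 = 0.0700, target t6 = 0.8
    o6, t6 = 0.0700, 0.8
    d6 = o6 * (1 - o6) * (t6 - o6)
    print(round(d6, 4))  # 0.0475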
Slide 10: Multi-layer perceptrons
Evaluating δ4: there is no explicit target t4, so we have to estimate the difference between the target for 4 and the actual output of 4. The estimate is propagated back from the output layer:
    (t4 - o4) is estimated by w4,6 * δ6
or, for more than one output neuron,
    (t4 - o4) is estimated by the sum over all output neurons k of w4,k * δk
[Diagram: hidden unit 4 with o4 = 0.9664 and unknown target t4, connected to output unit 6 (δ6 = 0.0475) by w4,6 = -0.4; w5,6 = -3.8]
Slide 11: Multi-layer perceptrons
Evaluating δ4: estimating the difference between the target for 4 and the actual output of 4 in this way gives
    δ4 = o4(1 - o4) * w4,6 * δ6 = 0.9664 * (1 - 0.9664) * (-0.4) * 0.0475 ≈ -0.0006
[Diagram: hidden unit 4 with o4 = 0.9664 and δ4 = -0.0006, connected to output unit 6 (δ6 = 0.0475) by w4,6 = -0.4; w5,6 = -3.8]
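The estimate above gives the hidden-layer rule δj = oj(1 - oj) * sum over k of wj,k * δk. A minimal Python sketch; the function name is illustrative.

    def hidden_delta(o_j, downstream):
        """downstream: list of (w_jk, delta_k) pairs, one for each neuron k that j feeds."""
        return o_j * (1 - o_j) * sum(w_jk * delta_k for w_jk, delta_k in downstream)

    # neuron 4: o4 = 0.9664, one downstream connection with w4,6 = -0.4 and delta6 = 0.0475
    print(round(hidden_delta(0.9664, [(-0.4, 0.0475)]), 4))  # -0.0006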
Slide 12: Multi-layer perceptrons
Now evaluating δ5 in the same way:
    δ5 = o5(1 - o5) * w5,6 * δ6, giving δ5 ≈ -0.0448
[Diagram: hidden unit 5 with o5 = 0.5523, unknown target t5 and δ5 = -0.0448, connected to output unit 6 (δ6 = 0.0475) by w5,6 = -3.8; also shown: unit 4 with o4 = 0.9664, δ4 = -0.0006 and w4,6 = -0.4]
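The same calculation for neuron 5, using the slide's rounded values; the last digit differs slightly from the slide's -0.0448 because of rounding.

    o5, w56, d6 = 0.5523, -3.8, 0.0475
    d5 = o5 * (1 - o5) * w56 * d6
    print(round(d5, 4))  # -0.0446 (the slide quotes -0.0448)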
Slide 13: Multi-layer perceptrons
Now we will apply the learning rule and update the weights, assuming a learning rate (η) of 0.3.
For w4,6:
    Δw4,6 = η * δ6 * x4,6 = 0.3 * 0.0475 * 0.9664 ≈ 0.0138
so
    w4,6 becomes -0.4 + 0.0138 = -0.3862
[Diagram: δ4 = -0.0006, o4 = 0.9664, δ5 = -0.0448, o5 = 0.5523, δ6 = 0.0475, w4,6 = -0.4, w5,6 = -3.8]
Slide 14: Multi-layer perceptrons
Now we will apply the learning rule and update the weights, assuming a learning rate (η) of 0.3.
And for w5,6:
    Δw5,6 = η * δ6 * x5,6 = 0.3 * 0.0475 * 0.5523 ≈ 0.0079
so
    w5,6 becomes -3.8 + 0.0079 = -3.7921
[Diagram: δ4 = -0.0006, o4 = 0.9664, δ5 = -0.0448, o5 = 0.5523, δ6 = 0.0475, w4,6 = -0.4, w5,6 = -3.8]
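Both hidden-to-output updates in Python, with η = 0.3 as on the slides; a sketch using the rounded δ6.

    eta, d6 = 0.3, 0.0475
    o4, o5 = 0.9664, 0.5523

    w46 = -0.4 + eta * d6 * o4   # x4,6 is o4, the output of unit 4
    w56 = -3.8 + eta * d6 * o5   # x5,6 is o5, the output of unit 5
    print(round(w46, 4), round(w56, 4))  # -0.3862 -3.7921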
Slide 15: Multi-layer perceptrons
The weights between the input and hidden layer can be updated in the same way, using δ4 and δ5. E.g. for w3,5.
[Network diagram: inputs o1 = 0.5, o2 = 0.3, o3 = 0.7; hidden units 4 (o4 = 0.9664, δ4 = -0.0006) and 5 (o5 = 0.5523, δ5 = -0.0448); output unit 6 with δ6 = 0.0475; input-to-hidden weights 1.2, -0.8, 4.0, 0.7, 1 and -0.2 (w3,5 = -0.2); hidden-to-output weights already updated to w4,6 = -0.3862 and w5,6 = -3.7921]
Slide 16: Multi-layer perceptrons
The weights between the input and hidden layer can be updated in the same way, using δ4 and δ5. E.g. for w3,5:
    Δw3,5 = η * δ5 * x3,5 = 0.3 * (-0.0448) * 0.7 ≈ -0.0094
so
    w3,5 becomes -0.2 - 0.0094 = -0.2094
[Network diagram: as on slide 15, with w3,5 now updated to -0.2094]
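The corresponding input-to-hidden update for w3,5 in Python, using the slide's δ5 = -0.0448 and x3,5 = o3 = 0.7; a sketch with η = 0.3.

    eta, d5, o3 = 0.3, -0.0448, 0.7

    w35 = -0.2 + eta * d5 * o3   # x3,5 is o3, the output of input unit 3
    print(round(w35, 4))  # -0.2094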
Slide 17: Multi-layer perceptrons
Finally, the biases in 4, 5 and 6 can be updated. Only the error derivative (δ) and the learning rate are required, since a bias behaves like a weight whose input is fixed at 1. E.g. for Bias6.
[Diagram: unit 4 with bias 0.2 (δ4 = -0.0006), unit 5 with bias -0.3 (δ5 = -0.0448), unit 6 with bias -0.1 (δ6 = 0.0475)]
Slide 18: Multi-layer perceptrons
Finally, the biases in 4, 5 and 6 can be updated. Only the error derivative (δ) and the learning rate are required. E.g. for Bias6:
    ΔBias6 = η * δ6 = 0.3 * 0.0475 ≈ 0.0143
so
    Bias6 becomes -0.1 + 0.0143 = -0.0857
[Diagram: unit 4 with bias 0.2 (δ4 = -0.0006), unit 5 with bias -0.3 (δ5 = -0.0448), unit 6 with bias updated to -0.0857 (δ6 = 0.0475)]
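All three bias updates in Python, treating each bias as a weight on a constant input of 1; a sketch using the slides' rounded δ terms.

    eta = 0.3
    deltas = {4: -0.0006, 5: -0.0448, 6: 0.0475}
    biases = {4: 0.2, 5: -0.3, 6: -0.1}

    new_biases = {unit: biases[unit] + eta * deltas[unit] for unit in biases}
    print(new_biases)  # approximately {4: 0.1998, 5: -0.3134, 6: -0.0857}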
Slide 19: Multi-layer perceptrons
- Variations in the learning rule
- To enhance learning, a momentum term, as well as the learning rate, can be used
- momentum (α), with 0 < α < 1
- the proportion of the last weight adjustment that will contribute to this weight adjustment, i.e. the weight adjustment made to this weight in the previous learning cycle
- typical value: 0.9
- with momentum the update becomes Δwij = η * δj * xij + α * (previous Δwij)
- e.g. for w3,5, let us assume that the previous weight change was -0.0076 and that momentum is 0.9
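A sketch of the momentum update for w3,5, using η = 0.3, the previous change of -0.0076 and α = 0.9 given on the slide.

    eta, alpha = 0.3, 0.9
    d5, o3 = -0.0448, 0.7
    prev_change = -0.0076

    change = eta * d5 * o3 + alpha * prev_change   # gradient term plus momentum term
    w35 = -0.2 + change
    print(round(change, 4), round(w35, 4))  # -0.0162 -0.2162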