Title: Multilayer Feedforward Networks
Slide 1: Chapter 3 - Multilayer Feedforward Networks
Slide 2: Multilayer Feedforward Network Structure
[Diagram: an N-layer network. Input nodes form Layer 0, hidden nodes form Layers 1 through N-1, output nodes form Layer N, and adjacent layers are joined by connections.]
Slide 3: Multilayer Feedforward Network Structure (cont.)
[Diagram: a feedforward network with input nodes x1, x2, x3 and output node o1.]
Slide 4: Multilayer Feedforward Network Structure (cont.)
Notation: a superscript index denotes the layer number. For example, $w^{(n)}_{j,i}$ is the weight connecting Node j of Layer n to Node i of Layer n-1, and $\theta^{(n)}_j$ is the bias of Node j of Layer n. The output of Node j of Layer n is
$o^{(n)}_j = f\Big(\sum_i w^{(n)}_{j,i}\, o^{(n-1)}_i + \theta^{(n)}_j\Big)$
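As a quick illustration of this notation, here is a minimal Matlab sketch (not from the slides; all weights and inputs are made-up values) of the forward computation through one layer:

f = @(s) double(s >= 0);             % example activation: a step function
o_prev = [1; 0; 1];                  % hypothetical outputs of Layer n-1
W = [0.5 -1.0 0.2; 1.0 1.0 -0.5];    % hypothetical weights, W(j,i) = w^(n)_{j,i}
theta = [-0.1; 0.3];                 % hypothetical biases of Layer n
s = W*o_prev + theta;                % weighted input sum at each Node j
o = f(s)                             % outputs o_j^(n) of Layer n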
Slide 5: Multilayer Perceptron: How it works
The XOR function:

x1  x2 | y
 0   0 | 0
 0   1 | 1
 1   0 | 1
 1   1 | 0

[Diagram: a 2-layer network. Inputs x1 and x2 feed the Layer-1 nodes y1 and y2, which feed the Layer-2 output node o; f( ) is a step function.]
Slide 6: Multilayer Perceptron: How it works (cont.)
Outputs at Layer 1:

x1  x2 | y1  y2
 0   0 |  0   0
 0   1 |  1   0
 1   0 |  1   0
 1   1 |  1   1
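These tables can be reproduced with a short Matlab sketch. The weights below are hypothetical values chosen so that y1 acts like OR, y2 like AND, and the output computes y1 AND NOT y2 (= XOR); the slides do not give the actual weights:

step = @(s) double(s > 0);           % f( ): step function
X = [0 0; 0 1; 1 0; 1 1];            % all four XOR input patterns
y1 = step(X(:,1) + X(:,2) - 0.5);    % Layer-1 node 1 (acts like OR)
y2 = step(X(:,1) + X(:,2) - 1.5);    % Layer-1 node 2 (acts like AND)
o = step(y1 - y2 - 0.5);             % Layer 2: y1 AND NOT y2 = XOR
disp([X y1 y2 o])                    % columns: x1 x2 y1 y2 o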
Slide 7: Multilayer Perceptron: How it works (cont.)
Inside Layer 1: [Plot of the Layer-1 outputs in the y1-y2 plane: (0,0) and (1,1) belong to class 0, while both class-1 patterns map to (1,0). Linearly separable!]
Slide 8: Multilayer Perceptron: How it works (cont.)
Inside the output layer: [Diagram: the output node o takes y1 and y2 as its inputs.]
In the y1-y2 space the patterns are linearly separable, so a single line (L3) can separate class 0 from class 1.
Slide 9: Multilayer Perceptron: How it works (cont.)
- Why hidden layers? Each hidden layer transforms the data from the previous layer so that, by the time they reach the output layer, the patterns are linearly separable; the output layer then only has to solve a linearly separable problem.
- For data that are not linearly separable, one hidden layer (with enough nodes) is sufficient to transform them into linearly separable data.
- The activation function of each layer must be nonlinear; a thresholding function is one example. With linear activations, stacked layers collapse into a single layer, as the sketch below illustrates.
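A minimal Matlab sketch of that last point (illustrative values, not from the slides): with purely linear activations, two layers are equivalent to one.

W1 = [1 -1; 0.5 2]; W2 = [1 1];      % hypothetical weights of two linear layers
x = [0.3; 0.7];
disp(W2*(W1*x))                      % output of the two-layer linear network
disp((W2*W1)*x)                      % identical output from a single linear layer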
Slide 10: Backpropagation Algorithm
2-Layer case.
[Diagram: a 2-layer network with an input layer (Layer 0), a hidden layer (Layer 1), and an output layer (Layer 2).]
Slide 11: Backpropagation Algorithm (cont.)
2-Layer case. Let $e_k = d_k - o_k$ be the error at output Node k and $e^2 = \sum_k e_k^2$. The derivative of $e^2$ with respect to $w^{(2)}_{k,j}$ is
$\frac{\partial e^2}{\partial w^{(2)}_{k,j}} = -2\, e_k\, f'(s^{(2)}_k)\, o^{(1)}_j$
and the derivative of $e^2$ with respect to $\theta^{(2)}_k$ is
$\frac{\partial e^2}{\partial \theta^{(2)}_k} = -2\, e_k\, f'(s^{(2)}_k)$
where $s^{(2)}_k$ is the weighted input sum at Node k.
Slide 12: Backpropagation Algorithm (cont.)
2-Layer case. The derivative of $e^2$ with respect to $w^{(1)}_{j,i}$ is
$\frac{\partial e^2}{\partial w^{(1)}_{j,i}} = -2 \Big[\sum_k e_k\, f'(s^{(2)}_k)\, w^{(2)}_{k,j}\Big] f'(s^{(1)}_j)\, x_i$
Slide 13: Backpropagation Algorithm (cont.)
Consider the derivative of $e^2$ with respect to $w^{(1)}_{j,i}$, the weight connecting Node j of the current layer (Layer 1) to Node i of the lower layer (Layer 0). Its factors are:
- $w^{(2)}_{k,j}$: weight between upper Node k and Node j of the current layer
- $e_k$: error from upper Node k
- $f'(s^{(2)}_k)$: derivative of upper Node k
- $x_i$: input from lower Node i
- $f'(s^{(1)}_j)$: derivative of Node j of the current layer
The bracketed sum $\sum_k e_k\, f'(s^{(2)}_k)\, w^{(2)}_{k,j}$ is the error propagated back (back propagation) to Node j of the current layer.
Slide 14: Backpropagation Algorithm (cont.)
Summary. The derivative of $e^2$ with respect to $w^{(2)}_{k,j}$ is
$\frac{\partial e^2}{\partial w^{(2)}_{k,j}} = -2\, e_k\, f'(s^{(2)}_k)\, o^{(1)}_j$
(error at current node, derivative of current node, input from lower node), and the derivative of $e^2$ with respect to $w^{(1)}_{j,i}$ is
$\frac{\partial e^2}{\partial w^{(1)}_{j,i}} = -2\, e^{(1)}_j\, f'(s^{(1)}_j)\, x_i, \qquad e^{(1)}_j = \sum_k e_k\, f'(s^{(2)}_k)\, w^{(2)}_{k,j}$
Notice that the derivatives of the error with respect to the weights of the two layers have the same form: the product of an error term, a derivative term, and an input term.
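A minimal Matlab sketch of these two gradients for a tiny 2-layer network (all weights are hypothetical; f is the logistic function, so f' = f(1-f)):

f = @(s) 1./(1+exp(-s)); fp = @(s) f(s).*(1-f(s));
x = [1; 0]; d = 1;                       % input pattern and desired output
W1 = [0.1 0.4; -0.3 0.2]; th1 = [0.05; -0.1];   % Layer 1 (hidden)
W2 = [0.7 -0.5];          th2 = 0.2;            % Layer 2 (output)
s1 = W1*x + th1;  o1 = f(s1);            % forward pass, hidden layer
s2 = W2*o1 + th2; o2 = f(s2);            % forward pass, output layer
e = d - o2;                              % error at the output node
dW2 = -2*(e.*fp(s2))*o1';                % error x derivative x input (Layer 2)
e1 = W2'*(e.*fp(s2));                    % error back-propagated to Layer 1
dW1 = -2*(e1.*fp(s1))*x';                % error x derivative x input (Layer 1)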
Slide 15: Backpropagation Algorithm (cont.)
General case. The derivative of $e^2$ with respect to the weight $w^{(n)}_{j,i}$ connecting Node j in Layer n (the current layer) to Node i in Layer n-1 (the lower layer) is
$\frac{\partial e^2}{\partial w^{(n)}_{j,i}} = -2\, e^{(n)}_j\, f'(s^{(n)}_j)\, o^{(n-1)}_i$
where $o^{(n-1)}_i$ is the input from Node i of the lower layer, $s^{(n)}_j$ is the weighted input sum at Node j, $e^{(n)}_j$ is the error at Node j of Layer n, and $f'(s^{(n)}_j)$ is the derivative of Node j.
Slide 16: Backpropagation Algorithm (cont.)
General case. The derivative of $e^2$ with respect to the bias $\theta^{(n)}_j$ of Node j in Layer n (the current layer) is
$\frac{\partial e^2}{\partial \theta^{(n)}_j} = -2\, e^{(n)}_j\, f'(s^{(n)}_j)$
where $s^{(n)}_j$ is the weighted input sum at Node j, $e^{(n)}_j$ is the error at Node j of Layer n, and $f'(s^{(n)}_j)$ is the derivative of Node j.
Slide 17: Backpropagation Algorithm (cont.)
General case. The error at Node j of Layer n is defined recursively from the layer above,
$e^{(n)}_j = \sum_k e^{(n+1)}_k\, f'(s^{(n+1)}_k)\, w^{(n+1)}_{k,j}$
while at the output layer (Layer N) the error at Node k is simply $e^{(N)}_k = d_k - o_k$.
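In code, this recursion is one backward sweep. A Matlab sketch of the general case, assuming cell arrays W{n} and th{n} hold the weights and biases of Layer n, and f, fp, x, d, N are already defined:

o = cell(1,N+1); s = cell(1,N); e = cell(1,N);
o{1} = x;                            % o{1} holds the Layer-0 outputs (inputs)
for n = 1:N                          % forward pass, storing sums and outputs
    s{n} = W{n}*o{n} + th{n};
    o{n+1} = f(s{n});
end
e{N} = d - o{N+1};                   % error at the output layer (Layer N)
for n = N-1:-1:1                     % propagate the error backwards
    e{n} = W{n+1}'*(e{n+1}.*fp(s{n+1}));
end
for n = 1:N                          % gradients: error x derivative x input
    dW{n} = -2*(e{n}.*fp(s{n}))*o{n}';
    dth{n} = -2*(e{n}.*fp(s{n}));
end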
Slide 18: Updating Weights: Gradient Descent Method
Updating weights and bias:
$w(t+1) = w(t) - \eta\,\frac{\partial E}{\partial w}, \qquad \theta(t+1) = \theta(t) - \eta\,\frac{\partial E}{\partial \theta}$
where $\eta$ is the learning rate.
Slide 19: Updating Weights: Gradient Descent with Momentum Method
$\Delta w(t) = -\eta\,\frac{\partial E}{\partial w} + \beta\,\Delta w(t-1), \qquad 0 < \beta < 1$
where $\beta\,\Delta w(t-1)$ is the momentum term. For a constant gradient, $\Delta w$ will converge to
$\Delta w = -\frac{\eta}{1-\beta}\,\frac{\partial E}{\partial w}$
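A toy Matlab sketch of the momentum update, minimizing E(w) = w^2 (a made-up one-dimensional example, not from the slides):

eta = 0.1; beta = 0.9;               % learning rate and momentum, 0 < beta < 1
w = 5; dw = 0;
for t = 1:100
    grad = 2*w;                      % dE/dw for E = w^2
    dw = -eta*grad + beta*dw;        % momentum term reuses the previous step
    w = w + dw;
end
disp(w)                              % w has moved close to the minimum at 0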
Slide 20: Updating Weights: Newton's Method
From the Taylor series,
$E(w + \Delta w) \approx E(w) + g^{T}\Delta w + \tfrac{1}{2}\,\Delta w^{T} H\,\Delta w$
where H is the Hessian matrix, $g = \nabla E$ is the gradient, and E is the error function ($e^2$). Minimizing the right-hand side with respect to $\Delta w$ gives
$\Delta w = -H^{-1} g$, or $w(t+1) = w(t) - H^{-1} g$ (Newton's method).
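For a quadratic error function the Newton update reaches the minimum in a single step, as this toy Matlab sketch shows (A and b are made-up values):

A = [3 1; 1 2]; b = [1; -1];         % E(w) = w'*A*w/2 - b'*w
w = [0; 0];
g = A*w - b;                         % gradient of E at w
H = A;                               % Hessian of E
w = w - H\g;                         % Newton step: dw = -inv(H)*g
disp(w)                              % equals A\b, the exact minimizer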
Slide 21: Updating Weights: Newton's Method (cont.)
Advantages:
- Fast (quadratic convergence).
Disadvantages:
- Computationally expensive (requires the computation of an inverse matrix at each iteration).
- The Hessian matrix is difficult to compute.
Slide 22: Levenberg-Marquardt Backpropagation Method
$\Delta w = -\big(J^{T} J + \mu I\big)^{-1} J^{T} e$
where J is the Jacobian matrix, e is the vector of all errors, I is the identity matrix, and $\mu$ is the learning rate.
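A sketch of one Levenberg-Marquardt step in Matlab, assuming J, err (the error vector), mu, and the weight vector w are already available:

dw = -(J'*J + mu*eye(size(J,2))) \ (J'*err);   % the update above
w = w + dw;

For large mu this behaves like gradient descent with a small step; for small mu it approaches the (approximate) Newton step.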
Slide 23: Example: Application of MLP for classification
Matlab command: create the training data. The input patterns x1 and x2 are generated from random numbers; the desired output o is 1 if (x1, x2) lies in a circle of radius 1 centered at the origin, and 0 otherwise.

x = randn(2,200);                 % input patterns
o = (x(1,:).^2 + x(2,:).^2) < 1;  % desired outputs

[Scatter plot of the training data in the x1-x2 plane.]
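One way to reproduce the scatter plot (not part of the original commands):

plot(x(1,o==1), x(2,o==1), 'r.', x(1,o==0), x(2,o==0), 'b.')
xlabel('x1'), ylabel('x2'), axis equal   % class 1 inside the circle, class 0 outside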
Slide 24: Example: Application of MLP for classification (cont.)
Matlab command: create a 2-layer network.

PR = [min(x(1,:)) max(x(1,:)); min(x(2,:)) max(x(2,:))];  % range of inputs
S1 = 10; S2 = 1;                 % no. of nodes in Layers 1 and 2
TF1 = 'logsig'; TF2 = 'logsig';  % activation functions of Layers 1 and 2
BTF = 'traingd';                 % training function
BLF = 'learngd';                 % learning function
PF = 'mse';                      % cost function
net = newff(PR,[S1 S2],{TF1 TF2},BTF,BLF,PF);  % create the network
Slide 25: Example: Application of MLP for classification (cont.)
Matlab command: train the network.

net.trainParam.epochs = 2000;  % no. of training rounds
net.trainParam.goal = 0.002;   % maximum desired error
net = train(net,x,o);          % training command
y = sim(net,x);                % compute network outputs (continuous)
netout = y > 0.5;              % convert to binary outputs
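The classification errors quoted on the following slides can be counted with one extra line (not part of the original commands):

err = sum(netout ~= o);                        % no. of misclassified patterns
fprintf('Classification Error: %d/%d\n', err, length(o))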
Slide 26: Example: Application of MLP for classification (cont.)
Network structure: [Diagram: input nodes x1 and x2; 10 hidden nodes (sigmoid); one output node (sigmoid) followed by a threshold unit for binary output.]
Slide 27: Example: Application of MLP for classification (cont.)
Initial weights of the hidden-layer nodes (10 nodes), displayed as the lines $w_1 x_1 + w_2 x_2 + \theta = 0$.
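These lines can be drawn from the network object directly; a sketch assuming the toolbox stores the hidden-layer weights in net.IW{1,1} and biases in net.b{1}:

W = net.IW{1,1}; th = net.b{1};
x1 = linspace(min(x(1,:)), max(x(1,:)), 2);
hold on
for j = 1:size(W,1)                          % one line per hidden node
    plot(x1, -(W(j,1)*x1 + th(j))/W(j,2))    % solve w1*x1 + w2*x2 + theta = 0 for x2
end                                          % (assumes W(j,2) is nonzero)
hold off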
Slide 28: Example: Application of MLP for classification (cont.)
Training algorithm: gradient descent method.
[Plot: MSE vs. training epochs.]
Slide 29: Example: Application of MLP for classification (cont.)
Results obtained using the gradient descent method.
Classification error: 40/200.
Slide 30: Example: Application of MLP for classification (cont.)
Training algorithm: gradient descent with momentum method.
[Plot: MSE vs. training epochs.]
Slide 31: Example: Application of MLP for classification (cont.)
Results obtained using the gradient descent with momentum method.
Classification error: 40/200.
Slide 32: Example: Application of MLP for classification (cont.)
Training algorithm: Levenberg-Marquardt backpropagation.
[Plot: MSE vs. training epochs; the goal is reached within only 10 epochs!]
Slide 33: Example: Application of MLP for classification (cont.)
Results obtained using Levenberg-Marquardt backpropagation. In the resulting network some hidden nodes are unused; only 6 hidden nodes are adequate!
Classification error: 0/200.
Slide 34: Example: Application of MLP for classification (cont.)
Remarks:
- In classification, each hidden node forms part of the boundary between the classes, i.e., a local segment of the decision boundary.
- The output layer combines the outputs of all the hidden nodes to form the global boundary that separates the classes.
Slide 35: Example: Application of MLP for function approximation
Function to be approximated:

x = [0:0.01:4];
y = (sin(2*pi*x)+1).*exp(-x.^2);
Slide 36: Example: Application of MLP for function approximation (cont.)
Matlab command: create a 2-layer network.

PR = [min(x) max(x)];             % range of inputs
S1 = 6; S2 = 1;                   % no. of nodes in Layers 1 and 2
TF1 = 'logsig'; TF2 = 'purelin';  % activation functions of Layers 1 and 2
BTF = 'trainlm';                  % training function
BLF = 'learngd';                  % learning function
PF = 'mse';                       % cost function
net = newff(PR,[S1 S2],{TF1 TF2},BTF,BLF,PF);  % create the network
Slide 37: Example: Application of MLP for function approximation (cont.)
Network structure: [Diagram: one input node x; 6 hidden nodes (sigmoid); one linear output node producing y.]
Slide 38: Example: Application of MLP for function approximation (cont.)
Initial weights of the hidden nodes, displayed in terms of the activation function of each node (sigmoid function).
Slide 39: Example: Application of MLP for function approximation (cont.)
Final weights of the hidden nodes after training, displayed in terms of the activation function of each node (sigmoid function).
Slide 40: Example: Application of MLP for function approximation (cont.)
The weighted summation of all outputs from the first-layer nodes yields the function approximation.
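This summation can be reproduced explicitly from the trained network; a sketch assuming the toolbox fields net.IW, net.LW, and net.b:

h = logsig(net.IW{1,1}*x + net.b{1}*ones(1,length(x)));  % hidden-node outputs
ya = net.LW{2,1}*h + net.b{2};                           % weighted summation
plot(x, y, 'b', x, ya, 'r--')                            % target vs. approximation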
Slide 41: Example: Application of MLP for function approximation (cont.)
Matlab command: create a 2-layer network, this time with only 3 hidden nodes.

PR = [min(x) max(x)];             % range of inputs
S1 = 3; S2 = 1;                   % no. of nodes in Layers 1 and 2
TF1 = 'logsig'; TF2 = 'purelin';  % activation functions of Layers 1 and 2
BTF = 'trainlm';                  % training function
BLF = 'learngd';                  % learning function
PF = 'mse';                       % cost function
net = newff(PR,[S1 S2],{TF1 TF2},BTF,BLF,PF);  % create the network
Slide 42: Example: Application of MLP for function approximation (cont.)
Network structure. The number of hidden nodes is too small!
[Plot: function approximated using the network.]
Slide 43: Example: Application of MLP for function approximation (cont.)
Matlab command: create a 2-layer network with radial basis activation in Layer 1.

PR = [min(x) max(x)];             % range of inputs
S1 = 5; S2 = 1;                   % no. of nodes in Layers 1 and 2
TF1 = 'radbas'; TF2 = 'purelin';  % activation functions of Layers 1 and 2
BTF = 'trainlm';                  % training function
BLF = 'learngd';                  % learning function
PF = 'mse';                       % cost function
net = newff(PR,[S1 S2],{TF1 TF2},BTF,BLF,PF);  % create the network
Slide 44: Example: Application of MLP for function approximation (cont.)
Initial weights of the hidden nodes, displayed in terms of the activation function of each node (radial basis function).
Slide 45: Example: Application of MLP for function approximation (cont.)
Final weights of the hidden nodes after training, displayed in terms of the activation function of each node (radial basis function).
Slide 46: Example: Application of MLP for function approximation (cont.)
[Plot: function approximated using the network.]
Slide 47: Example: Application of MLP for function approximation (cont.)
Remarks:
- In function approximation, each hidden node approximates the function locally; each node is active only over a certain range of the input.
- Combining the outputs of all the hidden nodes yields the global function over the entire input range.