Title: Connectionist Machine Learning IIa
Connectionist Machine Learning IIa
- Basics
- Backpropagation Algorithm
- Momentum
- Summary
Basics
In contrast to perceptrons, multilayer networks
can learn multiple decision boundaries. In
addition, the boundaries may be nonlinear.
[Figure: a multilayer network with input nodes, internal nodes, and output nodes]
Example
[Figure: a dataset in the (x1, x2) plane that requires multiple nonlinear decision boundaries]
One Single Unit
To make nonlinear partitions of the input space, we need to define each unit as a nonlinear function (unlike the perceptron). One solution is to use the sigmoid unit.
[Figure: a sigmoid unit with inputs x1, …, xn and weights w0, w1, …, wn; a summation node computes net = Σ_i w_i x_i + w0 and the unit outputs O = σ(net) = 1 / (1 + e^(−net))]
One Single Unit
The sigmoid or squashing function.
[Figure: the S-shaped curve of σ(net) = 1 / (1 + e^(−net)) plotted against net]
More Precisely
O(x1, x2, …, xn) = σ(W · X)
where σ(W · X) = 1 / (1 + e^(−W · X))
Function σ is called the sigmoid or logistic function. It has the following property:
dσ(y)/dy = σ(y) (1 − σ(y))
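As a concrete illustration of the sigmoid unit and the derivative property above, here is a minimal NumPy sketch (the function names are mine, not from the slides):

```python
import numpy as np

def sigmoid(net):
    """Logistic (sigmoid) function: 1 / (1 + e^(-net))."""
    return 1.0 / (1.0 + np.exp(-net))

def sigmoid_derivative(net):
    """d sigma(y)/dy = sigma(y) * (1 - sigma(y))."""
    s = sigmoid(net)
    return s * (1.0 - s)

# Output of a single sigmoid unit on input x with weights w and bias w0:
x = np.array([0.5, -1.2])
w = np.array([0.8, 0.3])
w0 = 0.1
o = sigmoid(w @ x + w0)
```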
Connectionist Machine Learning IIa
- Basics
- Backpropagation Algorithm
- Momentum
- Summary
Many weights need adjustment
Multilayer networks need many weights to be
adjusted
[Figure: a multilayer network; every link between input, internal, and output nodes carries an adjustable weight]
Backpropagation Algorithm
Goal: to learn the weights for all links in an interconnected multilayer network. We begin by defining our measure of error:
E(W) = ½ Σ_d Σ_k (t_kd − o_kd)²
where k varies over the output nodes and d over the training examples. The idea is again to use gradient descent over the space of weights to find a global minimum.
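A small sketch of how E(W) could be computed, assuming targets and outputs are stored as arrays with one row per training example:

```python
import numpy as np

def network_error(targets, outputs):
    """E(W) = 1/2 * sum over examples d and output nodes k of (t_kd - o_kd)^2."""
    return 0.5 * np.sum((targets - outputs) ** 2)

# Example with two training examples and two output nodes:
targets = np.array([[1.0, 0.0], [0.0, 1.0]])
outputs = np.array([[0.8, 0.1], [0.3, 0.7]])
err = network_error(targets, outputs)  # 0.5 * (0.04 + 0.01 + 0.09 + 0.09) = 0.115
```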
Output Nodes
[Figure: a multilayer network with its output nodes highlighted]
Algorithm
The idea is again to use gradient descent over the space of weights to find a global minimum (no guarantee).
- Create a network with n_in input nodes, n_hidden internal nodes, and n_out output nodes.
- Initialize all weights to small random numbers (see the sketch below).
- Until the error is small, do:
  - For each example X, do:
    - Propagate example X forward through the network.
    - Propagate errors backward through the network.
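A minimal sketch of the creation and initialization steps for a 2-3-1 network; the layer sizes, the [-0.05, 0.05] range, and the bias-in-last-column layout are my own assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)
n_in, n_hidden, n_out = 2, 3, 1

# Initialize all weights to small random numbers; the extra column holds
# each node's bias weight w0.
W_hidden = rng.uniform(-0.05, 0.05, size=(n_hidden, n_in + 1))
W_out = rng.uniform(-0.05, 0.05, size=(n_out, n_hidden + 1))
```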
Propagating Forward
Given example X, compute the output of every node until we reach the output nodes (a sketch follows).
[Figure: example X enters at the input nodes; each internal and output node computes the sigmoid function of its weighted inputs]
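A sketch of the forward pass through one hidden layer, under the same layout assumptions as the initialization sketch above:

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def forward(x, W_hidden, W_out):
    """Compute every node's output, layer by layer, for one example x."""
    x_b = np.append(x, 1.0)             # constant input 1 for the bias weight w0
    hidden = sigmoid(W_hidden @ x_b)    # outputs of the internal nodes
    hidden_b = np.append(hidden, 1.0)
    output = sigmoid(W_out @ hidden_b)  # outputs of the output nodes
    return hidden_b, output

# Push one input through the randomly initialized 2-3-1 network from above:
rng = np.random.default_rng(42)
W_hidden = rng.uniform(-0.05, 0.05, size=(3, 3))
W_out = rng.uniform(-0.05, 0.05, size=(1, 4))
hidden_b, output = forward(np.array([0.5, -1.2]), W_hidden, W_out)
```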
Error: Output Nodes
[Figure: at each output node, the estimation o_k is compared with the value t_k of the target function]
Propagating Error Backward
- For each output node k, compute the error:
  δ_k = O_k (1 − O_k)(t_k − O_k)
- Update each network weight (as sketched below):
  W_ji ← W_ji + ΔW_ji
  where ΔW_ji = η δ_j x_ji (x_ji and W_ji denote the input and the weight from node i to node j).
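Continuing the forward-pass sketch, the output-node error and weight update could look like this; the numeric values and the learning rate η = 0.1 are illustrative assumptions:

```python
import numpy as np

# Suppose the forward pass produced these values:
hidden_b = np.array([0.6, 0.4, 0.7, 1.0])  # hidden outputs plus bias input
output = np.array([0.8])                   # output-node activations O_k
target = np.array([1.0])                   # target values t_k
W_out = np.zeros((1, 4))

# delta_k = O_k (1 - O_k)(t_k - O_k)
delta_out = output * (1.0 - output) * (target - output)

# W_ji <- W_ji + eta * delta_j * x_ji
eta = 0.1
W_out += eta * np.outer(delta_out, hidden_b)
```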
Error: Intermediate Nodes
[Figure: the error estimate at an intermediate node is assembled from the errors of the output nodes it feeds into]
Propagating Error Backward
- For each hidden unit h, calculate the error:
  δ_h = O_h (1 − O_h) Σ_k W_kh δ_k
- Update each network weight (a full sketch of the backward pass follows):
  W_ji ← W_ji + ΔW_ji
  where ΔW_ji = η δ_j x_ji (x_ji and W_ji denote the input and the weight from node i to node j).
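Putting both backward-pass rules together, a complete single-example training step might look like the following sketch; the bias-in-last-column layout and the value of eta remain assumptions:

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def backprop_step(x, target, W_hidden, W_out, eta=0.1):
    """One forward pass plus one backward pass for a single example."""
    # Forward: compute every node's output.
    x_b = np.append(x, 1.0)
    hidden = sigmoid(W_hidden @ x_b)
    hidden_b = np.append(hidden, 1.0)
    output = sigmoid(W_out @ hidden_b)

    # Output nodes: delta_k = O_k (1 - O_k)(t_k - O_k)
    delta_out = output * (1.0 - output) * (target - output)

    # Hidden units: delta_h = O_h (1 - O_h) * sum_k W_kh delta_k
    # (the bias column of W_out is dropped; no hidden node feeds it)
    delta_hidden = hidden * (1.0 - hidden) * (W_out[:, :-1].T @ delta_out)

    # Weight updates: W_ji <- W_ji + eta * delta_j * x_ji
    W_out += eta * np.outer(delta_out, hidden_b)
    W_hidden += eta * np.outer(delta_hidden, x_b)
    return W_hidden, W_out

# One training step on the 2-3-1 network from the earlier sketches:
rng = np.random.default_rng(42)
W_hidden = rng.uniform(-0.05, 0.05, size=(3, 3))
W_out = rng.uniform(-0.05, 0.05, size=(1, 4))
W_hidden, W_out = backprop_step(np.array([0.5, -1.2]), np.array([1.0]),
                                W_hidden, W_out)
```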
Connectionist Machine Learning IIa
- Basics
- Backpropagation Algorithm
- Momentum
- Summary
Adding Momentum
- The weight update rule can be modified so as to depend on the previous iteration. At iteration n we have the following (see the sketch below):
  ΔW_ji(n) = η δ_j x_ji + α ΔW_ji(n − 1)
- where α (0 < α < 1) is a constant called the momentum.
- It increases the speed along a local minimum.
- It increases the speed along flat regions.
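A sketch of the momentum-modified update for one weight matrix, reusing names from the backprop_step sketch; eta, alpha, and the example values are assumptions:

```python
import numpy as np

eta, alpha = 0.1, 0.9

delta_out = np.array([0.032])              # delta_j for the output node
hidden_b = np.array([0.6, 0.4, 0.7, 1.0])  # inputs x_ji to that node
W_out = np.zeros((1, 4))
prev_delta_W_out = np.zeros_like(W_out)    # Delta W_ji(n - 1)

# Delta W_ji(n) = eta * delta_j * x_ji + alpha * Delta W_ji(n - 1)
delta_W_out = eta * np.outer(delta_out, hidden_b) + alpha * prev_delta_W_out
W_out += delta_W_out
prev_delta_W_out = delta_W_out             # remembered for iteration n + 1
```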
Adding Momentum
[Figure: the error surface E(W) plotted over weight W, with a flat region labeled "Flat region: where do we go?"]
Remarks on Backpropagation
1. It implements a gradient descent search over the weight space.
2. It may become trapped in local minima.
3. In practice, it is very effective.
4. How to avoid local minima?
  - Add momentum.
  - Use stochastic gradient descent.
  - Train different networks with different initial values for the weights.
Representational Power
- Boolean functions: every Boolean function can be represented with a network having two layers of units.
- Continuous functions: every bounded continuous function can be approximated with a network having two layers of units.
- Arbitrary functions: any arbitrary function can be approximated with a network having three layers of units.
Connectionist Machine Learning IIa
- Basics
- Backpropagation Algorithm
- Momentum
- Summary
Summary
- In multilayer neural networks, the output of each node is a sigmoid or squashing function.
- In propagating error backward, intermediate nodes compute a weighted sum of the error factors of the output nodes.
- Momentum helps increase the speed along a local minimum and along flat regions.
- Any arbitrary function can be approximated with a network having three layers of units.