Title: Artificial Neural Network Paradigms
1. Artificial Neural Network Paradigms
Marc Pomplun
Department of Computer Science
University of Massachusetts at Boston
E-mail: marc_at_cs.umb.edu
Homepage: http://www.cs.umb.edu/marc/
2. Artificial Neural Network Paradigms
- Overview
- The Backpropagation Network (BPN)
- Supervised Learning in the BPN
- The Self-Organizing Map (SOM)
- Unsupervised Learning in the SOM
- Instantaneous Learning: The Hopfield Network
3. The Backpropagation Network
- The backpropagation network (BPN) is the most popular type of ANN for applications such as classification or function approximation.
- Like other networks using supervised learning, the BPN is not biologically plausible.
- The structure of the network is identical to the one we discussed before:
  - Three (sometimes more) layers of neurons,
  - Only feedforward processing: input layer → hidden layer → output layer,
  - Sigmoid activation functions.
4. The Backpropagation Network
- BPN units and activation functions
5. Supervised Learning in the BPN
- Before the learning process starts, all weights (synapses) in the network are initialized with pseudorandom numbers.
- We also have to provide a set of training patterns (exemplars). They can be described as a set of ordered vector pairs (x1, y1), (x2, y2), ..., (xP, yP).
- Then we can start the backpropagation learning algorithm.
- This algorithm iteratively minimizes the network's error by finding the gradient of the error surface in weight-space and adjusting the weights in the opposite direction (gradient-descent technique).
6. Supervised Learning in the BPN
- Gradient-descent example: finding the absolute minimum of a one-dimensional error function f(x). Starting from some value x0, apply the update shown below and repeat it iteratively until, for some xi, f(xi) is sufficiently close to 0.
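A standard form of this gradient-descent step, assuming a learning rate η > 0:

    x_{i+1} = x_i - \eta \, f'(x_i)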
7. Supervised Learning in the BPN
- Gradients of two-dimensional functions: The two-dimensional function in the left diagram is represented by contour lines in the right diagram, where arrows indicate the gradient of the function at different locations. Obviously, the gradient always points in the direction of the steepest increase of the function. In order to find the function's minimum, we should always move against the gradient.
8. Supervised Learning in the BPN
- In the BPN, learning is performed as follows:
1. Randomly select a vector pair (xp, yp) from the training set and call it (x, y).
2. Use x as input to the BPN and successively compute the outputs of all neurons in the network (bottom-up) until you get the network output o.
3. Compute the error δ^o_pk for the pattern p across all K output-layer units by using the formula below.
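A standard form of this error term, where y_pk is the desired and o_pk the actual output of output unit k for pattern p, and f' is the derivative of the sigmoid activation function:

    \delta^{o}_{pk} = (y_{pk} - o_{pk})\, f'(net^{o}_{pk})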
9. Supervised Learning in the BPN
4. Compute the error δ^h_pj for all J hidden-layer units by using the formula below.
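A standard form of this term, assuming w_kj denotes the weight from hidden unit j to output unit k:

    \delta^{h}_{pj} = f'(net^{h}_{pj}) \sum_{k=1}^{K} \delta^{o}_{pk}\, w_{kj}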
5. Update the connection-weight values to the hidden layer by using the equation below.
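A standard form of this update, with learning rate η and x_pi the i-th component of the current input pattern:

    w^{h}_{ji}(t+1) = w^{h}_{ji}(t) + \eta\, \delta^{h}_{pj}\, x_{pi}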
10. Supervised Learning in the BPN
6. Update the connection-weight values to the output layer by using the equation below.
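A standard form of this update, where i_pj denotes the output of hidden unit j for pattern p:

    w^{o}_{kj}(t+1) = w^{o}_{kj}(t) + \eta\, \delta^{o}_{pk}\, i_{pj}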
- Repeat steps 1 to 6 for all vector pairs in the training set; this is called a training epoch.
- Run as many epochs as required to reduce the network error E below a threshold ε.
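One common definition of E is the summed squared error over all P training patterns and K output units:

    E = \frac{1}{2} \sum_{p=1}^{P} \sum_{k=1}^{K} (y_{pk} - o_{pk})^2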
11. Supervised Learning in the BPN
The only thing that we need to know before we can start training our network is the derivative of our sigmoid function, for example, f'(net_k) for the output neurons:
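For the logistic sigmoid f(net) = 1 / (1 + e^{-net}), this derivative takes the convenient form

    f'(net_k) = f(net_k)\,(1 - f(net_k)) = o_k\,(1 - o_k)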
12. Supervised Learning in the BPN
- Now our BPN is ready to go!
- If we choose the type and number of neurons in our network appropriately, after training the network should show the following behavior:
  - If we input any of the training vectors, the network should yield the expected output vector (with some margin of error).
  - If we input a vector that the network has never seen before, it should be able to generalize and yield a plausible output vector based on its knowledge about similar input vectors.
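A minimal NumPy sketch of the training procedure described in steps 1 to 6, assuming one hidden layer of logistic sigmoid units and bias weights (an addition not shown on the slides, but commonly used); the function name, layer sizes, learning rate, and the XOR toy data are illustrative choices:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_bpn(X, Y, n_hidden=4, eta=0.5, epochs=5000, eps=1e-3, seed=0):
    """Train a three-layer BPN on input patterns X (P x N) and targets Y (P x K)."""
    rng = np.random.default_rng(seed)
    P, n_in = X.shape
    n_out = Y.shape[1]
    # Initialize all weights with small pseudorandom numbers; the extra column is a bias weight.
    W_h = rng.uniform(-0.5, 0.5, (n_hidden, n_in + 1))    # input -> hidden
    W_o = rng.uniform(-0.5, 0.5, (n_out, n_hidden + 1))   # hidden -> output
    for epoch in range(epochs):
        E = 0.0
        for p in rng.permutation(P):                       # step 1: pick a training pair
            x = np.append(X[p], 1.0)                       # append constant bias input
            y = Y[p]
            h = np.append(sigmoid(W_h @ x), 1.0)           # step 2: forward pass (hidden outputs + bias)
            o = sigmoid(W_o @ h)
            delta_o = (y - o) * o * (1 - o)                # step 3: output-layer error
            delta_h = (h * (1 - h) * (W_o.T @ delta_o))[:-1]  # step 4: hidden-layer error (drop bias term)
            W_h += eta * np.outer(delta_h, x)              # step 5: update hidden-layer weights
            W_o += eta * np.outer(delta_o, h)              # step 6: update output-layer weights
            E += 0.5 * np.sum((y - o) ** 2)
        if E < eps:                                        # stop once the epoch error is small enough
            break
    return W_h, W_o

# Toy usage: learn XOR, then check the network's outputs on the training vectors.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
W_h, W_o = train_bpn(X, Y, eta=1.0, epochs=20000)
for x, y in zip(X, Y):
    xb = np.append(x, 1.0)
    o = sigmoid(W_o @ np.append(sigmoid(W_h @ xb), 1.0))
    print(x, "->", o.round(2), "target", y)
```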
13. Self-Organizing Maps (Kohonen Maps)
In the BPN, we used supervised learning. This is not biologically plausible: in a biological system, there is no external teacher who manipulates the network's weights from outside the network. Unsupervised learning is biologically more adequate. We will study Self-Organizing Maps (SOMs) as examples of unsupervised learning (Kohonen, 1980).
14. Self-Organizing Maps (Kohonen Maps)
In the human cortex, multi-dimensional sensory
input spaces (e.g., visual input, tactile input)
are represented by two-dimensional maps. The
projection from sensory inputs onto such maps is
topology conserving. This means that neighboring
areas in these maps represent neighboring areas
in the sensory input space. For example,
neighboring areas in the sensory cortex are
responsible for the arm and hand regions.
15. Self-Organizing Maps (Kohonen Maps)
- Such a topology-conserving mapping can be achieved by SOMs:
  - Two layers: input layer and output (map) layer
  - Input and output layers are completely connected.
  - Output neurons are interconnected within a defined neighborhood.
  - A topology (neighborhood relation) is defined on the output layer.
16. Self-Organizing Maps (Kohonen Maps)
Common output-layer structures:
- One-dimensional (completely interconnected)
- Two-dimensional (connections omitted, only neighborhood relations shown in green)
17. Self-Organizing Maps (Kohonen Maps)
A neighborhood function φ(i, k) indicates how closely neurons i and k in the output layer are connected to each other. Usually, a Gaussian function of the distance between the two neurons in the layer is used:
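One common choice, where d(i, k) is the distance between neurons i and k in the output layer and σ controls the width of the neighborhood:

    \varphi(i, k) = \exp\!\left(-\frac{d(i, k)^2}{2\sigma^2}\right)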
18. Unsupervised Learning in SOMs
For an n-dimensional input space and m output neurons:
(1) Choose a random weight vector wi for each neuron i, i = 1, ..., m.
(2) Choose a random input x.
(3) Determine the winner neuron k: ||wk − x|| = min_i ||wi − x|| (Euclidean distance).
(4) Update the weight vectors of all neurons i in the neighborhood of neuron k: wi := wi + η·φ(i, k)·(x − wi) (wi is shifted towards x).
(5) If the convergence criterion is met, STOP. Otherwise, narrow the neighborhood function φ and the learning parameter η and go to (2).
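A minimal NumPy sketch of steps (1) to (5), assuming a one-dimensional chain of m output neurons, a Gaussian neighborhood, and a simple linear schedule for narrowing η and σ; the function name, parameters, and the triangular toy data are illustrative choices:

```python
import numpy as np

def train_som(data, m=20, eta0=0.5, sigma0=5.0, steps=5000, seed=0):
    """Self-organize m map neurons (1-D chain) onto n-dimensional input data."""
    rng = np.random.default_rng(seed)
    n = data.shape[1]
    # (1) choose random weight vectors w_i for all neurons
    W = rng.uniform(data.min(0), data.max(0), (m, n))
    positions = np.arange(m)                           # neuron positions in the output layer
    for t in range(steps):
        frac = t / steps
        eta = eta0 * (1.0 - frac)                      # shrink the learning parameter over time
        sigma = sigma0 * (1.0 - frac) + 0.5            # narrow the neighborhood over time
        x = data[rng.integers(len(data))]              # (2) choose a random input
        k = np.argmin(np.linalg.norm(W - x, axis=1))   # (3) winner neuron (Euclidean distance)
        phi = np.exp(-((positions - k) ** 2) / (2 * sigma ** 2))  # Gaussian neighborhood
        W += eta * phi[:, None] * (x - W)              # (4) shift neighbors towards x
    return W                                           # (5) here we simply stop after a fixed number of steps

# Toy usage: a 1-D chain learning a 2-D triangular input space.
rng = np.random.default_rng(1)
pts = rng.uniform(0, 1, (10000, 2))
tri = pts[pts[:, 1] <= pts[:, 0]]                      # keep points below the diagonal (a triangle)
print(train_som(tri).round(2))
```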
19. Unsupervised Learning in SOMs
Example I: Learning a one-dimensional representation of a two-dimensional (triangular) input space
20. Unsupervised Learning in SOMs
Example II: Learning a two-dimensional representation of a two-dimensional (square) input space
21. Unsupervised Learning in SOMs
Example III: Learning a two-dimensional mapping of texture images
22. The Hopfield Network
- The Hopfield model is a single-layered recurrent network.
- It is usually initialized with appropriate weights instead of being trained.
- The network structure looks as follows: [figure: a single layer of units X1, X2, ..., XN with recurrent connections]
23. The Hopfield Network
- We will focus on the discrete Hopfield model, because its mathematical description is more straightforward.
- In the discrete model, the output of each neuron is either 1 or −1.
- In its simplest form, the output function is the sign function, which yields 1 for arguments ≥ 0 and −1 otherwise.
24. The Hopfield Network
- For input-output pairs (x1, y1), (x2, y2), ..., (xP, yP), we can initialize the weights in the following way (like an associative memory):
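One standard (Hebbian) way to write such an initialization in matrix form:

    W = \sum_{p=1}^{P} y_p\, x_p^{T}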
This is identical to a component-wise formula, where xp(j) is the j-th component of vector xp, and yp(i) is the i-th component of vector yp.
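A standard component-wise form of the same initialization:

    w_{ij} = \sum_{p=1}^{P} y_p(i)\, x_p(j)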
25. The Hopfield Network
- In the discrete version of the model, each component of an input or output vector can only assume the values 1 or −1.
- The output of a neuron i at time t is then computed according to the following formula:
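In the usual notation, with sgn the sign function defined earlier:

    x_i(t+1) = \mathrm{sgn}\!\left(\sum_{j=1}^{N} w_{ij}\, x_j(t)\right)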
This recursion can be performed over and over
again. In some network variants, external input
is added to the internal, recurrent one.
26. The Hopfield Network
- Usually, the vectors xp are not orthonormal, so it is not guaranteed that whenever we input some pattern xp, the output will be exactly yp; however, the output will be a pattern similar to yp.
- Since the Hopfield network is recurrent, its behavior depends on its previous state and in the general case is difficult to predict.
- However, what happens if we initialize the weights with a set of patterns so that each pattern is associated with itself, (x1, x1), (x2, x2), ..., (xP, xP)?
27. The Hopfield Network
- This initialization is performed according to the following equation:
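A standard way to write this auto-associative case (the rule above with yp = xp):

    w_{ij} = \sum_{p=1}^{P} x_p(i)\, x_p(j)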
You see that the weight matrix is symmetrical, i.e., wij = wji. We also demand that wii = 0, in which case the network shows an interesting behavior: it can be mathematically proven that under these conditions the network will reach a stable activation state within a finite number of iterations.
28. The Hopfield Network
- And what does such a stable state look like?
- The network associates input patterns with themselves, which means that in each iteration, the activation pattern will be drawn towards one of those patterns.
- After converging, the network will most likely present one of the patterns that it was initialized with.
- Therefore, Hopfield networks can be used to restore incomplete or noisy input patterns.
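A minimal NumPy sketch of this pattern-restoration use of the network, assuming bipolar (1/−1) patterns, the auto-associative initialization above with wii = 0, and asynchronous (one-neuron-at-a-time) sign-function updates; names and sizes are illustrative:

```python
import numpy as np

def hopfield_weights(patterns):
    """Initialize weights so that each bipolar pattern is associated with itself."""
    W = sum(np.outer(x, x) for x in patterns).astype(float)
    np.fill_diagonal(W, 0.0)                 # demand w_ii = 0
    return W

def recall(W, x, sweeps=20):
    """Iterate the sign-function update, one neuron at a time, until the state is stable."""
    x = x.copy()
    for _ in range(sweeps):
        changed = False
        for i in range(len(x)):
            new = 1 if W[i] @ x >= 0 else -1  # sign function with sgn(0) = 1
            if new != x[i]:
                x[i] = new
                changed = True
        if not changed:                       # stable activation state reached
            break
    return x

# Toy usage: store two random 100-unit patterns and restore a noisy version of the first.
rng = np.random.default_rng(0)
patterns = np.where(rng.random((2, 100)) < 0.5, 1, -1)
W = hopfield_weights(patterns)
noisy = patterns[0] * np.where(rng.random(100) < 0.2, -1, 1)   # flip roughly 20% of the units
restored = recall(W, noisy)
print("bits recovered:", int(np.sum(restored == patterns[0])), "/ 100")
```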
29. The Hopfield Network
- Example: Image reconstruction (Ritter, Schulten & Martinetz, 1990)
- A 20×20 discrete Hopfield network was trained with 20 input patterns, including the one shown in the left figure and 19 random patterns like the one on the right.
30. The Hopfield Network
- After providing only one fourth of the face
image as initial input, the network is able to
perfectly reconstruct that image within only two
iterations.
31. The Hopfield Network
- Adding noise by changing each pixel with a probability p = 0.3 does not impair the network's performance.
- After two steps the image is perfectly reconstructed.
32. The Hopfield Network
- However, for noise created with p = 0.4, the network is unable to restore the original image.
- Instead, it converges to one of the 19 random patterns.
33. The Hopfield Network
- The Hopfield model constitutes an interesting neural approach to identifying partially occluded objects and objects in noisy images.
- These are among the toughest problems in computer vision.