Title: Modular Neural Networks II
1. Modular Neural Networks II
Presented by David Brydon, Karl Martens, David Pereira
CPSC 533 - Artificial Intelligence, Winter 2000
Instructor: C. Jacob
Date: 16 March 2000
2. Presentation Agenda
- A Reiteration Of Modular Neural Networks
- Hybrid Neural Networks
- Maximum Entropy
- Counterpropagation Networks
- Spline Networks
- Radial Basis Functions
- Note: The information contained in this presentation has been obtained from Neural Networks: A Systematic Introduction by R. Rojas.
3. A Reiteration of Modular Neural Networks
There are many different types of neural networks - linear, recurrent, supervised, unsupervised, self-organizing, etc. Each of these neural networks has a different theoretical and practical approach. However, these different models can be combined. How? Each of the aforementioned neural networks can be transformed into a module that can be freely intermixed with modules of other types of neural networks. Thus, we have Modular Neural Networks.
4. A Reiteration of Modular Neural Networks
- But WHY do we have Modular Neural Network Systems?
- To Reduce Model Complexity
- To Incorporate Knowledge
- To Fuse Data and Predict Averages
- To Combine Techniques
- To Learn Different Tasks Simultaneously
- To Incrementally Increase Robustness
- To Emulate Its Biological Counterpart
5. Hybrid Neural Networks
- A very well-known and promising family of architectures was developed by Stephen Grossberg.
- It is called ART - Adaptive Resonance Theory.
- It is closer to the biological paradigm than feed-forward networks or standard associative memories.
- The dynamics of the network resemble learning in humans.
- One-shot learning can be recreated with this model.
- There are three different architectures in this family:
- ART-1: uses Boolean values
- ART-2: uses real values
- ART-3: uses differential equations
6. Hybrid Neural Networks
Each category in the input space is represented by a vector. The ART networks classify a stochastic series of vectors into clusters. All vectors located inside the cone around a weight vector are considered members of that specific cluster. Each unit fires only for vectors located inside its associated cone of radius r. The value of r is inversely proportional to the attention parameter of the unit: a large attention parameter (small r) gives a fine classification of the input space, while a small attention parameter (large r) gives a coarse classification.
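As an illustration, here is a minimal sketch (not from the slides) of such a cone-membership test for real-valued vectors, using the angle between the input and a unit's weight vector; the function name and the use of cosine similarity are assumptions made for this example:

import numpy as np

def in_attention_cone(x, w, r):
    """Return True if input x lies inside the cone of angular radius r
    (in radians) around weight vector w, i.e. the unit would fire."""
    cos_angle = np.dot(x, w) / (np.linalg.norm(x) * np.linalg.norm(w))
    angle = np.arccos(np.clip(cos_angle, -1.0, 1.0))
    return angle <= r

A large attention parameter corresponds to a small r (fine clusters), a small attention parameter to a large r (coarse clusters).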
7. Hybrid Neural Networks
Fig. 1. Vector clusters and attention
parameters
8. Hybrid Neural Networks
- Once the weight vectors have been found, the network computes whether new data can or cannot be classified by the existing clusters.
- If not, a new cluster is created with a new associated weight vector.
- ART networks have two major advantages:
- Plasticity: the network can always react to unknown inputs (by creating a new cluster with a new weight vector if the given input cannot be classified by existing clusters).
- Stability: existing clusters are not deleted by the introduction of new inputs (new clusters will just be created in addition to the old ones).
- However, enough potential weight vectors must be provided.
9. Hybrid Neural Networks
Fig. 2. The ART-1 Architecture
10. Hybrid Neural Networks
The Structure of ART-1 (Part 1 of 2)
There are two basic layers of computing units. Layer F1 receives binary input vectors from the input sites. As soon as an input vector arrives, it is passed to layer F1 and from there to layer F2. Layer F2 contains elements which fire according to the winner-takes-all method (only the element receiving the maximal scalar product of its weight vector and the input vector fires). When a unit in layer F2 has fired, the negative weight turns off the attention unit. Also, the winning unit in layer F2 sends back a 1 through the connections between layers F2 and F1. Now each unit in layer F1 receives as input the corresponding component of the input vector x and of the weight vector w.
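A minimal sketch (illustrative names, not from the slides) of the winner-takes-all selection in layer F2, which simply picks the unit whose weight vector has the maximal scalar product with the input:

import numpy as np

def f2_winner(W, x):
    """W holds one weight vector per F2 unit (as rows); x is the binary
    input vector. Returns the index of the unit with the maximal scalar
    product w . x, i.e. the only F2 unit that fires."""
    return int(np.argmax(W @ x))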
11. Hybrid Neural Networks
The Structure of ART-1 (Part 2 of 2)
The i-th F1 unit compares xi with wi and outputs the product xi·wi. The reset unit receives this information and also the components of x, weighted by p, the attention parameter, so that its own computation is

p·(x1 + x2 + ... + xn) - x·w <= 0,

which is the same as

(x·w) / (x1 + x2 + ... + xn) >= p.

The reset unit fires only if the input lies outside the attention cone of the winning unit. In that case a reset signal is sent to layer F2, but only the winning unit is inhibited. This in turn activates the attention unit and a new round of computation begins. Hence, there is resonance.
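A minimal sketch (not from the slides; names are illustrative) of this vigilance test for binary vectors:

import numpy as np

def reset_fires(x, w, p):
    """ART-1 reset test: the reset unit fires when the winning unit's
    weight vector w does not match input x well enough, i.e. when
    (x . w) / (x1 + ... + xn) < p."""
    return np.dot(x, w) < p * np.sum(x)

# Example: x = [1, 0, 1, 1], winning w = [1, 0, 0, 1], p = 0.8:
# x . w = 2, sum(x) = 3 and 2/3 < 0.8, so the reset unit fires and the
# search continues with the winning unit inhibited.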
12. Hybrid Neural Networks
The Structure of ART-1 (Some Final Details)
The weight vectors in layer F2 are initialized with all components equal to 1, and p is selected to satisfy 0 < p < 1. This ensures that eventually an unused vector will be recruited to represent a new cluster. The selected weight vector w is updated by pulling it in the direction of x. This is done in ART-1 by turning off all components in w which are zero in x. The purpose of the reset signal is to inhibit all units that do not resonate with the input. A unit in layer F2 which is still unused can then be selected for the new cluster containing x. In this way, sufficiently different input data can create a new cluster. By modifying the value of the attention parameter p, we can control the number of clusters and how wide they are.
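A minimal sketch (illustrative, not the slides' notation) of this ART-1 weight update, which keeps only the components of w that are also 1 in x:

import numpy as np

def update_weights(w, x):
    """ART-1 update: turn off every component of w that is zero in x.
    For binary vectors this is a component-wise AND, pulling w toward x."""
    return np.minimum(w, x)

# Weight vectors start as all ones, so a still-unused unit matches any
# input and can be recruited for a new cluster, e.g.:
# update_weights(np.ones(4), np.array([1, 0, 1, 1])) -> [1, 0, 1, 1]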
13. Hybrid Neural Networks
The Structure of ART-2 and ART-3
ART-2 uses vectors that have real-valued components instead of Boolean components. The dynamics of the ART-2 and ART-3 models are governed by differential equations. However, computer simulations consume too much time. Consequently, implementations using analog hardware or a combination of optical and electronic elements are more suited to this kind of model.
14. Hybrid Neural Networks
Maximum entropy
So what's the problem with ART? It tries to build clusters of the same size, independently of the distribution of the data. So, is there a better solution? Yes: allow the clusters to have varying radii, with a technique called the Maximum Entropy Method.
What is entropy? The entropy H of a data set of N points assigned to k different clusters c1, c2, ..., ck is given by

H = -( p(c1)·log p(c1) + p(c2)·log p(c2) + ... + p(ck)·log p(ck) )

where p(ci) denotes the probability of hitting the i-th cluster when an element of the data set is picked at random. Since the probabilities add up to 1, the clustering that maximizes the entropy is the one for which all cluster probabilities are identical. This means that the clusters will tend to cover the same number of points.
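A minimal sketch (illustrative names) of computing this entropy from cluster membership counts:

import numpy as np

def clustering_entropy(counts):
    """counts[i] = number of data points assigned to cluster ci.
    Returns H = -sum p(ci) * log p(ci), which is maximal when all the
    p(ci) are equal."""
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    p = p[p > 0]  # treat 0 * log 0 as 0
    return float(-np.sum(p * np.log(p)))

# Equal-sized clusters maximize H: clustering_entropy([5, 5, 5, 5])
# is larger than clustering_entropy([17, 1, 1, 1]).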
15. Hybrid Neural Networks
Maximum entropy
However, there is still a problem whenever the number of elements of each class in the data set is different. Consider the case of unlabeled speech data: some phonemes are more frequent than others, and if a maximum entropy method is used, the boundaries between clusters will deviate from the natural solution and classify some data erroneously. So how do we solve this problem? With the Bootstrapped Iterative Algorithm:
- cluster: Compute a maximum entropy clustering with the training data. Label the original data according to this clustering.
- select: Build a new training set by selecting from each class the same number of points (random selection with replacement). Go to the previous step.
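A minimal sketch of this bootstrapped loop, assuming some routine max_entropy_clustering() is available; its name and interface, and the other helper names, are assumptions made for this example:

import numpy as np

def bootstrap_iterate(data, max_entropy_clustering, n_iters=10):
    """Alternate between clustering and resampling a class-balanced
    training set (random selection with replacement)."""
    training = data
    labels = None
    for _ in range(n_iters):
        # cluster: maximum entropy clustering of the current training set,
        # then label the ORIGINAL data according to it
        model = max_entropy_clustering(training)
        labels = model.predict(data)
        # select: same number of points from each class, with replacement
        classes = np.unique(labels)
        n = min(np.sum(labels == c) for c in classes)
        parts = [np.random.choice(np.where(labels == c)[0], size=n, replace=True)
                 for c in classes]
        training = data[np.concatenate(parts)]
    return labels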
16. Hybrid Neural Networks
Counterpropagation network
Are there any other hybrid network models? Yes, the counterpropagation network, as proposed by Hecht-Nielsen. So what are counterpropagation networks designed for? To approximate a continuous mapping f and its inverse f^-1. A counterpropagation network consists of an n-dimensional input vector which is fed to a hidden layer consisting of h cluster units. The output is generated by a single linear associator unit. The weights in the network are adjusted using supervised learning. The above network can successfully approximate functions of the form f: Rn -> R.
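A minimal sketch of the forward pass of this simplified, single-output network (names are illustrative): only the winning cluster unit fires, so the linear associator output reduces to that unit's weight zi:

import numpy as np

def counterprop_forward(x, W, z):
    """W: h cluster weight vectors (rows); z: one output weight per
    cluster unit. The cluster unit closest to x fires alone, and the
    single linear associator outputs its weight z[winner]."""
    winner = int(np.argmin(np.linalg.norm(W - x, axis=1)))
    return float(z[winner])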
17. Hybrid Neural Networks
Fig. 3. Simplified counterpropagation network
18. Hybrid Neural Networks
- Counterpropagation network
- The training phase is completed in two parts:
- Training of the hidden layer into a clustering of the input space that corresponds to an n-dimensional Voronoi tiling. The hidden layer's output needs to be controlled so that only the element with the highest activation fires.
- The zi weights are then adjusted to represent the value of the approximation for the cluster region (see the sketch after this list).
- This network can be extended to handle multiple output units.
19. Hybrid Neural Networks
- Fig. 4 Function approximation with a
counterpropagation network.
20. Hybrid Neural Networks
- Spline networks
- Can the approximation created by a counterpropagation network be improved on? Yes.
- In the counterpropagation network the Voronoi tiling is composed of a series of horizontal tiles, each of which represents an average of the function in that region.
- The spline network solves this problem by extending the hidden layer in the counterpropagation network. Each unit is paired with a linear associator; the cluster unit is used to inhibit or activate the linear associator, which is connected to all inputs (see the sketch after this list).
- This modification allows the resulting set of tiles to be oriented differently with respect to each other, creating an approximation with a smaller quadratic error and a better solution to the problem.
- Training proceeds as before, except that the newly added linear associators are trained using backpropagation.
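A minimal sketch of evaluating such a network (illustrative names): the winning cluster unit activates its paired linear associator, which computes a local linear fit over all inputs, giving a tilted rather than horizontal tile:

import numpy as np

def spline_forward(x, W, A, b):
    """W: cluster weight vectors (rows); A, b: weight vector and bias of
    the linear associator paired with each cluster unit. The winner's
    associator alone produces the output."""
    winner = int(np.argmin(np.linalg.norm(W - x, axis=1)))
    return float(A[winner] @ x + b[winner])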
21. Hybrid Neural Networks
- Fig. 5 Function approximation with linear
associators
22. Hybrid Neural Networks
- Radial basis functions
- Has a similar structure to that of the counterpropagation network. The difference is that the activation function used for each unit is Gaussian instead of sigmoidal (see the sketch below).
- The Gaussian approach uses locally concentrated functions.
- The sigmoidal approach uses a smooth step function.
- Which is better depends on the specific problem at hand. If the function to be approximated is a smooth step, the Gaussian approach requires more units, whereas if the function is Gaussian-like, the sigmoidal approach requires more units.
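A minimal sketch (illustrative names) of a radial basis function network output, where each hidden unit responds with a Gaussian of the distance between the input and its centre:

import numpy as np

def rbf_forward(x, centers, widths, z):
    """Output = weighted sum of Gaussian activations, one per hidden
    unit. Each unit is locally concentrated around its centre."""
    d2 = np.sum((centers - x) ** 2, axis=1)
    activations = np.exp(-d2 / (2.0 * widths ** 2))
    return float(activations @ z)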