1
Modular Neural Networks II
Presented by David Brydon, Karl Martens, David Pereira
CPSC 533 - Artificial Intelligence, Winter 2000
Instructor: C. Jacob
Date: 16-March-2000
2
Presentation Agenda
  • A Reiteration Of Modular Neural Networks
  • Hybrid Neural Networks
  • Maximum Entropy
  • Counterpropagation Networks
  • Spline Networks
  • Radial Basis Functions
  • Note: The information contained in this
    presentation has been obtained from Neural
    Networks: A Systematic Introduction by R. Rojas.

3
A Reiteration of Modular Neural Networks
There are many different types of neural networks -
linear, recurrent, supervised, unsupervised,
self-organizing, etc. Each of these networks takes a
different theoretical and practical approach. However,
the different models can be combined. How? Each of the
aforementioned neural networks can be transformed into
a module that can be freely intermixed with modules of
other types of neural networks. Thus, we have Modular
Neural Networks.
4
A Reiteration of Modular Neural Networks
  • But WHY do we have Modular Neural Network Systems?
  • To Reduce Model Complexity
  • To Incorporate Knowledge
  • To Fuse Data and Predict Averages
  • To Combine Techniques
  • To Learn Different Tasks Simultaneously
  • To Incrementally Increase Robustness
  • To Emulate Its Biological Counterpart

5
Hybrid Neural Networks
  • A very well-known and promising family of
    architectures was developed by Stephen Grossberg.
  • It is called ART - Adaptive Resonance Theory.
  • It is closer to the biological paradigm than
    feed-forward networks or standard associative
    memories.
  • The dynamics of these networks resemble learning
    in humans.
  • One-shot learning can be recreated with this
    model.
  • There are three different architectures in this
    family:
  • ART-1: uses Boolean values
  • ART-2: uses real values
  • ART-3: uses differential equations

6
Hybrid Neural Networks
Each category in the input space is represented by a
vector. The ART networks classify a stochastic sequence
of vectors into clusters. All vectors located inside
the cone around a weight vector are considered members
of that cluster. Each unit fires only for vectors
located inside its associated 'cone' of radius r. The
value r is inversely proportional to the 'attention
parameter' of the unit, so a small r means the
classification of the input space is fine, while a
large r means it is coarse.
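
As a rough illustration (not from the slides; the
angular reading of r and the function name are our
assumptions), the cone membership test can be written
in a few lines:

    import numpy as np

    def in_cone(x, w, r):
        # True when input vector x lies inside the 'cone' of
        # radius r around weight vector w, reading r as the
        # maximum allowed angle between x and w
        cos_angle = np.dot(x, w) / (np.linalg.norm(x) * np.linalg.norm(w))
        return np.arccos(np.clip(cos_angle, -1.0, 1.0)) <= r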
7
Hybrid Neural Networks
Fig. 1. Vector clusters and attention
parameters
8
Hybrid Neural Networks
  • Once the weight vectors have been found, the
    network computes whether new data can or cannot
    be classified by the existing clusters.
  • If not, a new cluster is created with a new
    associated weight vector.
  • ART networks have two major advantages:
  • Plasticity: the network can always react to
    unknown inputs (by creating a new cluster with a
    new weight vector, if the given input cannot be
    classified by existing clusters).
  • Stability: existing clusters are not deleted by
    the introduction of new inputs (new clusters will
    just be created in addition to the old ones).
  • However, enough potential weight vectors must be
    provided.

9
Hybrid Neural Networks
Fig. 2. The ART-1 Architecture
10
Hybrid Neural Networks
The Structure of ART-1 (Part 1 of 2)
There are two basic layers of computing units. Layer F1
receives binary input vectors from the input sites. As
soon as an input vector arrives, it is passed to layer
F1 and from there to layer F2. Layer F2 contains
elements which fire according to the winner-takes-all
method (only the element receiving the maximal scalar
product of its weight vector and the input vector
fires). When a unit in layer F2 has fired, the negative
weight turns off the attention unit. Also, the winning
unit in layer F2 sends back a 1 through the connections
between layers F2 and F1. Now each unit in layer F1
receives as input the corresponding component of the
input vector x and of the weight vector w.
11
Hybrid Neural Networks
The Structure of ART-1 (Part 2 of 2)
The i-th F1 unit compares x_i with w_i and outputs the
product x_i w_i. The reset unit receives this
information and also the components of x, weighted by
the attention parameter ρ, so that its own computation
is

    ρ(x_1 + x_2 + ... + x_n) - x·w > 0

which is the same as

    (x·w) / (x_1 + x_2 + ... + x_n) < ρ

The reset unit fires only if the input lies outside the
attention cone of the winning unit. A reset signal is
sent to layer F2, but only the winning unit is
inhibited. This in turn activates the attention unit
and a new round of computation begins. Hence, there is
resonance.
12
Hybrid Neural Networks
The Structure of ART-1 (Some Final Details)
The weight vectors in layer F2 are initialized with all
components equal to 1, and ρ is selected to satisfy
0 < ρ < 1. This ensures that eventually an unused
vector will be recruited to represent a new cluster.
The selected weight vector w is updated by pulling it
in the direction of x. This is done in ART-1 by turning
off all components of w which are zero in x. The
purpose of the reset signal is to inhibit all units
that do not resonate with the input. A unit in layer F2
which is still unused can then be selected for the new
cluster containing x. In this way, sufficiently
different input data can create a new cluster. By
modifying the value of the attention parameter ρ, we
can control the number of clusters and how wide they
are.
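
Putting the last three slides together, one ART-1
presentation might be sketched as follows (a minimal
sketch, assuming a binary input x with at least one 1;
the function and variable names are ours):

    import numpy as np

    def art1_step(x, W, rho):
        # x: binary input vector; W: list of weight vectors,
        # initialized with all components equal to 1;
        # rho: attention parameter, 0 < rho < 1
        order = sorted(range(len(W)), key=lambda j: -np.dot(W[j], x))
        for j in order:  # winner-takes-all, retrying after each reset
            # resonance test: (x.w) / (x_1 + ... + x_n) >= rho
            if np.dot(W[j], x) / np.sum(x) >= rho:
                W[j] = W[j] * x  # pull w toward x: zero components that are 0 in x
                return j         # resonance: x joins cluster j
            # otherwise the reset signal inhibits this winner; try the next
        return None  # unreachable while an unused (all-ones) vector remains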
13
Hybrid Neural Networks
The Structure of ART-2 and ART-3
ART-2 uses vectors that have real-valued components
instead of Boolean components. The dynamics of the
ART-2 and ART-3 models are governed by differential
equations. However, computer simulations consume too
much time. Consequently, implementations using analog
hardware or a combination of optical and electronic
elements are more suited to this kind of model.
14
Hybrid Neural Networks
Maximum entropy
So what's the problem with ART? It tries to build
clusters of the same size, independently of the
distribution of the data. So, is there a better
solution? Yes: allow the clusters to have varying
radii, with a technique called the Maximum Entropy
Method. What is entropy? The entropy H of a data set of
N points assigned to k different clusters c_1, c_2,
..., c_k is given by

    H = -p(c_1) log p(c_1) - p(c_2) log p(c_2) - ... - p(c_k) log p(c_k)

where p(c_i) denotes the probability of hitting the
i-th cluster when an element of the data set is picked
at random. Since the probabilities add up to 1, the
clustering that maximizes the entropy is the one for
which all cluster probabilities are identical. This
means that the clusters will tend to cover the same
number of points.
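
The entropy can be computed directly from the cluster
occupancy counts, and the equal-split case gives the
maximum, as claimed above (a small sketch; the names
are ours):

    import numpy as np

    def clustering_entropy(counts):
        # counts[i] = number of data points assigned to cluster c_i (> 0)
        p = np.asarray(counts, dtype=float) / np.sum(counts)  # p(c_i)
        return -np.sum(p * np.log(p))  # H = -sum_i p(c_i) log p(c_i)

    clustering_entropy([50, 50])  # log 2 ~ 0.693: equal clusters, maximal H
    clustering_entropy([90, 10])  # ~ 0.325: unequal clusters, lower H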
15
Hybrid Neural Networks
Maximum entropy
However, there is still a problem whenever the number
of elements of each class in the data set is different.
Consider the case of unlabeled speech data: some
phonemes are more frequent than others, and if a
maximum entropy method is used, the boundaries between
clusters will deviate from the natural solution and
classify some data erroneously. So how do we solve this
problem? With the Bootstrapped Iterative Algorithm:
cluster: Compute a maximum entropy clustering with the
training data. Label the original data according to
this clustering.
select: Build a new training set by selecting from each
class the same number of points (random selection with
replacement). Go to the previous step.
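
The cluster/select loop can be read in code as follows
(a sketch under assumptions: max_entropy_cluster is a
user-supplied routine returning a model with a .label()
method, every cluster ends up non-empty, and the data
is a NumPy array; none of this is fixed by the slides):

    import numpy as np

    rng = np.random.default_rng(0)

    def bootstrapped_clustering(data, k, rounds, max_entropy_cluster):
        train = data
        for _ in range(rounds):
            # cluster: maximum entropy clustering of the training
            # set, then label the ORIGINAL data according to it
            model = max_entropy_cluster(train, k)
            labels = model.label(data)
            # select: the same number of points from each class,
            # chosen at random with replacement
            n = min(int(np.sum(labels == c)) for c in range(k))
            train = np.concatenate([rng.choice(data[labels == c], size=n)
                                    for c in range(k)])
        return model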
16
Hybrid Neural Networks
Counterpropagation networks
Are there any other hybrid network models? Yes: the
counterpropagation network, as proposed by
Hecht-Nielsen. So what are counterpropagation networks
designed for? To approximate a continuous mapping f and
its inverse f^(-1). A counterpropagation network
consists of an n-dimensional input vector which is fed
to a hidden layer consisting of h cluster vectors. The
output is generated by a single linear associator unit.
The weights in the network are adjusted using
supervised learning. The above network can successfully
approximate functions of the form f: R^n -> R.
17
Hybrid Neural Networks
Fig. 3. Simplified counterpropagation network
18
Hybrid Neural Networks
  • Counterpropagation network
  • The training phase is completed in two parts (see
    the sketch after this list):
  • Training of the hidden layer into a clustering of
    the input space that corresponds to an
    n-dimensional Voronoi tiling. The hidden layer's
    output needs to be controlled so that only the
    element with the highest activation fires.
  • The z_i weights are then adjusted to represent
    the value of the approximation for the cluster
    region.
  • This network can be extended to handle multiple
    output units.
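
The two parts might be sketched as follows (the
Euclidean winner rule, learning rate, and epoch count
are our assumptions; the slides fix only the two-phase
scheme):

    import numpy as np

    def train_counterprop(X, y, centers, epochs=20, lr=0.1):
        # X: inputs (m x n), y: target values f(x),
        # centers: h initial cluster vectors (h x n, float)
        # Part 1: winner-takes-all clustering of the input space,
        # which induces an n-dimensional Voronoi tiling
        for _ in range(epochs):
            for x in X:
                j = np.argmin(np.linalg.norm(centers - x, axis=1))
                centers[j] += lr * (x - centers[j])  # only the winner learns
        # Part 2: set each z_j to the average of the target values
        # over cluster j's Voronoi region
        win = np.argmin(np.linalg.norm(X[:, None] - centers, axis=2), axis=1)
        z = np.array([y[win == j].mean() if np.any(win == j) else 0.0
                      for j in range(len(centers))])
        return centers, z

    def counterprop_predict(x, centers, z):
        # the single linear associator outputs the winning
        # cluster's z-weight
        return z[np.argmin(np.linalg.norm(centers - x, axis=1))]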

19
Hybrid Neural Networks
  • Fig. 4 Function approximation with a
    counterpropagation network.

20
Hybrid Neural Networks
  • Spline networks
  • Can the approximation created by a
    counterpropagation network be improved upon? Yes.
  • In the counterpropagation network the Voronoi
    tiling is composed of a series of horizontal
    tiles, each of which represents an average of the
    function in that region.
  • The spline network solves this problem by
    extending the hidden layer of the
    counterpropagation network. Each cluster unit is
    paired with a linear associator; the cluster unit
    is used to inhibit or activate the linear
    associator, which is connected to all inputs (see
    the sketch after this list).
  • This modification allows the resulting set of
    tiles to be oriented differently with respect to
    each other, creating an approximation with a
    smaller quadratic error and a better solution to
    the problem.
  • Training proceeds as before, except the newly
    added linear associators are trained using
    backpropagation.
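
Under our reading of the slide (names illustrative),
the winning cluster unit simply selects which linear
associator is active, so each Voronoi tile gets its own
affine fit instead of a constant:

    import numpy as np

    def spline_net_predict(x, centers, A, b):
        # The winning cluster unit activates its paired linear
        # associator, which is connected to all inputs and
        # computes an affine function for that tile.
        j = np.argmin(np.linalg.norm(centers - x, axis=1))
        return np.dot(A[j], x) + b[j]  # tile j's local linear fit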

21
Hybrid Neural Networks
  • Fig. 5 Function approximation with linear
    associators

22
Hybrid Neural Networks
  • Radial basis functions
  • These networks have a structure similar to that
    of the counterpropagation network. The difference
    is that the activation function used for each
    unit is Gaussian instead of sigmoidal.
  • The Gaussian approach uses locally concentrated
    functions.
  • The sigmoidal approach uses a smooth step
    function.
  • Which is better depends on the specific problem
    at hand: if the target function looks like a
    smooth step, the Gaussian approach will require
    more units, whereas if it looks like a Gaussian,
    the sigmoidal approach will require more units
    (see the sketch below).
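
The contrast between the two unit types can be made
concrete with their activation functions (a sketch;
parameter names are ours):

    import numpy as np

    def gaussian_unit(x, c, sigma=1.0):
        # locally concentrated: responds only near the center c
        return np.exp(-np.sum((x - c) ** 2) / (2.0 * sigma ** 2))

    def sigmoidal_unit(x, w, b=0.0):
        # smooth step: responds on one side of the hyperplane w.x + b = 0
        return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))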