Title: RECENT DEVELOPMENTS IN MULTILAYER PERCEPTRON NEURAL NETWORKS
1 RECENT DEVELOPMENTS IN MULTILAYER PERCEPTRON
NEURAL NETWORKS
- Walter H. Delashmit
- Lockheed Martin Missiles and Fire Control
- Dallas, TX 75265
- walter.delashmit_at_lmco.com
- walter.delashmit_at_verizon.net
- Michael T. Manry
- The University of Texas at Arlington
- Arlington, TX 76010
- manry_at_uta.edu
- Memphis Area Engineering and Science Conference 2005
- May 11, 2005
2 Outline of Presentation
- Review of Multilayer Perceptron Neural Networks
- Network Initial Types and Training Problems
- Common Starting Point Initialized Networks
- Dependently Initialized Networks
- Separating Mean Processing
- Summary
3 Review of Multilayer Perceptron Neural Networks
4 Typical 3-Layer MLP
5 MLP Performance Equations
Mean Square Error (MSE)
Output
Net Function
(standard forms of these three equations are sketched below)
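The equation images on this slide did not survive conversion. A hedged reconstruction of the three quantities, using notation common for this type of MLP (x_p(i): inputs, O_p(j): hidden unit outputs, y_p(k): network outputs, t_p(k): desired outputs, N_v: training patterns, M: outputs; the exact symbols on the original slide may differ):

% Net function of hidden unit j for pattern p, with threshold theta(j)
n_p(j) = \theta(j) + \sum_{i=1}^{N} w(j,i)\, x_p(i)

% Output (activation) of hidden unit j: sigmoid of the net function
O_p(j) = f\big(n_p(j)\big) = \frac{1}{1 + e^{-n_p(j)}}

% Mean square error over N_v training patterns and M outputs
E = \frac{1}{N_v} \sum_{p=1}^{N_v} \sum_{k=1}^{M} \big[t_p(k) - y_p(k)\big]^2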
6 Net Control
- Scales and shifts all net functions so that they do not generate small gradients and so that large inputs do not mask the potential effects of small inputs (a minimal sketch of this scaling follows)
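A minimal sketch of this kind of net-function scaling in Python, assuming sigmoid hidden units and arbitrary target mean and standard deviation values (the slide does not specify the actual net control parameters):

import numpy as np

def net_control(W, b, X, target_mean=0.5, target_std=1.0):
    """Scale and shift each hidden unit's weights and threshold so its net
    function over the training inputs has a chosen mean and standard
    deviation, avoiding tiny sigmoid gradients and keeping large inputs
    from masking small ones. Target values are assumptions."""
    net = X @ W.T + b                      # net functions over all patterns
    std = net.std(axis=0) + 1e-12          # per-unit spread of the net function
    scale = target_std / std               # per-unit scale factor
    W_new = W * scale[:, None]             # scaled input weights
    b_new = target_mean + scale * (b - net.mean(axis=0))  # shifted thresholds
    return W_new, b_new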
7 Neural Network Training Algorithms
- Backpropagation Training
- Output Weight Optimization - Hidden Weight Optimization (OWO-HWO)
- Full Conjugate Gradient
8 Output Weight Optimization - Hidden Weight Optimization (OWO-HWO)
- Used in this development
- Linear equations are used to solve for the output weights in OWO (a minimal sketch of this step follows)
- Separate error functions for each hidden unit are used, and multiple sets of linear equations are solved to determine the weights connecting to the hidden units in HWO
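A minimal sketch of the OWO step in Python, assuming a single hidden layer of sigmoid units with bypass (input-to-output) connections; the equation-solving routine here is ordinary least squares rather than necessarily the one used in the dissertation, and the HWO step is not shown:

import numpy as np

def output_weight_optimization(X, T, W_hidden, b_hidden):
    """Solve for the output weights by linear least squares (OWO step).
    X : (Nv, N) input patterns, T : (Nv, M) desired outputs.
    W_hidden, b_hidden : hidden-layer weights and thresholds (held fixed)."""
    hidden = 1.0 / (1.0 + np.exp(-(X @ W_hidden.T + b_hidden)))  # sigmoid activations
    basis = np.hstack([np.ones((X.shape[0], 1)), X, hidden])     # [bias | inputs | hidden]
    W_out, *_ = np.linalg.lstsq(basis, T, rcond=None)            # linear equations for output weights
    return W_out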
9 Network Initial Types and Training Problems
10 Problem Definition
- Assume that a set of MLPs of different sizes is to be designed for a given training data set
- Let SNh be the set of all MLPs for that training data having Nh hidden units, and let Eint(Nh) denote the corresponding training error of an initial network that belongs to SNh
- Let Ef(Nh) denote the corresponding training error of a well-trained network
- Let Nhmax denote the maximum number of hidden units for which networks are to be designed
- Goal: choose a set of initial networks from S0, S1, S2, ... such that Eint(0) ≥ Eint(1) ≥ Eint(2) ≥ ... ≥ Eint(Nhmax), and train the networks to minimize Ef(Nh) such that Ef(0) ≥ Ef(1) ≥ Ef(2) ≥ ... ≥ Ef(Nhmax)
- Axiom 3.1: If Ef(Nh) ≥ Ef(Nh-1), then the network having Nh hidden units is useless, since the training resulted in a larger, more complex network with a larger or the same training error.
11 Network Design Methodologies
- Design Methodology One (DM-1): A well-organized researcher may design a set of different size networks in an orderly fashion, each with one or more hidden units more than the previous network
- Thorough design approach
- May take a longer time to design
- Allows a trade-off between network performance and size to be achieved
- Design Methodology Two (DM-2): A researcher may design different size networks in no particular order
- May be pursued quickly for only a few networks
- Possible that the design could be significantly improved with a bit more attention to network design
12 Three Types of Networks Defined
- Randomly Initialized (RI) Networks: No members of this set of networks have any initial weights and thresholds in common. Practically, this means that the initial random number seeds (IRNS) are widely separated. Useful when the goal is to quickly design one or more networks of the same or different sizes whose weights are statistically independent of each other. Can be designed using DM-1 or DM-2.
- Common Starting Points Initialized (CSPI) Networks: When a set of networks is CSPI, each one starts with the same IRNS. These networks are useful when it is desired to compare the performance of networks that have the same IRNS as the starting point. Can be designed using DM-1 or DM-2.
- Dependently Initialized (DI) Networks: A series of networks is designed, with each subsequent network having one or more hidden units more than the previous network. Larger networks are initialized using the final weights and thresholds from training a smaller network as the values of the common weights and thresholds. DI networks are useful when the goal is a thorough analysis of network performance versus size, and are most relevant to DM-1.
13 Network Properties
- Theorem 3.1: If two initial RI networks (1) are the same size, (2) have the same training data set, and (3) the training data set has more than one unique input vector, then the hidden unit basis functions are different for the two networks.
- Theorem 3.2: If two CSPI networks (1) are the same size and (2) use the same algorithm for processing random numbers into weights, then they are identical.
- Corollary 3.2: If two initial CSPI networks are the same size and use the same algorithm for processing random numbers into weights, then they have all basis functions in common.
14 Problems with MLP Training
- Non-monotonic Ef(Nh)
- No standard way to initialize and train additional hidden units
- Net control parameters are arbitrary
- No procedure to initialize and train DI networks
- Interference between the network's linear and nonlinear components
15 Mapping Error Examples
16 Tasks Performed in this Research
- Analysis of RI networks
- Improved initialization in CSPI networks
- Improved initialization of new hidden units in DI networks
- Analysis of separating mean training approaches
17 CSPI and CSPI-SWI Networks
- Improvement to RI networks
- Each CSPI network starts with the same IRNS
- Extended to CSPI-SWI (Structured Weight Initialization) networks
- Every hidden unit of the larger network has the same initial weights and threshold values as the corresponding units of the smaller network
- Input-to-output weights and thresholds are also identical
- Theorem 5.1: If two CSPI networks are designed with structured weight initialization, the common subset of the hidden unit basis functions is identical.
- Corollary 5.1: If two CSPI networks are designed using structured weight initialization, the only initial basis functions that are not the same are the hidden unit basis functions for the additional hidden units in the larger network.
- A detailed flow chart for CSPI-SWI initialization is given in the dissertation (a minimal sketch of the idea follows)
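A minimal sketch of the structured weight initialization idea in Python. The uniform weight range and the unit-by-unit draw order are assumptions rather than the dissertation's exact flow chart; the point is that two CSPI-SWI networks built from the same IRNS share the initial weights of their common hidden units:

import numpy as np

def cspi_swi_init(n_inputs, n_hidden, seed):
    """Draw each hidden unit's weights and threshold in a fixed order from a
    generator seeded with the common IRNS, so a smaller network's units are
    a prefix of a larger network's units."""
    rng = np.random.RandomState(seed)            # common IRNS
    W = np.empty((n_hidden, n_inputs))
    b = np.empty(n_hidden)
    for j in range(n_hidden):                    # unit-by-unit draws keep the
        W[j] = rng.uniform(-1.0, 1.0, n_inputs)  # common units identical across
        b[j] = rng.uniform(-1.0, 1.0)            # networks of different sizes
    return W, b

# The first 5 hidden units of a 10-unit network match a 5-unit network exactly,
# illustrating the common-basis-function property of Corollary 5.1.
W5, b5 = cspi_swi_init(8, 5, seed=42)
W10, b10 = cspi_swi_init(8, 10, seed=42)
assert np.allclose(W5, W10[:5]) and np.allclose(b5, b10[:5])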
18 CSPI-SWI Examples
- Example results for the fm and twod data sets (figures)
19 DI Network Development and Evaluation
- Improvement over RI, CSPI and CSPI-SWI networks
- The common subset of the initial weights and thresholds for the larger network is initialized with the final weights and thresholds from a previously well-trained smaller network (a minimal sketch follows)
- Designed with DM-1
- Single network designs → networks are implementable
- After training, testing is feasible on a different data set
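A minimal sketch of dependent initialization in Python. The treatment of the new hidden units (small random input weights, zero output weights) is an assumption, since the slide only specifies how the common weights and thresholds are carried over:

import numpy as np

def di_initialize(W_small, b_small, Wout_small, n_new, rng):
    """Build a larger MLP's initial weights from a trained smaller one.
    The common hidden units keep the smaller network's final weights and
    thresholds; only the n_new added units get fresh values."""
    n_in = W_small.shape[1]
    n_out = Wout_small.shape[0]
    W = np.vstack([W_small, rng.uniform(-1, 1, (n_new, n_in))])   # copy + new units
    b = np.concatenate([b_small, rng.uniform(-1, 1, n_new)])
    Wout = np.hstack([Wout_small, np.zeros((n_out, n_new))])      # new units start inactive
    return W, b, Wout

With zero output weights on the added units, the sketch's larger network initially computes the same mapping as the trained smaller network.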
20 Basic DI Network Flowgraph
21 Properties of DI Networks
- Eint(Nh) < Eint(Nh-p)
- The Ef(Nh) curve is monotonic non-increasing (i.e., Ef(Nh) ≤ Ef(Nh-p))
- Eint(Nh) ≈ Ef(Nh-p)
22 Performance Results for DI Networks with Fixed Iterations
- Results for the fm, twod, F24, and F17 data sets (figures)
23 RI Network and DI Network Comparison
- (1) DI network: standard DI network design for Nh hidden units
- (2) RI type 1: RI networks were designed using a single network for each value of Nh, and every network of size Nh was trained using the same value of Niter that the corresponding DI network was trained with
- (3) RI type 2: RI networks were designed using a single network for each value of Nh, and every network was trained using the total number of iterations Niter used for the entire sequence of DI networks; this can be expressed by the sum given below. This results in the RI type 2 network actually having a larger value of Niter than the DI network.
24 RI Network and DI Network Comparison Results
- Results for the fm and twod data sets (figures)
25 Separating Mean Processing Techniques
- Bottom-Up Separating Mean
- Top-Down Separating Mean
26 Bottom-Up Separating Mean
- Processing flow: generate linear mapping results → generate new desired output vector → train the MLP using the new data
- Basic Idea
- A linear mapping is removed from the training data (a minimal sketch follows).
- The nonlinear fit to the resulting data may perform better.
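A minimal sketch of the bottom-up idea in Python: fit a linear mapping from inputs to desired outputs, subtract it to form the new desired output vector, and train the MLP on the residual (the OWO-HWO training itself is not shown):

import numpy as np

def bottom_up_separating_mean(X, T):
    """Remove a linear mapping from the training data.
    X : (Nv, N) inputs, T : (Nv, M) desired outputs."""
    Xa = np.hstack([np.ones((X.shape[0], 1)), X])          # augment inputs with a bias column
    W_lin, *_ = np.linalg.lstsq(Xa, T, rcond=None)         # linear mapping results
    T_new = T - Xa @ W_lin                                 # new desired output vector
    return W_lin, T_new                                    # MLP is then trained on (X, T_new)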
27 Bottom-Up Separating Mean Results
- Results for the fm, power12, and single2 data sets (figures)
28 Top-Down Separating Mean
- Basic Idea
- If we know which subsets of inputs and outputs have the same means in Signal Models 2 and 3, we can estimate and remove these means (a minimal sketch follows).
- Network performance is more robust.
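A minimal sketch of the top-down mean-removal step in Python, assuming a given subset of inputs and outputs that share a common per-pattern mean; Signal Models 2 and 3 and the dissertation's technique for deciding which inputs and outputs are similar are not reproduced here:

import numpy as np

def top_down_remove_means(X, T, in_idx, out_idx):
    """Estimate a per-pattern mean from the selected inputs and remove it
    from both the selected inputs and outputs, so the MLP trains on
    mean-removed data. in_idx/out_idx are hypothetical index lists."""
    m = X[:, in_idx].mean(axis=1, keepdims=True)   # per-pattern mean estimate
    Xr, Tr = X.copy(), T.copy()
    Xr[:, in_idx] -= m
    Tr[:, out_idx] -= m
    return Xr, Tr, m                               # m is added back to the outputs after the MLP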
29 Separating Mean Results
- Results for the power12 data set (figure)
30 Conclusions
- On average, CSPI-SWI networks produce MSE versus Nh curves that are closer to monotonic non-increasing than those of RI networks
- MSE versus Nh curves are always monotonic non-increasing for DI networks
- DI network training was improved by calculating the number of training iterations and limiting the amount of training applied to previously trained units
- DI networks always produce more consistent MSE versus Nh curves than RI, CSPI and CSPI-SWI networks
- Separating mean processing using both bottom-up and top-down architectures often produces improved performance
- A new technique was developed to determine which inputs and outputs are similar, for use in top-down separating mean processing