1
RECENT DEVELOPMENTS IN MULTILAYER PERCEPTRON
NEURAL NETWORKS
  • Walter H. Delashmit
  • Lockheed Martin Missiles and Fire Control
  • Dallas, TX 75265
  • walter.delashmit@lmco.com
  • walter.delashmit@verizon.net
  • Michael T. Manry
  • The University of Texas at Arlington
  • Arlington, TX 76010
  • manry@uta.edu
  • Memphis Area Engineering and Science Conference
    2005
  • May 11, 2005

2
Outline of Presentation
  • Review of Multilayer Perceptron Neural Networks
  • Network Initial Types and Training Problems
  • Common Starting Point Initialized Networks
  • Dependently Initialized Networks
  • Separating Mean Processing
  • Summary

3
Review of Multilayer Perceptron Neural Networks
4
Typical 3 Layer MLP
5
MLP Performance Equations
[Equations shown as slide graphics: Mean Square Error (MSE), Output, Net Function; a numeric sketch of these computations follows below]
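The equations on this slide appear only as graphics, so the following is a minimal numeric sketch of a three-layer MLP forward pass and its mean square error. It assumes sigmoid hidden units, input vectors augmented with a trailing 1 for the thresholds, and input-to-output (bypass) weights, consistent with the later discussion of input-to-output weights; the variable names are illustrative and not the paper's notation.

```python
import numpy as np

def sigmoid(net):
    """Hidden-unit activation function."""
    return 1.0 / (1.0 + np.exp(-net))

def mlp_forward(x, W_hi, w_oh, W_oi):
    """Forward pass of a 3-layer MLP with bypass (input-to-output) weights.

    x    : (Nv, N+1) training patterns, last column all ones (thresholds)
    W_hi : (Nh, N+1) input-to-hidden weights and thresholds
    w_oh : (M, Nh)   hidden-to-output weights
    W_oi : (M, N+1)  input-to-output (bypass) weights and thresholds
    """
    net = x @ W_hi.T                # net function of each hidden unit
    O = sigmoid(net)                # hidden-unit outputs
    y = O @ w_oh.T + x @ W_oi.T     # network outputs
    return y

def mse(y, t):
    """Mean square error, averaged over the Nv training patterns."""
    return np.sum((t - y) ** 2) / y.shape[0]
```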
6
Net Control
  • Scales and shifts all net functions so that they do not generate small gradients and so that large inputs do not mask the potential effects of small inputs (a sketch follows below)
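One simple way to realize the net-control idea is to rescale each hidden unit's weights so that its net function over the training data has a chosen mean and standard deviation, absorbing the shift into the threshold weight. The sketch below is an illustrative assumption, not the authors' exact procedure; the target mean and standard deviation are arbitrary example values.

```python
import numpy as np

def net_control(x, W_hi, desired_mean=0.0, desired_std=1.0):
    """Scale and shift each hidden unit's weights so its net function over
    the training data has the desired mean and standard deviation.

    x    : (Nv, N+1) inputs with a trailing column of ones (threshold input)
    W_hi : (Nh, N+1) input-to-hidden weights; last column is the threshold
    """
    W = W_hi.copy()
    net = x @ W.T                                  # current net functions
    scale = desired_std / (net.std(axis=0) + 1e-12)
    W *= scale[:, None]                            # rescale each net function
    net = x @ W.T
    W[:, -1] += desired_mean - net.mean(axis=0)    # shift via the threshold weight
    return W
```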

7
Neural Network Training Algorithms
  • Backpropagation Training
  • Output Weight Optimization - Hidden Weight Optimization (OWO-HWO)
  • Full Conjugate Gradient

8
Output Weight Optimization - Hidden Weight Optimization (OWO-HWO)
  • Used in this development
  • In OWO, linear equations are solved for the output weights (a sketch follows below)
  • In HWO, a separate error function is used for each hidden unit, and multiple sets of linear equations are solved to determine the weights connecting to the hidden units
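With the hidden weights frozen, the OWO step reduces to a linear least-squares problem: the desired outputs are regressed on the basis formed by the augmented inputs and the hidden-unit activations. The sketch below shows only that step, under the same illustrative notation as before; the HWO step with its per-hidden-unit error functions is not shown.

```python
import numpy as np

def owo(x, t, W_hi):
    """Output Weight Optimization: solve for output weights by least squares.

    x : (Nv, N+1) inputs with trailing ones column
    t : (Nv, M)   desired outputs
    Returns W_o of shape (M, N+1+Nh) acting on the basis [x, O].
    """
    O = 1.0 / (1.0 + np.exp(-(x @ W_hi.T)))   # hidden-unit activations
    basis = np.hstack([x, O])                 # bypass inputs plus hidden units
    W_o, *_ = np.linalg.lstsq(basis, t, rcond=None)
    return W_o.T
```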

9
Network Initial Types and Training Problems
10
Problem Definition
  • Assume that a set of MLPs of different sizes is to be designed for a given training data set
  • Let SNh be the set of all MLPs for that training data having Nh hidden units, and let Eint(Nh) denote the corresponding training error of an initial network belonging to SNh
  • Let Ef(Nh) denote the corresponding training error of a well-trained network
  • Let Nhmax denote the maximum number of hidden units for which networks are to be designed
  • Goal: Choose a set of initial networks from S0, S1, S2, ... such that
    Eint(0) ≥ Eint(1) ≥ Eint(2) ≥ ... ≥ Eint(Nhmax)
    and train the networks to minimize Ef(Nh) such that
    Ef(0) ≥ Ef(1) ≥ Ef(2) ≥ ... ≥ Ef(Nhmax)
  • Axiom 3.1: If Ef(Nh) ≥ Ef(Nh-1), then the network having Nh hidden units is useless, since the training resulted in a larger, more complex network with a larger or the same training error (a sketch applying this axiom follows below)
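Axiom 3.1 can be applied mechanically once the final errors Ef(Nh) have been collected; a minimal helper is sketched below (the name and data layout are illustrative).

```python
def useless_networks(Ef):
    """Apply Axiom 3.1 to a list of final training errors.

    Ef : sequence where Ef[Nh] is the final training error of the network
         with Nh hidden units.
    Returns the hidden-unit counts whose networks are useless, i.e. those
    whose error did not decrease relative to the next smaller network.
    """
    return [nh for nh in range(1, len(Ef)) if Ef[nh] >= Ef[nh - 1]]
```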

11
Network Design Methodologies
  • Design Methodology One (DM-1): A well-organized researcher may design a set of different-size networks in an orderly fashion, each with one or more hidden units more than the previous network
  • Thorough design approach
  • May take longer to design
  • Allows a trade-off between network performance and size
  • Design Methodology Two (DM-2): A researcher may design different-size networks in no particular order
  • May be pursued quickly for only a few networks
  • Possible that the design could be significantly improved with a bit more attention to network design

12
Three Types of Networks Defined
  • Randomly Initialized (RI) Networks: No members of this set of networks have any initial weights or thresholds in common. Practically, this means that the initial random number seeds (IRNS) are widely separated. Useful when the goal is to quickly design one or more networks, of the same or different sizes, whose weights are statistically independent of each other. Can be designed using DM-1 or DM-2.
  • Common Starting Point Initialized (CSPI) Networks: When a set of networks is CSPI, each one starts with the same IRNS. These networks are useful when it is desired to make performance comparisons of networks that have the same IRNS as the starting point. Can be designed using DM-1 or DM-2.
  • Dependently Initialized (DI) Networks: A series of networks is designed, with each subsequent network having one or more hidden units more than the previous network. Larger networks are initialized using the final weights and thresholds from training a smaller network as the values of the common weights and thresholds. DI networks are useful when the goal is a thorough analysis of network performance versus size, and are most relevant to design using DM-1. (A sketch of RI versus CSPI seeding follows below.)
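The distinction between RI and CSPI networks comes down to how the initial random number seed (IRNS) is chosen. A minimal sketch using numpy's default generator; the init_weights helper and its weight shapes are illustrative assumptions.

```python
import numpy as np

def init_weights(seed, Nh, N, M):
    """Draw initial MLP weights from an initial random number seed (IRNS)."""
    rng = np.random.default_rng(seed)
    W_hi = rng.normal(size=(Nh, N + 1))       # input-to-hidden weights + thresholds
    W_o = rng.normal(size=(M, N + 1 + Nh))    # output weights (bypass inputs + hidden units)
    return W_hi, W_o

# RI networks: widely separated seeds -> statistically independent weights
ri_net_a = init_weights(seed=12345, Nh=5, N=8, M=2)
ri_net_b = init_weights(seed=987654321, Nh=5, N=8, M=2)

# CSPI networks: the same IRNS -> a common starting point, possibly different sizes
cspi_net_a = init_weights(seed=42, Nh=5, N=8, M=2)
cspi_net_b = init_weights(seed=42, Nh=8, N=8, M=2)
```

DI networks, by contrast, do not start from a fresh seed at all; they reuse the trained weights of a smaller network, as sketched after the DI slide later in the deck.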

13
Network Properties
  • Theorem 3.1: If two initial RI networks (1) are the same size, (2) have the same training data set, and (3) the training data set has more than one unique input vector, then the hidden unit basis functions are different for the two networks.
  • Theorem 3.2: If two CSPI networks (1) are the same size and (2) use the same algorithm for processing random numbers into weights, then they are identical.
  • Corollary 3.2: If two initial CSPI networks are the same size and use the same algorithm for processing random numbers into weights, then all of their basis functions are common.

14
Problems with MLP Training
  • Non-monotonic Ef(Nh)
  • No standard way to initialize and train
    additional hidden units
  • Net control parameters are arbitrary
  • No procedure to initialize and train DI networks
  • Network linear and nonlinear component
    interference

15
Mapping Error Examples
16
Tasks Performed in this Research
  • Analysis of RI networks
  • Improved Initialization in CSPI networks
  • Improved initialization of new hidden units in DI
    networks
  • Analysis of separating mean training approaches

17
CSPI and CSPI-SWI Networks
  • Improvement to RI networks
  • Each CSPI network starts with the same IRNS
  • Extended to CSPI-SWI (Structured Weight Initialization) networks
  • Every hidden unit of the larger network has the same initial weights and threshold values as the corresponding unit of the smaller network
  • Input-to-output weights and thresholds are also identical
  • Theorem 5.1: If two CSPI networks are designed with structured weight initialization, the common subset of the hidden unit basis functions is identical.
  • Corollary 5.1: If two CSPI networks are designed using structured weight initialization, the only initial basis functions that are not the same are the hidden unit basis functions for the additional hidden units in the larger network.
  • A detailed flow chart for CSPI-SWI initialization is given in the dissertation (a simplified sketch follows below)
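The dissertation's flow chart is not reproduced here, but the property in Theorem 5.1 and Corollary 5.1 can be illustrated by drawing each hidden unit's initial weights from its own seeded stream, so that two CSPI networks of different sizes share identical initial weights for their common units. This is a hedged sketch under assumed weight shapes, not the dissertation's exact scheme.

```python
import numpy as np

def swi_init(seed, Nh, N, M):
    """Structured weight initialization sketch: per-unit seeded streams."""
    rng = np.random.default_rng(seed)
    W_oi = rng.normal(size=(M, N + 1))            # input-to-output weights + thresholds
    W_hi = np.empty((Nh, N + 1))
    w_oh = np.empty((M, Nh))
    for j in range(Nh):
        unit_rng = np.random.default_rng((seed, j))   # stream tied to hidden unit j
        W_hi[j] = unit_rng.normal(size=N + 1)
        w_oh[:, j] = unit_rng.normal(size=M)
    return W_hi, w_oh, W_oi

# Two CSPI-SWI networks of different sizes: the first 5 hidden units and the
# input-to-output weights are identical, so their common basis functions match.
small = swi_init(seed=42, Nh=5, N=8, M=2)
large = swi_init(seed=42, Nh=8, N=8, M=2)
assert np.allclose(small[0], large[0][:5])
assert np.allclose(small[2], large[2])
```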

18
CSPI-SWI Examples
[Plots: CSPI-SWI examples for the fm and twod data sets]
19
DI Network Development and Evaluation
  • Improvement over RI, CSPI and CSPI-SWI networks
  • The values of the common subset of the initial weights and thresholds for the larger network are initialized with the final weights and thresholds from a previously well-trained smaller network (an initialization sketch follows below)
  • Designed with DM-1
  • Single network designs → networks are implementable
  • After training, testing is feasible on a different data set
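A minimal sketch of dependent initialization: the larger network reuses the final weights of the well-trained smaller network for the common units, and only the added hidden units receive new initial values. Starting the new units' output weights at zero is an illustrative choice (not necessarily the dissertation's scheme) that makes the larger network's initial error equal the smaller network's final error, consistent with the properties listed on the next slide.

```python
import numpy as np

def di_init(W_hi_small, w_oh_small, W_oi_small, Nh_new, seed=0):
    """Initialize a larger DI network from a well-trained smaller one."""
    Nh_old, n_cols = W_hi_small.shape
    M = w_oh_small.shape[0]
    rng = np.random.default_rng(seed)

    # Copy the trained hidden units, then append randomly initialized new ones.
    W_hi = np.vstack([W_hi_small, rng.normal(size=(Nh_new - Nh_old, n_cols))])

    # New units start with zero output weights (illustrative), so the larger
    # network initially computes exactly what the smaller one did.
    w_oh = np.hstack([w_oh_small, np.zeros((M, Nh_new - Nh_old))])

    W_oi = W_oi_small.copy()
    return W_hi, w_oh, W_oi
```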

20
Basic DI Network Flowgraph
21
Properties of DI Networks
  • Eint(Nh) < Eint(Nh-p)
  • The Ef(Nh) curve is monotonic non-increasing (i.e., Ef(Nh) ≤ Ef(Nh-p))
  • Eint(Nh) ≈ Ef(Nh-p)

22
Performance Results for DI Networks with Fixed
Iterations
[Plots: results for the fm, twod, F24, and F17 data sets]
23
RI Network and DI Network Comparison
  • (1) DI network: standard DI network design for Nh hidden units
  • (2) RI type 1: RI networks were designed using a single network for each value of Nh, and every network of size Nh was trained using the value of Niter that the corresponding DI network was trained with.
  • (3) RI type 2: RI networks were designed using a single network for each value of Nh, and every network was trained using the total number of iterations Niter used for the entire sequence of DI networks, i.e., the sum of Niter over the whole DI sequence. This results in the RI type 2 network actually having a larger value of Niter than the DI network. (A sketch of the iteration bookkeeping follows below.)
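A small sketch of the iteration bookkeeping implied by the three cases above; niter_di is a hypothetical mapping from network size to the number of training iterations used for that DI network.

```python
def comparison_iterations(niter_di):
    """Iteration counts for the three comparison cases.

    niter_di : dict mapping Nh -> iterations used to train that DI network.
    RI type 1 reuses the per-size counts; RI type 2 uses the total for the
    whole DI sequence at every size, so it always trains at least as long.
    """
    ri_type1 = dict(niter_di)
    total_iterations = sum(niter_di.values())
    ri_type2 = {nh: total_iterations for nh in niter_di}
    return ri_type1, ri_type2
```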

24
RI Network and DI Network Comparison Results
[Plots: RI versus DI comparison results for the fm and twod data sets]
25
Separating Mean Processing Techniques
  • Bottom-Up Separating Mean
  • Top-Down Separating Mean

26
Bottom-Up Separating Mean
[Flowgraph: generate linear mapping results → generate new desired output vector → train MLP using new data]
  • Basic Idea
  • A linear mapping is removed from the training data.
  • The nonlinear fit to the resulting data may perform better. (A sketch follows below.)
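A minimal sketch of the bottom-up idea: fit a linear mapping from the inputs to the desired outputs by least squares, subtract it to form new desired outputs, and train the MLP on those. The use of plain least squares and the variable names are assumptions for illustration.

```python
import numpy as np

def bottom_up_separating_mean(x, t):
    """Remove a linear mapping from the training data (bottom-up idea).

    x : (Nv, N) inputs, t : (Nv, M) desired outputs.
    Returns the linear mapping and the new desired outputs for MLP training.
    """
    X = np.hstack([x, np.ones((x.shape[0], 1))])   # augment with a ones column
    W_lin, *_ = np.linalg.lstsq(X, t, rcond=None)  # generate linear mapping results
    t_new = t - X @ W_lin                          # generate new desired output vectors
    return W_lin, t_new
```

The MLP is then trained on (x, t_new); at run time the linear term X @ W_lin is added back to the network output.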

27
Bottom-up Separating Mean Results
[Plots: bottom-up separating mean results for the fm, power12, and single2 data sets]
28
Top-Down Separating Mean
  • Basic Idea
  • If we know which subsets of inputs and outputs have the same means in Signal Models 2 and 3, we can estimate and remove these means.
  • Network performance is more robust. (A sketch follows below.)
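A minimal sketch of the top-down idea, assuming the caller already knows which subsets of inputs and outputs share a common mean; the conclusions mention a new technique for finding these groups automatically, which is not shown here.

```python
import numpy as np

def top_down_separating_mean(x, t, input_groups, output_groups):
    """Estimate and remove common means from grouped inputs and outputs.

    input_groups / output_groups : lists of column-index lists, where each
    group is assumed to share a single common mean (e.g. [[0, 2], [1, 3]]).
    """
    x_new = np.array(x, dtype=float)
    t_new = np.array(t, dtype=float)
    group_means = []
    for cols in input_groups:
        m = x_new[:, cols].mean()        # one mean estimated for the whole subset
        x_new[:, cols] -= m
        group_means.append(m)
    for cols in output_groups:
        m = t_new[:, cols].mean()
        t_new[:, cols] -= m
        group_means.append(m)
    return x_new, t_new, group_means
```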

29
Separating Mean Results
[Plot: separating mean results for the power12 data set]
30
Conclusions
  • On average, CSPI-SWI networks have more nearly monotonic non-increasing MSE versus Nh curves than RI networks
  • MSE versus Nh curves are always monotonic non-increasing for DI networks
  • DI network training was improved by calculating the number of training iterations and limiting the amount of training applied to previously trained units
  • DI networks always produce more consistent MSE versus Nh curves than RI, CSPI, and CSPI-SWI networks
  • Separating mean processing, using both a bottom-up and a top-down architecture, often produces improved performance results
  • A new technique was developed to determine which inputs and outputs are similar, for use in top-down separating mean processing