1
Chapter 5 NEURAL NETWORKS
  • by S. Betul Ceran

2
Outline
  • Introduction
  • Feed-forward Network Functions
  • Network Training
  • Error Backpropagation
  • Regularization

3
Introduction
4
Multi-Layer Perceptron (1)
  • Layered perceptron networks can realize any
    logical function; however, there is no simple way
    to estimate their parameters or to generalize the
    (single-layer) perceptron convergence procedure.
  • Multi-layer perceptron (MLP) networks are a class
    of models formed from layers of sigmoidal nodes,
    which can be used for regression or
    classification.
  • They are commonly trained by gradient descent on a
    mean squared error performance function, using a
    technique known as error back-propagation to
    calculate the gradients.
  • Widely applied to many prediction and
    classification problems over the past 15 years.

5
Multi-Layer Perceptron (2)
  • XOR (exclusive OR) problem
      x1  x2    x1 XOR x2  ( = (x1 + x2) mod 2 )
      0   0     0
      1   1     0
      1   0     1
      0   1     1
  • Perceptron does not work here!

Single layer generates a linear decision boundary
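
To make the point concrete, here is a minimal sketch (not from the slides) of a two-layer network of threshold units that does compute XOR; the particular weights, which make the hidden units act as OR and AND detectors, are just one illustrative choice.

    import numpy as np

    def step(a):
        """Heaviside step activation, as used by the classic perceptron."""
        return (a >= 0).astype(int)

    def xor_mlp(x1, x2):
        """Two-layer threshold network: XOR = OR(x1, x2) AND NOT AND(x1, x2)."""
        x = np.array([x1, x2])
        h1 = step(x.sum() - 0.5)         # hidden unit 1: fires when x1 OR x2
        h2 = step(x.sum() - 1.5)         # hidden unit 2: fires when x1 AND x2
        return int(step(h1 - h2 - 0.5))  # output unit: h1 and not h2

    print([xor_mlp(a, b) for a, b in [(0, 0), (1, 1), (1, 0), (0, 1)]])  # [0, 0, 1, 1]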
6
Universal Approximation
[Figure: a network with a 1st, 2nd, and 3rd layer]
Universal Approximation: a three-layer network can
in principle approximate any function with any
accuracy!
7
Feed-forward Network Functions

  • y(x, w) = f( Σ_j w_j φ_j(x) )    (1)

  • f : a nonlinear activation function
  • Extends the previous linear models by introducing
    hidden units
  • Make the basis functions φ_j depend on adjustable
    parameters
  • Adjust these parameters during training
  • Construct linear combinations of the input
    variables x1, …, xD:
  • a_j = Σ_{i=1}^{D} w_ji^(1) x_i + w_j0^(1)    (2)
  • Transform each of them using a nonlinear
    activation function:
  • z_j = h(a_j)    (3)

8
Cont'd
  • Linearly combine them to give the output unit
    activations:
  • y_k = σ( Σ_{j=1}^{M} w_kj^(2) z_j + w_k0^(2) )    (4)
  • Key difference from the perceptron: the continuous
    sigmoidal nonlinearities in the hidden units mean
    that the neural network function is differentiable
    w.r.t. the network parameters (see the forward-pass
    sketch below)
  • Whereas the perceptron uses step functions
  • Weight-space symmetry
  • The network function is unchanged by certain
    permutations and sign flips in weight space
  • E.g. tanh(−a) = −tanh(a): flipping the sign of all
    weights into a hidden unit can be compensated by
    flipping the sign of all weights out of that
    hidden unit
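
A minimal Python sketch of the forward pass defined by equations (2)-(4); the tanh hidden activation and the variable names (W1, b1, W2, b2) are illustrative assumptions, not taken from the slides.

    import numpy as np

    def forward(x, W1, b1, W2, b2):
        """Forward pass of a two-layer network, following eqs. (2)-(4).

        x  : input vector, shape (D,)
        W1 : 1st-layer weights, shape (M, D);  b1 : biases, shape (M,)
        W2 : 2nd-layer weights, shape (K, M);  b2 : biases, shape (K,)
        """
        a = W1 @ x + b1    # eq. (2): hidden activations a_j
        z = np.tanh(a)     # eq. (3): hidden outputs z_j = h(a_j)
        y = W2 @ z + b2    # eq. (4): output activations (pass through a
                           # sigmoid/softmax for classification problems)
        return y, z, a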

9
Two-layer neural network
[Figure: network diagram; z_j denotes a hidden unit]
10
A multi-layer perceptron fitted to different functions
[Figure panels: f(x) = x², f(x) = sin(x), f(x) = H(x), f(x) = |x|]
11
Network Training
  • The problem of assigning credit or blame to the
    individual elements involved in forming the
    overall response of a learning system (the hidden
    units)
  • In neural networks, the problem amounts to
    deciding which weights should be altered, by how
    much, and in which direction
  • Analogous to deciding how much a weight in an
    early layer contributes to the output, and thus to
    the error
  • We therefore want to find out how the weight w_ij
    affects the error, i.e. we want ∂E/∂w_ij (see the
    finite-difference sketch below)
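
To see what ∂E/∂w_ij means in practice, it can be estimated numerically by a finite difference; the sketch below (central differences, with error_fn standing for whatever routine evaluates the error for a given weight matrix) is useful mainly for checking back-propagated gradients, not for training.

    import numpy as np

    def numerical_grad(error_fn, W, eps=1e-6):
        """Central-difference estimate of dE/dw_ij for every weight in W."""
        grad = np.zeros_like(W)
        for idx in np.ndindex(W.shape):
            W_plus, W_minus = W.copy(), W.copy()
            W_plus[idx] += eps
            W_minus[idx] -= eps
            grad[idx] = (error_fn(W_plus) - error_fn(W_minus)) / (2 * eps)
        return grad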

12
Error Backpropagation
18
Two phases of back-propagation
19
Activation and Error back-propagation
20
Weight updates
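
A minimal sketch combining the two phases and the weight update for the two-layer network of equations (2)-(4); the tanh hidden units, squared-error function, and plain gradient-descent update are illustrative assumptions rather than the slides' exact formulation.

    import numpy as np

    def backprop_step(x, t, W1, b1, W2, b2, lr=0.1):
        """One forward/backward pass plus a gradient-descent weight update."""
        # Phase 1: forward propagation of activations
        a = W1 @ x + b1
        z = np.tanh(a)
        y = W2 @ z + b2

        # Phase 2: backward propagation of errors (deltas)
        delta_out = y - t                            # output error for squared error
        delta_hid = (1 - z**2) * (W2.T @ delta_out)  # tanh'(a) = 1 - tanh(a)^2

        # Weight updates (gradient descent)
        W2 -= lr * np.outer(delta_out, z)
        b2 -= lr * delta_out
        W1 -= lr * np.outer(delta_hid, x)
        b1 -= lr * delta_hid
        return 0.5 * np.sum((y - t) ** 2)            # error, for monitoring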
21
Other minimization procedures
22
Two schemes of training
  • There are two schemes for updating the weights:
  • Batch: update the weights after all patterns have
    been presented (one epoch).
  • Online: update the weights after each pattern is
    presented.
  • Although the batch scheme implements true gradient
    descent, the online scheme is often preferred
    because
  • it requires less storage,
  • it is noisier, and hence less likely to get stuck
    in a local minimum (a real concern with nonlinear
    activation functions). In the online scheme the
    order of presentation matters! (See the sketch
    below.)
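
A hedged sketch of the two schemes; grad_fn stands for a hypothetical routine returning the error gradient for one pattern, and W for the (flattened) weights.

    import numpy as np

    def train_batch(W, grad_fn, X, T, lr=0.1, epochs=100):
        """Batch scheme: one update per epoch, from the gradient summed over all patterns."""
        for _ in range(epochs):
            W = W - lr * sum(grad_fn(W, x, t) for x, t in zip(X, T))
        return W

    def train_online(W, grad_fn, X, T, lr=0.1, epochs=100, seed=0):
        """Online scheme: one update per pattern; order is reshuffled each epoch."""
        rng = np.random.default_rng(seed)
        for _ in range(epochs):
            for n in rng.permutation(len(X)):   # order of presentation matters
                W = W - lr * grad_fn(W, X[n], T[n])
        return W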

23
Problems of back-propagation
  • It is extremely slow, if it converges at all.
  • It may get stuck in a local minimum.
  • It is sensitive to initial conditions.
  • It may start oscillating.

24
Regularization (1)
  • How to adjust the number of hidden units to get
    the best performance while avoiding over-fitting?
  • Add a penalty term to the error function
  • The simplest regularizer is weight decay:
  • Ẽ(w) = E(w) + (λ/2) wᵀw  (see the sketch below)
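
In a gradient-descent setting the weight-decay penalty (λ/2) wᵀw simply adds λw to the gradient, shrinking the weights toward zero at every step; a minimal sketch, with lambda_ and lr as illustrative hyperparameters:

    def weight_decay_update(W, grad_E, lr=0.1, lambda_=1e-3):
        """One gradient step on the regularized error E~(w) = E(w) + (lambda/2) w^T w."""
        return W - lr * (grad_E + lambda_ * W)   # dE~/dW = dE/dW + lambda * W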

25
Changing the number of hidden units
[Figure: fits to a sinusoidal data set with different
numbers of hidden units, illustrating over-fitting]
26
Regularization (2)
  • One approach is to choose the specific solution
    having the smallest validation set error
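
A hedged sketch of this selection loop, assuming hypothetical train_network and validation_error helpers; the network whose number of hidden units gives the smallest validation-set error is kept.

    def select_hidden_units(candidates, train_network, validation_error):
        """Pick the number of hidden units with the smallest validation-set error."""
        best_m, best_err, best_net = None, float("inf"), None
        for m in candidates:                     # e.g. range(1, 11)
            net = train_network(num_hidden=m)    # train, possibly from several random starts
            err = validation_error(net)
            if err < best_err:
                best_m, best_err, best_net = m, err, net
        return best_m, best_net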

Error vs. Number of hidden units
27
Consistent Gaussian Priors
  • One disadvantage of weight decay is its
    inconsistency with certain scaling properties of
    network mappings
  • A linear transformation of the inputs can be
    absorbed into the 1st-layer weights so that the
    overall mapping remains unchanged

28
Cont'd
  • A similar transformation of the outputs can be
    absorbed by changing the 2nd-layer weights
    accordingly
  • A regularizer of the following form is then
    invariant under such linear transformations:
  • (λ1/2) Σ_{w ∈ W1} w² + (λ2/2) Σ_{w ∈ W2} w²
  • W1: the set of weights in the 1st layer
  • W2: the set of weights in the 2nd layer

29
Effect of consistent Gaussian priors
30
Early Stopping
  • A method to
  • obtain good generalization performance, and
  • control the effective complexity of the network
  • Instead of iteratively reducing the error until a
    minimum on the training data set has been reached,
  • stop at the point of smallest error w.r.t. the
    validation data set (see the sketch below)
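
A minimal sketch of the idea, assuming hypothetical train_one_epoch and validation_error routines; training halts once the validation error has stopped improving for a given number of epochs (the patience value is illustrative).

    import copy

    def train_with_early_stopping(model, train_one_epoch, validation_error,
                                  max_epochs=1000, patience=10):
        """Stop at the point of smallest validation-set error."""
        best_err, best_model, since_best = float("inf"), None, 0
        for epoch in range(max_epochs):
            train_one_epoch(model)             # one pass over the training set
            err = validation_error(model)      # error on the held-out validation set
            if err < best_err:
                best_err, best_model = err, copy.deepcopy(model)
                since_best = 0
            else:
                since_best += 1
                if since_best >= patience:     # validation error has started to rise
                    break
        return best_model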

31
Effect of early stopping
[Figure: error vs. number of iterations for the
training set and the validation set; the validation
set curve shows a slight increase after its minimum]
32
Invariances
  • Alternative approaches for encouraging an adaptive
    model to exhibit the required invariances
  • E.g. invariance to position within the image, or
    to size

33
Various approaches
  • Augment the training set using transformed
    replicas according to the desired invariances (see
    the sketch below)
  • Add a regularization term to the error function
    (tangent propagation)
  • Extract the invariant features in pre-processing
    for later use
  • Build the invariance properties into the network
    structure (convolutional networks)
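
A hedged sketch of the first approach, training-set augmentation; small image rotations are only one example of a desired invariance, and scipy.ndimage.rotate is just one possible way to generate the transformed replicas.

    import numpy as np
    from scipy.ndimage import rotate   # one possible transformation

    def augment(images, labels, angles=(-10, -5, 5, 10)):
        """Augment the training set with rotated replicas of each image."""
        aug_x, aug_t = list(images), list(labels)
        for img, lab in zip(images, labels):
            for angle in angles:                              # the desired invariance
                aug_x.append(rotate(img, angle, reshape=False))
                aug_t.append(lab)                             # the label is unchanged
        return np.array(aug_x), np.array(aug_t)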

34
Tangent Propagation (Simard et al., 1992)
  • A continuous transformation of a particular input
    vector x_n can be approximated by the tangent
    vector t_n
  • A regularization function can be derived by
    differentiating the output function y w.r.t. the
    transformation parameter ξ (a sketch follows)
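
A hedged sketch of the resulting penalty: the derivative of the network output along the tangent direction can be approximated by a finite difference along t_n, and its squared norm is added to the error; forward and eps are illustrative, and the full regularizer in Simard et al. sums this term over all patterns.

    import numpy as np

    def tangent_penalty(forward, x, tangent, eps=1e-4):
        """Squared derivative of the output along the tangent vector t_n."""
        # finite-difference approximation of dy/dxi at xi = 0
        dy = (forward(x + eps * tangent) - forward(x)) / eps
        return 0.5 * np.sum(dy ** 2)    # penalty term added to the error function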

35
Tangent vector implementation
[Figure: the original image x; the tangent vector
corresponding to a clockwise rotation; the true
rotated image; and the image obtained by adding a
small contribution from the tangent vector, x + εt]
36
References
  • Neurocomputing course slides by Erol Sahin. METU,
    Turkey.
  • Backpropagation of a Multi-Layer Perceptron by
    Alexander Samborskiy. University of Missouri,
    Columbia.
  • Neural Networks - A Systematic Introduction by
    Raul Rojas. Springer.
  • Introduction to Machine Learning by Ethem
    Alpaydin. MIT Press.
  • Neural Networks course slides by Andrew
    Philippides. University of Sussex, UK.