Single-Layer Perceptrons

1
  • Single-Layer Perceptrons
  • (3.1 - 3.3)
  • CS679 Lecture Note
  • by Lim Jae Hong
  • Physics Department
  • KAIST

2
Perceptron
  • The simplest form of a neural network
  • consists of a single neuron with adjustable
    synaptic weights and bias
  • performs pattern classification with only two
    classes
  • Perceptron convergence theorem (a learning-rule
    sketch follows this slide)
  • Patterns (vectors) are drawn from two linearly
    separable classes
  • During training, the perceptron algorithm
    converges and positions the decision surface in
    the form of a hyperplane between the two classes
    by adjusting the synaptic weights
  • With more than one neuron in the output layer, we
    may correspondingly classify more than two
    classes, provided they are linearly separable
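
A minimal sketch of the perceptron learning rule described above, assuming a
hard-limiter output and ±1 class labels; the data set, variable names, and
learning rate are illustrative, not from the slides.

  import numpy as np

  def train_perceptron(X, d, eta=1.0, epochs=100):
      # Prepend a constant 1 to each input so the bias is learned as w[0].
      X = np.hstack([np.ones((X.shape[0], 1)), X])
      w = np.zeros(X.shape[1])                 # arbitrary initial weights
      for _ in range(epochs):
          errors = 0
          for x, target in zip(X, d):          # targets are +1 / -1
              y = 1.0 if x @ w >= 0 else -1.0  # hard-limiter output
              if y != target:
                  w += eta * target * x        # adjust weights on a misclassification
                  errors += 1
          if errors == 0:                      # decision hyperplane separates the classes
              break
      return w

  # Two linearly separable classes; the convergence theorem guarantees the
  # loop above terminates after a finite number of corrections.
  X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
  d = np.array([1, 1, -1, -1])
  w = train_perceptron(X, d)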

3
Adaptive Filtering Problem
  • An unknown dynamical system (Fig. 3.1) with
  • input: m-dimensional vector
    x(i) = [x1(i), x2(i), ..., xm(i)]^T
  • output: scalar d(i)
  • Adaptive filter: a model of the unknown dynamical
    system, built around a single linear neuron
  • The algorithm starts from an arbitrary setting of
    the neuron's synaptic weights
  • Adjustments to the synaptic weights, in response
    to statistical variations in the system's
    behavior, are made on a continuous basis
  • Computations of adjustments to the synaptic
    weights are completed within a time interval that
    is one sampling period long

4
Adaptive Filter's Operation
  • Filtering process (computes two signals)
  • output: y(i) = v(i) = x^T(i) w(i)
  • error signal: e(i) = d(i) - y(i), where d(i) acts
    as a target signal
  • Adaptive process
  • automatic adjustment of the synaptic weights of
    the neuron in accordance with the error signal
    e(i)
  • Thus, the combination of these two processes
    working together constitutes a feedback loop
    acting around the neuron (a sketch of this loop
    follows)
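
A minimal sketch of the feedback loop formed by the filtering and adaptive
processes around a single linear neuron. The slides above only define y(i)
and e(i); the weight update used here is the LMS-style step
w(i+1) = w(i) + eta e(i) x(i), assumed for illustration.

  import numpy as np

  def adaptive_filter(X, d, eta=0.01):
      n_samples, m = X.shape
      w = np.zeros(m)             # arbitrary initial setting of the synaptic weights
      for i in range(n_samples):
          y = X[i] @ w            # filtering process: y(i) = x^T(i) w(i)
          e = d[i] - y            # error signal: e(i) = d(i) - y(i)
          w = w + eta * e * X[i]  # adaptive process: adjust weights from e(i)
      return w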

5
Unconstrained Optimization Techniques
  • Cost function E(w)
  • continuously differentiable
  • a measure of how to choose w of an adaptive
    filtering algorithm so that it behaves in an
    optimum manner
  • we want to find an optimal solution w that
    minimizes E(w)
  • local iterative descent
  • starting with an initial guess denoted by w(0),
    generate a sequence of weight vectors w(1), w(2),
    ..., such that the cost function E(w) is reduced
    at each iteration of the algorithm, as shown by
  • E(w(n+1)) < E(w(n))
  • Steepest Descent, Newton's, and Gauss-Newton's
    methods

6
Method of Steepest Descent
  • Here the successive adjustments applied to w are
    in the direction of steepest descent, that is, in
    a direction opposite to the gradient vector ∇E(w)
  • w(n+1) = w(n) - a g(n)  (a sketch follows this
    slide)
  • a: a small positive constant called the step size
    or learning-rate parameter
  • g(n) = ∇E(w(n))
  • The method of steepest descent converges to the
    optimal solution w slowly
  • The learning-rate parameter a has a profound
    influence on its convergence behavior
  • overdamped, underdamped, or even unstable
    (divergent)
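
A minimal sketch of the steepest-descent iteration w(n+1) = w(n) - a g(n),
applied to a hypothetical quadratic cost E(w) = 1/2 w^T A w - b^T w; the
cost, its gradient, and the numbers are assumptions chosen for illustration.

  import numpy as np

  def steepest_descent(grad, w0, a=0.1, n_iter=100):
      w = np.asarray(w0, dtype=float)
      for _ in range(n_iter):
          g = grad(w)    # g(n) = grad E(w(n))
          w = w - a * g  # step opposite to the gradient
      return w

  A = np.array([[3.0, 1.0], [1.0, 2.0]])
  b = np.array([1.0, 1.0])
  w_min = steepest_descent(lambda w: A @ w - b, w0=[0.0, 0.0], a=0.1)
  # A larger a speeds up the transient response but can make the trajectory
  # underdamped or even divergent, as noted above.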

7
Newton's Method (I)
  • Using a second-order Taylor series expansion of
    the cost function around the point w(n):
  • ΔE(w(n)) = E(w(n+1)) - E(w(n))
             ≈ g^T(n) Δw(n) + 1/2 Δw^T(n) H(n) Δw(n)
  • where Δw(n) = w(n+1) - w(n),
  • H(n) = Hessian matrix of E(w) evaluated at w(n)
    (eq. 3.15)
  • We want the Δw(n) that minimizes ΔE(w(n)), so
    differentiate ΔE(w(n)) with respect to Δw(n):
  • g(n) + H(n) Δw(n) = 0
  • so,
  • Δw(n) = -H^{-1}(n) g(n)

8
Newton's Method (II)
  • Finally,
  • w(n+1) = w(n) + Δw(n)
           = w(n) - H^{-1}(n) g(n)
  • Newton's method converges quickly asymptotically
    and does not exhibit the zigzagging behavior of
    steepest descent (a sketch follows this slide)
  • the Hessian H(n) has to be a positive definite
    matrix for all n
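
A minimal sketch of Newton's iteration w(n+1) = w(n) - H^{-1}(n) g(n),
assuming the caller supplies the gradient and a positive definite Hessian;
on the same hypothetical quadratic cost as before, a single Newton step
lands exactly on the minimum.

  import numpy as np

  def newton_method(grad, hess, w0, n_iter=20):
      w = np.asarray(w0, dtype=float)
      for _ in range(n_iter):
          g = grad(w)
          H = hess(w)
          # Compute H^{-1}(n) g(n) by solving a linear system instead of
          # forming the inverse of H(n) explicitly.
          w = w - np.linalg.solve(H, g)
      return w

  A = np.array([[3.0, 1.0], [1.0, 2.0]])
  b = np.array([1.0, 1.0])
  w_star = newton_method(lambda w: A @ w - b, lambda w: A, w0=[0.0, 0.0], n_iter=1)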

9
Gauss-Newton Method (I)
  • The Gauss-Newton method is applicable to a cost
    function expressed as the sum of error squares,
  • E(w) = 1/2 Σ_{i=1}^{n} e^2(i)
  • Because the error signal e(i) is a function of w,
    we linearize the dependence of e(i) on w by
    writing
  • e'(i, w) = e(i) + [∂e(i)/∂w]^T |_{w=w(n)} (w - w(n)),
    i = 1, 2, ..., n
  • Equivalently, by using matrix notation we may
    write
  • e'(n, w) = e(n) + J(n) (w - w(n))

10
Gauss-Newton Method (II)
  • where J(n) is the n-by-m Jacobian matrix of e(n)
    (eq. 3.20)
  • We want the updated weight vector w(n+1) defined by
  • w(n+1) = arg min_w { 1/2 ||e'(n, w)||^2 }
  • A simple algebraic calculation gives
  • 1/2 ||e'(n, w)||^2 = 1/2 ||e(n)||^2
    + e^T(n) J(n) (w - w(n))
    + 1/2 (w - w(n))^T J^T(n) J(n) (w - w(n))
  • Now differentiate this expression with respect to
    w and set the result to 0; we obtain
  • J^T(n) e(n) + J^T(n) J(n) (w - w(n)) = 0

11
Gauss-Newton Method (III)
  • Thus we get
  • w(n+1) = w(n) - [J^T(n) J(n)]^{-1} J^T(n) e(n)
  • To guard against the possibility that the matrix
    product J^T(n) J(n) is singular, the customary
    practice is to add the diagonal matrix δI:
  • w(n+1) = w(n) - [J^T(n) J(n) + δI]^{-1} J^T(n) e(n)
  • where δ is a small positive constant (a sketch
    follows this slide)
  • The effect of this modification is progressively
    reduced as the number of iterations, n, is
    increased
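
A minimal sketch of the regularized Gauss-Newton update above, assuming the
caller supplies the error vector e(w) and its Jacobian J(w); the
linear-neuron fitting example at the end (matrix X, targets d) is
illustrative.

  import numpy as np

  def gauss_newton(residual, jacobian, w0, delta=1e-6, n_iter=20):
      w = np.asarray(w0, dtype=float)
      for _ in range(n_iter):
          e = residual(w)                        # e(n): stacked error signals
          J = jacobian(w)                        # n-by-m Jacobian of e(n) w.r.t. w
          A = J.T @ J + delta * np.eye(len(w))   # delta*I guards against a singular J^T J
          w = w - np.linalg.solve(A, J.T @ e)    # regularized Gauss-Newton step
      return w

  # Fitting a linear neuron y = x^T w to targets d, so e(w) = d - X w and J(w) = -X.
  X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0]])
  d = np.array([1.0, 0.5, 2.0])
  w_fit = gauss_newton(lambda w: d - X @ w, lambda w: -X, w0=[0.0, 0.0])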