Title: Lecture 4 Learning Theory of Neural Networks
1 Lecture 4: Learning Theory of Neural Networks
Saint Petersburg State Polytechnic University, Distributed Intelligent Systems Dept., ARTIFICIAL NEURAL NETWORKS
- Prof. Dr. Viacheslav P. Shkodyrev
- EMail: shkodyrev_at_imop.spbstu.ru
One of the most remarkable features of the human brain is its ability to learn and to self-learn. Artificial Neural Networks simulate a similar ability to adapt to, or learn from, a set of training patterns. How do they do this, and which mathematical models describe these solutions? We present an algebraic approach to the learning theory of Neural Networks.
2 Learning is the main neural paradigm. From an information point of view, learning encodes information in a Neural Network: a system learns a pattern if it encodes that pattern in its structure. The structure of the system therefore changes as the system learns the information.
- Objective of Lecture 4
- The main purpose of this lecture is to introduce the mathematical background of the learning theory of ANNs, including different methods and techniques. The lecture covers an algebraic interpretation of learning via optimization theory and algorithmic methods for solving the task as an unconstrained optimization problem. We will discuss
- Generalized model and different learning paradigms,
- Algebraic approach to learning via optimization theory,
- Deterministic model of the optimization task solution,
- Computational models and techniques of learning via the gradient descent algorithm,
- Stochastic gradient strategy.
3 One of the most remarkable features of the human brain is its ability to learn and to self-learn. ANNs simulate a similar ability to adapt to, or learn from, a set of training patterns. How do they do this, and which mathematical models describe these solutions?
The processing ability of the network is stored in the inter-unit connection strengths, or weights, obtained by a process of adaptation to, or learning from, a set of training patterns.
4 Learning, or training, is a purposeful adjustment of the synaptic link weights so as to achieve the best desirable reaction to the observed input patterns. The most typical approach to this problem is based on the minimization of an error function, or computational energy function (also called a cost or an objective function).
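A minimal sketch of such an error function for a single linear neuron, written in Python; the training patterns X, desired reactions d, and initial weights are hypothetical values chosen only for illustration.

import numpy as np

def cost(w, X, d):
    # Squared-error cost E(w) = 1/2 * sum_p (d_p - y_p)^2 for a linear neuron y = X @ w
    y = X @ w          # network reaction to each input pattern
    e = d - y          # deviation from the desired reaction
    return 0.5 * np.sum(e ** 2)

# Hypothetical training set: 4 observations with 2 inputs each
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
d = np.array([1.0, 0.0, 1.0, 0.5])   # desired reactions
w = np.zeros(2)                      # initial synaptic weights
print(cost(w, X, d))                 # cost before any training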
5 Today, there are three main learning paradigms: supervised learning, unsupervised (self-organized) learning, and reinforcement learning.
6 Neural Net Functionality
Formulation of the optimization problem. General formulation: find the weight vector that minimizes the cost,
$\mathbf{w}^{*} = \arg\min_{\mathbf{w} \in \mathbb{R}^{n}} E(\mathbf{w})$,
where $E(\mathbf{w})$ is the error (cost) function of the network weights $\mathbf{w}$.
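As a concrete instance of this general formulation (an assumption, since the slide's own formulas did not survive extraction), supervised learning typically takes the summed squared error over the $P$ training pairs as the cost:

$$
E(\mathbf{w}) = \frac{1}{2} \sum_{p=1}^{P} \left\| \mathbf{d}_{p} - \mathbf{y}(\mathbf{x}_{p}; \mathbf{w}) \right\|^{2},
\qquad
\mathbf{w}^{*} = \arg\min_{\mathbf{w} \in \mathbb{R}^{n}} E(\mathbf{w}),
$$

where $\mathbf{x}_{p}$ are the input patterns, $\mathbf{d}_{p}$ the desired reactions, and $\mathbf{y}$ the network response.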
8 The deterministic strategy of iterative solution is based on the assumption that the cost function is deterministic and may be described by a set of ordinary differential equations. This allows the step direction to be determined from the gradient of the error function,
$\mathbf{w}_{k+1} = \mathbf{w}_{k} - \eta_{k} \nabla E(\mathbf{w}_{k})$,
where $\mathbf{H} = \nabla^{2} E(\mathbf{w})$ is the Hessian matrix. On the basis of this strategy several techniques were designed. Among them are
9 Steepest ascent/descent strategy. This is the most obvious and trivial (but tedious) search strategy: step by step, we evaluate all the derivatives at every point and take the direction of the extremum search from the gradient (trajectory 1 in the picture; a Python sketch follows below).
Fast (quickest) descent strategy. This accelerates the calculation at the sacrifice of generality: along one direction of the steepest strategy we follow the one-directional derivative until a one-directional minimum is found. The process is repeated with every coordinate sequentially (trajectory 2 in the picture).
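A minimal Python sketch of the steepest-descent strategy, assuming a quadratic error surface $E(\mathbf{w}) = \frac{1}{2}\mathbf{w}^{T}\mathbf{A}\mathbf{w}$ with a hypothetical matrix A; the step size eta and tolerance are illustrative choices, not values from the lecture.

import numpy as np

def steepest_descent(grad, w0, eta=0.1, tol=1e-6, max_iter=1000):
    # Step against the gradient at every iteration: w_{k+1} = w_k - eta * grad E(w_k)
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        g = grad(w)
        if np.linalg.norm(g) < tol:   # gradient nearly zero: a stationary point
            break
        w = w - eta * g
    return w

# Quadratic surface E(w) = 0.5 * w^T A w, so grad E(w) = A w, minimum at the origin
A = np.array([[3.0, 0.5], [0.5, 1.0]])
print(steepest_descent(lambda w: A @ w, w0=[2.0, -1.5]))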
10 Newton's method strategy. This approach takes into account information about higher derivatives: the second derivatives of the error surface, collected in the Hessian matrix (trajectory 3 in the picture; a sketch of one Newton step follows below).
Quasi-Newton (variable metric) method strategy. This approach simplifies the calculation of the inverse Hessian matrix by approximating it with some symmetric positive definite matrix $\mathbf{H}_{k}$.
Conjugate gradient methods. This approach is based on computing the actual search direction as a linear combination of the current gradient vector, which characterizes the current slope of the error surface, and the previous search direction (trajectory 4 in the picture).
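A minimal sketch of one Newton step in Python; the quadratic error surface, its gradient, and its Hessian A are hypothetical, chosen so that the behavior (one step reaches the exact minimum) is easy to verify.

import numpy as np

def newton_step(grad, hess, w):
    # One Newton step: w_{k+1} = w_k - H(w_k)^{-1} grad E(w_k)
    g = grad(w)
    H = hess(w)
    return w - np.linalg.solve(H, g)   # solve H p = g rather than inverting H

# Quadratic surface E(w) = 0.5 w^T A w - b^T w: gradient A w - b, Hessian A
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, 2.0])
w = newton_step(lambda w: A @ w - b, lambda w: A, np.zeros(2))
print(w)   # the exact minimizer A^{-1} b, reached in a single step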
11 General case of a nonlinear cost function
13 Energy model of a potential well
To illustrate the stochastic approach to the local-minimum problem we can use a small iron ball on a rough surface with many pits or dimples. How can we move the ball to the deepest, global pit if it is placed in a shallow one?
To drag the ball out of such a pit we shake the surface until the ball jumps from one pit to another and, finally, lands in the deepest pit.
The mathematical analogue of shaking the surface is a stochastic variation of the energy function parameters.
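A Python sketch of this shaking idea in the form of simulated annealing; the two-well energy function, cooling schedule, and step scale are illustrative assumptions rather than values from the lecture.

import numpy as np

rng = np.random.default_rng(0)

def anneal(E, w0, T0=2.0, cooling=0.999, steps=4000, scale=1.0):
    # Random shakes of the parameters, accepted with Boltzmann probability exp(-dE/T);
    # the temperature T is gradually lowered so the ball settles in a deep pit.
    w, T = np.asarray(w0, dtype=float), T0
    for _ in range(steps):
        cand = w + scale * rng.standard_normal(w.shape)  # shake the surface
        dE = E(cand) - E(w)
        if dE < 0 or rng.random() < np.exp(-dE / T):     # downhill, or uphill by chance
            w = cand
        T *= cooling                                     # cool the schedule
    return w

# Two pits: a deeper one near w = +2 and a shallower one near w = -2
E = lambda w: (w[0] ** 2 - 4) ** 2 / 8 + 0.5 * np.sin(3 * w[0])
print(anneal(E, [-2.0]))   # starting in the shallow pit, usually ends near w = +2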
16 The Boltzmann Machine is a special class of stochastic feedback NN consisting of binary neurons connected mutually by symmetric weights. The state of this stochastic feedback NN is described by the computational energy function
$E(\mathbf{s}) = -\frac{1}{2} \sum_{i \neq j} w_{ij} s_{i} s_{j} + \sum_{i} \theta_{i} s_{i}$.
Using this energy function, the machine simulates an annealing approach, searching for the state of the neurons corresponding to the global minimum of the energy function.
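A minimal Python sketch of such a machine under an annealing schedule, assuming binary 0/1 neurons and the energy function above; the weights W, thresholds theta, and temperature schedule are hypothetical.

import numpy as np

rng = np.random.default_rng(1)

def energy(s, W, theta):
    # Computational energy E(s) = -1/2 s^T W s + theta^T s
    return -0.5 * s @ W @ s + theta @ s

def gibbs_sweep(s, W, theta, T):
    # One stochastic sweep: each neuron fires with Boltzmann (sigmoidal) probability
    for i in range(len(s)):
        dE = W[i] @ s - theta[i]            # local field: energy drop if s_i = 1
        p = 1.0 / (1.0 + np.exp(-dE / T))
        s[i] = 1.0 if rng.random() < p else 0.0
    return s

# Hypothetical 4-neuron machine: symmetric weights, zero self-connections
W = np.array([[0, 1, -1, 0], [1, 0, 1, -1], [-1, 1, 0, 1], [0, -1, 1, 0]], float)
theta = np.zeros(4)
s = rng.integers(0, 2, size=4).astype(float)
for T in np.geomspace(4.0, 0.1, 50):        # annealing: high T down to low T
    s = gibbs_sweep(s, W, theta, T)
print(s, energy(s, W, theta))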
17 Conjugate Gradient nD algorithm
The algorithm determines a local minimum of a function of n independent variables with the Conjugate Gradient method.
LabVIEW Library
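As an illustration of the method this library implements (a plain Python sketch under stated assumptions, not the LabVIEW code), linear conjugate gradient minimizing the quadratic cost E(w) = 1/2 w^T A w - b^T w; the matrix A and vector b are hypothetical.

import numpy as np

def conjugate_gradient(A, b, tol=1e-10):
    # Each new direction mixes the current gradient (residual) with the previous
    # direction, as described on slide 10: d_k = r_k + beta_k * d_{k-1}
    w = np.zeros_like(b)
    r = b - A @ w                 # negative gradient of E at w
    d = r.copy()                  # first direction: plain steepest descent
    for _ in range(len(b)):
        alpha = (r @ r) / (d @ A @ d)      # exact line search along d
        w = w + alpha * d
        r_new = r - alpha * (A @ d)
        if np.linalg.norm(r_new) < tol:
            break
        beta = (r_new @ r_new) / (r @ r)   # Fletcher-Reeves coefficient
        d = r_new + beta * d
        r = r_new
    return w

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b))   # reaches the minimizer in at most n = 2 steps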
18 Downhill Simplex nD algorithm
The algorithm determines a local minimum of a function of n independent variables with the Downhill Simplex method.
LabVIEW Library
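Because the Downhill Simplex (Nelder-Mead) method needs no derivatives, a short sketch using SciPy's implementation can stand in for the LabVIEW library; the Rosenbrock test function and starting point are illustrative choices.

import numpy as np
from scipy.optimize import minimize

def rosenbrock(w):
    # Classic banana-shaped valley with its minimum at (1, 1)
    return (1 - w[0]) ** 2 + 100 * (w[1] - w[0] ** 2) ** 2

result = minimize(rosenbrock, x0=np.array([-1.2, 1.0]), method='Nelder-Mead')
print(result.x)   # approaches (1, 1) using only function evaluations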