1
WK4 Radial Basis Function Networks
CS 476: Networks of Neural Computation
Dr. Stathis Kasderidis
Dept. of Computer Science, University of Crete
Spring Semester, 2009
2
Contents
  • Introduction to Time Series Analysis
  • Prediction Problem
  • Predicting Time Series with Neural Networks
  • Radial Basis Function Network
  • Conclusions

Contents
3
Introduction to Time Series Analysis
  • There are two major classes of statistical
    problems
  • Classification problems (given an input x, find
    to which of a set of K known classes it belongs)
  • Regression problems (try to build a functional
    relationship between independent and regressed
    variables; the former are the causes, while the
    latter are the effects).
  • Regression problems arise from the need for
  • Explanation
  • Prediction
  • Control

Time Series
4
Introduction to Time Series Analysis II
  • In a regression problem, there are two high-level
    issues to determine
  • The nature of the mechanism that generates the
    data (stochastic or deterministic). This affects
    which class of models we will use
  • A modelling procedure.

Time Series
5
Introduction to Time Series Analysis III
  • A modelling procedure usually includes the
    following steps
  • Specification of a model
  • Whether it describes a function or a probability
    distribution
  • Whether it is linear or non-linear
  • Whether it is parametric or non-parametric
  • Whether it is a mixture or a single function
  • Whether it includes time explicitly or not
  • Whether it includes memory or not.

Time Series
6
Introduction to Time Series Analysis IV
  • Preparation of the data
  • Noise reduction
  • Scaling
  • Appropriate representation for the target
    problem
  • Transformations
  • De-correlation (cleaning up spatial or temporal
    correlation structure)
  • Feature extraction
  • Handling missing values

Time Series
7
Introduction to Time Series Analysis V
  • An estimation procedure (i.e. a framework to
    estimate the model parameters)
  • Maximum Likelihood estimation
  • Bayesian estimation
  • (Ordinary) Least Squares
  • Numerical Techniques used in the estimation
    framework are
  • Optimisation
  • Integration
  • Graph-Theoretic methods
  • etc

Time Series
8
Introduction to Time Series Analysis VI
  • Availability of data
  • Enough in number
  • Quality
  • Resolution.
  • Resulting estimators created by the framework
    must be
  • Un-biased (i.e. they do not systematically differ
    from the true model in a statistical sense)
  • Consistent (i.e. as the amount of data grows the
    estimator approaches the true model with
    probability 1).

Time Series
9
Introduction to Time Series Analysis VII
  • A model selection procedure (i.e. to select the
    best model). Factors include
  • Goodness of fit (i.e. how well the model fits the
    given data)
  • Generalisation (i.e. how well it approximates the
    underlying data generation mechanism)
  • Confidence Intervals.

Time Series
10
Introduction to Time Series Analysis VIII
  • Testing a model
  • Testing the model on out-of-sample data
  • Re-iterate the modelling procedure until we
    produce a model with which we are satisfied
  • Compare different classes of models in order to
    find the best one
  • Usually we select the simplest class which
    describes the data well
  • A comparison framework among different classes of
    models is not always available.
  • Neural networks are semi-parametric, non-linear
    statistical modelling techniques.

Time Series
11
The Prediction Problem
  • Def A time series, Xt, is a family of
    real-valued random variables indexed by t. The
    index t can take values in ℝ (continuous time) or
    ℤ (discrete time).
  • When a family of variables is defined at all
    points in time it is called continuous, otherwise
    it is called discrete.
  • In practice we always have a discrete series, due
    to discrete sampling times of a continuous series
    or due to digitization.
  • The length of a series is the time elapsed
    between the recorded start and finish of the
    series.

Prediction
12
The Prediction Problem II
  • Def A time series, Xt, is called (strictly)
    stationary if, for any t1, t2, ..., tn ∈ I, any k ∈ I
    and n = 1, 2, ...,
  • P_{Xt1, ..., Xtn}(x1, ..., xn) = P_{Xt1+k, ..., Xtn+k}(x1, ..., xn)
  • where P denotes the joint distribution function
    of the set of random variables which appear as
    suffices and I is an appropriate indexing set.
  • Broadly speaking a time series is stationary if
    there is no systematic change in mean, if there
    is no systematic change in variance, and if
    strictly periodic variations have been removed.

Prediction
13
The Prediction Problem III
  • In classical time series analysis we decompose a
    time series to the following components
  • A trend (a long term movement)
  • Fluctuations about the trend, of greater or
    lesser regularity
  • A seasonal component
  • A residual (irregular or random effect).
  • Typically probability theory of time series
    examines stationary series and investigates
    residuals for further structure. However, in
    other cases we may be interested in capturing the
    trend (i.e. function approximation).

Prediction
14
The Prediction Problem IV
  • It is assumed that if the residuals do not
    contain any further structure, then they behave
    like an IID (independent and identically
    distributed) process, which is usually assumed to
    be normal. Such a stochastic process cannot be
    modelled further, thus the analysis of the time
    series terminates
  • If on the other hand the series contains more
    structure, we re-iterate the analysis until the
    residuals do not contain any structure.
  • Tests to use for checking the normality of the
    residuals are
  • Kolmogorov-Smirnov test
  • BDS test, etc
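  • As an illustration, a minimal sketch of a
    Kolmogorov-Smirnov normality check on residuals,
    using scipy.stats.kstest; the residuals array here
    is a synthetic stand-in for real model residuals:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    residuals = rng.normal(size=500)   # stand-in for real model residuals

    # Standardise, then compare against the standard normal distribution.
    z = (residuals - residuals.mean()) / residuals.std(ddof=1)
    result = stats.kstest(z, "norm")
    print(f"KS statistic = {result.statistic:.3f}, p-value = {result.pvalue:.3f}")
    # A small p-value suggests the residuals still contain structure;
    # a large one gives no evidence against normality, so the analysis can stop.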

Prediction
15
The Prediction Problem V
  • If the structure of the series is linear then we
    fit a linear model such as ARMA, or, if it is also
    non-stationary, the ARIMA model.
  • On the other hand, for non-linear structure we use
    ARCH, GARCH and neural network models.
    Typically we fit the linear component first with
    a linear model and then the residuals with a
    non-linear model.
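  • A minimal sketch of this two-stage idea, assuming
    a recent statsmodels installation; the series and
    the ARIMA order (2, 1, 1) are purely illustrative:

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(1)
    series = np.cumsum(rng.normal(size=300))   # stand-in for a real, non-stationary series

    # Fit the linear component first (AR=2, one difference, MA=1).
    linear_fit = ARIMA(series, order=(2, 1, 1)).fit()
    residuals = linear_fit.resid

    # The residuals would then be offered to a non-linear model (e.g. an RBF network).
    print(residuals[:5])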

Prediction
16
The Prediction Problem VI
  • Usually a time series does not have all the
    desirable statistical properties, so we transform
    it before we start the analysis in order to
    achieve better results. Typical transforms include
  • Stabilise the variance
  • Make seasonal effects additive
  • Make the data normally distributed
  • Filtering (FFT, moving averages, exponential
    smoothing, low and high-pass filters, etc)
  • Differencing (the preferred method for
    de-trending; we apply differencing until the time
    series becomes stationary, as in the sketch below).
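  • A minimal sketch of de-trending by differencing;
    the trended series is synthetic, and a single
    difference is enough here because the trend is
    linear:

    import numpy as np

    rng = np.random.default_rng(2)
    t = np.arange(400)
    series = 0.05 * t + rng.normal(size=400)   # linear trend + noise

    def difference(x, d=1):
        """Apply first differences d times (one np.diff per order)."""
        for _ in range(d):
            x = np.diff(x)
        return x

    detrended = difference(series, d=1)        # one difference removes a linear trend
    print(round(series.std(), 2), round(detrended.std(), 2))

  • In practice the order d is increased until a
    stationarity check on the differenced series is
    satisfied.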

Prediction
17
The Prediction Problem VII
  • Restating the prediction problem
  • We want to construct a model, with an appropriate
    technique, which, when estimated, can give
    'good' forecasts on new data. The new data are
    commonly some future values of the series. We
    want the model
  • to predict as accurately as possible the future
    values of the time series, given as input some
    previous values of the series.

Prediction
18
The Prediction Problem VIII
  • There are three main approaches which are used to
    model the series prediction problem
  • A. Assume a functional relationship as a
    generating mechanism, e.g. Xt+1 = F(Xt), where Xt
    is an appropriate vector of past values and F is
    the generating mechanism (a sketch of building
    such lagged input vectors follows this list)
  • B. Assume that the map F has multiple branches.
    Then the returned output represents the
    probability of obtaining Xt+1 in any one of the
    branches of F.
  • C. Divide the input into a set of classes and try
    to learn the map from input to classes, i.e. a
    classification problem.
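  • A minimal sketch of approach A: casting prediction
    as learning Xt+1 = F(Xt) by building lagged input
    vectors; the window length p is a modelling
    choice, not given by the data:

    import numpy as np

    def make_supervised(series, p):
        """Return inputs X of shape (n, p) and next-step targets y of shape (n,)."""
        X = np.array([series[t - p:t] for t in range(p, len(series))])
        y = series[p:]
        return X, y

    series = np.sin(np.linspace(0, 20, 200))   # toy series
    X, y = make_supervised(series, p=5)
    print(X.shape, y.shape)                    # (195, 5) (195,)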

Prediction
19
Time Series Prediction using Neural Networks
  • To apply a neural network model in time series
    prediction we have to make choices on the
    following issues
  • Preparing the data
  • Transforming the data (see above)
  • Handling missing values
  • Smoothing the data (if needed)
  • Scaling the data (almost always a good idea! see
    the sketch after this list)
  • Dimensionality reduction (principal component
    analysis, factor analysis)
  • De-correlating data
  • Extracting features (i.e. combinations of
    variables)
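  • A minimal sketch of the scaling step: z-score
    scaling fitted on the training portion only and
    then applied to both portions, so no information
    leaks from the future into the past:

    import numpy as np

    def fit_scaler(train):
        """Return a z-score scaling function fitted on the training data only."""
        mean, std = train.mean(axis=0), train.std(axis=0)
        return lambda x: (x - mean) / std

    series = np.random.default_rng(3).normal(loc=10.0, scale=2.0, size=500)
    train, test = series[:400], series[400:]

    scale = fit_scaler(train)
    train_scaled, test_scaled = scale(train), scale(test)
    print(round(train_scaled.mean(), 3), round(train_scaled.std(), 3))   # ~0.0, ~1.0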

TS NNs
20
Time Series Prediction using Neural Networks II
  • Representing variables
  • Continuous or discrete
  • Semantics of variables (i.e. probabilities,
    categories, data points, etc)
  • Distributed or atomic representation
  • Variables with little information content can be
    harmful to generalisation
  • In Bayesian estimation the method of Automatic
    Relevance Determination can be used for selecting
    variables
  • Selecting Features
  • Capturing of causal relations

TS NNs
21
Time Series Prediction using Neural Networks III
  • Discovering memory in the generating process (a
    simple autocorrelation sketch follows this list)
  • Trial and error
  • Partial Auto-correlation functions (linear)
  • Mutual Information function (non-linear)
  • Methods from Dynamical Systems theory
  • Determination of past values by fitting a model
    (e.g. linear) and eliminating past values with
    small contribution based on sensitivity.
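  • A minimal sketch of a linear memory check, using
    the plain (not partial) autocorrelation function
    for brevity; lags whose correlation exceeds the
    usual 1.96/sqrt(N) band are kept as candidate
    inputs:

    import numpy as np

    def autocorr(x, max_lag):
        """Sample autocorrelation at lags 1..max_lag."""
        x = x - x.mean()
        denom = np.dot(x, x)
        return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

    rng = np.random.default_rng(4)
    x = np.zeros(1000)
    for t in range(2, 1000):                   # toy AR(2)-like generating process
        x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.normal()

    r = autocorr(x, max_lag=10)
    threshold = 1.96 / np.sqrt(len(x))         # rough significance band
    candidate_lags = np.where(np.abs(r) > threshold)[0] + 1
    print(candidate_lags)                      # lags worth offering to the model as inputs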

TS NNs
22
Time Series Prediction using Neural Networks IV
  • Selecting an architecture
  • Type of training
  • Family of models
  • Transfer function
  • Memory
  • Network Topology
  • Other parameters in network specification.
  • Model selection
  • See discussion in WK3

TS NNs
23
Time Series Prediction using Neural Networks V
  • Determination of Confidence Intervals (a
    percentile-interval sketch follows this list)
  • Jackknife Method (a linear approximation of the
    Bootstrap)
  • Bootstrap
  • Moving Blocks Bootstrap
  • Bootstrap t-interval
  • Bootstrap percentile interval
  • Bias-corrected and accelerated Bootstrap.
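  • A minimal sketch of the Bootstrap percentile
    interval for a forecast-error statistic; the error
    sample is synthetic and the errors are resampled
    independently (the Moving Blocks variant would
    resample blocks to respect serial dependence):

    import numpy as np

    rng = np.random.default_rng(5)
    errors = rng.normal(loc=0.1, scale=1.0, size=200)   # stand-in out-of-sample errors

    B = 2000                                             # number of bootstrap resamples
    boot_means = np.array([
        rng.choice(errors, size=len(errors), replace=True).mean()
        for _ in range(B)
    ])
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    print(f"95% percentile interval for the mean error: [{lo:.3f}, {hi:.3f}]")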

TS NNs
24
Time Series Prediction using Neural Networks VI
  • Additional Literature
  • Masters T. (1995). Neural, Novel & Hybrid
    Algorithms for Time Series Prediction. Wiley.
  • Pawitan Y. (2001). In All Likelihood: Statistical
    Modelling and Inference Using Likelihood. Oxford
    University Press.
  • Chatfield C. (1989). The Analysis of Time Series:
    An Introduction, 4th Ed. Chapman & Hall.
  • Harvey A. (1993). Time Series Models. Harvester
    Wheatsheaf.
  • Efron B., Tibshirani R. (1993). An Introduction
    to the Bootstrap. Chapman & Hall.

TS NNs
25
Radial Basis Function Model
  • There are only three layers: Input, Hidden and
    Output. There is only one hidden layer.

RBF Model
26
Radial Basis Function Model II
  • The hidden layer provides a non-linear
    transformation of the input space to the hidden
    space, which is usually assumed to be of high
    enough dimension.
  • The output layer combines the activations of the
    hidden layer in a linear way.
  • Note The RBF model owes its development to ideas
    of fitting hyper-surfaces to data points in a
    high-dimensional space.
  • In Numerical Analysis, radial-basis functions
    were introduced for the solution of real
    multivariate interpolation problems.

RBF Model
27
Radial Basis Function Model III
  • In the RBF model the hidden units provide a set
    of functions that constitute an arbitrary
    basis for the input patterns when they are
    expanded into the hidden space.
  • The inspiration for the RBF model is based on
    Cover's theorem (1965) on the separability of
    patterns
  • A complex pattern-classification problem cast
    non-linearly in a high-dimensional space is more
    likely to be linearly separable than in a
    low-dimensional space.
  • This leads us to consider the multivariable
    interpolation problem in high-dimensional space

RBF Model
28
Radial Basis Function Model IV
  • Given a set of N different points xi ∈ ℝ^m0,
    i = 1, 2, ..., N, and a corresponding set of N real
    numbers di ∈ ℝ^1, i = 1, 2, ..., N, find a function
    F: ℝ^m0 → ℝ^1 that satisfies the interpolation
    condition
  • F(xi) = di ,  i = 1, 2, ..., N
  • For strict interpolation the interpolating
    surface, i.e. F, is constrained to pass through
    all data points.
  • The radial-basis function (RBF) technique
    consists of choosing a function F that has the
    following form
  • F(x) = Σi wi φ(‖x − xi‖) ,  with the sum over
    i = 1, 2, ..., N

RBF Model
29
Radial Basis Function Model V
  • where φ(‖x − xi‖), i = 1, 2, ..., N, is a set of N
    arbitrary functions, known as radial-basis
    functions, and ‖·‖ denotes a norm, which is
    usually the Euclidean. The data points xi ∈ ℝ^m0
    are taken to be the centers of the radial-basis
    functions.
  • Assume that d denotes the desired response
    vector and w is the linear weight vector. N is
    the size of the training set. Let Φ denote an N x
    N matrix with elements
  • Φji = φ(‖xj − xi‖) ,  (j, i) = 1, 2, ..., N
  • Φ is called the interpolation matrix.

RBF Model
30
Radial Basis Function Model VI
  • Thus, according to the above interpolation
    conditions, we can write in matrix form
  • Φw = d
  • The solution for the weight vector is
  • w = Φ⁻¹d
  • assuming that Φ is non-singular. Micchelli's
    Theorem provides assurances for a set of
    functions that create a non-singular matrix Φ
  • Let xi, i = 1, ..., N, be a set of distinct points
    in ℝ^m0. Then the N x N interpolation matrix Φ is
    nonsingular.
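  • A minimal sketch of strict interpolation with a
    Gaussian basis: build the N x N interpolation
    matrix Φ and solve Φw = d; the data and the width
    σ = 1 are illustrative:

    import numpy as np

    def gaussian(r, sigma=1.0):
        """Gaussian radial-basis function."""
        return np.exp(-r**2 / (2 * sigma**2))

    rng = np.random.default_rng(6)
    X = rng.uniform(-1, 1, size=(20, 2))            # N = 20 distinct points in R^2 (m0 = 2)
    d = np.sin(X[:, 0]) + np.cos(X[:, 1])           # desired responses d_i

    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # ||x_j - x_i||
    Phi = gaussian(dists)                           # N x N interpolation matrix
    w = np.linalg.solve(Phi, d)                     # solves Phi w = d, i.e. w = Phi^(-1) d

    print(np.allclose(Phi @ w, d))                  # True: F passes through every data point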

RBF Model
31
Radial Basis Function Model VII
  • Functions that are covered by Micchelli's theorem
    include
  • Multiquadrics
  • φ(r) = (r² + c²)^½ ,  c > 0, r ∈ ℝ
  • Inverse multiquadrics
  • φ(r) = 1/(r² + c²)^½ ,  c > 0, r ∈ ℝ
  • Gaussian functions
  • φ(r) = exp(−r²/(2σ²)) ,  σ > 0, r ∈ ℝ
  • All that is required for a nonsingular Φ is that
    the points xi be distinct.

RBF Model
32
Radial Basis Function Model VIII
  • Universal Approximation Theorem for RBF Networks
  • For any continuous input-output mapping function
    f(x) there is an RBF network with a set of
    centers ti, i = 1, ..., m1, and a common width σ > 0,
    such that the input-output mapping function F(x)
    realized by the RBF network is close to f(x) in
    the Lp norm, p ∈ [1, ∞].
  • The RBF network consists of functions F: ℝ^m0
    → ℝ represented by
  • F(x) = Σi wi exp(−‖x − ti‖²/(2σ²)) ,  with the sum
    over i = 1, 2, ..., m1
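  • A minimal sketch of the approximating family in
    the theorem: an RBF network with m1 centers, a
    common width σ and linear output weights; the
    centers here are just a random subset of the
    inputs and the weights are random, whereas in
    practice the weights would be fitted (e.g. by
    least squares on the hidden activations):

    import numpy as np

    def rbf_forward(x, centers, w, sigma):
        """F(x) = sum_i w_i exp(-||x - t_i||^2 / (2 sigma^2)) for a batch of inputs x."""
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1)
        H = np.exp(-dists**2 / (2 * sigma**2))      # hidden-layer activations, shape (batch, m1)
        return H @ w                                # linear output layer

    rng = np.random.default_rng(7)
    X = rng.uniform(-1, 1, size=(200, 2))           # inputs in R^2 (m0 = 2)
    centers = X[rng.choice(len(X), size=10, replace=False)]   # m1 = 10 centers
    w = rng.normal(size=10)                         # linear output weights
    print(rbf_forward(X, centers, w, sigma=0.5).shape)        # (200,)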

RBF Model
33
Radial Basis Function Model IX
  • Results on Sample Complexity, Computational
    Complexity and Generalisation Performance for RBF
    Networks
  • The generalisation error converges to zero only
    if the number of hidden units, m1, increases more
    slowly than the size N of the training sample
  • For a given size N of the training sample, the
    optimum number of hidden units, m1, behaves as
  • m1 ∝ N^(1/3)
  • The RBF network exhibits a rate of approximation
    O(1/m1) that is similar to that of an MLP with
    sigmoid activation functions.
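  • As a rough worked example of the N^(1/3)
    heuristic: a training sample of N = 1000 points
    suggests on the order of m1 ≈ 1000^(1/3) = 10
    hidden units.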

RBF Model
34
Radial Basis Function Model X
  • Comparison of MLP and RBF networks
  • An RBF network has a single hidden layer. An MLP
    has one or more hidden layers
  • Typically the nodes of an MLP in a hidden or
    output layer share the same neuronal model. On
    the other hand the nodes of an RBF in a hidden
    layer play a different role than those in the
    output layer
  • The hidden layer of an RBF is non-linear. The
    output layer is linear. Typically in an MLP both
    layers are nonlinear

RBF Model
35
Radial Basis Function Model XI
  1. An RBF network computes, as the argument of its
    activation function, the Euclidean norm (distance)
    between the input vector and the center of the
    unit. In MLP networks the activation function
    computes the inner product of the input vector
    and the weight vector of the node
  2. MLPs are global approximators; RBFs are local
    approximators, due to the localised decaying
    Gaussian (or other) function.

RBF Model
36
Learning Law for Radial Basis Networks
  • To develop a learning law for RBF networks we
    assume that the error function has the following
    form
  • E = ½ Σj ej² ,  with the sum over j = 1, 2, ..., N
  • where N is the size of the training sample used
    to do the learning, and ej is the error signal
    defined by
  • ej = dj − Σi wi G(‖xj − ti‖Ci) ,  with the sum over
    i = 1, 2, ..., m1

RBF Model
37
Learning Law for Radial Basis Networks II
  • We need to find the free parameters wi, ti and
    Σi⁻¹ so as to minimise E. Ci is a norm weighting
    matrix, i.e.
  • ‖x‖C² = (Cx)ᵀ(Cx) = xᵀCᵀCx
  • We use a weighted norm matrix when the individual
    elements of x belong to different classes.
  • To calculate the update equations we use gradient
    descent on the instantaneous error function E. We
    get the following update rules for the free
    parameters

RBF Model
38
Learning Law for Radial Basis Networks III
  • Linear weights (output layer)
  • wi(n+1) = wi(n) − η1 ∂E(n)/∂wi(n) ,  i = 1, 2, ..., m1
  • Positions of centers (hidden layer)
  • ti(n+1) = ti(n) − η2 ∂E(n)/∂ti(n) ,  i = 1, 2, ..., m1

RBF Model
39
Learning Law for Radial Basis Networks IV
  • Spreads of centers (hidden layer)
  • Σi⁻¹(n+1) = Σi⁻¹(n) − η3 ∂E(n)/∂Σi⁻¹(n) ,  i = 1, 2, ..., m1
  • Note that three different learning rates η1, η2
    and η3 are used in the gradient descent equations.
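  • A minimal sketch of this learning law for a
    Gaussian RBF network, assuming a single scalar
    spread per center instead of the full
    norm-weighting matrix Ci; eta1, eta2 and eta3 play
    the role of the three learning rates and the
    gradients are taken directly from E = ½ Σj ej²:

    import numpy as np

    def train_rbf(X, d, m1, eta1=0.05, eta2=0.01, eta3=0.01, epochs=200, seed=0):
        """Gradient descent on E = 1/2 * sum_j e_j^2 for a Gaussian RBF network."""
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=m1, replace=False)].astype(float)
        w = rng.normal(scale=0.1, size=m1)
        sigma = np.full(m1, 1.0)

        for _ in range(epochs):
            diff = X[:, None, :] - centers[None, :, :]     # x_j - t_i, shape (N, m1, m0)
            sq = np.sum(diff**2, axis=-1)                  # ||x_j - t_i||^2
            G = np.exp(-sq / (2 * sigma**2))               # hidden activations G_ji
            e = d - G @ w                                  # error signals e_j

            grad_w = -G.T @ e                              # dE/dw_i
            grad_t = (-(w[None, :] * G * e[:, None])[..., None]
                      * diff / sigma[None, :, None]**2).sum(axis=0)                      # dE/dt_i
            grad_s = (-(w[None, :] * G * e[:, None]) * sq / sigma[None, :]**3).sum(axis=0)  # dE/dsigma_i

            w -= eta1 * grad_w                             # linear weights (output layer)
            centers -= eta2 * grad_t                       # positions of centers
            sigma -= eta3 * grad_s                         # spreads of centers
        return w, centers, sigma

    # Example usage with lagged inputs X of shape (N, p) and targets d of shape (N,):
    # w, centers, sigma = train_rbf(X, d, m1=10)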


RBF Model
40
Conclusions
  • In time series modelling we seek to extract the
    maximum possible structure we can find in the
    series.
  • We terminate the analysis of a series when the
    residuals do not contain any more structure, i.e.
    they have an IID structure.
  • NNs can be used as models in time series
    prediction.
  • RBF networks are a second paradigm of multi-layer
    networks, besides the MLP.
  • They are inspired by interpolation theory
    (numerical analysis).
  • They can be trained with the gradient descent
    method, as in the MLP case.

Conclusions