Radial Basis Function Networks - PowerPoint PPT Presentation

Slides: 39
Provided by: AndyPhil8

Transcript and Presenter's Notes

Title: Radial Basis Function Networks


1
Radial Basis Function Networks
2
Why network models beyond MLN?
  • An MLN (multi-layer network) is already a
    universal approximator, but
  • an MLN can have many local minima.
  • It is often too slow to train an MLN.
  • Sometimes, it is extremely difficult to optimize
    the structure of an MLN.
  • There may exist other network architectures

3
The idea of RBFNN (1)
  • MLN is one way to get non-linearity. The other is
    to use the generalized linear discriminant function,
    $y(\mathbf{x}) = \sum_{j=1}^{M} w_j \phi_j(\mathbf{x})$.
  • For a radial basis function (RBF), the basis
    function is radially symmetric with respect to the
    input: its value is determined by the distance from
    the data point to the RBF center.
  • For instance, the Gaussian RBF:
    $\phi_j(\mathbf{x}) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{c}_j\|^2}{2\sigma^2}\right)$
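As a minimal sketch of the Gaussian case (written in MATLAB, the language of the code slides; the width sigma = 0.4 and the center c = 0 are illustrative values, not taken from this slide), the value of the basis function depends only on the distance to the center:

```matlab
% Gaussian RBF: the value depends only on the distance |x - c|.
% sigma (width) and c (center) are illustrative values.
sigma = 0.4;
c     = 0;
phi   = @(x) exp(-0.5 * (x - c).^2 / sigma^2);

x      = [-0.4 0 0.4];   % two points at distance sigma, one at the center
values = phi(x);         % radial symmetry: values(1) equals values(3)
disp(values)             % the center gives 1, distance sigma gives exp(-0.5)
```

Points at the same distance on either side of the center give the same value, which is what "radial symmetry" means here.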

4
The idea of RBFNN (2)
  • For RBFNN, we expect that the function to be
    learned can be expressed as a linear superposition
    of a number of RBFs.

The function is described as a linear
superposition of three basis functions.
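A superposition of this kind can be sketched as follows. The three centers, weights and the common width are hypothetical values chosen only for illustration, and the subtraction `x - c` relies on implicit expansion (MATLAB R2016b or later, or Octave):

```matlab
% y(x) = sum_j w_j * phi_j(x) with three Gaussian basis functions.
c     = [-0.5 0 0.5];          % centers (hypothetical)
w     = [1 -2 1];              % weights (hypothetical)
sigma = 0.2;                   % common width (hypothetical)

x   = linspace(-1, 1, 201)';   % column of evaluation points
Phi = exp(-0.5 * (x - c).^2 / sigma^2);   % 201-by-3, Phi(n,j) = phi_j(x_n)
y   = Phi * w';                % weighted sum of the three basis functions

plot(x, y, 'k-', x, Phi .* w, '--')       % the sum and its three weighted terms
```

Each column of `Phi` is one basis function sampled on the grid, so the whole network output is a single matrix-vector product.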
5
The RBFNN (1)
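  • RBFNN: a two-layer network

[Figure: network diagram. The input x feeds the RBF units \phi (each computed from the RBF distance); their outputs are combined by the weights w to give the output y.]
  • Free parameters:
  • --the network weights w in the 2nd layer
  • --the form of the basis functions
  • --the number of basis functions
  • --the location of the basis functions
  • E.g. for a Gaussian RBFNN, they are the number,
    the centers and the widths of the basis functions.
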
6
The RBFNN (2)
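  • Universal approximation: a Gaussian RBFNN with
    enough basis functions is capable of approximating
    any continuous function to arbitrary accuracy.
  • The type of basis functions:

[Figure: two panels, "Localized" and "Non-localized" basis functions]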
7
Exact Interpolation
  • The idea of RBFNN is that we interpolate the
    target function by using the sum of a number of
    basis functions.
  • To illustrate this idea, we consider a special
    case of exact interpolation, in which the number of
    basis functions M is equal to the number of data
    points N (M = N) and all basis functions are
    centered at data points. We want the target values
    to be exactly interpolated by the summation of
    basis functions, i.e.,
    $t_n = \sum_{j=1}^{M} w_j \phi_j(\mathbf{x}_n)$ for $n = 1, \dots, N$,
    or in matrix form $\Phi \mathbf{w} = \mathbf{t}$ with $\Phi_{nj} = \phi_j(\mathbf{x}_n)$.
  • Since M = N, $\Phi$ is a square matrix and is
    non-singular for general cases, the result is
    $\mathbf{w} = \Phi^{-1} \mathbf{t}$.
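A tiny worked instance may make the linear system concrete. The three data points, their targets and the width below are hypothetical, and the backslash solve is used in place of an explicit inverse as a matter of numerical habit rather than anything the slides prescribe:

```matlab
% Exact interpolation with M = N: one Gaussian centered on each data point.
x     = [0 0.5 1]';            % three input points (hypothetical)
t     = [0 1 0]';              % their target values (hypothetical)
sigma = 0.3;                   % common width (hypothetical)

Phi = exp(-0.5 * (x - x').^2 / sigma^2);   % N-by-N, Phi(n,j) = phi_j(x_n)
w   = Phi \ t;                 % solves Phi * w = t

residual = max(abs(Phi * w - t));   % close to zero: every target is reproduced
disp(residual)
```

The residual is at rounding level because the system is square: there are exactly as many weights as there are targets to match.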

8
An example of exact interpolation
  • For Gaussian RBF (1D input)
  • 21 data points are generated by y = sin(\pi x) plus
    noise (strength = 0.2)

The target data points are indeed exactly
interpolated, but the generalization
performance is not good.
9
Beyond exact interpolation
  • The number of basis functions need not be equal
    to the number of data points. Actually, in a
    typical situation, M should be much less than N.
  • The centers of basis functions are no longer
    constrained to be at the input data points. Instead,
    the determination of centers becomes part of the
    training process.
  • Instead of having a common width parameter \sigma,
    each basis function can have its own width, which is
    also to be determined by learning.

10
An example of RBFNN
RBFNN, 4 basis functions, \sigma = 0.4
Exact interpolation, \sigma = 0.1
11
An example of regularization
Exact interpolation, \sigma = 0.1
With weight decay regularization, \nu = 2.5
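The effect of weight decay can be sketched by comparing the two solves used later in the exact-interpolation code, `t/basis_matrix` and `t/(basis_matrix + v)`. The targets here are a deterministic stand-in for the noisy data (an assumption made so the sketch is reproducible), and `x' - x` relies on implicit expansion:

```matlab
% Exact interpolation with and without the weight-decay term.
x     = -1:0.1:1;                          % 21 input points, as in the example
t     = sin(pi * x) + 0.2 * cos(7 * x);    % stand-in for the noisy targets (assumed)
sigma = 0.1;                               % width used for exact interpolation
nu    = 2.5;                               % regularization strength

Phi     = exp(-0.5 * (x' - x).^2 / sigma^2);   % 21-by-21, symmetric
w_exact = t / Phi;                             % passes through every target
w_reg   = t / (Phi + nu * eye(numel(x)));      % weight-decay solution

fprintf('norm(w): exact %.3f, regularized %.3f\n', norm(w_exact), norm(w_reg));
```

The regularized weights have a smaller norm, and the fitted curve no longer passes exactly through the data points; that trade is what smooths the result.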
12
The hybrid training procedure
  • Unsupervised learning in the first layer. This
    fixes the basis functions using only the knowledge
    of the input data. For Gaussian RBFs, it usually
    means deciding the number, locations and widths of
    the RBFs.
  • Supervised learning in the second layer. This
    determines the network weights in the second layer.
    If we choose the sum-of-square error, it becomes a
    quadratic optimization problem, which is easy to
    solve.
  • In summary, hybrid training avoids using
    supervised learning simultaneously in two layers,
    and greatly reduces the computational cost.

13
Basis function optimization
  • The form of the basis function is predefined, and
    is often chosen to be Gaussian.
  • The number of basis functions often has to be
    determined by trials, e.g., through monitoring the
    generalization performance.
  • The key issue in unsupervised learning is to
    determine the locations and the widths of the basis
    functions.

14
Algorithms for basis function optimization
  • Subsets of data points.
    Randomly select a number of input data points as
    basis function centers. The widths can be chosen to
    be equal, given by some multiple of the average
    distance between the basis function centers.
  • Gaussian mixture models.
    The choice of basis functions is essentially to
    model the density distribution of the input data
    (intuitively, we want the centers of basis functions
    to be at high-density regions). We may assume the
    input data is generated by a mixture of Gaussian
    distributions. Optimizing the probability density
    model returns the basis function centers and widths.
  • Clustering algorithms.
    In this approach the input data is assumed to
    consist of a number of clusters. Each cluster
    corresponds to one basis function, with the cluster
    center being the basis function center. The width
    can be set equal to some multiple of the average
    distance between all centers.
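The first option (a random subset of data points) can be sketched in a few lines. The data matrix `X`, the number of centers `M` and the multiple `alpha` are hypothetical names and values introduced for this illustration only:

```matlab
% Pick M data points as centers; set one common width from their spacing.
X     = [0 0; 1 0; 0 1; 1 1; 0.5 0.5; 2 2];   % N-by-d input data (hypothetical)
M     = 3;                                    % number of basis functions
alpha = 1.0;                                  % width multiple (assumed)

idx     = randperm(size(X, 1), M);            % M distinct row indices
centers = X(idx, :);

% Average distance over all M*(M-1)/2 pairs of centers.
total = 0;
for j = 1:M-1
    for i = j+1:M
        total = total + norm(centers(j, :) - centers(i, :));
    end
end
width = alpha * total / (M * (M - 1) / 2);
```

The same width rule applies unchanged when the centers come from clustering instead of random selection.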

15
K-means clustering algorithm (1)
  • The algorithm partitions the data points into K
    disjoint subsets (K is predefined).
  • The clustering criteria are:
  • --the cluster centers are set in the high-density
    regions of the data
  • --a data point is assigned to the cluster whose
    center is closest to it
  • Mathematically, this is equivalent to minimizing
    the sum-of-square clustering function,
    $J = \sum_{j=1}^{K} \sum_{n \in S_j} \|\mathbf{x}_n - \boldsymbol{\mu}_j\|^2$,
    where $S_j$ is the set of data points in cluster j and $\boldsymbol{\mu}_j$ is their mean.
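To make the clustering function concrete, the sketch below evaluates it for one fixed assignment: the sum, over clusters, of squared distances from each member to its cluster mean. The four points and their labels are hypothetical:

```matlab
% Sum-of-square clustering function J for a given assignment.
X      = [0 0; 0 2; 10 0; 10 2];   % four 2-D points (hypothetical)
labels = [1; 1; 2; 2];             % assignment to K = 2 clusters
K      = 2;

J = 0;
for j = 1:K
    members = X(labels == j, :);               % points in cluster j
    mu      = mean(members, 1);                % cluster mean
    J       = J + sum(sum((members - mu).^2)); % squared distances to the mean
end
disp(J)   % each point lies 1 unit from its cluster mean, so J = 4
```

Pairing the points the other way (labels 1, 2, 1, 2) puts every point 5 units from its mean and gives J = 100, so the first assignment is the better clustering under this criterion.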

16
K-means clustering algorithm (2)
  • The algorithm

Step 1: Initially, randomly assign each data point to
one of the K clusters. Each data point will then have
a cluster label.
Step 2: Calculate the mean of each cluster.
Step 3: Check whether each data point has the right
cluster label. For each data point, calculate its
distances to all K centers. If the minimum distance is
not the distance from this data point to its own
cluster center, the cluster label of this data point
is updated to the one that gives the minimum distance.
Step 4: After each epoch of checking (one pass over
all data points), if no update occurs, i.e., J has
reached a (local) minimum, then stop. Otherwise, go
back to Step 2.
17
An example of data clustering
Before clustering
After clustering
18
The network training
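  • The network output after clustering:
    $y(\mathbf{x}) = \sum_{j=1}^{K} w_j \phi_j(\mathbf{x}) + w_0$,
    where $w_0$ is the bias term.
  • The sum-of-square error:
    $E = \frac{1}{2} \sum_{n=1}^{N} \left( y(\mathbf{x}_n) - t_n \right)^2$

E is quadratic in the weights, so its minimization can be easily solved.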
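Because the basis functions are fixed after clustering, the second-layer weights follow from one linear least-squares solve. In this sketch the inputs, targets, centers and width are all hypothetical, and the bias is handled by appending a column of ones:

```matlab
% Second-layer training: minimize E = 0.5 * sum((Phi*w - t).^2).
x     = linspace(0, 1, 50)';       % inputs (hypothetical)
t     = x .* (1 - x);              % targets (hypothetical smooth function)
c     = [0.2 0.5 0.8];             % centers, e.g. from clustering (assumed)
sigma = 0.3;                       % common width (assumed)

% N-by-(K+1) matrix; the last column of ones is the bias term.
Phi = [exp(-0.5 * (x - c).^2 / sigma^2), ones(numel(x), 1)];
w   = Phi \ t;                     % least-squares solution
y   = Phi * w;
E   = 0.5 * sum((y - t).^2);

grad = Phi' * (y - t);             % gradient of E at w, ~0 at the minimum
fprintf('E = %.3g, max |grad| = %.3g\n', E, max(abs(grad)));
```

The vanishing gradient confirms that the backslash solve lands on the minimum of the quadratic error, with no iterative training needed.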
19
An example of time series prediction
  • We will show an example of using RBFNN for time
    series prediction.
  • Time series prediction: to predict the system
    behavior based on its history.
  • Suppose the time course of a system is denoted as
    S(1), S(2), ..., S(n), where S(n) is the system state
    at time step n. The task is to predict the system
    behavior at n+1 based on the knowledge of its
    history, i.e., S(n), S(n-1), S(n-2), ... This is
    possible for many problems in which system states
    are correlated over time.
  • Consider a simple example, the logistic map, in
    which the system state x is updated iteratively
    according to
    $x_{n+1} = r x_n (1 - x_n)$.

Our task is to predict the value of x at any step
based on its values in the previous two steps,
i.e., to estimate $x_n$ based on $x_{n-2}$ and $x_{n-1}$.
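A few iterations worked by hand show what one training example looks like. The sketch uses r = 4 and the starting value 0.3; the variable names `pair_in` and `target` are introduced here for illustration:

```matlab
% Three steps of the logistic map x_{n+1} = r * x_n * (1 - x_n).
r    = 4;
x    = zeros(1, 4);
x(1) = 0.3;
for n = 1:3
    x(n + 1) = r * x(n) * (1 - x(n));
end
disp(x)              % approximately 0.3, 0.84, 0.5376, 0.9943

pair_in = x(1:2);    % the two previous values (x_{n-2}, x_{n-1})
target  = x(3);      % the value x_n to be predicted from them
```

Sliding this window one step along the series gives the next input-output pair, here (0.84, 0.5376) as input with 0.9943 as target.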
20
Generating training data from the logistic map
  • The logistic map, though simple, shows many
    interesting behaviors (more detail can be found at
    http://mathworld.wolfram.com/LogisticMap.html).
  • The data collecting process:
  • --Choose r = 4, and the initial value of x to be
    0.3
  • --Iterate the logistic map for 500 steps, and
    collect 100 examples from the last 100 iterations
    (chopping the data into triplets; each triplet gives
    one input-output pair).

[Figures: "The input data space" and "The time course of the system state"]
21
Clustering the input data
  • We cluster the input data by using the K-means
    clustering algorithm.
  • We choose K = 4. The clustering result returns the
    centers of the basis functions and the scale of the
    width.
  • A typical example:

22
The training result of RBFNN
23
The training result of RBFNN
24
(No Transcript)
25
The time series predicted
26
Comparison with multi-layer perceptron (1)
  • RBF
  • Simple structure: one hidden layer, linear
    combination at the output layer.
  • Simple training: the hybrid training (clustering
    plus the quadratic error function).
  • Localized representation: the input space is
    covered by a number of localized basis functions. A
    given input typically activates significantly only
    a limited number of hidden units (those within a
    close distance).
  • MLP
  • Complicated structure: often many layers and many
    hidden units.
  • Complicated training: optimizing multiple layers
    together, local minima and slow convergence.
  • Distributed representation: for a given input,
    typically many hidden units will be activated.

27
Comparison with MLP (2)
  • Different ways of interpolating data

RBF: data are classified according to clusters.
MLP: data are classified by hyper-planes.
28
Shortcomings of RBFNN
  • Unsupervised learning implies that RBFNN may only
    achieve a sub-optimal solution, since the training
    of basis functions does not consider the information
    of the output distribution.

Example: a basis function is chosen based only on
the density of the input data, p(x). It does not
match the real output function h(x).
29
Shortcomings of RBFNN
Example: the output function is determined by only
one input component; the other component is
irrelevant. Because its first layer is trained
without supervision, RBFNN is unable to detect this
irrelevant component, whereas MLP may do so (the
network weights connected to irrelevant components
will tend to have small values).
30
The code for exact interpolation
% Interpolating data with basis functions
clear
width = 0.1;                 % the width of Gaussian RBF for exact interpolation
% width = 0.4;               % the width of Gaussian RBF for a simple RBFNN
x = -1:0.1:1;                % the input data points
N = size(x,2);               % the number of data points
f = sin(pi*x);               % the clean target value
t = f + 0.2*randn(1,N);      % the noisy target value
M = N;                       % the number of basis functions for exact interpolation
% M = 4;                     % the number of basis functions for a simple RBFNN
c = zeros(1,M);              % initializing the RBF centers
basis_matrix = zeros(M,N);   % initializing the RBF matrix, \phi_jn
for j = 1:M
    c(j) = x(j);                       % for exact interpolation
    % center = round(rand(1)*20+1);    % for RBFNN
    % c(j) = x(center);                % for RBFNN
    basis_matrix(j,:) = exp(-0.5*(x-c(j)).*(x-c(j))./(width*width));   % the value of basis functions
end
v = 2.5*eye(M);              % the weight_decay regularization matrix
% w = t/(basis_matrix + v);  % including regularization (requires M = N)
w = t/basis_matrix;          % without regularization
x_test = -1:0.01:1;          % the testing data
test_matrix = zeros(M,size(x_test,2));
for j = 1:M
    test_matrix(j,:) = exp(-0.5*(x_test-c(j)).*(x_test-c(j))./(width*width));
    plot(x_test, w(j)*test_matrix(j,:))   % plot the weighted basis functions
    hold on
end
y = w*test_matrix;           % the learned function
plot(x,f,'k-',x,t,'ko',x_test,y)   % plotting the result
31
The code for clustering
  • Initialization

% K-means clustering algorithm
clear
K = 3;                        % the number of clusters
load cluster_data             % loading the input data
N = size(cluster_data,1);     % the number of input data
M = size(cluster_data,2);     % the dimension of input data
cluster_label = zeros(N,1);   % the variable that stores cluster labels
center = zeros(K,M);          % the variable that stores cluster centers
N_cluster = zeros(K,1);       % the number of data points in each cluster
dis = zeros(N,K);             % the variable that stores the distances
% Step 1: initially randomly assign cluster labels (integers 1..K)
cluster_label = round(rand(N,1)*K + 0.5);
32
  • The main part

stop_criterion = 0;
while stop_criterion < 1
    % Step 2
    center = zeros(K,M);      % reset the initial value of center to be zero
    N_cluster = zeros(K,1);   % reset the number of data points in each cluster
    for n = 1:N               % the summation
        center(cluster_label(n),:) = center(cluster_label(n),:) + cluster_data(n,:);
        N_cluster(cluster_label(n)) = N_cluster(cluster_label(n)) + 1;
    end
    for j = 1:K               % the average
        center(j,:) = center(j,:)/N_cluster(j);
    end
    % (the loop body continues on the next slide)

33
    % Step 3
    update_count = 0;         % a number recording how many updates happened
    for n = 1:N               % reassign each data point
        minimum = Inf;        % start from Inf so that any distance is smaller
        for j = 1:K           % computing the (squared) distances to each cluster
            dis(n,j) = (cluster_data(n,:)-center(j,:))*(cluster_data(n,:)-center(j,:))';
            if dis(n,j) < minimum   % recording the cluster label for minimum distance
                minimum = dis(n,j);
                minimum_label = j;
            end
        end
        if abs(minimum_label - cluster_label(n)) > 0   % checking whether updating is needed
            cluster_label(n) = minimum_label;
            update_count = update_count + 1;
        end
    end
    update_count              % printing out to monitor the convergence
    % Step 4
    if update_count == 0      % checking whether stopping is needed
        stop_criterion = 1;
    end
end
34
  • Calculate the scale of RBF width

% Determining the scale of the width of RBF
mean_distance = 0;
for j = 1:K-1
    for i = j+1:K
        mean_distance = mean_distance + sqrt((center(j,:)-center(i,:))*(center(j,:)-center(i,:))');
    end
end
mean_distance = mean_distance/(K-1)/K*2;   % average over the K*(K-1)/2 pairs
35
  • Print out the results

% plotting the data (only for the 3-cluster case)
% the '*' and '+' marker styles are assumed
for n = 1:N
    if cluster_label(n) == 1
        plot(cluster_data(n,1),cluster_data(n,2),'bo')
    elseif cluster_label(n) == 2
        plot(cluster_data(n,1),cluster_data(n,2),'k*')
    else
        plot(cluster_data(n,1),cluster_data(n,2),'g+')
    end
    hold on
end
for j = 1:K
    plot(center(j,1),center(j,2),'ro','MarkerSize',18)
    hold on
end
hold off
36
The code for generating logistic data
% Generating data according to the logistic map
N = 100;                      % the number of data points
N_step = 500;                 % the total number of iterations
cluster_data = zeros(N,2);    % storing the input data
target_data = zeros(N,1);     % storing the target data
x_old = 0.3;                  % the initial value
count = 0;                    % counting the number of data points being recorded
for n = 1:N_step
    x_new = 4*x_old*(1-x_old);            % the logistic map
    if n > N_step - N
        count = count + 1;
        cluster_data(count,:) = [x_old x_new];
        target_data(count) = 4*x_new*(1-x_new);
    end
    x_old = x_new;
end
plot(cluster_data(:,1),cluster_data(:,2),'o')
figure
plot(target_data,'-o')
save cluster_data cluster_data   % store only this variable in cluster_data.mat
save target_data target_data     % store only this variable in target_data.mat
37
The code for RBFNN (1)
% The RBFNN training algorithm
load cluster_data             % the input data
N = size(cluster_data,1);     % the number of input data points
load target_data              % the target value
load center                   % the RBF centers from the clustering result
K = size(center,1);           % the number of centers
                              % K+1 is the number of hidden units (including bias)
load mean_distance            % the rough scale of RBF width
width = 1*mean_distance;      % set the RBF width just equal to the mean_distance
% The training process for RBFNN
basis_matrix = zeros(K+1,N);  % initializing the RBF matrix, \phi_jn
for j = 1:K
    for n = 1:N
        basis_matrix(j,n) = exp(-0.5*(cluster_data(n,:)-center(j,:))*...
            (cluster_data(n,:)-center(j,:))'/(width*width));
    end
end
basis_matrix(K+1,:) = ones(1,N);       % adding the bias term
w = basis_matrix'\target_data;         % the solution of the sum of square error
training_result = basis_matrix'*w;     % the training result
38
The code for RBFNN (2)
% Print out the results (the '*' marker style is assumed)
plot(cluster_data(:,1),training_result,'*',cluster_data(:,1),target_data,'o')
figure
plot(cluster_data(:,2),training_result,'*',cluster_data(:,2),target_data,'o')
figure
plot(training_result,'-')
hold on
plot(target_data,'-o')