Title: Blind Source Separation by Independent Components Analysis
1. Blind Source Separation by Independent Components Analysis
- Professor Dr. Barrie W. Jervis
- School of Engineering
- Sheffield Hallam University
- England
- B.W.Jervis_at_shu.ac.uk
2. The Problem
- Temporally independent unknown source signals are linearly mixed in an unknown system to produce a set of measured output signals.
- It is required to determine the source signals.
3.
- Methods of solving this problem are known as Blind Source Separation (BSS) techniques.
- In this presentation the method of Independent Components Analysis (ICA) will be described.
- The arrangement is illustrated in the next slide.
4. Arrangement for BSS by ICA
[Diagram: the source signals s1 ... sn pass through the mixing matrix A to give the measured signals x1 ... xn; these pass through the unmixing matrix W to give the estimated sources u1 ... un, which are passed through the nonlinearities g(.) to give the outputs y1 = g1(u1), y2 = g2(u2), ..., yn = gn(un).]
5. Neural Network Interpretation
- The si are the independent source signals,
- A is the linear mixing matrix,
- The xi are the measured signals,
- W ≈ A^-1 is the estimated unmixing matrix,
- The ui are the estimated source signals or activations, i.e. ui ≈ si,
- The gi(ui) are monotonic nonlinear functions (sigmoids, hyperbolic tangents),
- The yi are the network outputs.
6. Principles of the Neural Network Approach
- Use information theory to derive an algorithm which minimises the mutual information between the outputs y = g(u).
- This minimises the mutual information between the source signal estimates, u, since g(u) introduces no dependencies.
- The different u are then temporally independent and are the estimated source signals.
7. Cautions I
- The magnitudes and signs of the estimated source signals are unreliable, since
  - the magnitudes are not scaled,
  - the signs are undefined,
  because magnitude and sign information is shared between the source signal vector and the unmixing matrix, W.
- The order of the outputs is permuted compared with the inputs.
8. Cautions II
- Similar overlapping source signals may not be properly extracted.
- If the number of output channels < the number of source signals, the source signals of lowest variance will not be extracted. This is a problem when these signals are important.
9. Information Theory I
- If X is a vector of variables (messages) xi which occur with probabilities P(xi), then the average information content of a stream of N messages is

  H(X) = -Σi P(xi) log2 P(xi) bits

  and is known as the entropy of the random variable, X (a small numerical sketch follows).
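As a small numerical illustration, added here rather than taken from the slides, the entropy of a discrete distribution follows directly from the definition above; the example probabilities are arbitrary:

```python
import numpy as np

def entropy(p, base=2.0):
    """Shannon entropy H(X) = -sum_i P(x_i) log P(x_i), in bits for base 2."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # terms with P(x_i) = 0 contribute nothing
    return -np.sum(p * np.log(p)) / np.log(base)

# Example: a four-message source with arbitrary probabilities.
print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75 bits
# A uniform distribution over 4 messages gives the maximum, log2(4) = 2 bits.
print(entropy([0.25] * 4))                  # 2.0 bits
```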
10. Information Theory II
- Note that the entropy is expressible in terms of probability.
- Given the probability density function (pdf) of X we can find the associated entropy.
- This link between entropy and pdf is of the greatest importance in ICA theory.
11. Information Theory III
- The joint entropy of two random variables X and Y is given by

  H(X,Y) = -Σx Σy P(x,y) log2 P(x,y)

- For independent variables,

  H(X,Y) = H(X) + H(Y)
12. Information Theory IV
- The conditional entropy of Y given X measures the average uncertainty remaining about y when x is known, and is

  H(Y|X) = H(X,Y) - H(X) = -Σx Σy P(x,y) log2 P(y|x)

- The mutual information between Y and X is

  I(Y,X) = H(Y) - H(Y|X)

- In ICA, X represents the measured signals, which are applied to the nonlinear function g(u) to obtain the outputs Y. (A numerical check of these relations is sketched below.)
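The sketch below is an addition: it checks these relations numerically for an arbitrary 2x2 joint distribution.

```python
import numpy as np

def H(p):
    """Entropy in bits of a discrete distribution given as an array of probabilities."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Arbitrary example joint distribution P(x, y): rows index x, columns index y.
Pxy = np.array([[0.3, 0.1],
                [0.2, 0.4]])

Px = Pxy.sum(axis=1)            # marginal P(x)
Py = Pxy.sum(axis=0)            # marginal P(y)

H_X, H_Y, H_XY = H(Px), H(Py), H(Pxy)
H_Y_given_X = H_XY - H_X        # conditional entropy H(Y|X)
I_YX = H_Y - H_Y_given_X        # mutual information I(Y,X)
print(H_Y_given_X, I_YX)

# For independent variables Pxy = outer(Px, Py), so H(X,Y) = H(X) + H(Y) and I = 0.
print(H(np.outer(Px, Py)) - (H_X + H_Y))    # ~0
```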
13. Bell and Sejnowski's ICA Theory (1995)
- Aim to maximise the amount of mutual information between the inputs X and the outputs Y of the neural network:

  I(Y,X) = H(Y) - H(Y|X)

  where H(Y) is the uncertainty about Y when X is unknown, and H(Y|X) is the uncertainty remaining when X is known.
- Y is a function of W and g(u).
- Here we seek to determine the W which produces the ui ≈ si, assuming the correct g(u).
14. Differentiating
- Differentiating with respect to W,

  ∂I(Y,X)/∂W = ∂H(Y)/∂W - ∂H(Y|X)/∂W

  where ∂H(Y|X)/∂W = 0, since H(Y|X) did not come through W from X.
- So, maximising this mutual information is equivalent to maximising the joint output entropy, H(Y), which is seen to be equivalent to minimising the mutual information between the outputs, and hence between the ui, as desired.
15. The Functions g(u)
- The outputs yi are amplitude-bounded random variables, and so the marginal entropies H(yi) are maximum when the yi are uniformly distributed - a known statistical result.
- With the H(yi) maximised (so that the mutual information between the outputs is ≈ 0) and the yi uniformly distributed, the nonlinearity gi(ui) has the form of the cumulative distribution function of the probability density function of the si - a proven result (illustrated in the sketch below).
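As an illustration added here, not part of the original slides, the following sketch passes samples of a Laplacian (super-Gaussian) source through the cumulative distribution function of its own pdf and shows that the output is approximately uniformly distributed, as the result above requires.

```python
import numpy as np

rng = np.random.default_rng(0)

# A Laplacian source (a simple super-Gaussian example) with unit scale.
s = rng.laplace(loc=0.0, scale=1.0, size=100_000)

def laplace_cdf(u, scale=1.0):
    """Cumulative distribution function of a zero-mean Laplacian."""
    return np.where(u < 0, 0.5 * np.exp(u / scale), 1.0 - 0.5 * np.exp(-u / scale))

y = laplace_cdf(s)                        # g(u) chosen as the source CDF

# The histogram of y is approximately flat on [0, 1], i.e. uniform (maximum entropy).
counts, _ = np.histogram(y, bins=10, range=(0.0, 1.0))
print(counts / counts.sum())              # each bin ~0.1
```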
16. Pause and Review g(u) and W
- W has to be chosen to maximise the joint output entropy, H(Y), which minimises the mutual information between the estimated source signals, ui.
- The g(u) should be the cumulative distribution functions of the source signals, si.
- Determining the g(u) is a major problem.
17. One Input and One Output
- For a monotonic nonlinear function, g(x), the output pdf is related to the input pdf by

  py(y) = px(x) / |∂y/∂x|

  so the output entropy is

  H(y) = -E[ln py(y)] = E[ln |∂y/∂x|] - E[ln px(x)]

  The second term is independent of W, so we only need to maximise the first term, E[ln |∂y/∂x|].
18.
- A stochastic gradient ascent learning rule is adopted to maximise H(y) by assuming

  Δw ∝ ∂H(y)/∂w = ∂/∂w ( ln |∂y/∂x| )

- Further progress requires knowledge of g(u). Assume for now, after Bell and Sejnowski, that g(u) is sigmoidal, i.e.

  y = g(u) = 1 / (1 + e^-u),  with u = wx + w0
19. Learning Rule: One Input, One Output
- Hence, we find

  Δw ∝ 1/w + x(1 - 2y),   Δw0 ∝ 1 - 2y
20. Learning Rule: N Inputs, N Outputs
- Assuming g(u) is sigmoidal again, we obtain

  ΔW ∝ [W^T]^-1 + (1 - 2y)x^T

  (a single update step is sketched below).
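As an aside not in the original slides, this is a minimal Python sketch of one stochastic update of the rule above, assuming the sigmoidal g(u); the learning rate and the initial W are arbitrary illustrative choices.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def infomax_step(W, x, lr=0.01):
    """One stochastic gradient ascent step of the sigmoidal infomax rule:
    dW ∝ inv(W.T) + (1 - 2y) x.T, where u = W x and y = g(u)."""
    x = x.reshape(-1, 1)                    # column vector of measurements
    u = W @ x                               # current source estimates
    y = sigmoid(u)
    dW = np.linalg.inv(W.T) + (1.0 - 2.0 * y) @ x.T
    return W + lr * dW

# Example: a single 3-channel sample and an initial unmixing matrix near identity.
rng = np.random.default_rng(1)
W = np.eye(3) + 0.01 * rng.standard_normal((3, 3))
W = infomax_step(W, rng.standard_normal(3))
```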
21.
- The network is trained until the changes in the weights become acceptably small at each iteration.
- Thus the unmixing matrix W is found.
22. The Natural Gradient
- The computation of the inverse matrix [W^T]^-1 is time-consuming, and may be avoided by rescaling the entropy gradient by multiplying it by W^T W.
- Thus, for a sigmoidal g(u) we obtain

  ΔW ∝ [I + (1 - 2y)u^T] W

- This is the "natural gradient", introduced by Amari (1998), and now widely adopted (a batch version is sketched below).
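A minimal Python sketch of the natural-gradient update applied to a batch of samples, again an addition rather than part of the slides; the surrogate Laplacian sources, learning rate, batch size and stopping threshold are arbitrary choices.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def natural_gradient_step(W, X, lr=0.001):
    """Natural-gradient infomax update for sigmoidal g(u):
    dW ∝ [I + (1 - 2y) u.T] W, averaged over the columns (samples) of X."""
    n, T = X.shape
    U = W @ X                                   # estimated sources, one column per sample
    Y = sigmoid(U)
    dW = (np.eye(n) + (1.0 - 2.0 * Y) @ U.T / T) @ W
    return W + lr * dW

# Example: train on a batch of mixed signals until the weight changes are small.
rng = np.random.default_rng(2)
S = rng.laplace(size=(3, 5000))                 # surrogate super-Gaussian sources
A = rng.standard_normal((3, 3))                 # unknown mixing matrix
X = A @ S                                       # measured signals
W = np.eye(3)
for _ in range(2000):
    W_new = natural_gradient_step(W, X)
    if np.max(np.abs(W_new - W)) < 1e-7:        # changes acceptably small
        break
    W = W_new
```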
23. The Nonlinearity, g(u)
- We have already learnt that the g(u) should be the cumulative distribution functions of the individual source distributions.
- So far the g(u) have been assumed to be sigmoidal, so what are the pdfs of the si?
- The corresponding pdfs of the si are super-Gaussian.
24. Super- and Sub-Gaussian pdfs
[Figure: a Gaussian pdf compared with a super-Gaussian pdf (sharper peak, heavier tails) and a sub-Gaussian pdf (flatter, lighter tails).]
- Note there are no mathematical definitions of super- and sub-Gaussians.
25. Super- and Sub-Gaussians
- Super-Gaussians: kurtosis (fourth-order central moment, measures the flatness of the pdf) > 0; infrequent signals of short duration, e.g. evoked brain signals.
- Sub-Gaussians: kurtosis < 0; signals mainly "on", e.g. the 50/60 Hz electrical mains supply, but also eye blinks.
26. Kurtosis
- Kurtosis is based on the 4th-order central moment; for zero-mean ui,

  kurt(ui) = E[ui^4] / (E[ui^2])^2 - 3

  and is seen to be calculated from the current estimates of the source signals (computed in the sketch below).
- To separate the independent sources, information about their pdfs such as skewness (3rd moment) and flatness (kurtosis) is required.
- First and 2nd moments (mean and variance) are insufficient.
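A short Python sketch, added for illustration, of this kurtosis estimate applied to samples of Gaussian, super-Gaussian and sub-Gaussian surrogate signals:

```python
import numpy as np

def excess_kurtosis(u):
    """Excess kurtosis E[(u - mean)^4] / var^2 - 3; zero for a Gaussian."""
    u = np.asarray(u, dtype=float)
    u = u - u.mean()
    return np.mean(u**4) / np.mean(u**2)**2 - 3.0

rng = np.random.default_rng(3)
print(excess_kurtosis(rng.standard_normal(100_000)))       # ~0    (Gaussian)
print(excess_kurtosis(rng.laplace(size=100_000)))          # ~+3   (super-Gaussian)
print(excess_kurtosis(rng.uniform(-1, 1, size=100_000)))   # ~-1.2 (sub-Gaussian)
```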
27. A More Generalised Learning Rule
- Girolami (1997) showed that tanh(ui) and -tanh(ui) could be used for super- and sub-Gaussians respectively.
- Cardoso and Laheld (1996) developed a stability analysis to determine whether the source signals were to be considered super- or sub-Gaussian.
- Lee, Girolami, and Sejnowski (1998) applied these findings to develop their extended infomax algorithm for super- and sub-Gaussians using a kurtosis-based switching rule.
28. Extended Infomax Learning Rule
- With super-Gaussians modelled as a Gaussian density weighted by sech^2,

  p(u) ∝ pG(u) sech^2(u)

  and sub-Gaussians as a Pearson mixture model,

  p(u) = ½ [ N(μ, σ^2) + N(-μ, σ^2) ]

  the new extended learning rule is

  ΔW ∝ [ I - K tanh(u)u^T - uu^T ] W
29. Switching Decision
- In the extended rule, ki = +1 for super-Gaussian and ki = -1 for sub-Gaussian sources; the ki are the elements of the N-dimensional diagonal matrix, K, and

  ki = sign( E[sech^2(ui)] E[ui^2] - E[ui tanh(ui)] )

  (the rule and the switching decision are sketched together below).
- Modifications of the formula for ki exist, but in our experience the extended algorithm has been unsatisfactory.
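The following Python sketch, an addition, combines the extended learning rule with the kurtosis-based switching decision; the two surrogate sources (Laplacian and uniform) and the learning rate are arbitrary illustrative choices.

```python
import numpy as np

def extended_infomax_step(W, X, lr=0.001):
    """Extended infomax update dW ∝ [I - K tanh(u) u.T - u u.T] W,
    with k_i = sign(E[sech^2(u_i)] E[u_i^2] - E[u_i tanh(u_i)]),
    i.e. +1 for super-Gaussian and -1 for sub-Gaussian source estimates."""
    n, T = X.shape
    U = W @ X
    tU = np.tanh(U)
    sech2 = 1.0 / np.cosh(U) ** 2
    # Switching decision, one k_i per estimated source.
    k = np.sign(np.mean(sech2, axis=1) * np.mean(U**2, axis=1)
                - np.mean(U * tU, axis=1))
    K = np.diag(k)
    dW = (np.eye(n) - K @ (tU @ U.T) / T - (U @ U.T) / T) @ W
    return W + lr * dW, k

# Example: one update on a batch mixing a super- and a sub-Gaussian source.
rng = np.random.default_rng(4)
S = np.vstack([rng.laplace(size=5000),          # super-Gaussian source
               rng.uniform(-1, 1, size=5000)])  # sub-Gaussian source
X = rng.standard_normal((2, 2)) @ S
W, k = extended_infomax_step(np.eye(2), X)
print(k)    # signs chosen by the switching rule
```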
30. Reasons for the Unsatisfactory Extended Algorithm
- 1) Initial assumptions about the super- and sub-Gaussian distributions may be too inaccurate.
- 2) The switching criterion may be inadequate.
Alternatives:
- Postulate vague distributions for the source signals which are then developed iteratively during training.
- Use an alternative approach, e.g. a statistically based method such as JADE (Cardoso).
31. Summary So Far
- We have seen how W may be obtained by training the network, and the extended algorithm for switching between super- and sub-Gaussians has been described.
- Alternative approaches have been mentioned.
- Next we consider how to obtain the source signals knowing W and the measured signals, x.
32. Source Signal Determination
[Diagram: the unknown sources si pass through the mixing matrix A to give the measured signals xi; the unmixing matrix W and the nonlinearities g(u) then give the estimated sources ui ≈ si and the outputs yi.]
- Hence U = W.x and x = A.S, where A ≈ W^-1 and U ≈ S.
- The rows of U are the estimated source signals, known as activations (as functions of time).
- The rows of x are the time-varying measured signals.
33. Source Signals
[Figure: the estimated source signal activations, one trace per channel number, plotted against time (sample number).]
34. Expressions for the Activations
- Since U = W.x, the ith activation at sample t is

  ui(t) = Σj wij xj(t)

- We see that consecutive values of ui are obtained by filtering consecutive columns of x by the same row of W.
- The ith row of U is the ith row of W multiplied by the columns of x (see the sketch below).
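A short numerical check, added for illustration, that the ith row of U is indeed the ith row of W applied to the columns of x:

```python
import numpy as np

rng = np.random.default_rng(5)
W = rng.standard_normal((3, 3))       # unmixing matrix
x = rng.standard_normal((3, 1000))    # measured signals, one column per time point

U = W @ x                             # all activations at once
u2 = W[2, :] @ x                      # ith row of W times the columns of x (i = 2)
print(np.allclose(U[2, :], u2))       # True: identical to the ith row of U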
35. Procedure
- Record N time points from each of M sensors, where N ≥ 5M.
- Pre-process the data, e.g. filtering, trend removal.
- Sphere the data using Principal Components Analysis (PCA). This is not essential, but it speeds up the computation by first removing the first- and second-order moments.
- Compute the ui ≈ si. Include desphering.
- Analyse the results. (An end-to-end sketch of these steps follows.)
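An end-to-end Python sketch of the procedure, added for illustration: it generates surrogate data (so the pre-processing step is omitted), spheres with PCA, trains W with the natural-gradient sigmoidal rule from the earlier slides, and folds the sphering back in when forming the activations; all numerical settings are arbitrary.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

rng = np.random.default_rng(6)

# Surrogate data: M = 3 sensors, N = 5000 time points (N >= 5M).
S = rng.laplace(size=(3, 5000))                 # unknown super-Gaussian sources
A = rng.standard_normal((3, 3))                 # unknown mixing matrix
X = A @ S                                       # measured signals

# Sphere the data with PCA: remove the first- and second-order moments.
X0 = X - X.mean(axis=1, keepdims=True)
eigval, eigvec = np.linalg.eigh(np.cov(X0))
sphering = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
Z = sphering @ X0                               # sphered data (identity covariance)

# Train W on the sphered data with the natural-gradient sigmoidal rule.
W = np.eye(3)
for _ in range(2000):
    U = W @ Z
    Y = sigmoid(U)
    W += 0.001 * (np.eye(3) + (1.0 - 2.0 * Y) @ U.T / Z.shape[1]) @ W

# Form the activations; "desphering" folds the sphering matrix into W.
W_total = W @ sphering                          # unmixing matrix for the raw data
U = W_total @ X0                                # rows are the estimated sources ui ≈ si
```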
36. Optional Procedures I
- The contribution of each activation at a sensor may be found by back-projecting it to the sensor.
37. Optional Procedures II
- A measured signal which is contaminated by artefacts or noise may be extracted by back-projecting all the signal activations to the measurement electrode, setting the other activations to zero. (An artefact and noise removal method; see the sketch below.)
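A Python sketch of both back-projection procedures, added for illustration; W and x here are random stand-ins for an estimated unmixing matrix and measured signals, and the choice of which activation is treated as an artefact is purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

# Stand-ins for quantities obtained earlier: an unmixing matrix W and measurements x.
W = rng.standard_normal((3, 3))         # estimated unmixing matrix (including sphering)
x = rng.standard_normal((3, 1000))      # measured signals, one row per electrode

U = W @ x                               # activations, one row per estimated source
A_est = np.linalg.inv(W)                # estimated mixing matrix, A ≈ W^-1

# Contribution of activation j at every electrode: back-project that activation alone.
j = 1
contribution_j = np.outer(A_est[:, j], U[j, :])

# Artefact/noise removal: suppose (hypothetically) that activation 2 is an artefact.
U_clean = U.copy()
U_clean[2, :] = 0.0                     # set the unwanted activation to zero
x_clean = A_est @ U_clean               # back-project the remaining activations
```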
38. Current Developments
- Overcomplete representations - more signal sources than sensors.
- Nonlinear mixing.
- Nonstationary sources.
- General formulation of g(u).
39. Conclusions
- It has been shown how to extract temporally independent unknown source signals from their linear mixtures at the outputs of an unknown system using Independent Components Analysis.
- Some of the limitations of the method have been mentioned.
- Current developments have been highlighted.