1
Blind Source Separation by Independent Components
Analysis
  • Professor Dr. Barrie W. Jervis
  • School of Engineering
  • Sheffield Hallam University
  • England
  • B.W.Jervis@shu.ac.uk

2
The Problem
  • Temporally independent unknown source signals are
    linearly mixed in an unknown system to produce a
    set of measured output signals.
  • It is required to determine the source signals.

3
  • Methods of solving this problem are known as
    Blind Source Separation (BSS) techniques.
  • In this presentation the method of Independent
    Components Analysis (ICA) will be described.
  • The arrangement is illustrated in the next slide.

4
Arrangement for BSS by ICA
[Diagram: unknown sources s1 … sn enter the mixing
matrix A to give the measured signals x1 … xn; the
unmixing matrix W produces the estimated sources
u1 … un, which pass through the nonlinearities g(·)
to give the outputs y1 = g1(u1), …, yn = gn(un).]
5
Neural Network Interpretation
  • The si are the independent source signals,
  • A is the linear mixing matrix,
  • The xi are the measured signals,
  • W ≈ A^-1 is the estimated unmixing matrix,
  • The ui are the estimated source signals or
    activations, i.e. ui ≈ si,
  • The gi(ui) are monotonic nonlinear functions
    (sigmoids, hyperbolic tangents),
  • The yi are the network outputs.

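A minimal numpy sketch of this setup, using two hypothetical sources and an arbitrarily chosen mixing matrix A (both invented here purely for illustration):

```python
import numpy as np

t = np.arange(1000) / 100.0

# Two hypothetical, temporally independent sources s_i.
s = np.vstack([np.sin(2 * np.pi * 3 * t),            # smooth oscillation
               np.sign(np.sin(2 * np.pi * 7 * t))])  # square wave

A = np.array([[1.0, 0.6],      # unknown mixing matrix A (arbitrary values)
              [0.4, 1.0]])
x = A @ s                      # measured signals x_i seen at the sensors

# ICA seeks W close to A^-1 so that u = W @ x recovers the sources,
# up to scaling, sign and permutation (see Cautions I).
```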
6
Principles of Neural Network Approach
  • Use Information Theory to derive an algorithm
    which minimises the mutual information between
    the outputs y = g(u).
  • This minimises the mutual information between the
    source signal estimates, u, since g(u) introduces
    no dependencies.
  • The different u are then temporally independent
    and are the estimated source signals.

7
Cautions I
  • The magnitudes and signs of the estimated source
    signals are unreliable since
  • the magnitudes are not scaled
  • the signs are undefined
  • because magnitude and sign information is shared
    between the source signal vector and the unmixing
    matrix, W.
  • The order of the outputs is permuted compared
    with the inputs.

8
Cautions II
  • Similar overlapping source signals may not be
    properly extracted.
  • If the number of output channels is fewer than the
    number of source signals, the source signals of
    lowest variance will not be extracted. This is a
    problem when those signals are important.

9
Information Theory I
  • If X is a vector of variables (messages) xi which
    occur with probabilities P(xi), then the average
    information content of a stream of N messages is

    H(X) = - Σi P(xi) log2 P(xi)   bits

    and is known as the entropy of the random
    variable, X.
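A quick numerical check of this definition, for a few hypothetical discrete distributions:

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy H(X) = -sum_i P(x_i) log2 P(x_i), in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # treat 0 * log(0) as 0
    return -np.sum(p * np.log2(p))

print(entropy_bits([0.5, 0.5]))       # 1.0 bit  (fair coin)
print(entropy_bits([0.25] * 4))       # 2.0 bits (uniform over 4 messages)
print(entropy_bits([0.9, 0.1]))       # ~0.47 bits (less uncertainty)
```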
10
Information Theory II
  • Note that the entropy is expressible in terms of
    probability.
  • Given the probability density function (pdf)
    of X we can find the associated entropy.
  • This link between entropy and pdf is of the
    greatest importance in ICA theory.

11
Information Theory III
  • The joint entropy of two random variables X
    and Y is given by

    H(X,Y) = - Σx Σy P(x,y) log2 P(x,y)

  • For independent variables,

    H(X,Y) = H(X) + H(Y)
12
Information Theory IV
  • The conditional entropy of Y given X measures the
    average uncertainty remaining about y when x is
    known, and is

    H(Y|X) = H(X,Y) - H(X)

  • The mutual information between Y and X is

    I(Y,X) = H(Y) - H(Y|X)

  • In ICA, X represents the measured signals, which
    are applied to the nonlinear function g(u) to
    obtain the outputs Y.

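These quantities can be checked numerically from a joint probability table; the 2x2 table below is hypothetical, chosen only so that the two variables are dependent:

```python
import numpy as np

def H(p):
    """Entropy in bits of any array of probabilities."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical joint pmf P(x, y) for two dependent binary variables.
Pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
Px, Py = Pxy.sum(axis=1), Pxy.sum(axis=0)

H_XY = H(Pxy)                  # joint entropy H(X,Y)
H_Y_given_X = H_XY - H(Px)     # conditional entropy H(Y|X)
I_XY = H(Py) - H_Y_given_X     # mutual information I(Y,X)

print(H_XY, H_Y_given_X, I_XY) # independent variables would give I_XY = 0
```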
13
Bell and Sejnowski's ICA Theory (1995)
  • Aim to maximise the amount of mutual information
    between the inputs X and the outputs Y of the
    neural network:

    I(Y,X) = H(Y) - H(Y|X)

    where H(Y|X) is the uncertainty about Y that did
    not come from X.
  • Y is a function of W and g(u).
  • Here we seek to determine the W which
    produces the ui ≈ si, assuming the correct g(u).

14
Differentiating with respect to W,

    ∂I(Y,X)/∂W = ∂H(Y)/∂W - ∂H(Y|X)/∂W

where the last term is 0, since H(Y|X) did not come
through W from X. So maximising this mutual
information is equivalent to maximising the joint
output entropy, H(Y), which is seen to be
equivalent to minimising the mutual information
between the outputs, and hence between the ui, as
desired.
15
The Functions g(u)
  • The outputs yi are amplitude bounded random
    variables, and so the marginal entropies H(yi)
    are maximum when the yi are uniformly distributed
    - a known statistical result.
  • With the H(yi) maximised and the yi uniformly
    distributed, the mutual information between the
    outputs is driven towards zero, and the
    nonlinearity gi(ui) has the form of the cumulative
    distribution function of the probability density
    function of the si - a proven result.

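The cdf result can be illustrated numerically: if u is drawn from some pdf and g is the corresponding cumulative distribution function, then y = g(u) is uniformly distributed. The Laplacian source below is a hypothetical example, using scipy:

```python
import numpy as np
from scipy import stats

u = stats.laplace.rvs(size=100_000, random_state=0)  # hypothetical source samples

# g(u) taken as the cdf of the source's own pdf.
y = stats.laplace.cdf(u)

hist, _ = np.histogram(y, bins=10, range=(0.0, 1.0), density=True)
print(np.round(hist, 2))   # every bin close to 1.0, i.e. y is ~uniform on (0, 1)
```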
16
Pause and review g(u) and W
  • W has to be chosen to maximise the joint output
    entropy, H(Y), which minimises the mutual
    information between the estimated source signals,
    ui.
  • The g(u) should be the cumulative distribution
    functions of the source signals, si.
  • Determining the g(u) is a major problem.

17
One input and one output
  • For a monotonic nonlinear function, g(x),

    py(y) = px(x) / |∂y/∂x|

  • Also,

    H(y) = -E[ ln py(y) ]

  • Substituting,

    H(y) = E[ ln |∂y/∂x| ] - E[ ln px(x) ]

    where the second term is independent of W, so we
    only need to maximise the first term.
18
  • A stochastic gradient ascent learning rule is
    adopted to maximise H(y) by assuming

    Δw ∝ ∂H(y)/∂w = ∂/∂w ( ln |∂y/∂x| )

  • Further progress requires knowledge of g(u).
    Assume for now, after Bell and Sejnowski, that
    g(u) is sigmoidal, i.e.

    y = g(u) = 1 / (1 + e^-u)

  • Also assume

    u = wx + w0
19
Learning Rule 1 input, 1 output
Hence, we find

    Δw ∝ 1/w + x(1 - 2y)
    Δw0 ∝ 1 - 2y
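A minimal sketch of this single-unit rule, applied to hypothetical scalar data with an arbitrary learning rate:

```python
import numpy as np

rng = np.random.default_rng(0)
x_data = rng.laplace(size=5000)      # hypothetical observations of one source
w, w0, lr = 0.1, 0.0, 0.01           # initial weight, bias and learning rate

for x in x_data:                     # stochastic (sample-by-sample) updates
    u = w * x + w0
    y = 1.0 / (1.0 + np.exp(-u))     # sigmoidal g(u)
    w += lr * (1.0 / w + x * (1.0 - 2.0 * y))   # delta_w  ~ 1/w + x(1 - 2y)
    w0 += lr * (1.0 - 2.0 * y)                  # delta_w0 ~ 1 - 2y

print(w, w0)   # w grows so that y spreads over (0, 1), maximising H(y)
```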
20
Learning Rule N inputs, N outputs
  • Need

    ΔW ∝ ∂H(y)/∂W

  • Assuming g(u) is sigmoidal again, we obtain

    ΔW ∝ [W^T]^-1 + (1 - 2y) x^T
21
  • The network is trained until the changes in the
    weights become acceptably small at each
    iteration.
  • Thus the unmixing matrix W is found.

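A sketch of this training loop under the same sigmoidal assumption; the learning rate, tolerance and initialisation are arbitrary choices here, and the per-sample updates of the slides are averaged over the batch:

```python
import numpy as np

def infomax_ica(x, lr=0.01, tol=1e-6, max_iter=2000, seed=0):
    """Infomax ICA with a sigmoidal g(u):
    delta_W ~ (W^T)^-1 + (1 - 2y) x^T, iterated until the update is small."""
    n, T = x.shape
    rng = np.random.default_rng(seed)
    W = np.eye(n) + 0.01 * rng.standard_normal((n, n))
    for _ in range(max_iter):
        u = W @ x
        y = 1.0 / (1.0 + np.exp(-u))
        dW = np.linalg.inv(W.T) + (1.0 - 2.0 * y) @ x.T / T
        W += lr * dW
        if np.max(np.abs(lr * dW)) < tol:    # weight changes acceptably small
            break
    return W
```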
22
The Natural Gradient
  • The computation of the inverse matrix [W^T]^-1
    is time-consuming, and may be avoided by
    rescaling the entropy gradient, multiplying it
    by W^T W.
  • Thus, for a sigmoidal g(u) we obtain

    ΔW ∝ [ I + (1 - 2y) u^T ] W

  • This is the natural gradient, introduced by
    Amari (1998), and now widely adopted.

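A sketch of a single natural-gradient update under the same sigmoidal assumption; note that no matrix inverse is needed:

```python
import numpy as np

def natural_gradient_step(W, x, lr=0.01):
    """One infomax update rescaled by W^T W:
    delta_W ~ [I + (1 - 2y) u^T] W."""
    n, T = x.shape
    u = W @ x
    y = 1.0 / (1.0 + np.exp(-u))
    dW = (np.eye(n) + (1.0 - 2.0 * y) @ u.T / T) @ W
    return W + lr * dW
```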
23
The nonlinearity, g(u)
  • We have already learnt that the g(u) should be
    the cumulative distribution functions of the
    individual source distributions.
  • So far the g(u) have been assumed to be
    sigmoidal, so what are the pdfs of the si?
  • The corresponding pdfs of the si are
    super-Gaussian.

24
Super- and sub-Gaussian pdfs
[Figure: example Gaussian, super-Gaussian and
sub-Gaussian pdfs.]
  • Note that there is no single agreed mathematical
    definition of super- and sub-Gaussian pdfs.

25
Super- and sub-Gaussians
  • Super-Gaussians: kurtosis (fourth-order
    central moment, which measures the flatness of
    the pdf) > 0; infrequent signals of short
    duration, e.g. evoked brain signals.
  • Sub-Gaussians: kurtosis < 0; signals
    mainly 'on', e.g. 50/60 Hz electrical mains
    supply, but also eye blinks.

26
Kurtosis
  • Kurtosis: the normalised 4th-order central moment,

    kurt(u) = m4 / m2^2 - 3

    where m2 and m4 are the 2nd- and 4th-order central
    moments, and is seen to be calculated from the
    current estimates of the source signals.
  • To separate the independent sources, information
    about their pdfs such as skewness (3rd moment)
    and flatness (kurtosis) is required.
  • First and 2nd moments (mean and variance) are
    insufficient.

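A small numerical illustration of the sign of the (excess) kurtosis for three hypothetical signals:

```python
import numpy as np

def kurtosis(u):
    """Normalised 4th-order central moment, with 3 subtracted so that a
    Gaussian gives 0."""
    u = u - np.mean(u)
    m2, m4 = np.mean(u ** 2), np.mean(u ** 4)
    return m4 / m2 ** 2 - 3.0

rng = np.random.default_rng(0)
print(kurtosis(rng.standard_normal(100_000)))      # ~ 0    Gaussian
print(kurtosis(rng.laplace(size=100_000)))          # ~ +3   super-Gaussian (> 0)
print(kurtosis(rng.uniform(-1, 1, size=100_000)))   # ~ -1.2 sub-Gaussian  (< 0)
```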
27
A more generalised learning rule
  • Girolami (1997) showed that tanh(ui) and
    -tanh(ui) could be used for super- and
    sub-Gaussians respectively.
  • Cardoso and Laheld (1996) developed a stability
    analysis to determine whether the source signals
    were to be considered super- or sub-Gaussian.
  • Lee, Girolami, and Sejnowski (1998) applied these
    findings to develop their extended infomax
    algorithm for super- and sub-Gaussians using a
    kurtosis-based switching rule.

28
Extended Infomax Learning Rule
  • With super-Gaussians modelled as a Gaussian
    density weighted by sech^2,

    p(u) ∝ pG(u) sech^2(u)

    and sub-Gaussians as a Pearson mixture model of
    two Gaussians,

    p(u) = ½ [ N(μ, σ^2) + N(-μ, σ^2) ]

    the new extended learning rule is

    ΔW ∝ [ I - K tanh(u) u^T - u u^T ] W
29
Switching Decision
  • K is the N-dimensional diagonal matrix whose
    elements ki are switched according to

    ki = sign( E[sech^2(ui)] E[ui^2] - E[ui tanh(ui)] )

    with ki = +1 for a super-Gaussian and ki = -1 for
    a sub-Gaussian source.
  • Modifications of the formula for ki exist, but
    in our experience the extended algorithm has been
    unsatisfactory.

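A sketch of one extended-infomax update combining the learning rule and the switching decision above; the batch averaging and learning rate are implementation choices, not from the slides:

```python
import numpy as np

def extended_infomax_step(W, x, lr=0.01):
    """delta_W ~ [I - K tanh(u) u^T - u u^T] W, with each k_i switched by
    sign( E[sech^2(u_i)] E[u_i^2] - E[u_i tanh(u_i)] )."""
    n, T = x.shape
    u = W @ x
    tu = np.tanh(u)
    sech2 = 1.0 / np.cosh(u) ** 2
    k = np.sign(np.mean(sech2, axis=1) * np.mean(u ** 2, axis=1)
                - np.mean(u * tu, axis=1))      # +1 super-, -1 sub-Gaussian
    dW = (np.eye(n) - np.diag(k) @ tu @ u.T / T - u @ u.T / T) @ W
    return W + lr * dW
```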
30
Reasons for unsatisfactory extended algorithm
  • 1) Initial assumptions about super- and
    sub-Gaussian distributions may be too inaccurate.
  • 2) The switching criterion may be inadequate.

Alternatives
  • Postulate vague distributions for the source
    signals which are then developed iteratively
    during training.
  • Use an alternative, statistically based approach,
    e.g. JADE (Cardoso).

31
Summary so far
  • We have seen how W may be obtained by training
    the network, and the extended algorithm for
    switching between super- and sub-Gaussians has
    been described.
  • Alternative approaches have been mentioned.
  • Next we consider how to obtain the source signals
    knowing W and the measured signals, x.

32
Source signal determination
  • The system is

    si (unknown) → [ mixing matrix A ] → xi (measured)
    → [ unmixing matrix W ] → ui ≈ si (estimated)
    → g(u) → yi

  • Hence U = W·x and x = A·S, where A ≈ W^-1 and
    U ≈ S.
  • The rows of U are the estimated source signals,
    known as activations (as functions of time).
  • The rows of x are the time-varying measured
    signals.

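In code, recovering the activations and the estimated mixing matrix from a trained W is one matrix product and one inverse (a sketch, with x holding one channel per row):

```python
import numpy as np

def activations(W, x):
    """Rows of U are the estimated source signals (activations);
    A_est ~ W^-1 is the estimated mixing matrix."""
    U = W @ x                    # U = W . x
    A_est = np.linalg.inv(W)     # A ~ W^-1, so x ~ A_est @ U
    return U, A_est
```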
33
Source Signals
[Figure: the estimated source-signal activations,
one trace per channel, plotted against time
(sample number).]
34
Expressions for the Activations
  • The activations are given by

    ui(t) = Σj wij xj(t),   i.e.   U = W·x

  • We see that consecutive values of ui are obtained
    by filtering consecutive columns of x with the
    same (ith) row of W.
  • The ith row of U is the product of the ith row of
    W with the columns of x.

35
Procedure
  • Record N time points from each of M sensors,
    where N ≥ 5M.
  • Pre-process the data, e.g. filtering, trend
    removal.
  • Sphere the data using Principal Components
    Analysis (PCA). This is not essential, but it
    speeds up the computation by first removing the
    first- and second-order moments.
  • Compute the ui ≈ si, including desphering, as
    sketched below.
  • Analyse the results.
  • Analyse the results.

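A sketch of this procedure, assuming the infomax_ica() routine sketched earlier; sphering is done here with an eigendecomposition of the covariance matrix, and desphering simply folds the sphering matrix back into W:

```python
import numpy as np

def sphere(x):
    """PCA sphering (whitening): remove first- and second-order moments."""
    x_c = x - x.mean(axis=1, keepdims=True)
    vals, vecs = np.linalg.eigh(np.cov(x_c))
    S = vecs @ np.diag(vals ** -0.5) @ vecs.T    # sphering matrix
    return S @ x_c, S

# Hypothetical pipeline (x: pre-processed data, channels x time points):
# z, S = sphere(x)          # optional, but speeds up training
# W_z = infomax_ica(z)      # unmixing matrix in the sphered space
# W = W_z @ S               # desphering: so that U = W @ (x - mean)
# U = W @ (x - x.mean(axis=1, keepdims=True))   # activations u_i ~ s_i
```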
36
Optional Procedures I
  • The contribution of each activation at a sensor
    may be found by back-projecting it to the
    sensor.

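A sketch of this back-projection, using the estimated mixing matrix A_est = W^-1 (the function name is illustrative):

```python
import numpy as np

def back_project(W, U, i):
    """Contribution of activation i at every sensor: column i of W^-1
    times row i of U."""
    A_est = np.linalg.inv(W)
    return np.outer(A_est[:, i], U[i, :])   # shape: (n_sensors, n_samples)
```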
37
Optional Procedures II
  • A measured signal which is contaminated by
    artefacts or noise may be recovered by
    back-projecting all the signal activations to
    the measurement electrode while setting the
    artefact and noise activations to zero (an
    artefact and noise removal method).

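A sketch of this artefact and noise removal step, zeroing the unwanted activations before back-projecting (the artefact_rows argument is hypothetical, chosen by the analyst):

```python
import numpy as np

def remove_artefacts(W, x, artefact_rows):
    """Back-project only the signal activations: set artefact/noise
    activations to zero, then return the cleaned measured signals."""
    U = W @ x
    U_clean = U.copy()
    U_clean[list(artefact_rows), :] = 0.0
    return np.linalg.inv(W) @ U_clean
```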
38
Current Developments
  • Overcomplete representations - more signal
    sources than sensors.
  • Nonlinear mixing.
  • Nonstationary sources.
  • General formulation of g(u).

39
Conclusions
  • It has been shown how to extract temporally
    independent unknown source signals from their
    linear mixtures at the outputs of an unknown
    system using Independent Components Analysis.
  • Some of the limitations of the method have been
    mentioned.
  • Current developments have been highlighted.