Title: Independent Component Analysis
1 Independent Component Analysis
2 Content
- What is ICA?
- Nongaussianity Measurement: Kurtosis
- ICA by Maximization of Nongaussianity
- Gradient and FastICA Algorithms Using Kurtosis
- Measuring Nongaussianity by Negentropy
- FastICA Using Negentropy
3 Independent Component Analysis
What is ICA?
4 Motivation
- Example: three people are speaking simultaneously in a room that has three microphones.
- Denote the microphone signals by x1(t), x2(t), and x3(t).
- They are mixtures of the sources s1(t), s2(t), and s3(t).
- The goal is to estimate the original speech signals using only the recorded signals.
- This is called the cocktail-party problem.
5 The Cocktail-Party Problem
[Figure: the original speech signals and the mixed speech signals]
6 The Cocktail-Party Problem
[Figure: the original speech signals and the estimated sources]
7 The Problem
- Find the sources s1(t), s2(t), and s3(t), and the mixing coefficients aij, from the observed signals x1(t), x2(t), and x3(t).
- It turns out that the problem can be solved just by assuming that the sources si(t) are nongaussian and statistically independent.
8 Applications
- Cocktail-party problem: separation of voices, music, or sounds
- Sensor array processing, e.g. radar
- Biomedical signal processing with multiple sensors: EEG, ECG, MEG, fMRI
- Telecommunications, e.g. multiuser detection in CDMA
- Financial and other time series
- Noise removal from signals and images
- Feature extraction for images and signals
- Brain modelling
9 Basic ICA Model
x = As: the observable mixed signals x are linear combinations of the independent sources s through the unknown mixing matrix A.
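As a concrete illustration of the mixing model, here is a minimal sketch; the three source waveforms and the 3x3 mixing matrix below are arbitrary choices for illustration, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)

# Three nongaussian, independent sources s(t): a sine, a square wave, and Laplacian noise.
s = np.vstack([
    np.sin(2 * np.pi * t),
    np.sign(np.sin(3 * np.pi * t)),
    rng.laplace(size=t.size),
])

# Arbitrary (assumed) square mixing matrix A; the microphones observe x = A s.
A = np.array([[1.0, 0.5, 0.3],
              [0.6, 1.0, 0.4],
              [0.2, 0.7, 1.0]])
x = A @ s          # mixed signals, shape (3, n_samples)
```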
10 The Basic Assumptions
- The independent components are assumed statistically independent.
- The independent components must have nongaussian distributions.
- For simplicity, we assume that the unknown mixing matrix A is square.
11 Assumption I: Statistical Independence
- Basically, random variables y1, y2, ..., yn are said to be independent if information on the value of yi does not give any information on the value of yj for i ≠ j.
- Mathematically, the joint pdf is factorizable in the following way:
  p(y1, y2, ..., yn) = p1(y1) p2(y2) ... pn(yn)
- Note that uncorrelatedness does not necessarily imply independence (see the numerical sketch below).
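A quick numerical illustration of the last point (my own example, not from the slides): y1 standard normal and y2 = y1^2 are uncorrelated, yet clearly dependent.

```python
import numpy as np

rng = np.random.default_rng(1)
y1 = rng.standard_normal(1_000_000)
y2 = y1 ** 2                      # a deterministic function of y1, hence dependent

print(np.corrcoef(y1, y2)[0, 1])  # close to 0: uncorrelated
# Independence would require E{y1^2 y2^2} = E{y1^2} E{y2^2}, which fails:
print(np.mean(y1**2 * y2**2), np.mean(y1**2) * np.mean(y2**2))   # ~15 vs ~3
```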
12 Assumption II: Nongaussian Distributions
- Note that in the basic model we do not have to know what the nongaussian distributions of the ICs look like.
13 Assumption III: Mixing Matrix Is Square
- In other words, the number of independent components is equal to the number of observed mixtures.
- This simplifies our discussion in the first stage.
- However, in the basic ICA model, this is no real restriction as long as the number of observations xi is originally at least as large as the number of sources sj.
14 Ambiguities of ICA
- We cannot determine the variances (energies) of the ICs; therefore, we assume each IC has unit variance, E{si^2} = 1.
- This also implies that we take E{x} = 0 (centering of x), and that the sign of si is unimportant.
- We cannot determine the order of the ICs: a permutation matrix P and its inverse can be inserted into the model, x = (A P^-1)(P s), where P is any permutation matrix.
15 Illustration of ICA
[Figure: mixing of the sources]
16 Whitening Is Only Half of ICA
z = Vx, where V is the whitening matrix.
17 Whitening Is Only Half of ICA
Uncorrelatedness is related to independence, but is weaker than independence.
By whitening, we have E{zz^T} = I.
This, however, doesn't imply that the zi are independent, i.e., we may still have p(z1, ..., zn) ≠ p1(z1) ... pn(zn).
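A minimal whitening sketch via the eigendecomposition of the sample covariance (the function name and the (n_signals, n_samples) data layout are my own conventions):

```python
import numpy as np

def whiten(x):
    """Center x and return z = V x with E{z z^T} = I, plus the whitening matrix V."""
    x = x - x.mean(axis=1, keepdims=True)    # centering: E{x} = 0
    cov = np.cov(x)                          # sample covariance E{x x^T}
    d, E = np.linalg.eigh(cov)               # cov = E diag(d) E^T
    V = E @ np.diag(d ** -0.5) @ E.T         # V = E D^{-1/2} E^T
    z = V @ x
    return z, V
```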
18 Independent Component Analysis
The central limit theorem implicitly tells us that adding independent components together makes the distribution more Gaussian.
Therefore, nongaussianity is an important criterion for ICA.
Degaussianization is hence the central theme in ICA.
19 Independent Component Analysis
Nongaussianity Measurement: Kurtosis
20 Moments
The jth moment: E{X^j}; the first moment is the mean.
The jth central moment: E{(X - E{X})^j}; the second central moment is the variance, and the third is related to the skewness.
21 Moment Generating Function
- The moment generating function MX(t) of a random variable X is defined by MX(t) = E{e^(tX)}.
- For X ~ N(μ, σ^2): MX(t) = exp(μt + σ^2 t^2 / 2).
- For Z ~ N(0, 1): MZ(t) = exp(t^2 / 2).
22 Standard Normal Distribution N(0, 1)
All odd moments are zero; the even moments are E{Z^2} = 1, E{Z^4} = 3, and in general E{Z^(2k)} = (2k-1)(2k-3)...3*1.
23 Kurtosis
- Kurtosis of a zero-mean random variable X is defined by kurt(X) = E{X^4} - 3 (E{X^2})^2.
- Normalized kurtosis: E{X^4} / (E{X^2})^2 - 3, which equals kurt(X) when E{X^2} = 1 (a sample estimator is sketched below).
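A sample estimator of the normalized kurtosis, as a minimal sketch:

```python
import numpy as np

def kurtosis(x):
    """Normalized kurtosis E{x^4}/(E{x^2})^2 - 3 of a sample (centered first)."""
    x = np.asarray(x) - np.mean(x)
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2 - 3.0
```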
24 Gaussianity
For a Gaussian X with variance σ^2, E{X^4} = 3σ^4, so kurt(X) = 0.
25 Kurtosis for Supergaussian
Consider the Laplacian distribution, p(x) = (λ/2) exp(-λ|x|).
Its kurtosis is > 0.
27 Kurtosis for Subgaussian
Consider the uniform distribution on [-a, a].
Its kurtosis is < 0 (the normalized kurtosis is -6/5).
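The signs above are easy to check numerically; this quick sketch (sample sizes and seed are arbitrary) reproduces the approximate values +3, -6/5, and 0:

```python
import numpy as np

rng = np.random.default_rng(2)
kurt = lambda x: np.mean((x - x.mean()) ** 4) / np.var(x) ** 2 - 3.0

print(kurt(rng.laplace(size=1_000_000)))        # ~ +3.0  (supergaussian)
print(kurt(rng.uniform(-1, 1, 1_000_000)))      # ~ -1.2  (subgaussian)
print(kurt(rng.standard_normal(1_000_000)))     # ~  0.0  (Gaussian)
```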
28 Nongaussianity Measurement by Kurtosis
- Kurtosis, or rather its absolute value, has been widely used as a measure of nongaussianity in ICA and related fields.
- Computationally, kurtosis can be estimated simply from the 4th moment of the sample data (if the variance is kept constant).
29 Properties of Kurtosis
- Let X1 and X2 be two independent random variables, both with zero mean. Then
  kurt(X1 + X2) = kurt(X1) + kurt(X2) and kurt(αX1) = α^4 kurt(X1).
30 Independent Component Analysis
ICA by Maximization of Nongaussianity
31 Restate the Problem
Ultimate goal: recover the sources, s = A^-1 x.
How?
32 Simplification
Ultimate goal: recover s = A^-1 x.
For simplicity, we assume the sources are i.i.d.
To estimate one independent component, take a linear combination of the (whitened) observations, y = b^T x = b^T A s = q^T s.
If b is properly identified, q^T = b^T A contains only one nonzero entry, with value one.
This implies that b will be one row of A^-1.
33 Nongaussian Is Independent
Ultimate goal: recover s = A^-1 x; for simplicity, we assume the sources are i.i.d.
To estimate an independent component, consider y = b^T x on the whitened data.
We will take the b that maximizes the nongaussianity of b^T x.
34 Nongaussian Is Independent
[Figure: mixing]
35 Nongaussian Is Independent
[Figure: whitening]
36 Nongaussian Is Independent
[Figure: the sum of components becomes more Gaussian]
37 Nongaussian Is Independent
[Figure: rotation]
38 Nongaussian Is Independent
[Figure: estimated density]
39 Nongaussian Is Independent
Consider obtaining one independent component, y = b^T x.
40 Nongaussian Is Independent
Consider obtaining one independent component.
Project the whitened data onto a unit vector w to get an independent component, y = w^T z.
41 Nongaussian Is Independent
[Figure: the (q1, q2) plane]
Use kurtosis as the nongaussianity measure.
We require that ||w|| = 1; the search space is the unit sphere.
42 Independent Component Analysis
Gradient Algorithm Using Kurtosis
43 Criterion for ICA Using Kurtosis
maximize |kurt(w^T z)|
subject to ||w|| = 1
44 Gradient Algorithm
maximize |kurt(w^T z)| subject to ||w|| = 1.
The gradient is ∂|kurt(w^T z)| / ∂w = 4 sign(kurt(w^T z)) [ E{z (w^T z)^3} - 3 w ||w||^2 ].
Since the last term only rescales w (which is normalized at every step anyway), it is unrelated to the search direction, giving
Δw ∝ sign(kurt(w^T z)) E{z (w^T z)^3},   w ← w / ||w||.
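A batch-mode sketch of this gradient rule (the learning rate, iteration count, and initialization are my own choices; z is whitened data of shape (n_signals, n_samples)):

```python
import numpy as np

def ica_gradient_kurtosis(z, lr=0.1, n_iter=200, seed=0):
    """One-unit ICA: gradient ascent on |kurt(w^T z)| over the unit sphere."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ z                                          # current projection
        kurt = np.mean(y ** 4) - 3.0                       # kurtosis (unit variance assumed)
        grad = np.sign(kurt) * (z * y ** 3).mean(axis=1)   # ~ sign(kurt) E{z (w^T z)^3}
        w = w + lr * grad                                  # gradient step
        w /= np.linalg.norm(w)                             # project back to the unit sphere
    return w
```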
45 FastICA Algorithm
maximize |kurt(w^T z)| subject to ||w|| = 1.
At a stable point, the gradient must point in the direction of w, i.e., equal w times a scalar.
Using fixed-point iteration, w ∝ E{z (w^T z)^3} - 3w (the sign is not important).
FastICA: w ← E{z (w^T z)^3} - 3w, followed by w ← w / ||w||.
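The same update as a runnable sketch (the tolerance, iteration cap, and sign-insensitive convergence test are my additions):

```python
import numpy as np

def fastica_kurtosis(z, max_iter=100, tol=1e-8, seed=0):
    """One-unit FastICA with the kurtosis nonlinearity: w <- E{z (w^T z)^3} - 3w."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        y = w @ z
        w_new = (z * y ** 3).mean(axis=1) - 3.0 * w
        w_new /= np.linalg.norm(w_new)
        if np.abs(np.abs(w_new @ w) - 1.0) < tol:   # converged up to sign
            return w_new
        w = w_new
    return w
```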
46 Independent Component Analysis
Measuring Nongaussianity by Negentropy
47 Critique of Kurtosis
- Kurtosis can be very sensitive to outliers.
- Kurtosis may depend on only a few observations in the tails of the distribution.
- It is not a robust measure of nongaussianity.
48 Negentropy
Differential entropy: H(y) = -∫ p(y) log p(y) dy.
Negentropy: J(y) = H(y_gauss) - H(y), where y_gauss is a Gaussian random variable with the same covariance as y.
J(y) ≥ 0, and negentropy is zero only when the random variable is Gaussian distributed.
It is invariant under an invertible linear transformation.
49 Approximation of Negentropy (I)
For a zero-mean, unit-variance random variable y, the classical approximation is
J(y) ≈ (1/12) E{y^3}^2 + (1/48) kurt(y)^2.
Using this approximation does not help, because it is just as sensitive to outliers as kurtosis itself.
50 Approximation of Negentropy (II)
Choose two nonpolynomial functions, G1(x) odd and G2(x) even, such that
J(y) ≈ k1 (E{G1(y)})^2 + k2 (E{G2(y)} - E{G2(ν)})^2,
where ν is a standard Gaussian variable.
The odd term measures the asymmetry of the density; the even term measures the dimension of bimodality vs. peak at zero.
The first term is zero if the underlying density is symmetric.
Usually, only the second term is used.
51 Approximation of Negentropy (II)
If only an even nonpolynomial function, say G, is used, we have
J(y) ∝ (E{G(y)} - E{G(ν)})^2.
The following two functions are useful:
G1(x) = (1/a1) log cosh(a1 x), with 1 ≤ a1 ≤ 2
G2(x) = -exp(-x^2 / 2)
Compare with G3(x) = x^4, which gives back kurtosis.
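These contrast functions and their derivatives g = G' as they are typically implemented (a1 = 1 here; the exact constant is a tuning choice):

```python
import numpy as np

# G1(u) = (1/a1) log cosh(a1 u): general purpose; its derivative is g1(u) = tanh(a1 u)
def g1(u, a1=1.0):
    return np.tanh(a1 * u)

def g1_prime(u, a1=1.0):
    return a1 * (1.0 - np.tanh(a1 * u) ** 2)

# G2(u) = -exp(-u^2/2): robust, suited to very supergaussian sources; g2(u) = u exp(-u^2/2)
def g2(u):
    return u * np.exp(-u ** 2 / 2.0)

def g2_prime(u):
    return (1.0 - u ** 2) * np.exp(-u ** 2 / 2.0)
```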
52 Degaussianization
For ICA, we want to maximize this quantity.
Specifically, let z = Vx be the whitened data.
For one-unit ICA, we want to find a direction (a rotation), say w, to
maximize (E{G(w^T z)} - E{G(ν)})^2
subject to ||w|| = 1.
53 Gradient Algorithm
Fact: E{G(ν)} is a constant, so maximizing (E{G(w^T z)} - E{G(ν)})^2 amounts to maximizing or minimizing E{G(w^T z)}.
Algorithm:
Batch mode: Δw ∝ γ E{z g(w^T z)}, w ← w / ||w||, where γ = E{G(w^T z)} - E{G(ν)} and g = G'.
On-line mode: drop the expectation and use the current sample, Δw ∝ γ z g(w^T z).
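A batch-mode sketch of this gradient rule, assuming the tanh nonlinearity g = g1 and a Monte Carlo estimate of E{G(ν)} (both assumptions are mine):

```python
import numpy as np

def ica_gradient_negentropy(z, lr=0.5, n_iter=300, seed=0):
    """One-unit gradient rule: dw ~ gamma * E{z g(w^T z)}, gamma = E{G(w^T z)} - E{G(nu)}."""
    rng = np.random.default_rng(seed)
    G = lambda u: np.log(np.cosh(u))                    # G1 with a1 = 1
    g = np.tanh                                         # g1 = G1'
    EG_nu = np.mean(G(rng.standard_normal(100_000)))    # E{G(nu)}, nu ~ N(0, 1)
    w = rng.standard_normal(z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ z
        gamma = np.mean(G(y)) - EG_nu                   # its sign selects max/min of E{G}
        w = w + lr * gamma * (z * g(y)).mean(axis=1)
        w /= np.linalg.norm(w)                          # keep w on the unit sphere
    return w
```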
55 Analysis
maximize (E{G(w^T z)} - E{G(ν)})^2 subject to ||w|| = 1.
Consider the term inside the braces: minimize E{G(w^T z)} if the IC is supergaussian, and maximize E{G(w^T z)} if the IC is subgaussian.
The functions G we use have the property that E{G(s)} - E{G(ν)} is negative for supergaussian s and positive for subgaussian s.
56 Analysis
[Figure: the nonlinearities g1, g2, and g3]
Both g1 and g2 are less sensitive to outliers than g3.
57 Analysis
The factor γ = E{G(w^T z)} - E{G(ν)} controls the search direction; its sign depends on the super-/subgaussianity of the samples.
The nonlinearity g(w^T z) serves to weight the samples.
59 Stability Analysis
- Assume that the input data follows the ICA model with whitened data z = VAs.
- Assume G is a sufficiently smooth even function.
- Then the local maxima (resp. minima) of E{G(w^T z)} under the constraint ||w|| = 1 include those rows of the inverse of the mixing matrix VA such that the corresponding independent components si satisfy
  E{si g(si) - g'(si)} [E{G(si)} - E{G(ν)}] > 0 (resp. < 0).
- This condition is, in general, true for reasonable choices of G.
60 Independent Component Analysis
FastICA Using Negentropy
61 Clue From the Gradient Algorithm
The gradient algorithm suggests the fixed-point iteration w ← E{z g(w^T z)}, followed by normalization.
Nonpolynomial moments do not have the same nice algebraic properties as kurtosis, so such an iteration scheme converges poorly.
62 Newton's Method
Maximize or minimize E{G(w^T z)} subject to ||w||^2 = 1.
Construct the Lagrangian as follows: L(w, β) = E{G(w^T z)} - β (w^T w - 1) / 2.
Newton's method finds an extreme point by letting w ← w - [∇²w L]^-1 ∇w L, where ∇w L = E{z g(w^T z)} - βw.
63 Newton's Method
Evaluating the Hessian matrix and its inverse is time-consuming; we want to approximate it.
Here ∇²w L = E{z z^T g'(w^T z)} - βI, so Newton's method finds the extreme point by letting
w ← w - [E{z z^T g'(w^T z)} - βI]^-1 [E{z g(w^T z)} - βw].
64 Newton's Method
Since the data is whitened, E{z z^T g'(w^T z)} ≈ E{z z^T} E{g'(w^T z)} = E{g'(w^T z)} I, a diagonal matrix that is easy to invert.
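Filling in the algebra carried by the slide equations, here is a sketch of how the approximated Newton step collapses into the FastICA update (following the standard FastICA derivation; β is the Lagrange multiplier):

```latex
w \leftarrow w - \frac{E\{z\,g(w^{T}z)\} - \beta w}{E\{g'(w^{T}z)\} - \beta}
% Multiplying the right-hand side by the scalar (\beta - E\{g'(w^{T}z)\})
% only rescales w, and the rescaling is removed by the normalization step, so
w \leftarrow E\{z\,g(w^{T}z)\} - E\{g'(w^{T}z)\}\,w, \qquad w \leftarrow w / \|w\|.
```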
66 FastICA
The algorithm: w ← E{z g(w^T z)} - E{g'(w^T z)} w, followed by normalization w ← w / ||w||.
67 FastICA
- Center the data to make its mean zero.
- Whiten the data to give z.
- Choose an initial vector w of unit norm.
- Update w ← E{z g(w^T z)} - E{g'(w^T z)} w.
- Normalize w ← w / ||w||.
- If not converged, go back to step 4.
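A compact, runnable sketch of these steps for a single component (the tanh nonlinearity, tolerance, and iteration cap are my own choices; whitening is assumed to have been done already):

```python
import numpy as np

def fastica_one_unit(z, max_iter=200, tol=1e-8, seed=0):
    """One-unit FastICA: w <- E{z g(w^T z)} - E{g'(w^T z)} w, then normalize."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        y = w @ z
        g, g_prime = np.tanh(y), 1.0 - np.tanh(y) ** 2
        w_new = (z * g).mean(axis=1) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)
        if np.abs(np.abs(w_new @ w) - 1.0) < tol:   # converged up to sign
            return w_new
        w = w_new
    return w
```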
69 FastICA
70 Estimating Several ICs
- Deflation orthogonalization
  - based on the Gram-Schmidt method
- Symmetric orthogonalization
  - adjusts the vectors in parallel
71 Deflation Orthogonalization
- Center the data to make its mean zero.
- Whiten the data to give z.
- Choose m, the number of ICs to estimate; set the counter p ← 1.
- Choose an initial vector wp of unit norm, randomly.
- Update wp ← E{z g(wp^T z)} - E{g'(wp^T z)} wp.
- Orthogonalize against the vectors found so far: wp ← wp - Σ_{j<p} (wp^T wj) wj.
- Normalize wp ← wp / ||wp||.
- If wp has not converged, go back to step 5.
- Set p ← p + 1; if p ≤ m, go back to step 4.
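A sketch of the deflation scheme built on the one-unit update (the tanh nonlinearity and the convergence test are my own choices; the Gram-Schmidt projection in the code corresponds to step 6 above):

```python
import numpy as np

def fastica_deflation(z, m, max_iter=200, tol=1e-8, seed=0):
    """Estimate m ICs one by one, orthogonalizing each new w against the previous ones."""
    rng = np.random.default_rng(seed)
    n = z.shape[0]
    W = np.zeros((m, n))
    for p in range(m):
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            y = w @ z
            w_new = (z * np.tanh(y)).mean(axis=1) - (1.0 - np.tanh(y) ** 2).mean() * w
            w_new -= W[:p].T @ (W[:p] @ w_new)      # Gram-Schmidt against rows found so far
            w_new /= np.linalg.norm(w_new)
            if np.abs(np.abs(w_new @ w) - 1.0) < tol:
                break
            w = w_new
        W[p] = w_new
    return W                                        # estimated sources: y = W @ z
```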
72 Symmetric Orthogonalization
- Choose the number of independent components to estimate, say m.
- Initialize the wi, i = 1, ..., m.
- Do an iteration of the one-unit algorithm on every wi in parallel.
- Do a symmetric orthogonalization of the matrix W = (w1, ..., wm)^T.
- If not converged, go back to step 3.
73 Symmetric Orthogonalization
Method 1 (classic method): W ← (W W^T)^(-1/2) W.
Method 2 (iterative method):
- Let W ← W / ||W|| (divide by a matrix norm that bounds the largest eigenvalue of W W^T).
- Let W ← (3/2) W - (1/2) W W^T W.
- If W W^T is not close enough to the identity, go back to step 2.
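Both orthogonalization methods in code form, as a sketch (the classic method uses an eigendecomposition to form (W W^T)^(-1/2); the normalization in the iterative method uses the infinity norm of W W^T, one of several valid choices):

```python
import numpy as np

def sym_orth_classic(W):
    """Classic symmetric orthogonalization: W <- (W W^T)^(-1/2) W."""
    d, E = np.linalg.eigh(W @ W.T)
    return E @ np.diag(d ** -0.5) @ E.T @ W

def sym_orth_iterative(W, tol=1e-10, max_iter=100):
    """Iterative method: normalize, then repeat W <- 1.5 W - 0.5 W W^T W."""
    W = W / np.sqrt(np.abs(W @ W.T).sum(axis=1).max())   # bound the largest eigenvalue by 1
    for _ in range(max_iter):
        W = 1.5 * W - 0.5 * W @ W.T @ W
        if np.max(np.abs(W @ W.T - np.eye(W.shape[0]))) < tol:
            break
    return W
```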