Title: Independent Component Analysis
1 Independent Component Analysis
2 Content
- What is ICA?
- Nongaussianity Measurement: Kurtosis
- ICA by Maximization of Nongaussianity
- Gradient and FastICA Algorithms Using Kurtosis
- Measuring Nongaussianity by Negentropy
- FastICA Using Negentropy
3 Independent Component Analysis
What is ICA?
4 Motivation
- Example: three people are speaking simultaneously in a room that has three microphones.
- Denote the microphone signals by x1(t), x2(t), and x3(t).
- They are mixtures of the sources s1(t), s2(t), and s3(t).
- The goal is to estimate the original speech signals using only the recorded signals.
- This is called the cocktail-party problem.
5 The Cocktail-Party Problem
[Figure: the original speech signals and the mixed speech signals]
6 The Cocktail-Party Problem
[Figure: the original speech signals and the estimated sources]
7 The Problem
- Find the sources s1(t), s2(t), and s3(t), and the mixing coefficients aij, from the observed signals x1(t), x2(t), and x3(t).
- It turns out that the problem can be solved just by assuming that the sources si(t) are nongaussian and statistically independent.
8 Applications
- Cocktail-party problem: separation of voices, music, or sounds
- Sensor array processing, e.g. radar
- Biomedical signal processing with multiple sensors: EEG, ECG, MEG, fMRI
- Telecommunications, e.g. multiuser detection in CDMA
- Financial and other time series
- Noise removal from signals and images
- Feature extraction for images and signals
- Brain modelling
9 Basic ICA Model
x = As: the observable mixed signals x are linear combinations of the independent sources s through the unknown mixing matrix A.
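As a concrete illustration of the mixing model, here is a minimal sketch; the three source waveforms and the 3x3 mixing matrix below are arbitrary choices for illustration, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)

# Three nongaussian, independent sources s(t): a sine, a square wave, and Laplacian noise.
s = np.vstack([
    np.sin(2 * np.pi * t),
    np.sign(np.sin(3 * np.pi * t)),
    rng.laplace(size=t.size),
])

# Arbitrary (assumed) square mixing matrix A; the microphones observe x = A s.
A = np.array([[1.0, 0.5, 0.3],
              [0.6, 1.0, 0.4],
              [0.2, 0.7, 1.0]])
x = A @ s          # mixed signals, shape (3, n_samples)
```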
10 The Basic Assumptions
- The independent components are assumed statistically independent.
- The independent components must have nongaussian distributions.
- For simplicity, we assume that the unknown mixing matrix A is square.
11 Assumption I: Statistical Independence
- Basically, random variables y1, y2, ..., yn are said to be independent if information on the value of yi does not give any information on the value of yj for i ≠ j.
- Mathematically, the joint pdf is factorizable in the following way:
  p(y1, y2, ..., yn) = p1(y1) p2(y2) ... pn(yn)
- Note that uncorrelatedness does not necessarily imply independence (see the numerical sketch below).
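A quick numerical illustration of the last point (my own example, not from the slides): y1 standard normal and y2 = y1^2 are uncorrelated, yet clearly dependent.

```python
import numpy as np

rng = np.random.default_rng(1)
y1 = rng.standard_normal(1_000_000)
y2 = y1 ** 2                      # a deterministic function of y1, hence dependent

print(np.corrcoef(y1, y2)[0, 1])  # close to 0: uncorrelated
# Independence would require E{y1^2 y2^2} = E{y1^2} E{y2^2}, which fails:
print(np.mean(y1**2 * y2**2), np.mean(y1**2) * np.mean(y2**2))   # ~15 vs ~3
```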
12 Assumption II: Nongaussian Distributions
- Note that in the basic model we do not have to know what the nongaussian distributions of the ICs look like.
13 Assumption III: Mixing Matrix Is Square
- In other words, the number of independent components is equal to the number of observed mixtures.
- This simplifies our discussion in the first stage.
- However, in the basic ICA model, this is no real restriction as long as the number of observations xi is originally at least as large as the number of sources sj.
14 Ambiguities of ICA
- We cannot determine the variances (energies) of the ICs; therefore, we assume each IC has unit variance, E{si^2} = 1.
- This also implies that we take E{x} = 0 (centering of x), and that the sign of si is unimportant.
- We cannot determine the order of the ICs: a permutation matrix P and its inverse can be inserted into the model, x = (A P^-1)(P s), where P is any permutation matrix.
15 Illustration of ICA
[Figure: mixing of the sources]
16 Whitening Is Only Half of ICA
z = Vx, where V is the whitening matrix.
17 Whitening Is Only Half of ICA
Uncorrelatedness is related to independence, but is weaker than independence.
By whitening, we have E{zz^T} = I.
This, however, doesn't imply that the zi are independent, i.e., we may still have p(z1, ..., zn) ≠ p1(z1) ... pn(zn).
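A minimal whitening sketch via the eigendecomposition of the sample covariance (the function name and the (n_signals, n_samples) data layout are my own conventions):

```python
import numpy as np

def whiten(x):
    """Center x and return z = V x with E{z z^T} = I, plus the whitening matrix V."""
    x = x - x.mean(axis=1, keepdims=True)    # centering: E{x} = 0
    cov = np.cov(x)                          # sample covariance E{x x^T}
    d, E = np.linalg.eigh(cov)               # cov = E diag(d) E^T
    V = E @ np.diag(d ** -0.5) @ E.T         # V = E D^{-1/2} E^T
    z = V @ x
    return z, V
```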
18 Independent Component Analysis
The central limit theorem implicitly tells us that adding independent components together makes the distribution more Gaussian.
Therefore, nongaussianity is an important criterion for ICA.
Degaussianization is hence the central theme in ICA.
19 Independent Component Analysis
Nongaussianity Measurement: Kurtosis
20 Moments
The jth moment: E{X^j}; the first moment is the mean.
The jth central moment: E{(X - E{X})^j}; the second central moment is the variance, and the third is related to the skewness.
21 Moment Generating Function
- The moment generating function MX(t) of a random variable X is defined by MX(t) = E{e^(tX)}.
- For X ~ N(μ, σ^2): MX(t) = exp(μt + σ^2 t^2 / 2).
- For Z ~ N(0, 1): MZ(t) = exp(t^2 / 2).
22 Standard Normal Distribution N(0, 1)
All odd moments are zero; the even moments are E{Z^2} = 1, E{Z^4} = 3, and in general E{Z^(2k)} = (2k-1)(2k-3)...3*1.
23 Kurtosis
- Kurtosis of a zero-mean random variable X is defined by kurt(X) = E{X^4} - 3 (E{X^2})^2.
- Normalized kurtosis: E{X^4} / (E{X^2})^2 - 3, which equals kurt(X) when E{X^2} = 1 (a sample estimator is sketched below).
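A sample estimator of the normalized kurtosis, as a minimal sketch:

```python
import numpy as np

def kurtosis(x):
    """Normalized kurtosis E{x^4}/(E{x^2})^2 - 3 of a sample (centered first)."""
    x = np.asarray(x) - np.mean(x)
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2 - 3.0
```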
24 Gaussianity
For a Gaussian X with variance σ^2, E{X^4} = 3σ^4, so kurt(X) = 0.
25 Kurtosis for Supergaussian
Consider the Laplacian distribution, p(x) = (λ/2) exp(-λ|x|).
Its kurtosis is > 0.
27 Kurtosis for Subgaussian
Consider the uniform distribution on [-a, a].
Its kurtosis is < 0 (the normalized kurtosis is -6/5).
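The signs above are easy to check numerically; this quick sketch (sample sizes and seed are arbitrary) reproduces the approximate values +3, -6/5, and 0:

```python
import numpy as np

rng = np.random.default_rng(2)
kurt = lambda x: np.mean((x - x.mean()) ** 4) / np.var(x) ** 2 - 3.0

print(kurt(rng.laplace(size=1_000_000)))        # ~ +3.0  (supergaussian)
print(kurt(rng.uniform(-1, 1, 1_000_000)))      # ~ -1.2  (subgaussian)
print(kurt(rng.standard_normal(1_000_000)))     # ~  0.0  (Gaussian)
```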
28 Nongaussianity Measurement by Kurtosis
- Kurtosis, or rather its absolute value, has been widely used as a measure of nongaussianity in ICA and related fields.
- Computationally, kurtosis can be estimated simply from the 4th moment of the sample data (if the variance is kept constant).
29 Properties of Kurtosis
- Let X1 and X2 be two independent random variables, both with zero mean. Then
  kurt(X1 + X2) = kurt(X1) + kurt(X2) and kurt(αX1) = α^4 kurt(X1).
30 Independent Component Analysis
ICA by Maximization of Nongaussianity
31 Restate the Problem
Ultimate goal: recover the sources, s = A^-1 x.
How?
32 Simplification
Ultimate goal: recover s = A^-1 x.
For simplicity, we assume the sources are i.i.d.
To estimate one independent component, take a linear combination of the (whitened) observations, y = b^T x = b^T A s = q^T s.
If b is properly identified, q^T = b^T A contains only one nonzero entry, with value one.
This implies that b will be one row of A^-1.
33 Nongaussian Is Independent
Ultimate goal: recover s = A^-1 x; for simplicity, we assume the sources are i.i.d.
To estimate an independent component, consider y = b^T x on the whitened data.
We will take the b that maximizes the nongaussianity of b^T x.
34 Nongaussian Is Independent
[Figure: mixing]
35 Nongaussian Is Independent
[Figure: whitening]
36 Nongaussian Is Independent
[Figure: the sum of components becomes more Gaussian]
37 Nongaussian Is Independent
[Figure: rotation]
38 Nongaussian Is Independent
[Figure: estimated density]
39 Nongaussian Is Independent
Consider obtaining one independent component, y = b^T x.
40 Nongaussian Is Independent
Consider obtaining one independent component.
Project the whitened data onto a unit vector w to get an independent component, y = w^T z.
41 Nongaussian Is Independent
[Figure: the (q1, q2) plane]
Use kurtosis as the nongaussianity measure.
We require that ||w|| = 1; the search space is the unit sphere.
42 Independent Component Analysis
Gradient Algorithm Using Kurtosis
43 Criterion for ICA Using Kurtosis
maximize |kurt(w^T z)|
subject to ||w|| = 1
44 Gradient Algorithm
maximize |kurt(w^T z)| subject to ||w|| = 1.
The gradient is ∂|kurt(w^T z)| / ∂w = 4 sign(kurt(w^T z)) [ E{z (w^T z)^3} - 3 w ||w||^2 ].
Since the last term only rescales w (which is normalized at every step anyway), it is unrelated to the search direction, giving
Δw ∝ sign(kurt(w^T z)) E{z (w^T z)^3},   w ← w / ||w||.
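A batch-mode sketch of this gradient rule (the learning rate, iteration count, and initialization are my own choices; z is whitened data of shape (n_signals, n_samples)):

```python
import numpy as np

def ica_gradient_kurtosis(z, lr=0.1, n_iter=200, seed=0):
    """One-unit ICA: gradient ascent on |kurt(w^T z)| over the unit sphere."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ z                                          # current projection
        kurt = np.mean(y ** 4) - 3.0                       # kurtosis (unit variance assumed)
        grad = np.sign(kurt) * (z * y ** 3).mean(axis=1)   # ~ sign(kurt) E{z (w^T z)^3}
        w = w + lr * grad                                  # gradient step
        w /= np.linalg.norm(w)                             # project back to the unit sphere
    return w
```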
45 FastICA Algorithm
maximize |kurt(w^T z)| subject to ||w|| = 1.
At a stable point, the gradient must point in the direction of w, i.e., equal w times a scalar.
Using fixed-point iteration, w ∝ E{z (w^T z)^3} - 3w (the sign is not important).
FastICA: w ← E{z (w^T z)^3} - 3w, followed by w ← w / ||w||.
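The same update as a runnable sketch (the tolerance, iteration cap, and sign-insensitive convergence test are my additions):

```python
import numpy as np

def fastica_kurtosis(z, max_iter=100, tol=1e-8, seed=0):
    """One-unit FastICA with the kurtosis nonlinearity: w <- E{z (w^T z)^3} - 3w."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        y = w @ z
        w_new = (z * y ** 3).mean(axis=1) - 3.0 * w
        w_new /= np.linalg.norm(w_new)
        if np.abs(np.abs(w_new @ w) - 1.0) < tol:   # converged up to sign
            return w_new
        w = w_new
    return w
```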
46 Independent Component Analysis
Measuring Nongaussianity by Negentropy
47 Critique of Kurtosis
- Kurtosis can be very sensitive to outliers.
- Kurtosis may depend on only a few observations in the tails of the distribution.
- It is not a robust measure of nongaussianity.
48 Negentropy
Differential entropy: H(y) = -∫ p(y) log p(y) dy.
Negentropy: J(y) = H(y_gauss) - H(y), where y_gauss is a Gaussian random variable with the same covariance as y.
J(y) ≥ 0, and negentropy is zero only when the random variable is Gaussian distributed.
It is invariant under an invertible linear transformation.
49 Approximation of Negentropy (I)
For a zero-mean, unit-variance random variable y, the classical approximation is
J(y) ≈ (1/12) E{y^3}^2 + (1/48) kurt(y)^2.
Using this approximation does not help, because it is just as sensitive to outliers as kurtosis itself.
50 Approximation of Negentropy (II)
Choose two nonpolynomial functions, G1(x) odd and G2(x) even, such that
J(y) ≈ k1 (E{G1(y)})^2 + k2 (E{G2(y)} - E{G2(ν)})^2,
where ν is a standard Gaussian variable.
The odd term measures the asymmetry of the density; the even term measures the dimension of bimodality vs. peak at zero.
The first term is zero if the underlying density is symmetric.
Usually, only the second term is used.
51 Approximation of Negentropy (II)
If only an even nonpolynomial function, say G, is used, we have
J(y) ∝ (E{G(y)} - E{G(ν)})^2.
The following two functions are useful:
G1(x) = (1/a1) log cosh(a1 x), with 1 ≤ a1 ≤ 2
G2(x) = -exp(-x^2 / 2)
Compare with G3(x) = x^4, which gives back kurtosis.
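These contrast functions and their derivatives g = G' as they are typically implemented (a1 = 1 here; the exact constant is a tuning choice):

```python
import numpy as np

# G1(u) = (1/a1) log cosh(a1 u): general purpose; its derivative is g1(u) = tanh(a1 u)
def g1(u, a1=1.0):
    return np.tanh(a1 * u)

def g1_prime(u, a1=1.0):
    return a1 * (1.0 - np.tanh(a1 * u) ** 2)

# G2(u) = -exp(-u^2/2): robust, suited to very supergaussian sources; g2(u) = u exp(-u^2/2)
def g2(u):
    return u * np.exp(-u ** 2 / 2.0)

def g2_prime(u):
    return (1.0 - u ** 2) * np.exp(-u ** 2 / 2.0)
```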
52 Degaussianization
For ICA, we want to maximize this quantity.
Specifically, let z = Vx be the whitened data.
For one-unit ICA, we want to find a direction (a rotation), say w, to
maximize (E{G(w^T z)} - E{G(ν)})^2
subject to ||w|| = 1.
53 Gradient Algorithm
Fact: E{G(ν)} is a constant, so maximizing (E{G(w^T z)} - E{G(ν)})^2 amounts to maximizing or minimizing E{G(w^T z)}.
Algorithm:
Batch mode: Δw ∝ γ E{z g(w^T z)}, w ← w / ||w||, where γ = E{G(w^T z)} - E{G(ν)} and g = G'.
On-line mode: drop the expectation and use the current sample, Δw ∝ γ z g(w^T z).
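A batch-mode sketch of this gradient rule, assuming the tanh nonlinearity g = g1 and a Monte Carlo estimate of E{G(ν)} (both assumptions are mine):

```python
import numpy as np

def ica_gradient_negentropy(z, lr=0.5, n_iter=300, seed=0):
    """One-unit gradient rule: dw ~ gamma * E{z g(w^T z)}, gamma = E{G(w^T z)} - E{G(nu)}."""
    rng = np.random.default_rng(seed)
    G = lambda u: np.log(np.cosh(u))                    # G1 with a1 = 1
    g = np.tanh                                         # g1 = G1'
    EG_nu = np.mean(G(rng.standard_normal(100_000)))    # E{G(nu)}, nu ~ N(0, 1)
    w = rng.standard_normal(z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ z
        gamma = np.mean(G(y)) - EG_nu                   # its sign selects max/min of E{G}
        w = w + lr * gamma * (z * g(y)).mean(axis=1)
        w /= np.linalg.norm(w)                          # keep w on the unit sphere
    return w
```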
55 Analysis
maximize (E{G(w^T z)} - E{G(ν)})^2 subject to ||w|| = 1.
Consider the term inside the braces: minimize E{G(w^T z)} if the IC is supergaussian, and maximize E{G(w^T z)} if the IC is subgaussian.
The functions G we use have the property that E{G(s)} - E{G(ν)} is negative for supergaussian s and positive for subgaussian s.
56 Analysis
[Figure: the nonlinearities g1, g2, and g3]
Both g1 and g2 are less sensitive to outliers than g3.
57 Analysis
The factor γ = E{G(w^T z)} - E{G(ν)} controls the search direction; its sign depends on the super-/subgaussianity of the samples.
The nonlinearity g(w^T z) serves to weight the samples.
59 Stability Analysis
- Assume that the input data follows the ICA model with whitened data z = VAs.
- Assume G is a sufficiently smooth even function.
- Then the local maxima (resp. minima) of E{G(w^T z)} under the constraint ||w|| = 1 include those rows of the inverse of the mixing matrix VA such that the corresponding independent components si satisfy
  E{si g(si) - g'(si)} [E{G(si)} - E{G(ν)}] > 0 (resp. < 0).
- This condition is, in general, true for reasonable choices of G.
60 Independent Component Analysis
FastICA Using Negentropy
61 Clue From the Gradient Algorithm
The gradient algorithm suggests the fixed-point iteration w ← E{z g(w^T z)}, followed by normalization.
Nonpolynomial moments do not have the same nice algebraic properties as kurtosis, so such an iteration scheme converges poorly.
62 Newton's Method
Maximize or minimize E{G(w^T z)} subject to ||w||^2 = 1.
Construct the Lagrangian as follows: L(w, β) = E{G(w^T z)} - β (w^T w - 1) / 2.
Newton's method finds an extreme point by letting w ← w - [∇²w L]^-1 ∇w L, where ∇w L = E{z g(w^T z)} - βw.
63 Newton's Method
Evaluating the Hessian matrix and its inverse is time-consuming; we want to approximate it.
Here ∇²w L = E{z z^T g'(w^T z)} - βI, so Newton's method finds the extreme point by letting
w ← w - [E{z z^T g'(w^T z)} - βI]^-1 [E{z g(w^T z)} - βw].
64 Newton's Method
Since the data is whitened, E{z z^T g'(w^T z)} ≈ E{z z^T} E{g'(w^T z)} = E{g'(w^T z)} I, a diagonal matrix that is easy to invert.
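Filling in the algebra carried by the slide equations, here is a sketch of how the approximated Newton step collapses into the FastICA update (following the standard FastICA derivation; β is the Lagrange multiplier):

```latex
w \leftarrow w - \frac{E\{z\,g(w^{T}z)\} - \beta w}{E\{g'(w^{T}z)\} - \beta}
% Multiplying the right-hand side by the scalar (\beta - E\{g'(w^{T}z)\})
% only rescales w, and the rescaling is removed by the normalization step, so
w \leftarrow E\{z\,g(w^{T}z)\} - E\{g'(w^{T}z)\}\,w, \qquad w \leftarrow w / \|w\|.
```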
66 FastICA
The algorithm: w ← E{z g(w^T z)} - E{g'(w^T z)} w, followed by normalization w ← w / ||w||.
67 FastICA
- Center the data to make its mean zero.
- Whiten the data to give z.
- Choose an initial vector w of unit norm.
- Update w ← E{z g(w^T z)} - E{g'(w^T z)} w.
- Normalize w ← w / ||w||.
- If not converged, go back to step 4.
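A compact, runnable sketch of these steps for a single component (the tanh nonlinearity, tolerance, and iteration cap are my own choices; whitening is assumed to have been done already):

```python
import numpy as np

def fastica_one_unit(z, max_iter=200, tol=1e-8, seed=0):
    """One-unit FastICA: w <- E{z g(w^T z)} - E{g'(w^T z)} w, then normalize."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        y = w @ z
        g, g_prime = np.tanh(y), 1.0 - np.tanh(y) ** 2
        w_new = (z * g).mean(axis=1) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)
        if np.abs(np.abs(w_new @ w) - 1.0) < tol:   # converged up to sign
            return w_new
        w = w_new
    return w
```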
69 FastICA
70 Estimating Several ICs
- Deflation orthogonalization
  - based on the Gram-Schmidt method
- Symmetric orthogonalization
  - adjusts the vectors in parallel
71 Deflation Orthogonalization
- Center the data to make its mean zero.
- Whiten the data to give z.
- Choose m, the number of ICs to estimate; set the counter p ← 1.
- Choose an initial vector wp of unit norm, randomly.
- Update wp ← E{z g(wp^T z)} - E{g'(wp^T z)} wp.
- Orthogonalize against the vectors found so far: wp ← wp - Σ_{j<p} (wp^T wj) wj.
- Normalize wp ← wp / ||wp||.
- If wp has not converged, go back to step 5.
- Set p ← p + 1; if p ≤ m, go back to step 4.
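A sketch of the deflation scheme built on the one-unit update (the tanh nonlinearity and the convergence test are my own choices; the Gram-Schmidt projection in the code corresponds to step 6 above):

```python
import numpy as np

def fastica_deflation(z, m, max_iter=200, tol=1e-8, seed=0):
    """Estimate m ICs one by one, orthogonalizing each new w against the previous ones."""
    rng = np.random.default_rng(seed)
    n = z.shape[0]
    W = np.zeros((m, n))
    for p in range(m):
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            y = w @ z
            w_new = (z * np.tanh(y)).mean(axis=1) - (1.0 - np.tanh(y) ** 2).mean() * w
            w_new -= W[:p].T @ (W[:p] @ w_new)      # Gram-Schmidt against rows found so far
            w_new /= np.linalg.norm(w_new)
            if np.abs(np.abs(w_new @ w) - 1.0) < tol:
                break
            w = w_new
        W[p] = w_new
    return W                                        # estimated sources: y = W @ z
```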
72 Symmetric Orthogonalization
- Choose the number of independent components to estimate, say m.
- Initialize the wi, i = 1, ..., m.
- Do an iteration of the one-unit algorithm on every wi in parallel.
- Do a symmetric orthogonalization of the matrix W = (w1, ..., wm)^T.
- If not converged, go back to step 3.
73 Symmetric Orthogonalization
Method 1 (classic method): W ← (W W^T)^(-1/2) W.
Method 2 (iterative method):
- Let W ← W / ||W|| (divide by a matrix norm that bounds the largest eigenvalue of W W^T).
- Let W ← (3/2) W - (1/2) W W^T W.
- If W W^T is not close enough to the identity, go back to step 2.
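Both orthogonalization methods in code form, as a sketch (the classic method uses an eigendecomposition to form (W W^T)^(-1/2); the normalization in the iterative method uses the infinity norm of W W^T, one of several valid choices):

```python
import numpy as np

def sym_orth_classic(W):
    """Classic symmetric orthogonalization: W <- (W W^T)^(-1/2) W."""
    d, E = np.linalg.eigh(W @ W.T)
    return E @ np.diag(d ** -0.5) @ E.T @ W

def sym_orth_iterative(W, tol=1e-10, max_iter=100):
    """Iterative method: normalize, then repeat W <- 1.5 W - 0.5 W W^T W."""
    W = W / np.sqrt(np.abs(W @ W.T).sum(axis=1).max())   # bound the largest eigenvalue by 1
    for _ in range(max_iter):
        W = 1.5 * W - 0.5 * W @ W.T @ W
        if np.max(np.abs(W @ W.T - np.eye(W.shape[0]))) < tol:
            break
    return W
```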