1
Independent Component Analysis
2
Content
  • What is ICA?
  • Nongaussianity Measurement: Kurtosis
  • ICA By Maximization of Nongaussianity
  • Gradient and FastICA Algorithms Using Kurtosis
  • Measuring Nongaussianity by Negentropy
  • FastICA Using Negentropy

3
Independent Component Analysis
What is ICA?
4
Motivation
  • Example: three people are speaking simultaneously
    in a room that has three microphones.
  • Denote the microphone signals by x1(t), x2(t),
    and x3(t).
  • They are mixtures of sources s1(t), s2(t), and
    s3(t).
  • The goal is to estimate the original speech
    signals using only the recorded signals.
  • This is called the cocktail-party problem.

5
The Cocktail-Party Problem
The original speech signals
The mixed speech signals
6
The Cocktail-Party Problem
The original speech signals
The estimated sources
7
The Problem
  • Find the sources s1(t), s2(t), and s3(t), and the
    coefficients aij from the observed signals
    x1(t), x2(t), and x3(t).
  • It turns out that the problem can be solved just
    by assuming that the sources si(t) are
    nongaussian and statistically independent.

8
Applications
  • Cocktail-party problem: separation of voices,
    music, or sounds
  • Sensor array processing, e.g., radar
  • Biomedical signal processing with multiple
    sensors: EEG, ECG, MEG, fMRI
  • Telecommunications: e.g., multiuser detection in
    CDMA
  • Financial and other time series
  • Noise removal from signals and images
  • Feature extraction for images and signals
  • Brain modelling

9
Basic ICA Model
Mixing (observable) signals: x = As, where
x = (x1, …, xn)ᵀ are the observed mixtures,
s = (s1, …, sn)ᵀ the independent sources, and
A = (aij) the unknown mixing matrix.
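A minimal numpy sketch of this model (the particular
sources, sample count, and random mixing matrix are
illustrative assumptions, not from the slides):

    import numpy as np

    rng = np.random.default_rng(0)
    n, T = 3, 10000                      # 3 sources, 10000 samples (illustrative)

    # Nongaussian, independent sources: sine, square wave, Laplacian noise
    t = np.linspace(0, 100, T)
    s = np.vstack([np.sin(2 * t),
                   np.sign(np.sin(3 * t)),
                   rng.laplace(size=T)])

    A = rng.uniform(-1, 1, size=(n, n))  # unknown square mixing matrix
    x = A @ s                            # observed mixtures: x(t) = A s(t)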
10
The Basic Assumptions
  • The independent components are assumed
    statistically independent.
  • The independent components must have nongaussian
    distributions.
  • For simplicity, we assume that the unknown mixing
    matrix A is square.

11
Assumption I: Statistical Independence
  • Basically, random variables y1, y2, …, yn are
    said to be independent if information on the
    value of yi does not give any information on the
    value of yj for i ≠ j.
  • Mathematically, the joint pdf is factorizable in
    the following way:
  • p(y1, y2, …, yn) = p1(y1) p2(y2) ⋯ pn(yn)
  • Note that uncorrelatedness does not necessarily
    imply independence.

12
Assumption II: Nongaussian Distributions
  • Note that in the basic model we do not have to
    know what the nongaussian distributions of the
    ICs look like.

13
Assumption III: Mixing Matrix Is Square
  • In other words, the number of independent
    components is equal to the number of observed
    mixtures.
  • This simplifies our discussion in the first
    stage.
  • However, in the basic ICA model, this is no
    restriction as long as originally the number of
    observations xi is at least as large as the
    number of sources sj.

14
Ambiguities of ICA
  • We cannot determine the variances (energies) of
    the ICs.
  • This also implies E{x} = 0 (centering of x) and
    that the sign of si is unimportant.
  • We cannot determine the order of the ICs.

Therefore, we assume the sources are identified only
up to a permutation, s → Ps,
where P is any permutation matrix.
15
Illustration of ICA
Mixing
16
Whitening Is Only Half of ICA
The whitening matrix V maps the observed x to
z = Vx (whitening).
17
Whitening Is Only Half of ICA
Uncorrelatedness is related to independence, but
is weaker than independence.
By whitening, we have E{zzᵀ} = I.
This, however, doesn't imply the zi are
independent, i.e., we may still have
p(z1, …, zn) ≠ p1(z1) ⋯ pn(zn).
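A whitening step can be sketched with an
eigendecomposition of the sample covariance; the
function name and shapes (x is n × T) are my choices:

    import numpy as np

    def whiten(x):
        # Center x (n x T) and return z = V x with E{z z^T} = I
        x = x - x.mean(axis=1, keepdims=True)   # centering: E{x} = 0
        cov = np.cov(x)                          # sample covariance E{x x^T}
        d, E = np.linalg.eigh(cov)               # cov = E diag(d) E^T
        V = E @ np.diag(d ** -0.5) @ E.T         # whitening matrix E D^{-1/2} E^T
        return V @ x, V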
18
Independent Component Analysis
The central limit theorem tells us that a sum of
independent random components tends to be more
Gaussian than the original components.
Therefore, nongaussianity is an important
criterion for ICA.
Degaussianization is hence the central theme of ICA.
19
Independent Component Analysis
Nongaussianity Measurement: Kurtosis
20
Moments
The jth moment: αj = E{X^j}
Mean: μ = α1 = E{X}
The jth central moment: μj = E{(X − μ)^j}
Variance: σ² = μ2 = E{(X − μ)²}
Skewness: based on the 3rd central moment,
μ3 = E{(X − μ)³}
21
Moment Generating Function
  • The moment generating function MX(t) of a random
    variable X is defined by MX(t) = E{e^(tX)}.
  • X ~ N(μ, σ²): MX(t) = exp(μt + σ²t²/2)
  • Z ~ N(0, 1): MZ(t) = exp(t²/2)

22
Standard Normal Distribution N(0, 1)
All odd moments are zero; the even moments are
E{Z^(2k)} = (2k − 1)!!, e.g., E{Z²} = 1 and
E{Z⁴} = 3.
23
Kurtosis
  • Kurtosis of a zero-mean random variable X is
    defined by kurt(X) = E{X⁴} − 3(E{X²})².
  • Normalized kurtosis:
    κ(X) = E{X⁴}/(E{X²})² − 3
    (equal to kurt(X) when X has unit variance).
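A sample estimate of the normalized kurtosis is one
line of numpy (the function name is my choice):

    import numpy as np

    def kurtosis(y):
        # Normalized kurtosis: E{y^4}/(E{y^2})^2 - 3 (zero for Gaussian y)
        y = y - y.mean()
        return np.mean(y ** 4) / np.mean(y ** 2) ** 2 - 3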

24
Gaussianity
For a Gaussian X ~ N(0, σ²), E{X⁴} = 3σ⁴, so
kurt(X) = 0. Kurtosis thus measures departure
from gaussianity.
25
Kurtosis for Supergaussian
Consider the Laplacian distribution:
kurt(X) > 0 (supergaussian)
27
Kurtosis for Subgaussian
Consider the uniform distribution:
kurt(X) < 0 (subgaussian)
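A quick numerical check of both cases (sample sizes
and seed below are arbitrary):

    import numpy as np
    rng = np.random.default_rng(0)

    def kurt_norm(y):                              # normalized kurtosis
        y = y - y.mean()
        return np.mean(y ** 4) / np.mean(y ** 2) ** 2 - 3

    print(kurt_norm(rng.laplace(size=100_000)))    # ~ +3.0 : supergaussian
    print(kurt_norm(rng.uniform(size=100_000)))    # ~ -1.2 : subgaussian
    print(kurt_norm(rng.normal(size=100_000)))     # ~  0.0 : Gaussian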
28
Nongaussianity Measurement By Kurtosis
  • Kurtosis, or rather its absolute value, has been
    widely used as a measure of nongaussianity in ICA
    and related fields.
  • Computationally, kurtosis can be estimated simply
    by using the 4th moment of the sample data (if
    the variance is kept constant).

29
Properties of Kurtosis
  • Let X1 and X2 be two independent random
    variables, both with zero mean. Then
    kurt(X1 + X2) = kurt(X1) + kurt(X2), and
    kurt(αX1) = α⁴ kurt(X1) for any scalar α.
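Both properties are easy to check numerically (the
distributions and sample size are arbitrary choices):

    import numpy as np
    rng = np.random.default_rng(1)

    def kurt(y):                    # unnormalized: E{y^4} - 3 (E{y^2})^2
        return np.mean(y ** 4) - 3 * np.mean(y ** 2) ** 2

    x1 = rng.laplace(size=1_000_000)
    x2 = rng.uniform(-1, 1, size=1_000_000)

    print(kurt(x1 + x2), kurt(x1) + kurt(x2))  # additivity (approx. equal)
    print(kurt(2 * x1), 16 * kurt(x1))         # kurt(a x) = a^4 kurt(x)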

30
Independent Component Analysis
ICA By Maximization of Nongaussianity
31
Restate the Problem
Ultimate goal: given the mixtures x = As, recover
the sources, i.e., find W ≈ A⁻¹ such that s = Wx.
How?
32
Simplification
Ultimate goal: recover the sources from x = As.
For simplicity, we assume the sources are i.i.d.
To estimate one independent component, consider a
linear combination y = bᵀx = bᵀAs = qᵀs, where
qᵀ = bᵀA.
If b is properly identified, qᵀ = bᵀA contains
only one nonzero entry, with value one.
This implies that bᵀ will be one row of A⁻¹.
33
Nongaussian Is Independent
Ultimate goal: recover the sources from x = As.
For simplicity, we assume the sources are i.i.d.
To estimate one independent component, again
consider y = bᵀx = qᵀs.
We will take the b that maximizes the
nongaussianity of bᵀx.
34
Nongaussian Is Independent
Mixing
35
Nongaussian Is Independent
whitening
36
Nongaussian Is Independent
A sum of components becomes more Gaussian
37
Nongaussian Is Independent
Rotation
38
Nongaussian Is Independent
Estimated density
39
Nongaussian Is Independent
Consider estimating one independent component,
y = bᵀx.
40
Nongaussian Is Independent
Consider estimating one independent component.
Project the whitened data onto a unit vector w to
get an independent component, y = wᵀz.
41
Nongaussian Is Independent
(Figure: the constraint set in the (q1, q2) plane.)
Using kurtosis as the nongaussianity measure,
we require that E{y²} = ‖q‖² = 1.
The search space is therefore the unit circle
(the unit sphere in higher dimensions).
42
Independent Component Analysis
Gradient and FastICA Algorithms Using Kurtosis
43
Criterion for ICA Using Kurtosis
maximize |kurt(wᵀz)|
subject to ‖w‖ = 1
44
Gradient Algorithm
maximize |kurt(wᵀz)|
subject to ‖w‖ = 1.
The gradient is
∂|kurt(wᵀz)|/∂w
  = 4 sign(kurt(wᵀz)) [E{z(wᵀz)³} − 3‖w‖² w].
The second term only rescales w; it is unrelated
to the search direction, so it can be dropped when
w is renormalized to unit length after each step.
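A sketch of the resulting gradient ascent on whitened
data z; the learning rate and iteration count are my
choices:

    import numpy as np

    def ica_kurtosis_gradient(z, lr=0.1, n_iter=500, seed=0):
        # Maximize |kurt(w^T z)| over unit vectors w by gradient ascent
        rng = np.random.default_rng(seed)
        w = rng.normal(size=z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            y = w @ z
            k = np.mean(y ** 4) - 3 * np.mean(y ** 2) ** 2   # kurt(w^T z)
            grad = np.sign(k) * (z * y ** 3).mean(axis=1)    # direction part
            w = w + lr * grad
            w /= np.linalg.norm(w)                           # back to |w| = 1
        return w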
45
FastICA Algorithm
maximize |kurt(wᵀz)|
subject to ‖w‖ = 1.
At a stable point, the gradient must point in the
direction of w, i.e., E{z(wᵀz)³} − 3w ∝ w.
Using fixed-point iteration, then,
w ← E{z(wᵀz)³} − 3w, followed by w ← w/‖w‖
(the sign of w is not important).
This is the FastICA update based on kurtosis.
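The corresponding fixed-point loop, sketched in numpy
(the convergence test and iteration cap are mine):

    import numpy as np

    def fastica_kurtosis(z, n_iter=100, tol=1e-8, seed=0):
        # One-unit FastICA with kurtosis: w <- E{z (w^T z)^3} - 3 w
        rng = np.random.default_rng(seed)
        w = rng.normal(size=z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            y = w @ z
            w_new = (z * y ** 3).mean(axis=1) - 3 * w
            w_new /= np.linalg.norm(w_new)
            if 1 - abs(w_new @ w) < tol:       # converged up to sign
                return w_new
            w = w_new
        return w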
46
Independent Component Analysis
Measuring Nongaussianity by Negentropy
47
Critique of Kurtosis
  • Kurtosis can be very sensitive to outliers.
  • Kurtosis may depend on only a few observations in
    the tails of the distribution.
  • Not a robust measure of nongaussianity.

48
Negentropy
Differential entropy: H(y) = −∫ p(y) log p(y) dy.
Negentropy: J(y) = H(y_gauss) − H(y) ≥ 0, where
y_gauss is a Gaussian random variable with the
same covariance as y.
Negentropy is zero only when the random variable
is Gaussian distributed.
It is invariant under invertible linear
transformations.
49
Approximation of Negentropy (I)
For a zero-mean, unit-variance random variable x,
J(x) ≈ (1/12) E{x³}² + (1/48) kurt(x)².
This approximation is of little practical use
because, like kurtosis, it is sensitive to outliers.
50
Approximation of Negentropy (II)
Choose two nonpolynomial functions:
G1(x), odd, which measures asymmetry, and
G2(x), even, which measures bimodality vs. peak
at zero,
such that
J(x) ≈ k1 (E{G1(x)})² + k2 (E{G2(x)} − E{G2(ν)})²,
where ν ~ N(0, 1) and k1, k2 > 0 are constants.
The first term is zero if the underlying density
is symmetric.
Usually, only the second term is used.
51
Approximation of Negentropy (II)
If only an even nonpolynomial function, say G, is
used, we have
J(x) ∝ (E{G(x)} − E{G(ν)})².
The following two functions are useful:
G1(x) = (1/a1) log cosh(a1 x), with 1 ≤ a1 ≤ 2
G2(x) = −exp(−x²/2)
(compare with G3(x) = x⁴, which recovers kurtosis).
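These contrast functions and the derivatives g = G′
used later can be written directly in numpy (a1 = 1
is a common default; the names follow the slides):

    import numpy as np

    a1 = 1.0                                         # typically 1 <= a1 <= 2

    def G1(y): return np.log(np.cosh(a1 * y)) / a1   # robust, general-purpose
    def G2(y): return -np.exp(-y ** 2 / 2)           # for very supergaussian ICs
    def G3(y): return y ** 4                         # kurtosis-based (not robust)

    def g1(y): return np.tanh(a1 * y)                # G1'
    def g2(y): return y * np.exp(-y ** 2 / 2)        # G2'
    def g3(y): return 4 * y ** 3                     # G3'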
52
Degaussian
For ICA, we want to maximize this quantity.
Specifically, let z = Vx be the whitened data.
For one-unit ICA, we want to find a rotation, say
w, to
maximize JG(w) = (E{G(wᵀz)} − E{G(ν)})²
subject to ‖w‖ = 1.
53
Gradient Algorithm
Fact: the sign of γ = E{G(wᵀz)} − E{G(ν)} remains
(essentially) constant during the search, so it can
be treated as a constant.
To maximize JG(w), take gradient steps
Δw ∝ γ E{z g(wᵀz)}, then renormalize w.
Algorithm (batch mode): w ← w + η γ E{z g(wᵀz)},
w ← w/‖w‖.
On-line mode: replace the expectation by the
current sample, Δw ∝ γ z g(wᵀz).
54
Analysis
maximize JG(w) = (E{G(wᵀz)} − E{G(ν)})².
Consider the term inside the braces.
The functions G we use have the following
property:
55
Analysis
Minimize E{G(wᵀz)} if the IC is supergaussian;
maximize E{G(wᵀz)} if the IC is subgaussian.
56
Analysis
(Plots of g1, g2, and g3, the derivatives of G1,
G2, and G3.)
Both g1 and g2 are less sensitive to outliers
than g3.
57
Analysis
γ controls the search direction; its sign depends
on the super/subgaussianity of the samples.
The nonlinearity g(wᵀz) serves to weight the
samples.
58
Stability Analysis
  • Assume that the input data follows the ICA model
    with whitened data z = VAs.
  • And, G is a sufficiently smooth even function.
  • Then, the local maxima (resp. minima) of
    E{G(wᵀz)} under the constraint ‖w‖ = 1 include
    those rows of the inverse of the mixing matrix VA
    such that the corresponding independent
    components si satisfy
    E{si g(si) − g′(si)} (E{G(si)} − E{G(ν)}) > 0
    (resp. < 0).

59
Stability Analysis
This condition is, in general, true for
reasonable choices of G.
60
Independent Component Analysis
FastICA Using Negentropy
61
Clue From Gradient Algorithm
The gradient algorithm suggests the fixed-point
iteration w ← E{z g(wᵀz)}, followed by
normalization.
Nonpolynomial moments, however, do not have the
same nice algebraic properties as kurtosis, so
this iteration scheme converges poorly.
62
Newton's Method
Maximize or minimize E{G(wᵀz)}
subject to ‖w‖² = 1.
Construct the Lagrangian as follows:
L(w, β) = E{G(wᵀz)} − (β/2)(‖w‖² − 1),
so that ∇wL = E{z g(wᵀz)} − βw.
Newton's method finds an extreme point by
letting w ← w − (∇w²L)⁻¹ ∇wL.
63
Newton's Method
Evaluating the Hessian matrix and its inverse is
time-consuming. We want to approximate it.
The Hessian of the Lagrangian is
∇w²L = E{zzᵀ g′(wᵀz)} − βI.
64
Newton's Method
Since z is whitened,
E{zzᵀ g′(wᵀz)} ≈ E{zzᵀ} E{g′(wᵀz)} = E{g′(wᵀz)} I,
so the Hessian is approximately
(E{g′(wᵀz)} − β) I, a diagonal matrix that is
trivial to invert.
65
Newton's Method
Applying this approximate Newton step and
multiplying through by β − E{g′(wᵀz)} gives the
update w⁺ = E{z g(wᵀz)} − E{g′(wᵀz)} w.
66
FastICA
The algorithm iterates
w⁺ ← E{z g(wᵀz)} − E{g′(wᵀz)} w,
w ← w⁺/‖w⁺‖,
until convergence.
67
FastICA
  • Center the data to make its mean zero.
  • Whiten the data to give z.
  • Choose an initial vector w of unit norm.
  • Step 4: update w ← E{z g(wᵀz)} − E{g′(wᵀz)} w,
    then renormalize w ← w/‖w‖ (see the sketch
    below).
  • If not converged, go back to step 4.
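A minimal numpy sketch of this one-unit loop,
assuming z is already centered and whitened; the
convergence test |wᵀw_new| ≈ 1 and the iteration cap
are my choices:

    import numpy as np

    def fastica_one_unit(z, g, g_prime, n_iter=200, tol=1e-8, seed=0):
        # One-unit FastICA: w <- E{z g(w^T z)} - E{g'(w^T z)} w, normalize
        rng = np.random.default_rng(seed)
        w = rng.normal(size=z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            y = w @ z
            w_new = (z * g(y)).mean(axis=1) - g_prime(y).mean() * w
            w_new /= np.linalg.norm(w_new)
            if 1 - abs(w_new @ w) < tol:   # w, w_new aligned up to sign
                return w_new
            w = w_new
        return w

    # e.g., with g = tanh (G = log cosh):
    # w = fastica_one_unit(z, np.tanh, lambda y: 1 - np.tanh(y) ** 2)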

69
FastICA
70
Estimating Several ICs
  • Deflation orthogonalization: estimate the ICs
    one by one, based on the Gram-Schmidt method.
  • Symmetric orthogonalization: adjust all vectors
    in parallel.

71
Deflation Orthogonalization
  • Center the data to make its mean zero.
  • Whiten the data to give z.
  • Choose m, the number of ICs to estimate; set the
    counter p ← 1.
  • Choose an initial vector wp of unit norm,
    randomly.
  • Step 5: do a one-unit FastICA update on wp.
  • Step 6: orthogonalize wp against the vectors
    already found, wp ← wp − Σ_{j<p} (wpᵀwj) wj, and
    renormalize wp ← wp/‖wp‖ (see the sketch below).
  • If wp has not converged, go back to step 5.
  • Set p ← p + 1; if p ≤ m, go back to step 4.
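A deflationary sketch in the same style; the
Gram-Schmidt step after each one-unit update keeps
wp orthogonal to the rows already found (names and
tolerances are mine):

    import numpy as np

    def fastica_deflation(z, g, g_prime, m, n_iter=200, tol=1e-8, seed=0):
        # Estimate m ICs one by one, deflating each new w
        rng = np.random.default_rng(seed)
        W = np.zeros((m, z.shape[0]))
        for p in range(m):
            w = rng.normal(size=z.shape[0])
            w /= np.linalg.norm(w)
            for _ in range(n_iter):
                y = w @ z
                w_new = (z * g(y)).mean(axis=1) - g_prime(y).mean() * w
                w_new -= W[:p].T @ (W[:p] @ w_new)   # Gram-Schmidt deflation
                w_new /= np.linalg.norm(w_new)
                if 1 - abs(w_new @ w) < tol:
                    break
                w = w_new
            W[p] = w_new
        return W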

72
Symmetric Orthogonalization
  • Choose the number of independent components to
    estimate, say m.
  • Initialize wi, i = 1, …, m (e.g., randomly).
  • Do an iteration of the one-unit algorithm on
    every wi in parallel.
  • Do a symmetric orthogonalization of the
    matrix W = (w1, …, wm)ᵀ.
  • If not converged, go back to step 3.

73
Symmetric Orthogonalization
Method 1 (Classic Method):
  • Let W ← (WWᵀ)^(−1/2) W.
Method 2 (Iterative Method):
  • Let W ← W/√‖WWᵀ‖.
  • Let W ← (3/2) W − (1/2) WWᵀW.
  • If WWᵀ is not close enough to identity, go back
    to step 2.
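Both methods can be sketched in a few lines of numpy
(the eigendecomposition route for the inverse square
root, and the spectral norm for the initial scaling,
are my choices):

    import numpy as np

    def sym_orth_classic(W):
        # Method 1: W <- (W W^T)^(-1/2) W via eigendecomposition of W W^T
        d, E = np.linalg.eigh(W @ W.T)
        return E @ np.diag(d ** -0.5) @ E.T @ W

    def sym_orth_iterative(W, tol=1e-10):
        # Method 2: scale so the largest singular value is 1, then iterate
        W = W / np.sqrt(np.linalg.norm(W @ W.T, ord=2))
        while np.linalg.norm(W @ W.T - np.eye(W.shape[0])) > tol:
            W = 1.5 * W - 0.5 * W @ W.T @ W
        return W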