Title: Information Theory
1 Information Theory
- Applications of Information Theory in ICA
2 Agenda
- ICA review
- Measures for Nongaussianity
- ICA by Maximization of Negentropy
- ICA by Minimization of Mutual Information
- Introduction to ML Estimation
- Conclusion
- References
3 ICA review
- Given n linear mixtures x1, ..., xn of n independent components s1, ..., sn:
  xj = aj1 s1 + aj2 s2 + ... + ajn sn,  j = 1, ..., n
- In matrix notation: x = A s, where x and s are n×1 vectors and A is the n×n mixing matrix
- Estimate A and s so that the components of s are statistically independent (a minimal numerical sketch of this model follows below).
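A minimal sketch of this mixing model, assuming NumPy; the Laplace sources, the 2×1000 problem size, and the random mixing matrix are illustrative choices, not values from the slides:

    import numpy as np

    rng = np.random.default_rng(0)
    n, T = 2, 1000                        # number of components and samples (illustrative)
    s = rng.laplace(size=(n, T))          # independent, non-Gaussian sources s_1, ..., s_n
    A = rng.standard_normal((n, n))       # unknown n x n mixing matrix
    x = A @ s                             # observed mixtures: x = A s
    # ICA task: recover A (or its inverse W) and s from x alone,
    # which is possible only up to scaling and permutation of the components.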
4 ICA review, example
5 ICA review, example
6 ICA review, example
7 ICA review
- Problems in ICA estimation
- How to measure statistical independence?
- By kurtosis
- By negentropy
- By mutual information
- How to find the global maximum of statistical independence?
- Optimization methods (e.g. gradient descent); see the sketch below
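For the optimization step, a minimal sketch of one-unit estimation by plain gradient ascent on the absolute kurtosis, assuming NumPy and already whitened data z; this generic gradient method is an illustration, not the algorithm presented in these slides:

    import numpy as np

    def one_unit_ica_gradient(z, lr=0.1, n_iter=1000):
        # z: whitened observations, shape (n, T); returns one unmixing direction w
        rng = np.random.default_rng(0)
        w = rng.standard_normal(z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            y = w @ z                                    # current projection y = w^T z
            k = np.mean(y**4) - 3.0                      # its (excess) kurtosis
            grad = np.mean(z * y**3, axis=1) - 3.0 * w   # gradient of kurt(w^T z) for whitened z, ||w|| = 1
            w += lr * np.sign(k) * grad                  # ascend the absolute kurtosis
            w /= np.linalg.norm(w)                       # stay on the unit sphere
        return w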
8 Measures for Nongaussianity: Kurtosis
- kurt(y) = E[y^4] - 3 (E[y^2])^2
- Examples for the kurtosis of several distributions (Laplace, Gauss, uniform), checked numerically below:
  kurt(y_gauss) = 0, kurt(y_uniform) = -1.2, kurt(y_laplace) = 3
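A quick numerical check of these values, assuming NumPy; the scale parameters are chosen so each distribution has unit variance:

    import numpy as np

    def kurt(y):
        # kurt(y) = E[y^4] - 3 (E[y^2])^2 for zero-mean y
        y = y - y.mean()
        return np.mean(y**4) - 3 * np.mean(y**2)**2

    rng = np.random.default_rng(0)
    T = 100_000
    print(kurt(rng.standard_normal(T)))                   # ~  0    (Gaussian)
    print(kurt(rng.uniform(-np.sqrt(3), np.sqrt(3), T)))  # ~ -1.2  (uniform, unit variance)
    print(kurt(rng.laplace(0, 1 / np.sqrt(2), T)))        # ~  3    (Laplace, unit variance)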
9 Problems using Kurtosis
- kurt(y) = E[y^4] - 3 (E[y^2])^2
- Robustness
- Example: Gaussian distribution with zero mean and unit variance, 1000 samples, 1 outlier at 10
- kurt(y) is then at least 10^4 / 1000 - 3 = 7
- No outliers: kurt(y) ≈ 0.11; 1 outlier at 10: kurt(y) ≈ 10.98 (reproduced in the sketch below)
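The same effect in a few lines, assuming NumPy; the exact numbers depend on the random sample, so they will only roughly match the values above:

    import numpy as np

    def kurt(y):
        # kurt(y) = E[y^4] - 3 (E[y^2])^2 for zero-mean y
        y = y - y.mean()
        return np.mean(y**4) - 3 * np.mean(y**2)**2

    rng = np.random.default_rng(0)
    y = rng.standard_normal(1000)
    print(kurt(y))      # close to 0 for a Gaussian sample
    y[0] = 10.0         # a single outlier
    print(kurt(y))      # dominated by the outlier: roughly 10**4 / 1000 - 3 = 7 or more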
10 ICA by Maximization of Negentropy
- Remember from basic Information Theory:
- Differential entropy: H(y) = -∫ p(y) log p(y) dy
- Negentropy: J(y) = H(y_gauss) - H(y), where y_gauss is a Gaussian variable with the same variance as y
- A Gaussian variable has the largest entropy among all random variables of equal variance, so entropy can be used to measure nongaussianity.
11 ICA by Maximization of Negentropy
- Properties of negentropy
- statistically ideal measure of nongaussianity
- always nonnegative
- zero iff the probability distribution is Gaussian
- invariant under invertible linear transformations
- Disadvantage
- the density p(y) must be known (or estimated) to compute H(y); see the estimation sketch below
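To illustrate that disadvantage, a crude sketch that estimates H(y) from a histogram density estimate and then J(y) against a Gaussian of equal variance; the histogram estimator and bin count are assumptions for illustration only:

    import numpy as np

    def entropy_hist(y, bins=50):
        # crude differential-entropy estimate from a histogram estimate of p(y)
        p, edges = np.histogram(y, bins=bins, density=True)
        widths = np.diff(edges)
        mask = p > 0
        return -np.sum(p[mask] * np.log(p[mask]) * widths[mask])

    def negentropy_hist(y, bins=50):
        # J(y) = H(y_gauss) - H(y); a Gaussian with variance var(y) has H = 0.5 * log(2 * pi * e * var(y))
        h_gauss = 0.5 * np.log(2 * np.pi * np.e * np.var(y))
        return h_gauss - entropy_hist(y, bins)

    rng = np.random.default_rng(0)
    print(negentropy_hist(rng.standard_normal(10_000)))   # close to 0 for Gaussian data
    print(negentropy_hist(rng.laplace(size=10_000)))      # clearly positive for non-Gaussian data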
12 Approximating Negentropy by higher order cumulants
- Idea behind the approximation
- The PDF of a random variable x is to be approximated
- x has zero mean and unit variance
- The density of x, p_x(ξ), is assumed to be close to the standardized Gaussian density
- Then p_x(ξ) can be approximated by a Gram-Charlier expansion
13 Outline of the Approximation
- p_x(ξ) is expanded around the standardized Gaussian density in terms of Hermite polynomials (Gram-Charlier expansion)
14 Approximating Negentropy by higher order cumulants
Using the approximation on 1000 random samples with Gaussian distribution and one outlier at 10 (the computation is sketched below):
J(y) ≈ 1.43
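The cumulant-based approximation behind this number is, in the notation of the cited tutorial, J(y) ≈ (1/12) E[y^3]^2 + (1/48) kurt(y)^2. A minimal sketch, assuming NumPy and internally standardizing the data; the random seed and the exact resulting value are illustrative:

    import numpy as np

    def negentropy_cumulant(y):
        # J(y) ≈ 1/12 * E[y^3]^2 + 1/48 * kurt(y)^2, for y with zero mean and unit variance
        y = (y - y.mean()) / y.std()
        k = np.mean(y**4) - 3.0
        return np.mean(y**3)**2 / 12 + k**2 / 48

    rng = np.random.default_rng(0)
    y = rng.standard_normal(1000)
    y[0] = 10.0                      # single outlier, as on this slide
    print(negentropy_cumulant(y))    # far from 0, although the data are essentially Gaussian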
15 Problems using higher order cumulants
- The approximation is valid only for distributions similar to a Gaussian distribution
- Symmetric distributions suffer from the same robustness problems as kurtosis alone
16 Approximating Negentropy by nonpolynomial cumulants
- Idea behind the approximation
- A generalization of the previous approximation
- If G1(y) = y^3 and G2(y) = y^4, the approximations are identical.
- Based on the maximum entropy principle: if the Gi(y) are nonpolynomial functions, the E[Gi(y)] are nonpolynomial moments of the random variable y. The approximation gives the maximum entropy that can be reached without violating the constraints given by these moments.
17 Advantages
- The Gi(y) can be chosen to enhance robustness.
- In particular, replacing y^4 by a function that does not grow too fast gives much more robust results.
- The approximation by nonpolynomial moments does not give exact results, but it is consistent in the sense that it is
- always nonnegative
- zero iff y has a Gaussian distribution
18 Choosing G1 and G2
- Useful functions (a sketch using them follows below)
- G1(y) = (1/a1) log cosh(a1 y), 1 ≤ a1 ≤ 2
- G2(y) = -exp(-y^2 / 2)
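A minimal sketch of the resulting one-unit contrast, assuming NumPy: here J(y) is approximated, up to a positive constant taken as 1, by (E[G(y)] - E[G(nu)])^2, where nu is a standardized Gaussian variable estimated by sampling; the constant and the Monte Carlo reference are simplifications, not values from the slides:

    import numpy as np

    def G1(y, a1=1.0):
        # G1(y) = (1/a1) * log cosh(a1 * y), 1 <= a1 <= 2
        return np.log(np.cosh(a1 * y)) / a1

    def G2(y):
        # G2(y) = -exp(-y^2 / 2)
        return -np.exp(-y**2 / 2)

    def negentropy_nonpoly(y, G=G1, n_ref=100_000):
        # J(y) ≈ c * (E[G(y)] - E[G(nu)])^2, with nu ~ N(0, 1); c is set to 1 here
        y = (y - y.mean()) / y.std()
        nu = np.random.default_rng(0).standard_normal(n_ref)
        return (G(y).mean() - G(nu).mean())**2

    y = np.random.default_rng(1).standard_normal(1000)
    y[0] = 10.0                                        # same outlier scenario as before
    print(negentropy_nonpoly(y, G1), negentropy_nonpoly(y, G2))
    # G1 and G2 grow slowly for large |y|, so the single outlier at 10
    # perturbs these averages far less than it perturbs E[y**4].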
19 Approximating Negentropy by nonpolynomial cumulants
Using the approximation on 1000 random samples with Gaussian distribution and one outlier at 10:
G1(y) = (1/a1) log(cosh(a1 y)), a1 = 1
G2(y) = -exp(-y^2 / 2)
J(y) ≈ 0.57
20 Advantages of the Negentropy approximation
- efficient to compute
- more robust than kurtosis
- a good compromise between the two classical measures, kurtosis and exact negentropy
21 ICA by minimization of mutual information
- So far, statistical independence has been measured by nongaussianity
- Mutual information is another measure of the independence of random variables
22 Relation of mutual information to negentropy
- Mutual information: I(y1, ..., yn) = Σi H(yi) - H(y)
- Negentropy: J(y) = H(y_gauss) - H(y)
- For an invertible linear transformation W with y = Wx:
  I(y1, ..., yn) = Σi H(yi) - H(x) - log |det W|
- => If the yi are uncorrelated and have unit variance:
  I(y1, ..., yn) = C - Σi J(yi), where C is a constant that does not depend on W
23 Conclusion
- ICA by estimating negentropy is more robust than ICA based on kurtosis
- Minimization of mutual information is an alternative approach, but it does not allow one-by-one estimation of the independent components
- An efficient algorithm for practical purposes still needs to be introduced (FastICA)
24 References
- Aapo Hyvärinen, Erkki Oja: Independent Component Analysis: A Tutorial
- Hyvärinen, Karhunen, Oja: Independent Component Analysis. Wiley & Sons
- Prior presentations