Title: Information Theory
1 Information Theory
- Applications of Information Theory in ICA
2 Agenda
- ICA review
- Measures for Nongaussianity
- ICA by Maximization of Negentropy
- ICA by Minimization of Mutual Information
- Introduction to ML Estimation
- Conclusion
- References
3 ICA review
- Given n linear mixtures x1, ..., xn of n independent components s1, ..., sn:
  xj = aj1 s1 + aj2 s2 + ... + ajn sn,  j = 1, ..., n
- In matrix notation: x = A s, where x and s are n×1 vectors and A is the n×n mixing matrix
- Estimate A and s so that the components of s are statistically independent (a minimal numerical sketch of this model follows below).
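A minimal sketch of this mixing model, assuming NumPy; the Laplace sources, the 2×1000 problem size, and the random mixing matrix are illustrative choices, not values from the slides:

    import numpy as np

    rng = np.random.default_rng(0)
    n, T = 2, 1000                        # number of components and samples (illustrative)
    s = rng.laplace(size=(n, T))          # independent, non-Gaussian sources s_1, ..., s_n
    A = rng.standard_normal((n, n))       # unknown n x n mixing matrix
    x = A @ s                             # observed mixtures: x = A s
    # ICA task: recover A (or its inverse W) and s from x alone,
    # which is possible only up to scaling and permutation of the components.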
4 ICA review, example
5 ICA review, example
6 ICA review, example
7 ICA review
- Problems in ICA estimation
- How to measure statistical independence?
- By kurtosis
- By negentropy
- By mutual information
- How to find the global maximum of statistical independence?
- Optimization methods (e.g. gradient descent); see the sketch below
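For the optimization step, a minimal sketch of one-unit estimation by plain gradient ascent on the absolute kurtosis, assuming NumPy and already whitened data z; this generic gradient method is an illustration, not the algorithm presented in these slides:

    import numpy as np

    def one_unit_ica_gradient(z, lr=0.1, n_iter=1000):
        # z: whitened observations, shape (n, T); returns one unmixing direction w
        rng = np.random.default_rng(0)
        w = rng.standard_normal(z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            y = w @ z                                    # current projection y = w^T z
            k = np.mean(y**4) - 3.0                      # its (excess) kurtosis
            grad = np.mean(z * y**3, axis=1) - 3.0 * w   # gradient of kurt(w^T z) for whitened z, ||w|| = 1
            w += lr * np.sign(k) * grad                  # ascend the absolute kurtosis
            w /= np.linalg.norm(w)                       # stay on the unit sphere
        return w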
8 Measures for Nongaussianity: Kurtosis
- kurt(y) = E[y^4] - 3 (E[y^2])^2
- Examples for the kurtosis of several distributions (Laplace, Gauss, uniform), checked numerically below:
  kurt(y_gauss) = 0, kurt(y_uniform) = -1.2, kurt(y_laplace) = 3
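A quick numerical check of these values, assuming NumPy; the scale parameters are chosen so each distribution has unit variance:

    import numpy as np

    def kurt(y):
        # kurt(y) = E[y^4] - 3 (E[y^2])^2 for zero-mean y
        y = y - y.mean()
        return np.mean(y**4) - 3 * np.mean(y**2)**2

    rng = np.random.default_rng(0)
    T = 100_000
    print(kurt(rng.standard_normal(T)))                   # ~  0    (Gaussian)
    print(kurt(rng.uniform(-np.sqrt(3), np.sqrt(3), T)))  # ~ -1.2  (uniform, unit variance)
    print(kurt(rng.laplace(0, 1 / np.sqrt(2), T)))        # ~  3    (Laplace, unit variance)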
9 Problems using Kurtosis
- kurt(y) = E[y^4] - 3 (E[y^2])^2
- Robustness
- Example: Gaussian distribution with zero mean and unit variance, 1000 samples, 1 outlier at 10
- kurt(y) is then at least 10^4 / 1000 - 3 = 7
- No outliers: kurt(y) ≈ 0.11; 1 outlier at 10: kurt(y) ≈ 10.98 (reproduced in the sketch below)
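The same effect in a few lines, assuming NumPy; the exact numbers depend on the random sample, so they will only roughly match the values above:

    import numpy as np

    def kurt(y):
        # kurt(y) = E[y^4] - 3 (E[y^2])^2 for zero-mean y
        y = y - y.mean()
        return np.mean(y**4) - 3 * np.mean(y**2)**2

    rng = np.random.default_rng(0)
    y = rng.standard_normal(1000)
    print(kurt(y))      # close to 0 for a Gaussian sample
    y[0] = 10.0         # a single outlier
    print(kurt(y))      # dominated by the outlier: roughly 10**4 / 1000 - 3 = 7 or more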
10 ICA by Maximization of Negentropy
- Remember from basic Information Theory:
- Differential entropy: H(y) = -∫ p(y) log p(y) dy
- Negentropy: J(y) = H(y_gauss) - H(y), where y_gauss is a Gaussian variable with the same variance as y
- A Gaussian variable has the largest entropy among all random variables of equal variance, so entropy can be used to measure nongaussianity.
11 ICA by Maximization of Negentropy
- Properties of negentropy
- statistically ideal measure of nongaussianity
- always nonnegative
- zero iff the probability distribution is Gaussian
- invariant under invertible linear transformations
- Disadvantage
- the density p(y) must be known (or estimated) to compute H(y); see the estimation sketch below
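To illustrate that disadvantage, a crude sketch that estimates H(y) from a histogram density estimate and then J(y) against a Gaussian of equal variance; the histogram estimator and bin count are assumptions for illustration only:

    import numpy as np

    def entropy_hist(y, bins=50):
        # crude differential-entropy estimate from a histogram estimate of p(y)
        p, edges = np.histogram(y, bins=bins, density=True)
        widths = np.diff(edges)
        mask = p > 0
        return -np.sum(p[mask] * np.log(p[mask]) * widths[mask])

    def negentropy_hist(y, bins=50):
        # J(y) = H(y_gauss) - H(y); a Gaussian with variance var(y) has H = 0.5 * log(2 * pi * e * var(y))
        h_gauss = 0.5 * np.log(2 * np.pi * np.e * np.var(y))
        return h_gauss - entropy_hist(y, bins)

    rng = np.random.default_rng(0)
    print(negentropy_hist(rng.standard_normal(10_000)))   # close to 0 for Gaussian data
    print(negentropy_hist(rng.laplace(size=10_000)))      # clearly positive for non-Gaussian data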
12 Approximating Negentropy by higher order cumulants
- Idea behind the approximation
- The PDF of a random variable x is to be approximated
- x has zero mean and unit variance
- The density of x, p_x(ξ), is assumed to be close to the standardized Gaussian density
- Then p_x(ξ) can be approximated by a Gram-Charlier expansion
13 Outline of the Approximation
- p_x(ξ) is expanded around the standardized Gaussian density in terms of Hermite polynomials (Gram-Charlier expansion)
14 Approximating Negentropy by higher order cumulants
Using the approximation on 1000 random samples with Gaussian distribution and one outlier at 10 (the computation is sketched below):
J(y) ≈ 1.43
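The cumulant-based approximation behind this number is, in the notation of the cited tutorial, J(y) ≈ (1/12) E[y^3]^2 + (1/48) kurt(y)^2. A minimal sketch, assuming NumPy and internally standardizing the data; the random seed and the exact resulting value are illustrative:

    import numpy as np

    def negentropy_cumulant(y):
        # J(y) ≈ 1/12 * E[y^3]^2 + 1/48 * kurt(y)^2, for y with zero mean and unit variance
        y = (y - y.mean()) / y.std()
        k = np.mean(y**4) - 3.0
        return np.mean(y**3)**2 / 12 + k**2 / 48

    rng = np.random.default_rng(0)
    y = rng.standard_normal(1000)
    y[0] = 10.0                      # single outlier, as on this slide
    print(negentropy_cumulant(y))    # far from 0, although the data are essentially Gaussian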
15 Problems using higher order cumulants
- The approximation is valid only for distributions similar to a Gaussian distribution
- Symmetric distributions suffer from the same robustness problems as kurtosis alone
16 Approximating Negentropy by nonpolynomial cumulants
- Idea behind the approximation
- A generalization of the previous approximation
- If G1(y) = y^3 and G2(y) = y^4, the approximations are identical.
- Based on the maximum entropy principle: if the Gi(y) are nonpolynomial functions, the E[Gi(y)] are nonpolynomial moments of the random variable y. The approximation gives the maximum entropy that can be reached without violating the constraints given by these moments.
17 Advantages
- The Gi(y) can be chosen to enhance robustness.
- In particular, replacing y^4 by a function that does not grow too fast gives much more robust results.
- The approximation by nonpolynomial moments does not give exact results, but it is consistent in the sense that it is
- always nonnegative
- zero iff y has a Gaussian distribution
18 Choosing G1 and G2
- Useful functions (a sketch using them follows below)
- G1(y) = (1/a1) log cosh(a1 y), 1 ≤ a1 ≤ 2
- G2(y) = -exp(-y^2 / 2)
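A minimal sketch of the resulting one-unit contrast, assuming NumPy: here J(y) is approximated, up to a positive constant taken as 1, by (E[G(y)] - E[G(nu)])^2, where nu is a standardized Gaussian variable estimated by sampling; the constant and the Monte Carlo reference are simplifications, not values from the slides:

    import numpy as np

    def G1(y, a1=1.0):
        # G1(y) = (1/a1) * log cosh(a1 * y), 1 <= a1 <= 2
        return np.log(np.cosh(a1 * y)) / a1

    def G2(y):
        # G2(y) = -exp(-y^2 / 2)
        return -np.exp(-y**2 / 2)

    def negentropy_nonpoly(y, G=G1, n_ref=100_000):
        # J(y) ≈ c * (E[G(y)] - E[G(nu)])^2, with nu ~ N(0, 1); c is set to 1 here
        y = (y - y.mean()) / y.std()
        nu = np.random.default_rng(0).standard_normal(n_ref)
        return (G(y).mean() - G(nu).mean())**2

    y = np.random.default_rng(1).standard_normal(1000)
    y[0] = 10.0                                        # same outlier scenario as before
    print(negentropy_nonpoly(y, G1), negentropy_nonpoly(y, G2))
    # G1 and G2 grow slowly for large |y|, so the single outlier at 10
    # perturbs these averages far less than it perturbs E[y**4].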
19 Approximating Negentropy by nonpolynomial cumulants
Using the approximation on 1000 random samples with Gaussian distribution and one outlier at 10:
G1(y) = (1/a1) log(cosh(a1 y)), a1 = 1
G2(y) = -exp(-y^2 / 2)
J(y) ≈ 0.57
20 Advantages of the Negentropy approximation
- efficient to compute
- more robust than kurtosis
- a good compromise between the two classical measures, kurtosis and exact negentropy
21 ICA by minimization of mutual information
- So far, statistical independence has been measured by nongaussianity
- Mutual information is another measure of the independence of random variables
22 Relation of mutual information to negentropy
- Mutual information: I(y1, ..., yn) = Σi H(yi) - H(y)
- Negentropy: J(y) = H(y_gauss) - H(y)
- For an invertible linear transformation W with y = Wx:
  I(y1, ..., yn) = Σi H(yi) - H(x) - log |det W|
- => If the yi are uncorrelated and have unit variance:
  I(y1, ..., yn) = C - Σi J(yi), where C is a constant that does not depend on W
23 Conclusion
- ICA by estimating negentropy is more robust than ICA based on kurtosis
- Minimization of mutual information is an alternative approach, but it does not allow one-by-one estimation of the independent components
- An efficient algorithm for practical purposes still needs to be introduced (FastICA)
24 References
- Aapo Hyvärinen, Erkki Oja: Independent Component Analysis: A Tutorial
- Hyvärinen, Karhunen, Oja: Independent Component Analysis. Wiley & Sons
- Prior presentations