Transcript and Presenter's Notes

Title: Information Theory


1
Information Theory
  • Applications of Information Theory in ICA

2
Agenda
  • ICA review
  • Measures for Nongaussianity
  • ICA by Maximization of Negentropy
  • ICA by Minimization of Mutual Information
  • Introduction to ML Estimation
  • Conclusion
  • References

3
ICA review
  • Given n linear mixtures x1, ..., xn of n
    independent components s1, ..., sn
  • xj = aj1·s1 + aj2·s2 + ... + ajn·sn,  j = 1..n
  • In matrix form: x = A·s, with x and s of size n×1
    and A of size n×n
  • Estimate A and s so that the components of s are
    statistically independent.
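A minimal numerical sketch of this mixing model (the sources, the sample size, and the 2×2 mixing matrix A below are made up for illustration; ICA has to recover A and s from x alone):

    import numpy as np

    rng = np.random.default_rng(0)

    # two hypothetical independent sources s1, s2 (n = 2)
    s = np.vstack([rng.uniform(-1.0, 1.0, 1000),
                   rng.laplace(0.0, 1.0, 1000)])

    # hypothetical n x n mixing matrix A
    A = np.array([[1.0, 0.5],
                  [0.7, 1.2]])

    # observed mixtures: x = A s
    x = A @ s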

4
ICA review, example
5
ICA review, example
6
ICA review, example
7
ICA review
  • Problems in ICA estimation
  • How to measure statistical independence?
  • By Kurtosis
  • By Negentropy
  • By mutual information
  • How to find the global maximum of the
    independence measure?
  • Optimization methods (e.g. Gradient Descent)

8
Measures for Nongaussianity: Kurtosis
  • kurt(y) = E[y^4] - 3(E[y^2])^2
  • Examples for the kurtosis of several distributions:
    Laplace, Gauss, uniform

kurt(y_gauss) = 0, kurt(y_uniform) = -1.2,
kurt(y_laplace) = 3
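As a sketch, the sample kurtosis of the three distributions named above can be checked numerically (all three are scaled to unit variance; the helper name excess_kurtosis is illustrative):

    import numpy as np

    def excess_kurtosis(y):
        # kurt(y) = E[y^4] - 3 (E[y^2])^2 for zero-mean y
        y = y - y.mean()
        return np.mean(y**4) - 3.0 * np.mean(y**2)**2

    rng = np.random.default_rng(0)
    n = 100_000
    print(excess_kurtosis(rng.normal(0.0, 1.0, n)))                   # ~  0   (Gauss)
    print(excess_kurtosis(rng.uniform(-np.sqrt(3), np.sqrt(3), n)))   # ~ -1.2 (uniform)
    print(excess_kurtosis(rng.laplace(0.0, 1.0 / np.sqrt(2), n)))     # ~  3   (Laplace)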
9
Problems using Kurtosis
  • kurt(y) = E[y^4] - 3(E[y^2])^2
  • Robustness
  • Example: Gaussian distribution, zero mean, unit
    variance, 1000 samples, 1 outlier at 10
  • The outlier alone makes kurt(y) at least
    10^4 / 1000 - 3 = 7

No outliers: kurt(y) = 0.11    1 outlier at 10:
kurt(y) = 10.98
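A small sketch of the experiment described above (the exact numbers vary with the random draw, but the effect of the single outlier is of the same order as on the slide):

    import numpy as np

    def excess_kurtosis(y):
        y = y - y.mean()
        return np.mean(y**4) - 3.0 * np.mean(y**2)**2

    rng = np.random.default_rng(0)
    y = rng.normal(0.0, 1.0, 1000)      # zero mean, unit variance, 1000 samples
    print(excess_kurtosis(y))           # close to 0 for the clean Gaussian sample

    y[0] = 10.0                         # a single outlier at 10
    print(excess_kurtosis(y))           # jumps to roughly 7 or more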
10
ICA by Maximization of Negentropy
  • Remember from basic Information Theory
  • Differential entropy
  • Negentropy
  • A Gaussian variable has the largest entropy among
    all random variables of equal variance, so entropy
    can be used to measure nongaussianity (definitions
    below).
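The two quantities referred to above are, in standard notation (f denotes the density of y, and y_gauss a Gaussian variable with the same covariance as y):

    H(y) = -\int f(y) \log f(y) \, dy        % differential entropy
    J(y) = H(y_gauss) - H(y)                 % negentropy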

11
ICA by Maximization of Negentropy
  • Properties of negentropy
  • statistically an ideal measure of nongaussianity
  • always nonnegative
  • zero iff the probability distribution is Gaussian
  • invariant under invertible linear transformations
  • Disadvantage
  • the density p(y) must be known to compute H(y)

12
Approximating Negentropy by higher order cumulants
  • Idea behind the approximation
  • The PDF of a random variable x is to be
    approximated
  • x has zero mean and unit variance
  • The density of x, p_x(ξ), is assumed to be close
    to the standardized Gaussian density
  • Then p_x(ξ) can be approximated by a Gram-Charlier
    expansion

13
Outline of the Approximation
(Formulas: Gram-Charlier expansion of p_x(ξ) around the
standardized Gaussian density, written in terms of
Hermite polynomials; the resulting approximation is
given below)
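Truncating this expansion and evaluating the entropy gives the familiar cumulant-based approximation (as in the Hyvärinen-Oja tutorial), for zero-mean, unit-variance y:

    J(y) \approx \frac{1}{12} E\{y^3\}^2 + \frac{1}{48} \mathrm{kurt}(y)^2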
14
Approximating Negentropy by higher order cumulants
Using the approximation on 1000 random samples
with Gaussian distribution and one outlier at 10:
J(y) ≈ 1.43
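A sketch of how such a value can be computed with the cumulant-based approximation above (the exact number depends on the random sample, so it will not match 1.43 exactly):

    import numpy as np

    def negentropy_cumulant(y):
        # J(y) ~ 1/12 E[y^3]^2 + 1/48 kurt(y)^2 for zero-mean, unit-variance y
        y = (y - y.mean()) / y.std()
        skew = np.mean(y**3)
        kurt = np.mean(y**4) - 3.0
        return skew**2 / 12.0 + kurt**2 / 48.0

    rng = np.random.default_rng(0)
    y = rng.normal(0.0, 1.0, 1000)
    y[0] = 10.0                          # the single outlier at 10
    print(negentropy_cumulant(y))        # far from 0 although 999 samples are Gaussian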
15
Problems using higher order cumulants
  • The approximation is valid only for distributions
    close to a Gaussian distribution
  • For symmetric distributions the E[y^3]^2 term
    vanishes, so the approximation reduces to the
    kurtosis term and suffers from the same robustness
    problems as kurtosis alone

16
Approximating Negentropy by nonpolynomial
cumulants
  • Idea behind the approximation
  • A generalization of the previous approximation
  • If G1(y) = y^3 and G2(y) = y^4, the two
    approximations are identical.
  • Based on the Maximum Entropy Principle. If the
    Gi(y) are nonpolynomial functions, the E[Gi(y)]
    are nonpolynomial moments of the random variable
    y. The approximation gives the maximum entropy
    that can be reached without violating the
    constraints given by these moments (see the
    formula after this list).
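Written out (in the form given in the Hyvärinen-Oja tutorial), with nonquadratic functions Gi, positive constants ki, and a standardized Gaussian variable ν:

    J(y) \approx \sum_{i=1}^{p} k_i \left[ E\{G_i(y)\} - E\{G_i(\nu)\} \right]^2

For G1(y) = y^3 and G2(y) = y^4 this reduces to the cumulant-based approximation of the previous slides.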

17
Advantages
  • Gi(y) can be chosen to enhance robustness.
  • In particular, replacing y^4 by a function that
    does not grow too fast gives much more robust
    results.
  • The approximation by nonpolynomial moments does
    not give exact values of negentropy, but it is
    consistent in the sense that it is
  • always nonnegative
  • zero iff y has a Gaussian distribution

18
Choosing G1 and G2
  • Useful functions
  • G1(y) = 1/a1 · log cosh(a1·y), 1 < a1 < 2
  • G2(y) = -exp(-y^2/2)

19
Approximating Negentropy by nonpolynomial
cumulants
Using the approximation on 1000 random samples
with Gaussian distribution:
G1(y) = 1/a1 · log(cosh(a1·y)), a1 = 1
G2(y) = -exp(-y^2 / 2)
J(y) ≈ 0.57
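A sketch of how the nonpolynomial approximation can be evaluated with these G1 and G2, applied here to the same kind of contaminated Gaussian sample as in the kurtosis example; the proportionality constants are dropped, so the absolute value will not match the J(y) ≈ 0.57 quoted above, but the insensitivity to the outlier is visible:

    import numpy as np

    rng = np.random.default_rng(0)

    def G1(u, a1=1.0):
        return np.log(np.cosh(a1 * u)) / a1     # G1(y) = 1/a1 log cosh(a1 y)

    def G2(u):
        return -np.exp(-u**2 / 2.0)             # G2(y) = -exp(-y^2/2)

    def negentropy_approx(y, G):
        # J(y) proportional to (E[G(y)] - E[G(nu)])^2, nu standard Gaussian;
        # E[G(nu)] is estimated by Monte Carlo, the proportionality constant is omitted
        y = (y - y.mean()) / y.std()
        nu = rng.normal(0.0, 1.0, 200_000)
        return (np.mean(G(y)) - np.mean(G(nu)))**2

    y = rng.normal(0.0, 1.0, 1000)
    y[0] = 10.0                                  # a single outlier at 10, as before
    print(negentropy_approx(y, G1), negentropy_approx(y, G2))
    # both values stay small: much less sensitive to the outlier than kurtosis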
20
Advantages of the Negentropy approximation
  • efficient to compute
  • more robust than kurtosis
  • a good compromise between the two classical
    measures, exact negentropy and kurtosis

21
ICA by minimization of mutual information
  • So far, statistical independence has been
    measured by nongaussianity
  • Mutual information is another measure for the
    independence of random variables

22
Relation of mutual information to negentropy
  • Mutual information
  • Negentropy
  • For an invertible linear transformation W: y = Wx
  • => if y is uncorrelated and has unit variance,
    minimizing mutual information is equivalent to
    maximizing the sum of the negentropies (see the
    relations below)
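For reference, the relations behind these bullets (following the Hyvärinen-Oja tutorial; C is a constant that does not depend on W):

    I(y_1, ..., y_n) = \sum_i H(y_i) - H(y)
    J(y) = H(y_gauss) - H(y)
    I(y_1, ..., y_n) = \sum_i H(y_i) - H(x) - \log |\det W|,   for y = Wx
    I(y_1, ..., y_n) = C - \sum_i J(y_i),   if the y_i are uncorrelated and have unit variance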

23
Conclusion
  • ICA by estimating negentropy is more robust than
    ICA by kurtosis
  • Minimization of mutual information is an
    alternative approach, but it does not allow the
    independent components to be estimated one by one
  • An efficient algorithm for practical purposes
    still needs to be introduced (FastICA)

24
References
  • Aapo Hyvärinen, Erkki Oja: Independent Component
    Analysis: A Tutorial
  • Hyvärinen, Karhunen, Oja: Independent Component
    Analysis, Wiley & Sons
  • Prior presentations