Maximum Likelihood and the Information Bottleneck - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Maximum Likelihood and the Information Bottleneck
  • By Noam Slonim & Yair Weiss
  • 2/19/2003

2
Overview
  • Main contribution
  • Defines a mapping between ML for mixture models
    and iterative IB
  • Under some initial conditions, an algorithm for
    one gives a solution for the other.
  • Theoretical and practical concern
  • ML assumes an ideal generative setting vs. IB
    working with the real empirical distribution
  • Using the opposite algorithm could improve
    performance

3
IB Intuition Review
  • Given random variables X and Y with joint
    distribution p(x,y)
  • Re-represent X with clusters T that preserve
    information about Y
  • Find a compressed representation T of X via the
    mapping q(t|x)
  • The choice of q(t|x) must minimize the IB
    functional F_IB = I(T;X) - β I(T;Y)
    (a numeric sketch follows this list)
  • |T| and β are fixed
  • minimizing I(T;X) maximizes compression
  • maximizing I(T;Y) minimizes distortion
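A minimal numpy sketch, my own illustration rather than anything from the slides, of evaluating this functional for a given soft assignment; the names pxy, qt_x and beta are assumptions about shapes and notation.

import numpy as np

def mutual_information(pab):
    # I(A;B) in nats for a joint distribution pab[a, b]
    pa = pab.sum(axis=1, keepdims=True)
    pb = pab.sum(axis=0, keepdims=True)
    mask = pab > 0
    return np.sum(pab[mask] * np.log(pab[mask] / (pa @ pb)[mask]))

def ib_functional(pxy, qt_x, beta):
    # F_IB = I(T;X) - beta * I(T;Y), with qt_x[t, x] = q(t|x)
    px = pxy.sum(axis=1)                  # p(x)
    ptx = qt_x * px[np.newaxis, :]        # p(t, x) = q(t|x) p(x)
    pty = qt_x @ pxy                      # p(t, y) = sum_x q(t|x) p(x, y)
    return mutual_information(ptx) - beta * mutual_information(pty)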

4
IB Review
  • Additionally, given q(t|x), we have
    q(t) = Σ_x p(x) q(t|x) and
    q(y|t) = (1/q(t)) Σ_x p(x,y) q(t|x)
  • So, from the previous paper, minimizing F_IB
    gives the self-consistent update
    q(t|x) ∝ q(t) exp(-β D_KL[p(y|x) || q(y|t)])
  • Use an initial q(t|x) to get q(t), q(y|t), and
    iterate (sketched below)
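A hedged numpy sketch of one pass of these updates, under the same assumed names as above (pxy, qt_x, beta); this is my reconstruction of the standard iterative-IB step, not the authors' code.

import numpy as np
from scipy.special import logsumexp

def iib_step(pxy, qt_x, beta):
    # One self-consistent update: q(t), q(y|t), then q(t|x).
    # Assumes every x has positive marginal probability.
    eps = 1e-300
    px = pxy.sum(axis=1)                          # p(x)
    py_x = pxy / px[:, np.newaxis]                # p(y|x)
    qt = qt_x @ px                                # q(t) = sum_x p(x) q(t|x)
    qy_t = (qt_x @ pxy) / qt[:, np.newaxis]       # q(y|t)
    # D_KL[p(y|x) || q(y|t)] for every (t, x) pair
    cross = np.log(qy_t + eps) @ py_x.T           # sum_y p(y|x) log q(y|t)
    ent = np.sum(py_x * np.log(py_x + eps), axis=1)
    kl = ent[np.newaxis, :] - cross
    # q(t|x) ∝ q(t) exp(-beta * KL), normalized over t in log space
    log_q = np.log(qt + eps)[:, np.newaxis] - beta * kl
    return np.exp(log_q - logsumexp(log_q, axis=0, keepdims=True))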

5
ML for mixture models
  • Generative process for each x: choose a
    component t with probability π(t), then generate
    the words Y from the multinomial π(y|t)
  • We would choose t to maximize this probability,
    but we don't know π(t) or π(y|t)
  • We don't have p(x,y) either, just the sample
    counts n(x,y)
  • Use EM to find π(t), π(y|t) that maximize the
    likelihood of seeing n(x,y) with the t's
    (a sketch of this likelihood follows)
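A short sketch of the likelihood in question under the mixture-of-multinomials assumption; the array names nxy, pi_t, pi_y_t are my own, and the parameter-independent multinomial coefficient is dropped.

import numpy as np
from scipy.special import logsumexp

def log_likelihood(nxy, pi_t, pi_y_t):
    # log L = sum_x log sum_t pi(t) * prod_y pi(y|t)^n(x,y)
    # nxy[x, y]: counts, pi_t[t]: priors, pi_y_t[t, y]: multinomials
    eps = 1e-300  # guard against 0 * log(0) in the matrix product
    log_joint = np.log(pi_t)[np.newaxis, :] + nxy @ np.log(pi_y_t + eps).T
    return logsumexp(log_joint, axis=1).sum()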

6
EM
  • Iterative algorithm to compute the ML estimate
  • E step
  • denote the posterior over components for
    document x as q(t|x)
  • set q(t|x) = π(t) Π_y π(y|t)^n(x,y) / k(x)
  • k(x) is a normalization factor,
    k(x) = Σ_t π(t) Π_y π(y|t)^n(x,y)
    (sketched below)
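A minimal E-step sketch in the same assumed notation; the posterior is computed in log space for numerical stability.

import numpy as np
from scipy.special import logsumexp

def e_step(nxy, pi_t, pi_y_t):
    # q[x, t] ∝ pi(t) * prod_y pi(y|t)^n(x,y), normalized by k(x)
    eps = 1e-300
    log_unnorm = np.log(pi_t)[np.newaxis, :] + nxy @ np.log(pi_y_t + eps).T
    log_kx = logsumexp(log_unnorm, axis=1, keepdims=True)   # log k(x)
    return np.exp(log_unnorm - log_kx)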

7
EM cont
  • M step
  • set π(t) = (1/|X|) Σ_x q(t|x)
  • set π(y|t) ∝ Σ_x n(x,y) q(t|x)
    (both updates are sketched after this list)
  • There is also an alternative free-energy
    formulation of EM, whose minima coincide with
    the likelihood maxima
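A matching M-step sketch for the two updates above; q is the E-step posterior array q[x, t] from the earlier sketch, and the names remain assumptions.

import numpy as np

def m_step(nxy, q):
    # pi(t) = (1/|X|) sum_x q(t|x)
    pi_t = q.mean(axis=0)
    # pi(y|t) ∝ sum_x n(x,y) q(t|x); assumes every cluster gets some weight
    weighted = q.T @ nxy                              # shape (T, Y)
    pi_y_t = weighted / weighted.sum(axis=1, keepdims=True)
    return pi_t, pi_y_t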

8
ML IB mapping
  • Fairly straightforward mapping
  • q(t|x) ↔ the EM posterior, p(x,y) ↔ n(x,y)/N,
    β ↔ N(x)
  • Since we can't map the corresponding parameter
    distributions directly, we do this mapping and
    then an M-step or IB-step (see the sketch after
    this list).
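A hedged sketch of the ML-to-IB direction of this mapping, reusing the assumed names from the earlier sketches; the equal-document-length assumption behind beta = N(x) is made explicit.

import numpy as np

def ml_to_ib(nxy, q_xt):
    # Map an EM state (counts + E-step posteriors) to an IB state
    N = nxy.sum()
    pxy = nxy / N                        # p(x, y) <- n(x, y) / N
    beta = nxy.sum(axis=1).mean()        # beta <- N(x); exact if all documents
                                         # have the same length
    qt_x = q_xt.T                        # q(t|x) <- EM posterior
    # the parameter distributions are then recovered with one IB-step
    px = pxy.sum(axis=1)
    qt = qt_x @ px                       # q(t)
    qy_t = (qt_x @ pxy) / qt[:, np.newaxis]   # q(y|t)
    return pxy, beta, qt_x, qt, qy_t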

9
Observations
  • When X is uniformly distributed, the mapping is
    equivalent to a direct mapping of the parameter
    distributions.
  • The M-step and the IB-step are mathematically
    equivalent.
  • When X is uniform, EM is equivalent to the
    iterative IB algorithm with β = N(x).
  • Equivalence of the E-step to the IB step that
    sets q(t|x):
  • since the two exponents agree when β = N(x)
    (a short derivation follows this list)
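A reconstruction, under the identifications above, of why the E-step and the IB update of q(t|x) coincide; this fills in the formula that was dropped from the slide.

\text{E-step: } q(t\mid x) \propto \pi(t)\prod_y \pi(y\mid t)^{\,n(x,y)}
  = \pi(t)\exp\!\Big(\sum_y n(x,y)\log\pi(y\mid t)\Big)

\text{IB-step: } q(t\mid x) \propto q(t)\exp\!\big(-\beta\,D_{KL}[\,p(y\mid x)\,\|\,q(y\mid t)\,]\big)
  \propto q(t)\exp\!\Big(\beta\sum_y p(y\mid x)\log q(y\mid t)\Big),

\text{since } \sum_y p(y\mid x)\log p(y\mid x) \text{ does not depend on } t.
\text{ With } \beta = N(x) \text{ and } n(x,y) = N(x)\,p(y\mid x),
\text{ the exponents coincide once } \pi(t)\leftrightarrow q(t),\ \pi(y\mid t)\leftrightarrow q(y\mid t).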

10
Main Equivalence Claims
  • When X is uniform, all fixed points of the
    likelihood L are fixed points of F_IB with
    β = N(x)
  • at the fixed points, F_IB is (up to constants) a
    decreasing function of log L
  • Any algorithm that finds a fixed point of L
    induces a fixed point of F_IB. If it finds more
    than one, the one that maximizes L minimizes
    F_IB.

11
Claims (2)
  • For N → ∞ or β → ∞, all the fixed points of L
    are mapped to the fixed points of F_IB
  • again, at the fixed points, maximizing L
    corresponds to minimizing F_IB
  • Again, any algorithm that finds one induces one
    for the other domain.

12
Simulations
  • How do we know when N or β is large enough to
    use the mapping?
  • Empirical validation
  • Newsgroup clustering experiment
  • |X| = 500 documents, |Y| = 2000 words, |T| = 10
    clusters
  • N = 43,433 word occurrences in one set,
    N = 2,171 in a pruned set

13
Simulation results
  • At small values of N, the differences between
    the EM and iterative-IB solutions are more
    prominent

14
Discussion
  • At higher values of N, EM can converge to a
    smaller value of F_IB after the mapping, and
    vice versa.
  • Mentions an alternative formulation of IB where
    we minimize the KL divergence between p(x,y)
    and the family of distributions for which the
    mixture model assumption holds.
  • For smaller sample sizes, the freedom of
    choosing β in IB seems beneficial

15
Conclusion
  • Interesting reformulation of IB in the standard
    mixture model setting for clustering.
  • Interesting theoretical results with possible
    practical advantages for mapping from one to the
    other.