Title: EM Algorithm and Mixture of Gaussians
1 EM Algorithm and Mixture of Gaussians
- Collard Fabien - 20046056
- Kim Jinsik - 20043152
- Joo Chanhye - 20043595
2 Summary
- Hidden Factors
- EM Algorithm
- Principles
- Formalization
- Mixture of Gaussians
- Generalities
- Processing
- Formalization
- Other Issues
- Bayesian Network with hidden variables
- Hidden Markov models
- Bayes net structures with hidden variables
3 The Problem: Hidden Factors
Hidden factors
- Unobservable / Latent / Hidden
- Modelled as additional variables
- They keep the model simple (fewer parameters)
4 Simplicity details (Graph 1)
Hidden factors
[Diagram: Smoking, Diet, Exercise (2 values each) - 708 priors!]
5 Simplicity details (Graph 2)
Hidden factors
[Diagram: Smoking, Diet, Exercise (2 values each) and Symptom 1, Symptom 2, Symptom 3 (6 values each) - 78 priors]
6 A Solution: EM Algorithm
EM Algorithm
7 Principles: Generalities
EM Algorithm
- Given
- Causes (or factors / components)
- Evidence (the observed data)
- Compute
- The probabilities in the tables connecting causes and evidence
8 Principles: The two steps
EM Algorithm
Parameters: P(effects | causes) and P(causes)
9 Principles: the E-Step
EM Algorithm
- Perception Step
- For each piece of evidence and each cause
- Compute the probabilities
- Find the probable relationships
10 Principles: the M-Step
EM Algorithm
- Learning Step
- Recompute the probabilities
- For each cause event / evidence event pair
- Sum over all evidence events
- Maximize the log likelihood
- Modify the model parameters
11 Formulae: Notations
EM Algorithm
- Terms
- θ: the underlying probability distribution
- x: the observed data
- z: the unobserved (hidden) data
- h: the current hypothesis for θ
- h': the revised hypothesis
- q: a distribution over the hidden variables
- Task: estimate θ from x
- E-step
- M-step (both steps written out below)
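In this notation, a standard way to write the two steps is the variational form used on the following slides, where A(q,h) is the auxiliary function defined there:

    \text{E-step:}\quad q'(z) = \arg\max_q A(q, h) = p(z \mid x, h)
    \text{M-step:}\quad h' = \arg\max_h A(q', h)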
12 Formulae: the Log Likelihood
EM Algorithm
- L(h) measures how well the parameters h fit the data x, given the hidden variables z
- Jensen's inequality holds for any distribution q(z) over the hidden states
- It defines the auxiliary function A(q,h)
- A lower bound on the log likelihood
- This is what we want to optimize (written out below)
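Spelled out (assuming discrete hidden states z; sums become integrals in the continuous case):

    L(h) = \log p(x \mid h) = \log \sum_z p(x, z \mid h) \;\ge\; \sum_z q(z) \log \frac{p(x, z \mid h)}{q(z)} \;=\; A(q, h)

The inequality is Jensen's inequality applied to the concave logarithm.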
13 Formulae: the E-step
EM Algorithm
- Lower bound on the log likelihood
- H(q) is the entropy of q(z)
- Optimize A(q,h) with respect to q
- By distributing the data over the hidden variables (see below)
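One standard way to write the bound and its maximizer over q (holding h fixed):

    A(q, h) = \sum_z q(z) \log p(x, z \mid h) + H(q), \qquad H(q) = -\sum_z q(z) \log q(z)

    q'(z) = \arg\max_q A(q, h) = p(z \mid x, h)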
14 Formulae: the M-step
EM Algorithm
- Maximize A(q,h) with respect to h
- By choosing the optimal parameters
- Equivalent to optimizing the expected complete-data log likelihood, since H(q) does not depend on h (see below)
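In symbols (a standard statement, with q fixed from the E-step):

    h' = \arg\max_h A(q, h) = \arg\max_h \sum_z q(z) \log p(x, z \mid h)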
15 Formulae: Convergence (1/2)
EM Algorithm
- EM increases the log likelihood of the data at every iteration
- Kullback-Leibler (KL) divergence
- Non-negative
- Equals 0 iff q(z) = p(z | x, h) (see the identity below)
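The identity behind this argument, in the notation above (a standard decomposition of the log likelihood):

    L(h) = A(q, h) + \mathrm{KL}\big(q(z) \,\|\, p(z \mid x, h)\big), \qquad \mathrm{KL}(q \,\|\, p) = \sum_z q(z) \log \frac{q(z)}{p(z)}

The E-step drives the KL term to zero so the bound A touches L; the M-step then raises A, and with it L.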
16 Formulae: Convergence (2/2)
- Likelihood increases at each iteration
- Usually, EM converges to a local optimum of L
17 Problems with the likelihood
- It can be a high-dimensional integral
- Latent variables add extra dimensions
- The likelihood term can be complicated
18 The Issue: Mixture of Gaussians
Mixture of Gaussians
- Unsupervised clustering
- A set of data points (the evidence)
- Data generated from a mixture distribution
- Continuous data: mixture of Gaussians
- Not easy to handle
- The number of parameters grows with the square of the dimension (full covariance matrices)
19 Gaussian Mixture model (2/2)
Mixture of Gaussians
- Distribution
- Likelihood of a Gaussian distribution
- Likelihood given a GMM (both written out below)
- N: the number of Gaussians
- w_i: the weight of Gaussian i
- All weights positive
- Weights sum to 1
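Written out, with d the dimension of the data (a standard form of the two likelihoods listed above):

    \mathcal{N}(x; \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x - \mu)^{\top} \Sigma^{-1} (x - \mu)\right)

    p(x \mid h) = \sum_{i=1}^{N} w_i \, \mathcal{N}(x; \mu_i, \Sigma_i), \qquad w_i \ge 0, \quad \sum_{i=1}^{N} w_i = 1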
20 EM for Gaussian Mixture Model
- What for?
- Find the parameters
- Weights w_i = P(C_i)
- Means μ_i
- Covariances Σ_i
- How? (see the sketch after this list)
- Guess the prior distribution
- Guess the components (classes, or causes)
- Guess the distribution function
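The whole procedure fits in a few lines of NumPy. The sketch below is only an illustration of the loop described on the following slides, written for 1-D data with scalar variances; the function name em_gmm and the constants are my own choices, not from the slides.

    import numpy as np

    def em_gmm(x, n_components, n_iter=100, seed=0):
        """Fit a 1-D Gaussian mixture to the data x with plain EM."""
        rng = np.random.default_rng(seed)
        n = len(x)
        # Initialization: uniform weights, random means, a common variance
        w = np.full(n_components, 1.0 / n_components)          # mixture weights
        mu = rng.choice(x, size=n_components, replace=False)   # means
        var = np.full(n_components, np.var(x))                 # variances

        for _ in range(n_iter):
            # E-step: responsibilities P_ij = P(component j | x_i)
            dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
            resp = w * dens
            resp /= resp.sum(axis=1, keepdims=True)

            # M-step: re-estimate weights, means and variances from the responsibilities
            nj = resp.sum(axis=0)
            w = nj / n
            mu = (resp * x[:, None]).sum(axis=0) / nj
            var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nj
            var = np.maximum(var, 1e-6)  # guard against a component collapsing to zero variance

        return w, mu, var

    # Example: two well-separated 1-D clusters
    data = np.concatenate([np.random.normal(-2.0, 0.5, 300), np.random.normal(3.0, 1.0, 300)])
    print(em_gmm(data, n_components=2))

Each pass of the loop is exactly the E-step and M-step detailed on the next slides.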
21 Processing: EM Initialization
Mixture of Gaussians
- Initialization
- Assign random values to the parameters
22 Processing: the E-Step (1/2)
Mixture of Gaussians
- Expectation
- Pretend the parameters are known
- Assign each data point to a component
23 Processing: the E-Step (2/2)
Mixture of Gaussians
- Competition of hypotheses
- Compute the expected values P_ij of the hidden indicator variables
- Each P_ij gives a membership weight to a data point
- Normalization
- The weights reflect the relative likelihood of class membership
24 Processing: the M-Step (1/2)
Mixture of Gaussians
- Maximization
- Fit each component's parameters to its set of points
25 Processing: the M-Step (2/2)
Mixture of Gaussians
- For each hypothesis
- Find the new parameter values that maximize the log likelihood
- Based on
- The weight of the points in the class
- The location of the points
- Hypotheses are pulled toward the data
26 Applied formulae: the E-Step
Mixture of Gaussians
- Find the responsible Gaussian for every data point
- Use Bayes' rule (see below)
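A standard way to write this application of Bayes' rule, with P_ij the membership weight of point x_i in component C_j:

    P_{ij} = P(C_j \mid x_i) = \frac{w_j \, \mathcal{N}(x_i; \mu_j, \Sigma_j)}{\sum_{k=1}^{N} w_k \, \mathcal{N}(x_i; \mu_k, \Sigma_k)}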
27 Applied formulae: the M-Step
Mixture of Gaussians
- Maximize A(q,h)
- For each parameter of h, set the derivative of A to zero and solve
- Results (see below)
- μ
- σ²
- w
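The resulting updates, written for the one-dimensional case (σ² becomes the covariance matrix Σ_j in higher dimensions), with n data points and n_j = Σ_i P_ij:

    w_j = \frac{n_j}{n}, \qquad \mu_j = \frac{1}{n_j} \sum_i P_{ij}\, x_i, \qquad \sigma_j^2 = \frac{1}{n_j} \sum_i P_{ij}\, (x_i - \mu_j)^2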
28 Potential problems
Mixture of Gaussians
- A Gaussian component shrinks
- Variance → 0
- Likelihood → infinity
- Gaussian components merge
- Same parameter values
- They share the same data points
- A solution: reasonable prior values (a small sketch follows)
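One simple way to act on the "reasonable prior values" idea is to regularize the M-step variance update so it can never reach zero. A sketch only; the prior variance prior_var and the pseudo-count alpha are illustrative choices, not values from the slides:

    import numpy as np

    def regularized_variance(resp_j, x, mu_j, prior_var=1.0, alpha=1e-2):
        """M-step variance update blended with a weak prior to prevent collapse."""
        nj = resp_j.sum()
        empirical = (resp_j * (x - mu_j) ** 2).sum() / nj
        # alpha acts as a pseudo-count of points drawn from the prior variance
        return (nj * empirical + alpha * prior_var) / (nj + alpha)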
29 Bayesian Networks
Other Issues
30 Hidden Markov models
Other Issues
- Forward-Backward Algorithm
- Smoothing rather than filtering
31 Bayes net with hidden variables
Other Issues
- Pretend that the data is complete
- Or invent a new hidden variable
- It has no label or predefined meaning
32 Conclusion
- Widely applicable
- Diagnosis
- Classification
- Distribution Discovery
- Does not work well for complex models
- High dimensions
- → Structural EM