Title: Expectation-Maximization
1 Expectation-Maximization
- Markoviana Reading Group
- Fatih Gelgi, ASU, 2005
2 Outline
- What is EM?
- Intuitive Explanation
- Example: Gaussian Mixture
- Algorithm
- Generalized EM
- Discussion
- Applications
- HMM Baum-Welch
- K-means
3 What is EM?
- Two main applications:
  - Data has missing values, due to problems with or limitations of the observation process.
  - Optimizing the likelihood function is extremely hard, but the likelihood function can be simplified by assuming the existence of, and values for, additional missing or hidden parameters.
4 Key Idea
- The observed data U is generated by some distribution and is called the incomplete data.
- Assume that a complete data set Z = (U, J) exists, where J is the missing or hidden data.
- Maximize the posterior probability of the parameters θ given the data U, marginalizing over J.
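In the notation used on the later slides, "marginalizing over J" amounts to the following (a standard restatement of the EM setup, sketched here because the slide gives no equation):

```latex
% Sketch of the EM setup: U = observed (incomplete) data, J = hidden data,
% \theta = parameters. Not copied from the original deck.
P(\theta \mid U) \;\propto\; P(U \mid \theta)\, P(\theta)
                \;=\; P(\theta) \sum_{J} P(U, J \mid \theta)
```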
5 Intuitive Explanation of EM
- Alternate between estimating the unknowns θ and the hidden variables J.
- In each iteration, instead of finding the best J ∈ 𝒥, compute a distribution over the whole space 𝒥.
- EM is a lower-bound maximization process (Minka, 1998).
- E-step: construct a local lower bound to the posterior distribution.
- M-step: optimize the bound.
6 Intuitive Explanation of EM (cont.)
- Lower-bound approximation method.
- Sometimes provides faster convergence than gradient descent and Newton's method.
7 Example: Mixture Components
8 Example (cont.): True Likelihood of the Parameters
9 Example (cont.): Iterations of EM
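The three example slides above show figures only. As an illustration of what those iterations compute (a sketch, not the author's code; em_gmm_1d and the other names are hypothetical), a minimal EM loop for a two-component 1-D Gaussian mixture:

```python
import numpy as np

def em_gmm_1d(u, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture (illustrative sketch)."""
    # Initial guess theta^0: component means, variances and mixing weights.
    mu = np.array([u.min(), u.max()], dtype=float)
    var = np.array([u.var(), u.var()], dtype=float)
    pi = np.array([0.5, 0.5])

    for _ in range(n_iter):
        # E-step: responsibilities f^t(J) = P(J | U, theta^t).
        dens = np.stack(
            [pi[k] * np.exp(-(u - mu[k]) ** 2 / (2 * var[k]))
             / np.sqrt(2 * np.pi * var[k]) for k in range(2)],
            axis=1,
        )
        resp = dens / dens.sum(axis=1, keepdims=True)

        # M-step: theta^{t+1} maximizes the expected complete-data log-likelihood.
        nk = resp.sum(axis=0)
        mu = (resp * u[:, None]).sum(axis=0) / nk
        var = (resp * (u[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(u)
    return mu, var, pi

# Example usage: data drawn from two well-separated Gaussians.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 300)])
print(em_gmm_1d(data))
```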
10 Lower-bound Maximization
- Posterior probability → logarithm of the joint distribution:
  log P(θ | U) = log P(U, θ) − log P(U); since log P(U) does not depend on θ, maximizing log P(θ | U) is equivalent to maximizing log P(U, θ) = log Σ_J P(U, J, θ).
- Idea: start with a guess θ^t, compute an easily computed lower bound B(θ; θ^t) to the function log P(θ | U), and maximize the bound instead.
11 Lower-bound Maximization (cont.)
- Construct a tractable lower bound B(θ; θ^t) that contains a sum of logarithms; f^t(J) is an arbitrary probability distribution over J.
- By Jensen's inequality,
  log P(U, θ) = log Σ_J f^t(J) · P(U, J, θ) / f^t(J) ≥ Σ_J f^t(J) log [ P(U, J, θ) / f^t(J) ] = B(θ; θ^t).
12 Optimal Bound
- B(θ; θ^t) touches the objective function log P(U, θ) at θ^t.
- Maximize B(θ^t; θ^t) with respect to f^t(J).
- Introduce a Lagrange multiplier λ to enforce the constraint Σ_J f^t(J) = 1:
  Λ = λ [ 1 − Σ_J f^t(J) ] + Σ_J f^t(J) log P(U, J, θ^t) − Σ_J f^t(J) log f^t(J).
13 Optimal Bound (cont.)
- Derivative with respect to f^t(J):
  ∂Λ / ∂f^t(J) = −λ + log P(U, J, θ^t) − log f^t(J) − 1.
- Setting the derivative to zero, the bound is maximized at
  f^t(J) = P(U, J, θ^t) / Σ_J P(U, J, θ^t) = P(J | U, θ^t).
14 Maximizing the Bound
- Re-write B(θ; θ^t) with respect to the expectations:
  B(θ; θ^t) = Σ_J f^t(J) log P(U, J, θ) − Σ_J f^t(J) log f^t(J) = Q(θ; θ^t) + H^t,
- where
  Q(θ; θ^t) = E_{P(J | U, θ^t)} [ log P(U, J, θ) ] and the entropy term H^t does not depend on θ.
- Finally,
  θ^{t+1} = argmax_θ Q(θ; θ^t).
15 EM Algorithm
- E-step: compute f^t(J) = P(J | U, θ^t) and the expectation Q(θ; θ^t).
- M-step: set θ^{t+1} = argmax_θ Q(θ; θ^t).
- EM converges to a local maximum of log P(U, θ), and hence to a local maximum of log P(θ | U).
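Putting the two steps together, the iteration has the following generic shape (an illustrative Python skeleton; em, e_step, and m_step are hypothetical names, not code from the slides):

```python
def em(u, theta, e_step, m_step, n_iter=100):
    """Generic EM loop: alternate E- and M-steps for a fixed number of iterations."""
    for _ in range(n_iter):
        f_t = e_step(u, theta)   # E-step: f^t(J) = P(J | U, theta^t)
        theta = m_step(u, f_t)   # M-step: theta^{t+1} = argmax_theta Q(theta; theta^t)
    return theta
```

In practice the loop is usually stopped when the observed-data log-likelihood stops improving rather than after a fixed number of iterations.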
16 A Relation to the Log-Posterior
- An alternative is to compute the expected log-posterior,
  E_{P(J | U, θ^t)} [ log P(θ | U, J) ];
- maximizing it with respect to θ is the same as maximizing Q(θ; θ^t), since log P(θ | U, J) = log P(U, J, θ) − log P(U, J) and the second term does not depend on θ.
17 Generalized EM
- Assume log P(U, θ) and the B function are differentiable in θ. The EM likelihood then converges to a stationary point where ∂ log P(U, θ) / ∂θ = 0.
- GEM: instead of setting θ^{t+1} = argmax_θ B(θ; θ^t), just find a θ^{t+1} such that
  B(θ^{t+1}; θ^t) > B(θ^t; θ^t).
- GEM is also guaranteed to converge.
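As an illustration of the weaker GEM requirement (my sketch, not from the slides; gem_m_step and grad_B are hypothetical), a single gradient-ascent step on the bound already satisfies the improvement condition:

```python
def gem_m_step(theta, grad_B, lr=0.1):
    # Partial M-step: one gradient step on B(.; theta_t) instead of a full argmax.
    # For a sufficiently small lr this increases the bound unless theta is
    # already a stationary point.
    return theta + lr * grad_B(theta)
```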
18 HMM Baum-Welch Revisited
- Estimate the parameters (a, b, π) s.t. the expected number of correct individual states is maximized.
- γ_t(i) is the probability of being in state S_i at time t.
- ξ_t(i, j) is the probability of being in state S_i at time t and in state S_j at time t+1.
19 Baum-Welch E-step
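The E-step equations appear only as an image in the original slide; in the usual forward-backward notation (a standard reconstruction, with α_t and β_t the forward and backward variables), they are:

```latex
\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)},
\qquad
\xi_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}
                  {\sum_{k=1}^{N} \sum_{l=1}^{N} \alpha_t(k)\, a_{kl}\, b_l(O_{t+1})\, \beta_{t+1}(l)}
```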
20 Baum-Welch M-step
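The M-step re-estimation formulas are likewise an image in the original; the standard Baum-Welch updates (a reconstruction, with v_k the k-th observation symbol) are:

```latex
\bar{\pi}_i = \gamma_1(i), \qquad
\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad
\bar{b}_j(k) = \frac{\sum_{t \,:\, O_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
```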
21 K-Means
- Problem: given data X and the number of clusters K, find the clusters.
- Clustering is based on centroids: a point belongs to the cluster with the closest centroid.
- Hidden variables: the centroids of the clusters!
22 K-Means (cont.)
- Start with initial centroids θ^0.
- E-step: split the data into K clusters according to the distances to the centroids (calculate the distribution f^t(J)).
- M-step: update the centroids (calculate θ^{t+1}).
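A minimal sketch of this alternation in Python (illustrative only; k_means and its defaults are assumed names, not the author's code, and the assignments here are hard rather than a full distribution over J):

```python
import numpy as np

def k_means(x, k, n_iter=20, seed=0):
    """K-means as an EM-style loop on data x of shape (n_points, n_dims)."""
    rng = np.random.default_rng(seed)
    # theta^0: pick K data points as the initial centroids.
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # E-step: assign every point to the cluster with the closest centroid.
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # M-step: move each centroid to the mean of its assigned points.
        centroids = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return centroids, labels
```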
23 K-Means Example (K = 2)
- (Figure: clusters are reassigned at each iteration until the algorithm has converged.)
24 Discussion
- Is EM a Primal-Dual algorithm?
25 References
- A. P. Dempster et al. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), Vol. 39, No. 1 (1977), pp. 1-38.
- F. Dellaert. The Expectation Maximization Algorithm. Tech. Rep. GIT-GVU-02-20, 2002.
- T. Minka. Expectation-Maximization as lower bound maximization, 1998.
- Y. Chang and M. Kölsch. Presentation: Expectation Maximization, UCSB, 2002.
- K. Andersson. Presentation: Model Optimization using the EM algorithm, COSC 7373, 2001.
26 Thanks!