Title: HMM - Part 2
1. HMM - Part 2
- The EM algorithm
- Continuous density HMM
2. The EM Algorithm
- EM: Expectation Maximization
- Why EM?
  - Simple optimization algorithms for likelihood functions rely on intermediate variables, called latent data; for HMM, the state sequence is the latent data
  - Direct access to the data necessary to estimate the parameters is impossible or difficult; for HMM, it is almost impossible to estimate (A, B, π) without considering the state sequence
- Two Major Steps
  - The E step computes an expectation of the likelihood by including the latent variables as if they were observed
  - The M step computes the maximum likelihood estimates of the parameters by maximizing the expected likelihood found in the E step
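In symbols (a standard formulation of the two steps; λ is the current model, λ̄ the new estimate, O the observations, and Q the latent state sequence):

```latex
\text{E-step: } Q(\lambda,\bar{\lambda}) = E_{Q \mid O,\lambda}\bigl[\log P(O,Q \mid \bar{\lambda})\bigr]
\qquad
\text{M-step: } \bar{\lambda} = \arg\max_{\bar{\lambda}}\, Q(\lambda,\bar{\lambda})
```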
3. Three Steps for EM
- Step 1. Draw a lower bound
  - Use Jensen's inequality
- Step 2. Find the best lower bound → the auxiliary function
  - Let the lower bound touch the objective function at the current guess
- Step 3. Maximize the auxiliary function
  - Obtain the new guess
  - Go to Step 2 until convergence
(Minka, 1998)
4. Form an Initial Guess of λ = (A, B, π)
[Figure: objective function with the current guess marked]
5. Step 1. Draw a Lower Bound
[Figure: objective function and a lower bound function]
6. Step 2. Find the Best Lower Bound
[Figure: lower bound function touching the objective function at the current guess]
7. Step 3. Maximize the Auxiliary Function
[Figure: objective function and auxiliary function; the maximum of the auxiliary function gives the new guess]
8. Update the Model
[Figure: objective function with the updated guess]
9. Step 2. Find the Best Lower Bound
[Figure: objective function and a new auxiliary function touching it at the updated guess]
10. Step 3. Maximize the Auxiliary Function
[Figure: objective function; the new auxiliary function is maximized to obtain the next guess]
11. Step 1. Draw a Lower Bound (cont'd)
Objective function: log P(O | λ)
- If f is a concave function and X is a random variable, then E[f(X)] ≤ f(E[X]) (Jensen's inequality)
- Apply Jensen's inequality to the objective function
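A sketch of the resulting bound (standard EM derivation; q(Q) denotes any distribution over state sequences Q):

```latex
\log P(O \mid \lambda)
  = \log \sum_{Q} P(O,Q \mid \lambda)
  = \log \sum_{Q} q(Q)\,\frac{P(O,Q \mid \lambda)}{q(Q)}
  \;\ge\; \sum_{Q} q(Q)\,\log \frac{P(O,Q \mid \lambda)}{q(Q)}
```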
12. Step 2. Find the Best Lower Bound (cont'd)
- Find the distribution q(Q) that makes the lower bound function touch the objective function at the current guess
13. Step 2. Find the Best Lower Bound (cont'd)
- Take the derivative of the lower bound with respect to q(Q), with a Lagrange multiplier enforcing Σ_Q q(Q) = 1, and set it to zero
14. Step 2. Find the Best Lower Bound (cont'd)
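The standard outcome of this step (a sketch in the notation above): the best lower bound touches the objective at the current guess when q is the posterior over state sequences,

```latex
q^{*}(Q) = P(Q \mid O, \lambda)
\quad\Rightarrow\quad
\sum_{Q} q^{*}(Q)\,\log\frac{P(O,Q \mid \lambda)}{q^{*}(Q)} = \log P(O \mid \lambda)
```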
15. EM for HMM Training
- Basic idea
  - Assume we have λ and the probability that each Q occurred in the generation of O
  - i.e., we have in fact observed a complete data pair (O, Q) with frequency proportional to the probability P(O, Q | λ)
  - We then find a new λ̄ that maximizes the expected complete-data log-likelihood Q(λ, λ̄)
  - It can be guaranteed that P(O | λ̄) ≥ P(O | λ)
- EM can discover parameters of model λ that maximize the log-likelihood of the incomplete data, log P(O | λ), by iteratively maximizing the expectation of the log-likelihood of the complete data, log P(O, Q | λ)
16. Solution to Problem 3 - The EM Algorithm
- The auxiliary function is
  Q(λ, λ̄) = Σ_Q P(Q | O, λ) log P(O, Q | λ̄)
- where P(O, Q | λ̄) and log P(O, Q | λ̄) can be expressed as
  P(O, Q | λ̄) = π̄_{q1} · ∏_{t=1..T−1} ā_{qt,qt+1} · ∏_{t=1..T} b̄_{qt}(ot)
  log P(O, Q | λ̄) = log π̄_{q1} + Σ_{t=1..T−1} log ā_{qt,qt+1} + Σ_{t=1..T} log b̄_{qt}(ot)
17. Solution to Problem 3 - The EM Algorithm (cont'd)
- The auxiliary function can be rewritten as
  Q(λ, λ̄) = Σ_i P(q1 = i | O, λ) log π̄_i
           + Σ_i Σ_j Σ_{t=1..T−1} P(qt = i, qt+1 = j | O, λ) log ā_{ij}
           + Σ_j Σ_{t=1..T} P(qt = j | O, λ) log b̄_j(ot)
18. Solution to Problem 3 - The EM Algorithm (cont'd)
- The auxiliary function is separated into three independent terms, which correspond to π̄_i, ā_ij, and b̄_j(·), respectively
- Maximization of Q(λ, λ̄) can therefore be done by maximizing the individual terms separately, subject to the probability constraints Σ_i π̄_i = 1, Σ_j ā_ij = 1, and Σ_k b̄_j(k) = 1
- All these terms have the following form
  F(y) = Σ_j w_j log y_j, with y_j ≥ 0 and Σ_j y_j = 1
19. Solution to Problem 3 - The EM Algorithm (cont'd)
- Proof: apply a Lagrange multiplier
  Constraint: Σ_j y_j = 1
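A worked version of the Lagrange-multiplier step (standard derivation, using ε for the multiplier):

```latex
\frac{\partial}{\partial y_j}\Bigl(\sum_i w_i \log y_i + \varepsilon\bigl(\textstyle\sum_i y_i - 1\bigr)\Bigr)
 = \frac{w_j}{y_j} + \varepsilon = 0
\;\Rightarrow\; y_j = -\frac{w_j}{\varepsilon}
\;\Rightarrow\; y_j = \frac{w_j}{\sum_i w_i}
\quad\text{(since } \textstyle\sum_j y_j = 1\text{)}
```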
20. Solution to Problem 3 - The EM Algorithm (cont'd)
21. Solution to Problem 3 - The EM Algorithm (cont'd)
22. Solution to Problem 3 - The EM Algorithm (cont'd)
23. Solution to Problem 3 - The EM Algorithm (cont'd)
- The new model parameter set λ̄ = (Ā, B̄, π̄) can be expressed as
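In standard Baum-Welch notation, with γt(i) = P(qt = i | O, λ) and ξt(i, j) = P(qt = i, qt+1 = j | O, λ), these re-estimates take the form:

```latex
\bar{\pi}_i = \gamma_1(i),
\qquad
\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)},
\qquad
\bar{b}_j(k) = \frac{\sum_{t=1,\; o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
```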
24. Discrete vs. Continuous Density HMMs
- Two major types of HMMs according to the observations
- Discrete and finite observations
  - The observations that all distinct states generate are finite in number, i.e., V = {v1, v2, v3, ..., vM}, vk ∈ R^L
  - In this case, the observation probability distribution in state j, B = {bj(k)}, is defined as bj(k) = P(ot = vk | qt = j), 1 ≤ k ≤ M, 1 ≤ j ≤ N (ot: observation at time t, qt: state at time t)
  - ⇒ bj(k) consists of only M probability values
- Continuous and infinite observations
  - The observations that all distinct states generate are infinite and continuous, i.e., V = {v | v ∈ R^L}
  - In this case, the observation probability distribution in state j, B = {bj(v)}, is defined as bj(v) = f(ot = v | qt = j), 1 ≤ j ≤ N (ot: observation at time t, qt: state at time t)
  - ⇒ bj(v) is a continuous probability density function (pdf) and is often a mixture of multivariate Gaussian (normal) distributions
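A small illustration of the two cases (a sketch; the numbers and the scipy-based mixture evaluation are my own choices, not taken from the slides):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Discrete HMM: b_j(k) = P(o_t = v_k | q_t = j) is a lookup in an N x M table.
B = np.array([[0.5, 0.3, 0.2],   # state 1 (each row sums to 1)
              [0.1, 0.4, 0.5]])  # state 2

def b_discrete(j, k):
    return B[j, k]

# Continuous-density HMM: b_j(v) is a mixture-of-Gaussians pdf over R^L.
def b_continuous(v, weights, means, covs):
    # sum_k c_jk * N(v; mu_jk, Sigma_jk)
    return sum(c * multivariate_normal.pdf(v, mean=m, cov=S)
               for c, m, S in zip(weights, means, covs))

# Example: a 2-component mixture in R^2 for one state
weights = [0.6, 0.4]
means   = [np.zeros(2), np.ones(2)]
covs    = [np.eye(2), 0.5 * np.eye(2)]
print(b_discrete(0, 2), b_continuous(np.array([0.5, 0.5]), weights, means, covs))
```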
25. Gaussian Distribution
- A continuous random variable X is said to have a Gaussian distribution with mean µ and variance σ² (σ > 0) if X has a continuous pdf of the following form:
  f(x) = (1 / (√(2π) σ)) exp(−(x − µ)² / (2σ²))
26. Multivariate Gaussian Distribution
- If X = (X1, X2, X3, ..., XL) is an L-dimensional random vector with a multivariate Gaussian distribution with mean vector µ and covariance matrix Σ, then the pdf can be expressed as
  f(x) = (1 / ((2π)^(L/2) |Σ|^(1/2))) exp(−(1/2)(x − µ)ᵀ Σ⁻¹ (x − µ))
- If X1, X2, X3, ..., XL are independent random variables, the covariance matrix reduces to a diagonal matrix, i.e., Σ = diag(σ1², σ2², ..., σL²), and the pdf factorizes into a product of L univariate Gaussian pdfs
27. Multivariate Mixture Gaussian Distribution
- An L-dimensional random vector X = (X1, X2, X3, ..., XL) has a multivariate mixture Gaussian distribution if its pdf has the form
  f(x) = Σ_{k=1..M} c_k N(x; µ_k, Σ_k), with c_k ≥ 0 and Σ_{k=1..M} c_k = 1
- In CDHMM, bj(v) is a continuous probability density function (pdf) and is often a mixture of multivariate Gaussian distributions
28. Solution to Problem 3 - The Segmental K-means Algorithm
- Assume that we have a training set of observations and an initial estimate of the model parameters
- Step 1: Segment the training data
  - The set of training observation sequences is segmented into states, based on the current model, by the Viterbi algorithm
- Step 2: Re-estimate the model parameters
- Step 3: Evaluate the model. If the difference between the new and current model scores exceeds a threshold, go back to Step 1; otherwise, return (a code sketch of this loop is given below)
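A minimal sketch of the training loop under stated assumptions: viterbi_segment, reestimate, and log_likelihood are hypothetical helpers (Viterbi alignment, parameter re-estimation from the aligned data, and model scoring) that are not defined in the slides and are passed in by the caller.

```python
def segmental_kmeans(observations, model, viterbi_segment, reestimate,
                     log_likelihood, threshold=1e-3, max_iters=100):
    """Segmental K-means training loop (sketch)."""
    prev_score = log_likelihood(observations, model)
    for _ in range(max_iters):
        # Step 1: segment each training sequence into states with Viterbi
        alignments = [viterbi_segment(o, model) for o in observations]
        # Step 2: re-estimate the model parameters from the segmented data
        model = reestimate(observations, alignments)
        # Step 3: evaluate; stop when the score improvement falls below threshold
        score = log_likelihood(observations, model)
        if score - prev_score <= threshold:
            break
        prev_score = score
    return model
```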
29. Solution to Problem 3 - The Segmental K-means Algorithm (cont'd)
- 3 states and 4 Gaussian mixtures per state
[Figure: training observation sequences O1, O2, ..., ON are aligned to states s1, s2, s3 by Viterbi segmentation; the observations assigned to each state are then clustered with K-means (global mean → cluster means) to initialize that state's mixture parameters, e.g., (µ11, Σ11, c11), (µ12, Σ12, c12), (µ13, Σ13, c13), (µ14, Σ14, c14) for the four components of state s1]
30. Solution to Problem 3 - The Intuitive View (CDHMM)
- Define a new variable γt(j, k)
  - the probability of being in state j at time t with the k-th mixture component accounting for ot
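In forward-backward notation (αt(j) and βt(j) the forward and backward variables, N states, M mixture components), this can be written as:

```latex
\gamma_t(j,k)
  = \left[\frac{\alpha_t(j)\,\beta_t(j)}{\sum_{i=1}^{N}\alpha_t(i)\,\beta_t(i)}\right]
    \left[\frac{c_{jk}\,N(o_t;\mu_{jk},\Sigma_{jk})}{\sum_{m=1}^{M}c_{jm}\,N(o_t;\mu_{jm},\Sigma_{jm})}\right]
```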
31. Solution to Problem 3 - The Intuitive View (CDHMM) (cont'd)
- The re-estimation formulae for cjk, µjk, and Σjk are
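In the same notation, the standard re-estimates of the mixture weights, means, and covariances are:

```latex
\bar{c}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)}{\sum_{t=1}^{T}\sum_{m=1}^{M} \gamma_t(j,m)},
\qquad
\bar{\mu}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)\, o_t}{\sum_{t=1}^{T} \gamma_t(j,k)},
\qquad
\bar{\Sigma}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)\,(o_t - \bar{\mu}_{jk})(o_t - \bar{\mu}_{jk})^{\mathsf T}}{\sum_{t=1}^{T} \gamma_t(j,k)}
```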
32. A Simple Example - The Forward/Backward Procedure
[Figure: a two-state trellis (states S1, S2) over three time steps t = 1, 2, 3, with observations o1, o2, o3]
33. A Simple Example (cont'd)
q = (1, 1, 1)
q = (1, 1, 2)
...
Total: 8 paths (2 states over 3 time steps ⇒ 2³ = 8 possible state sequences)
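A runnable sketch of what this example illustrates (the transition and emission numbers below are made up for illustration, not taken from the slides): the forward procedure yields the same P(O | λ) as brute-force summation of P(O, Q | λ) over all 8 state sequences.

```python
import itertools
import numpy as np

# Illustrative 2-state, 3-symbol model (hypothetical parameters)
pi = np.array([0.6, 0.4])                  # initial state probabilities
A  = np.array([[0.7, 0.3],                 # a_ij = P(q_{t+1}=j | q_t=i)
               [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1],            # b_j(k) = P(o_t=v_k | q_t=j)
               [0.1, 0.3, 0.6]])
O  = [0, 1, 2]                             # observation indices o1, o2, o3

# Forward procedure: alpha_t(j) = P(o_1..o_t, q_t = j | lambda)
alpha = pi * B[:, O[0]]
for o in O[1:]:
    alpha = (alpha @ A) * B[:, o]
p_forward = alpha.sum()

# Brute force: sum P(O, Q | lambda) over all 2^3 = 8 state sequences
p_brute = 0.0
for q in itertools.product(range(2), repeat=len(O)):
    p = pi[q[0]] * B[q[0], O[0]]
    for t in range(1, len(O)):
        p *= A[q[t - 1], q[t]] * B[q[t], O[t]]
    p_brute += p

print(p_forward, p_brute)   # the two values agree
```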
34. A Simple Example (cont'd)