Title: HMM - Part 2
1. HMM - Part 2
- The EM algorithm
- Continuous density HMM
2. The EM Algorithm
- EM: Expectation Maximization
- Why EM?
  - Simple optimization algorithms for likelihood functions rely on intermediate variables, called latent data; for HMM, the state sequence is the latent data
  - Direct access to the data necessary to estimate the parameters is impossible or difficult; for HMM, it is almost impossible to estimate (A, B, π) without considering the state sequence
- Two Major Steps
  - The E step computes an expectation of the likelihood by including the latent variables as if they were observed
  - The M step computes the maximum likelihood estimates of the parameters by maximizing the expected likelihood found in the E step
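In symbols (a standard formulation of the two steps; λ is the current model, λ̄ the new estimate, O the observations, and Q the latent state sequence):

```latex
\text{E-step: } Q(\lambda,\bar{\lambda}) = E_{Q \mid O,\lambda}\bigl[\log P(O,Q \mid \bar{\lambda})\bigr]
\qquad
\text{M-step: } \bar{\lambda} = \arg\max_{\bar{\lambda}}\, Q(\lambda,\bar{\lambda})
```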
3. Three Steps for EM
- Step 1. Draw a lower bound
  - Use Jensen's inequality
- Step 2. Find the best lower bound → the auxiliary function
  - Let the lower bound touch the objective function at the current guess
- Step 3. Maximize the auxiliary function
  - Obtain the new guess
  - Go to Step 2 until convergence
(Minka, 1998)
4. Form an Initial Guess of λ = (A, B, π)
[Figure: objective function with the current guess marked]
5. Step 1. Draw a Lower Bound
[Figure: objective function and a lower bound function]
6. Step 2. Find the Best Lower Bound
[Figure: lower bound function touching the objective function at the current guess]
7. Step 3. Maximize the Auxiliary Function
[Figure: objective function and auxiliary function; the maximum of the auxiliary function gives the new guess]
8. Update the Model
[Figure: objective function with the updated guess]
9. Step 2. Find the Best Lower Bound
[Figure: objective function and a new auxiliary function touching it at the updated guess]
10. Step 3. Maximize the Auxiliary Function
[Figure: objective function; the new auxiliary function is maximized to obtain the next guess]
11. Step 1. Draw a Lower Bound (cont'd)
Objective function: log P(O | λ)
- If f is a concave function and X is a random variable, then E[f(X)] ≤ f(E[X]) (Jensen's inequality)
- Apply Jensen's inequality to the objective function
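A sketch of the resulting bound (standard EM derivation; q(Q) denotes any distribution over state sequences Q):

```latex
\log P(O \mid \lambda)
  = \log \sum_{Q} P(O,Q \mid \lambda)
  = \log \sum_{Q} q(Q)\,\frac{P(O,Q \mid \lambda)}{q(Q)}
  \;\ge\; \sum_{Q} q(Q)\,\log \frac{P(O,Q \mid \lambda)}{q(Q)}
```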
12. Step 2. Find the Best Lower Bound (cont'd)
- Find the distribution q(Q) that makes the lower bound function touch the objective function at the current guess
13. Step 2. Find the Best Lower Bound (cont'd)
- Take the derivative of the lower bound with respect to q(Q), with a Lagrange multiplier enforcing Σ_Q q(Q) = 1, and set it to zero
14. Step 2. Find the Best Lower Bound (cont'd)
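The standard outcome of this step (a sketch in the notation above): the best lower bound touches the objective at the current guess when q is the posterior over state sequences,

```latex
q^{*}(Q) = P(Q \mid O, \lambda)
\quad\Rightarrow\quad
\sum_{Q} q^{*}(Q)\,\log\frac{P(O,Q \mid \lambda)}{q^{*}(Q)} = \log P(O \mid \lambda)
```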
15. EM for HMM Training
- Basic idea
  - Assume we have λ and the probability that each Q occurred in the generation of O
  - i.e., we have in fact observed a complete data pair (O, Q) with frequency proportional to the probability P(O, Q | λ)
  - We then find a new λ̄ that maximizes the expected complete-data log-likelihood Q(λ, λ̄)
  - It can be guaranteed that P(O | λ̄) ≥ P(O | λ)
- EM can discover parameters of model λ that maximize the log-likelihood of the incomplete data, log P(O | λ), by iteratively maximizing the expectation of the log-likelihood of the complete data, log P(O, Q | λ)
16. Solution to Problem 3 - The EM Algorithm
- The auxiliary function is
  Q(λ, λ̄) = Σ_Q P(Q | O, λ) log P(O, Q | λ̄)
- where P(O, Q | λ̄) and log P(O, Q | λ̄) can be expressed as
  P(O, Q | λ̄) = π̄_{q1} · ∏_{t=1..T−1} ā_{qt,qt+1} · ∏_{t=1..T} b̄_{qt}(ot)
  log P(O, Q | λ̄) = log π̄_{q1} + Σ_{t=1..T−1} log ā_{qt,qt+1} + Σ_{t=1..T} log b̄_{qt}(ot)
17. Solution to Problem 3 - The EM Algorithm (cont'd)
- The auxiliary function can be rewritten as
  Q(λ, λ̄) = Σ_i P(q1 = i | O, λ) log π̄_i
           + Σ_i Σ_j Σ_{t=1..T−1} P(qt = i, qt+1 = j | O, λ) log ā_{ij}
           + Σ_j Σ_{t=1..T} P(qt = j | O, λ) log b̄_j(ot)
18. Solution to Problem 3 - The EM Algorithm (cont'd)
- The auxiliary function is separated into three independent terms, which correspond to π̄_i, ā_ij, and b̄_j(·), respectively
- Maximization of Q(λ, λ̄) can therefore be done by maximizing the individual terms separately, subject to the probability constraints Σ_i π̄_i = 1, Σ_j ā_ij = 1, and Σ_k b̄_j(k) = 1
- All these terms have the following form
  F(y) = Σ_j w_j log y_j, with y_j ≥ 0 and Σ_j y_j = 1
19. Solution to Problem 3 - The EM Algorithm (cont'd)
- Proof: apply a Lagrange multiplier
  Constraint: Σ_j y_j = 1
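A worked version of the Lagrange-multiplier step (standard derivation, using ε for the multiplier):

```latex
\frac{\partial}{\partial y_j}\Bigl(\sum_i w_i \log y_i + \varepsilon\bigl(\textstyle\sum_i y_i - 1\bigr)\Bigr)
 = \frac{w_j}{y_j} + \varepsilon = 0
\;\Rightarrow\; y_j = -\frac{w_j}{\varepsilon}
\;\Rightarrow\; y_j = \frac{w_j}{\sum_i w_i}
\quad\text{(since } \textstyle\sum_j y_j = 1\text{)}
```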
20. Solution to Problem 3 - The EM Algorithm (cont'd)
21. Solution to Problem 3 - The EM Algorithm (cont'd)
22. Solution to Problem 3 - The EM Algorithm (cont'd)
23. Solution to Problem 3 - The EM Algorithm (cont'd)
- The new model parameter set λ̄ = (Ā, B̄, π̄) can be expressed as
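In standard Baum-Welch notation, with γt(i) = P(qt = i | O, λ) and ξt(i, j) = P(qt = i, qt+1 = j | O, λ), these re-estimates take the form:

```latex
\bar{\pi}_i = \gamma_1(i),
\qquad
\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)},
\qquad
\bar{b}_j(k) = \frac{\sum_{t=1,\; o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
```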
24. Discrete vs. Continuous Density HMMs
- Two major types of HMMs according to the observations
- Discrete and finite observations
  - The observations that all distinct states generate are finite in number, i.e., V = {v1, v2, v3, ..., vM}, vk ∈ R^L
  - In this case, the observation probability distribution in state j, B = {bj(k)}, is defined as bj(k) = P(ot = vk | qt = j), 1 ≤ k ≤ M, 1 ≤ j ≤ N (ot: observation at time t, qt: state at time t)
  - ⇒ bj(k) consists of only M probability values
- Continuous and infinite observations
  - The observations that all distinct states generate are infinite and continuous, i.e., V = {v | v ∈ R^L}
  - In this case, the observation probability distribution in state j, B = {bj(v)}, is defined as bj(v) = f(ot = v | qt = j), 1 ≤ j ≤ N (ot: observation at time t, qt: state at time t)
  - ⇒ bj(v) is a continuous probability density function (pdf) and is often a mixture of multivariate Gaussian (normal) distributions
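A small illustration of the two cases (a sketch; the numbers and the scipy-based mixture evaluation are my own choices, not taken from the slides):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Discrete HMM: b_j(k) = P(o_t = v_k | q_t = j) is a lookup in an N x M table.
B = np.array([[0.5, 0.3, 0.2],   # state 1 (each row sums to 1)
              [0.1, 0.4, 0.5]])  # state 2

def b_discrete(j, k):
    return B[j, k]

# Continuous-density HMM: b_j(v) is a mixture-of-Gaussians pdf over R^L.
def b_continuous(v, weights, means, covs):
    # sum_k c_jk * N(v; mu_jk, Sigma_jk)
    return sum(c * multivariate_normal.pdf(v, mean=m, cov=S)
               for c, m, S in zip(weights, means, covs))

# Example: a 2-component mixture in R^2 for one state
weights = [0.6, 0.4]
means   = [np.zeros(2), np.ones(2)]
covs    = [np.eye(2), 0.5 * np.eye(2)]
print(b_discrete(0, 2), b_continuous(np.array([0.5, 0.5]), weights, means, covs))
```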
25. Gaussian Distribution
- A continuous random variable X is said to have a Gaussian distribution with mean µ and variance σ² (σ > 0) if X has a continuous pdf of the following form:
  f(x) = (1 / (√(2π) σ)) exp(−(x − µ)² / (2σ²))
26. Multivariate Gaussian Distribution
- If X = (X1, X2, X3, ..., XL) is an L-dimensional random vector with a multivariate Gaussian distribution with mean vector µ and covariance matrix Σ, then the pdf can be expressed as
  f(x) = (1 / ((2π)^(L/2) |Σ|^(1/2))) exp(−(1/2)(x − µ)ᵀ Σ⁻¹ (x − µ))
- If X1, X2, X3, ..., XL are independent random variables, the covariance matrix reduces to a diagonal matrix, i.e., Σ = diag(σ1², σ2², ..., σL²), and the pdf factorizes into a product of L univariate Gaussian pdfs
27. Multivariate Mixture Gaussian Distribution
- An L-dimensional random vector X = (X1, X2, X3, ..., XL) has a multivariate mixture Gaussian distribution if its pdf has the form
  f(x) = Σ_{k=1..M} c_k N(x; µ_k, Σ_k), with c_k ≥ 0 and Σ_{k=1..M} c_k = 1
- In CDHMM, bj(v) is a continuous probability density function (pdf) and is often a mixture of multivariate Gaussian distributions
28. Solution to Problem 3 - The Segmental K-means Algorithm
- Assume that we have a training set of observations and an initial estimate of the model parameters
- Step 1: Segment the training data
  - The set of training observation sequences is segmented into states, based on the current model, by the Viterbi algorithm
- Step 2: Re-estimate the model parameters
- Step 3: Evaluate the model. If the difference between the new and current model scores exceeds a threshold, go back to Step 1; otherwise, return (a code sketch of this loop is given below)
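A minimal sketch of the training loop under stated assumptions: viterbi_segment, reestimate, and log_likelihood are hypothetical helpers (Viterbi alignment, parameter re-estimation from the aligned data, and model scoring) that are not defined in the slides and are passed in by the caller.

```python
def segmental_kmeans(observations, model, viterbi_segment, reestimate,
                     log_likelihood, threshold=1e-3, max_iters=100):
    """Segmental K-means training loop (sketch)."""
    prev_score = log_likelihood(observations, model)
    for _ in range(max_iters):
        # Step 1: segment each training sequence into states with Viterbi
        alignments = [viterbi_segment(o, model) for o in observations]
        # Step 2: re-estimate the model parameters from the segmented data
        model = reestimate(observations, alignments)
        # Step 3: evaluate; stop when the score improvement falls below threshold
        score = log_likelihood(observations, model)
        if score - prev_score <= threshold:
            break
        prev_score = score
    return model
```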
29. Solution to Problem 3 - The Segmental K-means Algorithm (cont'd)
- 3 states and 4 Gaussian mixtures per state
[Figure: training observation sequences O1, O2, ..., ON are aligned to states s1, s2, s3 by Viterbi segmentation; the observations assigned to each state are then clustered with K-means (global mean → cluster means) to initialize that state's mixture parameters, e.g., (µ11, Σ11, c11), (µ12, Σ12, c12), (µ13, Σ13, c13), (µ14, Σ14, c14) for the four components of state s1]
30. Solution to Problem 3 - The Intuitive View (CDHMM)
- Define a new variable γt(j, k)
  - the probability of being in state j at time t with the k-th mixture component accounting for ot
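In forward-backward notation (αt(j) and βt(j) the forward and backward variables, N states, M mixture components), this can be written as:

```latex
\gamma_t(j,k)
  = \left[\frac{\alpha_t(j)\,\beta_t(j)}{\sum_{i=1}^{N}\alpha_t(i)\,\beta_t(i)}\right]
    \left[\frac{c_{jk}\,N(o_t;\mu_{jk},\Sigma_{jk})}{\sum_{m=1}^{M}c_{jm}\,N(o_t;\mu_{jm},\Sigma_{jm})}\right]
```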
31. Solution to Problem 3 - The Intuitive View (CDHMM) (cont'd)
- The re-estimation formulae for cjk, µjk, and Σjk are
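In the same notation, the standard re-estimates of the mixture weights, means, and covariances are:

```latex
\bar{c}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)}{\sum_{t=1}^{T}\sum_{m=1}^{M} \gamma_t(j,m)},
\qquad
\bar{\mu}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)\, o_t}{\sum_{t=1}^{T} \gamma_t(j,k)},
\qquad
\bar{\Sigma}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)\,(o_t - \bar{\mu}_{jk})(o_t - \bar{\mu}_{jk})^{\mathsf T}}{\sum_{t=1}^{T} \gamma_t(j,k)}
```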
32. A Simple Example - The Forward/Backward Procedure
[Figure: a two-state trellis (states S1, S2) over three time steps t = 1, 2, 3, with observations o1, o2, o3]
33. A Simple Example (cont'd)
q = (1, 1, 1)
q = (1, 1, 2)
...
Total: 8 paths (2 states over 3 time steps ⇒ 2³ = 8 possible state sequences)
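A runnable sketch of what this example illustrates (the transition and emission numbers below are made up for illustration, not taken from the slides): the forward procedure yields the same P(O | λ) as brute-force summation of P(O, Q | λ) over all 8 state sequences.

```python
import itertools
import numpy as np

# Illustrative 2-state, 3-symbol model (hypothetical parameters)
pi = np.array([0.6, 0.4])                  # initial state probabilities
A  = np.array([[0.7, 0.3],                 # a_ij = P(q_{t+1}=j | q_t=i)
               [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1],            # b_j(k) = P(o_t=v_k | q_t=j)
               [0.1, 0.3, 0.6]])
O  = [0, 1, 2]                             # observation indices o1, o2, o3

# Forward procedure: alpha_t(j) = P(o_1..o_t, q_t = j | lambda)
alpha = pi * B[:, O[0]]
for o in O[1:]:
    alpha = (alpha @ A) * B[:, o]
p_forward = alpha.sum()

# Brute force: sum P(O, Q | lambda) over all 2^3 = 8 state sequences
p_brute = 0.0
for q in itertools.product(range(2), repeat=len(O)):
    p = pi[q[0]] * B[q[0], O[0]]
    for t in range(1, len(O)):
        p *= A[q[t - 1], q[t]] * B[q[t], O[t]]
    p_brute += p

print(p_forward, p_brute)   # the two values agree
```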
34. A Simple Example (cont'd)