1
HMM - Part 2
  • The EM algorithm
  • Continuous density HMM

2
The EM Algorithm
  • EM = Expectation-Maximization
  • Why EM?
  • Simple optimization algorithms for likelihood
    functions rely on intermediate variables, called
    latent data. For HMM, the state sequence is the
    latent data
  • Direct access to the data necessary to estimate
    the parameters is impossible or difficult. For
    HMM, it is almost impossible to estimate (A, B,
    π) without considering the state sequence
  • Two major steps (see the sketch below)
  • E step: calculate the expectation with respect to
    the latent data, given the current estimate of
    the parameters and the observations
  • M step: estimate a new set of parameters
    according to the Maximum Likelihood (ML) or
    Maximum A Posteriori (MAP) criterion
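For concreteness, here is a minimal runnable sketch of these two steps on a toy problem: a two-component 1-D Gaussian mixture with made-up data, where the latent data is the unobserved component label of each observation. The HMM case replaces the labels with state sequences, but the E/M structure is the same.

```python
import numpy as np

# Toy EM: fit a 2-component 1-D Gaussian mixture (illustrative data only).
rng = np.random.default_rng(0)
obs = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 300)])

w, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(50):
    # E step: posterior probability of each latent component for each point,
    # given the current parameter estimates and the observations.
    dens = w * np.exp(-0.5 * (obs[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M step: ML re-estimation of the parameters from the expected counts.
    n_k = resp.sum(axis=0)
    w = n_k / len(obs)
    mu = (resp * obs[:, None]).sum(axis=0) / n_k
    var = (resp * (obs[:, None] - mu) ** 2).sum(axis=0) / n_k

print(w, mu, var)   # should approach the generating values (0.4/0.6, -2/3, 1/1)
```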

ML vs. MAP
3
The EM Algorithm (cont.)
  • The EM algorithm is important to HMMs and many
    other model learning techniques
  • Basic idea
  • Assume we have λ and the probability that each
    Q = q occurred in the generation of O = o
  • i.e., we have in fact observed a complete
    data pair (o, q) with frequency proportional to
    the probability P(O = o, Q = q | λ)
  • We then find a new λ̄ that maximizes the
    auxiliary function Q(λ, λ̄) sketched below
  • It can be guaranteed that P(O = o | λ̄) ≥ P(O = o | λ)
  • EM can discover parameters of model λ to maximize
    the log-likelihood of the incomplete data,
    log P(O = o | λ), by iteratively maximizing the
    expectation of the log-likelihood of the complete
    data, log P(O = o, Q = q | λ)
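The slide's formulas are not reproduced in this transcript; the standard auxiliary function and guarantee assumed in this derivation are:

```latex
% Expectation of the complete-data log-likelihood under the current model,
% maximized over the new model \bar{\lambda}:
\bar{\lambda} = \arg\max_{\bar{\lambda}} Q(\lambda, \bar{\lambda}),
\qquad
Q(\lambda, \bar{\lambda}) = \sum_{q} P(O = o, Q = q \mid \lambda)\,
  \log P(O = o, Q = q \mid \bar{\lambda})
% Guarantee: each iteration does not decrease the incomplete-data likelihood,
P(O = o \mid \bar{\lambda}) \;\geq\; P(O = o \mid \lambda)
```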

4
The EM Algorithm (cont.)
5
The EM Algorithm (cont.)
1. Jensen's inequality: if f is a concave function
   and X is a r.v., then E[f(X)] ≤ f(E[X])
2. log x ≤ x − 1
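These two facts drive the standard monotonicity argument; a compact version, in the notation above, is:

```latex
% Using Jensen's inequality on the concave log:
\log \frac{P(O \mid \bar{\lambda})}{P(O \mid \lambda)}
  = \log \sum_{q} P(q \mid O, \lambda)\,
        \frac{P(O, q \mid \bar{\lambda})}{P(O, q \mid \lambda)}
  \;\geq\; \sum_{q} P(q \mid O, \lambda)\,
        \log \frac{P(O, q \mid \bar{\lambda})}{P(O, q \mid \lambda)}
  = \frac{Q(\lambda, \bar{\lambda}) - Q(\lambda, \lambda)}{P(O \mid \lambda)}
% Hence Q(\lambda, \bar{\lambda}) \geq Q(\lambda, \lambda) implies
% P(O \mid \bar{\lambda}) \geq P(O \mid \lambda).
```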
6
Solution to Problem 3 - The EM Algorithm
  • The auxiliary function Q(λ, λ̄)
  • where P(O, q | λ) and log P(O, q | λ̄)
    can be expressed as sketched below
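A sketch of the standard expansion the derivation relies on (usual π, a, b notation assumed):

```latex
% Joint probability of observation sequence o and state sequence q:
P(O, q \mid \lambda) = \pi_{q_1}\, b_{q_1}(o_1)
    \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t)
% so the auxiliary function becomes
Q(\lambda, \bar{\lambda}) = \sum_{q} P(O, q \mid \lambda)
  \Big[ \log \bar{\pi}_{q_1}
      + \sum_{t=2}^{T} \log \bar{a}_{q_{t-1} q_t}
      + \sum_{t=1}^{T} \log \bar{b}_{q_t}(o_t) \Big]
```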

7
Solution to Problem 3 - The EM Algorithm (cont.)
  • The auxiliary function can be rewritten as

8
Solution to Problem 3 - The EM Algorithm (cont.)
  • The auxiliary function is separated into three
    independent terms, respectively corresponding to
    the initial probabilities π, the transition
    probabilities {aij}, and the observation
    probabilities {bj(k)}
  • Maximization of Q(λ, λ̄) can therefore be done by
    maximizing the individual terms separately,
    subject to the probability constraints
  • All these terms have the form sketched below
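A sketch of that common form, writing the re-estimated probabilities as y_j and the weights collected from Q(λ, λ̄) as w_j (notation assumed here):

```latex
% Each of the three terms has the generic form
F(y) = \sum_{j} w_j \log y_j,
\qquad \text{subject to } \sum_{j} y_j = 1,\; y_j \geq 0,
% whose maximizer (derived on the next slide via a Lagrange multiplier) is
y_j = \frac{w_j}{\sum_{i} w_i}
```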

9
Solution to Problem 3 - The EM Algorithm (cont.)
  • Proof: apply a Lagrange multiplier, with the
    constraint that the probabilities sum to one
    (worked out below)
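A worked version of the proof in the same notation:

```latex
% Introduce a Lagrange multiplier \epsilon for the constraint \sum_j y_j = 1:
\frac{\partial}{\partial y_j}
  \Big( \sum_{i} w_i \log y_i + \epsilon \big( \textstyle\sum_{i} y_i - 1 \big) \Big)
  = \frac{w_j}{y_j} + \epsilon = 0
  \;\Rightarrow\; y_j = -\frac{w_j}{\epsilon}
% Substituting into the constraint gives \epsilon = -\sum_i w_i, hence
y_j = \frac{w_j}{\sum_{i} w_i}
```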
10
Solution to Problem 3 - The EM Algorithm (cont.)
11
Solution to Problem 3 - The EM Algorithm (cont.)
12
Solution to Problem 3 - The EM Algorithm (cont.)
13
Solution to Problem 3 - The EM Algorithm (cont.)
  • The new model parameter set λ̄ = (Ā, B̄, π̄)
    can be expressed as sketched below
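The slide's formulas are not reproduced here; the standard Baum-Welch re-estimates, with γ_t(i) and ξ_t(i, j) the state-occupancy and transition posteriors from the forward-backward procedure, are:

```latex
\bar{\pi}_i = \gamma_1(i)
\qquad
\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}
\qquad
\bar{b}_j(k) = \frac{\sum_{t:\, o_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
```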

14
Discrete vs. Continuous Density HMMs
  • Two major types of HMMs, classified according to
    their observations
  • Discrete and finite observations
  • The observations that all distinct states
    generate are finite in number, i.e.,
    V = {v1, v2, v3, …, vM}, vk ∈ R^L
  • In this case, the observation probability
    distribution in state j, B = {bj(k)}, is defined as
    bj(k) = P(ot = vk | qt = j), 1 ≤ k ≤ M, 1 ≤ j ≤ N
    (ot: observation at time t, qt: state at time t)
  • ⇒ bj(k) consists of only M probability values
  • Continuous and infinite observations
  • The observations that all distinct states
    generate are infinite and continuous, i.e.,
    V = {v | v ∈ R^L}
  • In this case, the observation probability
    distribution in state j, B = {bj(v)}, is defined as
    bj(v) = f(ot = v | qt = j), 1 ≤ j ≤ N
    (ot: observation at time t, qt: state at time t)
  • ⇒ bj(v) is a continuous probability density
    function (pdf), often a mixture of multivariate
    Gaussian (normal) distributions (a sketch of the
    two cases follows this list)
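To make the contrast concrete, here is a small Python sketch (illustrative parameter values only) of the two kinds of observation model for one state j:

```python
import numpy as np

# Discrete observation model: b_j(k) is just a table of M probabilities.
b_j_discrete = np.array([0.5, 0.3, 0.2])          # M = 3 symbols, sums to 1
prob_symbol_2 = b_j_discrete[1]                    # P(o_t = v_2 | q_t = j)

# Continuous observation model: b_j(v) is a Gaussian-mixture density over R^L.
def gaussian_pdf(v, mean, cov):
    """Multivariate Gaussian density N(v; mean, cov)."""
    L = len(mean)
    diff = v - mean
    norm = np.sqrt((2 * np.pi) ** L * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm

weights = np.array([0.6, 0.4])                     # mixture weights, sum to 1
means = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
covs = [np.eye(2), 0.5 * np.eye(2)]

def b_j_continuous(v):
    """b_j(v): mixture of multivariate Gaussians (a density, not a probability)."""
    return sum(w * gaussian_pdf(v, m, c) for w, m, c in zip(weights, means, covs))

density = b_j_continuous(np.array([0.5, 0.2]))
```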

15
Gaussian Distribution
  • A continuous random variable X is said to have a
    Gaussian distribution with mean μ and variance
    σ² (σ > 0) if X has a continuous pdf of the
    following form
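The form referred to is the standard univariate Gaussian density:

```latex
f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}
  \exp\!\Big( -\frac{(x - \mu)^2}{2\sigma^2} \Big),
\qquad -\infty < x < \infty
```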

16
Multivariate Gaussian Distribution
  • If X = (X1, X2, X3, …, XL) is an L-dimensional
    random vector with a multivariate Gaussian
    distribution with mean vector μ and covariance
    matrix Σ, then the pdf can be expressed as below
  • If X1, X2, X3, …, XL are independent random
    variables, the covariance matrix reduces to a
    diagonal matrix, i.e., as shown below
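The standard density, and its diagonal-covariance special case, are:

```latex
f_X(x) = \frac{1}{(2\pi)^{L/2} |\Sigma|^{1/2}}
  \exp\!\Big( -\tfrac{1}{2} (x - \mu)^{\top} \Sigma^{-1} (x - \mu) \Big)
% With independent components, \Sigma = \mathrm{diag}(\sigma_1^2, \dots, \sigma_L^2) and
f_X(x) = \prod_{i=1}^{L} \frac{1}{\sqrt{2\pi}\,\sigma_i}
  \exp\!\Big( -\frac{(x_i - \mu_i)^2}{2\sigma_i^2} \Big)
```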

17
Multivariate Mixture Gaussian Distribution
  • An L-dimensional random vector X = (X1, X2, X3, …, XL)
    has a multivariate mixture Gaussian distribution
    if its pdf has the form sketched below
  • In a CDHMM, bj(v) is a continuous probability
    density function (pdf) and is often a mixture of
    multivariate Gaussian distributions
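In the usual notation (weights c_k, component means μ_k, covariances Σ_k), the mixture density and the CDHMM observation density are:

```latex
f_X(x) = \sum_{k=1}^{M} c_k\, N(x;\, \mu_k, \Sigma_k),
\qquad \sum_{k=1}^{M} c_k = 1,\; c_k \geq 0
% In a CDHMM, the state-j observation density is such a mixture:
b_j(v) = \sum_{k=1}^{M} c_{jk}\, N(v;\, \mu_{jk}, \Sigma_{jk})
```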

18
Solution to Problem 3 The Intuitive View
(CDHMM)
  • Define a new variable γt(j, k)
  • the probability of being in state j at time t,
    with the k-th mixture component accounting for ot
    (see the sketch below)
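In terms of the forward/backward variables α_t and β_t, the standard definition (as in Rabiner's tutorial) is:

```latex
\gamma_t(j, k) =
  \left[ \frac{\alpha_t(j)\, \beta_t(j)}
              {\sum_{i=1}^{N} \alpha_t(i)\, \beta_t(i)} \right]
  \left[ \frac{c_{jk}\, N(o_t;\, \mu_{jk}, \Sigma_{jk})}
              {\sum_{m=1}^{M} c_{jm}\, N(o_t;\, \mu_{jm}, \Sigma_{jm})} \right]
```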

19
Solution to Problem 3 The Intuitive View
(CDHMM) (cont.)
  • Re-estimation formulae for the mixture weights,
    means, and covariances (cjk, μjk, Σjk) are as
    sketched below
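The standard re-estimation formulas, assumed here since the slide's equations are not reproduced:

```latex
\bar{c}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j, k)}
                    {\sum_{t=1}^{T} \sum_{m=1}^{M} \gamma_t(j, m)}
\qquad
\bar{\mu}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j, k)\, o_t}
                      {\sum_{t=1}^{T} \gamma_t(j, k)}
\qquad
\bar{\Sigma}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j, k)\,
                         (o_t - \bar{\mu}_{jk})(o_t - \bar{\mu}_{jk})^{\top}}
                         {\sum_{t=1}^{T} \gamma_t(j, k)}
```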

20
Solution to Problem 3 - The EM Algorithm(CDHMM)
  • Express the complete-data likelihood with respect
    to each single mixture component

K: one of the possible mixture component sequences,
along with the state sequence Q
21
Solution to Problem 3 - The EM Algorithm(CDHMM)
(cont.)
  • The auxiliary function can be written as
  • Compared to the DHMM case, we need to further
    solve

22
Solution to Problem 3 - The EM Algorithm(CDHMM)
(cont.)
  • The new model parameter set can be
    derived as

23
Solution to Problem 3 - The EM Algorithm(CDHMM)
(cont.)
  • The new model parameter sets can
    be derived as

24
Solution to Problem 3 - The EM Algorithm(CDHMM)
(cont.)
We thus solve
25
Solution to Problem 3 - The EM Algorithm(CDHMM)
(cont.)
26
Solution to Problem 3 - The EM Algorithm(CDHMM)
(cont.)
27
HMM Topology
  • Speech is a time-evolving non-stationary signal
  • Each HMM state has the ability to capture some
    quasi-stationary segment in the non-stationary
    speech signal
  • A left-to-right topology is a natural candidate
    to model the speech signal
  • Each state has a state-dependent output
    probability distribution that can be used to
    interpret the observable speech signal
  • It is common to represent a phone using 3 to 5
    states (English) and a syllable using 6 to 8
    states (Mandarin Chinese)

28
HMM Limitations
  • HMMs have proved to be a good model of speech
    variability in time and feature space
    simultaneously
  • There are, however, a number of limitations in
    conventional HMMs
  • The state duration implicitly follows a geometric
    (exponentially decaying) distribution
  • This does not provide an adequate representation
    of the temporal structure of speech
  • First-order (Markov) assumption: the state
    transition depends only on the previous state
  • Output-independence assumption: all observation
    frames depend only on the state that generated
    them, not on neighboring observation frames
  • HMMs are well defined only for processes that are
    a function of a single independent variable, such
    as time or one-dimensional position
  • Although speech recognition remains the dominant 
    field in which HMMs are applied, their use has
    been spreading steadily to other fields

29
ML vs. MAP
  • Estimation principles based on observations
    O = {o1, o2, …, oT}
  • The Maximum Likelihood (ML) principle: find the
    model parameter Φ so that the likelihood P(O | Φ)
    is maximum
  • for example, if Φ = (μ, Σ) are the parameters of
    a multivariate normal distribution, and O is i.i.d.
    (independent, identically distributed), then the
    ML estimate of Φ = (μ, Σ) is as sketched below
  • The Maximum A Posteriori (MAP) principle: find
    the model parameter Φ so that the posterior
    probability P(Φ | O) is maximum
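For the Gaussian example above, the closed-form ML estimates and the corresponding MAP objective are:

```latex
% ML estimates for an i.i.d. multivariate Gaussian sample o_1, ..., o_T:
\hat{\mu}_{ML} = \frac{1}{T} \sum_{t=1}^{T} o_t,
\qquad
\hat{\Sigma}_{ML} = \frac{1}{T} \sum_{t=1}^{T}
  (o_t - \hat{\mu}_{ML})(o_t - \hat{\mu}_{ML})^{\top}
% MAP instead maximizes the posterior, i.e. weights the likelihood by a prior:
\Phi_{MAP} = \arg\max_{\Phi} P(\Phi \mid O)
           = \arg\max_{\Phi} P(O \mid \Phi)\, P(\Phi)
```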

back
30
A Simple Example
The Forward/Backward Procedure
[Figure: a two-state trellis (states S1, S2 on the vertical axis) over
time steps 1, 2, 3, with observations o1, o2, o3 emitted at each step]
31
A Simple Example (cont.)

q = 1 1 1
q = 1 1 2
q = 1 2 1
q = 1 2 2
q = 2 1 1
q = 2 1 2
q = 2 2 1
q = 2 2 2
Total: 8 paths
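A small Python sketch of this example: the transition and emission numbers below are made up for illustration (only the two-state, three-frame structure comes from the slide). It enumerates all 2³ = 8 state paths, sums their joint probabilities, and checks the result against the forward procedure.

```python
import itertools
import numpy as np

pi = np.array([0.6, 0.4])              # initial probabilities for S1, S2 (illustrative)
A = np.array([[0.7, 0.3],              # a_ij = P(q_{t+1}=j | q_t=i) (illustrative)
              [0.4, 0.6]])
B = np.array([[0.5, 0.5],              # b_j(o): rows are states, columns are symbols
              [0.1, 0.9]])             # (illustrative values)
obs = [0, 1, 0]                        # o1, o2, o3 as symbol indices

# Brute force: sum P(O, q | lambda) over all 2^3 = 8 state paths.
total = 0.0
for q in itertools.product([0, 1], repeat=3):   # (0,0,0), (0,0,1), ..., (1,1,1)
    p = pi[q[0]] * B[q[0], obs[0]]
    for t in range(1, 3):
        p *= A[q[t - 1], q[t]] * B[q[t], obs[t]]
    total += p

# Forward procedure: the same quantity in O(N^2 T) instead of O(N^T).
alpha = pi * B[:, obs[0]]
for t in range(1, 3):
    alpha = (alpha @ A) * B[:, obs[t]]

assert np.isclose(total, alpha.sum())   # both give P(O | lambda)
```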
32
A Simple Example (cont.)
back
33
Appendix - Matrix Calculus
  • Notation

34
Appendix - Matrix Calculus (cont.)
  • Property 1
  • proof

35
Appendix - Matrix Calculus (cont.)
  • Property 1 - Extension
  • proof

36
Appendix - Matrix Calculus (cont.)
  • Property 2
  • proof

back
37
Appendix - Matrix Calculus (cont.)
  • Property 3
  • proof

38
Appendix - Matrix Calculus (cont.)
  • Property 4
  • proof

back
39
Appendix - Matrix Calculus (cont.)
  • Property 5
  • proof

40
Appendix - Matrix Calculus (cont.)
  • Property 6
  • proof

41
Appendix - Matrix Calculus (cont.)
  • Property 7
  • proof

42
Appendix - Matrix Calculus (cont.)
  • Property 8
  • proof

43
Appendix - Matrix Calculus (cont.)
  • Property 9
  • proof

back