1
HMM - Part 2
  • The EM algorithm
  • Continuous density HMM

2
The EM Algorithm
  • EM = Expectation-Maximization
  • Why EM?
  • Simple optimization algorithms for likelihood
    functions rely on intermediate variables, called
    latent data. For HMM, the state sequence is the
    latent data
  • Direct access to the data necessary to estimate
    the parameters is impossible or difficult. For
    HMM, it is almost impossible to estimate (A, B,
    π) without considering the state sequence
  • Two major steps (see the sketch below)
  • E step: calculate the expectation with respect to
    the latent data, given the current estimate of
    the parameters and the observations
  • M step: estimate a new set of parameters
    according to the Maximum Likelihood (ML) or
    Maximum A Posteriori (MAP) criterion
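For concreteness, here is a minimal runnable sketch of these two steps on a toy problem: a two-component 1-D Gaussian mixture with made-up data, where the latent data is the unobserved component label of each observation. The HMM case replaces the labels with state sequences, but the E/M structure is the same.

```python
import numpy as np

# Toy EM: fit a 2-component 1-D Gaussian mixture (illustrative data only).
rng = np.random.default_rng(0)
obs = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 300)])

w, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(50):
    # E step: posterior probability of each latent component for each point,
    # given the current parameter estimates and the observations.
    dens = w * np.exp(-0.5 * (obs[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M step: ML re-estimation of the parameters from the expected counts.
    n_k = resp.sum(axis=0)
    w = n_k / len(obs)
    mu = (resp * obs[:, None]).sum(axis=0) / n_k
    var = (resp * (obs[:, None] - mu) ** 2).sum(axis=0) / n_k

print(w, mu, var)   # should approach the generating values (0.4/0.6, -2/3, 1/1)
```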

ML vs. MAP
3
The EM Algorithm (cont.)
  • The EM algorithm is important to HMMs and many
    other model learning techniques
  • Basic idea
  • Assume we have λ and the probability that each
    Q = q occurred in the generation of O = o
  • i.e., we have in fact observed a complete
    data pair (o, q) with frequency proportional to
    the probability P(O = o, Q = q | λ)
  • We then find a new λ̄ that maximizes the
    auxiliary function Q(λ, λ̄) sketched below
  • It can be guaranteed that P(O = o | λ̄) ≥ P(O = o | λ)
  • EM can discover parameters of model λ to maximize
    the log-likelihood of the incomplete data,
    log P(O = o | λ), by iteratively maximizing the
    expectation of the log-likelihood of the complete
    data, log P(O = o, Q = q | λ)
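The slide's formulas are not reproduced in this transcript; the standard auxiliary function and guarantee assumed in this derivation are:

```latex
% Expectation of the complete-data log-likelihood under the current model,
% maximized over the new model \bar{\lambda}:
\bar{\lambda} = \arg\max_{\bar{\lambda}} Q(\lambda, \bar{\lambda}),
\qquad
Q(\lambda, \bar{\lambda}) = \sum_{q} P(O = o, Q = q \mid \lambda)\,
  \log P(O = o, Q = q \mid \bar{\lambda})
% Guarantee: each iteration does not decrease the incomplete-data likelihood,
P(O = o \mid \bar{\lambda}) \;\geq\; P(O = o \mid \lambda)
```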

4
The EM Algorithm (cont.)
5
The EM Algorithm (cont.)
1. Jensen's inequality: if f is a concave function
   and X is a r.v., then E[f(X)] ≤ f(E[X])
2. log x ≤ x − 1
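These two facts drive the standard monotonicity argument; a compact version, in the notation above, is:

```latex
% Using Jensen's inequality on the concave log:
\log \frac{P(O \mid \bar{\lambda})}{P(O \mid \lambda)}
  = \log \sum_{q} P(q \mid O, \lambda)\,
        \frac{P(O, q \mid \bar{\lambda})}{P(O, q \mid \lambda)}
  \;\geq\; \sum_{q} P(q \mid O, \lambda)\,
        \log \frac{P(O, q \mid \bar{\lambda})}{P(O, q \mid \lambda)}
  = \frac{Q(\lambda, \bar{\lambda}) - Q(\lambda, \lambda)}{P(O \mid \lambda)}
% Hence Q(\lambda, \bar{\lambda}) \geq Q(\lambda, \lambda) implies
% P(O \mid \bar{\lambda}) \geq P(O \mid \lambda).
```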
6
Solution to Problem 3 - The EM Algorithm
  • The auxiliary function Q(λ, λ̄)
  • where P(O, q | λ) and log P(O, q | λ̄)
    can be expressed as sketched below
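A sketch of the standard expansion the derivation relies on (usual π, a, b notation assumed):

```latex
% Joint probability of observation sequence o and state sequence q:
P(O, q \mid \lambda) = \pi_{q_1}\, b_{q_1}(o_1)
    \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t)
% so the auxiliary function becomes
Q(\lambda, \bar{\lambda}) = \sum_{q} P(O, q \mid \lambda)
  \Big[ \log \bar{\pi}_{q_1}
      + \sum_{t=2}^{T} \log \bar{a}_{q_{t-1} q_t}
      + \sum_{t=1}^{T} \log \bar{b}_{q_t}(o_t) \Big]
```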

7
Solution to Problem 3 - The EM Algorithm (cont.)
  • The auxiliary function can be rewritten as

8
Solution to Problem 3 - The EM Algorithm (cont.)
  • The auxiliary function is separated into three
    independent terms, respectively corresponding to
    the initial probabilities π, the transition
    probabilities {aij}, and the observation
    probabilities {bj(k)}
  • Maximization of Q(λ, λ̄) can therefore be done by
    maximizing the individual terms separately,
    subject to the probability constraints
  • All these terms have the form sketched below
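A sketch of that common form, writing the re-estimated probabilities as y_j and the weights collected from Q(λ, λ̄) as w_j (notation assumed here):

```latex
% Each of the three terms has the generic form
F(y) = \sum_{j} w_j \log y_j,
\qquad \text{subject to } \sum_{j} y_j = 1,\; y_j \geq 0,
% whose maximizer (derived on the next slide via a Lagrange multiplier) is
y_j = \frac{w_j}{\sum_{i} w_i}
```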

9
Solution to Problem 3 - The EM Algorithm (cont.)
  • Proof: apply a Lagrange multiplier, with the
    constraint that the probabilities sum to one
    (worked out below)
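A worked version of the proof in the same notation:

```latex
% Introduce a Lagrange multiplier \epsilon for the constraint \sum_j y_j = 1:
\frac{\partial}{\partial y_j}
  \Big( \sum_{i} w_i \log y_i + \epsilon \big( \textstyle\sum_{i} y_i - 1 \big) \Big)
  = \frac{w_j}{y_j} + \epsilon = 0
  \;\Rightarrow\; y_j = -\frac{w_j}{\epsilon}
% Substituting into the constraint gives \epsilon = -\sum_i w_i, hence
y_j = \frac{w_j}{\sum_{i} w_i}
```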
10
Solution to Problem 3 - The EM Algorithm (cont.)
11
Solution to Problem 3 - The EM Algorithm (cont.)
12
Solution to Problem 3 - The EM Algorithm (cont.)
13
Solution to Problem 3 - The EM Algorithm (cont.)
  • The new model parameter set λ̄ = (Ā, B̄, π̄)
    can be expressed as sketched below
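The slide's formulas are not reproduced here; the standard Baum-Welch re-estimates, with γ_t(i) and ξ_t(i, j) the state-occupancy and transition posteriors from the forward-backward procedure, are:

```latex
\bar{\pi}_i = \gamma_1(i)
\qquad
\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}
\qquad
\bar{b}_j(k) = \frac{\sum_{t:\, o_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
```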

14
Discrete vs. Continuous Density HMMs
  • Two major types of HMMs, classified according to
    their observations
  • Discrete and finite observations
  • The observations that all distinct states
    generate are finite in number, i.e.,
    V = {v1, v2, v3, …, vM}, vk ∈ R^L
  • In this case, the observation probability
    distribution in state j, B = {bj(k)}, is defined as
    bj(k) = P(ot = vk | qt = j), 1 ≤ k ≤ M, 1 ≤ j ≤ N
    (ot: observation at time t, qt: state at time t)
  • ⇒ bj(k) consists of only M probability values
  • Continuous and infinite observations
  • The observations that all distinct states
    generate are infinite and continuous, i.e.,
    V = {v | v ∈ R^L}
  • In this case, the observation probability
    distribution in state j, B = {bj(v)}, is defined as
    bj(v) = f(ot = v | qt = j), 1 ≤ j ≤ N
    (ot: observation at time t, qt: state at time t)
  • ⇒ bj(v) is a continuous probability density
    function (pdf), often a mixture of multivariate
    Gaussian (normal) distributions (a sketch of the
    two cases follows this list)
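To make the contrast concrete, here is a small Python sketch (illustrative parameter values only) of the two kinds of observation model for one state j:

```python
import numpy as np

# Discrete observation model: b_j(k) is just a table of M probabilities.
b_j_discrete = np.array([0.5, 0.3, 0.2])          # M = 3 symbols, sums to 1
prob_symbol_2 = b_j_discrete[1]                    # P(o_t = v_2 | q_t = j)

# Continuous observation model: b_j(v) is a Gaussian-mixture density over R^L.
def gaussian_pdf(v, mean, cov):
    """Multivariate Gaussian density N(v; mean, cov)."""
    L = len(mean)
    diff = v - mean
    norm = np.sqrt((2 * np.pi) ** L * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm

weights = np.array([0.6, 0.4])                     # mixture weights, sum to 1
means = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
covs = [np.eye(2), 0.5 * np.eye(2)]

def b_j_continuous(v):
    """b_j(v): mixture of multivariate Gaussians (a density, not a probability)."""
    return sum(w * gaussian_pdf(v, m, c) for w, m, c in zip(weights, means, covs))

density = b_j_continuous(np.array([0.5, 0.2]))
```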

15
Gaussian Distribution
  • A continuous random variable X is said to have a
    Gaussian distribution with mean μ and variance
    σ² (σ > 0) if X has a continuous pdf of the
    following form
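The form referred to is the standard univariate Gaussian density:

```latex
f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}
  \exp\!\Big( -\frac{(x - \mu)^2}{2\sigma^2} \Big),
\qquad -\infty < x < \infty
```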

16
Multivariate Gaussian Distribution
  • If X = (X1, X2, X3, …, XL) is an L-dimensional
    random vector with a multivariate Gaussian
    distribution with mean vector μ and covariance
    matrix Σ, then the pdf can be expressed as below
  • If X1, X2, X3, …, XL are independent random
    variables, the covariance matrix reduces to a
    diagonal matrix, i.e., as shown below
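The standard density, and its diagonal-covariance special case, are:

```latex
f_X(x) = \frac{1}{(2\pi)^{L/2} |\Sigma|^{1/2}}
  \exp\!\Big( -\tfrac{1}{2} (x - \mu)^{\top} \Sigma^{-1} (x - \mu) \Big)
% With independent components, \Sigma = \mathrm{diag}(\sigma_1^2, \dots, \sigma_L^2) and
f_X(x) = \prod_{i=1}^{L} \frac{1}{\sqrt{2\pi}\,\sigma_i}
  \exp\!\Big( -\frac{(x_i - \mu_i)^2}{2\sigma_i^2} \Big)
```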

17
Multivariate Mixture Gaussian Distribution
  • An L-dimensional random vector X = (X1, X2, X3, …, XL)
    has a multivariate mixture Gaussian distribution
    if its pdf has the form sketched below
  • In a CDHMM, bj(v) is a continuous probability
    density function (pdf) and is often a mixture of
    multivariate Gaussian distributions
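In the usual notation (weights c_k, component means μ_k, covariances Σ_k), the mixture density and the CDHMM observation density are:

```latex
f_X(x) = \sum_{k=1}^{M} c_k\, N(x;\, \mu_k, \Sigma_k),
\qquad \sum_{k=1}^{M} c_k = 1,\; c_k \geq 0
% In a CDHMM, the state-j observation density is such a mixture:
b_j(v) = \sum_{k=1}^{M} c_{jk}\, N(v;\, \mu_{jk}, \Sigma_{jk})
```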

18
Solution to Problem 3 The Intuitive View
(CDHMM)
  • Define a new variable γt(j, k)
  • the probability of being in state j at time t,
    with the k-th mixture component accounting for ot
    (see the sketch below)
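In terms of the forward/backward variables α_t and β_t, the standard definition (as in Rabiner's tutorial) is:

```latex
\gamma_t(j, k) =
  \left[ \frac{\alpha_t(j)\, \beta_t(j)}
              {\sum_{i=1}^{N} \alpha_t(i)\, \beta_t(i)} \right]
  \left[ \frac{c_{jk}\, N(o_t;\, \mu_{jk}, \Sigma_{jk})}
              {\sum_{m=1}^{M} c_{jm}\, N(o_t;\, \mu_{jm}, \Sigma_{jm})} \right]
```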

19
Solution to Problem 3 The Intuitive View
(CDHMM) (cont.)
  • Re-estimation formulae for the mixture weights,
    means, and covariances (cjk, μjk, Σjk) are as
    sketched below
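The standard re-estimation formulas, assumed here since the slide's equations are not reproduced:

```latex
\bar{c}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j, k)}
                    {\sum_{t=1}^{T} \sum_{m=1}^{M} \gamma_t(j, m)}
\qquad
\bar{\mu}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j, k)\, o_t}
                      {\sum_{t=1}^{T} \gamma_t(j, k)}
\qquad
\bar{\Sigma}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j, k)\,
                         (o_t - \bar{\mu}_{jk})(o_t - \bar{\mu}_{jk})^{\top}}
                         {\sum_{t=1}^{T} \gamma_t(j, k)}
```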

20
Solution to Problem 3 - The EM Algorithm(CDHMM)
  • Express the complete-data likelihood with respect
    to each single mixture component

K: one of the possible mixture component sequences,
along with the state sequence Q
21
Solution to Problem 3 - The EM Algorithm(CDHMM)
(cont.)
  • The auxiliary function can be written as
  • Compared to the DHMM case, we need to further
    solve

22
Solution to Problem 3 - The EM Algorithm(CDHMM)
(cont.)
  • The new model parameter set can be
    derived as

23
Solution to Problem 3 - The EM Algorithm(CDHMM)
(cont.)
  • The new model parameter sets can
    be derived as

24
Solution to Problem 3 - The EM Algorithm(CDHMM)
(cont.)
We thus solve
25
Solution to Problem 3 - The EM Algorithm(CDHMM)
(cont.)
26
Solution to Problem 3 - The EM Algorithm(CDHMM)
(cont.)
27
HMM Topology
  • Speech is a time-evolving non-stationary signal
  • Each HMM state has the ability to capture some
    quasi-stationary segment in the non-stationary
    speech signal
  • A left-to-right topology is a natural candidate
    to model the speech signal
  • Each state has a state-dependent output
    probability distribution that can be used to
    interpret the observable speech signal
  • It is common to represent a phone using 3 to 5
    states (English) and a syllable using 6 to 8
    states (Mandarin Chinese)

28
HMM Limitations
  • HMMs have proved to be a good model of speech
    variability in time and feature space
    simultaneously
  • There are, however, a number of limitations in
    conventional HMMs
  • The state duration implicitly follows a geometric
    (exponentially decaying) distribution
  • This does not provide an adequate representation
    of the temporal structure of speech
  • First-order (Markov) assumption: the state
    transition depends only on the previous state
  • Output-independence assumption: all observation
    frames depend only on the state that generated
    them, not on neighboring observation frames
  • HMMs are well defined only for processes that are
    a function of a single independent variable, such
    as time or one-dimensional position
  • Although speech recognition remains the dominant 
    field in which HMMs are applied, their use has
    been spreading steadily to other fields

29
ML vs. MAP
  • Estimation principles based on observations
    O = {o1, o2, …, oT}
  • The Maximum Likelihood (ML) principle: find the
    model parameter Φ so that the likelihood P(O | Φ)
    is maximum
  • for example, if Φ = (μ, Σ) are the parameters of
    a multivariate normal distribution, and O is i.i.d.
    (independent, identically distributed), then the
    ML estimate of Φ = (μ, Σ) is as sketched below
  • The Maximum A Posteriori (MAP) principle: find
    the model parameter Φ so that the posterior
    probability P(Φ | O) is maximum
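For the Gaussian example above, the closed-form ML estimates and the corresponding MAP objective are:

```latex
% ML estimates for an i.i.d. multivariate Gaussian sample o_1, ..., o_T:
\hat{\mu}_{ML} = \frac{1}{T} \sum_{t=1}^{T} o_t,
\qquad
\hat{\Sigma}_{ML} = \frac{1}{T} \sum_{t=1}^{T}
  (o_t - \hat{\mu}_{ML})(o_t - \hat{\mu}_{ML})^{\top}
% MAP instead maximizes the posterior, i.e. weights the likelihood by a prior:
\Phi_{MAP} = \arg\max_{\Phi} P(\Phi \mid O)
           = \arg\max_{\Phi} P(O \mid \Phi)\, P(\Phi)
```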

back
30
A Simple Example
The Forward/Backward Procedure
[Figure: a two-state trellis (states S1, S2 on the vertical axis) over
time steps 1, 2, 3, with observations o1, o2, o3 emitted at each step]
31
A Simple Example (cont.)

q = 1 1 1
q = 1 1 2
q = 1 2 1
q = 1 2 2
q = 2 1 1
q = 2 1 2
q = 2 2 1
q = 2 2 2
Total: 8 paths
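A small Python sketch of this example: the transition and emission numbers below are made up for illustration (only the two-state, three-frame structure comes from the slide). It enumerates all 2³ = 8 state paths, sums their joint probabilities, and checks the result against the forward procedure.

```python
import itertools
import numpy as np

pi = np.array([0.6, 0.4])              # initial probabilities for S1, S2 (illustrative)
A = np.array([[0.7, 0.3],              # a_ij = P(q_{t+1}=j | q_t=i) (illustrative)
              [0.4, 0.6]])
B = np.array([[0.5, 0.5],              # b_j(o): rows are states, columns are symbols
              [0.1, 0.9]])             # (illustrative values)
obs = [0, 1, 0]                        # o1, o2, o3 as symbol indices

# Brute force: sum P(O, q | lambda) over all 2^3 = 8 state paths.
total = 0.0
for q in itertools.product([0, 1], repeat=3):   # (0,0,0), (0,0,1), ..., (1,1,1)
    p = pi[q[0]] * B[q[0], obs[0]]
    for t in range(1, 3):
        p *= A[q[t - 1], q[t]] * B[q[t], obs[t]]
    total += p

# Forward procedure: the same quantity in O(N^2 T) instead of O(N^T).
alpha = pi * B[:, obs[0]]
for t in range(1, 3):
    alpha = (alpha @ A) * B[:, obs[t]]

assert np.isclose(total, alpha.sum())   # both give P(O | lambda)
```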
32
A Simple Example (cont.)
back
33
Appendix - Matrix Calculus
  • Notation

34
Appendix - Matrix Calculus (cont.)
  • Property 1
  • proof

35
Appendix - Matrix Calculus (cont.)
  • Property 1 - Extension
  • proof

36
Appendix - Matrix Calculus (cont.)
  • Property 2
  • proof

back
37
Appendix - Matrix Calculus (cont.)
  • Property 3
  • proof

38
Appendix - Matrix Calculus (cont.)
  • Property 4
  • proof

back
39
Appendix - Matrix Calculus (cont.)
  • Property 5
  • proof

40
Appendix - Matrix Calculus (cont.)
  • Property 6
  • proof

41
Appendix - Matrix Calculus (cont.)
  • Property 7
  • proof

42
Appendix - Matrix Calculus (cont.)
  • Property 8
  • proof

43
Appendix - Matrix Calculus (cont.)
  • Property 9
  • proof

back