1. CSE 552
- Hidden Markov Models for Speech Recognition
- Spring, 2004
- Oregon Health & Science University
- OGI School of Science & Engineering
- John-Paul Hosom
- Lecture Notes for May 5
- Gamma, Xi, and the Forward-Backward Algorithm
2. Review: α and β
- Define the variable α, which has the meaning of the probability of observations o1 through ot and being in state i at time t, given our HMM:
  αt(i) = P(o1 o2 … ot, qt = i | λ)
- Compute α and P(O | λ) with the following procedure:
  Initialization:  α1(i) = πi bi(o1),  1 ≤ i ≤ N
  Induction:  αt+1(j) = [Σi αt(i) aij] bj(ot+1),  1 ≤ t ≤ T−1,  1 ≤ j ≤ N
  Termination:  P(O | λ) = Σi αT(i)
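For reference, a minimal sketch of this forward procedure in Python (not from the original slides); the layout of A, B, and pi, with B holding precomputed bj(ot) values, is an assumption:

```python
def forward(A, B, pi):
    """Forward procedure. A[i][j] = a_ij, B[t][j] = b_j(o_t), pi[j] = initial prob.
    Returns (alpha, P(O|lambda))."""
    T, N = len(B), len(pi)
    alpha = [[0.0] * N for _ in range(T)]
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    for i in range(N):
        alpha[0][i] = pi[i] * B[0][i]
    # Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(o_{t+1})
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = sum(alpha[t-1][i] * A[i][j] for i in range(N)) * B[t][j]
    # Termination: P(O|lambda) = sum_i alpha_T(i)
    return alpha, sum(alpha[T-1])
```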
3. Review: α and β
- In the same way that we defined α, we can define β.
- Define the variable β, which has the meaning of the probability of observations ot+1 through oT, given that we're in state i at time t, and given our HMM:
  βt(i) = P(ot+1 ot+2 … oT | qt = i, λ)
- Compute β with the following procedure:
  Initialization:  βT(i) = 1,  1 ≤ i ≤ N
    where the value of 1 is chosen arbitrarily (but won't affect results)
  Induction:  βt(i) = Σj aij bj(ot+1) βt+1(j),  t = T−1, …, 1,  1 ≤ i ≤ N
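A matching sketch of the backward procedure under the same assumed layout:

```python
def backward(A, B):
    """Backward procedure. A[i][j] = a_ij, B[t][j] = b_j(o_t). Returns beta."""
    T, N = len(B), len(A)
    beta = [[0.0] * N for _ in range(T)]
    # Initialization: beta_T(i) = 1 (arbitrary; does not affect results)
    for i in range(N):
        beta[T-1][i] = 1.0
    # Induction: beta_t(i) = sum_j a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[t+1][j] * beta[t+1][j] for j in range(N))
    return beta
```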
4. Forward Procedure Algorithm Example
  [HMM diagram: two states, h and ay, with emission plots; the values read off the plots for the observations below are bh(0.8) = 0.55, bay(0.8) = 0.15, bh(0.2) = 0.20, bay(0.2) = 0.65.]
- Observed features:  o1 = 0.8,  o2 = 0.8,  o3 = 0.2
  α1(h) = 0.55                                        α1(ay) = 0.0
  α2(h) = (0.55·0.3 + 0.0·0.0) · 0.55 = 0.09075       α2(ay) = (0.55·0.7 + 0.0·0.4) · 0.15 = 0.05775
  α3(h) = (0.09075·0.3 + 0.05775·0.0) · 0.20 = 0.0054
  α3(ay) = (0.09075·0.7 + 0.05775·0.4) · 0.65 = 0.0563
  Σi α3(i) = 0.0617
5. Backward Procedure Algorithm Example
  β3(h) = 1.0                                          β3(ay) = 1.0
  β2(h) = 0.3·0.20·1.0 + 0.7·0.65·1.0 = 0.515          β2(ay) = 0.0·0.20·1.0 + 0.4·0.65·1.0 = 0.260
  β1(h) = 0.3·0.55·0.515 + 0.7·0.15·0.260 = 0.1123     β1(ay) = 0.0·0.55·0.515 + 0.4·0.15·0.260 = 0.0156
  β0(·) = 1.0·0.55·0.1123 + 0.0·0.15·0.0156 = 0.0618
  β0(·) ≈ Σi α3(i) = P(O | λ)
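The two worked examples above can be reproduced with the forward and backward sketches from the review slides; the parameter values below are the ones used in the hand calculations (π and the transition probabilities are read off the computations, the b values off the slide's diagram):

```python
# 2-state h/ay example from the slides (states indexed h = 0, ay = 1)
pi = [1.0, 0.0]
A  = [[0.3, 0.7],   # from h:  a_h,h = 0.3,  a_h,ay = 0.7
      [0.0, 0.4]]   # from ay: a_ay,h = 0.0, a_ay,ay = 0.4
# b_j(o_t) for o = 0.8, 0.8, 0.2 (values read off the emission plots)
B  = [[0.55, 0.15],
      [0.55, 0.15],
      [0.20, 0.65]]

alpha, p = forward(A, B, pi)
beta = backward(A, B)
print(p)          # ~0.0617 = P(O|lambda)
print(alpha[2])   # ~[0.0054, 0.0563]
print(beta[0])    # ~[0.1123, 0.0156]
print(sum(pi[i] * B[0][i] * beta[0][i] for i in range(2)))   # ~0.0618 = beta_0
```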
6. Probability of Gamma
- Now we can define γ, the probability of being in state i at time t, given an observation sequence and HMM:
  γt(i) = P(qt = i | O, λ)
- Also, αt(i) βt(i) = P(o1 … ot, qt = i | λ) · P(ot+1 … oT | qt = i, λ) = P(O, qt = i | λ), so
  γt(i) = αt(i) βt(i) / P(O | λ) = αt(i) βt(i) / Σj αt(j) βt(j)
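A minimal sketch of computing γ from α and β, assuming the same list-of-lists layout as in the earlier sketches:

```python
def gamma(alpha, beta):
    """gamma_t(i) = alpha_t(i)*beta_t(i) / sum_j alpha_t(j)*beta_t(j)."""
    g = []
    for a_t, b_t in zip(alpha, beta):
        prod = [a * b for a, b in zip(a_t, b_t)]
        total = sum(prod)               # equals P(O|lambda) at every t
        g.append([p / total for p in prod])
    return g
```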
7. Probability of Gamma: Illustration
- Illustration: what is the probability of being in state 2 at time 2?
  [Trellis diagram: states 1, 2, and 3 across times 1 to 3, with observation probabilities bj(ot) at each state and time, and transition probabilities a12, a21, a22, a23, a32 on the arcs into and out of state 2.]
8. Gamma Example
- Given this 3-state HMM and set of 4 observations, what is the probability of being in state A at time 2?
  [HMM diagram: three states A, B, and C; the transition, initial, and emission values (0.2, 0.3, 1.0, 0.8, 0.7, 1.0, 1.0, 0.0, 0.0, 1.0) are shown in the figure.]
  O = 0.2, 0.3, 0.4, 0.5
9. Gamma Example
1. Compute forward probabilities up to time 2.
10. Gamma Example
2. Compute backward probabilities for times 4, 3, 2.
11. Gamma Example
3. Compute γ2(A).
12. Xi
- We can define one more variable: ξ is the probability of being in state i at time t, and in state j at time t+1, given the observations and HMM:
  ξt(i,j) = P(qt = i, qt+1 = j | O, λ)
- We can specify ξ as follows:
  ξt(i,j) = αt(i) aij bj(ot+1) βt+1(j) / P(O | λ)
          = αt(i) aij bj(ot+1) βt+1(j) / Σi Σj αt(i) aij bj(ot+1) βt+1(j)
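A sketch of computing ξ under the same assumed layout (A[i][j] = aij, B[t][j] = bj(ot)):

```python
def xi(alpha, beta, A, B):
    """xi_t(i,j) = alpha_t(i) * a_ij * b_j(o_{t+1}) * beta_{t+1}(j) / P(O|lambda),
    for t = 1 .. T-1."""
    T, N = len(B), len(A)
    p_obs = sum(alpha[T-1])             # P(O|lambda) from the forward termination step
    x = []
    for t in range(T - 1):
        x.append([[alpha[t][i] * A[i][j] * B[t+1][j] * beta[t+1][j] / p_obs
                   for j in range(N)] for i in range(N)])
    return x
```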
13. Xi Diagram
- This diagram illustrates ξ:
  [Trellis diagram: states 1, 2, and 3 across times t−1, t, t+1, t+2, with observation probabilities bj(ot) at each node and transition probabilities aij on the arcs; the highlighted terms α2(1), a12·b2(o3), and β3(2) illustrate the product that defines ξ2(1,2).]
14. Xi Example 1
- Given the same HMM and observations as before, what is ξ2(A,B)?
15. Xi Example 2
- Given this 3-state HMM and set of 4 observations, what is the expected number of transitions from B to C?
  [HMM diagram: the same three-state HMM (states A, B, C) and values as in the Gamma example.]
  O = 0.2, 0.3, 0.4, 0.5
16. Xi Example 2
- The expected number of transitions from B to C is the sum of ξt(B,C) over t = 1 … T−1.
17. Xi
- We can also specify γ in terms of ξ:
  γt(i) = Σj ξt(i,j)
18. How Do We Improve Estimates of HMM Parameters?
- With the Expectation-Maximization algorithm, also known as the Baum-Welch method.
- In this case, we can use the following re-estimation formulae:
  π̄i = expected frequency in state i at time t = 1 = γ1(i)
  āij = expected number of transitions from state i to state j / expected number of transitions from state i
      = Σt=1…T−1 ξt(i,j) / Σt=1…T−1 γt(i)
19. How Do We Improve Estimates of HMM Parameters?
- After computing new model parameters, we
maximize by substituting the new parameter
values in place of the old parameter values
and repeat.
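To make the re-estimation concrete, here is a sketch of the π and aij updates for a single observation sequence, taking the γ and ξ tables produced by the earlier sketches as input; this is the standard Baum-Welch update written out, not code from the course:

```python
def reestimate_pi_a(g, x, N):
    """pi_i <- gamma_1(i);  a_ij <- sum_t xi_t(i,j) / sum_t gamma_t(i), t = 1..T-1."""
    new_pi = list(g[0])
    new_A = [[0.0] * N for _ in range(N)]
    for i in range(N):
        denom = sum(g[t][i] for t in range(len(x)))         # expected transitions out of i
        for j in range(N):
            numer = sum(x[t][i][j] for t in range(len(x)))  # expected transitions i -> j
            new_A[i][j] = numer / denom if denom > 0 else 0.0
    return new_pi, new_A
```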
20. How Do We Improve Estimates of HMM Parameters?
- (j = state, k = mixture component!!)
  γt(j,k) = p(being in state j from component k)
          = [ αt(j) βt(j) / Σi αt(i) βt(i) ] · [ cjk N(ot; μjk, Σjk) / Σm cjm N(ot; μjm, Σjm) ]
  where the first factor is γt(j) = p(being in state j).
21. How Do We Improve Estimates of HMM Parameters?
  μ̄jk = Σt γt(j,k) ot / Σt γt(j,k)
      = expected value of ot based on existing λ
  Σ̄jk = Σt γt(j,k) (ot − μjk)(ot − μjk)ᵀ / Σt γt(j,k)
      = expected value of the diagonal of the covariance matrix based on existing λ
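A sketch of the mean and variance updates for a single scalar Gaussian per state, using γt(j) as the per-frame weight; with mixtures, γt(j) would simply be replaced by γt(j,k). Using the newly estimated mean inside the variance update is a common convention, not something specified on the slide:

```python
def reestimate_gaussian(g, obs, j):
    """mu_j  <- sum_t gamma_t(j) * o_t / sum_t gamma_t(j)
    var_j <- sum_t gamma_t(j) * (o_t - mu_j)^2 / sum_t gamma_t(j)
    obs is a list of scalar observations; g[t][j] is gamma_t(j)."""
    occ = sum(g[t][j] for t in range(len(obs)))                      # state occupancy
    mu = sum(g[t][j] * obs[t] for t in range(len(obs))) / occ        # weighted mean
    var = sum(g[t][j] * (obs[t] - mu) ** 2 for t in range(len(obs))) / occ
    return mu, var
```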
22. How Do We Improve Estimates of HMM Parameters?
- EM, called Baum-Welch, is also called the forward-backward algorithm.
- This process is guaranteed to converge monotonically to a maximum-likelihood estimate.
- There may be many local maxima; we can't guarantee the process will reach the globally best result.
23. Multiple Training Files
- So far, we've implicitly assumed a single set of observations for training. Most systems are trained on multiple sets of observations (files). This makes it necessary to use accumulators.
- Initialize:
    for each file:
      compute initial state boundaries (e.g. flat start)
      add information to accumulators
    compute average, standard deviation
- Update:
    for each iteration:
      reset accumulators
      for each file:
        add information to accumulators
      compute average, standard deviation
      update estimates
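A minimal sketch of the accumulator idea for one parameter (a state mean); the per_file_stats structure, holding each file's γ table and observation sequence, is an assumed interface:

```python
def reestimate_mean(per_file_stats, state_j):
    """per_file_stats: list of (gamma, obs) pairs, one per training file,
    where gamma[t][j] is gamma_t(j) for that file and obs is its observation list."""
    weighted_sum = 0.0   # accumulator: sum over all files of sum_t gamma_t(j) * o_t
    occupancy = 0.0      # accumulator: sum over all files of sum_t gamma_t(j)
    for g, obs in per_file_stats:
        for t in range(len(obs)):
            weighted_sum += g[t][state_j] * obs[t]
            occupancy += g[t][state_j]
    return weighted_sum / occupancy   # update once, after all files are accumulated
```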
24. Viterbi Search Project Notes
- Assume that any state can follow any other state; this will greatly simplify the implementation.
- Also assume that this is a whole-word recognizer, and that each word is recognized with a separate execution of the program. This will greatly simplify the implementation.
- Print out both the score for the utterance and the most likely state sequence from t = 1 to T.
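A sketch of the Viterbi search under the project's simplifying assumption that any state can follow any other state; working in the log domain is an implementation choice to avoid underflow, not something required by the notes:

```python
import math

def viterbi(A, B, pi):
    """A[i][j] = a_ij, B[t][j] = b_j(o_t), pi[j] = initial prob.
    Returns (best log score, most likely state sequence for t = 1..T)."""
    T, N = len(B), len(pi)
    log = lambda p: math.log(p) if p > 0 else float('-inf')
    delta = [[0.0] * N for _ in range(T)]   # best log score ending in state j at time t
    psi = [[0] * N for _ in range(T)]       # backpointer to the best previous state
    for j in range(N):
        delta[0][j] = log(pi[j]) + log(B[0][j])
    for t in range(1, T):
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[t-1][i] + log(A[i][j]))
            delta[t][j] = delta[t-1][best_i] + log(A[best_i][j]) + log(B[t][j])
            psi[t][j] = best_i
    # Backtrace the most likely state sequence
    last = max(range(N), key=lambda j: delta[T-1][j])
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(psi[t][path[-1]])
    path.reverse()
    return delta[T-1][last], path
```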
25. Viterbi Search Project Notes
- Does the Normal p.d.f. return probabilities?? Techniques from multivariate calculus must be used to show that the density integrates to 1:
  ∫ N(x; μ, σ) dx = 1,  where N(x; μ, σ) = (1 / (σ√(2π))) exp(−(x − μ)² / (2σ²))   (Devore, p. 138)
- Examples:
  ot = 2.0, μ = 4.0, σ = 5.0:  N = 0.07365
  ot = 3.9, μ = 4.0, σ = 0.2:  N = 1.76032
- Conclusion: when σ is small and ot is near μ, N(ot; μ, σ) can exceed 1, so it yields likelihoods instead of probabilities.
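The two example values can be checked directly from the univariate normal density, e.g.:

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density N(x; mu, sigma)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

print(normal_pdf(2.0, 4.0, 5.0))   # ~0.07365
print(normal_pdf(3.9, 4.0, 0.2))   # ~1.76032 (> 1: a likelihood/density, not a probability)
```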