Title: Hidden Markov Models (HMM): Rabiner's Paper
1. Hidden Markov Models (HMM): Rabiner's Paper
- Markoviana Reading Group
- Computer Science and Engineering Dept.
- Arizona State University
2. Stationary and Non-stationary
- Stationary process: its statistical properties do not vary with time.
- Non-stationary process: the signal properties vary over time.
3. HMM Example: Casino Coin
[Figure: a two-state HMM with hidden states Fair (F) and Unfair (U). The arcs carry the state transition probabilities (0.9, 0.1, 0.2, 0.8); each state has a symbol emission distribution (two CDF tables) over the observation symbols H and T (0.5, 0.5, 0.3, 0.7).]
Observation Sequence: HTHHTTHHHTHTHTHHTHHHHHHTHTHH
State Sequence:       FFFFFFUUUFFFFFFUUUUUUUFFFFFF
Motivation: given a sequence of Hs and Ts, can you tell at what times the casino cheated?
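A minimal sketch of this model in code may help anchor the later algorithms; the pairing of the figure's numbers with specific arcs did not survive extraction, so the assignment below (and the uniform initial distribution) is an assumption for illustration only. The later algorithm sketches take pi, A, B, O in exactly this form.

```python
import numpy as np

# Casino-coin HMM as arrays. The arc-to-number pairing is assumed, not taken
# from the (lost) figure; pi is assumed uniform since the slide does not give it.
states = ["F", "U"]                  # Fair, Unfair
symbols = ["H", "T"]

pi = np.array([0.5, 0.5])            # initial state distribution (assumed uniform)
A = np.array([[0.9, 0.1],            # a_ij = P(q_{t+1} = S_j | q_t = S_i)
              [0.2, 0.8]])
B = np.array([[0.5, 0.5],            # b_j(k) = P(O_t = symbol k | q_t = S_j)
              [0.7, 0.3]])

obs = "HTHHTTHHHTHTHTHHTHHHHHHTHTHH"
O = [symbols.index(c) for c in obs]  # observation sequence as symbol indices
```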
4. Properties of an HMM
- First-order Markov process
- q_t only depends on q_{t-1}
- Time is discrete
5. Elements of an HMM
- N, the number of states
- M, the number of observation symbols
- States S_1, S_2, …, S_N
- Observation symbols O_1, O_2, …, O_M
- λ, the probability distributions (A, B, π): transition probabilities a_ij, emission probabilities b_j(k), and the initial state distribution π
6. HMM Basic Problems
- Given an observation sequence O = O_1 O_2 O_3 … O_T and λ, find P(O|λ): Forward Algorithm / Backward Algorithm
- Given O = O_1 O_2 O_3 … O_T and λ, find the most likely state sequence Q = q_1 q_2 … q_T: Viterbi Algorithm
- Given O = O_1 O_2 O_3 … O_T and λ, re-estimate λ so that P(O|λ) is higher than it is now: Baum-Welch Re-estimation
7. Forward Algorithm Illustration
α_t(i) is the probability of observing the partial sequence O_1 O_2 O_3 … O_t and being in state S_i at time t.
8. Forward Algorithm Illustration (contd)
α_t(i) is the probability of observing the partial sequence O_1 O_2 O_3 … O_t and being in state S_i at time t.
[Trellis figure: rows are the states S_1 … S_N, columns are the observations O_1 O_2 O_3 O_4 … O_T. The entries of the first column are α_1(j) = π_j b_j(O_1); the entries of the second column are α_2(j) = [Σ_i α_1(i) a_ij] b_j(O_2), and so on. The total of the final column gives the solution P(O|λ).]
9. Forward Algorithm
- Definition: α_t(i) = P(O_1 O_2 … O_t, q_t = S_i | λ), the probability of observing the partial sequence O_1 O_2 O_3 … O_t and being in state S_i at time t.
- Initialization: α_1(i) = π_i b_i(O_1)
- Induction: α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) a_ij] b_j(O_{t+1})
- Problem 1 Answer: P(O|λ) = Σ_{i=1}^{N} α_T(i)
Complexity: O(N²T)
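As a sketch, the recursion above translates directly into code, assuming π, A, B as NumPy arrays and O as a sequence of symbol indices (as in the casino sketch earlier):

```python
import numpy as np

def forward(pi, A, B, O):
    """Forward algorithm sketch: row t (0-indexed) of the result holds alpha_{t+1}(i).

    pi: (N,) initial probabilities, A: (N, N) transitions, B: (N, M) emissions,
    O: sequence of observation-symbol indices.
    """
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                      # initialization: alpha_1(i) = pi_i b_i(O_1)
    for t in range(T - 1):
        # induction: alpha_{t+1}(j) = [sum_i alpha_t(i) a_ij] * b_j(O_{t+1})
        alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]
    return alpha

# Problem 1 answer: P(O | lambda) is the total of the last trellis column.
# prob = forward(pi, A, B, O)[-1].sum()
```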
10. Backward Algorithm Illustration
β_t(i) is the probability of observing the partial sequence O_{t+1} O_{t+2} O_{t+3} … O_T given that the state at time t is S_i.
11. Backward Algorithm
- Definition: β_t(i) = P(O_{t+1} O_{t+2} … O_T | q_t = S_i, λ), the probability of observing the partial sequence O_{t+1} O_{t+2} O_{t+3} … O_T given that the state at time t is S_i.
- Initialization: β_T(i) = 1
- Induction: β_t(i) = Σ_{j=1}^{N} a_ij b_j(O_{t+1}) β_{t+1}(j)
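A matching sketch of the backward pass, under the same array conventions as the forward sketch:

```python
import numpy as np

def backward(A, B, O):
    """Backward algorithm sketch: row t (0-indexed) holds beta_{t+1}(i)."""
    T, N = len(O), A.shape[0]
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                   # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        # induction: beta_t(i) = sum_j a_ij b_j(O_{t+1}) beta_{t+1}(j)
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return beta
```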
12. Q2 Optimality Criterion 1
- Maximize the expected number of correct individual states.
- Definition: γ_t(i) = P(q_t = S_i | O, λ) = α_t(i) β_t(i) / P(O|λ)
- Initialization
- Problem 2 Answer: at each time t, choose q_t = argmax_{1≤i≤N} γ_t(i)
- γ_t(i) is the probability of being in state S_i at time t given the observation sequence O and the model λ.
- Problem: if some a_ij = 0, the resulting optimal state sequence may not even be a valid state sequence.
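Criterion 1 can be sketched by combining the forward and backward passes above; `posterior_states` is a hypothetical helper name, not from the slides:

```python
import numpy as np

def posterior_states(pi, A, B, O):
    """Criterion 1 decoding sketch: at each time t, pick the state maximizing gamma_t(i)."""
    alpha, beta = forward(pi, A, B, O), backward(A, B, O)
    gamma = alpha * beta                             # proportional to P(q_t = S_i, O | lambda)
    gamma /= gamma.sum(axis=1, keepdims=True)        # divide each row by P(O | lambda)
    return gamma.argmax(axis=1)                      # note: may imply transitions with a_ij = 0
```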
13. Q2 Optimality Criterion 2
- Find the single best state sequence (path), i.e. maximize P(Q | O, λ).
- Definition: δ_t(i) = max_{q_1 … q_{t-1}} P(q_1 q_2 … q_{t-1}, q_t = S_i, O_1 O_2 … O_t | λ), the highest probability of a state path for the partial observation sequence O_1 O_2 O_3 … O_t that ends in state S_i at time t.
14. Viterbi Algorithm
The major difference from the forward algorithm: maximization instead of summation.
15. Viterbi Algorithm Illustration
δ_t(i) is the highest probability of a state path for the partial observation sequence O_1 O_2 O_3 … O_t that ends in state S_i at time t.
[Trellis figure: rows are the states S_1 … S_N, columns are the observations O_1 O_2 O_3 O_4 … O_T. The entries of the first column are δ_1(j) = π_j b_j(O_1); the entries of the second column are δ_2(j) = [max_i δ_1(i) a_ij] b_j(O_2), and so on. The maximum of the final column indicates where the traceback starts.]
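A sketch of the full Viterbi pass, including back-pointers and the traceback from the maximum of the last column, under the same array conventions as the earlier sketches:

```python
import numpy as np

def viterbi(pi, A, B, O):
    """Viterbi sketch: the forward recursion with max in place of sum, plus traceback."""
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)                # back-pointers
    delta[0] = pi * B[:, O[0]]                       # delta_1(j) = pi_j b_j(O_1)
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A           # scores[i, j] = delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, O[t]]   # delta_t(j) = max_i[...] * b_j(O_t)
    path = [int(delta[-1].argmax())]                 # traceback starts at max of last column
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]
```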
16. Relations with DBN
- Forward function: α_{t+1}(j) = [Σ_i α_t(i) a_ij] b_j(O_{t+1})
- Backward function: β_t(i) = Σ_j a_ij b_j(O_{t+1}) β_{t+1}(j), with β_T(i) = 1
- Viterbi algorithm: δ_{t+1}(j) = [max_i δ_t(i) a_ij] b_j(O_{t+1})
[Figure: each recursion shown as message passing between adjacent time slices of the dynamic Bayesian network.]
17. Some more definitions
γ_t(i) is the probability of being in state S_i at time t, given O and λ.
ξ_t(i, j) is the probability of being in state S_i at time t and in state S_j at time t+1, given O and λ.
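Both quantities can be computed from the forward and backward sketches above; the vectorized form below is one possible implementation, not taken from the slides:

```python
import numpy as np

def xi(pi, A, B, O):
    """xi[t, i, j] = P(q_t = S_i, q_{t+1} = S_j | O, lambda), one row per t = 1..T-1."""
    alpha, beta = forward(pi, A, B, O), backward(A, B, O)
    # numerator: alpha_t(i) * a_ij * b_j(O_{t+1}) * beta_{t+1}(j)
    num = alpha[:-1, :, None] * A[None, :, :] * (B[:, O[1:]].T * beta[1:])[:, None, :]
    return num / num.sum(axis=(1, 2), keepdims=True)  # each time slice sums to P(O | lambda)

# gamma_t(i) can then be recovered as xi(pi, A, B, O)[t].sum(axis=1) for t < T.
```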
18. Baum-Welch Re-estimation
- Expectation-Maximization Algorithm
- Expectation
19. Baum-Welch Re-estimation (contd)
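The re-estimation equations on this slide did not survive extraction. The standard updates from Rabiner's tutorial (π̄_i = γ_1(i); ā_ij = expected transitions from S_i to S_j over expected visits to S_i; b̄_j(k) = expected emissions of symbol v_k from S_j over expected visits to S_j) can be sketched as one EM pass, reusing the helpers defined above:

```python
import numpy as np

def baum_welch_step(pi, A, B, O):
    """One Baum-Welch (EM) re-estimation pass over a single observation sequence; a sketch."""
    x = xi(pi, A, B, O)                              # (T-1, N, N), from the earlier sketch
    gamma = x.sum(axis=2)                            # gamma_t(i) for t = 1..T-1
    alpha, beta = forward(pi, A, B, O), backward(A, B, O)
    last = alpha[-1] * beta[-1]
    gamma = np.vstack([gamma, last / last.sum()])    # append gamma_T(i)

    new_pi = gamma[0]                                                 # pi_i = gamma_1(i)
    new_A = x.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]           # expected transitions / visits
    O = np.asarray(O)
    new_B = np.stack([gamma[O == k].sum(axis=0) for k in range(B.shape[1])], axis=1)
    new_B /= gamma.sum(axis=0)[:, None]                               # expected emissions / visits
    return new_pi, new_A, new_B
```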
20. Notes on the Re-estimation
- If the model does not change, it has reached a local maximum.
- Depending on the model, many local maxima can exist.
- Re-estimated probabilities will sum to 1.
21. Implementation issues
- Scaling
- Multiple observation sequences
- Initial parameter estimation
- Missing data
- Choice of model size and type
22. Scaling
- calculation
- Recursion to calculate
23. Scaling (contd)
- calculation
- Desired condition
-
- Note that is not true!
24. Scaling (contd)
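The scaling equations on these slides were lost in extraction; a minimal sketch of the standard scheme, where each α row is renormalized by c_t = 1 / Σ_i α_t(i), is:

```python
import numpy as np

def forward_scaled(pi, A, B, O):
    """Scaled forward pass sketch: each row of alpha_hat sums to 1; c holds the scaling coefficients."""
    T, N = len(O), len(pi)
    alpha_hat = np.zeros((T, N))
    c = np.zeros(T)
    a = pi * B[:, O[0]]
    c[0] = 1.0 / a.sum()                             # c_t = 1 / sum_i (unscaled) alpha_t(i)
    alpha_hat[0] = a * c[0]
    for t in range(1, T):
        a = (alpha_hat[t - 1] @ A) * B[:, O[t]]
        c[t] = 1.0 / a.sum()
        alpha_hat[t] = a * c[t]
    return alpha_hat, c

# log P(O | lambda) = -sum_t log c_t, which stays in a numerically safe range.
```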
25. Maximum log-likelihood
- Initialization
- Recursion
- Termination
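The equations behind these three bullets are missing; assuming the slide follows the log-domain ("alternative") Viterbi implementation from Rabiner's tutorial, the steps are roughly:

```latex
\begin{aligned}
\text{Initialization:} \quad & \phi_1(i) = \log \pi_i + \log b_i(O_1) \\
\text{Recursion:} \quad & \phi_t(j) = \max_{1 \le i \le N}\bigl[\phi_{t-1}(i) + \log a_{ij}\bigr] + \log b_j(O_t) \\
\text{Termination:} \quad & \log P^{*} = \max_{1 \le i \le N} \phi_T(i)
\end{aligned}
```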
26. Multiple observation sequences
- Problem with re-estimation
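The formulas are missing here as well; in Rabiner's treatment, the per-sequence numerators and denominators of the single-sequence re-estimation formulas are summed over the K sequences (each weighted by 1/P_k when unscaled α, β are used), e.g. for the transition probabilities roughly:

```latex
\bar{a}_{ij} =
\frac{\sum_{k=1}^{K} \frac{1}{P_k} \sum_{t=1}^{T_k-1}
      \alpha_t^{(k)}(i)\, a_{ij}\, b_j\bigl(O_{t+1}^{(k)}\bigr)\, \beta_{t+1}^{(k)}(j)}
     {\sum_{k=1}^{K} \frac{1}{P_k} \sum_{t=1}^{T_k-1}
      \alpha_t^{(k)}(i)\, \beta_t^{(k)}(i)}
```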
27. Initial estimates of parameters
- For π and A, random or uniform initialization is sufficient.
- For B (discrete symbol probabilities), a good initial estimate is needed.
28. Insufficient training data
- Solutions
- Increase the size of training data
- Reduce the size of the model
- Interpolate parameters using another model
29. References
- L. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 1989.
- S. Russell, P. Norvig. Probabilistic Reasoning Over Time. AI: A Modern Approach, Ch. 15, 2002 (draft).
- V. Borkar, K. Deshmukh, S. Sarawagi. Automatic segmentation of text into structured records. ACM SIGMOD 2001.
- T. Scheffer, C. Decomain, S. Wrobel. Active Hidden Markov Models for Information Extraction. Proceedings of the International Symposium on Intelligent Data Analysis, 2001.
- S. Ray, M. Craven. Representing Sentence Structure in Hidden Markov Models for Information Extraction. Proceedings of the 17th International Joint Conference on Artificial Intelligence, 2001.