Title: Ch-9: Markov Models
1. Ch-9: Markov Models
- Prepared by Qaiser Abbas (07-0906)
2. Outline
- Markov Models
- Hidden Markov Models (HMM)
- Three problems in HMM and their solutions
3. Credits and References
- Materials used in this presentation are taken from the following textbooks and web resources:
- 1. "Foundations of Statistical Natural Language Processing" by C. Manning and H. Schütze, Chapter 9, Markov Models.
- 2. "Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition" by D. Jurafsky and J.H. Martin (updated chapters are available on the authors' website), Chapter 9, Automatic Speech Recognition.
- 3. "Spoken Language Processing: A Guide to Theory, Algorithm, and System Development" by X. Huang, A. Acero, and H.W. Hon, Chapter 8, Hidden Markov Models, and Chapter 12, Basic Search Algorithms.
- 4. Dr. Andrew W. Moore, Carnegie Mellon University, http://www.cs.cmu.edu/~awm/tutorials
- 5. Larry Rabiner's tutorial on HMMs.
4. A Markov System
Has N states, called s1, s2, ..., sN. There are discrete timesteps, t0, t1, ...
[Figure: state diagram with states s1, s2, s3; N = 3, t = 0]
5. A Markov System
Has N states, called s1, s2, ..., sN. There are discrete timesteps, t0, t1, ... On the t-th timestep the system is in exactly one of the available states. Call it qt. Note qt ∈ {s1, s2, ..., sN}.
[Figure: state diagram with states s1, s2, s3; N = 3, t = 0, qt = q0 = s3]
6. A Markov System
Has N states, called s1, s2, ..., sN. There are discrete timesteps, t0, t1, ... On the t-th timestep the system is in exactly one of the available states. Call it qt. Note qt ∈ {s1, s2, ..., sN}. Between each timestep, the next state is chosen at random.
[Figure: state diagram with states s1, s2, s3; N = 3, t = 1, qt = q1 = s2]
7. A Markov System
Has N states, called s1, s2, ..., sN. There are discrete timesteps, t0, t1, ... On the t-th timestep the system is in exactly one of the available states. Call it qt. Note qt ∈ {s1, s2, ..., sN}. The current state determines the probability distribution for the next state.
P(qt+1 = s1 | qt = s1) = 0,   P(qt+1 = s2 | qt = s1) = 0,   P(qt+1 = s3 | qt = s1) = 1
P(qt+1 = s1 | qt = s2) = 1/2, P(qt+1 = s2 | qt = s2) = 1/2, P(qt+1 = s3 | qt = s2) = 0
P(qt+1 = s1 | qt = s3) = 1/3, P(qt+1 = s2 | qt = s3) = 2/3, P(qt+1 = s3 | qt = s3) = 0
[Figure: state diagram with edges s1→s3 (1), s2→s1 (1/2), s2→s2 (1/2), s3→s1 (1/3), s3→s2 (2/3); N = 3, t = 1, qt = q1 = s2]
8. Markov Property
qt+1 is conditionally independent of qt-1, qt-2, ..., q1, q0 given qt. In other words:
P(qt+1 = sj | qt = si) = P(qt+1 = sj | qt = si, any earlier history)
The sequence of q is said to be a Markov chain, or to have the Markov property, if the next state depends only upon the current state and not on any past states.
[Figure: same state diagram and transition probabilities as on the previous slide; N = 3, t = 1, qt = q1 = s2]
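The three-state system above can be sketched as a short simulation. The transition probabilities come from the slide; the code itself, including the function name, is only an illustrative sketch:

```python
import random

# Transition matrix from the slide: rows and columns are s1, s2, s3,
# and A[i][j] = P(q_{t+1} = s_{j+1} | q_t = s_{i+1}).
A = [
    [0.0, 0.0, 1.0],   # from s1: always move to s3
    [0.5, 0.5, 0.0],   # from s2: move to s1 or stay, each with prob 1/2
    [1/3, 2/3, 0.0],   # from s3: s1 with prob 1/3, s2 with prob 2/3
]

def simulate(start, steps, seed=0):
    """Run the chain for `steps` timesteps and return the state indices."""
    rng = random.Random(seed)
    q, path = start, [start]
    for _ in range(steps):
        q = rng.choices(range(3), weights=A[q])[0]
        path.append(q)
    return path

# Start in s3 (index 2), matching the slide where q0 = s3.
path = simulate(start=2, steps=10)
print(" -> ".join(f"s{i + 1}" for i in path))
```

Because the next state is drawn only from the current row of A, the simulation respects the Markov property by construction.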
9. Transition Matrix
Question: What is the probability of a given sequence of states q1, q2, ..., qT?
[Figure: transition matrix A = {aij}, where aij = P(qt+1 = sj | qt = si)]
10. Example: A Simple Markov Model for Weather Prediction
- On any given day, the weather can be described as being in one of three states:
- State 1: snowy
- State 2: cloudy
- State 3: sunny
[Figure: transition matrix]
11. Question
- Given that the weather on day 1 (t = 1) is sunny (state 3), what is the probability that the weather for eight consecutive days is sun-sun-sun-rain-rain-sun-cloudy-sun?
- Solution:
- O = (sun, sun, sun, rain, rain, sun, cloudy, sun) = (3, 3, 3, 1, 1, 3, 2, 3)
- P(O | Model) = P(q1 = 3) · a33 · a33 · a31 · a11 · a13 · a32 · a23
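Numerically, the solution multiplies one transition probability per day. The slide's transition matrix figure is missing from this copy, so the values below are an assumption taken from Rabiner's classic weather example; only the method is from the slide:

```python
# Assumed transition probabilities (Rabiner's weather example); the
# slide's own matrix figure did not survive extraction.
# States: 1 = rain/snow, 2 = cloudy, 3 = sunny.
A = {
    (1, 1): 0.4, (1, 2): 0.3, (1, 3): 0.3,
    (2, 1): 0.2, (2, 2): 0.6, (2, 3): 0.2,
    (3, 1): 0.1, (3, 2): 0.1, (3, 3): 0.8,
}

O = [3, 3, 3, 1, 1, 3, 2, 3]   # sun sun sun rain rain sun cloudy sun

# Day 1 is given as sunny, so P(q1 = 3) = 1; multiply the transition
# probabilities along the remaining seven days.
p = 1.0
for prev, cur in zip(O, O[1:]):
    p *= A[(prev, cur)]
print(p)
```

With these assumed values the product is about 1.5 × 10⁻⁴; with a different matrix only the numbers change, not the method.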
12. From Markov to Hidden Markov
- The previous model assumes that each state can be uniquely associated with an observable event.
- Once an observation is made, the state of the system is then trivially retrieved.
- This model, however, is too restrictive to be of practical use for most realistic problems.
- To make the model more flexible, we will assume that the outcomes or observations of the model are a probabilistic function of each state.
- Each state can produce a number of outputs according to a probability distribution, and each distinct output can potentially be generated at any state.
- These are known as Hidden Markov Models (HMM), because the state sequence is not directly observable; it can only be approximated from the sequence of observations produced by the system.
13. Example: A Crazy Soft Drink Machine
- Suppose you have a crazy soft drink machine: it can be in two states, cola preferring (CP) and iced tea preferring (IP), but it switches between them randomly after each purchase, as shown below.
- Three possible outputs (observations): cola, iced tea, lemonade.
[Figure: state transition and output probabilities of the machine]
14. Question
- What is the probability of seeing the output sequence {lem, ice_t} if the machine always starts off in the cola preferring state?
- Solution:
- We need to consider all paths that might be taken through the HMM, and then sum over them. We know that the machine starts in state CP. There are then four possibilities to produce the observations:
- CP → CP → CP
- CP → CP → IP
- CP → IP → CP
- CP → IP → IP
- So the total probability is the sum of the probabilities of these four paths.
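The four-path sum can be checked numerically. The machine's parameter figure is missing from this copy, so the transition and output probabilities below are an assumption taken from Manning and Schütze's version of this example:

```python
# Assumed parameters (Manning & Schütze's crazy soft drink machine);
# the slide's figure with these numbers did not survive extraction.
trans = {("CP", "CP"): 0.7, ("CP", "IP"): 0.3,
         ("IP", "CP"): 0.5, ("IP", "IP"): 0.5}
emit = {"CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},
        "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2}}

obs = ["lem", "ice_t"]

# Enumerate the four state paths CP -> {CP, IP} -> {CP, IP}: the machine
# emits in the state it occupies, then transitions.
total = 0.0
for s2 in ("CP", "IP"):
    for s3 in ("CP", "IP"):
        total += (emit["CP"][obs[0]] * trans[("CP", s2)]
                  * emit[s2][obs[1]] * trans[(s2, s3)])
print(total)
```

With these assumed parameters the four paths sum to 0.084; note that the final transition always sums out to 1, so only the first two emissions and the first transition really matter.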
15. A Crazy Soft Drink Machine (Continued)
16. General Form of an HMM
- An HMM is specified by a five-tuple (S, V, Π, A, B):
- 1) S = {s1, ..., sN}: the set of hidden states, where N is the number of states and qt is the state at time t.
- 2) V = {v1, ..., vM}: the set of observation symbols, where M is the number of observation symbols.
- 3) Π = {πi}: the initial state distribution, πi = P(q1 = si).
- 4) A = {aij}: the state transition probability distribution, aij = P(qt+1 = sj | qt = si).
- 5) B = {bj(k)}: the observation symbol probability distribution in state j, bj(k) = P(ot = vk | qt = sj).
17. General Form of an HMM (Continued)
Two assumptions:
1. Markov assumption: P(qt+1 | q1, q2, ..., qt) = P(qt+1 | qt), where q1, q2, ..., qT represents the state sequence.
2. Output independence assumption: P(ot | o1, ..., ot-1, q1, ..., qt) = P(ot | qt), where o1, o2, ..., oT represents the output sequence.
18. Three Basic Problems in HMM
- 1. The Evaluation Problem: Given a model λ and a sequence of observations O = (o1, o2, ..., oT), what is the probability P(O | λ), i.e., the probability that the model generates the observations? How to evaluate an HMM? The Forward Algorithm.
- 2. The Decoding Problem: Given a model λ and a sequence of observations O, what is the most likely state sequence Q = (q1, q2, ..., qT) in the model that produces the observations? How to decode an HMM? The Viterbi Algorithm.
- 3. The Learning Problem: Given a model λ and a set of observations, how can we adjust the model parameters to maximize the joint probability P(O | λ)? How to train an HMM? The Baum-Welch Algorithm.
19. How to Evaluate an HMM: A Straightforward Method
- To calculate the probability (likelihood) P(O | λ) of the observation sequence O = (o1, o2, ..., oT), given the HMM λ, the most intuitive way is to sum up the probabilities of all possible state sequences Q:
P(O | λ) = Σ_Q P(O, Q | λ) = Σ_Q P(Q | λ) · P(O | Q, λ)
Applying the Markov assumption: P(Q | λ) = π_q1 · a_q1q2 · ... · a_q(T-1)qT
Applying the output independence assumption: P(O | Q, λ) = b_q1(o1) · b_q2(o2) · ... · b_qT(oT)
20. How to Evaluate an HMM: A Straightforward Method (Complexity)
For any given state sequence, we start from the initial state q1 with probability π_q1. We take a transition from qt to qt+1 with probability a_qtqt+1 and generate the observation ot with probability b_qt(ot), until we reach the last transition. Since there are N^T possible state sequences, each requiring about 2T multiplications, the complexity of the straightforward method is O(2T · N^T).
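The straightforward method can be written down directly by enumerating all N^T state sequences. The function and the toy two-state parameters below are illustrative, not from the slides; they only sketch the summation just described:

```python
from itertools import product

def brute_force_likelihood(pi, A, B, obs):
    """Sum P(O, Q | lambda) over every possible state sequence Q."""
    N, T = len(pi), len(obs)
    total = 0.0
    for q in product(range(N), repeat=T):       # all N**T state sequences
        p = pi[q[0]] * B[q[0]][obs[0]]          # pi_q1 * b_q1(o1)
        for t in range(1, T):
            p *= A[q[t - 1]][q[t]] * B[q[t]][obs[t]]
        total += p
    return total

# Toy 2-state, 3-symbol model (illustrative values only).
pi = [1.0, 0.0]
A = [[0.7, 0.3], [0.5, 0.5]]
B = [[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]]
obs = [2, 1]
print(brute_force_likelihood(pi, A, B, obs))
```

The exponential cost is visible in the `product(..., repeat=T)` loop: doubling T squares the number of sequences visited.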
21. How to Evaluate an HMM: The Forward Algorithm
- Define the forward probability αt(i) = P(o1 o2 ... ot, qt = si | λ): it is the probability that the HMM is in state si having generated the partial observation o1 ... ot. The computation is done in a time-synchronous fashion from left to right:
Initialization: α1(i) = πi · bi(o1)
Induction: αt+1(j) = bj(ot+1) · Σi αt(i) · aij
Termination: P(O | λ) = Σi αT(i)
22. How to Evaluate an HMM: The Forward Algorithm
It needs exactly N(N+1)(T-1)+N multiplications and N(N-1)(T-1) additions, so the complexity of this algorithm is O(N²T). For N = 5 and T = 100, we need about 3000 computations for the forward algorithm, versus about 10^72 computations for the straightforward method.
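The forward recursion can be sketched in a few lines. This is an illustrative implementation with made-up toy parameters, not code from the slides:

```python
def forward_likelihood(pi, A, B, obs):
    """Time-synchronous forward pass; alpha[i] = P(o_1..o_t, q_t = s_i)."""
    N = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]       # initialization
    for o in obs[1:]:                                      # induction
        alpha = [B[j][o] * sum(alpha[i] * A[i][j] for i in range(N))
                 for j in range(N)]
    return sum(alpha)                                      # termination

# Toy 2-state, 3-symbol model (illustrative values only).
pi = [1.0, 0.0]
A = [[0.7, 0.3], [0.5, 0.5]]
B = [[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]]
print(forward_likelihood(pi, A, B, [2, 1]))
```

Each timestep touches only the N states from the previous step, which is exactly where the O(N²T) bound comes from.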
23. How to Decode an HMM: The Viterbi Algorithm
- Instead of summing up probabilities from different paths coming to the same destination state, the Viterbi algorithm picks and remembers the best path.
- Define the best-path probability Vt(i) = max over q1, ..., qt-1 of P(q1 ... qt-1, qt = si, o1 ... ot | λ): it is the probability of the most likely state sequence at time t which has generated the observations o1 ... ot (until time t) and ends in state si.
24. How to Decode an HMM: The Viterbi Algorithm
The computation is done in a time-synchronous fashion from left to right. The complexity is also O(N²T).
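A compact sketch of the recursion with backpointers, again with illustrative toy parameters rather than anything from the slides:

```python
def viterbi(pi, A, B, obs):
    """Return the most likely state sequence and its probability."""
    N = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(N)]   # best-path probs
    backptr = []                                       # backpointers per step
    for o in obs[1:]:
        step, new_delta = [], []
        for j in range(N):
            best = max(range(N), key=lambda i: delta[i] * A[i][j])
            step.append(best)
            new_delta.append(delta[best] * A[best][j] * B[j][o])
        delta = new_delta
        backptr.append(step)
    # Trace the best path backwards from the best final state.
    q = max(range(N), key=lambda i: delta[i])
    path = [q]
    for step in reversed(backptr):
        q = step[q]
        path.append(q)
    return path[::-1], max(delta)

# Toy 2-state, 3-symbol model (illustrative values only).
pi = [1.0, 0.0]
A = [[0.7, 0.3], [0.5, 0.5]]
B = [[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]]
path, p = viterbi(pi, A, B, [2, 1])
print(path, p)
```

The structure mirrors the forward algorithm exactly, with `max` replacing the sum, which is why the complexity is the same O(N²T).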
25. HMM Training Using the Baum-Welch Algorithm
- A Hidden Markov Model is a probabilistic model of the joint probability of a collection of random variables O1, ..., OT, Q1, ..., QT. The Ot variables are discrete observations and the Qt variables are hidden discrete states. Under an HMM, the two conditional independence assumptions are:
- 1. The t-th hidden variable, given the (t-1)-st hidden variable, is independent of previous variables: P(Qt | Qt-1, Ot-1, ..., Q1, O1) = P(Qt | Qt-1).
- 2. The t-th observation depends only on the t-th state: P(Ot | Qt, Ot-1, ..., Q1, O1) = P(Ot | Qt).
- The EM algorithm for finding the MLE of the parameters of an HMM, given a set of observed feature vectors, is also known as the Baum-Welch algorithm.
- Qt is a discrete random variable with N possible values 1..N. We further assume that the underlying hidden Markov chain defined by P(Qt | Qt-1) is time-homogeneous (i.e., independent of the time t). Therefore, we can represent P(Qt | Qt-1) as a time-independent stochastic transition matrix A = {aij} = p(Qt = j | Qt-1 = i).
- The special case of time t = 1 is described by the initial state distribution πi = P(Q1 = i). We say that we are in state j at time t if Qt = j. A particular sequence of states is described by q = (q1, ..., qT), where qt ∈ {1, ..., N} is the state at time t.
- The observation is one of L possible observation symbols, Ot ∈ {o1, ..., oL}. The probability of a particular observation at a particular time t for state j is described by bj(ot) = p(Ot = ot | Qt = j). (B = {bij} is an L by N matrix.) A particular observation sequence O is described as O = (O1 = o1, ..., OT = oT).
26.
- Therefore, we can describe an HMM by λ = (A, B, π). Given an observation sequence O, the Baum-Welch algorithm finds λ* = argmax_λ P(O | λ), that is, the HMM λ that maximizes the probability of the observation O.
- The Baum-Welch algorithm:
- Initialization: set λ = (A, B, π) with random initial conditions. The algorithm updates the parameters of λ iteratively until convergence, following the procedure below.
- The forward procedure: We define αi(t) = p(O1 = o1, ..., Ot = ot, Qt = i | λ), which is the probability of seeing the partial sequence o1, ..., ot and ending up in state i at time t. We can efficiently calculate αi(t) recursively as αi(1) = πi · bi(o1) and αj(t+1) = bj(ot+1) · Σi αi(t) · aij.
- The backward procedure: We define βi(t) = p(Ot+1 = ot+1, ..., OT = oT | Qt = i, λ), which is the probability of the ending partial sequence ot+1, ..., oT given that we started at state i at time t. We can efficiently calculate βi(t) as βi(T) = 1 and βi(t) = Σj aij · bj(ot+1) · βj(t+1).
- Using α and β, we can calculate the following variables:
γi(t) = p(Qt = i | O, λ) = αi(t) · βi(t) / Σj αj(t) · βj(t)
ξij(t) = p(Qt = i, Qt+1 = j | O, λ) = αi(t) · aij · bj(ot+1) · βj(t+1) / P(O | λ)
27.
- Having γ and ξ, one can define the update rules as follows:
πi' = γi(1)
aij' = Σ_{t=1..T-1} ξij(t) / Σ_{t=1..T-1} γi(t)
bi'(k) = Σ_{t: ot = k} γi(t) / Σ_{t=1..T} γi(t)
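One re-estimation step can be sketched end to end. This is an illustrative single-sequence implementation of the procedure above (forward, backward, γ, ξ, then the updates), with B stored state-major for convenience; the function name and toy parameters are assumptions, not from the slides:

```python
def baum_welch_step(pi, A, B, obs):
    """One EM re-estimation step for a discrete HMM (sketch)."""
    N, T = len(pi), len(obs)
    # Forward pass: alpha[t][i] = p(o_1..o_t, Q_t = i)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, T):
        alpha.append([B[j][obs[t]] * sum(alpha[t-1][i] * A[i][j] for i in range(N))
                      for j in range(N)])
    # Backward pass: beta[t][i] = p(o_{t+1}..o_T | Q_t = i)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t+1]] * beta[t+1][j] for j in range(N))
    likelihood = sum(alpha[T-1])
    # E-step: gamma[t][i] = p(Q_t=i | O); xi[t][i][j] = p(Q_t=i, Q_{t+1}=j | O)
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(N)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t+1]] * beta[t+1][j] / likelihood
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # M-step: the update rules from the slide
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T-1)) / sum(gamma[t][i] for t in range(T-1))
              for j in range(N)] for i in range(N)]
    M = len(B[0])
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k)
              / sum(gamma[t][j] for t in range(T))
              for k in range(M)] for j in range(N)]
    return new_pi, new_A, new_B, likelihood

# Toy 2-state, 3-symbol model (illustrative values only).
pi = [0.5, 0.5]
A = [[0.7, 0.3], [0.5, 0.5]]
B = [[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]]
obs = [2, 1, 0, 0, 1]
for _ in range(3):
    pi, A, B, ll = baum_welch_step(pi, A, B, obs)
print(ll)
```

Because each step is an EM update, the likelihood returned on successive calls is non-decreasing, which is a handy convergence check in practice.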
28. Toolkits for HMM
- Hidden Markov Model Toolkit (HTK): http://htk.eng.cam.ac.uk/
- Hidden Markov Model (HMM) Toolbox for Matlab: http://www.cs.ubc.ca/~murphyk/Software/HMM/hmm.html