Lecture 8: Hidden Markov Models (HMMs)


About This Presentation


Description:

Originally presented at Yaakov Stein's DSPCSP Seminar, spring 2002. Modified by Benny Chor, using also some ... States Rainy:1, Cloudy:2, Sunny:3. Matrix A ...


Transcript and Presenter's Notes


1
Lecture 8: Hidden Markov Models (HMMs)
Prepared by

Michael Gutkin
Shlomi Haba

Originally presented at Yaakov Stein's DSPCSP
Seminar, spring 2002
Modified by Benny Chor, also using some slides of
Nir Friedman (Hebrew Univ.), for the
Computational Genomics Course, Tel-Aviv Univ.,
Dec. 2002
2
Outline

Discrete Markov Models
Hidden Markov Models
Three major questions:
Q1. Computing the probability of a given
observation.
A1. Forward-Backward dynamic programming (DP)
algorithm.
Q2. Computing the most probable sequence of
states, given an observation.
A2. Viterbi DP algorithm.
Q3. Given an observation, learn the best model.
A3. Expectation Maximization (EM, known for HMMs
as Baum-Welch): a heuristic.

3
Markov Models

A discrete (finite) system:
N distinct states.
Begins (at time t = 1) in some initial state.
At each time step (t = 1, 2, ...) the system moves
from the current state to the next state (possibly
the same as the current state) according to the
transition probabilities associated with the
current state.
This kind of system is called a Discrete Markov
Model.

4
Discrete Markov Model

Example: Discrete Markov Model with 5 states
Each $a_{ij}$ represents the probability of
moving from state i to state j.
The $a_{ij}$ are given in a matrix $A = \{a_{ij}\}$.
The probability of starting in a given state i is
$\pi_i$; the vector $\pi$ represents these
start probabilities.
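A minimal Python sketch of such a model, assuming states are numbered 0..N-1 (the slides number them from 1) and that `pi` and `A` are plain lists; the helper names `check_markov_model` and `sample_states` are hypothetical. It checks that $\pi$ and every row of A are probability vectors, then generates a state sequence by drawing each next state from the row of A that belongs to the current state.

```python
import math
import random

def check_markov_model(pi, A, tol=1e-9):
    """Raise ValueError unless pi and every row of A are probability vectors."""
    for row in [pi] + list(A):
        if any(p < 0 for p in row) or not math.isclose(sum(row), 1.0, abs_tol=tol):
            raise ValueError(f"not a probability vector: {row}")

def sample_states(pi, A, T, rng=random):
    """Draw Q_1 from pi, then each Q_{t+1} from row Q_t of A (0-based states)."""
    check_markov_model(pi, A)
    states = range(len(pi))
    q = rng.choices(states, weights=pi)[0]         # initial state ~ pi
    path = [q]
    for _ in range(T - 1):
        q = rng.choices(states, weights=A[q])[0]   # next state ~ row q of A
        path.append(q)
    return path

# Hypothetical deterministic cycle 0 -> 1 -> 2 -> 0, always starting in state 2.
print(sample_states([0.0, 0.0, 1.0], [[0, 1, 0], [0, 0, 1], [1, 0, 0]], 5))  # [2, 0, 1, 2, 0]
```

With non-degenerate probabilities the output is random; pass a seeded `random.Random` as `rng` to make runs reproducible.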

5
Types of Models

Ergodic model
Strongly connected: there is a directed path
with positive probabilities from each state i
to each state j (but the graph is not
necessarily a complete directed graph)
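Strong connectivity can be tested with a graph search over the transitions that have $a_{ij} > 0$. The sketch below (hypothetical helper names, 0-based states) runs a depth-first search from every state; the 3-state cycle in the usage lines is ergodic even though most entries of its A are zero, which is the "not necessarily complete" case.

```python
def reachable(A, start):
    """Set of states reachable from `start` along transitions with a_ij > 0."""
    seen, stack = {start}, [start]
    while stack:
        i = stack.pop()
        for j, a in enumerate(A[i]):
            if a > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen

def is_ergodic(A):
    """Strongly connected: every state can reach every state."""
    return all(len(reachable(A, s)) == len(A) for s in range(len(A)))

cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]          # strongly connected, not complete
chain = [[0.5, 0.5, 0], [0, 0.5, 0.5], [0, 0, 1]]  # state 2 is absorbing
print(is_ergodic(cycle), is_ergodic(chain))        # True False
```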

6
Types of Models (cont.)

Left-to-Right (LR) model
Index of state non-decreasing with time
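In terms of the matrix A, "index non-decreasing with time" means $a_{ij} = 0$ whenever $j < i$, i.e., A is upper triangular. A one-line check, as a sketch with a hypothetical helper name and 0-based states:

```python
def is_left_to_right(A):
    """True if a_ij == 0 for every j < i, so the state index can never decrease."""
    return all(A[i][j] == 0 for i in range(len(A)) for j in range(i))

lr = [[0.5, 0.5, 0.0],
      [0.0, 0.7, 0.3],
      [0.0, 0.0, 1.0]]                                # upper-triangular A
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]             # 2 -> 0 goes backwards
print(is_left_to_right(lr), is_left_to_right(cycle))  # True False
```

Note that a left-to-right model with more than one state is never ergodic: once the last state is reached, earlier states cannot be revisited.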

7
Discrete Markov Model - Example

States: Rainy = 1, Cloudy = 2, Sunny = 3
Matrix A
Problem: given that the weather on day 1 (t = 1)
is sunny (3), what is the probability of the
observation O?

8
Discrete Markov Model Example (cont.)

The answer is:
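The slide's matrix A and observation O are images that did not survive the transcript, so the sketch below uses hypothetical values. Because day 1 is given as sunny, $P(Q_1 = \text{Sunny}) = 1$ and the answer is just the product of the transition probabilities along O. States are 0-based here (0 = Rainy, 1 = Cloudy, 2 = Sunny); the helper name is hypothetical.

```python
def markov_sequence_prob(A, states, p_first=1.0):
    """P(Q_1 .. Q_T) for a discrete Markov chain, given P(Q_1) = p_first."""
    p = p_first
    for i, j in zip(states, states[1:]):
        p *= A[i][j]            # one factor a_ij per transition
    return p

# Hypothetical transition matrix; rows/columns: 0 = Rainy, 1 = Cloudy, 2 = Sunny.
A = [[0.4, 0.3, 0.3],
     [0.2, 0.6, 0.2],
     [0.1, 0.1, 0.8]]
# Hypothetical O: sunny, sunny, sunny, rainy, rainy, sunny, cloudy, sunny.
O = [2, 2, 2, 0, 0, 2, 1, 2]
print(markov_sequence_prob(A, O))   # ≈ 1.536e-04 (= 0.8 * 0.8 * 0.1 * 0.4 * 0.3 * 0.1 * 0.2)
```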

9
Hidden Markov Models (probabilistic
finite state automata)

Often we face scenarios where states cannot be
directly observed.
We need an extension: Hidden Markov Models.

$a_{ij}$ are state transition probabilities; $b_{ik}$ are
observation (output) probabilities.
Observed phenomenon
$b_{11} + b_{12} + b_{13} + b_{14} = 1$, $b_{21} + b_{22} + b_{23} + b_{24} = 1$, etc.
10
Example: Dishonest Casino
Actually, what is hidden in this model?
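What is hidden is which die (fair or loaded) produced each roll; only the faces are observed. The slide's parameters are not in the transcript, so the sketch below uses hypothetical ones (fair die uniform, loaded die showing a six half of the time, small switching probabilities). It samples the hidden die sequence from $\pi$ and A and, at each step, a face from that die's row of B; each row of B sums to 1, as required above.

```python
import random

PI = [0.5, 0.5]                   # hypothetical start probabilities (0 = fair, 1 = loaded)
A = [[0.95, 0.05],                # fair   -> fair, loaded
     [0.10, 0.90]]                # loaded -> fair, loaded
B = [[1 / 6] * 6,                 # fair die: faces 1..6 equally likely
     [0.1] * 5 + [0.5]]           # loaded die: a six half of the time

def sample_hmm(pi, A, B, T, rng=random):
    """Return (hidden state sequence, observed rolls) of length T."""
    states, rolls = [], []
    q = rng.choices(range(len(pi)), weights=pi)[0]
    for t in range(T):
        if t > 0:
            q = rng.choices(range(len(A)), weights=A[q])[0]   # maybe switch dice
        states.append(q)
        face = rng.choices(range(len(B[q])), weights=B[q])[0]
        rolls.append(face + 1)                                # faces are 1..6
    return states, rolls

states, rolls = sample_hmm(PI, A, B, 20, random.Random(0))
print(rolls)    # what the gambler sees
print(states)   # what is hidden: which die produced each roll
```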
11
Biological Example: CpG Islands

In the human genome, CpG dinucleotides are
relatively rare.
CpG pairs undergo a process called methylation
that modifies the C nucleotide.
A methylated C can (with relatively high
probability) mutate to a T.
Promoter regions are CpG-rich.
These regions are not methylated, and thus mutate
less often.
These are called CpG islands.

12
CpG Islands

We construct two Markov chains: one for CpG-rich
and one for CpG-poor regions.
Using observations from 60K nucleotides, we get
two models, + and −.
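One common way to build and use the two chains, sketched here under assumptions: each chain's transition probabilities are estimated by counting adjacent nucleotide pairs (with a pseudocount of 1, an assumption), and a sequence is scored by the log-likelihood ratio of the + and − models. The training strings below are hypothetical toy data, not the 60K-nucleotide data from the slide, and the helper names are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def train_chain(seqs, pseudocount=1.0):
    """Estimate a_st = P(next = t | current = s) by counting adjacent pairs."""
    counts = {s: defaultdict(float) for s in ALPHABET}
    for seq in seqs:
        for s, t in zip(seq, seq[1:]):
            counts[s][t] += 1
    A = {}
    for s in ALPHABET:
        total = sum(counts[s][t] + pseudocount for t in ALPHABET)
        A[s] = {t: (counts[s][t] + pseudocount) / total for t in ALPHABET}
    return A

def log_odds(seq, plus, minus):
    """Sum of log2(a+_st / a-_st) over adjacent pairs; > 0 favours the + model."""
    return sum(math.log2(plus[s][t] / minus[s][t]) for s, t in zip(seq, seq[1:]))

# Hypothetical toy training data.
plus = train_chain(["CGCGGCGCAACGCG", "GCGCGTACGCGC"])
minus = train_chain(["ATTATGCATAAT", "TTCATGAATTCA"])
print(log_odds("CGCGCG", plus, minus))   # positive: looks CpG-rich
print(log_odds("ATATAT", plus, minus))   # negative: looks CpG-poor
```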

13
HMM Question I

Given an observation sequence $O = (O_1 O_2 O_3 \cdots O_T)$
and a model $M = (A, B, \pi)$, how do we
efficiently compute $P(O \mid M)$, the probability that
the given model M produces the observation O in a
run of length T?
This probability can be viewed as a measure of
the quality of the model M. Viewed this way, it
enables discrimination/selection among
alternative models.

14
HMM Question II (Harder)

Given an observation sequence $O = (O_1 O_2 O_3 \cdots O_T)$
and a model $M = (A, B, \pi)$, how do we
efficiently compute the most probable sequence(s)
of states, Q?
That is, the sequence of states $Q = (Q_1 Q_2 Q_3 \cdots Q_T)$
that maximizes $P(O, Q \mid M) = P(O \mid Q, M)\, P(Q \mid M)$,
the probability that the given model M goes through
the specific sequence of states Q and produces the
given observation O (equivalently, Q maximizes
$P(Q \mid O, M)$).
Recall that given a model M, a sequence of
observations O, and a sequence of states Q, we
can efficiently compute $P(O \mid Q, M)$ (but we should
watch out for numeric underflow).
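The underflow warning is easy to reproduce: a product of T probabilities shrinks geometrically and falls below the smallest positive double long before T gets large. A minimal sketch, assuming 0-based states and symbols and hypothetical helper names, computes $\log P(O \mid Q, M)$ and $\log P(O, Q \mid M)$ as sums of logs instead; zero probabilities map to $-\infty$.

```python
import math

def _log(x):
    """log(x), with log(0) = -inf instead of an exception."""
    return math.log(x) if x > 0 else -math.inf

def log_p_obs_given_states(B, states, obs):
    """log P(O | Q, M) = sum_t log b_{Q_t O_t}."""
    return sum(_log(B[q][o]) for q, o in zip(states, obs))

def log_p_obs_and_states(pi, A, B, states, obs):
    """log P(O, Q | M) = log pi_{Q_1} + sum_t log a_{Q_{t-1} Q_t} + log P(O | Q, M)."""
    lp = _log(pi[states[0]])
    lp += sum(_log(A[i][j]) for i, j in zip(states, states[1:]))
    return lp + log_p_obs_given_states(B, states, obs)

# Hypothetical one-state model that emits symbol 0 with probability 0.1.
B = [[0.1, 0.9]]
T = 400
naive = 1.0
for _ in range(T):
    naive *= 0.1
print(naive)                                        # 0.0: the plain product underflowed
print(log_p_obs_given_states(B, [0] * T, [0] * T))  # about -921.03, still usable
```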

15
HMM Question III (Hardest)

Given an observation sequence $O = (O_1 O_2 O_3 \cdots O_T)$
and a class of models, each of the form
$M = (A, B, \pi)$, which specific model best
explains the observations?
A solution to Question I enables the efficient
computation of $P(O \mid M)$ (the probability that a
specific model M produces the observation O).
Question III can be viewed as a learning problem:
we want to use the sequence of observations in
order to train an HMM and learn the optimal
underlying model parameters (transition and output
probabilities).

16
HMM Recognition (Question I)

For a given model $M = (A, B, \pi)$ and a given
state sequence $Q_1 Q_2 Q_3 \cdots Q_T$, the probability
of an observation sequence $O_1 O_2 O_3 \cdots O_T$ is

$$P(O \mid Q, M) = b_{Q_1 O_1}\, b_{Q_2 O_2}\, b_{Q_3 O_3} \cdots b_{Q_T O_T}$$

For a given hidden Markov model $M = (A, B, \pi)$,
the probability of the state sequence $Q_1 Q_2 Q_3 \cdots Q_T$
is (the initial probability of $Q_1$ is taken to be
$\pi_{Q_1}$)

$$P(Q \mid M) = \pi_{Q_1}\, a_{Q_1 Q_2}\, a_{Q_2 Q_3}\, a_{Q_3 Q_4} \cdots a_{Q_{T-1} Q_T}$$

So, for a given hidden Markov model M, the
probability of an observation sequence
$O_1 O_2 O_3 \cdots O_T$ is obtained by summing over
all possible state sequences.

17
HMM Recognition (cont.)

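$$P(O \mid M) = \sum_{Q} P(O \mid Q, M)\, P(Q \mid M) = \sum_{Q} \pi_{Q_1} b_{Q_1 O_1}\, a_{Q_1 Q_2} b_{Q_2 O_2}\, a_{Q_2 Q_3} b_{Q_3 O_3} \cdots a_{Q_{T-1} Q_T} b_{Q_T O_T}$$

This requires summing over exponentially many paths,
but it can be made more efficient.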
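A direct, exponential-time implementation of the sum above, as a sketch for tiny models only (0-based states and symbols, hypothetical helper name and model values). It enumerates all $N^T$ state sequences with `itertools.product`. A useful sanity check: summing $P(O \mid M)$ over every possible observation sequence of a fixed length gives 1.

```python
import itertools

def likelihood_brute_force(pi, A, B, obs):
    """P(O | M) = sum over all N**T state sequences Q of P(O | Q, M) P(Q | M)."""
    total = 0.0
    for Q in itertools.product(range(len(pi)), repeat=len(obs)):
        p = pi[Q[0]] * B[Q[0]][obs[0]]                  # pi_{Q_1} b_{Q_1 O_1}
        for t in range(1, len(obs)):
            p *= A[Q[t - 1]][Q[t]] * B[Q[t]][obs[t]]    # a_{Q_{t-1} Q_t} b_{Q_t O_t}
        total += p
    return total

# Hypothetical 2-state, 2-symbol model.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
print(likelihood_brute_force(pi, A, B, [0, 1]))   # ≈ 0.2156 (4 paths summed)
```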

18
HMM Recognition (cont.)
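
Why isn't it efficient? It takes $O(2T \cdot |Q|^T)$
operations, where $|Q|$ is the number of states.
For a given state sequence of length T we have
about 2T calculations:

$$P(Q \mid M) = \pi_{Q_1}\, a_{Q_1 Q_2}\, a_{Q_2 Q_3}\, a_{Q_3 Q_4} \cdots a_{Q_{T-1} Q_T}$$

$$P(O \mid Q, M) = b_{Q_1 O_1}\, b_{Q_2 O_2}\, b_{Q_3 O_3} \cdots b_{Q_T O_T}$$

There are $|Q|^T$ possible state sequences.
So, if $|Q| = 5$ and $T = 100$, then the algorithm
requires about $2 \cdot 100 \cdot 5^{100} \approx 1.6 \cdot 10^{72}$ computations.
We can use the forward-backward (F-B) algorithm instead.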
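The arithmetic can be checked directly, and set against the usual estimate for the forward recursion, which is on the order of $N^2 T$ operations (each of T steps updates N states, each summing over N predecessors). Hypothetical helper names; N is the number of states, written $|Q|$ on the slide.

```python
def brute_force_ops(N, T):
    """About 2T multiplications for each of the N**T state sequences."""
    return 2 * T * N ** T

def forward_ops(N, T):
    """Order N**2 * T: each of T steps updates N states, each summing over N predecessors."""
    return N * N * T

print(f"{brute_force_ops(5, 100):.2e}")   # 1.58e+72
print(forward_ops(5, 100))               # 2500
```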
19
The F-B Algorithm

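Some definitions:
1. Legal final state: a state at which a path
through the model may end.
2. α: a forward-going variable, $\alpha_t(i) = P(O_1^t, Q_t = i \mid M)$.
3. β: a backward-going variable, $\beta_t(i) = P(O_{t+1} \cdots O_T \mid Q_t = i, M)$.
4. $a(j \mid i) = a_{ij}$, $b(O \mid i) = b_{iO}$.
5. $O_1^t$: the observation $O_1 O_2 \cdots O_t$ at times $1, 2, \ldots, t$
($O_1$ at $t = 1$, $O_2$ at $t = 2$, etc.)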
20
The F-B Algorithm (cont.)

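α can be calculated recursively.
Stopping condition: $\alpha_1(i) = \pi_i\, b(O_1 \mid i)$
Moving from state i to state j: $\alpha_t(i)\, a(j \mid i)\, b(O_{t+1} \mid j)$
But we can enter state j from all other states:
$$\alpha_{t+1}(j) = \Big[\sum_{i} \alpha_t(i)\, a(j \mid i)\Big]\, b(O_{t+1} \mid j)$$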

21
The F-B Algorithm (cont.)

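Now we can work sequentially, for $t = 1, 2, \ldots, T$.
And at time $t = T$ we get what we wanted:
$P(O \mid M) = \sum_{i} \alpha_T(i)$, where the sum runs over the legal final states i.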
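A minimal Python sketch of the forward pass described above, assuming 0-based states and symbols and hypothetical model values; `final_states` is a hypothetical parameter for restricting the final sum to legal final states (by default every state is legal). Because Python lists are 0-based, `alpha[t][j]` holds $\alpha_{t+1}(j)$.

```python
def forward(pi, A, B, obs, final_states=None):
    """Forward pass; returns (alpha, P(O | M)), where alpha[t][j] holds alpha_{t+1}(j)."""
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]           # alpha_1(i) = pi_i b(O_1 | i)
    for o in obs[1:]:
        prev = alpha[-1]
        # enter state j from every state i, then emit symbol o in state j
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    finals = range(N) if final_states is None else final_states  # legal final states
    return alpha, sum(alpha[-1][i] for i in finals)

# Hypothetical 2-state, 2-symbol model.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
alpha, p = forward(pi, A, B, [0, 1])
print(p)   # ≈ 0.2156, the same value as summing over all 4 paths explicitly
```

Each step costs $O(N^2)$, so the whole pass is $O(N^2 T)$ instead of exponential in T.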

22
The F-B Algorithm (cont.)

The full algorithm
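The slide with the full algorithm is an image that did not survive the transcript. As a hedged sketch of the backward half, assuming every state is a legal final state (so $\beta_T(i) = 1$), 0-based lists, and hypothetical helper names and model values: β is filled from the end, and $P(O \mid M) = \sum_i \pi_i\, b(O_1 \mid i)\, \beta_1(i)$ must match the forward result, which makes a useful consistency check.

```python
def backward(A, B, obs):
    """Backward pass; beta[t][i] holds beta_{t+1}(i) (0-based), last row all ones."""
    N, T = len(A), len(obs)
    beta = [[1.0] * N for _ in range(T)]            # beta_T(i) = 1 for every state i
    for t in range(T - 2, -1, -1):                  # fill rows from the end towards the start
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return beta

def likelihood_from_backward(pi, A, B, obs):
    """P(O | M) = sum_i pi_i b(O_1 | i) beta_1(i)."""
    beta = backward(A, B, obs)
    return sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(len(pi)))

# Hypothetical 2-state, 2-symbol model.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
print(likelihood_from_backward(pi, A, B, [0, 1]))   # ≈ 0.2156
```

Combining the two passes gives the posterior state probabilities $P(Q_t = i \mid O, M) = \alpha_t(i)\,\beta_t(i) / P(O \mid M)$.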

23
The F-B Algorithm (cont.)

The likelihood is measured using any sequence of
states of length T.
This is known as the Any Path Method.
We can also choose an HMM by the probability
generated using the best possible sequence of
states.
We'll refer to this method as the Best Path
Method.
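The difference between the two scores is sum versus max over the same set of paths. A brute-force sketch for a tiny hypothetical model (0-based states, hypothetical helper names) makes this explicit; the Best Path score is never larger than the Any Path score.

```python
import itertools

def path_prob(pi, A, B, Q, obs):
    """P(O, Q | M) for one state sequence Q."""
    p = pi[Q[0]] * B[Q[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[Q[t - 1]][Q[t]] * B[Q[t]][obs[t]]
    return p

def any_path_vs_best_path(pi, A, B, obs):
    """Any Path score = sum over all Q; Best Path score = max over all Q."""
    probs = {Q: path_prob(pi, A, B, Q, obs)
             for Q in itertools.product(range(len(pi)), repeat=len(obs))}
    best = max(probs, key=probs.get)
    return sum(probs.values()), best, probs[best]

pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
print(any_path_vs_best_path(pi, A, B, [0, 1]))   # about (0.2156, (0, 0), 0.105)
```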

24
Most Probable State Sequence (Question II)

Idea:
If we know the value of $Q_i$, then the most
probable sequence on $i+1, \ldots, T$ does not depend on
observations before time i.
Let $V_l(i)$ be the probability of the best sequence
$Q_1, \ldots, Q_i$ such that $Q_i = l$ (jointly with the
observations $O_1, \ldots, O_i$).
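Written out, the idea gives the recurrence below. This is a sketch in the notation of the earlier slides ($\pi$, $a_{kl}$, $b_{lO}$), assuming $V_l(i)$ includes the emissions $O_1, \ldots, O_i$:

$$V_l(1) = \pi_l\, b_{l O_1}, \qquad V_l(i+1) = b_{l O_{i+1}} \max_{k} \big[ V_k(i)\, a_{kl} \big]$$

$$\max_{Q} P(O, Q \mid M) = \max_{l} V_l(T)$$

where the last maximum runs over the legal final states l. Recording, for each $(i+1, l)$, the k that attains the maximum lets the best path be traced back from the final state.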

25
Viterbi Algorithm

A DP problem
Grid:
X: frame index, t (time)
Q: state index, i
Constraints:
Every path must advance in time by one, and only
one, time step for each path segment.
Final grid points on any path must be of the form
$(T, i_f)$, where $i_f$ is a legal final state in the
model.

26
Viterbi Algorithm (cont.)

Cost:
Node (t, i): the probability of emitting the
observation y(t) in state i, $b_{i\,y(t)}$
Transition from (t−1, i) to (t, j): the
probability of changing state from i to j, $a_{ij}$
The total cost associated with a path is given
by the product of the costs (type B).
Initial transition cost: $a_{0i} = \pi_i$
Goal:
The best path will be the one of maximum cost.
27
Viterbi Algorithm (cont.)