Title: Hidden Markov Models

1. Hidden Markov Models

2. Outline for our next topic
- Hidden Markov models: the theory
- Probabilistic interpretation of alignments using HMMs
- Later in the course: applications of HMMs to biological sequence modeling and discovery of features such as genes

3. Example: The Dishonest Casino Player
- A casino has two dice:
  - Fair die: P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
  - Loaded die: P(1) = P(2) = P(3) = P(4) = P(5) = 1/10, P(6) = 1/2
- Casino player switches between fair and loaded die randomly, on avg. once every 20 turns
- Game:
  - You bet $1
  - You roll (always with a fair die)
  - Casino player rolls (maybe with fair die, maybe with loaded die)
  - Highest number wins $2

4. Question 1: Evaluation
- GIVEN
  - A sequence of rolls by the casino player
  - 12455264621461461361366616646616366163661636165156
    15115146123562344
- QUESTION
  - How likely is this sequence, given our model of how the casino works?
- This is the EVALUATION problem in HMMs
- Prob = 1.3 × 10^-35

5. Question 2: Decoding
- GIVEN
  - A sequence of rolls by the casino player
  - 12455264621461461361366616646616366163661636165156
    15115146123562344
- QUESTION
  - What portion of the sequence was generated with the fair die, and what portion with the loaded die?
- This is the DECODING question in HMMs
- (Answer sketched on the slide: a FAIR stretch, then a LOADED stretch, then FAIR again)

6. Question 3: Learning
- GIVEN
  - A sequence of rolls by the casino player
  - 12455264621461461361366616646616366163661636165156
    15115146123562344
- QUESTION
  - How loaded is the loaded die? How fair is the fair die? How often does the casino player change from fair to loaded, and back?
- This is the LEARNING question in HMMs
- Prob(6) = 64%

7. The dishonest casino model
- Two states: FAIR and LOADED
- Transitions: P(Fair → Fair) = P(Loaded → Loaded) = 0.95; P(Fair → Loaded) = P(Loaded → Fair) = 0.05
- Emissions:
  - P(1|F) = P(2|F) = P(3|F) = P(4|F) = P(5|F) = P(6|F) = 1/6
  - P(1|L) = P(2|L) = P(3|L) = P(4|L) = P(5|L) = 1/10, P(6|L) = 1/2
- (A code encoding of this model follows below.)
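
For reference in the algorithm slides below, here is a minimal sketch of this model written out in plain Python. The names START, TRANSITIONS and EMISSIONS are ours, and the ½/½ start probabilities are the assumption used later in the worked examples, not something shown in the diagram.

```python
# A plain-Python encoding of the dishonest casino HMM (our own sketch).
# States: 'F' = fair die, 'L' = loaded die.

START = {'F': 0.5, 'L': 0.5}                 # a_0k (assumed 1/2 each, as in the examples below)

TRANSITIONS = {                              # a_kl = P(next state l | current state k)
    'F': {'F': 0.95, 'L': 0.05},
    'L': {'F': 0.05, 'L': 0.95},
}

EMISSIONS = {                                # e_k(b) = P(roll b | state k)
    'F': {b: 1 / 6 for b in '123456'},
    'L': {**{b: 1 / 10 for b in '12345'}, '6': 1 / 2},
}

# Each distribution should sum to 1.
for dist in (START, *TRANSITIONS.values(), *EMISSIONS.values()):
    assert abs(sum(dist.values()) - 1) < 1e-9
```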

8. A HMM is memory-less
- At each time step t, the only thing that affects future states is the current state π_t

9. Definition of a hidden Markov model
- Definition: A hidden Markov model (HMM) consists of
  - Alphabet Σ = { b_1, b_2, ..., b_M }
  - Set of states Q = { 1, ..., K }
  - Transition probabilities between any two states
    - a_ij = transition prob from state i to state j
    - a_i1 + ... + a_iK = 1, for all states i = 1...K
  - Start probabilities a_0i
    - a_01 + ... + a_0K = 1
  - Emission probabilities within each state
    - e_k(b) = P(x_i = b | π_i = k)
    - e_k(b_1) + ... + e_k(b_M) = 1, for all states k = 1...K
- (End probabilities a_i0, as in Durbin, are not needed here)

10. A HMM is memory-less
- At each time step t, the only thing that affects future states is the current state π_t
- P(π_t+1 = k | whatever happened so far)
  = P(π_t+1 = k | π_1, π_2, ..., π_t, x_1, x_2, ..., x_t)
  = P(π_t+1 = k | π_t)

11. A HMM is memory-less
- At each time step t, the only thing that affects x_t is the current state π_t
- P(x_t = b | whatever happened so far)
  = P(x_t = b | π_1, π_2, ..., π_t, x_1, x_2, ..., x_t-1)
  = P(x_t = b | π_t)

12. A parse of a sequence
- Given a sequence x = x_1...x_N,
- a parse of x is a sequence of states π = π_1, ..., π_N
- [Figure: one path through the K states (e.g., 1, 2, 2, ..., K), one state per emitted symbol x_1, x_2, x_3, ...]

13. Generating a sequence by the model
- Given a HMM, we can generate a sequence of length n as follows:
  - Start at state π_1 according to prob a_0,π1
  - Emit letter x_1 according to prob e_π1(x_1)
  - Go to state π_2 according to prob a_π1,π2
  - ... until emitting x_n (a code sketch follows below)
- [Figure: from the start state 0, move to some state (e.g., 2) with prob a_02, emit x_1 with prob e_2(x_1), and so on until x_n]
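
This generation procedure maps directly onto a few lines of Python. Below is a rough sketch (ours, not the course's) that samples rolls and the hidden parse from the casino model; `sample_categorical` and `generate` are hypothetical helper names.

```python
import random

START = {'F': 0.5, 'L': 0.5}
TRANSITIONS = {'F': {'F': 0.95, 'L': 0.05}, 'L': {'F': 0.05, 'L': 0.95}}
EMISSIONS = {'F': {b: 1 / 6 for b in '123456'},
             'L': {**{b: 1 / 10 for b in '12345'}, '6': 1 / 2}}

def sample_categorical(dist, rng):
    """Draw one outcome from a {outcome: probability} dict."""
    outcomes, weights = zip(*dist.items())
    return rng.choices(outcomes, weights=weights, k=1)[0]

def generate(n, rng):
    """Generate (x, pi): n emitted rolls and the hidden states that produced them."""
    x, pi = [], []
    state = sample_categorical(START, rng)                    # pi_1 ~ a_0k
    for _ in range(n):
        pi.append(state)
        x.append(sample_categorical(EMISSIONS[state], rng))   # x_i ~ e_{pi_i}(.)
        state = sample_categorical(TRANSITIONS[state], rng)   # pi_(i+1) ~ a_(pi_i, .)
    return ''.join(x), ''.join(pi)

rolls, states = generate(60, random.Random(0))
print(rolls)
print(states)
```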

14. Likelihood of a parse
- Given a sequence x = x_1...x_N and a parse π = π_1, ..., π_N,
- to find how likely this scenario is (given our HMM):
- P(x, π) = P(x_1, ..., x_N, π_1, ..., π_N)
  = P(x_N | π_N) P(π_N | π_N-1) ... P(x_2 | π_2) P(π_2 | π_1) P(x_1 | π_1) P(π_1)
  = a_0,π1 a_π1,π2 ... a_πN-1,πN e_π1(x_1) ... e_πN(x_N)
- A compact way to write a_0,π1 a_π1,π2 ... a_πN-1,πN e_π1(x_1) ... e_πN(x_N):
  - Enumerate all parameters a_ij and e_i(b); call them θ_1, ..., θ_n
    (example: a_0,Fair = θ_1, a_0,Loaded = θ_2, ..., e_Loaded(6) = θ_18)
  - Then, count in x and π the number of times each parameter j = 1, ..., n occurs:
    F(j, x, π) = # of times parameter θ_j occurs in (x, π)   (call F(·,·,·) the feature counts)
  - Then, P(x, π) = Π_j=1..n θ_j^F(j, x, π) = exp[ Σ_j=1..n log(θ_j) · F(j, x, π) ]
- (A code sketch for P(x, π) follows below.)
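
As a sanity check on the formula, here is a small sketch (not part of the slides) that multiplies out the start, transition and emission terms for a given parse; `joint_probability` is a name we made up.

```python
START = {'F': 0.5, 'L': 0.5}
TRANSITIONS = {'F': {'F': 0.95, 'L': 0.05}, 'L': {'F': 0.05, 'L': 0.95}}
EMISSIONS = {'F': {b: 1 / 6 for b in '123456'},
             'L': {**{b: 1 / 10 for b in '12345'}, '6': 1 / 2}}

def joint_probability(x, pi, start=START, trans=TRANSITIONS, emit=EMISSIONS):
    """P(x, pi) = a_{0,pi_1} * prod_{i>1} a_{pi_(i-1),pi_i} * prod_i e_{pi_i}(x_i)."""
    p = start[pi[0]] * emit[pi[0]][x[0]]
    for i in range(1, len(x)):
        p *= trans[pi[i - 1]][pi[i]] * emit[pi[i]][x[i]]
    return p

# The all-Fair parse of the 10-roll example on the next slide:
print(joint_probability('1215621524', 'F' * 10))    # ~5.2e-9
```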

15. Example: the dishonest casino
- Let the sequence of rolls be
  - x = 1, 2, 1, 5, 6, 2, 1, 5, 2, 4
- Then, what is the likelihood of
  - π = Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair?
- (say initial probs a_0,Fair = ½, a_0,Loaded = ½)
- ½ × P(1 | Fair) P(Fair | Fair) P(2 | Fair) P(Fair | Fair) ... P(4 | Fair)
  = ½ × (1/6)^10 × (0.95)^9 = 0.00000000521158647211 ≈ 5.2 × 10^-9

16. Example: the dishonest casino
- So, the likelihood that the die is fair all the way through this run is just 5.2 × 10^-9
- OK, but what is the likelihood of
  - π = Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded?
- ½ × P(1 | Loaded) P(Loaded | Loaded) ... P(4 | Loaded)
  = ½ × (1/10)^9 × (1/2)^1 × (0.95)^9 = 0.00000000015756235243 ≈ 0.16 × 10^-9
- Therefore, it is somewhat more likely that all the rolls are done with the fair die than that they are all done with the loaded die

17. Example: the dishonest casino
- Let the sequence of rolls be
  - x = 1, 6, 6, 5, 6, 2, 6, 6, 3, 6
- Now, what is the likelihood of π = F, F, ..., F?
  - ½ × (1/6)^10 × (0.95)^9 ≈ 5.2 × 10^-9, same as before
- What is the likelihood of
  - π = L, L, ..., L?
  - ½ × (1/10)^4 × (1/2)^6 × (0.95)^9 = 0.00000049238235134735 ≈ 4.9 × 10^-7
- So, it is about 100 times more likely that the die is loaded
- (A quick arithmetic check of both comparisons follows below.)
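
To double-check the arithmetic in the last three slides, a few lines of Python (our own check, using only the numbers above):

```python
# Likelihoods of the all-Fair vs. all-Loaded parses for the two example sequences.
fair_1 = 0.5 * (1 / 6) ** 10 * 0.95 ** 9                 # x = 1,2,1,5,6,2,1,5,2,4, all Fair
load_1 = 0.5 * (1 / 10) ** 9 * (1 / 2) * 0.95 ** 9       # same x, all Loaded
fair_2 = fair_1                                          # x = 1,6,6,5,6,2,6,6,3,6, all Fair (fair die gives 1/6 to every face)
load_2 = 0.5 * (1 / 10) ** 4 * (1 / 2) ** 6 * 0.95 ** 9  # same x, all Loaded

print(f"{fair_1:.2e} vs {load_1:.2e}: fair parse {fair_1 / load_1:.0f}x more likely")
print(f"{fair_2:.2e} vs {load_2:.2e}: loaded parse {load_2 / fair_2:.0f}x more likely")
```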

18. Question 1: Evaluation
- GIVEN
  - A sequence of rolls by the casino player
  - 12455264621461461361366616646616366163661636165156
    15115146123562344
- QUESTION
  - How likely is this sequence, given our model of how the casino works?
- This is the EVALUATION problem in HMMs
- Prob = 1.3 × 10^-35

19. Question 2: Decoding
- GIVEN
  - A sequence of rolls by the casino player
  - 12455264621461461361366616646616366163661636165156
    15115146123562344
- QUESTION
  - What portion of the sequence was generated with the fair die, and what portion with the loaded die?
- This is the DECODING question in HMMs
- (Answer sketched on the slide: a FAIR stretch, then a LOADED stretch, then FAIR again)

20. Question 3: Learning
- GIVEN
  - A sequence of rolls by the casino player
  - 12455264621461461361366616646616366163661636165156
    15115146123562344
- QUESTION
  - How loaded is the loaded die? How fair is the fair die? How often does the casino player change from fair to loaded, and back?
- This is the LEARNING question in HMMs
- Prob(6) = 64%

21. The three main questions on HMMs
- Evaluation
  - GIVEN a HMM M, and a sequence x,
  - FIND Prob[ x | M ]
- Decoding
  - GIVEN a HMM M, and a sequence x,
  - FIND the sequence π of states that maximizes P[ x, π | M ]
- Learning
  - GIVEN a HMM M, with unspecified transition/emission probs., and a sequence x,
  - FIND parameters θ = (e_i(.), a_ij) that maximize P[ x | θ ]

22. Let's not be confused by notation
- P[ x | M ]: the probability that sequence x was generated by the model
- The model is: architecture (states, etc.) + parameters θ = (a_ij, e_i(.))
- So, P[ x | M ] is the same as P[ x | θ ], and as P[ x ], when the architecture and the parameters, respectively, are implied
- Similarly, P[ x, π | M ], P[ x, π | θ ] and P[ x, π ] are the same when the architecture and the parameters are implied
- In the LEARNING problem we always write P[ x | θ ] to emphasize that we are seeking the θ that maximizes P[ x | θ ]

23. Problem 1: Decoding
- Find the most likely parse of a sequence

24. Decoding
- GIVEN x = x_1 x_2 ... x_N,
- find π = π_1, ..., π_N to maximize P[ x, π ]:
  - π* = argmax_π P[ x, π ]
- Maximizes a_0,π1 e_π1(x_1) a_π1,π2 ... a_πN-1,πN e_πN(x_N)
- Dynamic programming!
  - V_k(i) = max_{π1...πi-1} P[ x_1...x_i-1, π_1, ..., π_i-1, x_i, π_i = k ]
  - = prob. of the most likely sequence of states ending at state π_i = k
- [Figure: the K × N trellis of states over positions x_1, x_2, x_3, ...; given that we end up in state k at step i, maximize the product to the left and to the right]

25. Decoding: main idea
- Inductive assumption: given that, for all states k and for a fixed position i,
  - V_k(i) = max_{π1...πi-1} P[ x_1...x_i-1, π_1, ..., π_i-1, x_i, π_i = k ]
- What is V_l(i+1)?
- From the definition,
  V_l(i+1) = max_{π1...πi} P[ x_1...x_i, π_1, ..., π_i, x_i+1, π_i+1 = l ]
  = max_{π1...πi} P(x_i+1, π_i+1 = l | x_1...x_i, π_1, ..., π_i) P[ x_1...x_i, π_1, ..., π_i ]
  = max_{π1...πi} P(x_i+1, π_i+1 = l | π_i) P[ x_1...x_i-1, π_1, ..., π_i-1, x_i, π_i ]
  = max_k P(x_i+1, π_i+1 = l | π_i = k) max_{π1...πi-1} P[ x_1...x_i-1, π_1, ..., π_i-1, x_i, π_i = k ]
  = max_k P(x_i+1 | π_i+1 = l) P(π_i+1 = l | π_i = k) V_k(i)
  = e_l(x_i+1) max_k a_kl V_k(i)

26. The Viterbi Algorithm
- Input: x = x_1...x_N
- Initialization:
  - V_0(0) = 1   (0 is the imaginary first position)
  - V_k(0) = 0, for all k > 0
- Iteration:
  - V_j(i) = e_j(x_i) × max_k a_kj V_k(i - 1)
  - Ptr_j(i) = argmax_k a_kj V_k(i - 1)
- Termination:
  - P(x, π*) = max_k V_k(N)
- Traceback:
  - π*_N = argmax_k V_k(N)
  - π*_(i-1) = Ptr_(π*_i)(i)
- (A code sketch follows below.)
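
Here is a rough Python sketch of the recurrence above (our own illustration, not the course's reference code), kept in probability space exactly as written on the slide; for long sequences you would switch to the log-space version discussed two slides below.

```python
def viterbi(x, start, trans, emit):
    """Viterbi: V_j(i) = e_j(x_i) * max_k a_kj V_k(i-1), with traceback pointers."""
    states = list(trans)
    V = [{k: start[k] * emit[k][x[0]] for k in states}]        # V_k(1) = a_0k e_k(x_1)
    ptr = [{}]
    for i in range(1, len(x)):
        V.append({})
        ptr.append({})
        for j in states:
            best_k = max(states, key=lambda k: V[i - 1][k] * trans[k][j])
            ptr[i][j] = best_k
            V[i][j] = emit[j][x[i]] * V[i - 1][best_k] * trans[best_k][j]
    # Termination and traceback.
    last = max(states, key=lambda k: V[-1][k])
    path = [last]
    for i in range(len(x) - 1, 0, -1):
        path.append(ptr[i][path[-1]])
    return ''.join(reversed(path)), V[-1][last]

START = {'F': 0.5, 'L': 0.5}
TRANSITIONS = {'F': {'F': 0.95, 'L': 0.05}, 'L': {'F': 0.05, 'L': 0.95}}
EMISSIONS = {'F': {b: 1 / 6 for b in '123456'},
             'L': {**{b: 1 / 10 for b in '12345'}, '6': 1 / 2}}

path, p = viterbi('123412316261636461623411221341', START, TRANSITIONS, EMISSIONS)
print(path, p)   # expected: all 'F' (cf. the "couple of questions" slide further below)
```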

27. The Viterbi Algorithm
- [Figure: the dynamic programming matrix V_j(i), with states 1...K as rows and x_1 x_2 x_3 ... x_N as columns]
- Similar to aligning a set of states to a sequence
- Time: O(K^2 N)
- Space: O(KN)

28. Viterbi Algorithm: a practical detail
- Underflows are a significant problem
  - P[ x_1, ..., x_i, π_1, ..., π_i ] = a_0,π1 a_π1,π2 ... a_πi-1,πi e_π1(x_1) ... e_πi(x_i)
  - These numbers become extremely small: underflow
- Solution: take the logs of all values
  - V_l(i) = log e_l(x_i) + max_k [ V_k(i-1) + log a_kl ]
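
In code, only the recurrence changes: store log V and replace products with sums. A minimal sketch (assuming the same model-dictionary format as the earlier sketches, and no zero-probability transitions):

```python
from math import log

def viterbi_log(x, start, trans, emit):
    """Log-space Viterbi score: V_l(i) = log e_l(x_i) + max_k [ V_k(i-1) + log a_kl ]."""
    states = list(trans)
    V = {k: log(start[k]) + log(emit[k][x[0]]) for k in states}
    for ch in x[1:]:
        V = {l: log(emit[l][ch]) + max(V[k] + log(trans[k][l]) for k in states)
             for l in states}
    return max(V.values())   # log P(x, pi*); keep pointers as before to recover pi* itself
```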

29. Example
- Let x be a long sequence with a portion of 1/6 6s, followed by a portion of ½ 6s
  - x = 12345612345612345 66263646561626364656
- Then, it is not hard to show that the optimal parse is (exercise):
  - FFF...F LLL...L
- Six characters "123456" parsed as F contribute 0.95^6 × (1/6)^6 = 1.6 × 10^-5
  - parsed as L, they contribute 0.95^6 × (1/2)^1 × (1/10)^5 = 0.4 × 10^-5
- Six characters "162636" parsed as F contribute 0.95^6 × (1/6)^6 = 1.6 × 10^-5
  - parsed as L, they contribute 0.95^6 × (1/2)^3 × (1/10)^3 = 9.0 × 10^-5

30. Problem 2: Evaluation
- Find the likelihood a sequence is generated by the model

31. Generating a sequence by the model
- Given a HMM, we can generate a sequence of length n as follows:
  - Start at state π_1 according to prob a_0,π1
  - Emit letter x_1 according to prob e_π1(x_1)
  - Go to state π_2 according to prob a_π1,π2
  - ... until emitting x_n
- [Figure: from the start state 0, move to some state (e.g., 2) with prob a_02, emit x_1 with prob e_2(x_1), and so on until x_n]

32. A couple of questions
- Given a sequence x,
  - What is the probability that x was generated by the model?
  - Given a position i, what is the most likely state that emitted x_i?
- Example: the dishonest casino
  - Say x = 123412316261636461623411221341
  - Most likely path: π = FFF...F
    (too unlikely to transition F → L → F)
  - However, the marked letters (the stretch of eleven rolls 62616364616, rich in 6s) are more likely to be L than the unmarked letters are
- Callout on the slide, comparing the two parses of the marked stretch:
  - P(stretch as FFFFFFFFFFF) = (1/6)^11 × 0.95^12 = 2.76 × 10^-9 × 0.54 = 1.49 × 10^-9
  - P(stretch as LLLLLLLLLLL) = (1/2)^6 × (1/10)^5 × 0.95^10 × 0.05^2 = 1.56 × 10^-7 × 1.5 × 10^-3 = 0.23 × 10^-9

33. Evaluation
- We will develop algorithms that allow us to compute:
  - P(x): probability of x given the model
  - P(x_i...x_j): probability of a substring of x given the model
  - P(π_i = k | x): posterior probability that the i-th state is k, given x
    (a more refined measure of which states x may be in)

34. The Forward Algorithm
- We want to calculate
  - P(x) = probability of x, given the HMM
- Sum over all possible ways of generating x:
  - P(x) = Σ_π P(x, π) = Σ_π P(x | π) P(π)
- To avoid summing over an exponential number of paths π, define
  - f_k(i) = P(x_1...x_i, π_i = k)   (the "forward probability")
  - = probability of generating the first i characters of x and ending up in state k

35. The Forward Algorithm: derivation
- Define the forward probability:
  f_k(i) = P(x_1...x_i, π_i = k)
  = Σ_{π1...πi-1} P(x_1...x_i-1, π_1, ..., π_i-1, π_i = k) e_k(x_i)
  = Σ_l Σ_{π1...πi-2} P(x_1...x_i-1, π_1, ..., π_i-2, π_i-1 = l) a_lk e_k(x_i)
  = Σ_l P(x_1...x_i-1, π_i-1 = l) a_lk e_k(x_i)
  = e_k(x_i) Σ_l f_l(i - 1) a_lk

36. The Forward Algorithm
- We can compute f_k(i) for all k, i, using dynamic programming!
- Initialization:
  - f_0(0) = 1
  - f_k(0) = 0, for all k > 0
- Iteration:
  - f_k(i) = e_k(x_i) Σ_l f_l(i - 1) a_lk
- Termination:
  - P(x) = Σ_k f_k(N)
- (A code sketch follows below.)
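
A small Python sketch of the recurrence (ours; the imaginary position 0 is folded into the first column, so f_k(1) = a_0k e_k(x_1)):

```python
def forward(x, start, trans, emit):
    """Forward probabilities f_k(i), plus the total P(x) = sum_k f_k(N)."""
    states = list(trans)
    f = [{k: start[k] * emit[k][x[0]] for k in states}]            # f_k(1) = a_0k e_k(x_1)
    for ch in x[1:]:
        prev = f[-1]
        f.append({k: emit[k][ch] * sum(prev[l] * trans[l][k] for l in states)
                  for k in states})
    return f, sum(f[-1].values())

START = {'F': 0.5, 'L': 0.5}
TRANSITIONS = {'F': {'F': 0.95, 'L': 0.05}, 'L': {'F': 0.05, 'L': 0.95}}
EMISSIONS = {'F': {b: 1 / 6 for b in '123456'},
             'L': {**{b: 1 / 10 for b in '12345'}, '6': 1 / 2}}

_, px = forward('1215621524', START, TRANSITIONS, EMISSIONS)
print(px)    # P(x): a sum over all 2^10 parses of this 10-roll sequence
```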

37. Relation between Forward and Viterbi
- VITERBI
  - Initialization: V_0(0) = 1; V_k(0) = 0, for all k > 0
  - Iteration: V_j(i) = e_j(x_i) max_k V_k(i - 1) a_kj
  - Termination: P(x, π*) = max_k V_k(N)
- FORWARD
  - Initialization: f_0(0) = 1; f_k(0) = 0, for all k > 0
  - Iteration: f_l(i) = e_l(x_i) Σ_k f_k(i - 1) a_kl
  - Termination: P(x) = Σ_k f_k(N)

38. Motivation for the Backward Algorithm
- We want to compute
  - P(π_i = k | x),
  - the probability distribution on the i-th position, given x
- We start by computing
  P(π_i = k, x) = P(x_1...x_i, π_i = k, x_i+1...x_N)
  = P(x_1...x_i, π_i = k) P(x_i+1...x_N | x_1...x_i, π_i = k)
  = P(x_1...x_i, π_i = k) P(x_i+1...x_N | π_i = k)
- The first factor is the forward probability f_k(i); the second is the backward probability b_k(i)
- Then, P(π_i = k | x) = P(π_i = k, x) / P(x)

39. The Backward Algorithm: derivation
- Define the backward probability:
  b_k(i) = P(x_i+1...x_N | π_i = k)   (starting from the i-th state being k, generate the rest of x)
  = Σ_{πi+1...πN} P(x_i+1, x_i+2, ..., x_N, π_i+1, ..., π_N | π_i = k)
  = Σ_l Σ_{πi+2...πN} P(x_i+1, x_i+2, ..., x_N, π_i+1 = l, π_i+2, ..., π_N | π_i = k)
  = Σ_l e_l(x_i+1) a_kl Σ_{πi+2...πN} P(x_i+2, ..., x_N, π_i+2, ..., π_N | π_i+1 = l)
  = Σ_l e_l(x_i+1) a_kl b_l(i+1)

40. The Backward Algorithm
- We can compute b_k(i) for all k, i, using dynamic programming
- Initialization:
  - b_k(N) = 1, for all k
- Iteration:
  - b_k(i) = Σ_l e_l(x_i+1) a_kl b_l(i+1)
- Termination:
  - P(x) = Σ_l a_0l e_l(x_1) b_l(1)
- (A code sketch follows below.)
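
A matching sketch of the backward recurrence (ours, using the same 0-based conventions as the forward sketch above):

```python
def backward(x, start, trans, emit):
    """Backward probabilities b_k(i); also returns P(x) = sum_l a_0l e_l(x_1) b_l(1)."""
    states = list(trans)
    N = len(x)
    b = [dict() for _ in range(N)]
    b[N - 1] = {k: 1.0 for k in states}                           # b_k(N) = 1
    for i in range(N - 2, -1, -1):
        b[i] = {k: sum(emit[l][x[i + 1]] * trans[k][l] * b[i + 1][l] for l in states)
                for k in states}
    px = sum(start[l] * emit[l][x[0]] * b[0][l] for l in states)
    return b, px

START = {'F': 0.5, 'L': 0.5}
TRANSITIONS = {'F': {'F': 0.95, 'L': 0.05}, 'L': {'F': 0.05, 'L': 0.95}}
EMISSIONS = {'F': {b: 1 / 6 for b in '123456'},
             'L': {**{b: 1 / 10 for b in '12345'}, '6': 1 / 2}}

_, px = backward('1215621524', START, TRANSITIONS, EMISSIONS)
print(px)    # should match the Forward value of P(x) on the same sequence
```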

41. Computational Complexity
- What is the running time, and space required, for Forward and Backward?
  - Time: O(K^2 N)
  - Space: O(KN)
- Useful implementation techniques to avoid underflows:
  - Viterbi: sum of logs
  - Forward/Backward: rescaling every few positions by multiplying by a constant
- (A rescaling sketch follows below.)
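
One common way to implement the rescaling trick for Forward (our sketch, not prescribed by the slides): normalize each column by its sum c_i and accumulate Σ_i log c_i, which equals log P(x).

```python
from math import log

START = {'F': 0.5, 'L': 0.5}
TRANSITIONS = {'F': {'F': 0.95, 'L': 0.05}, 'L': {'F': 0.05, 'L': 0.95}}
EMISSIONS = {'F': {b: 1 / 6 for b in '123456'},
             'L': {**{b: 1 / 10 for b in '12345'}, '6': 1 / 2}}

def forward_scaled(x, start=START, trans=TRANSITIONS, emit=EMISSIONS):
    """Scaled Forward: returns log P(x); each column is renormalized to sum to 1."""
    states = list(trans)
    col = {k: start[k] * emit[k][x[0]] for k in states}
    log_px = 0.0
    for i in range(len(x)):
        if i > 0:
            prev = col
            col = {k: emit[k][x[i]] * sum(prev[l] * trans[l][k] for l in states)
                   for k in states}
        c = sum(col.values())                      # scaling constant c_i
        col = {k: v / c for k, v in col.items()}   # keep the column in a safe numeric range
        log_px += log(c)
    return log_px

# 1,200 rolls: the unscaled forward value would underflow to 0.0, but log P(x) is fine.
print(forward_scaled('126364' * 200))
```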

42. Posterior Decoding
- We can now calculate
  - P(π_i = k | x) = f_k(i) b_k(i) / P(x)
- Derivation:
  P(π_i = k | x) = P(π_i = k, x) / P(x)
  = P(x_1, ..., x_i, π_i = k, x_i+1, ..., x_N) / P(x)
  = P(x_1, ..., x_i, π_i = k) P(x_i+1, ..., x_N | π_i = k) / P(x)
  = f_k(i) b_k(i) / P(x)
- Then, we can ask:
  - What is the most likely state at position i of sequence x?
- Define π^ by Posterior Decoding:
  - π^_i = argmax_k P(π_i = k | x)
- (A code sketch follows below.)
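
Putting the Forward and Backward sketches together gives a posterior decoder (again our own illustration; for long sequences one would use the scaled or log-space variants):

```python
def posterior_decode(x, start, trans, emit):
    """pi^_i = argmax_k P(pi_i = k | x), where P(pi_i = k | x) = f_k(i) b_k(i) / P(x)."""
    states = list(trans)
    N = len(x)
    # Forward pass.
    f = [{k: start[k] * emit[k][x[0]] for k in states}]
    for ch in x[1:]:
        f.append({k: emit[k][ch] * sum(f[-1][l] * trans[l][k] for l in states)
                  for k in states})
    px = sum(f[-1].values())
    # Backward pass.
    b = [{k: 1.0 for k in states} for _ in range(N)]
    for i in range(N - 2, -1, -1):
        b[i] = {k: sum(emit[l][x[i + 1]] * trans[k][l] * b[i + 1][l] for l in states)
                for k in states}
    # Posterior per position, and its argmax.
    post = [{k: f[i][k] * b[i][k] / px for k in states} for i in range(N)]
    return [max(states, key=post[i].get) for i in range(N)], post

START = {'F': 0.5, 'L': 0.5}
TRANSITIONS = {'F': {'F': 0.95, 'L': 0.05}, 'L': {'F': 0.05, 'L': 0.95}}
EMISSIONS = {'F': {b: 1 / 6 for b in '123456'},
             'L': {**{b: 1 / 10 for b in '12345'}, '6': 1 / 2}}

path, post = posterior_decode('123412316261636461623411221341', START, TRANSITIONS, EMISSIONS)
print(''.join(path))
print([round(p['L'], 2) for p in post])   # P(Loaded | x) per position: higher in the 6-rich stretch
```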

43. Posterior Decoding
- For each state, Posterior Decoding gives us a curve of the likelihood of that state at each position
- That is sometimes more informative than the Viterbi path π*
- Posterior Decoding may give an invalid sequence of states (of prob 0)
  - Why?

44. Posterior Decoding
- [Figure: the matrix of posteriors P(π_i = l | x), with states 1...K as rows and x_1 x_2 x_3 ... x_N as columns]
- P(π_i = k | x) = Σ_π P(π | x) 1(π_i = k)
  = Σ_{π : πi = k} P(π | x)
- where 1(A) = 1 if A is true, 0 otherwise

45. Viterbi, Forward, Backward
- VITERBI
  - Initialization: V_0(0) = 1; V_k(0) = 0, for all k > 0
  - Iteration: V_l(i) = e_l(x_i) max_k V_k(i-1) a_kl
  - Termination: P(x, π*) = max_k V_k(N)
- FORWARD
  - Initialization: f_0(0) = 1; f_k(0) = 0, for all k > 0
  - Iteration: f_l(i) = e_l(x_i) Σ_k f_k(i-1) a_kl
  - Termination: P(x) = Σ_k f_k(N)
- BACKWARD
  - Initialization: b_k(N) = 1, for all k
  - Iteration: b_l(i) = Σ_k e_k(x_i+1) a_lk b_k(i+1)
  - Termination: P(x) = Σ_k a_0k e_k(x_1) b_k(1)