Title: Hidden Markov Model
1. Hidden Markov Model
2. Example: CpG Island
We consider two questions (and some variants):
Question 1: Given a short stretch of genomic data, does it come from a CpG island?
Question 2: Given a long piece of genomic data, does it contain CpG islands, and if so, where and of what length?
We solve the first question by modeling strings with and without CpG islands as Markov chains over the same states A, C, G, T but with different transition probabilities.
3. CpG Island: Question 2
Now we solve the second question.
Question 2: Given a long piece of genomic data, does it contain CpG islands, and where?
For this, we need to decide which parts of a given long sequence of letters are more likely to come from the CpG-island ('+') model, and which parts are more likely to come from the background ('-') model. This is done by using the Hidden Markov Model, to be defined.
4. Question 2: Finding CpG Islands
Given a long genomic string with possible CpG islands, we define a Markov chain over 8 states, all interconnected (hence it is ergodic):
A+, C+, G+, T+ and A-, C-, G-, T-
The problem is that we don't know the sequence of states which are traversed, but just the sequence of letters.
Therefore we use here a Hidden Markov Model.
5. Hidden Markov Models (HMM)
[Figure: a chain of hidden variables, each emitting an observed data symbol]
6. Hidden Markov Model
A Markov chain (s1, ..., sL), and for each state s and each symbol x we have the emission probability p(Xi = x | Si = s).
Application in communication: the message sent is (s1, ..., sm) but we receive (x1, ..., xm). Compute the most likely message sent.
Application in speech recognition: the word said is (s1, ..., sm) but we recorded (x1, ..., xm). Compute the most likely word said.
7. Hidden Markov Model
Notations:
Markov chain transition probabilities: p(Si+1 = t | Si = s) = ast
Emission probabilities: p(Xi = b | Si = s) = es(b)
8. Example: The Dishonest Casino
- A casino has two dice:
- Fair die:
- P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
- Loaded die:
- P(1) = P(2) = P(3) = P(4) = P(5) = 1/10
- P(6) = 1/2
- Casino player switches back and forth between fair and loaded die once every 20 turns
- Game:
- You bet $1
- You roll (always with a fair die)
- Casino player rolls (maybe with fair die, maybe with loaded die)
- Highest number wins $2
9. Question 1: Decoding
- GIVEN
- A sequence of rolls by the casino player
- 1245526462146146136136661664661636616366163616515615115146123562344
- QUESTION
- What portion of the sequence was generated with the fair die, and what portion with the loaded die?
- This is the DECODING question in HMMs
10. Question 2: Evaluation
- GIVEN
- A sequence of rolls by the casino player
- 1245526462146146136136661664661636616366163616515615115146123562344
- QUESTION
- How likely is this sequence, given our model of how the casino works?
- This is the EVALUATION problem in HMMs
11. Question 3: Learning
- GIVEN
- A sequence of rolls by the casino player
- 1245526462146146136136661664661636616366163616515615115146123562344
- QUESTION
- How loaded is the loaded die? How fair is the fair die? How often does the casino player change from fair to loaded, and back?
- This is the LEARNING question in HMMs
12. The dishonest casino model
[Figure: two states, FAIR and LOADED; each stays in place with probability 0.95 and switches to the other with probability 0.05]
P(1|F) = P(2|F) = P(3|F) = P(4|F) = P(5|F) = P(6|F) = 1/6
P(1|L) = P(2|L) = P(3|L) = P(4|L) = P(5|L) = 1/10, P(6|L) = 1/2
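As a concrete illustration of this model (and of the ast / es(b) notation from slide 7), here is a minimal sketch in Python; the variable names are chosen for this example only and are not part of the slides.

```python
# Dishonest-casino HMM parameters (a sketch; names are illustrative).
STATES = ["F", "L"]  # Fair, Loaded

# Transition probabilities a_st = P(S_{i+1} = t | S_i = s)
A = {
    "F": {"F": 0.95, "L": 0.05},
    "L": {"F": 0.05, "L": 0.95},
}

# Emission probabilities e_s(b) = P(X_i = b | S_i = s), b = die face 1..6
E = {
    "F": {b: 1 / 6 for b in range(1, 7)},
    "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5},
}

# Initial probabilities a_0s (the later slides assume 1/2 each)
A0 = {"F": 0.5, "L": 0.5}
```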
13. A parse of a sequence
- Given a sequence x = x1...xN,
- a parse of x is a sequence of states π = π1, ..., πN
[Figure: trellis of hidden states 1, 2, ..., K over positions, emitting x1, x2, x3, ...]
14. Likelihood of a parse
- Given a sequence x = x1...xN and a parse π = π1, ..., πN,
- to find how likely the parse is (given our HMM):
- P(x, π) = P(x1, ..., xN, π1, ..., πN)
- = P(xN, πN | x1...xN-1, π1, ..., πN-1) P(x1...xN-1, π1, ..., πN-1)
- = P(xN, πN | πN-1) P(x1...xN-1, π1, ..., πN-1)
- = ...
- = P(xN, πN | πN-1) P(xN-1, πN-1 | πN-2) ... P(x2, π2 | π1) P(x1, π1)
- = P(xN | πN) P(πN | πN-1) ... P(x2 | π2) P(π2 | π1) P(x1 | π1) P(π1)
- = a0π1 aπ1π2 ... aπN-1πN eπ1(x1) ... eπN(xN)
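The closed form on the last line translates into a few lines of code. A minimal sketch (Python), assuming transition, emission and initial tables `A`, `E`, `A0` as in the casino sketch above; the function name is mine:

```python
def parse_likelihood(x, pi, A, E, A0):
    """P(x, pi) = a_{0,pi1} * prod_i a_{pi_(i-1), pi_i} * prod_i e_{pi_i}(x_i)."""
    p = A0[pi[0]] * E[pi[0]][x[0]]
    for i in range(1, len(x)):
        p *= A[pi[i - 1]][pi[i]] * E[pi[i]][x[i]]
    return p
```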
15. Example: the dishonest casino
- Let the sequence of rolls be
- x = 1, 2, 1, 5, 6, 2, 1, 6, 2, 4
- Then, what is the likelihood of
- π = Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair?
- (say initial probs a0Fair = 1/2, a0Loaded = 1/2)
- 1/2 × P(1 | Fair) P(Fair | Fair) P(2 | Fair) P(Fair | Fair) ... P(4 | Fair)
- = 1/2 × (1/6)^10 × (0.95)^9 = 0.00000000521158647211 ≈ 5.2 × 10^-9
16. Example: the dishonest casino
- So, the likelihood the die is fair in all this run is just 5.2 × 10^-9
- OK, but what is the likelihood of
- π = Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded?
- 1/2 × P(1 | Loaded) P(Loaded | Loaded) ... P(4 | Loaded)
- = 1/2 × (1/10)^8 × (1/2)^2 × (0.95)^9 = 0.00000000078781176215 ≈ 0.79 × 10^-9
- Therefore, it is somewhat more likely that the die is fair all the way than that it is loaded all the way
17. Example: the dishonest casino
- Let the sequence of rolls be
- x = 1, 6, 6, 5, 6, 2, 6, 6, 3, 6
- Now, what is the likelihood of π = F, F, ..., F?
- 1/2 × (1/6)^10 × (0.95)^9 ≈ 5.2 × 10^-9, same as before
- What is the likelihood of
- π = L, L, ..., L?
- 1/2 × (1/10)^4 × (1/2)^6 × (0.95)^9 = 0.00000049238235134735 ≈ 4.9 × 10^-7
- So, it is about 100 times more likely that the die is loaded
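These hand calculations are easy to verify numerically; the short check below (Python, plain arithmetic) reproduces the three figures above.

```python
# Check of the all-Fair / all-Loaded likelihoods from slides 15-17.
fair_all     = 0.5 * (1 / 6) ** 10 * 0.95 ** 9                  # x = 1,2,1,5,6,2,1,6,2,4, all Fair
loaded_all_1 = 0.5 * (1 / 10) ** 8 * (1 / 2) ** 2 * 0.95 ** 9   # same x, all Loaded
loaded_all_2 = 0.5 * (1 / 10) ** 4 * (1 / 2) ** 6 * 0.95 ** 9   # x = 1,6,6,5,6,2,6,6,3,6, all Loaded

print(fair_all)                 # ~5.2e-09
print(loaded_all_1)             # ~7.9e-10
print(loaded_all_2)             # ~4.9e-07
print(loaded_all_2 / fair_all)  # ~95, i.e. roughly 100 times more likely
```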
18. C-G Islands Example
[Figure: two Markov chains over the states A, C, G, T (a '+' chain for islands and a '-' chain for the background), with 'change' transitions between them]
19. Hidden Markov Model for CpG Islands
The states: Domain(Si) = {+, -} × {A, C, G, T} (2 × 4 = 8 values)
In this representation P(xi | si) = 0 or 1, depending on whether xi is consistent with si.
E.g., xi = G is consistent with si = (+, G) and with si = (-, G), but not with any other value of si.
20. Question 1: Most probable state path (decoding)
First question: Given an output sequence x = (x1, ..., xL), a most probable path s = (s1, ..., sL) is one which maximizes p(s | x).
21. Decoding
- GIVEN x = x1x2...xN
- We want to find π = π1, ..., πN,
- such that P[x, π] is maximized:
- π* = argmaxπ P[x, π]
- We can use dynamic programming!
- Let Vk(i) = max{π1,...,πi-1} P[x1...xi-1, π1, ..., πi-1, xi, πi = k]
- = probability of the most likely sequence of states ending at state πi = k
22. Decoding: main idea
- Given that, for all states k and for a fixed position i,
- Vk(i) = max{π1,...,πi-1} P[x1...xi-1, π1, ..., πi-1, xi, πi = k]
- What is Vj(i+1)?
- From the definition,
- Vj(i+1) = max{π1,...,πi} P[x1...xi, π1, ..., πi, xi+1, πi+1 = j]
- = max{π1,...,πi} P(xi+1, πi+1 = j | x1...xi, π1, ..., πi) P[x1...xi, π1, ..., πi]
- = max{π1,...,πi} P(xi+1, πi+1 = j | πi) P[x1...xi-1, π1, ..., πi-1, xi, πi]
- = maxk P(xi+1, πi+1 = j | πi = k) max{π1,...,πi-1} P[x1...xi-1, π1, ..., πi-1, xi, πi = k]
- = ej(xi+1) maxk akj Vk(i)
23. The Viterbi Algorithm
- Input: x = x1...xN
- Initialization:
- V0(0) = 1 (0 is the imaginary first position)
- Vk(0) = 0, for all k > 0
- Iteration:
- Vj(i) = ej(xi) × maxk akj Vk(i-1)
- Ptrj(i) = argmaxk akj Vk(i-1)
- Termination:
- P(x, π*) = maxk Vk(N)
- Traceback:
- πN* = argmaxk Vk(N)
- πi-1* = Ptrπi*(i)
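A minimal sketch of this recursion in Python (dictionary-based, in probability space to mirror the slide; all names are mine, and `A`, `E`, `A0` are transition, emission and initial tables as in the casino sketch). For long sequences one would work in log space, as discussed two slides below.

```python
def viterbi(x, states, A, E, A0):
    """Most probable state path via V_j(i) = e_j(x_i) * max_k a_kj V_k(i-1)."""
    V = [{k: A0[k] * E[k][x[0]] for k in states}]   # V_k(1), folding in the imaginary position 0
    ptr = [{}]
    for i in range(1, len(x)):
        V.append({})
        ptr.append({})
        for j in states:
            best_k = max(states, key=lambda k: V[i - 1][k] * A[k][j])
            V[i][j] = E[j][x[i]] * V[i - 1][best_k] * A[best_k][j]
            ptr[i][j] = best_k
    # Termination and traceback
    last = max(states, key=lambda k: V[-1][k])
    path = [last]
    for i in range(len(x) - 1, 0, -1):
        path.append(ptr[i][path[-1]])
    return list(reversed(path)), V[-1][last]        # (pi*, P(x, pi*))
```

With the casino parameters sketched earlier, `viterbi([1, 2, 1, 5, 6, 2, 1, 6, 2, 4], STATES, A, E, A0)` returns a fair/loaded labeling of the rolls together with P(x, π*).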
24. The Viterbi Algorithm
[Figure: K × N dynamic-programming table of the values Vj(i); states 1, ..., K index the rows, positions x1 x2 x3 ... xN the columns]
- Similar to aligning a set of states to a sequence
- Time: O(K^2 N)
- Space: O(KN)
25. Viterbi Algorithm: a practical detail
- Underflows are a significant problem:
- P[x1, ..., xi, π1, ..., πi] = a0π1 aπ1π2 ... aπi-1πi eπ1(x1) ... eπi(xi)
- These numbers become extremely small, causing underflow
- Solution: take the logs of all values
- Vl(i) = log el(xi) + maxk [Vk(i-1) + log akl]
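In code this changes only the inner recursion; a small sketch (Python), assuming the same `A` and `E` tables as before and treating log 0 as minus infinity:

```python
import math

NEG_INF = float("-inf")

def log_viterbi_step(V_prev, x_i, states, A, E):
    """One log-space Viterbi step: V_l(i) = log e_l(x_i) + max_k [V_k(i-1) + log a_kl]."""
    V_i = {}
    for l in states:
        best = max(V_prev[k] + (math.log(A[k][l]) if A[k][l] > 0 else NEG_INF)
                   for k in states)
        V_i[l] = (math.log(E[l][x_i]) if E[l][x_i] > 0 else NEG_INF) + best
    return V_i
```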
26. Example
- Let x be a sequence with a portion where about 1/6 of the rolls are 6s, followed by a portion where about 1/2 are 6s:
- x = 12345612345612345 66263646561626364656
- Then, it is not hard to show that the optimal parse is
- FFF...F LLL...L
- 6 characters "123456" parsed as F contribute 0.95^6 × (1/6)^6 ≈ 1.6 × 10^-5
- parsed as L, they contribute 0.95^6 × (1/2)^1 × (1/10)^5 ≈ 0.4 × 10^-5
- "162636" parsed as F contributes 0.95^6 × (1/6)^6 ≈ 1.6 × 10^-5
- parsed as L, it contributes 0.95^6 × (1/2)^3 × (1/10)^3 ≈ 9.0 × 10^-5
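The four contributions quoted above are plain arithmetic and can be checked directly (Python):

```python
# Per-window contributions from slide 26.
print(0.95 ** 6 * (1 / 6) ** 6)                  # "123456" as F: ~1.6e-05
print(0.95 ** 6 * (1 / 2) ** 1 * (1 / 10) ** 5)  # "123456" as L: ~0.4e-05
print(0.95 ** 6 * (1 / 6) ** 6)                  # "162636" as F: ~1.6e-05
print(0.95 ** 6 * (1 / 2) ** 3 * (1 / 10) ** 3)  # "162636" as L: ~9.0e-05
```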
27. Question 2: Computing P(x) (evaluation)
Given an output sequence x = (x1, ..., xL), compute the probability P(x) that this sequence was generated by the model,
P(x) = Σs P(s, x),
the summation taken over all state-paths s generating x.
28. Evaluation
- We will develop algorithms that allow us to compute:
- P(x): probability of x given the model
- P(xi...xj): probability of a substring of x given the model
- P(πi = k | x): probability that the i-th state is k, given x
29. The Forward Algorithm
- We want to calculate
- P(x) = probability of x, given the HMM
- Sum over all possible ways of generating x:
- P(x) = Σπ P(x, π) = Σπ P(x | π) P(π)
- To avoid summing over an exponential number of paths π, define
- fk(i) = P(x1...xi, πi = k) (the forward probability)
30. The Forward Algorithm: derivation
- Define the forward probability:
- fk(i) = P(x1...xi, πi = k)
- = Σπ1...πi-1 P(x1...xi-1, π1, ..., πi-1, πi = k) ek(xi)
- = Σj Σπ1...πi-2 P(x1...xi-1, π1, ..., πi-2, πi-1 = j) ajk ek(xi)
- = ek(xi) Σj fj(i-1) ajk
31. The Forward Algorithm
- We can compute fk(i) for all k, i, using dynamic programming!
- Initialization:
- f0(0) = 1
- fk(0) = 0, for all k > 0
- Iteration:
- fk(i) = ek(xi) Σj fj(i-1) ajk
- Termination:
- P(x) = Σk fk(N) ak0
- where ak0 is the probability that the terminating state is k (usually = a0k)
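A minimal sketch of this recursion (Python; all names mine). For simplicity the sketch omits the explicit end state, i.e. it effectively takes ak0 = 1 for every k in the termination step:

```python
def forward(x, states, A, E, A0):
    """Forward probabilities f_k(i) = P(x_1..x_i, pi_i = k), and P(x)."""
    f = [{k: A0[k] * E[k][x[0]] for k in states}]            # f_k(1)
    for i in range(1, len(x)):
        f.append({k: E[k][x[i]] * sum(f[i - 1][j] * A[j][k] for j in states)
                  for k in states})
    px = sum(f[-1][k] for k in states)                        # termination (no end state)
    return f, px
```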
32. Relation between Forward and Viterbi
VITERBI
- Initialization:
- V0(0) = 1
- Vk(0) = 0, for all k > 0
- Iteration:
- Vj(i) = ej(xi) maxk Vk(i-1) akj
- Termination:
- P(x, π*) = maxk Vk(N)
FORWARD
- Initialization:
- f0(0) = 1
- fk(0) = 0, for all k > 0
- Iteration:
- fj(i) = ej(xi) Σk fk(i-1) akj
- Termination:
- P(x) = Σk fk(N) ak0
33. Motivation for the Backward Algorithm
- We want to compute
- P(πi = k | x),
- the probability distribution of the i-th state, given x
- We start by computing
- P(πi = k, x) = P(x1...xi, πi = k, xi+1...xN)
- = P(x1...xi, πi = k) P(xi+1...xN | x1...xi, πi = k)
- = P(x1...xi, πi = k) P(xi+1...xN | πi = k)
- Then, P(πi = k | x) = P(πi = k, x) / P(x)
- The first factor is the forward probability fk(i); the second is the backward probability bk(i).
34. The Backward Algorithm: derivation
- Define the backward probability:
- bk(i) = P(xi+1...xN | πi = k)
- = Σπi+1...πN P(xi+1, xi+2, ..., xN, πi+1, ..., πN | πi = k)
- = Σj Σπi+1...πN P(xi+1, xi+2, ..., xN, πi+1 = j, πi+2, ..., πN | πi = k)
- = Σj ej(xi+1) akj Σπi+2...πN P(xi+2, ..., xN, πi+2, ..., πN | πi+1 = j)
- = Σj ej(xi+1) akj bj(i+1)
35. The Backward Algorithm
- We can compute bk(i) for all k, i, using dynamic programming
- Initialization:
- bk(N) = ak0, for all k
- Iteration:
- bk(i) = Σj ej(xi+1) akj bj(i+1)
- Termination:
- P(x) = Σj a0j ej(x1) bj(1)
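A matching sketch (Python; as in the forward sketch, the explicit end state is omitted, so the initialization uses bk(N) = 1 instead of ak0):

```python
def backward(x, states, A, E, A0):
    """Backward probabilities b_k(i) = P(x_{i+1}..x_N | pi_i = k), and P(x)."""
    N = len(x)
    b = [dict() for _ in range(N)]
    b[N - 1] = {k: 1.0 for k in states}                      # initialization (no end state)
    for i in range(N - 2, -1, -1):
        b[i] = {k: sum(E[j][x[i + 1]] * A[k][j] * b[i + 1][j] for j in states)
                for k in states}
    px = sum(A0[j] * E[j][x[0]] * b[0][j] for j in states)   # termination
    return b, px
```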
36. Computational Complexity
- What is the running time, and space required, for Forward and Backward?
- Time: O(K^2 N)
- Space: O(KN)
- Useful implementation techniques to avoid underflows:
- Viterbi: sums of logs
- Forward/Backward: rescaling at each position by multiplying by a constant
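A sketch of the rescaling idea for the forward pass (Python; names mine): divide the forward values at each position by their sum, and recover log P(x) as the sum of the log scaling constants.

```python
import math

def forward_log_prob(x, states, A, E, A0):
    """log P(x) by the forward algorithm with per-position rescaling (avoids underflow)."""
    f = {k: A0[k] * E[k][x[0]] for k in states}        # unscaled f_k(1)
    log_px = 0.0
    for i in range(1, len(x) + 1):
        c = sum(f.values())                            # scaling constant for position i
        f = {k: v / c for k, v in f.items()}           # rescale so the values sum to 1
        log_px += math.log(c)
        if i < len(x):                                 # recursion to position i+1
            f = {k: E[k][x[i]] * sum(f[j] * A[j][k] for j in states) for k in states}
    return log_px                                      # log P(x) = sum_i log c_i
```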
37. Viterbi, Forward, Backward
VITERBI
- Initialization: V0(0) = 1; Vk(0) = 0, for all k > 0
- Iteration: Vj(i) = ej(xi) maxk Vk(i-1) akj
- Termination: P(x, π*) = maxk Vk(N)
FORWARD
- Initialization: f0(0) = 1; fk(0) = 0, for all k > 0
- Iteration: fj(i) = ej(xi) Σk fk(i-1) akj
- Termination: P(x) = Σk fk(N) ak0
BACKWARD
- Initialization: bk(N) = ak0, for all k
- Iteration: bk(i) = Σj ej(xi+1) akj bj(i+1)
- Termination: P(x) = Σk a0k ek(x1) bk(1)
38. Posterior Decoding
- We can now calculate
- P(πi = k | x) = fk(i) bk(i) / P(x)
- Then, we can ask:
- What is the most likely state at position i of sequence x?
- Define π^ by Posterior Decoding:
- π^i = argmaxk P(πi = k | x)
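Combining the forward and backward passes gives a posterior-decoding sketch (Python, self-contained, same no-end-state convention as above; all names are mine):

```python
def posterior_decode(x, states, A, E, A0):
    """pi^_i = argmax_k P(pi_i = k | x), where P(pi_i = k | x) = f_k(i) b_k(i) / P(x)."""
    N = len(x)
    # Forward pass
    f = [{k: A0[k] * E[k][x[0]] for k in states}]
    for i in range(1, N):
        f.append({k: E[k][x[i]] * sum(f[i - 1][j] * A[j][k] for j in states)
                  for k in states})
    px = sum(f[-1][k] for k in states)
    # Backward pass
    b = [dict() for _ in range(N)]
    b[N - 1] = {k: 1.0 for k in states}
    for i in range(N - 2, -1, -1):
        b[i] = {k: sum(E[j][x[i + 1]] * A[k][j] * b[i + 1][j] for j in states)
                for k in states}
    # Posterior P(pi_i = k | x) and its per-position argmax
    post = [{k: f[i][k] * b[i][k] / px for k in states} for i in range(N)]
    path = [max(states, key=lambda k: post[i][k]) for i in range(N)]
    return path, post
```

As the next slide notes, the resulting path is only locally optimal and may not be a valid state sequence.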
39. Posterior Decoding
- For each state, Posterior Decoding gives us a curve of the likelihood of that state at each position
- That is sometimes more informative than the Viterbi path π*
- Posterior Decoding may give an invalid sequence of states
- Why?
40. Posterior Decoding
[Figure: trellis over positions x1 x2 x3 ... xN, with the posterior P(πi = l | x) highlighted for one state l at position i]
- P(πi = k | x) = Σπ P(π | x) 1(πi = k)
- = Σ{π : πi = k} P(π | x)
41. Posterior Decoding
[Figure: the same trellis, with the posteriors P(πi = l | x) and P(πj = l | x) marked at two positions i and j]
- Example: How do we compute P(πi = l | x)?
- P(πi = l | x) = fl(i) bl(i) / P(x)
42. Applications of the model
- Given a DNA region x,
- the Viterbi algorithm predicts locations of CpG islands
- Given a nucleotide xi (say xi = A),
- the Viterbi parse tells whether xi is in a CpG island in the most likely general scenario
- The Forward/Backward algorithms can calculate
- P(xi is in a CpG island) = P(πi = A+ | x)
- Posterior Decoding can assign locally optimal predictions of CpG islands:
- π^i = argmaxk P(πi = k | x)
43. A model of CpG Islands (2): Transitions
- What about transitions between (+) and (-) states?
- They affect
- Avg. length of a CpG island
- Avg. separation between two CpG islands
[Figure: two regions X and Y; X continues with probability p and switches to Y with probability 1-p; Y continues with probability q and switches to X with probability 1-q]
Length distribution of region X:
P[lX = 1] = 1-p
P[lX = 2] = p(1-p)
...
P[lX = k] = p^(k-1) (1-p)
E[lX] = 1/(1-p): a geometric distribution, with mean 1/(1-p)
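For a hypothetical within-island continuation probability p (the value below is purely illustrative, not from the slides), the geometric length distribution and its mean are easy to check numerically (Python):

```python
# Geometric length distribution of region X: P[l_X = k] = p**(k - 1) * (1 - p)
p = 0.98                                                  # illustrative value only
mean = sum(k * p ** (k - 1) * (1 - p) for k in range(1, 10_000))
print(mean, 1 / (1 - p))                                  # both ~50: E[l_X] = 1/(1-p)
```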