Title: Hidden Markov Model
1. Hidden Markov Model
2. Example: CpG Island
We consider two questions (and some variants):
Question 1: Given a short stretch of genomic data, does it come from a CpG island?
Question 2: Given a long piece of genomic data, does it contain CpG islands, and if so, where and of what length?
We solve the first question by modeling strings with and without CpG islands as Markov chains over the same states A, C, G, T but with different transition probabilities.
3. CpG Island: Question 2
Now we solve the second question.
Question 2: Given a long piece of genomic data, does it contain CpG islands, and where?
For this, we need to decide which parts of a given long sequence of letters are more likely to come from the CpG-island ('+') model, and which parts are more likely to come from the background ('-') model. This is done by using the Hidden Markov Model, to be defined.
4. Question 2: Finding CpG Islands
Given a long genomic string with possible CpG islands, we define a Markov chain over 8 states, all interconnected (hence it is ergodic):
A+, C+, G+, T+ and A-, C-, G-, T-
The problem is that we don't know the sequence of states which are traversed, but just the sequence of letters.
Therefore we use here a Hidden Markov Model.
5. Hidden Markov Models (HMM)
[Figure: a chain of hidden variables, each emitting an observed data symbol]
6. Hidden Markov Model
A Markov chain (s1, ..., sL), and for each state s and each symbol x we have the emission probability p(Xi = x | Si = s).
Application in communication: the message sent is (s1, ..., sm) but we receive (x1, ..., xm). Compute the most likely message sent.
Application in speech recognition: the word said is (s1, ..., sm) but we recorded (x1, ..., xm). Compute the most likely word said.
7. Hidden Markov Model
Notations:
Markov chain transition probabilities: p(Si+1 = t | Si = s) = ast
Emission probabilities: p(Xi = b | Si = s) = es(b)
8. Example: The Dishonest Casino
- A casino has two dice:
- Fair die:
- P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
- Loaded die:
- P(1) = P(2) = P(3) = P(4) = P(5) = 1/10
- P(6) = 1/2
- Casino player switches back and forth between fair and loaded die once every 20 turns
- Game:
- You bet $1
- You roll (always with a fair die)
- Casino player rolls (maybe with fair die, maybe with loaded die)
- Highest number wins $2
9. Question 1: Decoding
- GIVEN
- A sequence of rolls by the casino player
- 1245526462146146136136661664661636616366163616515615115146123562344
- QUESTION
- What portion of the sequence was generated with the fair die, and what portion with the loaded die?
- This is the DECODING question in HMMs
10. Question 2: Evaluation
- GIVEN
- A sequence of rolls by the casino player
- 1245526462146146136136661664661636616366163616515615115146123562344
- QUESTION
- How likely is this sequence, given our model of how the casino works?
- This is the EVALUATION problem in HMMs
11. Question 3: Learning
- GIVEN
- A sequence of rolls by the casino player
- 1245526462146146136136661664661636616366163616515615115146123562344
- QUESTION
- How loaded is the loaded die? How fair is the fair die? How often does the casino player change from fair to loaded, and back?
- This is the LEARNING question in HMMs
12. The dishonest casino model
[Figure: two states, FAIR and LOADED; each stays in place with probability 0.95 and switches to the other with probability 0.05]
P(1|F) = P(2|F) = P(3|F) = P(4|F) = P(5|F) = P(6|F) = 1/6
P(1|L) = P(2|L) = P(3|L) = P(4|L) = P(5|L) = 1/10, P(6|L) = 1/2
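As a concrete illustration of this model (and of the ast / es(b) notation from slide 7), here is a minimal sketch in Python; the variable names are chosen for this example only and are not part of the slides.

```python
# Dishonest-casino HMM parameters (a sketch; names are illustrative).
STATES = ["F", "L"]  # Fair, Loaded

# Transition probabilities a_st = P(S_{i+1} = t | S_i = s)
A = {
    "F": {"F": 0.95, "L": 0.05},
    "L": {"F": 0.05, "L": 0.95},
}

# Emission probabilities e_s(b) = P(X_i = b | S_i = s), b = die face 1..6
E = {
    "F": {b: 1 / 6 for b in range(1, 7)},
    "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5},
}

# Initial probabilities a_0s (the later slides assume 1/2 each)
A0 = {"F": 0.5, "L": 0.5}
```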
13. A parse of a sequence
- Given a sequence x = x1...xN,
- a parse of x is a sequence of states π = π1, ..., πN
[Figure: trellis of hidden states 1, 2, ..., K over positions, emitting x1, x2, x3, ...]
14. Likelihood of a parse
- Given a sequence x = x1...xN and a parse π = π1, ..., πN,
- to find how likely the parse is (given our HMM):
- P(x, π) = P(x1, ..., xN, π1, ..., πN)
- = P(xN, πN | x1...xN-1, π1, ..., πN-1) P(x1...xN-1, π1, ..., πN-1)
- = P(xN, πN | πN-1) P(x1...xN-1, π1, ..., πN-1)
- = ...
- = P(xN, πN | πN-1) P(xN-1, πN-1 | πN-2) ... P(x2, π2 | π1) P(x1, π1)
- = P(xN | πN) P(πN | πN-1) ... P(x2 | π2) P(π2 | π1) P(x1 | π1) P(π1)
- = a0π1 aπ1π2 ... aπN-1πN eπ1(x1) ... eπN(xN)
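The closed form on the last line translates into a few lines of code. A minimal sketch (Python), assuming transition, emission and initial tables `A`, `E`, `A0` as in the casino sketch above; the function name is mine:

```python
def parse_likelihood(x, pi, A, E, A0):
    """P(x, pi) = a_{0,pi1} * prod_i a_{pi_(i-1), pi_i} * prod_i e_{pi_i}(x_i)."""
    p = A0[pi[0]] * E[pi[0]][x[0]]
    for i in range(1, len(x)):
        p *= A[pi[i - 1]][pi[i]] * E[pi[i]][x[i]]
    return p
```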
15. Example: the dishonest casino
- Let the sequence of rolls be
- x = 1, 2, 1, 5, 6, 2, 1, 6, 2, 4
- Then, what is the likelihood of
- π = Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Fair?
- (say initial probs a0Fair = 1/2, a0Loaded = 1/2)
- 1/2 × P(1 | Fair) P(Fair | Fair) P(2 | Fair) P(Fair | Fair) ... P(4 | Fair)
- = 1/2 × (1/6)^10 × (0.95)^9 = 0.00000000521158647211 ≈ 5.2 × 10^-9
16. Example: the dishonest casino
- So, the likelihood the die is fair in all this run is just 5.2 × 10^-9
- OK, but what is the likelihood of
- π = Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded, Loaded?
- 1/2 × P(1 | Loaded) P(Loaded | Loaded) ... P(4 | Loaded)
- = 1/2 × (1/10)^8 × (1/2)^2 × (0.95)^9 = 0.00000000078781176215 ≈ 0.79 × 10^-9
- Therefore, it is somewhat more likely that the die is fair all the way than that it is loaded all the way
17. Example: the dishonest casino
- Let the sequence of rolls be
- x = 1, 6, 6, 5, 6, 2, 6, 6, 3, 6
- Now, what is the likelihood of π = F, F, ..., F?
- 1/2 × (1/6)^10 × (0.95)^9 ≈ 5.2 × 10^-9, same as before
- What is the likelihood of
- π = L, L, ..., L?
- 1/2 × (1/10)^4 × (1/2)^6 × (0.95)^9 = 0.00000049238235134735 ≈ 4.9 × 10^-7
- So, it is about 100 times more likely that the die is loaded
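These hand calculations are easy to verify numerically; the short check below (Python, plain arithmetic) reproduces the three figures above.

```python
# Check of the all-Fair / all-Loaded likelihoods from slides 15-17.
fair_all     = 0.5 * (1 / 6) ** 10 * 0.95 ** 9                  # x = 1,2,1,5,6,2,1,6,2,4, all Fair
loaded_all_1 = 0.5 * (1 / 10) ** 8 * (1 / 2) ** 2 * 0.95 ** 9   # same x, all Loaded
loaded_all_2 = 0.5 * (1 / 10) ** 4 * (1 / 2) ** 6 * 0.95 ** 9   # x = 1,6,6,5,6,2,6,6,3,6, all Loaded

print(fair_all)                 # ~5.2e-09
print(loaded_all_1)             # ~7.9e-10
print(loaded_all_2)             # ~4.9e-07
print(loaded_all_2 / fair_all)  # ~95, i.e. roughly 100 times more likely
```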
18. C-G Islands Example
[Figure: two Markov chains over the states A, C, G, T (a '+' chain for islands and a '-' chain for the background), with 'change' transitions between them]
19. Hidden Markov Model for CpG Islands
The states: Domain(Si) = {+, -} × {A, C, G, T} (2 × 4 = 8 values)
In this representation P(xi | si) = 0 or 1, depending on whether xi is consistent with si.
E.g., xi = G is consistent with si = (+, G) and with si = (-, G), but not with any other value of si.
20. Question 1: Most probable state path (decoding)
First question: Given an output sequence x = (x1, ..., xL), a most probable path s = (s1, ..., sL) is one which maximizes p(s | x).
21. Decoding
- GIVEN x = x1x2...xN
- We want to find π = π1, ..., πN,
- such that P[x, π] is maximized:
- π* = argmaxπ P[x, π]
- We can use dynamic programming!
- Let Vk(i) = max{π1,...,πi-1} P[x1...xi-1, π1, ..., πi-1, xi, πi = k]
- = probability of the most likely sequence of states ending at state πi = k
22. Decoding: main idea
- Given that, for all states k and for a fixed position i,
- Vk(i) = max{π1,...,πi-1} P[x1...xi-1, π1, ..., πi-1, xi, πi = k]
- What is Vj(i+1)?
- From the definition,
- Vj(i+1) = max{π1,...,πi} P[x1...xi, π1, ..., πi, xi+1, πi+1 = j]
- = max{π1,...,πi} P(xi+1, πi+1 = j | x1...xi, π1, ..., πi) P[x1...xi, π1, ..., πi]
- = max{π1,...,πi} P(xi+1, πi+1 = j | πi) P[x1...xi-1, π1, ..., πi-1, xi, πi]
- = maxk P(xi+1, πi+1 = j | πi = k) max{π1,...,πi-1} P[x1...xi-1, π1, ..., πi-1, xi, πi = k]
- = ej(xi+1) maxk akj Vk(i)
23. The Viterbi Algorithm
- Input: x = x1...xN
- Initialization:
- V0(0) = 1 (0 is the imaginary first position)
- Vk(0) = 0, for all k > 0
- Iteration:
- Vj(i) = ej(xi) × maxk akj Vk(i-1)
- Ptrj(i) = argmaxk akj Vk(i-1)
- Termination:
- P(x, π*) = maxk Vk(N)
- Traceback:
- πN* = argmaxk Vk(N)
- πi-1* = Ptrπi*(i)
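A minimal sketch of this recursion in Python (dictionary-based, in probability space to mirror the slide; all names are mine, and `A`, `E`, `A0` are transition, emission and initial tables as in the casino sketch). For long sequences one would work in log space, as discussed two slides below.

```python
def viterbi(x, states, A, E, A0):
    """Most probable state path via V_j(i) = e_j(x_i) * max_k a_kj V_k(i-1)."""
    V = [{k: A0[k] * E[k][x[0]] for k in states}]   # V_k(1), folding in the imaginary position 0
    ptr = [{}]
    for i in range(1, len(x)):
        V.append({})
        ptr.append({})
        for j in states:
            best_k = max(states, key=lambda k: V[i - 1][k] * A[k][j])
            V[i][j] = E[j][x[i]] * V[i - 1][best_k] * A[best_k][j]
            ptr[i][j] = best_k
    # Termination and traceback
    last = max(states, key=lambda k: V[-1][k])
    path = [last]
    for i in range(len(x) - 1, 0, -1):
        path.append(ptr[i][path[-1]])
    return list(reversed(path)), V[-1][last]        # (pi*, P(x, pi*))
```

With the casino parameters sketched earlier, `viterbi([1, 2, 1, 5, 6, 2, 1, 6, 2, 4], STATES, A, E, A0)` returns a fair/loaded labeling of the rolls together with P(x, π*).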
24. The Viterbi Algorithm
[Figure: K × N dynamic-programming table of the values Vj(i); states 1, ..., K index the rows, positions x1 x2 x3 ... xN the columns]
- Similar to aligning a set of states to a sequence
- Time: O(K^2 N)
- Space: O(KN)
25. Viterbi Algorithm: a practical detail
- Underflows are a significant problem:
- P[x1, ..., xi, π1, ..., πi] = a0π1 aπ1π2 ... aπi-1πi eπ1(x1) ... eπi(xi)
- These numbers become extremely small, causing underflow
- Solution: take the logs of all values
- Vl(i) = log el(xi) + maxk [Vk(i-1) + log akl]
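In code this changes only the inner recursion; a small sketch (Python), assuming the same `A` and `E` tables as before and treating log 0 as minus infinity:

```python
import math

NEG_INF = float("-inf")

def log_viterbi_step(V_prev, x_i, states, A, E):
    """One log-space Viterbi step: V_l(i) = log e_l(x_i) + max_k [V_k(i-1) + log a_kl]."""
    V_i = {}
    for l in states:
        best = max(V_prev[k] + (math.log(A[k][l]) if A[k][l] > 0 else NEG_INF)
                   for k in states)
        V_i[l] = (math.log(E[l][x_i]) if E[l][x_i] > 0 else NEG_INF) + best
    return V_i
```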
26. Example
- Let x be a sequence with a portion where about 1/6 of the rolls are 6s, followed by a portion where about 1/2 are 6s:
- x = 12345612345612345 66263646561626364656
- Then, it is not hard to show that the optimal parse is
- FFF...F LLL...L
- 6 characters "123456" parsed as F contribute 0.95^6 × (1/6)^6 ≈ 1.6 × 10^-5
- parsed as L, they contribute 0.95^6 × (1/2)^1 × (1/10)^5 ≈ 0.4 × 10^-5
- "162636" parsed as F contributes 0.95^6 × (1/6)^6 ≈ 1.6 × 10^-5
- parsed as L, it contributes 0.95^6 × (1/2)^3 × (1/10)^3 ≈ 9.0 × 10^-5
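The four contributions quoted above are plain arithmetic and can be checked directly (Python):

```python
# Per-window contributions from slide 26.
print(0.95 ** 6 * (1 / 6) ** 6)                  # "123456" as F: ~1.6e-05
print(0.95 ** 6 * (1 / 2) ** 1 * (1 / 10) ** 5)  # "123456" as L: ~0.4e-05
print(0.95 ** 6 * (1 / 6) ** 6)                  # "162636" as F: ~1.6e-05
print(0.95 ** 6 * (1 / 2) ** 3 * (1 / 10) ** 3)  # "162636" as L: ~9.0e-05
```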
27. Question 2: Computing P(x) (evaluation)
Given an output sequence x = (x1, ..., xL), compute the probability P(x) that this sequence was generated by the model,
P(x) = Σs P(s, x),
the summation taken over all state-paths s generating x.
28. Evaluation
- We will develop algorithms that allow us to compute:
- P(x): probability of x given the model
- P(xi...xj): probability of a substring of x given the model
- P(πi = k | x): probability that the i-th state is k, given x
29. The Forward Algorithm
- We want to calculate
- P(x) = probability of x, given the HMM
- Sum over all possible ways of generating x:
- P(x) = Σπ P(x, π) = Σπ P(x | π) P(π)
- To avoid summing over an exponential number of paths π, define
- fk(i) = P(x1...xi, πi = k) (the forward probability)
30. The Forward Algorithm: derivation
- Define the forward probability:
- fk(i) = P(x1...xi, πi = k)
- = Σπ1...πi-1 P(x1...xi-1, π1, ..., πi-1, πi = k) ek(xi)
- = Σj Σπ1...πi-2 P(x1...xi-1, π1, ..., πi-2, πi-1 = j) ajk ek(xi)
- = ek(xi) Σj fj(i-1) ajk
31. The Forward Algorithm
- We can compute fk(i) for all k, i, using dynamic programming!
- Initialization:
- f0(0) = 1
- fk(0) = 0, for all k > 0
- Iteration:
- fk(i) = ek(xi) Σj fj(i-1) ajk
- Termination:
- P(x) = Σk fk(N) ak0
- where ak0 is the probability that the terminating state is k (usually = a0k)
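A minimal sketch of this recursion (Python; all names mine). For simplicity the sketch omits the explicit end state, i.e. it effectively takes ak0 = 1 for every k in the termination step:

```python
def forward(x, states, A, E, A0):
    """Forward probabilities f_k(i) = P(x_1..x_i, pi_i = k), and P(x)."""
    f = [{k: A0[k] * E[k][x[0]] for k in states}]            # f_k(1)
    for i in range(1, len(x)):
        f.append({k: E[k][x[i]] * sum(f[i - 1][j] * A[j][k] for j in states)
                  for k in states})
    px = sum(f[-1][k] for k in states)                        # termination (no end state)
    return f, px
```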
32. Relation between Forward and Viterbi
VITERBI
- Initialization:
- V0(0) = 1
- Vk(0) = 0, for all k > 0
- Iteration:
- Vj(i) = ej(xi) maxk Vk(i-1) akj
- Termination:
- P(x, π*) = maxk Vk(N)
FORWARD
- Initialization:
- f0(0) = 1
- fk(0) = 0, for all k > 0
- Iteration:
- fj(i) = ej(xi) Σk fk(i-1) akj
- Termination:
- P(x) = Σk fk(N) ak0
33. Motivation for the Backward Algorithm
- We want to compute
- P(πi = k | x),
- the probability distribution of the i-th state, given x
- We start by computing
- P(πi = k, x) = P(x1...xi, πi = k, xi+1...xN)
- = P(x1...xi, πi = k) P(xi+1...xN | x1...xi, πi = k)
- = P(x1...xi, πi = k) P(xi+1...xN | πi = k)
- Then, P(πi = k | x) = P(πi = k, x) / P(x)
- The first factor is the forward probability fk(i); the second is the backward probability bk(i).
34. The Backward Algorithm: derivation
- Define the backward probability:
- bk(i) = P(xi+1...xN | πi = k)
- = Σπi+1...πN P(xi+1, xi+2, ..., xN, πi+1, ..., πN | πi = k)
- = Σj Σπi+1...πN P(xi+1, xi+2, ..., xN, πi+1 = j, πi+2, ..., πN | πi = k)
- = Σj ej(xi+1) akj Σπi+2...πN P(xi+2, ..., xN, πi+2, ..., πN | πi+1 = j)
- = Σj ej(xi+1) akj bj(i+1)
35. The Backward Algorithm
- We can compute bk(i) for all k, i, using dynamic programming
- Initialization:
- bk(N) = ak0, for all k
- Iteration:
- bk(i) = Σj ej(xi+1) akj bj(i+1)
- Termination:
- P(x) = Σj a0j ej(x1) bj(1)
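A matching sketch (Python; as in the forward sketch, the explicit end state is omitted, so the initialization uses bk(N) = 1 instead of ak0):

```python
def backward(x, states, A, E, A0):
    """Backward probabilities b_k(i) = P(x_{i+1}..x_N | pi_i = k), and P(x)."""
    N = len(x)
    b = [dict() for _ in range(N)]
    b[N - 1] = {k: 1.0 for k in states}                      # initialization (no end state)
    for i in range(N - 2, -1, -1):
        b[i] = {k: sum(E[j][x[i + 1]] * A[k][j] * b[i + 1][j] for j in states)
                for k in states}
    px = sum(A0[j] * E[j][x[0]] * b[0][j] for j in states)   # termination
    return b, px
```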
36. Computational Complexity
- What is the running time, and space required, for Forward and Backward?
- Time: O(K^2 N)
- Space: O(KN)
- Useful implementation techniques to avoid underflows:
- Viterbi: sums of logs
- Forward/Backward: rescaling at each position by multiplying by a constant
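A sketch of the rescaling idea for the forward pass (Python; names mine): divide the forward values at each position by their sum, and recover log P(x) as the sum of the log scaling constants.

```python
import math

def forward_log_prob(x, states, A, E, A0):
    """log P(x) by the forward algorithm with per-position rescaling (avoids underflow)."""
    f = {k: A0[k] * E[k][x[0]] for k in states}        # unscaled f_k(1)
    log_px = 0.0
    for i in range(1, len(x) + 1):
        c = sum(f.values())                            # scaling constant for position i
        f = {k: v / c for k, v in f.items()}           # rescale so the values sum to 1
        log_px += math.log(c)
        if i < len(x):                                 # recursion to position i+1
            f = {k: E[k][x[i]] * sum(f[j] * A[j][k] for j in states) for k in states}
    return log_px                                      # log P(x) = sum_i log c_i
```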
37. Viterbi, Forward, Backward
VITERBI
- Initialization: V0(0) = 1; Vk(0) = 0, for all k > 0
- Iteration: Vj(i) = ej(xi) maxk Vk(i-1) akj
- Termination: P(x, π*) = maxk Vk(N)
FORWARD
- Initialization: f0(0) = 1; fk(0) = 0, for all k > 0
- Iteration: fj(i) = ej(xi) Σk fk(i-1) akj
- Termination: P(x) = Σk fk(N) ak0
BACKWARD
- Initialization: bk(N) = ak0, for all k
- Iteration: bk(i) = Σj ej(xi+1) akj bj(i+1)
- Termination: P(x) = Σk a0k ek(x1) bk(1)
38. Posterior Decoding
- We can now calculate
- P(πi = k | x) = fk(i) bk(i) / P(x)
- Then, we can ask:
- What is the most likely state at position i of sequence x?
- Define π^ by Posterior Decoding:
- π^i = argmaxk P(πi = k | x)
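Combining the forward and backward passes gives a posterior-decoding sketch (Python, self-contained, same no-end-state convention as above; all names are mine):

```python
def posterior_decode(x, states, A, E, A0):
    """pi^_i = argmax_k P(pi_i = k | x), where P(pi_i = k | x) = f_k(i) b_k(i) / P(x)."""
    N = len(x)
    # Forward pass
    f = [{k: A0[k] * E[k][x[0]] for k in states}]
    for i in range(1, N):
        f.append({k: E[k][x[i]] * sum(f[i - 1][j] * A[j][k] for j in states)
                  for k in states})
    px = sum(f[-1][k] for k in states)
    # Backward pass
    b = [dict() for _ in range(N)]
    b[N - 1] = {k: 1.0 for k in states}
    for i in range(N - 2, -1, -1):
        b[i] = {k: sum(E[j][x[i + 1]] * A[k][j] * b[i + 1][j] for j in states)
                for k in states}
    # Posterior P(pi_i = k | x) and its per-position argmax
    post = [{k: f[i][k] * b[i][k] / px for k in states} for i in range(N)]
    path = [max(states, key=lambda k: post[i][k]) for i in range(N)]
    return path, post
```

As the next slide notes, the resulting path is only locally optimal and may not be a valid state sequence.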
39. Posterior Decoding
- For each state, Posterior Decoding gives us a curve of the likelihood of that state at each position
- That is sometimes more informative than the Viterbi path π*
- Posterior Decoding may give an invalid sequence of states
- Why?
40. Posterior Decoding
[Figure: trellis over positions x1 x2 x3 ... xN, with the posterior P(πi = l | x) highlighted for one state l at position i]
- P(πi = k | x) = Σπ P(π | x) 1(πi = k)
- = Σ{π : πi = k} P(π | x)
41. Posterior Decoding
[Figure: the same trellis, with the posteriors P(πi = l | x) and P(πj = l | x) marked at two positions i and j]
- Example: How do we compute P(πi = l | x)?
- P(πi = l | x) = fl(i) bl(i) / P(x)
42. Applications of the model
- Given a DNA region x,
- the Viterbi algorithm predicts locations of CpG islands
- Given a nucleotide xi (say xi = A),
- the Viterbi parse tells whether xi is in a CpG island in the most likely general scenario
- The Forward/Backward algorithms can calculate
- P(xi is in a CpG island) = P(πi = A+ | x)
- Posterior Decoding can assign locally optimal predictions of CpG islands:
- π^i = argmaxk P(πi = k | x)
43. A model of CpG Islands (2): Transitions
- What about transitions between (+) and (-) states?
- They affect
- Avg. length of a CpG island
- Avg. separation between two CpG islands
[Figure: two regions X and Y; X continues with probability p and switches to Y with probability 1-p; Y continues with probability q and switches to X with probability 1-q]
Length distribution of region X:
P[lX = 1] = 1-p
P[lX = 2] = p(1-p)
...
P[lX = k] = p^(k-1) (1-p)
E[lX] = 1/(1-p): a geometric distribution, with mean 1/(1-p)
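For a hypothetical within-island continuation probability p (the value below is purely illustrative, not from the slides), the geometric length distribution and its mean are easy to check numerically (Python):

```python
# Geometric length distribution of region X: P[l_X = k] = p**(k - 1) * (1 - p)
p = 0.98                                                  # illustrative value only
mean = sum(k * p ** (k - 1) * (1 - p) for k in range(1, 10_000))
print(mean, 1 / (1 - p))                                  # both ~50: E[l_X] = 1/(1-p)
```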