Observatory of Complex Systems
http://lagash.dft.unipa.it
L2 - Hidden Markov Models
Salvatore Miccichè
Università degli Studi di Palermo - Dipartimento di Fisica e Tecnologie Relative
Scuola di Dottorato in Fisica Applicata - XX ciclo
Corso di Bioinformatica, Università degli Studi di Palermo - 17 May 2006
1) Hidden Markov Models
   - motivations
   - formal definition
   - algorithms
   - parameter estimation
   - model topology
2) Applications
   - pairwise alignment using HMMs
   - microarray and HMMs
THEME 1 - Hidden Markov Models
1) Hidden Markov Models
   - motivations
   - formal definition
   - algorithms
   - parameter estimation
   - model topology
[4b] A. Krogh, An introduction to HMM for biological sequences, in Computational Methods in Molecular Biology, Elsevier (1999), ISBN 0444502041, chap. 4
[4] R. Durbin, S. Eddy, A. Krogh, G. Mitchison, Biological Sequence Analysis, Cambridge University Press, Cambridge (2001), ISBN 0521620414, chap. 3
1.1 Motivations
1.1.1 Motivations
Example 1 - CpG islands
Whenever the dinucleotide CG occurs (CpG), the C is chemically modified by methylation (methyl-C). There is a relatively high chance that methyl-C mutates to T, with the consequence that CpG dinucleotides are rarer than expected (under a Bernoullian hypothesis).
However, in some regions of the genome (around promoters and around the start regions of genes) the methylation process is suppressed. Therefore there is an abundance of CpGs (CpG islands). They are a few hundred up to a few thousand bases long.
There are three typical questions:
1) What probabilistic model can we use for CpG-island regions?
2) Given a genomic region, how do we decide whether it comes from a CpG island?
3) Given a genome, how do we find the CpG-island regions in it?
1.1 Motivations
1.1.1 Motivations
Example 2 - Motifs
A short conserved region in a protein sequence
1.1 Motivations
1.1.2 Markov Chains - Answer to question 1)
Let us consider the problem of CpG islands. In this particular case, the simplest model to consider is a Markov model of order 1. We will hereafter refer to this model as a Markov Chain.

P(x) = P(x_L, x_{L-1}, ..., x_2, x_1)
     = P(x_L | x_{L-1}, ..., x_2, x_1) P(x_{L-1} | x_{L-2}, ..., x_2, x_1) ... P(x_1)
     = P(x_L | x_{L-1}) P(x_{L-1} | x_{L-2}) ... P(x_1)

where P(x_L, ..., x_1) is the joint probability of the sequence, the P(x_i | x_{i-1}) are the conditional transition probabilities, and P(x_1) is the marginal distribution of the first symbol.
1.1 Motivations
1.1.2 Markov Chains - Answer to question 1)

acgactgcatcgactgactagagctagcta
Length: 30. Number of dinucleotides: 29. Number of GC: 3. Number of CG: 2.
P(CG) = 2/29 ≈ 0.07
P(GC) = 3/29 ≈ 0.10
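As an illustration (not part of the original slides), a minimal Python sketch of this counting procedure on the sequence above; the function name markov_chain_counts is just a placeholder.

```python
from collections import defaultdict

def markov_chain_counts(seq):
    """Count dinucleotides and estimate first-order transition probabilities P(b | a)."""
    counts = defaultdict(int)
    for a, b in zip(seq, seq[1:]):          # all overlapping dinucleotides
        counts[(a, b)] += 1
    totals = defaultdict(int)
    for (a, _), n in counts.items():
        totals[a] += n
    trans = {(a, b): n / totals[a] for (a, b), n in counts.items()}  # P(b | a)
    return counts, trans

seq = "acgactgcatcgactgactagagctagcta".upper()
counts, trans = markov_chain_counts(seq)
print(len(seq) - 1)                              # 29 dinucleotides
print(counts[("C", "G")], counts[("G", "C")])    # 2 and 3, as on the slide
print(counts[("C", "G")] / (len(seq) - 1))       # ~0.07, the slide's P(CG)
```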
1.1 Motivations
1.1.3 Markov Chains discrimination - Answer to question 2)

If we have putative CpG-island regions, how do we test whether these really are CpG islands?
For CpG-island regions and non-CpG-island regions we can have two distinct Markov Chains. How do we characterize the one for CpG-island regions?
1.1 Motivations
1.1.3 Markov Chains discrimination - Answer to question 2)

Now it is more reasonable to compute the log-odds ratio between the two models:

S(x) = log [ P(x | CpG-island model) / P(x | non-CpG-island model) ]

The higher the score, the higher the probability that the region is a CpG island.
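A hedged sketch of this discrimination step in Python; the two transition dictionaries below are illustrative values, not the actual tables of [4], and a positive score favours the CpG-island model.

```python
import math

def log_odds(seq, trans_plus, trans_minus):
    """Log-odds score of a sequence under the CpG (+) vs non-CpG (-) Markov chains.

    trans_plus / trans_minus map dinucleotides 'XY' to P(Y | X) in each model.
    """
    score = 0.0
    for a, b in zip(seq, seq[1:]):
        score += math.log2(trans_plus[a + b] / trans_minus[a + b])
    return score

# Illustrative (made-up) transition probabilities, only for the pairs used below:
trans_plus  = {"CG": 0.274, "GC": 0.339}   # hypothetical '+' model values
trans_minus = {"CG": 0.078, "GC": 0.246}   # hypothetical '-' model values
print(log_odds("CGCGC", trans_plus, trans_minus))   # positive -> CpG-island-like
```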
1.1 Motivations
1.1.4 Hidden Markov Models - Answer to question 3)

3) Given a genome, how do we find the CpG-island regions in it?

Suppose that we observe a certain nucleotide x (A, C, G, T). We cannot decide whether it belongs to a CpG island (state x+) or to a non-CpG-island region (state x−).
Therefore something is hidden!
Thus, not only do we need transition probabilities between x+ nucleotides or between x− nucleotides, but also transitions between the + and − states!
1.2 Hidden Markov Models

In general we have to deal with:
Sequence of symbols - the nucleotides x (A, C, G, T)
Sequence of states - the analogue of having symbols in the CpG and non-CpG island regions, x+ and x−
Paths - the different realizations, i.e. the state sequences

For the observed symbols CCCG, possible paths are, for example,
(C+ C+ C+ G+), (C− C− C− G−), (C+ C− C+ G−), (C+ C+ C− G+)
1.2 Hidden Markov Models

We will indicate the path with S. The i-th state in path S is called S_i, i = 1, ..., L. Therefore we will need:

Transition probabilities between states: the probability that state j is seen at position i, when state k was observed at position i-1:
a_kj = P(S_i = j | S_{i-1} = k)

Emission probabilities: the probability that a nucleotide b is seen when in state k at position i:
e_k(b) = P(x_i = b | S_i = k)

Example: S = (C+ C+ C− G+), S_3 = C−, L = 4
1.2 Hidden Markov Models

The probability of observing a sequence x together with a given path S is

P(x, S) = P(S_1) ∏_{i=1}^{L} e_{S_i}(x_i) a_{S_i S_{i+1}}
1.2 Hidden Markov Models

Example 1 - CpG island

To illustrate the formula, let us suppose that the following path is true:

S = (C+ G− C− G+)

Here each state emits only one symbol (in general, the situation can be different).

P(CGCG, +−−+) = P(C+) · (e_{C+}(C) a_{C+ G−}) · (e_{G−}(G) a_{G− C−}) · (e_{C−}(C) a_{C− G+}) · (e_{G+}(G) a_{G+ X})

where a_{G+ X} is the transition out of the last state, and
e_{G+}(G) = 1
e_{G−}(G) = 1
e_{C−}(C) = 1
e_{C+}(C) = 1
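A minimal sketch, in Python, of how P(x, S) is computed from the product above for a given path; all numerical values in the example are made up, and the dictionary-based layout of the model is an assumption of this sketch.

```python
def path_probability(symbols, states, start, trans, emit):
    """P(x, S) = P(S_1) * prod_i e_{S_i}(x_i) * a_{S_i S_{i+1}} (no end state here)."""
    p = start[states[0]]
    for i, (x, s) in enumerate(zip(symbols, states)):
        p *= emit[s].get(x, 0.0)                      # emission e_{S_i}(x_i)
        if i + 1 < len(states):
            p *= trans[(s, states[i + 1])]            # transition a_{S_i S_{i+1}}
    return p

# Hypothetical numbers, just to exercise the formula for S = (C+, G-, C-, G+):
start = {"C+": 0.25}
emit = {"C+": {"C": 1.0}, "G-": {"G": 1.0}, "C-": {"C": 1.0}, "G+": {"G": 1.0}}
trans = {("C+", "G-"): 0.05, ("G-", "C-"): 0.30, ("C-", "G+"): 0.05}
print(path_probability("CGCG", ["C+", "G-", "C-", "G+"], start, trans, emit))
```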
1.2 Hidden Markov Models

Example 2 - motif

To illustrate the fact that emission probabilities can be different from 1, let us consider Example 2.

[Figure: a profile-HMM-like model for the motif, with numbered match states (1, 2, 3, ..., 7, 8, 9), an insertion state, emission probabilities e_k(b), transition probabilities a_kj, and the example sequence ATCGACACGTATGGC.]

Here there are five possible states for each position.

CONSENSUS PATH: ACAC--ATC
EXCEPTIONAL PATH: TGCT--AGG
1.2 Hidden Markov Models

CONSENSUS PATH: ACAC--ATC
P(S) = P(A)·P(1→2)·P(C)·P(2→3)·P(A)·P(3→i)·P(C)·P(i→7)·P(A)·P(7→8)·P(T)·P(8→9)·P(C)
     = 0.8 · 1.0 · 0.8 · 1.0 · 0.8 · 0.6 · 0.4 · 0.6 · 1.0 · 1.0 · 0.8 · 1.0 · 0.8
     ≈ 4.7 × 10^-2

EXCEPTIONAL PATH: TGCT--AGG
P(S) = P(T)·P(1→2)·P(G)·P(2→3)·P(C)·P(3→i)·P(T)·P(i→7)·P(A)·P(7→8)·P(G)·P(8→9)·P(G)
     = 0.2 · 1.0 · 0.2 · 1.0 · 0.2 · 0.6 · 0.2 · 0.6 · 1.0 · 1.0 · 0.2 · 1.0 · 0.2
     ≈ 2.3 × 10^-5

(here the factors P(letter) are the emission probabilities e_k(b) and the P(k→j) are the transition probabilities)

The probability associated with the consensus path is about 2000 times higher than the one associated with the exceptional path.
1.2 Hidden Markov Models

However, in general, we do not know the path. In fact, we are usually interested in searching for the Most Probable Path.

Going back to Example 1, the paths (C+ C+ C+ G+), (C− C− C− G−), (C+ C− C+ G−), (C+ C+ C− G+) all give rise to the same realization, CCCG. However, the single paths have different probabilities. Which path S* has the highest probability?
1.3 Algorithms
1.3.1 Viterbi Algorithm

By definition: S* = argmax_S P(x, S)

S* can be found recursively:
1) indicate with V_k(i) the probability of the most probable path ending in state k at position i;
2) then the probability of the most probable path ending in state j at position i+1 is given by
V_j(i+1) = e_j(x_{i+1}) max_k (V_k(i) a_{kj})
1.3 Algorithms
1.3.1 Viterbi Algorithm

Initialization: V_0(0) = 1, V_k(0) = 0 for k > 0.
Recursion: V_l(i) = e_l(x_i) max_k (V_k(i-1) a_{kl}), i = 1, ..., L.
Termination: P(x, S*) = max_k (V_k(L) a_{k0}).
At each step choose the highest coefficient (and keep a pointer for the traceback).

Example 1 - CpG island
i = 1, ..., 4; k, l = 1, ..., 8 (the eight states A±, C±, G±, T±).
For x_2 = G:
V_3(2) = e_3(G) max_k (V_k(1) a_{k3})
with e_3(G) = e_{G+}(G) = 1 and, for instance, a_{23} = P(G+ | C+) = 0.274; the maximum is reached either through V_2(1) a_{23} (coming from C+) or through V_6(1) a_{63} (coming from C−).
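A compact Viterbi sketch in Python. It uses generic start/trans/emit dictionaries and a toy two-state '+'/'−' model with made-up numbers, rather than the slide's eight-state CpG model, and omits the explicit begin/end state.

```python
def viterbi(obs, states, start, trans, emit):
    """Most probable state path for an observed sequence (plain probabilities)."""
    V = [{k: start[k] * emit[k][obs[0]] for k in states}]        # V_k(1)
    back = [{}]
    for i in range(1, len(obs)):
        V.append({})
        back.append({})
        for j in states:
            # V_j(i) = e_j(x_i) * max_k V_k(i-1) * a_kj
            k_best = max(states, key=lambda k: V[i - 1][k] * trans[k][j])
            V[i][j] = emit[j][obs[i]] * V[i - 1][k_best] * trans[k_best][j]
            back[i][j] = k_best
    last = max(states, key=lambda k: V[-1][k])
    path = [last]
    for i in range(len(obs) - 1, 0, -1):                         # traceback
        path.append(back[i][path[-1]])
    return list(reversed(path)), V[-1][last]

# Toy two-state example ('+' = CpG-like, '-' = background), made-up numbers:
states = ["+", "-"]
start = {"+": 0.5, "-": 0.5}
trans = {"+": {"+": 0.7, "-": 0.3}, "-": {"+": 0.2, "-": 0.8}}
emit = {"+": {"C": 0.4, "G": 0.4, "A": 0.1, "T": 0.1},
        "-": {"C": 0.2, "G": 0.2, "A": 0.3, "T": 0.3}}
print(viterbi("CGCGTTAT", states, start, trans, emit))
```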
1.3 Algorithms
1.3.1 Viterbi Algorithm

Therefore the highest probability is associated with the path (C+ G+ C+ G+).

This is somehow expected for biological reasons:
(C− G− C− G−): here P(G− | C−) is smaller than in CpG-island regions;
(C+ G− C+ G−): here the probabilities of switching from + to − are small; ...

By performing the same analysis as before on longer regions, one ends up with long chains characterized by sequences of plus states followed by sequences of minus states. The sequences of plus states are the predicted CpG-island regions.
1.3 Algorithms
1.3.1 Viterbi Algorithm
Example 4 - the dishonest casino
1.3 Algorithms
1.3.2 Forward Algorithm

How do we calculate the probability of observing sequence x in the model?

P(x) = Σ_S P(x, S)

The strategy of summing over all the possible paths might seem unrealistic. However, iterative strategies similar to the Viterbi algorithm seen before turn out to be quite efficient. Let f_k(i) be the probability contributed by all paths from the beginning up to (and including) position i, with the state at position i being k. Then the following recurrence holds:

f_k(i) = e_k(x_i) Σ_j f_j(i-1) a_{jk}
1.3 Algorithms
1.3.2 Forward Algorithm

Initialization: f_0(0) = 1, f_k(0) = 0 for k > 0.
Recursion: f_k(i) = e_k(x_i) Σ_j f_j(i-1) a_{jk}, i = 1, ..., L.
Termination: P(x) = Σ_k f_k(L) a_{k0}.
Time complexity: O(N²L), where N is the number of states and L is the sequence length.
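A sketch of the forward recursion in Python, reusing the same toy two-state model as the Viterbi sketch above (made-up numbers); the explicit end state of the termination step is omitted, so P(x) is simply the sum over the last forward column.

```python
def forward(obs, states, start, trans, emit):
    """Forward table f_k(i) and the total probability P(x) = sum over all paths."""
    f = [{k: start[k] * emit[k][obs[0]] for k in states}]            # f_k(1)
    for i in range(1, len(obs)):
        f.append({k: emit[k][obs[i]] * sum(f[i - 1][j] * trans[j][k] for j in states)
                  for k in states})                                  # forward recursion
    return sum(f[-1][k] for k in states), f                          # termination (no end state)

# toy two-state model, made-up numbers
states = ["+", "-"]
start = {"+": 0.5, "-": 0.5}
trans = {"+": {"+": 0.7, "-": 0.3}, "-": {"+": 0.2, "-": 0.8}}
emit = {"+": {"C": 0.4, "G": 0.4, "A": 0.1, "T": 0.1},
        "-": {"C": 0.2, "G": 0.2, "A": 0.3, "T": 0.3}}
print(forward("CGCGTTAT", states, start, trans, emit)[0])
```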
1.3 Algorithms
1.3.3 Posterior Probability

By definition, this is the probability that observation x_i came from state k, given the observed sequence x:

P(S_i = k | x)

One can show that

P(S_i = k | x) = f_k(i) b_k(i) / P(x), where b_k(i) = P(x_{i+1}, ..., x_L | S_i = k)
1.3 Algorithms
1.3.4 Backward Algorithm

How do we calculate the quantities b_k(i) (and, once more, the probability P(x) = Σ_S P(x, S) of observing sequence x in the model)?

Let b_k(i) be the probability contributed by all paths from the end that pass through state k at position i:

b_k(i) = P(x_{i+1}, ..., x_L | S_i = k)

Then, analogously to the forward algorithm:

Backward algorithm
Initialization: b_k(L) = a_{k0} for all k.
Recursion: b_k(i) = Σ_j a_{kj} e_j(x_{i+1}) b_j(i+1), i = L-1, ..., 1.
Termination: P(x) = Σ_k a_{0k} e_k(x_1) b_k(1).
Time complexity: O(N²L), where N is the number of states and L is the sequence length.
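A matching sketch of the backward recursion (Python, same toy model as above, no explicit end state); combining its table with the forward table gives the posterior probability of Section 1.3.3.

```python
def backward(obs, states, start, trans, emit):
    """Backward table b_k(i) = P(x_{i+1}..x_L | S_i = k), plus P(x) as a cross-check."""
    L = len(obs)
    b = [dict.fromkeys(states, 1.0)]                     # b_k(L) = 1 (no explicit end state)
    for i in range(L - 2, -1, -1):                       # backward recursion
        b.insert(0, {k: sum(trans[k][j] * emit[j][obs[i + 1]] * b[0][j] for j in states)
                     for k in states})
    px = sum(start[k] * emit[k][obs[0]] * b[0][k] for k in states)   # termination
    return b, px

# With the forward table f_k(i) of the previous sketch, the posterior of 1.3.3 is
#   P(S_i = k | x) = f_k(i) * b_k(i) / P(x)
states = ["+", "-"]
start = {"+": 0.5, "-": 0.5}
trans = {"+": {"+": 0.7, "-": 0.3}, "-": {"+": 0.2, "-": 0.8}}
emit = {"+": {"C": 0.4, "G": 0.4, "A": 0.1, "T": 0.1},
        "-": {"C": 0.2, "G": 0.2, "A": 0.3, "T": 0.3}}
print(backward("CGCGTTAT", states, start, trans, emit)[1])
```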
1.4 Parameter Estimation

So far we supposed that:
1) the parameters e_k(b) are known;
2) the parameters a_jk are known;
3) the connections between states are known.
How can we estimate these features of the model?
1.4 Parameter Estimation
1.4.1 Estimation when the state sequence is known

In this case the estimation of a_kj is the same as in a simple Markov Chain:

a_kj = A_kj / Σ_{j'} A_kj'        e_k(b) = E_k(b) / Σ_{b'} E_k(b')

where A_kj is the number of transitions between state k and state j, and E_k(b) is the count of emissions of b in state k.
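A small Python sketch of this counting-based (maximum-likelihood) estimation; the labelled training data at the bottom are invented for illustration.

```python
from collections import Counter, defaultdict

def estimate_from_labelled(symbols, states):
    """ML estimates a_kj = A_kj / sum_j' A_kj' and e_k(b) = E_k(b) / sum_b' E_k(b')."""
    A = Counter(zip(states, states[1:]))             # transition counts A_kj
    E = Counter(zip(states, symbols))                # emission counts E_k(b)
    a_tot = defaultdict(int)
    e_tot = defaultdict(int)
    for (k, _), n in A.items(): a_tot[k] += n
    for (k, _), n in E.items(): e_tot[k] += n
    a = {kj: n / a_tot[kj[0]] for kj, n in A.items()}
    e = {kb: n / e_tot[kb[0]] for kb, n in E.items()}
    return a, e

# hypothetical labelled training data: symbols together with their (known) +/- states
a, e = estimate_from_labelled("CGCGTAT", ["+", "+", "+", "+", "-", "-", "-"])
print(a, e)
```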
1.4 Parameter Estimation
1.4.2 Estimation when the state sequence is unknown

In this case there exists a quite general algorithm, based on the existence of M training sequences that are already known to be well described by the model. This is the Baum-Welch algorithm.

STEP 1 - Initialization: pick arbitrary model parameters a_kj and e_k(b).

STEP 2 - Recurrence: DO w = 1, M
  i) compute the f_k(i) as seen before for training sequence w;
  ii) compute the b_k(i) as seen before for training sequence w;
1.4 Parameter Estimation
1.4.2 Estimation when the state sequence is unknown

  iii) compute the probability that the transition a_kl is used at position i in sequence x:
       P(S_i = k, S_{i+1} = l | x) = f_k(i) a_kl e_l(x_{i+1}) b_l(i+1) / P(x)
END DO

STEP 3 - Counts estimation: sum these posterior probabilities over all positions and all training sequences to obtain the expected counts A_kl and E_k(b).
1.4 Parameter Estimation
1.4.2 Estimation when the state sequence is unknown

STEP 4 - Probability estimation: re-normalize the expected counts into new parameters a_kl and e_k(b), as in the known-path case.

STEP 5 - Compute the likelihood: the sum over w of the (log-)probability P(x^w) computed with the above parameters.

STEP 6 - Termination
If the change in the likelihood is smaller than some pre-fixed threshold value, then stop the procedure. Otherwise repeat the procedure with the newly estimated probabilities.
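A hedged sketch of one Baum-Welch iteration in Python for a single training sequence (no pseudocounts, no explicit begin/end state, made-up toy model); STEPS 5-6 would simply repeat this update until P(x) stops improving.

```python
def baum_welch_step(obs, states, start, trans, emit):
    """One Baum-Welch (EM) iteration on one sequence; returns updated (trans, emit, P(x))."""
    L = len(obs)
    # STEP 2 i): forward
    f = [{k: start[k] * emit[k][obs[0]] for k in states}]
    for i in range(1, L):
        f.append({k: emit[k][obs[i]] * sum(f[i - 1][j] * trans[j][k] for j in states)
                  for k in states})
    px = sum(f[-1][k] for k in states)
    # STEP 2 ii): backward
    b = [dict.fromkeys(states, 1.0)]
    for i in range(L - 2, -1, -1):
        b.insert(0, {k: sum(trans[k][j] * emit[j][obs[i + 1]] * b[0][j] for j in states)
                     for k in states})
    # STEP 2 iii) + STEP 3: expected counts A_kl and E_k(b)
    A = {k: {l: 0.0 for l in states} for k in states}
    E = {k: {} for k in states}
    for i in range(L):
        for k in states:
            post = f[i][k] * b[i][k] / px                          # P(S_i = k | x)
            E[k][obs[i]] = E[k].get(obs[i], 0.0) + post
            if i + 1 < L:
                for l in states:
                    # P(S_i = k, S_{i+1} = l | x) = f_k(i) a_kl e_l(x_{i+1}) b_l(i+1) / P(x)
                    A[k][l] += f[i][k] * trans[k][l] * emit[l][obs[i + 1]] * b[i + 1][l] / px
    # STEP 4: re-normalize expected counts into new probabilities
    new_trans = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    new_emit = {k: {s: n / sum(E[k].values()) for s, n in E[k].items()} for k in states}
    return new_trans, new_emit, px

# toy model and data (all numbers made up)
states = ["+", "-"]
start = {"+": 0.5, "-": 0.5}
trans = {"+": {"+": 0.7, "-": 0.3}, "-": {"+": 0.2, "-": 0.8}}
emit = {"+": {"C": 0.4, "G": 0.4, "A": 0.1, "T": 0.1},
        "-": {"C": 0.2, "G": 0.2, "A": 0.3, "T": 0.3}}
print(baum_welch_step("CGCGCGTATATT", states, start, trans, emit)[2])
```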
1.4 Parameter Estimation
1.4.2 Estimation when the state sequence is unknown

Problems

The likelihood might converge to a local maximum rather than to the global maximum. This problem is more pronounced if the sequences admit many paths that are close, in probability, to the most probable path S*.
1.5 Model Topology

We have seen how to estimate the parameters e_k(b) and a_jk. We are left with the problem of the connections between states.

One approach is to allow all the transitions. By using the algorithms seen before, one can check a posteriori whether some connections are less favoured than others.

Alternative approaches [4]:
- duration modelling
- silent states
THEME 2 - Applications of Hidden Markov Models

2) Applications
   - pairwise alignment using HMMs [4]
   - microarray and HMMs [5,6]

[5] M.F. Ramoni et al., PNAS, 99, 9121-9126 (2002) - 49 citations
[6] A. Schliep et al., Bioinformatics, 19 suppl. 1, i255-i263 (2003)
[7] T.C. Mockler, J.R. Ecker, Genomics, 85, 1-15 (2005) - 25 citations
[8] W. Li, C.A. Meyer, X.S. Liu, Bioinformatics, 21 suppl. 1, i274-i282 (2005)
[4] R. Durbin, S. Eddy, A. Krogh, G. Mitchison, Biological Sequence Analysis, Cambridge University Press, Cambridge (2001), ISBN 0521620414, chap. 2, 4
2.1 Pairwise alignments using HMMs
2.1.1 Pairwise alignments: scoring model

Let us suppose we have two sequences X and Y of length n and m respectively:
X = x_1, ..., x_n
Y = y_1, ..., y_m

We want to assign a score to the alignment that gives a measure of the relative likelihood that the sequences are related, as opposed to being unrelated.

The idea is to specify a probabilistic model that describes the divergence of the two sequences from a common ancestor.
2.1 Pairwise alignments using HMMs
2.1.1 Pairwise alignments: scoring model

Random divergence model R:
P(X, Y | R) = ∏_i q_{x_i} ∏_j q_{y_j}
where q_a is the occurrence frequency of base a.

Non-random divergence (match) model M:
P(X, Y | M) = ∏_i p_{x_i y_i}
where p_ab is the probability that a and b have each independently been derived from some original base of an (unknown) ancestor sequence.
2.1 Pairwise alignments using HMMs
2.1.1 Pairwise alignments: scoring model

Odds ratio:
P(X, Y | M) / P(X, Y | R) = ∏_i p_{x_i y_i} / (q_{x_i} q_{y_i})

Log-odds ratio:
S = Σ_i s(x_i, y_i), with s(a, b) = log [ p_ab / (q_a q_b) ]

S is the score we are looking for!
2.1 Pairwise alignments using HMMs
2.1.1 Pairwise alignments: scoring model

p_ab is evaluated starting from some pre-defined substitution matrix (BLOSUM50, PAM, ...).

In the case of gaps, one must also consider cost functions that penalize gaps according to their extension, e.g.

γ(g) = -g d        (linear)
or
γ(g) = -d - (g - 1) e        (affine)

where d is the gap-open penalty, e is the extension penalty and g is the length of the gap.
The form of γ(g) depends on the choice of the cost function.
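In code form (Python, with illustrative parameter values), the two cost functions read:

```python
def linear_gap(g, d):
    """Linear cost: gamma(g) = -g * d (every gapped position costs d)."""
    return -g * d

def affine_gap(g, d, e):
    """Affine cost: gamma(g) = -d - (g - 1) * e (open once, extend more cheaply)."""
    return -d - (g - 1) * e

print(linear_gap(3, 8), affine_gap(3, 8, 1))   # e.g. d = 8, e = 1 (made-up values)
```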
2.1 Pairwise alignments using HMMs
2.1.2 Pairwise alignments: alignment algorithm

Given the scoring model, we need an algorithm for finding the optimal (i.e. S-maximizing) alignment of the two sequences.

There are roughly binomial(2n, n) ≈ 2^{2n} / sqrt(π n) possible global alignments with gaps between two sequences of length n.

The magic word is dynamic programming!
2.1 Pairwise alignments using HMMs
2.1.2 Needleman-Wunsch for global alignment (linear case)

Given X and Y, construct the two families of sub-sequences:
X_i = x_1, ..., x_i    i = 1, ..., n
Y_j = y_1, ..., y_j    j = 1, ..., m

The idea is to recursively determine a matrix of elements F(i,j) for each pair (i,j), where F(i,j) is defined as the score of the best alignment of the two sub-sequences X_i and Y_j:

F(i,j) = max { F(i-1,j-1) + s(x_i, y_j), F(i-1,j) - d, F(i,j-1) - d }, with F(0,0) = 0,

where s(x_i, y_j) is the substitution score and d the (linear) gap penalty.
2.1 Pairwise alignments using HMMs
2.1.2 Needleman-Wunsch for global alignment (linear case)

F(n,m) is the score we are looking for. The alignment itself is recovered with a TRACEBACK procedure.

(Mantegna - Lecture 2)
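A sketch of the Needleman-Wunsch algorithm with a linear gap penalty, including the traceback (Python; the substitution score and the penalty value in the example are made up).

```python
def needleman_wunsch(x, y, s, d):
    """Global alignment with linear gap penalty d; returns best score and one alignment."""
    n, m = len(x), len(y)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1): F[i][0] = -i * d
    for j in range(1, m + 1): F[0][j] = -j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),   # match/mismatch
                          F[i - 1][j] - d,                            # gap in y
                          F[i][j - 1] - d)                            # gap in x
    # traceback from F(n, m)
    ax, ay, i, j = "", "", n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax, ay, i, j = x[i - 1] + ax, y[j - 1] + ay, i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] - d:
            ax, ay, i = x[i - 1] + ax, "-" + ay, i - 1
        else:
            ax, ay, j = "-" + ax, y[j - 1] + ay, j - 1
    return F[n][m], ax, ay

# toy score: +2 for a match, -1 for a mismatch, gap penalty d = 2 (all made up)
score = lambda a, b: 2 if a == b else -1
print(needleman_wunsch("GATTACA", "GCATGCT", score, 2))
```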
2.1 Pairwise alignments using HMMs
2.1.2 Smith-Waterman for local alignment (linear case)
This algorithm is used when one is looking for
the best alignment between parts of two given
sequences.
2.1 Pairwise alignments using HMMs
2.1.2 Needleman-Wunsch for global alignment (affine case)

In the case when the cost function is γ(g) = -d - (g - 1) e, the NW algorithm must be modified: rather than a single score function F(i,j), we now have to deal with

M(i,j): best score up to (i,j) given that x_i is aligned to y_j. It is the analogue of F(i,j).
I_X(i,j): best score given that x_i is aligned to a gap (x_i lies in an insertion with respect to Y).
I_Y(i,j): best score given that y_j is in an insertion with respect to X.
2.1 Pairwise alignments using HMMs
2.1.2 Needleman-Wunsch for global alignment (affine case)

The recursions (cf. [4], chap. 2) are:
M(i,j) = s(x_i, y_j) + max { M(i-1,j-1), I_X(i-1,j-1), I_Y(i-1,j-1) }
I_X(i,j) = max { M(i-1,j) - d, I_X(i-1,j) - e }
I_Y(i,j) = max { M(i,j-1) - d, I_Y(i,j-1) - e }
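A sketch of the corresponding affine-gap dynamic programming (Python, score only, no traceback; the toy substitution score and the penalties d, e are made up).

```python
NEG = float("-inf")

def global_affine_score(x, y, s, d, e):
    """Best global-alignment score with affine gaps, using the M / I_X / I_Y matrices."""
    n, m = len(x), len(y)
    M  = [[NEG] * (m + 1) for _ in range(n + 1)]
    IX = [[NEG] * (m + 1) for _ in range(n + 1)]
    IY = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1): IX[i][0] = -d - (i - 1) * e     # leading gap in y
    for j in range(1, m + 1): IY[0][j] = -d - (j - 1) * e     # leading gap in x
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j]  = s(x[i - 1], y[j - 1]) + max(M[i - 1][j - 1],
                                                   IX[i - 1][j - 1], IY[i - 1][j - 1])
            IX[i][j] = max(M[i - 1][j] - d, IX[i - 1][j] - e)  # x_i aligned to a gap
            IY[i][j] = max(M[i][j - 1] - d, IY[i][j - 1] - e)  # y_j aligned to a gap
    return max(M[n][m], IX[n][m], IY[n][m])

score = lambda a, b: 2 if a == b else -1                      # toy substitution score
print(global_affine_score("GATTACA", "GCATGCT", score, d=4, e=1))
```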
2.1 Pairwise alignments using HMMs
2.1.3 Pairwise alignment and HMMs

One can interpret the transition from a base (or gap) in X to a base (or gap) in Y as a hidden Markov process.

[Figure: three-state diagram with states M, I_X and I_Y; diagonal (match) moves score s(x_i, y_j), opening a gap costs -d (transition from M to I_X or I_Y), extending a gap costs -e (self-transitions of I_X and I_Y).]
2.1 Pairwise alignments using HMMs
2.1.3 Pairwise alignment and HMMs

State M has: emission probability p_ab (base-to-base); transition probability δ to each insert state (base-to-insert); self-transition probability 1 - 2δ (remaining in M).

State I_X has: emission probability q_a (base-to-gap); transition probability 1 - ε back to M (insert-to-base); self-transition probability ε (remaining in I_X).

State I_Y has: emission probability q_a (base-to-gap); transition probability 1 - ε back to M (insert-to-base); self-transition probability ε (remaining in I_Y).
2.1 Pairwise alignments using HMMs
2.1.3 Pairwise alignment and HMMs

This is referred to as a pair Hidden Markov Model.
2.1 Pairwise alignments using HMMs
2.1.3 Pairwise alignment and HMMs

Just as a standard HMM can generate a sequence, a pair HMM can generate a pair of sequences.

The Viterbi algorithm is the analogue of the recursions of the previous slides, with initialization
M(0,0) = 0, I_X(i,0) = 0, I_Y(0,j) = 0.
τ is the transition probability from/to the initial/final state.
2.1 Pairwise alignments using HMMs
2.1.3 Pairwise alignment and HMMs

The link between the HMM approach and the dynamic programming approach: the substitution scores and gap penalties of the dynamic-programming formulation can be written as log-odds ratios of the pair-HMM emission and transition probabilities (e.g. s(a,b) involves log [ p_ab / (q_a q_b) ], while d and e derive from the logarithms of δ and ε).
2.2 Microarray and HMMs
2.2.1 Time Course Analysis

When microarray experiments are performed consecutively in time, we call this experimental setting a time course of gene expression.

There are a number of approaches to analyze such experiments:
CLASS 1: approaches that do not consider any dependencies between profiles corresponding to subsequent time-points (hierarchical clustering, K-means, ...).
CLASS 2: approaches that consider such dependencies as important (model-based clustering) [5].

One possibility is that such a model is just a Hidden Markov Model [6].
2.2 Microarray and HMMs
2.2.2 Tiling Arrays

Tiling arrays are high-density oligonucleotide-based whole-genome microarrays that have recently emerged as a preferred platform for genomic analysis beyond simple gene expression profiling.

While standard microarrays make use of relatively few probes for each gene and are therefore biased toward known and predicted gene structures, with tiling arrays one can explore the whole genome by using non-overlapping or partially overlapping (100-mer) sequences [7].

In [8], HMMs are used to infer the hybridization state.
The End
ADDITIONAL 1
1.2 Hidden Markov Models

Example 3 (different from Example 1)

The probability that sequence x is emitted by a state path S is

P(x, S) = ∏_{i=1}^{L} e_{S_i}(x_i) a_{S_i S_{i+1}}

i = 1 2 3 4 5 6 7 8 9
x = T G C G C G T A C
S = -----

P(x, S) = 0.338 · 0.70 · 0.112 · 0.30 · 0.368 · 0.65 · 0.274 · 0.65 · 0.368 · 0.65 · 0.274 · 0.35 · 0.338 · 0.70 · 0.372 · 0.70 · 0.198

Then, the probability to observe sequence x in the model is P(x) = Σ_S P(x, S), which is also called the likelihood of the model.