Title: Markov Chains
1. Markov Chains

2. Dependencies along the genome

- In previous classes we assumed every letter in a sequence is sampled independently from some distribution q(·) over the alphabet {A, C, T, G}.
- This model may suffice for alignment scoring, but it does not hold in real genomes.
- There are special subsequences in the genome, such as TATA within the regulatory region upstream of a gene.
- The pair C followed by G is less common than expected under independent random sampling.
- We model such dependencies by Markov chains and hidden Markov models, which we define next.
3. Finite Markov Chain

- An integer-time stochastic process, consisting of:
- A domain D of m states {s1, …, sm},
- An m-dimensional initial distribution vector (p(s1), …, p(sm)), and
- An m×m transition probability matrix M = (a_st).
- For example, D can be the letters {A, C, T, G}, p(A) the probability that A is the 1st letter in a sequence, and a_AG the probability that G follows A in a sequence.
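The three ingredients above (states, initial distribution, transition matrix) are enough to generate sequences. A minimal sketch in Python, with made-up transition probabilities (the low C→G entry is illustrative, echoing the CG depletion mentioned earlier):

```python
import random

# States and an illustrative (made-up) transition matrix: trans[s][t] = P(next = t | current = s).
states = ["A", "C", "G", "T"]
trans = {
    "A": {"A": 0.3, "C": 0.2, "G": 0.3, "T": 0.2},
    "C": {"A": 0.3, "C": 0.3, "G": 0.1, "T": 0.3},  # illustrative low C -> G probability
    "G": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "T": {"A": 0.2, "C": 0.2, "G": 0.3, "T": 0.3},
}
init = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # initial distribution p(s)

def sample_chain(length, seed=0):
    """Sample a sequence: first letter from init, each later letter from trans given the previous one."""
    rng = random.Random(seed)
    seq = rng.choices(states, weights=[init[s] for s in states])
    for _ in range(length - 1):
        prev = seq[-1]
        seq += rng.choices(states, weights=[trans[prev][t] for t in states])
    return "".join(seq)

seq = sample_chain(20)
```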
4. Simple Model - Markov Chains

Markov Property: the state of the system at time t+1 depends only on the state of the system at time t:

P(X_{t+1} = x | X_1, …, X_t) = P(X_{t+1} = x | X_t)

[Figure: a chain X1 → X2 → X3 → X4 → X5]
5. Markov Chain (cont.)

Similarly, (X1, …, Xi, …) is a sequence of probability distributions over D.
6. Matrix Representation

The transition probability matrix M = (a_st). M is a stochastic matrix: every row sums to 1.
The initial distribution vector (u1, …, um) defines the distribution of X1: p(X1 = si) = ui.
Then after one move, the distribution changes to X2 = X1 M, and after i - 1 moves the distribution is Xi = X1 M^(i-1).
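The matrix form Xi = X1 M^(i-1) can be checked directly. A small sketch with a generic (made-up) 2-state stochastic matrix:

```python
import numpy as np

# A generic 2-state stochastic matrix M (rows sum to 1) and the distribution u of X1.
M = np.array([[0.7, 0.3],
              [0.4, 0.6]])
u = np.array([1.0, 0.0])  # X1 is concentrated on the first state

def distribution_of_X(i):
    """Distribution of X_i, namely u @ M^(i-1)."""
    return u @ np.linalg.matrix_power(M, i - 1)

d2 = distribution_of_X(2)  # equals u @ M
d3 = distribution_of_X(3)  # equals u @ M^2
```

Note that each d_i is again a probability vector: multiplying by a stochastic matrix preserves the total mass of 1.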
7. Simple Example

Weather transitions:
- raining today → rain tomorrow: p_rr = 0.4
- raining today → no rain tomorrow: p_rn = 0.6
- no rain today → rain tomorrow: p_nr = 0.2
- no rain today → no rain tomorrow: p_nn = 0.8
8. Simple Example

Transition matrix for the example:

    M = | 0.4  0.6 |
        | 0.2  0.8 |

Note that the rows sum to 1. Such a matrix is called a stochastic matrix. If both the rows and the columns of a matrix all sum to 1, we have a doubly stochastic matrix.
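The stochastic / doubly stochastic distinction is easy to verify numerically; a sketch using the weather matrix above:

```python
import numpy as np

# Weather example: state 0 = rain, state 1 = no rain.
M = np.array([[0.4, 0.6],
              [0.2, 0.8]])

row_sums = M.sum(axis=1)  # each row sums to 1 -> stochastic
col_sums = M.sum(axis=0)  # columns sum to 0.6 and 1.4 -> NOT doubly stochastic

is_stochastic = np.allclose(row_sums, 1.0)
is_doubly_stochastic = is_stochastic and np.allclose(col_sums, 1.0)
```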
9. Gambler's Example

At each play we have the following:
- The gambler wins $1 with probability p.
- The gambler loses $1 with probability 1 - p.
The game ends when the gambler goes broke, or gains a fortune of $100. Both 0 and $100 are absorbing states.
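The gambler's chain can be simulated directly; the absorbing states are exactly where the loop stops. A minimal sketch (start fortune and goal are illustrative parameters):

```python
import random

def gamblers_ruin(start=50, p=0.5, goal=100, seed=1):
    """Simulate one game: fortune moves +1 with probability p, -1 otherwise,
    until it hits an absorbing state (0 or goal)."""
    rng = random.Random(seed)
    fortune = start
    while 0 < fortune < goal:
        fortune += 1 if rng.random() < p else -1
    return fortune  # either 0 (broke) or goal

result = gamblers_ruin()
```

Starting already at 0 or at the goal, the loop never runs: those states are absorbing by construction.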
10. Coke vs. Pepsi

Given that a person's last cola purchase was Coke, there is a 90% chance that her next cola purchase will also be Coke. If a person's last cola purchase was Pepsi, there is an 80% chance that her next cola purchase will also be Pepsi.
11. Coke vs. Pepsi
Given that a person is currently a Pepsi
purchaser, what is the probability that she will
purchase Coke two purchases from now?
12. Coke vs. Pepsi
Given that a person is currently a Coke drinker,
what is the probability that she will purchase
Pepsi three purchases from now?
13. Coke vs. Pepsi

Assume each person makes one cola purchase per week. Suppose 60% of all people now drink Coke, and 40% drink Pepsi. What fraction of people will be drinking Coke three weeks from now?

Let (Q0, Q1) = (0.6, 0.4) be the initial probabilities. We will regard Coke as 0 and Pepsi as 1. We want to find P(X3 = 0).
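The three Coke/Pepsi questions above all reduce to powers of the transition matrix. A minimal sketch (states ordered Coke = 0, Pepsi = 1):

```python
import numpy as np

# Transition matrix from the stated brand loyalties.
P = np.array([[0.9, 0.1],   # Coke  -> (Coke, Pepsi)
              [0.2, 0.8]])  # Pepsi -> (Coke, Pepsi)

P2 = np.linalg.matrix_power(P, 2)
P3 = np.linalg.matrix_power(P, 3)

pepsi_to_coke_in_2 = P2[1, 0]   # slide 11: Pepsi now, Coke two purchases from now
coke_to_pepsi_in_3 = P3[0, 1]   # slide 12: Coke now, Pepsi three purchases from now

Q0 = np.array([0.6, 0.4])       # current market shares
coke_share_in_3 = (Q0 @ P3)[0]  # slide 13: P(X3 = 0)
```

Working the numbers through, the two-step Pepsi→Coke probability is 0.2·0.9 + 0.8·0.2 = 0.34.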
14. Good Markov chains

For certain Markov chains, the distributions Xi, as i → ∞, (1) converge to a unique distribution, independent of the initial distribution, and (2) in that unique distribution, each state has a positive probability. Call these Markov chains "good". We describe these good Markov chains by considering the graph representation of stochastic matrices.
15. Representation as a Digraph

Each directed edge A → B is associated with the positive transition probability from A to B.
- We now define properties of this graph which guarantee:
- Convergence to a unique distribution.
- In that distribution, each state has positive probability.
16. Examples of Bad Markov Chains

- Markov chains are not good if either:
- They do not converge to a unique distribution, or
- They do converge to a unique distribution, but some states in this distribution have zero probability.
17. Bad case 1: Mutual Unreachability

- Consider two initial distributions:
- a) p(X1 = A) = 1 (p(X1 = x) = 0 if x ≠ A).
- b) p(X1 = C) = 1.

In case a), the sequence will stay at A forever. In case b), it will stay in {C, D} forever.

Fact 1: If G has two states which are unreachable from each other, then Xi cannot converge to a distribution which is independent of the initial distribution.
18. Bad case 2: Transient States

Def: A state s is recurrent if it can be reached from any state reachable from s; otherwise it is transient.

A and B are transient states; C and D are recurrent states. Once the process moves from B to D, it will never come back.
19. Bad case 2: Transient States
Fact 2 For each initial distribution, with
probability 1 a transient state will be visited
only a finite number of times.
20. Bad case 3: Periodic States

A state s has period k if k is the GCD of the lengths of all the cycles that pass via s. A Markov chain is periodic if all the states in it have a period k > 1; it is aperiodic otherwise.

Example: Consider the initial distribution p(B) = 1. Then states B, C are visited (with positive probability) only in odd steps, and states A, D, E only in even steps.
21. Bad case 3: Periodic States

Fact 3: In a periodic Markov chain (of period k > 1) there are initial distributions under which the states are visited in a periodic manner. Under such initial distributions Xi does not converge as i → ∞.
22. Ergodic Markov Chains

- A Markov chain is ergodic if:
- All states are recurrent (i.e., the graph is strongly connected), and
- It is not periodic.
- The Fundamental Theorem of Finite Markov Chains:
- If a Markov chain is ergodic, then
- It has a unique stationary distribution vector V > 0, which is a left eigenvector of the transition matrix (with eigenvalue 1), and
- The distributions Xi, as i → ∞, converge to V.
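The theorem suggests a direct computation: find the left eigenvector of M for eigenvalue 1 and normalize it. A sketch using the weather chain from the earlier example (which is ergodic):

```python
import numpy as np

# Weather chain: stationary V satisfies V @ M = V, with V > 0 and sum(V) = 1.
M = np.array([[0.4, 0.6],
              [0.2, 0.8]])

# A left eigenvector of M is a right eigenvector of M.T; pick the one for eigenvalue 1.
vals, vecs = np.linalg.eig(M.T)
v = np.real(vecs[:, np.argmax(np.real(vals))])
V = v / v.sum()  # normalize to a probability vector

# Convergence check: after many steps the rows of M^i approach V.
limit = np.linalg.matrix_power(M, 50)[0]
```

For this matrix the stationary distribution works out to V = (0.25, 0.75): in the long run it rains a quarter of the days, regardless of today's weather.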
23. Use of Markov Chains in Genome Search: Modeling CpG Islands

In human genomes the pair CG often transforms to (methyl-C)G, which often transforms to TG. Hence the pair CG appears less often than expected from the independent frequencies of C and G alone. For biological reasons, this process is sometimes suppressed in short stretches of the genome, such as in the start regions of many genes. These areas are called CpG islands (the p denotes the phosphodiester bond between the C and the G).
24. Example: CpG Island (Cont.)

We consider two questions (and some variants):
Question 1: Given a short stretch of genomic data, does it come from a CpG island?
Question 2: Given a long piece of genomic data, does it contain CpG islands, and if so, where and of what length?
We solve the first question by modeling strings with and without CpG islands as Markov chains over the same states {A, C, G, T} but with different transition probabilities.
25. Example: CpG Island (Cont.)

The "+" model: use transition matrix A+ = (a+_st), where a+_st = (the probability that t follows s in a CpG island).
The "-" model: use transition matrix A- = (a-_st), where a-_st = (the probability that t follows s in a non-CpG island).
26. Example: CpG Island (Cont.)

With this model, to solve Question 1 we need to decide whether a given short sequence of letters is more likely to come from the "+" model or from the "-" model. This is done using the definitions of Markov chains.
To solve Question 2 we need to decide which parts of a given long sequence of letters are more likely to come from the "+" model, and which parts are more likely to come from the "-" model. This is done using the Hidden Markov Model, to be defined later.
We start with Question 1.
27. Question 1: Using two Markov chains

A+ (for CpG islands):
We need to specify p+(xi | xi-1), where + stands for CpG island. From Durbin et al. we have:

[Table: transition probabilities a+_st, rows indexed by xi-1, columns by xi]

(Recall rows must add up to one; columns need not.)
28. Question 1: Using two Markov chains

A- (for non-CpG islands):
And for p-(xi | xi-1) (where - stands for non-CpG island) we have:

[Table: transition probabilities a-_st, rows indexed by xi-1, columns by xi]
29. Discriminating between the two models

Given a string x = (x1, …, xL), compute the ratio

RATIO = p(x | model +) / p(x | model -) = ∏_{i=1..L} p+(xi | xi-1) / ∏_{i=1..L} p-(xi | xi-1)

If RATIO > 1, a CpG island is more likely. In practice the log of this ratio is computed.
Note: p(x1 | x0) is defined for convenience as p+(x1), and p-(x1 | x0) as p-(x1).
30. Log Odds-Ratio test

Let Q denote the ratio above. If log Q > 0, then "+" is more likely (CpG island). If log Q < 0, then "-" is more likely (non-CpG island).
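A minimal sketch of the log-odds test in Python. The transition tables here are made up purely for illustration (the only deliberate feature is that C→G is common in the "+" table and rare in the "-" table); real values would be estimated from data, e.g. the tables in Durbin et al.:

```python
import math

STATES = "ACGT"

# Made-up transition probabilities (rows sum to 1); NOT the Durbin et al. values.
plus_trans = {s: {t: 0.25 for t in STATES} for s in STATES}
plus_trans["C"] = {"A": 0.15, "C": 0.30, "G": 0.30, "T": 0.25}   # CG common in islands

minus_trans = {s: {t: 0.25 for t in STATES} for s in STATES}
minus_trans["C"] = {"A": 0.30, "C": 0.30, "G": 0.05, "T": 0.35}  # CG rare outside

def log_odds(x, p_plus_init=0.25, p_minus_init=0.25):
    """log Q = log[p+(x) / p-(x)]; p(x1 | x0) is taken to be the initial probability."""
    score = math.log(p_plus_init) - math.log(p_minus_init)
    for prev, cur in zip(x, x[1:]):
        score += math.log(plus_trans[prev][cur]) - math.log(minus_trans[prev][cur])
    return score

score = log_odds("ACGCGCG")  # several CG pairs push the score positive
```

Working in log space avoids numerical underflow from multiplying many small probabilities, which is why the log of the ratio is computed in practice.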
31. Where do the parameters (transition probabilities) come from?

- Learning from complete data, namely, when the label is given and every xi is measured.

Source: A collection of sequences from CpG islands, and a collection of sequences from non-CpG islands.
Input: Tuples of the form (x1, …, xL, h), where h is + or -.
Output: Maximum likelihood estimates (MLE) of the parameters.

Count all pairs (Xi = a, Xi-1 = b) with label +, and with label -; say the counts are N_{ba,+} and N_{ba,-}.
32. Maximum Likelihood Estimate (MLE) of the parameters (using labeled data)

The needed parameters are p+(x1), p+(xi | xi-1), p-(x1), and p-(xi | xi-1). The ML estimates are given by the normalized counts, e.g.

a+_{ba} = N_{ba,+} / Σ_{a'} N_{ba',+}

and similarly for the "-" model.
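The counting-and-normalizing step can be sketched as follows; the two toy sequences stand in for a real collection of labeled CpG-island sequences:

```python
from collections import defaultdict

def mle_transitions(sequences):
    """MLE of a_st from labeled sequences of one class: a_st = N_st / sum over t' of N_st'."""
    counts = defaultdict(lambda: defaultdict(int))
    for x in sequences:
        for prev, cur in zip(x, x[1:]):   # count every adjacent pair
            counts[prev][cur] += 1
    est = {}
    for s, row in counts.items():
        total = sum(row.values())         # normalize each row by its total count
        est[s] = {t: n / total for t, n in row.items()}
    return est

# Toy labeled data for the "+" class; real input would be many island sequences.
plus_est = mle_transitions(["ACGCG", "CGCGT"])
```

The same function applied to the "-" collection yields the "-" model; each estimated row sums to 1 by construction, so the result is a valid stochastic matrix restricted to observed states.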