Title: Markov Chains
1. Markov Chains

2. Dependencies along the genome

- In previous classes we assumed every letter in a sequence is sampled independently from some distribution q(·) over the alphabet {A, C, T, G}.
- This model may suffice for alignment scoring, but it does not hold in real genomes.
- There are special subsequences in the genome, such as TATA within the regulatory region upstream of a gene.
- The pair C followed by G is less common than expected under independent random sampling.
- We model such dependencies by Markov chains and hidden Markov models, which we define next.
3. Finite Markov Chain

- An integer-time stochastic process, consisting of:
- A domain D of m states {s1, …, sm},
- An m-dimensional initial distribution vector (p(s1), …, p(sm)), and
- An m×m transition probability matrix M = (a_st).
- For example, D can be the letters {A, C, T, G}, p(A) the probability that A is the 1st letter in a sequence, and a_AG the probability that G follows A in a sequence.
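The three ingredients above (states, initial distribution, transition matrix) are enough to generate sequences. A minimal sketch in Python, with made-up transition probabilities (the low C→G entry is illustrative, echoing the CG depletion mentioned earlier):

```python
import random

# States and an illustrative (made-up) transition matrix: trans[s][t] = P(next = t | current = s).
states = ["A", "C", "G", "T"]
trans = {
    "A": {"A": 0.3, "C": 0.2, "G": 0.3, "T": 0.2},
    "C": {"A": 0.3, "C": 0.3, "G": 0.1, "T": 0.3},  # illustrative low C -> G probability
    "G": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "T": {"A": 0.2, "C": 0.2, "G": 0.3, "T": 0.3},
}
init = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # initial distribution p(s)

def sample_chain(length, seed=0):
    """Sample a sequence: first letter from init, each later letter from trans given the previous one."""
    rng = random.Random(seed)
    seq = rng.choices(states, weights=[init[s] for s in states])
    for _ in range(length - 1):
        prev = seq[-1]
        seq += rng.choices(states, weights=[trans[prev][t] for t in states])
    return "".join(seq)

seq = sample_chain(20)
```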
4. Simple Model - Markov Chains

Markov Property: the state of the system at time t+1 depends only on the state of the system at time t:

P(X_{t+1} = x | X_1, …, X_t) = P(X_{t+1} = x | X_t)

[Figure: a chain X1 → X2 → X3 → X4 → X5]
5. Markov Chain (cont.)

Similarly, (X1, …, Xi, …) is a sequence of probability distributions over D.
6. Matrix Representation

The transition probability matrix M = (a_st). M is a stochastic matrix: every row sums to 1.
The initial distribution vector (u1, …, um) defines the distribution of X1: p(X1 = si) = ui.
Then after one move, the distribution changes to X2 = X1 M, and after i - 1 moves the distribution is Xi = X1 M^(i-1).
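The matrix form Xi = X1 M^(i-1) can be checked directly. A small sketch with a generic (made-up) 2-state stochastic matrix:

```python
import numpy as np

# A generic 2-state stochastic matrix M (rows sum to 1) and the distribution u of X1.
M = np.array([[0.7, 0.3],
              [0.4, 0.6]])
u = np.array([1.0, 0.0])  # X1 is concentrated on the first state

def distribution_of_X(i):
    """Distribution of X_i, namely u @ M^(i-1)."""
    return u @ np.linalg.matrix_power(M, i - 1)

d2 = distribution_of_X(2)  # equals u @ M
d3 = distribution_of_X(3)  # equals u @ M^2
```

Note that each d_i is again a probability vector: multiplying by a stochastic matrix preserves the total mass of 1.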
7. Simple Example

Weather transitions:
- raining today → rain tomorrow: p_rr = 0.4
- raining today → no rain tomorrow: p_rn = 0.6
- no rain today → rain tomorrow: p_nr = 0.2
- no rain today → no rain tomorrow: p_nn = 0.8
8. Simple Example

Transition matrix for the example:

    M = | 0.4  0.6 |
        | 0.2  0.8 |

Note that the rows sum to 1. Such a matrix is called a stochastic matrix. If both the rows and the columns of a matrix all sum to 1, we have a doubly stochastic matrix.
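The stochastic / doubly stochastic distinction is easy to verify numerically; a sketch using the weather matrix above:

```python
import numpy as np

# Weather example: state 0 = rain, state 1 = no rain.
M = np.array([[0.4, 0.6],
              [0.2, 0.8]])

row_sums = M.sum(axis=1)  # each row sums to 1 -> stochastic
col_sums = M.sum(axis=0)  # columns sum to 0.6 and 1.4 -> NOT doubly stochastic

is_stochastic = np.allclose(row_sums, 1.0)
is_doubly_stochastic = is_stochastic and np.allclose(col_sums, 1.0)
```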
9. Gambler's Example

At each play we have the following:
- The gambler wins $1 with probability p.
- The gambler loses $1 with probability 1 - p.
The game ends when the gambler goes broke, or gains a fortune of $100. Both 0 and $100 are absorbing states.
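The gambler's chain can be simulated directly; the absorbing states are exactly where the loop stops. A minimal sketch (start fortune and goal are illustrative parameters):

```python
import random

def gamblers_ruin(start=50, p=0.5, goal=100, seed=1):
    """Simulate one game: fortune moves +1 with probability p, -1 otherwise,
    until it hits an absorbing state (0 or goal)."""
    rng = random.Random(seed)
    fortune = start
    while 0 < fortune < goal:
        fortune += 1 if rng.random() < p else -1
    return fortune  # either 0 (broke) or goal

result = gamblers_ruin()
```

Starting already at 0 or at the goal, the loop never runs: those states are absorbing by construction.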
10. Coke vs. Pepsi

Given that a person's last cola purchase was Coke, there is a 90% chance that her next cola purchase will also be Coke. If a person's last cola purchase was Pepsi, there is an 80% chance that her next cola purchase will also be Pepsi.
11. Coke vs. Pepsi
Given that a person is currently a Pepsi
purchaser, what is the probability that she will
purchase Coke two purchases from now?
12. Coke vs. Pepsi
Given that a person is currently a Coke drinker,
what is the probability that she will purchase
Pepsi three purchases from now?
13. Coke vs. Pepsi

Assume each person makes one cola purchase per week. Suppose 60% of all people now drink Coke, and 40% drink Pepsi. What fraction of people will be drinking Coke three weeks from now?

Let (Q0, Q1) = (0.6, 0.4) be the initial probabilities. We will regard Coke as 0 and Pepsi as 1. We want to find P(X3 = 0).
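The three Coke/Pepsi questions above all reduce to powers of the transition matrix. A minimal sketch (states ordered Coke = 0, Pepsi = 1):

```python
import numpy as np

# Transition matrix from the stated brand loyalties.
P = np.array([[0.9, 0.1],   # Coke  -> (Coke, Pepsi)
              [0.2, 0.8]])  # Pepsi -> (Coke, Pepsi)

P2 = np.linalg.matrix_power(P, 2)
P3 = np.linalg.matrix_power(P, 3)

pepsi_to_coke_in_2 = P2[1, 0]   # slide 11: Pepsi now, Coke two purchases from now
coke_to_pepsi_in_3 = P3[0, 1]   # slide 12: Coke now, Pepsi three purchases from now

Q0 = np.array([0.6, 0.4])       # current market shares
coke_share_in_3 = (Q0 @ P3)[0]  # slide 13: P(X3 = 0)
```

Working the numbers through, the two-step Pepsi→Coke probability is 0.2·0.9 + 0.8·0.2 = 0.34.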
14. Good Markov chains

For certain Markov chains, the distributions Xi, as i → ∞, (1) converge to a unique distribution, independent of the initial distribution, and (2) in that unique distribution, each state has a positive probability. Call these Markov chains "good". We describe these good Markov chains by considering the graph representation of stochastic matrices.
15. Representation as a Digraph

Each directed edge A → B is associated with the positive transition probability from A to B.
- We now define properties of this graph which guarantee:
- Convergence to a unique distribution.
- In that distribution, each state has positive probability.
16. Examples of Bad Markov Chains

- Markov chains are not good if either:
- They do not converge to a unique distribution, or
- They do converge to a unique distribution, but some states in this distribution have zero probability.
17. Bad case 1: Mutual Unreachability

- Consider two initial distributions:
- a) p(X1 = A) = 1 (p(X1 = x) = 0 if x ≠ A).
- b) p(X1 = C) = 1.

In case a), the sequence will stay at A forever. In case b), it will stay in {C, D} forever.

Fact 1: If G has two states which are unreachable from each other, then Xi cannot converge to a distribution which is independent of the initial distribution.
18. Bad case 2: Transient States

Def: A state s is recurrent if it can be reached from any state reachable from s; otherwise it is transient.

A and B are transient states; C and D are recurrent states. Once the process moves from B to D, it will never come back.
19. Bad case 2: Transient States
Fact 2 For each initial distribution, with
probability 1 a transient state will be visited
only a finite number of times.
20. Bad case 3: Periodic States

A state s has period k if k is the GCD of the lengths of all the cycles that pass via s. A Markov chain is periodic if all the states in it have a period k > 1; it is aperiodic otherwise.

Example: Consider the initial distribution p(B) = 1. Then states B, C are visited (with positive probability) only in odd steps, and states A, D, E only in even steps.
21. Bad case 3: Periodic States

Fact 3: In a periodic Markov chain (of period k > 1) there are initial distributions under which the states are visited in a periodic manner. Under such initial distributions Xi does not converge as i → ∞.
22. Ergodic Markov Chains

- A Markov chain is ergodic if:
- All states are recurrent (i.e., the graph is strongly connected), and
- It is not periodic.
- The Fundamental Theorem of Finite Markov Chains:
- If a Markov chain is ergodic, then
- It has a unique stationary distribution vector V > 0, which is a left eigenvector of the transition matrix (with eigenvalue 1), and
- The distributions Xi, as i → ∞, converge to V.
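The theorem suggests a direct computation: find the left eigenvector of M for eigenvalue 1 and normalize it. A sketch using the weather chain from the earlier example (which is ergodic):

```python
import numpy as np

# Weather chain: stationary V satisfies V @ M = V, with V > 0 and sum(V) = 1.
M = np.array([[0.4, 0.6],
              [0.2, 0.8]])

# A left eigenvector of M is a right eigenvector of M.T; pick the one for eigenvalue 1.
vals, vecs = np.linalg.eig(M.T)
v = np.real(vecs[:, np.argmax(np.real(vals))])
V = v / v.sum()  # normalize to a probability vector

# Convergence check: after many steps the rows of M^i approach V.
limit = np.linalg.matrix_power(M, 50)[0]
```

For this matrix the stationary distribution works out to V = (0.25, 0.75): in the long run it rains a quarter of the days, regardless of today's weather.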
23. Use of Markov Chains in Genome Search: Modeling CpG Islands

In human genomes the pair CG often transforms to (methyl-C)G, which often transforms to TG. Hence the pair CG appears less often than expected from the independent frequencies of C and G alone. For biological reasons, this process is sometimes suppressed in short stretches of the genome, such as in the start regions of many genes. These areas are called CpG islands (the p denotes the phosphodiester bond between the C and the G).
24. Example: CpG Island (Cont.)

We consider two questions (and some variants):
Question 1: Given a short stretch of genomic data, does it come from a CpG island?
Question 2: Given a long piece of genomic data, does it contain CpG islands, and if so, where and of what length?
We solve the first question by modeling strings with and without CpG islands as Markov chains over the same states {A, C, G, T} but with different transition probabilities.
25. Example: CpG Island (Cont.)

The "+" model: use transition matrix A+ = (a+_st), where a+_st = (the probability that t follows s in a CpG island).
The "-" model: use transition matrix A- = (a-_st), where a-_st = (the probability that t follows s in a non-CpG island).
26. Example: CpG Island (Cont.)

With this model, to solve Question 1 we need to decide whether a given short sequence of letters is more likely to come from the "+" model or from the "-" model. This is done using the definitions of Markov chains.
To solve Question 2 we need to decide which parts of a given long sequence of letters are more likely to come from the "+" model, and which parts are more likely to come from the "-" model. This is done using the Hidden Markov Model, to be defined later.
We start with Question 1.
27. Question 1: Using two Markov chains

A+ (for CpG islands):
We need to specify p+(xi | xi-1), where + stands for CpG island. From Durbin et al. we have:

[Table: transition probabilities a+_st, rows indexed by xi-1, columns by xi]

(Recall rows must add up to one; columns need not.)
28. Question 1: Using two Markov chains

A- (for non-CpG islands):
And for p-(xi | xi-1) (where - stands for non-CpG island) we have:

[Table: transition probabilities a-_st, rows indexed by xi-1, columns by xi]
29. Discriminating between the two models

Given a string x = (x1, …, xL), compute the ratio

RATIO = p(x | model +) / p(x | model -) = ∏_{i=1..L} p+(xi | xi-1) / ∏_{i=1..L} p-(xi | xi-1)

If RATIO > 1, a CpG island is more likely. In practice the log of this ratio is computed.
Note: p(x1 | x0) is defined for convenience as p+(x1), and p-(x1 | x0) as p-(x1).
30. Log Odds-Ratio test

Let Q denote the ratio above. If log Q > 0, then "+" is more likely (CpG island). If log Q < 0, then "-" is more likely (non-CpG island).
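A minimal sketch of the log-odds test in Python. The transition tables here are made up purely for illustration (the only deliberate feature is that C→G is common in the "+" table and rare in the "-" table); real values would be estimated from data, e.g. the tables in Durbin et al.:

```python
import math

STATES = "ACGT"

# Made-up transition probabilities (rows sum to 1); NOT the Durbin et al. values.
plus_trans = {s: {t: 0.25 for t in STATES} for s in STATES}
plus_trans["C"] = {"A": 0.15, "C": 0.30, "G": 0.30, "T": 0.25}   # CG common in islands

minus_trans = {s: {t: 0.25 for t in STATES} for s in STATES}
minus_trans["C"] = {"A": 0.30, "C": 0.30, "G": 0.05, "T": 0.35}  # CG rare outside

def log_odds(x, p_plus_init=0.25, p_minus_init=0.25):
    """log Q = log[p+(x) / p-(x)]; p(x1 | x0) is taken to be the initial probability."""
    score = math.log(p_plus_init) - math.log(p_minus_init)
    for prev, cur in zip(x, x[1:]):
        score += math.log(plus_trans[prev][cur]) - math.log(minus_trans[prev][cur])
    return score

score = log_odds("ACGCGCG")  # several CG pairs push the score positive
```

Working in log space avoids numerical underflow from multiplying many small probabilities, which is why the log of the ratio is computed in practice.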
31. Where do the parameters (transition probabilities) come from?

- Learning from complete data, namely, when the label is given and every xi is measured.

Source: A collection of sequences from CpG islands, and a collection of sequences from non-CpG islands.
Input: Tuples of the form (x1, …, xL, h), where h is + or -.
Output: Maximum likelihood estimates (MLE) of the parameters.

Count all pairs (Xi = a, Xi-1 = b) with label +, and with label -; say the counts are N_{ba,+} and N_{ba,-}.
32. Maximum Likelihood Estimate (MLE) of the parameters (using labeled data)

The needed parameters are p+(x1), p+(xi | xi-1), p-(x1), and p-(xi | xi-1). The ML estimates are given by the normalized counts, e.g.

a+_{ba} = N_{ba,+} / Σ_{a'} N_{ba',+}

and similarly for the "-" model.
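The counting-and-normalizing step can be sketched as follows; the two toy sequences stand in for a real collection of labeled CpG-island sequences:

```python
from collections import defaultdict

def mle_transitions(sequences):
    """MLE of a_st from labeled sequences of one class: a_st = N_st / sum over t' of N_st'."""
    counts = defaultdict(lambda: defaultdict(int))
    for x in sequences:
        for prev, cur in zip(x, x[1:]):   # count every adjacent pair
            counts[prev][cur] += 1
    est = {}
    for s, row in counts.items():
        total = sum(row.values())         # normalize each row by its total count
        est[s] = {t: n / total for t, n in row.items()}
    return est

# Toy labeled data for the "+" class; real input would be many island sequences.
plus_est = mle_transitions(["ACGCG", "CGCGT"])
```

The same function applied to the "-" collection yields the "-" model; each estimated row sums to 1 by construction, so the result is a valid stochastic matrix restricted to observed states.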