Title: Hidden Markov Models
1. Hidden Markov Models & Gene Finding
- Lecture 21, November 15, 2005
- Algorithms in Biosequence Analysis
- Nathan Edwards, Fall 2005
2. Hidden Markov Models
- Set of states Q
- Transition probabilities p_s(t), for s, t in Q
- Output alphabet Σ
- Emission probabilities e_s(x), for s in Q, x in Σ
- Separate the hidden states from the observed sequence.
3. Dishonest casino
[Diagram: a two-state HMM. The Fair die emits 1-6 each with probability 1/6; the Loaded die emits 1-5 with probability 1/10 each and 6 with probability 1/2. Transitions: Fair→Fair 0.95, Fair→Loaded 0.05, Loaded→Loaded 0.90, Loaded→Fair 0.10.]
4. Dishonest casino
- Q = {Fair, Loaded}, Σ = {1, 2, 3, 4, 5, 6}
- p_Fair(Fair) = 0.95, p_Fair(Loaded) = 0.05
- p_Loaded(Fair) = 0.10, p_Loaded(Loaded) = 0.90
- e_Fair(1) = 1/6, ..., e_Fair(6) = 1/6
- e_Loaded(1) = 1/10, ..., e_Loaded(6) = 1/2 (see the sketch below)
[Diagram: the two-state Fair/Loaded HMM from slide 3.]
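To make the notation concrete, here is a minimal Python sketch of this model as plain dictionaries; the variable names are illustrative choices, and only the numbers come from the slide.

    # A minimal, illustrative encoding of the dishonest-casino HMM.
    # Keys and names are arbitrary; only the probabilities come from the slide.

    states = ["Fair", "Loaded"]
    alphabet = [1, 2, 3, 4, 5, 6]

    # Transition probabilities p_s(t): trans[s][t]
    trans = {
        "Fair":   {"Fair": 0.95, "Loaded": 0.05},
        "Loaded": {"Fair": 0.10, "Loaded": 0.90},
    }

    # Emission probabilities e_s(x): emit[s][x]
    emit = {
        "Fair":   {x: 1 / 6 for x in alphabet},
        "Loaded": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5},
    }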
5. Finding CpG Islands
6. Finding CpG Islands
- Q = {A, C, G, T, A-, C-, G-, T-}
- Σ = {A, C, G, T}
- e_A(A) = 1, ..., e_A(T) = 0
- e_A-(A) = 1, ..., e_A-(T) = 0
- ...
- p_x(y): CpG-island transition a_xy
- p_x-(y-): non-CpG-island transition a-_xy
- p_x(y-): probability of switching from CpG to non-CpG
- p_x-(y): probability of switching from non-CpG to CpG (one possible wiring is sketched below)
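As an illustration only, the sketch below assembles the 8-state transition matrix from two 4x4 within-region tables and a single switching probability. The tables and the p_switch value are made-up placeholders, not estimates from real CpG data, and real models typically estimate separate switch probabilities in each direction.

    # Illustrative construction of the 8-state CpG-island HMM topology.
    # All numeric values below are placeholders, not real CpG statistics.

    bases = ["A", "C", "G", "T"]
    uniform = {y: 0.25 for y in bases}

    # Within-island transitions a_xy and within-background transitions a-_xy
    # (placeholders; a real model estimates these from labeled sequence).
    a_plus = {x: dict(uniform) for x in bases}
    a_minus = {x: dict(uniform) for x in bases}

    p_switch = 0.01   # hypothetical probability of leaving the current region

    states = bases + [b + "-" for b in bases]   # A,C,G,T inside; A-,...,T- outside
    trans = {}
    for x in bases:
        # From an island state: stay in the island block or switch out.
        trans[x] = {y: (1 - p_switch) * a_plus[x][y] for y in bases}
        trans[x].update({y + "-": p_switch * a_minus[x][y] for y in bases})
        # From a background state: stay outside or switch into an island.
        trans[x + "-"] = {y + "-": (1 - p_switch) * a_minus[x][y] for y in bases}
        trans[x + "-"].update({y: p_switch * a_plus[x][y] for y in bases})

    # Each state emits its own base with probability 1.
    emit = {s: {y: 1.0 if s.rstrip("-") == y else 0.0 for y in bases} for s in states}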
7. Dynamic Programming Algorithms
- Viterbi: find the most likely state sequence S* for a given observed sequence x (see the sketch below).
  S* = argmax_S P_{p,e}(S | x) = argmax_S P_{p,e}(S, x)
- Forward, Backward: find the probability of the observed sequence x.
  P_{p,e}(x) = Σ_S P(S, x)
  f_k(i) = P(x_1, ..., x_i, S_i = k),  b_k(i) = P(x_{i+1}, ..., x_L | S_i = k)
- Posterior state probability: the likelihood, given the observed sequence x, that S_i = k.
  P_{p,e}(S_i = k | x) = f_k(i) b_k(i) / P(x)
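A minimal Viterbi sketch in Python, written against the dictionary representation used for the casino example above. The initial state distribution init is left implicit on the slides, so the uniform default here is an assumption; the computation is done in log space to avoid underflow.

    import math

    def viterbi(x, states, trans, emit, init=None):
        """Most likely state path for observations x, computed in log space."""
        if init is None:
            init = {s: 1.0 / len(states) for s in states}   # assumed uniform start
        log = lambda p: math.log(p) if p > 0 else float("-inf")

        # v[i][k] = log-probability of the best path ending in state k at position i
        v = [{k: log(init[k]) + log(emit[k][x[0]]) for k in states}]
        back = []
        for i in range(1, len(x)):
            v.append({})
            back.append({})
            for k in states:
                best_prev = max(states, key=lambda j: v[i - 1][j] + log(trans[j][k]))
                back[-1][k] = best_prev
                v[i][k] = v[i - 1][best_prev] + log(trans[best_prev][k]) + log(emit[k][x[i]])

        # Trace back from the best final state.
        last = max(states, key=lambda k: v[-1][k])
        path = [last]
        for ptrs in reversed(back):
            path.append(ptrs[path[-1]])
        return list(reversed(path))

    # Example: decode a short roll sequence with the casino model defined above.
    # print(viterbi([1, 6, 6, 6, 2], states, trans, emit))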
8. Parameter Estimation
- Given a set of training sequences (and Q and Σ), determine p and e.
- Require a set of training sequences, much as in any data-mining problem.
- Ideally, these are already labeled with the states in Q.
9. Parameter Estimation
- Given sequences S and x, compute p_s(t) and e_s(y).
- Count the number of times s is followed by t in S, and the number of times y is output in state s.
- p_s(t) = #{i : S_i = s, S_{i+1} = t} / #{i : S_i = s}
         = #{i : S_i = s, S_{i+1} = t} / Σ_k #{i : S_i = s, S_{i+1} = k}
- e_s(y) = #{i : x_i = y, S_i = s} / #{i : S_i = s}
         = #{i : x_i = y, S_i = s} / Σ_y' #{i : x_i = y', S_i = s}
- What if we never see a particular transition?
10. Parameter Estimation
- p_s(t) = #{i : S_i = s, S_{i+1} = t} / Σ_k #{i : S_i = s, S_{i+1} = k}
- e_s(y) = #{i : x_i = y, S_i = s} / Σ_y' #{i : x_i = y', S_i = s}
- These are the maximum likelihood estimates: (p, e) = argmax_{p,e} P_{p,e}(S, x)
- What if we never see a particular transition or emission?
- Add pseudo-counts to represent our Bayesian belief that these events happen, albeit rarely (see the sketch below).
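A sketch of the counting estimator with pseudo-counts, assuming labeled training data arrives as (state sequence, observation sequence) pairs; the pseudo-count of 1 is an arbitrary illustrative default.

    def estimate(labeled_pairs, states, alphabet, pseudo=1.0):
        """Maximum-likelihood p and e from (state_seq, obs_seq) pairs, with pseudo-counts."""
        t_count = {s: {t: pseudo for t in states} for s in states}
        e_count = {s: {y: pseudo for y in alphabet} for s in states}
        for S, x in labeled_pairs:
            for i in range(len(S)):
                e_count[S[i]][x[i]] += 1          # count emission of x_i in state S_i
                if i + 1 < len(S):
                    t_count[S[i]][S[i + 1]] += 1  # count transition S_i -> S_{i+1}
        # Normalize counts row by row to get probability distributions.
        trans = {s: {t: c / sum(row.values()) for t, c in row.items()}
                 for s, row in t_count.items()}
        emit = {s: {y: c / sum(row.values()) for y, c in row.items()}
                for s, row in e_count.items()}
        return trans, emit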
11. What if we have no labels?
- If the state sequence is unknown, we can't just count.
- If only we had p and e, we could compute the expected number of s-to-t transitions by summing over all paths.
- P_{p,e}(S_i = s, S_{i+1} = t | x) = f_s(i) p_s(t) e_t(x_{i+1}) b_t(i+1) / P_{p,e}(x)
- E[#{S_i = s, S_{i+1} = t}] = Σ_x Σ_i P_{p,e}(S_i = s, S_{i+1} = t | x)
- E[#{x_i = y, S_i = s}] = Σ_x Σ_i P_{p,e}(S_i = s, x_i = y | x)
12. Baum-Welch
- If we use these expected counts to estimate the parameters,
  p'_s(t) = E_{p,e}[#{S_i = s, S_{i+1} = t}] / Σ_k E_{p,e}[#{S_i = s, S_{i+1} = k}]
  e'_s(y) = E_{p,e}[#{x_i = y, S_i = s}] / Σ_y' E_{p,e}[#{x_i = y', S_i = s}]
- then P_{p',e'}(x) > P_{p,e}(x), or p' = p and e' = e.
- So, pick initial p, e values and iterate; stop when no more improvement is made (one pass is sketched below).
- This is a special case of a general class of parameter estimation procedures called Expectation-Maximization (EM).
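A self-contained sketch of one Baum-Welch pass over a set of training sequences, using the same dictionary representation as earlier plus an explicit initial distribution init (an assumption, since the slides leave it implicit). It works with raw probabilities, with no scaling, log space, or pseudo-counts, so it is only suitable for short sequences.

    def forward(x, states, trans, emit, init):
        # f[i][k] = P(x_1..x_{i+1}, S_{i+1} = k), 0-indexed positions
        f = [{k: init[k] * emit[k][x[0]] for k in states}]
        for i in range(1, len(x)):
            f.append({k: emit[k][x[i]] * sum(f[i - 1][j] * trans[j][k] for j in states)
                      for k in states})
        return f

    def backward(x, states, trans, emit):
        # b[i][k] = P(x_{i+2}..x_L | S_{i+1} = k), 0-indexed positions
        b = [dict() for _ in x]
        b[-1] = {k: 1.0 for k in states}
        for i in range(len(x) - 2, -1, -1):
            b[i] = {k: sum(trans[k][j] * emit[j][x[i + 1]] * b[i + 1][j] for j in states)
                    for k in states}
        return b

    def baum_welch_step(seqs, states, alphabet, trans, emit, init):
        """One EM update: expected counts (E-step), then re-normalization (M-step)."""
        t_exp = {s: {t: 0.0 for t in states} for s in states}
        e_exp = {s: {y: 0.0 for y in alphabet} for s in states}
        for x in seqs:
            f = forward(x, states, trans, emit, init)
            b = backward(x, states, trans, emit)
            px = sum(f[-1][k] for k in states)          # P(x)
            for i in range(len(x)):
                for s in states:
                    # Expected emission count: posterior P(S_i = s | x)
                    e_exp[s][x[i]] += f[i][s] * b[i][s] / px
                    if i + 1 < len(x):
                        for t in states:
                            # Expected transition count: P(S_i = s, S_{i+1} = t | x)
                            t_exp[s][t] += (f[i][s] * trans[s][t]
                                            * emit[t][x[i + 1]] * b[i + 1][t] / px)
        new_trans = {s: {t: c / sum(row.values()) for t, c in row.items()}
                     for s, row in t_exp.items()}
        new_emit = {s: {y: c / sum(row.values()) for y, c in row.items()}
                    for s, row in e_exp.items()}
        return new_trans, new_emit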
13. Baum-Welch
- The solution found by this iterative procedure is a local maximum,
  very much dependent on the initial parameters.
- The updates are equivalent to choosing p', e' by
  max_{p',e'} L(p, e, p', e') = Σ_S P_{p,e}(S | x) log P_{p',e'}(S, x)
  s.t. Σ_t p'_s(t) = 1 for all s in Q,
       Σ_y e'_s(y) = 1 for all s in Q,
       p'_s(t) ≥ 0, e'_s(y) ≥ 0.
14. Baum-Welch
- Could use Viterbi state assignments instead: less accurate, but a little faster.
- Many computational biology applications of HMMs use training sequences with labels.
15. HMM structure
- These basic algorithms can be quite expressive, but we need more.
- Our current models emit a geometric number of emissions in each state:
  P(l | s) = p_s(s)^(l-1) (1 - p_s(s))
- This is fine, as long as it's really true!
16. Dishonest casino
[Diagram: the same two-state Fair/Loaded HMM as slide 3, with self-transitions 0.95 (Fair) and 0.90 (Loaded).]
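As a quick worked instance of the geometric length model (the mean of a geometric distribution, 1/(1 - p_s(s)), is a standard fact rather than something stated on the slide):

    E[l | Fair]   = 1 / (1 - 0.95) = 20 rolls on average
    E[l | Loaded] = 1 / (1 - 0.90) = 10 rolls on average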
17. HMM structure
[Diagram: two state topologies: one enforcing a minimum length of 7 with a geometric length distribution otherwise, and one realizing an arbitrary distribution of lengths between 2 and 8.]
18. HMM structure
- Using topology in this way is equivalent to setting some p_s(t) to zero (see the sketch below).
- Typically, we also constrain the e_s to be equal for all s in a group Q_j.
- The dynamic programming algorithms stay much the same, but become much more graph-like.
- Can also change the emission model.
- Silent states
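A small illustrative sketch of the topology idea in Python: it expands one state into a chain of copies with tied emissions, so every visit lasts at least min_len steps and is geometric beyond that. The function name, arguments, and the commented example values are made up for illustration.

    def expand_min_length(state, min_len, p_stay, emission, exit_targets):
        """Replace `state` by a chain state_1 .. state_min_len with tied emissions.

        The first min_len - 1 copies move to the next copy with probability 1
        (so the run cannot end early); the last copy keeps a geometric tail via
        p_stay and leaves according to `exit_targets` (a dict of target: prob
        summing to 1 - p_stay).
        """
        names = [f"{state}_{i}" for i in range(1, min_len + 1)]
        trans = {}
        for a, b in zip(names, names[1:]):
            trans[a] = {b: 1.0}                      # forced progression through the chain
        trans[names[-1]] = {names[-1]: p_stay, **exit_targets}
        emit = {n: dict(emission) for n in names}    # tied emission distributions
        return names, trans, emit

    # Example: a "Loaded" state that must last at least 7 rolls, then behaves
    # geometrically, leaving to a hypothetical "Fair_1" entry state.
    # names, trans, emit = expand_min_length(
    #     "Loaded", 7, 0.90,
    #     {x: (0.1 if x < 6 else 0.5) for x in range(1, 7)},
    #     {"Fair_1": 0.10},
    # )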
19. HMM structure
- Suppose we need to model skipping states.
- Direct skip transitions mean too many parameters: O(n^2).
- Silent states (which emit nothing) avoid this.
20. Semi-HMM
- Semi-Hidden Markov Models extend this idea still further.
- Explicitly model the output length distribution.
- Each state emits a sequence of characters, whose length is drawn from some distribution.
- No more s-to-s transitions!
21. Dishonest casino
[Diagram: the casino model as a semi-HMM. Fair and Loaded keep their emission tables (1-6 at 1/6 each vs. 1-5 at 1/10 and 6 at 1/2), their state lengths are drawn from Geom(p = 0.95) and Geom(p = 0.90) respectively, and each state transitions to the other with probability 1.00.]
22. Finding CpG Islands
23. CpG Islands
[Diagram: the CpG-island model as a semi-HMM, with explicit length distributions P+(length) for islands and P-(length) for non-island regions.]
24. Semi-HMMs
- Semi-HMMs free us to choose arbitrary length distributions and emit sequences of (arbitrary) characters:
  - Markov chains for local dependencies
  - Highly structured models that capture specific modules:
    - CpG / non-CpG
    - Gene / Intergenic
    - Exon / Intron
25. Example Gene-finding HMM
26. Example Gene-finding HMM
- 5th-order intergenic Markov model
  - P(x_i | x_{i-5} x_{i-4} x_{i-3} x_{i-2} x_{i-1})
  - 4^5 * 3 = 3072 parameters
- First-order internal codon model
  - 61 * 60 = 3660 parameters
- Stop codon choice
  - 2 parameters
- Total of 6734 parameters + two length distributions! (See the arithmetic check below.)
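Reading the counts as 4^5 contexts times 3 free probabilities each for the intergenic model, and 61 sense codons times 60 free probabilities each for the codon model (an interpretation, but the one that matches the stated totals), the sum checks out:

    4^5 * 3 + 61 * 60 + 2 = 3072 + 3660 + 2 = 6734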
27. Viterbi for Semi-HMM
- Let v_k(i) be the probability of the most probable sequence of states that ends in state k with observation i; then
  v_k(i) = max_{k' in Q} e_k(x_i) p_{k'}(k) v_{k'}(i-1)
- Let V_k(i) be the probability of the most probable sequence of states that ends in state k with an emission that ends at position i.
  V_k(i) = max_j ...
28. Viterbi for Semi-HMM
- V_k(i) = max_j max_{k'} P_k(x_{j+1} ... x_i) p_{k'}(k) V_{k'}(j),
  where P_k(x_{j+1} ... x_i) is the probability that state k emits exactly that segment, length distribution included (see the sketch below).
- This should look familiar:
  - arbitrary gap models for sequence alignment!
- Unfortunately, this means O(N^2) time.
- For gene-finding models, not so bad:
  - exons are short
  - start and stop codons define few positions to check
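A naive Python sketch of this recurrence, assuming each state k comes with a length distribution dur[k] (segment length -> probability, lengths ≥ 1) and a segment-probability function seg_prob(k, segment); both names, and the init start distribution, are illustrative assumptions rather than anything fixed by the slides. The O(N^2) inner loop is kept deliberately simple.

    def semi_viterbi(x, states, trans, dur, seg_prob, init):
        """Best segmentation of x under a semi-HMM; returns (start, end, state) triples.

        V[i][k] = probability of the best segmentation of x[:i] whose last
        segment is emitted by state k.
        """
        L = len(x)
        V = [{k: 0.0 for k in states} for _ in range(L + 1)]
        choice = [{} for _ in range(L + 1)]
        for i in range(1, L + 1):
            for k in states:
                best, best_prob = None, 0.0
                for l, pl in dur[k].items():           # candidate segment lengths
                    j = i - l
                    if j < 0:
                        continue
                    emit_p = pl * seg_prob(k, x[j:i])  # P_k(x_{j+1}..x_i)
                    if j == 0:
                        p, prev = init[k] * emit_p, None   # first segment of the sequence
                    else:
                        kp = max(states, key=lambda s: V[j][s] * trans[s][k])
                        p, prev = V[j][kp] * trans[kp][k] * emit_p, (j, kp)
                    if p > best_prob:
                        best_prob, best = p, prev
                V[i][k], choice[i][k] = best_prob, best
        # Trace back the best segmentation from the end of the sequence.
        k = max(states, key=lambda s: V[L][s])
        segments, i = [], L
        while True:
            prev = choice[i][k]
            if prev is None:
                segments.append((0, i, k))
                break
            j, kp = prev
            segments.append((j, i, k))
            i, k = j, kp
        return list(reversed(segments))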
29. Summary
- HMM parameters are estimated from training data:
  - explicit frequencies if states are known
  - Baum-Welch if states are unknown
- HMM structure makes domain knowledge explicit:
  - silent states
  - semi-HMMs for arbitrary length distributions
  - emit sequences (Markov dependencies) rather than single characters