Transcript and Presenter's Notes

Title: Hidden Markov Models


1
Hidden Markov Models & Gene Finding
  • Lecture 21, November 15, 2005
  • Algorithms in Biosequence Analysis
  • Nathan Edwards - Fall, 2005

2
Hidden Markov Models
  • Set of states Q
  • Transition probabilities p_s(t), for s, t in Q
  • Output alphabet Σ
  • Emission probabilities e_s(x), for s in Q, x in Σ
  • The hidden states are kept separate from the observed sequence.

3
Dishonest casino
[State diagram: Fair and Loaded states. Fair→Fair 0.95, Fair→Loaded 0.05, Loaded→Loaded 0.90, Loaded→Fair 0.10. Fair emits 1-6 each with probability 1/6; Loaded emits 1-5 each with probability 1/10 and 6 with probability 1/2.]
4
Dishonest casino
  • Q = {Fair, Loaded}, Σ = {1, 2, 3, 4, 5, 6}
  • p_Fair(Fair) = 0.95, p_Fair(Loaded) = 0.05,
  • p_Loaded(Fair) = 0.10, p_Loaded(Loaded) = 0.90,
  • e_Fair(1) = 1/6, ..., e_Fair(6) = 1/6
  • e_Loaded(1) = 1/10, ..., e_Loaded(6) = 1/2

[Same state diagram as on the previous slide.]
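As a concrete sketch, the casino model can be written down directly in plain Python; the variable names `states`, `trans`, and `emit` are illustrative choices, not from the slides.

```python
# Dishonest casino HMM written out as plain Python dictionaries.
states = ["Fair", "Loaded"]
alphabet = [1, 2, 3, 4, 5, 6]

# trans[s][t] = p_s(t): probability of moving from state s to state t
trans = {
    "Fair":   {"Fair": 0.95, "Loaded": 0.05},
    "Loaded": {"Fair": 0.10, "Loaded": 0.90},
}

# emit[s][x] = e_s(x): probability of rolling x while in state s
emit = {
    "Fair":   {x: 1 / 6 for x in alphabet},
    "Loaded": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5},
}

# Sanity check: each row of probabilities sums to 1.
for s in states:
    assert abs(sum(trans[s].values()) - 1.0) < 1e-9
    assert abs(sum(emit[s].values()) - 1.0) < 1e-9
```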
5
Finding CpG Islands
6
Finding CpG Islands
  • Q = {A+, C+, G+, T+, A-, C-, G-, T-}
  • Σ = {A, C, G, T}
  • e_A+(A) = 1, ..., e_A+(T) = 0,
  • e_A-(A) = 1, ..., e_A-(T) = 0,
  • ...
  • p_x+(y+) = CpG-island transition a+_xy
  • p_x-(y-) = non-CpG-island transition a-_xy
  • p_x+(y-) = probability of switching from CpG to non-CpG
  • p_x-(y+) = probability of switching from non-CpG to CpG
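A minimal sketch of assembling the eight-state model from two 4x4 nucleotide transition tables. How the switching mass is spread over the four target states (uniformly here) is an assumption of this sketch; the slides only say that such switch probabilities exist.

```python
DNA = "ACGT"

def build_cpg_hmm(a_plus, a_minus, p_leave_island, p_enter_island):
    """a_plus[x][y], a_minus[x][y]: within-island / outside-island transitions."""
    trans = {}
    for x in DNA:
        trans[x + "+"] = {}
        trans[x + "-"] = {}
        for y in DNA:
            # stay inside the island / stay outside the island
            trans[x + "+"][y + "+"] = (1 - p_leave_island) * a_plus[x][y]
            trans[x + "-"][y + "-"] = (1 - p_enter_island) * a_minus[x][y]
            # switch between island and non-island (spread uniformly: an assumption)
            trans[x + "+"][y + "-"] = p_leave_island / 4
            trans[x + "-"][y + "+"] = p_enter_island / 4
    # each +/- state emits its own nucleotide with probability 1
    emit = {x + s: {y: 1.0 if y == x else 0.0 for y in DNA}
            for x in DNA for s in "+-"}
    return trans, emit
```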

7
Dynamic Programming Algorithms
  • Viterbi: Find the most likely state sequence S* for a
    given observed sequence x.
  • S* = argmax_S P_p,e(S | x) = argmax_S P_p,e(S, x)
  • Forward, Backward: Find the probability of the
    observed sequence x.
  • P_p,e(x) = Σ_S P(S, x)
  • f_k(i) = P(x_1, ..., x_i, S_i = k), b_k(i) =
    P(x_{i+1}, ..., x_L | S_i = k)
  • Posterior state probability: the likelihood, given
    the observed sequence x, that S_i = k.
  • P_p,e(S_i = k | x) = f_k(i) · b_k(i) / P(x)
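A compact Python sketch of the Viterbi and forward recursions, working in probability space (so only suitable for short sequences). It reuses the `states`, `trans`, and `emit` dictionaries from the casino sketch above; `start` is an assumed uniform initial distribution, which the slides do not specify.

```python
from math import fsum

def viterbi(x, states, start, trans, emit):
    """Most likely state path S* = argmax_S P(S, x) by dynamic programming."""
    # v[i][k] = probability of the best path for x[:i+1] that ends in state k
    v = [{k: start[k] * emit[k][x[0]] for k in states}]
    back = []
    for i in range(1, len(x)):
        col, ptr = {}, {}
        for k in states:
            best = max(states, key=lambda s: v[-1][s] * trans[s][k])
            ptr[k] = best
            col[k] = v[-1][best] * trans[best][k] * emit[k][x[i]]
        v.append(col)
        back.append(ptr)
    # trace back from the best final state
    last = max(states, key=lambda k: v[-1][k])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

def forward(x, states, start, trans, emit):
    """P(x) summed over all state paths, via the forward variables f_k(i)."""
    f = [{k: start[k] * emit[k][x[0]] for k in states}]
    for i in range(1, len(x)):
        f.append({k: emit[k][x[i]] * fsum(f[-1][s] * trans[s][k] for s in states)
                  for k in states})
    return fsum(f[-1].values()), f

start = {"Fair": 0.5, "Loaded": 0.5}   # assumed initial distribution, not from the slides
rolls = [3, 1, 6, 6, 2, 6, 6, 6]
print(viterbi(rolls, states, start, trans, emit))
print(forward(rolls, states, start, trans, emit)[0])
```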

8
Parameter Estimation
  • Given a set of training sequences (and Q and Σ),
    determine p and e.
  • We require a set of training sequences, much as in
    any data-mining problem.
  • Ideally, these are already labeled with the
    states in Q.

9
Parameter Estimation
  • Given a state sequence S and observations x, compute p_s(t) and e_s(y):
  • Count the number of times s is followed by t in S, and the number of times
    y is output in state s.
  • p_s(t) = #{i : S_i = s, S_{i+1} = t} / #{i : S_i = s}
           = #{i : S_i = s, S_{i+1} = t} / Σ_k #{i : S_i = s, S_{i+1} = k}
  • e_s(y) = #{i : x_i = y, S_i = s} / #{i : S_i = s}
           = #{i : x_i = y, S_i = s} / Σ_y' #{i : x_i = y', S_i = s}
  • What if we never see a particular transition?

10
Parameter Estimation
  • p_s(t) = #{i : S_i = s, S_{i+1} = t} / Σ_k #{i : S_i = s, S_{i+1} = k},
    e_s(y) = #{i : x_i = y, S_i = s} / Σ_y' #{i : x_i = y', S_i = s}
  • These are the maximum likelihood estimates: (p, e) = argmax_p,e P_p,e(S, x)
  • What if we never see a particular transition or emission?
  • Add pseudo-counts to represent our Bayesian belief that these events happen,
    albeit rarely.
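A sketch of this counting estimator with pseudo-counts. The `training` format of paired state/observation strings and the function name are illustrative choices for the example, not from the slides.

```python
from collections import defaultdict

def estimate_parameters(training, pseudo=1.0):
    """Maximum-likelihood estimates of p_s(t) and e_s(y) from labeled data.

    training: list of (state_path, observations) pairs, e.g. [("FFFFLLLLF", "316266646")].
    pseudo:   pseudo-count added to every transition/emission so that events
              never seen in training still get a small nonzero probability.
    """
    t_count = defaultdict(lambda: defaultdict(float))
    e_count = defaultdict(lambda: defaultdict(float))
    for path, obs in training:
        for i, (s, y) in enumerate(zip(path, obs)):
            e_count[s][y] += 1
            if i + 1 < len(path):
                t_count[s][path[i + 1]] += 1
    states = sorted(set(t_count) | set(e_count))
    symbols = sorted({y for c in e_count.values() for y in c})
    p = {s: {t: (t_count[s][t] + pseudo) /
                (sum(t_count[s].values()) + pseudo * len(states))
             for t in states} for s in states}
    e = {s: {y: (e_count[s][y] + pseudo) /
                (sum(e_count[s].values()) + pseudo * len(symbols))
             for y in symbols} for s in states}
    return p, e

p, e = estimate_parameters([("FFFFLLLLF", "316266646")])
```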

11
What if we have no labels?
  • If the state sequence is unknown, we can't just count.
  • If only we had p and e, we could compute the expected number of s-to-t
    transitions by summing over all paths:
  • P_p,e(S_i = s, S_{i+1} = t | x) = f_s(i) · p_s(t) · e_t(x_{i+1}) · b_t(i+1) / P_p,e(x)
  • E[#{S_i = s, S_{i+1} = t}] = Σ_x Σ_i P_p,e(S_i = s, S_{i+1} = t | x)
  • E[#{x_i = y, S_i = s}] = Σ_x Σ_i P_p,e(S_i = s, x_i = y | x)
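A small sketch of the expected transition counts, assuming the forward and backward tables are already available as position-indexed dictionaries (e.g. from the forward() sketch above); the argument layout is an illustrative choice.

```python
def expected_transition_counts(training, p, e, states):
    """E[#{S_i = s, S_{i+1} = t}]: sum the posterior transition probabilities
    f_s(i) p_s(t) e_t(x_{i+1}) b_t(i+1) / P(x) over all sequences and positions.

    training: iterable of (x, f, b, px) tuples, where f and b are forward/backward
    tables indexed as f[i][state] and px = P(x)."""
    counts = {s: {t: 0.0 for t in states} for s in states}
    for x, f, b, px in training:
        for i in range(len(x) - 1):
            for s in states:
                for t in states:
                    counts[s][t] += f[i][s] * p[s][t] * e[t][x[i + 1]] * b[i + 1][t] / px
    return counts
```

The emission analogue is simpler: sum f_s(i) · b_s(i) / P(x) over the positions i where x_i = y.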

12
Baum-Welch
  • If we use these expected counts to estimate the parameters:
  • p'_s(t) = E_p,e[#{S_i = s, S_{i+1} = t}] / Σ_k E_p,e[#{S_i = s, S_{i+1} = k}],
    e'_s(y) = E_p,e[#{x_i = y, S_i = s}] / Σ_y' E_p,e[#{x_i = y', S_i = s}]
  • then P_p',e'(x) ≥ P_p,e(x), or else p' = p and e' = e.
  • So, pick initial p, e values and iterate; stop when no more improvement is made.
  • This is a special case of a general class of parameter estimation procedures
    called Expectation-Maximization (EM).
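Putting the pieces together, a minimal Baum-Welch sketch. It reuses forward() from the slide-7 example above; function and argument names are illustrative, and both re-estimation of the initial distribution and the pseudo-counts from slide 10 are left out to keep the sketch short.

```python
from math import fsum, log

def backward(x, states, trans, emit):
    """b_k(i) = P(x_{i+1}, ..., x_L | S_i = k), computed right to left."""
    b = [{k: 1.0 for k in states}]
    for i in range(len(x) - 1, 0, -1):
        b.insert(0, {k: fsum(trans[k][t] * emit[t][x[i]] * b[0][t] for t in states)
                     for k in states})
    return b

def baum_welch(seqs, states, symbols, start, trans, emit, n_iter=100, tol=1e-6):
    """One possible Baum-Welch loop: expected counts -> re-estimation, repeated
    until the log-likelihood stops improving."""
    prev = float("-inf")
    for _ in range(n_iter):
        tc = {s: {t: 0.0 for t in states} for s in states}     # E[transition counts]
        ec = {s: {y: 0.0 for y in symbols} for s in states}    # E[emission counts]
        ll = 0.0
        for x in seqs:
            px, f = forward(x, states, start, trans, emit)
            b = backward(x, states, trans, emit)
            ll += log(px)
            for i, xi in enumerate(x):
                for s in states:
                    ec[s][xi] += f[i][s] * b[i][s] / px
                    if i + 1 < len(x):
                        for t in states:
                            tc[s][t] += (f[i][s] * trans[s][t]
                                         * emit[t][x[i + 1]] * b[i + 1][t] / px)
        # M-step: normalize the expected counts into new parameters
        trans = {s: {t: tc[s][t] / fsum(tc[s].values()) for t in states} for s in states}
        emit = {s: {y: ec[s][y] / fsum(ec[s].values()) for y in symbols} for s in states}
        if ll - prev < tol:
            break
        prev = ll
    return trans, emit
```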

13
Baum-Welch
  • The solution found by this iterative procedure is a local maximum,
  • very much dependent on the initial parameters.
  • The updates are equivalent to choosing p', e' to maximize
    L(p, e, p', e') = Σ_S P_p,e(S | x) log P_p',e'(S, x)
    s.t. Σ_t p'_s(t) = 1 for all s in Q,
         Σ_y e'_s(y) = 1 for all s in Q,
         p'_s(t) ≥ 0, e'_s(y) ≥ 0

14
Baum-Welch
  • Could use Viterbi state assignments instead:
    less accurate, but a little faster.
  • Many computational biology applications of HMMs
    use training sequences with labels.

15
HMM structure
  • These basic algorithms can be quite expressive, but we need more.
  • Our current models emit a geometrically distributed number of symbols in
    each state:
  • P(l | s) = p_s(s)^(l-1) · (1 - p_s(s))
  • This is fine, as long as it's really true!
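A tiny Python check of this geometric run-length distribution, using the Fair state's self-transition probability of 0.95 from the casino model.

```python
def run_length_pmf(p_stay, l):
    """P(run length = l) = p_stay**(l-1) * (1 - p_stay): the geometric
    distribution implied by a single self-looping state."""
    return p_stay ** (l - 1) * (1 - p_stay)

# Mean run length for the Fair state is 1 / (1 - 0.95) = 20 rolls, but the pmf
# is monotonically decreasing, so short runs are always the most likely lengths:
print([round(run_length_pmf(0.95, l), 4) for l in (1, 5, 20, 50)])
```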

16
Dishonest casino
[Dishonest casino state diagram repeated; the self-transition probabilities 0.95 (Fair) and 0.90 (Loaded) are what set the geometric run lengths.]
17
HMM structure
[Two chained-state topologies: one enforcing min length = 7 with a geometric distribution otherwise, and one giving an arbitrary distribution of lengths between 2 and 8.]
18
HMM structure
  • Using topology in this way is equivalent to setting some p_s(t) to zero.
  • Typically, we also constrain the e_s(y) to be equal for all states s in a
    group Q_j (see the sketch after this list).
  • The dynamic programming algorithms stay much the same, but become much more
    graph-like.
  • We can also change the emission model:
  • Silent states
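A minimal sketch of the first topology from slide 17: a chain of tied copies of one state that forces a minimum run length. The state names, the `exit_state` hook, and the function itself are illustrative; the slides only describe the idea.

```python
def min_length_chain(name, n_min, p_stay, emit_dist, exit_state):
    """Chain n_min tied copies of a state so every visit emits at least n_min
    symbols; only the last copy self-loops, giving a geometric tail."""
    trans, emit = {}, {}
    for i in range(1, n_min + 1):
        s = f"{name}_{i}"
        emit[s] = dict(emit_dist)                    # emissions tied across the chain
        if i < n_min:
            trans[s] = {f"{name}_{i + 1}": 1.0}      # forced forward: p_s(t) = 0 elsewhere
        else:
            trans[s] = {s: p_stay, exit_state: 1.0 - p_stay}
    return trans, emit

# e.g. a Loaded region lasting at least 7 rolls, geometric afterwards;
# "Fair" is just a placeholder for whatever state follows in the full model.
t, e = min_length_chain("Loaded", 7, 0.90,
                        {1: .1, 2: .1, 3: .1, 4: .1, 5: .1, 6: .5}, exit_state="Fair")
```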

19
HMM structure
  • Suppose we need to model skipping over states:
  • Too many parameters! O(n^2)

[Diagram: the same skipping behaviour modeled through a chain of silent states.]
20
Semi-HMM
  • Semi-Hidden Markov Models extend this idea still further:
  • Explicitly model the output length distribution.
  • Each state emits a sequence of characters whose length is drawn from some
    distribution.
  • No more s-to-s transitions!

21
Dishonest casino
[Semi-HMM version of the casino: Fair emits runs of length ~ Geom(p = 0.95) with each face at 1/6; Loaded emits runs of length ~ Geom(p = 0.90) with faces 1-5 at 1/10 and 6 at 1/2; the Fair→Loaded and Loaded→Fair transitions both have probability 1.00.]
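The semi-HMM casino in the diagram above can be sampled from directly; a sketch, with illustrative function and argument names:

```python
import random

def geometric(p_continue):
    """Sample a run length L >= 1 with P(L = l) = p_continue**(l-1) * (1 - p_continue)."""
    l = 1
    while random.random() < p_continue:
        l += 1
    return l

def sample_semi_hmm(n_segments, state, trans, emit, length_sampler):
    """Generate from a semi-HMM: draw a segment length, emit that many symbols,
    then jump to a different state (no self-transitions)."""
    out = []
    for _ in range(n_segments):
        k = length_sampler[state]()
        symbols, weights = zip(*emit[state].items())
        out.extend(random.choices(symbols, weights=weights, k=k))
        targets, tweights = zip(*trans[state].items())
        state = random.choices(targets, weights=tweights, k=1)[0]
    return out

# The semi-HMM casino above: Fair <-> Loaded with probability 1, geometric lengths.
rolls = sample_semi_hmm(
    4, "Fair",
    trans={"Fair": {"Loaded": 1.0}, "Loaded": {"Fair": 1.0}},
    emit={"Fair": {x: 1 / 6 for x in range(1, 7)},
          "Loaded": {1: .1, 2: .1, 3: .1, 4: .1, 5: .1, 6: .5}},
    length_sampler={"Fair": lambda: geometric(0.95), "Loaded": lambda: geometric(0.90)})
```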
22
Finding CpG Islands
23
CpG Islands
[Length distributions P+(length) and P-(length) for CpG-island and non-island segments.]
24
Semi-HMMs
  • Semi-HMMs free us to choose arbitrary length
    distributions and emit sequences of (arbitrary)
    characters
  • Markov chains for local dependencies
  • Highly structured models that capture specific
    modules
  • CpG / non-CpG
  • Gene / Intergenic
  • Exon / Intron

25
Example Gene-finding HMM
26
Example Gene-finding HMM
  • 5th-order intergenic Markov model:
  • P(x_i | x_{i-5} x_{i-4} x_{i-3} x_{i-2} x_{i-1})
  • 4^5 · 3 = 3072 parameters
  • First-order internal codon model:
  • 61 · 60 = 3660 parameters
  • Stop codon choice:
  • 2 parameters
  • Total of 6734 parameters + two length distributions!
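A quick check of the parameter counts quoted above, assuming 4 nucleotides, 61 non-stop codons, and 3 stop codons (the standard genetic code):

```python
intergenic = 4 ** 5 * 3   # 5th-order model: 4^5 contexts x 3 free probabilities = 3072
codon      = 61 * 60      # 1st-order codon model: 61 contexts x 60 free probabilities = 3660
stop       = 2            # a choice among 3 stop codons = 2 free parameters
print(intergenic, codon, intergenic + codon + stop)   # 3072 3660 6734
```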

27
Viterbi for Semi-HMM
  • Let v_k(i) be the probability of the most probable sequence of states that
    ends in state k with observation i; then
    v_k(i) = e_k(x_i) · max_k' in Q [ p_k'(k) · v_k'(i-1) ]
  • Let V_k(i) be the probability of the most probable sequence of states that
    ends in state k with an emission that ends at position i. V_k(i) = max_j ...

28
Viterbi for Semi-HMM
  • V_k(i) = max_j max_k' in Q [ P_k(x_{j+1} ... x_i) · p_k'(k) · V_k'(j) ]
    (a Python transcription follows this list)
  • This should look familiar:
  • arbitrary gap models for sequence alignment!
  • Unfortunately, this means O(N^2) time.
  • For gene-finding models, not so bad:
  • exons are short
  • start and stop codons define few positions to check
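A direct sketch of that recurrence, mainly to make the O(N^2) cost visible. `emit_seg` is a hypothetical per-state function that scores a whole emitted segment (length distribution and character model together), and `start` is an assumed initial distribution; traceback is omitted.

```python
def semi_viterbi(x, states, trans, emit_seg, start):
    """V_k(i): probability of the best parse of x[0..i] whose final segment is
    emitted by state k and ends at position i (0-indexed, inclusive)."""
    n = len(x)
    V = {k: [0.0] * n for k in states}
    for i in range(n):
        for k in states:
            # option 1: a single segment covering the whole prefix x[0..i]
            best = start[k] * emit_seg[k](x[: i + 1])
            # option 2: a segment x[j+1..i] preceded by the best parse ending at j
            for j in range(i):
                seg = emit_seg[k](x[j + 1 : i + 1])
                for kp in states:
                    best = max(best, V[kp][j] * trans[kp][k] * seg)
            V[k][i] = best
    return max(V[k][n - 1] for k in states)
```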

29
Summary
  • HMM parameters are estimated from training data:
  • Explicit frequencies if states are known
  • Baum-Welch if states are unknown
  • HMM structure makes domain knowledge explicit:
  • Silent states
  • Semi-HMMs for arbitrary length distributions
  • Emit sequences (Markov dependencies) rather than single characters