1
Class 7: Hidden Markov Models
2
Sequence Models
  • So far we examined several probabilistic
    sequence models
  • These models, however, assumed that positions are
    independent
  • This means that the order of elements in the
    sequence did not play a role
  • In this class we learn about probabilistic models
    of sequences in which order does matter

3
Probability of Sequences
  • Fix an alphabet Σ
  • Let X1,...,Xn be a sequence of random variables
    over Σ
  • We want to model P(X1,...,Xn)

4
Markov Chains
  • Assumption
  • Xi+1 is independent of the past once we know Xi
  • This allows us to write
    P(X1,...,Xn) = P(X1) · P(X2 | X1) · ... · P(Xn | Xn-1)

5
Markov Chains (cont)
  • Assumption
  • P(Xi+1 | Xi) is the same for all i
  • Notation: P(Xi+1 = b | Xi = a) = Aab
  • By specifying the matrix A and initial
    probabilities, we define P(X1,...,Xn)
  • To avoid the special case of P(X1), we can use a
    special start state s, and denote P(X1 = a) = Asa
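
A minimal sketch of this computation, assuming the notation above; the function name, the 0-based symbol indexing, and the encoding of the start probabilities as a separate vector are illustrative choices, not part of the slides:

    # Probability of a sequence under a first-order Markov chain.
    # A[a][b] = P(X_{i+1} = b | X_i = a); start[a] plays the role of A_sa.
    def markov_chain_prob(x, A, start):
        p = start[x[0]]
        for i in range(len(x) - 1):
            p *= A[x[i]][x[i + 1]]
        return p

    # Example with a two-symbol alphabet {0, 1}:
    A = [[0.7, 0.3],
         [0.4, 0.6]]
    start = [0.5, 0.5]
    print(markov_chain_prob([0, 0, 1, 1], A, start))  # 0.5 * 0.7 * 0.3 * 0.6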

6
Example: CpG islands
  • In the human genome, CpG dinucleotides are
    relatively rare
  • CpG pairs undergo a process called methylation
    that modifies the C nucleotide
  • A methylated C can (with relatively high chance)
    mutate to a T
  • Promoter regions are CpG-rich
  • These regions are not methylated, and thus mutate
    less often
  • These are called CpG islands

7
CpG Islands
  • We construct Markov chains for CpG-rich and
    CpG-poor regions
  • Using maximum likelihood estimates from 60K
    nucleotides, we get two models

8
Ratio Test for CpG islands
  • Given a sequence X1,...,Xn we compute the
    likelihood ratio
    P(X1,...,Xn | + model) / P(X1,...,Xn | - model)
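
A sketch of the test in code, assuming the two transition matrices A_plus and A_minus were estimated as on the previous slide and that the sequence is encoded as integer indices; the log form of the score and all names are illustrative:

    from math import log

    # Log-likelihood ratio of a sequence x under the CpG-rich (+) and
    # CpG-poor (-) Markov chain models.
    def log_ratio(x, A_plus, A_minus):
        score = 0.0
        for i in range(len(x) - 1):
            a, b = x[i], x[i + 1]
            score += log(A_plus[a][b]) - log(A_minus[a][b])
        return score  # > 0 suggests a CpG island, < 0 suggests background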

9
Empirical Evaluation
10
Finding CpG islands
  • Simple-minded approach
  • Pick a window of size N (N = 100, for example)
  • Compute the log-ratio for the sequence in the
    window, and classify based on that
  • Problems
  • How do we select N?
  • What do we do when the window intersects the
    boundary of a CpG island?

11
Alternative Approach
  • Build a model that includes "+" states and "-"
    states
  • A state remembers the last nucleotide and the type
    of region
  • A transition from a "-" state to a "+" state
    describes the start of a CpG island

12
Hidden Markov Models
  • Two components
  • A Markov chain of hidden states H1,...,Hn with L
    values
  • P(Hi+1 = k | Hi = l) = Alk
  • Observations X1,...,Xn
  • Assumption: Xi depends only on the hidden state Hi
  • P(Xi = a | Hi = k) = Bka
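
To make the two components concrete, here is a small sketch of the joint probability P(x1,...,xn, h1,...,hn); the explicit initial distribution start is an assumption on my part (the slides instead use a special start state):

    # Joint probability of observations x and hidden states h under an HMM.
    # A[l][k] = P(H_{i+1} = k | H_i = l), B[k][a] = P(X_i = a | H_i = k),
    # start[k] = P(H_1 = k).
    def hmm_joint_prob(x, h, A, B, start):
        p = start[h[0]] * B[h[0]][x[0]]
        for i in range(1, len(x)):
            p *= A[h[i - 1]][h[i]] * B[h[i]][x[i]]
        return p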

13
Semantics
14
Example: Dishonest Casino
15
Computing Most Probable Sequence
  • Given x1,...,xn
  • Output: h1,...,hn maximizing P(h1,...,hn | x1,...,xn)

16
  • Idea
  • If we know the value of hi, then the most
    probable sequence on i+1,...,n does not depend on
    observations before time i
  • Let Vi(l) be the probability of the best sequence
    h1,...,hi such that hi = l

17
Dynamic Programming Rule
  • so
    Vi+1(l) = Bl,xi+1 · maxk [ Vi(k) · Akl ]

18
Viterbi Algorithm
  • Set V0(0) = 1, V0(l) = 0 for l > 0
  • for i = 1,...,n
  • for l = 1,...,L
  • set Vi(l) = Bl,xi · maxk [ Vi-1(k) · Akl ]
    and record the maximizing k as Ptri(l)
  • Let hn = argmaxl Vn(l)
  • for i = n-1,...,1
  • set hi = Ptri+1(hi+1)
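
A sketch of the algorithm in Python; working in log-space and passing an explicit initial distribution start are my additions (the slide uses a start state with V0(0) = 1), and all probabilities are assumed to be nonzero:

    from math import log

    # Viterbi: most probable hidden sequence h_1..h_n for observations x_1..x_n.
    # A[l][k] = P(H_{i+1} = k | H_i = l), B[k][a] = P(X_i = a | H_i = k).
    def viterbi(x, A, B, start):
        L = len(start)
        V = [[log(start[l]) + log(B[l][x[0]]) for l in range(L)]]
        ptr = []
        for i in range(1, len(x)):
            row, back = [], []
            for l in range(L):
                k_best = max(range(L), key=lambda k: V[-1][k] + log(A[k][l]))
                row.append(V[-1][k_best] + log(A[k_best][l]) + log(B[l][x[i]]))
                back.append(k_best)
            V.append(row)
            ptr.append(back)
        # Traceback: h_n = argmax_l V_n(l), then h_i = Ptr_{i+1}(h_{i+1})
        h = [max(range(L), key=lambda l: V[-1][l])]
        for back in reversed(ptr):
            h.append(back[h[-1]])
        return list(reversed(h))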

19
Viterbi Algorithm Example
20
Computing Probabilities
  • Given x1,...,xn
  • Output: P(x1,...,xn)
  • How do we sum over an exponential number of hidden
    sequences?

21
Forward Algorithm
  • Perform dynamic programming on prefixes of the
    sequence
  • Let fi(l) = P(x1,...,xi, Hi = l)
  • Recursion rule
    fi+1(l) = Bl,xi+1 · Σk fi(k) · Akl
  • Conclusion
    P(x1,...,xn) = Σl fn(l)
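
A sketch of the forward pass, again with an assumed explicit initial distribution; a practical implementation would rescale or work in log-space to avoid underflow on long sequences:

    # Forward algorithm: f[i][l] = P(x_1..x_{i+1}, H_{i+1} = l) (0-based rows).
    def forward(x, A, B, start):
        L = len(start)
        f = [[start[l] * B[l][x[0]] for l in range(L)]]
        for i in range(1, len(x)):
            f.append([B[l][x[i]] * sum(f[-1][k] * A[k][l] for k in range(L))
                      for l in range(L)])
        return f

    # Conclusion: P(x_1,...,x_n) = sum_l f_n(l)
    # prob_x = sum(forward(x, A, B, start)[-1])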

22
Computing Posteriors
  • How do we compute P(Hi | x1,...,xn) ?

23
Backward Algorithm
  • Perform dynamic programming on suffixes of the
    sequence
  • Let bi(l) = P(xi+1,...,xn | Hi = l)
  • Recursion rule
    bi(l) = Σk Alk · Bk,xi+1 · bi+1(k)
  • Conclusion
    P(x1,...,xn) = Σl A0l · Bl,x1 · b1(l)
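
A matching sketch of the backward pass under the same assumptions:

    # Backward algorithm: b[i][l] = P(x[i+1:] | hidden state at position i is l),
    # using 0-based positions i = 0..n-1.
    def backward(x, A, B):
        n, L = len(x), len(A)
        b = [[1.0] * L]                                   # last row: b_n(l) = 1
        for i in range(n - 2, -1, -1):
            nxt = b[0]                                    # row for position i + 1
            b.insert(0, [sum(A[l][k] * B[k][x[i + 1]] * nxt[k] for k in range(L))
                         for l in range(L)])
        return b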

24
Computing Posteriors
  • How do we compute P(Hi | x1,...,xn) ?
  • Combine the forward and backward messages
    P(Hi = l | x1,...,xn) = fi(l) · bi(l) / P(x1,...,xn)
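
Combining the two tables gives the posteriors directly; this sketch reuses the forward and backward functions sketched above:

    # Posterior state probabilities: gamma[i][l] = P(H_i = l | x_1,...,x_n).
    def posteriors(x, A, B, start):
        f = forward(x, A, B, start)
        b = backward(x, A, B)
        px = sum(f[-1])                       # P(x_1,...,x_n)
        L = len(start)
        return [[f[i][l] * b[i][l] / px for l in range(L)]
                for i in range(len(x))]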

25
Dishonest Casino (again)
  • Computing the posterior probability of the "fair"
    state at each point in a long sequence

26
Learning
  • Given a sequence x1,...,xn and hidden states
    h1,...,hn
  • How do we learn Akl and Bka ?
  • We want to find parameters that maximize the
    likelihood P(x1,...,xn, h1,...,hn)
  • We simply count
  • Nkl - number of times hi = k and hi+1 = l
  • Nka - number of times hi = k and xi = a
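
A sketch of this fully observed case; the normalization assumes every state actually occurs in h (otherwise its row stays zero), and all names are illustrative:

    from collections import Counter

    # Maximum likelihood estimates from an observed pair (x, h):
    # A_kl = N_kl / sum_l' N_kl',  B_ka = N_ka / sum_a' N_ka'.
    def ml_estimate(x, h, L, S):              # S = alphabet size
        N_trans = Counter(zip(h, h[1:]))      # N_kl: times h_i = k and h_{i+1} = l
        N_emit = Counter(zip(h, x))           # N_ka: times h_i = k and x_i = a
        A = [[N_trans[(k, l)] / max(1, sum(N_trans[(k, m)] for m in range(L)))
              for l in range(L)] for k in range(L)]
        B = [[N_emit[(k, a)] / max(1, sum(N_emit[(k, c)] for c in range(S)))
              for a in range(S)] for k in range(L)]
        return A, B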

27
Learning
  • Given only the sequence x1,...,xn
  • How do we learn Akl and Bka ?
  • We want to find parameters that maximize the
    likelihood P(x1,...,xn)
  • Problem
  • Counts are inaccessible since we do not observe hi

28
  • If we have Akl and Bka we can compute the
    posteriors P(Hi = l | x1,...,xn) using the forward
    and backward messages

29
Expected Counts
  • We can compute the expected number of times hi = k
    and hi+1 = l
    E[Nkl] = Σi P(Hi = k, Hi+1 = l | x1,...,xn)
  • Similarly
    E[Nka] = Σi: xi = a P(Hi = k | x1,...,xn)
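
A sketch of the E-step computation, reusing the forward and backward sketches above; it relies on the standard pairwise posterior P(Hi = k, Hi+1 = l | x) = fi(k) · Akl · Bl,xi+1 · bi+1(l) / P(x):

    # Expected counts E[N_kl] and E[N_ka] given current parameters (E-step).
    def expected_counts(x, A, B, start, L, S):
        f = forward(x, A, B, start)
        b = backward(x, A, B)
        px = sum(f[-1])                                   # P(x_1,...,x_n)
        EN_trans = [[0.0] * L for _ in range(L)]
        EN_emit = [[0.0] * S for _ in range(L)]
        for i in range(len(x)):
            for k in range(L):
                # P(H_i = k | x) adds to the count of emitting symbol x_i from k
                EN_emit[k][x[i]] += f[i][k] * b[i][k] / px
                if i + 1 < len(x):
                    for l in range(L):
                        # P(H_i = k, H_{i+1} = l | x)
                        EN_trans[k][l] += (f[i][k] * A[k][l]
                                           * B[l][x[i + 1]] * b[i + 1][l] / px)
        return EN_trans, EN_emit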

30
Expectation Maximization (EM)
  • Choose initial values for Akl and Bka
  • E-step
  • Compute expected counts E[Nkl], E[Nka]
  • M-step
  • Re-estimate Akl and Bka from the expected counts
  • Reiterate
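
Putting the pieces together, one EM iteration looks roughly like the following sketch; it reuses expected_counts above, keeps the initial distribution fixed, and assumes every state gets a nonzero expected count:

    # One EM iteration: E-step (expected counts), then M-step (re-normalize).
    def em_step(x, A, B, start, L, S):
        EN_trans, EN_emit = expected_counts(x, A, B, start, L, S)
        A_new = [[EN_trans[k][l] / sum(EN_trans[k]) for l in range(L)]
                 for k in range(L)]
        B_new = [[EN_emit[k][a] / sum(EN_emit[k]) for a in range(S)]
                 for k in range(L)]
        return A_new, B_new

    # Reiterate until P(x_1,...,x_n) = sum(forward(x, A, B, start)[-1])
    # stops improving.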

31
EM - basic properties
  • P(x1,...,xn | A'kl, B'ka) ≥ P(x1,...,xn | Akl, Bka),
    where A'kl, B'ka are the re-estimated parameters
  • Likelihood grows in each iteration
  • If P(x1,...,xn | A'kl, B'ka) = P(x1,...,xn | Akl,
    Bka), then Akl, Bka is a stationary point of the
    likelihood
  • either a local maximum, minimum, or saddle point

32
Complexity of E-step
  • Compute forward and backward messages
  • Time complexity O(nL²), space complexity O(nL)
  • Accumulate expected counts
  • Time complexity O(nL²)
  • Space complexity O(L²)

33
EM - problems
  • Local Maxima
  • Learning can get stuck in local maxima
  • Sensitive to initialization
  • Requires some method for escaping such maxima
  • Choosing L
  • We often do not know how many hidden values we
    should have or can learn