Markov Chains and Hidden Markov Models
1
Markov Chains and Hidden Markov Models
  • Marjolijn Elsinga
  • Elze de Groot

2
Andrei A. Markov
  • Born 14 June 1856 in Ryazan, Russia; died 20
    July 1922 in Petrograd, Russia
  • Graduate of Saint Petersburg University (1878)
  • Worked on number theory and analysis, continued
    fractions, limits of integrals, approximation
    theory and the convergence of series

3
Today's topics
  • Markov chains
  • Hidden Markov models
  • - Viterbi Algorithm
  • - Forward Algorithm
  • - Backward Algorithm
  • - Posterior Probabilities

4
Markov Chains (1)
  • Emitting states

5
Markov Chains (2)
  • Transition probabilities
  • Probability of the sequence

6
Key property of Markov Chains
  • The probability of a symbol xi depends only on
    the value of the preceding symbol xi-1
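This key property can be sketched in code: the probability of a whole sequence factorises as P(x) = P(x1) · Πi a(xi-1, xi). A minimal sketch, with an illustrative transition table and an assumed uniform start distribution (these numbers are not from the slides):

```python
# Illustrative first-order Markov chain over DNA letters; the numbers
# are made up, not the CpG-island estimates discussed later.
trans = {
    'A': {'A': 0.30, 'C': 0.20, 'G': 0.30, 'T': 0.20},
    'C': {'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25},
    'G': {'A': 0.20, 'C': 0.30, 'G': 0.30, 'T': 0.20},
    'T': {'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25},
}

def chain_probability(seq, trans, start=0.25):
    """P(x) = P(x1) * prod_i a_{x_{i-1} x_i}: each symbol depends
    only on the one before it (the Markov property)."""
    p = start                      # assumed uniform start probability
    for prev, cur in zip(seq, seq[1:]):
        p *= trans[prev][cur]
    return p
```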

7
Begin and End states
  • Silent states

8
Example: CpG Islands
  • CpG = Cytosine - phosphodiester bond - Guanine
  • 100 to 1000 bases long
  • Cytosine is modified by methylation
  • Methylation is suppressed in short stretches of
    the genome (start regions of genes)
  • High chance of mutation into a thymine (T)

9
Two questions
  • How would we decide if a short stretch of genomic
    sequence comes from a CpG island or not?
  • How would we find, given a long piece of
    sequence, the CpG islands in it, if there are any?

10
Discrimination
  • 48 putative CpG islands are extracted
  • Derive 2 models:
  • - regions labelled as CpG islands ('+' model)
  • - regions from the remainder ('-' model)
  • Transition probabilities are set to
    a+st = c+st / Σt' c+st'
  • - where c+st is the number of times letter t follows
    letter s in the '+' regions (and analogously for '-')
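A minimal sketch of this maximum-likelihood estimator (the training fragments and the helper name `estimate_transitions` are invented for illustration):

```python
from collections import defaultdict

def estimate_transitions(sequences):
    """Maximum-likelihood estimate a_st = c_st / sum_t' c_st',
    where c_st counts how often letter t follows letter s."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for s, t in zip(seq, seq[1:]):
            counts[s][t] += 1                      # c_st
    return {s: {t: c / sum(row.values()) for t, c in row.items()}
            for s, row in counts.items()}          # each row sums to 1

# e.g. a toy '+' model trained on two made-up island fragments
plus_model = estimate_transitions(["CGCGCG", "GCGC"])
```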

11
Maximum Likelihood Estimators
  • Each row sums to 1
  • Tables are asymmetric

12
Log-odds ratio
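The slide's table is not reproduced in the transcript. As a sketch, the log-odds score sums log2 ratios of the '+' and '-' transition probabilities, and a positive score favours the island model; the two table entries below are illustrative stand-ins:

```python
import math

# Illustrative transition entries for the '+' and '-' models
plus  = {('C', 'G'): 0.274, ('G', 'C'): 0.339}
minus = {('C', 'G'): 0.078, ('G', 'C'): 0.246}

def log_odds(seq, plus, minus):
    """S(x) = sum_i log2( a+_{x_{i-1} x_i} / a-_{x_{i-1} x_i} )"""
    return sum(math.log2(plus[pair] / minus[pair])
               for pair in zip(seq, seq[1:]))

log_odds("CGC", plus, minus)   # positive: looks like a CpG island
```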
13
Discrimination shown
14
Simulation: '+' model
15
Simulation: '-' model
16
Today's topics
  • Markov chains
  • Hidden Markov models
  • - Viterbi Algorithm
  • - Forward Algorithm
  • - Backward Algorithm
  • - Posterior Probabilities

17
Hidden Markov Models (HMM) (1)
  • No one-to-one correspondence between states and
    symbols
  • No longer possible to say which state the model is
    in when it emits xi
  • Transition probability from state k to l:
    akl = P(πi = l | πi-1 = k)
  • πi is the ith state in the path (state sequence)

18
Hidden Markov Models (HMM) (2)
  • Begin state: a0k
  • End state: ak0
  • In the CpG islands example there are eight states:
    A+, C+, G+, T+ and A-, C-, G-, T-

19
Hidden Markov Models (HMM) (3)
  • We need a new set of parameters because we
    decoupled symbols from states
  • Probability that symbol b is seen when in state k
    (emission probability): ek(b) = P(xi = b | πi = k)

20
Example: dishonest casino (1)
  • Fair die and loaded die
  • Loaded die: probability 0.5 of a 6 and
    probability 0.1 for 1-5
  • Switch from fair to loaded: probability 0.05
  • Switch back: probability 0.1

21
Dishonest casino (2)
  • Emission probabilities: HMM models generate
    or emit sequences

22
Dishonest casino (3)
  • Hidden: you don't know if the die is fair or loaded
  • Joint probability of observed sequence x and
    state sequence π:
    P(x, π) = a0π1 · Πi eπi(xi) aπiπi+1
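A sketch of this joint probability for the casino model, using the numbers from the slides (a uniform begin state is assumed, and no end state is modelled):

```python
# Casino HMM from the slides: fair (F) and loaded (L) die.
EMIT = {'F': {k: 1 / 6 for k in range(1, 7)},
        'L': {**{k: 0.1 for k in range(1, 6)}, 6: 0.5}}
TRANS = {'F': {'F': 0.95, 'L': 0.05},   # fair -> loaded: 0.05
         'L': {'F': 0.10, 'L': 0.90}}   # loaded -> fair: 0.1
START = {'F': 0.5, 'L': 0.5}            # assumed uniform begin state

def joint_probability(rolls, path):
    """P(x, pi) = a_{0 pi_1} * prod_i e_{pi_i}(x_i) * a_{pi_i pi_{i+1}}."""
    p = START[path[0]] * EMIT[path[0]][rolls[0]]
    for i in range(1, len(rolls)):
        p *= TRANS[path[i - 1]][path[i]] * EMIT[path[i]][rolls[i]]
    return p
```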

23
Three algorithms
  • What is the most probable path for generating a
    given sequence?
  • Viterbi Algorithm
  • How likely is a given sequence?
  • Forward Algorithm
  • How can we learn the HMM parameters given a set
    of sequences?
  • Forward-Backward (Baum-Welch) Algorithm

24
Viterbi Algorithm
  • CGCG can be generated in different ways, and with
    different probabilities
  • Choose the path with the highest probability
  • The most probable path can be found recursively

25
Viterbi Algorithm (2)
  • vk(i): probability of the most probable path ending
    in state k with observation xi
  • Recursion: vl(i+1) = el(xi+1) · maxk (vk(i) akl)
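A sketch of the full recursion with traceback on the casino model (probabilities from the slides; a uniform begin state is assumed):

```python
# Casino HMM: fair (F) and loaded (L) die, values from the slides.
STATES = ('F', 'L')
EMIT = {'F': {k: 1 / 6 for k in range(1, 7)},
        'L': {**{k: 0.1 for k in range(1, 6)}, 6: 0.5}}
TRANS = {'F': {'F': 0.95, 'L': 0.05}, 'L': {'F': 0.10, 'L': 0.90}}
START = {'F': 0.5, 'L': 0.5}   # assumed uniform begin state

def viterbi(rolls):
    """v_l(i+1) = e_l(x_{i+1}) * max_k v_k(i) * a_kl, plus traceback."""
    v = {s: START[s] * EMIT[s][rolls[0]] for s in STATES}
    back = []
    for x in rolls[1:]:
        ptr = {l: max(STATES, key=lambda k: v[k] * TRANS[k][l])
               for l in STATES}                   # best predecessor of l
        v = {l: EMIT[l][x] * v[ptr[l]] * TRANS[ptr[l]][l] for l in STATES}
        back.append(ptr)
    path = [max(STATES, key=lambda s: v[s])]
    for ptr in reversed(back):                    # follow the back-pointers
        path.append(ptr[path[-1]])
    return ''.join(reversed(path))
```

With this model, a run of sixes is best explained by the loaded die, and a run of ones by the fair die.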

26
Viterbi Algorithm (3)
27
Viterbi Algorithm
  • Most probable path for CGCG

28
Viterbi Algorithm
  • Result with casino example

29
Three algorithms
  • What is the most probable path for generating a
    given sequence?
  • Viterbi Algorithm
  • How likely is a given sequence?
  • Forward Algorithm
  • How can we learn the HMM parameters given a set
    of sequences?
  • Forward-Backward (Baum-Welch) Algorithm

30
Forward Algorithm (1)
  • Probability over all possible paths
  • Number of possible paths increases exponentially
    with the length of the sequence
  • The forward algorithm enables us to compute this
    efficiently

31
Forward Algorithm (2)
  • Replace the maximisation steps of the Viterbi
    algorithm with sums
  • fk(i): probability of the observed sequence up to and
    including xi, requiring πi = k:
    fk(i) = P(x1...xi, πi = k)
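The same casino model, with Viterbi's max replaced by a sum (a sketch; uniform begin state assumed, no end state modelled):

```python
# Casino HMM from the slides
STATES = ('F', 'L')
EMIT = {'F': {k: 1 / 6 for k in range(1, 7)},
        'L': {**{k: 0.1 for k in range(1, 6)}, 6: 0.5}}
TRANS = {'F': {'F': 0.95, 'L': 0.05}, 'L': {'F': 0.10, 'L': 0.90}}
START = {'F': 0.5, 'L': 0.5}   # assumed uniform begin state

def forward(rolls):
    """f_l(i+1) = e_l(x_{i+1}) * sum_k f_k(i) * a_kl;
    terminates with P(x) = sum_k f_k(L)."""
    f = {s: START[s] * EMIT[s][rolls[0]] for s in STATES}
    for x in rolls[1:]:
        f = {l: EMIT[l][x] * sum(f[k] * TRANS[k][l] for k in STATES)
             for l in STATES}
    return sum(f.values())
```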

32
Forward Algorithm (3)
33
Three algorithms
  • What is the most probable path for generating a
    given sequence?
  • Viterbi Algorithm
  • How likely is a given sequence?
  • Forward Algorithm
  • How can we learn the HMM parameters given a set
    of sequences?
  • Forward-Backward (Baum-Welch) Algorithm

34
Backward Algorithm (1)
  • bk(i): probability of the observed sequence from xi+1 to
    the end of the sequence, requiring πi = k:
    bk(i) = P(xi+1...xL | πi = k)
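A sketch of the backward recursion on the casino model, filled from the end of the sequence backwards (b_k(L) = 1 when no end state is modelled):

```python
# Casino HMM from the slides
STATES = ('F', 'L')
EMIT = {'F': {k: 1 / 6 for k in range(1, 7)},
        'L': {**{k: 0.1 for k in range(1, 6)}, 6: 0.5}}
TRANS = {'F': {'F': 0.95, 'L': 0.05}, 'L': {'F': 0.10, 'L': 0.90}}

def backward(rolls):
    """b_k(i) = sum_l a_kl * e_l(x_{i+1}) * b_l(i+1), filled from the
    end (where b_k(L) = 1) back to the start; returns the whole table."""
    b = [{s: 1.0 for s in STATES}]
    for x in reversed(rolls[1:]):
        b.insert(0, {k: sum(TRANS[k][l] * EMIT[l][x] * b[0][l]
                            for l in STATES) for k in STATES})
    return b
```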

35
Disadvantage of these algorithms
  • Multiplying many probabilities gives very small
    numbers, which can lead to underflow errors on the
    computer
  • This can be solved by running the algorithms in log
    space, calculating log(vl(i))
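In log space, the products in the Viterbi recursion become sums of logarithms, so long sequences no longer underflow. A sketch of one step, using the casino probabilities from the slides:

```python
import math

def log_viterbi_step(log_v_prev, log_trans, log_emit_x):
    """log v_l(i) = log e_l(x_i) + max_k [ log v_k(i-1) + log a_kl ]."""
    return {l: log_emit_x[l] + max(log_v_prev[k] + log_trans[k][l]
                                   for k in log_v_prev)
            for l in log_emit_x}

# Casino transition probabilities, taken into log space
log_trans = {'F': {'F': math.log(0.95), 'L': math.log(0.05)},
             'L': {'F': math.log(0.10), 'L': math.log(0.90)}}
prev = {'F': math.log(0.5 * 1 / 6),    # v_k(1) after rolling a 6,
        'L': math.log(0.25)}           # uniform begin state assumed
step = log_viterbi_step(prev, log_trans,
                        {'F': math.log(1 / 6), 'L': math.log(0.5)})
```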

36
Backward Algorithm
37
Posterior State Probability (1)
  • Probability that observation xi came from state
    k, given the observed sequence
  • Posterior probability of state k at time i when
    the emitted sequence is known:
  • P(πi = k | x)

38
Posterior State Probability (2)
  • First calculate the probability of producing the entire
    observed sequence with the ith symbol being
    produced by state k:
  • P(x, πi = k) = fk(i) · bk(i)

39
Posterior State Probability (3)
  • The posterior probabilities are then
    P(πi = k | x) = fk(i) bk(i) / P(x)
  • P(x) is the result of the forward or backward calculation
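Putting the pieces together for the casino model (a sketch; uniform begin state assumed): build the forward and backward tables, multiply them position by position, and divide by P(x).

```python
# Casino HMM from the slides
STATES = ('F', 'L')
EMIT = {'F': {k: 1 / 6 for k in range(1, 7)},
        'L': {**{k: 0.1 for k in range(1, 6)}, 6: 0.5}}
TRANS = {'F': {'F': 0.95, 'L': 0.05}, 'L': {'F': 0.10, 'L': 0.90}}
START = {'F': 0.5, 'L': 0.5}   # assumed uniform begin state

def posteriors(rolls):
    """P(pi_i = k | x) = f_k(i) * b_k(i) / P(x); one dict per position."""
    f = [{s: START[s] * EMIT[s][rolls[0]] for s in STATES}]   # forward
    for x in rolls[1:]:
        f.append({l: EMIT[l][x] * sum(f[-1][k] * TRANS[k][l]
                                      for k in STATES) for l in STATES})
    b = [{s: 1.0 for s in STATES}]                            # backward
    for x in reversed(rolls[1:]):
        b.insert(0, {k: sum(TRANS[k][l] * EMIT[l][x] * b[0][l]
                            for l in STATES) for k in STATES})
    px = sum(f[-1].values())                                  # P(x)
    return [{k: f[i][k] * b[i][k] / px for k in STATES}
            for i in range(len(rolls))]
```

At every position the posteriors sum to 1, since Σk fk(i) bk(i) = P(x) for each i.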

40
Posterior Probabilities (4)
  • For the casino example

41
Two questions
  • How would we decide if a short stretch of genomic
    sequence comes from a CpG island or not?
  • How would we find, given a long piece of
    sequence, the CpG islands in it, if there are any?

42
Prediction of CpG islands
  • First way: Viterbi Algorithm
  • - Find the most probable path through the model
  • - When this path goes through the '+' states, a
    CpG island is predicted

43
Prediction of CpG islands
  • Second way: Posterior Decoding
  • - Define a function g on the states:
  • - g(k) = 1 for k ∈ {A+, C+, G+, T+}
  • - g(k) = 0 for k ∈ {A-, C-, G-, T-}
  • - G(i|x) = Σk P(πi = k | x) g(k) is the posterior probability
    according to the model that base i is in a CpG island
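With this g, G(i|x) reduces to the total posterior mass on '+' states at position i. A sketch, assuming per-position posteriors over the eight states are available as dicts keyed 'A+', 'C-', etc. (the example dict is invented):

```python
def G(posterior_row):
    """G(i|x) = sum_k P(pi_i = k | x) * g(k): with g(k) = 1 exactly on
    the '+' states, this is the posterior mass on '+' at position i."""
    return sum(p for k, p in posterior_row.items() if k.endswith('+'))

# Invented posterior row for one position of some sequence
G({'C+': 0.6, 'G+': 0.1, 'C-': 0.2, 'G-': 0.1})
```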

44
Summary (1)
  • A Markov chain is a collection of states where each
    state depends only on the state before it
  • A hidden Markov model is a model in which the
    state sequence is hidden

45
Summary (2)
  • Most probable path: Viterbi algorithm
  • How likely is a given sequence? Forward
    algorithm
  • Posterior state probability: forward and backward
    algorithms (used to find the most probable state of an
    observation)