Title: Markov Chains and Hidden Markov Models
1 Markov Chains and Hidden Markov Models
- Marjolijn Elsinga
- Elze de Groot
2 Andrei A. Markov
- Born 14 June 1856 in Ryazan, Russia; died 20 July 1922 in Petrograd, Russia
- Graduate of Saint Petersburg University (1878)
- Worked on number theory and analysis, continued fractions, limits of integrals, approximation theory and the convergence of series
3 Today's topics
- Markov chains
- Hidden Markov models
- - Viterbi Algorithm
- - Forward Algorithm
- - Backward Algorithm
- - Posterior Probabilities
4 Markov Chains (1)
5 Markov Chains (2)
- Transition probabilities
- Probability of the sequence
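In the standard notation used in the rest of the deck, these two quantities are usually written as:
- Transition probabilities: a_st = P(x_i = t | x_{i-1} = s)
- Probability of the sequence: P(x) = P(x_L | x_{L-1}) ... P(x_2 | x_1) P(x_1) = P(x_1) * product over i = 2..L of a_{x_{i-1} x_i}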
6 Key property of Markov Chains
- The probability of a symbol x_i depends only on the value of the preceding symbol x_{i-1}: P(x_i | x_{i-1}, ..., x_1) = P(x_i | x_{i-1})
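A minimal Python sketch of the sequence-probability calculation above; the transition values below are illustrative placeholders, not the CpG estimates used later in the deck.

    # Toy first-order Markov chain over the DNA alphabet.
    # Transition values are illustrative only (each row sums to 1).
    transitions = {
        "A": {"A": 0.30, "C": 0.20, "G": 0.30, "T": 0.20},
        "C": {"A": 0.20, "C": 0.30, "G": 0.30, "T": 0.20},
        "G": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
        "T": {"A": 0.20, "C": 0.30, "G": 0.30, "T": 0.20},
    }
    initial = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

    def sequence_probability(seq):
        """P(x) = P(x_1) * prod_i a_{x_{i-1} x_i} for a first-order chain."""
        p = initial[seq[0]]
        for prev, cur in zip(seq, seq[1:]):
            p *= transitions[prev][cur]
        return p

    print(sequence_probability("CGCG"))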
7 Begin and End states
8 Example: CpG Islands
- CpG = Cytosine - phosphodiester bond - Guanine
- 100 to 1000 bases long
- Cytosine is modified by methylation
- Methylation is suppressed in short stretches of the genome (start regions of genes)
- Methylated cytosine has a high chance of mutating into a thymine (T)
9 Two questions
- How would we decide if a short stretch of genomic sequence comes from a CpG island or not?
- How would we find, given a long piece of sequence, the CpG islands in it, if there are any?
10 Discrimination
- 48 putative CpG islands are extracted
- Derive 2 models
- - regions labelled as CpG island (+ model)
- - regions from the remainder (- model)
- Transition probabilities are set to a+_st = c+_st / (sum over t' of c+_st'), and likewise for the - model
- - where c_st is the number of times letter t follows letter s
11 Maximum Likelihood Estimators
- Each row sums to 1
- Tables are asymmetric
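A small Python sketch of the maximum-likelihood estimation described above; the training sequences here are invented placeholders, not the 48 putative CpG islands from the original data.

    from collections import defaultdict

    def ml_transition_probabilities(sequences):
        """Maximum-likelihood estimates a_st = c_st / sum_t' c_st',
        where c_st counts how often letter t follows letter s."""
        counts = defaultdict(lambda: defaultdict(int))
        for seq in sequences:
            for s, t in zip(seq, seq[1:]):
                counts[s][t] += 1
        model = {}
        for s, row in counts.items():
            total = sum(row.values())
            model[s] = {t: c / total for t, c in row.items()}
        return model

    # Placeholder training data; the real + model was trained on the 48 CpG islands.
    plus_model = ml_transition_probabilities(["CGCGGCGC", "GCGCGCGG"])
    print(plus_model["C"])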
12 Log-odds ratio
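The score behind this slide is usually written as the per-sequence log-odds: S(x) = log( P(x | + model) / P(x | - model) ) = sum over i of log( a+_{x_{i-1} x_i} / a-_{x_{i-1} x_i} ); a positive score suggests a CpG island.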
13 Discrimination shown
14 Simulation: + model
15 Simulation: - model
16 Today's topics
- Markov chains
- Hidden Markov models
- - Viterbi Algorithm
- - Forward Algorithm
- - Backward Algorithm
- - Posterior Probabilities
17 Hidden Markov Models (HMM) (1)
- No one-to-one correspondence between states and symbols
- No longer possible to say what state the model is in just by looking at symbol x_i
- Transition probability from state k to l: a_kl = P(pi_i = l | pi_{i-1} = k)
- pi_i is the ith state in the path (state sequence)
18 Hidden Markov Models (HMM) (2)
- Begin state: transition probabilities a_0k out of the begin state
- End state: transition probabilities a_k0 into the end state
- In the CpG islands example: states A+, C+, G+, T+ and A-, C-, G-, T-
19 Hidden Markov Models (HMM) (3)
- We need a new set of parameters because we decoupled symbols from states
- Emission probability: the probability that symbol b is seen when in state k, e_k(b) = P(x_i = b | pi_i = k)
20 Example: dishonest casino (1)
- Fair die and loaded die
- Loaded die: probability 0.5 of a 6 and probability 0.1 for each of 1-5
- Switch from fair to loaded: probability 0.05
- Switch back: probability 0.1
21 Dishonest casino (2)
- Emission probabilities: an HMM is a model that generates or emits sequences
22 Dishonest casino (3)
- Hidden: you don't know whether the die is fair or loaded
- Joint probability of observed sequence x and state sequence pi: P(x, pi) = a_{0 pi_1} * product over i of e_{pi_i}(x_i) * a_{pi_i pi_{i+1}}
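A small Python sketch of the casino HMM with the parameters from slide 20 and the joint probability P(x, pi); the uniform begin-state probabilities are an assumption, since the slides do not give a_0k for this example.

    # Dishonest-casino HMM with the parameters from slide 20.
    states = ("F", "L")  # fair, loaded
    transition = {
        "F": {"F": 0.95, "L": 0.05},  # switch fair -> loaded with prob 0.05
        "L": {"F": 0.10, "L": 0.90},  # switch back with prob 0.1
    }
    emission = {
        "F": {r: 1 / 6 for r in "123456"},
        "L": {**{r: 0.1 for r in "12345"}, "6": 0.5},
    }
    # Begin-state probabilities: assumed uniform, not given on the slides.
    begin = {"F": 0.5, "L": 0.5}

    def joint_probability(rolls, path):
        """P(x, pi) = a_{0,pi_1} * prod_i e_{pi_i}(x_i) * a_{pi_i, pi_{i+1}}."""
        p = begin[path[0]]
        for i, (roll, state) in enumerate(zip(rolls, path)):
            p *= emission[state][roll]
            if i + 1 < len(path):
                p *= transition[state][path[i + 1]]
        return p

    print(joint_probability("266616", "FLLLLF"))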
23 Three algorithms
- What is the most probable path for generating a given sequence?
- Viterbi Algorithm
- How likely is a given sequence?
- Forward Algorithm
- How can we learn the HMM parameters given a set of sequences?
- Forward-Backward (Baum-Welch) Algorithm
24 Viterbi Algorithm
- CGCG can be generated in different ways, and with different probabilities
- Choose the path with the highest probability
- The most probable path can be found recursively
25 Viterbi Algorithm (2)
- v_k(i): probability of the most probable path ending in state k with observation x_i
26 Viterbi Algorithm (3)
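A minimal Python sketch of the Viterbi recurrence v_l(i+1) = e_l(x_{i+1}) * max_k( v_k(i) * a_kl ), including the traceback that recovers the most probable path; the usage example reuses the casino parameters from slide 20 with an assumed uniform begin state.

    def viterbi(obs, states, begin, transition, emission):
        """Most probable state path via v_l(i+1) = e_l(x_{i+1}) * max_k v_k(i) * a_kl."""
        v = [{k: begin[k] * emission[k][obs[0]] for k in states}]
        backptr = []
        for x in obs[1:]:
            prev = v[-1]
            col, ptr = {}, {}
            for l in states:
                best_k = max(states, key=lambda k: prev[k] * transition[k][l])
                ptr[l] = best_k
                col[l] = emission[l][x] * prev[best_k] * transition[best_k][l]
            v.append(col)
            backptr.append(ptr)
        # Traceback from the best final state.
        last = max(states, key=lambda k: v[-1][k])
        path = [last]
        for ptr in reversed(backptr):
            path.append(ptr[path[-1]])
        return "".join(reversed(path)), v[-1][last]

    # Toy usage with the casino parameters (begin state assumed uniform).
    states = ("F", "L")
    begin = {"F": 0.5, "L": 0.5}
    transition = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.1, "L": 0.9}}
    emission = {"F": {r: 1 / 6 for r in "123456"},
                "L": {**{r: 0.1 for r in "12345"}, "6": 0.5}}
    print(viterbi("1266666612", states, begin, transition, emission))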
27 Viterbi Algorithm
- Most probable path for CGCG
28 Viterbi Algorithm
- Result with the casino example
29 Three algorithms
- What is the most probable path for generating a given sequence?
- Viterbi Algorithm
- How likely is a given sequence?
- Forward Algorithm
- How can we learn the HMM parameters given a set of sequences?
- Forward-Backward (Baum-Welch) Algorithm
30 Forward Algorithm (1)
- Probability over all possible paths
- The number of possible paths increases exponentially with the length of the sequence
- The forward algorithm enables us to compute this efficiently
31 Forward Algorithm (2)
- Replace the maximisation steps of the Viterbi algorithm with sums
- f_k(i): probability of the observed sequence up to and including x_i, requiring pi_i = k
32 Forward Algorithm (3)
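A minimal Python sketch of the forward recursion f_l(i+1) = e_l(x_{i+1}) * sum_k( f_k(i) * a_kl ); the casino parameters and uniform begin state are the same assumptions as in the earlier sketches, and P(x) is obtained by summing the final column (no explicit end state).

    def forward(obs, states, begin, transition, emission):
        """P(x) over all paths: f_l(i+1) = e_l(x_{i+1}) * sum_k f_k(i) * a_kl."""
        f = {k: begin[k] * emission[k][obs[0]] for k in states}
        for x in obs[1:]:
            f = {l: emission[l][x] * sum(f[k] * transition[k][l] for k in states)
                 for l in states}
        return sum(f.values())  # no explicit end state: terminate by summing f_k(L)

    states = ("F", "L")
    begin = {"F": 0.5, "L": 0.5}
    transition = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.1, "L": 0.9}}
    emission = {"F": {r: 1 / 6 for r in "123456"},
                "L": {**{r: 0.1 for r in "12345"}, "6": 0.5}}
    print(forward("1266666612", states, begin, transition, emission))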
33 Three algorithms
- What is the most probable path for generating a given sequence?
- Viterbi Algorithm
- How likely is a given sequence?
- Forward Algorithm
- How can we learn the HMM parameters given a set of sequences?
- Forward-Backward (Baum-Welch) Algorithm
34 Backward Algorithm (1)
- b_k(i): probability of the observed sequence from x_{i+1} to the end of the sequence, given pi_i = k
35 Disadvantage of the algorithms
- Multiplying many probabilities gives very small numbers, which can lead to underflow errors on the computer
- This can be solved by doing the algorithms in log space, calculating log(v_l(i))
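A tiny illustration of the underflow problem and the log-space fix, using made-up probabilities:

    import math

    # Multiplying many small probabilities underflows to 0.0 ...
    probs = [0.01] * 200
    product = 1.0
    for p in probs:
        product *= p
    print(product)            # 0.0 due to underflow

    # ... but summing logs keeps the value representable.
    log_product = sum(math.log(p) for p in probs)
    print(log_product)        # about -921.03, the log of the true product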
36 Backward Algorithm
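A matching Python sketch of the backward recursion b_k(i) = sum_l( a_kl * e_l(x_{i+1}) * b_l(i+1) ), initialised with b_k(L) = 1 because no explicit end state is modelled; the casino parameters are again assumed.

    def backward(obs, states, transition, emission):
        """b_k(i) = sum_l a_kl * e_l(x_{i+1}) * b_l(i+1), with b_k(L) = 1."""
        b = {k: 1.0 for k in states}          # no explicit end state assumed
        columns = [b]
        for x in reversed(obs[1:]):
            b = {k: sum(transition[k][l] * emission[l][x] * b[l] for l in states)
                 for k in states}
            columns.append(b)
        return list(reversed(columns))        # columns[i] holds b_k at position i+1

    states = ("F", "L")
    transition = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.1, "L": 0.9}}
    emission = {"F": {r: 1 / 6 for r in "123456"},
                "L": {**{r: 0.1 for r in "12345"}, "6": 0.5}}
    print(backward("1266666612", states, transition, emission)[0])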
37 Posterior State Probability (1)
- Probability that observation x_i came from state k, given the observed sequence
- Posterior probability of state k at time i when the emitted sequence is known: P(pi_i = k | x)
38 Posterior State Probability (2)
- First calculate the probability of producing the entire observed sequence with the ith symbol being produced by state k: P(x, pi_i = k) = f_k(i) * b_k(i)
39 Posterior State Probability (3)
- The posterior probabilities will then be P(pi_i = k | x) = f_k(i) * b_k(i) / P(x)
- P(x) is the result of the forward or backward calculation
40 Posterior Probabilities (4)
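Combining the two passes, a sketch of posterior decoding via P(pi_i = k | x) = f_k(i) * b_k(i) / P(x); the model parameters and the uniform begin state are the same assumptions as in the earlier sketches.

    states = ("F", "L")
    begin = {"F": 0.5, "L": 0.5}
    transition = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.1, "L": 0.9}}
    emission = {"F": {r: 1 / 6 for r in "123456"},
                "L": {**{r: 0.1 for r in "12345"}, "6": 0.5}}

    def posteriors(obs):
        """P(pi_i = k | x) = f_k(i) * b_k(i) / P(x)."""
        # Forward columns f_k(i).
        f = [{k: begin[k] * emission[k][obs[0]] for k in states}]
        for x in obs[1:]:
            f.append({l: emission[l][x] * sum(f[-1][k] * transition[k][l] for k in states)
                      for l in states})
        # Backward columns b_k(i), initialised with b_k(L) = 1 (no explicit end state).
        b = [{k: 1.0 for k in states}]
        for x in reversed(obs[1:]):
            b.append({k: sum(transition[k][l] * emission[l][x] * b[-1][l] for l in states)
                      for k in states})
        b.reverse()
        px = sum(f[-1][k] for k in states)  # P(x) from the forward calculation
        return [{k: f[i][k] * b[i][k] / px for k in states} for i in range(len(obs))]

    for i, post in enumerate(posteriors("1266666612"), start=1):
        print(i, round(post["L"], 3))  # posterior probability that roll i used the loaded die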
41 Two questions
- How would we decide if a short stretch of genomic sequence comes from a CpG island or not?
- How would we find, given a long piece of sequence, the CpG islands in it, if there are any?
42 Prediction of CpG islands
- First way: Viterbi Algorithm
- - Find the most probable path through the model
- - When this path goes through a + state, a CpG island is predicted
43 Prediction of CpG islands
- Second way: Posterior Decoding
- - Define the function G(i|x) = sum over k of P(pi_i = k | x) * g(k)
- - g(k) = 1 for k in {A+, C+, G+, T+}
- - g(k) = 0 for k in {A-, C-, G-, T-}
- - G(i|x) is the posterior probability, according to the model, that base i is in a CpG island
44 Summary (1)
- A Markov chain is a collection of states where each state depends only on the state before
- A hidden Markov model is a model in which the state sequence is hidden
45 Summary (2)
- Most probable path: Viterbi algorithm
- How likely is a given sequence? Forward algorithm
- Posterior state probability: forward and backward algorithms (used for the most probable state of an observation)