1
Hidden Markov Models: Basics
  • Raj Bandyopadhyay
  • 11/07/2001

2
Our running example
  • Hypothetical Nucleic Acid (HNA)
  • Two bases or residues: H and T
  • Assume existence of HNA databases
  • Example sequence: HTTTHT

3
Motivating Example
  • HNA sequence database: HTT, TTT, HHH, TTH
      Position:    1      2      3
                   H      T      T
                   T      T      T
                   H      H      H
                   T      T      H
      P(H):       0.5    0.25   0.5
      P(T):       0.5    0.75   0.5
  • Positions 1 and 3 are determined by an unbiased
    coin, whereas position 2 is determined by a
    biased coin, as the sketch below estimates.
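A quick way to check these per-position frequencies (a sketch of mine, not part of the original deck):

    # Estimate P(H) and P(T) at each position of the HNA database.
    db = ["HTT", "TTT", "HHH", "TTH"]
    for pos in range(3):
        column = [seq[pos] for seq in db]
        p_h = column.count("H") / len(column)
        print(f"position {pos + 1}: P(H) = {p_h}, P(T) = {1 - p_h}")
    # -> position 1: P(H) = 0.5, P(T) = 0.5
    # -> position 2: P(H) = 0.25, P(T) = 0.75
    # -> position 3: P(H) = 0.5, P(T) = 0.5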

4
Motivating Example (contd.)
  • Imagine 2 people, A and B
  • A holds an unbiased coin: P(H) = P(T) = 0.5
  • B holds a biased coin: P(T) = 0.75, P(H) = 0.25
  • Our HNA database can then be explained:
  • A flips first, followed by B, then A again
  • Representation as a state diagram (Markov chain)

5
Markov Representation
(Figure: state diagram of the A/B coin-flipping process as a Markov chain)
6
Hidden Markov Model
  • What if
  • the people flipping the coins are hidden, and
  • only the results of the coin flips are known?
  • I.e., only the emissions are known; the states are unknown
  • Hidden Markov Model
  • What can we infer
  • From a given HMM about the data?
  • From given data about the generating HMM?

7
Terms to understand
  • State
  • Transition / transition probability
  • Emission / emission probability
  • Path

(State diagram of the two-coin HMM:
   Start → state 1, t_S1 = 1
   State 1 (fair coin): e_1(H) = 0.5, e_1(T) = 0.5; t_11 = 0.5, t_12 = 0.5
   State 2 (biased coin): e_2(H) = 0.25, e_2(T) = 0.75; t_22 = 0.5, t_2E = 0.5 to End)
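These pieces map directly onto a small data structure. A minimal sketch of the two-coin HMM above (my representation, not the presenter's):

    # States 1 (fair coin) and 2 (biased coin), plus Start 'S' and End 'E'.
    states = [1, 2]
    trans = {('S', 1): 1.0,                 # t_S1
             (1, 1): 0.5, (1, 2): 0.5,      # t_11, t_12
             (2, 2): 0.5, (2, 'E'): 0.5}    # t_22, t_2E
    emit = {1: {'H': 0.5,  'T': 0.5},       # e_1
            2: {'H': 0.25, 'T': 0.75}}      # e_2
    # A path is a state sequence such as (1, 1, 2); its probability is the
    # product of the transition and emission probabilities along it.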
8
Questions of interest
  • For a given HMM
  • Probability of generating a particular output
    sequence? (Likelihood)
  • Most probable path? (Decoding)
  • Adjusting probabilities in the light of observed
    sequences? (Learning/training)

9
Likelihood
  • Baum-Welch score: the likelihood of a sequence s is
    the sum of the likelihoods of all paths generating s
  • -log L_O(M), the negative log-likelihood, is used in practice
  • Two kinds of deductions are required
  • Forward: given the observations up to time t,
    predict the state at time t+1
  • Backward: given the observations from time t+1 onward,
    deduce the state at time t

10
Calculating Likelihood
  • The forward recursive relation (see below)
  • Dynamic programming: store previously calculated
    values of a_t(i) (the Forward method)
  • Similarly, the backward recursive relation leads
    to the Backward method
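The relation itself did not survive the transcript. Taking a_t(i) to be the total probability of reaching state i at time t having emitted the first t-1 symbols o_1 … o_{t-1} (the convention that matches the worked example on the next slide), the standard recursion would read:

    a_{t+1}(j) = Σ_i a_t(i) · e_i(o_t) · t_ij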

11
Example
(State diagram: Start → state 1 with probability 1; state 1 emits
 P(H) = 0.2, P(T) = 0.8 and has a self-loop 0.4 and a transition 0.6
 to state 2; state 2 emits P(H) = 0.3, P(T) = 0.7 and goes to End
 with probability 1)
Consider the sequence s = HT. Paths generating
s: 1→1 and 1→2.
L_HT(M) = 1·0.2·0.4·0.8 + (1·0.2·0.6)·0.7 = 0.064 + 0.084 = 0.148
a_1(1) = 1
a_2(1) = 1·0.2·0.4 = 0.08
a_2(2) = 1·0.2·0.6 = 0.12
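As a check, here is a runnable sketch of the Forward method on this example (my code, not the presenter's):

    # Slide-11 HMM: Start -> 1; state 1 emits H/T with 0.2/0.8 and loops
    # with 0.4 or moves to state 2 with 0.6; state 2 emits H/T with 0.3/0.7.
    states = [1, 2]
    trans = {(1, 1): 0.4, (1, 2): 0.6}
    emit = {1: {'H': 0.2, 'T': 0.8}, 2: {'H': 0.3, 'T': 0.7}}

    def forward_likelihood(seq):
        a = {1: 1.0, 2: 0.0}           # a_1(i): Start puts us in state 1
        for symbol in seq[:-1]:        # a_{t+1}(j) = sum_i a_t(i) e_i(o_t) t_ij
            a = {j: sum(a[i] * emit[i][symbol] * trans.get((i, j), 0.0)
                        for i in states)
                 for j in states}
        # emit the final symbol from whichever state the path ends in
        return sum(a[i] * emit[i][seq[-1]] for i in states)

    print(forward_likelihood("HT"))    # 0.148 = 0.064 + 0.084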
12
Most Probable Path
  • Viterbi score of HMM M w.r.t. sequence O:
  • probability of the most probable path generating O
  • Recursive relation and dynamic programming:
    the Viterbi algorithm (sketched below)
  • Let p_i(t) = probability of the most probable path
    ending in state i at time t
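A minimal Viterbi sketch for the same slide-11 HMM (my code, under the same assumptions as the forward sketch):

    states = [1, 2]
    trans = {(1, 1): 0.4, (1, 2): 0.6}
    emit = {1: {'H': 0.2, 'T': 0.8}, 2: {'H': 0.3, 'T': 0.7}}

    def viterbi(seq):
        p = {1: 1.0, 2: 0.0}            # p_i(1): every path starts in state 1
        back = []                       # back-pointers, one dict per step
        for symbol in seq[:-1]:
            nxt, ptr = {}, {}
            for j in states:
                # best predecessor i maximizing p_i(t) * e_i(o_t) * t_ij
                best = max(states,
                           key=lambda i: p[i] * emit[i][symbol] * trans.get((i, j), 0.0))
                nxt[j] = p[best] * emit[best][symbol] * trans.get((best, j), 0.0)
                ptr[j] = best
            back.append(ptr)
            p = nxt
        last = max(states, key=lambda i: p[i] * emit[i][seq[-1]])
        path = [last]
        for ptr in reversed(back):      # walk the back-pointers to recover the path
            path.append(ptr[path[-1]])
        return path[::-1], p[last] * emit[last][seq[-1]]

    print(viterbi("HT"))                # ([1, 2], 0.084): path 1->2 beats 1->1 (0.064)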

13
Learning: Training HMMs
  • Assume we have seen part X of a complete
    sequence Y
  • So far, we have a model M that maximizes L_X(M),
    the likelihood of X given M
  • We want a new model M' that maximizes L_Y(M'),
    given X and M
  • This is the Baum-Welch (EM) algorithm

14
Expectation-Maximization
  • Expectation step: obtain a score for the
    goodness of a new model
  • Maximization step: set the new model to the one
    with maximum goodness

15
New Baum-Welch HMM parameters
  • New transition probabilities (reconstructed below)
  • New emission probabilities
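The update formulas were rendered as images and did not survive the transcript. The standard Baum-Welch re-estimates, written in this deck's notation, are ratios of expected counts under the current model (computed from the forward and backward variables):

    t'_ij   = E[number of i → j transitions] / E[number of transitions out of state i]
    e'_i(σ) = E[number of times state i emits symbol σ] / E[number of visits to state i]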

16
Gradient Descent
  • Gradient descent (Baldi-Chauvin)
  • Define the negative log-likelihood as an energy measure
  • Derive new model parameters so as to minimize this
    energy at every step
  • Iterative greedy algorithm
  • Advantages over Baum-Welch
  • Online updates
  • No absorbing zero probabilities, unlike Baum-Welch
    (see the note below)
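A gloss on the zero-probability point (mine; the deck does not show the parameterization): Baldi and Chauvin reparameterize the probabilities through normalized exponentials, e.g.

    t_ij = exp(w_ij) / Σ_k exp(w_ik),   updated by   w_ij ← w_ij - η · ∂E/∂w_ij   with E = -log L,

so every t_ij remains strictly positive, whereas a Baum-Welch count ratio that reaches 0 stays at 0 forever.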

17
A Real Sequence HMM
(Figure: profile HMM architecture with matching (main), insert and delete states)
18
Implementation issues
  • Parameter initialization: average, uniform,
    random, etc.
  • Priors initialized to favor transitions toward
    matching (main) states
  • Initialization from existing multiple alignments
  • Model length: average length of the input sequences
  • Adaptable architecture?

19
HMMs: Advantages
  • Solid statistical foundation
  • Efficient learning algorithms
  • Flexible and general model for sequence
    properties
  • Unsupervised learning from variable-length, raw
    sequences

20
HMMs: Disadvantages
  • Large number of unstructured parameters
    (emission and transition parameters)
  • Need large amounts of data
  • Subtle long-range correlations in real sequences
    go unaccounted for, due to the Markov property