1
Hidden Markov Models (Modified by Winfried Just)
2
Outline
  • CG-islands
  • The Fair Bet Casino
  • Hidden Markov Model
  • Decoding Algorithm
  • Forward-Backward Algorithm
  • HMM Parameter Estimation
  • Profile HMM Alignment

3
CG-Islands
  • Given 4 nucleotides, each occurs with probability
    1/4. Thus, the probability of occurrence of any
    particular dinucleotide is 1/16.
  • However, the frequencies of dinucleotides in DNA
    sequences vary widely.
  • In particular, CG is typically underrepresented.
  • CG often mutates to TG. Thus, the probability of CG
    occurrence is typically less than 1/16.

4
Why CG-Islands?
  • CG is the least frequent dinucleotide because the C
    in CG is easily methylated and then tends to mutate
    into T.
  • However, methylation is suppressed around genes in
    a genome, so CG appears at relatively high frequency
    within these CG-islands.
  • Thus, finding the CG-islands in a genome is an
    important problem.

5
CG Islands and the Fair Bet Casino
  • The CG islands problem can be modeled after a
    problem named The Fair Bet Casino

6
The Fair Bet Casino
  • The game is to flip coins, with only two possible
    outcomes: Heads or Tails.
  • Suppose that the dealer uses both a Fair and a
    Biased coin.
  • The Fair coin gives Heads and Tails with the same
    probability of ½.
  • The Biased coin gives Heads with a probability of ¾.

7
The Fair Bet Casino (contd)
  • Thus, we define the probabilities:
  • P(H|F) = P(T|F) = ½
  • P(H|B) = ¾, P(T|B) = ¼
  • The crooked dealer switches between the Fair and
    Biased coins with probability 10%.

8
The Fair Bet Casino Problem
  • Input: A sequence x = x1x2x3…xn of coin tosses
    made by two possible coins (F or B).
  • Output: A sequence p = p1p2p3…pn, with each pi
    being either F or B, indicating that xi is the
    result of tossing the Fair or Biased coin,
    respectively.

9
Problem
Fair Bet Casino Problem: Any observed outcome
could have been generated by any sequence of coin
tosses!
We need to incorporate a way to grade different
sequences differently.
This is the Decoding Problem.
10
P(x|fair coin) vs. P(x|biased coin)
  • Some definitions:
  • P(x|fair coin): probability of generating the
    outcome x if the dealer uses the F coin.
  • P(x|biased coin): probability of generating the
    outcome x if the dealer uses the B coin.
  • k: the number of Heads in x.

11
P(x|fair coin) vs. P(x|biased coin)
  • P(x|fair coin) = 1/2^n
  • P(x|biased coin) = 3^k/4^n
  • P(x|fair coin) = P(x|biased coin)
  • when k = n / log2 3
  • k ≈ 0.63 n

12
Log-odds Ratio
  • We define the log-odds ratio as follows:
  • log2(P(x|fair coin) / P(x|biased coin))
    = Σ_{i=1..n} log2(p+(xi) / p−(xi))
    = n − k·log2 3

13
Computing Log-odds Ratio in Sliding Windows
x1x2x3x4x5x6x7x8…xn
Consider a sliding window over the outcome sequence
and find the log-odds ratio for this short window.
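
A minimal sketch of this sliding-window computation in Python, assuming the Fair Bet Casino probabilities (p+(H) = p+(T) = ½ for the fair coin, p−(H) = ¾, p−(T) = ¼ for the biased coin); the window size below is an arbitrary illustrative choice.

import math

# Per-symbol probabilities under the fair (p+) and biased (p-) coin.
P_PLUS  = {'H': 0.5,  'T': 0.5}
P_MINUS = {'H': 0.75, 'T': 0.25}

def log_odds(window):
    """log2( P(window|fair) / P(window|biased) ): a sum of per-symbol log-odds."""
    return sum(math.log2(P_PLUS[x] / P_MINUS[x]) for x in window)

def sliding_log_odds(x, w):
    """Log-odds ratio for every length-w window of the outcome sequence x."""
    return [log_odds(x[i:i + w]) for i in range(len(x) - w + 1)]

x = "HTHHHHTHHTTH"
for i, score in enumerate(sliding_log_odds(x, 5)):
    print(i, round(score, 3))  # strongly negative scores suggest the biased coin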
14
Hidden Markov Model (HMM)
  • Can be viewed as an abstract machine with k
    hidden states.
  • Each state has its own emission probability
    distribution, and the machine switches between
    states according to transition probabilities.
  • At each step, the machine makes two decisions:
  • What state should it move to next?
  • What symbol from its alphabet should it emit?

15
Why Hidden?
  • Observers can see the emitted symbols of an HMM
    but cannot see which state the HMM is currently
    in.
  • Thus, the goal is to infer the most likely
    sequence of states of an HMM based on a given
    sequence of emitted symbols.

16
HMM Parameters
  • Σ: set of all possible emission characters.
  • Ex. Σ = {H, T} for coin tossing
  •     Σ = {1, 2, 3, 4, 5, 6} for dice tossing
  •     Σ = {a, c, g, t} for nucleotide sequences
  • Q: set of hidden states, each emitting symbols
    from Σ.
  • Ex. Fair or Biased coin
  •     CG-island or not CG-island
  •     coding region or non-coding region

17
HMM Parameters (contd)
  • A = (a_kl): a |Q| x |Q| matrix of the probabilities
    of changing from state k to state l.
  • E = (e_k(b)): a |Q| x |Σ| matrix of the
    probabilities of emitting symbol b during a step
    in which the HMM is in state k.

18
HMM for Fair Bet Casino
  • The Fair Bet Casino can be defined in HMM terms
    as follows:
  • Σ = {0, 1} (0 for Tails, 1 for Heads)
  • Q = {F, B}: F for the Fair coin, B for the Biased
    coin.
  • a_FF = a_BB = 0.9
  • a_FB = a_BF = 0.1
  • e_F(0) = ½, e_F(1) = ½
  • e_B(0) = ¼, e_B(1) = ¾

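As a concrete illustration, here is a minimal sketch of these parameters as plain Python dictionaries (the variable names are our own, not part of the slides); the later algorithm sketches reuse these structures.

# Fair Bet Casino HMM written out as plain Python data structures.
STATES = ['F', 'B']        # Q: Fair and Biased coin
ALPHABET = ['0', '1']      # Sigma: 0 = Tails, 1 = Heads

# Transition probabilities a_kl.
TRANS = {
    'F': {'F': 0.9, 'B': 0.1},
    'B': {'F': 0.1, 'B': 0.9},
}

# Emission probabilities e_k(b).
EMIT = {
    'F': {'0': 0.5,  '1': 0.5},
    'B': {'0': 0.25, '1': 0.75},
}
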
19
HMM for Fair Bet Casino (contd)
  • Visualization of the Transition Probabilities A

20
HMM for Fair Bet Casino (contd)
  • Visualization of the Emission Probabilities E

21
HMM for Fair Bet Casino (contd)
HMM model for the Fair Bet Casino Problem
22
Hidden Paths
  • A path p = p1…pn in the HMM is defined as a
    sequence of states.
  • Consider the path p = FFFBBBBBFFF and the
    sequence x = 01011101001

x              0    1    0    1    1    1    0    1    0    0    1
p              F    F    F    B    B    B    B    B    F    F    F
P(xi|pi)       ½    ½    ½    ¾    ¾    ¾    ¼    ¾    ½    ½    ½
P(pi-1 → pi)   ½   9/10 9/10 1/10 9/10 9/10 9/10 9/10 1/10 9/10 9/10
23
P(xp) Calculation
  • P(x|p): the probability that sequence x was
    generated and the path p was followed, according
    to the model M.
  • P(x|p) = P(p0 → p1) · Π_{i=1..n} [ P(xi|pi) · P(pi → pi+1) ]
  •        = a_{p0,p1} · Π_{i=1..n} [ e_{pi}(xi) · a_{pi,pi+1} ]

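A minimal sketch of this product for the Fair Bet Casino example, reusing the TRANS and EMIT dictionaries sketched above. The initial probability of ½ for the first state and the omission of a final end-state transition are assumptions taken from the hidden-path example, not from the formula itself.

def prob_x_and_path(x, path, trans, emit, init=0.5):
    """P(x|p): init * product over i of e_{p_i}(x_i) * a_{p_i, p_{i+1}}."""
    prob = init
    for i, (symbol, state) in enumerate(zip(x, path)):
        prob *= emit[state][symbol]              # emission at position i
        if i + 1 < len(path):
            prob *= trans[state][path[i + 1]]    # transition to the next state
    return prob

# Example from the "Hidden Paths" slide:
print(prob_x_and_path("01011101001", "FFFBBBBBFFF", TRANS, EMIT))
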
24
Decoding Problem
  • Goal: Find an optimal hidden path of states given
    the observations.
  • Input: A sequence of observations x = x1…xn
    generated by an HMM M = (Σ, Q, A, E).
  • Output: A path that maximizes P(x|p) (and thus
    P(p|x)) over all possible paths p.

25
Building Manhattan for Decoding Problem
  • Andrew Viterbi used the Manhattan grid model to
    solve our Decoding Problem.
  • Every choice of p = p1…pn corresponds to a path
    in the graph.
  • The only valid direction in the graph is
    eastward.
  • This graph has |Q|²(n−1) edges.

26
Edit Graph for Decoding Problem
27
Decoding Problem vs. Alignment Problem
Valid directions in the alignment problem.
Valid directions in the decoding problem.
28
Decoding Problem as Finding a Longest Path in a
DAG
  • The Decoding Problem is reduced to finding a
    longest path in the directed acyclic graph (DAG)
    above.
  • Note: the length of a path is defined as the
    product of its edge weights, not the sum.

29
Decoding Problem (contd)
  • Every path in the graph has weight P(x|p).
  • The Viterbi algorithm finds the path that
    maximizes P(x|p) among all possible paths.
  • The Viterbi algorithm runs in O(n|Q|²) time.

30
Decoding Problem (contd)
The weight w of the edge from vertex (k, i) to
vertex (l, i+1) is given by w = e_l(x_{i+1}) · a_kl.
31
Decoding Problem (contd)
  • Initialization:
  • s_{begin,0} = 1
  • s_{k,0} = 0 for k ≠ begin.
  • Final result:
  • Let p* be the optimal path. Then
  • P(x|p*) = max_{k ∈ Q} s_{k,n} · a_{k,end}

32
Viterbi Algorithm
  • The value of the product can become extremely
    small, which leads to underflow.
  • To avoid underflow, use log values instead. So,
  • s_{l,i+1} = log(e_l(x_{i+1})) + max_{k ∈ Q} [ s_{k,i} + log(a_kl) ]

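A minimal sketch of the Viterbi recurrence in log space for the Fair Bet Casino model, reusing the STATES, TRANS and EMIT dictionaries from the earlier sketch; the uniform initial distribution is an assumption, and the end-state transition is omitted for brevity.

import math

def viterbi(x, states, trans, emit):
    """Most probable state path for the observation string x, computed in log space."""
    # Initialization: uniform start probabilities (an assumption of this sketch).
    s = {k: math.log(1.0 / len(states)) + math.log(emit[k][x[0]]) for k in states}
    back = []
    for symbol in x[1:]:
        s_new, ptr = {}, {}
        for l in states:
            # s_{l,i+1} = log e_l(x_{i+1}) + max_k ( s_{k,i} + log a_kl )
            best_k = max(states, key=lambda k: s[k] + math.log(trans[k][l]))
            s_new[l] = math.log(emit[l][symbol]) + s[best_k] + math.log(trans[best_k][l])
            ptr[l] = best_k
        s = s_new
        back.append(ptr)
    # Traceback of the optimal path.
    last = max(s, key=s.get)
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return ''.join(reversed(path))

print(viterbi("01011101001", STATES, TRANS, EMIT))
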
33
Forward-Backward Problem
  • Given: a sequence of coin tosses generated by
    an HMM.
  • Goal: find the probability that the dealer was
    using the biased coin at a particular time.

34
Forward Algorithm
  • Define f_{k,i} (the forward probability) as the
    probability of emitting the prefix x1…xi and
    reaching the state pi = k.
  • The recurrence for the forward algorithm:
  • f_{k,i} = e_k(xi) · Σ_{l ∈ Q} f_{l,i-1} · a_lk

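A minimal sketch of this recurrence for the Fair Bet Casino model, again reusing the earlier dictionaries and assuming a uniform initial distribution; a real implementation would rescale or work in log space to avoid underflow.

def forward(x, states, trans, emit):
    """Forward probabilities f[i][k] = P(x1..xi, pi = k), returned as a full table."""
    f = [{k: (1.0 / len(states)) * emit[k][x[0]] for k in states}]  # uniform start (assumption)
    for i in range(1, len(x)):
        f.append({
            k: emit[k][x[i]] * sum(f[i - 1][l] * trans[l][k] for l in states)
            for k in states
        })
    return f

f = forward("01011101001", STATES, TRANS, EMIT)
print(sum(f[-1].values()))  # P(x): total probability of the observed sequence
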
35
Backward Algorithm
  • However, the forward probability is not the only
    factor affecting P(pi = k | x).
  • The sequence of transitions and emissions that
    the HMM undergoes between pi and pn also affects
    P(pi = k | x).

36
Backward Algorithm (contd)
  • Backward probability b_{k,i}: the probability of
    being in state pi = k and emitting the suffix
    xi+1…xn.
  • The backward algorithm's recurrence:
  • b_{k,i} = Σ_{l ∈ Q} e_l(xi+1) · b_{l,i+1} · a_kl

37
Forward-Backward Algorithm
  • The probability that the dealer used the biased
    coin at moment i is given by
  • P(pi = k | x) = P(x, pi = k) / P(x) = f_k(i) · b_k(i) / P(x)

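A minimal sketch of the backward recurrence and of this posterior, paired with the forward sketch above. The convention b_{k,n} = 1 at the last position (no explicit end state) is an assumption of this sketch rather than something stated on the slides.

def backward(x, states, trans, emit):
    """Backward probabilities b[i][k] = P(x_{i+1}..x_n | pi = k)."""
    n = len(x)
    b = [dict() for _ in range(n)]
    b[n - 1] = {k: 1.0 for k in states}          # base case (assumption: no end state)
    for i in range(n - 2, -1, -1):
        b[i] = {
            k: sum(emit[l][x[i + 1]] * b[i + 1][l] * trans[k][l] for l in states)
            for k in states
        }
    return b

def posterior_biased(x, states, trans, emit):
    """P(pi = B | x) for every position i: was the dealer using the biased coin?"""
    f, b = forward(x, states, trans, emit), backward(x, states, trans, emit)
    p_x = sum(f[-1].values())                    # P(x)
    return [f[i]['B'] * b[i]['B'] / p_x for i in range(len(x))]

print(posterior_biased("01011101001", STATES, TRANS, EMIT))
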
38
HMM Parameter Estimation
  • So far, we have assumed that the transition and
    emission probabilities are known.
  • However, in most HMM applications the
    probabilities are not known, and estimating them
    is very hard.

39
HMM Parameter Estimation (contd)
  • Let Θ be a vector combining the unknown
    transition and emission probabilities.
  • Given training sequences x^1, …, x^m, let
    P(x|Θ) be the probability of x given the
    assignment of parameters Θ.
  • Then our goal is to find
  • max_Θ Π_{j=1..m} P(x^j|Θ)

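The slides stop at the objective itself. As an illustration only, here is a sketch that scores a candidate parameter assignment Θ = (transition matrix, emission matrix) against training sequences by evaluating log Π_j P(x^j|Θ) with the forward sketch above; an actual estimation procedure (e.g. iterative re-estimation) is beyond what the slides specify.

import math

def log_likelihood(training_seqs, states, trans, emit):
    """log of the product over training sequences of P(x^j | Theta)."""
    total = 0.0
    for x in training_seqs:
        f = forward(x, states, trans, emit)
        total += math.log(sum(f[-1].values()))   # log P(x^j | Theta)
    return total

# Score one candidate parameter setting on some toy training data.
data = ["0101110", "1111011", "0010100"]
print(log_likelihood(data, STATES, TRANS, EMIT))
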
40
Finding Distant Members of a Protein Family
  • Motivation: Distant cousins of functionally
    related biological sequences in a protein family
    may have only weak pairwise similarities, and thus
    fail statistical significance tests, but they may
    have weak similarities with many members of the
    family. So, the goal is to align a sequence to
    all members of the family at once.
  • Families of related proteins can be represented
    by their multiple alignment and the corresponding
    profile.

41
Profile Representation of Protein Families
  • Aligned DNA sequences can be represented by a
    4 × n profile matrix reflecting the frequencies
    of nucleotides in each column.

A protein family can be represented by a 20 × n
profile matrix representing the frequencies of
amino acids.
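
A minimal sketch of building such a 4 × n profile from equally long aligned DNA sequences (gap characters are ignored, which is an assumption of this sketch):

def profile_matrix(alignment, alphabet="acgt"):
    """4 x n column-frequency profile for equally long aligned DNA sequences."""
    n = len(alignment[0])
    profile = {ch: [0.0] * n for ch in alphabet}
    for col in range(n):
        column = [seq[col] for seq in alignment]
        for ch in alphabet:
            profile[ch][col] = column.count(ch) / len(column)
    return profile

aln = ["acgta", "acgca", "atgta"]
for ch, freqs in profile_matrix(aln).items():
    print(ch, freqs)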
42
Profiles and HMMs
  • HMMs can also be used for aligning a sequence
    against a profile representing a protein family.
  • A 20 × n profile P corresponds to n sequentially
    linked match states M1, …, Mn in the profile HMM
    of P.

43
Profile HMM
A profile HMM
44
Insertion and Deletion States of Profile HMM
  • States I_j: insertion states
  • States D_j: deletion states
  • Assumption:
  • e_{Ij}(a) = p(a),
  • where p(a) is the frequency of occurrence of the
    symbol a in all the sequences.

45
Profile HMM Alignment
  • Define vMj(i) as the logarithmic likelihood score
    of the best path for matching x1…xi to the
    profile HMM ending with xi emitted by the state
    Mj.
  • vIj(i) and vDj(i) are defined similarly.

46
Profile HMM Alignment Dynamic Programming

  • vMj(i) = log(e_Mj(xi)/p(xi)) + max of:
      vMj-1(i-1) + log(a_{Mj-1, Mj})
      vIj-1(i-1) + log(a_{Ij-1, Mj})
      vDj-1(i-1) + log(a_{Dj-1, Mj})

  • vIj(i) = log(e_Ij(xi)/p(xi)) + max of:
      vMj(i-1) + log(a_{Mj, Ij})
      vIj(i-1) + log(a_{Ij, Ij})
      vDj(i-1) + log(a_{Dj, Ij})

47
Profile HMM Alignment Dynamic Programming

  • vDj(i) = max of:
      vMj-1(i) + log(a_{Mj-1, Dj})
      vIj-1(i) + log(a_{Ij-1, Dj})
      vDj-1(i) + log(a_{Dj-1, Dj})

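As a small illustration of how these recurrences translate into code, here is a sketch of a single match-state cell update. The containers vM, vI, vD, the transition log-probabilities log_a, the match emissions e_M and the background frequencies p_bg are hypothetical placeholders for this sketch, not structures defined on the slides.

import math

def match_cell(i, j, x, vM, vI, vD, log_a, e_M, p_bg):
    """vMj(i) = log(e_Mj(xi)/p(xi)) + max over the M/I/D predecessors at (j-1, i-1)."""
    best_prev = max(
        vM[j - 1][i - 1] + log_a[('M', j - 1, 'M', j)],
        vI[j - 1][i - 1] + log_a[('I', j - 1, 'M', j)],
        vD[j - 1][i - 1] + log_a[('D', j - 1, 'M', j)],
    )
    return math.log(e_M[j][x[i]] / p_bg[x[i]]) + best_prev
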
48
Paths in Edit Graph and Profile HMM
  • A path through an edit graph and the
    corresponding path through a profile HMM

49
Speech Recognition
  • Create an HMM of the words in a language.
  • Each word is a state in Q.
  • Each of the basic sounds in the language is a
    symbol in Σ.
  • Input: use speech as the input sequence.
  • Goal: find the most probable sequence of states.

50
Speech Recognition Building the Model
  • Analyze some large source of English sentences,
    such as a database of newspaper articles, to form
    the probability matrices.
  • A_0i: the chance that word i begins a sentence.
  • A_ij: the chance that word j follows word i.

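A minimal sketch of estimating these matrices by counting over a toy corpus; the corpus and variable names are illustrative only.

from collections import Counter, defaultdict

sentences = ["the cat sat", "the dog sat", "a dog ran"]   # toy stand-in for a large corpus

begin = Counter()                 # counts behind A_0i: word i begins a sentence
follow = defaultdict(Counter)     # counts behind A_ij: word j follows word i
for s in sentences:
    words = s.split()
    begin[words[0]] += 1
    for w1, w2 in zip(words, words[1:]):
        follow[w1][w2] += 1

# Normalize the counts into probabilities.
A0 = {w: c / len(sentences) for w, c in begin.items()}
A = {w1: {w2: c / sum(nxt.values()) for w2, c in nxt.items()}
     for w1, nxt in follow.items()}
print(A0)
print(A["the"])   # e.g. the chance that "cat" or "dog" follows "the"
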
51
Building the Model (contd)
  • Analyze English speakers to determine which
    sounds are emitted with which words.
  • E_k(b): the chance that sound b is spoken in word
    k. This allows for alternate pronunciations of
    words.

52
Speech Recognition Using the Model
  • Use the same dynamic programming algorithm as
    before.
  • Weave the spoken sounds through the model the
    same way we wove the rolls of the die through the
    casino model.
  • The path p represents the most likely sequence of
    words.

53
Using the Model (contd)
  • How well does it work?
  • Common words, such as "the", "a", and "of", make
    prediction less accurate, since so many different
    words can normally follow them.
  • We can add more states to incorporate some
    context into the decision.