CSE182-L10 - PowerPoint PPT Presentation

About This Presentation
Title:

CSE182-L10

Slides: 18
Provided by: vineet50
Learn more at: https://cseweb.ucsd.edu
Tags: cse182 | l10 | models | pr

Transcript and Presenter's Notes

1
CSE182-L10
  • HMM applications

2
Probability of being in specific states
  • What is the probability that we were in state $k$
    at step $i$?
  • $\Pr[\pi_i = k \mid x] = \dfrac{\Pr[\text{all paths that passed through state } k \text{ at step } i \text{ and emitted } x]}{\Pr[\text{all paths that emitted } x]}$

3
The Forward Algorithm
  • Recall $v(i,j)$: the probability of the most likely
    path the automaton chose in emitting $x_1 \ldots x_i$ and
    ending up in state $j$.
  • Define $F(i,j)$: the probability that the automaton
    started from state 1, emitted $x_1 \ldots x_i$, and
    ended up in state $j$.
  • What is the difference?

4
Most Likely path versus Probability of Arrival
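  • There are multiple paths from state 1 to state $j$ along
    which the automaton can output $x_1 \ldots x_i$.
  • In computing the Viterbi path, we choose the most
    likely path:
  • $v(i,j) = \max_{\pi} \Pr[x_1 \ldots x_i, \pi]$
  • The probability of emitting $x_1 \ldots x_i$ and ending up
    in state $j$ is given by
  • $F(i,j) = \sum_{\pi} \Pr[x_1 \ldots x_i, \pi]$
  • In both expressions, $\pi$ ranges over the paths that end
    in state $j$ at step $i$.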

5
The Forward Algorithm
  • Recall that
  • $v(i,j) = \max_{l \in Q} \big( v(i-1,l) \cdot A_{l,j} \big) \cdot e_j(x_i)$
  • Instead
  • $F(i,j) = \sum_{l \in Q} \big( F(i-1,l) \cdot A_{l,j} \big) \cdot e_j(x_i)$
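
The two recurrences differ only in how the incoming terms are combined: max for the Viterbi score, sum for the forward probability. The sketch below makes that explicit on a hypothetical two-state toy model. The state names, all probabilities, and the use of a start distribution in place of a single start state are assumptions made for illustration, not values from the slides. A brute-force enumeration of every path is included as a check.

```python
from itertools import product

# Hypothetical two-state toy model; every number here is an illustrative assumption.
STATES = ("AT_rich", "GC_rich")
START = {"AT_rich": 0.5, "GC_rich": 0.5}
TRANS = {"AT_rich": {"AT_rich": 0.8, "GC_rich": 0.2},
         "GC_rich": {"AT_rich": 0.3, "GC_rich": 0.7}}
EMIT = {"AT_rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
        "GC_rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}


def last_row(x, combine):
    """Return {j: value at the final step}; combine=max gives v, combine=sum gives F."""
    row = {j: START[j] * EMIT[j][x[0]] for j in STATES}
    for sym in x[1:]:
        # Same recurrence for both quantities; only `combine` changes.
        row = {j: combine(row[l] * TRANS[l][j] for l in STATES) * EMIT[j][sym]
               for j in STATES}
    return row


def path_probability(x, path):
    """Joint probability of emitting x along one fixed state path."""
    p = START[path[0]] * EMIT[path[0]][x[0]]
    for prev, cur, sym in zip(path, path[1:], x[1:]):
        p *= TRANS[prev][cur] * EMIT[cur][sym]
    return p


x = "ACGCT"
v = last_row(x, max)   # most likely path ending in each state
F = last_row(x, sum)   # total probability of arriving in each state
all_paths = [path_probability(x, p) for p in product(STATES, repeat=len(x))]

print("best single path:", max(v.values()))
print("all paths summed:", sum(F.values()))
```

The largest entry of `v` should match the best entry of `all_paths`, and the entries of `F` should add up to the sum over `all_paths`.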

6
The Backward Algorithm
  • Define $B(i,j)$: the probability that the automaton,
    starting from state $j$ at step $i$, emitted
    $x_{i+1} \ldots x_n$ and ended up in the final state.

7
Forward Backward Scoring
  • $F(i,j) = \sum_{l \in Q} \big( F(i-1,l) \cdot A_{l,j} \big) \cdot e_j(x_i)$
  • $B(i,j) = \sum_{l \in Q} A_{j,l} \cdot e_l(x_{i+1}) \cdot B(i+1,l)$
  • $\Pr[x, \pi_i = k] = F(i,k) \cdot B(i,k)$
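
The forward and backward tables can be combined into per-position state probabilities as sketched below. This is a minimal sketch on a hypothetical two-state toy model: the state names and all probabilities are assumptions, a start distribution stands in for the single start state, and there is no explicit final state, so the backward table is initialised to 1 at the last step. Indices in the code are 0-based.

```python
# Hypothetical two-state toy model; every number here is an illustrative assumption.
STATES = ("AT_rich", "GC_rich")
START = {"AT_rich": 0.5, "GC_rich": 0.5}
TRANS = {"AT_rich": {"AT_rich": 0.8, "GC_rich": 0.2},
         "GC_rich": {"AT_rich": 0.3, "GC_rich": 0.7}}
EMIT = {"AT_rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
        "GC_rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}


def forward(x):
    """F[i][j]: probability of emitting x[0..i] and being in state j at step i."""
    F = [{j: START[j] * EMIT[j][x[0]] for j in STATES}]
    for sym in x[1:]:
        prev = F[-1]
        F.append({j: sum(prev[l] * TRANS[l][j] for l in STATES) * EMIT[j][sym]
                  for j in STATES})
    return F


def backward(x):
    """B[i][j]: probability of emitting x[i+1..] given state j at step i."""
    B = [{j: 1.0 for j in STATES}]  # assumption: no explicit final state
    for sym in reversed(x[1:]):
        nxt = B[0]
        B.insert(0, {j: sum(TRANS[j][l] * EMIT[l][sym] * nxt[l] for l in STATES)
                     for j in STATES})
    return B


def posteriors(x):
    """Probability of each state at each step, given the whole sequence x."""
    F, B = forward(x), backward(x)
    total = sum(F[-1].values())  # Pr[x]
    return [{k: F[i][k] * B[i][k] / total for k in STATES} for i in range(len(x))]


x = "ATGCGC"
F, B = forward(x), backward(x)
post = posteriors(x)
for i, row in enumerate(post):
    print(i, x[i], {k: round(p, 3) for k, p in row.items()})
```

A useful sanity check is that the sum over $k$ of $F(i,k) \cdot B(i,k)$ gives the same value, $\Pr[x]$, at every step $i$.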

8
Application of HMMs
  • How do we modify this to handle indels?

9
Applications of the HMM paradigm
  • Modifying Profile HMMs to handle indels
  • States $I_i$: insertion states
  • States $D_i$: deletion states

[Figure: profile HMM over positions 1 to 8, with emission probabilities for A, C, G, T at each position]
10
Profile HMMs
  • An assignment of states to a sequence implies an insertion,
    match, or deletion at each position. Example: ACACTGTA

[Figure: profile over positions 1 to 8 with emission probabilities for A, C, G, T, annotated with the bases of an example sequence]
11
Viterbi Algorithm revisited
  • Define $v^M_j(i)$ as the log-likelihood score of
    the best path for matching $x_1 \ldots x_i$ to the profile HMM,
    ending with $x_i$ emitted by the state $M_j$.
  • $v^I_j(i)$ and $v^D_j(i)$ are defined similarly.

12
Viterbi Equations for Profile HMMs
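$$v^M_j(i) = \log e_{M_j}(x_i) + \max \begin{cases} v^M_{j-1}(i-1) + \log A_{M_{j-1},M_j} \\ v^I_{j-1}(i-1) + \log A_{I_{j-1},M_j} \\ v^D_{j-1}(i-1) + \log A_{D_{j-1},M_j} \end{cases}$$

$$v^I_j(i) = \log e_{I_j}(x_i) + \max \begin{cases} v^M_j(i-1) + \log A_{M_j,I_j} \\ v^I_j(i-1) + \log A_{I_j,I_j} \\ v^D_j(i-1) + \log A_{D_j,I_j} \end{cases}$$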
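
The slides do not write out the recurrence for the deletion states. Following the same pattern, and assuming the usual convention that a deletion state $D_j$ emits no symbol and is entered from $M_{j-1}$, $I_{j-1}$, or $D_{j-1}$, it would look like this. Note that the index $i$ does not advance and there is no emission term.

```latex
v^D_j(i) = \max \begin{cases}
  v^M_{j-1}(i) + \log A_{M_{j-1},D_j} \\
  v^I_{j-1}(i) + \log A_{I_{j-1},D_j} \\
  v^D_{j-1}(i) + \log A_{D_{j-1},D_j}
\end{cases}
```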
13
Compositional Signals
  • CpG islands: in genomic sequence, the CG
    di-nucleotide is rarely seen.
  • In a CG di-nucleotide the C is prone to methylation, and
    methylated C tends to mutate to T.
  • In regions around a gene, the methylation is
    suppressed, and therefore CG is more common.
  • CpG islands: islands of CG on the genome.
  • How can you detect CpG islands?

14
An HMM for Genomic regions
  • Node A emits A with Prob. 1, and 0 for all other
    bases.
  • The start and end node do not emit any symbol.
  • All outgoing edges from nodes are equi-probable,
    except for the ones coming out of C.

[Figure: HMM with start and end nodes and one node per base (A, C, G, T); edge labels shown include 0.1, 0.25, and 0.4]
15
An HMM for CpG islands
  • Node A emits A with Prob. 1, and 0 for all other
    bases.
  • The start and end node do not emit any symbol.
  • All outgoing edges from nodes are equi-probable,
    except for the ones coming out of C.

[Figure: HMM with start and end nodes and one node per base (A, C, G, T); edge labels shown are all 0.25]
16
HMM for detecting CpG Islands
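[Figure: combined HMM made of two sub-models labelled A and B, each with start and end nodes and one node per base (A, C, G, T); edge labels shown include 0.1 and 0.4]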
  • In the best parse of a genomic sequence, each
    base is assigned a state from one of the sets A and B.
  • Any substring whose states come from B
    can be described as a CpG island.
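
The parse described above can be sketched with the Viterbi algorithm. Because every node emits its own base with probability 1, the state at each position is fixed by the base and the region, so the code only has to choose a region (A or B) per position. All numbers are hypothetical: the transition rows, the 0.9/0.1 stay and switch probabilities, and the start preference for region A are assumptions, since the figures in the transcript do not give a complete set of edge probabilities. The function name `viterbi_regions` is likewise made up for this example.

```python
import math

BASES = "ACGT"
REGIONS = "AB"  # A: ordinary genomic sequence, B: CpG-island model

# Hypothetical next-base probabilities inside each region (assumptions).
NEXT_BASE = {r: {b: {c: 0.25 for c in BASES} for b in BASES} for r in REGIONS}
# In region A, C is rarely followed by G; the other edges out of C share the rest.
NEXT_BASE["A"]["C"] = {"A": 0.3, "C": 0.3, "G": 0.1, "T": 0.3}

STAY, SWITCH = 0.9, 0.1          # assumed region persistence
START = {"A": 0.9, "B": 0.1}     # assumed preference for starting outside an island


def viterbi_regions(seq):
    """Return one region label per base for the most likely parse (log space)."""
    if not seq:
        return ""
    # The start node picks the first base uniformly within the chosen region.
    score = {r: math.log(START[r] * 0.25) for r in REGIONS}
    back = []
    for prev, cur in zip(seq, seq[1:]):
        new_score, pointers = {}, {}
        for r in REGIONS:
            best_q, best = None, -math.inf
            for q in REGIONS:
                move = STAY if q == r else SWITCH
                # The base transition is scored under the destination region.
                cand = score[q] + math.log(move * NEXT_BASE[r][prev][cur])
                if cand > best:
                    best_q, best = q, cand
            new_score[r], pointers[r] = best, best_q
        score = new_score
        back.append(pointers)
    # Trace back from the best final region.
    last = max(REGIONS, key=score.get)
    labels = [last]
    for pointers in reversed(back):
        last = pointers[last]
        labels.append(last)
    return "".join(reversed(labels))


seq = "TCATCA" + "CG" * 12 + "TTAA"
labels = viterbi_regions(seq)
print(seq)
print(labels)
```

With these assumed numbers, the run of CG di-nucleotides is labelled B. Region B has no penalty for the trailing bases, so the parse has no incentive to switch back to A at the end. A more realistic model would give region B its own base preferences.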

17
HMM Summary
  • HMMs are a natural technique for modeling many
    biological domains.
  • They can capture position-dependent as well as
    compositional properties.
  • HMMs have been very useful in an important
    bioinformatics application: gene finding.