HMM for CpG Island: Combined Model - PowerPoint PPT Presentation



1
HMM for CpG Islands
Arti Kelkar, Pete Rossetti, Peter Warren
2
HMM for CpG Islands
  • HMM history
  • General background
  • Three fundamental problems
      • Evaluation
      • Decoding
      • Training

3
HMM for CpG Islands
  • HMM Applications
      • Bioinformatics
      • Non-Bioinformatics
  • CpG Islands Problem
  • CpG Islands
      • Definition
      • Why interesting
  • Hidden Markov Model for CpG
      • What's hidden
  • Mathematica Implementation
      • Training
      • Decoding

4
Andrei Andreyevich Markov (1856-1922)
5
A. A. Markov
  • Early 1900s
      • Markov conceives Markov chains, including a proof of the Central Limit Theorem for Markov chains
      • Studies with Chebyshev and takes over his classes at the University of St. Petersburg
  • 1913
      • The Russian government celebrates the 300th anniversary of the House of Romanov
      • A. A. Markov organizes a counter-celebration: the 200th anniversary of Bernoulli's Law of Large Numbers

6
HMM History
  • 1960s
      • HMMs are developed by a Cold War-era research team in a classified program at the Communications Research Division of the Institute for Defense Analyses (Oscar Rothaus)
  • 1970s
      • HMM work is declassified and is soon used in many peaceful applications

7
Markov Chain
  • If it was sunny yesterday: 0.5 probability that it will be sunny today, and 0.25 each that it will be cloudy or rainy
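The rule on this slide is one row of a transition table. A minimal Python sketch, assuming three weather states: only the "sunny" row comes from the slide, while the "cloudy" and "rainy" rows are hypothetical values chosen for illustration, as are the helper names `step` and `sequence_probability`.

```python
# Weather Markov chain: tomorrow depends only on today (the Markov property).
TRANSITIONS = {
    "sunny":  {"sunny": 0.50, "cloudy": 0.25, "rainy": 0.25},  # from the slide
    "cloudy": {"sunny": 0.25, "cloudy": 0.50, "rainy": 0.25},  # hypothetical
    "rainy":  {"sunny": 0.25, "cloudy": 0.25, "rainy": 0.50},  # hypothetical
}


def step(dist):
    """Advance a probability distribution over weather states by one day."""
    nxt = {s: 0.0 for s in TRANSITIONS}
    for today, p_today in dist.items():
        for tomorrow, p_move in TRANSITIONS[today].items():
            nxt[tomorrow] += p_today * p_move
    return nxt


def sequence_probability(states):
    """Probability of a weather sequence, given that its first day is known."""
    prob = 1.0
    for prev, cur in zip(states, states[1:]):
        prob *= TRANSITIONS[prev][cur]
    return prob


today = step({"sunny": 1.0, "cloudy": 0.0, "rainy": 0.0})
print(today)         # {'sunny': 0.5, 'cloudy': 0.25, 'rainy': 0.25}
print(step(today))   # distribution two days after a sunny day
print(sequence_probability(["sunny", "sunny", "rainy"]))  # 0.5 * 0.25 = 0.125
```

Because each day depends only on the one before, a multi-day forecast is just repeated application of `step`.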

8
Hidden Markov Model
9
HMM Definition
  • A Hidden Markov Model is a triplet (π, A, B)
  • π: vector of initial state probabilities (length N)
  • A: matrix of state transition probabilities (N × N)
  • B: matrix of observation probabilities (N × M)
  • N: number of hidden states in the model
  • M: number of observation symbols
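The triplet can be held in a small data structure that checks the shapes and that every distribution sums to 1. A minimal Python sketch; the class name `HMM`, the `validate` method and the two-state example parameters are hypothetical, for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HMM:
    """The triplet (pi, A, B) for N hidden states and M observation symbols."""
    pi: List[float]        # length N: initial state probabilities
    A: List[List[float]]   # N x N: A[i][j] = P(next state j | current state i)
    B: List[List[float]]   # N x M: B[i][k] = P(observe symbol k | state i)

    @property
    def N(self) -> int:
        return len(self.pi)

    @property
    def M(self) -> int:
        return len(self.B[0])

    def validate(self, tol: float = 1e-9) -> None:
        """Raise ValueError unless shapes match and every distribution sums to 1."""
        if len(self.A) != self.N or any(len(row) != self.N for row in self.A):
            raise ValueError("A must be N x N")
        if len(self.B) != self.N or any(len(row) != self.M for row in self.B):
            raise ValueError("B must be N x M")
        for row in [self.pi, *self.A, *self.B]:
            if any(p < 0 for p in row) or abs(sum(row) - 1.0) > tol:
                raise ValueError("pi and each row of A and B must be probability distributions")


# Hypothetical two-state model over the symbols A, C, G, T (in that order).
model = HMM(
    pi=[0.5, 0.5],
    A=[[0.9, 0.1], [0.2, 0.8]],
    B=[[0.1, 0.4, 0.4, 0.1], [0.3, 0.2, 0.2, 0.3]],
)
model.validate()
print(model.N, model.M)  # 2 4
```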

10
HMM Three Problems
  • Evaluation
  • Decoding
  • Training

11
HMM - Overview: Evaluation Problem
  • Given a set of HMMs, which one is most likely to have produced the observation sequence?

[Figure: the observation sequence GACGAAACCCTGTCTCTATTTATCC scored against HMM 1, HMM 2, HMM 3, ..., HMM n, asking p(HMM-1)?, p(HMM-2)?, p(HMM-3)?, ..., p(HMM-n)?]

The forward algorithm is used to find the HMM with the maximum p(observation sequence | HMM)
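A minimal Python sketch of the forward algorithm for this evaluation step. The two candidate models, their parameters and the names `forward` and `MODELS` are hypothetical, invented for illustration; each row of `B` is stored as a dictionary keyed by nucleotide.

```python
def forward(obs, pi, A, B):
    """P(obs | model) via the forward algorithm.

    pi[i]    initial probability of hidden state i
    A[i][j]  transition probability from state i to state j
    B[i][o]  probability that state i emits symbol o
    """
    n = len(pi)
    # alpha[j] = P(o_1..o_t, hidden state at time t = j)
    alpha = [pi[j] * B[j][obs[0]] for j in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


# Two hypothetical candidate models (parameters invented for illustration).
MODELS = {
    "HMM-1": {  # a GC-rich state and an AT-rich state
        "pi": [0.5, 0.5],
        "A": [[0.9, 0.1], [0.1, 0.9]],
        "B": [{"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
              {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}],
    },
    "HMM-2": {  # a single state with uniform base composition
        "pi": [1.0],
        "A": [[1.0]],
        "B": [{"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}],
    },
}

seq = "GACGAAACCCTGTCTCTATTTATCC"
scores = {name: forward(seq, **m) for name, m in MODELS.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

This multiplies raw probabilities, which is fine for a 25-base sequence; for long sequences the values underflow, and scaling (as described in Rabiner's tutorial) or log-space arithmetic is needed.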
12
HMM - Overview: Decoding Problem
  • States A+, C+, G+, T+, A-, C-, G-, T-

[Figure: trellis of the eight hidden states at each of five positions, above the observation sequence (Obs seq) A G C G C]
13
HMM - Overview: Training Problem
From raw sequence data to transition probabilities
[Figure: transition-probability table with rows and columns labeled A+, C+, G+, T+, A-, C-, G-, T-]
How?
14
HMM - Applications: Bioinformatics
  • DNA Sequence analysis
  • Protein family profiling
  • Prediction of protein folding
  • Prediction of genes
  • Horizontal gene transfer
  • Radiation hybrid mapping, linkage analysis
  • Prediction of DNA functional sites
  • CpG island prediction
  • Splicing signals prediction

15
HMM - Applications: Non-Bioinformatics
  • Speech Recognition
  • Vehicle Trajectory Projection
  • Gesture Learning for Human-Robot Interface
  • Positron Emission Tomography (PET)
  • Optical Signal Detection
  • Digital Communications
  • Music Analysis

16
Some HMM based Bioinformatics Resources
  • PROBE www.ncbi.nlm.nih.gov/
  • BLOCKS www.blocks.fhcrc.org/
  • META-MEME www.cse.ucsd.edu/users/bgrundy/metameme.1.0.html
  • SAM www.cse.ucsc.edu/research/compbio/sam.html
  • HMMER hmmer.wustl.edu/
  • HMMpro www.netid.com/
  • GENEWISE www.sanger.ac.uk/Software/Wise2/
  • PSI-BLAST www.ncbi.nlm.nih.gov/BLAST/newblast.html
  • PFAM www.sanger.ac.uk/Pfam/

17
HMM for CpG Islands
  • CpG ISLANDS
  • CpG means C precedes G on the same strand (the "p" is the phosphate linking them)
  • Not C-G base pairs

18
HMM for CpG Islands
  • Nucleotides - 4 bases in DNA
  • A (Adenine)
  • C (Cytosine)
  • G (Guanine)
  • T (Thymine)

19
HMM for CpG Islands: What's a CpG Island?
CG-poor regions: P(CG) ≈ 0.07!
CG-rich regions: P(CG) ≈ 0.25

[Figure: DNA sequence with a promoter region and a gene coding region marked]
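One reading of P(CG) consistent with these numbers is the probability that a C is followed by a G (0.25 is what uniform, independent bases would give). Under that assumption, a minimal Python sketch that estimates it from a sequence; `p_cg` is a hypothetical helper name.

```python
def p_cg(seq):
    """Estimate P(G | C): the fraction of C's (that have a next base) followed by G.

    Returns None if the sequence has no C with a following base.
    """
    seq = seq.upper()
    after_c = [nxt for cur, nxt in zip(seq, seq[1:]) if cur == "C"]
    if not after_c:
        return None
    return after_c.count("G") / len(after_c)


print(p_cg("CACGTCCG"))  # 2 of the 4 C's are followed by G -> 0.5
```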
20
HMM for CpG Islands: Why the difference?
  • Away from gene regions
      • The C in CG dinucleotides is usually methylated
      • Methylation inhibits gene transcription
      • These CGs tend to mutate to TG
  • Near promoter and coding regions
      • Methylation is suppressed
      • CGs remain CGs
      • Makes transcription easier!

21
HMM for CpG Islands: Motivation
  • CpG-rich regions are associated with genes that are frequently transcribed.
  • Helps us understand how gene expression relates to location in the genome.

22
HMM for CpG Islands: Motivation
  • Q: Why an HMM?
  • It can answer the questions
      • Short sequence: does it come from a CpG island or not?
      • Long sequence: where are the CpG islands?
  • So, what's a good model?
  • Well, we need states for ISLAND bases and NON-ISLAND bases

23
HMM for CpG Islands: Straight Markov Models
CpG NON-Island (-)
CpG Island (+)
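With two separate chains, a short sequence can be classified by a log-odds score: sum log2(a+ / a-) over its adjacent base pairs, where a positive score favours the island chain (the approach in Durbin et al., chapter 3). A minimal Python sketch; the transition tables are hypothetical: the island (+) table is uniform and the non-island (-) table sets C→G to 0.07 and slightly favours A/T, echoing the P(CG) values on slide 19.

```python
import math

BASES = "ACGT"
# Hypothetical island (+) chain: uniform transitions, so P(G | C) = 0.25.
PLUS = {x: {y: 0.25 for y in BASES} for x in BASES}
# Hypothetical non-island (-) chain: C->G depleted to 0.07, A/T slightly favoured.
MINUS = {
    "A": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
    "C": {"A": 0.31, "C": 0.31, "G": 0.07, "T": 0.31},
    "G": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "T": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
}


def log_odds(seq):
    """Sum of log2(a+ / a-) over adjacent base pairs; > 0 favours the island chain."""
    return sum(math.log2(PLUS[x][y] / MINUS[x][y]) for x, y in zip(seq, seq[1:]))


for s in ["CGCG", "ATAT", "GGGG"]:
    print(s, round(log_odds(s), 3))
```

Each chain scores the sequence as a whole, so this answers the short-sequence question but cannot say where islands start and stop inside a long sequence.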
24
HMM for CpG Islands: Combined Hidden Markov Model
CpG Island
CpG NON-Island
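A minimal Python sketch of how the combined model's 8 × 8 transition table could be assembled from the two straight chains. Assumptions not on the slides: hypothetical transition tables (uniform for the island, C→G = 0.07 for the non-island), and a hypothetical probability 0.01 of switching region at each step, with the next base drawn from the new region's table. `combined_transitions` is an illustrative name.

```python
BASES = "ACGT"
# Hypothetical per-region transition tables (illustrative, not from the slides).
PLUS = {x: {y: 0.25 for y in BASES} for x in BASES}
MINUS = {
    "A": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
    "C": {"A": 0.31, "C": 0.31, "G": 0.07, "T": 0.31},
    "G": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "T": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
}
P_SWITCH = 0.01  # hypothetical probability of crossing between the two halves


def combined_transitions(plus, minus, p_switch):
    """Build the 8-state transition table over A+..T+ and A-..T-.

    From state k, the chain stays in k's region with probability 1 - p_switch
    (otherwise it switches), then picks the next base from the table of the
    region it lands in.
    """
    tables = {"+": plus, "-": minus}
    states = [b + s for s in "+-" for b in BASES]
    a = {}
    for k in states:
        a[k] = {}
        for l in states:
            weight = (1 - p_switch) if k[1] == l[1] else p_switch
            a[k][l] = weight * tables[l[1]][k[0]][l[0]]
    return states, a


STATES, A = combined_transitions(PLUS, MINUS, P_SWITCH)
print(STATES)         # ['A+', 'C+', 'G+', 'T+', 'A-', 'C-', 'G-', 'T-']
print(A["C+"]["G-"])  # p_switch * MINUS["C"]["G"]
```

Each row sums to (1 - p_switch) + p_switch = 1, because each region's table rows already sum to 1.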
25
HMM for CpG Islands: What's hidden?
Visible: the bases A, C, G, T
Hidden: whether each base comes from an island (+) or non-island (-) state
26
HMM for CpG Islands: The Three Problems
  • (Evaluation is not covered for CpG islands)
  • Training
  • Decoding

27
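HMM for CpG Islands: Training Problem
HOW? Maximum-likelihood (ML) estimation when the island/non-island labels of the training sequences are known, or the forward-backward (Baum-Welch) algorithm when they are not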
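With labeled data, the ML estimate of each transition probability is the count of that transition divided by the count of all transitions out of the same base. A minimal Python sketch; the function name `estimate_transitions`, the optional `pseudocount` smoothing and the two toy training strings are hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def estimate_transitions(sequences, pseudocount=0.0):
    """ML estimate a[s][t] = c[s, t] / sum over t' of c[s, t'].

    `sequences` must all come from one class (e.g. all labeled as CpG island);
    `pseudocount` is optional smoothing added to every count.
    """
    counts = Counter()
    for seq in sequences:
        for s, t in zip(seq, seq[1:]):
            counts[s, t] += 1  # c[s, t]: how often base t follows base s
    probs = {}
    for s in BASES:
        total = sum(counts[s, t] + pseudocount for t in BASES)
        # A base never seen (with no pseudocount) gets an all-zero row.
        probs[s] = {t: (counts[s, t] + pseudocount) / total if total else 0.0
                    for t in BASES}
    return probs


# Hypothetical toy training data, already labeled as island sequences.
island = estimate_transitions(["CGCGGC", "GCGCCG"])
print(island["C"])  # {'A': 0.0, 'C': 0.2, 'G': 0.8, 'T': 0.0}
```

Running it once on island-labeled sequences and once on non-island sequences gives the + and - transition tables.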
28
HMM for CpG Islands: Decoding Problem
  • Viterbi Algorithm
  • Decoding: finding the meaning of an observation sequence by looking at the underlying states
  • Hidden states A+, C+, G+, T+, A-, C-, G-, T-
  • Observation sequence CGCGA
  • Possible state sequences include
      • C+, G+, C+, G+, A+
      • C-, G-, C-, G-, A-
      • C+, G-, C+, G-, A+
  • Most probable path: C+, G+, C+, G+, A+
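To see what "most probable path" means, the three listed paths can be scored directly. A minimal Python sketch under hypothetical parameters (not from the slides): uniform island transitions, non-island C→G = 0.07, a 0.01 chance of switching region, a uniform start over the eight states, and each state emitting its own base with probability 1. The names `a` and `path_probability` are illustrative.

```python
BASES = "ACGT"
# Hypothetical transition tables: island (+) uniform, non-island (-) with rare C->G.
PLUS = {x: {y: 0.25 for y in BASES} for x in BASES}
MINUS = {
    "A": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
    "C": {"A": 0.31, "C": 0.31, "G": 0.07, "T": 0.31},
    "G": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "T": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
}
TABLE = {"+": PLUS, "-": MINUS}
P_SWITCH = 0.01  # hypothetical chance of switching between + and - regions


def a(k, l):
    """Transition probability between combined states such as "C+" -> "G-"."""
    stay = (1 - P_SWITCH) if k[1] == l[1] else P_SWITCH
    return stay * TABLE[l[1]][k[0]][l[0]]


def path_probability(path, obs):
    """Joint probability of a state path and the observed bases."""
    # Each state emits its own base with probability 1, so a path whose bases
    # disagree with the observations has probability 0.
    if len(path) != len(obs) or any(k[0] != x for k, x in zip(path, obs)):
        return 0.0
    prob = 1 / 8  # assumption: uniform start over the eight states
    for k, l in zip(path, path[1:]):
        prob *= a(k, l)
    return prob


CANDIDATES = [
    ["C+", "G+", "C+", "G+", "A+"],
    ["C-", "G-", "C-", "G-", "A-"],
    ["C+", "G-", "C+", "G-", "A+"],
]
for path in CANDIDATES:
    print(",".join(path), path_probability(path, "CGCGA"))
best = max(CANDIDATES, key=lambda p: path_probability(p, "CGCGA"))
print("most probable of these:", ",".join(best))
```

Enumerating paths this way grows as 2^n for n bases (each base can be + or -), which is why the Viterbi recursion is used instead.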

29
HMM for CpG Islands: Decoding Problem II
  • Viterbi Algorithm
  • Hidden Markov model: states S, transition probabilities $a_{kl}$, emission probabilities $e_l(x)$
  • Observed symbol sequence $E = x_1, \ldots, x_n$
  • Find the most probable path of states that resulted in symbol sequence E
  • Let $v_k(i)$ be the partial probability of the most probable path for the symbol sequence $x_1, x_2, \ldots, x_i$ ending in state k. Then
  • $v_l(i+1) = e_l(x_{i+1}) \max_k \left( v_k(i)\, a_{kl} \right)$
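A minimal Python sketch of this recurrence for the eight-state CpG model, with back-pointers to recover the path. Assumptions not on the slides: hypothetical transition values (uniform island table, non-island C→G = 0.07, a 0.01 chance of switching region), a uniform start over the eight states, and deterministic emissions (C+ and C- always emit C). The functions `a` and `e` and the dictionary `v` mirror the slide's $a_{kl}$, $e_l(x)$ and $v_k(i)$.

```python
BASES = "ACGT"
# Hypothetical transition tables: island (+) uniform, non-island (-) with rare C->G.
PLUS = {x: {y: 0.25 for y in BASES} for x in BASES}
MINUS = {
    "A": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
    "C": {"A": 0.31, "C": 0.31, "G": 0.07, "T": 0.31},
    "G": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "T": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
}
TABLE = {"+": PLUS, "-": MINUS}
P_SWITCH = 0.01  # hypothetical chance of switching between + and - regions
STATES = [b + s for s in "+-" for b in BASES]  # A+, C+, G+, T+, A-, C-, G-, T-


def a(k, l):
    """Transition probability a_kl between combined states, e.g. a("C+", "G-")."""
    stay = (1 - P_SWITCH) if k[1] == l[1] else P_SWITCH
    return stay * TABLE[l[1]][k[0]][l[0]]


def e(l, x):
    """Emission probability e_l(x): states C+ and C- both always emit C."""
    return 1.0 if l[0] == x else 0.0


def viterbi(obs):
    """Return (most probable state path, its probability) for the bases in obs."""
    # Initialisation (assumption): uniform start, v_k(1) = e_k(x_1) / 8.
    v = {k: e(k, obs[0]) / len(STATES) for k in STATES}
    back = []  # one dict per step: best predecessor k for each state l
    for x in obs[1:]:
        ptr, new_v = {}, {}
        for l in STATES:
            # v_l(i+1) = e_l(x_{i+1}) * max_k v_k(i) * a_kl
            best_k = max(STATES, key=lambda k: v[k] * a(k, l))
            ptr[l] = best_k
            new_v[l] = e(l, x) * v[best_k] * a(best_k, l)
        back.append(ptr)
        v = new_v
    # Trace back from the most probable final state.
    last = max(STATES, key=lambda k: v[k])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1], v[last]


path, prob = viterbi("CGCGA")
print(path, prob)
```

With these values the sketch returns C+, G+, C+, G+, A+ for CGCGA, the path given on the previous slide. It multiplies raw probabilities; for long sequences, take logarithms and add instead of multiplying to avoid underflow.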

30
HMM for CpG Islands: Decoding Problem III

[Figure: Viterbi trellis of the hidden states A+, C+, G+, T+, A-, C-, G-, T- at each of five positions, above the observation sequence A C G C G]
31
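HMM for CpG Islands: Decoding Problem III
  • Summary
      • Computationally less expensive than the forward algorithm: the recursion has the same O(N²n) cost for N states and n symbols, but takes a max instead of a sum, and in log space needs only additions and comparisons.
      • The largest partial probability at the final position is the probability of the most probable path.
      • The best path is decided using the whole sequence, not individual observations.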

32
HMM for CpG Islands
  • Now, on to our Mathematica implementation

33
HMM for CpG Islands
  • References
      • R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1998, chapters 3 and 5.
      • A. Krogh, M. Brown, I. S. Mian, K. Sjolander, and D. Haussler. "Hidden Markov Models in Computational Biology: Applications to Protein Modeling." J. Mol. Biol. 235 (1994), 1501-1531.
      • L. Rabiner. "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition." Proceedings of the IEEE, Vol. 77, No. 2, Feb. 1989.
      • On-line tutorial: http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/main.html