BIOINFORMATICS - PowerPoint PPT Presentation

1
BIOINFORMATICS
Gene Finding With A Hidden Markov model Of
Genomic Structure and Evolution.
Jakob Skou Pedersen and Jotun Hein
Deepak Verghese CS 6890
2
A number of models have incorporated evolutionary
information:
  • GPHMM
  • Conserved exon method
  • Two-step GLASS and ROSETTA
  • TWINSCAN, which extends GENSCAN
  • etc.

3
  • These methods do not exploit all the information in the
    evolutionary pattern
  • They are not easily extended to multiple genome sequences.

4
EVOLUTIONARY HIDDEN MARKOV MODEL (EHMM)
A probabilistic model of both genome structure
and evolution, composed of
  • Hidden Markov Model (HMM)
  • Phylogenetic Tree

5
ADVANTAGES
  • Can handle any number of sequences in an
    alignment.
  • Can have properties of higher order HMMs
  • Can handle variability in the sequences along the
    alignment
  • State-of-the-art evolutionary models can be
    incorporated later
  • Evolutionary events between different genomes are
    not treated independently

6
MODEL
  • SCOPE
  • Not to compete with existing gene-finding methods
    on performance, but to illustrate the power of
    this approach.
  • Relies on a pre-produced alignment.

7
MARKOV CHAINS
  • A set of states
  • The transitions from one state to all other
    states, including itself, are governed by a
    probability distribution
  • First-order Markov chain: the probabilities
    depend solely on the current state
  • n-th order Markov chain: the probabilities
    depend on the n previous states
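
The first-order case can be sketched in a few lines of Python. The two state names and all transition probabilities below are hypothetical values chosen for illustration; they do not come from the presentation.

```python
import random

# Hypothetical two-state chain; the probabilities are illustrative only.
TRANSITIONS = {
    "coding":    {"coding": 0.9, "noncoding": 0.1},
    "noncoding": {"coding": 0.2, "noncoding": 0.8},
}

def step(state, rng):
    """Draw the next state; it depends only on the current state."""
    targets = list(TRANSITIONS[state])
    weights = [TRANSITIONS[state][t] for t in targets]
    return rng.choices(targets, weights=weights, k=1)[0]

def simulate(start, n_steps, seed=0):
    """Generate a path of n_steps transitions starting from `start`."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps):
        path.append(step(path[-1], rng))
    return path

def path_probability(path):
    """Probability of a path given its first state (product of transitions)."""
    p = 1.0
    for a, b in zip(path, path[1:]):
        p *= TRANSITIONS[a][b]
    return p
```

An n-th order chain could be represented the same way by keying the table on tuples of the n previous states instead of a single state.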

8
HIDDEN MARKOV MODEL
5 Components
  • A set of states
  • Matrix of transition probabilities (A)
  • Alphabet of symbols (C)
  • Set of emission distributions (e)
  • Initial state distribution (B)
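
As a rough sketch, the five components can be bundled into one small Python structure. The class name `HMM`, the methods `check` and `sample`, and the toy numbers are assumptions made for this illustration; only the symbols A, C, e and B come from the slide.

```python
import random
from dataclasses import dataclass

@dataclass
class HMM:
    states: list     # the set of states
    alphabet: list   # the symbols C
    A: dict          # A[k][l]: probability of moving from state k to state l
    e: dict          # e[k][c]: probability that state k emits symbol c
    B: dict          # B[k]: probability of starting in state k

    def check(self):
        """Return True if every distribution sums to one."""
        ok = abs(sum(self.B.values()) - 1.0) < 1e-9
        for k in self.states:
            ok = ok and abs(sum(self.A[k].values()) - 1.0) < 1e-9
            ok = ok and abs(sum(self.e[k].values()) - 1.0) < 1e-9
        return ok

    def sample(self, length, seed=0):
        """Generate a state path and the symbols emitted along it."""
        rng = random.Random(seed)

        def draw(dist):
            keys = list(dist)
            return rng.choices(keys, weights=[dist[x] for x in keys], k=1)[0]

        path, symbols = [], []
        state = draw(self.B)
        for _ in range(length):
            path.append(state)
            symbols.append(draw(self.e[state]))
            state = draw(self.A[state])
        return path, symbols

# Toy model with made-up numbers, for illustration only.
toy = HMM(
    states=["exon", "intron"],
    alphabet=["A", "C", "G", "T"],
    A={"exon": {"exon": 0.9, "intron": 0.1},
       "intron": {"exon": 0.2, "intron": 0.8}},
    e={"exon": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
       "intron": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}},
    B={"exon": 0.5, "intron": 0.5},
)
```

Calling `toy.sample(10)` returns both the path and the symbols; in a real application only the symbols would be observed.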

9
Example of hidden Markov model
  • A C A - - - A T G
  • T C A A C T A T C
  • A C A C - - A G C
  • A G A - - - A T C
  • A C C G - - A T C

Why the name "hidden"? There is no 1:1 correspondence
between states and symbols.
10
Components
  • State k
  • Emits symbols (observables) C
  • PROBABILISTIC MODEL
  • Emission Distribution e
  • Initial state distribution B
  • Transition probabilities A

11
Path ?
  • Different paths possible for same sequence
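
To make the point concrete, the sketch below enumerates every state path for a short sequence and computes the joint probability of each path with that sequence. The two states "H" and "L" and all probabilities are hypothetical values used only for illustration.

```python
from itertools import product

# Hypothetical two-state HMM; the numbers are illustrative only.
STATES = ("H", "L")
B = {"H": 0.5, "L": 0.5}
A = {"H": {"H": 0.5, "L": 0.5}, "L": {"H": 0.4, "L": 0.6}}
e = {"H": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
     "L": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}

def joint_probability(path, seq):
    """P(path, seq): start, then alternate transition and emission."""
    p = B[path[0]] * e[path[0]][seq[0]]
    for i in range(1, len(seq)):
        p *= A[path[i - 1]][path[i]] * e[path[i]][seq[i]]
    return p

def all_paths(seq):
    """Map every possible state path to its joint probability with seq."""
    return {"".join(path): joint_probability(path, seq)
            for path in product(STATES, repeat=len(seq))}

def most_probable_path(seq):
    """Brute-force search; only feasible for very short sequences."""
    paths = all_paths(seq)
    return max(paths, key=paths.get)
```

The number of paths grows exponentially with sequence length, which is why practical implementations rely on dynamic programming rather than enumeration.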

12
In EHMM
  • The emission distribution e of state k is specified by
  • Evolutionary model E_k
  • Phylogenetic tree T

13
PHYLOGENETIC TREES
14
Motivation: the problem of explaining the
evolutionary history of today's species
  • In Phylogenetic trees
  • Leaves represent present day species
  • Character states of inner nodes are missing data
  • Interior nodes represent hypothesized ancestors
  • The lengths of the branches of a tree represent the
    amount of evolutionary difference.

15
Evolution is often modeled by continuous-time Markov
chains. Here, evolution along the branches
of the phylogenetic tree is modelled by E_k.
Transition probability P_k(t): for a branch of
length t, P_k(t) = exp(t Q_k), where Q_k is the
rate matrix of E_k.
Increasing the number of sequences increases
the amount of evolutionary information.
The alignment column corresponds to the state of
evolution at the leaves of the phylogenetic tree.
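
The relation between a rate matrix and its transition probabilities can be checked numerically. The sketch below assumes the Jukes-Cantor nucleotide model, which is a stand-in chosen for its simplicity and not necessarily the model used in the presentation, and computes the matrix exponential in pure Python by scaling and squaring. All function names are hypothetical.

```python
import math

def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(M, squarings=10, terms=12):
    """exp(M) via scaling and squaring with a truncated Taylor series."""
    n = len(M)
    scaled = [[v / 2 ** squarings for v in row] for row in M]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes scaled**k / k!
        term = [[v / k for v in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def jukes_cantor_Q():
    """Assumed rate matrix: one expected substitution per unit time."""
    return [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)]
            for i in range(4)]

def transition_matrix(Q, t):
    """P(t) = exp(t Q) for branch length t."""
    return mat_exp([[t * q for q in row] for row in Q])

def jukes_cantor_closed_form(t, same):
    """Known analytic solution, used here only as a cross-check."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay
```

For a branch length of 0.5, `transition_matrix(jukes_cantor_Q(), 0.5)` should agree with the closed form to within floating-point error, and every row should sum to one.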
16
The probability of generating an alignment
column in state k equals the probability of
observing the given character pattern on the
leaves of T, given E_k.
Phylogenetic tree of the entries of the 3
alignment columns
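
One standard way to compute the probability of a character pattern on the leaves of a tree is Felsenstein's pruning algorithm. The sketch below is a minimal version under assumptions that are not taken from the presentation: a Jukes-Cantor substitution model, a uniform equilibrium distribution at the root, and a tree encoded as nested tuples of (child, branch length) pairs. Gaps are treated as missing data.

```python
import math

NUCS = "ACGT"

def jc_prob(a, b, t):
    """Jukes-Cantor probability of base a becoming base b over length t."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if a == b else 0.25 - 0.25 * decay

def partial(node):
    """Return L[a] = P(leaves below node | node carries base a)."""
    if isinstance(node, str):            # leaf: an observed character
        if node == "-":                  # gap treated as missing data
            return {a: 1.0 for a in NUCS}
        return {a: 1.0 if a == node else 0.0 for a in NUCS}
    result = {a: 1.0 for a in NUCS}
    for child, t in node:                # internal node: combine children
        below = partial(child)
        for a in NUCS:
            result[a] *= sum(jc_prob(a, b, t) * below[b] for b in NUCS)
    return result

def column_probability(tree):
    """Probability of one alignment column, averaging over the root base."""
    root = partial(tree)
    return sum(0.25 * root[a] for a in NUCS)

def three_leaf_tree(x, y, z):
    """Hypothetical tree ((x:0.1, y:0.1):0.2, z:0.3) for one column."""
    return ((((x, 0.1), (y, 0.1)), 0.2), (z, 0.3))
```

For example, `column_probability(three_leaf_tree("A", "A", "G"))` gives the probability of the column A/A/G under these assumptions. In an EHMM, each state would use its own evolutionary model in place of the single `jc_prob` function.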
17
  • A codon-based evolutionary model is used to calculate
    the emission probability of the columns of A
  • A nucleotide-based evolutionary model is used to
    calculate the emission probability of column B
  • The emission probability of C is obtained from the
    equilibrium distribution of the relevant evolutionary model

18
Parameter Estimation
  • Parameters of the HMM are estimated by a combination
    of Baum-Welch and Powell
  • The evolutionary model E is divided into
    E_equ and E_evo

19
The initial state distribution B can be estimated by
Baum-Welch, but it is generally set to 0.00001
for all states except the intergenic state.
The expectation step of Baum-Welch estimates
  • the number of nucleotides emitted from each state
  • the expected number of state transitions
  • the expected number of times a state is used.
Powell, another optimization method, is used to
estimate E_evo and the phylogenetic tree T.
The Baum-Welch method is used to estimate E_equ
and A.
20
Therefore, the likelihood of an alignment x given
a parameterization θ of the EHMM can be found by
the equation
P(x | θ) = Σ_π P(x, π | θ)
Here we are summing over all possible paths π. This
can be done in time linear in the alignment length
by dynamic programming (the forward algorithm).
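
The dynamic-programming sum over paths is the forward algorithm. The sketch below uses a toy single-sequence HMM with made-up probabilities; in an EHMM the symbols would be alignment columns and each emission probability would come from the phylogenetic tree. A brute-force sum over all paths is included only to confirm that both give the same likelihood.

```python
from itertools import product

# Hypothetical toy model; the numbers are illustrative only.
STATES = ("exon", "intron")
B = {"exon": 0.5, "intron": 0.5}
A = {"exon": {"exon": 0.9, "intron": 0.1},
     "intron": {"exon": 0.2, "intron": 0.8}}
e = {"exon": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
     "intron": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}

def forward_likelihood(seq):
    """P(seq) summed over all paths; cost is linear in len(seq)."""
    # f[k] = probability of the symbols so far, ending in state k
    f = {k: B[k] * e[k][seq[0]] for k in STATES}
    for symbol in seq[1:]:
        f = {l: e[l][symbol] * sum(f[k] * A[k][l] for k in STATES)
             for l in STATES}
    return sum(f.values())

def brute_force_likelihood(seq):
    """Same quantity by explicit enumeration; exponential in len(seq)."""
    total = 0.0
    for path in product(STATES, repeat=len(seq)):
        p = B[path[0]] * e[path[0]][seq[0]]
        for i in range(1, len(seq)):
            p *= A[path[i - 1]][path[i]] * e[path[i]][seq[i]]
        total += p
    return total
```

For long inputs the forward values underflow, so practical implementations usually rescale at each step or work with log probabilities.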
21
EHMM is fully probabilistic and can be used to
simulate data and find genes.
  • EUKARYOTIC GENOME MODEL can be used to generate
    alignments.
  • Reduced model produces only inner exons.

eukaryotic EHMM
22
Results
  • Benefits of modeling evolution with an EHMM,
    using a data set of orthologous
    mouse/human gene pairs
  • Benefit will depend on divergence between
    sequences compared
  • Key parameter for modelling the difference
    between exons and introns is the dN/dS ratio.
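
The intuition behind the dN/dS ratio can be illustrated by classifying codon differences between two aligned coding sequences as synonymous (same amino acid) or non-synonymous. This is a deliberately simplified sketch: it returns raw counts, whereas a proper dN/dS estimate also normalizes by the number of synonymous and non-synonymous sites and corrects for multiple substitutions. The function name is hypothetical; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons ordered over the bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_differences(seq1, seq2):
    """Count codons that differ synonymously vs non-synonymously."""
    syn = nonsyn = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if c1 == c2 or c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue  # identical codon, or a gap/ambiguous base: skip
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn
```

In coding regions under purifying selection one would expect non-synonymous changes to be rare relative to synonymous ones, while no such asymmetry is expected in introns.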

24
Moreover, we see that the evolutionary models show a
distinct difference between the intergenic/intron
states and the codon state.
25
Evaluations were performed on both single and
aligned sequences
26
Graphical Representation
27
The simple model used here is not comparable to
state-of-the-art methods
Any number of aligned sequences can be handled
28
  • Extensions of the model
  • GENSCAN can be extended into an EHMM
  • Splice site finders
  • Models of ribosome binding sites and promoter
    regions
  • Non-geometric length distributions of exons
  • Pseudo-higher-order EHMMs can be constructed.
  • Extending the idea of pair HMMs to multiple sequences

29
Disadvantages in present model
  • The existing framework does not model gaps but
    treats them as missing data.
  • Optimal data for an EHMM is a multiple alignment of
    full-length genomes.
  • The challenge in constructing the alignment is to
    reduce the noise-to-signal ratio.
  • BUT ..