BCB 444/544 Introduction to Bioinformatics

BCB 444/544 F06 ISU Terribilini #19 - Phylogenetic Methods

1
BCB 444/544 - Introduction to Bioinformatics
Lab (Oct 5) Background: Phylogenetic Methods
2
Background: Phylogenetic Methods
Multiple Sequence Alignment
Building Phylogenetic Trees
3
Multiple Sequence Alignment (MSA): Motivation
  • Correspondence: find out which parts do the same
    thing
  • Similar genes are conserved across widely
    divergent species, often performing similar
    functions
  • Protein structure prediction
  • Use knowledge of structure of one or more members
    of a protein MSA to predict structure of other
    members
  • Structure is more conserved than sequence
  • Create profiles for protein families
  • Allow us to search for other members of the
    family
  • Genome assembly: automated reconstruction of
    contig maps of genomic fragments such as ESTs
  • MSA is the starting point for phylogenetic
    analysis

4
Multiple Sequence Alignment
  • VTISCTGSSSNIGAG-NHVKWYQQLPG
  • VTISCTGTSSNIGS--ITVNWYQQLPG
  • LRLSCSSSGFIFSS--YAMYWVRQAPG
  • LSLTCTVSGTSFDD--YYSTWVRQPPG
  • PEVTCVVVDVSHEDPQVKFNWYVDG--
  • ATLVCLISDFYPGA--VTVAWKADS--
  • ATLVCLISDFYPGA--VTVAWKADS--
  • AALGCLVKDYFPEP--VTVSWNSG---
  • VSLTCLVKGFYPSD--IAVEWESNG--
  • Goal: bring the greatest number of similar
    characters into the same column of the alignment
  • Similar to alignment of two sequences.
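
The goal above can be made concrete by measuring how well each column agrees. The following is a minimal sketch, not part of the original slides: it assumes gaps are written as "-" and uses a simple, hypothetical conservation measure (the fraction of rows that share the most common residue in a column).

```python
from collections import Counter

def column_conservation(rows, gap="-"):
    """For each column, return the fraction of rows sharing the most common non-gap residue."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    scores = []
    for column in zip(*rows):
        residues = [c for c in column if c != gap]
        if not residues:
            scores.append(0.0)  # a column made only of gaps carries no signal
            continue
        most_common_count = Counter(residues).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores

# The alignment from the slide, with gaps assumed to be "-".
alignment = [
    "VTISCTGSSSNIGAG-NHVKWYQQLPG",
    "VTISCTGTSSNIGS--ITVNWYQQLPG",
    "LRLSCSSSGFIFSS--YAMYWVRQAPG",
    "LSLTCTVSGTSFDD--YYSTWVRQPPG",
    "PEVTCVVVDVSHEDPQVKFNWYVDG--",
    "ATLVCLISDFYPGA--VTVAWKADS--",
    "ATLVCLISDFYPGA--VTVAWKADS--",
    "AALGCLVKDYFPEP--VTVSWNSG---",
    "VSLTCLVKGFYPSD--IAVEWESNG--",
]

scores = column_conservation(alignment)
fully_conserved = [i for i, s in enumerate(scores) if s == 1.0]
print("fully conserved columns (0-based):", fully_conserved)
```

A better alignment, in this simplified sense, is one in which more columns score close to 1.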

5
Multiple Sequence Alignment Approaches
  • Optimal global alignments - dynamic programming
  • Generalization of Needleman-Wunsch
  • Find alignment that maximizes a score function
  • Computationally expensive: time grows as the product
    of the sequence lengths
  • Global Progressive Alignments - Match
    closely-related sequences first
  • Global Iterative Alignments - Multiple
    re-building attempts to find best alignment
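
For reference, the two-sequence case that the optimal approach generalizes can be sketched as follows. This is an illustrative Needleman-Wunsch scorer only; the scoring values (match = 1, mismatch = -1, gap = -1) are assumptions and do not come from the slides.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty sequence costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diagonal, up, left)
    return score[-1][-1]

print(needleman_wunsch_score("GATTACA", "GATTACA"))
print(needleman_wunsch_score("AC", "AGC"))
```

The table has one axis per sequence, which is why the same idea applied to k sequences needs a k-dimensional table.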

6
Dynamic Programming MSA: General Case
  • For k sequences of length n, the dynamic programming
    algorithm does $(2^k - 1)\,n^k$ operations
  • Example: 6 sequences of length 100 require about
    $6.3 \times 10^{13}$ calculations
  • Space for the table is $n^k$
  • Implementations (e.g., WashU MSA 2.1) use tricks
    and only search subset of dynamic programming
    table
  • Even this is expensive. E.g., the Baylor College of
    Medicine (BCM) Search Launcher limits MSA to 8 sequences
    of 800 characters and 10 minutes of processing time
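
The growth in cost can be checked with a few lines of arithmetic. This sketch simply evaluates the operation count (2^k - 1) n^k and the table size n^k from the slide; the function name is hypothetical.

```python
def msa_dp_cost(k, n):
    """Return (operations, table cells) for exact DP alignment of k sequences of length n."""
    operations = (2**k - 1) * n**k
    table_cells = n**k
    return operations, table_cells

for k in range(2, 7):
    ops, cells = msa_dp_cost(k, 100)
    print(f"k={k}: {ops:.2e} operations, {cells:.2e} table cells")
```

Each additional sequence of length 100 multiplies the work by roughly 200, which is why exact methods are limited to a handful of sequences.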

7
What is a phylogeny?
www.rci.rutgers.edu/dvhowe/invertzoo/lecture1_2006slides.pdf
8
Phylogenetic (evolutionary) trees
Describe evolutionary relationships between species
[Figure: two alternative trees, joined by "or"]
The true tree cannot be known with certainty!
Nevertheless, phylogenies can be useful
9
Applications of Phylogenetic Analysis
  • Inferring function
  • Closely related sequences occupy neighboring
    branches of tree
  • Tracking changes in rapidly evolving populations
    (e.g., viruses)
  • Which genes are under selection?

10
Methods
  • Distance-based
  • Parsimony
  • Maximum likelihood

11
Distance Matrices
[Figure: distance matrix over taxa a, b, c, d]
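
A distance matrix over four taxa can be computed directly from aligned sequences. The sketch below is an illustration only: the sequences are made up, and the distance used (the p-distance, the fraction of differing sites) is one common choice rather than necessarily the one used in the lecture.

```python
def p_distance(s, t, gap="-"):
    """Fraction of aligned, non-gap positions at which s and t differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(s, t) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(seqs):
    """Return (names, matrix) where matrix[i][j] is the p-distance between taxa i and j."""
    names = sorted(seqs)
    matrix = [[p_distance(seqs[p], seqs[q]) for q in names] for p in names]
    return names, matrix

# Hypothetical aligned sequences for the four taxa on the slide.
seqs = {
    "a": "ACGTACGT",
    "b": "ACGTACGA",
    "c": "ACGAACTA",
    "d": "TCGAACTA",
}

names, matrix = distance_matrix(seqs)
print("    " + "      ".join(names))
for name, row in zip(names, matrix):
    print(name, "  ".join(f"{d:.3f}" for d in row))
```

Distance-based methods then look for a tree whose path lengths between leaves reproduce this matrix as closely as possible.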
12
Least Squares
13
Methods
  • Distance-based
  • Parsimony
  • Maximum likelihood

14
Parsimony
Goal: find the tree with the fewest evolutionary changes
[Figure: example tree with node labels a, b, c, d, e, f]
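
Counting the changes a given tree requires is the core step of a parsimony search. The slides do not name an algorithm for this; the sketch below uses Fitch's algorithm, a standard choice, on made-up sequences for taxa a to d, and compares the three possible groupings of four taxa.

```python
def fitch(tree, leaf_states):
    """Return (candidate ancestral states, minimum number of changes) for one site.

    A tree is either a leaf name (str) or a pair (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, leaf_states)
    right_set, right_cost = fitch(right, leaf_states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: at least one change is needed on this pair of branches.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all sites of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )

# Hypothetical aligned sequences.
alignment = {"a": "AAG", "b": "AAG", "c": "CAT", "d": "CAT"}

candidate_trees = [
    (("a", "b"), ("c", "d")),
    (("a", "c"), ("b", "d")),
    (("a", "d"), ("b", "c")),
]
for tree in candidate_trees:
    print(tree, parsimony_score(tree, alignment))

best_tree = min(candidate_trees, key=lambda t: parsimony_score(t, alignment))
print("most parsimonious:", best_tree)
```

With more taxa the number of candidate trees grows very quickly, so practical programs search the space heuristically rather than enumerating it.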
15
Methods
  • Distance-based
  • Parsimony
  • Maximum likelihood

16
Markov models on trees
  • Observed: the species labeling the leaves
  • Hidden: the ancestral states
  • Transition probabilities: the mutation
    probabilities
  • Assumptions:
  • Only mutations are allowed
  • Sites are independent
  • Evolution at each site occurs according to a
    Markov process

17
Models of evolution at a site
  • Transition probability matrix $M = (m_{ij})$,
    $i, j \in \{A, C, T, G\}$, where
    $m_{ij} = \mathrm{Prob}(i \to j \text{ mutation in 1 time unit})$
  • Different branches of tree may have different
    lengths
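
One way to handle branches of different lengths follows from the Markov assumption: over a branch of t time units, the transition probabilities are given by M multiplied by itself t times. The sketch below assumes a hypothetical matrix in which every base mutates with probability p per time unit, split evenly among the other three bases; the slides do not specify M.

```python
BASES = "ACTG"

def uniform_mutation_matrix(p):
    """m[i][j]: probability that base i becomes base j in one time unit (assumed uniform model)."""
    return {i: {j: (1 - p) if i == j else p / 3 for j in BASES} for i in BASES}

def compose(m1, m2):
    """Matrix product: probability of i -> j via any intermediate base k."""
    return {i: {j: sum(m1[i][k] * m2[k][j] for k in BASES) for j in BASES} for i in BASES}

def over_branch(m, t):
    """Transition probabilities along a branch of integer length t."""
    result = {i: {j: 1.0 if i == j else 0.0 for j in BASES} for i in BASES}  # identity for t = 0
    for _ in range(t):
        result = compose(result, m)
    return result

M = uniform_mutation_matrix(0.3)
for t in (1, 2, 5):
    Mt = over_branch(M, t)
    print(f"t={t}: P(A stays A) = {Mt['A']['A']:.4f}, P(A -> C) = {Mt['A']['C']:.4f}")
```

Longer branches push every row toward equal probabilities for all four bases, reflecting the loss of information about the ancestral state.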

18
The probability of an assignment
[Figure: tree with root T, internal nodes G and T, and leaves A, G, C, T]
Probability $= m_{TG}\, m_{GA}\, m_{GG}\, m_{TT}\, m_{TC}\, m_{TT}$
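
The product above is one factor per tree edge, so it can be evaluated by listing the edges as (parent, child) pairs. In this sketch the mutation probabilities are an assumption: each base changes with probability p = 0.1 per edge, split evenly among the other three bases.

```python
def m(i, j, p=0.1):
    """Assumed transition probability from base i to base j along one edge."""
    return 1 - p if i == j else p / 3

# Edges of the tree on the slide: root T, internal nodes G and T, leaves A, G, C, T.
edges = [
    ("T", "G"), ("G", "A"), ("G", "G"),  # left subtree
    ("T", "T"), ("T", "C"), ("T", "T"),  # right subtree
]

def assignment_probability(edges, p=0.1):
    """Multiply the transition probabilities over all edges of the tree."""
    probability = 1.0
    for parent, child in edges:
        probability *= m(parent, child, p)
    return probability

print(assignment_probability(edges))
```

Three of the six edges carry a mutation and three do not, so under this assumed model the result equals (p/3)^3 (1 - p)^3.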
19
Ancestral reconstruction: most likely assignment
[Figure: tree with root X, internal nodes Y and Z, and leaves A, G, C, T]
$L = \max_{X,Y,Z}\; m_{XY}\, m_{YA}\, m_{YG}\, m_{XZ}\, m_{ZC}\, m_{ZT}$
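
For a tree this small, the maximization can be done by trying all 4^3 = 64 assignments of X, Y and Z. The sketch below is a brute-force illustration under an assumed mutation model (each base changes with probability p = 0.1 per edge, split evenly among the other three bases); larger trees call for dynamic programming instead.

```python
from itertools import product

BASES = "ACTG"

def m(i, j, p=0.1):
    """Assumed transition probability from base i to base j along one edge."""
    return 1 - p if i == j else p / 3

def likelihood(x, y, z, p=0.1):
    """Probability of the assignment: root x, internal nodes y and z, leaves A, G, C, T."""
    return (m(x, y, p) * m(y, "A", p) * m(y, "G", p)
            * m(x, z, p) * m(z, "C", p) * m(z, "T", p))

def best_assignment(p=0.1):
    """Return one (x, y, z) that maximizes the likelihood, with its value."""
    best = max(product(BASES, repeat=3), key=lambda xyz: likelihood(*xyz, p=p))
    return best, likelihood(*best, p=p)

assignment, value = best_assignment()
print("X, Y, Z =", assignment, "with likelihood", value)
```

Under this symmetric model several assignments tie for the maximum, so the one printed is simply the first found in enumeration order.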