Phylogenetic Inference - PowerPoint PPT Presentation

Slides: 24
Provided by: capr7
Transcript and Presenter's Notes

1
Phylogenetic Inference
  • Methods

2
Need an optimality criterion, and an algorithm to
search for the best tree given that optimality
criterion
3
Best tree OR True tree
4
Types of optimality criteria used to infer
phylogeny from sequence
  • Distance methods
  • Parsimony
  • Likelihood
  • Probabilistic methods
  • Phylogenetic invariants

5
Distance based methods
  • Minimum Evolution Principle
  • The tree with the smallest sum of branch lengths
    is the best tree

6
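[Figure: unrooted tree of four sequences. A and B join one internal node by branches r and s, C and D join the other internal node by branches u and v, and internal branch t connects the two nodes.]
d_AB + d_CD + d_AD + d_BC = (r + s) + (u + v) + (r + t + v) + (s + t + u)
                          = 2r + 2s + 2t + 2u + 2v
T = r + s + t + u + v
T = (d_AB + d_CD + d_AD + d_BC) / 2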
7
Number of possible unrooted trees from n
sequences
e.g. for 20 sequences there are approximately 10^20
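A standard result, not reproduced in the transcript, is that n sequences give (2n - 5)!! = 3 x 5 x ... x (2n - 5) unrooted bifurcating trees. A minimal sketch under that assumption (the function name `num_unrooted_trees` is illustrative, not from the slides):

```python
def num_unrooted_trees(n):
    """Number of unrooted bifurcating trees for n labelled sequences.

    Assumes the standard double-factorial count (2n - 5)!!,
    which is not shown in the transcript itself.
    """
    if n < 3:
        raise ValueError("need at least 3 sequences")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


for n in (4, 5, 10, 20):
    print(n, num_unrooted_trees(n))
```

For n = 20 the count is a little over 2 x 10^20, consistent with the order of magnitude quoted on the slide.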
8
For realistic numbers of sequences it is
impossible to consider all possible trees. Need
algorithms that can arrive at the best tree
without considering all possible trees.
9
Neighbour joining is an approximation to minimum
evolution
10
Neighbour Joining
[Figure: tree of eight sequences, labelled 1 to 8]
Choose the pair that minimizes the length of the
resulting tree
11
Distance methods
  • Calculate the distance, CORRECTING FOR MULTIPLE
    HITS
  • The distance matrix (7 sequences):

             Rat     Mouse   Rabbit  Human   Opossum Chicken Frog
    Rat      0.0000  0.0646  0.1434  0.1456  0.3213  0.3213  0.7018
    Mouse    0.0646  0.0000  0.1716  0.1743  0.3253  0.3743  0.7673
    Rabbit   0.1434  0.1716  0.0000  0.0649  0.3582  0.3385  0.7522
    Human    0.1456  0.1743  0.0649  0.0000  0.3299  0.2915  0.7116
    Opossum  0.3213  0.3253  0.3582  0.3299  0.0000  0.3279  0.6653
    Chicken  0.3213  0.3743  0.3385  0.2915  0.3279  0.0000  0.5721
    Frog     0.7018  0.7673  0.7522  0.7116  0.6653  0.5721  0.0000
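The pair-selection step of neighbour joining can be sketched on this matrix. The slides only say to choose the pair that minimizes the length of the resulting tree; the sketch below assumes the usual Q criterion, Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k), and covers only the first join, not the full algorithm. The names `taxa`, `dist` and `nj_pick_pair` are illustrative.

```python
from itertools import combinations

taxa = ["Rat", "Mouse", "Rabbit", "Human", "Opossum", "Chicken", "Frog"]
dist = [
    [0.0000, 0.0646, 0.1434, 0.1456, 0.3213, 0.3213, 0.7018],
    [0.0646, 0.0000, 0.1716, 0.1743, 0.3253, 0.3743, 0.7673],
    [0.1434, 0.1716, 0.0000, 0.0649, 0.3582, 0.3385, 0.7522],
    [0.1456, 0.1743, 0.0649, 0.0000, 0.3299, 0.2915, 0.7116],
    [0.3213, 0.3253, 0.3582, 0.3299, 0.0000, 0.3279, 0.6653],
    [0.3213, 0.3743, 0.3385, 0.2915, 0.3279, 0.0000, 0.5721],
    [0.7018, 0.7673, 0.7522, 0.7116, 0.6653, 0.5721, 0.0000],
]


def nj_pick_pair(dist):
    """Return the index pair (i, j) with the smallest Q value.

    Assumes the standard neighbour-joining Q criterion; the slides
    do not spell out the formula.
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]

    def q(i, j):
        return (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]

    return min(combinations(range(n), 2), key=lambda pair: q(*pair))


i, j = nj_pick_pair(dist)
print("first pair joined:", taxa[i], taxa[j])
```

On this matrix the first pair joined is Chicken and Frog, not the pair with the smallest raw distance (Rat and Mouse): the criterion also accounts for how far each sequence is from all the others.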

12
Correction for multiple hits
  • A great many models are used for nucleotide sequences
    (e.g. JC, K2P, HKY, REV, maximum likelihood)
  • Amino acid sequences are even more complicated!
  • Can take account of different rates of evolution
    at sites (e.g. gamma distribution)
  • Accuracy falls off drastically for highly
    divergent sequences
  • Is it necessary to use the most realistic model?
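As an illustration of a multiple-hits correction, here is a minimal sketch of the simplest model in the list, Jukes-Cantor (JC), which converts the observed proportion of differing sites p into a distance d = -(3/4) ln(1 - 4p/3). The function names and the handling of gaps (sites with "-" are skipped) are assumptions made for the example.

```python
import math


def p_distance(seq_a, seq_b):
    """Observed proportion of differing sites, skipping gapped sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(p):
    """Jukes-Cantor corrected distance for an observed proportion p."""
    if not 0 <= p < 0.75:
        # At p >= 0.75 the sequences are saturated and the
        # correction is undefined.
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


for p in (0.05, 0.1, 0.3, 0.5, 0.7):
    print(f"p = {p:.2f}  corrected d = {jc69_distance(p):.3f}")
```

The corrected distance grows much faster than p as p approaches 0.75, so small sampling errors in p become large errors in d. This is one way to see why accuracy falls off for highly divergent sequences.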

13
The most accurate nucleotide substitution model
doesn't necessarily give the best estimate of the
true tree: models with more parameters provide
distance estimates with higher variance
14
How to infer the true tree? How to keep
reviewers/editors happy?
15
In short, distance methods
  • Can be fast and simple
  • e.g. UPGMA, Neighbour Joining, Minimum Evolution,
    Fitch-Margoliash

16
Maximum Parsimony
  • Occam's Razor
  • Entia non sunt multiplicanda praeter
    necessitatem. (Entities should not be multiplied
    beyond necessity.)
  • William of Occam (1300-1349)

The best tree is the one which requires the fewest
substitutions
17
  • Check each topology
  • Count the minimum number of changes required to
    explain the data
  • Choose the tree with the smallest number of
    changes
  • Usually performs well with closely related
    sequences but often performs badly with very
    distantly related sequences
  • With distantly related sequences homoplasy
    becomes a major problem
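The "count the minimum number of changes" step can be sketched for one alignment column on one fixed topology. The slides do not name a counting method; this example assumes Fitch's algorithm, with the tree written as nested pairs of leaf names (rooted arbitrarily, which does not change the count). All names are illustrative.

```python
def fitch(node, states):
    """Return (possible states at node, minimum changes below node).

    node   -- a leaf name (str) or a pair (left, right) of subtrees
    states -- dict mapping each leaf name to its character at one site
    """
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Children can agree on a state: no extra change needed here.
        return shared, left_cost + right_cost
    # Children disagree: one change is required on this part of the tree.
    return left_set | right_set, left_cost + right_cost + 1


site = {"A": "A", "B": "G", "C": "A", "D": "G"}
topologies = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}
for name, tree in topologies.items():
    print(name, fitch(tree, site)[1])
```

Summing this count over all sites gives the parsimony score of a topology; the topology with the smallest total is the one chosen.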

18
Not all trees need to be considered: the branch and
bound method is still guaranteed to find the MP
tree. In practice a heuristic search is often
performed (involving branch swapping, e.g. NNI,
SPR, TBR), with no guarantee of finding the MP tree.
19
Long Branches Attract
  • In a set of sequences evolving at different
    rates the sequences evolving rapidly are drawn
    together

20
Comparison of methods
  • Inconsistency
  • Neighbour Joining (NJ) is very fast but depends
    on accurate estimates of distance. This is more
    difficult with very divergent data
  • NJ can suffer from Long Branch Attraction
  • Parsimony suffers from Long Branch Attraction.
    This may be a particular problem for very
    divergent data
  • Parsimony can be computationally intensive
  • Codon usage bias can be a problem for MP and NJ
  • NJ and MP both perform well if sequences are not
    too divergent
  • Maximum Likelihood can be the most reliable but
    depends on the choice of model and can be very
    slow
  • Methods may be combined

21
(No Transcript)
22
The Molecular Clock
  • For a given protein the rate of sequence
    evolution is approximately constant across
    lineages
  • Zuckerkandl and Pauling (1965)

This would allow speciation and duplication
events to be dated accurately based on molecular
data.
Local and approximate molecular clocks are more
reasonable.
23
Relative Rate Test
  • Test whether sets of sequences are evolving at
    equal rates (local molecular clock hypothesis)

e.g. RRTree, Robinson-Rechavi
http://pbil.univ-lyon1.fr/software/rrtree.html
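To make the idea of a relative rate test concrete, here is a minimal sketch of a different and much simpler test than the one RRTree implements: Tajima's three-sequence test, swapped in purely for illustration. Given two ingroup sequences and an outgroup, it counts sites where only the first ingroup sequence differs (m1) and sites where only the second differs (m2). Under equal rates m1 and m2 should be similar, and (m1 - m2)^2 / (m1 + m2) is compared with a chi-square distribution with 1 degree of freedom. Function and variable names are hypothetical.

```python
import math


def tajima_relative_rate(seq_1, seq_2, outgroup):
    """Return (m1, m2, chi2, p_value) for three aligned sequences."""
    if not (len(seq_1) == len(seq_2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, o in zip(seq_1, seq_2, outgroup):
        if a != o and b == o:
            m1 += 1  # change unique to the first ingroup sequence
        elif b != o and a == o:
            m2 += 1  # change unique to the second ingroup sequence
    if m1 + m2 == 0:
        raise ValueError("no informative sites")
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return m1, m2, chi2, p_value


# Toy data: four changes unique to the first sequence, none in the second.
m1, m2, chi2, p = tajima_relative_rate("CCCCAAAAAA", "AAAAAAAAAA", "AAAAAAAAAA")
print(m1, m2, chi2, round(p, 4))
```

A small p-value suggests the two lineages have not evolved at equal rates since they diverged, so a local molecular clock would be rejected for that pair.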