Fast Algorithms for Minimum Evolution - PowerPoint PPT Presentation

Slides: 40
Provided by: DES7159

1
Fast Algorithms for Minimum Evolution
  • Richard Desper, NCBI
  • Olivier Gascuel, LIRMM

2
Overview
  1. Statement of phylogeny reconstruction problem and
    various approaches to solving it.
  2. Tree length formula as a function of average
    distances.
  3. Greedy algorithms for tree building and tree
    swapping.
  4. Simulation results.
  5. A few extras regarding consistency and branch
    lengths.

3
Phylogeny Reconstruction
  • General problem: reconstruct the evolutionary
    history for a set L of extant species.
  • Input: multiple sequence alignment for L or
    matrix of estimates of pairwise evolutionary
    distances.
  • Output: weighted phylogeny representing the history
    of L and its common ancestors.

4
Methods
  • Likelihood methods: model-based likelihood
    maximization.
  • Parsimony methods: minimize the total number of
    mutations in the tree.
  • Distance methods: fit a tree structure to inferred
    evolutionary distances. Leading methods include
    Felsenstein-Fitch-Margoliash weighted
    least-squares and Neighbor-Joining and its
    variants.

5
Felsenstein-Fitch-Margoliash Least-squares Method
  • FITCH searches the space of topologies by
    iteratively adding leaves and by tree swapping.
  • Edge weights and topology are chosen to minimize
    the weighted sum of squares (D is the input metric,
    $D^T$ is the induced tree metric):
    $\sum_{i<j} \frac{(D_{ij} - D^T_{ij})^2}{s_{ij}}$

If $s_{ij} = 1$ for all i and j, this is called the
ordinary least-squares method.
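As a small illustration, the objective can be evaluated directly for a fixed input matrix D and tree metric D^T. This Python sketch assumes the weights enter as 1/s_ij and that matrices are plain nested lists; the function name `weighted_sum_of_squares` is hypothetical, and it only scores one tree, it does not search topologies the way FITCH does.

```python
from typing import List

Matrix = List[List[float]]


def weighted_sum_of_squares(D: Matrix, DT: Matrix, s: Matrix) -> float:
    """Sum over pairs i < j of (D_ij - DT_ij)^2 / s_ij.

    D is the input distance matrix, DT the tree-induced metric and s the
    (assumed) per-pair weighting; s_ij = 1 everywhere gives OLS.
    """
    n = len(D)
    return sum(
        (D[i][j] - DT[i][j]) ** 2 / s[i][j]
        for i in range(n)
        for j in range(i + 1, n)
    )
```

Scaling every s_ij by the same constant rescales the objective without changing which tree minimizes it.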
6
Minimum Evolution
  • Developed by Rzhetsky and Nei (1992) as a
    modification of the OLS method
  • For each topology T:
  • Define the function l assigning OLS lengths to the
    edges of T
  • Define the size of the tree as $l(T) = \sum_{e \in T} l(e)$
  • Choose the T minimizing $l(T)$

7
Recursive Definition of DAB
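  • If $A = \{a\}$ and $B = \{b\}$, then $D_{AB} = D_{ab}$.
  • If A is formed of the subtrees $A_1$ and $A_2$, then
    $D_{AB} = \frac{|A_1| D_{A_1B} + |A_2| D_{A_2B}}{|A|}$.

All average distances for all pairs of
non-intersecting subtrees of a given topology
can be calculated in $O(n^2)$ time.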
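The recursion can be sketched in Python as below. This is a minimal illustration, assuming a hypothetical encoding of rooted subtrees as nested tuples of leaf names and a dict of pairwise distances; it memoizes pairs of subtrees rather than filling the matrix in the specific order that yields the $O(n^2)$ bound.

```python
from functools import lru_cache
from typing import Dict, Tuple, Union

# Hypothetical encoding: a rooted subtree is either a leaf name (str)
# or a pair (left, right) of rooted subtrees.
Subtree = Union[str, Tuple["Subtree", "Subtree"]]


def size(x: Subtree) -> int:
    """Number of leaves |X| in subtree x."""
    return 1 if isinstance(x, str) else size(x[0]) + size(x[1])


def make_ols_average(dist: Dict[Tuple[str, str], float]):
    """Return avg(A, B): the OLS (leaf-weighted) average distance between
    two disjoint subtrees, following the recursive definition."""

    def d(i: str, j: str) -> float:
        return dist[(i, j)] if (i, j) in dist else dist[(j, i)]

    @lru_cache(maxsize=None)
    def avg(a: Subtree, b: Subtree) -> float:
        if isinstance(a, str) and isinstance(b, str):
            return d(a, b)                  # base case: D_ab
        if isinstance(a, str):              # D is symmetric, so recurse
            a, b = b, a                     # on whichever side is composite
        a1, a2 = a
        n1, n2 = size(a1), size(a2)
        return (n1 * avg(a1, b) + n2 * avg(a2, b)) / (n1 + n2)

    return avg
```

Each leaf of A ends up with weight 1/|A|, so the result is the plain mean of all leaf-to-leaf distances between A and B.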
8
External OLS Edge Length Function
  • If e is the edge connecting the leaf i to the
    subtrees A and B, then
    $l(e) = \frac{1}{2}(D_{iA} + D_{iB} - D_{AB})$.

9
Internal OLS Edge Length Function
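  • The length of the internal edge e separating subtrees
    A and B from subtrees C and D is (Vach, 1988)

    $l(e) = \frac{1}{2}\left[\lambda(D_{AC} + D_{BD}) + (1 - \lambda)(D_{AD} + D_{BC}) - (D_{AB} + D_{CD})\right]$

where $\lambda = \frac{|A||D| + |B||C|}{(|A| + |B|)(|C| + |D|)}$.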
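Both edge-length formulas are easy to check numerically. A minimal Python sketch, assuming the subtree sizes and average distances have already been computed and are passed in as plain dicts keyed by the hypothetical labels 'A' to 'D'; on an additive quartet the OLS lengths recover the true edge lengths.

```python
from typing import Dict, Tuple


def external_edge_length(d_iA: float, d_iB: float, d_AB: float) -> float:
    """OLS length of the edge joining leaf i to subtrees A and B."""
    return 0.5 * (d_iA + d_iB - d_AB)


def internal_edge_length(sizes: Dict[str, int],
                         D: Dict[Tuple[str, str], float]) -> float:
    """OLS length (Vach's formula) of an internal edge separating A, B from C, D."""
    nA, nB, nC, nD = (sizes[k] for k in "ABCD")
    lam = (nA * nD + nB * nC) / ((nA + nB) * (nC + nD))
    return 0.5 * (lam * (D["A", "C"] + D["B", "D"])
                  + (1 - lam) * (D["A", "D"] + D["B", "C"])
                  - (D["A", "B"] + D["C", "D"]))


# Additive quartet ab|cd where subtrees A..D are the single leaves a..d,
# with leaf edges a=1, b=2, c=3, d=4 and internal edge 5.
quartet = {("A", "B"): 3.0, ("C", "D"): 7.0, ("A", "C"): 9.0,
           ("A", "D"): 10.0, ("B", "C"): 10.0, ("B", "D"): 11.0}
print(internal_edge_length(dict(A=1, B=1, C=1, D=1), quartet))  # 5.0
# Leaf a, with A = {b} and B = {c, d}: averages 3, (9 + 10) / 2, (10 + 11) / 2.
print(external_edge_length(3.0, 9.5, 10.5))  # 1.0
```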
10
Tree length formula
  • Lemma: with T as shown to the right, let x
    denote the root of subtree X, and $e_X$ the edge
    to X, for $X \in \{A, B, C, D\}$.
  • Then,

11
Tree Length Formula
  • With T as in prior slide,

Using lemma and branch length formula for l(e),
12
General approach
  • To search the space of topologies, we'll keep in
    memory two data structures:
  • Sizes of each subtree of the given topology
  • Matrix of average distances $D_{XY}$ for X, Y disjoint
    subtrees in the given topology
  • As we move from one topology to another, we'll
    update the matrix, but only as much as needed,
    in an efficient manner.

13
Tree Swapping by NNI
NNI swapping is a basic step in topology building
and searching
14
Tree Length Formula
  • With T as in prior slide,

Using lemma and branch length formula for l(e),
15
Tree Length after NNI
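  • Given $T \to T'$, the tree swap in the prior
    slide (exchanging subtrees B and C across the
    edge e), and l the edge length function,

    $l(T) - l(T') = \frac{1}{2}\left[(\lambda - 1)(D_{AC} + D_{BD}) - (\lambda' - 1)(D_{AB} + D_{CD}) - (\lambda - \lambda')(D_{AD} + D_{BC})\right]$   (1)

where $\lambda$ and $\lambda'$ are constants depending on the
topologies:
$\lambda = \frac{|A||D| + |B||C|}{(|A| + |B|)(|C| + |D|)}$ and $\lambda' = \frac{|A||D| + |B||C|}{(|A| + |C|)(|B| + |D|)}$.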
16
OLS FASTNNI
  • Pre-compute average distances between
    non-intersecting sub-trees. ($O(n^2)$ computations)
  • Loop over all internal edges and select the best
    swap using Equation (1). ($O(n)$)
  • If no swap improves the length of the tree, stop and
    return the tree; otherwise perform the best swap,
    update the matrix of average distances, and repeat
    Step 2. ($O(n)$ per swap: there is only one new
    split.)
  • Thus, if we require p swaps, the total complexity
    of FASTNNI is $O(n^2 + pn)$.
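Step 2 can be sketched as a scan over internal edges that evaluates the OLS swap gain for both NNIs at each edge. This is a minimal Python illustration, not the FastME code: it assumes each internal edge is described by the labels of its four surrounding subtrees (split {a, b} | {c, d}), that subtree sizes sit in a dict, and that a function `avg` returns precomputed average distances; all names are hypothetical. The gain used is l(T) − l(T′) = ½[(λ−1)(D_ac + D_bd) − (λ′−1)(D_ab + D_cd) − (λ−λ′)(D_ad + D_bc)] for exchanging b and c.

```python
from typing import Callable, Dict, Optional, Tuple

Avg = Callable[[str, str], float]  # symmetric average distance D_XY


def ols_swap_gain(n: Dict[str, int], avg: Avg,
                  a: str, b: str, c: str, d: str) -> float:
    """l(T) - l(T') for T with split {a, b} | {c, d} at an internal edge
    and T' obtained by exchanging subtrees b and c.
    A positive value means the swap shortens the tree."""
    lam = (n[a] * n[d] + n[b] * n[c]) / ((n[a] + n[b]) * (n[c] + n[d]))
    lam2 = (n[a] * n[d] + n[b] * n[c]) / ((n[a] + n[c]) * (n[b] + n[d]))
    return 0.5 * ((lam - 1) * (avg(a, c) + avg(b, d))
                  - (lam2 - 1) * (avg(a, b) + avg(c, d))
                  - (lam - lam2) * (avg(a, d) + avg(b, c)))


def best_nni(edges: Dict[str, Tuple[str, str, str, str]],
             n: Dict[str, int], avg: Avg
             ) -> Optional[Tuple[float, str, Tuple[str, str]]]:
    """Return (gain, edge id, exchanged pair) for the best improving NNI,
    or None if no NNI shortens the tree (the search then stops)."""
    best = None
    for edge_id, (a, b, c, d) in edges.items():
        # The two NNIs at this edge exchange b with c, or b with d.
        for swap, gain in (((b, c), ols_swap_gain(n, avg, a, b, c, d)),
                           ((b, d), ols_swap_gain(n, avg, a, b, d, c))):
            if gain > 0 and (best is None or gain > best[0]):
                best = (gain, edge_id, swap)
    return best
```

Each edge costs a constant number of lookups, so one scan is linear in the number of internal edges once the averages are available.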

17
Balanced Minimum Evolution
  • Gascuel (2000) observed that the OLS/ME method
    was weaker than NJ in approximating the correct
    topology.
  • To simplify tree length computation, Pauplin (2000)
    proposed a balanced version of Minimum Evolution,
    which weights each sub-tree equally when calculating
    averages: if A and B are disjoint sub-trees of T, with
    A formed of the sub-trees $A_1$ and $A_2$, then
    $D_{AB} = \frac{1}{2}(D_{A_1B} + D_{A_2B})$.
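The balanced recursion differs from the OLS one only in its weights. A minimal Python sketch, assuming a hypothetical encoding of rooted subtrees as nested tuples of leaf names: for A = ((a, b), c) the leaf c gets weight 1/2 and a, b get 1/4 each, instead of 1/3 each under leaf-count weighting.

```python
from functools import lru_cache
from typing import Dict, Tuple, Union

# Hypothetical encoding: a rooted subtree is a leaf name or a pair of subtrees.
Subtree = Union[str, Tuple["Subtree", "Subtree"]]


def make_balanced_average(dist: Dict[Tuple[str, str], float]):
    """Return avg(A, B) under the balanced scheme: each of the two sub-trees
    of a composite subtree gets weight 1/2, whatever its number of leaves."""

    def d(i: str, j: str) -> float:
        return dist[(i, j)] if (i, j) in dist else dist[(j, i)]

    @lru_cache(maxsize=None)
    def avg(a: Subtree, b: Subtree) -> float:
        if isinstance(a, str) and isinstance(b, str):
            return d(a, b)
        if isinstance(a, str):      # D is symmetric: recurse on the composite side
            a, b = b, a
        a1, a2 = a
        return 0.5 * (avg(a1, b) + avg(a2, b))

    return avg


dist = {("a", "b"): 2.0, ("a", "c"): 4.0, ("b", "c"): 6.0,
        ("a", "d"): 5.0, ("b", "d"): 7.0, ("c", "d"): 3.0}
avg = make_balanced_average(dist)
print(avg((("a", "b"), "c"), "d"))  # 4.5 (leaf-count weights would give 5.0)
```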

18
BNNI
  • Calculate balanced averages of all pairs of
    sub-trees. ($O(n^2)$)
  • Calculate the improvement for each swap using
    $l(T) - l(T') = \frac{1}{4}\left[(D_{AB} + D_{CD}) - (D_{AC} + D_{BD})\right]$   (2)
  • If no tree swap improves the length of the tree, stop
    and return the tree; otherwise perform the best swap,
    update the matrix of average distances, and repeat
    Step 2. ($O(n\,\mathrm{diam}(T))$ per swap)
  • The average complexity, when performing p swaps,
    is $O(n^2 + pn\,\mathrm{diam}(T))$.
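Setting λ = λ′ = ½ in the OLS swap formula gives the balanced gain ¼[(D_AB + D_CD) − (D_AC + D_BD)], which needs only four balanced averages and no subtree sizes. A minimal Python sketch, assuming a hypothetical `avg` lookup of precomputed balanced averages:

```python
from typing import Callable

Avg = Callable[[str, str], float]  # symmetric balanced average distance


def balanced_swap_gain(avg: Avg, a: str, b: str, c: str, d: str) -> float:
    """l(T) - l(T') under balanced ME, for T with split {a, b} | {c, d}
    and T' exchanging b and c. No subtree sizes are needed."""
    return 0.25 * ((avg(a, b) + avg(c, d)) - (avg(a, c) + avg(b, d)))


# Additive quartet ab|cd (single-leaf subtrees), starting from the wrong split ac|bd.
dist = {("a", "b"): 3.0, ("c", "d"): 7.0, ("a", "c"): 9.0,
        ("a", "d"): 10.0, ("b", "c"): 10.0, ("b", "d"): 11.0}


def avg(x: str, y: str) -> float:
    return dist[(x, y)] if (x, y) in dist else dist[(y, x)]


print(balanced_swap_gain(avg, "a", "c", "b", "d"))  # 2.5: exchanging c and b helps
```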

19
Updating Subtree Averages
[Figure: tree T with labels x, X, Y, e, A, B, C, D]
Q: How many recalculations?
(Hint: you can count (x, y) pairs.)
A: $O(n\,\mathrm{diam}(T))$
20
Building trees from scratch
  • We have NNI algorithms for OLS and balanced
    branch lengths. But what if we have no initial
    topology for NNIs?

21
OLS Greedy Minimum Evolution
  • Start with three-taxon tree T3
  • For k = 4 to n:
  • Calculate $D_{kA}$ for each subtree A in $T_{k-1}$
  • Express the cost of inserting k along edge e as f(e).
  • (Use Equation (3) on the next slide.)
  • Choose e minimizing f. Insert k along e to form
    $T_k$.
  • Update the matrix of average distances between every
    pair of 2-distant subtrees.
  • GME runs in $O(n^2)$ time.

22
Greedy Minimum Evolution
We use a variant of Equation (1), where $D = \{k\}$.
Let $L = l(T)$.
Then
23
Balanced Minimum Evolution
  • Same as GME, except for these modifications:
  • Calculate balanced average distances instead of
    ordinary average distances
  • Use $\lambda = \frac{1}{2}$ to find weights for insertion points
  • Must keep average distances for all pairs of
    sub-trees.
  • BME runs in $O(n^2\,\mathrm{diam}(T))$ time.
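To make the insertion step concrete, here is a deliberately naive Python sketch of greedy balanced insertion: each new taxon is tried on every edge and the placement with the smallest balanced length is kept, the length being recomputed from scratch with Pauplin's formula $l(T) = \sum_{i<j} 2^{1-p_T(i,j)} D_{ij}$, where $p_T(i,j)$ is the number of edges between i and j. This is a swapped-in brute-force variant, far slower than the $O(n^2\,\mathrm{diam}(T))$ algorithm above; the adjacency-dict tree encoding and the '#k' internal node names are assumptions.

```python
from collections import deque
from typing import Callable, Dict, List, Set

Tree = Dict[str, Set[str]]          # undirected adjacency; leaves are taxon names
Dist = Callable[[str, str], float]  # input distance D_ij


def hops_from(tree: Tree, src: str) -> Dict[str, int]:
    """Breadth-first search: number of edges from src to every node."""
    hops = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in tree[u]:
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops


def balanced_length(tree: Tree, taxa: List[str], D: Dist) -> float:
    """Pauplin's formula: sum over leaf pairs of 2^(1 - p_T(i, j)) * D_ij."""
    total = 0.0
    for idx, i in enumerate(taxa):
        hops = hops_from(tree, i)
        for j in taxa[idx + 1:]:
            total += 2.0 ** (1 - hops[j]) * D(i, j)
    return total


def insert_on_edge(tree: Tree, u: str, v: str, leaf: str, node: str) -> Tree:
    """Copy of tree with edge (u, v) subdivided by `node`, which carries `leaf`."""
    t = {x: set(nbrs) for x, nbrs in tree.items()}
    t[u].discard(v)
    t[v].discard(u)
    t[u].add(node)
    t[v].add(node)
    t[node] = {u, v, leaf}
    t[leaf] = {node}
    return t


def greedy_balanced_tree(taxa: List[str], D: Dist) -> Tree:
    """Start from the three-taxon star, then insert each remaining taxon
    where it minimizes the balanced tree length (brute force over edges)."""
    a, b, c = taxa[:3]
    tree: Tree = {"#0": {a, b, c}, a: {"#0"}, b: {"#0"}, c: {"#0"}}
    placed = [a, b, c]
    for k, leaf in enumerate(taxa[3:], start=1):
        edges = sorted({tuple(sorted((u, v))) for u in tree for v in tree[u]})
        placed.append(leaf)
        candidates = [insert_on_edge(tree, u, v, leaf, f"#{k}") for u, v in edges]
        tree = min(candidates, key=lambda t: balanced_length(t, placed, D))
    return tree
```

Sorting the edges makes tie-breaking deterministic; the efficient versions avoid the full recomputation by updating average distances incrementally.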

24
Simulations
  • Created 24- and 96-taxon trees, 2000 for each
    size, using the Yule-Harding process (which yields
    a molecular clock).
  • Edge lengths multiplied by $(1.0 + \mu X)$, where X is
    exponentially distributed.
  • Generated trees with three rates of evolution
  • SeqGen used to generate sequences for each tree
    and rate (12,000 in all)
  • DNADIST used to calculate distance matrices

25
Results topological distances
BNNI improved all input trees
26
Results topological distances
This improvement is large with fast rates and
high numbers of taxa
27
Results topological distances
NNI trees are close to the best possible for BME
28
Results topological distances
The quality of the NNI tree is (mostly) independent
of the starting point
29
Results topological distances
30
Computational Times
(in MM:SS; entries under one minute are in seconds)
Method        24 Taxa   96 Taxa   1000 Taxa   4000 Taxa
GME + BNNI    0.0263    0.0842    11.3390     06:02.1
HGT/FP        0.0252    0.1349    13.8080     03:33.1
NJ/BIONJ      0.0630    0.1628    21.2500     20:55.9
WEIGHBOR      0.4244    26.8818   -           -
FITCH         4.3745    -         -           -
Computations done on a Sun Enterprise E4500/E5500
running Solaris 8 with ten 400-MHz processors and 7
GB of memory.
31
Average number of NNIs
Method          24 Taxa   96 Taxa   1000 Taxa   4000 Taxa
GME + FASTNNI   1.244     8.446     44.9        336.50
GME + BNNI      1.446     11.177    59.1        343.75
BME + BNNI      1.070     6.933     29.1        116.25
We see that the average number of NNIs is
considerably lower than the number of taxa.
32
BME WLS
  • Why does the balanced approach work so well?
  • Pauplin's formula for the length of a tree is
    $l(T) = \sum_{i<j} 2^{1 - p_T(i,j)} D_{ij}$
  • BME is a weighted least-squares approach with
    weights proportional to $2^{-p_T(i,j)}$,

where $p_T(i,j)$ is the number of edges on the (i,j)
path in T.
Distantly related taxa see their importance
decrease exponentially.
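Pauplin's formula needs only the topology (through the edge counts $p_T(i,j)$) and the input distances. A minimal Python sketch, assuming the edge counts are already known and stored in dicts keyed by unordered pairs (a hypothetical layout); the coefficient $2^{1-p_T(i,j)}$ halves with each extra edge on the path, which is the exponential down-weighting of distant pairs.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Pair = FrozenSet[str]


def pauplin_length(taxa: List[str], p: Dict[Pair, int], D: Dict[Pair, float]) -> float:
    """Balanced tree length: sum over i < j of 2^(1 - p_T(i, j)) * D_ij,
    where p_T(i, j) is the number of edges on the path between i and j."""
    return sum(2.0 ** (1 - p[frozenset(ij)]) * D[frozenset(ij)]
               for ij in combinations(taxa, 2))


# Quartet ab|cd: sister leaves are 2 edges apart, all other pairs 3 edges.
taxa = ["a", "b", "c", "d"]
p = {frozenset(ij): 3 for ij in combinations(taxa, 2)}
p[frozenset("ab")] = p[frozenset("cd")] = 2
D = {frozenset("ab"): 3.0, frozenset("cd"): 7.0, frozenset("ac"): 9.0,
     frozenset("ad"): 10.0, frozenset("bc"): 10.0, frozenset("bd"): 11.0}
print(pauplin_length(taxa, p, D))  # 15.0
```

These distances are additive on a quartet with edge lengths 1, 2, 3, 4 (leaves) and 5 (internal), and the formula returns their sum, 15.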
33
Bonus features
  • BME is a consistent method. As observed
    distances converge to true distances, the true
    topology becomes the minimum evolution tree.
  • The BNNI tree has no negative branch lengths. A
    negative value of the branch length function
    would imply an NNI leading to a smaller tree.

34
Consistency of Balanced ME
  • Theorem: Suppose S is a weighted tree, and $\mathcal{T}$
    is a tree topology incompatible with S. Let T
    be the tree of topology $\mathcal{T}$ with weights
    determined by the balanced scheme. Then
  • $l(T) > l(S)$.
  • Lemma: it suffices to prove the case when S is a
    split metric.

35
Balanced ME consistency
  • Basic idea: let l be the tree length function on
    the space of topologies. We find a sequence of
    topologies $\mathcal{T} = T_0, T_1, \ldots, T_k = S$ such that
  • Each $T_{i+1}$ can be reached from $T_i$ via one of two
    simple topological transformations
  • $l(T_i) > l(T_{i+1})$ for all i.
  • Proof structure modeled after the OLS/ME proof
    (Rzhetsky and Nei, 1993).

36
Type I transformation
Color the leaves black or white according to the
split metric S. A Type I transformation uses an
NNI to form a larger monochromatic cluster.
This transformation reduces the size of the tree
under l.
37
Type II transformation
A Type II transformation uses two NNIs to form
two monochromatic subtrees.
This transformation also reduces the size of the
tree under l.
38
Positive Branch Lengths after BNNI
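  • Recall that, under the balanced scheme, the length
    of an internal edge e separating A, B from C, D
    is described by
    $l(e) = \frac{1}{4}\left[(D_{AC} + D_{BD} - D_{AB} - D_{CD}) + (D_{AD} + D_{BC} - D_{AB} - D_{CD})\right]$

We do not perform the switch of B and C because
it does not shorten the tree, i.e.
$D_{AB} + D_{CD} \le D_{AC} + D_{BD}$.
Thus $D_{AC} + D_{BD} - D_{AB} - D_{CD} \ge 0$.
Similarly, $D_{AD} + D_{BC} - D_{AB} - D_{CD} \ge 0$, so $l(e) \ge 0$.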
39
Conclusions
  • BME + BNNI runs in $O((n^2 + pn)\,\mathrm{diam}(T))$ and outputs
    trees comparable to (or better than) those of FITCH,
    Weighbor, BioNJ, or NJ.
  • FastME is faster than NJ or its variants.
  • BNNI consistently improved output trees in all
    settings, even when WLS/Fitch trees were input.
  • BNNI outputs trees without negative branch
    lengths.
  • FastME software is available at
    http://www.ncbi.nlm.nih.gov/CBBResearch/Desper/FastME.html or
    http://www.lirmm.fr/w3ifa/MAAS/.