Algorithmic research in phylogeny reconstruction

1
Algorithmic research in phylogeny reconstruction
  • Tandy Warnow
  • The University of Texas at Austin

2
Phylogeny
From the Tree of Life website, University of Arizona
[Figure: phylogenetic tree with leaves Orangutan, Human, Gorilla, and Chimpanzee]
3
Reconstructing the Tree of Life
Handling large datasets: millions of species. NSF
funds many projects towards this goal under the
Assembling the Tree of Life (ATOL) program.
4
Current projects
  • Heuristics for NP-hard optimization problems for
    phylogeny reconstruction
  • Phylogenetic multiple sequence alignment
  • Detecting and reconstructing horizontal gene
    transfer and hybridization
  • Constructing phylogenies on languages
  • Graph theory, combinatorial optimization, and
    probabilistic analysis are fundamental to
    algorithm development in this area, but all
    methods are also extensively tested in simulation
    and on real data. Collaborations with biologists
    or linguists are essential.

5
DNA Sequence Evolution
6
Phylogeny Problem
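[Figure: the input is a set of DNA sequences (TAGCCCA, TAGACTT, TGCACAA, TGCGCTT, AGGGCAT) for the taxa U, V, W, X, Y; the output is a tree whose leaves are X, U, Y, V, W]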
7
Solving NP-hard problems exactly is unlikely
Leaves   Trees
4        3
5        15
6        105
7        945
8        10395
9        135135
10       2027025
20       2.2 x 10^20
100      1.7 x 10^182
1000     1.9 x 10^2860
  • Number of (unrooted) binary trees on n leaves is
    (2n-5)!!
  • If each tree on 1000 taxa could be analyzed in
    0.001 seconds, an exhaustive search for the best
    tree would take about 6 x 10^2846 millennia
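
The growth of (2n-5)!! can be checked directly, since Python integers have arbitrary precision. The following is a minimal sketch, not part of the original slides; the function names and the conversion of 0.001 seconds per tree into millennia (using a 365.25-day year) are illustrative assumptions.

```python
import math

# Assumption for illustration: one millennium = 1000 years of 365.25 days.
SECONDS_PER_MILLENNIUM = 1000 * 365.25 * 24 * 3600


def num_unrooted_binary_trees(n):
    """Return (2n-5)!!, the number of unrooted binary trees on n labeled leaves."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (the product is 1 when n == 3).
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


def log10_search_millennia(n, seconds_per_tree=0.001):
    """Return log10 of the millennia needed to score every tree on n leaves."""
    count = num_unrooted_binary_trees(n)
    # math.log10 accepts arbitrarily large Python integers.
    return (math.log10(count)
            + math.log10(seconds_per_tree)
            - math.log10(SECONDS_PER_MILLENNIUM))


for n in (4, 10, 20, 100, 1000):
    trees = num_unrooted_binary_trees(n)
    print(f"n={n}: about 10^{math.log10(trees):.1f} trees, "
          f"about 10^{log10_search_millennia(n):.1f} millennia to enumerate")
```

Working in log10 avoids converting the very large counts to floating point, which would overflow for n = 1000.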

8
Approaches for solving hard optimization
problems (like maximum parsimony)
  1. Hill-climbing heuristics (which can get stuck in
    local optima)
  2. Randomized algorithms for getting out of local
    optima
  3. Approximation algorithms (give bounds on what is
    possible)
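
All of these approaches need a way to score a candidate tree. For maximum parsimony, the score of a fixed tree is the minimum number of substitutions needed to explain the sequences at its leaves, which Fitch's algorithm computes in one bottom-up pass. The sketch below is illustrative only: the nested-tuple tree encoding, the function names, and the toy sequences are assumptions, not taken from the slides.

```python
def fitch_site(tree, states):
    """Fitch's algorithm for one site: return (candidate state set, changes)."""
    if isinstance(tree, str):            # a leaf is a taxon name
        return {states[tree]}, 0
    left, right = tree                   # an internal node is a pair of subtrees
    left_set, left_changes = fitch_site(left, states)
    right_set, right_changes = fitch_site(right, states)
    common = left_set & right_set
    if common:                           # children agree: no extra change
        return common, left_changes + right_changes
    # Children disagree: take the union and count one substitution.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, seqs):
    """Sum the per-site Fitch counts over all aligned sites."""
    length = len(next(iter(seqs.values())))
    return sum(
        fitch_site(tree, {taxon: seq[i] for taxon, seq in seqs.items()})[1]
        for i in range(length)
    )


# Hypothetical toy alignment on four taxa.
toy_seqs = {"a": "AA", "b": "AA", "c": "TT", "d": "TT"}

# The three possible unrooted topologies on four leaves, each rooted arbitrarily.
candidates = [
    (("a", "b"), ("c", "d")),
    (("a", "c"), ("b", "d")),
    (("a", "d"), ("b", "c")),
]
best = min(candidates, key=lambda t: parsimony_score(t, toy_seqs))
print(best, parsimony_score(best, toy_seqs))
```

With four taxa the search space can be enumerated as above; a hill-climbing heuristic would instead call `parsimony_score` on neighbors of the current tree and keep any improvement.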

9
Problems with current techniques for MP
Shown here is the performance of a heuristic
maximum parsimony analysis on a real dataset of
almost 14,000 sequences. (Optimal here means
best score to date, using any method for any
amount of time.) Acceptable error is below 0.01.
[Figure: performance of TNT with time]
10
Performance of NJ (neighbor joining), a popular
polynomial-time method (Nakhleh et al., ISMB 2001)
  • Simulation study based upon fixed edge lengths,
    K2P model of evolution, sequence lengths fixed to
    1000 nucleotides.
  • Error rates reflect proportion of incorrect edges
    in inferred trees.
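
One common way to measure the proportion of incorrect edges is to compare the bipartitions (splits of the leaf set) induced by the internal edges of the two trees. The sketch below assumes that reading; it is not necessarily the exact measure used in the study. The nested-tuple tree encoding and all function names are hypothetical.

```python
def leaf_set(tree):
    """Return the set of taxon names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the nontrivial bipartitions of a tree, ignoring the root position."""
    all_leaves = leaf_set(tree)
    ref = min(all_leaves)                # fixed reference taxon for normalization
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        # Store the side of the split that does not contain the reference taxon.
        side = below if ref not in below else all_leaves - below
        if 2 <= len(side) <= len(all_leaves) - 2:
            splits.add(side)
        return below

    visit(tree)
    return splits


def edge_error_rate(true_tree, inferred_tree):
    """Fraction of internal edges of the inferred tree absent from the true tree."""
    true_splits = bipartitions(true_tree)
    inferred_splits = bipartitions(inferred_tree)
    if not inferred_splits:
        return 0.0
    return len(inferred_splits - true_splits) / len(inferred_splits)


true_tree = ((("a", "b"), "c"), ("d", "e"))
inferred_tree = ((("a", "c"), "b"), ("d", "e"))
print(edge_error_rate(true_tree, inferred_tree))
```

For two fully resolved (binary) trees on the same taxa, this fraction is the same whichever tree is used as the denominator, because both have the same number of internal edges.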

[Figure: error rate of NJ (0 to 0.8) versus number of taxa (0 to 1600)]
11
DCMs (Disk-Covering Methods)
  • DCMs for polynomial-time methods improve
    topological accuracy (empirical observation) and
    have provable theoretical guarantees under Markov
    models of evolution
  • DCMs for hard optimization problems reduce the
    running time needed to achieve good levels of
    accuracy (empirical observation)

12
DCMs: Divide-and-conquer for improving phylogeny
reconstruction
13
Boosting phylogeny reconstruction methods
  • DCMs boost the performance of phylogeny
    reconstruction methods.

[Diagram: DCM + Base method M -> DCM-M]
14
Iterative-DCM3
[Diagram: T -> DCM3 -> Base method -> T, repeated iteratively]
15
Rec-I-DCM3 significantly improves performance
[Figure: comparison of TNT to Rec-I-DCM3(TNT) on one large
dataset. Curves: current best techniques; DCM-boosted
version of best techniques]
16
DCM1-boosting distance-based methods (Nakhleh et
al., ISMB 2001)
  • DCM1-boosting makes distance-based methods more
    accurate
  • Theoretical guarantees that DCM1-NJ converges to
    the true tree from polynomial-length sequences
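
Distance-based methods such as NJ take a matrix of pairwise distances between sequences as input. Under the K2P (Kimura two-parameter) model, the standard corrected distance is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the observed proportions of transition and transversion differences. The sketch below computes it for two aligned sequences; the function name and the choice to return infinity for saturated pairs are illustrative assumptions.

```python
import math

# Transitions are purine<->purine (A, G) or pyrimidine<->pyrimidine (C, T) changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq1, seq2):
    """Return the K2P-corrected distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned and non-empty")
    n = len(seq1)
    pairs = list(zip(seq1, seq2))
    # P: proportion of sites that differ by a transition.
    p = sum(pair in TRANSITIONS for pair in pairs) / n
    # Q: proportion of sites that differ by a transversion.
    q = sum(a != b and (a, b) not in TRANSITIONS for a, b in pairs) / n
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        # Saturated pair: the correction is undefined (assumed convention here).
        return math.inf
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

Saturated pairs like these become more likely as the evolutionary distance between two sequences grows, which is one reason long sequences are needed for accurate distance estimates.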

[Figure: error rate of NJ and DCM1-NJ (0 to 0.8) versus number of taxa (0 to 1600)]
17
General comments
  • Everything in phylogeny (just about) is NP-hard.
  • Graph theory, probability, and optimization are
    the basic tools for algorithmic advances.
  • Algorithms are tested on both real and simulated
    data.
  • Collaborations with domain experts (biologists or
    linguists) are essential to success. (At UT, we
    have wonderful biologists to work with, and all
    my students collaborate with them.)

18
For more information
  • Send me email to make an appointment
  • Check my webpage for tutorials on the subject
  • See http://www.phylo.org and
    http://www.cs.utexas.edu/tandy for more info