Algorithmic research in phylogeny reconstruction

Transcript and Presenter's Notes

Title: Algorithmic research in phylogeny reconstruction

1
Algorithmic research in phylogeny reconstruction

Tandy Warnow
The University of Texas at Austin

2
Phylogeny
From the Tree of Life Website, University of
Arizona
(Figure: tree on Orangutan, Human, Gorilla, and Chimpanzee.)
3
Reconstructing the Tree of Life
Handling large datasets: millions of species. NSF
funds many projects towards this goal, under the
Assembling the Tree of Life (ATOL) program.
4
Current projects

Heuristics for NP-hard optimization problems for
phylogeny reconstruction
Phylogenetic multiple sequence alignment
Detecting and reconstructing horizontal gene
transfer and hybridization
Constructing phylogenies on languages

Graph theory, combinatorial optimization, and
probabilistic analysis are fundamental to
algorithm development in this area. But all
methods are also extensively tested in simulation
and on real data. Collaborations with
biologists or linguists are essential.

5
DNA Sequence Evolution
6
Phylogeny Problem
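Input: five DNA sequences labeled U, V, W, X, Y
(TAGCCCA, TAGACTT, TGCACAA, TGCGCTT, AGGGCAT, as
listed on the slide).
Output: a tree with leaves X, U, Y, V, W.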
7
Solving NP-hard problems exactly is unlikely
No. leaves    No. trees
4             3
5             15
6             105
7             945
8             10395
9             135135
10            2027025
20            2.2 x 10^20
100           4.5 x 10^190
1000          2.7 x 10^2900

Number of (unrooted) binary trees on n leaves is
(2n-5)!!
If each tree on 1000 taxa could be analyzed in
0.001 seconds, we would find the best tree in
2890 millennia
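
The growth described above can be reproduced directly from the (2n-5)!! formula. The following is a minimal sketch in Python; the function name is hypothetical and not part of the slides. Python integers have arbitrary precision, so the same function can be used to check the order of magnitude for larger n.

```python
def num_unrooted_binary_trees(n: int) -> int:
    """Return (2n-5)!!, the number of unrooted binary trees on n labeled leaves."""
    if n < 3:
        raise ValueError("the formula applies for n >= 3")
    count = 1
    # Multiply the odd factors 3, 5, ..., 2n-5 (empty product for n = 3).
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 5, 6, 7, 8, 9, 10):
        print(n, num_unrooted_binary_trees(n))
```

The count multiplies by 2n-5 each time a leaf is added, which is why exhaustive enumeration stops being practical after a few dozen taxa.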

8
Approaches for solving hard optimization
problems (like maximum parsimony)

Hill-climbing heuristics (which can get stuck in
local optima)
Randomized algorithms for getting out of local
optima
Approximation algorithms (give bounds on what is
possible)
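
All of these approaches need a way to score a candidate tree. As an illustration only, here is a minimal sketch of a maximum parsimony score, assuming Fitch's small-parsimony algorithm (the slides do not name a scoring procedure). The tree encoding (nested tuples of leaf names), the function names, the example topology, and the pairing of labels to sequences are all assumptions made for this sketch.

```python
def _fitch_site(node, seqs, site):
    """Return (candidate state set, number of changes) for one alignment column."""
    if isinstance(node, str):  # leaf: its observed character, no changes yet
        return {seqs[node][site]}, 0
    left, right = node
    left_states, left_cost = _fitch_site(left, seqs, site)
    right_states, right_cost = _fitch_site(right, seqs, site)
    common = left_states & right_states
    if common:  # children can agree: no extra change needed
        return common, left_cost + right_cost
    # children disagree: take the union and count one change
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, seqs):
    """Total number of changes over all sites for a rooted binary tree."""
    length = len(next(iter(seqs.values())))
    return sum(_fitch_site(tree, seqs, site)[1] for site in range(length))


if __name__ == "__main__":
    # Hypothetical label-to-sequence pairing and hypothetical topology.
    seqs = {
        "U": "AGGGCAT",
        "V": "TAGCCCA",
        "W": "TAGACTT",
        "X": "TGCACAA",
        "Y": "TGCGCTT",
    }
    tree = (("U", "X"), ("Y", ("V", "W")))
    print(parsimony_score(tree, seqs))
```

A hill-climbing heuristic would call a scorer like this on each neighboring tree and keep a move only when the score decreases, which is how it can end up stuck in a local optimum.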

9
Problems with current techniques for MP
Shown here is the performance of a heuristic
maximum parsimony analysis on a real dataset of
almost 14,000 sequences. (Optimal here means
best score to date, using any method for any
amount of time.) Acceptable error is below 0.01.
Performance of TNT with time
10
Performance of NJ, a popular polynomial-time
method (Nakhleh et al., ISMB 2001)

Simulation study based upon fixed edge lengths,
K2P model of evolution, sequence lengths fixed to
1000 nucleotides.
Error rates reflect proportion of incorrect edges
in inferred trees.
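
Distance-based methods such as NJ take a matrix of pairwise distances as input. The slides do not say how distances were computed in this study; as a sketch under that assumption, the standard K2P (Kimura two-parameter) corrected distance between two aligned sequences is d = -(1/2) ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the observed proportions of transitions and transversions. The function name below is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
NUCLEOTIDES = PURINES | PYRIMIDINES


def k2p_distance(s1: str, s2: str) -> float:
    """K2P-corrected distance between two aligned, gap-free DNA sequences."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = transversions = 0
    for a, b in zip(s1, s2):
        if a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            raise ValueError("only A, C, G, T are supported in this sketch")
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1    # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    p = transitions / len(s1)
    q = transversions / len(s1)
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent: K2P distance undefined")
    return -0.5 * math.log(term1 * math.sqrt(term2))


if __name__ == "__main__":
    print(k2p_distance("TAGCCCA", "TAGACTT"))
```

The correction is undefined when the sequences are too divergent, which is one reason distance estimates from short sequences can be unreliable on trees with long paths.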

(Figure: error rate of NJ, from 0 to 0.8, plotted
against No. Taxa, from 0 to 1600.)
11
DCMs (Disk-Covering Methods)

DCMs for polynomial-time methods improve
topological accuracy (empirical observation), and
have provable theoretical guarantees under Markov
models of evolution
DCMs for hard optimization problems reduce
running time needed to achieve good levels of
accuracy (empirical observation)

12
DCMs: Divide-and-conquer for improving phylogeny
reconstruction
13
Boosting phylogeny reconstruction methods

DCMs boost the performance of phylogeny
reconstruction methods.

(Diagram: DCM combined with a base method M gives
DCM-M.)
14
Iterative-DCM3
(Diagram: a cycle from a tree T through DCM3 and
the base method back to a tree T.)
15
Rec-I-DCM3 significantly improves performance
(Figure: comparison of TNT to Rec-I-DCM3(TNT) on
one large dataset. Legend: current best
techniques; DCM-boosted version of best
techniques.)
16
DCM1-boosting distance-based methods (Nakhleh et
al., ISMB 2001)

DCM1-boosting makes distance-based methods more
accurate
Theoretical guarantees that DCM1-NJ converges to
the true tree from polynomial length sequences

(Figure: error rates of NJ and DCM1-NJ, from 0 to
0.8, plotted against No. Taxa, from 0 to 1600.)
17
General comments

Everything in phylogeny (just about) is NP-hard
Graph theory, probability, and optimization are
the basic tools for algorithmic advances
Algorithms are tested on both real and simulated
data.
Collaborations with domain experts (biologists or
linguists) are essential to success. (At UT, we
have wonderful biologists to work with, and all
my students collaborate with them.)

18
For more information