Title: Reconstruction of Phylogenetic Trees with Very Short Branches
1 Reconstruction of Phylogenetic Trees with Very Short Branches
- Ilan Gronau
- Technion - Israel Institute of Technology
- Haifa, Israel
Joint work with Shlomo Moran, Sagi Snir, Hilary Finucane
2 Phylogenetic Reconstruction
Main objective: reconstruct the topology of the real tree as accurately as possible from short sequences.
[Figure: n input sequences (A, B, C, ..., X), each of length k (sites 1 .. k), from which the tree is reconstructed.]
3 Adaptive Fast Convergence
- Classic approach of fast convergence: minimize the sequence length required for correct reconstruction of the entire tree.
- Trees with short edges require very long sequences.
- We would like to guarantee correct reconstruction of all sufficiently long edges (those which can be reconstructed from k-long sequences).
This requirement is adaptive fast convergence.
4 Seeking Adaptive Fast Converging Algs
- The obvious candidates: fast converging algorithms.
- Most FC algorithms try to resolve the topology completely.
- Errors in short edges propagate to longer edges.
- Forest reconstruction algorithms (Daskalakis et al. 2006, Mossel 2007)
- Return a collection of trees.
- Detect low-evidence (short) edges.
- May not reach all sufficiently long edges.
- A less-likely candidate: Buneman's algorithm (Buneman 1971).
- Has edge-reconstruction guarantees (Atteson 1999).
- ... only for very long edges: w > 2‖D, D_T‖∞ (the largest entrywise gap between the input distances D and the tree metric D_T).
- Not fast converging.
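To make the Buneman criterion concrete, here is a minimal Python sketch of the Buneman index of a single split, computed from a distance matrix via the four-point comparison. Buneman's construction keeps a split when this index is positive. The function name `buneman_index` and the matrix layout (a symmetric nested list with zero diagonal) are assumptions made for illustration; this brute-force version is not the algorithm presented in this talk.

```python
from itertools import product


def buneman_index(D, side_a, side_b):
    """Buneman index of the split side_a | side_b under distances D.

    D is assumed to be a symmetric matrix (nested lists) with zero diagonal.
    The index is the minimum, over all a1, a2 in side_a and b1, b2 in side_b,
    of half the gap between the smaller "crossing" pair-sum and the
    "within-side" pair-sum.
    """
    best = float("inf")
    for a1, a2 in product(side_a, repeat=2):
        for b1, b2 in product(side_b, repeat=2):
            cross = min(D[a1][b1] + D[a2][b2], D[a1][b2] + D[a2][b1])
            within = D[a1][a2] + D[b1][b2]
            best = min(best, 0.5 * (cross - within))
    return best


# Quartet tree ((0,1),(2,3)): pendant edges of weight 1, internal edge 0.5.
D = [
    [0.0, 2.0, 2.5, 2.5],
    [2.0, 0.0, 2.5, 2.5],
    [2.5, 2.5, 0.0, 2.0],
    [2.5, 2.5, 2.0, 0.0],
]
true_split_index = buneman_index(D, [0, 1], [2, 3])
wrong_split_index = buneman_index(D, [0, 2], [1, 3])
```

On exact tree distances the index of a true split equals the weight of its edge, while a split that conflicts with the tree gets a negative index. With noisy distances a short edge can lose its positive index, which is why the guarantee covers only long edges.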
5 Our Adaptive Fast Converging Algorithm
NEW: Adaptive Fast Converging Incremental Reconstruction Algorithm
O(n^2) time complexity
- Zero false positives: when risk is low, T contains no faulty splits.
- False negative guarantee: upper bound on weight of contracted edges.
6 Experimental Results
- Simulated data
- 96-taxon model trees taken from the Methods and Algorithms in Bioinformatics (MAB) lab, LIRMM. http://www.lirmm.fr/guindon/simul/
- Simulation of DNA sequence evolution according to Jukes-Cantor via SeqGen.
- Input distances calculated via the Jukes-Cantor formula.
- Compare reconstructed tree to real tree
- - False positives
- - False negatives
- - RF-distance (FP + FN)
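The two computations named on this slide can be sketched in a few lines of Python: the standard Jukes-Cantor distance correction, d = -(3/4) ln(1 - (4/3) p) for a fraction p of differing sites, and the false-positive / false-negative / RF counts over sets of splits. The function names and the representation of a split as the set of taxa on one side are assumptions made for illustration, not the code used in the experiments.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    p = sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def normalize_split(side, taxa):
    """Represent a split as an unordered pair {side, complement}."""
    side = frozenset(side)
    return frozenset((side, frozenset(taxa) - side))


def compare_splits(reconstructed, real, taxa):
    """Return (false positives, false negatives, RF distance)."""
    rec = {normalize_split(s, taxa) for s in reconstructed}
    true = {normalize_split(s, taxa) for s in real}
    fp = len(rec - true)  # reconstructed splits absent from the real tree
    fn = len(true - rec)  # real splits missing from the reconstruction
    return fp, fn, fp + fn


taxa = range(5)
real_splits = [{0, 1}, {0, 1, 2}]
rec_splits = [{2, 3, 4}, {0, 2}]  # {2,3,4} is the same split as {0,1}
result = compare_splits(rec_splits, real_splits, taxa)
```

Normalizing each split to an unordered pair of sides keeps a split from being counted as an error merely because it was written from the other side.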
Thanks to Hilary Finucane
7 Experimental Results
- Suggested strategy
- Gradually increase risk.
- Accumulate only edges consistent with previous splits.
- When to stop?
- - all-the-way
- - 1st inconsistency
- - 1st major inconsistency
[Plot: edges against risk, for n = 96, k = 300.]
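The accumulation strategy from this slide can be sketched as follows, assuming a hypothetical black-box callable `reconstruct(risk)` that returns the splits found at a given risk level (each split given as the set of taxa on one side). Two splits are taken to be consistent when at least one of the four pairwise intersections of their sides is empty. Only the "all-the-way" and "1st inconsistency" stopping rules are covered; "1st major inconsistency" is not defined on the slides and is left out.

```python
def canonical(split, taxa):
    """Pick the side containing the smallest taxon as the representative."""
    side = frozenset(split)
    return side if min(taxa) in side else frozenset(taxa) - side


def compatible(split_a, split_b, taxa):
    """Two splits fit on one tree iff some pair of their sides is disjoint."""
    all_taxa = frozenset(taxa)
    a_sides = (frozenset(split_a), all_taxa - frozenset(split_a))
    b_sides = (frozenset(split_b), all_taxa - frozenset(split_b))
    return any(not (x & y) for x in a_sides for y in b_sides)


def accumulate_splits(reconstruct, risks, taxa, stop_at_first_inconsistency=False):
    """Run the black box at increasing risk, keeping only consistent splits.

    `reconstruct` is a hypothetical callable standing in for the basic
    reconstruction algorithm; it is not defined in the original slides.
    """
    taxa = list(taxa)
    accepted = []
    for risk in sorted(risks):
        for split in reconstruct(risk):
            split = canonical(split, taxa)
            if split in accepted:
                continue  # already known from a lower risk level
            if all(compatible(split, old, taxa) for old in accepted):
                accepted.append(split)
            elif stop_at_first_inconsistency:
                return accepted
    return accepted


# Toy black box over six taxa; the split {1,2} conflicts with {0,1}.
toy_output = {0.1: [{0, 1}], 0.5: [{1, 2}, {0, 1, 2}], 0.9: [{4, 5}]}
all_the_way = accumulate_splits(toy_output.get, toy_output, range(6))
first_incons = accumulate_splits(toy_output.get, toy_output, range(6), True)
```

In this toy run, going all the way skips the conflicting split and keeps three splits, while stopping at the first inconsistency returns only the split accepted before the conflict appeared.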
8 Experimental Results
Average results over 100 trees (n = 96, k = 300)
- most reliable
- closest to real tree (4 inconsistent edges)
- most resolved
9 To sum up
- Adaptive Fast Convergence
- Stronger (more natural) requirement than classic fast convergence.
- Adaptive fast converging algorithm (details omitted).
- Application of algorithm
- Accumulating consistent edges while increasing risk factor.
- Different strategies
- Future work
- Optimizing basic black-box algorithm.
- Optimizing execution strategies.
- Using reliable partial reconstruction to deal with short edges.