Parallel Recursive-Iterative Disk Covering Method for Reconstruction of Large Phylogenetic Trees

1
Parallel Recursive-Iterative Disk Covering Method
for Reconstruction of Large Phylogenetic Trees
  • Cristian Coarfa and Yuri Dotsenko

Comp 571 Presentation
2
Outline
  • Background
  • Recursive-iterative DCM3 method
  • Parallel recursive-iterative DCM3 method
  • Research issues
  • Proposed experiments
  • Status and future work

3
Phylogeny Example
4
Phylogeny reconstruction methods
  • Polynomial-time methods
      • Neighbor-Joining (NJ)
      • UPGMA (Unweighted Pair Group Method with
        Arithmetic mean)
  • Maximum Parsimony (MP)
  • Maximum Likelihood (ML)
  • Bayesian inference of phylogeny

5
Maximum Parsimony Overview
  • Input: a multiple alignment S of n sequences
  • Output: a tree T with n leaves, each labeled by
    a unique sequence from S, with internal nodes
    labeled by sequences, such that the parsimony
    score is minimized
  • Edge length: Hamming distance between the
    sequences at its endpoints; the parsimony score
    of T is the sum of its edge lengths
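For a fixed topology, the best internal labeling can be found one alignment column at a time, because Hamming distance counts mismatches site by site. The sketch below uses Fitch's small-parsimony algorithm (a standard technique; the slides do not say which scoring routine PAUP or TNT use internally) on a rooted binary tree written as nested tuples. `fitch_score` and `parsimony_score` are hypothetical helper names; an unrooted tree can be rooted on any edge without changing the score.

```python
def fitch_score(tree, states):
    """Fitch's algorithm for one character on a rooted binary tree.

    tree: a leaf name (str) or a pair (left, right) of subtrees.
    states: dict mapping each leaf name to its character state.
    Returns (candidate states at this node, minimum changes in this subtree).
    """
    if isinstance(tree, str):          # leaf: its state is fixed, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_score(left, states)
    right_set, right_cost = fitch_score(right, states)
    common = left_set & right_set
    if common:                         # children can agree: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the per-site Fitch scores over an alignment {leaf: sequence}."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_score(tree, {leaf: seq[i] for leaf, seq in alignment.items()})[1]
        for i in range(length)
    )
```

With the tree `(("a", "b"), ("c", "d"))` and sequences a = AC, b = AC, c = GC, d = GT, each of the two columns needs exactly one change, so `parsimony_score` returns 2.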

6
Example
8
Challenges for phylogenetic tree reconstruction
  • (2n-5)!! possible unrooted binary trees, where n
    is the number of taxa
  • MP and ML are NP-hard
  • need to work well on tens to hundreds of taxa
      • few thousand for MP, few hundred for ML
  • the Tree of Life has tens to hundreds of
    millions of taxa
  • high accuracy required
      • 0.01% error rate, i.e., 99.99% accuracy
  • polynomial-time methods (NJ, UPGMA) yield
    lower-accuracy trees
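To make the first bullet concrete, the number of candidate trees can be computed directly. `num_unrooted_trees` is an illustrative helper, not part of any tool named in these slides; it multiplies the odd numbers 3, 5, ..., 2n-5.

```python
def num_unrooted_trees(n):
    """Number of unrooted binary trees on n >= 3 labeled leaves: (2n-5)!!."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n - 4, 2):   # 3 * 5 * ... * (2n-5)
        count *= k
    return count
```

Already at 10 taxa this gives 2,027,025 trees, which is why exhaustive search is out of the question for the dataset sizes discussed here.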

9
Iterative Improvement Methods
  • Find an initial tree
  • Apply local search repeatedly to find trees with
    better scores
      • TBR (Tree Bisection and Reconnection)
  • PAUP
  • TNT: genetic algorithms, simulated annealing,
    divide-and-conquer
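The loop shared by these methods can be sketched as generic hill climbing. This is a minimal sketch, not PAUP's or TNT's actual search: `iterative_improvement` is a hypothetical name, and the `neighbors` callback stands in for a move generator such as TBR, which is not implemented here.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def iterative_improvement(start: T,
                          neighbors: Callable[[T], Iterable[T]],
                          score: Callable[[T], float],
                          max_rounds: int = 1000) -> T:
    """Move to a better-scoring neighbor until none exists (lower is better).

    For MP, `start` would be an initial tree, `neighbors` a TBR move
    generator, and `score` the parsimony score.
    """
    current, current_score = start, score(start)
    for _ in range(max_rounds):
        improved = False
        for candidate in neighbors(current):
            candidate_score = score(candidate)
            if candidate_score < current_score:
                # First-improvement strategy: accept the first better neighbor.
                current, current_score = candidate, candidate_score
                improved = True
                break
        if not improved:               # local optimum reached
            break
    return current
```

On a toy problem, `iterative_improvement(10, lambda x: [x - 1, x + 1], lambda x: (x - 3) ** 2)` walks down to 3. Real tree searches add restarts, ratchets, or annealing because a single hill climb stops at the first local optimum.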

10
TBR
11
Disk-Covering Methods (DCM)
  • decompose the dataset
  • solve the subproblems
  • merge the subproblem solutions (subtrees)
  • refine the resulting tree
  • variants of DCM use different decomposition
    methods

12
DCM variants
  • DCM1
      • produces overlapping clusters of taxa
      • attempts to minimize the intracluster diameter
      • good subproblems, but often a poor
        decomposition structure
  • DCM2
      • computes a fixed-structure graph separator
      • resulting subproblems are too large
  • both methods are distance-based

13
DCM3
  • uses a dynamically updated guide tree to direct
    the decomposition
  • the guide tree is the current estimate of the
    phylogeny
  • enables focusing on the best part of the search
    space
  • smaller subproblems than DCM2
  • faster decomposition than DCM1 and DCM2 by not
    insisting on optimality of subproblems

14
Optimal DCM3 Decomposition
  • use a graph separator X that induces the guide
    tree partition $C_1, \dots, C_m$
  • find X that minimizes $\max_{1 \le i \le m} |X \cup C_i|$
  • the optimal DCM3 decomposition is formed of the
    subsets $X \cup C_i$, for $i = 1, \dots, m$
  • Theorem: the optimal DCM3 decomposition can be
    computed in $O(n^3)$ time

15
Fast Suboptimal Decomposition
  • Find a centroid edge e in T that produces the most
    balanced bipartition of the leaves
  • X is the short subtree around the centroid edge e
  • In all the experiments, X was also a graph
    separator for T
  • Decomposition takes $O(n^2)$ time in practice
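The first step, finding the centroid edge, takes one pass over the guide tree once leaf counts per subtree are known. A minimal sketch, assuming the tree is given as a list of undirected edges plus the set of leaf labels; `centroid_edge` is a hypothetical name, and building the short subtree X around the edge is not shown.

```python
from collections import defaultdict


def centroid_edge(edges, leaves):
    """Return the edge (u, v) whose removal gives the most balanced leaf split."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Root the tree at an arbitrary node and record a DFS pre-order.
    root = edges[0][0]
    parent = {root: None}
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        for nbr in adj[node]:
            if nbr not in parent:
                parent[nbr] = node
                stack.append(nbr)

    # Leaves below each node; reversed pre-order visits children before parents.
    below = {}
    for node in reversed(order):
        below[node] = (1 if node in leaves else 0) + sum(
            below[c] for c in adj[node] if parent[c] == node)

    # Each non-root node v names the edge (parent[v], v); removing it splits the
    # leaves into below[v] and total - below[v]. Minimize the larger side.
    total = len(leaves)
    best = min((v for v in order if parent[v] is not None),
               key=lambda v: max(below[v], total - below[v]))
    return parent[best], best
```

For the quartet with internal nodes x and y (leaves a, b on x and c, d on y), every pendant edge splits the leaves 1/3, so the internal edge (x, y) with its 2/2 split is returned.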

16
DCM3
  • The subproblems are further decomposed until they
    are small enough to be solved directly by the
    base tool
  • Resulting subtrees are merged using Strict
    Consensus Merger
  • Theoretical results prove that the resulting tree
    is accurate

17
Rec-I-DCM3
  • Input
      • $S = \{S_1, \dots, S_n\}$: aligned biomolecular
        sequences
      • chosen base method (TNT, PAUP)
      • starting tree T
  • Algorithm
      • recursively decompose the dataset until each
        subproblem has at most k taxa
      • compute, merge and resolve the subtrees
      • repeat for a specified number of iterations,
        using the resulting tree as the next guide tree
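The control flow of Rec-I-DCM3 can be sketched with the decomposition, base method, merger, and refinement passed in as callbacks. All names here are hypothetical placeholders: `decompose` stands for DCM3, `solve` for a TNT or PAUP run, `merge` for the Strict Consensus Merger, and `refine` for the global search; none of them is implemented. The sketch assumes `decompose` always returns strictly smaller subsets, otherwise the recursion would not terminate.

```python
def rec_dcm3(taxa, guide_tree, k, decompose, solve, merge):
    """One recursive DCM3 pass: split until each subproblem has <= k taxa."""
    if len(taxa) <= k:
        return solve(taxa)                        # base method on a small subproblem
    subsets = decompose(taxa, guide_tree)         # DCM3 decomposition (assumed smaller)
    subtrees = [rec_dcm3(s, guide_tree, k, decompose, solve, merge)
                for s in subsets]
    return merge(subtrees)                        # e.g. Strict Consensus Merger


def rec_i_dcm3(taxa, start_tree, k, iterations, decompose, solve, merge, refine):
    """Repeat decompose/solve/merge/refine, reusing each result as the guide tree."""
    tree = start_tree
    for _ in range(iterations):
        merged = rec_dcm3(taxa, tree, k, decompose, solve, merge)
        tree = refine(merged)                     # global search seeded by the merge
    return tree
```

With a toy `decompose` that halves a list of 10 taxa and k = 3, the leaf subproblems are {0, 1}, {2, 3, 4}, {5, 6}, and {7, 8, 9}, which is the kind of subproblem tree the walkthrough below distributes across processors.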

18
Experimental evaluation
  • 10 datasets with 1,322-13,921 sequences
  • used TNT as the base method (better results than
    PAUP)
  • ran five trials starting from different trees

19
Comparison of DCM decompositions
Mean subproblem size
Number of subproblems
20
Average Deviation From Best Score
21
Average MP scores of TNT and Rec-I-DCM3(TNT)
Dataset 10 (13,921 taxa)
23
Parallel Recursive-Iterative DCM3 (pRec-I-DCM3) Method
  • for iteration = 1, ..., num_iterations
      • decompose the initial problem
      • further decompose subproblems in parallel
      • solve leaf subproblems in parallel
      • merge subproblems
      • refine the resulting tree
  • end
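The loop above can be sketched with a thread pool standing in for the slave nodes; threads are a reasonable stand-in only on the assumption that the base method runs as an external process (the actual system uses MPI). This is coarser than the dynamic scheme in the walkthrough below: here each worker handles one top-level subproblem end to end instead of handing sub-subproblems back to the master. All names are hypothetical and the callbacks are placeholders, as in a sequential Rec-I-DCM3 sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def solve_subtree(problem, guide_tree, k, decompose, solve, merge):
    """Worker side: decompose until pieces have <= k taxa, solve, merge upward."""
    if len(problem) <= k:
        return solve(problem)                     # base method on a leaf subproblem
    return merge([solve_subtree(p, guide_tree, k, decompose, solve, merge)
                  for p in decompose(problem, guide_tree)])


def p_rec_i_dcm3(taxa, start_tree, k, iterations,
                 decompose, solve, merge, refine, workers=4):
    """Master side of a coarse-grained pRec-I-DCM3 sketch."""
    tree = start_tree
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iterations):
            top = decompose(taxa, tree)           # master: initial decomposition
            futures = [pool.submit(solve_subtree, p, tree, k,
                                   decompose, solve, merge)
                       for p in top]              # workers: subproblems in parallel
            subtrees = [f.result() for f in futures]
            tree = refine(merge(subtrees))        # master: final merge + global search
    return tree
```

The coarse split keeps communication minimal but can leave workers idle when top-level subproblems differ in size, which is one reason to prefer the finer-grained task queue described in the overview.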

24
pRec-I-DCM3 Overview
  • Task parallelism
  • Data fits into memory
  • Master-slave approach
      • the master node is dedicated to
          • storing control information and distributing
            the work
          • performing the initial problem decomposition
          • performing the final merge of subproblems
      • slave nodes wait for composite or leaf
        subproblems from the master, decompose or solve
        (PAUP) them, and return results to the master

25
pRec-I-DCM3 Illustrated
  • Master node: initialize
  • P1, P2, P3, P4: initialize
  • Loading guide tree and set of taxa
[Figure: master database with pending queue, active queue, and solved list]

26
pRec-I-DCM3 Illustrated
  • Master node: decompose 1
  • P1, P2, P3, P4: idle
[Figure: problem 1 split into subproblems 2, 3, 4; master database with pending queue, active queue, and solved list]

27
pRec-I-DCM3 Illustrated
  • Master node: delegate 3, 4, 2 (MPI Send)
  • P1: receive 3
  • P2: receive 2
  • P3: receive 4
  • P4: idle
[Figure: subproblem tree; master database with pending queue, active queue, and solved list]

28
pRec-I-DCM3 Illustrated
  • Master node: idle
  • P1: decompose 3
  • P2: solve 2
  • P3: solve 4
  • P4: idle
[Figure: subproblem 3 split into subproblems 5 and 6; master database with pending queue, active queue, and solved list]

29
pRec-I-DCM3 Illustrated
  • Master node: receive
  • P1: delegate 6
  • P2: solve 2
  • P3: solve 4
  • P4: idle
[Figure: subproblem tree; master database with pending queue, active queue, and solved list]

30
pRec-I-DCM3 Illustrated
  • Master node: delegate 6
  • P1: decompose 5
  • P2: solve 2
  • P3: solve 4
  • P4: receive 6
[Figure: subproblem 5 split into subproblems 7 and 8; master database with pending queue, active queue, and solved list]

31
pRec-I-DCM3 Illustrated
  • Master node: idle
  • P1: decompose 5
  • P2: solve 2
  • P3: solve 4
  • P4: solve 6
[Figure: subproblem tree; master database with pending queue, active queue, and solved list]

32
pRec-I-DCM3 Illustrated
  • Master node: receive solution
  • P1: decompose 5
  • P2: send solution
  • P3: solve 4
  • P4: solve 6
[Figure: subproblem tree; master database with pending queue, active queue, and solved list]

33
pRec-I-DCM3 Illustrated
  • Master node: idle
  • P1: decompose 5
  • P2: idle
  • P3: solve 4
  • P4: solve 6
[Figure: subproblem tree; master database with pending queue, active queue, and solved list]

34
pRec-I-DCM3 Illustrated
  • Master node: idle
  • P1: solve 8
  • P2: solve 7
  • P3: solve 4
  • P4: solve 6
[Figure: subproblem tree; master database with pending queue, active queue, and solved list]

35
pRec-I-DCM3 Illustrated
  • Master node: idle
  • P1: solve 8
  • P2, P3, P4: idle
[Figure: subproblem tree; master database with pending queue, active queue, and solved list]

36
pRec-I-DCM3 Illustrated
  • Master node: send 5, 7
  • P1: receive 5, 7
  • P2, P3, P4: idle
[Figure: subproblem tree; master database with pending queue, active queue, and solved list]

37
pRec-I-DCM3 Illustrated
  • Master node: idle
  • P1: merge 5
  • P2, P3, P4: idle
[Figure: subproblem tree; master database with pending queue, active queue, and solved list]

38
pRec-I-DCM3 Illustrated
  • Master node: receive
  • P1: send solution
  • P2, P3, P4: idle
[Figure: subproblem tree; master database with pending queue, active queue, and solved list]

39
pRec-I-DCM3 Illustrated
  • Master node: merge
  • P1, P2, P3, P4: idle
[Figure: subproblem tree; master database with pending queue, active queue, and solved list]

40
pRec-I-DCM3 Illustrated
  • Master node: global search
  • P1, P2, P3, P4: idle
[Figure: subproblem tree; master database with pending queue, active queue, and solved list]

41
pRec-I-DCM3 Illustrated
  • Master node: new iteration
  • P1, P2, P3, P4: idle
  • Use new tree as the guide tree
[Figure: master database with pending queue, active queue, and solved list]
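The bookkeeping in this walkthrough (pending queue, active composite problems, solved results, and merging a parent once all of its children are solved) can be simulated sequentially. This is a minimal sketch under stated assumptions: `run_master` and its callbacks are hypothetical, there is no MPI or real worker pool, the guide tree is omitted from `decompose`, and all merges happen in one place, whereas in the slides a worker (P1) merges subproblem 5 and the master performs only the final merge.

```python
from collections import deque


def run_master(root_problem, k, decompose, solve, merge):
    """Sequential simulation of the master's task database.

    pending: subproblem ids waiting to be handed to a worker
    active:  composite subproblems, mapped to how many children are unsolved
    solved:  solutions (subtrees) keyed by subproblem id
    """
    problems, parent, children = [], [], {}

    def add_task(problem, par):
        problems.append(problem)
        parent.append(par)
        return len(problems) - 1

    pending = deque([add_task(root_problem, None)])
    active, solved = {}, {}
    while pending:
        tid = pending.popleft()                   # "delegate" to the next worker
        if len(problems[tid]) > k:                # composite: decompose it
            kids = [add_task(p, tid) for p in decompose(problems[tid])]
            children[tid] = kids
            active[tid] = len(kids)
            pending.extend(kids)
            continue
        result = solve(problems[tid])             # leaf: run the base method
        # Record the solution, then merge every ancestor whose children are done.
        while True:
            solved[tid] = result
            par = parent[tid]
            if par is None:
                return result                     # root solved: iteration finished
            active[par] -= 1
            if active[par] > 0:
                break
            del active[par]
            result = merge([solved[c] for c in children[par]])
            tid = par
    raise RuntimeError("root problem was never solved")
```

Merging is triggered by the arrival of the last child's solution rather than by a separate pass, which mirrors how subproblem 5 is merged only after the solutions to 7 and 8 are both available.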

43
Research issues
  • Granularity of leaf problems
      • accuracy vs. subproblem granularity
      • scalability
      • given a time limit, which granularity yields
        the best parsimony score?
  • Scheduling of subproblems
      • in what order to solve subproblems, considering
        memory constraints and subproblem sizes
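One candidate answer to the scheduling question is the classic largest-first (LPT) list-scheduling heuristic. The sketch below is an assumption, not the authors' scheduler: `lpt_schedule` is a hypothetical name, subproblem size in taxa is used as a stand-in for running time (real MP search time grows faster than linearly), and the memory constraint is modeled as a per-worker cap on subproblem size, on the assumption that each worker holds one subproblem at a time.

```python
import heapq


def lpt_schedule(sizes, num_workers, max_size_per_worker=None):
    """Assign subproblems to workers, largest first, always to the least-loaded one.

    sizes: subproblem sizes (taxa), used as a proxy for running time.
    max_size_per_worker: optional memory cap on a single subproblem.
    Returns (makespan, assignment) where assignment[w] lists subproblem indices.
    """
    if max_size_per_worker is not None and any(s > max_size_per_worker for s in sizes):
        raise ValueError("a subproblem exceeds the per-worker memory cap")
    heap = [(0, w) for w in range(num_workers)]   # (current load, worker id)
    assignment = [[] for _ in range(num_workers)]
    for i in sorted(range(len(sizes)), key=lambda i: -sizes[i]):
        load, w = heapq.heappop(heap)             # least-loaded worker
        assignment[w].append(i)
        heapq.heappush(heap, (load + sizes[i], w))
    return max(load for load, _ in heap), assignment
```

For sizes [5, 4, 3, 3, 3] on two workers this returns a makespan of 10, while the split {5, 4} / {3, 3, 3} achieves 9, a small reminder that greedy ordering is a heuristic rather than an optimal schedule.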

44
Proposed Experiments
  • Speedup (number of CPUs, subproblem size)
      • pRec-I-DCM3 using PAUP and TNT relative to
          • Rec-I-DCM3 using PAUP and TNT
          • PAUP and TNT
  • Time to best score
      • pRec-I-DCM3 vs. Rec-I-DCM3
  • Accuracy (maximum size of leaf subproblem)
      • parsimony score
      • phylogenetic tree
  • Convergence (wall-clock time)
45
More Experiments
  • Problem decomposition statistics
  • Time breakdown for execution of the algorithm

46
Status and Future Work
  • Implementation is almost done
  • No experimental data yet
  • Try pRec-I-DCM3 method with ML
  • Try similar approach for Bayesian methods
  • Develop a general framework to run
    divide-and-conquer algorithms in parallel
  • Distributed master