Title: A Lookahead Branch-and-Bound
1A Lookahead Branch-and-Bound Algorithm for Solving the MQC Problem
Gang Wu, Jia-huai You, and Guohui Lin
Department of Computing Science, University of Alberta
Edmonton, Canada
2Outline
- Introduction
- Research Methods
- Computational Results and Conclusions
3Common Phylogeny Terminology
Phylogeny: the pattern of historical relationships among species (taxa).
Tree: a mathematical structure used to depict the evolutionary history of a group of species.
[Figure: example tree with nodes labeled A, B, C, D, E and the following parts marked]
Leaf Nodes: represent the taxa (genes, populations, etc.) used to infer the phylogeny
Branches or Edges
Internal Nodes: represent hypothetical ancestors of the taxa
ROOT of the Tree: common ancestor of all taxa
4General Process of Phylogeny Construction
Input: A set of (DNA or protein) sequences for the species
Output: An evolutionary tree (phylogeny) whose leaf nodes are the input species
Methods: Maximum Parsimony (MP), Maximum Likelihood (ML), etc.
These methods are not suitable for large trees (over 20 species) when solved exactly.
Current software uses heuristics to speed up computation.
5Quartet Based Phylogeny Construction
- Only one unrooted tree for one, two or three species
- Three possible unrooted resolved trees for four species (A, B, C, D): AB|CD, AC|BD, AD|BC
- Quartets are the smallest informative unrooted trees
- MP or ML can be solved exactly on quartets
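The three resolved topologies on four species can be listed mechanically. The following is a minimal sketch; the function names `quartet_topologies` and `fmt` are illustrative and not part of the original slides.

```python
def quartet_topologies(a, b, c, d):
    """Return the three resolved unrooted topologies on four distinct taxa.

    Each topology is a pair of pairs: ((a, b), (c, d)) stands for ab|cd.
    """
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


def fmt(topology):
    """Format a topology in the bar notation used on the slides."""
    (w, x), (y, z) = topology
    return f"{w}{x}|{y}{z}"


names = [fmt(t) for t in quartet_topologies("A", "B", "C", "D")]
print(names)
```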
6Process of Quartet Based Phylogeny Construction
7Definitions
A quartet topology ab|cd is consistent with a phylogeny T, or a phylogeny T satisfies a quartet topology ab|cd, iff a, b, c, d are all leaves of T and the path from a to b does not share any nodes with the path from c to d.
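The definition translates directly into a path-disjointness test. The sketch below assumes the tree is given as an adjacency dictionary; the small tree `adj` (three cherries joined at a central node `m`) and all helper names are hypothetical and chosen only for illustration.

```python
from collections import deque


def path_nodes(adj, start, goal):
    """Return the set of nodes on the unique path from start to goal in a tree."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    nodes = set()
    node = goal
    while node is not None:  # walk back from goal to start
        nodes.add(node)
        node = parent[node]
    return nodes


def satisfies(adj, a, b, c, d):
    """True iff the tree satisfies ab|cd under the definition above."""
    leaves = {n for n, nbrs in adj.items() if len(nbrs) == 1}
    if len({a, b, c, d}) != 4 or not {a, b, c, d} <= leaves:
        return False
    return path_nodes(adj, a, b).isdisjoint(path_nodes(adj, c, d))


# Hypothetical example tree: cherries (a,b), (c,d), (e,f) joined at node m.
edges = [("a", "u"), ("b", "u"), ("c", "v"), ("d", "v"), ("e", "w"),
         ("f", "w"), ("u", "m"), ("v", "m"), ("w", "m")]
adj = {}
for x, y in edges:
    adj.setdefault(x, set()).add(y)
    adj.setdefault(y, set()).add(x)

print(satisfies(adj, "a", "e", "c", "d"))  # ae|cd
print(satisfies(adj, "a", "c", "e", "d"))  # ac|ed
```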
8Quartet topologies set Q:
ae|cd ab|cd ab|ce ab|cf ab|de ab|df ab|ef
af|cd ac|ef ad|ef be|cd bf|cd bc|ef
bd|ef cd|ef
Phylogeny T: [Figure: tree with leaves a, b, c, d, e, f]
Quartet topology ae|cd is consistent with T, or T satisfies ae|cd.
9Definitions
Given a set Q of quartet topologies on a set S of taxa, Q is compatible iff there is a phylogeny on S which satisfies all the quartet topologies in Q.
A set Q of quartet topologies is complete iff Q contains a quartet topology for every four taxa in S.
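Completeness is easy to check by enumeration. The sketch below assumes single-character taxon names and the bar notation `ab|cd`, and it reads "contains a quartet topology" as "contains exactly one"; that reading, like the helper names, is an assumption made for this example.

```python
from itertools import combinations


def parse(q):
    """Parse 'ab|cd' (single-character taxa) into a pair of frozensets."""
    left, right = q.split("|")
    return frozenset(left), frozenset(right)


def is_complete(quartets, taxa):
    """True iff every 4-subset of taxa has exactly one topology in quartets."""
    seen = {}
    for q in quartets:
        left, right = parse(q)
        key = left | right  # the four taxa this topology is defined on
        seen[key] = seen.get(key, 0) + 1
    return all(seen.get(frozenset(s), 0) == 1
               for s in combinations(sorted(taxa), 4))


Q = ("ae|cd ab|cd ab|ce ab|cf ab|de ab|df ab|ef af|cd "
     "ac|ef ad|ef be|cd bf|cd bc|ef bd|ef cd|ef").split()
print(is_complete(Q, "abcdef"))
```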
10Quartet topologies set Q:
ae|cd ab|cd ab|ce ab|cf ab|de ab|df ab|ef
af|cd ac|ef ad|ef be|cd bf|cd bc|ef
bd|ef cd|ef
Phylogeny T: [Figure: phylogeny T]
- Q is compatible
- Q is complete
11Problem Descriptions
Quartet Compatibility Problem (QCP)
Input: A set Q of quartet topologies on S
Question: Is Q compatible? Equivalently, is there a phylogeny T on S such that all quartet topologies in Q are satisfied?
13Input quartet topologies set Q:
ac|ed ab|cd ab|ce ab|cf ab|de ab|df ab|ef
af|cd ac|ef ad|ef be|cd bf|cd bc|ef
bd|ef cd|ef
Quartet Compatibility Problem (QCP)? No.
MQC or MQI?
Only ac|ed is not satisfied. The satisfied quartet topology is ae|cd.
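Counting satisfied quartets (the MQC view) and quartet errors (the MQI view) against a candidate tree can be sketched with bipartitions instead of paths: a tree satisfies ab|cd when one of its edges separates {a, b} from {c, d}. The tree below, given by its three nontrivial splits, is a hypothetical stand-in for the phylogeny T of the example, and the helper names are illustrative.

```python
def parse(q):
    """Parse 'ab|cd' (single-character taxa) into a pair of frozensets."""
    left, right = q.split("|")
    return frozenset(left), frozenset(right)


def satisfied_by_splits(q, splits, taxa):
    """True iff some split puts the two sides of q on opposite sides."""
    left, right = parse(q)
    for side in splits:
        other = taxa - side
        if (left <= side and right <= other) or (right <= side and left <= other):
            return True
    return False


taxa = frozenset("abcdef")
# Assumed tree: cherries (a,b), (c,d), (e,f); one side of each nontrivial split.
splits = [frozenset("ab"), frozenset("cd"), frozenset("ef")]
Q = ("ac|ed ab|cd ab|ce ab|cf ab|de ab|df ab|ef af|cd "
     "ac|ef ad|ef be|cd bf|cd bc|ef bd|ef cd|ef").split()

errors = [q for q in Q if not satisfied_by_splits(q, splits, taxa)]
satisfied = len(Q) - len(errors)
print(satisfied, errors)
```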
14Known Results
The Quartet Compatibility Problem (QCP) can be solved in polynomial time if the given set Q of quartet topologies is complete, but it is NP-complete if Q is incomplete.
The Maximum Quartet Consistency Problem (MQC) and the Minimum Quartet Inconsistency Problem (MQI) are NP-complete even if Q is complete.
Exact algorithms: guaranteed to find the optimal ("best") tree.
Heuristic algorithms: approximate or quick-and-dirty methods that attempt to find the optimal tree but cannot guarantee to do so.
15Known Results
Lots of heuristics. The best known approximation algorithm is Hypercleaning, whose approximation ratio for MQI is stated in terms of n, the number of taxa.
Dynamic programming can solve the MQC problem with 20 taxa in 6 days on a 300 MHz computer.
A fixed-parameter algorithm can solve the MQI problem with 50 taxa when k = 100 in 40 minutes on a 750 MHz computer.
16Theorems
Local conflict: an incompatible set of 3 quartet topologies over 5 taxa, for example ab|cd, ac|be and ac|de.
Theorem 1. Given a complete set of quartet topologies Q over a set of taxa S and some taxon e in S, Q is compatible iff there exists no local conflict whose taxa set includes e.
Idea: Construct a list of the local conflicts involving a taxon e, then try to resolve all the local conflicts in the list by changing fewer than k quartet topologies.
Method: Branch and Bound
Complexity: $O(4^k n + n^4)$ computation and $O(k n^4)$ memory.
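Whether three quartet topologies over five taxa form a local conflict can be decided by brute force, since there are only 15 resolved unrooted trees on five leaves (choose the middle leaf, then pair up the other four). The sketch below is one possible check under that observation; it is not taken from the slides, and all names are illustrative.

```python
def parse(q):
    """Parse 'ab|cd' into an unordered pair of unordered pairs."""
    left, right = q.split("|")
    return frozenset({frozenset(left), frozenset(right)})


def induced_quartets(c1, c2, middle):
    """Quartets displayed by the tree with cherries c1, c2 and a middle leaf."""
    quartets = {frozenset({c1, c2})}
    for x in c2:
        quartets.add(frozenset({c1, frozenset({middle, x})}))
    for x in c1:
        quartets.add(frozenset({c2, frozenset({middle, x})}))
    return quartets


def five_taxon_trees(taxa):
    """Yield each of the 15 resolved trees on five taxa as its quartet set."""
    taxa = sorted(taxa)
    for middle in taxa:
        rest = [t for t in taxa if t != middle]
        for partner in rest[1:]:
            c1 = frozenset({rest[0], partner})
            c2 = frozenset(rest) - c1
            yield induced_quartets(c1, c2, middle)


def is_local_conflict(q1, q2, q3):
    """True iff the three topologies span 5 taxa and no tree satisfies all."""
    qs = {parse(q) for q in (q1, q2, q3)}
    taxa = set().union(*(side for q in qs for side in q))
    if len(qs) != 3 or len(taxa) != 5:
        return False
    return not any(qs <= tree for tree in five_taxon_trees(taxa))


print(is_local_conflict("ab|cd", "ac|be", "ac|de"))
print(is_local_conflict("ab|cd", "ab|ce", "ab|de"))
```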
17Theorems
Theorem 2. Let m be the number of local conflicts involving e. Resolving all of these local conflicts requires changing at least a minimum number of quartet topologies that is determined by m.
This lower bound can be used as a bound factor to cut a node during the Branch-and-Bound search.
18Theorems
Theorem 3. For a quartet topology q in Q, if there are more than 3k distinct local conflicts that contain q, then q must be changed in the optimal solution.
This theorem can be used as a branch factor to choose which quartet topology to change.
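The Theorem 3 test amounts to counting, for each quartet topology, the distinct local conflicts that contain it. A minimal sketch follows, assuming each local conflict is represented as a frozenset of quartet labels; the function name `must_change` and the placeholder labels `p0`, `r0`, ... are hypothetical.

```python
from collections import Counter


def must_change(conflicts, k):
    """Return quartets contained in more than 3k distinct local conflicts."""
    distinct = set(conflicts)  # drop repeated conflicts before counting
    counts = Counter(q for conflict in distinct for q in conflict)
    return {q for q, c in counts.items() if c > 3 * k}


# Toy input: 'ab|cd' appears in four distinct conflicts; other labels are dummies.
conflicts = [frozenset({"ab|cd", f"p{i}", f"r{i}"}) for i in range(4)]
print(must_change(conflicts, k=1))
print(must_change(conflicts, k=2))
```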
19Theorems
- Theorem 4. For a bipartition (edge) (X,Y) of S where $|X| = l$, let
- $p_1$ = the number of quartet errors in Q across (X,Y),
- $p_2$ = the number of nonexchangeable l-subsets on X,
- $p_3$ = the number of nonexchangeable (n-l)-subsets on Y.
- If $2p_1 + (l-1)p_2 + (n-l-1)p_3 < (l-1)(n-l-1)$, then bipartition (X,Y) must be in the optimal phylogeny.
Quartet inference rules:
ab|cd, ab|ce ⟹ ab|de
ab|ce, ac|de ⟹ ab|ce, ab|de, bc|de
They are used to construct a need-to-be-fixed quartet list, i.e., all the quartet topologies in the list should not be changed during search.
20Lookahead
Contribution of changing a quartet topology: the difference between the sizes of the local conflict list before and after the quartet topology is changed.
At each search node, a lookahead mechanism first tests the contribution of each possible branch, and the search continues along the branch with the maximum contribution.
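The lookahead step can be sketched generically: try every candidate change, measure how much the local conflict list shrinks, and keep the best one. In the sketch below the search state is abstracted to a set of open conflict ids, and `effects` is made-up toy data describing which conflicts each change resolves and creates; none of these names come from the slides.

```python
def best_change(candidates, apply_change, conflict_count, state):
    """Return (change, contribution) with the largest drop in conflict count."""
    before = conflict_count(state)
    best, best_gain = None, None
    for change in candidates:
        # Contribution = size of the conflict list before minus after.
        gain = before - conflict_count(apply_change(state, change))
        if best_gain is None or gain > best_gain:
            best, best_gain = change, gain
    return best, best_gain


# Toy data: each change maps to (conflicts resolved, conflicts newly created).
state = frozenset({1, 2, 3, 4})
effects = {
    "flip q1": ({1, 2}, {9}),
    "flip q2": ({1, 2, 3}, set()),
    "flip q3": ({4}, set()),
}


def apply_change(current, change):
    resolved, created = effects[change]
    return (current - resolved) | created


best, gain = best_change(effects, apply_change, len, state)
print(best, gain)
```

Evaluating every branch costs one conflict recount per candidate at each node, which is the price paid for a more informed branching order.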
21Outline of Algorithm
- At every node in the search tree:
1. Use Theorem 2 to decide whether to cut the node (test the lower bound, where k1 is the number of changed quartet topologies so far)
2. Use Theorem 3 to determine need-to-be-changed quartets (if more than 3(k-k1) distinct local conflicts involve q, then q must be changed)
3. Use Theorem 4 to determine need-to-be-fixed quartets (find optimal bipartitions; all the quartet topologies consistent with the optimal bipartitions are fixed)
4. Use the quartet inference rules on the quartet topologies generated in step 3
22Outline of Algorithm-Contd
5. Build a local conflict list and partition it into two parts
   IF there are need-to-be-changed quartet topologies
      Pick the need-to-be-changed quartet topology achieving the largest contribution to resolve
   ELSE
      Pick the resolution achieving the largest contribution
23Experimental Results
Comparison between the GN algorithm and our algorithm on finding the first solution with fewer than k quartet errors
24Experimental Results
Comparison among Hypercleaning, LBnB-1st, and LBnB-Opt.
- Hypercleaning is a heuristic algorithm for the MQC problem
- LBnB-1st stops when the first solution is found
- LBnB-Opt searches all possible solutions and outputs the optimal one
25Conclusions
- Our algorithm can be regarded as an improvement over the GN algorithm.
- It outperforms other exact algorithms significantly in finding both the first solution and the optimal solution.
- In some instances, our algorithm has running times competitive with the heuristic Hypercleaning method.
26Acknowledgement
- This research work was supported by
- CFI
- NSERC
- NNSF Grant 60373012
Thanks