Title: Algorithms For Quartet Based Phylogeny Construction
1. Algorithms For Quartet Based Phylogeny Construction
Gang Wu
Department of Computing Science
University of Alberta
Edmonton, Canada
2. Outline
- Introduction
- Research methods
- Computational results
- Conclusions and future work
3. Common Phylogeny Terminology
Phylogeny: pattern of historical relationships among species (taxa).
Tree: mathematical structure used to depict the evolutionary history of a group of taxa.
[Figure: a rooted tree with leaves A, B, C, D, E, annotated as follows]
- Leaf nodes: represent the taxa (genes, populations, etc.) used to infer the phylogeny
- Branches or edges
- Internal nodes: represent hypothetical ancestors of the taxa
- Root of the tree: common ancestor of all taxa
4. Quartet Based Phylogeny Construction
- Quartet: four taxa (A, B, C, D)
- Quartet topology: an unrooted tree for a quartet
- There are three possible quartet topologies for a quartet:
AB|CD
AC|BD
AD|BC
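The three topologies can be enumerated mechanically. The following is a minimal sketch, assuming a topology is written as a string such as "AB|CD" (the pair on each side of the bar shares one side of the internal edge); the function name `quartet_topologies` is hypothetical.

```python
from itertools import combinations
from math import comb

def quartet_topologies(quartet):
    """Return the three unrooted topologies xy|zw for a set of four taxa."""
    a, b, c, d = sorted(quartet)
    return [f"{a}{b}|{c}{d}", f"{a}{c}|{b}{d}", f"{a}{d}|{b}{c}"]

topologies = quartet_topologies({"A", "B", "C", "D"})
print(topologies)

# Every 4-subset of the taxon set is a quartet, so n taxa give C(n, 4) quartets.
taxa = "ABCDEF"
quartets = list(combinations(taxa, 4))
print(len(quartets), comb(len(taxa), 4))
```

Since each quartet has three candidate topologies, a taxon set of size n admits 3^C(n, 4) complete topology sets, most of which are not compatible with any tree.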
5. Process of Quartet Based Phylogeny Construction
6. Definitions
A quartet topology ab|cd is consistent with a phylogeny T, or a phylogeny T satisfies a quartet topology ab|cd, iff a, b, c, d are all leaves of T and the path from a to b does not share any nodes with the path from c to d.
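The definition translates directly into a path-disjointness test. Below is a minimal sketch under stated assumptions: the tree is an adjacency dictionary, the internal node names u, v, w, x are invented for illustration, and the example tree (three cherries a-b, c-d, e-f joined at one center) is an assumed shape, not taken from a figure.

```python
def path(tree, start, goal):
    """Return the unique path between two nodes of a tree (adjacency dict)."""
    stack = [[start]]
    while stack:
        trail = stack.pop()
        if trail[-1] == goal:
            return trail
        for nxt in tree[trail[-1]]:
            if nxt not in trail:          # never walk back along the trail
                stack.append(trail + [nxt])
    return None

def is_consistent(tree, a, b, c, d):
    """True iff topology ab|cd is consistent with the tree."""
    leaves = {v for v, nbrs in tree.items() if len(nbrs) == 1}
    if len({a, b, c, d}) < 4 or not {a, b, c, d} <= leaves:
        return False
    # ab|cd holds when the a-b path and the c-d path share no node.
    return not set(path(tree, a, b)) & set(path(tree, c, d))

# Assumed example tree on leaves a..f with hypothetical internal nodes u, v, w, x.
edges = [("a", "u"), ("b", "u"), ("c", "v"), ("d", "v"),
         ("e", "w"), ("f", "w"), ("u", "x"), ("v", "x"), ("w", "x")]
tree = {}
for p, q in edges:
    tree.setdefault(p, set()).add(q)
    tree.setdefault(q, set()).add(p)

print(is_consistent(tree, "a", "e", "c", "d"))  # ae|cd
print(is_consistent(tree, "a", "c", "e", "d"))  # ac|ed
```

On this assumed tree the a-e path runs through u, x, w and the c-d path through v only, so ae|cd is satisfied, while the a-c and e-d paths both pass through x and v.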
7. [Figure: phylogeny T with leaves a, b, c, d, e, f]
Quartet topology ae|cd is consistent with T, or T satisfies ae|cd.
8. Definitions
Given a quartet topology set Q on a taxon set S, Q is compatible iff there is a phylogeny on S which satisfies all the quartet topologies in Q.
A quartet topology set Q is complete iff Q contains a quartet topology for every four taxa in S.
9. Quartet topology set Q on taxon set S = {a, b, c, d, e, f}:
ae|cd ab|cd ab|ce ab|cf ab|de ab|df ab|ef
af|cd ac|ef ad|ef be|cd bf|cd bc|ef
bd|ef cd|ef
Q is complete.
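Completeness can be checked by comparing the quartets covered by Q against all 4-subsets of S. This is a small sketch, assuming topologies are written as strings like "ae|cd" with single-character taxon names; `is_complete` and `taxa_of` are hypothetical helper names.

```python
from itertools import combinations

S = "abcdef"
Q = ("ae|cd ab|cd ab|ce ab|cf ab|de ab|df ab|ef "
     "af|cd ac|ef ad|ef be|cd bf|cd bc|ef bd|ef cd|ef").split()

def taxa_of(topology):
    """The four taxa of a topology string such as 'ae|cd'."""
    return frozenset(topology.replace("|", ""))

def is_complete(Q, S):
    """True iff Q holds a topology for every 4-subset of the taxon set S."""
    covered = {taxa_of(q) for q in Q}
    return all(frozenset(four) in covered for four in combinations(S, 4))

print(len(Q), is_complete(Q, S))
```

With six taxa there are C(6, 4) = 15 quartets, and the set above lists one topology for each of them.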
10. Problem Descriptions
Quartet Compatibility Problem (QCP)
Input: A set Q of quartet topologies on S.
Question: Is Q compatible? Equivalently, is there a phylogeny T on S such that all quartet topologies in Q are satisfied?
In practice, the given quartet topology set Q usually contains errors and thus is incompatible.
11. Known Results
The Quartet Compatibility Problem (QCP) can be solved in polynomial time if Q is complete, but it is NP-complete if Q is incomplete.
The Maximum Quartet Consistency Problem (MQC), which asks for a phylogeny satisfying the maximum number of quartet topologies in Q, and the Minimum Quartet Inconsistency Problem (MQI), which asks for one violating the minimum number, are NP-complete even if Q is complete.
Exact algorithms: guaranteed to find the optimal or "best" tree.
Heuristic algorithms: approximate or quick-and-dirty methods that attempt to find the optimal tree, but cannot guarantee to do so.
12. Known Results
Many heuristics exist. The best known approximation
algorithm is Hypercleaning, with approximation
ratio of for MQI, where n is number of
taxa.
13. Research Objectives
- Exact methods
  - Faster
  - Solve larger instances
14. Outline
- Introduction
- Research methods
  - Answer set programming based on ultrametric phylogeny
- Computational results
- Conclusions and future work
15. Ultrametric Phylogeny and Matrix
- Ultrametric phylogeny
  - Label each internal node with a positive integer.
  - Along any root-to-leaf path, the labels are strictly decreasing.
Ultrametric matrix: each entry M(i, j) is the label of the least common ancestor of leaf nodes i and j. It satisfies:
- It is symmetric, with M(i, i) = 0.
- For every triplet (i, j, k), two of the values M(i, j), M(j, k), and M(i, k) are equal, and they are greater than the third value.
e.g. i = 1, j = 3, k = 4: M(1, 3) = M(3, 4) > M(1, 4)
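To make the construction concrete, here is a minimal sketch that derives the matrix from a labeled rooted binary tree and checks the two properties above. The tree below is an assumption, chosen so that it agrees with the example M(1, 3) = M(3, 4) > M(1, 4); internal nodes are tuples (label, left, right), leaves are strings, and all function names are hypothetical.

```python
from itertools import combinations

# Assumed example tree: root labeled 4; labels strictly decrease toward the leaves.
TREE = (4, (3, (1, "s1", "s5"), "s4"), (2, "s3", "s2"))

def leaves(node):
    """Leaves below a node, left to right."""
    if isinstance(node, str):
        return [node]
    return leaves(node[1]) + leaves(node[2])

def ultrametric_matrix(tree):
    """M[i][j] = label of the least common ancestor of leaves i and j."""
    M = {leaf: {leaf: 0} for leaf in leaves(tree)}
    def fill(node):
        if isinstance(node, str):
            return
        label, left, right = node
        # Leaves on opposite sides of this node have it as their LCA.
        for i in leaves(left):
            for j in leaves(right):
                M[i][j] = M[j][i] = label
        fill(left)
        fill(right)
    fill(tree)
    return M

def is_ultrametric(M):
    """Check symmetry, zero diagonal, and the condition on every triplet."""
    if any(M[i][i] != 0 for i in M):
        return False
    if any(M[i][j] != M[j][i] for i in M for j in M):
        return False
    for i, j, k in combinations(M, 3):
        low, mid, high = sorted((M[i][j], M[j][k], M[i][k]))
        if not (low < mid == high):   # two equal values, both above the third
            return False
    return True

M = ultrametric_matrix(TREE)
print(M["s1"]["s3"], M["s3"]["s4"], M["s1"]["s4"])
print(is_ultrametric(M))
```

The strict form of the triplet test matches the statement above and therefore fits fully resolved (binary) trees; a tree with a node of degree higher than three would need the weaker "greater than or equal" form.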
17. Theorem 1: A quartet topology ab|cd is consistent with a phylogeny T iff any ultrametric labeling scheme M of T satisfies
min{M(a, c), M(b, d)} > min{M(a, b), M(c, d)}.
[Figure: an ultrametric phylogeny on leaves s1, s5, s4, s3, s2 with internal nodes labeled 4, 3, 1, 2]
s1 s5 | s2 s3 is consistent with the tree and its corresponding matrix: min{M(1, 2), M(5, 3)} = 4 > min{M(1, 5), M(2, 3)} = 1. Condition satisfied!
An ultrametric matrix satisfies a quartet topology ab|cd if the above inequality is satisfied.
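The inequality of Theorem 1 is a one-line test once the matrix is available. The sketch below hard-codes a matrix for an assumed five-leaf tree in which s1 and s5 meet at label 1, s4 joins them at label 3, s2 and s3 meet at label 2, and everything else meets at the root with label 4. That shape is an assumption chosen to agree with the worked numbers above, and the names `M` and `satisfies` are hypothetical.

```python
# Labels of the least common ancestor for pairs below the root (assumed tree).
LABELS = {(1, 5): 1, (1, 4): 3, (4, 5): 3, (2, 3): 2}

def M(i, j):
    """Ultrametric matrix entry; pairs not listed above meet at the root (label 4)."""
    if i == j:
        return 0
    return LABELS.get((min(i, j), max(i, j)), 4)

def satisfies(M, a, b, c, d):
    """Theorem 1 test for the quartet topology ab|cd."""
    return min(M(a, c), M(b, d)) > min(M(a, b), M(c, d))

# The three topologies on the quartet {s1, s5, s2, s3}.
print(satisfies(M, 1, 5, 2, 3))  # s1 s5 | s2 s3
print(satisfies(M, 1, 2, 5, 3))  # s1 s2 | s5 s3
print(satisfies(M, 1, 3, 5, 2))  # s1 s3 | s5 s2
```

For s1 s5 | s2 s3 the left side is min{4, 4} = 4 and the right side is min{1, 2} = 1, so only that topology passes; the other two compare 1 against 4 and fail.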
18. Theorem 2: Given a quartet topology set Q on S and an ultrametric phylogeny T on S, T satisfies k quartet topologies in Q if and only if the corresponding ultrametric matrix M on S satisfies the same k quartet topologies in Q.
We thus transform the original MQC problem into a search problem over ultrametric matrices.
19. Problem Formulation
- Input
  - n × n matrix M(i, j); the domain of each matrix entry is 0..n-1
  - Quartet topology set Q
- Goal
  - Find an assignment to M(i, j) such that
    - the matrix is ultrametric, and
    - the number of quartet topologies satisfied by the matrix is maximized.
20. Formulation in Answer Set Programming
Domain
1 { m(1, 2, 1), m(1, 2, 2), m(1, 2, 3), m(1, 2, 4), m(1, 2, 5) } 1
Matrix entry (1, 2) takes exactly one value in the domain 1..5.
Ultrametric Constraints
For any three matrix values m(i, j), m(j, k), and m(i, k), two of them are equal and greater than the third one.
Quartet Constraints
If min{m(i, k), m(j, l)} > min{m(i, j), m(k, l)}, then quartet topology ij|kl is satisfied.
Objective
maximize q(i, j, k, l) (the number of satisfied quartet topologies)
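The four ingredients (domain, ultrametric constraints, quartet constraints, objective) can be mirrored by a small exhaustive search. The sketch below is a plain Python backtracking search, not an answer set program and not the solver used in this work; it assumes off-diagonal entries range over 1..n-1, encodes a topology ab|cd as the tuple (a, b, c, d), and is only practical for very small n. The function name and example inputs are hypothetical.

```python
from itertools import combinations

def best_ultrametric_matrix(n, quartets):
    """Search all ultrametric matrices on taxa 1..n; return (best score, matrix)."""
    pairs = list(combinations(range(1, n + 1), 2))   # lexicographic order
    M = {}
    best = (-1, None)

    def get(i, j):
        return M[(i, j) if i < j else (j, i)]

    def triple_ok(i, j, k):
        # Ultrametric constraint for i < j < k: two equal values above the third.
        low, mid, high = sorted((M[i, j], M[i, k], M[j, k]))
        return low < mid == high

    def satisfied(a, b, c, d):
        # Quartet constraint for ab|cd.
        return min(get(a, c), get(b, d)) > min(get(a, b), get(c, d))

    def extend(pos):
        nonlocal best
        if pos == len(pairs):
            score = sum(satisfied(*q) for q in quartets)   # objective
            if score > best[0]:
                best = (score, dict(M))
            return
        i, j = pairs[pos]
        for value in range(1, n):                          # domain choice
            M[i, j] = value
            # Entry (i, j) completes exactly the triples (k, i, j) with k < i.
            if all(triple_ok(k, i, j) for k in range(1, i)):
                extend(pos + 1)
        del M[i, j]

    extend(0)
    return best

# Five quartet topologies induced by one tree, then the same set with one error.
Q_tree = [(1, 4, 2, 3), (1, 5, 2, 3), (1, 5, 2, 4), (1, 5, 3, 4), (2, 3, 4, 5)]
Q_noisy = Q_tree[:-1] + [(2, 4, 3, 5)]
print(best_ultrametric_matrix(5, Q_tree)[0])
print(best_ultrametric_matrix(5, Q_noisy)[0])
```

Checking each triplet as soon as its last entry is assigned prunes most partial matrices early, which plays the role that constraint propagation plays inside a real solver.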
21. Outline
- Introduction
- Research methods
  - Answer set programming based on ultrametric phylogeny
  - A lookahead branch and bound algorithm
- Computational results
- Conclusions and future work
22. Background
Local conflict: an incompatible set of 3 quartet topologies on 5 taxa, for example ab|cd, ac|be, and ac|de.
Theorem: Given a complete set of quartet topologies Q over a set of taxa S and some taxon e in S, Q is compatible iff there exists no local conflict whose taxon set includes e.
Idea: Construct a list of the local conflicts involving a taxon e, and then try to resolve all the local conflicts in the list by changing fewer than k quartet topologies.
Method: Branch and bound.
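Whether three topologies on five taxa form a local conflict can be decided by brute force, because there are only 15 binary unrooted trees on five leaves: each has two cherries and one middle leaf. The sketch below assumes single-character taxon names and encodes a topology as a sorted pair of two-letter strings; every function name is hypothetical.

```python
def topo(p, q):
    """Canonical form of topology p|q, e.g. topo('dc', 'ba') gives ('ab', 'cd')."""
    return tuple(sorted(("".join(sorted(p)), "".join(sorted(q)))))

def tree_quartets(cherry1, cherry2, middle):
    """The five topologies induced by the 5-leaf tree with the given cherries."""
    (p, q), (r, s), m = cherry1, cherry2, middle
    return {topo(p + q, r + s), topo(p + q, m + r), topo(p + q, m + s),
            topo(r + s, m + p), topo(r + s, m + q)}

def five_taxon_trees(taxa):
    """Yield the topology sets of all 15 binary unrooted trees on five taxa."""
    for m in taxa:
        a, b, c, d = [t for t in taxa if t != m]
        for c1, c2 in (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))):
            yield tree_quartets(c1, c2, m)

def is_local_conflict(topologies):
    """True iff three topologies on five taxa are satisfied by no common tree."""
    taxa = sorted(set("".join(p + q for p, q in topologies)))
    if len(topologies) != 3 or len(taxa) != 5:
        return False
    return not any(set(topologies) <= tree for tree in five_taxon_trees(taxa))

conflict = [topo("ab", "cd"), topo("ac", "be"), topo("ac", "de")]
harmless = [topo("ab", "cd"), topo("ab", "ce"), topo("ab", "de")]
print(is_local_conflict(conflict), is_local_conflict(harmless))
```

In the example from the text, ac|be and ac|de together force a and c to form a cherry, which contradicts ab|cd; the second triple is satisfied by any tree in which a and b form a cherry.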
23. Lookahead
Contribution of changing a quartet topology: the difference between the sizes of the local conflict list before and after the quartet topology is changed.
At each search node, a lookahead mechanism first tests the contribution of each possible branch, and the search continues along the branch with the maximum contribution.
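A minimal sketch of the lookahead step follows. It simplifies the description above in two labeled ways: it counts all local conflicts rather than only those involving a fixed taxon e, and it tries every single-topology change instead of the branches of an actual search node. Q is assumed to map each quartet (a frozenset of four single-character taxa) to a topology stored as a sorted pair of two-letter strings; all names are hypothetical.

```python
from itertools import combinations

def topo(p, q):
    """Canonical form of topology p|q as a sorted pair of sorted strings."""
    return tuple(sorted(("".join(sorted(p)), "".join(sorted(q)))))

def five_taxon_trees(taxa):
    """Topology sets of the 15 binary trees on five taxa (two cherries, one middle)."""
    for m in taxa:
        a, b, c, d = [t for t in taxa if t != m]
        for (p, q), (r, s) in (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))):
            yield {topo(p + q, r + s), topo(p + q, m + r), topo(p + q, m + s),
                   topo(r + s, m + p), topo(r + s, m + q)}

def count_local_conflicts(Q):
    """Number of incompatible triples of topologies spanning five taxa."""
    total = 0
    for five in combinations(sorted(set().union(*Q)), 5):
        present = [Q[frozenset(f)] for f in combinations(five, 4) if frozenset(f) in Q]
        trees = list(five_taxon_trees(five))
        total += sum(1 for triple in combinations(present, 3)
                     if not any(set(triple) <= tree for tree in trees))
    return total

def alternatives(topology):
    """The two other topologies on the same four taxa."""
    a, b, c, d = sorted(topology[0] + topology[1])
    return {topo(a + b, c + d), topo(a + c, b + d), topo(a + d, b + c)} - {topology}

def best_change(Q):
    """Return (conflicts before, (contribution, quartet, new topology))."""
    before = count_local_conflicts(Q)
    best = None
    for quartet, current in Q.items():
        for new in sorted(alternatives(current)):
            trial = dict(Q)
            trial[quartet] = new
            gain = before - count_local_conflicts(trial)   # the contribution
            if best is None or gain > best[0]:
                best = (gain, quartet, new)
    return before, best

# Topologies of the tree with cherries ab and de, with ab|cd corrupted to ac|bd.
given = [topo("ab", "de"), topo("ac", "bd"), topo("ab", "ce"),
         topo("ac", "de"), topo("bc", "de")]
Q = {frozenset(p + q): (p, q) for p, q in given}
before, best = best_change(Q)
print(before, best)
```

Ties are possible: in this example, changing the topology on {a, b, c, e} instead also removes every conflict, so a real implementation needs a tie-breaking rule, which the sketch resolves by keeping the first maximum it encounters.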
24. Outline of Algorithm
At every node in the search tree:
1. Test whether to cut the node (m is the number of local conflicts, k is the maximum number of quartet errors, k1 is the number of quartet topologies changed so far).
2. Determine need-to-be-changed quartet topologies (if there are 3(k-k1) distinct local conflicts involving q, then q must be changed).
3. Determine need-to-be-fixed quartet topologies (find optimal edges; all the quartet topologies consistent with the optimal edges are fixed).
4. Use the quartet inference rules on the quartet topologies generated in step 3.
25. Outline of Algorithm (Contd.)
5. Build a local conflict list and partition it into two parts.
   IF there are need-to-be-changed quartet topologies
      Pick the need-to-be-changed quartet topology achieving the largest contribution to resolve
   ELSE
      Pick the way of resolving that achieves the largest contribution
26. Experimental Results
Running times of the exact algorithms, compared with the Fixed Parameter (GN) algorithm and the Dynamic Programming (DP) algorithm
27. Experimental Results
Running times on large taxon sets
28. Conclusions
- The answer set programming formulation gives a new perspective on the MQC problem.
- The proposed exact algorithms significantly outperform other exact algorithms.
- On general problem instances, the answer set programming method is the most efficient.
- If the number of quartet errors is small and the quartet topology set is complete, the lookahead branch and bound algorithm is the most efficient.
29. Future Work
- Design a quartet-specific answer set programming solver.
- Solve some special instances of the MQC problem in polynomial time.