Algorithms For Quartet Based Phylogeny Construction

Provided by: gan98
1
Algorithms For Quartet Based Phylogeny Construction
Gang Wu
Department of Computing Science
University of Alberta
Edmonton, Canada
2
Outline
  • Introduction
  • Research methods
  • Computational results
  • Conclusions and future works

3
Common Phylogeny Terminology
Phylogeny: pattern of historical relationships among species (taxa).
Tree: mathematical structure used to depict the evolutionary history of a group of taxa.
[Figure: rooted tree on taxa A, B, C, D, E with its leaf nodes, internal nodes, branches (edges), and root labeled]
Leaf nodes: represent the taxa (genes, populations, etc.) used to infer the phylogeny.
Branches or edges: connect the nodes of the tree.
Internal nodes: represent hypothetical ancestors of the taxa.
Root of the tree: common ancestor of all taxa.
4
Quartet Based Phylogeny Construction
  • Quartet: four taxa (A, B, C, D)
  • Quartet topology: an unrooted tree for a quartet
  • There are three possible quartet topologies for a quartet:

AB|CD
AC|BD
AD|BC
5
Process of Quartet Based Phylogeny Construction
6
Definitions
A quartet topology ab|cd is consistent with a
phylogeny T, or a phylogeny T satisfies a
quartet topology ab|cd, iff a, b, c, d are all
leaves of T and the path from a to b does not
share any nodes with the path from c to d.
7
[Figure: phylogeny T on leaves a, b, c, d, e, f, with the quartet topology ae|cd highlighted]
The quartet topology ae|cd is consistent with T, or T
satisfies ae|cd.
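The definition can be checked directly by comparing the two leaf-to-leaf paths. The following is a minimal Python sketch, writing a quartet topology as two pairs of leaves. The tree shape used here (three cherries a-b, c-d, e-f joined at a centre node) and the internal node names x, y, z, m are assumptions for illustration only, because the original figure is not preserved in this transcript.

```python
from collections import deque

def build_adjacency(edges):
    """Build an undirected adjacency map from a list of tree edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def path_nodes(adj, start, goal):
    """Return the set of nodes on the unique start-goal path of a tree."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    nodes = set()
    node = goal
    while node is not None:
        nodes.add(node)
        node = parent[node]
    return nodes

def is_consistent(adj, a, b, c, d):
    """True iff the topology (a,b | c,d) is consistent with the tree."""
    leaves = {x for x, nbrs in adj.items() if len(nbrs) == 1}
    if len({a, b, c, d}) != 4 or not {a, b, c, d} <= leaves:
        return False
    return path_nodes(adj, a, b).isdisjoint(path_nodes(adj, c, d))

# Hypothetical tree: internal nodes x, y, z, m are illustrative names.
EDGES = [("a", "x"), ("b", "x"), ("c", "y"), ("d", "y"),
         ("e", "z"), ("f", "z"), ("x", "m"), ("y", "m"), ("z", "m")]
T = build_adjacency(EDGES)

print(is_consistent(T, "a", "e", "c", "d"))  # paths a-e and c-d share no node
print(is_consistent(T, "a", "c", "e", "d"))  # both paths pass through m
```

On this assumed tree the first call reports consistency and the second does not, because the paths a-c and e-d both pass through the centre node.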
8
Definitions
Given a quartet topology set Q on a taxon set S,
Q is compatible iff there is a phylogeny on S
which satisfies all the quartet topologies in Q.
A quartet topology set Q is complete iff Q
contains a quartet topology for every set of four
taxa in S.
9
ae|cd ab|cd ab|ce ab|cf ab|de ab|df ab|ef
af|cd ac|ef ad|ef be|cd bf|cd bc|ef
bd|ef cd|ef
Quartet topology set Q on taxon set
S = {a, b, c, d, e, f}
Q is complete
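Completeness is easy to verify mechanically: six taxa give C(6, 4) = 15 four-taxon subsets, and each must be covered by some topology in Q. The sketch below is a minimal Python check; the encoding of a topology as a four-letter string split into two pairs, and the helper names `parse` and `is_complete`, are assumptions made for this illustration.

```python
from itertools import combinations

def parse(text):
    """Turn 'aecd' into the topology ae|cd (an unordered pair of pairs)."""
    return frozenset({frozenset(text[:2]), frozenset(text[2:])})

def is_complete(quartets, taxa):
    """True iff every four-taxon subset of taxa has a topology in quartets."""
    covered = {frozenset().union(*q) for q in quartets}
    return all(frozenset(four) in covered for four in combinations(taxa, 4))

S = "abcdef"
Q = [parse(s) for s in (
    "aecd abcd abce abcf abde abdf abef "
    "afcd acef adef becd bfcd bcef bdef cdef").split()]

print(len(Q), is_complete(Q, S))
```

Dropping any one topology from Q makes the check fail, since that four-taxon subset is then uncovered.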
10
Problem Descriptions
Quartet Compatibility Problem (QCP)
Input: A set Q of quartet topologies on S.
Question: Is Q compatible? Equivalently, is there a phylogeny T
on S such that all quartet topologies in Q are
satisfied?
In practice, the given quartet topology set Q
usually contains errors and thus is incompatible.
11
Known Results
The Quartet Compatibility Problem (QCP) can be solved
in polynomial time if Q is complete, but it is
NP-complete if Q is incomplete.
The Maximum Quartet Consistency Problem (MQC) and the
Minimum Quartet Inconsistency Problem (MQI) are
NP-complete even if Q is complete.
Exact algorithms: guaranteed to find the
optimal ("best") tree.
Heuristic algorithms: approximate or quick-and-dirty methods that
attempt to find the optimal tree but cannot
guarantee to do so.
12
Known Results
Many heuristics exist. The best known approximation
algorithm is Hypercleaning, with an approximation
ratio of [expression not preserved in this transcript] for MQI, where n is the number of
taxa.
13
Research Objectives
  • Exact methods
  • Faster
  • Solve larger instances

14
Outline
  • Introduction
  • Research methods
  • Answer set programming based on ultrametric
    phylogeny
  • Computational results
  • Conclusions and future works

15
Ultrametric Phylogeny and Matrix
  • Ultrametric Phylogeny
  • Label each internal node with a positive integer
  • Along any root-to-leaf path, the labels are
    strictly decreasing

16
Ultrametric Phylogeny and Matrix
Ultrametric Matrix: each entry value is the label
of the least common ancestor of the two leaf nodes.
The matrix is
  • Symmetric, with M(i, i) = 0, and
  • For every triplet (i, j, k), two of the values
    M(i, j), M(j, k), and M(i, k) are equal, and they are
    greater than the third value.

e.g. i = 1, j = 3, k = 4: M(1, 3) = M(3, 4) > M(1, 4)
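The two matrix conditions translate directly into a check. The following Python sketch is one way to write it; the 5 x 5 matrix is a hypothetical example built for illustration (leaves s1..s5 stored at indices 0..4, internal labels 1 to 4), not data taken from the slides.

```python
from itertools import combinations

def is_ultrametric(M):
    """Check symmetry, zero diagonal, and the three-point condition."""
    n = len(M)
    for i in range(n):
        if M[i][i] != 0:
            return False
        for j in range(n):
            if M[i][j] != M[j][i]:
                return False
    for i, j, k in combinations(range(n), 3):
        low, mid, high = sorted((M[i][j], M[j][k], M[i][k]))
        # Two values must be equal and strictly greater than the third.
        if not (mid == high and low < mid):
            return False
    return True

# Hypothetical labeling: (s1, s5) join at 1, (s2, s3) at 2,
# s4 joins (s1, s5) at 3, and the root is labeled 4.
M = [
    [0, 4, 4, 3, 1],
    [4, 0, 2, 4, 4],
    [4, 2, 0, 4, 4],
    [3, 4, 4, 0, 3],
    [1, 4, 4, 3, 0],
]

print(is_ultrametric(M))
```

In this assumed matrix the triplet i = 1, j = 3, k = 4 gives M(1, 3) = M(3, 4) = 4 > M(1, 4) = 3, which has the same shape as the example on the slide.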
17
Theorem 1: A quartet topology ab|cd is consistent
with a phylogeny T iff any ultrametric labeling
scheme M of T satisfies
min{M(a, c), M(b, d)} > min{M(a, b), M(c, d)}.
[Figure: ultrametric phylogeny on leaves s1, s2, s3, s4, s5 with internal nodes labeled 1 to 4]
s1 s5 | s2 s3 is consistent with the tree and
its corresponding matrix: min{M(1, 2), M(5, 3)} = 4 > min{M(1, 5), M(2, 3)} = 1. Condition
satisfied!
An ultrametric matrix satisfies a quartet
topology ab|cd if the above inequality is
satisfied.
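The inequality of Theorem 1 is a one-line test on the matrix. The sketch below applies it in Python to a hypothetical 5 x 5 ultrametric matrix (leaves s1..s5 at indices 0..4). The matrix was chosen so that it reproduces the values 4 and 1 quoted above, but the underlying tree is an assumption, since the figure is not preserved.

```python
def satisfies(M, a, b, c, d):
    """True iff matrix M satisfies the topology (a,b | c,d) per Theorem 1."""
    return min(M[a][c], M[b][d]) > min(M[a][b], M[c][d])

# Hypothetical labeling: (s1, s5) join at 1, (s2, s3) at 2,
# s4 joins (s1, s5) at 3, and the root is labeled 4.
M = [
    [0, 4, 4, 3, 1],
    [4, 0, 2, 4, 4],
    [4, 2, 0, 4, 4],
    [3, 4, 4, 0, 3],
    [1, 4, 4, 3, 0],
]

s1, s2, s3, s4, s5 = range(5)
print(satisfies(M, s1, s5, s2, s3))  # topology s1 s5 | s2 s3
print(satisfies(M, s1, s2, s5, s3))  # topology s1 s2 | s5 s3
print(satisfies(M, s1, s3, s5, s2))  # topology s1 s3 | s5 s2
```

Of the three possible topologies on {s1, s2, s3, s5}, only s1 s5 | s2 s3 passes the test on this matrix, which matches the fact that a tree satisfies exactly one topology per resolved quartet.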
18
Theorem 2 Given a quartet topology set Q on S and
an ultrametric phylogeny T on S, T satisfies k
quartet topologies in Q if and only if the
corresponding ultrametric matrix M on S satisfies
the same k quartet topologies in Q.
We thus transform the original MQC problem into an
ultrametric matrix search problem.
19
Problem Formulation
  • Input
  • An n × n matrix M(i, j); the domain of each matrix
    entry is 0..n-1
  • A quartet topology set Q
  • Goal
  • Find an assignment to M(i, j) such that
  • The matrix is ultrametric
  • The number of quartet topologies satisfied by the
    matrix is maximized
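To make the search problem concrete, the sketch below solves it by plain backtracking in Python for very small n. This is a brute-force stand-in written for illustration, not the answer set programming method of the talk. Assumptions: taxa are numbered 0..n-1, a quartet topology ab|cd is the tuple (a, b, c, d), off-diagonal entries range over 1..n-1, and the function names are hypothetical.

```python
from itertools import combinations

def satisfies(M, a, b, c, d):
    """Theorem 1 test for the topology (a,b | c,d)."""
    return min(M[a][c], M[b][d]) > min(M[a][b], M[c][d])

def triple_ok(x, y, z):
    """Three-point condition: two equal values, both greater than the third."""
    low, mid, high = sorted((x, y, z))
    return mid == high and low < mid

def best_matrix(n, quartets):
    """Return (score, matrix) maximizing the number of satisfied quartets."""
    pairs = list(combinations(range(n), 2))
    M = [[0] * n for _ in range(n)]
    best = [-1, None]

    def search(idx):
        if idx == len(pairs):
            score = sum(satisfies(M, *q) for q in quartets)
            if score > best[0]:
                best[0], best[1] = score, [row[:] for row in M]
            return
        i, j = pairs[idx]
        for value in range(1, n):
            M[i][j] = M[j][i] = value
            # Entries (k, i) and (k, j) with k < i are already assigned,
            # so every triple (k, i, j) can be checked right here.
            if all(triple_ok(M[k][i], M[k][j], value) for k in range(i)):
                search(idx + 1)
        M[i][j] = M[j][i] = 0

    search(0)
    return best[0], best[1]

# Complete, compatible set on 5 taxa, and a copy with one wrong topology.
clean = [(0, 1, 2, 3), (0, 1, 2, 4), (0, 1, 3, 4), (0, 2, 3, 4), (1, 2, 3, 4)]
noisy = [(0, 2, 1, 3)] + clean[1:]

print(best_matrix(5, clean)[0], best_matrix(5, noisy)[0])
```

The exhaustive search is only practical for a handful of taxa, which is the motivation for handing the same constraints to a dedicated solver.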

20
Formulation in Answer Set Programming
Domain
1 { m(1, 2, 1), m(1, 2, 2), m(1, 2, 3), m(1, 2, 4), m(1, 2, 5) } 1
matrix entry (1, 2) takes exactly
one value in the domain 1..5
Ultrametric Constraints
for three matrix values m(i, j), m(j, k) and
m(i, k), two of them are equal and greater than
the third one
Quartet Constraints
if min{m(i, k), m(j, l)} > min{m(i, j), m(k, l)} then
quartet ij|kl is satisfied
Objective
maximize q(i, j, k, l)
21
Outline
  • Introduction
  • Research methods
  • Answer set programming based on ultrametric
    phylogeny
  • A lookahead Branch and Bound algorithm
  • Computational results
  • Conclusions and future works

22
Background
Local conflict: an incompatible set of 3 quartet
topologies on 5 taxa. For example, ab|cd, ac|be
and ac|de.
Theorem: Given a complete set of
quartet topologies Q over a set of taxa S and
some taxon e in S, Q is compatible iff there
exists no local conflict whose taxon set includes
e.
Idea: Construct a list of the local conflicts
involving a taxon e, and then try to resolve all
the local conflicts in the list by changing fewer
than k quartet topologies.
Method: Branch and Bound
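Whether three quartet topologies on five taxa form a local conflict can be decided by brute force, because there are only 15 binary unrooted trees on five leaves. The Python sketch below does this; it is an illustrative check, not the detection procedure used in the talk. A five-leaf tree is encoded as (cherry, middle leaf, cherry), and the helper names are assumptions.

```python
def topo(p, q, r, s):
    """Quartet topology pq|rs as an unordered pair of unordered pairs."""
    return frozenset({frozenset({p, q}), frozenset({r, s})})

def trees5(taxa):
    """Yield all 15 binary trees on five taxa as (cherry, middle, cherry)."""
    taxa = sorted(taxa)
    for mid in taxa:
        rest = [t for t in taxa if t != mid]
        for partner in rest[1:]:
            c1 = frozenset({rest[0], partner})
            yield c1, mid, frozenset(rest) - c1

def induced(tree, four):
    """Topology that the five-leaf tree induces on a four-taxon subset."""
    c1, mid, c2 = tree
    if mid not in four:
        return frozenset({c1, c2})
    full = c1 if c1 <= four else c2   # the cherry that kept both leaves
    return frozenset({full, four - full})

def is_local_conflict(three):
    """True iff the three topologies span 5 taxa and no tree satisfies all."""
    taxa = frozenset().union(*(pair for q in three for pair in q))
    if len(taxa) != 5:
        return False
    return not any(
        all(induced(tree, frozenset().union(*q)) == q for q in three)
        for tree in trees5(taxa))

example = [topo("a", "b", "c", "d"), topo("a", "c", "b", "e"),
           topo("a", "c", "d", "e")]
harmless = [topo("a", "b", "c", "d"), topo("a", "b", "c", "e"),
            topo("a", "b", "d", "e")]

print(is_local_conflict(example), is_local_conflict(harmless))
```

The first triple is the example given above and is reported as a conflict; the second is satisfied by the tree with cherries a-b and d-e and middle leaf c, so it is not.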
23
Lookahead
Contribution of changing a quartet topology: the
difference between the sizes of the local conflict
list before and after changing that quartet
topology.
At each search node, a lookahead mechanism first
tests the contribution of each
possible branch and chooses the one with the maximum
contribution to continue the search.
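The contribution measure can be sketched as "count the local conflicts, apply the change, count again". The Python code below is a minimal illustration under several assumptions: Q is a complete set stored as a dictionary from four-taxon sets to topologies, local conflicts are found by brute force over the 15 five-leaf trees, and all function names are hypothetical rather than taken from the talk.

```python
from itertools import combinations

def topo(p, q, r, s):
    """Quartet topology pq|rs as an unordered pair of unordered pairs."""
    return frozenset({frozenset({p, q}), frozenset({r, s})})

def trees5(taxa):
    """Yield all 15 binary trees on five taxa as (cherry, middle, cherry)."""
    taxa = sorted(taxa)
    for mid in taxa:
        rest = [t for t in taxa if t != mid]
        for partner in rest[1:]:
            c1 = frozenset({rest[0], partner})
            yield c1, mid, frozenset(rest) - c1

def induced(tree, four):
    """Topology that the five-leaf tree induces on a four-taxon subset."""
    c1, mid, c2 = tree
    if mid not in four:
        return frozenset({c1, c2})
    full = c1 if c1 <= four else c2
    return frozenset({full, four - full})

def is_local_conflict(three, five):
    """True iff no tree on the five taxa satisfies all three topologies."""
    return not any(
        all(induced(tree, frozenset().union(*q)) == q for q in three)
        for tree in trees5(five))

def conflict_list(Q, taxa, e):
    """All local conflicts whose five-taxon set contains taxon e."""
    conflicts = []
    others = [t for t in taxa if t != e]
    for four in combinations(others, 4):
        five = frozenset(four) | {e}
        qs = [Q[frozenset(c)] for c in combinations(sorted(five), 4)]
        conflicts += [t for t in combinations(qs, 3)
                      if is_local_conflict(t, five)]
    return conflicts

def contribution(Q, taxa, e, four, new_topology):
    """Conflict-list size before the change minus the size after it."""
    changed = {**Q, four: new_topology}
    return len(conflict_list(Q, taxa, e)) - len(conflict_list(changed, taxa, e))

# Complete set from the tree with cherries a-b and d-e, then one error.
good = [topo(*"abcd"), topo(*"abce"), topo(*"abde"),
        topo(*"acde"), topo(*"bcde")]
Q = {frozenset().union(*t): t for t in good}
Q_noisy = {**Q, frozenset("abcd"): topo("a", "c", "b", "d")}

print(contribution(Q_noisy, "abcde", "e", frozenset("abcd"), topo(*"abcd")))
```

A lookahead step would evaluate `contribution` for every candidate change at the current search node and branch on the one with the largest value.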
24
Outline of Algorithm
At every node in the search tree:
1. Test whether to cut the node
    (the test uses m, the number of local conflicts; k, the
    maximum number of quartet errors; and k1, the number of
    quartet topologies changed so far)
2. Determine need-to-be-changed quartet topologies
    (if there are 3(k-k1) distinct local conflicts
    involving q, then q must be changed)
3. Determine need-to-be-fixed quartet topologies
    (find optimal edges; all the quartet
    topologies consistent with the optimal edges are
    fixed)
4. Use the quartet inference rules on the quartet
    topologies generated in step 3

25
Outline of Algorithm-Contd
5. Build a local conflict list and partition it
into two parts.
IF there are need-to-be-changed quartet topologies:
    pick the need-to-be-changed quartet topology
    achieving the largest contribution to resolve
ELSE:
    pick the resolution achieving
    the largest contribution
26
Experimental Results
Running times of the exact algorithms, compared
with the Fixed Parameter (GN) Algorithm, and the
Dynamic Programming algorithm (DP)
27
Experimental Results
Running times on large taxon sets
28
Conclusions
  • The answer set programming formulation gives a
    new perspective of the MQC problem.
  • The proposed exact algorithms outperform other
    exact algorithms significantly.
  • In general problem instances, the answer set
    programming method has the greatest efficiency.
  • If the number of quartet errors is small and the quartet
    topology set is complete, the lookahead branch
    and bound algorithm has the greatest efficiency.

29
Future Works
  • Design a quartet specific answer set programming
    solver
  • Solve some special instances of MQC problem in
    polynomial time.