Title: Ch.6 Phylogenetic Trees
1. Ch. 6 Phylogenetic Trees
2. Contents
- Phylogenetic Trees
- Character State Matrix
- Perfect Phylogeny
- Binary Character States
- Two Characters
- Distance Matrix
- Additive Trees
- Ultrametric Trees
- Agreement (Isomorphic) between Phylogenies
3. Phylogenetic Trees (Phylogenies)
- Explain the evolutionary history of today's species (Figure 6.1)
  - A phylogeny is a hypothesis: we do not have enough data about distant ancestors of present-day species
  - Characteristic
  - Leaf: an object or a set of objects; interior node: hypothetical ancestor objects
  - Unrooted tree
- Input data for phylogeny reconstruction is classified into two main categories
  - Character state matrix
  - Distance matrix
4. Character State Matrix
- Characters have the following features
  - Independent inheritance
  - Homologous
- Character state matrix
  - A matrix M with n rows (objects) and m columns (characters)
  - $M_{ij}$ denotes the state that object i has for character j
  - Each row is the state vector for an object
5. Difficulties in creating a phylogeny from a character state matrix
- Convergence or parallel evolution
  - Breaks the assumption that objects that share the same state are genetically closer than objects that do not
- Reversal
  - Gains and losses of the character
- Therefore, assume that convergence and reversal do not happen, or that their number should be minimized
- Characters can be ordered or unordered, and possibly directed
6. Perfect Phylogeny Problem
- For each state s of each character c, the set of all nodes u (leaves and interior nodes) for which the state is s with respect to c must form a subtree of T
- Characters are compatible if the set of objects defined by the character state matrix admits a perfect phylogeny
7. Example
8. Perfect Phylogeny Problem
- How many different trees can we build for n objects?
  - Consider only unrooted binary trees
  - $\prod_{i=3}^{n} (2i - 5)$ for $n \ge 3$
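To get a feel for how quickly the search space grows, the standard count of unrooted binary trees on n labeled leaves, $(2n-5)!! = \prod_{i=3}^{n}(2i-5)$, can be evaluated directly. This is a minimal sketch; the function name `count_unrooted_binary_trees` is made up for illustration.

```python
def count_unrooted_binary_trees(n):
    """Number of unrooted binary trees with n labeled leaves (n >= 3)."""
    if n < 3:
        raise ValueError("the count is defined here for n >= 3")
    count = 1
    # Adding the i-th leaf can split any of the 2i - 5 existing edges.
    for i in range(3, n + 1):
        count *= 2 * i - 5
    return count


for n in (3, 4, 5, 10):
    print(n, count_unrooted_binary_trees(n))
```

Already at n = 10 there are more than two million candidate trees, which is why exhaustive enumeration is not an option.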
9. Binary Character States
- Two-phase algorithm (runs in time O(nm))
  - Decide whether the input matrix M admits a perfect phylogeny
  - Construct one possible phylogeny
- Assume that state 0 is ancestral and state 1 is derived
10. Deciding perfect phylogeny
- A rooted tree T is a perfect phylogeny for input matrix M if
  - To every character in M there corresponds an edge in T, and this edge marks the transition from state 0 to state 1 for that character
  - Edges are labeled by their respective characters and the root has character state vector (0, 0, ..., 0)
11. Deciding perfect phylogeny
- Definition 6.1: For each column j of M, let $O_j$ be the set of objects whose state is 1 for j. Let $\overline{O}_j$ be the set of objects whose state is 0 for j
- Lemma 6.1: A binary matrix M admits a perfect phylogeny if and only if for each pair of characters i and j the sets $O_i$ and $O_j$ are disjoint or one of them contains the other
12. Deciding perfect phylogeny
- Example: Table 6.2
  - $O_1 = \{B, D\}$, $O_2 = \{B\}$, $O_3 = \{D\}$
  - $O_4 = \{A, C, E\}$, $O_5 = \{A, C\}$, $O_6 = \{C\}$
- Applying Lemma 6.1 directly in the decision phase takes $O(nm^2)$
- Figure 6.5: Algorithm Perfect Binary Phylogeny Decision -> O(nm)
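The direct $O(nm^2)$ test from Lemma 6.1 can be sketched as a pairwise comparison of the sets $O_j$. The matrix below is rebuilt from the sets listed for Table 6.2 (rows A to E, columns 1 to 6); that layout is an assumption, since the table itself is not reproduced here, and the function names are illustrative.

```python
from itertools import combinations


def one_sets(matrix):
    """O_j for every column j: the set of row indices whose state is 1."""
    m = len(matrix[0]) if matrix else 0
    return [{i for i, row in enumerate(matrix) if row[j] == 1} for j in range(m)]


def admits_perfect_phylogeny(matrix):
    """Lemma 6.1: every pair of sets is disjoint or nested."""
    for a, b in combinations(one_sets(matrix), 2):
        if a & b and not (a <= b or b <= a):
            return False
    return True


# Rows A, B, C, D, E; assumed reconstruction from O_1 .. O_6.
M = [
    [0, 0, 0, 1, 1, 0],  # A
    [1, 1, 0, 0, 0, 0],  # B
    [0, 0, 0, 1, 1, 1],  # C
    [1, 0, 1, 0, 0, 0],  # D
    [0, 0, 0, 1, 0, 0],  # E
]
print(admits_perfect_phylogeny(M))
```

Each of the $O(m^2)$ column pairs costs up to $O(n)$ set work, which matches the bound quoted for the direct approach.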
13. Deciding perfect phylogeny
- if $L_{ij} \ne L_{lj}$ for some i, l and both $L_{ij}$ and $L_{lj}$ are nonzero then
  - return FALSE
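The comparison of L values within a column can be sketched as a single left-to-right scan over the columns sorted by decreasing number of 1s. This is a sketch under assumptions, not the figure's exact pseudocode: it drops duplicate columns, uses 0 to mean "no earlier column with a 1", and compares L values only across rows that hold a 1 in the current column. Python's built-in sort is used for brevity, so the sketch does not attain the O(nm) bound, which needs a linear-time sort.

```python
def decide_perfect_phylogeny(matrix):
    """Return True if all L values agree within every column."""
    n = len(matrix)
    m = len(matrix[0]) if n else 0
    # Distinct columns, those with the most 1s first.
    columns = sorted({tuple(row[j] for row in matrix) for j in range(m)},
                     key=sum, reverse=True)
    # last_one[i]: 1-based index of the latest scanned column with a 1 in row i.
    last_one = [0] * n
    for j, column in enumerate(columns, start=1):
        rows = [i for i in range(n) if column[i] == 1]
        # L_ij for this column is last_one[i]; all rows must agree.
        if len({last_one[i] for i in rows}) > 1:
            return False
        for i in rows:
            last_one[i] = j
    return True


print(decide_perfect_phylogeny([[1, 1], [1, 0], [0, 0]]))   # nested columns
print(decide_perfect_phylogeny([[1, 0], [1, 1], [0, 1]]))   # overlapping columns
```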
14. Constructing a perfect phylogeny
- Figure 6.6: Algorithm Perfect Binary Phylogeny Construction
  - Running time O(nm)
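One way to picture the construction phase is to insert each object's characters, in order of decreasing column size, into a trie whose edges are labeled by characters. The sketch below assumes the matrix has already passed the decision phase; the nested-dict tree layout and the name `build_phylogeny` are illustrative choices, not the data structures of Figure 6.6.

```python
def build_phylogeny(matrix, names):
    """Build a rooted tree whose edges are labeled by column indices."""
    m = len(matrix[0]) if matrix else 0
    # Columns with more 1s sit closer to the root.
    order = sorted(range(m), key=lambda j: sum(row[j] for row in matrix),
                   reverse=True)
    root = {"children": {}, "objects": []}
    for name, row in zip(names, matrix):
        node = root
        for j in order:
            if row[j] == 1:
                # Follow (or create) the edge marking the 0 -> 1 change of j.
                node = node["children"].setdefault(
                    j, {"children": {}, "objects": []})
        node["objects"].append(name)  # the object hangs off this node
    return root


M = [
    [0, 0, 0, 1, 1, 0],  # A
    [1, 1, 0, 0, 0, 0],  # B
    [0, 0, 0, 1, 1, 1],  # C
    [1, 0, 1, 0, 0, 0],  # D
    [0, 0, 0, 1, 0, 0],  # E
]
tree = build_phylogeny(M, ["A", "B", "C", "D", "E"])
print(sorted(tree["children"]))
```

When the input admits a perfect phylogeny, every character ends up labeling exactly one edge, and the root corresponds to the all-zero state vector.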
15. Unordered binary character
- The majority state becomes 0 and the other becomes 1
- If the two states are equally frequent, choose either one to be 0 and the other to be 1
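The recoding rule above can be sketched per column as follows. The helper name `recode_column` is hypothetical; on a tie the sketch simply lets the state that appears first become 0, which is one of the two allowed choices.

```python
from collections import Counter


def recode_column(values):
    """Map the majority state of a two-state column to 0 and the other to 1."""
    counts = Counter(values)
    if len(counts) > 2:
        raise ValueError("expected a character with at most two states")
    majority = counts.most_common(1)[0][0]
    return [0 if value == majority else 1 for value in values]


print(recode_column(["a", "b", "a", "a"]))
print(recode_column([1, 1, 0, 1]))
```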
16. Two characters
- Characters may be unordered and have an arbitrary number of states, but the number of characters is restricted to at most two
- Definition 6.2: A triangulated graph is an undirected graph in which any cycle with four or more vertices has a chord, that is, an edge joining two nonconsecutive vertices of the cycle
- Theorem 6.1: To every collection of subtrees $T_1, T_2, \ldots, T_l$ of a tree T there corresponds a triangulated graph, and vice versa
17. Two characters
- Definition 6.3: An intersection graph for a collection C of sets is the graph G that we get by mapping each set in C to a vertex of G, and linking two vertices in G by an edge if the corresponding sets have a nonempty intersection
- Definition 6.4: Given a graph $G = (V, E)$ with a coloring c on V, we say that G can be c-triangulated if there exists a triangulated graph $H = (V, E')$ such that $E \subseteq E'$ and c is a valid coloring for H. In other words, any edge present in $E'$ but not in $E$ must link two vertices with different colors
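Definition 6.3 translates almost literally into code. In this minimal sketch the vertices are the positions of the sets in the input list and edges are returned as index pairs; both conventions are assumptions made for illustration.

```python
from itertools import combinations


def intersection_graph(sets):
    """Edges (i, j), i < j, for every pair of sets that share an element."""
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(sets), 2)
        if a & b
    }


collection = [{"A", "B"}, {"B", "C"}, {"D"}]
print(intersection_graph(collection))
```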
18. Two characters
- Theorem 6.2: A character state matrix M, with a character set defining a coloring c, admits a perfect phylogeny if and only if its corresponding state intersection graph (SIG) can be c-triangulated
- Theorem 6.3: A character state matrix M with only two characters admits a perfect phylogeny if and only if its corresponding SIG is acyclic
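For two characters, the acyclicity test of Theorem 6.3 can be sketched with a union-find structure. The sketch assumes each object is given as a pair (state of character 1, state of character 2); SIG vertices are (character, state) pairs, and two vertices are joined when some object has both states. All names are illustrative.

```python
def two_character_perfect_phylogeny(rows):
    """True if the state intersection graph of a two-column matrix is acyclic."""
    # One edge per distinct state pair that occurs in some object.
    edges = {(("c1", a), ("c2", b)) for a, b in rows}
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # this edge would close a cycle
        parent[root_u] = root_v
    return True


print(two_character_perfect_phylogeny([(0, 0), (0, 1), (1, 1)]))
print(two_character_perfect_phylogeny([(0, 0), (0, 1), (1, 0), (1, 1)]))
```

The second call contains all four combinations of two states, so its SIG is a 4-cycle and the test fails.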
19. Example
20. Reconstruction algorithm for two characters
- Running time O(n)
  - Test for acyclicity -> O(n)
  - Reconstruction of the perfect phylogeny -> O(n)
21. Parsimony and Compatibility
- Real character state matrices are unlikely to admit perfect phylogenies
  - Experimental data always carries errors
  - The assumptions (no reversals and no convergence) are sometimes violated
- Two approaches
  - Parsimony criterion: allow reversal and convergence events, but try to minimize their occurrence
  - Compatibility criterion: find a maximum set of characters that are compatible -> exclude the characters that cause the problem
22. Algorithms for Distance Matrices
- Problem of reconstructing trees based on comparative numerical data between n objects, given as a distance matrix M
- Consider two problems
  - Reconstructing Additive Trees
  - Reconstructing Ultrametric Trees
23. Reconstructing Additive Trees
- Metric space
  - A set of objects O such that to every pair $i, j \in O$ is associated a nonnegative real number $d_{ij}$ with the following properties
  - $d_{ij} > 0$ for $i \ne j$,
  - $d_{ij} = 0$ for $i = j$,
  - $d_{ij} = d_{ji}$ for all i and j,
  - $d_{ij} \le d_{ik} + d_{kj}$ for all i, j, and k (the triangle inequality)
- M and T are additive when
  - The tree has n leaves
  - Leaves are nodes with degree one; the others have degree three
  - All edges in the tree have nonnegative weight
  - The weight of the path between any two leaves i and j is equal to $M_{ij}$
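Before attempting any tree reconstruction it can be useful to confirm that the input really is a metric. The sketch below checks the four metric-space properties listed above on a square matrix given as a list of lists; the tolerance parameter `tol` is an added assumption to cope with floating-point input.

```python
def is_metric(d, tol=1e-9):
    """Check positivity, zero diagonal, symmetry and the triangle inequality."""
    n = len(d)
    for i in range(n):
        if abs(d[i][i]) > tol:
            return False                      # d_ii must be 0
        for j in range(n):
            if i != j and d[i][j] <= 0:
                return False                  # d_ij > 0 for i != j
            if abs(d[i][j] - d[j][i]) > tol:
                return False                  # symmetry
            for k in range(n):
                if d[i][j] > d[i][k] + d[k][j] + tol:
                    return False              # triangle inequality
    return True


print(is_metric([[0, 3, 9], [3, 0, 10], [9, 10, 0]]))
print(is_metric([[0, 1, 5], [1, 0, 1], [5, 1, 0]]))
```

The triple loop makes this an $O(n^3)$ check, which is acceptable as a one-off sanity test.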
24. Reconstructing Additive Trees
- Lemma 6.2: A metric space O is additive if and only if any four objects of O can be labeled i, j, k, and l such that
  - $d_{ij} + d_{kl} = d_{ik} + d_{jl} \ge d_{il} + d_{jk}$
- If M is additive, T is unique (algorithm runs in time $O(n^2)$)
- Real-life distance matrices are rarely additive due to errors in the distance measurement
  - Obtain a tree that is as close as possible to an additive tree
  - Turn to a version of the problem that is tractable
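The four-point condition of Lemma 6.2 can be checked by brute force: for every four objects, the two largest of the three pairwise sums must coincide. This sketch assumes the input is already a metric, and its $O(n^4)$ cost is only meant for small examples, not as a replacement for the $O(n^2)$ reconstruction algorithm. The tolerance `tol` is an added assumption.

```python
from itertools import combinations


def is_additive(d, tol=1e-9):
    """Four-point condition on a symmetric distance matrix."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted([d[i][j] + d[k][l],
                       d[i][k] + d[j][l],
                       d[i][l] + d[j][k]])
        # The two largest sums must be equal.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Leaf distances of a tree with pendant edges 1, 2, 3, 4 and inner edge 5.
d = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]
print(is_additive(d))
```

Here $d_{02} + d_{13} = d_{03} + d_{12} = 20 \ge d_{01} + d_{23} = 10$, so the labeling required by the lemma exists.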
25. Reconstructing Ultrametric Trees
- Given two distance matrices, $M^l$ and $M^h$, reconstruct an evolutionary tree such that the distances measured on the tree fit between these two input matrices (sandwich constraints, $M^l_{ij} \le d_{ij} \le M^h_{ij}$, where $d_{ij}$ is the distance between i and j measured on the tree)
- A tree is ultrametric when it is additive and can be rooted in such a way that the lengths of all leaf-root paths are equal -> the objects being studied have evolved at an equal rate from a common ancestor
26. Reconstructing Ultrametric Trees
- Link of a and b in the MST T: $(a, b)_{max}$
  - The largest-weight edge in the unique path from a to b in T
- Definition 6.5: The cut-weight of an edge e of the minimum spanning tree of $G^h$ is given by
  - $CW(e) = \max \{ M^l_{ab} \mid e = (a, b)_{max} \}$
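The link of two objects can be computed by building a minimum spanning tree and walking the unique tree path between them. The sketch below uses Prim's algorithm on a complete weighted graph given as a symmetric matrix, which is an assumed representation of the graph built from the upper-bound matrix; the function names are illustrative, and ties between equal edge weights are broken arbitrarily.

```python
def minimum_spanning_tree(w):
    """Prim's algorithm on a complete graph; returns a list of edges (u, v)."""
    n = len(w)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [float("inf")] * n
    parent = [None] * n
    best[0] = 0
    edges = []
    for _ in range(n):
        # Cheapest vertex not yet in the tree.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] is not None:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v] and w[u][v] < best[v]:
                best[v], parent[v] = w[u][v], u
    return edges


def link(edges, w, a, b):
    """Largest-weight edge on the tree path from a to b (None if a == b)."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    heaviest = {a: None}  # heaviest edge seen on the path from a to each node
    stack = [a]
    while stack:
        u = stack.pop()
        for v in adjacency.get(u, []):
            if v not in heaviest:
                top = heaviest[u]
                if top is None or w[u][v] > w[top[0]][top[1]]:
                    top = (u, v)
                heaviest[v] = top
                stack.append(v)
    return heaviest[b]


w = [
    [0, 1, 4, 5],
    [1, 0, 2, 6],
    [4, 2, 0, 3],
    [5, 6, 3, 0],
]
T = minimum_spanning_tree(w)
print(T, link(T, w, 0, 3))
```

Prim's algorithm on a dense matrix runs in $O(n^2)$, in line with the overall bound quoted for the reconstruction.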
27. Reconstructing Ultrametric Trees
- Reconstruction algorithm -> runs in time $O(n^2)$
  - Compute an MST T of $G^h$
  - Construct R
  - Compute CW(e)
  - Build the ultrametric tree U
28. Agreement between Phylogenies
- In practice it occurs quite often that two different methods applied to the same data yield different trees (in the topological sense)
- Definition 6.6: We say that a tree $T_r$ refines another tree $T_s$ whenever $T_r$ can be transformed into $T_s$ by contracting selected edges from $T_r$. Two trees $T_1$ and $T_2$ agree when there exists a tree $T_3$ that refines both
29. Isomorphic
- Two trees $T_1$ and $T_2$ are isomorphic when there is a one-to-one correspondence between their nodes such that for every pair u, v of corresponding nodes, $u \in T_1$ and $v \in T_2$, the objects contained in leaves below u are the same as the objects contained in leaves below v
- Binary Tree Isomorphism
  - Figure 6.21: runs in time O(n)
- General case (leaves contain several objects)
  - Figure 6.22: runs in time O(n)
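The definition of isomorphism can be tested directly by comparing, for every node, the set of objects found in the leaves below it. This sketch is not the linear-time procedure of Figures 6.21 and 6.22: it assumes rooted trees written as nested tuples with one object name (a string) per leaf, and it assumes every interior node has at least two children, so that distinct nodes have distinct object sets.

```python
def clusters(tree):
    """Set of object sets, one per node of a nested-tuple tree."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            below = frozenset([node])          # a leaf holding one object
        else:
            below = frozenset().union(*(visit(child) for child in node))
        found.add(below)
        return below

    visit(tree)
    return found


def isomorphic(t1, t2):
    """Same objects below corresponding nodes, regardless of child order."""
    return clusters(t1) == clusters(t2)


print(isomorphic((("A", "B"), "C"), ("C", ("B", "A"))))
print(isomorphic((("A", "B"), "C"), (("A", "C"), "B")))
```

Swapping the order of children does not change the result, while regrouping the leaves does.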