Bioinformatics Algorithms and Data Structures

Slides: 54

Provided by: john244

Learn more at: https://cse.sc.edu

1
Bioinformatics Algorithms and Data Structures

Chapter 17.2-3: Strings and Evolutionary Trees
Lecturer: Dr. Rose
Slides by Dr. Rose
April 5, 2007

2
Next Homework Due 4/19/07

3
8
Note: there are three parts to this question.
Only answer the first two parts, i.e.,
show that if D is ultrametric and D(i, i) = 0 for
each i, then D is also additive.
Show the converse is not true.
14 (Grad students only.)

3
Additive Distance Trees

Real data is rarely ultrametric.
A weaker constraint is that data be additive.
Recall: additive distances are distances which
can be fitted to an unrooted tree such that
pairwise taxa distances are equal to the sum of
the branch lengths connecting them.

4
Additive Distance Trees

Consider the relationship between additive and
ultrametric trees:
Q: Are all ultrametric trees additive?
A: Yes.
Q: Are all additive trees ultrametric?
A: No.

5
Additive Distance Trees

Assuming that:
D is a symmetric n by n distance matrix
D contains only zero values on the diagonal
D contains only positive off-diagonal values
T is an edge-weighted tree with at least n nodes,
n of which are labeled by the rows of D, then
Defn. T is an additive tree for D if, for every
pair of labeled nodes (i, j), the path from i to
j has total weight exactly D(i, j).
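
The definition can be checked directly by walking the tree from each labeled node. The sketch below is only an illustration: the function name `is_additive_tree`, the edge-list representation, and the convention that labeled nodes are numbered 0..n-1 (any other ids are unlabeled internal nodes) are all assumptions, and the input is assumed to really be a tree.

```python
def is_additive_tree(edges, D):
    """Check the additive-tree definition for tree `edges` and matrix D.

    edges: list of (u, v, weight) describing an undirected tree (assumed).
    D:     symmetric n x n matrix; labeled nodes are 0..n-1 (assumed),
           any other node id is an unlabeled internal node.
    """
    # Build an adjacency list for the undirected tree.
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))

    n = len(D)
    for i in range(n):
        # Depth-first walk from labeled node i, accumulating path weight.
        dist = {i: 0}
        stack = [i]
        while stack:
            u = stack.pop()
            for v, w in adj.get(u, []):
                if v not in dist:
                    dist[v] = dist[u] + w
                    stack.append(v)
        # Every labeled node j must be reachable at distance exactly D(i, j).
        if any(dist.get(j) != D[i][j] for j in range(n)):
            return False
    return True
```

For instance, a star with one unlabeled center (id 3) and three unit-weight edges is an additive tree for the matrix whose off-diagonal entries are all 2, even though it has more than n nodes.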

6
Additive Distance Trees

Additive tree problem: given
symmetric matrix D
zero entries on the diagonal
positive off-diagonal values
Find additive tree T for D or determine that one
does not exist.
Imagine that you have a distance matrix D
representing evolutionary distance between pairs
of taxa.
Q: Do you expect the additive tree for D to be
unique?

7
Additive Distance Trees

Q: Is there a unique additive tree for D?
If you think the answer is yes, why?
If you think the answer is no, why?
Consider what we know about D and T:
Is T's branching pattern consistent with D?
(y/n)
Are the edge lengths in T consistent with D?
(y/n)
Does D specify directed edges? (y/n)
Does D imply directed edges? (y/n)

8
Additive Distance Trees
(Table and figure from
http://imbs.massey.ac.nz/Research/MolEvol/Farside/DNA/00312.html)

9
Additive Distance Trees

Concept: Additive tree problem
Given n by n symmetrical matrix D
with zero diagonal entries
positive off-diagonal values
Find additive tree or determine none exists.

10
Additive Distance Trees

Concept: Compact additive tree problem
Given n by n symmetrical matrix D
with zero diagonal entries
positive off-diagonal values
Find additive tree with exactly n nodes.
Q: What does this definition say about the
topology of the tree?
A: For every node, there must be a corresponding
row in D.

11
Additive Distance Trees

Consider the symmetrical matrix D above and the
tree T.
Q: Is T an additive tree for D?
Q: Is T a compact additive tree for D?

12
Additive Distance Trees

Consider the symmetrical matrix D above.
Q: Does D have an additive tree?
Q: Does D have a compact additive tree?

13
Additive Distance Trees

Defn. Let G(D) be the n-node complete graph
corresponding to D where nodes are labeled 1, ..., n
and edges have weight D(i, j).
Thm. If there is a compact additive tree T for D,
then T must be the unique minimum spanning tree
of G(D).

14
Additive Distance Trees

Proof. Let
T be a compact additive tree for D.
e = (x, y) be any edge of G(D) not in T.
We know:
The path from x to y in T has total weight D(x, y).
The edge weight of e is also D(x, y).
Since e is not in T, that path has at least two
edges, all of positive weight, so the weight of e
is strictly greater than the weight of any edge on
the path from x to y in T.

15
Additive Distance Trees

Proof. continued
Assume that there is some other minimum spanning
tree T' containing e.
Removing e splits T' into two sets of nodes, S and
S'.
WLOG, S contains x and S' contains y.
The path from x to y in T must cross from S to S',
so it contains an edge e' that connects a node in
S to a node in S'.
Hence the weight of e' is less than the weight of e.

16
Additive Distance Trees

Proof. continued
Create a new spanning tree T'' by removing e from
T' and adding e'.
The total edge weight of T'' is less than that of T'.
This contradicts the assumption that T' is a
minimum spanning tree.
So no minimum spanning tree contains an edge
outside T, and
T must itself be the unique minimum spanning tree
of G(D).

17
Additive Distance Trees

How can we use this theorem to solve the compact
additive tree problem in O(n^2) time?
Answer:
Construct G(D) from D.
Use an O(n^2) MST algorithm, such as Prim's
algorithm, that extends a single growing tree T.
When an edge e = (x, y) is added to T, where x is
already in T:
Compute d(i, y) = d(i, x) + D(x, y) for all i in
T. This takes O(n) per iteration and O(n^2) for
all of T.
Verify d(i, y) = D(i, y).
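
The answer above can be sketched as follows. This is a minimal illustration rather than the lecture's own code: the name `compact_additive_tree`, the list-of-lists matrix, 0-based node labels, and the use of exact equality (so integer or otherwise exact distances are assumed) are all choices made here.

```python
def compact_additive_tree(D):
    """Return the edges (x, y, weight) of a compact additive tree for D,
    or None if D has no compact additive tree.

    D: symmetric n x n matrix, zero diagonal, positive off-diagonal.
    """
    n = len(D)
    in_tree = [False] * n
    best = [float("inf")] * n          # cheapest edge weight linking v to the tree
    parent = [-1] * n                  # tree endpoint of that cheapest edge
    d = [[0] * n for _ in range(n)]    # path weights inside the growing tree
    members = []                       # nodes already in the tree
    edges = []
    best[0] = 0                        # start Prim's algorithm at node 0

    for _ in range(n):
        # O(n) scan for the node outside the tree that is closest to it.
        y = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        x = parent[y]
        if x != -1:
            edges.append((x, y, D[x][y]))
            # Extend tree path weights to y and verify them against D.
            for i in members:
                d[i][y] = d[y][i] = d[i][x] + D[x][y]
                if d[i][y] != D[i][y]:
                    return None
        in_tree[y] = True
        members.append(y)
        # Standard Prim update of the cheapest connecting edges.
        for v in range(n):
            if not in_tree[v] and D[y][v] < best[v]:
                best[v] = D[y][v]
                parent[v] = y
    return edges
```

Each of the n iterations does O(n) work for the scan, the path-weight update, and the Prim update, giving O(n^2) overall. If the verification fails for the spanning tree that Prim's algorithm finds, the theorem implies that no compact additive tree exists.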

18
Parsimony

Q: What is parsimony?
A: Parsimony: extreme or excessive frugality.
Q: So what does frugal mean?
A: Frugal: thrifty, economical.
In this chapter, parsimony is a character-based
method for reconstructing evolutionary history.
Characters are attributes, traits
In this section we will look at highly
constrained trees that express evolutionary
history.

19
Parsimony

Can be used to deduce evolutionary trees
Specifies branching order
Does not specify divergence times
Can be used as basis for a taxonomy
This section is a limited introduction to maximum
parsimony problems
Binary-character problems
Focus on perfect phylogeny problem

20
Parsimony

Defn. Let M be an n by m, binary matrix
representing n objects with m character traits.
Since M is binary, each character trait has two
possible states, 1 or 0.
Cell (p, i) of M has value 1 iff object p has
character i.
M has a flavor similar to the old chestnut animal
guessing program that uses a binary tree.

21
Parsimony

Defn. A phylogenetic tree for M is a rooted tree
T with exactly n leaves such that:
Each of the n leaves is labeled by exactly one
object.
Each of the m character-traits labels exactly one
edge of T.
For any object p, the character-traits labeling
the edges along the path from the root to p are
exactly those character-traits whose state in p
is one.

22
Parsimony

Consider the matrices below:
does either M1 or M2 have a phylogenetic tree?
If so, what does the tree look like?

23
Parsimony

Q: What is the interpretation of the phylogenetic
tree?
A: It is an estimate of the divergent
evolutionary history of the objects. (It does not
give time.)
The root represents an ancestor with none of the
m character-traits.
Each character-trait transitions from 0 to 1 only
once.
No character-trait ever transitions from 1 to 0.

24
Parsimony

Q: In what sense are phylogenetic trees
parsimonious?
A: Each character-trait labels exactly 1 edge of
the tree. The biological assumptions are:
The root represents an ancestor with none of the
m character-traits.
Each character-trait transitions from 0 to 1 only
once.
No character-trait ever transitions from 1 to 0.

25
Parsimony

Q: What character-traits can be used?
Morphological features
(from http://anthro.palomar.edu/hominid/australo_2.htm)
(Also see http://www.cfsan.fda.gov/frf/rfe3pc00.html)

26
Parsimony

Q: What character-traits can be used?
Morphological features
Gross anatomical features
OTU-specific esoterica
DNA-based characters
Specific substring patterns
Specific nucleotides in fixed positions
See pages 460-461 for more discussion.

27
Parsimony

Defn. Perfect phylogeny problem: given the binary
matrix M, determine if there is a phylogenetic
tree for M; if there is one, build it.
We will discuss an O(nm)-time algorithm.
First we need to preprocess M:
Consider each column as a binary number
(most significant bit in row 1).
Sort the columns in decreasing order.
Let M' denote the reordered matrix M.
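
The preprocessing step can be sketched as below. Note the assumptions: `sort_columns` is a name invented here, M is a list of 0/1 rows, and Python's built-in comparison sort is used for brevity, so this sketch runs in O(nm log m) rather than the O(nm) that a radix sort would give.

```python
def sort_columns(M):
    """Reorder the columns of binary matrix M in decreasing order,
    reading each column as a binary number with row 1 as the msb.

    Returns (order, M_sorted): order[pos] is the original index of the
    column now at position pos.
    """
    m = len(M[0])
    # A column read top to bottom; comparing these tuples lexicographically
    # is the same as comparing the binary numbers they spell.
    cols = [tuple(row[k] for row in M) for k in range(m)]
    order = sorted(range(m), key=lambda k: cols[k], reverse=True)
    M_sorted = [[row[k] for k in order] for row in M]
    return order, M_sorted


order, M_sorted = sort_columns([[0, 1, 1],
                                [1, 0, 1],
                                [0, 0, 1]])
# Columns as binary numbers are 010, 100, 111, so the decreasing order
# is column 2, then column 1, then column 0.
```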

28
Parsimony

Example.

29
Parsimony

Defn. For any column k of M', let O_k be the set
of objects with a one in column k.
Obs. If O_j is a proper subset of O_k, then column
k must be to the left of column j in M'.

30
Parsimony

Thm. Matrix M' has a phylogenetic tree iff for
every pair of columns i, j, either O_i and O_j are
disjoint or one contains the other.
Proof. (Sketch starting on next slide)
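
The condition in the theorem can be tested directly, pair by pair. The sketch below is a naive check that compares all pairs of columns, so it is slower than the O(nm) algorithm discussed later; the function name `pairwise_condition_holds` and the list-of-rows matrix are assumptions made for illustration.

```python
def pairwise_condition_holds(M):
    """True iff, for every pair of columns i, j of binary matrix M,
    the object sets O_i and O_j are disjoint or one contains the other."""
    n = len(M)
    m = len(M[0]) if M else 0
    # O[k] is the set of objects (row indices) with a one in column k.
    O = [{p for p in range(n) if M[p][k] == 1} for k in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            overlap = O[i] & O[j]
            nested = O[i] <= O[j] or O[j] <= O[i]
            if overlap and not nested:
                return False
    return True
```

For example, the three-object matrix with columns {0, 1} and {0, 2} fails the test, because the two sets share object 0 but neither contains the other.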

31
Parsimony

Proof. (⇒)
Let T be the phylogenetic tree for M.
Consider characters i, j.
Let e_j be the edge on which character j
transitions from 0 to 1.
Let e_i be the edge on which character i
transitions from 0 to 1.
Objects with character i are below e_i in T.
Objects with character j are below e_j in T.

32
Parsimony

Proof. (⇒)
There are 4 possible cases:
1. e_i = e_j.
2. e_i is on the path from the root to e_j.
3. e_j is on the path from the root to e_i.
4. The paths diverge before reaching e_i or e_j.
In case 1, O_i = O_j.
In case 2, O_j ⊆ O_i since all objects possessing j
possess i.
In case 3, O_i ⊆ O_j since all objects possessing i
possess j.
In case 4, O_i ∩ O_j = ∅.

33
Parsimony

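Proof. (⇐) Assume that for all i, j, O_i and O_j are
disjoint or one contains the other.
Consider objects p and q.
Let k be the largest character (rightmost column
of M') common to both.
All characters i < k possessed by p are also
possessed by q: O_i and O_k both contain p, so one
contains the other, and since column i is to the
left of column k, O_i contains O_k and hence q.
All characters i < k possessed by q are also
possessed by p, by the same argument.
So they share exactly the same characters up
to k, and none thereafter.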

34
Parsimony

Proof. (⇐) Assume that for all i, j, O_i and O_j are
disjoint or one contains the other.
Label each p with the string that is the
concatenation of the column numbers for which it
has nonzero entries. Likewise for q.
Append an end marker to each string so that no
string is a prefix of any other.
p and q have a common prefix but diverge after k.
The keyword tree (sans failure links) for the n
objects in M specifies a perfect phylogeny for
M.

35
Parsimony

O(nm) alg. for the perfect phylogeny problem
1. Reorder columns of M in descending order using
radix sort.
Let M' be the resulting matrix.
Label each column by its column position in M'.
Q: Why do you think we are using radix sort?
A: Radix sort is O(nm). Also, it can be applied to
numbers with an arbitrary number of digits.

36
Parsimony

2. For each row p of M', construct the string
consisting of the characters, in sorted
(increasing) order, that p possesses.
Recall that in step 1 we labeled each character
by its column position.
The string for a given row will be the
concatenation of the column labels for which the
row has the value one.

37
Parsimony

3. Build the keyword tree T for the n strings from
step 2.
Recall that the keyword tree for set P is a
rooted directed tree K satisfying:
Each edge is labeled with one character.
Any two edges out of the same node have distinct
labels.
Every pattern Pi in P maps to some node v of K
s.t. the path from the root to v spells out Pi.
Every leaf in K is mapped to by some pattern in P.

38
Keyword Trees

Example from textbook: P = {potato, poetry,
pottery, science, school}

39
Parsimony

4. Test whether T is a perfect phylogeny for M.
Verify that T has exactly n leaves such that:
Each of the n leaves is labeled by exactly one
object.
Each of the m character-traits labels exactly one
edge of T.
For any object p, the character-traits labeling
the edges along the path from the root to p are
exactly those character-traits whose state in p
is one.
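
The four steps can be put together in a short sketch. Several things here are assumptions made for illustration: the name `perfect_phylogeny`, the nested-dict keyword tree, labeling edges by original column index instead of sorted position, storing objects under an `"objects"` key at the node where their string ends (standing in for a leaf hung at that node), and using Python's built-in sort instead of radix sort, so the sketch is not O(nm).

```python
def perfect_phylogeny(M):
    """Return a keyword tree (nested dicts) that is a perfect phylogeny
    for binary matrix M, or None if M has no phylogenetic tree.

    Edge labels are original column indices. The objects whose string
    ends at a node are listed under that node's "objects" key.
    """
    n, m = len(M), len(M[0])

    # Step 1: order columns as binary numbers (row 1 = msb), largest first.
    order = sorted(range(m), key=lambda k: [row[k] for row in M], reverse=True)

    # Step 2: each row's string lists, in sorted column order, the
    # characters that the row possesses.
    strings = [[k for k in order if M[p][k] == 1] for p in range(n)]

    # Step 3: build the keyword tree, counting the edges each label gets.
    root = {}
    edge_count = [0] * m
    for p, s in enumerate(strings):
        node = root
        for k in s:
            if k not in node:
                node[k] = {}
                edge_count[k] += 1
            node = node[k]
        node.setdefault("objects", []).append(p)

    # Step 4: every character must label exactly one edge. The root-to-p
    # path spells p's characters by construction.
    if any(count != 1 for count in edge_count):
        return None
    return root
```

With this strict check a column of all zeros labels no edge, so such a matrix is rejected; dropping all-zero columns beforehand would be a reasonable alternative design.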

40
Tree Compatibility

Suppose you have two different phylogenetic
trees.
Note: even for the same set of taxa we can derive
different trees by basing the comparison on
different proteins.
Q: How can we determine if they describe a
consistent evolutionary history?
Q: How can we combine them into a single tree?
This section addresses these questions.

41
Tree Compatibility

Defn. Phylogenetic tree refinement:
A phylogenetic tree T' is a refinement of T if T
can be obtained by a series of contractions of
edges of T'.
In a nutshell: T' agrees with T, but expresses
additional evolutionary history.

42
Tree Compatibility

Tree refinement: compare T1 with T2, T1 with T3,
T1 with T4, etc.

43
Tree Compatibility

Defn. Phylogenetic tree compatibility:
Trees T1 and T2 are compatible if there exists a
phylogenetic tree T3 refining both T1 and T2.

44
Tree Compatibility

Tree compatibility problem:
Given two trees, T1 and T2,
determine if they are compatible.
If so, return the refinement tree T3.
We will consider a matrix method for finding T3.

45
Tree Compatibility

Consider a binary matrix representation of a
phylogenetic tree:
There is one row for each object (OTU).
There is one column for each internal node.
Entry (i, j) is one iff the leaf for object i is
in the subtree rooted at j.
Q: Would an example help?
A: OK, then suggest a simple phylogenetic tree.
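
As one possible example, the sketch below converts a small tree into its matrix. The representation is an assumption: a leaf is an object name (a string), an internal node is a tuple of children, and the root counts as an internal node, so it contributes a column of all ones. The function name `tree_to_matrix` is also invented here.

```python
def tree_to_matrix(tree, objects):
    """Binary matrix for a phylogenetic tree given as nested tuples.

    tree:    a leaf is an object name; an internal node is a tuple of children.
    objects: row order for the matrix.
    Columns follow a post-order walk of the internal nodes.
    """
    columns = []

    def leaves_below(node):
        if not isinstance(node, tuple):
            return {node}                # a leaf contributes only itself
        below = set()
        for child in node:
            below |= leaves_below(child)
        columns.append(below)            # one column per internal node
        return below

    leaves_below(tree)
    return [[1 if obj in col else 0 for col in columns] for obj in objects]


# The tree ((A, B), C): one internal node above A and B, plus the root.
M = tree_to_matrix((("A", "B"), "C"), ["A", "B", "C"])
```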

46
Tree Compatibility

Let M1 be the matrix representation of T1 and
similarly M2 for T2.
Let M3 be the matrix formed by taking the union
of the columns of M1 and M2.
Q: What is meant by taking the union of columns?
A: M3 will contain:
all columns found only in M1
all columns found only in M2
One copy of all columns appearing in both M1 and
M2
Obviously, columns will have a different order

47
Tree Compatibility

Q: What should M3 look like? What about T3?

48
Tree Compatibility

Q: Do you agree?

49
Tree Compatibility

Note: in refining T3 to produce T4, there is no
impact in M4 on the preceding columns from M3.

50
Tree Compatibility

Theorem:
T1 and T2 are compatible iff there is a
phylogenetic tree for M3. A phylogenetic tree T3
for M3 is a refinement of both T1 and T2.
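
Combining this theorem with the earlier one (a binary matrix has a phylogenetic tree iff every pair of column sets is disjoint or nested) gives a simple compatibility test. The sketch below is one way to write it, with the assumptions that `compatible` is a made-up name, that M1 and M2 share the same row order, and that a quadratic pairwise check is acceptable.

```python
def compatible(M1, M2):
    """True iff the trees encoded by binary matrices M1 and M2 are
    compatible, i.e. the union of their columns has a phylogenetic tree.

    Rows of M1 and M2 must refer to the same objects in the same order.
    """
    n = len(M1)
    # Collect each column as the set of objects with a one; using a set
    # of frozensets takes the union of columns automatically.
    clusters = set()
    for M in (M1, M2):
        for k in range(len(M[0])):
            clusters.add(frozenset(p for p in range(n) if M[p][k] == 1))
    clusters = list(clusters)
    # Every pair of column sets must be disjoint or nested.
    return all(
        not (a & b) or a <= b or b <= a
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
    )
```

For example, the trees ((A, B), C) and (A, (B, C)) are not compatible, since {A, B} and {B, C} overlap without nesting, while ((A, B), C, D) and (A, B, (C, D)) are compatible.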

51
Generalized Perfect Phylogeny

Generalization of perfect phylogeny:
Allow multiple states (> 2) for character-traits.
Label edges with a triple (c, x, y) where:
c is the character trait
x is the value of the state before the edge
y is the value of the state after the edge
The starting state for each character is
specified at the root.

52
Generalized Perfect Phylogeny

Generalization of perfect phylogeny, continued:
The path from the root to the leaf labeled p
describes the character traits of the object p.
The ending states along this path specify p's
traits.
A combination of trait and ending state can
appear only once in the tree.
Example: character c, ending state y.
There can only be one edge labeled (c, ?, y),
where ? matches any state of c.

53
Generalized Perfect Phylogeny