Bioinformatics Algorithms and Data Structures - PowerPoint PPT Presentation

Provided by: john244
Learn more at: https://cse.sc.edu

1
Bioinformatics Algorithms and Data Structures
  • Chapter 17.2-3 Strings and Evolutionary Trees
  • Lecturer Dr. Rose
  • Slides by Dr. Rose
  • April 5, 2007

2
Next Homework Due 4/19/07
  • 3
  • 8
  • Note: there are three parts to this question.
    Only answer the first two parts, i.e.,
  • Show that if D is ultrametric and D(i, i) = 0 for
    each i, then D is also additive.
  • Show that the converse is not true.
  • 14 (Grad students only.)

3
Additive Distance Trees
  • Real data is rarely ultrametric.
  • A weaker constraint is that data be additive.
  • Recall: Additive distances are distances that
    can be fitted to an unrooted tree such that
    pairwise taxa distances are equal to the sum of
    the branch lengths connecting them.

4
Additive Distance Trees
  • Consider the relationship between additive and
    ultrametric trees:
  • Q: Are all ultrametric trees additive?
  • A: Yes.
  • Q: Are all additive trees ultrametric?
  • A: No.

5
Additive Distance Trees
  • Assuming that
  • D is a symmetric n by n distance matrix
  • D contains only zero values on the diagonal
  • D contains only positive off-diagonal values
  • T is an n node tree, then
  • Defn. T is an additive tree for D if, for every
    pair of labeled nodes (i, j), the path from i to
    j has total weight exactly D(i, j).
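
The definition above can be checked mechanically. The following is a minimal sketch, not the lecture's own code: it assumes the tree is given as a list of weighted edges, that the labeled nodes are numbered 0..n-1 to match the rows of D, and that any extra (unlabeled) nodes use higher numbers. The function names and the example data are hypothetical.

```python
from collections import defaultdict

def path_weights_from(source, adj):
    """Return the total path weight from `source` to every reachable node."""
    dist = {source: 0}
    stack = [source]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in dist:          # in a tree, each node is reached once
                dist[v] = dist[u] + w
                stack.append(v)
    return dist

def is_additive_tree(edges, D):
    """edges: list of (u, v, weight). Nodes 0..n-1 are the labeled nodes.

    Assumes `edges` really forms a tree; that is not verified here.
    """
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    n = len(D)
    for i in range(n):
        dist = path_weights_from(i, adj)
        if any(dist.get(j) != D[i][j] for j in range(n)):
            return False
    return True

# Hypothetical example: a star whose center (node 3) is unlabeled.
edges = [(0, 3, 1), (1, 3, 2), (2, 3, 3)]
D = [[0, 3, 4],
     [3, 0, 5],
     [4, 5, 0]]
print(is_additive_tree(edges, D))  # True
```

The check runs one traversal per labeled node, so it is quadratic in the size of the tree; it is meant to illustrate the definition rather than to be efficient.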

6
Additive Distance Trees
  • Additive tree problem: given
  • symmetric matrix D
  • zero entries on the diagonal
  • positive off-diagonal values
  • Find an additive tree T for D or determine that one
    does not exist.
  • Imagine that you have a distance matrix D
    representing evolutionary distance between pairs
    of taxa.
  • Q: Do you expect the additive tree for D to be
    unique?

7
Additive Distance Trees
  • Q: Is there a unique additive tree for D?
  • If you think the answer is yes, why?
  • If you think the answer is no, why?
  • Consider what we know about D and T:
  • Is T's branching pattern consistent with D?
    (y/n)
  • Are the edge lengths in T consistent with D?
    (y/n)
  • Does D specify directed edges? (y/n)
  • Does D imply directed edges? (y/n)

8
Additive Distance Trees
(Table and figure from http://imbs.massey.ac.nz/Research/MolEvol/Farside/DNA/00312.html)

9
Additive Distance Trees
  • Concept: Additive tree problem
  • Given an n by n symmetric matrix D
  • with zero diagonal entries
  • positive off-diagonal values
  • Find additive tree or determine none exists.

10
Additive Distance Trees
  • Concept: Compact additive tree problem
  • Given an n by n symmetric matrix D
  • with zero diagonal entries
  • positive off-diagonal values
  • Find an additive tree with exactly n nodes.
  • Q: What does this definition say about the
    topology of the tree?
  • A: For every node, there must be a corresponding
    row in D.

11
Additive Distance Trees
  • Consider the symmetrical matrix D above and the
    tree T.
  • Q: Is T an additive tree for D?
  • Q: Is T a compact additive tree for D?

12
Additive Distance Trees
  • Consider the symmetrical matrix D above.
  • Q: Does D have an additive tree?
  • Q: Does D have a compact additive tree?

13
Additive Distance Trees
  • Defn. Let G(D) be the n-node complete graph
    corresponding to D, where nodes are labeled 1..n
    and edge (i, j) has weight D(i, j).
  • Thm. If there is a compact additive tree T for D,
    then T must be the unique minimum spanning tree
    of G(D).

14
Additive Distance Trees
  • Proof. Let
  • T be a compact additive tree for D.
  • e = (x, y) be any edge of G(D) not in T.
  • We know
  • The path from x to y in T has total weight D(x, y).
  • The edge weight for e is also D(x, y).
  • Since e is not in T, the path has at least two
    edges, each of positive weight, so the weight of
    e is strictly greater than that of any edge on
    the path from x to y in T.

15
Additive Distance Trees
  • Proof. continued
  • Assume that there is some other minimum spanning
    tree T' containing e.
  • Removing e splits T' into two sets of nodes, S
    and S'.
  • WLOG, S contains x and S' contains y.
  • In T there is an edge e' that connects a node
    in S to a node in S'.
  • Furthermore, e' is on the path from x to y in T.
  • Hence e' < e.

16
Additive Distance Trees
  • Proof. continued
  • Create a new spanning tree T'' by removing e from
    T' and adding e'.
  • The total edge weight of T'' is less than that of T'.
  • This contradicts the assumption that T' is a
    minimum spanning tree.
  • Therefore T must itself be the unique minimum
    spanning tree of G(D).

17
Additive Distance Trees
  • How can we use this theorem to solve the compact
    additive tree problem in O(n^2) time?
  • Answer:
  • Construct G(D) from D.
  • Use an O(n^2) MST algorithm, such as Prim's
    algorithm, that extends a single growing tree T.
  • When an edge e = (x, y) is added to T, where x is
    already in T,
  • compute d(i, y) = d(i, x) + D(x, y) for all i in
    T. This takes O(n) per iteration and O(n^2) for
    all of T.
  • Verify that d(i, y) = D(i, y). If any check fails,
    D has no compact additive tree.
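
The procedure on this slide can be sketched as follows. This is an illustrative implementation under stated assumptions, not the lecturer's code: nodes are numbered 0..n-1 instead of 1..n, D is a list of lists with exact (e.g. integer) entries so that equality tests are meaningful, and the function name is hypothetical.

```python
def compact_additive_tree(D):
    """Return the edges (x, y, weight) of a compact additive tree for D,
    or None if no compact additive tree exists."""
    n = len(D)
    in_tree = [False] * n
    best = [float("inf")] * n        # cheapest edge linking each node to T
    parent = [-1] * n                # tree endpoint of that cheapest edge
    d = [[0] * n for _ in range(n)]  # path weights between nodes already in T
    order = []                       # nodes in the order they joined T
    edges = []
    best[0] = 0
    for _ in range(n):
        # Prim's step: pick the cheapest node not yet in T, in O(n) time.
        y = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        x = parent[y]
        if x != -1:
            edges.append((x, y, D[x][y]))
            for i in order:
                # Path i..y in T is the path i..x plus the new edge (x, y).
                d[i][y] = d[y][i] = d[i][x] + D[x][y]
                if d[i][y] != D[i][y]:
                    return None
        in_tree[y] = True
        order.append(y)
        for v in range(n):
            if not in_tree[v] and D[y][v] < best[v]:
                best[v] = D[y][v]
                parent[v] = y
    return edges

# Hypothetical example: a path 0 - 1 - 2 with edge weights 2 and 3.
D_path = [[0, 2, 5],
          [2, 0, 3],
          [5, 3, 0]]
print(compact_additive_tree(D_path))  # [(0, 1, 2), (1, 2, 3)]
```

Each of the n iterations does O(n) work for selection, verification, and key updates, which matches the O(n^2) bound stated on the slide.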

18
Parsimony
  • Q: What is parsimony?
  • A: Parsimony: extreme or excessive frugality.
  • Q: So what does frugal mean?
  • A: Frugal: thrifty, economical.
  • In this chapter, parsimony is a character-based
    method for reconstructing evolutionary history.
  • Characters are attributes, traits
  • In this section we will look at highly
    constrained trees that express evolutionary
    history.

19
Parsimony
  • Can be used to deduce evolutionary trees
  • Specifies branching order
  • Does not specify divergence times
  • Can be used as basis for a taxonomy
  • This section is a limited introduction to maximum
    parsimony problems
  • Binary-character problems
  • Focus on perfect phylogeny problem

20
Parsimony
  • Defn. Let M be an n by m, binary matrix
    representing n objects with m character traits.
  • Since M is binary, each character trait has two
    possible states, 1 or 0.
  • Cell (p, i) of M has value 1 iff object p has
    character i.
  • M has a flavor similar to the old chestnut animal
    guessing program that uses a binary tree.

21
Parsimony
  • Defn. A phylogenetic tree for M is a rooted tree
    T with exactly n leaves such that
  • Each of the n leaves is labeled by exactly one
    object.
  • Each of the m character-traits labels exactly one
    edge of T.
  • For any object p, the character-traits labeling
    the edges along the path from the root to p are
    exactly those character-traits for which p has
    state one.

22
Parsimony
  • Consider the matrices below
  • Does either M1 or M2 have a phylogenetic tree?
  • If so, what does the tree look like?

23
Parsimony
  • Q: What is the interpretation of the phylogenetic
    tree?
  • A: It is an estimate of the divergent
    evolutionary history of the objects. (It does not
    give time.)
  • The root represents an ancestor with none of the
    m character-traits.
  • Each character-trait transitions from 0 to 1 only
    once.
  • No character-trait ever transitions from 1 to 0.

24
Parsimony
  • Q: In what sense are phylogenetic trees
    parsimonious?
  • A: Each character-trait labels exactly 1 edge of
    the tree. The biological assumptions are:
  • The root represents an ancestor with none of the
    m character-traits.
  • Each character-trait transitions from 0 to 1 only
    once.
  • No character-trait ever transitions from 1 to 0.

25
Parsimony
  • Q: What character-traits can be used?
  • Morphological features
  • (from http://anthro.palomar.edu/hominid/australo_2.htm)
  • (Also see http://www.cfsan.fda.gov/frf/rfe3pc00.html)

26
Parsimony
  • Q: What character-traits can be used?
  • Morphological features
  • Gross anatomical features
  • OTU-specific esoterica
  • DNA-based characters
  • specific substring patterns
  • Specific nucleotides in fixed positions
  • See pages 460-461 for more discussion

27
Parsimony
  • Defn. Perfect phylogeny problem: given the binary
    matrix M, determine whether there is a phylogenetic
    tree for M; if there is one, build it.
  • We will discuss an O(nm)-time algorithm.
  • First we need to preprocess M:
  • Consider each column as a binary number,
  • with the most significant bit (msb) in row 1.
  • Sort the columns in decreasing order.
  • Let M' denote the reordered matrix M.
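
The preprocessing step can be illustrated with a short sketch. It is an assumption-laden example rather than the algorithm as presented: it uses Python's built-in comparison sort instead of a radix sort (so it does not achieve the O(nm) bound), it compares columns as tuples, which orders them exactly like binary numbers with the first row as the most significant bit, and the matrix and function name are hypothetical.

```python
def sort_columns(M):
    """Return (M_sorted, order), where `order` lists the original column
    indices in decreasing order of the columns' binary values."""
    m = len(M[0])
    columns = [tuple(row[j] for row in M) for j in range(m)]
    # Tuples compare lexicographically, i.e. row 1 acts as the msb.
    order = sorted(range(m), key=lambda j: columns[j], reverse=True)
    M_sorted = [[row[j] for j in order] for row in M]
    return M_sorted, order

# Hypothetical 5-object, 5-character matrix (rows are objects).
M = [[1, 1, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [1, 1, 0, 0, 1],
     [0, 0, 1, 1, 0],
     [0, 1, 0, 0, 0]]
M_sorted, order = sort_columns(M)
print(order)  # [1, 0, 2, 4, 3]
for row in M_sorted:
    print(row)
```

Here `order` records which original column ended up in each position, so edge labels can later be mapped back to the original characters.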

28
Parsimony
  • Example.

29
Parsimony
  • Defn. For any column k of M', let O_k be the set
    of objects with a one in column k.
  • Obs. If O_j is a proper subset of O_k, then column
    k must be to the left of column j in M'.

30
Parsimony
  • Thm. Matrix M has a phylogenetic tree iff for
    every pair of columns i, j, either O_i and O_j are
    disjoint or one contains the other.
  • Proof. (Sketch starting on next slide)
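
The condition in the theorem can be tested directly. The sketch below is a naive pairwise check that takes O(nm^2) time, not the O(nm) algorithm discussed later; the function name and the example matrices are hypothetical.

```python
from itertools import combinations

def has_phylogenetic_tree(M):
    """True iff every pair of column sets O_i, O_j is disjoint or nested."""
    m = len(M[0])
    # O[k] is the set of objects (row indices) with a one in column k.
    O = [{p for p, row in enumerate(M) if row[k] == 1} for k in range(m)]
    return all(
        O[i].isdisjoint(O[j]) or O[i] <= O[j] or O[j] <= O[i]
        for i, j in combinations(range(m), 2)
    )

# Hypothetical matrix whose column sets are pairwise disjoint or nested.
M_good = [[1, 1, 0, 0, 0],
          [0, 0, 1, 0, 0],
          [1, 1, 0, 0, 1],
          [0, 0, 1, 1, 0],
          [0, 1, 0, 0, 0]]
# Hypothetical matrix whose two columns overlap without nesting.
M_bad = [[1, 1],
         [1, 0],
         [0, 1]]
print(has_phylogenetic_tree(M_good))  # True
print(has_phylogenetic_tree(M_bad))   # False
```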

31
Parsimony
  • Proof. (⇒)
  • Let T be the phylogenetic tree for M.
  • Consider characters i, j.
  • Let e_j be the edge on which character j
    transitions from 0 to 1.
  • Let e_i be the edge on which character i
    transitions from 0 to 1.
  • Objects with character i are below e_i in T.
  • Objects with character j are below e_j in T.

32
Parsimony
  • Proof. (⇒) continued
  • There are 4 possible cases:
  • Case 1: e_i = e_j.
  • Case 2: e_i is on the path from the root to e_j.
  • Case 3: e_j is on the path from the root to e_i.
  • Case 4: The paths diverge before reaching e_i or e_j.
  • In case 1, O_i = O_j.
  • In case 2, O_j ⊆ O_i since all objects possessing j
    possess i.
  • In case 3, O_i ⊆ O_j since all objects possessing i
    possess j.
  • In case 4, O_i ∩ O_j = ∅.

33
Parsimony
  • Proof. (⇐) Assume that for all i, j, O_i and O_j
    are disjoint or one contains the other.
  • Consider objects p and q.
  • Let k be the largest character (in the column
    order of M') common to both.
  • All characters i < k possessed by p are also
    possessed by q.
  • All characters i < k possessed by q are also
    possessed by p.
  • So they share exactly the same characters up
    to k, and none thereafter.

34
Parsimony
  • Proof. (⇐) continued
  • Label each p with the string that is the
    concatenation of the column numbers for which it
    has nonzero entries. Likewise for q.
  • Append an end-of-string marker (e.g., $) to each
    string so that no string is a prefix of any other.
  • p and q have a common prefix but diverge after k.
  • The keyword tree (sans failure links) for the n
    objects in M' specifies a perfect phylogeny for
    M.

35
Parsimony
  • O(nm) algorithm for the perfect phylogeny problem
  • Step 1: Reorder the columns of M in descending
    order using radix sort.
  • Let M' be the resulting matrix.
  • Label each column by its column position in M'.
  • Q: Why do you think we are using radix sort?
  • A: Radix sort is O(nm). Also, it can be applied to
    numbers with an arbitrary number of digits.

36
Parsimony
  • Step 2: For each row p of M', construct the string
    consisting of the characters, in sorted
    (increasing) order, that p possesses.
  • Recall that in step 1 we labeled each character
    by its column position.
  • The string for a given row will be the
    concatenation of the column labels for which the
    row has the value one.

37
Parsimony
  • Step 3: Build the keyword tree T for the n strings
    from step 2.
  • Recall that the keyword tree for a set P is a
    rooted directed tree K satisfying:
  • Each edge is labeled with one character.
  • Any two edges out of the same node have distinct
    labels.
  • Every pattern P_i in P maps to some node v of K
    s.t. the path from the root to v spells out P_i.
  • Every leaf in K is mapped to by some pattern in P.

38
Keyword Trees
  • Example from textbook: P = {potato, poetry,
    pottery, science, school}
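
A keyword tree for this pattern set can be built in a few lines. The following is a minimal sketch without failure links; the `Node` class and function name are hypothetical choices for illustration.

```python
class Node:
    """One node of a keyword tree; outgoing edges are keyed by character."""
    def __init__(self):
        self.children = {}    # edge label (one character) -> child Node
        self.pattern = None   # the pattern that ends at this node, if any

def build_keyword_tree(patterns):
    root = Node()
    for pattern in patterns:
        node = root
        for ch in pattern:
            # Reuse an existing edge with this label, or create a new one,
            # so two edges out of the same node never share a label.
            if ch not in node.children:
                node.children[ch] = Node()
            node = node.children[ch]
        node.pattern = pattern
    return root

P = ["potato", "poetry", "pottery", "science", "school"]
root = build_keyword_tree(P)
print(sorted(root.children))                              # ['p', 's']
print(sorted(root.children["p"].children["o"].children))  # ['e', 't']
```

The shared prefixes "po", "pot", and "sc" each appear once in the tree, which is the property the perfect phylogeny construction relies on.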

39
Parsimony
  • Step 4: Test whether T is a perfect phylogeny for M.
  • Verify that T has exactly n leaves such that
  • Each of the n leaves is labeled by exactly one
    object.
  • Each of the m character-traits labels exactly one
    edge of T.
  • For any object p, the character-traits labeling
    the edges along the path from the root to p are
    exactly those character-traits for which p has
    state one.
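
Steps 1 to 4 can be combined into one short sketch. Several choices here are assumptions made for illustration: a comparison sort stands in for radix sort (so the running time is not O(nm)), edges are labeled with original column indices rather than positions in M', the tree is returned as nested dictionaries, leaves for the objects are not attached explicitly, and every column is assumed to contain at least one 1. The function name is hypothetical.

```python
def perfect_phylogeny(M):
    """Return the keyword tree (nested dicts keyed by column index) if M
    has a perfect phylogeny, otherwise None."""
    n, m = len(M), len(M[0])
    # Step 1: order columns by decreasing binary value (row 1 is the msb).
    order = sorted(range(m), key=lambda j: tuple(row[j] for row in M),
                   reverse=True)
    # Step 2: for each row, list its characters in sorted column order.
    strings = [[j for j in order if M[p][j] == 1] for p in range(n)]
    # Step 3: build the keyword tree, counting the edges each character labels.
    root = {}
    edge_count = [0] * m
    for s in strings:
        node = root
        for c in s:
            if c not in node:
                node[c] = {}
                edge_count[c] += 1
            node = node[c]
    # Step 4: a character labeling more than one edge rules out the tree.
    if any(count > 1 for count in edge_count):
        return None
    return root

# Hypothetical input matrices.
M_ok = [[1, 1, 0, 0, 0],
        [0, 0, 1, 0, 0],
        [1, 1, 0, 0, 1],
        [0, 0, 1, 1, 0],
        [0, 1, 0, 0, 0]]
M_conflict = [[1, 1],
              [1, 0],
              [0, 1]]
print(perfect_phylogeny(M_ok))        # {1: {0: {4: {}}}, 2: {3: {}}}
print(perfect_phylogeny(M_conflict))  # None
```

In the first result, the object in row 3 (characters 1, 0, and 4) ends at the node reached by the edges 1, 0, 4, while the object in row 5 ends at the internal node below edge 1; a full implementation would hang a leaf off that node.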

40
Tree Compatibility
  • Suppose you have two different phylogenetic
    trees.
  • Note: even for the same set of taxa we can derive
    different trees by basing the comparison on
    different proteins.
  • Q: How can we determine if they describe a
    consistent evolutionary history?
  • Q: How can we combine them into a single tree?
  • This section addresses these questions.

41
Tree Compatibility
  • Defn. Phylogenetic tree refinement:
  • A phylogenetic tree T' is a refinement of T if T
    can be obtained by a series of contractions of
    edges of T'.
  • Nutshell: T' agrees with T, but expresses
    additional evolutionary history.

42
Tree Compatibility
  • Tree refinement T1 T2? T1 T3? T1 T4? Etc?

43
Tree Compatibility
  • Defn. Phylogenetic tree compatibility
  • Trees T1 and T2 are compatible if there exists a
    phylogenetic tree T3 refining both T1 and T2.

44
Tree Compatibility
  • Tree compatibility problem:
  • Given two trees, T1 and T2,
  • determine if they are compatible.
  • If so, return the refinement tree T3.
  • We will consider a matrix method for finding T3.

45
Tree Compatibility
  • Consider a binary matrix representation of a
    phylogenetic tree
  • There is one row for each object (OTU)
  • There is one column for each internal node
  • Entry (i, j) is one iff the leaf for object i is
    in the subtree rooted at j.
  • Q: Would an example help?
  • A: Ok, then suggest a simple phylogenetic tree.
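
As one possible example, the matrix representation can be computed from a small tree. This sketch assumes the tree is given as a dictionary from each internal node to its list of children, with leaves named after their objects; it includes a column for every internal node, the root among them (whose column is all ones). The tree, the node names, and the function name are hypothetical.

```python
def tree_to_matrix(children, root, objects):
    """Return (internal_nodes, M) with M[i][j] = 1 iff the leaf for
    objects[i] lies in the subtree rooted at internal_nodes[j]."""
    leaves_below = {}

    def collect(node):
        if node not in children:      # a leaf, named after its object
            return {node}
        below = set()
        for child in children[node]:
            below |= collect(child)
        leaves_below[node] = below
        return below

    collect(root)
    internal = list(leaves_below)     # internal nodes in post-order
    M = [[1 if obj in leaves_below[v] else 0 for v in internal]
         for obj in objects]
    return internal, M

# Hypothetical tree: root r has children u and c; u has children a and b.
children = {"r": ["u", "c"], "u": ["a", "b"]}
internal, M_tree = tree_to_matrix(children, "r", ["a", "b", "c"])
print(internal)  # ['u', 'r']
print(M_tree)    # [[1, 1], [1, 1], [0, 1]]
```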

46
Tree Compatibility
  • Let M1 be the matrix representation of T1 and
    similarly M2 for T2.
  • Let M3 be the matrix formed by taking the union
    of the columns of M1 and M2.
  • Q: What is meant by taking the union of columns?
  • A: M3 will contain
  • all columns found only in M1
  • all columns found only in M2
  • One copy of all columns appearing in both M1 and
    M2
  • Obviously, columns will have a different order
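
The union of columns can be sketched as below. The example assumes that M1 and M2 list the objects in the same row order and that columns are compared as whole vectors; it also collapses duplicate columns within a single matrix, which goes slightly beyond what the slide states. The function name and the matrices are hypothetical.

```python
def union_of_columns(M1, M2):
    """Return M3 whose columns are the distinct columns of M1 and M2."""
    n = len(M1)
    seen = set()
    cols = []
    for M in (M1, M2):
        for j in range(len(M[0])):
            col = tuple(row[j] for row in M)
            if col not in seen:       # keep one copy of each distinct column
                seen.add(col)
                cols.append(col)
    return [[col[i] for col in cols] for i in range(n)]

# Hypothetical matrices sharing the column (1, 1, 0).
M1 = [[1, 1],
      [1, 0],
      [0, 0]]
M2 = [[1, 0],
      [1, 0],
      [0, 1]]
M3 = union_of_columns(M1, M2)
print(M3)  # [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
```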

47
Tree Compatibility
  • Q: What should M3 look like? What about T3?

48
Tree Compatibility
  • Q: Do you agree?

49
Tree Compatibility
  • Note: In refining T3 to produce T4, there is no
    impact in M4 with respect to the preceding
    columns in M3.

50
Tree Compatibility
  • Theorem
  • T1 and T2 are compatible iff there is a
    phylogenetic tree for M3. A phylogenetic tree T3
    for M3 is a refinement of both T1 and T2.

51
Generalized Perfect Phylogeny
  • Generalization of perfect phylogeny
  • Allow multiple states (> 2) for character-traits
  • Label edges with a triple (c, x, y) where
  • c is the character trait
  • x is the value of the state before the edge
  • y is the value of the state after the edge
  • The starting state for each character is
    specified at the root

52
Generalized Perfect Phylogeny
  • Generalization of perfect phylogeny continued
  • The path from the root to the leaf labeled p
    describes the character traits of the object p.
  • The ending states along this path specify p's
    traits.
  • A combination of trait and ending state can
    appear only once in the tree.
  • Example: character c, ending state y
  • There can only be one edge labeled (c, ?, y),
  • where ? matches any state of c.

53
Generalized Perfect Phylogeny
  • Time complexity for generalized perfect
    phylogeny:
  • Polynomial in n and m when the number of states
    is fixed
  • NP-complete otherwise