The bootstrap, consenustrees, and supertrees - PowerPoint PPT Presentation

1 / 43
About This Presentation
Title:

The bootstrap, consenustrees, and supertrees

Description:

Like in many other areas where statistical inference is applied, in ... Recode the input trees as a character matrix where each edge in each input tree ... – PowerPoint PPT presentation

Number of Views:36
Avg rating:3.0/5.0
Slides: 44
Provided by: holl167
Category:

less

Transcript and Presenter's Notes

Title: The bootstrap, consenustrees, and supertrees


1
The bootstrap,consenus-trees, and super-trees
Phylogenetics Workhop, 16-18 August 2006
Barbara Holland
2
What is the bootstrap?
  • Like in many other areas where statistical
    inference is applied, in phylogenetics it is not
    just of interest to get a point estimate of the
    phylogenetic tree.
  • We would also like some measure of confidence in
    our point estimate.
  • Is our tree likely to change if we got more
    data, or if we had used slightly different data?
  • How robust is our result to sampling error?
  • The bootstrap is a useful tool for answering
    these sorts of questions.

3
Assessing confidence in trees
  • In 1985 Felsenstein introduced the idea of the
    bootstrap to phylogenetics.
  • For each boostrap sample
  • Create a new alignment by resampling the columns
    of the observed alignment
  • Construct a tree for the bootstrap alignment
  • Can be applied to any method that starts from a
    sequence alignment, e.g., parsimony, likelihood,
    clustering methods if the distances are derived
    from an alignment
  • The bootstrap support for each edge is the number
    of bootstrap trees that edge appears in.

4
1234567 a ATATAAA b ATTATAA c TAAAATA d TATAAAT
1224567 a ATTTAAA b ATTATAA c TAAAATA d TAAAAAT
1334567 a AAATAAA b ATTATAA c TAAAATA d TTTAAAT
1234567 a ATATAAA b ATTATAA c TAAAATA d TATAAAT
1244567 a ATTTAAA b ATAATAA c TAAAATA d TAAAAAT
c
a
a
c
a
a
c
b
d
b
b
b
c
d
d
d
c
a
0.75
b
d
5
Example where the bootstrap is useful
  • Simulate data on the four taxon tree below (JC
    model)
  • Use sequence lengths of 100, 1000, and 10000

6
Example where the bootstrap is not so useful
  • Simulate data on the two four-taxon trees below
    (JC model) in the proportion 55, 45 and
    concatenate the sequences
  • Use total sequence lengths of 100, 1000, and 10000

55
45
7
Consensus trees
  • Consensus trees attempt to summarise the
    information contained in a set of trees, where
    each tree in the set is on the same taxa.
  • Some consensus tree methods are specific to
    rooted trees.

8
Why are consensus methods required?
  • Many phylogenetic methods produce a collection of
    trees rather than a single best tree.
  • Monte Carlo Markov Chain (MCMC)
  • Bootstrapping.
  • Equally parsimonious trees
  • Sometimes trees for different genes produce a
    collection of trees.

9
Terminology Splits and clades
  • Each edge in an unrooted tree corresponds to a
    split or bipartition of the taxa set.
  • Each edge in a rooted tree corresponds to a
    clade.

10
Splits
mouse
dog
turtle
cat, dog, mouse, parrot turtle
parrot
cat
dog, cat mouse, turtle, parrot
cat, dog, mouse turtle, parrot
11
Clades
dog
cat
mouse
parrot
turtle
12
Clades
dog
cat
mouse
parrot
turtle
13
Clades
dog
cat
mouse
parrot
turtle
14
Strict Consensus
  • The strict consensus tree contains only those
    splits/clades that appear in all trees

mouse
mouse
turtle
dog
dog
mouse
dog
turtle
turtle
parrot
parrot
cat
cat
cat
parrot
mouse
dog
turtle
parrot
cat
15
Semi-strict
  • The semi-strict consensus tree also contains
    those splits/clades that dont conflict with any
    of the input trees

mouse
mouse
dog
dog
turtle
turtle
parrot
cat
cat
parrot
mouse
dog
turtle
cat
parrot
16
Majority-Rule
  • The majority-rule consensus tree contains only
    those splits/clades that appear in more than 50
    of the input trees

dog
mouse
turtle
mouse
turtle
dog
mouse
dog
turtle
parrot
cat
parrot
cat
cat
parrot
turtle
dog
mouse
parrot
cat
17
Terminolgy 3-taxon statements
  • 3-taxon statements are triples of three species
    that show two species to be more closely related
    than is the third.
  • E.g. the tree below displays the
    3-taxon statements
  • ((dog,cat),mouse)
  • ((dog,mouse),parrot)
  • ((mouse,parrot),turtle)
  • and others

18
Terminology Rooted trees, hierarchies, clusters,
and partitions
Hierarchy of clusters
Partitions
abcd ef
a,b,c,d
a bcd ef
b,c,d
a b cd ef
e,f
c,d
a
b
c
d
e
f
a
b
c
d
e
f
19
Products of partitions
  • Given k partitions p1, p2, p3,, pk of the same
    set of taxa, the product of these partitions is
    the partition where a and b are in the same block
    if and only if the are in the same block for each
    pi
  • Example The product of abcde and adbce is
    abcde

20
Adams Consensus
  • Adams consensus method only applies to rooted
    trees.
  • It preserves all the 3-taxon statements that are
    common to all of the input trees.
  • Recursive algorithm that looks at the product of
    the maximal partitions of each of the input trees

21
AdamsTree algorithm (from Bryant 2003)
  • Procedure AdamsTree(T1,Tk)
  • if T1 contains only 1 leaf then
  • return T1
  • else
  • construct the product of the maximal partitions
    of the input trees
  • For each block B in the partition do
  • construct AdamsTree(T1B, TkB)
  • Attach the roots of these trees to a new node v
  • return this tree
  • end

22
Adams consensus example
Maximal partition abcd ef
Maximal partition bcde af
Product of maximal partitions abcdef
f
a
b,c,d
e
23
Adams consensus example cont.
Restrict to b,c,d
e
b
c
d
a
f
a
b
c
d
e
f
Maximal partition b cd
Maximal partition b cd
Product of maximal partitions b cd
f
a
b,c,d
e
b
c,d
24
Adams consensus example cont.
Restrict to c,d
e
b
c
d
a
f
a
b
c
d
e
f
Maximal partition c d
Maximal partition c d
f
a
b,c,d
e
Product of maximal partitions c d
b
c,d
c
d
25
What about an Adams like method for unrooted
trees?
  • Instead of triples we would need to consider
    statements about quartets of taxa.
  • If a quartet ((a,b),(c,d)) appeared in all the
    input trees it should be displayed in the output.
  • Easy enough?

26
Three requirements (Steel, Dress and Böcker 2000)
  • Relabelling of the species at the tip of the tree
    should yeild the same answer relabelled in the
    appropriate way
  • The input order of the trees should not matter
  • A quartet that appears in all the input trees
    should appear in the output tree

27
No method can satisfy these 3 requirements
  • Counter example

f
a
b
a
b
c
e
f
c
d
d
e
28
Supertree methods
  • Super-tree methods take a set of trees on
    overlapping taxa sets and return a tree (or
    sometimes a fail message)
  • Biological relevance
  • Not all genes are present in all species
  • Not all genes are easy to sequence for all
    species
  • Assembling the Tree of Life
  • Computationally impossible to try and build a
    tree for all taxa
  • Use a divide and conquer approach
  • And then use supertree methods to piece the Tree
    of Life together

29
Concept Refinement
c
c
b
b
d
refines
d
a
a
e
e
The trees below are also refinements
d
e
b
b
c
d
a
a
e
c
30
Concept Restriction
T
c
b
d
e
a
f
h
g
The label set X a,b,c,d,e,f,g,h We can
restrict T to any subset of the labels X
31
Concept Restriction
E.g. The restriction to a,c,e,g
T
c
c
e
b
d
e
a
a
g
f
h
g
Find the subtree and then supress the degree two
vertices
32
Concept Displaying
  • A tree T (on label set X) displays a tree T (on
    label set X subset of X) if T restricted to the
    labels X is a refinement of T
  • E.g.

d
c
c
e
d
b
a
f
displays
and
d
b
a
e
f
a
f
33
Concept Displaying
  • BUT

d
c
c
e
d
b
a
f
Does not display
or
b
d
a
e
f
a
c
34
The BUILD algorithm
  • Polynomial-time algorithm due to Aho et al (1981)
  • Takes a set of rooted input trees and either
    outputs a supertree that displays all of the
    input trees or returns a fail message.

35
BUILD algorithm
  • Recursive algorithm, at each step it constructs a
    graph associated with the triples displayed by
    the input trees.
  • Depending on whether this associated graph is
    connected or disconnected the algorithm either
    terminates or subdivides the problem.
  • What is this associated graph?

36
The associated graph
  • Nodes of the graph are the complete label set,
    i.e. all the labels that appear in any of the
    input trees
  • Put an edge between two nodes a and b if there is
    at least one input tree that displays the rooted
    triple ((a,b),c) for some c.
  • If this graph is connected stop and report a fail
    message
  • Otherwise call the algorithm again once for each
    connected component, restricting the input to the
    labels in that component.

37
BUILD Example (from Semple and Steel)
d
a
b
c
e
c
b
e
a
b
f
d
b
c
d
a
a,b,c,f
d,e
e
f
38
BUILD example continued
Subproblem 1 Restrict input to a,b,c,f
a
b
c
c
b
a
b
f
b
c
a
a,b,c,f
d,e
f
f
a,b
c
39
BUILD example continued
Subproblem 2 and 3 on d,e, and a,b are
trivial so the final tree is
a,b,c,f
d,e
f
e
a,b
d
c
a
b
a
b
c
f
d
e
40
What if the trees dont agree?
  • If the input trees are not compatible BUILD will
    return a fail message.
  • It is also of interest to have methods that will
    return some output even if the input trees cannot
    all be displayed by a single supertree.
  • Matrix representation with parsimony (MRP) is one
    such method

41
Matrix Representation with Parsimony (MRP)
  • Supertree method invented independently by Baum
    and Ragan (1992).
  • Recode the input trees as a character matrix
    where each edge in each input tree defines a
    character.
  • Do a parsimony analysis of the resulting
    character matrix.
  • Take the strict consensus of the most
    parsimonious trees.

42
MRP example
4
6
2
4
2
8
4
6
2
8
3
5
7
3
1
9
3
5
7
5
1
9
1
123456789 123456789 12345 a 101010100 101010000 ?
???? b 011010100 011010000 ????? c 000110100 0000
01000 ????? d 000001100 000110000 01100 e 00000001
0 000000110 10100 f 000000001 ????????? ????? g ??
??????? 000000101 00010 h ????????? ????????? 0000
1
43
MRP example
10 most parsimonious trees Strict consensus
e
c
b
f
g
a
d
h
Write a Comment
User Comments (0)
About PowerShow.com