Title: Phylogeny reconstruction
1. Phylogeny reconstruction
How do we reconstruct the tree of life?
Outline
- Terminology
- Methods: distance, parsimony, maximum likelihood, bootstrapping
- Problems: homoplasy, hybridisation
Dr. Sean Graham, UBC.
2-4. Phylogenetic reconstruction
5. Phylogenetic reconstruction: Introduction
6. Understanding Trees
7. Do these phylogenies agree? (Figure 14.17)
8. Branch lengths
[Figure: taxa A, B, C and D shown on two trees; scale bar = 1 nt change]
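9. Understanding Trees
- Trees can be used to describe taxonomic groups:
  - Monophyletic: a common ancestor and all of its descendants
  - Paraphyletic: a common ancestor and some, but not all, of its descendants
  - Polyphyletic: taxa grouped without their most recent common ancestor, usually because they share convergent traits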
10-12. What is the relationship between taxonomic names and phylogenetic groups?
13-14. Polyphyletic example: Amentiferae
[Figure: oaks, walnuts and willows on a tree, showing the evolution of catkins from an ancestor with separate flowers]
15. Vertebrate Phylogeny
Are these groups monophyletic, paraphyletic or polyphyletic?
- fish?
- tetrapods (four-limbed)?
- amphibians?
- mammals?
- endotherms (warm-blooded)?
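Once a tree is fixed, the monophyly part of this question can be checked mechanically. The sketch below assumes a hypothetical, simplified vertebrate tree (written here for illustration, not taken from the lecture figure) stored as nested Python tuples. It finds the smallest clade containing every member of a group, which is the clade below the group's most recent common ancestor, and lists any taxa in that clade that the group leaves out.

```python
# Hypothetical, simplified vertebrate tree (not the lecture's figure).
# Internal nodes are 2-tuples; leaves are taxon names.
archosaurs = ("croc", "bird")
sauropsids = ("lizard", ("turtle", archosaurs))
amniotes = (("mouse", "human"), sauropsids)
tetrapod_clade = (("frog", "salamander"), amniotes)
TREE = ("shark", ("tuna", ("lungfish", tetrapod_clade)))


def leaves(tree):
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def clades(tree):
    """Yield the leaf set of every node (every clade) in the tree."""
    yield leaves(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def check_group(tree, group):
    """Test whether `group` is monophyletic on `tree`.

    The smallest clade containing every member is the clade descended from
    the group's most recent common ancestor (MRCA). Extra taxa in that
    clade ("intruders") mean the group is not monophyletic."""
    group = set(group)
    mrca_clade = min((c for c in clades(tree) if group <= c), key=len)
    intruders = sorted(mrca_clade - group)
    return ("monophyletic" if not intruders else "not monophyletic"), intruders


GROUPS = {
    "fish": {"shark", "tuna", "lungfish"},
    "tetrapods": {"frog", "salamander", "mouse", "human",
                  "lizard", "turtle", "croc", "bird"},
    "amphibians": {"frog", "salamander"},
    "mammals": {"mouse", "human"},
    "endotherms": {"mouse", "human", "bird"},
}

for name, members in GROUPS.items():
    print(name, *check_group(TREE, members))
```

A topology-only check like this separates monophyletic from non-monophyletic groups. Telling paraphyly from polyphyly also needs to know whether the group's shared trait is ancestral (paraphyly, as with fish) or evolved convergently (polyphyly, as with warm-bloodedness in birds and mammals).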
16. Constructing Trees
Methods:
- distance (UPGMA, neighbor joining)
- parsimony
- maximum likelihood (Bayesian)
17. I. Distance Methods (phenetics)
18. Distance methods rely on clustering algorithms (e.g. UPGMA)
Example 1: morphology
[Figure: taxa A, B, C and D plotted by Trait 1 and Trait 2]
Distance matrix
     A    B    C    D
A    -   1.0  3.0  4.9
B         -   3.3  3.0
C              -   3.0
D                   -
19. UPGMA
(Morphology example and distance matrix as on slide 18)
[Figure: A and B, the closest pair (distance 1.0), are joined first]
20. UPGMA
(Morphology example and distance matrix as on slide 18)
[Figure: the completed tree for A, B, C and D]
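The clustering on slides 18-20 can be sketched as a short UPGMA routine. This is a minimal illustration, assuming distances are stored in a dict keyed by `frozenset` pairs (a representation chosen here for brevity). Each round joins the closest pair of clusters, then replaces their rows with size-weighted averages; for two single taxa that is the plain average.

```python
def upgma(dist):
    """Cluster taxa with UPGMA.

    `dist` maps frozenset({x, y}) -> distance for every pair of taxa.
    Returns (tree, joins): tree is a nested tuple, joins lists each merge
    as (cluster1, cluster2, distance) in the order the merges happen."""
    d = dict(dist)
    size = {taxon: 1 for pair in d for taxon in pair}  # cluster -> leaf count
    joins = []
    while len(size) > 1:
        pair = min(d, key=d.get)          # closest remaining pair of clusters
        x, y = sorted(pair, key=str)
        joins.append((x, y, d[pair]))
        merged = (x, y)
        # UPGMA rule: distance to the merged cluster is the average of the
        # old distances, weighted by how many leaves each old cluster holds.
        for z in size:
            if z not in pair:
                d[frozenset({merged, z})] = (
                    d[frozenset({x, z})] * size[x] + d[frozenset({y, z})] * size[y]
                ) / (size[x] + size[y])
        # Forget every distance that involves the clusters just merged.
        d = {p: v for p, v in d.items() if x not in p and y not in p}
        size[merged] = size.pop(x) + size.pop(y)
    (tree,) = size
    return tree, joins


# Morphology example: distances copied from the slide's matrix.
MORPH = {
    frozenset("AB"): 1.0, frozenset("AC"): 3.0, frozenset("AD"): 4.9,
    frozenset("BC"): 3.3, frozenset("BD"): 3.0, frozenset("CD"): 3.0,
}

tree, joins = upgma(MORPH)
print(tree)  # (('A', 'B'), ('C', 'D'))
for x, y, height in joins:
    print(f"join {x} + {y} at distance {height:.2f}")
```

After A and B join at 1.0, the averaged distances are 3.15 (AB to C) and 3.95 (AB to D), so C and D (3.0) join next; the two clusters meet last at 3.55, the average of all four A/B-to-C/D distances.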
21. Distance methods with sequence data
A  ATTGCAATCGG
B  ATTACGATCGG
C  GTTACAACCGG
D  CTCGTAGTCGA
Distance matrix
   A  B  C  D
A  -  1  3  5
B     -  3  7
C        -  7
D           -
[Figure: A and B joined first]
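Each entry in the distance matrix for this slide is meant to be a count of differing sites. A minimal sketch of that count, assuming the four sequences are already aligned, is:

```python
from itertools import combinations

SEQS = {
    "A": "ATTGCAATCGG",
    "B": "ATTACGATCGG",
    "C": "GTTACAACCGG",
    "D": "CTCGTAGTCGA",
}


def differences(s1, s2):
    """Count aligned sites at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s1, s2))


matrix = {(x, y): differences(SEQS[x], SEQS[y]) for x, y in combinations(SEQS, 2)}
for (x, y), d in matrix.items():
    print(x, y, d)
```

Run on the sequences as printed, this gives 3, 5, 3, 7 and 7 for the A-C, A-D, B-C, B-D and C-D pairs, matching the matrix, but 2 rather than 1 for A-B, because A and B differ at sites 4 and 6. Either value makes A and B the closest pair, so the clustering order does not change.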
22. Distance methods with sequence data
(Original distance matrix as on slide 21)
New distance matrix (take averages):
    AB  C  D
AB  -   3  6
C       -  7
D          -
[Figure: A and B joined]
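The "take averages" step for the new AB cluster works out as follows, using the entries of the original matrix:

```latex
\begin{aligned}
d(AB, C) &= \tfrac{1}{2}\,\bigl[d(A, C) + d(B, C)\bigr] = \tfrac{1}{2}(3 + 3) = 3 \\
d(AB, D) &= \tfrac{1}{2}\,\bigl[d(A, D) + d(B, D)\bigr] = \tfrac{1}{2}(5 + 7) = 6
\end{aligned}
```

The smallest entry in the new matrix is then d(AB, C) = 3, so C is added to the AB cluster next, and D joins last.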
23-24. Distance methods with sequence data
(Original and averaged distance matrices as on slide 22)
[Figure: the tree is built up step by step: C joins the AB cluster, then D is added]
25. Assumptions of distance methods
26. Strengths and weaknesses of distance methods
27. II. Parsimony Methods (Cladistics)
- Willi Hennig (German entomologist) set out the method in German in 1950
- The English translation, Phylogenetic Systematics (1966), was very influential
28. Applying parsimony
- Consider four taxa (1-4) and four characters (A-D)
- Ancestral state: abcd
[Table: states of traits A-D in taxa 1-4]
29. Applying parsimony
- Consider four taxa (1-4) and four characters (A-D)
- Ancestral state: abcd
[Figure: tree with taxa in the order 1, 2, 3, 4; character changes marked as unique changes or as convergences or reversals]
5 steps
30. Applying parsimony
- Consider four taxa (1-4) and four characters (A-D)
- Ancestral state: abcd
[Figure: tree with taxa in the order 1, 4, 3, 2; character changes marked as unique changes or as convergences or reversals]
4 steps
31. Strengths and weaknesses of parsimony
32. Parsimony practice
Position  1234567
K         AGTACCG
L         AAGACTA
M         AACCTTA
N         AAAGTTA
Which unrooted tree is most parsimonious?
[Figure: the three possible unrooted trees for K, L, M and N, with the changes at positions 1 and 2 already plotted]
Plot each change on each tree. Positions 1 and 2 are done. Which positions help to determine relationships?
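Plotting changes by hand can be cross-checked with Fitch's algorithm, a standard way to count the minimum number of changes a site needs on a given tree. The sketch below is a minimal version, assuming each of the three unrooted trees is written with an arbitrary root on its central branch (the Fitch count does not depend on where the root is placed); tree labels such as `KL|MN` are names chosen here.

```python
from collections import Counter

SEQS = {"K": "AGTACCG", "L": "AAGACTA", "M": "AACCTTA", "N": "AAAGTTA"}

# The three possible unrooted trees for four taxa, each rooted arbitrarily
# on its central branch.
TREES = {
    "KL|MN": (("K", "L"), ("M", "N")),
    "KM|LN": (("K", "M"), ("L", "N")),
    "KN|LM": (("K", "N"), ("L", "M")),
}


def fitch(tree, site):
    """Fitch downward pass for one site: (candidate state set, changes)."""
    if isinstance(tree, str):
        return {SEQS[tree][site]}, 0
    left, right = tree
    ls, lc = fitch(left, site)
    rs, rc = fitch(right, site)
    common = ls & rs
    if common:
        return common, lc + rc
    return ls | rs, lc + rc + 1  # no shared state: one extra change


def tree_length(tree):
    """Total changes on a tree, plus the per-site counts (site 1 first)."""
    n_sites = len(next(iter(SEQS.values())))
    per_site = [fitch(tree, i)[1] for i in range(n_sites)]
    return sum(per_site), per_site


def informative_positions():
    """1-based positions where at least two states each occur in >= 2 taxa."""
    n_sites = len(next(iter(SEQS.values())))
    hits = []
    for i in range(n_sites):
        counts = Counter(seq[i] for seq in SEQS.values())
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            hits.append(i + 1)
    return hits


for name, tree in TREES.items():
    total, per_site = tree_length(tree)
    print(name, total, per_site)
print("parsimony-informative positions:", informative_positions())
```

For these sequences, KL|MN needs 9 changes and the other two trees 10 each. Position 5 is the only parsimony-informative site: every other position costs the same number of changes on all three trees.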
33. Inferring the direction of evolution
Mouse      ACGCTAGCTAGG
Orangutan  ACGCTAGCTAGG
Gorilla    ACGCTAGCTAGG
Human      ACGCTAGCTAGG
Bonobo     ACGCTAGCTACG
Chimp      ACGCTAGCTACG
Where did the mutation occur, and what was the change?
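The outgroup comparison behind this question takes only a few lines of Python. The sketch assumes the mouse is the outgroup and that each sequence belongs to the taxon it is paired with in the dictionary (a pairing read from the slide layout). At each variable site it treats the outgroup's base as ancestral and reports which taxa carry a different, derived base.

```python
SEQS = {
    "Mouse":     "ACGCTAGCTAGG",  # assumed outgroup
    "Orangutan": "ACGCTAGCTAGG",
    "Gorilla":   "ACGCTAGCTAGG",
    "Human":     "ACGCTAGCTAGG",
    "Bonobo":    "ACGCTAGCTACG",
    "Chimp":     "ACGCTAGCTACG",
}


def polarize(seqs, outgroup):
    """Treat the outgroup base as ancestral at each site; return a list of
    (1-based position, ancestral base, derived bases, taxa with derived base)."""
    ancestral = seqs[outgroup]
    changes = []
    for i, base in enumerate(ancestral):
        derived = {t: s[i] for t, s in seqs.items() if s[i] != base}
        if derived:
            changes.append((i + 1, base, sorted(set(derived.values())), sorted(derived)))
    return changes


for pos, anc, new, taxa in polarize(SEQS, "Mouse"):
    print(f"position {pos}: {anc} -> {'/'.join(new)} in {', '.join(taxa)}")
```

Under that pairing it reports a single change, G to C at position 11, shared by bonobos and chimps; the most parsimonious reading is one substitution on the branch leading to their common ancestor.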
34. III. Maximum likelihood (and Bayesian)
35. Maximum likelihood: a starting sketch
- Probabilities: transition 0.2, transversion 0.1, no change 0.7
[Figure: sequences ATAGG, AGGGG, ACGGG and CAGGA]
Find the tree with the highest probability
36. Maximum likelihood: a starting sketch
- Probabilities: transition 0.2, transversion 0.1, no change 0.7
[Figure: the sequences placed on a tree; the probability of one branch is the product of its per-site probabilities, P = (.7)(.1)(.2)(.7)(.7)]
Find the tree with the highest probability
37. Maximum likelihood: a starting sketch
- Probabilities: transition 0.2, transversion 0.1, no change 0.7
[Figure: branch probabilities P = (.7)(.1)(.2)(.7)(.7), P = (.7)(.1)(.7)(.7)(.7) and P = (.1)(.2)(.7)(.7)(.2)]
Find the tree with the highest probability
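The arithmetic on these slides can be reproduced with a toy scorer. This sketch uses only the three probabilities given on the slide, multiplies them site by site along a branch, and multiplies branches together with every ancestral sequence fixed in advance. Real maximum likelihood differs in ways this code ignores: it sums over all possible ancestral states and fits branch lengths under a proper substitution model such as Jukes-Cantor. The slide's values are a teaching simplification; from any starting base, 0.7 + 0.2 + two transversions at 0.1 each adds to 1.1 rather than 1. The star tree at the end, with ancestor AGGGG, is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Toy per-site probabilities from the slide (not a fitted model).
P_SAME, P_TRANSITION, P_TRANSVERSION = 0.7, 0.2, 0.1


def site_probability(a, b):
    """Probability that base `a` is observed as base `b` across one branch."""
    if a == b:
        return P_SAME
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return P_TRANSITION    # A<->G or C<->T
    return P_TRANSVERSION      # purine <-> pyrimidine


def branch_probability(seq1, seq2):
    """Multiply per-site probabilities along one branch (sites independent)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    p = 1.0
    for a, b in zip(seq1, seq2):
        p *= site_probability(a, b)
    return p


def tree_probability(branches):
    """Product over branches, with every ancestral sequence fixed in advance."""
    p = 1.0
    for seq1, seq2 in branches:
        p *= branch_probability(seq1, seq2)
    return p


print(branch_probability("ATAGG", "AGGGG"))  # (.7)(.1)(.2)(.7)(.7)
print(branch_probability("AGGGG", "ACGGG"))  # (.7)(.1)(.7)(.7)(.7)
print(branch_probability("AGGGG", "CAGGA"))  # (.1)(.2)(.7)(.7)(.2)

# Hypothetical star tree: ancestor AGGGG with three descendant sequences.
star = [("AGGGG", "ATAGG"), ("AGGGG", "ACGGG"), ("AGGGG", "CAGGA")]
print(tree_probability(star))
```

Comparing candidate trees then means computing this product for each one and keeping the tree with the largest value.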
38. Assessment of Maximum Likelihood (also Bayesian)
39. Characters to use in phylogeny
40. Challenges of using DNA data
- Alignment can be very challenging!
- Taxon 1 AATGCGC
- Taxon 2 AATCGCT
- Taxon 1 AATGCGC
- Taxon 2
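Gaps are what make the two sequences on this slide alignable. Below is a minimal global alignment sketch (the Needleman-Wunsch algorithm); the scores of +1 for a match, -1 for a mismatch and -1 per gap are arbitrary values chosen for illustration, not the lecture's.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def pair(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                            # gap in b
                score[i][j - 1] + gap,                            # gap in a
            )
    # Trace back from the bottom-right cell to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, row1, row2 = needleman_wunsch("AATGCGC", "AATCGCT")
print(best)
print(row1)
print(row2)
```

With these scores the best alignment puts one gap in each sequence (score 4: six matches and two gaps), whereas lining the sequences up without gaps gives three matches and four mismatches (score -1).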
41. Informative sequences evolve at moderate rates
- Too slow?
  - not enough variation
- Taxon 1 AATGCGC
- Taxon 2 AATGCGC
- Taxon 3 AATGCGC
[Figure: unresolved tree (polytomy)]
42. Example of insufficient evidence: metazoan phylogeny
[Figure: tree of metazoans and fungi]
43. Challenges: sunflower phylogeny
- Recent radiation (200,000 years)
- Many species, much hybridization
- Need more rapidly evolving markers!
[Figure: groups of 15 spp. and 12 spp.]
44. Informative sequences evolve at moderate rates
- Too fast?
  - homoplasy likely
  - saturation: only 4 possible states at each DNA site
- Taxon 1 ATTCTGA
- Taxon 2 GTAGTGG
- Taxon 3 CGTGCTG
[Figure: unresolved tree (polytomy)]
45. Saturation
- Imagine changing one nucleotide every hour to a random nucleotide.
- Split the ancestral population in 2.
[Figure: sequences in the two diverging lineages after one, four, eight and 12 hours: ACTTGCT, ACCTGAA, ACCAGAA, AGCGGAA, ACGTGCT, ACGAGCT, GCGATCC, AGCCTCC, GAGCTCC. Red indicates multiple mutations at a site.]
24 hours?
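The thought experiment on this slide can be simulated directly. The sketch assumes a random 200-site ancestor and 2000 hours (values chosen here so the trend is visible); each hour, each lineage has one randomly chosen site replaced by a random base, which may happen to be the base already there.

```python
import random

BASES = "ACGT"


def mutate_one(seq, rng):
    """Replace one randomly chosen site with a random base (possibly the same)."""
    i = rng.randrange(len(seq))
    return seq[:i] + rng.choice(BASES) + seq[i + 1:]


def proportion_different(s1, s2):
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def simulate(n_sites=200, hours=2000, seed=0):
    """Split one ancestral sequence into two lineages and mutate each once
    per hour. Returns a list of (hour, proportion of sites that differ)."""
    rng = random.Random(seed)
    ancestor = "".join(rng.choice(BASES) for _ in range(n_sites))
    left = right = ancestor
    history = [(0, 0.0)]
    for hour in range(1, hours + 1):
        left = mutate_one(left, rng)
        right = mutate_one(right, rng)
        history.append((hour, proportion_different(left, right)))
    return history


history = simulate()
for hour, p in history[::250]:
    print(f"hour {hour:5d}: {p:.2f} of sites differ")
```

The proportion of differing sites rises quickly at first, then levels off near 0.75, the value expected once every site has been randomized in both lineages. Past that point, extra time adds no visible differences, which is what saturation means.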
46. Saturation: mammalian mitochondrial DNA
47. Forces of evolution and phylogeny reconstruction
- How does each force affect the ability to reconstruct phylogeny?
  - mutation?
  - drift?
  - selection?
  - non-random mating?
  - migration?
48. Phylogeny case study I: whales
Are whales ungulates (hoofed mammals)? (Figure 14.4)
49. Whales: DNA sequence data
Hillis, D. M. 1999.
How reliable is this tree? Bootstrapping.
50. How consistent are the data?
- Take the dataset (5 taxa, 10 characters)
- Create a new data set by sampling characters at random, with replacement

Original data set (characters 1-10):
Taxon     1  2  3  4  5  6  7  8  9  10
Human     A  C  G  T  T  G  T  A  C  T
Chimp     A  G  G  T  T  C  T  A  T  T
Bonobo    A  G  G  T  T  C  T  A  T  G
Gorilla   A  C  T  T  G  C  T  G  T  C
Orang     T  C  G  T  G  T  A  C  C  C

Resampled data set (column headers are the original characters drawn):
Taxon     3  8  2  6  10 10 5  8  8  7  3
Human     G  A  C  G  T  T  T  A  A  T  G
Chimp     G  A  G  C  T  T  T  A  A  T  G
Bonobo    G  A  G  C  G  G  T  A  A  T  G
Gorilla   T  G  C  C  C  C  G  G  G  T  T
Orang     G  C  C  T  C  C  G  C  C  A  G
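The resampling step can be sketched in Python. To keep the example short, "rebuilding the tree" is replaced here by a stand-in: a replicate counts as supporting the Chimp + Bonobo grouping when that pair is strictly the closest by number of differing characters, i.e. the pair UPGMA would join first. Real analyses rebuild a full tree for every replicate and record how often each clade appears. Each replicate draws as many characters as the original data set (10 here); the resampled example on the slide lists 11 columns.

```python
import random
from itertools import combinations

# Data matrix from the slide: 5 taxa x 10 characters.
DATA = {
    "Human":   "ACGTTGTACT",
    "Chimp":   "AGGTTCTATT",
    "Bonobo":  "AGGTTCTATG",
    "Gorilla": "ACTTGCTGTC",
    "Orang":   "TCGTGTACCC",
}


def resample(data, rng):
    """One bootstrap replicate: draw n character columns with replacement,
    using the same columns for every taxon."""
    n = len(next(iter(data.values())))
    cols = rng.choices(range(n), k=n)
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in data.items()}


def differences(s1, s2):
    return sum(a != b for a, b in zip(s1, s2))


def closest_pair(data):
    """Return the unique closest pair of taxa, or None if the minimum is tied.
    Stand-in for tree building: UPGMA would join this pair first."""
    scored = sorted(
        (differences(data[a], data[b]), a, b) for a, b in combinations(sorted(data), 2)
    )
    if scored[0][0] == scored[1][0]:
        return None
    return frozenset(scored[0][1:])


def bootstrap_support(data, pair, replicates=1000, seed=1):
    """Fraction of replicates in which `pair` is the unique closest pair."""
    rng = random.Random(seed)
    target = frozenset(pair)
    hits = sum(closest_pair(resample(data, rng)) == target for _ in range(replicates))
    return hits / replicates


print(bootstrap_support(DATA, ("Chimp", "Bonobo")))
```

A high fraction means the grouping is supported by many characters rather than by a few; a low fraction means it depends on which characters happen to be sampled.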
51. Whales: DNA sequence data
Hillis, D. M. 1999.
52. Molecular clocks
53. Basic idea of molecular clocks
[Figure: chimps vs. humans, 6 substitutions; whales vs. hippos, 60 substitutions, 56 mya]
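The clock arithmetic comes down to one division. This sketch assumes the figure is read as a calibration (whales and hippos, 60 substitutions, split 56 million years ago) and a strict clock, meaning the same substitution rate in every lineage; `divergence_time` is a name chosen here.

```python
def divergence_time(subs, calib_subs, calib_time):
    """Estimate a divergence time from a substitution count, assuming a
    constant rate (strict molecular clock) calibrated on another pair."""
    rate = calib_subs / calib_time  # substitutions per million years
    return subs / rate


# Assumed reading of the slide: whales vs. hippos, 60 substitutions, 56 mya.
t = divergence_time(subs=6, calib_subs=60, calib_time=56)
print(f"humans vs. chimps: about {t:.1f} million years")
```

With these numbers, 6 substitutions correspond to about 5.6 million years. Any departure from a constant rate makes that estimate wrong in proportion.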
54. Challenges for phylogeny: gene flow
55. Sunflower annuals
56. Different genes may have different histories!
57. Phylogeny summary
58. Phylogeny study questions
- 1) Explain in words the difference between monophyletic, paraphyletic, and polyphyletic taxa. Draw a hypothetical phylogeny representing each type. Give an actual example of a commonly recognized paraphyletic taxon in both animals and plants.
- 2) How can a reconstructed phylogeny be used to determine whether a similar character in two taxa is due to homoplasy?
- 3) Whales are classified as cetaceans, not artiodactyl ungulates. Why does this make the artiodactyls paraphyletic? What is the evidence that whales belong in the artiodactyls?
- 4) Phenetics (distance methods) and cladistics (parsimony) differ in the ways they recognize and use similarities among taxa to form phylogenetic groupings. What types of similarity does each school recognize, and how useful is each type of similarity considered to be for identifying groups?
59. Phylogeny study questions
- 5) What is bootstrapping in the context of phylogenetic analysis, and why is this procedure performed?
- 6) Why are maximum likelihood methods increasing in popularity for reconstructing phylogenies? In your answer, include a short description of how this method identifies the best phylogeny.
- 7) For what kinds of data can maximum likelihood methods of phylogeny construction be used? Why is this so? What types of data are typically not used, and why?
- 8) Would animal mitochondrial DNA provide a reasonable molecular tool for evaluating deep phylogenetic relationships between animal phyla? What about ribosomal DNA? Justify your answers.
- 9) Integrative question: draw a pair of axes with "time since divergence" on the x axis and "percent of sites that are the same" on the y axis. Draw a graph that shows the basic pattern for third codon sites. Is your graph linear? Explain why or why not.
60. Phylogeny study questions
- 10) You are studying a group of species that lives in two very different environments. You build two phylogenies: one is based on a locus that is probably under divergent selection in the two environments, while the other is based on a neutral locus. Which phylogeny would be more likely to represent the species history? Why?
- 11) It has long been noted that Anolis lizards found in similar micro-habitats on many separate islands in the Caribbean are very similar to each other (for example, large lizards that feed on the ground, smaller lizards that feed on tree trunks, and very small lizards that feed at the tops of branches). Two different historical explanations have been proposed for this pattern: each morph has evolved repeatedly on each island, or each morph has evolved just once, then dispersed. Sketch a phylogeny that would support each hypothesis.
- 12) Integrative question: the Cameroon lake cichlid phylogeny, showing that the lake species were monophyletic, was based on mitochondrial DNA. Explain why this might not reflect the species history. How could you be more certain about the phylogeny?
- 13) Explain why allopolyploid taxa pose problems for phylogenies.