Title: Speciation history inferred from gene trees
1 Speciation history inferred from gene trees
L. Lacey Knowles
Department of Ecology and Evolutionary Biology
University of Michigan, Ann Arbor, MI
knowlesl_at_umich.edu
2 Emphasis on multilocus data in phylogenetics and
phylogeography
Utility of single-locus data for inferences about
speciation history?
- The good
- The bad
- The ugly
3 Estimating population genetic parameters relevant
to the process of species divergence
4 Parameterized model for making inferences about
the divergence process
Model parameter shown: T (time of speciation)
5 Divergence of M. oregonensis and M. montanus
from the Rocky Mountains
5 anonymous nuclear loci
1 mitochondrial locus
Carstens & Knowles 2007, Mol. Ecol. 16:619-627.
6 Coalescent framework and multilocus versus
single-locus data set
Divergence of gene lineages within the ancestral
species
Estimate from average mtDNA genetic distance:
4.9 x 10^5 to 2.0 x 10^6
7 Assumed species tree of Poephila finches
Parameterized model for making inferences about
the divergence process
Analysis of 30 anonymous nuclear loci
Bayesian Markov chain Monte Carlo (MCMC)
method (Yang and Rannala)
- uses multiple independent loci
- estimates ancestral θ (and present θ)
- estimates population divergence times
- uses branch length information
- accounts for uncertainty in gene trees
Assumptions
- the species tree is known
- random mating
- no gene flow after population divergence
- free recombination among loci (none within loci)
Identified role of geographic barriers in
a Pleistocene divergence of the grass finches
Jennings & Edwards (2005) Evolution
8 Jennings & Edwards (2005) Evolution
9 Estimating population genetic parameters relevant
to the process of species divergence
- The good
- The bad
- The ugly
10 Estimating the history (order) of divergence
events (i.e., the species tree) for recently
derived taxa
Effects of sampling scheme: contrast between
sequencing a single representative per species
versus multiple individuals per species
11 Gene trees will not always match the species tree
species tree
gene tree
deep coalescence
Maddison 1997
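The mismatch between gene tree and species tree can be made concrete for the simplest case of three species. The sketch below uses the standard coalescent result that a three-taxon gene tree matches the species tree with probability 1 - (2/3)exp(-T), where T is the internal branch length in coalescent units. The function name, the diploid scaling (T = generations / 2Ne) and the example branch lengths are assumptions for illustration, not values from the slides.

```python
import math

def p_concordant(t_generations, ne):
    """Probability that a 3-taxon gene tree matches the species tree.

    Assumes one sampled gene copy per species, neutrality, and a diploid
    population of constant size ne along the internal branch.
    """
    # Internal branch length in coalescent units (assumed diploid scaling).
    T = t_generations / (2.0 * ne)
    return 1.0 - (2.0 / 3.0) * math.exp(-T)

# Illustrative internal branch lengths (hypothetical).
for t in (100_000, 1_000_000):
    print(t, round(p_concordant(t, ne=100_000), 3))
# 100000 0.596
# 1000000 0.996
```

With an internal branch of zero length the probability falls to 1/3, i.e., the three possible topologies are equally likely.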
12 While there is a distribution of possible gene
trees for a given species tree, the probability
of each gene tree differs
high P(Gtree | Stree)
low P(Gtree | Stree)
5 taxa: 105 possible gene tree
topologies
Degnan & Salter (2005) Evolution
The shape of this distribution will differ
depending on the shape of the species tree
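The figure of 105 topologies for 5 taxa follows from the count of rooted, bifurcating, labeled trees, (2n - 3)!! = 3 x 5 x ... x (2n - 3). A minimal sketch of that arithmetic (the function name is hypothetical):

```python
def num_rooted_topologies(n_taxa):
    """Number of rooted, bifurcating, labeled tree topologies: (2n - 3)!!."""
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (empty product for n <= 2).
    for k in range(3, 2 * n_taxa - 2, 2):
        count *= k
    return count

print(num_rooted_topologies(5))  # 105
print(num_rooted_topologies(8))  # 135135
```

The count grows quickly: for 8 taxa with one gene copy each there are already 135,135 possible topologies.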
13 Inferred history of species divergence differs
among loci
Gene trees from 30 anonymous markers with a single
individual sequenced per species
Jennings & Edwards (2005) Evolution
14 Estimating the history (order) of divergence
events (i.e., the species tree) for recently
derived taxa
Gene tree from one locus with 9 individuals
sequenced in each of 8 different species
15 What is the true species tree?
16 Recently developed approaches for estimating the
species tree (explicitly consider the process of
gene lineage coalescence in the estimation of the
history of species divergence)
Extract the historical signal of species
divergence, despite discord between the gene
tree and species tree
Maddison & Knowles 2006; Edwards et al. 2007; Liu &
Pearl 2007
17 Goal: estimate the species tree directly (as
opposed to estimating a gene tree and equating
that gene tree with the history of the species)
18 Can the history of species divergence be
recovered from a single gene tree?
species tree
(1) minimize the number of deep coalescences
(2) shallowest divergence between species
Considers the process of lineage sorting, but the
actual probabilities of incomplete lineage
sorting are not quantified using a stochastic
model
gene tree
STEM and BEST: likelihood and Bayesian approaches
that incorporate stochastic models of both
nucleotide substitution and lineage sorting
processes
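The deep coalescence criterion can be sketched as a count of "extra" gene lineages on each branch of a candidate species tree: for each species-tree branch, count the maximal gene-tree subtrees that fall entirely within that branch's species, and subtract one. The code below is a minimal illustration, assuming trees are written as nested tuples and each gene copy is labeled by its species name. The function names are hypothetical, and this is not the implementation used in the cited studies.

```python
def leaf_set(tree):
    """Set of species labels at the tips of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def branch_clusters(species_tree):
    """Yield the species set below every branch of the species tree."""
    yield leaf_set(species_tree)
    if not isinstance(species_tree, str):
        for child in species_tree:
            yield from branch_clusters(child)

def lineages_within(gene_tree, cluster):
    """Count maximal gene subtrees whose tips all belong to `cluster`."""
    if leaf_set(gene_tree) <= cluster:
        return 1
    if isinstance(gene_tree, str):
        return 0
    return sum(lineages_within(child, cluster) for child in gene_tree)

def deep_coalescences(gene_tree, species_tree):
    """Sum of extra lineages (lineages - 1) over all species-tree branches."""
    return sum(max(0, lineages_within(gene_tree, cluster) - 1)
               for cluster in branch_clusters(species_tree))

species_tree = (("A", "B"), "C")
print(deep_coalescences((("A", "B"), "C"), species_tree))  # 0
print(deep_coalescences(("A", ("B", "C")), species_tree))  # 1
```

A species-tree search would score each candidate species tree this way and keep the one with the smallest count.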
19 Shallowest divergence approach
[Figure: simulation workflow, repeated for replicates 1 to 500: simulated species trees -> simulated gene trees -> simulated sequences -> reconstructed gene trees -> reconstructed species trees. Species trees are inferred either by minimizing the number of deep coalescences or by shallowest divergence.]
Maddison & Knowles 2006
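One way to picture the shallowest divergence idea is as clustering on the minimum divergence between any two gene copies from different species. The sketch below is an illustration under assumptions, not the published implementation: it takes hypothetical pairwise divergence depths between gene copies, reduces them to a minimum per species pair, and joins groups by single linkage.

```python
from itertools import combinations

def shallowest_divergence_tree(species_of, dist):
    """Build a nested-tuple species tree by joining the shallowest splits.

    species_of: dict mapping gene copy name -> species name
    dist: dict mapping frozenset({copy_a, copy_b}) -> divergence depth
          (needed for every pair of copies from different species)
    """
    # Shallowest divergence between any two copies of each species pair.
    sp_dist = {}
    for a, b in combinations(species_of, 2):
        sa, sb = species_of[a], species_of[b]
        if sa != sb:
            key = frozenset((sa, sb))
            d = dist[frozenset((a, b))]
            sp_dist[key] = min(d, sp_dist.get(key, d))

    # Each cluster maps a subtree (nested tuple) to the species it contains.
    clusters = {sp: frozenset([sp]) for sp in set(species_of.values())}

    def gap(pair):
        x, y = pair
        return min(sp_dist[frozenset((s, t))]
                   for s in clusters[x] for t in clusters[y])

    while len(clusters) > 1:
        # Sort by string form so ties are broken deterministically.
        x, y = min(combinations(sorted(clusters, key=str), 2), key=gap)
        clusters[(x, y)] = clusters.pop(x) | clusters.pop(y)
    return next(iter(clusters))

# Hypothetical example: two copies from species A, one each from B and C.
species_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
dist = {frozenset(p): d for p, d in [
    (("a1", "b1"), 4.0), (("a2", "b1"), 2.0),
    (("a1", "c1"), 6.0), (("a2", "c1"), 6.0), (("b1", "c1"), 6.0)]}
tree = shallowest_divergence_tree(species_of, dist)
print(tree)  # (('A', 'B'), 'C')
```

Because only the minimum between species is used, adding gene copies per species can only tighten the estimate of each split.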
20 Accuracy assessment
21 Simulated species trees
Goals
- Examine a reasonable spectrum of topologies and
branch lengths
(500 species trees were simulated, rather than
choosing a single species tree and assessing how
well it can be reconstructed across many simulation
replicates)
- Determine how the extent of incomplete lineage
sorting affects the ability to reconstruct species
histories
t = 100,000 (i.e., 1Ne): 500 replicate species
trees
t = 1,000,000 (i.e., 10Ne): 500 replicate
species trees (topologies of the two
sets of trees are identical)
500 replicate species trees of 8 species each
Maddison & Knowles 2006
22 Accuracy affected by
simulated species trees
- Increasing total sampling effort per species
(either 1, 3, 9 or 27 sequences per
species)
simulated gene trees
- Increasing the number of individuals per locus
versus the number of loci per species for
a given sampling effort
(1, 3, 9 or 27 gene trees representing unlinked
loci simulated independently, with either 1, 3, 9
or 27 gene sequences simulated for each locus per
species)
neutral coalescence (Ne = 100,000)
Maddison & Knowles 2006
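To see why tree depth relative to Ne matters, the neutral coalescent within a single population can be simulated directly. The sketch below counts how many of k sampled gene copies remain uncoalesced after t generations. It assumes a diploid population of constant size, so that j lineages coalesce at rate j(j-1)/2 per 2Ne generations; for a haploid scaling, 2Ne would be replaced by Ne. The function name and parameter choices are hypothetical.

```python
import random

def lineages_remaining(k, t_generations, ne, rng):
    """Simulate how many of k lineages are still distinct after t generations.

    Assumes neutrality and a diploid population of constant size ne.
    """
    elapsed = 0.0
    while k > 1:
        # Coalescence rate per generation for k lineages.
        rate = k * (k - 1) / 2.0 / (2.0 * ne)
        elapsed += rng.expovariate(rate)
        if elapsed > t_generations:
            break  # the next coalescence falls beyond the time window
        k -= 1
    return k

rng = random.Random(1)
for t in (100_000, 1_000_000):
    runs = [lineages_remaining(9, t, 100_000, rng) for _ in range(1000)]
    print(t, sum(runs) / len(runs))  # mean number of lineages left
```

Lineages that remain distinct at the time of a species split are the ones that can coalesce deeper in the tree and produce discord.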
23 Number of deep coalescences
Gene copies per locus    1Ne      10Ne
1                        7.6      1.8
3                        28.7     6.9
9                        63.2     14.7
27                       114.4    25.7
Lots of discord (i.e., our simulated data should
reflect well the challenges of
reconstructing evolutionary relationships near
the species/population level)
Maddison & Knowles 2006
24 Average proportion of correct partitions (those
in the inferred tree matching the true tree)
a. Total tree depth of 1Ne, 1 locus
Gene copies    Deep Coalescents    Shallowest Divergence
1              0.26                0.27
3              0.47                0.53
9              0.59                0.60
27             0.64                0.56
Gene trees retain some signal of phylogenetic
history despite significant discord with species
tree
b. Total tree depth of 10Ne, 1 locus
Gene copies    Deep Coalescents    Shallowest Divergence
1              0.76                0.73
3              0.79                0.78
9              0.80                0.79
27             0.82                0.84
Average accuracy greater, as expected
Maddison & Knowles 2006
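The accuracy measure, the proportion of partitions in the true tree that also appear in the inferred tree, can be sketched as a comparison of clade sets. The code below is a minimal version, assuming rooted trees written as nested tuples and treating each non-root internal node as one partition; the function names are hypothetical.

```python
def clades(tree):
    """Non-trivial clades (frozensets of species) of a rooted nested-tuple tree."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(visit(child) for child in node))
        found.add(members)
        return members

    all_taxa = visit(tree)
    found.discard(all_taxa)  # the root clade is shared by every tree
    return found

def proportion_correct(inferred, true):
    """Fraction of the true tree's clades recovered in the inferred tree."""
    true_clades = clades(true)
    if not true_clades:
        return 1.0
    return len(clades(inferred) & true_clades) / len(true_clades)

true_tree = ((("A", "B"), "C"), "D")
inferred_tree = ((("A", "B"), "D"), "C")
print(proportion_correct(inferred_tree, true_tree))  # 0.5
```

Here the inferred tree recovers the clade (A, B) but not (A, B, C), so half of the true partitions are correct.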
25 Estimating the history (order) of divergence
events (i.e., the species tree) for recently
derived taxa
Gene tree from one locus with multiple
individuals sequenced per species and a very simple
approach
What would happen if more loci were considered?
26 Similar accuracy for a given sampling effort
whether multiple individuals or multiple loci are
sampled, for recent divergence (t = 1Ne)
[Figure, two panels: frequency distribution of species tree accuracy with increasing number of loci, and with increasing number of individuals. x-axis: tree accuracy (number of shared partitions with the true tree); y-axis: proportion of trees.]
- The curve marked random shows the expected
distribution of the accuracy measure in comparing
two randomly simulated trees
27 Acknowledgements
- Wayne Maddison
- Bryan Carstens (former postdoc)
knowlesl_at_umich.edu