Title: Large Scale Phylogenetic Inference
1Large Scale Phylogenetic Inference
Mark Pagel and Andrew Meade, Reading University
m.pagel_at_rdg.ac.uk
2Large-Scale Phylogenetic Inference: Approaches and Problems
- Availability of data
- Inference from aligned gene sequences: traversing the universe
- MCMC and MCMCMC inference (assessing the potential for large-scale inference)
- A model of pattern-heterogeneity suitable for concatenated sequences
A Tree of Life, n = 4000 species (David Hillis)
3The accumulation of gene sequence data
- Year: No. of sequences
- 1994: 215,273
- 2001: 14,976,310
- 70X growth over 7 years
- Compare: 20% per annum = 3.6X growth over 7 years
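The arithmetic behind the growth comparison can be checked with a few lines of Python. This is only an illustrative sketch: the two sequence counts are the ones quoted on the slide, while the variable names and the idea of expressing the growth as a compound annual rate are assumptions added here.

```python
# Sequence counts quoted on the slide.
seqs_1994 = 215_273
seqs_2001 = 14_976_310
years = 2001 - 1994

# Overall fold increase between the two years.
fold_growth = seqs_2001 / seqs_1994

# Compound annual growth rate implied by that fold increase.
annual_rate = fold_growth ** (1 / years) - 1

# What a steady 20% per annum would give over the same period.
baseline = 1.20 ** years

print(f"fold growth: {fold_growth:.1f}x")
print(f"implied annual growth: {annual_rate:.0%}")
print(f"20% per annum over {years} years: {baseline:.1f}x")
```

The fold increase rounds to the 70X on the slide, and 20% per annum compounds to roughly 3.6X over the 7 years.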
4Numbers of gene sequences for metazoan phyla
5Source: GenBank, all nucleotide sequences
6Large-Scale Phylogenetic Inference: Approaches and Problems
- Availability of data
- Inference from aligned gene sequences: traversing the universe
- MCMC and MCMCMC inference (assessing the potential for large-scale inference)
- A model of pattern-heterogeneity suitable for concatenated sequences
7Number of Possible Phylogenetic Trees
(table columns: Species, Unrooted, Rooted)
For N = 50 species:
- No. rooted: 27529213532835651545259729751524430639300973035816196098326553772152587890625 (about 2.75 x 10^76)
- No. unrooted: 283806325080779912837729172696128150920628587998105114415737667754150390625 (about 2.84 x 10^74)
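These counts follow the standard double-factorial formulas for strictly bifurcating trees on labelled tips: (2n-5)!! unrooted and (2n-3)!! rooted. The slide does not state the formulas, so the sketch below is an assumption about how such numbers are computed, and the function names are made up for illustration.

```python
def double_factorial_odd(k: int) -> int:
    """Return k * (k-2) * ... * 3 * 1 for odd k (1 if k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_unrooted(n: int) -> int:
    """Unrooted bifurcating trees on n >= 3 labelled tips: (2n-5)!!"""
    return double_factorial_odd(2 * n - 5)


def n_rooted(n: int) -> int:
    """Rooted bifurcating trees on n >= 2 labelled tips: (2n-3)!!"""
    return double_factorial_odd(2 * n - 3)


for n in (4, 10, 50):
    print(n, n_unrooted(n), n_rooted(n))
```

Python integers have arbitrary precision, so the 77-digit rooted count for 50 species is computed exactly rather than as a floating-point approximation.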
8Sampling the Universe of Phylogenetic Trees
- Markov-Chain Monte Carlo (MCMC) methods
- Generate a large number of phylogenetic trees from a Markov chain
- At equilibrium, the chain randomly samples from the universe of trees
- Sampling mechanism: the Metropolis-Hastings algorithm
- Accept new tree with p = 1.0 if $L(T_{n+1}) > L(T_n)$; otherwise accept with probability $L(T_{n+1}) / L(T_n)$
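The acceptance rule above can be sketched in Python. Working with log-likelihoods is an assumption made here for numerical safety (tree likelihoods are far too small to store directly), as are the function name and the use of a seeded `random.Random`. The rule as written on this slide uses the likelihood ratio only, which corresponds to a flat prior and a symmetric proposal.

```python
import math
import random


def accept_proposal(loglik_current: float, loglik_proposed: float,
                    rng: random.Random) -> bool:
    """Accept with probability min(1, L(T_new) / L(T_old)), in log space."""
    log_ratio = loglik_proposed - loglik_current
    if log_ratio >= 0:
        # New tree is at least as likely: always accept.
        return True
    # Otherwise accept with probability equal to the likelihood ratio.
    return rng.random() < math.exp(log_ratio)


rng = random.Random(1)
print(accept_proposal(-70030.0, -70025.0, rng))  # uphill move
print(accept_proposal(-70030.0, -70031.0, rng))  # small downhill move, sometimes accepted
```

Occasionally accepting a worse tree is what lets the chain move between regions of tree space instead of stopping at the first local optimum.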
10Long Interspersed Nuclear Elements (LINEs)
- autonomously replicating retrotransposons
- diagram labels: endonuclease, reverse transcriptase, 5' and 3' ends, 6000 bases
- as old as mammals (at least)
- 20-40 active elements
- 500,000-1,000,000 fossil fragments
- account for 20% of the nucleotide content of the human genome
11Phylogenetic tree of LINEs in the human genome, n = 500, sampled from Markov chain
12Convergence of a Markov chain sampling phylogenetic trees of n = 500 tips, using an alignment of n = 4400 nucleotides
Plot: log-likelihood against iteration number
NB: 99% of the increase in likelihood occurs in the first 2.8% of the run; 0.07% change in the final 2 million iterations
13Frequency histogram of log-likelihoods for phylogenetic trees of n = 500 LINEs in the human genome (alignment 4000 bp). Note unconverged chain.
n = 1000 trees; mean = -700299.7; std. dev. = 15.91
x-axis: log-likelihood
14Metropolis-Coupled Markov Chain Monte Carlo (MCMCMC)
Given m simultaneous Markov chains, at each iteration propose swapping the states of a randomly chosen pair of chains i and j, accepting the swap according to
(likelihood ratio, chain i) x (likelihood ratio, chain j)
Diagram: chain states x_i, x_j, x_k before the swap; y_i, y_j, y_k after it
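A swap step of this kind can be sketched as follows. The formula on the slide did not survive in the transcript, so this uses the usual Metropolis-coupled form, in which chain i evaluates likelihoods raised to its heat `beta_i`. That form, the function names and the flat-prior simplification are all assumptions here rather than the authors' implementation.

```python
import math
import random


def swap_log_ratio(loglik_i: float, loglik_j: float,
                   beta_i: float, beta_j: float) -> float:
    """Log of (likelihood ratio, chain i) x (likelihood ratio, chain j)."""
    ratio_chain_i = beta_i * (loglik_j - loglik_i)  # chain i would take state j
    ratio_chain_j = beta_j * (loglik_i - loglik_j)  # chain j would take state i
    return ratio_chain_i + ratio_chain_j


def maybe_swap(states: list, logliks: list, betas: list,
               rng: random.Random) -> bool:
    """Pick two chains at random and swap their states with the MC3 probability."""
    i, j = rng.sample(range(len(states)), 2)
    log_r = swap_log_ratio(logliks[i], logliks[j], betas[i], betas[j])
    if log_r >= 0 or rng.random() < math.exp(log_r):
        states[i], states[j] = states[j], states[i]
        logliks[i], logliks[j] = logliks[j], logliks[i]
        return True
    return False


states = ["tree_A", "tree_B", "tree_C"]        # placeholders for tree states
logliks = [-70030.0, -70021.0, -70040.0]       # made-up log-likelihoods
betas = [1.0, 0.5, 0.25]                       # chain 0 is the cold chain
print(maybe_swap(states, logliks, betas, random.Random(3)), states)
```

Only the heats stay attached to the chains; the states move, so a good tree found by a heated chain can migrate to the cold chain.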
15Temperatures of heated chains
Plot: temperature against chain number i, with the cold chain at i = 1
Curves: 1/(1 + t(i-1)) with t = 0.2 and t = 0.5, and 1/i
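The heating schedules on this slide can be tabulated directly. In this short sketch the function name `heat` is made up, and treating the 1/i curve as the t = 1 case of 1/(1 + t(i-1)) is an observation added here rather than something the slide states.

```python
def heat(i: int, t: float) -> float:
    """Temperature of chain i (i = 1 is the cold chain) for heating parameter t."""
    return 1.0 / (1.0 + t * (i - 1))


for t in (0.2, 0.5, 1.0):
    temps = [round(heat(i, t), 3) for i in range(1, 7)]
    print(f"t = {t}: {temps}")
```

Smaller t keeps neighbouring chains closer in temperature, which generally makes swaps between them easier to accept, at the cost of less heating per chain.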
16Swapping behaviour of an MCMCMC analysis
17Phylogeny of human LINE-1 elements (92 elements, 4 kb sequences)
Ages marked on the tree, in millions of years ago: 10-15, 90, 120
19LINEs data: log-likelihoods of trees from the cold chain (converged chain)
Plot: log-likelihoods of pre-swap trees and post-swap trees
20Large-Scale Phylogenetic Inference: Approaches and Problems
- Availability of data
- Inference from aligned gene sequences: traversing the universe
- MCMC and MCMCMC inference (assessing the potential for large-scale inference)
- A model of pattern-heterogeneity suitable for concatenated sequences
21Pattern-Heterogeneity Model of Gene-Sequence Evolution
- Allow for different genes in a single concatenated alignment, or different regions of the same gene, to evolve in qualitatively different ways
- Contrast: rate heterogeneity can only detect differences in rates
- Implement pattern-heterogeneity without partitioning the data
- P-H will always equal or better the performance of the gamma rate heterogeneity model
- Normally yields substantial improvements (100s of log-units)
Applications
- Detecting regions of genes that evolve differently
- Large-scale inference: suitable for concatenated gene sequences (e.g. a recent phylogeny of the mammals was based upon 16,000 nucleotides and 16 genes), or supermatrix alignments
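One way to let sites follow different patterns without partitioning the data is a mixture, in which each site's likelihood is a weighted sum over several rate matrices. The slides do not give the formula, so the form below is an assumption. The per-site likelihood values are invented for illustration, and a real analysis would obtain them by evaluating each rate matrix on the tree.

```python
import math


def mixture_loglik(site_liks_by_pattern, weights):
    """Sum over sites of log( sum_j w_j * L_j(site) ); weights must sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    n_sites = len(site_liks_by_pattern[0])
    total = 0.0
    for s in range(n_sites):
        site_lik = sum(w * liks[s]
                       for w, liks in zip(weights, site_liks_by_pattern))
        total += math.log(site_lik)
    return total


# Invented per-site likelihoods under two hypothetical rate matrices:
# the first two sites fit matrix 1 well, the last two fit matrix 2 well.
liks_q1 = [0.020, 0.030, 0.001, 0.002]
liks_q2 = [0.001, 0.002, 0.040, 0.030]
patterns = [liks_q1, liks_q2]

single_q1 = mixture_loglik(patterns, [1.0, 0.0])   # reduces to matrix 1 alone
single_q2 = mixture_loglik(patterns, [0.0, 1.0])   # reduces to matrix 2 alone
mixed = mixture_loglik(patterns, [0.5, 0.5])
print(single_q1, single_q2, mixed)
```

Because putting all the weight on one matrix recovers the single-matrix model, a mixture fitted by maximum likelihood cannot score worse than that model. No site is assigned to a partition in advance.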
22Applications of pattern-heterogeneity model: single gene alignment
Diagram: alignment rows from species 1 to species n, with stretches of sites labelled pattern 1, pattern 2 and pattern 3
23Binary (0/1) character matrix with one row per species, e.g. 'Oceanodroma_hornbyi', 'Gavia_stellata', 'Spheniscus_demersus', 'Diomedea_exulans', 'Puffinus_puffinus' (rows of 0s and 1s not reproduced)
24Testing the Pattern Heterogeneity Model: two different rate matrices
Generate data on a known tree according to these two matrices and form a concatenated alignment.
- gene 1: 600 bases
- gene 2: 400 bases
25log-likelihoods obtained from three models
applied to simulated pattern-heterogeneity data
27log-likelihoods by site in the simulated
pattern-heterogeneity data
29Pattern-heterogeneity model: simulated and obtained values of the rate parameters
30log-likelihoods for combined LSU/SSU nrRNA data set: 54 species, n = 800 sites
32log-likelihoods by site in the LSU/SSU combined data set
Annotation on plot: the divide between the two genes
33Log-likelihoods for cytochrome-b data set: N = 433 sites, of which 300 are fixed for a single nucleotide
36Metropolis-Hastings Algorithm
Accept new tree according to: likelihood ratio x prior ratio x proposal ratio
X = data (e.g., gene sequences); T = tree (topology, branches, parameters)
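Written out, the three ratios named on this slide combine into the usual Metropolis-Hastings acceptance probability. The equation itself is missing from the transcript, so this is a reconstruction under assumptions: $T_n$ and $T_{n+1}$ follow the earlier slide's notation for the current and proposed trees, and $q(\cdot \mid \cdot)$ is a symbol introduced here for the proposal density.

```latex
\[
\alpha \;=\; \min\!\left(1,\;
\underbrace{\frac{P(X \mid T_{n+1})}{P(X \mid T_{n})}}_{\text{likelihood ratio}}
\times
\underbrace{\frac{P(T_{n+1})}{P(T_{n})}}_{\text{prior ratio}}
\times
\underbrace{\frac{q(T_{n} \mid T_{n+1})}{q(T_{n+1} \mid T_{n})}}_{\text{proposal ratio}}
\right)
\]
```

With a flat prior and a symmetric proposal the last two ratios equal 1, and the rule reduces to the likelihood-ratio form given on the earlier Metropolis-Hastings slide.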