Large Scale Phylogenetic Inference

Large Scale Phylogenetic Inference. Mark Pagel and Andrew Meade, Reading University, m.pagel_at_rdg.ac.uk. Metropolis-Hastings Algorithm: Accept new tree according to ...

Slides: 37
Provided by: rut140

Transcript and Presenter's Notes

1
Large Scale Phylogenetic Inference
Mark Pagel and Andrew Meade, Reading University
m.pagel_at_rdg.ac.uk
2
Large-Scale Phylogenetic Inference: Approaches and Problems
  • Availability of data
  • Inference from aligned gene sequences: traversing the universe
  • MCMC and MCMCMC inference (assessing the potential for large-scale inference)
  • A model of pattern-heterogeneity suitable for concatenated sequences
A Tree of Life, n = 4000 species (David Hillis)
3
The accumulation of gene sequence data
  • Year: No. of sequences
  • 1994: 215,273
  • 2001: 14,976,310
  • 70X growth over 7 years
  • Compare: 20% per annum = 3.6X growth over 7 years
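As a quick check on the arithmetic above, the following sketch recomputes the fold growth from the two counts quoted on the slide and compares it with compounding at 20% per annum. The variable names are illustrative only and are not part of the presentation.

```python
# Sequence counts quoted on the slide.
count_1994 = 215_273
count_2001 = 14_976_310
years = 2001 - 1994

fold_growth = count_2001 / count_1994          # observed growth factor
annual_rate = fold_growth ** (1 / years) - 1   # implied compound annual rate
reference_growth = 1.20 ** years               # 20% per annum, for comparison

print(f"observed growth: {fold_growth:.1f}X over {years} years")
print(f"implied annual growth: {annual_rate:.0%}")
print(f"20% per annum gives: {reference_growth:.1f}X")
```

The observed factor rounds to the 70X on the slide, which corresponds to an implied compound rate of a little over 80% per annum.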

4
Numbers of gene sequences for metazoan phyla
5
Source: GenBank, all nucleotide sequences
6
Large-Scale Phylogenetic Inference: Approaches and Problems
  • Availability of data
  • Inference from aligned gene sequences: traversing the universe
  • MCMC and MCMCMC inference (assessing the potential for large-scale inference)
  • A model of pattern-heterogeneity suitable for concatenated sequences
7
Number of Possible Phylogenetic Trees
Species Unrooted Rooted
N = 50
  • No. rooted: 27529213532835651545259729751524430639300973035816196098326553772152587890625
  • No. unrooted: 283806325080779912837729172696128150920628587998105114415737667754150390625
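The counts for N = 50 can be reproduced with the standard formulas for strictly bifurcating trees: (2n - 5)!! unrooted and (2n - 3)!! rooted topologies for n species. The sketch below assumes those formulas; the function names are hypothetical and do not come from the slides.

```python
def double_factorial(k):
    """Product k * (k - 2) * (k - 4) * ... down to 1; returns 1 when k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_unrooted(n_species):
    """Number of unrooted bifurcating topologies for n_species >= 3."""
    return double_factorial(2 * n_species - 5)


def n_rooted(n_species):
    """Number of rooted bifurcating topologies for n_species >= 2."""
    return double_factorial(2 * n_species - 3)


for n in (4, 5, 10, 50):
    print(n, n_unrooted(n), n_rooted(n))
```

Python integers have arbitrary precision, so the 75- and 77-digit values for 50 species are computed exactly.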
8
Sampling the Universe of Phylogenetic Trees
  • Markov-Chain Monte Carlo (MCMC) methods
  • Generate a large number of phylogenetic trees
    from a Markov chain
  • At equilibrium, the chain randomly samples from the universe of
    trees

Sampling mechanism: the Metropolis-Hastings algorithm
Accept new tree with $p = 1.0$ if $L(T_{n+1}) > L(T_n)$;
otherwise accept with probability $L(T_{n+1}) / L(T_n)$
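The acceptance rule above can be sketched on a toy problem. In this minimal example three named states stand in for trees, their likelihoods are made-up numbers, and the proposal is symmetric; none of these names or values come from the presentation. With a symmetric proposal the chain should visit each state in proportion to its likelihood.

```python
import math
import random


def accept(log_l_new, log_l_old, rng):
    """Always accept an improvement; otherwise accept with
    probability L_new / L_old (computed on the log scale)."""
    if log_l_new >= log_l_old:
        return True
    return rng.random() < math.exp(log_l_new - log_l_old)


# Hypothetical likelihoods for three stand-in "trees".
LIKELIHOOD = {"tree_a": 1.0, "tree_b": 2.0, "tree_c": 7.0}


def run_chain(n_steps, seed=0):
    """Run the chain and return the fraction of iterations spent in each state."""
    rng = random.Random(seed)
    state = "tree_a"
    visits = {name: 0 for name in LIKELIHOOD}
    for _ in range(n_steps):
        # Symmetric proposal: choose one of the other states uniformly.
        proposal = rng.choice([s for s in LIKELIHOOD if s != state])
        if accept(math.log(LIKELIHOOD[proposal]), math.log(LIKELIHOOD[state]), rng):
            state = proposal
        visits[state] += 1
    return {name: count / n_steps for name, count in visits.items()}


frequencies = run_chain(20_000)
print(frequencies)
```

Working with log-likelihoods avoids numerical underflow, which matters for values as small as those reported later in the talk.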
10
Long Interspersed Nuclear Elements (LINEs): autonomously replicating retrotransposons
[Diagram: element of 6000 bases with ends labelled 5' and 3', encoding endonuclease and reverse transcriptase]
  • as old as mammals (at least)
  • 20-40 active elements
  • 500,000-1,000,000 fossil fragments
  • account for 20% of nucleotide content of human genome
11
Phylogenetic tree of LINEs in the human genome
n = 500, sampled from Markov chain
12
Convergence of a Markov chain sampling
phylogenetic trees of n = 500 tips using
an alignment of n = 4400 nucleotides
[Plot: log-likelihood against iteration number]
NB: 99% of increase in likelihood in first 2.8%
of run. 0.07% change in final 2 million
iterations
13
Frequency histogram of log-likelihoods for
phylogenetic trees of n = 500 LINEs in the human
genome (alignment 4000 bp). Note unconverged
chain.
[Histogram: x-axis log-likelihood; n = 1000 trees, mean = -700299.7, std. dev. = 15.91]
14
Metropolis-Coupled Markov Chain Monte Carlo
(MCMCMC)
Given m simultaneous Markov chains, swap states
each iteration among a randomly chosen pair i
and j according to
likelihood ratio (chain i) × likelihood ratio (chain j)
[Diagram labels: chain states x_i, x_j, x_k and y_i, y_j, y_k]
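A minimal sketch of the swap step follows, using the usual Metropolis-coupling formulation rather than anything confirmed by the slide: chain k is assumed to sample from the likelihood raised to a heat beta_k, and priors are assumed flat. Each chain's state is represented only by its log-likelihood; a real analysis would exchange the trees themselves. All names are hypothetical.

```python
import math
import random


def swap_log_ratio(log_l_i, log_l_j, beta_i, beta_j):
    """Log of (likelihood ratio for chain i) * (likelihood ratio for chain j)
    when the two chains exchange states; chain k targets likelihood ** beta_k."""
    return (beta_i - beta_j) * (log_l_j - log_l_i)


def attempt_swap(log_liks, betas, rng):
    """Pick a random pair of chains and exchange their states if accepted.
    Modifies log_liks in place; returns the pair and the outcome."""
    i, j = rng.sample(range(len(log_liks)), 2)
    log_r = swap_log_ratio(log_liks[i], log_liks[j], betas[i], betas[j])
    accepted = log_r >= 0 or rng.random() < math.exp(log_r)
    if accepted:
        log_liks[i], log_liks[j] = log_liks[j], log_liks[i]
    return (i, j), accepted


# Cold chain (beta = 1) holds the worse state, heated chain the better one.
states = [-100.0, -90.0]
pair, accepted = attempt_swap(states, [1.0, 0.5], random.Random(0))
print(pair, accepted, states)
```

In this example the swap is always accepted, because it moves the better-scoring state onto the cold chain.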
15
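Temperatures of heated chains
Heating scheme: $1/(1 + t(i-1))$, plotted for $t = 0.2$ and $t = 0.5$, alongside $1/i$
[Plot: temperature against number of chains, i; the cold chain is labelled]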
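The heating scheme on this slide is easy to tabulate. The sketch below evaluates 1/(1 + t(i-1)) for the two values of t shown, plus t = 1, which reduces to the 1/i curve; the function name is illustrative.

```python
def chain_heat(i, t):
    """Heat of chain i (1-based) under the scheme 1 / (1 + t * (i - 1))."""
    return 1.0 / (1.0 + t * (i - 1))


for t in (0.2, 0.5, 1.0):
    heats = [round(chain_heat(i, t), 3) for i in range(1, 5)]
    print(f"t = {t}: {heats}")
```

Chain 1 always has heat 1 (the cold chain); smaller values of t keep the heated chains closer to the cold one.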
16
Swapping behaviour of an MCMCMC analysis
17
Phylogeny of Human LINE-1 elements (92 elements,
4 kb sequences)
[Tree figure with a time scale in millions of years ago; labelled values: 10-15, 90, 120]
19
LINEs data: log-likelihoods of trees from cold
chain (converged chain)
[Plot: log-likelihoods of pre-swap trees and post-swap trees]
20
Large-Scale Phylogenetic Inference: Approaches and Problems
  • Availability of data
  • Inference from aligned gene sequences: traversing the universe
  • MCMC and MCMCMC inference (assessing the potential for large-scale inference)
  • A model of pattern-heterogeneity suitable for concatenated sequences
21
Pattern-Heterogeneity Model of Gene-Sequence Evolution
  • Allow for different genes in a single concatenated alignment, or different regions of the same gene, to evolve in qualitatively different ways
  • Contrast: rate heterogeneity can only detect differences in rates
  • Implement pattern-heterogeneity without partitioning data
  • P-H will always equal or better the performance of the gamma rate-heterogeneity model. Normally yields substantial improvements (100s of log-units)
Applications
  • Detecting regions of genes that evolve differently
  • Large-scale inference: suitable for concatenated gene sequences (e.g. a recent phylogeny of the mammals was based upon 16,000 nucleotides and 16 genes), or supermatrix alignments
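One way to obtain pattern-heterogeneity without partitioning is a mixture: at every site the likelihood is summed over several patterns (rate matrices), each with a weight. The sketch below assumes that formulation, which the slides do not spell out. The per-site likelihoods are made-up numbers; in a real analysis they would come from evaluating the tree under each rate matrix. All names are hypothetical.

```python
import math


def mixture_log_likelihood(site_likelihoods, weights):
    """Sum over sites of log( sum_k w_k * L(site | pattern k) )."""
    total = 0.0
    for per_pattern in site_likelihoods:
        total += math.log(sum(w * l for w, l in zip(weights, per_pattern)))
    return total


def pattern_posteriors(per_pattern, weights):
    """Posterior probability that a single site belongs to each pattern."""
    joint = [w * l for w, l in zip(weights, per_pattern)]
    norm = sum(joint)
    return [x / norm for x in joint]


# Hypothetical likelihoods of two sites under two patterns.
site_likelihoods = [
    (0.20, 0.01),  # site 1 fits pattern 1 much better
    (0.01, 0.30),  # site 2 fits pattern 2 much better
]
weights = (0.5, 0.5)

mixed = mixture_log_likelihood(site_likelihoods, weights)
only_first = mixture_log_likelihood(site_likelihoods, (1.0, 0.0))
print(mixed, only_first)
print(pattern_posteriors(site_likelihoods[0], weights))
```

No site is assigned to a gene or region in advance. The per-site posteriors indicate which pattern each site favours, which is how such a model could flag regions that evolve differently.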
22
Applications of pattern-heterogeneity model: single gene alignment
[Diagram: alignment of species 1 to species n, with labels pattern 1, pattern 2 and pattern 3]
23
[Data matrix: binary (0/1) characters for bird species, including the genera Gavia, Spheniscus, Pygoscelis, Eudyptes, Oceanodroma, Pelecanoides, Procellaria, Pachyptila, Diomedea, Thalassarche, Pterodroma and Puffinus; the row contents are scrambled in the transcript]
24
Testing the Pattern Heterogeneity Model: two
different rate matrices
Generate data on a known tree according to these
two matrices and form a concatenated alignment.
  • gene 1: 600 bases
  • gene 2: 400 bases
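The simulation design can be sketched as follows. The slides do not say which two rate matrices or which tree were used, so this example substitutes a Kimura two-parameter model with two different transition/transversion ratios (kappa = 1, equivalent to Jukes-Cantor, and kappa = 10) and a hypothetical four-taxon tree. Only the 600 + 400 base layout comes from the slide.

```python
import math
import random

TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}

# Hypothetical known tree: (name, branch length, children).
TREE = ("root", 0.0, [
    ("ab", 0.05, [("A", 0.1, []), ("B", 0.1, [])]),
    ("cd", 0.05, [("C", 0.1, []), ("D", 0.1, [])]),
])


def k80_probs(d, kappa):
    """K80 probabilities of no change, a transition, and any transversion
    for a branch of length d (expected substitutions per site)."""
    e1 = math.exp(-4 * d / (kappa + 2))
    e2 = math.exp(-2 * d * (kappa + 1) / (kappa + 2))
    return 0.25 + 0.25 * e1 + 0.5 * e2, 0.25 + 0.25 * e1 - 0.5 * e2, 0.5 - 0.5 * e1


def evolve(base, d, kappa, rng):
    """Draw the base at the end of a branch given the base at its start."""
    same, ts, _ = k80_probs(d, kappa)
    u = rng.random()
    if u < same:
        return base
    if u < same + ts:
        return TRANSITION[base]
    return rng.choice([b for b in "ACGT" if b not in (base, TRANSITION[base])])


def simulate_site(node, parent_base, kappa, rng, tips):
    """Evolve one site down the tree, recording the bases at the tips."""
    name, length, children = node
    base = evolve(parent_base, length, kappa, rng) if length > 0 else parent_base
    if not children:
        tips[name] = base
    for child in children:
        simulate_site(child, base, kappa, rng, tips)


def simulate_gene(n_sites, kappa, rng):
    """Simulate n_sites independent sites under one rate matrix."""
    seqs = {}
    for _ in range(n_sites):
        tips = {}
        simulate_site(TREE, rng.choice("ACGT"), kappa, rng, tips)
        for tip, base in tips.items():
            seqs.setdefault(tip, []).append(base)
    return {tip: "".join(bases) for tip, bases in seqs.items()}


rng = random.Random(1)
gene1 = simulate_gene(600, 1.0, rng)    # first stand-in rate matrix
gene2 = simulate_gene(400, 10.0, rng)   # second stand-in rate matrix
alignment = {tip: gene1[tip] + gene2[tip] for tip in gene1}
```

The concatenated alignment carries no marker of where gene 1 ends, so it is the kind of input on which an unpartitioned model can be tested.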
25
log-likelihoods obtained from three models
applied to simulated pattern-heterogeneity data
27
log-likelihoods by site in the simulated
pattern-heterogeneity data
29
Pattern-heterogeneity model: simulated and
obtained values of the rate parameters
30
Log-likelihoods for combined LSU/SSU nrRNA data
set: 54 species, n = 800 sites
32
log-likelihoods by site in the LSU/SSU combined
data set
The divide between the two genes
33
Log-likelihoods for cytochrome-b data set. N = 433
sites, of which 300 are fixed for a single
nucleotide
36
Metropolis-Hastings Algorithm: accept new tree
according to
likelihood ratio × prior ratio × proposal ratio
X = data (e.g., gene sequences); T = tree
(topology, branches, parameters)
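The product of the three ratios is usually evaluated on the log scale and capped at 1. The sketch below assumes that standard form; the argument names are hypothetical. Here log_q_forward is the log-probability of proposing the new tree from the current one, and log_q_reverse is that of the reverse move.

```python
import math
import random


def log_acceptance(log_lik_new, log_lik_old,
                   log_prior_new, log_prior_old,
                   log_q_reverse, log_q_forward):
    """Log of min(1, likelihood ratio * prior ratio * proposal ratio)."""
    log_r = ((log_lik_new - log_lik_old)
             + (log_prior_new - log_prior_old)
             + (log_q_reverse - log_q_forward))
    return min(0.0, log_r)


def accept(log_alpha, rng):
    """Accept the proposed tree with probability exp(log_alpha)."""
    return rng.random() < math.exp(log_alpha)


# The likelihood improves by 5 log-units; the prior and proposal terms cost 3.
log_alpha = log_acceptance(-100.0, -105.0, -3.0, -1.0, -2.0, -1.0)
print(log_alpha, accept(log_alpha, random.Random(1)))
```

With a symmetric proposal and a flat prior, the last two terms vanish and the rule reduces to the likelihood-only form given earlier in the talk.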