Title: PAML: Phylogenetic Analysis by Maximum Likelihood
1. PAML: Phylogenetic Analysis by Maximum Likelihood
Ziheng Yang, Department of Biology, University College London
http://abacus.gene.ucl.ac.uk/
2. Plan
- Overview of PAML, things it can do, and especially things that other programs don't do.
- An example (of detecting amino acids under positive selection)
- The trouble with sliding windows
3. PAML programs, currently in ver 3.15
- PAML programs are written in ANSI C. Executables are provided for MS Windows and Mac OS X. Source code can be compiled for Unix and other platforms.
- Free for academics (and everybody else).
- Sequential, not parallelized.
- Old-style command-line programs, with no GUI, no menu, no mice.
- Yang's theorem: Every version of PAML has bugs.
4. PAML programs
- baseml: ML under nucleotide-based models
- basemlg: Continuous-gamma, for bases (Yang 1993)
- codeml: ML for amino acids & codons
- evolver: The world's best named simulation program
- yn00: dN and dS estimation using YN2000
- chi2: $\chi^2$ critical values and p values
- pamp: Parsimony calculations (Yang and Kumar 1996)
- mcmctree: Species divergence times, soft bounds, relaxed clocks (Yang & Rannala 2006)
5. PAML docs & examples
- Docs in doc/: pamlDOC.pdf, pamlFAQ.pdf, pamlHistory.txt
- examples/ are provided with README files
- Apologies for poor support. Bug reports can come to my mailbox. Questions should go to the PAML discussion group: http://www.rannala.org/gsf
6. Major weaknesses
- Poor tree search
- Poor user interface
Major strength
- Many models implemented in the likelihood
framework.
7. Uses of PAML (i)
Maximum likelihood parameter estimation and likelihood ratio tests of hypotheses (such as the molecular clock and rate variation among sites) under a number of substitution models based on nucleotides, amino acids, and codons.
Most of the nucleotide-based models are available in PAUP. Most of the models are available in MrBayes?
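As a minimal illustration of maximum likelihood estimation under a nucleotide substitution model, the sketch below fits the JC69 distance between two aligned sequences. JC69 is used here only because its estimate has a closed form; the function names and the example counts (10 differences in 100 sites) are illustrative assumptions and are not taken from PAML.

```python
import math

def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of distance d > 0 (substitutions per site) under JC69."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # P(the two sequences show the same base)
    p_diff = 0.25 - 0.25 * e   # P(they show one specific pair of different bases)
    # Each site pattern also carries the stationary frequency 1/4.
    return ((n_sites - n_diff) * math.log(0.25 * p_same)
            + n_diff * math.log(0.25 * p_diff))

def jc69_mle_distance(n_sites, n_diff):
    """Closed-form maximum likelihood estimate of the JC69 distance."""
    p = n_diff / n_sites
    if p >= 0.75:
        raise ValueError("proportion of differences too large for JC69")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Illustrative counts only: 10 differences observed in 100 aligned sites.
d_hat = jc69_mle_distance(100, 10)
print(f"ML distance: {d_hat:.4f}")
print(f"lnL at the estimate: {jc69_log_likelihood(d_hat, 100, 10):.3f}")
```

The log-likelihood function can be evaluated at nearby distances to confirm that the closed-form estimate sits at the maximum.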
8. Uses of PAML (ii)
Likelihood (empirical Bayes) reconstruction of
ancestral nucleotide, amino acid, or codon
sequences. This is the same as parsimony
reconstruction except that it accounts for
different branch lengths and different rates of
change between states.
Yang, Z., S. Kumar, and M. Nei. 1995. Genetics 141:1641-1650.
Koshi, J. M., and R. A. Goldstein. 1996. J. Mol. Evol. 42:313-320.
Pupko, T., I. Pe'er, R. Shamir, and D. Graur. 2000. Mol. Biol. Evol. 17:890-896.
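To show why branch lengths matter for ancestral reconstruction, here is a minimal sketch that computes the posterior probability of each base at the root of a three-tip star tree. The JC69 model, the star tree, the branch lengths, and the function names are all assumptions chosen for brevity; this is not how codeml or baseml are implemented.

```python
import math

BASES = "ACGT"

def jc69_prob(a, b, t):
    """JC69 probability of base a changing to base b along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e

def root_posterior(tips):
    """Posterior of the root base; tips is a list of (observed base, branch length)."""
    weights = {}
    for x in BASES:
        w = 0.25  # stationary frequency used as the prior for the root base
        for base, t in tips:
            w *= jc69_prob(x, base, t)
        weights[x] = w
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}

# Same tip states (A, A, C), two sets of hypothetical branch lengths.
equal = root_posterior([("A", 0.1), ("A", 0.1), ("C", 0.1)])
unequal = root_posterior([("A", 1.0), ("A", 1.0), ("C", 0.01)])
best_equal = max(equal, key=equal.get)
best_unequal = max(unequal, key=unequal.get)
print(best_equal, best_unequal)
```

With equal branches the majority base A is favoured, as parsimony would suggest. When the two A tips sit on long branches and the C tip on a very short one, the likelihood calculation favours C instead.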
9. Uses of PAML (iii)
- Combined analysis of heterogeneous data sets.
- MrBayes has implemented more powerful models of this kind (Nylander et al. 2004. Syst. Biol. 53:47-67).
- These should make the following debates unnecessary:
  - combined analysis (total evidence) vs. separate analysis
  - supertree vs. supermatrix
Yang, Z. 1996. Maximum-likelihood models for combined analyses of multiple sequence data. J. Mol. Evol. 42:587-596.
Pupko, T., D. Huchon, Y. Cao et al. 2002. Combining multiple data sets in a likelihood analysis: which models are the best? Mol. Biol. Evol. 19:2294-2307.
10. Uses of PAML (iv)
Likelihood ratio test of the clock and likelihood estimation of species divergences under clock and relaxed-clock models (baseml & codeml).
Bayesian estimation of species divergence times using soft bounds and relaxed molecular clocks (mcmctree), similar to Jeff Thorne's multidivtime.
Rambaut, A., and L. Bromham. 1998. Mol. Biol. Evol. 15:442-448.
Yoder, A. D., and Z. Yang. 2000. Mol. Biol. Evol. 17:1081-1090.
Yang, Z., and A. D. Yoder. 2003. Syst. Biol. 52:705-716.
Yang, Z., and B. Rannala. 2006. Mol. Biol. Evol. 23:212-226.
Rannala, B., and Z. Yang. In preparation.
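For reference, the degrees of freedom of the likelihood ratio test of the clock follow from counting branch-length parameters. This is a standard result that the slide does not spell out. For a bifurcating tree of $n$ species (the symbol $n$ is introduced here for illustration), the clock model estimates the ages of the $n - 1$ internal nodes, while the no-clock model estimates the $2n - 3$ branch lengths of the unrooted tree. Parameters of the substitution model are shared by both and cancel:

```latex
\begin{align*}
\mathrm{d.f.} &= (2n - 3) - (n - 1) = n - 2, \\
2\Delta\ell &= 2\,(\ell_{\text{no clock}} - \ell_{\text{clock}})
  \;\text{ is compared with }\; \chi^2_{n-2}.
\end{align*}
```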
11. Uses of PAML (v): Codon substitution models & detection of selection in protein-coding genes (codeml)
- Branch models to test positive selection on lineages on the tree (Yang 1998. Mol. Biol. Evol. 15:568-573)
- Site models to test positive selection affecting individual sites (Nielsen & Yang 1998. Genetics 148:929-936; Yang et al. 2000. Genetics 155:431-449)
- Branch-site models to detect positive selection at a few sites on a particular lineage (Yang & Nielsen 2002. Mol. Biol. Evol. 19:908-917; Yang et al. 2005. Mol. Biol. Evol. 22:1107-1118; Zhang, J., R. Nielsen, and Z. Yang. 2005. Mol. Biol. Evol. 22:2472-2479)
12. MacCallum, C., and E. Hill. 2006. Being positive about selection. PLoS Biol 4:e87.
PLoS Biol is receiving and rejecting too many
manuscripts that use the MK test and paml/codeml
to detect positive selection. Their main
criterion right now is that the ms. should
include experimental verification to justify
publication in such high-profile journals.
13. LRT of amino acid sites under positive selection
$H_0$: there are no sites at which $\omega > 1$
$H_1$: there are such sites
Compare $2\Delta\ell = 2(\ell_1 - \ell_0)$ with a $\chi^2$ distribution
(Nielsen & Yang 1998. Genetics 148:929-936; Yang, Nielsen, Goldman & Pedersen 2000. Genetics 155:431-449)
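The comparison with a $\chi^2$ distribution can be sketched in a few lines. The code below is a minimal standard-library sketch, not the code of PAML's chi2 program: it uses the textbook finite-sum formulas for integer degrees of freedom, and the function names and the two log-likelihood values in the usage example are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for chi-square with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^k / k!.
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: two-sided normal tail plus a finite correction sum.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total

def lrt_p_value(lnl_null, lnl_alt, df):
    """Return the statistic 2*(lnL1 - lnL0) and its chi-square p value."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)

# Hypothetical log-likelihoods, for illustration only.
stat, p = lrt_p_value(-100.0, -97.0, df=1)
print(f"2*delta lnL = {stat:.2f}, p = {p:.4f}")
```

The familiar 5% critical values (3.84, 5.99, 7.81, and 9.49 for 1 to 4 degrees of freedom) give a quick sanity check of `chi2_sf`.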
14. Models M1a & M2a
- M1a (neutral)
  Site class       0                1
  Proportion       $p_0$            $p_1$
  $\omega$ ratio   $\omega_0 < 1$   $\omega_1 = 1$
- M2a (selection)
  Site class       0                1                2
  Proportion       $p_0$            $p_1$            $p_2$
  $\omega$ ratio   $\omega_0 < 1$   $\omega_1 = 1$   $\omega_2 > 1$
Modified from Nielsen & Yang (1998), where $\omega_0 = 0$ is fixed.
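A minimal sketch of how the two site models might be requested from codeml is given below. It only writes a control file. The option names are those I recall from the sample codeml.ctl distributed with PAML and should be checked against pamlDOC.pdf for the version in use; the file names are hypothetical placeholders.

```python
def codeml_site_model_ctl(seqfile, treefile, outfile):
    """Return control-file text asking codeml to fit site models M1a and M2a."""
    options = [
        ("seqfile", seqfile),    # codon alignment (hypothetical file name)
        ("treefile", treefile),  # tree file (hypothetical file name)
        ("outfile", outfile),    # main result file
        ("runmode", 0),          # 0: evaluate the user-supplied tree
        ("seqtype", 1),          # 1: codon sequences
        ("CodonFreq", 2),        # 2: F3x4 codon frequencies
        ("model", 0),            # 0: same omega distribution on all branches
        ("NSsites", "1 2"),      # site models 1 (M1a) and 2 (M2a)
        ("icode", 0),            # 0: universal genetic code
        ("fix_kappa", 0),        # estimate kappa
        ("kappa", 2),            # initial kappa
        ("fix_omega", 0),        # estimate omega
        ("omega", 0.4),          # initial omega
    ]
    return "\n".join(f"{name} = {value}" for name, value in options) + "\n"

ctl_text = codeml_site_model_ctl("example.nuc", "example.trees", "mlc")
print(ctl_text)
```

Saving this text as codeml.ctl and running codeml in the same directory is the usual workflow, as far as I am aware. The two log-likelihoods in the result file then feed the likelihood ratio test.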
15. Human MHC Class I data: 192 alleles, 270 codons
Model             $\ell$       Parameter estimates
M1a (neutral)     -7,490.99    $p_0 = 0.830$, $\omega_0 = 0.041$; $p_1 = 0.170$, $\omega_1 = 1$
M2a (selection)   -7,231.15    $p_0 = 0.776$, $\omega_0 = 0.058$; $p_1 = 0.140$, $\omega_1 = 1$; $p_2 = 0.084$, $\omega_2 = 5.389$
Likelihood ratio test of positive selection: $2\Delta\ell = 2 \times 259.84 = 519.68$, $P < 0.000$, d.f. = 2
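The arithmetic behind the test can be written out as a worked example. The two degrees of freedom correspond to the two extra parameters of M2a ($p_2$ and $\omega_2$). For two degrees of freedom the $\chi^2$ tail probability has the closed form $e^{-x/2}$, so the p value can be evaluated directly; the critical values quoted are the standard ones for $\chi^2_2$.

```latex
\begin{align*}
2\Delta\ell &= 2\,[-7231.15 - (-7490.99)] = 2 \times 259.84 = 519.68, \\
P &= \Pr(\chi^2_2 \ge 519.68) = e^{-519.68/2} \approx 1.4 \times 10^{-113}, \\
\chi^2_{2,\,5\%} &= 5.99, \qquad \chi^2_{2,\,1\%} = 9.21.
\end{align*}
```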
16. Posterior probabilities for MHC (M2a)
17. 25 sites identified under M2a
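The posterior probabilities can be illustrated with a small empirical Bayes calculation in the spirit of Nielsen & Yang (1998): each class proportion is multiplied by the likelihood of the site's data under that class and the products are normalized. This is a sketch only. The proportions are the M2a estimates from the MHC table, but the per-class site likelihoods are hypothetical numbers, and the slides do not say which variant of the calculation produced the figure.

```python
def site_class_posterior(proportions, class_likelihoods):
    """Posterior probability of each site class for one site (Bayes' rule)."""
    joint = [p * f for p, f in zip(proportions, class_likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# p0, p1, p2 estimated under M2a for the MHC data (from the table above).
m2a_proportions = [0.776, 0.140, 0.084]

# HYPOTHETICAL values of f(data at this site | omega of class k), k = 0, 1, 2.
site_likelihoods = [1.0e-12, 2.0e-10, 6.0e-9]

posterior = site_class_posterior(m2a_proportions, site_likelihoods)
for k, prob in enumerate(posterior):
    print(f"class {k}: posterior = {prob:.3f}")
```

A site would be reported as positively selected when the posterior for class 2 (the class with $\omega_2 > 1$) exceeds a chosen cutoff. The cutoff itself is a user choice and is not stated on the slides.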
18. There are a few wrong ways of detecting selection, one of which is sliding windows.
19. Sliding window analysis (mouse-rat BRCA1)
20. Sliding window analysis (fake data)
21. Two trends in sliding window analysis
- Both dS and dN fluctuate smoothly (because consecutive windows overlap)
- dS fluctuates more than dN (because there are fewer silent than replacement sites)
Sliding windows may be useful for displaying trends that are known to exist, but are misleading if used to detect trends.
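Both trends can be reproduced with simulated data in which no real trend exists. The sketch below is a toy simulation, not an analysis of dN and dS: every site changes with the same probability, and the window sizes, rate, seed, and the use of 25 versus 75 sites as stand-ins for silent and replacement sites are arbitrary assumptions.

```python
import math
import random

def sliding_means(values, window):
    """Means over all overlapping windows, moving one site at a time."""
    running = sum(values[:window])
    means = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        means.append(running / window)
    return means

def lag1_autocorrelation(xs):
    """Correlation between consecutive values of a series."""
    m = sum(xs) / len(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((a - m) ** 2 for a in xs)
    return num / den

def std_dev(xs):
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

rng = random.Random(2006)  # arbitrary seed
RATE = 0.2                 # same change probability at every site: no real trend

def simulate(n_sites):
    return [1 if rng.random() < RATE else 0 for _ in range(n_sites)]

# Trend 1: overlapping windows look smooth although sites are independent.
sites = simulate(20000)
raw_r = lag1_autocorrelation(sites)
window_r = lag1_autocorrelation(sliding_means(sites, 50))

# Trend 2: windows that contain fewer sites fluctuate more.
few = sliding_means(simulate(5000), 25)    # stand-in for silent sites
many = sliding_means(simulate(15000), 75)  # stand-in for replacement sites
sd_few, sd_many = std_dev(few), std_dev(many)

print(f"lag-1 autocorrelation: raw {raw_r:.3f}, windowed {window_r:.3f}")
print(f"window s.d.: fewer sites {sd_few:.3f}, more sites {sd_many:.3f}")
```

The raw sites are essentially uncorrelated, yet neighbouring windows are almost perfectly correlated because they share all but one site. The series built from fewer sites per window has the larger spread.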
22.
- Orthodox statistical analysis
  - formulate a biological hypothesis
  - design the experiment & collect data
  - test whether the data are compatible with the hypothesis
- The more-common way of data analysis in biology
  - a large amount of data, no a priori hypothesis
  - filter and plot data to identify unexpected patterns
  - test the patterns using statistical tests
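One way to see the risk in the second workflow is to simulate data with no pattern at all, pick the most extreme window after looking, and then test it as if it had been chosen in advance. The sketch below does this with an exact one-sided binomial test. The number of windows, window size, rate, and significance level are arbitrary assumptions, and the setup is a toy example rather than a model of any real data set.

```python
import math
import random

def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

def false_positive_rates(n_reps=2000, n_windows=10, window=20,
                         rate=0.2, alpha=0.05, seed=1):
    """Rejection rates under the null for a pre-specified and a post hoc window."""
    rng = random.Random(seed)
    hits_prespecified = 0
    hits_post_hoc = 0
    for _ in range(n_reps):
        # Null data: every site changes with the same probability.
        counts = [sum(rng.random() < rate for _ in range(window))
                  for _ in range(n_windows)]
        # Hypothesis fixed before seeing the data: always test the first window.
        if binomial_upper_tail(counts[0], window, rate) < alpha:
            hits_prespecified += 1
        # Pattern found by inspection: test whichever window looks most extreme.
        if binomial_upper_tail(max(counts), window, rate) < alpha:
            hits_post_hoc += 1
    return hits_prespecified / n_reps, hits_post_hoc / n_reps

pre_rate, post_rate = false_positive_rates()
print(f"pre-specified window: {pre_rate:.3f}")
print(f"most extreme window:  {post_rate:.3f}")
```

The pre-specified test stays at or below the nominal 5% level, while the test applied to the window selected after inspection rejects far more often, even though the data contain no real pattern.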
23. Acknowledgment (sliding-window analysis)
Karl Schmid