Title: Models of molecular evolution
1Models of molecular evolution
2Steps in evaluating one tree
- Pick a set of branch lengths
- Calculate the ln-likelihood of character pattern i, summing across all possible histories (ancestral-state assignments)
- Add up the LnL of each character pattern to get the overall ln-likelihood
- Adjust branch lengths, substitution parameters, etc. so as to maximize LnL
- The result is the tree's likelihood score
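The steps above can be sketched on the smallest useful case. This is a minimal illustration, not a production implementation: it assumes a three-tip star tree (a single internal node), the Jukes-Cantor model with branch lengths in expected substitutions per site, a made-up toy alignment, and a crude grid search over one shared branch length in place of a real optimizer. All function names are hypothetical.

```python
import math

BASES = "ACGT"

def jc_prob(i, j, t):
    """JC69 probability of going from base i to base j along a branch of
    length t (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def site_likelihood(pattern, branch_lengths):
    """Likelihood of one character pattern on a three-tip star tree:
    sum over every possible state at the single internal node."""
    total = 0.0
    for anc in BASES:
        term = 0.25  # stationary frequency of the ancestral state
        for tip_state, t in zip(pattern, branch_lengths):
            term *= jc_prob(anc, tip_state, t)
        total += term
    return total

def tree_lnl(patterns, branch_lengths):
    """Sum the ln-likelihoods of all character patterns."""
    return sum(math.log(site_likelihood(p, branch_lengths)) for p in patterns)

# Toy alignment, one string per site (tips 1-3); purely illustrative data.
patterns = ["AAA", "AAG", "CCC", "GGT", "TTT", "ACA"]

# Crude optimisation: try a grid of values for one shared branch length.
grid = [0.01 * k for k in range(1, 200)]
best_t = max(grid, key=lambda t: tree_lnl(patterns, [t, t, t]))
best_lnl = tree_lnl(patterns, [best_t] * 3)
print(f"best shared branch length ~ {best_t:.2f}, lnL = {best_lnl:.3f}")
```

Real programs optimize every branch length and substitution parameter separately and use the pruning algorithm to avoid enumerating ancestral states explicitly.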
3Typical Simplifying Assumptions
- Stationarity
- Reversibility
- Site independence
- Markovian process (no memory)
4The simplest model of molecular evolution
Jukes-Cantor
Instantaneous rate matrix (Q-matrix)
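The matrix shown on this slide did not survive in the transcript. Under the usual parameterization it would look like the following, assuming bases ordered A, C, G, T and a single rate (written here as the symbol $\mu$, which is a choice made for this sketch) for every kind of change:

```latex
Q =
\begin{pmatrix}
-3\mu & \mu   & \mu   & \mu   \\
\mu   & -3\mu & \mu   & \mu   \\
\mu   & \mu   & -3\mu & \mu   \\
\mu   & \mu   & \mu   & -3\mu
\end{pmatrix}
```

Each row sums to zero, so the diagonal entry is minus the total rate of leaving that state.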
5Calculating probabilities of change
- To convert the Q matrix into a matrix giving the
probability of starting in state i and ending in
state j, t time units later, use the formula
$P(t) = e^{Qt}$
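A minimal sketch of this conversion, using only the standard library. The matrix exponential is approximated by scaling and squaring with a truncated Taylor series, which is one of several possible methods; the function names and the choice of a Jukes-Cantor matrix scaled to one expected substitution per unit time are assumptions made for illustration.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(m, squarings=10, terms=12):
    """Approximate e^M: Taylor series on M / 2^squarings, then square back up."""
    n = len(m)
    scale = 2.0 ** squarings
    small = [[x / scale for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = mat_mul(term, small)
        term = [[x / k for x in row] for row in term]  # now small^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def jc_q(mu=1.0 / 3.0):
    """Jukes-Cantor rate matrix: every off-diagonal rate is mu, rows sum to 0."""
    return [[-3.0 * mu if i == j else mu for j in range(4)] for i in range(4)]

def transition_probs(q, t):
    """P(t) = e^{Qt}."""
    return mat_exp([[x * t for x in row] for row in q])

P = transition_probs(jc_q(), 0.5)
for row in P:
    print(["%.4f" % p for p in row])
```

Every row of the result sums to 1, and for Jukes-Cantor the output can be checked against the closed-form probabilities.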
6The simplest model of molecular evolution
Jukes-Cantor
Substitution probability matrix (P-matrix)
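The matrix on this slide is missing from the transcript. For Jukes-Cantor, exponentiating the rate matrix gives a closed form; the version below assumes every off-diagonal rate in Q equals $\mu$ (a symbol chosen for this sketch):

```latex
P_{ij}(t) =
\begin{cases}
\dfrac{1}{4} + \dfrac{3}{4}\, e^{-4\mu t} & \text{if } i = j \\[2ex]
\dfrac{1}{4} - \dfrac{1}{4}\, e^{-4\mu t} & \text{if } i \neq j
\end{cases}
```

At $t = 0$ the matrix is the identity, and as $t \to \infty$ every entry approaches 1/4, the equilibrium base frequency.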
7More complicated (realistic) models for DNA
- Allow deviation from equiprobable base frequencies
  - F81, HKY85, GTR
- Allow two substitution types, transitions (ti) and transversions (tv)
  - K2P, HKY85
- Allow for six substitution types
  - GTR
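One way to see how these models relate is to build them all from a single GTR constructor. The sketch below assumes the common parameterization in which the rate from base i to base j is an exchangeability times the frequency of j, with bases ordered A, C, G, T and the matrix rescaled to one expected substitution per unit time. The function names and the example frequencies are hypothetical.

```python
def gtr_q(exch, freqs):
    """Build a GTR rate matrix for bases ordered A, C, G, T.

    exch  : six exchangeabilities (AC, AG, AT, CG, CT, GT)
    freqs : equilibrium base frequencies (piA, piC, piG, piT)
    Off-diagonal entry q_ij = r_ij * pi_j; each diagonal makes its row sum to 0.
    The matrix is rescaled so the expected rate is 1 substitution per unit time.
    """
    pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    q = [[0.0] * 4 for _ in range(4)]
    for (i, j), r in zip(pairs, exch):
        q[i][j] = r * freqs[j]
        q[j][i] = r * freqs[i]
    for i in range(4):
        q[i][i] = -sum(q[i])  # diagonal is still 0 here, so this sums the off-diagonals
    rate = -sum(freqs[i] * q[i][i] for i in range(4))
    return [[x / rate for x in row] for row in q]

def hky85_q(kappa, freqs):
    """HKY85: transitions (A<->G, C<->T) are kappa times as fast as transversions."""
    return gtr_q([1.0, kappa, 1.0, 1.0, kappa, 1.0], freqs)

EQUAL = [0.25] * 4
UNEQUAL = [0.1, 0.2, 0.3, 0.4]      # illustrative frequencies only

jc69 = gtr_q([1.0] * 6, EQUAL)      # one substitution type, equal frequencies
k2p = hky85_q(2.0, EQUAL)           # ti and tv, equal frequencies
f81 = hky85_q(1.0, UNEQUAL)         # one substitution type, unequal frequencies
hky = hky85_q(2.0, UNEQUAL)         # ti and tv, unequal frequencies
```

Each simpler model is the more general one with some parameters fixed, which is what makes the models nested.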
8Relationship among models
9Accommodating rate heterogeneity
- Allow different subsets of sites to have different rates
- Invariant-sites model
  - Some characters assigned a rate of 0, remaining characters analyzed as usual
  - Proportion of invariant characters estimated by ML
- Discrete approximation to a gamma-distribution
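For the invariant-sites model, the likelihood of a single site is usually written as a two-part mixture. The notation below is an assumption made for this sketch: $p_{inv}$ is the proportion of invariant sites, $\pi_{x_i}$ is the frequency of the base seen at a constant site, and $L_i^{var}$ is the ordinary likelihood of site i computed on the tree.

```latex
L_i = p_{inv}\,\delta_i\,\pi_{x_i} + (1 - p_{inv})\,L_i^{var},
\qquad
\delta_i =
\begin{cases}
1 & \text{if all taxa share the same state at site } i \\
0 & \text{otherwise}
\end{cases}
```

A site that varies among taxa cannot belong to the invariant class, so only the second term contributes to its likelihood.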
10Summary of the Gamma correction
- The gamma distribution has a scale parameter and a shape parameter (α); the scale parameter is set to 1/α
- α represents variation in rates
- Very high values: all characters have a rate close to 1
- Low values (around 0.5 or below): most characters change little
- Value approaching 0: every character effectively has its own rate
- Estimate the value of α that maximizes L
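With shape parameter $\alpha$ and scale $1/\alpha$, the density of the relative rate $r$ at a site can be written out explicitly:

```latex
f(r) = \frac{\alpha^{\alpha}}{\Gamma(\alpha)}\, r^{\alpha - 1} e^{-\alpha r},
\qquad
\mathrm{E}[r] = 1,
\qquad
\mathrm{Var}[r] = \frac{1}{\alpha}
```

The mean rate is fixed at 1, so $\alpha$ alone controls the spread: a large $\alpha$ gives a small variance (nearly equal rates), and a small $\alpha$ gives a large variance.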
12Discrete approximation
- Divide the distribution into N equal-probability sets
- Assign the median rate for the set to all sites in the set
- Empirically it performs well with only four rate categories
- Adding categories does not add parameters
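The median-rate procedure can be sketched with the standard library alone. The gamma CDF below uses the power series for the regularized incomplete gamma function, and quantiles are found by bisection; both are choices made for this sketch rather than anything the slides specify. Rescaling the category rates so that their mean is 1 is a common convention that the slides do not state. All function names are hypothetical.

```python
import math

def gamma_cdf(x, shape):
    """CDF of a gamma distribution with the given shape and scale 1,
    computed from the power series of the regularized incomplete gamma."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= x / (shape + n)
        total += term
        n += 1
    return total * math.exp(-x + shape * math.log(x) - math.lgamma(shape))

def gamma_quantile(p, shape):
    """Invert gamma_cdf by bisection (scale 1)."""
    lo, hi = 0.0, 1.0
    while gamma_cdf(hi, shape) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def discrete_gamma_rates(alpha, n_cat=4):
    """Median rate of each of n_cat equal-probability categories of a
    gamma(shape=alpha, scale=1/alpha) distribution, rescaled to mean 1."""
    medians = [gamma_quantile((2 * k + 1) / (2.0 * n_cat), alpha) / alpha
               for k in range(n_cat)]
    mean = sum(medians) / n_cat
    return [m / mean for m in medians]

for alpha in (0.5, 2.0, 50.0):
    print(alpha, ["%.3f" % r for r in discrete_gamma_rates(alpha)])
```

Whatever the number of categories, the only free parameter is α, which is why adding categories does not add parameters.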
13Objective criterion for choosing a model of
molecular evolution
- Pick a more complex model (one with extra
parameters) only if the gain in likelihood is
more than would be expected by chance
- Likelihood ratio test: twice the difference in
ln-likelihoods between the two models (twice the
log of the likelihood ratio) approximately follows
the chi-square distribution, with degrees of
freedom equal to the number of extra free
parameters in the more complex model. The models must be nested.
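A minimal sketch of the likelihood ratio test using only the standard library. The chi-square tail probability is built from the recurrence for the regularized upper incomplete gamma function, which works for whole-number degrees of freedom. The ln-likelihood values in the example are invented for illustration, and the function names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1,
    using Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)."""
    if x <= 0.0:
        return 1.0
    y = 0.5 * x
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y)
    while a < 0.5 * df:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

def likelihood_ratio_test(lnl_simple, lnl_complex, extra_params):
    """Return the LRT statistic and its chi-square p-value for nested models."""
    stat = 2.0 * (lnl_complex - lnl_simple)
    return stat, chi2_sf(stat, extra_params)

# Hypothetical ln-likelihoods for two nested models that differ by one
# free parameter (for example JC69 versus K2P).
stat, p = likelihood_ratio_test(-1234.5, -1230.1, extra_params=1)
print(f"LRT statistic = {stat:.2f}, p = {p:.4f}")
```

A small p-value means the improvement in fit is larger than the extra parameters would be expected to produce by chance, so the more complex model is preferred.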
14Relationship between MP and ML
- One argument: MP is inherently nonparametric, so no direct comparison is possible
- Alternatively, MP is an ML model that makes particular
assumptions
15The Goldman (1990) model (see Lewis 1998 for more)
- We force all branch lengths to be equal
- The Likelihood for a character only includes the
set of ancestral states that maximizes the
likelihood
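The second assumption can be contrasted with standard ML in symbols. The notation is an assumption made for this sketch: $x_i$ is the data at character i, $a$ ranges over assignments of states to the internal nodes, $\tau$ is the tree and $\theta$ the model parameters.

```latex
L_i^{\mathrm{ML}} = \sum_{a} \Pr(x_i, a \mid \tau, \theta)
\qquad \text{versus} \qquad
L_i^{\mathrm{Goldman}} = \max_{a} \Pr(x_i, a \mid \tau, \theta)
```

Standard ML sums over every ancestral-state assignment, whereas this model keeps only the single best one.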
16Why use MP
- The model is clearly less realistic, but
- We can do more thorough searches and data
exploration (computational efficiency)
- Robust results will usually still be supported
17Why use ML
- The model (and its assumptions) is explicit
- We can statistically compare alternative models
- We can conduct parametric statistical tests
(under the assumption that we have used the
correct model)
- But, even the most complex model is still
unrealistically simple