Title: Molecular Clocks
1 Molecular Clocks
Prediction of time from molecular divergence
2 Molecular Clock
- Molecular divergence is ROUGHLY correlated with
time since divergence
3 Evidence for Rate Constancy in Hemoglobin
from Zuckerkandl and Pauling (1965)
4
- Given
- a phylogenetic tree
- branch lengths
- a time estimate for one (or more) node(s)
(Figure: calibrated node, 110 MYA)
- Can we date other nodes in the tree?
- Yes... if the rate of molecular change is
constant across all branches
5 The Molecular Clock Hypothesis
- Amount of genetic difference between sequences is
a function of time since separation
- Rate of molecular change is constant (enough) to
predict times of divergence (within the bounds of
particular genes and taxa)
6 Rate Constancy?
Page and Holmes, p. 240
7 Rate Heterogeneity
- Rate of molecular evolution can differ between
- nucleotide positions
- genes
- genomic regions
- genomes within species (nuclear vs organelle)
- species
- over time
- If not considered, introduces bias into
time estimates
8 Rate Heterogeneity among lineages
9 Local Clocks?
- Closely related species often share similar
properties and so are likely to have similar rates
- For example
- murid rodents evolve on average 2-6 times faster
than apes and humans (Graur and Li, p. 150)
- mouse and rat rates are nearly equal (Graur and Li,
p. 146)
10 Rate Changes within a Lineage
11 Identifying rate heterogeneity
- Tests of molecular clock
- Likelihood ratio test
- detects deviation from the clock but not which
sequences deviate
- Relative rates tests
- compare rates of sister nodes using an outgroup
- Tajima test
- the number of sites at which a character is shared
by the outgroup and only one of the two ingroups
should be equal for both ingroups
- Branch length test
- deviation of distance from root to leaf compared
to average distance
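The Tajima test bullet above can be sketched in a few lines. This is a minimal illustration, assuming three aligned, gap-free sequences of equal length; the function name `tajima_test` and the toy alignment are hypothetical and not from the slides. Under the clock the two counts m1 and m2 are expected to be equal, and (m1 - m2)^2 / (m1 + m2) is compared to a chi-square distribution with 1 degree of freedom.

```python
import math

def tajima_test(seq1, seq2, outgroup):
    """Tajima-style relative rate test on three aligned sequences.

    m1 counts sites where seq2 matches the outgroup but seq1 differs
    (a change unique to lineage 1); m2 is the mirror image for lineage 2.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to equal length")
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if b == o and a != o:
            m1 += 1
        elif a == o and b != o:
            m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0  # no informative sites
    stat = (m1 - m2) ** 2 / (m1 + m2)
    # Chi-square upper tail for 1 df: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(stat / 2))
    return m1, m2, stat, p

# Toy alignment (hypothetical data): lineage 1 carries all unique changes.
m1, m2, stat, p = tajima_test("ACGTTGCAAT", "ACGTACGTAT", "ACGAACGTAC")
print(m1, m2, stat, p)
```

With counts this small the chi-square approximation is rough; the toy data only show how the sites are tallied.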
12 Likelihood Ratio Test
- estimate a phylogeny under molecular clock and
without it
- e.g. under the clock, root-to-tip distances must
be equal
- twice the difference in log-likelihood,
$2\Delta \ln L$, follows a $\chi^2$ distribution with
$n-2$ degrees of freedom ($n$ taxa in tree)
- asymptotically
- when models are nested
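As a rough sketch of the test on this slide: given maximized log-likelihoods with and without the clock constraint, the statistic is twice their difference, compared to a chi-square with n - 2 degrees of freedom. The function names and the log-likelihood values below are hypothetical; in practice the likelihoods would come from a phylogenetics package. The chi-square tail is computed from a standard recurrence so the example needs only the standard library.

```python
import math

def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, nu = math.exp(-half), 2                  # exact tail for 2 df
    else:
        q, nu = math.erfc(math.sqrt(half)), 1       # exact tail for 1 df
    # Recurrence: Q(nu + 2) = Q(nu) + (x/2)^(nu/2) e^(-x/2) / Gamma(nu/2 + 1)
    while nu < df:
        q += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return q

def clock_lrt(lnl_clock, lnl_free, n_taxa):
    """Likelihood ratio test of the clock (null) vs. no clock."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    stat = 2.0 * (lnl_free - lnl_clock)
    df = n_taxa - 2
    return stat, df, chi2_sf(stat, df)

# Hypothetical log-likelihoods for a 6-taxon tree.
stat, df, p = clock_lrt(lnl_clock=-2345.6, lnl_free=-2339.1, n_taxa=6)
print(stat, df, p)
```

A small p-value rejects the clock for the tree as a whole; as the earlier slide notes, it does not say which lineages deviate.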
13 Relative Rates Tests (Sarich and Wilson 1973, Wu and
Li 1985)
- Test whether the distances from two taxa to an
outgroup are equal (or the average rates of two
clades vs an outgroup)
- need to compute expected variance
- many triples to consider, and not independent
(although modifications such as Li and Bousquet
1992 correct for this)
- Lack power, esp. with
- short sequences
- low rates of change
- Given length and number of variable sites in
typical sequences used for dating, Bromham et al.
(2000) conclude the tests are
- unlikely to detect moderate variation between
lineages (1.5-4x)
- likely to result in substantial error in date
estimates
14 Relative Rates Tests (Sarich and Wilson 1973, Wu and
Li 1985)
(Figure: three-taxon tree; Taxon 1 and Taxon 2 join at
node 0, with Taxon 3 as the outgroup)
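15 Relative Rates Tests (Sarich and Wilson 1973, Wu and
Li 1985)
$H_0$: $K_{01} = K_{02}$, or $K_{01} - K_{02} = 0$
$K_{13} = K_{01} + K_{03}$ (1)
$K_{23} = K_{02} + K_{03}$ (2)
$K_{12} = K_{01} + K_{02}$ (3)
$K_{01} = (K_{13} + K_{12} - K_{23})/2$ (4)
$K_{02} = (K_{12} + K_{23} - K_{13})/2$ (5)
$K_{03} = (K_{13} + K_{23} - K_{12})/2$ (6)
$K_{01} - K_{02} = K_{13} - K_{23}$
Test statistic (requires the variance of $K_{13} - K_{23}$):
$z = (K_{13} - K_{23}) / [\mathrm{var}(K_{13} - K_{23})]^{1/2}$
Compare $z$ to the normal distribution
(Figure: node 0 joins Taxon 1 via branch K01 and
Taxon 2 via branch K02; branch K03 leads to Taxon 3,
the outgroup)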
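The arithmetic of equations (4)-(6) and the z statistic can be sketched as follows. This is only an illustration: the function name `relative_rate_test`, the pairwise distances, and the variance value are hypothetical, and the variance of K13 - K23 is taken as an input because estimating it depends on the substitution model.

```python
import math

def relative_rate_test(k12, k13, k23, var_diff):
    """Branch lengths and z-score for a three-taxon relative rate test.

    k12, k13, k23 are pairwise distances (taxon 3 is the outgroup);
    var_diff is var(K13 - K23), estimated separately.
    """
    k01 = (k13 + k12 - k23) / 2   # equation (4)
    k02 = (k12 + k23 - k13) / 2   # equation (5)
    k03 = (k13 + k23 - k12) / 2   # equation (6)
    z = (k13 - k23) / math.sqrt(var_diff)
    # Two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return k01, k02, k03, z, p

# Hypothetical distances and variance.
k01, k02, k03, z, p = relative_rate_test(k12=0.10, k13=0.30, k23=0.26,
                                         var_diff=0.0004)
print(k01, k02, k03, z, p)
```

Note that K01 - K02 equals K13 - K23 by construction, so the test never needs the position of node 0 explicitly.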
16 Measuring Evolutionary Time with a Molecular Clock
- Estimate genetic distance
- $d$ = number of amino acid replacements
- Use paleontological data to determine date of
common ancestor
- $T$ = time since divergence
- Estimate calibration rate (number of genetic
changes expected per unit time)
- $r = d / (2T)$
- Calculate time of divergence for novel sequences
- $T_{ij} = d_{ij} / (2r)$
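The two formulas on this slide can be put together in a short worked example. The function names and the distances are hypothetical; the 110 MY calibration age simply reuses the figure label from the earlier slide. The factor of 2 appears because changes accumulate along both lineages since the split.

```python
def calibrate_rate(d_calib, t_calib):
    """Rate per lineage per unit time, r = d / (2T)."""
    return d_calib / (2.0 * t_calib)

def divergence_time(d_ij, rate):
    """Time since taxa i and j split, T_ij = d_ij / (2r)."""
    return d_ij / (2.0 * rate)

# Hypothetical calibration: distance 0.22 between two taxa known
# (e.g. from fossils) to have diverged 110 MY ago.
r = calibrate_rate(d_calib=0.22, t_calib=110.0)

# Date a new pair separated by a hypothetical distance of 0.05.
t_new = divergence_time(0.05, r)
print(r, t_new)
```

This assumes a single strict clock and an error-free calibration, which is exactly the idealization the following slides question.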
17 Perfect Molecular Clock
- Change is a linear function of time (substitutions
are Poisson distributed; variation is due only to
stochastic error)
- Rates constant (positions/lineages)
- Tree perfect
- Molecular distance estimated perfectly
- Calibration dates without error
- Regression (time vs substitutions) without error
18 Poisson Variance (Assuming a Perfect Molecular
Clock)
- If one mutation occurs every MY on average
- Poisson variance
- 95% of lineages 15 MY old have 8-22 substitutions
- 8 substitutions could also be 5 MY old
- Molecular Systematics, p. 532
19 Estimating Substitution Rate
- Calculate separate rate for each data set
(species/genes) using known date of divergence
(from fossil, biogeography)
- One calibration point
- Rate = d / (2T)
- More than one calibration point
- use regression
20 Calibration Complexities
- Cannot date fossils perfectly
- Fossils usually not direct ancestors
- branched off tree before (after?) splitting
event
- Impossible to pinpoint the age of last common
ancestor of a group of living species
21 Linear Regression
- Fix intercept at (0,0)
- Fit line between divergence estimates and
calibration times
- Calculate regression and prediction confidence
limits
- A: regression line
- B1-B2: 95% CI of regression line
- C1-C2: 95% CI for predicted time values
- Molecular Systematics, p. 536
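A minimal sketch of the fit described on this slide, assuming ordinary least squares with the intercept fixed at (0,0), for which the slope is sum(T*d) / sum(T^2). The calibration data and function names are hypothetical, and the confidence limits (B1-B2, C1-C2) are deliberately left out to keep the example short.

```python
def fit_through_origin(times, distances):
    """Least-squares slope of distance on time with intercept fixed at 0."""
    if len(times) != len(distances) or not times:
        raise ValueError("need equal-length, non-empty inputs")
    sxy = sum(t * d for t, d in zip(times, distances))
    sxx = sum(t * t for t in times)
    return sxy / sxx

def predict_time(distance, slope):
    """Invert the fitted line to date a node from its distance."""
    return distance / slope

# Hypothetical calibration points: (time in MY, pairwise distance).
times = [10.0, 50.0, 110.0]
dists = [0.021, 0.098, 0.223]
slope = fit_through_origin(times, dists)
print(slope, predict_time(0.05, slope))
```

Here the slope is distance per MY for a pair of taxa, so it plays the role of 2r rather than r.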
22 Molecular Dating: Sources of Error (assuming
constant rates)
- Both X and Y values are only estimates
- substitution model could be incorrect
- tree could be incorrect
- errors in orthology assignment
- Poisson variance is large
- Pairwise divergences are correlated (Molecular
Systematics, p. 534)
- inflates correlation between divergence and time
- Sometimes calibrations are correlated
- if using derived calibration points
- Error in inferring slope
- Confidence interval for predictions much larger
than confidence interval for slope
23 Working Around Rate Heterogeneity
- Identify lineages that deviate and remove them
- Quantify degree of rate variation to put limits
on possible divergence dates
- requires several calibration dates, not always
available
- gives very conservative estimates of molecular
dates
- Explicitly model rate variation (relaxed clocks)
24 Relaxing the Molecular Clock (Rutschmann 2006,
review)
- Likelihood analysis
- Assign each branch a rate parameter
- explosion of parameters, not realistic
- User can partition branches based on domain
knowledge
- Rates of partitions are independent
- Nonparametric methods
- smooth rates along tree
- Bayesian approach
- stochastic model of evolutionary change
- prior distribution of rates
- Bayes theorem
- MCMC
25 Multiple Gene Loci
- "Trying to estimate time of divergence from one
protein is like trying to estimate the average
height of humans by measuring one human"
(Molecular Systematics, p. 539)
- Ideally
- use multiple genes
- use multiple calibration points
26 Even so, be very cautious about divergence time
inferences
- Point estimates are absurd
- Sampling errors are often based only on the
difference between estimates in the
same study
- Even estimates with confidence intervals are
unlikely to capture all sources of variance
27 General References
- Reviews/Critiques
- Bromham and Penny. The modern molecular clock.
Nature Genetics, 2003.
- Graur and Martin. Reading the entrails of
chickens...the illusion of precision. Trends in
Genetics, 2004.
- Rutschmann. Molecular dating of phylogenetic
trees: a brief review of current methods that
estimate divergence times. Diversity and
Distributions, 2006.
- Textbooks
- Molecular Systematics. 2nd edition. Edited by
Hillis, Moritz, and Mable.
- Inferring Phylogenies. Felsenstein.
- Molecular Evolution, a phylogenetic approach.
Page and Holmes.
28 Rate Heterogeneity References
- Dealing with Rate Heterogeneity
- Yang and Yoder. Comparison of likelihood and
Bayesian methods for estimating divergence times.
Syst. Biol., 2003.
- Kishino, Thorne, and Bruno. Performance of a
divergence time estimation method under a
probabilistic model of rate evolution. Mol. Biol.
Evol., 2001.
- Huelsenbeck, Larget, and Swofford. A compound
Poisson process for relaxing the molecular clock.
Genetics, 2000.
- Testing for Rate Heterogeneity
- Takezaki, Rzhetsky, and Nei. Phylogenetic test of
the molecular clock and linearized trees. Mol.
Biol. Evol., 1995.
- Bromham, Penny, Rambaut, and Hendy. The power of
relative rates tests depends on the data. J. Mol.
Evol., 2000.