Title: The Coalescent Theory
1. The Coalescent Theory
- By Mireya Diaz
- Department of Epidemiology and Biostatistics, for EECS 458
2. Agenda
- Basic concepts of population genetics
- The coalescent theory
- Coalescent process of two sequences
- Coalescent time
- Statistical inference
- Applications: reconstruction of human evolutionary history
- Future venues
3. Basic Concepts in Population Genetics
Mutation
Random genetic drift
Selection
4. Basic Concepts in Population Genetics
- Mutation: limited role in evolution because its effect is slow; however, it contributes to the maintenance of alleles in the population
- Locus with 2 alleles, A1 (frequency p(n)) and A2 (frequency q(n) = 1 - p(n))
- Non-overlapping generations
- A1 → A2 at rate u and A2 → A1 at rate v (u, v ~ 10^-5 to 10^-6)
- An allele can mutate at most once per generation
- If the initial gene frequency of A1 is p(0):
p(n) = v/(u+v) + [p(0) - v/(u+v)] (1 - u - v)^n
As n → ∞, equilibrium: p = v/(u+v)
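The approach to mutational equilibrium can be checked numerically. The sketch below assumes the one-generation recursion p(n+1) = (1 - u) p(n) + v (1 - p(n)), which follows from the rates stated above. The function names and the values of u, v and p(0) are illustrative and not taken from the slides.

```python
def next_freq(p, u, v):
    """One generation of two-way mutation: A1 -> A2 at rate u, A2 -> A1 at rate v."""
    return (1.0 - u) * p + v * (1.0 - p)


def freq_after(p0, u, v, n):
    """Frequency of A1 after n generations, obtained by iterating the recursion."""
    p = p0
    for _ in range(n):
        p = next_freq(p, u, v)
    return p


def closed_form(p0, u, v, n):
    """Closed-form solution of the same recursion."""
    p_eq = v / (u + v)  # equilibrium frequency of A1
    return p_eq + (p0 - p_eq) * (1.0 - u - v) ** n


if __name__ == "__main__":
    u, v, p0 = 1e-5, 1e-6, 0.5  # illustrative values only
    for n in (0, 10_000, 100_000, 1_000_000):
        print(n, freq_after(p0, u, v, n), closed_form(p0, u, v, n))
    print("equilibrium:", v / (u + v))
```

With rates this small, the frequency needs on the order of 1/(u+v) generations to approach equilibrium, which illustrates why mutation alone acts slowly.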
5. Basic Concepts in Population Genetics
- Random genetic drift: change in gene frequency due to random sampling of gametes from a finite population. Important for small populations
- Each generation, 2N gametes are sampled at random from the parent generation
- y(n) = number of gametes of type A1 in generation n; in the absence of mutation and selection:
P(y(n+1) = j | y(n) = i) = C(2N, j) (i/2N)^j (1 - i/2N)^(2N-j)
Wright-Fisher model
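The binomial sampling step of the Wright-Fisher model is easy to simulate. The sketch below is a minimal illustration using only the Python standard library. The function names, the seed and the example population size are assumptions made for the example.

```python
import random


def wright_fisher_step(y, two_n, rng):
    """Draw the next-generation count of A1 among two_n gametes (binomial sampling)."""
    p = y / two_n
    return sum(1 for _ in range(two_n) if rng.random() < p)


def simulate_drift(y0, two_n, max_generations, seed=0):
    """Follow the A1 count until fixation, loss, or max_generations is reached."""
    rng = random.Random(seed)
    trajectory = [y0]
    y = y0
    while 0 < y < two_n and len(trajectory) <= max_generations:
        y = wright_fisher_step(y, two_n, rng)
        trajectory.append(y)
    return trajectory


if __name__ == "__main__":
    traj = simulate_drift(y0=10, two_n=20, max_generations=1000, seed=1)
    print("generations simulated:", len(traj) - 1)
    print("final count of A1:", traj[-1])
```

In a small population such as this one, the trajectory typically ends in loss (0) or fixation (2N) after a modest number of generations, which is the effect of drift described above.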
6. Basic Concepts in Population Genetics
- Selection: can act at different stages of the life of an organism (e.g. differential fecundity, viability)
- Locus with 2 alleles A1, A2
- Three genotypes: A1A1 (w11), A1A2 (w12), A2A2 (w22)
- with fitness wij, the relative survival chance of zygotes of genotype AiAj
- Under Hardy-Weinberg equilibrium
If w11 > w12 > w22 → A1 becomes fixed
If w11 < w12 < w22 → A2 becomes fixed
If w11, w22 < w12 → overdominance, stable polymorphism
If w12 < w11, w22 → underdominance, unstable polymorphism, A1 or A2 becomes fixed
f(0)
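The four fitness regimes can be reproduced by iterating a selection recursion. The sketch below assumes the standard viability-selection update under Hardy-Weinberg proportions, p' = p (p w11 + q w12) / w_bar with w_bar = p^2 w11 + 2pq w12 + q^2 w22. This recursion is not shown in the text above, and the fitness values are illustrative.

```python
def next_freq_selection(p, w11, w12, w22):
    """One generation of viability selection on the frequency p of A1."""
    q = 1.0 - p
    mean_fitness = p * p * w11 + 2.0 * p * q * w12 + q * q * w22
    return p * (p * w11 + q * w12) / mean_fitness


def iterate_selection(p0, w11, w12, w22, generations):
    """Frequency of A1 after the given number of generations."""
    p = p0
    for _ in range(generations):
        p = next_freq_selection(p, w11, w12, w22)
    return p


if __name__ == "__main__":
    cases = {
        "w11 > w12 > w22": (1.0, 0.9, 0.8),
        "w11 < w12 < w22": (0.8, 0.9, 1.0),
        "overdominance": (0.9, 1.0, 0.8),
        "underdominance": (1.0, 0.8, 1.0),
    }
    for name, (w11, w12, w22) in cases.items():
        for p0 in (0.4, 0.6):
            print(name, "p0 =", p0, "->", iterate_selection(p0, w11, w12, w22, 5000))
```

In the underdominance case the outcome depends on the starting frequency: with these values, starting below 0.5 leads to loss of A1 and starting above 0.5 leads to its fixation.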
7. The Coalescent Theory
- Stochastic process: continuous-time Markov process
- Large-population approximation of the Wright-Fisher model and other neutral models
- Probability model for the genealogical tree of a random sample of n genes from a large population
- Most significant progress in theoretical population genetics (past 2 decades). Cornerstone for rigorous statistical analysis of molecular data from populations
- Need to infer the past from samples taken from the present population
- Seminal work: Kingman, J Appl Prob 19A:27, 1982
8. The Coalescent Theory: Key Idea
- Start with a sample and trace backwards in time to identify EVENTS in the past since the Most Recent Common Ancestor (MRCA) of the sample
- Consider a sample of n sequences of a DNA region from a population
- Assume no recombination between sequences
- The n sequences are connected by a single phylogenetic tree (genealogy) whose root is the MRCA
9. The Coalescent Theory: Usefulness
- Sample-based theory
- By-product: development of highly efficient algorithms for simulation of samples under various population genetics models
- Particularly suitable for molecular data
- Estimate parameters of evolutionary models (vs. history of a specific locus: phylogenetics)
10. The Coalescent Process of Two Sequences
- Consider diploid organisms
- Wright-Fisher model
- Sequences in a population at a given generation are a random sample, with replacement, from those in the previous generation
- Mutations at the locus of interest are selectively neutral (do not affect reproductive success; all individuals equally likely to reproduce, all lineages equally likely to coalesce)
- P(coalescence at previous generation)?
- P = 1/(2N), N = effective population size
- For haploid structures, use N rather than 2N
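Because two lineages coalesce in each generation with probability 1/(2N), the waiting time back to their common ancestor is geometric with mean 2N generations. The sketch below checks this by direct simulation. The function names, seed and population size are illustrative assumptions.

```python
import random


def generations_to_coalescence(two_n, rng):
    """Count generations back until two lineages pick the same parental copy."""
    t = 1
    # Each lineage chooses its parent uniformly, with replacement, among two_n copies.
    while rng.randrange(two_n) != rng.randrange(two_n):
        t += 1
    return t


def mean_coalescence_time(two_n, replicates, seed=0):
    """Monte Carlo estimate of the mean pairwise coalescence time in generations."""
    rng = random.Random(seed)
    total = sum(generations_to_coalescence(two_n, rng) for _ in range(replicates))
    return total / replicates


if __name__ == "__main__":
    two_n = 50  # illustrative number of gene copies (2N)
    print("simulated mean:", mean_coalescence_time(two_n, 5000))
    print("expected mean (2N):", two_n)
```

For a haploid locus the same code applies with `two_n` set to N.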
11. The Coalescent Tree
Genealogical relationship of a sample of genes
- Topology is independent of branch lengths
- Branch lengths are independent, exponential random variables (waiting times between coalescent events)
- Topology is generated by randomly picking lineages to coalesce → all topologies are equally likely
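These two ingredients, exponential waiting times and a uniformly chosen pair of lineages, are enough to generate a genealogy. The sketch below is one minimal way to do so. It assumes time is measured in units of 2N generations, so that the coalescence rate with k lineages is k(k-1)/2. The event format and function name are hypothetical.

```python
import random


def simulate_coalescent(n, seed=0):
    """Return a list of (time, (a, b), new_label) coalescent events for n samples.

    Time is in units of 2N generations (assumed scaling).
    """
    rng = random.Random(seed)
    lineages = list(range(n))  # labels 0..n-1 are the sampled sequences
    next_label = n             # internal nodes get labels n, n+1, ...
    time = 0.0
    events = []
    while len(lineages) > 1:
        k = len(lineages)
        rate = k * (k - 1) / 2.0        # mean waiting time is 2 / (k (k - 1))
        time += rng.expovariate(rate)   # branch lengths: independent exponentials
        a, b = rng.sample(lineages, 2)  # topology: every pair equally likely
        lineages.remove(a)
        lineages.remove(b)
        lineages.append(next_label)
        events.append((time, (a, b), next_label))
        next_label += 1
    return events


if __name__ == "__main__":
    for time, pair, node in simulate_coalescent(4, seed=1):
        print(f"t = {time:.3f}: lineages {pair} coalesce into node {node}")
```

The last event gives the time to the MRCA of the sample. The waiting times are drawn without reference to which pair merges, reflecting the independence of topology and branch lengths.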
12. The Coalescent Time
- Assume the number of mutations in a given period is Poisson distributed
- Mean coalescence time of two sequences: 2N generations
- Mean number of mutations between two sequences:
- θ = 4Nμ (μ = mutation rate per sequence per generation)
- Underlying assumption: random mating (organisms with high mobility)
- Coalescent time: time between two successive coalescent events
- Exponential variable with mean 2/[k(k-1)], in units of 2N generations
- k = number of ancestral sequences between the two events
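The means above combine into simple closed-form quantities. The sketch below computes the expected waiting time at each stage, the expected time to the MRCA of n sequences (the sum of those waiting times, which equals 2(1 - 1/n)), and θ for two sequences. Times are in units of 2N generations. The function names and numerical inputs are illustrative.

```python
def expected_waiting_time(k):
    """Mean time during which k ancestral lineages exist (units of 2N generations)."""
    return 2.0 / (k * (k - 1))


def expected_tmrca(n):
    """Expected time to the MRCA: sum of mean waiting times for k = n, ..., 2."""
    return sum(expected_waiting_time(k) for k in range(2, n + 1))


def theta(n_eff, mu):
    """Mean number of mutations separating two sequences, 4 * N * mu."""
    return 4.0 * n_eff * mu


if __name__ == "__main__":
    for n in (2, 5, 10, 100):
        print(n, expected_tmrca(n), 2.0 * (1.0 - 1.0 / n))
    print("theta:", theta(n_eff=10_000, mu=1e-4))  # illustrative values
```

The expected time to the MRCA stays below 2 (that is, 4N generations) however large the sample, because most coalescent events happen quickly while many lineages remain.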
13. Coalescent Tree Parameters
Expected total branch length of the tree
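While k lineages exist they contribute k branches, each lasting on average 2/[k(k-1)] time units. Summing over k = 2, ..., n gives an expected total branch length of 2 (1 + 1/2 + ... + 1/(n-1)), in units of 2N generations. The sketch below evaluates this sum and compares it with a Monte Carlo estimate. The function names and settings are assumptions made for the example.

```python
import random


def expected_total_branch_length(n):
    """E[L] = 2 * sum_{i=1}^{n-1} 1/i, in units of 2N generations (assumed scaling)."""
    return 2.0 * sum(1.0 / i for i in range(1, n))


def simulated_total_branch_length(n, replicates, seed=0):
    """Average total tree length over simulated coalescent waiting times."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replicates):
        for k in range(n, 1, -1):
            # k branches persist for an exponential time with rate k (k - 1) / 2.
            total += k * rng.expovariate(k * (k - 1) / 2.0)
    return total / replicates


if __name__ == "__main__":
    n = 5
    print("analytic:", expected_total_branch_length(n))
    print("simulated:", simulated_total_branch_length(n, 20_000))
```

The total length grows only logarithmically with the sample size, so adding more sequences adds relatively little new branch length on which mutations can fall.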
14. The Coalescent Theory: Statistical Inference
- Mutation rate
- Age of MRCA
- Recombination rate
- Ancestral population size
- Migration rate
15. Reconstruction of Human Evolutionary History
- Goal: estimate times of evolutionary events (major migrations) and demographic history (population bottlenecks, expansions)
- Haploid sequences: mtDNA, Y chromosome
- Case study: recent common ancestry of the human Y chromosome
- Source: Thomson et al. PNAS 2000; 97:7360-5
- Estimates: expected time to MRCA and ages of certain mutations
- Data: 53-70 chromosomes, sequence variation at three genes (SMCY, DBY, DFFRY) on the Y chromosome
16. Recent common ancestry of Y chromosome
- For ages of major events, a mutation rate estimate is needed (single-nucleotide, SN, substitutions)
- Substitutions between chimpanzee and human sequences
- Mutation rate per site per year = No. subst./(2 × Tsplit × L)
- Tsplit = time since the chimp and human lineages split (5M years ago)
- Assumption: selective neutrality of all changes on Y since divergence
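The rate formula is a one-line calculation. In the sketch below, L is taken to be the number of aligned sites compared, which is an assumption since the text does not define it. The factor 2 accounts for substitutions accumulating along both the human and the chimpanzee lineage since the split. The input values are hypothetical and are not the figures from Thomson et al.

```python
def mutation_rate_per_site_per_year(n_substitutions, t_split_years, n_sites):
    """Substitutions / (2 * Tsplit * L): rate per site per year along one lineage."""
    return n_substitutions / (2.0 * t_split_years * n_sites)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    rate = mutation_rate_per_site_per_year(
        n_substitutions=100,  # human-chimp differences observed
        t_split_years=5e6,    # Tsplit as given above
        n_sites=10_000,       # assumed meaning of L: sites compared
    )
    print("rate per site per year:", rate)
```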
Summary of gene characteristics from sample
Source: Table 1 from article. () in no. polymorphisms: after removal of length variants, repeat sequences, indels
17. GENETREE Analysis
- Software: www.stats.ox.ac.uk/stephens/group/software.html
- Estimate mean number of mutations: θ = 2Neμ
- Ne = effective number of Y chromosomes in the population
- μ = mutation rate per gene per generation
- Also expected ages of mutations, time since MRCA
- Assumptions: coalescent process, infinitely-many-sites mutation (mutation rate low enough → each mutation occurs at a new site)
- Four insertions, three deletions, two repeat mutations (different rates from SN substitutions)
- Only one segregating site in SMCY appeared to have mutated more than once → data fit the infinitely-many-sites model
18. Recent common ancestry of Y chromosome
MRCA distribution under constant population
MRCA distribution under exponential population growth
1 Expected age in Ne generations. 2 Value in years Ne25
19. GENETREE Analysis
Expected ages of mutations in tree
Mutation 1: 47,000 (35,000 to 89,000), male movement out of Africa
Mutation 2: 40,000 (31,000 to 79,000), beginning of global expansion
20. Future Venues
- Population genetics models: incorporation of migration, population growth, recombination, natural selection
- Longitudinal analysis
- Evolutionary analysis of quantitative trait loci (QTL)
- Properties of coalescent theory (CT)
- Accuracy of the coalescent approximation under combinations of population size, sample size, mutation rate
- Properties of estimators under MCMC
21. References
- Handbook of Statistical Genetics, 2nd edition, Vol. 2
- Nature 2002; 3:380-390
- Theoretical Population Biology 1999; 56:1-10