Title: Coalescent
1. Coalescent
2. Classical Population Genetics
- How do evolutionary forces affect future changes in allele and genotype frequencies?
  - Mutation
  - Natural selection
  - Random genetic drift
  - Migration and population growth
3. Coalescent Theory
- What past evolutionary forces have acted to give us the scope and distribution of genetic variation we see today?
- The major difference
  - Classical population genetics focuses on properties of the entire population.
  - Coalescent theory focuses on properties of a sample from the population.
4. [Figure; labels: divergence, coalescent]
5. Terminology
- Gene genealogy (genealogy)
  - The phylogeny of a sample from a population.
- The most recent common ancestor (MRCA)
  - The root of the genealogy.
- Coalescent event
  - When two alleles coalesce, that is, trace back to a common ancestral allele, it is called a coalescent event.
- The coalescent time
  - The number of generations between successive coalescent events.
- The infinite allele model
  - Each mutation creates an allele that has never been present in the population.
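6. The Wright-Fisher model
The genes of the next generation are a random sample, drawn with replacement, from the gene pool of the current generation. With 2N genes in the population, each gene of the next generation is a copy of a given gene of the current generation with probability P = 1/(2N).
[Figure: current generation and next generation]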
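The sampling step can be sketched in a few lines of Python. This is only an illustration under the neutral model described here; the function name `next_generation`, the seed, and the choice of 2N = 10 are assumptions made for the example.

```python
import random
from collections import Counter

def next_generation(num_genes, rng):
    """Return, for each gene of the next generation, the index of its parent gene.

    Parents are drawn uniformly with replacement, so each new gene picks any
    given parent with probability 1 / num_genes (num_genes plays the role of 2N).
    """
    return [rng.randrange(num_genes) for _ in range(num_genes)]

rng = random.Random(1)        # fixed seed, only for reproducibility
two_n = 10                    # hypothetical population of 2N = 10 genes
parents = next_generation(two_n, rng)
offspring_counts = Counter(parents)   # how many copies each parent gene left

print("parent of each new gene:", parents)
print("parent genes leaving no copies:", two_n - len(offspring_counts))
```

Because sampling is with replacement, some parent genes usually leave several copies and others none, which is the source of random genetic drift in this model.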
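7. The coalescence of two sequences
[Figure: generation t-1 (2N sequences) and generation t (2N sequences). Two sequences in generation t have the same parent in generation t-1 with probability P = 1/(2N), and different parents with probability P = 1 - 1/(2N).]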
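8. The probability that the two sequences came from a single ancestral sequence $t_2 = t + 1$ generations ago is
$$(1 - p_2)(1 - p_2) \cdots (1 - p_2)\, p_2 = (1 - p_2)^t\, p_2 = \frac{1}{2N}\left(1 - \frac{1}{2N}\right)^t,$$
where $p_2 = 1/(2N)$ is the probability that two sequences share a parent in the previous generation.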
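A small simulation can be used to check this geometric distribution. The sketch below assumes the two lineages choose their parents independently and uniformly among the 2N genes in each generation; the names `coalescence_pmf` and `simulate_pair_coalescence`, the seed, and N = 10 are illustrative choices, not part of the slides.

```python
import random

def coalescence_pmf(t, pop_size):
    """Probability that two sequences first share an ancestor t + 1 generations ago."""
    p2 = 1.0 / (2 * pop_size)          # chance of sharing a parent in one generation
    return (1.0 - p2) ** t * p2

def simulate_pair_coalescence(pop_size, rng):
    """Trace two lineages back until both pick the same parent gene."""
    generations = 1
    while rng.randrange(2 * pop_size) != rng.randrange(2 * pop_size):
        generations += 1
    return generations

rng = random.Random(3)                 # fixed seed, only for reproducibility
pop_size = 10                          # hypothetical N, so 2N = 20 genes
reps = 20000
mean_time = sum(simulate_pair_coalescence(pop_size, rng) for _ in range(reps)) / reps

print("simulated mean coalescence time:", mean_time)
print("mean of the geometric distribution (2N):", 2 * pop_size)
```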
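9. Continuous approximation
Advantage: it makes the mathematics simpler and simulation faster.
Note that $e^{-x} \approx 1 - x$ when $x$ is small. Therefore the distribution of $t_2$ can be approximated by an exponential distribution:
$$f(t_2) = \frac{1}{2N}\, e^{-t_2/(2N)}$$
Properties
$$E(t_2) = 2N, \qquad \mathrm{Var}(t_2) = (2N)^2$$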
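To see how close the approximation is, the exact probability of no coalescence in t generations, $(1 - 1/(2N))^t$, can be compared with $e^{-t/(2N)}$. The value N = 1000 and the function names below are assumptions chosen for illustration.

```python
import math

def geometric_tail(t, pop_size):
    """Exact Wright-Fisher probability of no coalescence in t generations."""
    return (1.0 - 1.0 / (2 * pop_size)) ** t

def exponential_tail(t, pop_size):
    """Continuous approximation exp(-t / (2N)) of the same probability."""
    return math.exp(-t / (2 * pop_size))

pop_size = 1000   # hypothetical N
for t in (100, 2000, 8000):
    exact = geometric_tail(t, pop_size)
    approx = exponential_tail(t, pop_size)
    print(t, exact, approx, abs(exact - approx))
```

The gap shrinks as N grows, because $1/(2N)$ is then the small $x$ in $e^{-x} \approx 1 - x$.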
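10. Generalization: the i-coalescent time
Let $t_i$ be the time during which the sample has $i$ ancestral sequences.
The distribution of $t_i$
$$f(t_i) = \frac{i(i-1)}{4N}\, e^{-\frac{i(i-1)}{4N}\, t_i}$$
Properties
$$E(t_i) = \frac{4N}{i(i-1)}, \qquad \mathrm{Var}(t_i) = \left(\frac{4N}{i(i-1)}\right)^2$$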
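Under the continuous approximation, a genealogy's coalescent times can be drawn directly. The sketch below assumes each $t_i$ is exponentially distributed with mean $4N/(i(i-1))$ generations and that the $t_i$ are independent; the function name and parameter values are hypothetical.

```python
import random

def sample_coalescent_times(sample_size, pop_size, rng):
    """Draw t_i for i = sample_size, ..., 2 under the continuous approximation.

    t_i is the time (in generations) during which the sample has i ancestors;
    it is drawn from an exponential distribution with mean 4N / (i (i - 1)).
    """
    times = {}
    for i in range(sample_size, 1, -1):
        rate = i * (i - 1) / (4.0 * pop_size)   # expovariate takes the rate, 1 / mean
        times[i] = rng.expovariate(rate)
    return times

rng = random.Random(7)        # fixed seed, only for reproducibility
times = sample_coalescent_times(sample_size=5, pop_size=1000, rng=rng)
for i, t in times.items():
    print("t_%d = %.1f generations" % (i, t))
```

The times tend to be short while many lineages remain and long once only two are left, since the mean grows as $i$ decreases.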
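11. The expected total length of a sample genealogy is
$$E(T) = \sum_{i=2}^{n} i\, E(t_i) = \sum_{i=2}^{n} \frac{4N}{i-1} = 4N a_n,$$
using $E(t_i) = 4N/(i(i-1))$, where
$$a_n = \sum_{i=1}^{n-1} \frac{1}{i} = 1 + \frac{1}{2} + \cdots + \frac{1}{n-1}$$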
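The expected total length can be checked numerically. The sketch assumes $E(t_i) = 4N/(i(i-1))$ and that $i$ branches are present during $t_i$, so that $E(T) = \sum_i i\,E(t_i) = 4N a_n$ with $a_n = 1 + 1/2 + \cdots + 1/(n-1)$; the function names and the values n = 10, N = 1000 are illustrative.

```python
def harmonic_a(sample_size):
    """a_n = 1 + 1/2 + ... + 1/(n - 1)."""
    return sum(1.0 / i for i in range(1, sample_size))

def expected_total_length(sample_size, pop_size):
    """E(T) as the sum over i of i * E(t_i), with E(t_i) = 4N / (i (i - 1))."""
    return sum(
        i * 4.0 * pop_size / (i * (i - 1))
        for i in range(2, sample_size + 1)
    )

n, pop_size = 10, 1000        # hypothetical sample size and population size
print("sum of i * E(t_i):", expected_total_length(n, pop_size))
print("4 N a_n:          ", 4 * pop_size * harmonic_a(n))
```

Since $a_n$ grows only logarithmically with n, adding more sequences to a sample lengthens the genealogy slowly.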
12. Mutations in a genealogy
µ: the mutation rate per sequence per generation.
The probability that the number of mutations on a branch of length $l$ is $k$ is
$$P(k \mid l) = \frac{(\mu l)^k}{k!}\, e^{-\mu l}$$
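13. The number of segregating sites
Let K be the total number of mutations that occur in a genealogy. Then, given that the total tree length is T, K is a Poisson variable with mean equal to Tµ. That is,
$$P(K = k \mid T) = \frac{(T\mu)^k}{k!}\, e^{-T\mu}$$
Considering all possible values of T, we have
$$P(K = k) = \int_0^{\infty} P(K = k \mid T)\, f(T)\, dT,$$
where $f(T)$ is the density of T, so that
$$E(K) = \mu\, E(T) = 4N\mu\, a_n = \theta a_n, \qquad a_n = \sum_{i=1}^{n-1} \frac{1}{i}, \quad \theta = 4N\mu$$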
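The two-step structure (draw a tree length T, then drop a Poisson number of mutations on it) can be simulated as a rough check that the mean of K is close to $4N\mu\,a_n$. The sketch assumes independent exponential coalescent times with mean $4N/(i(i-1))$; the Poisson draw is made by counting rate-µ arrivals along the tree, and all names and parameter values are hypothetical.

```python
import random

def sample_total_length(sample_size, pop_size, rng):
    """Total branch length T = sum of i * t_i, with t_i exponential, mean 4N / (i (i - 1))."""
    return sum(
        i * rng.expovariate(i * (i - 1) / (4.0 * pop_size))
        for i in range(2, sample_size + 1)
    )

def sample_mutation_count(length, mu, rng):
    """Poisson(mu * length) draw, made by counting rate-mu arrivals along the length."""
    count = 0
    position = rng.expovariate(mu)
    while position < length:
        count += 1
        position += rng.expovariate(mu)
    return count

rng = random.Random(42)               # fixed seed, only for reproducibility
n, pop_size, mu = 10, 1000, 0.00125   # hypothetical values, so 4 N mu = 5
reps = 20000
mean_k = sum(
    sample_mutation_count(sample_total_length(n, pop_size, rng), mu, rng)
    for _ in range(reps)
) / reps
a_n = sum(1.0 / i for i in range(1, n))

print("simulated mean of K:", mean_k)
print("4 N mu a_n:         ", 4 * pop_size * mu * a_n)
```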
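14. The age of the most recent common ancestor
Since
$$T_{MRCA} = t_n + t_{n-1} + \cdots + t_2,$$
we have
$$E(T_{MRCA}) = \sum_{i=2}^{n} \frac{4N}{i(i-1)} = 4N\left(1 - \frac{1}{n}\right)$$
Note that when the sample size (n) is large, $E(T_{MRCA})$ is approximately equal to 4N.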
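The approach to 4N can be tabulated directly. The sketch assumes the expected time with $i$ ancestors is $4N/(i(i-1))$ generations, so that the sum telescopes to $4N(1 - 1/n)$; the function name and N = 1000 are illustrative.

```python
def expected_tmrca(sample_size, pop_size):
    """Expected age of the MRCA: sum of 4N / (i (i - 1)) for i = 2, ..., n."""
    return sum(
        4.0 * pop_size / (i * (i - 1))
        for i in range(2, sample_size + 1)
    )

pop_size = 1000   # hypothetical N
for n in (2, 5, 10, 100):
    print(n, expected_tmrca(n, pop_size), 4 * pop_size * (1 - 1 / n))
```

Even a sample of two already reaches half of the limiting value, because the last coalescence alone takes 2N generations on average.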
15. Major population genetics models that have been studied
- The neutral Wright-Fisher model
- Recombination
- Selection
- Population subdivision and Migration
- Population growth
- Partial selfing
16. Traditional summary statistics
- Heterozygosity
  - The probability that two randomly selected alleles are different.
- Number of alleles
  - The number of distinct alleles in a sample.
- Proportion of polymorphic loci
  - The proportion of loci that are polymorphic. Traditionally a locus is said to be polymorphic if the frequency of the most frequent allele is less than, say, 90% in the sample.
17. Measures more suitable for DNA polymorphism
- The number of segregating sites K
  - The number of nucleotide sites that are variable.
- The mean number of nucleotide differences π
  - The average number of nucleotide differences between a pair of sequences is intuitive and has interesting statistical properties.
- Frequency spectrum
  - Mutations in a genealogy can be partitioned into different categories.
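18. The mean nucleotide difference between two sequences
For a sample of n sequences,
$$\pi = \frac{2}{n(n-1)} \sum_{i<j} d_{ij},$$
where $d_{ij}$ is the number of differences between sequences i and j.
Seq1 AAGCTTTCC
Seq2 AAGCATTCC
Seq3 AACCATTCC
$d_{12} = 1$, $d_{13} = 2$, $d_{23} = 1$, so $\pi = (1 + 2 + 1)/3 = 4/3$.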
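The pairwise counts for the three example sequences can be reproduced with a short script. It assumes the sequences are aligned and of equal length, and it takes the mean over all pairs; the helper names are illustrative.

```python
from itertools import combinations

def count_differences(seq_a, seq_b):
    """Number of aligned positions at which two sequences differ (d_ij)."""
    return sum(x != y for x, y in zip(seq_a, seq_b))

def mean_pairwise_difference(seqs):
    """Average of d_ij over all pairs i < j."""
    pairs = list(combinations(seqs, 2))
    return sum(count_differences(a, b) for a, b in pairs) / len(pairs)

seqs = ["AAGCTTTCC", "AAGCATTCC", "AACCATTCC"]   # Seq1, Seq2, Seq3 from the slide
d12 = count_differences(seqs[0], seqs[1])
d13 = count_differences(seqs[0], seqs[2])
d23 = count_differences(seqs[1], seqs[2])
mean_diff = mean_pairwise_difference(seqs)

print("d12, d13, d23:", d12, d13, d23)
print("mean pairwise difference:", mean_diff)
```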
19. The mean value of $d_{ij}$ is the same as the expected number of segregating sites in a sample of two sequences. So
$$E(\pi) = E(d_{ij}) = 4N\mu = \theta$$
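20. Estimating θ
The quantity θ = 4Nµ is the most important parameter for the evolution of a DNA region in a population.
Watterson's estimator is defined as
$$\hat{\theta}_W = \frac{K}{a_n}, \qquad a_n = \sum_{i=1}^{n-1} \frac{1}{i},$$
where K is the number of segregating sites in a sample of n sequences.
Tajima's estimator is defined as
$$\hat{\theta}_\pi = \pi,$$
the mean number of nucleotide differences between pairs of sequences.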
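As a rough illustration, both estimators can be computed for the three example sequences Seq1 to Seq3. The sketch assumes aligned sequences of equal length, takes Watterson's estimator as K divided by $a_n = 1 + 1/2 + \cdots + 1/(n-1)$, and takes Tajima's estimator as the mean pairwise difference; the function names are hypothetical, and the estimates are per sequence (for the whole region), not per site.

```python
from itertools import combinations

def segregating_sites(seqs):
    """K: number of alignment columns showing more than one nucleotide."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))

def watterson_theta(seqs):
    """K divided by a_n = 1 + 1/2 + ... + 1/(n - 1)."""
    a_n = sum(1.0 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / a_n

def tajima_theta(seqs):
    """Mean number of differences over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return total / len(pairs)

seqs = ["AAGCTTTCC", "AAGCATTCC", "AACCATTCC"]   # Seq1, Seq2, Seq3
k = segregating_sites(seqs)
theta_w = watterson_theta(seqs)
theta_pi = tajima_theta(seqs)

print("K =", k)
print("Watterson estimate:", theta_w)
print("Tajima estimate:   ", theta_pi)
```

For this small sample the two estimates happen to coincide; in general they weight mutations of different frequencies differently and need not agree.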