Title: Genetic Variation in Populations
Outline
- The Biological Problem
- Variation in Human Populations
- Gene mapping
- ----linkage analysis
- ----association analysis
- Modeling Gene Frequencies in Populations
- ----Wright-Fisher model
- ----Coalescence model
The Biological Problem
- Gene mapping
- Inferring evolutionary relationships between organisms
- ----based on genetic variation (a kind of character that can be described by allele frequencies)
- ----consider the parameters and statistics required to describe the genetic characters and genetic changes in populations
The Concept of Population
- A population is a localized collection of individuals of a species that are capable of exchanging the genes that characterize that species.
- 1. San people of southern Africa
- 2. Native American people (Indian)
- 3. Brown pelicans living on Anacapa Island off the coast of Southern California
Variation in Human Populations
- The frequencies of the eight alleles at locus D12S2070 in seven geographic regions of the world
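Describing Variation
- The heterozygosity H for any locus within a population is defined as $H = 1 - \sum_{i=1}^{K} p_i^2$, where K is the number of alleles at the locus.
- Here $p_i$ is the frequency of allele i.
- Note that H is a function of K and of the allele frequencies $p_1, \dots, p_K$.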
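A minimal Python sketch of the definition above; the function name `heterozygosity` is illustrative, not from the slides. It also shows that, for a fixed K, H is largest when all K alleles are equally frequent (H = 1 - 1/K).

```python
def heterozygosity(freqs):
    """Expected heterozygosity H = 1 - sum_i p_i^2 at one locus.

    `freqs` lists the allele frequencies p_1, ..., p_K, which must sum to 1.
    """
    total = sum(freqs)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"allele frequencies sum to {total}, not 1")
    return 1.0 - sum(p * p for p in freqs)


# With K equally frequent alleles H reaches its maximum, 1 - 1/K.
print(heterozygosity([0.25, 0.25, 0.25, 0.25]))  # 0.75
# A skewed two-allele locus is much less heterozygous.
print(heterozygosity([0.9, 0.1]))
```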
A Simple Example
Population Structure
- Stratified (forward, subpopulations)
- Hierarchical (backward, groupings)
- (local population → regional population → world population)
- Relationships between heterozygosities calculated as averages of subpopulation data and heterozygosities calculated from pooled data
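- Suppose a total population is stratified into B subpopulations.
- Consider one locus having K alleles.
- Let $\bar{p}_i$ be the average fraction of allele i in the total population.
- Let $p_{ib}$ be the fraction of allele i in subpopulation b, so that $\bar{p}_i = \frac{1}{B}\sum_{b=1}^{B} p_{ib}$.
- The total population heterozygosity: $H_T = 1 - \sum_{i=1}^{K} \bar{p}_i^2$
- The heterozygosity of subpopulation b: $H_b = 1 - \sum_{i=1}^{K} p_{ib}^2$
- The average heterozygosity for the subpopulations: $H_S = \frac{1}{B}\sum_{b=1}^{B} H_b$
- Thus $H_T - H_S = \sum_{i=1}^{K}\left(\frac{1}{B}\sum_{b=1}^{B} p_{ib}^2 - \bar{p}_i^2\right) = \sum_{i=1}^{K} \sigma_i^2 \ge 0$, where $\sigma_i^2$ is the variance of $p_{ib}$ across subpopulations, so $H_T \ge H_S$.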
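The comparison between pooled and averaged heterozygosity can be checked numerically. This is a minimal sketch assuming equally weighted subpopulations (the simple averages used above); the function names and the two example subpopulations are hypothetical. It also computes the summed between-subpopulation variance of allele frequencies, which should equal $H_T - H_S$.

```python
def heterozygosity(freqs):
    """H = 1 - sum_i p_i^2 for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def structure_heterozygosities(subpop_freqs):
    """Return (H_T, H_S, sum of between-subpopulation variances).

    subpop_freqs[b][i] is p_ib, the frequency of allele i in subpopulation b.
    Subpopulations are weighted equally (assumption).
    """
    B = len(subpop_freqs)
    K = len(subpop_freqs[0])
    # p_bar[i]: average frequency of allele i over the B subpopulations
    p_bar = [sum(sub[i] for sub in subpop_freqs) / B for i in range(K)]
    H_T = heterozygosity(p_bar)
    H_S = sum(heterozygosity(sub) for sub in subpop_freqs) / B
    # sigma2[i]: variance of p_ib across subpopulations (divisor B)
    sigma2 = [sum((sub[i] - p_bar[i]) ** 2 for sub in subpop_freqs) / B for i in range(K)]
    return H_T, H_S, sum(sigma2)


# Two hypothetical subpopulations with very different frequencies at a two-allele locus
H_T, H_S, between = structure_heterozygosities([[0.9, 0.1], [0.2, 0.8]])
print(f"H_T = {H_T:.3f}, H_S = {H_S:.3f}, H_T - H_S = {H_T - H_S:.3f}, sum of variances = {between:.3f}")
```

Here the pooled frequencies are 0.55 and 0.45, giving $H_T = 0.495$, while the subpopulations have $H_b = 0.18$ and $0.32$, so $H_S = 0.25$; the gap of 0.245 is exactly the summed variance. Identical subpopulations would give $H_T = H_S$.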
Gene mapping
- Recombination: chromosome pairs usually recombine during gamete formation
Linkage Analysis
- Uses family data containing affected individuals
- Recombination rate (r): the probability that the alleles at two loci on a chromatid come from different parental chromosomes
- Genetic map distance (m): the expected number of crossovers between the two loci
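Relation between r and m
Assume crossovers occur independently along the chromosome, so the number of crossovers between the two loci is Poisson with mean m (no interference). A recombinant arises only from an odd number of crossovers, so
$r = \sum_{k\ \mathrm{odd}} e^{-m}\frac{m^k}{k!} = \frac{1}{2}\left(1 - e^{-2m}\right)$ (Haldane's map function)
Note $m \approx r$ when r is very small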
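A minimal Python sketch of the map function and its inverse, under the same no-interference (Poisson crossover) assumption; `haldane_r` and `haldane_m` are illustrative names, and m is taken in Morgans.

```python
import math


def haldane_r(m):
    """Recombination fraction r for map distance m (in Morgans).

    Assumes crossovers form a Poisson process with mean m between the loci
    (no interference); r is the probability of an odd number of crossovers.
    """
    return 0.5 * (1.0 - math.exp(-2.0 * m))


def haldane_m(r):
    """Inverse of haldane_r: map distance (Morgans) from 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * r)


for m in (0.01, 0.1, 0.5, 2.0):
    print(f"m = {m:<5} r = {haldane_r(m):.4f}")
```

For small distances the two agree (m = 0.01 gives r ≈ 0.0099), while for loosely linked loci r saturates at 1/2.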
Association Analysis
- Uses population data
- Linkage disequilibrium (LD)
- --nonrandom association of alleles in haplotypes
- Two-allele, two-locus LD
- --two alleles at locus 1 and two alleles at locus 2
Some properties of LD
Regularized LD
$r^2$ is the square of the Pearson product-moment correlation coefficient between the alleles carried at the two loci of a haplotype
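A minimal sketch using the standard two-locus definitions, which are assumed here because the slide's own formulas did not survive: with alleles labelled A/a at locus 1 and B/b at locus 2, $D = p_{AB} - p_A p_B$ and $r^2 = D^2 / (p_A p_a p_B p_b)$. The haplotype sample and function names are hypothetical. The code checks that $r^2$ matches the squared Pearson correlation of 0/1 allele indicators.

```python
def ld_stats(p_AB, p_A, p_B):
    """Return (D, r2) for two loci with two alleles each.

    p_AB: frequency of haplotypes carrying A at locus 1 and B at locus 2
    p_A, p_B: frequencies of A and B (each strictly between 0 and 1)
    """
    D = p_AB - p_A * p_B
    r2 = D * D / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return D, r2


def pearson_r2(xs, ys):
    """Squared Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov * cov / (vx * vy)


# Ten hypothetical haplotypes coded 1 = A (locus 1) or B (locus 2), 0 = a or b:
# four AB, one Ab, one aB, four ab.
haplotypes = [(1, 1)] * 4 + [(1, 0)] + [(0, 1)] + [(0, 0)] * 4
locus1 = [h[0] for h in haplotypes]
locus2 = [h[1] for h in haplotypes]
n = len(haplotypes)
p_AB = sum(1 for h in haplotypes if h == (1, 1)) / n
D, r2 = ld_stats(p_AB, sum(locus1) / n, sum(locus2) / n)
print(f"D = {D:.3f}, r^2 = {r2:.3f}, Pearson r^2 = {pearson_r2(locus1, locus2):.3f}")
```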
LD Decay Property
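The decay property can be illustrated with the standard recursion for an idealized population (assumed here: infinitely large, random mating, no mutation, selection, drift, or migration). Each generation a gamete keeps its parental haplotype with probability 1 - r and is a recombinant with probability r, so $p_{AB}' = (1 - r)p_{AB} + r\,p_A p_B$ and therefore $D_t = (1 - r)^t D_0$. A minimal sketch with hypothetical function names:

```python
import math


def next_p_AB(p_AB, p_A, p_B, r):
    """Haplotype frequency after one generation of random mating.

    With probability 1 - r a gamete carries a parental haplotype unchanged;
    with probability r it is a recombinant, whose alleles at the two loci come
    from independent haplotypes. Allele frequencies p_A, p_B do not change.
    """
    return (1 - r) * p_AB + r * p_A * p_B


def generations_to_decay(r, fraction):
    """Smallest t with (1 - r)^t <= fraction, for 0 < r < 1 and 0 < fraction < 1."""
    return math.ceil(math.log(fraction) / math.log(1 - r))


p_A, p_B, r = 0.5, 0.5, 0.1
p_AB = 0.4                      # hypothetical starting haplotype frequency
D0 = p_AB - p_A * p_B
for t in range(1, 6):
    p_AB = next_p_AB(p_AB, p_A, p_B, r)
    D_t = p_AB - p_A * p_B
    # D_t should equal (1 - r)^t * D_0
    print(t, round(D_t, 6), round((1 - r) ** t * D0, 6))

print(generations_to_decay(0.01, 0.5), "generations to halve D when r = 0.01")
```

Tightly linked loci (small r) retain LD for many generations, which is what makes population-level association mapping possible.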
Factors Affecting LD
- Recombination rate
- Mutation
- Genetic drift
- Natural selection
- Migration between populations
Modeling Gene Frequencies in Populations
- The Wright-Fisher Model
- The population size N is constant from generation to generation.
- Organisms are diploid (so there are 2N copies of each gene).
- All members of each generation reproduce simultaneously; generations do not overlap.
- Mating among individuals is random.
- There is no mutation, migration, or selection.
How to form an offspring
- Choose an individual at random from the population,
- then choose one of its gametes at random.
- Return the chosen individual to the population.
- Repeat the experiment.
- This results in two gametes that form an individual in the next generation.
- Repeat N times to form the next generation.
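The recipe above translates almost line for line into code. This is a minimal sketch; the tuple-of-alleles representation, the starting population, and the name `next_generation` are assumptions for illustration. Because each gamete is a uniformly random gene copy among the 2N, the procedure is equivalent to sampling 2N genes with replacement.

```python
import random


def next_generation(population, rng):
    """Form the next Wright-Fisher generation by the recipe above.

    population: list of N diploid individuals, each a tuple of two alleles.
    Each offspring receives two gametes; for each gamete an individual is
    drawn at random (and returned, i.e. sampling with replacement) and one
    of its two gene copies is picked at random.
    """
    N = len(population)
    offspring = []
    for _ in range(N):
        gametes = tuple(rng.choice(rng.choice(population)) for _ in range(2))
        offspring.append(gametes)
    return offspring


rng = random.Random(42)
# Hypothetical starting population: N = 5 individuals, allele "A" at frequency 0.5
pop = [("A", "a"), ("A", "A"), ("a", "a"), ("A", "a"), ("a", "A")]
for gen in range(1, 4):
    pop = next_generation(pop, rng)
    freq_A = sum(ind.count("A") for ind in pop) / (2 * len(pop))
    print(gen, pop, freq_A)
```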
Genetic Drift
- More precisely, allelic drift
- --A statistical effect that results from the influence that chance has on the survival of alleles
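Drift can be watched directly by repeating the sampling step until one allele is lost. A minimal sketch, assuming the gene-level form of the Wright-Fisher sampling (each new gene copies a random gene of the previous generation); names and parameters are illustrative. Without mutation every run ends with the count at 0 or 2N, and starting from frequency 1/2 the allele is fixed in roughly half of the runs, as expected for a neutral allele.

```python
import random


def drift_until_absorption(N, i0, rng, max_generations=100_000):
    """Follow the number of A copies among 2N genes until A is lost or fixed.

    Each new gene copies a uniformly chosen gene of the previous generation,
    so it is A with probability equal to the current frequency of A.
    Returns the list of counts, one per generation.
    """
    two_n = 2 * N
    counts = [i0]
    while 0 < counts[-1] < two_n and len(counts) <= max_generations:
        p = counts[-1] / two_n
        counts.append(sum(1 for _ in range(two_n) if rng.random() < p))
    return counts


rng = random.Random(7)
history = drift_until_absorption(N=20, i0=20, rng=rng)
print(f"absorbed after {len(history) - 1} generations at count {history[-1]}")

# Repeat many times: the fraction of runs in which A is fixed
runs = [drift_until_absorption(20, 20, rng)[-1] for _ in range(400)]
print("fraction fixed:", sum(c == 40 for c in runs) / len(runs))
```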
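The Wright-Fisher Model as a Markov Chain
- Denote the number of A alleles in generation n by $X_n$, with possible values 0, 1, 2, . . . , 2N.
- The sequence $X_0, X_1, \dots$ is a Markov chain; its transition matrix is
$P_{ij} = \Pr(X_{n+1} = j \mid X_n = i) = \binom{2N}{j}\left(\frac{i}{2N}\right)^{j}\left(1 - \frac{i}{2N}\right)^{2N-j}$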
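A minimal sketch that builds the binomial transition matrix for a small N; `wf_transition_matrix` is an illustrative name. It relies on Python's convention `0 ** 0 == 1`, which makes rows 0 and 2N come out as absorbing states.

```python
from math import comb


def wf_transition_matrix(N):
    """Wright-Fisher transition matrix P[i][j] = Pr(X_{n+1} = j | X_n = i).

    Given i copies of A among 2N genes, the next generation's 2N genes are
    independent draws that are A with probability i/2N, so each row is Binomial.
    """
    two_n = 2 * N
    P = []
    for i in range(two_n + 1):
        p = i / two_n
        P.append([comb(two_n, j) * p ** j * (1 - p) ** (two_n - j) for j in range(two_n + 1)])
    return P


P = wf_transition_matrix(N=2)          # states 0..4
for row in P:
    print(" ".join(f"{x:.4f}" for x in row))
```

States 0 and 2N are absorbing ($P_{00} = P_{2N,2N} = 1$), and row i has mean i, so the expected allele count does not change from one generation to the next even though the actual count drifts.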
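h(n) is the expected heterozygosity in the population in generation n. This is the probability that two genes chosen at random (with replacement) are different alleles. Under the Wright-Fisher model it decays geometrically:
$h(n) = \left(1 - \frac{1}{2N}\right)^{n} h(0)$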
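The geometric decay of h(n) can be checked exactly by pushing the distribution of $X_n$ through the transition matrix. A minimal sketch with hypothetical names; in generation n with count i, two genes drawn with replacement differ with probability $2i(2N - i)/(2N)^2$.

```python
from math import comb


def wf_transition_matrix(N):
    """Binomial Wright-Fisher transition matrix on states 0..2N."""
    two_n = 2 * N
    return [[comb(two_n, j) * (i / two_n) ** j * (1 - i / two_n) ** (two_n - j)
             for j in range(two_n + 1)] for i in range(two_n + 1)]


def expected_heterozygosity(N, i0, generations):
    """Exact h(n) for n = 0..generations, starting from X_0 = i0.

    h(n) = E[2 X_n (2N - X_n) / (2N)^2], the probability that two genes drawn
    with replacement in generation n carry different alleles.
    """
    two_n = 2 * N
    P = wf_transition_matrix(N)
    dist = [0.0] * (two_n + 1)
    dist[i0] = 1.0                      # X_0 = i0 with certainty
    h = []
    for _ in range(generations + 1):
        h.append(sum(prob * 2 * i * (two_n - i) / two_n ** 2 for i, prob in enumerate(dist)))
        # one step of the Markov chain: dist <- dist @ P
        dist = [sum(dist[i] * P[i][j] for i in range(two_n + 1)) for j in range(two_n + 1)]
    return h


N, i0 = 10, 5
h = expected_heterozygosity(N, i0, generations=30)
for n in (0, 10, 20, 30):
    print(n, round(h[n], 6), round((1 - 1 / (2 * N)) ** n * h[0], 6))
```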
Coalescent
- Looks backward over generations
- Given a population as it exists now, we may want to make inferences about how it reached its current state.
- The coalescent (Kingman, 1982) is a very useful stochastic process that allows us to model the ancestry of genes in the population.
Coalescence Model
- Allow each gene in generation g - 1 to choose its ancestor from among the 2N gene copies that existed in generation g.
- Some of the gene copies in generation g may be chosen multiple times, and others may not be chosen at all.
- The process is repeated, going back from generation g to generation g + 1.
- Because some gene copies are not chosen in each generation going back, the number of ancestors becomes smaller and smaller until the lineages coalesce to a single ancestor some number of generations ago.
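The backward procedure can be simulated directly. A minimal sketch, with hypothetical names: each surviving lineage picks a parent uniformly among 2N copies, and lineages that pick the same parent merge. Tracing all 2N genes of a small population shows the number of ancestors shrinking until a single ancestor remains.

```python
import random


def trace_lineages_back(n_genes, N, rng):
    """Number of distinct ancestral gene copies, generation by generation, back in time.

    Each current lineage picks its parent uniformly from the 2N gene copies
    of the previous generation; lineages picking the same parent merge.
    Returns a list starting with n_genes and ending with 1.
    """
    two_n = 2 * N
    counts = [n_genes]
    while counts[-1] > 1:
        parents = {rng.randrange(two_n) for _ in range(counts[-1])}
        counts.append(len(parents))
    return counts


rng = random.Random(3)
# Trace every gene of a population with N = 10 (2N = 20 copies) back to one ancestor
counts = trace_lineages_back(20, N=10, rng=rng)
print(f"single ancestor {len(counts) - 1} generations ago; lineage counts: {counts}")
```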
Coalescence for Pairs of Genes
- The time to the most recent common ancestor (TMRCA) for two gene sequences from the sampled population
- The expected number of pairwise differences between a pair of sequences
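TMRCA for two gene sequences
2N genes exist in each generation, so two genes pick the same parent gene in the previous generation with probability 1/2N. The TMRCA $T_2$ (in generations) is therefore geometric:
$\Pr(T_2 = g) = \left(1 - \frac{1}{2N}\right)^{g-1}\frac{1}{2N}, \qquad E[T_2] = 2N$
When N is large, let $t = g/2N$; then $\Pr(T_2 > g) = \left(1 - \frac{1}{2N}\right)^{g} \approx e^{-t}$, so $T_2/2N$ is approximately exponential with mean 1.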
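A minimal sketch comparing the exact geometric distribution of $T_2$ with a generation-by-generation simulation and with the exponential approximation; the function names and the choice N = 50 are illustrative.

```python
import math
import random


def tmrca_pmf(g, N):
    """Pr(T_2 = g): the two genes first share a parent g generations back."""
    q = 1.0 / (2 * N)
    return (1 - q) ** (g - 1) * q


def simulate_tmrca_pair(N, rng):
    """Go back one generation at a time until the two lineages pick the same parent."""
    g = 1
    while rng.random() >= 1.0 / (2 * N):
        g += 1
    return g


N = 50
exact_mean = sum(g * tmrca_pmf(g, N) for g in range(1, 20_000))
rng = random.Random(11)
samples = [simulate_tmrca_pair(N, rng) for _ in range(4000)]
print(f"E[T_2]: exact sum {exact_mean:.4f}, 2N = {2 * N}, simulated {sum(samples) / len(samples):.1f}")
# Large-N approximation: Pr(T_2 > g) = (1 - 1/2N)^g is close to exp(-g / 2N)
for g in (50, 100, 200):
    print(g, round((1 - 1 / (2 * N)) ** g, 4), round(math.exp(-g / (2 * N)), 4))
```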
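The expected number of pairwise differences between a pair of sequences
- X: the number of mutations that occur along a lineage of g generations
Let μ be the mutation rate per sequence per generation, so $E[X] = \mu g$. The two sequences are joined through their common ancestor by $2T_2$ generations of lineage, so, assuming every mutation creates a new difference, the expected number of pairwise differences is $2\mu E[T_2] = 4N\mu$.
Let $\theta = 4N\mu$,
then the expected number of pairwise differences between a pair of sequences is $\theta$.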
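A minimal Monte Carlo sketch of this result. It assumes one Bernoulli mutation chance per lineage per generation and that every mutation hits a new site (infinite sites); the function name and parameter values are hypothetical. The average number of differences should approach $\theta = 4N\mu$.

```python
import random


def pairwise_differences(N, mu, rng):
    """Simulate the number of differences between two sampled sequences.

    T_2 is drawn generation by generation (coalescence probability 1/(2N));
    each of the 2*T_2 lineage-generations then carries a mutation with
    probability mu, and every mutation is assumed to hit a new site.
    """
    g = 1
    while rng.random() >= 1.0 / (2 * N):
        g += 1
    return sum(1 for _ in range(2 * g) if rng.random() < mu)


N, mu = 50, 0.01
theta = 4 * N * mu                      # theta = 4 N mu, here 2
rng = random.Random(5)
draws = [pairwise_differences(N, mu, rng) for _ in range(5000)]
print(f"theta = {theta}, mean pairwise differences = {sum(draws) / len(draws):.3f}")
```

Read in reverse, this is why the average pairwise difference in a sample is used as an estimator of θ.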
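Coalescence in multiple genes
- n genes taken from 2N genes
- The probability that the n genes have distinct ancestors in the previous generation is
$\prod_{i=1}^{n-1}\left(1 - \frac{i}{2N}\right) \approx 1 - \binom{n}{2}\frac{1}{2N}$ when $n \ll 2N$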
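A minimal sketch comparing the exact product, its first-order approximation, and a Monte Carlo estimate; names are hypothetical. The product comes from letting the genes choose parents one at a time: gene k must avoid the k - 1 parents already taken. The example with n = 5 and N = 10 shows the approximation breaking down when n is not small relative to 2N.

```python
import random
from math import comb


def prob_distinct_ancestors(n, N):
    """Exact probability that n genes have n distinct parents one generation back.

    Gene k (k = 2..n) must avoid the k - 1 parents already taken among 2N copies.
    """
    prob = 1.0
    for i in range(1, n):
        prob *= 1 - i / (2 * N)
    return prob


def approx_distinct_ancestors(n, N):
    """First-order approximation 1 - C(n, 2) / 2N, accurate when n is much smaller than 2N."""
    return 1 - comb(n, 2) / (2 * N)


def mc_distinct_ancestors(n, N, rng, trials=20_000):
    """Monte Carlo check: draw parents uniformly and test whether all differ."""
    hits = sum(len({rng.randrange(2 * N) for _ in range(n)}) == n for _ in range(trials))
    return hits / trials


rng = random.Random(0)
for n, N in ((2, 10), (5, 10), (5, 1000)):
    print(n, N, round(prob_distinct_ancestors(n, N), 5), round(approx_distinct_ancestors(n, N), 5),
          mc_distinct_ancestors(n, N, rng))
```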
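- $T_n$: the time taken for the first coalescence event of the n genes; it is approximately geometric with success probability $\binom{n}{2}/2N$, so $E[T_n] \approx \frac{2N}{\binom{n}{2}} = \frac{4N}{n(n-1)}$
- The expected time to the most recent common ancestor of the sample of n genes is
$E[T_{MRCA}] = \sum_{k=2}^{n} \frac{4N}{k(k-1)} = 4N\left(1 - \frac{1}{n}\right)$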
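A minimal sketch that checks the telescoping sum against the closed form and against a simulation. The simulation swaps in Kingman's continuous-time approximation (exponential waiting times with rate $\binom{k}{2}/2N$ per generation while k lineages remain) rather than the discrete generation-by-generation process; names and parameter values are hypothetical.

```python
import random


def expected_tmrca(n, N):
    """E[T_MRCA] as the sum of expected waiting times 4N / (k (k - 1)), k = n..2."""
    return sum(4 * N / (k * (k - 1)) for k in range(2, n + 1))


def simulate_tmrca(n, N, rng):
    """Continuous-time coalescent: with k lineages the waiting time (in
    generations) is exponential with rate C(k, 2) / 2N, after which two
    lineages merge."""
    t = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2 / (2 * N)
        t += rng.expovariate(rate)
    return t


N, n = 100, 10
print(expected_tmrca(n, N), 4 * N * (1 - 1 / n))   # sum versus closed form
rng = random.Random(2024)
sims = [simulate_tmrca(n, N, rng) for _ in range(4000)]
print(f"simulated mean: {sum(sims) / len(sims):.1f}")
```

Since $E[T_{MRCA}] < 4N$ for any n while the last two lineages alone take 2N generations on average, roughly half of the sample's expected depth is spent waiting for the final coalescence.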
THANKS!