Atelier INSERM (INSERM workshop), La Londe Les Maures, May 2004
DETECTING SELECTION FROM DNA SEQUENCE POLYMORPHISM DATA
N. GALTIER
CNRS UMR 5171 Génome, Populations, Interactions, Adaptation
Université Montpellier 2, France
galtier_at_univ-montp2.fr
SEQUENCE POLYMORPHISM DATA
[Figure: a sample of 5 genes (copies of one DNA fragment, or locus) taken from a population (species), aligned with the homologous fragment from an outgroup.]
Sample:
....ACGGATAGTTAGTGACGATA...
....ACGTATAGCTAGTGACGATA...
....ACGTATAGCTAGTGACGATA...
....ACGGATAGCTAGTGACGATA...
....ACGGATAGCTAGTGACGATC...
Outgroup:
....CCAGCTAGCTACTGAAGTTG...
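To make the summary statistics used throughout concrete, here is a minimal sketch that counts the polymorphic (segregating) sites, the distinct haplotypes and the average number of pairwise differences (π) in the five sampled sequences above, ignoring the elided flanks. The helper names are hypothetical, not taken from any package.

```python
from itertools import combinations

# The five sampled sequences from the alignment (the elided "...." flanks are dropped).
SAMPLE = [
    "ACGGATAGTTAGTGACGATA",
    "ACGTATAGCTAGTGACGATA",
    "ACGTATAGCTAGTGACGATA",
    "ACGGATAGCTAGTGACGATA",
    "ACGGATAGCTAGTGACGATC",
]


def segregating_sites(seqs):
    """0-based positions where the sample carries more than one base."""
    return [i for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]


def haplotype_count(seqs):
    """Number of distinct sequences (haplotypes) in the sample."""
    return len(set(seqs))


def nucleotide_diversity(seqs):
    """Mean number of differences over all pairs of sequences (pi, per fragment)."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs)


if __name__ == "__main__":
    print(segregating_sites(SAMPLE))     # [3, 8, 19]
    print(haplotype_count(SAMPLE))       # 4
    print(nucleotide_diversity(SAMPLE))  # 1.4
```

The outgroup is not used here; it only matters for statistics that need to know which allele is ancestral.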
MUTATIONS SEGREGATING IN A POPULATION (1)
[Figure: mutant allele frequency (0 to 1) against time for neutral mutations segregating in a population, with the time of sampling marked.]
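The neutral trajectories in the figure can be imitated with a minimal Wright-Fisher sketch: each generation, the 2N gene copies are drawn at random from the previous generation, so a new neutral mutation drifts until it is lost or fixed, and it fixes with probability equal to its initial frequency, 1/(2N). The discrete-generation binomial model and the function names are assumptions chosen for illustration.

```python
import random


def neutral_trajectory(two_n, rng):
    """Follow one new neutral mutant in a Wright-Fisher population of two_n gene copies.

    Returns the mutant count in each generation, until loss (0) or fixation (two_n).
    """
    counts = [1]
    while 0 < counts[-1] < two_n:
        p = counts[-1] / two_n
        # Each copy of the next generation picks a parent at random,
        # so the mutant count is Binomial(two_n, p).
        counts.append(sum(rng.random() < p for _ in range(two_n)))
    return counts


def fixation_fraction(two_n, trials, seed=1):
    """Fraction of new neutral mutants that eventually fix (expected: 1 / two_n)."""
    rng = random.Random(seed)
    return sum(neutral_trajectory(two_n, rng)[-1] == two_n for _ in range(trials)) / trials


if __name__ == "__main__":
    print(fixation_fraction(two_n=20, trials=5000))  # close to 1/20 = 0.05
```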
MUTATIONS SEGREGATING IN A POPULATION (2)
[Figure: mutant allele frequency (0 to 1) against time, neutral case compared with purifying selection.]
Purifying (negative) selection results in:
- a decreased substitution rate
- a decreased amount of polymorphism
- lower allele frequencies
MUTATIONS SEGREGATING IN A POPULATION (3)
[Figure: mutant allele frequency (0 to 1) against time, neutral case compared with adaptive selection.]
Adaptive (positive) selection results in:
- an increased substitution rate
- a decreased amount of polymorphism
- higher allele frequencies
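The opposite effects of purifying and adaptive selection on the substitution rate can be quantified with Kimura's diffusion approximation for the fixation probability of a new mutant. This sketch assumes additive selection (heterozygote fitness 1 + s, homozygote 1 + 2s) and an effective size equal to N; the substitution rate is then 2Nμ times the fixation probability, which reduces to μ when s = 0. The function names are illustrative.

```python
import math


def fixation_probability(N, s):
    """Kimura's approximation for a new mutant at initial frequency 1/(2N).

    Assumes additive selection and effective size N:
    u = (1 - exp(-2s)) / (1 - exp(-4Ns)).
    """
    if s == 0:
        return 1 / (2 * N)
    if -4 * N * s > 700:
        # Strongly deleterious: avoid overflow, since exp(4N|s|) - 1 ~ exp(4N|s|).
        return math.expm1(-2 * s) * math.exp(4 * N * s)
    return math.expm1(-2 * s) / math.expm1(-4 * N * s)


def relative_substitution_rate(N, s):
    """Substitution rate k = 2N * mu * u, divided by the neutral rate mu."""
    return 2 * N * fixation_probability(N, s)


if __name__ == "__main__":
    for s in (-0.001, 0.0, 0.001):  # purifying, neutral, adaptive; N = 1000
        print(s, relative_substitution_rate(1000, s))
    # about 0.075, exactly 1.0 and about 4.07 respectively
```

With 4Ns = ±4, selection of one part in a thousand already cuts the substitution rate by more than tenfold or quadruples it.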
LINKAGE AND HITCH-HIKING
[Figure: a selective sweep at a linked selected locus, shown together with the genealogy of a sampled neutral locus.]
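Directional selection decreases polymorphism at linked (neighbouring) neutral sites by increasing the apparent drift: the neutral variants carried by the favoured chromosome rise to fixation along with the selected allele.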
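Why a sweep acts like strong drift can be seen from its duration. In a deterministic sketch of genic selection (an assumption: the odds p/(1 - p) of the favoured allele grow by a factor 1 + s per generation), the allele goes from 1/(2N) to 1 - 1/(2N) in about (2/s)·ln(2N) generations, far less than the 2N generations two neutral lineages need on average to coalesce. Linked neutral sites therefore have little time to recombine away from the selected allele. Function names are illustrative.

```python
import math


def sweep_generations(two_n, s):
    """Generations for a favoured allele to rise from 1/two_n to 1 - 1/two_n.

    Deterministic genic selection: p' = p (1 + s) / (1 + s p), i.e. the odds
    p / (1 - p) are multiplied by (1 + s) each generation.
    """
    p, end, gen = 1 / two_n, 1 - 1 / two_n, 0
    while p < end:
        p = p * (1 + s) / (1 + s * p)
        gen += 1
    return gen


def approx_sweep_generations(two_n, s):
    # Continuous-time approximation: about (2 / s) * ln(2N) generations.
    return 2 * math.log(two_n) / s


if __name__ == "__main__":
    # 2N = 20,000 gene copies and s = 0.01: roughly 2,000 generations,
    # about a tenth of the 2N = 20,000 generations of neutral pairwise coalescence.
    print(sweep_generations(20000, 0.01), approx_sweep_generations(20000, 0.01))
```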
DETECTING SELECTION BY SEEKING REGIONS OF "LOW" POLYMORPHISM
Selection reduces polymorphism, but the level of polymorphism is also determined by other factors, including population size and mutation rate.
To conclude that selection is acting, one must control for these nuisance factors.
HITCH-HIKING MAPPING
[Figure: polymorphism values for loci A to F (distinct mutation rates μ) in populations 1 to 5 (distinct population sizes N).]
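The logic of controlling for μ and N can be sketched as a two-way table. If neutral polymorphism scales as θ = 4Nμ, its logarithm is the sum of a locus effect (mutation rate) and a population effect (population size), so a cell lying far below that additive fit flags a locus that lost diversity in one population only. The mean-based fit, the pseudocount for zero cells and the toy numbers are assumptions for illustration, not the analysis behind the figure.

```python
import math


def log_additive_residuals(diversity, pseudo=1e-3):
    """Residuals of log-diversity after removing a locus effect and a population effect.

    diversity[i][j] is the polymorphism of locus i in population j. Under neutrality
    theta = 4 N mu, so log(diversity) ~ log(mu_i) + log(N_j) + constant.
    `pseudo` is an arbitrary offset so that zero-diversity cells can be logged.
    """
    logs = [[math.log(x + pseudo) for x in row] for row in diversity]
    n_loci, n_pops = len(logs), len(logs[0])
    grand = sum(map(sum, logs)) / (n_loci * n_pops)
    locus = [sum(row) / n_pops for row in logs]
    pop = [sum(row[j] for row in logs) / n_loci for j in range(n_pops)]
    return [[logs[i][j] - locus[i] - pop[j] + grand for j in range(n_pops)]
            for i in range(n_loci)]


def most_depleted(diversity, pseudo=1e-3):
    """(locus, population) cell lying furthest below the additive fit."""
    res = log_additive_residuals(diversity, pseudo)
    cells = [(i, j) for i in range(len(res)) for j in range(len(res[0]))]
    return min(cells, key=lambda ij: res[ij[0]][ij[1]])


if __name__ == "__main__":
    # Hypothetical values: 3 loci x 3 populations, proportional to mu_i * N_j ...
    table = [[0.01 * m * n for n in (1, 2, 3)] for m in (1, 2, 4)]
    table[1][2] = 0.006  # ... except locus 1, unusually poor in population 2
    print(most_depleted(table))  # (1, 2)
```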
THE HKA TEST
[Figure: two loci, A and B, each with sequences from the focal species and an outgroup.]
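Under neutrality, both polymorphism within the focal species and divergence from the outgroup are proportional to a locus's mutation rate, so their ratio should be similar at loci A and B; a locus with too little polymorphism for its divergence suggests selection. The full HKA test fits coalescent expectations to several loci at once. The sketch below swaps in a plain 2×2 chi-square on hypothetical polymorphism and divergence counts, only to show the logic; the function names are illustrative.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return stat, p_value


def hka_like(poly_a, div_a, poly_b, div_b):
    """Polymorphic sites (within focal species) vs fixed differences (to outgroup), loci A and B."""
    return chi2_2x2(poly_a, div_a, poly_b, div_b)


if __name__ == "__main__":
    print(hka_like(10, 20, 5, 10))  # same ratio at both loci: statistic 0, p-value 1
    print(hka_like(30, 10, 5, 25))  # locus B poor in polymorphism: small p-value
```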
THE McDONALD-KREITMAN TEST
[Figure: genealogy relating sequences of the focal species to an outgroup.]
If the ratio of nonsynonymous to synonymous changes is higher between species (divergence) than within species (polymorphism), whereas the two ratios should be equal under neutrality, then positive selection has promoted the fixation of nonsynonymous changes.
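A sketch of the test's 2×2 table, with hypothetical counts: Dn and Ds are nonsynonymous and synonymous fixed differences between species, Pn and Ps the corresponding polymorphisms. Besides a two-sided Fisher exact test (a common choice for this table), it reports the neutrality index NI = (Pn/Ps)/(Dn/Ds) and the derived estimate α = 1 - NI of the proportion of adaptive nonsynonymous substitutions; NI, α and the helper names are additions, not part of the slide.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for [[a, b], [c, d]]: sums all tables with the same
    margins that are at most as likely as the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def mk_test(dn, ds, pn, ps):
    """Returns (neutrality index, alpha, Fisher p-value) for the MK table."""
    neutrality_index = (pn / ps) / (dn / ds)
    alpha = 1 - neutrality_index  # estimated share of adaptive nonsynonymous substitutions
    return neutrality_index, alpha, fisher_exact_two_sided(dn, ds, pn, ps)


if __name__ == "__main__":
    # Hypothetical counts: Dn = 20, Ds = 10, Pn = 5, Ps = 25.
    print(mk_test(20, 10, 5, 25))  # NI = 0.1, alpha = 0.9, small p-value
```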
COALESCENCE THEORY: FOCUSING ON SAMPLE GENEALOGY
[Figure: genealogy of the sampled genes traced back in time within a population of 2N chromosomes.]
COALESCENCE THEORY: THE STANDARD COALESCENT
The genealogy of a sample of size n at a neutral locus in a panmictic population of constant size 2N is expected to look like this:
[Figure: a genealogy with coalescence intervals T5, T4, T3 and T2; T2 lasts 2N generations on average, the whole genealogy about 4N.]
where
- all topologies are equiprobable
- the coalescence times $T_i$ (the time during which the sample has $i$ ancestral lineages) are exponential random variables with expectation $E(T_i) = \frac{4N}{i(i-1)}$
- mutations are superimposed onto the genealogy according to a Poisson process
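The three rules above translate directly into a simulation. This minimal sketch draws the coalescence times as independent exponentials with means 4N/(i(i-1)) and adds mutations as a Poisson process along the branches, at an assumed rate μ per locus per generation. Topology is not tracked, since the times suffice for the number of segregating sites S. It lets one check that the genealogy is on average 4N(1 - 1/n) generations deep and that E(S) = 4Nμ·a_n, with a_n the sum of 1/k for k = 1 to n - 1. Function names are illustrative.

```python
import random


def harmonic(n):
    """a_n = sum of 1/k for k = 1, ..., n - 1."""
    return sum(1 / k for k in range(1, n))


def poisson(mean, rng):
    """Poisson draw by counting unit-rate exponential arrivals before `mean`."""
    k, t = 0, rng.expovariate(1.0)
    while t < mean:
        k += 1
        t += rng.expovariate(1.0)
    return k


def simulate_coalescent(n, N, mu, rng):
    """One realisation of the standard coalescent for n genes (population of 2N copies).

    Returns (tmrca, total_branch_length, S). T_i is exponential with mean 4N / (i (i - 1));
    during T_i there are i lineages, so it adds i * T_i to the total branch length.
    S is Poisson with mean mu * total branch length.
    """
    times = [rng.expovariate(i * (i - 1) / (4 * N)) for i in range(2, n + 1)]
    total_length = sum(i * t for i, t in zip(range(2, n + 1), times))
    return sum(times), total_length, poisson(mu * total_length, rng)


if __name__ == "__main__":
    rng, n, N, mu, reps = random.Random(42), 5, 1000, 1e-3, 20000
    sims = [simulate_coalescent(n, N, mu, rng) for _ in range(reps)]
    print(sum(s[0] for s in sims) / reps, 4 * N * (1 - 1 / n))       # both near 3200
    print(sum(s[2] for s in sims) / reps, 4 * N * mu * harmonic(n))  # both near 8.33
```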
THE COALESCENCE PROCESS HAS A HIGH VARIANCE
[Figure: the distribution of T2, and two realisations of the coalescent with equal Tn, Tn-1, ..., T3 but distinct T2.]
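Because each T_i is exponential, its standard deviation equals its mean, and the intervals are independent, so the variance of the genealogy's depth is the sum of the squared means. A short exact calculation under the standard coalescent (function names illustrative) shows that adding sequences barely helps: the coefficient of variation of the time to the most recent common ancestor (TMRCA) tends to about 0.54, and T2 alone accounts for about 86% of its variance.

```python
import math


def tmrca_mean_sd(n, N):
    """Exact mean and standard deviation of the TMRCA under the standard coalescent."""
    means = [4 * N / (i * (i - 1)) for i in range(2, n + 1)]
    # Independent exponentials: variance of each T_i is its mean squared.
    return sum(means), math.sqrt(sum(m * m for m in means))


def share_of_variance_from_t2(n):
    """Fraction of Var(TMRCA) contributed by T2 alone (independent of N)."""
    terms = [(1 / (i * (i - 1))) ** 2 for i in range(2, n + 1)]
    return terms[0] / sum(terms)


if __name__ == "__main__":
    for n in (2, 10, 1000):
        mean, sd = tmrca_mean_sd(n, N=1000)
        print(n, round(sd / mean, 3), round(share_of_variance_from_t2(n), 3))
```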
DEPARTURE FROM NEUTRALITY: THE SELECTIVE SWEEP EXAMPLE
[Figure: genealogy at a sampled neutral locus linked to a selected locus undergoing a selective sweep.]
DEPAULIS HAPLOTYPE TEST
[Figure: two genealogies with 9 polymorphic sites each: a neutral genealogy yielding 8 haplotypes, and a partly star-like genealogy produced by a "partial" selective sweep, yielding 3 haplotypes.]
A partially star-like genealogy results in fewer haplotypes than expected given the number of polymorphic sites.
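The test asks whether the number of haplotypes K is too small given the number of polymorphic sites S. A rough Monte Carlo sketch, assuming the standard neutral coalescent and infinite sites: draw genealogies, drop exactly S mutations on branches chosen with probability proportional to their length, and count how often K is as small as observed. Placing a fixed S on unconditioned genealogies is a common shortcut that only approximates the exact conditional distribution; the function names and the sample size in the usage line are hypothetical.

```python
import random


def random_genealogy(n, rng):
    """Standard-coalescent genealogy for n genes, as a list of (length, leaves_below).

    Time is in units of 4N generations: with k lineages the waiting time has mean 1/(k(k-1)).
    """
    lineages = [frozenset([leaf]) for leaf in range(n)]
    births = [0.0] * n  # time (backwards) at which each current lineage starts
    branches, now = [], 0.0
    while len(lineages) > 1:
        k = len(lineages)
        now += rng.expovariate(k * (k - 1))
        i, j = sorted(rng.sample(range(k), 2), reverse=True)  # i > j
        merged = lineages[i] | lineages[j]
        for idx in (i, j):  # remove the larger index first so j stays valid
            branches.append((now - births[idx], lineages.pop(idx)))
            births.pop(idx)
        lineages.append(merged)
        births.append(now)
    return branches


def haplotypes_given_s(n, s, rng):
    """Place exactly s mutations on a random genealogy and count distinct haplotypes."""
    branches = random_genealogy(n, rng)
    weights = [length for length, _ in branches]
    carried = [[] for _ in range(n)]  # sites where each gene carries the derived state
    for site in range(s):
        _, leaves = rng.choices(branches, weights=weights)[0]
        for leaf in leaves:
            carried[leaf].append(site)
    return len({tuple(sites) for sites in carried})


def depaulis_like_p_value(n, s, k_obs, reps=10_000, seed=1):
    """Monte Carlo estimate of P(K <= k_obs | S = s) under neutrality."""
    rng = random.Random(seed)
    return sum(haplotypes_given_s(n, s, rng) <= k_obs for _ in range(reps)) / reps


if __name__ == "__main__":
    # 9 sites and 3 haplotypes as in the figure; n = 10 is an arbitrary assumption.
    print(depaulis_like_p_value(n=10, s=9, k_obs=3))
```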
Other test statistics aimed at detecting non-neutral genealogy shapes have been proposed: Tajima's D, Fu and Li's F, Fay and Wu's H, ...
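Tajima's D, for instance, compares two estimates of θ = 4Nμ that share the same expectation under the standard neutral coalescent: π, the mean number of pairwise differences, and Watterson's S/a1. A star-like genealogy, as after a sweep, adds many singletons, which raise S more than π and push D below zero. A sketch of the standard formula (the function name is illustrative):

```python
import math


def tajimas_d(n, s, pi):
    """Tajima's D from sample size n, segregating sites s and mean pairwise differences pi."""
    if s == 0:
        raise ValueError("D is undefined when there are no segregating sites")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    # The five sampled sequences shown at the start: n = 5, S = 3, pi = 1.4.
    print(tajimas_d(5, 3, 1.4))  # about -0.17, close to zero
```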
DEMOGRAPHY vs SELECTION
Detecting a departure from the standard coalescent means that at least one of its assumptions is wrong. Neutrality, unfortunately, is only one of them.
Demographic effects (departures from the constant-population-size assumption) can distort genealogies in a way very similar to selection.
A bottleneck (a sudden decrease in population size, followed by a restoration of the former size), for example, has consequences highly similar to those of a selective sweep.
To distinguish the two, use multi-locus analysis: demography impacts the whole genome, while selection is locus-specific.
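One way to act on this, sketched as a hypothetical heuristic rather than a published method: compute a neutrality statistic such as Tajima's D at many loci, read the genome-wide median as the demographic signal, and flag loci that stand far from it as selection candidates. The robust z-score (median absolute deviation scaled by 1.4826), the cut-off of 3 and the per-locus values are arbitrary or made up.

```python
import statistics


def split_demography_selection(stat_by_locus, z_cut=3.0):
    """Separate a genome-wide shift from locus-specific outliers.

    stat_by_locus maps each locus to a statistic (e.g. Tajima's D). Returns the median,
    read as the genome-wide signal, and the loci whose robust z-score exceeds z_cut.
    """
    values = list(stat_by_locus.values())
    center = statistics.median(values)
    mad = 1.4826 * statistics.median(abs(v - center) for v in values)
    outliers = [locus for locus, v in stat_by_locus.items()
                if mad > 0 and abs(v - center) / mad > z_cut]
    return center, outliers


if __name__ == "__main__":
    # Made-up values: every locus is shifted downwards, one far more than the rest.
    d = {"L1": -0.9, "L2": -1.1, "L3": -1.0, "L4": -0.95, "L5": -1.05, "L6": -2.9}
    print(split_demography_selection(d))  # median near -1.0, outlier L6
```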
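A LIKELIHOOD-BASED APPROACH
- M1: neutral, constant size; p parameters ($\theta_1, \ldots, \theta_p$)
- M2: bottleneck; p + 2 parameters ($T, S, \theta_1, \ldots, \theta_p$)
- M3: selective sweep; 3p parameters ($T_1, S_1, \theta_1, \ldots, T_p, S_p, \theta_p$)
where p is the number of loci, and T and S are the time and strength of the bottleneck (or, with subscripts, of the sweep at each locus).
Calculate the likelihood (probability of the data) under each of the three models and compare the models using likelihood ratio tests.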
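A sketch of the comparison step, assuming the maximised log-likelihoods are already available (the values in the usage line are hypothetical). If M1 is the special case of M2 and M3 with zero strength, the extra parameters number 2 (M1 vs M2) and 2p (M1 vs M3), both even, so the chi-square tail has a simple closed form. The chi-square reference is itself only approximate here, because the null sits on the boundary of the parameter space; M2 and M3 are not nested in each other and cannot be compared this way.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper tail P(X > x) of a chi-square with an even number of degrees of freedom:
    exp(-x/2) * sum over k < df/2 of (x/2)^k / k!."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(loglik_null, loglik_alt, extra_params):
    """Statistic 2 * (lnL_alt - lnL_null) and its approximate chi-square p-value."""
    stat = 2 * (loglik_alt - loglik_null)
    return stat, chi2_sf_even_df(max(stat, 0.0), extra_params)


if __name__ == "__main__":
    # Hypothetical maximised log-likelihoods for M1 (null) and M2 (bottleneck, 2 extra parameters).
    print(likelihood_ratio_test(-100.0, -97.0, 2))  # statistic 6.0, p about 0.05
```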
WHAT I DID NOT TALK ABOUT
- subdivided populations, migration, isolation by
distance, hybrid zones, clines
- other forms of selection (e.g. balancing
selection)
- weak selection applying at many loci (e.g.
codon usage)
- (biased) gene conversion
- patterns of linkage disequilibrium, coalescent
with recombination
- microsatellites and other non-sequence genetic
markers