Title: INTRODUCTION TO ASSOCIATION MAPPING
1INTRODUCTION TO ASSOCIATION MAPPING
2- We have a set of inbred lines or varieties
- We have genotyped them with a large set of markers
- We also have phenotypic data of the lines for several traits
- And now what?
3- We will take advantage of linkage disequilibrium (LD) to identify genetic regions associated with our trait of interest
- Association mapping is also called linkage disequilibrium mapping
4Identify associations between markers and
phenotypes without the need to develop specific
populations
Marker Distance Line 1 Line 2 Line 3 Line 4 Line 5 Line 6 Line 7 Line 8 Line 9 Line 10 Line 11 Line 12 Line 13 Line 14 Line 15 Line 16
_3_0363_ 0 A B B A A A B A B B A B B B B B
_1_1061_ 0.8 A B B A A A B A B B A A A B B A
_3_0703_ 1.5 B A A B B B A B A A B B B B B B
_1_1505_ 1.5 B A A B B B A B A B B B B B B B
_1_0498_ 1.5 B B B B B B B B B B B B B B B A
_2_1005_ 3.8 A B B A A A B A B A A B B B B B
_1_1054_ 3.8 A A A A A A A A A B A A A A A A
_2_0674_ 6 A B B A A A B A B A A A A A A B
_1_0297_ 8.8 A A B B B B B A A A A A A A A B
_1_0638_ 10.7 A A B B B B B A A B A A A A A A
_1_1302_ 11.4 B A A A B B A A A B A B B B B A
_1_0422_ 11.4 B A A A B B A A A B A B B B B A
_2_0929_ 15.3 A B B B A A B B B A B A A A A B
_3_1474_ 15.4 A B B B A A B B B A B A A A A A
_1_1522_ 17.3 A B B B A A B B B A B A A A A A
_2_1388_ 17.3 A A A A A A A A A A A A A A A A
_3_0259_ 18.1 B B B B B B B B B B B A A A A A
_1_0325_ 18.1 B B B B B B B B B B B A A A A A
_2_0602_ 20.8 A A B A A A A B A B A A A A A A
_1_0733_ 23.9 B B B B B B B B B B B A A A A A
_2_0729 23.9 B B B B B B B B B B B A A A A A
_1_1272_ 23.9 A B B B A A B B B B B B B B B B
_2_0891_ 26.1 A A A A A A A A A B A A A A A A
_2_0748_ 26.6 B B B B B B B B B A B B B B B B
_3_0251_ 27.4 A B A A A B A A A B A A A B A A
_1_0997_ 35.5 B B A A A B B B B B B B B B B B
_1_1133_ 41.8 B B A A A B B B B A B A A A A A
_2_0500_ 42.5 A A A A A A A A A B A B B B B B
_3_0634_ 43.3 B B B B B B B B B A B A A A A A
(Figure: disease severity of each line, on a scale from 0 to 10.)
5- The definition of linkage disequilibrium is very simple
- It is the non-random association of alleles at different loci
(Figure: panels labeled Equilibrium and Disequilibrium.)
6Equilibrium (figure: Locus 1 to Locus 5)
- Random mating population with loci segregating independently
Disequilibrium (figure: Locus 1 to Locus 5)
- Non-random mating population; LD due to selection, mutation, drift/sampling, population structure
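7How do we measure LD?
- LD is measured with a parameter called D.
- If alleles at different loci are not inherited independently, then $P_{AB} \neq P_A \times P_B$ and $D_{AB} = P_{AB} - P_A \times P_B$
- ($P_A$ and $P_B$ are allele frequencies and $P_{AB}$ is the haplotype frequency)
- Standardized measures of LD: $D'$ and $r^2$
$D' = |D_{AB}| / D_{max}$, where $D_{max} = \min(P_A P_B, P_a P_b)$ for $D < 0$ and $D_{max} = \min(P_A P_b, P_a P_B)$ for $D > 0$
$r^2 = D_{AB}^2 / (P_A P_a P_B P_b)$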
8Line Locus 1 Locus 2
1 a b
2 A B
3 a B
4 a b
5 a b
6 a b
7 A B
8 a b
9 a B
10 a b
11 a b
12 a B
13 A B
14 a b
15 a B
16 A B
17 a b
18 A B
19 a b
20 A B
21 A b
22 a b
23 a b
24 a b
25 a b
26 a B
27 A B
28 a B
29 A B
30 A B
Allele frequencies: $P_A = 10/30$, $P_a = 20/30$, $P_B = 15/30$, $P_b = 15/30$
Haplotype frequencies: $P_{AB} = 9/30$, $P_{aB} = 6/30$, $P_{Ab} = 1/30$, $P_{ab} = 14/30$
$D_{AB} = P_{AB} - P_A \times P_B = 9/30 - (10/30 \times 15/30) = 0.13$
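The calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `ld_stats` and the two-character haplotype encoding are assumptions made here, and the D' and r2 formulas are the standard textbook definitions, not something taken from the slide.

```python
from collections import Counter

def ld_stats(haplotypes):
    """Return (D, D', r2) for two biallelic loci.

    `haplotypes` is a list of two-character strings such as "AB" or "ab":
    the first character is the allele at locus 1 (A/a), the second the
    allele at locus 2 (B/b). This encoding is an assumption of the sketch.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_A = sum(c for h, c in counts.items() if h[0] == "A") / n
    p_B = sum(c for h, c in counts.items() if h[1] == "B") / n
    p_a, p_b = 1 - p_A, 1 - p_B
    p_AB = counts["AB"] / n

    d = p_AB - p_A * p_B
    # D_max depends on the sign of D (standard definition of D').
    if d < 0:
        d_max = min(p_A * p_B, p_a * p_b)
    else:
        d_max = min(p_A * p_b, p_a * p_B)
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    denom = p_A * p_a * p_B * p_b
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

# Haplotype counts from the 30-line example: AB 9, aB 6, Ab 1, ab 14.
haps = ["AB"] * 9 + ["aB"] * 6 + ["Ab"] * 1 + ["ab"] * 14
d, d_prime, r2 = ld_stats(haps)
print(round(d, 2), round(d_prime, 2), round(r2, 2))
```

For the 30 lines of the example, D matches the 0.13 computed by hand, while D' and r2 put the same association on a 0 to 1 scale.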
9Spring barley, two-row, chromosome 5H (figure: r2 plotted against distance in bp)
11Extent of LD
Species       Extent of LD                                                                   Mating system
Humans        80 kb (Europeans), 5 kb (Nigerians)                                            Outcrossing
Cattle        > 10 cM                                                                        Outcrossing
Arabidopsis   250 kb                                                                         Selfing
Maize         1 kb (diverse maize), 1.5 kb (diverse inbred lines), > 100 kb (elite lines)    Outcrossing
Barley        Up to 100 kb                                                                   Selfing
Flint-Garcia et al., Annu. Rev. Plant Biol. 2003. 54:357-74.
12- Factors that increase LD
- mutation
- mating system (self-pollination),
- population structure
- admixture
- relatedness (kinship)
- small founder population size or genetic drift
- selection (natural, artificial, and balancing)
- Factors that decrease LD
- high recombination and mutation rate
- recurrent mutations
- outcrossing
13Mutation provides the original material for producing polymorphism that will be in LD
Allele b appears on a gamete carrying A, so A and b will appear together
14- Mating system
- LD generally decays more rapidly in outcrossing species than in selfing species, where individuals are likely to be homozygous
- In selfing species, most recombination occurs between identical haplotypes, as a result of high individual homozygosity, and thus these events do not reduce LD
- Selfing reduces the rate at which LD breaks down
- When loci are closely linked in a selfing population they remain in high LD for many generations
(Figure legend: selfing rate from 0.00 (outcrossing) to 0.99 (selfing); recombination rate from 0.05 (little recombination) to 0.5 (high recombination). Extremes: selfing with little or no recombination; outcrossing with high recombination.)
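A rough numerical sketch of why selfing slows LD decay follows. It assumes the textbook decay $D_t = D_0 (1 - c)^t$ under random mating, plus a commonly used approximation in which selfing rate s gives an inbreeding coefficient F = s / (2 - s) and an effective recombination fraction c(1 - F). Neither formula appears on the slide, and the function name `ld_after` and the starting value 0.25 are purely illustrative.

```python
def ld_after(d0, c, selfing_rate, generations):
    """LD remaining after some generations (approximate sketch).

    d0: initial disequilibrium D
    c: recombination fraction between the two loci (0 to 0.5)
    selfing_rate: proportion of selfed offspring (0 = full outcrossing)
    """
    # Assumed approximation: with selfing rate s the inbreeding
    # coefficient is F = s / (2 - s); recombination only breaks down LD
    # in heterozygotes, so the effective recombination is c * (1 - F).
    f = selfing_rate / (2.0 - selfing_rate)
    c_effective = c * (1.0 - f)
    return d0 * (1.0 - c_effective) ** generations

# The four corner cases named in the figure legend, after 10 generations.
for s in (0.0, 0.99):
    for c in (0.05, 0.5):
        print("selfing", s, "recombination", c,
              "D after 10 generations:", round(ld_after(0.25, c, s, 10), 4))
```

Under these assumptions, even unlinked loci (c = 0.5) keep most of their LD after ten generations of near-complete selfing, whereas under outcrossing it is almost gone.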
15- Drift / Sampling
- In small populations, the effects of genetic drift result in the loss of rare allelic combinations, which increases LD.
- Sampling increases or reduces certain allelic combinations by chance
- Selection
- Strong selection at a locus is expected to reduce diversity and increase LD in the surrounding region
- Selection operating on a gene will increase LD and reduce diversity in the vicinity of that gene. Alleles flanking the selected gene will be fixed.
- Selection can also cause LD between unlinked loci, typically as a result of co-selection of loci during breeding for multiple traits
17LOD
19- What information do we need for the association mapping analysis?
- Genotypic
- Linkage disequilibrium decay
- Number of markers and marker density
- Quality of the data: missing values, minor allele frequency
- Phenotypic
- Quantitative or qualitative traits
- Heritability of the trait, repeatability
- Population
- Structure
- Kinship
20- Genotypic Information
- Linkage disequilibrium decay.
- The power of detection is highly influenced by
the LD between the QTL and the marker
(Figure: two panels plotting r2 against physical distance, labeled 10 kb and 100 kb.)
21- Marker density
- The extent of LD shows the expected r2 at a given distance
- Accordingly, it is important to choose an adequate marker density to increase the power of detection
(Figure: two panels plotting r2 against physical distance, labeled 10 kb and 100 kb.)
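As a back-of-the-envelope illustration of the link between LD extent and marker density, the sketch below counts how many evenly spaced markers would be needed so that neighbouring markers are no further apart than the distance over which LD extends. The genome size is a hypothetical round number, the function name `markers_needed` is made up for this example, and real marker spacing and LD are never this uniform.

```python
import math

def markers_needed(genome_size_bp, ld_extent_bp):
    """Rough number of evenly spaced markers so that adjacent markers
    are at most `ld_extent_bp` apart (a rule of thumb, not a power
    calculation)."""
    return math.ceil(genome_size_bp / ld_extent_bp)

genome_size_bp = 1_000_000_000  # hypothetical 1 Gb genome

# The two LD extents shown in the figure panels.
for ld_extent_bp in (10_000, 100_000):
    print("LD extent", ld_extent_bp, "bp ->",
          markers_needed(genome_size_bp, ld_extent_bp), "markers")
```

Under these assumptions, a tenfold shorter LD extent calls for roughly ten times as many markers.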
22- Quality of the data
- Number of individuals: with small sample sizes, the probability of a spurious significant association between marker and QTL is high.
Marker Distance Line 1 Line 2 Line 3 Line 4 Line 5 Line 6 Line 7 Line 8 Line 9 Line 10 Line 11 Line 12 Line 13 Line 14 Line 15 Line 16
_3_0363_ 0 A B B A A A B A B B A B B B B B
_1_1061_ 0.8 A B B A A A B A B B A A A B B A
_3_0703_ 1.5 B A A B B B A B A A B B B B B B
_1_1505_ 1.5 B A A B B B A B A B B B B B B B
_1_0498_ 1.5 B B B B B B B B B B B B B B B A
_2_1005_ 3.8 A B B A A A B A B A A B B B B B
_1_1054_ 3.8 A A A A A A A A A B A A A A A A
_2_0674_ 6 A B B A A A B A B A A A A A A B
_1_0297_ 8.8 A A B B B B B A A A A A A A A B
_1_0638_ 10.7 A A B B B B B A A B A A A A A A
_1_1302_ 11.4 B A A A B B A A A B A B B B B A
_1_0422_ 11.4 B A A A B B A A A B A B B B B A
_2_0929_ 15.3 A B B B A A B B B A B A A A A B
_3_1474_ 15.4 A B B B A A B B B A B A A A A A
_1_1522_ 17.3 A B B B A A B B B A B A A A A A
_2_1388_ 17.3 A A A A A A A A A A A A A A A A
_3_0259_ 18.1 B B B B B B B B B B B A A A A A
_1_0325_ 18.1 B B B B B B B B B B B A A A A A
_2_0602_ 20.8 A A B A A A A B A B A A A A A A
_1_0733_ 23.9 B B B B B B B B B B B A A A A A
_2_0729 23.9 B B B B B B B B B B B A A A A A
_1_1272_ 23.9 A B B B A A B B B B B B B B B B
_2_0891_ 26.1 A A A A A A A A A B A A A A A A
_2_0748_ 26.6 B B B B B B B B B A B B B B B B
(Figure: disease severity of each line, on a scale from 0 to 10.)
23- Quality of the data
- Number of individuals: with small sample sizes, the probability of a spurious significant association between marker and QTL is high.
Marker Distance Line 1 Line 2 Line 3 Line 4 Line 5 Line 6 Line 7 Line 8
_3_0363_ 0 A B B A A A B A
_1_1061_ 0.8 A B B A A A B A
_3_0703_ 1.5 B A A B B B A B
_1_1505_ 1.5 B A A B B B A B
_1_0498_ 1.5 B B B B B B B B
_2_1005_ 3.8 A B B A A A B A
_1_1054_ 3.8 A A A A A A A A
_2_0674_ 6 A B B A A A B A
_1_0297_ 8.8 A A B B B B B A
_1_0638_ 10.7 A A B B B B B A
_1_1302_ 11.4 B A A A B B A A
_1_0422_ 11.4 B A A A B B A A
_2_0929_ 15.3 A B B B A A B B
_3_1474_ 15.4 A B B B A A B B
_1_1522_ 17.3 A B B B A A B B
_2_1388_ 17.3 A A A A A A A A
_3_0259_ 18.1 B B B B B B B B
_1_0325_ 18.1 B B B B B B B B
_2_0602_ 20.8 A A B A A A A B
_1_0733_ 23.9 B B B B B B B B
_2_0729 23.9 B B B B B B B B
_1_1272_ 23.9 A B B B A A B B
_2_0891_ 26.1 A A A A A A A A
_2_0748_ 26.6 B B B B B B B B
(Figure: disease severity of each line, on a scale from 0 to 10.)
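The effect of sample size can be made concrete with a deliberately simplified calculation. It assumes a two-class phenotype (for example, resistant versus susceptible) and markers whose alleles are assigned to lines independently with frequency 0.5. Real markers are linked and the slide's trait is scored on a 0 to 10 scale, so this is only an order-of-magnitude sketch, and the function names are illustrative.

```python
def prob_perfect_match(n_lines):
    """Chance that one random marker splits the lines exactly like a
    two-class phenotype (either allele may tag either class)."""
    return 2 * 0.5 ** n_lines

def expected_perfect_matches(n_lines, n_markers):
    """Expected number of markers matching the phenotype by chance."""
    return n_markers * prob_perfect_match(n_lines)

# Compare the 8-line and 16-line panels for a hypothetical 1000 markers.
for n_lines in (8, 16):
    print(n_lines, "lines:", expected_perfect_matches(n_lines, 1000),
          "markers expected to match by chance")
```

With 8 lines several markers are expected to match the phenotype perfectly by chance alone, while with 16 lines this becomes very unlikely.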
24Quality of the data: minor allele frequency
Line Locus 1 Locus 2
1 a b
2 a b
3 a b
4 a b
5 a b
6 a b
7 a b
8 a b
9 a b
10 a b
11 a b
12 a B
13 a b
14 a b
15 a b
16 A b
17 a b
18 a b
19 a b
20 a b
21 a b
Two loci can be completely unlinked and still
show high LD
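A short worked calculation shows how this can happen. It assumes, as a reading of the slide's example, 21 lines in which the rare alleles A and B each occur exactly once and never in the same line, and it uses the standard definitions $D' = |D_{AB}| / D_{max}$ and $r^2 = D_{AB}^2 / (P_A P_a P_B P_b)$.

```latex
\begin{align*}
P_A &= P_B = \tfrac{1}{21}, \qquad P_a = P_b = \tfrac{20}{21}, \qquad P_{AB} = 0 \\
D_{AB} &= P_{AB} - P_A P_B = 0 - \tfrac{1}{441} \approx -0.0023 \\
D_{max} &= \min(P_A P_B,\; P_a P_b) = \min\!\left(\tfrac{1}{441},\; \tfrac{400}{441}\right) = \tfrac{1}{441}
  \quad (\text{since } D_{AB} < 0) \\
D' &= \frac{|D_{AB}|}{D_{max}} = 1 \\
r^2 &= \frac{D_{AB}^2}{P_A P_a P_B P_b} = \frac{(1/441)^2}{(20/441)^2} = \tfrac{1}{400} = 0.0025
\end{align*}
```

Under these assumptions D' reaches its maximum of 1 even though D itself is almost zero, which is why markers with a very low minor allele frequency are usually filtered out before LD or association analysis.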
25Quality of the data: missing data
Genotype calls at one locus across 26 lines (- marks a missing call):
b - b b B b B b b B - b b - b B b b - b - b b b b b
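A minimal sketch of the two quality checks, missing rate and minor allele frequency, applied to the 26 calls shown for this locus. The function name `marker_qc` and the thresholds (at most 10% missing, MAF of at least 5%) are hypothetical choices for illustration; the slides do not give cut-offs.

```python
def marker_qc(calls, missing="-"):
    """Return (missing_rate, minor_allele_frequency) for one marker.

    `calls` is a list of allele calls, one per line; `missing` is the
    symbol used for a missing call.
    """
    n = len(calls)
    called = [c for c in calls if c != missing]
    missing_rate = (n - len(called)) / n
    if not called:
        return missing_rate, 0.0
    counts = {}
    for c in called:
        counts[c] = counts.get(c, 0) + 1
    # A marker with a single observed allele is monomorphic: MAF = 0.
    maf = min(counts.values()) / len(called) if len(counts) > 1 else 0.0
    return missing_rate, maf

# The 26 calls from the slide, in order ("-" is a missing call).
calls = "b - b b B b B b b B - b b - b B b b - b - b b b b b".split()
missing_rate, maf = marker_qc(calls)

# Hypothetical thresholds, chosen only for this example.
keep = missing_rate <= 0.10 and maf >= 0.05
print(round(missing_rate, 3), round(maf, 3), keep)
```

Here the minor allele frequency is computed among called genotypes only, which is one common convention.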
26- What information do we need for the association mapping analysis?
- Genotypic
- Linkage disequilibrium decay
- Number of markers and marker density
- Quality of the data: missing values, minor allele frequency
- Phenotypic
- Quantitative or qualitative traits
- Heritability of the trait, repeatability
- Population
- Structure
- Kinship
27- Phenotypic
- Quantitative or qualitative traits
- One or more QTL involved
- The higher the effect of the QTL, the higher the power of detection
- Quantitative traits: usually many genes of small effect are involved
- The problem of epistatic traits
- Heritability of the trait, repeatability
$h^2 = V_{genotypic} / V_{phenotypic}$
28The problem of epistatic traits
Phenotype: heading date
VRN1 and VRN2 are located on different chromosomes
Line Heading date VRN1 VRN2
1 62 a c
2 152 A c
3 59 a c
4 58 a c
5 60 A D
6 60 a c
7 57 a D
8 64 a c
9 151 A c
10 59 a D
11 58 a D
12 152 a c
13 60 a c
14 151 A c
15 58 a c
16 149 A c
17 64 A D
18 58 a c
19 154 A c
20 58 a c
21 63 a D
22 60 a c
23 153 A c
24 58 a c
25 57 a c
26 64 a c
No association between individual genes (VRN1 or VRN2) and heading date
However, late heading date only when haplotype Ac is present
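One simple way to look for this kind of interaction is to group lines by their two-locus haplotype instead of by each gene separately. The sketch below does this for the 26 lines of the example, with values transcribed from the slide as extracted; the variable and function names are illustrative only.

```python
from collections import defaultdict

# (heading date, VRN1 allele, VRN2 allele) for lines 1 to 26,
# transcribed from the slide.
lines = [
    (62, "a", "c"), (152, "A", "c"), (59, "a", "c"), (58, "a", "c"),
    (60, "A", "D"), (60, "a", "c"), (57, "a", "D"), (64, "a", "c"),
    (151, "A", "c"), (59, "a", "D"), (58, "a", "D"), (152, "a", "c"),
    (60, "a", "c"), (151, "A", "c"), (58, "a", "c"), (149, "A", "c"),
    (64, "A", "D"), (58, "a", "c"), (154, "A", "c"), (58, "a", "c"),
    (63, "a", "D"), (60, "a", "c"), (153, "A", "c"), (58, "a", "c"),
    (57, "a", "c"), (64, "a", "c"),
]

def mean_heading_by_haplotype(records):
    """Mean heading date for each VRN1/VRN2 haplotype."""
    groups = defaultdict(list)
    for heading, vrn1, vrn2 in records:
        groups[vrn1 + vrn2].append(heading)
    return {hap: sum(v) / len(v) for hap, v in groups.items()}

haplotype_means = mean_heading_by_haplotype(lines)
for hap in sorted(haplotype_means):
    print(hap, round(haplotype_means[hap], 1))
```

Grouping by haplotype makes the Ac class stand out from the other three, which is the pattern the slide points to.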
29- What information do we need for the association mapping analysis?
- Genotypic
- Linkage disequilibrium decay
- Number of markers and marker density
- Quality of the data: missing values, minor allele frequency
- Phenotypic
- Quantitative or qualitative traits
- Heritability of the trait, repeatability
- Population
- Structure
- Kinship
30Population Structure
The classical example of interference by
population structure
- Study of type 2 diabetes in 2 tribes of Native Americans from Arizona
- A correlation was found between a haplotype at the immunoglobulin G locus and reduced diabetes
- However, on further analysis it was found that those with diabetes had a lower proportion of European ancestry
- And that the haplotype associated with reduced diabetes was more prevalent in Europeans
- When the analysis was restricted to individuals with similar European ancestry, the association was no longer detected.
Knowler WC, et al. 1988. Am. J. Hum. Genet. 43:520-26.
31- Population Structure
- Similar structure exists in plants
- The breeding history of many important crop species and limited gene flow have created complex stratification within the germplasm.
- Different geographic origins of the germplasm cause population structure (natural selection usually tends to fix alleles at many loci related to adaptation).
- So do the destination of the crop, growth habit, and certain morphological traits.
- This is a common cause of spurious associations
32- How can we allocate individuals to sub-populations?
- First, we need to know in advance how many sub-populations there are.
- If unknown, this can be estimated
- The allocation process is repeated for different possible numbers of sub-populations and the best-fitting number is selected.
33- The computer program STRUCTURE
- Uses computationally intensive methods to partition individuals into populations.
- Many individuals or lines will not belong uniquely to one population, but will be the descendants of crosses between two or more ancestral populations.
- STRUCTURE also estimates the proportion of ancestry attributable to each population.
35The effect of kinship
$y = X\beta + Qv + Zu + e$
$X\beta$ includes all fixed effects: population means, environments, and marker allele effects
$Q$ is a subpopulation incidence matrix; $v$ are estimates of subpopulation mean effects
There is a degree of relatedness not captured by population structure; $u$ is the polygenic effect generated by other loci unlinked to the one being tested
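The slides do not say how kinship between lines is estimated. One simple proxy, used here purely as an illustration, is the proportion of marker alleles two lines share (identity by state). The sketch below computes it for Lines 1 to 4 at the first five markers of the genotype table shown earlier; the function name `ibs_kinship` is an assumption of this example.

```python
def ibs_kinship(genotypes):
    """Pairwise proportion of matching marker alleles between lines.

    `genotypes` maps a line name to its list of alleles, with markers
    in the same order for every line. Returns a dict of dicts.
    """
    names = list(genotypes)
    kinship = {a: {} for a in names}
    for a in names:
        for b in names:
            pairs = list(zip(genotypes[a], genotypes[b]))
            kinship[a][b] = sum(x == y for x, y in pairs) / len(pairs)
    return kinship

# Alleles at the first five markers of the table, read per line.
genotypes = {
    "Line 1": list("AABBB"),
    "Line 2": list("BBAAB"),
    "Line 3": list("BBAAB"),
    "Line 4": list("AABBB"),
}

kinship = ibs_kinship(genotypes)
for name in genotypes:
    print(name, [kinship[name][other] for other in genotypes])
```

A matrix like this, computed over all markers, is the kind of pairwise relatedness information that the polygenic term $u$ is meant to absorb; other kinship estimators exist and may be preferable in practice.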