Title: Association Studies To Locate Human Disease Genes
1Association Studies To Locate Human Disease Genes
- Wentian Li, Ph.D.
- The Robert S. Boas Center for Genomics and Human Genetics
- North Shore LIJ Institute for Medical Research
- March 08, 2005
2[Diagram boxes: GENE, PHENOTYPE/DISEASE, ENVIRONMENT]
3[Diagram boxes: GENETIC MARKER, GENE, PHENOTYPE/DISEASE, ENVIRONMENT (controlled, fixed); label: Linkage disequilibrium]
4Early history of association analysis (1921)
- Blood type (ABO) and disease association
- JA Buchanan, ET Higley (1921), "The relationship of blood groups to disease", British Journal of Experimental Pathology, 2:247-255.
5Early history of association analysis (1945)
- The suggestion to use ABO blood type/secretor polymorphism to detect association with diseases
- EB Ford (1945), "Polymorphism", Biological Reviews, 20:73-88.
7Early history of association analysis (1953-54)
- Ian Aird, HH Bentall, JA Fraser-Roberts (1953), "A relationship between cancer of stomach and the ABO blood groups", British Medical Journal, 1:799-801.
- I Aird, HH Bentall, JA Mehigan, JAF Roberts (1954), "The blood groups in relation to peptic ulceration and carcinoma of the colon, rectum, breast and bronchus: an association between the ABO groups and peptic ulceration", British Medical Journal, 2:315-321.
8Early history of association analysis (1960s)
- Polymorphism in the Human Leukocyte Antigen (HLA) system (also known as the Major Histocompatibility Complex (MHC)) and disease association
- International Histocompatibility Workshop (first one in 1964)
9Divergence between linkage and association
analysis for human disease gene detection
(1970s-1980s?)
- Both are based on the same principle: the genetic polymorphism (which itself may have no function) and the disease gene (which has a function) lie close to each other on the chromosome.
- Only the techniques are different
- Association (and linkage disequilibrium) became mainly a topic in population genetics (with the exception of HLA-disease association analysis)
10Differences between linkage analysis and
association analysis
- Linkage analysis is based on pedigree data
- Association analysis is based on population data
- Linkage analyses rely on recombination events in action
- Association analyses rely on ancestral recombinations
- The statistic in linkage analysis is a count of the number of recombinants and non-recombinants
- The statistical method for association analysis is statistical correlation
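One common way to turn recombinant and non-recombinant counts into a linkage statistic is the LOD score, which compares the likelihood at a recombination fraction theta against free recombination (theta = 0.5). The slide does not name this statistic, so the sketch below is an illustration under that assumption; the function name and the counts are hypothetical, and it assumes phase-known, fully informative meioses.

```python
import math

def lod_score(recombinants, non_recombinants, theta):
    """LOD score for phase-known meioses at recombination fraction theta.

    Assumes 0 < theta <= 0.5 so that every logarithm is defined.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    n = recombinants + non_recombinants
    # log10 likelihood of the counts at the tested theta
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1.0 - theta))
    # log10 likelihood under no linkage (theta = 0.5)
    log_l_null = n * math.log10(0.5)
    return log_l_theta - log_l_null

# Hypothetical counts: 2 recombinants among 20 informative meioses
r, nr = 2, 18
theta_hat = r / (r + nr)          # simple estimate of theta: 0.1
lod = lod_score(r, nr, theta_hat)
print(f"theta_hat = {theta_hat:.2f}, LOD = {lod:.2f}")
```

By convention a LOD score above 3 has been taken as evidence for linkage in this kind of analysis.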
11The domination of linkage analysis (1980s?)
- The easy determination of restriction fragment length polymorphisms (RFLPs) made linkage analysis popular again
- Linkage analysis helped to locate chromosomal regions for dozens of rare Mendelian diseases (in 1983, the first disease gene, for Huntington disease, was mapped)
- Even easier to type, and denser, genetic markers: microsatellite markers
12Association analysis was brought back to disease
mapping (1990s). I. Family-based association
- The most often criticized aspect of association analysis, its inability to deal with population stratification, was thought to be solved by the family-based design
- Genotype-based haplotype relative risk (Falk and Rubinstein, 1987)
- Haplotype-based haplotype relative risk (Terwilliger and Ott, 1992)
- McNemar test (Terwilliger and Ott, 1992), Transmission disequilibrium test (TDT) (Spielman, McGinnis, Ewens, 1993)
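The TDT named on this slide is a McNemar-type test on heterozygous parents: it compares how often a given allele is transmitted versus not transmitted to an affected child. A minimal sketch follows; the function name and the counts are hypothetical, and the statistic is referred to a chi-square distribution with 1 df under the null hypothesis.

```python
def tdt_statistic(transmitted, not_transmitted):
    """McNemar-type TDT statistic.

    transmitted:     heterozygous parents who pass allele A to the affected child
    not_transmitted: heterozygous parents who pass the other allele instead
    """
    b, c = transmitted, not_transmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    return (b - c) ** 2 / (b + c)

# Hypothetical counts from 100 heterozygous parents
tdt = tdt_statistic(60, 40)
print(tdt)
```

Because only transmissions within families are compared, the statistic does not depend on allele frequencies differing between subpopulations, which is why the family-based design was thought to address stratification.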
13Association analysis was brought back to disease
mapping (1990s). II. Weaker signal in complex
diseases
- TDT was shown to be more powerful than the affected-sib identical-by-descent sharing method (a nonparametric linkage analysis) for complex diseases (diseases with lower genotypic relative risk)
- N Risch, K Merikangas (1996), "The future of genetic studies of complex human diseases", Science, 273:1516-1517
14Statistical genetic methods for disease gene
identification
15Association studies
- Association between risk factor and disease: the risk factor is significantly more frequent among affected than among unaffected individuals
- In genetic epidemiology
- Risk factors: alleles/genotypes/haplotypes
16Association studies
- Candidate genes (functional or positional)
- Fine mapping in linkage regions
- Genome wide screen
17Candidate gene analysis
- Direct analysis
- Association studies between disease and
functional SNPs (causative of disease) of
candidate gene
18Candidate gene analysis
- Indirect analysis
- Association studies between disease and random SNPs within or near candidate gene
- Linkage Disequilibrium mapping
19Case-control studies: χ² test
Contingency table: risk factor by case/control status
Test of independence: $\chi^2 = \sum (O - E)^2 / E$ with 1 df
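The test of independence on this slide can be sketched in a few lines. The example below computes the Pearson statistic, sum of (O - E)^2 / E, for any table of observed counts; the function names and all counts are hypothetical, and the p-value helper applies only to the 1 df case.

```python
import math

def pearson_chi_square(table):
    """Return (chi2, df) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical counts: rows = cases, controls; columns = risk factor present, absent
chi2, df = pearson_chi_square([[60, 40], [40, 60]])
p_value = p_value_1df(chi2)
print(chi2, df, p_value)
```

The same function accepts a table with three genotype columns, in which case it returns 2 df and a different p-value calculation is needed.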
20Case-control studies: χ² test
2x3 contingency table
Genotypes
            AA     Aa     aa     Total
Cases       nAA    nAa    naa    N
Controls    mAA    mAa    maa    M
Total       tAA    tAa    taa    N+M
Test of independence: $\chi^2 = \sum (O - E)^2 / E$ with 2 df
21Case-control studies: χ² test
2x2 contingency table
Alleles
            A      a      Total
Cases       nA     na     2N
Controls    mA     ma     2M
Total       tA     ta     2(N+M)
Test of independence: $\chi^2 = \sum (O - E)^2 / E$ with 1 df
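The allele table is obtained from genotype counts by counting two alleles per person, which is why the row totals are 2N and 2M. The sketch below does that conversion and then uses the standard shortcut formula for a 2x2 table, which equals the sum of (O - E)^2 / E. The function names and genotype counts are hypothetical.

```python
def allele_counts(n_AA, n_Aa, n_aa):
    """Convert genotype counts to (count of A, count of a)."""
    return 2 * n_AA + n_Aa, 2 * n_aa + n_Aa

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical genotype counts (AA, Aa, aa) for N = 100 cases and M = 100 controls
cases = (30, 50, 20)
controls = (20, 50, 30)

n_A, n_a = allele_counts(*cases)       # row total is 2N
m_A, m_a = allele_counts(*controls)    # row total is 2M
allelic_chi2 = chi_square_2x2(n_A, n_a, m_A, m_a)
print(n_A, n_a, m_A, m_a, allelic_chi2)
```

Treating the two alleles of one person as independent observations is an assumption of this allelic test; it is reasonable when genotypes are in Hardy-Weinberg equilibrium.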
22Hardy-Weinberg Equilibrium
Biallelic locus: alleles A, a; genotypes AA, Aa, aa
Allele frequencies: $P(A) = p$, $P(a) = q$
Genotype frequencies are in HWE if $P(AA) = p^2$, $P(Aa) = 2pq$, $P(aa) = q^2$
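As an illustration of the HWE proportions, the sketch below estimates p from observed genotype counts, derives the expected counts N*p^2, 2*N*p*q and N*q^2, and compares them with the observed counts by a chi-square statistic. Function names and counts are hypothetical; the statistic has 1 df (three genotype classes, minus one, minus one estimated allele frequency).

```python
def hwe_expected_counts(n_AA, n_Aa, n_aa):
    """Return (p, q, expected genotype counts) under Hardy-Weinberg equilibrium."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)   # frequency of allele A by allele counting
    q = 1.0 - p
    return p, q, (n * p * p, 2 * n * p * q, n * q * q)

def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square statistic (1 df) for departure from HWE."""
    _, _, expected = hwe_expected_counts(n_AA, n_Aa, n_aa)
    observed = (n_AA, n_Aa, n_aa)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical sample in exact HWE proportions (p = 0.6)
in_hwe = hwe_chi_square(36, 48, 16)
# Hypothetical sample with the same p but too few heterozygotes
out_of_hwe = hwe_chi_square(50, 20, 30)
print(in_hwe, out_of_hwe)
```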
23Haplotypes
[Figure: multilocus GENOTYPES at Locus 1, Locus 2, ..., Locus N, resolved into haplotypes by identification of phase]
24Statistical significance of a correlation versus
correlation strength
- Statistical significance is usually measured by the p-value: the probability of observing the same amount of correlation or more if the true correlation is zero.
- Correlation strength can be measured by many quantities: D, D', r²
- Correlation strength between a marker and the disease status is usually measured by the odds ratio (OR)
- The 95% confidence interval (CI) of the OR contains information on both strength and significance
- When the sample size is increased, the p-value typically becomes more significant, whereas the OR usually stays the same (but the 95% CI of the OR becomes narrower).
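The last point can be checked numerically. The sketch below computes the odds ratio of a 2x2 table with an approximate 95% CI from the standard error of log(OR) (the Woolf or logit method, chosen here as an assumption since the slide does not specify one). Function names and counts are hypothetical; the second table is the first with every count multiplied by ten.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI for the table [[a, b], [c, d]].

    a, b: cases with / without the risk allele
    c, d: controls with / without the risk allele
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR), Woolf's method
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

small = odds_ratio_ci(60, 40, 40, 60)        # hypothetical study
large = odds_ratio_ci(600, 400, 400, 600)    # same proportions, 10x the sample

for name, (or_, lo, hi) in (("small", small), ("large", large)):
    print(f"{name}: OR = {or_:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Both tables give the same OR, while the interval from the larger sample is narrower. A CI that excludes 1 corresponds to significance at roughly the 5% level, which is how the interval carries both kinds of information.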
25Graphic representation of LD
[Figure: panels labeled r² and D'; GOLD]
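For two biallelic loci, the LD measures named in these slides can be computed from one haplotype frequency and the two allele frequencies. A minimal sketch follows, using the standard definitions D = P(AB) - P(A)P(B), D' = D / Dmax and r² = D² / (pA(1-pA)pB(1-pB)); the function name and the frequencies are hypothetical, and the function returns the absolute value of D'.

```python
def ld_measures(p_AB, p_A, p_B):
    """Return (D, |D'|, r2) for alleles A and B at two biallelic loci.

    p_AB: frequency of the haplotype carrying both A and B
    p_A, p_B: allele frequencies, each assumed strictly between 0 and 1
    """
    d = p_AB - p_A * p_B
    # Largest |D| attainable given the allele frequencies and the sign of D
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r2

# Hypothetical frequencies
d, d_prime, r2 = ld_measures(p_AB=0.4, p_A=0.5, p_B=0.5)
print(d, d_prime, r2)
```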
26Main Issues in Association Analysis
- The association is typically detected between a non-functional marker and the disease, instead of between the disease gene itself and the disease status (the indirect role of the disease gene in association analysis)
- When the disease (case) group and the normal (control) group are both mixtures of subpopulations, but with different mixing proportions, even markers not associated with the disease will exhibit spurious association (heterogeneity)
27Zondervan & Cardon, 2004
28Solution to the first issue
- Choose the marker (or haplotype) so that its (allele, haplotype, ...) frequency matches that of the disease gene.
- Whenever possible, type a marker that is also functional (e.g. coding SNP, functional SNP, regulatory SNP)
29Association due to population stratification
Marchini et al, 2004
30Well-known problem when case/control groups consist of two different subpopulations with different mixing proportions
- Example: comparing people's height between two places, 1. a prison and 2. a nursing school
- In prison, maybe 80% are men
- In nursing school, maybe 80% are women
- Men are on average taller than women
- People in prison are on average taller than people in nursing school
- But the cause of this difference is the different mixing proportions, not that staying in prison makes people taller
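The arithmetic behind this example is a weighted average. In the sketch below the two mean heights are hypothetical placeholders (they are not from the slides); only the 80%/20% mixing proportions come from the example. The place itself has no effect on height in this model, yet the two group means differ.

```python
def mixture_mean(fraction_men, mean_men, mean_women):
    """Mean height of a group that is a mixture of men and women."""
    return fraction_men * mean_men + (1.0 - fraction_men) * mean_women

# Hypothetical subpopulation means in cm (assumed values for illustration)
MEAN_MEN_CM = 175.0
MEAN_WOMEN_CM = 162.0

prison_mean = mixture_mean(0.8, MEAN_MEN_CM, MEAN_WOMEN_CM)    # 80% men
nursing_mean = mixture_mean(0.2, MEAN_MEN_CM, MEAN_WOMEN_CM)   # 80% women

print(f"prison: {prison_mean:.1f} cm, nursing school: {nursing_mean:.1f} cm")
print(f"difference: {prison_mean - nursing_mean:.1f} cm")
```

The same calculation applies to allele frequencies: if cases and controls draw on two subpopulations in different proportions, any marker whose frequency differs between the subpopulations will differ between cases and controls.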
31Solution to the second issue
- Try to use people from the same population in both the case and the control group.
- Use neutral markers to test whether subpopulations exist
- If possible, use an isolated population (the extra benefit is reduced heterogeneity in the case group)
- Use a family-based association design (the disadvantage is that it is more costly, and parents of late-onset patients are hard to find)
32Lee et al. Genes and Immunity (2005)
33dis.e.qui.lib.ri.um, n. Loss or lack of stability
or equilibrium
link.age, n. (genetics) An association between
two or more genes such that the traits they
control tend to be inherited together.
as.so.ci.a.tion, n. 1. The act of associating or
the state of being associated.
cor.re.la.tion, n. (statistics) the simultaneous
change in value of two numerically valued random
variables
ASSOCIATION IS THE LEAST RIGOROUSLY DEFINED WORD!
34Criswell et al. Am J Hum Genetics (2005)