Title: Design Considerations in Large-Scale Genetic Association Studies
1 Design Considerations in Large-Scale Genetic Association Studies
- Michael Boehnke,
- Andrew Skol, Laura Scott, Cristen Willer,
- Gonçalo Abecasis, Anne Jackson,
- and the FUSION Study Investigators
- Department of Biostatistics
- Center for Statistical Genetics
- University of Michigan
2 Outline
- Assess the utility of HapMap samples for tagSNP selection in a study of type 2 diabetes in Finnish subjects
- Discuss the impact of several design factors on cost and efficiency of genome-wide association (GWA) studies
3 FUSION Study: Finland-United States Investigation of NIDDM Genetics
- National Public Health Institute, Helsinki
- USC Keck School of Medicine, Los Angeles
- National Human Genome Research Institute, Bethesda
- University of Michigan School of Public Health, Ann Arbor
- University of North Carolina School of Medicine, Chapel Hill
4 Chromosome 14 SNP Selection
- Used early HapMap (May 2004) to select tagSNPs in 18 Mb linkage interval on chr 14
- MAF > .05, Illumina design score > .40
- Unselected SNPs had r² > .8 with ≥ 1 tagSNP
- Added annotation-based SNPs
- Double-tagged large bins, filled large gaps
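The selection criteria above (MAF > .05, every unselected SNP with r² > .8 to at least one tagSNP) can be sketched in code. This is a minimal illustration only: the greedy strategy, the function names, and the use of squared genotype correlation as a stand-in for LD r² are assumptions, not the procedure the study actually used, and the Illumina design score, annotation-based SNPs, and double tagging are not modeled.

```python
from itertools import combinations


def maf(genotypes):
    """Minor allele frequency from genotypes coded as 0, 1, or 2 allele copies."""
    p = sum(genotypes) / (2 * len(genotypes))
    return min(p, 1 - p)


def r_squared(x, y):
    """Squared Pearson correlation of two genotype vectors (assumed proxy for LD r^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return sxy * sxy / (sxx * syy)


def greedy_tag_snps(genos, maf_min=0.05, r2_min=0.8):
    """Greedy sketch: pick tags until every common SNP has r^2 > r2_min with a tag."""
    common = [s for s, g in genos.items() if maf(g) > maf_min]
    # captures[s] = set of common SNPs that s would tag (including itself)
    captures = {s: {s} for s in common}
    for a, b in combinations(common, 2):
        if r_squared(genos[a], genos[b]) > r2_min:
            captures[a].add(b)
            captures[b].add(a)
    untagged, tags = set(common), []
    while untagged:
        # choose the SNP that captures the most still-untagged SNPs
        best = max(common, key=lambda s: len(captures[s] & untagged))
        tags.append(best)
        untagged -= captures[best]
    return tags


# Hypothetical toy genotypes for 8 individuals (not study data)
example = {
    "snp1": [0, 1, 2, 0, 1, 2, 0, 1],
    "snp2": [0, 1, 2, 0, 1, 2, 0, 1],  # identical to snp1, so one tag covers both
    "snp3": [2, 0, 1, 1, 0, 2, 0, 1],
    "rare": [0, 0, 0, 0, 0, 0, 0, 0],  # fails the MAF filter
}
print(greedy_tag_snps(example))
```

A greedy pass is only one way to satisfy the r² criterion; it does not guarantee the smallest possible tag set.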
5 Chromosome 14 SNP Selection
6 Utility of HapMap for tagSNP Selection for Finnish Subjects
- Question: How comparable were allele frequencies, haplotype frequencies, and r² in HapMap and Finnish data?
- Compared HapMap data and 1448 Finnish samples from FUSION and Finrisk 2002 studies
- Poster 1621, Willer et al., Friday 1:30-3:30 pm
7 Allele Frequencies: FUSION vs. HapMap
[Figure: panels labeled CEU, YRI, CHB, JPT]
8 Allele Frequencies: FUSION vs. CEU
7.5% of SNP frequencies differ at p < .01
r = .98
9 LD r²: FUSION vs. CEU
r = .91
10 Haplotype Frequencies: FUSION vs. CEU
r = .97
11 Summary: Chromosome 14 SNP Selection
- CEU excellent basis for tagSNP selection in Finns
- Strong correlation between allele frequencies, haplotype frequencies, and LD in the two samples
- Excess of significant allele and haplotype frequency differences (7% at .01 level), but mostly small
- Nearly all common haplotypes (frequency > .05) in one sample present in both samples
  - 579/583 from CEU in FUSION
  - 557/563 from FUSION in CEU
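The two kinds of comparison summarized above (a per-SNP test of whether frequencies differ, and a correlation of frequencies across SNPs) could be computed along the following lines. This is a hedged sketch: the slides do not say which test was used, so the two-proportion z-test here is an assumption, and every count in the example is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def allele_freq_test(count1, n1, count2, n2):
    """Two-sided two-proportion z-test on allele counts (assumed test, not the study's).

    count1 / n1: copies of an allele / chromosomes typed in sample 1;
    count2 / n2: the same for sample 2. Returns (frequency difference, p-value).
    """
    p1, p2 = count1 / n1, count2 / n2
    pooled = (count1 + count2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0  # allele fixed or absent in both samples
    z = (p1 - p2) / se
    return p1 - p2, 2 * (1 - NormalDist().cdf(abs(z)))


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of frequencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fraction_significant(pvalues, level=0.01):
    """Share of SNPs whose frequency difference is significant at the given level."""
    return sum(p < level for p in pvalues) / len(pvalues)


# Hypothetical allele counts per SNP: (sample 1 count, sample 2 count).
# 2896 = 2 x 1448 chromosomes; the reference panel size of 120 is an assumption.
counts = [(1010, 40), (2000, 85), (600, 22), (1500, 80)]
n_fin, n_ref = 2896, 120
pvals = [allele_freq_test(c1, n_fin, c2, n_ref)[1] for c1, c2 in counts]
freq_fin = [c1 / n_fin for c1, _ in counts]
freq_ref = [c2 / n_ref for _, c2 in counts]
print(fraction_significant(pvals), pearson_r(freq_fin, freq_ref))
```

With a small reference panel, most of the sampling error in each comparison comes from the panel, which is one reason frequency differences can be statistically significant yet small in magnitude.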
12 Design of Genome-wide Association Studies
- GWA provides unprecedented opportunity to identify genetic variants predisposing to disease
- Enabled by HapMap, decreasing genotyping costs
- Since we may type 100s-1000s of samples on 100Ks of SNPs, efficient study design is critical
- Examine two-stage designs for large-scale genetic studies (see Satagopan, Elston, Thomas)
13 One- and Two-Stage GWA Designs
[Figure: schematic of a one-stage design (samples 1, 2, 3, ..., N typed on SNPs 1, 2, 3, ..., M) and a two-stage design (fraction π_samples of samples typed in Stage 1; fraction π_markers of SNPs typed in Stage 2)]
14 One-Stage Design and Two-Stage Design
[Figure: SNPs-by-samples schematics for the one-stage design and for the two-stage design under replication-based analysis and joint analysis, with Stage 1 and Stage 2 marked]
15 Joint Analysis is More Powerful than Replication-Based Analysis (Skol et al., Friday 8:45, 180, Hall 3)
[Figure label: One-stage power]
- 300,000 markers genotyped on 1000 cases, 1000 controls
- Multiplicative model, prevalence 10%, GRR 1.4
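For orientation, the one-stage power in a setting like the one above can be approximated as follows. This is a rough sketch under stated assumptions: an allelic test with a normal approximation, Hardy-Weinberg proportions, a Bonferroni threshold of .05/300,000, and a risk allele frequency of 0.3, which is hypothetical because the slide does not give one. It is not the calculation behind the slide.

```python
from math import sqrt
from statistics import NormalDist


def one_stage_power(risk_af, grr, prevalence, n_cases, n_controls, alpha):
    """Approximate power of a two-sided allelic case-control test.

    Assumes a multiplicative model (genotype relative risks 1, grr, grr^2)
    and Hardy-Weinberg proportions in the general population.
    """
    # Risk allele frequency in cases under the multiplicative model
    p_case = risk_af * grr / (1 - risk_af + risk_af * grr)
    # Controls: population frequency is a prevalence-weighted mix of cases and controls
    p_ctrl = (risk_af - prevalence * p_case) / (1 - prevalence)
    # Each individual contributes two chromosomes
    se = sqrt(p_case * (1 - p_case) / (2 * n_cases)
              + p_ctrl * (1 - p_ctrl) / (2 * n_controls))
    ncp = (p_case - p_ctrl) / se
    nd = NormalDist()
    crit = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(ncp - crit) + nd.cdf(-ncp - crit)


# Slide setting; risk_af=0.3 is a hypothetical value chosen for illustration
power = one_stage_power(risk_af=0.3, grr=1.4, prevalence=0.10,
                        n_cases=1000, n_controls=1000,
                        alpha=0.05 / 300_000)
print(f"approximate one-stage power: {power:.2f}")
```

The result is sensitive to the assumed allele frequency, so the printed value should be read as illustrative rather than as a reproduction of the figure.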
16 Factors that Influence Cost and Efficiency of GWAs
- Fraction of samples typed in Stage 1 (π_samples)
- Fraction of SNPs typed in Stage 2 (π_markers)
- Stage 2 to Stage 1 per genotype cost ratio (R)
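One way to see how these three factors interact is a simple genotype-count cost model, sketched below. The model is an assumption, not taken from the slides: Stage 1 types all SNPs on the Stage 1 fraction of samples, Stage 2 types the followed-up SNPs on the remaining samples at R times the Stage 1 per-genotype cost, and cost is expressed relative to a one-stage study. The function name and the example values are hypothetical.

```python
def two_stage_relative_cost(pi_samples, pi_markers, cost_ratio=1.0):
    """Genotyping cost of a two-stage design relative to a one-stage design.

    pi_samples: fraction of samples typed in Stage 1
    pi_markers: fraction of SNPs carried forward to Stage 2
    cost_ratio: Stage 2 / Stage 1 per-genotype cost (R)
    Assumes Stage 2 types only the samples not typed in Stage 1.
    """
    if not (0 <= pi_samples <= 1 and 0 <= pi_markers <= 1):
        raise ValueError("fractions must lie in [0, 1]")
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    stage1 = pi_samples                                    # all SNPs, some samples
    stage2 = cost_ratio * (1 - pi_samples) * pi_markers    # some SNPs, remaining samples
    return stage1 + stage2


# Hypothetical design: 30% of samples in Stage 1, 1% of SNPs followed up
for r in (1, 2, 5, 10):
    print(r, two_stage_relative_cost(0.30, 0.01, cost_ratio=r))
```

This sketch holds the followed-up marker fraction fixed; a design comparison at constant power would instead adjust that fraction as the Stage 1 sample fraction changes.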
17 For a two-stage GWA study, what is the optimal fraction of samples genotyped in Stage 1 (π_samples)?
- R = (Stage 2 per genotype cost) / (Stage 1 per genotype cost)
- Case 1: R = 1
- Case 2: R = 1, 2, 5, 10
18 Cost as a Function of Samples Typed in Stage 1
Per Genotype Cost Ratio R = 1; Fraction of Markers Followed-up Varies to Ensure Constant Power
19 For a two-stage GWA study, what is the optimal fraction of samples genotyped in Stage 1 (π_samples)?
- R = (Stage 2 per genotype cost) / (Stage 1 per genotype cost)
- Case 1: R = 1
- Case 2: R = 1, 2, 5, 10
20 Cost as a Function of Samples Typed in Stage 1
Per Genotype Cost Ratio R = 1, 2, 5, 10; Fraction of Markers Followed-up Varies to Ensure Constant Power
[Figure: cost curves labeled R = 10, R = 5, R = 2, R = 1]
21 Summary: Two-Stage GWA Designs
- Two-stage GWA designs are efficient and cost-effective; joint analysis is more powerful than replication
- For equal Stage 1 and Stage 2 per genotype costs (R = 1), 250K SNPs, genomewide significance α = .05: genotype 20-30% of samples in Stage 1
- For R > 1, less stringent significance, fewer SNPs: genotype 30-40% of samples in Stage 1
22 Acknowledgements
- Chromosome 14: Cristen Willer, Anne Jackson; FUSION, CIDR, and HapMap investigators
- Two-stage designs: Andrew Skol, Laura Scott, Gonçalo Abecasis
- Thanks!
24 Excluded slides follow
25 FUSION Chromosome 14 T2D Linkage
26 Power of One- and Two-Stage Designs
27 How does a change in significance level change the optimal proportion of samples in Stage 1 (π_samples)?
- Case 1: α = .05/250,000 (genomewide significance)
- Case 2: α = 10/250,000 (less stringent significance)
- Case 3: α = .05/1,250 (candidate gene significance)
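The arithmetic behind these three significance levels can be checked with a short sketch. The scenario labels follow the slide; the helper names and the conversion to a two-sided normal critical value are assumptions added for illustration.

```python
from statistics import NormalDist


def per_test_alpha(numerator, n_tests):
    """Per-marker significance level, e.g. .05/250,000."""
    return numerator / n_tests


def critical_z(alpha):
    """Two-sided standard normal critical value for a per-test level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


# (label, numerator, number of markers) as listed on the slide
scenarios = [
    ("genomewide", 0.05, 250_000),
    ("less stringent", 10, 250_000),
    ("candidate gene", 0.05, 1_250),
]
for label, numerator, n_tests in scenarios:
    a = per_test_alpha(numerator, n_tests)
    print(f"{label:15s} alpha = {a:.1e}  critical |z| = {critical_z(a):.2f}")
```

The less stringent and candidate gene scenarios work out to the same per-marker level (4e-5), so they differ in the number of SNPs typed rather than in the per-test threshold.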
28 Impact of Significance Level on Optimal Proportion of Samples in Stage 1