Title: Selecting a Subset of Markers to Carry Forward
Follow-up to GWAs
Nancy J. Cox, Ph.D., The University of Chicago
2 Overview
- Nitty-gritty decisions on thresholds of all types
- Primary analyses beyond the SNPs genotyped on your platform
- Additional information to consider in choosing SNPs
  - External (linkage signals, other validation)
  - Internal (bioinformatics)
3 Nitty-Gritty Decisions on Thresholds of All Types
- Set thresholds prior to analysis, or use quality flags in prioritizing SNPs for follow-up?
- Minor allele frequencies?
- Hardy-Weinberg equilibrium (HWE) in controls? HWE in cases (for departures not consistent with a genetic model)?
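A minimal sketch of the per-SNP quantities behind these decisions follows. It computes the minor allele frequency and a 1-df chi-square HWE test, then reports the failures as quality flags rather than dropping the SNP. The cutoffs (`maf_min`, `hwe_controls_min`, `hwe_cases_min`) and all function names are hypothetical, not values from the talk. The looser cutoff in cases reflects the point that a real association can itself distort HWE in cases.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class GenotypeCounts:
    """Genotype counts for one biallelic SNP in one group (AA, AB, BB)."""
    n_aa: int
    n_ab: int
    n_bb: int

    @property
    def n(self) -> int:
        return self.n_aa + self.n_ab + self.n_bb

    def freq_a(self) -> float:
        return (2 * self.n_aa + self.n_ab) / (2 * self.n)


def minor_allele_frequency(c: GenotypeCounts) -> float:
    p = c.freq_a()
    return min(p, 1.0 - p)


def hwe_pvalue(c: GenotypeCounts) -> float:
    """1-df chi-square goodness-of-fit test against Hardy-Weinberg proportions."""
    p = c.freq_a()
    q = 1.0 - p
    expected = (c.n * p * p, 2 * c.n * p * q, c.n * q * q)
    observed = (c.n_aa, c.n_ab, c.n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2.0))


def qc_flags(controls: GenotypeCounts, cases: GenotypeCounts,
             maf_min: float = 0.01, hwe_controls_min: float = 1e-4,
             hwe_cases_min: float = 1e-7) -> List[str]:
    """Return QC flags instead of dropping the SNP (all cutoffs are hypothetical)."""
    pooled = GenotypeCounts(controls.n_aa + cases.n_aa,
                            controls.n_ab + cases.n_ab,
                            controls.n_bb + cases.n_bb)
    flags = []
    if minor_allele_frequency(pooled) < maf_min:
        flags.append("low_maf")
    if hwe_pvalue(controls) < hwe_controls_min:
        flags.append("hwe_controls")
    # Looser cutoff in cases: a true association can itself distort HWE in cases.
    if hwe_pvalue(cases) < hwe_cases_min:
        flags.append("hwe_cases")
    return flags


controls = GenotypeCounts(360, 480, 160)  # exactly at HWE, MAF 0.4
cases = GenotypeCounts(500, 0, 500)       # no heterozygotes: gross HWE departure
print(qc_flags(controls, cases))          # ['hwe_cases']
```

A hard-threshold pipeline would discard this SNP. Flagging keeps it available so it can be examined later as a possible funny-looking signal or CNV.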
4 Thresholds Before or Flags After Analysis?
- Using quality flags has the advantage of preserving FLS (funny-looking signals) that may be real, and/or CNVs
- An under-appreciated challenge of this approach is confusion in sharing data (what the top 100, 1,000, etc. signals are depends on how you weight the flags)
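The data-sharing problem can be made concrete with a small sketch. The SNP ids, p-values, flags, and penalty-per-flag scoring scheme below are all hypothetical. The sketch shows that the same results give a different "top 2" list depending only on how the flags are weighted.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical results: (SNP id, association p-value, QC flags)
RESULTS: List[Tuple[str, float, Set[str]]] = [
    ("snp_a", 1e-7, {"hwe_cases"}),
    ("snp_b", 3e-6, set()),
    ("snp_c", 5e-6, {"low_maf"}),
    ("snp_d", 2e-5, set()),
]


def top_n(results, flag_penalty: Dict[str, float], n: int) -> List[str]:
    """Rank by -log10(p) minus a penalty for each flag; return the top-n ids."""
    def score(item):
        _, p, flags = item
        return -math.log10(p) - sum(flag_penalty.get(f, 0.0) for f in flags)
    ranked = sorted(results, key=score, reverse=True)
    return [snp for snp, _, _ in ranked[:n]]


print(top_n(RESULTS, {}, 2))                                   # ['snp_a', 'snp_b']
print(top_n(RESULTS, {"hwe_cases": 3.0, "low_maf": 1.0}, 2))   # ['snp_b', 'snp_d']
```

When results are shared between groups, the flag weights have to be shared along with them. Otherwise "top N" lists from different groups are not comparable.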
5 Follow-Up Limited by Cost? Time? Samples? Variant Type?
- Testing UNtyped Alleles (TUNA) or imputation approaches may eliminate the need to follow up additional SNPs within the original samples, but CNV follow-up may be different
- The number of SNPs for follow-up samples may be predetermined, or a significance or FDR threshold may be predetermined or decided afterward
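The two ways of sizing follow-up can be contrasted directly. The sketch assumes the Benjamini-Hochberg step-up procedure as the FDR method; the talk does not name one. A fixed budget takes the N smallest p-values, while the FDR rule lets the data decide how many SNPs pass. The p-values are made up.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float], q: float) -> List[int]:
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank  # largest rank whose p-value meets its threshold
    return sorted(order[:k_max])


def top_n(pvalues: Sequence[float], n: int) -> List[int]:
    """Indices of the n smallest p-values (a fixed follow-up budget)."""
    return sorted(sorted(range(len(pvalues)), key=lambda i: pvalues[i])[:n])


p = [0.0001, 0.0004, 0.019, 0.03, 0.2, 0.5, 0.8, 0.9, 0.95, 0.99]
print(benjamini_hochberg(p, q=0.05))  # [0, 1]
print(top_n(p, 3))                    # [0, 1, 2]
```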
7 TUNA (Nicolae)
- Key ingredients are a multi-locus measure of LD and a reference sample (HapMap)
8 Haplotype Frequencies
Haplotype   T-A1-A2-A3   Frequency
H1          0-0-0-0      0.058
H2          1-0-1-0      0.350
H3          1-1-0-1      0.575
H4          1-0-0-1      0.017
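Reading T as the untyped allele and A1-A3 as typed SNPs, this table shows why a multi-locus LD measure matters. Note that this reading is an interpretation; the slide does not label the columns. In the sketch, the multi-locus R² is defined as Var(E[T | haplotype]) / Var(T), one standard choice that is not necessarily the exact statistic TUNA uses. Using the frequencies above, each single-SNP r² with T is below 0.1. The A1-A2-A3 haplotype predicts T perfectly, because T = 0 occurs only on the 0-0-0 background.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Haplotypes (T, A1, A2, A3) and frequencies from the table above.
HAPLOTYPES: List[Tuple[Tuple[int, ...], float]] = [
    ((0, 0, 0, 0), 0.058),
    ((1, 0, 1, 0), 0.350),
    ((1, 1, 0, 1), 0.575),
    ((1, 0, 0, 1), 0.017),
]


def allele_freq(haps, i):
    return sum(f for h, f in haps if h[i] == 1)


def pairwise_r2(haps, i, j):
    """Two-SNP r^2 = D^2 / (p_i (1 - p_i) p_j (1 - p_j))."""
    p_i, p_j = allele_freq(haps, i), allele_freq(haps, j)
    p_ij = sum(f for h, f in haps if h[i] == 1 and h[j] == 1)
    d = p_ij - p_i * p_j
    return d * d / (p_i * (1 - p_i) * p_j * (1 - p_j))


def multilocus_r2(haps, target, predictors: Sequence[int]):
    """Var(E[T | predictor haplotype]) / Var(T)."""
    p_t = allele_freq(haps, target)
    mass: Dict[tuple, float] = defaultdict(float)    # P(predictor haplotype)
    t_mass: Dict[tuple, float] = defaultdict(float)  # P(predictor haplotype, T = 1)
    for h, f in haps:
        key = tuple(h[k] for k in predictors)
        mass[key] += f
        t_mass[key] += f * h[target]
    between = sum(m * (t_mass[k] / m - p_t) ** 2 for k, m in mass.items())
    return between / (p_t * (1 - p_t))


for j in (1, 2, 3):
    print(f"r^2(T, A{j}) = {pairwise_r2(HAPLOTYPES, 0, j):.3f}")
print(f"multi-locus R^2(T | A1-A2-A3) = {multilocus_r2(HAPLOTYPES, 0, (1, 2, 3)):.3f}")
```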
9 Advantages and Disadvantages
Advantages
- Can utilize existing information on LD and haplotypes in HapMap data
- No arbitrary block definitions
- 1 df test for each known variant with specified uniqueness
Disadvantages
- In silico comparisons require biological validation
- Cannot capture all of the information from variation not yet known
10 TUNA
- For high-density screens, can be used for in silico follow-up
- Set a low threshold and TUNA-type every SNP in the vicinity of a signal to decide which to actually type
- Can convert lower-density screens to higher density
- Really useful in enabling comparisons of studies across disparate platforms
11 TUNA
- For each untyped SNP, determine whether that SNP provides sufficiently unique information to interrogate (r² < 0.7 with any typed SNP)
- For each of those SNPs, use all combinations of 2-, 3- and 4-locus haplotypes composed of platform SNPs within 100 kb on each side to identify the smallest subset able to interrogate genotypes with sufficient accuracy (multi-locus R² > threshold, 0.7 to 1.0)
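The two steps above can be sketched as a brute-force search over phased reference haplotypes. The following are all assumptions made for illustration: the function name `tuna_style_template`, the R² definition Var(E[T | haplotype]) / Var(T), the 0.8 threshold (chosen from the slide's 0.7 to 1.0 range), and the toy panel. This is not Nicolae's implementation. The panel is the haplotype table from slide 8 expanded to 1,000 haplotypes. It adds a distant typed SNP ("Far") that copies T but lies outside the 100 kb window.

```python
from collections import defaultdict
from itertools import combinations
from typing import Optional, Sequence, Tuple

Haplotype = Tuple[int, ...]


def multilocus_r2(haps: Sequence[Haplotype], target: int, predictors: Sequence[int]) -> float:
    """Var(E[T | predictor haplotype]) / Var(T), estimated from phased reference haplotypes."""
    n = len(haps)
    p_t = sum(h[target] for h in haps) / n
    if p_t in (0.0, 1.0):
        return 0.0  # monomorphic in the reference: nothing to predict
    count, t_count = defaultdict(int), defaultdict(int)
    for h in haps:
        key = tuple(h[k] for k in predictors)
        count[key] += 1
        t_count[key] += h[target]
    between = sum(c / n * (t_count[k] / c - p_t) ** 2 for k, c in count.items())
    return between / (p_t * (1 - p_t))


def tuna_style_template(haps, positions, target, typed, window=100_000,
                        single_max=0.7, threshold=0.8, sizes=(2, 3, 4)):
    """Return (status, predictor subset, R^2) for one untyped SNP (hypothetical helper)."""
    nearby = [i for i in typed if abs(positions[i] - positions[target]) <= window]
    # Step 1: a single typed SNP with r^2 >= single_max already captures the target.
    for i in nearby:
        r2 = multilocus_r2(haps, target, (i,))
        if r2 >= single_max:
            return "typed proxy", (i,), r2
    # Step 2: smallest 2-, 3- or 4-SNP haplotype reaching the R^2 threshold.
    for size in sizes:
        best: Optional[Tuple[Tuple[int, ...], float]] = None
        for subset in combinations(nearby, size):
            r2 = multilocus_r2(haps, target, subset)
            if r2 >= threshold and (best is None or r2 > best[1]):
                best = (subset, r2)
        if best is not None:
            return ("multi-locus", *best)
    return "not captured", (), 0.0


# Columns: T, A1, A2, A3, Far (positions in bp are hypothetical).
panel = ([(0, 0, 0, 0, 0)] * 58 + [(1, 0, 1, 0, 1)] * 350
         + [(1, 1, 0, 1, 1)] * 575 + [(1, 0, 0, 1, 1)] * 17)
positions = [1_000_000, 960_000, 1_030_000, 1_080_000, 1_400_000]
result = tuna_style_template(panel, positions, target=0, typed=[1, 2, 3, 4])
print(result)
```

Within the window, no single SNP reaches r² = 0.7. The search returns the A2-A3 pair, which separates the lone T = 0 haplotype from the rest. Exhaustive search is affordable here only because the candidate sets are capped at four SNPs within 100 kb.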
12 TUNA
- The primary template need be derived only once for each high-throughput SNP set and each HapMap release (but you may choose to optimize it for the set of SNPs passing QC in any given study)
- Deriving the template takes a few hours on our local cluster; conducting full analyses, given the template, takes less than 1 hour
13 Affymetrix 100K Set
HapMap sample   SNPs typed   r² > 0.7    r² < 0.7, Md > 0.7   Residual
CEU             95,482       813,533     208,697              1,405,338
CHBJPT          90,814       717,922     161,625              1,413,278
YRI             94,682       428,106     171,014              2,123,772
14 Affymetrix 500K Platform
HapMap sample   SNPs typed   r² > 0.7    r² < 0.7, Md > 0.7   Residual
CEU             419,780      1,393,622   189,643              553,877
CHBJPT          400,882      1,289,371   162,067              567,398
YRI             452,247      968,909     335,162              1,099,764
15 Illumina 317K Platform
HapMap sample   SNPs typed   r² > 0.7    r² < 0.7, Md > 0.7   Residual
CEU             306,784      1,549,401   256,098              443,847
CHBJPT          278,170      1,313,317   248,813              578,423
YRI             276,142      784,412     418,378              1,375,858
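The tables can be summarized as a coverage fraction per platform and population. The sketch assumes that the four columns partition the HapMap SNPs considered, and that the "Md > 0.7" column counts SNPs recovered only through a multi-locus predictor. Both are readings of the headers, not statements from the slides. For each row, the sketch prints coverage without and with that column.

```python
from typing import Dict, Tuple

# Rows transcribed from the three tables above:
# (SNPs typed, r^2 > 0.7, r^2 < 0.7 with Md > 0.7, residual)
Row = Tuple[int, int, int, int]
TABLES: Dict[str, Dict[str, Row]] = {
    "Affymetrix 100K": {
        "CEU": (95482, 813533, 208697, 1405338),
        "CHBJPT": (90814, 717922, 161625, 1413278),
        "YRI": (94682, 428106, 171014, 2123772),
    },
    "Affymetrix 500K": {
        "CEU": (419780, 1393622, 189643, 553877),
        "CHBJPT": (400882, 1289371, 162067, 567398),
        "YRI": (452247, 968909, 335162, 1099764),
    },
    "Illumina 317K": {
        "CEU": (306784, 1549401, 256098, 443847),
        "CHBJPT": (278170, 1313317, 248813, 578423),
        "YRI": (276142, 784412, 418378, 1375858),
    },
}


def coverage(row: Row) -> Tuple[float, float]:
    """(fraction covered without the multi-locus column, fraction covered with it)."""
    typed, single, multi, residual = row
    total = typed + single + multi + residual
    return (typed + single) / total, (typed + single + multi) / total


for platform, rows in TABLES.items():
    for sample, row in rows.items():
        without_md, with_md = coverage(row)
        print(f"{platform:16s} {sample:7s} {without_md:6.1%} -> {with_md:6.1%}")
```

Under these assumptions, YRI has the lowest coverage on every platform. The multi-locus column adds the most, in absolute terms, where single-SNP tagging is weakest.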
16 TUNA
- With TUNA, biased or inaccurate estimates of reference (HapMap) frequencies affect only the power, not the type I error
- May be of particular value in samples, such as Mexican Americans, that may not be well represented in reference samples
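The reason reference errors cost power but not validity can be illustrated with a simplified simulation. It assumes phased haplotypes that are sampled independently, and it uses a two-sample z-test on a fixed per-haplotype score. This is a stand-in, not Nicolae's TUNA statistic. The reference enters only through the weights (E[T | haplotype]). Under the null, cases and controls share the same haplotype frequencies, so any fixed weights give a mean difference of zero. The rejection rate therefore stays near alpha even with badly wrong weights. The biased weights are hypothetical.

```python
import math
import random
from statistics import NormalDist
from typing import Sequence


def weighted_z(case_haps: Sequence[int], control_haps: Sequence[int],
               weights: Sequence[float]) -> float:
    """Two-sample z statistic for the mean of weights[h] in cases vs controls."""
    def mean_var(xs):
        m = sum(xs) / len(xs)
        return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    m1, v1 = mean_var([weights[h] for h in case_haps])
    m0, v0 = mean_var([weights[h] for h in control_haps])
    return (m1 - m0) / math.sqrt(v1 / len(case_haps) + v0 / len(control_haps))


def type_i_error(study_freqs, weights, n=500, reps=2000, alpha=0.05, seed=7):
    """Fraction of null replicates (same frequencies in both groups) that reject."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    labels = list(range(len(study_freqs)))
    rejections = 0
    for _ in range(reps):
        cases = rng.choices(labels, weights=study_freqs, k=n)
        controls = rng.choices(labels, weights=study_freqs, k=n)
        if abs(weighted_z(cases, controls, weights)) > crit:
            rejections += 1
    return rejections / reps


study_freqs = [0.058, 0.350, 0.575, 0.017]  # H1-H4 frequencies from slide 8
good_weights = [0.0, 1.0, 1.0, 1.0]         # E[T | haplotype] from an accurate reference
biased_weights = [0.0, 1.0, 0.6, 0.2]       # hypothetical, badly mis-estimated reference
rate_good = type_i_error(study_freqs, good_weights)
rate_biased = type_i_error(study_freqs, biased_weights)
print(rate_good, rate_biased)
```

Both empirical rates should land close to 0.05. The biased weights would show their cost only under an alternative, as reduced power.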
21 External Information for Prioritizing SNPs
- Cases from some of the studies come from families used in prior linkage mapping studies, or linkage mapping information is available
- Other GWAs on the same or related phenotypes can add substantially to strategies for prioritizing SNPs
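One way to fold linkage information into SNP prioritization is p-value weighting, in the style of Genovese, Roeder and Wasserman. The talk does not say this is the method used. In the sketch, SNPs under a linkage peak get a larger weight, and the weights are rescaled to average 1. A SNP is rejected when p ≤ αw/m, which keeps the family-wise error at α. The 4x up-weight and all p-values are hypothetical.

```python
from typing import List, Sequence


def linkage_weights(in_linkage_peak: Sequence[bool], upweight: float = 4.0) -> List[float]:
    """Up-weight SNPs under a linkage peak, then rescale so the weights average 1."""
    raw = [upweight if flag else 1.0 for flag in in_linkage_peak]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def weighted_bonferroni(pvalues: Sequence[float], weights: Sequence[float],
                        alpha: float = 0.05) -> List[int]:
    """Reject SNP i when p_i <= alpha * w_i / m (weights average 1)."""
    m = len(pvalues)
    return [i for i, (p, w) in enumerate(zip(pvalues, weights)) if p <= alpha * w / m]


pvalues = [0.02, 0.01, 1e-4, 0.5]
under_peak = [True, False, False, False]
w = linkage_weights(under_peak)
print(weighted_bonferroni(pvalues, [1.0] * 4))  # [1, 2]
print(weighted_bonferroni(pvalues, w))          # [0, 2]
```

The trade-off is visible in the output. The SNP under the peak gains significance, and a marginal SNP elsewhere loses it. The same weighting could instead come from results of other GWAs on related phenotypes.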
22 MODY Genes and Pancreatic Beta-Cell Function: A Transcriptional Regulatory Network
[Figure: regulatory network diagram; the links between nodes are not recoverable from the transcript.]
- Network genes (associated disorders): Neurogenin3; NeuroD4; Insulin receptor (diabetes); HNF-6; HNF-3β; HNF-4γ; Nkx2.2; Nkx6.1; Islet-brain-1 (diabetes); NeuroD1 (MODY6); HNF-4α (MODY1); Islet-1 (MODY?); IPF-1 (MODY4, PNDM1, T2DM); HNF-1β (MODY5); HNF-1α (MODY3, T2DM); Pax4 (MODY?); Glucokinase (MODY2, PNDM2); Insulin (familial hyperinsulinemia/hyperproinsulinemia)
- Target genes: glycolysis (aldolase B, L-pyruvate kinase); mitochondrial function (UCP2); genes that control β-cell replication and apoptosis; GLUT2 (Fanconi-Bickel syndrome); Amylin (T2DM)
24 Internal Information for Prioritizing SNPs
- Downstream bioinformatics analysis (pathway, network theory, etc.) requires input of GENES
- Signals come as SNPs or CNVs
- NEED BETTER ANNOTATIONS FOR SNP → GENE CONVERSION
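The simplest SNP-to-gene conversion uses physical position only. Each SNP gets a class (exonic, intronic, within 2 kb of a gene, or between genes) and one or more gene names. The gene models, coordinates, and function name below are hypothetical. "Exonic" is not split into synonymous and nonsynonymous, because that requires codon-level annotation. This positional mapping is exactly the "obvious" annotation that LD and expression information can extend.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

FLANK_BP = 2_000  # "within 2 kb of gene" window


@dataclass
class Gene:
    name: str
    chrom: str
    start: int  # inclusive coordinates (hypothetical gene models)
    end: int
    exons: List[Tuple[int, int]] = field(default_factory=list)


def snp_to_genes(chrom: str, pos: int, genes: List[Gene]) -> Tuple[str, List[str]]:
    """Return (physical class, gene names) for one SNP."""
    same_chrom = [g for g in genes if g.chrom == chrom]
    inside = [g for g in same_chrom if g.start <= pos <= g.end]
    if inside:
        exonic = [g.name for g in inside if any(s <= pos <= e for s, e in g.exons)]
        return ("exonic", exonic) if exonic else ("intronic", [g.name for g in inside])
    near = [g.name for g in same_chrom if g.start - FLANK_BP <= pos <= g.end + FLANK_BP]
    if near:
        return "within 2 kb of gene", near
    if not same_chrom:
        return "between genes", []
    # Intergenic: report the nearest gene by distance to either end.
    nearest = min(same_chrom, key=lambda g: min(abs(pos - g.start), abs(pos - g.end)))
    return "between genes", [nearest.name]


genes = [
    Gene("GENE_A", "1", 10_000, 20_000, exons=[(10_000, 10_500), (19_000, 20_000)]),
    Gene("GENE_B", "1", 50_000, 60_000, exons=[(50_000, 51_000)]),
]
for pos in (10_200, 15_000, 21_500, 40_000):
    print(pos, snp_to_genes("1", pos, genes))
```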
25 Better SNP and CNV Annotation
- Physical annotations (coding synonymous, coding nonsynonymous, intronic, within 2 kb of a gene, between genes)
- LD relationships to local genes
- Expression phenotype information
- Need a measure of how well each gene is interrogated, directly or indirectly, by your platform for downstream bioinformatics
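A per-gene interrogation measure could take a form like the following; the talk does not define one, so this definition is an assumption. For each reference SNP in the gene ±2 kb, take its best r² to the platform (1.0 if genotyped directly). The best r² could be a single-SNP r² or a multi-locus R² from a TUNA-style template. The gene's coverage is then the fraction of those SNPs with r² ≥ 0.8, reported alongside the mean best r². Positions and r² values are made up.

```python
from typing import Dict, Sequence, Tuple


def gene_interrogation(gene_start: int, gene_end: int,
                       ref_snps: Sequence[Tuple[int, float]],
                       flank: int = 2_000, r2_min: float = 0.8) -> Dict[str, float]:
    """Summarize how well a gene (+/- flank) is covered by the platform.

    ref_snps: (position, best r^2 to the platform) for each reference SNP;
    a SNP genotyped on the platform has best r^2 = 1.0."""
    r2s = [r2 for pos, r2 in ref_snps if gene_start - flank <= pos <= gene_end + flank]
    if not r2s:
        return {"n_ref_snps": 0, "frac_captured": 0.0, "mean_best_r2": 0.0}
    return {
        "n_ref_snps": len(r2s),
        "frac_captured": sum(r2 >= r2_min for r2 in r2s) / len(r2s),
        "mean_best_r2": sum(r2s) / len(r2s),
    }


ref = [(9_000, 1.0), (12_000, 0.9), (15_000, 0.3), (19_000, 0.85), (30_000, 1.0)]
print(gene_interrogation(10_000, 20_000, ref))
```

Carrying such a score into pathway or network analyses would let a gene with no signal be told apart from a gene that was never well tested.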
26 Functional Patterns
- Genes associated at p < 10⁻³
Biological Processes (N = 18,484)   T2D GWA   100K platform   P-value
Cell adhesion                       3.7       1.7             .0010
Neuronal Activities                 4.1       2.0             .0014
Developmental Processes             10.7      7.7             .0177
Immunity and defense                1.2       4.8             .0002
Molecular Functions (N = 12,454)    T2D GWA   100K platform   P-value
Cell adhesion molecule              4.1       1.6             .001
Nucleic acid binding                11.7      8.3             .033
Proteases                           2.5       1.2             .039
Defense/immunity protein            0.3       1.8             .041
Pathways (N = 3,730)                T2D GWA   100K platform   P-value
VEGF signaling                      9.3       1.3             1.7 × 10⁻¹⁵
Endothelin signaling                5.6       1.7             4.2 × 10⁻⁴
p53 pathway feedback loops 2        3.7       1.6             0.04
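The slide does not say which test produced these P-values. A common choice for "is this category over-represented among associated genes relative to the platform?" is the one-sided hypergeometric (Fisher exact) test, sketched below with exact integer arithmetic. The counts in the usage line are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, n_drawn: int, n_category: int, n_total: int) -> float:
    """P(X >= k) when n_drawn genes are sampled without replacement from n_total
    genes, n_category of which belong to the category of interest."""
    upper = min(n_drawn, n_category)
    numerator = sum(comb(n_category, x) * comb(n_total - n_category, n_drawn - x)
                    for x in range(k, upper + 1))
    return numerator / comb(n_total, n_drawn)


# Hypothetical: 18,484 annotated genes, 700 in a category,
# 400 genes associated, 28 of them in the category (about 15 expected).
p = hypergeom_upper_tail(28, n_drawn=400, n_category=700, n_total=18_484)
print(f"{p:.2g}")
```

Depletion, as in "Immunity and defense" above, would use the lower tail instead. With many categories tested, the resulting P-values also need a multiple-testing correction.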
31 Region Genes: EACC, NJC1, TCW4
SNP: rs2311458
Physical class: NJC1, intron
LD (r² > 0.9): NJC1, EACC
Global: NAH1 10⁻⁹, MAH2 10⁻⁶, EACC 10⁻⁶
Local: 5/22 .04 NJC1 1/35 10⁻⁶ EACC 18/18 TCW4
32 SNP rs2311458
- Obvious gene annotation: NJC1
- Alternative/additional gene annotations: EACC, NAH1, MAH2
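The step from slide 31 to slide 32 can be sketched as merging annotation sources. The physical gene is the "obvious" annotation. Genes reached through LD (r² > 0.9) or through expression associations become "alternative/additional" annotations. The sketch reads the "Global" entries of slide 31 as gene/p-value pairs, which is an interpretation of the flattened slide. It also uses a hypothetical expression cutoff of p ≤ 10⁻⁶ and ignores the "Local" column. Under those assumptions it reproduces the gene lists shown for rs2311458.

```python
from typing import Iterable, Optional, Set, Tuple


def gene_annotations(physical_gene: Optional[str], ld_genes: Iterable[str],
                     expression_hits: Iterable[Tuple[str, float]],
                     p_max: float = 1e-6) -> Tuple[Set[str], Set[str]]:
    """Split genes linked to one SNP into 'obvious' and 'alternative/additional' sets."""
    obvious = {physical_gene} if physical_gene else set()
    additional = set(ld_genes) | {gene for gene, p in expression_hits if p <= p_max}
    return obvious, additional - obvious


obvious, additional = gene_annotations(
    physical_gene="NJC1",                      # rs2311458 is intronic in NJC1
    ld_genes=["NJC1", "EACC"],                 # genes in LD (r^2 > 0.9) with the SNP
    expression_hits=[("NAH1", 1e-9), ("MAH2", 1e-6), ("EACC", 1e-6)],
)
print(sorted(obvious), sorted(additional))  # ['NJC1'] ['EACC', 'MAH2', 'NAH1']
```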
33 SCAN (SNP CNV ANnotation)
34 Colleagues and Collaborators
- University of Chicago
  - Nancy Cox Lab: Geoffrey Hayes, Anna Pluzhnikov, Cheri Roe, Piper Below, Anuar Konkashbaev, Ying Sun
  - Dept. of Medicine: Graeme Bell
  - Dept. of Human Genetics: Mark Abney, Anna Di Rienzo, Carole Ober, Jonathan Pritchard
  - Dept. of Statistics: Dan Nicolae, Mary Sara McPeek, Matthew Stephens
- University of Texas Health Sciences Center
  - Craig Hanis, Eric Boerwinkle