Selecting a Subset of Markers to Carry Forward

1
Selecting a Subset of Markers to Carry Forward
Follow-up to GWAs
Nancy J. Cox, Ph.D. The University of Chicago
2
Overview
  • Nitty-gritty decisions on thresholds of all types
  • Primary analyses beyond the SNPs genotyped on
    your platform
  • Additional information to consider in choosing
    SNPs
    • External (linkage signals, other validation)
    • Internal (bioinformatics)

3
Nitty-Gritty Decisions on Thresholds of All Types
  • Set thresholds prior to analysis, or use quality
    flags in prioritizing SNPs for follow up?
  • Minor allele frequencies?
  • HWE controls? HWE cases (for departures not
    consistent with a genetic model)?
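
The HWE and minor-allele-frequency thresholds raised above can be made concrete with a small sketch. It assumes a biallelic SNP summarized by three genotype counts and uses the 1 df chi-square goodness-of-fit test; the counts are hypothetical, and exact tests are often preferred when counts are small. The function names are illustrative and do not come from any particular package.

```python
import math

def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Frequency of the rarer allele from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return min(p, 1 - p)

def hwe_chi_square(n_aa, n_ab, n_bb):
    """1 df chi-square test of Hardy-Weinberg proportions.

    Returns (chi2, p_value). Raises ValueError for a monomorphic SNP,
    where the expected counts are zero and the test is undefined.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("monomorphic SNP: HWE test undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical control genotype counts (not from the presentation).
in_hwe = hwe_chi_square(360, 480, 160)       # matches HWE for p = 0.6
out_of_hwe = hwe_chi_square(400, 400, 200)   # heterozygote deficit
maf = minor_allele_frequency(400, 400, 200)
```

Whether the resulting p-value is used as a hard filter before analysis or kept as a quality flag afterward is exactly the choice the next slide discusses.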

4
Thresholds Before or Flags After Analysis?
  • Using quality flags has the advantage of
    preserving FLS (funny-looking signals) that may
    be real, and/or CNVs
  • An under-appreciated challenge of this approach is
    confusion in sharing data (what the top 100,
    1000, etc. signals are depends on how you weight
    the flags)

5
Follow Up Limited by Cost? Time? Samples? Variant Type?
  • Testing UNtyped Alleles (TUNA) or imputation
    approaches may eliminate need for follow up of
    additional SNPs within original samples, but
    CNV follow up may be different
  • Number of SNPs for follow up samples may be
    predetermined, or significance or FDR threshold
    may be predetermined, or decided afterward
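
The selection rules in the last bullet can be sketched side by side. This is a minimal illustration, assuming one p-value per SNP: `top_n` implements a predetermined number of SNPs, and `benjamini_hochberg` implements a predetermined FDR threshold using the standard Benjamini-Hochberg step-up procedure. The slides do not say which FDR procedure was used, so that choice is an assumption, and the p-values are hypothetical.

```python
def top_n(p_values, n):
    """Indices of the n smallest p-values (fixed follow-up budget)."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    return sorted(order[:n])

def benjamini_hochberg(p_values, q=0.05):
    """Indices of tests selected at FDR level q (step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its BH bound.
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

# Hypothetical p-values for ten SNPs (not from the presentation).
toy_p = [0.0001, 0.0004, 0.0019, 0.0095, 0.02, 0.04, 0.2, 0.5, 0.7, 0.9]
by_budget = top_n(toy_p, 3)
by_fdr = benjamini_hochberg(toy_p, q=0.05)
```

With a fixed budget the list length is known in advance but its error rate is not; with an FDR threshold the reverse holds.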

6
Overview
  • Nitty-gritty decisions on thresholds of all types
  • Primary analyses beyond the SNPs genotyped on
    your platform
  • Additional information to consider in choosing
    SNPs
    • External (linkage signals, other validation)
    • Internal (bioinformatics)

7
TUNA (Nicolae)
  • Key ingredients are multi-locus measure of LD
    and reference sample (HapMap)

8
Haplotype   T   A1  A2  A3  Frequency
H1          0   0   0   0   0.058
H2          1   0   1   0   0.350
H3          1   1   0   1   0.575
H4          1   0   0   1   0.017
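
One way to read the haplotype table is to ask how well each single marker tags T. The sketch below assumes T is the untyped SNP and A1 to A3 are typed platform SNPs (an interpretation of the slide, not something it states), and computes the standard pairwise r² from the four haplotype frequencies.

```python
# Haplotype table from the slide: alleles at (T, A1, A2, A3), frequency.
HAPLOTYPES = [
    ((0, 0, 0, 0), 0.058),  # H1
    ((1, 0, 1, 0), 0.350),  # H2
    ((1, 1, 0, 1), 0.575),  # H3
    ((1, 0, 0, 1), 0.017),  # H4
]
NAMES = ("T", "A1", "A2", "A3")

def allele_freq(locus):
    """Frequency of allele 1 at the given locus index."""
    return sum(f for alleles, f in HAPLOTYPES if alleles[locus] == 1)

def pairwise_r2(i, j):
    """Standard pairwise LD r^2 between loci i and j."""
    p_i, p_j = allele_freq(i), allele_freq(j)
    p_ij = sum(f for a, f in HAPLOTYPES if a[i] == 1 and a[j] == 1)
    d = p_ij - p_i * p_j
    return d * d / (p_i * (1 - p_i) * p_j * (1 - p_j))

# r^2 between T (index 0) and each of A1, A2, A3.
r2_with_t = {NAMES[j]: pairwise_r2(0, j) for j in (1, 2, 3)}
for name, value in r2_with_t.items():
    print(f"r2(T, {name}) = {value:.3f}")
```

Every pairwise value is below 0.1, yet allele 0 at T occurs only on H1, so the markers taken jointly identify it. That gap between single-marker and multi-locus information is what motivates a multi-locus LD measure.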
9
Advantages
  • Can utilize existing information on LD and
    haplotypes in HapMap data
  • No arbitrary block definitions
  • 1 df test for each known variant with specified
    uniqueness
Disadvantages
  • In silico comparisons require biological
    validation
  • Cannot capture all of the information from
    variation not yet known

10
TUNA
  • For high-density screens, can be used for in
    silico follow-up
  • Set low threshold and TUNA 'type' every SNP in
    the vicinity of a signal to decide which to
    actually type
  • Can convert lower density screens to higher
    density
  • Really useful in enabling comparisons of studies
    across disparate platforms

11
TUNA
  • For each untyped SNP, determine if that SNP
    provides sufficiently unique information to
    interrogate (r² < 0.7 to a typed SNP)
  • For each of those SNPs identify, using all
    combinations of 2-, 3- and 4-locus haplotypes
    comprised of platform SNPs within 100 kb on each
    side, the smallest subset able to interrogate
    genotypes with sufficient accuracy (multi-locus
    R² value > threshold, 0.7 to 1.0)
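
The subset search in the second bullet can be sketched on the four-haplotype example from the earlier slide. Two things here are assumptions, not the TUNA implementation: T is treated as the untyped SNP, and the multi-locus score is the fraction of variance in the T allele explained by the marker haplotype, a simple stand-in for TUNA's own multi-locus LD measure. The 100 kb window is not modeled because the toy example has only three typed markers.

```python
from itertools import combinations

# Alleles at (T, A1, A2, A3) and haplotype frequency, from the slide.
HAPLOTYPES = [
    ((0, 0, 0, 0), 0.058),
    ((1, 0, 1, 0), 0.350),
    ((1, 1, 0, 1), 0.575),
    ((1, 0, 0, 1), 0.017),
]
TYPED = {"A1": 1, "A2": 2, "A3": 3}  # typed marker name -> column index

def variance_explained(markers):
    """Illustrative multi-locus score: share of Var(T) explained by
    the haplotype formed from the given typed markers."""
    idx = [TYPED[m] for m in markers]
    groups = {}
    for alleles, f in HAPLOTYPES:
        key = tuple(alleles[i] for i in idx)
        mass, t_mass = groups.get(key, (0.0, 0.0))
        groups[key] = (mass + f, t_mass + f * alleles[0])
    p_t = sum(f * a[0] for a, f in HAPLOTYPES)
    # Residual variance of T within each marker-haplotype group.
    residual = sum(m * (t / m) * (1 - t / m) for m, t in groups.values())
    return 1.0 - residual / (p_t * (1 - p_t))

def smallest_subset(threshold=0.7, min_loci=2, max_loci=4):
    """Best-scoring subset of the smallest size that meets the threshold."""
    for size in range(min_loci, min(max_loci, len(TYPED)) + 1):
        scored = [(variance_explained(c), c) for c in combinations(TYPED, size)]
        best_score, best = max(scored)
        if best_score >= threshold:
            return best, best_score
    return None, 0.0

best_markers, best_score = smallest_subset(threshold=0.7)
print(best_markers, round(best_score, 3))
```

On this table a two-marker haplotype is already enough: T carries allele 0 exactly when A2 and A3 both do, so adding A1 contributes nothing.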

12
TUNA
  • Primary template need be derived only once for
    each high-throughput SNP set and each HapMap
    release (but may choose to optimize for the set
    of SNPs passing QC for any given study)
  • Deriving the template takes a few hours on our
    local cluster; conducting full analyses, given
    the template, takes less than 1 hour

13
Affymetrix 100K Set
HapMap Sample   SNPs Typed   r² > 0.7   r² < 0.7, M_D > 0.7   Residual
CEU             95482        813533     208697                1405338
CHB+JPT         90814        717922     161625                1413278
YRI             94682        428106     171014                2123772
14
Affymetrix 500K Platform
HapMap Sample   SNPs Typed   r² > 0.7   r² < 0.7, M_D > 0.7   Residual
CEU             419780       1393622    189643                553877
CHB+JPT         400882       1289371    162067                567398
YRI             452247       968909     335162                1099764
15
Illumina 317K Platform
HapMap Sample   SNPs Typed   r² > 0.7   r² < 0.7, M_D > 0.7   Residual
CEU             306784       1549401    256098                443847
CHB+JPT         278170       1313317    248813                578423
YRI             276142       784412     418378                1375858
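
The three platform tables are easier to compare as proportions. The sketch below copies the counts from the slides and assumes the four columns partition the HapMap SNPs considered, so each row sum is the total. That assumption is not stated on the slides, although the row sums are close to one another within each population.

```python
# (typed, r2 > 0.7, r2 < 0.7 with M_D > 0.7, residual), from the slides.
PLATFORMS = {
    "Affymetrix 100K": {
        "CEU": (95482, 813533, 208697, 1405338),
        "CHB+JPT": (90814, 717922, 161625, 1413278),
        "YRI": (94682, 428106, 171014, 2123772),
    },
    "Affymetrix 500K": {
        "CEU": (419780, 1393622, 189643, 553877),
        "CHB+JPT": (400882, 1289371, 162067, 567398),
        "YRI": (452247, 968909, 335162, 1099764),
    },
    "Illumina 317K": {
        "CEU": (306784, 1549401, 256098, 443847),
        "CHB+JPT": (278170, 1313317, 248813, 578423),
        "YRI": (276142, 784412, 418378, 1375858),
    },
}

def coverage(counts):
    """Return (fraction interrogated, fraction gained via multi-locus LD)."""
    typed, pairwise, multilocus, residual = counts
    total = typed + pairwise + multilocus + residual
    return (typed + pairwise + multilocus) / total, multilocus / total

summary = {
    (platform, sample): coverage(counts)
    for platform, samples in PLATFORMS.items()
    for sample, counts in samples.items()
}
for (platform, sample), (covered, gained) in summary.items():
    print(f"{platform:16s} {sample:8s} covered={covered:.3f} multi-locus gain={gained:.3f}")
```

Under that assumption, the multi-locus column is the share of SNPs that pairwise tagging alone would have left in the residual.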
16
TUNA
  • With TUNA, biased or inaccurate estimators of
    reference (HapMap) frequencies affect only the
    power, not the type I error
  • May be of particular value in samples, such as
    the Mexican Americans, which may not be
    well-represented in reference samples

17
(No Transcript)
18
(No Transcript)
19
(No Transcript)
20
Overview
  • Nitty-gritty decisions on thresholds of all types
  • Primary analyses beyond the SNPs genotyped on
    your platform
  • Additional information to consider in choosing
    SNPs
    • External (linkage signals, other validation)
    • Internal (bioinformatics)

21
External Information for Prioritizing SNPs
  • Cases from some of the studies come from families
    used in prior linkage mapping studies, or have
    linkage mapping information available
  • Other GWAs on the same or related phenotypes can
    add substantially to strategies for prioritizing

22
MODY Genes and Pancreatic Beta-cell Function - A
Transcriptional Regulatory Network
[Figure: network diagram; node labels recovered from the slide]
  • Nodes without a disease label: Neurogenin3,
    NeuroD4, HNF-6, HNF-3β, HNF-4γ, Nkx2.2, Nkx6.1
  • Insulin receptor (diabetes); Islet-brain-1
    (diabetes)
  • NeuroD1 (MODY6); HNF-4α (MODY1); Islet-1 (MODY?);
    Pax4 (MODY?)
  • IPF-1 (MODY4, PNDM1, T2DM); HNF-1β (MODY5);
    HNF-1α (MODY3, T2DM)
  • Glucokinase (MODY2, PNDM2)
  • Insulin (familial
    hyperinsulinemia/hyperproinsulinemia)
  • GLUT2 (Fanconi-Bickel syndrome); Amylin (T2DM)
  • Target genes: glycolysis (Aldolase B, L-pyruvate
    kinase), mitochondrial function (UCP2), genes
    that control β-cell replication and apoptosis
23
(No Transcript)
24
Internal Information for Prioritizing SNPs
  • Downstream bioinformatics analysis (pathway,
    network theory, etc.) requires input of GENEs
  • Signals come as SNPs or CNVs
  • NEED BETTER ANNOTATIONS FOR SNP > GENE CONVERSION

25
Better SNP and CNV Annotation
  • Physical annotations (coding synonymous,
    nonsynonymous, intronic, within 2 kb of gene,
    between genes)
  • LD relationships to local genes
  • Expression phenotype information
  • Need measurement of how well each gene is
    interrogated directly or indirectly by your
    platform for downstream bioinformatics
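
The physical annotation classes in the first bullet can be sketched as a simple interval lookup. The gene models and coordinates below are hypothetical, and the sketch only separates exonic from intronic positions; distinguishing synonymous from nonsynonymous changes would need codon-level information that is not modeled here. The 2 kb flank follows the slide.

```python
from dataclasses import dataclass

FLANK_BP = 2000  # "within 2 kb of gene", as on the slide

@dataclass(frozen=True)
class Gene:
    name: str
    start: int
    end: int
    exons: tuple  # inclusive (start, end) pairs

def physical_class(pos, genes):
    """Return (gene name or None, physical class) for a SNP position."""
    for g in genes:
        if g.start <= pos <= g.end:
            in_exon = any(s <= pos <= e for s, e in g.exons)
            return g.name, "exonic" if in_exon else "intronic"
    for g in genes:
        if g.start - FLANK_BP <= pos <= g.end + FLANK_BP:
            return g.name, "within 2 kb of gene"
    return None, "between genes"

# Hypothetical gene models on one chromosome (illustrative coordinates).
GENES = [
    Gene("GENE_A", 10_000, 20_000,
         ((10_000, 10_500), (15_000, 15_400), (19_500, 20_000))),
    Gene("GENE_B", 50_000, 60_000,
         ((50_000, 50_800), (59_000, 60_000))),
]

calls = {pos: physical_class(pos, GENES) for pos in (12_000, 15_100, 21_500, 35_000)}
```

A lookup like this gives only the physical class; the LD and expression-based annotations in the later bullets can assign the same SNP to additional genes.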

26
Functional Patterns
  • Genes associated at p < 10^-3

Biological Processes (N = 18,484)   T2D GWA   100K platform   P-value
Cell adhesion                       3.7       1.7             .0010
Neuronal Activities                 4.1       2.0             .0014
Developmental Processes             10.7      7.7             .0177
Immunity and defense                1.2       4.8             .0002

Molecular Functions (N = 12,454)    T2D GWA   100K platform   P-value
Cell adhesion molecule              4.1       1.6             .001
Nucleic acid binding                11.7      8.3             .033
Proteases                           2.5       1.2             .039
Defense/immunity protein            0.3       1.8             .041

Pathways (N = 3,730)                T2D GWA   100K platform   P-value
VEGF signaling                      9.3       1.3             1.7 x 10^-15
Endothelin signaling                5.6       1.7             4.2 x 10^-4
p53 pathway feedback loops 2        3.7       1.6             0.04
27
[Figure: gene labels EACC, NJC1, TCW4]
28
[Figure: gene labels EACC, NJC1, TCW4]
29
[Figure: gene labels EACC, NJC1, TCW4]
SNP         Physical Class
rs2311458   NJC1 intron
30
[Figure: gene labels EACC, NJC1, TCW4]
SNP         Physical Class   LD (R² > .9)
rs2311458   NJC1 intron      NJC1, EACC
31
[Figure: gene labels EACC, NJC1, TCW4]
SNP Physical Class LD Global Local
rs2311458 NJC1 intron NJC1 EACC NAH1 10^-9 MAH2 10^-6 EACC 10^-6 5/22 .04 NJC1 1/35 10^-6 EACC 18/18 TCW4
32
SNP: rs2311458
Obvious Gene Annotation: NJC1
Alternative/Additional Gene Annotations: EACC, NAH1, MAH2
33
SCAN: SNP and CNV ANnotation
34
Colleagues and Collaborators
  • University of Chicago
    • Nancy Cox Lab: Geoffrey Hayes, Anna Pluzhnikov,
      Cheri Roe, Piper Below, Anuar Konkashbaev, Ying
      Sun
    • Dept. of Medicine: Graeme Bell
    • Dept. of Human Genetics: Mark Abney, Anna Di
      Rienzo, Carole Ober, Jonathan Pritchard
    • Dept. of Statistics: Dan Nicolae, Mary Sara
      McPeek, Matthew Stephens
  • University of Texas Health Sciences Center
    • Craig Hanis, Eric Boerwinkle

35
(No Transcript)