Title: Cancer Genetic Markers of Susceptibility (CGEMS)
1Selecting Initial GWAS and Replication Studies
David Hunter
Harvard School of Public Health, Brigham and Women's Hospital, Broad Institute of MIT and Harvard
2Initial Study for GWAS
- Cases and controls well matched with respect to ancestry to minimize population stratification (restriction to one self-identified group)
- Genomic control or other methods, e.g. Eigenstrat (Price et al, 2006), may compensate for looser matching
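As a rough illustration of the genomic control idea mentioned above, the sketch below estimates the inflation factor (lambda) as the median observed 1-df chi-squared statistic divided by the null median, then rescales the statistics. The function names and the example statistics are hypothetical and not taken from the talk; Eigenstrat itself uses principal components and is not shown here.

```python
import statistics

# Median of a chi-squared distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def inflation_factor(chi2_stats):
    """Genomic-control lambda: observed median statistic / null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN

def genomic_control(chi2_stats):
    """Divide each statistic by lambda (never by less than 1)."""
    lam = max(1.0, inflation_factor(chi2_stats))
    return [x / lam for x in chi2_stats]

# Hypothetical 1-df test statistics, for illustration only.
example_stats = [0.1, 0.3, 0.5, 0.9, 2.7, 6.0]
lam = inflation_factor(example_stats)
corrected = genomic_control(example_stats)
```

After correction the median statistic sits at the null median, which is the sense in which a uniform rescaling can compensate for looser matching.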
3Control of population stratification: e.g. hair color in Nurses' Health Study (European ancestry)
Chi-squared inflation factors and Q-Q plots of log10 p-values with no adjustment for population stratification, and after adjusting for the top four and the top fifty eigenvectors (Price et al, 2006). 45, 19 and 19 SNPs (respectively) with p
Kraft P, unpublished
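The Q-Q plots described on this slide compare observed p-values with those expected when no SNP is associated. A minimal sketch of how such points could be computed is given below; the function name and the example p-values are hypothetical, and the plotting step is left out.

```python
import math

def qq_points(p_values):
    """Return (expected, observed) log10 p-value pairs for a Q-Q plot."""
    observed = sorted(p_values)
    n = len(observed)
    # Under the null, the i-th smallest of n p-values is expected near (i - 0.5) / n.
    expected = [(i - 0.5) / n for i in range(1, n + 1)]
    return [(math.log10(e), math.log10(o)) for e, o in zip(expected, observed)]

# Hypothetical p-values, for illustration only.
example_p = [0.5, 0.04, 0.9, 0.2, 0.0001]
points = qq_points(example_p)
```

Points that fall along the diagonal are consistent with the null; a systematic departure across the whole range suggests inflation such as that caused by population stratification.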
4Article: Nature 447, 661-678 (7 June 2007), doi:10.1038/nature05911
Received 26 March 2007; Accepted 11 May 2007
Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls
The Wellcome Trust Case Control Consortium
5Conclusions
- Broad matching on ancestry and region adequate for discovery of strongest hits
- Statistical methods for control of population stratification (within populations of European ancestry) adequate to assist in discovery of strongest hits
- Will more rigorous designs permit discovery of weaker associations?
- When signal-noise is low, how does noise due to multiple comparisons compare with noise due to poor matching of controls?
- False negatives the biggest problem (can deal with false positives via replication).
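To put a number on the noise from multiple comparisons, the sketch below computes a Bonferroni threshold and the expected count of null SNPs below a given p-value cutoff. The 528,173 SNP count and the cutoff 0.0000168 (the sixth-smallest P value) are taken from the CGEMS breast cancer results reported later in this deck; the function names are hypothetical, and the calculation assumes p-values are uniform under the null.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

def expected_null_hits(n_tests, p_cutoff):
    """Expected number of tests with p <= p_cutoff if no SNP is associated."""
    return n_tests * p_cutoff

N_SNPS = 528_173  # SNPs tested in the CGEMS breast cancer scan
threshold = bonferroni_threshold(N_SNPS)
null_hits = expected_null_hits(N_SNPS, 1.68e-5)
```

Under these assumptions the threshold is below 1 x 10^-7 and roughly nine null SNPs would be expected at or below the cutoff, so rank in the initial scan cannot by itself separate signal from noise. That is the case for carrying many SNPs into replication rather than applying a strict cutoff at the first stage.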
6Criteria for follow-up of initial reports of genotype-phenotype associations
- Replication studies should be of sufficient sample size to convincingly distinguish the proposed effect from no effect
- Replication studies should preferably be conducted in independent data sets, to avoid the tendency to split one well-powered study into two less conclusive ones
- The same or a very similar phenotype should be analysed
- A similar population should be studied, and notable differences between the populations studied in the initial and attempted replication studies should be described
- Similar magnitude of effect and significance should be demonstrated, in the same direction, with the same SNP or a SNP in perfect or very high linkage disequilibrium with the prior SNP (r^2 close to 1.0)
- Statistical significance should first be obtained using the genetic model reported in the initial study
- When possible, a joint or combined analysis should lead to a smaller P-value than that seen in the initial report
- A strong rationale should be provided for selecting SNPs to be replicated from the initial study, including linkage-disequilibrium structure, putative functional data or published literature
- Replication reports should include the same level of detail for study design and analysis plan as reported for the initial study
Chanock, Manolio et al. Nature, June 7th 2007
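The first criterion, sufficient sample size, can be explored with an approximate power calculation. The sketch below uses a normal approximation for a two-sided comparison of allele frequencies between cases and controls and ignores the negligible opposite tail. The function name, the allele frequency of 0.40, and the odds ratio of 1.2 are hypothetical inputs chosen for illustration, not values from the cited paper.

```python
from math import sqrt
from statistics import NormalDist

def allelic_power(n_cases, n_controls, control_freq, odds_ratio, alpha=0.05):
    """Approximate power of a two-sided allelic test (normal approximation)."""
    # Case allele frequency implied by the per-allele odds ratio.
    odds = odds_ratio * control_freq / (1 - control_freq)
    case_freq = odds / (1 + odds)
    # Each person contributes two alleles.
    a, b = 2 * n_cases, 2 * n_controls
    se = sqrt(case_freq * (1 - case_freq) / a
              + control_freq * (1 - control_freq) / b)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_effect = abs(case_freq - control_freq) / se
    return NormalDist().cdf(z_effect - z_alpha)

# Hypothetical scenario: same effect, two candidate replication sizes.
small = allelic_power(1000, 1000, 0.40, 1.2)
large = allelic_power(4000, 4000, 0.40, 1.2)
```

Comparing `small` and `large` shows how quickly power rises with sample size for a modest effect. In practice `alpha` would be set much lower than 0.05 when many SNPs are carried into replication.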
7Initial Study for GWAS: technical issues
- Standard advice: case and control samples handled exactly the same at every stage
- Source of DNA
- Blood/buffy coat: mostly good results
- Buccal cell: variable results (Feigelson et al. CEBP, 2007 - encouraging)
- Whole genome amplified DNA (Affy OK, Illumina in development)
8Replication studies
- For statistical replication, prefer
- Similar phenotype
- Similar ancestry
- For generalizability, prefer
- Different populations
- Different ancestry backgrounds (may also help with fine mapping)
9Study design?
- Prospective
- Protect from survivor bias
- Protect from selection bias
- Interpretability of gene-environment analyses
- Possibility of interpretable biomarkers
10Study quality?
- Importance depends on strength of signal
- To date little apparent relation between probability of replication and quality
- May matter more for weak signals
- Sample size may trump quality (within limits)
11NCI BPC3 Results: 7,909 cases, 8,683 controls
rs1447295: overall p, trend = 4 x 10^-19
Schumacher et al. Can Res, April 2007
12Forest plots of the per-allele odds ratios for each of the five SNPs reaching genome-wide significance for breast cancer.
Panels: a, rs2981582; b, rs3803662; c, rs889312; d, rs13281615; and e, rs3817198. Label on figure: FGFR2.
Easton et al. Nature, May 2007
13Cancer Genetic Markers of Susceptibility (CGEMS)
http://cgems.cancer.gov
14General Strategy for Multistage Analysis of Prostate and Breast Cancer
- Initial GWAS Study: 1150 cases/1150 controls, 540,000 Tag SNPs
- Follow-up Study 1: 4500 cases/4500 controls, 28,000 SNPs
- Follow-up Study 2: 3500 cases/3500 controls, at least 1,500 SNPs
- 30 20 loci
- Fine Mapping
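The arithmetic behind the multistage strategy can be sketched as follows, using the stage sizes listed on this slide. The helper names are hypothetical, "at least 1,500 SNPs" is treated as exactly 1,500, and genotyping effort is counted as subjects times SNPs, which is an assumption made here for illustration.

```python
# (stage name, cases, controls, SNPs), as listed on the slide.
stages = [
    ("Initial GWAS", 1150, 1150, 540_000),
    ("Follow-up Study 1", 4500, 4500, 28_000),
    ("Follow-up Study 2", 3500, 3500, 1_500),  # "at least 1,500" taken as 1,500
]

def carried_forward(stages):
    """Fraction of SNPs passed from each stage to the next."""
    return [nxt[3] / cur[3] for cur, nxt in zip(stages, stages[1:])]

def genotype_calls(stages):
    """SNP genotypes per stage: (cases + controls) x SNPs."""
    return [(cases + controls) * snps for _, cases, controls, snps in stages]

fractions = carried_forward(stages)
calls = genotype_calls(stages)

# Cost of genotyping every subject on the full panel in a single stage.
all_subjects = sum(cases + controls for _, cases, controls, _ in stages)
single_stage_calls = all_subjects * stages[0][3]
```

Each stage carries forward roughly 5% of the SNPs from the previous one, and the staged design needs far fewer genotypes than typing all subjects on the full panel.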
15Committed Studies: CGEMS
- Breast Cancer: NHS (GWAS), PLCO, WHI, Polish C/C, ACS, EPIC, MEC
- Prostate Cancer: PLCO (GWAS), ACS, HPFS, PHS, ATBC, CeRePP, EPIC, MEC
16CGEMS caBIG Posting
- Pre-computed analysis: no restrictions
- Raw genotype, case/control, age (in 5 yrs), family Hx (+/-): registration
http://cgems.cancer.gov/data
17- Association Tests
- Prostate 10/06
- Breast 04/07
- 528,000 SNPs
- Illumina 550k
- Instant
- Replication!
http://cgems.cancer.gov
18Additional In silico replication possibilities
- dbGAP ncbi.nlm.nih.gov/dbgap
- Framingham nhlbi.nih.gov/about/framingham
- WTCCC wtccc.org.uk
- DGI broad.mit.edu/diabetes
19[Figure: genome-wide plot of log10(p-value) by chromosome (1-22 and X, p and q arms), with the FGFR2 locus labeled.]
20The six SNPs with the smallest P values of the 528,173 tested among 1,145 cases of postmenopausal invasive breast cancer and 1,141 controls (full results available at http://cgems.cancer.gov).

SNP ID       χ2      P          ORhet  ORhomo  Chromosome  Gene
rs10510126   25.37   0.0000031  0.59   0.62    10
rs1219648    23.56   0.0000076  1.24   1.81    10          FGFR2
rs17157903   23.39   0.0000083  1.60   0.79    7           RELN
rs2420946    23.17   0.0000095  1.25   1.81    10          FGFR2
rs7696175    22.40   0.0000137  1.38   0.86    4           TLR1,TLR6
rs12505080   21.99   0.0000168  1.21   0.52    4

From analyses adjusting for age, matching factors (see Methods), and three eigenvectors of the principal components identified by Eigenstrat. P value obtained by a score test with 2df.
Hunter et al, Nat Gen, May 2007
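Because the P values above come from a test with 2 degrees of freedom, they can be checked by hand: the upper tail of a chi-squared distribution with 2 df has the closed form exp(-x/2). The sketch below applies that formula to the statistics in the table. The function and variable names are hypothetical, and the score test itself is not reproduced.

```python
import math

def chi2_2df_p(stat):
    """Upper-tail p-value of a chi-squared statistic with 2 degrees of freedom."""
    return math.exp(-stat / 2)

# Chi-squared statistics copied from the table above.
table_stats = {
    "rs10510126": 25.37,
    "rs1219648": 23.56,
    "rs17157903": 23.39,
    "rs2420946": 23.17,
    "rs7696175": 22.40,
    "rs12505080": 21.99,
}
p_values = {snp: chi2_2df_p(stat) for snp, stat in table_stats.items()}
```

The recomputed values land close to the reported P column, though small differences remain for some SNPs.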
21Scatterplot of P values for the FGFR2 locus from
the GWAS.
22Results of associations of rs1219648 in the Nurses' Health Study, Nurses' Health Study 2, and the PLCO study.

Study population (N cases/N controls)  Allele freq. cases (%)  Allele freq. controls (%)  ORhet (95% CI)    ORhomo (95% CI)   Ptrend
Nurses' Health Study (1,145/1,141)     45.54                   38.47                      1.24 (1.04-1.50)  1.81 (1.43-2.31)  2.0 x 10^-6
Nurses' Health Study 2 (302/594)       48.18                   40.57                      1.29 (0.95-1.75)  1.93 (1.31-2.86)  0.002
PLCO (919/922)                         44.50                   41.49                      1.06 (0.86-1.30)  1.22 (0.94-1.58)  0.13
ACS CPS-II (555/556)                   44.95                   37.41                      1.32 (1.02-1.72)  2.06 (1.42-2.97)  0.0002
Pooled estimates (2,921/3,213)                                                            1.20 (1.07-1.34)  1.64 (1.42-1.90)  1.1 x 10^-10
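The slide does not say how the pooled estimates were obtained. One standard way to combine per-study odds ratios is fixed-effect inverse-variance weighting on the log scale, sketched below with the heterozygote odds ratios and confidence intervals from the table. The function names are hypothetical, and recovering standard errors from the rounded confidence limits is an approximation.

```python
import math

Z95 = 1.959964  # standard normal quantile for a 95% interval

def log_or_and_se(odds_ratio, lower, upper):
    """Log odds ratio and its standard error recovered from a 95% CI."""
    return math.log(odds_ratio), (math.log(upper) - math.log(lower)) / (2 * Z95)

def fixed_effect_pool(estimates):
    """Inverse-variance weighted pooled OR with its 95% CI."""
    pairs = [log_or_and_se(*e) for e in estimates]
    weights = [1 / se ** 2 for _, se in pairs]
    pooled = sum(w * b for w, (b, _) in zip(weights, pairs)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return tuple(math.exp(x) for x in (pooled, pooled - Z95 * se, pooled + Z95 * se))

# ORhet (95% CI) for NHS, NHS2, PLCO, and ACS CPS-II, from the table.
or_het = [
    (1.24, 1.04, 1.50),
    (1.29, 0.95, 1.75),
    (1.06, 0.86, 1.30),
    (1.32, 1.02, 1.72),
]
pooled_or, ci_low, ci_high = fixed_effect_pool(or_het)
```

Under this approach the pooled heterozygote odds ratio comes out close to the reported 1.20 (1.07-1.34). Exact agreement is not expected, since the published pooled estimate may have been computed from individual-level data.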
23UNFINISHED AGENDA
- Where is the causal variant?
- What does this tell us about mechanisms of breast carcinogenesis?
24THE HITS KEEP COMING.
UNFINISHED EPIDEMIOLOGIC/PUBLIC HEALTH AGENDA
- Gene-environment interaction: what do the genes tell us about environmental exposures?
- Gene-gene interaction
- Pathway analysis
- Clinical implications: risk stratification for screening? Intervention?
- Health policy implications?
Much of the substrate data publicly available or relatively cheap.
25NHS/HPFS/PHS GENETIC STUDIES
NHS/HPFS: Sue Hankinson, Shelley Tworoger, Eric Rimm, Frank Hu, Meir Stampfer, Walt Willett, Frank Speizer, Charles Fuchs, Ed Giovannucci, Andy Chan, Debra Schaumberg, Fran Grodstein, Jae Hee Kang
PHS: Jing Ma, Mike Gaziano, P Ridker
Immaculata De Vivo, Peter Kraft, Hardeep Ranu, Crystal Arnone, Carolyn Guo, Pati Soule, Craig Labadie, Jiali Han, Monica Macgrath, Chunyan He, Patrick Dennett, David Cox, Tim Niu, Aditi Hazra, Fred Schumacher
26 Harvard cohorts
ACS cohort
EPIC cohorts
Multiethnic Cohort
PLCO cohort
ATBC cohort
BROAD INSTITUTE
NCI Core Gen Facility
CEPH
NCI BPC3 STEERING COMMITTEE
- Harvard: David Hunter, Michael Gaziano, Julie Buring, Graham Colditz, Walter Willett
- EPIC, CEPH, Cambridge: Elio Riboli, Rudolf Kaaks, Federico Canzian, Gilles Thomas
- ACS: Michael Thun, Heather Feigelson, Jeanne Calle
- NCI: Richard Hayes, Demetrius Albanes, Bob Hoover, Stephen Chanock; Program - Mukesh Verma
- MEC, Broad: Brian Henderson, Laurence Kolonel, David Altshuler, Malcolm Pike
SECRETARIAT: David Hunter, Elio Riboli
GENOMICS subgroup: David Altshuler (Chair), Steve Chanock, Gilles Thomas
STATISTICS subgroup: Dan Stram (Chair), Peter Kraft, Rudolf Kaaks, Paul Pharoah, Malcolm Pike, Gilles Thomas, Sholom Wacholder
Genotyping subgroup: Chris Haiman (Chair), Federico Canzian, Alison Dunning, Steve Chanock, David Cox, David Hunter, Loic LeMarchand, James Mackay
PUBLICATIONS COMMITTEE: Michael Thun (Chair), Elio Riboli, Brian Henderson, David Hunter, Graham Colditz, Richard Hayes, Demetrius Albanes
27CGEMS Acknowledgements
HSPH: David Hunter, Peter Kraft, Fred Schumacher, David Cox
ACS: Heather Feigelson, Carmen Rodriguez, Eugenia Calle, Michael Thun
PLCO: Regina Ziegler, Chris Berg, Saundra Buys, Chris MacCarty
- NCI
- Stephen Chanock
- Gilles Thomas
- Robert Hoover
- Joseph Fraumeni
- Daniela Gerhard
- Kevin Jacobs
- Zhaoming Wang
- Meredith Yeager
- Robert Welch
- Richard Hayes
- Sholom Wacholder
- Nilanjan Chatterjee
- Kai Yu
- Margaret Tucker
- Marianne Rivera-Silva
- NCICB
29Selecting initial and replication samples from existing studies
I. What studies of the same phenotype exist?
II. Can a consortium or collaborative approach provide a study with adequate power for the initial GWAS, along with pre-planned replication studies?
III. Do any of these studies have pre-existing data that would increase power, e.g. free controls from a prior GWAS of another phenotype?
IV. Is the phenotype defined in the same or similar manner?
V. Are covariate data available, and defined similarly?
VI. Do any of the studies have additional phenotypic information, e.g. biomarkers, that would create opportunities for added value analyses, if these are the subjects of the GWAS?