Title: UK Biobank and biobank harmonization
1 UK Biobank and biobank harmonization
- Paul Burton
- Dept of Health Sciences
- Dept of Genetics
- University of Leicester
2 Structure of talk
- What is UK Biobank?
- Scientific rationale?
- Statistical power of nested case-control studies
- Expected event rates in UK Biobank
- Biobank Harmonization
- Conclusions
3 What is UK Biobank?
4 Basic design features
- A prospective cohort study
- 500,000 adults across the UK
- Middle-aged (40-69 years)
- A population-based biobank
- Not disease- or exposure-based
- Recruitment via electronic GP lists
- Broad spectrum, but not fully representative
- Individuals, not families
- Funded by MRC, Wellcome Trust, DH, Scottish Executive
- £61M
5 Basic design features
- Longitudinal health tracking
- Nested case-control studies
- Long time-horizon
- Owned by the Nation
- Central administration in Manchester
- PI: Prof Rory Collins (Oxford)
- 6 collaborating groups of university scientists
6 The Fosse Way
[Map: UK Biobank Fosse Way Regional Collaboration Centre, showing local collection centres: B Birmingham, E Exeter, L Leicester, N Nottingham, P Plymouth, S Sheffield, T Truro, W Warwick]
7 Scientific Rationale
8 Justification for UK Biobank
- Primary justifications
- Roles that can best be fulfilled by a new large cohort study of the type represented by UK Biobank
- Secondary justifications
- Roles that could be provided by other types of study but, given that UK Biobank is to go ahead anyway, can be taken on at relatively low marginal cost
9 A platform for research in biomedical science
- Studies of the joint effects of genes and environment/lifestyle
- Genotype-based studies
- The genetics of disease progression
- Direct association of genes with disease
- Universal controls
- Family-based studies
10 Statistical power and sample size
11 Issues that are often ignored in standard power calculations
- Multiple testing/low prior probability of association
- Interactions
- Unobserved frailty
- Misclassification (genotype, environmental determinant, case-control status)
- Subgroup analyses
- Population substructure
12 Power calculations
- Work with the least powerful setting
- Binary disease, binary genotype, binary environmental exposure
- Logistic regression: interactions as departures from a multiplicative model
- Realistic bioclinical complexity
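In the least powerful setting described above, the analysis model can be sketched as follows. The notation is ours, not taken from the slides. $D$, $G$ and $E$ are each coded 0/1:

$$
\operatorname{logit} \Pr(D = 1 \mid G, E) = \beta_0 + \beta_G\, G + \beta_E\, E + \beta_{GE}\, G E
$$

Here $\mathrm{OR}_G = e^{\beta_G}$ and $\mathrm{OR}_E = e^{\beta_E}$ are the main-effect odds ratios. The odds ratio for $G = E = 1$ versus $G = E = 0$ is $\mathrm{OR}_G\,\mathrm{OR}_E\,e^{\beta_{GE}}$. A gene-environment interaction is therefore a departure from the multiplicative model, $\mathrm{OR}_{GE} = e^{\beta_{GE}} \ne 1$.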
13 Summarise power using MDORs (minimum detectable odds ratios) calculated by iterative simulation
- Estimate the minimum ORs detectable with 80% power at a stated level of statistical significance under a specified scenario
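Written out in our own notation (an illustrative restatement, not a formula from the slides), the quantity summarised here is:

$$
\mathrm{MDOR} = \min\left\{ \mathrm{OR} > 1 \;:\; \Pr\!\left(p < \alpha \mid \mathrm{OR},\ n_{\mathrm{cases}},\ n_{\mathrm{controls}},\ \text{scenario}\right) \ge 0.8 \right\}
$$

Here $\alpha$ is the stated significance level. "Scenario" covers genotype and exposure frequencies, misclassification and the other complexities listed earlier.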
14 Whole genome scan
- Genetic main effect, p < 10⁻⁷
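The slide does not state how the genome-wide threshold was chosen. One common rationale, assumed here purely for illustration, is a Bonferroni correction of a 5% family-wise error rate across roughly 500,000 independent tests:

$$
\frac{0.05}{5 \times 10^{5}} = 10^{-7}
$$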
15 Summary
- 80% power, genotype frequency 0.1 (allele frequency ≈ 0.05 under a dominant model):
- Genetic main effect OR 1.5, p < 10⁻⁴ → 5,000 cases
- Genetic main effect OR 1.3, p < 10⁻⁴ → 10,000 cases
- Genetic main effect OR 1.2, p < 10⁻⁴ → 20,000 cases
- Genetic main effect OR 1.4, p < 10⁻⁷ → 10,000 cases
- Genetic main effect OR 1.3, p < 10⁻⁷ → 20,000 cases (allele frequency 0.1 → 10,000 cases)
- G×E interaction with environmental exposure prevalence 0.2: OR 2.0, p < 10⁻⁴ → 20,000 cases
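The link between "genotype frequency 0.1" and "allele frequency ≈ 0.05 under a dominant model" follows from Hardy-Weinberg proportions. This is an assumption, since the slide does not state it. Under a dominant model the "exposed" genotype class is anyone carrying at least one risk allele, with frequency $1-(1-q)^2$ for allele frequency $q$. The minimal Python check below uses hypothetical helper names. An allele frequency of 0.1 gives a carrier frequency of 0.19, which would be consistent with fewer cases being needed in that case.

```python
import math


def carrier_frequency(allele_freq: float) -> float:
    """Frequency of carriers of at least one risk allele under Hardy-Weinberg
    equilibrium, i.e. the 'exposed' genotype class under a dominant model."""
    return 1.0 - (1.0 - allele_freq) ** 2


def allele_frequency_for(carrier_freq: float) -> float:
    """Inverse of carrier_frequency: allele frequency giving that carrier frequency."""
    return 1.0 - math.sqrt(1.0 - carrier_freq)


print(round(carrier_frequency(0.05), 4))    # 0.0975, roughly 0.1
print(round(allele_frequency_for(0.1), 4))  # 0.0513, roughly 0.05
print(round(carrier_frequency(0.1), 4))     # 0.19
```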
16 Expected event rates in UK Biobank
17 Taking account of:
- Age range at recruitment 40-69 years
- Recruitment over 5 years
- All cause mortality
- Disease incidence (healthy cohort effect)
- Migration overseas
- Comprehensive withdrawal (max 1/500 p.a.)
- Partial withdrawal (cf. 1958 Birth Cohort)
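Each factor listed above reduces the number of events that accrue. The Python sketch below shows, in a deliberately simplified way, how several of them combine in a projection. It assumes constant annual rates with no age dependence, recruitment spread evenly over 5 years, and that partial withdrawal is ignored. The function name and every rate except the 1/500 withdrawal ceiling are hypothetical placeholders, not UK Biobank figures.

```python
def expected_cases(
    cohort_size: int,
    horizon_years: int,
    incidence: float,
    mortality: float,
    emigration: float,
    withdrawal: float = 1 / 500,
    recruitment_years: int = 5,
    healthy_cohort_factor: float = 1.0,
) -> float:
    """Expected number of first disease events by the end of `horizon_years`,
    counted from the start of recruitment (HYPOTHETICAL model and inputs).

    All rates are constant annual probabilities. `healthy_cohort_factor`
    (< 1) scales population incidence down for healthier-than-average
    volunteers. Age at recruitment and partial withdrawal are ignored.
    """
    per_wave = cohort_size / recruitment_years
    annual_incidence = incidence * healthy_cohort_factor
    other_exits = mortality + emigration + withdrawal
    total = 0.0
    for wave in range(recruitment_years):
        at_risk = per_wave
        # People recruited in year `wave` are followed for (horizon - wave) years.
        for _ in range(max(horizon_years - wave, 0)):
            new_cases = at_risk * annual_incidence
            total += new_cases
            # First events leave the risk set, as do deaths, emigrants and withdrawals.
            at_risk -= new_cases + at_risk * other_exits
    return total


# Illustration only: every rate below is invented for the example.
print(round(expected_cases(500_000, 20, incidence=0.0005, mortality=0.01,
                           emigration=0.002, healthy_cohort_factor=0.8)))
```

Real projections would use age- and sex-specific incidence and mortality for the 40-69 recruitment range. Constant rates are used here only to keep the structure visible.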
18 No need to contact subjects
19 Smaller sample sizes
20 Conclusions
- Having taken account of realistic bioclinical complexity, UK Biobank is just about large enough to be of great value as a stand-alone research infrastructure
- Its value will be greatly augmented if it proves possible to set up a coherent and scientifically harmonized international network of biobanks and large cohort studies
21 Harmonizing biobanks internationally
22 Why harmonize?
- Investigate less common (but not rare) conditions
- UKBB stomach cancer: 2,500 cases in 29 years
- 6 UKBB equivalents → 10,000 cases in 20 years
- Investigate smaller ORs
- Genetic main effect OR 1.5 → 1.2 requires 5,000 → 20,000 cases
- 4 UKBB equivalents
- Analyses based on subsets: homogeneous classes of phenotype, or e.g. by sex
23 Why harmonize?
- Earlier analyses
- UKBB Alzheimer's disease: 10,000 cases in 18 years
- 5 UKBB equivalents → 9 years
- Events at younger ages
- Broad range of environmental exposures
- Aim for 4-6 UKBB equivalents
- 2M-3M recruits
25 Harmonization initiatives
- Population Biobanks
- FP6 Co-ordination Action
- Camilla Stoltenberg, Paul Burton, Leena Peltonen, George Davey Smith …
- GenomeEUhealth
- Proposed FP6 Integrated Project
- Leena Peltonen …
- Public Population Project in Genomics (P3G)
- Canada-Europe
- Tom Hudson, Bartha Knoppers …
26 Extra slides
27 Genetic main effects
28 Gene-environment interaction
29 Rarer genotypes
30 Necessary to contact subjects
32 Summarise power using MDORs calculated by iterative simulation
- Aim: the minimum ORs detectable with 80% power at a stated level of statistical significance
- 1. Guess starting values for the ORs
- 2. Simulate a population under the specified scenario
- 3. Sample the required number of cases and controls
- 4. Analyse the resulting case-control study in the standard way
- 5. Repeat steps 2-4 1,000 times
- 6. Use the empirical statistical power from the 1,000 analyses to update the ORs to new values expected to give 80% power
- Repeat steps 2-6 until all ORs have 80% power
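The steps above can be sketched in Python for a single odds ratio (a genetic main effect with a binary genotype), assuming NumPy is available. The function names are hypothetical. The sketch omits misclassification, frailty and the other complexities listed earlier, so it will not reproduce the MDORs on slide 15.

Two simplifications are ours, not the slides':
- Genotype counts are drawn directly from their case and control distributions rather than from an explicitly simulated population. This is close to equivalent for a rare disease with controls drawn from non-cases.
- Step 6 uses a normal-approximation rescaling of log(OR), which is only one possible update rule.

```python
import numpy as np
from statistics import NormalDist


def empirical_power(odds_ratio, n_cases, n_controls, genotype_freq, alpha,
                    n_reps, rng):
    """Steps 2-5: simulate n_reps case-control studies at a given OR and
    return the proportion in which the genotype effect is significant."""
    # Carrier frequency in cases implied by the OR and the control frequency.
    control_odds = genotype_freq / (1 - genotype_freq)
    case_freq = odds_ratio * control_odds / (1 + odds_ratio * control_odds)
    # Number of genotype carriers among the sampled cases and controls.
    x_cases = rng.binomial(n_cases, case_freq, size=n_reps)
    x_controls = rng.binomial(n_controls, genotype_freq, size=n_reps)
    # 2x2 table with a 0.5 continuity correction in every cell.
    a, b = x_cases + 0.5, n_cases - x_cases + 0.5
    c, d = x_controls + 0.5, n_controls - x_controls + 0.5
    # Wald test of log(OR); for one binary covariate this matches the
    # logistic-regression Wald test (apart from the continuity correction).
    log_or = np.log(a * d / (b * c))
    se = np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return float(np.mean(np.abs(log_or / se) > z_crit))


def minimum_detectable_or(n_cases, n_controls, genotype_freq, alpha,
                          target_power=0.8, start_or=1.5, n_reps=1000,
                          tol=0.025, max_iter=50, seed=1):
    """Steps 1 and 6: adjust the OR until simulated power is near target."""
    rng = np.random.default_rng(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z_target = NormalDist().inv_cdf(target_power)
    odds_ratio = start_or  # step 1: starting guess
    for _ in range(max_iter):
        power = empirical_power(odds_ratio, n_cases, n_controls,
                                genotype_freq, alpha, n_reps, rng)
        if abs(power - target_power) < tol:
            break
        # Step 6 (assumed update rule): the expected Wald z grows roughly in
        # proportion to log(OR), so rescale log(OR) by
        # (z needed for target power) / (z implied by observed power).
        clipped = min(max(power, 0.01), 0.99)
        z_observed = max(z_crit + NormalDist().inv_cdf(clipped), 0.5)
        odds_ratio = float(np.exp(np.log(odds_ratio) * (z_crit + z_target)
                                  / z_observed))
    return odds_ratio


print(minimum_detectable_or(5000, 5000, genotype_freq=0.1, alpha=1e-4))
```

For example, the call above searches for the OR detectable with 80% power given 5,000 cases, 5,000 controls (a 1:1 ratio assumed here), a genotype frequency of 0.1 and α = 10⁻⁴. Handling several ORs at once, as in "until all ORs have 80% power", would need a richer scenario and a multivariable analysis.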
33 Proposed assessment visit model
34 Hattersley AT, McCarthy MI. A question of standards: what makes a good genetic association study? Lancet 2005 (in press).