Title: Principles of Epidemiology Lecture 8: Case-Control Studies (I)
Principles of Epidemiology
Lecture 8: Case-Control Studies (I)
- Wei J. Chen, MD, ScD
- Institute of Epidemiology
- College of Public Health
- National Taiwan University
Outline
- I. Rationale
  - I-1. Sampling from a fixed cohort
  - I-2. Sampling from a dynamic cohort (density sampling)
- II. Source of controls
- III. Comparability
- IV. Variants of the case-control design
I. Rationale
Basic Elements of Case-Control Studies
- Nature
  - Select cases and controls
  - Assess their exposure experience retrospectively
- Estimates
  - Odds
    - Odds of exposure in cases: $\Pr(E^+ \mid D^+) / \Pr(E^- \mid D^+)$
    - Odds of exposure in controls: $\Pr(E^+ \mid D^-) / \Pr(E^- \mid D^-)$
  - Odds ratio
    - $OR = \dfrac{\Pr(E^+ \mid D^+) / \Pr(E^- \mid D^+)}{\Pr(E^+ \mid D^-) / \Pr(E^- \mid D^-)}$
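In practice these probabilities are estimated from a 2×2 table of counts, where $\Pr(E^+ \mid D^+)/\Pr(E^- \mid D^+)$ reduces to exposed cases divided by unexposed cases. A minimal Python sketch; the function names and the counts are hypothetical, chosen only for illustration:

```python
def exposure_odds(exposed: float, unexposed: float) -> float:
    """Odds of exposure within one group, e.g. Pr(E+|D+) / Pr(E-|D+) = a / b for cases."""
    return exposed / unexposed


def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """Exposure odds ratio from a 2x2 table of counts.

    a = exposed cases,    b = unexposed cases
    c = exposed controls, d = unexposed controls
    """
    return exposure_odds(a, b) / exposure_odds(c, d)


# Hypothetical counts: 30 of 100 cases and 15 of 100 controls were exposed.
print(round(odds_ratio(30, 70, 15, 85), 2))  # 2.43, i.e. (30 * 85) / (70 * 15)
```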
Rationales
- Early
  - Rare disease assumption
  - Sampling from a fixed cohort
- Recent
  - Sampling from a dynamic cohort
  - Density sampling, or sampling from the person-time pool
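I-1. Sampling from a Fixed Cohort
- Cumulative incidence ratio (CIR), with $X$ = cases, $N$ = persons at risk, and $Z = N - X$ = noncases (subscript 1 = exposed, 0 = unexposed)
  - $CIR = \dfrac{X_1/N_1}{X_0/N_0} = \dfrac{X_1/X_0}{N_1/N_0}$
  - If $X_1 \ll N_1$ and $X_0 \ll N_0$, then $Z_1 \approx N_1$ and $Z_0 \approx N_0$
  - $CIR \approx \dfrac{X_1/X_0}{Z_1/Z_0}$ = exposure odds among cases / exposure odds among noncases
- Noncases sampled from the population of noncases
  - $f$ = sampling fraction among noncases, giving $Y_1 = fZ_1$ and $Y_0 = fZ_0$ sampled controls
  - $CIR \approx \dfrac{X_1/X_0}{fZ_1/fZ_0} = \dfrac{X_1/X_0}{Y_1/Y_0}$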
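The approximation $Z_1 \approx N_1$, $Z_0 \approx N_0$ can be checked numerically. The sketch below uses expected counts (no random sampling) for a hypothetical cohort in which the exposed have twice the cumulative incidence of the unexposed; it shows the odds ratio drifting away from the CIR as the disease becomes common, and the sampling fraction $f$ cancelling. The function name and all inputs are illustrative assumptions.

```python
def fixed_cohort_measures(n1, n0, ci1, ci0, f=1.0):
    """Return (CIR, OR) for a fixed cohort using expected counts.

    n1, n0   : persons at risk, exposed / unexposed
    ci1, ci0 : cumulative incidence, exposed / unexposed
    f        : sampling fraction applied equally to exposed and unexposed noncases
    """
    x1, x0 = n1 * ci1, n0 * ci0      # cases
    z1, z0 = n1 - x1, n0 - x0        # noncases
    y1, y0 = f * z1, f * z0          # sampled noncases (controls)
    cir = (x1 / n1) / (x0 / n0)
    odds_ratio = (x1 / x0) / (y1 / y0)
    return cir, odds_ratio


# CIR is 2 in every row; the OR is close to 2 only while the disease is rare.
for ci0 in (0.001, 0.01, 0.1, 0.3):
    cir, or_ = fixed_cohort_measures(3000, 7000, 2 * ci0, ci0)
    print(f"CI0={ci0:<6} CIR={cir:.3f} OR={or_:.3f}")
```

Changing `f` leaves the OR untouched because it multiplies both $Z_1$ and $Z_0$.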
Example of Controls Selected from a Fixed Cohort
- Breslow and Day (1980)
- A true cohort
  - $N = 10{,}000$, proportion exposed = 30%
  - IR (exposed) = 0.02/year; IR (unexposed) = 0.01/year
  - Followed for 3 years ($D = 3$)
- Exposed cases $= 3000 \times (1 - e^{-IR \times D}) = 3000 \times (1 - e^{-0.06}) \approx 175$
- Unexposed cases $= 7000 \times (1 - e^{-0.03}) \approx 207$
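The arithmetic of this example can be reproduced with expected counts, which also makes it easy to compare the CIR, the exposure OR, and the IRR. A minimal sketch; variable names are illustrative and the computation assumes constant rates over the 3 years, as the formula above does:

```python
import math

# Cohort of the Breslow and Day (1980) example; expected counts, no random error.
n1, n0 = 3000, 7000          # 30% of N = 10,000 exposed
ir1, ir0 = 0.02, 0.01        # incidence rates per person-year
years = 3                    # follow-up duration D

x1 = n1 * (1 - math.exp(-ir1 * years))   # expected exposed cases
x0 = n0 * (1 - math.exp(-ir0 * years))   # expected unexposed cases
z1, z0 = n1 - x1, n0 - x0                # expected noncases

cir = (x1 / n1) / (x0 / n0)              # cumulative incidence ratio
or_all_noncases = (x1 / x0) / (z1 / z0)  # exposure OR using every noncase as a control
irr = ir1 / ir0                          # incidence rate ratio

print(round(x1), round(x0))              # 175 207
print(f"CIR={cir:.3f}  OR={or_all_noncases:.3f}  IRR={irr:.1f}")  # CIR=1.970  OR=2.030  IRR=2.0
```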
(1) Analyzed as a Case-control Study
- Calculate the OR instead of the IRR
(2) A Case-Control Study
- All patients were ascertained as cases
- Along with a 10% sample of controls
- The sampling fraction for cases, and separately for controls, must be the same regardless of exposure category
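The requirement on sampling fractions can be illustrated with the expected counts of the Breslow and Day example. In this sketch (the function name and the 20% figure are hypothetical), keeping every case and sampling 10% of noncases in both exposure groups leaves the OR unchanged, whereas oversampling exposed noncases biases it by the factor $f_1/f_0$:

```python
import math

# Expected counts for the Breslow and Day cohort (no random error).
n1, n0 = 3000, 7000
x1 = n1 * (1 - math.exp(-0.06))   # exposed cases, all ascertained
x0 = n0 * (1 - math.exp(-0.03))   # unexposed cases, all ascertained
z1, z0 = n1 - x1, n0 - x0         # noncases available as controls


def sampled_or(f_exposed: float, f_unexposed: float) -> float:
    """Expected OR when every case is kept and noncases are sampled
    with (possibly exposure-specific) sampling fractions."""
    y1, y0 = f_exposed * z1, f_unexposed * z0
    return (x1 / x0) / (y1 / y0)


full = sampled_or(1.0, 1.0)         # all noncases as controls
ten_percent = sampled_or(0.1, 0.1)  # 10% sample, same fraction in both exposure groups
unequal = sampled_or(0.2, 0.1)      # exposed noncases oversampled: OR halved
print(f"{full:.3f} {ten_percent:.3f} {unequal:.3f}")  # 2.030 2.030 1.015
```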
I-2. Density Sampling in Control Selection
- Density case-control study design
  - $I_1 = A_1/T_1$, $I_0 = A_0/T_0$ ($A$ = cases, $T$ = person-time)
- Goal of the case-control design
  - Use a control series in place of complete assessment of $T_1$ and $T_0$
  - Density sampling: controls are selected in such a way that the relative sizes of $T_1$ and $T_0$ can be validly estimated
- Nested within a source population
  - The description of the source population corresponds to the ideal eligibility criteria for both cases and controls to be in the study
Pseudo-Rates and Odds Ratio
- Goal of control sampling
  - The exposure distribution among controls is the same as it is in the source population of cases
- Control sampling rate
  - $B_1/T_1 = B_0/T_0 = r$ if controls ($B$) are selected independently of exposure
- Pseudo-rates $A_1/B_1$ and $A_0/B_0$
  - $\dfrac{A_1/B_1}{A_0/B_0} = \dfrac{A_1/(rT_1)}{A_0/(rT_0)} = \dfrac{A_1/T_1}{A_0/T_0}$, because $r$ cancels
  - The ratio of pseudo-rates is therefore an estimate of the IR ratio
- Penalty
  - Loss of precision
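Density sampling can be mimicked by drawing each control as a random moment from the pooled person-time, so that it is exposed with probability $T_1/(T_1+T_0)$. The sketch below uses a hypothetical source population and a deliberately large number of controls so that the agreement between the pseudo-rate ratio and the IRR is visible despite sampling error; the function name and all numbers are assumptions.

```python
import random


def density_sample(t1: float, t0: float, n_controls: int, seed: int = 0):
    """Draw controls as random moments from the pooled person-time T1 + T0.

    Each control is exposed with probability T1 / (T1 + T0), so the expected
    control counts are B1 = r*T1 and B0 = r*T0 with r = n_controls / (T1 + T0).
    """
    rng = random.Random(seed)
    p_exposed = t1 / (t1 + t0)
    b1 = sum(rng.random() < p_exposed for _ in range(n_controls))
    return b1, n_controls - b1


# Hypothetical source population
A1, A0 = 60, 90               # cases, exposed / unexposed
T1, T0 = 20_000.0, 60_000.0   # person-years, exposed / unexposed

irr = (A1 / T1) / (A0 / T0)                   # true incidence rate ratio (2.0)
b1, b0 = density_sample(T1, T0, n_controls=20_000)
pseudo_rate_ratio = (A1 / b1) / (A0 / b0)     # odds ratio from the case-control data
print(f"IRR={irr:.2f}  pseudo-rate ratio={pseudo_rate_ratio:.2f}")
```

With fewer controls the estimate scatters more widely around the IRR, which is the precision penalty noted above.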
Features of Density Sampling
- A clear definition of the source population is needed
- Sampling of controls and cases should be independent of exposure
- Main advantages
  - Easy to see the equivalence of the odds ratio to the IR ratio
  - No rare disease assumption needed
A Hypothetical Scenario for Sampling from the Person-time Pool
1. Select a date at random from the case accrual period
2. Select a person at random from the population list
3. Was the subject resident within the predetermined area as of the random date chosen? If not, discard the draw
4. Repeat steps 1-3 until the desired number of controls is reached
- Reference point for asking exposure information
  - Cases: onset of illness
  - Controls: the random point in time
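The scenario above can be written as a short rejection-sampling loop. In this sketch the `Person` record, the residence intervals, and the two-person population are hypothetical; it shows that a person resident for the whole accrual period is selected about twice as often as one resident for half of it, i.e., selection is proportional to person-time.

```python
import random
from dataclasses import dataclass


@dataclass
class Person:
    pid: str
    resident_from: float  # first day of residence in the study area
    resident_to: float    # day residence ended (exclusive)


def sample_person_time_controls(population, accrual_start, accrual_end,
                                n_controls, seed=0):
    """Return (pid, reference_day) pairs following steps 1-4 of the scenario."""
    rng = random.Random(seed)
    controls = []
    while len(controls) < n_controls:                    # step 4: repeat
        day = rng.uniform(accrual_start, accrual_end)    # step 1: random date
        person = rng.choice(population)                  # step 2: random person
        if person.resident_from <= day < person.resident_to:  # step 3: resident then?
            # The random date becomes the control's reference point for exposure questions.
            controls.append((person.pid, day))
    return controls


# Hypothetical population list: "A" lived in the area all year, "B" only half of it.
population = [Person("A", 0, 365), Person("B", 0, 182.5)]
controls = sample_person_time_controls(population, 0, 365, n_controls=20_000)
n_a = sum(pid == "A" for pid, _ in controls)
print(n_a / (len(controls) - n_a))  # close to 2
```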
Guidelines of Density Sampling in Control Selection
- From the same population that gives rise to the cases
- Independent of exposure status
- Probability of selection proportional to person-time
- Risk-set sampling
  - The eligible time for a control is the time when one is eligible to become a case
  - Risk set: the set of individuals in the source population who are at risk of becoming a case at the time that the case is diagnosed
  - Controls are matched to the case with respect to the time of sampling
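Risk-set sampling can be sketched as follows: for each case, the risk set is everyone who has entered follow-up and has not yet exited at the case's diagnosis time, and controls are drawn from it without regard to exposure. The `Subject` record, the function names, and the five-person cohort are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Subject:
    sid: int
    entry: float    # time the subject becomes eligible to be a case
    exit: float     # time of diagnosis, death, or end of follow-up
    is_case: bool   # True if exit is caused by becoming a case


def risk_set(cohort, t, case_sid):
    """Members of the source population still at risk at time t, excluding the case."""
    return [s for s in cohort if s.entry <= t <= s.exit and s.sid != case_sid]


def risk_set_sampling(cohort, m, seed=0):
    """For each case (in order of diagnosis), sample up to m controls from its risk set.

    Sampling is independent of exposure; a subject may be drawn for several
    cases or may later become a case.
    """
    rng = random.Random(seed)
    matched = []
    for case in sorted((s for s in cohort if s.is_case), key=lambda s: s.exit):
        pool = risk_set(cohort, case.exit, case.sid)
        controls = rng.sample(pool, min(m, len(pool)))
        matched.append((case.sid, sorted(c.sid for c in controls)))
    return matched


# Hypothetical cohort (times in years since start of follow-up)
cohort = [
    Subject(1, entry=0, exit=5, is_case=True),
    Subject(2, entry=0, exit=10, is_case=True),   # control for case 1, then a case
    Subject(3, entry=0, exit=10, is_case=False),
    Subject(4, entry=6, exit=10, is_case=False),  # not yet eligible at t = 5
    Subject(5, entry=0, exit=3, is_case=False),   # left follow-up before t = 5
]
print(risk_set_sampling(cohort, m=2))  # [(1, [2, 3]), (2, [3, 4])]
```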
Special Situations for Control Selection
- An individual selected as a control who later develops the disease and is selected as a case
  - Counted both as a control and as a case
- The same person may appear in the control group two or more times
  - The same person at different times may provide different exposure (or confounder) information
Previous Guidelines on the Selection of Controls
- Schlesselman (1982)
  - The control series is intended to provide an estimate of the exposure rate that would be expected to occur in the cases if there were no association between the study disease and exposure
- Miettinen (1976)
  - The controls should be selected in an unbiased manner from those individuals who would have been included in the case series, had they developed the disease under study
II. Source of Controls
Source of Control Series
- Population controls
  - Cases are a representative sample of all cases in a precisely defined and identified population
  - Controls
    - A random sample from a registry
    - Selection probability is proportional to the individual's person-time at risk
- Neighborhood controls
  - Controls are matched to the cases on neighborhood
  - Because neighborhood may be related to exposure, it should be accounted for in the analysis
- Random digit dialing
  - Matched to cases on area code and prefix
Source of Control Series (cont.)
- Hospital- or clinic-based controls
  - The source population is often not identifiable
  - Control selection
    - Limit the diagnoses for controls to those not related to the exposure of interest
- Other diseases
  - In populations with established registries or insurance-claims databases
- Friend controls
  - May be related to the case in exposure
  - Provision of the list depends on the cases
  - Overlapping (the same friend may be named by more than one case)
- Dead controls
  - Used when cases are dead and information comes from proxy respondents
Methods for Obtaining Population-based Controls
- Random Digit Dialing (RDD)
  - A two-stage sampling method to minimize the chances of calling telephone numbers that are not assigned to households (Waksberg, 1978)
  - Any household with k > 1 residential telephone numbers was subsampled with probability 1/k
  - Screener question: "How many people living in this household (including yourself) are X to Y years old?"
  - After enumeration, select a sample randomly (Kish's selection tables)
- Area Probability Sampling (APS)
  - Typically multi-stage
    - Block groups
    - Segments (one or more blocks)
    - Listing of housing units
    - Random sample of housing units
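The 1/k subsampling rule can be checked by simulation. In the sketch below (function name and household mix are hypothetical, and non-residential numbers are ignored), a household with three lines is reached three times as often by random dialing, and retaining it with probability 1/3 restores equal selection probabilities across households.

```python
import random
from collections import Counter


def rdd_household_draws(households, n_draws, subsample=True, seed=0):
    """Count how often each household is retained.

    households: mapping household id -> k, its number of residential lines.
    Each draw dials one residential line at random, so a household with k
    lines is reached k times as often. With subsample=True it is then kept
    with probability 1/k.
    """
    rng = random.Random(seed)
    lines = [hh for hh, k in households.items() for _ in range(k)]
    kept = Counter()
    for _ in range(n_draws):
        hh = rng.choice(lines)
        if not subsample or rng.random() < 1 / households[hh]:
            kept[hh] += 1
    return kept


households = {"one_line": 1, "three_lines": 3}   # hypothetical
with_sub = rdd_household_draws(households, 40_000)
without_sub = rdd_household_draws(households, 40_000, subsample=False)
print(with_sub["three_lines"] / with_sub["one_line"])        # close to 1
print(without_sub["three_lines"] / without_sub["one_line"])  # close to 3
```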
Steps for RDD
1. Obtain a list of all telephone area codes and existing prefix numbers (first 6 digits)
2. Add all possible choices for the next two digits; these 8-digit numbers serve as Primary Sampling Units (PSUs)
3. Randomly select an 8-digit number and also randomly select the final 2 digits
4. Dial the number
   - If it is residential, select additional final 2 digits until the desired number k of additional residential numbers is reached; conduct interviews on k + 1 numbers
   - If it is not residential, reject the PSU
5. Repeat steps 1-4 until the desired number of PSUs, m, is reached
- Total sample size = m(k + 1); m and k are chosen to satisfy criteria for an optimal sampling design
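The two-stage procedure above (often called the Mitofsky-Waksberg method) can be sketched on a toy sampling frame. Everything here is hypothetical: the frame is a dictionary recording which final two digits in each PSU are residential, standing in for the outcome of actually dialing. A PSU is retained only if its first randomly dialed number is residential, after which further numbers in that PSU are dialed until k more residences are found.

```python
import random


def waksberg_rdd(frame, m, k, seed=0):
    """Two-stage RDD sketch on a hypothetical frame.

    frame : dict mapping each 8-digit PSU (area code + prefix + 2 digits) to the
            set of final 2-digit suffixes that are residential numbers
    m     : number of PSUs to retain
    k     : additional residential numbers wanted per retained PSU
    Returns {psu: [10-digit residential numbers]}. A PSU with fewer than k + 1
    residential numbers contributes all it has. Assumes at least m PSUs contain
    residential numbers (otherwise the loop never ends).
    """
    rng = random.Random(seed)
    psus = sorted(frame)
    sample = {}
    while len(sample) < m:
        psu = rng.choice(psus)                    # random 8-digit PSU
        if psu in sample:
            continue                              # already retained
        first = f"{rng.randrange(100):02d}"       # random final 2 digits
        if first not in frame[psu]:
            continue                              # not residential: reject the PSU
        numbers = [psu + first]
        others = [f"{i:02d}" for i in range(100) if f"{i:02d}" != first]
        rng.shuffle(others)
        for suffix in others:                     # keep dialing within the PSU
            if len(numbers) == k + 1:
                break
            if suffix in frame[psu]:
                numbers.append(psu + suffix)
        sample[psu] = numbers
    return sample


# Hypothetical frame: 5 PSUs with residential numbers on even suffixes, 5 with none.
frame = {f"2125550{i}": {f"{j:02d}" for j in range(0, 100, 2)} for i in range(5)}
frame.update({f"2125551{i}": set() for i in range(5)})
sample = waksberg_rdd(frame, m=3, k=4)
print(sum(len(v) for v in sample.values()))  # m * (k + 1) = 15
```

PSUs without households are discarded after a single call, which is how the design avoids spending most of its effort on unassigned or non-residential numbers.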
Example of One Kish Selection Table
12 Kish Selection Tables to Be Used in Order
III. Comparability
Misconceptions about Control Selection
- Representativeness
  - Wrong
    - Of all persons with the disease
    - Of the entire nondiseased population
  - Correct
    - The source population for the cases is the one that the controls should represent
- Exposure opportunity
  - Not needed, just as it is not needed in a real follow-up study
Comparability of Information
- Comparable (nondifferential) error in exposure measurement tends to bias the observed odds ratio toward the null
  - Not always true: this holds only when exposure errors are also independent of errors in other variables
- Efforts to ensure comparable exposure information lead to comparable information on other variables
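The simplest case of the first point, a binary exposure classified with the same sensitivity and specificity in cases and controls and with errors independent of other variables, can be worked through with expected counts. The true table, sensitivity, and specificity below are hypothetical:

```python
def misclassify(exposed: float, unexposed: float, sens: float, spec: float):
    """Expected observed (exposed, unexposed) counts after classification error."""
    obs_exposed = sens * exposed + (1 - spec) * unexposed
    obs_unexposed = (1 - sens) * exposed + spec * unexposed
    return obs_exposed, obs_unexposed


def odds_ratio(a, b, c, d):
    """a, b = exposed/unexposed cases; c, d = exposed/unexposed controls."""
    return (a * d) / (b * c)


true_or = odds_ratio(100, 100, 50, 150)    # hypothetical true table: OR = 3.0
sens, spec = 0.8, 0.9                      # identical for cases and controls
a, b = misclassify(100, 100, sens, spec)   # cases
c, d = misclassify(50, 150, sens, spec)    # controls
observed_or = odds_ratio(a, b, c, d)
print(f"true OR={true_or:.2f}  observed OR={observed_or:.2f}")  # 3.00 vs 2.16
```

The observed OR moves toward 1 but not past it here; the slide's caveat is that this guarantee can fail once errors are correlated with errors in other variables.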
Number of Control Groups
- The value of using more than one control group is quite limited
  - A lack of difference between the groups only tells us that both groups incorporate similar net bias
  - A difference only tells us that at least one is biased, but does not tell us which is better or which is worse
Timing of Classification and Diagnosis
- For cases
  - A lag period before diagnosis is used for exposure assessment
- For controls
  - Selection time
    - A natural event analogous to the case diagnosis time, e.g., time of hospitalization for hospital controls
    - Actual time of selection
IV. Variants of the Case-control Design
Variants of the Case-control Design
- Case-cohort studies
- Nested case-control studies
- Cumulative (epidemic) case-control studies
  - Controls are selected from those who remain free of disease at the end of the epidemic
- Case-only studies
  - In studies of gene-environment interaction
- Case-crossover studies
  - Analogue of the classical crossover study for interventions without carry-over effect
  - For each case
    - Pre-disease time periods are selected as control periods
    - Exposure at onset vs. exposure during the control period
  - Example: sexual activity and myocardial infarction
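With one control period per case, each case forms a matched pair of periods, and only cases whose exposure differs between the two periods carry information. A minimal sketch of the matched-pair (Mantel-Haenszel, equivalently conditional maximum-likelihood for 1:1 matching) odds ratio; the function name and the counts are hypothetical:

```python
def case_crossover_or(pairs):
    """Matched-pair odds ratio for a 1:1 case-crossover design.

    pairs: one (exposed_in_hazard_period, exposed_in_control_period) tuple per case.
    Concordant pairs carry no information; the estimate is n10 / n01.
    """
    n10 = sum(1 for hazard, control in pairs if hazard and not control)
    n01 = sum(1 for hazard, control in pairs if control and not hazard)
    return n10 / n01


# Hypothetical data for 19 cases
pairs = ([(True, False)] * 6 + [(False, True)] * 2
         + [(False, False)] * 10 + [(True, True)] * 1)
print(case_crossover_or(pairs))  # 3.0
```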
Variants of the Case-control Design (cont.)
- Two-stage sampling
  - The control series comprises a large number of individuals with limited information (e.g., exposure status)
  - A subsample of the controls is investigated for more detailed information (e.g., covariates)
- Case-control studies with prevalent cases
  - Used in studies of
    - Congenital malformations
    - Chronic conditions with ill-defined onset times and limited effect on mortality (e.g., obesity)
Case-Cohort Study
- For a fixed cohort
  - Cases: all incident cases in a given risk period
  - Controls: a random sample from the population at risk at the start of the risk period
- Rationale
  - Risk ratio = (exposure odds in cases) / (exposure odds in the total cohort at risk)
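Because the subcohort is drawn from everyone at risk at the start, its exposure odds estimate $N_1/N_0$ rather than the noncase odds, so no rare disease assumption is needed. A sketch using the expected counts of the Breslow and Day cohort and a hypothetical subcohort of 400 with exactly its expected exposure split:

```python
import math

# Expected counts from the Breslow and Day cohort (N = 10,000; 30% exposed; 3 years).
n1, n0 = 3000, 7000
x1 = n1 * (1 - math.exp(-0.02 * 3))   # exposed cases
x0 = n0 * (1 - math.exp(-0.01 * 3))   # unexposed cases


def case_cohort_rr(x1, x0, s1, s0):
    """Exposure odds in cases divided by exposure odds in the subcohort."""
    return (x1 / x0) / (s1 / s0)


subcohort_size = 400                    # hypothetical random subcohort
s1 = subcohort_size * n1 / (n1 + n0)    # expected exposed members: 120
s0 = subcohort_size - s1                # expected unexposed members: 280
cir = (x1 / n1) / (x0 / n0)
print(f"{case_cohort_rr(x1, x0, s1, s0):.3f} {cir:.3f}")  # 1.970 1.970
```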
Example of Case-Cohort Study
- An existing cohort
  - Blood drawn on 10,000 individuals
  - Controls: 400 sampled from the original 10,000
    - Typing results: 40 (+), 360 (-)
- Follow-up
  - 200 with rheumatoid arthritis
    - Typing results: 80 (+), 120 (-)
    - $OR = (80 \times 360)/(40 \times 120) = 6$
  - 150 with ankylosing spondylitis
    - Typing results: 15 (+), 135 (-)
    - $OR = 1$
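The two odds ratios in this example can be recomputed directly; the same baseline subcohort serves as the comparison group for both diseases, which is a practical advantage of the case-cohort design. The function name is illustrative:

```python
def odds_ratio(case_pos, case_neg, sub_pos, sub_neg):
    """Exposure odds in cases / exposure odds in the baseline subcohort."""
    return (case_pos * sub_neg) / (case_neg * sub_pos)


subcohort = (40, 360)   # typing results (+, -) in the 400 sampled at baseline
cases = {
    "rheumatoid arthritis": (80, 120),
    "ankylosing spondylitis": (15, 135),
}
for disease, (pos, neg) in cases.items():
    print(disease, odds_ratio(pos, neg, *subcohort))
# rheumatoid arthritis 6.0
# ankylosing spondylitis 1.0
```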
Example of Nested Case-control Studies
- Risk-set sampling (Sahl et al., 1993)
  - Mortality from various cancers and exposure to electromagnetic fields
  - Case: a cancer case from the worker cohort
  - Controls
    - Individuals in the worker cohort who were alive on the date of death of the case
    - Who had the same birth year, sex, and ethnicity as the case
    - Randomly select 10 matching controls for each case
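The control definition above combines risk-set sampling (alive on the case's date of death) with individual matching. A minimal sketch; the `Worker` record and the small cohort are hypothetical, and it assumes every worker was already under follow-up on the case's death date (no late entry):

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Worker:
    wid: int
    birth_year: int
    sex: str
    ethnicity: str
    death_date: Optional[float] = None  # None = alive at end of follow-up


def nested_controls(cohort, case, n_controls=10, seed=0):
    """Sample up to n_controls workers alive on the case's date of death and
    matching the case on birth year, sex, and ethnicity."""
    rng = random.Random(seed)
    t = case.death_date
    key = (case.birth_year, case.sex, case.ethnicity)
    pool = [
        w for w in cohort
        if w.wid != case.wid
        and (w.death_date is None or w.death_date > t)   # alive on that date
        and (w.birth_year, w.sex, w.ethnicity) == key    # matching factors
    ]
    return rng.sample(pool, min(n_controls, len(pool)))


# Hypothetical worker cohort; worker 1 is the cancer death (case).
cohort = [
    Worker(1, 1950, "M", "X", death_date=10.0),
    Worker(2, 1950, "M", "X"),                   # eligible
    Worker(3, 1950, "M", "X", death_date=5.0),   # died before the case
    Worker(4, 1950, "M", "X", death_date=12.0),  # alive on the case's death date
    Worker(5, 1950, "F", "X"),                   # different sex
    Worker(6, 1951, "M", "X"),                   # different birth year
]
controls = nested_controls(cohort, cohort[0])
print(sorted(w.wid for w in controls))  # [2, 4]
```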
Further Readings on Control Selection
- Potthoff RF (1994). Telephone sampling in epidemiologic research: to reap the benefits, avoid the pitfalls. American Journal of Epidemiology, 139, 967-978.
- Reilly M (1996). Optimal sampling strategies for two-stage studies. American Journal of Epidemiology, 143, 92-100.
- Brogan DJ et al. (2001). Comparison of telephone sampling and area sampling: response rates and within-household coverage. American Journal of Epidemiology, 153, 1119-1127.
- DiGaetano R, Waksberg J (2002). Commentary: trade-offs in the development of a sample design for case-control studies. American Journal of Epidemiology, 155, 771-775.