Title: Case-Control and Cohort Studies
1Case-Control and Cohort Studies
Robert Heimer, Yale University School of Public Health, April 2009
2Overview for Today
- For Cohort
- Definition and description
- Dynamic, fixed
- Prospective, retrospective
- Example: Nurses' Health Study and breast cancer
- Conduct
- Assemble the cohort
- Classify exposed/non-exposed
- Follow over time for outcome
- Analysis
- Strengths, limitations
- For Case-Control
- Definitions
- Cases
- Definition, selection
- Controls
- Definition, purpose, selection
- Sampling
- Types
- Analysis
- Strengths, limitations
- Examples
3Main types of epidemiologic studies
Review slide
Epidemiologic research study designs
Experimental studies
Observational studies
Cohort
Cross-sectional
Ecological
Case-control
Some examples and special types
- Individual-level RCT
- Community randomized trial
- Quasi-experimental
- Retrospective
- Prospective
4Case-Control: A Definition
- Examines the exposure-disease relationship by enrolling cases (with disease) and controls (without disease) and comparing exposure history
- Backward design: "flash back"
- Two groups are enrolled (cases and non-cases) and compared with respect to past exposure
5Hypothetical Example
- Research question: Does pesticide exposure increase the risk of bladder cancer?
- Methods: Consider a prospective cohort study in which you enrolled 89,949 individuals aged 34-59 and followed the cohort for 8 years
- Outcome: 1,439 bladder cancer cases identified over 8 years of follow-up
- Exposure: Blood drawn and frozen at the beginning of the study can be analyzed for level of pesticides
6A Practical Problem
- Quantifying pesticide levels in the blood is expensive
- It's not practical to analyze all 89,949 blood samples
- To be efficient, analyze a select number
- all cases (N = 1,439)
- just take a sample of the cohort participants who did not get bladder cancer
- For example, two times as many controls as cases (N = 2,878)
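The sampling step on this slide can be sketched in a few lines of Python. This is only an illustration: the cohort size, case count, and 2:1 ratio come from the slides, while the participant IDs, the choice of simple random sampling, and the function name `sample_controls` are assumptions made for the example.

```python
import random

COHORT_SIZE = 89_949     # from the slide
N_CASES = 1_439          # from the slide
CONTROLS_PER_CASE = 2    # from the slide (two controls per case)

def sample_controls(non_case_ids, n_controls, seed=0):
    """Draw a simple random sample of non-cases to serve as controls."""
    rng = random.Random(seed)
    return rng.sample(non_case_ids, n_controls)

# Hypothetical IDs: the first N_CASES participants stand in for the cases.
participant_ids = list(range(COHORT_SIZE))
case_ids = participant_ids[:N_CASES]
non_case_ids = participant_ids[N_CASES:]

control_ids = sample_controls(non_case_ids, CONTROLS_PER_CASE * N_CASES)

# Number of frozen blood samples that would actually need to be assayed.
n_assays = len(case_ids) + len(control_ids)
fraction_assayed = n_assays / COHORT_SIZE
print(n_assays, f"{fraction_assayed:.1%}")
```

Under these assumptions only 4,317 of the 89,949 stored samples need a pesticide assay, which is under 5% of the cohort.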
7Case-Control Data
[2x2 table: pesticide exposure by bladder cancer status; cell counts not preserved]
We have just identified cases of disease from a
defined population, and then taken a sample of
that population (at-risk population) for
comparison. Exposure histories are determined for
each group. This is an example of a
case-control study that is nested in a cohort.
8A Refined Definition of Case-Control Study
- A method of sampling a population in which cases of disease are identified and enrolled, and a sample of the population that produced the cases is identified and enrolled; exposures are determined for individuals in each group.
- If done properly, case-control is a method of sampling a population so that controls reflect the source population that gave rise to cases.
9Cases
- Need a clear case definition that leads to an accurate classification of disease
- Disease definitions can be based on
- Signs, symptoms
- Clinical exam
- Laboratory test results
- May come from registries, hospitals/clinics, surveillance reports, etc.
- Cases may be enrolled going forward in time
- Incident cases preferred to prevalent cases since these are a better measure of risk because they are not confounded by duration
- Cases may be sampled if you think you'll have too many, but this is not usually the situation
10Cases in Doll and Hill, 1950
- Twenty London hospitals were asked to refer all patients admitted with carcinoma of the lung.
- Identified by admitting clerk, house physician, cancer registrar, radiotherapy department
- Most diagnoses by necropsy, biopsy, or exploratory operation; some were diagnosed by other criteria
- Diagnoses confirmed by hospital diagnosis on discharge
- As a general rule this was the final diagnosis.
- Some diagnoses were changed after discharge.
- Patients were excluded from case population if
- Subsequent checking revealed primary carcinoma was at another site (e.g. breast, colon)
- Histologic exam revealed growth was not carcinoma
- Found not to be malignant disease at all
11Controls
- A sample of the source population (study base) that gave rise to the cases
- Purpose is to estimate exposure distribution in the source population that produced the cases
- Controls provide a fast and inexpensive (efficient) means of obtaining the exposure experience in the source population
- Controls are selected
- Without outcome of interest
- Independent of exposure status
- The "would" criterion
- If a member of the control group actually had the disease under study, WOULD the person end up as a case in your study?
12Selecting Controls
- Sources
- General population
- Hospital, clinic
- Other
- Sampling
- Risk-set sampling
- Survivor sampling
- Case-base sampling
13General Population Versus Hospital Controls
- General Population
- Often cases are selected from a defined geographic population that gave rise to the cases
- Can use residence lists, driver's license records, voter lists, etc.
- Advantages
- High likelihood controls are from same study base as cases
- Disadvantages
- Need an enumerated list
- Time consuming, expensive
- Typically have high refusals
- Hospital Controls
- Use patients with diseases that have no relation to the exposures under study
- Advantages
- May have similar selection factors to cases
- Identifiable and accessible
- May be more willing to participate than general population controls
- Some potential disadvantages
- Referral patterns may not be the same, for example if a hospital has a world-famous cancer center
- Controls are sick, so exposure patterns may not reflect study base
14Other Types of Controls
- Friends
- Spouses
- Siblings/twins
- (Deceased individuals)
- The goal with each of these types of controls is
- Measure exposure in the source population
- Minimize differences between cases and controls
15Matching
- You pair each case with someone who is like the case but who does not have the disease or outcome under study.
- You can do FREQUENCY MATCHING, which means picking people from the general groups from which cases come, so that the overall makeup of the two groups is similar.
- You can connect each case to more than one person who resembles that case (R:1 MATCHING)
- Requires a different analytical approach; most common are
- Matched OR
- McNemar chi-square
- Conditional logistic regression
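For 1:1 matched pairs, the matched OR and the McNemar chi-square both depend only on the discordant pairs (pairs in which exactly one member was exposed). The sketch below shows the standard formulas; the pair counts are hypothetical and the function names are illustrative, not taken from the slides.

```python
def matched_pair_or(case_only_exposed, control_only_exposed):
    """Matched odds ratio: ratio of the two kinds of discordant pairs."""
    return case_only_exposed / control_only_exposed

def mcnemar_chi_square(case_only_exposed, control_only_exposed):
    """McNemar chi-square statistic (1 df, no continuity correction)."""
    diff = case_only_exposed - control_only_exposed
    return diff ** 2 / (case_only_exposed + control_only_exposed)

# Hypothetical discordant-pair counts (not from any study in these slides).
case_only = 30      # pairs where only the case was exposed
control_only = 10   # pairs where only the control was exposed

matched_or = matched_pair_or(case_only, control_only)
chi_square = mcnemar_chi_square(case_only, control_only)
print(matched_or, chi_square)
```

Concordant pairs (both exposed or both unexposed) do not enter either calculation, which is why a matched analysis differs from the unmatched cross-product ratio.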
16Doll and Hill, 1950
- Required to make similar inquiries of a group of non-cancer control patients
- For each lung cancer patient, interviewers were instructed to interview a patient of same sex, within the same five-year age group, and in the same hospital at or about the same time
- Could not always find a suitable control
- 743 general medical and surgical patients
- Some differences with regard to place of residence
- Higher proportion of cases from outside London
17Data Collection: Exposure Assessment
- Once cases and controls are identified and enrolled, collect information on the exposure of interest and other variables
- Ideally, use same data collection techniques for both cases and controls
18Analysis of Case-Control Data: Overview
- Because controls are a sample of the population that produced the cases, you do not know the size of the total population
- Therefore, you cannot get a prevalence, cumulative incidence or incidence rate of disease.
- Do not have the appropriate denominator for these calculations.
- Instead, we compute odds and odds ratios that we use for estimation of relative risk in special circumstances.
19Doll and Hill, 1950 Table IV
- Problem with usual interpretation of this table
- 1298 is not an at-risk population, it is the study population
- 1269 and 29 do not tell us about exposure in the source population
- 647/1269 and 2/29 do not tell us about risk of outcome (lung cancer) among exposed and unexposed (smokers and non-smokers) in source population
20Need a Different Analytical Approach
- Use Odds Ratio
- Delete marginals from the table because they do not have a lot of meaning for understanding risk.
- Odds = probability(event) / probability(non-event)
- Compares the frequency of occurrence of something to the frequency of non-occurrence.
- Calculation of exposure odds ratio (EOR)
- Odds of exposure vs. no exposure in diseased persons = a/c
- Odds of exposure vs. no exposure in non-diseased persons = b/d
- Odds ratio of exposure for cases compared to controls = (a/c) / (b/d) = ad/bc
21Exposure Versus Disease Odds Ratio
- Suppose we want to determine the odds of disease rather than the odds of exposure
- The odds of disease in the exposed = a/b
- The odds of disease in the unexposed = c/d
- The odds ratio of disease for exposed compared to unexposed = (a/b) / (c/d) = ad/bc
- This is the same as the exposure odds ratio
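The equality of the exposure odds ratio and the disease odds ratio can be checked with exact fractions. The sketch assumes the usual 2x2 layout implied by the formulas on these slides (a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls); the counts are hypothetical.

```python
from fractions import Fraction

def exposure_odds_ratio(a, b, c, d):
    """(odds of exposure in cases) / (odds of exposure in controls)."""
    return Fraction(a, c) / Fraction(b, d)

def disease_odds_ratio(a, b, c, d):
    """(odds of disease in exposed) / (odds of disease in unexposed)."""
    return Fraction(a, b) / Fraction(c, d)

def cross_product_ratio(a, b, c, d):
    """The shortcut form ad/bc."""
    return Fraction(a * d, b * c)

# Hypothetical cell counts, chosen only for illustration.
a, b, c, d = 40, 20, 60, 80

eor = exposure_odds_ratio(a, b, c, d)
dor = disease_odds_ratio(a, b, c, d)
cpr = cross_product_ratio(a, b, c, d)
print(eor, dor, cpr)
```

All three expressions reduce to the same fraction, which is why a case-control study (which can only measure exposure odds) still yields the disease odds ratio.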
22OR Approximates RR
- when disease is rare
- Proportion of cases in exposed and unexposed groups is low in total (source) population
- a + b ≈ b and c + d ≈ d
- RR = [a/(a+b)] / [c/(c+d)] ≈ (a/b) / (c/d) = ad/bc
- if disease is not rare
- depends on sampling: when case-base or risk-set sampling is used for control selection
- when cases are newly diagnosed and prevalent cases are excluded from control group, and selection of cases and controls is not based on exposure status
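The rare-disease approximation can be seen numerically by computing RR and OR in a fully enumerated population. Both populations below are hypothetical (a and c are diseased, b and d are non-diseased, among exposed and unexposed respectively); they were chosen so that the true RR is 2 in each.

```python
def risk_ratio(a, b, c, d):
    """RR = [a/(a+b)] / [c/(c+d)] in a fully enumerated population."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """OR = ad/bc."""
    return (a * d) / (b * c)

# Hypothetical rare disease: risks of 0.2% (exposed) and 0.1% (unexposed).
rare = (20, 9_980, 10, 9_990)
# Hypothetical common disease: risks of 40% (exposed) and 20% (unexposed).
common = (4_000, 6_000, 2_000, 8_000)

rr_rare, or_rare = risk_ratio(*rare), odds_ratio(*rare)
rr_common, or_common = risk_ratio(*common), odds_ratio(*common)

print(f"rare:   RR={rr_rare:.3f}  OR={or_rare:.3f}")
print(f"common: RR={rr_common:.3f}  OR={or_common:.3f}")
```

With the rare disease the OR is nearly identical to the RR; with the common disease the OR overstates it, which is the situation in which the choice of control sampling scheme starts to matter.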
23Doll and Hill, 1950 Table IV
- OR = (647 × 27) / (2 × 622) = 14.0
- Technically: The odds of smoking among lung cancer cases are 14 times the odds of smoking among controls (patients without lung cancer).
- Loosely: People who smoke have a 14-fold increased risk of lung cancer compared to people who do not smoke.
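The arithmetic on this slide can be reproduced directly. The table itself did not survive in the transcript, so the four cell counts below are inferred from the figures quoted on the two Table IV slides (647 and 2 cases among 1,269 smokers and 29 non-smokers, and the cross-product shown above); treat the layout as an assumption.

```python
# Cell counts inferred from the figures quoted in the slides.
smoker_cases = 647
nonsmoker_cases = 2
smoker_controls = 622
nonsmoker_controls = 27

# The marginals the slides warn against interpreting as at-risk populations.
total_smokers = smoker_cases + smoker_controls
total_nonsmokers = nonsmoker_cases + nonsmoker_controls
study_population = total_smokers + total_nonsmokers

# Cross-product (odds) ratio, ad/bc.
odds_ratio = (smoker_cases * nonsmoker_controls) / (nonsmoker_cases * smoker_controls)

print(total_smokers, total_nonsmokers, study_population)
print(round(odds_ratio, 1))
```

The inferred cells reproduce the marginals quoted earlier (1,269, 29, and 1,298), which is a useful consistency check on the reconstruction.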
24Advantages of Case-Control Method
- Useful when exposure data are expensive or difficult to obtain
- Pesticide and bladder cancer example
- Useful when little is known about disease
- Vaginal cancer in women
- Useful when disease has long induction or latent period
- Lung cancer
- Many special cases
- Outbreaks, vaccine effectiveness, etc.
- A major advantage is efficiency (time and money)
25Disadvantages
- Retrospective nature makes them prone to many biases
- Information bias
- Recall bias, if cases and controls report exposures differently because of their case/control status
- Selection bias
- If the selected control group is not representative of the source population that gave rise to cases with respect to exposure
26Rare Vaginal Cancer in Young Women: Herbst et al., NEJM 1971;284(15):878-881
- Background
- Initial clinical observation of 7 women, ages 15-22, with adenocarcinoma of the vagina
- Never before seen at that hospital
- Study design
- We then decided to conduct a case-control, retrospective study that would compare in detail these patients and their families with an appropriate control group to uncover factors that might be associated with the sudden appearance of these tumors.
- Cases
- 8 women diagnosed with clear-cell or endometrial type adenocarcinoma of the vagina between 1966 and 1969 at Boston hospitals
27Controls and Exposure Assessment
- Controls
- 4 matched controls per case
- Using persons born at the same hospital as the case and within 5 days and on the same type of service (ward or private)
- Exposure assessment
- Reproductive and other factors
28Major factor: Ingestion of estrogen (diethylstilbestrol, or DES) during first trimester of pregnancy by 7 of 8 mothers of affected women and by 0 of 32 mothers of control women (p < .001) (OR?)
Conclusion: DES is a risk factor for subsequent adenocarcinoma of the vagina in offspring.
Implications: It is unwise to administer estrogen to women early in pregnancy. Abnormal bleeding in adolescent women should be examined for vaginal tumors.
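The "(OR?)" on this slide has no finite answer from the crude table, because no control mother was exposed and the cross-product ratio divides by zero. The sketch below shows that, and then applies one common workaround that is not part of the slides: adding 0.5 to every cell (often called the Haldane-Anscombe correction). It also ignores the 4:1 matching, so it is an illustration of the arithmetic rather than the analysis the authors performed.

```python
import math

def crude_odds_ratio(a, b, c, d):
    """ad/bc, returning infinity when the denominator is zero."""
    if b * c == 0:
        return math.inf
    return (a * d) / (b * c)

def corrected_odds_ratio(a, b, c, d, add=0.5):
    """ad/bc after adding a small constant to every cell (assumed workaround)."""
    return ((a + add) * (d + add)) / ((b + add) * (c + add))

# Counts from the slide: 7 of 8 case mothers and 0 of 32 control mothers took DES.
exposed_cases, unexposed_cases = 7, 1
exposed_controls, unexposed_controls = 0, 32

crude = crude_odds_ratio(exposed_cases, exposed_controls,
                         unexposed_cases, unexposed_controls)
corrected = corrected_odds_ratio(exposed_cases, exposed_controls,
                                 unexposed_cases, unexposed_controls)
print(crude, corrected)
```

Either way the association is extremely strong; the corrected figure is sensitive to the constant added and should not be read as a precise estimate.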
29Summary Points on Case-Control
- Case-control studies are useful in many situations for epidemiologic research because of their efficiency
- Selection of control population is often challenging
- With appropriate methodologies and mindfulness toward common biases, they can produce valid and important results.
30Cohort Studies
31Definition of a Cohort
- Cohort
- A group of persons followed over time
- Cohort study
- A study in which two or more groups of people that are free of disease and that differ according to exposure level(s) are followed over time and compared with respect to disease incidence to assess the association between exposure and disease
- Also called prospective, follow-up, longitudinal studies
- May be considered a natural experiment
- People are exposed to substances and risk behaviors all the time, either on purpose or not; these can be studied as exposures
- Interventions may be considered a subset of cohort studies
32Some Characteristics of Cohort Studies
- May be open (dynamic), fixed, or closed
- Dynamic: members can enter and leave during follow-up time
- Residents of Kazan
- Fixed: membership is fixed (permanent), but members can exit the cohort
- People present in lower Manhattan 9/11/2001
- Women who have given birth
- Closed: members cannot enter after start of study and nobody is lost to follow-up; defined start and end time
- Attendees of church supper
- Everybody has same follow-up time
- May be prospective or retrospective
- Depends on temporal relationship between initiation of study and occurrence of disease
- Calendar time vs. follow-up time
33Dynamic, Fixed, and Closed Cohort
- Closed cohort is like a fixed cohort, but all
members have the same exposure time
34Timing of Cohort Studies
- Prospective
- Exposure has occurred but disease has not occurred at start of study
- Exposure ---------------> Disease
- Study starts here (Calendar time and follow-up time are concurrent)
- Retrospective
- Both exposure and disease have occurred at start of study
- Exposure ---------------> Disease
- Study starts here (Calendar time and follow-up time NOT concurrent)
35Comparison
- Retrospective
- Cheaper, faster
- Efficient for diseases with long latent period
- Exposure data and other information may be limited or missing
- Prospective
- More expensive, time consuming
- Not efficient for diseases with long latent periods
- Better exposure and confounder data (planned)
- Enhanced follow-up
- Less vulnerable to some bias
- How to choose?
- Necessity (logistics: time, money)
- Research question (science: available data)
36Design Overview
- Identify and assemble a group of individuals without disease of interest
- Classify with respect to exposure status at start of study
- Monitor subsequent development of disease in exposed and non-exposed subjects over time
- Analysis
37Example: Nurses' Health Study
- Background and Purpose
- Based at Harvard Medical School and School of Public Health
- Originally conceived as a study to examine the association between oral contraceptive use (widespread in the 1960s and 1970s) and breast cancer
- In part, due to conflicting previous studies and concerns about limitations of the case-control approach for this association
- Cohort
- Enrolled 120,000 married female nurses age 30-55 registered in one of 11 states in 1976
- Identified by American Nurses Association and state boards of nursing
- Initial baseline mail survey collected information about demographic, reproductive, medical, and life-style variables
38Assembling a Prospective Cohort
- Exclude those with disease or not at risk
- Depending on research question and feasibility
- General cohorts and special cohorts
- Internal and external comparison groups
39General Cohorts
- Select a group of individuals from general population
- Geographically defined areas
- Well-identified groups (e.g. NHS)
- Others
- Not chosen for exposure status but rather for feasibility and logistical reasons that make the study possible
- Nurses' Health Study
- Believed that nurses would be interested in participating in a health study
- Believed that nurses could answer survey questions correctly
- Often useful for common exposures
- NHS -- OC use: 42% past users, 6% current users
- Often used for multiple exposures and multiple outcomes
40Special Cohorts
- Groups with a particular health status or other special characteristic
- For example: repeated x-rays, living near a toxic waste dump site, presence at an event such as Chernobyl
- Useful for occupational settings
- Often have unusual exposures
- Danish workers exposed to trichloroethylene
- Often useful for rare exposures
- Allows accrual of sufficient exposed individuals because of targeted recruitment
41Comparison Groups (Exposure)
- Principle: You want the comparison (unexposed) group to be as similar as possible to the exposed group with respect to all other factors except the exposure. If the exposure has no effect on disease occurrence, then the rate of disease in the exposed and comparison groups will be the same.
- Counterfactual ideal: The ideal comparison group consists of exactly the same individuals in the exposed group had they not been exposed. Since it is impossible for the same person to be exposed and unexposed simultaneously, epidemiologists must select different sets of people who are as similar as possible.
42Classifying Exposure
- Define and measure
- Need a way to handle exposures that may change over time
- NHS and OC use
- Current and past (ever), never; current defined as use within the past 2 years
- Total duration, duration prior to first pregnancy, duration prior to age 25
- May want to consider multiple levels of exposure
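A time-varying exposure like this is typically re-classified at every questionnaire cycle. The sketch below shows one way the current/past/never rule could be coded; the 2-year window comes from the slide, while the function name, the input fields, and the choice to treat exactly 2 years as "current" are assumptions for illustration.

```python
from typing import Optional

CURRENT_WINDOW_YEARS = 2.0  # "current" = use within the past 2 years (from the slide)

def classify_oc_use(ever_used: bool, years_since_last_use: Optional[float]) -> str:
    """Return 'never', 'current', or 'past' for one questionnaire cycle."""
    if not ever_used:
        return "never"
    if years_since_last_use is None:
        raise ValueError("years_since_last_use is required for ever-users")
    # Assumption: the boundary value (exactly 2 years) counts as current.
    if years_since_last_use <= CURRENT_WINDOW_YEARS:
        return "current"
    return "past"

# Hypothetical answers from three respondents in one cycle.
responses = [(False, None), (True, 0.5), (True, 7.0)]
labels = [classify_oc_use(ever, years) for ever, years in responses]
print(labels)
```

Because the function is applied per cycle, the same participant can contribute follow-up time to more than one exposure category over the course of the study.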
43Internal Comparison Group
- Unexposed members of the same cohort
- Single cohort in which individuals are classified into exposure categories
- NHS: nurses were enrolled, surveyed about risks and classified as exposed or unexposed
- Usually preferred because of higher likelihood of similarity between exposed and unexposed
- Selected into cohort in same way
- Because you are a nurse, because you live in a city
- Measurement and follow-up of disease done in the same way
44External Comparison Group
- A different cohort, from another similar population, that is not exposed
- Different cohort: for example, same type of work at different organization
- General population
- Useful when a special exposure group is used and/or when entire cohort is exposed
- Inclusion in the cohort meant exposed, so had to go elsewhere for comparison
45Following the Cohort
- A major challenge, a major expense, a major potential threat to validity
- Method
- Passive or active (NHS)
- Length of time
- Depends on outcome and sample size
- For chronic diseases, follow-up will need to be years or decades
- Data collection
- For exposure (updated and new) and outcome (multiple) information
- Pre-existing records (medical, employment), questionnaires, physical exams, medical tests, laboratory assays, external sources (e.g. cancer registries, vital statistics such as the US National Death Index)
46Important to Minimize Losses to Follow-up
- For reasons of sample size and bias
- Depends on population, duration, etc.
- Methods
- Collecting sufficient locating information
- Maintaining regular contact with participants
- Using multiple methods (phone, mail, internet, postal office databases, disease registries, vital statistics, physicians)
- Make participation worthwhile for participants
47Nurses' Health Study
- Follow-up
- Every 2 years complete another mail survey that collects information about development of outcomes, updated exposures, new exposures
- Biologic specimens also collected
- Self-reported outcomes confirmed by medical record reviews, pathology reports, review of National Death Index
- Promoting follow-up
- Follow-up surveys included a newsletter
- 2-4 follow-up mailings; 5th mailing was an abbreviated survey
- Added a telephone follow-up
- Use of certified mail
- Follow-up through state boards of nursing
- >90% follow-up at each cycle
48Other Advantage to Cohorts
- Nurses' Health Study has continued and expanded to include examination of associations between
- Exposures: diet, physical activity, obesity, post-menopausal hormone use, reproductive factors, smoking, alcohol, coffee, hair dyes
- Outcomes: cancer, CVD, diabetes, mortality, osteoporosis, to name just a few
- >800 publications to date
49Analysis
- Basic analysis involves calculation and comparison of incidence of disease among exposed and unexposed
- Depending on available data, you can calculate cumulative incidence or incidence rates and corresponding ratios
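The two basic cohort measures differ only in their denominators: persons at risk for cumulative incidence, person-time for incidence rates. A minimal sketch with made-up counts (none of these numbers come from the NHS or any other study in the slides):

```python
def cumulative_incidence(new_cases, persons_at_risk):
    """Proportion of an initially disease-free group that develops disease."""
    return new_cases / persons_at_risk

def incidence_rate(new_cases, person_years):
    """New cases per unit of person-time at risk."""
    return new_cases / person_years

# Hypothetical cohort data.
exposed = {"cases": 30, "persons": 1_000, "person_years": 7_500}
unexposed = {"cases": 20, "persons": 2_000, "person_years": 16_000}

ci_ratio = (cumulative_incidence(exposed["cases"], exposed["persons"])
            / cumulative_incidence(unexposed["cases"], unexposed["persons"]))
rate_ratio = (incidence_rate(exposed["cases"], exposed["person_years"])
              / incidence_rate(unexposed["cases"], unexposed["person_years"]))

print(f"cumulative incidence ratio: {ci_ratio:.2f}")
print(f"incidence rate ratio:       {rate_ratio:.2f}")
```

The two ratios need not agree: here the exposed group accrues less person-time per person, so the rate ratio comes out somewhat higher than the cumulative incidence ratio.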
50OC use and breast cancer in NHS: Romieu et al., JNCI 1989;81:1313-1321.
- Background
- Conflicting evidence re: OC use and breast cancer
- OC use is widespread, therefore possible large impact on public health
- Cohort
- 118,273 women who did not report a diagnosis of cancer (other than non-melanoma skin cancer) on 1976 questionnaire
- Exposure
- Self-reported OC use on each questionnaire (every 2 years)
- Classified as current, past, never; time since first use, time since last use, duration of use, use before first pregnancy
52Conclusions from NHS on Risk of Breast Cancer following Oral Contraception Use
- Overall past use not associated with breast cancer (RR = 1.06)
- Slightly increased risk for current users (RR = 1.56)
- Number of women who used for long duration early in reproductive life was too small for meaningful analysis
53Advantages of Cohort Studies
- Known temporal association: exposure -> outcome
- Preferred for causal inference
- Can evaluate multiple outcomes for given exposure(s)
- NHS examined relationship between OC use and breast cancer, ovarian cancer, malignant melanoma and myocardial infarction
- Less prone to certain types of bias
- Recall bias: outcome does not influence recall of past exposures (pre-classified)
- Selection bias: disease status does not influence selection of subjects with respect to exposure
- Can estimate risk and incidence (person-time)
54Disadvantages of Cohort Studies
- Expensive
- Time-consuming
- Loss to follow-up bias
- If those lost are different in ways related to exposure and outcome
- Inefficient for rare diseases
- unless AR is high
- a retrospective cohort can then be established
- Inefficient for diseases with long induction or latent period (unless retrospective cohort)
55Summary
- Cohort studies are generally considered the strongest of the observational designs
- They are often the most expensive and time consuming
- Vulnerable to a different set of limitations than ecological, cross-sectional, and case-control studies