Title: EPB PHC 6000 EPIDEMIOLOGY FALL, 1997
1Unit 8 Cohort Studies
2Unit 8 Learning Objectives Considering the
prospective cohort study 1. Understand
strengths and limitations of this study
design. 2. Understand approaches to selecting an
exposed population. 3. Understand approaches to
selecting a comparison group(s). 4. Recognize
primary sources of exposure and outcome
information.
3Unit 8 Learning Objectives Considering the
prospective cohort study 5. Recognize
contributions of major studies conducted in the
United States. --- Framingham Heart
Study --- Nurses' Health Study 6. Understand
primary sources of bias. 7. Understand the
purpose and methods for conducting sensitivity
analyses.
4Unit 9 Learning Objectives 8. Understand design
features and strengths and limitations of
retrospective cohort studies. 9. Differentiate
between incidence risk and rate, and risk ratio
and rate ratio. 10. Calculate person time for
time-dependent exposures. 11. Understand
factors that influence accurate classification of
person-time exposure. 12. Understand the concept
and components of the empirical induction
period. 13. Understand the concept of
non-exposed person-time among exposed
subjects.
5Axiom Since most epidemiologic research
is observational by nature, epidemiologic studies
typically obtain imprecise answers, but to the
right health-related questions that cannot be
evaluated using experimental study designs.
6Prospective Cohort Study
7Review Prospective Cohort Study
Prospective cohort (follow-up) study Disease
free individuals are selected and their exposure
status is ascertained. Subjects are followed
for a period of time to record and compare the
incidence of disease between exposed and
non-exposed individuals (e.g. risk ratio or rate
ratio).
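The comparison of incidence between exposed and non-exposed groups can be sketched in a few lines. This is a minimal illustration, not part of the lecture: the cohort counts and person-years below are hypothetical, and the function names `risk` and `rate` are chosen only for this example.

```python
def risk(cases, persons_at_risk):
    """Cumulative incidence (risk): new cases / persons at risk at the start of follow-up."""
    return cases / persons_at_risk


def rate(cases, person_years):
    """Incidence rate: new cases / person-time at risk."""
    return cases / person_years


# Hypothetical cohort (illustrative numbers only, not from the lecture)
exposed = {"cases": 30, "persons": 1000, "person_years": 4800.0}
unexposed = {"cases": 15, "persons": 1000, "person_years": 4950.0}

risk_ratio = (risk(exposed["cases"], exposed["persons"])
              / risk(unexposed["cases"], unexposed["persons"]))
rate_ratio = (rate(exposed["cases"], exposed["person_years"])
              / rate(unexposed["cases"], unexposed["person_years"]))

print(f"Risk ratio: {risk_ratio:.2f}")
print(f"Rate ratio: {rate_ratio:.2f}")
```

The two ratios differ slightly here because the exposed group contributes less person-time per person (cases stop contributing time at risk once they occur).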
8Review Prospective Cohort Study
Prospective cohort (follow-up) study
[Diagram: Exposure (?) ---> Disease (?)]
Exposure may or may not have occurred at study
entry. Outcome definitely has not occurred at
study entry.
9Prospective Cohort Studies (Also called
longitudinal studies)
10Design Features
Strengths Can elucidate temporal relationship
between exposure and disease (hence, strongest
observational design for establishing cause and
effect). Minimizes bias in the ascertainment of
exposure (e.g. recall bias). Particularly
efficient for study of rare exposures.
11Design Features
Strengths (cont.) Can examine multiple effects
of single exposure. Can yield information on
multiple exposures. Allows direct measurement
of incidence of disease in exposed and
non-exposed groups (hence, calculation of
relative risk).
12Design Features
Limitations Not efficient for the study of
rare diseases. Can be very costly and time
consuming. Often requires a large sample
size. Losses to follow-up can affect validity
of results. Changes over time in diagnostic
methods may lead to biased
results.
13Design Features
Selection of the Exposed Population The exposed
population should relate to the hypothesis For
common exposures (e.g. smoking, coffee drinking)
and relatively common chronic diseases, the
general population/geographically-defined areas
are good choices. For rare exposures, special
cohorts are more desirable (e.g. particular
occupations or environmental factors in specific
geographic locations).
14Design Features
Selection of the Exposed Population Although
cohort studies are not optimal for evaluation of
rare diseases, certain outcomes may be
sufficiently common in special exposure
cohorts to yield an adequate number of cases.
To enhance validity, some exposed populations are
selected for their ability to facilitate complete
and accurate information (e.g. doctors, nurses,
entire companies, etc.).
15Design Features
Selection of the Comparison Group The groups
being compared should be as similar as possible
on all factors that relate to disease other than
the exposure under investigation (e.g. to reduce
the potential for confounding). Ability to
collect adequate information from the
non-exposed group is essential.
16Design Features
Internal Comparison Group Members of a single
general cohort are classified into exposed and
non-exposed categories. Most often used for
common exposures. The non-exposed group becomes
the comparison group. Must be careful of other
potential differences between the exposed and
non-exposed groups.
17Design Features
General Population Comparison Group The
general population will probably include some
exposed persons. Due to the healthy worker
effect, the general population may be expected
to experience higher mortality than most
occupational cohorts. Comparisons with
population rates are possible only for outcomes
for which population rates are available.
18Design Features
Special Exposure Comparison Group Another
cohort with demographic characteristics similar
to the exposed group, but considered non-exposed
to the factor of interest is selected (e.g.
another occupational group). Note: To enhance
validity, it may be important to have multiple
comparison groups.
19Design Features
Sources of Exposure Information Pre-existing
Records Advantages --- Inexpensive --- Relatively
easy to work with --- Usually unbiased
since the data were collected for non-study
purposes
20Design Features
Sources of Exposure Information Pre-existing
Records Disadvantages --- Exposure
information may not be
precise enough to address the
research question. --- Records frequently do not
contain data on potential
confounding factors.
21Design Features
Sources of Exposure Information Self Report
(interviews, surveys, etc.) Advantages --- Opportunity
to question subjects on
as many factors as necessary. --- Good for
collecting information on exposures not
routinely recorded.
22Design Features
Sources of Exposure Information Self Report
(interviews, surveys, etc.) Disadvantages --- Subject
to response bias (e.g. due to stigma,
response expectations, etc.). --- Subject to
interviewer bias. --- Subjects may be
unaware of their exposure status (e.g.
chemical exposure).
23Design Features
Sources of Exposure Information Direct
Measurement If obtained in a comparable manner,
can provide objective and unbiased exposure
ascertainment (e.g. blood pressure, serum
samples, environmental measurements,
etc.). --- Can be used on a fraction of the
cohort to validate other types
of exposure ascertainment.
24Design Features
Sources of Exposure Information Repeated
Measurements --- If frequency of exposure changes
over follow-up, repeated measurements allow
for revision of exposure classification. ---
Periodic questioning of cohort members allows
for newly identified exposures of interest to
be measured. --- Good for transient exposures.
25Design Features
Types of Exposure Measurements Dichotomous
(e.g. presence of HLA type) Intensity (e.g.
mean blood pressure level) Duration (e.g.
weeks of chronic stress) Cumulative (e.g.
pack-years of smoking) Regularity (e.g.
frequency of episodic anger) Variability (e.g.
range of cardiovascular reactivity)
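As a rough illustration of how three of these measurement types (intensity, variability, cumulative) might be derived from raw data, the sketch below uses hypothetical readings and a hypothetical smoking history. The pack-years convention assumed here is the common one of 20 cigarettes per pack; the function name `pack_years` is invented for this example.

```python
from statistics import mean


def pack_years(cigarettes_per_day, years_smoked):
    """Cumulative smoking exposure: packs per day (assuming 20 cigarettes per pack) times years."""
    return (cigarettes_per_day / 20) * years_smoked


# Hypothetical repeated systolic blood pressure readings (mmHg) for one subject
readings = [128, 135, 142, 131]

intensity = mean(readings)                     # intensity: mean level
variability = max(readings) - min(readings)    # variability: range of readings

# Hypothetical smoking history as (cigarettes per day, years) for each period
history = [(10, 5), (30, 10)]
cumulative = sum(pack_years(c, y) for c, y in history)  # cumulative: total pack-years

print(intensity, variability, cumulative)
```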
26Design Features
Sources of Outcome Information
--- Death certificates (National Death Index); for some causes, notoriously unreliable
--- Clinical history
--- Self-reports
--- Medical examination (periodic re-examination of the cohort)
--- Hospital discharge logs
27Design Features
Outcome Information Procedures for identifying
outcomes must be equally applied to all exposed
and non-exposed individuals. Goal is to obtain
complete, comparable, and unbiased information
on the health experience of each study subject.
Combinations of various sources of outcome data
may be necessary.
28Prospective Cohort Study
Examples: Framingham Heart Study; Nurses'
Health Study
29Prospective Cohort Study
Framingham Heart Study Framingham, MA (1948)
Approximately 5,000 of the 30,000 town residents aged 30 to
59 years without established coronary
disease participated. Exposures include
smoking, obesity, elevated blood pressure, high
cholesterol, physical activity, and others.
Outcomes include development of coronary heart
disease, stroke, gout, and others.
30Prospective Cohort Study
Framingham Heart Study Outcome events were
identified by examining the study population
every 2 years, and by daily surveillance of
hospitalizations in the only hospital in
Framingham, MA. Participants followed for more
than 30 years. Study has made fundamental
contributions to our understanding of the
epidemiology of cardiovascular disease.
31Prospective Cohort Study
Framingham Heart Study More than 200 published
reports. Unfortunately, Framingham, MA is
almost exclusively Caucasian.
32Prospective Cohort Study
Nurses' Health Study In 1976, > 120,000 married
female nurses aged 30 to 55 living in one of 11 U.S.
states participated. At 2-year intervals,
follow-up questionnaires were completed on
development of outcomes and exposure
information. Exposures include use of oral
contraceptives, post-menopausal hormones, hair
dyes, dietary fat consumption, age at first
birth, and others.
33Prospective Cohort Study
Nurses' Health Study Outcomes include heart
disease, various types of cancer, and others.
Many new exposures have been added to the
biennial questionnaires (e.g. electric blanket
use, selenium levels, etc.).
34Prospective Cohort Study
Follow-up Issues Major challenge is to
collect follow-up data on every study
subject. Loss to follow-up is a major source
of bias and is related to --- Length
of follow-up --- Monitoring methods used in the
study Multiple sources of information can be
used to obtain complete follow-up
information.
35Prospective Cohort Study
Sources of Error (Bias) Loss to Follow-up If
large (e.g. > 30%), validity of study results
may be severely compromised. Probability of
loss to follow-up may be related to exposure,
disease, or both; this may lead to a biased
estimate of the exposure/disease association. Can use
sensitivity analysis to estimate potential
effect of subjects lost to follow-up.
36Prospective Cohort Study
Sensitivity Analysis General Definition
Substitution of a value or range of values to
evaluate the robustness of study findings, while
taking into account the potential impact of study
limitations. For example, how might the final
outcome of the analysis change when taking into
account loss to follow-up?
37Prospective Cohort Study
Sensitivity Analysis (Example) Prospective
cohort study of lumber mill occupation and low
back pain. 1,000 subjects recruited --- 518
exposed (lumber mill workers) --- 482
non-exposed (other workers) 100 of 1,000 lost to
follow-up --- 60 exposed, 40 non-exposed
38Sensitivity Analysis
Incidence(E+) = 54/458 = 0.118
Incidence(E-) = 44/442 = 0.100
RR = 0.118 / 0.100 = 1.18; 95% C.I. (0.81, 1.72)
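The slides do not say how the confidence interval was obtained. As an assumption, the usual log-based (Katz) approximation for a risk ratio reproduces the reported interval for the observed data, with a = 54 cases among n1 = 458 exposed and b = 44 cases among n0 = 442 non-exposed:

```latex
\begin{align*}
RR &= \frac{a/n_1}{b/n_0} = \frac{54/458}{44/442} \approx 1.18 \\
SE[\ln RR] &= \sqrt{\frac{1}{a} - \frac{1}{n_1} + \frac{1}{b} - \frac{1}{n_0}}
            = \sqrt{\frac{1}{54} - \frac{1}{458} + \frac{1}{44} - \frac{1}{442}} \approx 0.192 \\
95\%\ \text{C.I.} &= \exp\left(\ln RR \pm 1.96 \times SE[\ln RR]\right)
                  = \exp(0.169 \pm 0.376) \approx (0.81,\ 1.72)
\end{align*}
```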
Possible Scenarios from loss to
follow-up Scenario 1 (Extreme): All 60 exposed
lost to follow-up experienced low back pain,
whereas the incidence in the 40 non-exposed lost to
follow-up was the same as in those with complete
follow-up.
39Sensitivity Analysis
Actual: Incidence(E+) = 54/458 = 0.118; Incidence(E-) = 44/442 = 0.100; RR = 0.118 / 0.100 = 1.18; 95% C.I. (0.81, 1.72)
Scenario 1: Incidence(E+) = 114/518 = 0.220; Incidence(E-) = 48/482 = 0.100; RR = 0.220 / 0.100 = 2.21; 95% C.I. (1.61, 3.03)
40Sensitivity Analysis
Possible Scenarios from loss to
follow-up Scenario 2 (Possible): The incidence
among the 60 exposed lost to follow-up is twice the
incidence among the 40 non-exposed lost
to follow-up. The incidence among the 40 non-exposed
lost to follow-up is the same as the incidence
among the 442 non-exposed with complete follow-up.
41Sensitivity Analysis
Actual: Incidence(E+) = 54/458 = 0.118; Incidence(E-) = 44/442 = 0.100; RR = 0.118 / 0.100 = 1.18; 95% C.I. (0.81, 1.72)
Scenario 2: Incidence(E+) = 66/518 = 0.127; Incidence(E-) = 48/482 = 0.100; RR = 0.127 / 0.100 = 1.28; 95% C.I. (0.90, 1.82)
42Sensitivity Analysis
Possible Scenarios from loss to
follow-up Scenario 3 (Possible): The incidence
among the 60 exposed lost to follow-up is half the
incidence among the 40 non-exposed lost
to follow-up. The incidence among the 40 non-exposed
lost to follow-up is the same as the incidence among
the 442 non-exposed with complete follow-up.
43Sensitivity Analysis
Actual: Incidence(E+) = 54/458 = 0.118; Incidence(E-) = 44/442 = 0.100; RR = 0.118 / 0.100 = 1.18; 95% C.I. (0.81, 1.72)
Scenario 3: Incidence(E+) = 57/518 = 0.110; Incidence(E-) = 48/482 = 0.100; RR = 0.110 / 0.100 = 1.11; 95% C.I. (0.77, 1.59)
44Sensitivity Analysis
Actual: RR = 1.18; 95% C.I. (0.81, 1.72)
Scenario 1: RR = 2.21; 95% C.I. (1.61, 3.03)
Scenario 2: RR = 1.28; 95% C.I. (0.90, 1.82)
Scenario 3: RR = 1.11; 95% C.I. (0.77, 1.59)
With 10% loss to follow-up, the observed risk
ratio estimate of 1.18 appears to be robust with
regard to possible (but not extreme) impact of
loss to follow-up (e.g. Scenarios 2 and 3).
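The scenario calculations for the lumber mill example can be reproduced with a short script. This is a sketch under stated assumptions: the confidence interval uses the log-based (Katz) approximation, which the slides do not name, and the expected cases among those lost to follow-up are kept as unrounded fractions, so the counts differ slightly from the whole numbers shown on the slides. The function names `risk_ratio_ci` and `scenario` are invented for this example.

```python
import math


def risk_ratio_ci(cases_exp, n_exp, cases_unexp, n_unexp, z=1.96):
    """Risk ratio with a log-based (Katz) confidence interval (assumed method)."""
    rr = (cases_exp / n_exp) / (cases_unexp / n_unexp)
    se = math.sqrt(1 / cases_exp - 1 / n_exp + 1 / cases_unexp - 1 / n_unexp)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)


# Observed data (subjects with complete follow-up) and numbers lost, from the example
OBS_EXP_CASES, OBS_EXP_N = 54, 458
OBS_UNEXP_CASES, OBS_UNEXP_N = 44, 442
LOST_EXP, LOST_UNEXP = 60, 40

# Incidence among non-exposed subjects with complete follow-up
risk_unexp = OBS_UNEXP_CASES / OBS_UNEXP_N


def scenario(risk_lost_exp, risk_lost_unexp=risk_unexp):
    """Add expected cases among those lost to follow-up, then recompute the RR."""
    cases_exp = OBS_EXP_CASES + LOST_EXP * risk_lost_exp
    cases_unexp = OBS_UNEXP_CASES + LOST_UNEXP * risk_lost_unexp
    return risk_ratio_ci(cases_exp, OBS_EXP_N + LOST_EXP,
                         cases_unexp, OBS_UNEXP_N + LOST_UNEXP)


results = {
    "Actual": risk_ratio_ci(OBS_EXP_CASES, OBS_EXP_N, OBS_UNEXP_CASES, OBS_UNEXP_N),
    "Scenario 1": scenario(1.0),               # all exposed lost became cases
    "Scenario 2": scenario(2 * risk_unexp),    # twice the non-exposed incidence
    "Scenario 3": scenario(0.5 * risk_unexp),  # half the non-exposed incidence
}

for name, (rr, lower, upper) in results.items():
    print(f"{name}: RR = {rr:.2f}, 95% C.I. ({lower:.2f}, {upper:.2f})")
```

Passing a different `risk_lost_unexp` to `scenario` allows scenarios in which the non-exposed subjects lost to follow-up also differ from those who were followed.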
45Sensitivity Analysis
Note: Even if loss to follow-up is low (e.g.
10%), if the incidence is very low in the
observed study population (e.g. < 5%), yet
relatively high in those lost to follow-up (e.g.
> 15%), the observed point estimate may be
severely biased; because of loss to
follow-up, you missed all of the action (where
the cases occurred).
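A small numeric sketch makes the point concrete. All numbers below are hypothetical and chosen only to match the thresholds in the note (10% loss, incidence below 5% among those followed, above 15% among exposed subjects lost to follow-up).

```python
# Hypothetical cohort: 1,000 exposed and 1,000 non-exposed, 10% lost in each group
N_PER_GROUP = 1000
lost = round(N_PER_GROUP * 0.10)   # 100 lost per group
followed = N_PER_GROUP - lost      # 900 followed per group

observed_risk = 0.04         # incidence among those followed, same in both groups
lost_risk_exposed = 0.20     # assumed incidence among exposed subjects lost to follow-up
lost_risk_unexposed = 0.04   # assumed incidence among non-exposed subjects lost to follow-up

# Risk ratio seen by the investigator (complete follow-up only)
observed_rr = observed_risk / observed_risk

# Risk ratio if the outcomes of those lost to follow-up had been captured
cases_exposed = followed * observed_risk + lost * lost_risk_exposed
cases_unexposed = followed * observed_risk + lost * lost_risk_unexposed
full_rr = (cases_exposed / N_PER_GROUP) / (cases_unexposed / N_PER_GROUP)

print(f"Observed RR: {observed_rr:.2f}")
print(f"RR including those lost to follow-up: {full_rr:.2f}")
```

Under these assumptions the observed data show no association, while the full cohort would show a 40% higher risk among the exposed.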
46Prospective Cohort Study
Sources of Error (Bias) Misclassification of
Exposure and/or Outcome Random
(non-differential) misclassification Non-random
(differential) misclassification Can use
sensitivity analysis to estimate potential
effect of postulated degree(s) of
misclassification.
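One way to carry out such a sensitivity analysis for exposure misclassification is to postulate a sensitivity and specificity for the exposure measurement and compute the risk ratio that would be expected. The sketch below is illustrative only: the cohort sizes, risks, and the sensitivity and specificity values are hypothetical, and the function name `misclassified_rr` is invented for this example.

```python
def misclassified_rr(n_exp, n_unexp, risk_exp, risk_unexp, sensitivity, specificity):
    """Expected risk ratio after non-differential misclassification of a dichotomous exposure.

    sensitivity: probability that a truly exposed subject is classified as exposed
    specificity: probability that a truly non-exposed subject is classified as non-exposed
    """
    # Subjects and expected cases classified as exposed
    cls_exp_n = sensitivity * n_exp + (1 - specificity) * n_unexp
    cls_exp_cases = (sensitivity * n_exp * risk_exp
                     + (1 - specificity) * n_unexp * risk_unexp)
    # Subjects and expected cases classified as non-exposed
    cls_unexp_n = (1 - sensitivity) * n_exp + specificity * n_unexp
    cls_unexp_cases = ((1 - sensitivity) * n_exp * risk_exp
                       + specificity * n_unexp * risk_unexp)
    return (cls_exp_cases / cls_exp_n) / (cls_unexp_cases / cls_unexp_n)


# Hypothetical cohort with a true risk ratio of 0.20 / 0.10 = 2.0
true_rr = misclassified_rr(1000, 1000, 0.20, 0.10, sensitivity=1.0, specificity=1.0)
observed_rr = misclassified_rr(1000, 1000, 0.20, 0.10, sensitivity=0.8, specificity=0.9)

print(f"True RR: {true_rr:.2f}")
print(f"Expected RR with misclassification: {observed_rr:.2f}")
```

In this example the expected risk ratio moves from 2.0 toward 1.0, the typical direction of bias for random (non-differential) misclassification of a dichotomous exposure.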
47Prospective Cohort Study
Non-Participation Participants often differ
from non-participants in important ways. A
valid result will not be affected by
non-participation, although generalizability may
be affected. True exposure/disease
relationship will be biased if non-participation
is related to both the exposure and other risk
factors for the outcome under study.
48Review of Recommended Reading: CRP, LDL, and First
CVD Events
--- Prospective cohort study within a randomized
trial of 27,939 apparently healthy American women
(1992-95) in the Women's Health Study (WHS). ---
WHS is an ongoing evaluation of aspirin and
vitamin E for primary prevention of CVD events
among women > 45 yrs. --- Before randomization,
blood samples collected and stored with assays
performed for CRP and LDL. --- First CVD event
defined as non-fatal MI, non-fatal ischemic
stroke, coronary revascularization, and death
from cardiovascular causes. --- Participants
followed for average of 8 years. --- Analyses
conducted separately by HRT status.
49Discussion Question 1
Interpret results from figure 1 and table
2. Among CRP and LDL cholesterol at
baseline, which variable seems to best
predict the risk of cardiovascular disease over 8
years of follow-up?
Source: NEJM 2002;347:1557-1565.
50Discussion Question 2
Interpret the results from table 3. For risk
estimates associated with CRP, is there evidence
of effect measure modification by hormone
replacement therapy status? What about the risk
estimates for LDL?
Discussion Question 3
Interpret the results from figure 3 and 4. Do
baseline levels of CRP and LDL cholesterol
independently predict subsequent cardiovascular
risk, or do they simply measure a common (shared)
domain of risk?