Title: Concepts of Causation
1Concepts of Causation
- Introduction to Epidemiology
- Fall 2002
2Epidemiologic Reasoning
- Derive inferences regarding possible causal
relationships
- Determine whether these relationships are
spurious or true
- Today we
- discuss causal relationships
- introduce threats to validity
- discuss play of chance (statistical association)
3Epidemiologic Study of disease etiology
- unplanned or natural experiments
- residents of Bhopal, India exposed to toxic
chemicals
- residents of Hiroshima and Nagasaki exposed to
atomic bomb radiation in 1945
4Sequence of studies in humans
5Study Designs
- Descriptive Studies
- Population
- correlation or ecologic studies
- Individual
- Case Reports
- Case Series
- Cross Sectional Survey
6Study Designs
- Analytic Studies
- Observational Studies
- Case-Control or Case-Comparison
- Cohort Studies
- Intervention Studies
- Clinical Trials
7APPLICATIONS
- Study of Risk to Individuals
- Observational studies
- Case-control study design
- Cohort study design
- Clinical/Policy decision criteria of causality
- Intervention study design
8Causal Factor or Risk Factor
- A cause is a factor (or member of a set of
factors) which results in a sequence of events
that eventually result in an effect.
- Causation is often unknown
- disease varies by category of risk factor
- risk factor precedes onset of disease
- observed association is not due to error
9Risk Factor or Cause
10Risk Factor or Cause
- Is current oral contraceptive use associated with
increased risk of myocardial infarction in
premenopausal female nurses?
11Risk Factor or Cause
- Does use of oral contraceptives increase the risk
of developing urinary tract infections among
women aged 16-49?
12Risk Factor or Cause
- Are women who have had one MI more likely to be
current OC users than women who have not had an
MI?
13Risk Factor or Cause
- Is CHD mortality (age-adjusted rates by state)
associated with per capita cigarette sales?
15The epidemiologic triangle
[Figure: the epidemiologic triangle. Labels: Host, Agent, Environment, Vector]
16Henle-Koch Postulates
- First formulated by Henle and
- adapted by Robert Koch in 1877
- to describe the discovery of the tubercle
bacillus.
17These postulates should be met
- before a causative relationship can be accepted
between a disease agent and the disease in
question
- The agent must be shown to be present in every
case of the disease by isolation in a pure
culture.
- The agent must not be found in cases of other
disease.
- Once isolated, the agent must be capable of
reproducing the disease in experimental animals.
- The agent must be recovered from the experimental
disease produced
18Difficult Issues / Inferring
- How do we prove anything?
- Henle-Koch postulates used to prove causation of
microorganisms in pathogenesis of an infectious
disease
- Agent must be present in every case
- How to apply to cardiovascular disease: overweight,
physically inactive, smoking, high blood
pressure, elevated cholesterol
- Agent should occur in no other disease (one
agent-one disease model): what about cigarette
smoking?
19Difficult Issues / Inferring
- Exposure of healthy subjects to suspected
agents: ethical?
- must rely on epidemiologic evidence from
observational studies
- even agent, host, environmental model not
sufficient
20Henle-Koch Postulates
- Useful in infectious diseases
- Not useful with complicated chronic diseases
- Heart disease
- Diabetes
- Cancer
- Violence
21Bradford Hill Criteria (1965)
- criteria for assessing causality
- Strength
- Consistency
- Specificity
- Temporality
- Plausibility
- Coherence
- Dose Response
- Analogy
- Experimental evidence
22Bradford Hill Criteria
- Hill stated
- None of my criteria can bring indisputable
evidence for or against the cause-and-effect
hypothesis
- None can be required as sufficient alone
23DETERMINATION OF CAUSATION
- One way of determining causation is personal
experience by directly observing a sequence of
events.
24Personal Experience / Insufficient
- Long latency period between exposure and disease
- Common exposure to the risk factor
- Small risk from common or uncommon exposure
- Rare disease
- Common disease
- Multiple causes of disease
25OBSERVATIONAL STUDIES
- The general QUESTION
- Is there a cause and effect relationship between
the presence of factor X and the development of
disease Y?
26OBSERVATIONAL STUDIES
- The answer is made by inference and relies on a
summary of all valid evidence.
- Temporal sequence
- Strength of the association
- Dose-response
- Replication of findings (consistency)
- Biologic credibility
- Consideration of alternate explanations
- Cessation of exposure (dynamics)
- Specificity
27SMOKING AND LUNG CANCER
- 1. Strength of Association
- The relative risks for the association of smoking
and lung cancer are in the order of
- 2. Biologic Credibility
- The burning of tobacco produces carcinogenic
compounds which are inhaled and come into contact
with pulmonary tissue.
28SMOKING AND LUNG CANCER
- 3. Replication of findings
- The association of cigarette smoke and lung
cancer is found in both sexes, in all races, in
all socioeconomic classes, etc.
- 4. Temporal Sequence
- Cohort studies clearly demonstrate that smoking
precedes lung cancer and that lung cancer does
not cause an individual to become a cigarette
smoker.
29SMOKING AND LUNG CANCER
- 5. Dose-Response
- The more cigarette smoke an individual inhales
over a lifetime, the greater the risk of
developing lung cancer.
- 6. Dynamics (cessation of exposure)
- Reduction in cigarette smoking reduces the risk
of developing lung cancer.
30- Smoking is cited as a cause of lung cancer,
however. . .
- . . . smoking is not necessary (is not a
prerequisite) to get lung cancer. Some people
get lung cancer who have never smoked.
- . . . smoking alone does not cause lung cancer.
Some smokers never get lung cancer.
- Smoking is a member of a set of factors (i.e.,
web of causation) which cause lung cancer.
- The identities of all the other factors in the set
are unknown. (One factor in the web of causation
is probably genetic susceptibility.)
31Causation
- "The world is richer in associations than
meanings, and it is part of wisdom to
differentiate the two." John Barth
- "All scientific work is incomplete whether it
be observational or experimental. All scientific
work is liable to be upset or modified by
advancing knowledge. That does not confer upon
us the freedom to ignore the knowledge we already
have, or to postpone the action it appears to
demand at a given time." Bradford Hill (1965)
32Nature of Evidence
- 1. Temporal Sequence
- exposure precedes disease
- 2. Strength of Association
- significant high risk
- 3. Dose-Response
- higher dose exposure, higher risk
33Nature of Evidence
- 4. Replication of Findings
- consistent in populations
- 5. Biologic Credibility
- exposure linked to pathogenesis
- 6. Consideration of alternative explanations
- the extent to which other explanations have been
considered.
34Nature of Evidence
- 7. Cessation of exposure (Dynamics)
- removal of exposure reduces risk
- 8. Specificity
- specific exposure is associated with only one
disease
35necessary / sufficient
[Figure: causal complexes, labeled Disease Not Present, Disease Present, Disease Present]
A is necessary since it appears in each
sufficient causal complex. A is not sufficient.
36necessary / sufficient
- necessary and sufficient
- the factor always causes disease and disease is
never present without the factor
- most infectious diseases
- necessary but not sufficient
- multiple factors are required
- cancer
- sufficient but not necessary
- many factors may cause same disease
- leukemia
- neither sufficient nor necessary
- multiple cause
37necessary / sufficient
- Component cause
- any one of a set of conditions which are
necessary for the completion of a sufficient
cause (piece of pie)
- Necessary cause
- a component cause that is a member of every
sufficient cause
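The component-cause and necessary-cause definitions above can be sketched with sets. This is an illustrative sketch only: the factor letters and the three sufficient causes ("pies") are hypothetical and do not come from the slides.

```python
# Each sufficient cause ("pie") is modeled as a set of component causes.
# The letters are hypothetical placeholders, not real risk factors.
sufficient_causes = [
    {"A", "B", "C"},
    {"A", "D", "E"},
    {"A", "F"},
]


def component_causes(causes):
    """Every factor that appears in at least one sufficient cause."""
    return set().union(*causes)


def necessary_causes(causes):
    """Factors that are members of every sufficient cause."""
    return set.intersection(*causes)


def is_sufficient_alone(factor, causes):
    """True only if the factor by itself makes up a sufficient cause."""
    return {factor} in causes


print(sorted(component_causes(sufficient_causes)))  # ['A', 'B', 'C', 'D', 'E', 'F']
print(necessary_causes(sufficient_causes))          # {'A'}
print(is_sufficient_alone("A", sufficient_causes))  # False
```

In this made-up example A is necessary (it is in every pie) but not sufficient (no pie consists of A alone).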
38necessary / sufficient
- Few causes are necessary and sufficient
- HPV is necessary for cervical cancer but not
sufficient because not every woman infected with
HPV develops cervical cancer.
39necessary / sufficient
- Few causes are necessary and sufficient
- High cholesterol is neither necessary nor
sufficient for CVD because many individuals who
develop CVD do not have high serum cholesterol
levels
40H. pylori
- Temporal relationship
- 11% of chronic gastritis patients go on to
develop duodenal ulcers over a 10-year period.
- Strength
- H. pylori is found in at least 90% of patients
with duodenal ulcer
- Dose response
- density of H. pylori is higher in patients with
duodenal ulcer than in patients without
- Consistency
- association has been replicated in other studies
41H. pylori
- Biologic plausibility
- originally no biologic plausibility
- then H. pylori binding sites were found
- now known that H. pylori induces inflammation
- Specificity
- prevalence of H. pylori in patients with duodenal
ulcers is 90% to 100%
- Consistency with other knowledge
- prevalence is the same in males and females
42Establishing Causality
- trials
- randomized, double-blind, placebo-controlled with
sufficient power and appropriate analysis
- cohort studies, case-control studies
- hypothesis specified prior to analysis
- case-series
- no comparison groups
43Accuracy and Sources of Error
- Purpose of epidemiologic study
- To estimate the effect of an exposure on an
outcome
- Main objective
- To measure the exposure and outcome accurately
- That is, to measure without error
44Evaluate Validity
- Absence of systematic errors
- Findings represent the study sample
- Findings are generalizable to larger populations
- Internal validity is the primary objective
- Without internal validity
- there is no reason to generalize
45Threats to validity
- Internal validity
- do these results represent what is really
happening in the study population?
- are the results due to
- Bias
- Confounding
- Chance
46Evaluation of findings
- Bias
- Confounding
- Play of Chance
- Frequency Measures
- Prevalence
- Incidence
- Measures of Association
- Causal Inference
47BIAS
- systematic errors in collecting or interpreting
data such that there is deviation of results or
inferences from the truth.
- selection bias: noncomparable criteria used to
enroll participants.
- information bias: noncomparable information is
obtained due to interviewer bias or due to recall
bias
48BIAS
- Bias has to do with research design
- Bias results from systematic flaws in
- study design
- data collection
- analysis
- interpretation
49BIAS
- Two major types to consider
- selection bias: non-comparable
criteria used to enroll participants
- information bias: non-comparable
information obtained due to
interviewer or recall bias
50Confounding
- a mixing of effects
- between the exposure, the disease, and other
factors associated with both the exposure and the
disease
- such that the effects of the two
processes are not separated.
51Confounding
- Confounding results when the effect of an
exposure on the disease (or outcome) is distorted
because of the association of exposure with other
factor(s) that influence the outcome under study.
52Confounding
[Figure from Biomedical Bestiary, Michael, Boyce & Wilcox, Little Brown, 1984.
Nodes: Gambling; Cancer; Smoking, Alcohol, other Factors.
Links: observed association (presumed causation); unobserved association; true association]
53Play of Chance
- Measures of Association
- Statistical Issues
54Statistical Issues
- The evaluation of the role of chance is done in 2
steps
- Estimate the magnitude of the association.
- Hypothesis testing
- Calculate a test statistic,
- obtain a p value or confidence interval
55Statistical Issues
- We have to remember that epidemiologic studies
draw inferences about the experiences of an
entire population based on an evaluation of only
a sample.
56Magnitude of Association
- Epidemiologists tend to view cause and effect as
binary variables
- Either you are exposed (or diseased)
- Or you aren't exposed (or diseased)
- How we measure these variables can have a
profound influence on our results
57Statistical Issues
- What do we mean by chance and how does this
relate to determining a true association?
- Where do we start?
58Magnitude of Association
- THE 2 x 2 table
- Disease on top, exposure on the left
59Measures of Disease Association
60COHORT STUDY
- Disease Occurrence Among Exposed
- Incidence (Ie) = a / (a + b)
- Disease Occurrence Among non-Exposed
- Incidence (Io) = c / (c + d)
61COHORT STUDY
- Disease Occurrence Among Exposure and Non-Exposure
62Magnitude of Association
- Risk Ratio
- Relative Risk = Ie / Io
- Risk Difference: excess risk of disease among
the exposed
- Attributable Risk = Ie - Io
- Attributable Risk % = (Ie - Io) / Ie x 100
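A minimal sketch of these cohort measures, assuming the usual 2 x 2 layout with disease on top and exposure on the left: a = exposed with disease, b = exposed without disease, c = unexposed with disease, d = unexposed without disease. The function name and the counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def cohort_measures(a, b, c, d):
    """Measures of association from a cohort 2 x 2 table.

    a: exposed, diseased        b: exposed, not diseased
    c: unexposed, diseased      d: unexposed, not diseased
    """
    ie = a / (a + b)  # incidence among the exposed
    io = c / (c + d)  # incidence among the non-exposed
    return {
        "Ie": ie,
        "Io": io,
        "relative_risk": ie / io,
        "attributable_risk": ie - io,
        "attributable_risk_percent": (ie - io) / ie * 100,
    }


# Hypothetical cohort: 100 exposed (30 cases) and 100 unexposed (10 cases).
measures = cohort_measures(a=30, b=70, c=10, d=90)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the exposed have three times the incidence of the unexposed, and about two thirds of the incidence among the exposed is attributable to the exposure.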
63Case-Control Study
- Odds of Exposed vs Non-Exposed Among Disease and
Non-Disease Cases
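In a case-control study incidence cannot be computed directly, so the odds of exposure among cases are compared with the odds of exposure among controls. The sketch below assumes the same 2 x 2 layout (a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls); the function name and the counts are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Exposure odds ratio from a case-control 2 x 2 table.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    """
    odds_exposure_cases = a / c
    odds_exposure_controls = b / d
    # Algebraically the same as the cross-product ratio (a * d) / (b * c).
    return odds_exposure_cases / odds_exposure_controls


# Hypothetical study: 100 cases (40 exposed) and 100 controls (20 exposed).
estimate = odds_ratio(a=40, b=20, c=60, d=80)
print(f"OR = {estimate:.2f}")
```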
64Statistical Issues
- Evaluating chance
- This can be done by calculating a test statistic
of the general format
65Statistical Issues Primarily Sampling Issues
- p-value: the probability of obtaining a sample
showing an association of the observed size or
larger by chance alone under the hypothesis that
no association exists.
- Confidence interval: a range of values that one
can say, with a specific degree of confidence,
contains the true population value.
66Statistical Issues
- The p value indicates the probability that
findings at least as extreme as those observed
would have occurred by chance alone.
67Statistical Issues
- A statistically significant finding does not mean
that the results DID NOT occur by chance
- only that it is unlikely that the findings did
occur by chance.
- A non-significant finding does not mean that
there is no association
- only that chance cannot be excluded as an
explanation for the finding.
68Statistical Issues
- More often in epidemiology we are examining
discrete data
- the 2 x 2 table presents discrete data.
- Here we are testing whether the distribution of
counts in the 4 cells is different than expected
under the null hypothesis.
69Statistical Issues
- All tests of statistical significance lead to a
probability statement
- which is usually expressed as a p value
70Statistical Issues
- But how do we determine the expected value for
the cells of a 2 x 2 table?
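One common answer, sketched here under the assumption that the test in question is the Pearson chi-square test: under the null hypothesis of no association, the expected count in each cell is (row total x column total) / grand total. The function names and counts are hypothetical; the p value uses the fact that a chi-square variable with 1 degree of freedom is a squared standard normal.

```python
import math


def expected_counts(a, b, c, d):
    """Expected cell counts of a 2 x 2 table under no association."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return [
        [row1 * col1 / n, row1 * col2 / n],
        [row2 * col1 / n, row2 * col2 / n],
    ]


def chi_square(a, b, c, d):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    observed = [[a, b], [c, d]]
    expected = expected_counts(a, b, c, d)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def p_value_1df(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed develop disease.
expected = expected_counts(30, 70, 10, 90)
statistic = chi_square(30, 70, 10, 90)
print(expected)
print(f"chi-square = {statistic:.2f}, p = {p_value_1df(statistic):.5f}")
```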
71Statistical Issues
- A probability of 0.05 is the level set for
statistical significance
- this is the usual and arbitrary cut-off
- If p < 0.05, we conclude that chance is an
unlikely explanation for the finding
- these results would occur by chance only very
rarely
- The null hypothesis is rejected, and the
statistical association is said to be significant
72Statistical Issues
- If p > 0.05, we conclude that chance cannot be
excluded as an explanation for the finding
- and we fail to reject the null hypothesis.
- That is
- these results are likely to occur by chance more
than 5% of the time.
73Statistical Issues
- No p value
- however small - completely excludes chance
- No p value
- however large - completely mandates chance
74Statistical Issues
- p values only evaluate the role of chance
- they say nothing about other alternative
explanations or about causality
- p values reflect the strength of the association
and the study sample size
75Statistical Issues
- A small difference may achieve statistical
significance if the sample size is large
- A large difference may not achieve statistical
significance if the sample size is too small
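The dependence on sample size can be illustrated with a small sketch. All three tables are hypothetical; the chi-square shortcut formula for a 2 x 2 table and the 3.84 critical value (1 degree of freedom, 0.05 level) are standard. The chi-square approximation is unreliable for counts as small as those in the third table, so that row is shown for illustration only.

```python
CRITICAL_VALUE = 3.84  # chi-square, 1 degree of freedom, alpha = 0.05


def chi_square_2x2(a, b, c, d):
    """Shortcut chi-square for a 2 x 2 table: n(ad - bc)^2 / (product of margins)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def relative_risk(a, b, c, d):
    return (a / (a + b)) / (c / (c + d))


# Hypothetical tables given as (a, b, c, d).
tables = {
    "small difference, small sample": (12, 88, 10, 90),
    "small difference, large sample": (1200, 8800, 1000, 9000),
    "large difference, tiny sample": (4, 6, 2, 8),
}

results = {}
for label, cells in tables.items():
    statistic = chi_square_2x2(*cells)
    results[label] = statistic > CRITICAL_VALUE
    print(f"{label}: RR = {relative_risk(*cells):.2f}, "
          f"chi-square = {statistic:.2f}, significant = {results[label]}")
```

The first two tables have the same relative risk of 1.2, yet only the larger one crosses the significance threshold; the third has a relative risk of 2.0 and does not.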
76Statistical Issues
- We address these problems by calculating
confidence intervals
- The confidence interval (CI) gives all the
information of a p value
- PLUS the expected range of effect sizes
77Statistical Issues
- Confidence Interval indicates the range within
which the true magnitude of effect lies with a
certain degree of assurance.
- The degree of assurance is defined by the
significance level you assign, e.g. a level of
0.05 gives a 95% confidence interval.
78Statistical Issues
- If the null value is included in a 95% confidence
interval
- then the corresponding p value is, by definition,
greater than 0.05.
- What do I mean???
79Statistical Issues
- If the null value is not included, the
association is considered to be statistically
significant.
- WHAT IS THE NULL VALUE???
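As a hedged illustration of the question: the null value is the value the measure takes when there is no association, which is 1 for ratio measures (relative risk, odds ratio) and 0 for difference measures (risk difference). The helper name and the intervals below are hypothetical.

```python
NULL_VALUE_RATIO = 1.0       # RR or OR when exposure and disease are unrelated
NULL_VALUE_DIFFERENCE = 0.0  # risk difference when they are unrelated


def excludes_null(lower, upper, null_value):
    """True if the confidence interval does not contain the null value."""
    return not (lower <= null_value <= upper)


# Hypothetical 95% confidence intervals.
print(excludes_null(1.2, 3.5, NULL_VALUE_RATIO))          # True: significant at 0.05
print(excludes_null(0.8, 2.1, NULL_VALUE_RATIO))          # False: not significant
print(excludes_null(-0.02, 0.10, NULL_VALUE_DIFFERENCE))  # False: not significant
```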
80Statistical Issues
- HOWEVER: before we get to the major result, we
need to examine several issues
- 1. What was the question that this study
intended to answer?
- 2. What were the methods used to answer this
question?
- 3. Are there errors in the study design that
might invalidate the results?
81Statistical Issues
- Is chance a likely explanation for the results?
- Is selection bias a likely explanation for the
results?
- Is information bias a likely explanation for the
results?
- Are the authors' conclusions reasonable in terms
of the information presented?
82Statistical Issues
- Test Based CI for either OR or RR
- NOTE: variance for either RR or OR may be
estimated using the chi-square test statistic.
Miettinen, Am J Epidemiol 103:226-235, 1976
83Statistical Issues
- Taylor Series to estimate the ln(OR) variance
Woolf, Ann Human Gen 19:251-253, 1955
Note: e is a function on your calculator. You
need a key marked e^x, and you enter the OR times e
raised to the power of the result between the
brackets.
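The bracketed formula is missing from this transcript. The sketch below uses the standard Woolf (logit) interval, OR x e^(±z x sqrt(1/a + 1/b + 1/c + 1/d)), which matches the calculator instructions in the note. The function name and the counts are hypothetical.

```python
import math


def woolf_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (Taylor series) confidence interval.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    """
    estimate = (a * d) / (b * c)
    se_ln_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = estimate * math.exp(-z * se_ln_or)
    upper = estimate * math.exp(z * se_ln_or)
    return estimate, lower, upper


# Hypothetical case-control data: 40 of 100 cases and 20 of 100 controls exposed.
or_estimate, or_lower, or_upper = woolf_ci(a=40, b=20, c=60, d=80)
print(f"OR = {or_estimate:.2f}, 95% CI {or_lower:.2f} to {or_upper:.2f}")
```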
84Statistical Issues
- Taylor Series to estimate the ln(RR) variance
Katz, Biometrics 34:469, 1978
Note: e is a function on your calculator. You
need a key marked e^x, and you enter the RR times e
raised to the power of the result between the
brackets.
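The bracketed formula is again missing here. The sketch below uses the standard log-based (Katz) interval for a risk ratio, RR x e^(±z x sqrt(b / (a(a + b)) + d / (c(c + d)))). The function name and the counts are hypothetical.

```python
import math


def katz_ci(a, b, c, d, z=1.96):
    """Relative risk with a log-based (Taylor series) confidence interval.

    a: exposed, diseased        b: exposed, not diseased
    c: unexposed, diseased      d: unexposed, not diseased
    """
    estimate = (a / (a + b)) / (c / (c + d))
    se_ln_rr = math.sqrt(b / (a * (a + b)) + d / (c * (c + d)))
    lower = estimate * math.exp(-z * se_ln_rr)
    upper = estimate * math.exp(z * se_ln_rr)
    return estimate, lower, upper


# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed develop disease.
rr_estimate, rr_lower, rr_upper = katz_ci(a=30, b=70, c=10, d=90)
print(f"RR = {rr_estimate:.2f}, 95% CI {rr_lower:.2f} to {rr_upper:.2f}")
```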