Title: Causal Diagrams for Epidemiological Research
1Causal Diagrams for Epidemiological Research
Eyal Shahar, MD, MPH
Professor, Division of Epidemiology and Biostatistics
Mel and Enid Zuckerman College of Public Health
The University of Arizona
2What is it and why does it matter?
- A tool (method) that
- clarifies our wordy or vague causal thoughts about the research topic
- helps us to decide which covariates should enter the statistical model and which should not
- unifies our understanding of confounding bias, selection bias, and information bias
3What is the key question in a non-randomized
study?
- When estimating the effect of E (exposure) on D (disease), what should we adjust for?
- or
- Confounder selection strategy
4Adjusting for Confounders: Common Practice
- The change-in-estimate method
- List potential confounders
- Adjust for (condition on) potential confounders
- Compare adjusted estimate to crude estimate
- (or fully adjusted to partially adjusted)
- Decide whether potential confounders were real confounders
- Decide how much confounding existed
- Premise: The data informs us about confounding.
Are we asking too much from the data?
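The steps of the change-in-estimate method can be sketched in a few lines of Python. This is a minimal illustration, not the author's procedure. The counts are hypothetical, the Mantel-Haenszel risk ratio stands in for "the adjusted estimate", and the 10% cut-off is a commonly used convention that the slides do not specify.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """Counts for one level of a potential confounder C (hypothetical data)."""
    exposed_cases: int
    exposed_total: int
    unexposed_cases: int
    unexposed_total: int


def crude_risk_ratio(strata):
    """Risk ratio after collapsing over all strata (the crude estimate)."""
    a = sum(s.exposed_cases for s in strata)
    n1 = sum(s.exposed_total for s in strata)
    b = sum(s.unexposed_cases for s in strata)
    n0 = sum(s.unexposed_total for s in strata)
    return (a / n1) / (b / n0)


def mantel_haenszel_risk_ratio(strata):
    """Mantel-Haenszel summary risk ratio (used here as the C-adjusted estimate)."""
    num = sum(s.exposed_cases * s.unexposed_total / (s.exposed_total + s.unexposed_total)
              for s in strata)
    den = sum(s.unexposed_cases * s.exposed_total / (s.exposed_total + s.unexposed_total)
              for s in strata)
    return num / den


def change_in_estimate(strata, threshold=0.10):
    """Compare crude and adjusted estimates; flag C if they differ by more than threshold."""
    crude = crude_risk_ratio(strata)
    adjusted = mantel_haenszel_risk_ratio(strata)
    relative_change = abs(crude - adjusted) / adjusted
    return crude, adjusted, relative_change, relative_change > threshold


# Hypothetical counts: within each level of C the risk is the same for exposed and unexposed.
example = [
    Stratum(exposed_cases=10, exposed_total=100, unexposed_cases=40, unexposed_total=400),
    Stratum(exposed_cases=120, exposed_total=400, unexposed_cases=30, unexposed_total=100),
]
crude, adjusted, change, flagged = change_in_estimate(example)
print(f"crude RR = {crude:.2f}, adjusted RR = {adjusted:.2f}, flagged = {flagged}")
```

Note that the procedure only reports that the two estimates differ. Nothing in the data says which of the two is closer to the causal effect, which is the concern the slide raises.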
5Adjusting for Confounders: Common Practice
- What is a potential confounder?
- Typically, a cause of the disease that is associated with the exposure
[Diagram: the confounder triangle, with Confounder at the top and E and D below]
- What is the effect of a confounder?
- Contributes to the crude (observed, marginal) association between E and D
6Adjusting for Confounders: Common Practice
- Extension to multiple confounders
[Diagrams: six separate confounder triangles, one for each of C1 through C6, each drawn with its own E and D]
7Adjusting for Confounders: Common Practice: Problems
- A sequence of isolated, independent causal diagrams
- but C1, C2, C3, C4, C5, ... might be connected causally
- Unidirectional arrow: a causal direction
- but what is the meaning of the bidirectional arrow?
- Even with a single confounder, the change-in-estimate method could fail
8Adjusting for Confounders: Problems
- An example where the change-in-estimate method fails
[Diagram: U1, U2, C, E, D]
- The crude estimate may be closer to the truth than the C-adjusted estimate
- To be explained
9Alternative: A Causal Diagram
- A method for selecting covariates
- Extension of the confounder triangle
- Premises displayed in the diagram
- New terms
- Path
- Collider on a path
- Confounding path
10Selected references
- Pearl J. Causality: models, reasoning, and inference. 2000. Cambridge University Press
- Greenland S et al. Causal diagrams for epidemiologic research. Epidemiology 1999;10:37-48
- Robins JM. Data, design, and background knowledge in etiologic inference. Epidemiology 2001;11:313-320
- Hernan MA et al. A structural approach to selection bias. Epidemiology 2004;15:615-625
- Shahar E. Causal diagrams for encoding and evaluation of information bias. J Eval Clin Pract (forthcoming)
11A Causal Diagram: Notation and Terms
- An arrow: causal direction between two variables
[Diagram: E → D]
- An arrow could abbreviate both direct and indirect effects
[Diagram: a single arrow from E to D could summarize a fuller diagram that also includes U1, U2, U3]
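12A Causal Diagram: Notation and Terms
- A path between E and D: any sequence of causal arrows that connects E to D (regardless of the direction of the arrows)
[Diagrams: a direct path between E and D, and three paths connecting E to D through U1 and U2]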
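13A Causal Diagram: Notation and Terms
- Circularity (self-causation) does not exist: Directed Acyclic Graph (DAG)
[Diagram: E, U1, D, U2]
- A collider on the path between E and D: a variable on the path into which two arrows point (a common effect)
[Diagram: a path between E and D through U1 and U2, illustrating a collider]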
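The acyclicity requirement and the notion of a collider can both be checked mechanically once a diagram is written as a list of arrows. The sketch below assumes a diagram is encoded as (cause, effect) pairs. The two example graphs are hypothetical and are not reconstructions of the slide's figures.

```python
def is_acyclic(edges):
    """Return True if the directed graph given as (cause, effect) pairs has no cycle."""
    children = {}
    for cause, effect in edges:
        children.setdefault(cause, []).append(effect)
        children.setdefault(effect, [])
    state = {}  # node -> "visiting" (on the current chain) or "done"

    def visit(node):
        if state.get(node) == "visiting":
            return False  # we came back to a node on the current chain: a cycle
        if state.get(node) == "done":
            return True
        state[node] = "visiting"
        ok = all(visit(child) for child in children[node])
        state[node] = "done"
        return ok

    return all(visit(node) for node in children)


def colliders_on_path(path, edges):
    """List the interior nodes of a path into which both neighbouring arrows point."""
    edge_set = set(edges)
    return [
        node
        for prev, node, nxt in zip(path, path[1:], path[2:])
        if (prev, node) in edge_set and (nxt, node) in edge_set
    ]


# Hypothetical diagram: E -> U1 <- U2 -> D (U1 is a common effect of E and U2).
dag = [("E", "U1"), ("U2", "U1"), ("U2", "D")]
# Hypothetical circular diagram: E -> U1 -> D -> U2 -> E.
circular = [("E", "U1"), ("U1", "D"), ("D", "U2"), ("U2", "E")]

print(is_acyclic(dag), is_acyclic(circular))
print(colliders_on_path(["E", "U1", "U2", "D"], dag))
```

A node is a collider only relative to a path: the same variable can be a collider on one path and a non-collider on another.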
14A Causal Diagram: Notation and Terms
- A confounding path for the effect of E on D: any path between E and D that meets the following criteria
- The arrow next to E points to E
- There are no colliders on the path
[Diagram: C at the top; a chain U1, U2, U3 leads from C to E, and a chain V1, V2 leads from C to D]
In short: a path showing a common cause of E and D
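The two criteria above translate directly into a small check. The sketch below assumes the diagram is encoded as (cause, effect) pairs and that the path is listed node by node starting at E. The edge list is an assumed reading of the slide's figure (C causing E through U1, U2, U3 and causing D through V1, V2), not a confirmed reconstruction.

```python
def is_confounding_path(path, edges):
    """Apply the two criteria to a path that starts at E and ends at D.

    Assumes every consecutive pair of nodes in `path` is joined by an arrow in `edges`.
    """
    edge_set = set(edges)
    # Criterion 1: the arrow next to E points to E.
    points_to_exposure = (path[1], path[0]) in edge_set
    # Criterion 2: there are no colliders on the path.
    has_collider = any(
        (prev, node) in edge_set and (nxt, node) in edge_set
        for prev, node, nxt in zip(path, path[1:], path[2:])
    )
    return points_to_exposure and not has_collider


# Assumed diagram: C -> U1 -> U2 -> U3 -> E and C -> V1 -> V2 -> D.
triangle = [("C", "U1"), ("U1", "U2"), ("U2", "U3"), ("U3", "E"),
            ("C", "V1"), ("V1", "V2"), ("V2", "D")]
common_cause_path = ["E", "U3", "U2", "U1", "C", "V1", "V2", "D"]

# Hypothetical counter-examples.
causal_chain = [("E", "U1"), ("U1", "D")]                             # E -> U1 -> D
collider_graph = [("U1", "E"), ("U1", "C"), ("V1", "C"), ("V1", "D")]  # E <- U1 -> C <- V1 -> D

print(is_confounding_path(common_cause_path, triangle))                   # common cause C
print(is_confounding_path(["E", "U1", "D"], causal_chain))                # arrow leaves E
print(is_confounding_path(["E", "U1", "C", "V1", "D"], collider_graph))   # C is a collider
```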
15- The paths below are NOT confounding paths for the
effect of E on D
[Diagrams: three paths connecting E and D through C, U1, U2, U3, V1, V2, none of which is a confounding path]
16What can affect the association between E and D? (Why do we observe an association between two variables?)
- Causal path: E causes D
- Causal path: D causes E
- Confounding paths
- Adjustment for colliders on a path from E to D (later)
[Diagrams: E and D; D and E; C with E and D]
17Why does a confounding path affect the crude
(marginal) association between E and D?
- Intuitively
- Association: being able to guess the value of one variable (D) from the value of another (E)
- E → D allows us to guess D from E (and E from D)
- A confounding path allows for sequential guesses along the path
[Diagram: C at the top; a chain U1, U2, U3 leads from C to E, and a chain V1, V2 leads from C to D]
18How can we block a confounding path between E
and D?
- Condition on a variable on the path (on any variable)
- Methods for conditioning
- Restriction
- Stratification
- Regression
[Diagram: C at the top; a chain U1, U2, U3 leads from C to E, and a chain V1, V2 leads from C to D]
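A small exact calculation shows what blocking a confounding path does to the numbers. The sketch assumes the simplest confounding structure, a binary C that affects both E and D, with no effect of E on D. All probabilities are hypothetical. Looking within one level of C corresponds to restriction, and looking within each level in turn corresponds to stratification.

```python
from itertools import product

# Hypothetical probabilities; D depends on C only, so E has no effect on D.
P_C = {0: 0.5, 1: 0.5}          # P(C = c)
P_E_GIVEN_C = {0: 0.2, 1: 0.8}  # P(E = 1 | C = c)
P_D_GIVEN_C = {0: 0.1, 1: 0.3}  # P(D = 1 | C = c)


def bernoulli(p, value):
    """Probability that a binary variable with P(1) = p takes the given value."""
    return p if value == 1 else 1 - p


# Joint distribution over (c, e, d) implied by the diagram E <- C -> D.
joint = {
    (c, e, d): P_C[c] * bernoulli(P_E_GIVEN_C[c], e) * bernoulli(P_D_GIVEN_C[c], d)
    for c, e, d in product((0, 1), repeat=3)
}


def risk(e, c_values=(0, 1)):
    """P(D = 1 | E = e, C in c_values), computed from the joint distribution."""
    num = sum(p for (c, ee, d), p in joint.items() if ee == e and d == 1 and c in c_values)
    den = sum(p for (c, ee, d), p in joint.items() if ee == e and c in c_values)
    return num / den


# Crude (marginal) risk difference: the confounding path is open.
crude_rd = risk(1) - risk(0)
# Risk difference within each level of C: the path is blocked.
rd_by_stratum = {c: risk(1, (c,)) - risk(0, (c,)) for c in (0, 1)}

print(f"crude risk difference: {crude_rd:.2f}")
for c, rd in rd_by_stratum.items():
    print(f"risk difference within C = {c}: {rd:.2f}")
```

Under these assumed numbers the crude risk difference is 0.12 even though E has no effect on D, and it vanishes within each level of C.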
19A point to remember
- We don't need to adjust for confounders (the top of the triangle). Adjustment for any U or V below will do.
- U and V are surrogates for the confounder C
[Diagram: C at the top; a chain U1, U2, U3 leads from C to E, and a chain V1, V2 leads from C to D]
20Example
- If the diagram below corresponds to reality, then we have several options for conditioning
- For example
- On C and U2
- Only on U2
- Only on U3
[Diagram: C at the top; a chain U1, U2, U3 leads from C to E, and a chain V1, V2 leads from C to D]
21What can affect the association between E and D?
- Causal path: E causes D
- Causal path: D causes E
- Confounding paths
- Adjustment for colliders on a path from E to D (now!)
[Diagrams: E and D; D and E; C with E and D]
22Conditioning on a Collider: A Trap
- A collider may be viewed as the opposite of a confounder
- Collider and confounder are symmetrical entities, like matter and anti-matter
[Diagram: C, U1, V1, U2, V2, U3, E, D]
23Conditioning on a Collider: A Trap
- A path from E to D that contains a collider is NOT a confounding path. There is no transfer of guesses across a collider.
- A path from E to D that contains a collider does NOT generate an association between E and D
- Conditioning on the collider, however, will turn that path into a confounding path.
Why?
24Conditioning on a Collider: A Trap
[Diagram: C, V1, U1, U2, V2, U3, E, D, with a horizontal line between the colliding variables]
The horizontal line indicates an association (the possibility of guesses) that was induced by conditioning on a collider
25Properties of a Collider: Intuitive Explanation
- A dataset contains three variables for N cars
- Brake condition (good/bad)
- Street condition in the owner's town (good/bad)
- Involved in an accident in the owner's town? (yes/no)
[Diagram: Brake condition (good, bad) → Accident (yes, no) ← Street condition (good, bad)]
- Accident is a collider.
- Brake condition and street condition are not
associated in the dataset. We cannot use the
data to guess one from the other.
26Properties of a Collider: Intuitive Explanation
- Why can't we make a guess from the data?
- Let's try. Suppose we are told:
- Car A has good brakes and car B has bad brakes.
- This information tells us nothing about the street condition in each owner's town.
- Intuition: a common effect (collider) does not induce an association between its causes (colliding variables)
27Properties of a Collider: Intuitive Explanation
- If, however, we condition (stratify) on the collider (accident), we can make some guesses about the street condition from the brake condition.
Stratum 1: Accident = yes
28Properties of a Collider: Intuitive Explanation
- Similarly, in the other stratum
Stratum 2: Accident = no
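The car example can be worked through numerically. The sketch below is an illustration under invented numbers: brake condition and street condition are independent, and each raises the accident probability by a fixed amount. None of these probabilities come from the slides.

```python
# Hypothetical probabilities for the car example.
P_BAD_BRAKES = 0.3  # not needed below: it cancels out once we condition on brake condition
P_BAD_STREET = 0.4


def p_accident(bad_brakes, bad_street):
    """Assumed accident probability: each problem adds 0.25 to a baseline of 0.05."""
    return 0.05 + 0.25 * bad_brakes + 0.25 * bad_street


def p_bad_street(bad_brakes, accident=None):
    """P(street is bad | brake condition), optionally also given accident status.

    Brakes and streets are independent, so P(street | brakes, accident) is
    proportional to P(street) * P(accident | brakes, street).
    """
    weights = {}
    for street in (0, 1):
        w = P_BAD_STREET if street else 1 - P_BAD_STREET
        if accident is not None:
            p_acc = p_accident(bad_brakes, street)
            w *= p_acc if accident else 1 - p_acc
        weights[street] = w
    return weights[1] / (weights[0] + weights[1])


# Keys: 0 = good brakes, 1 = bad brakes.
marginal = {b: p_bad_street(b) for b in (0, 1)}
among_accidents = {b: p_bad_street(b, accident=1) for b in (0, 1)}
among_no_accidents = {b: p_bad_street(b, accident=0) for b in (0, 1)}

print("all cars:        ", marginal)
print("accident = yes:  ", among_accidents)
print("accident = no:   ", among_no_accidents)
```

In the full dataset the brake condition says nothing about the street. Among cars that had an accident, good brakes make a bad street more likely under these assumed numbers (0.80 versus 0.55), because something else must account for the accident.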
29Properties of a Collider
- In summary
- Conditioning on a collider creates an association
between the colliding variables and, therefore,
may open a confounding path
[Diagrams: U1, U2, C, E, D before conditioning on C, and the same variables after conditioning on C]
30Derivations
- The change-in-estimate method could fail if we condition on colliders, and thereby open confounding paths
- To (rationally) select covariates for adjustment, we must commit to a causal diagram (premises)
- (But we often say that we don't know and can't commit, and hope that the change-in-estimate method will work.)
- Causal inference, like all scientific inference, is conditional on premises (which may be false), not on ignorance
31Derivations
- Do not condition on colliders, if possible
- If you condition on a collider,
- Connect the colliding variables by a line
- Check if you opened a new confounding path
- Condition on another variable to block that new
path
[Diagrams: U1, U2, C, E, D when conditioning on C alone, and when conditioning on C and (U1 or U2)]
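The checklist above can be expressed as a rule for a single path: a path transmits association only if every collider on it has been conditioned on and no other variable on it has. The sketch below is a simplified version of that rule (it ignores conditioning on descendants of a collider). The edge list assumes that the slide's figure is the structure E ← U1 → C ← U2 → D, which is an interpretation of the lost diagram.

```python
def path_is_open(path, edges, conditioned=frozenset()):
    """Simplified blocking rule for one path between E and D.

    The path is open if every collider on it is in `conditioned` and every
    non-collider on it is not. Descendants of colliders are ignored here.
    """
    edge_set = set(edges)
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        is_collider = (prev, node) in edge_set and (nxt, node) in edge_set
        if is_collider and node not in conditioned:
            return False  # an unconditioned collider blocks the path
        if not is_collider and node in conditioned:
            return False  # a conditioned non-collider blocks the path
    return True


# Assumed diagram: E <- U1 -> C <- U2 -> D (C is a collider, E has no effect on D).
edges = [("U1", "E"), ("U1", "C"), ("U2", "C"), ("U2", "D")]
path = ["E", "U1", "C", "U2", "D"]

for adjustment_set in [set(), {"C"}, {"C", "U1"}, {"C", "U2"}]:
    status = "open" if path_is_open(path, edges, adjustment_set) else "blocked"
    print(sorted(adjustment_set), status)
```

Under this assumed diagram the path is blocked with no adjustment, opened by conditioning on C alone, and blocked again once U1 or U2 is added.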
32Practical advice
- Study one exposure at a time
- A model that may be good for exposure A might not be good for exposure B (even if B is in the model)
- Never adjust for an effect of the exposure
- Never adjust for an effect of the disease
- Never select covariates by stepwise regression
- Never look at p-values to decide on confounding
- (actually, never look at p-values)
33Extension to other problems of causal inquiry
- Causation always remains uncertain, even if we
deal with a single confounder
- We draw: [Diagram: C, E, D], and naively condition on C
- Unbeknown to us, the reality happens to be: [Diagram: U1, U2, C, E, D], and our adjustment may fail
34Extension to other problems of causal inquiry
- Estimating the direct effect by conditioning on
an intermediary variable, I
[Diagram: E, I, D]
- We should remember that variable I may be a collider
[Diagram: I, E]
35Extension to other problems of causal inquiry
- Causal diagrams explain the mechanism of selection bias
- Example
- What happens if we estimate the effect of marital status on dementia in a sample of nursing home residents?
- Assume no effect
- both variables affect place of residence (home, or nursing home)
36Extension to other problems of causal inquiry
[Diagram: Marital status → Place of residence (home, nursing home) ← Dementia]
- By studying a sample of nursing home residents,
we are conditioning on a collider (on a sampling
collider) and might create an association
between marital status and dementia in that
stratum
37Extension to other problems of causal inquiry
[Diagram: Marital status → Place of residence (home, nursing home) ← Dementia]
Stratification: Nursing home, Home
38Extensions: control selection bias (Source: Hernan et al, Epidemiology 2004)
39Extensions: control selection bias (Source: Hernan et al, Epidemiology 2004)
[Diagram: E (Estrogen, HRT), D (MI), F, and S (0, 1), where S = 1 is our case-control sample and S = 0 is the remainder of the source cohort]
Association of E and D was created
40Extensions: information bias (LAST EXAMPLE)
41Summary Points
- The change-in-estimate method could fail if we condition on colliders, and thereby open confounding paths
- The theory of causal diagrams extends the idea of a confounder to the multi-confounder case
- Unification of confounding bias, selection bias, and information bias under a single theoretical framework
42- Back-door algorithm
- Sufficient set for adjustment
- Minimally sufficient set
- Differential losses to follow-up
- Time-dependent confounders
- Interpretation of hazard ratios
- Conditioning on a common effect always induces an association between its causes, but this association could be restricted to some levels of the common effect
43-45[Diagram: Age (young, old), Sex, Smoking drive (low, high), Physical activity (low, high), Asthma (yes, no), Smoking status, and FEV1, with a question mark (?) in the diagram]
46[Diagram: Pneumonia, Ulcer, Hospitalization Status (hospitalized, not hospitalized), Abdominal Pain, and Coughing, with a question mark (?) between Abdominal Pain and Coughing; then the same variables after stratification into hospitalized patients and other patients]
47Example: Do men have higher systolic blood pressure than women? (In other words: estimate the gender effect on systolic blood pressure.)
The following table summarizes the answer to this question from two regression models.
So, which is the true estimate and which is biased?
48[Diagram: WHR, Gender, SBP, BMI, Z1, Z2, . .]
49[Diagram: U, WHR, Gender, SBP, BMI, Z1, Z2, . .]
50[Diagram: U, WHR, Gender, SBP, BMI, Z1, Z2, . .]