Title: Recent Advances in Causal Modelling Using Directed Graphs
Slide 1: Causal Inference and Ambiguous Manipulations
Richard Scheines, Grant Reaber, Peter Spirtes
Carnegie Mellon University
Slide 2: 1. Motivation
- Wanted: answers to causal questions
- - Does attending day care cause aggression?
- - Does watching TV cause obesity?
- How can we answer these questions empirically?
- When and how can we estimate the size of the effect?
- Can we know our estimates are reliable?
Slide 3: Causation and Intervention
Conditioning is not the same as intervening ("set=" marks a value imposed by intervention):
- P(Lung Cancer | Tar-stained teeth = no) ≠ P(Lung Cancer | Tar-stained teeth set= no)
Show Teeth Slides
Slide 5: Causal Inference: Experiments
- Gold standard: randomized clinical trials
- - Intervene: randomly assign treatment
- - Observe: response
- - Estimate: P(Response | Treatment assigned)
Slide 6: Causal Inference: Observational Studies
- Collect a sample on
- - Potential causes (X)
- - Response (Y)
- - Covariates (potential confounders Z)
- Estimate P(Y | X, Z)
- Highly unreliable
- We can estimate sampling variability, but we don't know how to estimate specification uncertainty from data
Slide 7: 2. Progress: 1985 to Present
- Representing causal structure, and connecting it to probability
- Modeling interventions
- Indistinguishability and discovery algorithms
Slide 8: Representing Causal Structures
- Causal graph G = {V, E}
- Each edge X → Y represents a direct causal claim:
- - X is a direct cause of Y relative to V
Slide 9: Direct Causation
- X is a direct cause of Y relative to S, iff
- - ∃ z, x1 ≠ x2 : P(Y | X set= x1, Z set= z) ≠ P(Y | X set= x2, Z set= z)
- - where Z = S - {X, Y}
Slide 10: Causal Bayes Networks
The joint distribution factors according to the causal graph, i.e.,
P(V) = ∏ over all X in V of P(X | Immediate Causes of(X))
- P(S = 0) = .7
- P(S = 1) = .3
- P(YF = 0 | S = 0) = .99    P(LC = 0 | S = 0) = .95
- P(YF = 1 | S = 0) = .01    P(LC = 1 | S = 0) = .05
- P(YF = 0 | S = 1) = .20    P(LC = 0 | S = 1) = .80
- P(YF = 1 | S = 1) = .80    P(LC = 1 | S = 1) = .20
P(S, YF, LC) = P(S) P(YF | S) P(LC | S)
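The factorization on this slide can be checked numerically. The sketch below uses the conditional probabilities listed above. Reading S, YF, and LC as smoking, yellow fingers, and lung cancer is an assumption, and the function names are illustrative. It contrasts conditioning on YF = 1 with setting YF = 1, assuming the graph S → YF, S → LC implied by the factorization.

```python
# CPTs copied from the slide. The variable readings (S = smoking,
# YF = yellow fingers, LC = lung cancer) are an assumption for illustration.
P_S = {0: 0.7, 1: 0.3}
P_YF_given_S = {0: {0: 0.99, 1: 0.01}, 1: {0: 0.20, 1: 0.80}}  # [s][yf]
P_LC_given_S = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.80, 1: 0.20}}  # [s][lc]


def joint(s, yf, lc):
    """P(S, YF, LC) = P(S) P(YF | S) P(LC | S)."""
    return P_S[s] * P_YF_given_S[s][yf] * P_LC_given_S[s][lc]


def p_lc_given_yf(lc, yf):
    """Conditioning: P(LC = lc | YF = yf), computed from the joint."""
    num = sum(joint(s, yf, lc) for s in (0, 1))
    den = sum(joint(s, yf, l) for s in (0, 1) for l in (0, 1))
    return num / den


def p_lc_set_yf(lc, yf):
    """Intervening: P(LC = lc | YF set= yf).

    Setting YF replaces P(YF | S) by a point mass on yf, so LC keeps its
    marginal distribution sum_s P(s) P(lc | s); the value yf drops out.
    """
    return sum(P_S[s] * P_LC_given_S[s][lc] for s in (0, 1))


if __name__ == "__main__":
    print(round(p_lc_given_yf(1, 1), 4))  # conditioning on YF = 1
    print(round(p_lc_set_yf(1, 1), 4))    # setting YF = 1
```

With these numbers, conditioning gives about 0.196 while setting gives 0.095: observing YF = 1 is evidence about S, whereas forcing YF = 1 is not.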
Slide 11: Modeling Ideal Interventions: Interventions on the Effect
[Figure: pre-experimental system with variables Room Temperature and Wearing Sweater]
Slide 12: Modeling Ideal Interventions: Interventions on the Cause
[Figure: pre-experimental and post-experimental systems with variables Room Temperature and Wearing Sweater]
Slide 13: Interventions and Causal Graphs
- Model an ideal intervention by adding an intervention variable outside the original system
- Erase all arrows pointing into the variable intervened upon
[Figure: pre-intervention graph; intervene to change Inf; post-intervention graph?]
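The two rules on this slide amount to a small edit of the edge set. A minimal sketch, assuming the pre-intervention graph is the chain Exp → Inf → Rash implied by the factorization on the next slide; the function name `manipulate` and the label "I" for the intervention variable are illustrative choices:

```python
def manipulate(edges, target, intervention="I"):
    """Return the post-intervention edge set for an ideal intervention.

    edges: iterable of (cause, effect) pairs.
    target: the variable intervened upon.
    intervention: name of the added exogenous intervention variable
                  (a hypothetical label).
    """
    # Erase all arrows pointing into the variable intervened upon.
    kept = {(a, b) for (a, b) in edges if b != target}
    # Add the intervention variable from outside the original system.
    kept.add((intervention, target))
    return kept


# Assumed pre-intervention chain: Exp -> Inf -> Rash.
pre = {("Exp", "Inf"), ("Inf", "Rash")}
post = manipulate(pre, "Inf")
print(sorted(post))  # [('I', 'Inf'), ('Inf', 'Rash')]
```

The edge Exp → Inf is gone in the post-intervention graph, while Inf → Rash is untouched.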
Slide 14: Calculating the Effect of Interventions
Pre-manipulation joint distribution:
P(Exp, Inf, Rash) = P(Exp) P(Inf | Exp) P(Rash | Inf)
Intervention on Inf
Post-manipulation joint distribution:
P(Exp, Inf, Rash) = P(Exp) P(Inf | I) P(Rash | Inf)
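The swap of P(Inf | Exp) for P(Inf | I) can be traced with numbers. The sketch below is illustrative only: the slides give no probabilities for this example, so every value in the tables is hypothetical, as are the helper names.

```python
from itertools import product

# Hypothetical CPTs for binary Exp, Inf, Rash; none of these numbers
# come from the slides.
P_EXP = {0: 0.6, 1: 0.4}
P_INF_GIVEN_EXP = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}     # [exp][inf]
P_RASH_GIVEN_INF = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.2, 1: 0.8}}  # [inf][rash]


def pre_joint(e, i, r):
    """P(Exp, Inf, Rash) = P(Exp) P(Inf | Exp) P(Rash | Inf)."""
    return P_EXP[e] * P_INF_GIVEN_EXP[e][i] * P_RASH_GIVEN_INF[i][r]


def post_joint(e, i, r, p_inf_given_I):
    """Post-manipulation joint: P(Inf | Exp) is replaced by P(Inf | I)."""
    return P_EXP[e] * p_inf_given_I[i] * P_RASH_GIVEN_INF[i][r]


def marginal(joint_fn, **fixed):
    """Sum a joint over all binary assignments consistent with `fixed`."""
    total = 0.0
    for e, i, r in product((0, 1), repeat=3):
        vals = {"e": e, "i": i, "r": r}
        if all(vals[k] == v for k, v in fixed.items()):
            total += joint_fn(e, i, r)
    return total


# An intervention that forces Inf = 0.
force_no_inf = {0: 1.0, 1: 0.0}
post = lambda e, i, r: post_joint(e, i, r, force_no_inf)

print(marginal(pre_joint, r=1))  # P(Rash = 1) before the intervention
print(marginal(post, r=1))       # P(Rash = 1) after setting Inf = 0
print(marginal(post, e=1))       # P(Exp = 1), untouched by the intervention
```

Only the factor for the manipulated variable changes; P(Exp) and P(Rash | Inf) are reused as they are, which is what lets the post-manipulation distribution be computed from pre-manipulation quantities.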
Slide 15: Causal Discovery from Observational Studies
Slide 16: Equivalence Class with Latents: PAGs (Partial Ancestral Graphs)
- Assumptions
- - Acyclic graphs
- - Latent variables
- - Sample selection bias
- Equivalence
- - Independence over measured variables
Slide 17: Knowing when we know enough to calculate the effect of interventions: the Prediction Algorithm (SGS, 2000)
Causal Inference from Observational Studies
Slide 18: Causal Discovery from Observational Studies
Slide 19: 3. The Ambiguity of Manipulation
- Assumptions
- - Causal graph known (Cholesterol is a cause of Heart Condition)
- - No unmeasured common causes
Therefore the manipulated and unmanipulated distributions are the same:
P(H | TC = x) = P(H | TC set= x)
Slide 20: The Problem with Predicting the Effects of Acting
Problem: the cause is a composite of causes that don't act uniformly, e.g., Total Blood Cholesterol (TC) = HDL + LDL
- The observed distribution over TC is determined by the unobserved joint distribution over HDL and LDL
- Ideally intervening on TC does not determine a joint distribution for HDL and LDL
Slide 21: The Problem with Predicting the Effects of Setting TC
- P(H | TC set1= x) puts NO constraints on P(H | TC set2= x)
- P(H | TC = x) puts NO constraints on P(H | TC set= x)
- Nothing in the data tips us off about our ignorance, i.e., we don't know that we don't know.
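A toy calculation can make the first two bullets concrete. The sketch below is an illustration under invented assumptions, not data from the talk: the joint distribution over (LDL, HDL), the risk function, and the two "policies" for setting TC are all hypothetical. It computes the observational P(H = 1 | TC = 3) and the result of setting TC = 3 in two different ways.

```python
# Hypothetical joint distribution over (LDL, HDL), in arbitrary units.
P_LDL_HDL = {(1, 1): 0.3, (1, 2): 0.3, (2, 1): 0.1, (2, 2): 0.3}


def p_h_given_parts(ldl, hdl):
    """Hypothetical P(H = 1 | LDL, HDL): LDL raises risk, HDL lowers it."""
    return 0.05 + 0.20 * ldl - 0.05 * hdl


def p_h_given_tc(x):
    """Observational P(H = 1 | TC = x): a mixture over the (LDL, HDL) pairs
    with LDL + HDL = x, weighted by the unobserved joint distribution."""
    pairs = {p: w for p, w in P_LDL_HDL.items() if sum(p) == x}
    total = sum(pairs.values())
    return sum(w * p_h_given_parts(*p) for p, w in pairs.items()) / total


def p_h_set_tc(x, policy):
    """P(H = 1 | TC set= x) when the manipulation realizes TC = x via
    `policy`, a function mapping x to one (LDL, HDL) pair summing to x."""
    ldl, hdl = policy(x)
    assert ldl + hdl == x
    return p_h_given_parts(ldl, hdl)


set1 = lambda x: (x - 1, 1)  # reach TC = x by adjusting LDL, holding HDL = 1
set2 = lambda x: (1, x - 1)  # reach TC = x by adjusting HDL, holding LDL = 1

print(p_h_given_tc(3))      # observational mixture
print(p_h_set_tc(3, set1))  # one way of setting TC = 3
print(p_h_set_tc(3, set2))  # another way of setting TC = 3
```

In this toy model the three quantities come out near 0.21, 0.40, and 0.15. Someone who observes only TC and H sees the first number and has no way to tell from the data that the other two exist.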
Slide 22: Examples Abound
Slide 23: Possible Ways Out
- Causal graph is not known
- - Cholesterol does not really cause Heart Condition
- Confounders (unmeasured common causes) are present
- - LDL and HDL are confounders
Slide 24: Cholesterol is not really a cause of Heart Condition
- Relative to a set of variables S (and a background), X is a cause of Y iff
- - ∃ x1 ≠ x2 : P(Y | X set= x1) ≠ P(Y | X set= x2)
- Total Cholesterol is a cause of Heart Disease
Slide 25: Cholesterol is not really a cause of Heart Condition
- Is Total Cholesterol a direct cause of Heart Condition relative to {TC, LDL, HDL, HD}?
- TC is logically related to LDL, HDL, so manipulating it once LDL and HDL are set is impossible.
Slide 26: LDL, HDL are confounders
- No way to manipulate TC without affecting HDL, LDL
- HDL, LDL are logically related to TC
Slide 27: Logico-Causal Systems
- S: atomic variables
- - independently manipulable
- - effects of all manipulations are unambiguous
- S': defined variables
- - defined logically from variables in S
- For example
- - S = {LDL, HDL, HD, Disease1, Disease2}
- - S' = {TC}
Slide 28: Logico-Causal Systems: Adding Edges
S = {LDL, HDL, HD, D1, D2}, S' = {TC}
[Figure: system over S; system over S ∪ S']
TC → HD iff manipulations of TC are unambiguous wrt HD
Slide 29: Logico-Causal Systems: Unambiguous Manipulations
For each variable X in S', let Parents(X) be the set of variables in S that logically determine X, i.e., X = f(Parents(X)), e.g., TC = LDL + HDL.
Inv(x) = set of all values p of Parents(X) s.t. f(p) = x
A manipulation of a variable X in S' to a value x wrt another variable Y is unambiguous iff
∀ p1 ≠ p2 [P(Y | p1 ∈ Inv(x)) = P(Y | p2 ∈ Inv(x))]
TC → HD iff all manipulations of TC are unambiguous wrt HD
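The definitions of Inv(x) and of an unambiguous manipulation translate fairly directly into code. The following is a minimal sketch under stated assumptions: Y is binary, P(Y | p) is supplied as a function returning P(Y = 1) when Parents(X) is set to p, and the value ranges and both response functions are hypothetical.

```python
from itertools import product


def inv(f, parent_values, x):
    """Inv(x): all joint values p of Parents(X) with f(p) == x."""
    return [p for p in product(*parent_values) if f(*p) == x]


def is_unambiguous(f, parent_values, x, p_y_given_parents, tol=1e-12):
    """True iff P(Y | p) is the same for every p in Inv(x).

    p_y_given_parents(p) returns P(Y = 1 | Parents(X) set= p); treating Y
    as binary is a simplifying assumption of this sketch.
    """
    probs = [p_y_given_parents(p) for p in inv(f, parent_values, x)]
    return all(abs(q - probs[0]) <= tol for q in probs)


def tc(ldl, hdl):
    # TC = LDL + HDL, as on the slide.
    return ldl + hdl


levels = [(1, 2), (1, 2)]  # hypothetical value ranges for LDL and HDL

# Hypothetical case 1: HD responds to LDL and HDL differently.
differs = lambda p: 0.05 + 0.20 * p[0] - 0.05 * p[1]
# Hypothetical case 2: HD responds only to the sum LDL + HDL.
only_sum = lambda p: 0.10 * (p[0] + p[1])

print(inv(tc, levels, 3))                       # [(1, 2), (2, 1)]
print(is_unambiguous(tc, levels, 3, differs))   # False
print(is_unambiguous(tc, levels, 3, only_sum))  # True
```

Under the slide's criterion, the edge TC → HD would be added only in a case like the second one, and only if the check passes for every value x of TC.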
Slide 30: Logico-Causal Systems: Removing Edges
S = {LDL, HDL, HD, D1, D2}, S' = {TC}
[Figure: system over S; system over S ∪ S']
Remove LDL → HD iff LDL _||_ HD | TC
Slide 31: Logico-Causal Systems: Faithfulness
Faithfulness: independences are entailed by structure, not by special parameter values.
Crucial to inference
- Effect of TC on HD unambiguous
- Unfaithfulness: LDL _||_ HDL | TC
- Because LDL and TC determine HDL, and similarly, HDL and TC determine LDL
Slide 32: Effect on Prediction Algorithm
Still sound but less informative
Observed system: TC, HD, D1, D2

Manipulate   Effect on   Assume manipulation unambiguous   Manipulation maybe ambiguous
Disease 1    Disease 2   None                              None
Disease 1    HD          Can't tell                        Can't tell
Disease 1    TC          Can't tell                        Can't tell
Disease 2    Disease 1   None                              None
Disease 2    HD          Can't tell                        Can't tell
Disease 2    TC          Can't tell                        Can't tell
TC           Disease 1   None                              Can't tell
TC           Disease 2   None                              Can't tell
TC           HD          Can't tell                        Can't tell
HD           Disease 1   None                              Can't tell
HD           Disease 2   None                              Can't tell
HD           TC          Can't tell                        Can't tell
Slide 33: Effect on Prediction Algorithm
Observed system: TC, HD, D1, D2, X
Not completely sound
No general characterization of when the Prediction Algorithm, suitably modified, is still informative and sound. Conjectures, but no proof yet.
- Example
- - If observed system has no deterministic relations
- - All orientations due to marginal independence relations are still valid
Slide 34: Effect on Causal Inference of Ambiguous Manipulations
- Experiments, e.g., RCTs
- - Manipulating treatment is
- - - unambiguous → sound
- - - ambiguous → unsound
- Observational studies, e.g., Prediction Algorithm
- - Manipulation is
- - - unambiguous → potentially sound
- - - ambiguous → potentially sound
Slide 35: References
- Spirtes, P., Glymour, C., and Scheines, R. (2000). Causation, Prediction, and Search, 2nd Edition. MIT Press.
- Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge Univ. Press.
- Spirtes, P., Scheines, R., Glymour, C., Richardson, T., and Meek, C. (2004). Causal Inference. In Handbook of Quantitative Methodology in the Social Sciences, ed. David Kaplan, Sage Publications, 447-478.
- Spirtes, P., and Scheines, R. (2004). Causal Inference of Ambiguous Manipulations. In Proceedings of the Philosophy of Science Association Meetings, 2002.
- Reaber, Grant (2005). The Theory of Ambiguous Manipulations. Masters Thesis, Department of Philosophy, Carnegie Mellon University.