Title: Recent Advances in Causal Modelling Using Directed Graphs
1The TETRAD Project: Computational Aids to Causal Discovery
Peter Spirtes, Clark Glymour, Richard Scheines, and many others
Department of Philosophy, Carnegie Mellon
2Agenda
- Morning I: Theoretical Overview (Representation, Axioms, Search)
- Morning II: Research Problems
- Afternoon: TETRAD Demo - Workshop
3Part I Agenda
- Motivation
- Representation
- Connecting Causation to Probability (Independence)
- Searching for Causal Models
- Improving on Regression for Causal Inference
41. Motivation
- Non-experimental Evidence
- Typical Predictive Questions
- Can we predict aggressiveness from the amount of violent TV watched?
- Causal Questions
- Does watching violent TV cause aggression?
- I.e., if we intervene to change TV watching, will the level of aggression change?
5Causal Estimation
When and how can we use non-experimental data to
tell us about the effect of an intervention?
- Manipulated Probability P(Y | X set= x, Z = z)
- from
- Unmanipulated Probability P(Y, X, Z)
6Spartina in the Cape Fear Estuary
7What Factors Directly Influence Spartina Growth in the Cape Fear Estuary?
- pH, salinity, sodium, phosphorus, magnesium, ammonia, zinc, potassium, what?
- 14 variables for 45 samples of Spartina from Cape Fear Estuary.
- Biologist concluded salinity must be a factor.
- Bayes net analysis says only pH directly affects Spartina biomass.
- Biologist's subsequent greenhouse experiment says that if pH is controlled for, variations in salinity do not affect growth, but if salinity is controlled for, variations in pH do affect growth.
82. Representation
- Association and causal structure, qualitatively
- Interventions
- Statistical Causal Models
- Bayes Networks
- Structural Equation Models
9Causation and Association
X and Y are associated, i.e., not independent, ¬(X _||_ Y), iff ∃x1 ≠ x2 P(Y | X = x1) ≠ P(Y | X = x2)
Association is symmetric: X is associated with Y ⇔ Y is associated with X
- X is a cause of Y iff
- ∃x1 ≠ x2 P(Y | X set= x1) ≠ P(Y | X set= x2)
- Causation is asymmetric: X → Y does not imply Y → X
10Direct Causation
- X is a direct cause of Y relative to S, iff
- ∃z, x1 ≠ x2 P(Y | X set= x1, Z set= z) ≠ P(Y | X set= x2, Z set= z)
- where Z = S - {X, Y}
11Causal Graphs
- Causal Graph G = {V, E}
- Each edge X → Y represents a direct causal claim:
- X is a direct cause of Y relative to V
[Figure: example causal graph, Chicken Pox]
12Causal Graphs
Common Cause Complete
13Modeling Ideal Interventions
Interventions on the Effect
[Figure: pre-experimental and post-intervention systems over Room Temperature and Sweaters On]
14Modeling Ideal Interventions
Interventions on the Cause
[Figure: pre-experimental and post-intervention systems over Sweaters On and Room Temperature]
15Ideal Interventions and Causal Graphs
- Model an ideal intervention by adding an intervention variable outside the original system
- Erase all arrows pointing into the variable intervened upon
[Figure: pre-intervention graph; intervene to change Inf; post-intervention graph?]
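The two steps above can be sketched in code. This is only an illustrative sketch: it assumes a graph stored as a dict mapping each variable to the set of its direct causes, and the function name `intervene`, the label given to the intervention variable, and the example edges (Exposure → Infected → Symptoms) are hypothetical choices, not taken from the slides.

```python
def intervene(parents, target):
    """Return the post-intervention graph for an ideal intervention on `target`.

    `parents` maps each variable to the set of its direct causes.
    """
    # Copy so the pre-intervention graph is left untouched.
    post = {v: set(ps) for v, ps in parents.items()}
    # Step 1: add an intervention variable outside the original system.
    intervention = f"Intervention({target})"
    post[intervention] = set()
    # Step 2: erase all arrows into the target; only the intervention points to it.
    post[target] = {intervention}
    return post


# Hypothetical pre-intervention graph: Exposure -> Infected -> Symptoms
pre = {"Exposure": set(), "Infected": {"Exposure"}, "Symptoms": {"Infected"}}
post = intervene(pre, "Infected")
print(post["Infected"])   # {'Intervention(Infected)'}
print(post["Symptoms"])   # {'Infected'}
```

Arrows out of the manipulated variable are kept, so its effects still respond to the intervention.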
16Conditioning vs. Intervening
P(Y | X = x1) vs. P(Y | X set= x1) (Teeth slides)
17Causal Bayes Networks
The joint distribution factors according to the causal graph, i.e.,
P(V) = ∏ over all X in V of P(X | Immediate Causes of(X))
- P(S = 0) = .7
- P(S = 1) = .3
- P(YF = 0 | S = 0) = .99    P(LC = 0 | S = 0) = .95
- P(YF = 1 | S = 0) = .01    P(LC = 1 | S = 0) = .05
- P(YF = 0 | S = 1) = .20    P(LC = 0 | S = 1) = .80
- P(YF = 1 | S = 1) = .80    P(LC = 1 | S = 1) = .20
P(S, YF, LC) = P(S) P(YF | S) P(LC | S)
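As a worked illustration of the factorization, the following Python sketch builds the joint distribution from the conditional probability tables on this slide and checks two consequences: the joint sums to one, and YF and LC are associated marginally even though neither causes the other. The dictionary layout and variable names are illustrative assumptions.

```python
from itertools import product

# Conditional probability tables from the slide (S, YF, LC are binary).
p_s = {0: 0.7, 1: 0.3}
p_yf_given_s = {(0, 0): 0.99, (1, 0): 0.01, (0, 1): 0.20, (1, 1): 0.80}  # key: (yf, s)
p_lc_given_s = {(0, 0): 0.95, (1, 0): 0.05, (0, 1): 0.80, (1, 1): 0.20}  # key: (lc, s)


def joint(s, yf, lc):
    """P(S, YF, LC) = P(S) P(YF | S) P(LC | S)."""
    return p_s[s] * p_yf_given_s[(yf, s)] * p_lc_given_s[(lc, s)]


total = sum(joint(s, yf, lc) for s, yf, lc in product((0, 1), repeat=3))

# Marginal and conditional probabilities of LC = 1.
p_lc1 = sum(joint(s, yf, 1) for s, yf in product((0, 1), repeat=2))
p_yf1 = sum(joint(s, 1, lc) for s, lc in product((0, 1), repeat=2))
p_lc1_given_yf1 = sum(joint(s, 1, 1) for s in (0, 1)) / p_yf1

print(f"sum of joint      = {total:.3f}")
print(f"P(LC=1)           = {p_lc1:.3f}")
print(f"P(LC=1 | YF=1)    = {p_lc1_given_yf1:.3f}")
```

With these tables, observing YF = 1 roughly doubles the probability of LC = 1, which is association through the common cause S rather than causation.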
18Structural Equation Models
Causal Graph
Statistical Model
- 1. Structural Equations
- 2. Statistical Constraints
19Structural Equation Models
Causal Graph
- Structural Equations
- One equation for each variable V in the graph:
- V = f(parents(V), error_V)
- for SEM (linear regression) f is a linear function
- Statistical Constraints
- Joint distribution over the error terms
20Structural Equation Models
- Equations
- Education = ε_ed
- Income = β1 Education + ε_Income
- Longevity = β2 Education + ε_Longevity
- Statistical Constraints
- (ε_ed, ε_Income, ε_Longevity) ~ N(0, Σ²)
- Σ² diagonal
- no variance is zero
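A small simulation can make the model concrete. The sketch below draws samples from the three equations with independent standard normal errors; the coefficient values (2.0 and 0.5), the unit error variances, the sample size, and all function names are assumptions chosen for illustration only.

```python
import math
import random


def simulate(n, b_income=2.0, b_longevity=0.5, seed=0):
    """Draw n samples from the linear SEM with independent N(0, 1) errors."""
    rng = random.Random(seed)
    education, income, longevity = [], [], []
    for _ in range(n):
        ed = rng.gauss(0, 1)                                   # Education = e_ed
        education.append(ed)
        income.append(b_income * ed + rng.gauss(0, 1))         # Income equation
        longevity.append(b_longevity * ed + rng.gauss(0, 1))   # Longevity equation
    return education, income, longevity


def corr(xs, ys):
    """Pearson correlation of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after linearly controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


ed, inc, lon = simulate(20000)
r_il = corr(inc, lon)
r_il_given_ed = partial_corr(r_il, corr(inc, ed), corr(lon, ed))
print(f"corr(Income, Longevity)             = {r_il:.3f}")
print(f"corr(Income, Longevity | Education) = {r_il_given_ed:.3f}")
```

Under these assumed coefficients the population correlation between Income and Longevity is 0.4, while the partial correlation given Education is zero, so the first printed value should be near 0.4 and the second near 0.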
213. Connecting Causation to Probability
22The Markov Condition
Statistical Predictions
Causal Markov Axiom
Independence: X _||_ Z | Y, i.e., P(X | Y) = P(X | Y, Z)
Causal Graphs
23Causal Markov Axiom
- If G is a causal graph, and P a probability distribution over the variables in G, then in P:
- every variable V is independent of its non-effects, conditional on its immediate causes.
24Causal Markov Condition
- Two Intuitions
- 1) Immediate causes make effects independent of remote causes (Markov).
- 2) Common causes make their effects independent (Salmon).
25Causal Markov Condition
- 1) Immediate causes make effects independent of remote causes (Markov).
E = Exposure to Chicken Pox, I = Infected, S = Symptoms
Markov Cond.: E _||_ S | I
26Causal Markov Condition
- 2) Effects are independent conditional on their common causes.
Markov Cond.: YF _||_ LC | S
27Causal Structure → Statistical Data
28Causal Markov Axiom
- In SEMs, d-separation follows from assuming independence among error terms that have no connection in the path diagram, i.e., assuming that the model is common cause complete.
29Causal Markov and D-Separation
- In acyclic graphs: equivalent
- Cyclic linear SEMs with uncorrelated errors:
- D-separation correct
- Markov condition incorrect
- Cyclic discrete variable Bayes nets:
- If equilibrium --> d-separation correct
- Markov incorrect
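For acyclic graphs, d-separation can be checked mechanically. The sketch below uses the standard moralization test (restrict to ancestors, connect co-parents, drop directions, delete the conditioning set, test connectivity), which is one known way to decide d-separation and not necessarily the procedure used in TETRAD. The function name, the dict-of-parents representation, and the four-variable example graph X → Y → Z → W, X → W are assumptions for illustration; the three sets passed in are assumed to be disjoint.

```python
def d_separated(parents, xs, ys, zs):
    """True if xs and ys are d-separated given zs in the DAG `parents`.

    `parents` maps each variable to the set of its direct causes.
    """
    # 1. Keep only the variables in xs, ys, zs and their ancestors.
    relevant, stack = set(), list(xs | ys | zs)
    while stack:
        v = stack.pop()
        if v not in relevant:
            relevant.add(v)
            stack.extend(parents[v])
    # 2. Moralize: link each child to its parents and link co-parents.
    nbrs = {v: set() for v in relevant}
    for child in relevant:
        ps = list(parents[child])
        for i, p in enumerate(ps):
            nbrs[p].add(child)
            nbrs[child].add(p)
            for q in ps[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)
    # 3. Delete zs and look for any remaining path from xs to ys.
    seen, stack = set(zs), list(xs)
    while stack:
        v = stack.pop()
        if v in ys:
            return False
        if v not in seen:
            seen.add(v)
            stack.extend(nbrs[v] - seen)
    return True


# Chicken pox chain from the slides: E -> I -> S
chain = {"E": set(), "I": {"E"}, "S": {"I"}}
print(d_separated(chain, {"E"}, {"S"}, {"I"}))      # True
print(d_separated(chain, {"E"}, {"S"}, set()))      # False

# Hypothetical graph: X -> Y -> Z -> W, X -> W (W is a collider on X -> W <- Z)
g = {"X": set(), "Y": {"X"}, "Z": {"Y"}, "W": {"X", "Z"}}
print(d_separated(g, {"X"}, {"Z"}, {"Y"}))          # True
print(d_separated(g, {"X"}, {"Z"}, {"Y", "W"}))     # False
```

The last call shows that conditioning on a common effect (W) can create dependence between its causes.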
30D-separation: Conditioning vs. Intervening
314. Search: From Statistical Data to Probability to Causation
32Causal Discovery: Statistical Data → Causal Structure
33Faithfulness
34Faithfulness Assumption
Statistical constraints arise from causal structure, not coincidence.
All independence relations holding in a probability distribution P generated by a causal structure G are entailed by d-separation applied to G.
35Faithfulness Assumption
- Revenues = a Rate + c Economy + ε_Rev.
- Economy = b Rate + ε_Econ.
- Faithfulness requires a ≠ -bc
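The role of the condition on a, b and c can be seen numerically. In this model Cov(Rate, Revenues) = (a + bc) Var(Rate), so when a = -bc the direct effect and the effect through Economy cancel exactly and Rate looks independent of Revenues despite two causal pathways. The sketch below simulates both cases; the coefficient values, standard normal errors, sample size and function names are assumptions for illustration.

```python
import math
import random


def corr(xs, ys):
    """Pearson correlation of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rate_revenue_corr(a, b, c, n=20000, seed=0):
    """Simulate the two equations and return corr(Rate, Revenues)."""
    rng = random.Random(seed)
    rates, revenues = [], []
    for _ in range(n):
        rate = rng.gauss(0, 1)
        economy = b * rate + rng.gauss(0, 1)                # Economy = b Rate + e_Econ
        revenue = a * rate + c * economy + rng.gauss(0, 1)  # Revenues = a Rate + c Economy + e_Rev
        rates.append(rate)
        revenues.append(revenue)
    return corr(rates, revenues)


b, c = -0.5, 2.0
unfaithful = rate_revenue_corr(a=-b * c, b=b, c=c)  # a = -bc: the pathways cancel
faithful = rate_revenue_corr(a=2.0, b=b, c=c)       # a != -bc: dependence shows up
print(f"a = -bc : corr(Rate, Revenues) = {unfaithful:.3f}")
print(f"a = 2.0 : corr(Rate, Revenues) = {faithful:.3f}")
```

The first correlation should be close to zero up to sampling noise, which is exactly the kind of coincidental independence that the Faithfulness Assumption rules out.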
36Representations of D-separation Equivalence Classes
- We want the representations to:
- Characterize the independence relations entailed by the equivalence class
- Represent causal features that are shared by every member of the equivalence class
37Patterns and PAGs
- Patterns (Verma and Pearl, 1990): graphical representation of an acyclic d-separation equivalence class with no latent variables.
- PAGs (Richardson 1994): graphical representation of an equivalence class including latent variable models and sample selection bias that are d-separation equivalent over a set of measured variables X
38Patterns
39Patterns: What the Edges Mean
40Patterns
41PAGs: Partial Ancestral Graphs
What PAG edges mean.
42PAGs: Partial Ancestral Graphs
43Overview of Search Methods
- Constraint Based Searches
- TETRAD
- Scoring Searches
- Scores: BIC, AIC, etc.
- Search: Hill Climb, Genetic Alg., Simulated Annealing
- Difficult to extend to latent variable models
- Heckerman, Meek and Cooper (1999). "A Bayesian Approach to Causal Discovery," chp. 4 in Computation, Causation, and Discovery, ed. by Glymour and Cooper, MIT Press, pp. 141-166
44Search - Illustration
45Search: Adjacency
47Search: Orientation in Patterns
48Search: Orientation
[Figure: graph over X1, X2, X3, X4 after the orientation phase]
49The theory of interventions, simplified
- Start with a graphical causal model, without feedback.
- Simplest problem: to predict the probability distribution of other represented variables resulting from an intervention that forces a value x on a variable X (e.g., everybody has to smoke) but does not otherwise alter the causal structure.
50First Thing
- Remember: the probability distribution for values of Y conditional on X = x is not in general the same as the probability distribution for values of Y on an intervention that sets X = x.
- Recent work by Waldmann gives evidence that adults are sensitive to the difference.
51Example
[Graph: X → Y → Z → W, X → W]
Because X influences Y, the value of X gives information about the value of Y, and vice versa. X and Y are dependent in probability.
But an intervention that forces a value Y = y on Y, and otherwise does not disturb the system, should not change the probability distribution for values of X. It should, necessarily, make the value of Y independent of X: informally, the value of Y should give no information about the value of X, and vice versa.
52Representing a Simple Manipulation
- Observed structure
- Structure upon manipulating Yellow Fingers
53Intervention Calculations
[Graph: X → Y → Z → W, X → W]
- Set Y = y
- Do "surgery" on the graph: eliminate edges into Y
- Use the Markov factorization of the resulting graph and probability distribution to compute the probability distribution for X, Z, W. Various effective rules are incorporated in what Pearl calls the Do calculus.
54Intervention Calculations
[Graph after intervention: Y = y → Z → W, X → W]
Original Markov factorization:
Pr(X, Y, Z, W) = Pr(W | X, Z) Pr(Z | Y) Pr(Y | X) Pr(X)
The factorization after intervention:
Pr(X, Z, W | Do(Y = y)) = Pr(W | X, Z) Pr(Z | Y = y) Pr(X)
55What's the Point?
- Pr(X, Z, W | Do(Y = y)) = Pr(W | X, Z) Pr(Z | Y = y) Pr(X)
- The probability distribution on the left hand side is a prediction of the effects of an intervention.
- The probabilities on the right are all known before the intervention.
- So causal structure plus probabilities => predictions of intervention effects, which provide a basis for planning.
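The post-intervention factorization can be evaluated directly once the pre-intervention conditional probabilities are given. The sketch below does this for binary variables on the graph X → Y → Z → W, X → W; every number in the tables is a made-up assumption, as are the helper names `bern`, `observational` and `post_intervention`.

```python
from itertools import product

# Hypothetical conditional probabilities for binary X, Y, Z, W.
p_x1 = 0.3                                    # Pr(X = 1)
p_y1_given_x = {0: 0.2, 1: 0.8}               # Pr(Y = 1 | X)
p_z1_given_y = {0: 0.1, 1: 0.7}               # Pr(Z = 1 | Y)
p_w1_given_xz = {(0, 0): 0.1, (0, 1): 0.5,
                 (1, 0): 0.4, (1, 1): 0.9}    # Pr(W = 1 | X, Z)


def bern(p1, value):
    """Probability of a binary `value` when Pr(value = 1) is p1."""
    return p1 if value == 1 else 1.0 - p1


def observational(x, y, z, w):
    """Pr(X, Y, Z, W) = Pr(W | X, Z) Pr(Z | Y) Pr(Y | X) Pr(X)."""
    return (bern(p_w1_given_xz[(x, z)], w) * bern(p_z1_given_y[y], z)
            * bern(p_y1_given_x[x], y) * bern(p_x1, x))


def post_intervention(x, z, w, y_set):
    """Pr(X, Z, W | Do(Y = y_set)): the factor Pr(Y | X) is dropped."""
    return (bern(p_w1_given_xz[(x, z)], w) * bern(p_z1_given_y[y_set], z)
            * bern(p_x1, x))


bits2 = list(product((0, 1), repeat=2))
# Conditioning on Y = 1 uses the observational joint.
p_y1 = sum(observational(x, 1, z, w) for x in (0, 1) for z, w in bits2)
p_x1_given_y1 = sum(observational(1, 1, z, w) for z, w in bits2) / p_y1
# Intervening to set Y = 1 uses the truncated factorization.
p_x1_do_y1 = sum(post_intervention(1, z, w, 1) for z, w in bits2)
total_do = sum(post_intervention(x, z, w, 1) for x in (0, 1) for z, w in bits2)

print(f"Pr(X=1)            = {p_x1:.3f}")
print(f"Pr(X=1 | Y=1)      = {p_x1_given_y1:.3f}")
print(f"Pr(X=1 | Do(Y=1))  = {p_x1_do_y1:.3f}")
```

Seeing Y = 1 raises the probability of X = 1, while setting Y = 1 leaves it at its prior value, which is the conditioning versus intervening distinction in numbers.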
56Surprising Results
[Graph: X → Y → Z → W, X → W, with X unobserved]
Suppose we know the causal structure and the joint probability for Y, Z, W only, not for X. We CAN predict the effect on Z of an intervention on Y, even though Y and W are confounded by unobserved X.
57Surprising Result
[Graph: X → Y → Z → W, X → W, with X unobserved]
The effect of Z on W is confounded by the probabilistic effect of Y on Z, X on Y and X on W. But the probabilistic effect on W of an intervention on Z CAN be computed from the probabilities before the intervention. How? By conditioning on Y.
58Surprising Result
[Graph: X → Y → Z → W, X → W, with X unobserved]
Pr(W, Y, X | Do(Z = z)) = Pr(W | Z = z, X) Pr(Y | X) Pr(X)   (by surgery)
Pr(W, X | Do(Z = z), Y) = Pr(W | Z = z, X) Pr(X | Y)   (condition on Y)
Pr(W | Do(Z = z), Y) = Σ_x Pr(W | Z = z, X = x, Y) Pr(X = x | Y)   (marginalize out X)
Pr(W | Do(Z = z), Y) = Pr(W | Z = z, Y = y)   (obscure probability theorem)
Pr(W | Do(Z = z)) = Σ_y Pr(W | Z = z, Y = y) Pr(Y = y)   (marginalize out Y)
The right hand side is composed entirely of observed probabilities.
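The final formula can be checked numerically. The sketch below builds a binary model on the graph X → Y → Z → W, X → W from made-up conditional probabilities, computes the true Pr(W = 1 | Do(Z = 1)) using X, and then recomputes it from the joint over Y, Z, W alone by adjusting for Y. All table values and helper names are assumptions for illustration.

```python
from itertools import product

# Hypothetical conditional probabilities for binary X, Y, Z, W.
p_x1 = 0.3
p_y1_given_x = {0: 0.2, 1: 0.8}
p_z1_given_y = {0: 0.1, 1: 0.7}
p_w1_given_xz = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}


def bern(p1, value):
    """Probability of a binary `value` when Pr(value = 1) is p1."""
    return p1 if value == 1 else 1.0 - p1


def joint(x, y, z, w):
    return (bern(p_w1_given_xz[(x, z)], w) * bern(p_z1_given_y[y], z)
            * bern(p_y1_given_x[x], y) * bern(p_x1, x))


# What an observer without access to X sees: the joint over (Y, Z, W).
obs = {}
for x, y, z, w in product((0, 1), repeat=4):
    obs[(y, z, w)] = obs.get((y, z, w), 0.0) + joint(x, y, z, w)


def pr(event):
    """Observed probability of the (y, z, w) assignments satisfying `event`."""
    return sum(p for key, p in obs.items() if event(*key))


def truth(z):
    """Pr(W = 1 | Do(Z = z)) by surgery, using the unobserved X."""
    return sum(bern(p_x1, x) * p_w1_given_xz[(x, z)] for x in (0, 1))


def adjusted(z):
    """Sum over y of Pr(W = 1 | Z = z, Y = y) Pr(Y = y), from obs only."""
    total = 0.0
    for y in (0, 1):
        p_y = pr(lambda yy, zz, ww: yy == y)
        p_zy = pr(lambda yy, zz, ww: yy == y and zz == z)
        p_wzy = pr(lambda yy, zz, ww: yy == y and zz == z and ww == 1)
        total += (p_wzy / p_zy) * p_y
    return total


def naive(z):
    """Plain conditioning, Pr(W = 1 | Z = z), which is confounded."""
    return pr(lambda yy, zz, ww: zz == z and ww == 1) / pr(lambda yy, zz, ww: zz == z)


print(f"truth    = {truth(1):.4f}")
print(f"adjusted = {adjusted(1):.4f}")
print(f"naive    = {naive(1):.4f}")
```

The adjusted value matches the truth without ever using X, while plain conditioning on Z overstates the effect in this example.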
59Pearl's Do Calculus
- Provides rules that permit one to avoid the probability calculations we just went through: graphical properties determine whether effects of an intervention can be predicted.
60Applications
- Rock Classification
- College Plans
- Political Exclusion
- Satellite Calibration
- Naval Readiness
- Spartina Grass
- Parenting among Single, Black Mothers
- Pneumonia
- Photosynthesis
- Lead - IQ
- College Retention
- Corn Exports
61References
- Causation, Prediction, and Search, 2nd Edition (2000), by P. Spirtes, C. Glymour, and R. Scheines, MIT Press
- Computation, Causation, and Discovery (1999), edited by C. Glymour and G. Cooper, MIT Press
- Causality in Crisis? (1997), V. McKim and S. Turner (eds.), Univ. of Notre Dame Press
- TETRAD IV: www.phil.cmu.edu/projects/tetrad
- Web Course on Causal and Statistical Reasoning: www.phil.cmu.edu/projects/csr/
- Causality Lab: www.phil.cmu.edu/projects/causality-lab