Recent Advances in Causal Modelling Using Directed Graphs

Computational Aids to Causal Discovery. Peter Spirtes, Clark Glymour, Richard Scheines, and many others, Department of Philosophy, Carnegie Mellon. Penn State - March 23 ...


Transcript and Presenter's Notes

1
The TETRAD Project: Computational Aids to Causal Discovery
Peter Spirtes, Clark Glymour, Richard Scheines, and many others
Department of Philosophy, Carnegie Mellon
2
Agenda
  1. Morning I: Theoretical Overview (Representation,
    Axioms, Search)
  2. Morning II: Research Problems
  3. Afternoon: TETRAD Demo - Workshop

3
Part I Agenda
  1. Motivation
  2. Representation
  3. Connecting Causation to Probability
    (Independence)
  4. Searching for Causal Models
  5. Improving on Regression for Causal Inference

4
1. Motivation
  • Non-experimental Evidence
  • Typical Predictive Questions
    • Can we predict aggressiveness from the amount of
      violent TV watched?
  • Causal Questions
    • Does watching violent TV cause Aggression?
    • I.e., if we intervene to change TV watching, will
      the level of Aggression change?

5
Causal Estimation
When and how can we use non-experimental data to
tell us about the effect of an intervention?
  • Manipulated Probability: P(Y | X set= x, Z = z)
  • from
  • Unmanipulated Probability: P(Y, X, Z)

6
Spartina in the Cape Fear Estuary
7
What Factors Directly Influence Spartina Growth in
the Cape Fear Estuary?
  • pH, salinity, sodium, phosphorus, magnesium,
    ammonia, zinc, potassium, what?
  • 14 variables for 45 samples of Spartina from Cape
    Fear Estuary.
  • The biologist concluded salinity must be a factor.
  • Bayes net analysis says only pH directly affects
    Spartina biomass.
  • The biologist's subsequent greenhouse experiment
    says: if pH is controlled for, variations in
    salinity do not affect growth, but if salinity is
    controlled for, variations in pH do affect
    growth.

8
2. Representation
  1. Association & causal structure - qualitatively
  2. Interventions
  3. Statistical Causal Models
  4. Bayes Networks
  5. Structural Equation Models

9
Causation & Association
X and Y are associated, ¬(X _||_ Y), iff ∃ x1 ≠ x2 such that
P(Y | X = x1) ≠ P(Y | X = x2).
Association is symmetric: X is associated with Y ⇔ Y is associated with X.
  • X is a cause of Y iff
  • ∃ x1 ≠ x2 such that P(Y | X set= x1) ≠ P(Y | X set= x2)
  • Causation is asymmetric: X → Y does not imply Y → X

10
Direct Causation
  • X is a direct cause of Y relative to S, iff
  • ∃ z, x1 ≠ x2 such that P(Y | X set= x1, Z set= z)
    ≠ P(Y | X set= x2, Z set= z)
  • where Z = S - {X, Y}

11
Causal Graphs
  • Causal Graph G = {V, E}
  • Each edge X → Y represents a direct causal
    claim:
  • X is a direct cause of Y relative to V

[Figure: Chicken Pox causal graph]
12
Causal Graphs
  • Not Cause Complete

Common Cause Complete
13
Modeling Ideal Interventions
Interventions on the Effect
[Figure: pre-experimental system and post-intervention system; variables: Room Temperature, Sweaters On]
14
Modeling Ideal Interventions
Interventions on the Cause
[Figure: pre-experimental system and post-intervention system; variables: Sweaters On, Room Temperature]
15
Ideal Interventions & Causal Graphs
  • Model an ideal intervention by adding an
    intervention variable outside the original
    system
  • Erase all arrows pointing into the variable
    intervened upon
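
The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the graph is stored as a dictionary from each variable to its set of direct causes, the `intervene` helper and the `I_` prefix for the intervention variable are hypothetical names, and the three-variable chain is an assumed example.

```python
def intervene(graph, target):
    """Return the post-intervention graph for an ideal intervention on `target`.

    `graph` maps each variable to the set of its direct causes (its parents).
    """
    # Copy the graph so the pre-intervention graph is left untouched.
    post = {node: set(parents) for node, parents in graph.items()}
    # Add an intervention variable from outside the original system
    # (the "I_" prefix is a hypothetical naming convention).
    intervention_node = "I_" + target
    post[intervention_node] = set()
    # Erase all arrows into the target; only the intervention now causes it.
    post[target] = {intervention_node}
    return post


# Assumed example graph: Exposure -> Infection -> Symptoms.
pre_graph = {
    "Exposure": set(),
    "Infection": {"Exposure"},
    "Symptoms": {"Infection"},
}
post_graph = intervene(pre_graph, "Infection")
print(sorted(post_graph["Infection"]))  # ['I_Infection']
print(sorted(post_graph["Symptoms"]))   # ['Infection']
```

Arrows out of the intervened variable are kept, so its effects still respond to the value that the intervention imposes.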

[Figure: pre-intervention graph; intervene to change Inf; post-intervention graph?]
16
Conditioning vs. Intervening
P(Y | X = x1) vs. P(Y | X set= x1) (Teeth Slides)
17
Causal Bayes Networks
The joint distribution factors according to the
causal graph, i.e., P(V) = ∏ P(X | Immediate Causes of(X)),
with the product taken over all X in V.
  • P(S = 0) = .7
  • P(S = 1) = .3
  • P(YF = 0 | S = 0) = .99    P(LC = 0 | S = 0) = .95
  • P(YF = 1 | S = 0) = .01    P(LC = 1 | S = 0) = .05
  • P(YF = 0 | S = 1) = .20    P(LC = 0 | S = 1) = .80
  • P(YF = 1 | S = 1) = .80    P(LC = 1 | S = 1) = .20

P(S, YF, LC) = P(S) P(YF | S) P(LC | S)
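
As a rough check of the factorization, the sketch below rebuilds the joint distribution from the slide's tables and compares conditioning on YF with intervening on YF. The reading of S, YF, and LC as smoking, yellow fingers, and lung cancer is an assumption, and the function and variable names are illustrative only.

```python
from itertools import product

# Tables from the slide. Assumed reading: S = smoking, YF = yellow fingers,
# LC = lung cancer.
p_s = {0: 0.7, 1: 0.3}
p_yf_given_s = {0: {0: 0.99, 1: 0.01}, 1: {0: 0.20, 1: 0.80}}  # [s][yf]
p_lc_given_s = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.80, 1: 0.20}}  # [s][lc]


def joint(s, yf, lc):
    """P(S, YF, LC) = P(S) P(YF | S) P(LC | S)."""
    return p_s[s] * p_yf_given_s[s][yf] * p_lc_given_s[s][lc]


# The factored joint must sum to one over all eight assignments.
total = sum(joint(s, yf, lc) for s, yf, lc in product((0, 1), repeat=3))

# Conditioning: P(LC = 1 | YF = 1), computed from the joint.
p_yf1 = sum(joint(s, 1, lc) for s in (0, 1) for lc in (0, 1))
p_lc1_given_yf1 = sum(joint(s, 1, 1) for s in (0, 1)) / p_yf1

# Intervening: setting YF removes the factor P(YF | S), so LC is unaffected
# and P(LC = 1 | YF set= 1) is just the marginal P(LC = 1).
p_lc1_set_yf1 = sum(p_s[s] * p_lc_given_s[s][1] for s in (0, 1))

print(f"{p_lc1_given_yf1:.3f}")  # 0.196
print(f"{p_lc1_set_yf1:.3f}")    # 0.095
```

Under these tables, observing YF = 1 roughly doubles the probability of LC = 1, while setting YF = 1 leaves it at its marginal value.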
18
Structural Equation Models
Causal Graph
Statistical Model
  1. Structural Equations
  2. Statistical Constraints

19
Structural Equation Models
Causal Graph
  • Structural Equations
    • One equation for each variable V in the
      graph
    • V = f(parents(V), error_V)
    • for SEM (linear regression) f is a linear
      function
  • Statistical Constraints
    • Joint distribution over the error terms

20
Structural Equation Models
  • Equations
    • Education = ε_ed
    • Income = β1 Education + ε_Income
    • Longevity = β2 Education + ε_Longevity
  • Statistical Constraints
    • (ε_ed, ε_Income, ε_Longevity) ~ N(0, Σ²)
    • Σ² diagonal
    • no variance is zero
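
A small simulation can illustrate what such a model implies. The sketch below assumes standard normal, mutually independent error terms and hypothetical coefficient values on Education (2.0 for Income, 0.5 for Longevity); none of these numbers come from the slides.

```python
import math
import random

random.seed(0)
B_INCOME, B_LONGEVITY = 2.0, 0.5  # hypothetical coefficients on Education
N = 20000

education, income, longevity = [], [], []
for _ in range(N):
    # One structural equation per variable, each with its own error term.
    ed = random.gauss(0, 1)
    education.append(ed)
    income.append(B_INCOME * ed + random.gauss(0, 1))
    longevity.append(B_LONGEVITY * ed + random.gauss(0, 1))


def corr(xs, ys):
    """Sample Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after linearly controlling for zs."""
    rxy, rxz, ryz = corr(xs, ys), corr(xs, zs), corr(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


r_marginal = corr(income, longevity)
r_partial = partial_corr(income, longevity, education)
print(r_marginal, r_partial)
```

With these assumed values the population correlation between Income and Longevity is 0.4, while their partial correlation given Education is zero, since Education is their only common cause.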

21
3. Connecting Causation to Probability
22
The Markov Condition
[Figure: Causal Structure (Causal Graphs) ⇒ Causal Markov Axiom ⇒ Statistical Predictions (Independence)]
Independence: X _||_ Z | Y, i.e., P(X | Y) = P(X | Y, Z)
23
Causal Markov Axiom
  • If G is a causal graph, and P a probability
    distribution over the variables in G, then in P
  • every variable V is independent of its
    non-effects, conditional on its immediate causes.
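
The axiom can be read off a graph mechanically. The sketch below is a minimal illustration, assuming the graph is stored as a dictionary from each variable to its set of immediate causes; the helper names and the two example graphs are assumptions chosen for illustration.

```python
def effects(graph, node):
    """All direct and indirect effects of `node`.

    `graph` maps each variable to the set of its immediate causes.
    """
    children = {n: {c for c, ps in graph.items() if n in ps} for n in graph}
    found, stack = set(), [node]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def markov_independencies(graph):
    """For each V: (its non-effects other than its causes, its immediate causes)."""
    result = {}
    for v, causes in graph.items():
        non_effects = set(graph) - effects(graph, v) - causes - {v}
        if non_effects:
            result[v] = (non_effects, set(causes))
    return result


# Assumed examples: a common cause S of YF and LC, and a chain E -> I -> Sym.
common_cause = {"S": set(), "YF": {"S"}, "LC": {"S"}}
chain = {"E": set(), "I": {"E"}, "Sym": {"I"}}

print(markov_independencies(common_cause))
print(markov_independencies(chain))
```

For the common-cause graph this yields YF independent of LC given S (and symmetrically); for the chain it yields Sym independent of E given I.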

24
Causal Markov Condition
  • Two Intuitions
  • 1) Immediate causes make effects independent of
    remote causes (Markov).
  • 2) Common causes make their effects independent
    (Salmon).

25
Causal Markov Condition
  • 1) Immediate causes make effects independent
    of remote causes (Markov).

E = Exposure to Chicken Pox, I = Infected, S = Symptoms
Markov Cond.: E _||_ S | I
26
Causal Markov Condition
  • 2) Effects are independent conditional on their
    common causes.

Markov Cond.: YF _||_ LC | S
27
Causal Structure ⇒ Statistical Data
28
Causal Markov Axiom
  • In SEMs, d-separation follows from assuming
    independence among error terms that have no
    connection in the path diagram - i.e., assuming
    that the model is common cause complete.

29
Causal Markov and D-Separation
  • In acyclic graphs: equivalent
  • Cyclic linear SEMs with uncorrelated errors
    • D-separation correct
    • Markov condition incorrect
  • Cyclic discrete variable Bayes nets
    • If equilibrium ⇒ d-separation correct
    • Markov incorrect

30
D-separation: Conditioning vs. Intervening
31
4. Search: From Statistical Data to
Probability to Causation
32
Causal Discovery: Statistical Data ⇒ Causal
Structure
33
Faithfulness
34
Faithfulness Assumption
Statistical Constraints arise from Causal
Structure, not Coincidence. All independence
relations holding in a probability distribution P
generated by a causal structure G are entailed by
d-separation applied to G.
35
Faithfulness Assumption
  • Revenues = a Rate + c Economy + ε_Rev.
  • Economy = b Rate + ε_Econ.
  • a ≠ -bc
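
To see why the condition on a, b, and c matters, the sketch below simulates the two equations with assumed standard normal, independent errors and an assumed standard normal Rate. The coefficient values and the helper name `rate_revenue_corr` are hypothetical. When a = -bc the direct effect of Rate on Revenues exactly cancels the effect through Economy, so the two are uncorrelated even though Rate is a cause of Revenues.

```python
import math
import random


def rate_revenue_corr(a, b, c, n=20000, seed=0):
    """Sample correlation of Rate and Revenues in a simulated linear model."""
    rng = random.Random(seed)
    rate, revenues = [], []
    for _ in range(n):
        r = rng.gauss(0, 1)
        economy = b * r + rng.gauss(0, 1)
        rev = a * r + c * economy + rng.gauss(0, 1)
        rate.append(r)
        revenues.append(rev)
    mr, mv = sum(rate) / n, sum(revenues) / n
    cov = sum((x - mr) * (y - mv) for x, y in zip(rate, revenues))
    vr = sum((x - mr) ** 2 for x in rate)
    vv = sum((y - mv) ** 2 for y in revenues)
    return cov / math.sqrt(vr * vv)


# Hypothetical coefficients. The first call has a = -bc (cancelling paths).
unfaithful = rate_revenue_corr(a=-1.0, b=2.0, c=0.5)
faithful = rate_revenue_corr(a=1.0, b=2.0, c=0.5)
print(unfaithful, faithful)
```

The first correlation is near zero only because of the coincidental parameter values, which is the kind of independence that the Faithfulness Assumption rules out.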

36
Representations of D-separation Equivalence
Classes
  • We want the representations to
  • Characterize the Independence Relations Entailed
    by the Equivalence Class
  • Represent causal features that are shared by
    every member of the equivalence class

37
Patterns & PAGs
  • Patterns (Verma and Pearl, 1990): graphical
    representation of an acyclic d-separation
    equivalence class, with no latent variables.
  • PAGs (Richardson, 1994): graphical representation
    of an equivalence class, including latent variable
    models and sample selection bias, that are
    d-separation equivalent over a set of measured
    variables X.

38
Patterns
39
Patterns: What the Edges Mean
40
Patterns
41
PAGs: Partial Ancestral Graphs
What PAG edges mean.
42
PAGs: Partial Ancestral Graphs
43
Overview of Search Methods
  • Constraint-Based Searches
    • TETRAD
  • Scoring Searches
    • Scores: BIC, AIC, etc.
    • Search: Hill Climb, Genetic Alg., Simulated
      Annealing
    • Difficult to extend to latent variable models
    • Heckerman, Meek and Cooper (1999). A Bayesian
      Approach to Causal Discovery, chp. 4 in
      Computation, Causation, and Discovery, ed. by
      Glymour and Cooper, MIT Press, pp. 141-166

44
Search - Illustration
45
Search: Adjacency
47
Search: Orientation in Patterns
48
Search: Orientation
After Orientation Phase
[Figure: graphs over X1, X2, X3, X4]
49
The theory of interventions, simplified
  • Start with a graphical causal model, without
    feedback.
  • Simplest problem: to predict the probability
    distribution of other represented variables
    resulting from an intervention that forces a
    value x on a variable X (e.g., everybody has to
    smoke) but does not otherwise alter the causal
    structure.

50
First Thing
  • Remember: the probability distribution for
    values of Y conditional on X = x is not in
    general the same as the probability distribution
    for values of Y on an intervention that sets X =
    x.
  • Recent work by Waldmann gives evidence that
    adults are sensitive to the difference.

51
Example
[Figure: causal graph with edges X → Y, Y → Z, Z → W, X → W]

Because X influences Y, the value of X gives
information about the value of Y, and vice versa.
X and Y are dependent in probability. But an
intervention that forces a value Y = y on Y, and
otherwise does not disturb the system, should not
change the probability distribution for values of
X. It should, necessarily, make the value of Y
independent of X: informally, the value of Y
should give no information about the value of X,
and vice versa.
52
Representing a Simple Manipulation
  • Observed Structure
  • Structure upon Manipulating Yellow Fingers

53
Intervention Calculations
[Figure: causal graph with edges X → Y, Y → Z, Z → W, X → W]
  1. Set Y = y
  2. Do surgery on the graph: eliminate edges into Y
  3. Use the Markov factorization of the resulting
    graph and probability distribution to compute the
    probability distribution for X, Z, W. Various
    effective rules are incorporated in what Pearl calls
    the Do calculus.

54
Intervention Calculations
[Figure: post-intervention graph with Y = y and edges Y → Z, Z → W, X → W]

Original Markov factorization:
Pr(X, Y, Z, W) = Pr(W | X, Z) Pr(Z | Y) Pr(Y | X) Pr(X)
The factorization after intervention:
Pr(X, Z, W | Do(Y = y)) = Pr(W | X, Z) Pr(Z | Y = y) Pr(X)
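
The two factorizations can be compared numerically. The sketch below assumes binary variables and hypothetical conditional probability tables (none of the numbers come from the slides); `pre` is the original factorization and `post` is the factorization after surgery on Y.

```python
VALS = (0, 1)

# Hypothetical tables for binary variables in the graph
# X -> Y -> Z -> W with X -> W.
p_x = {0: 0.6, 1: 0.4}
p_y_given_x = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}  # [x][y]
p_z_given_y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}  # [y][z]
p_w_given_xz = {  # [(x, z)][w]
    (0, 0): {0: 0.9, 1: 0.1},
    (0, 1): {0: 0.5, 1: 0.5},
    (1, 0): {0: 0.6, 1: 0.4},
    (1, 1): {0: 0.2, 1: 0.8},
}


def pre(x, y, z, w):
    """Original factorization: Pr(W | X, Z) Pr(Z | Y) Pr(Y | X) Pr(X)."""
    return p_w_given_xz[(x, z)][w] * p_z_given_y[y][z] * p_y_given_x[x][y] * p_x[x]


def post(x, z, w, y_set):
    """After surgery on Y: the factor Pr(Y | X) is dropped and Y is fixed."""
    return p_w_given_xz[(x, z)][w] * p_z_given_y[y_set][z] * p_x[x]


# Conditioning on Y = 1 changes the distribution of its cause X...
p_y1 = sum(pre(x, 1, z, w) for x in VALS for z in VALS for w in VALS)
p_x1_given_y1 = sum(pre(1, 1, z, w) for z in VALS for w in VALS) / p_y1
# ...but setting Y = 1 leaves X at its marginal distribution.
p_x1_do_y1 = sum(post(1, z, w, 1) for z in VALS for w in VALS)
# The effect Z still responds to the value forced on Y.
p_z1_do_y1 = sum(post(x, 1, w, 1) for x in VALS for w in VALS)

print(f"{p_x1_given_y1:.2f} {p_x1_do_y1:.2f} {p_z1_do_y1:.2f}")  # 0.70 0.40 0.60
```

Only the factor for the manipulated variable changes; every other factor is reused unchanged from the pre-intervention distribution.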
55
What's the Point?
  • Pr(X, Z, W | Do(Y = y)) = Pr(W | X, Z) Pr(Z | Y =
    y) Pr(X)
  • The probability distribution on the left hand
    side is a prediction of the effects of an
    intervention.
  • The probabilities on the right are all known
    before the intervention.
  • So causal structure plus probabilities ⇒
    prediction of intervention effects, which provide a
    basis for planning.

56
Surprising Results
[Figure: causal graph with edges X → Y, Y → Z, Z → W, X → W]

Suppose we know the causal structure and the
joint probability for Y, Z, W only, not for X. We
CAN predict the effect on Z of an intervention on
Y, even though Y and W are confounded by
unobserved X.
57
Surprising Result
[Figure: causal graph with edges X → Y, Y → Z, Z → W, X → W]

The effect of Z on W is confounded by the
probabilistic effect of Y on Z, X on Y, and X on
W. But the probabilistic effect on W of an
intervention on Z CAN be computed from the
probabilities before the intervention. How? By
conditioning on Y.
58
Surprising Result
[Figure: causal graph with edges X → Y, Y → Z, Z → W, X → W]

Pr(W, Y, X | Do(Z = z)) = Pr(W | Z = z, X) Pr(Y | X) Pr(X) (by surgery)
Pr(W, X | Do(Z = z), Y) = Pr(W | Z = z, X) Pr(X | Y) (condition on Y)
Pr(W | Do(Z = z), Y) = Σ_x Pr(W | Z = z, X, Y) Pr(X | Y) (marginalize out X)
Pr(W | Do(Z = z), Y = y) = Pr(W | Z = z, Y = y) (obscure probability theorem)
Pr(W | Do(Z = z)) = Σ_y Pr(W | Z = z, Y = y) Pr(Y = y) (marginalize out Y)
The right hand side is composed entirely of observed probabilities.
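
The final formula can be checked numerically. The sketch below assumes binary variables and hypothetical tables for the graph X → Y → Z → W with X → W. It computes Pr(W | Do(Z = z)) twice: once by surgery on the full model, which uses X, and once from the joint over Y, Z, W alone, as if X were unmeasured. All names and numbers are illustrative.

```python
from itertools import product

VALS = (0, 1)

# Hypothetical tables (not from the slides).
p_x = {0: 0.6, 1: 0.4}
p_y_given_x = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}  # [x][y]
p_z_given_y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}  # [y][z]
p_w_given_xz = {  # [(x, z)][w]
    (0, 0): {0: 0.9, 1: 0.1},
    (0, 1): {0: 0.5, 1: 0.5},
    (1, 0): {0: 0.6, 1: 0.4},
    (1, 1): {0: 0.2, 1: 0.8},
}


def full_joint(x, y, z, w):
    return p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z] * p_w_given_xz[(x, z)][w]


# Observed joint over (Y, Z, W) only: X is marginalized out.
observed = {
    (y, z, w): sum(full_joint(x, y, z, w) for x in VALS)
    for y, z, w in product(VALS, repeat=3)
}


def by_surgery(w, z):
    """Pr(W = w | Do(Z = z)) from the full model, dropping the factor Pr(Z | Y)."""
    return sum(
        p_x[x] * p_y_given_x[x][y] * p_w_given_xz[(x, z)][w]
        for x in VALS for y in VALS
    )


def adjusted(w, z):
    """Sum over y of Pr(W = w | Z = z, Y = y) Pr(Y = y), from observed data only."""
    total = 0.0
    for y in VALS:
        p_y = sum(observed[(y, z2, w2)] for z2 in VALS for w2 in VALS)
        p_zy = sum(observed[(y, z, w2)] for w2 in VALS)
        total += observed[(y, z, w)] / p_zy * p_y
    return total


def naive(w, z):
    """Pr(W = w | Z = z), plain conditioning without adjusting for Y."""
    p_z = sum(observed[(y, z, w2)] for y in VALS for w2 in VALS)
    return sum(observed[(y, z, w)] for y in VALS) / p_z


print(f"{by_surgery(1, 1):.2f} {adjusted(1, 1):.2f} {naive(1, 1):.2f}")  # 0.62 0.62 0.68
```

With these assumed tables the adjusted quantity matches the surgery result, while plain conditioning on Z does not, because it mixes in the confounding through X.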
59
Pearl's Do Calculus
  • Provides rules that permit one to avoid the
    probability calculations we just went
    through: graphical properties determine whether
    effects of an intervention can be predicted.

60
Applications
  • Rock Classification
  • College Plans
  • Political Exclusion
  • Satellite Calibration
  • Naval Readiness
  • Spartina Grass
  • Parenting among Single, Black Mothers
  • Pneumonia
  • Photosynthesis
  • Lead - IQ
  • College Retention
  • Corn Exports

61
References
  • Causation, Prediction, and Search, 2nd Edition
    (2000), by P. Spirtes, C. Glymour, and R.
    Scheines (MIT Press)
  • Computation, Causation, and Discovery (1999),
    edited by C. Glymour and G. Cooper, MIT Press
  • Causality in Crisis? (1997), V. McKim and S.
    Turner (eds.), Univ. of Notre Dame Press
  • TETRAD IV: www.phil.cmu.edu/projects/tetrad
  • Web Course on Causal and Statistical Reasoning:
    www.phil.cmu.edu/projects/csr/
  • Causality Lab: www.phil.cmu.edu/projects/causality-lab