1
Causal Inference and Graphical Models
  • Peter Spirtes
  • Carnegie Mellon University

2
Overview
  • Manipulations
  • Assuming no Hidden Common Causes
    • From DAGs to Effects of Manipulation
    • From Data to Sets of DAGs
    • From Sets of DAGs to Effects of Manipulation
  • May be Hidden Common Causes
    • From Data to Sets of DAGs
    • From Sets of DAGs to Effects of Manipulations

3
If I were to force a group of people to smoke one
pack a day, what percentage would develop lung
cancer?
The Evidence
4
P(Lung cancer = yes) = 1/2
5
Conditioning on Teeth white = yes
P(Lung Cancer = yes | Teeth white = yes) = 1/4
6
Manipulating Teeth white = yes
7
Manipulating Teeth white = yes - After Waiting
P(Lung Cancer = yes || Teeth white = yes) = 1/2
≠
P(Lung Cancer = yes | Teeth white = yes) = 1/4
8
Smoking Decision
  • Setting insurance rates for smokers -
    conditioning
  • Suppose the Surgeon General is considering
    banning smoking.
  • Will this decrease smoking?
  • Will decreasing smoking decrease cancer?
  • Will it have negative side-effects e.g. more
    obesity?
  • How is greater life expectancy valued against
    decrease in pleasure from smoking?

9
Manipulations and Distributions
  • Since Smoking determines Teeth white, P(T,L,R,W)
    = P(S,L,R,W)
  • But the manipulation of Teeth white leads to
    different results than the manipulation of
    Smoking
  • Hence the distribution does not always uniquely
    determine the results of a manipulation

10
Causation
  • We will infer average causal effects.
  • We will not consider quantities such as
    probability of necessity, probability of
    sufficiency, or the counterfactual probability
    that I would get a headache conditional on taking
    an aspirin, given that I did not take an aspirin
  • The causal relations are between properties of a
    unit at a time, not between events.
  • Each unit is assumed to be causally isolated.
  • The causal relations may be genuinely
    indeterministic, or only apparently
    indeterministic.

11
Causal DAGs
  • Probabilistic Interpretation of DAGs
  • A DAG represents a distribution P when each
    variable is independent of its non-descendants
    conditional on its parents in the DAG
  • Causal Interpretation of DAGs
  • There is a directed edge from A to B (relative to
    V) when A is a direct cause of B.
  • An acyclic graph is not a representation of
    reversible or feedback processes

12
Conditioning
  • Conditioning maps a probability distribution and
    an event into a new probability distribution
  • f(P(V), e) → P′(V), where P′(V = v) = P(V = v, e)/P(e)

13
Manipulating
  • A manipulation maps a population joint
    probability distribution, a causal DAG, and a set
    of new probability distributions for a set of
    variables, into a new joint distribution
  • Manipulating for {X1,…,Xn} ⊆ V:
  • f: P(V) (population distribution),
  • G (causal DAG),
  • P(X1 | Non-Descendants(G,X1)),…, P(Xn | Non-Descendants(G,Xn)) (the new distributions for the manipulated variables)
  • → P′(V), the manipulated distribution
  • (assumption that the manipulations are independent; a sketch contrasting conditioning and manipulating follows below)

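The contrast between these two maps can be made concrete. Below is a minimal Python sketch (not from the slides); the CPT numbers are invented so as to reproduce the 1/4 vs. 1/2 contrast above, and all names are illustrative:

    from itertools import product

    # Toy model: Smoking -> Teeth white and Smoking -> Lung cancer.
    # Numbers invented so that P(LC = 1) = 1/2 and P(LC = 1 | TW = 1) = 1/4.
    p_smoke = {0: 0.5, 1: 0.5}
    p_white_given_smoke = {0: {0: 0.0, 1: 1.0},    # non-smokers have white teeth
                           1: {0: 1.0, 1: 0.0}}    # p_white_given_smoke[s][tw]
    p_cancer_given_smoke = {0: {0: 0.75, 1: 0.25},
                            1: {0: 0.25, 1: 0.75}} # p_cancer_given_smoke[s][lc]

    def joint(p_tw):
        """Joint P(S, TW, LC) with a given (possibly manipulated) mechanism for TW."""
        return {(s, tw, lc): p_smoke[s] * p_tw(s)[tw] * p_cancer_given_smoke[s][lc]
                for s, tw, lc in product([0, 1], repeat=3)}

    def prob(j, event):
        return sum(p for v, p in j.items() if event(v))

    pre = joint(lambda s: p_white_given_smoke[s])  # TW produced by its cause, Smoking

    # Conditioning: P(LC = 1 | TW = 1) = P(LC = 1, TW = 1) / P(TW = 1).
    cond = (prob(pre, lambda v: v[1] == 1 and v[2] == 1) /
            prob(pre, lambda v: v[1] == 1))

    # Manipulating: replace only the mechanism for TW with P(TW = 1) = 1.
    post = joint(lambda s: {0: 0.0, 1: 1.0})
    manip = prob(post, lambda v: v[2] == 1)

    print(cond)   # 0.25 -- conditioning on white teeth lowers the estimate
    print(manip)  # 0.5  -- manipulating teeth white leaves lung cancer at 1/2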
14
Manipulation Notation - Adapting Lauritzen
  • The distribution of Lung Cancer given the
    manipulated distribution of Smoking:
  • P(Lung Cancer || P(Smoking))
  • The distribution of Lung Cancer conditional on
    Radon given the manipulated distribution of
    Smoking:
  • P(Lung Cancer | Radon || P(Smoking)) =
    P(Lung Cancer, Radon || P(Smoking)) /
    P(Radon || P(Smoking))
  • First manipulate, then condition

15
Ideal Manipulations
  • No fat hand
  • Effectiveness
  • Whether or not any actual action is an ideal
    manipulation of a variable Z is not part of the
    theory - it is input to the theory.
  • With respect to a system of variables containing
    murder rates, outlawing cocaine is not an ideal
    manipulation of cocaine usage
  • It is not entirely effective - people still use
    cocaine
  • It affects murder rates directly, not via its
    effect on cocaine usage, because of increased
    gang warfare

16
3 Representations of Manipulations
  • Structural Equation
  • Policy Variable
  • Potential Outcomes

17
College Plans
  • Sewell and Shah (1968) studied five variables
    from a sample of 10,318 Wisconsin high school
    seniors.
  • SEX: male = 0, female = 1
  • IQ: Intelligence Quotient, lowest = 0, highest = 3
  • CP: college plans, yes = 0, no = 1
  • PE: parental encouragement, low = 0, high = 1
  • SES: socioeconomic status, lowest = 0, highest = 3

18
College Plans - A Hypothesis
[Figure: a hypothesized causal DAG over SES, SEX, IQ, PE, CP]
19
Equational Representation
  • x_i = f_i(pa_i(G), e_i)
  • If the e_i are causes of two or more variables,
    they must be included in the analysis
  • There is a distribution over the e_i
  • The equations and the distribution over the e_i
    determine a distribution over the x_i
  • When manipulating a variable to a value, replace
    its equation with x_i = c (see the sketch below)

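A minimal sketch of the equational representation, with invented equations and coefficients (none of this is from the slides): each variable is computed from its parents plus an independent error term, and manipulating a variable simply replaces its equation with a constant.

    import random

    random.seed(0)

    def sample(do=None):
        """Draw one unit; `do` maps a variable name to a forced constant c,
        replacing that variable's equation with x = c."""
        v = {}
        eqs = {  # x_i = f_i(pa_i, e_i), with independent Gaussian errors e_i
            "ses": lambda: random.gauss(0, 1),
            "iq":  lambda: 0.8 * v["ses"] + random.gauss(0, 1),
            "pe":  lambda: 0.5 * v["ses"] + 0.5 * v["iq"] + random.gauss(0, 1),
            "cp":  lambda: 1.0 * v["pe"] + random.gauss(0, 1),
        }
        for name, f in eqs.items():  # evaluate in causal order
            v[name] = do[name] if do and name in do else f()
        return v

    # Observational mean of cp, vs. its mean under the manipulation pe := 2.
    obs = sum(sample()["cp"] for _ in range(10000)) / 10000
    man = sum(sample(do={"pe": 2.0})["cp"] for _ in range(10000)) / 10000
    print(obs, man)  # obs near 0.0; man near 1.0 * 2.0 = 2.0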
20
Policy Variable Representation
  • Pre-manipulation: P(PE,SES,SEX,IQ,CP | policy = off) = P(PE,SES,SEX,IQ,CP)
  • The manipulation: P(PE = 1 | policy = on) = 1, i.e. suppose the manipulated P(PE = 1) = 1
  • Post-manipulation: P(SES,SEX,IQ,CP,PE = 1 | policy = on) = P(SES,SEX,IQ,CP,PE = 1 || P(PE))
  • P(CP | PE, policy = on) = P(CP | PE || P(PE))
21
From DAG to Effects of Manipulation
[Figure: inference roadmap - Sample → Population Distribution (via Sampling and Distributional Assumptions, Prior) → Causal DAGs (via Causal Axioms, Prior, and Background Knowledge) → Effect of Manipulation]

22
Causal Sufficiency
  • A set of variables is causally sufficient if
    every cause of two variables in the set is also
    in the set.
  • {PE, CP, SES} is causally sufficient
  • {IQ, CP, SES} is not causally sufficient.

23
Causal Markov Assumption
  • For a causally sufficient set of variables, the
    joint distribution is the product of each
    variable conditional on its parents in the causal
    DAG.
  • P(SES,SEX,PE,CP,IQ) = P(SES) P(SEX) P(IQ | SES) P(PE | SES,SEX,IQ) P(CP | PE)

24
Equivalent Forms of Causal Markov Assumption
  • In the population distribution, each variable is
    independent of its non-descendants in the causal
    DAG (non-effects) conditional on its parents
    (immediate causes).
  • If X is d-separated from Y conditional on Z
    (written as ⟨X,Y|Z⟩) in the causal graph, then X
    is independent of Y conditional on Z in the
    population distribution (denoted I(X,Y|Z)); a
    d-separation test is sketched below.

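D-separation itself is mechanical to check. One standard, equivalent test (Lauritzen's moralization criterion) is sketched below; the DAG encoding and helper names are illustrative:

    def ancestors(dag, nodes):
        """All nodes with a directed path into `nodes`, plus `nodes` themselves."""
        result = set(nodes)
        frontier = set(nodes)
        while frontier:
            parents = {p for p, kids in dag.items() for k in kids if k in frontier}
            frontier = parents - result
            result |= parents
        return result

    def d_separated(dag, xs, ys, zs):
        """Moralization test: X is d-separated from Y given Z iff Z blocks
        every X-Y path in the moralized ancestral graph.
        `dag` maps each node to the set of its children."""
        keep = ancestors(dag, set(xs) | set(ys) | set(zs))
        # Undirected (moral) graph on the ancestral subgraph: keep each edge,
        # and "marry" parents that share a child.
        adj = {v: set() for v in keep}
        for p, kids in dag.items():
            if p in keep:
                for k in kids & keep:
                    adj[p].add(k)
                    adj[k].add(p)
        for v in keep:
            pars = [p for p in keep if v in dag[p]]
            for i, a in enumerate(pars):
                for b in pars[i + 1:]:
                    adj[a].add(b)
                    adj[b].add(a)
        # Any path from X to Y that avoids Z?
        seen, stack = set(zs), list(set(xs) - set(zs))
        while stack:
            v = stack.pop()
            if v in ys:
                return False
            if v not in seen:
                seen.add(v)
                stack.extend(adj[v] - seen)
        return True

    # The causal DAG of the college-plans hypothesis (node -> children).
    dag = {"SES": {"IQ", "PE", "CP"}, "SEX": {"PE"},
           "IQ": {"PE", "CP"}, "PE": {"CP"}, "CP": set()}
    print(d_separated(dag, {"SEX"}, {"CP"}, {"PE", "SES", "IQ"}))  # True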
25
Causal Markov Assumption
  • Causal Markov implies that if X is d-separated
    from Y conditional on Z in the causal DAG, then X
    is independent of Y conditional on Z.
  • Causal Markov is equivalent to assuming that the
    causal DAG represents the population
    distribution.
  • What would a failure of Causal Markov look like?
    If X and Y are dependent, but X does not cause Y,
    Y does not cause X, and no variable Z causes both
    X and Y.

26
Causal Markov Assumption
  • Assumes that no unit in the population affects
    other units in the population
  • If the natural units do affect each other, the
    units should be re-defined to be aggregations of
    units that don't affect each other
  • For example, individual people might be
    aggregated into families
  • Assumes variables are not logically related, e.g.
    x and x²
  • Assumes no feedback

27
Manipulation Theorem - No Hidden Variables
  • P(PE,SES,SEX,CP,IQ || P(PE)) =
  • P(SES) P(SEX) P(CP | PE,SES,IQ) P(IQ | SES) P(PE | policy = on) =
  • P(SES) P(SEX) P(CP | PE,SES,IQ) P(IQ | SES) P(PE), where the last factor is the new, manipulated distribution of PE

[Figure: the causal DAG, and the manipulated DAG with a policy variable pointing into PE]
28
Invariance
  • Note that P(CP | PE,SES,IQ, policy = on) =
    P(CP | PE,SES,IQ, policy = off), because the policy
    variable is d-separated from CP conditional on
    {PE, SES, IQ}
  • We say that P(CP | PE,SES,IQ) is invariant
  • An invariant quantity can be estimated from the
    pre-manipulation distribution
  • This is equivalent to one of the rules of the Do
    Calculus and can also be applied to latent
    variable models

29
Calculating Effects
[Figure: worked calculation of a manipulated marginal from the factorization; a code sketch follows below]
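A sketch of how such a calculation can go, per the manipulation theorem above: rebuild the joint with the manipulated variable's factor swapped out, leaving every other factor alone. The CPT numbers below are invented for illustration:

    from itertools import product

    # Invented binary CPTs for the DAG SES -> {IQ, PE, CP}, SEX -> PE,
    # IQ -> {PE, CP}, PE -> CP (all numbers are for illustration only).
    p_ses = [0.5, 0.5]
    p_sex = [0.5, 0.5]
    p_iq = [[0.7, 0.3], [0.4, 0.6]]          # p_iq[ses][iq]

    def p_pe(ses, sex, iq):                  # P(PE = 1 | ses, sex, iq)
        return 0.1 + 0.3 * ses + 0.2 * sex + 0.3 * iq

    def p_cp(pe, ses, iq):                   # P(CP = 1 | pe, ses, iq)
        return 0.1 + 0.4 * pe + 0.2 * ses + 0.2 * iq

    def joint(pe_mechanism):
        """Product of the factors, with PE's factor supplied by the caller."""
        j = {}
        for ses, sex, iq, pe, cp in product([0, 1], repeat=5):
            pr = p_ses[ses] * p_sex[sex] * p_iq[ses][iq]
            q = pe_mechanism(ses, sex, iq)
            pr *= q if pe else 1 - q
            r = p_cp(pe, ses, iq)
            pr *= r if cp else 1 - r
            j[(ses, sex, iq, pe, cp)] = pr
        return j

    pre = joint(p_pe)                        # population distribution
    post = joint(lambda ses, sex, iq: 1.0)   # manipulated: P(PE = 1) = 1

    # Manipulated marginal P(CP = 1 || P(PE)):
    print(sum(p for v, p in post.items() if v[4] == 1))

    # Invariance: P(CP = 1 | PE = 1, SES = 1, IQ = 1) is the same pre and post.
    def cond(j, pe, ses, iq):
        num = sum(p for v, p in j.items()
                  if (v[0], v[2], v[3], v[4]) == (ses, iq, pe, 1))
        den = sum(p for v, p in j.items() if (v[0], v[2], v[3]) == (ses, iq, pe))
        return num / den

    print(cond(pre, 1, 1, 1), cond(post, 1, 1, 1))  # equal (0.9 and 0.9)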
30
From Sample to Sets of DAGs

31
From Sample to Population to DAGs
  • Constraint-Based
  • Uses tests of conditional independence
  • Goal: find the set of DAGs whose d-separation
    relations match most closely the results of
    conditional independence tests
  • Score-Based
  • Uses scores such as the Bayesian Information
    Criterion or the Bayesian posterior
  • Goal: maximize the score

32
Two Kinds Of Search
                                                Constraint   Score
Uses non-conditional-independence information       No         Yes
Quantitative comparison of models                   No         Yes
A single test result can lead astray                Yes        No
Easy to apply to latent variable models             Yes        No
33
Bayesian Information Criterion
  • BIC(G, D) = log P(D | θ̂_G, G) − (d/2) log N, where
  • D is the sample data
  • G is a DAG
  • θ̂_G is the vector of maximum likelihood
    estimates of the parameters for DAG G
  • N is the sample size
  • d is the dimensionality of the model, which in
    DAGs without latent variables is simply the
    number of free parameters in the model (a sketch
    of the computation follows below)

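A compact sketch of computing this score for a discrete DAG, with the maximum likelihood estimates obtained by counting; the data and graphs are toy illustrations:

    import math
    from collections import Counter

    def bic(data, parents, cards):
        """BIC(G, D) = log-likelihood at the MLE - (d/2) log N, discrete case.
        `data`: list of dicts; `parents`: var -> tuple of parent vars;
        `cards`: var -> number of values."""
        n = len(data)
        ll, dim = 0.0, 0
        for v, pa in parents.items():
            # MLE of P(v | pa) is the conditional relative frequency.
            joint = Counter((tuple(r[p] for p in pa), r[v]) for r in data)
            marg = Counter(tuple(r[p] for p in pa) for r in data)
            ll += sum(c * math.log(c / marg[key]) for (key, _), c in joint.items())
            n_pa = 1
            for p in pa:
                n_pa *= cards[p]
            dim += (cards[v] - 1) * n_pa   # free parameters for this factor
        return ll - 0.5 * dim * math.log(n)

    # Toy data in which B depends on A; compare two candidate DAGs.
    data = [{"A": a, "B": (a + b) % 2} for a in (0, 1) for b in (0, 0, 0, 1)] * 25
    cards = {"A": 2, "B": 2}
    g1 = {"A": (), "B": ("A",)}   # A -> B
    g2 = {"A": (), "B": ()}       # no edge
    print(bic(data, g1, cards), bic(data, g2, cards))  # g1 scores higher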
34
3 Kinds of Alternative Causal Models
[Figure: the True Model and three alternative DAGs (Alternatives 1, 2, and 3) over SES, SEX, IQ, PE, CP]
35
Alternative Causal Models
[Figure: the True Model and Alternative 1]
  • Constraint-Based: Alternative 1 violates the Causal
    Markov Assumption by entailing that SES and IQ
    are independent
  • Score-Based: use a score that prefers a model
    that contains the true distribution over one that
    does not.

36
Alternative Causal Models
[Figure: the True Model and Alternative 2]
  • Constraint-Based: assume that if SEX and CP are
    independent (conditional on some subset of
    variables such as PE, SES, and IQ) then SEX and
    CP are not adjacent - the Causal Adjacency
    Faithfulness Assumption.
  • Score-Based: use a score such that if two
    models contain the true distribution, the one
    with fewer parameters is chosen. The True Model has
    fewer parameters (see the parameter-count sketch below).

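The parameter counts behind the score-based argument are easy to compute for discrete models. A sketch, assuming the variable cardinalities from the Sewell and Shah coding above and treating Alternative 2 as the True Model plus an extra SEX → CP edge (an assumption for illustration):

    def n_free_params(parents, cards):
        """Free parameters of a discrete DAG model: for each variable,
        (card(v) - 1) times the number of joint parent configurations."""
        total = 0
        for v, pa in parents.items():
            n_pa = 1
            for p in pa:
                n_pa *= cards[p]
            total += (cards[v] - 1) * n_pa
        return total

    cards = {"SES": 4, "SEX": 2, "IQ": 4, "PE": 2, "CP": 2}
    true_model = {"SES": (), "SEX": (), "IQ": ("SES",),
                  "PE": ("SES", "SEX", "IQ"), "CP": ("PE", "SES", "IQ")}
    alt2 = dict(true_model, CP=("PE", "SES", "IQ", "SEX"))  # extra SEX -> CP
    print(n_free_params(true_model, cards), n_free_params(alt2, cards))  # 80 112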
37
Both Assumptions Can Be False
[Figure: parameter spaces of the two models. For Alternative 2, the extra independence holds only for parameters on a lower-dimensional surface, of Lebesgue measure 0; for the True Model, the independence holds for all values of the parameters.]
38
When Not to Assume Faithfulness
  • Deterministic relationships between variables
    entail extra conditional independence
    relations, in addition to those entailed by the
    global directed Markov condition.
  • If A → B → C, where B is a deterministic function
    of A and C is a deterministic function of B, then
    not only I(A,C|B), which is entailed by the global
    directed Markov condition, but also I(B,C|A),
    which is not.
  • The deterministic relations are theoretically
    detectable, and when present, faithfulness should
    not be assumed.
  • Do not assume faithfulness in feedback systems in
    equilibrium.

39
Alternative Causal Models
[Figure: the True Model and Alternative 3]
  • Constraint-Based: Alternative 3 entails the same
    set of conditional independence relations as the
    True Model - there is no principled way to choose.

40
Alternative Causal Models
[Figure: the True Model and Alternative 3 again]
  • Score-Based: whether or not one can choose
    depends upon the parametric family.
  • For unrestricted discrete or linear Gaussian
    families, there is no way to choose - the BIC
    scores will be the same.
  • For linear non-Gaussian families, the True Model
    will be preferred (because while the two models
    entail the same second-order moments, they entail
    different fourth-order moments).

41
Patterns
  • A pattern (or p-dag) represents a set of DAGs
    that all have the same d-separation relations,
    i.e. a d-separation equivalence class of DAGs.
  • The adjacencies in a pattern are the same as the
    adjacencies in each DAG in the d-separation
    equivalence class.
  • An edge is oriented as A → B in the pattern if it
    is oriented as A → B in every DAG in the
    equivalence class.
  • An edge is left unoriented, as A — B, in the
    pattern if it is oriented as A → B in some DAGs
    in the equivalence class, and as A ← B in other
    DAGs in the equivalence class.

42
Patterns to Graphs
  • All of the DAGs in a d-separation equivalence
    class can be derived from the pattern that
    represents the class by orienting the unoriented
    edges in the pattern.
  • Every orientation of the unoriented edges is
    acceptable as long as it creates no directed
    cycles and no new unshielded colliders.
  • That is, A — B — C can be oriented as A → B → C,
    A ← B ← C, or A ← B → C, but not as A → B ← C
    (a brute-force sketch follows below).

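The orientation rule can be automated by brute force: try every orientation of the unoriented edges and keep those that are acyclic and create no new unshielded collider. A small illustrative sketch:

    from itertools import product

    def unshielded_colliders(directed, adjacent):
        """Triples a -> b <- c with a and c not adjacent."""
        return {(min(a, c), b, max(a, c))
                for a, b in directed for c, d in directed
                if d == b and a != c and frozenset((a, c)) not in adjacent}

    def acyclic(directed, nodes):
        children = {n: {b for a, b in directed if a == n} for n in nodes}
        seen, done = set(), set()
        def dfs(n):
            seen.add(n)
            for m in children[n]:
                if m in seen and m not in done:
                    return False  # back edge: directed cycle
                if m not in seen and not dfs(m):
                    return False
            done.add(n)
            return True
        return all(n in seen or dfs(n) for n in nodes)

    def extensions(directed, undirected, nodes):
        """All DAGs obtained from a pattern by orienting its undirected edges
        without creating a cycle or a new unshielded collider."""
        adjacent = ({frozenset(e) for e in directed} |
                    {frozenset(e) for e in undirected})
        base = unshielded_colliders(directed, adjacent)
        out = []
        for flips in product([False, True], repeat=len(undirected)):
            d = set(directed) | {(b, a) if flip else (a, b)
                                 for (a, b), flip in zip(undirected, flips)}
            if acyclic(d, nodes) and unshielded_colliders(d, adjacent) == base:
                out.append(d)
        return out

    # Pattern for the college-plans example: SES - IQ unoriented, rest directed.
    nodes = ["SES", "SEX", "IQ", "PE", "CP"]
    directed = [("SES", "PE"), ("SES", "CP"), ("SEX", "PE"),
                ("IQ", "PE"), ("IQ", "CP"), ("PE", "CP")]
    print(len(extensions(directed, [("SES", "IQ")], nodes)))  # 2 DAGs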
43
Patterns
[Figure: the d-separation equivalence class of DAGs for the college-plans model and its pattern, in which the SES — IQ edge is left unoriented]
44
Search Methods
  • Constraint-Based
  • PC (correct in the limit; a sketch of its skeleton
    phase follows below)
  • Variants of PC (correct in the limit, better at
    small sample sizes)
  • Score-Based
  • Greedy hill climbing
  • Simulated annealing
  • Genetic algorithms
  • Greedy Equivalence Search (correct in the limit)

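For flavor, here is a sketch of the adjacency (skeleton) phase of PC: start with a complete graph, then remove edges between pairs found independent conditional on neighbor subsets of increasing size. The independence test is a callback; here it is a d-separation oracle (this assumes the d_separated helper from the earlier sketch is in scope), whereas in practice it would be a statistical test on sample data:

    from itertools import combinations

    def pc_skeleton(variables, independent):
        """Adjacency phase of PC. `independent(x, y, z)` answers whether x and
        y are independent given the set z; returns estimated adjacency sets."""
        adj = {v: set(variables) - {v} for v in variables}
        k = 0
        while any(len(adj[x] - {y}) >= k for x in variables for y in adj[x]):
            for x in variables:
                for y in list(adj[x]):
                    # Condition on size-k subsets of x's current other neighbors.
                    for z in combinations(sorted(adj[x] - {y}), k):
                        if independent(x, y, set(z)):
                            adj[x].discard(y)
                            adj[y].discard(x)
                            break
            k += 1
        return adj

    # Illustrative oracle: d-separation in the true college-plans DAG.
    dag = {"SES": {"IQ", "PE", "CP"}, "SEX": {"PE"},
           "IQ": {"PE", "CP"}, "PE": {"CP"}, "CP": set()}
    oracle = lambda x, y, z: d_separated(dag, {x}, {y}, z)
    print(pc_skeleton(["SES", "SEX", "IQ", "PE", "CP"], oracle))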
45
From Sets of DAGs to Effects of Manipulation

46
Causal Inference in Patterns
  • Is P(IQ) invariant when SES is manipulated to a
    constant? Can't tell.
  • If SES → IQ, then the policy variable is
    d-connected to IQ given the empty set - no invariance.
  • If SES ← IQ, then the policy variable is not
    d-connected to IQ given the empty set - invariance.

[Figure: the pattern with a policy variable pointing into SES; the SES — IQ edge is unoriented]
47
Causal Inference in Patterns
  • Different DAGs represented by the pattern give
    different answers as to the effect of
    manipulating SES on IQ - the effect is not identifiable.
  • In these cases, the output should be 'can't tell'.
  • Note the difference from using Bayesian networks
    for classification - we can use either DAG
    equally well for correct classification, but we
    have to know which one is true for correct
    inference about the effect of a manipulation.

[Figure: the pattern with a policy variable pointing into SES, as above]
48
Causal Inference in Patterns
  • Is P(CP | PE,SES,IQ) invariant when PE is
    manipulated to a constant? Can tell.
  • The policy variable is d-separated from CP given
    {PE, SES, IQ} regardless of which way the unoriented
    edge points - invariance in every DAG represented
    by the pattern.

[Figure: the pattern with a policy variable pointing into PE]
49
College Plans
[Figure: the college-plans pattern, with one manipulated quantity that is not invariant but is identifiable, and one that is invariant]
50
Good News
In the large sample limit, there are algorithms
(PC, Greedy Equivalence Search) whose output is
arbitrarily close to correct (or is 'can't tell')
with probability 1 (pointwise consistency).

51
Bad News
At every finite sample size, every method will be
far from the truth with high probability for some
values of the truth (no uniform consistency).
This is typically not true of classification problems.

52
Why Bad News?
The problem: small differences in the population
distribution can lead to big changes in the
inferred causal DAGs.

53
Strengthening Faithfulness Assumption
  • Strong versus weak
  • Weak adjacency faithfulness assumes a zero
    conditional dependence between X and Y entails a
    zero-strength edge between X and Y
  • Strong adjacency faithfulness assumes in addition
    that a weak conditional dependence between X and
    Y entails a weak-strength edge between X and Y
  • Under this assumption, there are uniformly
    consistent estimators of the effects of
    manipulations.

54
Obstacles to Causal Inference from
Non-experimental Data
  • unmeasured confounders
  • measurement error, or discretization of data
  • mixtures of different causal structures in the
    sample
  • feedback
  • reversibility
  • the existence of a number of models that fit the
    data equally well
  • an enormous search space
  • low power of tests of independence conditional on
    large sets of variables
  • selection bias
  • missing values
  • sampling error
  • complicated and dense causal relations among sets
    of variables
  • complicated probability distributions

55
From Data to Sets of DAGs - Possible Hidden
Variables

56
Why Latent Variable Models?
  • For classification problems, introducing latent
    variables can help get closer to the right answer
    at smaller sample sizes - but they are not needed
    to get the right answer in the limit.
  • For causal inference problems, introducing latent
    variables is needed to get the right answer in
    the limit.

57
Score-Based Search Over Latent Models
  • Structural EM interleaves estimation of
    parameters with structural search
  • Can also search over latent variable models by
    calculating posteriors
  • But there are substantial computational and
    statistical problems with latent variable models

58
DAG Models with Latent Variables
  • Facilitates construction of causal models
  • Provides a finite search space
  • Nice statistical properties
  • Always identified
  • Correspond to a set of distributions
    characterized by independence relations
  • Have a well-defined dimension
  • Asymptotic existence of ML estimates

59
Solution
  • Embed each latent variable model in a larger
    model without latent variables that is easier to
    characterize.
  • Disadvantage - uses only conditional independence
    information in the distribution.

[Figure: nested sets of distributions - the set picked out by the latent variable model sits inside the set picked out by the model imposing only independence constraints on the observed variables]
60
Alternative Hypothesis and Some D-separations
[Figure: the alternative-hypothesis DAG over SES, SEX, PE, CP, IQ with latent variables L1 and L2]
⟨L2, {SES, L1, SEX, PE} | ∅⟩, ⟨SEX, {L1, SES, L2, IQ} | ∅⟩, ⟨L1, {SES, L2, SEX} | ∅⟩, ⟨SEX, CP | {PE, SES}⟩ - these entail conditional independence relations in the population.
⟨CP, {IQ, L1, SEX} | {L2, PE, SES}⟩, ⟨PE, {IQ, L2} | {L1, SEX, SES}⟩, ⟨IQ, {SEX, PE, CP} | {L1, L2, SES}⟩, ⟨SES, {SEX, IQ, L1, L2} | ∅⟩
61
D-separations Among Observed
[Figure: the same latent-variable DAG, with the d-separations above repeated; those whose variables are all observed, such as ⟨SEX, CP | {PE, SES}⟩, correspond to testable independence relations among the observed variables]
62
D-separations Among Observed
[Figure: the same latent-variable DAG]
It can be shown that no DAG with just the
measured variables has exactly the set of
d-separation relations among the observed
variables. In this sense, DAGs are not closed
under marginalization.
63
Mixed Ancestral Graphs
  • Under a natural extension of the concept of
    d-separation to graphs with ↔ edges, MAG(G) is a
    graphical object that contains only the observed
    variables, and has exactly the d-separations
    among the observed variables.
[Figure: the latent variable DAG and the corresponding MAG over SES, SEX, PE, CP, IQ]
64
Mixed Ancestral Graph Construction
  • There is an edge between A and B if and only if
    for every ⟨A,B|C⟩ (every set C that d-separates A
    and B in the DAG), there is a latent variable in C.
  • If A and B are adjacent, then A → B if and only
    if A is an ancestor of B.
  • If A and B are adjacent, then A ↔ B if and only
    if A is not an ancestor of B and B is not an
    ancestor of A.
  • (A sketch of this construction follows below.)

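This construction can be mechanized directly from the rules above: observed A and B are adjacent iff no subset of the other observed variables d-separates them, and the marks come from ancestor relations. An illustrative sketch (it assumes the ancestors and d_separated helpers from the d-separation sketch earlier):

    from itertools import chain, combinations

    def mag_from_dag(dag, observed):
        """MAG(G) over `observed`: A and B are adjacent iff no subset of the
        other observed variables d-separates them; A -> B if A is an ancestor
        of B, A <-> B if neither is an ancestor of the other."""
        edges = {}
        for a, b in combinations(sorted(observed), 2):
            rest = sorted(set(observed) - {a, b})
            subsets = chain.from_iterable(combinations(rest, k)
                                          for k in range(len(rest) + 1))
            if any(d_separated(dag, {a}, {b}, set(z)) for z in subsets):
                continue  # some observed conditioning set separates them
            if a in ancestors(dag, {b}):
                edges[(a, b)] = "->"
            elif b in ancestors(dag, {a}):
                edges[(b, a)] = "->"
            else:
                edges[(a, b)] = "<->"
        return edges

    # The college-plans DAG with SES treated as unmeasured, as in the slide.
    dag = {"SES": {"IQ", "PE", "CP"}, "SEX": {"PE"},
           "IQ": {"PE", "CP"}, "PE": {"CP"}, "CP": set()}
    print(mag_from_dag(dag, {"SEX", "IQ", "PE", "CP"}))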
65
Suppose SES Unmeasured

[Figure: the DAG with SES unmeasured, the corresponding MAG over SEX, PE, CP, IQ, and another DAG (with latents L1 and L2) that has the same MAG]
66
Mixed Ancestral Models
  • Can score and evaluate in the usual ways
  • Not every parameter is directly interpreted as a
    structural (causal) coefficient
  • Not every part of marginal manipulated model can
    be predicted from mixed ancestral graph
  • Because multiple DAGs can have the same MAG, they
    might not all agree on the effect of a
    manipulation.
  • It is possible to tell from the MAG when all of
    the DAGs with that MAG agree on the effect of a
    manipulation.

67
Mixed Ancestral Graph
  • Mixed ancestral models are closed under
    marginalization.
  • In the linear normal case, the parameterization
    of a MAG is just a special case of the
    parameterization of a linear structural equation
    model.
  • There is a maximum likelihood estimator of the
    parameters (Drton).
  • The BIC score is easy to calculate.
  • In the discrete case, it is not known how to
    parameterize a MAG - some progress has been made.

68
Some Markov Equivalent Mixed Ancestral Graphs
[Figure: four Markov-equivalent MAGs over SEX, PE, CP, IQ]
These different MAGs all have the same
d-separation relations.
69
Partial Ancestral Graphs
[Figure: MAGs from the equivalence class and the partial ancestral graph summarizing them, with circle marks (o) at the endpoints that vary across the class]
70
Partial Ancestral Graph represents MAG M
  • A is adjacent to B iff A and B are adjacent in M.
  • A → B iff A is an ancestor of B in every MAG
    d-separation equivalent to M.
  • A ↔ B iff A and B are not ancestors of each other
    in every MAG d-separation equivalent to M.
  • A o→ B iff B is not an ancestor of A in every MAG
    d-separation equivalent to M, and A is an
    ancestor of B in some MAGs d-separation
    equivalent to M, but not in others.
  • A o-o B iff A is an ancestor of B in some MAGs
    d-separation equivalent to M, but not in others,
    and B is an ancestor of A in some MAGs
    d-separation equivalent to M, but not in others.
  • (A sketch computing these marks follows below.)

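These mark definitions can be computed by intersecting endpoint (ancestor) information across the equivalence class, since in a MAG an A → B edge encodes that A is an ancestor of B and B is not an ancestor of A, while A ↔ B encodes that neither is an ancestor of the other. A toy sketch; the list of MAGs is hand-written for illustration and is not claimed to be an actual equivalence class:

    def edge_kinds(mags, a, b):
        """Classify the a-b edge in each MAG: '->', '<-', or '<->'.
        Each MAG is a set of (tail, head, kind) triples over one skeleton."""
        kinds = []
        for m in mags:
            if (a, b, "->") in m:
                kinds.append("->")
            elif (b, a, "->") in m:
                kinds.append("<-")
            else:
                kinds.append("<->")  # stored as (a, b, "<->") or (b, a, "<->")
        return kinds

    def pag_edge(mags, a, b):
        """PAG mark for the adjacent pair {a, b}, per the definitions above."""
        ks = set(edge_kinds(mags, a, b))
        if ks == {"->"}:
            return f"{a} -> {b}"
        if ks == {"<-"}:
            return f"{b} -> {a}"
        if ks == {"<->"}:
            return f"{a} <-> {b}"
        if "<-" not in ks:
            return f"{a} o-> {b}"  # b is never an ancestor of a
        if "->" not in ks:
            return f"{b} o-> {a}"  # a is never an ancestor of b
        return f"{a} o-o {b}"      # both endpoints vary across the class

    # Hand-written toy "equivalence class" of three MAGs (illustrative only).
    mags = [{("SES", "PE", "->"), ("SEX", "PE", "->"), ("IQ", "SES", "->")},
            {("SES", "PE", "<->"), ("SEX", "PE", "->"), ("SES", "IQ", "->")},
            {("SES", "PE", "<->"), ("SEX", "PE", "->"), ("IQ", "SES", "<->")}]
    print(pag_edge(mags, "SES", "PE"))  # SES o-> PE
    print(pag_edge(mags, "SEX", "PE"))  # SEX -> PE
    print(pag_edge(mags, "IQ", "SES"))  # IQ o-o SES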
71
Partial Ancestral Graph
  • A Partial Ancestral Graph represents:
  • the ancestor features common to the MAGs that
    are d-separation equivalent, and
  • the d-separation relations in the d-separation
    equivalence class of MAGs.
  • It can be parameterized by turning it into a mixed
    ancestral graph
  • It can be scored and evaluated like a MAG

72
FCI Algorithm
  • In the large sample limit, with probability 1,
    the output is a PAG that represents the true
    graph over O
  • If the algorithm needs to test high-order
    conditional independence relations, then it is:
  • Time consuming - in the worst case, the number of
    conditional independence tests is exponential
    (when the PAG is complete)
  • Unreliable (low power of the tests)
  • Modified versions can halt at any given order of
    conditional independence test, at the cost of
    more 'can't tell' answers.
  • The output is not useful information when each
    pair of variables has a common hidden cause.
  • There is a provably correct score-based search,
    but it outputs 'can't tell' in most cases

73
Output for College Plans
[Figure: the output of the FCI algorithm, and the PAG corresponding to the output of the PC algorithm. These are different because no DAG can represent the d-separations in the output of the FCI algorithm.]
74
From Sets of DAGs to Effects of Manipulations -
May Be Hidden Common Causes

75
Manipulation Model for PAGs
  • A PAG can be used to calculate the results of
    manipulations for which every DAG represented by
    the PAG gives the same answer.
  • It is possible to tell from the PAG that the
    policy variable for PE is d-separated from CP
    given PE. Hence P(CP | PE) is invariant.

[Figure: the PAG over SES, SEX, PE, CP, IQ, with circle marks]
76
Comparison with non-latent case
  • FCI:
  • P(cp | pe || P(PE)) = P(cp | pe)
  • P(CP = 0 | PE = 0 || P(PE)) = .063
  • P(CP = 1 | PE = 0 || P(PE)) = .937
  • P(CP = 0 | PE = 1 || P(PE)) = .572
  • P(CP = 1 | PE = 1 || P(PE)) = .428
  • PC:
  • P(CP = 0 | PE = 0 || P(PE)) = .095
  • P(CP = 1 | PE = 0 || P(PE)) = .905
  • P(CP = 0 | PE = 1 || P(PE)) = .484
  • P(CP = 1 | PE = 1 || P(PE)) = .516

77
Good News
In the large sample limit, there is an algorithm
(FCI) whose output is arbitrarily close to
correct (or is 'can't tell') with probability
1 (pointwise consistency).

78
Bad News
At every finite sample size, every method will be
arbitrarily far from the truth with high probability
for some values of the truth (no uniform
consistency).

79
Other Constraints
  • The disadvantage of using MAGs or FCI is that
    they use only conditional independence information
  • In the case of latent variable models, there are
    constraints implied on the observed margin that
    are not conditional independence relations,
    regardless of the family of distributions
  • These can be used to choose between two different
    latent variable models that have the same
    d-separation relations over the observed
    variables
  • In addition, there are constraints implied on the
    observed margin that are particular to a family
    of distributions

80
Examples of Open Questions
  • Complete non-parametric manipulation calculations
    for partially known DAGs with latent variables
  • Define strong faithfulness for the latent case.
  • Calculating constraints (non-parametric or
    parametric) from latent variable DAGs
  • Using constraints (non-parametric or parametric)
    to guide search for latent variable DAGs
  • Latent variable score-based search over PAGs
  • Parameterizations of MAGs for other families of
    distributions
  • Completeness of do-calculus for PAGs
  • Time series inference

81
Introductory Books on Graphical Causal Inference
  • Causation, Prediction, and Search, by P. Spirtes,
    C. Glymour, and R. Scheines, MIT Press, 2000.
  • Causality: Models, Reasoning, and Inference, by
    J. Pearl, Cambridge University Press, 2000.
  • Computation, Causation, and Discovery, ed. by
    C. Glymour and G. Cooper, AAAI Press, 1999.