Title: Learning Causality
1Learning Causality
Some slides are from Judea Pearls class
lecture http//bayes.cs.ucla.edu/BOOK-2K/viewgraph
s.html
2A causal model Example
- Statement rain causes mud implies an asymmetric
relationship the rain will create mud, but the
mud will not create rain. - Use ? when refer such causal relationship
- There is no arrow between rain and other
causes of mud means that there is no direct
causal relationship between them
3Directed (causal) Graphs
- A and B are causally independent
- C, D, E, and F are causally dependent on A and B
- A and B are direct causes C
- A and B are indirect causes D, E and F
- If C is prevented from changing with A and B,
then A and B will no longer cause changes in D, E
and F.
4Conditional Independence
5Conditional Independence
6Conditional Independence (Notation)
7Causal Structure
8Causal Structure (contd)
- A Causal Structure serves as a blueprint for
forming a casual model a precise
specification of how each variable is influenced
by its parents in the DAG. - We assume that Nature is at liberty to impose
arbitrary functional relationships between each
effect and its causes and then to perturb these
relationships by introducing arbitrary
disturbance - These disturbances reflect hidden or
unmeasurable conditions.
9Causal Model
10Causal Model (Contd)
- Once a causal model M is formed, it defines a
joint probability distribution P(M) over the
variables in the system - This distribution reflects some features of the
causal structure - Each variable must be independent of its
grandparents, given the values of its parents - We may allowed to inspect a select subset O?V of
observed variables to ask questions about Po,
the probability distribution over the
observations - We may recover the topology D of the DAG, from
features of the probability distribution Po.
11Inferred Causation
12Latent Structure
13Structure Preference
14Structure Preference (Contd)
- The set of independencies entailed by a causal
structure imposes limits on its power to mimic
other structure - L1 cannot be preferred to L2 if there is even one
observable dependency that is permitted by L1 and
forbidden by L2 - L1 is preferred to L2 if L2 has subset of L1s
independence - Thus, test for preference and equivalence can
sometimes be reduced to test dependencies, which
can be determined by topology of the DAGs without
concerning parameters.
15Minimality
16Consistency
17Inferred Causation
18Examples
- a,b,c,d reveal two independencies
- a is independent of b
- d is independent of a,b given c
- Assume further that the data reveals no other
independencies - a having a cold
- b having hay fever
- c having to sneeze
- d having to wipe ones nose.
19Example (Contd)
- a,b,c,d reveal two independencies
- a is independent of b
- d is independent of a,b given c
Not minimal fails to impose conditional
Independence between d and a,b
Not consistent with data impose
marginal independence between d and a,b
20Stability
The stability condition states that, as we vary
the parmeters from ? to ??, no indpendence in P
can be destroyed. In other words, if the
independency exists, it will always exists.
21Stable distribution
- A probability distribution P is a faithful/stable
distribution if there exist a directed acyclic
graph (DAG) D such that the conditional
independence relationship in P is also shown in
the D, and vice versa.
22IC algorithm (Inductive Causation)
- IC algorithm (Pearl)
- Based on variable dependencies
- Find all pairs of variables that are dependent of
each other (applying standard statistical method
on the database) - Eliminate (as much as possible) indirect
dependencies - Determine directions of dependencies
23Comparing abduction, deduction and induction
A gt B A --------- B
- Deduction major premise All balls in the
box are black - minor premise
These balls are from the box - conclusion
These balls are black - Abduction rule All balls
in the box are black - observation
These balls are black - explanation These balls
are from the box - Induction case These
balls are from the box - observation
These balls are black - hypothesized rule All
ball in the box are black -
A gt B B ------------- Possibly A
Whenever A then B but not vice versa -------------
Possibly A gt B
Induction from specific cases to general
rules Abduction and deduction both from
part of a specific case to other part of
the case using general rules (in different ways)
Source from httpwww.csee.umbc.edu/ypeng/F02671/le
cture-notes/Ch15.ppt
24IC Algorithm (Contd)
- Input
- P a stable distribution on a set V of
variables - Output
- A pattern H(P) compatible with P
- Patten is a partially directed DAG
- some edges are directed and
- some edges are undirected
25IC Algorithm Step 1
- For each pair of variables a and b in V, search
for a set Sab such that (a-b Sab) holds in P
in other words, a and b should be independent in
P, conditioned on Sab . - Construct an undirected graph G such that
vertices a and b are connected with an edge if
and only if no set Sab can be found.
26IC Algorithm Step 2
- For each pair of nonadjacent variables a and b
with a common neighbor c, check if c? Sab. - If it is, then continue
- Else add arrowheads at c
- i.e a? c ? b
27Example
28IC Algorithm Step 3
- In the partially directed graph that results,
orient as many of the undirected edges as
possible subject to two conditions - The orientation should not create a new
v-structure - The orientation should not create a directed
cycle
29Rules required to obtaining a maximally oriented
pattern
- R1 Orient b c into b?c whenever there is an
arrow a?b such that a and c are non adjacent
30Rules required to obtaining a maximally oriented
pattern
- R2 Orient a b into a?b whenever there is a
chain a?c?b
31Rules required to obtaining a maximally oriented
pattern
- R3 Orient a b into a?b whenever there are two
chains ac?b and ad?b such that c and d are
nonadjacent
32Rules required to obtaining a maximally oriented
pattern
- R4 Orient a b into a?b whenever there are two
chains ac?d and c?d?b such that c and b are
nonadjacent
c
a
d
d
c
b
33IC Algorithm
- Input
- P, a sampled distribution
- Output
- core(P), a marked pattern
34Marked PatternFour types of edges
35IC Algorithm Step 1
- For each pair of variables a and b, search for a
set Sab such that a and b are independent in P,
conditioned on Sab. If there is no such Sab,
place an undirected link between the two
variables, a b.
36IC Algorithm Step 2
- For each pair of nonadjacent variables a and b
with a common neighbor c, check if c?Sab - If it is, then continue
- If it is not, then add arrow heads pointing at c
(i.e. a ? c ? b). - In the partially directed graph that results, add
(recursively) as many arrowheads as possible, and
mark as many edges as possible, according to the
following two rules
37IC Algorithm Rule 1
- R1 For each pair of non-adjacent nodes a and b
with a common neighbor c, if the link between a
and c has an arrow head into c and if the link
between c and b has no arrowhead into c, then add
an arrow head on the link between c and b
pointing at b and mark that link to obtain c ?
b
38IC Algorithm Rule 2
- R2 If a and b are adjacent and there is a
directed path (composed strictly of marked links)
from a to b, then add an arrowhead pointing
toward b on the link between a and b