Title: Part II: Graphical models
1. Part II: Graphical models
2. Challenges of probabilistic models
- Specifying well-defined probabilistic models with many variables is hard (for modelers)
- Representing probability distributions over those variables is hard (for computers/learners)
- Computing quantities using those distributions is hard (for computers/learners)
3. Representing structured distributions
- Four random variables, each with domain {0, 1}
  - X1: coin toss produces heads
  - X2: pencil levitates
  - X3: friend has psychic powers
  - X4: friend has two-headed coin
4. Joint distribution
- The joint distribution assigns a probability to every assignment (x1, x2, x3, x4): 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111
- Requires 15 numbers to specify the probability of all values x1, x2, x3, x4
- N binary variables require 2^N - 1 numbers
- Similar cost when computing conditional probabilities
5. How can we use fewer numbers?
- Four random variables, each with domain {0, 1}
  - X1: coin toss produces heads
  - X2: coin toss produces heads
  - X3: coin toss produces heads
  - X4: coin toss produces heads
6. Statistical independence
- Two random variables X1 and X2 are independent if P(x1 | x2) = P(x1)
  - e.g. coin flips: P(x1 = H | x2 = H) = P(x1 = H) = 0.5
- Independence makes it easier to represent and work with probability distributions
- We can exploit the product rule: if x1, x2, x3, and x4 are all independent, P(x1, x2, x3, x4) = P(x1) P(x2) P(x3) P(x4) (see the sketch below)
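To make the saving concrete, here is a minimal sketch (hypothetical, with fair coins) of how independence shrinks the representation: four marginal probabilities determine all sixteen joint probabilities, instead of the 15 free numbers a full table would need.

```python
import itertools

# Hypothetical marginals P(x_i = 1) for four independent coin flips:
# 4 numbers, versus 2^4 - 1 = 15 for an unrestricted joint table.
p_heads = [0.5, 0.5, 0.5, 0.5]

def joint(x):
    """P(x1, x2, x3, x4) under full independence: the product of the marginals."""
    prob = 1.0
    for value, p in zip(x, p_heads):
        prob *= p if value == 1 else 1 - p
    return prob

# The four numbers still define a proper distribution over all 16 outcomes.
print(sum(joint(x) for x in itertools.product([0, 1], repeat=4)))  # 1.0
print(joint((1, 0, 1, 1)))  # 0.0625 for fair coins
```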
7. Expressing independence
- Statistical independence is the key to efficient probabilistic representation and computation
- This has led to the development of languages for indicating dependencies among variables
- Some of the most popular languages are based on graphical models
8. Part II: Graphical models
- Introduction to graphical models
  - representation and inference
- Causal graphical models
  - causality
  - learning about causal relationships
- Graphical models and cognitive science
  - uses of graphical models
  - an example: causal induction
9. Part II: Graphical models
- Introduction to graphical models
  - representation and inference
- Causal graphical models
  - causality
  - learning about causal relationships
- Graphical models and cognitive science
  - uses of graphical models
  - an example: causal induction
10. Graphical models
- Express the probabilistic dependency structure among a set of variables (Pearl, 1988)
- Consist of
  - a set of nodes, corresponding to variables
  - a set of edges, indicating dependency
  - a set of functions defined on the graph that specify a probability distribution
11. Undirected graphical models
(figure: undirected graph over nodes X1-X5)
- Consist of
  - a set of nodes
  - a set of edges
  - a potential for each clique, multiplied together to yield the distribution over variables
- Examples
  - statistical physics: Ising model, spin glasses
  - early neural networks (e.g. Boltzmann machines)
12. Directed graphical models
(figure: directed acyclic graph over nodes X1-X5)
- Consist of
  - a set of nodes
  - a set of edges
  - a conditional probability distribution for each node, conditioned on its parents, multiplied together to yield the distribution over variables
- Constrained to directed acyclic graphs (DAGs)
- Called Bayesian networks or Bayes nets
13. Bayesian networks and Bayes
- Two different problems
  - Bayesian statistics is a method of inference
  - Bayesian networks are a form of representation
- There is no necessary connection
  - many users of Bayesian networks rely upon frequentist statistical methods
  - many Bayesian inferences cannot be easily represented using Bayesian networks
14. Properties of Bayesian networks
- Efficient representation and inference
  - exploiting dependency structure makes it easier to represent and compute with probabilities
- Explaining away
  - a pattern of probabilistic reasoning characteristic of Bayesian networks, especially in their early use in AI
15. Properties of Bayesian networks
- Efficient representation and inference
  - exploiting dependency structure makes it easier to represent and compute with probabilities
- Explaining away
  - a pattern of probabilistic reasoning characteristic of Bayesian networks, especially in their early use in AI
16. Efficient representation and inference
- Four random variables
  - X1: coin toss produces heads
  - X2: pencil levitates
  - X3: friend has psychic powers
  - X4: friend has two-headed coin
17. The Markov assumption
- Every node is conditionally independent of its non-descendants, given its parents:
  P(x1, ..., xN) = prod_i P(xi | Pa(Xi))
  where Pa(Xi) is the set of parents of Xi (via the product rule)
18. Efficient representation and inference
- Four random variables
  - X1: coin toss produces heads
  - X2: pencil levitates
  - X3: friend has psychic powers
  - X4: friend has two-headed coin
(figure: Bayes net over X1-X4, annotated with the number of parameters needed per node: 1, 1, 4, 2; total 7 vs. 15 for the full joint)
P(x1, x2, x3, x4) = P(x1 | x3, x4) P(x2 | x3) P(x3) P(x4)
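As a sketch of what this factorization buys, the snippet below builds the joint for the psychic-friend example from its four conditional distributions. The CPT values are illustrative placeholders; the slide's actual numbers are not preserved in this transcript.

```python
# Placeholder CPTs for the network X3 -> X1 <- X4, X3 -> X2 (values are made up).
p_x3 = 0.01                        # P(x3 = 1): friend has psychic powers
p_x4 = 0.01                        # P(x4 = 1): friend has a two-headed coin
p_x2_given_x3 = {0: 0.0, 1: 0.9}   # P(x2 = 1 | x3): pencil levitates
p_x1_given_x3_x4 = {               # P(x1 = 1 | x3, x4): coin toss produces heads
    (0, 0): 0.5, (0, 1): 1.0, (1, 0): 0.9, (1, 1): 1.0,
}

def joint(x1, x2, x3, x4):
    """P(x1, x2, x3, x4) = P(x1 | x3, x4) P(x2 | x3) P(x3) P(x4)."""
    p = p_x1_given_x3_x4[(x3, x4)] if x1 else 1 - p_x1_given_x3_x4[(x3, x4)]
    p *= p_x2_given_x3[x3] if x2 else 1 - p_x2_given_x3[x3]
    p *= p_x3 if x3 else 1 - p_x3
    p *= p_x4 if x4 else 1 - p_x4
    return p

print(joint(1, 0, 0, 0))  # heads, no levitation, no powers, normal coin
```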
19. Reading a Bayesian network
- The structure of a Bayes net can be read as the generative process behind a distribution
- Gives the joint probability distribution over variables obtained by sampling each variable conditioned on its parents
20. Reading a Bayesian network
- Four random variables
  - X1: coin toss produces heads
  - X2: pencil levitates
  - X3: friend has psychic powers
  - X4: friend has two-headed coin
(figure: Bayes net with edges X3 -> X1, X4 -> X1, X3 -> X2)
P(x1, x2, x3, x4) = P(x1 | x3, x4) P(x2 | x3) P(x3) P(x4)
21. Reading a Bayesian network
- The structure of a Bayes net can be read as the generative process behind a distribution
- Gives the joint probability distribution over variables obtained by sampling each variable conditioned on its parents (see the sketch below)
- Simple rules for determining whether two variables are dependent or independent
- Independence makes inference more efficient
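One way to read the net as a generative process is ancestral sampling: draw each variable from its conditional distribution once its parents have been drawn. This sketch reuses the placeholder CPTs (p_x3, p_x4, p_x2_given_x3, p_x1_given_x3_x4) from the snippet after slide 18.

```python
import random

def bernoulli(p):
    return 1 if random.random() < p else 0

def ancestral_sample():
    """Sample each variable conditioned on its parents, in topological order."""
    x3 = bernoulli(p_x3)                        # psychic powers
    x4 = bernoulli(p_x4)                        # two-headed coin
    x2 = bernoulli(p_x2_given_x3[x3])           # pencil levitates
    x1 = bernoulli(p_x1_given_x3_x4[(x3, x4)])  # coin toss produces heads
    return x1, x2, x3, x4

samples = [ancestral_sample() for _ in range(10000)]
print(sum(s[0] for s in samples) / len(samples))  # Monte Carlo estimate of P(x1 = 1)
```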
22. Computing with Bayes nets
P(x1, x2, x3, x4) = P(x1 | x3, x4) P(x2 | x3) P(x3) P(x4)
23. Computing with Bayes nets
(sum over 8 values)
P(x1, x2, x3, x4) = P(x1 | x3, x4) P(x2 | x3) P(x3) P(x4)
24. Computing with Bayes nets
P(x1, x2, x3, x4) = P(x1 | x3, x4) P(x2 | x3) P(x3) P(x4)
25. Computing with Bayes nets
(sum over 4 values)
P(x1, x2, x3, x4) = P(x1 | x3, x4) P(x2 | x3) P(x3) P(x4)
26. Computing with Bayes nets
- Inference algorithms for Bayesian networks exploit dependency structure (see the enumeration sketch below)
- Message-passing algorithms
  - belief propagation passes simple messages between nodes; exact for tree-structured networks
- More general inference algorithms
  - exact: junction-tree
  - approximate: Monte Carlo schemes (see Part IV)
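For a network this small, exact inference can be done by brute-force enumeration over the factorized joint; the structured algorithms above perform the same computation while pushing sums inside the product. A sketch, reusing the joint function and placeholder CPTs from the snippet after slide 18:

```python
import itertools

def conditional(query_var, query_val, evidence):
    """P(query_var = query_val | evidence), by summing the factorized joint over
    all assignments consistent with the evidence (exact, but exponential in N)."""
    names = ("x1", "x2", "x3", "x4")
    numerator = denominator = 0.0
    for values in itertools.product([0, 1], repeat=4):
        assignment = dict(zip(names, values))
        if any(assignment[var] != val for var, val in evidence.items()):
            continue
        p = joint(*values)
        denominator += p
        if assignment[query_var] == query_val:
            numerator += p
    return numerator / denominator

# e.g. probability the friend has psychic powers, given that the pencil levitated
print(conditional("x3", 1, {"x2": 1}))
```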
27. Properties of Bayesian networks
- Efficient representation and inference
  - exploiting dependency structure makes it easier to represent and compute with probabilities
- Explaining away
  - a pattern of probabilistic reasoning characteristic of Bayesian networks, especially in their early use in AI
28. Explaining away
- Assume grass will be wet if and only if it rained last night, or if the sprinklers were left on
29. Explaining away
Compute probability it rained last night, given that the grass is wet
30. Explaining away
Compute probability it rained last night, given that the grass is wet
31. Explaining away
Compute probability it rained last night, given that the grass is wet
32. Explaining away
Compute probability it rained last night, given that the grass is wet
33. Explaining away
Compute probability it rained last night, given that the grass is wet
34. Explaining away
Compute probability it rained last night, given that the grass is wet and sprinklers were left on
35. Explaining away
Compute probability it rained last night, given that the grass is wet and sprinklers were left on
36. Explaining away
Discounting to prior probability.
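A small self-contained sketch of the discounting pattern, using made-up priors for Rain and Sprinkler and the slide's assumption that the grass is wet exactly when it rained or the sprinklers were left on:

```python
import itertools

# Illustrative priors (not from the slides): Rain and Sprinkler are independent causes.
P_RAIN, P_SPRINKLER = 0.3, 0.5

def joint(rain, sprinkler, wet):
    p = (P_RAIN if rain else 1 - P_RAIN) * (P_SPRINKLER if sprinkler else 1 - P_SPRINKLER)
    # Slide 28's assumption: grass is wet if and only if it rained or sprinklers were on.
    return p if wet == (rain or sprinkler) else 0.0

def p_rain_given(**evidence):
    numerator = denominator = 0.0
    for rain, sprinkler, wet in itertools.product([0, 1], repeat=3):
        a = {"rain": rain, "sprinkler": sprinkler, "wet": wet}
        if any(a[k] != v for k, v in evidence.items()):
            continue
        p = joint(rain, sprinkler, wet)
        denominator += p
        if rain:
            numerator += p
    return numerator / denominator

print(p_rain_given(wet=1))               # raised above the prior 0.3
print(p_rain_given(wet=1, sprinkler=1))  # discounted back to the prior: explaining away
```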
37. Contrast w/ production system
(figure: Rain -> Grass Wet)
- Formulate IF-THEN rules
  - IF Rain THEN Wet
  - IF Wet THEN Rain
- Rules do not distinguish directions of inference
- Requires a combinatorial explosion of rules
38. Contrast w/ spreading activation
(figure: network with nodes Rain, Sprinkler, Grass Wet)
- Observing rain, Wet becomes more active
- Observing grass wet, Rain and Sprinkler become more active
- Observing grass wet and sprinkler, Rain cannot become less active. No explaining away!
- Excitatory links: Rain -> Wet, Sprinkler -> Wet
39. Contrast w/ spreading activation
(figure: network with nodes Rain, Sprinkler, Grass Wet)
- Excitatory links: Rain -> Wet, Sprinkler -> Wet
- Inhibitory link: Rain -- Sprinkler
- Observing grass wet, Rain and Sprinkler become more active
- Observing grass wet and sprinkler, Rain becomes less active: explaining away
40. Contrast w/ spreading activation
(figure: network with nodes Rain, Burst pipe, Sprinkler, Grass Wet)
- Each new variable requires more inhibitory connections
- Not modular
  - whether a connection exists depends on what others exist
  - big holism problem
  - combinatorial explosion
41. Contrast w/ spreading activation
(McClelland & Rumelhart, 1981)
42. Graphical models
- Capture dependency structure in distributions
- Provide an efficient means of representing and reasoning with probabilities
- Allow kinds of inference that are problematic for other representations: explaining away
  - hard to capture in a production system
  - more natural than with spreading activation
43. Part II: Graphical models
- Introduction to graphical models
  - representation and inference
- Causal graphical models
  - causality
  - learning about causal relationships
- Graphical models and cognitive science
  - uses of graphical models
  - an example: causal induction
44. Causal graphical models
- Graphical models represent statistical dependencies among variables (i.e., correlations)
  - can answer questions about observations
- Causal graphical models represent causal dependencies among variables (Pearl, 2000)
  - express underlying causal structure
  - can answer questions about both observations and interventions (actions upon a variable)
45. Bayesian networks
- Nodes: variables
- Links: dependency
- Each node has a conditional probability distribution
- Data: observations of x1, ..., x4
- Four random variables
  - X1: coin toss produces heads
  - X2: pencil levitates
  - X3: friend has psychic powers
  - X4: friend has two-headed coin
46. Causal Bayesian networks
- Nodes: variables
- Links: causality
- Each node has a conditional probability distribution
- Data: observations of and interventions on x1, ..., x4
- Four random variables
  - X1: coin toss produces heads
  - X2: pencil levitates
  - X3: friend has psychic powers
  - X4: friend has two-headed coin
47. Interventions
- Four random variables
  - X1: coin toss produces heads
  - X2: pencil levitates
  - X3: friend has psychic powers
  - X4: friend has two-headed coin
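Interventions are handled by "graph surgery": setting a variable by outside action cuts the edges from its parents, so the action carries no evidence about them. A minimal sketch with assumed numbers, contrasting do(x2 = 1) (levitating the pencil yourself) with merely observing the pencil levitate:

```python
# Assumed numbers for the psychic-powers example (illustrative only).
p_x3 = 0.01                         # prior P(friend has psychic powers)
p_x2_given_x3 = {0: 0.001, 1: 0.9}  # P(pencil levitates | psychic powers)

def p_powers_given_levitation_observed():
    """Observing the pencil levitate is evidence for psychic powers (Bayes' rule)."""
    numerator = p_x2_given_x3[1] * p_x3
    return numerator / (numerator + p_x2_given_x3[0] * (1 - p_x3))

def p_powers_given_levitation_intervened():
    """Under do(x2 = 1), graph surgery removes the edge x3 -> x2:
    lifting the pencil yourself says nothing about psychic powers."""
    return p_x3

print(p_powers_given_levitation_observed())    # well above the prior 0.01
print(p_powers_given_levitation_intervened())  # exactly the prior 0.01
```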
48. Learning causal graphical models
- Strength: how strong is a relationship?
- Structure: does a relationship exist?
49. Causal structure vs. causal strength
- Strength: how strong is a relationship?
(figure: candidate causal graphs, each including a background cause B)
50. Causal structure vs. causal strength
- Strength: how strong is a relationship?
  - requires defining the nature of the relationship
(figure: candidate causal graphs, each including a background cause B)
51. Parameterization
- Structures: h1 (C -> E and B -> E) vs. h0 (B -> E only)
- Parameterization:
(table: P(E = 1 | C, B) under h1 and under h0, for each combination of C, B in {(0,0), (1,0), (0,1), (1,1)})
52. Parameterization
- Structures: h1 (C -> E and B -> E) vs. h0 (B -> E only)
- Parameterization:
(table: P(E = 1 | C, B) under h1 and under h0, for each combination of C, B in {(0,0), (1,0), (0,1), (1,1)})
53. Parameterization
- Structures: h1 (C -> E and B -> E) vs. h0 (B -> E only)
- Parameterization:
(table: P(E = 1 | C, B) under h1 and under h0, for each combination of C, B in {(0,0), (1,0), (0,1), (1,1)})
54. Parameter estimation
- Maximum likelihood estimation
  - maximize prod_i P(bi, ci, ei | w0, w1) (see the sketch below)
- Bayesian methods, as in Part I
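A sketch of the maximum-likelihood step under the noisy-OR parameterization, using hypothetical contingency counts and a simple grid search; under this parameterization the ML value of w1 coincides with causal power, as slide 73 notes.

```python
import math
from itertools import product

# Hypothetical contingency counts N(e, c), with the background cause B always present:
# here P(e+|c+) = 6/8 and P(e+|c-) = 2/8.
counts = {(1, 1): 6, (0, 1): 2, (1, 0): 2, (0, 0): 6}

def loglik_noisy_or(w0, w1):
    """Log-likelihood under the noisy-OR parameterization:
    P(e+ | c) = w0 + w1*c - w0*w1*c."""
    ll = 0.0
    for (e, c), n in counts.items():
        p1 = w0 + w1 * c - w0 * w1 * c
        p = p1 if e == 1 else 1 - p1
        ll += n * math.log(max(p, 1e-12))
    return ll

grid = [i / 100 for i in range(101)]
w0_hat, w1_hat = max(product(grid, grid), key=lambda w: loglik_noisy_or(*w))
print(w0_hat, w1_hat)  # near 0.25 and the causal power (6/8 - 2/8) / (1 - 2/8) = 2/3
```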
55. Causal structure vs. causal strength
- Structure: does a relationship exist?
(figure: candidate causal graphs, each including a background cause B)
56. Approaches to structure learning
- Constraint-based (Pearl, 2000; Spirtes et al., 1993)
  - dependency from statistical tests (e.g., chi-square)
  - deduce structure from dependencies
(figure: candidate graphs over C, B, and E)
57. Approaches to structure learning
- Constraint-based (Pearl, 2000; Spirtes et al., 1993)
  - dependency from statistical tests (e.g., chi-square)
  - deduce structure from dependencies
(figure: candidate graphs over C, B, and E)
58. Approaches to structure learning
- Constraint-based (Pearl, 2000; Spirtes et al., 1993)
  - dependency from statistical tests (e.g., chi-square)
  - deduce structure from dependencies
(figure: candidate graphs over C, B, and E)
59. Approaches to structure learning
- Constraint-based (Pearl, 2000; Spirtes et al., 1993)
  - dependency from statistical tests (e.g., chi-square)
  - deduce structure from dependencies
(figure: candidate graphs over C, B, and E)
Attempts to reduce an inductive problem to a deductive problem
60. Approaches to structure learning
- Constraint-based (Pearl, 2000; Spirtes et al., 1993)
  - dependency from statistical tests (e.g., chi-square)
  - deduce structure from dependencies
- Bayesian (Heckerman, 1998; Friedman, 1999)
  - compute the posterior probability of structures, given observed data:
    P(h | data) ∝ P(data | h) P(h), comparing P(h1 | data) and P(h0 | data)
(figure: candidate graphs h1 (C -> E and B -> E) and h0 (B -> E only))
61. Bayesian Occam's Razor
(figure: P(d | h) plotted over all possible data sets d, for h0 (no relationship) and h1 (relationship))
For any model h, the probabilities assigned to all possible data sets must sum to 1.
62. Causal graphical models
- Extend graphical models to deal with interventions as well as observations
- Respecting the direction of causality results in efficient representation and inference
- Two steps in learning causal models
  - strength: parameter estimation
  - structure: structure learning
63. Part II: Graphical models
- Introduction to graphical models
  - representation and inference
- Causal graphical models
  - causality
  - learning about causal relationships
- Graphical models and cognitive science
  - uses of graphical models
  - an example: causal induction
64. Uses of graphical models
- Understanding existing cognitive models
  - e.g., neural network models
- Representation and reasoning
  - a way to address holism in induction (cf. Fodor)
- Defining generative models
  - mixture models, language models (see Part IV)
- Modeling human causal reasoning
65. Human causal reasoning
- How do people reason about interventions?
  - (Gopnik, Glymour, Sobel, Schulz, Kushnir, & Danks, 2004; Lagnado & Sloman, 2004; Sloman & Lagnado, 2005; Steyvers, Tenenbaum, Wagenmakers, & Blum, 2003)
- How do people learn about causal relationships?
  - parameter estimation (Shanks, 1995; Cheng, 1997)
  - constraint-based models (Glymour, 2001)
  - Bayesian structure learning (Steyvers et al., 2003; Griffiths & Tenenbaum, 2005)
66. Causation from contingencies
                      C present (c+)    C absent (c-)
  E present (e+)            a                 c
  E absent  (e-)            b                 d
Does C cause E? (rate on a scale from 0 to 100)
67. Two models of causal judgment
- Delta-P (Jenkins & Ward, 1965): ΔP = P(e+ | c+) - P(e+ | c-)
- Power PC (Cheng, 1997): power = ΔP / (1 - P(e+ | c-)) (see the sketch below)
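In code, both judgments are simple functions of the contingency table. The cell labels below follow the standard convention (a = N(e+, c+), b = N(e-, c+), c = N(e+, c-), d = N(e-, c-)), since the table's exact layout is not preserved in this transcript.

```python
def delta_p_and_power(a, b, c, d):
    """a, b, c, d: contingency counts N(e+,c+), N(e-,c+), N(e+,c-), N(e-,c-)."""
    p_e_given_c = a / (a + b)       # P(e+ | c+)
    p_e_given_not_c = c / (c + d)   # P(e+ | c-)
    delta_p = p_e_given_c - p_e_given_not_c      # Jenkins & Ward (1965)
    power = delta_p / (1 - p_e_given_not_c)      # Cheng (1997), generative causes
    return delta_p, power

print(delta_p_and_power(6, 2, 2, 6))  # (0.5, 0.666...)
```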
68. Buehner and Cheng (1997)
(figure: human judgments ("People") compared with ΔP and Power predictions)
69. Buehner and Cheng (1997)
(figure: human judgments compared with ΔP and Power predictions)
Constant ΔP, changing judgments
70. Buehner and Cheng (1997)
(figure: human judgments compared with ΔP and Power predictions)
Constant causal power, changing judgments
71. Buehner and Cheng (1997)
(figure: human judgments compared with ΔP and Power predictions)
ΔP = 0, changing judgments
72. Causal structure vs. causal strength
- Strength: how strong is a relationship?
- Structure: does a relationship exist?
(figure: candidate causal graphs, each including a background cause B)
73. Causal strength
- Assume the structure: both B and C are potential causes of E
- ΔP and causal power are maximum likelihood estimates of the strength parameter w1, under different parameterizations for P(E | B, C)
  - linear -> ΔP; noisy-OR -> causal power
74. Causal structure
- Hypotheses: h1 (C -> E and B -> E) vs. h0 (B -> E only)
- Bayesian causal inference
  - support: the likelihood ratio (Bayes factor) P(d | h1) / P(d | h0) gives evidence in favor of h1 (approximated in the sketch below)
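A sketch of how causal support can be approximated numerically, assuming uniform priors on the strength parameters and the noisy-OR parameterization for h1; a coarse grid integration stands in for the exact integral, and support is expressed here as a log Bayes factor.

```python
import math

def loglik(counts, p_c_plus, p_c_minus):
    """Log-likelihood of contingency counts given P(e+|c+) and P(e+|c-)."""
    a, b, c, d = counts  # a=N(e+,c+), b=N(e-,c+), c=N(e+,c-), d=N(e-,c-)
    eps = 1e-12
    return (a * math.log(max(p_c_plus, eps)) + b * math.log(max(1 - p_c_plus, eps))
            + c * math.log(max(p_c_minus, eps)) + d * math.log(max(1 - p_c_minus, eps)))

def log_marginal_h1(counts, n=100):
    """P(d | h1): noisy-OR with uniform priors on w0, w1, integrated on a grid."""
    grid = [(i + 0.5) / n for i in range(n)]
    total = 0.0
    for w0 in grid:
        for w1 in grid:
            total += math.exp(loglik(counts, w0 + w1 - w0 * w1, w0)) / (n * n)
    return math.log(total)

def log_marginal_h0(counts, n=100):
    """P(d | h0): no C -> E edge, so P(e+|c+) = P(e+|c-) = w0, uniform prior on w0."""
    grid = [(i + 0.5) / n for i in range(n)]
    return math.log(sum(math.exp(loglik(counts, w0, w0)) for w0 in grid) / n)

counts = (6, 2, 2, 6)
support = log_marginal_h1(counts) - log_marginal_h0(counts)  # log Bayes factor for h1
print(support)
```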
75. Buehner and Cheng (1997)
(figure: human judgments compared with model predictions)
- ΔP (r = 0.89)
- Power (r = 0.88)
- Support (r = 0.97)
76. The importance of parameterization
- Noisy-OR incorporates mechanism assumptions (Cheng, 1997)
  - generativity: causes increase the probability of their effects
  - each cause is sufficient to produce the effect
  - causes act via independent mechanisms
- Consider other models
  - statistical dependence: chi-square test
  - generic parameterization (cf. Anderson, 1990)
77. (figure: human judgments ("People") compared with predictions of Support (noisy-OR), chi-square, and Support (generic))
78. Generativity is essential
(figure: support predictions for contingencies with P(e+ | c+) = P(e+ | c-) equal to 8/8, 6/8, 4/8, 2/8, and 0/8; rating scale 0-100)
- Predictions result from the ceiling effect
  - ceiling effects only matter if you believe a cause increases the probability of an effect
79. Blicket detector (Dave Sobel, Alison Gopnik, and colleagues)
80. Backwards blocking (Sobel, Tenenbaum, & Gopnik, 2004)
(figure: AB trial and A trial with the blicket detector)
- Two objects: A and B
- Trial 1: A and B on detector -> detector active
- Trial 2: A on detector -> detector active
- 4-year-olds judge whether each object is a blicket
  - A: a blicket (100% say yes)
  - B: probably not a blicket (34% say yes)
81. Possible hypotheses
(figure: the space of candidate causal graphs relating blocks A and B to the detector activation E)
82. Bayesian inference
- Evaluating causal models in light of data: P(h | d) ∝ P(d | h) P(h)
- Inferring a particular causal relation: sum the posterior over all hypotheses containing that relation
83. Bayesian inference
- With a uniform prior on hypotheses, and the generic parameterization
(figure: probability of being a blicket for objects A and B; values shown include 0.32 and 0.34)
84. Modeling backwards blocking
- Assume
  - Links can only exist from blocks to detectors
  - Blocks are blickets with prior probability q
  - Blickets always activate detectors, but detectors never activate on their own
    - deterministic noisy-OR, with wi = 1 and w0 = 0 (see the sketch below)
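A compact sketch of this model: enumerate the four hypotheses about which blocks are blickets, zero out any hypothesis that contradicts a trial (the detector is deterministic under the assumptions above), and renormalize. The prior q = 1/3 is just an illustrative value; the experiments manipulate it.

```python
def blicket_posteriors(trials, q=1/3):
    """Posterior probability that A and B are blickets, given trials of the form
    (a_on, b_on, detector_active), under the deterministic assumptions (wi = 1, w0 = 0).
    q is the prior probability that any block is a blicket."""
    post = {(0, 0): (1 - q) ** 2, (1, 0): q * (1 - q),
            (0, 1): (1 - q) * q, (1, 1): q ** 2}
    for a_on, b_on, active in trials:
        for a_blicket, b_blicket in post:
            predicted = int((a_on and a_blicket) or (b_on and b_blicket))
            if predicted != active:
                post[(a_blicket, b_blicket)] = 0.0  # inconsistent hypothesis ruled out
        z = sum(post.values())
        post = {h: p / z for h, p in post.items()}
    p_a = post[(1, 0)] + post[(1, 1)]
    p_b = post[(0, 1)] + post[(1, 1)]
    return p_a, p_b

# Backwards blocking: AB trial (detector active), then A alone (detector active).
print(blicket_posteriors([(1, 1, 1), (1, 0, 1)], q=1/3))  # A certain; B back at its prior
```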
85. Modeling backwards blocking
Four hypotheses about which blocks are blickets (label hAB indicates whether A and B are blickets):
  P(h00) = (1 - q)^2    P(h10) = q(1 - q)    P(h01) = (1 - q)q    P(h11) = q^2
Detector predictions under each hypothesis:
                            h00   h10   h01   h11
  P(E = 1 | A = 0, B = 0)    0     0     0     0
  P(E = 1 | A = 1, B = 0)    0     1     0     1
  P(E = 1 | A = 0, B = 1)    0     0     1     1
  P(E = 1 | A = 1, B = 1)    0     1     1     1
86. Modeling backwards blocking
  P(h00) = (1 - q)^2    P(h10) = q(1 - q)    P(h01) = (1 - q)q    P(h11) = q^2
After the AB trial (detector active with both blocks on it), only h00 is inconsistent with the data:
                            h00   h10   h01   h11
  P(E = 1 | A = 1, B = 1)    0     1     1     1
87. Modeling backwards blocking
Remaining hypotheses: P(h10) = q(1 - q)    P(h01) = (1 - q)q    P(h11) = q^2
After the A trial (detector active with A alone), h01 is also ruled out:
                            h10   h01   h11
  P(E = 1 | A = 1, B = 0)    1     0     1
  P(E = 1 | A = 1, B = 1)    1     1     1
A is certainly a blicket; the probability that B is a blicket returns to its prior q.
88. Manipulating prior probability (Tenenbaum, Sobel, Griffiths, & Gopnik, submitted)
(figure: blicket judgments for A and B at the initial phase, after the AB trial, and after the A trial)
89. Summary
- Graphical models provide solutions to many of the challenges of probabilistic models
  - defining structured distributions
  - representing distributions on many variables
  - efficiently computing probabilities
- Causal graphical models provide tools for defining rational models of human causal reasoning and learning