Title: Part II: Graphical models
1. Part II: Graphical models
2. Challenges of probabilistic models
- Specifying well-defined probabilistic models with many variables is hard (for modelers)
- Representing probability distributions over those variables is hard (for computers/learners)
- Computing quantities using those distributions is hard (for computers/learners)
3. Representing structured distributions
- Four random variables, each with domain {0, 1}
  - X1: coin toss produces heads
  - X2: pencil levitates
  - X3: friend has psychic powers
  - X4: friend has two-headed coin
4. Joint distribution
- The joint distribution assigns a probability to every assignment (x1, x2, x3, x4): 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111
- Requires 15 numbers to specify the probability of all values x1, x2, x3, x4
- N binary variables require 2^N - 1 numbers
- Similar cost when computing conditional probabilities
5. How can we use fewer numbers?
- Four random variables, each with domain {0, 1}
  - X1: coin toss produces heads
  - X2: coin toss produces heads
  - X3: coin toss produces heads
  - X4: coin toss produces heads
6. Statistical independence
- Two random variables X1 and X2 are independent if P(x1 | x2) = P(x1)
  - e.g. coin flips: P(x1 = H | x2 = H) = P(x1 = H) = 0.5
- Independence makes it easier to represent and work with probability distributions
- We can exploit the product rule: if x1, x2, x3, and x4 are all independent, P(x1, x2, x3, x4) = P(x1) P(x2) P(x3) P(x4) (see the sketch below)
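To make the saving concrete, here is a minimal sketch (hypothetical, with fair coins) of how independence shrinks the representation: four marginal probabilities determine all sixteen joint probabilities, instead of the 15 free numbers a full table would need.

```python
import itertools

# Hypothetical marginals P(x_i = 1) for four independent coin flips:
# 4 numbers, versus 2^4 - 1 = 15 for an unrestricted joint table.
p_heads = [0.5, 0.5, 0.5, 0.5]

def joint(x):
    """P(x1, x2, x3, x4) under full independence: the product of the marginals."""
    prob = 1.0
    for value, p in zip(x, p_heads):
        prob *= p if value == 1 else 1 - p
    return prob

# The four numbers still define a proper distribution over all 16 outcomes.
print(sum(joint(x) for x in itertools.product([0, 1], repeat=4)))  # 1.0
print(joint((1, 0, 1, 1)))  # 0.0625 for fair coins
```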
7. Expressing independence
- Statistical independence is the key to efficient probabilistic representation and computation
- This has led to the development of languages for indicating dependencies among variables
- Some of the most popular languages are based on graphical models
8. Part II: Graphical models
- Introduction to graphical models
  - representation and inference
- Causal graphical models
  - causality
  - learning about causal relationships
- Graphical models and cognitive science
  - uses of graphical models
  - an example: causal induction
9. Part II: Graphical models
- Introduction to graphical models
  - representation and inference
- Causal graphical models
  - causality
  - learning about causal relationships
- Graphical models and cognitive science
  - uses of graphical models
  - an example: causal induction
10. Graphical models
- Express the probabilistic dependency structure among a set of variables (Pearl, 1988)
- Consist of
  - a set of nodes, corresponding to variables
  - a set of edges, indicating dependency
  - a set of functions defined on the graph that specify a probability distribution
11. Undirected graphical models
(figure: undirected graph over nodes X1-X5)
- Consist of
  - a set of nodes
  - a set of edges
  - a potential for each clique, multiplied together to yield the distribution over variables
- Examples
  - statistical physics: Ising model, spin glasses
  - early neural networks (e.g. Boltzmann machines)
12. Directed graphical models
(figure: directed acyclic graph over nodes X1-X5)
- Consist of
  - a set of nodes
  - a set of edges
  - a conditional probability distribution for each node, conditioned on its parents, multiplied together to yield the distribution over variables
- Constrained to directed acyclic graphs (DAGs)
- Called Bayesian networks or Bayes nets
13. Bayesian networks and Bayes
- Two different problems
  - Bayesian statistics is a method of inference
  - Bayesian networks are a form of representation
- There is no necessary connection
  - many users of Bayesian networks rely upon frequentist statistical methods
  - many Bayesian inferences cannot be easily represented using Bayesian networks
14. Properties of Bayesian networks
- Efficient representation and inference
  - exploiting dependency structure makes it easier to represent and compute with probabilities
- Explaining away
  - a pattern of probabilistic reasoning characteristic of Bayesian networks, especially in their early use in AI
15. Properties of Bayesian networks
- Efficient representation and inference
  - exploiting dependency structure makes it easier to represent and compute with probabilities
- Explaining away
  - a pattern of probabilistic reasoning characteristic of Bayesian networks, especially in their early use in AI
16. Efficient representation and inference
- Four random variables
  - X1: coin toss produces heads
  - X2: pencil levitates
  - X3: friend has psychic powers
  - X4: friend has two-headed coin
17. The Markov assumption
- Every node is conditionally independent of its non-descendants, given its parents:
  P(x1, ..., xN) = prod_i P(xi | Pa(Xi))
  where Pa(Xi) is the set of parents of Xi (via the product rule)
18. Efficient representation and inference
- Four random variables
  - X1: coin toss produces heads
  - X2: pencil levitates
  - X3: friend has psychic powers
  - X4: friend has two-headed coin
(figure: Bayes net over X1-X4, annotated with the number of parameters needed per node: 1, 1, 4, 2; total 7 vs. 15 for the full joint)
P(x1, x2, x3, x4) = P(x1 | x3, x4) P(x2 | x3) P(x3) P(x4)
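As a sketch of what this factorization buys, the snippet below builds the joint for the psychic-friend example from its four conditional distributions. The CPT values are illustrative placeholders; the slide's actual numbers are not preserved in this transcript.

```python
# Placeholder CPTs for the network X3 -> X1 <- X4, X3 -> X2 (values are made up).
p_x3 = 0.01                        # P(x3 = 1): friend has psychic powers
p_x4 = 0.01                        # P(x4 = 1): friend has a two-headed coin
p_x2_given_x3 = {0: 0.0, 1: 0.9}   # P(x2 = 1 | x3): pencil levitates
p_x1_given_x3_x4 = {               # P(x1 = 1 | x3, x4): coin toss produces heads
    (0, 0): 0.5, (0, 1): 1.0, (1, 0): 0.9, (1, 1): 1.0,
}

def joint(x1, x2, x3, x4):
    """P(x1, x2, x3, x4) = P(x1 | x3, x4) P(x2 | x3) P(x3) P(x4)."""
    p = p_x1_given_x3_x4[(x3, x4)] if x1 else 1 - p_x1_given_x3_x4[(x3, x4)]
    p *= p_x2_given_x3[x3] if x2 else 1 - p_x2_given_x3[x3]
    p *= p_x3 if x3 else 1 - p_x3
    p *= p_x4 if x4 else 1 - p_x4
    return p

print(joint(1, 0, 0, 0))  # heads, no levitation, no powers, normal coin
```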
19. Reading a Bayesian network
- The structure of a Bayes net can be read as the generative process behind a distribution
- Gives the joint probability distribution over variables obtained by sampling each variable conditioned on its parents
20. Reading a Bayesian network
- Four random variables
  - X1: coin toss produces heads
  - X2: pencil levitates
  - X3: friend has psychic powers
  - X4: friend has two-headed coin
(figure: Bayes net with edges X3 -> X1, X4 -> X1, X3 -> X2)
P(x1, x2, x3, x4) = P(x1 | x3, x4) P(x2 | x3) P(x3) P(x4)
21. Reading a Bayesian network
- The structure of a Bayes net can be read as the generative process behind a distribution
- Gives the joint probability distribution over variables obtained by sampling each variable conditioned on its parents (see the sketch below)
- Simple rules for determining whether two variables are dependent or independent
- Independence makes inference more efficient
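One way to read the net as a generative process is ancestral sampling: draw each variable from its conditional distribution once its parents have been drawn. This sketch reuses the placeholder CPTs (p_x3, p_x4, p_x2_given_x3, p_x1_given_x3_x4) from the snippet after slide 18.

```python
import random

def bernoulli(p):
    return 1 if random.random() < p else 0

def ancestral_sample():
    """Sample each variable conditioned on its parents, in topological order."""
    x3 = bernoulli(p_x3)                        # psychic powers
    x4 = bernoulli(p_x4)                        # two-headed coin
    x2 = bernoulli(p_x2_given_x3[x3])           # pencil levitates
    x1 = bernoulli(p_x1_given_x3_x4[(x3, x4)])  # coin toss produces heads
    return x1, x2, x3, x4

samples = [ancestral_sample() for _ in range(10000)]
print(sum(s[0] for s in samples) / len(samples))  # Monte Carlo estimate of P(x1 = 1)
```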
22. Computing with Bayes nets
P(x1, x2, x3, x4) = P(x1 | x3, x4) P(x2 | x3) P(x3) P(x4)
23. Computing with Bayes nets
(sum over 8 values)
P(x1, x2, x3, x4) = P(x1 | x3, x4) P(x2 | x3) P(x3) P(x4)
24. Computing with Bayes nets
P(x1, x2, x3, x4) = P(x1 | x3, x4) P(x2 | x3) P(x3) P(x4)
25. Computing with Bayes nets
(sum over 4 values)
P(x1, x2, x3, x4) = P(x1 | x3, x4) P(x2 | x3) P(x3) P(x4)
26. Computing with Bayes nets
- Inference algorithms for Bayesian networks exploit dependency structure (see the enumeration sketch below)
- Message-passing algorithms
  - belief propagation passes simple messages between nodes; exact for tree-structured networks
- More general inference algorithms
  - exact: junction-tree
  - approximate: Monte Carlo schemes (see Part IV)
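For a network this small, exact inference can be done by brute-force enumeration over the factorized joint; the structured algorithms above perform the same computation while pushing sums inside the product. A sketch, reusing the joint function and placeholder CPTs from the snippet after slide 18:

```python
import itertools

def conditional(query_var, query_val, evidence):
    """P(query_var = query_val | evidence), by summing the factorized joint over
    all assignments consistent with the evidence (exact, but exponential in N)."""
    names = ("x1", "x2", "x3", "x4")
    numerator = denominator = 0.0
    for values in itertools.product([0, 1], repeat=4):
        assignment = dict(zip(names, values))
        if any(assignment[var] != val for var, val in evidence.items()):
            continue
        p = joint(*values)
        denominator += p
        if assignment[query_var] == query_val:
            numerator += p
    return numerator / denominator

# e.g. probability the friend has psychic powers, given that the pencil levitated
print(conditional("x3", 1, {"x2": 1}))
```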
27. Properties of Bayesian networks
- Efficient representation and inference
  - exploiting dependency structure makes it easier to represent and compute with probabilities
- Explaining away
  - a pattern of probabilistic reasoning characteristic of Bayesian networks, especially in their early use in AI
28. Explaining away
- Assume grass will be wet if and only if it rained last night, or if the sprinklers were left on
29. Explaining away
Compute probability it rained last night, given that the grass is wet
30. Explaining away
Compute probability it rained last night, given that the grass is wet
31. Explaining away
Compute probability it rained last night, given that the grass is wet
32. Explaining away
Compute probability it rained last night, given that the grass is wet
33. Explaining away
Compute probability it rained last night, given that the grass is wet
34. Explaining away
Compute probability it rained last night, given that the grass is wet and sprinklers were left on
35. Explaining away
Compute probability it rained last night, given that the grass is wet and sprinklers were left on
36. Explaining away
Discounting to prior probability.
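A small self-contained sketch of the discounting pattern, using made-up priors for Rain and Sprinkler and the slide's assumption that the grass is wet exactly when it rained or the sprinklers were left on:

```python
import itertools

# Illustrative priors (not from the slides): Rain and Sprinkler are independent causes.
P_RAIN, P_SPRINKLER = 0.3, 0.5

def joint(rain, sprinkler, wet):
    p = (P_RAIN if rain else 1 - P_RAIN) * (P_SPRINKLER if sprinkler else 1 - P_SPRINKLER)
    # Slide 28's assumption: grass is wet if and only if it rained or sprinklers were on.
    return p if wet == (rain or sprinkler) else 0.0

def p_rain_given(**evidence):
    numerator = denominator = 0.0
    for rain, sprinkler, wet in itertools.product([0, 1], repeat=3):
        a = {"rain": rain, "sprinkler": sprinkler, "wet": wet}
        if any(a[k] != v for k, v in evidence.items()):
            continue
        p = joint(rain, sprinkler, wet)
        denominator += p
        if rain:
            numerator += p
    return numerator / denominator

print(p_rain_given(wet=1))               # raised above the prior 0.3
print(p_rain_given(wet=1, sprinkler=1))  # discounted back to the prior: explaining away
```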
37. Contrast w/ production system
(figure: Rain -> Grass Wet)
- Formulate IF-THEN rules
  - IF Rain THEN Wet
  - IF Wet THEN Rain
- Rules do not distinguish directions of inference
- Requires a combinatorial explosion of rules
38. Contrast w/ spreading activation
(figure: network with nodes Rain, Sprinkler, Grass Wet)
- Observing rain, Wet becomes more active
- Observing grass wet, Rain and Sprinkler become more active
- Observing grass wet and sprinkler, Rain cannot become less active. No explaining away!
- Excitatory links: Rain -> Wet, Sprinkler -> Wet
39. Contrast w/ spreading activation
(figure: network with nodes Rain, Sprinkler, Grass Wet)
- Excitatory links: Rain -> Wet, Sprinkler -> Wet
- Inhibitory link: Rain -- Sprinkler
- Observing grass wet, Rain and Sprinkler become more active
- Observing grass wet and sprinkler, Rain becomes less active: explaining away
40. Contrast w/ spreading activation
(figure: network with nodes Rain, Burst pipe, Sprinkler, Grass Wet)
- Each new variable requires more inhibitory connections
- Not modular
  - whether a connection exists depends on what others exist
  - big holism problem
  - combinatorial explosion
41. Contrast w/ spreading activation
(McClelland & Rumelhart, 1981)
42. Graphical models
- Capture dependency structure in distributions
- Provide an efficient means of representing and reasoning with probabilities
- Allow kinds of inference that are problematic for other representations: explaining away
  - hard to capture in a production system
  - more natural than with spreading activation
43. Part II: Graphical models
- Introduction to graphical models
  - representation and inference
- Causal graphical models
  - causality
  - learning about causal relationships
- Graphical models and cognitive science
  - uses of graphical models
  - an example: causal induction
44. Causal graphical models
- Graphical models represent statistical dependencies among variables (i.e., correlations)
  - can answer questions about observations
- Causal graphical models represent causal dependencies among variables (Pearl, 2000)
  - express underlying causal structure
  - can answer questions about both observations and interventions (actions upon a variable)
45. Bayesian networks
- Nodes: variables
- Links: dependency
- Each node has a conditional probability distribution
- Data: observations of x1, ..., x4
- Four random variables
  - X1: coin toss produces heads
  - X2: pencil levitates
  - X3: friend has psychic powers
  - X4: friend has two-headed coin
46. Causal Bayesian networks
- Nodes: variables
- Links: causality
- Each node has a conditional probability distribution
- Data: observations of and interventions on x1, ..., x4
- Four random variables
  - X1: coin toss produces heads
  - X2: pencil levitates
  - X3: friend has psychic powers
  - X4: friend has two-headed coin
47. Interventions
- Four random variables
  - X1: coin toss produces heads
  - X2: pencil levitates
  - X3: friend has psychic powers
  - X4: friend has two-headed coin
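Interventions are handled by "graph surgery": setting a variable by outside action cuts the edges from its parents, so the action carries no evidence about them. A minimal sketch with assumed numbers, contrasting do(x2 = 1) (levitating the pencil yourself) with merely observing the pencil levitate:

```python
# Assumed numbers for the psychic-powers example (illustrative only).
p_x3 = 0.01                         # prior P(friend has psychic powers)
p_x2_given_x3 = {0: 0.001, 1: 0.9}  # P(pencil levitates | psychic powers)

def p_powers_given_levitation_observed():
    """Observing the pencil levitate is evidence for psychic powers (Bayes' rule)."""
    numerator = p_x2_given_x3[1] * p_x3
    return numerator / (numerator + p_x2_given_x3[0] * (1 - p_x3))

def p_powers_given_levitation_intervened():
    """Under do(x2 = 1), graph surgery removes the edge x3 -> x2:
    lifting the pencil yourself says nothing about psychic powers."""
    return p_x3

print(p_powers_given_levitation_observed())    # well above the prior 0.01
print(p_powers_given_levitation_intervened())  # exactly the prior 0.01
```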
48. Learning causal graphical models
- Strength: how strong is a relationship?
- Structure: does a relationship exist?
49. Causal structure vs. causal strength
- Strength: how strong is a relationship?
(figure: candidate causal graphs, each including a background cause B)
50. Causal structure vs. causal strength
- Strength: how strong is a relationship?
  - requires defining the nature of the relationship
(figure: candidate causal graphs, each including a background cause B)
51. Parameterization
- Structures: h1 (C -> E and B -> E) vs. h0 (B -> E only)
- Parameterization:
(table: P(E = 1 | C, B) under h1 and under h0, for each combination of C, B in {(0,0), (1,0), (0,1), (1,1)})
52. Parameterization
- Structures: h1 (C -> E and B -> E) vs. h0 (B -> E only)
- Parameterization:
(table: P(E = 1 | C, B) under h1 and under h0, for each combination of C, B in {(0,0), (1,0), (0,1), (1,1)})
53. Parameterization
- Structures: h1 (C -> E and B -> E) vs. h0 (B -> E only)
- Parameterization:
(table: P(E = 1 | C, B) under h1 and under h0, for each combination of C, B in {(0,0), (1,0), (0,1), (1,1)})
54. Parameter estimation
- Maximum likelihood estimation
  - maximize prod_i P(bi, ci, ei | w0, w1) (see the sketch below)
- Bayesian methods, as in Part I
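A sketch of the maximum-likelihood step under the noisy-OR parameterization, using hypothetical contingency counts and a simple grid search; under this parameterization the ML value of w1 coincides with causal power, as slide 73 notes.

```python
import math
from itertools import product

# Hypothetical contingency counts N(e, c), with the background cause B always present:
# here P(e+|c+) = 6/8 and P(e+|c-) = 2/8.
counts = {(1, 1): 6, (0, 1): 2, (1, 0): 2, (0, 0): 6}

def loglik_noisy_or(w0, w1):
    """Log-likelihood under the noisy-OR parameterization:
    P(e+ | c) = w0 + w1*c - w0*w1*c."""
    ll = 0.0
    for (e, c), n in counts.items():
        p1 = w0 + w1 * c - w0 * w1 * c
        p = p1 if e == 1 else 1 - p1
        ll += n * math.log(max(p, 1e-12))
    return ll

grid = [i / 100 for i in range(101)]
w0_hat, w1_hat = max(product(grid, grid), key=lambda w: loglik_noisy_or(*w))
print(w0_hat, w1_hat)  # near 0.25 and the causal power (6/8 - 2/8) / (1 - 2/8) = 2/3
```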
55. Causal structure vs. causal strength
- Structure: does a relationship exist?
(figure: candidate causal graphs, each including a background cause B)
56. Approaches to structure learning
- Constraint-based (Pearl, 2000; Spirtes et al., 1993)
  - dependency from statistical tests (e.g., chi-square)
  - deduce structure from dependencies
(figure: candidate graphs over C, B, and E)
57. Approaches to structure learning
- Constraint-based (Pearl, 2000; Spirtes et al., 1993)
  - dependency from statistical tests (e.g., chi-square)
  - deduce structure from dependencies
(figure: candidate graphs over C, B, and E)
58. Approaches to structure learning
- Constraint-based (Pearl, 2000; Spirtes et al., 1993)
  - dependency from statistical tests (e.g., chi-square)
  - deduce structure from dependencies
(figure: candidate graphs over C, B, and E)
59. Approaches to structure learning
- Constraint-based (Pearl, 2000; Spirtes et al., 1993)
  - dependency from statistical tests (e.g., chi-square)
  - deduce structure from dependencies
(figure: candidate graphs over C, B, and E)
Attempts to reduce an inductive problem to a deductive problem
60. Approaches to structure learning
- Constraint-based (Pearl, 2000; Spirtes et al., 1993)
  - dependency from statistical tests (e.g., chi-square)
  - deduce structure from dependencies
- Bayesian (Heckerman, 1998; Friedman, 1999)
  - compute the posterior probability of structures, given observed data:
    P(h | data) ∝ P(data | h) P(h), comparing P(h1 | data) and P(h0 | data)
(figure: candidate graphs h1 (C -> E and B -> E) and h0 (B -> E only))
61. Bayesian Occam's Razor
(figure: P(d | h) plotted over all possible data sets d, for h0 (no relationship) and h1 (relationship))
For any model h, the probabilities assigned to all possible data sets must sum to 1.
62. Causal graphical models
- Extend graphical models to deal with interventions as well as observations
- Respecting the direction of causality results in efficient representation and inference
- Two steps in learning causal models
  - strength: parameter estimation
  - structure: structure learning
63. Part II: Graphical models
- Introduction to graphical models
  - representation and inference
- Causal graphical models
  - causality
  - learning about causal relationships
- Graphical models and cognitive science
  - uses of graphical models
  - an example: causal induction
64. Uses of graphical models
- Understanding existing cognitive models
  - e.g., neural network models
- Representation and reasoning
  - a way to address holism in induction (cf. Fodor)
- Defining generative models
  - mixture models, language models (see Part IV)
- Modeling human causal reasoning
65. Human causal reasoning
- How do people reason about interventions?
  - (Gopnik, Glymour, Sobel, Schulz, Kushnir, & Danks, 2004; Lagnado & Sloman, 2004; Sloman & Lagnado, 2005; Steyvers, Tenenbaum, Wagenmakers, & Blum, 2003)
- How do people learn about causal relationships?
  - parameter estimation (Shanks, 1995; Cheng, 1997)
  - constraint-based models (Glymour, 2001)
  - Bayesian structure learning (Steyvers et al., 2003; Griffiths & Tenenbaum, 2005)
66. Causation from contingencies
                      C present (c+)    C absent (c-)
  E present (e+)            a                 c
  E absent  (e-)            b                 d
Does C cause E? (rate on a scale from 0 to 100)
67. Two models of causal judgment
- Delta-P (Jenkins & Ward, 1965): ΔP = P(e+ | c+) - P(e+ | c-)
- Power PC (Cheng, 1997): power = ΔP / (1 - P(e+ | c-)) (see the sketch below)
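In code, both judgments are simple functions of the contingency table. The cell labels below follow the standard convention (a = N(e+, c+), b = N(e-, c+), c = N(e+, c-), d = N(e-, c-)), since the table's exact layout is not preserved in this transcript.

```python
def delta_p_and_power(a, b, c, d):
    """a, b, c, d: contingency counts N(e+,c+), N(e-,c+), N(e+,c-), N(e-,c-)."""
    p_e_given_c = a / (a + b)       # P(e+ | c+)
    p_e_given_not_c = c / (c + d)   # P(e+ | c-)
    delta_p = p_e_given_c - p_e_given_not_c      # Jenkins & Ward (1965)
    power = delta_p / (1 - p_e_given_not_c)      # Cheng (1997), generative causes
    return delta_p, power

print(delta_p_and_power(6, 2, 2, 6))  # (0.5, 0.666...)
```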
68. Buehner and Cheng (1997)
(figure: human judgments ("People") compared with ΔP and Power predictions)
69. Buehner and Cheng (1997)
(figure: human judgments compared with ΔP and Power predictions)
Constant ΔP, changing judgments
70. Buehner and Cheng (1997)
(figure: human judgments compared with ΔP and Power predictions)
Constant causal power, changing judgments
71. Buehner and Cheng (1997)
(figure: human judgments compared with ΔP and Power predictions)
ΔP = 0, changing judgments
72. Causal structure vs. causal strength
- Strength: how strong is a relationship?
- Structure: does a relationship exist?
(figure: candidate causal graphs, each including a background cause B)
73. Causal strength
- Assume the structure: both B and C are potential causes of E
- ΔP and causal power are maximum likelihood estimates of the strength parameter w1, under different parameterizations for P(E | B, C)
  - linear -> ΔP; noisy-OR -> causal power
74. Causal structure
- Hypotheses: h1 (C -> E and B -> E) vs. h0 (B -> E only)
- Bayesian causal inference
  - support: the likelihood ratio (Bayes factor) P(d | h1) / P(d | h0) gives evidence in favor of h1 (approximated in the sketch below)
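A sketch of how causal support can be approximated numerically, assuming uniform priors on the strength parameters and the noisy-OR parameterization for h1; a coarse grid integration stands in for the exact integral, and support is expressed here as a log Bayes factor.

```python
import math

def loglik(counts, p_c_plus, p_c_minus):
    """Log-likelihood of contingency counts given P(e+|c+) and P(e+|c-)."""
    a, b, c, d = counts  # a=N(e+,c+), b=N(e-,c+), c=N(e+,c-), d=N(e-,c-)
    eps = 1e-12
    return (a * math.log(max(p_c_plus, eps)) + b * math.log(max(1 - p_c_plus, eps))
            + c * math.log(max(p_c_minus, eps)) + d * math.log(max(1 - p_c_minus, eps)))

def log_marginal_h1(counts, n=100):
    """P(d | h1): noisy-OR with uniform priors on w0, w1, integrated on a grid."""
    grid = [(i + 0.5) / n for i in range(n)]
    total = 0.0
    for w0 in grid:
        for w1 in grid:
            total += math.exp(loglik(counts, w0 + w1 - w0 * w1, w0)) / (n * n)
    return math.log(total)

def log_marginal_h0(counts, n=100):
    """P(d | h0): no C -> E edge, so P(e+|c+) = P(e+|c-) = w0, uniform prior on w0."""
    grid = [(i + 0.5) / n for i in range(n)]
    return math.log(sum(math.exp(loglik(counts, w0, w0)) for w0 in grid) / n)

counts = (6, 2, 2, 6)
support = log_marginal_h1(counts) - log_marginal_h0(counts)  # log Bayes factor for h1
print(support)
```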
75. Buehner and Cheng (1997)
(figure: human judgments compared with model predictions)
- ΔP (r = 0.89)
- Power (r = 0.88)
- Support (r = 0.97)
76. The importance of parameterization
- Noisy-OR incorporates mechanism assumptions (Cheng, 1997)
  - generativity: causes increase the probability of their effects
  - each cause is sufficient to produce the effect
  - causes act via independent mechanisms
- Consider other models
  - statistical dependence: chi-square test
  - generic parameterization (cf. Anderson, 1990)
77. (figure: human judgments ("People") compared with predictions of Support (noisy-OR), chi-square, and Support (generic))
78. Generativity is essential
(figure: support predictions for contingencies with P(e+ | c+) = P(e+ | c-) equal to 8/8, 6/8, 4/8, 2/8, and 0/8; rating scale 0-100)
- Predictions result from the ceiling effect
  - ceiling effects only matter if you believe a cause increases the probability of an effect
79. Blicket detector (Dave Sobel, Alison Gopnik, and colleagues)
80. Backwards blocking (Sobel, Tenenbaum, & Gopnik, 2004)
(figure: AB trial and A trial with the blicket detector)
- Two objects: A and B
- Trial 1: A and B on detector -> detector active
- Trial 2: A on detector -> detector active
- 4-year-olds judge whether each object is a blicket
  - A: a blicket (100% say yes)
  - B: probably not a blicket (34% say yes)
81. Possible hypotheses
(figure: the space of candidate causal graphs relating blocks A and B to the detector activation E)
82. Bayesian inference
- Evaluating causal models in light of data: P(h | d) ∝ P(d | h) P(h)
- Inferring a particular causal relation: sum the posterior over all hypotheses containing that relation
83. Bayesian inference
- With a uniform prior on hypotheses, and the generic parameterization
(figure: probability of being a blicket for objects A and B; values shown include 0.32 and 0.34)
84. Modeling backwards blocking
- Assume
  - Links can only exist from blocks to detectors
  - Blocks are blickets with prior probability q
  - Blickets always activate detectors, but detectors never activate on their own
    - deterministic noisy-OR, with wi = 1 and w0 = 0 (see the sketch below)
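A compact sketch of this model: enumerate the four hypotheses about which blocks are blickets, zero out any hypothesis that contradicts a trial (the detector is deterministic under the assumptions above), and renormalize. The prior q = 1/3 is just an illustrative value; the experiments manipulate it.

```python
def blicket_posteriors(trials, q=1/3):
    """Posterior probability that A and B are blickets, given trials of the form
    (a_on, b_on, detector_active), under the deterministic assumptions (wi = 1, w0 = 0).
    q is the prior probability that any block is a blicket."""
    post = {(0, 0): (1 - q) ** 2, (1, 0): q * (1 - q),
            (0, 1): (1 - q) * q, (1, 1): q ** 2}
    for a_on, b_on, active in trials:
        for a_blicket, b_blicket in post:
            predicted = int((a_on and a_blicket) or (b_on and b_blicket))
            if predicted != active:
                post[(a_blicket, b_blicket)] = 0.0  # inconsistent hypothesis ruled out
        z = sum(post.values())
        post = {h: p / z for h, p in post.items()}
    p_a = post[(1, 0)] + post[(1, 1)]
    p_b = post[(0, 1)] + post[(1, 1)]
    return p_a, p_b

# Backwards blocking: AB trial (detector active), then A alone (detector active).
print(blicket_posteriors([(1, 1, 1), (1, 0, 1)], q=1/3))  # A certain; B back at its prior
```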
85. Modeling backwards blocking
Four hypotheses about which blocks are blickets (label hAB indicates whether A and B are blickets):
  P(h00) = (1 - q)^2    P(h10) = q(1 - q)    P(h01) = (1 - q)q    P(h11) = q^2
Detector predictions under each hypothesis:
                            h00   h10   h01   h11
  P(E = 1 | A = 0, B = 0)    0     0     0     0
  P(E = 1 | A = 1, B = 0)    0     1     0     1
  P(E = 1 | A = 0, B = 1)    0     0     1     1
  P(E = 1 | A = 1, B = 1)    0     1     1     1
86. Modeling backwards blocking
  P(h00) = (1 - q)^2    P(h10) = q(1 - q)    P(h01) = (1 - q)q    P(h11) = q^2
After the AB trial (detector active with both blocks on it), only h00 is inconsistent with the data:
                            h00   h10   h01   h11
  P(E = 1 | A = 1, B = 1)    0     1     1     1
87. Modeling backwards blocking
Remaining hypotheses: P(h10) = q(1 - q)    P(h01) = (1 - q)q    P(h11) = q^2
After the A trial (detector active with A alone), h01 is also ruled out:
                            h10   h01   h11
  P(E = 1 | A = 1, B = 0)    1     0     1
  P(E = 1 | A = 1, B = 1)    1     1     1
A is certainly a blicket; the probability that B is a blicket returns to its prior q.
88. Manipulating prior probability (Tenenbaum, Sobel, Griffiths, & Gopnik, submitted)
(figure: blicket judgments for A and B at the initial phase, after the AB trial, and after the A trial)
89. Summary
- Graphical models provide solutions to many of the challenges of probabilistic models
  - defining structured distributions
  - representing distributions on many variables
  - efficiently computing probabilities
- Causal graphical models provide tools for defining rational models of human causal reasoning and learning