Part II: Graphical models

1
Part II: Graphical models
2
Challenges of probabilistic models
  • Specifying well-defined probabilistic models with
    many variables is hard (for modelers)
  • Representing probability distributions over those
    variables is hard (for computers/learners)
  • Computing quantities using those distributions is
    hard (for computers/learners)

3
Representing structured distributions
  • Four random variables
  • X1: coin toss produces heads
  • X2: pencil levitates
  • X3: friend has psychic powers
  • X4: friend has two-headed coin

Domain: {0,1} × {0,1} × {0,1} × {0,1}
4
Joint distribution
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001
1010 1011 1100 1101 1110 1111
  • Requires 15 numbers to specify the probability of all
    values x1, x2, x3, x4
  • N binary variables: 2^N - 1 numbers
  • Similar cost when computing conditional
    probabilities

5
How can we use fewer numbers?
  • Four random variables
  • X1: coin toss produces heads
  • X2: coin toss produces heads
  • X3: coin toss produces heads
  • X4: coin toss produces heads

Domain: {0,1} × {0,1} × {0,1} × {0,1}
6
Statistical independence
  • Two random variables X1 and X2 are independent if
    P(x1 | x2) = P(x1)
  • e.g. coin flips: P(x1 = H | x2 = H) = P(x1 = H) = 0.5
  • Independence makes it easier to represent and
    work with probability distributions
  • We can exploit the product rule

If x1, x2, x3, and x4 are all independent:
P(x1, x2, x3, x4) = P(x1) P(x2) P(x3) P(x4)
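As a rough illustration of the saving, the sketch below counts the numbers needed with and without independence and builds the joint table for four fair coins as a product of marginals. The function names are illustrative and do not come from the slides.

```python
from itertools import product

def joint_param_count(n):
    """Free parameters of an unrestricted joint over n binary variables."""
    return 2 ** n - 1

def independent_param_count(n):
    """Free parameters when all n binary variables are independent."""
    return n

def independent_joint(p_one):
    """Joint table built as a product of marginals, with P(x_i = 1) = p_one[i]."""
    table = {}
    for xs in product((0, 1), repeat=len(p_one)):
        prob = 1.0
        for x, p in zip(xs, p_one):
            prob *= p if x == 1 else 1.0 - p
        table[xs] = prob
    return table

fair_coins = independent_joint([0.5, 0.5, 0.5, 0.5])
print(joint_param_count(4), independent_param_count(4))  # 15 4
print(fair_coins[(1, 1, 1, 1)])                          # 0.0625
```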
7
Expressing independence
  • Statistical independence is the key to efficient
    probabilistic representation and computation
  • This has led to the development of languages for
    indicating dependencies among variables
  • Some of the most popular languages are based on
    graphical models

8
Part II: Graphical models
  • Introduction to graphical models
  • representation and inference
  • Causal graphical models
  • causality
  • learning about causal relationships
  • Graphical models and cognitive science
  • uses of graphical models
  • an example: causal induction

10
Graphical models
  • Express the probabilistic dependency structure
    among a set of variables (Pearl, 1988)
  • Consist of
  • a set of nodes, corresponding to variables
  • a set of edges, indicating dependency
  • a set of functions defined on the graph that
    specify a probability distribution

11
Undirected graphical models
[Figure: undirected graph over nodes X1, ..., X5]
  • Consist of
  • a set of nodes
  • a set of edges
  • a potential for each clique, multiplied together
    to yield the distribution over variables
  • Examples
  • statistical physics: Ising model, spin glasses
  • early neural networks (e.g. Boltzmann machines)

12
Directed graphical models
[Figure: directed graph over nodes X1, ..., X5]
  • Consist of
  • a set of nodes
  • a set of edges
  • a conditional probability distribution for each
    node, conditioned on its parents, multiplied
    together to yield the distribution over variables
  • Constrained to directed acyclic graphs (DAGs)
  • Called Bayesian networks or Bayes nets

13
Bayesian networks and Bayes
  • Two different problems
  • Bayesian statistics is a method of inference
  • Bayesian networks are a form of representation
  • There is no necessary connection
  • many users of Bayesian networks rely upon
    frequentist statistical methods
  • many Bayesian inferences cannot be easily
    represented using Bayesian networks

14
Properties of Bayesian networks
  • Efficient representation and inference
  • exploiting dependency structure makes it easier
    to represent and compute with probabilities
  • Explaining away
  • pattern of probabilistic reasoning characteristic
    of Bayesian networks, especially early use in AI

16
Efficient representation and inference
  • Four random variables
  • X1: coin toss produces heads
  • X2: pencil levitates
  • X3: friend has psychic powers
  • X4: friend has two-headed coin

17
The Markov assumption
  • Every node is conditionally independent of its
    non-descendants, given its parents

P(x1, ..., xn) = ∏i P(xi | Pa(Xi))

where Pa(Xi) is the set of parents of Xi
(via the product rule)
18
Efficient representation and inference
  • Four random variables
  • X1: coin toss produces heads
  • X2: pencil levitates
  • X3: friend has psychic powers
  • X4: friend has two-headed coin

Numbers needed: 1 for P(x3), 1 for P(x4), 4 for P(x1 | x3, x4), 2 for P(x2 | x3)
Total: 7 (vs 15)
P(x1, x2, x3, x4) = P(x1 | x3, x4) P(x2 | x3) P(x3) P(x4)
19
Reading a Bayesian network
  • The structure of a Bayes net can be read as the
    generative process behind a distribution
  • Gives the joint probability distribution over
    variables obtained by sampling each variable
    conditioned on its parents
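The generative reading can be sketched as ancestral sampling: draw each variable after its parents, using its conditional distribution. The network is the four-variable coin example from the earlier slides, and the parameter values here are hypothetical.

```python
import random

# Hypothetical CPT entries (not from the slides); each is P(variable = 1 | parents).
P_X3 = 0.01
P_X4 = 0.05
P_X2_GIVEN_X3 = {0: 0.0, 1: 0.9}
P_X1_GIVEN_X3_X4 = {(0, 0): 0.5, (0, 1): 1.0, (1, 0): 0.9, (1, 1): 1.0}

def sample_network(rng):
    """Sample (x1, x2, x3, x4), drawing parents before their children."""
    x3 = int(rng.random() < P_X3)
    x4 = int(rng.random() < P_X4)
    x2 = int(rng.random() < P_X2_GIVEN_X3[x3])
    x1 = int(rng.random() < P_X1_GIVEN_X3_X4[(x3, x4)])
    return x1, x2, x3, x4

rng = random.Random(0)
samples = [sample_network(rng) for _ in range(10000)]
freq_heads = sum(s[0] for s in samples) / len(samples)
print(freq_heads)  # empirical estimate of P(x1 = 1)
```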

20
Reading a Bayesian network
  • Four random variables
  • X1: coin toss produces heads
  • X2: pencil levitates
  • X3: friend has psychic powers
  • X4: friend has two-headed coin

[Figure: Bayesian network over nodes X1, X2, X3, X4]
P(x1, x2, x3, x4) = P(x1 | x3, x4) P(x2 | x3) P(x3) P(x4)
21
Reading a Bayesian network
  • The structure of a Bayes net can be read as the
    generative process behind a distribution
  • Gives the joint probability distribution over
    variables obtained by sampling each variable
    conditioned on its parents
  • Simple rules for determining whether two
    variables are dependent or independent
  • Independence makes inference more efficient

22
Computing with Bayes nets
P(x1, x2, x3, x4) = P(x1 | x3, x4) P(x2 | x3) P(x3) P(x4)
23
Computing with Bayes nets
sum over 8 values
P(x1, x2, x3, x4) = P(x1 | x3, x4) P(x2 | x3) P(x3) P(x4)
25
Computing with Bayes nets
sum over 4 values
P(x1, x2, x3, x4) = P(x1 | x3, x4) P(x2 | x3) P(x3) P(x4)
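The transcript does not preserve which quantity these slides compute. One reading consistent with the "8 values" and "4 values" labels is the marginal P(x1): summing the joint over x2, x3, x4 takes 8 terms, while the factorization lets x2 drop out (its conditional sums to 1), leaving 4 terms. The sketch below checks that both routes agree, using hypothetical parameter values.

```python
from itertools import product

# Hypothetical CPT entries (not from the slides); each is P(variable = 1 | parents).
P_X3 = 0.01
P_X4 = 0.05
P_X2_GIVEN_X3 = {0: 0.0, 1: 0.9}
P_X1_GIVEN_X3_X4 = {(0, 0): 0.5, (0, 1): 1.0, (1, 0): 0.9, (1, 1): 1.0}

def bern(p, x):
    return p if x == 1 else 1.0 - p

def joint(x1, x2, x3, x4):
    return (bern(P_X1_GIVEN_X3_X4[(x3, x4)], x1) * bern(P_X2_GIVEN_X3[x3], x2)
            * bern(P_X3, x3) * bern(P_X4, x4))

def marginal_x1_naive(x1):
    """Sum the full joint over x2, x3, x4: 2**3 = 8 terms."""
    return sum(joint(x1, x2, x3, x4)
               for x2, x3, x4 in product((0, 1), repeat=3))

def marginal_x1_factored(x1):
    """The sum over x2 of P(x2 | x3) is 1, so only x3, x4 remain: 4 terms."""
    return sum(bern(P_X1_GIVEN_X3_X4[(x3, x4)], x1) * bern(P_X3, x3) * bern(P_X4, x4)
               for x3, x4 in product((0, 1), repeat=2))

print(marginal_x1_naive(1), marginal_x1_factored(1))  # the two values agree
```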
26
Computing with Bayes nets
  • Inference algorithms for Bayesian networks
    exploit dependency structure
  • Message-passing algorithms
  • belief propagation: passes simple messages
    between nodes, exact for tree-structured networks
  • More general inference algorithms
  • exact: junction-tree
  • approximate: Monte Carlo schemes (see Part IV)

27
Properties of Bayesian networks
  • Efficient representation and inference
  • exploiting dependency structure makes it easier
    to represent and compute with probabilities
  • Explaining away
  • pattern of probabilistic reasoning characteristic
    of Bayesian networks, especially early use in AI

28
Explaining away
  • Assume grass will be wet if and only if it
    rained last night, or if the sprinklers were left
    on

29
Explaining away
Compute probability it rained last night, given
that the grass is wet
34
Explaining away
Compute probability it rained last night, given
that the grass is wet and sprinklers were left
on
36
Explaining away
Discounting to prior probability.
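The equations for these slides are missing from the transcript. A minimal sketch of the computation, assuming the grass is wet if and only if it rained or the sprinklers were on, and additionally assuming rain and sprinklers are independent a priori with hypothetical prior probabilities: P(rain | wet) rises above the prior, while P(rain | wet, sprinkler) falls back to the prior.

```python
from itertools import product

def posterior_rain(p_rain, p_sprinkler, evidence):
    """P(rain = 1 | evidence) by enumerating worlds (rain, sprinkler, wet)."""
    num = den = 0.0
    for r, s in product((0, 1), repeat=2):
        world = {"rain": r, "sprinkler": s, "wet": int(r or s)}  # deterministic OR
        if any(world[name] != value for name, value in evidence.items()):
            continue  # world contradicts the evidence
        p = (p_rain if r else 1.0 - p_rain) * (p_sprinkler if s else 1.0 - p_sprinkler)
        den += p
        if r:
            num += p
    return num / den

P_RAIN, P_SPRINKLER = 0.2, 0.1  # hypothetical priors
after_wet = posterior_rain(P_RAIN, P_SPRINKLER, {"wet": 1})
after_both = posterior_rain(P_RAIN, P_SPRINKLER, {"wet": 1, "sprinkler": 1})
print(after_wet, after_both)  # roughly 0.714, then back to the prior 0.2
```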
37
Contrast w/ production system
[Figure: nodes Rain and Grass Wet]
  • Formulate IF-THEN rules
  • IF Rain THEN Wet
  • IF Wet THEN Rain
  • Rules do not distinguish directions of inference
  • Requires combinatorial explosion of rules

38
Contrast w/ spreading activation
[Figure: nodes Rain, Sprinkler, and Grass Wet]
  • Observing rain, Wet becomes more active.
  • Observing grass wet, Rain and Sprinkler become
    more active
  • Observing grass wet and sprinkler, Rain cannot
    become less active. No explaining away!
  • Excitatory links between Rain and Wet, and between
    Sprinkler and Wet

39
Contrast w/ spreading activation
[Figure: nodes Rain, Sprinkler, and Grass Wet]
  • Excitatory links between Rain and Wet, and between
    Sprinkler and Wet
  • Inhibitory link between Rain and Sprinkler
  • Observing grass wet, Rain and Sprinkler become
    more active
  • Observing grass wet and sprinkler, Rain becomes
    less active: explaining away

40
Contrast w/ spreading activation
[Figure: nodes Rain, Burst pipe, Sprinkler, and Grass Wet]
  • Each new variable requires more inhibitory
    connections
  • Not modular
  • whether a connection exists depends on what
    others exist
  • big holism problem
  • combinatorial explosion

41
Contrast w/ spreading activation
(McClelland & Rumelhart, 1981)
42
Graphical models
  • Capture dependency structure in distributions
  • Provide an efficient means of representing and
    reasoning with probabilities
  • Allow kinds of inference that are problematic for
    other representations: explaining away
  • hard to capture in a production system
  • more natural than with spreading activation

43
Part II: Graphical models
  • Introduction to graphical models
  • representation and inference
  • Causal graphical models
  • causality
  • learning about causal relationships
  • Graphical models and cognitive science
  • uses of graphical models
  • an example: causal induction

44
Causal graphical models
  • Graphical models represent statistical
    dependencies among variables (i.e. correlations)
  • can answer questions about observations
  • Causal graphical models represent causal
    dependencies among variables (Pearl,
    2000)
  • express underlying causal structure
  • can answer questions about both observations and
    interventions (actions upon a variable)

45
Bayesian networks
Nodes: variables
Links: dependency
Each node has a conditional probability distribution
Data: observations of x1, ..., x4
  • Four random variables
  • X1: coin toss produces heads
  • X2: pencil levitates
  • X3: friend has psychic powers
  • X4: friend has two-headed coin

46
Causal Bayesian networks
Nodes: variables
Links: causality
Each node has a conditional probability distribution
Data: observations of and interventions on x1, ..., x4
  • Four random variables
  • X1: coin toss produces heads
  • X2: pencil levitates
  • X3: friend has psychic powers
  • X4: friend has two-headed coin

47
Interventions
  • Four random variables
  • X1: coin toss produces heads
  • X2: pencil levitates
  • X3: friend has psychic powers
  • X4: friend has two-headed coin
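The figure for this slide is not in the transcript. The sketch below assumes the standard treatment in which intervening on a variable sets its value and removes the edges coming into it (graph surgery). With hypothetical parameters, observing heads raises the probability of a two-headed coin, while forcing the coin to show heads leaves that probability at its prior.

```python
from itertools import product

# Hypothetical CPT entries (not from the slides); each is P(variable = 1 | parents).
P_X3 = 0.01
P_X4 = 0.05
P_X2_GIVEN_X3 = {0: 0.0, 1: 0.9}
P_X1_GIVEN_X3_X4 = {(0, 0): 0.5, (0, 1): 1.0, (1, 0): 0.9, (1, 1): 1.0}

def bern(p, x):
    return p if x == 1 else 1.0 - p

def joint(x1, x2, x3, x4, do_x1=None):
    """Joint probability; if do_x1 is set, X1 no longer depends on its parents."""
    if do_x1 is None:
        f1 = bern(P_X1_GIVEN_X3_X4[(x3, x4)], x1)
    else:
        f1 = 1.0 if x1 == do_x1 else 0.0  # graph surgery on the edges into X1
    return f1 * bern(P_X2_GIVEN_X3[x3], x2) * bern(P_X3, x3) * bern(P_X4, x4)

def p_two_headed_given_heads(intervene):
    """P(x4 = 1 | x1 = 1) if observing, P(x4 = 1 | do(x1 = 1)) if intervening."""
    do = 1 if intervene else None
    num = den = 0.0
    for x2, x3, x4 in product((0, 1), repeat=3):
        p = joint(1, x2, x3, x4, do_x1=do)
        den += p
        if x4 == 1:
            num += p
    return num / den

observed = p_two_headed_given_heads(intervene=False)
intervened = p_two_headed_given_heads(intervene=True)
print(observed, intervened)  # observing heads is evidence; forcing heads is not
```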

48
Learning causal graphical models
  • Strength: how strong is a relationship?
  • Structure: does a relationship exist?

49
Causal structure vs. causal strength
  • Strength: how strong is a relationship?

50
Causal structure vs. causal strength
  • Strength: how strong is a relationship?
  • requires defining nature of relationship

51
Parameterization
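  • Structures: h1, h0
  • Parameterization

[Figure: graphs for h1 and h0 over nodes C, B, and E]
[Table: P(E = 1 | C, B) under h1 and under h0, for (C, B) = (0, 0), (1, 0), (0, 1), (1, 1); the entries are not preserved in this transcript]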
54
Parameter estimation
  • Maximum likelihood estimation
  • maximize ∏i P(bi, ci, ei; w0, w1)
  • Bayesian methods: as in Part I
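A minimal sketch of maximum likelihood estimation by grid search. It assumes a Noisy-OR form, P(E = 1 | C) = 1 - (1 - w0)(1 - w1)^C, with the background cause always present; the transcript does not preserve the parameterization shown on the slides, so this choice and the counts are assumptions for illustration.

```python
import math

def noisy_or(w0, w1, c):
    """P(E = 1 | C = c) with background strength w0 and cause strength w1."""
    return 1.0 - (1.0 - w0) * (1.0 - w1) ** c

def log_likelihood(w0, w1, counts):
    """counts maps (c, e) to the number of trials with that cause/effect pair."""
    ll = 0.0
    for (c, e), n in counts.items():
        p = noisy_or(w0, w1, c)
        ll += n * math.log(p if e == 1 else 1.0 - p)
    return ll

def grid_mle(counts, steps=99):
    """Return the (w0, w1) grid point with the highest likelihood."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]  # 0.01 ... 0.99
    return max(((w0, w1) for w0 in grid for w1 in grid),
               key=lambda w: log_likelihood(w[0], w[1], counts))

# Hypothetical data: effect on 6 of 8 trials with the cause, 2 of 8 without.
counts = {(1, 1): 6, (1, 0): 2, (0, 1): 2, (0, 0): 6}
w0_hat, w1_hat = grid_mle(counts)
print(w0_hat, w1_hat)  # close to 0.25 and 2/3
```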

55
Causal structure vs. causal strength
  • Structure: does a relationship exist?

56
Approaches to structure learning
  • Constraint-based
  • dependency from statistical tests (e.g. χ²)
  • deduce structure from dependencies

[Figure: graphs over nodes C, B, and E]
(Pearl, 2000; Spirtes et al., 1993)
59
Approaches to structure learning
  • Constraint-based
  • dependency from statistical tests (e.g. χ²)
  • deduce structure from dependencies

[Figure: graphs over nodes C, B, and E]
(Pearl, 2000; Spirtes et al., 1993)
Attempts to reduce inductive problem to deductive
problem
60
Approaches to structure learning
  • Constraint-based
  • dependency from statistical tests (e.g. χ²)
  • deduce structure from dependencies

[Figure: graphs over nodes C, B, and E]
(Pearl, 2000; Spirtes et al., 1993)
  • Bayesian
  • compute posterior probability of structures,
    given observed data

[Figure: graphs for h1 and h0 over nodes C, B, and E, labeled P(h1 | data) and P(h0 | data)]
P(h | data) ∝ P(data | h) P(h)
(Heckerman, 1998; Friedman, 1999)
61
Bayesian Occam's razor
[Figure: P(d | h) plotted over all possible data sets d, for h0 (no relationship) and h1 (relationship)]
For any model h, Σd P(d | h) = 1
62
Causal graphical models
  • Extend graphical models to deal with
    interventions as well as observations
  • Respecting the direction of causality results in
    efficient representation and inference
  • Two steps in learning causal models
  • strength: parameter estimation
  • structure: structure learning

63
Part II: Graphical models
  • Introduction to graphical models
  • representation and inference
  • Causal graphical models
  • causality
  • learning about causal relationships
  • Graphical models and cognitive science
  • uses of graphical models
  • an example: causal induction

64
Uses of graphical models
  • Understanding existing cognitive models
  • e.g., neural network models
  • Representation and reasoning
  • a way to address holism in induction (cf. Fodor)
  • Defining generative models
  • mixture models, language models (see Part IV)
  • Modeling human causal reasoning

65
Human causal reasoning
  • How do people reason about interventions?
  • (Gopnik, Glymour, Sobel, Schulz, Kushnir & Danks,
    2004; Lagnado & Sloman, 2004; Sloman & Lagnado,
    2005; Steyvers, Tenenbaum, Wagenmakers & Blum,
    2003)
  • How do people learn about causal relationships?
  • parameter estimation (Shanks, 1995;
    Cheng, 1997)
  • constraint-based models
    (Glymour, 2001)
  • Bayesian structure learning
  • (Steyvers et al., 2003; Griffiths & Tenenbaum,
    2005)

66
Causation from contingencies
                  C present (c+)   C absent (c-)
E present (e+)          a                c
E absent (e-)           b                d

Does C cause E? (rate on a scale from 0 to 100)
67
Two models of causal judgment
  • Delta-P (Jenkins & Ward, 1965): ΔP = P(e+ | c+) - P(e+ | c-)
  • Power PC (Cheng, 1997): Power = ΔP / (1 - P(e+ | c-))
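Both measures can be computed directly from contingency counts. The sketch below uses the standard definitions, ΔP = P(e+ | c+) - P(e+ | c-) and, for generative causes, Power = ΔP / (1 - P(e+ | c-)). The cell names a, b, c, d follow the usual contingency-table convention, and the example counts are hypothetical.

```python
def delta_p(a, b, c, d):
    """ΔP from counts: a = N(e+, c+), b = N(e-, c+), c = N(e+, c-), d = N(e-, c-)."""
    return a / (a + b) - c / (c + d)

def causal_power(a, b, c, d):
    """Generative causal power; undefined when the effect always occurs without the cause."""
    p_e_given_not_c = c / (c + d)
    return delta_p(a, b, c, d) / (1.0 - p_e_given_not_c)

# Hypothetical counts: effect on 6 of 8 trials with the cause, 2 of 8 without.
print(delta_p(6, 2, 2, 6))       # 0.5
print(causal_power(6, 2, 2, 6))  # about 0.667
```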
68
Buehner and Cheng (1997)
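[Figure: People, ΔP, and Power compared across conditions]
69
Buehner and Cheng (1997)
[Figure: People, ΔP, and Power compared across conditions]
Constant ΔP, changing judgments
70
Buehner and Cheng (1997)
[Figure: People, ΔP, and Power compared across conditions]
Constant causal power, changing judgments
71
Buehner and Cheng (1997)
[Figure: People, ΔP, and Power compared across conditions]
ΔP = 0, changing judgments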
72
Causal structure vs. causal strength
  • Strength: how strong is a relationship?
  • Structure: does a relationship exist?

73
Causal strength
  • Assume structure
  • ΔP and causal power are maximum likelihood
    estimates of the strength parameter w1, under
    different parameterizations for P(E | B, C)
  • linear → ΔP, Noisy-OR → causal power

74
Causal structure
  • Hypotheses: h1, h0
  • Bayesian causal inference
  • support

[Figure: graphs for h1 and h0]
The likelihood ratio (Bayes factor) P(d | h1) / P(d | h0) gives evidence in
favor of h1
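A rough numerical sketch of this likelihood ratio. It assumes a Noisy-OR parameterization with uniform priors on the strengths w0 and w1, approximates each marginal likelihood on a midpoint grid, and reports the log of the ratio. These modeling choices, the grid resolution, and the example counts are assumptions for illustration, not details recovered from the slide.

```python
import math

def noisy_or(w0, w1, c):
    return 1.0 - (1.0 - w0) * (1.0 - w1) ** c

def likelihood(w0, w1, a, b, c, d):
    """Counts: a = N(e+, c+), b = N(e-, c+), c = N(e+, c-), d = N(e-, c-)."""
    p1 = noisy_or(w0, w1, 1)
    p0 = noisy_or(w0, w1, 0)
    return p1 ** a * (1.0 - p1) ** b * p0 ** c * (1.0 - p0) ** d

def midpoints(steps):
    return [(i + 0.5) / steps for i in range(steps)]

def marginal_h1(a, b, c, d, steps=100):
    """P(d | h1): average the likelihood over uniform w0 and w1."""
    grid = midpoints(steps)
    return sum(likelihood(w0, w1, a, b, c, d) for w0 in grid for w1 in grid) / steps ** 2

def marginal_h0(a, b, c, d, steps=100):
    """P(d | h0): no link from C to E, so w1 is fixed at 0."""
    grid = midpoints(steps)
    return sum(likelihood(w0, 0.0, a, b, c, d) for w0 in grid) / steps

def log_bayes_factor(a, b, c, d):
    return math.log(marginal_h1(a, b, c, d) / marginal_h0(a, b, c, d))

strong = log_bayes_factor(8, 0, 0, 8)  # effect occurs only with the cause
null = log_bayes_factor(4, 4, 4, 4)    # effect unrelated to the cause
print(strong > 0, null < 0)
```

With no contingency in the data, the simpler h0 is favored even though h1 can fit the same data, which is the Occam's razor effect of averaging over parameters.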
75
Buehner and Cheng (1997)
[Figure: People compared with model predictions]
ΔP (r = 0.89)
Power (r = 0.88)
Support (r = 0.97)
76
The importance of parameterization
  • Noisy-OR incorporates mechanism assumptions
  • generativity: causes increase probability of
    effects
  • each cause is sufficient to produce the effect
  • causes act via independent mechanisms
  • (Cheng, 1997)
  • Consider other models
  • statistical dependence: χ² test
  • generic parameterization (cf. Anderson, 1990)

77
[Figure: People compared with Support (Noisy-OR), χ², and Support (generic)]
78
Generativity is essential
[Figure: Support (scale 0 to 100) for conditions in which P(e+ | c+) and P(e+ | c-) are both 8/8, 6/8, 4/8, 2/8, or 0/8]
  • Predictions result from ceiling effect
  • ceiling effects only matter if you believe a
    cause increases the probability of an effect

79
Blicket detector (Dave Sobel, Alison Gopnik, and
colleagues)
80
Backwards blocking (Sobel, Tenenbaum & Gopnik,
2004)
[Figure panels: AB Trial, A Trial]
  • Two objects: A and B
  • Trial 1: A and B on detector, detector active
  • Trial 2: A on detector, detector active
  • 4-year-olds judge whether each object is a
    blicket
  • A: a blicket (100% say yes)
  • B: probably not a blicket (34% say yes)

81
Possible hypotheses
[Figure: candidate causal graphs over nodes A, B, and E]
82
Bayesian inference
  • Evaluating causal models in light of data
  • Inferring a particular causal relation

83
Bayesian inference
With a uniform prior on hypotheses, and the
generic parameterization
[Figure: probability of being a blicket for A and B; values shown are 0.32, 0.32, 0.34, 0.34]
84
Modeling backwards blocking
  • Assume
  • Links can only exist from blocks to detectors
  • Blocks are blickets with prior probability q
  • Blickets always activate detectors, but detectors
    never activate on their own
  • deterministic Noisy-OR, with wi = 1 and w0 = 0

85
Modeling backwards blocking
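P(h00) = (1 - q)^2
P(h10) = q(1 - q)
P(h01) = (1 - q) q
P(h11) = q^2
[Figure: graphs over nodes A, B, and E for the four hypotheses]

(one column per hypothesis, in figure order)
P(E = 1 | A = 0, B = 0)   0   0   0   0
P(E = 1 | A = 1, B = 0)   0   0   1   1
P(E = 1 | A = 0, B = 1)   0   1   0   1
P(E = 1 | A = 1, B = 1)   0   1   1   1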
86
Modeling backwards blocking
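P(h00) = (1 - q)^2
P(h10) = q(1 - q)
P(h01) = (1 - q) q
P(h11) = q^2
[Figure: graphs over nodes A, B, and E for the four hypotheses]

(one column per hypothesis, in figure order)
P(E = 1 | A = 1, B = 1)   0   1   1   1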
87
Modeling backwards blocking
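P(h10) = q(1 - q)
P(h01) = (1 - q) q
P(h11) = q^2
[Figure: graphs over nodes A, B, and E for the three remaining hypotheses]

(one column per hypothesis, in figure order)
P(E = 1 | A = 1, B = 0)   0   1   1
P(E = 1 | A = 1, B = 1)   1   1   1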
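The elimination of hypotheses can be sketched in a few lines, under the stated assumptions: each block is a blicket with prior probability q, blickets always activate the detector, and the detector never activates on its own. The function name and the value of q are illustrative.

```python
from itertools import product

def blicket_posterior(q, trials):
    """Return (P(A is a blicket), P(B is a blicket)) given the observed trials.

    trials is a list of (a_on_detector, b_on_detector, detector_active) triples.
    """
    weights = {}
    for a_is, b_is in product((0, 1), repeat=2):
        prior = (q if a_is else 1.0 - q) * (q if b_is else 1.0 - q)
        # Deterministic OR: the detector fires iff some blicket is on it.
        consistent = all(
            bool((a_is and a_on) or (b_is and b_on)) == bool(active)
            for a_on, b_on, active in trials
        )
        weights[(a_is, b_is)] = prior if consistent else 0.0
    z = sum(weights.values())
    p_a = sum(w for (a_is, _), w in weights.items() if a_is) / z
    p_b = sum(w for (_, b_is), w in weights.items() if b_is) / z
    return p_a, p_b

q = 1.0 / 3.0  # hypothetical prior
after_ab = blicket_posterior(q, [(1, 1, 1)])
after_ab_then_a = blicket_posterior(q, [(1, 1, 1), (1, 0, 1)])
print(after_ab)         # both rise to 1 / (2 - q)
print(after_ab_then_a)  # A is certain; B falls back to the prior q
```

After the A trial, B's probability returns to q, which is the backwards blocking pattern; this is why the prediction depends on the prior.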
88
Manipulating prior probability (Tenenbaum, Sobel,
Griffiths, & Gopnik, submitted)
[Figure panels: Initial, AB Trial, A Trial]
89
Summary
  • Graphical models provide solutions to many of the
    challenges of probabilistic models
  • defining structured distributions
  • representing distributions on many variables
  • efficiently computing probabilities
  • Causal graphical models provide tools for
    defining rational models of human causal
    reasoning and learning
