Graphical models
Tom Griffiths, UC Berkeley
Source: http://www.ipam.ucla.edu
1
Graphical models
Tom Griffiths
UC Berkeley
2
Challenges of probabilistic models
  • Specifying well-defined probabilistic models with
    many variables is hard (for modelers)
  • Representing probability distributions over those
    variables is hard (for computers/learners)
  • Computing quantities using those distributions is
    hard (for computers/learners)

3
Representing structured distributions
  • Four random variables
  • X1: coin toss produces heads
  • X2: pencil levitates
  • X3: friend has psychic powers
  • X4: friend has two-headed coin

Domain: {0,1} for each variable
4
Joint distribution
Values: 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111
  • Requires 15 numbers to specify the probability of
    all values (x1, x2, x3, x4)
  • N binary variables require 2^N − 1 numbers
  • Similar cost when computing conditional
    probabilities
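A quick way to see the cost is to enumerate the table directly; a minimal Python sketch, purely illustrative:

import itertools

# Full joint distribution over 4 binary variables: one probability per
# assignment, constrained to sum to 1, so 2**4 - 1 = 15 free numbers.
n = 4
assignments = list(itertools.product([0, 1], repeat=n))
print(len(assignments))   # 16 table entries
print(2 ** n - 1)         # 15 free parameters (entries must sum to 1)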

5
How can we use fewer numbers?
  • Four random variables
  • X1: coin toss produces heads
  • X2: coin toss produces heads
  • X3: coin toss produces heads
  • X4: coin toss produces heads

Domain: {0,1} for each variable
6
Statistical independence
  • Two random variables X1 and X2 are independent if
    P(x1 | x2) = P(x1)
  • e.g. coin flips: P(x1 = H | x2 = H) = P(x1 = H) = 0.5
  • Independence makes it easier to represent and
    work with probability distributions
  • We can exploit the product rule

If x1, x2, x3, and x4 are all independent:
P(x1, x2, x3, x4) = P(x1) P(x2) P(x3) P(x4)
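A minimal Python sketch of the saving (the probabilities are made up for illustration): four marginals now determine all sixteen joint values.

# If x1..x4 are independent, the joint factorizes into marginals, so
# four numbers (one P(xi = 1) per variable) specify all 16 entries.
p_heads = [0.5, 0.5, 0.5, 0.5]  # P(xi = 1) for each coin

def joint(x):
    """Probability of an assignment x = (x1, x2, x3, x4)."""
    prob = 1.0
    for xi, p in zip(x, p_heads):
        prob *= p if xi == 1 else 1 - p
    return prob

print(joint((1, 0, 1, 1)))  # 0.0625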
7
Expressing independence
  • Statistical independence is the key to efficient
    probabilistic representation and computation
  • This has led to the development of languages for
    indicating dependencies among variables
  • Some of the most popular languages are based on
    graphical models

8
Graphical models
  • Introduction to graphical models
  • definitions
  • efficient representation and inference
  • explaining away
  • Graphical models and cognitive science
  • uses of graphical models

9
Graphical models
  • Introduction to graphical models
  • definitions
  • efficient representation and inference
  • explaining away
  • Graphical models and cognitive science
  • uses of graphical models

10
Graphical models
  • Express the probabilistic dependency structure
    among a set of variables (Pearl, 1988)
  • Consist of
  • a set of nodes, corresponding to variables
  • a set of edges, indicating dependency
  • a set of functions defined on the graph that
    specify a probability distribution

11
Undirected graphical models
[Figure: undirected graph over X1, X2, X3, X4, X5]
  • Consist of
  • a set of nodes
  • a set of edges
  • a potential for each clique; these are multiplied
    together to yield the distribution over variables
  • Examples
  • statistical physics: Ising models, spin glasses
  • neural networks (e.g. Boltzmann machines)

12
Ising models
[Figure: graph over X1, X2, X3, X4]
  • Consist of
  • a set of nodes
  • a set of edges
  • a potential for each clique; these are multiplied
    together to yield the distribution over variables
  • Distribution is specified as
    p(x) ∝ exp( Σ(i,j) wij xi xj ), with xi ∈ {−1, +1}

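As a concrete illustration, a brute-force Python sketch of this definition on a made-up four-node graph (the edges and weights are assumptions, not from the slides); Boltzmann machines follow the same pattern with {0,1} units and bias terms.

import itertools, math

# Four nodes 0..3 with edges forming a square; one shared coupling weight.
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
w = 1.0

def score(x):
    # Unnormalized log-probability: sum of pairwise clique potentials.
    return w * sum(x[i] * x[j] for i, j in edges)

states = list(itertools.product([-1, 1], repeat=4))
Z = sum(math.exp(score(x)) for x in states)       # partition function
p = {x: math.exp(score(x)) / Z for x in states}
print(p[(1, 1, 1, 1)])  # aligned configurations are most probable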
13
Ising models
14
Boltzmann machines
[Figure: graph over X1, X2, X3, X4, X5]
  • Consist of
  • a set of nodes
  • a set of edges
  • a potential for each clique; these are multiplied
    together to yield the distribution over variables
  • Distribution is specified as
    p(x) ∝ exp( Σ(i,j) wij xi xj + Σi bi xi ), with xi ∈ {0, 1}

15
Boltzmann machines
[Figure: true images alongside their reconstructions from a Boltzmann machine and from PCA (Hinton & Salakhutdinov, 2006)]
16
Directed graphical models
[Figure: directed acyclic graph over X1, X2, X3, X4, X5]
  • Consist of
  • a set of nodes
  • a set of edges
  • a conditional probability distribution for each
    node, conditioned on its parents; these are
    multiplied together to yield the distribution over
    variables
  • Constrained to directed acyclic graphs (DAGs)
  • Called Bayesian networks or Bayes nets

17
Bayesian networks and Bayes
  • Two different problems
  • Bayesian statistics is a method of inference
  • Bayesian networks are a form of representation
  • There is no necessary connection
  • many users of Bayesian networks rely upon
    frequentist statistical methods
  • many Bayesian inferences cannot be easily
    represented using Bayesian networks

18
Graphical models
  • Introduction to graphical models
  • definitions
  • efficient representation and inference
  • explaining away
  • Graphical models and cognitive science
  • uses of graphical models

19
Efficient representation and inference
  • Four random variables
  • X1: coin toss produces heads
  • X2: pencil levitates
  • X3: friend has psychic powers
  • X4: friend has two-headed coin

20
The Markov assumption
  • Every node is conditionally independent of its
    non-descendants, given its parents

P(x1, …, xn) = ∏i P(xi | x1, …, xi−1)   (via the product rule)
             = ∏i P(xi | Pa(Xi))   (via the Markov assumption)
where Pa(Xi) is the set of parents of Xi
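A Python sketch of this factorization for the four-variable example (the CPT values are placeholders, not from the slides):

# Joint = product of each node's probability given its parents:
# P(x1, x2, x3, x4) = P(x1 | x3, x4) P(x2 | x3) P(x3) P(x4)
p_x3 = 0.01                       # P(x3 = 1): friend is psychic
p_x4 = 0.01                       # P(x4 = 1): two-headed coin
p_x2_given_x3 = {0: 0.0, 1: 0.9}  # pencil levitates, given psychic?
p_x1_given_x3x4 = {(0, 0): 0.5, (0, 1): 1.0, (1, 0): 0.99, (1, 1): 1.0}
# These four CPTs hold 1 + 1 + 2 + 4 = 8 numbers in total.

def bernoulli(p, x):
    return p if x == 1 else 1 - p

def joint(x1, x2, x3, x4):
    return (bernoulli(p_x1_given_x3x4[(x3, x4)], x1)
            * bernoulli(p_x2_given_x3[x3], x2)
            * bernoulli(p_x3, x3)
            * bernoulli(p_x4, x4))

print(joint(1, 0, 0, 0))  # P(heads, no levitation, not psychic, fair coin)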
21
Efficient representation and inference
  • Four random variables
  • X1: coin toss produces heads
  • X2: pencil levitates
  • X3: friend has psychic powers
  • X4: friend has two-headed coin

P(x1, x2, x3, x4) = P(x1 | x3, x4) P(x2 | x3) P(x3) P(x4)
Numbers needed: 4 for P(x1 | x3, x4), 2 for P(x2 | x3), 1 for P(x3), 1 for P(x4): total 8 (vs 15)
22
Reading a Bayesian network
  • The structure of a Bayes net can be read as the
    generative process behind a distribution
  • Gives the joint probability distribution over
    variables obtained by sampling each variable
    conditioned on its parents

23
Reading a Bayesian network
  • Four random variables
  • X1: coin toss produces heads
  • X2: pencil levitates
  • X3: friend has psychic powers
  • X4: friend has two-headed coin

[Figure: Bayes net with edges X3 → X1, X4 → X1, X3 → X2]
P(x1, x2, x3, x4) = P(x1 | x3, x4) P(x2 | x3) P(x3) P(x4)
24
Reading a Bayesian network
  • The structure of a Bayes net can be read as the
    generative process behind a distribution
  • Gives the joint probability distribution over
    variables obtained by sampling each variable
    conditioned on its parents
  • Simple rules for determining whether two
    variables are dependent or independent

25
Identifying independence
X1 and X3 dependent vs. X1 and X3 independent
[Figure: graphs connecting X1 and X3 through X2, shown with X2 unobserved and with X2 observed]
(shaded variables are observed)
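One of these cases can be checked numerically; the Python sketch below uses a chain X1 → X2 → X3 with made-up CPTs, showing X1 and X3 dependent a priori but independent once X2 is observed.

import itertools

# Chain X1 -> X2 -> X3; CPT values are illustrative.
p1 = 0.3                 # P(x1 = 1)
p2 = {0: 0.2, 1: 0.9}    # P(x2 = 1 | x1)
p3 = {0: 0.1, 1: 0.7}    # P(x3 = 1 | x2)

def b(p, x):
    return p if x == 1 else 1 - p

def joint(x1, x2, x3):
    return b(p1, x1) * b(p2[x1], x2) * b(p3[x2], x3)

def p_x3_given(x1=None, x2=None):
    """P(x3 = 1 | observations), by enumeration."""
    keep = lambda v1, v2: (x1 is None or v1 == x1) and (x2 is None or v2 == x2)
    num = sum(joint(v1, v2, 1)
              for v1, v2 in itertools.product([0, 1], repeat=2) if keep(v1, v2))
    den = sum(joint(v1, v2, v3)
              for v1, v2, v3 in itertools.product([0, 1], repeat=3) if keep(v1, v2))
    return num / den

print(p_x3_given(x1=0), p_x3_given(x1=1))              # differ: dependent
print(p_x3_given(x1=0, x2=1), p_x3_given(x1=1, x2=1))  # equal: independent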
26
Identifying independence
  • Four random variables
  • X1: coin toss produces heads
  • X2: pencil levitates
  • X3: friend has psychic powers
  • X4: friend has two-headed coin

X4 and X2 are independent (given no observations)
[Figure: the network shown twice; in the second, X1 is observed]
X4 and X2 are dependent given x1
27
Reading a Bayesian network
  • The structure of a Bayes net can be read as the
    generative process behind a distribution
  • Gives the joint probability distribution over
    variables obtained by sampling each variable
    conditioned on its parents
  • Simple rules for determining whether two
    variables are dependent or independent
  • Independence makes inference more efficient

28
Computing with Bayes nets
[Figure: Bayes net with edges X3 → X1, X4 → X1, X3 → X2]
P(x1, x2, x3, x4) = P(x1 | x3, x4) P(x2 | x3) P(x3) P(x4)
29
Computing with Bayes nets
P(x1 = 1) = Σ_{x2, x3, x4} P(x1 = 1, x2, x3, x4): sum over 8 values
P(x1, x2, x3, x4) = P(x1 | x3, x4) P(x2 | x3) P(x3) P(x4)
30
Computing with Bayes nets
P(x1, x2, x3, x4) = P(x1 | x3, x4) P(x2 | x3) P(x3) P(x4)
31
Computing with Bayes nets
P(x1 = 1) = Σ_{x3, x4} P(x1 = 1 | x3, x4) P(x3) P(x4): sum over 4 values (x2 sums out)
P(x1, x2, x3, x4) = P(x1 | x3, x4) P(x2 | x3) P(x3) P(x4)
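A Python sketch contrasting the two computations, reusing the placeholder CPTs from the earlier sketch:

import itertools

p_x3, p_x4 = 0.01, 0.01
p_x2_given_x3 = {0: 0.0, 1: 0.9}
p_x1_given_x3x4 = {(0, 0): 0.5, (0, 1): 1.0, (1, 0): 0.99, (1, 1): 1.0}

def b(p, x):
    return p if x == 1 else 1 - p

def joint(x1, x2, x3, x4):
    return (b(p_x1_given_x3x4[(x3, x4)], x1) * b(p_x2_given_x3[x3], x2)
            * b(p_x3, x3) * b(p_x4, x4))

# Naive: sum the joint over all 8 values of (x2, x3, x4).
naive = sum(joint(1, x2, x3, x4)
            for x2, x3, x4 in itertools.product([0, 1], repeat=3))

# Structured: x2 sums out of the factorization, leaving 4 values of (x3, x4).
structured = sum(b(p_x1_given_x3x4[(x3, x4)], 1) * b(p_x3, x3) * b(p_x4, x4)
                 for x3, x4 in itertools.product([0, 1], repeat=2))

print(naive, structured)  # identical results, half the terms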
32
Computing with Bayes nets
  • Inference algorithms for Bayesian networks
    exploit dependency structure
  • Message-passing algorithms
  • belief propagation passes simple messages
    between nodes, exact for tree-structured networks
  • More general inference algorithms
  • exact: junction tree
  • approximate: Monte Carlo schemes

33
Logic and probability
  • Bayesian networks are equivalent to a
    probabilistic propositional logic
  • Associate variables with atomic propositions
  • a Bayes net specifies a distribution over possible
    worlds; the probability of a proposition is a sum
    over the worlds in which it holds
  • More efficient than simply enumerating worlds
  • Developing similarly efficient schemes for
    working with other probabilistic logics is a
    major topic of current research

34
Graphical models
  • Introduction to graphical models
  • definitions
  • efficient representation and inference
  • explaining away
  • Graphical models and cognitive science
  • uses of graphical models

35
Identifying independence
X1 and X3 dependent vs. X1 and X3 independent
[Figure: graphs connecting X1 and X3 through X2, shown with X2 unobserved and with X2 observed]
(shaded variables are observed)
36
Explaining away
  • Assume grass will be wet if and only if it
    rained last night, or if the sprinklers were left
    on

37
Explaining away
Compute probability it rained last night, given
that the grass is wet:
P(r = 1 | w = 1) = P(w = 1 | r = 1) P(r = 1) / P(w = 1)
Since the grass is wet if and only if it rained or the sprinkler was on,
P(w = 1 | r = 1) = 1 and P(w = 1) = P(r = 1) + P(s = 1) − P(r = 1) P(s = 1), so
P(r = 1 | w = 1) = P(r = 1) / [P(r = 1) + P(s = 1) − P(r = 1) P(s = 1)] ≥ P(r = 1)
Observing wet grass makes rain more probable than it was a priori.
42
Explaining away
Compute probability it rained last night, given
that the grass is wet and sprinklers were left on:
P(r = 1 | w = 1, s = 1) = P(w = 1 | r = 1, s = 1) P(r = 1 | s = 1) / P(w = 1 | s = 1) = P(r = 1)
since P(w = 1 | r = 1, s = 1) = P(w = 1 | s = 1) = 1, and r and s are independent a priori.
44
Explaining away
Discounting to prior probability.
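The whole calculation fits in a short Python sketch; the priors are made up, and the wet-grass rule is the deterministic OR from the slides.

import itertools

p_rain, p_sprinkler = 0.1, 0.2

def joint(r, s, w):
    if w != int(r or s):   # grass is wet iff rain or sprinkler
        return 0.0
    pr = p_rain if r else 1 - p_rain
    ps = p_sprinkler if s else 1 - p_sprinkler
    return pr * ps

def p_rain_given(**obs):
    states = [dict(zip(("r", "s", "w"), v))
              for v in itertools.product([0, 1], repeat=3)]
    match = [st for st in states if all(st[k] == v for k, v in obs.items())]
    den = sum(joint(st["r"], st["s"], st["w"]) for st in match)
    num = sum(joint(st["r"], st["s"], st["w"]) for st in match if st["r"] == 1)
    return num / den

print(p_rain_given(w=1))       # 0.357: wet grass raises P(rain) above 0.1
print(p_rain_given(w=1, s=1))  # 0.1: the sprinkler explains it away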
45
Contrast w/ production system
[Figure: Rain → Grass Wet]
  • Formulate IF-THEN rules
  • IF Rain THEN Wet
  • IF Wet THEN Rain
  • Rules do not distinguish directions of inference
  • Requires combinatorial explosion of rules

46
Contrast w/ spreading activation
[Figure: Rain and Sprinkler each linked to Grass Wet]
  • Observing rain, Wet becomes more active.
  • Observing grass wet, Rain and Sprinkler become
    more active
  • Observing grass wet and sprinkler, Rain cannot
    become less active. No explaining away!
  • Excitatory links: Rain → Wet, Sprinkler → Wet

47
Contrast w/ spreading activation
[Figure: Rain and Sprinkler each linked to Grass Wet]
  • Excitatory links: Rain → Wet, Sprinkler → Wet
  • Inhibitory link: Rain ↔ Sprinkler
  • Observing grass wet, Rain and Sprinkler become
    more active
  • Observing grass wet and sprinkler, Rain becomes
    less active: explaining away

48
Contrast w/ spreading activation
[Figure: Rain, Burst pipe, and Sprinkler each linked to Grass Wet]
  • Each new variable requires more inhibitory
    connections
  • Not modular
  • whether a connection exists depends on which
    others exist
  • big holism problem
  • combinatorial explosion

49
Contrast w/ spreading activation
[Figure: interactive activation network (McClelland & Rumelhart, 1981)]
50
Graphical models
  • Capture dependency structure in distributions
  • Provide an efficient means of representing and
    reasoning with probabilities
  • Support kinds of inference that are problematic
    for other cognitive models, e.g. explaining away:
  • hard to capture in a production system
  • more natural than with spreading activation

51
Graphical models
  • Introduction to graphical models
  • definitions
  • efficient representation and inference
  • explaining away
  • Graphical models and cognitive science
  • uses of graphical models

52
Uses of graphical models
  • Understanding existing cognitive models
  • e.g., neural network models

53
Sigmoid belief networks
[Figure: network with inputs x1, x2, hidden units z1, z2, and output y]
  • We can view multilayer perceptrons as Bayes nets
    with specific probabilities
  • (e.g., Neal, 1992)
  • Makes it possible to use Bayesian tools with
    existing neural network models
  • (e.g., MacKay, 1992)

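A minimal sketch of the correspondence (the two-layer architecture and weights are made up): each unit is a Bernoulli variable whose probability of firing is a sigmoid function of its parents, so ancestral sampling in the network is sampling from a Bayes net.

import math, random

def sigmoid(a):
    return 1 / (1 + math.exp(-a))

# Made-up weights: inputs x1, x2 -> hidden z1, z2 -> output y.
W_xz = [[1.5, -2.0], [0.5, 1.0]]   # W_xz[i][j]: weight from x_i to z_j
w_zy = [2.0, -1.0]                 # weights from z_j to y

def sample_network(x):
    """Ancestral sampling: each node is Bernoulli(sigmoid(weighted parents))."""
    z = [int(random.random() < sigmoid(sum(x[i] * W_xz[i][j] for i in range(2))))
         for j in range(2)]
    y = int(random.random() < sigmoid(sum(z[j] * w_zy[j] for j in range(2))))
    return z, y

print(sample_network([1, 0]))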
54
Uses of graphical models
  • Understanding existing cognitive models
  • e.g., neural network models
  • Representation and reasoning
  • a way to address holism in induction (cf. Fodor)

55
The holism of confirmation
  • If everything we know is one big probability
    distribution, then discovering one small fact
    requires changing all of our beliefs
  • Used by Fodor (2001) as an argument against the
    possibility of inductive logic
  • In Bayes nets, everything can be connected to
    everything, but inference can still be efficient

56
Uses of graphical models
  • Understanding existing cognitive models
  • e.g., neural network models
  • Representation and reasoning
  • a way to address holism in induction (cf. Fodor)
  • Defining generative models
  • mixture models, language models, …

57
Graphical models and coinflipping
[Figure: three graphical models of coinflip data d1, d2, d3, d4]
Fair coin: P(H) = 0.5
Coin with unknown bias q: P(H) = q
Hidden Markov model: hidden states si ∈ {Fair coin, Trick coin} generate each flip
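A sketch of the hidden Markov model variant (the transition probability and the trick coin's bias are assumptions, not from the slides): the coin in use can switch between flips.

import random

# States: 0 = fair coin (P(H) = 0.5); 1 = trick coin (assumed to always land H).
p_heads = {0: 0.5, 1: 1.0}   # emission: P(H | state)
p_stay = 0.9                 # transition: P(same state on the next flip)

def sample_flips(n):
    state, flips = 0, []
    for _ in range(n):
        flips.append("H" if random.random() < p_heads[state] else "T")
        if random.random() > p_stay:  # occasionally switch coins
            state = 1 - state
    return "".join(flips)

print(sample_flips(10))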
58
A hierarchical Bayesian model
[Figure: hierarchical model]
physical knowledge → FH, FT
Coins: qi ~ Beta(FH, FT)
Coin 1, Coin 2, …, Coin 200: q1, q2, …, q200
Each coin generates flips d1, d2, d3, d4
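A sketch of sampling from this hierarchy (the hyperparameter values are placeholders):

import random

# Hyperparameters FH, FT encode physical knowledge about coins:
# most coins are close to fair, so FH = FT and both are large.
FH, FT = 20.0, 20.0

def sample_coin(flips_per_coin=4):
    q = random.betavariate(FH, FT)            # q_i ~ Beta(FH, FT)
    flips = ["H" if random.random() < q else "T"
             for _ in range(flips_per_coin)]  # d1..d4 ~ Bernoulli(q_i)
    return q, flips

for i in range(3):                            # e.g., the first 3 of 200 coins
    print(sample_coin())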
59
Uses of graphical models
  • Understanding existing cognitive models
  • e.g., neural network models
  • Representation and reasoning
  • a way to address holism in induction (cf. Fodor)
  • Defining generative models
  • mixture models, language models, …
  • Modeling human causal reasoning
  • more on Friday!

60
(No Transcript)