Title: Markov chain Monte Carlo with people
1. Markov chain Monte Carlo with people
- Tom Griffiths
- Department of Psychology
- Cognitive Science Program
- UC Berkeley
with Mike Kalish, Stephan Lewandowsky, and Adam
Sanborn
2. Inductive problems
3. Computational cognitive science
- Identify the underlying computational problem
- Find the optimal solution to that problem
- Compare human cognition to that solution
- For inductive problems, solutions come from
statistics
4. Statistics and inductive problems
Problems in cognitive science and their statistical counterparts:
- Categorization ↔ density estimation
- Causal learning ↔ graphical models
- Function learning ↔ regression
- Language ↔ probabilistic grammars
5. Statistics and human cognition
- How can we use statistics to understand cognition?
- How can cognition inspire new statistical models?
  - applications of Dirichlet process and Pitman-Yor process models to natural language
  - exchangeable distributions on infinite binary matrices via the Indian buffet process (priors on causal structure)
  - nonparametric Bayesian models for relational data
8. Are people Bayesian?
Reverend Thomas Bayes
9. Bayes' theorem
P(h | d) = P(d | h) P(h) / Σ_h′ P(d | h′) P(h′)
where h is a hypothesis and d is the observed data
10. People are stupid
11. Predicting the future
- How often is Google News updated?
- t = time since last update
- t_total = total time between updates
- What should we guess for t_total given t? (see the note below)
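The Bayesian answer is left implicit on the slide. Under the usual assumption in Griffiths and Tenenbaum's analysis of these prediction problems, that the phenomenon is encountered at a random point so that t is uniform on [0, t_total], the posterior is

\[
p(t_{\mathrm{total}} \mid t) \;\propto\; p(t \mid t_{\mathrm{total}})\, p(t_{\mathrm{total}}) \;=\; \frac{p(t_{\mathrm{total}})}{t_{\mathrm{total}}}, \qquad t_{\mathrm{total}} \ge t,
\]

and people's guesses are modeled as the posterior median t*, the value satisfying P(t_total > t* | t) = 1/2. Different priors p(t_total) therefore produce different prediction functions.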
12. The effects of priors
13. Evaluating human predictions
- Different domains with different priors:
  - a movie has made $60 million (power-law)
  - your friend quotes from line 17 of a poem (power-law)
  - you meet a 78 year old man (Gaussian)
  - a movie has been running for 55 minutes (Gaussian)
  - a U.S. congressman has served for 11 years (Erlang)
- Prior distributions derived from actual data
- Use 5 values of t for each
- People predict t_total
14. [Figure: people's predictions compared with predictions from an empirical prior, a parametric prior, and Gott's rule]
15. A different approach
- Instead of asking whether people are rational, use the assumption of rationality to investigate cognition
- If we can predict people's responses, we can design experiments that measure psychological variables
16. Two deep questions
- What are the biases that guide human learning?
  - prior probability distribution P(h)
- What do mental representations look like?
  - category distribution P(x|c)
17. Two deep questions
- What are the biases that guide human learning?
  - prior probability distribution on hypotheses, P(h)
- What do mental representations look like?
  - distribution over objects x in category c, P(x|c)
Develop ways to sample from these distributions
18. Outline
- Markov chain Monte Carlo
- Sampling from the prior
- Sampling from category distributions
20. Markov chains
[Figure: a sequence of states x(1) → x(2) → … generated by a Markov chain]
- Transition matrix T = P(x(t+1) | x(t))
- Variables: x(t+1) is independent of the history given x(t)
- Converges to a stationary distribution under easily checked conditions (i.e., if it is ergodic); see the toy example below
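As a toy illustration of convergence to a stationary distribution (an example added here; the matrix entries are arbitrary choices), repeatedly applying an ergodic transition matrix to any starting distribution settles on the same limit:

```python
import numpy as np

# Toy 3-state chain; T[i, j] = P(x(t+1) = j | x(t) = i). The entries are
# illustrative: each row sums to 1 and the chain is ergodic.
T = np.array([[0.9, 0.05, 0.05],
              [0.1, 0.80, 0.10],
              [0.2, 0.30, 0.50]])

dist = np.array([1.0, 0.0, 0.0])    # start with all probability on state 0
for _ in range(200):
    dist = dist @ T                 # one step of the chain
print(dist, dist @ T)               # the two match: the stationary distribution
```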
21. Markov chain Monte Carlo
- Sample from a target distribution P(x) by constructing a Markov chain for which P(x) is the stationary distribution
- Two main schemes:
  - Gibbs sampling
  - Metropolis-Hastings algorithm
22. Gibbs sampling
- For variables x = (x1, x2, …, xn) and target P(x)
- Draw xi(t+1) from P(xi | x-i)
  - where x-i = (x1(t+1), x2(t+1), …, xi-1(t+1), xi+1(t), …, xn(t))
(see the code sketch below)
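A minimal sketch of this update rule (not code from the talk; the bivariate-Gaussian target and all names are assumptions made for the example), where both full conditionals are available in closed form:

```python
import numpy as np

def gibbs_bivariate_gaussian(rho, n_samples=5000, seed=0):
    """Gibbs sampler for a zero-mean bivariate Gaussian with correlation rho.

    Both full conditionals are Gaussian: x1 | x2 ~ N(rho * x2, 1 - rho**2),
    and symmetrically for x2 | x1, so each update draws from P(xi | x_-i).
    """
    rng = np.random.default_rng(seed)
    x1, x2 = 0.0, 0.0                      # arbitrary starting state
    sd = np.sqrt(1.0 - rho ** 2)           # conditional standard deviation
    samples = np.empty((n_samples, 2))
    for t in range(n_samples):
        x1 = rng.normal(rho * x2, sd)      # draw x1 from P(x1 | x2)
        x2 = rng.normal(rho * x1, sd)      # draw x2 from P(x2 | x1)
        samples[t] = (x1, x2)
    return samples

# The empirical correlation of the draws should approach rho.
draws = gibbs_bivariate_gaussian(rho=0.8)
print(np.corrcoef(draws[1000:].T)[0, 1])   # ≈ 0.8 after discarding burn-in
```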
23. Gibbs sampling (MacKay, 2002)
24. Metropolis-Hastings algorithm (Metropolis et al., 1953; Hastings, 1970)
- Step 1: propose a state (we assume symmetrically)
  - Q(x(t+1) | x(t)) = Q(x(t) | x(t+1))
- Step 2: decide whether to accept, with probability A(x(t), x(t+1)) given by either
  - the Metropolis acceptance function: A = min(1, p(x(t+1)) / p(x(t)))
  - the Barker acceptance function: A = p(x(t+1)) / (p(x(t+1)) + p(x(t)))
(see the code sketch below)
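To make the two acceptance rules concrete, here is a minimal random-walk sampler (an illustrative sketch; the Gaussian target, proposal width, and function names are choices made for this example, not anything from the talk):

```python
import numpy as np

def metropolis_hastings(log_p, x0, n_samples=5000, step=1.0,
                        rule="metropolis", seed=0):
    """Random-walk sampler with a symmetric Gaussian proposal Q.

    rule = "metropolis": accept with probability min(1, p(x') / p(x))
    rule = "barker":     accept with probability p(x') / (p(x') + p(x))
    Either rule leaves the target p(x) invariant for a symmetric proposal.
    """
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n_samples)
    for t in range(n_samples):
        proposal = x + step * rng.normal()       # symmetric proposal
        log_ratio = log_p(proposal) - log_p(x)   # log of p(x') / p(x)
        if rule == "metropolis":
            accept = 1.0 if log_ratio >= 0 else np.exp(log_ratio)
        else:                                    # Barker rule
            accept = 1.0 / (1.0 + np.exp(-log_ratio))
        if rng.random() < accept:
            x = proposal
        samples[t] = x
    return samples

# Usage: sample from a standard Gaussian target (unnormalized log density).
draws = metropolis_hastings(lambda x: -0.5 * x ** 2, x0=3.0, rule="barker")
print(draws[1000:].mean(), draws[1000:].std())   # ≈ 0 and ≈ 1
```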
25-30. Metropolis-Hastings algorithm
[Figure sequence: a chain moving over a target density p(x), with the acceptance probability shown for individual proposals, e.g. A(x(t), x(t+1)) = 0.5 and A(x(t), x(t+1)) = 1]
31. Outline
- Markov chain Monte Carlo
- Sampling from the prior
- Sampling from category distributions
32. Iterated learning (Kirby, 2001)
What are the consequences of learners learning
from other learners?
33. Analyzing iterated learning
[Figure: a chain of learners, alternating P_L(h|d) and P_P(d|h)]
- P_L(h|d): probability of inferring hypothesis h from data d
- P_P(d|h): probability of generating data d from hypothesis h
34. Iterated Bayesian learning
[Figure: the same chain of learners, alternating P_L(h|d) and P_P(d|h), with each learner inferring h from d by Bayes' rule]
35. Analyzing iterated learning
36. Stationary distributions
- The Markov chain on h converges to the prior, P(h)
- The Markov chain on d converges to the prior predictive distribution
(Griffiths & Kalish, 2005)
37. Explaining convergence to the prior
[Figure: chain of learners alternating P_L(h|d) and P_P(d|h)]
- Intuitively: the data act once, the prior acts many times
- Formally: iterated learning with Bayesian agents is a Gibbs sampler on P(d, h); see the simulation sketch below
(Griffiths & Kalish, in press)
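A minimal simulation of this claim (a toy example constructed here; the three-hypothesis prior and the likelihood values are assumptions, not values from the talk) shows the chain of hypotheses settling on the prior, exactly as a Gibbs sampler on P(d, h) should:

```python
import numpy as np

# Toy setup: 3 hypotheses, 4 possible data values. The specific prior and
# likelihood numbers are illustrative assumptions.
prior = np.array([0.6, 0.3, 0.1])                    # P(h)
likelihood = np.array([[0.7, 0.1, 0.1, 0.1],         # P(d | h); rows sum to 1
                       [0.1, 0.7, 0.1, 0.1],
                       [0.1, 0.1, 0.4, 0.4]])

def iterated_learning(n_generations=20000, seed=0):
    """Each learner samples h from P(h | d), then generates d from P(d | h)."""
    rng = np.random.default_rng(seed)
    h_counts = np.zeros(len(prior))
    d = rng.integers(likelihood.shape[1])            # arbitrary initial data
    for _ in range(n_generations):
        posterior = prior * likelihood[:, d]         # Bayes' rule, unnormalized
        posterior /= posterior.sum()
        h = rng.choice(len(prior), p=posterior)      # learner infers a hypothesis
        d = rng.choice(likelihood.shape[1], p=likelihood[h])  # and produces data
        h_counts[h] += 1
    return h_counts / n_generations

# The distribution of hypotheses across generations matches the prior.
print(iterated_learning())                           # ≈ [0.6, 0.3, 0.1]
```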
38. Revealing inductive biases
- Many problems in cognitive science can be formulated as problems of induction
  - learning languages, concepts, and causal relations
- Such problems are not solvable without bias
  - (e.g., Goodman, 1955; Kearns & Vazirani, 1994; Vapnik, 1995)
- What biases guide human inductive inferences?
- If iterated learning converges to the prior, then it may provide a method for investigating biases
39. Serial reproduction (Bartlett, 1932)
- Participants see stimuli, then reproduce them from memory
- Reproductions of one participant are stimuli for the next
- Stimuli were interesting, rather than controlled
  - e.g., "War of the Ghosts"
40. General strategy
- Use well-studied and simple stimuli for which people's inductive biases are known
  - function learning
  - concept learning
  - color words
- Examine dynamics of iterated learning
  - convergence to a state reflecting biases
  - predictable path to convergence
41. Iterated function learning
- Each learner sees a set of (x, y) pairs
- Makes predictions of y for new x values
- Predictions are data for the next learner
(Kalish, Griffiths, & Lewandowsky, in press)
42. Function learning experiments
Examine iterated learning with different initial
data
43. Initial data
[Figure: learners' functions across iterations 1-9 for each initial data condition]
44. Identifying inductive biases
- Formal analysis suggests that iterated learning provides a way to determine inductive biases
- Experiments with human learners support this idea
  - when stimuli for which biases are well understood are used, those biases are revealed by iterated learning
- What do inductive biases look like in other cases?
  - continuous categories
  - causal structure
  - word learning
  - language learning
45. Statistics and cultural evolution
- Iterated learning for MAP learners reduces to a form of the stochastic EM algorithm
  - Monte Carlo EM with a single sample
- Provides connections between cultural evolution and classic models used in population genetics
  - MAP learning of multinomials corresponds to the Wright-Fisher model
- More generally, an account of how products of cultural evolution relate to the biases of learners
46. Outline
- Markov chain Monte Carlo
- Sampling from the prior
- Sampling from category distributions
47. Categories are central to cognition
48. Sampling from categories
Frog distribution P(x|c)
49. A task
- Ask subjects which of two alternatives comes
from a target category
Which animal is a frog?
50. A Bayesian analysis of the task
[Assumptions and derivation shown as equations on the slide]
51. Response probabilities
- If people probability match to the posterior, the response probability is equivalent to the Barker acceptance function for the target distribution p(x|c); see the sketch below
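A sketch of why this yields samples from p(x|c) (an illustrative simulation, not the experiment code; the Gaussian "mental" distribution and every parameter value are assumptions): an idealized participant who probability matches when choosing between the current stimulus and a proposal is applying the Barker rule, so the accepted stimuli form a Markov chain with p(x|c) as its stationary distribution.

```python
import numpy as np

def simulated_choice_chain(n_trials=20000, step=1.0, seed=0):
    """Simulate the two-alternative task with an idealized participant.

    The participant's category distribution p(x | c) is assumed Gaussian
    (mean 3.5, sd 0.8; e.g. fish length in arbitrary units). Choosing the
    proposed stimulus with probability p(x' | c) / (p(x' | c) + p(x | c)) is
    exactly the Barker acceptance function, so the sequence of accepted
    stimuli is an MCMC chain targeting p(x | c).
    """
    rng = np.random.default_rng(seed)
    # Unnormalized density; the normalizing constant cancels in the ratio.
    p = lambda x: np.exp(-0.5 * ((x - 3.5) / 0.8) ** 2)
    x = 0.0                                      # arbitrary starting stimulus
    chain = np.empty(n_trials)
    for t in range(n_trials):
        proposal = x + step * rng.normal()       # experimenter's proposed stimulus
        if rng.random() < p(proposal) / (p(proposal) + p(x)):
            x = proposal                         # participant picks the proposal
        chain[t] = x
    return chain

chain = simulated_choice_chain()
print(chain[2000:].mean(), chain[2000:].std())   # ≈ 3.5 and ≈ 0.8
```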
52. Collecting the samples
[Figure: a sequence of trials ("Which is the frog?"), Trials 1-3]
53. Verifying the method
54. Training
- Subjects were shown schematic fish of
different sizes and trained on whether they came
from the ocean (uniform) or a fish farm (Gaussian)
55. Between-subject conditions
56. Choice task
- Subjects judged which of the two fish came
from the fish farm (Gaussian) distribution
57. Examples of subject MCMC chains
58. Estimates from all subjects
- Estimated means and standard deviations are significantly different across groups
- Estimated means are accurate, but standard deviation estimates are high
  - this result could be due to perceptual noise or response gain
59. Sampling from natural categories
- Examined distributions for four natural categories: giraffes, horses, cats, and dogs
- Presented stimuli with nine-parameter stick figures (Olman & Kersten, 2004)
60. Choice task
61. Samples from Subject 3 (projected onto plane from LDA)
62. Mean animals by subject
[Figure: mean stick figures for each subject (S1-S8) and each category (giraffe, horse, cat, dog)]
63. Marginal densities (aggregated across subjects)
- Giraffes are distinguished by neck length, body height, and body tilt
- Horses are like giraffes, but with shorter bodies and nearly uniform necks
- Cats have longer tails than dogs
64. Relative volume of categories
Convex hull content divided by minimum enclosing hypercube content:
- Giraffe: 0.00004
- Horse: 0.00006
- Cat: 0.00003
- Dog: 0.00002
65. Discrimination method (Olman & Kersten, 2004)
66. Parameter space for discrimination
- Restricted so that most random draws were
animal-like
67. MCMC and discrimination means
68. Conclusion
- Markov chain Monte Carlo provides a way to sample from subjective probability distributions
- Many interesting questions can be framed in terms of subjective probability distributions
  - inductive biases (priors)
  - mental representations (category distributions)
- Other MCMC methods may provide further empirical methods
  - Gibbs sampling for categories, adaptive MCMC, …
69. A different approach
- Instead of asking whether people are rational, use the assumption of rationality to investigate cognition
- If we can predict people's responses, we can design experiments that measure psychological variables
Randomized algorithms → psychological experiments
71. From sampling to maximizing
72. From sampling to maximizing
- General analytic results are hard to obtain
  - (r → ∞ is Monte Carlo EM with a single sample)
- For certain classes of languages, it is possible to show that the stationary distribution gives each hypothesis h probability proportional to P(h)^r
  - the ordering identified by the prior is preserved, but not the corresponding probabilities
(Kirby, Dowman, & Griffiths, in press)
73. Implications for linguistic universals
- When learners sample from P(h|d), the distribution over languages converges to the prior
  - identifies a one-to-one correspondence between inductive biases and linguistic universals
- As learners move towards maximizing, the influence of the prior is exaggerated
  - weak biases can produce strong universals
  - cultural evolution is a viable alternative to traditional explanations for linguistic universals
75. Iterated concept learning
- Each learner sees examples from a species
- Identifies the species of four amoebae
- Iterated learning is run within-subjects
[Figure: hypotheses and data]
(Griffiths, Christian, & Kalish, in press)
76. Two positive examples
[Figure: data (d) and hypotheses (h)]
77. Bayesian model (Tenenbaum, 1999; Tenenbaum & Griffiths, 2001)
- d: 2 amoebae
- h: a set of 4 amoebae
(see the note below on the assumed likelihood)
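The slide lists only the ingredients; the likelihood is presumably the size principle from Tenenbaum (1999), under which the n examples are assumed to be sampled uniformly from the true concept:

\[
P(d \mid h) = \begin{cases} 1/|h|^{n} & \text{if all } n \text{ examples in } d \text{ fall in } h \\ 0 & \text{otherwise,} \end{cases}
\]

so with hypotheses that are sets of four amoebae (|h| = 4) and n = 2 examples, every consistent hypothesis receives the same likelihood 1/16, and the posterior is simply the prior renormalized over the consistent hypotheses.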
78. Classes of concepts (Shepard, Hovland, & Jenkins, 1961)
[Figure: the six classes of concepts (Class 1-6) defined over three binary dimensions: color, size, and shape]
79. Experiment design (for each subject)
- 6 iterated learning chains
- 6 independent learning chains
80. Estimating the prior
[Figure: data (d) and hypotheses (h)]
81. Estimating the prior
Estimated prior probabilities (used to compare the Bayesian model with human subjects):
- Class 1: 0.861
- Class 2: 0.087
- Class 3: 0.009
- Class 4: 0.002
- Class 5: 0.013
- Class 6: 0.028
Correlation between Bayesian model and human subjects: r = 0.952
82. Two positive examples (n = 20)
[Figure: probability of each concept class across iterations, for human learners and the Bayesian model]
83. Two positive examples (n = 20)
[Figure: probabilities for human learners and the Bayesian model]
84. Three positive examples
[Figure: data (d) and hypotheses (h)]
85. Three positive examples (n = 20)
[Figure: probability of each concept class across iterations, for human learners and the Bayesian model]
86. Three positive examples (n = 20)
[Figure: probabilities for human learners and the Bayesian model]
88. Classification objects
89. Parameter space for discrimination
- Restricted so that most random draws were
animal-like
90. MCMC and discrimination means
91. Problems with classification objects
92. Problems with classification objects
Convex hull content divided by minimum enclosing hypercube content:
- Giraffe: 0.00004
- Horse: 0.00006
- Cat: 0.00003
- Dog: 0.00002
94. Allowing a Wider Range of Behavior
- An exponentiated choice rule results in a Markov chain with stationary distribution corresponding to an exponentiated version of the category distribution, proportional to p(x|c)^γ (see the sketch below)
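A sketch of this point (a toy setup assumed here, not material from the talk; the Gaussian category distribution and the exponent symbol γ are my choices): accepting the proposal with probability p(x')^γ / (p(x')^γ + p(x)^γ) satisfies detailed balance for a target proportional to p(x|c)^γ, so for a standard Gaussian category distribution the chain's spread shrinks as γ grows.

```python
import numpy as np

def exponentiated_choice_chain(gamma, n_trials=50000, step=1.0, seed=0):
    """Two-alternative choices with an exponentiated (Luce-style) rule.

    Accepting the proposal with probability p(x')**gamma / (p(x')**gamma +
    p(x)**gamma) leaves a target proportional to p(x)**gamma invariant.
    With an assumed standard Gaussian p(x | c), the chain's standard
    deviation should therefore be about 1 / sqrt(gamma).
    """
    rng = np.random.default_rng(seed)
    log_p = lambda x: -0.5 * x ** 2                  # assumed log p(x | c), unnormalized
    x = 0.0
    chain = np.empty(n_trials)
    for t in range(n_trials):
        proposal = x + step * rng.normal()
        log_ratio = gamma * (log_p(proposal) - log_p(x))
        if rng.random() < 1.0 / (1.0 + np.exp(-log_ratio)):   # exponentiated Barker rule
            x = proposal
        chain[t] = x
    return chain

for gamma in (0.5, 1.0, 2.0):
    print(gamma, exponentiated_choice_chain(gamma)[5000:].std())  # ≈ 1.41, 1.00, 0.71
```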
95. Category drift
- For fragile categories, the MCMC procedure could influence the category representation
- Interleaved training and test blocks in the training experiments