Title: Revealing inductive biases with Bayesian models
1. Revealing inductive biases with Bayesian models
- Tom Griffiths
- UC Berkeley
with Mike Kalish, Brian Christian, and Steve Lewandowsky
2. Inductive problems
3. Generalization requires induction
- Generalization: predicting the properties of an entity from the observed properties of others
4. What makes a good inductive learner?
- Hypothesis 1: more representational power
- more hypotheses, more complexity
- the spirit of many accounts of learning and development
5. Some hypothesis spaces
- Linear functions
- Quadratic functions
- 8th degree polynomials
6-9. Minimizing squared error (figures)
10. Measuring prediction error (figure)
11. What makes a good inductive learner?
- Hypothesis 1: more representational power
- more hypotheses, more complexity
- the spirit of many accounts of learning and development
- Hypothesis 2: good inductive biases
- constraints on hypotheses that match the environment
12. Outline
- The bias-variance tradeoff
- Bayesian inference and inductive biases
- Revealing inductive biases
- Conclusions
13. Outline
- The bias-variance tradeoff
- Bayesian inference and inductive biases
- Revealing inductive biases
- Conclusions
14. A simple schema for induction
- Data D are n pairs (x, y) generated from a function f
- Hypothesis space of functions, y = g(x)
- Error is E = Σ (y - g(x))²
- Pick the function g that minimizes error on D
- Measure prediction error, averaging over x and y
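A minimal sketch of this schema in Python with NumPy (the particular f(x), noise level, and polynomial degree below are illustrative assumptions, not taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Illustrative quadratic "true" function (the slides use a quadratic f).
    return 1.0 + 2.0 * x - 1.5 * x ** 2

# Data D: n pairs (x, y), with y = f(x) plus Gaussian noise.
n = 10
x = rng.uniform(0.0, 1.0, n)
y = f(x) + rng.normal(0.0, 0.1, n)

# Hypothesis space: polynomials of a fixed degree.
# Pick the g that minimizes squared error on D (ordinary least squares).
degree = 2
g = np.poly1d(np.polyfit(x, y, degree))

# Prediction error: average squared error on fresh (x, y) pairs.
x_test = rng.uniform(0.0, 1.0, 1000)
y_test = f(x_test) + rng.normal(0.0, 0.1, 1000)
print("prediction error:", np.mean((y_test - g(x_test)) ** 2))
```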
15. Bias and variance
- A good learner makes (f(x) - g(x))² small
- g is chosen on the basis of the data D
- Evaluate learners by the average of (f(x) - g(x))² over datasets D generated from f
(Geman, Bienenstock, & Doursat, 1992)
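The decomposition behind these two quantities (the standard one from Geman et al., 1992, with the irreducible noise term omitted), writing g_D for the function chosen given dataset D:

```latex
\mathbb{E}_D\!\left[(f(x) - g_D(x))^2\right]
  = \underbrace{\left(f(x) - \mathbb{E}_D[g_D(x)]\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\left(g_D(x) - \mathbb{E}_D[g_D(x)]\right)^2\right]}_{\text{variance}}
```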
16. Making things more intuitive
- The next few slides were generated by
- choosing a true function f(x)
- generating a number of datasets D from p(x, y), defined by a uniform p(x) and p(y|x) = f(x) plus noise
- finding the function g(x) in the hypothesis space that minimized the error on D
- Comparing the average of g(x) to f(x) reveals the bias
- The spread of g(x) around its average is the variance
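A rough Python reconstruction of this procedure (the quadratic f(x), noise level, and number of datasets are assumptions made for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # Assumed quadratic true function, as in the slides' example.
    return 1.0 + 2.0 * x - 1.5 * x ** 2

def bias_and_variance(degree, n=10, n_datasets=200, noise=0.1):
    """Fit degree-`degree` polynomials to many datasets of size n,
    then measure squared bias and variance of g(x) over a grid of x."""
    x_grid = np.linspace(0.0, 1.0, 50)
    fits = np.empty((n_datasets, x_grid.size))
    for i in range(n_datasets):
        x = rng.uniform(0.0, 1.0, n)
        y = f(x) + rng.normal(0.0, noise, n)
        # (polyfit may warn that the 8th-degree fit is poorly conditioned --
        #  that instability is exactly the variance being illustrated)
        g = np.poly1d(np.polyfit(x, y, degree))
        fits[i] = g(x_grid)
    avg_g = fits.mean(axis=0)                      # the average g(x)
    bias_sq = np.mean((f(x_grid) - avg_g) ** 2)    # squared bias
    variance = np.mean(fits.var(axis=0))           # spread of g(x) around its average
    return bias_sq, variance

for degree in (1, 2, 8):   # linear, quadratic, 8th-degree hypothesis spaces
    print(degree, bias_and_variance(degree))
```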
17. Linear functions (n = 10)
18. Linear functions (n = 10)
[Figure: pink curves are g(x) for each dataset, red is the average g(x), black is f(x); axes are x and y]
19. Quadratic functions (n = 10)
[Figure: pink curves are g(x) for each dataset, red is the average g(x), black is f(x); axes are x and y]
20. 8th-degree polynomials (n = 10)
[Figure: pink curves are g(x) for each dataset, red is the average g(x), black is f(x); axes are x and y]
21. Bias and variance
(for our quadratic f(x), with n = 10)
- Linear functions: high bias, medium variance
- Quadratic functions: low bias, low variance
- 8th-order polynomials: low bias, super-high variance
22. In general
- Larger hypothesis spaces result in higher variance, but low bias across several f(x)
- The bias-variance tradeoff: if we want a learner that has low bias on a range of problems, we pay a price in variance
- This is mainly an issue when n is small
- the regime of much of human learning
23. Quadratic functions (n = 100)
[Figure: pink curves are g(x) for each dataset, red is the average g(x), black is f(x); axes are x and y]
24. 8th-degree polynomials (n = 100)
[Figure: pink curves are g(x) for each dataset, red is the average g(x), black is f(x); axes are x and y]
25. The moral
- General-purpose learning mechanisms do not work well with small amounts of data
- more representational power isn't always better
- To make good predictions from small amounts of data, you need a bias that matches the problem
- these biases are the key to successful induction, and characterize the nature of an inductive learner
- So how can we identify human inductive biases?
26. Outline
- The bias-variance tradeoff
- Bayesian inference and inductive biases
- Revealing inductive biases
- Conclusions
27. Bayesian inference
- Rational procedure for updating beliefs
- Foundation of many learning algorithms
- Lets us make the inductive biases of learners precise
[Image: Reverend Thomas Bayes]
28. Bayes' theorem
h = hypothesis, d = data
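The equation on this slide is an image in the transcript; the standard statement of Bayes' rule over a hypothesis space H is:

```latex
P(h \mid d) \;=\; \frac{P(d \mid h)\, P(h)}{\sum_{h' \in \mathcal{H}} P(d \mid h')\, P(h')}
```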
29. Priors and biases
- Priors indicate the kind of world a learner expects to encounter, guiding their conclusions
- In our function learning example:
- the likelihood gives probability to data that decreases with the sum of squared errors (i.e., a Gaussian)
- the priors are uniform over all functions in hypothesis spaces of different kinds of polynomials
- having more functions corresponds to a belief in a more complex world
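Written out for that example (here σ denotes an assumed noise standard deviation, and H stands for one of the finite hypothesis spaces of polynomials; both symbols are introduced for illustration):

```latex
P(d \mid g) \;\propto\; \exp\!\left(-\frac{1}{2\sigma^{2}} \sum_{i=1}^{n} \bigl(y_i - g(x_i)\bigr)^{2}\right),
\qquad
P(g) \;=\; \frac{1}{|\mathcal{H}|} \quad \text{for every } g \in \mathcal{H}.
```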
30. Outline
- The bias-variance tradeoff
- Bayesian inference and inductive biases
- Revealing inductive biases
- Conclusions
31. Two ways of using Bayesian models
- Specify models that make different assumptions about priors, and compare their fit to human data
- (Anderson & Schooler, 1991; Oaksford & Chater, 1994; Griffiths & Tenenbaum, 2006)
- Design experiments explicitly intended to reveal the priors of Bayesian learners
32. Iterated learning (Kirby, 2001)
What are the consequences of learners learning from other learners?
33. Objects of iterated learning
- Knowledge communicated across generations through the provision of data by learners
- Examples:
- religious concepts
- social norms
- myths and legends
- causal theories
- language
34. Analyzing iterated learning
[Diagram: a chain of learners, each inferring a hypothesis from data and generating data for the next]
P_L(h|d): probability of inferring hypothesis h from data d
P_P(d|h): probability of generating data d from hypothesis h
35. Markov chains
[Diagram: a chain of variables x(1) → x(2) → … → x(t)]
- Transition matrix T = P(x(t+1) | x(t))
- Variables: x(t+1) is independent of the history given x(t)
- Converges to a stationary distribution under easily checked conditions (i.e., if it is ergodic)
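A small Python illustration of these properties (the 3-state transition matrix is made up for the example):

```python
import numpy as np

# Transition matrix T[i, j] = P(x(t+1) = j | x(t) = i).
# All entries are positive, so the chain is ergodic and has a unique
# stationary distribution.
T = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])

# Iterating the chain: the distribution over x(t) converges to the
# stationary distribution regardless of the starting state.
p = np.array([1.0, 0.0, 0.0])   # start in state 0 with certainty
for _ in range(50):
    p = p @ T
print("distribution after 50 steps:", p)

# The stationary distribution is the left eigenvector of T with eigenvalue 1.
vals, vecs = np.linalg.eig(T.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
print("stationary distribution:   ", pi / pi.sum())
```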
36. Analyzing iterated learning
37. Iterated Bayesian learning
[Diagram: a chain of Bayesian learners alternating inference P_L(h|d) and production P_P(d|h)]
38. Stationary distributions
- The Markov chain on h converges to the prior, P(h)
- The Markov chain on d converges to the prior predictive distribution
(Griffiths & Kalish, 2005)
39. Explaining convergence to the prior
[Diagram: a chain of Bayesian learners alternating P_L(h|d) and P_P(d|h)]
- Intuitively: data acts once, the prior acts many times
- Formally: iterated learning with Bayesian agents is a Gibbs sampler on P(d, h)
(Griffiths & Kalish, in press)
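A toy simulation of this result (the prior, likelihood, and space sizes below are invented for illustration): each learner samples a hypothesis from the posterior P(h|d) and then generates data for the next learner from P(d|h). Tracking the sampled hypotheses shows the chain settling on the prior P(h), as the Gibbs-sampler analysis predicts.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy spaces: 3 hypotheses, 4 possible data values (illustrative numbers).
prior = np.array([0.6, 0.3, 0.1])                      # P(h)
likelihood = np.array([[0.7, 0.1, 0.1, 0.1],           # P(d | h); rows sum to 1
                       [0.1, 0.7, 0.1, 0.1],
                       [0.1, 0.1, 0.4, 0.4]])

def learn(d):
    """Sample h from the posterior P(h | d) -- the learner's inference."""
    post = prior * likelihood[:, d]
    post /= post.sum()
    return rng.choice(len(prior), p=post)

def produce(h):
    """Sample d from P(d | h) -- the data passed to the next learner."""
    return rng.choice(likelihood.shape[1], p=likelihood[h])

# Run one long chain of learners (a Gibbs sampler on P(d, h)).
d = 0
hs = []
for _ in range(20000):
    h = learn(d)
    d = produce(h)
    hs.append(h)

print("empirical distribution of h:", np.bincount(hs, minlength=len(prior)) / len(hs))
print("prior P(h):                 ", prior)
```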
40. Revealing inductive biases
- If iterated learning converges to the prior, it might provide a tool for determining the inductive biases of human learners
- We can test this by reproducing iterated learning in the lab, with stimuli for which human biases are well understood
41. Iterated function learning
- Each learner sees a set of (x, y) pairs
- Makes predictions of y for new x values
- Predictions are the data for the next learner
(Kalish, Griffiths, & Lewandowsky, in press)
42. Function learning experiments
Examine iterated learning with different initial data
43. Initial data
[Figure: functions produced at iterations 1-9, for each set of initial data]
44. Identifying inductive biases
- Formal analysis suggests that iterated learning provides a way to determine inductive biases
- Experiments with human learners support this idea
- when stimuli for which biases are well understood are used, those biases are revealed by iterated learning
- What do inductive biases look like in other cases?
- continuous categories
- causal structure
- word learning
- language learning
45. Outline
- The bias-variance tradeoff
- Bayesian inference and inductive biases
- Revealing inductive biases
- Conclusions
46. Conclusions
- Solving inductive problems and forming good generalizations requires good inductive biases
- Bayesian inference provides a way to make assumptions about the biases of learners explicit
- Two ways to identify human inductive biases:
- compare Bayesian models assuming different priors
- design tasks to extract biases from Bayesian learners
- Iterated learning provides a lens for magnifying the inductive biases of learners
- small effects for individuals are big effects for groups
47. (No transcript)
48. Iterated concept learning
- Each learner sees examples from a species
- Identifies the species of four amoebae
- Iterated learning is run within-subjects
[Figure: hypotheses and data for the amoeba task]
(Griffiths, Christian, & Kalish, in press)
49. Two positive examples
[Figure: data (d) and hypotheses (h) for two positive examples]
50. Bayesian model (Tenenbaum, 1999; Tenenbaum & Griffiths, 2001)
d = 2 amoebae, h = set of 4 amoebae
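The transcript does not spell the model out; under the standard formulation in the cited papers, in which examples are assumed to be sampled uniformly from the true concept (the size principle), the likelihood and posterior would be:

```latex
P(d \mid h) =
\begin{cases}
  1 / |h|^{\,n} & \text{if every example in } d \text{ belongs to } h,\\[4pt]
  0 & \text{otherwise,}
\end{cases}
\qquad
P(h \mid d) \;\propto\; P(d \mid h)\, P(h),
```

where n is the number of examples (here n = 2) and |h| is the number of amoebae picked out by h (here 4).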
51. Classes of concepts (Shepard, Hovland, & Jenkins, 1961)
[Figure: the six classes of concepts (Class 1-6), defined over the dimensions color, size, and shape]
52. Experiment design (for each subject)
6 iterated learning chains
6 independent learning chains
53. Estimating the prior
[Figure: data (d) and hypotheses (h) used to estimate the prior]
54. Estimating the prior
[Figure: estimated prior probabilities for Classes 1-6, Bayesian model vs. human subjects; values shown: 0.861, 0.087, 0.009, 0.002, 0.013, 0.028; r = 0.952]
55. Two positive examples (n = 20)
[Figure: probability of each hypothesis over iterations, for human learners and the Bayesian model]
56. Two positive examples (n = 20)
[Figure: human learners vs. the Bayesian model]
57. Three positive examples
[Figure: data (d) and hypotheses (h) for three positive examples]
58. Three positive examples (n = 20)
[Figure: probability of each hypothesis over iterations, for human learners and the Bayesian model]
59. Three positive examples (n = 20)
[Figure: human learners vs. the Bayesian model]
60. (No transcript)
61. Serial reproduction (Bartlett, 1932)
- Participants see stimuli, then reproduce them from memory
- Reproductions of one participant are the stimuli for the next
- Stimuli were interesting, rather than controlled
- e.g., "War of the Ghosts"
62. (No transcript)
63. Discovering the biases of models
Generic neural network
64. Discovering the biases of models
EXAM (DeLosh, Busemeyer, & McDaniel, 1997)
65. Discovering the biases of models
POLE (Kalish, Lewandowsky, & Kruschke, 2004)
66. (No transcript)