Title: Revealing inductive biases with Bayesian models
1. Revealing inductive biases with Bayesian models
- Tom Griffiths
- UC Berkeley
with Mike Kalish, Brian Christian, and Steve Lewandowsky
2. Inductive problems
3. Generalization requires induction
- Generalization: predicting the properties of an entity from the observed properties of others
4. What makes a good inductive learner?
- Hypothesis 1: more representational power
- more hypotheses, more complexity
- the spirit of many accounts of learning and development
5. Some hypothesis spaces
- Linear functions
- Quadratic functions
- 8th degree polynomials
6-9. Minimizing squared error (figures)
10. Measuring prediction error (figure)
11. What makes a good inductive learner?
- Hypothesis 1: more representational power
- more hypotheses, more complexity
- the spirit of many accounts of learning and development
- Hypothesis 2: good inductive biases
- constraints on hypotheses that match the environment
12. Outline
- The bias-variance tradeoff
- Bayesian inference and inductive biases
- Revealing inductive biases
- Conclusions
13. Outline
- The bias-variance tradeoff
- Bayesian inference and inductive biases
- Revealing inductive biases
- Conclusions
14. A simple schema for induction
- Data D are n pairs (x, y) generated from a function f
- Hypothesis space of functions, y = g(x)
- Error is E = Σ (y - g(x))²
- Pick the function g that minimizes error on D
- Measure prediction error, averaging over x and y
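A minimal sketch of this schema in Python with NumPy (the particular f(x), noise level, and polynomial degree below are illustrative assumptions, not taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Illustrative quadratic "true" function (the slides use a quadratic f).
    return 1.0 + 2.0 * x - 1.5 * x ** 2

# Data D: n pairs (x, y), with y = f(x) plus Gaussian noise.
n = 10
x = rng.uniform(0.0, 1.0, n)
y = f(x) + rng.normal(0.0, 0.1, n)

# Hypothesis space: polynomials of a fixed degree.
# Pick the g that minimizes squared error on D (ordinary least squares).
degree = 2
g = np.poly1d(np.polyfit(x, y, degree))

# Prediction error: average squared error on fresh (x, y) pairs.
x_test = rng.uniform(0.0, 1.0, 1000)
y_test = f(x_test) + rng.normal(0.0, 0.1, 1000)
print("prediction error:", np.mean((y_test - g(x_test)) ** 2))
```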
15. Bias and variance
- A good learner makes (f(x) - g(x))² small
- g is chosen on the basis of the data D
- Evaluate learners by the average of (f(x) - g(x))² over datasets D generated from f
(Geman, Bienenstock, & Doursat, 1992)
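The decomposition behind these two quantities (the standard one from Geman et al., 1992, with the irreducible noise term omitted), writing g_D for the function chosen given dataset D:

```latex
\mathbb{E}_D\!\left[(f(x) - g_D(x))^2\right]
  = \underbrace{\left(f(x) - \mathbb{E}_D[g_D(x)]\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\left(g_D(x) - \mathbb{E}_D[g_D(x)]\right)^2\right]}_{\text{variance}}
```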
16. Making things more intuitive
- The next few slides were generated by
- choosing a true function f(x)
- generating a number of datasets D from p(x, y), defined by a uniform p(x) and p(y|x) = f(x) plus noise
- finding the function g(x) in the hypothesis space that minimized the error on D
- Comparing the average of g(x) to f(x) reveals the bias
- The spread of g(x) around its average is the variance
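A rough Python reconstruction of this procedure (the quadratic f(x), noise level, and number of datasets are assumptions made for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # Assumed quadratic true function, as in the slides' example.
    return 1.0 + 2.0 * x - 1.5 * x ** 2

def bias_and_variance(degree, n=10, n_datasets=200, noise=0.1):
    """Fit degree-`degree` polynomials to many datasets of size n,
    then measure squared bias and variance of g(x) over a grid of x."""
    x_grid = np.linspace(0.0, 1.0, 50)
    fits = np.empty((n_datasets, x_grid.size))
    for i in range(n_datasets):
        x = rng.uniform(0.0, 1.0, n)
        y = f(x) + rng.normal(0.0, noise, n)
        # (polyfit may warn that the 8th-degree fit is poorly conditioned --
        #  that instability is exactly the variance being illustrated)
        g = np.poly1d(np.polyfit(x, y, degree))
        fits[i] = g(x_grid)
    avg_g = fits.mean(axis=0)                      # the average g(x)
    bias_sq = np.mean((f(x_grid) - avg_g) ** 2)    # squared bias
    variance = np.mean(fits.var(axis=0))           # spread of g(x) around its average
    return bias_sq, variance

for degree in (1, 2, 8):   # linear, quadratic, 8th-degree hypothesis spaces
    print(degree, bias_and_variance(degree))
```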
17. Linear functions (n = 10)
18. Linear functions (n = 10)
[Figure: pink curves are g(x) for each dataset, red is the average g(x), black is f(x); axes are x and y]
19. Quadratic functions (n = 10)
[Figure: pink curves are g(x) for each dataset, red is the average g(x), black is f(x); axes are x and y]
20. 8th-degree polynomials (n = 10)
[Figure: pink curves are g(x) for each dataset, red is the average g(x), black is f(x); axes are x and y]
21. Bias and variance
(for our quadratic f(x), with n = 10)
- Linear functions: high bias, medium variance
- Quadratic functions: low bias, low variance
- 8th-order polynomials: low bias, super-high variance
22. In general
- Larger hypothesis spaces result in higher variance, but low bias across several f(x)
- The bias-variance tradeoff: if we want a learner that has low bias on a range of problems, we pay a price in variance
- This is mainly an issue when n is small
- the regime of much of human learning
23. Quadratic functions (n = 100)
[Figure: pink curves are g(x) for each dataset, red is the average g(x), black is f(x); axes are x and y]
24. 8th-degree polynomials (n = 100)
[Figure: pink curves are g(x) for each dataset, red is the average g(x), black is f(x); axes are x and y]
25. The moral
- General-purpose learning mechanisms do not work well with small amounts of data
- more representational power isn't always better
- To make good predictions from small amounts of data, you need a bias that matches the problem
- these biases are the key to successful induction, and characterize the nature of an inductive learner
- So how can we identify human inductive biases?
26. Outline
- The bias-variance tradeoff
- Bayesian inference and inductive biases
- Revealing inductive biases
- Conclusions
27. Bayesian inference
- Rational procedure for updating beliefs
- Foundation of many learning algorithms
- Lets us make the inductive biases of learners precise
[Image: Reverend Thomas Bayes]
28. Bayes' theorem
h = hypothesis, d = data
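The equation on this slide is an image in the transcript; the standard statement of Bayes' rule over a hypothesis space H is:

```latex
P(h \mid d) \;=\; \frac{P(d \mid h)\, P(h)}{\sum_{h' \in \mathcal{H}} P(d \mid h')\, P(h')}
```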
29. Priors and biases
- Priors indicate the kind of world a learner expects to encounter, guiding their conclusions
- In our function learning example:
- the likelihood gives probability to data that decreases with the sum of squared errors (i.e., a Gaussian)
- the priors are uniform over all functions in hypothesis spaces of different kinds of polynomials
- having more functions corresponds to a belief in a more complex world
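Written out for that example (here σ denotes an assumed noise standard deviation, and H stands for one of the finite hypothesis spaces of polynomials; both symbols are introduced for illustration):

```latex
P(d \mid g) \;\propto\; \exp\!\left(-\frac{1}{2\sigma^{2}} \sum_{i=1}^{n} \bigl(y_i - g(x_i)\bigr)^{2}\right),
\qquad
P(g) \;=\; \frac{1}{|\mathcal{H}|} \quad \text{for every } g \in \mathcal{H}.
```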
30. Outline
- The bias-variance tradeoff
- Bayesian inference and inductive biases
- Revealing inductive biases
- Conclusions
31. Two ways of using Bayesian models
- Specify models that make different assumptions about priors, and compare their fit to human data
- (Anderson & Schooler, 1991; Oaksford & Chater, 1994; Griffiths & Tenenbaum, 2006)
- Design experiments explicitly intended to reveal the priors of Bayesian learners
32. Iterated learning (Kirby, 2001)
What are the consequences of learners learning from other learners?
33. Objects of iterated learning
- Knowledge communicated across generations through the provision of data by learners
- Examples:
- religious concepts
- social norms
- myths and legends
- causal theories
- language
34. Analyzing iterated learning
[Diagram: a chain of learners, each inferring a hypothesis from data and generating data for the next]
P_L(h|d): probability of inferring hypothesis h from data d
P_P(d|h): probability of generating data d from hypothesis h
35. Markov chains
[Diagram: a chain of variables x(1) → x(2) → … → x(t)]
- Transition matrix T = P(x(t+1) | x(t))
- Variables: x(t+1) is independent of the history given x(t)
- Converges to a stationary distribution under easily checked conditions (i.e., if it is ergodic)
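A small Python illustration of these properties (the 3-state transition matrix is made up for the example):

```python
import numpy as np

# Transition matrix T[i, j] = P(x(t+1) = j | x(t) = i).
# All entries are positive, so the chain is ergodic and has a unique
# stationary distribution.
T = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])

# Iterating the chain: the distribution over x(t) converges to the
# stationary distribution regardless of the starting state.
p = np.array([1.0, 0.0, 0.0])   # start in state 0 with certainty
for _ in range(50):
    p = p @ T
print("distribution after 50 steps:", p)

# The stationary distribution is the left eigenvector of T with eigenvalue 1.
vals, vecs = np.linalg.eig(T.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
print("stationary distribution:   ", pi / pi.sum())
```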
36. Analyzing iterated learning
37. Iterated Bayesian learning
[Diagram: a chain of Bayesian learners alternating inference P_L(h|d) and production P_P(d|h)]
38. Stationary distributions
- The Markov chain on h converges to the prior, P(h)
- The Markov chain on d converges to the prior predictive distribution
(Griffiths & Kalish, 2005)
39. Explaining convergence to the prior
[Diagram: a chain of Bayesian learners alternating P_L(h|d) and P_P(d|h)]
- Intuitively: data acts once, the prior acts many times
- Formally: iterated learning with Bayesian agents is a Gibbs sampler on P(d, h)
(Griffiths & Kalish, in press)
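A toy simulation of this result (the prior, likelihood, and space sizes below are invented for illustration): each learner samples a hypothesis from the posterior P(h|d) and then generates data for the next learner from P(d|h). Tracking the sampled hypotheses shows the chain settling on the prior P(h), as the Gibbs-sampler analysis predicts.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy spaces: 3 hypotheses, 4 possible data values (illustrative numbers).
prior = np.array([0.6, 0.3, 0.1])                      # P(h)
likelihood = np.array([[0.7, 0.1, 0.1, 0.1],           # P(d | h); rows sum to 1
                       [0.1, 0.7, 0.1, 0.1],
                       [0.1, 0.1, 0.4, 0.4]])

def learn(d):
    """Sample h from the posterior P(h | d) -- the learner's inference."""
    post = prior * likelihood[:, d]
    post /= post.sum()
    return rng.choice(len(prior), p=post)

def produce(h):
    """Sample d from P(d | h) -- the data passed to the next learner."""
    return rng.choice(likelihood.shape[1], p=likelihood[h])

# Run one long chain of learners (a Gibbs sampler on P(d, h)).
d = 0
hs = []
for _ in range(20000):
    h = learn(d)
    d = produce(h)
    hs.append(h)

print("empirical distribution of h:", np.bincount(hs, minlength=len(prior)) / len(hs))
print("prior P(h):                 ", prior)
```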
40. Revealing inductive biases
- If iterated learning converges to the prior, it might provide a tool for determining the inductive biases of human learners
- We can test this by reproducing iterated learning in the lab, with stimuli for which human biases are well understood
41. Iterated function learning
- Each learner sees a set of (x, y) pairs
- Makes predictions of y for new x values
- Predictions are the data for the next learner
(Kalish, Griffiths, & Lewandowsky, in press)
42. Function learning experiments
Examine iterated learning with different initial data
43. Initial data
[Figure: functions produced at iterations 1-9, for each set of initial data]
44. Identifying inductive biases
- Formal analysis suggests that iterated learning provides a way to determine inductive biases
- Experiments with human learners support this idea
- when stimuli for which biases are well understood are used, those biases are revealed by iterated learning
- What do inductive biases look like in other cases?
- continuous categories
- causal structure
- word learning
- language learning
45. Outline
- The bias-variance tradeoff
- Bayesian inference and inductive biases
- Revealing inductive biases
- Conclusions
46. Conclusions
- Solving inductive problems and forming good generalizations requires good inductive biases
- Bayesian inference provides a way to make assumptions about the biases of learners explicit
- Two ways to identify human inductive biases:
- compare Bayesian models assuming different priors
- design tasks to extract biases from Bayesian learners
- Iterated learning provides a lens for magnifying the inductive biases of learners
- small effects for individuals are big effects for groups
47. (No transcript)
48. Iterated concept learning
- Each learner sees examples from a species
- Identifies the species of four amoebae
- Iterated learning is run within-subjects
[Figure: hypotheses and data for the amoeba task]
(Griffiths, Christian, & Kalish, in press)
49. Two positive examples
[Figure: data (d) and hypotheses (h) for two positive examples]
50. Bayesian model (Tenenbaum, 1999; Tenenbaum & Griffiths, 2001)
d = 2 amoebae, h = set of 4 amoebae
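The transcript does not spell the model out; under the standard formulation in the cited papers, in which examples are assumed to be sampled uniformly from the true concept (the size principle), the likelihood and posterior would be:

```latex
P(d \mid h) =
\begin{cases}
  1 / |h|^{\,n} & \text{if every example in } d \text{ belongs to } h,\\[4pt]
  0 & \text{otherwise,}
\end{cases}
\qquad
P(h \mid d) \;\propto\; P(d \mid h)\, P(h),
```

where n is the number of examples (here n = 2) and |h| is the number of amoebae picked out by h (here 4).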
51. Classes of concepts (Shepard, Hovland, & Jenkins, 1961)
[Figure: the six classes of concepts (Class 1-6), defined over the dimensions color, size, and shape]
52. Experiment design (for each subject)
6 iterated learning chains
6 independent learning chains
53. Estimating the prior
[Figure: data (d) and hypotheses (h) used to estimate the prior]
54. Estimating the prior
[Figure: estimated prior probabilities for Classes 1-6, Bayesian model vs. human subjects; values shown: 0.861, 0.087, 0.009, 0.002, 0.013, 0.028; r = 0.952]
55. Two positive examples (n = 20)
[Figure: probability of each hypothesis over iterations, for human learners and the Bayesian model]
56. Two positive examples (n = 20)
[Figure: human learners vs. the Bayesian model]
57. Three positive examples
[Figure: data (d) and hypotheses (h) for three positive examples]
58. Three positive examples (n = 20)
[Figure: probability of each hypothesis over iterations, for human learners and the Bayesian model]
59. Three positive examples (n = 20)
[Figure: human learners vs. the Bayesian model]
60. (No transcript)
61. Serial reproduction (Bartlett, 1932)
- Participants see stimuli, then reproduce them from memory
- Reproductions of one participant are the stimuli for the next
- Stimuli were interesting, rather than controlled
- e.g., "War of the Ghosts"
62. (No transcript)
63. Discovering the biases of models
Generic neural network
64. Discovering the biases of models
EXAM (DeLosh, Busemeyer, & McDaniel, 1997)
65. Discovering the biases of models
POLE (Kalish, Lewandowsky, & Kruschke, 2004)
66. (No transcript)