1
Revealing inductive biases with Bayesian models
  • Tom Griffiths
  • UC Berkeley

with Mike Kalish, Brian Christian, and Steve Lewandowsky
2
Inductive problems
3
Generalization requires induction
  • Generalization: predicting the properties of
    an entity from the observed properties of others

4
What makes a good inductive learner?
  • Hypothesis 1: more representational power
  • more hypotheses, more complexity
  • the spirit of many accounts of learning and
    development

5
Some hypothesis spaces
  • Linear functions
  • Quadratic functions
  • 8th-degree polynomials

6
Minimizing squared error
7
Minimizing squared error
8
Minimizing squared error
9
Minimizing squared error
10
Measuring prediction error
11
What makes a good inductive learner?
  • Hypothesis 1: more representational power
  • more hypotheses, more complexity
  • the spirit of many accounts of learning and
    development
  • Hypothesis 2: good inductive biases
  • constraints on hypotheses that match the
    environment

12
Outline
  • The bias-variance tradeoff
  • Bayesian inference and inductive biases
  • Revealing inductive biases
  • Conclusions

13
Outline
  • The bias-variance tradeoff
  • Bayesian inference and inductive biases
  • Revealing inductive biases
  • Conclusions

14
A simple schema for induction
  • Data D are n pairs (x,y) generated from a
    function f
  • Hypothesis space of functions, y = g(x)
  • Error is E[(y - g(x))²]
  • Pick the function g that minimizes the error on D
  • Measure prediction error, averaging over x and y
    (see the sketch below)
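A minimal sketch of this schema in Python (the quadratic true function, the noise level, and the sample sizes are illustrative assumptions, not values from the talk): fit each hypothesis space to a small dataset by least squares, then measure prediction error on fresh data.

import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # assumed true function (quadratic), chosen for illustration
    return 1.5 * x**2 - x

def make_data(n, noise=0.5):
    # n pairs (x, y) with y = f(x) plus Gaussian noise
    x = rng.uniform(-1, 1, n)
    return x, f(x) + rng.normal(0, noise, n)

x, y = make_data(n=10)                 # training data D
x_test, y_test = make_data(n=1000)     # fresh data for prediction error

for degree in (1, 2, 8):               # linear, quadratic, 8th-degree spaces
    coeffs = np.polyfit(x, y, degree)  # g minimizing squared error on D
    pred_err = np.mean((y_test - np.polyval(coeffs, x_test)) ** 2)
    print(f"degree {degree}: prediction error {pred_err:.3f}")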

15
Bias and variance
  • A good learner makes (f(x) - g(x))² small
  • g is chosen on the basis of the data D
  • Evaluate learners by the average of (f(x) -
    g(x))² over data D generated from f

(Geman, Bienenstock, & Doursat, 1992)
16
Making things more intuitive
  • The next few slides were generated by
  • choosing a true function f(x)
  • generating a number of datasets D from p(x,y)
    defined by a uniform p(x) and p(y|x) given by
    f(x) plus noise
  • finding the function g(x) in the hypothesis space
    that minimized the error on D
  • Comparing the average of g(x) to f(x) reveals the
    bias
  • The spread of g(x) around the average is the
    variance (see the simulation sketch below)
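A small simulation in the spirit of this procedure (the true function, noise level, and number of datasets are illustrative assumptions): it estimates the squared bias and the variance of each hypothesis space around the average fit.

import numpy as np

rng = np.random.default_rng(1)
x_grid = np.linspace(-1, 1, 50)

def f(x):
    # assumed quadratic true function, for illustration
    return 1.5 * x**2 - x

def bias_variance(degree, n=10, n_datasets=200, noise=0.5):
    # Fit g(x) to many datasets D and summarize the fits on a grid.
    fits = []
    for _ in range(n_datasets):
        x = rng.uniform(-1, 1, n)
        y = f(x) + rng.normal(0, noise, n)
        fits.append(np.polyval(np.polyfit(x, y, degree), x_grid))
    fits = np.array(fits)
    bias2 = np.mean((fits.mean(axis=0) - f(x_grid)) ** 2)  # squared bias
    variance = np.mean(fits.var(axis=0))                   # variance
    return bias2, variance

for degree in (1, 2, 8):
    b2, v = bias_variance(degree)
    print(f"degree {degree}: bias² {b2:.3f}, variance {v:.3f}")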

17
Linear functions (n = 10)
18
Linear functions (n = 10)
[Plot of y against x: pink curves are g(x) for each
dataset, red is the average g(x), black is the true
f(x)]
19
Quadratic functions (n = 10)
[Plot of y against x: pink curves are g(x) for each
dataset, red is the average g(x), black is the true
f(x)]
20
8th-degree polynomials (n = 10)
[Plot of y against x: pink curves are g(x) for each
dataset, red is the average g(x), black is the true
f(x)]
21
Bias and variance
(for our quadratic f(x), with n = 10)
Linear functions: high bias, medium variance
Quadratic functions: low bias, low variance
8th-degree polynomials: low bias, super-high variance
22
In general
  • Larger hypothesis spaces result in higher
    variance, but low bias across many possible f(x)
  • The bias-variance tradeoff
  • if we want a learner that has low bias on a range
    of problems, we pay a price in variance
  • This is mainly an issue when n is small
  • the regime of much of human learning

23
Quadratic functions (n = 100)
[Plot of y against x: pink curves are g(x) for each
dataset, red is the average g(x), black is the true
f(x)]
24
8th-degree polynomials (n = 100)
[Plot of y against x: pink curves are g(x) for each
dataset, red is the average g(x), black is the true
f(x)]
25
The moral
  • General-purpose learning mechanisms do not work
    well with small amounts of data
  • more representational power isn't always better
  • To make good predictions from small amounts of
    data, you need a bias that matches the problem
  • these biases are the key to successful induction,
    and characterize the nature of an inductive
    learner
  • So how can we identify human inductive biases?

26
Outline
  • The bias-variance tradeoff
  • Bayesian inference and inductive biases
  • Revealing inductive biases
  • Conclusions

27
Bayesian inference
  • Rational procedure for updating beliefs
  • Foundation of many learning algorithms
  • Lets us make the inductive biases of learners
    precise

Reverend Thomas Bayes
28
Bayes' theorem
h = hypothesis, d = data
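The slide image presumably showed the standard form of the theorem; for a discrete hypothesis space it reads:

P(h \mid d) = \frac{P(d \mid h)\, P(h)}{\sum_{h'} P(d \mid h')\, P(h')}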
29
Priors and biases
  • Priors indicate the kind of world a learner
    expects to encounter, guiding their conclusions
  • In our function learning example
  • the likelihood gives probability to data that
    decreases with the sum of squared errors (i.e., a
    Gaussian, as spelled out below)
  • the priors are uniform over all functions in
    hypothesis spaces of different kinds of
    polynomials
  • having more functions corresponds to a belief in
    a more complex world
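Spelling out that first point (a standard identity; σ² is an assumed noise variance): with a Gaussian likelihood and a uniform prior over the hypothesis space, the most probable function is exactly the least-squares fit,

P(D \mid g) \;\propto\; \exp\!\left(-\frac{1}{2\sigma^{2}} \sum_{i=1}^{n} \bigl(y_i - g(x_i)\bigr)^{2}\right)
\qquad\Longrightarrow\qquad
\arg\max_{g} P(g \mid D) \;=\; \arg\min_{g} \sum_{i=1}^{n} \bigl(y_i - g(x_i)\bigr)^{2}.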

30
Outline
  • The bias-variance tradeoff
  • Bayesian inference and inductive biases
  • Revealing inductive biases
  • Conclusions

31
Two ways of using Bayesian models
  • Specify models that make different assumptions
    about priors, and compare their fit to human data
  • (Anderson & Schooler, 1991; Oaksford & Chater,
    1994; Griffiths & Tenenbaum, 2006)
  • Design experiments explicitly intended to reveal
    the priors of Bayesian learners

32
Iterated learning (Kirby, 2001)
What are the consequences of learners learning
from other learners?
33
Objects of iterated learning
  • Knowledge communicated across generations through
    provision of data by learners
  • Examples
  • religious concepts
  • social norms
  • myths and legends
  • causal theories
  • language

34
Analyzing iterated learning
[Diagram: a chain of learners, each inferring a
hypothesis from the previous learner's data via
P_L(h|d) and generating data for the next learner
via P_P(d|h)]
P_L(h|d): probability of inferring hypothesis h
from data d
P_P(d|h): probability of generating data d from
hypothesis h
35
Markov chains
[Diagram: a chain of variables x(1) → x(2) → … →
x(t) → x(t+1) → …]
Transition matrix T = P(x(t+1) | x(t))
  • Variables: x(t+1) is independent of the history
    given x(t)
  • Converges to a stationary distribution under
    easily checked conditions (i.e., if it is
    ergodic; see the sketch below)
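A quick numerical check of this convergence, using an arbitrary illustrative 3-state transition matrix (not one from the talk): the long-run visit frequencies of a simulated chain match the stationary distribution computed from T.

import numpy as np

rng = np.random.default_rng(2)

# Arbitrary 3-state transition matrix; each row sums to 1.
T = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# Simulate the chain and count how often each state is visited.
x, counts = 0, np.zeros(3)
for _ in range(100_000):
    x = rng.choice(3, p=T[x])
    counts[x] += 1

# Stationary distribution: left eigenvector of T with eigenvalue 1.
vals, vecs = np.linalg.eig(T.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi /= pi.sum()

print("empirical visit frequencies:", counts / counts.sum())
print("stationary distribution:   ", pi)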

36
Analyzing iterated learning
37
Iterated Bayesian learning
[Diagram: the iterated learning chain with Bayesian
learners, alternating P_L(h|d) and P_P(d|h)]
38
Stationary distributions
  • Markov chain on h converges to the prior, P(h)
  • Markov chain on d converges to the prior
    predictive distribution

(Griffiths & Kalish, 2005)
39
Explaining convergence to the prior
[Diagram: the iterated learning chain, alternating
P_L(h|d) and P_P(d|h)]
  • Intuitively: the data act once, the prior acts
    many times
  • Formally: iterated learning with Bayesian agents
    is a Gibbs sampler on P(d,h) (see the simulation
    sketch below)

(Griffiths & Kalish, in press)
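A toy simulation of this result (the two-hypothesis prior and likelihood values below are arbitrary illustrative numbers, not the experimental setup): each generation samples a hypothesis from the posterior given the previous generation's data, and the distribution of hypotheses across generations approaches the prior.

import numpy as np

rng = np.random.default_rng(3)

prior = np.array([0.8, 0.2])       # P(h) over two hypotheses (assumed values)
like = np.array([[0.9, 0.1],       # P(d | h0) over two possible data points
                 [0.3, 0.7]])      # P(d | h1)

def posterior(d):
    # Bayes' rule: P(h | d) ∝ P(d | h) P(h)
    p = prior * like[:, d]
    return p / p.sum()

h, visits = 0, np.zeros(2)
for _ in range(200_000):               # one step per generation of learners
    d = rng.choice(2, p=like[h])       # current learner generates data from P(d|h)
    h = rng.choice(2, p=posterior(d))  # next learner samples h from P(h|d)
    visits[h] += 1

print("distribution of h across generations:", visits / visits.sum())
print("prior:                                ", prior)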
40
Revealing inductive biases
  • If iterated learning converges to the prior, it
    might provide a tool for determining the
    inductive biases of human learners
  • We can test this by reproducing iterated learning
    in the lab, with stimuli for which human biases
    are well understood

41
Iterated function learning
  • Each learner sees a set of (x,y) pairs
  • Makes predictions of y for new x values
  • Predictions are data for the next learner

(Kalish, Griffiths, & Lewandowsky, in press)
42
Function learning experiments
Examine iterated learning with different initial
data
43
[Figure: the functions produced at iterations 1-9
of each chain, one row per set of initial data]
44
Identifying inductive biases
  • Formal analysis suggests that iterated learning
    provides a way to determine inductive biases
  • Experiments with human learners support this idea
  • when stimuli for which biases are well understood
    are used, those biases are revealed by iterated
    learning
  • What do inductive biases look like in other
    cases?
  • continuous categories
  • causal structure
  • word learning
  • language learning

45
Outline
  • The bias-variance tradeoff
  • Bayesian inference and inductive biases
  • Revealing inductive biases
  • Conclusions

46
Conclusions
  • Solving inductive problems and forming good
    generalizations requires good inductive biases
  • Bayesian inference provides a way to make
    assumptions about the biases of learners explicit
  • Two ways to identify human inductive biases
  • compare Bayesian models assuming different priors
  • design tasks to extract biases from Bayesian
    learners
  • Iterated learning provides a lens for magnifying
    the inductive biases of learners
  • small effects for individuals are big effects for
    groups

47
(No Transcript)
48
Iterated concept learning
  • Each learner sees examples from a species
  • Identifies species of four amoebae
  • Iterated learning is run within-subjects

[Figure: the hypotheses (species of four amoebae)
and the data (observed amoebae)]
(Griffiths, Christian, & Kalish, in press)
49
Two positive examples
[Figure: the data (d), two positive examples, and
the hypotheses (h), candidate sets of four amoebae]
50
Bayesian model (Tenenbaum, 1999; Tenenbaum &
Griffiths, 2001)
d = 2 amoebae, h = set of 4 amoebae
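A sketch of the likelihood presumably at work here, following the strong-sampling "size principle" of Tenenbaum & Griffiths (2001), with |h| = 4 for every hypothesis in this task:

P(d \mid h) =
\begin{cases}
1/|h|^{n} & \text{if all } n \text{ examples in } d \text{ belong to } h,\\
0 & \text{otherwise,}
\end{cases}
\qquad
P(h \mid d) \propto P(d \mid h)\, P(h).

Since |h| is the same for every hypothesis here, the posterior simply renormalizes the prior over the hypotheses consistent with the observed examples.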
51
Classes of concepts (Shepard, Hovland, & Jenkins,
1961)
[Figure: stimuli varying on three binary dimensions
(color, size, shape), and the six classes of
concepts, Class 1 through Class 6]
52
Experiment design (for each subject)
6 iterated learning chains
6 independent learning chains
53
Estimating the prior
[Figure: the data (d) and hypotheses (h) used to
estimate the prior]
54
Estimating the prior
Estimated prior probability for each class:
  Class 1: 0.861
  Class 2: 0.087
  Class 3: 0.009
  Class 4: 0.002
  Class 5: 0.013
  Class 6: 0.028
Bayesian model vs. human subjects: r = 0.952
55
Two positive examples (n = 20)
[Figure: probability plotted against iteration, for
human learners and the Bayesian model]
56
Two positive examples (n = 20)
[Figure: probabilities for human learners and the
Bayesian model]
57
Three positive examples
[Figure: the data (d), three positive examples, and
the hypotheses (h), candidate sets of four amoebae]
58
Three positive examples (n = 20)
[Figure: probability plotted against iteration, for
human learners and the Bayesian model]
59
Three positive examples (n = 20)
[Figure: probabilities for human learners and the
Bayesian model]
60
(No Transcript)
61
Serial reproduction (Bartlett, 1932)
  • Participants see stimuli, then reproduce them
    from memory
  • Reproductions of one participant are stimuli for
    the next
  • Stimuli were interesting, rather than controlled
  • e.g., War of the Ghosts

62
(No Transcript)
63
Discovering the biases of models
Generic neural network
64
Discovering the biases of models
EXAM (DeLosh, Busemeyer, & McDaniel, 1997)
65
Discovering the biases of models
POLE (Kalish, Lewandowsky, & Kruschke, 2004)
66
(No Transcript)