Title: Bayesian approaches to cognitive sciences
1. Bayesian approaches to cognitive sciences
2.
- Word learning
- Bayesian property induction
- Theory-based causal inference
3. Word Learning
4. Word Learning
- Some constraints on word learning:
- Very few examples are required
- Learning is possible with only positive examples
- Word meanings overlap
- Learning is often graded
5. Word Learning
- Given a few instances of a particular word, say dog, how do we generalize to new instances?
- Hypothesis elimination: use deductive logic (along with prior knowledge) to eliminate hypotheses that are inconsistent with the use of the word.
6. Word Learning
- Some constraints on word learning:
- Very few examples are required
- Learning is possible with only positive examples
- Word meanings overlap
- Learning is often graded
7. Word Learning
- Given a few instances of a particular word, say dog, how do we generalize to new instances?
- Connectionist (associative) approach: compute the probability of co-occurrence of object features and the corresponding word.
8. Word Learning
- Some constraints on word learning:
- Very few examples are required
- Learning is possible with only positive examples
- Word meanings overlap
- Learning is often graded
9. Word Learning
- Alternative: rational statistical inference with a structured hypothesis space.
- Suppose you see a Dalmatian and you hear "fep". Does fep refer to all dogs or just to Dalmatians? What if you hear 3 more examples, all corresponding to Dalmatians? Then it should be clear that feps are Dalmatians, because this observation would be a suspicious coincidence if fep referred to all dogs.
- Therefore, logic is not enough; you also need probabilities. However, you don't need that many examples. And co-occurrence frequencies are not enough (in our example, fep is associated 100% of the time with Dalmatians whether you see one or three examples).
- We need structured prior knowledge.
10. Word Learning
- Suppose objects are organized in taxonomic trees.
[Tree: animals ⊃ dogs ⊃ Dalmatians]
11. Word Learning
- We're given N examples of a word C. The goal of learning is to determine whether C corresponds to the subordinate, basic, or superordinate level. The level in the taxonomy is what we mean by "meaning".
- h: hypothesis, i.e., word meaning.
- The set of possible hypotheses is strongly constrained by the tree structure.
12. Word Learning
- T: tree structure
- H: hypotheses
[Tree: animals ⊃ dogs ⊃ Dalmatians]
13. Word Learning
- Inference just follows Bayes' rule.
- h: hypothesis, e.g., is this a basic-level word (dog)?
- x: data, e.g., a set of labeled images of animals
- T: type of representation being assumed (e.g., tree structure)
14. Word Learning
- Inference just follows Bayes' rule.
- Prior: the prior is strongly constrained by the tree structure. Only some hypotheses are possible (the ones corresponding to the hierarchical levels in the tree).
- Likelihood function: the probability of the data.
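In the notation above, Bayes' rule reads:

$$p(h \mid x, T) = \frac{p(x \mid h, T)\, p(h \mid T)}{\sum_{h' \in H} p(x \mid h', T)\, p(h' \mid T)}$$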
15. Word Learning
- Likelihood functions and the size principle.
- Assume you're given n examples of a particular group (e.g., 3 examples of dogs, or 3 examples of Dalmatians).
16. Word Learning
- Let's assume there are 100 dogs in the world, 10 of them Dalmatians. If examples are drawn randomly with replacement from those pools, we have:
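$$p(X \mid h = \text{dogs}) = \left(\tfrac{1}{100}\right)^n, \qquad p(X \mid h = \text{Dalmatians}) = \left(\tfrac{1}{10}\right)^n$$

so n Dalmatian examples are 10^n times more likely under the subordinate hypothesis.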
17. Word Learning
- More generally, the probability of getting n examples from a particular hypothesis h is given by the formula below.
- This is known as the size principle: multiple examples drawn from smaller sets are more likely.
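In symbols, assuming examples are sampled uniformly with replacement from the extension of h:

$$p(X \mid h) = \left(\frac{1}{\text{size}(h)}\right)^{n} \quad \text{if all } n \text{ examples lie in } h, \text{ and } 0 \text{ otherwise.}$$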
18. Word Learning
- Let's say you're given 1 example, a Dalmatian.
- Conclusion: it's very likely to be a subordinate word.
19. Word Learning
- Let's say you're given 4 examples, all Dalmatians.
- Conclusion: it's a subordinate word (Dalmatian) with near certainty, or it's a very suspicious coincidence!
20. Word Learning
- Let's say you're given 5 examples: 2 Dalmatians and 3 German Shepherds.
- Conclusion: it's a basic-level word (dog) with near certainty.
Probability that the images got mislabeled; assumed to be very small.
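A minimal sketch of the computation on slides 15-20, assuming a flat prior over the three taxonomy levels; the set sizes for dogs and Dalmatians come from slide 16, while the size of the superordinate set (1000 animals) is an illustrative assumption:

```python
# Size-principle posterior over word meanings in a three-level taxonomy.
sizes = {"dalmatian": 10, "dog": 100, "animal": 1000}
contains_dalmatian = {"dalmatian", "dog", "animal"}  # hypotheses consistent with a Dalmatian example
prior = {h: 1.0 / len(sizes) for h in sizes}  # flat prior over the three levels

def posterior(n):
    """Posterior over word meanings after n examples, all Dalmatians."""
    # Size principle: p(X|h) = (1/|h|)^n if h contains all the examples, else 0.
    lik = {h: (1.0 / sizes[h]) ** n if h in contains_dalmatian else 0.0
           for h in sizes}
    z = sum(lik[h] * prior[h] for h in sizes)
    return {h: lik[h] * prior[h] / z for h in sizes}

print(posterior(1))  # 1 Dalmatian: subordinate favored (~0.90)
print(posterior(4))  # 4 Dalmatians: subordinate near-certain (~0.9999)
```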
21. Word Learning
- Subject is shown one Dalmatian and told it's a fep.
- Subordinate match: subject is shown a new Dalmatian and asked if it's a fep.
- Basic match: subject is shown a new dog (non-Dalmatian) and asked whether it's a fep.
22. Word Learning
As more subordinate examples are collected, the probabilities for the basic and superordinate levels go down.
- Subject is shown three Dalmatians and told they are feps.
- Subordinate match: subject is shown a new Dalmatian and asked if it's a fep.
- Basic match: subject is shown a new dog (non-Dalmatian) and asked whether it's a fep.
23. Word Learning
As more basic-level examples are collected, the probability for the basic level goes up.
With only one example, the subordinate level is favored.
24. Word Learning
The model produces similar behavior.
27. Bayesian property induction
28. Bayesian property induction
- Given that gorillas and chimpanzees have gene X, do macaques have gene X?
- If cheetahs and giraffes carry disease X, do polar bears carry disease X?
- Classic approach: Boolean logic.
- Problem: such questions are inherently probabilistic. Answering yes or no would be very misleading.
- Fuzzy logic?
29. Bayesian property induction
- C: concept (e.g., "mammals that can get disease X"), i.e., a set of animals defined by a particular property.
- H: hypothesis space, the space of all possible concepts, i.e., all possible sets of animals. With 10 animals, H contains 2^10 sets.
- h: a particular set. Note that there is an h for which h = C.
- y: a particular statement ("dolphins can get disease X"), that is, a claim that an item belongs to a hypothesis.
- X: a set of observations drawn from the concept C.
30. Bayesian property induction
- The goal of inference is to determine whether y belongs to a concept C for which we have samples X.
- E.g., given that gorillas and chimpanzees have gene X, do macaques have gene X?
- X = {gorillas, chimpanzees}
- y = macaques
- C = the set of all animals with gene X.
- Note that we don't know the full list of animals in set C. C is a hidden (or latent) variable. We need to integrate it out.
31. Bayesian property induction
- Animals: buffalo, zebra, giraffe, seal = {b, z, g, s}
- X = {b, z} have property a
- y = g: do giraffes have property a?
Probability that g has property a, given that b and z have property a.
Probability that g has property a, given that h contains g: it must equal 1.
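In symbols, integrating out the hidden concept (a sketch consistent with the definitions on slides 29-31):

$$p(y \in C \mid X) = \sum_{h \in H} p(y \in h \mid h)\, p(h \mid X) = \sum_{h \,:\, y \in h} p(h \mid X), \qquad p(h \mid X) \propto p(X \mid h)\, p(h)$$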
32. Bayesian property induction
Warning: this is the probability that X will be observed given h. It is not the probability that the members of X belong to h.
33. Bayesian property induction
- The likelihood function
- Animals: buffalo, zebra, giraffe, seal = {b, z, g, s}
- If h = {b, z}:
- p(X = {b} | h) = 0.5
- p(X = {b, z} | h) = 0.5 × 0.5 = 0.25
- p(X = {g} | h) = 0. This rules out all hypotheses that do not contain all of the observations.
- If h = {b, z, g} have property a:
- p(X = {b, z} | h) = (1/3)² ≈ 0.11. The larger the set, the smaller the likelihood: Occam's razor.
34. Bayesian property induction
- The likelihood function
- Nonzero only if X is a subset of h. Note that sets that contain none or only some of the elements of X, but not all of them, are ruled out. Also, data are less likely to come from large sets (Occam's razor), as sketched below.
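A minimal sketch of this size-principle likelihood for the four-animal example, assuming examples are drawn uniformly with replacement from h:

```python
from itertools import chain, combinations

animals = ["b", "z", "g", "s"]  # buffalo, zebra, giraffe, seal

def likelihood(X, h):
    """p(X | h): examples drawn uniformly with replacement from h;
    zero unless every observation lies inside h."""
    if not set(X) <= set(h):
        return 0.0
    return (1.0 / len(h)) ** len(X)

# The full hypothesis space: all 2^4 = 16 subsets of the animals.
H = [set(c) for c in chain.from_iterable(
        combinations(animals, k) for k in range(len(animals) + 1))]

X = ["b", "z"]
print(likelihood(X, {"b", "z"}))       # 0.25: smallest consistent set
print(likelihood(X, {"b", "z", "g"}))  # ~0.11: larger set, Occam's razor
print(likelihood(X, {"g", "s"}))       # 0.0: ruled out (misses b and z)
```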
35. Bayesian property induction
- The prior: the prior should embed knowledge of the domain.
- Naive approach: if we have 10 animals and we consider all possible sets, we end up with 2^10 = 1024 sets. A flat prior over these would yield a prior of 1/1024 for each set.
36. Bayesian property induction
- Bayesian taxonomic prior
- Note: only 19 sets have a nonzero prior. 2^10 - 19 = 1005 sets have a prior of zero. A HUGE constraint!
37. Bayesian property induction
- The taxonomic prior is not enough. Why?
- 1. Seals and squirrels catch disease X, so horses are also susceptible.
- 2. Seals and cows catch disease X, so horses are also susceptible.
- Most people say that statement 2 is stronger.
38. Bayesian property induction
- 1. Seals and squirrels catch disease X, so horses are also susceptible.
- 2. Seals and cows catch disease X, so horses are also susceptible.
[Figure: two taxonomic trees; in each, the only hypothesis that can contain all three animals is marked]
39. Bayesian property induction
- This approach does not distinguish the two statements: in both cases, the only hypothesis that can contain all three animals is the same.
40. Bayesian property induction
41. Bayesian property induction
- Evolutionary Bayes: all 1024 sets of animals are possible, but they differ in their prior. Sets that contain animals that are nearby in the tree are more likely.
42. Bayesian property induction
[Figure: tree with mutation probabilities p and p² marked on branches]
43. Bayesian property induction
[Figure: a likely set under the evolutionary prior (p · p²)]
44. Bayesian property induction
[Figure: an unlikely set under the evolutionary prior (p²)]
45. Bayesian property induction
- 1. Seals and squirrels catch disease X, so horses are also susceptible.
- 2. Seals and cows catch disease X, so horses are also susceptible.
- Under the evolutionary prior, there are more likely scenarios compatible with the second statement.
46. Bayesian property induction
- The evolutionary prior explains the data better than any other model (but not by much).
47. Other priors
48. How to learn the structure of the prior
- Syntactic rules for growing graphs
51. Theory-based causal inference
52. Theory-based causal inference
- Can we use this framework to infer causality?
53. Theory-based causal inference
- A blicket detector activates when a blicket is placed onto it.
- Observation 1: B1 and B2 together, detector on.
- Most kids say that B1 and B2 are blickets.
- Observation 2: B1 alone, detector on.
- All kids say B1 is a blicket but not B2.
- This is known as extinction, or explaining away.
54. Theory-based causal inference
- Impossible to capture with the usual learning algorithms, because there aren't enough trials to learn all the probabilities involved.
- Simple reasoning could be used, along with Occam's razor (e.g., B1 alone is enough to explain all the data), but it's hard to formalize. (How do we define Occam's razor?)
55. Theory-based causal inference
- Alternative: assume the data were generated by a causal process.
- We observed two trials: d1 = {e = 1, x1 = 1, x2 = 1} and d2 = {e = 1, x1 = 1, x2 = 0}. What kind of Bayesian net can account for these data?
- There are only four possible networks.
56. Theory-based causal inference
- Which network is most likely to explain the data?
- Bayesian approach: compute the posterior over networks given the data.
- If we assume that the probability of any object being a blicket is r, the prior over Bayesian nets is given by the formula below.
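Writing h_jk for the network in which x1 is a blicket iff j = 1 and x2 is a blicket iff k = 1, the prior factorizes as:

$$p(h_{jk}) = r^{\,j+k}\,(1-r)^{\,2-(j+k)}$$

so $p(h_{00}) = (1-r)^2$, $p(h_{10}) = p(h_{01}) = r(1-r)$, and $p(h_{11}) = r^2$.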
57. Theory-based causal inference
- Let's consider what happens after we observe d1 = {e = 1, x1 = 1, x2 = 1}.
- If we assume the machine does not go off by itself (p(e = 1 | no blickets) = 0), we have:
58. Theory-based causal inference
- Let's consider what happens after we observe d1 = {e = 1, x1 = 1, x2 = 1}.
- For the other nets:
59. Theory-based causal inference
- Therefore, we're left with three networks, and for each of them we have the posterior below.
- Assuming blickets are rare (r < 0.5), the most likely explanations are the ones for which only one object is a blicket (h10 and h01). Therefore, object 1 or object 2 is a blicket (but it's unlikely that both are blickets!)
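As a sketch: d1 rules out only h00, so the posterior is proportional to the prior on the remaining three networks:

$$p(h_{10} \mid d_1) = p(h_{01} \mid d_1) = \frac{r(1-r)}{2r(1-r) + r^2}, \qquad p(h_{11} \mid d_1) = \frac{r^2}{2r(1-r) + r^2}$$

With $r < 0.5$, $r(1-r) > r^2$, so the single-blicket networks dominate.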
60. Theory-based causal inference
- We now observe d2 = {e = 1, x1 = 1, x2 = 0}.
- Again, if we assume the machine does not go off by itself (p(e = 1 | no blickets) = 0), we have:
We're assuming that the machine does not go off if there is no blicket.
61. Theory-based causal inference
- We observed two trials: d1 = {e = 1, x1 = 1, x2 = 1} and d2 = {e = 1, x1 = 1, x2 = 0}. h00 and h01 are inconsistent with these data, so we're left with the other two.
62. Theory-based causal inference
- We observed two trials, d1 = {e = 1, x1 = 1, x2 = 1} and d2 = {e = 1, x1 = 1, x2 = 0}, and we're left with two networks.
- Assuming blickets are rare (r < 0.5), the network in which only one object is a blicket (h10) is the most likely explanation.
63. Theory-based causal inference
- But what happens to the probability that X2 is a blicket?
- To compute this we need to compute the sum below.
Either the x2 link is in network hjk, or it's not.
Hypotheses for which this link exists must have a 1 here.
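In symbols, a sketch of that sum (the link indicator is 1 exactly when k = 1):

$$p(x_2 \to e \mid d) = \sum_{j,k} p(x_2 \to e \mid h_{jk})\, p(h_{jk} \mid d) = p(h_{01} \mid d) + p(h_{11} \mid d)$$

With $r = 1/3$ and $d = d_1$, this gives $1/(2 - r) = 3/5$.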
64. Theory-based causal inference
- But what happens to the probability that X2 is a blicket?
(Data suggest that r = 1/3.)
65. Theory-based causal inference
- Probability that X2 is a blicket after the second observation:
- Therefore, the probability that X2 is a blicket went down after the second observation (from 3/5 to 1/3), which is consistent with the kids' reports. Occam's razor comes from assuming r < 0.5.
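A minimal sketch of the whole computation, assuming a deterministic detector (it fires iff a blicket is placed on it) and r = 1/3 as on slide 64:

```python
from itertools import product

R = 1 / 3  # prior probability that any object is a blicket

def prior(h):
    """p(h_jk) = r^(j+k) (1-r)^(2-(j+k)) for h = (j, k)."""
    n = sum(h)
    return R ** n * (1 - R) ** (2 - n)

def lik(trial, h):
    """Deterministic detector: e = 1 iff a placed object is a blicket."""
    e, x1, x2 = trial
    fires = (x1 and h[0]) or (x2 and h[1])
    return 1.0 if e == int(bool(fires)) else 0.0

def posterior(trials):
    hs = list(product([0, 1], repeat=2))  # h00, h01, h10, h11
    w = {}
    for h in hs:
        p = prior(h)
        for t in trials:
            p *= lik(t, h)
        w[h] = p
    z = sum(w.values())
    return {h: w[h] / z for h in hs}

d1 = (1, 1, 1)  # detector on, both objects placed
d2 = (1, 1, 0)  # detector on, object 1 alone

for data in ([d1], [d1, d2]):
    post = posterior(data)
    p_x2 = post[(0, 1)] + post[(1, 1)]  # networks with the x2 -> e link
    print(f"{len(data)} trial(s): p(x2 is a blicket) = {p_x2:.3f}")
# Prints 0.600 after d1, then 0.333 after d2: explaining away.
```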
66. Theory-based causal inference
- This approach can be generalized to much more complicated generative models.