1
Bayesian approaches to cognitive sciences
2
  • Word learning
  • Bayesian property induction
  • Theory-based causal inference

3
Word Learning
4
Word Learning
  • Some constraints on word learning
  • Very few examples required
  • Learning is possible with only positive examples
  • Word meanings overlap
  • Learning is often graded

5
Word Learning
  • Given a few instances of a particular word, say
    "dog", how do we generalize to new instances?
  • Hypothesis elimination: use deductive logic
    (along with prior knowledge) to eliminate
    hypotheses that are inconsistent with the use of
    the word.

6
Word Learning
  • Some constraints on word learning
  • Very few examples required
  • Learning is possible with only positive examples
  • Word meanings overlap
  • Learning is often graded

7
Word Learning
  • Given a few instances of a particular word, say
    "dog", how do we generalize to new instances?
  • Connectionist (associative) approach: compute the
    probability of co-occurrence of object features
    and the corresponding word

8
Word Learning
  • Some constraints on word learning
  • Very few examples required
  • Learning is possible with only positive examples
  • Word meanings overlap
  • Learning is often graded

9
Word learning
  • Alternative: rational statistical inference with a
    structured hypothesis space
  • Suppose you see a Dalmatian and you hear "fep".
    Does "fep" refer to all dogs or just Dalmatians?
    What if you hear 3 more examples, all
    corresponding to Dalmatians? Then it should be
    clear that "fep" means Dalmatian, because the
    observations would be a suspicious coincidence if
    "fep" referred to all dogs.
  • Therefore, logic is not enough; you also need
    probabilities. However, you don't need that many
    examples. And co-occurrence frequency is not
    enough (in our example, "fep" is associated 100%
    of the time with Dalmatians whether you see one or
    three examples).
  • We need structured prior knowledge

10
Word Learning
  • Suppose objects are organized in taxonomic trees

[Tree diagram: animals ⊃ dogs ⊃ dalmatians]
11
Word learning
  • We're given N examples of a word C. The goal of
    learning is to determine whether C corresponds
    to the subordinate, basic, or superordinate
    level. The level in the taxonomy is what we mean
    by "meaning".
  • h = hypothesis, i.e., word meaning.
  • The set of possible hypotheses is strongly
    constrained by the tree structure.

12
Word Learning
T = tree structure
H = hypotheses
[Tree diagram: animals ⊃ dogs ⊃ dalmatians]
13
Word learning
  • Inference: just follow Bayes' rule

h = hypothesis: is this a dog / a basic-level word?
x = data, e.g. a set of labeled images of animals
T = type of representation being assumed (e.g. a
tree structure)
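The Bayes-rule formula itself did not survive the transcript; given the definitions above, it presumably reads:

$$p(h \mid x, T) = \frac{p(x \mid h)\, p(h \mid T)}{\sum_{h' \in H} p(x \mid h')\, p(h' \mid T)}$$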
14
Word learning
Prior: the prior is strongly constrained by the
tree structure. Only some hypotheses are possible
(the ones corresponding to the hierarchical
levels in the tree).
  • Inference: just follow Bayes' rule

Likelihood function: the probability of the data
given the hypothesis.
15
Word learning
  • Likelihood functions and the size principle
  • Assume you're given n examples of a particular
    group (e.g. 3 examples of dogs, or 3 examples of
    Dalmatians). Then

16
Word learning
  • Let's assume there are 100 dogs in the world, 10
    of them Dalmatians. If examples are drawn
    randomly with replacement from those pools, we
    have
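The equation on the slide is missing from the transcript; with 100 dogs and 10 Dalmatians, sampling with replacement presumably gives:

$$p(x_1,\dots,x_n \mid h{=}\text{dogs}) = \left(\frac{1}{100}\right)^n, \qquad p(x_1,\dots,x_n \mid h{=}\text{dalmatians}) = \left(\frac{1}{10}\right)^n$$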

17
Word learning
  • More generally, the probability of getting n
    examples consistent with a particular hypothesis h
    is given by the formula below.
  • This is known as the size principle: multiple
    examples drawn from smaller sets are more likely.
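The formula is again missing from the transcript; from the setup it is presumably

$$p(X = \{x_1,\dots,x_n\} \mid h) = \left(\frac{1}{|h|}\right)^n \quad \text{if all } x_i \in h, \text{ and } 0 \text{ otherwise.}$$

A minimal sketch of the size principle in Python, assuming illustrative set sizes (10 Dalmatians, 100 dogs, 1000 animals; only the 10/100 figures come from the slides) and a flat prior over the three levels:

```python
# Posterior over taxonomic hypotheses for examples that are all
# consistent with every hypothesis (e.g. all examples are Dalmatians).
def posterior(n_examples, sizes):
    # Size principle: each example has probability 1/|h| under h.
    unnorm = {h: (1.0 / size) ** n_examples for h, size in sizes.items()}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

sizes = {"dalmatian": 10, "dog": 100, "animal": 1000}  # assumed sizes
for n in (1, 3):
    print(n, posterior(n, sizes))
# n=1 already favors the subordinate level (~0.90); by n=3 it is
# near-certain (~0.999) -- the "suspicious coincidence" at work.
```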

18
Word learning
  • Let's say you're given 1 example of a Dalmatian.
  • Conclusion: it's very likely to be a subordinate
    word

19
Word learning
  • Let's say you're given 4 examples, all Dalmatians.
  • Conclusion: it's a subordinate word (Dalmatian)
    with near certainty, or else it's a very suspicious
    coincidence!

20
Word learning
  • Let's say you're given 5 examples, 2 Dalmatians and
    3 German Shepherds.
  • Conclusion: it's a basic-level word (dog) with
    near certainty

Probability that images got mislabeled. Assumed to
be very small.
21
Word Learning
  • Subject is shown one Dalmatian and told it's a "fep"
  • Subord. match: subject is shown a new Dalmatian
    and asked if it's a fep
  • Basic match: subject is shown a new dog
    (non-Dalmatian) and asked whether it's a fep

22
Word Learning
As more subordinate examples are collected, the
probabilities for the basic and superordinate
levels go down.
  • Subject is shown three Dalmatians and told they are
    "feps"
  • Subord. match: subject is shown a new Dalmatian
    and asked if it's a fep
  • Basic match: subject is shown a new dog
    (non-Dalmatian) and asked whether it's a fep

23
Word Learning
As more basic-level examples are collected, the
probability for the basic level goes up.
With only one example, the subordinate level is favored.
24
Word Learning
The model produces similar behavior.
25
(No Transcript)
26
(No Transcript)
27
Bayesian property induction
28
Bayesian property induction
  • Given that gorillas and chimpanzees have gene X,
    do macaques have gene X?
  • If cheetahs and giraffes carry disease X, do polar
    bears carry disease X?
  • Classic approach: Boolean logic.
  • Problem: such questions are inherently
    probabilistic. Answering yes or no would be very
    misleading.
  • Fuzzy logic?

29
Bayesian property induction
  • C = concept (e.g. "mammals that can get disease X"),
    i.e., a set of animals defined by a particular
    property.
  • H = hypothesis space: the space of all possible
    concepts, i.e., all possible sets of animals. With
    10 animals, H contains 2^10 = 1024 sets.
  • h = a particular set. Note that there is an h for
    which h = C.
  • y = a particular statement ("dolphins can get
    disease X"), i.e., the claim that a given animal
    belongs to the concept.
  • X = a set of observations drawn from the concept C.

30
Bayesian property induction
  • The goal of inference is to determine whether y
    belongs to a concept C for which we have samples
    X.
  • E.g., given that gorillas and chimpanzees have
    gene X, do macaques have gene X?
  • X = {gorillas, chimpanzees}
  • y = macaques
  • C = the set of all animals with gene X.
  • Note that we don't know the full list of animals
    in set C. C is a hidden (or latent) variable. We
    need to integrate it out.

31
Bayesian property induction
  • Animals: buffalo, zebra, giraffe, seal = {b, z, g, s}
  • X = {b, z} have property a
  • y = g: do giraffes have property a?

Probability that g has property a given that b
and z have property a.
Probability that g has property a given that h
contains g: it must be equal to 1.
32
Bayesian property induction
  • More formally:

Warning: this is the probability that X will be
observed given h. It is not the probability that
the members of X belong to h.
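The formula being warned about is missing from the transcript; given the setup it is presumably

$$p(y \in C \mid X) = \sum_{h \in H} p(y \in h)\, p(h \mid X) = \frac{\sum_{h \,\ni\, y} p(X \mid h)\, p(h)}{\sum_{h} p(X \mid h)\, p(h)},$$

where $p(y \in h)$ is 1 if $y \in h$ and 0 otherwise (the "must be equal to 1" annotation on the previous slide).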
33
Bayesian property induction
  • The likelihood function
  • Animals: buffalo, zebra, giraffe, seal = {b, z, g, s}
  • If h = {b, z}:
  • p(X = {b} | h) = 1/2
  • p(X = {b, z} | h) = 1/2 × 1/2
  • p(X = {g} | h) = 0. This rules out all hypotheses
    that do not contain all of the observations.
  • If h = {b, z, g}:
  • p(X = {b, z} | h) = 1/3 × 1/3 ≈ 0.11. The larger
    the set, the smaller the likelihood: Occam's razor.

34
Bayesian property induction
  • The likelihood function:
  • Non-zero only if X is a subset of h. Note that sets
    that contain none or only some (but not all) of the
    elements of X are ruled out. Also, data are less
    likely to come from large sets (Occam's razor?).

35
Bayesian property induction
  • The prior: the prior should embed knowledge of
    the domain.
  • Naive approach: if we have 10 animals and we
    consider all possible sets, we end up with
    2^10 = 1024 sets. A flat prior over these would
    yield a prior of 1/1024 for each set.
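A minimal sketch of this computation in Python, assuming the naive flat prior described above and the size-principle likelihood from the previous slides:

```python
# Bayesian property induction over all subsets of a small animal set.
from itertools import combinations

animals = ("b", "z", "g", "s")  # buffalo, zebra, giraffe, seal
hypotheses = [frozenset(c) for r in range(1, len(animals) + 1)
              for c in combinations(animals, r)]

def p_property(y, X):
    """p(y has the property | the animals in X are observed to have it)."""
    X = set(X)
    prior = 1.0 / len(hypotheses)  # flat prior over all non-empty sets
    # Size principle: X must be a subset of h; each example has prob 1/|h|.
    post = {h: prior * (1.0 / len(h)) ** len(X) if X <= h else 0.0
            for h in hypotheses}
    z = sum(post.values())
    return sum(p for h, p in post.items() if y in h) / z

# Given that buffalo and zebras have property a, do giraffes?
print(p_property("g", ["b", "z"]))  # ~0.32
```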

36
Bayesian property induction
  • Bayesian taxonomic prior
  • Note: only 19 sets have a non-zero prior (with 10
    animals, a binary tree has 10 leaf sets plus 9
    internal-node sets). The other 2^10 − 19 = 1005
    sets have a prior of zero. A HUGE constraint!

37
Bayesian property induction
  • The taxonomic prior is not enough. Why?
  • Seals and squirrels catch disease X, so horses
    are also susceptible
  • Seals and cows catch disease X, so horses are
    also susceptible
  • Most people say that statement 2 is stronger.

38
Bayesian property induction
  • Seals and squirrels catch disease X, so horses
    are also susceptible
  • Seals and cows catch disease X, so horses are
    also susceptible

The only taxonomic hypothesis that can contain all
three animals is the same in both cases.
39
Bayesian property induction
  • This approach does not distinguish the two
    statements

Again, the only hypothesis that can contain all three
animals is the same for both statements.
40
Bayesian property induction
  • Evolutionary Bayes

41
Bayesian property induction
  • Evolutionary Bayes: all 1024 sets of animals are
    possible, but they differ in their prior. Sets
    that contain animals that are nearby in the tree
    are more likely.
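The slides do not give the exact form of the evolutionary prior (the real model derives it from a mutation process on the tree's branches), so the following is only a stand-in for the idea stated above: every set gets non-zero prior mass, but sets whose members sit close together in the taxonomy get more. "Close" is proxied here, as an assumption, by the leaf count of the smallest subtree spanning the set:

```python
# Toy taxonomy as nested tuples; the leaf names are illustrative.
from itertools import combinations

tree = (("seal", "dolphin"), (("cow", "horse"), "squirrel"))

def leaves(t):
    return {t} if isinstance(t, str) else set().union(*map(leaves, t))

def span(t, s):
    """Leaf count of the smallest subtree of t covering the set s."""
    if not isinstance(t, str):
        for child in t:
            if s <= leaves(child):
                return span(child, s)
    return len(leaves(t))

animals = sorted(leaves(tree))
sets = [frozenset(c) for r in range(1, len(animals) + 1)
        for c in combinations(animals, r)]
unnorm = {h: 2.0 ** -span(tree, h) for h in sets}  # assumed decay rate
z = sum(unnorm.values())
prior = {h: w / z for h, w in unnorm.items()}

# Nearby animals form a more probable set than distant ones:
print(prior[frozenset({"seal", "dolphin"})] >
      prior[frozenset({"seal", "squirrel"})])  # True
```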

42
Bayesian property induction
  • Comparison

43
Bayesian property induction
  • Comparison

[Diagram: a likely set, with prior weight p · p²]
44
Bayesian property induction
  • Comparison

[Diagram: an unlikely set, with prior weight p²]
45
Bayesian property induction
  • Seals and squirrels catch disease X, so horses
    are also susceptible
  • Seals and cows catch disease X, so horses are
    also susceptible
  • Under the evolutionary prior, there are more
    scenarios that are compatible with the second
    statement.

46
Bayesian property induction
  • The evolutionary prior explains the data better
    than any other model (but not by much).

47
Other priors
48
How to learn the structure of the prior
  • Syntactic rules for growing graphs

49
(No Transcript)
50
(No Transcript)
51
Theory-based causal inference
52
Theory-based causal inference
  • Can we use this framework to infer causality?

53
Theory-based causal inference
  • A blicket detector activates when a blicket is
    placed on it.
  • Observation 1: B1 and B2 together; the detector
    turns on.
  • Most kids say that B1 and B2 are blickets.
  • Observation 2: B1 alone; the detector turns on.
  • Now all kids say B1 is a blicket but not B2.
  • This is known as extinction, or explaining away.

54
Theory-based causal inference
  • Impossible to capture with the usual learning
    algorithms because there aren't enough trials to
    learn all the probabilities involved.
  • Simple reasoning could be used, along with
    Occam's razor (e.g., B1 alone is enough to
    explain all the data), but it's hard to formalize.
    (How do we define Occam's razor?)

55
Theory-based causal inference
  • Alternative: assume the data were generated by a
    causal process.
  • We observed two trials: d1 = {e=1, x1=1, x2=1} and
    d2 = {e=1, x1=1, x2=0}. What kind of Bayesian net
    can account for these data?
  • There are only four possible networks.

56
Theory-based causal inference
  • Which network is most likely to explain the data?
  • Bayesian approach: compute the posterior over
    networks given the data.
  • If we assume that the probability of any object
    being a blicket is r, the prior over Bayesian
    nets is given below.
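The prior formula is missing from the transcript; with independent blickethood probability r for each object, it presumably reads

$$p(h_{jk}) = r^{\,j+k}\,(1-r)^{\,2-j-k}, \qquad j,k \in \{0,1\},$$

i.e. $p(h_{00}) = (1-r)^2$, $p(h_{10}) = p(h_{01}) = r(1-r)$, and $p(h_{11}) = r^2$, where $j$ and $k$ indicate whether object 1 and object 2 are blickets.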

57
Theory-based causal inference
  • Let's consider what happens after we observe
    d1 = {e=1, x1=1, x2=1}.
  • If we assume the machine does not go off by
    itself (p(e=1 | no blickets present) = 0), we have
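The missing likelihood for the no-blicket net is presumably

$$p(d_1 \mid h_{00}) = p(e{=}1 \mid x_1{=}1, x_2{=}1, h_{00}) = 0,$$

since a detector with no blicket on it never fires.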

58
Theory-based causal inference
  • Let's consider what happens after we observe
    d1 = {e=1, x1=1, x2=1}.
  • For the other nets:
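Presumably, because each of the remaining nets puts at least one blicket on the detector:

$$p(d_1 \mid h_{10}) = p(d_1 \mid h_{01}) = p(d_1 \mid h_{11}) = 1.$$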

59
Theory-based causal inference
  • Therefore, we're left with three networks, and for
    each of them we have the posterior shown below.
  • Assuming blickets are rare (r < 0.5), the most
    likely explanations are the ones in which only
    one object is a blicket (h10 and h01). Therefore,
    object 1 or object 2 is a blicket (but it's
    unlikely that both are blickets!).
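Combining the prior and these likelihoods, the posteriors on the slide presumably are

$$p(h_{10} \mid d_1) = p(h_{01} \mid d_1) = \frac{r(1-r)}{2r(1-r) + r^2} = \frac{1-r}{2-r}, \qquad p(h_{11} \mid d_1) = \frac{r}{2-r},$$

and $(1-r)/(2-r) > r/(2-r)$ exactly when $r < 0.5$.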

60
Theory-based causal inference
  • We now observe d2 = {e=1, x1=1, x2=0}.
  • Again, if we assume the machine does not go off
    by itself (p(e=1 | no blickets present) = 0), we have
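The missing likelihoods for d2 are presumably

$$p(d_2 \mid h_{01}) = p(e{=}1 \mid x_1{=}1, x_2{=}0, h_{01}) = 0, \qquad p(d_2 \mid h_{10}) = p(d_2 \mid h_{11}) = 1.$$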

We're assuming that the machine does not go off
if there is no blicket.
61
Theory-based causal inference
  • We observed two trials: d1 = {e=1, x1=1, x2=1} and
    d2 = {e=1, x1=1, x2=0}. h00 and h01 are
    inconsistent with these data, so we're left with
    the other two.

62
Theory-based causal inference
  • We observed two trials: d1 = {e=1, x1=1, x2=1} and
    d2 = {e=1, x1=1, x2=0}, and we're left with two
    networks.
  • Assuming blickets are rare (r < 0.5), the network
    in which only one object is a blicket (h10) is
    the most likely explanation.

63
Theory-based causal inference
  • But what happens to the probability that X2 is a
    blicket?
  • To compute this we need to marginalize over the
    networks, as spelled out below.

Either the x2 → e link is in network hjk or it's not.
Hypotheses for which this link exists contribute a
1 here (p(x2 → e | hjk) = 1).
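Spelled out, the marginal presumably is

$$p(x_2 \to e \mid X) = \sum_{j,k} p(x_2 \to e \mid h_{jk})\, p(h_{jk} \mid X) = p(h_{01} \mid X) + p(h_{11} \mid X).$$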
64
Theory-based causal inference
  • But what happens to the probability that X2 is a
    blicket?

(The data suggest that r = 1/3.)
65
Theory-based causal inference
  • Probability that X2 is a blicket after the second
    observation: see the sketch below.
  • Therefore the probability that X2 is a blicket
    went down after the second observation (from 3/5
    to 1/3), which is consistent with kids' reports.
    Occam's razor comes from assuming r < 0.5.
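A minimal end-to-end sketch of this inference in Python, assuming a deterministic detector (it fires iff at least one blicket is on it) and independent blickethood probability r:

```python
from itertools import product

r = 1 / 3  # the slides note kids' judgments are consistent with r = 1/3

def posterior(trials):
    """Posterior over hypotheses h_jk (j: x1 is a blicket, k: x2 is)
    given trials of the form (objects_on_detector, detector_fired)."""
    post = {}
    for j, k in product((0, 1), repeat=2):
        prior = (r if j else 1 - r) * (r if k else 1 - r)
        lik = 1.0
        for on, fired in trials:
            pred = int((j and 1 in on) or (k and 2 in on))
            lik *= 1.0 if pred == fired else 0.0  # deterministic detector
        post[(j, k)] = prior * lik
    z = sum(post.values())
    return {h: p / z for h, p in post.items()}

d1 = [({1, 2}, 1)]      # x1 and x2 on the detector, it fires
d12 = d1 + [({1}, 1)]   # then x1 alone, it fires again
for trials in (d1, d12):
    p = posterior(trials)
    print("p(x2 is a blicket) =", p[(0, 1)] + p[(1, 1)])
# Prints 0.6 after the first trial and ~0.333 after both trials: the
# explaining-away drop from 3/5 to 1/3 described above.
```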

66
Theory-based causal inference
  • This approach can be generalized to much more
    complicated generative models.