Transcript and Presenter's Notes

Title: Why the monkey ate the banana


1
Why the monkey ate the banana?
  • Christer Johansson (1), Anders Nøklestad (2), Chris Biemann (3)
  • (1) Universitetet i Bergen
  • (2) Universitetet i Oslo
  • (3) Universität Leipzig
  • Workshop on Anaphora Resolution, Mjølfjell 2005

2
The monkey ate the banana because
  • it was hungry.
  • it was ripe.
  • it was tea time.

3
Ontologies / semantic information
  • Need to describe relations between words
  • If we had an ontology we would know
  • A) that monkeys can be hungry
  • B) that bananas cannot be hungry
  • C) that bananas can be ripe
  • D) that monkeys are not likely to be ripe
  • E) that tea time is a situation when you eat
  • And so on

4
Benefits
  • Well defined
  • Formal model
  • Supports (automatic) reasoning
  • Provides justification for decisions
  • EU buzzword, which facilitates funding

5
Problems
  • Ontologies are hard to get
  • Never complete
  • Hard to expand
  • Hard to merge
  • Hard to prepare for the unexpected
  • Not suited for the domain you have
  • Usually demands human efforts
  • Lots of sweat, coffee and money

6
Our approach
  • Accept that we cannot get the perfect ontology.
  • We can do without a well defined formal model
  • We may not need to justify the automatic
    decisions for some applications
  • We can do something useful, fast and with high
    coverage.

7
Using free resources
  • If just enough text is analyzed, it might be
    possible
  • to identify hungry as a typical property of
    monkeys
  • to find ripe as a property of bananas
  • by seeing those adjectives modifying the
    respective nouns quite often.

8
Statistical Co-occurrences
  • Co-occurrence: the occurrence of two words A and B
    within a well-defined unit of information or in
    some specific position.
  • Significant co-occurrences reflect relations
    between words. The mechanism extracts typical
    associations between words.
  • To distinguish genuine co-occurrences from raw
    frequency counts, a significance measure is used,
    here a log-likelihood measure (sketched below)
  • sig(A,B) = x − k·log(x) + log(k!)
  • k: number of units containing both A and B
  • n: total number of units
  • x = (a·b)/n, where a and b are the frequencies of
    A and B
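A minimal Python sketch of this significance computation (not from the slides; the function and variable names are illustrative):

    import math

    def sig(a, b, k, n):
        """a, b: frequencies of A and B; k: units containing both; n: total units."""
        x = (a * b) / n                      # expected co-occurrences if A and B were independent
        # log-likelihood style significance: x - k*log(x) + log(k!)
        return x - k * math.log(x) + math.lgamma(k + 1)

    # e.g. sig(a=500, b=300, k=40, n=4_200_000)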

9
Finding properties
  • Restrict co-occurrence statistics to grammatical
    functions.
  • Tags are provided by a Constraint-Grammar Tagger.
  • A Norwegian corpus of 20 million words was used
    to extract about 4.2 million units of different
    grammatical relations.
  • Example:
  • The quick fox eats the pizza. The pizza is
    cold.
  • adjective-noun modifiers (AN): quick fox
  • verb-subject relations (VS): eat fox
  • verb-object relations (VO): eat pizza
  • subject-predicate relations (SP): pizza cold
  • The co-occurrence statistics cut the noise and
    leave the typical cases (a sketch of the counting
    step follows).
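A rough Python sketch of the counting step; the relation triples below are hand-written placeholders for the tagger's output:

    from collections import Counter

    # placeholder output of the relation extraction step: (relation, word1, word2)
    units = [
        ("AN", "quick", "fox"),
        ("VS", "eat", "fox"),
        ("VO", "eat", "pizza"),
        ("SP", "pizza", "cold"),
    ]

    pair_counts = Counter(units)        # k for each (relation, A, B) pair
    n = len(units)                      # total number of units
    # each pair count, together with the word frequencies, feeds the sig()
    # measure, and only the significant pairs are kept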

10
What can be done to bananas?
  • Co-occurrences only leave specific relations, but
    lose coverage

11
Norwegian girls and boys
12
How to integrate semantic knowledge into
anaphora resolution
13
Memory Based Learning
  • Vector space model
  • Simplest case: the single nearest neighbor
  • Or look at a majority vote among the k nearest
    (sketched below).

(Figure: the red point is the nearest neighbor; the
majority of the 5 nearest points are blue.)
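A small Python sketch of the majority vote over the k nearest neighbors, using a simple overlap distance on boolean feature vectors (the actual learner and metric may differ):

    def knn_classify(query, examples, k=5):
        """examples: list of (feature_vector, label); return the majority label of the k nearest."""
        def dist(u, v):
            return sum(a != b for a, b in zip(u, v))   # number of mismatching features
        nearest = sorted(examples, key=lambda ex: dist(query, ex[0]))[:k]
        labels = [label for _, label in nearest]
        return max(set(labels), key=labels.count)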
14
Feature Vectors
  • Pairs of pronoun and potential antecedent are
    represented as (boolean) feature vectors.
  • Does the pair match on surface form?
  • Do they have the same
  • syntactic function?
  • gender?
  • number?
  • Is the anaphor a distinctly human pronoun (han
    'he' or hun 'she') and the potential antecedent
    a proper name (John or Mary)?
  • If so, do they also have the same gender?

15
cont.
  • Is the antecedent a Subject?
  • Is the anaphor reflexive (seg), and the
    antecedent its closest subject?
  • Is the antecedent in the current or previous
    sentence?
  • Is the antecedent in the current, previous, or
    penultimate sentence?
  • Anaphor and antecedent lemmas concatenated
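A Python sketch of such a boolean feature vector; the attribute names on the anaphor and candidate are hypothetical, and the concatenated-lemma feature (a string value) is left out here:

    def make_features(anaphor, candidate):
        human_pronoun = anaphor["form"] in ("han", "hun")       # 'he' / 'she'
        proper_name = candidate["is_proper_name"]
        return [
            anaphor["form"] == candidate["form"],                # same surface form
            anaphor["function"] == candidate["function"],        # same syntactic function
            anaphor["gender"] == candidate["gender"],
            anaphor["number"] == candidate["number"],
            human_pronoun and proper_name,
            human_pronoun and proper_name
                and anaphor["gender"] == candidate["gender"],
            candidate["is_subject"],
            anaphor["form"] == "seg" and candidate["is_closest_subject"],
            candidate["sentence_distance"] <= 1,                 # current or previous sentence
            candidate["sentence_distance"] <= 2,                 # up to the penultimate sentence
        ]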

16
Results
  • Overall F = 62
  • Non-identical antecedent and anaphor: F = 40
  • (Some errors may have other anaphoric
    relations.)

17
Automatically detected syntactic/semantic
relations
18
Are the candidate and the anaphor described in
the same way?
  • The monkey ate the banana because it was
    hungry.
  • Is there a potential antecedent of it that is
    associated with hungry?
  • We find this out using corpora and significance
    measures.
  • The quick fox ate the slug. It was not satisfied.
  • If we find (satisfied, fox) among the significant
    pairs, the chances increase that the antecedent is
    fox.
  • If we find (satisfied, slug), the chances
    increase for slug.
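A sketch of that lookup, assuming `significant` is the set of (relation, word, word) pairs kept by the significance measure sketched earlier:

    def semantically_associated(predicate, candidate, relation, significant):
        """Is the candidate noun significantly associated with the predicate?"""
        return (relation, candidate, predicate) in significant

    # The quick fox ate the slug. It was not satisfied.
    # semantically_associated("satisfied", "fox", "SP", significant)   -> evidence for fox
    # semantically_associated("satisfied", "slug", "SP", significant)  -> evidence for slug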

19
What does it do?
  • verb-subject relations (VS)
  • The dog ate the pill, and it swallowed it whole.
  • Do dogs swallow?
  • Do pills swallow?

20
What is done to it?
  • verb-object relations (VO)
  • The dog ate the pill, and it swallowed it whole.
  • Are pills swallowed?
  • Are dogs swallowed?

21
Even safe features fail
  • He bought Harry a hamburger and they sat down on
    plastic seats to eat them. (J.K. Rowling, Harry
    Potter and the Philosopher's Stone)
  • Judging from grammatical features only, Harry and
    Hagrid would eat plastic seats.
  • We need world knowledge, and we may get some
    from large corpora

22
Training
  • Positive example:
  • the anaphor paired with its actual antecedent
  • Negative examples:
  • the anaphor paired with any markable that is
    closer to the anaphor than the actual antecedent
    (a sketch of this follows below).
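A Python sketch of how the training pairs might be generated, reusing the hypothetical make_features() from the earlier sketch:

    def training_examples(anaphor, markables, antecedent):
        """markables: candidates ordered from the anaphor going backwards through the text."""
        examples = []
        for m in markables:
            if m is antecedent:
                examples.append((make_features(anaphor, m), "yes"))   # positive example
                break
            examples.append((make_features(anaphor, m), "no"))        # closer markable -> negative
        return examples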

23
Testing
  • Starting from the pronoun,
  • search backwards in the text
  • until we find a markable that is classified as
    an antecedent for the pronoun.
  • For each classification decision: if the
    mechanism finds more negative than positive
    examples among the nearest neighbors in the
    database, the answer is no; otherwise yes.
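Combining the earlier sketches, the test-time search could look roughly like this (function names are the hypothetical ones introduced above):

    def resolve(anaphor, markables, database, k=5):
        """markables ordered from the anaphor going backwards through the text."""
        for candidate in markables:
            vector = make_features(anaphor, candidate)
            if knn_classify(vector, database, k) == "yes":
                return candidate          # first markable classified as an antecedent
        return None                       # no antecedent found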

24
Feature Percolation
  • If the markable is part of an already established
    coreference chain
  • then features are allowed to match on earlier
    markables of that chain.
  • We say that feature values percolate towards the
    most recent referent.
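One possible reading of the percolation step, sketched in Python; `chains` would be the coreference chains established so far:

    def percolated_features(anaphor, candidate, chains):
        """Let feature values match on earlier markables of the candidate's chain."""
        chain = next((c for c in chains if candidate in c), [candidate])
        vectors = [make_features(anaphor, m) for m in chain]
        # a feature counts as matching if it matches on any member of the chain
        return [any(values) for values in zip(*vectors)]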

25
Results
26
Coverage
  • At least one of the semantic features is present
    in about 8% of the positive cases.
  • The corpus we used for constructing the features
    was about 20 million running words
  • Which means about 4 million (non-unique)
    instances
  • Which leaves us with about 135,000 (unique)
    statistical collocations.

27
Coverage
  • Increase coverage by using much larger corpora
  • Work in progress: 400 million words from
    Norwegian newswire.

28
When is it useful?
  • In 1/4 of the cases with semantic features, they
    select a positive instance of co-occurrence.
  • Background rate: 1/10 of the cases are positive
    instances.
  • This gives something, but not much.
  • We hope for a positive interaction with other
    features.

29
Discussion
30
Discussion
  • It is a general problem that there are so many
    negative examples compared to the positive ones.
  • What can be done to reduce the number of
    negative examples?

31
Conclusion
32
Conclusion
  • Very simple features account for most of the
    correctly made decisions.
  • Hard to come up with features that improve the
    results.
  • The syntactic/semantic features improve results,
    although not drastically.

33
Questions?
  • THANKS!

34
  • War is over
  • if you want it