1
Detecting Coreference: Processing discourse reference
  • Christer Johansson, UiB

2
Those little words
  • Typical anaphora
  • Pronouns
  • The cat bit the dog because it was angry.
  • Definite Nouns
  • I saw a cat and a dog. The cat was chasing the
    dog.

3
Those little words
  • Pronouns
  • We often think of pronouns as a kind of placeholder, or a variable.
  • Pronouns have some internal structure
  • Gender
  • Number
  • Case
  • ...

4
Those little words
  • Nouns
  • Do not confuse words with what they refer to.
  • "Pigs can fly." is a grammatical sentence even if it is not true. It is also meaningless if we do not know what pig is, or the relations between all three words.
  • Words like nouns are in a sense also variables, although they are more restricted. Compare cat with it.

5
More Anaphora
  • Less typical
  • Predication
  • John drives a taxi, and Joe studies math. Which
    one would you like to meet, the taxi driver or
    the math student?
  • Verb anaphora
  • Ann sings in the shower. The hollering lasts half an hour.

6
Coreference
  • Coreference occurs when the same person, place,
    event, or concept is referenced more than once in
    a single document. (Amit Bagga)

7
Extension of Coreference
  • Cross Document Coreference ... occurs when the
    same person, place, event, or concept is
    referenced more than once in multiple sources
    (Amit Bagga)
  • Essential Information Retrieval problem.
  • Are two documents about the same things?

8
Extension of Coreference: Images
(Images: CBS 2829, CBS 3873, NBC 3885, NBC 5061)
  • Amit Bagga: Significant TV broadcasts are repeated across and within stations. Combining text and image recognition can aid detection of coreferent events.

9
Applications
10
Information Extraction and Retrieval
  • Q/A systems
  • Q: Who was king of Norway in 1985?
  • A: He was king of Norway in 1985.
  • Resolving references may find good keywords. Themes are often referred to.
  • Reference is more than word form.

11
A simplistic example
  • The lion is the king(1) of the jungle. She(2) hunts mostly at night. The females(3) live in groups. The male(4) is much larger, but _(5) lives alone.
  • Word form: lion, 1 of 26 words.
  • Reference: lion, 6 of 26 words.
  • The significance of lion increases (a counting sketch follows below).
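A minimal counting sketch (my own illustration in Python, not from the talk) of how the weight of lion changes once the coreference chain in the example is taken into account. The chain is hand-annotated here; in practice a resolver would have to supply it.

```python
# Toy comparison of surface-form frequency vs. reference (chain) frequency
# for "lion" in the example passage. The chain membership is annotated by hand.
text = ("The lion is the king of the jungle. She hunts mostly at night. "
        "The females live in groups. The male is much larger, but lives alone.")

tokens = [w.strip(".,").lower() for w in text.split()]
n_words = len(tokens)                 # 26 words, as on the slide

form_count = tokens.count("lion")     # the word form itself occurs once

# One entry per mention in the chain: lion, king, she, the females, the male,
# plus the omitted (zero) subject of "lives alone".
chain = ["lion", "king", "she", "females", "male", "(zero subject)"]
reference_count = len(chain)          # 6 references

print(f"word form 'lion': {form_count} of {n_words} words")
print(f"references to the lion: {reference_count} of {n_words} words")
```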

12
Machine Translation
  • Det satt en katt-i på bordet-j.
  • Heldigvis sto det-j stille.
  • Heldigvis sto den-i stille.
  • There was a cat on the table. Fortunately, it was standing still.
  • Without co-reference:
  • An unambiguous sentence becomes ambiguous.
  • Important when translating between case, gender or aspect marking languages.

13
Machine Translation
  • The monkey ate the banana because ...
  • 1) it was hungry: hungry(it=monkey)
  • 2) it was ripe: ripe(it=banana)
  • 3) it was tea time: eat(Agent, Food, When=tea time)

14
Prosody: Text-to-Speech systems
  • A given is seldom stressed (Horne & Johansson, 1991).
  • I will never sell my dog. I LOVE the old mutt.
  • If old mutt and my dog are coreferent, stress is
    more likely to move to some other (new)
    information.

15
Disambiguation
  • The lion roams the savannah.
  • If there is no antecedent, assume the definite NP refers more generally (to the species).
  • Cognitive Givenness Hierarchy:
  • A referent must be uniquely identifiable.
  • There is only one species of lion.
  • There are many individual lions.

16
Disambiguation
  • Hunden knäckte benet med sina käkar.
  • The dog broke the bone with his jaws.
  • The dog broke the leg with his jaws.
  • ? The dog broke his leg with his jaws.
  • If benet ('the bone' or 'the leg') can be resolved through a reference chain, we are more likely to get the correct translation.

17
Factors
18
Gender
  • Grammatical gender
  • Det sto en katt-x på bordet-y. Den-x/det-y sto heldigvis stille.
  • There was a cat on the table. Luckily, it kept
    still.
  • Natural Gender
  • Den nye eleven-x likte sykkelen-y.
    Han-x/hun-x/den-y var rask.
  • The new student liked the bike. He/She/It was
    fast.

19
Function of the antecedent: centering
  • Kari-S var sent ute, så hun-S ringte søsteren-O sin-S. Hun-S skrek inn i røret.
  • Kari was late, so she called her sister. She yelled into the receiver.
  • She most likely refers to Kari, as that choice
    keeps the focus on her.

20
Determined Noun Phrase
  • A cat and a dog were fighting outside. The dog
    howled like a wolf.
  • Identify stems in Scandinavian: hund / hunden.
  • Look at that dog-i. I wouldn't like to meet that beast-i alone.
  • Compatible units through semantic networks/world knowledge (ontologies).

21
Negation
  • Ann didn't see any woman. She was next door.
  • Ann saw no woman. She was in another room.
  • Ann didn't talk to her. She was upset.
  • 1) She = Ann  2) She = her
  • Ann talked to her. She was upset.
  • 1) She = her  2) She = Ann
  • Ann talked to her, because she was upset. (?)
  • Ann talked to her. She became upset.
  • 1) She = her  ? She = Ann
  • Cause -> Effect

22
Explanation
  • The students protested, while the police
    observed. Then they attacked.
  • The students protested, while the police
    observed. Then they began to throw stones.
  • What do students do? And what do police do?
  • Depends heavily on background knowledge
  • We might extract some background knowledge from large collections of text, by observing statistical relations between subject and verb, and between verb and object (a small counting sketch follows below).
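The statistical relations mentioned in the last bullet can be sketched as simple co-occurrence counts over parsed clauses. This is my own illustration; the (subject, verb, object) triples are invented and would in practice come from a parser run over a large corpus.

```python
# Collect subject-verb and verb-object co-occurrence counts from parsed clauses
# and use them as a crude plausibility measure for "who does what".
from collections import Counter

parsed_clauses = [                      # hypothetical parser output
    ("students", "protest", None),
    ("students", "throw", "stones"),
    ("police", "observe", "protesters"),
    ("police", "attack", "protesters"),
    ("police", "arrest", "students"),
]

subj_verb = Counter((s, v) for s, v, o in parsed_clauses if s)
verb_obj = Counter((v, o) for s, v, o in parsed_clauses if o)   # analogous for objects

def plausibility(subject: str, verb: str) -> float:
    """Relative frequency of this subject among all observed subjects of the verb."""
    total = sum(c for (s, v), c in subj_verb.items() if v == verb)
    return subj_verb[(subject, verb)] / total if total else 0.0

# Who is the more plausible subject of "attack" and "throw"?
for verb in ("attack", "throw"):
    for subject in ("students", "police"):
        print(verb, subject, round(plausibility(subject, verb), 2))
```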

23
Heavy NP
  • A heavy NP is more likely to be referenced. Heavy NPs have more modifiers.
  • The small man with the black hat sat in the
    corner chatting with a clerk. He seemed relaxed.
  • Embedded NPs are less likely to be referred to than top-level NPs.
  • The clerk sat in the corner chatting with a small
    man with a black hat. He seemed relaxed.
  • Interacts with other factors.

24
Semantics
  • Through part-whole relations
  • It was a beautiful car. He sat behind the wheel.
  • Through subordinate - superordinate
  • He wants a dachshund, but I don't know if he can take care of a dog.
  • Through verb anaphora
  • They captured Saddam last Sunday. The event was
    undramatic.

25
More
  • Co-ordinated NPs.
  • The cat and dog were fighting. They got hurt.
  • ?
  • I saw a cat with one eye last night. It was
    horrible.
  • The cat (?)
  • The eye (?)
  • Last night (?)
  • The sight / situation (?)

26
Challenges
  • Noisy data underlying decisions
  • Word class tags: 95-98% correct
  • Functional roles: maybe 80% correct
  • Spelling errors, etc.

27
Challenges
  • Common knowledge is important, but difficult to
    model.
  • Semantic Networks
  • Ontologies: what exists in the world
  • Dynamic Models: change over time
  • Situational Semantics
  • Fuzzy Logic
  • Explanation Driven Processing.

28
Challenges
  • Highly ambiguous
  • It is often not clear to humans exactly what is referred to.
  • We have placeholder pronouns
  • It rains.
  • We have general reference, where possibly more than one thing is referred to, each to some degree.

29
Challenges: Fuzziness
  • In Fuzzy Logic we can say that the car is 0.70 in parking pocket A and 0.30 in parking pocket B. This is not the same as saying it is in A with 0.70 probability (it would then be either in A or somewhere else).
(Figure: a car straddling parking pockets A and B)
  • Similarly, reference might be to more than one thing to some degree, simultaneously.
30
Developing a program
31
A Classification task
  • Machine Learning
  • Decide yes/no for coreference between an anaphor and a candidate antecedent (a training-pair sketch follows below).
  • Soon, Ng & Lim (2001). A Machine Learning Approach to Coreference Resolution of Noun Phrases. Computational Linguistics, 27(4).
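A rough sketch of how pairwise yes/no training instances could be generated for such a classifier. The pairing scheme (closest annotated antecedent as the positive example, intervening NPs as negatives) follows my reading of Soon, Ng & Lim (2001); the toy mentions and field names are my own.

```python
# Build (candidate antecedent, anaphor, label) training pairs from a toy,
# hand-annotated list of mentions in document order.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Mention:
    text: str
    position: int                 # linear order in the document
    chain_id: Optional[int]       # gold coreference chain; None if not coreferent

mentions = [
    Mention("a cat", 0, 1),
    Mention("a dog", 1, 2),
    Mention("the table", 2, None),
    Mention("the dog", 3, 2),     # anaphor; its closest antecedent is "a dog"
]

def training_pairs(mentions):
    pairs = []
    for j, anaphor in enumerate(mentions):
        if anaphor.chain_id is None:
            continue
        antecedents = [i for i in range(j) if mentions[i].chain_id == anaphor.chain_id]
        if not antecedents:
            continue                              # first mention of its chain
        closest = max(antecedents)
        pairs.append((mentions[closest], anaphor, "yes"))
        for i in range(closest + 1, j):           # NPs in between become negatives
            pairs.append((mentions[i], anaphor, "no"))
    return pairs

for antecedent, anaphor, label in training_pairs(mentions):
    print(f"{antecedent.text:10} <- {anaphor.text:10} {label}")
```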

32
Preprocessing: Tilburg Memory Based Learner
http://pi0657.uvt.nl/
  • Input: Now is a tough time to be a computer maker.
  • 1) tagging, 2) chunking, 3) functional role
    detection


  • [NP1Subject Now/RB] [VP1 is/VBZ] [NP1NP-PRD a/DT tough/JJ time/NN] [VP2 to/TO be/VB] [NP2NP-PRD a/DT computer/NN maker/NN]
33
An example of realistic input
  • [NP1Subject Sun/NNP Microsystems/NNPS] ,/, [P along/IN] [PNP [P with/IN] [NP its/PRP rivals/NNS]] ,/, [VP1 has/VBZ had/VBD to/TO go/VB] to/TO / [NP1Object warp/NN speed/NN] and/CC [VP2 then/RB back/VB] /UNKNOWN ,/, [NP3Subject Scott/NNP McNealy/NNP] ,/, [NP4Subject its/PRP chief/JJ executive/NN] ,/, [VP3 said/VBD] [NP3NP-TMP last/JJ week/NN] ,/, [C as/IN] [NP4Subject Sun/NNP] [VP4 announced/VBD] [C that/IN] [NP5Subject it/PRP] [VP5 would/MD make/VB] [NP5Object a/DT larger-than-expected/JJ loss/NN] [PNP [P in/IN] [NP the/DT current/JJ quarter/NN]] and/CC [VP6 would/MD lay/VB] [PRT off/RP] [NP6Object 3,900/CD workers/NNS] ./.

34
Machine Learning
  • Train a match function for deciding the
    anaphor-antecedent relation.
  • TiMBL
  • Easy to expand the model when more data is
    available.

35
Machine Learning: training
  • We start from a large collection of examples.
  • For each anaphor:
  • construct a match vector for each candidate
  • mark the vector for antecedent (yes/no)
  • The match is calculated for 9 features (see the sketch after this list):
  • string, lemma, suffix (form)
  • subject, object, complement (of the same verb)
  • same functional role
  • grammatical gender
  • number
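A minimal sketch of the match vector for one anaphor-candidate pair, using the nine feature names listed above. The dictionary fields and the exact matching logic are simplifying assumptions on my part; the real feature values would come out of the tagging/chunking/role-detection pipeline shown earlier.

```python
# Build a binary 9-feature match vector for an anaphor-candidate pair.
def match_vector(anaphor, candidate):
    return [
        int(anaphor["string"] == candidate["string"]),   # same surface string
        int(anaphor["lemma"] == candidate["lemma"]),     # same lemma
        int(anaphor["suffix"] == candidate["suffix"]),   # same suffix (form)
        int(candidate["role"] == "subject"),             # candidate is subject (one reading
        int(candidate["role"] == "object"),              #  of the slide's subject/object/
        int(candidate["role"] == "complement"),          #  complement-of-the-same-verb features)
        int(anaphor["role"] == candidate["role"]),       # same functional role
        int(anaphor["gender"] == candidate["gender"]),   # grammatical gender agrees
        int(anaphor["number"] == candidate["number"]),   # number agrees
    ]

anaphor   = {"string": "den", "lemma": "den", "suffix": "", "role": "subject",
             "gender": "common", "number": "sg"}
candidate = {"string": "katt", "lemma": "katt", "suffix": "", "role": "subject",
             "gender": "common", "number": "sg"}

print(match_vector(anaphor, candidate))   # [0, 0, 1, 1, 0, 0, 1, 1, 1]
```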

36
Machine Learning: testing
  • Construct match vectors for the nearest 40 candidates.
  • Check the outcome against the large database.
  • For example, for the first candidate the nearest neighbor has 4 matching features. Collect from the database all exemplars with 3 or 4 matching features (see the sketch below).
  • Outcome: 90 no, 10 yes
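A simplified sketch of the exemplar lookup just described: count how many features the candidate's match vector shares with each stored training exemplar, then tally the yes/no labels of every exemplar within one match of the best overlap (the "3 or 4 matching features" of the example). The toy database is invented; TiMBL performs this kind of lookup for real.

```python
# Memory-based lookup: tally labels of the exemplars closest to the query vector.
from collections import Counter

def n_matches(query, exemplar):
    """Number of feature positions on which the two binary vectors agree."""
    return sum(int(q == e) for q, e in zip(query, exemplar))

def outcome(query, training_data, slack=1):
    """Yes/no counts over all exemplars within `slack` of the best overlap."""
    overlaps = [(n_matches(query, vec), label) for vec, label in training_data]
    best = max(m for m, _ in overlaps)
    return Counter(label for m, label in overlaps if m >= best - slack)

training_data = [                      # toy database of (match vector, label) pairs
    ([0, 0, 1, 1, 0, 0, 1, 1, 1], "yes"),
    ([0, 0, 0, 1, 0, 0, 0, 1, 1], "no"),
    ([1, 1, 1, 0, 1, 0, 0, 1, 1], "yes"),
    ([0, 0, 0, 0, 0, 0, 1, 0, 1], "no"),
    ([0, 0, 1, 0, 0, 1, 1, 1, 0], "no"),
]

query = [0, 0, 1, 1, 0, 0, 1, 1, 0]    # match vector for one candidate
print(outcome(query, training_data))   # yes/no tally for this candidate
```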

37
Machine Learning: testing
  • Repeat for the 40 candidates.
  • Outcome 1: 90 no, 10 yes
  • Outcome 2: 43 no, 5 yes
  • ...
  • Outcome 40: 120 no, 10 yes
  • How to decide for yes or no?

38
Machine Learning: testing
  • Repeat for the 40 candidates.
  • Outcome 1: 90 no, 10 yes
  • Outcome 2: 43 no, 5 yes
  • ...
  • Outcome 40: 120 no, 10 yes
  • How to decide for yes or no?
  • We have decided that the most extreme outcome is probably the best. We have to calculate the expected values for yes/no from the training set (a sketch of the scoring follows below).
  • Score = (Observed_yes - Expected_yes) / stddev_yes - (Observed_no - Expected_no) / stddev_no
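A sketch of that decision rule, using the observed counts from the slide. The expected values and standard deviations below are placeholder numbers; in practice they are estimated from the training set, as the slide says.

```python
# Score each candidate by how far its observed yes/no counts deviate from the
# expected counts, and pick the candidate with the most extreme (highest) score.
def score(obs_yes, obs_no, exp_yes, exp_no, sd_yes, sd_no):
    return (obs_yes - exp_yes) / sd_yes - (obs_no - exp_no) / sd_no

outcomes = {1: (90, 10), 2: (43, 5), 40: (120, 10)}   # candidate -> (no, yes) counts

exp_yes, exp_no = 8.0, 70.0     # assumed expectations from the training set
sd_yes, sd_no = 4.0, 30.0       # assumed standard deviations

scores = {c: score(o_yes, o_no, exp_yes, exp_no, sd_yes, sd_no)
          for c, (o_no, o_yes) in outcomes.items()}
best = max(scores, key=scores.get)

print(scores)
print("proposed antecedent: candidate", best)
```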

39
Conclusion 1(3)
  • Coreference is a very general problem in natural language processing. It also extends into related domains, such as coreference of images.
  • Establishing coreference has many applications: MT, IR, T2S, etc.
  • Coreference is also a phenomenon with inherent difficulties. Coreference might be vague and/or ambiguous. Coreference often depends heavily on background knowledge, which can be difficult to capture in a formal model.

40
Conclusion 2(3)
  • Using Machine Learning to adapt coreference resolution to a textual domain gives us a general method to handle the problem.
  • One problem of Machine Learning is to find the relevant features.
  • Another problem is that the features we use often interact with each other.
  • Vagueness and ambiguity often make it impossible to select only one candidate for co-reference.

41
Conclusion 3
  • Vagueness and ambiguity often make it impossible to select only one candidate for co-reference.
  • There is certainly a problem with evaluation, as
    some mistakes are more serious than others.
  • Machine Learning of co-reference is still a young
    research field. Much work is needed, and many
    good ideas are certain to emerge.

42
Thank you for listening
  • http://ling.uib.no/BREDT/
  • Christer.Johansson@lili.uib.no
  • Lars.Johnsen@lili.uib.no
  • Kaja.Borthen@hf.ntnu.no