Detecting Coreference: Processing discourse reference

Provided by: lili99

1
Detecting Coreference: Processing discourse
reference

Christer Johansson, UiB

2
Those little words

Typical anaphora
Pronouns
The cat bit the dog because it was angry.
Definite Nouns
I saw a cat and a dog. The cat was chasing the
dog.

3
Those little words

Pronouns
We often think of pronouns as some kind of place
holder, or a variable.
Pronouns have some internal structure
Gender
Number
Case
...

4
Those little words

Nouns
Do not confuse words with what they refer to.
"Pigs can fly" is a grammatical sentence even if
it is not true. It is also meaningless if we do
not know what a pig is, or the relations between
all three words.
Words like nouns are in a sense also variables,
although they are more restricted. Compare "cat"
with "it".

5
More Anaphora

Less typical
Predication
John drives a taxi, and Joe studies math. Which
one would you like to meet, the taxi driver or
the math student?
Verb anaphora
Ann sings in the shower. The hollering lasts half
an hour.

6
Coreference

Coreference occurs when the same person, place,
event, or concept is referenced more than once in
a single document. (Amit Bagga)

7
Extension of Coreference

Cross Document Coreference ... occurs when the
same person, place, event, or concept is
referenced more than once in multiple sources
(Amit Bagga)
Essential Information Retrieval problem.
Are two documents about the same things?

8
Extension of Coreference: Images
CBS 2829
CBS 3873
NBC 3885
NBC 5061

Amit Bagga: Significant TV broadcasts are
repeated across and within stations. Combining
text and image recognition can aid detection of
coreferent events.

9
Applications
10
Information Extraction and Retrieval

Q/A systems
Q: Who was king of Norway in 1985?
A: He was king of Norway in 1985.
Reference may find good keywords. Themes are
often referred to.
Reference: more than word form

11
A simplistic example

The lion is the king[1] of the jungle. She[2]
hunts mostly at night. The females[3] live in
groups. The male[4] is much larger, but _[5]
lives alone.
Word form: "lion" accounts for 1 of 26 words
Reference: "lion" accounts for 6 of 26 words
The significance of "lion" increases.
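
The counts on this slide can be reproduced with a short sketch. It assumes a hand-made annotation of which tokens corefer with "lion" (the set `coreferent_tokens` and the `elided_mentions` count are illustrative assumptions, not the output of any resolver).

```python
import re
from collections import Counter

text = ("The lion is the king of the jungle. She hunts mostly at night. "
        "The females live in groups. The male is much larger, but lives alone.")

# Hypothetical hand annotation: surface tokens that corefer with "lion".
coreferent_tokens = {"king", "she", "females", "male"}
# The fifth mention on the slide is an elided subject ("_") with no surface
# token, so it is counted separately.
elided_mentions = 1

tokens = re.findall(r"[a-z]+", text.lower())
counts = Counter(tokens)

word_form_hits = counts["lion"]
reference_hits = (word_form_hits
                  + sum(counts[t] for t in coreferent_tokens)
                  + elided_mentions)

print(f"word form: {word_form_hits} of {len(tokens)} words")
print(f"reference: {reference_hits} of {len(tokens)} words")
```

Counting by reference rather than by word form is what raises the weight of "lion" as a keyword for the passage.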

12
Machine Translation

Det satt en katt-i på bordet-j.
Heldigvis sto det-j stille.
Heldigvis sto den-i stille.
There was a cat on the table. Fortunately, it was
standing still.
Without co-reference, an unambiguous sentence
becomes ambiguous.
Important when translating between languages
that mark case, gender or aspect.

13
Machine Translation

The monkey ate the banana because ...
1) it was hungry: hungry(it=monkey)
2) it was ripe: ripe(it=banana)
3) it was tea time: eat(Agent, Food, When=tea time)

14
Prosody: Text-to-Speech systems

Given information is seldom stressed (Horne & Johansson
1991)
I will never sell my dog. I LOVE the old mutt.
If old mutt and my dog are coreferent, stress is
more likely to move to some other (new)
information.

15
Disambiguation

The lion roams the savannah.
If there is no antecedent, assume the definite NP
refers more generally (to the species).
Cognitive Givenness Hierarchy
A referent must be uniquely identifiable.
There is only one species of lion.
There are many individual lions.

16
Disambiguation

Hunden knäckte benet med sina käkar.
The dog broke the bone with his jaws.
The dog broke the leg with his jaws.
? The dog broke his leg with his jaws.
Swedish benet can mean either "the bone" or "the leg".
If benet can be placed in a reference chain
that identifies it, we are more likely to get the
correct translation.

17
Factors
18
Gender

Grammatical gender
Det sto en katt-x på bordet-y. Den-x/det-y sto
heldigvis stille.
There was a cat on the table. Luckily, it kept
still.
Natural Gender
Den nye eleven-x likte sykkelen-y.
Han-x/hun-x/den-y var rask.
The new student liked the bike. He/She/It was
fast.
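
As a rough sketch of how grammatical gender can narrow the candidate set in the cat/table example, the following filter keeps only candidates whose gender agrees with the pronoun. The two small dictionaries are toy assumptions made for illustration; a real system would need a full lexicon and would also have to handle expletive "det".

```python
# Toy lexicon (an assumption for illustration): noun lemma -> grammatical gender.
GENDER = {"katt": "common", "bord": "neuter"}

# Norwegian inanimate third-person pronouns and the gender they agree with.
PRONOUN_GENDER = {"den": "common", "det": "neuter"}


def compatible_antecedents(pronoun, candidates):
    """Return the candidate lemmas whose gender agrees with the pronoun."""
    wanted = PRONOUN_GENDER[pronoun.lower()]
    return [c for c in candidates if GENDER.get(c) == wanted]


candidates = ["katt", "bord"]
print(compatible_antecedents("den", candidates))  # the cat reading
print(compatible_antecedents("det", candidates))  # the table reading
```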

19
Function of the antecedent: centering

Kari-S var sent ute, så hun-S ringte søsteren-O
sin-S. Hun-S skrek inn i røret.
Kari was late so she called her sister. She
yelled into the blower.
She most likely refers to Kari, as that choice
keeps the focus on her.

20
Determined Noun Phrase

A cat and a dog were fighting outside. The dog
howled like a wolf.
Identify stems in Scandinavian: hund / hunden.
Look at that dog-i. I wouldn't like to meet that
beast-i alone.
Compatible units through semantic network/world
knowledge (ontologies).

21
Negation

Ann didn't see any woman. She was next door.
Ann saw no woman. She was in another room.
Ann didn't talk to her. She was upset.
1) She = Ann 2) She = her
Ann talked to her. She was upset.
1) She = her 2) She = Ann
Ann talked to her, because she was upset. (?)
Ann talked to her. She became upset.
1) She = her ? She = Ann
Cause -> Effect

22
Explanation

The students protested, while the police
observed. Then they attacked.
The students protested, while the police
observed. Then they began to throw stones.
What do students do? And what do police do?
Depends heavily on background knowledge
We might extract some background knowledge from
large collections of text, by observing
statistical relations between subject and verb,
and verb and object.
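
A minimal sketch of the subject-verb idea follows. The `pairs` list stands in for (subject, verb) pairs extracted from a large parsed corpus; its counts are invented purely for illustration, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical (subject, verb) pairs. In practice these would be extracted
# from a large parsed corpus; the counts here are invented.
pairs = ([("police", "attack")] * 8 + [("students", "attack")] * 2 +
         [("students", "throw")] * 7 + [("police", "throw")] * 1)

pair_counts = Counter(pairs)
verb_counts = Counter(verb for _, verb in pairs)


def p_subject_given_verb(subject, verb):
    """Relative frequency of `subject` among all subjects seen with `verb`."""
    if verb_counts[verb] == 0:
        return 0.0
    return pair_counts[(subject, verb)] / verb_counts[verb]


def preferred_subject(verb, candidates):
    """Pick the candidate that most often appears as the subject of `verb`."""
    return max(candidates, key=lambda s: p_subject_given_verb(s, verb))


print(preferred_subject("attack", ["students", "police"]))
print(preferred_subject("throw", ["students", "police"]))
```

With these toy counts the preferred antecedent of "they" changes with the verb, which is the effect the two example sentences illustrate.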

23
Heavy NP

A heavy NP is more likely to be referenced. Heavy
NPs have more modifications.
The small man with the black hat sat in the
corner chatting with a clerk. He seemed relaxed.
Embedded NPs are less likely to be referred to than
top level NPs.
The clerk sat in the corner chatting with a small
man with a black hat. He seemed relaxed.
Interacts with other factors.

24
Semantics

Through part-whole relations
It was a beautiful car. He sat behind the wheel.
Through subordinate - superordinate
He wants a dachshund, but I don't know if he
can take care of a dog.
Through verb anaphora
They captured Saddam last Sunday. The event was
undramatic.

25
More

Co-ordinated NPs.
The cat and dog were fighting. They got hurt.
?
I saw a cat with one eye last night. It was
horrible.
The cat (?)
The eye (?)
Last night (?)
The sight / situation (?)

26
Challenges

Noisy data underlying decisions
Word class tags: 95-98% correct
Functional roles: maybe 80% correct
Spelling errors etc.

27
Challenges

Common knowledge is important, but difficult to
model.
Semantic Networks
Ontologies: what exists in the world
Dynamic Models: change over time
Situational Semantics
Fuzzy Logic
Explanation Driven Processing.

28
Challenges

Highly ambiguous
It is often not clear to humans exactly what is
referred to.
We have place-holding pronouns:
It rains.
We have general reference, where possibly more
than one thing is referred to, each to some degree.

29
Challenges: Fuzziness
In Fuzzy Logic we can say that the car is 0.70 in
parking pocket A and 0.30 in parking pocket
B. This is not the same as saying it is in A
with 0.70 probability. (It would then be either
in A or somewhere else.)
[Figure: car]
Similarly, reference might be to more than one
thing to some degree, simultaneously.
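
The difference between the two readings can be sketched as follows. The function names are hypothetical and the code only illustrates the distinction drawn on the slide: a fuzzy membership places the car in both pockets at once, while a probability describes one exclusive outcome at a time.

```python
import random

# Fuzzy reading: the car occupies both pockets at once, each to a degree.
fuzzy_membership = {"A": 0.70, "B": 0.30}


def fuzzy_location(membership):
    """Every pocket with non-zero membership holds part of the car."""
    return {pocket for pocket, degree in membership.items() if degree > 0}


def sample_location(p_in_a, rng):
    """Probabilistic reading: each outcome puts the car in exactly one place."""
    return "A" if rng.random() < p_in_a else "elsewhere"


print(fuzzy_location(fuzzy_membership))
rng = random.Random(0)
print([sample_location(0.70, rng) for _ in range(5)])
```

Applied to reference, the fuzzy reading would let a pronoun point at several candidate referents simultaneously, each with its own degree.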
30
Developing a program
31
A Classification task

Machine Learning
Decide yes/no for coreference.
Soon, Ng, Lim, 2001. A Machine Learning
Approach to Coreference Resolution of Noun
Phrases. Computational Linguistics, Vol. 27(4).

32
Preprocessing: Tilburg Memory Based Learner
http://pi0657.uvt.nl/

Input: Now is a tough time to be a computer
maker.
1) tagging, 2) chunking, 3) functional role
detection
NP1Subject Now/RB
VP1 is/VBZ
NP1NP-PRD a/DT tough/JJ time/NN
VP2 to/TO be/VB
NP2NP-PRD a/DT computer/NN maker/NN
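
Downstream steps need the word and its tag as separate fields. A minimal sketch for reading the word/TAG tokens shown above (the helper name is hypothetical and this is not part of the Tilburg tools):

```python
def split_tagged_token(token):
    """Split 'word/TAG' at the last slash, so a word containing '/' survives.

    Tokens without a slash are returned with a tag of None.
    """
    word, sep, tag = token.rpartition("/")
    if not sep:
        return token, None
    return word, tag


chunk = "a/DT tough/JJ time/NN"
pairs = [split_tagged_token(t) for t in chunk.split()]
print(pairs)
```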

33
An example of realistic input

NP1Subject Sun/NNP Microsystems/NNPS ,/,
P along/IN PNP P with/IN NP its/PRP
rivals/NNS ,/, VP1 has/VBZ had/VBD to/TO
go/VB to/TO / NP1Object warp/NN
speed/NN and/CC VP2 then/RB back/VB
/UNKNOWN ,/,NP3Subject Scott/NNP McNealy/NNP
,/, NP4Subject its/PRP chief/JJ executive/NN
,/, VP3 said/VBD NP3NP-TMP last/JJ week/NN
,/, C as/IN NP4Subject Sun/NNP VP4
announced/VBD C that/IN NP5Subject it/PRP
VP5 would/MD make/VB NP5Object a/DT
larger-than-expected/JJ loss/NN PNP P
in/IN NP the/DT current/JJ quarter/NN
and/CC VP6 would/MD lay/VB PRT off/RP
NP6Object 3,900/CD workers/NNS ./.

34
Machine Learning

Train a match function for deciding the
anaphor-antecedent relation.
TiMBL
Easy to expand the model when more data is
available.

35
Machine Learning training

We start from a large collection of examples.
For each anaphor:
construct a match vector for each candidate
mark the vector for antecedent (yes/no)
The match is calculated for 9 features:
string, lemma, suffix (form)
subject, object, complement (of the same verb)
same functional role
grammatical gender
number
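
One possible way to build such a match vector is sketched below. The `Mention` fields and the exact definition of each feature are assumptions made for illustration; in particular, the three "of the same verb" features are read here as "the candidate is the subject, object or complement of the anaphor's verb", which the slide does not spell out. During training, each vector would additionally carry the annotated yes/no label.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mention:
    string: str
    lemma: str
    suffix: str
    verb_id: Optional[int]  # index of the governing verb, if any
    role: Optional[str]     # "subject", "object", "complement" or None
    gender: Optional[str]
    number: Optional[str]


def match_vector(anaphor, candidate):
    """Return nine 0/1 features comparing an anaphor with one candidate."""
    same_verb = (anaphor.verb_id is not None
                 and anaphor.verb_id == candidate.verb_id)
    return [
        int(anaphor.string.lower() == candidate.string.lower()),  # string
        int(anaphor.lemma == candidate.lemma),                    # lemma
        int(anaphor.suffix == candidate.suffix),                  # suffix
        int(same_verb and candidate.role == "subject"),
        int(same_verb and candidate.role == "object"),
        int(same_verb and candidate.role == "complement"),
        int(anaphor.role is not None and anaphor.role == candidate.role),
        int(anaphor.gender is not None and anaphor.gender == candidate.gender),
        int(anaphor.number is not None and anaphor.number == candidate.number),
    ]


anaphor = Mention("it", "it", "it", 5, "subject", "neuter", "singular")
candidate = Mention("Sun", "sun", "un", 4, "subject", "neuter", "singular")
vector = match_vector(anaphor, candidate)
print(vector)
```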

36
Machine Learning testing

Construct match vectors for the nearest 40
candidates.
Check the outcome with the large database.
For example, for the first candidate the nearest
neighbor has 4 matching features. Collect from
the database all exemplars with 3 or 4 matching
features.
Outcome: 90 no, 10 yes
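
The lookup described here can be sketched in plain Python. This is not TiMBL's API; the function names, the `margin` parameter and the three-exemplar database are illustrative assumptions. The sketch finds the best overlap in the database, keeps every exemplar whose overlap is within `margin` of it, and tallies the labels.

```python
from collections import Counter


def overlap(a, b):
    """Number of positions where two feature vectors agree."""
    return sum(x == y for x, y in zip(a, b))


def neighbourhood_outcome(query, database, margin=1):
    """Tally labels of exemplars whose overlap is within `margin` of the best."""
    overlaps = [overlap(query, vec) for vec, _ in database]
    threshold = max(overlaps) - margin
    return Counter(label
                   for (_, label), o in zip(database, overlaps)
                   if o >= threshold)


# Toy database of (match vector, label) exemplars, invented for illustration.
database = [
    ([0, 0, 0, 0, 0, 0, 1, 1, 1], "no"),
    ([1, 1, 1, 0, 0, 0, 1, 1, 1], "yes"),
    ([0, 0, 0, 1, 0, 0, 0, 0, 0], "no"),
]
query = [1, 1, 0, 0, 0, 0, 1, 1, 1]

tally = neighbourhood_outcome(query, database)
print(tally["no"], "no,", tally["yes"], "yes")
```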

37
Machine Learning testing

Repeat for the 40 candidates.
Outcome 1: 90 no, 10 yes
Outcome 2: 43 no, 5 yes
...
Outcome 40: 120 no, 10 yes
How to decide for yes or no?

38
Machine Learning testing

We have decided that the most extreme outcome is
probably the best. We have to calculate the expected
values for yes / no from the training set.
Score = (Observed_yes - Expected_yes) / std.dev_yes
      - (Observed_no - Expected_no) / std.dev_no
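
A small sketch of the scoring step follows. The (yes, no) tallies are the three outcomes shown on the slides; the expected values and standard deviations in `stats` are hypothetical placeholders, since the slides do not give the numbers obtained from the training set.

```python
def score(obs_yes, obs_no, exp_yes, sd_yes, exp_no, sd_no):
    """Standardised surplus of 'yes' votes minus standardised surplus of 'no'."""
    return (obs_yes - exp_yes) / sd_yes - (obs_no - exp_no) / sd_no


# (yes, no) tallies for outcomes 1, 2 and 40 from the slides.
outcomes = [(10, 90), (5, 43), (10, 120)]

# Hypothetical training-set statistics, invented for illustration.
stats = dict(exp_yes=8.0, sd_yes=2.0, exp_no=80.0, sd_no=20.0)

scores = [score(yes, no, **stats) for yes, no in outcomes]
best = max(range(len(scores)), key=scores.__getitem__)

print(scores)
print("most extreme candidate index:", best)
```

Standardising both counts lets candidates with neighbourhoods of different sizes be compared on one scale, so the candidate with the highest score is selected as the antecedent.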

39
Conclusion 1(3)

Coreference is a very general problem in natural
language processing. It also extends into related
domains for coreference of images.
Establishing coreference has many applications:
machine translation (MT), information retrieval (IR),
text-to-speech (T2S), etc.
Coreference is also a phenomenon with inherent
difficulties. Coreference might be vague and/or
ambiguous. Coreference often depends heavily on
background knowledge, which can be difficult to
capture in a formal model.

40
Conclusion 2(3)

Using Machine Learning to adapt coreference
resolution to a textual domain gives us a general
method to handle the problem.
One problem of Machine Learning is to find the
relevant features.
Another problem is that the features we use
often interact with each other.
Vagueness and ambiguity often make it impossible
to select only one candidate for co-reference.

41
Conclusion 3(3)

There is certainly a problem with evaluation, as
some mistakes are more serious than others.
Machine Learning of co-reference is still a young
research field. Much work is needed, and many
good ideas are certain to emerge.

42
Thank you for listening