Title: Detecting Coreference: Processing discourse reference
1Detecting Coreference: Processing discourse reference
2Those little words
- Typical anaphora
- Pronouns
- The cat bit the dog because it was angry.
- Definite Nouns
- I saw a cat and a dog. The cat was chasing the
dog.
3Those little words
- Pronouns
- We often think of pronouns as some kind of placeholder, or a variable.
- Pronouns have some internal structure
- Gender
- Number
- Case
- ...
4Those little words
- Nouns
- Do not confuse words with what they refer to.
- "Pigs can fly." is a grammatical sentence even if it is not true. It is also meaningless if we do not know what a pig is, or the relations between all three words.
- Words like nouns are in a sense also variables, although they are more restricted. Compare "cat" with "it".
5More Anaphora
- Less typical
- Predication
- John drives a taxi, and Joe studies math. Which one would you like to meet, the taxi driver or the math student?
- Verb anaphora
- Ann sings in the shower. The hollering lasts half an hour.
6Coreference
- Coreference occurs when the same person, place,
event, or concept is referenced more than once in
a single document. (Amit Bagga)
7Extension of Coreference
- Cross Document Coreference ... occurs when the same person, place, event, or concept is referenced more than once in multiple sources (Amit Bagga)
- Essential Information Retrieval problem.
- Are two documents about the same things?
8Extension of Coreference: Images
[Images labeled CBS 2829, CBS 3873, NBC 3885, NBC 5061]
- Amit Bagga: Significant TV broadcasts are repeated across and within stations. Combining text and image recognition can aid detection of coreferent events.
9Applications
10Information Extraction and Retrieval
- Q/A systems
- Q: Who was king of Norway in 1985?
- A: He was king of Norway in 1985.
- Reference may find good keywords. Themes are often referred to.
- Reference is more than word form
11A simplistic example
- The lion is the king[1] of the jungle. She[2] hunts mostly at night. The females[3] live in groups. The male[4] is much larger, but _[5] lives alone.
- Word form: "lion" is 1 of 26 words
- Reference: "lion" is 6 of 26 words
- The significance of "lion" increases.
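The counts in this example can be reproduced with a minimal sketch. It assumes the passage is tokenized by hand, that "_" marks the unexpressed subject of "lives alone", and that the reference chain for the lion is annotated manually; no resolver is involved.

```python
# The example passage from the slide, tokenized by hand (punctuation dropped).
# "_" stands for the unexpressed subject of "lives alone".
tokens = (
    "The lion is the king of the jungle "
    "She hunts mostly at night "
    "The females live in groups "
    "The male is much larger but _ lives alone"
).split()

# Hand-annotated reference chain for the lion (an assumption of this sketch):
# the explicit word plus the five indexed expressions from the slide.
lion_chain = ["lion", "king", "She", "females", "male", "_"]

# The zero subject is a mention but not a word, so it is left out of the total.
words = [t for t in tokens if t != "_"]

form_count = sum(t.lower() == "lion" for t in words)
reference_count = sum(t in lion_chain for t in tokens)

print(f"word form: {form_count} of {len(words)} words")
print(f"reference: {reference_count} of {len(words)} words")
```

Counting by reference rather than by word form is what raises the weight of "lion" as a keyword for this passage.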
12Machine Translation
- Det satt en katt-i på bordet-j.
- Heldigvis sto det-j stille.
- Heldigvis sto den-i stille.
- There was a cat on the table. Fortunately, it was standing still.
- Without co-reference
- An unambiguous sentence becomes ambiguous.
- Important when translating between languages that mark case, gender or aspect.
13Machine Translation
- The monkey ate the banana because ...
- 1) it was hungry: hungry(it = monkey)
- 2) it was ripe: ripe(it = banana)
- 3) it was tea time: eat(Agent, Food, When = tea time)
14Prosody: Text-to-Speech systems
- Given information is seldom stressed (Horne & Johansson 1991)
- I will never sell my dog. I LOVE the old mutt.
- If "old mutt" and "my dog" are coreferent, stress is more likely to move to some other (new) information.
15Disambiguation
- The lion roams the savannah.
- If there is no antecedent, assume the definite NP refers more generally (to the species).
- Cognitive Givenness Hierarchy
- A referent must be uniquely identifiable.
- There is only one species of lion.
- There are many individual lions.
16Disambiguation
- Hunden knäckte benet med sina käkar.
- The dog broke the bone with his jaws.
- The dog broke the leg with his jaws.
- ? The dog broke his leg with his jaws.
- If "benet" (bone or leg) can be placed in a reference chain that identifies its referent, we are more likely to get the correct translation.
17Factors
18Gender
- Grammatical gender
- Det sto en katt-x på bordet-y. Den-x/det-y sto heldigvis stille.
- There was a cat on the table. Luckily, it kept still.
- Natural Gender
- Den nye eleven-x likte sykkelen-y. Han-x/hun-x/den-y var rask.
- The new student liked the bike. He/She/It was fast.
19Function of the antecedent: centering
- Kari-S var sent ute, så hun-S ringte søsteren-O sin-S. Hun-S skrek inn i røret.
- Kari was late so she called her sister. She yelled into the receiver.
- "She" most likely refers to Kari, as that choice keeps the focus on her.
20Determined Noun Phrase
- A cat and a dog were fighting outside. The dog howled like a wolf.
- Identify stems in Scandinavian: hund / hunden.
- Look at that dog-i. I wouldn't like to meet that beast-i alone.
- Compatible units through semantic network/world knowledge (ontologies).
21Negation
- Ann didn't see any woman. She was next door.
- Ann saw no woman. She was in another room.
- Ann didn't talk to her. She was upset.
- 1) She = Ann 2) She = her
- Ann talked to her. She was upset.
- 1) She = her 2) She = Ann
- Ann talked to her, because she was upset. (?)
- Ann talked to her. She became upset.
- 1) She = her ? She = Ann
- Cause -> Effect
22Explanation
- The students protested, while the police observed. Then they attacked.
- The students protested, while the police observed. Then they began to throw stones.
- What do students do? And what do police do?
- Depends heavily on background knowledge
- We might extract some background knowledge from
large collections of text, by observing
statistical relations between subject and verb,
and verb and object.
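As a rough illustration of that last point, the sketch below counts subject-verb pairs and uses the relative frequencies to prefer an antecedent for "they". The pairs are invented stand-ins for what a parsed corpus might yield, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical (subject, verb) pairs, standing in for pairs harvested from a
# large parsed corpus. The counts are invented for illustration only.
observed_pairs = [
    ("police", "attack"), ("police", "attack"), ("police", "arrest"),
    ("students", "protest"), ("students", "throw"), ("students", "throw"),
    ("students", "attack"), ("police", "observe"),
]

pair_counts = Counter(observed_pairs)
subject_counts = Counter(subject for subject, _ in observed_pairs)


def verb_given_subject(subject: str, verb: str) -> float:
    """Relative frequency of `verb` among all verbs seen with `subject`."""
    if subject_counts[subject] == 0:
        return 0.0
    return pair_counts[(subject, verb)] / subject_counts[subject]


def preferred_antecedent(candidates: list[str], verb: str) -> str:
    """Pick the candidate that most often appears as subject of `verb`."""
    return max(candidates, key=lambda c: verb_given_subject(c, verb))


print(preferred_antecedent(["students", "police"], "attack"))
print(preferred_antecedent(["students", "police"], "throw"))
```

With real data the same idea would need smoothing and far larger counts; here it only shows how the verb can tip the choice between two otherwise equal candidates.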
23Heavy NP
- A heavy NP is more likely to be referenced. Heavy NPs have more modifiers.
- The small man with the black hat sat in the corner chatting with a clerk. He seemed relaxed.
- Embedded NPs are less likely to be referred to than top-level NPs.
- The clerk sat in the corner chatting with a small man with a black hat. He seemed relaxed.
- Interacts with other factors.
24Semantics
- Through part-whole relations
- It was a beautiful car. He sat behind the wheel.
- Through subordinate-superordinate relations
- He wants a dachshund, but I don't know if he can take care of a dog.
- Through verb anaphora
- They captured Saddam last Sunday. The event was undramatic.
25More
- Co-ordinated NPs.
- The cat and dog were fighting. They got hurt.
- ?
- I saw a cat with one eye last night. It was horrible.
- The cat (?)
- The eye (?)
- Last night (?)
- The sight / situation (?)
26Challenges
- Noisy data underlying decisions
- Word class tags: 95-98% correct
- Functional roles: maybe 80% correct
- Spelling errors etc.
27Challenges
- Common knowledge is important, but difficult to model.
- Semantic Networks
- Ontologies: what exists in the world
- Dynamic Models: change over time
- Situational Semantics
- Fuzzy Logic
- Explanation Driven Processing.
28Challenges
- Highly ambiguous
- It is often not clear to humans exactly what is referred to.
- We have place-holding pronouns
- It rains.
- We have general reference, where possibly more than one thing is referred to, each to some degree.
29Challenges: Fuzziness
In Fuzzy Logic we can say that the car is 0.70 in parking pocket A and 0.30 in parking pocket B. This is not the same as saying it is in A with 0.70 probability (it would then be either in A or somewhere else).
[Figure: CAR]
Similarly, reference might be to more than one thing to some degree, simultaneously.
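A small sketch of what graded reference could look like in code, using the "It was horrible" example from the earlier slide. The degrees are invented, and the two helper functions are hypothetical names; the point is only the contrast between keeping several referents and committing to one.

```python
# Hypothetical degrees of reference for "It" in
# "I saw a cat with one eye last night. It was horrible."
# The numbers are invented for illustration; each one says how strongly the
# pronoun refers to that candidate, and several can hold at the same time.
reference_degree = {
    "the cat": 0.6,
    "the eye": 0.5,
    "last night": 0.1,
    "the sight": 0.8,
}


def referents(degrees: dict[str, float], threshold: float) -> list[str]:
    """Return every candidate referred to at least to the given degree."""
    return sorted(c for c, d in degrees.items() if d >= threshold)


def single_best(degrees: dict[str, float]) -> str:
    """The conventional alternative: commit to exactly one antecedent."""
    return max(degrees, key=degrees.get)


print(referents(reference_degree, 0.5))
print(single_best(reference_degree))
```

A yes/no classifier forces the second reading; the fuzzy view keeps the first.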
30Developing a program
31A Classification task
- Machine Learning
- Decide yes/no for coreference.
- Soon, Ng, Lim, 2001. A Machine Learning
Approach to Coreference Resolution of Noun
Phrases. Computational Linguistics, Vol. 27(4).
32Preprocessing: Tilburg Memory Based Learner
http://pi0657.uvt.nl/
- Input: Now is a tough time to be a computer maker.
- 1) tagging, 2) chunking, 3) functional role detection
- NP1Subject Now/RB VP1 is/VBZ NP1NP-PRD a/DT tough/JJ time/NN VP2 to/TO be/VB NP2NP-PRD a/DT computer/NN maker/NN
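To work with output in this shape, one can separate chunk labels from word/tag tokens. The sketch below assumes only what the example line shows: tagged words look like word/TAG and everything else is a chunk or role label. It is not part of the Tilburg tools, and `split_chunked_line` is a hypothetical helper.

```python
def split_chunked_line(line: str) -> tuple[list[str], list[tuple[str, str]]]:
    """Split a chunked line into chunk labels and (word, tag) pairs.

    Assumption of this sketch: a token containing "/" is word/TAG,
    any other token is a chunk or functional-role label.
    """
    labels: list[str] = []
    tagged: list[tuple[str, str]] = []
    for token in line.split():
        if "/" in token:
            # Split on the last slash so a word containing "/" stays intact.
            word, tag = token.rsplit("/", 1)
            tagged.append((word, tag))
        else:
            labels.append(token)
    return labels, tagged


example = (
    "NP1Subject Now/RB VP1 is/VBZ NP1NP-PRD a/DT tough/JJ time/NN "
    "VP2 to/TO be/VB NP2NP-PRD a/DT computer/NN maker/NN"
)
labels, tagged = split_chunked_line(example)
print(labels)
print(tagged)
```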
33An example of realistic input
- NP1Subject Sun/NNP Microsystems/NNPS ,/,
P along/IN PNP P with/IN NP its/PRP
rivals/NNS ,/, VP1 has/VBZ had/VBD to/TO
go/VB to/TO / NP1Object warp/NN
speed/NN and/CC VP2 then/RB back/VB
/UNKNOWN ,/,NP3Subject Scott/NNP McNealy/NNP
,/, NP4Subject its/PRP chief/JJ executive/NN
,/, VP3 said/VBD NP3NP-TMP last/JJ week/NN
,/, C as/IN NP4Subject Sun/NNP VP4
announced/VBD C that/IN NP5Subject it/PRP
VP5 would/MD make/VB NP5Object a/DT
larger-than-expected/JJ loss/NN PNP P
in/IN NP the/DT current/JJ quarter/NN
and/CC VP6 would/MD lay/VB PRT off/RP
NP6Object 3,900/CD workers/NNS ./.
34Machine Learning
- Train a match function for deciding the anaphor-antecedent relation.
- TiMBL
- Easy to expand the model when more data is available.
35Machine Learning training
- We start from a large collection of examples.
- For each anaphor
- construct a match vector for each candidate
- mark the vector for antecedent (yes/no)
- The match is calculated for 9 features
- string, lemma, suffix (form)
- subject, object, complement (of the same verb)
- same functional role
- grammatical gender
- number
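The training step above can be sketched as follows. The `Mention` fields, the three-character suffix test, and the reading of "subject, object, complement (of the same verb)" as three separate features are all assumptions of this sketch, as is the hand annotation of the Norwegian example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A noun phrase or pronoun with hand-annotated properties (hypothetical)."""
    text: str
    lemma: str
    role: str      # "subject", "object" or "complement"
    verb_id: int   # index of the verb the mention depends on
    gender: str
    number: str


def match_vector(anaphor: Mention, candidate: Mention) -> tuple[int, ...]:
    """Nine binary features: 1 where anaphor and candidate match, else 0."""
    same_verb = anaphor.verb_id == candidate.verb_id
    return (
        int(anaphor.text.lower() == candidate.text.lower()),          # string
        int(anaphor.lemma == candidate.lemma),                        # lemma
        int(anaphor.text[-3:].lower() == candidate.text[-3:].lower()),  # suffix
        int(same_verb and candidate.role == "subject"),
        int(same_verb and candidate.role == "object"),
        int(same_verb and candidate.role == "complement"),
        int(anaphor.role == candidate.role),                          # same role
        int(anaphor.gender == candidate.gender),
        int(anaphor.number == candidate.number),
    )


# "Det satt en katt på bordet. Heldigvis sto den stille."
den = Mention("den", "den", "subject", 2, "common", "sg")
katt = Mention("katt", "katt", "subject", 1, "common", "sg")
bordet = Mention("bordet", "bord", "complement", 1, "neuter", "sg")

# One training instance per candidate: the vector plus its yes/no label.
training = [
    (match_vector(den, katt), "yes"),
    (match_vector(den, bordet), "no"),
]
for vector, label in training:
    print(vector, label)
```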
36Machine Learning testing
- Construct match vectors for the nearest 40 candidates.
- Check the outcome with the large database
- For example, for the first candidate the nearest neighbor has 4 matching features. Collect from the database all exemplars with 3 or 4 matching features.
- Outcome: 90 no, 10 yes
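The lookup described here can be approximated in plain Python. This sketch does not call TiMBL; it only mimics the idea of finding the best overlap and collecting every exemplar within the top two overlap levels. The tiny database and the `distance_bands` parameter are invented for illustration.

```python
from collections import Counter

Vector = tuple[int, ...]


def overlap(a: Vector, b: Vector) -> int:
    """Number of positions where two match vectors agree."""
    return sum(x == y for x, y in zip(a, b))


def neighbour_outcome(
    query: Vector,
    exemplars: list[tuple[Vector, str]],
    distance_bands: int = 2,
) -> Counter:
    """Count yes/no labels among exemplars in the best `distance_bands` levels.

    If the nearest exemplar agrees on 4 features and distance_bands is 2,
    exemplars agreeing on 4 or 3 features are collected.
    """
    scored = [(overlap(query, vec), label) for vec, label in exemplars]
    if not scored:
        return Counter()
    best = max(score for score, _ in scored)
    return Counter(label for score, label in scored if score > best - distance_bands)


# Invented miniature database of labelled match vectors.
database = [
    ((0, 0, 0, 0, 0, 0, 1, 1, 1), "yes"),
    ((0, 0, 0, 0, 0, 0, 1, 1, 0), "no"),
    ((0, 0, 0, 0, 0, 0, 0, 1, 1), "yes"),
    ((1, 1, 1, 0, 0, 0, 0, 0, 0), "no"),
]
query = (0, 0, 0, 0, 0, 0, 1, 1, 1)
print(neighbour_outcome(query, database))
```

The returned counts play the role of the "90 no, 10 yes" outcome for one candidate.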
37Machine Learning testing
- Repeat for the 40 candidates.
- Outcome 1: 90 no, 10 yes
- Outcome 2: 43 no, 5 yes
- ...
- Outcome 40: 120 no, 10 yes
- How to decide for yes or no?
38Machine Learning testing
- We have decided that the most extreme is probably the best. We have to calculate the expected values for yes / no from the training set.
- Score = (Observed_yes - Expected_yes) / std.dev_yes - (Observed_no - Expected_no) / std.dev_no
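A sketch of how this score could rank candidates, using the three outcomes listed on the slides. The expected values and standard deviations are invented placeholders; in the described setup they would be estimated from the training set.

```python
import math


def score(obs_yes: float, obs_no: float,
          exp_yes: float, exp_no: float,
          sd_yes: float, sd_no: float) -> float:
    """Standardized surplus of 'yes' votes minus standardized surplus of 'no' votes."""
    return (obs_yes - exp_yes) / sd_yes - (obs_no - exp_no) / sd_no


# Placeholder statistics (assumptions, not values from the talk).
EXP_YES, SD_YES = 5.0, 2.5
EXP_NO, SD_NO = 80.0, 20.0

# (yes, no) counts for three of the 40 candidates, as listed on the slides.
outcomes = {
    "Outcome 1": (10, 90),
    "Outcome 2": (5, 43),
    "Outcome 40": (10, 120),
}

scores = {
    name: score(yes, no, EXP_YES, EXP_NO, SD_YES, SD_NO)
    for name, (yes, no) in outcomes.items()
}
best = max(scores, key=scores.get)

for name, value in scores.items():
    print(f"{name}: {value:.2f}")
print("most extreme:", best)
```

With these placeholder statistics a candidate can win through unusually few "no" votes as well as through unusually many "yes" votes, which is the sense in which the most extreme outcome is selected.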
39Conclusion 1(3)
- Coreference is a very general problem in natural language processing. It also extends into related domains, such as coreference of images.
- Establishing coreference has many applications: MT, IR, T2S, etc.
- Coreference is also a phenomenon with inherent difficulties. Coreference might be vague and/or ambiguous. Coreference often depends heavily on background knowledge, which can be difficult to capture in a formal model.
40Conclusion 2(3)
- Using Machine Learning to adapt coreference resolution to a textual domain gives us a general method to handle the problem.
- One problem of Machine Learning is to find the relevant features.
- Another problem is that the features we use often interact with each other.
- Vagueness and ambiguity often make it impossible to select only one candidate for co-reference.
41Conclusion 3(3)
- There is certainly a problem with evaluation, as some mistakes are more serious than others.
- Machine Learning of co-reference is still a young research field. Much work is needed, and many good ideas are certain to emerge.
42Thank you for listening
- http://ling.uib.no/BREDT/
- Christer.Johansson_at_lili.uib.no
- Lars.Johnsen_at_lili.uib.no
- Kaja.Borthen_at_hf.ntnu.no