Title: BREDT: Processing Reference in Discourse
1 BREDT: Processing Reference in Discourse
- Christer Johansson, UiB
- Anders Nøklestad, UiO
2 BREDT
- Discover and determine chains of reference.
- Fairly simple statistical methods
- Partial goals
- Finding selectional restrictions
- Automatically generate useful semantic structure
from co-occurrence
3 Goals of BREDT
- Develop statistical methods and resources for the
discovery of referential chains in (arbitrary)
text.
4 Proposals
- Discourse analysis is a fundamental module of language processing (cf. syntax, phonology, and morphology).
- Discourse analysis can be performed without full parsing.
- DA helps the parser make decisions.
5 Simple examples
- Pronouns
- The monkey₁ ate the banana₂ because ...
- it was hungry. (it → the monkey)
- it was ripe. (it → the banana)
- it was tea time. (it → a specification of the situation)
6 Simple examples
- Definites
- Ola ødela armen. (literally: Ola broke the arm.)
- Idiomatic English: Ola broke his arm.
- The definite form indicates that the noun is known. In this case, it can be resolved by common knowledge that a person has an arm (a has-a relation).
7 Simple examples
- Definites
- The definite signals that something has been mentioned before. It initiates a search for a referent.
- General reference
- The lion is a big cat.
- If there is no previous reference, then "the lion" refers to the species.
- Cats are hungry.
- A link could be established to represent the knowledge that lions are a sub-group of cats, and cats are hungry; therefore lions are hungry.
8 Across Sentence Boundaries
- Unni was ill. A doctor came to see her. She said that she had to be hospitalized, and then she wrote her a prescription.
- What do doctors (or patients) do?
- Possible to find out from collocations?
9 Applications ...
10 Machine Translation
- The correct translation of a pronoun depends on what it refers to.
- The translation of a definite noun may depend on its informative status.
11 Prosody (e.g. in text-to-speech)
- Given information is seldom stressed
12 New vs. Given (Horne & Johansson 1991)
- John wants a dachshund, but I'm not sure he can take care of a dog.
- "Dog" is given information, because a dachshund is a kind of dog.
- John wants a dog, but I'm not sure he can take care of a dachshund.
- "Dachshund" is a specification of "dog", and therefore new information. (The supposition might be that a dachshund is more demanding than the typical dog. There is usually a reason why something is said.)
13 Applications
- Text-to-speech
- Given information is likely left unstressed.
- Discourse Focus
- Information could be given via semantic relations:
- superordinate/subordinate (x is-a y)
- part/whole (has-a)
14 Information Retrieval
15 Why?
- Reference is important in information retrieval because ...
- Referring expressions hide key words,
- which makes it hard to automatically find the relevant keywords.
16 IR
- The detection of central themes in a text is facilitated by reference detection.
- Assumption: themes are referred to often
- via pronouns
- via semantic relations
17 There are plenty of applications for BREDT
18 Automatically Finding Features
19 Selectional Restrictions
- He/she (subject) + Verb
- Which verbs have "he" or "she" as subjects?
- Mutual Information (given the total occurrences of each verb).
20 Separating the living
- "He runs" -- "The boy runs."
- A significant he-run association gives an indication that the subject is often living.
- Crossing more than one verb increases precision.
- Look at the nouns that go with these verbs (see the sketch below).
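To make the two steps concrete, here is a minimal Python sketch. The (subject, verb) pairs and the 0.5 threshold are illustrative assumptions, not the project's actual data or criterion:

```python
from collections import Counter

# Toy (subject, verb) pairs standing in for corpus counts.
pairs = [("he", "run"), ("she", "run"), ("he", "sleep"),
         ("boy", "run"), ("dog", "sleep"), ("stone", "fall"),
         ("she", "sleep"), ("rain", "fall")]

verb_total = Counter(v for _, v in pairs)
pronoun_verb = Counter(v for s, v in pairs if s in ("he", "she"))

# Step 1: verbs where at least half of the observed subjects are he/she
# (the 0.5 threshold is illustrative; a significance test could be used).
living_verbs = {v for v in verb_total
                if pronoun_verb[v] / verb_total[v] >= 0.5}

# Step 2: nouns seen as subjects of those verbs are candidates
# for the feature "living".
living_nouns = {s for s, v in pairs
                if v in living_verbs and s not in ("he", "she")}

print(living_verbs)  # {'run', 'sleep'}
print(living_nouns)  # {'boy', 'dog'}
```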
21 Other features
- The same technique might detect natural gender, and other relevant features.
- Look at clusters of verbs that have a significant difference between "he" and "she" as subjects (or objects).
- Look at nouns that go with these verbs.
22 Alternative ways of detecting natural gender (Hale)
- Noun-pronoun chains
- Probability of a noun's gender ≈ the relative frequency of anaphors having that gender
- Detecting antecedents
- Select the previous noun (only requires POS tagging; high recall, low precision)
- Use a parsed corpus (requires a parser; lower recall, higher precision)
23 Mutual information
- Compare the observed probability of a noun-pronoun pair with its probability under independence:
- MI = (Pair/N) / ((Noun/N) * (Pron/N))
- Take the log of the above
- The log makes the numbers easier to handle (see the sketch below).
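As a sketch, the score can be computed as follows; the counts are hypothetical placeholders for corpus frequencies:

```python
import math

def pmi(pair_count, noun_count, pron_count, n):
    """log of (Pair/N) / ((Noun/N) * (Pron/N))."""
    observed = pair_count / n
    independent = (noun_count / n) * (pron_count / n)
    return math.log(observed / independent)

# Hypothetical counts: a noun-pronoun pair seen 30 times in N = 100,000.
print(pmi(pair_count=30, noun_count=500, pron_count=2000, n=100_000))
```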
24 Making Decisions
25 Decisions for representation
- The nearest referent is linked. The links can be followed to the first mention, the anchor (see the sketch below).
- Information percolates to the most recent antecedent in the chain.
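A minimal sketch of this representation, assuming a simple linked-mention structure (the class and attribute names are illustrative):

```python
class Mention:
    def __init__(self, text, antecedent=None):
        self.text = text
        self.antecedent = antecedent  # link to the nearest referent

    def anchor(self):
        """Follow the links back to the first mention (the anchor)."""
        mention = self
        while mention.antecedent is not None:
            mention = mention.antecedent
        return mention

# "Unni was ill. A doctor came to see her. She said ..."
unni = Mention("Unni")
her = Mention("her", antecedent=unni)  # linked to the nearest referent
she = Mention("She", antecedent=her)
print(she.anchor().text)  # Unni
```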
26 Previous Research
27 Hobbs' Algorithm (1978)
- Uses parsed syntactic trees.
- Closeness is defined differently from linear search.
- Parallelism of syntactic structures.
- Disadvantages
- Finding the correct parse trees.
- Robustness (?)
28 Lappin & Leass Algorithm (1994)
- Model the salience of discourse constituents.
- Parsing.
- Finding functional roles.
- Ranking: points for a match on factors (e.g. a match when both antecedent and anaphor are subjects).
- Disadvantages
- Finding correct parse trees.
- This step is often replaced by robust taggers.
- Robustness (?)
- Ad hoc (?)
29 Centering Algorithm (1995)
- Model the salience of discourse constituents.
- One item is in Focus (backward-looking / forward-looking single-focus versions).
- Theoretical account of anaphora.
- Disadvantages
- Often criticized for vagueness.
- Robustness (?)
30 Clustering (1996)
- Noun phrase co-reference
- Discovered from co-occurrence in large corpora.
- Anaphora (?)
- Cardie & Wagstaff (1996)
- Supervised learning: viewing the task as a classification task.
31 Our Statistical Method
32 Statistical Method
- Decision Tree Learning
- Soon, Ng & Lim (2001). A Machine Learning Approach to Coreference Resolution of Noun Phrases. Computational Linguistics 27(4).
- The core idea is to give each candidate a context vector, which is calculated from the match between the anaphor and the antecedent on some selected features.
33 Statistical Method
- We want to do something similar to Soon et al., but using Memory-Based Learning or Analogical Modeling.
- Ongoing research:
- Doctoral student Anders Nøklestad, Univ. of Oslo, ILF
- Christer Johansson
- Lars G. Johnsen (Analogical Modeling)
34 Getting Match Vectors
- Match depends on context vectors.
- anaphor: 1 2 1 0 1 0
- antecedent 1: 1 0 1 0 0 0 → match vector 1
- antecedent 2: 1 2 1 0 0 0 → match vector 2
- antecedent 3: 1 0 1 0 1 0 → match vector 3
- ...
- antecedent 39: 1 0 0 0 0 0 → match vector 39
- antecedent 40: 0 0 1 0 0 0 → match vector 40
35 Machine Learning: training
- We start from a large collection of examples.
- For each anaphor:
- construct a match vector for each candidate
- mark the vector for antecedent (yes/no)
- The match is calculated for 9 features (see the sketch below):
- string, lemma, suffix (form)
- subject, object, complement (of the same verb)
- same functional role
- grammatical gender
- number
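As a sketch of the vector construction, assuming the mentions arrive as feature bundles from a tagger or shallow parser (names and values are illustrative):

```python
FEATURES = ["string", "lemma", "suffix", "subject", "object",
            "complement", "functional_role", "gram_gender", "number"]

def match_vector(anaphor, candidate):
    """1 where both mentions have a value for the feature and agree."""
    return [int(anaphor.get(f) is not None
                and anaphor.get(f) == candidate.get(f))
            for f in FEATURES]

anaphor = {"lemma": "hun", "functional_role": "subject",
           "gram_gender": "fem", "number": "sg"}
candidate = {"lemma": "Unni", "functional_role": "subject",
             "gram_gender": "fem", "number": "sg"}
print(match_vector(anaphor, candidate))  # [0, 0, 0, 0, 0, 0, 1, 1, 1]
```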
36 More candidates for features
- Natural gender of proper nouns.
- tested; gave about 0.5 better results.
- Natural gender for nouns.
- Named Entity Recognition
- Selectional Restrictions
- Verbal relations (if you hit somebody, it has some possible results ...).
37 Getting Match Vectors
- When training, we know which candidate was the antecedent.
- The match vectors are stored with their outcome.
- anaphor: 1 2 1 0 1 0
- antecedent 1: 1 0 1 0 0 0 → match vector 1: no
- antecedent 2: 1 2 1 0 0 0 → match vector 2: yes
- antecedent 3: 1 0 1 0 1 0 → match vector 3: no
- ...
- antecedent 39: 1 0 0 0 0 0 → match vector 39: no
- antecedent 40: 0 0 1 0 0 0 → match vector 40: no
38 Percolation of Match Vectors
- Percolation: if we have determined a referential link between, say, antecedents 40 and 39, then all matches of 40 are available at position 39. (The position numbers are not stored; see the sketch below.)
- The match vectors are stored with their outcome.
- anaphor: 1 2 1 0 1 0
- antecedent 1: 1 0 1 0 0 0 → match vector 1: no
- antecedent 2: 1 2 1 0 0 0 → match vector 2: yes
- antecedent 3: 1 0 1 0 1 0 → match vector 3: no
- ...
- antecedent 39: 1 0 0 0 0 0 → match vector 39: no
- antecedent 40: 0 0 1 0 0 0 → match vector 40: no
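A minimal sketch of percolation, under the assumption that the chain keeps a running union of feature values so that what was matched further back is visible at the most recent antecedent:

```python
def percolate(chain_features, new_features):
    """Merge a newly linked mention's features into the chain so that
    they are available at the most recent antecedent."""
    merged = dict(chain_features)
    for feature, value in new_features.items():
        if value is not None:
            merged[feature] = value
    return merged

# After linking antecedent 40 to antecedent 39, position 39 carries
# the union of the features (values are illustrative).
antecedent_40 = {"lemma": "doktor", "gram_gender": "masc"}
antecedent_39 = {"lemma": "hun", "number": "sg"}
print(percolate(antecedent_40, antecedent_39))
# {'lemma': 'hun', 'gram_gender': 'masc', 'number': 'sg'}
```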
39 Machine Learning: testing
- Construct match vectors for the nearest 40 candidates.
- Check the outcome against the large database (the training data).
- For example, suppose that for the first candidate the nearest neighbor has 4 matching features. Collect from the database all exemplars with 3 or 4 matching features (see the sketch below).
- Outcome: 90 no, 10 yes
40 Machine Learning: testing
- Repeat for the 40 candidates.
- Outcome 1: 90 no, 10 yes
- Outcome 2: 43 no, 5 yes
- ...
- Outcome 40: 120 no, 10 yes
- How to decide for yes or no?
41 Machine Learning: testing
- Repeat for the 40 candidates.
- Outcome 1: 90 no, 10 yes
- Outcome 2: 43 no, 5 yes
- ...
- Outcome 40: 120 no, 10 yes
- How to decide for yes or no?
- We have decided that the most extreme candidate is probably the best. We have to calculate the expected values for yes/no from the training set.
- Z-scores (chi-square also possible):
- score(yes) = (Observed(yes) - Expected(yes)) / std.dev(yes)
- score(no) = (Observed(no) - Expected(no)) / std.dev(no)
42 Machine Learning: testing
- On the training corpus:
- Apply leave-one-out testing with TiMBL (test each instance against the rest of the database)
- Calculate the mean and std.dev. of the match strength for each class (yes and no)
43 Machine Learning: testing
- For each candidate in a new text:
- Classify with TiMBL
- For each class, find the deviation from the class mean
- Express the deviation in number of std.dev.
- Select the candidate with the largest difference between the number of std.dev. from the yes mean and the number of std.dev. from the no mean (see the sketch below)
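Putting slides 41-43 together, a sketch of the z-score selection; the class means and standard deviations are made-up placeholders for the values obtained from leave-one-out testing on the training corpus:

```python
YES_MEAN, YES_STD = 8.0, 4.0   # hypothetical training statistics
NO_MEAN, NO_STD = 85.0, 20.0

def z_score(observed, expected, std):
    return (observed - expected) / std

def select_antecedent(outcomes):
    """outcomes: list of (candidate_id, yes_count, no_count).
    Pick the candidate most extreme in the yes direction."""
    def extremeness(item):
        _, yes, no = item
        return (z_score(yes, YES_MEAN, YES_STD)
                - z_score(no, NO_MEAN, NO_STD))
    return max(outcomes, key=extremeness)[0]

# Outcomes from the slides: candidate 2 wins despite fewer raw "yes"
# votes, because it deviates most from the expected "no" count.
outcomes = [(1, 10, 90), (2, 5, 43), (40, 10, 120)]
print(select_antecedent(outcomes))  # 2
```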
44 Results: Pronoun Anaphora
- Precision 45.0 (std.dev. 9.1)
- Recall 66.4 (std.dev. 2.6)
- F-ratio 53.0 (std.dev. 6.5)
- Results are best when looking 40 candidates back; results for 20 candidates are lower.
- Results with z-score selection are much better than the baselines:
- (TiMBL: precision 53.8, recall 2.2, F 4.2)
- (Closest: precision 12.5, recall 18, F 14.8)
45 Results: Pronoun Anaphora
- Frequency weighting gave worse results.
- Frequency weighting gives a higher weight to the positions where we expect more antecedents.
46 Results: Pronoun Anaphora vs. Chunk Tagging
- Chunk tagging (CoNLL-2000) is the task of selecting brackets for phrases.
- z-score selection does not work (!)
- frequency weighting works (favoring tags with higher frequency provides better results) (!)
- A classic crossover.
47 Pronoun Anaphora
- Why is pronoun anaphora different?
- A Poisson process?
- Each candidate has a low probability of being an antecedent.
- The selection is repeated until a best candidate is found.
48 Poisson Method
49 Conclusion
- Anaphora resolution
- is a fuzzy task
- choose the best candidate
- often there is no majority for yes,
- because there are inherently more no answers than yes answers.
50 Future Research
- Getting better results
- Finding better features
- Changing a found antecedent to the last item in the reference chain towards the anaphor.
- Evaluation issues
- Prefer links that avoid hooking everything up into one big chain; this destroys precision.
51 Future Work
- Getting larger training sets
- Automatically annotate large amounts of text.
- Check the automatic annotation by hand.
- Retrain.
- Domain specificity: genre, text type (dialogue, monologue ...).
52 Thank you for listening
- http://ling.uib.no/BREDT/
- Christer.Johansson@lili.uib.no
- Lars.Johnsen@lili.uib.no
- Kaja.Borthen@hf.ntnu.no
53 State of the art
- We have found very little research in Scandinavia on this topic.
- The Message Understanding Conferences (MUC 1..7) contained approaches to co-reference.