Title: BREDT: Processing Reference in Discourse
1 BREDT: Processing Reference in Discourse
- Christer Johansson, UiB
- Anders Nøklestad, UiO
2 BREDT
- Discover and determine chains of reference.
- Fairly simple statistical methods
- Partial goals
- Finding selectional restrictions
- Automatically generate useful semantic structure
from co-occurrence
3 Goals of BREDT
- Develop statistical methods and resources for the
discovery of referential chains in (arbitrary)
text.
4 Proposals
- Discourse analysis is a fundamental module of language processing (cf. syntax, phonology, and morphology).
- Discourse analysis can be performed without full parsing.
- DA helps the parser make decisions.
5 Simple examples
- Pronouns
- The monkey₁ ate the banana₂ because ...
- it was hungry. (it → the monkey)
- it was ripe. (it → the banana)
- it was tea time. (it → a specification of the situation)
6 Simple examples
- Definites
- Ola ødela armen. (literally: Ola broke the arm.)
- Idiomatic English: Ola broke his arm.
- The definite form indicates that the noun is known. In this case, it can be resolved by common knowledge that a person has an arm (a has-a relation).
7 Simple examples
- Definites
- The definite signals that something has been mentioned before. It initiates a search for a referent.
- General reference
- The lion is a big cat.
- If there is no previous reference, then "the lion" refers to the species.
- Cats are hungry.
- A link could be established to represent the knowledge that lions are a sub-group of cats, and cats are hungry; therefore lions are hungry.
8 Across Sentence Boundaries
- Unni was ill. A doctor came to see her. She said that she had to be hospitalized, and then she wrote her a prescription.
- What do doctors (or patients) do?
- Possible to find out from collocations?
9 Applications ...
10 Machine Translation
- The correct translation of a pronoun depends on what it refers to.
- The translation of a definite noun may depend on its informative status.
11 Prosody (e.g. in text-to-speech)
- Given information is seldom stressed
12 New vs. Given (Horne & Johansson 1991)
- John wants a dachshund, but I'm not sure he can take care of a dog.
- "Dog" is given information, because a dachshund is a kind of dog.
- John wants a dog, but I'm not sure he can take care of a dachshund.
- "Dachshund" is a specification of "dog", and therefore new information. (The supposition might be that a dachshund is more demanding than the typical dog. There is usually a reason why something is said.)
13 Applications
- Text-to-speech
- Given information is likely left unstressed.
- Discourse Focus
- Information could be given via semantic relations:
- superordinate/subordinate (x is-a y)
- part/whole (has-a)
14 Information Retrieval
15 Why?
- Reference is important in information retrieval because ...
- Referring expressions hide key words,
- which makes it hard to automatically find the relevant keywords.
16 IR
- The detection of central themes in a text is facilitated by reference detection.
- Assumption: themes are referred to often
- via pronouns
- via semantic relations
17 There are plenty of applications for BREDT
18 Automatically Finding Features
19 Selectional Restrictions
- He/she (subject) + Verb
- Which verbs have "he" or "she" as subjects?
- Mutual Information (given the total occurrences of each verb).
20 Separating the living
- "He runs" -- "The boy runs."
- A significant he-run association gives an indication that the subject is often living.
- Crossing more than one verb increases precision.
- Look at the nouns that go with these verbs (see the sketch below).
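To make the two steps concrete, here is a minimal Python sketch. The (subject, verb) pairs and the 0.5 threshold are illustrative assumptions, not the project's actual data or criterion:

```python
from collections import Counter

# Toy (subject, verb) pairs standing in for corpus counts.
pairs = [("he", "run"), ("she", "run"), ("he", "sleep"),
         ("boy", "run"), ("dog", "sleep"), ("stone", "fall"),
         ("she", "sleep"), ("rain", "fall")]

verb_total = Counter(v for _, v in pairs)
pronoun_verb = Counter(v for s, v in pairs if s in ("he", "she"))

# Step 1: verbs where at least half of the observed subjects are he/she
# (the 0.5 threshold is illustrative; a significance test could be used).
living_verbs = {v for v in verb_total
                if pronoun_verb[v] / verb_total[v] >= 0.5}

# Step 2: nouns seen as subjects of those verbs are candidates
# for the feature "living".
living_nouns = {s for s, v in pairs
                if v in living_verbs and s not in ("he", "she")}

print(living_verbs)  # {'run', 'sleep'}
print(living_nouns)  # {'boy', 'dog'}
```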
21 Other features
- The same technique might detect natural gender, and other relevant features.
- Look at clusters of verbs that have a significant difference between "he" and "she" as subjects (or objects).
- Look at nouns that go with these verbs.
22 Alternative ways of detecting natural gender (Hale)
- Noun-pronoun chains
- Probability of a noun's gender ≈ the relative frequency of anaphors having that gender
- Detecting antecedents
- Select the previous noun (only requires POS tagging; high recall, low precision)
- Use a parsed corpus (requires a parser; lower recall, higher precision)
23 Mutual information
- Compare the observed probability of a noun-pronoun pair with its probability under independence:
- MI = (Pair/N) / ((Noun/N) * (Pron/N))
- Take the log of the above
- The log makes the numbers easier to handle (see the sketch below).
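As a sketch, the score can be computed as follows; the counts are hypothetical placeholders for corpus frequencies:

```python
import math

def pmi(pair_count, noun_count, pron_count, n):
    """log of (Pair/N) / ((Noun/N) * (Pron/N))."""
    observed = pair_count / n
    independent = (noun_count / n) * (pron_count / n)
    return math.log(observed / independent)

# Hypothetical counts: a noun-pronoun pair seen 30 times in N = 100,000.
print(pmi(pair_count=30, noun_count=500, pron_count=2000, n=100_000))
```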
24 Making Decisions
25 Decisions for representation
- The nearest referent is linked. The links can be followed to the first mention, the anchor (see the sketch below).
- Information percolates to the most recent antecedent in the chain.
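A minimal sketch of this representation, assuming a simple linked-mention structure (the class and attribute names are illustrative):

```python
class Mention:
    def __init__(self, text, antecedent=None):
        self.text = text
        self.antecedent = antecedent  # link to the nearest referent

    def anchor(self):
        """Follow the links back to the first mention (the anchor)."""
        mention = self
        while mention.antecedent is not None:
            mention = mention.antecedent
        return mention

# "Unni was ill. A doctor came to see her. She said ..."
unni = Mention("Unni")
her = Mention("her", antecedent=unni)  # linked to the nearest referent
she = Mention("She", antecedent=her)
print(she.anchor().text)  # Unni
```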
26 Previous Research
27 Hobbs' Algorithm (1978)
- Uses parsed syntactic trees.
- Closeness is defined differently from linear search.
- Parallelism of syntactic structures.
- Disadvantages
- Finding the correct parse trees.
- Robustness (?)
28 Lappin & Leass Algorithm (1994)
- Model the salience of discourse constituents.
- Parsing.
- Finding functional roles.
- Ranking: points for a match on factors (e.g. a match when both antecedent and anaphor are subjects).
- Disadvantages
- Finding correct parse trees.
- This step is often replaced by robust taggers.
- Robustness (?)
- Ad hoc (?)
29 Centering Algorithm (1995)
- Model the salience of discourse constituents.
- One item is in Focus (backward-looking / forward-looking single-focus versions).
- Theoretical account of anaphora.
- Disadvantages
- Often criticized for vagueness.
- Robustness (?)
30 Clustering (1996)
- Noun phrase co-reference
- Discovered from co-occurrence in large corpora.
- Anaphora (?)
- Cardie & Wagstaff (1996)
- Supervised learning: viewing the task as a classification task.
31 Our Statistical Method
32 Statistical Method
- Decision Tree Learning
- Soon, Ng & Lim (2001). A Machine Learning Approach to Coreference Resolution of Noun Phrases. Computational Linguistics 27(4).
- The core idea is to give each candidate a context vector, which is calculated from the match between the anaphor and the antecedent on some selected features.
33 Statistical Method
- We want to do something similar to Soon et al., but using Memory-Based Learning or Analogical Modeling.
- Ongoing research:
- Doctoral student Anders Nøklestad, Univ. of Oslo, ILF
- Christer Johansson
- Lars G. Johnsen (Analogical Modeling)
34 Getting Match Vectors
- Match depends on context vectors.
- anaphor: 1 2 1 0 1 0
- antecedent 1: 1 0 1 0 0 0 → match vector 1
- antecedent 2: 1 2 1 0 0 0 → match vector 2
- antecedent 3: 1 0 1 0 1 0 → match vector 3
- ...
- antecedent 39: 1 0 0 0 0 0 → match vector 39
- antecedent 40: 0 0 1 0 0 0 → match vector 40
35 Machine Learning: training
- We start from a large collection of examples.
- For each anaphor:
- construct a match vector for each candidate
- mark the vector for antecedent (yes/no)
- The match is calculated for 9 features (see the sketch below):
- string, lemma, suffix (form)
- subject, object, complement (of the same verb)
- same functional role
- grammatical gender
- number
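As a sketch of the vector construction, assuming the mentions arrive as feature bundles from a tagger or shallow parser (names and values are illustrative):

```python
FEATURES = ["string", "lemma", "suffix", "subject", "object",
            "complement", "functional_role", "gram_gender", "number"]

def match_vector(anaphor, candidate):
    """1 where both mentions have a value for the feature and agree."""
    return [int(anaphor.get(f) is not None
                and anaphor.get(f) == candidate.get(f))
            for f in FEATURES]

anaphor = {"lemma": "hun", "functional_role": "subject",
           "gram_gender": "fem", "number": "sg"}
candidate = {"lemma": "Unni", "functional_role": "subject",
             "gram_gender": "fem", "number": "sg"}
print(match_vector(anaphor, candidate))  # [0, 0, 0, 0, 0, 0, 1, 1, 1]
```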
36 More candidates for features
- Natural gender of proper nouns.
- tested; gave about 0.5 better results.
- Natural gender for nouns.
- Named Entity Recognition
- Selectional Restrictions
- Verbal relations (if you hit somebody, it has some possible results ...).
37 Getting Match Vectors
- When training, we know which candidate was the antecedent.
- The match vectors are stored with their outcome.
- anaphor: 1 2 1 0 1 0
- antecedent 1: 1 0 1 0 0 0 → match vector 1: no
- antecedent 2: 1 2 1 0 0 0 → match vector 2: yes
- antecedent 3: 1 0 1 0 1 0 → match vector 3: no
- ...
- antecedent 39: 1 0 0 0 0 0 → match vector 39: no
- antecedent 40: 0 0 1 0 0 0 → match vector 40: no
38 Percolation of Match Vectors
- Percolation: if we have determined a referential link between, say, antecedents 40 and 39, then all matches of 40 are available at position 39. (The position numbers are not stored; see the sketch below.)
- The match vectors are stored with their outcome.
- anaphor: 1 2 1 0 1 0
- antecedent 1: 1 0 1 0 0 0 → match vector 1: no
- antecedent 2: 1 2 1 0 0 0 → match vector 2: yes
- antecedent 3: 1 0 1 0 1 0 → match vector 3: no
- ...
- antecedent 39: 1 0 0 0 0 0 → match vector 39: no
- antecedent 40: 0 0 1 0 0 0 → match vector 40: no
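A minimal sketch of percolation, under the assumption that the chain keeps a running union of feature values so that what was matched further back is visible at the most recent antecedent:

```python
def percolate(chain_features, new_features):
    """Merge a newly linked mention's features into the chain so that
    they are available at the most recent antecedent."""
    merged = dict(chain_features)
    for feature, value in new_features.items():
        if value is not None:
            merged[feature] = value
    return merged

# After linking antecedent 40 to antecedent 39, position 39 carries
# the union of the features (values are illustrative).
antecedent_40 = {"lemma": "doktor", "gram_gender": "masc"}
antecedent_39 = {"lemma": "hun", "number": "sg"}
print(percolate(antecedent_40, antecedent_39))
# {'lemma': 'hun', 'gram_gender': 'masc', 'number': 'sg'}
```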
39 Machine Learning: testing
- Construct match vectors for the nearest 40 candidates.
- Check the outcome against the large database (the training data).
- For example, suppose that for the first candidate the nearest neighbor has 4 matching features. Collect from the database all exemplars with 3 or 4 matching features (see the sketch below).
- Outcome: 90 no, 10 yes
40 Machine Learning: testing
- Repeat for the 40 candidates.
- Outcome 1: 90 no, 10 yes
- Outcome 2: 43 no, 5 yes
- ...
- Outcome 40: 120 no, 10 yes
- How to decide for yes or no?
41 Machine Learning: testing
- Repeat for the 40 candidates.
- Outcome 1: 90 no, 10 yes
- Outcome 2: 43 no, 5 yes
- ...
- Outcome 40: 120 no, 10 yes
- How to decide for yes or no?
- We have decided that the most extreme candidate is probably the best. We have to calculate the expected values for yes/no from the training set.
- Z-scores (chi-square also possible):
- score(yes) = (Observed(yes) - Expected(yes)) / std.dev(yes)
- score(no) = (Observed(no) - Expected(no)) / std.dev(no)
42 Machine Learning: testing
- On the training corpus:
- Apply leave-one-out testing with TiMBL (test each instance against the rest of the database)
- Calculate the mean and std.dev. of the match strength for each class (yes and no)
43 Machine Learning: testing
- For each candidate in a new text:
- Classify with TiMBL
- For each class, find the deviation from the class mean
- Express the deviation in number of std.dev.
- Select the candidate with the largest difference between the number of std.dev. from the yes mean and the number of std.dev. from the no mean (see the sketch below)
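Putting slides 41-43 together, a sketch of the z-score selection; the class means and standard deviations are made-up placeholders for the values obtained from leave-one-out testing on the training corpus:

```python
YES_MEAN, YES_STD = 8.0, 4.0   # hypothetical training statistics
NO_MEAN, NO_STD = 85.0, 20.0

def z_score(observed, expected, std):
    return (observed - expected) / std

def select_antecedent(outcomes):
    """outcomes: list of (candidate_id, yes_count, no_count).
    Pick the candidate most extreme in the yes direction."""
    def extremeness(item):
        _, yes, no = item
        return (z_score(yes, YES_MEAN, YES_STD)
                - z_score(no, NO_MEAN, NO_STD))
    return max(outcomes, key=extremeness)[0]

# Outcomes from the slides: candidate 2 wins despite fewer raw "yes"
# votes, because it deviates most from the expected "no" count.
outcomes = [(1, 10, 90), (2, 5, 43), (40, 10, 120)]
print(select_antecedent(outcomes))  # 2
```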
44 Results: Pronoun Anaphora
- Precision 45.0 (std.dev. 9.1)
- Recall 66.4 (std.dev. 2.6)
- F-ratio 53.0 (std.dev. 6.5)
- Results are best when looking 40 candidates back; results for 20 candidates are lower.
- Results with z-score selection are much better than the baselines:
- (TiMBL: precision 53.8, recall 2.2, F 4.2)
- (Closest: precision 12.5, recall 18, F 14.8)
45 Results: Pronoun Anaphora
- Frequency weighting gave worse results.
- Frequency weighting gives a higher weight to the positions where we expect more antecedents.
46 Results: Pronoun Anaphora vs. Chunk Tagging
- Chunk tagging (CoNLL-2000) is the task of selecting brackets for phrases.
- z-score selection does not work (!)
- frequency weighting works (favoring tags with higher frequency provides better results) (!)
- A classic crossover.
47 Pronoun Anaphora
- Why is pronoun anaphora different?
- A Poisson process?
- Each candidate has a low probability of being an antecedent.
- The selection is repeated until a best candidate is found.
48 Poisson Method
49 Conclusion
- Anaphora resolution
- is a fuzzy task
- choose the best candidate
- often there is no majority for yes,
- because there are inherently more no answers than yes answers.
50 Future Research
- Getting better results
- Finding better features
- Changing a found antecedent to the last item in the reference chain towards the anaphor.
- Evaluation issues
- Prefer links that avoid hooking everything up into one big chain; this destroys precision.
51 Future Work
- Getting larger training sets
- Automatically annotate large amounts of text.
- Check the automatic annotation by hand.
- Retrain.
- Domain specificity: genre, text type (dialogue, monologue ...).
52 Thank you for listening
- http://ling.uib.no/BREDT/
- Christer.Johansson@lili.uib.no
- Lars.Johnsen@lili.uib.no
- Kaja.Borthen@hf.ntnu.no
53 State of the art
- We have found very little research in Scandinavia on this topic.
- The Message Understanding Conferences (MUC 1..7) contained approaches to co-reference.