BREDT: Processing Reference in Discourse

Transcript and Presenter's Notes
1
BREDT: Processing Reference in Discourse
  • Christer Johansson, UiB
  • Anders Nøklestad, UiO

2
BREDT
  • Discover and determine chains of reference.
  • Fairly simple statistical methods
  • Partial goals
  • Finding selectional restrictions
  • Automatically generate useful semantic structure
    from co-occurrence

3
Goals of BREDT
  • Develop statistical methods and resources for the
    discovery of referential chains in (arbitrary)
    text.

4
Proposals
  • Discourse analysis is a fundamental module of
    language processing (cf. syntax, phonology, and
    morphology).
  • Discourse Analysis can be performed without full
    parsing.
  • DA helps the parser make decisions.

5
Simple examples
  • Pronouns
  • The monkey(1) ate the banana(2) because ...
  •   ... it was hungry.   (it → the monkey)
  •   ... it was ripe.     (it → the banana)
  •   ... it was tea time. (it → a specification of
        the situation)

6
Simple examples
  • Definites
  • Ola ødela armen / Ola broke the arm.
  • Ola broke his arm.
  • The definite form indicates that the noun is
    known. In this case, it can be resolved by the
    common knowledge that a person has an arm (a
    has-a relation).

7
Simple examples
  • Definites
  • The definite signals that something has been
    mentioned before. It initiates a search for
    reference.
  • General reference
  • The lion is a big cat.
  • If there is no previous reference, then the lion
    refers to the species.
  • Cats are hungry.
  • A link could be established to represent the
    knowledge that lions are a sub-group of cats, and
    cats are hungry; therefore lions are hungry.

8
Across Sentence Boundaries
  • Unni was ill. A doctor came to see her. She said
    that she had to be hospitalized, and then she
    wrote her a prescription.
  • What do doctors (or patients) do?
  • Possible to find out from collocations?

9
Applications ...
10
Machine Translation
  • The correct translation of a pronoun depends on
    what it refers to.
  • Translation of a definite noun may depend on its
    informative status.

11
Prosody (e.g. in text-to-speech)
  • Given information is seldom stressed

12
New vs. Given (Horne & Johansson 1991)
  • John wants a dachshund, but I'm not sure he can
    take care of a dog.
  • Dog is given information, because a dachshund is
    a kind of dog.
  • John wants a dog, but I'm not sure he can take
    care of a dachshund.
  • Dachshund is a specification of dog, and
    therefore new information. (The supposition might
    be that a dachshund is more demanding than the
    typical dog. There is usually a reason why
    something is said.)

13
Applications
  • Text-to-speech
  • Given information is likely left unstressed.
  • Discourse Focus
  • Information could be given via semantic
    relations
  • superordinate/subordinate (x is-a y)
  • part/whole (has-a)

14
Information Retrieval
15
Why?
  • Reference is important in information retrieval
    because ...
  • Referring expressions hide key words, which makes
    it hard to find the relevant keywords
    automatically.

16
IR
  • The detection of central themes in a text is
    facilitated by reference detection.
  • Assumption: themes are referred to often.
  • via pronouns
  • via semantic relations

17
There are plenty of applications for BREDT
18
Automatically Finding Features
19
Selectional Restrictions
  • He/she (as subject) + Verb
  • Which verbs have he or she as subjects?
  • Mutual information (given the total occurrences of
    each verb).

20
Separating the living
  • He runs -- The boy runs.
  • A significant he-run association indicates that
    the subject is often living.
  • Crossing more than one verb increases precision.
  • Look at the nouns that go with these verbs.

21
Other features
  • The same technique might detect natural gender,
    and other relevant features.
  • Look at clusters of verbs that have a significant
    difference between he and she as subjects (or
    objects).
  • Look at nouns that go with these verbs.

22
Alternative ways of detecting natural gender
(Hale)
  • Noun-pronoun chains
  • Prob. of noun gender = relative freq. of anaphors
    having that gender (see the sketch below)
  • Detecting antecedents
  • Select the previous noun (only requires POS
    tagging: high recall, low precision)
  • Use a parsed corpus (requires a parser: lower
    recall, higher precision)
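
A minimal Python sketch of the relative-frequency estimate above; the counts and the example noun are invented for illustration, not taken from BREDT data.

```python
from collections import Counter

def gender_distribution(anaphor_genders):
    """Estimate P(gender | noun) as the relative frequency of the
    genders of the anaphors observed to corefer with the noun."""
    counts = Counter(anaphor_genders)
    total = sum(counts.values())
    return {gender: count / total for gender, count in counts.items()}

# Invented example: genders of pronouns linked to one noun by chains
print(gender_distribution(["masc", "masc", "fem", "masc"]))
# {'masc': 0.75, 'fem': 0.25}
```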

23
Mutual information
  • Compare the observed probability of a noun-pronoun
    pair with the probability expected if noun and
    pronoun were independent
  • MI = (Pair/N) / ((Noun/N) · (Pron/N))
  • Take the log of the above
  • The log makes the numbers easier to handle (a
    small sketch follows below)
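
A small Python sketch of the (pointwise) mutual information score above; the function name and the example counts are illustrative assumptions, not BREDT corpus figures.

```python
import math

def pmi(pair_count, noun_count, pron_count, n):
    """Log of the observed pair probability divided by the
    probability expected if noun and pronoun were independent."""
    p_pair = pair_count / n
    p_noun = noun_count / n
    p_pron = pron_count / n
    # Taking the log makes the numbers easier to handle
    return math.log(p_pair / (p_noun * p_pron))

# Illustrative counts only
print(pmi(pair_count=25, noun_count=400, pron_count=1200, n=1_000_000))
```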

24
Making Decisions
25
Decisions for representation
  • The nearest referent is linked. The links can be
    followed back to the first mention, the anchor.
  • Information percolates to the most recent
    antecedent in the chain.
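
One possible way to represent such chains, as a sketch (not the BREDT implementation): each mention stores a link to its nearest antecedent, and the anchor is found by following links back to the first mention.

```python
# Each mention links to the index of its nearest antecedent
# (None marks the first mention, i.e. the anchor)
links = {5: 3, 3: 1, 1: None}   # mention 5 -> 3 -> 1 (anchor)

def anchor(mention, links):
    """Follow antecedent links back to the first mention (the anchor)."""
    while links.get(mention) is not None:
        mention = links[mention]
    return mention

print(anchor(5, links))  # 1
```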

26
Previous Research
27
Hobbs Algorithm (1978)
  • Use parsed syntactic trees.
  • Closeness is defined by tree traversal, which
    differs from a linear (surface) search.
  • Parallelism of syntactic structures.
  • Disadvantages
  • Finding the correct parse trees.
  • Robustness (?)

28
Lappin & Leass Algorithm (1994)
  • Model the salience of discourse constituents.
  • Parsing.
  • Finding functional roles.
  • Ranking: points for matches on salience factors
    (e.g. both antecedent and anaphor are subjects).
  • Disadvantages
  • Finding correct parse trees.
  • This step is often replaced by robust taggers.
  • Robustness (?)
  • Ad Hoc (?).

29
Centering Algorithm (1995)
  • Model the salience of discourse constituents.
  • One item is in Focus (backward-looking /
    forward-looking; single-focus versions).
  • Theoretical account of anaphora.
  • Disadvantages
  • Often criticized for vagueness.
  • Robustness (?)

30
Clustering (1996)
  • Noun Phrase co-reference
  • Discovered from co-occurrence in large corpora.
  • Anaphora(?)
  • Cardie & Wagstaff (1996)
  • Supervised learning - viewing the task as a
    classification task.

31
Our Statistical method
32
Statistical Method
  • Decision Tree Learning
  • Soon, Ng, Lim, 2001. A Machine Learning
    Approach to Coreference Resolution of Noun
    Phrases. Computational Linguistics, Vol. 27(4).
  • The core of the idea is to give each candidate a
    context vector, which is calculated from the
    match between the anaphor and the antecedent on
    some selected features.

33
Statistical Method
  • We want to do something similar to Soon et al.,
    but using Memory-Based Learning or Analogical
    Modeling.
  • Ongoing research
  • Doctoral student Anders Nøklestad
  • University of Oslo, ILF
  • Christer Johansson
  • Lars G. Johnsen: Analogical Modeling

34
Getting Match Vectors
  • Match depends on context vectors.
  •   anaphor          1 2 1 0 1 0
  •   antecedent 1     1 0 1 0 0 0     match 1:  - -
  •   antecedent 2     1 2 1 0 0 0     match 2:  -
  •   antecedent 3     1 0 1 0 1 0     match 3:  -
  •   ...
  •   antecedent 39    1 0 0 0 0 0     match 39: - - -
  •   antecedent 40    0 0 1 0 0 0     match 40: - - -

35
Machine Learning: training
  • We start from a large collection of examples.
  • For each anaphor
  • construct a match vector for each candidate
  • mark the vector for antecedent (yes/no)
  • The match is calculated for 9 features
  • string, lemma, suffix (form)
  • subject, object, complement (of the same verb)
  • same functional role
  • grammatical gender
  • number
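
A minimal sketch of how a match vector over these nine features could be built; the record format, feature keys, and example mentions are assumptions for illustration, not the BREDT code (in particular, the subject/object/complement features would in practice be checked against the same verb).

```python
FEATURES = [
    "string", "lemma", "suffix",        # form
    "subject", "object", "complement",  # role with respect to the verb
    "functional_role",                  # same functional role
    "gender",                           # grammatical gender
    "number",
]

def match_vector(anaphor, candidate):
    """Return a nine-slot vector: 1 where both mentions have the
    feature and the values agree, else 0."""
    return [
        int(f in anaphor and f in candidate and anaphor[f] == candidate[f])
        for f in FEATURES
    ]

# Invented mention records
ana = {"lemma": "hun", "gender": "fem", "number": "sg",
       "functional_role": "subject"}
cand = {"lemma": "Unni", "gender": "fem", "number": "sg",
        "functional_role": "subject"}
print(match_vector(ana, cand))  # [0, 0, 0, 0, 0, 0, 1, 1, 1]
```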

36
More candidates for features
  • Natural gender of proper nouns.
  • Tested; gave about 0.5 better results.
  • Natural gender for nouns.
  • Named Entity Recognition
  • Selectional Restrictions
  • Verbal relations (if you hit somebody, there are
    some possible results ...).

37
Getting Match Vectors
  • When training, we know which candidate was the
    antecedent.
  • The match vectors are stored with their outcome.
  •   anaphor          1 2 1 0 1 0
  •   antecedent 1     1 0 1 0 0 0     match 1:  - -      no
  •   antecedent 2     1 2 1 0 0 0     match 2:  -        yes
  •   antecedent 3     1 0 1 0 1 0     match 3:  -        no
  •   ...
  •   antecedent 39    1 0 0 0 0 0     match 39: - - -    no
  •   antecedent 40    0 0 1 0 0 0     match 40: - - -    no

38
Percolation of Match Vectors
  • Percolation: if we have determined a referential
    link between, say, antecedents 40 and 39, then all
    matches of 40 are available at position 39. (The
    position numbers are not stored.) A sketch of this
    idea follows the table below.
  • The match vectors are stored with their outcome.
  •   anaphor          1 2 1 0 1 0
  •   antecedent 1     1 0 1 0 0 0     match 1:  - -      no
  •   antecedent 2     1 2 1 0 0 0     match 2:  -        yes
  •   antecedent 3     1 0 1 0 1 0     match 3:  -        no
  •   ...
  •   antecedent 39    1 0 0 0 0 0     match 39: - -      no
  •   antecedent 40    0 0 1 0 0 0     match 40: - - -    no
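
A sketch of the percolation step in Python: once a link between mentions 40 and 39 has been decided, the feature values known for 40 become available at 39 (the positions themselves are not stored). The merge rule below (keep existing values, fill in missing ones) is an assumption for illustration.

```python
def percolate(recent, antecedent):
    """Copy feature values from a linked antecedent into the more
    recent mention, without overwriting values already known there."""
    for feature, value in antecedent.items():
        recent.setdefault(feature, value)
    return recent

mention_39 = {"lemma": "doctor", "number": "sg"}
mention_40 = {"gender": "fem", "functional_role": "subject"}
# After linking 39 to 40, mention 39 also carries 40's feature values
print(percolate(mention_39, mention_40))
```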

39
Machine Learning: testing
  • Construct match vectors for the nearest 40
    candidates.
  • Check the outcome against the large database
    (training data).
  • For example, for the first candidate the nearest
    neighbor has 4 matching features. Collect from
    the database all exemplars with 3 or 4 matching
    features.
  • Outcome: 90 no, 10 yes
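
A sketch of this exemplar-collection step; the actual classification in BREDT is done with TiMBL, so the overlap measure and database format below are only illustrative.

```python
def overlap(v1, v2):
    """Number of feature positions on which two match vectors agree."""
    return sum(int(a == b) for a, b in zip(v1, v2))

def count_outcomes(query, database, min_overlap):
    """Count yes/no outcomes among stored exemplars whose match
    vector agrees with the query on at least min_overlap features."""
    counts = {"yes": 0, "no": 0}
    for vector, outcome in database:
        if overlap(query, vector) >= min_overlap:
            counts[outcome] += 1
    return counts

# Invented database of (match vector, outcome) training exemplars
db = [([1, 1, 0, 1, 0, 0, 1, 1, 1], "yes"),
      ([1, 0, 0, 0, 0, 0, 1, 0, 1], "no"),
      ([0, 1, 0, 1, 0, 0, 1, 1, 0], "no")]
print(count_outcomes([1, 1, 0, 1, 0, 0, 1, 1, 1], db, min_overlap=7))
```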

40
Machine Learning: testing
  • Repeat for the 40 candidates.
  • Outcome 1: 90 no, 10 yes
  • Outcome 2: 43 no, 5 yes
  • ...
  • Outcome 40: 120 no, 10 yes
  • How to decide for yes or no?

41
Machine Learning: testing
  • Repeat for the 40 candidates.
  • Outcome 1: 90 no, 10 yes
  • Outcome 2: 43 no, 5 yes
  • ...
  • Outcome 40: 120 no, 10 yes
  • How to decide for yes or no?
  • We have decided that the most extreme candidate is
    probably the best. We have to calculate the
    expected values for yes / no from the training set.
  • Z-scores (Chi-square is also possible).
  • Score = (Observed_yes - Expected_yes) / std.dev_yes
        - (Observed_no - Expected_no) / std.dev_no

42
Machine Learning: testing
  • On training corpus
  • Apply leave-one-out testing with TiMBL (test each
    instance against the rest of the database)
  • Calculate mean and std.dev for match strength for
    each class (yes and no)

43
Machine Learning: testing
  • For each candidate in the new text:
  • Classify with TiMBL
  • For each class, find the deviation from the class
    mean
  • Express the deviation in numbers of std.dev.
  • Select the candidate with the largest difference
    between the number of std.dev. from the yes mean
    and the number of std.dev. from the no mean
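
A sketch of this z-score selection in Python, assuming the class means and standard deviations have been estimated from the leave-one-out runs on the training data; all names and numbers below are illustrative, not BREDT values.

```python
def z_score_selection(candidates, stats):
    """Pick the candidate whose outcome counts deviate most towards
    the yes mean and away from the no mean, in std.dev. units."""
    def score(c):
        z_yes = (c["yes"] - stats["yes"]["mean"]) / stats["yes"]["std"]
        z_no = (c["no"] - stats["no"]["mean"]) / stats["no"]["std"]
        return z_yes - z_no
    return max(candidates, key=score)

# Class statistics estimated from the training set (illustrative values)
stats = {"yes": {"mean": 8.0, "std": 4.0},
         "no": {"mean": 90.0, "std": 30.0}}
# Outcome counts for each candidate (as on the earlier slides)
candidates = [{"id": 1, "yes": 10, "no": 90},
              {"id": 2, "yes": 5, "no": 43},
              {"id": 40, "yes": 10, "no": 120}]
print(z_score_selection(candidates, stats)["id"])
```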

44
Results: Pronoun Anaphora
  • Precision: 45.0 (std.dev 9.1)
  • Recall: 66.4 (std.dev 2.6)
  • F-ratio: 53.0 (std.dev 6.5)
  • Results are best when looking 40 candidates back;
    results for 20 candidates are lower.
  • Results with z-score selection are much better
    than the baselines:
  • (TiMBL: prec. 53.8, recall 2.2, F 4.2)
  • (Closest: prec. 12.5, recall 18, F 14.8)

45
Results: Pronoun Anaphora
  • Frequency weighting gave worse results.
  • Frequency weighting gives a higher weight to the
    positions where we expect more antecedents.

46
Results: Pronoun Anaphora vs. Chunk Tagging
  • Chunk tagging (CoNLL-2000) is the task of
    selecting brackets for phrases.
  • z-score selection does not work (!)
  • frequency weighting works (favoring tags with
    higher frequency provides better results). (!)
  • Classic crossover.

47
Pronoun Anaphora
  • Why is pronoun anaphora different?
  • A Poisson process?
  • Each candidate has a low probability of being an
    antecedent.
  • The selection is repeated until a best candidate
    is found.

48
Poisson Method
  • Selecting a neighborhood

49
Conclusion
  • Anaphora Resolution
  • is a fuzzy task
  • choose the best candidate
  • often there is not a majority for a yes,
  • because there are inherently more no-answers than
    yes-answers.

50
Future Research
  • Getting better results
  • Finding better features
  • Changing a found antecedent to the last item in
    the reference chain towards the anaphor.
  • Evaluation issues
  • Prefer links that avoid hooking everything up
    into one big chain, since this destroys precision.

51
Future Work
  • Getting larger training sets
  • Automatically annotate large amounts of texts.
  • Check automatic annotation by hand
  • Retrain.
  • Domain specificity: genre, text type (dialogue,
    monologue ...).

52
Thank you for listening
  • http://ling.uib.no/BREDT/
  • Christer.Johansson@lili.uib.no
  • Lars.Johnsen@lili.uib.no
  • Kaja.Borthen@hf.ntnu.no

53
State of the art
  • We have found very little research in Scandinavia
    on this topic.
  • The Message Understanding Conferences (MUC-1 to 7)
    included approaches to co-reference.