CSA4040: Advanced Topics in NLP

Slides: 29
Provided by: MikeR2


1
CSA4040: Advanced Topics in NLP
  • Information Extraction III
  • Coreference

2
References
  • D. Appelt and D. Israel, Introduction to IE
    Technology, IJCAI-99 Tutorial (1999)
  • K. van Deemter and R. Kibble, What is
    coreference and what should coreference
    annotation be? (1999)
  • J. F. McCarthy and W. Lehnert, Using Decision
    Trees for Coreference Resolution, Proc. IJCAI 1995

3
What is Coreference?
  • The relation of coreference has been defined as
    holding between two noun phrases if they "refer
    to the same entity".
  • NPs α and β corefer if ref(α) = ref(β)
  • Equivalence relation: a symmetric, transitive and
    reflexive relation which partitions NPs into a
    set of equivalence classes.
  • Issue: must reference actually be identified in
    order to establish coreference?
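
Because coreference is an equivalence relation, a set of pairwise links is enough to recover the classes. A minimal sketch, assuming mentions are given as hashable labels and links as pairs of labels (the function name and the union-find technique are illustrative choices, not part of the original slides):

```python
from collections import defaultdict

def coreference_classes(mentions, links):
    """Partition mentions into equivalence classes from pairwise links."""
    parent = {m: m for m in mentions}

    def find(m):
        # Walk up to the class representative, shortening the path as we go.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in links:
        # Merging two representatives applies symmetry and transitivity.
        parent[find(a)] = find(b)

    classes = defaultdict(list)
    for m in mentions:
        classes[find(m)].append(m)
    return sorted(classes.values())

# Hypothetical mention labels: m1-m2-m3 are linked, m4 stands alone.
print(coreference_classes(["m1", "m2", "m3", "m4"],
                          [("m1", "m2"), ("m2", "m3")]))
```

A mention with no links ends up in a singleton class, which is the reflexive case.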

4
Different Kinds of Noun Phrase
  • Proper Nouns
      • Single Word
      • Multiple Word
  • Pronouns
  • Descriptions
      • Definite
      • Indefinite

5
Coreference Tags in MUC6
  • The ID and REF attributes are used to indicate
    that there is a coreference link between two
    strings. The ID is arbitrarily but uniquely
    assigned to the string during markup. The REF
    uses that ID to indicate the coreference link.
    <COREF ID="100">Lawson Mardon Group Ltd.</COREF> said
    <COREF ID="101" TYPE="IDENT" REF="100">it</COREF> ...
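
The ID/REF scheme can be read back mechanically. A rough sketch, assuming non-nested COREF elements with double-quoted attributes (real MUC6 markup can nest, which this regular expression does not handle; the function names are hypothetical):

```python
import re

COREF_RE = re.compile(r'<COREF\s+([^>]*)>(.*?)</COREF>', re.DOTALL)
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_coref(text):
    """Return {id: (string, ref_or_None)} for non-nested COREF elements."""
    mentions = {}
    for attrs, string in COREF_RE.findall(text):
        a = dict(ATTR_RE.findall(attrs))
        mentions[a["ID"]] = (string, a.get("REF"))
    return mentions

def antecedent_chain(mentions, mention_id):
    """Follow REF links from a mention back to the start of its chain."""
    chain = [mention_id]
    ref = mentions[mention_id][1]
    while ref is not None and ref not in chain:  # stop on cycles
        chain.append(ref)
        ref = mentions[ref][1] if ref in mentions else None
    return chain

sample = ('<COREF ID="100">Lawson Mardon Group Ltd.</COREF> said '
          '<COREF ID="101" TYPE="IDENT" REF="100">it</COREF> ...')
mentions = parse_coref(sample)
print(antecedent_chain(mentions, "101"))
```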

6
Proper Nouns
  • Obvious case: two separate occurrences of the
    same proper noun.
    Paris is the capital of France. Paris is beautiful.
  • Or of identical phrases:
    Mr. Dom Mintoff is as Mr. Dom Mintoff does.
  • But note that similar tokens do not always
    corefer, even when they are proper nouns. Example:
    Chris Attard met Chris Attard for the first time.
  • Conversely, different-looking tokens can corefer:
    Genève, Geneva, Ginevra, Genf

7
Amo et al. 1999
  • Definition of the replicantes relation between
    variants of Spanish proper names. Names are in
    a replicancia relation if
      • One of them coincides with the initials of the
        other.
      • The shorter is contained in the longer.
      • Every word of the shorter is a version of some
        word in the longer.
  • Examples: Jose Luis Martinez Lopez, JL
    Martinez, J.L. Martinez, J Martinez, Luis
    Martinez, Jose Martinez, Martinez, JL, M, L
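
The three conditions can be approximated in a few lines. This is only a sketch under stated assumptions, not the definition from Amo et al.: a "version" of a word is taken to be the word itself or its initial letter, all-capital tokens such as JL or J.L. are split into single initials, and "shorter" is measured in characters. All function names are hypothetical.

```python
def tokens(name):
    """Split a name into words; 'JL' or 'J.L.' become the initials J, L."""
    out = []
    for tok in name.replace(".", " ").split():
        if tok.isupper():
            out.extend(tok)   # ASSUMPTION: all-capital token = run of initials
        else:
            out.append(tok)
    return out

def is_version(short_word, long_word):
    # ASSUMPTION: a "version" is the same word or its initial letter.
    if short_word.lower() == long_word.lower():
        return True
    return len(short_word) == 1 and long_word[:1].upper() == short_word.upper()

def in_replicancia(a, b):
    short, long_ = sorted((tokens(a), tokens(b)),
                          key=lambda t: sum(len(w) for w in t))
    if not short:
        return False
    # Condition 1: one name coincides with the initials of the other.
    if [w.upper() for w in short] == [w[0].upper() for w in long_]:
        return True
    # Condition 2: the shorter is contained (contiguously) in the longer.
    s, l, n = [w.lower() for w in short], [w.lower() for w in long_], len(short)
    if any(l[i:i + n] == s for i in range(len(l) - n + 1)):
        return True
    # Condition 3: every word of the shorter is a version of some longer word.
    return all(any(is_version(w, v) for v in long_) for w in short)

full = "Jose Luis Martinez Lopez"
variants = ["JL Martinez", "J.L. Martinez", "J Martinez", "Luis Martinez",
            "Jose Martinez", "Martinez", "JL", "M", "L"]
print(all(in_replicancia(full, v) for v in variants))
```

Under these assumptions every variant listed on the slide is accepted, while a name with an unrelated word (for example a different first name) is rejected.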

8
Pronouns
  • Most pronouns refer to an antecedent which occurs
    earlier in the text (not necessarily in the same
    sentence).
    John came into the room. He shivered.
  • The pronoun is said to be in an anaphoric
    relation to the antecedent.
  • Determination of reference can require large
    amounts of knowledge processing.
    The police refused the demonstrators a permit
    because they feared/advocated violence.

9
Anaphor versus Coreference
  • Anaphoric and coreferential relations often
    coincide, but
  • Not all coreferential relations are anaphoric.
    The Rector made his speech. Ellul-Micallef, 55
  • Not all anaphoric relations are coreferential.
    Every man loves his mother.
  • Use the substitution test to determine whether
    there is coreference. If there is a change in
    meaning then the expressions do not corefer.
    Every man loves every man's mother.

10
Descriptions
  • Definite Descriptions
      • The Armonk based giant
      • The head honcho at Microsoft
      • The richest man in the world
  • Indefinite Descriptions
      • A rogue state
      • Some participants arrived. When they left

11
Indefinite Noun Phrases
  • Indefinites usually introduce new referents into
    text and are therefore unlikely to refer to
    earlier items.
    A cat crept into the room. It leaped out.
  • There are exceptions when indefinites refer to a
    class.
    Cars are fast. A car can reach 200 kph.

12
Example
Motor Vehicles International Ltd. announced a
major management shakeup. MVI said that its CEO
had resigned. The big automaker is attempting to
regain market share. It will announce significant
losses for the third quarter. A company spokesman
said the company will be moving their operations.
13
Two Approaches to Coreference
  • Knowledge Engineering
      • Based on adapting theoretical work on
        coreference to the sparse and incomplete
        analyses obtained in IE.
  • Automatically Trained Systems
      • Small range of approaches
      • Probabilistic and non-probabilistic

14
Algorithm - KE Approach
  • Identify each noun phrase
  • Mark each noun phrase with
      • type information, animacy etc.
      • agreement info (number/gender)
      • syntactic features (definiteness)
      • possibly other semantic information from
        dictionary (e.g. vehicle/furniture/transport)

15
Algorithm - KE Approach
  • Try to distinguish between referring expressions
    and referents
      • Length
      • Syntactic criteria (proper noun/description)
      • Internal Structure
      • Gazetteer
  • For each referring expression
      • Determine accessible antecedents
      • Filter with type check
      • Rank candidates
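
The per-expression loop above can be sketched as follows. Everything specific here is an assumption made for illustration: the Mention fields, the window sizes, treating names as referents that are never resolved, and ranking by recency alone. The sample mentions are hand-annotated from the MVI example text.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Mention:
    text: str
    sentence: int             # index of the sentence containing the mention
    kind: str                 # "name", "definite" or "pronoun"
    number: str               # "sg" or "pl"
    sem_type: Optional[str]   # e.g. "company"; None when unknown

# ASSUMPTION: illustrative accessibility windows, measured in sentences.
WINDOW = {"definite": 5, "pronoun": 1}

def is_accessible(expr, cand):
    return expr.sentence - cand.sentence <= WINDOW[expr.kind]

def is_compatible(expr, cand):
    # Type check: number must agree; an unknown semantic type matches anything.
    same_type = (None in (expr.sem_type, cand.sem_type)
                 or expr.sem_type == cand.sem_type)
    return expr.number == cand.number and same_type

def resolve(mentions):
    """Map each referring expression's index to its antecedent's index."""
    links = {}
    for i, expr in enumerate(mentions):
        if expr.kind == "name":
            continue  # names are treated as referents, not resolved
        candidates = [j for j in range(i)
                      if is_accessible(expr, mentions[j])
                      and is_compatible(expr, mentions[j])]
        # Rank candidates: the most recent surviving candidate wins.
        links[i] = max(candidates) if candidates else None
    return links

mentions = [
    Mention("Motor Vehicles International Ltd.", 0, "name", "sg", "company"),
    Mention("its CEO", 1, "definite", "sg", "person"),
    Mention("The big automaker", 2, "definite", "sg", "company"),
    Mention("It", 3, "pronoun", "sg", None),
]
print(resolve(mentions))
```

A fuller system would rank on more than recency (for example syntactic role), but the accessibility, filter and rank stages keep the same shape.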

16
Accessibility of Antecedents
  • Names: entire preceding text, possibly other
    documents in collection.
      • Match using aliases/acronyms
  • Definite noun phrases: part of the preceding
    text
      • same sentence, previous sentence, previous
        paragraph etc.
  • Pronouns: same, but smaller stretches of
    preceding text (pronouns rarely refer across
    paragraph boundaries).

17
Filter for Semantic Consistency
  • Number/Gender
    John lost Mary's trousers. Then he/she found
    them/her.
  • Semantic Type
    John dropped the hammer on the glass table. It
    shattered/bounced.
    Mike tried to save CSAI. The Department was
    burning. The chief acted heroically.

18
State of the Art
  • MUC6: precision .72, recall .63
  • MUC7: UPENN's high precision system, precision
    .80, recall .30
  • The fact that IE parsers are incomplete, shallow
    and not fully reliable motivates the statistical
    approaches.
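
To compare the two operating points with a single number, precision and recall are often combined into the balanced F-measure (their harmonic mean). The slides do not report F-scores; the small sketch below only derives them from the figures quoted above.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

for name, p, r in [("MUC6", 0.72, 0.63), ("MUC7 (UPENN)", 0.80, 0.30)]:
    print(f"{name}: F = {f_measure(p, r):.2f}")
```

The high-precision MUC7 system scores lower on this measure because its recall is much smaller.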

19
Statistical Approaches
  • Motivated by errors introduced by earlier
    processing
  • Supervised learning
  • Data: determine, by inspection, the correct
    coreference links or classes.

20
Methodology of Statistical Methods
  • Produce tagged data in which all coreference
    pairs are tagged as such.
  • Determine which system-recognisable features of
    the individual expressions are relevant to the
    co-reference judgement.
  • Apply some learning technique to the resulting
    feature vectors.

21
UMASS RESOLVE (McCarthy and Lehnert 1995)
  • Which features of phrases to look for when
    determining coreference?
  • How to combine evidence (+ve and -ve)?
  • Accumulation of errors arising from earlier
    stages of processing rather than from coreference
    procedures.
  • RESOLVE uses a decision tree based on Quinlan's
    C4.5 system, induced from training examples, to
    decide whether pre-established pairs of referents
    are likely to be coreferent.

22
Resolve Method
  • Extract references along with coreference links
    from text using CMI (coref. marking interf.
    tool).
  • Create all possible pairings of references, and
    reduce each pair to a feature vector
    (patterns, POS tags, semantic features, context).
  • Coreferent pairs are +ve instances (others are
    -ve instances).
  • RESOLVE is then trained on different partitions
    of the set of feature vectors.
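
The pairing and labelling step can be sketched as below, assuming each extracted reference already carries the identifier of its annotated coreference chain. The reference labels and chain identifiers are hypothetical, and feature extraction is left out.

```python
from itertools import combinations

def training_instances(references, chain_of):
    """Return (ref_a, ref_b, label) for every unordered pair of references."""
    instances = []
    for a, b in combinations(references, 2):
        # Pairs in the same annotated chain are +ve, all others -ve.
        label = "+" if chain_of[a] == chain_of[b] else "-"
        instances.append((a, b, label))
    return instances

# Hypothetical annotation: r1 and r3 belong to chain A, r2 to chain B.
references = ["r1", "r2", "r3"]
chain_of = {"r1": "A", "r2": "B", "r3": "A"}
for instance in training_instances(references, chain_of):
    print(instance)
```

With n references this yields n(n-1)/2 instances, most of them negative, which is one reason the choice of training partitions matters.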

23
Example
  • FAMILYMART CO. OF SEIBU SAISON GROUP WILL OPEN A
    CONVENIENCE STORE IN TAIPEI FRIDAY IN A JOINT
    VENTURE WITH TAIWAN'S LARGEST CAR DEALER, THE
    COMPANY SAID WEDNESDAY

24
Information Available via CMI
  (string "FAMILYMART CO."
   slots (ENTITY
           (name FAMILYMART CO.)
           (type COMPANY)
           (relationship JV-PARENT CHILD)))
  (string "TAIWAN'S LARGEST CAR DEALER."
   slots (ENTITY
           (type COMPANY)
           (relationship JV-PARENT)
           (nationality Taiwan (COUNTRY))))

25
Feature Vectors for FAMILYMART CO. / TAIWAN'S LARGEST CAR DEALER

Individual Phrases          Pair of Phrases
Attribute      Value        Attribute       Value
NAME-1         YES          ALIAS           NO
JV-CHILD-1     NO           BOTH-JV-CHLD    NO
NAME-2         YES          COMMON-NP       NO
JV-CHILD-2     NO           SAME-SENT       YES
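
Before a learner can use them, the YES/NO attributes are typically turned into a fixed-order vector. A small sketch using the values from the table above; the attribute order and the 0/1 encoding are illustrative assumptions, not taken from the paper.

```python
# Attribute order is an ASSUMPTION; any fixed order works if used consistently.
ATTRIBUTES = ["NAME-1", "JV-CHILD-1", "NAME-2", "JV-CHILD-2",
              "ALIAS", "BOTH-JV-CHLD", "COMMON-NP", "SAME-SENT"]

# Values copied from the table for this pair of phrases.
pair_features = {
    "NAME-1": "YES", "JV-CHILD-1": "NO",
    "NAME-2": "YES", "JV-CHILD-2": "NO",
    "ALIAS": "NO", "BOTH-JV-CHLD": "NO",
    "COMMON-NP": "NO", "SAME-SENT": "YES",
}

def encode(features):
    """Turn a YES/NO attribute dict into a 0/1 vector in ATTRIBUTES order."""
    return [1 if features[name] == "YES" else 0 for name in ATTRIBUTES]

print(encode(pair_features))
```
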
26
Decision Tree (after McCarthy and Lehnert 1995)
27
MUC-5 Rules
  • IF both tokens contain the same company name THEN
    they are coreferent
  • IF both tokens contain different company
    names THEN they are not coreferent
  • IF both tokens contain a common phrase THEN they
    are coreferent
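
The three rules can be written as a small classifier. This is a sketch under assumptions: each token is a dict with an optional "name" and a collection of "phrases" (both field names hypothetical), and the rules are tried in the order listed, so the company-name rules take priority over the common-phrase rule.

```python
def muc5_rules(a, b):
    """Return True (coreferent), False (not coreferent) or None (no rule fires)."""
    name_a, name_b = a.get("name"), b.get("name")
    if name_a and name_b:
        # Rules 1 and 2: same company name vs. different company names.
        return name_a == name_b
    if set(a.get("phrases", ())) & set(b.get("phrases", ())):
        # Rule 3: the tokens share a common phrase.
        return True
    return None

# Hypothetical tokens for illustration.
x = {"name": "FAMILYMART CO.", "phrases": ["company"]}
y = {"name": "FAMILYMART CO."}
z = {"phrases": ["company"]}
print(muc5_rules(x, y), muc5_rules(x, z), muc5_rules(y, z))
```

Returning None when no rule applies makes the coverage gap of a hand-written rule set explicit.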

28
Results