1
Exploiting Subjectivity Classification to Improve
Information Extraction
  • Ellen Riloff
  • University of Utah
  • Janyce Wiebe
  • University of Pittsburgh
  • William Phillips
  • University of Utah

2
Subjectivity?
  • Definition: Subjective language expresses or
    refers to opinions, emotions, sentiments and
    other private states.
  • Related Work
  • Sentiments (Turney & Littman 2003; Dave,
    Lawrence, & Pennock 2003; Pang & Lee 2004)
  • Product Reputation Tracking (Morinaga et al.
    2002; Yi et al. 2003)
  • Opinion Oriented Summarization and QA (Hu & Liu
    2004; Yu & Hatzivassiloglou 2003)
  • Opinion - personal beliefs
  • Emotion - state of mind
  • Sentiments - positive/negative judgements

3
Motivation
  • Our observation: many false hits produced by
    Information Extraction (IE) systems come from
    subjective sentences.
  • Hypothesis: we can improve IE performance by
    avoiding extractions from subjective sentences.

4
Examples
  • D'Aubuisson unleashed harsh attacks on Duarte
  • The Parliament exploded into fury against the
    government when word leaked out
  • The subversives must suspend the aggression
    against the people and the destruction of the
    economy

5
The Big Picture
[Flow diagram. Labels: Subjective Sentence Classifier; subjective sentences; objective sentences; Full Information Extraction; Selective Information Extraction]

6
The Subjectivity Classifier
  • Most documents contain a mix of subjective and
    objective sentences
  • 44% of sentences in newspaper articles are
    subjective!
  • (Wiebe et al. 2004)
  • We used the Naïve Bayes subjective sentence
    classifier developed by Wiebe & Riloff 2005.
  • Classifies at sentence level
  • unsupervised
  • rivals best supervised methods

7
Initial Training Data Creation
[Flow diagram. Labels: unlabeled texts; subjective clues; rule-based subjective sentence classifier; rule-based objective sentence classifier; subjective & objective sentences]

8
Naïve Bayes Training
[Flow diagram. Labels: Naïve Bayes Classifier; extraction pattern learner; Naïve Bayes training]

9
NB Confidence Measure
CM
10
MUC-4 IE Task
  • Task: extract information about terrorist events
    in Latin America.
  • Evaluated performance on 4 types of information:
    perpetrators (individuals), victims, targets,
    weapons
  • Corpus: 1700 texts
  • 1400 used for training, 100 for tuning, 200 for
    testing
  • Used AutoSlog-TS to generate extraction patterns
  • The system used 397 patterns

11
Base IE System Performance
System   Rec   Prec  F     Correct   Wrong
IE       .52   .42   .47   266       367

12
Filtering Subjective Sentences
System          Rec   Prec  F     Correct     Wrong
IE              .52   .42   .47   266         367
IE+SubjFilter   .44   .44   .44   218 (-48)   273 (-94)

13
Source Attribution Sentences
  • In news articles, factual information is often
    prefaced with a source attribution.
    Examples: "The Associated Press reported ...",
    "The President stated ..."
  • Source attribution sentences often contain
    important facts even if subjective language is
    also present.

14
Source Attribution Modification
  • Keep the subjective sentences if they contain a
    source attribution.
  • 1) the sentence contains a communication verb
    (affirm, announce, cite, confirm, convey,
    disclose, report, tell, say, state)
  • 2) the subjectivity classifier considers the
    sentence to be only weakly subjective (CM ≤ 25)
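
The source attribution rule above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that both conditions must hold together, that the comparison on the slide is CM ≤ 25, and that sentences arrive already lemmatized. The function names are hypothetical.

```python
# Communication verbs listed on the slide (base forms only).
COMMUNICATION_VERBS = {
    "affirm", "announce", "cite", "confirm", "convey",
    "disclose", "report", "tell", "say", "state",
}

# Assumption: a low confidence measure (CM) means "weakly subjective",
# and the threshold comparison is "<=".
WEAK_CM_THRESHOLD = 25


def is_source_attribution(lemmas, cm):
    """True if the sentence has a communication verb and a low CM.

    lemmas: list of lemmatized tokens (hypothetical preprocessing step).
    cm: the subjectivity classifier's confidence measure for the sentence.
    """
    has_verb = any(tok.lower() in COMMUNICATION_VERBS for tok in lemmas)
    return has_verb and cm <= WEAK_CM_THRESHOLD


def keep_sentence(is_subjective, lemmas, cm):
    """Objective sentences are always kept; subjective ones are kept
    only when they look like source attributions."""
    if not is_subjective:
        return True
    return is_source_attribution(lemmas, cm)
```

For example, `keep_sentence(True, ["the", "president", "state"], 10)` keeps the sentence, while the same sentence with a CM of 40 would be filtered out.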

15
Results with Source Attribution Modification
System           Rec   Prec  F     Correct     Wrong
IE               .52   .42   .47   266         367
IE+SubjFilter    .44   .44   .44   218 (-48)   273 (-94)
IE+SubjFilter2   .46   .44   .45   231 (-35)   289 (-78)

16
Selective Filtering
  • We observed that subjective sentences can contain
    important facts. For example:

    He was outraged by the terrorist attack on
    the World Trade Center.
  • Modification: selectively extract information
    from subjective sentences
  • Done using Indicator Patterns.

17
Indicator Patterns
  • We defined an indicator pattern as a pattern that
    has the following AutoSlog-TS statistics:
    P(relevant | pattern) ≥ 0.65 and
    Frequency ≥ 10
  • Indicator Patterns clearly represent a fact of
    interest
  • murder of X
  • X was assassinated
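
A rough sketch of selective filtering with indicator patterns is given below. It assumes the thresholds read P(relevant | pattern) ≥ 0.65 and frequency ≥ 10. The `PatternStats` record and both function names are hypothetical and are not part of AutoSlog-TS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatternStats:
    """Hypothetical record of per-pattern statistics."""
    name: str
    p_relevant: float  # estimated P(relevant | pattern)
    frequency: int     # number of times the pattern fired in training


def is_indicator(p, min_prob=0.65, min_freq=10):
    """Indicator pattern test, assuming '>=' for both thresholds."""
    return p.p_relevant >= min_prob and p.frequency >= min_freq


def should_extract(sentence_is_subjective, p):
    """Extract freely from objective sentences; from subjective
    sentences, extract only with indicator patterns."""
    return (not sentence_is_subjective) or is_indicator(p)
```

With made-up statistics, a pattern such as `PatternStats("murder of <np>", 0.9, 40)` would still fire inside a subjective sentence, while `PatternStats("attacks on <np>", 0.5, 40)` would fire only in objective ones.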

18
Results for Selective Subjectivity Filtering
19
Removing Subjective Extraction Patterns
  • Example
  • to destroy the building.
  • to destroy the process of reconciliation.
  • Use subjectivity analysis to remove subjective
    patterns.
  • We classified a pattern as subjective
    if 1) P(subjective | pattern) > .50
    and
  • 2) frequency ≥ 10
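
The pattern-level filter can be sketched in the same spirit. This assumes the two criteria read P(subjective | pattern) > .50 and frequency ≥ 10. The function names and the dictionary layout are illustrative only.

```python
def is_subjective_pattern(p_subjective, frequency,
                          min_prob=0.50, min_freq=10):
    """Subjective-pattern test: strictly above .50 and seen often enough."""
    return p_subjective > min_prob and frequency >= min_freq


def remove_subjective_patterns(patterns):
    """Drop patterns classified as subjective.

    patterns: dict mapping a pattern name to a
    (p_subjective, frequency) tuple (hypothetical layout).
    """
    return {
        name: stats
        for name, stats in patterns.items()
        if not is_subjective_pattern(*stats)
    }
```

For instance, with made-up statistics `{"to destroy <dobj>": (0.7, 25), "murder of <np>": (0.2, 40)}`, only `"murder of <np>"` would survive the filter.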

20
Final Results
System                Rec   Prec  F     Correct     Wrong
IE                    .52   .42   .47   266         367
IE+SubjFilter         .44   .44   .44   218 (-48)   273 (-94)
IE+SubjFilter2        .46   .44   .45   231 (-35)   289 (-78)
IE+SF2+Slct           .51   .45   .48   258 (-8)    311 (-56)
IE+SF2+Slct-SubjEPs   .51   .46   .48   258 (-8)    305 (-62)
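
The precision column can be checked against the Correct and Wrong counts, since precision = Correct / (Correct + Wrong). The snippet below is a small sanity check; the helper names are illustrative. Recall cannot be recomputed this way because the number of answer-key items does not appear on the slides.

```python
def precision(correct, wrong):
    """Fraction of extractions that were correct."""
    total = correct + wrong
    return correct / total if total else 0.0


def f_measure(recall, prec):
    """Balanced F-measure (harmonic mean of recall and precision)."""
    denom = recall + prec
    return 2 * recall * prec / denom if denom else 0.0


# (Correct, Wrong) counts copied from the results table.
rows = {
    "IE": (266, 367),
    "IE+SubjFilter": (218, 273),
    "IE+SubjFilter2": (231, 289),
    "IE+SF2+Slct": (258, 311),
    "IE+SF2+Slct-SubjEPs": (258, 305),
}

for name, (correct, wrong) in rows.items():
    print(f"{name}: precision = {precision(correct, wrong):.2f}")
```

The recomputed precisions round to .42, .44, .44, .45 and .46, matching the table. Applying `f_measure` to the rounded Rec and Prec values can differ from the reported F in the last digit, presumably because the reported values were computed before rounding.
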
21
Subjectivity Filtering Combined with Topic
Classification
System                         Rec   Prec
IE                             .52   .42
IE w/Perfect TC                .52   .53
IE w/Perfect TC + SubjFilter   .51   .56

22
Conclusions
  • Subjectivity filtering strategies improved IE
    precision with minimal recall loss.
  • The benefits of subjectivity classification are
    synergistic with those of topic classification.
  • As subjectivity classification improves, we
    expect corresponding improvements to IE.

23
IE Evaluation
  • Performed at extraction level, before template
    generation
  • Standard IE System

[Flow diagram. Labels: texts; Slot Extraction Component; extracts; Template Generation Component]

24
  • We defined an indicator pattern as a pattern that
    has the following AutoSlog-TS statistics:
    P(relevant | pattern) ≥ 0.65 and
    Frequency ≥ 10
  • Using only the indicator patterns for IE is not
    sufficient.

System                 Rec   Prec  F
IE                     .52   .42   .47
IE (Indicators Only)   .40   .54   .46

25
IE System
  • We used AutoSlog-TS to generate extraction
    patterns.
  • 40,553 distinct patterns were learned
  • We manually reviewed the top patterns (2808
    patterns)
  • The final system used 397 patterns.

26
Examples of Filtered Extractions
  • The demonstrators, convoked by the solidarity
    with Latin America Committee, verbally attacked
    Salvadoran President Alfredo Cristiani and have
    asked the Spanish government to offer itself as a
    mediator to promote and end to the armed
    conflict.
  • PATTERN: attacked <dobj>
  • VICTIM: Salvadoran President Alfredo Cristiani

27
Examples of Filtered Extractions
  • The crime was directed at hindering the
    development of the electoral process and
    destroying the reconciliation process
  • PATTERN: destroying <dobj>
  • TARGET: the reconciliation process
  • Presidents, political and social figures of the
    continent have said that the solution is not
    based on the destruction of a native plant but in
    active fight against drug consumption.
  • PATTERN: destruction of <np>
  • TARGET: a native plant

28
Breakdown by Extraction Type
Category   Baseline       SubjFilter
           Rec    Prec    Rec    Prec
Perp       .47    .33     .45    .38
Victim     .51    .50     .50    .52
Target     .63    .42     .62    .47
Weapon     .45    .39     .43    .42
Total      .52    .42     .51    .46

29
Subjective Patterns
The following extraction patterns were classified
as subjective:
  • attacks on <np>
  • to attack <dobj>
  • communique by <np>
  • to destroy <dobj>
  • <subj> was linked
  • leaders of <np>
  • <subj> unleashed
  • was aimed at <np>
  • offensive against <np>
  • dialogue with <np>

30
Metaphor
  • False hits can come from subjective sentences
    that contain metaphorical language.
  • The Parliament exploded into fury against the
    government when word leaked out