Title: Exploiting Subjectivity Classification to Improve Information Extraction
1 Exploiting Subjectivity Classification to Improve Information Extraction
- Ellen Riloff
- University of Utah
- Janyce Wiebe
- University of Pittsburgh
- William Phillips
- University of Utah
2 Subjectivity?
- Definition: Subjective language expresses or refers to opinions, emotions, sentiments, and other private states.
- Related Work
  - Sentiments (Turney & Littman 2003; Dave, Lawrence, & Pennock 2003; Pang & Lee 2004)
  - Product Reputation Tracking (Morinaga et al. 2002; Yi et al. 2003)
  - Opinion-Oriented Summarization and QA (Hu & Liu 2004; Yu & Hatzivassiloglou 2003)
- Opinion - personal beliefs
- Emotion - state of mind
- Sentiments - positive/negative judgements
3 Motivation
- Our observation: many false hits produced by Information Extraction (IE) systems come from subjective sentences.
- Hypothesis: we can improve IE performance by avoiding extractions from subjective sentences.
4 Examples
- D'Aubuisson unleashed harsh attacks on Duarte
- The Parliament exploded into fury against the government when word leaked out
- The subversives must suspend the aggression against the people and the destruction of the economy
5 The Big Picture
[Diagram: a Subjective Sentence Classifier separates subjective sentences from objective sentences; objective sentences go to Full Information Extraction and subjective sentences go to Selective Information Extraction.]
6 The Subjectivity Classifier
- Most documents contain a mix of subjective and objective sentences
  - 44% of sentences in newspaper articles are subjective! (Wiebe et al. 2004)
- We used the Naïve Bayes subjective sentence classifier developed by Wiebe & Riloff 2005.
  - Classifies at sentence level
  - unsupervised
  - rivals best supervised methods
7 Initial Training Data Creation
[Diagram: unlabeled texts and subjective clues feed a rule-based subjective sentence classifier and a rule-based objective sentence classifier, which produce subjective & objective sentences.]
8 Naïve Bayes Training
[Diagram labels: Naïve Bayes Classifier, extraction pattern learner, Naïve Bayes training]
9 NB Confidence Measure
CM
10 MUC-4 IE Task
- To extract information about terrorist events in Latin America.
- Evaluated performance on 4 types of information
  - perpetrators (individuals), victims, targets, weapons
- Corpus: 1700 texts
  - 1400 used for training, 100 for tuning, 200 for testing
- Used AutoSlog-TS to generate extraction patterns
  - system used 397 patterns
11 Base IE System Performance
System   Rec   Prec   F     Correct   Wrong
IE       .52   .42    .47   266       367
12 Filtering Subjective Sentences
System          Rec   Prec   F     Correct     Wrong
IE              .52   .42    .47   266         367
IE+SubjFilter   .44   .44    .44   218 (-48)   273 (-94)
13 Source Attribution Sentences
- In news articles, factual information is often prefaced with a source attribution.
  Examples: "The Associated Press reported", "The President stated"
- Source attribution sentences often contain important facts even if subjective language is also present.
14 Source Attribution Modification
- Keep the subjective sentences if they contain a source attribution.
- 1) the sentence contains a communication verb
  - affirm, announce, cite, confirm, convey, disclose, report, tell, say, state
- 2) the subjectivity classifier considers the sentence to be only weakly subjective (CM ≤ 25)
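The source attribution rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes both conditions must hold together, that "weakly subjective" means a confidence measure (CM) of at most 25, and that verbs are matched with a naive inflection table rather than a real lemmatizer. The function names (`verb_forms`, `is_source_attribution`, `keep_sentence`) are hypothetical and do not come from the original system.

```python
import re

# Communication verbs listed on the slide (base forms).
COMMUNICATION_VERBS = [
    "affirm", "announce", "cite", "confirm", "convey",
    "disclose", "report", "tell", "say", "state",
]

def verb_forms(verb):
    """Return naive regular inflections of a base verb (an approximation)."""
    stem = verb[:-1] if verb.endswith("e") else verb
    past = verb + "d" if verb.endswith("e") else verb + "ed"
    return {verb, verb + "s", past, stem + "ing"}

# Assumption: a hand-built form table stands in for proper lemmatization.
COMMUNICATION_FORMS = {"said", "told"}
for _verb in COMMUNICATION_VERBS:
    COMMUNICATION_FORMS |= verb_forms(_verb)

# Assumption: CM values at or below this threshold count as weakly subjective.
WEAK_CM_THRESHOLD = 25

def is_source_attribution(sentence, cm):
    """True if the sentence has a communication verb and is only weakly subjective."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    has_communication_verb = any(tok in COMMUNICATION_FORMS for tok in tokens)
    return has_communication_verb and cm <= WEAK_CM_THRESHOLD

def keep_sentence(sentence, is_subjective, cm):
    """Keep objective sentences, plus subjective ones that are source attributions."""
    return (not is_subjective) or is_source_attribution(sentence, cm)
```

For instance, `keep_sentence("The Associated Press reported that a bomb exploded.", True, 10.0)` keeps the sentence, while the same sentence with a CM of 40 would be filtered out.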
15 Results with Source Attribution Modification
System           Rec   Prec   F     Correct     Wrong
IE               .52   .42    .47   266         367
IE+SubjFilter    .44   .44    .44   218 (-48)   273 (-94)
IE+SubjFilter2   .46   .44    .45   231 (-35)   289 (-78)
16 Selective Filtering
- We observed that subjective sentences can contain important facts. For example:
  "He was outraged by the terrorist attack on the World Trade Center."
- Modification: selectively extract information from subjective sentences
- Done using Indicator Patterns.
17 Indicator Patterns
- We defined an indicator pattern as a pattern that has the following AutoSlog-TS statistics:
  P(relevant | pattern) ≥ 0.65 and Frequency ≥ 10
- Indicator Patterns clearly represent a fact of interest
- murder of X
- X was assassinated
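A minimal Python sketch of how indicator patterns might be selected and then used for selective filtering. It rests on several assumptions: P(relevant | pattern) is estimated as the pattern's frequency in relevant training texts divided by its total frequency, "Frequency" refers to the total frequency, and extractions from subjective sentences are kept only when produced by an indicator pattern. The class `PatternStats`, the helper names, and all counts in the usage note are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PatternStats:
    """Per-pattern training statistics (field names are illustrative)."""
    pattern: str
    relevant_freq: int  # occurrences in relevant training texts
    total_freq: int     # occurrences in all training texts

    @property
    def p_relevant(self):
        # Assumed estimate of P(relevant | pattern).
        return self.relevant_freq / self.total_freq if self.total_freq else 0.0

def is_indicator(stats, min_prob=0.65, min_freq=10):
    """Apply the two thresholds from the slide."""
    return stats.p_relevant >= min_prob and stats.total_freq >= min_freq

def select_extractions(extractions, indicator_patterns):
    """Filter (pattern, text, sentence_is_subjective) triples.

    Extractions from objective sentences are always kept; extractions from
    subjective sentences are kept only if an indicator pattern produced them.
    """
    return [
        (pattern, text)
        for pattern, text, subjective in extractions
        if not subjective or pattern in indicator_patterns
    ]
```

With made-up counts, `PatternStats("murder of <np>", 45, 50)` passes both thresholds, while a pattern seen only 6 times fails the frequency test even if every occurrence was in a relevant text.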
18 Results for Selective Subjectivity Filtering
19 Removing Subjective Extraction Patterns
- Example
- to destroy the building.
- to destroy the process of reconciliation.
- Use subjectivity analysis to remove subjective patterns.
- We classified a pattern as subjective if
  1) P(subjective | pattern) > .50 and
  2) frequency ≥ 10
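The pattern-removal rule can be illustrated with a short Python sketch. It assumes that P(subjective | pattern) is estimated as the fraction of a pattern's occurrences that fall in sentences labeled subjective by the classifier, and that "frequency" is the pattern's total number of occurrences. The function names and the counts in the usage note are hypothetical.

```python
def is_subjective_pattern(subjective_freq, total_freq, min_prob=0.50, min_freq=10):
    """Apply the two conditions from the slide to one pattern's counts."""
    if total_freq < min_freq:
        return False  # too rare to be classified as subjective
    return subjective_freq / total_freq > min_prob

def remove_subjective_patterns(pattern_counts, min_prob=0.50, min_freq=10):
    """Return the patterns that are NOT classified as subjective.

    pattern_counts maps a pattern string to (subjective_freq, total_freq).
    """
    return [
        pattern
        for pattern, (subj, total) in pattern_counts.items()
        if not is_subjective_pattern(subj, total, min_prob, min_freq)
    ]
```

Note that the probability test is strict: a pattern appearing in subjective sentences exactly half the time is kept, as is any pattern seen fewer than 10 times.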
20 Final Results
System                 Rec   Prec   F     Correct     Wrong
IE                     .52   .42    .47   266         367
IE+SubjFilter          .44   .44    .44   218 (-48)   273 (-94)
IE+SubjFilter2         .46   .44    .45   231 (-35)   289 (-78)
IE+SF2+Slct            .51   .45    .48   258 (-8)    311 (-56)
IE+SF2+Slct-SubjEPs    .51   .46    .48   258 (-8)    305 (-62)
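As a sanity check on the results, the precision column can be recomputed from the Correct and Wrong counts, assuming precision is defined as correct / (correct + wrong). The sketch below also recomputes the parenthesized differences against the base IE system. The dictionary layout and function names are illustrative only.

```python
# (correct, wrong) extraction counts for each system, taken from the results.
RESULTS = {
    "IE": (266, 367),
    "IE+SubjFilter": (218, 273),
    "IE+SubjFilter2": (231, 289),
    "IE+SF2+Slct": (258, 311),
    "IE+SF2+Slct-SubjEPs": (258, 305),
}

def precision(correct, wrong):
    """Assumed definition: fraction of all extractions that are correct."""
    return correct / (correct + wrong)

def delta_vs_baseline(system, baseline="IE"):
    """Change in (correct, wrong) counts relative to the baseline system."""
    base_correct, base_wrong = RESULTS[baseline]
    correct, wrong = RESULTS[system]
    return correct - base_correct, wrong - base_wrong

if __name__ == "__main__":
    for name, (correct, wrong) in RESULTS.items():
        d_correct, d_wrong = delta_vs_baseline(name)
        print(f"{name:22s} prec={precision(correct, wrong):.2f} "
              f"correct {d_correct:+d} wrong {d_wrong:+d}")
```

Under this definition the final system gives up 8 correct extractions while eliminating 62 wrong ones, which is why precision rises from .42 to .46.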
21 Subjectivity Filtering Combined with Topic Classification
System                         Rec   Prec
IE                             .52   .42
IE w/Perfect TC                .52   .53
IE w/Perfect TC + SubjFilter   .51   .56
22 Conclusions
- Subjectivity filtering strategies improved IE precision with minimal recall loss.
- The benefits of subjectivity classification are synergistic with those of topic classification.
- As subjectivity classification improves, we expect corresponding improvements to IE.
23 IE Evaluation
- Performed at extraction level, before template generation
- Standard IE System
[Diagram: texts -> Slot Extraction Component -> extracts -> Template Generation Component]
24
- We defined an indicator pattern as a pattern that has the following AutoSlog-TS statistics:
  P(relevant | pattern) ≥ 0.65 and Frequency ≥ 10
- Using only the indicator patterns for IE is not sufficient.
System                 Rec   Prec   F
IE                     .52   .42    .47
IE (Indicators Only)   .40   .54    .46
25 IE System
- We used AutoSlog-TS to generate extraction patterns.
- 40,553 distinct patterns were learned
- We manually reviewed top patterns (2808 patterns)
- The final system used 397 patterns.
26 Examples of Filtered Extractions
- The demonstrators, convoked by the solidarity with Latin America Committee, verbally attacked Salvadoran President Alfredo Cristiani and have asked the Spanish government to offer itself as a mediator to promote an end to the armed conflict.
- PATTERN: attacked <dobj>
- VICTIM: Salvadoran President Alfredo Cristiani
27 Examples of Filtered Extractions
- The crime was directed at hindering the development of the electoral process and destroying the reconciliation process
- PATTERN: destroying <dobj>
- TARGET: the reconciliation process

- Presidents, political and social figures of the continent have said that the solution is not based on the destruction of a native plant but in active fight against drug consumption.
- PATTERN: destruction of <np>
- TARGET: a native plant
28 Breakdown by Extraction Type
Category   Baseline       SubjFilter
           Rec    Prec    Rec    Prec
Perp       .47    .33     .45    .38
Victim     .51    .50     .50    .52
Target     .63    .42     .62    .47
Weapon     .45    .39     .43    .42
Total      .52    .42     .51    .46
29 Subjective Patterns
The following extraction patterns were classified as subjective:
- attacks on <np>
- to attack <dobj>
- communique by <np>
- to destroy <dobj>
- <subj> was linked
- leaders of <np>
- <subj> unleashed
- was aimed at <np>
- offensive against <np>
- dialogue with <np>
30 Metaphor
- False hits can come from subjective sentences that contain metaphorical language.
- The Parliament exploded into fury against the government when word leaked out