Title: Extraction and Summarization of Opinions
1Extraction and Summarization of Opinions
2Subjectivity: opinions, emotions, motivations,
speculations, sentiments
- Information Extraction of
- NL expressions
- Components
- Properties
Angolans are terrified of the Marburg virus
3Fine-grained Opinions
Australian press has launched a bitter attack on
Italy after seeing their beloved Socceroos
eliminated on a controversial late penalty.
Italian coach Lippi has also been blasted for his
comments after the game. In the opposite camp
Lippi is preparing his side for the upcoming game
with Ukraine. He hailed 10-man Italy's
determination to beat Australia and said the
penalty was rightly given.
Stoyanov and Cardie, 2006
4Fine-grained Opinion Extraction
The Australian Press launched a bitter attack on
Italy
5Opinion Summary
6Summarization of Opinions Events
7Why Opinions?
- Provide technology that can aid analysts in their
- extracting socio-behavioral information from text
- monitoring public health awareness, knowledge and
speculations about disease outbreaks,
- Enrich Information Extraction, Question
Answering, and Visualization tools
8Distinguish Objective from Subjective Language
- Speculations, hyperbole, emotions
- The Chilean president admitted that the US had
been attacked because their initial measure was
hasty. It amounted to using a tank to kill a
flea.
- Reflects argument credibility
9Opinion Frame
- Source
- Attitude: polarity negative, intensity high
- Target
E.g., are people extremely afraid or angry?
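The frame on this slide can be pictured as a small record type. The sketch below is illustrative only: the class name `OpinionFrame`, its field names, and the value vocabularies are assumptions made for this example, not the authors' representation. The sample values are read off the "Angolans are terrified of the Marburg virus" sentence from slide 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpinionFrame:
    """Hypothetical container for one extracted opinion."""
    source: str     # who holds the opinion
    target: str     # what the opinion is about
    polarity: str   # assumed values: "positive", "negative", "neutral"
    intensity: str  # assumed values: "low", "medium", "high"

    def is_strong_negative(self) -> bool:
        """Answers the slide's question: is the source extremely afraid or angry?"""
        return self.polarity == "negative" and self.intensity == "high"


frame = OpinionFrame(
    source="Angolans",
    target="the Marburg virus",
    polarity="negative",
    intensity="high",
)
print(frame.is_strong_negative())
```

Keeping polarity and intensity as separate fields mirrors the slide, where both hang off the attitude rather than off the source or target.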
10- The industry is scared and so, even if they do
find an ornamental carp with KHV, they will keep
it secret
- Recognize motivations
- Predict actions
11- Brugere-Picoux backs the decision to ban British
Beef
- Search for opinions about particular named
targets
12- Brugere-Picoux backs the decision to ban British
Beef
- Search for opinions held by particular named
sources
13Motivation for the Summaries
- Quickly determine the opinions of a person,
organization, community, region, etc.
- Quickly determine the opinions toward a person,
organization, issue, event,
- Across an entire document
- Across multiple documents
- Over time
- Reveal relationships and identify cliques and
communities of interest
- Complement work in social network analysis
14Outline
- Motivations for opinion extraction
- Extracting opinion frames and components
- Lexicon of subjective expressions
- Contextual disambiguation
- Enriched tasks
- Opinion summarization
15Lexicon
- Explore different uses of words, to zero in on
the subjective ones
- Example: benefit
16Lexicon
- Example: benefit
- Very often objective, as a verb
- Children with ADHD benefited from a 15-course of
fish oil
17Lexicon
- Noun uses look more promising
- The innovative economic program has shown
benefits to humanity
18Lexicon
- However, there are objective noun uses too
- tax benefits.
- employee benefits.
- tax benefits to provide a stable economy.
- health benefits to cut costs.
19Lexicon
- Pattern: benefits as the head of a noun phrase
containing a prepositional phrase
- Matches this
- The innovative economic program has shown proven
benefits to humanity
- But none of these
- tax benefits.
- employee benefits.
- tax benefits to provide a stable economy.
- health benefits to cut costs.
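The pattern on this slide can be approximated without a full parser. The sketch below is a simplification under stated assumptions: tokens arrive already tagged with Penn-Treebank-style part-of-speech tags (supplied by hand here), and "head of a noun phrase containing a prepositional phrase" is reduced to "the noun benefits is directly followed by a preposition that is not an infinitival to". The function name is hypothetical.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, POS tag); tags are hand-assigned in this sketch

NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
PREP_TAGS = {"IN", "TO"}


def benefits_heads_np_with_pp(tokens: List[Token]) -> bool:
    """True if noun 'benefits' is directly followed by a prepositional phrase."""
    for i, (word, tag) in enumerate(tokens):
        if word.lower() != "benefits" or tag not in NOUN_TAGS:
            continue
        if i + 2 >= len(tokens):
            continue  # nothing after the noun that could form a PP
        next_tag = tokens[i + 1][1]
        after_tag = tokens[i + 2][1]
        # "to" followed by a verb is an infinitive, not a prepositional phrase
        if next_tag in PREP_TAGS and not after_tag.startswith("VB"):
            return True
    return False


subjective = [("The", "DT"), ("innovative", "JJ"), ("economic", "JJ"),
              ("program", "NN"), ("has", "VBZ"), ("shown", "VBN"),
              ("proven", "JJ"), ("benefits", "NNS"), ("to", "TO"),
              ("humanity", "NN")]
objective = [("tax", "NN"), ("benefits", "NNS"), ("to", "TO"),
             ("provide", "VB"), ("a", "DT"), ("stable", "JJ"),
             ("economy", "NN"), (".", ".")]

print(benefits_heads_np_with_pp(subjective), benefits_heads_np_with_pp(objective))
```

The infinitive check is what separates "benefits to humanity" from "benefits to provide a stable economy"; a real system would presumably read this off a syntactic parse instead.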
20Lexicon: Longer Constructions (be soft on crime)
<item index="1">
  <itemMorphoSyntax>
    <lemma>be</lemma>
  </itemMorphoSyntax>
  <itemRelation xsi:type="ngramPattern">
    <distance>2</distance>
    <landmark>2</landmark>
  </itemRelation>
</item>
<item index="2">
  <itemMorphoSyntax>
    <word>soft</word>
    <majorClass>J</majorClass>
  </itemMorphoSyntax>
  <itemRelation xsi:type="ngramPattern">
    <distance>1</distance>
    <landmark>3</landmark>
  </itemRelation>
</item>
<item index="3">
  <itemMorphoSyntax>
    <word>on</word>
  </itemMorphoSyntax>
  <itemRelation xsi:type="ngramPattern">
    <distance>1</distance>
    <landmark>4</landmark>
  </itemRelation>
</item>
21- The entry contains a pattern for finding
instances of the construction
- Matches variations
- When I look into his past I see a man who is very
soft on crime.
- The data could also weaken her authority to
criticize Patrick for being soft on crime.
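To make the matching behaviour concrete, here is a minimal sketch of an n-gram pattern matcher. It rests on an interpretation that the slides do not spell out: each item is assumed to match one token, and `distance` is assumed to be the maximum number of token positions between an item and its landmark (the next item). The lemma table, the `PATTERN` list, and all function names are hypothetical, and the part-of-speech constraints from the entry are ignored.

```python
import re
from typing import Dict, List

# Tiny hand-made lemma table; a real system would use a lemmatizer.
LEMMAS: Dict[str, str] = {"is": "be", "are": "be", "was": "be",
                          "were": "be", "being": "be", "been": "be"}

# One dict per pattern item; "distance" is the assumed maximum gap to the next item.
PATTERN: List[dict] = [
    {"lemma": "be", "distance": 2},
    {"word": "soft", "distance": 1},
    {"word": "on", "distance": 1},
    {"word": "crime"},
]


def item_matches(item: dict, token: str) -> bool:
    if "lemma" in item:
        return LEMMAS.get(token, token) == item["lemma"]
    return token == item["word"]


def match_from(tokens: List[str], start: int, idx: int) -> bool:
    """Does pattern item idx match tokens[start], with the remaining items following?"""
    if not item_matches(PATTERN[idx], tokens[start]):
        return False
    if idx == len(PATTERN) - 1:
        return True
    last = min(start + PATTERN[idx]["distance"], len(tokens) - 1)
    return any(match_from(tokens, nxt, idx + 1) for nxt in range(start + 1, last + 1))


def contains_construction(sentence: str) -> bool:
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return any(match_from(tokens, i, 0) for i in range(len(tokens)))


print(contains_construction("I see a man who is very soft on crime."))
print(contains_construction("Patrick was criticized for being soft on crime."))
```

Under this reading, the distance of 2 on the first item is what lets "is very soft" match while a fixed four-word n-gram would not.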
22Attributive information
<entryAttributes origin="j">
  <name>be soft on crime</name>
  <subjective>true</subjective>
  <reliability>h</reliability>
  <confidence>h</confidence>
  <subType>sen</subType>
  <example>The Obama campaign rejected the notion that the senator might be vulnerable to accusations that he is soft on crime.</example>
  <morphosyn>vp</morphosyn>
  <target>s</target>
  <polarity>n</polarity>
  <intensity>m</intensity>
  <confidence>h</confidence>
  <regex>1morphlemma"be" orderdistance"2" landmark"2" 2morphword"soft" majorClass"J" orderdistance"1" landmark"3" 3morphword"on" orderdistance"1" landmark"4" 4morphword"crime" majorClass"N"</regex>
  <patterntype>ngramPattern</patterntype>
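A lexicon entry of this shape can be loaded with a standard XML parser. The sketch below is an assumption-laden illustration: it uses an abridged copy of the entry, adds a closing `</entryAttributes>` tag that the slide does not show, and the function name `load_entry` is hypothetical. The single-letter codes are kept as opaque strings because the slides do not define them.

```python
import xml.etree.ElementTree as ET

# Abridged, well-formed copy of the entry; the closing tag is assumed.
ENTRY_XML = """
<entryAttributes origin="j">
  <name>be soft on crime</name>
  <subjective>true</subjective>
  <morphosyn>vp</morphosyn>
  <polarity>n</polarity>
  <intensity>m</intensity>
  <patterntype>ngramPattern</patterntype>
</entryAttributes>
"""


def load_entry(xml_text: str) -> dict:
    """Flatten one entry into a dict of field name -> text value."""
    root = ET.fromstring(xml_text.strip())
    fields = {child.tag: (child.text or "").strip() for child in root}
    fields["origin"] = root.get("origin")
    fields["subjective"] = fields.get("subjective") == "true"
    return fields


entry = load_entry(ENTRY_XML)
print(entry["name"], entry["polarity"], entry["subjective"])
```

A flat dict keeps only the last value of a repeated tag, so the full entry, which lists `confidence` twice, would need a list-valued structure instead.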
30Lexicon Summary
- Uniform representation for different types of
subjectivity clues
- Word stem: benefit
- Word: benefits
- Word/POS: benefits/nouns
- Fixed n-grams: benefits to
- Syntactic patterns
- Combinations of the above
- Learn subjective uses from corpora (bodies of
texts)
- Capture longer subjective constructions
- Add relevant knowledge about expressions
- Riloff, Wiebe, Wilson 2003; Riloff and Wiebe 2003;
Wiebe and Riloff 2005; Riloff, Patwardhan, Wiebe
2006; Ruppenhofer, Akkaya, Wiebe in preparation
31Outline
- Motivations for opinion extraction
- Extracting opinion frames and components
- Lexicon of subjective expressions
- Contextual disambiguation
- Enriched tasks
- Opinion summarization
32Polarity
- Contextual polarity
- There is no reason at all to believe that he's
the right choice
- Interacts with opinion topics
- Example: an argument for one type of design is
simultaneously an argument against an alternative
design
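The gap between a word's prior polarity and its polarity in context can be shown with a deliberately crude baseline. This is not the approach the talk describes (the next slide mentions rich feature sets and machine learning); it is a toy sketch in which the lexicon, the negator list, and the rule "any earlier negator in the sentence flips the clue" are all assumptions made for illustration.

```python
import re

PRIOR_POLARITY = {"right": "positive", "terrified": "negative"}  # toy lexicon (assumed)
NEGATORS = {"no", "not", "never"}                                  # toy list (assumed)
FLIP = {"positive": "negative", "negative": "positive"}


def contextual_polarity(sentence: str, clue: str) -> str:
    """Flip the clue's prior polarity if any negator precedes it in the sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    prior = PRIOR_POLARITY[clue]
    position = tokens.index(clue)
    negated = any(tok in NEGATORS for tok in tokens[:position])
    return FLIP[prior] if negated else prior


print(contextual_polarity(
    "There is no reason at all to believe that he's the right choice", "right"))
print(contextual_polarity("He is the right choice", "right"))
```

In the slide's example the negator sits nine tokens before "right", which is one reason a short fixed window around the clue would miss it.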
33Polarity
- Recognizing contextual polarity using rich
feature sets and machine learning
- Modeling and recognizing discourse relations
among opinions and their targets in a text
- Wilson, Wiebe, Hoffmann EMNLP05
- Wilson, Wiebe, Hoffmann, submitted
- Somasundaran, Wiebe, Ruppenhofer, submitted
34Opinion Frame Extraction via CRFs and ILP
- Joint extraction of entities and relations
- CRFs: Lafferty et al., 2001
- Roth and Yih, 2004
- Choi et al., EMNLP 2006
35Opinion-Frame Extraction
- Joint extraction of entities and relations for
opinion recognition (previous slide)
- Choi, Breck, Cardie EMNLP 2006
- Linking sources referring to the same entity
- Stoyanov and Cardie ACL 2006 Workshop on
Sentiment and Subjectivity in Text
- Identifying expressions of opinions in context
- Breck, Choi, Cardie IJCAI 2007
36Outline
- Motivations for opinion extraction
- Extracting opinion frames and components
- Lexicon of subjective expressions
- Contextual disambiguation
- Enriched tasks
- Opinion summarization
37Targets and Attitude Types (Wilson PhD Dissertation 2008)
- I think people are happy because Chavez has
fallen.
- direct subjective: span "are happy"; source
<writer, I, People>; attitude
- direct subjective: span "think"; source
<writer, I>; attitude
- inferred attitude: span "are happy because
Chavez has fallen"; type neg sentiment;
intensity medium; target
- attitude: span "are happy"; type pos sentiment;
intensity medium; target
- attitude: span "think"; type positive arguing;
intensity medium; target
- target: span "people are happy because
Chavez has fallen"
- target: span "Chavez has fallen"
- target: span "Chavez"
38Current Work Topics
- Topic annotations added to the MPQA corpus
- Annotations indicate the closest phrase to the
opinion expression that adequately describes the
topic of the opinion
- Include topic coreference chains to link all
phrases that describe the same topic concept
- IAG results
- Stoyanov and Cardie LREC 2008
39Current Work Topics
- Topic coreference resolution
- Treat as an NP coreference resolution task
- Modify our existing NP coref approach
- Initial results look promising
- Using topic spans from gold standard: B3 .709, MUC .917
- Topic span = opinion sentence: B3 .573, MUC .914
- Topic span identified automatically: B3 .574, MUC .924
- Best baseline system: B3 .554, MUC .793
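For readers unfamiliar with the B3 scores reported here, the sketch below computes a B-cubed score in its standard per-mention formulation. The slides do not say which variant or tooling produced the reported numbers, so this is only an illustration; it assumes the key and the response cover exactly the same mentions, and the function names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple


def _cluster_of(clusters: Iterable[Iterable[Hashable]]) -> Dict[Hashable, Set[Hashable]]:
    """Map each mention to the set of mentions in its cluster."""
    lookup: Dict[Hashable, Set[Hashable]] = {}
    for cluster in clusters:
        members = set(cluster)
        for mention in members:
            lookup[mention] = members
    return lookup


def b_cubed(key, response) -> Tuple[float, float, float]:
    """Return (precision, recall, F1), averaged over mentions."""
    key_of = _cluster_of(key)
    resp_of = _cluster_of(response)
    mentions = list(key_of)
    precision = recall = 0.0
    for m in mentions:
        overlap = len(key_of[m] & resp_of[m])
        precision += overlap / len(resp_of[m])
        recall += overlap / len(key_of[m])
    precision /= len(mentions)
    recall /= len(mentions)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Gold topic chains vs. system chains over four mentions a..d (made-up data).
p, r, f = b_cubed(key=[{"a", "b", "c"}, {"d"}], response=[{"a", "b"}, {"c", "d"}])
print(round(p, 3), round(r, 3), round(f, 3))
```

Because B-cubed scores every mention, including singletons, it tends to be stricter than the link-based MUC score, which is consistent with the MUC figures on this slide being higher throughout.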
40Subjectivity Types (Wilson PhD Dissertation 2008)
41Subjectivity Types
- Arguing and sentiment in the news and
conversations
- Manually annotating
- Automatically detecting
- Exploiting results of automatic detection to
improve question answering
- Somasundaran, Wiebe, Hoffmann, Litman, ACL
workshop 2006
- Somasundaran, Wilson, Wiebe, Stoyanov ICWSM 2007
- Somasundaran, Ruppenhofer, Wiebe SIGdial 2007
- Ruppenhofer, Somasundaran, Wiebe LREC 2008
42Text Extraction and Data Visualization for Animal
Health Surveillance
- Collaborative project between CERATOPS, PURVAC,
and the Veterinary Information Network (VIN),
with funding from LLNL.
- Goal: study of subjectivity in health
surveillance texts
43Method
- Manual Annotation Study
- Identify relevant types of topic, source, and
subjectivity
- Annotate 16 texts from the ProMED (Program for
Monitoring Emerging Diseases) mailing list
44Hypothesis
- A fine-grained study of subjectivity will show
that
- health-surveillance texts contain significant
amounts of subjectivity
- recognizing this subjectivity can enhance
information extraction and question answering
applications
45Example
Sentence-level annotation
- Whilst the present tragedy in the UK is
extremely distressing to farmers, so far the
number of animals culled is only a miniscule
portion of the national herd.
49Source types
- the writer
- medical experts
- media
- (non-media) organizations, including governments
and agencies
- individuals affected by an outbreak
- members of the general public
- other explicitly mentioned entities
- implicit entities
50Source type example
- It has become clear that the UK has been
importing significant animal products from areas
where FMD is known to be endemic.
51Topic types
- Occurrence of a disease outbreak
- Danger/severity of an outbreak
- Cause of a disease
- Symptoms
- Treatment
- Prevention
- Diagnosis
- Attitudes of others
- Development/progression of outbreak
- Other
52Topic type example
- The crisis has been caused by the koi herpes
virus, commonly referred to as KHV, a disease
harmless to other animals, but invariably fatal
to carp.
55Subjectivity types (1)
- Sentiment
- Belief, distinguishing two sub-types
- Beliefs about what is the case
- Belief about what should or should not be done
- Knowledge/Awareness of facts
- Uncertainty/Speculation
56Subjectivity types (2)
- Agreement/Disagreement between various sources
in the text
- Confirmation/Denial of contested statements
- Intention/Purpose
- Policies/Actions reflecting the above
attitudes, for example, restrictions on the use,
manufacture, distribution of substances
57Subjectivity type example
- Professor Jeanne Brugere-Picoux ... said
although France has officially registered 75
cases of BSE in the past 10 years, she believed
the real figure to be far higher than that.
58Subjectivity type example
- Nor did the FSA consider that there would be any
need to label meat products derived from animals
that have been vaccinated with the FMD vaccine.
59Frequency of subjective types
60Querying the annotations 1
I am afraid people don't know enough about this
disease.
61- Perform a query looking for sentences with
- type: KnowledgeAwareness
- topic: symptoms
- source: member-of-public
- polarity: negative
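A query of this kind amounts to filtering annotated sentences on their attribute values. The sketch below shows the idea with an in-memory list; the `Annotation` class, the `query` helper, and the labels attached to the two sample sentences are assumptions for illustration, not the project's actual storage format or annotation data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Annotation:
    """Hypothetical record for one annotated sentence."""
    sentence: str
    type: str
    topic: str
    source: str
    polarity: Optional[str] = None


def query(annotations: List[Annotation], **criteria: str) -> List[Annotation]:
    """Return the annotations whose fields equal every given criterion."""
    return [a for a in annotations
            if all(getattr(a, field) == value for field, value in criteria.items())]


corpus = [
    Annotation("Many people don't know that they have liver disease.",
               type="KnowledgeAwareness", topic="symptoms",
               source="member-of-public", polarity="negative"),
    Annotation("Officials worry that this year's tally may increase.",
               type="UncertaintySpeculation", topic="DangerSeverity",
               source="organization"),
]

hits = query(corpus, type="KnowledgeAwareness", topic="symptoms",
             source="member-of-public", polarity="negative")
print([h.sentence for h in hits])
```

Passing the criteria as keyword arguments lets the same helper serve a second query that omits polarity, for example one restricted to `type="UncertaintySpeculation"` and `topic="DangerSeverity"`.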
62- Because the infection is much more likely in the
summer, officials worry that this year's tally
may increase. One problem, Andrews said, is that
many people don't know that they have liver
disease. As a result, she encourages people not
to eat raw oysters from the Gulf Coast in the
summer unless the oysters have been treated.
63Querying the annotations 2
What can we expect? How will this outbreak unfold?
64- The analyst could query the annotations to
retrieve sentences in which
- type: UncertaintySpeculation
- topic: DangerSeverity
- source: expert or organization
65- The number of infections this year is the
highest since 5 people died within 3 weeks in
1996, Dassey said. Because the infection is much
more likely in the summer, officials worry that
this year's tally may increase.
66Outline
- Motivations for opinion extraction
- Extracting opinion frames and components
- Opinion summarization
68Querying the Opinion Frames Summaries
69Timeline Format
70Expand to Reveal Opinion Holders
71DHS Expresses (neutral) Opinion
72Sortable List Format
73Drill Down to Original Article
74Juxtapose Opinions w/Other Info
75Summarization of Opinions Events