Title: Opinion Analysis
1Opinion Analysis
- Sudeshna Sarkar
- IIT Kharagpur
2Introduction: facts and opinions
- Two main types of information on the Web.
- Facts and Opinions
- Current search engines search for facts (and assume they are true).
- Facts can be expressed with topic keywords.
- Search engines do not search for opinions.
- Opinions are hard to express with a few keywords.
- How do people think of Motorola cell phones?
- The current search ranking strategy is not appropriate for opinion retrieval/search.
3Overview
- Motivation
- Definitions
- Coarse grained vs Fine grained opinion analysis
- Opinion Lexicons
- Approaches to document level opinion analysis
- Lexicon based
- Supervised learning approaches
- Mixed approaches
- Approaches to fine-grained opinion analysis
- Rule based
- Learning
- Opinion mining work at IIT Kharagpur
4Opinion Mining
- Search for and aggregate opinions from online sources.
- Many reviews have both positive and negative sentences.
- Many products are liked by some and disliked by others; there must be different reasons.
- Identify different features/aspects of the target and the opinion on each separately.
5Why do opinion analysis?
- Opinion search
- to extract examples of particular types of positive or negative statements on some topic.
- Opinion question answering
- What is the reaction to the Left Front's stand on the nuclear deal?
- Is support diminishing for the UPA government?
- Product review mining
- What features of the Mr. Coffee programmable coffee maker do users like, and what do they dislike? (Microsoft Live)
- Review classification
- Tracking sentiment toward topics over time
- to track the ups and downs of aggregate attitudes to a brand or product
6Introduction: Applications
- Businesses and organizations: product and service benchmarking; market intelligence.
- Businesses spend a huge amount of money to find consumer sentiments and opinions.
- Consultants, surveys, focus groups, etc.
- Individuals: interested in others' opinions when
- purchasing a product or using a service,
- finding opinions on political topics,
- many other decision-making tasks.
- Ad placement: placing ads in user-generated content.
- Place an ad when one praises a product.
- Place an ad from a competitor if one criticizes a product.
- Opinion retrieval/search: providing general search for opinions.
7Question Answering
- Opinion question answering
Q: What is the international reaction to the reelection of Robert Mugabe as President of Zimbabwe?
A: African observers generally approved of his victory, while Western governments denounced it.
8Opinion search (Liu, Web Data Mining book, 2007)
- Can you search for opinions as conveniently as general Web search?
- Whenever you need to make a decision, you may want some opinions from others.
- Wouldn't it be nice if you could find them on a search system instantly, by issuing queries such as
- Opinions: "Motorola cell phones"
- Comparisons: "Motorola vs. Nokia"
- Cannot be done yet!
9Typical opinion search queries
- Find the opinion of a person or organization (opinion holder) on a particular object or a feature of an object.
- E.g., what is Bill Clinton's opinion on abortion?
- Find positive and/or negative opinions on a particular object (or some features of the object), e.g.,
- customer opinions on a digital camera,
- public opinions on a political topic.
- Find how opinions on an object change with time.
- How does object A compare with object B?
- Gmail vs. Yahoo mail
10Find the opinion of a person on X
- In some cases, a general search engine can handle it, i.e., using suitable keywords.
- Bill Clinton's opinion on abortion
- Reason:
- One person or organization usually has only one opinion on a particular topic.
- The opinion is likely contained in a single document.
- Thus, a good keyword query may be sufficient.
11Find opinions on an object X
- We use product reviews as an example.
- Searching for opinions in product reviews is different from general Web search.
- E.g., search for opinions on "Motorola RAZR V3".
- General Web search for a fact: rank pages according to some authority and relevance scores.
- The user views the first page (if the search is perfect).
- One fact = multiple facts
- Opinion search: ranking is desirable; however,
- reading only the review ranked at the top is dangerous because it is only the opinion of one person.
- One opinion ≠ multiple opinions
12Search opinions (contd)
- Ranking
- Produce two rankings
- Positive opinions and negative opinions
- Some kind of summary of both, e.g., the number of each
- Or, one ranking, but
- The top (say 30) reviews should reflect the natural distribution of all reviews (assume that there is no spam), i.e., with the right balance of positive and negative reviews.
- Questions
- Should the user read all the top reviews? OR
- Should the system prepare a summary of the reviews?
13User generated content
- Word of mouth on the web.
- Review sites
- Blogs
- Online forums
- Shopping comparison sites
- User reviews
- Mine opinions expressed in user-generated content.
- A challenging task
- Useful to individual consumers and companies.
14Motivation for Consumer
- I want to buy a camera.
- Which model should I pick?
- Ask my friends
- Use the internet
- "CEA-CNET Study: Tech-Savvy Consumers Use Internet to Research Products Before Buying Them" (Wireless News, November 2007)
- "Seventy Percent of Consumers Use Internet to Research Consumer Packaged Goods, According to Prospectiv Survey" (Market Wire, January 2008)
15Businesses
- Identifying opinions about products helps to position/adapt products.
- Much product feedback is web-based,
- provided by customers/critics online through websites, discussion boards, mailing lists, blogs, and CRM portals.
- Market research is becoming unwieldy.
- Sources are heterogeneous and multilingual in nature.
16Facts vs Opinions
- "An opinion is a person's ideas and thoughts towards something. It is an assessment, judgment or evaluation of something. An opinion is not a fact, because opinions are either not falsifiable, or the opinion has not been proven or verified. ..." (en.wikipedia.org/wiki/Opinion)
- Subjectivity: the linguistic expression of somebody's emotions, sentiments, evaluations, opinions, beliefs, speculations, etc.
- Polarity: positive and negative
- This camera is awesome.
- The movie is too long and boring.
- Strength of opinion
17Levels of opinion analysis
- Coarse- to fine-grained opinion analysis
- Document level: at the document (or review) level
- Subjective vs. objective
- Sentiment classification: positive, negative or neutral
- Sentence level, expression level
- Task 1: identifying subjective/opinionated sentences (or clauses/phrases)
- Classes: objective and subjective (opinionated)
- Task 2: sentiment classification of sentences
- Classes: positive, negative and neutral.
- But a document/sentence may contain multiple opinions on more than one topic from one or more opinion holders.
18Lexicon Development
- Manual
- Semi-automatic
- Fully automatic
- Find relevant words, phrases, and patterns that can be used to express subjectivity.
- Determine the polarity of subjective expressions.
19Opinion Words
- An opinion lexicon containing lists of positive and negative phrases is very useful for the opinion mining task at different levels.
- Positive: beautiful, wonderful, good, amazing
- Negative: bad, poor, terrible, cost someone an arm and a leg
- How to compile such a list?
- Dictionary-based approaches
- Corpus-based approaches
- Supervised
- Semi-supervised
- BUT
- Some opinion words are context independent (e.g., good).
- Some are context dependent (e.g., long).
20Hand created lists
- Manually create lists of opinion words appropriate for the domain, recording for each entry:
- Sentiment term
- Polarity
- Strength
- These approaches, while interesting, are labor intensive, error prone, and costly to maintain.
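A hand-created list of this kind can be pictured as a mapping from sentiment term to polarity and strength. The sketch below is illustrative only: the entries, the 1 to 3 strength scale, and the `score_text` helper are assumptions made for the example, not part of the original material.

```python
from typing import NamedTuple

class Entry(NamedTuple):
    polarity: int   # +1 for positive, -1 for negative
    strength: int   # assumed scale: 1 (weak) to 3 (strong)

# Hypothetical hand-created entries for a product-review domain.
LEXICON = {
    "good": Entry(+1, 1),
    "awesome": Entry(+1, 3),
    "poor": Entry(-1, 2),
    "terrible": Entry(-1, 3),
}

def score_text(text: str, lexicon: dict = LEXICON) -> int:
    """Sum polarity * strength over the lexicon terms found in the text."""
    tokens = text.lower().replace(".", " ").replace(",", " ").split()
    return sum(lexicon[t].polarity * lexicon[t].strength
               for t in tokens if t in lexicon)

print(score_text("This camera is awesome."))         # positive total
print(score_text("Poor battery, terrible screen."))  # negative total
```

Every new domain term has to be added by hand, which is where the maintenance cost comes from.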
21Dictionary-based approaches
- Start from a set of seed opinion words.
- Use WordNet's synsets and hierarchies to acquire opinion words.
- Use the seeds to search for synonyms and antonyms in WordNet (e.g., Hu and Liu, 2004).
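The seed-expansion idea can be sketched as a breadth-first walk over synonym and antonym links. To stay self-contained, the example uses two small hand-written dictionaries (`SYNONYMS`, `ANTONYMS`) as a hypothetical stand-in for WordNet; a real system would query WordNet instead, and the conflict rule (first label wins) is an assumption.

```python
from collections import deque

# Toy stand-in for WordNet relations (hypothetical data, not real WordNet output).
SYNONYMS = {
    "good": ["fine", "nice"],
    "nice": ["pleasant"],
    "bad": ["poor", "awful"],
}
ANTONYMS = {
    "good": ["bad"],
    "pleasant": ["unpleasant"],
}

def expand_seeds(seeds: dict) -> dict:
    """Breadth-first expansion: synonyms inherit polarity, antonyms get the opposite."""
    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        neighbours = [(s, polarity[word]) for s in SYNONYMS.get(word, [])]
        neighbours += [(a, -polarity[word]) for a in ANTONYMS.get(word, [])]
        for other, pol in neighbours:
            if other not in polarity:   # first label wins; conflicts are ignored
                polarity[other] = pol
                queue.append(other)
    return polarity

lexicon = expand_seeds({"good": +1})
print(lexicon)
```

A single positive seed is enough here to label the negative words too, because the antonym link carries the flipped polarity.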
22Dictionary-based approaches
- Use additional information (e.g., glosses) and learning from WordNet (Andreevskaia and Bergler, 2006; Esuli and Sebastiani, 2005).
23Dictionary-based approaches
- Advantage: good at finding a lot of such words.
- Weakness: does not find context-dependent opinion words, e.g., small, long, fast.
24Corpus-based approaches
- Rely on syntactic rules and co-occurrence patterns to extract opinion words from large corpora.
- Use a list of seed words
- A large domain corpus
- Machine learning
- Advantage: this approach can find domain (corpus) dependent opinions.
25How to identify subjective terms?
- Assume that contexts are coherent.
- Statistical association: if words of the same orientation tend to co-occur, then the presence of one makes the other more probable.
- Use statistical measures of association to capture this interdependence.
- Assume that alternatives are similarly subjective.
26Corpus-based approaches (contd)
- Conjunctions: conjoined adjectives usually have the same orientation (Hatzivassiloglou and McKeown, 1997).
- E.g., "This car is beautiful and spacious." (conjunction)
- Start with seed words.
- Use conjunctions to find adjectives with similar orientations.
- Use log-linear regression to aggregate information from various conjunctions.
- Use hierarchical clustering on a graph representation of adjective similarities to find two groups of the same orientation.
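The core intuition (adjectives joined by "and" share an orientation, those joined by "but" differ) can be sketched with a much simpler technique than the one cited: plain label propagation over a graph of conjunction links, in place of log-linear regression and clustering. The adjective list, the regular expression, and the example sentences are all assumptions made for illustration.

```python
import re
from collections import deque

# Assumed adjective list; a real system would use a part-of-speech tagger.
ADJECTIVES = {"beautiful", "spacious", "noisy", "slow", "cheap", "reliable"}
PATTERN = re.compile(r"\b(\w+) (and|but) (\w+)\b")

def conjunction_links(sentences):
    """Yield (adj1, adj2, same_orientation) for conjoined adjective pairs."""
    for s in sentences:
        for a, conj, b in PATTERN.findall(s.lower()):
            if a in ADJECTIVES and b in ADJECTIVES:
                yield a, b, conj == "and"

def propagate(seeds, links):
    """Spread seed polarities along links; 'but' links flip the sign."""
    graph = {}
    for a, b, same in links:
        sign = 1 if same else -1
        graph.setdefault(a, []).append((b, sign))
        graph.setdefault(b, []).append((a, sign))
    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for other, sign in graph.get(word, []):
            if other not in polarity:
                polarity[other] = polarity[word] * sign
                queue.append(other)
    return polarity

sentences = [
    "This car is beautiful and spacious.",
    "The engine is noisy and slow.",
    "It is spacious but noisy.",
    "The car is cheap and reliable.",
]
result = propagate({"beautiful": +1}, conjunction_links(sentences))
print(result)
```

Adjectives that are never linked to a seed, directly or indirectly, stay unlabeled, which is one reason a large corpus is needed.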
28Growing contextual opinion words
- Ding, Liu, Yu
- Intra-sentence conjunction rule: opinions on both sides of "and" (or in two consecutive sentences) tend to be the same.
- E.g., "This camera takes great pictures and has a long battery life."
- But with a "but"-like clause, the opinions tend to be of opposite polarity.
- Context is important
- "Long battery life" vs. "long time to focus"
- Growing
- by applying various conjunctive rules
- Verifying the results with those conjunctive rules as the system sees more reviews
- Only keep those opinions which the system is confident about, controlled by a confidence limit.
29Semantic Orientation by Association
- Labeled semantic orientation of words
- Pwords: good, nice, excellent, positive, fortunate, correct, superior
- Nwords: bad, nasty, poor, negative, unfortunate, wrong, inferior
- Various approaches to calculate the semantic association of two words
- Pointwise Mutual Information (PMI) (Church and Hanks, 1989)
- Latent Semantic Indexing (LSI) (Dumais et al., 1990)
- Likelihood ratios (Dunning, 1993)
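For reference, pointwise mutual information between two words is commonly written as below, where p(w_1, w_2) is the probability that the two words co-occur (in a window, sentence, or document, depending on the setup) and p(w_1), p(w_2) are their individual probabilities. The symbols are the usual textbook ones, not notation taken from the slides.

```latex
\mathrm{PMI}(w_1, w_2) = \log_2 \frac{p(w_1, w_2)}{p(w_1)\, p(w_2)}
```

A positive value means the words co-occur more often than independence would predict; a negative value means less often.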
30Turney (2002); Turney and Littman (2003)
- Determine the semantic orientation of each extracted phrase based on its association with seven positive and seven negative seed words.
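A minimal sketch of this computation: the semantic orientation of a word is taken as the sum of its PMI with the positive seeds minus the sum of its PMI with the negative seeds. Several things here are assumptions for illustration: probabilities are estimated from document-level co-occurrence in a tiny made-up corpus rather than from Web search hits, the target is a single token rather than a multi-word phrase, and a small additive constant avoids taking the log of zero.

```python
import math

PWORDS = ["good", "nice", "excellent", "positive", "fortunate", "correct", "superior"]
NWORDS = ["bad", "nasty", "poor", "negative", "unfortunate", "wrong", "inferior"]

def so_pmi(word, documents, smoothing=0.01):
    """Sum of PMI with Pwords minus sum of PMI with Nwords (document co-occurrence)."""
    docs = [set(d.lower().split()) for d in documents]
    n = len(docs)

    def count(*words):
        # Number of documents containing all the given words.
        return sum(all(w in d for w in words) for d in docs)

    def pmi(a, b):
        joint = (count(a, b) + smoothing) / n
        p_a = (count(a) + smoothing) / n
        p_b = (count(b) + smoothing) / n
        return math.log2(joint / (p_a * p_b))

    return sum(pmi(word, p) for p in PWORDS) - sum(pmi(word, q) for q in NWORDS)

# Hypothetical mini-corpus of reviews.
reviews = [
    "excellent camera with sharp pictures",
    "good lens and sharp focus",
    "nice sharp display",
    "poor battery and blurry pictures",
    "bad flash and blurry focus",
    "nasty blurry viewfinder",
]
print(so_pmi("sharp", reviews) > 0)    # co-occurs with positive seeds
print(so_pmi("blurry", reviews) < 0)   # co-occurs with negative seeds
```

Seed words that never appear in the corpus contribute the same amount to both sums and cancel out, so only observed co-occurrences move the score.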
31Weakly supervised learning
- Gamon and Aue (2005)
- Given a list of seed words (seed words 1)
- Get more seed words (seed words 2): words with low PMI at the sentence level
- Get the semantic orientation of seed words 2 by PMI at the document level
- Get the semantic orientation of all words by PMI with all seed words
32Document level opinion analysis
- Polarity classification: classify documents (e.g., reviews) based on the overall sentiments expressed by authors.
- Approaches
- Use opinion lexicon
- Knowledge engineering
- Supervised learning techniques
- Classifying using the Web as a corpus
- Semi-supervised
33Knowledge Engineering
- Make use of lists of sentiment terms.
- Manually create analysis components based on cognitive linguistic theory: parser, feature structure representation, etc.
34Supervised polarity classifier
- Requirements: a labeled database of opinions
- Download ratings from Amazon.com, epinions.com, etc.
- Build a binary opinion classifier
- From positive and negative ratings
- Merge 1 and 2 stars into negative and 3, 4 and 5 into positive
- Use thresholded SVM, maximum entropy, naïve Bayes, etc.
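The pipeline above can be sketched end to end with a small multinomial naïve Bayes classifier written from scratch. The rated reviews are made-up stand-ins for downloaded data, and the add-one smoothing and whitespace tokenization are assumptions chosen to keep the example short.

```python
import math
from collections import Counter, defaultdict

def stars_to_label(stars: int) -> str:
    """Merge 1-2 stars into 'neg' and 3-5 stars into 'pos'."""
    return "neg" if stars <= 2 else "pos"

class NaiveBayes:
    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)   # label -> word frequencies
        self.class_counts = Counter(labels)       # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label in self.class_counts:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)   # add-one smoothing
            score = math.log(self.class_counts[label] / total_docs)
            for w in text.lower().split():
                if w in self.vocab:                          # ignore unseen words
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best

# Hypothetical (stars, review) pairs standing in for downloaded ratings.
rated = [
    (5, "great pictures and great battery"),
    (4, "good value works well"),
    (3, "good enough for the price"),
    (1, "terrible battery broke quickly"),
    (2, "poor pictures and poor support"),
]
labels = [stars_to_label(s) for s, _ in rated]
model = NaiveBayes().fit([t for _, t in rated], labels)
print(model.predict("poor battery"), model.predict("great value"))
```

Treating 3 stars as positive follows the merge rule stated above; other work drops the middle rating instead, which changes the class balance.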
35Supervised Training
- Obtain labeled sentences: positive, neutral, negative
- Extract features: words, n-grams, multi-word expressions, feature generalization (Kim and Hovy, 2007)
- Feature values: binary/frequency
- Run a training algorithm on the features to produce a classifier
- Optional: do feature selection (use log-likelihood ratio)
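The feature-extraction step can be illustrated with word n-grams and the two kinds of feature value. This sketch covers only words and n-grams; multi-word expressions, feature generalization, and log-likelihood feature selection are left out. The function name and its parameters are hypothetical.

```python
from collections import Counter

def extract_features(sentence: str, max_n: int = 2, binary: bool = True) -> dict:
    """Return word n-gram features (n = 1..max_n) with binary or frequency values."""
    tokens = sentence.lower().split()
    counts = Counter(
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    )
    return {f: (1 if binary else c) for f, c in counts.items()}

freq = extract_features("not good not good at all", binary=False)
flags = extract_features("not good not good at all", binary=True)
print(freq)
print(flags)
```

The bigram "not good" shows why n-grams matter for sentiment: the unigram "good" alone would point the wrong way.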
36Semi-supervised approaches
- Fully supervised techniques require
- a large amount of labeled data for the given domain
- Semi-supervised systems
- Use a small amount of domain knowledge
- From a small set of seed words, use the domain corpus to get domain-relevant opinion words, as discussed earlier
37Semi-supervised approach
- Gamon and Aue (2005)
- Obtain opinion words by a semi-supervised approach
- Given a domain corpus, label data using average semantic orientation
- Train a classifier on the labeled data
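The last two steps can be sketched as follows: documents are labeled automatically by the average semantic orientation of their known words, and a classifier is then trained on those automatic labels. Everything concrete here is an assumption for illustration: the `SO` scores, the confidence threshold, the toy corpus, and the classifier itself, which is a simple per-word vote rather than the model used in the cited work.

```python
from collections import Counter

# Assumed output of an earlier lexicon-building step: word -> semantic orientation.
SO = {"great": 2.0, "sharp": 1.0, "poor": -2.0, "blurry": -1.0}

def average_so(text):
    """Average semantic orientation of the known words in a document (0.0 if none)."""
    scores = [SO[w] for w in text.lower().split() if w in SO]
    return sum(scores) / len(scores) if scores else 0.0

def auto_label(corpus, threshold=0.5):
    """Label documents whose average SO is clearly positive or negative; skip the rest."""
    labeled = []
    for doc in corpus:
        avg = average_so(doc)
        if abs(avg) >= threshold:
            labeled.append((doc, "pos" if avg > 0 else "neg"))
    return labeled

def train(labeled):
    """Toy classifier: each word's weight is (#pos docs - #neg docs) containing it."""
    weights = Counter()
    for doc, label in labeled:
        for w in set(doc.lower().split()):
            weights[w] += 1 if label == "pos" else -1
    return weights

def classify(weights, text):
    score = sum(weights[w] for w in text.lower().split())
    return "pos" if score >= 0 else "neg"

corpus = [
    "great zoom and sharp screen",
    "sharp lens sturdy body",
    "poor zoom and blurry screen",
    "blurry flimsy body",
    "arrived on tuesday",
]
labeled = auto_label(corpus)
weights = train(labeled)
print(classify(weights, "sturdy lens"), classify(weights, "flimsy zoom"))
```

Neither test phrase contains a word from `SO`; the classifier picks up domain words such as "sturdy" and "flimsy" from the automatically labeled documents.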