Automatic Hierarchy Discovery and Opinion Mining of Political Blogs - PowerPoint PPT Presentation

Slides: 49

1
Automatic Hierarchy Discovery and
Opinion Mining of Political Blogs
  • Amit Goyal
  • Kristi McBurnie
  • November 28, 2007

2
Outline
  • Introduction
  • Previous Work
  • Our Approach
  • Example
  • Challenges and Future Work
  • Milestones
  • Conclusion

3
Introduction
  • The Web contains a wealth of opinions about
    products, politics, and other topics, found in
    newsgroup posts, review sites, and elsewhere
  • Our interest: mining opinions expressed in
    user-generated content

4
Applications
  • Businesses and Organizations
  • Market Intelligence: a huge amount of money is
    spent to find consumer sentiments and opinions
  • Opinion polls, surveys
  • Individuals interested in others' opinions when
  • Purchasing a product
  • Finding opinions on political topics
  • Using a service, etc.
  • Smart Ads
  • Place an ad when someone praises a product
  • Place an ad from a competitor when someone
    criticizes a product
  • Opinion Search
  • Provide search for opinions
  • "Give me opinions on Gmail"
  • "Give me a comparison of Gmail vs. Yahoo! Mail"

5
Types of opinions
  • Direct Opinions: sentiment expressions about
    objects, e.g. policies, politicians, movies,
    products
  • E.g. "I find myself in support of the Senate
    Judiciary Committee, which approved legislation
    that clears the way for millions of undocumented
    workers to continue working in America and seek
    citizenship."
  • Comparisons: relations expressing similarities
    or differences between two or more objects
  • E.g. "I think Bush will beat Kerry in the
    presidential elections" or "The lens quality of
    Camera A is better than Camera B"

6
Problem Statement
  • Given an object and a collection of reviews on
    it, the tasks are
  • Identification of features
  • Building a hierarchy of features
  • Sentiment Analysis: determining the orientation
    and strength of opinions
  • Providing a visualization (summary)

7
Previous Work
  • Mainly focused on product and movie reviews
  • Feature Extraction
  • Opinion Observer (Hu and Liu, 2004)
  • Opine (Popescu and Etzioni, 2005)
  • Red Opal (Scaffidi, 2007)
  • Hierarchical Discovery
  • To be filled in by Kristi

8
Previous Work
  • Opinion Observer
  • By Bing Liu and Minqing Hu
  • Feature Extraction
  • Identify nouns using POS tagging
  • Identify noun phrases by association rule mining
  • Compactness pruning, redundancy pruning
  • Opinion word extraction
  • Infrequent feature identification
  • 72% precision and 80% recall
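The frequent-feature step can be sketched as an Apriori-style search in which each sentence is a transaction of nouns. This is a simplified, hypothetical illustration rather than Hu and Liu's implementation. It assumes sentences have already been POS-tagged and reduced to sets of nouns, it caps itemsets at two words, and it omits compactness and redundancy pruning. The function name, the toy sentences, and the 50% support threshold are made up.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Apriori-style level-wise search for frequent noun itemsets.

    transactions: one set of nouns per sentence (assumed POS-tagged upstream).
    min_support: fraction of sentences an itemset must occur in.
    Returns {frozenset_of_nouns: count}.
    """
    min_count = min_support * len(transactions)
    # Level 1: single nouns that meet the support threshold.
    counts = Counter(noun for t in transactions for noun in t)
    frequent = {frozenset([w]): c for w, c in counts.items() if c >= min_count}
    result = dict(frequent)
    k = 2
    while frequent and k <= max_size:
        items = sorted({w for itemset in frequent for w in itemset})
        # Apriori pruning: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = [
            frozenset(combo)
            for combo in combinations(items, k)
            if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1))
        ]
        frequent = {}
        for cand in candidates:
            c = sum(1 for t in transactions if cand <= t)
            if c >= min_count:
                frequent[cand] = c
        result.update(frequent)
        k += 1
    return result


# Hypothetical sentences already reduced to their nouns.
sentences = [
    {"battery", "life"},
    {"battery", "life", "screen"},
    {"screen"},
    {"battery", "life"},
]
features = frequent_itemsets(sentences, min_support=0.5)
```

Here "battery life" appears in 3 of 4 sentences, so it surfaces as a two-word candidate feature. "battery" together with "screen" appears only once and falls below the threshold.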

9
Previous Work
  • OPINE
  • Feature Extraction
  • First, extract nouns and noun phrases and retain
    those with frequency greater than some threshold
  • Evaluate each noun phrase by computing the PMI
    (point-wise mutual information) scores between
    the phrase and meronymy discriminators associated
    with the product class
  • E.g. "of scanner", "scanner has", "scanner comes
    with", etc. for the Scanner class
  • $\mathrm{PMI}(f, d) = \dfrac{\mathrm{Hits}(d + f)}{\mathrm{Hits}(d)\,\mathrm{Hits}(f)}$
  • Then, the PMI scores are converted to binary
    features for a Naïve Bayes classifier, which
    outputs a probability associated with each fact
  • Compared to Hu and Liu's work: 22% better
    precision and 3% lower recall
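The PMI score and its conversion into binary classifier inputs can be sketched as follows. In OPINE the hit counts come from web search queries. Here they are made-up numbers for a hypothetical candidate feature of the Scanner class. The fixed cut-off `threshold` is also an assumption for illustration and is not OPINE's actual discretization.

```python
def pmi(hits_joint, hits_d, hits_f):
    """PMI(f, d) = Hits(d + f) / (Hits(d) * Hits(f)).

    hits_joint: hits for the discriminator and candidate together (d + f).
    Returns 0.0 if either marginal count is zero.
    """
    if hits_d == 0 or hits_f == 0:
        return 0.0
    return hits_joint / (hits_d * hits_f)


def binary_features(hits_f, hits_d, hits_joint, threshold):
    """One 0/1 feature per discriminator (sorted by name) for a classifier.

    threshold is a hypothetical cut-off chosen for illustration.
    """
    return [
        1 if pmi(hits_joint[d], hits_d[d], hits_f) >= threshold else 0
        for d in sorted(hits_d)
    ]


# Made-up hit counts for one candidate feature of the Scanner class.
hits_d = {"of scanner": 1000, "scanner has": 500}
hits_joint = {"of scanner": 50, "scanner has": 1}
vector = binary_features(hits_f=200, hits_d=hits_d, hits_joint=hits_joint, threshold=1e-4)
```

With these counts the candidate co-occurs strongly with "of scanner" (PMI 2.5e-4) but not with "scanner has" (PMI 1e-5), so the resulting feature vector is `[1, 0]`.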

10
Previous Work
  • Red Opal
  • 3 components
  • Feature Extractor
  • Product Scorer
  • User Interface
  • Performs better than Opinion Observer

11
Previous Work
  • Red Opal
  • Feature Extraction
  • POS tagging, takes noun and noun phrases as
    potential features
  • Use lemma frequency to rank the features
  • Product Scoring: score of feature f of product p
  • o(r,f) is the number of occurrences of feature f
    in review r
  • w(r,f) is the weight of feature f in review r

12
Previous Work
  • Clustering
  • Conceptual clustering
  • CLUSTER/2
  • Places object descriptions and attributes
    together to obtain domain-dependent goals
  • COBWEB
  • Favours classes that maximize the information
    that can be predicted from knowledge of class
    membership
  • Hierarchical clustering
  • BIRCH
  • Hierarchically cluster elements in a dataset
  • Level of clustering quality level in the
    hierarchy
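COBWEB's preference for classes that maximize predictable information is formalized as category utility. Consider a partition into classes $C_1, \dots, C_n$ over attributes $A_i$ with values $V_{ij}$. Category utility rewards classes in which attribute values become more predictable than they are without the partition:

```latex
CU(\{C_1, \dots, C_n\}) = \frac{1}{n} \sum_{k=1}^{n} P(C_k)
  \left[ \sum_i \sum_j P(A_i = V_{ij} \mid C_k)^2 - \sum_i \sum_j P(A_i = V_{ij})^2 \right]
```

The bracketed term is the gain in the expected number of attribute values that can be guessed correctly once the class is known. Dividing by n penalizes splitting the data into many small classes.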

13
Previous Work
  • Hierarchy Discovery
  • Han and Fu formally define a concept hierarchy
    as a sequence of mappings from a set of
    lower-level concepts to their higher-level
    correspondences
  • DBLearn automatically discovered a hierarchy of
    concepts for the purpose of data mining
  • E.g. birthplace may have the hierarchy
    city → province → country
  • Foreman et al.
  • Trains categorizers and automatically constructs
    hierarchy of categories using human trainers
  • Good GUI
  • Difficult for novice users and hard to optimize
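A concept hierarchy in Han and Fu's sense can be represented as a mapping from each concept to its higher-level correspondence. Rolling values up the hierarchy is then a walk along that mapping. The sketch below uses a small hypothetical "birthplace" hierarchy (city → province → country) and assumes the mapping is acyclic. The helper names are ours.

```python
# Hypothetical concept hierarchy for the attribute "birthplace":
# each lower-level concept maps to its higher-level correspondence.
# Assumed acyclic, otherwise generalize() would loop forever.
BIRTHPLACE = {
    "Toronto": "Ontario",  # city -> province
    "Ottawa": "Ontario",
    "Montreal": "Quebec",
    "Ontario": "Canada",   # province -> country
    "Quebec": "Canada",
}


def generalize(concept, hierarchy):
    """Follow the mappings upward, e.g. city -> province -> country."""
    chain = [concept]
    while chain[-1] in hierarchy:
        chain.append(hierarchy[chain[-1]])
    return chain


def roll_up(values, hierarchy, levels):
    """Replace each value by its ancestor `levels` steps up (or the top if shorter)."""
    out = []
    for v in values:
        chain = generalize(v, hierarchy)
        out.append(chain[min(levels, len(chain) - 1)])
    return out


provinces = roll_up(["Toronto", "Montreal", "Ottawa"], BIRTHPLACE, levels=1)
```

Rolling up one level turns three city values into two province values ("Ontario" twice, "Quebec" once), which is the kind of generalization DBLearn performs before mining.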

14
Previous Work
15
Previous Work
  • Hierarchy Discovery
  • Sanderson and Croft
  • Automatically develop hierarchy in web documents
  • Organize extracted words/phrases using
    subsumption
  • No clustering or training techniques
  • Yang and Lee
  • Hierarchies of web directories
  • Text mining to discover relationships between
    documents and between words
  • Cluster them into document and word maps

16
Previous Work
  • Sentiment Analysis
  • Esuli and Sebastiani
  • 3 stages
  • Determine subjective/objective polarity
  • Determine positive/negative polarity
  • Determine strength of the positive/negative
    polarity
  • Uses SentiWordNet to assign 3 scores to each word
    (objectivity, positivity, negativity)
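The three stages can be sketched as a decision over SentiWordNet-style scores, where objectivity, positivity and negativity of a word sense sum to 1. This is a hypothetical decision rule, not Esuli and Sebastiani's method. The 0.5 objectivity threshold, the strength definition |pos − neg|, and the example scores are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class SentiScores:
    """Positivity and negativity of one word sense; objectivity is derived."""
    positivity: float
    negativity: float

    @property
    def objectivity(self) -> float:
        # SentiWordNet's three scores sum to 1.
        return 1.0 - self.positivity - self.negativity


def three_stage(scores: SentiScores, objective_threshold: float = 0.5):
    """Stage 1: subjective vs. objective; stage 2: positive vs. negative;
    stage 3: strength. Threshold and strength = |pos - neg| are assumptions."""
    if scores.objectivity >= objective_threshold:
        return ("objective", None, 0.0)
    if scores.positivity > scores.negativity:
        return ("subjective", "positive", scores.positivity - scores.negativity)
    if scores.negativity > scores.positivity:
        return ("subjective", "negative", scores.negativity - scores.positivity)
    return ("subjective", "neutral", 0.0)


# Hypothetical scores for a strongly negative word sense.
label = three_stage(SentiScores(positivity=0.125, negativity=0.75))
```

For these made-up scores, objectivity is 0.125, so the sense is treated as subjective, negative, with strength 0.625.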

17
Previous Work
  • Sentiment Analysis
  • Pang and Lee
  • Use only the subjective portions of movie
    reviews
  • Machine learning techniques
  • Pair-wise relations between sentences are used
    to build an undirected graph
  • Minimum cut
  • Efficient and results in higher accuracy rates
  • Agarwal and Bhattacharyya
  • SVM classifier
  • Determine the strength of polarity of subjective
    adjectives (good vs. bad classification) based on
    WordNet's synonymy graph
  • Applied a cut-based graph approach similar to
    Pang and Lee
  • Reached accuracies of 84% to 95.6%

18
Our proposal
  • Apply feature extraction and opinion mining in
    political domain
  • Applications in political domain
  • Automatic opinion polls
  • Identification of local/global issues in
    elections
  • Target campaigning in elections
  • Impact of speeches
  • Output: <politician, topic, opinion, polarity>
  • Objects are politicians
  • Categories are political organizations
  • Topics may be policies, issues, etc.
  • In this project, we focus mainly on feature
    extraction and their hierarchy discovery
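The output tuple can be held in a small record type, and the "automatic opinion poll" application then reduces to counting polarities per politician and topic. This is a sketch under our own naming (`OpinionRecord` and `opinion_poll` are hypothetical). The first record mirrors the worked example later in the talk, and the other two records are made up.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class OpinionRecord:
    """One output tuple <politician, topic, opinion, polarity>."""
    politician: str
    topic: str
    opinion: Tuple[str, ...]  # opinion words from the subjective sentence
    polarity: str             # "positive" or "negative"


def opinion_poll(records: Iterable[OpinionRecord]) -> Counter:
    """Hypothetical automatic opinion poll: count polarities per (politician, topic)."""
    return Counter((r.politician, r.topic, r.polarity) for r in records)


records = [
    OpinionRecord("George Bush", "War in Iraq", ("absurd", "killing", "freeing"), "negative"),
    OpinionRecord("George Bush", "War in Iraq", ("support",), "positive"),  # made up
    OpinionRecord("George Bush", "War in Iraq", ("absurd",), "negative"),   # made up
]
poll = opinion_poll(records)
```

A category field (the political organization) could be added to the record in the same way if the poll should be aggregated per party.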

19
Our Approach
  • Observations
  • Two kinds of opinions
  • Direct: talks about a single object
  • Comparison: talks about multiple objects
  • Two kinds of information
  • Facts (objective)
  • Opinions (subjective)
  • Sentiment analysis can be done only on
    subjective information
  • Although features occur in both categories,
    subjective sentences are noisy

20
Comparison to product domain
                  Product Domain                   Political Domain
Category          Product category (e.g. Camera)   Political organizations (e.g. Democrats)
Object            Product (e.g. Camera A)          Leaders (e.g. Bush)
Features/Topics   Properties (e.g. lens)           Policies (e.g. Immigration)

21
Our Approach
22
Our Approach
  • Perform feature extraction
  • Split into objective and subjective phrases
  • Hierarchy discovery on features from objective
    sentences
  • Sentiment analysis on features from subjective
    sentences

23
Our Approach
  • Feature Extraction
  • Extract the features
  • Extract nouns using POS tagging
  • Extract noun phrases using association rule
    mining
  • Pruning
  • Rank the features based on lemma frequency
  • Identify the subjectivity of all sentences
  • Mine the opinion words (adjectives)
  • Use a key-phrase dictionary (e.g. "can you
    believe", "I think", "I recommend", etc.)
  • Visual differences: factual data is often
    presented in quotes
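The three subjectivity cues on this slide (opinion adjectives, a key-phrase dictionary, and quotes marking factual data) can be combined into a simple rule-based check. This is a sketch under stated assumptions. The word lists are tiny hypothetical stand-ins for real lexicons, the sentence splitter is naive, and quoted text is simply ignored when looking for cues.

```python
import re

# Hypothetical, deliberately tiny resources; a real system would use a
# full key-phrase dictionary and opinion-adjective lexicon.
KEY_PHRASES = ("can you believe", "i think", "i recommend")
OPINION_ADJECTIVES = {"absurd", "excellent", "poor", "great", "terrible"}


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_subjective(sentence):
    lowered = sentence.lower()
    # Quoted text is treated as reported fact, so drop it before matching.
    unquoted = re.sub(r'"[^"]*"', " ", lowered)
    if any(re.search(r"\b" + re.escape(p) + r"\b", unquoted) for p in KEY_PHRASES):
        return True
    words = re.findall(r"[a-z]+", unquoted)
    return any(w in OPINION_ADJECTIVES for w in words)


text = ("The economic cost of the war in Iraq is estimated to total $1.3 trillion. "
        "I think this is an absurd amount of money.")
flags = [is_subjective(s) for s in split_sentences(text)]
```

On a shortened version of the later example, the first sentence is flagged objective and the second subjective ("I think", "absurd"). A sentence such as `The senator called the plan "absurd".` is left objective because the only opinion word sits inside quotes.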

24
Our Approach
  • Hierarchy Discovery
  • 3 approaches
  • Subsumption
  • Sanderson and Croft
  • Look at every pair of terms and apply subsumption
  • X subsumes Y if the documents in which Y occurs
    are a subset of the documents in which X occurs
  • $P(X \mid Y) = 1$ and $P(Y \mid X) < 1$
  • Clustering
  • Use DBpedia and/or YAGO
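The subsumption test can be sketched directly from document sets: X subsumes Y when every document containing Y also contains X, but not the reverse. The `threshold` parameter is our addition. At its default of 1.0 it gives exactly $P(X \mid Y) = 1$, and lowering it would tolerate noisy co-occurrence. The toy documents are hypothetical.

```python
from itertools import permutations


def subsumption_pairs(doc_terms, threshold=1.0):
    """Return (x, y) pairs where term x subsumes term y.

    doc_terms: list of sets, the terms occurring in each document.
    x subsumes y when P(x|y) >= threshold and P(y|x) < 1. With the default
    threshold of 1.0, every document containing y must also contain x.
    """
    docs_with = {}
    for i, terms in enumerate(doc_terms):
        for term in terms:
            docs_with.setdefault(term, set()).add(i)
    pairs = []
    for x, y in permutations(sorted(docs_with), 2):
        both = len(docs_with[x] & docs_with[y])
        p_x_given_y = both / len(docs_with[y])
        p_y_given_x = both / len(docs_with[x])
        if p_x_given_y >= threshold and p_y_given_x < 1:
            pairs.append((x, y))
    return pairs


# Hypothetical documents reduced to their extracted phrases.
docs = [
    {"war in iraq", "economic cost"},
    {"war in iraq"},
    {"war in iraq", "troops"},
    {"immigration"},
]
pairs = subsumption_pairs(docs)
```

Here "war in iraq" subsumes both "economic cost" and "troops", giving a two-level fragment of a policy hierarchy. "immigration" never co-occurs with anything and stays unattached.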

25
Our Approach
  • Hierarchy Discovery
  • 3 approaches
  • Subsumption
  • Clustering
  • Yang and Lee
  • Cluster phrases by co-occurrence
  • Using an unsupervised learning algorithm: SOM
    (self-organizing map) networks
  • Organizes phrases into a 2D map of neurons
  • According to similarity of vectors
  • 3 Steps
  • Training process
  • Assigning phrases to a neuron
  • Labelling process
  • Use DBpedia and/or YAGO
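The three SOM steps (training, assigning phrases to neurons, labelling) can be sketched in plain Python. This is a generic SOM and not Yang and Lee's exact configuration. The grid size, epoch count, linearly decaying learning rate, Gaussian neighbourhood, and the two-dimensional co-occurrence vectors are all assumptions for illustration.

```python
import math
import random


def best_matching_unit(weights, v):
    """Grid position of the neuron whose weight vector is closest to v."""
    return min(weights, key=lambda n: sum((a - b) ** 2 for a, b in zip(weights[n], v)))


def train_som(vectors, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Training process: fit a rows x cols map of neurons to the input vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2
    total, step = epochs * len(vectors), 0
    for _ in range(epochs):
        for v in rng.sample(vectors, len(vectors)):  # shuffled pass over inputs
            frac = step / total
            lr = lr0 * (1 - frac)                     # decaying learning rate
            radius = max(radius0 * (1 - frac), 0.5)   # shrinking neighbourhood
            bmu = best_matching_unit(weights, v)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])
            step += 1
    return weights


def assign_and_label(weights, phrase_vectors):
    """Assign each phrase to its best-matching neuron, then label neurons with their phrases."""
    labels = {node: [] for node in weights}
    for phrase, v in phrase_vectors.items():
        labels[best_matching_unit(weights, v)].append(phrase)
    return labels


# Hypothetical normalized co-occurrence vectors over two context words.
phrases = {
    "war in iraq": [1.0, 0.0], "surge": [0.9, 0.1],
    "immigration": [0.0, 1.0], "citizenship": [0.1, 0.9],
}
weights = train_som(list(phrases.values()), rows=2, cols=2)
labels = assign_and_label(weights, phrases)
```

Squared Euclidean distance is used for matching here, which is only one possible answer to the "distance measurement for clustering" question raised later under Challenges.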

26
Our Approach
  • Hierarchy Discovery
  • 3 approaches
  • Subsumption
  • Clustering
  • Find a group of dominating clusters (neurons)
  • Make these superclusters and put their
    neighbours one level down
  • Repeat for lower levels of the hierarchy under
    each subcluster
  • Use DBpedia and/or YAGO
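Building the top level of the hierarchy from a labelled SOM grid can be sketched as follows. Two points are our assumptions because the slide does not define them. We treat the neurons holding the most phrases as "dominating", and we attach each remaining non-empty neuron to the nearest dominating neuron by grid distance. The labels are hypothetical.

```python
def build_hierarchy(labels, num_super=2):
    """Two-level hierarchy from SOM neuron labels {(row, col): [phrases]}.

    Dominating neurons (assumed here: those holding the most phrases) become
    superclusters; every other non-empty neuron goes one level down under the
    nearest dominating neuron on the grid (ties broken by grid position).
    """
    nonempty = {n: ps for n, ps in labels.items() if ps}
    dominating = sorted(nonempty, key=lambda n: (-len(nonempty[n]), n))[:num_super]
    tree = {d: {"phrases": list(nonempty[d]), "children": []} for d in dominating}
    for node in sorted(nonempty):
        if node in tree:
            continue
        parent = min(dominating,
                     key=lambda d: ((d[0] - node[0]) ** 2 + (d[1] - node[1]) ** 2, d))
        tree[parent]["children"].append({"neuron": node, "phrases": list(nonempty[node])})
    return tree


# Hypothetical neuron labels from a trained 3 x 3 map.
labels = {
    (0, 0): ["war in iraq", "troops", "surge"],
    (0, 1): ["economic cost"],
    (1, 1): [],
    (2, 1): ["citizenship"],
    (2, 2): ["immigration", "border"],
}
tree = build_hierarchy(labels)
```

Deeper levels would come from repeating the procedure inside each supercluster, for example on a smaller map trained only on that supercluster's phrases.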

27
Our Approach
  • Hierarchy Discovery
  • 3 approaches
  • Subsumption
  • Clustering
  • Use DBpedia and/or YAGO
  • DBpedia provides 3 classification schemes
  • Wikipedia categories
  • YAGO classification
  • WordNet synset links

28
Our Approach
  • Hierarchy Discovery
  • 3 approaches
  • Subsumption
  • Clustering
  • Use DBpedia and/or YAGO

29
Our Approach
  • Hierarchy Discovery


DBpedia and YAGO

Clustering and Subsumption

30
Our Approach
  • Sentiment Analysis
  • 2 ways to approach this
  • Subjective phrases
  • What does the public think about each policy
  • Objective phrases
  • What is the policy
  • Rank parties on each policy along a scale from
    right-wing to left-wing

31
Our Approach
  • Sentiment Analysis
  • Subjective phrases
  • What does the public think about each policy
  • Pang and Lee
  • Cut-based classification (Pang and Lee)
  • Individual scores
  • Association scores
  • Partition Cost
  • A cut (S, T) of G is a partition of its nodes
    into sets $S = \{s\} \cup S'$ and
    $T = \{t\} \cup T'$, where $s \notin S'$ and
    $t \notin T'$. Its cost, cost(S, T), is the sum
    of the weights of all edges crossing from S to T
  • A minimum cut of G is one of minimum cost.
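Cut-based classification assigns items to C1 (subjective) or C2 (objective) by minimizing the partition cost $\sum_{x \in C_1} ind_2(x) + \sum_{x \in C_2} ind_1(x) + \sum_{x_i \in C_1, x_k \in C_2} assoc(x_i, x_k)$. This equals a minimum s-t cut in a graph where s links to each item with weight ind1, each item links to t with weight ind2, and items are joined by association edges. The sketch below finds that cut with Edmonds-Karp max flow. The function name and the scores for three sentences Y, M, N are hypothetical.

```python
from collections import deque


def min_cut_partition(ind1, ind2, assoc):
    """Cut-based classification via s-t minimum cut.

    ind1[x]: individual score for x being subjective (class C1).
    ind2[x]: individual score for x being objective (class C2).
    assoc[(a, b)]: association score between items a and b (undirected).
    Returns (C1, C2, cost) where cost is the minimum partition cost.
    """
    s, t = "<source>", "<sink>"  # assumed not to clash with item names
    cap = {s: {}, t: {}}

    def add_arc(u, v, c):
        cap.setdefault(u, {}).setdefault(v, 0.0)
        cap.setdefault(v, {}).setdefault(u, 0.0)  # residual reverse arc
        cap[u][v] += c

    for x in ind1:
        add_arc(s, x, ind1[x])  # cut iff x ends up in C2: costs ind1(x)
        add_arc(x, t, ind2[x])  # cut iff x ends up in C1: costs ind2(x)
    for (a, b), w in assoc.items():
        add_arc(a, b, w)  # undirected edge = two arcs of capacity w
        add_arc(b, a, w)

    cost = 0.0
    while True:
        # Edmonds-Karp: BFS for a shortest augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break  # parent now holds every node reachable from s
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
        cost += push  # max-flow value equals min-cut cost

    c1 = sorted(x for x in ind1 if x in parent)
    c2 = sorted(x for x in ind1 if x not in parent)
    return c1, c2, cost


# Hypothetical scores for three sentences Y, M, N.
ind1 = {"Y": 0.8, "M": 0.5, "N": 0.1}
ind2 = {"Y": 0.2, "M": 0.5, "N": 0.9}
assoc = {("Y", "M"): 1.0, ("Y", "N"): 0.1, ("M", "N"): 0.2}
c1, c2, cost = min_cut_partition(ind1, ind2, assoc)
```

With these made-up scores, M's individual scores are undecided (0.5 each), but its strong association with Y pulls it into C1. The result is C1 = {M, Y} and C2 = {N} at cost 1.1, the cheapest of the eight possible partitions.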

32
Our Approach
  • Sentiment Analysis
  • Subjective phrases
  • What does the public think about each policy
  • Agarwal and Bhattacharyya
  • Determine adjective strength
  • Cut-based classification between sentences
    (Pang and Lee)
  • Cut-based classification between documents
  • Improved accuracy

33
Our Approach
  • Sentiment Analysis
  • Objective phrases
  • What is the policy
  • Rank parties on each policy along a scale from
    right-wing to left-wing
  • Define polarity as left/right by comparing
    against left-wing and right-wing
    policies/ideals
  • Instead of the traditional positive/negative
    polarity defined by the ideal words "poor" and
    "excellent"

Scale: Right-wing (Conservative) ↔ Left-wing (Liberal)
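The "poor"/"excellent" reference-word scheme is Turney's semantic orientation (SO-PMI). One way to realize the left/right polarity is to keep that formula and swap in a left-wing and a right-wing reference word. The sketch assumes hit counts are already available. The choice of reference words, the 0.01 smoothing, and the "centre" dead zone `margin` are hypothetical.

```python
import math


def lr_orientation(hits_near_left, hits_near_right, hits_left, hits_right, smoothing=0.01):
    """SO-PMI-style score with left/right reference words instead of
    "excellent"/"poor". Positive -> leans left-wing, negative -> right-wing.

    hits_near_*: co-occurrence counts of the phrase with each reference word;
    hits_*: counts of the reference words alone. smoothing avoids log(0).
    """
    return math.log2(
        ((hits_near_left + smoothing) * hits_right)
        / ((hits_near_right + smoothing) * hits_left)
    )


def lr_label(score, margin=0.5):
    """Map a score onto the left/right scale; margin is a hypothetical dead zone."""
    if score > margin:
        return "left-wing"
    if score < -margin:
        return "right-wing"
    return "centre"


# Made-up counts: the phrase co-occurs 4x more often with the left reference word.
score = lr_orientation(100, 25, 1000, 1000, smoothing=0)
```

With equal reference-word frequencies, a phrase that co-occurs four times more often with the left-wing reference word scores log2(4) = 2 and is labelled left-wing.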
34
Example
  • Ideal case
  • The economic cost of the war in Iraq is estimated
    to total $1.3 trillion, roughly double the
    amount the White House has requested thus far,
    according to a new report by Democrats on
    Congress Joint Economic Committee. I think this
    is an absurd amount of money to be spending on
    killing people and freeing oil fields.
  • Political Organization: Republicans
  • Politician: George Bush
  • Topic: War in Iraq
  • Sub-topic: cost
  • Opinion words: absurd, killing, freeing
  • Polarity: negative

35
Example
  • Feature Extraction
  • The economic cost of the war in Iraq is estimated
    to total $1.3 trillion, roughly double the
    amount the White House has requested thus far,
    according to a new report by Democrats on
    Congress Joint Economic Committee. I think this
    is an absurd amount of money to be spending on
    killing people and freeing oil fields.
  • Noun phrases: economic cost, war in Iraq,
    amount, report, amount, money, people, oil fields
  • Proper nouns: White House, Democrats on Congress
    Joint Economic Committee
  • Frequent features: economic cost, war in Iraq,
    money, oil fields, White House

36
Example
  • Identification of Subjectivity
  • The economic cost of the war in Iraq is estimated
    to total $1.3 trillion, roughly double the
    amount the White House has requested thus far,
    according to a new report by Democrats on
    Congress Joint Economic Committee. I think this
    is an absurd amount of money to be spending on
    killing people and freeing oil fields.
  • Opinion words: think, absurd
  • 1st sentence is objective, and 2nd is subjective
  • Interesting features: economic cost, war in Iraq

37
Example
  • Hierarchy Discovery step 1
  • The economic cost of the war in Iraq is estimated
    to total $1.3 trillion, roughly double the
    amount the White House has requested thus far,
    according to a new report by Democrats on
    Congress Joint Economic Committee. I think this
    is an absurd amount of money to be spending on
    killing people and freeing oil fields.
  • Identification of category/object for proper
    nouns using DBpedia
  • Category: Republicans
  • Object: George Bush

38
Example
39
Example
  • Hierarchy Discovery step 2
  • The economic cost of the war in Iraq is estimated
    to total $1.3 trillion, roughly double the
    amount the White House has requested thus far,
    according to a new report by Democrats on
    Congress Joint Economic Committee. I think this
    is an absurd amount of money to be spending on
    killing people and freeing oil fields.
  • Identification of policy hierarchy using
    subsumption and clustering
  • Policies are derived from interesting features:
    economic cost, war in Iraq

40
Example
41
Example
  • Sentiment Analysis
  • The economic cost of the war in Iraq is estimated
    to total $1.3 trillion, roughly double the
    amount the White House has requested thus far,
    according to a new report by Democrats on
    Congress Joint Economic Committee. I think this
    is an absurd amount of money to be spending on
    killing people and freeing oil fields.
  • Opinion is the subjective sentence
  • Polar words: absurd, spending, killing, freeing
  • Polarity: Negative

42
Challenges
  • Difficult to distinguish between objective and
    subjective information
  • Opinion words also occur in objective sentences
  • Identification of spam blogs
  • Identification of implicit features
  • Mapping politicians to policies in comparison
    blogs
  • Deciding on a distance measurement for clustering

43
Future Work
  • Implementation of algorithms
  • Summarization of opinions
  • Visualization
  • Refinements

44
Milestones
  • Decide on domain
  • Read previous works
  • Decide on an approach that is best for the domain
  • Write up an example to illustrate it
  • Challenges and future work
  • Presentation
  • Write the paper

45
Questions?
46
Previous Work
  • OPINE (Backup Slide)
  • Overall Process

47
Previous Work
  • Opinion Observer (Backup Slide)
  • By Bing Liu and Minqing Hu
