An Opinion Mining System for Chinese Automobile Reviews
Tianfang Yao, Qingyang Nie, Jianchao Li, Linlin Li, Decheng Lou, Ke Chen, Yu Fu
Department of Computer Science and Engineering, Shanghai Jiao Tong University
800 Dong Chuan Rd., Shanghai 200240, China
Email: tf_yao_at_yahoo.com.cn, tonywhitewhite_at_qq.com,
flyinghigher_at_sjtu.edu.cn, ridingwind_at_sjtu.edu.cn,
lou-de_at_163.com, equalchen_at_sjtu.edu.cn,
ralf_kk_at_hotmail.com
  • 1. Introduction
  • As online shopping becomes commonplace, the
    number of customer reviews of products is growing
    rapidly as well, so it is difficult for a customer
    to read all of the reviews and make a reasonable
    decision about whether to purchase a certain
    product. Our main task is to extract the opinions
    that customers express in reviews about different
    features of different car brands, and to determine
    whether these opinions are positive, negative, or
    neutral and how strong they are. This paper
    introduces Surveyer, a practical system that
    accomplishes opinion mining tasks with natural
    language processing techniques, together with its
    related algorithms.
  • 2. Interface of Opinion Observer

4. Resource Building: Ontology and Polarity Dictionary
Ontology: There are two taxonomies in our
ontology, one representing cars and the other
representing features of cars. Each category in a
taxonomy has a name, weight attributes, and extra
information such as synonyms of the name. All
categories are arranged in a hierarchical
structure to describe relations between different
cars or car features.
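
As a rough illustration of the taxonomy structure described above, the following minimal Python sketch models a category with a name, a weight, synonyms, and child categories. The class name `Category`, the single numeric `weight` field, and the sample entries are assumptions made for this example; the poster does not specify how Surveyer stores its ontology.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Category:
    """One node in a taxonomy (a car or a car feature)."""
    name: str
    weight: float = 1.0  # assumed: one numeric weight per category
    synonyms: List[str] = field(default_factory=list)
    children: List["Category"] = field(default_factory=list)

    def add(self, child: "Category") -> "Category":
        """Attach a sub-category and return it for chaining."""
        self.children.append(child)
        return child

    def find(self, term: str) -> Optional["Category"]:
        """Depth-first lookup by name or synonym."""
        if term == self.name or term in self.synonyms:
            return self
        for child in self.children:
            hit = child.find(term)
            if hit is not None:
                return hit
        return None


# Illustrative feature taxonomy; the entries are invented for this example.
features = Category("car feature")
engine = features.add(Category("engine", weight=0.9, synonyms=["motor"]))
engine.add(Category("fuel consumption", weight=0.7, synonyms=["gas mileage"]))

print(features.find("gas mileage").name)
```

Resolving synonyms to a single category node is what would let opinions about "motor" and "engine" be counted under the same feature.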
Polarity Dictionary: The polarity dictionary
contains two kinds of words: polarity words and
modifier words. A polarity word has 6 attributes:
text, POS, def, exceptional-feature,
dynamic-polarity, and strength. The text attribute
is the word itself. The POS attribute gives the
word's part of speech. The def attribute is the
concept definition of the word from HowNet. The
exceptional-feature and dynamic-polarity
attributes handle special cases in which a word
has a polarity different from its basic polarity.
For example, the word "high" is positive when it
modifies the word "quality", but negative when it
modifies the word "price". The strength attribute
reflects how strong the word's polarity is.
Modifier words are words that can strengthen,
weaken, or even reverse the polarity of polarity
words, and their attributes are very similar to
those of the polarity words.
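
To make the interaction between polarity words, exceptional features, and modifier words concrete, here is a small Python sketch. The field names follow the attributes listed above, but the `exceptions` mapping, the multiplicative `factor` on modifiers, and all numeric values are assumptions chosen for illustration, not the dictionary format actually used by Surveyer.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PolarityWord:
    text: str
    pos: str          # part of speech
    definition: str   # stands in for the HowNet "def" attribute
    polarity: int     # basic polarity: +1, 0, or -1
    strength: float   # strength of the polarity
    # Assumed encoding of exceptional-feature / dynamic-polarity:
    # feature word -> polarity that overrides the basic one.
    exceptions: Dict[str, int] = field(default_factory=dict)


@dataclass
class ModifierWord:
    text: str
    factor: float     # assumed: >1 strengthens, 0..1 weakens, <0 reverses


def score(word: PolarityWord, feature: str,
          modifiers: List[ModifierWord]) -> float:
    """Signed opinion strength of `word` applied to `feature`."""
    polarity = word.exceptions.get(feature, word.polarity)
    value = polarity * word.strength
    for modifier in modifiers:
        value *= modifier.factor
    return value


# Invented entries mirroring the "high quality" / "high price" example.
high = PolarityWord("high", "adj", "placeholder-def", +1, 0.6, {"price": -1})
very = ModifierWord("very", 1.5)
not_ = ModifierWord("not", -1.0)

print(score(high, "quality", []) > 0)      # basic polarity applies
print(score(high, "price", [very]) < 0)    # exception flips the polarity
print(score(high, "quality", [not_]) < 0)  # modifier reverses the polarity
```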
In this system, users can make two kinds of
comparisons: between different brands, and between
different parts of a certain car. In the left
figure, six products are selected for comparison.
Users choose brands from the left column of the
interface and the cars to compare from the top
menu. A bar chart then appears on the right. The
bars above the x-axis show the number of positive
opinions (in red) and the bars below the x-axis
show the number of negative opinions (in blue), so
the statistical evaluation of consumer reviews is
clearly visible. The right figure looks much the
same as the left one; the main difference is that
it deals with features of cars. It gives a clear
impression of how consumers view different
features of each product.
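
The counts behind such a bar chart could be produced along the following lines. This is only a sketch: the function names `tally` and `bar_heights`, the `(target, polarity)` input format, and the sample data are assumptions for illustration, and neutral opinions are simply left out of the chart here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tally(opinions: Iterable[Tuple[str, int]]) -> Dict[str, Dict[str, int]]:
    """Count positive and negative opinions per target (brand or feature)."""
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for target, polarity in opinions:
        if polarity > 0:
            counts[target]["positive"] += 1
        elif polarity < 0:
            counts[target]["negative"] += 1
    return dict(counts)


def bar_heights(counts: Dict[str, Dict[str, int]]) -> Dict[str, Tuple[int, int]]:
    """Positive bars go above the x-axis, negative bars below it."""
    return {t: (c["positive"], -c["negative"]) for t, c in counts.items()}


# Invented sample opinions: (target, polarity sign).
opinions = [("Brand A", 1), ("Brand A", -1), ("Brand A", 1),
            ("Brand B", -1), ("Brand B", 0)]
counts = tally(opinions)
print(bar_heights(counts))
```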

5. Pattern Generation and Effective Evaluation
3. System Architecture
Generation: Two features are used to generate
patterns: the syntactic nodes in the parse tree
and the part of speech of each related word. Four
annotators hand-crafted the training data, and
rules are generated automatically from the
annotated texts according to predefined criteria.
Several optimization methods are applied before
the automatically generated rules are put into the
pattern library, which is the source for
identifying new relations.
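
One way to picture pattern generation from these two features is the Python sketch below. It reduces each annotated example to a sequence of (syntactic node label, part of speech) pairs and keeps the sequences that occur often enough. The frequency threshold `min_count` stands in for the unspecified "predefined criteria", and the labels such as `SBV` and `HED` are placeholders; none of this is taken from the authors' implementation.

```python
from collections import Counter
from typing import List, Set, Tuple

Token = Tuple[str, str, str]           # (word, part of speech, node label)
Pattern = Tuple[Tuple[str, str], ...]  # sequence of (node label, POS)


def to_pattern(tokens: List[Token]) -> Pattern:
    """Keep only the two features named in the text: node label and POS."""
    return tuple((node, pos) for _word, pos, node in tokens)


def build_library(annotated: List[List[Token]], min_count: int = 2) -> Set[Pattern]:
    """Assumed criterion: keep patterns seen at least `min_count` times."""
    counts = Counter(to_pattern(example) for example in annotated)
    return {pattern for pattern, n in counts.items() if n >= min_count}


def matches(tokens: List[Token], library: Set[Pattern]) -> bool:
    """A new word sequence matches if its abstracted pattern is in the library."""
    return to_pattern(tokens) in library


# Invented annotated examples of feature-polarity relations.
annotated = [
    [("engine", "n", "SBV"), ("good", "a", "HED")],
    [("price", "n", "SBV"), ("high", "a", "HED")],
    [("buy", "v", "HED"), ("car", "n", "VOB")],
]
library = build_library(annotated)

print(matches([("seat", "n", "SBV"), ("comfortable", "a", "HED")], library))
```

Because the words themselves are dropped, a pattern learned from "engine ... good" also covers unseen pairs with the same syntactic shape.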
Evaluation: Several tests were run to evaluate
the effectiveness of this pattern-building method.
Human-annotated test data served as the gold
standard, and we obtained an average recall of 80%
and precision of 60%, measured mainly on
feature-polarity patterns and car-polarity
patterns. Most mistakes concern polarity strength,
while the direction of polarity is correct most of
the time. The results are quite promising: using
only part-of-speech and syntactic features, this
method achieves relatively high performance. In
future research, we will consider adding more
features to rebuild the pattern knowledge base.
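
For reference, recall and precision against a gold standard can be computed as in the short Python sketch below. The relation identifiers are invented toy data, not the authors' test set, and the function name `precision_recall` is chosen for this example.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[int], gold: Set[int]) -> Tuple[float, float]:
    """Precision = correct / predicted; recall = correct / gold."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data: 5 gold relations; the system proposes 6, of which 4 are correct.
gold = {1, 2, 3, 4, 5}
predicted = {1, 2, 3, 4, 6, 7}
precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A system with higher recall than precision, as reported above, finds most of the gold relations but also proposes a fair number of spurious ones.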
6. A Self-developed Annotation Tool
The Surveyer annotation tool is designed not only
to meet the needs of annotation, but also to show
the processing flow of the system. It gives a
clear view of how Surveyer extracts opinions and
determines their polarity step by step. The
automatically generated rule file can also be
exported from the annotated data here.
The corpus used in our system consists of reviews
from a bulletin board, available at the following
website: http://autobbs.pconline.com.cn/
Many reviews in the corpus are written with
irregular punctuation, so criteria for splitting
sentences need to be built first. Each sentence is
then processed in a stage we call element
construction, in which we use several tools and
resources (the syntactic parser, the POS tagger,
the Ontology, and the Polarity Dictionary) to
build a syntactic dependency structure and assign
a tag to each word in the sentence according to
its potential use in the following stage. The
pronominal resolution and ellipsis recovery model
mainly deals with feature words, that is, car
names or names of car features in our system.
After that comes an element reconstruction stage.
In the last two stages, we first identify
constituent relations using a pattern library
built from training data, and then summarize the
opinions at the paragraph level. Finally, the
results can be visualized with the Opinion
Observer.
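
The sentence-splitting step mentioned above can be sketched as follows. The splitting criterion used here (any run of Chinese or Western sentence-final marks, or a line break, ends a sentence) is an assumption for illustration; the poster does not state the criteria Surveyer actually uses. The ASCII period is deliberately left out of the boundary set so that values such as "1.6L" are not split.

```python
import re
from typing import List

# Assumed criterion: a run of sentence-final marks or line breaks is one boundary.
_BOUNDARY = re.compile(r"[。！？!?；;…~～\n]+")


def split_sentences(text: str) -> List[str]:
    """Split a review into sentences, tolerating repeated punctuation."""
    parts = _BOUNDARY.split(text)
    return [part.strip() for part in parts if part.strip()]


# Invented review text with irregular punctuation.
review = "发动机很好！！！油耗太高。。。价格还行"
print(split_sentences(review))
```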