Title: NLTK Sentiment Analysis
About NLTK
The Natural Language Toolkit, more commonly known as NLTK, is a suite of
libraries and programs for symbolic and statistical natural language
processing (NLP) of English text, written in the Python programming
language. It was developed by Steven Bird and Edward Loper in the
Department of Computer and Information Science at the University of
Pennsylvania.
Copyright © 2019 Learntek. All Rights Reserved.
Sentiment Analysis
Sentiment analysis is a branch of computer science that overlaps heavily
with machine learning and computational linguistics. It is one of the most
common text classification tools: it analyses an incoming message and tells
whether the underlying sentiment is positive, negative or neutral. It is
the process of computationally identifying and categorizing opinions
expressed in a piece of text, especially in order to determine whether the
writer's attitude towards a particular topic, product, etc. is positive,
negative, or neutral.
Sentiment analysis is a concept within natural language processing. It is
sometimes referred to as opinion mining, although the emphasis in that case
is on extraction.
Examples of questions that sentiment analysis can answer are as follows:
- Is this product review positive or negative?
- Is this customer email satisfied or dissatisfied?
- Based on a sample of tweets, how are people responding to this ad
  campaign/product release/news item?
- How have bloggers' attitudes about the president changed since the
  election?
The purpose of the sentiment analysis here is to automatically classify a
tweet as positive or negative according to its sentiment.
- Given a movie review or a tweet, it can be automatically classified into
  categories. These categories can be user defined (positive, negative) or
  whichever classes you want.
- Sentiment Analysis for Brand Monitoring
- Sentiment Analysis for Customer Service
- Sentiment Analysis for Market Research and Analysis
Sample Positive Tweets
- I love this car
- This view is amazing
- I feel great this morning
- I am so excited about the concert
- He is my best friend
Sample Negative Tweets
- I do not like this car
- This view is horrible
- I feel tired this morning
- I am not looking forward to the concert
- He is my enemy
Sentiment Analysis Process
- The list of word features needs to be extracted from the tweets.
- It is a list of every distinct word, ordered by frequency of appearance.
- A feature extractor is then used to decide which features are relevant.
- The one we are going to use returns a dictionary indicating which words
  are contained in the input passed.
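The extraction steps above can be sketched in plain Python. This is a minimal sketch, not the presentation's own code: the helper names `get_word_features` and `extract_features`, the whitespace tokenization, and the `contains(word)` key format are assumptions made for illustration.

```python
from collections import Counter

# A few of the sample tweets from above, paired with their labels.
tweets = [
    ("I love this car", "positive"),
    ("This view is amazing", "positive"),
    ("I do not like this car", "negative"),
    ("This view is horrible", "negative"),
]

def get_word_features(labeled_tweets):
    """Return every distinct lowercased word, most frequent first."""
    counts = Counter(
        word.lower()
        for text, _label in labeled_tweets
        for word in text.split()
    )
    return [word for word, _count in counts.most_common()]

def extract_features(text, word_features):
    """Map each known word to True/False: is it contained in the text?"""
    words = {word.lower() for word in text.split()}
    return {f"contains({w})": (w in words) for w in word_features}

word_features = get_word_features(tweets)
features = extract_features("I love this view", word_features)

print(word_features[:3])
print(features["contains(love)"], features["contains(horrible)"])
```

Splitting on whitespace keeps the sketch free of external data; a real tokenizer would also separate punctuation from words.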
Naive Bayes Classifier
- It uses the prior probability of each label, which is the frequency of
  each label in the training set, and the contribution from each feature.
- In our case, the frequency of each label is the same for positive and
  negative.
- The word "amazing" appears in 1 of 5 of the positive tweets and in none
  of the negative tweets.
- This means that the likelihood of the positive label will be multiplied
  by 0.2 when this word is seen as part of the input.
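The arithmetic above can be reproduced directly from the sample tweets. The following is a minimal sketch using unsmoothed relative frequencies; the function names are hypothetical, and real implementations (NLTK's included) typically smooth these estimates so that an unseen word does not force a probability of exactly zero.

```python
positive = [
    "I love this car",
    "This view is amazing",
    "I feel great this morning",
    "I am so excited about the concert",
    "He is my best friend",
]
negative = [
    "I do not like this car",
    "This view is horrible",
    "I feel tired this morning",
    "I am not looking forward to the concert",
    "He is my enemy",
]

def prior(label_tweets, all_tweets):
    """Prior probability of a label: its share of the training set."""
    return len(label_tweets) / len(all_tweets)

def word_likelihood(word, label_tweets):
    """Fraction of a label's tweets that contain the word (no smoothing)."""
    hits = sum(1 for text in label_tweets if word in text.lower().split())
    return hits / len(label_tweets)

prior_pos = prior(positive, positive + negative)
prior_neg = prior(negative, positive + negative)
p_amazing_pos = word_likelihood("amazing", positive)
p_amazing_neg = word_likelihood("amazing", negative)

# Unnormalized score for each label after seeing the word "amazing".
score_pos = prior_pos * p_amazing_pos
score_neg = prior_neg * p_amazing_neg
print(prior_pos, p_amazing_pos, score_pos, score_neg)
```

Because the negative score is zero without smoothing, a single word decides the outcome here; that is the reason smoothing is normally applied in practice.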
Sentiment Analysis Example 1
Training Data
- This is a good book! (Positive)
- This is a awesome book! (Positive)
- This is a bad book! (Negative)
- This is a terrible book! (Negative)
Testing Data
- This is a good article
- This is a bad article
We will train the model on the training data using a Naive Bayes
classifier, and then test the model on the testing data.
>>> import nltk  # word_tokenize needs NLTK's tokenizer data (see nltk.download)
>>> def form_sent(sent):
...     # Map every token in the sentence to True
...     return {word: True for word in nltk.word_tokenize(sent)}
...
>>> form_sent("This is a good book")
{'This': True, 'is': True, 'a': True, 'good': True, 'book': True}
>>> s1 = 'This is a good book'
>>> s2 = 'This is a awesome book'
>>> s3 = 'This is a bad book'
>>> s4 = 'This is a terrible book'
>>> training_data = [[form_sent(s1), 'pos'], [form_sent(s2), 'pos'],
...                  [form_sent(s3), 'neg'], [form_sent(s4), 'neg']]
>>> for t in training_data: print(t)
...
[{'This': True, 'is': True, 'a': True, 'good': True, 'book': True}, 'pos']
[{'This': True, 'is': True, 'a': True, 'awesome': True, 'book': True}, 'pos']
[{'This': True, 'is': True, 'a': True, 'bad': True, 'book': True}, 'neg']
[{'This': True, 'is': True, 'a': True, 'terrible': True, 'book': True}, 'neg']
>>> from nltk.classify import NaiveBayesClassifier
>>> model = NaiveBayesClassifier.train(training_data)
>>> model.classify(form_sent('This is a good article'))
'pos'
>>> model.classify(form_sent('This is a bad article'))
'neg'
Accuracy
NLTK has a built-in function that computes the accuracy of our model:
>>> from nltk.classify.util import accuracy
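As a rough illustration of what such an accuracy figure measures, the sketch below computes the fraction of test items whose predicted label matches the expected label. It is plain Python rather than NLTK's own implementation, and `simple_accuracy` and `toy_classify` are hypothetical names introduced only for this example.

```python
def simple_accuracy(classify, labeled_items):
    """Fraction of (features, label) pairs that classify() gets right."""
    correct = sum(
        1 for features, label in labeled_items if classify(features) == label
    )
    return correct / len(labeled_items)

def toy_classify(features):
    """Hypothetical stand-in for a trained model: 'neg' only if 'bad' is present."""
    return "neg" if features.get("bad") else "pos"

test_data = [
    ({"This": True, "is": True, "a": True, "good": True, "article": True}, "pos"),
    ({"This": True, "is": True, "a": True, "bad": True, "article": True}, "neg"),
    ({"This": True, "is": True, "a": True, "terrible": True, "article": True}, "neg"),
]

score = simple_accuracy(toy_classify, test_data)
print(round(score, 2))
```

The toy classifier has never learned the word "terrible", so it gets the third item wrong; accuracy on held-out data exposes exactly this kind of gap.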

Sentiment Analysis Example 2: Gender Identification
We know that male and female names have some distinctive characteristics.
Generally, names ending in a, e and i are likely to be female, while names
ending in k, o, r, s and t are likely to be male. We build a classifier to
model these differences more precisely.
>>> def gender_features(word):
...     return {'last_letter': word[-1]}
...
>>> gender_features('Shrek')
{'last_letter': 'k'}
Now that we've defined a feature extractor, we need to prepare a list of
examples and corresponding class labels.
>>> from nltk.corpus import names
>>> labeled_names = ([(name, 'male') for name in names.words('male.txt')] +
...                  [(name, 'female') for name in names.words('female.txt')])
>>> import random
>>> random.shuffle(labeled_names)
Next, the feature extractor is used to process the names data, and the
resulting list of feature sets is divided into a training set and a test
set. The training set is used to train a new naive Bayes classifier.
>>> import nltk
>>> featuresets = [(gender_features(n), gender) for (n, gender) in labeled_names]
>>> train_set, test_set = featuresets[500:], featuresets[:500]
>>> classifier = nltk.NaiveBayesClassifier.train(train_set)
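Dividing the feature sets into a training set and a test set comes down to list slicing: the first items are held out for testing and the rest are used for training. The sketch below shows the pattern on a tiny hypothetical list of names (not the NLTK names corpus), with a hold-out size of 2 and a fixed random seed chosen only to make the run repeatable; `last_letter_features` is an illustrative stand-in for the feature extractor.

```python
import random

def last_letter_features(name):
    """Illustrative feature extractor: the final letter of the name."""
    return {"last_letter": name[-1].lower()}

# Hypothetical labeled names, used only to demonstrate the split.
labeled = [
    ("Anna", "female"), ("Maria", "female"), ("Sophie", "female"),
    ("Jack", "male"), ("Robert", "male"), ("Marcus", "male"),
]

rng = random.Random(0)   # fixed seed so the shuffle is repeatable
rng.shuffle(labeled)     # mix the labels before splitting

featuresets = [(last_letter_features(n), g) for n, g in labeled]

holdout = 2
test_set = featuresets[:holdout]    # first items are kept unseen
train_set = featuresets[holdout:]   # the remainder is used for training

print(len(train_set), len(test_set))
```

Shuffling before slicing matters because the labeled list is built label by label; without it, the held-out slice would contain only one class.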
Let's just test it out on some names that did not appear in its training
data:
>>> classifier.classify(gender_features('Neo'))
'male'
>>> classifier.classify(gender_features('olvin'))
'male'
>>> classifier.classify(gender_features('ricky'))
'female'
>>> classifier.classify(gender_features('serena'))
'female'
>>> classifier.classify(gender_features('cyra'))
'female'
>>> classifier.classify(gender_features('leeta'))
'female'
>>> classifier.classify(gender_features('rock'))
'male'
>>> classifier.classify(gender_features('jack'))
'male'
We can systematically evaluate the classifier on a much larger quantity of
unseen data:
>>> print(nltk.classify.accuracy(classifier, test_set))
0.77
Finally, we can examine the classifier to determine which features it found
most effective for distinguishing the names' genders:
>>> classifier.show_most_informative_features(20)
Most Informative Features
             last_letter = 'a'            female : male   =     35.5 : 1.0
             last_letter = 'k'              male : female =     30.7 : 1.0
             last_letter = 'p'              male : female =     20.8 : 1.0
             last_letter = 'f'              male : female =     15.9 : 1.0
             last_letter = 'd'              male : female =     11.5 : 1.0
             last_letter = 'v'              male : female =      9.8 : 1.0
             last_letter = 'o'              male : female =      8.7 : 1.0
             last_letter = 'w'              male : female =      8.4 : 1.0
             last_letter = 'm'              male : female =      8.2 : 1.0
             last_letter = 'r'              male : female =      7.0 : 1.0
             last_letter = 'g'              male : female =      5.1 : 1.0
             last_letter = 'b'              male : female =      4.4 : 1.0
             last_letter = 's'              male : female =      4.3 : 1.0
             last_letter = 'z'              male : female =      3.9 : 1.0
             last_letter = 'j'              male : female =      3.9 : 1.0
             last_letter = 't'              male : female =      3.8 : 1.0
             last_letter = 'i'            female : male   =      3.8 : 1.0
             last_letter = 'u'              male : female =      3.0 : 1.0
             last_letter = 'n'              male : female =      2.1 : 1.0
             last_letter = 'e'            female : male   =      1.8 : 1.0
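Each row of this listing compares how likely a feature value is under one label versus the other, expressed as a ratio. A minimal sketch of that idea follows; the counts are made up for illustration (they are not taken from the names corpus), the helper `likelihood_ratio` is hypothetical, and no smoothing is applied, so the numbers will not match the output shown.

```python
def likelihood_ratio(count_a, total_a, count_b, total_b):
    """Ratio of P(feature | label A) to P(feature | label B)."""
    return (count_a / total_a) / (count_b / total_b)

# Hypothetical counts: names ending in 'a' among 1000 female and 1000 male names.
female_ending_a, female_total = 300, 1000
male_ending_a, male_total = 20, 1000

ratio = likelihood_ratio(female_ending_a, female_total, male_ending_a, male_total)

# Formatted in the same spirit as the listing: "label A : label B = ratio : 1.0"
print(f"last_letter = 'a'   female : male = {ratio:.1f} : 1.0")
```

A ratio far from 1.0 means the feature value strongly favours one label, which is why such rows appear at the top of the listing.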