CS 124/LINGUIST 180: From Languages to Information

1
CS 124/LINGUIST 180: From Languages to
Information
  • Dan Jurafsky
  • Lecture 5: Sentiment Analysis

IP notice: many slides for today from Janyce
Wiebe, plus some from Marti Hearst and Marta Tatu
2
Sentiment, Style, Identity Classification
  • Identity
  • Authorship identification
  • Age/gender identification
  • Sentiment Analysis
  • Movies: is a review positive or negative?
  • Products (new MacBook Pro)
  • Sentiments over time (is anger increasing or
    decreasing?)
  • Politics (is this editorial left or right?)
  • Prediction (election outcomes, market trends).
    Will stock go up after this news report?
  • Style/Emotion
  • Is this conversation (or blog) friendly,
    aggressive, polite, flirtatious?

3
Part I: Author Identification (Stylometry)
4
Author Identification
  • Also called Stylometry in the humanities
  • An example of a Classification Problem
  • Classifiers
  • Decide which of N buckets to put an item in
  • (Some classifiers allow for multiple buckets)

5
The Disputed Federalist Papers
  • In 1787-1788, Jay, Madison, and Hamilton wrote a
    series of anonymous essays to convince the voters
    of New York to ratify the new U.S. Constitution.
  • Scholars have consensus that
  • 5 authored by Jay
  • 51 authored by Hamilton
  • 14 authored by Madison
  • 3 jointly by Hamilton and Madison
  • 12 remain in dispute: Hamilton or Madison?

6
Author identification
  • Federalist papers
  • In 1963 Mosteller and Wallace solved the problem
  • They identified function words as good candidates
    for authorship analysis
  • Using statistical inference they concluded the
    author was Madison
  • Since then, other statistical techniques have
    supported this conclusion.
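
The function-word idea can be sketched in a few lines. This is only an illustration, assuming a simple regex tokenizer and a three-word list (by, from, to); the helper name rate_per_1000 and the sample sentence are hypothetical and are not taken from Mosteller and Wallace's study.

```python
import re
from collections import Counter

# Small illustrative list; a real study would use many more function words.
FUNCTION_WORDS = ("by", "from", "to")

def rate_per_1000(text, words=FUNCTION_WORDS):
    """Return occurrences per 1000 tokens for each function word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return {w: 0.0 for w in words}
    return {w: 1000.0 * counts[w] / total for w in words}

# Hypothetical sample text (12 tokens).
sample = "The power to tax is granted by the people to the union."
rates = rate_per_1000(sample)
```

Comparing such rates for a disputed paper against each candidate author's known papers is the kind of evidence a classifier can then weigh.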

7
Function vs. Content Words
(M = Madison, H = Hamilton)
High rates for "by" favor M, low favor H
High rates for "from" favor M, low says little
High rates for "to" favor H, low favor M
8
Function vs. Content Words
No consistent pattern for "war"
9
Federalist Papers Problem
Fung, "The Disputed Federalist Papers: SVM Feature
Selection Via Concave Minimization", ACM TAPIA '03
10
Part II: Political Sentiment
  • Two examples of classifiers
  • Using words as features
  • And a Naïve Bayes or SVM classifier
  • To make a binary decision
  • About the political stance of a text
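
A words-as-features Naïve Bayes classifier of this kind might look like the following minimal sketch. It assumes a multinomial model with add-one smoothing; the toy documents, the "left"/"right" labels, and the function names train_nb and classify_nb are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Returns (class counts, word counts, vocab)."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab

def classify_nb(tokens, model):
    """Return the label with the highest log prior + log likelihood."""
    class_docs, word_counts, vocab = model
    n_docs = sum(class_docs.values())
    best_label, best_score = None, float("-inf")
    for label in class_docs:
        total = sum(word_counts[label].values())
        score = math.log(class_docs[label] / n_docs)
        for tok in tokens:
            if tok not in vocab:
                continue  # skip words never seen in training
            # add-one (Laplace) smoothing over the vocabulary
            score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training set.
toy = [
    ("cut taxes and shrink government".split(), "right"),
    ("lower taxes help business".split(), "right"),
    ("expand public health care".split(), "left"),
    ("protect workers and public schools".split(), "left"),
]
model = train_nb(toy)
prediction = classify_nb("public schools need care".split(), model)
```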

11
Part II: Political Sentiment
  • Lin, Wilson, Wiebe, Hauptmann (2006)
  • Bitterlemon.com
  • A website designed to contribute to mutual
    understanding between Palestinians and Israelis
    through the open exchange of ideas
  • Can we label a text as Israeli or Palestinian in
    perspective?
  • The inadvertent killing by Israeli forces of
    Palestinian civilians, usually in the course of
    shooting at Palestinian terrorists, is
    considered no different at the moral and ethical
    level than the deliberate targeting of Israeli
    civilians by Palestinian suicide bombers.
  • In the first weeks of the Intifada, for example,
    Palestinian public protests and civilian
    demonstrations were answered brutally by Israel,
    which killed tens of unarmed protesters.

12
Lin et al. on Political Perspective
  • 594 articles from 2001-2005
  • Naïve Bayes classifier
  • Accuracy: 89-99%

13
Naïve Bayes Top 20 words
  • Palestinian
  • palestinian, israel, state, politics, peace,
    international, people, settle, occupation,
    sharon, right, govern, two, secure, end,
    conflict, process, side, negotiate
  • Israeli
  • israel, palestinian, state, settle, sharon,
    peace, arafat, arab, politics, two, process,
    secure, conflict, lead, america, agree, right,
    gaza, govern

14
Thomas, Pang, Lee: "Get out the vote: Determining
support or opposition from Congressional
floor-debate transcripts"
  • Goal: label a speech as pro or con a bill
  • Data: transcripts of all debates in the House of
    Representatives in 2005
  • From the GovTrack (http://govtrack.us) website
  • Each speech segment (sequence of uninterrupted
    utterances by speaker)
  • Labeled by the vote (yea or nay) cast
  • Classified by an SVM, using all word
    unigrams as features

15
Results
  • Majority baseline: 58.37%
  • #("support") - #("oppos"): 62.67%
  • SVM classifier: 66.05%
  • Add network of agreements: 70.81%

16
Part III: How to choose sentiment vocabulary
  • Key task: Vocabulary
  • The previous work used all words
  • Can we do better by focusing on subset of words?
  • How to find words, phrases, patterns that express
    sentiment or polarity?

17
Words
  • Adjectives
  • positive: honest, important, mature, large, patient
  • Ron Paul is the only honest man in Washington.
  • Kitchell's writing is unbelievably mature and is
    only likely to get better.
  • To humour me my patient father agrees yet again
    to my choice of film.

18
Words
  • Adjectives
  • negative: harmful, hypocritical, inefficient,
    insecure
  • It was a macabre and hypocritical circus.
  • Why are they being so inefficient?
  • subjective: curious, peculiar, odd, likely,
    probably

Slide from Janyce Wiebe
19
Other parts of speech
  • Verbs
  • positive: praise, love
  • negative: blame, criticize
  • Nouns
  • positive: pleasure, enjoyment
  • negative: pain, criticism

20
Phrases
  • Phrases containing adjectives and adverbs
  • positive: high intelligence, low cost
  • negative: little variation, many troubles

21
Identifying polarity words
  • Assume that contexts are coherent
  • Natural: fair and legitimate, corrupt and brutal
  • Odd: fair and brutal, corrupt and legitimate

22
Hatzivassiloglou & McKeown 1997: Predicting the
semantic orientation of adjectives
  • Step 1
  • From 21-million word WSJ corpus
  • For every adjective with frequency > 20
  • Label for polarity
  • Total of 1336 adjectives
  • 657 positive
  • 679 negative

23
Hatzivassiloglou & McKeown 1997
  • Step 2: Extract all conjoined adjectives

nice and comfortable
nice and scenic
Slide adapted from Janyce Wiebe
24
Hatzivassiloglou & McKeown 1997
  • Step 3: A supervised learning algorithm builds a
    graph of adjectives linked by the same or
    different semantic orientation

[Figure: graph linking the adjectives scenic, nice, terrible, painful, handsome, fun, expensive, comfortable]
25
Hatzivassiloglou & McKeown 1997
  • Step 4: A clustering algorithm partitions the
    adjectives into two subsets

[Figure: the adjectives slow, scenic, nice, terrible, handsome, painful, fun, expensive, comfortable partitioned into two subsets]
26
Hatzivassiloglou & McKeown 1997
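
The conjunction-based steps can be illustrated with a much-simplified sketch. It is not the authors' method: in place of their supervised link classifier and clustering algorithm, it assumes that "and" links adjectives of the same orientation and "but" links adjectives of different orientation, then propagates a label outward from one seed word. The adjective list, the sample text, and the function names are hypothetical.

```python
import re
from collections import defaultdict, deque

# Hypothetical adjective list; a real system would take this from a tagger.
ADJECTIVES = {"nice", "comfortable", "scenic", "terrible", "painful",
              "handsome", "fun", "expensive", "slow"}

def conjoined_pairs(text, adjectives=ADJECTIVES):
    """Yield (adj1, adj2, same_orientation) for 'A and B' or 'A but B'."""
    tokens = re.findall(r"[a-z]+", text.lower())
    for a, conj, b in zip(tokens, tokens[1:], tokens[2:]):
        if conj in ("and", "but") and a in adjectives and b in adjectives:
            yield a, b, conj == "and"

def two_way_partition(pairs, seed):
    """Propagate a +1/-1 label from the seed across same/different links."""
    graph = defaultdict(list)
    for a, b, same in pairs:
        graph[a].append((b, same))
        graph[b].append((a, same))
    label = {seed: 1}
    queue = deque([seed])
    while queue:
        word = queue.popleft()
        for other, same in graph[word]:
            if other not in label:
                label[other] = label[word] if same else -label[word]
                queue.append(other)
    return label

text = ("The hotel was nice and comfortable. The drive was nice and scenic. "
        "The trip was fun but expensive. The ride was scenic and fun. "
        "The wait was painful and expensive.")
labels = two_way_partition(list(conjoined_pairs(text)), seed="nice")
```

Words labeled +1 end up in the same subset as the seed, and words labeled -1 in the other. Deciding which subset is the positive one is a separate step.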
27
Turney (2002): Thumbs Up or Thumbs Down?
Semantic Orientation Applied to Unsupervised
Classification of Reviews
  • Input: review
  • Identify phrases that contain adjectives or
    adverbs by using a part-of-speech tagger
  • Estimate the semantic orientation of each phrase
  • Assign a class to the given review based on the
    average semantic orientation of its phrases
  • Output: classification (thumbs up or thumbs down)

28
Turney Step 1
  • Extract all two-word phrases including an
    adjective
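
A simplified sketch of this extraction step, assuming the review has already been tagged with Penn Treebank tags. Only two illustrative tag patterns are used here (adjective + noun, adverb + adjective); Turney's paper uses a longer pattern table that also constrains the following word, which is omitted. The tagged input and the function name extract_phrases are hypothetical.

```python
ADJ = {"JJ"}
NOUN = {"NN", "NNS"}
ADV = {"RB", "RBR", "RBS"}

def extract_phrases(tagged):
    """tagged: list of (word, tag) pairs. Returns matching two-word phrases."""
    phrases = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        # Keep adjective+noun and adverb+adjective bigrams only.
        if (t1 in ADJ and t2 in NOUN) or (t1 in ADV and t2 in ADJ):
            phrases.append(f"{w1} {w2}")
    return phrases

# Hypothetical tagger output for a short review fragment.
tagged_review = [("the", "DT"), ("direct", "JJ"), ("deposit", "NN"),
                 ("is", "VBZ"), ("very", "RB"), ("handy", "JJ")]
phrases = extract_phrases(tagged_review)
```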

29
Turney Step 2
  • Estimate the semantic orientation of the
    extracted phrases using Pointwise Mutual
    Information

30
Pointwise Mutual Information
  • Mutual information between 2 random variables X
    and Y
  • Pointwise mutual information: a measure of how
    often two events x and y co-occur, compared with
    what we would expect if they were independent
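
In standard notation (assuming base-2 logarithms, so the result is in bits), the two quantities described above are usually written as:

```latex
I(X;Y) = \sum_{x}\sum_{y} P(x,y)\,\log_2 \frac{P(x,y)}{P(x)\,P(y)}
\qquad
\mathrm{PMI}(x,y) = \log_2 \frac{P(x,y)}{P(x)\,P(y)}
```

Mutual information is the expected value of PMI over all pairs (x, y). PMI is zero when x and y are independent and positive when they co-occur more often than chance.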

31
Weighting Mutual Information
  • Pointwise mutual information: a measure of how
    often two events x and y co-occur, compared with
    what we would expect if they were independent
  • PMI between two words: how much more often they
    occur together than we would expect if they were
    independent
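
As a small worked example, PMI between two words can be estimated from raw counts. This is a sketch under simple assumptions: probabilities are maximum-likelihood estimates, logs are base 2, and the counts are made up for illustration.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """PMI in bits from co-occurrence and marginal counts over `total` events."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))

# Hypothetical counts: the pair co-occurs 4 times more often than chance.
score = pmi(count_xy=40, count_x=100, count_y=1000, total=10000)
# Hypothetical counts: the pair co-occurs exactly as often as chance predicts.
independent = pmi(count_xy=10, count_x=100, count_y=1000, total=10000)
```

With these counts, chance predicts 10 co-occurrences (100 × 1000 / 10000), so observing 40 gives a PMI of about log2(4) = 2 bits, and observing 10 gives a PMI of about 0.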

32
Turney Step 2
  • Semantic Orientation of a phrase defined as
    SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor")
  • Estimate PMI by issuing queries to a search
    engine (AltaVista, 350 million pages)
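
When hit counts stand in for probabilities, the difference of two PMI values collapses to a single log ratio, because the phrase's own count and the index size cancel. The sketch below assumes the reference words "excellent" and "poor" and a small smoothing constant to avoid division by zero, following Turney (2002); all hit counts shown are invented, and no real search engine is queried.

```python
import math

def semantic_orientation(hits_near_excellent, hits_near_poor,
                         hits_excellent, hits_poor, smoothing=0.01):
    """SO(phrase) = PMI(phrase, 'excellent') - PMI(phrase, 'poor'), in bits.

    hits_near_*: pages where the phrase occurs near the reference word.
    hits_excellent / hits_poor: pages containing the reference word alone.
    """
    numerator = (hits_near_excellent + smoothing) * hits_poor
    denominator = (hits_near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)

# Hypothetical hit counts, not real search-engine results.
so_positive = semantic_orientation(200, 50, 1_000_000, 1_000_000)
so_negative = semantic_orientation(10, 80, 1_000_000, 1_000_000)
```

A phrase seen more often near "excellent" than near "poor" gets a positive score, and the reverse gives a negative one.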

33
Turney Step 3
  • Calculate average semantic orientation of phrases
    in review
  • Positive: thumbs up
  • Negative: thumbs down
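
The final decision is just a thresholded average. A minimal sketch, assuming a threshold of zero and treating an average of exactly zero as negative (that tie-breaking choice is an assumption); the scores and the function name classify_review are hypothetical.

```python
def classify_review(phrase_orientations):
    """Average the phrase scores and label the review by the sign."""
    if not phrase_orientations:
        raise ValueError("review has no scored phrases")
    average = sum(phrase_orientations) / len(phrase_orientations)
    label = "thumbs up" if average > 0 else "thumbs down"
    return label, average

# Hypothetical semantic-orientation scores for the phrases of two reviews.
label_a, avg_a = classify_review([2.0, 1.25, -0.75])
label_b, avg_b = classify_review([-1.5, 0.5])
```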

34
Experiments
  • 410 reviews from Epinions
  • 170 (41%) thumbs down
  • 240 (59%) thumbs up
  • Average phrases per review: 26
  • Baseline accuracy: 59%

35
Discussion
  • What makes movies hard to classify?
  • Sentiment can be subtle
  • Perfume review in "Perfumes: The Guide"
  • "If you are reading this because it is your
    darling fragrance, please wear it at home
    exclusively, and tape the windows shut."
  • "She runs the gamut of emotions from A to B"
  • (Dorothy Parker on Katharine Hepburn)
  • Order effects
  • "This film should be brilliant. It sounds like a
    great plot, the actors are first grade, and the
    supporting cast is good as well, and Stallone is
    attempting to deliver a good performance.
    However, it can't hold up."

36
Patterns
  • Lexico-syntactic patterns (Riloff & Wiebe 2003)
  • way with <np>: "to ever let China use force to
    have its way with"
  • expense of <np>: "at the expense of the world's
    security and stability"
  • underlined <dobj>: "Jiang's subdued tone
    underlined his desire to avoid disputes"

37
Labeling Conversations for Style
  • Dan Jurafsky, Rajesh Ranganath, Dan McFarland

38
Extraction of Social Meaning/Emotion/Style from
Speech and Text
  • Detection of student uncertainty in tutoring
  • Forbes-Riley et al. (2008)
  • Emotion detection (annoyance)
  • Ang et al. (2002)
  • Detection of deception
  • Newman et al. (2003)
  • Detection of charisma
  • Rosenberg and Hirschberg (2005)
  • Speaker stress, trauma
  • Rude et al. (2004), Pennebaker and Lay (2002)

39
Our task: Detect Interactional Style
  • Given speech and text from a conversation
  • Can we tell if a speaker is
  • Awkward?
  • Flirtatious?
  • Friendly?
  • Dataset
  • 1000 4-minute speed-dates
  • Each subject rated their partner for these styles
  • The following segment has been lightly
    signal-processed

40
Features: Prosodic and Lexical
  • In a regularized logistic regression classifier
  • Pitch: min, max, mean, std, range
  • Amplitude: min, max, mean, std
  • Duration of turn
  • Number of words
  • Use of past tense
  • Use of "you"
  • Use of "we"
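
A few of these features can be computed per turn as in the sketch below. It assumes pitch has already been extracted as a list of Hz values for the turn and that the words are available as a token list; the feature names, the sample values, and the function turn_features are hypothetical, and amplitude, duration, and past-tense features are left out for brevity.

```python
import statistics

def turn_features(pitch_hz, words):
    """Summarize one speaker turn as a flat feature dict."""
    lowered = [w.lower() for w in words]
    return {
        "pitch_min": min(pitch_hz),
        "pitch_max": max(pitch_hz),
        "pitch_mean": statistics.mean(pitch_hz),
        "pitch_std": statistics.pstdev(pitch_hz),  # population std. deviation
        "pitch_range": max(pitch_hz) - min(pitch_hz),
        "num_words": len(words),
        "count_you": lowered.count("you"),
        "count_we": lowered.count("we"),
    }

# Hypothetical pitch samples and transcript for one turn.
features = turn_features([180.0, 200.0, 220.0], "So what do you study".split())
```

Feature dicts like this one, one per turn or per speaker, would then be the input to the classifier.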

41
Features: Discourse
  • Number of backchannels
  • Uh-huh. Yeah. Right. Oh, Okay
  • Number of appreciations
  • Wow. That's true. Oh, great!
  • Number of questions
  • Amount of laughter
  • Total number of turns
  • Number of disfluencies
  • Amount of overlapped speech
  • NTRI
  • Wait, say it again, do you have another one?
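
Some of these counts can be approximated from a transcript alone. The sketch below is a rough illustration, assuming one string per turn, a small hand-written backchannel list, a trailing "?" as the question cue, and a "[laughter]" annotation in the transcript; all of these conventions and the function name discourse_features are assumptions.

```python
# Hypothetical backchannel list based on the examples above.
BACKCHANNELS = {"uh-huh", "yeah", "right", "okay"}

def discourse_features(turns):
    """turns: list of utterance strings from one speaker."""
    feats = {"backchannels": 0, "questions": 0, "laughter": 0,
             "turns": len(turns)}
    for turn in turns:
        text = turn.strip().lower()
        # A turn consisting only of a backchannel word counts as one.
        if text.rstrip(".!") in BACKCHANNELS:
            feats["backchannels"] += 1
        if text.endswith("?"):
            feats["questions"] += 1
        feats["laughter"] += text.count("[laughter]")
    return feats

example_turns = ["Uh-huh.", "Wait, say it again, do you have another one?",
                 "Oh, great! [laughter]", "Yeah."]
discourse = discourse_features(example_turns)
```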

42
Results so far
43
Regression weights
44
Regression weights
45
Good predictors, across both genders
  • Awkward speaker: slow, lower pitched, stilted
    talk
  • Flirtatious speaker: greater laughter, more
    questions, and referring to the past.
  • Friendly speaker: greater laughter.

46
Gender differences
  • Flirtation
  • Women raise pitch, men drop pitch
  • Women: laughter is the highest-weighted feature
  • Women: quiet talk
  • Men: expanded pitch range

47
Flirtation versus friendliness
  • Men
  • Flirtation
  • more questions, "you", lower pitch, greater
    pitch range
  • Friendliness
  • Higher pitch, decreased pitch range
  • Women
  • Flirtation
  • more questions, "you", quiet speech, higher pitch
  • Friendliness
  • no questions, loud speech, more pitch variation

48
Summary on Sentiment and Style
  • Function words are a good cue to identity
  • All words work well for some tasks
  • Finding subsets of words may help in other tasks
  • Other features may also help
  • Questions
  • Length of sentences
  • Speech features