1
Emotional Grounding in Spoken Dialog Systems
  • Jackson Liscombe (jaxin@cs.columbia.edu)
  • Giuseppe Riccardi (dsp3@research.att.com)
  • Dilek Hakkani-Tür (dtur@research.att.com)

2
The Problem: Emotion
  • In Spoken Dialog Systems, users can
  • start angry.
  • get angry.
  • end angry.

3
Outline
  • Previous Work
  • Corpus Description
  • Feature Extraction
  • Classification Experiments

4
Outline
  • Previous Work
  • Corpus Description
  • Feature Extraction
  • Classification Experiments

5
Past Work
  1. Isolated Speech
  2. Spoken Dialog Systems

6
Past Work: Isolated Speech
  • Acted Data
  • Features
  • F0/pitch
  • energy
  • speaking rate
  • Researchers (late 1990s - present)
  • Aubergé, Campbell, Cowie, Douglas-Cowie,
    Hirschberg, Liscombe, Mozziconacci, Oudeyer,
    Pereira, Roach, Scherer, Schröder, Tato, Yuan,
    Zetterholm, among others

7
Past Work: Spoken Dialog Systems (1)
  • Batliner, Huber, Fischer, Spilker, Nöth (2003)
  • system: Verbmobil (Wizard of Oz scenarios)
  • binary classification
  • features
  • prosodic
  • lexical (POS tags, swear words)
  • dialog acts (repeat/repair/insult)
  • 0.1 relative improvement using dialog acts

8
Past Work: Spoken Dialog Systems (2)
  • Ang, Dhillon, Krupski, Shriberg, Stolcke (2002)
  • system: DARPA Communicator
  • binary classification
  • features
  • prosodic
  • lexical (language model)
  • dialog acts (repeats/repairs)
  • 4% relative improvement using dialog acts

9
Past Work: Spoken Dialog Systems (3)
  • Lee, Narayanan (2004)
  • system: SpeechWorks call center
  • binary classification
  • features
  • prosodic
  • lexical (weighted mutual information)
  • dialog acts (repeat/rejection)
  • 3% improvement using dialog acts

10
Past Work: Summary
  • Past research has focused on acoustic data
  • But, moving toward grounding emotion in context
    (dialog acts)
  • Summer work: extend contextual features for
    better emotion prediction

11
Outline
  • Previous Work
  • Corpus Description
  • Feature Extraction
  • Classification Experiments

12
Corpus Description
  • AT&T's How May I Help You? (SM) corpus (0300
    Benchmark)
  • Labeled with Voice Signature information
  • user state (emotion)
  • gender
  • age
  • accent type

13
Corpus Description
Statistic                     Training   Testing
number of user turns          15,013     5,000
number of dialogs             4,259      1,431
number of turns per dialog    3.5        3.5
number of words per turn      9.0        9.9
14
User Emotion Distribution
15
Emotion Labels
  • Original Set
  • Positive/Neutral
  • Somewhat Frustrated
  • Very Frustrated
  • Somewhat Angry
  • Very Angry
  • Other Somewhat Negative
  • Very Negative
  • Reduced Set
  • Positive
  • Negative
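  A trivial sketch of the label reduction in Python; the mapping below (Positive/Neutral to Positive, every other label to Negative) is an assumption implied by the binary statistics on the next slide rather than stated on this one.

```python
# Hypothetical sketch of the reduction to the binary label set; the exact
# mapping is assumed, not stated on the slide.
REDUCED = {
    "Positive/Neutral": "Positive",
    "Somewhat Frustrated": "Negative",
    "Very Frustrated": "Negative",
    "Somewhat Angry": "Negative",
    "Very Angry": "Negative",
    "Other Somewhat Negative": "Negative",
    "Very Negative": "Negative",
}

def reduce_label(label):
    """Map an original user-state label onto the reduced Positive/Negative set."""
    return REDUCED[label]
```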

16
Corpus Description: Binary User States
Statistic                                       Training   Testing
% of turns that are positive                    88.1       73.1
% of dialogs with at least one negative turn    24.8       44.7
% of negative dialogs that start negative       43.5       59.9
% of negative dialogs that end negative         42.4       48.7
17
Outline
  • Previous Work
  • Corpus Description
  • Feature Extraction
  • Classification Experiments

18
Feature Set Space
Context \ Features       Prosodic   Lexical   Discourse
turn_i
turn_i-1, turn_i
turn_i-2, turn_i-1

19
Feature Set Space: Context Overview
Context \ Features       Prosodic        Lexical            Discourse
turn_i                   Isolated        Isolated           Isolated
turn_i-1, turn_i         Differentials   Prior Statistics   Prior Statistics
turn_i-2, turn_i-1       Differentials   Prior Statistics   Prior Statistics
20
Lexical Features
  • Language Model (n-grams)
  • Examples of words significantly correlated with
    negative user state (p < 0.001)
  • 1st person pronouns: I, me
  • requests for a human operator: person, talk,
    speak, human, machine
  • billing-related words: dollars, cents
  • curse words
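  A minimal sketch of how such lexical cues could be screened, assuming word n-gram counts per user turn and a chi-squared association test; scikit-learn, the p-value threshold, and the toy turns are assumptions, since the slide itself refers to a language-model approach.

```python
# Hypothetical sketch: unigram/bigram features per user turn, screened for
# association with the negative user state. scikit-learn and the chi-squared
# test are assumptions; the slide describes a language model.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import chi2

turns = [
    "may I speak to an assistant please",            # toy examples, not corpus data
    "yes I would",
    "I need to find out about a number on my bill",
]
labels = [1, 0, 0]  # 1 = negative user state, 0 = positive/neutral

vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(turns)

# chi2 returns a score and a p-value per n-gram; keep the n-grams whose
# p-value clears a significance threshold such as p < 0.001.
scores, p_values = chi2(X, labels)
significant = [ngram for ngram, p in
               zip(vectorizer.get_feature_names_out(), p_values) if p < 0.001]
```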

21
Prosodic Features
  • Praat
  • open-source tool for speech analysis, synthesis,
    statistics, and manipulation
  • Paul Boersma and David Weenink
  • University of Amsterdam
  • www.praat.org
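  For readers who want to reproduce this step in Python, a minimal sketch using praat-parselmouth, a Python wrapper around Praat; the wrapper, the file name, and the default analysis parameters are assumptions, since the authors used Praat itself.

```python
# Hypothetical sketch: pull pitch and energy contours out of one user turn
# with praat-parselmouth (pip install praat-parselmouth), a wrapper around
# Praat; the file name and default parameters are placeholders.
import parselmouth

snd = parselmouth.Sound("user_turn.wav")

pitch = snd.to_pitch()                       # F0 contour in Hz
f0 = pitch.selected_array["frequency"]       # 0.0 marks unvoiced frames

intensity = snd.to_intensity()               # energy contour in dB
energy = intensity.values[0]
```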

22
Prosodic Features
  • Pitch (F0)
  • overall minimum
  • overall maximum
  • overall median
  • overall standard deviation
  • mean absolute slope
  • slope of final vowel
  • longest vowel mean
  • Other
  • local jitter over longest vowel
  • Energy
  • overall minimum
  • overall maximum
  • overall mean
  • overall standard deviation
  • longest vowel mean
  • Speaking Rate
  • vowels per second
  • mean vowel length
  • ratio of voiced frames to total frames
  • percent internal silence
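  A minimal sketch of how the overall pitch and energy statistics above could be computed, assuming F0 and energy contours have already been extracted as NumPy arrays (with 0 marking unvoiced F0 frames); the function and variable names are illustrative only.

```python
# Hypothetical sketch: overall pitch/energy statistics for one user turn,
# given already-extracted contours (f0 in Hz, energy in dB).
import numpy as np

def pitch_features(f0, frame_step=0.01):
    """Summary F0 statistics; assumes 0.0 marks unvoiced frames."""
    voiced = f0[f0 > 0]
    # Slope between consecutive voiced frames; gaps across unvoiced
    # stretches are ignored for simplicity in this sketch.
    slope = np.diff(voiced) / frame_step
    return {
        "f0_min": float(voiced.min()),
        "f0_max": float(voiced.max()),
        "f0_median": float(np.median(voiced)),
        "f0_std": float(voiced.std()),
        "f0_mean_abs_slope": float(np.abs(slope).mean()),
        "voiced_ratio": len(voiced) / len(f0),
    }

def energy_features(energy):
    """Summary energy statistics."""
    return {
        "energy_min": float(energy.min()),
        "energy_max": float(energy.max()),
        "energy_mean": float(energy.mean()),
        "energy_std": float(energy.std()),
    }
```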

23
Prosodic Features in Action
24
A View Of Pitch
[Figure: pitch contours labeled Very Frustrated and Somewhat Frustrated]
  • That amount is incorrect.

20020508/0508095935fhf2910
25
Pitch/Energy/Speaking Rate: Change
Agent: I'm sorry, I didn't understand that. How may I help you?
User (Positive/Neutral): please an operator please .hst may I speak-
Agent: I'm sorry we're having trouble. Can we try one more time? Just briefly tell me how I may help you.
User (Somewhat Frustrated): may I speak to an assistant please
Agent: Are you calling from your home phone?
User (Somewhat Angry): yes
20020221/0221080552atf1536
26
Pitch/Energy/Speaking Rate: Change
20020221/0221080552atf1536
27
Pitch/Energy/Speaking Rate: No Change
Agent: Hello, I am an AT&T automated assistant. You can speak naturally to me. How may I help you?
User (Positive/Neutral): I need to find out about a number that I don't recognize
Agent: Would you like to look up a number you don't recognize on your bill?
User (Positive/Neutral): yes I would
Agent: Are you calling from your home phone?
User (Positive/Neutral): yes I am
20020317/0317220739atf9051
28
Pitch/Energy/Speaking Rate: No Change
20020317/0317220739atf9051
29
Feature Set Space: Baseline
Context \ Features       Prosodic   Lexical   Discourse
turn_i
turn_i-1, turn_i
turn_i-2, turn_i-1
(Baseline set: isolated prosodic and lexical features of the current turn.)

30
Discourse Features
  • Dialog Acts
  • greeting
  • re-prompt
  • confirmation
  • specification
  • acknowledgment
  • disambiguation

31
Feature Set Space: State-of-the-Art
Context \ Features       Prosodic   Lexical   Discourse
turn_i
turn_i-1, turn_i
turn_i-2, turn_i-1
(State-of-the-Art set: the baseline plus isolated discourse features, i.e. dialog acts, of the current turn.)

32
Contextual Features
  • Lexical (2)
  • edit distance with previous 2 turns
  • Discourse (10)
  • turn number
  • call type repetition with previous 2 turns
  • dialog act repetition with previous 2 turns
  • Prosodic (34)
  • 1st and 2nd order differentials for each feature
  • Other (2)
  • user state of previous 2 turns
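  A minimal sketch of two of these contextual features, assuming per-turn prosodic feature dictionaries like those above and plain string transcriptions; the word-level Levenshtein distance stands in for whatever edit distance the authors actually used.

```python
# Hypothetical sketch of two contextual features: prosodic differentials
# across turns, and word-level edit distance with a previous turn.

def differentials(curr, prev, prev2):
    """1st and 2nd order differences of each prosodic feature across turns."""
    feats = {}
    for name, value in curr.items():
        d1 = value - prev[name]               # turn_i minus turn_i-1
        d2 = d1 - (prev[name] - prev2[name])  # change of the change
        feats[name + "_d1"] = d1
        feats[name + "_d2"] = d2
    return feats

def edit_distance(turn_a, turn_b):
    """Word-level Levenshtein distance between two turn transcriptions."""
    a, b = turn_a.split(), turn_b.split()
    dist = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        prev_row, dist = dist, [i] + [0] * len(b)
        for j, wb in enumerate(b, 1):
            dist[j] = min(prev_row[j] + 1,                 # deletion
                          dist[j - 1] + 1,                 # insertion
                          prev_row[j - 1] + (wa != wb))    # substitution
    return dist[-1]
```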

33
Feature Set Space: Contextual
Context \ Features       Prosodic   Lexical   Discourse
turn_i
turn_i-1, turn_i
turn_i-2, turn_i-1
(Contextual set: the state-of-the-art features plus the contextual features above, drawn from the previous two turns.)

34
Outline
  • Previous Work
  • Corpus Description
  • Feature Extraction
  • Classification Experiments

35
Experimental Design
  • Training size: 15,013 turns
  • Testing size: 5,000 turns
  • Most frequent user state (positive) accounts for
    73.1% of testing data
  • Learning Algorithm Used:
  • BoosTexter (boosting with weak learners)
  • continuous and discrete valued features
  • 2,000 iterations
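  As a rough analogue of this setup (not BoosTexter itself), a minimal sketch with scikit-learn's AdaBoostClassifier, whose default weak learner is a depth-1 decision tree; the placeholder feature matrices and random labels are assumptions, and only the turn counts and the 2,000 rounds come from the slide.

```python
# Hypothetical sketch: boosting with weak learners via scikit-learn's
# AdaBoostClassifier (default weak learner: a depth-1 decision tree).
# This is an analogue of, not a substitute for, BoosTexter, and the
# random placeholder data below is not the corpus.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)
X_train = rng.random((15013, 60))            # placeholder features per turn
y_train = rng.integers(0, 2, 15013)          # 0 = positive, 1 = negative
X_test = rng.random((5000, 60))
y_test = rng.integers(0, 2, 5000)

clf = AdaBoostClassifier(n_estimators=2000)  # 2,000 boosting iterations
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```

  With random placeholder features the score will hover around chance; the point is only the shape of the training setup, not the reported results.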

36
Performance: Accuracy Summary
Feature Set        Accuracy (%)   Rel. Improv. over Baseline (%)
Most Freq. State   73.1           -----
Baseline           76.1           -----
State-of-the-Art   77.0           1.2
Contextual         79.0           3.8
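As a check on the last column: relative improvement is (accuracy minus baseline accuracy) divided by baseline accuracy, so (77.0 - 76.1) / 76.1 is about 1.2% and (79.0 - 76.1) / 76.1 is about 3.8%.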
37
Conclusions
  • Baseline (prosodic and lexical features)
  • leads to improved emotion prediction over chance
  • State-of-the-Art (baseline plus dialog acts)
  • gives further improvement
  • Innovative contextual features
  • improve emotion prediction even further
  • Towards a computational model of emotional grounding

38
Thank You