Emotional Speech - PowerPoint PPT Presentation

Slides: 37
Provided by: www1CsC

1
Emotional Speech
  • CS 4706
  • Julia Hirschberg (thanks to Jackson Liscombe and
    Lauren Wilcox for some slides)

2
Outline
  • Why study emotional speech?
  • Why is modeling emotional speech so difficult?
  • Production and perception studies
  • Voice quality features: the holy grail

3
Why study emotional speech?
  • Recognition
  • Customer-care centers
  • Tutoring systems
  • Automated agents (Wildfire)
  • Generation
  • Characteristics of emotional speech little
    understood, so hard to produce a voice that
    sounds friendly, sympathetic, authoritative.
  • TTS systems
  • Games

4
Emotion in Spoken Dialogue Systems
  • Batliner, Huber, Fischer, Spilker, Nöth (2003)
  • Verbmobil (Wizard of Oz scenarios)
  • Ang, Dhillon, Krupski, Shriberg, Stolcke (2002)
  • DARPA Communicator
  • Liscombe, Riccardi, Hakkani-Tür (2005)
  • How May I Help You? call center
  • Lee, Narayanan (2004)
  • Speechworks call-center
  • Liscombe, Hirschberg, Venditti (2005)
  • ITSpoke Tutoring System (physics)

5
Why is emotional speech so hard to model?
  • Colloquial definitions of speakers and listeners
    ≠ technical definitions
  • Utterances may convey multiple emotions
    simultaneously
  • Result
  • Human consensus low
  • Hard to get reliable training data

6
Spontaneous Corpora
  • Unconstrained
  • Campbell, 2003; Roach, 2000;
  • Cowie et al., 2001
  • Call centers
  • Vidrascu & Devillers, 2005; Ang et al., 2002;
  • Litman & Forbes-Riley, 2004; Batliner et al.,
    2003;
  • Lee & Narayanan, 2005
  • Meetings
  • Wrede and Shriberg, 2003

7
Acted Corpora
  • happy
  • sad
  • angry
  • confident
  • frustrated
  • friendly
  • interested
  • anxious
  • bored
  • encouraging

8
LDC Emotional Prosody and Transcripts corpus
  • Semantically neutral (dates and numbers)
  • 8 actors
  • 15 emotions

9
Are Emotions Mutually Exclusive?
  • User study to classify tokens from LDC Emotional
    Prosody corpus
  • 10 emotions only
  • Positive: confident, encouraging, friendly,
    happy, interested
  • Negative: angry, anxious, bored, frustrated, sad
  • Example

10
Emotion Intercorrelations
Emotion sad angry bored frust anxs friend conf happy inter encour
sad 0.44 0.44 0.26 0.22 -0.27 -0.32 -0.42 -0.32 -0.33
angry 0.70 0.21 -0.41 -0.37 -0.09 -0.32
bored 0.14 -0.14 -0.28 -0.17 -0.32 -0.42 -0.27
frustrated 0.32 -0.43 -0.09 -0.47 -0.16 -0.39
anxious -0.14 -0.25 -0.17 -0.14
friendly 0.44 0.77 0.59 0.75
confident 0.45 0.51
happy 0.58 0.73
interested 0.62
encouraging
(p < 0.001)
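
Intercorrelations like those in the table can be computed as Pearson coefficients between the ratings two emotion labels receive over the same tokens. The sketch below assumes each token gets a numeric rating per emotion; the ratings are hypothetical toy values, not data from the LDC corpus or the user study.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / sqrt(var_x * var_y)

# Hypothetical ratings (0-4 scale) of six tokens; illustration only.
ratings = {
    "sad":   [4, 3, 0, 1, 0, 2],
    "bored": [3, 3, 1, 0, 0, 2],
    "happy": [0, 1, 4, 3, 4, 1],
}

r_sad_bored = pearson(ratings["sad"], ratings["bored"])  # same valence
r_sad_happy = pearson(ratings["sad"], ratings["happy"])  # opposite valence
```

With these toy ratings the same-valence pair correlates positively and the opposite-valence pair negatively, which is the pattern the table reports.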
11
Results
  • Emotions are heavily correlated
  • Positive with positive
  • Negative with negative
  • Emotions are non-exclusive
  • Can they be clustered empirically?
  • Activation
  • Valence

12
Global Pitch Statistics
Different Valence/Activation
13
Different Valence/Same Activation
14
Identifying Emotions
  • Automatic Acoustic-prosodic
  • Davitz, 1964; Huttar, 1968
  • Global characterization
  • pitch
  • loudness
  • speaking rate
  • Intonational Contours
  • Mozziconacci & Hermes, 1999
  • Spectral Tilt
  • Banse & Scherer, 1996; Ang et al., 2002
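
A minimal sketch of the global characterization listed above (pitch, loudness, speaking rate), assuming an F0 contour in Hz with unvoiced frames marked as 0, raw waveform samples, and a syllable count supplied from elsewhere. The function names and inputs are illustrative assumptions, not the feature extractor used in the cited studies.

```python
from math import sqrt
from statistics import mean, pstdev

def global_pitch_stats(f0_hz):
    """Summarize an F0 contour over voiced frames only (F0 > 0)."""
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        raise ValueError("no voiced frames in contour")
    return {
        "mean": mean(voiced),
        "min": min(voiced),
        "max": max(voiced),
        "range": max(voiced) - min(voiced),
        "stdev": pstdev(voiced),
    }

def rms_energy(samples):
    """Root-mean-square amplitude, a simple stand-in for loudness."""
    return sqrt(mean(s * s for s in samples))

def speaking_rate(n_syllables, duration_s):
    """Syllables per second over the whole utterance."""
    return n_syllables / duration_s

# Hypothetical contour: 0.0 marks unvoiced frames.
f0 = [0.0, 180.0, 200.0, 220.0, 0.0, 210.0, 190.0]
stats = global_pitch_stats(f0)
energy = rms_energy([3.0, -3.0, 3.0, -3.0])
rate = speaking_rate(n_syllables=9, duration_s=3.0)
```

Utterance-level summaries like these mainly capture how activated a speaker sounds, which is why the slides treat contours and spectral tilt as separate feature families.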

15
Machine Learning Experiment
  • RIPPER, 90/10 split
  • Binary classification for each emotion
  • Results
  • 62% average baseline
  • 75% average accuracy
  • Acoustic-prosodic features for activation
  • /H-L/ for negative; /L-L/ for positive
  • Spectral tilt for valence?

16
Accuracy Distinguishing One Emotion from the Rest
Emotion      Baseline (%)  Accuracy (%)
angry        69.32         77.27
confident    75.00         75.00
happy        57.39         80.11
interested   69.89         74.43
encouraging  52.27         72.73
sad          61.93         80.11
anxious      55.68         71.59
bored        66.48         78.98
friendly     59.09         73.86
frustrated   59.09         73.86
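
The average baseline and average accuracy quoted on the previous slide can be checked against this table. The sketch below only averages the ten rows as transcribed; the variable names are illustrative.

```python
# (baseline %, accuracy %) per emotion, copied from the table above.
results = {
    "angry":       (69.32, 77.27),
    "confident":   (75.00, 75.00),
    "happy":       (57.39, 80.11),
    "interested":  (69.89, 74.43),
    "encouraging": (52.27, 72.73),
    "sad":         (61.93, 80.11),
    "anxious":     (55.68, 71.59),
    "bored":       (66.48, 78.98),
    "friendly":    (59.09, 73.86),
    "frustrated":  (59.09, 73.86),
}

avg_baseline = sum(b for b, _ in results.values()) / len(results)
avg_accuracy = sum(a for _, a in results.values()) / len(results)

# Emotions where the classifier does not beat its baseline.
no_gain = [name for name, (b, a) in results.items() if a <= b]

print(f"average baseline: {avg_baseline:.2f}")
print(f"average accuracy: {avg_accuracy:.2f}")
print("no gain over baseline:", no_gain)
```

The averages come out at roughly 62.6 and 75.8, in line with the 62 and 75 reported if those figures were truncated rather than rounded. Only "confident" shows no gain over its baseline.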
17
A Call Center Application
  • AT&T's How May I Help You? system
  • Customers often angry and frustrated

18
HMIHY Example
Very Frustrated
Somewhat Frustrated
19
Pitch, Energy and Rate
20
Features
  • Automatic Acoustic-prosodic
  • Contextual
  • Cauldwell, 2000
  • Lexical
  • Schröder, 2003; Brennan, 1995
  • Pragmatic
  • Ang et al., 2002; Lee & Narayanan, 2005

21
Results
Feature Set     Accuracy (%)  Rel. Improv. over Baseline (%)
Majority Class  73.1          -----
proslex         76.1          -----
proslexda       77.0          1.2
all             79.0          3.8
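
The relative improvements appear to be measured against the proslex row rather than the majority class. A quick check of the arithmetic, under that assumption:

```python
def relative_improvement(accuracy, baseline):
    """Relative gain of `accuracy` over `baseline`, in percent."""
    return (accuracy - baseline) / baseline * 100.0

# Accuracies (%) from the table; treating proslex as the baseline is an
# assumption inferred from the numbers, not stated in the slides.
baseline = 76.1
rel_proslexda = round(relative_improvement(77.0, baseline), 1)
rel_all = round(relative_improvement(79.0, baseline), 1)

print(rel_proslexda, rel_all)
```

Both values match the table (1.2 and 3.8). Measured against the majority class (73.1) instead, the gains would be larger, about 5.3 and 8.1.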
22
Tutoring Systems Should Respond to Uncertainty
  • SCoT (Pon-Barry et al., 2006)
  • Responding to uncertainty
  • Active listening
  • Hinting vs. paraphrasing
  • Features examined
  • Latency
  • Filled pauses
  • Hedges
  • Performance metric
  • Learning gain
  • But no improvement by responding to uncertainty

23
What does uncertainty sound like?
24
pr01_sess00_prob58
25
Uncertainty in ITSpoke
  • um <sigh> I don't even think I have an idea here
    ...... now .. mass isn't weight ...... mass is
    ................ the .......... space that an
    object takes up ........ is that mass?

71-67-192-113
26
ITSpoke Experiment
  • Human-Human Corpus
  • AdaBoost(C4.5) 90/10 split in WEKA
  • Classes: uncertain vs. certain vs. neutral
  • Results

Features           Accuracy (%)
Baseline           66
Acoustic-prosodic  75
contextual         76
breath-groups      77
27
ITSpoke Results
Emotion    Precision  Recall  F-measure
certain    0.611      0.602   0.606
uncertain  0.515      0.393   0.446
neutral    0.846      0.891   0.868

Emotion label  Classified as
               certain  uncertain  neutral
certain        80       11         42
uncertain      26       35         28
neutral        25       22         384
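
The precision, recall and F-measure figures follow directly from the confusion matrix. The sketch below recomputes them, taking rows as true labels and columns as classifier output; the function and variable names are illustrative.

```python
labels = ["certain", "uncertain", "neutral"]

# Rows: true label. Columns: classified as (same order as `labels`).
confusion = [
    [80, 11, 42],
    [26, 35, 28],
    [25, 22, 384],
]

def per_class_metrics(matrix, names):
    """Return {label: (precision, recall, F1)}, each rounded to 3 places."""
    metrics = {}
    for i, name in enumerate(names):
        true_pos = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column total
        actual = sum(matrix[i])                    # row total
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[name] = (round(precision, 3), round(recall, 3), round(f1, 3))
    return metrics

metrics = per_class_metrics(confusion, labels)
total = sum(sum(row) for row in confusion)
accuracy = sum(confusion[i][i] for i in range(len(labels))) / total
majority_baseline = max(sum(row) for row in confusion) / total

for name, (p, r, f) in metrics.items():
    print(f"{name:10s} P={p:.3f} R={r:.3f} F={f:.3f}")
print(f"accuracy={accuracy:.3f} majority baseline={majority_baseline:.3f}")
```

The recomputed values match the table. Overall accuracy on this matrix is about 76%, against a majority-class (neutral) baseline of about 66%, and the low recall for "uncertain" shows that most errors come from uncertain turns being labeled certain or neutral.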
28
Voice Quality and Emotion
  • Perceptual coloring
  • Derived from a variety of laryngeal and
    supralaryngeal features
  • modal, creaky, whispered, harsh, breathy, ...
  • Correlates with emotion
  • Laver '80; Scherer '86; Murray & Arnott '93;
    Laukkanen '96; Johnstone & Scherer '99; Gobl &
    Ní Chasaide '03; Fernandez '00

29
Phonation Gestures
  • Adductive tension: interarytenoid muscles adduct
    the arytenoid cartilages
  • Medial compression: adductive force on vocal
    processes; adjustment of ligamental glottis
  • Longitudinal tension: tension of vocal folds

30
Modal Voice
  • Neutral mode
  • Muscular adjustments moderate
  • Vibration of vocal folds periodic, full closing
    of glottis, no audible friction
  • Frequency of vibration and loudness in low to mid
    range for conversational speech

31
Tense Voice
  • Very strong tension of vocal folds, very high
    tension in vocal tract

32
Whispery Voice
  • Very low adductive tension
  • Medial compression moderately high
  • Longitudinal tension moderately high
  • Little or no vocal fold vibration
  • Turbulence generated by friction of air in and
    above larynx

33
Creaky Voice
  • Vocal fold vibration at low frequency, irregular
  • Low tension (only ligamental part of glottis
    vibrates)
  • The vocal folds strongly adducted
  • Longitudinal tension weak
  • Moderately high medial compression

34
Breathy Voice
  • Tension low
  • Minimal adductive tension
  • Weak medial compression
  • Medium longitudinal vocal fold tension
  • Vocal folds do not come together completely,
    leading to frication

35
Estimating Voice Quality
  • Estimate wrt controlled neutral quality
  • But how do we know the control is truly
    neutral?
  • Must match the natural laryngeal behavior to
    laboratory neutral
  • Our knowledge of models of vocal fold movements
    may be inadequate for describing real phonation
  • Known relationships between acoustic signal and
    voice source are complex
  • Can only observe behavior of voicing indirectly,
    so prone to error
  • Direct source data obtained by invasive
    techniques which may interfere with signal

36
Next Class
  • Deceptive Speech