Producing Emotional Speech

Last modified by: Julia Hirschberg

Slides: 42
Provided by: colu117

1
Producing Emotional Speech
  • Thanks to Gabriel Schubiner

2
Papers
  • Generation of Affect in Synthesized Speech
  • Corpus-based approach to synthesis
  • Expressive visual speech using talking head
  • Demos
  • Affect Editor Quiz/Demo
  • Synface Demo

3
Affect in Speech
  • Goals
  • Addition of Emotion to Synthetic speech
  • Acoustic Model
  • Typology of parameters of emotional speech
  • Quantification
  • Addresses problem of expressiveness
  • What benefit is gained from expressive speech?

4
Emotion Theory/Assumptions
  • Emotion -> Nervous System -> Speech Output
  • Binary distinction
  • Parasympathetic vs Sympathetic
  • based on physical changes
  • universal emotions

5
Approaches to Affect
  • Generative
  • Emotion -> Physical -> Acoustic
  • Descriptive
  • Observed acoustic params imposed

6
Descriptive Framework
  • 4 Parameter groups
  • Pitch
  • Timing
  • Voice Quality
  • Articulation
  • Assumption of independence
  • How could this affect design and results?

7
Pitch & Timing
  • Accent Shape
  • Average Pitch
  • Contour Slope
  • Final Lowering
  • Pitch Range
  • Reference Line
  • Exaggeration (not used)
  • Fluent Pauses
  • Hesitation Pauses
  • Speech Rate
  • Stress Frequency
  • Stressed Stressable

8
Voice Quality & Articulation
  • Breathiness
  • Brilliance
  • Loudness
  • Pause Discontinuity
  • Pitch Discontinuity
  • Tremor
  • Laryngealization
  • Precision

9
Implementation
  • Each parameter has scale
  • Each scale is independent
  • from other parameters
  • between positive and negative

10
Implementation
  • Settings grouped into preset conditions for each
    emotion
  • based on prior studies
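
The two Implementation slides describe independent per-parameter scales that are grouped into one preset per emotion. A minimal sketch of that data structure follows. It assumes an integer scale from -10 to 10 with 0 as neutral, and the preset numbers are illustrative placeholders, not values from the paper; `AffectSettings` and `PRESETS` are hypothetical names.

```python
from dataclasses import dataclass, field

# Assumed scale bounds; 0 means "no change from neutral speech".
SCALE_MIN, SCALE_MAX = -10, 10

# Parameter names taken from the slides above (a subset, for brevity).
PARAMETERS = ("average_pitch", "pitch_range", "speech_rate", "breathiness", "precision")


@dataclass
class AffectSettings:
    """One independent scale value per parameter; unspecified ones stay neutral."""
    values: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject unknown parameters and values outside the assumed scale.
        for name, value in self.values.items():
            if name not in PARAMETERS:
                raise KeyError(f"unknown parameter: {name}")
            if not SCALE_MIN <= value <= SCALE_MAX:
                raise ValueError(f"{name}={value} is outside the scale")

    def get(self, name):
        # Parameters missing from the preset default to the neutral value 0.
        return self.values.get(name, 0)


# Hypothetical presets: placeholder numbers that only show the shape of the table.
PRESETS = {
    "sadness": AffectSettings({"average_pitch": -3, "pitch_range": -5, "speech_rate": -8}),
    "anger": AffectSettings({"average_pitch": 4, "pitch_range": 8, "speech_rate": 6}),
}

sad = PRESETS["sadness"]
```

Because each scale is independent, changing one entry of a preset never requires touching another, which is the design assumption questioned on the Descriptive Framework slide.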

11
Program Flow: Input
  • Emotion -> parameter representation
  • Utterance -> clauses
  • Agent, Action, Object, Locative
  • Clause and lexeme annotations
  • Finds all possible locations of affect and
    chooses whether or not to use

12
Program Flow
  • Utterance -> Tree structure -> linear phonology
  • compiled for specific synthesizer with software
    to simulate affects not available in hardware

13
(No Transcript)
14
Perception
  • 30 Utterances
  • 5 sentences × 6 affects
  • Forced choice of one of six affects
  • magnitude and comments

15
Elicitation Sentences
  • Intro
  • I'm almost finished
  • I'm going to the city
  • I saw your name in the paper X
  • I thought you really meant it
  • Look at that picture

16
Pop Quiz!!!
17
Pop Quiz Solutions
  • I'm almost finished
  • Disgust, Surprise, Sadness, Gladness, Anger, Fear
  • I'm going to the city
  • Surprise, Gladness, Anger, Disgust, Sadness, Fear
  • I thought you really meant it
  • Anger, Disgust, Gladness, Sadness, Fear, Surprise
  • Look at that picture
  • Anger, Fear, Disgust, Sadness, Gladness, Surprise

18
Results
  • approx. 50% recognition rate
  • 91% for sadness
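
To put these rates in context: with a forced choice among six affects, guessing gives 1/6, about 16.7%. The sketch below builds the 5 × 6 stimulus set from the Perception slide and scores forced-choice responses per intended affect. The `demo` responses are made up to exercise the function and are not data from the study.

```python
from collections import Counter
from itertools import product

AFFECTS = ["anger", "disgust", "fear", "gladness", "sadness", "surprise"]
SENTENCES = [
    "I'm almost finished",
    "I'm going to the city",
    "I saw your name in the paper",
    "I thought you really meant it",
    "Look at that picture",
]

# 5 sentences x 6 affects = 30 stimuli.
stimuli = list(product(SENTENCES, AFFECTS))


def recognition_rates(trials):
    """trials: iterable of (intended_affect, chosen_affect) pairs."""
    total, correct = Counter(), Counter()
    for intended, chosen in trials:
        total[intended] += 1
        correct[intended] += intended == chosen  # True counts as 1
    return {affect: correct[affect] / total[affect] for affect in total}


# Chance level for a six-way forced choice.
chance = 1 / len(AFFECTS)

# Made-up responses, only to exercise the function.
demo = [("sadness", "sadness"), ("sadness", "sadness"),
        ("anger", "fear"), ("anger", "anger")]
rates = recognition_rates(demo)
```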

19
(No Transcript)
20
Conclusions
  • Effective?
  • Thoughts?

21
Corpus-based Approach to Expressive Speech
Synthesis
22
Corpus
  • Collect utterances in each emotion
  • emotion-dependent semantics
  • One speaker
  • Good news, Bad news, Question

23
Model: Feature Vector
  • Features
  • Lexical stress
  • Phrase-level stress
  • Distance from beginning of phrase
  • Distance from end of phrase
  • POS
  • Phrase-type
  • End of syllable pitch

24
Model: Classification
  • Predicts F0
  • 5 syllable window
  • Uses feature vector to predict observation vector
  • observation vector: (log(p), Δp)
  • p = end-of-syllable pitch
  • Decision Tree
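
A rough sketch of the two data shapes named on this slide: the per-syllable observation vector and the 5-syllable context window. Two points are assumptions, because the slide does not spell them out: Δp is taken here as the change in log pitch from the previous syllable, and the window is centered on the current syllable. The function names are hypothetical.

```python
import math


def observation_vectors(end_pitches_hz):
    """Return one (log p, delta p) pair per syllable.

    Assumption: delta p is the change in log pitch from the previous
    syllable, with 0.0 for the first syllable.
    """
    obs = []
    prev_log = None
    for p in end_pitches_hz:
        log_p = math.log(p)
        delta = 0.0 if prev_log is None else log_p - prev_log
        obs.append((log_p, delta))
        prev_log = log_p
    return obs


def windows(items, size=5, pad=None):
    """Centered sliding window: each item with its neighbors on both sides."""
    items = list(items)
    half = size // 2
    padded = [pad] * half + items + [pad] * half
    return [tuple(padded[i:i + size]) for i in range(len(items))]


obs = observation_vectors([100.0, 200.0])
ctx = windows(["s1", "s2", "s3"])
```

A decision tree would then be trained to map each window of feature vectors to the observation vector of the center syllable.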

25
Model: Target Duration
  • Similar to predicting F0
  • build tree with goal of providing Gaussian at
    leaves
  • Use mean of class as target duration
  • discretization
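
The "Gaussian at the leaves, mean as target" idea can be sketched without building a real tree. Here the tree is replaced by a hypothetical `leaf_of` function that maps a feature dictionary to a leaf id. The durations are invented, and the unit being timed is left open because the slide does not say.

```python
from collections import defaultdict
from statistics import mean, pstdev


def fit_leaf_gaussians(samples, leaf_of):
    """samples: (features, duration_seconds) pairs; leaf_of: features -> leaf id."""
    by_leaf = defaultdict(list)
    for features, duration in samples:
        by_leaf[leaf_of(features)].append(duration)
    # One (mean, standard deviation) pair per leaf.
    return {leaf: (mean(d), pstdev(d)) for leaf, d in by_leaf.items()}


def target_duration(features, leaf_of, gaussians):
    """Use the mean of the leaf's Gaussian as the target duration."""
    mu, _sigma = gaussians[leaf_of(features)]
    return mu


def leaf_of(features):
    # Hypothetical one-question "tree": split on lexical stress only.
    return "stressed" if features["lexical_stress"] else "unstressed"


# Invented training durations, in seconds.
samples = [
    ({"lexical_stress": True}, 0.25),
    ({"lexical_stress": True}, 0.75),
    ({"lexical_stress": False}, 0.125),
]
gaussians = fit_leaf_gaussians(samples, leaf_of)
target = target_duration({"lexical_stress": True}, leaf_of, gaussians)
```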

26
Models
  • Uses acoustic analogue of n-grams
  • captures sense of context
  • compared to describing full emotion as sequence
  • compare to Affect Editor
  • Uses only F0 and length (cf. Affect Editor)
  • Includes information about which utterance
    the features are derived from
  • intentional bias, justified?

27
Model: Synthesis
  • Data tagged with original expression and emotion
  • expression-cost matrix
  • noted trade-off
  • emotional intensity vs. smoothness
  • Paralinguistic events
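
One way to read the expression-cost matrix and the noted trade-off is as a penalty added to the usual acoustic cost when a unit's tagged expression differs from the requested one. The matrix values, the `weight` parameter and the candidate tuples below are all hypothetical.

```python
# Hypothetical costs: EXPRESSION_COST[target][tagged] is the penalty for using
# a unit recorded in expression `tagged` when `target` was requested.
EXPRESSION_COST = {
    "good_news": {"good_news": 0.0, "neutral": 0.5, "bad_news": 1.0},
    "bad_news": {"good_news": 1.0, "neutral": 0.5, "bad_news": 0.0},
    "neutral": {"good_news": 0.5, "neutral": 0.0, "bad_news": 0.5},
}


def pick_unit(candidates, target_expression, weight=1.0):
    """candidates: list of (unit_id, tagged_expression, acoustic_cost).

    A larger weight favors units in the requested expression (more intensity);
    a smaller weight favors low acoustic cost (more smoothness).
    """
    def total(candidate):
        _unit_id, tagged, acoustic = candidate
        return acoustic + weight * EXPRESSION_COST[target_expression][tagged]

    return min(candidates, key=total)[0]


candidates = [("u1", "neutral", 0.25), ("u2", "good_news", 0.5)]
expressive_choice = pick_unit(candidates, "good_news", weight=1.0)
smooth_choice = pick_unit(candidates, "good_news", weight=0.25)
```

With the full weight the expressive unit `u2` wins despite its higher acoustic cost. With the reduced weight the smoother neutral unit `u1` is chosen.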

28
SSML
  • Compare to Cahn's typology
  • Abstraction layers

29
Perception Experiment
  • Distinguish same utterance spoken with neutral
    and affected prosody
  • Semantic content problematic?

30
Results
  • Binary decision
  • Reasonable gain over baseline?

31
Conclusion
  • Major contributions?
  • Paths forward?

32
Synthesis of Expressive Visual Speech on a
Talking Head
33
<Not these Talking Heads...>
34
Synthesis Background
  • Manipulation of video images
  • Virtual model with deformation parameters
  • Synchronized with time-aligned transcription
  • Articulatory Control Model
  • Cohen & Massaro (1993)

35
Data
  • Single actor
  • Given specific emotion as instruction
  • 6 emotions + neutral

36
Facial Animation Parameters
  • Face independent
  • FAP matrix × scaling factor + position0
  • Weighted deformations of distance between
    vertices and feature point

37
Modeling
  • Phonetic segments assigned target parameter
    vector
  • temporal blending over dominance functions
  • Principal components

38
ML
  • Separate models for each emotion
  • 6:1 training/testing ratio
  • models -> PC traj -> FAP traj emotion param
    matrix
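
The "PC traj -> FAP traj" step is read here as a standard PCA reconstruction: each frame of principal-component scores is mapped back to FAP space as the mean plus a weighted sum of the component vectors. This is an assumption about the pipeline, and the numbers below are toy values.

```python
def pcs_to_faps(pc_frames, components, mean):
    """Map principal-component trajectories back to FAP trajectories.

    pc_frames: T frames of K scores; components: K basis vectors of length D;
    mean: length-D mean FAP vector. Returns T frames of D FAP values.
    """
    return [
        [m + sum(score * comp[d] for score, comp in zip(frame, components))
         for d, m in enumerate(mean)]
        for frame in pc_frames
    ]


# Toy example: two components, two FAP dimensions, two frames.
components = [[1.0, 0.0], [0.0, 2.0]]
mean = [0.5, 0.5]
fap_frames = pcs_to_faps([[1.0, 1.0], [0.0, -0.25]], components, mean)
```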

39
Results
  • More extreme emotions easier to perceive
  • 73% sad, 60% angry, 40% sad

40
Synface Demo
41
Discussion
  • Changes in approach from Cahn to Eide
  • Production compared to Detection