Multi-modal expression of Swedish prominence

Transcript and Presenter's Notes

1
Multi-modal expression of Swedish prominence
Björn Granström
Centre for Speech Technology, Department of Speech, Music and
Hearing, KTH, Stockholm, Sweden
2
Historical background
  • Prosody for speech synthesis at KTH, together
    with Rolf Carlson
  • The Lund intonation model: Gösta Bruce et al.

3
Several joint projects
  • Profs: Prosodic phrasing in Swedish, 1989-1992
  • Gösta Bruce, Björn Granström and more
  • First reference: G. Bruce and B. Granström.
    Modelling Swedish intonation in a text-to-speech
    system. STL-QPSR, 30(1):17-21, 1989 (on the KTH
    web).

4
Potentially ambiguous sentences, varying in
phrase boundary location
5
Entering Count Piper's humble residence
6
Several joint projects, cont.
  • Prosodiag - Prosodic Segmentation and Structuring
    of Dialogue (HSFR/NUTEK), 1993-1996
  • Gösta Bruce, Björn Granström, Kjell Gustafson,
    David House, Paul Touati
  • Project Description
  • The object of study is the prosody of dialogue in
    a language technology framework. The primary goal
    of the project is to increase our understanding
    of how prosodic aspects of speech are exploited
    interactively in dialogue and on the basis of
    this increased knowledge to be able to create a
    more powerful prosody model.
  • Late reference: Gösta Bruce, Johan Frid, Björn
    Granström, Kjell Gustafson, Merle Horne, and David
    House. Prosodic segmentation and structuring of
    dialogue. TMH-QPSR, 37(3):1-6, 1996.
  • More than 20 joint publications, and then?

7
Much in the context of the annual phonetics
meetings next
8
Project meetings in inspiring surroundings
9
...probing many different cultures
10
Is prosody more than sound?
  • Our bias: communication is multi-modal
  • Traditionally prosodic functions are signaled by
    gestures, perceived by eye and ear
  • This concerns both body and face gestures
  • Preliminary hypothesis: F0 is linked to eyebrow
    height, e.g. Cavé et al. (1996)
  • Easy to put to a test with multimodal speech
    synthesis

11
Eyebrow vs intonation
1. No eyebrow motion
2. Eyebrow motion controlled by the fundamental frequency
   of the voice
3. Eyebrow motion at focal accents
4. Eyebrow motion at the first focal accent
  • Test sentence: Jag heter Axel, inte Axell
    (translation: My name is Axel, not Axell). In
    Sweden, Axel is a first name, as opposed to
    Axell, which is a family name.

12
Goals and research context
  • How are visual expressions used to convey and
    strengthen prosodic functions?
  • Understand interactions between visual
    expressions, dialog functions and speech
    acoustics
  • Context: animated talking agent
  • Realistic communicative behavior using multimodal
    speech synthesis

13
Visual prosodic functions
  • Prominence
      • stress
      • focus
  • Phrasing
  • Utterance type
      • question
      • statement
  • Dialogue functions
      • back-channeling
      • turn-taking
  • Attitudes
  • Emotions

14
Visual prosody cont.
  • What is underlying?
  • How tight is the AV connection?
  • What are the important visual gestures?
  • More optional than acoustic prosodic parameters?
  • Individual and cultural variation
  • Reinforcing or qualifying acoustics?

15
Formal experiment
Prominence due to eyebrow rise
5 content words: När pappa fiskar stör piper Putte
(When dad is fishing sturgeon, Putte is whimpering)
16
Example of stimuli
Task: which word is most prominent? (identical
acoustics, varied location of eyebrow movement)
Eyebrow movement
No eyebrow movement (neutral)
17
Prominence increase due to eyebrow movement
18
Feedback experiment
  • Mini dialogues (two turns)
  • Travel agent application
  • Both visual and acoustic feedback cues
  • Affirmative cues: agent understands/accepts the
    request
  • Negative cues: agent is unsure about the request
    (seeks confirmation)
  • Six cues hypothesised
  • Granström, House & Swerts (2002)

19
Pos/Neg feedback experiment
(Granström, House & Swerts 2002)
20
(No Transcript)
21
Recording of communicative interactions
Automatic tracking of reflective spots in 3D
(Qualisys)
22
Interactions: emotion and articulation
(resynthesis) (from AV speech database,
EU/PF_STAR project)
23
Measurement points for lip coarticulation analysis
Vertical distance
left mouth corner
Lateral distance
24
The expressive mouth
left mouth corner
  • All vowels (sentences)
  • Encouraging
  • Happy
  • Angry
  • Sad
  • Neutral

(Svanfeldt et al. 2003)
25
Prompted read speech database
  • Expressive modes: confirming, questioning,
    certain, uncertain, happy, (angry)
  • 39 short, content-neutral sentences with three
    possible focal accent positions each, e.g.
      • Båten seglade förbi (The boat sailed by)
      • Dom flyttade möblerna (They moved the furniture)
  • Nonsense words (VCV, VCCV, CVC)
  • Digits

26
Mean eyebrow positions for one speaker
27
Nose marker traces with head nods annotated
automatically (blue) and by two human annotators
(red) (adapted from Cerrato & Svanfeldt 2006)
28
Examples from the database
Focal accent on Båten
seglade förbi
Confirming Happy
29
Exploitation of visual parameters
  • Visual cues exploited at focal accent
      • Mouth cues: happy, encouraging
      • Eyebrow cues: happy, questioning
      • Vertical head nods: confirming

30
Analysis in terms of FAP and FMQ
  • MPEG-4 Facial Animation Parameters (FAPs): a
    subset of 31 of the 68 FAPs defined in the
    MPEG-4 standard, including only the ones that we
    were able to calculate directly from our measured
    point data
  • Focal Motion Quotient, FMQ: the standard
    deviation of a FAP taken over a word in focal
    position, divided by the average standard
    deviation of the same FAP in the same word in
    non-focal position.
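
The FMQ definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis code used in the study: the function name, the data layout (one list of FAP samples per reading of the word) and the choice of population standard deviation are all hypothetical, and the sample values are made up.

```python
import statistics


def focal_motion_quotient(focal_track, nonfocal_tracks):
    """Return the FMQ for one FAP and one word.

    focal_track     -- FAP samples over the word when it is in focal position
    nonfocal_tracks -- one list of FAP samples per reading in which the same
                       word is in non-focal position
    (Both names and the data layout are assumptions made for this sketch.)
    """
    # Standard deviation of the FAP over the word in focal position.
    # Population SD is an assumption; the slides do not say which variant was used.
    focal_sd = statistics.pstdev(focal_track)

    # Average standard deviation of the same FAP over the non-focal readings.
    nonfocal_sd = statistics.mean([statistics.pstdev(t) for t in nonfocal_tracks])

    if nonfocal_sd == 0:
        raise ValueError("FAP does not vary in non-focal position; FMQ is undefined")
    return focal_sd / nonfocal_sd


# Made-up sample values, for illustration only.
focal = [0, 2, 0, 2]                        # SD = 1.0
nonfocal = [[0, 1, 0, 1], [1, 2, 1, 2]]     # SD = 0.5 each
print(focal_motion_quotient(focal, nonfocal))  # 2.0
```

Under this reading, an FMQ above 1 means the parameter moves more when the word is focal than when it is not, and an FMQ near 1 means focus has little effect on that parameter.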

31
The focal motion quotient, FMQ, averaged across
all sentences, for all measured MPEG-4 FAPs for
several expressive modes
(FAP groups: articulation, smile, brows, head)
32
The effect of focus on the variation of several
groups of MPEG-4 FAPs, for different
expressive modes
FMQ (Focal Motion Quotient)
33
The effect of focal accent on selected parameter
variations in Certain and Uncertain readings
FMQ (Focal Motion Quotient)
34
What's next?
  • Better recordings
  • Detailed analysis of the eye region: gaze and
    wrinkles
  • Use in applications, e.g. spoken dialogue systems
  • And more audible prosody.

35
New cooperative project
SIMULEKT - Simulering av svenskans prosodiska
dialekttyper (Simulating intonational varieties
of Swedish), VR 2007-2009
And finally...
36
Congratulations! Well done Gösta!