Koen Meijs, Mariet Theune, Dirk Heylen and others - PowerPoint PPT Presentation

1
Generating narrative speech for the Virtual
Storyteller
Koen Meijs, Mariet Theune, Dirk Heylen and others
2
Overview
  • Background: The Virtual Storyteller
  • Analysis of human storytellers
  • Conversion rules and testing
  • Implementation
  • Evaluation
  • Conclusions and future work

3
The Virtual Storyteller
  • Automatic story generation
  • Plot creation
  • Natural language generation
  • Storytelling

4
Plot creation
  • Characters in the story are (semi-)autonomous
    agents, which:
  • Have their own personality, goals and emotions
  • Can perform planned actions to reach their goals
  • Are guided by a director agent

5
NLG and story presentation
  • Language generation using simple sentence
    templates
  • Story presentation by an embodied, speaking agent
  • (using Microsoft Agents as a temporary solution)

6
Example story setting
  • NB: Visualisation is not part of the system yet!

7
Example story text
  • Diana walked to the forest.
  • Brutus walked to the plains.
  • Diana picked up the sword.
  • Brutus walked to the desert.
  • Diana walked to the desert.
  • Brutus was afraid of Diana because Brutus saw
    that Diana had the sword.
  • Brutus hit Diana.
  • Diana was afraid of Brutus because Diana saw
    Brutus.
  • Diana walked to the forest.
  • Brutus was afraid of Diana because Brutus saw
    that Diana had the sword.
  • Brutus walked to the forest.
  • Diana stabbed the villain. And she lived happily
    ever after!!!

8
Storytellers' speech
  • Human storytellers engage their audience by:
  • General storytelling speech style
  • Different voices for characters
  • Expressing emotions
  • Different sound effects

9
Focus of this work
  • General storytelling style
  • Use of prosody to express suspense in stories

10
Analysis of human speakers
  • Global storytelling style, material from:
  • newsreader (Onno Duyvené de Wit)
  • children's storyteller (Sacco van der Made)
  • adult storyteller (Toon Tellegen)
  • Analysis (using PRAAT) mainly based on the children's
    storyteller

11
Features
  • Pitch
  • Intensity
  • Tempo (syllables per second)
  • Pause duration
  • Vowel length

12
Global storytelling style
  • Pitch / intensity
  • Averages are similar
  • Standard deviation is much larger for storyteller

[Figure panels: newsreader; children's storyteller]
13
Global storytelling style
  • Tempo (syllables per second): the newsreader is much
    faster than both storytellers
  • Pause duration: the storyteller's pauses are longer
    (esp. between sentences)
  • Also lengthening of certain adverbs/adjectives
    by the storyteller ("A long corridor that was so-o-o
    low")

14
Expressing suspense
  • Sudden climax: an unexpected revelation.
  • E.g., opening Bluebeard's secret chamber:
  • "She had to get used to the darkness, and then"
  • Increasing climax: building up expectation.
  • Finally finding the Sleeping Beauty:
  • "He opened the door and there was the sleeping
    princess."

15
Sudden climax
  • "En toen" / "And then"
  • Sudden rise in pitch and intensity on "then"
  • Vowel lengthening in "then"

16
Increasing climax
  • Two parts: (1) creating expectation, (2) revelation
  • First part: increasing pitch and vowel duration
  • Second part: more constant, lower pitch and
    intensity

17
Conversion rules
  • Conversion from neutral to storytelling
    speech
  • Rules based on analysis of human speakers
  • Input paired time-value data
  • Output new values for a given time domain

18
Example from storytelling style
  • Pitch: increase the pitch of syllables carrying a
    sentence accent
  • All pitch values inside the syllable's time
    domain are multiplied by a certain factor (based
    on a sine function)
  • Maximum increase between 40 and 90 Hz
  • Best value to be determined experimentally

19
Determining constant values
  • Material: speech produced by Fluency
    text-to-speech, manipulated using PRAAT scripts
  • Five subjects compared 22 speech fragment pairs
    with different values for one constant
  • Subjects had to indicate:
  • Which fragment sounded most natural, or
  • Which had the best expression of suspense

20
Results: storytelling style
21
Results: sudden climax
"Everybody waited in silence, and then ... there
was a loud bang!"
22
Results: increasing climax
"Step by step he jumped from stone to stone,
slipped on the last stone and fell into the
water."
[Figure panels: neutral; pitch contour manipulated]
23
Pilot test of conversion rules
  • 16 speech fragments
  • 8 neutral (Fluency, with no manipulation)
  • 8 manipulated using PRAAT according to conversion
    rules, using best constant values
  • Eight subjects rated storytelling quality,
    naturalness, and suspense on a five-point scale
    (subjects divided in two groups)

24
(No Transcript)
25
Pilot test results
  • Compared to neutral fragments:
  • Storytelling quality of manipulated fragments was
    rated equal or better
  • Naturalness of manipulated fragments was rated
    equal or less
  • Manipulated fragments were rated as having more
    suspense, even if only the global storytelling
    style was used

26
Implementation
Prosodic information: list of phonemes with
pitch and duration values (not possible to adjust
intensity)
27
Example annotated text
  • Annotation: extension of SSML.
  • <speak>
  • <style type="narrative"/>
  • <s> The beard made him look <accent extend="yes">
    so </accent> ugly that everybody ran away when
    they saw him. </s>
  • <s> He wanted to turn around <climax type="sudden">
    and then </climax> there was a loud bang. </s>
  • <s> Bluebeard raised the big knife, <climax
    type="increasing"> he wanted to strike and
    <climax_top/> there was a knock on the door.
    </climax> </s>
  • </speak>
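
A minimal sketch of how such annotated text could be read, assuming the slide's markup is ordinary XML with angle brackets and quoted attribute values (an assumption about the original slide, which is not shown verbatim here). It uses Python's standard `xml.etree.ElementTree`; the helper name `find_marked_spans` is hypothetical and this is not the system's actual parser.

```python
import xml.etree.ElementTree as ET

# The example annotation, written as well-formed XML (assumed form).
ANNOTATED = (
    '<speak>'
    '<style type="narrative"/>'
    '<s> The beard made him look <accent extend="yes"> so </accent> ugly '
    'that everybody ran away when they saw him. </s>'
    '<s> He wanted to turn around <climax type="sudden"> and then </climax> '
    'there was a loud bang. </s>'
    '<s> Bluebeard raised the big knife, <climax type="increasing"> he wanted '
    'to strike and <climax_top/> there was a knock on the door. </climax> </s>'
    '</speak>'
)


def find_marked_spans(xml_text):
    """List (tag, attributes, leading text) for every accent/climax element,
    in document order. For an increasing climax the leading text is the part
    before <climax_top/>, i.e. the expectation-building part."""
    root = ET.fromstring(xml_text)
    spans = []
    for elem in root.iter():
        if elem.tag in ("accent", "climax"):
            spans.append((elem.tag, dict(elem.attrib), (elem.text or "").strip()))
    return spans


spans = find_marked_spans(ANNOTATED)
```

Each returned span names the rule to apply (accent, sudden climax, increasing climax) and the words whose phonemes would then need to be looked up.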

28
Example prosodic information
  • 1 h 112
  • 2 I 151 50 75
  • 3 R 75
  • 4 l 75
  • 5 @ 47 20 71 70 61
  • 6 k 131
  • 7 @ 55 80 70
  • 8 _ 11 50 65
  • Each numbered line gives, in order:
  • Phoneme
  • Duration (ms)
  • Zero or more pairs of pitch percentage (specifying
    at which point during the phoneme the pitch value
    should be applied) and pitch value
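
The line layout described above can be read with a short parser. This is a sketch under assumptions: the leading 1 to 8 are treated as line numbers and dropped, the schwa is written as `@`, and the `Phoneme` class and `parse_line` function are hypothetical names, not part of Fluency.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Phoneme:
    symbol: str
    duration_ms: int
    # (pitch percentage, pitch value) pairs; may be empty.
    pitch_points: List[Tuple[int, int]] = field(default_factory=list)


def parse_line(line: str) -> Phoneme:
    """Parse 'phoneme duration [percentage value]...' into a Phoneme."""
    parts = line.split()
    symbol, duration = parts[0], int(parts[1])
    rest = [int(x) for x in parts[2:]]
    if len(rest) % 2 != 0:
        raise ValueError(f"unpaired pitch entry in line: {line!r}")
    # Pair up alternating percentage and pitch-value entries.
    return Phoneme(symbol, duration, list(zip(rest[0::2], rest[1::2])))


# The example lines from the slide, without their line numbers.
LINES = [
    "h 112", "I 151 50 75", "R 75", "l 75",
    "@ 47 20 71 70 61", "k 131", "@ 55 80 70", "_ 11 50 65",
]
phonemes = [parse_line(line) for line in LINES]
total_ms = sum(p.duration_ms for p in phonemes)
```

Keeping the pitch points as pairs makes it straightforward to rescale a pitch value or a duration for one phoneme and write the line back out.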

29
Conversion steps
  • Parse XML
  • Look up phonemes to be manipulated
  • Apply function
  • For example, pitch for global storytelling
    style:
  • $y'(t) = y(t) \cdot \left(1 + \frac{\sin\left(\frac{t - t_1}{t_2 - t_1} \cdot 0.5\pi + 0.25\pi\right)}{n}\right)$,
  • where $n = \text{average pitch} / 40$
  • Return adapted values
  • NB: intensity cannot be adapted in Fluency
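
The pitch function for the global storytelling style can be sketched as follows. The formula on the slide is garbled in this transcript, so the form used here, a multiplication by 1 plus a sine term divided by n, is a reconstruction that matches the earlier description (pitch values multiplied by a sine-based factor, with a maximum increase of 40 Hz at average pitch). The function and parameter names are hypothetical.

```python
import math


def storytelling_pitch(y, t, t1, t2, average_pitch, max_increase_hz=40.0):
    """Adapted pitch for original pitch y at time t inside [t1, t2].

    Assumed form: y * (1 + sin(((t - t1) / (t2 - t1)) * 0.5*pi + 0.25*pi) / n)
    with n = average_pitch / max_increase_hz.
    """
    n = average_pitch / max_increase_hz
    # The phase runs from 0.25*pi at t1 to 0.75*pi at t2, peaking mid-syllable.
    phase = ((t - t1) / (t2 - t1)) * 0.5 * math.pi + 0.25 * math.pi
    return y * (1.0 + math.sin(phase) / n)


# Hypothetical accented syllable from 1.0 s to 1.2 s, speaker average 100 Hz.
at_start = storytelling_pitch(100.0, 1.0, 1.0, 1.2, average_pitch=100.0)
at_middle = storytelling_pitch(100.0, 1.1, 1.0, 1.2, average_pitch=100.0)
at_end = storytelling_pitch(100.0, 1.2, 1.0, 1.2, average_pitch=100.0)
```

Under this reading the increase is largest in the middle of the syllable, where a pitch equal to the average rises by the full 40 Hz, and smaller but still positive at both edges.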

30
Evaluation of implementation
  • Set-up similar to conversion rule pilot test
  • 16 fragments (8 neutral / narrative pairs)
  • 20 subjects, divided in two groups
  • Rating storytelling quality, naturalness, and
    suspense on a five-point scale

31
Mean scores
Significant differences (at the 0.05 level) are shown in
bold face. Underlining indicates near
significance.
32
Summing up the results
  • Storytelling quality of manipulated fragments
    rated above average, and better than neutral
    fragments (but hardly significant)
  • Naturalness ratings vary: some accents were seen
    as misplaced (though copied from the original
    fragment)
  • Suspense of manipulated fragments rated higher
    than neutral fragments (some significance)

33
Conclusions and future work
  • Successful automatic conversion from standard
    text-to-speech to storytelling prosody
  • Further improvement and larger-scale evaluation
    still needed
  • Automatic derivation of features from text?