LING124 Prosody in TTS

Provided by: hahn7
1
LING124 Prosody in TTS
  • November 18, 2008

2
Prosody
  • Use of suprasegmental features to convey
    sentence-level pragmatic meanings
  • F0, duration, energy
  • Difference between statements and questions
  • Salience of particular words or phrases
  • Affective and emotional meaning
  • Phrasing, prominence, tune

3
Phrasing (1)
  • Intonational phrase
  • I wanted to go to London, but could only get
    tickets for France
  • Intermediate phrase
  • I wanted to go to London
  • Final vowel is longer at the end of phrase
  • Pause after the final word in the phrase
  • Slight decrease in F0 from the beginning to end
    of the phrase

4
Phrasing (2)
  • Hand-crafted rules
  • e.g. Insert a phrase break after punctuation
  • Features for machine learning classifiers
  • Number of words and syllables in the sentence
  • Distance from the beginning and the end of the
    sentence
  • Distance from the previous phrase break
  • Distance from the last punctuation
  • Part-of-speech tags of neighboring words
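
The punctuation rule and the positional features above can be sketched in a few lines of Python. This is only an illustration, not the method used in the lecture: the function names (`tokenize`, `juncture_features`) and the feature names are made up here, and syllable counts and part-of-speech tags are left out because they would need a lexicon and a tagger.

```python
import re

PUNCTUATION = {",", ";", ":", ".", "?", "!"}

def tokenize(sentence):
    # Split into words and single punctuation marks.
    return re.findall(r"[A-Za-z']+|[,;:.?!]", sentence)

def juncture_features(sentence):
    """Return one feature dict per word, describing the juncture after it."""
    tokens = tokenize(sentence)
    n_words = sum(1 for t in tokens if t not in PUNCTUATION)
    rows = []
    word_index = 0    # words seen so far (1-based position of the current word)
    since_break = 0   # words since the previous rule-based break
    for i, tok in enumerate(tokens):
        if tok in PUNCTUATION:
            continue
        word_index += 1
        since_break += 1
        # Hand-crafted rule: a phrase break follows any word before punctuation.
        followed_by_punct = i + 1 < len(tokens) and tokens[i + 1] in PUNCTUATION
        rows.append({
            "word": tok,
            "n_words": n_words,
            "dist_from_start": word_index,
            "dist_to_end": n_words - word_index,
            "dist_from_prev_break": since_break,
            "rule_break": followed_by_punct,
        })
        if followed_by_punct:
            since_break = 0
    return rows

rows = juncture_features(
    "I wanted to go to London, but could only get tickets for France."
)
breaks = [r["word"] for r in rows if r["rule_break"]]
```

With the rule alone, the distance from the previous break and the distance from the last punctuation mark coincide. A trained classifier would treat them as separate features, since it may place breaks where there is no punctuation.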

5
Prominence (1)
  • Pitch accent
  • I'm a little surprised to hear it characterized
    as upbeat
  • Emphatic accent
  • I know SOMETHING interesting is sure to happen
  • Unaccented
  • Reduced

6
Prominence (2)
  • Content words and new or informative words are
    more likely to bear a pitch accent
  • New or informative words
  • Low N-gram probability
  • Term-frequency inverse-document frequency
  • Term-frequency
  • Relative frequency of a word in the given
    document
  • Inverse-document frequency
  • Log of inverse of the proportion of documents
    that contain the word
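
As a worked example of the two definitions above, here is a minimal TF-IDF sketch. It assumes term frequency is the word count divided by document length and uses the natural log for the inverse document frequency. The function name `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """documents: list of token lists. Returns one {word: score} dict per document."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            # tf = relative frequency in this document
            # idf = log of the inverse of the proportion of documents with the word
            w: (c / total) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        })
    return scores

docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]
scores = tf_idf(docs)
```

A word that occurs in every document ("the") scores zero, while a word found in only one document ("dog") scores highest. Such a score could serve as one cue that a word is informative enough to bear a pitch accent.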

7
Tune
  • Rise and fall of F0 over time
  • Final-fall
  • Question-rise
  • Continuation-rise (when nouns are separated by
    commas)

8
ToBI (1)
  • Prominence and tune
  • Pitch accents and boundary tones
  • Pitch accents
  • H* Peak accent
  • L* Low accent
  • L*+H Scooped accent
  • L+H* Rising peak accent
  • H+!H* Step down

9
ToBI (2)
  • Boundary tones
  • L-L% Final fall
  • L-H% Continuation rise
  • H-H% Question rise
  • H-L% Final-level plateau

10
ToBI (3)
  • Prosodic structure
  • Intonational phrase break: 4
  • Intermediate phrase break: 3
  • Pause between words: 2
  • Word boundaries without pause: 1

11
Duration - Klatt (1979)
  • Determine by how much the mean duration of a
    phone should be lengthened or shortened
  • Duration = (inherent duration - minimum
    duration) × A + minimum duration
  • Features (A), from Taylor (2009)
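
The duration rule can be checked with a small calculation. The sketch below assumes that A is the product of the individual context factors, as in the usual presentation of Klatt's model. The function name `klatt_duration` and all numeric values are hypothetical and are not taken from Klatt (1979) or Taylor (2009).

```python
import math

def klatt_duration(inherent_ms, minimum_ms, factors):
    """Duration = (inherent - minimum) * A + minimum, with A = product of factors."""
    a = math.prod(factors)  # an empty list gives A = 1.0, i.e. the inherent duration
    return (inherent_ms - minimum_ms) * a + minimum_ms

# Hypothetical phone: inherent duration 100 ms, minimum duration 40 ms.
unchanged = klatt_duration(100.0, 40.0, [])           # no rule applies
lengthened = klatt_duration(100.0, 40.0, [1.4])       # one lengthening factor
combined = klatt_duration(100.0, 40.0, [1.4, 0.8])    # lengthening and shortening
floor = klatt_duration(100.0, 40.0, [0.0])            # A = 0 gives the minimum
```

Because only the part of the duration above the minimum is scaled, no amount of shortening can push a phone below its minimum duration.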

12
Duration - Klatt (1979) (2)

13
F0
  • Specify F0 target points for each pitch accent
    and boundary tone and interpolate among these
    targets
  • Target points are specified in terms of pitch
    range
  • Baseline frequency: lowest F0 in utterance
  • Topline: highest F0 in utterance
  • Specify where in the accented syllable the
    targets apply
  • Example rules from Jilka et al. (1999)
  • H* 100%, L* 0%, L+H* 20%-100%, H-H% 120%,
    L-L% -20%
  • H* at 60% of the way through the voiced part
    of the accented syllable
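
The target-and-interpolate idea can be sketched as follows. The percentages are read as positions in the pitch range between baseline (0%) and topline (100%), for the ToBI labels H*, L*, H-H% and L-L%. The baseline and topline values, the timings, the function names and the choice of linear interpolation are all assumptions made for illustration. The two-target accent (20%-100%) is left out to keep the example short.

```python
# Target height as a percentage of the pitch range (values from the rules above).
TARGET_PERCENT = {"H*": 100, "L*": 0, "H-H%": 120, "L-L%": -20}

def target_hz(label, baseline_hz, topline_hz):
    """Convert a pitch-range percentage into a frequency in Hz."""
    pct = TARGET_PERCENT[label]
    return baseline_hz + (pct / 100.0) * (topline_hz - baseline_hz)

def accent_time(voiced_start, voiced_end, fraction=0.6):
    """Place the target a given fraction of the way through the voiced part."""
    return voiced_start + fraction * (voiced_end - voiced_start)

def interpolate_f0(targets, times):
    """Linearly interpolate between (time, hz) targets with strictly increasing times."""
    contour = []
    for t in times:
        if t <= targets[0][0]:
            contour.append(targets[0][1])    # hold the first target before it
        elif t >= targets[-1][0]:
            contour.append(targets[-1][1])   # hold the last target after it
        else:
            for (t0, f0), (t1, f1) in zip(targets, targets[1:]):
                if t0 <= t <= t1:
                    contour.append(f0 + (f1 - f0) * (t - t0) / (t1 - t0))
                    break
    return contour

# Hypothetical speaker range and timings (seconds).
baseline, topline = 100.0, 200.0
targets = [
    (accent_time(0.2, 0.4), target_hz("H*", baseline, topline)),  # peak accent
    (1.0, target_hz("L-L%", baseline, topline)),                  # final fall
]
contour = interpolate_f0(targets, [0.0, 0.66, 1.0])
```

Boundary tones above 100% or below 0% fall outside the baseline-to-topline range, which is how the extra-high question rise and extra-low final fall are expressed.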