Introduction to Speech Synthesis - PowerPoint PPT Presentation

About This Presentation
Title:

Introduction to Speech Synthesis

Slides: 12
Provided by: johnmc5

Transcript and Presenter's Notes


1
Introduction to Speech Synthesis
  • Aims
  • Prepare for talk
  • Terminology
  • Overview of Festival
  • Evaluation
  • Past and future directions

2
Typical Architecture
3
Festival commands
  • festival> (set! utt1 (Utterance Text "Hello world"))
  • #<Utterance 1d08a0>
  • festival>
  • festival> (utt.synth utt1)
  • #<Utterance 1d08a0>
  • festival>
  • festival> (utt.play utt1)
  • #<Utterance 1d08a0>
  • festival>
  • festival> (SayText "Good morning, welcome to Festival")
  • #<Utterance 1d8fd0>
  • festival>

4
Grammar
  • After tokenisation
  • Parse
  • POS tagging (HMM-based n-gram)
  • Disambiguation
  • I live in Reading
  • I can fish
  • Homographs e.g. 1996
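
The "1996" homograph can be read as a year or as a string of digits. The following is a minimal Python sketch of the two expansions, assuming an earlier disambiguation step has already chosen the reading. The function names and the reading labels "year" and "digits" are hypothetical, and the year branch only handles four-digit tokens whose two halves are both between 10 and 99.

```python
ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
        6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety"}


def two_digits(n):
    """Spell out an integer in the range 10..99."""
    if not 10 <= n <= 99:
        raise ValueError("this sketch only supports 10..99")
    if n < 20:
        return TEENS[n - 10]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]


def expand_token(token, reading):
    """Expand a numeric token according to an already chosen reading."""
    if reading == "digits":
        # Read each digit separately, as in a phone number.
        return " ".join(ONES[int(ch)] for ch in token)
    if reading == "year":
        # Read the token as two two-digit numbers.
        if len(token) != 4 or not token.isdigit():
            raise ValueError("year reading expects a four-digit token")
        return two_digits(int(token[:2])) + " " + two_digits(int(token[2:]))
    raise ValueError("unknown reading: " + reading)
```

Calling `expand_token("1996", "year")` and `expand_token("1996", "digits")` gives the two pronunciations of the same written token.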

5
Dictionary/Lexicon
  • Morphological analysis?
  • Phonemic info
  • Prosodic info
  • ( "present" v ((( p r e ) 0) (( z @ n t ) 1)) )
  • ( "monument" n ((( m o ) 1) (( n y u ) 0) (( m @ n t ) 0)) )
  • ( "lives" n ((( l ai v z ) 1)) )
  • ( "lives" v ((( l i v z ) 1)) )
  • What if no entry?
  • E.g. proper nouns
  • Letter-to-sound rules
  • Post-lexical rules
  • médecin
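
The lookup order implied by this slide (lexicon first, letter-to-sound rules when there is no entry) can be sketched in Python. This is an illustration only, not Festival's implementation: `LEXICON`, `lookup` and the `letter_to_sound` callback are hypothetical names, the entries are the ones shown on the slide, and the schwa is written `@`.

```python
# (word, part of speech) -> list of syllables, each (phones, stress).
LEXICON = {
    ("present", "v"): [(["p", "r", "e"], 0), (["z", "@", "n", "t"], 1)],
    ("monument", "n"): [(["m", "o"], 1), (["n", "y", "u"], 0),
                        (["m", "@", "n", "t"], 0)],
    ("lives", "n"): [(["l", "ai", "v", "z"], 1)],
    ("lives", "v"): [(["l", "i", "v", "z"], 1)],
}


def lookup(word, pos, letter_to_sound=None):
    """Return the syllables for (word, pos).

    Falls back to the letter_to_sound callback when the lexicon has no
    entry, and returns None when no fallback is supplied.
    """
    entry = LEXICON.get((word.lower(), pos))
    if entry is not None:
        return entry
    if letter_to_sound is not None:
        return letter_to_sound(word)
    return None
```

The part-of-speech tag is what separates the two pronunciations of "lives": `lookup("lives", "n")` and `lookup("lives", "v")` return different vowels.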

6
Letter-to-sound rules
  • Pre-processing lexicon into suitable training set
  • Defining the set of allowable pairing of letters
    to phones.
  • Constructing the probabilities of each
    letter/phone pair.
  • Aligning letters to an equal set of
    phones/_epsilons_.
  • Extracting the data by letter suitable for
    training.
  • Building CART models for predicting phone from
    letters (and context).
  • Building additional lexical stress assignment
    model (if necessary).
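
The alignment and data-extraction steps above can be sketched in Python. This is a simplified illustration under stated assumptions: each letter maps to exactly one phone or to `_epsilon_`, the first valid alignment is taken instead of the most probable one, and the allowables table, the function names and the `#` padding symbol are hypothetical.

```python
EPS = "_epsilon_"


def align(letters, phones, allowables):
    """Return one phone (or EPS) per letter, or None if nothing fits."""
    if not letters:
        # Success only if every phone has been consumed.
        return [] if not phones else None
    allowed = allowables.get(letters[0], set())
    # Option 1: the current letter produces the next phone.
    if phones and phones[0] in allowed:
        rest = align(letters[1:], phones[1:], allowables)
        if rest is not None:
            return [phones[0]] + rest
    # Option 2: the current letter is silent.
    if EPS in allowed:
        rest = align(letters[1:], phones, allowables)
        if rest is not None:
            return [EPS] + rest
    return None


def training_rows(word, aligned, width=1):
    """Pair each letter's context window with its aligned phone."""
    padded = "#" * width + word + "#" * width
    rows = []
    for i, phone in enumerate(aligned):
        window = padded[i:i + 2 * width + 1]  # letter plus neighbours
        rows.append((window, phone))
    return rows


# Hypothetical allowables covering only the letters of "lives".
ALLOWABLES = {
    "l": {"l"},
    "i": {"i", "ai"},
    "v": {"v"},
    "e": {"e", EPS},
    "s": {"s", "z"},
}

aligned = align("lives", ["l", "i", "v", "z"], ALLOWABLES)
rows = training_rows("lives", aligned)
```

Each row in `rows` is one training example for a per-letter model: the letter with its left and right neighbours, and the phone (or epsilon) it should predict.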

7
Prosody
  • Intonation phrasing
  • Duration: average, Klatt, CART
  • Intensity
  • Derived from rules?
  • Learnt statistically
  • ANNs
  • CART models
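
Two of the duration options listed above can be sketched in Python. The Klatt-style rule stretches a segment between its minimum and inherent duration by a percentage that context rules adjust multiplicatively: DUR = MINDUR + (INHDUR - MINDUR) * PRCNT / 100. The function names and all numeric values below are hypothetical and are not taken from any published rule set.

```python
def average_duration(phone, mean_durations):
    """Average method: every instance of a phone gets its mean duration."""
    return mean_durations[phone]


def klatt_duration(inherent, minimum, rule_percents):
    """Klatt-style rule: scale the compressible part of the duration.

    Each applicable rule contributes a percentage; the percentages are
    combined multiplicatively, starting from 100.
    """
    prcnt = 100.0
    for p in rule_percents:
        prcnt = prcnt * p / 100.0
    return minimum + (inherent - minimum) * prcnt / 100.0


# Hypothetical values in milliseconds.
MEAN_DURATIONS_MS = {"ai": 150, "v": 70}
lengthened_then_shortened = klatt_duration(120, 60, [140, 50])
```

With no applicable rules the Klatt-style result equals the inherent duration, and as the percentage approaches zero it approaches the minimum duration.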

8
Speech signal generation
  • Articulatory
  • Formant: source-filter
  • Concatenative
  • What units?
  • Words, syllables, phones, diphones
  • Unit selection
  • Post-processing

9
Diphone concatenation
di?gw??
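
A diphone runs from the middle of one phone to the middle of the next, so a phone sequence maps to one diphone per adjacent pair. The following is a minimal Python sketch of that mapping, assuming the utterance is padded with a silence phone at both ends. The function name, the `pau` silence label and the `a-b` naming scheme are assumptions for illustration.

```python
def to_diphones(phones, silence="pau"):
    """Map a phone sequence to the diphone names needed to synthesise it."""
    padded = [silence] + list(phones) + [silence]
    # One diphone per pair of adjacent phones.
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


units = to_diphones(["l", "i", "v", "z"])
```

A sequence of n phones needs n + 1 diphones once the leading and trailing silences are included.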
10
Evaluation
  • Intelligibility
  • Naturalness
  • Perceptual tests
  • Psychoacoustics

11
Future trends
  • Best synthesis units?
  • Speech signal modification
  • Voice conversion
  • Variability: style, mood, ...
  • Better models