Introduction to Speech Synthesis

Provided by: johnmc5

1
Introduction to Speech Synthesis

Aims
Prepare for talk
Terminology
Overview of Festival
Evaluation
Past and future directions

2
Typical Architecture
3
Festival commands

festival> (set! utt1 (Utterance Text "Hello world"))
<Utterance 1d08a0>
festival>
festival> (utt.synth utt1)
<Utterance 1d08a0>
festival>
festival> (utt.play utt1)
<Utterance 1d08a0>
festival>
festival> (SayText "Good morning, welcome to Festival")
<Utterance 1d8fd0>
festival>

4
Grammar

After tokenisation
Parse
POS tagging (HMM-based n-gram)
Disambiguation
I live in Reading
I can fish
Homographs, e.g. 1996

5
Dictionary/Lexicon

Morphological analysis?
Phonemic info
Prosodic info
( "present" v ((( p r e ) 0) (( z @ n t ) 1)) )
( "monument" n ((( m o ) 1) (( n y u ) 0) (( m @ n t ) 0)) )
( "lives" n ((( l ai v z ) 1)) )
( "lives" v ((( l i v z ) 1)) )
What if no entry?
E.g. proper nouns
Letter-to-sound rules
Post-lexical rules
médecin
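
A minimal sketch of the lookup described on this slide, assuming the schwa in the entries above is written `@`. The dictionary layout and the names `LEXICON`, `lookup`, `letter_to_sound` and `phones` are hypothetical and are not Festival's API; the fallback is a placeholder that stands in for real letter-to-sound rules.

```python
# Hypothetical in-memory lexicon keyed by (word, part of speech).
# Each entry is a list of (phones, stress) syllables, mirroring the slide.
LEXICON = {
    ("present", "v"): [(["p", "r", "e"], 0), (["z", "@", "n", "t"], 1)],
    ("monument", "n"): [(["m", "o"], 1), (["n", "y", "u"], 0),
                        (["m", "@", "n", "t"], 0)],
    ("lives", "n"): [(["l", "ai", "v", "z"], 1)],
    ("lives", "v"): [(["l", "i", "v", "z"], 1)],
}


def letter_to_sound(word):
    """Placeholder fallback: one pseudo-phone per letter, one stressed syllable."""
    return [(list(word.lower()), 1)]


def lookup(word, pos):
    """Return the syllables for (word, pos), or fall back if there is no entry."""
    entry = LEXICON.get((word.lower(), pos))
    if entry is not None:
        return entry
    return letter_to_sound(word)


def phones(syllables):
    """Flatten a list of (phones, stress) syllables into one phone list."""
    return [p for syllable, _stress in syllables for p in syllable]


noun = phones(lookup("lives", "n"))
verb = phones(lookup("lives", "v"))
unknown = lookup("Reading", "n")
```

Keying on the part of speech is what lets the POS tag from the grammar stage pick between the two pronunciations of "lives"; a word with no entry, such as a proper noun, drops through to the fallback.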

6
Letter-to-sound rules

Pre-processing lexicon into suitable training set
Defining the set of allowable pairing of letters to phones.
Constructing the probabilities of each letter/phone pair.
Aligning letters to an equal set of phones/_epsilons_.
Extracting the data by letter suitable for training.
Building CART models for predicting phone from letters (and context).
Building additional lexical stress assignment model (if necessary).
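
The "extracting the data by letter" step can be sketched as follows. This is an illustration under stated assumptions, not Festival's actual feature layout: it assumes the word is already aligned one letter to one phone (with `_epsilon_` for silent letters), and the function name `letter_examples`, the `#` padding symbol and the feature names are all hypothetical.

```python
EPS = "_epsilon_"  # marks a letter that maps to no phone


def letter_examples(letters, phones, width=3):
    """Build one (features, target_phone) row per letter of an aligned word.

    `letters` and `phones` must have equal length; `width` is the number of
    neighbouring letters kept on each side as context.
    """
    if len(letters) != len(phones):
        raise ValueError("letters and phones must be aligned one-to-one")
    pad = ["#"] * width  # hypothetical word-boundary symbol
    padded = pad + list(letters) + pad
    rows = []
    for i, phone in enumerate(phones):
        centre = i + width
        features = {
            "letter": padded[centre],
            "left": tuple(padded[centre - width:centre]),
            "right": tuple(padded[centre + 1:centre + 1 + width]),
        }
        rows.append((features, phone))
    return rows


# "lives" (noun) aligned as l/l, i/ai, v/v, e/_epsilon_, s/z
rows = letter_examples("lives", ["l", "ai", "v", EPS, "z"], width=2)
```

Each row is one training example for a tree that predicts the phone from the letter and its context; the tree building itself is left out of the sketch.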

7
Prosody

Intonation phrasing
Duration: average, Klatt, CART
Intensity
Derived from rules?
Learnt statistically
ANNs
CART models
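
As an illustration of the rule-based option, the Klatt duration model is usually stated as DUR = MINDUR + (INHDUR - MINDUR) x PRCNT / 100, where contextual rules successively scale PRCNT. The sketch below assumes that form; the function name and the example millisecond values are hypothetical and are not taken from any published table.

```python
def klatt_duration(inherent_ms, minimum_ms, rule_factors):
    """Klatt-style duration: shrink or stretch only the compressible part.

    `rule_factors` are multiplicative adjustments (e.g. 0.5 to shorten),
    applied in order to a percentage that starts at 100.
    """
    percent = 100.0
    for factor in rule_factors:
        percent *= factor
    return minimum_ms + (inherent_ms - minimum_ms) * percent / 100.0


# Hypothetical segment: inherent 100 ms, minimum 40 ms.
unchanged = klatt_duration(100, 40, [])
shortened = klatt_duration(100, 40, [0.5])
lengthened = klatt_duration(100, 40, [1.5])
```

Because only the part above the minimum is scaled, shortening rules can never push a segment below its minimum duration, which is the main difference from simply scaling an average.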

8
Speech signal generation

Articulatory
Formant: source-filter
Concatenative
What units?
Words, syllables, phones, diphones
Unit selection
Post-processing

9
Diphone concatenation
di?gw??
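
A minimal sketch of how a phone sequence maps to the diphone units that get concatenated. The function name `to_diphones`, the `pau` silence label and the `a-b` naming scheme are assumptions for illustration; the phone string reuses the symbol style of the lexicon slide.

```python
def to_diphones(phones, silence="pau"):
    """Return the diphone names covering a phone sequence.

    The sequence is padded with silence on both sides, so n phones
    yield n + 1 diphones, each spanning one phone-to-phone transition.
    """
    padded = [silence] + list(phones) + [silence]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


units = to_diphones(["h", "@", "l", "ou"])
```

Cutting at the middle of each phone rather than at its edges keeps the transitions, where coarticulation is strongest, inside the recorded units.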
10
Evaluation

Intelligibility
Naturalness
Perceptual tests
Psychoacoustics

11
Future trends

Best synthesis units?
Speech signal modification
Voice conversion
Variability: style, mood, ...
Better models
