Speech Synthesis: Abstract

Provided by: sabrin1

Transcript and Presenter's Notes

1
Speech Synthesis: Abstract

Abstract
This presentation gives an overview of rule-based
Text-to-Speech processing. A Text-to-Speech (TTS)
system can generally be divided into three main
components: Text Analysis, Prosody Control, and
Speech Synthesis. Text Analysis normalizes the
input text and transcribes it into phonemes. For
the speech to sound natural, the prosodic features
of the synthesized speech need to be controlled.
The last component, Speech Synthesis, generates
the speech signal by applying rules that map the
phonetic transcription onto sets of acoustical
parameters. The presentation ends with a demo of
a simple rule-based Malay TTS.

2
Speech Synthesis

Speech Synthesis Overview
Simple Malay TTS
Conclusion
Future works

3
Overview: History

- Automatically convert text to speech.
- History [2]
- since the 17th century (mechanical tube)
- 1960s (speech production theory)
- 1970s (applications of TTS)
- 1980s (stored speech waveform segments), the
current trend.

4
Speech production

Note: Resonator [3].

5
TTS Block Diagram

[Figure: TTS block diagram [2]. Text -> Text
Analysis (text dictionary) -> Prosody Control
(prosody rules) -> Speech Synthesis (acoustical
parameters) -> synthesized speech.]

6
Text Analysis

3 major steps [4]
Structure analysis: Determine where paragraphs,
sentences, and other structures (e.g. words,
phrases) start and end.
Text normalization: Convert written text into
spoken text.
Example: "She arrives at 5.30 pm" becomes
"she arrives at five thirty pm".
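
The normalization step can be sketched in a few lines of Python. This is only an illustration of the "5.30 pm" example, not the method used in the presentation: the names normalize and number_to_words, the time pattern, and the "oh five" reading for minutes below ten are all assumptions made for this sketch.

```python
import re

# Hypothetical, deliberately tiny vocabulary: 0..59 is enough for clock times.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty"}

# Assumed written form of a time: one or two digits, "." or ":", two digits.
TIME_PATTERN = re.compile(r"\b(\d{1,2})[.:](\d{2})\b")


def number_to_words(n):
    """Spell out an integer in the range 0..59."""
    if not 0 <= n <= 59:
        raise ValueError("this sketch only supports 0..59")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]


def _expand_time(match):
    """Turn a matched time such as '5.30' into 'five thirty'."""
    hour, minute = int(match.group(1)), int(match.group(2))
    if hour > 23 or minute > 59:
        return match.group(0)  # not a plausible time, leave it untouched
    if minute == 0:
        return number_to_words(hour)
    if minute < 10:
        return number_to_words(hour) + " oh " + number_to_words(minute)
    return number_to_words(hour) + " " + number_to_words(minute)


def normalize(text):
    """Expand time expressions and lowercase the result."""
    return TIME_PATTERN.sub(_expand_time, text).lower()


print(normalize("She arrives at 5.30 pm"))  # she arrives at five thirty pm
```

A real normalizer would also need rules for dates, currency, abbreviations, and so on; each would be another pattern and expansion function of the same shape.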

7
Text Analysis (cont'd)

Text-to-phoneme conversion (phonemisation):
Converts each word into phonemes.
Example
word: daisy
IPA transcription: d e I z i
ARPAbet transcription: d ey z iy
Methods of phonemisation [1]
1. Rule-based
2. PbA (Pronunciation by Analogy),
the best for English.
3. Neural network (NETspeak)
4. Machine learning (nearest
neighbour)
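
As a rough sketch of method 1 (rule-based), the Python below applies an ordered list of letter-to-phoneme rewrite rules from left to right. The rule table is hypothetical and far too small for real use: the "ng" and "ny" digraph rules and the glide insertion for "ia" are assumptions chosen only so that the Malay word "dia" comes out as d i y a, as in the demo later in the presentation.

```python
# Hypothetical rule table: (letter sequence, phonemes it produces).
# Rules are tried in order at each position; the first match wins.
RULES = [
    ("ng", ["ng"]),          # digraph treated as a single phoneme
    ("ny", ["ny"]),          # digraph treated as a single phoneme
    ("ia", ["i", "y", "a"])  # assumed glide insertion between i and a
]


def phonemise(word):
    """Convert a word into a list of phoneme symbols using RULES."""
    word = word.lower()
    phonemes = []
    i = 0
    while i < len(word):
        for letters, output in RULES:
            if word.startswith(letters, i):
                phonemes.extend(output)
                i += len(letters)
                break
        else:
            # Fallback assumption: a letter with no rule maps to itself.
            phonemes.append(word[i])
            i += 1
    return phonemes


print(phonemise("dia"))    # ['d', 'i', 'y', 'a']
print(phonemise("bunga"))  # ['b', 'u', 'ng', 'a']
```

The order of the rules matters: longer or more specific letter sequences have to come before shorter ones that share a prefix, which is one reason rule sets grow complicated quickly.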

8
Prosody Control

Prosody: Refers to aspects of a sentence's
pronunciation.
Importance: Makes speech sound natural and conveys
the correct meaning of the sentence.
Determine the appropriate prosody for a sentence by
considering speech features: pitch (melody),
timing (rhythm), pausing, speaking rate, focused
words, and many others.
Example: She loves me. (statement)
She loves me? (skeptical)
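
To make the statement/question contrast concrete, here is a minimal Python sketch that assigns one target pitch per syllable. Everything numeric in it is an assumption for illustration (a 120 Hz base pitch, a 3% fall per syllable, a final rise to 1.3 times the base for questions and a final fall to 0.8 times the base for statements); real prosody rules also control timing, pausing, and emphasis.

```python
def sentence_type(text):
    """Classify a sentence by its final punctuation (a crude assumption)."""
    return "question" if text.strip().endswith("?") else "statement"


def pitch_contour(n_syllables, kind, base_hz=120.0):
    """Return one target pitch in Hz per syllable.

    The contour falls gently across the sentence; the last syllable
    rises for a question and drops for a statement.
    """
    if n_syllables < 1:
        raise ValueError("need at least one syllable")
    contour = [base_hz * (0.97 ** i) for i in range(n_syllables)]
    if kind == "question":
        contour[-1] = base_hz * 1.3   # assumed final rise
    elif kind == "statement":
        contour[-1] = base_hz * 0.8   # assumed final fall
    else:
        raise ValueError("unknown sentence type: " + kind)
    return contour


# "She loves me" has three syllables in both readings.
for text in ("She loves me.", "She loves me?"):
    kind = sentence_type(text)
    print(kind, [round(hz) for hz in pitch_contour(3, kind)])
```

The same phoneme string is passed to the synthesizer in both cases; only the pitch targets differ.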

9
Speech Synthesis

Finally, the phonemes and prosody information
are used to produce the speech sound.
Current approaches:
Concatenation of recorded human speech.
Formant synthesis: uses signal processing based
on knowledge of how phonemes sound and how prosody
affects those phonemes.
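
Formant synthesis can be sketched as a pulse source passed through a cascade of digital resonators, one per formant. The Python below uses the standard second-order resonator difference equation y[n] = A x[n] + B y[n-1] + C y[n-2]. It is a toy under stated assumptions, not a full synthesizer: the formant frequencies for an "a"-like vowel are approximate textbook values, and the bandwidths, the 120 Hz pitch, and the 16 kHz sample rate are made up for illustration.

```python
import math


def resonator_coefficients(freq_hz, bandwidth_hz, sample_rate):
    """Coefficients of a second-order digital resonator with unit gain at 0 Hz."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = (2.0 * math.exp(-math.pi * bandwidth_hz * t)
         * math.cos(2.0 * math.pi * freq_hz * t))
    a = 1.0 - b - c
    return a, b, c


def resonate(signal, freq_hz, bandwidth_hz, sample_rate):
    """Filter a signal through one resonator (one formant)."""
    a, b, c = resonator_coefficients(freq_hz, bandwidth_hz, sample_rate)
    y1 = y2 = 0.0  # the two previous output samples
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz, duration_s, sample_rate):
    """Crude voiced source: one unit pulse per pitch period."""
    period = int(round(sample_rate / f0_hz))
    n = int(round(duration_s * sample_rate))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def synthesize_vowel(formants, f0_hz=120.0, duration_s=0.3, sample_rate=16000):
    """Pass the source through the formant resonators in cascade."""
    signal = impulse_train(f0_hz, duration_s, sample_rate)
    for freq_hz, bandwidth_hz in formants:
        signal = resonate(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal


# Assumed (frequency, bandwidth) pairs in Hz for an "a"-like vowel.
A_LIKE_VOWEL = [(730.0, 60.0), (1090.0, 90.0), (2440.0, 150.0)]
samples = synthesize_vowel(A_LIKE_VOWEL)
print(len(samples), "samples")
```

Changing the pitch of the pulse train changes the melody without touching the formants, which is how prosody and phoneme identity stay separate in this approach.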

10
Malay TTS toy demo

[Figure: word -> Phonemisation (phonemisation
rules, note 1) -> Speech synthesis (speech
database, note 2) -> speech sound]
Notes
1. Uses the Klatt rule-based method (top-to-bottom
strategy). Java source code can be obtained from
http://ww.John-Waser.com/TextToSpeech/
2. A sample of speech segment concatenation in
Java can be obtained from
http://www.javaworld.com/javaworld/jw-08-2001/jw-0817-javatalk-p.html

11
Example

Input word: dia (her/him)
Phonemisation: d i y a
Speech synthesis: d.wav i.wav y.wav a.wav
Output speech sound: diya.wav
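
The concatenation step in this example can be sketched with Python's standard wave module. This is not the Java code from the demo: the function names, the one-file-per-phoneme database layout (d.wav, i.wav, ...), and the requirement that all segments share the same audio format are assumptions of this sketch.

```python
import os
import wave


def concatenate_wavs(segment_paths, output_path):
    """Join WAV files end to end; all must share channels, width, and rate."""
    fmt = None
    frames = []
    for path in segment_paths:
        with wave.open(path, "rb") as segment:
            this_fmt = (segment.getnchannels(),
                        segment.getsampwidth(),
                        segment.getframerate())
            if fmt is None:
                fmt = this_fmt
            elif this_fmt != fmt:
                raise ValueError("format mismatch in " + path)
            frames.append(segment.readframes(segment.getnframes()))
    if fmt is None:
        raise ValueError("no segments given")
    with wave.open(output_path, "wb") as out:
        out.setnchannels(fmt[0])
        out.setsampwidth(fmt[1])
        out.setframerate(fmt[2])
        out.writeframes(b"".join(frames))


def synthesize_word(phonemes, database_dir, output_path):
    """Look up one recording per phoneme and concatenate them."""
    paths = [os.path.join(database_dir, p + ".wav") for p in phonemes]
    concatenate_wavs(paths, output_path)
```

Assuming a directory named speech_db (hypothetical) that holds d.wav, i.wav, y.wav, and a.wav, calling synthesize_word(["d", "i", "y", "a"], "speech_db", "diya.wav") would write the output file. Joining raw segments like this gives audible discontinuities at the boundaries, which is part of why phoneme-level units tend to sound unnatural.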

12
Conclusion

Observation
The rule-based approach needs in-depth knowledge of
the phonology of the particular language. However,
recent studies have implemented statistical methods
to acquire this knowledge.
Using phonemes as the synthesis unit is difficult (it
requires intensive, complicated rules and produces
unnatural sound).

13
Future works

Investigate PbA instead of machine learning,
since PbA can give the highest percentage of
correct phonemisation [1].
Avoid rules that need a phonologist's expertise
unless the rules can be generated using statistical
methods.
Focus on syllables or other synthesis units that
retain more coarticulation, so that the speech
sounds more natural and the rules become simpler.

14
References

[1] Damper et al. 1997. Evaluating the
pronunciation component of Text-to-Speech Systems
for English: A Performance Comparison of
Different Approaches. Computer Speech and
Language.
[3] Jurafsky, D. and Martin, J. H. 2000. Speech and
Language Processing: An Introduction to Natural
Language Processing, Computational Linguistics
and Speech Recognition.
[2] Shirai, K. and Abe, M. (Eds.). 2000. Recent
Progress in Japanese Speech Synthesis. Gordon and
Breach Science Publishers.
[3] http://www.acoustics.hut.fi/slemmett/dippa/chap2.html
[4] http://java.sun.com/products/java-media/speech/forDevelopers/jsapi-guide/index.html