Speech Synthesis: Abstract

Provided by: sabrin1

1
Speech Synthesis: Abstract
  • Abstract
  • This presentation gives an overview of rule-based
    Text-to-Speech (TTS) processing. A TTS system can
    generally be divided into three main components:
    Text Analysis, Prosody Control and Speech
    Synthesis. Text Analysis normalizes the input
    text and transcribes it into phonemes. For the
    speech to sound natural, the prosodic features of
    the synthesized speech need to be controlled. The
    last component, Speech Synthesis, generates the
    speech signal by applying rules that map the
    phonetic transcription into sets of acoustical
    parameters. The presentation ends with a demo of
    a simple rule-based Malay TTS.

2
Speech Synthesis
  • Speech Synthesis Overview
  • Simple Malay TTS
  • Conclusion
  • Future works

3
Overview: History
  • - Automatically converts text to speech.
  • - History [2]:
  • - since the 17th century (mechanical tube)
  • - 1960s (speech production theory)
  • - 1970s (application of TTS)
  • - 1980s (stored speech waveform segments),
    the current trend.

4
Speech production
  • Note: Resonator [3].

5
TTS Block Diagram
  • [Figure: TTS block diagram. Text -> Text Analysis
    (text dictionary) -> Prosody Control (prosody
    rules) -> Speech Synthesis (acoustical
    parameters) -> synthesized speech. Source: [2]]

6
Text Analysis
  • Three major steps [4]:
  • Structure analysis: determine where paragraphs,
    sentences and other structures (e.g. words,
    phrases) start and end.
  • Text normalization: convert written text into
    its spoken form.
  • Example: "She arrives at 5.30 pm"
  • becomes "she arrives at five thirty pm"
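
The normalization step can be sketched in a few lines of Python. This is only an illustration of the idea, not the method used in the presentation: the function names are hypothetical, and the sketch handles nothing but clock times written as "5.30" or "5:30". A real normalizer also has to deal with dates, currency, abbreviations and ambiguous cases such as decimal numbers.

```python
import re

# Number words for 0-59, which is all a clock time needs in this sketch.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty"}

# Hours 0-23, a "." or ":" separator, minutes 00-59.
TIME_PATTERN = re.compile(r"\b([01]?\d|2[0-3])[.:]([0-5]\d)\b")


def number_to_words(n):
    """Spell out an integer in the range 0-59."""
    if not 0 <= n <= 59:
        raise ValueError("only 0-59 is supported in this sketch")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]


def time_to_words(match):
    """Turn one matched clock time into words, e.g. 5.30 -> five thirty."""
    hour, minute = int(match.group(1)), int(match.group(2))
    if minute == 0:
        return number_to_words(hour)
    if minute < 10:
        return number_to_words(hour) + " oh " + number_to_words(minute)
    return number_to_words(hour) + " " + number_to_words(minute)


def normalize(text):
    """Replace clock times with words and lower-case the sentence."""
    return TIME_PATTERN.sub(time_to_words, text).lower()


print(normalize("She arrives at 5.30 pm"))
```

Treating every "digits.digits" pattern as a time is a simplifying assumption; a fuller system would use the surrounding words (such as "pm") to decide how to read it.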

7
Text Analysis (cont'd)
  • Text-to-phoneme conversion (phonemisation):
    converts each word into phonemes.
  • Example:
  • word: daisy
  • IPA transcription: d eɪ z i
  • ARPAbet transcription: d ey z iy
  • Methods of phonemisation [1]:
  • 1. Rule-based
  • 2. PbA (Pronunciation by Analogy),
    the best for English.
  • 3. Neural network (NETspeak)
  • 4. Machine learning (nearest
    neighbour)

8
Prosody Control
  • Prosody: the aspects of a sentence's
    pronunciation beyond its individual sounds.
  • Importance: makes speech sound natural and
    conveys the correct meaning of the sentence.
  • Determine the appropriate prosody for a sentence
    by considering speech features: pitch (melody),
    timing (rhythm), pausing, speaking rate, focused
    words and many others.
  • Example: "She loves me." (statement)
  • "She loves me?" (skeptical)
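
As a rough illustration of how one prosodic feature might be controlled, the Python sketch below picks a pitch contour from the final punctuation mark. Everything in it is assumed for illustration: the function name, the 120 Hz base pitch, and the 20% rise or fall are not taken from the presentation, and real prosody rules also cover timing, pausing and emphasis.

```python
def pitch_contour(sentence, base_hz=120.0, n_points=5):
    """Return n_points pitch targets (Hz) spread evenly over the sentence.

    Assumption for illustration only: a question rises by 20% and a
    statement falls by 20% from the base pitch.
    """
    if n_points < 2:
        raise ValueError("need at least two points to describe a contour")
    rising = sentence.strip().endswith("?")
    end_hz = base_hz * (1.2 if rising else 0.8)
    step = (end_hz - base_hz) / (n_points - 1)
    return [round(base_hz + i * step, 1) for i in range(n_points)]


statement = pitch_contour("She loves me.")
question = pitch_contour("She loves me?")
print(statement)
print(question)
```

The two sentences share the same phonemes, so only the contour distinguishes the statement from the skeptical question.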

9
Speech Synthesis
  • Finally, the phonemes and prosody information
    are used to produce the speech sound.
  • Current approaches:
  • Concatenation of recorded human speech.
  • Formant synthesis: uses signal processing based
    on knowledge of how phonemes sound and how
    prosody affects those phonemes.
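
To make the formant synthesis idea concrete, here is a minimal Python sketch of a source-filter model: an impulse train (a crude voicing source) is passed through a cascade of two-pole digital resonators, one per formant. The resonator difference equation is the one commonly used in Klatt-style synthesizers. The formant frequencies and bandwidths for an /a/-like vowel are assumed values for illustration, and all function names are hypothetical.

```python
import math


def resonator_coefficients(freq_hz, bandwidth_hz, sample_rate):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = (2.0 * math.exp(-math.pi * bandwidth_hz * t)
         * math.cos(2.0 * math.pi * freq_hz * t))
    a = 1.0 - b - c  # normalizes the gain to 1 at 0 Hz
    return a, b, c


def resonate(samples, freq_hz, bandwidth_hz, sample_rate):
    """Filter samples through one formant resonator."""
    a, b, c = resonator_coefficients(freq_hz, bandwidth_hz, sample_rate)
    y1 = y2 = 0.0
    out = []
    for x in samples:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz, duration_s, sample_rate):
    """One unit impulse per pitch period, zeros elsewhere."""
    period = int(round(sample_rate / f0_hz))
    n = int(round(duration_s * sample_rate))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def synthesize_vowel(formants, f0_hz=120.0, duration_s=0.3, sample_rate=16000):
    """Cascade one resonator per (frequency, bandwidth) pair."""
    signal = impulse_train(f0_hz, duration_s, sample_rate)
    for freq_hz, bandwidth_hz in formants:
        signal = resonate(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal


# Assumed (frequency, bandwidth) pairs in Hz for an /a/-like vowel.
A_FORMANTS = [(700.0, 130.0), (1220.0, 70.0), (2600.0, 160.0)]
samples = synthesize_vowel(A_FORMANTS)
print(len(samples))
```

In a rule-based system, the rules would supply the formant targets for each phoneme and the prosody component would supply `f0_hz` and the duration; here both are fixed by hand.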

10
Malay TTS toy demo
  • [Figure: demo pipeline. word -> Phonemisation
    (phonemisation rules, note 1) -> Speech synthesis
    (speech database, note 2) -> speech sound]
  • Notes:
  • 1. Uses the Klatt rule-based method with a
    top-to-bottom strategy. Java source code can be
    obtained from
    http://ww.John-Waser.com/TextToSpeech/
  • 2. A sample of speech segment concatenation in
    Java can be obtained from
    http://www.javaworld.com/javaworld/jw-08-2001/jw-0817-javatalk-p.html

11
Example
  • Input word: dia (her/him)
  • Phonemisation: d i y a
  • Speech synthesis: d.wav + i.wav + y.wav + a.wav
  • Output speech sound: diya.wav
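
The example above can be sketched in Python as follows. The demo itself was written in Java and its actual rules are not shown in the slides, so this is an assumption-laden reconstruction: the single glide-insertion rule is inferred from the "dia" example alone, the function names are hypothetical, and the per-phoneme files (d.wav, i.wav and so on) are assumed to exist with identical audio parameters.

```python
import wave

VOWELS = set("aeiou")


def phonemise(word):
    """Toy letter-to-phoneme rules.

    Each letter maps to one phoneme, and a glide 'y' is inserted between
    'i' and a following vowel, so 'dia' becomes d i y a.
    """
    letters = word.lower()
    phonemes = []
    for index, letter in enumerate(letters):
        phonemes.append(letter)
        next_letter = letters[index + 1] if index + 1 < len(letters) else ""
        if letter == "i" and next_letter in VOWELS:
            phonemes.append("y")
    return phonemes


def segment_files(phonemes):
    """Map each phoneme to its recorded segment in the speech database."""
    return [phoneme + ".wav" for phoneme in phonemes]


def concatenate_wavs(paths, out_path):
    """Join WAV segments into one file.

    Assumes at least one path and that all segments share the same
    channel count, sample width and sample rate.
    """
    with wave.open(out_path, "wb") as out:
        for index, path in enumerate(paths):
            with wave.open(path, "rb") as segment:
                if index == 0:
                    out.setparams(segment.getparams())
                out.writeframes(segment.readframes(segment.getnframes()))


phonemes = phonemise("dia")
print(phonemes)
print(segment_files(phonemes))
```

With the segment files in place, `concatenate_wavs(segment_files(phonemes), "diya.wav")` would write the output file. Joining phoneme-sized segments this way ignores coarticulation, which is one reason the result sounds unnatural.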

12
Conclusion
  • Observations:
  • The rule-based approach needs in-depth knowledge
    of the phonology of the particular language.
    However, recent studies have implemented
    statistical methods to acquire this knowledge.
  • Phonemes are difficult synthesis units (they
    require intensive, complicated rules and produce
    unnatural sound).

13
Future works
  • Investigate PbA instead of machine learning,
    since PbA gives the highest percentage of
    correct phonemisation [1].
  • Avoid rules that need a phonologist's expertise
    unless the rules can be generated using
    statistical methods.
  • Focus on syllables or other synthesis units that
    retain more coarticulation, so that the speech
    sounds more natural and the rules become simpler.

14
References
  • [1] Damper et al. 1997. Evaluating the
    pronunciation component of Text-to-Speech Systems
    for English: A Performance Comparison of
    Different Approaches. Computer Speech and
    Language.
  • [3] Jurafsky, D. and Martin, J. H. 2000. Speech
    and Language Processing: An Introduction to
    Natural Language Processing, Computational
    Linguistics and Speech Recognition.
  • [2] Shirai, K. and Abe, M. (Eds.). 2000. Recent
    Progress in Japanese Speech Synthesis. Gordon and
    Breach Science Publishers.
  • [3] http://www.acoustics.hut.fi/slemmett/dippa/chap2.html
  • [4] http://java.sun.com/products/java-media/speech/forDevelopers/jsapi-guide/index.html