Speech Synthesis: Then and Now


1
Speech Synthesis: Then and Now
  • Julia Hirschberg
  • CS 4706

2
Today
  • Early speech synthesizers
  • Overview of Modern TTS Systems

3
Synthesizer I/O
  • Front end: from input to control parameters
    • From acoustic/phonetic representations
    • From naturally occurring text
    • From constrained mark-up language
    • From semantic/conceptual representations
  • Back end: from control parameters to waveform
    • Articulatory synthesis
    • Formant/acoustic synthesis
    • Concatenative synthesis

4
The First Speaking Machine
  • Wolfgang von Kempelen, Mechanismus der
    menschlichen Sprache nebst Beschreibung einer
    sprechenden Maschine, 1791 (still in the Deutsches
    Museum and playable)
  • First to produce whole words, phrases in many
    languages

5
Joseph Faber's Euphonia, 1846
6
  • Constructed 1835 w/pedal and keyboard control
  • Whispered and ordinary speech
  • Model of tongue, pharyngeal cavity with
    manipulable shape
  • Singing too: "God Save the Queen"
  • Riesz's 1937 synthesizer with almost natural
    vocal tract shape
  • Forerunners of modern articulatory synthesis:
    Dennis Klatt (1987) at MIT

8
  • World's Fair in NY, 1939
  • Requires much training to play
  • Purpose: coding/compression
  • Reduce bandwidth needed to transmit speech, so
    many phone calls can be sent over single line

11
  • Answers
  • These days a chicken leg is a rare dish.
  • It's easy to tell the depth of a well.
  • Four hours of steady work faced us.
  • Automatic synthesis from spectrogram, but can
    also use hand-painted spectrograms as input
  • Purpose: understand perceptual effect of spectral
    details

12
Formant/Resonance/Acoustic Synthesis
  • Parametric or resonance synthesis
  • Specify minimal parameters, e.g. f0 and first 3
    formants
  • Pass electronic source signal through filter
    • Harmonic tone for voiced sounds
    • Aperiodic noise for unvoiced
  • Filter simulates the different resonances of the
    vocal tract
  • E.g.
    • Walter Lawrence's Parametric Artificial Talker
      (1953) for vowels and consonants
    • Gunnar Fant's Orator Verbis Electris (1953) for
      vowels
  • Formant synthesis download (demo)
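
The source-filter idea on this slide can be sketched in a few lines. This is a minimal illustration, not any historical synthesizer: it assumes an impulse train as the "harmonic tone" source and a cascade of two-pole digital resonators (the difference-equation form commonly used in cascade formant synthesizers) as the filter. The formant frequencies and bandwidths in AH_FORMANTS are illustrative values, and all function names are hypothetical.

```python
import math

def resonator_coeffs(freq_hz, bw_hz, sample_rate):
    """Coefficients of a two-pole resonator y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain to 1 at 0 Hz
    return a, b, c

def resonate(signal, freq_hz, bw_hz, sample_rate):
    """Filter a signal through one formant resonance."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, sample_rate)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def impulse_train(f0_hz, duration_s, sample_rate):
    """Voiced source: one unit impulse per pitch period (assumed source shape)."""
    n = int(duration_s * sample_rate)
    period = int(round(sample_rate / f0_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def synthesize_vowel(f0_hz, formants, duration_s=0.5, sample_rate=16000):
    """Pass the source through one resonator per formant, then peak-normalize."""
    signal = impulse_train(f0_hz, duration_s, sample_rate)
    for freq_hz, bw_hz in formants:
        signal = resonate(signal, freq_hz, bw_hz, sample_rate)
    peak = max(abs(s) for s in signal)
    return [s / peak for s in signal]

# Illustrative (frequency, bandwidth) pairs in Hz for an /a/-like vowel.
AH_FORMANTS = [(700.0, 80.0), (1200.0, 90.0), (2600.0, 120.0)]
samples = synthesize_vowel(120.0, AH_FORMANTS)
```

Replacing the impulse train with random noise would give the aperiodic source the slide mentions for unvoiced sounds; the filter stage stays the same.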

13
Synthesis by Computer
  • Beginnings in 1960; dominant from 1970

14
Concatenative Synthesis
  • Most common type today
  • First practical application in 1936: British
    phone company's Talking Clock
    • Optical storage for words, part-words, phrases
    • Concatenated to tell time
  • E.g.
  • And a similar example
  • Bell Labs TTS (1977) (1985)

15
Variants of Concatenative Synthesis
  • Inventory units
    • Diphone synthesis (e.g. Festival)
    • Microsegment synthesis
    • Unit selection: large, variable units
  • Issues
    • How well do units fit together?
    • What is the perceived acoustic quality of the
      concatenated units?
    • Is post-processing on the output possible, to
      improve quality?

16
TTS Production Levels: Back End and Front End
  • Orthographic input: "The children read to Dr.
    Smith"
  • World knowledge: text normalization
  • Semantics
  • Syntax: word pronunciation
  • Lexical
  • Intonation assignment
  • Phonology: intonation realization
  • F0, amplitude, duration
  • Acoustics: synthesis

17
Text Normalization
  • Reading is what W. hates most.
  • Reading is what Wilde hated most.
  • Have the students read the questions.
  • In 1996 she sold 1995 shares and deposited 42 in
    her 401(k).
  • The duck dove supply.
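
The examples above show why the same token must be read differently in context ("1996" as a year, "1995" as a quantity). A minimal rule-based sketch follows, assuming two hand-written rules that are not from the slides: a number from 1100 to 1999 right after "in" is read as a year, and "Dr." before a capitalized word is read as "Doctor". The function names and the YEAR_CUES set are hypothetical; a real normalizer needs many more rules, and this one would misread "401(k)".

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
YEAR_CUES = {"in"}  # assumed cue words that signal a year reading

def two_digits(n):
    """Spell out 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")

def cardinal(n):
    """Spell out 0..9999 as a quantity."""
    parts = []
    thousands, rest = divmod(n, 1000)
    hundreds, rest = divmod(rest, 100)
    if thousands:
        parts.append(ONES[thousands] + " thousand")
    if hundreds:
        parts.append(ONES[hundreds] + " hundred")
    if rest or not parts:
        parts.append(two_digits(rest))
    return " ".join(parts)

def year(n):
    """Spell out 1100..1999 as a year, e.g. 1996 as 'nineteen ninety-six'."""
    high, low = divmod(n, 100)
    if low == 0:
        return two_digits(high) + " hundred"
    if low < 10:
        return two_digits(high) + " oh " + ONES[low]
    return two_digits(high) + " " + two_digits(low)

NUMBER = re.compile(r"\b(\d{1,4})\b")
DOCTOR = re.compile(r"\bDr\.(?=\s+[A-Z])")

def normalize(text):
    """Expand 'Dr.' and 1-4 digit numbers using the preceding word as context."""
    text = DOCTOR.sub("Doctor", text)

    def expand(match):
        n = int(match.group(1))
        before = text[:match.start()].split()
        prev = before[-1].lower() if before else ""
        if prev in YEAR_CUES and 1100 <= n <= 1999:
            return year(n)
        return cardinal(n)

    return NUMBER.sub(expand, text)

print(normalize("In 1996 she sold 1995 shares"))
print(normalize("The children read to Dr. Smith"))
```

Cases like "read" (present or past) and "dove" (noun or verb) cannot be resolved by rules of this kind; they need part-of-speech or wider context.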

18
Pronunciation in Context
19
Intonation Assignment: Phrasing
  • Traditional: hand-built rules
    • Punctuation: 234-5682
    • Context/function word: no breaks after function
      word (He went to dinner)
    • Parse? She favors the nuts and bolts approach
  • Current: statistical analysis of large labeled
    corpus
    • Punctuation, POS window, utterance length, ...
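
The hand-built rules above can be sketched as follows. This is one possible reading, under assumptions not stated on the slide: a break is placed after any token ending in punctuation, and otherwise only between a content word and a following function word once the current phrase has at least max_words words. The FUNCTION_WORDS list, the max_words threshold, and the function names are hypothetical.

```python
# Small illustrative function-word list (an assumption, not a complete inventory).
FUNCTION_WORDS = {"a", "an", "the", "to", "of", "in", "on", "at", "for",
                  "with", "and", "or", "but", "he", "she", "it", "we",
                  "they", "i", "you"}
PUNCT = ".,;:!?"

def is_function_word(token):
    return token.strip(PUNCT).lower() in FUNCTION_WORDS

def phrase_breaks(text, max_words=4):
    """Split text into intonational phrases with two hand-built rules."""
    tokens = text.split()
    phrases, current = [], []
    for i, token in enumerate(tokens):
        current.append(token)
        if i + 1 == len(tokens):
            break  # the last token always closes the final phrase
        next_token = tokens[i + 1]
        if token[-1] in PUNCT:
            # Rule 1: punctuation marks a phrase boundary.
            phrases.append(" ".join(current))
            current = []
        elif (len(current) >= max_words
              and not is_function_word(token)
              and is_function_word(next_token)):
            # Rule 2: break only before a function word, never after one.
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases

print(phrase_breaks("He went to dinner"))
print(phrase_breaks("She favors the nuts and bolts approach"))
```

With these settings the second call splits "nuts and bolts" into two phrases, which illustrates why the slide asks whether a parse is needed.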

20
Intonation Assignment: Accent
  • Hand-built rules
    • Function/content distinction: He went out the
      back door / He threw out the trash
    • Complex nominals
      • Main Street / Park Avenue
      • city hall parking lot
  • Statistical procedures trained on large corpora
  • Contrastive stress, given/new distinction?

21
Intonation Assignment: Contours
  • Simple rules
    • "." : declarative contour
    • "?" : yes-no-question contour unless wh-word
      present at/near front of sentence
    • Well, how did he do it? And what do you know?
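
A minimal sketch of the simple contour rules above. The slide does not say which contour a wh-question receives, so this sketch assumes it falls back to the declarative contour. The window of three words for "at/near front", the WH_WORDS set, and the contour labels are also assumptions.

```python
import re

WH_WORDS = {"who", "whom", "whose", "what", "when", "where", "why",
            "which", "how"}

def choose_contour(sentence, window=3):
    """Pick a contour label from final punctuation and early wh-words."""
    text = sentence.strip()
    words = re.findall(r"[a-z']+", text.lower())
    if text.endswith("?"):
        # A wh-word among the first `window` words blocks the yes-no contour.
        if any(word in WH_WORDS for word in words[:window]):
            return "declarative"  # assumed contour for wh-questions
        return "yes-no-question"
    return "declarative"

for s in ["Well, how did he do it?",
          "And what do you know?",
          "Do you want to fly first class?",
          "I own a cat."]:
    print(s, "->", choose_contour(s))
```

The window lets the rule see past openers such as "Well," and "And" in the slide's two examples.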

22
Phonological and Acoustic Realization
  • Task
  • Produce a phonological representation from
    phonemes and intonational assignment
  • Pitch contour aligned with text
  • Durations, intensity
  • Select best concatenative units from inventory
  • Post-process if needed/possible to smooth joins,
    modify pitch, duration, intensity, rate from
    original units
  • Produce acoustic waveform as output
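
The step "select best concatenative units from inventory" is often framed as a search that balances how well each unit matches the requested prosody against how smoothly adjacent units join. The sketch below uses dynamic programming over a toy inventory. The Unit fields, the pitch-only cost functions, and the data are all hypothetical, chosen only to make the trade-off visible.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Unit:
    label: str       # identifier of the recorded unit (hypothetical)
    phone: str       # phone the unit realizes
    start_hz: float  # pitch at the unit's left edge
    end_hz: float    # pitch at the unit's right edge

def target_cost(unit, target_hz):
    """Mismatch between the unit's mean pitch and the requested pitch."""
    return abs((unit.start_hz + unit.end_hz) / 2.0 - target_hz)

def join_cost(left, right):
    """Pitch discontinuity at the join between two units."""
    return abs(left.end_hz - right.start_hz)

def select_units(targets, inventory):
    """Return the unit sequence minimizing total target + join cost."""
    columns = [[u for u in inventory if u.phone == phone] for phone, _ in targets]
    if any(not column for column in columns):
        raise ValueError("no candidate unit for some target phone")
    # best[i][j] = (cheapest cost of a path ending in columns[i][j], backpointer)
    best = [[(target_cost(u, targets[0][1]), None) for u in columns[0]]]
    for i in range(1, len(targets)):
        row = []
        for unit in columns[i]:
            tc = target_cost(unit, targets[i][1])
            row.append(min(
                (best[i - 1][k][0] + join_cost(prev, unit) + tc, k)
                for k, prev in enumerate(columns[i - 1])))
        best.append(row)
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(columns[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total

INVENTORY = [
    Unit("h1", "h", 100.0, 110.0), Unit("h2", "h", 140.0, 150.0),
    Unit("ay1", "ay", 112.0, 120.0), Unit("ay2", "ay", 150.0, 118.0),
]
path, total = select_units([("h", 105.0), ("ay", 130.0)], INVENTORY)
print([u.label for u in path], total)
```

In this toy data "ay2" is closer to the requested pitch than "ay1", but "ay1" wins because it joins "h1" with a much smaller pitch jump. Post-processing to smooth the remaining discontinuities is a separate step.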

23
TTS: Where are we now?
  • Natural sounding speech for some utterances
    • Where good match between input and database
  • Still hard to vary prosodic features and retain
    naturalness
    • Yes-no questions: Do you want to fly first class?
  • Context-dependent variation still hard to infer
    from text and hard to realize naturally

24
  • Appropriate contours from text
  • Emphasis, de-emphasis to convey focus, given/new
    distinction: I own a cat. Or, rather, my cat
    owns me.
  • Variation in pitch range, rate, pausal duration
    to convey topic structure
  • Characteristics of emotional speech little
    understood, so hard to convey a voice that
    sounds friendly, sympathetic, authoritative.
  • How to mimic real voices?
  • ScanSoft/Nuance demo; AT&T demo

25
Next Week
  • Text Normalization