Title: SPEECH PRODUCTION AND ANALYSIS
1 SPEECH PRODUCTION AND ANALYSIS
MUSICAL ACOUSTICS
Reference: Science of Sound, Chapters 15, 16
2 THE VOCAL ORGANS
3 THE LARYNX
THE MOST IMPORTANT SOUND SOURCE IN THE VOCAL
SYSTEM IS THE LARYNX, WHICH CONTAINS THE VOCAL
FOLDS.
4 VOCAL FOLDS
CONTROL OF THE GLOTTAL OPENING BY THE ARYTENOIDS
- NORMAL MODE: FOLDS OPEN AND CLOSE COMPLETELY
  DURING THE CYCLE
- OPEN PHASE MODE (BREATHY VOICE): FOLDS DO NOT
  CLOSE COMPLETELY
- CREAKY VOICE: AIR PASSES IN SHORT PUFFS
- FALSETTO (HEAD VOICE): USED IN SINGING BUT NOT
  MUCH IN SPEECH
- WHISPER: AIR PASSES THROUGH THE VOCAL FOLDS,
  WHICH DON'T VIBRATE
5 MOTION OF THE VOCAL FOLDS
HIGH-SPEED CAMERA AND MIRROR
LESS OBTRUSIVE OPTICAL FIBER PROBES ALLOW
CONTINUOUS OBSERVATION OF VOCAL FOLDS
6 GLOTTAL WAVEFORM
SPECTRUM
7 LOUD SPEAKING
THE MOST IMPORTANT PARAMETER AFFECTING LOUDNESS
OF PHONATION IS THE RATE OF GLOTTAL CLOSURE.
RAPID CLOSURE INTRODUCES HIGHER HARMONICS IN THE
GLOTTAL AIRFLOW SPECTRUM.
ALTHOUGH THE VOCAL FOLDS SERVE AS THE PRINCIPAL
SOURCE OF SOUND IN SPEECH, OTHER SOURCES ARE USED,
ESPECIALLY IN THE PRODUCTION OF UNVOICED CONSONANT
SOUNDS SUCH AS f, th, s, sh (FRICATIVES); l (LIQUIDS);
AND p, t, k (PLOSIVES).
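The link between the rate of glottal closure and the higher harmonics can be illustrated numerically. The sketch below assumes an idealized triangular airflow pulse (flow rises linearly, falls linearly, then stays at zero while the folds are closed). The pulse shape, the function names, and the choice of "harmonic 5 and above" as the high band are illustrative assumptions, not taken from the slides.

```python
import math

def glottal_pulse(n_samples, open_frac, close_frac):
    """One period of an idealized (assumed) triangular glottal airflow pulse.

    Flow rises linearly over open_frac of the period, falls linearly over
    close_frac, and is zero for the remainder (folds closed).
    """
    n_open = round(n_samples * open_frac)
    n_close = round(n_samples * close_frac)
    pulse = []
    for i in range(n_samples):
        if i < n_open:
            pulse.append(i / n_open)
        elif i < n_open + n_close:
            pulse.append(1.0 - (i - n_open) / n_close)
        else:
            pulse.append(0.0)
    return pulse

def harmonic_amplitudes(period, n_harmonics):
    """Magnitudes of harmonics 1..n_harmonics from a direct DFT of one period."""
    n = len(period)
    amps = []
    for k in range(1, n_harmonics + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(period))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(period))
        amps.append(math.hypot(re, im) / n)
    return amps

def high_harmonic_fraction(amps, first_high=5):
    """Fraction of harmonic energy carried by harmonics >= first_high."""
    total = sum(a * a for a in amps)
    high = sum(a * a for a in amps[first_high - 1:])
    return high / total

# Same opening phase; only the closing phase differs (hypothetical values).
slow_closure = harmonic_amplitudes(glottal_pulse(200, 0.40, 0.30), 20)
rapid_closure = harmonic_amplitudes(glottal_pulse(200, 0.40, 0.05), 20)

slow_fraction = high_harmonic_fraction(slow_closure)
rapid_fraction = high_harmonic_fraction(rapid_closure)
print(f"slow closure:  {slow_fraction:.4f} of energy in harmonics 5-20")
print(f"rapid closure: {rapid_fraction:.4f} of energy in harmonics 5-20")
```

With these assumed pulse shapes, the rapidly closing pulse puts a larger share of its energy into the upper harmonics, consistent with the statement above.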
8 INVERSE FILTERING AND THE GLOTTOGRAM
SPEECH SOUND IS THE PRODUCT OF GLOTTAL FLOW
(SOURCE), THE VOCAL TRACT (FILTER), AND THE MOUTH
OPENING (RADIATOR). ONE WAY TO STUDY THE GLOTTAL
FLOW IS TO CANCEL THE FILTERING EFFECT OF THE
VOCAL TRACT BY INVERSE FILTERING. THE WAVEFORM
AFTER INVERSE FILTERING IS CALLED A GLOTTOGRAM.
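A minimal sketch of inverse filtering, assuming the vocal tract is a single two-pole resonance whose coefficients are known exactly, and ignoring radiation at the mouth. The function names, the 500 Hz formant, the 80 Hz bandwidth, and the triangular pulse train are illustrative assumptions, not taken from the slides. In practice the tract filter is not known in advance and has to be estimated from the speech signal.

```python
import math

def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate):
    """Feedback coefficients (a1, a2) of a two-pole resonator (one formant)."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    theta = 2 * math.pi * freq_hz / sample_rate
    return 2 * r * math.cos(theta), -r * r

def vocal_tract_filter(source, a1, a2):
    """All-pole filter: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    out = []
    y1 = y2 = 0.0
    for x in source:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def inverse_filter(speech, a1, a2):
    """Inverse of the filter above: x[n] = y[n] - a1*y[n-1] - a2*y[n-2]."""
    out = []
    y1 = y2 = 0.0
    for y in speech:
        out.append(y - a1 * y1 - a2 * y2)
        y2, y1 = y1, y
    return out

SAMPLE_RATE = 8000                 # Hz (assumed)
period = SAMPLE_RATE // 100        # 100 Hz voice pitch (assumed)

# Hypothetical glottal flow: one triangular puff per period, zero otherwise.
source = [max(0.0, 1.0 - abs((n % period) - 20) / 20) for n in range(4 * period)]

a1, a2 = resonator_coeffs(500.0, 80.0, SAMPLE_RATE)
speech = vocal_tract_filter(source, a1, a2)      # source shaped by the tract
glottogram = inverse_filter(speech, a1, a2)      # tract effect cancelled

max_error = max(abs(g - s) for g, s in zip(glottogram, source))
print(f"largest difference between glottogram and source: {max_error:.2e}")
```

Because the inverse filter here uses exactly the coefficients that shaped the speech, the recovered waveform matches the source to within rounding error; with an estimated filter the match would only be approximate.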
9 ARTICULATION OF SPEECH
APPROXIMATE TONGUE POSITIONS FOR ARTICULATING
VOWELS
10 ENGLISH VOWELS
11 RESONANCES OF THE VOCAL TRACT: FORMANTS
12 VOWEL FORMANT FREQUENCIES AND AMPLITUDES
13 CLOSED PIPE MODEL OF THE VOCAL TRACT
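As a rough numerical illustration of the closed pipe model: a uniform pipe closed at one end (the glottis) and open at the other (the lips) resonates at odd multiples of c/4L. The tract length of 17 cm and the sound speed of 343 m/s below are assumed round values, not figures from the slides, and the function name is hypothetical.

```python
SPEED_OF_SOUND = 343.0   # m/s, assumed value for air at room temperature
TRACT_LENGTH = 0.17      # m, assumed length of the vocal tract

def closed_pipe_resonances(length_m, count, speed=SPEED_OF_SOUND):
    """Resonances of a pipe closed at one end: f_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * speed / (4 * length_m) for n in range(1, count + 1)]

formants = closed_pipe_resonances(TRACT_LENGTH, 3)
print([round(f) for f in formants])   # -> [504, 1513, 2522]
```

Under these assumptions the first three resonances fall near 500, 1500, and 2500 Hz; a real vocal tract is not a uniform pipe, so actual vowel formants deviate from these values.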
14 SIMPLE MODELS OF THE VOCAL TRACT
15 THE EFFECT OF FORMANTS ON SOUND
16 FIRST AND SECOND FORMANTS OF VOWELS
17 PROSODIC FEATURES OF SPEECH
PROSODIC FEATURES ADD MEANING, EMPHASIS, AND
EMOTION WITHOUT ACTUALLY CHANGING THE PHONEMES
(e.g., PITCH, RHYTHM, ACCENT).
EXAMPLE: USE OF TONES IN MANDARIN CHINESE
18 SPEECH RECOGNITION, ANALYSIS, AND SYNTHESIS
- "Speech is not just the future of Windows, but the
  future of computing itself . . . ." Bill Gates (1998)
- Our ability to recognize the sounds of language
  is truly phenomenal. Speech can be followed at
  rates as high as 400 words per minute. Even
  ordinary speech requires the recognition of 10 to
  15 phonemes per second.
19 SPEECH ANALYSIS
SPEECH IS OFTEN REPRESENTED ON A GRAPH OF SOUND
LEVEL vs. TIME AND FREQUENCY
20 SPEECH SPECTROGRAPH
SOUND SPECTROGRAPH
DIGITAL SOUND SPECTROGRAPH
21 SPEECH SPECTROGRAPH
SOUND LEVEL IS REPRESENTED BY DENSITY
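The graph of sound level vs. time and frequency can be approximated digitally with a short-time Fourier transform: the signal is cut into short frames, each frame is windowed, and the level in each frequency bin is computed. The sketch below is a simplified pure-Python version, not a description of any particular spectrograph; the frame length, the Hann window, the 8000 Hz sample rate, and the two-tone test signal are all assumptions made for illustration.

```python
import math

def spectrogram(signal, frame_len, hop):
    """Level in dB for each (time frame, frequency bin) from a windowed DFT."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]          # Hann window
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + i] * window[i] for i in range(frame_len)]
        levels = []
        for k in range(frame_len // 2 + 1):
            re = sum(x * math.cos(2 * math.pi * k * i / frame_len)
                     for i, x in enumerate(seg))
            im = sum(x * math.sin(2 * math.pi * k * i / frame_len)
                     for i, x in enumerate(seg))
            # Small offset avoids log of zero in empty bins.
            levels.append(10 * math.log10(re * re + im * im + 1e-12))
        frames.append(levels)
    return frames

SAMPLE_RATE = 8000   # Hz (assumed)
FRAME_LEN = 64       # samples per frame, giving 125 Hz per bin

def tone(freq_hz, n):
    return math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)

# Hypothetical test signal: 500 Hz for the first half, 2000 Hz for the second.
signal = [tone(500, n) if n < 256 else tone(2000, n) for n in range(512)]

levels = spectrogram(signal, FRAME_LEN, hop=FRAME_LEN)

# Frequency of the strongest bin in each frame (the darkest point of each
# column if the levels were drawn as density).
peak_hz = [max(range(len(frame)), key=frame.__getitem__) * SAMPLE_RATE / FRAME_LEN
           for frame in levels]
print(peak_hz)
```

Each inner list of `levels` corresponds to one time slice; plotting those values as darkness gives the familiar spectrogram picture.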
22 RECOGNITION OF VOWELS
THE FIRST TWO OR THREE FORMANTS ARE GENERALLY
SUFFICIENT TO IDENTIFY VOWEL SOUNDS, EVEN WHEN
DISTORTION AND INTERFERENCE ARE PRESENT.
EXAMPLES OF DISTORTION:
- RECORDING SPEECH AT ONE SPEED AND PLAYING IT
  BACK AT ANOTHER (DUCK TALK)
- HELIUM SPEECH
SPEECH OF MEN, WOMEN, AND CHILDREN HAS
DIFFERENT PITCH AND FORMANT STRUCTURE.
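The speed-change distortion mentioned above is easy to quantify: replaying a recording at a different speed multiplies every frequency in it, pitch and formants alike, by the same ratio. The sketch below uses hypothetical round numbers (a 120 Hz pitch and formants at 500, 1500, and 2500 Hz) purely for illustration; the function name is also an assumption.

```python
def play_back(pitch_hz, formants_hz, speed_ratio):
    """Frequencies heard when a recording is replayed at speed_ratio times
    the recording speed: every component scales by the same factor."""
    return pitch_hz * speed_ratio, [f * speed_ratio for f in formants_hz]

# Hypothetical voice, replayed 1.5 times faster than it was recorded.
original_formants = [500.0, 1500.0, 2500.0]
pitch, formants = play_back(120.0, original_formants, 1.5)

print(pitch, formants)
# The ratios between formants are unchanged by the speed change.
print(formants[1] / formants[0], original_formants[1] / original_formants[0])
```

Although the absolute formant frequencies move, their ratios stay the same, which may be part of why vowels remain recognizable under this kind of distortion.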
23 RECOGNITION OF CONSONANTS
CONSONANTS ARE RAPIDLY CHANGING SOUNDS THAT
PRECEDE VOWELS, LIKE GRACE NOTES IN MUSIC.
24 FORMANT TRANSITIONS
- A FORMANT TRANSITION THAT CAN BE HEARD AS /t/,
  /p/, OR /k/, DEPENDING ON THE VOWEL THAT FOLLOWS
- 2ND FORMANT TRANSITIONS, ALL HEARD AS /t/
25 SYNTHESIS OF CONSONANTS
SPECTROGRAPHIC PATTERNS FOR SYNTHESIZING /b/,
/d/, /g/ BEFORE DIFFERENT VOWELS. THE DASHED LINE
AT 1800 Hz IS THE LOCUS FOR /d/.
- (a) SECOND FORMANT TRANSITIONS THAT START AT THE
  /d/ LOCUS
- (b) TRANSITIONS THAT MERELY POINT AT IT
26 FILTERED SPEECH AND NOISY ENVIRONMENTS
INTELLIGIBILITY OF FILTERED SPEECH FOR DIFFERENT
CUTOFF FREQUENCIES OF HIGH-PASS AND LOW-PASS
FILTERS (NOTE THAT THEY CROSS OVER AT 1800 Hz,
WHERE THE ARTICULATION INDEX IS THE SAME FOR BOTH
TYPES OF FILTER)
PEAK CLIPPING
27 SPEECH SYNTHESIS
VON KEMPELEN'S TALKING MACHINE (1791)
28 SPEAKER IDENTIFICATION: VOICEPRINTS
SPECTROGRAMS OF THE SPOKEN WORD "SCIENCE" (TWO
SPECTROGRAMS AT THE LEFT ARE BY THE SAME SPEAKER)