SPEECH PRODUCTION AND ANALYSIS - PowerPoint PPT Presentation

Slides: 29
Provided by: elsis4
Transcript and Presenter's Notes

1
SPEECH PRODUCTION AND ANALYSIS
MUSICAL ACOUSTICS
REFERENCES: SCIENCE OF SOUND, CHAPTERS 15, 16
2
THE VOCAL ORGANS
3
THE LARYNX
THE MOST IMPORTANT SOUND SOURCE IN THE VOCAL
SYSTEM IS THE LARYNX, WHICH CONTAINS THE VOCAL
FOLDS.
  • (a) BACK VIEW
  • (b) SIDE VIEW

4
VOCAL FOLDS
CONTROL OF THE GLOTTAL OPENING BY THE ARYTENOIDS
  • NORMAL MODE: FOLDS OPEN AND CLOSE COMPLETELY
    DURING THE CYCLE
  • OPEN PHASE MODE: BREATHY VOICE
    (FOLDS DO NOT CLOSE COMPLETELY)
  • CREAKY VOICE: AIR PASSES IN SHORT PUFFS
  • FALSETTO (HEAD VOICE): USED IN SINGING BUT NOT
    MUCH IN SPEECH
  • WHISPER: AIR PASSES THROUGH THE VOCAL FOLDS,
    WHICH DON'T VIBRATE
5
MOTION OF THE VOCAL FOLDS
HIGH-SPEED CAMERA AND MIRROR
LESS OBTRUSIVE OPTICAL FIBER PROBES ALLOW
CONTINUOUS OBSERVATION OF VOCAL FOLDS
6
GLOTTAL WAVE FORM
SPECTRUM
7
LOUD SPEAKING
THE MOST IMPORTANT PARAMETER AFFECTING LOUDNESS
OF PHONATION IS THE RATE OF GLOTTAL CLOSURE.
RAPID CLOSURE INTRODUCES HIGHER HARMONICS IN THE
GLOTTAL AIRFLOW SPECTRUM. ALTHOUGH THE VOCAL
FOLDS SERVE AS THE PRINCIPAL SOURCE OF SOUND IN
SPEECH, OTHER SOURCES ARE USED, ESPECIALLY IN THE
PRODUCTION OF UNVOICED CONSONANT SOUNDS SUCH AS
f, th, s, sh (FRICATIVES), l (LIQUIDS), AND
p, t, k (PLOSIVES).
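The link between closure rate and higher harmonics can be checked numerically. The sketch below is only an illustration under stated assumptions: the glottal flow pulse is idealized as a linear rise followed by a linear fall, and the names glottal_pulse, harmonic_amplitudes, and high_harmonic_ratio are hypothetical helpers, not part of the original slides. It compares the share of energy in the upper harmonics for a gradual and an abrupt closure.

```python
import cmath
import math


def glottal_pulse(n_samples, open_len, close_len):
    """One period of an idealized flow pulse: linear rise, linear fall, then closed."""
    pulse = []
    for n in range(n_samples):
        if n < open_len:
            pulse.append(n / open_len)                      # folds opening
        elif n < open_len + close_len:
            pulse.append(1.0 - (n - open_len) / close_len)  # folds closing
        else:
            pulse.append(0.0)                               # folds closed
    return pulse


def harmonic_amplitudes(period, n_harmonics):
    """Magnitudes of harmonics 1..n_harmonics from a direct DFT of one period."""
    n = len(period)
    amps = []
    for k in range(1, n_harmonics + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(period))
        amps.append(abs(s) / n)
    return amps


def high_harmonic_ratio(period, split=5, n_harmonics=20):
    """Energy in harmonics above number `split`, divided by energy in all harmonics."""
    amps = harmonic_amplitudes(period, n_harmonics)
    total = sum(a * a for a in amps)
    high = sum(a * a for a in amps[split:])
    return high / total


N = 200  # samples per glottal cycle (assumed value)
slow = glottal_pulse(N, open_len=60, close_len=60)  # gradual closure
fast = glottal_pulse(N, open_len=60, close_len=10)  # abrupt closure
print(high_harmonic_ratio(slow), high_harmonic_ratio(fast))
```

With these assumed pulse shapes, the abrupt-closure pulse puts a noticeably larger fraction of its energy into harmonics above the fifth, consistent with the statement that rapid closure strengthens the higher harmonics.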
8
INVERSE FILTERING AND THE GLOTTOGRAM
SPEECH SOUND IS THE PRODUCT OF GLOTTAL FLOW
(SOURCE), THE VOCAL TRACT (FILTER), AND THE MOUTH
OPENING (RADIATOR). ONE WAY TO STUDY THE GLOTTAL
FLOW IS TO CANCEL THE FILTERING EFFECT OF THE
VOCAL TRACT BY INVERSE FILTERING. THE WAVEFORM
AFTER INVERSE FILTERING IS CALLED A GLOTTOGRAM
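A minimal sketch of the idea, under strong simplifying assumptions: the vocal tract is reduced to a single two-pole resonance whose coefficients are known exactly, and the radiation at the mouth is ignored. In practice the filter would have to be estimated from the speech signal. The function names and all numerical values (sample rate, 500 Hz resonance, 80 Hz bandwidth) are hypothetical choices for illustration.

```python
import math

SAMPLE_RATE = 8000  # Hz (assumed)
PERIOD = 80         # samples per glottal cycle, i.e. a 100 Hz fundamental (assumed)


def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate):
    """Coefficients (a1, a2) of a two-pole resonator 1 / (1 + a1 z^-1 + a2 z^-2)."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    theta = 2.0 * math.pi * freq_hz / sample_rate
    return -2.0 * r * math.cos(theta), r * r


def glottal_source(n_periods):
    """Idealized flow: a smooth pulse for half of each cycle, zero for the rest."""
    one = [math.sin(math.pi * n / 40) ** 2 if n < 40 else 0.0 for n in range(PERIOD)]
    return one * n_periods


def vocal_tract_filter(source, a1, a2):
    """All-pole filter: y[n] = x[n] - a1*y[n-1] - a2*y[n-2]."""
    out = []
    y1 = y2 = 0.0
    for x in source:
        y = x - a1 * y1 - a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def inverse_filter(speech, a1, a2):
    """Matching all-zero filter: x[n] = y[n] + a1*y[n-1] + a2*y[n-2]."""
    out = []
    y1 = y2 = 0.0
    for y in speech:
        out.append(y + a1 * y1 + a2 * y2)
        y2, y1 = y1, y
    return out


a1, a2 = resonator_coeffs(500.0, 80.0, SAMPLE_RATE)
source = glottal_source(5)
speech = vocal_tract_filter(source, a1, a2)
glottogram = inverse_filter(speech, a1, a2)
max_error = max(abs(g - s) for g, s in zip(glottogram, source))
print(max_error)  # tiny: only floating-point rounding remains
```

Because the inverse filter here uses exactly the coefficients that shaped the signal, it recovers the source essentially perfectly. A real glottogram is only as good as the estimate of the vocal tract resonances.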
9
ARTICULATION OF SPEECH
APPROXIMATE TONGUE POSITIONS FOR ARTICULATING
VOWELS
10
ENGLISH VOWELS
11
RESONANCES OF THE VOCAL TRACT: FORMANTS
12
VOWEL FORMANT FREQUENCIES AND AMPLITUDES
13
CLOSED PIPE MODEL OF THE VOCAL TRACT
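As a rough illustration of the closed-pipe model: a uniform pipe closed at one end (the glottis) and open at the other (the lips) resonates at odd multiples of c / 4L. The sketch below assumes a tract length of 17 cm and a speed of sound of 340 m/s; both values and the function name closed_pipe_resonances are illustrative assumptions, not taken from the slides.

```python
def closed_pipe_resonances(length_m, n_resonances=3, speed_of_sound=340.0):
    """Resonances of a pipe closed at one end: f_n = (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * speed_of_sound / (4.0 * length_m)
        for n in range(1, n_resonances + 1)
    ]


# Assumed adult vocal tract length of 0.17 m
formants = closed_pipe_resonances(0.17)
print([round(f) for f in formants])  # → [500, 1500, 2500]
```

These evenly spaced resonances correspond to a neutral, unconstricted tract. Real vowels shift the formants away from these values by changing the tract shape.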
14
SIMPLE MODELS OF THE VOCAL TRACT
15
THE EFFECT OF FORMANTS ON SOUND
16
FIRST AND SECOND FORMANTS OF VOWELS
17
PROSODIC FEATURES OF SPEECH
PROSODIC FEATURES ADD MEANING, EMPHASIS, AND
EMOTION WITHOUT ACTUALLY CHANGING THE PHONEMES
(e.g., PITCH, RHYTHM, ACCENT).
EXAMPLE: USE OF TONES IN MANDARIN CHINESE
18
SPEECH RECOGNITION, ANALYSIS, AND SYNTHESIS
  • "Speech is not just the future of Windows but the
    future of computing itself . . . ." Bill Gates
    (1998)
  • Our ability to recognize the sounds of language
    is truly phenomenal. Speech can be followed at
    rates as high as 400 words per minute. Even
    ordinary speech requires the recognition of 10 to
    15 phonemes per second.

19
SPEECH ANALYSIS
SPEECH IS OFTEN REPRESENTED ON A GRAPH OF SOUND
LEVEL vs. TIME AND FREQUENCY
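One way to build such a representation is a short-time Fourier analysis: cut the signal into overlapping frames, window each frame, and compute the level in each frequency bin. The sketch below is a minimal version using a direct DFT. The frame length, hop size, sample rate, and test signal (a 500 Hz tone followed by a 2000 Hz tone) are all assumed values chosen for illustration, and the function names are hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz (assumed)
FRAME_LEN = 128     # samples per analysis frame (assumed)


def spectrogram(signal, frame_len=FRAME_LEN, hop=64):
    """Return one list per time frame, holding the level in dB for each frequency bin."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    # Precomputed complex exponentials for the DFT
    twiddle = [cmath.exp(-2j * math.pi * m / frame_len) for m in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        levels = []
        for k in range(n_bins):
            acc = sum(seg[n] * twiddle[(k * n) % frame_len] for n in range(frame_len))
            levels.append(20.0 * math.log10(abs(acc) + 1e-12))  # small offset avoids log(0)
        frames.append(levels)
    return frames


def tone(freq_hz, n_samples):
    """A plain sine tone used as a stand-in for speech."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n_samples)]


def peak_frequency(levels):
    """Frequency in Hz of the strongest bin in one frame."""
    k = max(range(len(levels)), key=levels.__getitem__)
    return k * SAMPLE_RATE / FRAME_LEN


signal = tone(500.0, 800) + tone(2000.0, 800)
frames = spectrogram(signal)
print(peak_frequency(frames[0]), peak_frequency(frames[-1]))  # → 500.0 2000.0
```

Plotting the frames side by side, with time on one axis, frequency on the other, and level shown as darkness, gives the familiar spectrogram picture.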
20
SPEECH SPECTROGRAPH
SOUND SPECTROGRAPH
DIGITAL SOUND SPECTROGRAPH
21
SPEECH SPECTROGRAPH
SOUND LEVEL IS REPRESENTED BY DENSITY
22
RECOGNITION OF VOWELS
THE FIRST TWO OR THREE FORMANTS ARE GENERALLY
SUFFICIENT TO IDENTIFY VOWEL SOUNDS, EVEN WHEN
DISTORTION AND INTERFERENCE ARE PRESENT.
EXAMPLES OF DISTORTION:
  • RECORDING SPEECH AT ONE SPEED AND PLAYING IT
    BACK AT ANOTHER (DUCK TALK)
  • HELIUM SPEECH
SPEECH OF MEN, WOMEN, AND CHILDREN HAS
DIFFERENT PITCH AND FORMANT STRUCTURE.
23
RECOGNITION OF CONSONANTS
CONSONANTS ARE RAPIDLY CHANGING SOUNDS THAT
PRECEDE VOWELS, LIKE GRACE NOTES IN MUSIC
24
FORMANT TRANSITIONS
A FORMANT TRANSITION WHICH CAN BE HEARD AS /t/,
/p/, OR /k/, DEPENDING ON THE VOWEL THAT FOLLOWS
2ND FORMANT TRANSITIONS, ALL HEARD AS /t/
25
SYNTHESIS OF CONSONANTS
SPECTROGRAPHIC PATTERNS FOR SYNTHESIZING /b/,
/d/, /g/ BEFORE DIFFERENT VOWELS. DASHED LINE AT
1800 Hz IS LOCUS FOR /d/.
  • (a) SECOND FORMANT TRANSITIONS THAT START AT THE /d/
    LOCUS
  • (b) TRANSITIONS THAT MERELY POINT AT IT.

26
FILTERED SPEECH AND NOISY ENVIRONMENTS
INTELLIGIBILITY OF FILTERED SPEECH FOR DIFFERENT
CUTOFF FREQUENCIES OF HIGH-PASS AND LOW-PASS
FILTERS (NOTE THAT THEY CROSS OVER AT 1800 Hz,
WHERE THE ARTICULATION INDEX IS THE SAME FOR BOTH
TYPES OF FILTER)
PEAK CLIPPING
27
SPEECH SYNTHESIS
VON KEMPELEN'S TALKING MACHINE (1791)
28
SPEAKER IDENTIFICATION: VOICEPRINTS
SPECTROGRAMS OF THE SPOKEN WORD "SCIENCE" (TWO
SPECTROGRAMS AT THE LEFT ARE BY THE SAME SPEAKER)