Title: Structure of Human Speech
1Structure of Human Speech
2Vocal Tract
3Pitch and Formants
1. Harmonics (giving pitch) produced by vocal
cord vibration
2. Formant frequencies: resonances of the vocal
tract
3. Formant frequencies change as you change the
shape of your vocal tract
4Source Filter
Larynx
Vocal tract
Output sound
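The larynx / vocal tract / output chain above can be sketched numerically. This is a minimal illustration, not a speech synthesiser: the source is a set of harmonics at multiples of the pitch, the filter is a product of simple second-order resonances, and the output is the source scaled by the filter gain. The 1/k spectral tilt, the function names, and the formant values for the two vowels are all assumptions chosen for illustration; none of them come from the slides.

```python
import math


def source_harmonics(f0, n_harmonics):
    """Larynx (source): harmonics at integer multiples of f0.

    The 1/k amplitude tilt is an assumed, simplified spectral slope.
    """
    return [(k * f0, 1.0 / k) for k in range(1, n_harmonics + 1)]


def resonance_gain(f, centre, bandwidth):
    """Magnitude response of one second-order resonance (one formant)."""
    return centre**2 / math.sqrt((centre**2 - f**2) ** 2 + (bandwidth * f) ** 2)


def vocal_tract_gain(f, formants):
    """Vocal tract (filter): product of the individual formant resonances."""
    gain = 1.0
    for centre, bandwidth in formants:
        gain *= resonance_gain(f, centre, bandwidth)
    return gain


def output_spectrum(f0, formants, n_harmonics=30):
    """Output sound: each source harmonic scaled by the filter gain."""
    return [
        (f, amp * vocal_tract_gain(f, formants))
        for f, amp in source_harmonics(f0, n_harmonics)
    ]


# Hypothetical (centre, bandwidth) pairs in Hz, for illustration only.
VOWEL_I = [(300, 80), (2300, 120)]
VOWEL_A = [(700, 80), (1100, 120)]

if __name__ == "__main__":
    for name, formants in (("i", VOWEL_I), ("a", VOWEL_A)):
        spectrum = output_spectrum(100, formants)
        peak_freq, _ = max(spectrum, key=lambda pair: pair[1])
        print(name, "strongest harmonic at", peak_freq, "Hz")
```

Changing f0 moves the harmonics (the pitch) without touching the formants. Changing the formant list reshapes the spectral envelope without moving the harmonics.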
5Prosody vs Segments
Segmental: consonants / vowels -> words
Prosodic: pitch contour, stress. Emphasis, pragmatics.
I thought she was married?
I thought she was married.
I thought she was married!
I thought she was married!
NB: in tone languages pitch is used segmentally.
Are bird/mammal animal systems like human prosody?
They generally use different pitch contours: yes!
6Vowel production
7Vowel sounds
Vocal tract shapes for vowels
8narrow-band spectrogram
sine-wave speech
9Mynah bird speech
Klatt & Stefanski (1974). How does a mynah bird
imitate human speech? J Acoust Soc Amer, 55,
822-832.
10Mynah / Grey parrot
- Mynah produces "formants" but probably through
changing syrinx resonances, not through changing
vocal tract shape. (Klatt & Stefanski, 1974, J
Acoust Soc Amer)
- Grey parrot has a longer vocal tract and may use
changes in its shape to produce formant variation
(more like human speech). (Warren, Patterson &
Pepperberg, 1996, Auk)
11Characteristics of speech
- No gaps between words
- Smoothly changing sound from one speech sound to
the next
- So you can't just shuffle the acoustic words
Only silence is the /g/ of "ago"
12Speech is more like semaphore than like music
- Music: discrete targets giving discrete
acoustic events
- Semaphore: discrete targets with transitions
between targets
- Speech: articulatory transitions between targets
13Semaphore
14Formants in a wide-band spectrogram
(Figure: wide-band spectrogram of "we go", with F1, F2, F3, the burst, and the formant transitions labelled.)
15Where are the segments?
16Different transition - same consonant
(Figure: formant transitions in "dee" and "da", with 1400 Hz marked.)
17Speech is more like speech than like semaphore
Speech does not have invariant acoustic
targets: consonants change with the
vowel. Compare /s/ in /si/ with /s/ in
/su/. This is due to co-articulation.
18Same noise - different consonant
(Figure: "pea" and "ka", with the burst, F1, and F2 labelled and 1400 Hz marked.)
19Co-articulation
Arises because (mainly) consonant gestures don't
involve all the articulators, e.g. /b/ is lips
only, leaving the tongue free to take up position
for the next vowel. /d/ and /s/ just involve the
tongue tip touching the alveolar ridge, leaving the
tongue body and lips free to take up position for
the next vowel, viz. /si/ vs /su/.
20Two articulatory systems
Öhman suggested that articulation can be
decomposed into two semi-independent systems:
- Slow movement from one vowel target to the
next, e.g. /i/ -> /u/
- Rapid consonantal movement superimposed,
e.g. /b/, /d/
So the /b/ in /ibu/ is not the same as in /ibi/.
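The two-system idea can be sketched as a toy calculation: each articulator follows a slow glide between vowel targets, and a brief consonant gesture is superimposed only on the articulators that the consonant actually uses. This is a simplified illustration in the spirit of Öhman's decomposition, not his published model. The articulator names, the 0-to-1 position scales, the timing, and all target values below are hypothetical.

```python
def smoothstep(x):
    """Smooth 0 -> 1 ramp, clamped outside [0, 1]."""
    x = min(1.0, max(0.0, x))
    return x * x * (3 - 2 * x)


def consonant_gate(t, centre=0.5, half_width=0.15):
    """Rapid gesture: rises to 1 at the closure, back to 0 on either side."""
    return smoothstep(1.0 - abs(t - centre) / half_width)


# Hypothetical articulator positions on arbitrary 0..1 scales.
VOWELS = {
    "i": {"lips": 1.0, "tongue_body": 0.0},
    "u": {"lips": 0.6, "tongue_body": 1.0},
}
# Each consonant: target position and weight (0 = articulator left free).
CONSONANTS = {
    "b": {"lips": (0.0, 1.0), "tongue_body": (0.0, 0.0)},  # lips only
}


def vcv_state(t, v1, v2, consonant):
    """Articulator positions at time t (0..1) in a vowel-consonant-vowel."""
    state = {}
    for name in ("lips", "tongue_body"):
        start, end = VOWELS[v1][name], VOWELS[v2][name]
        vowel = start + (end - start) * smoothstep(t)  # slow vowel glide
        target, weight = CONSONANTS[consonant][name]
        # Superimpose the consonant only where its weight is non-zero.
        state[name] = vowel + consonant_gate(t) * weight * (target - vowel)
    return state


if __name__ == "__main__":
    for v2 in ("i", "u"):
        print("/ib" + v2 + "/ at closure:", vcv_state(0.5, "i", v2, "b"))
```

At the moment of closure the lips are shut in both /ibi/ and /ibu/, but the tongue body is already part-way to /u/ in /ibu/, so the vocal tract shape behind the lips differs between the two.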
21Co-articulation
Advantages:
1. Information about different
segments is spread across time (Hockett's
squashed eggs). You know that a /u/ is coming
because of the type of /s/ you have heard.
22Co-articulation
2. Liberman thought that this spreading across
time makes it easier to transmit information at a
fast rate.
23Co-articulation - 2
The disadvantage of co-articulation for
perception is that there are no constant acoustic
targets in speech. The same phoneme can be
represented as different sounds in different
contexts (/s/ before /u/ or /i/). Conversely, the
same sound can be heard as different consonants
in different contexts (e.g. as /p/ before /i/ and
/u/ but as /k/ before /a/).
24Speech Code
Factors that make it hard (for machines) to
recognise speech
- Co-articulation
- Rapid speech /djewonega?at/
- Different vocal-tract sizes
- Different dialects
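To give a rough sense of why vocal-tract size matters in the list above, the sketch below uses the textbook uniform-tube approximation: a tube closed at one end (glottis) and open at the other (lips) resonates at odd multiples of a quarter wavelength. This model is not taken from the slides. It only approximates a neutral, schwa-like vowel, and the speed of sound and the tract lengths are assumed round values.

```python
SPEED_OF_SOUND = 350.0  # m/s, assumed approximate value for warm, moist air


def tube_formants(length_m, n_formants=3):
    """Resonances of a uniform tube closed at one end, open at the other.

    F_n = (2n - 1) * c / (4 * L), the quarter-wavelength resonator formula.
    """
    return [
        (2 * n - 1) * SPEED_OF_SOUND / (4.0 * length_m)
        for n in range(1, n_formants + 1)
    ]


if __name__ == "__main__":
    # Assumed illustrative lengths: a longer and a shorter vocal tract.
    for label, length in (("longer tract, 0.175 m", 0.175),
                          ("shorter tract, 0.140 m", 0.140)):
        formants = tube_formants(length)
        print(label, [round(f) for f in formants])
```

Every formant scales inversely with tract length, so the same vowel spoken by talkers with different vocal-tract sizes arrives at different absolute frequencies, and a recogniser cannot rely on fixed formant values.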