Title: Structure of Human Speech
1 Structure of Human Speech
2 Vocal Tract
3 Pitch and Formants
1. Harmonics (giving pitch): produced by vocal cord vibration
2. Formant frequencies: resonances of the vocal tract
3. Formant frequencies change as you change the shape of your vocal tract
4 Source Filter
Larynx (source) -> Vocal tract (filter) -> Output sound
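The source-filter chain can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method used for these slides: the larynx is approximated by an impulse train at the pitch period, and each formant by a second-order digital resonator of the kind used in cascade formant synthesisers. The 100 Hz pitch, the formant frequencies and bandwidths, the 16 kHz sample rate, and all function names are assumptions chosen for illustration.

```python
import math


def glottal_pulses(f0_hz, duration_s, sample_rate):
    """Source: one unit impulse per pitch period (a crude stand-in for the larynx)."""
    period = round(sample_rate / f0_hz)
    n_samples = round(duration_s * sample_rate)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Filter: one formant as a two-pole resonator with unit gain at 0 Hz."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(f0_hz, formants, duration_s=0.2, sample_rate=16000):
    """Pass the source through one resonator per (frequency, bandwidth) pair."""
    sound = glottal_pulses(f0_hz, duration_s, sample_rate)
    for freq_hz, bandwidth_hz in formants:
        sound = resonator(sound, freq_hz, bandwidth_hz, sample_rate)
    return sound


def magnitude_at(signal, freq_hz, sample_rate=16000):
    """Strength of the signal at a single frequency (one bin of a Fourier transform)."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    re = sum(s * math.cos(w * n) for n, s in enumerate(signal))
    im = sum(s * math.sin(w * n) for n, s in enumerate(signal))
    return math.hypot(re, im)
```

Calling `synthesize_vowel(100.0, [(500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0)])` and probing the result with `magnitude_at` shows the harmonic that sits on the first formant (500 Hz) coming out much stronger than the harmonics at 300 Hz or 1000 Hz, even though the source gives every harmonic the same strength. Changing the pitch moves the harmonics; changing the formant list moves the peaks.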
5 Sex change
Me (m)
Shorter vocal tract (higher formants)
Higher pitch
Both (-> f)
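Why a shorter vocal tract gives higher formants can be seen from the textbook idealisation of the vocal tract as a uniform tube, closed at the larynx and open at the lips, which resonates at odd multiples of c / 4L. The tube model, the speed of sound (350 m/s), and the two lengths used below are illustrative assumptions, not values taken from these slides.

```python
def tube_formants(length_cm, count=3, speed_of_sound_cm_s=35000.0):
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * speed_of_sound_cm_s / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]


# Hypothetical tract lengths: a longer and a shorter tube.
longer = tube_formants(17.5)
shorter = tube_formants(15.0)
```

Every formant of the shorter tube is higher than the matching formant of the longer tube, and by the same ratio (17.5 / 15.0). Pitch is set separately by the larynx, which is why the slide treats "shorter vocal tract" and "higher pitch" as two independent changes.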
6 Prosody vs Segments
Segmental: consonants / vowels -> words
Prosodic: pitch contour, stress. Emphasis, pragmatics.
I thought she was married?
I thought she was married.
I thought she was married!
I thought she was married!
NB in tone languages pitch is used segmentally.
Are bird/mammal animal systems like human prosody? Yes! They generally use different pitch contours.
7 Vowel production
8 Narrow-band spectrogram
Sine-wave speech
9 Sine-wave speech
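Sine-wave speech replaces each formant with a single sinusoid that follows the formant's centre frequency over time, discarding the harmonics. A minimal sketch, assuming the formant tracks have already been measured: the track values, the 10 ms frame length, the sample rate, and the function name are all made up for illustration.

```python
import math


def sine_wave_speech(formant_tracks, frame_s=0.01, sample_rate=16000):
    """Sum one sinusoid per formant; each frame gives that tone's frequency in Hz.

    formant_tracks: list of frames, each a tuple such as (F1, F2, F3).
    Phase is accumulated sample by sample so the tones glide without clicks.
    """
    samples_per_frame = round(frame_s * sample_rate)
    n_tones = len(formant_tracks[0])
    phases = [0.0] * n_tones
    out = []
    for frame in formant_tracks:
        for _ in range(samples_per_frame):
            out.append(sum(math.sin(p) for p in phases) / n_tones)
            for k, freq_hz in enumerate(frame):
                phases[k] += 2.0 * math.pi * freq_hz / sample_rate
    return out


# Hypothetical three-frame formant track (F1, F2, F3 in Hz).
tracks = [(300.0, 2200.0, 3000.0), (400.0, 2000.0, 2900.0), (500.0, 1800.0, 2800.0)]
samples = sine_wave_speech(tracks)
```

All tones here have equal amplitude; a fuller version would also scale each tone by the measured formant amplitude in each frame.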
10 Orchestra in your throat
11 Tuvan throat music
12 Tuvan throat music
13 Mynah bird speech
Klatt & Stefanski (1974) How does a mynah bird imitate human speech? J Acoust Soc Amer, 55, 822-832.
14 Mynah / Grey parrot
- Mynah produces "formants" but probably through changing syrinx resonances, not through changing vocal tract shape. (Klatt & Stefanski, 1974, J Acoust Soc Amer)
- Grey parrot has a longer vocal tract and may use changes in its shape to produce formant variation (more like human speech). (Warren, Patterson & Pepperberg, 1996, Auk)
15 Characteristics of speech
Narrow-band spectrogram
The only silence is the /g/ of "ago"
- No gaps between words
- Smoothly changing sound from one speech sound to the next
- So you can't just shuffle the acoustic words
16 from Clive Frankish
17 Speech is more like semaphore than like music
- Music: discrete targets giving discrete acoustic events
- Semaphore: discrete targets with transitions between targets
- Speech: articulatory transitions between targets
18 Semaphore
19 Formants in a wide-band spectrogram
[Figure: wide-band spectrogram of "w e g o", labelled with F1, F2, F3, the formant transitions, and the burst]
20 Where are the segments?
21 Speech is more like speech than like semaphore
Speech does not have invariant acoustic targets: consonants change with the vowel.
Compare /s/ in /si/ with /s/ in /su/.
This is due to co-articulation.
22 Different transition - same consonant
[Figure: formant transitions for "dee" and "da", with 1400 Hz marked]
Liberman et al. (1967) Perception of the speech code. Psych Rev 74, 431-461
23 Co-articulation
Arises (mainly) because consonant gestures don't involve all the articulators, e.g. /b/ is lips only, so the tongue is free to take up position for the next vowel. /d/ and /s/ just involve the tongue tip touching the alveolar ridge, so the tongue body and lips are free to take up position for the next vowel, viz. /si/ vs /su/.
24 Same noise - different consonant
[Figure: F1 and F2 for "pea" and "ka", each preceded by the same burst at 1400 Hz]
Liberman et al. (1967) Perception of the speech code. Psych Rev 74, 431-461
25 Two articulatory systems
Öhman suggested that articulation can be decomposed into two semi-independent systems:
- Slow movement from one vowel target to the next, e.g. /i/ -> /u/
- Rapid consonantal movement superimposed, e.g. /b/, /d/
So the /b/ in /ibu/ is not the same as in /ibi/
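The two-system idea can be illustrated with a toy calculation. This is a sketch loosely in the spirit of the decomposition described here, not a reproduction of Öhman's published model: one articulator coordinate (say, tongue-body frontness, with /i/ = 1.0 and /u/ = 0.0) follows a slow vowel-to-vowel glide, and a rapid consonant gesture pulls it towards a consonant target only as strongly as the consonant constrains that articulator. All names, units, and numbers are assumptions.

```python
import math


def vowel_glide(t, start, end):
    """Slow movement from one vowel target to the next; t runs from 0 to 1."""
    return start + (end - start) * 0.5 * (1.0 - math.cos(math.pi * t))


def consonant_gesture(t, centre=0.5, width=0.2):
    """Rapid raised-cosine gesture: 0 outside the closure, 1 at its centre."""
    distance = abs(t - centre)
    if distance >= width / 2.0:
        return 0.0
    return 0.5 * (1.0 + math.cos(2.0 * math.pi * distance / width))


def articulator(t, v_start, v_end, c_target, constraint):
    """Vowel glide with a consonant superimposed.

    constraint = 0.0: the consonant leaves this articulator free (tongue during /b/).
    constraint = 1.0: the consonant fully dictates its position at closure.
    """
    v = vowel_glide(t, v_start, v_end)
    k = consonant_gesture(t)
    return v + k * constraint * (c_target - v)


# Tongue-body position at the middle of the /b/ closure (lips only, tongue free).
tongue_in_ibi = articulator(0.5, 1.0, 1.0, c_target=0.5, constraint=0.0)
tongue_in_ibu = articulator(0.5, 1.0, 0.0, c_target=0.5, constraint=0.0)
```

With the tongue unconstrained, its position during the /b/ differs between the two contexts: in /ibu/ it is already halfway to /u/, while in /ibi/ it stays at /i/. Setting `constraint=1.0` removes the difference, which is the case for an articulator the consonant fully controls.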
26 Co-articulation
Advantages
1. Information about different segments is spread across time (Hockett's squashed eggs). You know that a /u/ is coming because of the type of /s/ you have heard.
27 Co-articulation
2. Liberman thought that this spreading across time makes it easier to transmit information at a fast rate. Liberman et al. (1967) Psych Rev 74, 431-461
28 Co-articulation - 2
The disadvantage of co-articulation for perception is that there are no constant acoustic targets in speech. The same phoneme can be represented as different sounds in different contexts (/s/ before /u/ or /i/). Conversely, the same sound can be heard as different consonants in different contexts (e.g. as /p/ before /i/ and /u/ but as /k/ before /a/).
29 Speech Code
Factors that make it hard (for machines) to recognise speech
- Articulatory movement
- Co-articulation
- Rapid speech /djewonega?at/
- Different vocal-tract sizes
  - men 15% longer than women
- Different dialects