Title: Speech Science XII
1. Speech Science XII
Version 2007-8
- Speech Perception
- (acoustic cues)
2. Topics
- Psychoacoustics
- Psychophonetics: acoustic cues
- Reading:
- - BHR, chap. 6, pp. 184-203 (5th ed.); chaps. 9/10, pp. 201 ff.
- - P.-M., 3.2.2, first part, pp. 158-171 (2nd ed.); pp. 149-162 (1st ed.)
3. Psychoacoustics 1
- Psychoacoustics investigates the relationship
between basic (acoustic) signal properties and
basic auditory impressions:
- - How loud something sounds.
- - How high- or low-pitched something sounds.
- - How long something sounds.
- - What the timbre (quality) of a sound is.
- The questions asked are:
- - Can the signal be heard? (signal strength)
- - Can differences between signals be heard?
(for all signal properties)
4. Psychoacoustics 2
- Important: psychoacoustics relates the objective,
measurable signal to subjective impressions.
- These are two different worlds.
- The simplest model of psychoacoustic
perception would be a linear relationship:
- - A change in a signal parameter always produces an
equivalent change in the auditory impression.
- This is not the case
(which makes psychoacoustics very complex).
- Some of the non-linearity has direct
implications for phonetic understanding.
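One common way to make the non-linearity concrete is the sone scale of loudness. It is not mentioned on the slides and is used here only as an illustration: above about 40 phon, each 10-phon increase in loudness level roughly doubles the perceived loudness. A minimal sketch, with the hypothetical function name phon_to_sone:

```python
def phon_to_sone(phon: float) -> float:
    """Approximate loudness in sones for a loudness level in phons.

    Textbook rule of thumb (an assumption here, valid roughly above
    40 phon): 40 phon = 1 sone, and every +10 phon doubles the loudness.
    """
    return 2.0 ** ((phon - 40.0) / 10.0)


if __name__ == "__main__":
    # Equal steps in level give ever larger steps in loudness.
    for level in (40, 50, 60, 70, 80):
        print(f"{level} phon -> {phon_to_sone(level):.0f} sone")
```

Equal steps on the physical scale therefore do not correspond to equal steps in the auditory impression.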
5. A non-linear relationship: loudness
6. The reason for non-linear loudness
Resonance characteristics of the outer ear
7. Non-linearity above threshold
8. Also, sounds mask one another
If noise is present, a tone has to be stronger to
be heard (it has a higher audibility threshold).
Figure axis: intensity of the (masked) pure-tone stimulus (dB)
The closer the tone is in frequency to the centre
frequency of the noise, the stronger it has to
be to be heard!
9. Critical Bands (Barks, ERBs)
Wide-band noise with a gap still masks a tone in
the middle of the gap
until the gap reaches a critical width.
Then the signal is heard at the same threshold
as if there were no noise.
The noise no longer interferes with the part of
the hearing mechanism dealing with the tone.
These critical bands are narrow at low frequencies and
broader at higher frequencies.
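The widening of the bands with frequency can be sketched with two published approximations that are not given on the slides and are assumed here for illustration: the Zwicker and Terhardt (1980) formula for critical bandwidth and the Glasberg and Moore (1990) formula for the equivalent rectangular bandwidth (ERB). The function names are hypothetical.

```python
def critical_bandwidth_hz(freq_hz: float) -> float:
    """Critical bandwidth in Hz (Zwicker & Terhardt 1980 approximation)."""
    f_khz = freq_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69


def erb_hz(freq_hz: float) -> float:
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore 1990)."""
    return 24.7 * (4.37 * freq_hz / 1000.0 + 1.0)


if __name__ == "__main__":
    # Both measures grow with centre frequency.
    for f in (100, 500, 1000, 2000, 5000):
        print(f"{f:>5} Hz: critical band ~{critical_bandwidth_hz(f):.0f} Hz, "
              f"ERB ~{erb_hz(f):.0f} Hz")
```

Both formulas give bands of roughly 100 Hz or less at low frequencies and several hundred Hz at 5 kHz, in line with the statement above.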
10. Non-linearity of loudness with duration
- Above approx. 300 ms (exact duration not certain)
the perceived loudness of a sound is determined
by signal strength (and frequency), independent of
its duration.
- Below this duration, a shorter sound is heard as
less loud than a longer sound of equal intensity.
- I.e., it is as if the energy is integrated over
time, so that a shorter sound has less energy
than a longer one.
- Phonetic importance? Short (unstressed)
syllables are perceptually less prominent than
longer (stressed) syllables.
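The "energy integrated over time" idea can be sketched as an idealised integrator: below the integration window the effective level drops by 10*log10(duration/window) dB, and above it duration has no further effect. The 300 ms window is taken from the slide; the perfect-integration assumption and the function name effective_level_db are illustrative simplifications, not a model of real hearing.

```python
import math

INTEGRATION_WINDOW_S = 0.3  # "approx. 300 ms" from the slide; exact value uncertain


def effective_level_db(level_db: float, duration_s: float,
                       window_s: float = INTEGRATION_WINDOW_S) -> float:
    """Level (dB) after ideal energy integration over at most window_s seconds."""
    if duration_s <= 0 or window_s <= 0:
        raise ValueError("durations must be positive")
    # Only the part of the sound inside the window contributes energy.
    integrated = min(duration_s, window_s)
    return level_db + 10.0 * math.log10(integrated / window_s)


if __name__ == "__main__":
    for dur in (0.03, 0.075, 0.15, 0.3, 0.6):
        print(f"{dur * 1000:>4.0f} ms at 60 dB -> "
              f"{effective_level_db(60.0, dur):.1f} dB effective")
```

Under this assumption, halving the duration of a short sound costs about 3 dB, while sounds longer than the window are unaffected.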
11. Psychophonetics
- Used here as a term to parallel
psychoacoustics. In our definition,
psychophonetics is the study of the relationship
between the acoustic speech signal and functional
aspects of speech, e.g. speech sounds,
(stressed/unstressed) syllables, tonal accents,
junctural phenomena etc.
- The experimental procedure typically requires
changing the analytic properties of the acoustic
speech signal in a controlled manner and
recording the perceptual effect.
- The properties changed are those of acoustic
analysis: duration, intensity, fundamental
frequency and spectral structure.
12. Acoustic Cues
- This term was coined in the 1950s, when
synthesis and manipulation of the acoustic speech
signal was starting. (Origin: Haskins
Laboratories, then in New York, USA)
- The cues are those acoustic properties that
can be shown to affect the perception of a speech
sound (so we have acoustic cues for vowels and
consonants, and within these categories
for e.g. voicing, manner and place of articulation
in consonants, and degree of opening, place,
rounding etc. in vowels).
13. Acoustic cues - vowels 1
- Cues: formants 1 and 2 (to a first
approximation)
... and the evidence from formant synthesis
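As a rough illustration of F1 and F2 as vowel cues, the sketch below labels a measured (F1, F2) pair with the nearest entry in a small reference table. The table values are approximate adult-male averages of the kind reported in classic formant studies; they, the example words and the function name nearest_vowel are assumptions for illustration and do not come from the slides.

```python
import math

# Approximate adult-male average (F1, F2) values in Hz, keyed by an English
# example word. Illustrative assumptions only; not taken from the slides.
REFERENCE_FORMANTS = {
    "heed": (270, 2290),
    "head": (530, 1840),
    "had": (660, 1720),
    "hod": (730, 1090),
    "hawed": (570, 840),
    "who'd": (300, 870),
}


def nearest_vowel(f1_hz: float, f2_hz: float) -> str:
    """Return the reference word whose (F1, F2) point is closest in Hz."""
    return min(
        REFERENCE_FORMANTS,
        key=lambda word: math.hypot(f1_hz - REFERENCE_FORMANTS[word][0],
                                    f2_hz - REFERENCE_FORMANTS[word][1]),
    )


if __name__ == "__main__":
    print(nearest_vowel(300, 2200))
    print(nearest_vowel(700, 1100))
```

A plain distance in Hz weights F2 more heavily than F1; a perceptually motivated scale would be a natural refinement, which is one reason the two-formant account is only a first approximation.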
14. Acoustic cues - vowels 2
- While monophthongs have a steady-state formant
structure, diphthongs, e.g. aɪ, aʊ, ɔɪ, and
(vowel glide) approximants, e.g. j, w, ɥ,
have changing formants as a cue to their
identity.
aɪ, aʊ, ɔɪ have a more or less fixed formant
pattern, determined by the identity of the two vocalic
elements which define them.
j, w, ɥ have a defined starting point, but
the degree of formant change is determined by the
following vowel. The starting point has a
(slightly more damped) formant structure similar
to the related vowel: j → i, w → u,
ɥ → y (see acoustics slides)
15. Acoustic cues - plosives
- Plosives have a temporally complex set of
acoustic cues resulting from (i) the closing
movement, (ii) the closure phase and (iii) the
release of the closure.
The closure is a period with no energy
(voiceless stops) or a weak low-frequency
periodic signal (voicing in the closure). This
introduces a perceptible interruption.
The release burst is the result of turbulence
caused by air escaping under the increased
intra-oral pressure built up during the closure.
It may be relatively weak (in voiced stops) or
strong (in voiceless stops). The different
spectral properties of the burst noise signal the
different places of articulation.
16. Release bursts and vowel quality
17. Vowel formant transitions as consonant cues
- Formant transitions (changing formant values in
the vowel preceding and following the stop
consonant) reflect the articulator movement
towards and away from the closure. The F2
transition is a cue to the consonantal place of
articulation; F1 just signals the opening and
closing movement.
The place of the stop determines the F2
value from which or towards which the transition
moves (called the locus). But the actual shape of
the transition is determined by the vowel (as it
is with vowel glides).
18. Locus frequencies, e.g. d
19. What sort of transitions for which place?
- The previous slide showed that the locus for
d (and logically for t, n, l, s, z) is
fairly constant. The value (for the average adult
male vocal tract) is about 1800 Hz.
For labial consonants, the vowel can be formed
independently of the consonant closure (the tongue
is free to move). Both F2 and F1 therefore just
reflect the opening and closing of the jaw and
lips. The locus is therefore always low.
For velar consonants, the consonant closure is
very dependent on the vowel (both use the tongue
dorsum). The locus is higher than for alveolars
for both front and back vowels, but for back
vowels it is lower than for front vowels. F2 and
F3 transitions often converge with velars.
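The locus idea can be sketched for the alveolar case using the 1800 Hz value quoted above: the F2 transition points from the locus towards the vowel's own F2, so it rises into vowels with a high F2 and falls into vowels with a low F2. The linear path, the vowel F2 targets of 2300 Hz and 900 Hz and the function names are illustrative assumptions; real transitions are not linear and need not start exactly at the locus.

```python
ALVEOLAR_LOCUS_HZ = 1800.0  # value quoted on the slide (average adult male)


def f2_transition(locus_hz: float, vowel_f2_hz: float, steps: int = 5) -> list:
    """F2 values (Hz) moving linearly from the locus to the vowel target."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    span = vowel_f2_hz - locus_hz
    return [locus_hz + span * i / steps for i in range(steps + 1)]


def transition_direction(locus_hz: float, vowel_f2_hz: float) -> str:
    """Direction of the CV transition: 'rising', 'falling' or 'level'."""
    if vowel_f2_hz > locus_hz:
        return "rising"
    if vowel_f2_hz < locus_hz:
        return "falling"
    return "level"


if __name__ == "__main__":
    # Hypothetical vowel targets: a front vowel (high F2) and a back vowel (low F2).
    for label, target in (("front vowel", 2300.0), ("back vowel", 900.0)):
        path = f2_transition(ALVEOLAR_LOCUS_HZ, target)
        print(label, transition_direction(ALVEOLAR_LOCUS_HZ, target), path)
```

The same starting value thus yields differently shaped transitions depending on the vowel, which is the point made about the locus and the vowel-determined shape.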
21. The importance of timing as a cue to the
voicing distinction
The temporal differences shown here signal
the difference between weak and strong
plosives, whether or not closure voicing is
present. It is often claimed that
the distinction fortis-lenis is a better description than
voiced-voiceless.
22. Acoustic cues - fricatives
- Fricative identity is determined by the
spectral distribution of the energy (see also
acoustics slides).
Figure panels, one per fricative: ð, θ, f, v, ʃ, ʒ, z, s
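One simple summary of "spectral distribution of the energy" is the spectral centre of gravity, the magnitude-weighted mean frequency. The sketch below computes it for two toy spectra; the spectra, their labels and the function name spectral_centroid_hz are invented for illustration and are not measurements from the slides.

```python
def spectral_centroid_hz(freqs_hz, magnitudes) -> float:
    """Magnitude-weighted mean frequency (Hz) of a sampled spectrum."""
    if len(freqs_hz) != len(magnitudes):
        raise ValueError("freqs_hz and magnitudes must have the same length")
    total = sum(magnitudes)
    if total <= 0:
        raise ValueError("spectrum must contain some energy")
    return sum(f * m for f, m in zip(freqs_hz, magnitudes)) / total


if __name__ == "__main__":
    freqs = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000]
    # Toy spectra (assumed shapes): energy concentrated high vs. mid-frequency.
    high_energy = [0, 0, 0, 1, 2, 4, 4, 3]   # s-like
    mid_energy = [0, 1, 4, 4, 2, 1, 0, 0]    # sh-like
    print(f"s-like centroid:  {spectral_centroid_hz(freqs, high_energy):.0f} Hz")
    print(f"sh-like centroid: {spectral_centroid_hz(freqs, mid_energy):.0f} Hz")
```

A single number like this separates sibilants with different energy concentrations reasonably well, but it discards the spectral shape, so it is at best one of several measures.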
23. Summary of cues - Manner
24. Summary of cues - Place
25. Summary of cues - Fortis-lenis
Figure label: voice bar