Title: Auditory Perception
1 Auditory Perception, Hillenbrand, SPPA 206
2 Auditory Perception
Auditory perception is one branch of a larger science called psychophysics. Psychophysics studies the relationships between perceptual dimensions (also called psychological, subjective, or mental dimensions) and the physical properties of stimuli. The distinction between perceptual dimensions and physical dimensions is all-important.
Physical dimensions: Any aspect of a physical stimulus that could be measured in a straightforward way with an instrument (e.g., a light meter, a sound level meter, a spectrum analyzer, a fundamental frequency meter, etc.).
Perceptual dimensions: These are the mental experiences that occur inside the mind of the observer. These experiences are actively created by the sensory system and brain based on an analysis of the physical properties of the stimulus. Perceptual dimensions can be measured, but not with a meter; measuring perceptual dimensions requires an observer (e.g., a listener). In vision, for example, the percept of hue is created by the eye and brain based (in part) on the visual system's analysis of the wavelength composition of the stimulus. But hue ≠ wavelength.
3 Visual Psychophysics
Perceptual Dimensions      Physical Properties of Light
Hue                        Wavelength
Brightness                 Luminance
Shape                      Contour/Contrast

Auditory Psychophysics
Perceptual Dimensions      Physical Properties of Sound
Pitch                      Fundamental Frequency
Loudness                   Intensity
Timbre (sound quality)     Spectrum Envelope / Amplitude Envelope
4 Perceptual Experiences Are Actively Created, Not Passively Received
Subjective contour: The triangles, circles, and squares are seen not so much because they are there in the physical sense, but because they are inferred. Unconscious inference lies at the heart of perception. In some sense, "I'll see it when I believe it" is more true than "I'll believe it when I see it."
5 Reversible figures reveal the active organization of percepts.
9 The corridor illusion: Which cylinder is larger?
The cylinder above and to the right appears larger because the visual system infers that it is farther away. The inference is unconscious, automatic, and obligatory.
10 The McGurk Effect (McGurk & MacDonald, 1976)
McGurk, H., and MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746-748.
(McGurk demo: http://www.faculty.ucr.edu/rosenblu/VSMcGurk.html)
11 Some History on the McGurk Illusion
The most striking demonstration of the combined (bimodal) nature of speech understanding appeared by accident. Harry McGurk, a senior developmental psychologist at the University of Surrey in England, and his research assistant John MacDonald were studying how infants perceive speech during different periods of development. For example, they placed a videotape of a mother talking in one location while the sound of her voice played in another. For some reason, they asked their recording technician to create a videotape with the audio syllable "ba" dubbed onto a visual "ga." When they played the tape, McGurk and MacDonald perceived "da." Confusion reigned until they realized that "da" resulted from a quirk in human perception, not an error on the technician's part. After testing children and adults with the dubbed tape, the psychologists reported this phenomenon in a 1976 paper humorously titled "Hearing Lips and Seeing Voices," a landmark in the field of human sensory integration. This audio-visual illusion has become known as the "McGurk effect" or "McGurk illusion."
Further reading: Massaro, D. W., and Stork, D. G. (1998). Speech recognition and sensory integration. American Scientist, 86, 236-244.
The McGurk effect has played an important role in research on audio-visual speech integration and speechreading. McGurk links on the web include the following:
- http://www.psych.ucr.edu/faculty/rosenblum/AVspeech.html
- http://www.theshop.net/campbell/mcgurk1.htm
- http://www.amsci.org/amsci/articles/98articles/massaro.html
- http://www.sys.uea.ac.uk/iam/newav/newav.html
- http://www.media.uio.no/personer/arntm/McGurk_english.html
- http://macserver.haskins.yale.edu/Haskins/HEADS/BIBLIOGRAPHY/bibliomcgurk.html
12 The Three Main Perceptual Attributes of Sound
- Pitch (not fundamental frequency)
- Loudness (not intensity)
- Timbre (not spectrum envelope or amplitude envelope)
The terms pitch, loudness, and timbre refer not to the physical characteristics of sound, but to the mental experiences that occur in the minds of listeners.
13 Pitch and Fundamental Frequency
Rule 1: All else being equal, the higher the F0, the higher the perceived pitch.
(Figure: lower F0, lower pitch; higher F0, higher pitch)
14 Rule 2: The ear is more sensitive to F0 differences in the low frequencies than in the high frequencies. This means that 300 vs. 350 Hz ≠ 3000 vs. 3050 Hz. That is, the difference in perceived pitch (not F0) between 300 and 350 Hz is NOT the same as the difference in pitch between 3000 and 3050 Hz, even though the physical difference in F0 is the same (50 Hz) in both cases. The low-frequency pair sounds much farther apart in pitch.
(Demonstration pairs: 300 vs. 350 Hz; 3000 vs. 3050 Hz)
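To put rough numbers on Rule 2, the sketch below expresses each 50 Hz difference as a musical interval in semitones, a log-frequency scale. Treating semitones as a stand-in for perceived pitch distance is an assumption made for illustration; the mel scale is another, different way of doing this.

```python
import math

def semitones(f1_hz: float, f2_hz: float) -> float:
    """Interval from f1 to f2 in equal-tempered semitones (12 per octave).
    Used here only as a rough proxy for perceived pitch distance."""
    return 12.0 * math.log2(f2_hz / f1_hz)

# The same 50 Hz physical difference at two different starting frequencies
low = semitones(300.0, 350.0)
high = semitones(3000.0, 3050.0)

print(f"300 -> 350 Hz:   {low:.2f} semitones")
print(f"3000 -> 3050 Hz: {high:.2f} semitones")
```

On this scale the 300/350 Hz pair spans roughly nine times as large an interval as the 3000/3050 Hz pair, because what matters is the frequency ratio, not the difference in Hz.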
15 OK, so our sensation of pitch comes mainly from F0. How does the ear measure F0? What are the possibilities?
Theory 1: F0 is the lowest-frequency harmonic in a harmonic spectrum. Maybe the auditory system measures the lowest-frequency harmonic.
Theory 2: F0 is also the harmonic spacing (i.e., if F0 = 200 Hz, the harmonic spacing is also 200 Hz). Maybe the auditory system measures the harmonic spacing.
Theory 3: The two possibilities above measure F0 in the frequency domain, but F0 can also be measured directly from a time-domain signal: F0 is the number of cycles of the complex pattern per second. Maybe the auditory system measures F0 directly from the time-domain signal.
16 The Problem of the Missing Fundamental
(Figure panels: Normal F0; F0 Removed)
17 Conclusion: F0 does not need to be physically present in the signal for a listener to hear a pitch corresponding to where F0 ought to be. So, we can rule out Theory 1. Even with F0 removed, the signal remains periodic at the original F0, and the harmonic spacing remains unchanged. So, Theory 2 and Theory 3 are still alive.
18 The harmonic spacing is the same for the top and bottom signals, but the two signals are not periodic at quite the same rate.
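19 Another View of the Pitch Shift Experiment
Components: 1200, 1400, 1600 Hz vs. 1240, 1440, 1640 Hz
The harmonic spacing is the same (200 Hz), but the period (t0) is not the same: 1200, 1400, and 1600 Hz are harmonics 6-8 of 200 Hz, while 1240, 1440, and 1640 Hz are not harmonics of 200 Hz.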
20 So:
- We can rule out Theory 1 (because of the missing fundamental experiment).
- And we can rule out Theory 2 (because of the pitch shift experiment just described).
- That leaves Theory 3: the auditory system measures the rate at which the signal repeats itself.
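The three theories can be compared on the two kinds of stimuli from the missing-fundamental and pitch-shift experiments. The sketch below is a minimal illustration, not a model of the auditory system. It assumes equal-amplitude, cosine-phase components, a 16 kHz sampling rate, harmonics 2-5 of 200 Hz for the missing-fundamental case (the slides do not give those values), and the 1240, 1440, 1640 Hz components from slide 19. A plain autocorrelation peak-picker stands in for "measuring the rate at which the signal repeats itself." All function names are hypothetical.

```python
import math

FS = 16000  # sampling rate in Hz (assumed for this sketch)

def complex_tone(freqs_hz, dur_s=0.1, fs=FS):
    """Sum of equal-amplitude cosines at the given frequencies."""
    n_samples = int(dur_s * fs)
    return [sum(math.cos(2 * math.pi * f * n / fs) for f in freqs_hz)
            for n in range(n_samples)]

def lowest_component(freqs_hz):
    """Theory 1: pitch = lowest harmonic present."""
    return min(freqs_hz)

def harmonic_spacing(freqs_hz):
    """Theory 2: pitch = spacing between adjacent components."""
    f = sorted(freqs_hz)
    return f[1] - f[0]

def period_estimate(x, fs=FS, fmin=50.0, fmax=500.0):
    """Theory 3: pitch = 1 / repetition period, taken as the lag that
    maximizes the (unnormalized) autocorrelation of the waveform."""
    min_lag, max_lag = int(fs / fmax), int(fs / fmin)
    best_lag, best_r = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag

stimuli = {
    "missing fundamental": [400, 600, 800, 1000],  # 200 Hz component absent
    "pitch shift": [1240, 1440, 1640],             # 200 Hz spacing, not harmonics of 200 Hz
}
for name, freqs in stimuli.items():
    x = complex_tone(freqs)
    print(f"{name}: Theory 1 -> {lowest_component(freqs)} Hz, "
          f"Theory 2 -> {harmonic_spacing(freqs)} Hz, "
          f"Theory 3 -> {period_estimate(x):.1f} Hz")
```

Under these assumptions, Theory 1 reports 400 Hz for the missing-fundamental tone, Theory 2 reports 200 Hz for both tones, and the autocorrelation estimate lands at 200 Hz for the first tone and a little above 200 Hz for the shifted tone (a period slightly shorter than 5 ms). The unnormalized autocorrelation is used deliberately: it slightly favors shorter lags, so the estimator picks the fundamental period rather than a multiple of it.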
21 What does all this mean?
Rule 3: The sensation of pitch is definitely not based on a measurement of the lowest-frequency harmonic in a harmonic spectrum, and it is not based on harmonic spacing. It seems to be based on a measurement of the fundamental period.
22 Loudness and Intensity
Rule 1: All else being equal, the higher the intensity, the greater the loudness.
(Figure: higher intensity, higher loudness; lower intensity, lower loudness)
23 Rule 2: The relationship between intensity and loudness is seriously nonlinear. Doubling intensity does not double loudness. In order to double loudness, intensity must be increased by a factor of 10, or by 10 dB: $10 \log_{10}(10) = 10 \times 1 = 10$ dB. This is called the 10 dB rule.
(Audio example: two 500 Hz sinusoids differing by 10 dB)
Note that the more intense sound is NOT 10 times louder, even though it is 10 times more intense. The 10 dB rule means that a 70 dB signal is twice as loud as a 60 dB signal, four times as loud as a 50 dB signal, eight times as loud as a 40 dB signal, etc. A 30 dB hearing loss is considered mild, just outside the range of normal hearing. Based on the 10 dB rule, how much is loudness affected by a 30 dB hearing loss? (Answer: loudness is reduced to 1/8 of normal. But note that this does not mean that someone with a 30 dB loss will have 8 times more difficulty with speech understanding than someone with normal hearing.)
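The arithmetic of the 10 dB rule fits in a few lines: a level difference of ΔL dB is an intensity ratio of $10^{\Delta L/10}$, while under the 10 dB rule it is a loudness ratio of roughly $2^{\Delta L/10}$. The rule is itself an approximation (it is the relation behind the sone scale, on which loudness doubles for each 10 phon increase), so treat the loudness numbers as rough. The function names are illustrative.

```python
def intensity_ratio(level_diff_db: float) -> float:
    """Physical intensity ratio for a level difference in dB: 10^(dB/10)."""
    return 10.0 ** (level_diff_db / 10.0)

def loudness_ratio(level_diff_db: float) -> float:
    """Approximate loudness ratio under the 10 dB rule: 2^(dB/10)."""
    return 2.0 ** (level_diff_db / 10.0)

print(intensity_ratio(10))      # 10.0: 10 dB means 10 times the intensity...
print(loudness_ratio(10))       # 2.0: ...but only about twice the loudness
print(loudness_ratio(70 - 50))  # 4.0: 70 dB vs. 50 dB
print(loudness_ratio(70 - 40))  # 8.0: 70 dB vs. 40 dB
print(loudness_ratio(-30))      # 0.125: per the 10 dB rule, a 30 dB loss leaves 1/8 of the loudness
```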
24 Rule 3: Loudness is strongly affected by the frequency of the signal. If intensity is held constant, a mid-frequency signal (in the range from 1000 to 4000 Hz) will be louder than lower- or higher-frequency signals.
(Audio examples: 125 Hz, 3000 Hz, 8000 Hz)
The 3000 Hz signal should sound louder than the 125 Hz or the 8000 Hz signal, despite the fact that their intensities are equal.
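One common engineering approximation of Rule 3 is the A-weighting curve used in sound level meters, which attenuates low and very high frequencies roughly the way the ear's sensitivity falls off at fairly low levels. The sketch below evaluates the standard A-weighting formula at the demonstration frequencies. Using A-weighting as a proxy for loudness is an assumption: equal-loudness contours change shape with level, and A-weighting does not.

```python
import math

def a_weighting_db(f: float) -> float:
    """A-weighting gain in dB at frequency f (Hz), standard analog formula,
    normalized so that the gain at 1000 Hz is about 0 dB."""
    f2 = f * f
    ra = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.00

for f in (125, 1000, 3000, 8000):
    print(f"{f:>5} Hz: {a_weighting_db(f):+.1f} dB")
```

Of the three demonstration tones, 3000 Hz receives the largest weight and 125 Hz by far the smallest, matching the direction of Rule 3.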
25 Timbre
Timbre, also known as sound quality or tone color, is oddly defined in terms of what it is not: when two sounds are heard that match for pitch, loudness, and duration, and a difference can still be heard between the two sounds, that difference is called timbre. For example, suppose a clarinet, a saxophone, and a piano all play middle C at the same loudness and for the same duration. Each of these instruments has a unique sound quality. This difference is called timbre, tone color, or simply sound quality. There are also many examples of timbre differences in speech. For example, two vowels (e.g., /ɑ/ and /i/) spoken at the same loudness and same pitch differ from one another in timbre. There are two physical correlates of timbre:
- spectrum envelope
- amplitude envelope
26 Timbre and Spectrum Envelope
Timbre differences between one musical instrument and another are partly related to differences in spectrum envelope, that is, differences in the relative amplitudes of the individual harmonics. In the examples above, we would expect all of these sounds to have the same pitch because the harmonic spacing (and therefore F0) is the same in all cases. The timbre differences that you would hear are controlled in part by the differences in the shape of the spectrum envelope.
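The spectrum-envelope idea can be sketched by synthesizing two harmonic complexes that share F0 = 200 Hz (and therefore share harmonic frequencies and spacing) but use different, made-up harmonic amplitude patterns, then reading the amplitudes back with a single-bin DFT at each harmonic. The amplitude values, the 16 kHz sampling rate, and the helper names are all hypothetical.

```python
import math

FS = 16000  # sampling rate (Hz), assumed
F0 = 200    # same F0 / harmonic spacing for both tones
N = 1600    # 0.1 s = exactly 20 periods of 200 Hz

def synth(amps, f0=F0, fs=FS, n=N):
    """Harmonic complex: harmonic k+1 has amplitude amps[k]."""
    return [sum(a * math.cos(2 * math.pi * (k + 1) * f0 * i / fs)
                for k, a in enumerate(amps))
            for i in range(n)]

def harmonic_amplitude(x, freq, fs=FS):
    """Amplitude of the component at `freq`, from a single DFT bin."""
    re = sum(v * math.cos(2 * math.pi * freq * i / fs) for i, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq * i / fs) for i, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / len(x)

# Two hypothetical spectrum envelopes over the first 8 harmonics
falling = [1.0, 0.7, 0.5, 0.35, 0.25, 0.18, 0.12, 0.08]  # steady roll-off
peaked = [0.2, 0.3, 0.6, 1.0, 0.6, 0.3, 0.2, 0.1]        # energy near 800 Hz

for name, amps in (("falling", falling), ("peaked", peaked)):
    x = synth(amps)
    measured = [harmonic_amplitude(x, (k + 1) * F0) for k in range(len(amps))]
    print(name, [round(a, 2) for a in measured])
```

Because the analysis window holds exactly 20 periods of 200 Hz, each single-bin DFT recovers its harmonic amplitude essentially exactly. Both tones have components at the same frequencies, so only the relative amplitudes, not the spacing, distinguish them.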
27 Six Synthesized Sounds Differing in Spectrum Envelope
Note the similarities in pitch (due to constant F0/harmonic spacing) and the differences in timbre or sound quality.
28 Vowels Also Differ in Spectrum Envelope
Shown here are the smoothed envelopes only (i.e., the harmonic fine structure is not shown) of 10 American English vowels. Note that each vowel has a spectrum envelope with a unique shape. Perceptually, these sounds differ from one another in timbre. Purely as a matter of convention, phoneticians seldom use the term timbre, although it applies just as well here as it does in musical acoustics. In phonetics, timbre differences among vowels are typically referred to as differences in vowel quality or vowel color.
From Hillenbrand, J. M., Houde, R. A., Clark, M. J., and Nearey, T. M. Vowel recognition from harmonic spectra. Acoustical Society of America, Berlin, March 1999.
29 Aperiodic sounds can also differ in spectrum envelope, and the perceptual differences are properly described as timbre differences.
30 Amplitude Envelope
Timbre is also affected by amplitude envelope. The amplitude envelope is a smooth line drawn to enclose a sound wave. It is also sometimes called the amplitude contour or energy contour of the sound wave. These are both good terms, since the amplitude envelope shows how overall signal amplitude varies over time. Amplitude envelope refers to the characteristics of the way sounds are turned on and turned off. The four signals below are sinusoids that differ in their amplitude envelopes.
(Figure labels: leading edge = attack; trailing edge = decay)
The attack especially has a large effect on timbre.
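The attack/decay idea can be sketched by imposing a linear attack and decay on a 500 Hz sinusoid and then tracing a crude envelope as the peak absolute value in successive 10 ms frames. The linear ramps, the 1 ms vs. 100 ms attack times, the 16 kHz sampling rate, and the frame-peak envelope follower are illustrative assumptions, not the method used to make the slide's signals.

```python
import math

FS = 16000  # sampling rate (Hz), assumed

def shaped_tone(freq, dur_s, attack_s, decay_s, fs=FS):
    """Sinusoid with a linear attack (leading edge) and linear decay (trailing edge)."""
    n_total = int(dur_s * fs)
    n_attack = max(1, int(attack_s * fs))
    n_decay = max(1, int(decay_s * fs))
    out = []
    for n in range(n_total):
        # Gain ramps up over the attack, holds at 1, then ramps down over the decay
        env = min(1.0, n / n_attack, (n_total - 1 - n) / n_decay)
        out.append(env * math.sin(2 * math.pi * freq * n / fs))
    return out

def frame_envelope(x, frame_len=160):
    """Crude amplitude envelope: peak absolute value in each frame (10 ms at 16 kHz)."""
    return [max(abs(v) for v in x[i:i + frame_len])
            for i in range(0, len(x), frame_len)]

abrupt = shaped_tone(500, dur_s=0.5, attack_s=0.001, decay_s=0.05)
gradual = shaped_tone(500, dur_s=0.5, attack_s=0.1, decay_s=0.05)
print("abrupt attack: ", [round(v, 2) for v in frame_envelope(abrupt)[:10]])
print("gradual attack:", [round(v, 2) for v in frame_envelope(gradual)[:10]])
```

Both tones have the same frequency and the same sustained level; only the leading edge differs. The abrupt tone reaches full amplitude within the first frame, while the gradual tone climbs over roughly the first ten frames.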
31 There are many examples in music of timbre differences related to differences in amplitude envelope:
- Plucked vs. bowed stringed instruments
- The damping pedal on a piano
- The difference in sound quality between a hammered string (e.g., a piano) and a string that is plucked by a quill (e.g., a harpsichord)
The timbre differences that distinguish one musical instrument from another appear to be more closely related to differences in amplitude envelope, and especially the attack, than to the shape of the spectrum envelope (although both play a role). For example, when the amplitude contour of an oboe tone is imposed on a violin tone, the resulting tone sounds more like an oboe than a violin.
White, G. D. (1987). The Audio Dictionary. Seattle: University of Washington Press.
32 Same melody, same spectrum envelope (if sustained), different amplitude envelopes (i.e., different attack and decay characteristics). Note the differences in timbre or sound quality as the amplitude envelope varies.
(Audio examples 1-6)
33 Timbre differences related to amplitude envelope also play a role in speech. Note the differences in the shape of the attack for /b/ vs. /w/ and /ʃ/ vs. /tʃ/.
(Figure labels: /w/, more gradual attack; /b/, abrupt attack; /ʃ/, more gradual attack; /tʃ/, abrupt attack)
35 mel, sone, phon