Title: Perceptual Organization
1Perceptual Organization
- Perfecto Herrera
- Music Perception and Cognition
2The problem(s) of perceptual organization
3Some terms
- Source: the physical entity that gives rise to
the sound pressure waves, e.g. a violin being
played
- Stream: the percept of a group of successive
and/or simultaneous sounds as a coherent whole
appearing to come from a single source
- The sounds we hear at any one time usually come
from a number of different sources.
- In most cases we can hear and identify each of
the different sound sources as having its own
pitch, timbre, loudness and location.
4Auditory Scene Analysis
- A computational theory of hearing is required,
plus a functional explanation of the information
processing problems that the auditory system must
solve in order to make sense of the acoustic
environment
- Work in computer vision has benefited from a
computational theory since the late 1970s, due
to David Marr
- A similar foundation for hearing was developed by
Albert Bregman at McGill University in Montreal
and is known as auditory scene analysis (ASA)
5Auditory Scene Analysis
- ASA can be conceptualized as a two-stage process:
- The mixture of sounds is decomposed into a
collection of sensory elements (onsets, pitch
trajectories, modulations, spectral tracks, etc.)
- Elements that are likely to have arisen from the
same event are grouped to form a perceptual
structure (stream) which can be interpreted by
higher centers
- For example, when listening to a violin
performance, it is the task of auditory scene
analysis to group the acoustic events emitted
from the physical source (the violin) into a
perceptual stream (the mental experience of a
violin being played).
Is this the only way of listening? What about
reduced listening? Read Pierre Schaeffer
6Auditory Scene Analysis
- In most listening situations, a mixture of sounds
reaches the ears. However, we can:
- Attend to one conversation amid many competing
voices and other background sounds (e.g. music)
at a cocktail party
- Follow the melodic line played by the violins in
an orchestral recording.
- This problem is of great scientific interest, and
a solution also has engineering applications
- -> The Holy Grail!
- Auditory image of Bach's Mass in B minor,
consisting of voice, violin, cello etc.
- How does the auditory system process this image
to recover a description of each source?
7- Active perception: expectation-based processing
(bottom-up and top-down)
8Auditory Scene Analysis
- The inner ear separates sound into its frequency
components
- At some point in the auditory system these
components need to be assigned to the appropriate
sound source
- This is often called perceptual grouping, or
auditory scene analysis
- Two aspects: simultaneous grouping and sequential
grouping
9Auditory Scene Analysis
- Simultaneous grouping: the grouping together of
the simultaneous frequency components that come
from a single source
- Sequential grouping: the connecting over time of
the changing frequencies that a single source
produces from one moment to the next
10Example: simultaneous grouping and sequential
grouping
11Antecedents: Gestalt Psychology
- Gestalt means "form" or "pattern" in German
- Gestalt Psychology originated in the early 20th
century: Max Wertheimer (1880-1943), Wolfgang
Köhler (1887-1967) and Kurt Koffka (1886-1941)
- The basic principles underlying Gestalt
psychology are:
- The whole is greater than the sum of the parts
- The parts are defined by the whole as much as
vice versa
- Gestalt psychologists are best known for their
work in vision, but their principles are also
applicable to auditory perception.
- They systematically developed a set of principles
of perceptual organisation (believed to be
innate) that they thought determine how we
assemble or associate components in a perceptual
field
- These principles are:
12Gestalt Psychology Principles
- Proximity
- Similarity
- Common Fate (Common Direction)
- Good Continuation
- Disjoint Allocation (Belongingness)
- Closure
Bottom-up: hard-wired, pre-attentive, not
learned (primitive)
Top-down: plastic, learned (schema-driven)
13Proximity
- In vision, when elements in an image are close
together they are perceived as belonging together
and as separate from others that are further away,
even if those others are similar
- In hearing, sounds occurring close together in
time are clustered
14Similarity
- Two or more auditory events are grouped if they
are similar in timbre, pitch or loudness, or close
in apparent location or time
- Fundamentals in the same region but harmonics
not: leads to fission, i.e. different timbres but
same pitch are heard unfused
- Harmonics in the same region but fundamentals
not: leads to fusion, i.e. different pitches but
same timbre are heard fused
- This is not clear-cut: it depends on individual
differences.
- If the difference in loudness is large enough,
they form different streams; either can be
attended to
- Same dB -> single stream at twice the tempo
15Common Fate
- Components of a sound act together
- They tend to start and finish together
- They tend to change in pitch or intensity
together
- Therefore, if we have a complex sound and the
components are co-ordinated, then they are fused;
relevant cues include onset disparities, AM
(tremolo) and FM (vibrato)
- For example, if the frequencies of harmonics 2, 4
and 8 are modulated (FM), they separate from
harmonics 3, 5, 6 and 7
- Or, if the frequency of the 1st harmonic is
modulated (FM) at a different rate, it separates
from harmonics 3, 4 and 5
16Good Continuation
- Natural sound sources tend to change gradually
rather than abruptly in frequency, intensity,
location or timbre
- Abrupt change -> new stream -> new source
- Low and high tones tend to split into streams;
this can be suppressed by putting glides in
between
- In speech, if there are oscillations in
frequency, it gives the impression that there are
two speakers saying the one word
- In music in general, if a note is near in pitch to
the one just before it, then it will be heard as
the next note in the melody rather than as a
separate note, higher or lower
17Disjoint Allocation (Belongingness)
- One component can only come from one source,
i.e. hearing tries to use each component only
once
- Say we have two tones (A and B) at slightly
different pitches, and these can either be heard
in isolation or embedded in another series of
pitches. In isolation, the order AB or BA
is easily judged.
- The addition of tones (Xs) that are close in
pitch to A and B acts as a distracter, making it
difficult to order AB (this is thought to be
because we attend more to the start and end of
sequences).
- But if more Xs are added, they form a stream
that is separate from AB, and again the order of
AB is easily judged.
- This is not hard and fast: ambiguity is possible,
and this shows that this level of organisation is
on the boundary between pre-attentive and
attentive processing
- It also shows how the addition of new elements
changes the perceptual organisation of the
stimulus.
18Closure
- A source may be obscured or absent but its
percept continues
- e.g. FM radio disturbance from the ignition of
passing cars: we hear a click over the sound,
whereas in fact the radio is producing only a
click
- A pitched sound that is broken, but with the gap
filled by noise, seems unbroken
- Similarly, a glide that is broken, but with the
gap filled by noise, seems unbroken
19Auditory Scene Analysis
- Bregman re-examines the Gestalt principles and
proposes the simultaneous and sequential grouping
cues as the basic elements of information that
help to organize our perception: what, when,
where, how
- Bregman, A. S. (1990). Auditory Scene Analysis:
The Perceptual Organization of Sound. Cambridge,
Mass.: The MIT Press
- But see also:
- Wang, D. & Brown, G. (Eds.) (2006).
Computational Auditory Scene Analysis:
Principles, Algorithms, and Applications. New
York: Wiley.
20Simultaneous grouping
- Some cues
- Fundamental Frequency and Spectral Regularity
- Onset Timing
- Correlated changes in Amplitude or Frequency
- Sound Location
- Important: a single cue may not be effective all
the time; these cues work together for
perceptual organisation of the input sound
21Fundamental Frequency
- Consider two musical instruments each playing a
note simultaneously
- It is easier to hear each note and each
instrument if they are playing different notes
(have different fundamental frequencies)
- Simultaneous sounds are more likely to fuse if
they have the same fundamental frequency
- It has been shown that a pair of simultaneously
presented vowels is easier to identify if their
fundamental frequencies differ
22Spectral Regularity
- Harmonicity: the frequency components of a
harmonic sound fuse perceptually and are heard as
a single sound
- If a frequency component does not form part of
the harmonic series, it tends to be heard out
separately, as if part of a different source
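The harmonicity cue can be illustrated with a minimal sketch. It assumes the fundamental is already known and uses a hypothetical 3% mistuning tolerance; the function name, the tolerance value and the example frequencies are illustrative assumptions, not values from the slides.

```python
def split_by_harmonicity(components_hz, f0_hz, tolerance=0.03):
    """Split partials into those close to an integer multiple of f0 and the rest.

    tolerance is a hypothetical relative mistuning limit (assumption, not a
    measured perceptual threshold).
    """
    harmonic, mistuned = [], []
    for f in components_hz:
        n = max(1, round(f / f0_hz))                   # nearest harmonic number
        deviation = abs(f - n * f0_hz) / (n * f0_hz)   # relative mistuning
        (harmonic if deviation <= tolerance else mistuned).append(f)
    return harmonic, mistuned


partials = [200.0, 400.0, 600.0, 860.0, 1000.0]   # made-up example partials
fused, heard_out = split_by_harmonicity(partials, f0_hz=200.0)
print(fused)      # [200.0, 400.0, 600.0, 1000.0]
print(heard_out)  # [860.0]
```

Here the 860 Hz component is far from any multiple of 200 Hz, so the sketch sets it aside in the way a mistuned partial tends to be heard out.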
23Onset disparities
- Perceptual separation of tones is enhanced by
onset asynchrony.
- A frequency component that stops or starts at a
different time from the complex sound is less
likely to be heard as part of it than if it is
simultaneous with it
- This is important for making a soloist stand out
24Onset disparities
- We can hear each of two simultaneously played
notes more easily if there is a small onset
difference between them
- These onset asynchronies are up to 30 ms, so the
percept is still of the notes sounding together
- The auditory system can exploit these onset
differences even though we are not consciously
aware of them
- Is ensemble playing completely synchronised?
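Grouping by onset timing can be sketched as a simple clustering of components by start time. The 5 ms tolerance, the function name and the component labels below are hypothetical choices for illustration only; the slides give no grouping window.

```python
def group_by_onset(components, tolerance_s=0.005):
    """Cluster (label, onset_s) pairs by onset time.

    A component joins the current group if it starts within tolerance_s of
    that group's first onset. tolerance_s is an assumed value.
    """
    groups = []
    for label, onset in sorted(components, key=lambda c: c[1]):
        if groups and onset - groups[-1][0][1] <= tolerance_s:
            groups[-1].append((label, onset))
        else:
            groups.append([(label, onset)])
    return [[label for label, _ in g] for g in groups]


# Three partials starting almost together, plus one entering 25 ms later
mixture = [("h1", 0.000), ("h2", 0.001), ("h3", 0.002), ("solo", 0.025)]
print(group_by_onset(mixture))  # [['h1', 'h2', 'h3'], ['solo']]
```

The late entry stays within the 30 ms range mentioned above, so it would still sound simultaneous, yet the sketch assigns it to its own group.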
25Onset disparities
- Shorter rise times: easier to hear the order of
the tones
- Generally, sounds with abrupt onsets (shorter
rise time) stand out better from a background of
other sounds than do slow-rising sounds
- Shorter rise times aid the perceptual
segregation of sounds (telling them apart)
- Rapid-onset sounds: e.g. notes from plucked or
struck instruments
- Why do many musical systems combine abrupt and
slow attacking sounds?
26Correlated Changes in Amplitude or Frequency
- A sound may be perceptually segregated from an
unchanging background if its components are
modulated in amplitude or frequency
- Hear harmonic complex tone
- Harmonics 1, 3, 5, 6, 7 remain steady
- Harmonics 2, 4, and 8 rise and fall in frequency
four times
- Hear the two sets as separate sounds
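A demonstration of this kind could be synthesized roughly as follows. This is a sketch under stated assumptions: the fundamental (220 Hz), sample rate (8 kHz), duration (1 s) and modulation depth (3%) are arbitrary choices, and only the harmonic numbers and the four modulation cycles come from the slide.

```python
import math


def harmonic_complex(f0=220.0, duration=1.0, sr=8000, n_harmonics=8,
                     modulated=(2, 4, 8), mod_cycles=4, mod_depth=0.03):
    """Return samples of a harmonic complex in which the `modulated` harmonics
    share one frequency modulation and all other harmonics stay steady."""
    n = int(duration * sr)
    phases = [0.0] * (n_harmonics + 1)   # one running phase per harmonic
    signal = []
    for i in range(n):
        t = i / sr
        # Shared modulator: rises and falls mod_cycles times over the tone
        mod = math.sin(2 * math.pi * mod_cycles * t / duration)
        sample = 0.0
        for h in range(1, n_harmonics + 1):
            freq = h * f0
            if h in modulated:
                freq *= 1.0 + mod_depth * mod   # common frequency change
            phases[h] += 2 * math.pi * freq / sr
            sample += math.sin(phases[h])
        signal.append(sample / n_harmonics)     # keep samples within [-1, 1]
    return signal


tone = harmonic_complex()
print(len(tone))  # 8000
```

The samples could then be written to a file with the standard-library `wave` module for listening.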
27Sound Location
- Sounds coming from different locations in space
are generally assumed to be from different
sources
- But this is a weak cue for simultaneous
grouping; it becomes stronger for sequential
grouping
28Sequential organisation
- Events in the world occur over time. We organise
sounds into sequences over time using various
criteria
- Events that are similar in some way (e.g. in
loudness or pitch) or going in the same direction
(e.g. rising or falling) are perceived to have
the same origin.
- Music uses this principle
- Streams are created by differences in pitch,
loudness, timbre, repetition rate etc. and by
combining these in different ways.
- Characteristics of streams:
- Streams are separate: we only attend to one
fully at a time.
- Foreground and background: possibly 3 maximum
- Stream organisation is relative rather than
absolute
- Stream organisation may change as the complexity
of the stimulus changes
- Some aspects of streaming are pre-attentive,
others are attentive; attentive means that
by attending to different aspects of a stimulus
we hear different things
29- The Trill Phenomenon
- Miller and Heise experimented with two
alternating tones to see how close in pitch they
had to be for people to hear a trill.
- They observed two states:
- When the frequency difference is small, the pitch
moves continuously up and down (i.e. a trill)
- When the frequency difference is large, two
separate tones are heard.
- Miller called the breaking point the "trill
threshold"; it is at approximately 3 semitones.
Eventually, as the pitch variation drops below
the JND for pitch, the trill becomes vibrato
(FM, see lecture 5)
- The trill phenomenon is an example of auditory
grouping by proximity or similarity of pitch. In
the first pair of sequences below the x's and z's
are seen as separate "objects", but in the second
we see a single zigzag of x's and z's
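The trill threshold can be expressed as a small calculation: convert the frequency ratio of the two tones to semitones and compare it with the approximate 3-semitone figure. This is a deliberately simplified sketch; the function names are illustrative, and a fixed threshold ignores any dependence on presentation rate or on the listener.

```python
import math

TRILL_THRESHOLD_SEMITONES = 3.0   # approximate value quoted on the slide


def semitone_distance(f1_hz, f2_hz):
    """Interval between two frequencies in equal-tempered semitones."""
    return abs(12.0 * math.log2(f2_hz / f1_hz))


def predicted_percept(f1_hz, f2_hz, threshold=TRILL_THRESHOLD_SEMITONES):
    """Crude prediction: below the threshold a trill, otherwise two tones."""
    if semitone_distance(f1_hz, f2_hz) < threshold:
        return "trill"
    return "two separate tones"


print(predicted_percept(440.0, 466.16))  # about one semitone apart: trill
print(predicted_percept(440.0, 880.0))   # an octave apart: two separate tones
```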
30Sequential grouping
- Periodicity cues: periodic oscillations help to
group objects according to their rates
- Spectral cues: we tend to group in time elements
that appear in the same spectral regions (e.g.,
high partials vs. low partials)
- Level (intensity) cues: we tend to group in time
elements of similar level
- Spatial cues: we tend to group in time elements
coming from the same place
31Features Important for Sequential Grouping
- Spectral distribution (old-plus-new heuristic)
32Streaming
- What happens when pitch separation and/or
repetition rate are varied?
- If we compress the time dimension, do we hear
notes that are further apart in frequency as
belonging together?
- This was tested by Van Noorden (1976, 1977), who
found:
- Segregation depends on repetition rate and pitch
separation
- When stream segregation occurs, we are unable to
attend fully to the events in both streams at the
same time
- We find it difficult to distinguish the order of
events across streams
- We have trouble hearing the overall rhythm of the
sequence
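The two variables in such an experiment, pitch separation and repetition rate, can be made concrete with a small stimulus generator. This is only a sketch of an alternating two-tone sequence with assumed parameter names and default values; it describes the stimulus, not the listener, and is not a reconstruction of the original experimental procedure.

```python
def alternating_sequence(base_hz=440.0, separation_semitones=7.0,
                         tones_per_second=10.0, n_tones=8):
    """Return (onset_s, frequency_hz) events alternating between a low tone
    and a tone separation_semitones above it."""
    high_hz = base_hz * 2.0 ** (separation_semitones / 12.0)
    period = 1.0 / tones_per_second          # time between successive onsets
    return [(i * period, base_hz if i % 2 == 0 else high_hz)
            for i in range(n_tones)]


# Wide separation at a fast rate: conditions that favour segregation
fast_wide = alternating_sequence(separation_semitones=12.0,
                                 tones_per_second=10.0, n_tones=4)
for onset, freq in fast_wide:
    print(f"{onset:.2f} s  {freq:.1f} Hz")
```

Lowering `tones_per_second` or `separation_semitones` gives sequences that are more likely to be heard as one coherent stream.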
33Features Important for Sequential Grouping
- Frequency and temporal contiguity: auditory
streaming
[Figure: auditory streaming; axis label: frequency separation]
34Attention and Streaming
- Bregman proposed that auditory streaming was
obligatory and did not depend on attention
- More recent studies suggest that this is wrong:
primitive streaming does require attention.
[Figure: build-up of streaming, including the case of attending to another task; time axis marked at 10 s and 20 s]
35The Figure-Ground Phenomenon and Attention
- Generally we do not attend to every aspect of the
auditory input: certain parts are selected for
conscious analysis
- Complex sound is analysed into streams; we
attend to one stream at a time; the attended
stream stands out perceptually; the rest of the
sound becomes less salient
- Separation into attended and unattended streams
is equivalent to the figure-ground phenomenon
- Examples: attending to one conversation at a time
at a party (other conversations form a
background); music with soloists; TV in a noisy
home
- Importance of changes: the listener's attention
is usually drawn to aspects of the sound that are
changing; these become figure, while the
relatively unchanging part(s) become background
36- Guess who wrote this text:
- "It is not enough to be able to describe the
response of single cells, nor predict the results
of psychophysical experiments. Nor is it enough
even to write computer programs that perform
approximately in the desired way. One has to do
all these things at once, and also be very aware
of the computational theory..."
37- This presentation reused materials from
educational and research slides and documents by:
- Dan Ellis
- Guy Brown
- Niall Griffith
- Rianna Walsh
- Chris Darwin
- Sue Denham