Perceptual Organization - PowerPoint PPT Presentation

Title: Perceptual Organization

1
Perceptual Organization

Perfecto Herrera
Music Perception and Cognition

2
The problem(s) of perceptual organization
3
Some terms

Source: the physical entity that gives rise to
the sound pressure waves, e.g. a violin being
played
Stream: the percept of a group of successive
and/or simultaneous sounds as a coherent whole
appearing to come from a single source
The sounds we hear at any one time usually come
from a number of different sources.
In most cases we can hear and identify each of
the different sound sources as having its own
pitch, timbre, loudness and location.

4
Auditory Scene Analysis

A computational theory of hearing is required
plus a functional explanation of the information
processing problems that the auditory system must
solve in order to make sense of the acoustic
environment
Work in computer vision has benefited from a
computational theory since the late 1970s, due
to David Marr
A similar foundation for hearing was developed by
Albert Bregman at McGill University in Montreal
and is known as auditory scene analysis

5
Auditory Scene Analysis

ASA can be conceptualized as a two-stage process:
The mixture of sounds is decomposed into a
collection of sensory elements (onsets, pitch
trajectories, modulations, spectral tracks, etc.)
Elements that are likely to have arisen from the
same event are grouped to form a perceptual
structure (stream) which can be interpreted by
higher centers
For example, when listening to a violin
performance, it is the task of auditory scene
analysis to group the acoustic events emitted
from the physical source (the violin) into a
perceptual stream (the mental experience of a
violin being played).

Is this the only way of listening? What about
reduced listening? Read Pierre Schaeffer
6
Auditory Scene Analysis

In most listening situations, a mixture of sounds
reaches the ears. However, we can:
Attend to one conversation amid many competing
voices and other background sounds (e.g. music)
at a cocktail party
Follow the melodic line played by the violins in
an orchestral recording.
This problem is of great scientific interest, and
a solution also has engineering applications
-> The Holy Grail!!!

Auditory image of Bach's Mass in B minor,
consisting of voice, violin, cello etc.
How does the auditory system process this image
to recover a description of each source?

Active perception: expectation-based processing
(bottom-up and top-down)

8
Auditory Scene Analysis

The inner ear separates sound into its frequency
components
At some point in the auditory system these
components need to be assigned to the appropriate
sound source
Often called perceptual grouping, or auditory
scene analysis
Two aspects: simultaneous grouping and sequential
grouping

9
Auditory Scene Analysis

Simultaneous grouping: the grouping together of
the simultaneous frequency components that come
from a single source
Sequential grouping: the connecting over time of
the changing frequencies that a single source
produces from one moment to the next

10
Example: simultaneous grouping and sequential
grouping
11
Antecedents: Gestalt Psychology

Gestalt means "pattern" in German
Gestalt psychology originated in the early 20th
century: Max Wertheimer (1880-1943), Wolfgang
Köhler (1887-1967) and Kurt Koffka (1886-1941)
The basic principles underlying Gestalt
psychology are
The whole is greater than the sum of the parts
The parts are defined by the whole as much as
vice versa
Gestalt psychologists are best known for their
work in vision but their principles are also
applicable to auditory perception.
They systematically developed a set of principles
of perceptual organisation (believed to be
innate) that they thought determine how we
assemble or associate components in a perceptual
field
These principles are

12
Gestalt Psychology Principles

Proximity
Similarity
Common Fate (Common Direction)
Good Continuation
Disjoint Allocation (Belongingness)
Closure

Bottom-up: hard-wired, pre-attentive, not
learned (primitive)
Top-down: plastic, learned (schema-driven)
13
Proximity

In vision when elements in an image are close
together they are perceived to be together and
separate from others that are further away, even
though they are similar
In hearing, sounds occurring together over time
are clustered

14
Similarity

Two or more auditory events are grouped if they
are similar in timbre, pitch, loudness or close
in apparent location or time
Fundamentals in the same region but harmonics
not: leads to fission, i.e. different timbres
at the same pitch stay unfused
Harmonics in the same region but fundamentals
not: leads to fusion, i.e. different pitches
with the same timbre are fused
This is not clear-cut; it depends on individual
differences.
If the difference in loudness is large enough,
the sounds form different streams, and either can
be attended to
Same dB -> single stream at twice the tempo

15
Common Fate

Components in sound act together
They tend to start and finish together
They tend to change in pitch or intensity
together
Therefore, if the components of a complex sound
are co-ordinated, they are fused; relevant cues
include onset timing, and AM and FM (tremolo
and vibrato)
For example, if the frequencies of harmonics 2, 4
and 8 are modulated (FM), they separate from
harmonics 3, 5, 6 and 7
Or, if the frequency of the 1st harmonic is
modulated (FM) at a different rate, it separates
from harmonics 3, 4 and 5

16
Good Continuation

Natural sound sources tend to change gradually
rather than abruptly in frequency, intensity,
location or timbre
Abrupt change -> new stream -> new source
Low and high tones tend to split into streams;
this can be suppressed by putting glides in
between
In speech, if there are oscillations in
frequency, it gives the impression that there are
two speakers saying the one word
In music in general if a note is near in pitch to
the one just before it then it will be heard as
the next note in the melody rather than a note
that is separate - higher or lower

17
Disjoint Allocation (Belongingness)

One component can only come from one source
i.e. hearing tries to use each component only
once
Say we have two tones (A and B) at slightly
different pitches, which can be heard either in
isolation or embedded in another series of
pitches. In isolation, the order (AB or BA)
is easily judged.
The addition of pitches (Xs) that are close in
pitch to AB acts as a distracter, making it
difficult to order AB (this is thought to be
because we attend more to the start and end of
sequences).
But if more Xs are added, they form a stream
that is separate from AB, and again the order of
AB is easily judged.
This is not hard and fast: ambiguity is possible,
and this shows that this level of organisation is
on the boundary of being pre-attentive and
attentive
It also shows how the addition of new elements
changes the perceptual organisation of the
stimulus.

18
Closure

A source may be obscured or absent but its
percept continues
e.g. FM radio disturbance from the ignition of
passing cars: we hear a click over the sound,
whereas in fact the radio is producing only a
click
A pitched sound that is broken but the gap is
filled by noise seems unbroken
Similarly a glide that is broken but the gap is
filled with noise seems unbroken

19
Auditory Scene Analysis

Bregman re-examines the Gestalt principles and
proposes the simultaneous and sequential grouping
cues as the basic elements of information that
help to organize our perception: what, when,
where, how
Bregman, A. S. (1990). Auditory Scene Analysis:
The Perceptual Organization of Sound. Cambridge,
Mass.: The MIT Press.
But see also
Wang, D. & Brown, G. (Editors) (2006).
Computational Auditory Scene Analysis:
Principles, Algorithms and Applications. New
York: Wiley.

20
Simultaneous grouping

Some cues
Fundamental Frequency and Spectral Regularity
Onset Timing
Correlated changes in Amplitude or Frequency
Sound Location
Important: a single cue may not be effective all
the time; these cues work together for
perceptual organisation of the input sound

21
Fundamental Frequency

Consider two musical instruments each playing a
note simultaneously
It is easier to hear each note and each
instrument if they are playing different notes
(have different fundamental frequencies)
Simultaneous sounds are more likely to fuse if
they have the same fundamental frequency
It has been shown that a pair of simultaneously
presented vowels are easier to identify if their
fundamental frequencies differ

22
Spectral Regularity

Perceptual fusion of the frequency components
of a harmonic sound (harmonicity): they are
heard as a single sound
If a frequency component does not form part of
the harmonic series it tends to be heard out
separately as if part of a different source
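
The harmonicity cue can be illustrated with a small sketch. This is only an illustration, not a model from the lecture: it assumes the fundamental is already known, and the function name `split_by_harmonicity` and the 3% `tolerance_percent` are hypothetical choices, not measured thresholds.

```python
def split_by_harmonicity(components_hz, f0_hz, tolerance_percent=3.0):
    """Split components into those that fit the harmonic series of f0 and the rest.

    tolerance_percent is an illustrative assumption, not a perceptual constant.
    """
    fused, heard_out = [], []
    for freq in components_hz:
        # Nearest harmonic number (at least the 1st harmonic)
        n = max(1, round(freq / f0_hz))
        # Relative mistuning from that harmonic, in percent
        deviation = abs(freq - n * f0_hz) / (n * f0_hz) * 100.0
        if deviation <= tolerance_percent:
            fused.append(freq)
        else:
            heard_out.append(freq)
    return fused, heard_out


# A 200 Hz complex in which the 4th harmonic is mistuned to 860 Hz
components = [200.0, 400.0, 600.0, 860.0, 1000.0]
fused, heard_out = split_by_harmonicity(components, f0_hz=200.0)
print("fused:", fused)
print("heard out separately:", heard_out)
```

In this toy case the mistuned component is the only one flagged as likely to be heard out; real listeners' tolerance varies with harmonic number, duration and other factors that the sketch ignores.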

23
Onset disparities

Perceptual separation of tones is enhanced by
onset asynchrony.
A frequency component that stops or starts at a
different time from the complex sound is less
likely to be heard as part of it than if it is
simultaneous with it
Important for making a soloist stand out

24
Onset disparities

We can hear each of two simultaneously played
notes more easily if there is a small onset
difference between them
These onset asynchronies are up to 30 ms, so the
percept is still of the notes sounding together
The auditory system can exploit these onset
differences even though we are not consciously
aware of them
Ensemble playing: completely synchronised?
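
As a rough sketch of how onset timing could be used for grouping, the following Python groups frequency components whose onsets fall close together. The function name `group_by_onset`, the 5 ms `tolerance_ms` and the example partials are assumptions made for illustration; only the "up to 30 ms" asynchrony range comes from the slide.

```python
def group_by_onset(components, tolerance_ms=5.0):
    """Group (frequency_hz, onset_ms) pairs by onset time.

    A new group starts whenever the gap to the previous onset exceeds
    tolerance_ms (an illustrative value, not a perceptual constant).
    """
    groups = []
    last_onset = None
    for freq, onset in sorted(components, key=lambda c: c[1]):
        if last_onset is None or onset - last_onset > tolerance_ms:
            groups.append([])
        groups[-1].append(freq)
        last_onset = onset
    # Sort frequencies inside each group for readability
    return [sorted(group) for group in groups]


# Two notes whose partials start about 25 ms apart
note_a = [(220.0, 0.0), (440.0, 1.0), (660.0, 0.5)]
note_b = [(277.0, 25.0), (554.0, 25.5), (831.0, 26.0)]
groups = group_by_onset(note_a + note_b)
print(groups)
```

Even though a 25 ms offset would still be heard as the notes sounding together, the onset gap is enough for this toy procedure to assign the partials to two groups.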

25
Onset disparities

Shorter rise times: easier to hear the order of
the tones
Generally, sounds with abrupt onsets (shorter
rise time) stand out better from a background of
other sounds than do slow-rising sounds
Shorter rise times aid the perceptual
segregation of sounds (telling them apart)
Rapid-onset sounds: e.g. notes from plucked or
struck instruments
Why do many musical systems combine abrupt- and
slow-attack sounds?

26
Correlated Changes in Amplitude or Frequency

A sound may be perceptually segregated from an
unchanging background if its components are
modulated in amplitude or frequency
Hear harmonic complex tone
Harmonics 1, 3, 5, 6, 7 remain steady
Harmonics 2, 4, and 8 rise and fall in frequency
four times
Hear the two sets as separate sounds
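
A demonstration of this kind could be synthesized roughly as follows. This is a minimal sketch using only the Python standard library; the fundamental (`f0`), duration, sample rate and modulation `depth` are assumptions, since the slide only specifies which harmonics are steady, which are modulated, and that the modulated set rises and falls four times.

```python
import math


def fm_demo(f0=200.0, duration_s=2.0, sample_rate=8000,
            steady=(1, 3, 5, 6, 7), modulated=(2, 4, 8),
            cycles=4, depth=0.05):
    """Return samples of a harmonic complex in which some harmonics share one FM pattern.

    f0, duration_s, sample_rate and depth are illustrative assumptions.
    """
    n_samples = int(duration_s * sample_rate)
    harmonics = list(steady) + list(modulated)
    phases = {h: 0.0 for h in harmonics}
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        # Common modulator: `cycles` rise-and-fall cycles over the whole tone
        mod = 1.0 + depth * math.sin(2.0 * math.pi * cycles * t / duration_s)
        value = 0.0
        for h in harmonics:
            freq = h * f0 * (mod if h in modulated else 1.0)
            # Accumulate phase so the instantaneous frequency follows `freq`
            phases[h] += 2.0 * math.pi * freq / sample_rate
            value += math.sin(phases[h])
        # Normalise so the sum of all harmonics stays within [-1, 1]
        samples.append(value / len(harmonics))
    return samples


samples = fm_demo()
print(len(samples), "samples generated")
```

To listen to the result, the samples could be scaled to 16-bit integers and written to a file with Python's standard `wave` module.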

27
Sound Location

Sounds coming from different locations in space
are generally assumed to be from different
sources
But this is a weak cue for simultaneous
grouping; it becomes stronger for sequential
grouping

28
Sequential organisation

Events in the world occur over time. We organise
sounds into sequences over time using various
criteria
Events that are similar in some way (e.g. in
loudness or pitch) or going in the same direction
(e.g. rising or falling) are perceived to have
the same origin.
Music uses this principle
Streams are created by differences in pitch,
loudness, timbre, repetition rate etc and by
combining these in different ways.
Characteristics of Streams
Streams are separate: we only attend to one
fully at a time.
Foreground and background: possibly 3 maximum
Stream organisation is relative rather than
absolute
Stream organisation may change as the complexity
of the stimulus changes
Some aspects of streaming are pre-attentive,
others are attentive, i.e. by attending to
different aspects of a stimulus we hear
different things

The Trill Phenomenon
Miller and Heise experimented with two
alternating tones to see how close in pitch they
had to be for people to hear a trill.
They observed two states:
When the frequency difference is small, the pitch
moves continuously up and down (i.e. a trill)
When the frequency difference is large, two
separate tones are heard.
Miller called the breaking point the "trill
threshold" - it is at approximately 3 semitones.
Eventually, as the pitch variation drops to below
the JND for pitch the trill will become vibrato
(FM, see lecture 5)
The trill phenomenon is an example of auditory
grouping by proximity or similarity of pitch. In
the first pair of sequences below the x's and z's
are seen as separate "objects", but in the second
we see a single zigzag of xs and zs
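
The interval between two alternating tones can be expressed in semitones and compared with the approximate threshold quoted above. The sketch below is a toy under stated assumptions: the function names are hypothetical, and it treats the 3-semitone figure as a fixed cut-off, which is a simplification of the behaviour described.

```python
import math


def semitones_between(f1_hz, f2_hz):
    """Size of the interval between two frequencies, in equal-tempered semitones."""
    return abs(12.0 * math.log2(f2_hz / f1_hz))


def trill_percept(f1_hz, f2_hz, threshold_semitones=3.0):
    """Toy classification using the approximate 3-semitone threshold from the slide."""
    if semitones_between(f1_hz, f2_hz) < threshold_semitones:
        return "trill"
    return "two separate tones"


for low, high in [(440.0, 466.16), (440.0, 659.26)]:
    interval = semitones_between(low, high)
    print(f"{low} Hz / {high} Hz: {interval:.2f} semitones -> {trill_percept(low, high)}")
```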

30
Sequential grouping

Periodicity cues: periodic oscillations help to
group objects according to their rates
Spectral cues: we tend to group in time elements
that appear in the same spectral regions (e.g.,
high partials vs. low partials)
Level (intensity) cues: we tend to group in time
elements of similar level
Spatial cues: we tend to group in time elements
coming from the same place
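
The spectral cue can be sketched as a greedy assignment of successive tones to streams by frequency proximity. This is only a toy: the function name `assign_streams` and the `max_jump_semitones` value are assumptions, and the periodicity, level and spatial cues, as well as tempo, are ignored.

```python
import math


def assign_streams(tone_freqs_hz, max_jump_semitones=3.0):
    """Greedy toy grouping: each tone joins the stream whose latest tone is
    nearest in pitch, if within max_jump_semitones; otherwise it starts a
    new stream. The threshold is an illustrative assumption.
    """
    streams = []
    for freq in tone_freqs_hz:
        best, best_dist = None, None
        for stream in streams:
            # Distance in semitones to the most recent tone of this stream
            dist = abs(12.0 * math.log2(freq / stream[-1]))
            if dist <= max_jump_semitones and (best_dist is None or dist < best_dist):
                best, best_dist = stream, dist
        if best is None:
            streams.append([freq])
        else:
            best.append(freq)
    return streams


# Alternating low and high tones split into two streams
sequence = [400.0, 1000.0, 420.0, 1050.0, 400.0, 1000.0]
streams = assign_streams(sequence)
print(streams)
```

With tones that stay within the threshold of one another, the same procedure returns a single stream.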

31
Features Important for Sequential Grouping

Spectral distribution (old-plus-new heuristic)

32
Streaming

What happens when pitch separation and/or
repetition rate are varied?
If we compress the time dimension do we hear
notes that are further apart in frequency
belonging together?
This was tested by Van Noorden (1976, 1977), who
found:
Segregation depends on repetition rate and pitch
separation
When stream segregation occurs, we are unable to
attend fully to the events in both streams at the
same time
We find it difficult to distinguish the order of
events across streams
We have trouble hearing the overall rhythm of the
sequence

33
Features Important for Sequential Grouping

Frequency and temporal contiguity: auditory
streaming

[Figure: plot with axis label "Freq. separation"]
34
Attention and Streaming

Bregman proposed that auditory streaming was
obligatory and did not depend on attention
Recent studies have shown that this is wrong:
primitive streaming does require attention.

[Figure: build-up of streaming, including the
case of attending to another task; time labels
20 s, 20 s, 10 s]
35
The Figure-Ground Phenomenon and Attention

Generally we do not attend to every aspect of the
auditory input; certain parts are selected for
conscious analysis
Complex sound is analysed into streams; we
attend to one stream at a time. The attended
stream stands out perceptually and the rest of
the sound becomes less salient
Separation into attended and unattended streams
is equivalent to the figure-ground phenomenon
Examples: attending to one conversation at a time
at a party (other conversations form a
background); music with soloists; TV; noisy home
Importance of changes: the listener's attention
is usually drawn to aspects of the sound that are
changing; these become figure while the
relatively unchanging part(s) become background

Guess who wrote this text
It is not enough to be able to describe the
response of single cells, nor predict the results
of psychophysical experiments. Nor is it enough
even to write computer programs that perform
approximately in the desired way. One has to do
all these things at once, and also be very aware
of the computational theory...

This presentation reused materials from
educational and research slides and documents by
Dan Ellis
Guy Brown
Niall Griffith
Rianna Walsh
Chris Darwin
Sue Denham
