Title: Perception
1 Perception & Cognition, One at Last in Spoken Word Recognition: Temporal Integration at Two Time Scales
5/9/05, Cochlear Implant Team, UIHC
Bob McMurray, University of Iowa, Dept. of Psychology
2 Collaborators
Richard Aslin, Michael Tanenhaus, David Gow, Joe Toscano, Dana Subik, Julie Markant
3 Perceptual processes: continuous acoustic detail.
Specifically: sensitivity to fine-grained perceptual detail can help integrate information over time.
- High-level language Processes
- Word Recognition
- Syntax
- Reference
5 Perceptual processes: continuous acoustic detail. (Diagram labels: provide support for.)
- High-level language Processes
- Word Recognition
6 Ganong (1980): Lexical information biases perception of ambiguous phonemes.
Phoneme Restoration (Warren, 1970; Samuel, 1997).
/t/
Lexical Feedback: McClelland & Elman (1988); Magnuson, McMurray, Tanenhaus & Aslin (2003)
9 Perceptual processes: continuous acoustic detail. (Diagram labels: provide support for.)
- Invariance, Covariance Temporal Integration
- Short-term storage.
- Covariance.
- Limit sensitivity to necessary detail.
- High-level language Processes
- Word Recognition
10 - In language, information arrives sequentially.
- Partial syntactic and semantic representations are formed as words arrive.
Example: "The Eastside is prettier than the Westside."
- Words are identified over sequential phonemes.
[Figure: phoneme-by-phoneme example; most IPA symbols lost in extraction.]
11 Spoken Word Recognition is an ideal arena in which to study these issues because:
- Research divides word recognition into perceptual and cognitive mechanisms.
- Perceptual information is available for temporal integration.
- Cognitive architectures may support perception.
12 Scales of temporal integration in word recognition
- A word: ordered series of articulations.
- - Build abstract representations.
- - Form expectations about future events.
- - Fast (online) processing.
- A phonology:
- - Abstract across utterances.
- - Expectations about possible future events.
- - Slow (developmental) processing.
13 Mechanisms of Temporal Integration
- Stimuli do not change arbitrarily.
- - Perceptual cues reveal something about the change itself.
- Active integration
- - Anticipating future events.
- - Retain partial present representations.
- - Resolve prior ambiguity.
14 Representational Medium: Lexical Activation
- Lexical activation shows
- Online processing dynamics.
- Sensitivity to fine-grained detail.
- Integration of asynchronous material.
15 Overview
1) Speech perception and Spoken Word Recognition.
2) Lexical activation is sensitive to fine-grained detail in speech.
3) Fast temporal integration: taking advantage of regularity in the signal.
4) Slow temporal integration: developmental consequences.
16 Online Word Recognition
- Information arrives sequentially
- At early points in time, signal is temporarily
ambiguous.
- Later arriving information disambiguates the word.
17 Current models of spoken word recognition
- Immediacy: Hypotheses formed from the earliest moments of input.
- Activation Based: Lexical candidates (words) receive activation to the degree they match the input.
- Parallel Processing: Multiple items are active in parallel.
- Competition: Items compete with each other for recognition.
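The four properties above can be sketched in a few lines. This is a toy illustration only, not any published model: the phoneme spellings, the `floor` constant, and all function names are assumptions, and simple normalization stands in for competition.

```python
# Toy lexicon; phoneme spellings are simplified stand-ins (assumption).
LEXICON = {
    "beach": ["b", "i", "ch"],
    "butter": ["b", "u", "tt", "e", "r"],
    "bump": ["b", "u", "m", "p"],
    "putter": ["p", "u", "tt", "e", "r"],
    "dog": ["d", "o", "g"],
}

def match_score(word_phonemes, heard):
    """Activation based: fraction of heard positions matching the word."""
    hits = sum(1 for i, ph in enumerate(heard)
               if i < len(word_phonemes) and word_phonemes[i] == ph)
    return hits / len(heard)

def activations(heard, floor=0.01):
    """Parallel processing plus competition: score every word, then
    normalize so that activations sum to 1 (one word's gain is another's loss)."""
    raw = {w: match_score(p, heard) + floor for w, p in LEXICON.items()}
    total = sum(raw.values())
    return {w: a / total for w, a in raw.items()}

def recognize(input_phonemes):
    """Immediacy: recompute activations after every incoming phoneme."""
    return [activations(input_phonemes[:t])
            for t in range(1, len(input_phonemes) + 1)]

if __name__ == "__main__":
    for t, acts in enumerate(recognize(["b", "u", "tt", "e", "r"]), start=1):
        ranked = sorted(acts.items(), key=lambda kv: -kv[1])
        print(t, [(w, round(a, 2)) for w, a in ranked])
```

After the first phoneme the three b-words are tied; by the end of the input "butter" leads, with "putter" still partially active.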
18 [Figure: input "b... u tt e r" unfolding over time; lexical candidates: beach, butter, bump, putter, dog.]
19 These processes have been well defined for a phonemic representation of the input.
[Figure: phoneme-by-phoneme transcription of an example word; IPA symbols lost in extraction.]
- Considerably less ambiguity if we consider subphonemic information.
- Bonus: processing dynamics may solve problems in speech perception.
Example: subphonemic effects of motor processes.
20 Coarticulation
Any action reflects future actions as it unfolds.
Example: Coarticulation. Articulation (lips, tongue) reflects current, future and past events. Subtle subphonemic variation in speech reflects temporal organization.
Sensitivity to these perceptual details might yield earlier disambiguation. Lexical activation could store these perceptual details.
21 These processes have largely been ignored because of a history of evidence that perceptual variability gets discarded. Example: Categorical Perception.
22 Categorical Perception
Subphonemic variation in VOT is discarded in favor of a discrete symbol (phoneme).
23 Evidence against the strong form of Categorical Perception from psychophysical-type tasks:
Discrimination Tasks: Pisoni & Tash (1974); Pisoni & Lazarus (1974); Carney, Widin & Viemeister (1977)
Training: Samuel (1977); Pisoni, Aslin, Perey & Hennessy (1982)
Goodness Ratings: Miller (1997); Massaro & Cohen (1983)
24 Fundamental independence of fields. Enabled by CP. Evidence against CP seen to support paradigm.
25 Experiment 1
Does within-category acoustic detail systematically affect higher level language? Is there a gradient effect of subphonemic detail on lexical activation?
26 McMurray, Aslin & Tanenhaus (2002)
A gradient relationship would yield systematic effects of subphonemic information on lexical activation.
If this gradiency is useful for temporal integration, it must be preserved over time.
Need a design sensitive to both acoustic detail and detailed temporal dynamics of lexical activation.
27 Acoustic Detail
Use a speech continuum: more steps yield a better picture of the acoustic mapping.
KlattWorks: generate synthetic continua from natural speech.
9-step VOT continua (0-40 ms).
6 pairs of words: beach/peach, bale/pale, bear/pear, bump/pump, bomb/palm, butter/putter.
6 fillers.
lamp, leg, lock, ladder, lip, leaf, shark, shell, shoe, ship, sheep, shirt
29 Temporal Dynamics
How do we tap on-line recognition? With an on-line task: eye-movements.
Subjects hear spoken language and manipulate objects in a visual world. The visual world includes a set of objects with interesting linguistic properties: a beach, a peach and some unrelated items. Eye-movements to each object are monitored throughout the task.
Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy, 1995
30 Why use eye-movements and the visual world paradigm?
- Relatively natural task.
- Eye-movements generated very fast (within 200 ms of first bit of information).
- Eye movements time-locked to speech.
- Subjects aren't aware of eye-movements.
- Fixation probability maps onto lexical activation.
31 Task
A moment to view the items
33 Task
Bear
Repeat 1080 times
34 Identification Results
High agreement across subjects and items for category boundary.
[Figure: proportion /p/ responses as a function of VOT (ms), from B to P.]
By subject: 17.25 ± 1.33 ms. By item: 17.24 ± 1.24 ms.
35 Task
Target: Bear. Competitor: Pear. Unrelated: Lamp, Ship.
37 Task
Given that
- the subject heard bear
- clicked on bear
How often was the subject looking at the pear?
[Figure: predicted target and competitor fixations under categorical results vs. a gradient effect.]
38 Results
[Figure: competitor fixations as a function of time since word onset (ms), by response and VOT; recovered legend entries: 0 ms, 5 ms.]
Long-lasting gradient effect seen throughout the timecourse of processing.
39 [Figure: competitor fixations (looks to competitor) as a function of VOT (ms), by response, with the category boundary marked.]
41 Summary
Subphonemic acoustic differences in VOT have a gradient effect on lexical activation.
- Gradient effect of VOT on looks to the competitor.
- Effect holds even for unambiguous stimuli.
- Seems to be long-lasting.
Consistent with growing body of work using priming (Andruski, Blumstein & Burton, 1994; Utman, Blumstein & Burton, 2000; Gow, 2001, 2002).
42 The Proposed Framework
Sensitivity & Use
1) Word recognition is systematically sensitive to subphonemic acoustic detail.
2) Acoustic detail is represented as gradations in activation across the lexicon.
3) This sensitivity enables the system to take advantage of subphonemic regularities for temporal integration.
4) This has fundamental consequences for development: learning phonological organization.
43 Lexical Sensitivity
- Word recognition is systematically sensitive to
subphonemic acoustic detail.
- Voicing
- Laterality, Manner, Place
- Natural Speech
- X Metalinguistic Tasks
[Figure: competitor fixations as a function of VOT (0 to 40 ms in 5 ms steps), with the category boundary marked; series labels as extracted: "Response P, looks to B" and "Response B, looks to B".]
46 Lexical Sensitivity
- Word recognition is systematically sensitive to
subphonemic acoustic detail.
- Voicing
- Laterality, Manner, Place
- Natural Speech
- X Metalinguistic Tasks
- ? Non minimal pairs
- ? Duration of effect
- (experiment 1)
47 2) Acoustic detail is represented as gradations in activation across the lexicon.
[Figure: input "b... u m p" unfolding over time.]
48 Temporal Integration
3) This sensitivity enables the system to take advantage of subphonemic regularities for temporal integration.
- Regressive ambiguity resolution (exp 1)
- - Ambiguity retained until more information arrives.
- Progressive expectation building (exp 2)
- - Phonetic distinctions are spread over time.
- - Anticipate upcoming material.
49 Development
4) Consequences for development: learning phonological organization.
- Learning a language
- - Integrating input across many utterances to build long-term representation.
- Sensitivity to subphonemic detail (exp 4 & 5).
- - Allows statistical learning of categories (model).
50 Experiment 2
51 Misperception
What if the initial portion of a stimulus was misperceived?
- Competitor still active: easy to activate it the rest of the way.
- Competitor completely inactive: system will garden-path.
P(misperception) varies with distance from the boundary.
Gradient activation allows the system to hedge its bets.
52 / beIrəkeId / vs. / peIrəkit /
barricade vs. parakeet
[Figure: input "p/b eI r ə k i t" unfolding over time; categorical lexicon.]
53 Methods
10 pairs of b/p items.
55 Eye Movement Results
[Figure: fixations to target (0 to 1) as a function of time (0 to 900 ms), one curve per VOT step, Barricade -> Parricade.]
Faster activation of target as VOTs near lexical endpoint, even within the non-word range.
57 Experiment 2 Conclusions
Gradient effect of within-category variation without minimal pairs.
- Gradient effect is long-lasting: mean POD 240 ms.
- Regressive ambiguity resolution:
- - Subphonemic gradations maintained until more information arrives.
- - Subphonemic gradation can improve (or hinder) recovery from garden path.
58 Progressive Expectation Formation
- Can within-category detail be used to predict future acoustic/phonetic events?
- Yes: phonological regularities create systematic within-category variation.
- - Predicts future events.
59 Experiment 3: Anticipation
Word-final coronal consonants (n, t, d) assimilate the place of the following segment.
Maroong Goose
Maroon Duck
Place assimilation -> ambiguous segments anticipate upcoming material.
60 Subject hears:
- select the maroon duck
- select the maroon goose
- select the maroong goose
- select the maroong duck
61 Results
Anticipatory effect on looks to non-coronal.
62 [Figure: looks to duck as a function of time (0 to 600 ms); fixation proportion (0 to 0.3) for assimilated vs. non-assimilated tokens, with the onset of "goose" + oculomotor delay marked.]
Inhibitory effect on looks to coronal (duck, p = .024).
63 Sensitivity to subphonemic detail:
- Increase priors on likely upcoming events.
- Decrease priors on unlikely upcoming events.
- Active temporal integration process.
Occasionally assimilation creates ambiguity:
- Resolves prior ambiguity: mudg drinker.
- Similar to experiment 2.
- Progressive effect delayed 200 ms by lexical competition.
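One way to read "increase/decrease priors" is as a Bayesian update on the place of the upcoming segment given how the final nasal was produced. The sketch below is only an illustration under that reading: every probability is a made-up placeholder, and `update_expectations` is a hypothetical name.

```python
def update_expectations(prior, likelihood, cue):
    """Bayes' rule: P(next place | cue) is proportional to
    P(cue | next place) * P(next place)."""
    unnorm = {place: prior[place] * likelihood[place][cue] for place in prior}
    total = sum(unnorm.values())
    return {place: v / total for place, v in unnorm.items()}

# Placeholder numbers chosen only for illustration (assumption).
prior = {"coronal": 0.5, "velar": 0.5}  # e.g. "duck" vs. "goose"
likelihood = {
    # P(how the final /n/ sounded | place of the next segment)
    "coronal": {"assimilated": 0.1, "plain": 0.9},
    "velar": {"assimilated": 0.6, "plain": 0.4},
}

posterior = update_expectations(prior, likelihood, "assimilated")
print(posterior)
```

Hearing an assimilated nasal ("maroong") raises the expectation for a velar onset and lowers it for a coronal one, before the next word has started.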
64 Adult Summary
- Lexical activation is exquisitely sensitive to within-category detail.
- This sensitivity is useful to integrate material over time.
- - Regressive ambiguity resolution.
- - Progressive facilitation.
- Underpins a potentially lexical role in speech perception.
65 Development
Historically, work in speech perception has been linked to development. Sensitivity to subphonemic detail must revise our view of development.
Use: infants face additional temporal integration problems.
- No lexicon available to clean up noisy input: rely on acoustic regularities.
- Extracting a phonology from the series of utterances.
66 Sensitivity to subphonemic detail: For 30 years, virtually all attempts to address this question have yielded categorical discrimination (e.g. Eimas, Siqueland, Jusczyk & Vigorito, 1971).
- Exception: Miller & Eimas (1996).
- - Only at extreme VOTs.
- - Only when habituated to non-prototypical token.
67 Use?
Nonetheless, infants possess abilities that would require within-category sensitivity.
- Infants can use allophonic differences at word boundaries for segmentation (Jusczyk, Hohne & Bauman, 1999; Hohne & Jusczyk, 1994).
- Infants can learn phonetic categories from distributional statistics (Maye, Werker & Gerken, 2002; Maye & Weiss, 2004).
68 Statistical Category Learning
Speech production causes clustering along contrastive phonetic dimensions.
E.g. Voicing / Voice Onset Time. B: VOT 0. P: VOT 40.
69 To statistically learn speech categories, infants must
- This requires ability to track specific VOTs.
70 Experiment 4
Why no demonstrations of sensitivity?
- Habituation
- - Discrimination, not ID.
- - Possible selective adaptation.
- - Possible attenuation of sensitivity.
- Synthetic speech
- - Not ideal for infants.
- Single exemplar/continuum
- - Not necessarily a category representation.
Experiment 4: Reassess issue with improved methods.
71 HTPP
- Head-Turn Preference Procedure (Jusczyk & Aslin, 1995)
- Infants exposed to a chunk of language:
- - Words in running speech.
- - Stream of continuous speech (à la statistical learning paradigm).
- - Word list.
- Memory for exposed items (or abstractions) assessed:
- - Compare listening time between consistent and inconsistent items.
72 Test trials start with all lights off.
73 Center light blinks.
74 Brings infant's attention to center.
75 One of the side-lights blinks.
76 When infant looks at side-light he hears a word
77 as long as he keeps looking.
78 Methods
7.5-month-old infants exposed to either 4 b- or 4 p-words. 80 repetitions total. Form a category of the exposed class of words.
79 Stimuli constructed by cross-splicing naturally produced tokens of each end point.
80 Novelty or Familiarity?
Novelty/familiarity preference varies across infants and experiments.
We're only interested in the middle stimuli (b*, p*). Infants were classified as novelty- or familiarity-preferring by performance on the endpoints.
81 After being exposed to bear, beach, bail, bomb: infants who show a novelty effect will look longer for pear than bear.
What about in between?
[Figure: listening time for Bear, Bear* and Pear.]
82 Results
Novelty infants (B: 36, P: 21)
[Figure: listening time (ms, 4000 to 10000) for Target, Target* and Competitor, by exposure group (B or P).]
Target vs. Target*: p < .001. Competitor vs. Target*: p = .017.
83 Familiarity infants (B: 16, P: 12)
Target vs. Target*: p = .003. Competitor vs. Target*: p = .012.
84 Infants exposed to /p/
Novelty: N = 21
85 Infants exposed to /b/
86 Experiment 4 Conclusions
Contrary to all previous work:
- 7.5-month-old infants show gradient sensitivity to subphonemic detail.
- - Clear effect for /p/.
- - Effect attenuated for /b/.
87 Reduced effect for /b/. But:
88 - Category boundary lies between Bear and Bear*?
- - Between 3 ms and 11 ms?
- Within-category sensitivity in a different range?
89 Experiment 5
Same design as experiment 3. VOTs shifted away from hypothesized boundary.
Train / Test VOTs:
- -9.7 ms: Bomb, Bear, Beach, Bale
- 3.6 ms: Bomb, Bear, Beach, Bale
- 40.7 ms: Palm, Pear, Peach, Pail
90 Familiarity infants (34 infants)
[Figure: listening time (ms, 4000 to 9000) for B-, B and P; significance markers .01 and .05.]
91 Novelty infants (25 infants)
[Figure: listening time (ms, 4000 to 9000) for B-, B and P; significance markers .002 and .02.]
92 Experiment 5 Conclusions
- Within-category sensitivity in /b/ as well as /p/.
- Shifted category boundary in /b/: not consistent with adult boundary (or prior infant work). Why?
93 /b/ results consistent with (at least) two mappings.
1) Shifted boundary
[Figure: category mapping strength as a function of VOT for /b/ and /p/.]
- Inconsistent with prior literature.
- Why would infants have this boundary?
94 HTPP is a one-alternative task. Asks "B or not-B", not "B or P".
Hypothesis: Sparse categories are a by-product of efficient learning.
95 Computational Model
Distributional learning model
1) Model distribution of tokens as a mixture of Gaussian distributions over a phonetic dimension (e.g. VOT).
2) After receiving an input, the Gaussian with the highest posterior probability is the category.
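A minimal sketch of step 2, assuming a one-dimensional mixture in which each category has a mixing weight, a mean and a standard deviation. The means (0 and 40 ms) echo the /b/ and /p/ VOT values mentioned earlier; the weights, the standard deviations and all function names are illustrative assumptions.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and sd sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def posteriors(x, categories):
    """P(category | x) for each Gaussian: weight * likelihood, normalized."""
    weighted = [c["phi"] * gaussian_pdf(x, c["mu"], c["sigma"]) for c in categories]
    total = sum(weighted)
    return [w / total for w in weighted]

def categorize(x, categories):
    """Index of the Gaussian with the highest posterior probability."""
    post = posteriors(x, categories)
    return max(range(len(categories)), key=post.__getitem__)

# Hypothetical two-category VOT model (sigma and phi are assumptions).
categories = [
    {"name": "b", "phi": 0.5, "mu": 0.0, "sigma": 8.0},
    {"name": "p", "phi": 0.5, "mu": 40.0, "sigma": 8.0},
]

for vot in (5.0, 20.0, 35.0):
    print(vot, [round(p, 3) for p in posteriors(vot, categories)])
```

The posteriors are graded even though the final category choice is discrete, which is the property the model relies on.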
96 Statistical Category Learning
1) Start with a set of randomly selected Gaussians.
2) After each input, adjust each parameter to find best description of the input.
3) Start with more Gaussians than necessary: the model doesn't innately know how many categories.
- - φ -> 0 for unneeded categories.
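The learning steps above could be sketched as follows. This is not the talk's actual learning rule, which is not given here: an online EM-style update is swapped in as a stand-in, and the learning rate, the sigma floor, the number of Gaussians, the simulated token distribution and all names are assumptions.

```python
import math
import random

def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def init_categories(k, lo, hi, sigma, rng):
    """Step 1: k Gaussians with random means and equal mixing weights."""
    return [{"phi": 1.0 / k, "mu": rng.uniform(lo, hi), "sigma": sigma}
            for _ in range(k)]

def update(categories, x, rate=0.02, min_sigma=0.5):
    """Step 2: nudge every parameter toward a better description of x,
    in proportion to each Gaussian's responsibility for x."""
    weighted = [c["phi"] * gaussian_pdf(x, c["mu"], c["sigma"]) for c in categories]
    total = sum(weighted)
    if total > 0:
        resp = [w / total for w in weighted]
    else:  # x is far from every Gaussian: share responsibility equally
        resp = [1.0 / len(categories)] * len(categories)
    for c, r in zip(categories, resp):
        step = rate * r
        var = c["sigma"] ** 2
        var += step * ((x - c["mu"]) ** 2 - var)
        c["mu"] += step * (x - c["mu"])
        c["sigma"] = max(math.sqrt(var), min_sigma)
        # Gaussians that rarely win see their mixing weight decay toward 0.
        c["phi"] += rate * (r - c["phi"])

def train(categories, tokens):
    for x in tokens:
        update(categories, x)
    return categories

rng = random.Random(0)
# Simulated bimodal VOT input (assumed means 0 and 40 ms, sd 5 ms).
tokens = [rng.gauss(0.0, 5.0) if rng.random() < 0.5 else rng.gauss(40.0, 5.0)
          for _ in range(5000)]
# Step 3: start with more Gaussians (8) than categories in the input (2).
model = train(init_categories(8, -20.0, 60.0, 3.0, rng), tokens)

for c in sorted(model, key=lambda c: -c["phi"]):
    print(round(c["phi"], 3), round(c["mu"], 1), round(c["sigma"], 1))
```

Because each update is a weighted average of the old value and the new evidence, the mixing weights always sum to 1 and the variances stay positive.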
98 Overgeneralization
- large σ
- costly: lose phonetic distinctions
99 Undergeneralization
- small σ
- not as costly: maintain distinctiveness.
100 To increase likelihood of successful learning:
- err on the side of caution.
- start with small σ
101 Sparseness coefficient: proportion of space not strongly mapped to any category.
[Figure: category structure over VOT.]
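One possible way to compute such a coefficient, offered only as a sketch: the exact definition is not recoverable from this slide, so the evaluation grid, the threshold for "strongly mapped", and the function names are all assumptions.

```python
import math

def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def sparseness(categories, lo=-20.0, hi=60.0, steps=161, threshold=0.001):
    """Fraction of grid points along the dimension where no category's
    weighted likelihood reaches the (assumed) threshold."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    unmapped = 0
    for x in grid:
        strongest = max(c["phi"] * gaussian_pdf(x, c["mu"], c["sigma"])
                        for c in categories)
        if strongest < threshold:
            unmapped += 1
    return unmapped / len(grid)

# Two hypothetical category sets that differ only in sigma.
narrow = [{"phi": 0.5, "mu": 0.0, "sigma": 2.0},
          {"phi": 0.5, "mu": 40.0, "sigma": 2.0}]
wide = [{"phi": 0.5, "mu": 0.0, "sigma": 10.0},
        {"phi": 0.5, "mu": 40.0, "sigma": 10.0}]

print(sparseness(narrow), sparseness(wide))
```

Under these assumptions, narrow categories leave most of the VOT range unmapped while wide ones cover it completely.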
102 Start with large σ
[Figure: category structure over VOT; average sparsity coefficient (0 to 0.4) as a function of training epochs (0 to 12000), by starting σ.]
103 Intermediate starting σ
[Figure: category structure over VOT; average sparsity coefficient (0 to 0.4) as a function of training epochs (0 to 12000), by starting σ.]
104 Limitations
- Occasionally the model leaves sparse regions at the end of learning.
- - Competition/choice framework:
- - Additional competition or selection mechanisms during processing: categorization despite incomplete information.
- Multi-dimensional categories
- - 1-D: 3 parameters / category
- - 2-D: 6
- - 3-D: 10
- - 4-D: 15
- - Cue/model-reliability may reduce dimensionality.
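The growth in parameters can be checked with a short calculation. The sketch assumes each category is a Gaussian with a full covariance matrix plus one mixing weight; under that assumption the formula reproduces 3, 6 and 15 parameters for 1, 2 and 4 dimensions, and gives 10 for 3 dimensions. The function name is hypothetical.

```python
def mixture_params_per_category(d):
    """Free parameters for one d-dimensional Gaussian category,
    assuming a full covariance matrix and one mixing weight."""
    mean = d                      # one mean per dimension
    covariance = d * (d + 1) // 2  # symmetric d x d matrix
    mixing_weight = 1
    return mean + covariance + mixing_weight

counts = {d: mixture_params_per_category(d) for d in range(1, 5)}
print(counts)
```

The covariance term grows quadratically with the number of cues, which is why reducing dimensionality matters.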
105 Non-parametric approach?
- Not constrained by a particular equation: can fill space better.
- Similar properties in terms of starting σ and sparseness.
106 Model Conclusions
To avoid overgeneralization, it is better to start with small estimates for σ.
Small or even medium starting σs lead to sparse category structure during infancy: much of phonetic space is unmapped.
Sparse categories: similar temporal integration to exp 2. Retain ambiguity (and partial representations) until more input is available.
107 AEM Paradigm
Examination of sparseness/completeness of categories needs a two-alternative task.
- Also useful with:
- - Color
- - Shape
- - Spatial Frequency
- - Faces
Quicktime Demo
108 Experiment 6
Anticipatory Eye Movements
Train: Bear0: Left. Pail35: Right.
Test: Bear0, Pear40, Bear5, Pear35, Bear10, Pear30, Bear15, Pear25.
Same naturally-produced tokens from Exps 4 & 5.
109 Expected results
[Figure: performance as a function of VOT for Bear and Pail, showing the adult boundary and unmapped space.]
110 Results
Correct: 67%. 9 / 16 better than chance.
Training Tokens
111 Infant Summary
Infants show graded sensitivity to subphonemic detail.
- /b/ results: regions of unmapped phonetic space.
- Statistical approach provides support for sparseness.
- - Given current learning theories, sparseness results from optimal starting parameters.
- Empirical test will require a two-alternative task.
- - AEM: train infants to make eye-movements in response to stimulus identity.
- What is the role of the emerging lexicon?
112 Conclusions
Infants and adults are sensitive to subphonemic detail.
Sensitivity is important to adult and developing word recognition systems.
1) Short-term cue integration.
2) Long-term phonology learning.
In both cases:
- Partially ambiguous material is retained by lexical activation until more data arrives.
- Partially active representations anticipate likelihood of future material (classes of words).
113 Conclusions
Spoken language is defined by change. But the information to cope with it is in the signal, if the lexicon looks online. Within-category acoustic variation is signal, not noise.
114 Within-Category Variation is Used in Spoken Word Recognition: Temporal Integration at Two Time Scales
Bob McMurray, University of Iowa, Dept. of Psychology
116 Misperception: Additional Results
117 - 10 pairs of b/p items.
- - 0 to 35 ms VOT continua.
- 20 filler items (lemonade, restaurant, saxophone).
- Option to click "X" (Mispronounced).
- 26 subjects.
- 1240 trials over two days.
118 Identification Results
[Figure: response rate (0 to 1) for voiced, voiceless and NW responses as a function of VOT (0 to 35 ms), from Barricade to Parricade.]
Significant target responses even at extreme. Graded effects of VOT on correct response rate.
119 Phonetic Garden-Path
Garden-path effect: difference between looks to each target (b vs. p) at same VOT.
120 [Figure panels: Target, Competitor.]
GP effect: gradient effect of VOT. Target: p < .0001. Competitor: p < .0001.
121 Assimilation: Additional Results
122 runm picks, runm takes
123 Exp 3 & 4 Conclusions
- Within-category detail used in recovering from assimilation: temporal integration.
- - Anticipate upcoming material.
- - Bias activations based on context.
- - Like Exp 2: within-category detail retained to resolve ambiguity.
- Phonological variation is a source of information.
124 Subject hears:
- select the mud drinker
- select the mudg gear
- select the mudg drinker
Critical Pair
125 [Figure: fixation proportion (0 to 0.45) as a function of time (0 to 2000 ms), with the onset of "gear" and the average offset of "gear" (402 ms) marked.]
Mudg Gear is initially ambiguous with a late bias towards Mud.
126 Mudg Drinker is also ambiguous with a late bias towards Mug (the /g/ has to come from somewhere).
127(No Transcript)