1
A brain's-eye view of speech perception
David Poeppel
Cognitive Neuroscience of Language Lab
Department of Linguistics and Department of Biology
Neuroscience and Cognitive Science Program
University of Maryland College Park
  • Colleagues
  • Allen Braun, NIH
  • Greg Hickok, UC Irvine
  • Jonathan Simon, Univ. Maryland
  • Students
  • Anthony Boemio
  • Maria Chait
  • Huan Luo
  • Virginie van Wassenhove

2
encoding ?
Is this a hard problem? Yes! If it could be
solved straightforwardly (e.g. by machine), Mark
Liberman would be in Tahiti having cold beers.
representation ?
3
  • Outline
  • (1) Fractionating the problem in space
  • Towards a functional anatomy of speech
    perception
  • Fractionating the problem in time
  • Towards a functional physiology of speech
    perception
  • - A hypothesis about the quantization of time
  • - Psychophysical evidence for temporal
    integration
  • - Imaging evidence

4
interface with lexical items, word recognition
5
interface with lexical items, word recognition
hypothesis about storage: distinctive features
-voice   voice   voice
labial   high    labial
-round   round   -round
.        .       .
6
production, articulation of speech
interface with lexical items, word recognition
hypothesis about storage: distinctive features
-voice   voice   voice
labial   high    labial
-round   round   -round
.        .       .
7
production, articulation of speech
hypothesis about production: distinctive features
-voice   voice
labial   high
.        .
interface with lexical items, word recognition
hypothesis about storage: distinctive features
-voice   voice   voice
labial   high    labial
-round   round   -round
.        .       .
8
production, articulation of speech: FEATURES
analysis of auditory signal -> spectro-temporal
rep. -> FEATURES
interface with lexical items, word recognition: FEATURES
9
Unifying concept: distinctive feature
auditory-motor interface
coordinate transform from acoustic to
articulatory space
production, articulation of speech
analysis of auditory signal -> spectro-temporal
rep. -> FEATURES
auditory-lexical interface
interface with lexical items, word recognition
10
coordinate transform from acoustic to
articulatory space
production, articulation of speech
analysis of auditory signal -> spectro-temporal
rep. -> FEATURES
interface with lexical items, word recognition
11
Area Spt (left) auditory-motor interface
pIFG/dPM (left) articulatory-based speech codes
STG (bilateral) acoustic-phonetic speech codes
pMTG (left) sound-meaning interface
Hickok & Poeppel (2000), Trends in Cognitive Sciences
Hickok & Poeppel (in press), Cognition
12
Indefrey & Levelt (in press), Cognition: meta-analysis
of neuroimaging data, perception/production overlap
Shared neural correlates of word production and
perception processes: bilateral mid/post STG, L
anterior STG, L mid/post MTG, L post IFG
  • MTG and IFG overlap when controlling for the
    overt/covert distinction across tasks
  • Hypothesized functions
  • lexical selection (MTG)
  • lexical phonological code retrieval (MTG)
  • post-lexical syllabification (IFG)

13
Scott & Johnsrude (2003)
14
Possible subregions of inferior frontal gyrus,
Burton (2001)
Auditory studies: Burton et al. (2000), Demonet et
al. (1992, 1994), Fiez et al. (1995), Zatorre et
al. (1992, 1996)
Visual studies: Sergent et al. (1992, 1993),
Poldrack et al. (1999), Paulesu et al. (1993, 1996),
Shaywitz et al. (1995)
15
Auditory lexical decision versus FM/sweeps (a),
CP/syllables (b), and rest (c)
[Figure: panels (a), (b), (c); slices labeled z6, z9, z12]
D. Poeppel et al. (in press)
16
fMRI (yellow blobs) and MEG (red dots) recordings
of speech perception show pronounced bilateral
activation of the temporal cortices
T. Roberts & D. Poeppel (in preparation)
17
Binder et al. 2000
18
Area Spt (left) auditory-motor interface
pIFG/dPM (left) articulatory-based speech codes
STG (bilateral) acoustic-phonetic speech codes
pMTG (left) sound-meaning interface
Hickok & Poeppel (2000), Trends in Cognitive Sciences
Hickok & Poeppel (in press), Cognition
19
  • Outline
  • (1) Fractionating the problem in space
  • Towards a functional anatomy of speech
    perception
  • Fractionating the problem in time
  • Towards a functional physiology of speech
    perception
  • - A hypothesis about the quantization of time
  • - Psychophysical evidence for temporal
    integration
  • - Imaging evidence

21
The local/global distinction in vision is
intuitively clear
Chuck Close
22
What information does the brain extract from
speech signals?
23
Acoustic and articulatory phonetic phenomena
occur on different time scales
fine structure
envelope
24
Does different granularity in time matter?
Segmental and subsegmental information: serial
order in speech
fool/flu, carp/crap, bat/tab
Supra-segmental information: prosody
"Sleep during lecture!" vs. "Sleep during lecture?"
25
The local/global distinction can be
conceptualized as a multi-resolution analysis in
time
Further processing
Binding process
Supra-segmental information (time 200ms)
Segmental information (time 20-50ms)
syllabicity
metrics
tone
features, segments
26
  • Outline
  • (1) Fractionating the problem in space
  • Towards a functional anatomy of speech
    perception
  • Fractionating the problem in time
  • Towards a functional physiology of speech
    perception
  • - A hypothesis about the quantization of time
  • - Psychophysical evidence for temporal
    integration
  • - Imaging evidence

27
Temporal integration windows
Psychophysical and electrophysiological evidence
suggests that perceptual information is integrated
and analysed in temporal integration windows
(v. Bekesy 1933; Stevens and Hall 1966; Näätänen
1992; Theunissen and Miller 1995; etc.). The
importance of the concept of a temporal integration
window is that it suggests the discontinuous
processing of information in the time domain. The
CNS, on this view, treats time not as a continuous
variable but as a series of temporal windows, and
extracts data from a given window.
arrow of time, physics
arrow of time, Central Nervous System
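The contrast between continuous time and a series of windows can be illustrated with a short sketch. The function name `integrate_in_windows`, the 1 kHz sampling rate, and the use of a per-window mean as the extracted quantity are illustrative assumptions, not something stated in the talk.

```python
import numpy as np

def integrate_in_windows(signal, fs, window_ms):
    """Split a signal into consecutive, non-overlapping windows and
    return one summary value (here: the mean) per window."""
    samples_per_window = int(round(fs * window_ms / 1000.0))
    n_windows = len(signal) // samples_per_window
    trimmed = signal[: n_windows * samples_per_window]
    return trimmed.reshape(n_windows, samples_per_window).mean(axis=1)

fs = 1000                            # sampling rate in Hz (assumed)
t = np.arange(fs) / fs               # one second of time stamps
signal = np.sin(2 * np.pi * 4 * t)   # a slow 4 Hz fluctuation

short = integrate_in_windows(signal, fs, window_ms=25)    # 40 windows
long_ = integrate_in_windows(signal, fs, window_ms=250)   # 4 windows
```

In this toy case the 25 ms windows still track the 4 Hz fluctuation, whereas each 250 ms window spans a full cycle and averages it away, so the two window sizes yield different descriptions of the same input.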
28
Asymmetric sampling/quantization of the speech
waveform
[Figure text: "This paper is hard to publish", segmented into chunks of different sizes]
29
Two spectrograms of the same word illustrate how
different analysis windows highlight different
aspects of the sounds.
(a) High time, low frequency resolution: each
glottal pulse visible as a vertical striation
(b) Low time, high frequency resolution: each
harmonic visible as a horizontal stripe
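A minimal sketch of the trade-off behind panels (a) and (b), assuming a 16 kHz sampling rate and analysis windows of 5 ms and 50 ms. These values and the helper name `window_resolution` are illustrative and are not taken from the slide.

```python
def window_resolution(window_ms, fs):
    """Return (n_samples, time_resolution_ms, frequency_resolution_hz)
    for an analysis window of the given length."""
    n_samples = int(round(fs * window_ms / 1000.0))
    time_res_ms = 1000.0 * n_samples / fs
    freq_res_hz = fs / n_samples  # spacing between DFT bins
    return n_samples, time_res_ms, freq_res_hz

fs = 16000  # sampling rate in Hz (assumed)
narrow = window_resolution(5, fs)    # short window, as in panel (a)
wide = window_resolution(50, fs)     # long window, as in panel (b)
```

With these assumed values the short window separates events 5 ms apart but has frequency bins 200 Hz wide, while the long window has 20 Hz bins but covers 50 ms of signal at once.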
30
Hypothesis: Asymmetric Sampling in Time (AST)
Left temporal cortical areas
preferentially extract information over 25ms
temporal integration windows. Right hemisphere
areas preferentially integrate over long,
150-250ms integration windows. By assumption,
the auditory input signal has a neural
representation that is bilaterally
symmetric (e.g. at the level of core); beyond the
initial representation, the signal is elaborated
asymmetrically in the time domain. Another way
to conceptualize the AST proposal is to say
that the sampling rate of non-primary auditory
areas is different, with LH sampling at high
frequencies (40Hz) and RH sampling at low
frequencies (4-10Hz).
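The pairing of window sizes with sampling frequencies follows from treating one integration window as one cycle of the associated oscillation. The sketch below assumes exactly that one-window-per-cycle reading; the function names are hypothetical.

```python
def window_to_rate_hz(window_ms):
    """Oscillation frequency whose period equals the window length."""
    return 1000.0 / window_ms

def rate_to_window_ms(rate_hz):
    """Window length equal to one period of the given frequency."""
    return 1000.0 / rate_hz

lh_rate = window_to_rate_hz(25)       # short LH window
rh_rate = window_to_rate_hz(250)      # long RH window
rh_window_at_10hz = rate_to_window_ms(10)
```

Under this reading a 25 ms window corresponds to 40 Hz and a 250 ms window to 4 Hz, and the upper end of the 4-10Hz range corresponds to a 100 ms window.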
31
a. Physiological lateralization
Symmetric representation of spectro-temporal
receptive fields in primary auditory cortex
Temporally asymmetric elaboration of perceptual
representations in non-primary cortex
[Figure: LH and RH panels. Vertical axis: proportion of
neuronal ensembles. Horizontal axis: size of temporal
integration windows (ms), 25 to 250, with associated
oscillatory frequency (Hz), 40Hz to 4Hz]
32
Asymmetric sampling in time (AST): characteristics
  • AST is an example of functional segregation, a
    standard concept.
  • AST is an example of multi-resolution analysis, a
    signal processing strategy common in other
    cortical domains (cf. visual areas MT and V4
    which, among other differences, have phasic
    versus tonic firing properties, respectively).
  • AST speaks to the granularity of perceptual
    representations: the model suggests that there
    exist basic perceptual representations that
    correspond to the different temporal windows
    (e.g. featural info is as basic as the
    envelope of syllables, on this view).
  • The AST model connects in plausible ways to the
    local versus global distinction: there are multiple
    representations of a given signal on different
    scales (cf. wavelets).
  • Global => large-chunk analysis, e.g., syllabic level
  • Local => small-chunk analysis, e.g., subsegmental level
33
a. Physiological lateralization
Symmetric representation of spectro-temporal
receptive fields in primary auditory cortex
Temporally asymmetric elaboration of perceptual
representations in non-primary cortex
[Figure: LH and RH panels. Vertical axis: proportion of
neuronal ensembles. Horizontal axis: size of temporal
integration windows (ms), 25 to 250, with associated
oscillatory frequency (Hz), 40Hz to 4Hz]
b. Functional lateralization
LH: analyses requiring high temporal resolution
RH: analyses requiring high spectral resolution
34
  • Outline
  • (1) Fractionating the problem in space
  • Towards a functional anatomy of speech
    perception
  • Fractionating the problem in time
  • Towards a functional physiology of speech
    perception
  • - A hypothesis about the quantization of time
  • AST model
  • - Psychophysical evidence for temporal
    integration
  • - Imaging evidence

35
Perception of FM sweeps
Huan Luo, Mike Gordon, Anthony Boemio, David Poeppel
36
FM Sweep Example
waveform
80msec, from 3-2 kHz, linear FM sweep
spectrogram
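A stimulus like the one described on this slide could be generated roughly as follows. This is a sketch only: the 44.1 kHz sampling rate, the unit amplitude, the absence of onset/offset ramps, and the function name `linear_fm_sweep` are assumptions, not details from the talk.

```python
import numpy as np

def linear_fm_sweep(f_start, f_end, duration_s, fs=44100):
    """Linear FM sweep: instantaneous frequency moves linearly from
    f_start to f_end (Hz) over duration_s seconds."""
    n = int(round(duration_s * fs))
    t = np.arange(n) / fs
    # Phase is the integral of instantaneous frequency f(t) = f_start + k * t.
    k = (f_end - f_start) / duration_s
    phase = 2 * np.pi * (f_start * t + 0.5 * k * t ** 2)
    return np.sin(phase)

down = linear_fm_sweep(3000, 2000, 0.080)  # 80 ms, 3 to 2 kHz
up = linear_fm_sweep(2000, 3000, 0.080)    # same span, opposite direction
```

The two sweeps cover the same frequency span over the same duration and differ only in direction.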
37
The rationale
  • Important cues for speech perception
  • Formant transition in speech sounds
  • (For example, F2 direction can distinguish
    /ba/ from /da/)
  • Importance in tone languages
  • Vertebrate auditory system is well equipped to
    analyze FM signals.

38
Tone languages
  • For example, Chinese, Thai
  • The direction of FM (of the fundamental
    frequency) is important in the language to make
    lexical distinctions.
  • (Four tones in Chinese)
  • /Ma 1/, /Ma 2/ , /Ma 3/, /Ma 4/

39
Questions
  • How good are we at discriminating these signals?
  • determine the threshold of the duration of
    stimuli (corresponding to rate) for the
    detection of FM direction
  • Any performance difference between UP and DOWN
    detection?
  • Will language experience affect the performance
    of such a basic perceptual ability?

40
Stimuli
  • Linearly frequency modulated
  • Frequency range studied 2-3 kHz (0.5 oct)
  • Two directions (Up / Down )
  • Changing FM rate (frequency range/time) by
    changing duration. For each frequency range, the
    frequency span is kept constant (slow / fast)
  • Stimulus duration from 5 msec (100 oct/sec) to
    640 msec (0.8 oct/sec)

Tasks
Detection and discrimination of UP versus
DOWN: 2AFC, 2IFC, 3IFC
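The FM rates quoted for the shortest and longest stimuli follow from dividing the fixed frequency span by the duration. The sketch below uses the 0.5 oct span given on the slide; the doubling of durations between 5 and 640 msec is an assumption made for illustration, and `fm_rate_oct_per_s` is a hypothetical helper.

```python
def fm_rate_oct_per_s(span_oct, duration_ms):
    """FM rate in octaves per second for a fixed frequency span."""
    return span_oct * 1000.0 / duration_ms

span_oct = 0.5  # frequency span as stated on the slide
durations_ms = [5 * 2 ** i for i in range(8)]  # 5, 10, ..., 640 (assumed steps)
rates = [fm_rate_oct_per_s(span_oct, d) for d in durations_ms]
```

This gives 100 oct/sec at 5 msec and about 0.78 oct/sec at 640 msec, matching the rounded values on the slide.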
41
  • English speakers
  • 3 frequency ranges relevant to speech
  • (approximately F1, F2, F3 ranges)
  • single-interval 2-AFC
  • Two main findings
  • threshold for UP at 20ms
  • UP better than DOWN

[Figure panels: 2-3 kHz, 1-1.5 kHz, 600-900 Hz]
Gordon & Poeppel (2001), JASA-ARLO
42
2IFC
  • To eliminate the possibility that subjects use a
    biased response strategy
  • To see whether the asymmetric performance of
    English subjects is due to a bias towards UP
    responses

The two sounds have the same duration, so the only
difference is direction
Interval 1: UP
Interval 2: Down
Which interval (1 or 2) contains the sound with a
given direction?
43
Results for Chinese Subjects
No significant difference: threshold for both UP
and DOWN is about 20 msec
44
Results for English Subjects
No difference now between UP and DOWN: threshold
for both at 20 msec. No difference between Chinese
and English subjects now.
45
3IFC
Standard: UP
Interval 1: UP
Interval 2: Down
Choose which interval contains the sound that is
DIFFERENT among the three (different quality rather
than only direction)
46
3 IFC versus 2 IFC
No difference between Chinese and English subjects
Threshold confirmed at 20ms
47
Conclusion
  • Importance of 20 msec as the threshold for
    discrimination of FM sweeps
  • - corresponds to temporal order threshold
    determined by Hirsh 1959
  • - consistent with Schouten 1985, 1989 testing FM
    sweeps
  • - this basic threshold arguably reflects the
    shortest integration window that generates robust
    auditory percepts.

48
Click trains
Anthony Boemio, David Poeppel
49
Click Stimuli
50
Psychophysics
51
Auditory-visual integration: the McGurk effect
Virginie van Wassenhove, Ken Grant, David Poeppel
52
McGurk Effect
  • Audiovisual (AV) token
  • Visual (V) token
  • Auditory (A) token

53
Identification Task (3AFC): ApVk
[Figure labels: TWI; true bimodal responses]
Response rate as a function of SOA (ms) in the
ApVk McGurk pair. Mean responses (N = 21) and
standard errors. Fusion rate (open red squares)
and corrected fusion rate (filled red squares,
dotted line) are /ta/ responses, visually driven
responses (open green triangles) are /ka/, and
auditorily driven responses (filled blue circles)
are /pa/. A negative value in corrected fusion
rate is interpreted as a visually dominated error
response /ta/.
54
Simultaneity Judgment Task (2AFC): ApVk vs. AtVt
and AbVg vs. AdVd
Simultaneity judgment task. Simultaneity
judgment as a function of SOA (ms) in both
incongruent and congruent conditions (ApVk and
AtVt, N = 21; AbVg and AdVd, N = 18). The congruent
conditions (open symbols) are associated with a
broader and higher simultaneity judgment profile
than the incongruent conditions (filled symbols).
55
Temporal Window of Integration (TWI) across
Tasks and Bimodal Speech Stimuli
56
  • Outline
  • (1) Fractionating the problem in space
  • Towards a functional anatomy of speech
    perception
  • Fractionating the problem in time
  • Towards a functional physiology of speech
    perception
  • - A hypothesis about the quantization of time
  • AST model
  • - Psychophysical evidence for temporal
    integration
  • FM sweeps and click trains 20-30ms
    integration
  • AV processing in McGurk 200ms integration
  • - Imaging evidence

57
Binding of Temporal Quanta in Speech Processing
Maria Chait, Steven Greenberg, Takayuki Arai,
David Poeppel
58
Multi Resolution Analysis Hypothesis
SYLLABLE
Binding process
Supra- segmental information (t.s 300 ms)
(Sub)-segmental information (t.s 30 ms)
stress
tone
syllabicity
feature
60
  • 0-6 kHz
  • 14 channels
  • spaced in 1/3 octave steps along the cochlear
    frequency map.
  • Every two neighboring channels are separated by
    50 Hz

61
Envelope Extraction
Amplitude
Time
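One common way to extract an amplitude envelope is to take the magnitude of the analytic signal. The talk does not say which method was used, so the FFT-based Hilbert approach below is an assumed stand-in, shown on a synthetic amplitude-modulated tone rather than on speech.

```python
import numpy as np

def envelope(x):
    """Amplitude envelope as the magnitude of the analytic signal,
    computed with an FFT-based Hilbert transform."""
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0                       # keep DC
    if n % 2 == 0:
        gain[n // 2] = 1.0              # keep Nyquist
        gain[1 : n // 2] = 2.0          # double positive frequencies
    else:
        gain[1 : (n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * gain))

fs = 8000                               # sampling rate in Hz (assumed)
t = np.arange(fs) / fs                  # one second
modulator = 1.0 + 0.5 * np.cos(2 * np.pi * 4 * t)   # 4 Hz envelope
x = modulator * np.cos(2 * np.pi * 1000 * t)        # 1 kHz carrier
env = envelope(x)
```

For this synthetic signal the recovered envelope matches the 4 Hz modulator; in a multi-channel scheme the same step would be applied to each band-passed channel separately.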
63
Original
High Passed
Low Passed
64
Evidence
  • Comodulation masking release
  • Ahissar et al. (2001) - Phase locking in the
    auditory cortex to the envelope of sentence
    stimuli.
  • Shannon (1995)
  • Drullman (1994)

Effect of low-pass filtering the envelope on
speech reception: severe reduction at 0-2 Hz
cutoff frequencies; marginal contribution of
frequencies above 16 Hz
Effect of high-pass filtering the envelope:
reduction in speech intelligibility for cutoff
frequencies above 64 Hz; no reduction in sentence
intelligibility when only frequencies below 4 Hz
are reduced
65
Experiment 1
  • Stimuli
  • - 53 sentences from the IEEE corpus.
  • - Nonsense syllables (CUNY)
  • 8 blocks: 2 (voiced/voiceless) x 2
    vowels (/a/, /i/) x 2 (CV/VC)
  • - 3 manipulations
  • 0-3 Hz low-pass
  • 22-40 Hz band-pass
  • 0-3 and 22-40 Hz
  • Each subject hears all 53 sentences but only one
    manipulation per sentence. A practice block of
    26 sentences precedes the experiment.
  • Task
  • Sentences: subjects asked to write down what they
    heard as precisely as they can
  • Syllables: 7-alternative forced choice
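The three manipulations amount to keeping only certain modulation frequencies of an envelope. The sketch below illustrates the idea with a brick-wall FFT filter on a synthetic envelope; the filter type, the 1 kHz envelope sampling rate, and the function name `filter_modulations` are assumptions, and the filters actually used in the experiment may differ.

```python
import numpy as np

def filter_modulations(env, fs, bands):
    """Keep only envelope modulation frequencies inside the given
    (low_hz, high_hz) bands, using a brick-wall FFT filter."""
    spectrum = np.fft.rfft(env)
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    keep = np.zeros(len(freqs), dtype=bool)
    for low_hz, high_hz in bands:
        keep |= (freqs >= low_hz) & (freqs <= high_hz)
    return np.fft.irfft(spectrum * keep, n=len(env))

fs = 1000                                # envelope sampling rate (assumed)
t = np.arange(2 * fs) / fs               # two seconds
slow = 0.5 * np.cos(2 * np.pi * 2 * t)   # 2 Hz modulation
mid = 0.2 * np.cos(2 * np.pi * 10 * t)   # 10 Hz modulation
fast = 0.3 * np.cos(2 * np.pi * 30 * t)  # 30 Hz modulation
env = 1.0 + slow + mid + fast

low_pass = filter_modulations(env, fs, [(0, 3)])
band_pass = filter_modulations(env, fs, [(22, 40)])
combined = filter_modulations(env, fs, [(0, 3), (22, 40)])
```

In this toy example the low-pass condition keeps the mean and the 2 Hz component, the band-pass condition keeps only the 30 Hz component, and the combined condition keeps both while discarding the 10 Hz component.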

66
Results
high-pass
67
Results
high-pass
low-pass
68
Results
high-pass plus low-pass?
high-pass
low-pass
69
Results
high-pass
low-pass
high-pass plus low-pass?
Result reflects the interaction between
information carried on the short and long time
scales.
70
  • Outline
  • (1) Fractionating the problem in space
  • Towards a functional anatomy of speech
    perception
  • Fractionating the problem in time
  • Towards a functional physiology of speech
    perception
  • - A hypothesis about the quantization of time
  • AST model
  • - Psychophysical evidence for temporal
    integration
  • FM sweeps and click trains 20-30ms
    integration
  • AV processing in McGurk 200ms integration
  • Interaction of temporal windows

71
fMRI study of temporal structure in concatenated FMs
Anthony Boemio, Allen Braun, Steven Fromm, David Poeppel

72
Stimulus Properties
73
Stimulus Properties
[Figure: spectrograms, amplitude vs. time (sec), and
PSDs vs. frequency (Hz) for the FM, TONE, and CNST
stimuli]
All 13 stimuli have nearly identical long-term
spectra and RMS power over the entire 9-second
stimulus duration. Stimuli differ only in
segment duration which was determined by drawing
from a Gaussian distribution (previous panel),
with means of 12, 25, 45, 85, 160, and 300ms.
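Drawing segment durations from a Gaussian until the 9-second stimulus is filled could look like the sketch below. The standard deviation (20% of the mean), the 1 ms lower bound, the truncation of the last segment, and the function name are all assumptions; the slide gives only the means and the total duration.

```python
import numpy as np

def draw_segment_durations(mean_ms, sd_ms, total_ms=9000.0, seed=0):
    """Draw Gaussian-distributed segment durations (ms) until they
    fill total_ms; the final segment is shortened to fit exactly."""
    rng = np.random.default_rng(seed)
    durations = []
    remaining = total_ms
    while remaining > 0:
        d = rng.normal(mean_ms, sd_ms)
        d = min(max(d, 1.0), remaining)  # keep positive, do not overshoot
        durations.append(d)
        remaining -= d
    return np.array(durations)

means_ms = [12, 25, 45, 85, 160, 300]    # mean segment durations from the slide
stimuli = {m: draw_segment_durations(m, sd_ms=0.2 * m) for m in means_ms}
```

Each entry of `stimuli` holds the segment durations for one condition; shorter mean durations produce many more segments, and hence more segment transitions, within the same 9 seconds.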
74
fMRI
Single-trial sparse acquisition paradigm (clustered volume acquisition)
1.5T GE Signa, echo-planar sequence
11.4s TR (9s signal, 2.4s volume), TE 40ms
24 reps/condition
SPM 99 random-effects model, p < 0.05 corrected
75
SPM 99 Cohort Analysis
FMs-CNST Categorical Contrasts (p < 0.05 corr.)
78
Hemodynamic response/stimulus model
Not all segment transitions are equal.
Only 1 second of stimuli is shown for clarity
[Figure label: acquisition]
Including the segment transitions and the segments
themselves, but assuming that transitions between
long segments contribute more to the response
than transitions between shorter ones, produces the
observed activation vs. segment-duration relation (left).
Threshold set by categorical contrast to CNST
stimulus: anything below this level will be zero
in the SPM
80
MEG study of spectral responses to complex sounds
David Poeppel, Huan Luo, Dana Ritter, Anthony Boemio,
Didier Depireux, Jonathan Simon

81
The asymmetric sampling in time (AST) hypothesis
predicts electrophysiological asymmetries in
specific frequency bands, gamma (25-55Hz) and
theta (3-8Hz), because the hypothesized temporal
quantization is reflected as oscillatory activity.
[Figure: LH and RH panels. Vertical axis: sensitivity of
neuronal ensembles. Horizontal axis: size of temporal
integration windows (ms), 25 to 250, with associated
oscillatory frequency (Hz), 40Hz to 4Hz]
85
Flow chart
[Box labels: LH, RH, RMS, Gamma for LH, Gamma for RH,
Theta for LH, Theta for RH]
87
Multi-taper spectral analysis
88
Result
89
Power ratio in specific frequency bands:
P(L) / (P(L) + P(R))

         Kaiser   Remez    Elliptic
Gamma    0.4769   0.4751   0.4733
Theta    0.3958   0.3965   0.4210

  • The difference is much greater in the theta band (low
    frequency band), where RH activation is greater
    than LH
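The power ratio above can be computed for any pair of left and right signals once band power is defined. The sketch below selects FFT bins directly instead of using the Kaiser, Remez, or elliptic band-pass filters named in the table, and it runs on synthetic sinusoids rather than MEG data; the function names and the 500 Hz sampling rate are assumptions.

```python
import numpy as np

def band_power(x, fs, low_hz, high_hz):
    """Summed spectral power of x between low_hz and high_hz."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    mask = (freqs >= low_hz) & (freqs <= high_hz)
    return np.sum(np.abs(spectrum[mask]) ** 2)

def left_power_ratio(left, right, fs, low_hz, high_hz):
    """P(L) / (P(L) + P(R)); values below 0.5 mean more power on the right."""
    p_left = band_power(left, fs, low_hz, high_hz)
    p_right = band_power(right, fs, low_hz, high_hz)
    return p_left / (p_left + p_right)

fs = 500                                  # sampling rate in Hz (assumed)
t = np.arange(2 * fs) / fs                # two seconds
gamma_wave = np.sin(2 * np.pi * 40 * t)
theta_wave = np.sin(2 * np.pi * 5 * t)
left = gamma_wave + 0.8 * theta_wave      # weaker theta on the left (synthetic)
right = gamma_wave + 1.0 * theta_wave

gamma_ratio = left_power_ratio(left, right, fs, 25, 55)
theta_ratio = left_power_ratio(left, right, fs, 3, 8)
```

In this synthetic case the gamma ratio is 0.5 (no asymmetry) and the theta ratio falls below 0.5, which is the direction of the theta-band effect reported in the table.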

90
Distribution of spectral responses
91
  • Outline
  • (1) Fractionating the problem in space
  • Towards a functional anatomy of speech
    perception
  • Fractionating the problem in time
  • Towards a functional physiology of speech
    perception
  • - A hypothesis about the quantization of time
  • AST model
  • - Psychophysical evidence for temporal
    integration
  • FM sweeps and click trains 20-30ms
    integration
  • AV processing in McGurk 200ms integration
  • Interaction of temporal windows

92
Area Spt (left) auditory-motor interface
pIFG/dPM (left) articulatory-based speech codes
STG (bilateral) acoustic-phonetic speech codes
pMTG (left) sound-meaning interface
Hickok & Poeppel (2000), Trends in Cognitive Sciences
Hickok & Poeppel (in press), Cognition
93
Asymmetric sampling in time (AST) builds on
anatomical symmetry but permits functional
asymmetry
a. Physiological lateralization
Symmetric representation of spectro-temporal
receptive fields in primary auditory cortex
Temporally asymmetric elaboration of perceptual
representations in non-primary cortex
[Figure: LH and RH panels. Vertical axis: proportion of
neuronal ensembles. Horizontal axis: size of temporal
integration windows (ms), 25 to 250, with associated
oscillatory frequency (Hz), 40Hz to 4Hz]
b. Functional lateralization
LH: analyses requiring high temporal resolution
RH: analyses requiring high spectral resolution
94
  • Conclusion
  • The input signal (e.g. speech) must interface with
    higher-order symbolic representations of different
    types (e.g. segmental representations relevant to
    lexical access and supra-segmental representations
    relevant to interpretation).
  • These higher-order representation categories appear
    to be lateralized (e.g. segmental phonology/LH,
    phrasal prosody/RH).
  • The timing-based asymmetry provides a possible
    cortical logistical or administrative device that
    helps create representations of the appropriate
    granularity.
  • If this is on the right track, syllable is - at
    least for perception - as elementary a unit as
    feature/segment. Both are basic.

95
Analysis-by-synthesis I
Hypothesize-and-test models
96
Analysis-by-synthesis II
Analysis-by-synthesis model of lexical hypothesis
generation and verification (adapted and
extended from Klatt, 1979)
[Diagram labels: analysis-by-synthesis verification
(internal forward model); best-scoring lexical
candidates; peripheral and central neurogram;
partial feature matrix; lexical hypotheses;
spectral analysis; segmental analysis; lexical
search; synt./seman. analysis; speech waveform;
predicted subsequent items; acceptable word string]
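The hypothesize-and-test idea can be illustrated with a toy sketch: lexical hypotheses are run through a forward model, and the predicted features are compared against a partial feature matrix. Everything here is hypothetical (the three-word lexicon, the feature labels, and the scoring rule) and is not a rendering of Klatt's model or of the diagram.

```python
# Toy lexicon: each word maps to one feature set per segment (hypothetical values).
LEXICON = {
    "bat": [{"voice", "labial"}, {"voice", "low"}, {"coronal"}],
    "pat": [{"labial"}, {"voice", "low"}, {"coronal"}],
    "bad": [{"voice", "labial"}, {"voice", "low"}, {"voice", "coronal"}],
}

def synthesize(word):
    """Stand-in forward model: predict the feature matrix for a hypothesis."""
    return LEXICON[word]

def score(predicted, observed):
    """Reward observed features the prediction explains and penalize
    observed features it does not; unobserved features cost nothing."""
    if len(predicted) != len(observed):
        return float("-inf")
    total = 0
    for pred_seg, obs_seg in zip(predicted, observed):
        total += len(obs_seg & pred_seg) - len(obs_seg - pred_seg)
    return total

def best_candidates(observed):
    """Return the best-scoring lexical candidates, sorted alphabetically."""
    scores = {word: score(synthesize(word), observed) for word in LEXICON}
    top = max(scores.values())
    return sorted(word for word, s in scores.items() if s == top)

# Partial feature matrix: voicing of the final segment was not recovered.
partial = [{"voice", "labial"}, {"low"}, {"coronal"}]
# Fuller matrix: final voicing is now available.
fuller = [{"voice", "labial"}, {"low"}, {"voice", "coronal"}]

ambiguous = best_candidates(partial)
resolved = best_candidates(fuller)
```

With the partial matrix two candidates tie, and adding the missing feature leaves one, which is the sense in which verification narrows the set of lexical hypotheses.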
97
Analysis-by-synthesis III
[Diagram labels: analysis-by-synthesis verification
(internal forward model); best-scoring lexical
candidates; peripheral and central neurogram;
partial feature matrix; lexical hypotheses;
spectral analysis; segmental analysis; lexical
search; synt./seman. analysis; speech waveform;
predicted subsequent items; acceptable word string]