Title: Hallucinations in Auditory Perception
1 Hallucinations in Auditory Perception!!!
- Malcolm Slaney
- Yahoo! Research
- Stanford CCRMA
2 Hadoop
5 One Dimensional (waveform)
Axes: Pressure vs. Time
Cochlear Processing
Two Dimensional (not a spectrogram)
Axes: Cochlear Place vs. Time
Correlogram Processing
Three Dimensional (neural movie)
Axes: Cochlear Place vs. Time vs. Autocorrelation Lag
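A minimal sketch of the correlogram step, assuming the cochlear stage has already produced one signal per cochlear place. The toy channels below are half-wave rectified sinusoids standing in for real cochlear-model output, and the function names, window length, and lag range are illustrative choices, not taken from the talk.

```python
import math

def short_time_autocorrelation(channel, start, window, max_lag):
    """Autocorrelation of one cochlear channel over a short window."""
    frame = channel[start:start + window + max_lag]
    return [
        sum(frame[n] * frame[n + lag] for n in range(window))
        for lag in range(max_lag)
    ]

def correlogram_frame(channels, start, window, max_lag):
    """One frame of the 'neural movie': rows = cochlear place, columns = lag."""
    return [short_time_autocorrelation(ch, start, window, max_lag)
            for ch in channels]

# Toy stand-in for cochlear output (an assumption): two channels carrying
# a 100 Hz periodicity, half-wave rectified, sampled at 8 kHz.
fs = 8000
signal = [max(0.0, math.sin(2 * math.pi * 100 * n / fs)) for n in range(800)]
channels = [signal, [0.5 * s for s in signal]]

frame = correlogram_frame(channels, start=0, window=400, max_lag=160)

# Skip the trivial peak near lag 0 and find the strongest repetition.
best_lag = max(range(40, 160), key=lambda lag: frame[0][lag])
print(best_lag, fs / best_lag)  # lag in samples, implied periodicity in Hz
```

Repeating this for successive values of `start` gives the third (time) dimension: one place-by-lag image per time step.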
6 Correlogram
Axes: Distance down cochlea / Center Frequency; Time Interval (s) / Autocorrelation Lag
With help from Richard O. Duda
7 Correlogram
8 Success
- Reconstructing from correlogram
- NIPS Keynote
9 Problems
- Continuation
- Tone and Noise
- Parliament Cough
- Hear two voices?
- What do you hear?
- Waveforms?
- Ideas?
10 Pressure vs. Time
Cochlear Processing
Cochlear Place vs. Time
Correlogram Processing
Cochlear Place vs. Time vs. Autocorrelation Lag
11 Speech Examples
Wedding
Sine
Natural
12 What Vowel is This?
Word 1
Word 2
Peter Ladefoged
Word 3
16 McGurk
17 Sinewave
20 ASR
Language model for the words one, two, three
Word model showing phonemes for the word one: /w/ /ʌ/ /n/
Acoustic (phoneme) model for the phoneme /ʌ/: states S1, S2, S3
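A minimal sketch of how a three-state phoneme model of this kind can score an observation sequence, assuming S1, S2, S3 form a conventional left-to-right hidden Markov model with discrete observations. All probabilities and the two observation symbols are made up for illustration; none of them come from the slide.

```python
def forward_likelihood(observations, start, trans, emit):
    """Total probability of an observation sequence under a discrete HMM."""
    n_states = len(start)
    alpha = [start[s] * emit[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            emit[j][obs] * sum(alpha[i] * trans[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)

# Hypothetical left-to-right model: S1 -> S2 -> S3, no backward jumps.
start = [1.0, 0.0, 0.0]
trans = [
    [0.6, 0.4, 0.0],
    [0.0, 0.6, 0.4],
    [0.0, 0.0, 1.0],
]
# Two made-up acoustic symbols: 0 = "transition-like", 1 = "steady-like".
emit = [
    [0.9, 0.1],  # S1 mostly emits symbol 0
    [0.2, 0.8],  # S2 mostly emits symbol 1
    [0.7, 0.3],  # S3 mostly emits symbol 0
]

in_order = forward_likelihood([0, 1, 1, 0], start, trans, emit)
reversed_order = forward_likelihood([1, 0, 0, 1], start, trans, emit)
print(in_order > reversed_order)  # the sequence matching the state order scores higher
```

In a full recognizer the word model chains phoneme models together and the language model weights the word sequences; this sketch covers only the lowest level.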
21 Conventional Scene Analysis
Slide by Dan Ellis (Columbia)
22 Barker ASR
23 Goto CASA with MIDI
MIDI Sequence
24 Old plus New Principle
Slide by Dan Ellis (Columbia)
25 Ellis Prediction Driven
26 Saliency
27 Saliency Example
- Time-frequency display
- Saliency map shows high-interest locations
28 Saliency Maps
- Longer tones better
- Missing parts salient
- Modulation more salient
- Forward masking works
29 Sound Examples
- Birds
- Calls
- Cows
- Horse
- Waterfall
30 Saliency Comparison
- Details of saliency comparison
- Model predictions
31 Relational Network (Simple)
- Patches of neurons
- Each measures one quantity
- Bidirectional relations for feedback/feedforward
- Thanks to Rodney Douglas
32 Relational Network (example)
33 ASR Relational Network
Bidirectional links enforce phoneme/word constraints
Phone Recognizer
Cochlea
Word Recognizer
Phone Recognizer
Delay
A patch of neurons (one of N output)
Note: We don't know how to represent delays
34 Desired Results
Relational Feedback
With
/A/ Phoneme Patch
/I/ Phoneme Patch
AI Word Patch
IA Word Patch
Phoneme Input
35 Simulation
36 Simulation 2
37 Simulation 3
38 Grossberg ART
39 Statistical Means
- ICA
- Different distributions
- One Microphone
- GMM models of distribution
40 Conventional
41 Better?
42 Thanks
44 Pitch
45 Silicon Frequency Response
- Tone ramps into two cochleas
46 Cochlear Best Frequency
47 Cochlear Rate Profiles
Spikes per utterance
Left Cochlea
Right Cochlea
48 Hardware Overview
Phoneme
Word
Cochlea
Learning
PCI-AER (for remapping)
Learning
Cochlea
Learning
Giacomo Indiveri
Shih-Chii Liu
PCI-AER (for remapping)
Implemented in MATLAB
50 LSH Movie
51 Auditory Map
By Lloyd Watts
52 Please do more Neurophysiology!
David
Jerry
Prabhakar
61 Timbre definition
- Sound color
- Instruments
- Vowels
All sound
Static
Timbre
Pitch
Dynamic
Loudness
62 Multi-Dimensional Scaling of Timbre
- Measure distances
- Estimate positions
- Art: label axis
Decay
Spectral centroid
Spectral flux
McAdams et al. (1995)
63 Desired perception model
- Compact (parsimonious)
- Three Properties
- Predictive
- Explain distance perception
- Simple model
- Orthogonal axis
- Linear model
- Interpolate sounds
Test Euclidean distance
Assumption
64 Experimental Contrast
Guess a model that fits the data
Sound
Perception
Model
Parameter space
Perception
65 Spectral shape using MFCC
Time (frames)
- A huge tapestry hung in her hallway.
66 MFCC and LFC
67 Kernel function of DCT
- Spectrum: superposition of DCT kernels
- Cepstrum coefficients: coefficients for superposition
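A minimal sketch of the superposition idea, assuming an orthonormal DCT-II over a small number of frequency bins. The eight-bin spectral shape is invented for illustration, and no mel or linear frequency warping is applied, so this shows only the kernel-and-coefficient relationship rather than a full MFCC or LFC pipeline.

```python
import math

def dct_kernel(k, n_bins):
    """k-th DCT-II kernel sampled at n_bins points along the frequency axis."""
    return [math.cos(math.pi * k * (n + 0.5) / n_bins) for n in range(n_bins)]

def kernel_scale(k, n_bins):
    """Normalization that makes the kernels orthonormal."""
    return math.sqrt((1.0 if k == 0 else 2.0) / n_bins)

def cepstrum(log_spectrum):
    """Project a (log) spectral shape onto every kernel."""
    n_bins = len(log_spectrum)
    return [
        kernel_scale(k, n_bins)
        * sum(s * b for s, b in zip(log_spectrum, dct_kernel(k, n_bins)))
        for k in range(n_bins)
    ]

def spectrum_from_cepstrum(coeffs, n_bins):
    """Rebuild the spectral shape as a weighted sum of kernels."""
    spectrum = [0.0] * n_bins
    for k, c in enumerate(coeffs):
        kernel = dct_kernel(k, n_bins)
        scale = kernel_scale(k, n_bins)
        for n in range(n_bins):
            spectrum[n] += scale * c * kernel[n]
    return spectrum

# Hypothetical 8-bin log-spectral shape.
shape = [3.0, 2.5, 2.0, 2.2, 1.5, 1.0, 0.8, 0.5]
coeffs = cepstrum(shape)
rebuilt = spectrum_from_cepstrum(coeffs, len(shape))     # all kernels: exact
smooth = spectrum_from_cepstrum(coeffs[:3], len(shape))  # first 3 kernels: smoothed
print([round(v, 3) for v in smooth])
```

Keeping only the low-order coefficients yields a smooth spectral envelope, which is why a handful of cepstrum coefficients can serve as a compact description of spectral shape.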
68 Parameter space MFCC
Axis labels: C60, C30 (tick values 0.25, 0.5, 0.75 on each axis)
69 Parameter space LFC
Axis labels: C60, C30 (tick values 0.25, 0.5, 0.75 on each axis)
70 Synthesize stimuli
- Harmonics pitch and vibrato
- Amplitude weighted by the spectral shape
flat
weighted
Desired spectral shape: vertical axis is frequency, horizontal axis is amplitude
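A minimal sketch of this kind of stimulus: a sum of harmonics of a fundamental with slow vibrato, each harmonic scaled by a spectral-shape weight. The fundamental, vibrato rate and depth, sample rate, and the 1/h "weighted" shape are all assumptions for illustration; the slide does not give the values actually used.

```python
import math

def synthesize(f0, shape, duration=0.5, fs=16000, vib_rate=5.0, vib_depth=0.01):
    """Sum of harmonics of f0 with vibrato; harmonic h gets amplitude shape[h-1]."""
    n_samples = int(duration * fs)
    out = []
    phase = 0.0  # running phase of the fundamental, in radians
    for n in range(n_samples):
        t = n / fs
        # Vibrato: the instantaneous pitch wobbles around f0.
        inst_f0 = f0 * (1.0 + vib_depth * math.sin(2 * math.pi * vib_rate * t))
        phase += 2 * math.pi * inst_f0 / fs
        out.append(sum(a * math.sin(h * phase)
                       for h, a in enumerate(shape, start=1)))
    return out

n_harmonics = 10  # 10 harmonics of 220 Hz stay well below the Nyquist frequency
flat_shape = [1.0] * n_harmonics
weighted_shape = [1.0 / h for h in range(1, n_harmonics + 1)]

flat = synthesize(220.0, flat_shape)
weighted = synthesize(220.0, weighted_shape)
print(len(flat), max(abs(s) for s in weighted))
```

Because every harmonic is driven from the same running phase, the vibrato moves all partials together and only the amplitude weights differ between the two stimuli.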
71 Experiment procedures
- Paired stimuli (AB, AG, AD, )
- Rate dissimilarities using 0-9 scale
- 10 subjects
- Quiet office
- Individual sessions (headphone)
72 Euclidean Fitting
- 2D linear regression
- Known values x, y, d - estimate a and b
- Residual from Euclidean model
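A minimal sketch of one way such a fit could be set up. The slide does not state the model equation, so this assumes a weighted Euclidean form d^2 = a*x^2 + b*y^2, where x and y are the parameter differences between the two stimuli in a pair and d is the rated dissimilarity. The function name and the synthetic data are hypothetical.

```python
import math

def fit_weighted_euclidean(xs, ys, ds):
    """Least-squares a, b for d^2 ~ a*x^2 + b*y^2 (assumed model form)."""
    u = [x * x for x in xs]
    v = [y * y for y in ys]
    t = [d * d for d in ds]
    # Normal equations for a two-parameter linear regression without intercept.
    suu = sum(p * p for p in u)
    svv = sum(q * q for q in v)
    suv = sum(p * q for p, q in zip(u, v))
    sut = sum(p * r for p, r in zip(u, t))
    svt = sum(q * r for q, r in zip(v, t))
    det = suu * svv - suv * suv
    a = (sut * svv - svt * suv) / det
    b = (svt * suu - sut * suv) / det
    # Residuals show how far the ratings depart from the Euclidean model.
    residuals = [r - (a * p + b * q) for p, q, r in zip(u, v, t)]
    return a, b, residuals

# Synthetic check: distances generated from a = 4, b = 9 should be recovered.
xs = [0.25, 0.5, 0.75, 0.0, 0.0, 0.25, 0.5]
ys = [0.0, 0.0, 0.0, 0.25, 0.5, 0.25, 0.75]
ds = [math.sqrt(4.0 * x * x + 9.0 * y * y) for x, y in zip(xs, ys)]

a, b, residuals = fit_weighted_euclidean(xs, ys, ds)
print(round(a, 6), round(b, 6))
```

With real ratings the residuals would not vanish; their size is what indicates how well a Euclidean model explains the perceived distances.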
73 Results summary
Tristimulus model
LFC
MFCC
74 Experiment results
- MFCC most successful timbre model
- Less linearity for higher coefficients
76 Remix Examples
Abba Gimme Gimme
Madonna Hung Up
Tracy Young Remix of Hung Up
Tracy Young Remix 2 of Hung Up
77 Specificity Spectrum
Cover songs
Remixes
Fingerprinting
Genre
Look for specific exact matches
Bag of Features model
Our work (nearest neighbor)
78 Cross-Correlation
72 Billion
- 2M songs
- 3 minutes
- 10 frames/second
79 Curse of Dimensionality
- Histogram of distances between Gaussian data
- Normalized to the mean
- Nearest neighbor ill-posed?
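A minimal sketch of the effect behind this slide, assuming standard-normal data: pairwise distances are divided by their mean, and their spread is compared across dimensions. The point count, dimensions, and seed are arbitrary illustrative choices.

```python
import math
import random

def normalized_distance_spread(dim, n_points=60, seed=0):
    """Min, max, and standard deviation of pairwise distances divided by their mean."""
    rng = random.Random(seed)
    pts = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_points)]
    dists = [math.dist(p, q) for i, p in enumerate(pts) for q in pts[i + 1:]]
    mean = sum(dists) / len(dists)
    normalized = [d / mean for d in dists]
    var = sum((d - 1.0) ** 2 for d in normalized) / len(normalized)
    return min(normalized), max(normalized), math.sqrt(var)

results = {dim: normalized_distance_spread(dim) for dim in (2, 10, 100)}
for dim, (lo, hi, std) in results.items():
    print(dim, round(lo, 3), round(hi, 3), round(std, 3))
```

As the dimension grows, the normalized distances bunch up around 1, so the nearest and farthest neighbors become hard to tell apart. That is the sense in which nearest-neighbor search can become ill-posed for unstructured high-dimensional data.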
80 Distractors
81 Correlogram
Axes: Distance down cochlea / Center Frequency; Time Interval (s) / Autocorrelation Lag