Title: Computational Audition at AFRL/HE: Past, Present, and Future
1. Computational Audition at AFRL/HE: Past, Present, and Future
- Dr. Timothy R. Anderson
- Human Effectiveness Directorate
- Air Force Research Laboratory
2. Biologically Based Signal Processing
- Research, development, and applications of:
  - Biologically based algorithms
  - Perceptually relevant features
  - Human-centered metrics and models
- Purpose: to improve the robustness of speech processing systems
3. Why Is This Area Important?
- Present signal processing systems (e.g., speech and speaker recognition, speech coding) are not robust in adverse military environments.
- Biological principles offer the potential for improved performance in military environments.
4. Biologically Based Signal Processing
- Approach
  - Develop psychoacoustic testing procedures
  - Characterize key features and processes
  - Develop human-centered models and metrics
  - Implement computationally efficient algorithms
  - Support operational tests and warfighting exercises to evaluate system utility
- Technical Challenges
  - Identifying and modeling the features and processes used by biological systems
  - Incorporating those key features and processes into computationally efficient algorithms and structures
5. Research Areas
- Cockpit Speech Recognition
- Robust Speech Recognition
  - Monaural Speech Recognition
  - Binaural Speech Recognition
  - Auditory Model Front Ends
- Speaker Recognition/Verification
  - Biologically Based Speaker ID
  - Channel Robustness
  - Speaker Recognizability Test
6. Phoneme Classification
- Kohonen self-organizing feature map
  - 16 × 16 nodes
- 10-speaker database (TIMIT)
  - 10 sentences per speaker
  - Leave-one-out method (per speaker)
- Features calculated with
  - 16 ms window
  - 5 ms frame step
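The frame parameters translate to sample counts once a sampling rate is fixed: TIMIT is distributed at 16 kHz, so a 16 ms window is 256 samples and a 5 ms step is 80 samples. The sketch below shows that framing plus a minimal Kohonen self-organizing map whose nodes are labeled by majority vote and used to classify frames. It is illustrative only: the feature extraction is omitted, and the learning-rate and neighborhood schedules (linear decay, Gaussian neighborhood) and all function names are assumptions, not the settings used in this work.

```python
import math
import random
from collections import Counter

def frame_signal(samples, fs=16000, win_ms=16.0, step_ms=5.0):
    """Split a signal into overlapping frames; leftover samples that do not fill a frame are dropped."""
    win = int(round(fs * win_ms / 1000.0))    # 256 samples at 16 kHz
    step = int(round(fs * step_ms / 1000.0))  # 80 samples at 16 kHz
    if len(samples) < win:
        return []
    n_frames = 1 + (len(samples) - win) // step
    return [samples[i * step:i * step + win] for i in range(n_frames)]

def best_matching_unit(weights, x):
    """(row, col) of the node whose weight vector is closest to x (squared Euclidean)."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best

def train_som(data, rows=16, cols=16, epochs=10, lr0=0.5, seed=0):
    """Train a rows x cols Kohonen map. Schedules here are assumed, not from the source."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0
    # Initialize every node with a copy of a randomly chosen training vector.
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = t / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood width
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighborhood on the map grid around the winning node.
                    h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    for k in range(len(w)):
                        w[k] += lr * h * (x[k] - w[k])
            t += 1
    return weights

def label_nodes(weights, data, labels):
    """Give each winning node the majority phoneme label of the training frames it wins."""
    votes = {}
    for x, y in zip(data, labels):
        votes.setdefault(best_matching_unit(weights, x), Counter())[y] += 1
    return {node: counts.most_common(1)[0][0] for node, counts in votes.items()}

def classify(weights, node_labels, x):
    """Label of x's best-matching unit, or None if that node won no training frames."""
    return node_labels.get(best_matching_unit(weights, x))
```

In a leave-one-out setup, the map would be trained and labeled on frames from all speakers but one and then used to classify the held-out speaker's frames, repeating for each speaker.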
7. TRADITIONAL VS. AUDITORY MONAURAL
8. Monaural ASR
- CDHMM-based speaker-independent (SI) continuous word recognition
- TIMIT
- Diagonal covariance (Correct / Accuracy / Sentence)
  - MFCC: 79.1 / 78.9 / 47.2
  - AIM: 63.5 / 55.1 / 29.1
- Full covariance (Correct / Accuracy / Sentence)
  - MFCC: 72.5 / 72.4 / 38.2
  - AIM: 79.1 / 75.9 / 49.1
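The slide does not define its Correct and Accuracy columns. Assuming the usual HTK-style definitions, both come from aligning each recognized word string against its reference, with N reference words and S substitutions, D deletions, and I insertions: Correct = (N - S - D)/N and Accuracy = (N - S - D - I)/N, so only Accuracy penalizes insertions and it can never exceed Correct. "Sentence" is taken here to mean the percentage of sentences with no errors. The sketch uses a unit-cost edit-distance alignment (an assumption; the original scoring penalties are not stated), and the function names are illustrative.

```python
def align_counts(ref, hyp):
    """Return (S, D, I) from a unit-cost minimum-edit-distance alignment of two word lists."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = (total_errors, S, D, I) for aligning ref[:i] with hyp[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # all reference words deleted
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)  # all hypothesis words inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, s, d, ins = cost[i - 1][j - 1]
            miss = int(ref[i - 1] != hyp[j - 1])
            match_or_sub = (e + miss, s + miss, d, ins)
            e, s, d, ins = cost[i - 1][j]
            deletion = (e + 1, s, d + 1, ins)
            e, s, d, ins = cost[i][j - 1]
            insertion = (e + 1, s, d, ins + 1)
            cost[i][j] = min(match_or_sub, deletion, insertion)
    _, s, d, ins = cost[n][m]
    return s, d, ins

def score(pairs):
    """pairs: list of (reference_words, hypothesis_words). Returns (Correct, Accuracy, Sentence) in %."""
    N = S = D = I = 0
    sentences_ok = 0
    for ref, hyp in pairs:
        s, d, ins = align_counts(ref, hyp)
        N += len(ref)
        S, D, I = S + s, D + d, I + ins
        sentences_ok += int(s + d + ins == 0)
    correct = 100.0 * (N - S - D) / N
    accuracy = 100.0 * (N - S - D - I) / N
    sentence = 100.0 * sentences_ok / len(pairs)
    return correct, accuracy, sentence
```

Under these definitions the Correct minus Accuracy gap is the insertion rate, so the table would imply about 8.4 points of insertions for AIM with diagonal covariances (63.5 - 55.1) versus 0.2 for MFCC (79.1 - 78.9).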
9. Binaural Speech Recognition
10. Binaural Speech Recognition
- Stereausis
- Cocktail Party Processor
- BAIM
- BINAP
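A common ingredient of binaural models is a stage that compares the left- and right-ear signals across a range of internal delays, as in Jeffress-style coincidence detection or interaural cross-correlation. The sketch below is not any of the models listed; under simplifying assumptions (broadband input, lags limited to ±1 ms, hypothetical function name) it shows only that shared idea: a normalized cross-correlation whose peak lag estimates the interaural time difference (ITD).

```python
import math

def itd_by_cross_correlation(left, right, fs, max_lag_ms=1.0):
    """Return (best_lag_in_samples, {lag: normalized correlation}).

    A positive lag means the right-ear signal lags the left-ear signal.
    """
    max_lag = int(round(fs * max_lag_ms / 1000.0))
    n = min(len(left), len(right))
    corr = {}
    for lag in range(-max_lag, max_lag + 1):
        num = e_left = e_right = 0.0
        for t in range(n):
            u = t + lag
            if 0 <= u < n:  # only the overlapping part of the two signals
                a, b = left[t], right[u]
                num += a * b
                e_left += a * a
                e_right += b * b
        denom = math.sqrt(e_left * e_right)
        corr[lag] = num / denom if denom > 0 else 0.0
    best = max(corr, key=corr.get)
    return best, corr
```

Dividing the lag by the sampling rate converts it to seconds; in a full model this comparison would typically be done per frequency channel after peripheral filtering rather than on the raw waveform.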
11. EXPERIMENT SETUP
[Figure: layout showing the positions (marked X) of the sound source and the noise source]
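Setups like this are typically evaluated over a range of signal-to-noise ratios. The slides do not say how SNR was set; assuming it is defined by average power, SNR = 10 log10(P_s / P_n), the noise gain needed for a target SNR follows directly. A minimal sketch with illustrative function names:

```python
import math

def power(x):
    """Mean-square power of a sequence."""
    return sum(v * v for v in x) / len(x)

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so that 10*log10(P_speech / P_scaled_noise) == snr_db, then add it to speech."""
    n = min(len(speech), len(noise))
    s, v = speech[:n], noise[:n]
    gain = math.sqrt(power(s) / (power(v) * 10.0 ** (snr_db / 10.0)))
    scaled = [gain * x for x in v]
    return [a + b for a, b in zip(s, scaled)], scaled

def snr_db(signal, noise):
    """SNR in dB between a signal and a noise sequence."""
    return 10.0 * math.log10(power(signal) / power(noise))
```

For two-ear recordings, applying one gain to the noise at both ears keeps its interaural level difference intact, which matters when the point is to measure a binaural advantage.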
12. MONAURAL VS. BINAURAL COCKTAIL PARTY PROCESSOR
13. MONAURAL VS. BINAURAL AUDITORY IMAGE MODEL
14. BINAURAL
15. MONAURAL
16. BAIM VS. CPP-AIM
17. COINCIDENCE
18. MONAURAL, BINAURAL AND TRADITIONAL
19. Binaural Speech Recognition
- Results: the binaural auditory model provides a better representation than traditional techniques
- Results: 7-12 dB binaural advantage
20. Binaural Speech Recognition
- Past
- Present
  - No current work
- Future
21. Binaural Speech Recognition
- Past
- Present
- Future
  - Implement binaural ASR system
  - Investigate further binaural fusion mechanisms
  - Meeting room data
  - Implement binaural system using AIM chips
22. Auditory Model Front Ends
23. Auditory Model Front Ends
- Tanner Research: Analog Speech Recognition
  - Implementation of AIM
  - 56-channel analog filter bank
  - Single SBus board
  - 1.5× real time
24. Auditory Model Front Ends
- AFIT
  - Designed a digital implementation
  - Middle ear, basilar membrane motion (BMM), adaptive thresholding
  - 32 channels per chip
  - 300 Hz to 7 kHz
  - 44.1 kHz sampling rate
  - 2 chips provide 64 channels in real time
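The slide gives the channel count and frequency range but not the channel spacing. Auditory filter banks such as AIM's commonly space channels uniformly on the ERB-rate scale of Glasberg and Moore (1990); assuming that spacing, 64 channels from 300 Hz to 7 kHz come out roughly 0.4 ERB apart. The split of the 64 channels across the two chips below is hypothetical, as are the function names.

```python
import math

def hz_to_erb_rate(f_hz):
    """Glasberg & Moore (1990) ERB-rate: number of ERBs below f_hz."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth (Hz) of the auditory filter centered at f_hz."""
    return 24.7 * (0.00437 * f_hz + 1.0)

def erb_space(f_low, f_high, n_channels):
    """n_channels center frequencies equally spaced on the ERB-rate scale, endpoints included."""
    lo, hi = hz_to_erb_rate(f_low), hz_to_erb_rate(f_high)
    step = (hi - lo) / (n_channels - 1)
    return [erb_rate_to_hz(lo + k * step) for k in range(n_channels)]

# Hypothetical assignment: chip A takes the 32 lowest channels, chip B the 32 highest.
cfs = erb_space(300.0, 7000.0, 64)
chip_a, chip_b = cfs[:32], cfs[32:]
```

The 44.1 kHz sampling rate puts the Nyquist frequency at 22.05 kHz, well above the 7 kHz top channel, so the highest filters are not limited by sampling.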
25. Auditory Model Front Ends
- Tanner Research: Enhanced Algorithms for Cockpit Speech Recognition
26. Auditory Model Front Ends
- Vocal Point: Novel Solutions for Noise-Robust Speech Processors
  - Investigated
    - Auditory features: stabilized auditory image (SAI)
    - Neural networks: manifold learning
  - Isolated words
  - High noise
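In AIM, the stabilized auditory image (SAI) is formed by strobed temporal integration: strobe points are found in each channel of the neural activity pattern, and a segment of activity aligned on each strobe is added into an image buffer, so that periodic sounds such as voiced speech produce a stable pattern. The sketch below is a heavily simplified, hypothetical illustration and not AIM's implementation; it assumes strobes at local maxima above a fixed fraction of the channel peak, takes the segment that follows each strobe, and uses plain averaging instead of AIM's decaying image buffer.

```python
def strobe_points(channel, threshold_frac=0.5):
    """Indices of local maxima that reach threshold_frac * max(channel) (assumed strobe rule)."""
    peak = max(channel)
    if peak <= 0:
        return []
    thr = threshold_frac * peak
    return [t for t in range(1, len(channel) - 1)
            if channel[t] >= thr
            and channel[t] > channel[t - 1]
            and channel[t] >= channel[t + 1]]

def simplified_sai_row(channel, width):
    """Average the width-sample segments that start at each strobe point (simplification)."""
    row = [0.0] * width
    count = 0
    for s in strobe_points(channel):
        if s + width <= len(channel):  # skip strobes too close to the end
            for k in range(width):
                row[k] += channel[s + k]
            count += 1
    return [v / count for v in row] if count else row

def simplified_sai(nap, width):
    """One image row per channel of a rectified, multichannel activity pattern."""
    return [simplified_sai_row(ch, width) for ch in nap]
```

For a pulse train with a 50-sample period, each row has peaks at 0 and 50 samples, i.e., at multiples of the period; features for a recognizer would then be summarized from such rows.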
27. Auditory Model Front Ends
- Past
- Present
  - Single-board system designed and prototyped
    - USB
  - Current chip design undergoing debug
  - Second fabrication run this fall
- Future
28. Auditory Model Front Ends
- Past
- Present
- Future
  - Debug and verify chip fabrication
  - Debug PC-based real-time auditory model front end
  - Implement a complete end-to-end auditory ASR system
  - Investigate feedback mechanisms in the auditory model for ASR
29. Biologically Based SID
30. Biologically Based SID
- Auditory models investigated
  - Payton's Auditory Model (PAM)
  - Auditory Image Model (AIM)
- VQ codebook used to model each speaker
- 37 speakers from TIMIT (dialect regions 1 and 2; 12 female, 25 male)
- Results
  - MFCC: 94
  - PAM: 67
  - AIM: 91
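In VQ-based speaker identification, each enrolled speaker is represented by a small codebook of centroids trained on that speaker's feature vectors (MFCC, PAM, or AIM outputs here), and a test utterance is assigned to the speaker whose codebook gives the lowest average quantization distortion. A minimal sketch, assuming plain k-means training and squared Euclidean distortion; the codebook size and training procedure used in this work are not given, and the function names are illustrative.

```python
import random

def sqdist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_codebook(vectors, size, iters=20, seed=0):
    """Plain k-means (assumed training method); returns `size` centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iters):
        clusters = [[] for _ in range(size)]
        for v in vectors:
            nearest = min(range(size), key=lambda k: sqdist(v, centroids[k]))
            clusters[nearest].append(v)
        for k, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster becomes empty
                dim = len(members[0])
                centroids[k] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centroids

def avg_distortion(vectors, codebook):
    """Mean distance from each vector to its nearest codeword."""
    return sum(min(sqdist(v, c) for c in codebook) for v in vectors) / len(vectors)

def identify(test_vectors, codebooks):
    """codebooks: dict speaker -> codebook. Returns the speaker with the lowest distortion."""
    return min(codebooks, key=lambda spk: avg_distortion(test_vectors, codebooks[spk]))
```

With 37 speakers, enrollment trains 37 codebooks, and each test utterance is scored against all of them.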
31. Biologically Based SID
32. Biologically Based SID
- Using perceptual features
  - Formants, formant bandwidths, and pitch
  - Voiced frames
- Using a GMM classifier
- Conducting experiments on larger databases
  - Switchboard
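With a GMM classifier, each speaker is modeled by a Gaussian mixture over per-frame feature vectors (here, formants, formant bandwidths, and pitch from voiced frames), and identification picks the model with the highest total log-likelihood over the test frames. The sketch covers only the scoring step for diagonal-covariance mixtures; parameter estimation (normally EM) is omitted, and the diagonal-covariance choice and function names are assumptions.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def log_sum_exp(values):
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def gmm_log_likelihood(frames, gmm):
    """gmm: list of (weight, mean, var) components. Sums per-frame mixture log-likelihoods."""
    return sum(log_sum_exp([math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm])
               for x in frames)

def identify(frames, models):
    """models: dict speaker -> gmm. Returns the maximum-likelihood speaker."""
    return max(models, key=lambda spk: gmm_log_likelihood(frames, models[spk]))
```

Summing log-likelihoods over frames treats them as independent, which is the usual simplification for GMM speaker identification.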
33. Biologically Based SID
[Figure: comparison of three feature sets: MFCCs without deltas or cepstral mean subtraction (CMS); MFCCs without CMS; F0 Base]
34. Biologically Based SID
[Figure: same three-feature-set comparison]
35. Biologically Based SID
[Figure: same three-feature-set comparison]
36. Biologically Based SID
[Figure: same three-feature-set comparison]
37. Biologically Based SID
- Performance isn't the best, but this feature set:
  - Uses only 9 features, versus 19 to 38 for MFCCs
  - Hasn't been as heavily researched as MFCCs
38. Biologically Based SID
- Determine reasons for performance differences between various databases
- Channel score normalizations
- Pitch-synchronous features
- Closed-phase analysis
- Glottal model features
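Channel score normalizations adjust a speaker model's raw score using statistics of that model's scores on impostor speech recorded over the same channel (handset) type, so that one decision threshold works across channels; this is the idea behind H-norm. A minimal sketch, assuming per-channel impostor scores are available for the model and the test segment's channel label is known; the channel names in the example are hypothetical.

```python
import statistics

def channel_norm_stats(impostor_scores_by_channel):
    """Map channel -> (mean, population std) of one model's scores on impostor speech."""
    return {ch: (statistics.mean(scores), statistics.pstdev(scores))
            for ch, scores in impostor_scores_by_channel.items()}

def normalize_score(raw, channel, stats):
    """Z-style normalization using the statistics for the test segment's channel."""
    mu, sigma = stats[channel]
    return (raw - mu) / sigma if sigma > 0 else raw - mu
```

After normalization, scores from different channels are expressed in impostor standard deviations, which makes them comparable across the databases being contrasted.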
39. Biologically Based SID
40. Biologically Based SID
41. Biologically Based SID
- Investigate other auditory-based features
  - Vocal agitation
  - Formants, formant bandwidths, and pitch calculated from the auditory model
  - Auditory model features
- Conduct experiments on other databases
  - Broadcast news
  - Military training exercises
42. Speaker Recognizability Test
43. Speaker Recognizability Test
- Dynastat: The Development of a Method for Evaluating and Predicting Speaker Recognizability in Voice Communication Systems
  - Determined perceptually relevant features
    - Perceptual voice traits (PVT)
    - 21 traits currently identified
  - Developed a methodology to measure these traits
    - Human listeners
  - Developed a measure of the loss due to the channel
    - Diagnostic Speaker Recognizability Test (DSRT)
44. Speaker Recognizability Test
45. Speaker Recognizability Test
- Use perceptual voice traits to identify groups of similar and distinctive speakers
- Determine whether current SID systems have difficulty with these similar speakers
- Implementing in-house
  - Web-based listening test for
    - PVT rating
    - DSRT
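One way to turn the 21 PVT ratings into groups of similar and distinctive speakers is to treat each speaker's mean ratings as a vector and compare speakers by distance: the closest pairs are candidates for the similar group, and speakers far from their nearest neighbor are candidates for the distinctive group. The slides do not specify the grouping method; Euclidean distance on mean ratings and the function names below are assumptions.

```python
import math
from itertools import combinations

def mean_ratings(ratings):
    """ratings: list of per-listener trait vectors for one speaker -> mean trait vector."""
    n = len(ratings)
    return [sum(col) / n for col in zip(*ratings)]

def most_similar_pairs(profiles, k=3):
    """profiles: dict speaker -> mean trait vector. Returns the k closest (a, b, distance) pairs."""
    pairs = sorted((math.dist(profiles[a], profiles[b]), a, b)
                   for a, b in combinations(sorted(profiles), 2))
    return [(a, b, d) for d, a, b in pairs[:k]]

def distinctiveness(profiles):
    """Distance from each speaker to the nearest other speaker (larger = more distinctive)."""
    return {s: min(math.dist(p, profiles[o]) for o in profiles if o != s)
            for s, p in profiles.items()}
```

The resulting similar pairs could then be run through an SID system to check whether they account for a disproportionate share of its errors.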
46. Speaker Recognizability Test
47. Speaker Recognizability Test
- Obtain PVT ratings for a larger database
  - Switchboard
- Determine acoustic correlates of perceptually relevant features
  - Use as features for speaker recognition
- Use DSRT for communication system testing
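A first pass at finding acoustic correlates of a perceptual trait is to correlate each candidate acoustic measurement with that trait's mean rating across speakers and rank the measurements by correlation strength. A minimal sketch; the use of Pearson correlation and the measurement names in the example are assumptions for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)

def rank_correlates(trait_ratings, acoustic):
    """acoustic: dict measure_name -> per-speaker values, in the same speaker order as trait_ratings.
    Returns (name, r) pairs sorted by |r|, strongest first."""
    scored = [(name, pearson(values, trait_ratings)) for name, values in acoustic.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)
```

Measurements that correlate strongly with a trait would be natural candidates for the speaker recognition features mentioned above.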
48. Summary
- Computational audition offers the potential for improved performance in adverse military environments
- Much research remains to be done
  - Fidelity of the model
  - Model feedback pathways
- Computation is no longer the limiting factor in performing meaningful experiments
49. Questions?