Title: Computational Audition at AFRL/HE: Past, Present, and Future


1
Computational Audition at AFRL/HE: Past, Present, and Future
  • Dr. Timothy R. Anderson
  • Human Effectiveness Directorate
  • Air Force Research Laboratory

2
Biologically Based Signal Processing
  • Research, development, and application of
  • Biologically based algorithms
  • Perceptually relevant features
  • Human-centered metrics and models
  • to improve the robustness of speech processing systems

3
Why Is This Area Important?
  • Present signal processing systems (e.g., speech
    and speaker recognition, speech coding) are
    not robust in adverse military environments.
  • Biological principles offer the potential for
    improved performance in military environments.

4
Biologically Based Signal Processing
  • Approach
  • Develop psychoacoustic testing procedures
  • Characterize key features and processes
  • Develop human-centered models and metrics
  • Implement computationally efficient algorithms
  • Provide support to operational test and
    warfighting exercises to evaluate system utility
  • Technical Challenges
  • Identification and modeling of features and
    processes used by biological systems
  • Incorporation of those key features and
    processes into computationally efficient
    algorithms and structures

5
Research Areas
  • Cockpit Speech Recognition
  • Robust Speech Recognition
  • Monaural Speech Recognition
  • Binaural Speech Recognition
  • Auditory Model Front-ends
  • Speaker Recognition/Verification
  • Biologically Based Speaker ID
  • Channel Robustness
  • Speaker Recognizability Test

6
Phoneme Classification
  • Kohonen Self-Organizing Feature Map
  • 16 X 16
  • 10 Speaker Database (TIMIT)
  • 10 sentences/speaker
  • Leaving one out method (per speaker)
  • Features calculated with
  • 16 ms window
  • 5 ms frame step
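
The window and step sizes above fix how many feature frames each utterance yields. A minimal sketch of that arithmetic, assuming the 16 kHz sampling rate at which TIMIT is distributed; the function name `frame_bounds` is hypothetical and the slide does not say how partial frames at the end of an utterance were handled (here they are dropped):

```python
def frame_bounds(num_samples, sample_rate_hz=16000, window_ms=16.0, step_ms=5.0):
    """Return (start, end) sample indices for every full analysis frame."""
    window = round(sample_rate_hz * window_ms / 1000.0)  # 256 samples at 16 kHz
    step = round(sample_rate_hz * step_ms / 1000.0)      # 80 samples at 16 kHz
    bounds = []
    start = 0
    # Assumption: frames that would run past the end of the signal are dropped.
    while start + window <= num_samples:
        bounds.append((start, start + window))
        start += step
    return bounds


one_second = frame_bounds(16000)
print(len(one_second))   # 197 frames in one second of audio
print(one_second[0])     # (0, 256)
print(one_second[-1])    # (15680, 15936)
```

With a 5 ms step, consecutive 16 ms windows overlap by 11 ms, so each sample contributes to about three frames.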

7
TRADITIONAL VS. AUDITORY MONAURAL
8
Monaural ASR
  • CDHMM-based SI Continuous Word Recognition
  • TIMIT
  • Results (Correct / Accuracy / Sentence)
      Covariance   Front end   Correct   Accuracy   Sentence
      Diagonal     MFCC        79.1      78.9       47.2
      Diagonal     AIM         63.5      55.1       29.1
      Full         MFCC        72.5      72.4       38.2
      Full         AIM         79.1      75.9       49.1
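
The slide does not define its three scores. A minimal sketch, assuming the widely used HTK-style definitions in which Correct ignores insertion errors, Accuracy penalizes them, and Sentence is the share of sentences recognized without any error. The function names and the counts in the example are illustrative and not taken from the experiments:

```python
def word_scores(num_ref_words, deletions, substitutions, insertions):
    """Return (percent_correct, percent_accuracy) for a set of recognized words."""
    hits = num_ref_words - deletions - substitutions
    percent_correct = 100.0 * hits / num_ref_words
    # Accuracy also subtracts inserted words, so it is never above Correct.
    percent_accuracy = 100.0 * (hits - insertions) / num_ref_words
    return percent_correct, percent_accuracy


def sentence_score(num_sentences, num_fully_correct):
    """Return the percentage of sentences recognized with no errors at all."""
    return 100.0 * num_fully_correct / num_sentences


# Made-up counts: 100 reference words, 5 deleted, 10 substituted, 3 inserted.
print(word_scores(100, 5, 10, 3))   # (85.0, 82.0)
print(sentence_score(20, 9))        # 45.0
```

Under these definitions the gap between Correct and Accuracy is the insertion rate, which would explain why the two scores sit close together for some systems and further apart for others.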

9
Binaural Speech Recognition
  • Past
  • Present
  • Future

10
Binaural Speech Recognition
  • Stereausis
  • Cocktail Party Processor
  • BAIM
  • BINAP

11
EXPERIMENT SETUP
[Figure: positions of the sound source and the noise source, each marked with an X]
12
MONAURAL VS. BINAURAL COCKTAIL PARTY PROCESSOR
13
MONAURAL VS. BINAURAL AUDITORY IMAGE MODEL
14
BINAURAL
15
MONAURAL
16
BAIM VS. CPP-AIM
17
COINCIDENCE
18
MONAURAL, BINAURAL AND TRADITIONAL
19
Binaural Speech Recognition
RESULTS
BINAURAL AUDITORY MODEL PROVIDES BETTER
REPRESENTATION THAN TRADITIONAL TECHNIQUES
7-12 dB BINAURAL ADVANTAGE
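
To give a sense of scale for the figure above, a decibel difference can be converted to a power ratio. This is a minimal sketch that assumes the advantage is expressed as a difference in signal-to-noise ratio; the slide does not say how it was measured, and the function name is hypothetical:

```python
def db_to_power_ratio(db):
    """Convert a level difference in decibels to a linear power ratio."""
    return 10.0 ** (db / 10.0)


for db in (7.0, 12.0):
    print(f"{db:.0f} dB corresponds to a power ratio of about {db_to_power_ratio(db):.1f}")
```

Under that assumption, the range quoted on the slide spans roughly a factor of 5 to a factor of 16 in noise power.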
20
Binaural Speech Recognition
  • Past
  • Present
  • No Current Work
  • Future

21
Binaural Speech Recognition
  • Past
  • Present
  • Future
  • Implement binaural ASR system
  • Investigate further binaural fusion mechanisms
  • Meeting room data
  • Implement binaural system using AIM chips

22
Auditory Model Front Ends
  • Past
  • Present
  • Future

23
Auditory Model Front Ends
  • Tanner Research: Analog Speech Recognition
  • Implementation of AIM
  • 56-channel analog filter bank
  • Single SBUS board
  • 1.5x real-time

24
Auditory Model Front Ends
  • AFIT
  • Designed Digital Implementation
  • Middle ear, BMM, adaptive thresholding
  • 32 channels per chip
  • 300 Hz to 7 kHz
  • 44.1 kHz sampling rate
  • 2 chips provide 64 channels in real-time
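
The slide gives the channel count and the band edges (300 Hz and 7 kHz) but not how the channels are spaced. A minimal sketch of one common choice for auditory filterbanks, equal spacing on the ERB-rate scale of Glasberg and Moore; that spacing is an assumption here, not a documented property of the AFIT chip, and the function names are hypothetical:

```python
import math


def hz_to_erb_rate(freq_hz):
    """Map frequency in Hz to the ERB-rate scale (Glasberg and Moore)."""
    return 21.4 * math.log10(4.37 * freq_hz / 1000.0 + 1.0)


def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) * 1000.0 / 4.37


def center_frequencies(low_hz=300.0, high_hz=7000.0, num_channels=64):
    """Return channel center frequencies equally spaced on the ERB-rate scale."""
    low = hz_to_erb_rate(low_hz)
    high = hz_to_erb_rate(high_hz)
    step = (high - low) / (num_channels - 1)
    return [erb_rate_to_hz(low + i * step) for i in range(num_channels)]


centers = center_frequencies()
print(len(centers), round(centers[0]), round(centers[-1]))
```

This spacing places channels densely at low frequencies and sparsely at high frequencies, which mirrors the frequency resolution of the cochlea.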

25
Auditory Model Front Ends
  • Tanner Research: Enhanced Algorithms for Cockpit
    Speech Recognition

26
Auditory Model Front Ends
  • Vocal Point: Novel Solutions for Noise-Robust
    Speech Processors
  • Investigated
  • Auditory features - SAI
  • Neural Networks - Manifold learning
  • Isolated word
  • High noise

27
Auditory Model Front Ends
  • Past
  • Present
  • Single-board USB system designed and prototyped
  • Current chip design undergoing debug
  • Second fabrication run this fall
  • Future

28
Auditory Model Front Ends
  • Past
  • Present
  • Future
  • Debug and verify chip fabrication
  • Debug PC-based real-time auditory model front end
  • Implement complete end-to-end auditory ASR
  • Investigate feedback mechanisms in auditory model
    for ASR

29
Biologically Based SID
  • Past
  • Present
  • Future

30
Biologically Based SID
  • Auditory Models Investigated
  • Payton's Auditory Model (PAM)
  • Auditory Image Model (AIM)
  • VQ codebook used to model each speaker
  • 37 speakers from TIMIT (dr1 and dr2; 12 female, 25 male)
  • MFCC 94
  • PAM 67
  • AIM 91
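
The VQ approach above can be sketched as follows: each speaker is represented by a codebook of feature vectors, and a test utterance is assigned to the speaker whose codebook yields the lowest average quantization distortion. This is a minimal illustration under stated assumptions, not the system used in the experiments; the codebooks and test frames are made-up 2-D toy values, the function names are hypothetical, and codebook training (for example by k-means) is omitted:

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def average_distortion(frames, codebook):
    """Mean distance from each frame to its nearest codeword."""
    total = sum(min(squared_distance(f, c) for c in codebook) for f in frames)
    return total / len(frames)


def identify_speaker(frames, codebooks):
    """Return the speaker whose codebook quantizes the frames best."""
    return min(codebooks, key=lambda name: average_distortion(frames, codebooks[name]))


# Toy data: two speakers with two codewords each (illustrative values only).
codebooks = {
    "spk_a": [(0.0, 0.0), (1.0, 1.0)],
    "spk_b": [(5.0, 5.0), (6.0, 4.0)],
}
test_frames = [(0.9, 1.2), (0.1, -0.2), (1.1, 0.8)]
print(identify_speaker(test_frames, codebooks))   # spk_a
```

The same scoring applies regardless of whether the feature vectors come from MFCCs or from an auditory model, which is what makes the front ends directly comparable.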

31
Biologically Based SID
  • Past
  • Present
  • Future

32
Biologically Based SID
  • Using perceptual features
  • Formants, formant bandwidths, and pitch
  • Voiced Frames
  • Using GMM classifier
  • Conducting experiments on larger databases
  • Switchboard
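
A GMM classifier of the kind mentioned above scores each frame of features against every speaker's mixture model and picks the speaker with the highest average log-likelihood. A minimal sketch, assuming diagonal covariances; the model parameters and test frames are made-up 2-D toy values (loosely a formant and a pitch value in Hz), the function names are hypothetical, and training by EM is omitted:

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def log_gmm(x, gmm):
    """Log-likelihood of x under a mixture given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def most_likely_speaker(frames, models):
    """Return the speaker whose GMM gives the highest mean log-likelihood."""
    def mean_ll(gmm):
        return sum(log_gmm(f, gmm) for f in frames) / len(frames)
    return max(models, key=lambda name: mean_ll(models[name]))


# Toy models: two components per speaker (illustrative values only).
models = {
    "spk_a": [(0.5, (500.0, 120.0), (2500.0, 100.0)),
              (0.5, (700.0, 125.0), (2500.0, 100.0))],
    "spk_b": [(0.5, (450.0, 210.0), (2500.0, 100.0)),
              (0.5, (650.0, 220.0), (2500.0, 100.0))],
}
frames = [(510.0, 118.0), (690.0, 127.0)]
print(most_likely_speaker(frames, models))   # spk_a
```

Restricting scoring to voiced frames, as the slide notes, matters here because pitch and formant estimates are only meaningful where voicing is present.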

33
Biologically Based SID
[Plot; legend: MFCCs, no Deltas, no CMS; MFCCs, no CMS; F0 Base]
34
Biologically Based SID
[Plot; legend: MFCCs, no Deltas, no CMS; MFCCs, no CMS; F0 Base]
35
Biologically Based SID
[Plot; legend: MFCCs, no Deltas, no CMS; MFCCs, no CMS; F0 Base]
36
Biologically Based SID
[Plot; legend: MFCCs, no Deltas, no CMS; MFCCs, no CMS; F0 Base]
37
Biologically Based SID
  • Performance isn't the best, but this feature set
  • Uses only 9 features versus 19 to 38 for MFCCs
  • Hasn't been as heavily researched as MFCCs

38
Biologically Based SID
  • Determine reasons for performance differences
    between various databases
  • Channel score normalizations
  • Pitch-synchronous features
  • Closed-phase analysis
  • Glottal model features

39
Biologically Based SID
40
Biologically Based SID
  • Past
  • Present
  • Future

41
Biologically Based SID
  • Investigate other auditory based features
  • Vocal agitation
  • Formants, formant bandwidths, and pitch
    calculated from the auditory model
  • Auditory model features
  • Conduct experiments on other databases
  • Broadcast news
  • Military training exercises

42
Speaker Recognizability Test
  • Past
  • Present
  • Future

43
Speaker Recognizability Test
  • Dynastat: The Development of a Method for
    Evaluating and Predicting Speaker Recognizability
    in Voice Communication Systems
  • Determined perceptually relevant features
  • Perceptual voice traits (PVT)
  • 21 traits currently identified
  • Developed methodology to measure these traits
  • Human listeners
  • Developed measure to determine loss due to
    channel
  • Diagnostic Speaker Recognizability Test (DSRT)

44
Speaker Recognizability Test
  • Past
  • Present
  • Future

45
Speaker Recognizability Test
  • Use perceptual voice traits to identify groups of
    similar and distinctive speakers
  • Determine if current SID systems have difficulty
    with these similar speakers
  • Implementing in-house
  • Web-based listening test for
  • PVT rating
  • DSRT

46
Speaker Recognizability Test
  • Past
  • Present
  • Future

47
Speaker Recognizability Test
  • Obtain PVT ratings for larger database
  • Switchboard
  • Determine acoustic correlates of perceptually
    relevant features
  • Use as features for speaker recognition
  • Utilize DSRT for communication system testing

48
Summary
  • Computational Audition offers potential for
    improved performance in adverse military
    environments
  • Much research remains to be done
  • Fidelity of model
  • Model feedback pathways
  • Computational issues are no longer the limiting factor in
    performing meaningful experiments

49
Questions?