Compensating speaker-to-microphone playback system for robust speech recognition

1
Compensating speaker-to-microphone playback system for robust speech recognition
So-Young Jeong and Soo-Young Lee
Brain Science Research Center and Department of Electrical Engineering and Computer Science
Korea Advanced Institute of Science and Technology
2
Motivation

ASR in mismatched environments
Environmental information
Background noise, acoustic/transmission channel
Assume environment degradation model

3
Channel Impacts on feature
Channel Assumption 1

Figure labels: P.S., F.B., L.S., C.S.

Channel Assumption 2
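
The slide text does not spell out the two channel assumptions. As a hedged illustration of one common assumption (a fixed linear channel, which appears as a constant additive offset on every cepstral frame), the sketch below shows how cepstral mean subtraction, presumably the CMS baseline in the results, cancels such an offset. The function name, the toy frames, and the offset values are hypothetical and not taken from the presentation.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-utterance mean from each cepstral dimension."""
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]


# Toy 2-dimensional "cepstral" frames (hypothetical values).
clean = [[1.0, -2.0], [3.0, 0.0], [5.0, 2.0]]

# Assumption: the channel adds the same offset to every frame.
channel_offset = [0.7, -1.5]
distorted = [[c + h for c, h in zip(f, channel_offset)] for f in clean]

cms_clean = cepstral_mean_subtraction(clean)
cms_distorted = cepstral_mean_subtraction(distorted)

# The constant offset is part of the mean, so it is removed by the subtraction.
max_diff = max(
    abs(a - b)
    for fa, fb in zip(cms_clean, cms_distorted)
    for a, b in zip(fa, fb)
)
print(max_diff)
```

A constant offset is the only distortion this removes; nonlinear distortion of the kind listed for speakers and microphones is outside what mean subtraction can undo.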
4
Speaker-to-Microphone compensation

Speaker-to-Microphone playback
Speaker distortion
Nonlinearity caused by voice coil
Microphone distortion
Frequency response variation caused by differences in fabrication
Nonlinearity caused by dynamic range
Ambient noise by directionality

5
Speaker-to-Microphone mapping

Mapper train
Where and which type of mapper should be deployed?
Mapper apply
Figure labels: clean, distorted, F.E., Error, Trained Mapper, To recognizer
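
The train and apply steps can be sketched roughly as follows, assuming parallel clean and distorted feature frames are available. The results table names the mapper types DIAG, LIN, PER, and MLP without defining them. This sketch assumes DIAG means an independent scale and offset per feature dimension, fitted by least squares, which is only a guess at the authors' setup. All function names and data values are hypothetical.

```python
def fit_diag_mapper(distorted, clean):
    """Fit clean[d] ~ a[d] * distorted[d] + b[d] separately for each dimension."""
    n, dim = len(distorted), len(distorted[0])
    scales, offsets = [], []
    for d in range(dim):
        x = [frame[d] for frame in distorted]
        y = [frame[d] for frame in clean]
        mean_x, mean_y = sum(x) / n, sum(y) / n
        var_x = sum((xi - mean_x) ** 2 for xi in x)
        cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        # Fall back to a pure offset if this dimension never varies.
        a = cov_xy / var_x if var_x > 0 else 1.0
        scales.append(a)
        offsets.append(mean_y - a * mean_x)
    return scales, offsets


def apply_diag_mapper(frame, scales, offsets):
    """Map one distorted feature frame toward the clean feature space."""
    return [a * v + b for v, a, b in zip(frame, scales, offsets)]


# Mapper train: toy parallel data (hypothetical values, not from HTIMIT).
clean = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
distorted = [[0.5 * c0 + 2.0, 0.8 * c1 - 1.0] for c0, c1 in clean]
scales, offsets = fit_diag_mapper(distorted, clean)

# Mapper apply: the mapped frame is what would be passed to the recognizer.
restored = apply_diag_mapper(distorted[0], scales, offsets)
print(scales, offsets, restored)
```

Because the toy distortion is exactly affine per dimension, the fitted mapper inverts it; real handset distortion would leave a residual error.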
6
Mapping error at L.S.

Diamond, plus, and cross denote the P.S., F.B., and L.S. levels, respectively

7
Frequency correlation plots
8
Recognition Experiments

Task
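Phoneme recognition for 40 TIMIT phone sets
Phone accuracy (%) = (N - D - S - I) × 100 / N, where N is the number of reference phones and D, S, and I are the deletion, substitution, and insertion counts
Database
HTIMIT: TIMIT sentences re-recorded through 10 different telephone handsets
Training: 246 speakers × 8 sentences = 1968 sentences
Test: 48 speakers × 8 sentences = 384 sentences
Baseline
3-state monophone HMMs with 16 Gaussian mixtures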
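
The phone accuracy formula on this slide can be written as a small helper. The counts used below are hypothetical and serve only to illustrate the arithmetic. They are not results from the experiments.

```python
def phone_accuracy(n_ref, deletions, substitutions, insertions):
    """Return phone accuracy in percent: (N - D - S - I) * 100 / N."""
    if n_ref <= 0:
        raise ValueError("n_ref must be positive")
    return (n_ref - deletions - substitutions - insertions) * 100.0 / n_ref


# Hypothetical error counts for 1000 reference phones.
acc = phone_accuracy(n_ref=1000, deletions=50, substitutions=300, insertions=100)
print(f"{acc:.1f}")

# Insertions are penalized too, so accuracy can drop below zero.
negative = phone_accuracy(n_ref=10, deletions=0, substitutions=0, insertions=20)
print(f"{negative:.1f}")
```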

9
Experiment I: CI result
type  matched  mismatch  CMS   DIAG  LIN   PER   MLP
senh  54.7
cb1   53.6     45.8      50.3  52.6  52.2  52.4  51.9
cb2   54.9     48.3      52.4  55.1  54.8  54.6  53.7
cb3   48.5     32.3      38.7  37.3  40.6  38.2  41.9
cb4   49.8     35.8      40.8  37.9  42.9  42.2  43.3
el1   55.4     45.6      52.2  54.0  53.5  53.2  54.1
el2   53.7     36.7      49.1  51.8  52.5  52.6  52.4
el3   51.0     44.6      44.5  47.1  46.9  47.1  47.2
el4   53.7     43.1      47.6  49.4  49.6  49.7  50.1
pt1   52.6     41.1      43.0  45.2  46.0  45.4  45.9
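
To compare the columns at a glance, one option is to average each column over the nine handsets that have a full row. The sketch below does only that arithmetic on the numbers copied from the table. The averaging itself is an added summary, not something reported in the presentation, and the senh row is left out because it has only a matched score.

```python
COLUMNS = ["matched", "mismatch", "CMS", "DIAG", "LIN", "PER", "MLP"]

# Phone accuracies (%) copied from the Experiment I table.
RESULTS = {
    "cb1": [53.6, 45.8, 50.3, 52.6, 52.2, 52.4, 51.9],
    "cb2": [54.9, 48.3, 52.4, 55.1, 54.8, 54.6, 53.7],
    "cb3": [48.5, 32.3, 38.7, 37.3, 40.6, 38.2, 41.9],
    "cb4": [49.8, 35.8, 40.8, 37.9, 42.9, 42.2, 43.3],
    "el1": [55.4, 45.6, 52.2, 54.0, 53.5, 53.2, 54.1],
    "el2": [53.7, 36.7, 49.1, 51.8, 52.5, 52.6, 52.4],
    "el3": [51.0, 44.6, 44.5, 47.1, 46.9, 47.1, 47.2],
    "el4": [53.7, 43.1, 47.6, 49.4, 49.6, 49.7, 50.1],
    "pt1": [52.6, 41.1, 43.0, 45.2, 46.0, 45.4, 45.9],
}


def column_means(results, columns):
    """Average each column over all handsets in `results`."""
    n = len(results)
    return {
        name: sum(row[i] for row in results.values()) / n
        for i, name in enumerate(columns)
    }


means = column_means(RESULTS, COLUMNS)
for name in COLUMNS:
    print(f"{name:8s} {means[name]:.1f}")

# Compensation method with the highest average accuracy.
methods = ["CMS", "DIAG", "LIN", "PER", "MLP"]
best = max(methods, key=lambda m: means[m])
print(best)
```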
10
Conclusion

Speech signals distorted by a low-quality speaker-to-microphone playback system can be compensated with a feature mapping network
A feature mapping scheme would be useful in cases where environmental conditions make it hard to collect a database
