Title: Compensating speaker-to-microphone playback system for robust speech recognition
1. Compensating speaker-to-microphone playback system for robust speech recognition
So-Young Jeong and Soo-Young Lee
Brain Science Research Center and Department of Electrical Engineering and Computer Science
Korea Advanced Institute of Science and Technology
2. Motivation
- ASR in mismatched environments
- Environmental information
- Background noise, acoustic/transmission channel
- Assume environment degradation model
3. Channel impacts on features
Channel Assumption 1
Channel Assumption 2
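As a rough illustration of one common channel assumption (a fixed linear channel, which appears as a constant additive offset in log-spectral or cepstral features), the sketch below shows why cepstral mean subtraction (CMS) cancels such an offset. The feature values and the `channel_bias` vector are made up for illustration, and this is not necessarily either of the assumptions named on this slide.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-dimension mean over time from a list of feature frames."""
    n_frames = len(frames)
    n_dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n_frames for d in range(n_dims)]
    return [[f[d] - means[d] for d in range(n_dims)] for f in frames]


# Hypothetical clean log-domain features: 4 frames, 3 dimensions.
clean = [
    [1.0, 2.0, 0.5],
    [0.0, 1.5, 1.0],
    [2.0, 0.5, -0.5],
    [1.0, 0.0, 1.0],
]

# Assumption: a time-invariant channel adds the same offset to every frame.
channel_bias = [0.3, -1.2, 0.8]
distorted = [[x + b for x, b in zip(frame, channel_bias)] for frame in clean]

clean_cms = cepstral_mean_subtraction(clean)
distorted_cms = cepstral_mean_subtraction(distorted)

# Largest absolute difference between the two normalized feature sets.
max_diff = max(
    abs(a - b)
    for fa, fb in zip(clean_cms, distorted_cms)
    for a, b in zip(fa, fb)
)
```

Under this assumption the two normalized feature sets coincide up to rounding. A nonlinear or time-varying distortion would not cancel this way, which is the case the feature mapper is meant to address.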
4. Speaker-to-Microphone compensation
- Speaker-to-Microphone playback
- Speaker distortion
- Nonlinearity caused by voice coil
- Microphone distortion
- Frequency response caused by different fabrication
- Nonlinearity caused by dynamic range
- Ambient noise by directionality
5. Speaker-to-Microphone mapping
- Mapper training
- Where should the mapper be deployed, and which type of mapper?
- Mapper application
[Figure: block diagram of mapper training and use. Labels: clean, distorted, F.E., Error, Trained Mapper, To recognizer]
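The train-then-apply flow on this slide can be sketched with a very simple mapper. The example below is a minimal sketch, assuming a per-dimension affine map (scale and bias) fitted by least squares on paired distorted and clean features. The function names and the toy data are hypothetical, and the mapper types actually used in the experiments may differ from this.

```python
def train_diagonal_mapper(distorted, clean):
    """Fit clean[d] ~ a[d] * distorted[d] + b[d] per dimension by least squares."""
    n = len(distorted)
    n_dims = len(distorted[0])
    scales, biases = [], []
    for d in range(n_dims):
        xs = [f[d] for f in distorted]
        ys = [f[d] for f in clean]
        mx, my = sum(xs) / n, sum(ys) / n
        var_x = sum((x - mx) ** 2 for x in xs)
        cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        # Fall back to an identity scale if a dimension is constant.
        a = cov_xy / var_x if var_x > 0 else 1.0
        scales.append(a)
        biases.append(my - a * mx)
    return scales, biases


def apply_mapper(frames, scales, biases):
    """Map distorted feature frames toward the clean feature space."""
    return [[a * x + b for x, a, b in zip(f, scales, biases)] for f in frames]


# Toy paired data: hypothetical clean features and a made-up affine distortion.
clean = [[1.0, 2.0], [0.0, 1.0], [2.0, 0.0], [3.0, 4.0]]
distorted = [[0.5 * c0 + 1.0, 2.0 * c1 - 0.5] for c0, c1 in clean]

scales, biases = train_diagonal_mapper(distorted, clean)
mapped = apply_mapper(distorted, scales, biases)

# Mean squared mapping error between mapped and clean features.
mapping_error = sum(
    (m - c) ** 2 for fm, fc in zip(mapped, clean) for m, c in zip(fm, fc)
) / (len(clean) * len(clean[0]))
```

At recognition time only `apply_mapper` would run, with its output passed on to the recognizer in place of the distorted features.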
6. Mapping error at L.S.
- Diamond, plus, and cross markers denote the PS, FB, and LS levels, respectively
7. Frequency correlation plots
8. Recognition Experiments
- Task
- Phoneme recognition for 40 TIMIT phone sets
- Phone accuracy (%) = (N - D - S - I) / N × 100
- Database
- HTIMIT: TIMIT sentences re-recorded through 10 different telephone handsets
- Training: 246 speakers × 8 sentences = 1968 sentences
- Test: 48 speakers × 8 sentences = 384 sentences
- Baseline
- 3-state monophone HMMs with 16 Gaussian mixtures
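The phone accuracy measure and the database sizes above can be checked with a short sketch. It assumes the usual reading of the symbols (N reference phones, D deletions, S substitutions, I insertions), which the slide does not spell out. The counts passed in the example call are invented for illustration only.

```python
def phone_accuracy(n_ref, deletions, substitutions, insertions):
    """Return phone accuracy in percent: (N - D - S - I) / N * 100."""
    if n_ref <= 0:
        raise ValueError("n_ref must be positive")
    return 100.0 * (n_ref - deletions - substitutions - insertions) / n_ref


# Invented error counts, not taken from the experiments.
example_accuracy = phone_accuracy(
    n_ref=1000, deletions=50, substitutions=300, insertions=100
)

# Sentence counts implied by the speaker and per-speaker figures on the slide.
train_sentences = 246 * 8
test_sentences = 48 * 8
```

Because insertions are counted as errors, this measure can go negative when a recognizer inserts many spurious phones.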
9. Experiment I: CI result
type  matched  mismatch  CMS   DIAG  LIN   PER   MLP
senh  54.7
cb1   53.6     45.8      50.3  52.6  52.2  52.4  51.9
cb2   54.9     48.3      52.4  55.1  54.8  54.6  53.7
cb3   48.5     32.3      38.7  37.3  40.6  38.2  41.9
cb4   49.8     35.8      40.8  37.9  42.9  42.2  43.3
el1   55.4     45.6      52.2  54.0  53.5  53.2  54.1
el2   53.7     36.7      49.1  51.8  52.5  52.6  52.4
el3   51.0     44.6      44.5  47.1  46.9  47.1  47.2
el4   53.7     43.1      47.6  49.4  49.6  49.7  50.1
pt1   52.6     41.1      43.0  45.2  46.0  45.4  45.9
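To compare the methods at a glance, one option is to average each column over the nine handsets that have a complete row. The sketch below does only that arithmetic on the table values. The averages are not reported on the slide, and an unweighted mean over handsets is an assumption of this sketch.

```python
COLUMNS = ["matched", "mismatch", "CMS", "DIAG", "LIN", "PER", "MLP"]

# Phone accuracies (%) copied from the table; senh is omitted because it
# only has a matched-condition entry.
RESULTS = {
    "cb1": [53.6, 45.8, 50.3, 52.6, 52.2, 52.4, 51.9],
    "cb2": [54.9, 48.3, 52.4, 55.1, 54.8, 54.6, 53.7],
    "cb3": [48.5, 32.3, 38.7, 37.3, 40.6, 38.2, 41.9],
    "cb4": [49.8, 35.8, 40.8, 37.9, 42.9, 42.2, 43.3],
    "el1": [55.4, 45.6, 52.2, 54.0, 53.5, 53.2, 54.1],
    "el2": [53.7, 36.7, 49.1, 51.8, 52.5, 52.6, 52.4],
    "el3": [51.0, 44.6, 44.5, 47.1, 46.9, 47.1, 47.2],
    "el4": [53.7, 43.1, 47.6, 49.4, 49.6, 49.7, 50.1],
    "pt1": [52.6, 41.1, 43.0, 45.2, 46.0, 45.4, 45.9],
}

# Unweighted mean accuracy per column across handsets.
column_means = {
    name: sum(row[i] for row in RESULTS.values()) / len(RESULTS)
    for i, name in enumerate(COLUMNS)
}

# Compensation method with the highest mean (matched and mismatch excluded).
compensation_methods = ["CMS", "DIAG", "LIN", "PER", "MLP"]
best_method = max(compensation_methods, key=lambda m: column_means[m])

for name in COLUMNS:
    print(f"{name:9s} {column_means[name]:.1f}")
```

Per-handset differences are large (cb3 and cb4 in particular), so a single average should be read alongside the individual rows rather than in place of them.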
10. Conclusion
- Speech distorted by a low-quality speaker-to-microphone playback system can be compensated with a feature mapping network
- The feature mapping scheme should be useful when environmental conditions make it difficult to collect a database