Compensating speaker-to-microphone playback system for robust speech recognition

1
Compensating speaker-to-microphone playback system for robust speech recognition
So-Young Jeong and Soo-Young Lee
Brain Science Research Center and Department of Electrical Engineering and Computer Science
Korea Advanced Institute of Science and Technology
2
Motivation
  • ASR in mismatched environments
  • Environmental information
  • Background noise, acoustic/transmission channel
  • Assume environment degradation model

3
Channel Impacts on feature
Channel Assumption 1
  • P.S.
  • F.B.
  • L.S.
  • C.S.

Channel Assumption 2
4
Speaker-to-Microphone compensation
  • Speaker-to-Microphone playback
  • Speaker distortion
  • Nonlinearity caused by voice coil
  • Microphone distortion
  • Frequency response caused by different
    fabrication
  • Nonlinearity caused by dynamic range
  • Ambient noise by directionality

5
Speaker-to-Microphone mapping
  • Mapper training
  • Where should the mapper be deployed, and which
    type of mapper should be used?
  • Mapper application

[Figure: block diagram of mapper training and application. Labels: clean → F.E.; distorted → F.E. → Trained Mapper → To recognizer; Error.]
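
The train/apply split on this slide can be illustrated with a short sketch. It assumes a full affine mapper fitted by ordinary least squares on time-aligned pairs of distorted and clean feature frames. The slides do not state the training criterion, and the names `train_linear_mapper` and `apply_mapper`, the channel matrix, and the synthetic data are all hypothetical.

```python
import numpy as np

def train_linear_mapper(distorted, clean):
    """Fit W, b that minimize ||distorted @ W + b - clean||^2 over paired frames."""
    # Append a constant column so the bias is estimated together with the weights.
    X = np.hstack([distorted, np.ones((distorted.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X, clean, rcond=None)
    return coef[:-1], coef[-1]  # W has shape (dim, dim), b has shape (dim,)

def apply_mapper(distorted, W, b):
    """Map distorted feature frames toward the clean feature space."""
    return distorted @ W + b

# Synthetic stand-in for paired clean/distorted features (assumption, not HTIMIT data).
rng = np.random.default_rng(0)
clean = rng.normal(size=(500, 4))
channel = np.array([[1.0, 0.2, 0.0, 0.0],
                    [0.0, 0.8, 0.1, 0.0],
                    [0.0, 0.0, 1.2, 0.3],
                    [0.0, 0.0, 0.0, 0.9]])
distorted = clean @ channel + 0.5

W, b = train_linear_mapper(distorted, clean)
restored = apply_mapper(distorted, W, b)
err = float(np.mean((restored - clean) ** 2))  # mean squared mapping error
```

Because the synthetic distortion is itself affine and invertible, the fitted mapper recovers the clean features almost exactly; real handset distortion would leave a residual error, which is presumably why nonlinear mappers are also considered.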
6
Mapping error at L.S.
  • Diamond, plus, and cross denote the P.S., F.B., and L.S. levels

7
Frequency correlation plots
8
Recognition Experiments
  • Task
  • Phoneme recognition for 40 TIMIT phone sets
  • Phone accuracy = (N - D - S - I) × 100 / N
  • Database
  • HTIMIT: TIMIT sentences re-recorded through 10
    different telephone handsets
  • Training: 246 speakers × 8 sentences = 1968 sentences
  • Test: 48 speakers × 8 sentences = 384 sentences
  • Baseline
  • 3-state monophone HMM with 16 Gaussian mixtures
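
The phone accuracy measure listed under Task can be computed as in the sketch below. It reads N as the number of reference phones and D, S, I as the deletion, substitution, and insertion counts, which is the usual convention but is not spelled out on the slide. The function name and the example counts are hypothetical.

```python
def phone_accuracy(n_ref, deletions, substitutions, insertions):
    """Return phone accuracy in percent: (N - D - S - I) * 100 / N."""
    if n_ref <= 0:
        raise ValueError("n_ref must be positive")
    return (n_ref - deletions - substitutions - insertions) * 100.0 / n_ref

# Made-up counts for illustration only, not results from the slides.
acc = phone_accuracy(n_ref=1000, deletions=50, substitutions=300, insertions=40)
```

Since insertions are subtracted in the numerator, this measure can fall below zero for a recognizer that inserts many spurious phones.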

9
Experiment I: CI result
type  matched  mismatched  CMS   DIAG  LIN   PER   MLP
senh  54.7
cb1   53.6     45.8        50.3  52.6  52.2  52.4  51.9
cb2   54.9     48.3        52.4  55.1  54.8  54.6  53.7
cb3   48.5     32.3        38.7  37.3  40.6  38.2  41.9
cb4   49.8     35.8        40.8  37.9  42.9  42.2  43.3
el1   55.4     45.6        52.2  54.0  53.5  53.2  54.1
el2   53.7     36.7        49.1  51.8  52.5  52.6  52.4
el3   51.0     44.6        44.5  47.1  46.9  47.1  47.2
el4   53.7     43.1        47.6  49.4  49.6  49.7  50.1
pt1   52.6     41.1        43.0  45.2  46.0  45.4  45.9
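
The CMS column is assumed here to mean cepstral mean subtraction, the standard per-utterance normalization; the slides do not define it. Under that assumption, a minimal sketch: a fixed linear channel adds a constant offset to every cepstral frame, and subtracting the utterance mean removes that offset. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Subtract the per-utterance mean from each cepstral coefficient track."""
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# Synthetic utterance: 200 frames of 13 cepstral coefficients (assumed sizes).
rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 13))
channel_offset = rng.normal(size=(1, 13))  # constant offset from a fixed channel
distorted = clean + channel_offset

# After CMS the constant channel offset no longer distinguishes the two versions.
same = np.allclose(cepstral_mean_subtraction(clean),
                   cepstral_mean_subtraction(distorted))
```

CMS can only remove a constant offset, so it cannot undo the nonlinear speaker and microphone distortions listed earlier in the slides.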
10
Conclusion
  • Speech distorted by a low-quality
    speaker-to-microphone playback system can be
    compensated with a feature-mapping network
  • Feature mapping would be useful when
    environmental conditions make it hard to
    collect a database