Enhanced Speech Models for Robust Speech Recognition

Transcript and Presenter's Notes

Title: Enhanced Speech Models for Robust Speech Recognition

1
Enhanced Speech Models for Robust Speech Recognition

Juan Arturo Nolazco-Flores
Dpto. de Ciencias Computacionales
ITESM, campus Monterrey

2
Talk Overview

Introduction
Enhanced-Speech Models
Comments and Conclusions

3
Questions?
4
Introduction

Problem
Automatic Speech Recognition (ASR) performance is
highly degraded when speech is corrupted by
noise (additive noise, convolutional noise,
etc.).
Fact
For speech recognisers to be usable in real
conditions, ASR must tackle this problem.
Knowledge.
ASR can be improved by either
Enhancing speech before recognition, or
Training models in the same environment in which
the ASR is going to be used.
Challenge
Find a simple and efficient technique to solve
this problem.

5
Recognition using CD-HMM
Recogniser
6
Recognition under Adverse Environments
TIMIT 6632
Digits 10
7
(No Transcript)
8
Enhancing Speech

Features
Models are trained with clean speech.
Corrupted speech is enhanced.
There are a number of well studied techniques
Subtract an estimated noise found during
nonspeech activity.
Adaptive noise cancelling (ANC).
Successful for low to medium SNR (> 5 dB).

Problems
Enhancers are not perfect, therefore
the speech is distorted and
there is residual noise.

10
Training models in the same environment

ASR systems that use this technique can deal
with low to high SNR (> 0 dB).
For example, for an isolated digit recognition
task where digits are corrupted by
helicopter (Lynx) noise, the following
performance is obtained
For TIMIT
Problem
There are many possible environments, so training
for each one is not practical.

However, with continuous HMMs it is possible to
combine the clean speech model and the noise model
to obtain a noisy speech model.
Techniques
Model Decomposition
Parallel Model Combination, PMC (Mark Gales,
1996).
Cepstrum-Domain Model Combination, CDMC (Kim and
Rose, 2002).

12
Changing to linear domain using PMC

Introduction
Scheme
Diagram

13
Introduction

PMC is an artificial way of simulating that the
system has been trained in the adverse
environment in which it is going to work.
The clean speech CHMM and the noise CHMM
(estimated from the noise before the word is
uttered) are combined in the linear domain to
obtain models adapted to the adverse environment.
The combination is based on the assumption that
the pdfs of the state distributions are
completely defined by their means and variances.

14
Scheme

For simplicity, it is convenient to combine these
models in a linear domain.
Problem
High-performance speech recognition is obtained
in a non-linear domain (e.g. mel-cepstral domain,
auditory-based coefficients).
Solution
Transform coefficients to a linear domain.

15
Diagram
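Clean speech HMM: C^-1(), then exp(), into the linear domain.
Noise HMM: C^-1(), then exp(), into the linear domain.
Linear domain: the two models are combined.
PMC HMM: obtained from the combined model via log(), then C().
Simulates training in noise.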
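The chain in this diagram can be sketched in code. The following is a minimal sketch, not the presenter's implementation. It assumes the log-normal approximation commonly used with PMC, full covariance matrices, a square orthonormal DCT standing in for C(), and a hypothetical gain term that scales the speech. All function names are illustrative.

```python
import numpy as np

def dct_matrix(n):
    """Square orthonormal DCT-II matrix, standing in for C(); its inverse is its transpose."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    c[0, :] /= np.sqrt(2.0)
    return c

def log_to_linear(mu_log, cov_log):
    """exp() step: moments of a log-normal variable from its log-domain Gaussian."""
    mu = np.exp(mu_log + 0.5 * np.diag(cov_log))
    cov = np.outer(mu, mu) * (np.exp(cov_log) - 1.0)
    return mu, cov

def linear_to_log(mu, cov):
    """log() step: inverse of log_to_linear under the same log-normal assumption."""
    cov_log = np.log(cov / np.outer(mu, mu) + 1.0)
    mu_log = np.log(mu) - 0.5 * np.diag(cov_log)
    return mu_log, cov_log

def pmc_combine(mu_s_cep, cov_s_cep, mu_n_cep, cov_n_cep, gain=1.0):
    """Combine one clean-speech state and one noise state, both given in the cepstral domain."""
    C = dct_matrix(mu_s_cep.shape[0])
    linear = []
    for mu_cep, cov_cep in ((mu_s_cep, cov_s_cep), (mu_n_cep, cov_n_cep)):
        mu_log = C.T @ mu_cep            # C^-1(): cepstral -> log-spectral
        cov_log = C.T @ cov_cep @ C
        linear.append(log_to_linear(mu_log, cov_log))
    (mu_s, cov_s), (mu_n, cov_n) = linear
    mu = gain * mu_s + mu_n              # additive noise in the linear domain
    cov = gain ** 2 * cov_s + cov_n      # speech and noise assumed independent
    mu_log, cov_log = linear_to_log(mu, cov)
    return C @ mu_log, C @ cov_log @ C.T  # C(): back to the cepstral domain
```

When the noise model carries negligible energy, the combined model should fall back to the clean speech model, which is a convenient sanity check for a sketch like this.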
16
Enhanced Speech Models

Introduction
Hypothesis proof
Enhanced-Speech Models Combination
Changing to linear domain using PMC
Diagram
Results

17
Introduction

When we train in the same environment, we
obtain the following upper-bound values
Since PMC and CDMC (Cepstrum-Domain Model
Combination) try to simulate training and
recognition in the same environment, these are the
best results to be expected from this kind of technique.

18
Introduction

How can we improve recognition performance in
adverse environments?

Fact
The enhancer returns cleaner, but distorted,
speech.
Therefore the question is
Is it possible to improve recognition performance
if the models are trained with this enhanced
speech?

20
Hypothesis

Enhanced-Speech models improve ASR performance in
noisy environments.

21
In order to prove this hypothesis

A signal enhancement scheme has to be selected.
Models have to be trained with the enhanced
speech.
Observation vectors input to the recogniser have
to be processed by the selected enhancement
scheme.

22
Hypothesis Proof

Introduction
Spectral Subtraction definition
Experiments and results
Conclusions

23
Introduction

Since it is a simple (and successful) scheme,
Spectral Subtraction (SS) was selected.

24
Spectral Subtraction Definition

Before filterbank
After filterbank.
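As a rough illustration of spectral subtraction, the following is a minimal sketch, assuming power-domain subtraction with an over-subtraction factor alpha and a spectral floor beta. The floor, the parameter names and the noise estimate from leading nonspeech frames are assumptions made for this example, not the definition given on the slide.

```python
import numpy as np

def estimate_noise(power_frames, n_noise_frames):
    """Average the first n_noise_frames frames, assumed to contain no speech."""
    return power_frames[:n_noise_frames].mean(axis=0)

def spectral_subtract(power, noise_power, alpha=1.0, beta=0.01):
    """Subtract alpha times the noise estimate, flooring at beta times the noise estimate.

    `power` holds non-negative power values per frame: FFT bins if SS is applied
    before the filterbank, or filterbank energies if it is applied after.
    """
    cleaned = power - alpha * noise_power
    return np.maximum(cleaned, beta * noise_power)
```

The same function covers both placements. Without the floor the two placements would give identical filterbank outputs for a linear filterbank, but with the floor they generally differ.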

25
Experiments and Results.

CHMMs were trained with speech enhanced by SS.
Recognition performance was evaluated on speech
enhanced by SS under the same conditions.

26
Example 1

Task: isolated digit recognition
Vocabulary size: 10
Training: using enhanced speech
Noise: helicopter (Lynx)
Database: NOISEX-92
Real noise is artificially added to clean speech,
such that no Lombard effect can bias recognition
performance.
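Adding recorded noise to clean speech at a chosen SNR can be sketched as follows. This is a minimal sketch, assuming the SNR is defined over the whole utterance as the ratio of average powers. The function name and that global-SNR definition are assumptions, since the slides do not say how the SNR was measured.

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to give the requested SNR in dB."""
    noise = np.resize(noise, speech.shape)   # repeat or truncate the noise to match
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose scale so that 10*log10(p_speech / (scale**2 * p_noise)) == snr_db.
    scale = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise
```

Because the clean recording itself is unchanged, the speaker has not adapted to the noise, which is why this kind of mixing avoids the Lombard effect.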

27
Std. HMM

bPSS

Training Models in Noise (PMC)
Enhanced-Speech Models
28
Example 2

Task: continuous digit recognition
Vocabulary size: 30 words
Training: using enhanced speech
Noise: white
White noise is artificially added to clean
speech, such that no Lombard effect can bias
recognition performance.

29
Results
Std. HMM
Noisy Speech Models (PMC)
Enhanced-Speech Models
30
Example 3

Task: continuous speech recognition
Vocabulary size: 6233 words
Training: using enhanced speech
Noise: white
Database: TIMIT
Real noise is artificially added to clean speech,
such that no Lombard effect can bias recognition
performance.

31
Results
Std. HMM
Noisy Speech Models (PMC)
Enhanced-Speech Models
32
Conclusions

The hypothesis was shown to be true.
Challenge
Try these experiments using other databases.
How can we combine
the enhancement scheme,
the noise model
and the clean models
such that we do not need to train for all
enhancement conditions?

33
Conclusions

Are all the enhancement schemes suited for
combination?

34
Conclusions

We now know that ASR can be improved by any of
Enhancing speech before recognition.
Training CHMMs in the same environment in which the
ASR is going to be used.
Training CHMMs with the same enhancement technique
that is used to get cleaner speech at
recognition.
Advantage
Training with a better enhancement
technique potentially means better recognition
performance.

35
ES-SS Model Combination

Introduction
ES-Spectral Subtraction Scheme

36
Introduction

How can we combine CHMMs without having to train
for each enhancement and noise condition?
Observation: for CHMMs, the state pdfs are
completely defined by their means and variances.

37
ES-Spectral Subtraction Scheme
Assume Y and YD can be modelled as parametric
distributions with means E[Y] and E[YD] and
variances V[Y] and V[YD].
It can be shown that these parameters
are distorted as follows
pdf of Y
38
Proof
where
Re-arranging
39
Hence
40
A(a,P(Y))
Assuming that Y is lognormal
Making
( )
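As a numerical cross-check rather than a closed-form result, the effect of spectral subtraction on the mean and variance of a lognormal Y can be estimated by sampling. This is a minimal sketch: the floored subtraction rule, the fixed noise estimate, and the names alpha, beta and ss_moments are assumptions for illustration and are not taken from the slides.

```python
import numpy as np

def lognormal_samples(mean, var, size, rng):
    """Draw lognormal samples with the given linear-domain mean and variance."""
    sigma2 = np.log(var / mean ** 2 + 1.0)
    mu = np.log(mean) - 0.5 * sigma2
    return rng.lognormal(mean=mu, sigma=np.sqrt(sigma2), size=size)

def ss_moments(mean_y, var_y, noise, alpha=1.0, beta=0.01, size=200_000, seed=0):
    """Monte Carlo estimate of the mean and variance of YD = max(Y - alpha*N, beta*N)."""
    rng = np.random.default_rng(seed)
    y = lognormal_samples(mean_y, var_y, size, rng)
    y_d = np.maximum(y - alpha * noise, beta * noise)
    return y_d.mean(), y_d.var()
```

When the floor is almost never reached, the estimate reduces to a shifted mean with an unchanged variance; when the subtraction is heavy, the floor dominates and the variance collapses.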
41
ES-PMC Diagram
Adaptation calculations
Clean speech HMM
ES-PMC HMM
C -> log
exp()
C()
log()

PMC
Noise HMM
C -> log
exp()
Speech is pre-processed using SS.
42
Results
No compensation scheme
Spectral Subtraction
PMC
Spectral Subtraction and parallel model combination
43
Results
No compensation scheme
Spectral Subtraction
PMC
Spectral Subtraction and parallel model combination
44
Results
No compensation scheme
Spectral Subtraction
PMC
Spectral Subtraction and parallel model combination
45
Results
No compensation scheme
Spectral Subtraction
PMC
Spectral Subtraction and parallel model combination
46
Comments and Conclusions

Since training and recognition with the same
speech enhancement scheme had not been tried
before, a new area of research has been
opened.
How can we combine CHMMs such that we do not need
to train for all enhancement conditions?
Are all enhancement techniques suited for CHMM
combination?
We showed how to combine enhanced-speech, noise and
clean CHMMs for the SS scheme.
It was shown that the equations for ES-PMC-SS are
straightforward.

We expect that training with a better enhancement
technique will also give better recognition
performance.
Future work
Develop equations and experiments for other
enhancement techniques.
Obtain the optimal alpha for the SS scheme.
Compensate in the Cepstrum Domain.
