HIWIRE MEETING, CRETE, SEPTEMBER 23-24, 2004

1
HIWIRE MEETING, CRETE, SEPTEMBER 23-24, 2004

JOSÉ C. SEGURA LUNA

2
Outline

VAD for noise suppression & frame-dropping
Long-Term Spectral Divergence
Subband OS-based detector
Non-linear feature normalization
Histogram equalization
OS-based equalization
Segmental implementation

3
VAD (1)

VAD motivation
To estimate the background noise for
Wiener filter design
Spectral subtraction
To discard non-speech frames

4
VAD (2)

Our approach
Use rather long time spans (100 ms) instead of instantaneous measures
Increased discrimination
Use a statistical model in the log filter-bank energy (log-FBE) domain
Smoother estimates
Use a feedback decision coupled with noise suppression
The VAD works on less noisy speech
Use Order Statistics
More robust estimation

5
Long-Term Spectral Divergence (1)

J. Ramírez, J.C. Segura, C. Benítez, A. de la Torre and A.J. Rubio, Efficient voice activity detection algorithms using long-term speech information, Speech Communication 42 (2004) 271-287
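A minimal sketch of the long-term spectral divergence, assuming the formulation as we understand it from the cited paper: the long-term spectral envelope (LTSE) is the per-bin maximum of the magnitude spectrum over a window of 2N+1 frames, and the LTSD is the bin-averaged ratio of LTSE energy to noise energy, in dB. The noise spectrum is taken as given here (the noise update is omitted), and the names `ltsd`, `ltsd_vad` and the default `order` and `threshold_db` values are hypothetical.

```python
import numpy as np

def ltsd(spectra, noise_spectrum, order=6):
    """Long-term spectral divergence (dB) for each frame.

    spectra:        (n_frames, n_bins) magnitude spectra X(k, l)
    noise_spectrum: (n_bins,) magnitude noise estimate N(k), assumed given
    order:          half-width N of the long-term window (2N+1 frames)
    """
    n_frames = spectra.shape[0]
    out = np.empty(n_frames)
    for l in range(n_frames):
        lo, hi = max(0, l - order), min(n_frames, l + order + 1)
        # Long-term spectral envelope: per-bin maximum over the window
        ltse = spectra[lo:hi].max(axis=0)
        # Bin-averaged LTSE-to-noise energy ratio, in dB
        out[l] = 10.0 * np.log10(np.mean(ltse ** 2 / noise_spectrum ** 2))
    return out

def ltsd_vad(spectra, noise_spectrum, threshold_db=6.0, order=6):
    """Speech (True) / non-speech (False) per frame; threshold is hypothetical."""
    return ltsd(spectra, noise_spectrum, order) > threshold_db
```

Because the envelope is a maximum over the window, a single high-energy frame raises the LTSD of every frame within N frames of it, which is how the long time span increases discrimination at speech onsets and offsets.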

6
Long-Term Spectral Divergence (2)
7
Long-Term Spectral Divergence (3)
8
Long-Term Spectral Divergence (4)
9
Long-Term Spectral Divergence (5)
10
Long-Term Spectral Divergence (7)

Recognition experiments with AURORA 2 and 3

11
Long-Term Spectral Divergence (6)
12
Subband OSF VAD (1)

J. Ramírez, J.C. Segura, C. Benítez, A. de la Torre and A.J. Rubio, An Effective Subband OSF-based VAD with Noise Reduction for Robust Speech Recognition, IEEE Trans. on Speech and Audio Processing (to appear in 2005)
The decision is based on an averaged QSNR, defined as an inter-quantile difference

Feedback structure
VAD operates over the noise-reduced signal
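A rough sketch of an averaged quantile-based SNR decision. The exact quantiles, window length and noise-level estimate of the cited paper are not given on the slide, so everything here is an assumption: the inter-quantile difference is taken between an upper p-quantile of each subband's log energies over 2N+1 frames and a noise level that is itself a quantile (the median) of frames assumed to be non-speech. The feedback noise-reduction stage is omitted, and the names `noise_floor`, `averaged_qsnr` and all defaults are hypothetical.

```python
import numpy as np

def noise_floor(log_energies, n_init=10):
    """Hypothetical noise level: per-subband median of the first frames,
    assumed to contain no speech."""
    return np.median(log_energies[:n_init], axis=0)

def averaged_qsnr(log_energies, floor, order=4, p=0.9):
    """Averaged quantile SNR (dB) per frame.

    log_energies: (n_frames, n_subbands) subband log energies in dB
    floor:        (n_subbands,) noise level in dB
    For each frame, the p-quantile of every subband over a window of
    2*order+1 frames is compared with the noise level, and the
    differences are averaged over subbands.
    """
    n = log_energies.shape[0]
    out = np.empty(n)
    for l in range(n):
        lo, hi = max(0, l - order), min(n, l + order + 1)
        q_p = np.quantile(log_energies[lo:hi], p, axis=0)  # order statistic per subband
        out[l] = np.mean(q_p - floor)
    return out
```

A frame would then be labelled as speech when `averaged_qsnr(...)` exceeds a threshold; using a quantile rather than the maximum makes the estimate less sensitive to isolated noise spikes.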

13
Subband OSF VAD (2)
14
Subband OSF VAD (3)
15
Subband OSF VAD (4)
16
Subband OSF VAD (5)
17
Accurate VAD

Open topics
New alternatives to improve performance
New decision criteria based on OS filters
Already used for edge detection in images
Computational efficiency
Development of computationally efficient algorithms

18
Feature normalization

Objective
Transform features to remove undesired variability
Linear techniques
CMS
Cepstral mean subtraction
Removes the effect of linear channel distortion
CMVN
Cepstral mean and variance normalization
Extension of CMS to deal with the variance reduction caused by additive noise
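A minimal utterance-level sketch of the two linear normalizations, assuming a feature matrix of shape (frames, coefficients). A linear channel is a convolution in time, which becomes an additive constant in the cepstral domain, so subtracting the per-coefficient mean removes it; CMVN additionally rescales each coefficient to unit variance. The function names and the `eps` guard are illustrative choices, not taken from the slides.

```python
import numpy as np

def cms(features):
    """Cepstral mean subtraction over one utterance (frames x coefficients)."""
    return features - features.mean(axis=0)

def cmvn(features, eps=1e-8):
    """Cepstral mean and variance normalization over one utterance.

    eps avoids division by zero for a constant coefficient.
    """
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    return (features - mu) / (sigma + eps)
```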

19
Feature normalization

Non-linear feature distortion
Environmental effects on MFCC features are non-linear
and can hardly be removed with linear techniques,
because not only the location (mean) and scale (variance) of the feature distributions are affected, but also their shape (higher-order moments)
Non-linear extensions
CDF-matching approaches (HEQ and related)
Have been shown to be more effective than linear ones
Normalize more than just the first two moments of the probability distributions

20
CDF-matching based equalization

The main idea
Transform the features to match a given PDF
In the one-dimensional case, CDF matching gives the solution
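For a scalar feature x with CDF F_x and a reference CDF F_ref, the CDF-matching transform can be written as below; with a Gaussian reference, F_ref^{-1} is the inverse standard normal CDF. The second relation is the one-step check that y follows the reference distribution, assuming F_x is continuous and F_ref is continuous and strictly increasing (so that F_x(x) is uniform on [0, 1]).

```latex
y = F_{\mathrm{ref}}^{-1}\big(F_x(x)\big), \qquad
P(y \le t) = P\big(F_x(x) \le F_{\mathrm{ref}}(t)\big) = F_{\mathrm{ref}}(t)
```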

21
Equalization and robust classifiers
22
Invariance

CMS is invariant to an additive bias
CMVN is invariant to affine transformations (positive scaling plus bias)
Equalization to a reference distribution is invariant to any monotonically increasing transformation (including non-linear ones)
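A small numerical illustration of these invariances, assuming a rank-based (empirical CDF) equalization to N(0, 1) as a stand-in for equalization in general. The helper names are hypothetical, and `statistics.NormalDist` from the Python standard library supplies the inverse Gaussian CDF. The last line also shows why the transformation must be increasing: a decreasing map only flips the sign of the equalized output.

```python
import numpy as np
from statistics import NormalDist

def cmvn(x):
    """Mean and variance normalization of a 1-D sample."""
    return (x - x.mean()) / x.std()

def equalize(x):
    """Rank-based CDF matching of a 1-D sample to N(0, 1) (assumes no ties)."""
    ranks = np.argsort(np.argsort(x))          # 0 .. n-1
    inv = NormalDist().inv_cdf
    return np.array([inv((r + 0.5) / len(x)) for r in ranks])

rng = np.random.default_rng(0)
x = rng.normal(size=1000)

print(np.allclose(cmvn(2.5 * x - 1.0), cmvn(x)))      # True: affine map
print(np.allclose(cmvn(np.exp(x)), cmvn(x)))          # False: non-linear map
print(np.allclose(equalize(np.exp(x)), equalize(x)))  # True: increasing non-linear map
print(np.allclose(equalize(-x), -equalize(x)))        # True: decreasing map flips sign
```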

23
HEQ for robust speech recognition (1)

A. de la Torre, A.M. Peinado, J.C. Segura, J.L. Pérez, C. Benítez and A.J. Rubio, Histogram equalization of speech representation for robust speech recognition, IEEE Trans. on Speech and Audio Processing (to appear in 2005)
Transformation of each component of the MFCC vector to a Gaussian reference
Cumulative distributions are estimated using histograms
Performance compared with CMS, CMVN and model-based feature compensation (VTS, vector Taylor series)
Combination with VTS
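A minimal sketch of histogram-based equalization of each MFCC component to a Gaussian reference, assuming the CDF is read off a histogram and interpolated linearly between bin edges. The bin count, the clipping of probabilities away from 0 and 1, and the function names are illustrative assumptions, not details from the cited paper.

```python
import numpy as np
from statistics import NormalDist

def heq_histogram(x, n_bins=50):
    """Histogram equalization of one feature component to N(0, 1)."""
    counts, edges = np.histogram(x, bins=n_bins)
    # Empirical CDF at the bin edges (0 at the first edge, 1 at the last)
    cdf = np.concatenate(([0.0], np.cumsum(counts))) / len(x)
    p = np.interp(x, edges, cdf)
    # Keep probabilities strictly inside (0, 1) so the inverse CDF stays finite
    p = np.clip(p, 0.5 / len(x), 1.0 - 0.5 / len(x))
    inv = NormalDist().inv_cdf
    return np.array([inv(v) for v in p])

def heq_features(features, n_bins=50):
    """Apply HEQ independently to each column (MFCC component)."""
    return np.column_stack(
        [heq_histogram(features[:, j], n_bins) for j in range(features.shape[1])]
    )
```

Each component is equalized on its own, which matches the per-component transformation on the slide but ignores correlations between coefficients.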

24
HEQ for robust speech recognition (2)
25
HEQ for robust speech recognition (3)
26
HEQ for robust speech recognition (4)
27
HEQ for robust speech recognition (5)
28
Segmental HEQ (1)

J.C. Segura, C. Benítez, A. de la Torre, A.J. Rubio and J. Ramírez, Cepstral Domain Segmental Nonlinear Feature Transformations for Robust Speech Recognition, IEEE Signal Processing Letters, 11(5), May 2004
A segmental implementation of HEQ for non-stationary noise
A temporal buffer, instead of the full sentence, is used for histogram estimation
The algorithmic delay is T frames
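A sketch of the segmental variant for one feature component, assuming the buffer for frame t covers frames t-T .. t+T (clipped at the utterance edges). This centered window is our assumption; it is consistent with the stated delay of T frames because each output needs T future frames. The histogram is recomputed for every frame, and `segmental_heq` and its defaults are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def segmental_heq(x, T=50, n_bins=20):
    """Segmental HEQ of one feature component (1-D array of frames)."""
    inv = NormalDist().inv_cdf
    n = len(x)
    y = np.empty(n)
    for t in range(n):
        # Temporal buffer around the current frame instead of the full sentence
        buf = x[max(0, t - T): min(n, t + T + 1)]
        counts, edges = np.histogram(buf, bins=n_bins)
        cdf = np.concatenate(([0.0], np.cumsum(counts))) / len(buf)
        p = float(np.interp(x[t], edges, cdf))
        p = min(max(p, 0.5 / len(buf)), 1.0 - 0.5 / len(buf))
        y[t] = inv(p)
    return y
```

Because the CDF is local, a slow drift in the feature (e.g. a changing noise level) is tracked and removed, whereas a single full-sentence histogram would leave it in place. The cost is one histogram per frame.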

29
Segmental HEQ (2)
30
OSEQ: An efficient implementation (1)

A very computationally efficient algorithm based on Order Statistics
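One way order statistics can make the segmental transform cheap, offered as a sketch rather than the OSEQ algorithm of the slides: a sorted copy of the sliding buffer (frames t-T .. t+T, an assumption) is updated incrementally with `bisect` (one insertion and one removal per frame), and the current value is mapped through its rank r among the N buffered values as Phi^{-1}((r - 0.5)/N). The rank-to-probability rule and the name `oseq` are assumptions.

```python
import bisect
from statistics import NormalDist

def oseq(x, T=50):
    """Order-statistics equalization of one feature component (sketch)."""
    inv = NormalDist().inv_cdf
    n = len(x)
    window = sorted(x[: min(n, T + 1)])  # buffer for t = 0: frames 0 .. T
    y = []
    for t in range(n):
        if t > 0:
            enter = t + T                # frame entering the buffer
            if enter < n:
                bisect.insort(window, x[enter])
            leave = t - T - 1            # frame leaving the buffer
            if leave >= 0:
                window.pop(bisect.bisect_left(window, x[leave]))
        N = len(window)
        r = bisect.bisect_left(window, x[t]) + 1  # 1-based rank of x[t]
        y.append(inv((r - 0.5) / N))
    return y
```

No histogram is built: the sorted buffer already holds the order statistics, so the per-frame work is a binary search plus a list insertion and removal.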

31
OSEQ: An efficient implementation (2)
32
Feature normalization

Open topics
Reference distribution
Clean speech / Gaussian / Others?
Dynamic feature normalization (Δ and ΔΔ)
After, before or simultaneously? (Obuchi, Stern, EUSP03)
Progressive normalization
Not all MFCCs are equally affected, nor do they have equal discriminative power (de Wet, …, ICASSP03)
Lower-order moment normalization (Hsu, Lee, ICASSP04)
Parametric techniques
Current approaches are non-parametric (Haverinen, Kiss, EUSP03)
New applications
Speaker independence and adaptation
Multi-stream normalization