Title: HIWIRE MEETING, CRETE, SEPTEMBER 23-24, 2004
1 HIWIRE MEETING, CRETE, SEPTEMBER 23-24, 2004
2 Schedule
- VAD for noise suppression and frame-dropping
- Long-Term Spectral Divergence
- Subband OS-based detector
- Non-linear feature normalization
- Histogram equalization
- OS-based equalization
- Segmental implementation
3 VAD (1)
- VAD motivation
- To get an estimation of the background noise for
- Wiener filter design
- Spectral subtraction
- To discard non-speech frames
4 VAD (2)
- Our approach
- Use of rather long time spans (100 ms) instead of instantaneous measures
- Increases discrimination
- Use of a statistical model in the log-FBE domain
- Smoother estimates
- Use of a feedback decision coupled with noise suppression
- VAD works on less noisy speech
- Use of Order Statistics
- More robust estimation
5 Long-Term Spectral Divergence (1)
- J. Ramírez, J.C. Segura, C. Benítez, A. de la Torre and A.J. Rubio, "Efficient voice activity detection algorithms using long-term speech information", Speech Communication 42 (2004) 271-287
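A minimal sketch of the long-term spectral divergence idea, assuming the usual formulation in which a long-term spectral envelope (LTSE) is the per-bin maximum of the magnitude spectrum over a window of 2N+1 frames, and the LTSD is the average ratio of the squared LTSE to the squared noise spectrum, in dB. The function names, the window order and the fixed threshold are illustrative assumptions, not values taken from the slides.

```python
import math

def ltse(spectra, l, order):
    """Per-bin maximum magnitude over frames l-order .. l+order (clipped at the edges)."""
    lo, hi = max(0, l - order), min(len(spectra), l + order + 1)
    n_bins = len(spectra[0])
    return [max(spectra[j][k] for j in range(lo, hi)) for k in range(n_bins)]

def ltsd(spectra, noise, l, order):
    """Long-term spectral divergence (dB) of frame l against a noise magnitude spectrum."""
    env = ltse(spectra, l, order)
    ratio = sum((e * e) / (n * n) for e, n in zip(env, noise)) / len(env)
    return 10.0 * math.log10(ratio)

def vad(spectra, noise, order=6, threshold_db=6.0):
    """Flag frame l as speech when its LTSD exceeds a (hypothetical) fixed threshold."""
    return [ltsd(spectra, noise, l, order) > threshold_db for l in range(len(spectra))]
```

Because the envelope is taken over neighbouring frames, frames within `order` frames of a speech segment are also flagged, which acts as a built-in hang-over around word boundaries.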
6 Long-Term Spectral Divergence (2)
7 Long-Term Spectral Divergence (3)
8 Long-Term Spectral Divergence (4)
9 Long-Term Spectral Divergence (5)
10 Long-Term Spectral Divergence (7)
- Recognition experiments with AURORA 2 and 3
11 Long-Term Spectral Divergence (6)
12 Subband OSF VAD (1)
- J. Ramírez, J.C. Segura, C. Benítez, A. de la Torre and A.J. Rubio, "An Effective Subband OSF-based VAD with Noise Reduction for Robust Speech Recognition", IEEE Trans. on Speech and Audio Processing (to appear in 2005)
- Decision is based on an averaged QSNR defined as an inter-quantile difference
- Feedback structure
- VAD operates over the noise-reduced signal
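The decision rule above can be sketched roughly as follows: for each subband, the log-energies in a window of frames are sorted, a quantile difference is taken, and the differences are averaged over subbands before thresholding. The particular quantiles (0.9 and 0.5), the threshold and the function names are placeholders chosen for illustration; the slides do not give these values, and the noise-reduction feedback loop is not modelled.

```python
def quantile(values, p):
    """Order-statistics quantile with linear interpolation between sorted samples."""
    s = sorted(values)
    pos = p * (len(s) - 1)
    i = int(pos)
    frac = pos - i
    return s[i] if i + 1 >= len(s) else (1 - frac) * s[i] + frac * s[i + 1]

def averaged_qsnr(window, p_high=0.9, p_low=0.5):
    """window: list of frames, each a list of subband log-energies in dB."""
    n_bands = len(window[0])
    diffs = []
    for k in range(n_bands):
        band = [frame[k] for frame in window]
        diffs.append(quantile(band, p_high) - quantile(band, p_low))
    return sum(diffs) / n_bands

def is_speech(window, threshold_db=3.0):
    """Speech is declared when the subband-averaged quantile difference is large."""
    return averaged_qsnr(window) > threshold_db
```

In stationary noise the two quantiles nearly coincide, so the averaged difference stays close to zero; a speech onset inside the window raises the upper quantile first.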
13 Subband OSF VAD (2)
14 Subband OSF VAD (3)
15 Subband OSF VAD (4)
16 Subband OSF VAD (5)
17 Accurate VAD
- Open topics
- New alternatives to improve the performance
- New decision criteria based on OS filters
- Already used for edge detection in images
- Computational efficiency
- Development of computationally efficient
algorithms
18 Feature normalization
- Objective
- Transform features to remove undesired variability
- Linear techniques
- CMS
- Cepstral mean subtraction
- Removes the effect of linear channel distortion
- CMVN
- Cepstral mean and variance normalization
- Extension of CMS to deal with variance reduction
caused by the additive noise
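The two linear techniques can be sketched per utterance as below. This is a minimal illustration, assuming the statistics are computed over the whole utterance and that `frames` is a list of cepstral vectors; the small `eps` guard is an addition for numerical safety.

```python
import statistics

def cms(frames):
    """Cepstral mean subtraction: remove the per-coefficient mean of the utterance."""
    dims = range(len(frames[0]))
    mean = [statistics.fmean(f[d] for f in frames) for d in dims]
    return [[f[d] - mean[d] for d in dims] for f in frames]

def cmvn(frames, eps=1e-12):
    """Cepstral mean and variance normalization: zero mean, unit variance per coefficient."""
    dims = range(len(frames[0]))
    mean = [statistics.fmean(f[d] for f in frames) for d in dims]
    std = [statistics.pstdev([f[d] for f in frames]) for d in dims]
    return [[(f[d] - mean[d]) / (std[d] + eps) for d in dims] for f in frames]
```

A constant offset added to every frame (a linear channel in the cepstral domain) leaves the output of `cms` unchanged, and a positive rescaling of a coefficient leaves the output of `cmvn` unchanged.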
19 Feature normalization
- Non-linear feature distortion
- Environment effects are non-linear for MFCC features
- And can hardly be removed with linear techniques
- Because not only the location (mean) and scale (variance) of the feature distributions are affected, but also the shape (affecting higher order moments of the distribution)
- Non-linear extensions
- CDF-matching approaches (HEQ and related)
- Have been shown to be more effective than linear ones
- Normalize more than just the first two moments of the probability distributions
20 CDF-matching based equalization
- The main idea
- Transform the features to match a given PDF
- In the one-dimensional case CDF-matching gives
the solution
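In the one-dimensional case the transformation can be written explicitly. The notation is introduced here for illustration ($C_X$ is the CDF of the observed feature $x$, $C_{\mathrm{ref}}$ the CDF of the reference PDF), and both CDFs are assumed continuous and strictly increasing:

```latex
y = T(x) = C_{\mathrm{ref}}^{-1}\bigl(C_X(x)\bigr),
\qquad
P(y \le t) = P\bigl(C_X(x) \le C_{\mathrm{ref}}(t)\bigr) = C_{\mathrm{ref}}(t)
```

The last equality holds because $C_X(x)$ is uniformly distributed on $[0, 1]$, so the transformed feature $y$ follows the reference distribution.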
21 Equalization and robust classifiers
22 Invariance
- CMS is invariant to additive bias
- CMVN is invariant to linear transformations
- Equalization to a reference distribution is
invariant to any invertible transformation
(including non-linear ones)
23 HEQ for robust speech recognition (1)
- A. de la Torre, A.M. Peinado, J.C. Segura, J.L. Pérez, C. Benítez and A.J. Rubio, "Histogram equalization of speech representation for robust speech recognition", IEEE Trans. on Speech and Audio Processing (to appear in 2005)
- Transformation of each component of the MFCC vector to a Gaussian reference
- Cumulative distributions are estimated using histograms
- Performance compared with CMS, CMVN and model-based feature compensation (VTS)
- Combination with VTS
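A rough sketch of histogram equalization of a single feature component to a standard Gaussian reference. The number of bins, the evaluation of the cumulative histogram at bin centres and the function name are assumptions made for this illustration, not details taken from the cited paper.

```python
from statistics import NormalDist

def heq_gaussian(values, n_bins=20):
    """Histogram-equalize one feature component to a standard Gaussian reference."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against constant input
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    # Cumulative histogram evaluated at bin centres (half of the own bin is counted).
    cdf, acc = [], 0
    for c in counts:
        cdf.append((acc + 0.5 * c) / len(values))
        acc += c
    ref = NormalDist()
    out = []
    for v in values:
        b = min(int((v - lo) / width), n_bins - 1)
        p = min(max(cdf[b], 1e-6), 1 - 1e-6)  # keep the argument of inv_cdf inside (0, 1)
        out.append(ref.inv_cdf(p))
    return out
```

In a recognizer this would be applied independently to each MFCC component, using all frames of the sentence to build the histogram.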
24 HEQ for robust speech recognition (2)
25 HEQ for robust speech recognition (3)
26 HEQ for robust speech recognition (4)
27 HEQ for robust speech recognition (5)
28 Segmental HEQ (1)
- J.C. Segura, C. Benítez, A. de la Torre, A.J. Rubio and J. Ramírez, "Cepstral Domain Segmental Nonlinear Feature Transformations for Robust Speech Recognition", IEEE Signal Processing Letters, 11(5), May 2004
- A segmental implementation of HEQ for non-stationary noise
- A temporal buffer is used for the histogram estimation instead of the full sentence
- The algorithmic delay is T frames
29 Segmental HEQ (2)
30 OSEQ: An efficient implementation (1)
- A very computationally efficient algorithm based
on Order Statistics
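One way such an order-statistics equalization could look is sketched below: the CDF value of the current frame is estimated from its rank inside a sliding buffer, and the Gaussian reference quantile of that value is returned. The centred window of 2T+1 frames (which gives a look-ahead of T frames), the (rank - 0.5)/window-length estimate and the name `oseq` are assumptions of this sketch.

```python
from statistics import NormalDist

def oseq(values, half_window):
    """Equalize each sample from its rank inside a window of 2*half_window + 1 frames."""
    ref = NormalDist()
    out = []
    for t, v in enumerate(values):
        lo, hi = max(0, t - half_window), min(len(values), t + half_window + 1)
        window = values[lo:hi]
        # Rank of the current sample = number of window samples not exceeding it.
        rank = sum(1 for w in window if w <= v)
        out.append(ref.inv_cdf((rank - 0.5) / len(window)))
    return out
```

Only one pass of comparisons over the buffer is needed per frame and no histogram is built. Since the output depends on ranks alone, any strictly increasing distortion of the input leaves it unchanged.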
31 OSEQ: An efficient implementation (2)
32 Feature normalization
- Open topics
- Reference distribution
- Clean speech / Gaussian / Others?
- Dynamic features normalization (Δ and ΔΔ)
- After, before or simultaneously [Obuchi, Stern, EUSP03]
- Progressive normalization
- Not all MFCCs are equally affected, nor do they have equal discriminative power [de Wet, , ICASSP03]
- Lower order moments normalization [Hsu, Lee, ICASSP04]
- Parametric techniques
- Current approaches are non-parametric [Haverinen, Kiss, EUSP03]
- New applications
- Speaker independence and adaptation
- Multi-stream normalization
33 Combination of techniques
- Development of a combined robust front-end
- An accurate VAD
- For noise parameter estimation
- A noise reduction technique
- Spectral subtraction or Wiener filter
- Statistical feature compensation
- A Frame-Dropping algorithm
- To discard non-speech frames
- And a Feature normalization block
- For residual non-linear distortion compensation
34 VAD (1)
- Development of a combined robust front-end