HIWIRE MEETING, Athens, November 3-4, 2005
- José C. Segura, Ángel de la Torre
Schedule
- HIWIRE database evaluations
- Non-linear feature normalization
  - ECDF segmental implementation
  - Parametric equalization
- Robust VAD
  - Bispectrum-based VAD
- Model-based feature compensation
  - VTS results on AURORA4
  - Including uncertainty caused by noise
HIWIRE database evaluations
- PARAMETERS: MFCC_0_D_A_Z (39 components)
- MODELS
  - TIMIT: 46 phone models / 3 states / 128 Gaussians (17,664 Gaussians)
  - WSJ16k: 16,825 triphones / 3,608 tied states / 6 Gaussians (21,648 Gaussians)
  - WSJ16kFon: 40 phone models / 3 states / 128 Gaussians (15,360 Gaussians)
- ADAPTATION
  - MLLR: 32 regression classes / 50 adaptation utterances
- GRAMMAR
  - LORIA word loop
- MODIFICATIONS
  - Some transcriptions have been modified to match the grammar definition
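The totals in parentheses are the number of emitting states times the Gaussians per state (for WSJ16k, the tied states, which the 16,825 triphones share). A quick arithmetic check, using only the figures listed above:

```python
# Gaussian totals = emitting states x Gaussians per state,
# using the figures listed for each model set above.
model_sets = {
    "TIMIT": (46 * 3, 128),      # 46 phone models x 3 states
    "WSJ16k": (3608, 6),         # 3,608 tied states shared by the triphones
    "WSJ16kFon": (40 * 3, 128),  # 40 phone models x 3 states
}

totals = {name: states * per_state for name, (states, per_state) in model_sets.items()}

for name, total in totals.items():
    print(f"{name}: {total:,} Gaussians")
```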
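Transcription modifications
    # AWK script: normalize transcriptions so they match the word-loop grammar
    # (LISTA is presumably set on the command line, e.g. -v LISTA=...)
    BEGIN { lista = LISTA; nfrase = 0 }
    {
        linea = $0
        gsub("-", "_", linea)                          # hyphens become underscores
        gsub("Due_to_", "Due_to ", linea)              # restore the space after "Due_to"
        gsub("Mayday_Mayday", "Mayday Mayday", linea)  # split fused call words
        gsub("Pan_Pan", "Pan Pan", linea)
        gsub("three hundred twenty", "three_hundred_twenty", linea)
        gsub("one hundred sixty", "one_hundred_sixty", linea)
        printf("%s\n", tolower(linea))                 # print the sentence in lower case
        nfrase++                                       # count processed sentences
    }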
HIWIRE database results
ECDF segmental implementation
- ECDF segmental implementation
  - Provided LOQUENDO with a reference C implementation of the segmental Gaussian transformation, to be tested within the LOQUENDO recognizer
- Current work
  - Nonlinear feature transformation with a clean reference, to avoid the problem of system retraining
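The segmental Gaussian transformation can be sketched as follows: each frame's value is ranked within a window of neighbouring frames, the rank gives an ECDF value, and the inverse standard-normal CDF maps it onto the target Gaussian. This is a minimal sketch, not the LOQUENDO reference code; the window definition (`half_window`), the (rank - 0.5)/N ECDF convention and the name `segmental_gaussianize` are assumptions.

```python
from statistics import NormalDist

def segmental_gaussianize(values, half_window):
    """Map each value to N(0, 1) through the ECDF of its local segment.

    Hypothetical sketch: the segment is frames t - half_window .. t + half_window
    (truncated at the utterance edges); ties count as half below, half above.
    """
    std_normal = NormalDist()
    n_frames = len(values)
    out = []
    for t, x in enumerate(values):
        segment = values[max(0, t - half_window):min(n_frames, t + half_window + 1)]
        below = sum(1 for v in segment if v < x)
        equal = sum(1 for v in segment if v == x)   # includes x itself
        ecdf = (below + 0.5 * equal) / len(segment)  # strictly inside (0, 1)
        out.append(std_normal.inv_cdf(ecdf))
    return out

c0 = [1.0, 3.0, 2.0, 5.0, 4.0]
print(segmental_gaussianize(c0, half_window=10))
```

Because only ranks inside the segment are used, any monotonic distortion of the input (for example `2 * v + 7`) produces exactly the same output.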
Parametric Equalization (1)
PARAMETRIC NONLINEAR FEATURE EQUALIZATION FOR ROBUST SPEECH RECOGNITION (submitted to ICASSP 2006)
- HEQ limitations
  - Influence of the relative amount of silence in the utterances
- With a parametric model, a more robust equalization can be obtained
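The silence-proportion problem of HEQ can be reproduced in a few lines. The sketch is a rank-based HEQ of one coefficient toward a clean reference, y = F_ref^-1(F_x(x)); the standard-normal reference, the synthetic C0-like values and the name `heq` are assumptions for illustration. The same speech value is mapped to different outputs when only the amount of silence in the utterance changes.

```python
from statistics import NormalDist

def heq(values, ref_quantile):
    """Rank-based HEQ of one feature track: y = F_ref^-1(ECDF(x))."""
    n = len(values)
    ordered = sorted(values)
    # (rank - 0.5) / n keeps the ECDF strictly inside (0, 1)
    return [ref_quantile((ordered.index(x) + 0.5) / n) for x in values]

reference = NormalDist(0.0, 1.0).inv_cdf      # hypothetical clean reference
speech = [5.0 + 0.1 * i for i in range(20)]   # high-energy "speech" frames
short_sil = [0.01 * i for i in range(10)]     # 10 silence frames
long_sil = [0.01 * i for i in range(60)]      # 60 silence frames

y_short = heq(short_sil + speech, reference)
y_long = heq(long_sil + speech, reference)

# First speech frame (value 5.0) in each utterance
print(y_short[10], y_long[60])
```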
Parametric Equalization (2)
Parametric Equalization (3)
Parametric Equalization (4)
- In comparison with HEQ, PEQ transformations are smoother
- For C0, a monotonic transformation is obtained
- For the other coefficients, the interpolated transformation is not monotonic
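The PEQ equations are not included in this text version, so the following is only a hedged sketch of one construction that shows the behaviour described. It assumes each coefficient is modelled with one silence Gaussian and one speech Gaussian (for the noisy utterance and for the clean reference), maps each class linearly onto its clean counterpart, and interpolates the two maps with the speech/silence posteriors. The result is monotonic when both classes keep their order (as in the C0-like case) but bends backwards when they do not. `peq_map`, `noisy` and `clean` are hypothetical names.

```python
import math

def gauss_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def peq_map(x, noisy, clean, p_speech=0.5):
    """Posterior-weighted interpolation of per-class linear maps (hypothetical PEQ sketch).

    noisy, clean: {"sil": (mean, var), "sp": (mean, var)} for one coefficient.
    """
    prior = {"sil": 1.0 - p_speech, "sp": p_speech}
    like = {k: prior[k] * gauss_pdf(x, *noisy[k]) for k in ("sil", "sp")}
    total = like["sil"] + like["sp"]
    y = 0.0
    for k in ("sil", "sp"):
        mu_n, var_n = noisy[k]
        mu_c, var_c = clean[k]
        linear = mu_c + math.sqrt(var_c / var_n) * (x - mu_n)  # noisy class k -> clean class k
        y += (like[k] / total) * linear
    return y

# C0-like case: speech above silence in both noisy and clean models
c0_noisy = {"sil": (0.0, 1.0), "sp": (4.0, 1.0)}
c0_clean = {"sil": (-2.0, 1.0), "sp": (6.0, 1.0)}

# Another coefficient: class means swap order between noisy and clean models
ck_noisy = {"sil": (0.0, 1.0), "sp": (2.0, 1.0)}
ck_clean = {"sil": (2.0, 1.0), "sp": (-2.0, 1.0)}

grid = [i / 10 for i in range(-30, 71)]
c0_curve = [peq_map(x, c0_noisy, c0_clean) for x in grid]
ck_curve = [peq_map(x, ck_noisy, ck_clean) for x in grid]
```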
Parametric Equalization (5)
- BASE
  - MFCC_0_D_A_Z (39 components)
- HEQ
  - Quantile-based CDF transformation
  - Clean reference
  - Implemented over MFCC_0; CMS and regressions computed after HEQ
- AFE
  - Standard implementation
- PEQ
  - Clean reference
  - Implemented over MFCC_0; CMS and regressions computed after PEQ
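In the HEQ and PEQ systems only the static MFCC_0 coefficients are equalized; mean subtraction and the delta/acceleration regressions come afterwards. A minimal sketch of that post-processing for one coefficient track, assuming HTK-style regression over a window of 2 frames with the first and last frames replicated at the edges (the function names are hypothetical):

```python
def cms(track):
    """Cepstral mean subtraction over the whole utterance."""
    mean = sum(track) / len(track)
    return [c - mean for c in track]

def regression(track, window=2):
    """d_t = sum_k k (c[t+k] - c[t-k]) / (2 sum_k k^2), edges replicated."""
    n = len(track)
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n):
        num = 0.0
        for k in range(1, window + 1):
            ahead = track[min(t + k, n - 1)]
            behind = track[max(t - k, 0)]
            num += k * (ahead - behind)
        out.append(num / denom)
    return out

def post_process(equalized):
    """CMS on the equalized statics, then deltas and accelerations from them."""
    static = cms(equalized)
    delta = regression(static)
    accel = regression(delta)
    return static, delta, accel

static, delta, accel = post_process([float(i) for i in range(10)])
print(static[:3], delta[:3], accel[:3])
```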
Parametric Equalization (6)
- Current work
  - Development of an on-line version
  - Relax the diagonal covariance assumption
  - Investigate the normalization of dynamic features
  - Use a more detailed model of speech frames (i.e., more than one Gaussian)
Schedule
- HIWIRE database evaluations
- Non-linear feature normalization
  - ECDF segmental implementation (LOQ)
  - Parametric equalization
- Robust VAD
  - Bispectrum-based VAD
- Model-based feature compensation
  - VTS results on AURORA4
  - Including uncertainty caused by noise
Bispectrum-based VAD (1)
- Motivations
  - Ability of higher-order statistics to detect signals in noise
  - Polyspectra methods rely on a priori knowledge of the input processes
- Issues to be addressed
  - Computationally expensive
  - The variance of bispectrum estimators is much higher than that of power spectral estimators for an identical data record size
- Solution: integrated bispectrum
  - J. K. Tugnait, "Detection of non-Gaussian signals using integrated polyspectrum," IEEE Trans. on Signal Processing, vol. 42, no. 11, pp. 3137-3149, 1994.
  - Computationally efficient, reduced-variance statistical test based on the integrated polyspectra
  - Detection of an unknown random, stationary, non-Gaussian signal in Gaussian noise
Bispectrum-based VAD (2)
- Integrated bispectrum
  - Defined as the cross spectrum between the signal and its square; it is therefore a function of a single frequency variable
- Benefits
  - Its computation as a cross spectrum leads to significant computational savings
  - The variance of the estimator is of the same order as that of the power spectrum estimator
- Properties
  - The bispectrum of a Gaussian process is identically zero, so its integrated bispectrum is zero as well
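Because it is a cross spectrum, the integrated bispectrum can be estimated by block averaging: per block, take the DFT of the signal and of its mean-removed square and average Y(k)X*(k). The sketch uses a plain O(L^2) DFT to stay dependency-free (an FFT would be used in practice); the block length, the normalisation by the block length and the function names are assumptions. For white Gaussian noise the estimate stays near zero, while a skewed non-Gaussian signal gives a clearly non-zero value.

```python
import cmath
import math
import random

def dft(seq):
    n = len(seq)
    return [sum(seq[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def integrated_bispectrum(signal, block_len):
    """Block-averaged cross spectrum between x^2 and x, one value per DFT bin."""
    n_blocks = len(signal) // block_len
    acc = [0j] * block_len
    for b in range(n_blocks):
        x = signal[b * block_len:(b + 1) * block_len]
        mean_sq = sum(v * v for v in x) / block_len
        sq = [v * v - mean_sq for v in x]       # mean-removed square of the block
        X, Y = dft(x), dft(sq)
        for k in range(block_len):
            acc[k] += Y[k] * X[k].conjugate() / block_len
    return [a / n_blocks for a in acc]

def mean_magnitude(ib):
    """Average |IB(k)| over positive-frequency bins, DC excluded."""
    bins = ib[1:len(ib) // 2]
    return sum(abs(v) for v in bins) / len(bins)

rng = random.Random(0)
L, B = 32, 200
gaussian = [rng.gauss(0.0, 1.0) for _ in range(L * B)]
skewed = [rng.expovariate(1.0) - 1.0 for _ in range(L * B)]  # zero mean, third cumulant 2

ib_gauss = mean_magnitude(integrated_bispectrum(gaussian, L))
ib_skew = mean_magnitude(integrated_bispectrum(skewed, L))
print(ib_gauss, ib_skew)
```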
Bispectrum-based VAD (3)
- Two alternatives explored for formulating the decision rule
  - Estimation by block averaging
  - MO-LRT (multiple observation likelihood ratio test)
- Given a set of N = 2m + 1 consecutive observations
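The MO-LRT idea can be sketched as summing per-frame log-likelihood ratios over the N = 2m + 1 frames centred on the frame being classified and comparing the sum with a threshold. The per-frame model used here (a zero-mean Gaussian feature with a larger variance under speech) is a stand-in assumption, not the integrated-bispectrum statistics of the actual VAD; `frame_llr`, `mo_lrt` and the truncation at the utterance edges are hypothetical.

```python
import math

def frame_llr(feature, var_noise, var_speech):
    """log p(x | speech) - log p(x | noise) for a zero-mean Gaussian feature (hypothetical model)."""
    return (0.5 * math.log(var_noise / var_speech)
            + 0.5 * feature ** 2 * (1.0 / var_noise - 1.0 / var_speech))

def mo_lrt(llrs, m, threshold):
    """Decide frame t from the sum of LLRs over frames t - m .. t + m."""
    n = len(llrs)
    decisions = []
    for t in range(n):
        window = llrs[max(0, t - m):min(n, t + m + 1)]  # truncated at the edges
        decisions.append(sum(window) > threshold)
    return decisions

features = [0.1] * 10 + [3.0] * 10 + [0.1] * 10    # silence / speech / silence
llrs = [frame_llr(v, var_noise=1.0, var_speech=9.0) for v in features]
decisions = mo_lrt(llrs, m=2, threshold=0.0)
print("".join("S" if d else "." for d in decisions))
```

Pooling the evidence of 2m + 1 frames smooths isolated errors at the cost of slightly extending speech segments at their boundaries.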
Bispectrum-based VAD (4)
Bispectrum-based VAD results (1)
Bispectrum-based VAD results (2)
Bispectrum-based VAD results (3)
Schedule
- Model-based feature compensation
  - VTS results on AURORA4
    - VTS formulation
    - VTS vs. non-linear feature normalization procedures
    - VTS results on AURORA4
  - Including uncertainty caused by noise
    - Including uncertainty in noise compensation
    - Wiener filtering uncertainty results on Aurora 2
    - Wiener filtering uncertainty results on Aurora 4
    - VTS uncertainty formulation
    - Numerical integration of probabilities formulation
VTS formulation
- VTS: Vector Taylor Series approach to remove additive (and channel) noise
- References
  - P. J. Moreno, "Speech recognition in noisy environments," Ph.D. thesis, Carnegie Mellon University, Pittsburgh, Pennsylvania, Apr. 1996.
  - A. de la Torre, "Técnicas de mejora de la representación en los sistemas de reconocimiento automático del habla" (Techniques for improving the representation in automatic speech recognition systems), Ph.D. thesis, University of Granada, Spain, Apr. 1999.
VTS formulation
- VTS provides an estimate of the clean speech in a statistical framework
- Log-FBO domain, with additive noise assumed
- The effect of noise is described using the correction function g()
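Taking log-FBO to mean log filter-bank output, and assuming natural logarithms and noise that is additive in the filter-bank energy domain (no channel term), the noisy value is y = log(e^x + e^n) = x + g(n - x) with g(z) = log(1 + e^z). A numerically stable sketch of g and of this relation (function names are hypothetical):

```python
import math

def g(z):
    """Correction function g(z) = log(1 + e^z), evaluated without overflow."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def noisy_log_fbo(x, n):
    """Noisy log filter-bank output from clean (x) and noise (n) log energies."""
    return x + g(n - x)

print(noisy_log_fbo(2.0, 1.0), math.log(math.exp(2.0) + math.exp(1.0)))
```

When n is far below x, g(n - x) is close to 0 and y is close to x; when n is far above x, y is close to n.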
VTS formulation
- Auxiliary functions f() and h(): 1st and 2nd derivatives
- VTS provides an estimate of the noisy-speech Gaussian, given the clean-speech and noise Gaussians
- The noisy-speech Gaussian is obtained from the expected values
VTS formulation
- Noisy-speech Gaussian formulas
- Models for noise and clean speech
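The noisy-speech Gaussian formulas are not included in this text version. The sketch below gives the usual per-dimension VTS result under stated assumptions: f and h are taken to be the first and second derivatives of g (f is the logistic function, h = f(1 - f)), x and n are independent Gaussians, and the expansion point is (mu_x, mu_n). The mean optionally includes the second-order term; the variance is first order. `vts_noisy_gaussian` is a hypothetical name.

```python
import math

def g(z):
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def f(z):
    """First derivative of g (logistic function), overflow-safe."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def h(z):
    """Second derivative of g."""
    s = f(z)
    return s * (1.0 - s)

def vts_noisy_gaussian(mu_x, var_x, mu_n, var_n, second_order=False):
    """Per-dimension noisy-speech Gaussian from clean-speech and noise Gaussians."""
    z0 = mu_n - mu_x
    mu_y = mu_x + g(z0)
    if second_order:
        # E[((n - mu_n) - (x - mu_x))^2] = var_x + var_n for independent x and n
        mu_y += 0.5 * h(z0) * (var_x + var_n)
    a = f(z0)
    var_y = (1.0 - a) ** 2 * var_x + a ** 2 * var_n
    return mu_y, var_y

print(vts_noisy_gaussian(0.0, 1.0, 0.0, 1.0))
```

At high SNR the noisy Gaussian collapses to the clean one, and at very low SNR to the noise Gaussian.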
VTS formulation
- The clean-speech model provides the model for noisy speech, and also P(k|y) (the posterior probability of each Gaussian)
- Estimation of clean speech
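Under the same assumptions, the clean-speech estimate for one log-FBO channel can be sketched in the common first-order VTS minimum mean square error form, x_hat = y - sum_k P(k|y) g(mu_n - mu_x,k), with P(k|y) computed from the VTS-adapted noisy Gaussians. Whether the slides use exactly this form is an assumption; `vts_clean_estimate` is a hypothetical name.

```python
import math

def g(z):
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def f(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_gauss(v, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (v - mean) ** 2 / var)

def vts_clean_estimate(y, gmm, mu_n, var_n):
    """gmm: list of (weight, mu_x, var_x) clean-speech components for one channel."""
    log_post, corrections = [], []
    for w, mu_x, var_x in gmm:
        z0 = mu_n - mu_x
        a = f(z0)
        mu_y = mu_x + g(z0)                          # VTS-adapted noisy mean
        var_y = (1.0 - a) ** 2 * var_x + a ** 2 * var_n
        log_post.append(math.log(w) + log_gauss(y, mu_y, var_y))
        corrections.append(g(z0))
    top = max(log_post)                              # normalise in the log domain
    post = [math.exp(lp - top) for lp in log_post]
    total = sum(post)
    post = [p / total for p in post]                 # P(k | y)
    x_hat = y - sum(p * c for p, c in zip(post, corrections))
    return x_hat, post

print(vts_clean_estimate(5.0, [(1.0, 5.0, 1.0)], mu_n=5.0, var_n=1.0))
```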
VTS vs non-linear feature normalization
- VTS
  - Statistical framework
  - Model for noise in log-FBO domain: one Gaussian PDF
  - Model for clean speech in log-FBO domain: Gaussian mixture
  - Noise assumed to be additive in the FBO domain
  - Accurate description of the noise process
  - ACCURATE COMPENSATION
- Non-linear feature normalization
  - No a priori assumption
  - Component-by-component
  - MORE FLEXIBLE, LESS ACCURATE
VTS results on AURORA 4
Including uncertainty in noise compensation
- Noise is a random process: we do not know n, only p(n)
- Then, from an observation y we cannot find x, only p(x|y, Θx, Θn)
- Usually, compensation procedures provide E[x|y, Θx, Θn]
- What about the uncertainty of x?
  - Mean and variance of x
Including uncertainty in noise compensation
Including uncertainty in noise compensation
- An approach for the estimation of the variance
- Evaluation of HMM Gaussians
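One common way to evaluate HMM Gaussians with an uncertain estimate, assumed here rather than taken from the slides, is to integrate each Gaussian against p(x|y) = N(x; x_hat, var_hat). Per dimension this gives N(x_hat; mu, var + var_hat): the variance of the estimate is simply added to the model variance. A diagonal-covariance sketch (hypothetical function name):

```python
import math

def log_gauss_with_uncertainty(x_hat, var_hat, mean, var):
    """sum_d log N(x_hat[d]; mean[d], var[d] + var_hat[d]) for a diagonal Gaussian."""
    total = 0.0
    for xh, vu, m, v in zip(x_hat, var_hat, mean, var):
        s = v + vu                       # model variance inflated by estimate uncertainty
        total += -0.5 * (math.log(2.0 * math.pi * s) + (xh - m) ** 2 / s)
    return total

print(log_gauss_with_uncertainty([5.0], [0.0], [0.0], [1.0]),
      log_gauss_with_uncertainty([5.0], [4.0], [0.0], [1.0]))
```

With var_hat = 0 this reduces to the usual Gaussian evaluation; frames whose compensation is unreliable get flatter likelihoods and therefore discriminate less between models.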
Wiener filter uncertainty results on AURORA 2
- Preliminary results with Wiener filtering
- Results on Aurora 2 with Wiener filtering uncertainty
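For Wiener filtering, a variance for the estimate follows directly if speech and noise DFT coefficients are modelled as zero-mean complex Gaussians with known power spectra lambda_x and lambda_n: the posterior mean is the Wiener estimate G·Y with G = lambda_x / (lambda_x + lambda_n), and the posterior variance is G·lambda_n. The sketch assumes both spectra are given; how the variance was propagated to the recognizer features in these experiments is not described here, and the function name is hypothetical.

```python
def wiener_with_uncertainty(noisy_bins, speech_psd, noise_psd):
    """Per-bin Wiener estimate and its posterior variance (Gaussian speech and noise)."""
    estimates, variances = [], []
    for y, lam_x, lam_n in zip(noisy_bins, speech_psd, noise_psd):
        gain = lam_x / (lam_x + lam_n)
        estimates.append(gain * y)       # E[X | Y]
        variances.append(gain * lam_n)   # Var[X | Y] = lam_x * lam_n / (lam_x + lam_n)
    return estimates, variances

est, var = wiener_with_uncertainty([2 + 0j, 1 - 1j], [1.0, 9.0], [1.0, 1.0])
print(est, var)
```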
Wiener filter uncertainty results on AURORA 4
VTS uncertainty formulation
- VTS-based estimation of clean speech
- VTS-based estimation of variance
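A VTS-based variance can be sketched with the law of total variance over the mixture components. Each component contributes its clean estimate x_hat_k = y - g(mu_n - mu_x,k) and a within-component variance var_x,k f^2 var_n / var_y,k, obtained by linearising y around the expansion point. Combining these two ingredients is an assumption made for illustration, not the slide's formulation; `vts_clean_mean_and_variance` is a hypothetical name.

```python
import math

def g(z):
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def f(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_gauss(v, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (v - mean) ** 2 / var)

def vts_clean_mean_and_variance(y, gmm, mu_n, var_n):
    """E[x|y] and Var[x|y] for one channel; gmm is a list of (weight, mu_x, var_x)."""
    log_post, means, variances = [], [], []
    for w, mu_x, var_x in gmm:
        z0 = mu_n - mu_x
        a = f(z0)
        mu_y = mu_x + g(z0)
        var_y = (1.0 - a) ** 2 * var_x + a ** 2 * var_n
        log_post.append(math.log(w) + log_gauss(y, mu_y, var_y))
        means.append(y - g(z0))                           # per-component clean estimate
        variances.append(var_x * a ** 2 * var_n / var_y)  # linearised Var[x | y, k]
    top = max(log_post)
    post = [math.exp(lp - top) for lp in log_post]
    total = sum(post)
    post = [p / total for p in post]
    mean = sum(p * m for p, m in zip(post, means))
    # law of total variance: E_k[Var[x|y,k]] + Var_k[E[x|y,k]]
    var = sum(p * (v + (m - mean) ** 2) for p, v, m in zip(post, variances, means))
    return mean, var

print(vts_clean_mean_and_variance(3.0, [(1.0, 3.0, 2.0)], mu_n=-20.0, var_n=1.0))
```

At high SNR the variance vanishes; when noise dominates, it approaches the clean-speech prior variance.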
Numerical integration of probabilities formulation
- Computation of expected values
- Numerical integration of expected values
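The expected values can also be obtained without a Taylor expansion by integrating numerically over the noise. Assuming y = log(e^x + e^n) with Gaussian x and n for one channel, each noise value n < y determines x = log(e^y - e^n), and p(y, n) = p_x(x) p_n(n) |dx/dy| with dx/dy = e^y / (e^y - e^n). The sketch uses a midpoint grid over n; the grid limits, the number of points and the function name are assumptions.

```python
import math

def log_gauss(v, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (v - mean) ** 2 / var)

def clean_posterior_moments(y, mu_x, var_x, mu_n, var_n, n_points=4000):
    """E[x|y] and Var[x|y] by integrating over the noise n on a midpoint grid."""
    lower = min(mu_n - 10.0 * math.sqrt(var_n), y - 30.0)
    step = (y - lower) / n_points
    xs, log_w = [], []
    for i in range(n_points):
        n = lower + (i + 0.5) * step              # midpoint rule, never reaches n = y
        one_minus = -math.expm1(n - y)            # 1 - e^(n - y), computed accurately
        x = y + math.log(one_minus)               # clean value implied by (y, n)
        xs.append(x)
        log_w.append(log_gauss(x, mu_x, var_x) + log_gauss(n, mu_n, var_n)
                     - math.log(one_minus))       # + log |dx/dy|
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]      # the grid step cancels on normalisation
    total = sum(w)
    mean = sum(wi * xi for wi, xi in zip(w, xs)) / total
    var = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, xs)) / total
    return mean, var

print(clean_posterior_moments(0.5, 0.0, 1.0, -10.0, 0.25))
```

For a clean-speech mixture, the same integral is evaluated per component and weighted. When noise dominates, the posterior over n concentrates in a very narrow range just below y, so a fixed grid must be much finer there (an adaptive quadrature would be more robust).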