Title: Advances in WP1
1. Advances in WP1
- Nancy Meeting 6-7 July 2006
www.loquendo.com
2. WP1 Environment & Sensor Robustness: T1.2 Noise Independence
- Noise Reduction
- Spectral Subtraction (YEAR 1) and Spectral Attenuation (YEAR 2)
- "Automatic Speech Recognition With a Modified Ephraim-Malah Rule", Roberto Gemello, Franco Mana and Renato De Mori, IEEE Signal Processing Letters, Vol. 13, No. 1, January 2006
- Evaluation of HEQ for feature normalization (HEQ study Revision 2)
3. Denoising Techniques for Y2 evaluations (1)
Spectral Attenuation (or spectral weighting) is a
form of audio signal enhancement in which noise
suppression can be viewed as the application of a
suppression rule, or non-negative real-valued
gain Gk, to each bin k of the observed signal
magnitude spectrum, in order to form an estimate
of the original signal magnitude spectrum.
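As a minimal sketch of this idea, the snippet below applies a non-negative per-bin gain to the magnitude spectrum of one frame. The Wiener-style gain, the function names `wiener_like_gain` and `apply_gain`, the floor value and the sample numbers are illustrative assumptions, not the rule used in the evaluations.

```python
import numpy as np

def wiener_like_gain(noisy_mag, noise_mag, floor=0.01):
    """Illustrative suppression rule (assumption):
    G_k = max(1 - |D_k|^2 / |Y_k|^2, floor)."""
    ratio = (noise_mag ** 2) / np.maximum(noisy_mag ** 2, 1e-12)
    return np.maximum(1.0 - ratio, floor)

def apply_gain(noisy_mag, gain):
    """Estimate the clean magnitude spectrum as G_k * |Y_k|, with G_k >= 0."""
    gain = np.asarray(gain, dtype=float)
    if np.any(gain < 0):
        raise ValueError("suppression gains must be non-negative")
    return gain * noisy_mag

noisy = np.array([4.0, 2.0, 1.0, 0.5])  # |Y_k| for one frame (made-up values)
noise = np.array([1.0, 1.0, 1.0, 1.0])  # hypothetical noise magnitude estimate
gain = wiener_like_gain(noisy, noise)
enhanced = apply_gain(noisy, gain)
```

Bins dominated by noise are pushed down to the floor rather than to zero, which is the usual way of limiting artifacts in this family of methods.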
4. Denoising Techniques for Y2 evaluations (2)
We propose to make the estimation of the a priori
and the a posteriori SNR dependent on the noise
overestimation factor α(m) and the spectral floor
β(m), as follows
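The equations from the slide are not available in this text, so the following is only a sketch of one plausible way to make both SNR estimates depend on an overestimation factor and a spectral floor. It uses a generic decision-directed estimate; the roles assigned to `alpha` and `beta`, the smoothing weight `eta` and the function name `snr_estimates` are all assumptions, not the authors' formulas.

```python
import numpy as np

def snr_estimates(noisy_mag, noise_mag, prev_clean_mag, alpha, beta, eta=0.98):
    """Hypothetical decision-directed SNR estimates.

    alpha: noise overestimation factor, scales the noise power (assumed role)
    beta:  spectral floor, lower bound on the a priori SNR (assumed role)
    eta:   decision-directed smoothing weight (assumed value)
    """
    noise_pow = alpha * np.maximum(noise_mag ** 2, 1e-12)
    post_snr = noisy_mag ** 2 / noise_pow              # a posteriori SNR
    inst = np.maximum(post_snr - 1.0, 0.0)             # instantaneous estimate
    prio_snr = eta * prev_clean_mag ** 2 / noise_pow + (1.0 - eta) * inst
    return np.maximum(prio_snr, beta), post_snr        # floored a priori SNR

prio, post = snr_estimates(
    noisy_mag=np.array([2.0, 1.0]),
    noise_mag=np.array([1.0, 1.0]),
    prev_clean_mag=np.array([1.0, 0.0]),
    alpha=2.0, beta=0.1, eta=0.5,
)
```

In this sketch a larger `alpha` lowers both SNR estimates (more aggressive suppression), while `beta` prevents the a priori SNR from collapsing to zero in noise-only bins.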
5. Denoising Techniques for Y2 evaluations (3)
The noise spectrum amplitude is obtained by a
first-order recursion in conjunction with an
energy-based Voice Activity Detector (VAD) as
follows
where one constant controls the update speed of the
recursion (0.9), a second constant controls the allowed
dynamics of the noise (4.0), and the noise standard
deviation σ(m) is estimated as
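Since the recursion itself is not reproduced in this text, the sketch below shows only a generic VAD-gated first-order noise update of the kind described. The constants 0.9 and 4.0 come from the slide. The gating condition, the toy energy VAD, its 6 dB threshold and the function names are assumptions.

```python
import numpy as np

def energy_vad(frame_mag, noise_mag, threshold_db=6.0):
    """Toy energy-based VAD (assumption): speech if the frame energy
    exceeds the noise energy by more than threshold_db."""
    frame_e = np.sum(frame_mag ** 2)
    noise_e = np.sum(noise_mag ** 2) + 1e-12
    return 10.0 * np.log10(frame_e / noise_e + 1e-12) > threshold_db

def update_noise(noise_mag, noisy_mag, is_speech, noise_std,
                 rate=0.9, dynamics=4.0):
    """Hypothetical first-order noise update.

    rate:     update speed of the recursion (0.9 on the slide)
    dynamics: allowed noise dynamics (4.0 on the slide); bins that deviate
              from the current estimate by more than dynamics * noise_std
              are left unchanged (assumed interpretation)
    """
    if is_speech:
        return noise_mag.copy()                    # freeze during speech
    smoothed = rate * noise_mag + (1.0 - rate) * noisy_mag
    within = np.abs(noisy_mag - noise_mag) <= dynamics * noise_std
    return np.where(within, smoothed, noise_mag)

noise = np.array([1.0, 1.0])   # current noise magnitude estimate
frame = np.array([1.2, 1.5])   # observed magnitudes (made-up values)
speech = energy_vad(frame, noise)
updated = update_noise(noise, frame, speech, noise_std=0.1)
```

With these numbers the frame is classified as non-speech, the first bin is smoothed towards the observation, and the second bin is rejected as an outlier.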
6. Baseline evaluations of Loquendo ASR on the Aurora2 speech databases
7. Year 1 & 2 Performance evaluations
The testing conditions used in the experiments are the following:
1) No Denoising (ND): Rasta PLP features (RPLP) are used without any preliminary noise reduction.
2) Wiener modified (WM): RPLP with Wiener filtering dependent on global SNR.
3) Ephraim-Malah modified (EMM): RPLP with noise reduction based on the modified Ephraim-Malah spectral attenuation rule.
        Test A                    Test B                    Test C                    A-B-C Avg
Models  Clean        Multi        Clean        Multi        Clean        Multi        Clean        Multi
ND      24.4         6.5          22.5         8.9          24.7         9.8          23.7         8.1
WM      16.0 (34.4)  6.1 (6.1)    15.6 (30.7)  7.9 (11.2)   16.7 (32.4)  9.5 (3.0)    16.0 (32.5)  7.5 (7.4)
EMM     14.7 (39.7)  6.0 (7.7)    15.8 (29.8)  8.0 (10.1)   15.2 (38.5)  8.9 (9.2)    15.2 (35.9)  7.4 (8.6)
Figures in parentheses: relative reduction with respect to ND (%).
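The figures in parentheses are consistent with a relative reduction computed against the ND row. A short check on the Test A, clean-model column (the function name `relative_reduction` is illustrative):

```python
def relative_reduction(baseline, value):
    """Relative reduction (%) of `value` with respect to `baseline`."""
    return 100.0 * (baseline - value) / baseline

# Test A, clean-trained models: ND = 24.4, WM = 16.0, EMM = 14.7
wm = relative_reduction(24.4, 16.0)
emm = relative_reduction(24.4, 14.7)
print(f"WM: {wm:.2f}%  EMM: {emm:.2f}%")
```

The results agree with the tabulated 34.4 and 39.7 to within about 0.1, the small differences being attributable to rounding of the published values.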
8. Baseline evaluations of Loquendo ASR on the Aurora3 speech databases
9. Year 1 & 2 Performance evaluations
      Ita WM       Ita HM       Spa WM       Spa HM
ND    1.8          53.4         2.7          25.4
WM    1.7 (5.5)    22.5 (57.9)  2.4 (11.1)   10.1 (60.2)
EMM   1.6 (11.1)   17.8 (66.7)  2.3 (14.8)   11.5 (54.7)
10. Baseline evaluations of Loquendo ASR on the Aurora4 speech databases
11. Year 1 & 2 Performance evaluations
CLEAN Models
      Clean        Car          Babble       Restaurant   Street       Airport      Train Station  Noise avg.
ND    14.8         45.7         76.9         70.6         66.0         70.7         67.7           66.3
WM    14.8 (0.0)   33.0 (27.8)  63.4 (17.5)  69.3 (1.8)   56.9 (13.8)  68.1 (3.7)   51.2 (24.4)    57.0 (14.0)
EMM   14.5 (2.02)  29.6 (35.2)  62.9 (18.2)  68.4 (3.1)   54.2 (17.8)  68.4 (3.2)   46.3 (31.6)    55.0 (17.0)
12. Year 1 & 2 Performance evaluations
MULTI Models
      Clean        Car          Babble       Restaurant   Street       Airport      Train Station  Noise avg.
ND    15.7         24.8         40.1         41.8         41.9         39.1         42.3           38.3
WM    16.6 (-5.7)  24.1 (2.8)   39.7 (1.0)   43.2 (-3.3)  39.6 (5.5)   39.5 (-1.0)  37.1 (12.3)    37.2 (2.9)
EMM   15.5 (1.3)   24.7 (0.4)   40.4 (-0.7)  44.2 (-5.7)  39.5 (5.7)   40.4 (-3.3)  38.2 (9.7)     37.9 (1.0)
13. HEQ Denoising techniques
14. HEQ Evaluation Revision 1 (1) (Loquendo & UGR)
Problems:
(1) Context dependency (estimating the CDF over the whole utterance works best)
(2) High variability in background noise segments
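For readers unfamiliar with HEQ, the sketch below shows a rank-based histogram equalization in which the CDF of each coefficient is estimated over the whole utterance, as in point (1). The Gaussian reference distribution, the rank-based CDF estimate and the function name `heq_utterance` are assumptions, not necessarily the configuration evaluated here.

```python
import numpy as np
from statistics import NormalDist

def heq_utterance(features):
    """Equalize each coefficient of a (frames x coeffs) array to a
    standard Gaussian, using a CDF estimated over the whole utterance."""
    features = np.asarray(features, dtype=float)
    n_frames = features.shape[0]
    # Rank of every frame within its coefficient track: 0 .. n_frames - 1
    ranks = features.argsort(axis=0).argsort(axis=0)
    cdf = (ranks + 0.5) / n_frames            # empirical CDF, strictly in (0, 1)
    inv = np.vectorize(NormalDist().inv_cdf)  # inverse Gaussian CDF
    return inv(cdf)

feats = np.array([[1.0, 10.0],
                  [3.0, 30.0],
                  [2.0, 20.0]])               # 3 frames, 2 coefficients
equalized = heq_utterance(feats)
```

Because the mapping depends only on ranks, scaling the input by a positive gain leaves the output unchanged. The estimate becomes unreliable when only a few frames are available, which is one reason the estimation window matters.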
15. HEQ Integration Revision 1 (2) (Loquendo & UGR)
Phoneme-based Models
Feature Normalization (frame level, 39 coefficients)
Denoise (power spectrum level)
AURORA3 ITA - HM
          SA      WA      WI      WD      WS
Loquendo  46.6    77.5    4.8     7.2     10.4
HEQ121    38.2    69.6    4.3     12.6    13.5
HEQ121    37.9    69.1    3.5     13.8    13.5
HEQ1001   46.5    77.7    4.0     7.3     11.0
16. HEQ Evaluation Revision 2 (3) (Loquendo & UGR)
Three feature groups (39 coefficients in total), each with its own HEQ (1573) block:
E + 12 CEP, ΔE + 12 ΔCEP, ΔΔE + 12 ΔΔCEP
Benefits:
(1) The relations in magnitude and dynamics among coefficients are preserved
(2) CDF estimation is more stable, with an effect similar to extending the HEQ temporal window
17. HEQ Evaluation Revision 2 (4) (Loquendo & UGR)
AURORA3 ITA - HM
          SA      WA      WI      WD      WS
WM        46.6    77.5    4.8     7.2     10.4
HEQ121    47.9    77.7    5.1     6.7     10.5
HEQ241    49.7    79.7    4.3     6.6     9.3
WMHEQ121  49.0    79.2    5.1     5.7     10.0
WMHEQ241  50.8    79.8    4.6     6.1     9.4
18. HEQ for denoising (5) (Loquendo & UGR)
Comparing RPLP / HEQrev1 / HEQrev2 using the same clean and noisy signal
19. HEQ for signal level equalization (6) (Loquendo & UGR)
Comparing RPLP / HEQrev1 / HEQrev2 using the same clean signal at normal gain level and at low gain level
20. WP1 Workplan
- Selection of suitable benchmark databases (m6)
- Completion of LASR baseline experimentation of Spectral Subtraction (Wiener SNR dependent) (m12)
- Discriminative VAD (training + AURORA3 testing) (m16)
- Experimentation of Spectral Attenuation rule (Ephraim-Malah SNR dependent) (m21)
- Preliminary results on spectral subtraction and HEQ techniques (m24)
- Integration of denoising and normalization techniques (m33)
- Noise estimation and reduction for non-stationary noises (m33)