HIWIRE MEETING, CRETE, SEPTEMBER 23-24, 2004

Transcript and Presenter's Notes

1
HIWIRE MEETING, CRETE, SEPTEMBER 23-24, 2004
  • JOSÉ C. SEGURA LUNA

2
Schedule
  • VAD for noise suppression & frame-dropping
  • Long-Term Spectral Divergence
  • Subband OS-based detector
  • Non-linear feature normalization
  • Histogram equalization
  • OS-based equalization
  • Segmental implementation

3
VAD (1)
  • VAD motivation
  • To obtain an estimate of the background noise for
  • Wiener filter design
  • Spectral subtraction
  • To discard non-speech frames
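
A minimal sketch of how the VAD decisions could feed both uses named above: noise estimation for spectral subtraction, and frame dropping. It assumes magnitude spectra stored as plain lists and VAD flags that are already available; the function names and the spectral floor of 0.01 are illustrative choices, not taken from the slides.

```python
def estimate_noise(frames, is_speech):
    """Average the magnitude spectra of the frames the VAD labelled non-speech.

    Assumes at least one non-speech frame is present.
    """
    noise_frames = [f for f, s in zip(frames, is_speech) if not s]
    n_bins = len(frames[0])
    return [sum(f[k] for f in noise_frames) / len(noise_frames)
            for k in range(n_bins)]


def spectral_subtraction(frames, noise, floor=0.01):
    """Subtract the noise estimate bin by bin, keeping a small spectral floor."""
    return [[max(x - n, floor * x) for x, n in zip(f, noise)] for f in frames]


def drop_non_speech(frames, is_speech):
    """Frame dropping: keep only the frames flagged as speech."""
    return [f for f, s in zip(frames, is_speech) if s]


# Toy data: three noise-only frames and one louder "speech" frame.
frames = [[1.0, 2.0], [1.0, 2.0], [5.0, 6.0], [1.0, 2.0]]
is_speech = [False, False, True, False]

noise = estimate_noise(frames, is_speech)
cleaned = spectral_subtraction(frames, noise)
kept = drop_non_speech(cleaned, is_speech)
```

A Wiener filter would use the same noise estimate to compute a gain per bin instead of subtracting directly.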

4
VAD (2)
  • Our approach
  • Use of rather long time spans (100ms) instead of
    instantaneous measures
  • Increase discrimination
  • Use a statistical model in the log-FBE domain
  • Smoother estimations
  • Use a feedback decision coupled with noise
    suppression
  • VAD works on less noisy speech
  • Use of Order Statistics
  • More robust estimation

5
Long-Term Spectral Divergence (1)
  • J. Ramírez, J.C. Segura, C. Benítez, A. de la
    Torre and A.J. Rubio, "Efficient voice activity
    detection algorithms using long-term speech
    information", Speech Communication 42 (2004)
    271-287
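
The slides that follow survive only as titles, so the sketch below is an assumption about what a long-term spectral divergence measure looks like, not a transcription of them. It takes the long-term spectral envelope to be the per-bin maximum magnitude over 2N+1 neighbouring frames, and the divergence to be the mean ratio of squared envelope to squared noise spectrum, in dB. The window order and the 6 dB threshold are hypothetical values.

```python
import math


def ltse(spectra, l, order):
    """Long-term spectral envelope at frame l: per-bin maximum over l-order..l+order."""
    lo, hi = max(0, l - order), min(len(spectra), l + order + 1)
    n_bins = len(spectra[0])
    return [max(spectra[j][k] for j in range(lo, hi)) for k in range(n_bins)]


def ltsd(spectra, noise, l, order=6):
    """Divergence in dB between the long-term envelope and the noise spectrum."""
    env = ltse(spectra, l, order)
    ratio = sum((e * e) / (n * n) for e, n in zip(env, noise)) / len(env)
    return 10.0 * math.log10(ratio)


# Toy data: flat noise spectrum with one high-energy frame at index 10.
noise = [1.0, 1.0, 1.0, 1.0]
spectra = [noise[:] for _ in range(20)]
spectra[10] = [10.0, 10.0, 10.0, 10.0]

threshold_db = 6.0  # hypothetical decision threshold
decisions = [ltsd(spectra, noise, l, order=2) > threshold_db
             for l in range(len(spectra))]
speech_frames = [l for l, d in enumerate(decisions) if d]
```

Because the envelope is taken over a window, a single high-energy frame raises the measure for every frame within `order` frames of it.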

6
Long-Term Spectral Divergence (2)
7
Long-Term Spectral Divergence (3)
8
Long-Term Spectral Divergence (4)
9
Long-Term Spectral Divergence (5)
10
Long-Term Spectral Divergence (7)
  • Recognition experiments with AURORA 2 and 3

11
Long-Term Spectral Divergence (6)
12
Subband OSF VAD (1)
  • J. Ramírez, J.C. Segura, C. Benítez, A. de la
    Torre, and A.J. Rubio, "An Effective Subband
    OSF-based VAD with Noise Reduction for Robust
    Speech Recognition", IEEE Trans. on Speech and
    Audio Processing (to appear in 2005)
  • Decision is based on averaged QSNR defined as an
    inter-quantile difference
  • Feedback structure
  • VAD operates over the noise-reduced signal
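
A rough sketch of a decision based on an averaged inter-quantile difference. It assumes subband log-energies stored per frame, a window of 2N+1 frames, and that the lower quantile stands in for the noise level; the quantile levels (0.9 and 0.1), the threshold, and the function names are hypothetical, and the noise-reduction feedback loop is left out.

```python
def quantile(values, p):
    """Order-statistic quantile with linear interpolation between sorted samples."""
    s = sorted(values)
    pos = p * (len(s) - 1)
    i = int(pos)
    frac = pos - i
    if i + 1 < len(s):
        return s[i] * (1 - frac) + s[i + 1] * frac
    return s[i]


def qsnr(log_energy, l, order, p_hi=0.9, p_lo=0.1):
    """Inter-quantile difference per subband around frame l, averaged over subbands."""
    lo, hi = max(0, l - order), min(len(log_energy), l + order + 1)
    n_bands = len(log_energy[0])
    diffs = []
    for k in range(n_bands):
        window = [log_energy[j][k] for j in range(lo, hi)]
        diffs.append(quantile(window, p_hi) - quantile(window, p_lo))
    return sum(diffs) / n_bands


# Toy data: two subbands, twelve frames, speech-like energy in frames 5 and 6.
log_energy = [[0.0, 0.0] for _ in range(12)]
log_energy[5] = [10.0, 10.0]
log_energy[6] = [10.0, 10.0]

threshold = 5.0  # hypothetical
flags = [qsnr(log_energy, l, order=2) > threshold for l in range(12)]
speech_frames = [l for l, f in enumerate(flags) if f]
```

In this toy run the detection extends two frames either side of the energetic frames, which is a side effect of the window length.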

13
Subband OSF VAD (2)
14
Subband OSF VAD (3)
15
Subband OSF VAD (4)
16
Subband OSF VAD (5)
17
Accurate VAD
  • Open topics
  • New alternatives to improve the performance
  • New decision criteria based on OS filters
  • Already used for edge detection in images
  • Computational efficiency
  • Development of computationally efficient
    algorithms

18
Feature normalization
  • Objective
  • Transform features to remove undesired
    variability
  • Linear techniques
  • CMS
  • Cepstral mean subtraction
  • Removes the effect of linear channel distortion
  • CMVN
  • Cepstral mean and variance normalization
  • Extension of CMS to deal with variance reduction
    caused by the additive noise
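
As a small illustration of the two linear techniques, the sketch below normalizes the trajectory of a single cepstral coefficient over one utterance. The per-utterance statistics and the function names are assumptions made for the example.

```python
import statistics


def cms(feature):
    """Cepstral mean subtraction: remove the mean of the coefficient."""
    mu = statistics.fmean(feature)
    return [x - mu for x in feature]


def cmvn(feature):
    """Cepstral mean and variance normalization: zero mean, unit variance.

    Assumes the coefficient is not constant, so the deviation is non-zero.
    """
    mu = statistics.fmean(feature)
    sigma = statistics.pstdev(feature)
    return [(x - mu) / sigma for x in feature]


c1 = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy trajectory of one coefficient
c1_cms = cms(c1)
c1_cmvn = cmvn(c1)
```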

19
Feature normalization
  • Non-linear feature distortion
  • Environment effects are non-linear for MFCC
    features
  • And can hardly be removed with linear techniques
  • Because not only the location (mean) and scale
    (variance) of the feature distributions are
    affected, but also the shape (affecting higher
    order moments of the distribution)
  • Non-linear extensions
  • CDF-matching approaches (HEQ and related)
  • Have been proven more effective than linear
    ones
  • Normalize more than just the first two moments
    of the probability distributions

20
CDF-matching based equalization
  • The main idea
  • Transform the features to match a given PDF
  • In the one-dimensional case CDF-matching gives
    the solution
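
The one-dimensional solution can be written out as follows. The symbols are introduced here for illustration, and the argument assumes that the CDF of the observed feature is continuous and strictly increasing.

```latex
% x: observed feature with CDF C_x
% y: equalized feature, C_ref: CDF of the reference PDF
y = T(x) = C_{\mathrm{ref}}^{-1}\bigl(C_x(x)\bigr)

% Check: u = C_x(x) is uniform on [0, 1], therefore
P(y \le t) = P\bigl(C_x(x) \le C_{\mathrm{ref}}(t)\bigr) = C_{\mathrm{ref}}(t)
```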

21
Equalization and robust classifiers
22
Invariance
  • CMS is invariant to additive bias
  • CMVN is invariant to linear transformations
  • Equalization to a reference distribution is
    invariant to any invertible transformation
    (including non-linear ones)
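
The three invariance claims can be checked numerically. The sketch below is only an illustration: equalization is done with a simple rank-based mapping to a standard Gaussian (an assumption, other CDF estimates would do), and the test transformations are arbitrary.

```python
import math
import statistics


def cms(x):
    mu = statistics.fmean(x)
    return [v - mu for v in x]


def cmvn(x):
    mu, sigma = statistics.fmean(x), statistics.pstdev(x)
    return [(v - mu) / sigma for v in x]


def equalize(x):
    """Rank-based equalization to a standard Gaussian reference."""
    ref = statistics.NormalDist()
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    y = [0.0] * n
    for r, i in enumerate(order, start=1):
        y[i] = ref.inv_cdf((r - 0.5) / n)
    return y


def close(a, b, tol=1e-9):
    return all(abs(p - q) <= tol for p, q in zip(a, b))


x = [0.3, -1.2, 2.5, 0.9, -0.4, 1.7]
biased = [v + 4.0 for v in x]
linear = [3.0 * v + 4.0 for v in x]
nonlinear = [math.exp(v) for v in x]  # invertible and increasing

cms_ok = close(cms(x), cms(biased))               # CMS absorbs the bias
cmvn_ok = close(cmvn(x), cmvn(linear))            # CMVN absorbs scale and bias
heq_ok = close(equalize(x), equalize(nonlinear))  # equalization absorbs exp()
cms_fails = not close(cms(x), cms(linear))        # CMS cannot undo the scaling
cmvn_fails = not close(cmvn(x), cmvn(nonlinear))  # CMVN cannot undo exp()
```

The demo uses an increasing map. For a decreasing one the ranks reverse, so this particular rank-based sketch would return mirrored values.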

23
HEQ for robust speech recognition (1)
  • A. de la Torre, A.M. Peinado, J.C. Segura, J.L.
    Pérez, C. Benítez and A.J. Rubio, "Histogram
    equalization of speech representation for robust
    speech recognition", IEEE Trans. on Speech and
    Audio Processing (to appear in 2005)
  • Transformation of each component of the MFCC
    vector to a Gaussian reference
  • Cumulative distributions are estimated using
    histograms
  • Performance compared with CMS, CMVN and
    model-based feature compensation (VTS)
  • Combination with (VTS)
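
A minimal sketch of histogram-based equalization of one MFCC component to a Gaussian reference. The number of bins, the evaluation of the CDF at bin centres, and the function name are assumptions for illustration; the published method may differ in these details.

```python
import statistics


def heq_histogram(feature, n_bins=20):
    """Equalize one feature component to a standard Gaussian using a histogram CDF.

    Assumes the component is not constant over the utterance.
    """
    lo, hi = min(feature), max(feature)
    width = (hi - lo) / n_bins

    def bin_of(x):
        return min(int((x - lo) / width), n_bins - 1)

    counts = [0] * n_bins
    for x in feature:
        counts[bin_of(x)] += 1

    # CDF at each bin centre: mass below the bin plus half of the bin's own mass.
    n = len(feature)
    cdf, below = [], 0
    for c in counts:
        cdf.append((below + 0.5 * c) / n)
        below += c

    ref = statistics.NormalDist()
    return [ref.inv_cdf(cdf[bin_of(x)]) for x in feature]


# Toy data: a skewed, increasing sequence standing in for one MFCC component.
feature = [(i / 100.0) ** 2 for i in range(1, 201)]
equalized = heq_histogram(feature)
```

All values falling in the same bin map to the same output, so the resolution of the result is limited by the bin count.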

24
HEQ for robust speech recognition (2)
25
HEQ for robust speech recognition (3)
26
HEQ for robust speech recognition (4)
27
HEQ for robust speech recognition (5)
28
Segmental HEQ (1)
  • J.C. Segura, C. Benítez, A. de la Torre, A.J.
    Rubio and J. Ramírez, "Cepstral Domain Segmental
    Nonlinear Feature Transformations for Robust
    Speech Recognition", IEEE Signal Processing
    Letters, 11(5), May 2004
  • A segmental implementation of HEQ for
    non-stationary noise
  • A temporal buffer is used for the histogram
    estimation instead of the full sentence
  • The algorithmic delay is T frames
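
The buffering idea might be sketched as below. It assumes a buffer of 2T+1 frames centred on the frame being equalized, which is what gives a delay of T frames, and it uses the rank of the centre frame within the buffer as the CDF estimate. Both choices, and the omission of the first and last T frames, are simplifications made for this example.

```python
import statistics
from collections import deque


def segmental_heq(stream, T):
    """Yield equalized values for one feature component, T frames behind the input."""
    ref = statistics.NormalDist()
    buf = deque(maxlen=2 * T + 1)
    for x in stream:
        buf.append(x)
        if len(buf) == 2 * T + 1:
            centre = buf[T]
            # Rank of the centre frame in the buffer (ties take the lowest rank).
            rank = sum(1 for v in buf if v < centre) + 1
            yield ref.inv_cdf((rank - 0.5) / len(buf))


# A slow ramp: every centre frame is the median of its buffer, so the
# local equalization maps each of them to the middle of the reference.
ramp = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
equalized = list(segmental_heq(ramp, T=1))
```

Since statistics are local to the buffer, slow drifts such as the ramp are removed, which is the behaviour wanted under non-stationary noise.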

29
Segmental HEQ (2)
30
OSEQ: An efficient implementation (1)
  • A very computationally efficient algorithm based
    on Order Statistics
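
One way an order-statistics equalizer can avoid histograms altogether is sketched below, as an assumption about the general idea rather than the algorithm on the following slide. The CDF at the r-th smallest of N samples is approximated by (r - 0.5)/N, so for a fixed N the reference quantiles can be tabulated once and equalization reduces to a sort plus a table lookup. The function names are illustrative.

```python
import statistics


def reference_quantiles(n):
    """Gaussian reference values for ranks 1..n, computed once for a given n."""
    ref = statistics.NormalDist()
    return [ref.inv_cdf((r - 0.5) / n) for r in range(1, n + 1)]


def oseq(feature, table):
    """Replace each sample by the reference quantile of its rank."""
    order = sorted(range(len(feature)), key=feature.__getitem__)
    out = [0.0] * len(feature)
    for r, i in enumerate(order):
        out[i] = table[r]
    return out


feature = [3.0, -1.0, 10.0, 0.5]
table = reference_quantiles(len(feature))
equalized = oseq(feature, table)
```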

31
OSEQ: An efficient implementation (2)
32
Feature normalization
  • Open topics
  • Reference distribution
  • Clean speech / Gaussian / Others?
  • Dynamic features normalization (Δ and ΔΔ)
  • After, before or simultaneously? (Obuchi, Stern,
    EUSP03)
  • Progressive normalization
  • Not all MFCCs are equally affected, nor do they
    have equal discriminative power (de Wet, ...,
    ICASSP03)
  • Lower order moments normalization (Hsu, Lee,
    ICASSP04)
  • Parametric techniques
  • Current approaches are non-parametric (Haverinen,
    Kiss, EUSP03)
  • New applications
  • Speaker independence and adaptation
  • Multi-stream normalization

33
Combination of techniques
  • Development of a combined robust front-end
  • An accurate VAD
  • For noise parameter estimation
  • A noise reduction technique
  • Spectral subtraction or Wiener filter
  • Statistical feature compensation
  • A Frame-Dropping algorithm
  • To discard non-speech frames
  • And a Feature normalization block
  • For residual non-linear distortion compensation

34
VAD (1)
  • Development of a combined robust front-end

35
HIWIRE MEETING, CRETE, SEPTEMBER 23-24, 2004
  • JOSÉ C. SEGURA LUNA