Presentation COST 277 Limerick

Provided by: Cheto

1
Non-Linear Speech Feature Extraction for Phoneme
Classification and Speaker Recognition
M. Chetouani, M. Faúndez-Zanuy (*), B. Gas, J.L. Zarader
Laboratoire des Instruments et Systèmes d'Ile-De-France (LISIF),
Université Pierre et Marie Curie, PARIS, FRANCE
(*) Escola Universitària Politècnica de Mataró, BARCELONA, SPAIN
2
Outline
  • Feature extraction in the recognition process
  • Needs for speech feature extraction
  • A non-linear model: the Neural Predictive Coding (NPC)
  • Feature extraction for phoneme classification
  • Feature extraction for speaker recognition
  • Conclusions and future work

3
Speech Recognition Process
  • Speech Feature Extraction Process
  • Feature extraction is the first step of the
    recognition process.
  • Feature extraction is usually performed with
    temporal methods such as Linear Predictive
    Coding (LPC), frequency-domain methods such as
    Mel-Frequency Cepstral Coefficients (MFCC), or
    methods combining both, such as Perceptual
    Linear Prediction (PLP).
  • Limits
    • Linear methods.
    • No a priori class-membership information
      (data-driven methods).
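As a reference point for the linear baseline, here is a minimal sketch of LPC analysis by the autocorrelation method with the Levinson-Durbin recursion, in Python with NumPy. The function name `lpc` and the sign convention (s[n] approximated by the sum over k of a_k s[n-k]) are assumptions of this sketch; the windowing and pre-emphasis that a real front end applies to each frame are omitted.

```python
import numpy as np


def lpc(frame: np.ndarray, order: int) -> np.ndarray:
    """Hypothetical helper: predictor coefficients a_1..a_order such that
    s[n] is approximated by sum_k a_k * s[n-k] (autocorrelation method)."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Biased autocorrelation r[0..order]
    r = np.array([frame[: n - k] @ frame[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)  # a[0] is unused; a[1..order] hold the predictor
    err = r[0]               # prediction error energy of the order-0 predictor
    for i in range(1, order + 1):
        # Reflection coefficient for order i
        k = (r[i] - a[1:i] @ r[i - 1:0:-1]) / err
        prev = a.copy()
        a[i] = k
        # Update the lower-order coefficients: a_j = prev_j - k * prev_{i-j}
        a[1:i] = prev[1:i] - k * prev[i - 1:0:-1]
        err *= 1.0 - k * k
    return a[1:]
```

Each analysis frame yields `order` coefficients; this is the linear predictor that the non-linear approach of the following slides sets out to extend.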

4
Feature extraction principle
[Figure: common part and specific parts]
5
Needs for Speech Feature Extraction (1)
  • First, a non-linear model of the speech
    production process
  • A solution: non-linear predictors (Volterra
    filters, neural networks).
  • Our approach: an extension of Linear
    Predictive Coding (LPC) to the non-linear domain
    by means of neural networks.
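The idea of extending LPC to the non-linear domain can be sketched as a one-hidden-layer network that predicts each sample from the p previous ones. This is an illustrative assumption, not the authors' exact NPC architecture: the tanh hidden activation, the linear output unit, the absence of biases and the names `past_windows`, `npc_predict` and `prediction_error` are choices made for this sketch. If tanh is replaced by the identity, the prediction becomes (Wᵀa)·x, an ordinary linear (LPC-style) predictor.

```python
import numpy as np


def past_windows(signal: np.ndarray, p: int) -> np.ndarray:
    """Row i holds [s[i+p-1], ..., s[i]]: the p samples preceding s[i+p]."""
    return np.array([signal[n - p:n][::-1] for n in range(p, len(signal))])


def npc_predict(signal, W, a):
    """Non-linear prediction s_hat[n] = a . tanh(W @ x_n) for n = p .. N-1,
    where W has shape (hidden, p) and a has shape (hidden,)."""
    signal = np.asarray(signal, dtype=float)
    p = W.shape[1]
    hidden = np.tanh(past_windows(signal, p) @ W.T)  # shape (N - p, hidden)
    return hidden @ a


def prediction_error(signal, W, a) -> float:
    """Mean squared prediction error over the samples that have p predecessors."""
    signal = np.asarray(signal, dtype=float)
    p = W.shape[1]
    residual = signal[p:] - npc_predict(signal, W, a)
    return float(np.mean(residual ** 2))
```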

6
Needs for Speech Feature Extraction (2)
  • Problem: a large number of coefficients is
    generated.
  • Solution
    • A first layer common to all the phonemes.
    • One second layer specific to each phoneme.

7
The Neural Predictive Coding (NPC) (2)
  • Principle
    • The first-layer weights w are common to all
      the phoneme classes.
    • Each output layer is associated with one
      phoneme class.
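The structure described above (a first layer common to all phonemes, one output layer per phoneme) can be sketched as follows, together with the weight count it saves. The class `SharedNPC`, the helper `n_parameters_unshared`, the tanh activation, the linear outputs, the missing biases and the random initialization are all hypothetical choices for illustration. Without sharing, K classes need K·(p·h + h) weights; with a shared first layer they need p·h + K·h.

```python
import numpy as np


class SharedNPC:
    """Hypothetical sketch: a first layer W shared by all classes (common part)
    and one output weight vector per class (specific parts)."""

    def __init__(self, order: int, hidden: int, n_classes: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(hidden, order))      # common to all classes
        self.A = rng.normal(scale=0.1, size=(n_classes, hidden))  # row c: output layer of class c

    def class_errors(self, signal) -> np.ndarray:
        """Mean squared prediction error of every class-specific predictor."""
        signal = np.asarray(signal, dtype=float)
        p = self.W.shape[1]
        X = np.array([signal[n - p:n][::-1] for n in range(p, len(signal))])
        hidden = np.tanh(X @ self.W.T)  # computed once, shared by all classes
        preds = hidden @ self.A.T       # shape (N - p, n_classes)
        return np.mean((signal[p:, None] - preds) ** 2, axis=0)

    def n_parameters(self) -> int:
        return self.W.size + self.A.size


def n_parameters_unshared(order: int, hidden: int, n_classes: int) -> int:
    """Weights needed if every class had its own complete two-layer network."""
    return n_classes * (order * hidden + hidden)
```

For example, with p = 10, h = 8 and three classes, the shared model has 10·8 + 3·8 = 104 weights against 3·(80 + 8) = 264 for separate networks (illustrative values, not the settings used in the experiments).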

8
Learning phase
9
Phoneme classification
  • The objective is to extract speaker-independent
    phonetic features from the speech signal.
  • The common parts of the speech production model
    are modeled by the first layer.
  • The specific parts are modeled by the second
    layers.

10
Discriminative Feature Extraction based on the
Minimum Classification Error (MCE) criterion
  • The key idea is to impose discriminant
    constraints through the classifier, so that the
    constraints are optimal for classification.
  • Simultaneous training of both the feature
    extractor and the classifier
    • Feature extractor: Neural Predictive Coding.
    • Prototype-based classifier: Learning Vector
      Quantization.
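One MCE update for a prototype-based (LVQ-style) classifier can be sketched as below. It uses the common η → ∞ form of the misclassification measure, d = ||x − m_k||² − ||x − m_r||², where m_k is the nearest prototype of the correct class and m_r the nearest prototype of any other class, smoothed by the sigmoid loss ℓ = 1/(1 + exp(−γd)). The gradient with respect to the input x is also returned: in simultaneous training, this is the quantity that would be back-propagated into the feature extractor. The function name, the squared Euclidean distance and the default γ and learning rate are assumptions of this sketch.

```python
import numpy as np


def mce_lvq_step(x, label, prototypes, proto_labels, gamma=1.0, lr=0.01):
    """One MCE step for a prototype classifier.
    Returns (loss, updated prototypes, gradient of the loss w.r.t. x)."""
    x = np.asarray(x, dtype=float)
    d2 = np.sum((prototypes - x) ** 2, axis=1)   # squared distances to all prototypes
    correct = np.flatnonzero(proto_labels == label)
    wrong = np.flatnonzero(proto_labels != label)
    k = correct[np.argmin(d2[correct])]          # nearest prototype of the true class
    r = wrong[np.argmin(d2[wrong])]              # nearest competing prototype
    d = d2[k] - d2[r]                            # misclassification measure (> 0: error)
    loss = 1.0 / (1.0 + np.exp(-gamma * d))      # smoothed 0/1 loss
    scale = gamma * loss * (1.0 - loss)          # derivative of the loss w.r.t. d
    grad_x = scale * 2.0 * (prototypes[r] - prototypes[k])
    updated = prototypes.astype(float)           # copy; the gradients use the old values
    updated[k] += lr * scale * 2.0 * (x - prototypes[k])  # pull correct prototype toward x
    updated[r] -= lr * scale * 2.0 * (x - prototypes[r])  # push competitor away from x
    return float(loss), updated, grad_x
```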

11
Evaluation on phoneme classification
  • Phonemes are extracted from the NTIMIT speech
    database
    • DR1 and DR2 dialect regions (without the SA
      sentences).
    • 114 speakers for training and 37 for testing.
  • Classification of confusable phonemes
    • Vowels: /ih/, /ey/, /eh/, /ae/;
    • Voiced plosives: /b/, /d/, /g/;
    • Unvoiced plosives: /p/, /t/, /k/.
  • Comparison with traditional methods (LPC, MFCC,
    PLP) and with the non-linear NPC model without an
    explicit discriminant criterion.
  • Classification by GMMs.
  • Context-independent classification
    (frame-by-frame).
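Context-independent, frame-by-frame classification with class-conditional densities can be sketched as below: every frame is assigned to the class whose model gives it the highest log-likelihood. To stay short and dependency-free, each class is modeled here by a single diagonal-covariance Gaussian, i.e. a one-component GMM, which is a simplification of the GMMs used in the experiments (whose mixture components would normally be trained with EM). The function names and the variance floor are assumptions.

```python
import numpy as np


def fit_class_gaussians(features, labels, var_floor=1e-6):
    """One diagonal-covariance Gaussian per class (a one-component GMM)."""
    models = {}
    for c in np.unique(labels):
        X = features[labels == c]
        models[c] = (X.mean(axis=0), X.var(axis=0) + var_floor)
    return models


def log_likelihood(X, mean, var):
    """Per-frame log-density of a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (X - mean) ** 2 / var, axis=1)


def classify_frames(X, models):
    """Context-independent decision: each frame gets its most likely class."""
    classes = list(models)
    scores = np.stack([log_likelihood(X, *models[c]) for c in classes], axis=1)
    return np.array(classes)[np.argmax(scores, axis=1)]
```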

12
Classification rates (frame-by-frame analysis)
13
Feature extraction for speaker recognition
  • Objective
    • Speaker-dependent feature extraction.
  • Speaker recognition process
    • Feature extraction is currently carried out in
      the same way for all speakers.
  • Our approach
    • A speaker model made of a feature extractor
      (NPC) and a reference model.

14
A new initialization method for the NPC coding
phase
  • Once the NPC model is parameterized, the coding
    phase consists of estimating the second-layer
    weights.
  • As in any optimization process, initialization
    is important
    • Traditionally, random initialization is used,
      with various constraints.
  • We use LPC analysis for a linear initialization
    of the non-linear model
    • A data-driven method for the linear
      initialization of neural networks.
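The coding phase with an LPC-based initialization can be sketched as follows, under loudly stated assumptions: the network is the hypothetical one-hidden-layer tanh predictor with a linear output unit, W has already been fixed by the parameterization phase, and the initialization rule is an illustration rather than the authors' exact method. Near zero, tanh(u) ≈ u, so a·tanh(Wx) ≈ (Wᵀa)·x; choosing a with Wᵀa ≈ c, where c are the frame's LPC coefficients, makes the network start close to the LPC predictor. Gradient descent then refines a. All function names are hypothetical.

```python
import numpy as np


def lpc(frame, order):
    """Autocorrelation-method LPC, solving the Yule-Walker normal equations."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    r = np.array([frame[: n - k] @ frame[k:] for k in range(order + 1)])
    R = r[np.abs(np.subtract.outer(np.arange(order), np.arange(order)))]  # Toeplitz matrix
    return np.linalg.solve(R, r[1:])


def hidden_outputs(frame, W):
    """tanh(W @ x_n) for every window x_n = [s[n-1], ..., s[n-p]]."""
    p = W.shape[1]
    X = np.array([frame[n - p:n][::-1] for n in range(p, len(frame))])
    return np.tanh(X @ W.T)


def coding_error(frame, W, a):
    """Mean squared prediction error of the frame coded by output weights a."""
    frame = np.asarray(frame, dtype=float)
    return float(np.mean((frame[W.shape[1]:] - hidden_outputs(frame, W) @ a) ** 2))


def npc_code(frame, W, steps=200):
    """Estimate the output weights a of one frame, W being fixed."""
    frame = np.asarray(frame, dtype=float)
    p = W.shape[1]
    c = lpc(frame, p)
    a, *_ = np.linalg.lstsq(W.T, c, rcond=None)  # linear initialization: W^T a ~ c
    H = hidden_outputs(frame, W)
    t = frame[p:]
    # Gradient descent with step 1/L, L = largest eigenvalue of the Hessian
    # 2 H^T H / N; this step size never increases the mean squared error.
    L = np.linalg.eigvalsh(2.0 * H.T @ H / len(t))[-1]
    for _ in range(steps):
        grad = 2.0 * H.T @ (H @ a - t) / len(t)
        a = a - grad / L
    return a
```

Because the output unit in this sketch is linear and W is fixed, coding reduces to a convex least-squares problem, so the initialization only changes how quickly the iterative solver converges; with a non-linear output unit it would also influence which local minimum is reached.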

15
Speaker identification
  • Modification of the traditional enrolment and
    test phases
  • Enrolment phase
    • NPC parameterization phase: 12 seconds are used.
    • Computation of the reference models using the
      whole sentence.
  • Test phase
    • The speech input is coded by each NPC model.
    • The resulting features are compared with the
      associated reference models.
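The modified enrolment and test phases can be sketched as a loop over speaker models, each pairing a speaker's own feature extractor with the reference model built at enrolment. Everything here is a hypothetical illustration: `SpeakerModel`, `enrol` and `identify` are invented names, the extractor is assumed to be already parameterized, and the extractor, reference builder and distance are passed in as plain functions. In the tests, toy choices (reshaping the signal into 2-D "features", mean vectors, Euclidean distance) merely stand in for NPC coders, covariance matrices and the AHS measure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

import numpy as np


@dataclass
class SpeakerModel:
    extract: Callable[[np.ndarray], np.ndarray]  # speaker-specific feature extractor (e.g. an NPC coder)
    reference: np.ndarray                        # reference model computed at enrolment


def enrol(extract, enrol_signal, build_reference) -> SpeakerModel:
    """Enrolment: code the enrolment speech with the speaker's own extractor
    and build the reference model from the resulting features."""
    return SpeakerModel(extract, build_reference(extract(enrol_signal)))


def identify(test_signal, models: Dict[str, SpeakerModel],
             build_reference, distance) -> Tuple[str, Dict[str, float]]:
    """Test: code the input with every speaker's extractor, compare the result
    with that speaker's reference model, and return the closest speaker."""
    scores = {}
    for name, model in models.items():
        features = model.extract(test_signal)
        scores[name] = distance(build_reference(features), model.reference)
    return min(scores, key=scores.get), scores
```

Unlike a conventional system, the test utterance is coded once per enrolled speaker, so the cost of the test phase grows with the number of speakers.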

16
Evaluation on speaker identification
  • 49 speakers from the Gaudi database
    • Acquisition with a microphone connected to a PC.
  • The feature vector dimension is set to 16.
  • One minute of read text is used for training the
    reference models. Of this minute, 12 seconds are
    used for the NPC parameterization.
  • 5 sentences for testing (each sentence lasts about
    2-3 seconds).
  • Comparison with traditional coding methods: LPC,
    MFCC, LPCC and PLP.
  • Reference models are computed as covariance
    matrices and compared using the
    Arithmetic-Harmonic Sphericity (AHS) measure.
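Comparing covariance reference models with the AHS measure can be sketched as below. The formula follows the usual definition of the arithmetic-harmonic sphericity measure for m-dimensional covariance matrices (m = 16 in these experiments), log(tr(Cx Cy⁻¹) · tr(Cy Cx⁻¹) / m²); the function names are hypothetical.

```python
import numpy as np


def covariance_model(features: np.ndarray) -> np.ndarray:
    """Reference model: covariance matrix of the feature vectors (one per row)."""
    return np.cov(features, rowvar=False)


def ahs(cx: np.ndarray, cy: np.ndarray) -> float:
    """Arithmetic-Harmonic Sphericity measure between covariance matrices:
    log( tr(Cx Cy^-1) * tr(Cy Cx^-1) / m^2 ), with m the feature dimension."""
    m = cx.shape[0]
    t_xy = np.trace(np.linalg.solve(cy, cx))  # tr(Cy^-1 Cx) = tr(Cx Cy^-1)
    t_yx = np.trace(np.linalg.solve(cx, cy))  # tr(Cx^-1 Cy) = tr(Cy Cx^-1)
    return float(np.log(t_xy * t_yx / m ** 2))
```

By the arithmetic-harmonic mean inequality the measure is never negative, and it is zero when the two matrices are proportional, so an overall gain change between sessions does not affect it; identification keeps the reference model with the smallest value.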

17
Results
18
Conclusions
  • The phoneme classification rates are improved by
    non-linear methods compared with traditional
    methods such as LPC, MFCC and PLP.
  • The speaker identification rates are improved by
    non-linear methods with linear initialization
    compared with traditional methods such as LPC,
    LPCC, MFCC and PLP.

19
Perspectives
  • Phoneme classification
    • Cooperation with different classifiers.
    • Application to a large number of phonemes.
  • Speaker recognition
    • Explicit discriminative feature extraction.
    • Different applications: identification,
      verification, tracking.

20
  • Thank you for your attention