Title: Some activities on Non-linear Speech Processing at ENST/CNRS-LTCI
1 Some activities on Non-linear Speech Processing at ENST/CNRS-LTCI
Gérard CHOLLET, chollet_at_tsi.enst.fr
ENST/CNRS-LTCI, 46 rue Barrault, 75634 PARIS cedex 13
http://www.tsi.enst.fr/chollet
2 Outline
- What is ENST/CNRS-LTCI?
- Research and application topics related to COST-277
  - Speech production and perception
  - Speech analysis and synthesis
  - Speech coding
    - The SYMPATEX project
  - Automatic speech recognition
    - The SIROCCO project
  - Speaker characterisation and verification
- Perspectives within COST-277
3 Our affiliations
- ENST: Ecole Nationale Supérieure des Télécommunications, http://www.enst.fr
- CNRS: Centre National de la Recherche Scientifique, http://www.cnrs.fr
- LTCI: Laboratoire de Traitement et Communication de l'Information, http://www.enst.fr/externe/ura.html
4 What is ENST?
Ecole Nationale Supérieure des Télécommunications
- Classed among the Grandes Ecoles d'Ingénieurs
- 250 state-certified engineers each year
- Part of the Groupement des Ecoles de Télécommunications (GET)
5 GET: Groupement des Ecoles de Télécommunications
- ENST-Paris
- ENST-Bretagne in Brest
- Institut National des Télécommunications in Evry
- EURECOM in Sophia-Antipolis
- ENIC (Ecole Nouvelle d'Ingénieurs en Télécoms) in Lille
- Internet school in Marseille
6 Speech Production and Perception
- Parametric Vocal Tract model (Shinji Maeda)
- Non-linear Production model using Distinctive Regions and Modes (René Carré)
- Quantal nature of speech (R. Carré and S. Maeda)
- Perceptual filter (Nicolas Moreau)
- Auditory prosthesis (Alain Goyé and Jacques Prado)
7 Speech analysis and synthesis
- Time-Frequency representations, Wavelets
- Time-dependent spectral models (Yves Grenier)
- HNM (Harmonic plus Noise Model) (Olivier Cappé, Eric Moulines, Maurice Charbit)
- Glottal Excited LPC
8 Time-dependent Spectral Models
- Temporal Decomposition (B. Atal, 1983)
- Vector autoregressive models with detection of abrupt model changes (A. DeLima, Y. Grenier)
- Segmental parameterisation using a time-dependent polynomial expansion (Y. Grenier)
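As a reminder of what a time-dependent polynomial expansion looks like in an autoregressive setting, here is a sketch in generic notation. The symbols $p$ (model order), $m$ (polynomial degree), $a_{ij}$ (expansion coefficients) and $e(t)$ (excitation) are assumptions chosen for illustration; the slides do not give the exact formulation used.

```latex
y(t) = -\sum_{i=1}^{p} a_i(t)\, y(t-i) + e(t),
\qquad
a_i(t) = \sum_{j=0}^{m} a_{ij}\, t^{j}
```

Under these assumptions a whole segment is described by the $p(m+1)$ constants $a_{ij}$ instead of a fresh set of $p$ coefficients for every analysis frame.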
9 Temporal Decomposition
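Only the title of this slide survives. For orientation, temporal decomposition is usually written as follows, where $\mathbf{y}(n)$ is the spectral parameter vector of frame $n$, the $\mathbf{a}_k$ are target vectors and the $\phi_k(n)$ are overlapping interpolation functions that are each localised in time. The notation is generic and is not taken from the slide.

```latex
\mathbf{y}(n) \approx \sum_{k=1}^{K} \mathbf{a}_k\, \phi_k(n),
\qquad 1 \le n \le N
```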
10 HNM: Harmonic plus Noise Model
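Only the title of this slide survives. As a rough sketch of the idea, and assuming a short voiced frame with a constant fundamental frequency $f_0$, a harmonic plus noise model splits the signal into a sum of harmonics and a noise residual. The symbols $K$, $A_k$, $\phi_k$ and $n(t)$ are generic notation, not taken from the slide.

```latex
s(t) =
\underbrace{\sum_{k=1}^{K} A_k(t)\,\cos\bigl(2\pi k f_0 t + \phi_k\bigr)}_{\text{harmonic part}}
+ \underbrace{n(t)}_{\text{noise part}}
```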
11 ALISP
- Automatic
- Language
- Independent
- Speech
- Processing
Automatic discovery of segmental units for
speech coding, synthesis, recognition,
language identification and speaker verification.
12 Speech Coding by indexing
SYMPATEX: SYstème de Messagerie unifiée avec présentation vocale des messages (PArole et TEXte), a unified messaging system with spoken presentation of messages (speech and text)
Thomson-CSF, ELAN TTS, Irius
GET, ESIEE
13 Coding principle
14 Decoding
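The coding and decoding slides survive only as titles. The general idea of coding by indexing can be sketched as follows, assuming a toy codebook of representative vectors shared by coder and decoder: the coder sends only the index of the nearest codebook entry, and the decoder looks that entry up. The names `CODEBOOK`, `encode` and `decode` are hypothetical and this is not the SYMPATEX or ALISP implementation.

```python
# Toy illustration of coding by indexing (not the SYMPATEX/ALISP implementation).
# Assumption: coder and decoder share the same codebook of representative units.


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical codebook: each entry stands for one representative spectral vector.
CODEBOOK = [
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]


def encode(vectors, codebook=CODEBOOK):
    """Replace each input vector by the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda i: squared_distance(v, codebook[i]))
        for v in vectors
    ]


def decode(indices, codebook=CODEBOOK):
    """Rebuild an approximation of the input by looking up each index."""
    return [list(codebook[i]) for i in indices]


if __name__ == "__main__":
    frames = [[0.1, -0.2], [0.9, 0.2], [0.2, 0.8], [1.1, 0.9]]
    indices = encode(frames)
    print("transmitted indices:", indices)
    print("reconstruction:", decode(indices))
```

Only the indices need to be transmitted, so the bit rate depends on the codebook size rather than on the dimension of the vectors.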
15 Automatic Speech Recognition
- Recognition of proper names and spellings
- Keyword spotting, noise robustness, adaptation
- Large Vocabulary Speech Recognition (SIROCCO)
  - http://perso.enst.fr/sirocco/index-en.html
- Markov Random Fields, Bayesian Networks and Graphical Models
16 Markov Random Fields, Bayesian Networks and Graphical Models
- Speech modelling with a state-constrained Markov Random Field over frequency bands (Guillaume Gravier and Marc Sigelle)
  - http://perso.enst.fr/ggravier/recherche.html#these
- Comparative framework to study MRF, Bayesian Networks and Graphical Models
  - http://www.cs.berkeley.edu/~murphyk/Bayes/bayes.html
17 Speaker Verification
- Typology of approaches (EAGLES Handbook)
  - Text dependent
    - Public password
    - Private password
    - Customized password
  - Text prompted
  - Text independent
- Incremental enrolment
- Evaluation
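The slide does not say which evaluation measure is meant. One common way to summarise verification performance is the equal error rate (EER), the operating point where false rejections and false acceptances are equally frequent. The sketch below assumes that a higher score means "more likely the claimed speaker"; the function names and the toy scores are hypothetical.

```python
# Sketch of threshold-based evaluation for speaker verification.
# Assumption: higher score means "more likely the claimed speaker".


def error_rates(target_scores, impostor_scores, threshold):
    """Return (false rejection rate, false acceptance rate) at a threshold.

    An access is accepted when its score is >= threshold.
    """
    false_rejections = sum(1 for s in target_scores if s < threshold)
    false_acceptances = sum(1 for s in impostor_scores if s >= threshold)
    return (
        false_rejections / len(target_scores),
        false_acceptances / len(impostor_scores),
    )


def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by trying every observed score as a threshold.

    Returns (eer, threshold) for the threshold where |FRR - FAR| is smallest;
    the EER is reported as the mean of FRR and FAR at that point.
    """
    best = None
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        frr, far = error_rates(target_scores, impostor_scores, threshold)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2.0, threshold)
    return best[1], best[2]


if __name__ == "__main__":
    targets = [2.0, 1.5, 0.5, 3.0]       # toy scores of true-speaker accesses
    impostors = [-1.0, 0.0, 1.0, -0.5]   # toy scores of impostor accesses
    eer, threshold = equal_error_rate(targets, impostors)
    print("EER:", eer, "at threshold", threshold)
```

With so few trials the estimate is coarse; on real evaluation data the same scan is applied to thousands of target and impostor scores.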
18 Speaker Verification (text independent)
- The ELISA consortium
  - ENST, LIA, IRISA, ...
  - http://www.lia.univ-avignon.fr/equipes/RAL/elisa/index_en.html
- NIST evaluations
  - http://www.nist.gov/speech/tests/spk/index.htm
19 Support Vector Machines and Speaker Verification
- A hybrid GMM-SVM system is proposed
- An SVM scoring model is trained on development data to separate true-target speaker accesses from impostor accesses, using a new feature representation based on GMMs
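The slide does not describe the GMM-based feature representation, so the following is only a sketch under strong assumptions: each access is reduced to a single hypothetical feature, the average log-likelihood ratio between a speaker GMM and a world GMM, and the development data are assumed separable on that feature, in which case the maximum-margin separator has a closed form. All names and all model parameters are invented for illustration.

```python
# Sketch of a hybrid GMM-SVM scorer under simplifying assumptions:
# 1-D frames, one GMM-derived feature per access, separable development data.
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def gmm_avg_loglik(frames, gmm):
    """Average per-frame log-likelihood under a 1-D GMM.

    `gmm` is a list of (weight, mean, variance) tuples.
    """
    total = 0.0
    for x in frames:
        logs = [math.log(w) + gaussian_logpdf(x, m, v) for w, m, v in gmm]
        top = max(logs)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(value - top) for value in logs))
    return total / len(frames)


def llr_feature(frames, speaker_gmm, world_gmm):
    """Hypothetical GMM-based feature: average log-likelihood ratio."""
    return gmm_avg_loglik(frames, speaker_gmm) - gmm_avg_loglik(frames, world_gmm)


def fit_hard_margin_svm_1d(features, labels):
    """Closed-form maximum-margin separator for separable 1-D data.

    Labels are +1 (true target) or -1 (impostor); targets are assumed to have
    the larger feature values. Returns (w, b) such that the closest point of
    each class lies exactly on the margin w * x + b = +1 or -1.
    """
    lowest_target = min(x for x, y in zip(features, labels) if y > 0)
    highest_impostor = max(x for x, y in zip(features, labels) if y < 0)
    if lowest_target <= highest_impostor:
        raise ValueError("development data are not separable on this feature")
    width = lowest_target - highest_impostor
    return 2.0 / width, -(lowest_target + highest_impostor) / width


def svm_score(feature, w, b):
    """Signed SVM score; positive means 'accept as true target'."""
    return w * feature + b


if __name__ == "__main__":
    speaker_gmm = [(0.5, 1.0, 1.0), (0.5, 3.0, 1.0)]   # invented parameters
    world_gmm = [(0.5, -2.0, 4.0), (0.5, 2.0, 4.0)]    # invented parameters
    development = [
        ([1.0, 2.0, 3.0], +1), ([1.5, 2.5], +1),       # true-target accesses
        ([-2.0, -1.0, -3.0], -1), ([-1.5, 0.0], -1),   # impostor accesses
    ]
    feats = [llr_feature(f, speaker_gmm, world_gmm) for f, _ in development]
    w, b = fit_hard_margin_svm_1d(feats, [y for _, y in development])
    print("score of a new access:",
          svm_score(llr_feature([2.0, 2.2], speaker_gmm, world_gmm), w, b))
```

A realistic system would use multi-dimensional features and a soft-margin SVM from an established library; the closed form here only holds for a single separable feature.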
20 SVM principles
21 Results
22 Voice technology in Majordome
- Server side background tasks
  - Continuous speech recognition applied to voice messages upon reception
  - Detection of sender's name and subject
- User interaction
  - Speaker identification and verification
  - Speech recognition (receiving user commands through voice interaction)
  - Text-to-speech synthesis (reading text summaries, e-mails or faxes)
23 Perspectives within COST-277
- Textbook on Speech Processing
- Evaluation of parametric representations of speech for diverse applications
- Fundamental work on voice transformations with applications in coding, synthesis, recognition and speaker characterisation
- Fundamental work on noise robustness with applications in coding, recognition and speaker verification