METISS - PowerPoint PPT Presentation

About This Presentation
Title:

METISS

Description:

to design generic, robust, fast and flexible approaches to a ... COLLET. BEN. BENAROYA. BLOUET. MC DONAGH. POREE. BETSER. KIJAK. KRSTULOVIC. GONON. BEN. MORARU ...


1
METISS
  • Modélisation
  • et Expérimentation
  • pour le Traitement
  • des Informations
  • et des Signaux
  • Sonores

Audio and speech processing
INRIA-Rennes
Scientific leader: Frédéric BIMBOT
Overview of activities 2002-2005
2
Introduction

3
Framework and foundations
  • General framework

analysis, processing, modelling, representation,
description, decomposition, detection,
classification, recognition
of
audio, speech, music, multimedia
signals, recordings, streams, tracks
Audio scene analysis, description and recognition
  • Scientific foundations
  • Probabilistic models and statistical estimation
  • Redundant systems and adaptive representations

4
Scientific objectives
  • to design generic, robust, fast and flexible
    approaches to a variety of problems in speech and
    audio segmentation, detection and classification,
    operating in the probabilistic framework
  • to investigate the theoretical properties and
    practical applications of adaptive
    representations and sparseness criteria with the
    purpose of advanced processing and structured
    description of audio signals
  • to extend and adapt approaches classically used
    in the context of speech processing to other
    classes of signals and problems
  • to study convergence between statistical
    approaches and adaptive decomposition within a
    common framework embedding signal representations
    and classification

5
Application domain and focus
  • Applicative fields
  • Security, verification, authentication, rights
    management
  • Rich audio transcription, content-based indexing,
    multi-purpose navigation, information retrieval
    and summarization
  • Advanced audio processing: segmentation,
    separation, spatialisation, sound object
    extraction, music modeling
  • Audio and audio-visual authoring, production and
    repurposing
  • Education and entertainment
  • Primary focuses
  • Speaker characterisation
  • Audio structuring and indexing
  • Sparse representations: theory and applications
  • Audio source separation (under-determined case)

6
Team composition
2002, 2003, 2004, 2005
Permanent researchers (CR - CNRS or INRIA)
3
Non-permanent staff (Engineers, ATER, Post-Doc)
2
PhD (100% with METISS)
PhD (50% with METISS)
3
2
Marie-Noëlle Georgeault: administrative
assistant (25%)
7
Probabilistic modeling of audio signals

8
Probabilistic modeling (1)
  • 1 audio class or 1 sound object
    → a variety of observations
  • 1 family of sounds → 1 probabilistic model
  • 1 probability density function → 1
    likelihood function
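
The correspondence above (one sound class, one probabilistic model, one likelihood function) can be illustrated with a minimal sketch. It assumes a single diagonal Gaussian per class and synthetic 4-dimensional feature frames; the class names, the helper functions and the data are all hypothetical and stand in for the richer models (GMM, HMM) a real system would use.

```python
import numpy as np

def fit_diag_gaussian(frames):
    """Estimate a per-dimension mean and variance for one sound class."""
    frames = np.asarray(frames, dtype=float)
    # A small floor keeps the variance strictly positive.
    return frames.mean(axis=0), frames.var(axis=0) + 1e-6

def log_likelihood(frames, model):
    """Total log-likelihood of a set of frames under a diagonal Gaussian."""
    mean, var = model
    frames = np.asarray(frames, dtype=float)
    per_dim = -0.5 * (np.log(2.0 * np.pi * var) + (frames - mean) ** 2 / var)
    return float(per_dim.sum())

def classify(frames, models):
    """Return the class whose model gives the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(frames, models[name]))

# Synthetic training data: two hypothetical classes with different means.
rng = np.random.default_rng(0)
train = {
    "speech": rng.normal(loc=0.0, scale=1.0, size=(500, 4)),
    "music": rng.normal(loc=3.0, scale=1.0, size=(500, 4)),
}
models = {name: fit_diag_gaussian(x) for name, x in train.items()}

# Unseen frames drawn from the "music" distribution.
test_frames = rng.normal(loc=3.0, scale=1.0, size=(50, 4))
print(classify(test_frames, models))
```

Each class contributes one density, and evaluating that density on observed frames gives the likelihood used for the decision.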

9
Probabilistic modeling (2)
Know-how: probabilistic modeling, statistical
estimation, state-sequence decoding, Bayesian
decision
→ Detection, classification, verification, segmentation

Probabilistic models offer a well-understood
generic inter-operable framework for the
description and the classification of audio and
speech signals
  • Dominant position of Hidden Markov Models (HMM)
    (and variants)
  • Highly competitive field in speech processing
    (research and industry)
  • More open in audio indexing (additional factors
    of complexity)

10
Challenges and positioning
Generalisation to wider classes of signals with
an audio component
  • multiple scales
  • multiple sources
  • multiple structures
  • multiple sensors
  • multiple levels of underlying processes
  • heterogeneous streams (audio-visual)
  • external sources of knowledge
  • Robustness
    • to unseen acoustic conditions
    • to scarce training data
    • to poorly representative samples
    • to missing observations
    • to ...
  • Implementability
    • size
    • speed
    • scalability
    • distribution
    • etc.

METISS positioning
  • robust training and test methods
  • compact distributed algorithms
  • versatility / migration of formalism
  • methodology and evaluation
→ speaker verification
→ audio segmentation
→ broad sound-class indexing (speech recognition)
11
Adaptive representations

12
Adaptive representations (1)
  • Audio signal
  • diversity of structures (time, frequency,
    statistics, ...)
  • superimposition of objects (notes, sources,
    tracks, ...)

Redundant system (dictionary of atoms)
  • Large set of vectors with various
    • scales
    • time structures
    • frequency structures
    • phases
    • statistical properties
Adaptive decomposition
  • Selection of the "best" decomposition, according
    to a given criterion
    • sparsity
    • perception criterion
    • separability
    • conditional entropy
13
Adaptive representations (2)
Sparsity criteria
Decomposition
  • ℓ2: quadratic norm → maximizes dispersion
  • ℓ0: minimum number of non-zero coefficients →
    NP-complete
  • ℓ1: tractable "compromise"

→ Pursuit algorithms (Matching Pursuit)
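
As an illustration of the pursuit approach named above, here is a minimal sketch of plain Matching Pursuit. It assumes a dictionary stored as a matrix whose columns are unit-norm atoms; the random dictionary and the two-atom test signal are synthetic, and the function is a textbook version, not the optimised implementation discussed later in the presentation.

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_iter=10, tol=1e-10):
    """Greedy sparse approximation: signal ~ dictionary @ coeffs.

    dictionary: (N, K) matrix whose columns are assumed to be unit-norm atoms.
    Returns the coefficient vector and the final residual.
    """
    residual = np.asarray(signal, dtype=float).copy()
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(n_iter):
        correlations = dictionary.T @ residual       # inner product with every atom
        k = int(np.argmax(np.abs(correlations)))     # most correlated atom
        if abs(correlations[k]) < tol:
            break
        coeffs[k] += correlations[k]
        residual -= correlations[k] * dictionary[:, k]  # remove its contribution
    return coeffs, residual

# Synthetic redundant dictionary: 64 random unit-norm atoms in dimension 16.
rng = np.random.default_rng(0)
dictionary = rng.normal(size=(16, 64))
dictionary /= np.linalg.norm(dictionary, axis=0)

# Signal built from two atoms of the dictionary.
signal = 2.0 * dictionary[:, 3] - 1.5 * dictionary[:, 40]
coeffs, residual = matching_pursuit(signal, dictionary, n_iter=20)
print(np.count_nonzero(coeffs), np.linalg.norm(residual))
```

At every iteration the signal equals the current approximation plus the residual, and the residual energy never increases. Plain Matching Pursuit does not guarantee that it finds the sparsest decomposition.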
14
Ongoing scientific issues
  • Optimality and convergence of adaptive
    decompositions
  • Dictionary design (knowledge-based, data-driven,
    ...)
  • Deformable, stochastic, multi-dimensional, ...
    atoms
  • Efficient decomposition algorithms and
    implementations
  • Application scope
  • Recent fast-growing field
  • High applicative potential
  • Intense emerging competition

15
Achievements 2002-2005 and selected results
  • Speaker characterisation
  • Audio structuring and indexing
  • Sparse representations: theory and applications
  • Audio source separation (under-determined case)

16
Speaker characterisation
  • CART trees for scalable and distributable speaker
    verification
  • Model-based metrics and normalisations for
    speaker verification
  • Structural adaptation of speaker models
    (hierarchical Bayesian networks)
  • Methodology and algorithms for optimizing the
    coverage of a speaker database
  • Relative speaker space and metrics for efficient
    speaker indexing and retrieval (ongoing)

17
CART based speaker verification
Blouet, Bimbot, Gonon, et al.
Direct score function assignment
CART trees used as a family of approximating functions
[Figure: binary decision tree with YES/NO branches and leaf scores -0.8, -0.4, 0.3, 0.7, 0.9]
Extension to oblique trees
[Figure: oblique tree with YES/NO branches and leaf scores -0.5, 0.3, 0.9]
Complexity down 200 x, error rate up by 33% only
EU-IST INSPIRED Project
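
The idea of a tree that assigns a verification score directly can be sketched as follows. This is only an illustration under stated assumptions: the tree structure, thresholds and leaf values are hypothetical, and averaging frame scores over an utterance is an assumed aggregation rule, not necessarily the one used by the authors.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Node:
    score: Optional[float] = None   # set on leaves only
    feature: int = 0                # index of the feature tested at this node
    threshold: float = 0.0
    yes: Optional["Node"] = None    # branch taken when x[feature] <= threshold
    no: Optional["Node"] = None     # branch taken otherwise

def tree_score(node: Node, x: Sequence[float]) -> float:
    """Walk down the tree with a few comparisons and return the leaf score."""
    while node.score is None:
        node = node.yes if x[node.feature] <= node.threshold else node.no
    return node.score

def utterance_score(tree: Node, frames: Sequence[Sequence[float]]) -> float:
    """Average of the frame scores (assumed aggregation rule)."""
    return sum(tree_score(tree, f) for f in frames) / len(frames)

# Hypothetical tree with three leaves.
tree = Node(
    feature=0, threshold=0.0,
    yes=Node(score=-0.8),
    no=Node(feature=1, threshold=1.0, yes=Node(score=0.3), no=Node(score=0.7)),
)

frames = [[-1.0, 0.0], [0.5, 2.0]]
accept = utterance_score(tree, frames) > 0.0  # decision threshold is also an assumption
print(utterance_score(tree, frames), accept)
```

Scoring a frame costs only a handful of comparisons, which is where the complexity saving comes from. An oblique tree would test a linear combination of features at each node instead of a single feature.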
18
Speaker recognition in the model space (1)
Ben, Bimbot et al.
Formal links between LLR and KL-divergence
Mean-only adaptation training procedure
Likelihood ratio test → Euclidean distance in
the model space
19
Speaker recognition in the model space (2)
Ben, Bimbot et al.
Consequences
  • faster score computation procedure (at least -50%)
  • simpler normalization schemes (M-Norm): no
    need for additional development data
with no performance degradation
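
The link between KL-divergence and a Euclidean distance can be checked numerically in the simplest case. The sketch below assumes models that share weights and diagonal variances and differ only in their means (mean-only adaptation); the function names and numbers are illustrative, and the exact metric used by the authors is not given in the transcript. For a single Gaussian component the KL-divergence equals half the squared variance-normalised distance; for several components the weighted sum is an upper bound on the KL-divergence between the mixtures.

```python
import numpy as np

def gaussian_kl(mu_p, mu_q, cov_p, cov_q):
    """KL(p || q) between two multivariate Gaussians (general closed form)."""
    d = len(mu_p)
    inv_q = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    return 0.5 * (np.trace(inv_q @ cov_p) + diff @ inv_q @ diff - d
                  + np.log(np.linalg.det(cov_q) / np.linalg.det(cov_p)))

def model_space_distance(means_a, means_b, weights, variances):
    """Weighted, variance-normalised Euclidean distance between two models
    that share component weights and diagonal variances.

    means_a, means_b, variances: (K, D); weights: (K,)
    """
    diff = (means_a - means_b) / np.sqrt(variances)
    return float(np.sqrt(np.sum(weights[:, None] * diff ** 2)))

# Single-component example with made-up values.
mu_a = np.array([[0.0, 0.0]])
mu_b = np.array([[1.0, 2.0]])
var = np.array([[1.0, 4.0]])
w = np.array([1.0])

dist = model_space_distance(mu_a, mu_b, w, var)
kl = gaussian_kl(mu_a[0], mu_b[0], np.diag(var[0]), np.diag(var[0]))
print(kl, 0.5 * dist ** 2)
```

Computing a distance between parameter vectors avoids evaluating the models frame by frame, which is consistent with the faster score computation reported above.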
20
Audio indexing
  • HMM-based audio and audio-visual structuring
    (applied to sports programmes)
  • Audio segmentation and tracking using
    probabilistic models and statistical tests
  • Detection of simultaneous events in audio tracks
  • Granular models of audio signals using deformable
    atoms
  • Comparison and evaluation of beam-search
    techniques and hypothesis rescoring using
    external sources of knowledge (ongoing)
  • Algebraic representations and statistical
    modeling of formal music (ongoing)

21
Multi-stream HMM modeling (1) of a tennis match
Kijak et al. (with TMM)
multi-level state-sequence representation of a
tennis match
inspired by and adapted from the speech recognition
paradigms
→ multi-stream audio-visual HMM
22
Multi-stream HMM modeling (2)
Delakis, Gravier et al. (with TexMex)
  • segmental models → relaxed synchrony
    constraints

Video only, shot-based: C = 77
Video + audio, shot-based, segmental: C = 85
23
Sparse representations
  • Mathematical test for the optimality of a sparse
    representation
  • Matching pursuit made tractable (1 hour of audio
    processed at 0.25 x RT)
  • Structured matching pursuit incorporating
    explicit signal family models
  • Adaptive computational strategies
  • Beyond sparsity: recovering structured
    representations
  • Learning shift-invariant atoms (MoTIF algorithms)
    (ongoing)

24
Sparse solutions to inverse linear problems
Gribonval et al.
  • In the under-determined case
  • BUT if

If a sparse representation is sparse enough, then
it is the sparsest one
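
The equations of this slide did not survive transcription. As a point of reference only, one classical way to make the statement above precise uses the coherence of the dictionary. The notation below (y, D, x, μ) is an assumption, and the condition shown is the well-known coherence bound, which may differ from the exact condition presented on the slide.

```latex
% y: observed signal; D: m x n dictionary with unit-norm atoms d_i and n > m,
% so that y = D x has infinitely many solutions x.
y = D x, \qquad \mu(D) = \max_{i \neq j} \left| \langle d_i, d_j \rangle \right|

% Sufficient condition for x to be the unique sparsest representation of y:
\| x \|_0 < \frac{1}{2} \left( 1 + \frac{1}{\mu(D)} \right)
\;\Longrightarrow\;
x = \arg\min_{z \,:\, D z = y} \| z \|_0
```

Under the same condition, the ℓ1 minimiser coincides with this sparsest solution, which is what makes the tractable criterion usable in place of the combinatorial one.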
25
Matching Pursuit made tractable
Gribonval, Krstulovic et al.
C++ toolkit, GPL licence
MPTK
flexible operation, reproducible results
for a 1-hour audio signal, processing time reduced
from 20 h to 0.25 h
usable in other fields: medical signals,
seismology, etc.
26
Source separation (with primary focus on
underdetermined problems)
  • Statistical schemes and adaptive training for
    single-channel separation
  • Source separation approaches using multi-channel
    Matching Pursuit in the underdetermined case
  • Contributions to evaluation methodology: task
    definition, performance measurements
  • Speech denoising using underdetermined
    source separation techniques
  • Dictionary design methods for source separation
    (ongoing)
  • DEMIX: a robust algorithm to estimate the number
    of sources using clustering techniques (ongoing)

27
Single sensor audio source separation
Benaroya, Bimbot, Gribonval, Ozerov (with FTRD)
Observed signal (voice + music) → Wiener filter → estimated voice signal
Voice GMM + music GMM → factorial GMM
Use of a factorial GMM to build a
time-varying Wiener filter
Article in IEEE Trans. SAP 2006; new results to
come
  • innovative scheme for underdetermined source
    separation
  • compatibility with speech processing
    state-of-the-art
  • strong links with sparse decomposition problems
  • versatile and efficient for a range of audio
    description tasks
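
The construction of a time-varying Wiener filter from two source models can be sketched for a single frame. The sketch assumes each source is described by a small set of spectral variance templates with prior weights, that the mixture variance for a pair of states is the sum of the two templates, and that the gain is the posterior-weighted average of the per-pair Wiener gains. These modelling choices, the function name and the numbers are illustrative assumptions, not the published algorithm.

```python
import numpy as np

def wiener_gain(power_frame, voice_psds, music_psds, voice_w, music_w):
    """Voice gain for one mixture frame.

    power_frame: (F,)  squared magnitude |X(f)|^2 of the mixture frame
    voice_psds:  (Kv, F) variance templates of the voice model
    music_psds:  (Km, F) variance templates of the music model
    voice_w, music_w: prior weights of the templates, shapes (Kv,) and (Km,)
    """
    # Variance of the mixture for every pair of (voice state, music state).
    total = voice_psds[:, None, :] + music_psds[None, :, :]          # (Kv, Km, F)
    # Log-likelihood of the frame under a zero-mean complex Gaussian per pair.
    loglik = -np.sum(np.log(np.pi * total) + power_frame / total, axis=2)
    logpost = loglik + np.log(voice_w)[:, None] + np.log(music_w)[None, :]
    post = np.exp(logpost - logpost.max())
    post /= post.sum()                                                # (Kv, Km)
    # Wiener gain of each pair, averaged with the posterior weights.
    gains = voice_psds[:, None, :] / total
    return np.sum(post[:, :, None] * gains, axis=(0, 1))              # (F,)

# Made-up templates over 3 frequency bins.
voice_psds = np.array([[4.0, 1.0, 0.1], [0.1, 1.0, 4.0]])
music_psds = np.array([[1.0, 1.0, 1.0], [0.5, 2.0, 0.5]])
voice_w = np.array([0.5, 0.5])
music_w = np.array([0.5, 0.5])

power_frame = np.array([5.0, 2.0, 1.0])
gain = wiener_gain(power_frame, voice_psds, music_psds, voice_w, music_w)
print(gain)  # multiply the mixture spectrum by this gain to estimate the voice
```

Because the posterior over state pairs is recomputed for every frame, the resulting filter follows the changing spectra of both sources, which is what makes it time-varying.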

28
Underdetermined stereophonic source separation
using sparse methods
Lesage, Gribonval et al.
Mixing matrix
Separation
Audio examples available
Least squares
Sparsity
29
Collaborations, Dissemination and Visibility
  • Privileged cooperation with the TEXMEX group at
    IRISA (and VISTA)
  • Consistent network of academic and industrial
    partners outside IRISA
  • Regular participation in collaborative projects
    (EU-IST, RNRT, bilateral partnership, ...)
  • Strong involvement in concerted research actions
    (ESTER, MathSTIC, GDR-ISIS, NIST evaluations, ...)
  • Visible participation in and production of free
    software: ELISA platform, AudioSeg, MPTK,
    SIROCCO, BSS-EVAL
  • Sustained effort of publication and dissemination
    of the group's research results
  • Additional visibility through responsibilities
    taken in scientific societies, workshop
    organisation and editorial boards

30
Summary 2002-2005
Strategy and perspectives 2006-2010

31
Achievements 2002-2005 (1)
  • solid contributions to the state of the art with
    respect to several topics related to speaker and
    audio class modelling and recognition
  • key extension, experimentation and validation of
    the Hidden Markov Model framework for joint audio
    and video modelling and structuring
  • major theoretical and experimental progress in
    the field of sparse representations and adaptive
    decomposition
  • pioneering work in mono- and multi-channel source
    separation in the underdetermined case

32
Achievements 2002-2005 (2)
  • strategic improvement in the efficiency of
    pursuit algorithms both in terms of search
    strategy and implementation
  • development of a usable know-how in keyword
    spotting and speech recognition
  • sustained activities in assessment methodology,
    resource distribution and evaluation campaigns
  • scientific objective 4 needs consolidation

33
Strategy 2006-2010
  • To keep our position in our initial field of
    expertise: models, algorithms and tools for
    automatic processing of audio and speech signals
  • To push our advantage in the field of sparse
    representations, from both the theoretical and
    applied viewpoints.
  • To extend our scope towards more powerful
    approaches for the representation and modeling of
    audio and multi-modal signals with an audio
    component
  • To step in and progress in the area of
    compressing large-scale high-dimensional
    multi-modal data

34
Scientific challenges
  • Probabilistic multi-level multi-stream dependency
    models for the representation of multiple sources
    and the integration of heterogeneous levels of
    knowledge in audio (-visual) streams → Bayesian
    networks
  • Data-driven representations, model discovery and
    self-structuring of information in audio and
    audio-visual streams and contents →
    theoretical consolidation
  • Experimental platforms and numerically efficient
    algorithms for large scale data and near
    real-time processing → engineering work
  • Deeper understanding of the links between
    theoretical concepts of adaptive representation,
    sparse decomposition, multi-scale analysis and
    practical implications in terms of robustness,
    separability and adaptability → potential
    links with SVM
  • Compressing large-scale high-dimensional
    multimodal data for storage, description and
    classification → compressed sensing

35
Questions