Dijana Petrovska-Delacr - PowerPoint PPT Presentation

Title:

Dijana Petrovska-Delacr

Description:

1: DIVA Group, University of Fribourg. 2: GET-ENST, CNRS-LTCI, Paris ... system usefully complements the short-term frequency information present in the ...


1
ALISP-based improvement of GMMs for
Text-independent Speaker Verification
  • Dijana Petrovska-Delacrétaz (1), Asmaa el Hannani (1),
    Gérard Chollet (2)
  • (1) DIVA Group, University of Fribourg
  • (2) GET-ENST, CNRS-LTCI, Paris
  • 3-4 December 2003, Biometrics Tutorials, Uni.
    Fribourg

2
Overview
  • 1. Why segmental speaker verification systems?
  • 2. Speech segmentation problems
  • 3. Proposed segmental system based on a DTW
    distance measure
  • 4. Experimental setup
  • 5. Results
  • 6. Conclusions and perspectives

3
1 Why segmental speaker verification systems?
  • Current reference speaker verification systems
    are based on Gaussian Mixture Models (each speech
    frame is treated independently)
  • Speech is composed of different sounds
  • Phonemes have different discriminant
    characteristics for speaker verification (see
    Eatock et al. 94, J. Olsen 97, Petrovska et al. 98,
    2000)
  • Nasals and vowels convey more speaker
    characteristics than other speech classes
  • We would like to exploit this fact
  • We need an automatic speech segmentation tool!
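
To make the phrase "each speech frame is treated independently" concrete, here is a minimal sketch of frame-level GMM scoring. It is not the system from the presentation: the diagonal covariances, the averaged log-likelihood ratio against a world model, and all names and toy values (`gmm_llr`, `client`, `world`, `frames`) are assumptions chosen for illustration.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at one frame x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) triples."""
    logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(logs)
    # log-sum-exp over the mixture components, for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs))

def gmm_llr(frames, client_gmm, world_gmm):
    """Average per-frame log-likelihood ratio (client vs. world).

    Every frame contributes on its own, so shuffling the frames
    leaves the score unchanged: no temporal structure is used.
    """
    total = sum(
        gmm_frame_loglik(x, client_gmm) - gmm_frame_loglik(x, world_gmm)
        for x in frames
    )
    return total / len(frames)

# Toy 2-D models and frames (made-up values, for illustration only).
client = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [2.0, 2.0], [1.0, 1.0])]
world = [(1.0, [5.0, 5.0], [4.0, 4.0])]
frames = [[0.1, -0.2], [1.9, 2.1], [0.3, 0.4]]
score = gmm_llr(frames, client, world)
print(score > 0)
```

Because the score is a plain average over frames, the order of the frames carries no weight. That is the limitation a segmental system tries to address.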

4
1.1 Advantages and disadvantages of speech
segmentation
  • Problems
  • Need for a speech segmentation tool
  • Speaker modeling per speech class => more data
    needed
  • More complicated systems
  • Advantages
  • Possibility to use it in combination with
    dialogue-based systems, for which a speech
    segmentation is already done
  • Possibility to introduce text-prompted speaker
    verification, designed to include a maximum
    number of speaker-specific units

5
2 Speech Segmentation
  • Large Vocabulary Continuous Speech Recognition
    (LVCSR) System
  • good results for a small set of languages
  • need huge amount of annotated speech data
  • language (and task) dependent
  • we do not have such a system for American English

6
2.1 ALISP Speech Segmentation
  • Data-driven speech segmentation
  • not yet usable for speech recognition purposes
  • no annotated databases needed
  • language and task independent
  • we could use it to segment the speech data for a
    text-independent speaker verification task
  • We will use the data-driven speech segmentation
    method ALISP (Automatic Language Independent
    Speech Processing)

7
2.2 ALISP principles

8
3 Proposed speaker verification system: ALISP
segments and DTW
3.1 Segmentation problem
  • Segmentation of the speech data with N ALISP HMM
    models
  • N = 64 speech classes
  • Need for (untranscribed) speech data to train
    the 64 ALISP HMM models
  • With so many speech classes we should change the
    speaker modeling method: not enough data for GMM
    adaptation =>
  • Use of Dynamic Time Warping (DTW)

9
3.2 DTW distance measure for speaker verification
  • Dynamic Time Warping (DTW) was already used for
    speaker verification, in a text-dependent mode
    (Rosenberg 76, Rabiner and Schafer 76, Furui 81,
    Pandit and Kittler 98)
  • The DTW distance measure between two speech
    segments conveys speaker-specific characteristics
  • Originality: DTW is used in text-independent mode
  • We first segment the speech data into ALISP
    classes
  • Then we measure the distance between speaker and
    non-speaker segments
  • Speaker-specific information is extracted from
    the ALISP-based speech segments => Client
    Dictionary
  • Non-speaker (world speakers) ALISP-based speech
    segments => World Dictionary
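
The DTW distance between two speech segments can be sketched as follows. This is a textbook formulation, not necessarily the variant used in the presentation: the Euclidean local distance, the three allowed steps, and the normalization by the summed segment lengths are all assumptions, as are the names `dtw_distance`, `seg_client`, `seg_world` and `seg_test`.

```python
import math

def dtw_distance(seg_a, seg_b):
    """Length-normalized DTW distance between two segments.

    Each segment is a list of feature vectors (one per frame).
    Assumed choices: Euclidean local distance, steps (i-1, j),
    (i, j-1) and (i-1, j-1), normalization by len(seg_a) + len(seg_b).
    """
    n, m = len(seg_a), len(seg_b)
    inf = float("inf")
    # cost[i][j] = best accumulated distance aligning the first i frames
    # of seg_a with the first j frames of seg_b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seg_a[i - 1], seg_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch seg_b
                cost[i][j - 1],      # stretch seg_a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m] / (n + m)

# Toy 1-D segments (made-up values): the test segment is a time-stretched
# copy of the client segment and lies far from the world segment.
seg_client = [[0.0], [1.0], [2.0]]
seg_world = [[5.0], [6.0]]
seg_test = [[0.0], [1.0], [1.0], [2.0]]
print(dtw_distance(seg_test, seg_client), dtw_distance(seg_test, seg_world))
```

The warping absorbs differences in duration, so the stretched copy stays at distance zero from its source while the unrelated segment does not.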

10
3.3 Searching in the client and world speech
dictionaries for speaker verification purposes
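
The transcript keeps only the title of this slide, so the following is a hedged sketch of what a search in the two dictionaries might look like, not the authors' procedure. It assumes each dictionary maps an ALISP class label to a list of stored segments, that each test segment is compared only with entries of its own class, and that the score is the mean difference between the nearest world distance and the nearest client distance. The distance function is passed in as a parameter; the demo uses a simple stand-in (`mean_gap`) rather than DTW to stay short. All names and values are hypothetical.

```python
def nearest_distance(segment, entries, distance):
    """Smallest distance between a segment and any dictionary entry."""
    return min(distance(segment, entry) for entry in entries)

def segmental_score(test_segments, client_dict, world_dict, distance):
    """Mean of (nearest world distance - nearest client distance).

    test_segments: list of (alisp_label, frames) pairs.
    client_dict / world_dict: label -> list of stored segments.
    A positive score means the test speech lies closer to the client
    dictionary than to the world dictionary. Labels missing from
    either dictionary are skipped (an assumption).
    """
    diffs = []
    for label, frames in test_segments:
        if label in client_dict and label in world_dict:
            d_world = nearest_distance(frames, world_dict[label], distance)
            d_client = nearest_distance(frames, client_dict[label], distance)
            diffs.append(d_world - d_client)
    return sum(diffs) / len(diffs) if diffs else 0.0

def mean_gap(seg_a, seg_b):
    """Stand-in distance for the demo (NOT DTW): gap between segment means."""
    return abs(sum(seg_a) / len(seg_a) - sum(seg_b) / len(seg_b))

# Toy dictionaries with scalar frames (made-up values).
client_dict = {"Ha": [[0.0, 1.0, 2.0]], "Hf": [[4.0, 4.0]]}
world_dict = {"Ha": [[5.0, 6.0]], "Hf": [[0.0, 0.0]]}
test_segments = [("Ha", [0.0, 1.0, 1.0, 2.0]), ("Hf", [4.0, 5.0]), ("HX", [1.0])]
score = segmental_score(test_segments, client_dict, world_dict, mean_gap)
print(score)
```

In a real system, a DTW distance would replace `mean_gap`, and the resulting score would be compared with a decision threshold.
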
11
4 Evaluation of the proposed system:
experimental setup
  • Development data: one subset from NIST 2002
    cellular data (American English)
  • World speakers (60 female, 59 male)
  • used to train the ALISP speech segmenter
  • and to model the non-speakers (world speakers)
  • Evaluated on
  • another subset from NIST 2002 (111 79 male
    speakers)

12
4.1 Speech segmentation example
  • Two other occurrences of the English phone "ay"
  • The corresponding ALISP sequences: HX - Hf
    and (HM) - Hf - Ha-
  • Previous slide:
    (Hf) - Ha or (HM) - HZ - Ha

13
4.2 Results: GMM and ALISP-DTW systems and their
fusion

14
4.3 Results: EER comparison

15
4.4 Importance of fusion (33% improvement)

16
4.5 Using only GMM scores on the segments =>
segmental GMM system

17
5 Conclusions
  • State-of-the-art NIST 2002 results for EER
    (best 8% to worst 28%)
  • Fusion of a classical system with a segmental
    system: big improvements
  • Why: higher-level information present in the
    segmental system usefully complements the
    short-term frequency information present in the GMM
    system
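
For readers unfamiliar with the metric, the sketch below shows one simple way to estimate an equal error rate (EER) and to fuse two systems' scores. The transcript does not say how the fusion was done, so the weighted sum in `fuse` is an assumption. The scores are hand-picked toy numbers, not results from the presentation.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER: sweep thresholds over the observed scores and
    return (FAR + FRR) / 2 at the threshold where the two rates are closest.

    A trial is accepted when its score is >= the threshold.
    """
    best_gap, best_eer = None, None
    for threshold in sorted(set(genuine) | set(impostor)):
        frr = sum(s < threshold for s in genuine) / len(genuine)     # false rejections
        far = sum(s >= threshold for s in impostor) / len(impostor)  # false acceptances
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer

def fuse(scores_a, scores_b, weight=0.5):
    """Weighted-sum score fusion (assumed rule; scores are taken to be
    on comparable scales, which real systems must ensure by normalization)."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(scores_a, scores_b)]

# Hand-picked toy scores: each system alone confuses one trial,
# but the two systems make their mistakes on different trials.
gmm_genuine, gmm_impostor = [1.0, 0.2], [0.5, -1.0]
dtw_genuine, dtw_impostor = [0.1, 0.9], [-0.8, 0.3]

eer_gmm = equal_error_rate(gmm_genuine, gmm_impostor)
eer_fused = equal_error_rate(
    fuse(gmm_genuine, dtw_genuine), fuse(gmm_impostor, dtw_impostor)
)
print(eer_gmm, eer_fused)
```

The toy case only illustrates the mechanism: fusion helps when the two systems err on different trials, which is the complementarity the conclusions point to.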