ESCA Tutorial - PowerPoint PPT Presentation

About This Presentation
Title:

ESCA Tutorial

Description:

Grapheme-to-Allophone transcriptor for continuous speech and multiple pronunciations ... for Spanish due to easy transformation from grapheme to allophone ... – PowerPoint PPT presentation

Slides: 17
Provided by: YO2

Transcript and Presenter's Notes

Title: ESCA Tutorial


1
ESCA Tutorial and Research Workshop: Modelling
Pronunciation Variation for ASR
INTRODUCING MULTIPLE PRONUNCIATIONS IN SPANISH SPEECH
RECOGNITION SYSTEMS
  • Javier Ferreiros, Javier Macías-Guarasa, José M.
    Pardo (GTH UPM), Luis Villarrubia (Telefónica
    ID)

2
Presentation Contents
  • Introduction
  • The strategy applied
  • CSR (continuous speech recognition)
      • Task
      • System Architecture
      • Results
  • ISR (isolated-word recognition)
      • Task
      • System Architecture
      • Results
  • Conclusions and Future Work

3
Introduction (I)
  • Pronunciation variation is a common source of
    recognition errors
  • A rule-based strategy is used to incorporate
    pronunciation alternatives for Spanish
  • Phonetic rules for actual speaking habits and
    context dependencies (not dialectal variation)
    have been explored
  • Alternative pronunciations can occur even within
    the same speaker

4
Introduction (II)
  • The lexicon should account for these different
    possibilities even within the same dialect
  • It is important to study the impact of the rules
    on the lexicon
  • Nearly 20% error rate reduction for the
    continuous speech task
  • No significant change for the isolated-word
    hypothesis generator

5
The strategy applied (I)
  • A grapheme-to-allophone transcriptor for
    continuous speech and multiple pronunciations
  • It handles coarticulation and assimilation
    effects at word boundaries in continuous speech
  • Rules are accurate enough for Spanish because the
    mapping from grapheme to allophone is fairly direct
  • Rules are selected according to expert linguistic
    knowledge of the Castilian Spanish speaking style
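
One way to picture a transcriptor that outputs multiple pronunciations is as a set of optional rewrite rules applied to a canonical transcription, where every combination of rule outcomes becomes a separate lexicon entry. The sketch below is a minimal illustration under that assumption, not the authors' transcriptor: the rule table RULES, the helpers segment and expand, and the single /k s/ rule (modelled on the examen example in these slides) are all hypothetical.

```python
from itertools import product

# Hypothetical optional rule table: a canonical phone pattern maps to the
# realizations allowed for it; the first realization is the canonical one.
RULES = {
    ("k", "s"): [("k", "s"), ("ɣ", "s"), ("s",)],
}


def segment(phones, rules):
    """Split a phone list into chunks, each given as a list of alternatives."""
    chunks = []
    i = 0
    while i < len(phones):
        for pattern, alternatives in rules.items():
            if tuple(phones[i:i + len(pattern)]) == pattern:
                chunks.append(alternatives)
                i += len(pattern)
                break
        else:  # no rule matched: this phone has a single realization
            chunks.append([(phones[i],)])
            i += 1
    return chunks


def expand(phones, rules):
    """Return every distinct pronunciation variant, canonical form first."""
    variants = []
    for choice in product(*segment(phones, rules)):
        variant = [p for chunk in choice for p in chunk]
        if variant not in variants:
            variants.append(variant)
    return variants


for v in expand("e k s á m e n".split(), RULES):
    print(" ".join(v))
# e k s á m e n
# e ɣ s á m e n
# e s á m e n
```

Each rule that fires multiplies the number of entries for a word, which is why the size of the rule set has a direct effect on lexicon growth.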

6
The strategy applied (II)
  • Examples of variations considered
      • DIFFERENT HABITS: examen /e k s a m e n/
          • e k s á m e n
          • e ɣ s á m e n
          • e s á m e n
      • CONTEXT DEPENDENT: bote /b o t e/
          • un bote → ú m b ó t e
          • el bote → e l β ó t e
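
The context-dependent bote example can be sketched as two simplified cross-word rules applied when words are concatenated: a word-final /n/ assimilates to [m] before a bilabial, and /b/ surfaces as the stop [b] after a pause or a nasal but as [β] elsewhere (here, after /l/). This sketch covers only these two cases and is an assumption about how such rules might be coded; join_words and the phone sets are hypothetical names, not part of the described system.

```python
BILABIALS = {"p", "b", "m"}
NASALS = {"m", "n"}  # simplified nasal inventory for this sketch


def join_words(prev, word):
    """Concatenate two phone lists, applying two simplified cross-word rules.

    1. Word-final /n/ assimilates to [m] before a bilabial.
    2. Word-initial /b/ stays a stop [b] after a pause (no previous word)
       or a nasal, and becomes [β] otherwise (e.g. after /l/).
    """
    prev, word = list(prev), list(word)
    if prev and word and prev[-1] == "n" and word[0] in BILABIALS:
        prev[-1] = "m"
    if word and word[0] == "b":
        word[0] = "b" if not prev or prev[-1] in NASALS else "β"
    return prev + word


bote = ["b", "ó", "t", "e"]
print(" ".join(join_words(["ú", "n"], bote)))  # ú m b ó t e
print(" ".join(join_words(["e", "l"], bote)))  # e l β ó t e
```

In a continuous-speech lexicon, rules of this kind give the same word different entries depending on the preceding context (here bote starting with [b] or [β]).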

7
The strategy applied (III)
  • We empirically searched for the minimum number
    of rules that produces significant improvements,
    in order to limit the growth of the lexicon (and
    thus of perplexity)
  • For the isolated-word hypothesis generator, the
    number of rules had to be reduced further so as
    not to worsen the recognition rates
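
The slides do not say how the search over rule subsets was organised. One plausible reading is a forward greedy selection that adds rules one at a time, as long as each addition still brings a worthwhile error reduction and the lexicon stays within a growth budget. The sketch below assumes that procedure; greedy_rule_selection, its callables and its thresholds are hypothetical, and in a real setting error_rate would mean running the recognizer on held-out data.

```python
def greedy_rule_selection(rules, error_rate, lexicon_size, max_growth, min_gain):
    """Forward greedy search for a small, useful subset of pronunciation rules.

    rules        -- candidate rule identifiers
    error_rate   -- callable(list_of_rules) -> error rate on development data
    lexicon_size -- callable(list_of_rules) -> number of lexicon entries
    max_growth   -- largest relative lexicon growth allowed (0.1 means +10%)
    min_gain     -- smallest error reduction that justifies adding a rule
    """
    selected, remaining = [], list(rules)
    base_size = lexicon_size(selected)
    best_err = error_rate(selected)
    while remaining:
        # Score every remaining rule that keeps the lexicon within budget.
        candidates = [
            (error_rate(selected + [r]), r)
            for r in remaining
            if lexicon_size(selected + [r]) / base_size - 1 <= max_growth
        ]
        if not candidates:
            break
        err, rule = min(candidates)
        if best_err - err < min_gain:
            break  # no remaining rule brings a worthwhile improvement
        selected.append(rule)
        remaining.remove(rule)
        best_err = err
    return selected, best_err


# Toy stand-ins for real recognition experiments (hypothetical numbers).
GAINS = {"a": 2.0, "b": 0.5, "c": 1.0}
EXTRA = {"a": 50, "b": 10, "c": 300}


def toy_error(subset):
    return 10.0 - sum(GAINS[r] for r in subset)


def toy_size(subset):
    return 1000 + sum(EXTRA[r] for r in subset)


print(greedy_rule_selection(["a", "b", "c"], toy_error, toy_size,
                            max_growth=0.1, min_gain=0.6))  # (['a'], 8.0)
```

Rule c is never considered because it would grow the toy lexicon by 30%, and rule b is rejected because its gain falls below min_gain.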

8
CSR Task
  • Domain: Navy Resources Management, in Spanish
  • Speaker-dependent task
  • Training: 600 sentences, 4 speakers
  • Test: 100 sentences, the same 4 speakers
  • Base dictionary size: 979 words
  • Extended dictionary size: 1211 entries (+23.7%)
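
The figure in parentheses is the relative growth of the lexicon once the alternative pronunciations are added. It can be checked directly from the two dictionary sizes; the sketch below also reports the mean number of entries per word. The function name is illustrative.

```python
def lexicon_growth(base_words, extended_entries):
    """Return (relative growth in %, mean entries per word), rounded."""
    growth_pct = 100.0 * (extended_entries - base_words) / base_words
    entries_per_word = extended_entries / base_words
    return round(growth_pct, 1), round(entries_per_word, 3)


print(lexicon_growth(979, 1211))  # (23.7, 1.237)
```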

9
CSR System Architecture
  • One-pass algorithm without any grammar
  • In the lexicon, some words have several entries,
    each with an alternative allophone sequence
  • (10 MFCC + energy), delta and delta2 parameter
    sets in 3 different codebooks with 256 centroids
    each
  • Discrete and semicontinuous HMMs for basic
    allophones (47) and triphones (350)
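
With three parameter sets and three codebooks, each frame is vector-quantized once per codebook, and a discrete HMM state then holds one probability table per codebook. The sketch below shows this under the common assumption that the codebook streams are treated as independent, so a state's log emission score is a sum over codebooks. Function names and toy numbers are hypothetical, and the semicontinuous variant (which weights several codewords by their likelihoods instead of using only the closest one) is not covered.

```python
import math


def nearest(codebook, vector):
    """Index of the centroid closest to `vector` (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vector)),
    )


def quantize_frame(codebooks, streams):
    """Map one frame to a codeword index per stream (static, delta, delta2)."""
    return tuple(nearest(cb, x) for cb, x in zip(codebooks, streams))


def log_emission(state_tables, indices):
    """Log output probability of a discrete HMM state with one probability
    table per codebook, assuming the streams are independent."""
    return sum(math.log(table[k]) for table, k in zip(state_tables, indices))


# Toy example: 3 streams, 2-dimensional vectors, 2 centroids per codebook
# (the system described above uses 256 centroids per codebook).
codebooks = [[[0, 0], [1, 1]], [[0, 0], [5, 5]], [[-1, 0], [1, 0]]]
frame = [[0.9, 0.8], [0.1, 0.2], [0.7, 0.0]]
idx = quantize_frame(codebooks, frame)
print(idx)  # (1, 0, 1)
print(log_emission([[0.25, 0.75], [0.5, 0.5], [0.1, 0.9]], idx))
```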

10
CSR Results

11
ISR Task
  • Domain: proper names, telephone environment
  • Hypothesis/verification scheme
  • Tested only on the hypothesis generator so far
  • Training: 5800 words, 3000 speakers
  • Test: 2500 words, 2250 speakers
  • Base dictionary size: 1175 words
  • Extended dictionary size: 1266 entries (+7.7%)
    with the same rules as in the CSR task, and 1193
    entries (+1.5%) when some rules are excluded
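
The two growth figures for the ISR lexicon follow from the base and extended dictionary sizes; recomputing them also shows how few extra entries remain once some rules are excluded (18 instead of 91). The function and configuration names are illustrative.

```python
def growth_report(base_words, configurations):
    """Map each configuration to (extra entries, relative growth in %)."""
    return {
        name: (entries - base_words,
               round(100.0 * (entries - base_words) / base_words, 1))
        for name, entries in configurations.items()
    }


report = growth_report(1175, {"same rules as CSR": 1266, "reduced rule set": 1193})
for name, (extra, pct) in report.items():
    print(f"{name}: +{extra} entries (+{pct}%)")
# same rules as CSR: +91 entries (+7.7%)
# reduced rule set: +18 entries (+1.5%)
```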

12
ISR Hypothesis Generator (I)
  • 8 MFCC + energy and 8 delta MFCC + delta energy,
    in 2 codebooks of 256 centroids each
  • PSBU (Phonetic String Build-Up) generates a string
    of alphabet units (53 allophone-like units) very
    fast
  • Lexical access: a DP algorithm matches the
    phonetic string against the dictionary, which may
    include multiple pronunciations per word
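
Lexical access can be illustrated as a dynamic-programming alignment between the phonetic string produced by PSBU and each pronunciation in the dictionary, scoring a word with several entries by its best-matching variant and keeping the lowest-cost words as candidates. In this sketch the substitution, insertion and deletion costs are uniform placeholders (the system uses alignment costs whose values are not given here), the default n_best=12 only mirrors the 12-best results, and the toy lexicon and function names are hypothetical.

```python
import heapq


def align_cost(hyp, ref, sub=1.0, ins=1.0, dele=1.0):
    """Edit-distance DP between a recognized string `hyp` and a pronunciation `ref`."""
    rows = [[0.0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for j in range(1, len(hyp) + 1):
        rows[0][j] = j * ins  # extra phones in the recognized string
    for i in range(1, len(ref) + 1):
        rows[i][0] = i * dele  # reference phones missing from the string
        for j in range(1, len(hyp) + 1):
            match = 0.0 if ref[i - 1] == hyp[j - 1] else sub
            rows[i][j] = min(rows[i - 1][j - 1] + match,
                             rows[i - 1][j] + dele,
                             rows[i][j - 1] + ins)
    return rows[-1][-1]


def lexical_access(hyp, lexicon, n_best=12):
    """Return the n_best (cost, word) pairs; a word with several
    pronunciations is scored by its best-matching variant."""
    scored = ((min(align_cost(hyp, p) for p in prons), word)
              for word, prons in lexicon.items())
    return heapq.nsmallest(n_best, scored)


lexicon = {
    "examen": [list("eksamen"), list("esamen")],
    "examinar": [list("eksaminar")],
    "resumen": [list("resumen")],
}
print(lexical_access(list("esamen"), lexicon, n_best=2))
# [(0.0, 'examen'), (2.0, 'resumen')]
```

Adding the variant "esamen" is what lets examen match the reduced pronunciation at zero cost; with only the canonical entry it would cost one deletion.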

13
ISR Hypothesis Generator (II)
[Figure: Hypothesis Generator block diagram. Speech → Preprocessing / VQ processes → Phonetic String Build-Up → phonetic string → Lexical Access → list of candidate words. Resources shown: VQ books, HMMs, durations, dictionary, alignment costs, indexes.]

14
ISR Results for the 12 best hypotheses

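For a hypothesis generator, results over an N-best list are usually summarised as an inclusion rate: the share of test words whose correct word appears among the N candidates passed on to the verifier. Assuming that is the measure behind the 12-best results on this slide, it can be computed as follows; the function name and the toy data are hypothetical.

```python
def inclusion_rate(results, n):
    """Percentage of test words whose reference appears among the first n
    candidates returned by the hypothesis generator."""
    hits = sum(1 for reference, candidates in results if reference in candidates[:n])
    return 100.0 * hits / len(results)


# Hypothetical toy results: (reference word, ranked candidate list)
results = [
    ("madrid", ["madrid", "mandril"]),
    ("toledo", ["tomelloso", "toledo"]),
    ("soria", ["sevilla", "segovia"]),
]
print(inclusion_rate(results, 1), inclusion_rate(results, 2))
```

A longer candidate list raises the inclusion rate at the cost of more work for the verifier.
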
15
Conclusions and Future Work (I)
  • For CSR, selecting the appropriate model for each
    context is important when two words are
    concatenated: the rules provide different entries
    depending on context. For ISR, these rules are
    not useful.
  • The acoustic model may not have enough resolution
    to take advantage of the alternatives proposed by
    the rules; these rules should work better in the
    ISR verifier.

16
Conclusions and Future Work (II)
  • It is important to study the real impact of the
    rules on the lexicon. For example, dialectal
    rules should reduce recognition error rates in a
    similar way for both CSR and ISR.
  • We want to test these kinds of rules, together
    with dialectal variability rules, in the verifier
    stage of the ISR system.