ESCA Tutorial

About This Presentation

Slides: 17

Provided by: YO2

Transcript and Presenter's Notes

1
ESCA Tutorial and Research Workshop: Modelling Pronunciation Variation for ASR
INTRODUCING MULTIPLE PRONUNCIATIONS IN SPANISH SPEECH RECOGNITION SYSTEMS

Javier Ferreiros, Javier Macías-Guarasa, José M. Pardo (GTH, UPM), Luis Villarrubia (Telefónica I+D)

2
Presentation Contents

Introduction
The strategy applied
CSR
  Task
  System Architecture
  Results
ISR
  Task
  System Architecture
  Results
Conclusions and Future Work

3
Introduction (I)

Pronunciation variation is a common source of recognition errors
Rule-based strategy to incorporate pronunciation alternatives for Spanish
Phonetic rules for actual speaking habits and context dependencies (not dialectal ones) have been explored
Alternate pronunciations can be found even within the same speaker

4
Introduction (II)

The lexicon should consider these different possibilities even within the same dialect
It is important to study the impact of the rules on the lexicon
Nearly 20% error rate reduction for the continuous speech task
No significant change for the isolated word hypothesis generator case

5
The strategy applied (I)

Grapheme-to-allophone transcriber for continuous speech and multiple pronunciations
It deals with coarticulation and assimilation effects at word boundaries in continuous speech
Rules are accurate enough for Spanish because the transformation from grapheme to allophone is straightforward
Rules are selected according to expert linguistic knowledge of the Castilian Spanish speaking style

6
The strategy applied (II)

Examples of variations considered:
DIFFERENT HABITS: examen /e k s a m e n/
  e k s á m e n
  e ɣ s á m e n
  e s á m e n
CONTEXT DEPENDENT: bote /b o t e/
  un bote: ú m b ó t e
  el bote: e l β ó t e
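
The two kinds of variation on this slide can be illustrated with a minimal Python sketch. It is not the authors' transcriber: the rule table, the function names `habit_variants` and `in_context`, and the symbols `ɣ` and `β` (assumed stand-ins for the weakened allophones in the examples) are all hypothetical, and stress marks are left out for simplicity.

```python
from itertools import product

# Hypothetical "speaking habit" rules: a phone sequence and the alternative
# realizations it may take (the canonical form is listed first).
HABIT_RULES = {
    ("k", "s"): [("k", "s"), ("ɣ", "s"), ("s",)],
}

def habit_variants(phones):
    """Return every pronunciation obtained by applying the habit rules."""
    slots = []
    i = 0
    while i < len(phones):
        pair = tuple(phones[i:i + 2])
        if pair in HABIT_RULES:
            slots.append(HABIT_RULES[pair])   # several alternatives here
            i += 2
        else:
            slots.append([(phones[i],)])      # only one realization
            i += 1
    # Combine one choice per slot and flatten it into a phone tuple.
    return [tuple(p for part in combo for p in part)
            for combo in product(*slots)]

def in_context(prev_phones, phones):
    """Apply two illustrative word-boundary rules for a word-initial /b/."""
    prev, cur = list(prev_phones), list(phones)
    if prev and cur and cur[0] == "b":
        if prev[-1] == "n":
            prev[-1] = "m"    # the nasal assimilates to the following bilabial
        else:
            cur[0] = "β"      # assumed weakened /b/ after a non-nasal sound
    return prev, cur

print(habit_variants("e k s a m e n".split()))
print(in_context("u n".split(), "b o t e".split()))
print(in_context("e l".split(), "b o t e".split()))
```

Each habit rule multiplies the number of lexicon entries for the words it touches, which is why the next slide is concerned with keeping the rule set small.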

7
The strategy applied (III)

We have empirically searched for the minimum number of rules that produces significant improvements, to limit the increase in lexicon size (i.e., perplexity)
For the isolated word hypothesis generator case, a further reduction in the number of rules has been necessary in order not to worsen the recognition rates

8
CSR Task

Domain: Navy Resources Management in Spanish
Speaker-dependent task
Training: 600 sentences, 4 speakers
Test: 100 sentences, the same 4 speakers
Base dictionary size: 979 words
Extended dictionary size: 1211 words (+23.7%)

9
CSR System Architecture

One-pass algorithm without any grammar
In the lexicon some words have several entries, each with an alternative allophone sequence
(10 MFCC + Energy), delta and delta2 parameter sets in 3 different codebooks with 256 centroids each
Discrete and semicontinuous HMMs for basic allophones (47) and triphones (350)
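
As a rough illustration of how a frame with three parameter sets is turned into three codebook symbols for discrete HMMs, here is a minimal sketch of nearest-centroid vector quantization. The function names and the toy codebooks (2 dimensions, 2 centroids each) are assumptions made for readability; the system on this slide uses 256 centroids per codebook, and the semicontinuous models would use the codebooks differently.

```python
def nearest_centroid(vector, codebook):
    """Index of the centroid closest to `vector` (squared Euclidean distance)."""
    def dist2(centroid):
        return sum((v - c) ** 2 for v, c in zip(vector, centroid))
    return min(range(len(codebook)), key=lambda k: dist2(codebook[k]))

def quantize_frame(streams, codebooks):
    """Map each parameter stream of one frame to a symbol of its own codebook."""
    return tuple(nearest_centroid(s, cb) for s, cb in zip(streams, codebooks))

# Toy codebooks for the static, delta and delta2 streams (hypothetical values).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [2.0, 2.0]],
    [[-1.0, -1.0], [1.0, 1.0]],
]
frame = ([0.9, 0.8], [0.1, 0.2], [0.5, 0.4])
print(quantize_frame(frame, codebooks))
```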

10
CSR Results

11
ISR Task

Domain: Proper names, telephone environment
Hypothesis/Verification scheme
Tested on the Hypothesis Generator so far
Training: 5800 words, 3000 speakers
Test: 2500 words, 2250 speakers
Base dictionary size: 1175 words
Extended dictionary size: 1266 words (+7.7%) with the same rules as in the CSR task, and 1193 words (+1.5%) excluding some rules
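
The figures in parentheses on the two task slides read as relative increases in dictionary size. A short check, using only the word counts given on the slides (the helper name `growth_percent` is an assumption):

```python
def growth_percent(base, extended):
    """Relative increase in dictionary size, in percent, to one decimal."""
    return round(100.0 * (extended - base) / base, 1)

print(growth_percent(979, 1211))    # CSR, full rule set
print(growth_percent(1175, 1266))   # ISR, same rules as in the CSR task
print(growth_percent(1175, 1193))   # ISR, excluding some rules
```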

12
ISR Hypothesis Generator (I)

8 MFCC + Energy, 8 delta MFCC + delta Energy in 2 codebooks of 256 centroids each
PSBU (Phonetic String Build-Up) very quickly generates a string of alphabet units (53 allophone-like units)
Lexical Access: DP algorithm to match the phonetic string against the dictionary, where multiple pronunciations may be included
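
The lexical access step can be sketched as a dynamic-programming alignment between the phonetic string and every dictionary entry, where a word with several pronunciations keeps the cost of its best-matching one. This is only a sketch under simplifying assumptions: it uses unit edit costs rather than the system's own alignment costs, and the function names and the toy lexicon are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phone sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (pa != pb)))   # substitution or match
        prev = cur
    return prev[-1]

def lexical_access(phone_string, lexicon, n_best=12):
    """Rank words by the cost of their best-matching pronunciation."""
    scored = []
    for word, pronunciations in lexicon.items():
        cost = min(edit_distance(phone_string, p) for p in pronunciations)
        scored.append((cost, word))
    scored.sort()
    return [word for cost, word in scored[:n_best]]

# Toy lexicon: "examen" has two entries, "bote" has one.
lexicon = {
    "examen": ["e k s a m e n".split(), "e s a m e n".split()],
    "bote": ["b o t e".split()],
}
print(lexical_access("e s a m e n".split(), lexicon))
```

With only the canonical entry, the recognized string `e s a m e n` would match "examen" at a cost of 1; the extra entry brings the cost to 0, at the price of a larger dictionary to search.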

13
ISR Hypothesis Generator (II)

[Block diagram of the Hypothesis Generator. Labels: Speech; Preprocessing, VQ processes; VQ books; Phonetic String Build-Up; HMMs; Durations; Phonetic string; Lexical Access; Dictionary; Indexes; Alignment costs; List of Candidate Words]

14
ISR Results for the 12 best hypotheses

15
Conclusions and Future Work (I)

The selection of the appropriate model for each context is important when two words are concatenated. For CSR: rules for different entries depending on context. For ISR these rules are not useful.
The acoustic model may not have enough resolution to take advantage of the alternatives proposed by the rules; these rules should work better in the verifier for ISR.

16
Conclusions and Future Work (II)

It is important to study the real impact of the rules on the lexicon. For example, dialectal rules should reduce recognition error rates in a similar way for both CSR and ISR.
We want to test these kinds of rules, plus dialectal variability rules, on the verifier stage of the ISR system.
