Digit Recognition Using the SPEECHDAT Corpus

1
Robust Recognition of Digits and Natural Numbers
Frederico Rodrigues and Isabel Trancoso
INESC/IST, 2000
2
Summary

Problem overview
Baseline system
Extensions to the baseline system
Conclusions and future work

3
The Problem
4
Corpus Description

Multilingual telephone speech corpus
SPEECHDAT(M) 1000 speakers
SPEECHDAT(II) 4000 speakers
Orthographically transcribed, including noise events

5
Noise events

spk Speaker related noises
sta Stationary noises
int Intermittent noises
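
As a rough illustration of how these marks could be used to select clean utterances, the sketch below checks a transcription for the three tags listed on this slide. The square-bracket syntax (for example `[spk]`) and the function names are assumptions made for illustration; the slides do not show the actual transcription format.

```python
import re

# Tags taken from the slide; the bracketed form "[spk]" is an assumption.
NOISE_TAGS = ("spk", "sta", "int")
NOISE_PATTERN = re.compile(r"\[(%s)\]" % "|".join(NOISE_TAGS))


def has_noise_mark(transcription):
    """Return True if the transcription contains any of the noise tags."""
    return NOISE_PATTERN.search(transcription) is not None


def strip_noise_marks(transcription):
    """Remove the noise tags and collapse the remaining whitespace."""
    return " ".join(NOISE_PATTERN.sub(" ", transcription).split())
```

A selection step could then keep only utterances for which `has_noise_mark` returns False.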

6
(No Transcript)
7
Train and Test Set Definition

Selection procedure
Age, gender and region distribution are approximately equal in both train and test sets
SPEECHDAT(II)
Fixed 500-speaker evaluation set
Additional 300-speaker development set
SPEECHDAT(M)
200-speaker evaluation set
Overall ratio of 80% train / 20% test
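
One way to obtain sets with similar age, gender and region distributions is to split speakers stratum by stratum. The sketch below is only an illustration of that idea, not the selection procedure used for the corpus; the record keys (`id`, `age_band`, `gender`, `region`) and the function name are hypothetical.

```python
import random
from collections import defaultdict


def stratified_speaker_split(speakers, test_fraction=0.2, seed=0):
    """Split speaker ids into (train, test), keeping each stratum balanced.

    `speakers` is a list of dicts with the hypothetical keys
    'id', 'age_band', 'gender' and 'region'.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for spk in speakers:
        key = (spk["age_band"], spk["gender"], spk["region"])
        strata[key].append(spk["id"])

    train, test = [], []
    for key in sorted(strata):
        ids = sorted(strata[key])
        rng.shuffle(ids)
        # Take roughly test_fraction of every stratum for the test set.
        n_test = round(len(ids) * test_fraction)
        test.extend(ids[:n_test])
        train.extend(ids[n_test:])
    return train, test
```

Splitting by speaker rather than by utterance keeps every speaker entirely in one set, so test speakers are never seen in training.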

8
Sub-corpus Used

I1 - Isolated digit strings
B1 - Sequences of 10 digits
N - Natural numbers

9
Feature Extraction

MFCC (Mel Frequency Cepstral Coefficients)
14 cepstra + 14 Δ cepstra + energy + Δ energy
Speech signal band-limited between 200 and 3800 Hz
25 ms Hamming window every 10 ms
Cepstral Mean Subtraction
Simple but effective technique for channel and speaker normalization
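
The two post-processing steps named above can be sketched in a few lines. This is a minimal illustration, assuming the static features (14 cepstra plus energy, 15 values per frame) have already been computed; the delta formula used here is a simple two-point difference chosen for brevity, since the slides do not say which delta window was used.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-utterance mean from every coefficient."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]


def delta(frames):
    """Two-point difference (x[t+1] - x[t-1]) / 2, repeating the edge frames.

    Assumption: real systems often use a longer regression window instead.
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def add_deltas(frames):
    """Append delta coefficients to each static frame (15 -> 30 values)."""
    return [s + d for s, d in zip(frames, delta(frames))]
```

Because a fixed channel adds a constant offset in the cepstral domain, removing the utterance mean cancels that offset, which is why the technique helps with channel normalization.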

10
Acoustic Modeling

Left-right continuous density HMMs
Word models for each digit. No skips.
Silence and filler models with forward and backward skips
Gender dependent models

HMM: Hidden Markov Model
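
To make the "left-right, no skips" constraint concrete, the sketch below builds a transition matrix in which each state can only stay where it is or move to the next state. The number of states, the initial self-loop probability and the treatment of the last state are assumptions for illustration; the slides do not give these values.

```python
def left_right_transitions(n_states, self_loop=0.5):
    """Transition matrix for a left-right HMM with no skip transitions.

    Row i holds the probabilities of moving from state i to every state.
    Only A[i][i] (stay) and A[i][i + 1] (advance) are non-zero.
    """
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i + 1 < n_states:
            A[i][i] = self_loop
            A[i][i + 1] = 1.0 - self_loop
        else:
            # Simplification: the last state only loops on itself here;
            # a real decoder would also model the exit from the word.
            A[i][i] = 1.0
    return A
```

A model with skips, as used for silence and fillers, would additionally allow non-zero entries such as `A[i][i + 2]` or `A[i][i - 1]`.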
11
Model Topology
Filler and silence model topology
12
Baseline System - Isolated Digits

Choose isolated digits with no noise marks
HMM parameters initialized with the global mean and variance of the training data
Embedded Baum-Welch re-estimation
Evaluate performance with Viterbi decoding
Grammar allowing one digit and initial and final silence
Grammar allowing one digit and any number of fillers or silence
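
The initialization step (often called a flat start) can be sketched as follows: every state of every model begins with the same Gaussian, whose mean and variance are computed over all training frames. The data layout and function name are hypothetical, and the sketch assumes diagonal covariances, which the slides do not state.

```python
def flat_start(frames, n_states):
    """Give every state the global mean and variance of the training frames.

    `frames` is a list of feature vectors pooled over the training data.
    Returns one {'mean', 'var'} dict per state (diagonal covariance assumed).
    """
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return [{"mean": list(mean), "var": list(var)} for _ in range(n_states)]
```

Since all states start out identical, it is the subsequent embedded Baum-Welch re-estimation that makes them specialize to different parts of each word.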

13
Baseline System - Isolated Digits
14
Baseline System - Isolated Digits

Increment Gaussian mixtures per state up to 3 for the digit models
Introduce files with noise marks
Repeat re-estimation/evaluation process
Increment Gaussian mixtures per state up to 3 for the filler and digit models
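
The slides do not say how the number of mixture components is increased. One common approach, assumed here purely for illustration, is to split the heaviest component into two copies with slightly perturbed means and then re-estimate. The data layout, the function name and the 0.2 standard deviation offset are all assumptions.

```python
import math


def split_heaviest(mixture, perturb=0.2):
    """Add one component by splitting the one with the largest weight.

    `mixture` is a list of dicts with keys 'weight', 'mean' and 'var'
    (diagonal covariance). The input list is not modified.
    """
    idx = max(range(len(mixture)), key=lambda i: mixture[i]["weight"])
    comp = mixture[idx]
    # Move the two copies apart by a fraction of the standard deviation.
    offset = [perturb * math.sqrt(v) for v in comp["var"]]
    half = comp["weight"] / 2.0
    plus = {"weight": half,
            "mean": [m + o for m, o in zip(comp["mean"], offset)],
            "var": list(comp["var"])}
    minus = {"weight": half,
             "mean": [m - o for m, o in zip(comp["mean"], offset)],
             "var": list(comp["var"])}
    return mixture[:idx] + [plus, minus] + mixture[idx + 1:]
```

Calling the function twice on a single-Gaussian state gives 3 components, matching the "up to 3" step on this slide, with a re-estimation pass expected after each increment.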

15
Connected vs Isolated Digits
Example: the number 3 1 2 6 said as
Isolated digits: t r e S u d o j S s 6 j S
Connected digits: t r e z u d o j S _ 6 j S
16
Baseline System - Connected Digits

Use best isolated digit models as bootstrap models
Repeat re-estimation/evaluation process
Gradually increment Gaussian mixtures per state up to 5 for the digit models

17
Baseline System - Results
18
Extension to the Baseline System

New way of modelling the filler models
Same training/evaluation process
Train the 9 filler and silence models with no skips
Build a single filler model by concatenating all filler and silence models

19
New Filler Model Architecture
20
Results With New Filler Model
21
Natural Numbers

Phone models with 3 states and no skips
Larger vocabulary size
May be adapted to other tasks
Phones initialized from models already trained for a directory assistance task
Digits are still modeled by word models
Grammar for natural numbers ranging from zero to hundreds of millions

22
Natural Numbers Example
Number: 25
Hypothesis 1: vinte e cinco (twenty and five)
Hypothesis 2: vinte cinco (twenty five)
But "vinte cinco" could also be the sequence of natural numbers 20 5
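
The ambiguity can be reproduced with a toy parser that lists every way to read a word sequence as numbers below 100. The vocabulary and the function below are a hypothetical fragment written for this illustration; they are not the grammar used in the system, which covers values up to hundreds of millions.

```python
# Tiny hypothetical vocabulary, enough to reproduce the example.
UNITS = {"um": 1, "dois": 2, "cinco": 5}
TENS = {"vinte": 20, "trinta": 30}


def readings(words):
    """Return every reading of `words` as a sequence of numbers below 100."""
    if not words:
        return [[]]
    results = []
    head = words[0]
    if head in UNITS:
        results += [[UNITS[head]] + rest for rest in readings(words[1:])]
    if head in TENS:
        # Reading 1: the tens word is a number on its own.
        results += [[TENS[head]] + rest for rest in readings(words[1:])]
        # Reading 2: tens plus unit, with or without the connective "e".
        i = 1
        if len(words) > i and words[i] == "e":
            i += 1
        if len(words) > i and words[i] in UNITS:
            value = TENS[head] + UNITS[words[i]]
            results += [[value] + rest for rest in readings(words[i + 1:])]
    return results
```

With this fragment, `readings(["vinte", "cinco"])` yields both `[20, 5]` and `[25]`, while `readings(["vinte", "e", "cinco"])` yields only `[25]`, which mirrors the ambiguity described on the slide.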
23
Natural Numbers - Results
24
Sample Application
25
Conclusions and Future Work