Ch 5b: Discriminative Training (temporal model)

1
Ch 5b Discriminative Training (temporal model)

14.2.2002 Ilkka Aho

2
Abbreviations

MCE Minimum Classification Error
MMI Maximum Mutual Information
STLVQ Shift-Tolerant Learning Vector
Quantization
TDNN Time-Delay Neural Network
HMM Hidden Markov Model
DP Dynamic Programming
DTW Dynamic Time Warping
GPD Generalized Probabilistic Descent
PBMEC Prototype-Based Minimum Error Classifier

3
Basics

Prototype-based methods use class representatives
(sample or an average of samples) to classify new
patterns
The MCE framework is used for discriminative
training (also MMI is possible)
A central concern is the design or learning of
prototypes that will yield good classification
performance
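As a minimal sketch of the prototype idea above: each class keeps one or more representatives (a stored sample or an average of samples), and a new pattern takes the label of the nearest one. The function names, the labels and the toy vectors are hypothetical and chosen only for illustration.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def average(samples):
    """Component-wise mean of a list of vectors (one way to build a prototype)."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]

def classify(x, prototypes):
    """Return the label of the prototype nearest to x.

    prototypes: dict mapping a class label to a list of prototype vectors.
    """
    best_label, best_dist = None, math.inf
    for label, refs in prototypes.items():
        for ref in refs:
            d = euclidean(x, ref)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label

# Hypothetical toy data: one averaged prototype and one stored sample.
prototypes = {
    "a": [average([[0.0, 0.0], [1.0, 1.0]])],
    "i": [[4.0, 4.0]],
}
print(classify([1.0, 0.5], prototypes))
```

Discriminative training changes where the prototypes sit; the decision rule itself stays this simple.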

4
STLVQ for Speech Recognition

LVQ algorithm in its basic form is a method for
static pattern recognition
STLVQ handles a stream of dynamically varying
patterns (fig. 1.)
STLVQ is much simpler than the TDNN model, yet it
yielded very good results on the same phoneme
recognition tasks

5
Figure 1. STLVQ system architecture.
6
Limitations and Strengths of STLVQ

STLVQ assumes only a single phoneme as an input
token
Training and testing datasets are obtained from
manually labeled speech databases
How to extend the phoneme recognition to word or
sentence recognition?
LVQ is applied locally

7
Expanding the Scope of LVQ for Speech Recognition

Representation of longer speech sequences such as
entire utterances
Global optimization
Application to continuous speech recognition
A need for some kind of time warping or
normalization
How to merge the discriminative power of LVQ with
the sequential modeling abilities of HMMs?
Two methods: LVQ-HMM (fig. 2) and HMM-LVQ (fig. 3)

8
Figure 2. LVQ-HMM architecture.
9
Figure 3. HMM-LVQ architecture.
10
MCE Interpretation of LVQ

A prototype-based implementation of the MCE
framework
The LVQ classification rule is based on the
Euclidean distance between a pattern vector and
each category's reference vectors
The category of the nearest reference vector is
given as the classification decision
Figures 4, 5 and 6 demonstrate the smoothness of
MCE loss
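The contrast between the zero-one loss and the smooth MCE loss can be sketched as follows. This assumes squared Euclidean distances, a misclassification measure that compares only the nearest correct reference vector with the nearest rival one, and a sigmoid whose parameter a plays the role of the value quoted in the captions of figures 5 and 6. These are simplifying assumptions for illustration, not necessarily the exact form used in the slides.

```python
import math

def min_sq_dist(x, refs):
    """Squared Euclidean distance from x to the nearest vector in refs."""
    return min(sum((a - b) ** 2 for a, b in zip(x, r)) for r in refs)

def misclassification_measure(x, label, prototypes):
    """Negative when x is classified correctly, positive when it is not."""
    d_correct = min_sq_dist(x, prototypes[label])
    d_rival = min(min_sq_dist(x, refs)
                  for k, refs in prototypes.items() if k != label)
    return d_correct - d_rival

def zero_one_loss(d):
    """Ideal loss: counts an error, but is a step function of d."""
    return 1.0 if d > 0 else 0.0

def sigmoid_loss(d, a):
    """Smooth approximation of the zero-one loss; larger a means a sharper step."""
    return 1.0 / (1.0 + math.exp(-a * d))

# Hypothetical one-dimensional, two-class example.
prototypes = {"A": [[0.0]], "B": [[4.0]]}
d = misclassification_measure([1.0], "A", prototypes)
print(d, zero_one_loss(d), sigmoid_loss(d, 0.1), sigmoid_loss(d, 1.0))
```

With a small a the loss changes slowly as the prototypes move, which gives a smooth surface for gradient-based training. With a larger a it approaches the zero-one step.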

11
Figure 4. Average empirical loss measured over 10
samples from a one-dimensional, two-class
classification problem. The ideal zero-one
loss is used in calculating the overall loss.
12
Figure 5. Now a sigmoidal MCE loss, a = 0.1, is
used in calculating the overall loss.
13
Figure 6. The same situation as in figure 5,
except that a = 1.0 now.
14
Prototype-based Methods Using DP

DP is used to find the path through a grid of
local matches between prototype and test sample
frames that has the best overall score
When calculating the distance between the input
utterance and the reference utterance, it is more
practical to use the top path or the top few paths
than every possible DP path
Nonlinear compression and stretching of prototypes
DTW is a specific application of DP techniques to
speech processing
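A minimal DTW sketch, assuming the common symmetric step pattern (match, insertion, deletion) with no slope constraints or path weights. The frames here are plain numbers and the local distance is passed in as a function; both are illustrative choices rather than anything specified in the slides.

```python
import math

def dtw_distance(seq_a, seq_b, local_dist):
    """Accumulated cost of the best warping path between two frame sequences."""
    n, m = len(seq_a), len(seq_b)
    # acc[i][j] = best accumulated cost aligning the first i frames of seq_a
    # with the first j frames of seq_b.
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = local_dist(seq_a[i - 1], seq_b[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j],      # stretch seq_b frame
                                   acc[i][j - 1],      # stretch seq_a frame
                                   acc[i - 1][j - 1])  # one-to-one match
    return acc[n][m]

def abs_diff(a, b):
    return abs(a - b)

# The second sequence is the first one with a repeated frame.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3], abs_diff))
```

Only the single best path contributes to the returned score, which matches the "top path" choice mentioned above.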

15
MCE-Trained Prototypes and DTW

The idea is to define the MCE loss in terms of a
discriminant function that reflects the structure
of a straightforward DTW-based recognizer
The loss function has to be continuous and
differentiable so that a gradient-based
optimization technique (for example GPD) can be
used to minimize the overall loss
The loss function also has to reflect
classification performance
Good results in the Bell Labs E-set task and in
phoneme recognition tasks
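One common way to write these requirements down is sketched below. The notation is an assumption made for illustration: R_k is the set of prototypes of category k, D_DTW is the DTW distance between utterance X and prototype r, a is the sigmoid parameter, and Lambda stands for all prototype parameters.

```latex
% Discriminant: the better the best DTW match for category k, the larger g_k
g_k(X;\Lambda) = -\min_{r \in R_k} D_{\mathrm{DTW}}(X, r)

% Misclassification measure for an utterance X of category k
d_k(X;\Lambda) = -g_k(X;\Lambda) + \max_{j \neq k} g_j(X;\Lambda)

% Smooth loss that approximates the zero-one error count
\ell(d_k) = \frac{1}{1 + e^{-a\, d_k}}

% GPD update with step size \epsilon_t on training token X_t
\Lambda_{t+1} = \Lambda_t - \epsilon_t \, \nabla_{\Lambda}\, \ell\bigl(d_k(X_t;\Lambda_t)\bigr)
```

The hard min and max are only piecewise differentiable. Smoothed versions are often used in their place and are omitted here for brevity.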

16
PBMEC

PBMEC models prototypes at a finer grain than
MCE-trained DTW
PBMEC prototypes are modeled within phonetic or
subphonetic states
Word models are formed by connecting different
states together
Multi-state PBMEC (fig. 7.)
The discriminant function for a category is
defined as the final accumulated score of the
best DP path for that category (fig. 8.)
MCE-GPD update rule for PBMEC pulls the nearest
reference vectors for the correct category closer
to the input and pushes the nearest reference
vectors for the incorrect category away
MCE-GPD in the context of speech recognition
using phoneme models (fig. 9.)
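The pull/push behaviour of the update can be sketched for a single static pattern vector. This is a simplification: in PBMEC the same kind of correction would be applied along the best DP path, which is not modelled here. The function name, the step size and the use of squared Euclidean distance are assumptions for illustration.

```python
import math

def mce_gpd_step(x, correct_refs, rival_refs, a=1.0, step=0.1):
    """One gradient step on the sigmoid MCE loss for pattern x.

    Returns (index, updated vector) for the nearest correct and nearest
    rival reference vector, plus the loss before the update.
    """
    def sq_dist(r):
        return sum((xi - ri) ** 2 for xi, ri in zip(x, r))

    ic = min(range(len(correct_refs)), key=lambda i: sq_dist(correct_refs[i]))
    ir = min(range(len(rival_refs)), key=lambda i: sq_dist(rival_refs[i]))

    # Misclassification measure and smooth loss.
    d = sq_dist(correct_refs[ic]) - sq_dist(rival_refs[ir])
    loss = 1.0 / (1.0 + math.exp(-a * d))

    # d(loss)/d(d) = a * loss * (1 - loss); the factor 2 comes from the
    # derivative of the squared distance.
    scale = 2.0 * step * a * loss * (1.0 - loss)

    pulled = [ri + scale * (xi - ri) for xi, ri in zip(x, correct_refs[ic])]
    pushed = [ri - scale * (xi - ri) for xi, ri in zip(x, rival_refs[ir])]
    return ic, pulled, ir, pushed, loss

# Hypothetical example: x lies midway between a correct and a rival vector.
ic, pulled, ir, pushed, loss = mce_gpd_step([1.0], [[0.0]], [[2.0]])
print(pulled, pushed, loss)
```

The correction is largest for tokens near the decision boundary, where loss * (1 - loss) peaks, and fades for tokens that are clearly right or clearly wrong.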

17
Figure 7. Multi-state PBMEC architecture.
18
Figure 8. Final DP score.
19
Figure 9. DP segmentations for the words aida
and taira.
20
HMM design based on MCE

The prototype-like nature of HMMs
The MCE framework can be applied to HMMs in much
the same way as in the case of the PBMEC model
HMM state likelihood and discriminant function
MCE misclassification measure and loss
Calculating the MCE gradient for HMMs
There are a very large number of applications of
MCE-trained HMMs
Some of the best context-independent results have
been reported for the Texas Instruments-Massachusetts
Institute of Technology (TIMIT) database
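One possible choice of HMM discriminant function is the log-likelihood of the best state path (the Viterbi score), which parallels the best DP path score used for PBMEC. The sketch below assumes that choice. The emission model is passed in as a function, and the path may end in any state. Both are simplifications for illustration.

```python
import math

def viterbi_log_score(obs, log_init, log_trans, log_emit):
    """Log-likelihood of the best state path for an observation sequence.

    log_init[s]     : log probability of starting in state s
    log_trans[p][s] : log probability of moving from state p to state s
    log_emit(s, o)  : log likelihood of observation o in state s
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit(s, obs[0]) for s in range(n_states)]
    for o in obs[1:]:
        score = [
            max(score[p] + log_trans[p][s] for p in range(n_states))
            + log_emit(s, o)
            for s in range(n_states)
        ]
    return max(score)

# Hypothetical two-state left-to-right model with a toy emission function.
NEG_INF = -math.inf
log_init = [0.0, NEG_INF]
log_trans = [[math.log(0.5), math.log(0.5)],
             [NEG_INF, 0.0]]

def log_emit(state, o):
    # State 0 prefers observations near 0, state 1 prefers observations near 1.
    return -abs(o - state)

g = viterbi_log_score([0.0, 0.0, 1.0], log_init, log_trans, log_emit)
print(g)
```

Feeding such scores, one per category, into the misclassification measure and the sigmoid loss gives the MCE criterion for HMMs.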

21
Homework Question
Explain the main differences between the following
methods in speech recognition