1
Isolated-Word Speech Recognition Using Hidden Markov Models
  • 6.962 Week 10 Presentation

Irina Medvedev
Massachusetts Institute of Technology
April 19, 2001
2
Outline
  • Markov Processes, Chains, Models
  • Isolated-Word Speech Recognition
  • Feature Analysis
  • Unit Matching
  • Training
  • Recognition
  • Conclusions

3
Markov Process
  • The process, x(t), is first-order Markov if, for any set of
    ordered times t1 < t2 < ... < tn,
    p( x(tn) | x(tn-1), ..., x(t1) ) = p( x(tn) | x(tn-1) )
  • The current value of a Markov process carries all of the
    memory necessary to predict the future
  • The past does not add any additional information about the
    future

4
Markov Process
  • The transition probability density provides an important
    statistical description of a Markov process and is defined as
    p( x(tn) | x(tn-1) )
  • A complete specification of a Markov process consists of
  • the first-order density p( x(t1) )
  • the transition density p( x(tn) | x(tn-1) )

5
Markov Chains
  • A Markov chain can be used to describe a system which, at any
    time, belongs to one of N distinct states, S1, S2, ..., SN
  • At regularly spaced times, the system may stay in the same
    state or transition to a different state
  • The state at time t is denoted by qt

Fully Connected Markov Model
6
Markov Chains
  • State transitions are made according to a set of probabilities
    associated with each state
  • These probabilities are stored in the N x N state transition
    matrix A = { aij }, where N is the number of states in the
    Markov chain
  • The state transition probabilities are
    aij = P( qt = Sj | qt-1 = Si )
  • and have the properties aij ≥ 0 and Σj aij = 1 (each row of A
    sums to one), as illustrated in the sketch below
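As an illustration (not from the original slides), a minimal Python
sketch of a small Markov chain with an assumed 3-state transition
matrix: it verifies the row-sum property and samples a state sequence.

import numpy as np

# Hypothetical 3-state transition matrix A; A[i, j] = P(q_t = S_j | q_{t-1} = S_i)
A = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.2, 0.7]])

# Each row must be a valid probability distribution: a_ij >= 0 and rows sum to 1
assert np.all(A >= 0) and np.allclose(A.sum(axis=1), 1.0)

def sample_chain(A, q0, T, rng=np.random.default_rng(0)):
    """Sample a length-T state sequence starting from state q0."""
    q = [q0]
    for _ in range(T - 1):
        q.append(rng.choice(len(A), p=A[q[-1]]))
    return q

print(sample_chain(A, q0=0, T=10))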

7
Hidden Markov Models
  • Hidden Markov Models (HMMs) are used when the
    states are not observable events.
  • Instead, the observation is a probabilistic
    function of the state rather than the state
    itself
  • The states are described by a probability model
  • The HMM is a doubly embedded stochastic process

8
HMM Example: Coin Toss
  • How do we build an HMM to explain an observed sequence of
    heads and tails?
  • Choose a 2-state model
  • Several possibilities exist (a simulation sketch of the hidden
    case follows)

1-coin Model: Observable
2-coin Model: States are Hidden
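A minimal Python sketch (with assumed, illustrative probabilities) of
the 2-coin model: a hidden two-state Markov chain selects which biased
coin is tossed, and only the heads/tails outcome is observed.

import numpy as np

rng = np.random.default_rng(1)

# Hidden 2-coin model (probabilities assumed for illustration)
A = np.array([[0.7, 0.3],       # transitions between coin 1 and coin 2
              [0.4, 0.6]])
p_heads = np.array([0.9, 0.2])  # P(heads | coin): each coin has its own bias

state, observations = 0, []
for _ in range(20):
    state = rng.choice(2, p=A[state])    # hidden: which coin is tossed
    observations.append('H' if rng.random() < p_heads[state] else 'T')

print(''.join(observations))  # only H/T is visible; the coin sequence is hidden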
9
Hidden Markov Models
  • Hidden Markov Models are characterized by
  • N, the number of states in the model
  • A, the state transition matrix
  • B, the observation probability distribution for each state
  • π, the initial state distribution
  • Model parameter set λ = ( A, B, π ), as sketched below
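One plausible way to hold the parameter set λ in code, sketched in
Python; the Gaussian per-state observation model anticipates the
cepstral modeling described later, and all names are illustrative
rather than from the original slides.

from dataclasses import dataclass
import numpy as np

@dataclass
class HMM:
    """Parameter set lambda = (A, B, pi) with Gaussian observation densities."""
    A: np.ndarray      # (N, N) state transition matrix
    means: np.ndarray  # (N, L) per-state mean vectors (the "B" distributions)
    vars_: np.ndarray  # (N, L) per-state diagonal covariances
    pi: np.ndarray     # (N,) initial state distribution

    @property
    def N(self) -> int:
        return self.A.shape[0]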

10
Left-Right HMM
  • States can only transition to a higher-numbered state or stay
    in the same state
  • The no-skip constraint allows a state to transition only to
    the next state or remain in the same state
  • Zeros in the state transition matrix represent illegal state
    transitions (see the sketch below)

4-state left-right HMM with no skip transitions
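A small Python sketch of such a matrix, with assumed self-loop
probabilities; the zero entries are the illegal transitions.

import numpy as np

def left_right_A(N, stay=0.7):
    """No-skip left-right transition matrix: only a_ii and a_i,i+1 are nonzero."""
    A = np.zeros((N, N))
    for i in range(N - 1):
        A[i, i] = stay               # remain in the same state
        A[i, i + 1] = 1.0 - stay     # advance to the next state
    A[-1, -1] = 1.0                  # final state absorbs
    return A

print(left_right_A(4))               # zeros mark illegal (backward or skip) transitions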
11
Isolated-Word Speech Recognition
  • Recognize one word at a time
  • Assume the incoming signal is of the form
    silence - speech - silence
  • The system consists of four blocks: Feature Analysis, Unit
    Matching, Training, and Recognition

12
Feature Analysis
  • We perform feature analysis to extract the observation vectors
    upon which all subsequent processing is performed
  • The discrete-time speech signal is s(n), with discrete Fourier
    transform S(k)
  • To reduce the dimensionality of the V-dim speech vector, we
    use cepstral coefficients, which serve as the feature
    observation vector for all further processing

13
Cepstral Coefficients
  • Feature vectors are cepstral coefficients obtained from the
    sampled speech vector as the inverse DFT of the log spectrum,
    c(n) = IDFT{ log P(k) }
  • where P(k) = |S(k)|² / V is the periodogram estimate of the
    power spectral density of the speech
  • We eliminate the zeroth component and keep cepstral
    coefficients 1 through L-1 (see the sketch below)
  • Dimensionality reduction: from the V-dim speech vector to an
    (L-1)-dim feature vector
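A minimal numpy sketch of this computation under the definitions
above; the frame length and the cepstral order L are illustrative
choices.

import numpy as np

def cepstral_features(frame, L=13):
    """Cepstral coefficients 1..L-1 of a speech frame via the periodogram."""
    V = len(frame)
    S = np.fft.fft(frame)                      # discrete Fourier transform
    P = np.abs(S) ** 2 / V                     # periodogram estimate of the PSD
    c = np.fft.ifft(np.log(P + 1e-12)).real    # real cepstrum (small floor avoids log(0))
    return c[1:L]                              # drop c[0]; keep coefficients 1..L-1

rng = np.random.default_rng(0)
frame = rng.standard_normal(256)               # stand-in for one windowed speech frame
print(cepstral_features(frame).shape)          # (12,)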

14
Properties of Cepstral Coefficients
  • Serve to undo the convolution between the pitch (excitation)
    and the vocal tract response
  • High-order cepstral components carry speaker-dependent pitch
    information, which is not relevant for speech recognition
  • Cepstral coefficients are well approximated by a Gaussian
    probability density function (pdf)
  • Correlation values between cepstral coefficients are very low,
    which justifies a diagonal covariance model

15
Modeling of Cepstral Coefficients
  • The HMM assumes that the Markovian states generate the
    cepstral vectors
  • Each state i represents a Gaussian source with mean vector μi
    and covariance matrix Σi
  • Each feature vector of cepstral coefficients can be modeled as
    a sample of an L-dim Gaussian random vector with mean vector μi
    and diagonal covariance matrix Σi (a scoring sketch follows)
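A sketch of the per-state scoring this implies: the log-density of a
cepstral vector under a diagonal-covariance Gaussian (function and
variable names are my own, not the authors').

import numpy as np

def log_gaussian_diag(o, mean, var):
    """log N(o; mean, diag(var)) for one L-dim cepstral feature vector o."""
    L = o.shape[0]
    return -0.5 * (L * np.log(2 * np.pi)
                   + np.sum(np.log(var))
                   + np.sum((o - mean) ** 2 / var))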

16
Formulation of the Feature Vectors
17
Unit Matching
  • Initial goal: obtain an HMM for each speech recognition unit
  • Large vocabulary (300 words or more): recognition units are
    phonemes
  • Small vocabulary (on the order of 10 words): recognition units
    are words
  • We will consider an isolated-word speech recognition system
    for a small vocabulary of M words

18
Notation
  • Observation vector is O = ( o1, o2, ..., oT ), where each ot
    is a cepstral feature vector and T is the number of feature
    vectors in an observation
  • State sequence is q = ( q1, q2, ..., qT ), where each
    qt ∈ { 1, ..., N }
  • State index: i, j
  • Word index: v = 1, ..., M
  • Time index: t
  • The term model will be used for both the HMM and the parameter
    set describing the HMM, λ

19
Training
  • We need to obtain an HMM for each of the M words
  • The process of building the HMMs is called training
  • Each HMM is characterized by the number of states, N, and the
    model parameter set, λv
  • Each cepstral feature vector, ot, in state i can be modeled by
    an L-dim Gaussian pdf
    bi( ot ) = N( ot ; μi, Σi )
  • where μi is the mean vector and Σi is the covariance matrix in
    state i

20
Training
  • A Gaussian pdf is completely characterized by its mean vector
    and covariance matrix
  • The model parameter set can therefore be modified to
    λv = ( A, {μi}, {Σi} )
  • The training procedure is the same for each word. For
    convenience, we will drop the word subscript v from λv

21
Building the HMM
  • To build the HMM, we need to determine the parameter set that
    maximizes the likelihood of the observation for that word
  • Objective:
    λ* = argmax over λ of [ max over q of P( O, q | λ ) ]
  • The double maximization can be performed by optimizing over
    the state sequence and the model individually

22
Uniform Segmentation
Determining the initial state sequence: the feature vectors of the
observation are divided evenly among the states (here, 50 feature
vectors → 8 states), as sketched below
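A minimal sketch of the uniform segmentation step: assign the T
feature vectors evenly to the N states to obtain an initial state
sequence (the 50-frame, 8-state numbers are from the example above).

import numpy as np

def uniform_segmentation(T, N):
    """Initial state sequence: frame t is assigned to state floor(t * N / T)."""
    return np.minimum((np.arange(T) * N) // T, N - 1)

q0 = uniform_segmentation(T=50, N=8)
print(q0)   # the first ~6 frames go to state 0, the next ~6 to state 1, ...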
23
Maximization over the Model
  • Given the initial state sequence, we maximize over the model,
    λ* = argmax over λ of P( O, q | λ )
  • The maximization entails estimating the model parameters from
    the observation, given the state sequence
  • Estimation is performed using the Baum-Welch re-estimation
    formulas

24
Re-estimation Formulas
Given the state sequence, the parameters of each state are
re-estimated from the feature vectors assigned to that state:
  μi = (1 / Ti) Σ over { t : qt = i } of ot
  Σi = (1 / Ti) Σ over { t : qt = i } of ( ot - μi )( ot - μi )^T
where Ti is the number of feature vectors in state i
(a numpy sketch of these updates follows)
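A numpy sketch of these updates under the assumptions above (diagonal
covariance, fixed state sequence); the small variance floor is my own
addition for numerical safety.

import numpy as np

def reestimate(observations, states, N):
    """Per-state sample mean and diagonal variance, given a fixed state sequence."""
    L = observations.shape[1]
    means = np.zeros((N, L))
    vars_ = np.ones((N, L))
    for i in range(N):
        O_i = observations[states == i]          # feature vectors assigned to state i
        if len(O_i):                             # T_i = len(O_i)
            means[i] = O_i.mean(axis=0)
            vars_[i] = O_i.var(axis=0) + 1e-6    # diagonal covariance with a floor
    return means, vars_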
25
Model Estimation
26
Maximization over the state sequence
  • Given the model, we maximize over the state sequence,
    q* = argmax over q of P( O, q | λ )
  • The probability expression can be rewritten as
    P( O, q | λ ) = π(q1) bq1(o1) Π over t = 2..T of [ a(qt-1, qt) bqt(ot) ]

27
Maximization over the state sequence
  • Applying the logarithm transforms the maximization of a
    product into a maximization of a sum,
    log P( O, q | λ ) = log π(q1) + log bq1(o1)
        + Σ over t = 2..T of [ log a(qt-1, qt) + log bqt(ot) ]
  • We are still looking for the state sequence that maximizes
    this expression
  • The optimal state sequence can be determined using the Viterbi
    algorithm

28
Trellis Structure of HMMs
  • Redrawing the HMM as a trellis makes it easy to see the state
    sequence as a path through the trellis
  • The optimal state sequence is determined by the Viterbi
    algorithm as the single best path that maximizes P( O, q | λ ),
    as sketched below
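A compact log-domain Viterbi sketch for finding that best path; the
interface (per-frame log observation scores, log transition matrix,
log initial distribution) is an assumption of this sketch, not the
authors' code.

import numpy as np

def viterbi(log_b, log_A, log_pi):
    """Best state path through the trellis.

    log_b:  (T, N) log observation scores, log_b[t, i] = log b_i(o_t)
    log_A:  (N, N) log transition matrix
    log_pi: (N,)   log initial state distribution
    """
    T, N = log_b.shape
    delta = log_pi + log_b[0]                 # best score ending in each state at t = 0
    psi = np.zeros((T, N), dtype=int)         # back-pointers
    for t in range(1, T):
        scores = delta[:, None] + log_A       # scores[i, j]: come from i, go to j
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_b[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):             # trace back the single best path
        path.append(int(psi[t, path[-1]]))
    return delta.max(), path[::-1]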

29
Training Procedure
Training loop for one word model (block diagram):
  1. Cepstral calculation on the training utterance
  2. Uniform segmentation to obtain an initial state sequence
  3. Estimation of the model parameters (Baum-Welch re-estimation)
  4. State sequence segmentation (Viterbi)
  5. Converged? If no, return to step 3; if yes, the model is done
A code sketch of this loop follows.
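Tying the earlier sketches together, a hypothetical training loop for
one word model; every helper (uniform_segmentation, left_right_A,
reestimate, log_gaussian_diag, viterbi) comes from the previous
sketches, and the whole thing is illustrative rather than the authors'
implementation.

import numpy as np

def train_word_model(observations, N, n_iter=10):
    """Segmental training: alternate parameter estimation and Viterbi segmentation.

    observations: (T, L) array of cepstral feature vectors for one utterance.
    """
    T = len(observations)
    states = uniform_segmentation(T, N)                    # initial state sequence
    log_A = np.log(left_right_A(N) + 1e-12)                # fixed no-skip left-right structure
    log_pi = np.log(np.r_[1.0, np.zeros(N - 1)] + 1e-12)   # always start in the first state
    for _ in range(n_iter):
        means, vars_ = reestimate(observations, states, N)           # estimate the model
        log_b = np.array([[log_gaussian_diag(o, means[i], vars_[i])
                           for i in range(N)] for o in observations])
        _, path = viterbi(log_b, log_A, log_pi)                      # re-segment (Viterbi)
        if np.array_equal(path, states):                             # converged?
            break
        states = np.asarray(path)
    return means, vars_, states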
30
Recognition
  • We have a set of HMMs, one for each word
  • Objective: choose the word model that maximizes the
    probability of the observation given the model (maximum
    likelihood detection rule)
  • The classifier for observation O is
    v* = argmax over v = 1, ..., M of P( O | λv )
  • The likelihood can be written as a summation over all state
    sequences,
    P( O | λv ) = Σ over all q of P( O, q | λv )

31
Recognition
  • Replace the full likelihood by an approximation that takes
    into account only the most probable state sequence capable of
    producing the observation
  • Treating the most probable state sequence as the best path in
    the HMM trellis allows us to use the Viterbi algorithm to
    maximize the above probability
  • The best-path classifier for observation O is
    v* = argmax over v of [ max over q of P( O, q | λv ) ]
    (see the sketch below)
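A sketch of this best-path classifier built on the earlier Viterbi
helper; word_models is an assumed list of (means, vars, log_A, log_pi)
tuples, one per word model.

import numpy as np

def recognize(observations, word_models):
    """Best-path (Viterbi) classifier: return the index of the most likely word."""
    scores = []
    for means, vars_, log_A, log_pi in word_models:
        N = len(means)
        log_b = np.array([[log_gaussian_diag(o, means[i], vars_[i])
                           for i in range(N)] for o in observations])
        best_score, _ = viterbi(log_b, log_A, log_pi)   # max over q of log P(O, q | lambda_v)
        scores.append(best_score)
    return int(np.argmax(scores))                       # v* = argmax_v max_q P(O, q | lambda_v)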

32
Recognition
Block diagram: cepstral calculation → score the observation against
each word model → select maximum → index of recognized word
33
Conclusion
  • Introduced hidden Markov models
  • Described the process of isolated-word speech recognition:
    feature analysis, unit matching, training, and recognition
  • Other considerations
  • Artificial Neural Networks (ANNs) for speech recognition
  • Hybrid HMM/ANN models
  • Minimum classification error HMM design