Title: SIGNAL PROCESSING TOOLS FOR SPEECH RECOGNITION
1SIGNAL PROCESSING TOOLS FOR SPEECH RECOGNITION
Presented by Richard Duncan
Hualin Gao, Richard Duncan, Julie A. Baca, Joseph Picone
Human and Systems Engineering
Center for Advanced Vehicular Systems
Mississippi State University
2INTRODUCTION
The tools described here deal with the block
known as the Acoustic Front-end, which
encapsulates the signal processing portions of a
recognition system.
3- THE PROBLEM WITH AVAILABLE TOOLS
Why reinvent the wheel? The problems with
existing tools:
- Run-time efficiency
- File I/O
- Adding new algorithms
The goal of this paper is to solve these
problems.
4- FEATURES OF ISIP FOUNDATION CLASSES
- Unicode support for multilingual applications
- Memory management and tracking
- System and I/O libraries that abstract users from
details of the operating system
- Math classes that provide basic linear algebra
and efficient matrix manipulations
- Data structures that include generic
implementations of essential tools for speech
recognition code.
5- DESIGN REQUIREMENTS FOR THESE TOOLS
- A library of standard algorithms to provide basic
digital signal processing (DSP) functions
- An ability to easily add new algorithm classes
and functions without modifying existing classes
- A block diagram approach to describing algorithms
to realize rapid prototyping without programming.
- A mechanism to share the code base for the
feature extraction and recognizer.
6- BASIC DIGITAL PROCESSING FUNCTIONS
This example shows how to use the basic
digital signal processing functions. It computes
the energy of an input vector in dB using the SUM
algorithm:

  // declare an Energy object and an output vector
  //
  Energy egy;
  VectorFloat output;

  // set the input vector
  //
  VectorFloat input(L"0, 1, 2");

  // choose algorithm
  //
  egy.setAlgorithm(Energy::SUM);

  // choose implementation
  //
  egy.setImplementation(Energy::DB);

  // compute the energy of input data
  //
  egy.compute(output, input);
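
For readers without the ISIP classes at hand, the computation in this example can be sketched in standard C++. This is a minimal sketch, not the library's code: it assumes the SUM algorithm means the sum of squared samples and that the dB implementation is 10*log10 of that sum. The function names and the small floor value are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sum-of-squares energy of one frame
// (assumed meaning of the SUM algorithm).
double energySum(const std::vector<double>& frame) {
    double sum = 0.0;
    for (double x : frame) {
        sum += x * x;
    }
    return sum;
}

// Energy in decibels: 10 * log10(energy).
// The floor is a hypothetical guard against log10(0) on silent frames.
double energyDb(const std::vector<double>& frame, double floor = 1e-10) {
    assert(floor > 0.0);
    double e = energySum(frame);
    if (e < floor) {
        e = floor;
    }
    return 10.0 * std::log10(e);
}
```

Under these assumptions, the input vector 0, 1, 2 has a linear energy of 5, which is about 6.99 dB.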
7- The implementation using an abstract base class,
AlgorithmBase, and virtual functions or methods
that comprise the interface contract, is the
single most important feature, since it makes the
library extensible.
- All algorithm classes are derived from this base
class.
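
The idea of an interface contract can be illustrated with a small, self-contained sketch. The AlgorithmBase below is a hypothetical stand-in, not the real ISIP class: the method names, signatures, and the two derived classes are assumptions chosen only to show how virtual functions let new algorithms plug in without changes to existing code.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the interface contract.
class AlgorithmBase {
public:
    virtual ~AlgorithmBase() = default;
    virtual std::string name() const = 0;
    virtual void compute(std::vector<double>& output,
                         const std::vector<double>& input) const = 0;
};

// First-order pre-emphasis filter: y[n] = x[n] - a * x[n-1].
class Preemphasis : public AlgorithmBase {
public:
    explicit Preemphasis(double a) : a_(a) {}
    std::string name() const override { return "Preemphasis"; }
    void compute(std::vector<double>& output,
                 const std::vector<double>& input) const override {
        output.assign(input.size(), 0.0);
        for (std::size_t n = 0; n < input.size(); ++n) {
            const double prev = (n == 0) ? 0.0 : input[n - 1];
            output[n] = input[n] - a_ * prev;
        }
    }
private:
    double a_;
};

// Sum-of-squares energy, returned as a one-element vector.
class SumEnergy : public AlgorithmBase {
public:
    std::string name() const override { return "SumEnergy"; }
    void compute(std::vector<double>& output,
                 const std::vector<double>& input) const override {
        double sum = 0.0;
        for (double x : input) sum += x * x;
        output.assign(1, sum);
    }
};

// Runs the algorithms in order; it only knows the base-class interface.
std::vector<double> runChain(
    const std::vector<std::unique_ptr<AlgorithmBase>>& chain,
    std::vector<double> data) {
    for (const auto& algo : chain) {
        std::vector<double> next;
        algo->compute(next, data);
        data = std::move(next);
    }
    return data;
}
```

Because runChain depends only on the base class, adding a third algorithm means writing one new derived class; neither runChain nor the existing classes need to change.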
8HIERARCHY OF ALGORITHM CLASSES
New classes can be easily added by following the
interface contract
9- SIGNAL PROCESSING CONFIGURATION TOOLS
The menu of the tools
10- SIGNAL PROCESSING CONFIGURATION TOOLS
Create recipes and configure each block by
right-clicking it.
11BLOCK DIAGRAM TO DESIGN FRONT-END
- Users specify a pool of recipes for use in the
Front-End.
- Each recipe is an information holder for
Algorithm objects.
- The Front-End creates a dependency graph. In the
top left recipe pool, recipe 6 cannot be processed
before recipe 5 has been processed.
- The processing tool runs a breadth-first search to
produce output from given inputs.
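
One common way to obtain a breadth-first processing order that respects such dependencies is Kahn's algorithm, sketched below. The slides do not say that the Front-End uses exactly this method; the function name and the adjacency-list representation (deps[i] lists the recipes whose output recipe i needs) are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <stdexcept>
#include <vector>

// Returns an order in which every recipe appears after the recipes it
// depends on. deps[i] holds the indices of the recipes that recipe i needs.
std::vector<int> scheduleRecipes(const std::vector<std::vector<int>>& deps) {
    const std::size_t n = deps.size();
    std::vector<std::vector<int>> consumers(n);  // reverse edges
    std::vector<int> pending(n, 0);              // unmet dependencies

    for (std::size_t i = 0; i < n; ++i) {
        for (int d : deps[i]) {
            assert(d >= 0 && static_cast<std::size_t>(d) < n);
            consumers[d].push_back(static_cast<int>(i));
            ++pending[i];
        }
    }

    // Start from recipes that need only the raw input.
    std::queue<int> ready;
    for (std::size_t i = 0; i < n; ++i) {
        if (pending[i] == 0) ready.push(static_cast<int>(i));
    }

    std::vector<int> order;
    while (!ready.empty()) {
        const int r = ready.front();
        ready.pop();
        order.push_back(r);
        // Finishing r may unblock the recipes that consume its output.
        for (int c : consumers[r]) {
            if (--pending[c] == 0) ready.push(c);
        }
    }

    if (order.size() != n) {
        throw std::runtime_error("recipe graph contains a cycle");
    }
    return order;
}
```

The queue gives the level-by-level (breadth-first) behaviour: a recipe is scheduled only once all of its inputs have been produced, which is the constraint described for recipes 5 and 6.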
12- THE SIGNAL PROCESSING TOOLS
- Uses a popular block diagram approach for
designing signal processing algorithms.
- Allows rapid prototyping of ideas.
- Allows users to easily integrate new modules into
the IFCs and have them instantly available in all
tools, such as the recognizer.
13THE SIGNAL PROCESSING TOOLS
The signal processing control tool is a driver
program, called isip_transform.exe in our
environment. It uses the signal processing
library, the algorithm libraries, and the recipe
files output by the signal configuration tool to
carry out the whole signal processing procedure.
14THE SIGNAL PROCESSING TOOLS
- Parsing the file containing the recipe created by
the user with the configuration tool
- Synchronizing different paths along the block
flow diagram contained in this file
- Preparing input/output data buffers for each
algorithm, particularly for those requiring
multiple frames of data, such as windows or
calculus
- Scheduling the sequences of required signal
processing operations
- Processing data through the flow defined by the
recipe
- Managing conversational data.
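
To see why some algorithms need several frames buffered at once, consider a derivative ("calculus") feature. The sketch below assumes a simple central difference with edge replication; the formula actually used by the tools is not given on the slides, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// track[t] is one feature coefficient at frame t.
// delta[t] = (track[t+1] - track[t-1]) / 2, repeating the first and
// last frames at the edges (an assumed boundary rule).
std::vector<double> deltaFeatures(const std::vector<double>& track) {
    assert(!track.empty());
    const std::size_t n = track.size();
    std::vector<double> delta(n, 0.0);
    for (std::size_t t = 0; t < n; ++t) {
        const double prev = track[t == 0 ? 0 : t - 1];
        const double next = track[t + 1 < n ? t + 1 : n - 1];
        delta[t] = (next - prev) / 2.0;
    }
    return delta;
}
```

Frame t cannot be finished until frame t+1 has arrived, so the driver has to hold neighbouring frames in a buffer before this block can run.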
15THE SIGNAL PROCESSING TOOLS
The processing procedure
16MFCC TWO PASSES RECIPES
- First pass: gather the statistics needed for
cepstral mean subtraction and maximum energy
normalization.
- Second pass: compute the feature values using
the results from the first pass.
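
The two passes can be sketched as follows. This is an illustration under stated assumptions, not the recipe itself: the Frame and PassOneStats structs are hypothetical, and energy is assumed to be in the log (dB) domain, so that maximum energy normalization becomes a subtraction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-frame features: log energy plus cepstral coefficients.
struct Frame {
    double energy;
    std::vector<double> cepstrum;
};

// Statistics gathered over the whole utterance in the first pass.
struct PassOneStats {
    std::vector<double> cepstralMean;
    double maxEnergy;
};

// Pass 1: accumulate the cepstral mean and the maximum energy.
PassOneStats firstPass(const std::vector<Frame>& frames) {
    assert(!frames.empty());
    PassOneStats stats;
    stats.cepstralMean.assign(frames[0].cepstrum.size(), 0.0);
    stats.maxEnergy = frames[0].energy;
    for (const Frame& f : frames) {
        assert(f.cepstrum.size() == stats.cepstralMean.size());
        for (std::size_t k = 0; k < f.cepstrum.size(); ++k) {
            stats.cepstralMean[k] += f.cepstrum[k];
        }
        if (f.energy > stats.maxEnergy) stats.maxEnergy = f.energy;
    }
    for (double& m : stats.cepstralMean) {
        m /= static_cast<double>(frames.size());
    }
    return stats;
}

// Pass 2: normalize every frame with the first-pass statistics.
std::vector<Frame> secondPass(const std::vector<Frame>& frames,
                              const PassOneStats& stats) {
    std::vector<Frame> out = frames;
    for (Frame& f : out) {
        assert(f.cepstrum.size() == stats.cepstralMean.size());
        f.energy -= stats.maxEnergy;  // loudest frame maps to 0
        for (std::size_t k = 0; k < f.cepstrum.size(); ++k) {
            f.cepstrum[k] -= stats.cepstralMean[k];
        }
    }
    return out;
}
```

Two passes are needed because the mean and the maximum depend on the entire utterance, so no frame can be normalized until all frames have been seen once.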
17- Correctness: The implementation of each
algorithm is verified manually or by using other
tools such as MATLAB.
- Usability: We assessed and enhanced the usability of
our tools through extensive user testing
conducted over the course of many workshops.
- Speech recognition experiments: The correctness
of the tools was also confirmed by several speech
recognition experiments.
18- A library of standard algorithms for basic DSP
functions.
- The ability to add new algorithms to this library
easily.
- A GUI-based configuration tool for creating block
diagrams to describe algorithms, allowing rapid
prototyping without programming.
- Tested and verified these tools for both
correctness and usability.