Title: Signal Processing And Analysis Methods For Speech Recognition
1Signal Processing And Analysis Methods For Speech
Recognition
2Introduction
- Spectral analysis is the process of describing the speech signal by a set of parameters for further processing.
- Examples: short-term energy, zero-crossing rate, level-crossing rate, and so on.
- Methods for spectral analysis are therefore considered the core of the signal-processing front end in a speech recognition system.
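As a rough illustration of two of the parameters named above, the following sketch computes short-term energy and zero-crossing rate for a single frame. The function names, the 8 kHz sampling rate and the 100 Hz test tone are assumptions chosen for the example, not values from the slides.

```python
import math

def short_term_energy(frame):
    """Sum of squared samples in one analysis frame."""
    return sum(x * x for x in frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)

# Hypothetical test frame: one period of a 100 Hz sine sampled at 8 kHz.
fs = 8000
frame = [math.sin(2 * math.pi * 100 * n / fs) for n in range(80)]
energy = short_term_energy(frame)
zcr = zero_crossing_rate(frame)
```

In a recognizer these values would be computed frame by frame, typically every 10 to 20 ms.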
3Spectral Analysis methods
- Two methods
- The Filter Bank spectrum
- The Linear Predictive coding (LPC)
4Spectral Analysis models
- Pattern recognition model
- Acoustic phonetic model
5Spectral Analysis Model
Parameter measurement is common to both
models
6Pattern recognition Model
- The three basic steps in the pattern recognition model are:
- 1. parameter measurement
- 2. pattern comparison
- 3. decision making
71. Parameter measurement
- The goal is to represent the relevant acoustic events in the speech signal in terms of a compact, efficient set of speech parameters.
- The choice of which parameters to use is dictated by other considerations, e.g.
- computational efficiency,
- type of implementation,
- available memory.
- The way in which the representation is computed is based on signal processing considerations.
8Acoustic phonetic Model
9Spectral Analysis
- Two methods
- The Filter Bank spectrum
- The Linear Predictive coding (LPC)
10The Filter Bank spectrum
[Figure: bank of bandpass filters with digital input and spectral representation output]
The passbands of the bandpass filters together span the
frequency range of interest in the signal
111.The Bank of Filters Front end Processor
- One of the most common approaches to processing the speech signal is the bank-of-filters model.
- This method takes a speech signal as input and passes it through a set of filters to obtain a spectral representation for each frequency band of interest.
12- Examples of the frequency range of interest:
- 100-3000 Hz for a telephone-quality signal
- 100-8000 Hz for a broadband signal
- The individual filters generally do overlap in frequency.
- The output of the ith bandpass filter is $X_n(e^{j\omega_i})$,
- where $\omega_i$ is the normalized frequency, $\omega_i = 2\pi f_i / F_s$.
13- Each bandpass filter processes the speech signal
independently to produce the spectral
representation Xn
14The Bank of Filters Front end Processor
15The Bank of Filters Front end Processor
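The sampled speech signal, $s(n)$, is passed
through a bank of $Q$ bandpass filters, giving the
signals $s_i(n) = s(n) * h_i(n)$, $1 \le i \le Q$, where $h_i(n)$ is the impulse response of the ith bandpass filter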
16The Bank of Filters Front end Processor
- The bank-of-filters approach obtains the energy of the speech signal in each band through the following steps:
- Signal enhancement and noise elimination: make the speech signal more evident to the bank of filters.
- Set of bandpass filters: separate the signal into frequency bands (uniform or nonuniform filters).
17- Nonlinearity: the filtered signal in each band
is passed through a nonlinear function (for
example, a full-wave or half-wave rectifier)
to shift the bandpass spectrum to the
low-frequency band.
18The Bank of Filters Front end Processor
- Lowpass filter: eliminates the high-frequency components generated by the nonlinear function.
- Sampling rate reduction and amplitude compression: the resulting signals are represented more economically by resampling at a reduced rate and compressing the signal's dynamic range.
The role of the final lowpass filter is to
eliminate the undesired spectral peaks
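The processing chain for one channel (bandpass filter, nonlinearity, lowpass filter, sampling rate reduction, amplitude compression) can be sketched as follows. This is only a minimal sketch: the Hamming-windowed cosine bandpass, the moving-average lowpass, the log compression and all parameter values are assumptions made for illustration, and the enhancement step is omitted.

```python
import math

def fir_filter(h, x):
    """Direct convolution y(n) = sum_m h(m) x(n-m), zero initial state."""
    return [
        sum(h[m] * x[n - m] for m in range(len(h)) if n - m >= 0)
        for n in range(len(x))
    ]

def hamming(length):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]

def channel_energy(s, fs, fc, taps=65, avg_len=80, decim=80):
    """Log energy track of one filter-bank channel centred at fc (Hz)."""
    w = hamming(taps)
    # Bandpass filter: lowpass window modulated to the centre frequency.
    h_bp = [w[n] * math.cos(2 * math.pi * fc * (n - (taps - 1) / 2) / fs)
            for n in range(taps)]
    band = fir_filter(h_bp, s)
    rectified = [abs(v) for v in band]          # full-wave rectifier
    h_lp = [1.0 / avg_len] * avg_len            # moving-average lowpass
    smooth = fir_filter(h_lp, rectified)
    decimated = smooth[avg_len::decim]          # skip start-up, reduce rate
    return [math.log(v + 1e-12) for v in decimated]  # amplitude compression

# Hypothetical input: a 1 kHz tone sampled at 8 kHz.
fs = 8000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(800)]
on_band = channel_energy(tone, fs, 1000)    # channel centred on the tone
off_band = channel_energy(tone, fs, 3000)   # channel far from the tone
```

The channel centred on the tone reports a much larger log energy than the channel centred 2 kHz away, which is the behaviour the filter bank relies on.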
19The Bank of Filters Front end Processor
Assume that the output of the ith bandpass filter
is a pure sinusoid at frequency $\omega_i$. If a full-wave
rectifier is used as the nonlinearity, its output is the absolute value of that sinusoid
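To see why rectification moves energy to low frequencies, the sketch below rectifies a unit-amplitude sinusoid and inspects three DFT bins. The block length, the bin index and the helper name `dft_bin` are assumptions for the example.

```python
import cmath
import math

N = 64          # samples per block (hypothetical)
k0 = 4          # sinusoid sits on DFT bin k0, so omega_i = 2*pi*k0/N
s = [math.sin(2 * math.pi * k0 * n / N) for n in range(N)]
v = [abs(x) for x in s]                    # full-wave rectifier output

def dft_bin(x, k):
    """Magnitude of DFT bin k, normalized by the block length."""
    n_pts = len(x)
    acc = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts)
              for n in range(n_pts))
    return abs(acc) / n_pts

dc = dft_bin(v, 0)             # close to 2/pi, the mean of |sin|
fundamental = dft_bin(v, k0)   # essentially zero after rectification
second = dft_bin(v, 2 * k0)    # first surviving harmonic, at 2*omega_i
```

The rectified signal has a DC term plus components at even multiples of $\omega_i$; the lowpass filter that follows keeps the DC term, which tracks the amplitude in the band, and removes the harmonics.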
20The Bank of Filters Front end Processor
21Types of Filter Bank Used For Speech Recognition
- uniform filter bank
- Non uniform filter bank
22uniform filter bank
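- The most common filter bank is the uniform filter bank.
- The center frequency, $f_i$, of the ith bandpass filter is defined as $f_i = \frac{F_s}{N}\, i$, $1 \le i \le Q$, where $F_s$ is the sampling rate and $N$ is the number of uniformly spaced filters needed to span the frequency range of the speech.
- $Q$ is the number of filters used in the bank of filters.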
23uniform filter bank
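- The actual number of filters used in the filter bank, $Q$, satisfies $Q \le N/2$.
- $b_i$ is the bandwidth of the ith filter; it should satisfy $b_i \ge F_s/N$.
- Equality means there is no frequency overlap between adjacent filter channels; inequality means adjacent channels overlap.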
24uniform filter bank
- If $b_i < F_s/N$, then certain portions of the
speech spectrum would be missing from the
analysis and the resulting speech spectrum would
not be considered very meaningful
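A small sketch of the uniform layout, assuming the usual definition $f_i = (F_s/N)\,i$ for $1 \le i \le Q$ with $Q \le N/2$. The function name and the numeric values (8 kHz sampling, N = 32, Q = 15) are hypothetical.

```python
def uniform_centers(fs, n_total, q):
    """Centre frequencies f_i = (fs / N) * i for i = 1..Q, with Q <= N/2."""
    if q > n_total // 2:
        raise ValueError("Q must not exceed N/2")
    spacing = fs / n_total
    return [spacing * i for i in range(1, q + 1)]

centers = uniform_centers(fs=8000, n_total=32, q=15)
min_bandwidth = 8000 / 32   # each b_i must be at least Fs/N to avoid gaps
```

With these values the filters sit every 250 Hz from 250 Hz to 3750 Hz, and any bandwidth below 250 Hz would leave parts of the spectrum uncovered.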
25nonuniform filter bank
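- The alternative to a uniform filter bank is a nonuniform filter bank.
- The criterion is to space the filters uniformly along a logarithmic frequency scale.
- For a set of $Q$ bandpass filters with center frequencies $f_i$ and bandwidths $b_i$, $1 \le i \le Q$, we set
$b_1 = C$, $b_i = \alpha\, b_{i-1}$ for $2 \le i \le Q$, and $f_i = f_1 + \sum_{j=1}^{i-1} b_j + \frac{b_i - b_1}{2}$, where $C$ and $f_1$ are the bandwidth and center frequency of the first filter and $\alpha$ is the logarithmic growth factor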
26nonuniform filter bank
27- The most commonly used value is $\alpha = 2$.
- This gives an octave-band spacing of adjacent filters.
- $\alpha = 4/3$ gives approximately 1/3-octave filter spacing.
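The logarithmic spacing can be sketched as below, assuming the recursion $b_1 = C$, $b_i = \alpha b_{i-1}$ with each band starting where the previous one ends. The function name and the example values ($f_1$ = 300 Hz, $C$ = 200 Hz, $\alpha$ = 2, $Q$ = 4) are chosen for illustration.

```python
def log_spaced_bank(f1, c, alpha, q):
    """Return (centres, bandwidths) for b_1 = C, b_i = alpha * b_{i-1}."""
    bandwidths = [c * alpha ** i for i in range(q)]
    centers = []
    for i in range(q):
        # f_i = f_1 + sum of earlier bandwidths + (b_i - b_1) / 2
        centers.append(f1 + sum(bandwidths[:i])
                       + (bandwidths[i] - bandwidths[0]) / 2)
    return centers, bandwidths

centers, bandwidths = log_spaced_bank(f1=300, c=200, alpha=2, q=4)
```

For these values the four bands cover 200-400, 400-800, 800-1600 and 1600-3200 Hz, one octave each.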
28Implementations of Filter Banks
- Depending on the design method, the filter bank can be implemented in various ways.
- Design methods for digital filters fall into two classes:
- Infinite impulse response (IIR), i.e. recursive filters
- Finite impulse response (FIR), i.e. nonrecursive filters
29- The FIR (finite impulse response) filter, or nonrecursive filter:
- The present output depends on the present input sample and previous input samples.
- The impulse response is restricted to a finite number of samples.
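A minimal FIR sketch: the output is a weighted sum of the present and past inputs only, so the response to a unit impulse is just the coefficient list followed by zeros. The three coefficients are hypothetical.

```python
def fir_filter(h, x):
    """y(n) = sum_{k=0}^{M-1} h(k) x(n-k): present and past inputs only."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        y.append(acc)
    return y

h = [0.25, 0.5, 0.25]               # hypothetical 3-tap lowpass
impulse = [1.0] + [0.0] * 7
response = fir_filter(h, impulse)   # h followed by zeros
```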
30- Advantages
- Always stable; noise effects are less severe
- Excellent design methods are available for various kinds of FIR filters
- Phase response can be made linear
- Disadvantages
- Costly to implement
- Memory requirements and execution time are high
- Requires powerful computational facilities
31- The IIR (infinite impulse response) filter, or recursive filter:
- The present output sample depends on the present input, past input samples, and past output samples.
- The impulse response extends over an infinite duration.
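A minimal recursive sketch, simplified to a single feedback coefficient (a general IIR filter would also use past inputs). The coefficient values are hypothetical; they are chosen to show a decaying and a growing impulse response.

```python
def one_pole_iir(a, x):
    """y(n) = x(n) + a * y(n-1): the output feeds back into itself."""
    y = []
    prev = 0.0
    for sample in x:
        prev = sample + a * prev
        y.append(prev)
    return y

impulse = [1.0] + [0.0] * 9
stable = one_pole_iir(0.5, impulse)     # a**n: decays, never exactly zero
unstable = one_pole_iir(1.5, impulse)   # |a| > 1: response keeps growing
```

The impulse response is $a^n$, so it never ends, and it stays bounded only when $|a| < 1$. This is why stability has to be checked for recursive designs.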
32- Advantages
- Simple to design
- Efficient
- Disadvantages
- Phase response is nonlinear
- More affected by noise
- Can be unstable
33FIR Filters
34FIR Filters
- A less expensive implementation can be derived by
representing each bandpass filter by a fixed
lowpass window $w(n)$ modulated by the complex
exponential $e^{j\omega_i n}$, i.e. $h_i(n) = w(n)\, e^{j\omega_i n}$
35Frequency Domain Interpretation For Short Term
Fourier Transform
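$$S_n(e^{j\omega_i}) = \sum_{m} s(m)\, w(n-m)\, e^{-j\omega_i m} \qquad \text{(A)}$$
At $n = n_0$:
$$S_{n_0}(e^{j\omega_i}) = \mathrm{FT}\left[ s(m)\, w(n_0-m) \right] \Big|_{\omega = \omega_i}$$
where FT[.] denotes the Fourier transform. $S_{n_0}(e^{j\omega_i})$
is the conventional Fourier transform of the
windowed signal, $s(m)w(n_0-m)$, evaluated at the
frequency $\omega = \omega_i$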
36Frequency Domain Interpretation For Short Term
Fourier Transform
[Figure: the parts of $s(m)$ that are used in the
computation of the short-time Fourier transform]
37Frequency Domain Interpretation For Short Term
Fourier Transform
- Since $w(m)$ is an FIR filter of length $L$, it follows from the definition of $S_n(e^{j\omega_i})$ that:
- If $L$ is large relative to the signal periodicity, then $S_n(e^{j\omega_i})$ gives good frequency resolution.
- If $L$ is small relative to the signal periodicity, then $S_n(e^{j\omega_i})$ gives poor frequency resolution.
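The effect of window length can be checked numerically. The sketch below uses two tones 100 Hz apart as a stand-in for adjacent pitch harmonics and compares the short-time spectral magnitude midway between them with the magnitude at the tones. The sampling rate, tone frequencies, window placement and helper names are assumptions for the example.

```python
import cmath
import math

def hamming(length):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]

def stft_magnitude(s, n0, length, freq_hz, fs):
    """|sum_m s(m) w(n0 - m) e^{-j w m}| with an L-point Hamming window."""
    w = hamming(length)
    omega = 2 * math.pi * freq_hz / fs
    total = 0j
    for k in range(length):          # k = n0 - m indexes the window
        m = n0 - k
        total += s[m] * w[k] * cmath.exp(-1j * omega * m)
    return abs(total)

fs = 10000
s = [math.cos(2 * math.pi * 1000 * m / fs)
     + math.cos(2 * math.pi * 1100 * m / fs) for m in range(1000)]

def dip_ratio(length):
    """Magnitude midway between the tones relative to that at the tones."""
    n0 = 500 + length // 2           # centre every window near m = 500
    mid = stft_magnitude(s, n0, length, 1050, fs)
    peak = min(stft_magnitude(s, n0, length, 1000, fs),
               stft_magnitude(s, n0, length, 1100, fs))
    return mid / peak

long_ratio = dip_ratio(500)    # deep valley between the tones: resolved
short_ratio = dip_ratio(50)    # no valley: the tones merge into one peak
```

With L = 500 the window spans five periods of the 100 Hz spacing and the two components are clearly separated; with L = 50 it spans half a period and only the broad envelope remains.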
38Frequency Domain Interpretation For Short Term
Fourier Transform
An $L = 500$ point Hamming window is applied to a
section of voiced speech. The periodicity of
the signal is seen in the windowed time waveform
as well as in the short-time spectrum, in
which the fundamental frequency and its harmonics
show up as narrow peaks at equally spaced
frequencies.
39Frequency Domain Interpretation For Short Term
Fourier Transform
For short windows, the time sequence $s(m)w(n-m)$
does not show the signal periodicity, nor does
the signal spectrum. The spectrum does show the broad spectral
envelope very well.
40Frequency Domain Interpretation For Short Term
Fourier Transform
For unvoiced speech, the short-time spectrum shows an irregular series of local peaks and
valleys due to the random nature of the unvoiced
speech
41Frequency Domain Interpretation For Short Term
Fourier Transform
Using the shorter window smooths out the random
fluctuations in the short-time spectral
magnitude and shows the broad spectral envelope
very well
42Linear Filtering Interpretation of the short-time
Fourier Transform
- In the linear filtering interpretation of the short-time Fourier transform, $S_n(e^{j\omega_i})$ is a convolution of the lowpass
window, $w(n)$, with the speech signal, $s(n)$,
modulated to the center frequency $\omega_i$.
From (A): $S_n(e^{j\omega_i}) = \left[ s(n)\, e^{-j\omega_i n} \right] * w(n)$
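The two views of one channel can be compared numerically: modulating the signal down and lowpass filtering with $w(n)$, versus filtering with the modulated window $w(n)e^{j\omega_i n}$. The sketch assumes a causal 31-point Hamming window and an arbitrary test signal; the function names are hypothetical.

```python
import cmath
import math

def hamming(length):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]

def stft_by_lowpass(s, w, n, omega):
    """Convolve the modulated signal s(m) e^{-j omega m} with w."""
    return sum(s[m] * cmath.exp(-1j * omega * m) * w[n - m]
               for m in range(len(s)) if 0 <= n - m < len(w))

def bandpass_output(s, w, n, omega):
    """Convolve s with the bandpass response h(k) = w(k) e^{j omega k}."""
    return sum(s[m] * w[n - m] * cmath.exp(1j * omega * (n - m))
               for m in range(len(s)) if 0 <= n - m < len(w))

s = [math.sin(0.3 * m) + 0.5 * math.cos(1.1 * m) for m in range(200)]
w = hamming(31)
n, omega = 120, 0.3
lowpass_view = stft_by_lowpass(s, w, n, omega)
bandpass_view = bandpass_output(s, w, n, omega)
# The two differ only by the unit-magnitude factor e^{j omega n}.
mismatch = abs(bandpass_view - cmath.exp(1j * omega * n) * lowpass_view)
```

Because the factor $e^{j\omega_i n}$ has unit magnitude, both structures give the same short-time spectral magnitude.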
43FFT Implementation of Uniform Filter Bank Based
on the Short-Time FT
44FFT Implementation of Uniform Filter Bank Based
on the Short-Time FT
45FFT Implementation of Uniform Filter Bank Based
on The Short Time FT
- The FFT implementation is more efficient than
the direct form structure
46Nonuniform FIR Filter Bank Implementations
The most general form of a nonuniform FIR filter
bank
47Nonuniform FIR Filter Bank Implementations
- The kth bandpass filter impulse response, $h_k(n)$, represents a filter with center frequency $\omega_k$ and bandwidth $\Delta\omega_k$.
- The set of $Q$ bandpass filters covers the frequency range of interest for the intended speech recognition application.
48Nonuniform FIR Filter Bank Implementations
- Each bandpass filter is implemented via a direct convolution.
- Each bandpass filter is designed via the windowing design method.
- The composite frequency response of the Q-channel filter bank is independent of the number and distribution of the individual filters.
49Nonuniform FIR Filter Bank Implementations
A filter bank with the three filters has the
exact same composite frequency response as the
filter bank with the seven filters shown in
figure above
50Nonuniform FIR Filter Bank Implementations
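- The impulse response of the kth bandpass filter is $h_k(n) = w(n)\, \tilde{h}_k(n)$, where $\tilde{h}_k(n)$ is the impulse response of the ideal bandpass filter and $w(n)$ is the FIR window.
- The frequency response of the kth bandpass filter is $H_k(e^{j\omega}) = W(e^{j\omega}) \circledast \tilde{H}_k(e^{j\omega})$, the convolution in frequency of the window response with the ideal bandpass response.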
51Nonuniform FIR Filter Bank Implementations
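- Thus the frequency response of the composite filter bank is
$$H(e^{j\omega}) = \sum_{k=1}^{Q} H_k(e^{j\omega}) = W(e^{j\omega}) \circledast \sum_{k=1}^{Q} \tilde{H}_k(e^{j\omega}) \qquad (1)$$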
52Nonuniform FIR Filter Bank Implementations
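- The ideal bandpass responses sum to $\hat{H}(e^{j\omega}) = \sum_{k=1}^{Q} \tilde{H}_k(e^{j\omega})$, which equals 1 for $\omega_{min} \le |\omega| \le \omega_{max}$ and 0 otherwise, where $\omega_{min}$ is the lowest frequency in the filter bank and $\omega_{max}$ is the highest frequency.
- Equation 1 can then be written as $H(e^{j\omega}) = W(e^{j\omega}) \circledast \hat{H}(e^{j\omega})$,
- which is independent of the number of ideal filters, $Q$, and their distribution in frequency.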
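This independence can be checked directly in the time domain: windowed ideal bandpass responses over contiguous bands add up to the same composite impulse response however the range is partitioned. The band edges (in radians per sample), the 41-point Hamming window and the function names are assumptions for the sketch.

```python
import math

def ideal_bandpass(lo, hi, n):
    """Ideal bandpass impulse response with edges lo < hi (rad/sample)."""
    if n == 0:
        return (hi - lo) / math.pi
    return (math.sin(hi * n) - math.sin(lo * n)) / (math.pi * n)

def composite(edges, window):
    """Sum of windowed bandpass responses for contiguous band edges."""
    half = (len(window) - 1) // 2
    taps = []
    for idx, w in enumerate(window):
        n = idx - half
        taps.append(sum(w * ideal_bandpass(lo, hi, n)
                        for lo, hi in zip(edges, edges[1:])))
    return taps

window = [0.54 - 0.46 * math.cos(2 * math.pi * i / 40) for i in range(41)]
three = composite([0.2, 0.4, 0.8, 1.6], window)            # 3 octave bands
seven = composite([0.2 * k for k in range(1, 9)], window)  # 7 uniform bands
max_diff = max(abs(a - b) for a, b in zip(three, seven))
```

Both partitions cover 0.2 to 1.6 rad/sample, so the composite taps agree to within rounding error.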
53FFT-Based Nonuniform Filter Banks
- Nonuniformity can be created by combining two or more uniform channels.
- Consider taking an N-point DFT of the sequence $x(n)$: $X_k = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}$, $0 \le k \le N-1$
54FFT-Based Nonuniform Filter Banks
- The equivalent kth channel value, $X'_k = X_k + X_{k+1}$, can be obtained by weighting the sequence $x(n)$ by the complex sequence $2\exp(-j\pi n/N)\cos(\pi n/N)$ and taking the kth DFT coefficient of the result.
- If more than two channels are combined, a different equivalent weighting sequence results.
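The identity behind the weighting sequence, $1 + e^{-j2\pi n/N} = 2e^{-j\pi n/N}\cos(\pi n/N)$, can be verified with a direct DFT. The test sequence, N = 16 and k = 3 are arbitrary choices for the sketch.

```python
import cmath
import math

def dft_bin(x, k):
    """X_k = sum_n x(n) e^{-j 2 pi n k / N}."""
    n_pts = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * n * k / n_pts)
               for n in range(n_pts))

N = 16
x = [math.sin(0.7 * n) + 0.3 * n for n in range(N)]   # arbitrary sequence
k = 3

combined = dft_bin(x, k) + dft_bin(x, k + 1)          # add two channels

weights = [2 * cmath.exp(-1j * math.pi * n / N) * math.cos(math.pi * n / N)
           for n in range(N)]
weighted = dft_bin([x[n] * weights[n] for n in range(N)], k)
```

Here `combined` and `weighted` agree to rounding error, so a merged channel costs one weighted DFT coefficient rather than two separate ones.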
55Tree Structure Realizations of Nonuniform Filter
Banks
- In this method the speech signal is filtered in
stages, and the sampling rate is successively
reduced at each stage
56Tree Structure Realizations of Nonuniform Filter
Banks
57Tree Structure Realizations of Nonuniform Filter
Banks
- The original speech signal, $s(n)$, is filtered initially into two bands, a low band and a high band.
- The high band is downsampled by 2 and represents the highest octave band ($\pi/2 \le \omega \le \pi$) of the filter bank.
- The low band is similarly downsampled by 2 and fed into a second filtering stage, in which the signal is again split into two equal bands.
- Again the high band of stage 2 is downsampled by 2 and is used as the next-highest filter bank output.
58Tree Structure Realizations of Nonuniform Filter
Banks
- The low band is also downsampled by 2 and fed into a third stage of filters.
- The third-stage outputs, after downsampling by a factor of 2, are used as the two lowest filter bands.
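The three-stage tree can be sketched as below. The two-tap sum and difference filters are a crude stand-in (an assumption) for the half-band lowpass and highpass filters a real design would use; the structure of split, downsample by 2 and recurse on the low band is the point of the example.

```python
def split_and_decimate(x):
    """Two-band split with 2-tap filters, then keep every second sample."""
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]
    return low, high

def octave_tree(s, stages=3):
    """Return bands ordered from the highest octave to the lowest band."""
    bands = []
    low = s
    for _ in range(stages):
        low, high = split_and_decimate(low)
        bands.append(high)       # high band of this stage is an output
    bands.append(low)            # final low band is the lowest output
    return bands

constant = [1.0] * 64                       # energy only at omega = 0
alternating = [1.0, -1.0] * 32              # energy only at omega = pi
bands_constant = octave_tree(constant)
bands_alternating = octave_tree(alternating)
band_lengths = [len(b) for b in bands_constant]
```

Each stage halves the sampling rate, so the four outputs of a 64-sample input have 32, 16, 8 and 8 samples. The constant input appears only in the lowest band and the alternating input only in the highest.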
59Summary of considerations for speech recognition
filter banks
- 1st. Type of digital filter used (IIR (recursive) or FIR (nonrecursive)).
- IIR advantage: simple to implement and efficient.
- IIR disadvantage: phase response is nonlinear.
- FIR advantage: phase response is linear.
- FIR disadvantage: expensive to implement.
60Summary of considerations for speech recognition
filter banks
- 2nd. The number of filters to be used in the filter bank.
- For uniform filter banks the number of filters, $Q$, cannot be too small, or else the ability of the filter bank to resolve the speech spectrum is greatly impaired. Values of $Q$ less than 8 are generally avoided.
- The value of $Q$ cannot be too large either, because the filter bandwidths would eventually be too narrow for some talkers (e.g. high-pitched female voices), i.e. no prominent harmonics would fall within the band (in practical systems, $Q \le 32$).
61Summary of considerations for speech recognition
filter banks
- In order to reduce overall computation, many
practical systems have used nonuniformly spaced
filter banks
62Summary of considerations for speech recognition
filter banks
- 3rd. The choice of nonlinearity and lowpass filter (LPF) used at the output of each channel.
- Nonlinearity: full-wave or half-wave rectifier.
- LPF: varies from a simple integrator to a good-quality IIR lowpass filter.