Signal Processing And Analysis Methods For Speech Recognition - PowerPoint PPT Presentation

Slides: 62
Provided by: thak5

1
Signal Processing And Analysis Methods For Speech Recognition
2
Introduction
  • Spectral analysis is the process of representing
    the speech signal by a set of parameters for
    further processing
  • E.g., short-term energy, zero-crossing rate,
    level-crossing rate, and so on
  • Spectral analysis methods are therefore
    considered the core of the signal-processing front
    end in a speech recognition system
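Two of the parameters named above, short-term energy and zero-crossing rate, are computed frame by frame. The following is a minimal Python sketch; the sampling rate, frame length, hop size, and test tone are illustrative assumptions, not values from the slides.

```python
import math

def frame_signal(s, frame_len, hop):
    """Split s into frames of frame_len samples, starting every hop samples."""
    return [s[i:i + frame_len] for i in range(0, len(s) - frame_len + 1, hop)]

def short_term_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counted as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

# Hypothetical input: a 100 Hz sinusoid sampled at 8 kHz.
fs = 8000
s = [math.sin(2 * math.pi * 100 * n / fs) for n in range(800)]
frames = frame_signal(s, frame_len=160, hop=80)   # 20 ms frames, 10 ms hop
energies = [short_term_energy(f) for f in frames]
zcrs = [zero_crossing_rate(f) for f in frames]
print(len(frames), round(energies[0], 6))  # 9 80.0
```

A low zero-crossing rate with high energy, as here, is typical of voiced, low-frequency sound; noisy unvoiced sounds tend to show the opposite.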

3
Spectral Analysis methods
  • Two methods
  • The Filter Bank spectrum
  • The Linear Predictive coding (LPC)

4
Spectral Analysis models
  • Pattern recognition model
  • Acoustic phonetic model

5
Spectral Analysis Model
Parameter measurement is common to both
systems.
6
Pattern recognition Model
  • The three basic steps in the pattern recognition
    model are:
  • 1. Parameter measurement
  • 2. Pattern comparison
  • 3. Decision making
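Steps 2 and 3 can be illustrated with a toy recognizer: compare the measured parameters against stored reference patterns, then decide on the closest one. This is a minimal sketch that assumes each pattern is a single fixed-length parameter vector scored by Euclidean distance. The templates are hypothetical, and practical systems compare sequences of frames (for example with dynamic time warping), which this sketch omits.

```python
import math

def euclidean_distance(a, b):
    """Distance between two parameter vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(test_pattern, templates):
    """Pattern comparison (distance to every reference), then decision (minimum distance)."""
    scores = {label: euclidean_distance(test_pattern, ref)
              for label, ref in templates.items()}
    decision = min(scores, key=scores.get)
    return decision, scores

# Hypothetical reference templates: one parameter vector per word.
templates = {"yes": [1.0, 0.2, 0.5], "no": [0.1, 0.9, 0.4]}
decision, scores = recognize([0.9, 0.3, 0.5], templates)
print(decision)  # yes
```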

7
1. Parameter measurement
  • To represent the relevant acoustic events in the
    speech signal in terms of a compact, efficient set
    of speech parameters
  • The choice of which parameters to use is dictated
    by other considerations
  • e.g.,
  • computational efficiency,
  • type of implementation,
  • available memory
  • The way in which the representation is computed is
    based on signal-processing considerations

8
Acoustic phonetic Model
9
Spectral Analysis
  • Two methods
  • The Filter Bank spectrum
  • The Linear Predictive coding (LPC)

10
The Filter Bank spectrum
[Figure: digital i/p (input) -> bank of bandpass filters -> spectral representation]
The bandpass filters together span the
frequency range of interest in the signal.
11
1. The Bank of Filters Front end Processor
  • One of the most common approaches for processing
    the speech signal is the bank-of-filters model
  • This method takes a speech signal as input and
    passes it through a set of filters in order to
    obtain the spectral representation of each
    frequency band of interest.

12
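  • Typical frequency ranges of interest:
  • 100-3000 Hz for a telephone-quality signal
  • 100-8000 Hz for a broadband signal
  • The individual filters generally do overlap in
    frequency
  • The output of the ith bandpass filter is
    $X_n(e^{j\omega_i})$, where $\omega_i = 2\pi f_i / F_s$ is the
    normalized frequency ($f_i$ is the center frequency
    of the filter and $F_s$ the sampling frequency)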

13
  • Each bandpass filter processes the speech signal
    independently to produce the spectral
    representation $X_n(e^{j\omega_i})$ for its band

14
The Bank of Filters Front end Processor
15
The Bank of Filters Front end Processor
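The sampled speech signal, s(n), is passed
through a bank of Q bandpass filters, giving the
signals
$$s_i(n) = s(n) * h_i(n) = \sum_{m=0}^{M_i-1} h_i(m)\, s(n-m), \qquad 1 \le i \le Q,$$
where $h_i(n)$ is the impulse response of the ith
bandpass filter, of duration $M_i$ samples.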
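The convolution sum above can be written directly in Python. This minimal sketch takes s(n) = 0 for n < 0. The two impulse responses in the usage line are placeholders for illustration only, not bandpass designs from the slides.

```python
def fir_filter(s, h):
    """s_i(n) = sum_{m=0}^{M_i-1} h_i(m) s(n-m), taking s(n) = 0 for n < 0."""
    return [sum(h[m] * s[n - m] for m in range(len(h)) if n - m >= 0)
            for n in range(len(s))]

def filter_bank(s, impulse_responses):
    """Pass s(n) through each of the Q filters h_1(n), ..., h_Q(n)."""
    return [fir_filter(s, h) for h in impulse_responses]

# Placeholder impulse responses: an identity filter and a 2-tap average.
outputs = filter_bank([1.0, 2.0, 3.0, 4.0], [[1.0], [0.5, 0.5]])
print(outputs)  # [[1.0, 2.0, 3.0, 4.0], [0.5, 1.5, 2.5, 3.5]]
```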
16
The Bank of Filters Front end Processor
  • The bank-of-filters approach obtains the energy
    of the speech signal in each frequency band through
    the following steps:
  • Signal enhancement and noise elimination: makes
    the speech signal more evident to the bank of
    filters.
  • Set of bandpass filters: separates the signal into
    frequency bands (uniform or nonuniform filters).

17
  • Nonlinearity: the filtered signal in every band
    is passed through a nonlinear function (for
    example, a full-wave or half-wave rectifier) that
    shifts the bandpass spectrum down to the
    low-frequency band.

18
The Bank of Filters Front end Processor
  • Lowpass filter: eliminates the high-frequency
    components generated by the nonlinear
    function.
  • Sampling-rate reduction and amplitude
    compression: the resulting signals are
    represented more economically by resampling
    at a reduced rate and compressing the signal's
    dynamic range.

The role of the final lowpass filter is to
eliminate the undesired spectral peaks
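The per-channel steps after the bandpass filter (nonlinearity, lowpass filter, sampling-rate reduction, amplitude compression) can be sketched for one channel. This minimal version assumes a full-wave rectifier, a one-pole IIR lowpass smoother, decimation by simple sample dropping, and log compression. The smoothing coefficient and decimation factor are illustrative choices, not values from the slides, and the enhancement step is omitted.

```python
import math

def channel_output(x_i, alpha=0.99, decim=100, eps=1e-10):
    """Post-process the output x_i(n) of the ith bandpass filter.

    alpha: pole of the one-pole lowpass filter (assumed value).
    decim: sampling-rate reduction factor (assumed value).
    eps:   floor that keeps the logarithm finite for silent input.
    """
    rectified = [abs(v) for v in x_i]                 # full-wave rectifier (nonlinearity)
    smoothed, y = [], 0.0
    for v in rectified:                               # lowpass: y(n) = a*y(n-1) + (1-a)*v(n)
        y = alpha * y + (1.0 - alpha) * v
        smoothed.append(y)
    reduced = smoothed[::decim]                       # sampling-rate reduction
    return [math.log10(v + eps) for v in reduced]     # amplitude (dynamic-range) compression

# A unit-amplitude sinusoid standing in for one bandpass output.
x_i = [math.sin(2 * math.pi * 0.05 * n) for n in range(4000)]
out = channel_output(x_i)
```

Once the smoother settles, the channel value tracks the mean of the rectified sinusoid (2/pi for unit amplitude) on a log scale, which is the slowly varying band-energy measure the front end is after.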
19
The Bank of Filters Front end Processor
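Assume that the output of the ith bandpass filter,
$x_i(n)$, is a pure sinusoid at frequency $\omega_i$, and
that a full-wave rectifier, $v_i(n) = |x_i(n)|$, is used
as the nonlinearity. The rectified signal then
contains a DC component plus components at even
multiples of $\omega_i$ (none at $\omega_i$ itself), so the
lowpass filter that follows keeps only the DC term,
which is proportional to the sinusoid's amplitude.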
The Bank of Filters Front end Processor


21
Types of Filter Bank Used For Speech Recognition
  • Uniform filter bank
  • Nonuniform filter bank

22
uniform filter bank
  • The most common filter bank is the uniform filter
    bank
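  • The center frequency, $f_i$, of the ith bandpass
    filter is defined as
    $$f_i = \frac{F_s}{N}\, i, \qquad 1 \le i \le Q$$
  • where $F_s$ is the sampling rate and N is the
    number of uniformly spaced filters required to
    span the entire frequency range of the speech
  • Q is the number of filters used in the bank of
    filters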

23
uniform filter bank
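  • The actual number of filters used in the filter
    bank satisfies $Q \le N/2$, with equality meaning that
    the entire frequency range of the speech signal is
    used in the analysis
  • $b_i$ is the bandwidth of the ith filter, which
    generally satisfies $b_i \ge F_s/N$
  • With $b_i = F_s/N$ there is no frequency overlap
    between adjacent filter channels; with $b_i > F_s/N$
    adjacent channels overlap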

24
uniform filter bank
  • If $b_i < F_s/N$, then certain portions of the
    speech spectrum would be missing from the
    analysis, and the resulting speech spectrum would
    not be considered very meaningful
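The uniform design rule and the coverage condition on $b_i$ can be checked directly. This minimal sketch assumes hypothetical values $F_s$ = 8000 Hz and N = 32 and treats each band as $[f_i - b_i/2, f_i + b_i/2]$.

```python
def uniform_filter_bank(fs, N, Q, bandwidth=None):
    """(f_i, b_i) pairs with f_i = (fs/N) * i for 1 <= i <= Q; b_i defaults to fs/N."""
    if Q > N // 2:
        raise ValueError("need Q <= N/2")
    b = fs / N if bandwidth is None else bandwidth
    return [((fs / N) * i, b) for i in range(1, Q + 1)]

def has_gaps(bands):
    """True if the upper edge of some band lies below the lower edge of the next."""
    return any(f1 + b1 / 2 < f2 - b2 / 2
               for (f1, b1), (f2, b2) in zip(bands, bands[1:]))

fs, N = 8000.0, 32                                           # hypothetical values
exact = uniform_filter_bank(fs, N, Q=16)                     # b_i = Fs/N
narrow = uniform_filter_bank(fs, N, Q=16, bandwidth=200.0)   # b_i < Fs/N
print(has_gaps(exact), has_gaps(narrow))  # False True
```

With $b_i = F_s/N$ adjacent bands just touch. Narrowing them leaves holes in the analyzed spectrum, which is the failure case described above.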

25
nonuniform filter bank
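  • The alternative to the uniform filter bank is the
    nonuniform filter bank
  • The criterion is to space the filters uniformly
    along a logarithmic frequency scale.
  • For a set of Q bandpass filters with center
    frequencies $f_i$ and bandwidths $b_i$, $1 \le i \le Q$, we set
    $$b_1 = C, \qquad b_i = \alpha\, b_{i-1}, \quad 2 \le i \le Q$$
    $$f_i = f_1 + \sum_{j=1}^{i-1} b_j + \frac{b_i - b_1}{2}$$
    where C and $f_1$ are the arbitrary bandwidth and
    center frequency of the first filter, and $\alpha$ is
    the logarithmic growth factor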

26
nonuniform filter bank
27
  • The most commonly used values of $\alpha$ are:
  • $\alpha = 2$, which gives octave-band spacing of
    adjacent filters
  • $\alpha = 4/3$, which gives 1/3-octave filter spacing
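The logarithmic spacing rule can be evaluated directly. This minimal sketch implements $b_1 = C$, $b_i = \alpha b_{i-1}$ and $f_i = f_1 + \sum_{j<i} b_j + (b_i - b_1)/2$. The first-filter values ($f_1$ = 150 Hz, C = 100 Hz) are hypothetical. With $\alpha = 2$ each band should start where the previous one ends and be twice as wide.

```python
def nonuniform_filter_bank(f1, C, alpha, Q):
    """Return (f_i, b_i) for i = 1..Q (stored zero-based)."""
    b = [C * alpha ** i for i in range(Q)]                  # b_1 = C, b_i = alpha * b_{i-1}
    f = [f1 + sum(b[:i]) + (b[i] - b[0]) / 2 for i in range(Q)]
    return list(zip(f, b))

octave = nonuniform_filter_bank(f1=150.0, C=100.0, alpha=2.0, Q=5)  # hypothetical values
edges = [(f - b / 2, f + b / 2) for f, b in octave]
print(edges)
# [(100.0, 200.0), (200.0, 400.0), (400.0, 800.0), (800.0, 1600.0), (1600.0, 3200.0)]
```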

28
Implementations of Filter Banks
  • Depending on the design method, the filter bank
    can be implemented in various ways.
  • Design methods for digital filters fall into two
    classes:
  • Infinite impulse response (IIR), or recursive,
    filters
  • Finite impulse response (FIR), or nonrecursive,
    filters

29
  • The FIR (finite impulse response), or
    nonrecursive, filter
  • The present output depends on the present input
    sample and previous input samples
  • The impulse response is restricted to a finite
    number of samples

30
  • Advantages
  • Always stable; less severely affected by noise
  • Excellent design methods are available for
    various kinds of FIR filters
  • Phase response can be made exactly linear
  • Disadvantages
  • Costly to implement
  • Memory requirements and execution time are high
  • Requires powerful computational facilities

31
  • The IIR (infinite impulse response), or
    recursive, filter
  • The present output sample depends on the
    present input, past input samples, and past output
    samples
  • The impulse response extends over an infinite
    duration

32
  • Advantages
  • Simple to design
  • Efficient
  • Disadvantages
  • Phase response is nonlinear
  • More affected by noise
  • Not guaranteed to be stable (all poles must lie
    inside the unit circle)
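The FIR/IIR distinction shows up directly in the impulse responses. This minimal sketch compares a 4-tap FIR averager with a first-order recursive section $y(n) = x(n) + a\,y(n-1)$. The coefficients are illustrative assumptions. The FIR response ends after 4 samples. The IIR response goes on indefinitely: it decays when the pole satisfies |a| < 1 and grows when it does not.

```python
def fir_impulse_response(b, length):
    """Nonrecursive: y(n) = sum_k b[k] x(n-k); the output depends on inputs only."""
    x = [1.0] + [0.0] * (length - 1)          # unit impulse
    return [sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
            for n in range(length)]

def iir_impulse_response(a, length):
    """Recursive first-order section: y(n) = x(n) + a * y(n-1)."""
    y, prev = [], 0.0
    for n in range(length):
        x = 1.0 if n == 0 else 0.0            # unit impulse
        prev = x + a * prev                   # feedback of the previous output
        y.append(prev)
    return y

fir = fir_impulse_response([0.25, 0.25, 0.25, 0.25], 8)
iir_stable = iir_impulse_response(0.5, 8)     # pole at z = 0.5: decays as 0.5**n
iir_unstable = iir_impulse_response(1.5, 8)   # pole outside the unit circle: grows
print(fir)  # [0.25, 0.25, 0.25, 0.25, 0.0, 0.0, 0.0, 0.0]
```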

33
FIR Filters
34
FIR Filters
  • A less expensive implementation can be derived by
    representing each bandpass filter by a fixed
    lowpass window, w(n), modulated by the complex
    exponential: $h_i(n) = w(n)\, e^{j\omega_i n}$

35
Frequency Domain Interpretation For Short Term
Fourier Transform
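$$S_n(e^{j\omega_i}) = \sum_{m=-\infty}^{\infty} s(m)\, w(n-m)\, e^{-j\omega_i m} \qquad \text{(A)}$$
At $n = n_0$,
$$S_{n_0}(e^{j\omega_i}) = \mathrm{FT}\left[s(m)\, w(n_0-m)\right]\Big|_{\omega=\omega_i}$$
where FT[.] denotes the Fourier transform. $S_{n_0}(e^{j\omega_i})$
is the conventional Fourier transform of the
windowed signal, $s(m)w(n_0-m)$, evaluated at the
frequency $\omega = \omega_i$.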
36
Frequency Domain Interpretation For Short Term
Fourier Transform
Shows which part of s(m) is used in the
computation of the short-time Fourier transform
37
Frequency Domain Interpretation For Short Term
Fourier Transform
  • Since w(m) is an FIR filter of length L, from
    the definition of $S_n(e^{j\omega_i})$ we can state that:
  • If L is large relative to the signal periodicity
    (pitch period), $S_n(e^{j\omega_i})$ gives good frequency
    resolution
  • If L is small relative to the signal periodicity,
    $S_n(e^{j\omega_i})$ gives poor frequency resolution (but
    good time resolution)

38
Frequency Domain Interpretation For Short Term
Fourier Transform
For L = 500 points, a Hamming window is applied to a
section of voiced speech. The periodicity of
the signal is seen in the windowed time waveform
as well as in the short-time spectrum, in
which the fundamental frequency and its harmonics
show up as narrow peaks at equally spaced
frequencies.
39
Frequency Domain Interpretation For Short Term
Fourier Transform
For short windows, the time sequence s(m)w(n-m)
doesn't show the signal periodicity, nor does
the signal spectrum. The spectrum does, however,
show the broad spectral envelope very well.
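The long-window versus short-window behaviour can be reproduced with an idealized voiced signal. This sketch assumes a unit impulse train with a pitch period of 100 samples standing in for voiced speech, and evaluates (A) directly with a causal Hamming window. It compares $|S_n|$ midway between harmonics with $|S_n|$ at the first harmonic. With L = 500 (several periods) the harmonic shows up as a sharp peak. With L = 50 (less than one period) only a single pulse is in the window, so the magnitude is flat and no harmonic structure appears.

```python
import cmath
import math

def hamming(L):
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (L - 1)) for k in range(L)]

def stft(s, w, n, omega):
    """Eq. (A) with a causal window of length L: sum_m s(m) w(n-m) e^{-j omega m}."""
    return sum(s[m] * w[n - m] * cmath.exp(-1j * omega * m)
               for m in range(max(0, n - len(w) + 1), n + 1))

P = 100                                                  # hypothetical pitch period (samples)
s = [1.0 if m % P == 0 else 0.0 for m in range(1000)]    # idealized voiced excitation
n = 920                                                  # analysis time

def harmonic_contrast(L):
    """|S_n| midway between harmonics divided by |S_n| at the first harmonic."""
    w = hamming(L)
    at_harmonic = abs(stft(s, w, n, 2 * math.pi / P))
    between = abs(stft(s, w, n, math.pi / P))
    return between / at_harmonic

print(harmonic_contrast(500), harmonic_contrast(50))
```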
40
Frequency Domain Interpretation For Short Term
Fourier Transform
For unvoiced speech, the spectrum shows an irregular
series of local peaks and valleys due to the random
nature of the signal.
41
Frequency Domain Interpretation For Short Term
Fourier Transform
Using a shorter window smooths out the random
fluctuations in the short-time spectral
magnitude and shows the broad spectral envelope
very well.
42
Linear Filtering Interpretation of the short-time
Fourier Transform
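  • The linear filtering interpretation of the
    short-time Fourier transform follows from (A):
    $$S_n(e^{j\omega_i}) = \left[s(n)\, e^{-j\omega_i n}\right] * w(n)$$
  • i.e., $S_n(e^{j\omega_i})$ is the convolution of the lowpass
    window, w(n), with the speech signal, s(n),
    modulated by $e^{-j\omega_i n}$ (which shifts the band
    around $\omega_i$ down to zero frequency)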
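The two readings of (A), a windowed Fourier transform and a modulate-then-lowpass filter, give identical numbers. This sketch checks this on an arbitrary test signal with a 16-point Hamming window. The signal, window length, and $\omega_i$ are illustrative assumptions.

```python
import cmath
import math

def stft_fourier_view(s, w, n, omega):
    """Eq. (A): sum_m s(m) w(n-m) e^{-j omega m}, causal window of length L."""
    return sum(s[m] * w[n - m] * cmath.exp(-1j * omega * m)
               for m in range(max(0, n - len(w) + 1), n + 1))

def stft_filter_view(s, w, omega):
    """Modulate s(n) by e^{-j omega n}, then convolve with the lowpass window w(n)."""
    x = [s[m] * cmath.exp(-1j * omega * m) for m in range(len(s))]
    return [sum(w[k] * x[n - k] for k in range(len(w)) if n - k >= 0)
            for n in range(len(s))]

L = 16
w = [0.54 - 0.46 * math.cos(2 * math.pi * k / (L - 1)) for k in range(L)]  # Hamming
s = [math.sin(0.37 * m) + 0.5 * math.cos(1.3 * m) for m in range(64)]       # test signal
omega_i = 0.3 * math.pi
filtered = stft_filter_view(s, w, omega_i)
max_diff = max(abs(filtered[n] - stft_fourier_view(s, w, n, omega_i))
               for n in range(len(s)))
print(max_diff)
```

The filtering view yields the whole time sequence $S_n(e^{j\omega_i})$ for one channel in a single pass. This is how one channel of the filter bank is run.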

43
FFT Implementation of Uniform Filter Bank Based
on the Short-Time FT
44
FFT Implementation of Uniform Filter Bank Based
on the Short-Time FT
45
FFT Implementation of Uniform Filter Bank Based
on The Short Time FT
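  • The FFT implementation is more efficient than
    the direct-form structure: all N uniformly spaced
    channel outputs at a given time come from one
    N-point FFT of the windowed segment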
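The efficiency claim can be seen by computing all N channel outputs $S_n(e^{j2\pi k/N})$ at one time n in two ways: directly (O(N²) operations) and with one FFT of the windowed segment (O(N log N)). This sketch assumes the window length equals the number of channels N, a power of two. It uses a hand-written radix-2 FFT to stay self-contained, where practical code would call a library FFT.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / N) * odd[k] for k in range(N // 2)]
    return [even[k] + tw[k] for k in range(N // 2)] + [even[k] - tw[k] for k in range(N // 2)]

def bank_direct(s, w, n):
    """S_n(e^{j 2 pi k / N}), k = 0..N-1, by direct summation: O(N^2) per time n."""
    N = len(w)
    return [sum(s[m] * w[n - m] * cmath.exp(-2j * math.pi * k * m / N)
                for m in range(n - N + 1, n + 1))
            for k in range(N)]

def bank_fft(s, w, n):
    """Same N channel outputs from one FFT of the windowed segment: O(N log N)."""
    N = len(w)
    start = n - N + 1                                   # first sample under the window
    segment = [s[start + r] * w[N - 1 - r] for r in range(N)]
    X = fft(segment)
    # The segment starts at time `start`, not 0, so restore the linear-phase factor.
    return [cmath.exp(-2j * math.pi * k * start / N) * X[k] for k in range(N)]

N = 32
w = [0.54 - 0.46 * math.cos(2 * math.pi * i / (N - 1)) for i in range(N)]  # Hamming
s = [math.sin(0.4 * m) + 0.3 * math.sin(1.7 * m) for m in range(100)]       # test signal
bank_fft_out = bank_fft(s, w, n=80)
bank_direct_out = bank_direct(s, w, n=80)
```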

46
Nonuniform FIR Filter Bank Implementations
The most general form of a nonuniform FIR filter
bank
47
Nonuniform FIR Filter Bank Implementations
  • The kth bandpass filter impulse response, $h_k(n)$,
    represents a filter with center frequency $\omega_k$
    and bandwidth $\Delta\omega_k$.
  • The set of Q bandpass filters covers the
    frequency range of interest for the intended
    speech recognition application

48
Nonuniform FIR Filter Bank Implementations
  • Each band pass filter is implemented via a direct
    convolution
  • Each band pass filter is designed via the
    windowing design method
  • The composite frequency response of the Q-channel
    filter bank is independent of the number and
    distribution of the individual filters
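A single channel of such a bank can be sketched as follows: design the bandpass filter by windowing the ideal impulse response, then run it by direct convolution. This minimal version assumes a Hamming window, an odd length L = 101, and a hypothetical passband from 0.2π to 0.4π. A tone at the band center should pass with gain close to 1, and a tone well outside the band should be strongly attenuated.

```python
import cmath
import math

def ideal_bandpass(n, w_lo, w_hi):
    """Impulse response of an ideal bandpass filter with passband w_lo..w_hi (rad/sample)."""
    if n == 0:
        return (w_hi - w_lo) / math.pi
    return (math.sin(w_hi * n) - math.sin(w_lo * n)) / (math.pi * n)

def window_design_bandpass(L, w_lo, w_hi):
    """Ideal response delayed by (L-1)/2 samples, times a Hamming window (L odd)."""
    M = (L - 1) // 2
    return [ideal_bandpass(n - M, w_lo, w_hi)
            * (0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)))
            for n in range(L)]

def magnitude_response(h, omega):
    """|H(e^{j omega})| = |sum_n h(n) e^{-j omega n}|."""
    return abs(sum(v * cmath.exp(-1j * omega * n) for n, v in enumerate(h)))

h = window_design_bandpass(101, 0.2 * math.pi, 0.4 * math.pi)
x = [math.cos(0.3 * math.pi * n) for n in range(300)]        # tone at the band center
y = [sum(h[m] * x[n - m] for m in range(len(h)) if n - m >= 0)  # direct convolution
     for n in range(len(x))]
```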

49
Nonuniform FIR Filter Bank Implementations
A filter bank with three filters has exactly the
same composite frequency response as the
filter bank with seven filters shown in the
figure above.
50
Nonuniform FIR Filter Bank Implementations
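  • The impulse response of the kth bandpass filter:
    $$h_k(n) = \tilde{h}_k(n)\, w(n)$$
  • The frequency response of the kth bandpass filter:
    $$H_k(e^{j\omega}) = \tilde{H}_k(e^{j\omega}) \circledast W(e^{j\omega})$$
  • where $\tilde{h}_k(n)$ is the impulse response of the
    ideal bandpass filter, $w(n)$ is the FIR window,
    and $\circledast$ denotes (periodic) convolution in frequency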

51
Nonuniform FIR Filter Bank Implementations
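  • Thus the frequency response of the composite
    filter bank is
    $$H(e^{j\omega}) = \sum_{k=1}^{Q} H_k(e^{j\omega}) = W(e^{j\omega}) \circledast \sum_{k=1}^{Q} \tilde{H}_k(e^{j\omega}) \qquad (1)$$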

52
Nonuniform FIR Filter Bank Implementations
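  • Because the ideal passbands are contiguous,
    $\sum_{k=1}^{Q} \tilde{H}_k(e^{j\omega}) = 1$ for $\omega_{min} \le |\omega| \le \omega_{max}$
    and 0 otherwise, where $\omega_{min}$ is the lowest
    frequency in the filter bank and $\omega_{max}$ is the
    highest frequency
  • Equation (1) can therefore be written as
    $$H(e^{j\omega}) = W(e^{j\omega}) \circledast \tilde{H}(e^{j\omega})$$
    where $\tilde{H}(e^{j\omega})$ is a single ideal bandpass filter
    with passband $\omega_{min} \le |\omega| \le \omega_{max}$,
  • which is independent of the number of ideal
    filters, Q, and their distribution in
    frequency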
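The independence from Q and from the filter distribution can be confirmed numerically. Each windowed ideal bandpass impulse response is $w(n)$ times a difference of two sine terms, so summing contiguous bands telescopes to the single band from $\omega_{min}$ to $\omega_{max}$. This sketch assumes a Hamming window of length 101 and hypothetical band edges (not those of the figure). It compares Q = 1, Q = 3 and a nonuniform Q = 7 partition of 0.1π to 0.7π.

```python
import math

def ideal_bandpass(n, w_lo, w_hi):
    """Impulse response of an ideal bandpass filter with passband w_lo..w_hi (rad/sample)."""
    if n == 0:
        return (w_hi - w_lo) / math.pi
    return (math.sin(w_hi * n) - math.sin(w_lo * n)) / (math.pi * n)

def composite_impulse_response(edges, L=101):
    """Sum of windowed ideal bandpass filters with passbands [edges[k], edges[k+1]]."""
    M = (L - 1) // 2
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)) for n in range(L)]
    total = [0.0] * L
    for w_lo, w_hi in zip(edges, edges[1:]):
        for n in range(L):
            total[n] += ideal_bandpass(n - M, w_lo, w_hi) * window[n]
    return total

pi = math.pi
single = composite_impulse_response([0.1 * pi, 0.7 * pi])                        # Q = 1
three = composite_impulse_response([0.1 * pi, 0.3 * pi, 0.5 * pi, 0.7 * pi])     # Q = 3
seven = composite_impulse_response(
    [pi * f for f in (0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7)])                  # Q = 7
```

Because the composite impulse responses agree sample for sample, their frequency responses agree as well.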


53
FFT-Based Nonuniform Filter Banks
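  • Nonuniformity can be created by combining two or
    more adjacent uniform channels
  • Consider taking an N-point DFT of the sequence
    x(n), $X_k = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi kn/N}$, and
    combining channels k and k+1 into $X'_k = X_k + X_{k+1}$

54
FFT-Based Nonuniform Filter Banks
  • The equivalent kth channel value, $X'_k$, can be
    obtained by weighting the sequence x(n) by the
    complex sequence $2e^{-j\pi n/N}\cos(\pi n/N)$ before
    taking the kth DFT value:
    $$X'_k = \sum_{n=0}^{N-1} \left[2e^{-j\pi n/N}\cos(\pi n/N)\, x(n)\right] e^{-j2\pi kn/N}$$
  • If more than two channels are combined, then a
    different equivalent weighting sequence results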
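The weighting identity holds because $2e^{-j\pi n/N}\cos(\pi n/N) = 1 + e^{-j2\pi n/N}$. This sketch checks it numerically on arbitrary test data for k = 5, N = 64. The values are illustrative only.

```python
import cmath
import math

def dft_bin(x, k):
    """X_k = sum_{n=0}^{N-1} x(n) e^{-j 2 pi k n / N}."""
    N = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * n / N) for n, v in enumerate(x))

def combined_channel(x, k):
    """X'_k: bin k of the DFT of x(n) weighted by 2 e^{-j pi n/N} cos(pi n/N)."""
    N = len(x)
    weighted = [2 * cmath.exp(-1j * math.pi * n / N) * math.cos(math.pi * n / N) * v
                for n, v in enumerate(x)]
    return dft_bin(weighted, k)

x = [math.sin(0.9 * n) + 0.25 * math.cos(2.1 * n) for n in range(64)]  # arbitrary data
k = 5
combined = combined_channel(x, k)
direct_sum = dft_bin(x, k) + dft_bin(x, k + 1)
print(abs(combined - direct_sum))
```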

55
Tree Structure Realizations of Nonuniform Filter
Banks
  • In this method the speech signal is filtered in
    stages, and the sampling rate is successively
    reduced at each stage

56
Tree Structure Realizations of Nonuniform Filter
Banks
57
Tree Structure Realizations of Nonuniform Filter
Banks
  • The original speech signal, s(n), is filtered
    initially into two bands, a low band and a high
    band
  • The high band is downsampled by 2 and represents
    the highest octave band ($\pi/2 \le \omega \le \pi$) of the
    filter bank.
  • The low band is similarly downsampled by 2 and
    fed into a second filtering stage in which the
    signal is again split into two equal bands.
  • Again, the high band of stage 2 is downsampled
    by 2 and used as the next-highest filter bank
    output ($\pi/4 \le \omega \le \pi/2$).

58
Tree Structure Realizations of Nonuniform Filter
Banks
  • The low band is also downsampled by 2 and fed
    into a third stage of filters
  • The two third-stage outputs, after downsampling
    by a factor of 2, are used as the two lowest
    filter bands
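The three-stage tree can be sketched with the simplest possible two-band split. This minimal version assumes the two-tap Haar pair (scaled sum and difference) as the low/high filters, followed by downsampling by 2. Real designs use much sharper half-band filters, so the octave boundaries here are only approximate. Starting from 64 samples, the outputs should hold 32, 16, 8 and 8 samples for the bands from highest to lowest. Because the Haar split is orthonormal, total energy is preserved.

```python
import math

def split_and_decimate(x):
    """Two-band split followed by downsampling by 2 (Haar filter pair, assumed)."""
    r = 1 / math.sqrt(2)
    low = [r * (x[2 * n] + x[2 * n + 1]) for n in range(len(x) // 2)]
    high = [r * (x[2 * n] - x[2 * n + 1]) for n in range(len(x) // 2)]
    return low, high

def tree_filter_bank(s, stages=3):
    """Octave-band outputs, highest band first; the last low band is the lowest band."""
    outputs, low = [], s
    for _ in range(stages):
        low, high = split_and_decimate(low)
        outputs.append(high)      # high band of this stage is a filter bank output
    outputs.append(low)           # after the final stage both halves are kept
    return outputs

s = [math.sin(0.2 * n) for n in range(64)]
bands = tree_filter_bank(s)
print([len(b) for b in bands])  # [32, 16, 8, 8]
```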

59
Summary of considerations for speech recognition
filter banks
  • 1st. Type of digital filter used: IIR (recursive)
    or FIR (nonrecursive)
  • IIR advantage: simple to implement and
    efficient
  • IIR disadvantage: phase response is nonlinear
  • FIR advantage: phase response is linear
  • FIR disadvantage: expensive to implement

60
Summary of considerations for speech recognition
filter banks
  • 2nd. The number of filters to be used in the
    filter bank
  • For uniform filter banks, the number of filters,
    Q, cannot be too small, or else the ability of
    the filter bank to resolve the speech spectrum is
    greatly damaged. Values of Q less than 8 are
    generally avoided
  • The value of Q cannot be too large either, because
    the filter bandwidths would eventually be too
    narrow for some talkers (e.g., high-pitched
    females), i.e., no prominent harmonics would fall
    within the band (in practical systems, $Q \le 32$).
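The upper limit on Q can be illustrated with simple arithmetic. This sketch assumes a hypothetical high-pitched talker with $F_0$ = 300 Hz and a uniform bank covering 0-4000 Hz. It counts the bands that contain no harmonic $kF_0$ at all. With Q = 8 (500 Hz bands) every band catches a harmonic. Once the bandwidth falls below $F_0$ (Q = 16 or 32), some bands contain none.

```python
import math

def bands_without_harmonics(Q, f0, f_max=4000.0):
    """Count uniform bands [i*w, (i+1)*w), w = f_max/Q, containing no harmonic k*f0 (k >= 1)."""
    width = f_max / Q
    empty = 0
    for i in range(Q):
        lo, hi = i * width, (i + 1) * width
        k = max(1, math.ceil(lo / f0))      # first harmonic at or above lo
        if k * f0 >= hi:
            empty += 1
    return empty

f0 = 300.0                                   # hypothetical high-pitched talker
for Q in (8, 16, 32):
    print(Q, 4000.0 / Q, bands_without_harmonics(Q, f0))
```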

61
Summary of considerations for speech recognition
filter banks
  • In order to reduce overall computation, many
    practical systems have used nonuniformly spaced
    filter banks

62
Summary of considerations for speech recognition
filter banks
  • 3rd. The choice of nonlinearity and lowpass
    filter (LPF) used at the output of each channel
  • Nonlinearity: full-wave or half-wave rectifier
  • LPF: varies from a simple integrator to a
    good-quality IIR lowpass filter
