Title: Chap 6. Speech Signal Representations
- Short-time Fourier analysis
- (1) The speech signal is time-variant
- (2) It is stationary over short intervals (short-time stationary)
- (3) Time domain → frequency domain using the FFT
- (4) Windowing function (FFT size)
- The Hamming window is the most frequently used window
- For 8 kHz sampling: FFT size 256 (a 240-sample, 30 ms window, zero-padded to 256)
- For 16 kHz sampling: FFT size 512 (a 480-sample, 30 ms window, zero-padded to 512)
- (5) Spectrograms
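- Narrowband vs. wideband spectrogram (FFT resolution, output filter): a long window gives a narrowband spectrogram that resolves individual pitch harmonics; a short window gives a wideband spectrogram that resolves formants with better time resolution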
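A minimal NumPy sketch of short-time Fourier analysis with the settings listed above: a 30 ms Hamming window, zero-padded to the next power of two (256 points at 8 kHz, 512 at 16 kHz). The 10 ms hop and the function name `stft_magnitude` are assumptions for illustration. Increasing `win_ms` gives finer frequency resolution (a more narrowband spectrogram); decreasing it gives a wideband one.

```python
import numpy as np

def stft_magnitude(x, fs, win_ms=30.0, hop_ms=10.0, nfft=None):
    """Short-time Fourier magnitude with a Hamming window and a zero-padded FFT."""
    win_len = int(round(fs * win_ms / 1000.0))   # 240 samples at 8 kHz
    hop = int(round(fs * hop_ms / 1000.0))       # 80 samples at 8 kHz (assumed hop)
    if nfft is None:
        # Next power of two >= window length: 256 at 8 kHz, 512 at 16 kHz
        nfft = 1 << (win_len - 1).bit_length()
    window = np.hamming(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[t * hop:t * hop + win_len] * window for t in range(n_frames)])
    # rfft zero-pads each windowed frame from win_len to nfft samples
    return np.abs(np.fft.rfft(frames, n=nfft, axis=1))

fs = 8000
t = np.arange(fs) / fs                     # 1 s test signal
x = np.sin(2 * np.pi * 1000.0 * t)         # 1 kHz tone
S = stft_magnitude(x, fs)
print(S.shape)                             # (98, 129): frames x (nfft // 2 + 1) bins
print(np.argmax(S, axis=1)[:3] * fs / 256) # peak bin of the first frames, in Hz
```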
Source-Filter Model
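- Nasal sounds (airflow through the nostrils): an all-pole model is not good enough, because the nasal cavity introduces zeros (antiresonances)
- Lip radiation model: usually $1 - \alpha z^{-1}$, with $\alpha$ = 0.9, 0.95, or 0.97
- All-pole model (LPC)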
Linear Predictive Coding (LPC)
- Linear prediction (AR model)
- Model order from the lossless tube model (the lattice formulation is introduced later): with 8 kHz sampling, c = 340 m/s, and L = 17 cm, N = 8 (2 poles per 1 kHz)
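The AR model and the order estimate can be written out as follows. This assumes the usual idealization of a uniform lossless tube closed at the glottis and open at the lips, whose resonances are spaced c/(2L) = 340/0.34 = 1000 Hz apart; below the 4 kHz Nyquist frequency that gives 4 formants, hence 2 × 4 = 8 poles.

```latex
s[n] \approx \sum_{k=1}^{N} a_k\, s[n-k], \qquad
H(z) = \frac{G}{1 - \sum_{k=1}^{N} a_k z^{-k}}

N = \frac{2 L F_s}{c} = \frac{2 \times 0.17\ \text{m} \times 8000\ \text{Hz}}{340\ \text{m/s}} = 8
```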
- Solve for the linear prediction coefficients using the MMSE criterion
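Setting the derivative of the squared prediction error to zero gives the normal equations that both the covariance and the autocorrelation methods solve. The covariance method sums n over a fixed analysis interval, so φ(i, k) is symmetric but not Toeplitz; the autocorrelation method windows the signal first, so φ(i, k) = R(|i - k|) and the matrix becomes Toeplitz.

```latex
E = \sum_{n} e^{2}[n] = \sum_{n}\Big(s[n] - \sum_{k=1}^{N} a_k\, s[n-k]\Big)^{2}

\frac{\partial E}{\partial a_i} = 0
\;\Longrightarrow\;
\sum_{k=1}^{N} a_k\, \phi(i,k) = \phi(i,0), \quad i = 1,\dots,N,
\qquad
\phi(i,k) = \sum_{n} s[n-i]\, s[n-k]
```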
- Solution of the covariance method using Cholesky decomposition
- (1) Solve for V and D in $\Phi = V D V^{T}$ (V unit lower triangular, D diagonal)
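A NumPy sketch of the covariance method with a square-root-free Cholesky (LDL^T) factorization, Φ = V D V^T. Step (1) is the one listed above; the forward substitution V y = ψ and the back substitution V^T a = D^{-1} y are the usual way to finish and are written out here as an assumption. The function name `covariance_lpc` is hypothetical, p is the predictor order (N above), and the sum over n runs from p to len(s) - 1 (an assumed analysis interval).

```python
import numpy as np

def covariance_lpc(s, p):
    """Covariance-method LPC solved with an LDL^T (square-root-free Cholesky) factorization."""
    s = np.asarray(s, dtype=float)
    N = len(s)
    # Row i-1 holds s[n-i] for n = p..N-1, so Phi[i-1, k-1] = phi(i, k)
    X = np.array([s[p - i:N - i] for i in range(1, p + 1)])
    Phi = X @ X.T              # covariance matrix phi(i, k), i, k = 1..p
    psi = X @ s[p:N]           # right-hand side phi(i, 0)

    # (1) Factor Phi = V D V^T with V unit lower triangular and D = diag(d)
    V = np.eye(p)
    d = np.zeros(p)
    for j in range(p):
        d[j] = Phi[j, j] - np.sum(V[j, :j] ** 2 * d[:j])
        for i in range(j + 1, p):
            V[i, j] = (Phi[i, j] - np.sum(V[i, :j] * V[j, :j] * d[:j])) / d[j]

    # (2) Forward substitution: V y = psi
    y = np.zeros(p)
    for i in range(p):
        y[i] = psi[i] - V[i, :i] @ y[:i]

    # (3) Back substitution: V^T a = D^{-1} y
    z = y / d
    a = np.zeros(p)
    for i in reversed(range(p)):
        a[i] = z[i] - V[i + 1:, i] @ a[i + 1:]
    return a

# Impulse response of s[n] = 1.3 s[n-1] - 0.6 s[n-2]: exactly predictable for n >= 2
s = np.zeros(50)
s[0], s[1] = 1.0, 1.3
for n in range(2, 50):
    s[n] = 1.3 * s[n - 1] - 0.6 * s[n - 2]
print(covariance_lpc(s, 2))   # recovers the AR coefficients 1.3 and -0.6
```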
- Autocorrelation method: R is Toeplitz
- Solve it using the Levinson-Durbin algorithm
- The algorithm transforms the ladder filter → lattice filter, and the lattice filter → cascade form
- $E_i$: the squared prediction error of the order-i predictor
- $k_i$: the coefficients of the lattice filter (reflection coefficients)
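A sketch of the Levinson-Durbin recursion in NumPy, returning the predictor coefficients, the reflection coefficients k_i, and the prediction errors E_i named above. It assumes the sign convention a_j^(i) = a_j^(i-1) - k_i a_{i-j}^(i-1); some texts define k_i with the opposite sign (PARCOR coefficients). The function name and the AR(2)-colored noise test signal are hypothetical.

```python
import numpy as np

def levinson_durbin(r, p):
    """Solve the Toeplitz normal equations R a = r[1..p] in O(p^2) operations.

    Returns predictor coefficients a_1..a_p, reflection coefficients k_1..k_p,
    and the prediction errors E_0..E_p.
    """
    a = np.zeros(p + 1)              # a[0] is unused; a[j] holds a_j
    k = np.zeros(p + 1)
    E = np.zeros(p + 1)
    E[0] = r[0]
    for i in range(1, p + 1):
        # k_i = (r(i) - sum_{j=1}^{i-1} a_j r(i-j)) / E_{i-1}
        k[i] = (r[i] - a[1:i] @ r[i - 1:0:-1]) / E[i - 1]
        new_a = a.copy()
        new_a[i] = k[i]
        # a_j^(i) = a_j^(i-1) - k_i a_{i-j}^(i-1), j = 1..i-1
        new_a[1:i] = a[1:i] - k[i] * a[i - 1:0:-1]
        a = new_a
        E[i] = (1.0 - k[i] ** 2) * E[i - 1]
    return a[1:], k[1:], E

rng = np.random.default_rng(0)
x = rng.standard_normal(400)
for n in range(2, 400):              # color the noise with an AR(2) recursion
    x[n] += 1.3 * x[n - 1] - 0.6 * x[n - 2]
r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + 10]   # lags 0..10
a, k, E = levinson_durbin(r, 10)
print(np.round(k, 3))
```

Because each step multiplies the error by (1 - k_i^2), |k_i| < 1 guarantees that E_i never increases with the order.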
- Lattice filter
- Define forward/backward prediction errors
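The forward and backward prediction errors of a lattice analysis filter can be computed stage by stage, as sketched below with zero initial state. The recursions assume the same sign convention as a_j^(i) = a_j^(i-1) - k_i a_{i-j}^(i-1), so the last-stage forward error equals the direct-form prediction error; for example, k = (0.5, -0.3) corresponds to a = (0.65, -0.3). The function name is hypothetical.

```python
import numpy as np

def lattice_analysis(s, k):
    """Forward/backward prediction errors of a lattice analysis filter.

    f_i[n] = f_{i-1}[n] - k_i b_{i-1}[n-1]
    b_i[n] = b_{i-1}[n-1] - k_i f_{i-1}[n]
    with f_0[n] = b_0[n] = s[n] and zero initial state.
    """
    f = np.asarray(s, dtype=float).copy()
    b = f.copy()
    for ki in k:
        b_delayed = np.concatenate(([0.0], b[:-1]))   # b_{i-1}[n-1]
        # Both updates use the stage i-1 values (tuple assignment)
        f, b = f - ki * b_delayed, b_delayed - ki * f
    return f, b

s = np.random.default_rng(1).standard_normal(50)
f, b = lattice_analysis(s, [0.5, -0.3])
# Direct form with a = (0.65, -0.3): e[n] = s[n] - 0.65 s[n-1] + 0.3 s[n-2]
e = s.copy()
e[1:] -= 0.65 * s[:-1]
e[2:] += 0.3 * s[:-2]
print(np.max(np.abs(f - e)))   # close to zero
```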
Spectral Analysis vs. LPC
- Prediction error vs. LPC order
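To relate LPC to spectral analysis, the sketch below computes the LPC spectral envelope G/|A(e^{jω})| and the normalized prediction error E_p/R(0) for orders 1 to 12. For brevity it solves the Toeplitz system directly with `np.linalg.solve` rather than with Levinson-Durbin; the function names and the AR(2) test signal are hypothetical. Since a higher-order predictor can always reproduce a lower-order one, the error never increases with the order; for this signal one would expect most of the drop by order 2.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Biased autocorrelation r(0..max_lag)."""
    full = np.correlate(x, x, mode="full")
    return full[len(x) - 1:len(x) + max_lag]

def lpc_from_autocorr(r, p):
    """Order-p autocorrelation-method LPC via a direct Toeplitz solve."""
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    a = np.linalg.solve(R, r[1:p + 1])
    err = r[0] - a @ r[1:p + 1]          # minimum prediction error E_p (= G^2)
    return a, err

def lpc_envelope(a, err, nfft=512):
    """|H(e^{jw})| = G / |A(e^{jw})| with A(z) = 1 - sum_k a_k z^-k."""
    A = np.fft.rfft(np.concatenate(([1.0], -a)), nfft)
    return np.sqrt(err) / np.abs(A)

rng = np.random.default_rng(0)
x = rng.standard_normal(400)
for n in range(2, 400):                  # hypothetical AR(2) test signal
    x[n] += 1.3 * x[n - 1] - 0.6 * x[n - 2]
x *= np.hamming(len(x))                  # window, as in the autocorrelation method

r = autocorrelation(x, 12)
norm_err = [lpc_from_autocorr(r, p)[1] / r[0] for p in range(1, 13)]
a, err = lpc_from_autocorr(r, 10)
env = lpc_envelope(a, err)               # envelope on nfft // 2 + 1 frequency bins
print(np.round(norm_err, 4))
```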
Conversion Between Parameters
- Reflection coefficients vs. LPC
- Log-area ratios
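The conversions can be sketched as the step-up recursion (reflection coefficients to predictor coefficients), its inverse step-down recursion, and log-area ratios. The sketch assumes the convention a_j^(i) = a_j^(i-1) - k_i a_{i-j}^(i-1) and g_i = log((1 - k_i)/(1 + k_i)); texts that flip the sign of k_i also flip the ratio. The function names are hypothetical, and step-down needs |k_i| < 1 (a stable filter).

```python
import numpy as np

def step_up(k):
    """Reflection coefficients k_1..k_p -> predictor coefficients a_1..a_p."""
    a = np.zeros(0)
    for ki in k:
        # a_j^(i) = a_j^(i-1) - k_i a_{i-j}^(i-1), then append a_i^(i) = k_i
        a = np.concatenate((a - ki * a[::-1], [ki]))
    return a

def step_down(a):
    """Predictor coefficients a_1..a_p -> reflection coefficients k_1..k_p."""
    a = np.asarray(a, dtype=float)
    k = np.zeros(len(a))
    for i in range(len(a), 0, -1):
        ki = a[-1]                       # k_i = a_i^(i)
        k[i - 1] = ki
        # a_j^(i-1) = (a_j^(i) + k_i a_{i-j}^(i)) / (1 - k_i^2)
        a = (a[:-1] + ki * a[:-1][::-1]) / (1.0 - ki ** 2)
    return k

def log_area_ratios(k):
    """g_i = log((1 - k_i) / (1 + k_i)), i.e. log of the ratio of adjacent tube areas."""
    k = np.asarray(k, dtype=float)
    return np.log((1.0 - k) / (1.0 + k))

k = np.array([0.5, -0.3])
a = step_up(k)
print(a)                  # a_1, a_2
print(step_down(a))       # recovers k
print(log_area_ratios(k))
```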
Cepstral Processing
- Spectrum vs. cepstrum
- The cepstrum is a homomorphic transformation (deconvolution)
- Block diagram of the homomorphic system
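A minimal sketch of the real-cepstrum path x[n] → DFT → log|·| → inverse DFT → c[n], used here to check the homomorphic (deconvolution) property: the cepstrum of a convolution equals the sum of the cepstra. This is the magnitude-only version; the complex cepstrum, which also keeps the unwrapped phase, is not shown. The random "source" and "filter" sequences are hypothetical, and the FFT length must cover the full linear convolution so the DFT product is not circularly wrapped.

```python
import numpy as np

def real_cepstrum(x, nfft):
    """Real cepstrum: inverse DFT of log |DFT(x)|, with x zero-padded to nfft."""
    spectrum = np.fft.fft(x, nfft)
    return np.fft.ifft(np.log(np.abs(spectrum))).real

rng = np.random.default_rng(0)
excitation = rng.standard_normal(64)   # hypothetical "source"
filt = rng.standard_normal(16)         # hypothetical "filter" impulse response
nfft = 128                             # >= 64 + 16 - 1, so no circular wrap
c_conv = real_cepstrum(np.convolve(excitation, filt), nfft)
c_sum = real_cepstrum(excitation, nfft) + real_cepstrum(filt, nfft)
print(np.max(np.abs(c_conv - c_sum)))  # close to zero: convolution became addition
```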
- Cepstrum of a pole-zero transfer function
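For a minimum-phase pole-zero model (poles p_k and zeros q_k all inside the unit circle and G > 0, assumptions made here so that the result is causal), expanding each logarithm as a power series gives the complex cepstrum in closed form. Maximum-phase factors would contribute at n < 0 instead. The 1/n factor is why the vocal-tract part of the cepstrum is concentrated at low quefrency.

```latex
H(z) = G\,\frac{\prod_{k=1}^{M_z}\bigl(1 - q_k z^{-1}\bigr)}{\prod_{k=1}^{M_p}\bigl(1 - p_k z^{-1}\bigr)},
\qquad |p_k| < 1,\ |q_k| < 1

\log\bigl(1 - \rho z^{-1}\bigr) = -\sum_{n=1}^{\infty} \frac{\rho^{n}}{n}\, z^{-n}, \qquad |z| > |\rho|

\hat{h}[n] =
\begin{cases}
\log G, & n = 0 \\
\displaystyle \sum_{k=1}^{M_p} \frac{p_k^{\,n}}{n} - \sum_{k=1}^{M_z} \frac{q_k^{\,n}}{n}, & n > 0 \\
0, & n < 0
\end{cases}
```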
LPC-Derived Cepstrum
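For an all-pole model the cepstrum can be computed directly from the LPC coefficients, without any FFT, using the standard recursion below. It assumes H(z) = G / (1 - Σ a_k z^{-k}), the same predictor convention used for LPC here. The function name is hypothetical; the check uses a single pole at z = 0.9, whose exact cepstrum is 0.9^n / n.

```python
import numpy as np

def lpc_to_cepstrum(a, gain, n_ceps):
    """Cepstrum c_0..c_{n_ceps} of H(z) = G / (1 - sum_k a_k z^-k).

    c_0 = log G
    c_n = a_n + sum_{k=1}^{n-1} (k/n) c_k a_{n-k}    (a_m = 0 for m > p)
    """
    p = len(a)
    c = np.zeros(n_ceps + 1)
    c[0] = np.log(gain)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        # Only terms with 1 <= n - k <= p contribute
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c

# Single pole at z = 0.9: the exact cepstrum is 0.9**n / n for n >= 1
c = lpc_to_cepstrum([0.9], gain=1.0, n_ceps=10)
print(c[1:4])
```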
- Cepstrum of a speech signal
- Periodic excitation (pulse train)
- → an impulse train with the same period in the cepstral domain
- Cepstrum of a windowed signal
- Example (from Fundamentals of Speech Recognition, by L. Rabiner and B. H. Juang)
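A small NumPy experiment illustrates the point above: a periodic pulse excitation with period P appears as cepstral peaks at multiples of P, well separated from the low-quefrency part contributed by the vocal-tract filter. The signal is hypothetical: pulses every 80 samples (100 Hz at 8 kHz) with amplitudes 0.5^m, which loosely mimic a window taper and keep the example minimum-phase so the first peak is predictably about 0.5/2 = 0.25, convolved with the impulse response of 1/(1 - 0.9z^{-1}). Picking the largest peak in an assumed 20 to 200 sample quefrency range then gives a pitch estimate.

```python
import numpy as np

def real_cepstrum(x, nfft):
    """Real cepstrum: inverse DFT of log |DFT(x)|."""
    return np.fft.ifft(np.log(np.abs(np.fft.fft(x, nfft)))).real

fs = 8000
period = 80                                   # hypothetical 100 Hz pitch
excitation = np.zeros(481)
excitation[::period] = 0.5 ** np.arange(7)    # pulses at n = 0, 80, ..., 480
h = 0.9 ** np.arange(200)                     # truncated impulse response of 1/(1 - 0.9 z^-1)
voiced = np.convolve(excitation, h)           # 680 samples, fits in the 1024-point FFT

c = real_cepstrum(voiced, 1024)
search = np.arange(20, 201)                   # assumed quefrency search range
peak = search[np.argmax(c[search])]
print(peak, fs / peak, c[peak])               # peak quefrency, pitch in Hz, peak height
```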
Mel-Frequency Cepstral Coefficients (MFCC)
- Warp the frequency scale to the mel scale
- Frequency quantization?
- Number of mel filters: M = 20 for Fs = 8 kHz, M = 24 for Fs = 16 kHz
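A compact MFCC sketch using the filter counts above (M = 20 at 8 kHz, M = 24 at 16 kHz) and the 30 ms Hamming-windowed frames from the short-time analysis. Several details are assumptions rather than statements from the slides: the mel formula 2595 log10(1 + f/700) (one common convention), triangular filters with edges equally spaced in mel from 0 to Fs/2, 13 output coefficients, and an unnormalized DCT-II. The function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, nfft, fs):
    """Triangular filters with edges equally spaced on the mel scale, 0 .. fs/2."""
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_edges) / fs).astype(int)
    fb = np.zeros((n_filters, nfft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):         # rising edge
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):        # falling edge, peak value 1 at center
            fb[m - 1, k] = (right - k) / (right - center)
    return fb

def mfcc(frame, fs, n_filters, nfft, n_ceps=13):
    """Hamming window -> power spectrum -> mel filterbank -> log -> DCT-II."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed, nfft)) ** 2
    energies = mel_filterbank(n_filters, nfft, fs) @ power
    log_energy = np.log(np.maximum(energies, 1e-10))   # floor avoids log(0)
    m = np.arange(n_filters)
    n = np.arange(n_ceps)[:, None]
    dct = np.cos(np.pi * n * (m + 0.5) / n_filters)    # unnormalized DCT-II basis
    return dct @ log_energy

rng = np.random.default_rng(0)
c8 = mfcc(rng.standard_normal(240), fs=8000, n_filters=20, nfft=256)     # 30 ms at 8 kHz
c16 = mfcc(rng.standard_normal(480), fs=16000, n_filters=24, nfft=512)   # 30 ms at 16 kHz
print(c8.shape, c16.shape)
```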
Pitch Detection of Speech Signals
- Speech is a quasi-periodic signal, because it is time-varying
- Find the pitch frequency (fundamental frequency, F0)
- → find the period of a discrete-time signal
- Assume the pitch contour is continuous
- → find a smooth pitch contour
- → a smoothing/contour-tracking algorithm is needed
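One simple smoothing option, assumed here rather than taken from the slides, is a running median over neighboring frames: it replaces an isolated half-pitch or double-pitch value without blurring the rest of the contour the way a moving average would. The function name `median_smooth` and the 3-frame width are hypothetical.

```python
import numpy as np

def median_smooth(contour, width=3):
    """Running median of an F0 contour (odd width); edges are padded by repetition."""
    half = width // 2
    padded = np.pad(np.asarray(contour, dtype=float), half, mode="edge")
    return np.array([np.median(padded[i:i + width]) for i in range(len(contour))])

# 50 Hz in the middle is a half-pitch error in an otherwise ~100 Hz contour
print(median_smooth([100, 101, 50, 102, 103]))
```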
- Autocorrelation method
- The autocorrelation function of a periodic signal is also periodic → the time shift (lag) with the maximum autocorrelation, excluding lag 0, gives the period
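A minimal autocorrelation pitch detector following the bullet above: compute r(τ) for one frame and pick the lag with the largest value inside an assumed F0 search range of 50 to 400 Hz. The function name, the search range, and the three-harmonic 100 Hz test frame are hypothetical.

```python
import numpy as np

def autocorr_pitch(frame, fs, f0_min=50.0, f0_max=400.0):
    """Return (F0 in Hz, period in samples) from the autocorrelation peak of one frame."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    # r[k] = sum_n frame[n] * frame[n + k], for k = 0 .. len(frame) - 1
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / f0_max)                        # shortest allowed period
    lag_max = min(int(fs / f0_min), len(frame) - 1)   # longest allowed period
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / lag, lag

fs = 8000
n = np.arange(400)                                    # 50 ms frame
x = (np.sin(2 * np.pi * 100 * n / fs) + 0.5 * np.sin(2 * np.pi * 200 * n / fs)
     + 0.3 * np.sin(2 * np.pi * 300 * n / fs))        # hypothetical 100 Hz voiced frame
f0, period = autocorr_pitch(x, fs)
print(f0, period)
```

Note that r(2P) is smaller than r(P) here only because fewer samples overlap at the longer lag; with a real speech frame, that margin can be small, which is one source of the half/double pitch errors listed below.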
- An example of autocorrelation and the pitch contour
- Half-pitch/double-pitch errors?
- Voiced/unvoiced (U/V) decision?
- Smoothness of the pitch contour?
- Peak-picking range? Global or local maximum?
- Normalized cross-correlation method
- Uses cross-correlation instead of autocorrelation
- Does the normalized cross-correlation decay with respect to the lag T?
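The sketch below computes a normalized cross-correlation between a fixed n-sample segment and the segment starting k samples later, dividing by both segment energies. For a perfectly periodic signal it stays near 1 at every multiple of the period, while the plain autocorrelation of a fixed frame shrinks with the lag because fewer samples overlap. The function name `nccf`, the 240-sample segment, and the test signal are hypothetical. Because the peaks at P and 2P are nearly equal, a single global maximum is ambiguous, which is one motivation for keeping several candidates per frame.

```python
import numpy as np

def nccf(x, start, n, lag_min, lag_max):
    """Normalized cross-correlation of x[start:start+n] with x[start+k:start+k+n]."""
    seg = x[start:start + n]
    e0 = seg @ seg
    phi = np.zeros(lag_max + 1)
    for k in range(lag_min, lag_max + 1):
        shifted = x[start + k:start + k + n]
        phi[k] = (seg @ shifted) / np.sqrt(e0 * (shifted @ shifted) + 1e-12)
    return phi

fs = 8000
t = np.arange(600) / fs
x = (np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)
     + 0.3 * np.sin(2 * np.pi * 300 * t))            # hypothetical 100 Hz signal
phi = nccf(x, start=0, n=240, lag_min=20, lag_max=200)

# Plain autocorrelation of a fixed 400-sample frame, for comparison
frame = x[:400]
r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
print(phi[80], phi[160])   # both close to 1: no decay with the lag
print(r[160] / r[80])      # about 0.75: fewer overlapping samples at the longer lag
```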
- Pitch tracking
- → keep several candidates in each frame
- → define a cost function
- → Viterbi search
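The three tracking steps can be sketched as a small dynamic program. Everything specific here is a hypothetical choice for illustration: each frame keeps a few F0 candidates with a local cost (for example, 1 minus a normalized correlation peak), and the transition cost is the absolute jump in log-frequency, so octave jumps are penalized. In the example, frame 2 locally prefers 51 Hz (a half-pitch error), but the cheapest overall path stays near 100 Hz.

```python
import numpy as np

def track_pitch(candidates, local_costs, jump_weight=1.0):
    """Viterbi search over per-frame F0 candidates.

    candidates[t][j]  : j-th F0 candidate (Hz) in frame t
    local_costs[t][j] : cost of that candidate on its own (lower is better)
    Transition cost   : jump_weight * |log(f_t / f_{t-1})|  (hypothetical choice)
    """
    cost = np.asarray(local_costs[0], dtype=float)
    backpointers = []
    for t in range(1, len(candidates)):
        prev = np.asarray(candidates[t - 1], dtype=float)
        cur = np.asarray(candidates[t], dtype=float)
        jump = jump_weight * np.abs(np.log(cur[:, None] / prev[None, :]))
        total = cost[None, :] + jump             # rows: current, columns: previous
        best_prev = np.argmin(total, axis=1)
        cost = total[np.arange(len(cur)), best_prev] + np.asarray(local_costs[t], dtype=float)
        backpointers.append(best_prev)
    # Backtrack from the cheapest final candidate
    j = int(np.argmin(cost))
    path = [j]
    for bp in reversed(backpointers):
        j = int(bp[j])
        path.append(j)
    path.reverse()
    return [candidates[t][j] for t, j in enumerate(path)]

cands = [[100, 200], [101, 202], [51, 103], [104, 208]]
costs = [[0.1, 0.5], [0.1, 0.5], [0.2, 0.3], [0.1, 0.5]]
print(track_pitch(cands, costs))   # [100, 101, 103, 104]
```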