Title: Linear Predictive Coding for Speech Compression
1Linear Predictive Coding for Speech Compression
9 March 2006
2Overview
- General Model for Speech Synthesis
- Channel Vocoder
- Linear Predictive Coder (LPC-10)
- Code Excited Linear Prediction (CELP)
- Novel Application
- Sub-band adaptive filtering based on cochlear
model
3Model for Speech Synthesis
- Speech produced by forcing air through vocal
cords, larynx, pharynx, mouth and nose
- At transmitter speech is divided into segments
- Each segment analyzed to determine excitation
signal and parameters of vocal tract filter
[Figure: Excitation Source -> Vocal tract filter -> Speech]
4Channel Vocoder - analysis
- Each segment of input speech analyzed by a bank
of (bandpass) analysis filters
- Energy at output of each filter is estimated 50
times a second and transmitted to receiver
- Decision made whether segment is
- voiced /a/, /e/, /o/ or
- unvoiced /s/, /f/
- Estimate of pitch period (period of fundamental
harmonic) is determined
5Voiced vs. Unvoiced Speech
6Channel vocoder - synthesis
- Vocal tract filter implemented by bank of
(bandpass) synthesis filters
- For voiced segments, periodic pulse generator is
input
- For unvoiced segments, pseudonoise source is
input
- Period determined by pitch estimate
- Scaled by output of energy estimate
- First approach to speech compression
7Linear Predictive Coder
- Models vocal tract as a single linear filter
- $y_n = \sum_{i=1}^{M} a_i y_{n-i} + G\epsilon_n$
- Output $y_n$, input $\epsilon_n$, gain $G$, filter order $M$
- Input is random noise (unvoiced) or periodic
pulse (voiced)
- LPC-10 is a standard (2.4 kbit/s, 8000 samples/sec)
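The difference equation above can be tried out directly. The following is a minimal sketch, assuming a zero initial filter state; the function names, the first-order coefficient, and the pulse period are illustrative choices, not values taken from LPC-10.

```python
import random


def lpc_synthesize(a, gain, excitation):
    """Compute y[n] = sum_i a[i] * y[n-i] + gain * excitation[n] (zero initial state)."""
    y = []
    for n, e in enumerate(excitation):
        acc = gain * e
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc += ai * y[n - i]
        y.append(acc)
    return y


def pulse_train(length, period):
    """Voiced excitation: one unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


def white_noise(length, seed=0):
    """Unvoiced excitation: Gaussian pseudonoise (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


a = [0.5]  # hypothetical first-order filter, for illustration only
voiced = lpc_synthesize(a, 1.0, pulse_train(8, 4))
unvoiced = lpc_synthesize(a, 0.1, white_noise(8))
```

The same filter is driven by either source, so only the voicing flag, pitch period, gain, and coefficients need to reach the receiver.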
8LPC - Voiced/Unvoiced Decision
- Voiced speech has more energy and lower frequency
than unvoiced
- Speech segment lowpass filtered, energy at output
relative to background noise used to determine voicing
- Zero-crossings counted to determine frequency
- Continuity criterion: voicing decision of
neighboring frames taken into account
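A rough sketch of such a decision rule follows. It combines frame energy relative to a background-noise estimate with the zero-crossing rate, then applies a simple continuity rule. The thresholds (`energy_ratio`, `max_zc_rate`) and the smoothing rule are assumptions for illustration, and the lowpass filtering step is omitted for brevity.

```python
import math
import random


def frame_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossings(frame):
    """Count sign changes between consecutive samples."""
    return sum(1 for p, q in zip(frame, frame[1:]) if (p >= 0) != (q >= 0))


def is_voiced(frame, noise_energy, energy_ratio=4.0, max_zc_rate=0.25):
    """Voiced if energy is well above the noise floor and few zero crossings occur."""
    zc_rate = zero_crossings(frame) / (len(frame) - 1)
    return frame_energy(frame) > energy_ratio * noise_energy and zc_rate < max_zc_rate


def smooth_decisions(flags):
    """Continuity rule: a frame whose two neighbours agree takes their decision."""
    out = list(flags)
    for k in range(1, len(flags) - 1):
        if flags[k - 1] == flags[k + 1]:
            out[k] = flags[k - 1]
    return out


# Synthetic test frames: a 100 Hz tone at 8000 samples/sec and a noise burst.
tone = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(180)]
rng = random.Random(1)
noise = [rng.gauss(0.0, 0.3) for _ in range(180)]
```

The tone has high energy and few zero crossings; the noise burst crosses zero on roughly every second sample, so it fails the zero-crossing test.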
9LPC - Estimating Pitch Period
- Extracting pitch from short noisy segment is
difficult
- One approach is to maximize autocorrelation
- Periodicity isn't strong enough
- Threshold can't be used because maximum value not
known in advance
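For reference, the autocorrelation approach can be sketched as below. The lag search range is an assumed parameter. On a clean periodic signal the peak is easy to find; the slide's point is that real speech segments are short and noisy, so the peak is much less distinct.

```python
import math


def autocorrelation(y, lag):
    """Unnormalised autocorrelation of the segment at the given lag."""
    return sum(y[n] * y[n - lag] for n in range(lag, len(y)))


def pitch_by_autocorrelation(y, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation."""
    return max(range(min_lag, max_lag + 1), key=lambda lag: autocorrelation(y, lag))


# Clean sinusoid with a period of 40 samples (synthetic test signal).
segment = [math.sin(2 * math.pi * n / 40) for n in range(240)]
estimated_period = pitch_by_autocorrelation(segment, 20, 60)
```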
10LPC - Estimating Pitch Period
- LPC-10 uses average magnitude difference function
(AMDF)
- $\mathrm{AMDF}(P) = \frac{1}{N}\sum_{i} |y_i - y_{i-P}|$
- If $y_n$ is periodic with period $P_0$, samples $P_0$
apart will have values close to each other and
AMDF will have a min at $P_0$
- AMDF is periodic for voiced and roughly flat for
unvoiced
- AMDF is min when $P$ is the pitch period and
spurious min in unvoiced segments are shallow
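The AMDF search can be sketched in a few lines. One assumption to note: the sum here runs only over sample pairs that lie inside the segment and is normalised by the number of such pairs, which may differ from the exact windowing used in LPC-10.

```python
def amdf(y, lag):
    """Average magnitude difference between samples `lag` apart."""
    n_terms = len(y) - lag
    return sum(abs(y[i] - y[i - lag]) for i in range(lag, len(y))) / n_terms


def pitch_by_amdf(y, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] where the AMDF is smallest."""
    return min(range(min_lag, max_lag + 1), key=lambda lag: amdf(y, lag))


# Sawtooth with a period of 40 samples (synthetic test signal).
segment = [float(n % 40) for n in range(240)]
estimated_period = pitch_by_amdf(segment, 20, 60)
```

Unlike the autocorrelation, the AMDF needs only subtractions and absolute values, and its minimum at the true period is zero for an exactly periodic signal.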
11LPC - Obtaining Vocal Tract Filter
- At transmitter, we want filter coeffs that best
match the segment in a mean squared error sense
- $e_n^2 = (y_n - \sum_{i=1}^{M} a_i y_{n-i} - G\epsilon_n)^2$
- Autocorrelation approach assumes $y_n$ is
stationary
- $A = R^{-1}P$
- Recursive solution uses Levinson-Durbin algorithm
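The Levinson-Durbin recursion solves the autocorrelation normal equations without an explicit matrix inverse. The sketch below assumes the predictor convention $y_n \approx \sum_i a_i y_{n-i}$ and a nonzero zero-lag autocorrelation. The sign convention for the reflection coefficients varies between texts, so treat the one used here as an assumption.

```python
def autocorr(y, max_lag):
    """Autocorrelation values r[0..max_lag], treating samples outside the segment as zero."""
    return [sum(y[n] * y[n - k] for n in range(k, len(y))) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (predictor coeffs a_1..a_order, reflection coeffs, final prediction error)."""
    a = [0.0] * (order + 1)  # a[0] is unused so indices match the equations
    reflection = []
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] - k * a[m - i]
        a = new_a
        err *= 1.0 - k * k
        reflection.append(k)
    return a[1:], reflection, err


# Autocorrelation of an ideal first-order process with coefficient 0.5.
coeffs, reflection, err = levinson_durbin([1.0, 0.5, 0.25], 2)
```

For this input the second-order coefficient comes out as zero, as expected for a process that is fully described by a first-order predictor.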
12LPC - Obtaining the Vocal Tract Filter
- Covariance approach discards stationarity
assumption (not valid for speech signals)
- $c_{ij} = E[y_{n-i} y_{n-j}]$
- yields $CA = S$
13LPC - Obtaining the Vocal Tract Filter
- $c_{ij}$ are estimated as
- $c_{ij} = \sum_{n} y_{n-i} y_{n-j}$
- No longer assume values of $y_n$ outside of segment
are zero
- Cholesky decomposition required
- Reflection coeffs used to update voicing decision
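A small sketch of the covariance method follows, under these assumptions: the sums run over $n$ from the filter order to the end of the segment so that no sample outside the segment is needed, the right-hand side is $s_i = c_{i0}$, and $C$ is positive definite. The function names are illustrative.

```python
import math


def covariance_system(y, order):
    """Build C (order x order) and S with c_ij = sum_n y[n-i] * y[n-j], s_i = c_i0."""
    def c(i, j):
        return sum(y[n - i] * y[n - j] for n in range(order, len(y)))
    C = [[c(i, j) for j in range(1, order + 1)] for i in range(1, order + 1)]
    S = [c(i, 0) for i in range(1, order + 1)]
    return C, S


def cholesky_solve(C, S):
    """Solve C a = S via C = L L^T (C must be symmetric positive definite)."""
    m = len(C)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            acc = C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(acc) if i == j else acc / L[j][j]
    z = [0.0] * m  # forward substitution: L z = S
    for i in range(m):
        z[i] = (S[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    a = [0.0] * m  # back substitution: L^T a = z
    for i in reversed(range(m)):
        a[i] = (z[i] - sum(L[k][i] * a[k] for k in range(i + 1, m))) / L[i][i]
    return a


# Synthetic segment obeying y[n] = 1.2*y[n-1] - 0.5*y[n-2] exactly.
y = [1.0, 1.0]
for _ in range(28):
    y.append(1.2 * y[-1] - 0.5 * y[-2])
C, S = covariance_system(y, 2)
a = cholesky_solve(C, S)
```

Because $C$ is symmetric but not Toeplitz, the Levinson-Durbin recursion no longer applies, which is why a Cholesky factorisation is used instead.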
14LPC - Transmitting Parameters
- Tenth order filter used for voiced speech and
fourth order for unvoiced
- Vocal tract filter is sensitive to errors in
reflection coeffs with magnitude close to one
- $g_i = \frac{1+k_i}{1-k_i}$ are quantized and sent
instead of $k_i$
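The mapping and its inverse are easy to check numerically. The sketch below uses hypothetical helper names; the inverse follows from solving $g = (1+k)/(1-k)$ for $k$.

```python
def to_g(k):
    """Map a reflection coefficient k (|k| < 1) to g = (1 + k) / (1 - k)."""
    return (1.0 + k) / (1.0 - k)


def from_g(g):
    """Inverse mapping: k = (g - 1) / (g + 1)."""
    return (g - 1.0) / (g + 1.0)


# Two coefficients that differ by only 0.01 near one map to well-separated values.
g_low = to_g(0.98)
g_high = to_g(0.99)
recovered = from_g(to_g(0.9))
```

Here `g_low` is roughly 99 and `g_high` roughly 199, so the transform stretches the region near one, where the filter is most sensitive, before quantization.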
15Code Excited Linear Prediction
- Single pulse per pitch period leads to buzzy
twang
- Variety of excitation signals is allowed
- For each segment encoder finds excitation vector
that generates synthesized speech that best
matches speech being coded
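The search described above can be sketched as an exhaustive loop over a codebook. This is a simplified illustration: the codebook, the first-order filter, and the plain squared-error criterion are assumptions, and practical CELP coders add refinements such as perceptual weighting that are not shown here.

```python
def synthesize(a, excitation):
    """Pass an excitation vector through the all-pole filter with coefficients a."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc += ai * y[n - i]
        y.append(acc)
    return y


def best_excitation(target, a, codebook):
    """Return (index, gain, error) of the codebook entry whose output best matches target."""
    best = None
    for index, code in enumerate(codebook):
        synth = synthesize(a, code)
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue  # an all-zero output cannot match anything
        gain = sum(t * s for t, s in zip(target, synth)) / energy  # least-squares gain
        err = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if best is None or err < best[2]:
            best = (index, gain, err)
    return best


a = [0.5]  # hypothetical vocal tract filter
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 0.0, -1.0, 0.0]]
target = [2.0 * s for s in synthesize(a, codebook[2])]
index, gain, err = best_excitation(target, a, codebook)
```

Only the winning index and gain need to be transmitted, since the decoder holds the same codebook.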
16Sub-band adaptive filtering
- Multi-channel speech enhancement system
- The greater the number of sub-bands used, the
faster the convergence of the overall system
17Cochlear Modelling
- Sub-band filters are distributed logarithmically
in frequency to approximate distribution of
filters in cochlea
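Logarithmic spacing means that adjacent band edges keep a constant frequency ratio. A minimal sketch, with the frequency range and band count chosen purely for illustration:

```python
def log_band_edges(f_low, f_high, n_bands):
    """Return n_bands + 1 band edges spaced by a constant ratio between f_low and f_high."""
    ratio = (f_high / f_low) ** (1.0 / n_bands)
    return [f_low * ratio ** k for k in range(n_bands + 1)]


# Four bands between 100 Hz and 1600 Hz: each band spans one octave.
edges = log_band_edges(100.0, 1600.0, 4)
```

Low-frequency bands are therefore narrow and high-frequency bands wide, which is the property borrowed from the cochlea.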
18Adaptive Noise Cancellation
- LMS algorithm is used to model differential
transfer function between noise signals in a
number of sub-bands
- Lower power and shorter filters used in each
sub-band
- Convergence is equal across all bands if power is
distributed equally and filter lengths are the
same
- Convergence dominated by sub-band with greatest
power
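The LMS update for a single sub-band can be sketched as follows; a sub-band system would run one such filter per band. The step size `mu`, the tap count, and the synthetic two-tap noise path are assumptions for illustration, and the primary input here contains noise only.

```python
import random


def lms_cancel(primary, reference, n_taps, mu):
    """Adapt an FIR filter so the filtered reference tracks the noise in primary.

    Returns (error signal, final weights); the error is the enhanced output.
    """
    w = [0.0] * n_taps
    out = []
    for n in range(len(primary)):
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(n_taps)]
        estimate = sum(wk * xk for wk, xk in zip(w, x))
        e = primary[n] - estimate
        w = [wk + 2.0 * mu * e * xk for wk, xk in zip(w, x)]  # LMS weight update
        out.append(e)
    return out, w


# Reference noise and a primary input that is the same noise through a 2-tap path.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(2000)]
primary = [0.8 * reference[n] - (0.3 * reference[n - 1] if n > 0 else 0.0)
           for n in range(len(reference))]
residual, weights = lms_cancel(primary, reference, n_taps=2, mu=0.01)
```

The weights settle near the path coefficients 0.8 and -0.3. Because the usable step size scales inversely with input power, a band with much higher power forces a smaller step and slows the overall system, consistent with the last bullet above.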