Title: Speech Coding (Part I) – Waveform Coding
1. Speech Coding (Part I) – Waveform Coding
2. Content
- Overview
- Linear PCM (Pulse-Code Modulation)
- Nonlinear PCM
- Max-Lloyd Algorithm
- Differential PCM (DPCM)
- Adaptive PCM (ADPCM)
- Delta Modulation (DM)
3. Speech Coding (Part I) – Waveform Coding
4. Classification of Coding Schemes
- Waveform coding
- Vocoding
- Hybrid coding
5. Quality versus Bitrate of Speech Codecs
6. Waveform Coding
- Encodes the waveform itself in an efficient way
- Signal independent
- Offers good-quality speech at bit rates of 16 kbps or more
- Time-domain techniques
  - Linear PCM (Pulse-Code Modulation)
  - Nonlinear PCM: μ-law, A-law
  - Differential coding: DM, DPCM, ADPCM
- Frequency-domain techniques
  - SBC (Sub-band Coding), ATC (Adaptive Transform Coding)
  - Wavelet techniques
7. Vocoding
- Voice coding.
- Encoding information about how the speech signal was produced by the human vocal system.
- These techniques can produce intelligible communication at very low bit rates, usually below 4.8 kbps.
- However, the reproduced speech signal often sounds quite synthetic, and the speaker is often not recognisable.
- LPC-10 codec: 2400 bps, American military standard.
8. Hybrid Coding
- Combines waveform and source coding methods in order to improve the speech quality and reduce the bit rate.
- Typical bit-rate requirements lie between 4.8 and 16 kbps.
- Technique: analysis-by-synthesis
  - RELP (Residual Excited Linear Prediction)
  - CELP (Codebook Excited Linear Prediction)
  - MPLP (Multipulse Excited Linear Prediction)
  - RPE (Regular Pulse Excitation)
9. Quality versus Bitrate of Speech Codecs
10. Speech Coding (Part I) – Waveform Coding
- Linear PCM (Pulse-Code Modulation)
11. Pulse-Code Modulation (PCM)
- A method for quantizing an analog signal for the purpose of transmitting or storing the signal in digital form.
12. Quantization
- The process of mapping a continuous range of sample amplitudes onto a finite set of discrete levels.
13. Linear/Uniform Quantization
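A uniform quantizer divides the range [−Xmax, Xmax] into 2^b equal steps. A minimal sketch (a mid-rise quantizer; the function name and interface are illustrative, not from the slides):

```python
def uniform_quantize(x, b, x_max):
    """Mid-rise uniform quantizer: 2**b levels over [-x_max, x_max]."""
    delta = 2 * x_max / (2 ** b)            # quantization step size
    k = int((x + x_max) / delta)            # index of the cell containing x
    k = max(0, min(2 ** b - 1, k))          # clip to avoid overload
    return -x_max + (k + 0.5) * delta       # mid-point reconstruction level

# 3-bit quantization of a few samples in [-1, 1]
quantized = [uniform_quantize(x, b=3, x_max=1.0)
             for x in (-0.95, -0.3, 0.0, 0.4, 0.99)]
```

Within the range, the quantization error is bounded by Δ/2 = Xmax/2^b; inputs beyond ±Xmax are clipped (overload).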
14. Quantization Error/Noise
15. Quantization Error/Noise
[Figure: quantized waveform showing overload noise beyond the quantizer range and granular noise within it]
16. Quantization Error/Noise
17. Quantization Error/Noise
Δ: quantization step size
[Figure panels: unquantized sine wave; 3-bit quantized waveform; 3-bit quantization error; 8-bit quantization error]
18. The Model of Quantization Noise
Δ: quantization step size
19. Signal-to-Quantization-Noise Ratio (SQNR)
- A measurement of the effect of quantization errors introduced by analog-to-digital conversion at the ADC.
20. Signal-to-Quantization-Noise Ratio (SQNR)
Assume
21. Signal-to-Quantization-Noise Ratio (SQNR)
Assume
Is the assumption always appropriate?
22. Signal-to-Quantization-Noise Ratio (SQNR)
Each code bit contributes 6 dB.
(a constant term)
The term Xmax/σx tells how big a signal can be accurately represented.
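With the uniform-error model (error uniformly distributed over one step Δ = 2Xmax/2^b), the formula behind these annotations is, in its standard textbook form:

```latex
\sigma_e^2 = \frac{\Delta^2}{12} = \frac{X_{\max}^2}{3 \cdot 4^{b}},
\qquad
\mathrm{SQNR}\ (\mathrm{dB}) = 6.02\,b + 4.77 - 20\log_{10}\!\frac{X_{\max}}{\sigma_x}
```

The 6.02b term is the "6 dB per bit" rule; the last term penalizes signals that are small relative to the quantizer range.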
23. Signal-to-Quantization-Noise Ratio (SQNR)
Determined by the A/D converter.
Depends on the distribution of the signal, which, in turn, depends on users and time.
24. Signal-to-Quantization-Noise Ratio (SQNR)
Under what conditions is the formula reasonable?
25. Overload Distortion
26. Probability of Distortion
Assume
27. Probability of Distortion
Assume
28. Overload and Quantization Noise with Gaussian Input pdf and b = 4
Assume
29. Uniform Quantizer Performance
30. More on Uniform Quantization
- Conceptually and implementationally simple.
- Imposes no restrictions on the signal's statistics.
- Maintains a constant maximum error across its total dynamic range.
- σx varies greatly (on the order of 40 dB) across sounds, speakers, and input conditions.
- We need a quantizing system whose SQNR is independent of the signal's dynamic range, i.e., a near-constant SQNR across its dynamic range.
31. Speech Coding (Part I) – Waveform Coding
32. Probability Density Functions of Speech Signals
Counting the number of samples in each interval
provides an estimate of the pdf of the signal.
33. Probability Density Functions of Speech Signals
34. Probability Density Functions of Speech Signals
- A good approximation is a gamma distribution, of the form
- A simpler approximation is a Laplacian density, of the form
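The density formulas referenced above are missing from the extracted slides; the standard forms (with the σx-normalization used in classic speech-coding texts) are:

```latex
\text{Gamma:}\quad p(x) = \left( \frac{\sqrt{3}}{8\pi\,\sigma_x |x|} \right)^{1/2} e^{-\sqrt{3}\,|x| / (2\sigma_x)}
\qquad
\text{Laplacian:}\quad p(x) = \frac{1}{\sqrt{2}\,\sigma_x}\, e^{-\sqrt{2}\,|x| / \sigma_x}
```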
35. Probability Density Functions of Speech Signals
- Distributions normalized so that μx = 0 and σx = 1.
- The gamma density more closely approximates the measured distribution for speech than the Laplacian.
- The Laplacian is still a good model in analytical studies.
- Small amplitudes are much more likely than large amplitudes, by a 100:1 ratio.
36. Companding
- The dynamic range of signals is compressed before transmission and expanded to the original value at the receiver.
- This allows signals with a large dynamic range to be transmitted over facilities that have a smaller dynamic-range capability.
- Companding reduces the noise and crosstalk levels at the receiver.
37. Companding
[Block diagram: compressor → uniform quantizer → expander]
38. Companding
39. Companding
After compression, y is nearly uniformly distributed.
40. The Quantization-Error Variance of a Nonuniform Quantizer
(Jayant and Noll)
41. The Quantization-Error Variance of a Nonuniform Quantizer
(Jayant and Noll)
42. The Optimal C(x)
(Jayant and Noll)
If the signal's pdf is known, then the minimum quantization-error variance (maximum SQNR) is achievable by letting
43. The Optimal C(x)
(Jayant and Noll)
If the signal's pdf is known, then the minimum quantization-error variance (maximum SQNR) is achievable by letting
Is the assumption realistic?
44. PDF-Independent Nonuniform Quantization
Assuming the quantizer is overload-free,
we require that the SQNR be independent of p(x).
45. Logarithmic Companding
46. μ-Law and A-Law Companding
- μ-Law
  - A North American PCM standard
  - Used in North America and Japan
- A-Law
  - An ITU PCM standard
  - Used in Europe
47. μ-Law and A-Law Companding
- μ-Law (μ = 255 in the U.S. and Canada)
  - A North American PCM standard
  - Used in North America and Japan
- A-Law (A = 87.56 in Europe)
  - An ITU PCM standard
  - Used in Europe
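The μ-law characteristic can be sketched directly from its definition (normalized input x in [−1, 1]; the function names are illustrative):

```python
import math

def mu_law_compress(x, mu=255.0):
    """mu-law compressor: maps x in [-1, 1] to y in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=255.0):
    """Inverse of mu_law_compress (the expander at the receiver)."""
    return math.copysign((math.exp(abs(y) * math.log1p(mu)) - 1.0) / mu, y)

# Small amplitudes are boosted before uniform quantization:
# mu_law_compress(0.01) is roughly 0.23, not 0.01.
```

A-law differs only in using a linear segment near zero and a logarithmic curve above 1/A.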
48. μ-Law and A-Law Companding
49. μ-Law Companding
50. μ-Law Companding
51. μ-Law Companding
[Figure: μ-law compressor characteristic on linear and logarithmic scales]
52. Histogram for μ-Law Companding
[Figure: histograms of the input x(n) and the compressed output y(n)]
53. μ-Law Approximation to Log
Distribution of quantization levels for a μ-law 3-bit quantizer.
54. SQNR of a μ-Law Quantizer
- 6.02b dependence on b → good
- Much less dependence on Xmax/σx → good
- For large μ, SQNR is less sensitive to changes in Xmax/σx → good
55. Comparison of Linear and μ-Law Quantizers
56. A-Law Companding
57. A-Law Companding
[Figure: A-law compressor characteristic on linear and logarithmic scales]
58. A-Law Companding
59. SQNR of A-Law Companding
60. Demonstration
PCM Demo
61. Speech Coding (Part I) – Waveform Coding
62. How to Design a Nonuniform Quantizer?
[Figure: Q(x) staircase with decision thresholds, reconstruction levels qk, and codewords ck−1, ck]
63. How to Design a Nonuniform Quantizer?
[Figure: Q(x) staircase with decision thresholds, reconstruction levels qk, and codewords ck−1, ck]
64. How to Design a Nonuniform Quantizer?
- Major tasks
  - Determine the decision thresholds xk
  - Determine the reconstruction levels qk
- Related task
  - Determine the codewords ck
65. Optimal Nonuniform Quantization
- Major tasks
  - Determine the decision thresholds xk
  - Determine the reconstruction levels qk
An optimal quantizer is one that minimizes the following quantization-error variance.
66. Optimal Nonuniform Quantization
67. Necessary Conditions for an Optimum
- → leads to the centroid condition
- → leads to the nearest-neighborhood condition
68. Necessary Conditions for an Optimum
- → leads to the centroid condition
- → leads to the nearest-neighborhood condition
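Written out in their standard form, with decision levels x_k and reconstruction levels q_k, the two conditions are:

```latex
q_k = \frac{\displaystyle\int_{x_{k-1}}^{x_k} x\, p(x)\, dx}{\displaystyle\int_{x_{k-1}}^{x_k} p(x)\, dx}
\ \text{(centroid)},
\qquad
x_k = \frac{q_k + q_{k+1}}{2}\ \text{(nearest neighborhood)}
```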
69. Optimal Nonuniform Quantization
This suggests an iterative algorithm to reach the optimum.
- → leads to the centroid condition
- → leads to the nearest-neighborhood condition
70. The Max-Lloyd Algorithm
1. Initialize a set of decision levels xk and set
2. Calculate the reconstruction levels qk by
3. Calculate the MSE by
4. If , exit.
5. Set and adjust the decision levels xk by
6. Go to 2.
71. The Max-Lloyd Algorithm
This version assumes that the pdf of the signal is available.
1. Initialize a set of decision levels xk and set
2. Calculate the reconstruction levels qk by
3. Calculate the MSE by
4. If , exit.
5. Set and adjust the decision levels xk by
6. Go to 2.
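A practical Max-Lloyd iteration replaces the pdf expectations with averages over training samples (a sketch; the initialization and stopping rule are simplified):

```python
def max_lloyd(samples, n_levels, iters=50):
    """Lloyd-Max design from data: alternate the nearest-neighborhood
    condition (thresholds) and the centroid condition (levels)."""
    lo, hi = min(samples), max(samples)
    # start from uniformly spaced reconstruction levels
    q = [lo + (k + 0.5) * (hi - lo) / n_levels for k in range(n_levels)]
    thresholds = []
    for _ in range(iters):
        # nearest-neighborhood condition: thresholds midway between levels
        thresholds = [(q[k] + q[k + 1]) / 2 for k in range(n_levels - 1)]
        # centroid condition: each level = mean of the samples in its cell
        cells = [[] for _ in range(n_levels)]
        for s in samples:
            cells[sum(s > t for t in thresholds)].append(s)
        q = [sum(c) / len(c) if c else q[i] for i, c in enumerate(cells)]
    return q, thresholds
```

For a bimodal training set such as [−1.1, −1.0, −0.9, 0.9, 1.0, 1.1] with two levels, the iteration settles at levels near ±1 with a single threshold at 0.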
72. The Max-Lloyd Algorithm (Practical Version)
Exercise
73. Speech Coding (Part I) – Waveform Coding
74. Typical Audio Signals
Do you find any correlation and/or redundancy among the samples?
[Figure: a segment of an audio signal]
75. The Basic Idea of DPCM
- Adjacent samples exhibit a high degree of correlation.
- By removing this adjacent redundancy before encoding, a more efficiently coded signal results.
- How?
  - Accompany encoding with prediction (e.g., linear prediction)
  - Encode the prediction error only
76. Linear Prediction
77. Linear Predictor
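The slide content here is an equation image; the standard pth-order linear predictor and its prediction error are:

```latex
\hat{x}(n) = \sum_{k=1}^{p} a_k\, x(n-k),
\qquad
e(n) = x(n) - \hat{x}(n)
```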
78. DPCM Codec
[Block diagram: A/D converter → quantizer → channel]
79. DPCM Codec
The dynamic range of the prediction error is much smaller than that of the signal. → Fewer quantization levels are needed.
[Block diagram: A/D converter → Σ → quantizer → channel, with a predictor in the feedback loop at both the encoder and the decoder]
80. Performance of DPCM
- By using a logarithmic compressor and a 4-bit quantizer for the error sequence e(n), DPCM results in high-quality speech at a rate of 32,000 bps, which is a factor of two lower than logarithmic PCM.
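A DPCM loop with a first-order predictor can be sketched as follows (the step size and predictor coefficient are illustrative; note the encoder predicts from its own reconstructed samples so that it stays in sync with the decoder):

```python
def dpcm_encode(samples, step, a=1.0):
    """DPCM encoder: quantize e(n) = x(n) - a*x_rec(n-1) with the given step."""
    codes, x_rec = [], 0.0
    for x in samples:
        e = x - a * x_rec                # prediction error
        c = round(e / step)              # uniform quantization of the error
        codes.append(c)
        x_rec = a * x_rec + c * step     # reconstruction, mirrored at the decoder
    return codes

def dpcm_decode(codes, step, a=1.0):
    """DPCM decoder: rebuild samples from the quantized error codes."""
    out, x_rec = [], 0.0
    for c in codes:
        x_rec = a * x_rec + c * step
        out.append(x_rec)
    return out
```

Because the quantizer sits inside the prediction loop, the reconstruction error stays bounded by half a step instead of accumulating.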
81. Speech Coding (Part I) – Waveform Coding
82. Basic Concept
- The power level in a speech signal varies slowly with time.
- Let the quantization step size Δ(n) dynamically adapt to the slowly time-varying power level.
83. Adaptive Quantization Schemes
- Feed-forward-adaptive quantizers
  - Estimate Δ(n) from x(n) itself
  - The step size must be transmitted
- Feedback-adaptive quantizers
  - Adapt the step size, Δ, on the basis of the quantized signal
  - The step size need not be transmitted
84. Feed-Forward Adaptation
85. Feed-Forward Adaptation
The source signal is not available at the receiver, so the receiver can't evaluate Δ(n) by itself. Δ(n) has to be transmitted.
[Block diagram: quantizer and encoder with a step-size adaptation system at the transmitter; decoder at the receiver; quantization error shown]
86. The Step-Size Adaptation System
Estimate the signal's short-time energy, σ²(n), and make Δ(n) ∝ σ(n).
87. The Step-Size Adaptation System – Low-Pass Filter Approach
88. The Step-Size Adaptation System – Low-Pass Filter Approach
α = 0.99
α = 0.9
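The low-pass filter approach is commonly written as the following recursive (leaky-integrator) energy estimate; the exact form here is an assumption, with α the values quoted above:

```latex
\sigma^2(n) = \alpha\, \sigma^2(n-1) + (1-\alpha)\, x^2(n),
\qquad
\Delta(n) \propto \sigma(n)
```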
89. The Step-Size Adaptation System – Moving-Average Approach
90. Feed-Forward Quantizer
- Δ(n) evaluated every M samples
- Use M = 128 or 1024 for the estimates
- Suitable choice of Δmin and Δmax
91. Feed-Forward Quantizer
- Δ(n) evaluated every M samples
- Use M = 128 or 1024 for the estimates
- Suitable choice of Δmin and Δmax
Too long
92. Feedback Adaptation
Δ(n) can be evaluated at both sides using the same algorithm. Hence, it need not be transmitted.
93. The Step-Size Adaptation System
The same as in feed-forward adaptation, except that the input changes.
94. Alternative Approach to Adaptation
- P(n) ∈ {P1, P2, …} depends on c(n−1).
- Limits Δmin ≤ Δ(n) ≤ Δmax need to be imposed.
- The ratio Δmax/Δmin controls the dynamic range of the quantizer.
95. Alternative Approach to Adaptation
- P(n) ∈ {P1, P2, …} depends on c(n−1).
- Limits Δmin ≤ Δ(n) ≤ Δmax need to be imposed.
- The ratio Δmax/Δmin controls the dynamic range of the quantizer.
96. Alternative Approach to Adaptation
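A sketch of this multiplier-based (Jayant-style) feedback adaptation; the multiplier values, bit width, and limits below are illustrative assumptions, not values from the slides:

```python
import math

def adaptive_quantize(samples, step=0.1, step_min=0.01, step_max=1.0,
                      p_expand=2.0, p_shrink=0.5, b=3):
    """Feedback adaptation: step(n) = P(c(n-1)) * step(n-1), with P chosen
    from {p_expand, p_shrink} by the previous codeword, and the step
    clipped to [step_min, step_max]."""
    half = 2 ** (b - 1)                              # codes run over -half .. half-1
    codes = []
    for x in samples:
        c = max(-half, min(half - 1, int(math.floor(x / step))))
        codes.append(c)
        # outer codes expand the step, inner codes shrink it
        mult = p_expand if abs(c) >= half - 1 else p_shrink
        step = max(step_min, min(step_max, step * mult))
    return codes, step
```

Loud input drives the step to its ceiling and quiet input drives it to its floor, so the ratio step_max/step_min fixes the usable dynamic range.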
97. Speech Coding (Part I) – Waveform Coding
98. Delta Modulation
- The simplest form of DPCM
- The prediction of the next sample is simply the current sample
- The sampling rate is chosen to be many times (e.g., 5×) the Nyquist rate, so adjacent samples are highly correlated, i.e., s(n) ≈ s(n−1)
- A 1-bit (2-level) quantizer is used
- Bit rate = sampling rate
99. Review: DPCM
[Block diagram: A/D converter → quantizer → channel]
100. DM Codec
[Block diagram: A/D converter → quantizer → channel, with a z⁻¹ delay in the feedback loop at both the encoder and the decoder]
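The DM codec reduces to a one-bit loop (the step size here is illustrative):

```python
def dm_encode(samples, step):
    """Delta modulation: transmit only the sign of the prediction error;
    the predictor is simply the previous reconstructed sample."""
    bits, x_rec = [], 0.0
    for x in samples:
        bit = 1 if x >= x_rec else 0        # 1-bit quantizer
        x_rec += step if bit else -step     # staircase approximation
        bits.append(bit)
    return bits

def dm_decode(bits, step):
    """Rebuild the staircase approximation from the bit stream."""
    out, x_rec = [], 0.0
    for bit in bits:
        x_rec += step if bit else -step
        out.append(x_rec)
    return out

# A constant input produces alternating bits: DM's granular noise.
# A ramp steeper than step x sampling-rate cannot be tracked: slope overload.
```

The two failure modes in the comments are exactly the distortions discussed on the next slides.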
101. Distortions of DM
[Figure: DM staircase tracking a waveform; code words: 0 1 1 1 1 1 0 0 0 0 1 0 0 1 0]
102. Distortions of DM
[Figure: slope-overload condition and granular noise, with the corresponding code words]
103. Choosing the Step Size
Rapidly changing segments need a large step size.
Slowly varying segments need a small step size.
104. Adaptive DM (ADM)
105. Adaptive DM (ADM)