Title: Chapter 3: Source Coding
- Recall the purpose of source coding
  - Efficiently represent the information source output digitally (i.e., minimize the number of bits)
  - Can be viewed as data compression
- Start by developing a mathematical model for information
  - NOTE: The results of Chapter 2 are used, since the information sequence can be viewed as a stochastic process. This is a subtle but important point: if we knew what we wanted to transmit a priori, there would be no need for a communication system; the receiver would already know what to expect.
3.1 Mathematical Models for Information Sources
- Discrete source: letters selected from an alphabet with a finite (say L) number of elements, {x_1, x_2, ..., x_L}.
- Binary source: two letters in the alphabet. WLOG, the alphabet is the set {0, 1}.
- Each letter x_k in the alphabet has a probability of occurring at any given time, p_k = P(X = x_k), 1 <= k <= L.
- At each point in time, one letter of the alphabet is chosen, implying sum_{k=1}^{L} p_k = 1.
- We will only consider two types of discrete sources:
  - Memoryless: assumes each letter is chosen independently of every other letter (past and present). This gives a discrete memoryless source (DMS).
  - Stationary: the joint probabilities of any two sequences of arbitrary length, formed by shifting one sequence by any amount, are equal.
3.1 Mathematical Models for Information Sources
- Analog source: an analog source is represented by a waveform x(t) that is a sample function of a stochastic process X(t).
- Unless otherwise noted, we will assume X(t) is stationary, thus having an autocorrelation function and a power spectral density.
- When X(t) is band-limited, i.e., its power spectral density vanishes for |f| >= W, then the signal can be represented via the sampling theorem. The sequence of samples {x(n/2W)} that comes from the sampling theorem can be viewed as a discrete-time source.
- Note that while the stochastic process generated by the sampling theorem is discrete in time, it is generally continuous in amplitude at any instant of time. Thus there is the need to quantize the values of the sequence, producing quantization error.
3.2 A Logarithmic Measure of Information
- Let's develop the concept of information as a measure of how much knowing the outcome of one RV, Y, tells us about the outcome of another RV, X.
- Let's start with two RVs, X and Y, each with a finite set of outcomes.
- We observe some outcome Y = y and wish to quantitatively determine the amount of information this occurrence provides about each possible outcome x of the RV X.
- Note the two extremes:
  - If X and Y are independent, then knowledge of one provides no knowledge of the other. (We would like this to have a measure of zero information.)
  - If X and Y are fully dependent, then knowledge of one provides absolute knowledge of the other. (Thus the measure of information should relate to just the probability of the event X = x.)
3.2 A Logarithmic Measure of Information: Mutual Information
- A suitable measure that captures this is
  I(x; y) = log [ P(x|y) / P(x) ] = log [ P(x, y) / (P(x) P(y)) ]
- This is called the mutual information between the events X = x and Y = y.
- The units of I(x; y) are determined by the base of the logarithm, which is usually either 2 or e. Base-2 units are called bits (binary units) and base-e units are called nats (natural units).
- Note that this satisfies our intuition for a measure of information:
  - Independent events: P(x|y) = P(x), so I(x; y) = log 1 = 0.
  - Fully dependent events: P(x|y) = 1, so I(x; y) = log [ 1 / P(x) ] = -log P(x).
3.2 A Logarithmic Measure of Information: Self-Information
- But note that the equation for fully dependent events is just the information of the event X = x. Thus the quantity
  I(x) = -log P(x) = log [ 1 / P(x) ]
  is called the self-information of the event X = x.
- Note that a high-probability event conveys less information than a low-probability event. This may seem counter-intuitive at first, but it is exactly what we want in a measure of self-information.
- Consider the following thought experiment. Which statement conveys more information? (A numerical sketch follows below.)
  - The forecast for Phoenix, AZ for July 1st is sunny and 95° F.
  - The forecast for Phoenix, AZ for July 1st is 1 inch of snow and -5° F.
- Note that the more shocking the statement, the less likely it is to occur, and thus the more information it conveys.
- In fact, if the outcome is deterministic, then no information is conveyed. Hence there is no need to transmit the data.
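A minimal MATLAB sketch of the forecast thought experiment. The two probabilities are assumptions chosen purely for illustration; only the formula I(x) = -log2 P(x) comes from the notes.
MATLAB sketch:
% Self-information of the two forecast statements, using assumed probabilities.
p_sunny = 0.99;     % assumed probability of a hot, sunny July 1st in Phoenix
p_snow  = 1e-6;     % assumed probability of snow in July (essentially never)
I_sunny = -log2(p_sunny);   % self-information in bits
I_snow  = -log2(p_snow);
fprintf('I(sunny) = %.4f bits\n', I_sunny);   % ~0.0145 bits
fprintf('I(snow)  = %.2f bits\n', I_snow);    % ~19.93 bits: the shocking forecast carries far more information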
3.2 A Logarithmic Measure of Information: Conditional Self-Information
- Let's define conditional self-information as
  I(x|y) = -log P(x|y)
- The reason for this is
  I(x; y) = log [ P(x|y) / P(x) ] = -log P(x) - [ -log P(x|y) ] = I(x) - I(x|y)
- which provides a useful relationship: mutual information is the self-information minus the conditional self-information.
- Note that since both I(x) >= 0 and I(x|y) >= 0, this implies mutual information can be positive, negative, or equal to zero.
3.2.1 Average Mutual Information and Entropy: Average Mutual Information
- Mutual information was defined for a pair of events (x, y). Now we would like to look at the average value of the mutual information across all possible pairs of events. This is just the expectation:
  I(X; Y) = sum_i sum_j P(x_i, y_j) log [ P(x_i, y_j) / (P(x_i) P(y_j)) ]
- NOTE: While the mutual information of a pair of events can be negative, the average mutual information satisfies I(X; Y) >= 0, with equality only when X and Y are statistically independent. (A numerical sketch is given below.)
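A short MATLAB sketch of the averaging. The joint pmf used here is a made-up example, not from the text; it just illustrates that the double sum is non-negative and vanishes for independent RVs.
MATLAB sketch:
% Average mutual information I(X;Y) computed from an assumed joint pmf.
Pxy = [0.30 0.10;
       0.05 0.55];              % rows: outcomes of X, columns: outcomes of Y
Px  = sum(Pxy, 2);              % marginal of X
Py  = sum(Pxy, 1);              % marginal of Y
I = 0;
for i = 1:size(Pxy,1)
    for j = 1:size(Pxy,2)
        if Pxy(i,j) > 0
            I = I + Pxy(i,j) * log2( Pxy(i,j) / (Px(i)*Py(j)) );
        end
    end
end
fprintf('I(X;Y) = %.4f bits\n', I);   % > 0 for this dependent example
% For an independent pair, Pxy = Px*Py and the same loop returns 0.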
3.2.1 Average Mutual Information and Entropy: Entropy
- Similarly, we define the average self-information as
  H(X) = sum_i P(x_i) I(x_i) = -sum_i P(x_i) log P(x_i)
- Note that the average self-information is denoted by H(X), and this term is called the entropy of the source.
- ASIDES:
  - The definition of entropy (as well as all of the other definitions in this chapter) is not a function of the values that the RV takes on, but rather a function of the distribution of the RV. Such a quantity is called a functional of the distribution.
  - The reason for the use of the term entropy for the average measure of self-information is that there is a relation between this measure and the measure of entropy in thermodynamics.
3.2.1 Average Mutual Information and Entropy: Axiomatic Approach to Entropy
- We have defined information from an intuitive approach. This may facilitate learning, but it is not rigorous. For example, are there other possible measures of information? Our approach does not allow us to explore an answer to that question. However, it is possible (and is the approach Shannon took) to define entropy (and thus all the other information measures) axiomatically, by stating the properties that a measure of information must satisfy. The axioms are based on a symmetric function H(p_1, p_2, ..., p_n) of the probabilities:
  - Normalization: H(1/2, 1/2) = 1
  - Continuity: H(p, 1 - p) is a continuous function of p
  - Grouping: H(p_1, p_2, ..., p_n) = H(p_1 + p_2, p_3, ..., p_n) + (p_1 + p_2) H( p_1/(p_1 + p_2), p_2/(p_1 + p_2) )
- Under these axioms, the entropy functional must be of the form
  H = -sum_i p_i log p_i
3.2.1 Average Mutual Information and Entropy
- Note that when the RV is distributed uniformly over L letters, then H(X) = log L.
- Also, this is the maximum value that the entropy can take on. That is, the entropy of a discrete source is a maximum when the output letters are equally probable: H(X) <= log L.
- Note that we will use the convention 0 log 0 = 0, since lim_{p -> 0} p log p = 0.
Figure 3.2-1 Binary entropy function.
MATLAB code:
q = 0.001:.01:1;
H = -q.*log2(q) - (1-q).*log2(1-q);
plot(q, H)
axis square
title('Entropy of an Independent Binary Source')
xlabel('Probability q')
ylabel('Entropy H(q)')
3.2.1 Average Mutual Information and Entropy: Average Conditional Entropy
- Average conditional entropy is defined as
  H(X|Y) = sum_i sum_j P(x_i, y_j) log [ 1 / P(x_i | y_j) ]
- and is interpreted as the information (or uncertainty) in X after Y is observed.
- We can easily derive a useful relationship for mutual information:
  I(X; Y) = H(X) - H(X|Y)
- Since I(X; Y) >= 0, this implies that H(X) >= H(X|Y), with equality iff X and Y are statistically independent.
- This can be interpreted as saying that knowledge of one RV never decreases our certainty about another (and has no effect if they are statistically independent). That is, conditioning never increases entropy.
Figure 3.2-2 Conditional entropy for binary-input, binary-output symmetric channel.
Figure 3.2-3 Average mutual information for binary-input, binary-output symmetric channel.
3.2.1 Average Mutual Information and Entropy: Multiple RVs
- Generalization of entropy to multiple RVs (chain rule):
  H(X_1, X_2, ..., X_k) = sum_{i=1}^{k} H(X_i | X_1, ..., X_{i-1})
- Note the visual (Venn-diagram) relationship between all of the quantities of average information mentioned: H(X), H(Y), H(X|Y), H(Y|X), H(X,Y), and I(X;Y).
3.2.2 Information Measures for Continuous Random Variables
- Since information measures are functionals of the pdf, there is a straightforward extension to the information of continuous RVs. It is just the replacement of summations by integrations in the expectation.
- For example, the continuous-RV version of mutual information is
  I(X; Y) = integral integral p(x, y) log [ p(x, y) / (p(x) p(y)) ] dx dy
3.2.2 Information Measures for Continuous Random Variables
- Recall that the interpretation of self-information is the number of bits needed to represent an information source. For a continuous RV, the probability of any particular value occurring is zero, so the entropy becomes infinite. However, we can still define the useful quantity
  h(X) = -integral p(x) log p(x) dx
- but note this is called the differential entropy and cannot be interpreted in the same way as the entropy of a discrete RV.
3.2.2 Information Measures for Continuous Random Variables
- But the concept of differential entropy does allow us to develop a useful equation.
- First define the average conditional (differential) entropy for a continuous RV as
  h(X|Y) = -integral integral p(x, y) log p(x|y) dx dy
- Then
  I(X; Y) = h(X) - h(X|Y)
- which has the same form as for discrete RVs. (A numerical sketch of differential entropy is given below.)
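A small MATLAB sketch of differential entropy. The Gaussian pdf and the standard deviation value are assumptions for illustration; the closed-form expression (1/2) log2(2*pi*e*sigma^2) is the known result for a Gaussian.
MATLAB sketch:
% Differential entropy of a Gaussian RV, computed two ways.
sigma = 2;                                        % assumed standard deviation
x = linspace(-10*sigma, 10*sigma, 1e5);
p = exp(-x.^2/(2*sigma^2)) / (sigma*sqrt(2*pi));  % Gaussian pdf
h_numeric = -trapz(x, p .* log2(p));              % numerical integration of -p*log2(p)
h_closed  = 0.5 * log2(2*pi*exp(1)*sigma^2);      % closed-form differential entropy
fprintf('numerical h(X) = %.4f bits, closed form = %.4f bits\n', h_numeric, h_closed);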
3.3 Coding for Discrete Sources
- We can now (finally) use the framework developed to date (i.e., stochastic processes and information measures) to develop source coding for a communication system.
- We will measure the efficiency of the source encoder by comparing the average number of bits per letter of the code to the entropy of the source.
- NOTE: The problem of coding is easy to solve if you can assume a DMS (i.e., statistically independent letters). But a DMS is rarely an accurate model of an information source.
3.3.1 Coding for Discrete Memoryless Sources
- Given a DMS producing a symbol every τ_s seconds.
- The alphabet of the source is {x_1, x_2, ..., x_L}.
- The probability of each symbol at any given point in time is P(x_k) = p_k, 1 <= k <= L.
- The entropy for the source comes directly from the definition:
  H(X) = -sum_{k=1}^{L} p_k log2 p_k <= log2 L
- and the entropy is largest (H(X) = log2 L) when each symbol is equally probable.
- Two approaches to DMS source coding:
  - Fixed-length code words
  - Variable-length code words
3.3.1 Coding for Discrete Memoryless Sources: Fixed-Length Code Words
- Consider a block encoding scheme which assigns a unique set of bits to each symbol. Recall we are given L symbols, implying a minimum of
  R = log2 L bits per symbol if L is a power of 2, or
  R = floor(log2 L) + 1 bits per symbol if L is not a power of 2.
- Example: 26 letters in the alphabet implies a fixed-length code requires at least floor(log2 26) + 1 = 5 bits per symbol.
- The code rate is then R = 5 bits per symbol.
- Since H(X) <= log2 L <= R, the code rate is at least as large as the source entropy.
3.3.1 Coding for Discrete Memoryless Sources: Fixed-Length Code Words
- Efficiency: the efficiency of a coding scheme is measured as
  efficiency = H(X) / R
- NOTE: When the number of symbols is a power of 2 and each symbol is equally likely to occur, the efficiency is 100%.
- NOTE: When the number of symbols is not a power of 2, even when the symbols are equally likely to occur, the efficiency will always be less than 100%, since R = floor(log2 L) + 1 > log2 L >= H(X).
- Thus if the number of bits needed to encode the alphabet is large (i.e., L is large, which implies log2 L >> 1), the penalty of at most one extra bit is relatively small and the efficiency of the coding is high.
- What can we do to increase the efficiency of the source encoding if L is not large?
3.3.1 Coding for Discrete Memoryless Sources: Fixed-Length Code Words
- One way to increase the coding efficiency for fixed-length codes of a DMS is to artificially increase the number of symbols in the alphabet by encoding multiple (J) symbols at a time. For this case there are L^J unique symbol blocks.
- N bits accommodates 2^N code words.
- To ensure each of the L^J blocks is covered we must ensure 2^N >= L^J, i.e., N >= J log2 L.
- This can be done by setting N = floor(J log2 L) + 1.
- The efficiency increases since the rate per source symbol is R = N/J <= log2 L + 1/J, so the extra fraction of a bit is spread over J symbols.
- Thus we can drive the efficiency as close to its maximum as desired by increasing J. (A numerical sketch follows below.)
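A quick MATLAB sketch of this effect, using the 26-letter alphabet from the earlier example and assuming equally likely letters.
MATLAB sketch:
% Fixed-length block coding efficiency for L = 26 equally likely letters,
% encoding J letters at a time.
L = 26;
H = log2(L);                      % entropy per letter for equiprobable symbols
J = 1:8;
N = floor(J*log2(L)) + 1;         % bits per block of J letters
R = N ./ J;                       % bits per source letter
eff = H ./ R;                     % coding efficiency
for k = 1:length(J)
    fprintf('J = %d: N = %2d bits, R = %.3f bits/letter, efficiency = %.3f\n', ...
            J(k), N(k), R(k), eff(k));
end
% Efficiency is about 0.94 at J = 1 and approaches 1 (non-monotonically) as J grows.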
3.3.1 Coding for Discrete Memoryless Sources: Fixed-Length Code Words
- If there is at least one unique code word per source symbol (or block of source symbols), then the coding is called noiseless.
- There are times when you may not want to have one code word per symbol. Can anyone think of why this may be?
- When there are fewer code words than source symbols (or blocks of source symbols), then rate-distortion approaches are used.
- Consider for now the following:
  - We want to reduce the code rate, so we choose N < J log2 L.
  - Only the 2^N - 1 most likely of the L^J possible symbol blocks will be uniquely encoded.
  - The remaining blocks are all represented by the one remaining code word.
  - Thus there will be a decoding error, with probability P_e, each time one of these blocks appears. Such an error is called a distortion.
3.3.1 Coding for Discrete Memoryless Sources: Fixed-Length Code Words
- Based upon this block encoding procedure, Shannon proved the following.
- Source Coding Theorem I: Let X be the ensemble of letters from a DMS with finite entropy H(X). Blocks of J symbols from the source are encoded into code words of length N from a binary alphabet. For any ε > 0, the probability of a block decoding failure P_e can be made arbitrarily small if
  R = N/J >= H(X) + ε
  and J is sufficiently large. Conversely, if
  R <= H(X) - ε
  then P_e becomes arbitrarily close to 1 as J is made sufficiently large.
- Proof omitted.
3.3.1 Coding for Discrete Memoryless Sources: Variable-Length Code Words
- Another way to increase the source encoding efficiency when symbols are not equally likely is to use variable-length code words.
- The approach is to minimize the number of bits used to represent highly likely symbols (or blocks of symbols) and use more bits for those symbols (or blocks of symbols) that occur infrequently.
- This type of encoding is also called entropy encoding, since you are trying to drive the average number of bits per symbol down toward the entropy of your information source.
- There are other constraints to consider as well:
  - The code must be uniquely decodable
  - Preferably instantaneously decodable
3.3.1 Coding for Discrete Memoryless Sources: Classes of Codes
[Nested classes of codes, from smallest to largest: Instantaneous Codes, Uniquely Decodable Codes, Non-Singular Codes, All Codes]
3.3.1 Coding for Discrete Memoryless Sources: Variable-Length Code Example
- Example: Consider a DMS with four symbols and associated probabilities.
- Three possible codes are given below.
- Try to decode the sequence 001001...
3.3.1 Coding for Discrete Memoryless Sources: Prefix Condition and Code Trees
- A sufficient condition for a code to be instantaneously decodable is that no code word of length l is identical to the first l bits of another code word whose length is greater than l.
- This is known as the prefix condition.
- Note that such codes can be visualized by code trees, where branches represent the bit value used and terminal nodes represent code words.
3.3.1 Coding for Discrete Memoryless Sources: Average Bits per Source Letter and the Kraft Inequality
- Define the average number of bits per source letter as
  R_bar = sum_{k=1}^{L} n_k P(x_k)
- where n_k is the length of the code word associated with source letter x_k.
- This is the quantity we would like to minimize.
- The condition for the existence of a code that satisfies the prefix condition is given by the Kraft inequality.
- A necessary and sufficient condition for the existence of a binary code with code words having lengths n_1 <= n_2 <= ... <= n_L that satisfy the prefix condition is
  sum_{k=1}^{L} 2^{-n_k} <= 1
- The effect of this inequality is that the code-word lengths for instantaneously decodable codes must behave like a probability mass function: the quantities 2^{-n_k} are non-negative and sum to at most 1. (A quick numerical check is sketched below.)
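A quick MATLAB check of the Kraft inequality. The two sets of code-word lengths are hypothetical, chosen only to show one set that admits a prefix code and one that does not.
MATLAB sketch:
% Kraft inequality check for two hypothetical sets of binary code-word lengths.
lengths_ok  = [1 2 3 3];        % e.g., code words 0, 10, 110, 111
lengths_bad = [1 1 2 3];        % too many short code words
kraft = @(n) sum(2.^(-n));
fprintf('sum 2^-n_k = %.3f  (<= 1, a prefix code exists)\n', kraft(lengths_ok));
fprintf('sum 2^-n_k = %.3f  (> 1, no prefix code possible)\n', kraft(lengths_bad));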
3.3.1 Conceptualization of the Kraft Inequality
3.3.1 Coding for Discrete Memoryless Sources: Source Coding Theorem II
- Theorem: Let X be the ensemble of letters from a DMS with finite entropy H(X) and output letters x_k, 1 <= k <= L, with corresponding probabilities of occurrence p_k. It is possible to construct a code that satisfies the prefix condition and has an average length R_bar that satisfies the inequalities
  H(X) <= R_bar < H(X) + 1
- Unfortunately, as is the case with many proofs associated with information theory, the proof of Source Coding Theorem II is not constructive. That is, it only proves the existence of a code that satisfies the inequalities. It does not give any insight into how to construct such a code.
3.3.1 Coding for Discrete Memoryless Sources: Huffman Coding Algorithm
- Huffman (1952) developed an approach for constructing variable-length codes that is optimum in the sense that the average number of bits needed to represent the source is a minimum, subject to the constraint that the code words satisfy the prefix condition.
- Procedure (a code sketch follows below):
  1. Order the source symbols in decreasing order of probability.
  2. Encode the two least probable symbols by assigning the values 0 and 1 arbitrarily (or systematically).
  3. Tie these two symbols together, adding their probabilities to obtain a new symbol.
  4. Are all symbols accounted for?
     - No: return to step 2.
     - Yes: continue.
  5. The code for each symbol is obtained by reading the tree structure developed by the above procedure.
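A minimal MATLAB sketch of the procedure above. The probabilities are taken from the three-symbol example used later in these notes (Example 3.3-3); everything else (variable names, string-based code words) is an implementation choice for illustration, not the textbook's construction.
MATLAB sketch:
% Minimal Huffman code construction by repeatedly merging the two least
% probable nodes and prefixing a bit to every symbol inside each node.
p = [0.45 0.35 0.20];                 % symbol probabilities (must sum to 1)
codes  = repmat({''}, 1, length(p));  % code word (as a string) for each symbol
groups = num2cell(1:length(p));       % groups{i} = original symbols merged into node i
prob = p;
while length(prob) > 1
    [~, idx] = sort(prob, 'ascend');  % the two least probable nodes
    a = idx(1); b = idx(2);
    for s = groups{a}, codes{s} = ['0' codes{s}]; end   % prepend bit 0
    for s = groups{b}, codes{s} = ['1' codes{s}]; end   % prepend bit 1
    prob(a)   = prob(a) + prob(b);    % merge node b into node a
    groups{a} = [groups{a} groups{b}];
    prob(b)   = [];
    groups(b) = [];
end
Rbar = sum(p .* cellfun(@length, codes));   % average code length
H    = -sum(p .* log2(p));                  % source entropy
for k = 1:length(p)
    fprintf('x%d (p = %.2f): %s\n', k, p(k), codes{k});
end
fprintf('average length = %.2f bits, entropy = %.3f bits\n', Rbar, H);
% Satisfies H(X) <= Rbar < H(X) + 1, as promised by Source Coding Theorem II.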
Figure 3.3-4 An example of variable-length source encoding for a DMS.
Figure 3.3-5 An alternative code for the DMS in Example 3.3-1.
Example 3.3-1 An example of variable-length source encoding for a DMS.
Figure 3.3-6 Huffman code for Example 3.3-2.
3.3.1 Coding for Discrete Memoryless Sources: Extension of Source Coding Theorem II to Blocks of Length J
- Extending Source Coding Theorem II to blocks of length J gives the inequalities
  J H(X) <= R_bar_J < J H(X) + 1, i.e., H(X) <= R_bar_J / J < H(X) + 1/J
- Thus, the average number of bits per source symbol can be made arbitrarily close to the source entropy by selecting a sufficiently large block length.
Example 3.3-3 An example of variable-length source encoding for a DMS using blocks.
[Huffman code for single letters with P(x1) = 0.45, P(x2) = 0.35, P(x3) = 0.20]
Example 3.3-3 (continued): Huffman coding of letter pairs (J = 2).
[Pair probabilities: P(x1,x1) = 0.2025, P(x1,x2) = P(x2,x1) = 0.1575, P(x2,x2) = 0.1225, P(x1,x3) = P(x3,x1) = 0.09, P(x2,x3) = P(x3,x2) = 0.07, P(x3,x3) = 0.04; the Huffman tree merges these into intermediate nodes up to total probability 1.0.]
Example 3.3-3 (continued): An example of variable-length source encoding for a DMS using blocks.
3.3.2 Discrete Stationary Sources
- Remove the condition of independence from our source, but keep the condition of stationarity.
- Consider the entropy of a block of k symbols from the source.
- Recall that joint probabilities can be factored:
  P(x_1, x_2, ..., x_k) = P(x_1) P(x_2 | x_1) ... P(x_k | x_1, ..., x_{k-1})
- This leads to the entropy of a block being factored as
  H(X_1, X_2, ..., X_k) = sum_{i=1}^{k} H(X_i | X_1, ..., X_{i-1})
- which can be viewed as the entropy of a block of k letters.
3.3.2 Discrete Stationary Sources
- To get the entropy per letter for this block of k letters, divide by k, which gives
  H_k(X) = (1/k) H(X_1, X_2, ..., X_k)
- Since we can often assume this source will emit an infinite number of symbols, we would like to consider the limit
  H_inf(X) = lim_{k -> inf} H_k(X)
- We can also define the entropy per letter in terms of the conditional entropy. It can be shown that this gives the same limit:
  H_inf(X) = lim_{k -> inf} H(X_k | X_1, ..., X_{k-1})
- (A numerical sketch for a simple source with memory follows below.)
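A MATLAB sketch of H_k(X)/k for a source with memory. The two-state Markov chain (transition matrix and its stationary distribution) is an assumed example, used only to show the entropy per letter decreasing toward its limit.
MATLAB sketch:
% Entropy per letter H_k(X) = H(X_1...X_k)/k for an assumed binary
% stationary Markov source.
P   = [0.9 0.1;                   % P(next = j | current = i)
       0.2 0.8];
pi0 = [2/3 1/3];                  % stationary distribution of this chain
for k = 1:10
    Hk = 0;
    for s = 0:2^k-1                       % enumerate all k-bit sequences
        bits = bitget(s, k:-1:1) + 1;     % sequence as states 1/2
        p = pi0(bits(1));
        for t = 2:k
            p = p * P(bits(t-1), bits(t));
        end
        if p > 0, Hk = Hk - p*log2(p); end
    end
    fprintf('k = %2d: H_k(X) = %.4f bits/letter\n', k, Hk/k);
end
% H_k(X)/k decreases with k toward the entropy rate
% sum_i pi0(i) * H(P(i,:)) ~ 0.553 bits/letter.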
3.3.2 Discrete Stationary Sources
- Writing Source Coding Theorem II to accommodate the joint probabilities of a block gives
  H(X_1, ..., X_J) <= R_bar_J < H(X_1, ..., X_J) + 1
- Dividing by J and taking the limit as J grows, this gives
  H_inf(X) <= R_bar_J / J < H_inf(X) + 1/J
- Thus we can get arbitrarily close to encoding at 100% efficiency by letting the block size grow.
- NOTE: Huffman coding is still applicable in this case.
- NOTE: You must know the joint probabilities of the J-symbol blocks (which become more difficult to obtain as J increases).
3.3.3 The Lempel-Ziv Algorithm
- The joint probabilities needed for a block Huffman code are quite often unobtainable.
- This provided the motivation for the development of the Lempel-Ziv algorithm, a technique that is independent of the source statistics.
- Techniques that are independent of the source statistics are called universal source codes.
- Lempel-Ziv parses a discrete source into phrases, where a phrase is defined as the shortest sequence of symbols not yet seen by the algorithm.
- These phrases are then put into a dictionary, which is used to reference each phrase. (A parsing sketch follows the example below.)
3.3.3 The Lempel-Ziv Algorithm: Example
- The sequence
  10101101001001110101000011001110101100011011
- becomes the parsed phrases
  1, 0, 10, 11, 01, 00, 100, 111, 010, 1000, 011, 001, 110, 101, 10001, 1011
- Now form a dictionary that lists each phrase against its index.
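A MATLAB sketch of the parsing step applied to the sequence above. The dictionary is kept as a simple cell array of strings; variable names and the string-based representation are implementation choices for illustration.
MATLAB sketch:
% Lempel-Ziv parsing of the example sequence into phrases, where each
% phrase is the shortest string not yet in the dictionary.
seq = '10101101001001110101000011001110101100011011';
dict = {};                 % dictionary of phrases seen so far
phrase = '';
for k = 1:length(seq)
    phrase = [phrase seq(k)];                 %#ok<AGROW>
    if ~any(strcmp(dict, phrase))             % new phrase: store it and restart
        dict{end+1} = phrase;                 %#ok<AGROW>
        phrase = '';
    end
end
if ~isempty(phrase), dict{end+1} = phrase; end  % leftover partial phrase, if any
fprintf('%s\n', strjoin(dict, ', '));
% Produces the 16 phrases listed above; each would then be encoded as the
% dictionary index of its prefix plus the single new bit.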
3.3.3 The Lempel-Ziv Algorithm: Example
- Note that in this example there are 44 source bits.
- To encode this sequence we use 16 code words of 5 bits each (a 4-bit dictionary location plus 1 new bit), i.e., 5 x 16 = 80 bits.
- No compression occurred here.
- This is due to the shortness of the sequence being encoded.
- The longer the sequence, the better the compression rate, and hence the better the efficiency.
- Lempel-Ziv encoding is the basis for .zip-style data compression codes.
3.4 Coding for Analog Sources: Optimum Quantization
- Now consider only information sources that are analog in nature.
- The output of the information source can be modeled as a sample function of a stochastic process.
[Block diagram: analog information source (stochastic process) -> sampler -> source encoder -> information sequence]
3.4 Coding for Analog Sources: Optimum Quantization
- The basic approach is to:
  - Sample x(t) evenly through time to produce a sequence of samples.
    - Note that if x(t) is band-limited and stationary, then sampling at or above the Nyquist rate induces no loss of information.
    - Note that each sample can still take on an infinite number of amplitudes.
  - Quantize the amplitudes to limit the number of possible values. This provides a discrete source.
    - The number of bins used to quantize is based upon the number of bits per sample to be used to encode.
    - The size of each bin used in quantizing is a design issue.
    - If the probability of falling in each bin is known, then entropy coding techniques can be used to design the coding scheme.
- Quantization induces distortion in the waveform. We need to be able to understand and measure this distortion.
3.4.1 Rate-Distortion Function
- As mentioned earlier, quantization induces distortion (i.e., a loss of information content) in the original signal.
- We must define a measure for distortion.
- Many exist, most of the form d(x, x_tilde) = |x - x_tilde|^p.
- We will only consider the squared-error case, d(x, x_tilde) = (x - x_tilde)^2.
- Given a sequence of n samples, we would like to know the average distortion per letter:
  d(X_n, X_tilde_n) = (1/n) sum_{k=1}^{n} d(x_k, x_tilde_k)
- Now, the average distortion is a function of random variables, making it a random variable itself. We define its mean as the distortion:
  D = E[ d(X_n, X_tilde_n) ] = E[ d(X, X_tilde) ]   (by the stationarity assumption)
3.4.1 Rate-Distortion Function
- We want to minimize the rate R (in bits) needed to encode the information source with an average distortion D. The distortion is set at a level acceptable to our application.
- This is done through the use of mutual information. (Recall that an interpretation of mutual information is how much knowledge of one random variable tells you about another, and that it is measured in bits.) Thus we want the rate-distortion function
  R(D) = min over all conditional pdfs p(x_tilde | x) with E[d(X, X_tilde)] <= D of I(X; X_tilde)
- Note that this is a function of the distortion and that the minimum is taken across all conditional pdfs meeting the distortion constraint.
- In general, and intuitively, the rate decreases as the acceptable distortion increases, and vice versa.
3.4.1 Rate-Distortion Function: Memoryless Gaussian Source
- Restrict our interest to a continuous-amplitude, memoryless Gaussian source. Shannon proved the following for this case:
- The minimum information rate necessary to represent the output of a discrete-time, continuous-amplitude, memoryless Gaussian source based on a mean-square-error distortion measure per symbol is
  R_g(D) = (1/2) log2( sigma_x^2 / D ) for 0 <= D <= sigma_x^2, and R_g(D) = 0 for D > sigma_x^2
- where sigma_x^2 is the variance of the Gaussian source output.
- Note that this implies that no information needs to be transmitted when the acceptable distortion is greater than or equal to the variance.
3.4.1 Rate-Distortion Function: Memoryless Gaussian Source
MATLAB code:
D = 0.01:.01:1;
R = 0.5*log2(1./D);
plot(D, R)
axis square
xlabel('D/\sigma^2')
ylabel('R_g(D) in bits/symbol')
3.4.1 Rate-Distortion Function: Source Coding Theorem with a Distortion Measure
- Theorem: There exists an encoding scheme that maps the source output into code words such that, for any given distortion D, the minimum rate R(D) in bits per symbol is sufficient to reconstruct the source output with an average distortion that is arbitrarily close to D.
- Proof omitted; see Shannon (1959) or Cover and Thomas.
- Thus the rate-distortion function provides a lower bound on the source rate for a given level of acceptable distortion.
3.4.1 Rate-Distortion Function: Distortion-Rate Function
- It is also possible to write the distortion as a function of the rate. This yields a distortion-rate function.
- Take, for example, the rate-distortion function for a memoryless Gaussian source and re-write the distortion as a function of the rate. (This allows you to design a system when the rate is fixed, instead of the accepted level of distortion.)
  D_g(R) = 2^{-2R} sigma_x^2
- Expressing the distortion in decibels, we have
  10 log10 D_g(R) = -6.02 R + 10 log10 sigma_x^2
- implying that each additional bit reduces the distortion by about 6 dB. (A quick numerical check follows below.)
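A quick MATLAB check of the 6 dB-per-bit behavior, assuming unit variance for convenience.
MATLAB sketch:
% D_g(R) = 2^(-2R) * sigma^2 expressed in dB, with sigma^2 = 1 assumed.
R    = 0:8;                        % rate in bits per symbol
Dg   = 2.^(-2*R);                  % distortion-rate function
D_dB = 10*log10(Dg);
disp([R(:) D_dB(:)]);              % second column drops by ~6.02 dB per bit
fprintf('step per bit = %.2f dB\n', D_dB(2) - D_dB(1));   % about -6.02 dB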
3.4.1 Rate-Distortion Function: Upper and Lower Bounds
- Development of rate-distortion functions for various pdfs is beyond the scope of this course.
- It is useful, though, to bound the rate-distortion function of any discrete-time, continuous-amplitude, memoryless source. Without proof, the following inequalities are given:
  R*(D) <= R(D) <= R_g(D) = (1/2) log2( sigma_x^2 / D )
- where R*(D) = h(X) - (1/2) log2(2 pi e D) and h(X) is the differential entropy of the source.
- Likewise, these bounds can be solved with respect to the distortion as a function of the rate. This gives
  D*(R) <= D(R) <= D_g(R) = 2^{-2R} sigma_x^2
- where D*(R) = (1/(2 pi e)) 2^{-2[R - h(X)]}.
- Note that an implication of the upper bound is that the Gaussian pdf has the largest differential entropy for a given variance.
3.4.2 Scalar Quantization
- If the pdf of the signal amplitudes into the quantizer is known, then the encoding can be optimized. This is done by appropriately selecting the quantization levels such that the distortion is minimized. That is, we want to minimize
  D = E[ (x - Q(x))^2 ] = integral (x - Q(x))^2 p(x) dx
- (I don't know why he changed notation.)
- over all possible sets of quantization bins/levels. This is also called Lloyd-Max quantization.
- Note that if we want to use R bits, then the number of levels is L = 2^R.
- Two approaches of interest are:
  - Uniform levels
  - Non-uniform levels
3.4.2 Scalar Quantization: Uniform Quantization Illustration
3.4.2 Scalar Quantization: Non-Uniform Quantization for 8-Bit Gaussian with Unit Variance
3.4.2 Scalar Quantization: Non-Uniform Quantization
- For the non-uniform quantization case, we can minimize the distortion through the following analysis.
- First, write out the distortion function you want to minimize:
  D = sum_{k=1}^{L} integral from x_{k-1} to x_k of (x - x_hat_k)^2 p(x) dx
- where x_{k-1} and x_k are the transition (decision) levels of bin k and x_hat_k is its reconstruction level.
- Next, recall that a necessary condition for a minimum is that the first derivatives equal zero. Thus, to minimize the distortion we must have the conditions
  dD/dx_k = 0 and dD/dx_hat_k = 0 for all k.
- Now, recall Leibniz's rule for differentiating an integral with variable limits:
  d/dθ [ integral from a(θ) to b(θ) of f(x, θ) dx ] = f(b(θ), θ) b'(θ) - f(a(θ), θ) a'(θ) + integral of (df/dθ) dx
- Thus, setting dD/dx_k = 0, only the two integrals with x_k as a limit contribute, giving
  (x_k - x_hat_k)^2 p(x_k) - (x_k - x_hat_{k+1})^2 p(x_k) = 0, i.e., x_k = (x_hat_k + x_hat_{k+1}) / 2
3.4.2 Scalar Quantization: Non-Uniform Quantization
- A similar analysis for dD/dx_hat_k = 0 yields
  x_hat_k = [ integral from x_{k-1} to x_k of x p(x) dx ] / [ integral from x_{k-1} to x_k of p(x) dx ]
- Interpretation of these two conditions:
  - x_k is the midpoint between adjacent reconstruction levels.
  - x_hat_k is the center of mass of the pdf within its bin.
3.4.2 Scalar Quantization: Non-Uniform Quantization
- The big picture:
  - The optimum transition levels lie halfway between the optimum reconstruction levels. In turn, the optimum reconstruction levels lie at the center of mass of the probability density between the transition levels.
  - The two equations giving these conditions are nonlinear and must be solved simultaneously. In practice, they can be solved by an iterative scheme such as Newton's method (or by alternating between the two conditions, as sketched below).
- Properties of the optimum mean-square quantizer (proofs omitted):
  - The quantizer output is an unbiased estimate of the input.
  - The quantization error is orthogonal to the quantizer output.
  - It is sufficient to design mean-square quantizers for zero-mean, unit-variance distributions.
- Study Tables 3.4-2 through 3.4-6.
Figure 3.4-2 Distortion versus rate curves for a discrete-time memoryless Gaussian source.
3.4.3 Vector Quantization
- Consider now quantization of a block of signal samples. This is called block or vector quantization.
- Reasons for developing this approach include:
  - Better performance (i.e., less distortion) can be obtained through quantization of blocks.
  - It can take advantage of the structure between dependent samples to further reduce the average bit rate.
- The mathematical formulation of vector quantization is as follows:
  - Given:
    - An n-dimensional, real-valued, continuous-amplitude vector X = [x_1, x_2, ..., x_n]
    - The joint pdf associated with this vector
  - Find:
    - Another n-dimensional vector X_tilde, taking one of a finite number of values, modeled through a mathematical transformation X_tilde = Q(X).
Figure 3.4-3 An example of quantization in two-dimensional space.
3.4.3 Vector Quantization
- The average distortion for vector quantization becomes
  D = E[ d(X, Q(X)) ]
- where the distortion is often measured as the squared Euclidean distance
  d(x, x_tilde) = (x - x_tilde)'(x - x_tilde)
- or, if the data are not distributed with an identity covariance matrix, as the weighted distance
  d_W(x, x_tilde) = (x - x_tilde)' W (x - x_tilde)
- where the weighting matrix W is often the inverse of the covariance matrix of the data distribution.
3.4.3 Vector Quantization
- Vector quantization can be viewed as the generalization of scalar quantization to multiple dimensions. In this light, there should be little surprise to learn that there are two conditions for optimally selecting a vector quantizer. These are:
  - Nearest-neighbor condition: the quantization cell chosen is the one whose output vector is closest to the vector of interest.
  - Centroid condition: the vector representing a quantization cell is the centroid of that cell, i.e., the vector that minimizes the average distortion for vectors falling in that cell.
- If the joint pdf is known, these two conditions can be solved through iterative approaches.
3.4.3 Vector Quantization: K-Means Algorithm
- If the joint pdf is not known, the optimum quantization vectors can be estimated from a set of training vectors. One approach to this is called the K-means algorithm.
- K-means algorithm (a sketch follows below):
  1. Initialize by setting the iteration number n = 0 and choosing an initial set of K output vectors.
  2. Classify the training vectors {x(m), 1 <= m <= M} into K clusters by applying the nearest-neighbor rule.
  3. Increment n and recompute the output vector of every cluster as the centroid of the training vectors that fall in that cluster. Also compute the resulting average distortion D_n at the nth iteration.
  4. Terminate if the change D_{n-1} - D_n in the average distortion is relatively small; otherwise go to step 2.
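A minimal MATLAB sketch of the algorithm above. The 2-D correlated Gaussian training set, the value of K, and the termination threshold are assumptions chosen only for illustration.
MATLAB sketch:
% K-means design of a K-vector code book from training data.
rng(1);
M = 2000;  n = 2;  K = 4;
X = randn(M, n) * [1 0.9; 0 0.44];        % assumed correlated training vectors (M x n)
C = X(randperm(M, K), :);                 % step 1: initial output vectors
Dprev = inf;
for iter = 1:100
    % step 2: nearest-neighbor classification of every training vector
    d2 = sum(X.^2, 2) + sum(C.^2, 2).' - 2*X*C.';   % M x K squared distances
    [dmin, idx] = min(d2, [], 2);
    % step 3: recompute each output vector as its cluster centroid
    for k = 1:K
        if any(idx == k), C(k, :) = mean(X(idx == k, :), 1); end
    end
    D = mean(dmin);                       % average distortion at this iteration
    if Dprev - D < 1e-6, break; end       % step 4: terminate on small change
    Dprev = D;
end
disp(C)                                   % the resulting code book
fprintf('average distortion = %.4f after %d iterations\n', D, iter);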
3.4.3 Vector Quantization: K-Means Algorithm
- The K-means algorithm will converge to a local minimum.
- The computational burden of K-means grows exponentially as a function of the input vector dimensionality.
- Repeating the process with different initial output vectors may provide insight into the global minimum, but at the expense of additional computational burden.
- Sub-optimal algorithms exist which greatly mitigate the computational burden, but note that these approaches usually carry a separate memory requirement. (Not many free lunches.)
- The output vectors of a vector quantizer are called the code book.
3.5 Coding Techniques for Analog Sources
- The previous section described techniques for optimally discretizing (quantizing) an analog information source.
- This section investigates several techniques used in practice to encode an analog information source. These can roughly be broken into three categories:
  - Temporal waveform coding (time domain)
    - PCM (pulse code modulation)
    - DPCM (differential PCM)
    - Adaptive PCM/DPCM
    - DM (delta modulation)
  - Spectral waveform coding (frequency domain)
    - SBC (subband coding)
    - ATC (adaptive transform coding)
  - Model-based coding (a model is assumed for the structure of the data, e.g., LPC)
Figure 3.5-1 Input-output characteristic for a uniform quantizer.
Figure 3.5-2 Input-output magnitude characteristic for a logarithmic compressor.
Figure 3.5-3 (a) Block diagram of a DPCM encoder. (b) DPCM decoder at the receiver.
Figure 3.5-4 DPCM modified by the addition of a linearly filtered error sequence.
Figure 3.5-5 Example of a quantizer with an adaptive step size. (Jayant, 1974.)
Figure 3.5-6 (a) Block diagram of a delta modulation system. (b) An equivalent realization of a delta modulation system.
Figure 3.5-7 An example of slope-overload distortion and granular noise in a delta modulation encoder.
Figure 3.5-8 An example of variable-step-size delta modulation encoding.
Figure 3.5-9 An example of a delta modulation system with adaptive step size.
Figure 3.5-10 Block diagram of a waveform synthesizer (source decoder) for an LPC system.
Figure 3.5-11 Block diagram model of the generation of a speech signal.
Figure 3.5-12 All-pole lattice filter for synthesizing the speech signal.