Title: Digital Communications Part 1
1Digital Communications Part 1
VERSION 2005_1
- Dr. David G. M. Cruickshank
- Room 2.04, Alexander Graham Bell Building
- Email David Cruickshank_at_ed.ac.uk
2Contents of part 1
- Information theory and source coding (data compression)
- Channel coding (forward error correction coding)
- Probability, noise and decision theory
Most of the material in this course is taken from
the book Digital Communications by I.A. Glover
and P.M. Grant, published by Pearson Education,
ISBN 0 13 089399 - 4 (second edition)
5Lecture 1
- Sources of noise
- pdf and CD
- Noise distributions
- Central limit theorem
- Signal detection
- Important distributions
6Sources of Noise
- Thermal (random motion of free electrons in a conductor). Not much you can do about this except cool your system (on a Kelvin scale).
- Man made (car ignition systems, electric motors, switching). Predictable in some ways, in that it tends to be impulsive in nature.
- Co-channel (other systems using the same channel). Actually interference, and can be removed under some circumstances, but often treated as noise.
- Natural sources (lightning, sunspots). Again impulsive in nature.
7Noise
Noise cannot be predicted and therefore cannot be
eliminated (even in theory). It can only be
described statistically.
If the noise is in any way predictable, it tends to be thought of as interference, and signal processing may be able to reduce it or even eliminate it. This is often possible with co-channel interference.
8Random Variables
- The statistics of a random variable (all you can say about a random variable) are completely defined by its probability density function (pdf), mean and standard deviation.
- The mean is sometimes called the expected value of X, E(X).
- The standard deviation is the square root of the variance; only one of the two needs to be stated for a random variable, since this allows calculation of the other.
9Probability Density Function (pdf)
- The probability density function (pdf) of a random variable X is referred to as pX(x). This is nearly always shortened to simply p(x). Note the lowercase p.
- The use of pdf rather than PDF is just a historical artifact.
- A capital letter (usually X) refers to the random variable.
- Small letters refer to specific values of X, for example x, x1, x2.
10CD
- The same information that is in the pdf can be represented by the cumulative distribution (CD). The CD of a random variable X is denoted PX(x), nearly always shortened to P(x). Note the capital P.
- In other places the CD is described as the cumulative distribution function (cdf).
- In the Digital Signal Analysis course, the CD is called the probability distribution function (don't try to abbreviate this or you will get very confused!).
11pdf and CD
- The CD and pdf are related by the equation-
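That is, P(x) = \int_{-\infty}^{x} p(u)\,du, and equivalently p(x) = dP(x)/dx.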
13- The pdf and CD can also be discrete: the pdf becomes a series of impulses and the CD has steps in it.
- P() is sometimes used to denote the probability of the expression in the brackets being true. The meaning should be obvious from the contents of the brackets: if P() means probability then the brackets will contain an expression, whereas if P() means CD then the brackets will contain only the name of a random variable.
- Some properties of the pdf p(x) and CD P(x)-
14Probability of X lying between x1 and x2-
Probability of X being greater than x1-
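That is, P(x_1 < X \le x_2) = P(x_2) - P(x_1) = \int_{x_1}^{x_2} p(x)\,dx and P(X > x_1) = 1 - P(x_1), using the basic properties p(x) \ge 0, \int_{-\infty}^{\infty} p(x)\,dx = 1, P(-\infty) = 0 and P(\infty) = 1.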
15Common noise distributions - Uniform
When a random variable X is equally likely to take any value in the range x1 to x2, we say that X is uniformly distributed between x1 and x2. This is sometimes written as U(x1,x2). The pdf and CD of a uniform distribution are-
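That is, for U(x_1, x_2): p(x) = 1/(x_2 - x_1) for x_1 \le x \le x_2 (and zero elsewhere), and P(x) = (x - x_1)/(x_2 - x_1) over the same range, rising from 0 to 1.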
18- Quantization noise from analogue to digital converters is usually uniformly distributed.
- Computer generated random numbers are uniformly distributed (strictly speaking a computer cannot generate truly random numbers, and the numbers generated can be predicted if you know the deterministic algorithm used, but this is rarely important: they seem random and are uniformly distributed!).
19Moments of a Distribution
- As stated earlier, there are only two moments of
interest for most communication systems, the
first moment (the mean, expected value or time
average) and the second central moment (the
variance). - The first moment can be calculated from the pdf
using-
20For example, if we have a uniform distribution
between -0.5 and 0.5, U(-0.5,0.5), the mean is-
For U(0,a)-
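That is, E(X) = \int_{-\infty}^{\infty} x\,p(x)\,dx, which gives a mean of 0 for U(-0.5, 0.5) and a mean of a/2 for U(0, a).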
21The second moment is the mean square of the
distribution.
The second central moment (usually called the variance, σ²) is given by-
The square root of the variance (σ) is the standard deviation and, for a zero mean random variable, represents the root mean square (RMS) value of the random variable.
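That is, E(X^2) = \int x^2 p(x)\,dx and \sigma^2 = \int (x - E(X))^2 p(x)\,dx = E(X^2) - E(X)^2.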
22For example the variance of the distribution
U(-a/2, a/2) is-
This is an important result for quantization
noise where a is the step size between
quantization levels.
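That is, \sigma^2 = \int_{-a/2}^{a/2} x^2 \frac{1}{a}\,dx = \frac{a^2}{12}.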
23Bit Error Ratio (Pb)
We often wish to calculate the bit error ratio
(Pb) (the ratio of incorrectly received bits to
the total number of bits received) for a binary
signal given an average signal to noise ratio and
a noise distribution. If we normalize the signal to ±1 (polar signaling with zero mean and with variance and power 1), we can calculate the standard deviation of the noise as-
24If we employ a threshold detector, VTh, and
assuming that the two possible errors are equally
bad, the Pb is therefore-
Often PTX(-1) = PTX(1) = 0.5 (random binary signal) and the noise has zero mean and is symmetrical, in which case the best threshold to use is-
25- Probability density function of
- Binary signal
- Noise
- Signal and noise (split into its two components)
26If we normalize the signal to ±1 V and VTh = 0, then we have-
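For zero mean symmetrical noise n and equally probable symbols this gives P_b = P_{TX}(1)\,P(n < -1) + P_{TX}(-1)\,P(n > 1) = P(n > 1).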
27Gaussian Distribution
This important distribution is named after the German mathematician Johann Friedrich Gauss. Its pdf and CD are-
28m is the mean of the distribution and s is the
standard deviation. No closed form expression
exists for P(x). We can calculate P(x) from
tables such as the one at the back of the Digital
Communications book, where the first thing we do
is calculate z-
Then we look up erf (z) in the table. We can
then calculate P(x) -
29If the mean is 1 and the standard deviation 2
then the probability of X being less than 3 is-
Note that if z is negative, erf(z) = -erf(-z).
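Numerically, x = 3 lies z = (3 - 1)/2 = 1 standard deviation above the mean, so P(X < 3) \approx 0.841.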
32The Central Limit Theorem
Convolution is normally denoted by the symbol *. Applied to two time functions f(t) and g(t)-
33It is usually easier to think of the convolution operation geometrically-
- Reverse the second distribution in the horizontal direction (irrelevant if the second distribution is symmetric around t = 0).
- Offset the second distribution by t (+ve t to the right).
- z(t) is the area under the point by point multiplication of the two distributions.
36Central Limit Theorem
- The central limit theorem states that if we add together lots of independent random variables of equal power, the pdf of the sum is Gaussian distributed, regardless of what the initial distribution was.
- The previous slide shows the result of convolving three uniformly distributed random variables, and as you can see the result is already tending towards a Gaussian distribution.
37- Note that thermal noise is the sum of the effects of many different electrons moving about randomly, excited by heat.
- Therefore it tends to have a Gaussian distribution.
38Discrete Convolution
Discrete convolution is the same as continuous
except t can only take discrete values and the
result is taken by summing as opposed to
integrating.
39Example: Probability of the sum of two dice being 4-
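A short Python sketch (not from the slides) of this discrete convolution; the pmf of one die convolved with itself gives the pmf of the sum of two dice, and the entry for a sum of 4 is 3/36.

    import numpy as np

    die = np.ones(6) / 6.0           # pmf of a fair die, faces 1..6
    sum_pmf = np.convolve(die, die)  # pmf of the sum of two dice, sums 2..12
    print(sum_pmf[4 - 2])            # P(sum = 4) = 3/36 = 0.0833...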
40Signal Detection
- We are often interested in the probability that a random variable X will lie above or below a certain threshold x.
- Example: If we have a Gaussian distributed noise source with a mean of 5 V and a standard deviation of 2 V, what is the probability that-
- A) the variable will exceed 8 V?
- B) the variable will be less than 0 V?
41Answers-
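A) P(X > 8 V) = Q((8 - 5)/2) = Q(1.5) \approx 0.067. B) P(X < 0 V) = Q((5 - 0)/2) = Q(2.5) \approx 0.0062, where Q(z) is the Gaussian tail probability beyond z standard deviations.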
43Rayleigh Distribution
Discovered by the English physicist Lord Rayleigh
(1842-1919). The most general definition for
this pdf is-
In communications systems the parameter a is nearly always zero, and this is assumed from now on. The normalized version of this distribution has σ² = 1-
44Rayleigh pdf
45This distribution gives the distribution of the distance of darts from the target (notice that the probability of hitting the target exactly is zero!). More interestingly for a communications course, the received signal envelope in some multipath conditions is Rayleigh distributed. The random variable used in this case is normally R, giving the equation below as the pdf.
This is the definition given in the book.
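In the usual notation that pdf is p(r) = \frac{r}{\sigma^2}\exp\!\left(-\frac{r^2}{2\sigma^2}\right) for r \ge 0.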
46Lecture 2
47Contents
- Introduction
- Transmission and reception probabilities in the presence of noise
- Bayes's decision criterion
- Neyman-Pearson criterion
48Introduction
- There are two types of decision:
- Hard decision decoding, where it is immediately decided whether a 1 or a 0 was transmitted.
- Soft decision decoding, where a scale of 8 to 16 levels (3 or 4 bits) is used to record exactly how close to 0 or 1 an actual received signal was. This is often used in conjunction with a soft input decoding algorithm.
- For this lecture only, we will assume a hard decision is what is required.
49The following probabilities need to be
understood-
- P(0): the a priori probability of transmitting a 0
- P(1): the a priori probability of transmitting a 1
- p(v|0): the conditional probability of receiving v given a 0 was transmitted
- p(v|1): the conditional probability of receiving v given a 1 was transmitted
- P(0|v): the a posteriori probability that a 0 was transmitted given v received
- P(1|v): the a posteriori probability that a 1 was transmitted given v received
50a priori means before transmission; a posteriori means after transmission.
- 1) and 2) are properties of the source and are often known in advance, e.g. random binary data has P(0) = P(1) = 0.5
- 3) and 4) are properties of the channel (not always known)
- 5) and 6) are what we would like to know!
51Example Binary symmetric channel
A binary symmetric channel is one in which the two binary symbols (1 and 0) are equally likely to be received in error. Assuming that p is the probability of error in a single bit, the schematic of the transition probabilities is shown overleaf.
53Some properties of this channel are-
P(0RX|0TX) + P(1RX|0TX) = 1 and P(0RX|1TX) + P(1RX|1TX) = 1, because only a 1 or a 0 can be received.
P(0RX|1TX) = P(1RX|0TX) = p, because p is the probability of error.
P(0RX|0TX) = P(1RX|1TX) = 1 - p, because 1 - p is the probability of correct transmission.
54Thus-
P(0RX) = P(0TX)P(0RX|0TX) + P(1TX)P(0RX|1TX)
P(1RX) = P(1TX)P(1RX|1TX) + P(0TX)P(1RX|0TX)
or alternatively-
P(0RX) = P(0TX)(1 - p) + P(1TX)p
P(1RX) = P(1TX)(1 - p) + P(0TX)p
55Multiple Signal Transmission Example
If we have a source that produces 6 different
signals (A-F ) then we have a 6 by 6 transmission
matrix-
The matrix shows P(YRX|XTX), with X the transmitted symbol (columns) and Y the received symbol (rows).
56and assuming we have the following a priori
transmission probabilities-
57- Calculate the probability of error if a single D is transmitted.
- Calculate the probability of error if a random string (obeying the properties on the previous slide) is transmitted.
- Calculate the probability of receiving a C in the above two cases.
58- The probability of receiving a D when a D was transmitted is 0.6667 (row 4, column 4 of the matrix), so the probability of error is 1.0 - 0.6667 = 0.3333
- For a random data stream A-F, we take the probability of occurrence of each symbol and multiply it by the probability of error for the same symbol.
593) i) For a D transmitted, the probability of getting a C is read straight from the matrix (column 4, row 3) as 0.0.
3) ii) For a random stream of data, we multiply the probability of occurrence of each symbol by the probability of getting a C for that symbol (row 3 in the matrix).
60Bayes's Decision Criterion
This rule minimizes the average cost of deciding at the receiver whether a 1 or a 0 was transmitted. In a binary transmission system there are only two costs associated with the decision:
C0, the cost associated with incorrectly deciding at the receiver that a transmitted 1 was a 0.
C1, the cost associated with incorrectly deciding at the receiver that a transmitted 0 was a 1.
In many cases C0 = C1, but not always. C0 and C1 can have any units at all (money, or even lives).
61Or in words, both sides of the equation give us the joint probability of v being received and a 1 having been transmitted. Rearranging gives us Bayes's rule (Bayes's theorem)-
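That is, p(v|1)P(1) = P(1|v)p(v), so P(1|v) = p(v|1)P(1)/p(v), and similarly for a 0.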
62In words, the average cost in deciding vRX was a
0 is the cost of the mistake C0 multiplied by the
probability that vRX was caused by a 1. From
the general symmetry of the problem-
We take the decision that has the lower
conditional loss, for example we decide on a 1
for vRX if -
63Substituting the first two equations on the
previous slide into the last equation on the
previous slide yields that we should decide 1 if-
Now we use Bayes's rule, which states that-
and
64Substituting Bayes's rule into the first equation on the previous slide yields that we should decide 1 if-
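That is, decide on a 1 if C0 p(v|1)P(1) > C1 p(v|0)P(0).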
This is Bayes's decision criterion.
65Maximum Likelihood Decision Criterion
If the costs are equal and the probability of
transmission of 1 and 0 are equal to 0.5, then we
have-
and
As is the case in many channels, we should decide on a 1 if-
In words, we should decide on a 1 if the received voltage vRX is more likely to have been caused by a 1 than by a 0. This is the maximum likelihood decision criterion.
66Binary Transmission Example
A binary transmission process is subject to
additive Gaussian noise with a mean of 1 volt and
standard deviation of 2.5 volts. A logical 1 is
transmitted as 4 volts and a logical 0 is
generated as -4 volts, before the noise is added.
In this case C0 = C1 = 100.00 and P(1) = 2P(0). a)
Find the average probability of error per bit
transmitted if a 0 volt decision threshold is
used. b) Find the average cost per bit
transmitted with a 0 volt threshold.
67c) Find the optimum decision threshold voltage
for the minimum average cost. d) Find the average
probability of error for the optimum threshold
from part c) e) Find the average cost per bit
transmitted with the optimum decision threshold
as calculated in part c) above.
68[Figure: pdf of the transmitted signal (impulses of weight 2/3 at +4 volts and 1/3 at -4 volts) and pdf of the noise (a Gaussian centred on 1 volt), both plotted against amplitude in volts]
69[Figure: pdf of the received signal plus noise, Gaussians centred on 5 volts (for a 1) and -3 volts (for a 0), plotted against amplitude in volts]
70Binary Transmission Example -Solution
a)
71b) Average cost per bit is
72c) P(1) = 2/3 and P(0) = 1/3. The optimum decision threshold occurs when Bayes's decision rule becomes an equality-
The distributions p(vth|1TX) and p(vth|0TX) are given by-
73c) (cont)
Thus-
74c) cont.
Taking natural logs (ln) of both sides of this
equation-
75c) cont.
76d) Now we have our new threshold, we can
substitute it into the earlier equation for total
error probability-
77d) cont. This is less than part a). This isn't always the case: Bayes's decision rule can actually increase the probability of error if the costs are unequal, making more cheap wrong decisions in order to make fewer expensive wrong decisions.
e) The average cost per bit is now-
This is less than part b), as should always be the case, since Bayes's decision rule minimizes the average cost.
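A short Python sketch (not part of the slides) that reproduces the numbers in this example numerically; the signal levels, noise statistics, priors and costs are those assumed in the question above.

    from math import log
    from scipy.stats import norm

    sigma = 2.5
    m1, m0 = 4 + 1, -4 + 1     # received means: +/-4 V signal plus the 1 V noise mean
    P1, P0 = 2 / 3, 1 / 3      # a priori probabilities, P(1) = 2P(0)
    C = 100.0                  # cost of either kind of error

    def pb(vth):
        # error = (1 sent and v below threshold) or (0 sent and v above threshold)
        return P1 * norm.cdf(vth, m1, sigma) + P0 * norm.sf(vth, m0, sigma)

    print("a) Pb with 0 V threshold:", pb(0.0))        # about 0.054
    print("b) average cost per bit :", C * pb(0.0))    # about 5.4
    # c) with equal costs the Bayes threshold is where p(v|1)P(1) = p(v|0)P(0)
    vopt = (m1 + m0) / 2 + sigma**2 * log(P0 / P1) / (m1 - m0)
    print("c) optimum threshold    :", vopt)           # about 0.46 V
    print("d) Pb at optimum        :", pb(vopt))       # about 0.051
    print("e) average cost per bit :", C * pb(vopt))   # about 5.1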
78Neyman Pearson Criterion
This does not require any knowledge of the source statistics. It also works well in situations where the cost of a missed detection is very much greater than the cost of a false alarm, C0 >> C1. If we have a detection threshold vth, the probability of detection PD is-
Unfortunately the above equation doesn't help if you don't know in advance the probability of the target being present, in a RADAR system for example.
79The probability of false alarm is-
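That is, P_D = \int_{v_{th}}^{\infty} p(v\,|\,\text{signal present})\,dv and P_{FA} = \int_{v_{th}}^{\infty} p(v\,|\,\text{noise only})\,dv.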
80The threshold is set to give an acceptable PFA.
It is important that the noise pdf used is
accurate, but this should not be a problem as you
can measure noise at your receiver for long
periods, for example point your RADAR system at
an area of space where you know there are no
targets and measure the noise statistics.
81Lecture 6
- Information Theory and Source Coding
82Contents
- Formal definition of information
- Formal definition of entropy
- Information loss due to noise
- Channel capacity
83Terminology
- An information source consists of an alphabet of
symbols. Each symbol has a probability
associated with it. A message consists of one or
more symbols.
84- Example: If we were to transmit an ASCII text file, then the alphabet could be all the characters in the ASCII set; a symbol would be any individual character, e.g. the letter a with its associated probability P(a).
- A message is any sequence of symbols, e.g. the letter a, a whole word such as aardvark, or a whole file.
85- If you are going to be successful at data compression then you need to think carefully about your definition of the alphabet of symbols. Just because someone says it is an ASCII file doesn't mean it has to be treated as 256 possible 8 binary digit characters. It may help your cause in data compression to think about it differently.
86- For example, is there something to be gained by redefining your alphabet as 2^16 = 65536 characters of 16 bits? (There is, if the 8 bit characters are not independent, as in most ASCII files. For example, think about the probability of a u following a q in an English text file, or a carriage return following certain characters in a C program.)
87Information
In communications information is a measurable
quantity with a precise definition. If a message
m has probability P(m) then the information
conveyed by the message is
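I(m) = -\log_2 P(m) = \log_2 \frac{1}{P(m)} \text{ bits}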
Note that log2 is usually used, to give the information in bits. Do not confuse bits as a measure of the information conveyed with bits as in a 16 bit microprocessor; this second quantity we shall call binary digits for the remainder of this lecture.
88Also remember that-
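\log_2(x) = \frac{\log_{10}(x)}{\log_{10}(2)} = \frac{\ln(x)}{\ln(2)}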
When using your calculators! Note that reception of a highly probable message contains little information; for example, if P(m) = 1.0 then the information conveyed is 0 bits. Reception of a highly improbable message gives lots of information; for example, as P(m) approaches 0.0 the information conveyed tends to infinity.
89Information (Example)
- If an alphabet consists of two symbols A and B with probabilities P(A) and P(B), calculate the information associated with receiving a message consisting of A followed by B, if A and B are independent.
90The answer to this question can be calculated by
two methods, if your knowledge of the properties
of the log function is good.
Method 1: Calculate the probability of the message and then calculate the information associated with it. The probability of the message is P(A)P(B) if A and B are independent, therefore the information content is-
91Method 2: Calculate the information associated with each symbol, then add. The information associated with receiving symbol A is -log2(P(A)) and the information associated with symbol B is -log2(P(B)), therefore the information associated with receiving A followed by B is the sum of the two.
Remember that log(x) + log(y) = log(xy).
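That is, I = -\log_2(P(A)P(B)) = -\log_2 P(A) - \log_2 P(B), the same answer by either method.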
92Entropy
In this case nothing to do with thermodynamics.
The entropy of a source is a measure of its randomness; the more random a source, the higher its entropy. The entropy of a source is also the performance bound for arithmetic source coding: it gives the minimum average number of binary digits per symbol. Arithmetic source coding is any technique which assumes there is no memory in the system. An example of arithmetic source coding is Huffman coding, see next lecture.
93Entropy is defined as-
For example, if a source can transmit one of three symbols A, B, C with associated probabilities P(A) = 0.60 and P(B) = P(C) = 0.20, then the entropy of this source is-
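That is, H = -\sum_i P(i)\log_2 P(i), which here gives H = -0.6\log_2 0.6 - 2 \times 0.2\log_2 0.2 \approx 1.371 bits/symbol.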
94This means that the best possible arithmetic
coding scheme would represent this message with
an average of 1.371 binary digits per symbol
transmitted. The highest entropy occurs when the
symbols have equal probabilities and in this case
the best thing to do is to allocate each one an
equal length code.
95Example 1 If we have four symbols, A, B, C and
D, each with probability 0.25 then the entropy of
this source is-
Therefore the coding scheme-
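With equal probabilities, H = 4 \times 0.25 \log_2 4 = 2 bits/symbol, and any equal length assignment such as A = 00, B = 01, C = 10, D = 11 (used here as a representative example) has an average length of 2 binary digits per symbol.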
96is 100% efficient, since the average number of binary digits per symbol is equal to the entropy. Next lecture we shall look at arithmetic coding schemes which can approach 100% efficiency when the probabilities are not equal.
97Example 2 For a two symbol (binary) source where
p is the probability of transmitting a 1 we have-
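That is, H(p) = -p\log_2 p - (1 - p)\log_2(1 - p), which peaks at 1 bit/symbol when p = 0.5.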
98Information Loss Due to Noise
If a symbol A has probability P(A) then we have
already seen the information transmitted is-
If the channel is completely noiseless, then this
is also the information received. If the channel
is not noiseless, then there is a loss in
information as the result of incorrect decisions.
99The information received when there is noise in
the system is redefined as-
This can be written mathematically as-
The top line of this equation is always less than
1.000 in a system with noise. The effective or
received entropy is therefore less than the
transmitted entropy-
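In the notation of the worked example that follows, the information received about a symbol X is I_{received}(X) = \log_2 \frac{P(X_{TX}|X_{RX})}{P(X_{TX})}.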
100The difference between the transmitted entropy
and the received (effective) entropy Heff is
called the equivocation (E) and is the loss in
information rate caused by noise (in bits/symbol)
101Example: A source has an alphabet of three symbols, P(A) = 0.5, P(B) = 0.2 and P(C) = 0.3. For an A transmitted, the probabilities of reception are A = 0.6, B = 0.2, C = 0.2. For a B transmitted, the probabilities of reception are A = 0.5, B = 0.5, C = 0.0. For a C transmitted, the probabilities of reception are A = 0.0, B = 0.333, C = 0.667.
102Calculate the information associated with the transmission and reception of each symbol, and calculate the equivocation. On transmission, P(A) = 0.5000, P(B) = 0.2000 and P(C) = 0.3000, therefore the information associated with each symbol at the transmitter is-
103On reception we need to calculate the probability
of A received-
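That is, P(A_{RX}) = 0.5 \times 0.6 + 0.2 \times 0.5 + 0.3 \times 0.0 = 0.4.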
104Therefore we can calculate P(ATX|ARX), which is simply 0.3/0.4 = 0.75.
Notice the reduction from 1.0000 bits on transmission to 0.585 bits on reception.
105If we go through a similar process for the symbol
B
A reduction from 2.3219 bits on transmission to
0.737 bits on reception.
106If we go through a similar process for the symbol
C
A reduction from 1.7370 bits on transmission to
1.153 bits on reception.
107Note how the symbol with the lowest probability (B) has suffered the greatest loss due to noise. Noise creates uncertainty, and this has the greatest effect on those signals which have very low probability. Think about this in the context of alarm signals. To find the equivocation we need all the probabilities-
Using Bayes's rule-
109Therefore-
110Channel Capacity
The Hartley-Shannon coding theorem states that
the maximum capacity of a channel (RMAX) is given
by
where B is the bandwidth of the channel in Hz and
S/N is the power signal to noise ratio (as a
power ratio, NOT in dB)
111If we divide by the bandwidth we obtain the
Shannon limit
Average signal power S can be expressed as-
Eb is the energy per bit, k is the number of bits per symbol and T is the duration of a symbol. C = k/T is the transmission rate of the system in bits/s.
112N = N0B is the total noise power. N0 is the one sided noise power spectral density in W/Hz.
From this we can calculate the minimum bit energy
to noise power spectral density, called the
Shannon Bound-
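That is, \frac{C}{B} = \log_2\!\left(1 + \frac{E_b}{N_0}\frac{C}{B}\right), so \frac{E_b}{N_0} \ge \frac{2^{C/B} - 1}{C/B}; letting C/B \to 0 gives the Shannon bound E_b/N_0 \ge \ln 2, i.e. -1.59 dB.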
113This is for a continuous input, infinite block
size and coding rate 0.00 (not very practical!).
If we use a rate ½ code the capacity for a
continuous input and infinite block size is 0.00
dB (still not practical for digital
communications). If we use a rate ½ code and a
binary input and infinite block size, the
capacity limit is 0.19 dB. It is difficult (but
not impossible) to get very close to this limit,
for example Turbo codes in lecture 10.
114Lecture 7
115Lecture 7 contents
- Definition of coding efficiency
- Desirable properties of a source code
- Huffman coding
- Lempel-Ziv coding
- Other source coding methods
116Definition of coding efficiency
For an arithmetic code (i.e. one based only on
the probabilities of the symbols, for example
Huffman coding), the efficiency of the code is
defined as-
117Where H is the entropy of the source as defined
in lecture 6 and L is the average length of a
codeword.
Where P(m) is the probability of the symbol m and
lm is the length of the codeword assigned to the
symbol m in binary digits.
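That is, \text{efficiency} = \frac{H}{L} \times 100\% with L = \sum_m P(m)\,l_m.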
118Variable Length Arithmetic Codes
The principle behind arithmetic coding schemes
(Huffman coding) is to assign different length
codewords to symbols. By assigning shorter
codewords to the more frequent (probable)
symbols, we hope to reduce the average length of
a codeword, L. There is one essential property
and one desirable property of variable length
codewords.
119It is essential that a variable length code is
uniquely decodable. This means that a received
message must have a single possible meaning. For
example if we have four possible symbols in our
alphabet and we assign them the following codes: A = 0, B = 01, C = 11 and D = 00. If we receive the message 0011, then it is not known whether the message was D, C or A, A, C. This code is not uniquely decodable and therefore useless.
120It is also desirable that a code is
instantaneously decodable. For example if we
again have four symbols in our alphabet and we assign them the following codes: A = 0, B = 01, C = 011, D = 111. This code is uniquely decodable but not instantaneously decodable. If we receive a sequence 0010 (which represents A, B, A) we do not know that the first digit (0) represents A rather than B or C until we receive the second digit (0).
121Similarly we do not know if the second (0) and
the third received digit (1) represents B rather
than C until we receive the fourth digit (0)
etc. This code is usable, but the decoding is
unnecessarily complicated. If we reverse the
order of the bits in our previous code, so that
our new code is-
122A = 0, B = 10, C = 110, D = 111. Then we have a uniquely and instantaneously decodable code. This is called a comma code, because receipt of a 0 indicates the end of a codeword (except for the maximum length case). The same message as used in the previous example would be 0100, and this can be instantaneously decoded as A, B, A using the diagrams on the next slide.
124Simple Coding and Efficiency
For comparison, we shall use the same example for
simple and Huffman coding. This example consists
of eight symbols (A-H) with the probabilities of
occurrence given in the table.
125Simple Coding
For the simple case (often referred to as the
uncoded case) we would assign each of the symbols
a three bit code, as shown opposite.
126Simple Coding (efficiency)
The entropy of this source is-
The average length of a codeword is 3 binary
digits, so the efficiency is-
127Huffman coding
This is a variable length coding method. The
method is- 1) Reduction. Write the symbols in
descending order of probability. Reduce the two
least probable symbols into one symbol which has
the probability of the two symbols added
together. Reorder again in descending order of
probability. Repeat until all symbols are
combined into 1 symbol of probability 1.00
1292) Splitting process. Working backwards (from
the right) through the tree you have created,
assign a 0 to the top branch of each combining
operation and a 1 to the bottom branch. Add each
new digit to the right of the previous one.
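A minimal Python sketch (not from the slides) of this reduce-and-split procedure; the symbol probabilities used in the call are made-up illustrative values, not the ones in the slide's table, and which of the two combined branches receives the 0 is just a convention.

    import heapq

    def huffman(probs):
        """probs: dict symbol -> probability; returns dict symbol -> codeword."""
        # heap entries: (probability, tie breaker, {symbol: partial codeword})
        heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            p1, _, c1 = heapq.heappop(heap)    # the two least probable groups
            p2, _, c2 = heapq.heappop(heap)
            for s in c1:
                c1[s] = "0" + c1[s]            # top branch of the combination gets a 0
            for s in c2:
                c2[s] = "1" + c2[s]            # bottom branch gets a 1
            c1.update(c2)
            heapq.heappush(heap, (p1 + p2, count, c1))
            count += 1
        return heap[0][2]

    print(huffman({"A": 0.4, "B": 0.2, "C": 0.2, "D": 0.1, "E": 0.1}))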
131The average length of a codeword for this code
is-
132The efficiency of this code is therefore-
Note that this efficiency is higher than simple coding. If the probabilities are all of the form 2^-n, where n is an integer, then Huffman coding is 100% efficient.
133Lempel-Ziv Coding
This family of coding methods is fundamentally
different from the arithmetic technique described
so far (Huffman coding). It uses the fact that
small strings within messages repeat locally.
One such Lempel - Ziv method has a history buffer
and an incoming buffer and looks for matches. If
a match is found the position and length of the
match is transmitted instead of the character.
134If we have the string:
tam_eht (to be transmitted)
_no_tas_tac_eht (history buffer, already sent)
and we are using a history buffer of length 16 characters and a maximum match of 8 characters. We wish to encode tam_eht.
135We would transmit _eht as a match to _eht at
the right end of our history buffer using 8 bits,
1 bit to indicate a match has been found, 4 to
give the position in the history buffer of the
match and 3 for the length of the match (note
that we only need 3 bits for a match length of 8
as a match of length zero isn't a match at all!). So encoding _eht becomes-
1361 to indicate a match has been found
1011 to indicate the match starts at position 11 in the history buffer
011 to indicate the length of the match is 4 (000 indicates 1, 001 indicates 2, 010 indicates 3, 011 indicates 4, etc.)
Thus _eht encodes as 11011011. We then have-
137tam (to be transmitted)
_eht_no_tas_tac_ (history buffer, already sent)
There is no match for the m. We would then transmit m as 9 bits: 1 bit to indicate we couldn't find a match and 8 bits for the ASCII code for m. So encoding m becomes-
0 to indicate no match
01101101 the eight bit ASCII code for m
Thus m encodes as 001101101.
138We then have-
ta (to be transmitted)
m_eht_no_tas_tac_ (history buffer, already sent)
We would then transmit ta as a match to the ta at history buffer position 9 or 13, using the same sort of 8 bit pattern as we used for _eht. So encoding ta becomes-
1391 to indicate a match
1001 to indicate the match starts at position 9
001 to indicate the length of the match is 2 characters
Thus ta encodes as 11001001. Overall this means we have encoded 7 eight-bit ASCII characters (56 bits) into 25 bits, a compression factor of 25/56. This is fairly typical for practical implementations of this type of coding.
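A rough Python sketch (not from the slides) of this history buffer matching idea; the bit field sizes (1 bit flag, 4 bit position, 3 bit length) follow the example above, but positions are counted from 0 here and the buffer handling is a simplification.

    def lz_encode(text, history, hist_len=16, max_match=8):
        """Encode text against a sliding history buffer, returning a string of bit fields."""
        out = []
        while text:
            match_pos, match_len = 0, 0
            for length in range(min(max_match, len(text)), 1, -1):
                pos = history.find(text[:length])
                if pos != -1:
                    match_pos, match_len = pos, length      # longest match found first
                    break
            if match_len >= 2:
                out.append("1" + format(match_pos, "04b") + format(match_len - 1, "03b"))
                taken = match_len
            else:
                out.append("0" + format(ord(text[0]), "08b"))  # literal: 8 bit ASCII
                taken = 1
            history = (history + text[:taken])[-hist_len:]     # slide the history buffer
            text = text[taken:]
        return " ".join(out)

    print(lz_encode("the_cat_sat_on_the_mat", ""))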
140Normally the history buffer is much longer, at least 256 characters, and the stream to be encoded doesn't rhyme! This type of technique has many advantages over arithmetic source coding techniques such as Huffman. It is more efficient (often achieving over 100% efficiency by our previous memoryless definition) and it can adapt to changes in data statistics.
141However, errors can be disastrous, the coding time is longer as matches have to be searched for, and it doesn't work for short messages. This technique is used for compression of computer files for transport (ZIP), saving space on hard disk drives (compress, stacker etc.) and in some data modem standards.
142The particular implementation of Lempel-Ziv we have looked at is not very fast, because of the searching of history buffers. The Lempel-Ziv-Welch (LZW) implementation builds up tables of commonly occurring strings and then allocates them a 12 bit code. The compression ratio is similar to the method we have looked at, but it is more suited to long messages and it is faster (but LZW is useless for exam questions!).
143Run Length Encoding
Images naturally contain a lot of data. A FAX
page has 1728 pixels per line, 3.85 lines/mm and
a page length of 300mm, which gives almost 2
million pixels per page. This would take around
7 minutes to transmit using the typical 4.8
kbit/s modem built into FAX machines. This is
reduced using run length encoding where pixels
tend to be black or white in runs.
145A standard Huffman code has been devised for the
run lengths based on the above probabilities and
is used in so called Group 3 FAX transmission. An
example is-
147There is not much of a saving in this case (34
bits reduced to 26), but remember a real page of
text contains large white spaces. If you insert
a document in a FAX machine, it will quickly pass
through the white areas of a page but slow down
on dense text or images. An average page has a
compression factor of around 7 and therefore
takes around a minute to pass through the FAX
machine.
148Source Coding for Speech and Audio
In speech the time series of samples is broken up
into short blocks or frames before encoding. We
then transmit the coefficients of a model. For
example in linear prediction coding (LPC) the
encoder breaks the incoming sound into 30-50
frames per second. For each frame it sends the
frequency (pitch) and intensity (volume) of a
simple buzzer. It also sends the coefficients of
an FIR filter that, when used with the buzzer, gives a minimum error between the input signal and the filtered buzzer.
149The coefficients of the filters are estimated using an adaptive filter which compares its output with the voice input samples. LPC can bring the data rate down to 2400 bits/s without sounding too bad, and more modern techniques such as harmonic excited linear prediction (HELP) can be intelligible at 960 bits/s. At these coding rates people tend to sound a bit artificial; typical mobile phone coding rates for voice range from 9.6 kbit/s to around 13 kbit/s.
[Audio examples: the original signal, LPC at 2400 bits/s and HELP at 960 bits/s]
150Other Techniques for Compression
Other techniques transform the input blocks into the frequency domain and use a perceptual model, precision adaptive subband coding (PASC), to eliminate frequency bins which the ear cannot hear due to masking. This technique was used in digital compact cassette (DCC) and has high perceived quality (comparable with CD) while compressing audio by a factor of 4.
151PASC
[Figure: power (dB) against frequency in kHz (log scale), showing the spectral mask, the threshold of human hearing, and components masked because of a signal close in frequency]
152Other Techniques for Compression
LPC, HELP and PASC are lossy: there is a loss of information in the encoding process, even when there is no noise in the transmission channel. This is very noticeable in HELP with extremely low data rate transmission. However, the loss in DCC is imperceptible compared with a CD. DCC has nevertheless been scrapped; the people who you rely on most to pay for new audio technology wouldn't pay for a compressed system, and DCC has now been superseded by CD-R and CD-RW.
153Lecture 8
- Block codes for error rate control
154Contents
- Introduction
- Error rate control
- Forward error correction coding (channel coding)
- Block codes
- Probability of word errors
- Group codes
155Introduction
- In this lecture and the next two we look at
channel coding. The object of channel coding is
to reduce the number of errors induced into our
system by the channel. The most commonly used
measure for the quality of our transmission is
the bit error ratio (abbreviated to Pb)-
156Bit Error Ratio Pb
For digital communications systems, different Pb
will produce acceptable results. In speech
transmission, a bit error ratio of 10-2 to 10-3
is often considered acceptable, for example in
digital mobile phones. This means 1 in 100 to 1
in 1000 bits may be wrong. In data transmission
from computer to computer or digitized music such
as CD, Pb needs to be much lower, typically < 10^-8. To achieve such low error rates, forward error correcting codes (FECC) are used.
157Error Rate Control
If the Pb of the channel you wish to use is
unacceptable because of noise/interference then
you can-
- Increase the transmitter power. Increasing the
signal to noise ratio will reduce the Pb.
Unfortunately, with an interference limited
system, this will lead to a power storm (where
every transmitter in the system increases its
transmission power and eventually one or more
transmitters reaches the maximum power its
hardware is capable of generating) and be
completely self defeating.
158- Diversity. By adding together two signals that
have gone through two independent channels, the
signal to noise ratio is doubled (3 dB
improvement) in Gaussian noise and even greater
improvement is achieved in a fading channel. The
Pb reduces accordingly. This can be achieved
through space (antenna) diversity, frequency
diversity or time diversity.
159In space diversity, the signal is received by two antennas far enough apart to be considered
independent. In frequency diversity, the signal
is transmitted on two different frequencies,
again separated in frequency by enough to be
considered independent. In time diversity, the
signal is transmitted twice, spaced far enough
apart in time to be considered independent. The
two received signals must be fully independent
for diversity to work effectively.
160- Duplex transmission. In duplex transmission, the
receiver can echo the message back to the
transmitter and the transmitter can verify that
what it receives was what it sent. This requires
twice the bandwidth of a one way (simplex)
system. This requires transmitter and receiver
(transceivers) at both ends of the system. In
military systems the return signal would betray
the position of the receiver, which may be
unacceptable. There will also be a considerable
delay in fixing erroneous blocks.
161- Automatic repeat request (ARQ). In this
technique a few parity bits are added to the
transmitted signal to enable the detection of
errors. When the receiver detects an error, the
receiver asks the transmitter to re - send the
block with the error in it. This has similar
drawbacks to duplex transmission although it does
not require as much bandwidth.
162- Forward error correcting coding (FECC) This is
the technique that we will talk about over the
next three lectures. In this technique we will
code the signal in such a way that the decoder
will be able to fix most of the errors introduced
by a noisy channel. The extra bits required for
the transmission of the redundant information
will consume more bandwidth if we wish to
maintain the same throughput, but if we wish to
obtain low error rates then this trade-off is often acceptable. FECC is used in CDs, computer storage, all manner of radio links, modems, space communications etc.
163FECC is most useful when the desired error rate
is very low. Eb/N0 is the energy per bit to
noise power spectral density ratio as defined
earlier.
164Block Codes
Block codes group incoming data into blocks of k binary digits and add coding (parity) bits to make the coded block length n bits, where n > k. The coding rate R is simply the ratio of data bits to the overall block length, k/n. The number of parity check (redundant) bits is therefore n - k. The code is often written as an (n, k) code.
166Parity check bit
In some computer communications systems, information is sent as 7 bit ASCII codes with a parity check bit added on the end. The block length is n = 8, the number of information bits is k = 7 and the coding rate is therefore 7/8. For even parity, the parity bit is set so that the number of ones in the codeword is even; for odd parity the number of ones should be odd.
1670110101 would be coded as 01101010 for even
parity and 01101011 for odd parity. If there is
an odd number of errors in the transmission, then
the receiver can tell that an error has occurred,
but it can't correct the error because it doesn't know the position of the error.
168We assume that the probability of error for a
single bit, Pe , is small, so that the
probability of 1 error is much greater than the
probability of 3 errors in a block. We therefore
call parity check a single error detecting code.
169We can calculate the error detecting and
correcting power of a code from the minimum
distance in bits between error free blocks
(codewords). Example using even parity. 0000000
codes to 00000000 but 0000001 codes to 00000011.
This case (and all other cases) has a binary
digit difference (Hamming Distance) of 2. The
minimum distance in binary digits between any two
codewords is known as the minimum Hamming
Distance, Dmin which is 2 for the case of odd or
even parity check.
170The number of errors a code can detect is-
and the number of errors a code can correct is-
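That is, a code can detect Dmin - 1 errors, or correct \lfloor (Dmin - 1)/2 \rfloor errors.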
Note that a code cannot operate in detection and
correction mode simultaneously and still have the
power as defined in the above equations.
171In certain types of code the codeword weight (the
number of 1s in the codeword) can be used to
determine the minimum Hamming distance, see
section on group codes later. Although we shall
look exclusively at coding schemes for binary
systems, error correcting and detecting codes is
not confined to binary digits. For example the
ISBN numbers used on books have a checksum
appended to them and are modulo 11 arithmetic
(they use X for 10 in the checksum C). This code
is designed to spot digits in the wrong order.
172Example A, Digital Communications by Glover and
Grant (first edition) is...
173Example B, Mobile Radio Propagation Channel by
J. D. Parsons.
174Suppose we want to code k = 4 information bits into an n = 7 bit codeword, giving a coding rate of 4/7. We can achieve this (7,4) block code using 3 input exclusive-or (EX-OR) gates to form three even parity check bits, P1, P2 and P3.
175This circuitry can be written in the form of parity check equations, or as a parity check matrix H-
[Matrix H: the parity check equations alongside an identity matrix]
176Remember that ⊕ indicates EX-OR. Later we will see how this matrix can be used to generate codewords. This is an example of a systematic code, where the data is included explicitly in the codeword. A non-systematic code does not explicitly include the data in the transmission, although what is transmitted must be derived from the data.
177Probability of more than R errors in n binary
digits
If we have an error correcting code which can correct R errors, then the probability of a codeword not being correctable is the probability of having more than R errors in the n digits. The probability of having more than R errors is given by-
178Where the probability of j errors is-
Pe is the probability of error in a single binary
digit and n is the block length. nCj is the
number of ways of choosing j positions from a
block of length n binary digits. It is given by-
where ! denotes the factorial operation.
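That is, P(> R \text{ errors}) = \sum_{j=R+1}^{n} P(j), with P(j) = {}^{n}C_j\,P_e^{\,j}(1 - P_e)^{\,n-j} and {}^{n}C_j = \frac{n!}{j!\,(n-j)!}.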
179Example If we have an error correcting code
which can correct 3 errors and the block length n
is 10, what is the probability that the code
cannot correct a received block if
Pe = 0.01? Solution: The code cannot correct the received block if there are more than 3 errors. From the previous slide, the probability of more than 3 errors is-
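A quick Python check (not on the slides) of this sum, using the expressions above:

    from math import comb

    n, Pe, R = 10, 0.01, 3
    p_uncorrectable = sum(comb(n, j) * Pe**j * (1 - Pe)**(n - j)
                          for j in range(R + 1, n + 1))
    print(p_uncorrectable)   # roughly 2e-06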
180Thus the probability that the code cannot correct
a received block is-
181Note that this is much less than the original probability of error in a single bit, Pe = 0.01. Note also the need for high precision arithmetic (it may be that my eight digit calculator was not good enough to give the answer to more than 1 significant figure).
182Group Codes
Group codes are a special kind of block code. They contain the all zeros codeword and have a
special property called closure. This property
means that if any two valid codewords are bit
wise EX - ORed they produce another valid
codeword. An example of a (5,2) group code is
given on the next slide.
184The closure property means that to find the minimum Hamming distance, all that is required is to compare all the other codewords with the all zeros codeword (3 comparisons in our example) instead of comparing all the possible pairs of codewords (3 + 2 + 1 = 6 comparisons in our example). The saving gets bigger the longer the code (think about a code with 100 codewords and work out the number of comparisons: about 100 for a group code, compared with 99 + 98 + ... + 2 + 1 = 4950 for a non-group code).
185For group codes, therefore, the minimum Hamming distance Dmin is equal to the minimum codeword weight (the minimum number of 1s in any non-zero codeword). In our example the minimum codeword weight is 3, and this code could be used to correct (Dmin - 1)/2 = 1 error, or to detect Dmin - 1 = 2 errors. It cannot do both! Reed-Solomon and Bose-Chaudhuri-Hocquenghem (BCH) codes are group codes. Reed-Solomon codes are used extensively in CDs and memories.
186Lecture 9
- Group and cyclic codes
- In this lecture all vectors are column vectors by
definition
187Group and Cyclic Codes
Contents
- Nearest neighbour decoding
- Hamming bound
- Syndrome decoding
- Cyclic codes
188Nearest Neighbor Decoding
Nearest neighbour decoding assumes that the
codeword nearest in Hamming distance to the
received word is what was transmitted. This
inherently contains the assumption that the
probability of t errors is greater than the probability of t + 1 errors, or that Pe is small. A nearest neighbour decoding table for the (5,2) group code is shown on the next slide.
189[Table: the four codewords, the single bit error patterns (correctable) and the double bit error patterns (detectable but not correctable)]
190All the single error patterns can be assigned
uniquely to an error free codeword, therefore
this code is capable of correcting 1 error. There
are also eight double error patterns which are
equal in Hamming distance from at least 2
codewords. These errors can be detected but not
corrected.
191Note that nearest neighbor decoding can also be
done on a soft decision basis, with real
non-binary numbers from the receiver. The
nearest Euclidean distance (nearest in terms of
5D geometry) is then used and this gives a
considerable performance increase over the hard
decision decoding as in the previous slide.
192Upper bound on the Performance of Block Codes
The upper bound on the performance of block codes
is given by the Hamming bound, sometimes called the sphere packing bound. If we are trying to create a code to correct t errors with a block length of n and k information digits, then if-
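2^{\,n-k} \ge \sum_{i=0}^{t} {}^{n}C_i,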
such a code is possible.
193If this equation is not true, then we must be
less ambitious by reducing t or k (for the same
block length n) or increasing n (while
maintaining t and k). Example: Comment on the possibility of a (5,2) code which corrects 1 error, and the possibility of a (5,2) code which corrects 2 errors. Solution: k = 2, n = 5 and t = 1 leads to 2^3 = 8 ≥ 1 + 5 = 6,
194which is true, so such a code is possible. If we try to design a (5,2) code which corrects 2 errors we have k = 2, n = 5 and t = 2, requiring 2^3 = 8 ≥ 1 + 5 + 10 = 16,
which is false and such a code cannot be created.
This is exactly the situation in the example we
used for nearest neighbour decoding, we could
correct all single error patterns but we could
not correct all the double error patterns.
195Syndrome decoding
For a large block length, nearest neighbour
decoding tables get prohibitively large and it
takes too long to look up the received codeword.
A much easier method to correct incoming
codewords is syndrome decoding. We can generate
codewords using the generator matrix G which is
derived from the parity check matrix H as shown
on the next slide for the (7,4) code example.
196[Matrices for the (7,4) code: the parity check matrix H is the parity check equations alongside an identity matrix; the generator matrix G is an identity matrix alongside the transpose of the parity check bits]
197The part to the right of the line in G is the transpose of the part of H to the left of the line. The other parts are appropriately sized identity matrices. Note that when adding in the following matrix operations we add modulo 2, so 1 + 1 = 0! A codeword c is formed by multiplying the data vector dT by G-
198cT = dTG
199Since H is a set of even parity check equations, Hc = 0T for any valid codeword c (in our example, a 3-element column of zeros).
The received vector is the codeword plus an error
vector-
200For our example, if there is an error in the 4th bit, the received vector is r = c ⊕ e, where the error vector e has a 1 in the fourth position and zeros elsewhere. We can generate the syndrome vector s by multiplying H by r-
s = Hr
201If there are no errors, e = 0T and Hr = Hc = 0T. If there is an error, then its position is given by the syndrome via a syndrome table, constructed by considering He for each error pattern-
[Table: error pattern eT against syndrome sT = (He)T]
202Computing the syndrome for our received vector and looking it up in the syndrome table tells us the error is in the fourth bit.
203We then simply change the fourth bit in our
received vector r which gives us back c.
We can then discard the parity bits to get our
original data back-
204Cyclic Codes
Cyclic codes are a subclass of group codes, in
that they satisfy the closure property. A
codeword is generated using a generator
polynomial P(x). All codewords are cyclic shifts
of each other. Firstly the message is padded
with the number of zeros equal to the highest
power of P(x). For example, if P(x) = x^3 + x^2 + 1 (binary 1101)
205and the message M(x) = 1001, then 3 zeros are appended to M(x) to give 1001000. We then divide 1001000 by the generator polynomial P(x) = 1101. Note that, as usual in coding, we use bitwise EX-OR arithmetic, which has no carries: 1 - 1 = 0, 1 - 0 = 1, 0 - 1 = 1 and 0 - 0 = 0.
207We then replace the padded zeros with the
remainder, so in our case the codeword to be
transmitted is 1001011. The received codeword is
then divided by the generator polynomial. An
error free codeword should give a remainder of 0.
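A small Python sketch (not from the slides) of this modulo 2 division, representing the polynomials as bit strings:

    def mod2_remainder(message, generator):
        """Divide the (already zero padded) message by the generator, modulo 2."""
        bits = list(message)
        glen = len(generator)
        for i in range(len(bits) - glen + 1):
            if bits[i] == "1":                      # only subtract when the leading bit is 1
                for j in range(glen):
                    bits[i + j] = str(int(bits[i + j]) ^ int(generator[j]))
        return "".join(bits[-(glen - 1):])          # remainder is one bit shorter than the generator

    print(mod2_remainder("1001000", "1101"))   # '011', so the codeword is 1001011
    print(mod2_remainder("1001011", "1101"))   # '000' for an error free codeword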
209A codeword with an error in it will give a
syndrome remainder which can be looked up in a
syndrome table. The syndrome table can be
created by considering errors in the all zeros
codeword-
210If we take our earlier codeword, 1001011, and
create an error in the fourth bit, we get
1000011. This should give the same syndrome as
0001000-
211Knowing that the error is in the fourth bit, we
simply change the fourth bit in our received
codeword 1000011 to get back to 1001011. We then
discard the last three digits to recover our data
as 1001.
212If we have the polynomial P(x) = x^3 + x^2 + 1, this can be represented as 1101. If we have messages a) 1100, b) 0101 and c) 1011, where the italic bit is the first bit to enter the encoder, then the division process for the (7,4) code would be-
213a) 1100000 divided by 1101 leaves remainder 101
b) 0101000 divided by 1101 leaves remainder 110
c) 1011000 divided by 1101 leaves remainder 100
The circuit is-
214[Figure: shift register encoder circuit with stages x^2 and x^3, EX-OR feedback, nodes labelled A (message input) to F, a switch operated after the message has been clocked in, and the codeword output with its parity bits]
Parity bits are E C B = 110
Parity bits are E C B = 100
Parity bits are E C B = 101
215We get the same codeword in the same order by
replacing the padding zeros in the division
process with the remainder. We also get the same
bits in the same order from the circuit.
216If we change the generator polynomial to P(x) = x^3 + x + 1, this can be represented as 1011. If we use the same message examples, we get-
a) 1100000 divided by 1011 leaves remainder 010
b) 0101000 divided by 1011 leaves remainder 100
c) 1011000 divided by 1011 leaves remainder 000
217Parity bits (read from nodes E, D, B) are 010, 100 and 000 respectively.
218The decoding for the circuit on the previous page
can be done using another simple circuit-
219Lecture 10
- Convolutional Codes, Interleaving and Turbo Codes
220Contents
- Convolutional encoding
- Viterbi decoding
- Practical codes
- Interleaving to prevent burst errors
- Performance comparisons
- Turbo codes
221Convolutional encoding
Convolutional codes are another type of FECC.
They are simpler to implement for longer codes
and soft decision decoding can be employed easily
at the decoder. Convolutional codes are
generated by passing a data sequence through a
transversal (FIR) filter. The coder output may
be regarded as the convolution of the input
sequence with the impulse response of the coder,
hence the name convolutional codes. A simple
example is shown on the next slide.
222The shift register is initially assumed to contain all 0s.
[Figure: rate 1/2 convolutional encoder - a 3-stage shift register feeding two EX-OR gates whose outputs are selected alternately by a switch; for the example input the shift register contents step through 100, 110, 011, 101]
223This encoder is rate 1/2, as it produces two output bits for every input bit. The first output is obtained with the switch in the upper position, the second with the switch in the lower position. This encoder has 3 stages in the filter and therefore we say that the constraint length is n = 3. The very latest encoders available commercially have constraint length n = 9. We can consider the outputs as being generated by two polynomials, P1(x) = 1 + x^2 and P2(x) = 1 + x. These are often expressed in octal notation; in our example P1 = 5 octal and P2 = 6 octal.
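A short Python sketch (not from the slides) of a rate 1/2 encoder with these generators; exactly which EX-OR feeds the first switch position is an assumption about the circuit, so the output bit ordering may differ from the figure.

    def conv_encode(bits):
        """Rate 1/2 convolutional encoder, constraint length 3, generators 101 (5 octal) and 110 (6 octal)."""
        s1, s2 = 0, 0                  # the two memory stages, initially all zeros
        out = []
        for b in bits:
            out.append(b ^ s2)         # P1(x) = 1 + x^2: current bit XOR the oldest stored bit
            out.append(b ^ s1)         # P2(x) = 1 + x:   current bit XOR the newest stored bit
            s1, s2 = b, s1             # shift the register
        return out

    print(conv_encode([1, 0, 1, 1]))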
224This encoder may be regarded as a state machine.
The next state is determined by the input and the
previous two input values stored in the shift
register. We can regard this as a Mealy state
machine with four states corresponding to all the
possible combinations of the first two stages in
the shift register. A tree diagram for this
state machine is shown on the next slide.
225[Figure: tree diagram for the encoder - starting from state A, the upper branch out of each node corresponds to input 0 and the lower branch to input 1, with the pair of output bits marked on each branch; the states in the final layer are labelled H to K and h to k, and the state is given by the first two bits in the shift register]
226The tree diagram on the previous slide tends to
suggest that there are eight states in the last
layer of the tree and that this will continue to
grow. However some states in the last layer are
equivalent as indicated by the same letter on the
tree (for example H and h). These pairs of
states may be assumed to be equivalent because
they have the same internal state for the first
two stages of the shift register and therefore
will behave exactly the same way to a new input.
Thus the tree can be folded into a trellis as
shown on the next slide. Note the arrangement of
states A, B, D, H in a horizontal line. The same
thing applies to states C, E, I and M etc.
227[Figure: trellis diagram - time runs horizontally (covering 4 data bits) and state vertically; the four states are a (00x), b (10x), c (01x) and d (11x), the upper path out of each state corresponds to input 0 and the lower path to input 1, and the output bit pairs are marked above the arrows]
228The horizontal direction corresponds to time (the whole diagram on the previous slide covers 4 data bits). The vertical direction corresponds to state. It can be shown that states along the time axis are also equivalent, for example H is equivalent to L and C is equivalent to E. In fact all the states in a horizontal line are equivalent. Thus we can identify only four states: a, b, c and d. We can draw a state diagram containing only these states-
229[Figure: state diagram of the four states, each transition labelled 'input (output 1 output 2)']
230Viterbi de