Title: Question from exercises 2
1. Question (from exercises 2)
- Are the following sources likely to be stationary and ergodic?
- (i) Binary source, typical sequence aaaabaabbabbbababbbabbbbbabbbbaabbbbbba...
- (ii) Quaternary source (4 symbols), typical sequences abbabbabbababbbaabbabbbab... and cdccdcdccddcccdccdccddcccdc...
- (iii) Ternary source (3 symbols), typical sequence AABACACBACCBACBABAABCACBAA
- (iv) Quaternary source, typical sequence 124124134124134124124
2. Definitions
- A source is stationary if its symbol probabilities do not change with time, e.g.
  - binary source: Pr(0) = Pr(1) = 0.5
  - probabilities assumed to be the same at all times
- A source is ergodic if it is stationary and
  - no proper subset of it is stationary
    - i.e. the source does not get locked into a subset of symbols or states
  - it is not periodic
    - i.e. the states do not occur in a regular pattern
  - e.g. the output s1 s2 s3 s1 s4 s3 s1 s4 s5 s1 s2 s5 s1 s4 s3 is periodic because s1 occurs every 3 symbols
3. Review
- remove redundancy to maximise information transfer
- use redundancy to correct transmission errors
4. Shannon Source Coding Theorem
- N identical, independently distributed random variables, each with entropy H(X), can be compressed into slightly more than N H(X) bits with it being virtually certain that no information will be lost.
- Conversely, if they are compressed into fewer than N H(X) bits, it is virtually certain that information will be lost.
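As a rough numerical sketch of the statement above (the source and its probabilities here are assumed for illustration, not taken from the slides):

```python
import math

def entropy(probs):
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical binary source with Pr(a) = 0.9, Pr(b) = 0.1
H = entropy([0.9, 0.1])        # about 0.47 bits/symbol
N = 1000                       # number of i.i.d. symbols
print(f"H(X) = {H:.3f} bits/symbol")
print(f"about {N * H:.0f} bits suffice for {N} symbols; fewer and loss is virtually certain")
```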
5. Optimal Coding
Noise-free communication channel
- Requirements for a code
- efficiency
- uniquely decodable
- immunity to noise
- instantaneous
6. Definitions
- Coding: conversion of source symbols into a different alphabet for transmission over a channel. The input to the encoder uses the source alphabet (m symbols); the encoder output uses the code alphabet (n symbols). Coding is necessary if n < m.
- Code word: group of output symbols corresponding to an input symbol (or group of input symbols).
- Code: set (table) of all input symbols (or input words) and the corresponding code words.
- Word length: number of output symbols in a code word.
- Average word length (AWL): AWL = Σ_i Pr(a_i) N_i, where N_i is the length of the word for symbol a_i.
- Optimal code: has the minimum average word length for a given source.
- Efficiency: η = H / (AWL × log2 n), where H is the entropy per symbol of the source; for a binary code alphabet (n = 2) this is η = H / AWL.
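A small sketch of these last two definitions, using an assumed source and binary code (the symbols, probabilities, and codewords below are illustrative, not from the slides):

```python
import math

probs = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}   # assumed source
code  = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}   # assumed binary code

H   = -sum(p * math.log2(p) for p in probs.values())    # entropy per symbol
AWL = sum(probs[s] * len(code[s]) for s in probs)       # average word length
eta = H / AWL                                           # efficiency (binary code, so log2 n = 1)

print(H, AWL, eta)   # 1.75 1.75 1.0 -> this particular code is optimal
```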
7. Binary encoding
- A (binary) symbol code f is a mapping f: A → {0,1}+
- or (abusing notation) f: A+ → {0,1}+
- where {0,1}+ = {0, 1, 00, 01, 10, 11, 000, 001, ...}
- if f has an inverse then it is uniquely decodable
- compression is achieved (on average) by assigning
  - shorter encodings to the more probable symbols in A
  - longer encodings to the less probable symbols
- easy to decode if we can identify the end of a codeword as soon as it arrives (instantaneous)
  - no codeword can be a prefix of another codeword
  - e.g. 1 and 10 are prefixes of 101
8. Prefix codes
- no codeword is a prefix of any other codeword
- also known as an instantaneous or self-punctuating code
  - an encoded string can be decoded from left to right without looking ahead to subsequent codewords
- a prefix code is uniquely decodable (but not all uniquely decodable codes are prefix codes)
- can be written as a tree, with codewords at the leaves (see the decoding sketch below)
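Because no codeword is a prefix of another, a decoder can emit a symbol the moment a codeword is complete. A minimal left-to-right decoding sketch, using an assumed prefix code (not one given in the slides):

```python
code   = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}   # assumed prefix code
decode = {w: s for s, w in code.items()}

def decode_prefix(bits):
    out, word = [], ''
    for b in bits:
        word += b
        if word in decode:        # a complete codeword: emit its symbol and restart
            out.append(decode[word])
            word = ''
    return ''.join(out)

print(decode_prefix('0110111100'))   # '0'+'110'+'111'+'10'+'0' -> 'acdba'
```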
9. Limits on prefix codes
- the maximum number of codewords of length l is 2^l
- if we shorten one codeword, we must lengthen others to retain unique decodability
- for any uniquely decodable binary coding, the codeword lengths l_i satisfy
  Σ_i 2^(-l_i) ≤ 1   (Kraft inequality)
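A quick numerical check of the Kraft inequality for two assumed sets of binary codeword lengths:

```python
def kraft_sum(lengths):
    """Sum of 2**-l over the codeword lengths; a binary prefix code with these lengths exists iff this is <= 1."""
    return sum(2 ** -l for l in lengths)

print(kraft_sum([1, 2, 3, 3]))   # 1.0  -> achievable (e.g. 0, 10, 110, 111)
print(kraft_sum([1, 1, 2]))      # 1.25 -> no uniquely decodable binary code has these lengths
```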
10. Coding example
- all source digits equally probable
- source entropy = log2(10) ≈ 3.32 bits/symbol
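The slide's code table itself is not reproduced here. As an illustration only (the 4-bit fixed-length comparison is an assumption, not from the slide), encoding ten equiprobable digits with a fixed-length binary code gives:

```python
import math

H = math.log2(10)                   # entropy of 10 equiprobable digits, ~3.32 bits/symbol
L_fixed = math.ceil(math.log2(10))  # a fixed-length binary code needs 4 bits per digit
print(f"H = {H:.2f} bits/sym, L = {L_fixed} bits/sym, efficiency = {H / L_fixed:.1%}")
```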
11. Prefix codes (reminder)
- variable length
- uniquely decodable
- instantaneous
- can be represented as a tree
- no code word is a prefix of another
  - e.g. if ABAACA is a code word then A, AB, ABA, ABAA, ABAAC cannot be used as code words
- Kraft inequality
12. Optimal prefix codes
- (1) if Pr(a1) ≥ Pr(a2) ≥ ... ≥ Pr(am), then l1 ≤ l2 ≤ ... ≤ lm, where li is the length of the word for symbol ai
- (2) at least 2 (and up to n) of the least probable input symbols will have the same prefix and differ only in the last output symbol
- (3) every possible sequence of up to lm - 1 output symbols must either be a code word or have one of its prefixes used as a code word (lm is the longest word length)
- for a binary code, the optimal word length for a symbol equals its information content, i.e. li = log2(1/pi)
13. Converse
- conversely, any set of word lengths li implicitly defines a set of symbol probabilities qi for which those word lengths li are optimal
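One common way to read this (an interpretation, not spelled out in the slide) is that the implicit probabilities are qi = 2^(-li), normalised over the code:

```python
# Assumed word lengths; qi = 2**-li / sum_j 2**-lj are the probabilities
# for which these lengths would be optimal.
lengths = [1, 2, 3, 3]
z = sum(2 ** -l for l in lengths)
q = [2 ** -l / z for l in lengths]
print(q)   # [0.5, 0.25, 0.125, 0.125]
```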
14. Compression - How close can we get to the entropy?
- We can always find a binary prefix code with average word length L satisfying H ≤ L < H + 1
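A sketch of why the upper bound is always achievable: rounding the ideal lengths log2(1/pi) up to integers keeps the Kraft sum at or below 1 and adds less than one bit per symbol on average. The probabilities below are the ones from the Huffman example later in this deck.

```python
import math

probs = [0.45, 0.25, 0.2, 0.1]                            # Pr(s4), Pr(s2), Pr(s3), Pr(s1)
lengths = [math.ceil(math.log2(1 / p)) for p in probs]    # rounded-up ("Shannon") lengths

H = -sum(p * math.log2(p) for p in probs)
L = sum(p * l for p, l in zip(probs, lengths))
kraft = sum(2 ** -l for l in lengths)
print(f"Kraft sum = {kraft}, H = {H:.3f}, L = {L:.3f}")   # 0.6875, 1.815, 2.400
```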
15. Huffman prefix code
- used for image compression
- general approach:
  - work out the necessary conditions for a code to be optimal
  - use these to construct the code
- from condition (3) of prefix codes (earlier slide):
  - am → x x x 0 (least probable)
  - am-1 → x x x 1 (next least probable)
  - therefore assign the final digit first
- e.g. consider the source used in the example slides below (s1 to s4)
16. Algorithm
- Lay out all symbols in a line, one node per symbol
- Merge the two least probable symbols into a single node
- Add their probabilities and assign the sum to the merged node
- Repeat until only one node remains
- Assign the binary code working back from the last node, assigning 0 for the lower probability link at each step (see the code sketch below)
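A minimal Python sketch of this merging procedure (the function and data representation are mine, not from the slides); run on the source from the example slides that follow, it reproduces one valid Huffman code.

```python
import heapq

def huffman_code(probs):
    """Binary Huffman code for a dict {symbol: probability}.

    Repeatedly merge the two least probable nodes, giving the lower-probability
    branch a 0 and the other a 1, then read codewords from the root down.
    """
    # Heap entries: (probability, tie-break counter, {symbol: partial codeword})
    heap = [(p, i, {s: ''}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # lower probability -> bit 0
        p1, _, c1 = heapq.heappop(heap)   # next lowest       -> bit 1
        merged = {s: '0' + w for s, w in c0.items()}
        merged.update({s: '1' + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

code = huffman_code({'s1': 0.1, 's2': 0.25, 's3': 0.2, 's4': 0.45})
print(code)   # e.g. {'s4': '0', 's2': '10', 's1': '110', 's3': '111'}
```

With these probabilities the word lengths come out as 1, 2, 3, 3, giving an average word length of 0.45×1 + 0.25×2 + 0.2×3 + 0.1×3 = 1.85 bits/symbol, against a source entropy of about 1.81 bits/symbol.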
17. Example
[Diagram: the four source symbols laid out as nodes, s1 with Pr(s1) = 0.1, s2 with Pr(s2) = 0.25, s3 with Pr(s3) = 0.2, s4 with Pr(s4) = 0.45.]
18. Example - contd.
[Diagram: s1 (0.1) and s3 (0.2) merged into a node of probability 0.3; that node and s2 (0.25) merged into a node of probability 0.55.]
19. Example - step 5
[Diagram: the 0.55 node and s4 (0.45) merged into the root node of probability 1.]
20. Algorithm
- Lay out all symbols in a line, one node per symbol
- Merge the two least probable symbols into a single node
- Add their probabilities and assign the sum to the merged node
- Repeat until only one node remains
- Assign the binary code working back from the last node, assigning 0 for the lower probability link at each step
21. Comments
- we can choose a different ordering of 0 and 1 at each merging node
  - 2^m different codes (m = number of merging nodes, i.e. not symbol nodes)
  - 2^3 = 8 in the previous example
  - but the AWL is the same for all of these codes
  - hence the source entropy and efficiency are the same
- What if n (the number of symbols in the code alphabet) is larger than 2?
  - condition (2) says we can group from 2 to n symbols
  - condition (3) effectively says we should use groups as large as possible and end up with a single composite symbol
22. Disadvantages of Huffman Code
- we have assumed that the probabilities of our source symbols are known and fixed
  - symbol frequencies may vary with context (e.g. a Markov source)
- up to 1 extra bit per symbol is needed
  - could be serious if H(A) ≈ 1 bit!
  - e.g. the entropy of English is approximately 1 bit per character
- beyond symbol codes
  - arithmetic coding: moves away from the idea that one symbol → an integer number of bits
  - e.g. Lempel-Ziv coding
  - not covered in this course
23. Another question
- consider a message (a sequence of characters from a, b, c, d) encoded using the code shown
- what is the probability that a randomly chosen bit from the encoded message is 1?
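The code table referred to above is not included in this text, so the sketch below assumes a hypothetical prefix code and symbol probabilities purely to show the method: the long-run fraction of 1s is the expected number of 1s per symbol divided by the average word length.

```python
probs = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}   # assumed probabilities
code  = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}   # assumed code (not the one on the slide)

ones_per_sym = sum(probs[s] * code[s].count('1') for s in probs)   # expected 1s per source symbol
bits_per_sym = sum(probs[s] * len(code[s]) for s in probs)         # average word length
print(ones_per_sym / bits_per_sym)   # 0.5 for this particular code
```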
24. Shannon-Fano theorem
- Channel capacity
  - the entropy (bits/sec) of the encoder output is determined by the entropy of the source (bits/sym)
  - if we increase the rate at which the source generates information (bits/sym), eventually we will reach the limit of the encoder (bits/sec); at this point the encoder's entropy will have reached a limit
  - this is the channel capacity
- S-F theorem
  - source has entropy H bits/symbol
  - channel has capacity C bits/sec
  - it is possible to encode the source so that its symbols can be transmitted at up to C/H symbols per second, but no faster
  - (general proof in notes)
25. [Slide fragment: shows that the code satisfies the Kraft inequality and gives its average word length.]