Transcript and Presenter's Notes

Title: Question from exercises 2


1
Question (from exercises 2)
  • Are the following sources likely to be stationary
    and ergodic?
  • (i) Binary source, typical sequence
    aaaabaabbabbbababbbabbbbbabbbbaabbbbbba......
  • (ii) Quaternary source (4 symbols), typical
    sequences abbabbabbababbbaabbabbbab.......
    and cdccdcdccddcccdccdccddcccdc......
  • (iii) Ternary source (3 symbols), typical
    sequence AABACACBACCBACBABAABCACBAA
  • (iv) Quaternary source, typical sequence
    124124134124134124124

2
Definitions
  • A source is stationary if its symbol
    probabilities do not change with time, e.g.
  • Binary source: Pr(0) = Pr(1) = 0.5
  • Probabilities assumed same all the time
  • A source is ergodic if it is stationary and
  • No proper subset of it is stationary
  • i.e. source does not get locked in subset of
    symbols or states
  • It is not periodic
  • i.e. the states do not occur in a regular pattern
  • E.g. output s1 s2 s3 s1 s4 s3 s1 s4 s5 s1 s2 s5
    s1 s4 s3
  • is periodic because s1 occurs every 3 symbols

3
Review
  • remove redundancy to maximise information
    transfer
  • use redundancy to correct transmission errors

4
Shannon Source Coding Theorem
  • N identical, independently distributed random
    variables, each with entropy H(X), can be
    compressed into a little more than N·H(X) bits
    with it being virtually certain that no
    information will be lost (as N grows large)
  • compression into fewer than N·H(X) bits makes it
    virtually certain that information will be lost
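A minimal sketch of what the bound means in practice; the source probabilities and N below are illustrative assumptions, not taken from the slides:

```python
import math

def entropy(probs):
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative biased binary source (assumed values)
probs = [0.9, 0.1]
H = entropy(probs)      # about 0.469 bits/symbol
N = 10_000              # number of i.i.d. symbols

# About N*H(X) bits suffice for large N; fewer and information is almost surely lost
print(f"H(X) = {H:.3f} bits/symbol; N*H(X) = {N*H:.0f} bits vs {N} raw bits")
```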
5
Optimal Coding
 
Noise-free communication channel
  • Requirements for a code
  • efficiency
  • uniquely decodable
  • immunity to noise
  • instantaneous

6
Definitions
Coding
conversion of source symbols into a different
alphabet for transmission over a channel.
Input to encoder: source alphabet (m symbols);
encoder output: code alphabet (n symbols).
Coding is necessary if n < m
Code word
group of output symbols corresponding to an input
symbol (or group of input symbols)
Code
set (table) of all input symbols (or input words)
and the corresponding code words
Word Length
number of output symbols in a code word
Average Word Length (AWL)
AWL = Σi Pr(ai)·Ni, where Ni is the length of the
word for symbol ai
Optimal Code
has minimum average word length for a given
source
Efficiency
efficiency = H / (AWL·log2 n), where H is the entropy
per symbol of the source (for a binary code this is
simply H / AWL)
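A small sketch of these definitions in code; the code table below is an assumed example, not the one from the slides:

```python
import math

# Assumed binary code for a 4-symbol source: symbol -> (probability, codeword)
code = {"a": (0.5, "0"), "b": (0.25, "10"), "c": (0.125, "110"), "d": (0.125, "111")}

H   = -sum(p * math.log2(p) for p, _ in code.values())   # entropy, bits/symbol
awl = sum(p * len(w) for p, w in code.values())          # average word length

print(f"H = {H}, AWL = {awl}, efficiency = {H / awl:.2f}")
# Here H = AWL = 1.75, so efficiency = 1.0 (the code is optimal for this source)
```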
7
Binary encoding
  • A (binary) symbol code f is a mapping f : A → {0,1}+
  • or (abusing notation, when extended to strings by
    concatenation) f : A+ → {0,1}+
  • where {0,1}+ = {0, 1, 00, 01, 10, 11, 000, 001, ...}
  • if f has an inverse then it is uniquely decodable
  • compression is achieved (on average) by assigning
  • shorter encodings to the more probable symbols in
    A
  • longer encodings to the less probable symbols
  • easy to decode if we can identify the end of a
    codeword as soon as it arrives (instantaneous)
  • no codeword can be a prefix of another codeword
  • e.g. 1 and 10 are prefixes of 101

8
Prefix codes
  • no codeword is a prefix of any other codeword.
  • also known as an instantaneous or
    self-punctuating code,
  • an encoded string can be decoded from left to
    right without looking ahead to subsequent
    codewords
  • a prefix code is uniquely decodable (but not all
    uniquely decodable codes are prefix codes)
  • can be written as a tree, with the codewords at
    the leaves
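A minimal sketch of the prefix property and left-to-right decoding; the code table here is assumed for illustration:

```python
def is_prefix_code(codewords):
    """True if no codeword is a prefix of another (instantaneous code)."""
    words = list(codewords)
    return not any(a != b and b.startswith(a) for a in words for b in words)

def decode(bits, code):
    """Decode left to right, emitting a symbol as soon as a codeword completes."""
    inverse = {w: s for s, w in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:          # end of codeword identified immediately
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

code = {"a": "0", "b": "10", "c": "110", "d": "111"}   # assumed prefix code
assert is_prefix_code(code.values())
print(decode("010110111", code))   # -> "abcd"
```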

9
Limits on prefix codes
  • the maximum number of codewords of length l is 2^l
  • if we shorten one codeword, we must lengthen
    others to retain unique decodability
  • For any uniquely decodable binary code, the
    codeword lengths li satisfy

    Σi 2^(-li) ≤ 1      (Kraft inequality)
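The inequality can be checked directly; a one-line sketch:

```python
def kraft_sum(lengths, n=2):
    """Sum of n**(-li); must be <= 1 for any uniquely decodable code."""
    return sum(n ** -l for l in lengths)

print(kraft_sum([1, 2, 3, 3]))   # 0.5 + 0.25 + 0.125 + 0.125 = 1.0, satisfies Kraft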
10
Coding example
  • all source digits equally probable
  • source entropy = log2 10 ≈ 3.32 bits/sym

11
Prefix codes (reminder)
  • variable length
  • uniquely decodable
  • instantaneous
  • can be represented as a tree
  • no code word is a prefix of another
  • e.g. if ABAACA is a code word then A, AB, ABA,
    ABAA, ABAAC cannot be used as code words
  • Kraft inequality: Σi 2^(-li) ≤ 1

12
Optimal prefix codes
  • if Pr(a1) ≥ Pr(a2) ≥ … ≥ Pr(am), then l1 ≤
    l2 ≤ … ≤ lm, where li = length of word
    for symbol ai
  • at least 2 (up to n) least probable input symbols
    will have the same prefix and only differ in the
    last output symbol
  • every possible sequence up to lm-1 output
    symbols must be a code word or have one of its
    prefixes used as a code word (lm is the longest
    word length)
  • for a binary code, the optimal word length for a
    symbol is equal to its information content, i.e.
  • li = log2(1/pi)

13
Converse
  • conversely, any set of word lengths li
    implicitly defines a set of symbol probabilities
    qi = 2^(-li) (normalised to sum to 1) for which
    the word lengths li are optimal

14
Compression - How close can we get to the entropy?
  • We can always find a binary prefix code with
    average word length L satisfying

    H ≤ L < H + 1
15
Huffman prefix code
  • used for image compression
  • General approach
  • Work out necessary conditions for a code to be
    optimal
  • Use these to construct code
  • from condition (3) of prefix code (earlier slide)
  • am → x x … x 0 (least probable)
  • am-1 → x x … x 1 (next most probable)
  • therefore assign final digit first
  • e.g. consider the source on the right

16
Algorithm
  • Lay out all symbols in a line, one node per
    symbol
  • Merge the two least probable symbols into a
    single node
  • Add their probabilities and assign this to the
    merged node
  • Repeat until only one node remains
  • Assign binary code from last node, assigning 0
    for the lower probability link at each step
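A minimal sketch of this merge procedure in Python (the heap bookkeeping and dictionary layout are assumptions made here, not part of the slides); it is run on the four-symbol source used in the example that follows:

```python
import heapq

def huffman(probs):
    """Build a binary Huffman code from a dict {symbol: probability}."""
    # Heap entries: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        # Merge the two least probable nodes; their probabilities add
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        # Prepend 0 on the lower-probability branch, 1 on the other
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, count, merged))
        count += 1
    return heap[0][2]

probs = {"s1": 0.1, "s2": 0.25, "s3": 0.2, "s4": 0.45}
code = huffman(probs)
awl = sum(probs[s] * len(w) for s, w in code.items())
print(code, f"AWL = {awl}")   # word lengths 3, 2, 3, 1 -> AWL = 1.85
```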

17
Example
18
Example - contd.
[Figure: partial Huffman tree. Symbols s1 Pr(s1)=0.1, s2 Pr(s2)=0.25,
s3 Pr(s3)=0.2, s4 Pr(s4)=0.45; merged nodes 0.3 (s1+s3) and 0.55 (0.3+s2)]
19
Example - step 5
[Figure: completed Huffman tree. Symbols s1 Pr(s1)=0.1, s2 Pr(s2)=0.25,
s3 Pr(s3)=0.2, s4 Pr(s4)=0.45; merged nodes 0.3 (s1+s3), 0.55 (0.3+s2),
root 1 (0.55+s4)]
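Whichever way the 0/1 labels are assigned, this merge order fixes the word lengths at s4: 1, s2: 2, s1: 3, s3: 3, so the average word length is AWL = 0.45×1 + 0.25×2 + 0.1×3 + 0.2×3 = 1.85 bits/symbol, against a source entropy of about 1.81 bits/symbol (efficiency roughly 98%).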
20
Algorithm
  • Lay out all symbols in a line, one node per
    symbol
  • Merge the two least probable symbols into a
    single node
  • Add their probabilities and assign this to the
    merged node
  • Repeat until only one node remains
  • Assign binary code from last node, assigning 0
    for the lower probability link at each step

21
Comments
  • we can choose different ordering of 0 or 1 at
    each node
  • 2^m different codes (m = number of merging nodes,
    i.e., not symbol nodes)
  • 2^3 = 8 in previous example
  • But, AWL is the same for all codes
  • hence source entropy and efficiency are the same
  • What if n (number of symbols in code alphabet) is
    larger than 2?
  • Condition (2) says we can group from 2 to n
    symbols
  • Condition (3) effectively says we should use
    groups as large as possible and finish with a
    single composite symbol

22
Disadvantages of Huffman Code
  • we have assumed that probabilities of our source
    symbols are known and fixed
  • symbol frequencies may vary with context (e.g. a
    Markov source)
  • up to 1 extra bit per symbol is needed
  • could be serious if H(A) ≈ 1 bit!
  • e.g. English entropy is approx 1 bit per
    character
  • beyond symbol codes - arithmetic coding
  • move away from the idea that one symbol → an
    integer number of bits
  • e.g. Lempel-Ziv coding
  • not covered in this course

23
Another question
  • consider a message (sequence of characters) from
    a, b, c, d encoded using the code shown
  • what is the probability that a randomly chosen
    bit from the encoded message is 1?
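The code table itself is not reproduced in this transcript; as an illustration of the method only, with an assumed code and assumed symbol probabilities, the answer is the expected number of 1s per codeword divided by the expected codeword length:

```python
# Assumed code table and probabilities, for illustration only
code  = {"a": "0", "b": "10", "c": "110", "d": "111"}
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

expected_ones   = sum(p * code[s].count("1") for s, p in probs.items())
expected_length = sum(p * len(code[s]) for s, p in probs.items())

print(expected_ones / expected_length)   # P(randomly chosen bit is 1) = 0.5 here
```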

24
Shannon-Fano theorem
  • Channel capacity
  • Entropy (bits/sec) of encoder determined by
    entropy of source (bits/sym)
  • If we increase the rate at which the source
    generates information (bits/sym), eventually we
    will reach the limit of the encoder (bits/sec).
    At this point the encoder's entropy will have
    reached a limit
  • This is the channel capacity
  • S-F theorem
  • Source has entropy H bits/symbol
  • Channel has capacity C bits/sec
  • Possible to encode the source so that its symbols
    can be transmitted at up to C/H symbols per
    second, but no faster
  • (general proof in notes)
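As an illustrative worked example (numbers assumed, not from the slides): a source with H = 2 bits/symbol sent over a channel of capacity C = 1000 bits/sec can be encoded so that symbols are delivered at up to C/H = 500 symbols per second, but no faster.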

25
[Figure: worked answer, not fully reproduced; annotations read "satisfies
Kraft" and "average word length"]