1
Huffman coding
2
Optimal codes - I
  • A code is optimal if it has the shortest expected codeword
    length L = Σ_i p_i l_i
  • This can be seen as an optimization problem: minimize L over
    the codelengths l_i subject to the Kraft inequality

3
Optimal codes - II
  • Let's make two simplifying assumptions
  • no integer constraint on the codelengths
  • the Kraft inequality holds with equality: Σ_i D^{-l_i} = 1
  • this turns the minimization into a Lagrange-multiplier problem

4
Optimal codes - III
  • Solving the Lagrange-multiplier problem and substituting into
    the Kraft inequality gives D^{-l_i} = p_i
  • that is, the optimal codelengths are l_i* = -log_D p_i
  • Note that the corresponding expected length is
    L* = Σ_i p_i l_i* = -Σ_i p_i log_D p_i = H_D(X),
    the entropy, when we use base D for logarithms (the derivation
    is worked out below)
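
A worked version of the Lagrange-multiplier derivation referred to
above (standard textbook steps; the notation is assumed, since the
original slide formulas did not survive):

    minimize    L = Σ_i p_i l_i
    subject to  Σ_i D^{-l_i} = 1

    J = Σ_i p_i l_i + λ (Σ_i D^{-l_i} - 1)
    ∂J/∂l_i = p_i - λ ln(D) D^{-l_i} = 0   ⇒   D^{-l_i} = p_i / (λ ln D)
    Σ_i D^{-l_i} = 1   ⇒   λ ln D = 1   ⇒   D^{-l_i} = p_i
    l_i* = -log_D p_i ,   L* = Σ_i p_i l_i* = -Σ_i p_i log_D p_i = H_D(X)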
5
Optimal codes - IV
  • In practice the codeword lengths must be integer valued, so
    the result obtained is only a lower bound
  • Theorem
  • The expected length L of any instantaneous D-ary code for a
    r.v. X satisfies L ≥ H_D(X)
  • this fundamental result derives from the work of Shannon

6
Optimal codes - V
  • What about the upper bound?
  • Theorem
  • Given a source alphabet (i.e. a r.v.) of entropy H(X), it is
    possible to find an instantaneous binary code whose length
    satisfies H(X) ≤ L < H(X) + 1
  • A similar theorem can be stated if we use the wrong
    probabilities q instead of the true ones p: the only
    difference is a term which accounts for the relative entropy
    D(p||q)

7
The redundance
  • It is defined as the average codeword length minus the
    entropy: R = L - H(X)
  • Note that 0 ≤ R < 1
  • (why?)

8
Compression ratio
  • It is the ratio between the average number of bits/symbol in
    the original message and the same quantity for the coded
    message, i.e.
    C = (average bits/symbol, original) / (average bits/symbol, coded)

9
Uniquely decodable codes
  • The set of instantaneous codes is a small subset of the
    uniquely decodable codes.
  • Is it possible to obtain a lower average code length L using
    a uniquely decodable code that is not instantaneous? NO
  • So we use instantaneous codes, which are easier to decode

10
Summary
  • Average codeword length: L ≥ H_D(X) for uniquely decodable
    codes (and hence for instantaneous codes)
  • In practice, for each r.v. with entropy H(X) we can build a
    code with average codeword length that satisfies
    H(X) ≤ L < H(X) + 1

11
Shannon-Fano coding
  • The main advantage of the Shannon-Fano technique is its
    simplicity
  • Source symbols are listed in order of nonincreasing
    probability
  • The list is divided in such a way as to form two groups of as
    nearly equal probabilities as possible
  • Each symbol in the first group receives a 0 as the first
    digit of its codeword, while the others receive a 1
  • Each of these groups is then divided according to the same
    criterion and additional code digits are appended
  • The process is continued until each group contains only one
    message (a sketch of the procedure is given below)
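
A minimal Python sketch of this procedure (function and variable
names are illustrative, not taken from the slides; the split point
is chosen greedily on the sorted list):

    def shannon_fano(symbols):
        """symbols: list of (symbol, probability) pairs, any order.
        Returns a dict mapping each symbol to its codeword string."""
        # 1. list the symbols in order of non-increasing probability
        items = sorted(symbols, key=lambda sp: sp[1], reverse=True)
        codes = {sym: "" for sym, _ in items}

        def split(group):
            if len(group) <= 1:          # stop when the group has one message
                return
            total = sum(p for _, p in group)
            # 2. divide the list into two groups of as nearly
            #    equal probabilities as possible
            running, best_i, best_diff = 0.0, 1, float("inf")
            for i in range(1, len(group)):
                running += group[i - 1][1]
                diff = abs(2 * running - total)
                if diff < best_diff:
                    best_diff, best_i = diff, i
            first, second = group[:best_i], group[best_i:]
            # 3. append 0 for the first group, 1 for the second
            for sym, _ in first:
                codes[sym] += "0"
            for sym, _ in second:
                codes[sym] += "1"
            # 4. divide each group again with the same criterion
            split(first)
            split(second)

        split(items)
        return codes

For instance, on a dyadic source such as {0.5, 0.25, 0.125, 0.125}
this gives codeword lengths 1, 2, 3, 3, so L = H = 1.75 bits.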

12
example
  • H = 1.9375 bits
  • L = 1.9375 bits

13
Shannon-Fano coding - exercise
  • Encode, using the Shannon-Fano algorithm

14
Is Shannon-Fano coding optimal?
  • H = 2.2328 bits
  • L = 2.31 bits

L1 = 2.3 bits
15
Huffman coding - I
  • There is another algorithm whose performance is slightly
    better than Shannon-Fano: the famous Huffman coding
  • It works by constructing a tree bottom-up, with the symbols
    at the leaves
  • The two leaves with the smallest probabilities become
    siblings under a parent node whose probability is the sum of
    the two children's probabilities

16
Huffman coding - II
  • At this point the operation is repeated, considering also the
    new parent node and ignoring its children
  • The process continues until there is only one parent node
    with probability 1, which is the root of the tree
  • Then the two branches of every non-leaf node are labeled 0
    and 1 (typically 0 on the left branch, but the order is not
    important); a sketch of the construction is given below
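
A minimal Python sketch of this bottom-up construction using a
binary heap (names are illustrative; ties between equal
probabilities are broken arbitrarily, as the later notes allow):

    import heapq
    import itertools

    def huffman_code(probabilities):
        """probabilities: dict mapping symbol -> probability.
        Returns a dict mapping each symbol to its codeword string."""
        tie = itertools.count()          # breaks ties between equal probabilities
        # heap entries: (probability, tie counter, tree); a tree is either
        # a symbol (leaf) or a (left, right) pair of subtrees
        heap = [(p, next(tie), sym) for sym, p in probabilities.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            # the two smallest-probability nodes become siblings under a
            # parent whose probability is the sum of the children's
            p1, _, t1 = heapq.heappop(heap)
            p2, _, t2 = heapq.heappop(heap)
            heapq.heappush(heap, (p1 + p2, next(tie), (t1, t2)))
        _, _, root = heap[0]

        codes = {}
        def walk(tree, prefix):
            if isinstance(tree, tuple):  # non-leaf node: label branches 0 and 1
                walk(tree[0], prefix + "0")
                walk(tree[1], prefix + "1")
            else:                        # leaf: the accumulated labels form the codeword
                codes[tree] = prefix or "0"
        walk(root, "")
        return codes

For the source of the example that follows (a 0.05, b 0.05, c 0.1,
d 0.2, e 0.3, f 0.2, g 0.1), any valid tie-breaking gives an
average codeword length of 2.6 bits.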

17
Huffman coding - example
[Figure: Huffman tree for the source a 0.05, b 0.05, c 0.1, d 0.2,
e 0.3, f 0.2, g 0.1; repeatedly merging the two smallest
probabilities creates internal nodes 0.1, 0.2, 0.3, 0.4, 0.6 and
finally the root 1.0, with each pair of branches labeled 0 and 1.]
18
Huffman coding - example
  • Exercise: evaluate H(X) and L(X) (a numerical check follows)
  • H(X) = 2.5464 bits
  • L(X) = 2.6 bits !!
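
These values can be checked with a short Python snippet, using the
codeword lengths of the code employed in the exercises that follow
(a=0000, b=0001, c=001, d=01, e=10, f=110, g=111):

    from math import log2

    p = {"a": 0.05, "b": 0.05, "c": 0.1, "d": 0.2, "e": 0.3, "f": 0.2, "g": 0.1}
    l = {"a": 4, "b": 4, "c": 3, "d": 2, "e": 2, "f": 3, "g": 3}

    H = -sum(pi * log2(pi) for pi in p.values())
    L = sum(p[s] * l[s] for s in p)
    print(f"H(X) = {H:.4f} bits")   # 2.5464
    print(f"L(X) = {L:.4f} bits")   # 2.6000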

19
Huffman coding - exercise
  • Code the sequence aeebcddegfced and calculate the compression
    ratio
  • Sol. 0000 10 10 0001 001 01 01
    10 111 110 001 10 01
  • Aver. orig. symb. length: 3 bits
  • Aver. compr. symb. length: 34/13 ≈ 2.62 bits
  • C = 3 / (34/13) = 39/34 ≈ 1.15

20
Huffman coding - exercise
  • Decode the sequence
  • 0111001001000001111110
  • Sol. dfdcadgf

21
Huffman coding - exercise
  • Encode with Huffman the sequence
  • 01cc0a02ba10
  • and evaluate entropy, average codeword length
    and compression ratio

22
Huffman coding - exercise
  • Decode (if possible) the Huffman coded bit stream
  • 01001011010011110101...

23
Huffman coding - notes
  • In Huffman coding, if at any time there is more than one way
    to choose the two smallest probabilities, any such pair may
    be chosen
  • Sometimes the list of probabilities is initialized to be
    non-increasing and reordered after each node creation. This
    detail doesn't affect the correctness of the algorithm, but
    it provides a more efficient implementation

24
Huffman coding - notes
  • There are cases in which Huffman coding does not uniquely
    determine the codeword lengths, due to the arbitrary choice
    among equal minimum probabilities.
  • For example, for a suitable source it is possible to obtain
    two different sets of codeword lengths
  • It would be better to have the code whose codelengths have
    the minimum variance, as this solution needs the minimum
    buffer space in the transmitter and in the receiver

25
Huffman coding - notes
  • Schwarz defines a variant of the Huffman algorithm that
    allows one to build the code with minimum variance.
  • There are several other variants; we will describe the most
    important ones shortly.

26
Optimality of Huffman coding - I
  • It is possible to prove that, in the case of character coding
    (one symbol, one codeword), Huffman coding is optimal
  • In other terms, the Huffman code has minimum redundancy
  • An upper bound for the redundancy has been found, expressed
    in terms of the probability of the most likely symbol

27
Optimality of Huffman coding - II
  • Why does the Huffman code suffer when there is one symbol
    with very high probability?
  • Remember the notion of uncertainty...
  • The main problem is given by the integer constraint on the
    codelengths!!
  • This consideration opens the way to a more powerful coding...
    we will see it later

28
Huffman coding - implementation
  • A Huffman code can be generated in O(n) time, where n is the
    number of source symbols, provided that the probabilities
    have been presorted (however, this sort costs O(n log n)...)
  • Nevertheless, encoding is very fast

29
Huffman coding - implementation
  • However, the spatial and temporal complexity of the decoding
    phase is far more important because, on average, decoding
    happens much more frequently than encoding.
  • Consider a Huffman tree with n symbols:
  • it has n leaves and n-1 internal nodes
  • each leaf stores a pointer to its symbol and the information
    that it is a leaf; each internal node stores two pointers
30
Huffman coding - implementation
  • 1 million symbols means about 2 million nodes, i.e. roughly
    16 MB of memory (e.g. two 4-byte pointers per node)!
  • Moreover, traversing a tree from root to leaf involves
    following a lot of pointers, with little locality of
    reference. This causes several page faults or cache misses.
  • To solve this problem a variant of Huffman coding has been
    proposed: canonical Huffman coding

31
canonical Huffman coding - I
[Figure: Huffman tree for the source a 0.11, b 0.12, c 0.13,
d 0.14, e 0.24, f 0.26, with internal nodes 0.23, 0.27, 0.47, 0.53
and root 1.0; each branch shows its 0/1 label together with an
alternative labeling in parentheses.]
32
canonical Huffman coding - II
  • This code cannot be obtained through a Huffman tree!
  • We still call it a Huffman code because it is instantaneous
    and the codeword lengths are the same as those of a valid
    Huffman code
  • numerical sequence property:
  • codewords with the same length are ordered lexicographically
  • when the codewords are sorted in lexical order they are also
    in order from the longest to the shortest codeword (see the
    example below)
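
As an illustration (using the assignment convention of the slides
that follow, applied to the source of the previous figure, whose
Huffman codelengths are 2 for e and f and 3 for a, b, c, d), one
canonical code is:

    a = 000, b = 001, c = 010, d = 011, e = 10, f = 11

Codewords of equal length are consecutive binary integers, and
sorting all six lexically (000, 001, 010, 011, 10, 11) also orders
them from longest to shortest; note that e and f are siblings
here, which no Huffman tree for this source would produce.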

33
canonical Huffman coding - III
  • The main advantage is that it is not necessary to store a
    tree in order to decode
  • We need:
  • a list of the symbols, ordered according to the lexical order
    of the codewords
  • an array with the first codeword of each distinct length

34
canonical Huffman coding - IV
  • Encoding. Suppose there are n distinct symbols and that for
    each symbol i we have calculated the Huffman codelength l_i.
    The code is then built using the arrays below (a Python
    sketch follows them)

numl[k]: number of codewords with length k
firstcode[k]: integer for the first code of length k
nextcode[k]: integer for the next codeword of length k to be assigned
symbol[-,-]: used for decoding
codeword[i]: the rightmost l_i bits of this integer are the code for symbol i
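
A minimal Python sketch of this assignment, following the array
names on the slide (the firstcode recurrence and the tie-breaking
order among symbols of equal codelength are assumptions of this
sketch):

    def canonical_assign(lengths):
        """lengths: dict mapping symbol -> Huffman codelength l_i.
        Returns (codes, firstcode, symbol_table)."""
        maxlen = max(lengths.values())
        # numl[k]: number of codewords with length k
        numl = [0] * (maxlen + 2)
        for length in lengths.values():
            numl[length] += 1
        # firstcode[k]: integer value of the first code of length k
        firstcode = [0] * (maxlen + 2)
        for k in range(maxlen - 1, 0, -1):
            firstcode[k] = (firstcode[k + 1] + numl[k + 1] + 1) // 2
        # nextcode[k]: next integer code of length k to be assigned
        nextcode = list(firstcode)
        # symbol_table[k][n]: symbol holding the n-th code of length k (for decoding)
        symbol_table = {k: {} for k in range(1, maxlen + 1)}
        codes = {}
        for sym in sorted(lengths):      # fixed order among equal lengths
            k = lengths[sym]
            value = nextcode[k]
            nextcode[k] += 1
            # codeword[i]: the rightmost k bits of this integer are the code
            codes[sym] = format(value, "b").zfill(k)
            symbol_table[k][value - firstcode[k]] = sym
        return codes, firstcode, symbol_table

With the codelengths of the example on the next slide (2 for a, e,
h; 3 for d; 5 for b, c, f, g) this yields a=01, e=10, h=11, d=001,
b=00000, c=00001, f=00010, g=00011, matching the symbol table
shown there.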
35
canonical Huffman - example
  • 1. Evaluate array numl
  • 2. Evaluate array firstcode
  • 3. Construct arrays codeword and symbol

symbol[k, n]:
          n = 0   n = 1   n = 2   n = 3
  k = 1     -       -       -       -
  k = 2     a       e       h       -
  k = 3     d       -       -       -
  k = 4     -       -       -       -
  k = 5     b       c       f       g
36
canonical Huffman coding - V
  • Decoding. We have the arrays firstcode and symbol (a Python
    sketch of the decoding loop is given below)

nextinputbit(): function that returns the next input bit
firstcode[k]: integer for the first code of length k
symbol[k, n]: returns the symbol number n with codelength k
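
A minimal Python sketch of the decoding loop (it assumes the
firstcode array and symbol table built by the encoding sketch
above, and a well-formed bit stream playing the role of
nextinputbit()):

    def canonical_decode(bits, firstcode, symbol_table):
        """bits: iterable of 0/1 integers. Yields decoded symbols."""
        it = iter(bits)
        for first_bit in it:             # start of a new codeword
            v, k = first_bit, 1
            # while the accumulated value is smaller than the first code
            # of the current length, the codeword is longer: read one more bit
            while v < firstcode[k]:
                v = 2 * v + next(it)
                k += 1
            yield symbol_table[k][v - firstcode[k]]

For the bit stream of the example on the next slide this produces
the symbols d, h, e, b, a, d.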
37
canonical Huffman - example
Input bit stream: 00111100000001001
Lookups performed while decoding:
symbol[3,0] = d
symbol[2,2] = h
symbol[2,1] = e
symbol[5,0] = b
symbol[2,0] = a
symbol[3,0] = d
  • Decoded: dhebad