Huffman codes - PowerPoint PPT Presentation

About This Presentation
Title:

Huffman codes


Slides: 19
Provided by: scie241

Transcript and Presenter's Notes



1
Huffman codes
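  • Binary character code: each character is
    represented by a unique binary string.
  • A data file of 100 characters (a: 45, b: 13,
    c: 12, d: 16, e: 9, f: 5 occurrences) can be
    coded in two ways: with a fixed-length code of
    3 bits per character, or with the variable-length
    code a = 0, b = 101, c = 100, d = 111, e = 1101,
    f = 1100.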

The first way needs $100 \times 3 = 300$ bits. The
second way needs
$45 \times 1 + 13 \times 3 + 12 \times 3 + 16 \times 3 + 9 \times 4 + 5 \times 4 = 224$
bits.
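A quick Python check of the two totals. This is a sketch under stated assumptions: the counts are the example frequencies (a: 45, b: 13, c: 12, d: 16, e: 9, f: 5), and the variable-length codewords for d (111) and f (1100) are inferred from the codeword lengths in the sum and the prefix property rather than quoted from the missing table. The names `FREQ`, `VARIABLE_CODE`, and the two helper functions are illustrative.

```python
# Assumed character counts for the 100-character example file.
FREQ = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}

# Assumed variable-length (prefix) code; d and f are inferred, not quoted.
VARIABLE_CODE = {"a": "0", "b": "101", "c": "100",
                 "d": "111", "e": "1101", "f": "1100"}


def fixed_length_bits(freq, bits_per_char=3):
    """Every character costs the same number of bits."""
    return sum(freq.values()) * bits_per_char


def variable_length_bits(freq, code):
    """Each occurrence of a character costs the length of its codeword."""
    return sum(count * len(code[ch]) for ch, count in freq.items())


print(fixed_length_bits(FREQ))                    # 300
print(variable_length_bits(FREQ, VARIABLE_CODE))  # 224
```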
2
Variable-length code
  • Reading (decoding) a variable-length code needs
    some care.
  • Example: 001011101 with codewords a = 0, b = 00,
    c = 01, d = 11.
  • Where to cut? 00 can be read either as aa
    or as b.
  • The prefixes of 0011 are 0, 00, 001, and 0011.
  • Prefix codes: no codeword is a prefix of any
    other codeword (prefix-free).
  • Prefix codes are simple to encode and decode.
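A minimal Python sketch of the prefix-free test. `prefixes` and `is_prefix_free` are hypothetical helper names, and the second code set assumes the table's codewords with d = 111 and f = 1100 inferred.

```python
def prefixes(word):
    """All nonempty prefixes of a word, e.g. 0011 -> 0, 00, 001, 0011."""
    return [word[:k] for k in range(1, len(word) + 1)]


def is_prefix_free(codewords):
    """True if no codeword is a prefix of another codeword."""
    words = sorted(codewords)
    # In lexicographic order, a word that is a prefix of some other word is
    # also a prefix of its immediate successor, so adjacent pairs suffice.
    return all(not words[i + 1].startswith(words[i])
               for i in range(len(words) - 1))


print(prefixes("0011"))                                            # ['0', '00', '001', '0011']
print(is_prefix_free(["0", "00", "01", "11"]))                     # False: 0 is a prefix of 00
print(is_prefix_free(["0", "101", "100", "111", "1101", "1100"]))  # True
```

Sorting first keeps the check at O(k log k) comparisons for k codewords instead of comparing every pair.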

3
Using the codewords in the table to encode and decode
  • Encode abc: 0.101.100 = 0101100
  • (Just concatenate the codewords.)
  • Decode 001011101: 0.0.101.1101 = aabe
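Because no codeword is a prefix of another, a decoder can read bits left to right and emit a character as soon as the bits read so far form a complete codeword. A small Python sketch of both directions, assuming the table's codewords (d = 111 and f = 1100 are inferred); `encode` and `decode` are illustrative names.

```python
CODE = {"a": "0", "b": "101", "c": "100",
        "d": "111", "e": "1101", "f": "1100"}  # d and f assumed


def encode(text, code):
    """Concatenate the codewords of the characters."""
    return "".join(code[ch] for ch in text)


def decode(bits, code):
    """Greedy left-to-right decoding; unambiguous because the code is prefix-free."""
    reverse = {word: ch for ch, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:          # a complete codeword has been read
            out.append(reverse[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a codeword")
    return "".join(out)


print(encode("abc", CODE))        # 0101100
print(decode("001011101", CODE))  # aabe
```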

4
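  • To decode, use the (right) binary tree below:
    start at the root, go to the left child on 0 and
    to the right child on 1, and output a character
    whenever a leaf is reached.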

[Figure: left, tree for the fixed-length code; right, tree for the variable-length code.]
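The same decoding, done by walking the code tree. In this sketch (an assumed representation, not the slides' notation) an internal node is a Python dict with children under the keys "0" and "1", and a leaf is the character itself; `build_tree` and `decode_with_tree` are hypothetical helpers, and d = 111, f = 1100 are again inferred.

```python
CODE = {"a": "0", "b": "101", "c": "100",
        "d": "111", "e": "1101", "f": "1100"}  # d and f assumed


def build_tree(code):
    """Build the tree of a prefix code (assumes the code is prefix-free)."""
    root = {}
    for ch, word in code.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})  # create internal nodes as needed
        node[word[-1]] = ch                  # the last bit leads to the leaf
    return root


def decode_with_tree(bits, root):
    out, node = [], root
    for bit in bits:
        node = node[bit]             # 0 = left child, 1 = right child
        if isinstance(node, str):    # reached a leaf: emit its character
            out.append(node)
            node = root              # restart at the root for the next codeword
    if node is not root:
        raise ValueError("bit string ends in the middle of a codeword")
    return "".join(out)


tree = build_tree(CODE)
print(decode_with_tree("001011101", tree))  # aabe
```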
5
Binary tree
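  • An optimal code is always represented by a full
    binary tree: every nonleaf node has two children.
  • The fixed-length code in our example is not
    optimal: its tree has a nonleaf node with only
    one child.
  • The total number of bits required to encode a
    file with tree T is
    $B(T) = \sum_{c \in C} f(c)\, d_T(c)$, where
  • $f(c)$ is the frequency (number of occurrences)
    of c in the file, and
  • $d_T(c)$ is the depth of c's leaf in the tree
    (equal to the length of c's codeword).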
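A sketch that computes $B(T)$ directly from leaf depths and checks whether a code tree is full. It uses a nested-dict tree (an assumed representation: internal nodes are dicts keyed by "0"/"1", leaves are characters). The fixed-length assignment a = 000, b = 001, c = 010, d = 011, e = 100, f = 101 is assumed, since the slides give only its length, and d = 111, f = 1100 in the variable-length code are inferred.

```python
FREQ = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
FIXED_CODE = {"a": "000", "b": "001", "c": "010",
              "d": "011", "e": "100", "f": "101"}      # assumed assignment
VARIABLE_CODE = {"a": "0", "b": "101", "c": "100",
                 "d": "111", "e": "1101", "f": "1100"}  # d and f assumed


def build_tree(code):
    root = {}
    for ch, word in code.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = ch
    return root


def leaf_depths(node, depth=0):
    """Yield (character, depth) for every leaf below node."""
    if isinstance(node, str):
        yield node, depth
        return
    for child in node.values():
        yield from leaf_depths(child, depth + 1)


def is_full(node):
    """True if every nonleaf node has exactly two children."""
    if isinstance(node, str):
        return True
    return len(node) == 2 and all(is_full(child) for child in node.values())


def cost(tree, freq):
    """B(T) = sum over characters c of f(c) * d_T(c)."""
    return sum(freq[ch] * depth for ch, depth in leaf_depths(tree))


fixed, variable = build_tree(FIXED_CODE), build_tree(VARIABLE_CODE)
print(cost(fixed, FREQ), is_full(fixed))        # 300 False
print(cost(variable, FREQ), is_full(variable))  # 224 True
```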

6
Constructing an optimal code
  • Formal definition of the problem
  • Input: a set of characters
    $C = \{c_1, c_2, \ldots, c_n\}$, where each
    $c \in C$ has frequency $f(c)$.
  • Output: a binary tree representing the codewords
    such that the total number of bits required for
    the file is minimized.
  • Huffman proposed a greedy algorithm to solve the
    problem.

7
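[Figure: Huffman's algorithm on the frequencies a: 45, b: 13, c: 12, d: 16, e: 9, f: 5, steps (a) and (b).]
8
[Figure: steps (c) and (d).]
9
[Figure: steps (e) and (f).]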
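The merge order in these figures can be traced with Python's `heapq` module standing in for the priority queue. This sketch tracks frequencies only (an assumption that keeps it short; it does not build the tree), and `huffman_merges` is an illustrative name.

```python
import heapq


def huffman_merges(freq):
    """Return (x, y, x + y) for each merge, in the order the greedy algorithm performs them."""
    heap = list(freq.values())
    heapq.heapify(heap)
    merges = []
    while len(heap) > 1:
        x = heapq.heappop(heap)       # smallest remaining frequency
        y = heapq.heappop(heap)       # second smallest
        merges.append((x, y, x + y))
        heapq.heappush(heap, x + y)   # the merged node goes back into the queue
    return merges


for step in huffman_merges({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}):
    print(step)
# (5, 9, 14)
# (12, 13, 25)
# (14, 16, 30)
# (25, 30, 55)
# (45, 55, 100)
```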
10
HUFFMAN(C)
1  n ← |C|
2  Q ← C
3  for i ← 1 to n - 1
4      do z ← ALLOCATE_NODE()
5         x ← left[z] ← EXTRACT_MIN(Q)
6         y ← right[z] ← EXTRACT_MIN(Q)
7         f[z] ← f[x] + f[y]
8         INSERT(Q, z)
9  return EXTRACT_MIN(Q)
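A runnable Python translation of HUFFMAN(C), offered as a sketch rather than a reference implementation. Assumptions: `heapq` plays the role of Q, a counter breaks ties between equal frequencies so that nodes are never compared directly, the first extracted node becomes the left child (bit 0), and `Node` and `codewords` are hypothetical names. Comments point back to the pseudocode lines.

```python
import heapq
from dataclasses import dataclass
from itertools import count
from typing import Dict, Optional


@dataclass
class Node:
    freq: int
    char: Optional[str] = None      # set only for leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def huffman(freq: Dict[str, int]) -> Node:
    """HUFFMAN(C) with heapq as the priority queue Q; assumes at least one character."""
    tie = count()  # tie-breaker so heapq never has to compare two Node objects
    q = [(f, next(tie), Node(f, ch)) for ch, f in freq.items()]  # line 2: Q <- C
    heapq.heapify(q)
    for _ in range(len(freq) - 1):                               # lines 1 and 3
        x = heapq.heappop(q)[2]                                  # line 5
        y = heapq.heappop(q)[2]                                  # line 6
        z = Node(x.freq + y.freq, left=x, right=y)               # lines 4 and 7
        heapq.heappush(q, (z.freq, next(tie), z))                # line 8
    return heapq.heappop(q)[2]                                   # line 9


def codewords(node: Node, prefix: str = "") -> Dict[str, str]:
    """Read the codewords off the tree: left edge = 0, right edge = 1."""
    if node.char is not None:
        return {node.char: prefix or "0"}  # a one-character alphabet still gets 1 bit
    codes = codewords(node.left, prefix + "0")
    codes.update(codewords(node.right, prefix + "1"))
    return codes


FREQ = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
codes = codewords(huffman(FREQ))
print(codes)
print(sum(FREQ[ch] * len(word) for ch, word in codes.items()))  # 224
```

On the example frequencies this yields a = 0, b = 101, c = 100, d = 111, e = 1101, f = 1100, for 224 bits in total; when frequencies tie, other trees with the same total cost can come out.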
11
The Huffman Algorithm
  • This algorithm builds the tree T corresponding to
    the optimal code in a bottom-up manner.
  • C is a set of n characters, and each character
    $c \in C$ has a defined frequency $f(c)$.
  • Q is a min-priority queue, keyed on f, used to
    identify the two least-frequent objects to
    merge together.
  • The result of the merger is a new object
    (internal node) whose frequency is the sum of
    the frequencies of the two merged objects.

12
Time complexity
  • Lines 4-8 are executed n-1 times.
  • Each heap operation in Lines 4-8 takes O(lg n)
    time.
  • Building the heap in line 2 takes O(n) time, so
    the total time required is O(n lg n).
  • Note: the details of heap operations will not be
    tested, but the time complexity O(n lg n) should
    be remembered.
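The count written out as a sum, assuming Q is a binary min-heap built in O(n) time on line 2 (as with a standard build-heap procedure) and that ALLOCATE_NODE takes constant time:

```latex
\begin{aligned}
T(n) &= \underbrace{O(n)}_{\text{line 2: build the heap}}
      + \underbrace{(n-1)}_{\text{iterations of lines 4--8}}
        \cdot \underbrace{O(\lg n)}_{\text{two EXTRACT\_MIN, one INSERT}}
      + \underbrace{O(\lg n)}_{\text{line 9}} \\
     &= O(n \lg n).
\end{aligned}
```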

13
Another example
[Figure: Huffman's algorithm on the frequencies e: 4, a: 6, c: 6, b: 9, d: 11.]
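A Python sketch for this second example. It relies on a standard identity: each leaf's frequency is counted once for every internal node above it, so $B(T)$ equals the sum of the frequencies of all internal nodes, that is, the sum of the merged values. `huffman_cost` is an illustrative name.

```python
import heapq


def huffman_cost(freq):
    """Return (B(T), merged values) for a Huffman tree on the given frequencies."""
    heap = list(freq.values())
    heapq.heapify(heap)
    total, merges = 0, []
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        merges.append(merged)
        total += merged               # each internal node adds its frequency once
        heapq.heappush(heap, merged)
    return total, merges


print(huffman_cost({"a": 6, "b": 9, "c": 6, "d": 11, "e": 4}))  # (82, [10, 15, 21, 36])
```

The merge sequence 10, 15, 21, 36 does not depend on how the tie between a and c (both 6) is broken, and neither does the total of 82 bits.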
14
15
(No Transcript)
16
Correctness of Huffman's Greedy Algorithm
(Fun Part, not required)
  • Again, we use our general strategy.
  • Let x and y be the two characters in C having
    the lowest frequencies (the first two characters
    selected by the greedy algorithm).
  • We will show two properties:
  • Property 1: There exists an optimal solution
    $T_{opt}$ (a binary tree representing the
    codewords) in which x and y are siblings.
  • Property 2: Let z be a new character with
    frequency $f(z) = f(x) + f(y)$, and let
    $C' = (C - \{x, y\}) \cup \{z\}$. Let $T'$ be an
    optimal tree for $C'$. Then we can get an optimal
    tree $T_{opt}$ for C from $T'$ by replacing the
    leaf z with an internal node whose two children
    are the leaves x and y.
17
Proof of Property 1
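[Figure: $T_{opt}$ and $T_{new}$, the tree after the exchange.]
  • Let b and c be two sibling leaves of maximum
    depth in $T_{opt}$.
  • Exchange x with b and y with c to obtain
    $T_{new}$.
  • $B(T_{opt}) - B(T_{new}) \ge 0$, since $f(x)$ and
    $f(y)$ are the smallest frequencies and b and c
    are the deepest leaves.
  • Hence $B(T_{new}) \le B(T_{opt})$, so $T_{new}$ is
    also optimal, and x and y are siblings in it.
  • Property 1 is proved.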
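The inequality in the exchange step, written out. This assumes x, y, b, c are four distinct leaves and writes $d$ for $d_{T_{opt}}$; the exchange changes only the depths of these four leaves (x moves to depth $d(b)$, b to $d(x)$, y to $d(c)$, c to $d(y)$).

```latex
\begin{aligned}
B(T_{opt}) - B(T_{new})
  &= f(x)d(x) + f(b)d(b) - f(x)d(b) - f(b)d(x) \\
  &\quad + f(y)d(y) + f(c)d(c) - f(y)d(c) - f(c)d(y) \\
  &= \bigl(f(b) - f(x)\bigr)\bigl(d(b) - d(x)\bigr)
   + \bigl(f(c) - f(y)\bigr)\bigl(d(c) - d(y)\bigr) \\
  &\ge 0.
\end{aligned}
```

Both products are nonnegative: $f(b) \ge f(x)$ and $f(c) \ge f(y)$ because x and y have the two lowest frequencies, and $d(b) \ge d(x)$, $d(c) \ge d(y)$ because b and c are at maximum depth.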

18
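  • Property 2: Let z be a new character with
    frequency $f(z) = f(x) + f(y)$, and let
    $C' = (C - \{x, y\}) \cup \{z\}$. Let $T'$ be an
    optimal tree for $C'$. Then we can get an optimal
    tree $T_{opt}$ for C from $T'$ by replacing the
    leaf z with an internal node whose two children
    are x and y.
  • Proof: Let T be the tree obtained from $T'$ by
    replacing z with these three nodes.
  • $B(T) = B(T') + f(x) + f(y)$. (1)
  • (The codewords for x and y are 1 bit longer than
    the codeword for z, and $f(z) = f(x) + f(y)$.)
  • Now we prove by contradiction that T is optimal
    for C.
  • If T is not optimal, then $B(T) > B(T_{opt})$. (2)
  • By Property 1, we may choose $T_{opt}$ so that x
    and y are siblings in it.
  • Thus, we can replace x, y, and their parent in
    $T_{opt}$ by the single leaf z and get another
    tree $T''$ for $C'$.
  • $B(T'') = B(T_{opt}) - f(x) - f(y) < B(T) - f(x) - f(y) = B(T')$,
    where the inequality uses (2) and the last
    equality uses (1).
  • Thus, $B(T'') < B(T')$, a contradiction to the
    assumption that $T'$ is optimal for $C'$.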
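The correctness argument can also be sanity-checked numerically on the slides' two examples. This Python sketch compares the Huffman cost with an exhaustive search over every sequence of pairwise merges, which covers every full binary tree on the given leaves; both use the identity that $B(T)$ equals the sum of the merged values. The search is exponential, so it is only meant for tiny inputs, and both function names are illustrative.

```python
import heapq
from functools import lru_cache


def huffman_cost(freqs):
    """B(T) of the greedy (Huffman) tree, as the sum of merged values."""
    heap = list(freqs)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total


def optimal_cost(freqs):
    """Minimum B(T) over all full binary trees, by trying every merge order."""
    @lru_cache(maxsize=None)
    def best(weights):                # weights: sorted tuple of current subtree weights
        if len(weights) <= 1:
            return 0
        result = None
        for i in range(len(weights)):
            for j in range(i + 1, len(weights)):
                merged = weights[i] + weights[j]
                rest = weights[:i] + weights[i + 1:j] + weights[j + 1:] + (merged,)
                cost = merged + best(tuple(sorted(rest)))
                if result is None or cost < result:
                    result = cost
        return result

    return best(tuple(sorted(freqs)))


for freqs in ([45, 13, 12, 16, 9, 5], [4, 6, 6, 9, 11]):
    print(huffman_cost(freqs), optimal_cost(freqs))
# 224 224
# 82 82
```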