Title: Huffman encoding
1. Huffman encoding
2. Fixed Length Codes
Represent data as a sequence of 0s and 1s.
Sequence: BACADAEAFABBAAAGAH
A fixed-length code:
A 000   B 001   C 010   D 011
E 100   F 101   G 110   H 111
Encoding of the sequence:
001000010000011000100000101000001001000000000110000111
The encoding is 18 × 3 = 54 bits long. Can we make
the encoding shorter?
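A minimal sketch of the fixed-length encoding, to make the bit count concrete. The names `FIXED_CODE` and `encode` are illustrative and not part of the slides.

```python
# Fixed-length code from the slide: every symbol gets exactly 3 bits.
FIXED_CODE = {
    "A": "000", "B": "001", "C": "010", "D": "011",
    "E": "100", "F": "101", "G": "110", "H": "111",
}

def encode(message, code):
    """Concatenate the codeword of each symbol in the message."""
    return "".join(code[symbol] for symbol in message)

bits = encode("BACADAEAFABBAAAGAH", FIXED_CODE)
print(bits)
print(len(bits))  # 18 symbols * 3 bits each = 54
```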
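3. Variable Length Code
Make use of symbol frequencies: give frequent symbols short codewords.
Weights: A 8, B 3, all others 1 (in the sequence above, A actually occurs 9 times).
A 0      B 100    C 1010   D 1011
E 1100   F 1101   G 1110   H 1111
Example: BACADAEAFABBAAAGAH is encoded as
100010100101101100011010100100000111001111
which is 42 bits instead of 54.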
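The same encoding step with the variable-length table can be sketched as follows. `VARIABLE_CODE` is an illustrative name; the table itself is the one from the slide.

```python
# Variable-length code from the slide: the frequent symbol A gets 1 bit.
VARIABLE_CODE = {
    "A": "0", "B": "100", "C": "1010", "D": "1011",
    "E": "1100", "F": "1101", "G": "1110", "H": "1111",
}

message = "BACADAEAFABBAAAGAH"
encoded = "".join(VARIABLE_CODE[symbol] for symbol in message)
print(encoded)
print(len(encoded))  # 9 A's * 1 + 3 B's * 3 + 6 others * 4 = 42
```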
But how do we decode?
4. Prefix code ↔ Binary tree
Prefix code: no codeword is a prefix of any other
codeword.
A 0      B 100    C 1010   D 1011
E 1100   F 1101   G 1110   H 1111
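One way to check the prefix property mechanically is sketched below. The function name `is_prefix_free` is hypothetical. It relies on the fact that, after lexicographic sorting, a codeword that is a prefix of any other codeword is also a prefix of its immediate successor.

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of another (or a duplicate)."""
    words = sorted(codewords)
    # Only adjacent pairs need checking once the list is sorted.
    return not any(b.startswith(a) for a, b in zip(words, words[1:]))

slide_code = ["0", "100", "1010", "1011", "1100", "1101", "1110", "1111"]
print(is_prefix_free(slide_code))     # True
print(is_prefix_free(["0", "01"]))    # False: "0" is a prefix of "01"
```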
5. Decoding Example
10001010
100 | 01010 → B
100 | 0 | 1010 → BA
100 | 0 | 1010 → BAC
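The decoding steps above can be sketched as a left-to-right scan: collect bits until they match a codeword, emit the symbol, and start over. This works only because the code is a prefix code. The names `decode` and `VARIABLE_CODE` are illustrative.

```python
VARIABLE_CODE = {
    "A": "0", "B": "100", "C": "1010", "D": "1011",
    "E": "1100", "F": "1101", "G": "1110", "H": "1111",
}

def decode(bits, code):
    """Decode a bit string produced with a prefix code."""
    inverse = {word: symbol for symbol, word in code.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:          # a complete codeword has been read
            decoded.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(decoded)

print(decode("10001010", VARIABLE_CODE))  # BAC
```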
6. Huffman Tree = Optimal Length Code
Optimal: no other prefix code has a smaller weighted
average codeword length.
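To make "weighted average length" concrete, here is a small sketch that compares the two codes from the earlier slides. It assumes the weights A 8, B 3, all others 1; `WEIGHTS` and `average_length` are illustrative names.

```python
# Assumed weights: A 8, B 3, every other symbol 1 (total 17).
WEIGHTS = {"A": 8, "B": 3, "C": 1, "D": 1, "E": 1, "F": 1, "G": 1, "H": 1}

VARIABLE_CODE = {
    "A": "0", "B": "100", "C": "1010", "D": "1011",
    "E": "1100", "F": "1101", "G": "1110", "H": "1111",
}
# 3-bit fixed-length code: A -> 000, B -> 001, ..., H -> 111.
FIXED_CODE = {s: format(i, "03b") for i, s in enumerate("ABCDEFGH")}

def average_length(code, weights):
    """Weighted average codeword length: sum(w * len) / sum(w)."""
    total = sum(weights.values())
    return sum(weights[s] * len(code[s]) for s in weights) / total

print(average_length(FIXED_CODE, WEIGHTS))     # 3.0 bits per symbol
print(average_length(VARIABLE_CODE, WEIGHTS))  # 41/17, about 2.41 bits per symbol
```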
7. Huffman's Algorithm
Build the tree bottom-up, so that the lowest-weight
leaves are farthest from the root.
Repeatedly:
- Find the two trees of lowest weight.
- Merge them to form a new tree whose weight is the sum of their weights.
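A deliberately naive sketch of the merging loop, tracking only tree weights and re-sorting after every merge. The function name `merge_steps` and the weights A 8, B 3, others 1 are assumptions for illustration.

```python
def merge_steps(weights):
    """Return (w1, w2, merged) for each merge, in the order performed."""
    trees = sorted(weights)
    steps = []
    while len(trees) > 1:
        a, b = trees[0], trees[1]             # the two lowest-weight trees
        steps.append((a, b, a + b))
        trees = sorted(trees[2:] + [a + b])   # put the merged tree back
    return steps

steps = merge_steps([8, 3, 1, 1, 1, 1, 1, 1])
for a, b, merged in steps:
    print(f"merge {a} + {b} -> {merged}")
# The merged weights add up to the total weighted code length.
print(sum(merged for _, _, merged in steps))  # 41
```

Every leaf contributes its weight once to each merged tree above it, so the sum of the merged weights equals the sum of weight × depth over all leaves.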
8. Construction of Huffman tree
9. Two questions
- Why does the algorithm produce the best tree?
- How do you implement it efficiently?
10. Huffman(C)
    n ← |C|
    Q ← C
    for i ← 1 to n-1 do
        new(z)
        left(z) ← x ← delete-min(Q)
        right(z) ← y ← delete-min(Q)
        f(z) ← f(x) + f(y)
        insert(z, Q)
    return delete-min(Q)
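One possible Python rendering of this pseudocode uses the standard `heapq` module as the priority queue Q, so each delete-min and insert costs O(log n) and the whole construction O(n log n). The function name `huffman_code`, the tuple representation of trees, and the tie-breaking counter are choices made for this sketch. It assumes at least two symbols and that symbols are not themselves tuples.

```python
import heapq
from itertools import count

def huffman_code(freq):
    """Build a Huffman code for a {symbol: weight} mapping."""
    tiebreak = count()  # unique counter so equal weights never compare trees
    # Heap entries: (weight, tiebreak, tree); a tree is a symbol or (left, right).
    heap = [(w, next(tiebreak), sym) for sym, w in freq.items()]
    heapq.heapify(heap)                      # Q <- C
    for _step in range(len(freq) - 1):       # for i <- 1 to n-1
        fx, _, x = heapq.heappop(heap)       # x <- delete-min(Q)
        fy, _, y = heapq.heappop(heap)       # y <- delete-min(Q)
        heapq.heappush(heap, (fx + fy, next(tiebreak), (x, y)))  # insert(z, Q)
    root = heap[0][2]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: 0 = left, 1 = right
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf: record its codeword
            codes[node] = prefix
    walk(root, "")
    return codes

freq = {"A": 8, "B": 3, "C": 1, "D": 1, "E": 1, "F": 1, "G": 1, "H": 1}
codes = huffman_code(freq)
print(codes)
print(sum(freq[s] * len(codes[s]) for s in freq))  # total weighted length: 41
```

The exact codewords depend on how ties are broken and may differ from the table on the slides, but the total weighted length is the same for every Huffman tree built from these weights.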
11. Correctness
- Let x and y be the characters with the lowest
frequencies. We prove that there is an optimal
tree in which x and y are siblings and are deepest
leaves.
- In such a tree, removing x and y (so that their parent becomes a leaf)
gives an optimal tree for the set in which we replace x and y with a single
character whose frequency is f(x) + f(y).
- Then correctness follows by induction.
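The cost identity behind the second step can be written out as follows. The notation is an assumption of this sketch: $B(T)$ is the weighted cost of tree $T$, $d_T(c)$ is the depth of leaf $c$ in $T$, and $T'$ is $T$ with the siblings $x$ and $y$ replaced by a single leaf $z$ with $f(z) = f(x) + f(y)$.

```latex
\begin{align*}
B(T) &= \sum_{c} f(c)\, d_T(c), \qquad d_T(x) = d_T(y) = d_{T'}(z) + 1, \\
f(x)\, d_T(x) + f(y)\, d_T(y)
     &= \bigl(f(x) + f(y)\bigr)\bigl(d_{T'}(z) + 1\bigr)
      = f(z)\, d_{T'}(z) + f(x) + f(y), \\
B(T) &= B(T') + f(x) + f(y).
\end{align*}
```

Since $f(x) + f(y)$ does not depend on the shape of the tree, a cheaper tree for the reduced character set would yield a cheaper tree for the original set, which is what the induction step needs.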