Title: Huffman encoding
1 Huffman encoding
2 Fixed Length Codes
Represent data as a sequence of 0s and 1s.
A fixed length code (3 bits per symbol):
A 000   B 001   C 010   D 011
E 100   F 101   G 110   H 111
Encoding of the sequence BACADAEAFABBAAAGAH:
The encoding is 18 × 3 = 54 bits long. Can we make
the encoding shorter?
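
The bit count above can be checked with a minimal sketch. It assumes the sequence being encoded is the 18-symbol message BACADAEAFABBAAAGAH used in the variable-length example; the names `FIXED` and `encode_fixed` are illustrative and not from the slides.

```python
# Fixed-length code from the table: symbol i of "ABCDEFGH" maps to
# its 3-bit binary index (A -> 000, B -> 001, ..., H -> 111).
FIXED = {ch: format(i, "03b") for i, ch in enumerate("ABCDEFGH")}


def encode_fixed(message):
    """Concatenate the 3-bit codeword of every symbol in the message."""
    return "".join(FIXED[ch] for ch in message)


# Assumed sample message (taken from the variable-length example).
message = "BACADAEAFABBAAAGAH"
bits = encode_fixed(message)
print(len(message), "symbols ->", len(bits), "bits")
```

Every symbol costs 3 bits regardless of how often it occurs, which is what the variable-length code tries to improve on.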
3 Variable Length Code
Make use of frequencies. Frequency of A: 8, B: 3,
others: 1.
A 0 B 100 C 1010 D 1011 E 1100
F 1101 G 1110 H 1111
Example: BACADAEAFABBAAAGAH is encoded as
100010100101101100011010100100000111001111 (42 bits)
But how do we decode?
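
As a rough illustration of the encoding side, the sketch below applies the variable-length table to the example message by simple concatenation. The names `VARIABLE` and `encode` are illustrative assumptions, not part of the slides.

```python
# Variable-length code from the table: the frequent symbol A gets
# the shortest codeword, the rare symbols get 4 bits each.
VARIABLE = {
    "A": "0", "B": "100", "C": "1010", "D": "1011",
    "E": "1100", "F": "1101", "G": "1110", "H": "1111",
}


def encode(message, code):
    """Concatenate the codeword of every symbol in the message."""
    return "".join(code[ch] for ch in message)


encoded = encode("BACADAEAFABBAAAGAH", VARIABLE)
print(encoded, len(encoded))
```

Note that the output carries no separators between codewords, which is exactly why the decoding question arises.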
4 Prefix code ↔ Binary tree
Prefix code: no codeword is a prefix of any other.
A 0 B 100 C 1010 D 1011 E 1100
F 1101 G 1110 H 1111
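5 Decoding Example
Read the bits left to right and emit a symbol as soon as the bits read so far form a codeword (brackets mark the codeword just matched):
[100]01010    B
100[0]1010    BA
1000[1010]    BAC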
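
The decoding walk above can be sketched as follows. This is a minimal illustration, not the slides' tree-based procedure: instead of following edges of the binary tree, it accumulates bits until they match a codeword, which is equivalent for a prefix code. The names `CODE`, `is_prefix_free` and `decode` are assumptions made for this example.

```python
CODE = {
    "A": "0", "B": "100", "C": "1010", "D": "1011",
    "E": "1100", "F": "1101", "G": "1110", "H": "1111",
}


def is_prefix_free(code):
    """True if no codeword is a prefix of another codeword.

    After sorting, any codeword that is a prefix of another one is
    immediately followed by a word that starts with it.
    """
    words = sorted(code.values())
    return not any(b.startswith(a) for a, b in zip(words, words[1:]))


def decode(bits, code):
    """Decode a bit string by matching codewords greedily, left to right."""
    inverse = {word: ch for ch, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # reached a leaf of the code tree
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


print(is_prefix_free(CODE), decode("10001010", CODE))
```

Greedy matching is only safe because the code is prefix-free: once the bits read so far form a codeword, no longer codeword can start with them.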
6 Huffman Tree: Optimal Length Code
Optimal: no prefix code has a better (smaller) weighted average codeword length.
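
To make "weighted average" concrete, the sketch below computes it for the variable-length code, using the frequencies stated earlier (A: 8, B: 3, others: 1). The names `FREQ`, `CODE`, `total_cost` and `average` are illustrative assumptions.

```python
# Frequencies and codewords as given on the variable-length slide.
FREQ = {"A": 8, "B": 3, "C": 1, "D": 1, "E": 1, "F": 1, "G": 1, "H": 1}
CODE = {
    "A": "0", "B": "100", "C": "1010", "D": "1011",
    "E": "1100", "F": "1101", "G": "1110", "H": "1111",
}

# Weighted total: each symbol's frequency times its codeword length.
total_cost = sum(FREQ[ch] * len(CODE[ch]) for ch in FREQ)

# Weighted average: bits per symbol, weighted by frequency.
average = total_cost / sum(FREQ.values())

print(total_cost, round(average, 2))
```

With these weights the total is 8·1 + 3·3 + 6·4 = 41, so the average is 41/17, roughly 2.41 bits per symbol, compared with 3 bits per symbol for the fixed-length code.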
7 Huffman's Algorithm
Build the tree bottom-up, so that the lowest-weight
leaves are farthest from the root.
Repeatedly: find the two trees of lowest
weight, then merge them to form a new tree whose
weight is the sum of their weights.
8 Construction of Huffman tree
9 Two questions
- Why does the algorithm produce the best tree?
- How do you implement it efficiently?
10 Huffman(C)
    n ← |C|
    Q ← C
    for i ← 1 to n-1 do
        new(z)
        left(z) ← x ← delete-min(Q)
        right(z) ← y ← delete-min(Q)
        f(z) ← f(x) + f(y)
        insert(z, Q)
    return delete-min(Q)
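
One possible way to run the pseudocode above is sketched here in Python, assuming a binary heap (`heapq`) as the priority queue Q. The function names `huffman_tree` and `codewords`, the tuple representation of internal nodes, and the counter used to break ties are all choices made for this illustration, not part of the slides.

```python
import heapq
from itertools import count


def huffman_tree(freq):
    """Build a Huffman tree; leaves are symbols, internal nodes are (left, right)."""
    tiebreak = count()  # unique counter so the heap never compares nodes
    heap = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(heap)                        # Q <- C
    for _ in range(len(freq) - 1):             # n - 1 merges
        fx, _tx, x = heapq.heappop(heap)       # x <- delete-min(Q)
        fy, _ty, y = heapq.heappop(heap)       # y <- delete-min(Q)
        heapq.heappush(heap, (fx + fy, next(tiebreak), (x, y)))  # insert(z, Q)
    return heap[0][2]


def codewords(tree, prefix=""):
    """Read codewords off the tree: left edge = 0, right edge = 1."""
    if isinstance(tree, str):                  # leaf
        return {tree: prefix or "0"}           # single-symbol alphabet edge case
    left, right = tree
    table = codewords(left, prefix + "0")
    table.update(codewords(right, prefix + "1"))
    return table


freq = {"A": 8, "B": 3, "C": 1, "D": 1, "E": 1, "F": 1, "G": 1, "H": 1}
table = codewords(huffman_tree(freq))
cost = sum(freq[ch] * len(table[ch]) for ch in freq)
print(table, cost)
```

With a binary heap each delete-min and insert takes O(log n), so the n-1 merges take O(n log n) in total. The exact codewords depend on how ties are broken, so they may differ from the table on the earlier slide, but the weighted total length is the same.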
- Let x and y be the characters with lowest
frequencies. We prove that there is an optimal
tree in which x and y are siblings and deepest
leaves.
- The tree without x and y is an optimal tree for
the set in which we replace x and y with a single
character whose frequency is f(x) + f(y).
- Then correctness follows by induction.
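
The second step of the argument can be written out as a short calculation. The notation is an assumption introduced here, not taken from the slides: B(T) is the weighted total length of tree T, d_T(c) is the depth of character c in T, and T' is T with the sibling leaves x and y removed and their parent z treated as a leaf with f(z) = f(x) + f(y).

```latex
\begin{align*}
B(T) &= \sum_{c \in C} f(c)\, d_T(c), \\
d_T(x) &= d_T(y) = d_{T'}(z) + 1, \\
f(x)\, d_T(x) + f(y)\, d_T(y)
  &= \bigl(f(x) + f(y)\bigr)\bigl(d_{T'}(z) + 1\bigr)
   = f(z)\, d_{T'}(z) + f(x) + f(y), \\
B(T) &= B(T') + f(x) + f(y).
\end{align*}
```

Since f(x) + f(y) does not depend on the shape of the tree, a cheaper tree for the reduced character set could be expanded, by re-attaching x and y under z, into a tree cheaper than T. So if T is optimal, T' must be optimal for the reduced set, which is what the induction needs.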