Title: Area: Computer Science
- Area: Computer Science
- Subject: Algorithms
- Huffman Coding
- Report submitted by: Akash Singhal (07005016)
Greedy algorithms are algorithms that make the locally optimal choice at each stage in the hope of finding the global optimum. Huffman coding is a greedy algorithm for lossless data compression. It assigns a variable-length code to each symbol used in a file, giving shorter codes to more frequent symbols, such that no code is a prefix of any other.
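The prefix rule is what lets a decoder split a bit stream into symbols without separators. Below is a minimal Python sketch, assuming a small hypothetical code table (not one produced by Huffman coding): bits are buffered one at a time, and a symbol is emitted as soon as the buffer matches a code.

```python
def decode(bits: str, codes: dict[str, str]) -> str:
    """Decode a bit string using a prefix-free code table (symbol -> code)."""
    lookup = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        # Because no code is a prefix of another, the first match is the only possible match.
        if current in lookup:
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)


# Hypothetical prefix-free code table, chosen for illustration only.
codes = {"a": "0", "b": "10", "c": "11"}
print(decode("010110", codes))
```

If "a" were instead given the code "1", the stream "11" could mean "aa" or "c", which is exactly the ambiguity the prefix rule rules out.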
AIM
The aim of this animation is to clarify the concept of greedy algorithms by illustrating the problem of Huffman coding.
PROCEDURE
- This animation will use a sample file.
- The symbols in the sample file will be counted with a hash table, producing a table of the frequency of each symbol/alphabet used in the file.
- The animation will show the construction of a binary tree that determines the code allotted to every symbol when the data file is compressed.
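The frequency-counting step can be sketched in Python with `collections.Counter`, which is a dict subclass and therefore hash-table based. The sample string below is a hypothetical stand-in for the sample file's contents.

```python
from collections import Counter


def frequency_table(text: str) -> dict[str, int]:
    """Count how often each symbol occurs; Counter stores the counts in a hash table."""
    return dict(Counter(text))


# Hypothetical stand-in for the contents of the sample file.
sample = "abracadabra"
for symbol, count in sorted(frequency_table(sample).items()):
    print(symbol, count)
```

For a real file, the text could be read with `Path("sample.txt").read_text()` from `pathlib`, where `sample.txt` is a hypothetical filename.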
Problem Statement
Given a file, find a code allotment for every symbol used in the file such that the size of the compressed file is minimum. For the codes to be decoded unambiguously, no code may be a prefix of another code.
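Both conditions in the problem statement can be checked mechanically. In this sketch (the frequencies and code tables are hypothetical), `is_prefix_free` tests the prefix rule and `compressed_size` computes the quantity being minimized: each symbol contributes its frequency times its code length, in bits.

```python
def is_prefix_free(codes: dict[str, str]) -> bool:
    """Return True if no code is a prefix of another code."""
    words = sorted(codes.values())
    # In sorted order, a code that is a prefix of any other code is a prefix of its successor.
    return all(not nxt.startswith(cur) for cur, nxt in zip(words, words[1:]))


def compressed_size(freqs: dict[str, int], codes: dict[str, str]) -> int:
    """Size of the compressed file in bits: sum over symbols of frequency * code length."""
    return sum(freqs[s] * len(codes[s]) for s in freqs)


# Hypothetical frequencies and code tables, for illustration only.
freqs = {"a": 5, "b": 2, "c": 1}
good = {"a": "0", "b": "10", "c": "11"}
bad = {"a": "0", "b": "01", "c": "11"}  # "0" is a prefix of "01"
print(is_prefix_free(good), compressed_size(freqs, good))
print(is_prefix_free(bad))
```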
- Combine the two nodes (symbols or previously combined nodes) of least frequency into a single node whose frequency is the sum of their frequencies.
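[Figure: the two least frequent symbols, a (2) and b (3), are combined into the node ab (5), which has a (2) and b (3) as its children.]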
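A single greedy step can be sketched with Python's `heapq` min-heap, using the leaf frequencies from the figures (a 2, b 3, c 4, d 6, e 7, f 8). Representing a node as a (frequency, label) pair and the helper name `merge_two_smallest` are assumptions made for brevity.

```python
import heapq


def merge_two_smallest(heap: list[tuple[int, str]]) -> tuple[int, str]:
    """One greedy step: pop the two least frequent nodes and push their combined node."""
    f1, s1 = heapq.heappop(heap)
    f2, s2 = heapq.heappop(heap)
    merged = (f1 + f2, s1 + s2)
    heapq.heappush(heap, merged)
    return merged


# Leaf frequencies taken from the example figures.
heap = [(2, "a"), (3, "b"), (4, "c"), (6, "d"), (7, "e"), (8, "f")]
heapq.heapify(heap)
print(merge_two_smallest(heap))
```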
- The process continues in this way until a single node remains. The slides show the final tree with root abcdef (30) and children abc (9) and def (21).
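Repeating that step until one node remains builds the whole tree. The sketch below records each merge; a counter breaks ties so that equal frequencies are resolved by insertion order, which is one possible convention. Applied strictly to the frequencies in the figures, the greedy rule joins a + b, then c + ab, then d + e (6 and 7 are the two smallest weights at that point), then f + abc, and finally the two remaining nodes. Since every leaf's frequency is counted once per merge above it, the sum of the merged frequencies equals the encoded size: 74 bits here, compared with 80 bits for the tree drawn in the slides, which pairs e with f.

```python
import heapq
import itertools


def huffman_merges(freqs: dict[str, int]) -> list[tuple[str, str, str, int]]:
    """Greedily merge the two least frequent nodes until one node (the root) remains.

    Returns each merge in order as (first, second, merged_label, merged_frequency).
    """
    tie = itertools.count()  # unique tie-breaker: equal frequencies pop in insertion order
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    merges = []
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        label = "".join(sorted(s1 + s2))  # e.g. "c" + "ab" -> "abc", as in the slides
        merges.append((s1, s2, label, f1 + f2))
        heapq.heappush(heap, (f1 + f2, next(tie), label))
    return merges


# Leaf frequencies from the example figures.
freqs = {"a": 2, "b": 3, "c": 4, "d": 6, "e": 7, "f": 8}
merges = huffman_merges(freqs)
for first, second, label, total in merges:
    print(f"{first} + {second} -> {label} ({total})")
print("encoded size:", sum(total for *_, total in merges), "bits")
```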
- The animation then unfolds the tree to show the code allotted to each symbol.
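[Figure: the tree being unfolded. abcdef (30) branches to abc (9) on 0 and def (21) on 1; abc (9) branches to c (4) and ab (5), and def (21) branches to d (6) and ef (15), each pair of branches labelled 0 and 1.]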
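[Figure: Code Allotment using Huffman Coding. The complete tree: abcdef (30) splits into abc (9) and def (21); abc (9) into c (4) and ab (5); ab (5) into a (2) and b (3); def (21) into d (6) and ef (15); ef (15) into e (7) and f (8). Every branch is labelled 0 or 1, and a symbol's code is the sequence of labels on the path from the root to its leaf.]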
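Reading the codes off the tree is a root-to-leaf walk. This sketch writes the figure's tree as nested (left, right) pairs, assuming the conventional orientation of 0 for the left branch and 1 for the right, and an assumed left-to-right order of the children; the function name `assign_codes` is illustrative.

```python
def assign_codes(node, prefix="", codes=None):
    """Unfold the tree: moving to the left child appends '0', to the right child appends '1'."""
    if codes is None:
        codes = {}
    if isinstance(node, str):  # leaf: the path so far is this symbol's code
        codes[node] = prefix or "0"  # a tree with a single leaf still needs a 1-bit code
        return codes
    left, right = node
    assign_codes(left, prefix + "0", codes)
    assign_codes(right, prefix + "1", codes)
    return codes


# The tree from the figure as nested (left, right) pairs (assumed orientation: left = 0, right = 1).
tree = (("c", ("a", "b")), ("d", ("e", "f")))
codes = assign_codes(tree)
for symbol, code in codes.items():
    print(symbol, code)
```

Under these assumptions c and d get 2-bit codes, the other four symbols get 3-bit codes, and no code is a prefix of another.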
Review Questions
- Huffman coding is an example of _________ algorithms. (Ans: greedy)
- Huffman coding uses this data structure for code allotment: _________ (Ans: binary tree)
- If the code for a is 001, which of these can be the code for b?
- a) 0100
- b) 0010
- c) 0011
- Ans: 0100 (0010 and 0011 both begin with 001, and no code may be a prefix of another)
- If there are 20 symbols in a file, how many leaves will the Huffman coding binary tree have? (Ans: 20)
- Given that no code can be a prefix of another, minimizing the compressed size requires that the symbol with the highest frequency gets a shortest possible code and the symbol with the lowest frequency gets a code no shorter than any other. Verify.
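The last question can be checked with an exchange argument: if symbol x is more frequent than symbol y but has a longer code, swapping their codes keeps the code prefix-free (the set of codewords is unchanged) and changes the total size by (f_x - f_y)(l_y - l_x) bits, which is negative. A sketch using the code lengths of the tree in the figures (c and d at depth 2, the other symbols at depth 3); the helper names are illustrative.

```python
def total_bits(freqs: dict[str, int], lengths: dict[str, int]) -> int:
    """Compressed size: each symbol contributes frequency * code length bits."""
    return sum(freqs[s] * lengths[s] for s in freqs)


def swap_codes(lengths: dict[str, int], x: str, y: str) -> dict[str, int]:
    """Exchange the codes of x and y; the set of codewords, and so prefix-freeness, is unchanged."""
    swapped = dict(lengths)
    swapped[x], swapped[y] = lengths[y], lengths[x]
    return swapped


freqs = {"a": 2, "b": 3, "c": 4, "d": 6, "e": 7, "f": 8}
# Code lengths = leaf depths in the slides' tree: c and d at depth 2, the rest at depth 3.
lengths = {"a": 3, "b": 3, "c": 2, "d": 2, "e": 3, "f": 3}

before = total_bits(freqs, lengths)
after = total_bits(freqs, swap_codes(lengths, "c", "f"))  # f (8) is more frequent than c (4)
print(before, after)
```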
Further Interactivity
- Users can upload their own file; the animation then returns a table showing the frequency of each symbol used in the file and the code allotted to each symbol.
- The animation can also be run on the user-uploaded file.
- The user can also be given a compressed version of the file, so the animation can double as a file compression tool.
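The compression-tool idea can be sketched end to end: count symbols, build Huffman codes greedily, encode, and pack the bits into bytes. This is a minimal illustration, not a file format: the code table and exact bit count are returned separately and would have to be stored alongside the data for decoding, and the input string is a hypothetical stand-in for an uploaded file.

```python
import heapq
import itertools
from collections import Counter


def huffman_codes(text: str) -> dict[str, str]:
    """Count symbols, greedily merge the two rarest nodes, then read the codes off the tree."""
    tie = itertools.count()  # unique tie-breaker so tree nodes are never compared directly
    heap = [(f, next(tie), sym) for sym, f in Counter(text).items()]
    if not heap:
        return {}
    if len(heap) == 1:  # a single distinct symbol still needs a 1-bit code
        return {heap[0][2]: "0"}
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    codes = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):  # internal node: 0 for the left child, 1 for the right
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:
            codes[node] = prefix
    return codes


def compress(text: str) -> tuple[bytes, dict[str, str], int]:
    """Encode text and pack the bit string into bytes, zero-padding the final byte."""
    codes = huffman_codes(text)
    bits = "".join(codes[ch] for ch in text)
    padded = bits + "0" * (-len(bits) % 8)
    data = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return data, codes, len(bits)


text = "abracadabra"  # hypothetical stand-in for the uploaded file's contents
data, codes, n_bits = compress(text)
print(codes)
print(f"{len(text.encode())} bytes -> {len(data)} bytes ({n_bits} bits, code table not included)")
```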
Further Reading Links
- http://en.wikipedia.org/wiki/Huffman_coding
- http://www.cdeep.iitb.ac.in/nptel/Computer%20Science/Design%20and%20Analysis%20of%20Algorithms/TOC.htm
CREDITS