Huffman Codes - PowerPoint PPT Presentation

Provided by: RollinS5
Learn more at: https://www.cse.usf.edu

1
Huffman Codes
  • Computing an Optimal Code for a Document

2
Objectives
  • You will be able to
  • Create an optimal code for an ASCII text file.
  • Encode the text file using the optimal code and
    output the compressed text as a binary file.
  • Read the compressed binary file and reconstruct
    the original ASCII text.
  • Output the decoded message to a text file.
  • Encode and decode a large text file
  • Moby Dick

3
Getting Started
  • Download the program from last class:
    http://www.cse.usf.edu/turnerr/Data_Structures/Downloads/2011_04_13_Huffman_Codes_with_Binary_IO/
  • File: Huffman_Codes_with_Binary_IO.zip
  • A bit of cleanup:
      • Improve the prompts as shown on the following slides.
      • Delete commented-out sections in main.cpp.
      • Remove output of sorted list in Make_Decode_Tree.

4
Modifications to Prompts
  • main.cpp
  • In do_decode (line 29):
    //cout << "File name for input? ";
    cout << "File name for compressed input file? ";
  • In do_encode (line 89):
    //cout << "File name for output? ";
    cout << "File name for compressed output file? ";

5
An Error on Circe
  • Binary_File.h, line 14 should be:
    static const size_t FIRST_BIT_POSITION = 8*sizeof(size_t);
  • int and size_t are the same size on 32-bit Windows systems.
  • Not on Circe.
  • Probably not on other 64-bit systems.
  • Other errors and warnings on Circe have fairly
    obvious fixes.
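
The fix above matters because a bit position derived from the size of int is too small wherever size_t is wider than int. A minimal sketch of the idea, assuming FIRST_BIT_POSITION is meant to be the number of bits in a size_t; bits_in and top_bit_mask are hypothetical helpers for illustration, not part of Binary_File.h:

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical helper: number of bits in an object of type T.
template <typename T>
constexpr std::size_t bits_in()
{
    return CHAR_BIT * sizeof(T);
}

// Assumed intent of the constant in Binary_File.h: the bit count follows
// size_t, so it adapts to 32-bit and 64-bit builds alike.
static const std::size_t FIRST_BIT_POSITION = bits_in<std::size_t>();

// Mask with only the highest bit of a size_t set.
inline std::size_t top_bit_mask()
{
    assert(FIRST_BIT_POSITION > 0);
    return static_cast<std::size_t>(1) << (FIRST_BIT_POSITION - 1);
}
```

On a system where int is 32 bits and size_t is 64 bits, bits_in<int>() and FIRST_BIT_POSITION differ, which is the mismatch the slide describes.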

6
Program Running
7
Text Files for Testing
  • Download to a convenient directory
  • Full text of Moby Dick:
    http://www.cse.usf.edu/turnerr/Data_Structures/Downloads/Moby_Dick.txt
  • Abridged version:
    http://www.cse.usf.edu/turnerr/Data_Structures/Downloads/Moby_Quick.txt

8
Moby Dick (Abridged)
9
Get Input from a File
  • Modify the Huffman Code program to get its input
    for encode from a text file rather than from the
    keyboard.

10
main.cpp
  • Insert above do_encode:

    void get_text_input_file(string& input_filename, ifstream& infile)
    {
        string junk;
        while (true)
        {
            cout << "File name for text input? ";
            cin >> input_filename;
            getline(cin, junk);   // Skip newline char
            infile.open(input_filename.c_str());
            if (infile.good())
            {
                break;
            }
            infile.clear();
            cout << "Open failed for file " << input_filename << endl;
            cout << "Please try again\n";
        }
    }

  • Source: http://www.cse.usf.edu/turnerr/Data_Structures/Downloads/2011_04_18_Huffman_Code_for_Document/get_text_input_file.cpp.txt

11
do_encode()
  • Revised version that gets input from a file rather than from the keyboard:
    http://www.cse.usf.edu/turnerr/Data_Structures/Downloads/2011_04_18_Huffman_Code_for_Document/do_encode.cpp.txt

12
do_encode()
    void do_encode(void)
    {
        string msg;
        string output_filename;
        Binary_Output_File* outfile;
        string junk;
        string input_filename;
        ifstream infile;

        get_text_input_file(input_filename, infile);

        while (true)
        {
            cout << "\nFile name for compressed output file? ";
            cin >> output_filename;
            getline(cin, junk);   // Skip newline char
            try

13
do_encode()
        //cout << "\n\nEnter message to encode\n";
        //getline(cin, msg);
        while (infile.good())
        {
            char next_char;
            infile.get(next_char);
            string code = huffman_tree.Encode_Char(tolower(next_char));
            if (code.size() == 0)
            {
                cout << endl << "Invalid character in input "
                     << next_char << endl;
                continue;
            }
            outfile->Output(code);
        }
        infile.close();
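
One thing to watch in a loop that tests infile.good() and then calls get(): the test passes once more after the last character has been read, so the body runs one extra time with a character that was never read. A small sketch of the difference, using std::istringstream in place of a file so it is self-contained; both function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Reads all characters, letting the result of get() control the loop.
std::string read_all_checked(const std::string& text)
{
    std::istringstream in(text);
    std::string out;
    char next_char;
    while (in.get(next_char))     // becomes false on the first failed read
    {
        out += next_char;
    }
    assert(!in.good());           // the loop ends only after a read has failed
    return out;
}

// Counts iterations of the "test good(), then read" pattern.
std::size_t iterations_good_then_get(const std::string& text)
{
    std::istringstream in(text);
    std::size_t iterations = 0;
    while (in.good())
    {
        char next_char = '\0';
        in.get(next_char);        // may fail; the loop body still runs
        ++iterations;
    }
    return iterations;
}
```

For the three-character input "abc", iterations_good_then_get returns 4, while read_all_checked returns exactly "abc". Checking the result of get() before using next_char is one way to avoid encoding a stray character at end of file.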

14
Program in Action
15
Program continuing
16
Some Issues
  • White space
  • newline characters lost
  • Punctuation
  • Capitalization
  • Let's build a code specifically for this
    document.
  • Include all characters.
  • Optimize weights for the document.

17
Developing a Code for the Document
  • New version of build_huffman_tree
  • Read the input text file and count occurrences of
    each character.
  • Also total number of characters in the file
  • For each ASCII value that appears in the input
    text file
  • Compute relative frequency.
  • Add char and frequency to the Huffman tree.
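
The counting steps above can be sketched as follows. This is a minimal illustration, not the course's build_huffman_tree: it uses a std::map instead of a fixed array, reads from any std::istream, and the function names are hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: relative frequency of each character in a stream.
std::map<char, double> relative_frequencies(std::istream& in)
{
    std::map<char, long> counts;
    long total = 0;
    char c;
    while (in.get(c))             // count every character, including newlines
    {
        ++counts[c];
        ++total;
    }

    std::map<char, double> freq;
    for (const auto& entry : counts)
    {
        assert(total > 0);        // holds whenever counts is non-empty
        freq[entry.first] = static_cast<double>(entry.second) / total;
    }
    return freq;
}

// Convenience overload for in-memory text.
std::map<char, double> relative_frequencies_of(const std::string& text)
{
    std::istringstream in(text);
    return relative_frequencies(in);
}
```

For the text "abracadabra", the letter 'a' occurs 5 times out of 11 characters, so its relative frequency is 5/11.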

18
New build_huffman_tree()
  • http://www.cse.usf.edu/turnerr/Data_Structures/Downloads/2011_04_18_Huffman_Code_for_Document/build_huffman_tree.cpp.txt

    void build_huffman_tree(ifstream& infile)
    {
        int counts[128] = {0};
        int total = 0;

        // Count characters in the input file.
        while (infile.good())
        {
            char next_char;
            infile.get(next_char);
            assert (next_char > 0);
            assert (next_char < 127);
            counts[next_char]++;
            total++;
        }
        infile.close();
        infile.clear();

19
New build_huffman_tree()
        for (int i = 0; i < 128; i++)
        {
            if (counts[i] > 0)
            {
                huffman_tree.Add(i, (1.0*counts[i]) / total);
            }
        }
    }
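
The slides add each character and its relative frequency to huffman_tree but do not show how the tree turns those weights into codes. A rough sketch of the standard Huffman construction, assuming a min-priority queue of subtrees that repeatedly merges the two lightest; Node and build_codes are hypothetical names and are not the course's Huffman tree class:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type: children are indices into a vector, -1 for leaves.
struct Node
{
    double weight;
    char symbol;   // meaningful only for leaves
    int left;
    int right;
};

// Builds a prefix code (as strings of '0' and '1') from symbol weights.
std::map<char, std::string> build_codes(const std::map<char, double>& freq)
{
    typedef std::pair<double, int> Item;   // (weight, node index)
    std::vector<Node> nodes;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;

    for (const auto& e : freq)
    {
        nodes.push_back(Node{e.second, e.first, -1, -1});
        pq.push(Item(e.second, static_cast<int>(nodes.size()) - 1));
    }

    std::map<char, std::string> codes;
    if (nodes.empty()) return codes;
    if (nodes.size() == 1)                 // a lone symbol still needs one bit
    {
        codes[nodes[0].symbol] = "0";
        return codes;
    }

    // Repeatedly merge the two lightest subtrees.
    while (pq.size() > 1)
    {
        Item a = pq.top(); pq.pop();
        Item b = pq.top(); pq.pop();
        nodes.push_back(Node{a.first + b.first, '\0', a.second, b.second});
        pq.push(Item(a.first + b.first, static_cast<int>(nodes.size()) - 1));
    }
    assert(pq.size() == 1);                // only the root remains

    // Walk the tree: left edge appends '0', right edge appends '1'.
    std::vector<std::pair<int, std::string>> pending;
    pending.push_back({pq.top().second, ""});
    while (!pending.empty())
    {
        std::pair<int, std::string> cur = pending.back();
        pending.pop_back();
        const Node& n = nodes[cur.first];
        if (n.left < 0)
        {
            codes[n.symbol] = cur.second;
        }
        else
        {
            pending.push_back({n.left, cur.second + "0"});
            pending.push_back({n.right, cur.second + "1"});
        }
    }
    return codes;
}
```

With weights 0.5, 0.25 and 0.25 for 'a', 'b' and 'c', the most frequent character receives a one-bit code and the other two receive two-bit codes.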

20
main.cpp
  • Add at top:

    #include <cassert>
    string input_filename;
    ifstream infile;

  • Add to main():

    int main(void)
    {
        cout << "This is the Huffman code program\n";
        get_text_input_file(input_filename, infile);
        build_huffman_tree(infile);

21
do_encode()
  • We have to reopen the input file after reading it in
    build_huffman_tree.
  • No longer call get_text_input_file.
  • Comment out the call to get_text_input_file near the top.
  • At line 104:

        //cout << "\n\nEnter message to encode\n";
        //getline(cin, msg);
        infile.open(input_filename.c_str());
        while (infile.good())

22
do_encode()
  • At line 112, remove the call to tolower():

        infile.open(input_filename.c_str());
        while (infile.good())
        {
            char next_char;
            infile.get(next_char);
            string code = huffman_tree.Encode_Char(next_char);   // was Encode_Char(tolower(next_char))

  • We can now encode all characters.

23
Program Running
24
So far, so good!
  • The program seems to be working for a short file.
  • Let's try it on the full text.
  • You may not want to wait for the complete output!

25
Output Decoded Message to a File
  • Add above do_decode():
    http://www.cse.usf.edu/turnerr/Data_Structures/Downloads/2011_04_18_Huffman_Code_for_Document/get_text_output_file.cpp.txt

    void get_text_output_file(string& output_filename, ofstream& outfile)
    {
        string junk;
        while (true)
        {
            cout << "File name for text output? ";
            cin >> output_filename;
            getline(cin, junk);   // Skip newline char
            outfile.open(output_filename.c_str());
            if (outfile.good())
            {
                break;
            }
            outfile.clear();
            cout << "Open failed for file " << output_filename << endl;
            cout << "Please try again\n";
        }
    }

26
Output Decoded Message to a File
  • At end of do_decode
        original_message = huffman_tree.Decode_Msg(coded_message);
        //cout << "Original message " << original_message << endl;
        //cout << endl << endl;
        string output_filename;
        ofstream outfile;
        get_text_output_file(output_filename, outfile);
        outfile << original_message;
        outfile.close();
        cout << "File " << output_filename << " written";
        cout << endl << endl;
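
Decode_Msg itself is not shown on the slides. As a rough illustration of what decoding a prefix code involves, the sketch below swaps in a table lookup instead of walking a decode tree: bits are accumulated until they match a complete code. The functions encode_text and decode_bits are hypothetical, and bits are held as '0'/'1' characters in a string rather than packed into a binary file.

```cpp
#include <cassert>
#include <map>
#include <string>

// Concatenates the code of every character that has one.
std::string encode_text(const std::map<char, std::string>& codes,
                        const std::string& text)
{
    std::string bits;
    for (char c : text)
    {
        std::map<char, std::string>::const_iterator it = codes.find(c);
        if (it == codes.end()) continue;   // skip characters without a code
        bits += it->second;
    }
    return bits;
}

// Decodes a string of '0'/'1' characters produced by encode_text.
std::string decode_bits(const std::map<char, std::string>& codes,
                        const std::string& bits)
{
    // Invert the table: code string -> character.
    std::map<std::string, char> reverse;
    for (const auto& e : codes)
    {
        reverse[e.second] = e.first;
    }

    std::string out;
    std::string current;
    for (char bit : bits)
    {
        assert(bit == '0' || bit == '1');
        current += bit;
        std::map<std::string, char>::const_iterator it = reverse.find(current);
        if (it != reverse.end())           // a full code has been read
        {
            out += it->second;
            current.clear();
        }
    }
    return out;                            // trailing partial code is ignored
}
```

Because no code is a prefix of another, the first match is always the right one. With codes a = 0, b = 10, c = 11, the text "abcab" encodes to 01011010 and decodes back to "abcab".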

27
Test on Full Text of Moby Dick
28
Test on Full Text of Moby Dick
29
On Circe
(After some tweaking)
30
Embedding the Code
  • In order for the compressed file to be useful, we
    have to store the code along with it.
  • Then we can read and decode the file at a later
    time.
  • Even on a different computer (with the same
    architecture)
  • In order to decode
  • First read the code.
  • Reconstitute the decode tree.
  • Then read and decode the message.
  • Project 7
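
One possible way to store the code along with the compressed data is to write the code table as a header and rebuild it before decoding. The sketch below is only an illustration under stated assumptions: the text-based header layout (a count, then one "character value, code" pair per line) and all three function names are hypothetical and are not the format required for Project 7.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Writes the number of entries, then one "value code" pair per line.
// Characters are written as integers so that space and newline survive.
void write_code_table(std::ostream& out,
                      const std::map<char, std::string>& codes)
{
    out << codes.size() << '\n';
    for (const auto& e : codes)
    {
        assert(!e.second.empty());   // every symbol needs at least one bit
        out << static_cast<int>(static_cast<unsigned char>(e.first))
            << ' ' << e.second << '\n';
    }
}

// Reads back a table written by write_code_table.
std::map<char, std::string> read_code_table(std::istream& in)
{
    std::map<char, std::string> codes;
    std::size_t n = 0;
    in >> n;
    for (std::size_t i = 0; i < n; ++i)
    {
        int symbol = 0;
        std::string code;
        if (!(in >> symbol >> code)) break;   // stop on a damaged header
        codes[static_cast<char>(symbol)] = code;
    }
    return codes;
}

// Writes a table to memory and checks that reading it gives the same table.
bool round_trips(const std::map<char, std::string>& codes)
{
    std::stringstream buffer;
    write_code_table(buffer, codes);
    return read_code_table(buffer) == codes;
}
```

A text header like this avoids the architecture dependence mentioned above for the table itself; the packed message bits that follow it would still depend on how Binary_Output_File lays out its words.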