Greedy Algorithms (Huffman Coding) - PowerPoint PPT Presentation

Slides: 33
Provided by: webCsWpi99
Learn more at: http://web.cs.wpi.edu
Transcript and Presenter's Notes

Title: Greedy Algorithms (Huffman Coding)


1
Greedy Algorithms (Huffman Coding)
2
Huffman Coding
Original file
  • A technique to compress data effectively
  • Usually achieves between 20% and 90% compression
  • Lossless compression
  • No information is lost
  • When you decompress, you get the original file back

3
Huffman Coding Applications
[Figure labels: Original file, Huffman coding, Compressed file]
  • Saving space
  • Store compressed files instead of original files
  • Transmitting files or data
  • Send compressed data to save transmission time
    and power
  • Encryption and decryption
  • Cannot read the compressed file without knowing
    the key

4
Main Idea: Frequency-Based Encoding
  • Assume in this file only 6 characters appear
  • E, A, C, T, K, N
  • The frequencies are

Character Frequency
E 10,000
A 4,000
C 300
T 200
K 100
N 100
  • Option 1 (No compression)
  • Each character = 1 byte (8 bits)
  • Total file size = 14,700 × 8 = 117,600 bits
  • Option 2 (Fixed-size compression)
  • We have 6 characters, so we need 3 bits to encode
    each one (2 bits give only 4 codes)
  • Total file size = 14,700 × 3 = 44,100 bits

Character Fixed Encoding
E 000
A 001
C 010
T 100
K 110
N 111
5
Main Idea: Frequency-Based Encoding (Contd.)
  • Assume in this file only 6 characters appear
  • E, A, C, T, K, N
  • The frequencies are

Character Frequency
E 10,000
A 4,000
C 300
T 200
K 100
N 100
  • Option 3 (Huffman compression)
  • Variable-length compression
  • Assign shorter codes to more frequent characters
    and longer codes to less frequent characters
  • Total file size = (10,000 × 1) + (4,000 × 2) + (300 × 3)
    + (200 × 4) + (100 × 5) + (100 × 5) = 20,700 bits

Character Huffman Encoding
E 0
A 10
C 110
T 1110
K 11110
N 11111
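The three totals on slides 4 and 5 can be re-checked with a few lines of Python. This is only a sketch: the tables are copied from the slides, and the helper name `total_bits` is made up for illustration.

```python
# Frequencies and code tables copied from slides 4 and 5.
freq = {"E": 10_000, "A": 4_000, "C": 300, "T": 200, "K": 100, "N": 100}
fixed = {"E": "000", "A": "001", "C": "010", "T": "100", "K": "110", "N": "111"}
huffman = {"E": "0", "A": "10", "C": "110", "T": "1110", "K": "11110", "N": "11111"}

def total_bits(freq, code_len):
    """Sum of (frequency x code length) over all characters."""
    return sum(n * code_len(ch) for ch, n in freq.items())

option1 = total_bits(freq, lambda ch: 8)                 # 1 byte per character
option2 = total_bits(freq, lambda ch: len(fixed[ch]))    # 3 bits per character
option3 = total_bits(freq, lambda ch: len(huffman[ch]))  # variable length
print(option1, option2, option3)  # 117600 44100 20700
```

With these frequencies the variable-length table needs less than half the bits of the 3-bit fixed table, because E and A dominate the file.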
6
Huffman Coding
  • A variable-length coding for characters
  • More frequent characters → shorter codes
  • Less frequent characters → longer codes
  • It is not like ASCII coding where all characters
    have the same coding length (8 bits)
  • Two main questions
  • How to assign codes (Encoding process)?
  • How to decode (from the compressed file, generate
    the original file) (Decoding process)?

7
Decoding for fixed-length codes is much easier
010001100110111000
Character Fixed-length Encoding
E 000
A 001
C 010
T 100
K 110
N 111
Divide into 3s
010 001 100 110 111 000
Decode
C A T K N E
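The "divide into 3s" step can be sketched in Python as follows. The table is the fixed-length one from this slide; `decode_fixed` is a hypothetical helper name, and the sketch assumes the input length is a multiple of 3.

```python
# Fixed-length table from slide 7.
fixed = {"E": "000", "A": "001", "C": "010", "T": "100", "K": "110", "N": "111"}
decode_table = {bits: ch for ch, bits in fixed.items()}

def decode_fixed(bits, width=3):
    """Cut the bit string into equal-width chunks and look each one up."""
    chunks = [bits[i:i + width] for i in range(0, len(bits), width)]
    return "".join(decode_table[chunk] for chunk in chunks)

decoded = decode_fixed("010001100110111000")
print(decoded)  # CATKNE
```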
8
Decoding for variable-length codes is not that
easy
000001
Character Variable-length Encoding
E 0
A 00
C 001
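
With this table, 000001 can be read as E E E C, E A C, or A E C,
so the decoder cannot tell which message was sent.

Huffman encoding is guaranteed to avoid this ambiguity:
there is always a single decoding.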
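To see the ambiguity concretely, here is a small brute-force sketch that lists every way to split a bit string into codes from the table on this slide. The function name `all_decodings` is an assumption for illustration, and the recursion is only practical for short inputs.

```python
# Ambiguous variable-length table from slide 8.
codes = {"E": "0", "A": "00", "C": "001"}

def all_decodings(bits):
    """Return every message whose encoding is exactly `bits`."""
    if not bits:
        return [""]
    results = []
    for ch, code in codes.items():
        if bits.startswith(code):
            # Try this code first, then decode the remainder recursively.
            results += [ch + rest for rest in all_decodings(bits[len(code):])]
    return results

decodings = all_decodings("000001")
print(decodings)  # ['EEEC', 'EAC', 'AEC']
```

More than one result means the table is not uniquely decodable. Here the code for E (0) is a prefix of the codes for A (00) and C (001), which is what causes the problem.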
9
Huffman Algorithm
  • Step 1: Get Frequencies
  • Scan the file to be compressed and count the
    occurrence of each character
  • Sort the characters based on their frequency
  • Step 2: Build Tree & Assign Codes
  • Build a Huffman-code tree (binary tree)
  • Traverse the tree to assign codes
  • Step 3: Encode (Compress)
  • Scan the file again and replace each character by
    its code
  • Step 4: Decode (Decompress)
  • Huffman tree is the key to decompress the file

10
Step 1: Get Frequencies
Input File
Eerie eyes seen near lake.
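A minimal sketch of Step 1 for this input, using Python's `collections.Counter`. The sketch counts uppercase and lowercase letters, the space, and the period as separate symbols; the variable names are illustrative only.

```python
from collections import Counter

text = "Eerie eyes seen near lake."

# Count how often each character occurs (case-sensitive, spaces included).
freq = Counter(text)

# Sort ascending by frequency so the rarest characters come first.
by_freq = sorted(freq.items(), key=lambda item: item[1])

print(len(text), len(freq))  # 26 12
print(by_freq[-1])           # ('e', 8)
```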
11
Step 2: Build Huffman Tree & Assign Codes
  • It is a binary tree in which each character is a
    leaf node
  • Initially each node is a separate root
  • At each step
  • Select the two roots with the smallest frequencies and
    connect them to a new parent (break ties
    arbitrarily). This is the greedy choice
  • The parent gets the sum of the frequencies of the
    two child nodes
  • Repeat until you have one root
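The merge loop above can be sketched with a min-heap. This is one possible implementation, not necessarily the one used to draw the slides: the tuple-based tree representation and the name `build_huffman_tree` are assumptions, ties are broken by insertion order, and the input is assumed to be non-empty.

```python
import heapq
from collections import Counter
from itertools import count

def build_huffman_tree(text):
    """Return (total_frequency, tree).

    A tree is either a single character (leaf) or a (left, right) pair.
    """
    tiebreak = count()  # unique counter so the heap never compares trees
    heap = [(n, next(tiebreak), ch) for ch, n in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Greedy choice: take the two roots with the smallest frequencies.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # The new parent carries the sum of its children's frequencies.
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    total, _, tree = heap[0]
    return total, tree

total, tree = build_huffman_tree("Eerie eyes seen near lake.")
print(total)  # 26
```

Because ties may be broken differently, the exact tree shape can differ from the one on the slides while giving the same total encoded length.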

12
Example
Each char. has a leaf node with its frequency
13-23
Find the smallest two frequencies. Replace them with their parent.
(Slides 13-23 repeat this step, once per merge.)
24
Now we have a single root. This is the Huffman tree.
25
Let's Analyze the Huffman Tree
  • All characters are at the leaf nodes
  • The number at the root = the number of characters in the
    file
  • High-frequency chars (e.g., e) are near the
    root
  • Low-frequency chars are far from the root

26
Let's Assign Codes
  • Traverse the tree
  • Any left edge → add label 0
  • Any right edge → add label 1
  • The code for each character is its root-to-leaf
    label sequence
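The labeling rule can be sketched as a recursive traversal. The tree below is a hypothetical one, hand-built so that it reproduces the code table from slide 5; it is not the tree from the "Eerie eyes" example, whose figure is not part of this transcript. Leaves are characters and internal nodes are (left, right) pairs.

```python
def assign_codes(tree, prefix="", table=None):
    """Walk the tree; left edges append '0', right edges append '1'."""
    if table is None:
        table = {}
    if isinstance(tree, str):
        # Leaf: the labels collected on the way down are the code.
        table[tree] = prefix or "0"  # a one-leaf tree still needs one bit
    else:
        left, right = tree
        assign_codes(left, prefix + "0", table)
        assign_codes(right, prefix + "1", table)
    return table

# Hypothetical tree shape matching the table on slide 5.
tree = ("E", ("A", ("C", ("T", ("K", "N")))))
codes = assign_codes(tree)
print(codes["E"], codes["T"], codes["N"])  # 0 1110 11111
```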

27
Let's Assign Codes
[Figure: the Huffman tree with its edges labeled 0 and 1]
  • Traverse the tree
  • Any left edge → add label 0
  • Any right edge → add label 1
  • The code for each character is its root-to-leaf
    label sequence

28
Let's Assign Codes
  • Traverse the tree
  • Any left edge → add label 0
  • Any right edge → add label 1
  • The code for each character is its root-to-leaf
    label sequence

30
Step 3: Encode (Compress) the File
Input File
Eerie eyes seen near lake.

0000 10 1100 0001 10 ...
Notice that no code is a prefix of any other code
→ this ensures the decoding is unique (unlike
slide 8)
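Both the encoding step and the prefix property can be sketched in Python. Since the code table for the "Eerie eyes" example is not fully recoverable from this transcript, the sketch uses the table from slide 5 instead; `encode` and `is_prefix_free` are illustrative names.

```python
# Huffman table from slide 5 and the ambiguous table from slide 8.
huffman = {"E": "0", "A": "10", "C": "110", "T": "1110", "K": "11110", "N": "11111"}
ambiguous = {"E": "0", "A": "00", "C": "001"}

def encode(text, codes):
    """Replace each character by its code and concatenate the bits."""
    return "".join(codes[ch] for ch in text)

def is_prefix_free(codes):
    """True if no code is a prefix of another code."""
    words = sorted(codes.values())
    # After sorting, a code that is a prefix of some other code is
    # always immediately followed by a code that starts with it.
    return not any(b.startswith(a) for a, b in zip(words, words[1:]))

bits = encode("CATKNE", huffman)
print(bits)                       # 11010111011110111110
print(is_prefix_free(huffman))    # True
print(is_prefix_free(ambiguous))  # False
```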
31
Step 4: Decode (Decompress)
  • Must have the encoded file and the coding tree
  • Scan the encoded file
  • For each 0 → move left in the tree
  • For each 1 → move right
  • When you reach a leaf node → emit that character and
    go back to the root
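The decoding walk can be sketched as follows. As an assumption for illustration, the tree is rebuilt from the slide 5 code table as nested dictionaries keyed by "0" (left) and "1" (right); a real decompressor would read the tree or the table from the compressed file.

```python
# Huffman table from slide 5.
huffman = {"E": "0", "A": "10", "C": "110", "T": "1110", "K": "11110", "N": "11111"}

def build_tree(codes):
    """Rebuild the coding tree: inner nodes are dicts, leaves are characters."""
    root = {}
    for ch, code in codes.items():
        node = root
        for bit in code[:-1]:
            node = node.setdefault(bit, {})
        node[code[-1]] = ch
    return root

def decode(bits, root):
    """Follow one edge per bit; emit a character at each leaf, then restart."""
    out, node = [], root
    for bit in bits:
        node = node[bit]           # "0" moves left, "1" moves right
        if isinstance(node, str):  # reached a leaf
            out.append(node)
            node = root            # go back to the root
    return "".join(out)

decoded = decode("11010111011110111110", build_tree(huffman))
print(decoded)  # CATKNE
```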
