Title: Compression
1 Compression

2 Compression

- Compression ratio: how much is the size reduced?
- Symmetric/asymmetric: how do the times to compress and to decompress compare?
- Lossless/lossy: is any information lost in the process of compressing and decompressing?
- Adaptive/static: does the compression dictionary change during processing?
3 Run-length encoding

- Notices long runs of repeated data.
- Lossless, completely reversible.
- Represent each run with a count and a value.
- Example:
  - SSSSSVVVVVVVVVVTTTTTTTTTURRRRR
  - 4S9V8T0U4R
- Use count - 1 to maximize use of the range of values available (see the sketch after this list).
- Alternative: only encode where there is repetition. Use a non-occurring (escape) character to indicate where compression begins.
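A minimal Python sketch of the count-and-value scheme described above (the names rle_encode and rle_decode are just illustrative; the pairs use the count - 1 convention from the example):

    def rle_encode(data: str) -> list[tuple[int, str]]:
        """Run-length encode a string as (count - 1, value) pairs."""
        runs = []
        i = 0
        while i < len(data):
            j = i
            while j < len(data) and data[j] == data[i]:
                j += 1                          # extend the current run
            runs.append((j - i - 1, data[i]))   # store count - 1 and the value
            i = j
        return runs

    def rle_decode(runs: list[tuple[int, str]]) -> str:
        """Reverse the encoding: each pair expands to (count + 1) copies."""
        return "".join(value * (count + 1) for count, value in runs)

    message = "SSSSSVVVVVVVVVVTTTTTTTTTURRRRR"
    encoded = rle_encode(message)
    print(encoded)   # [(4, 'S'), (9, 'V'), (8, 'T'), (0, 'U'), (4, 'R')]
    assert rle_decode(encoded) == message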
4 Example for Run-Length Encoding

Assume the background color is represented by 00000000, the face color by 00000001, the eye color by 00000010, and the smile by 00000011, and that there are 50 pixels across each row. What would the encoding be with and without run-length coding?
5 Huffman encoding

- Statistical encoding.
- Requires knowledge of the relative frequency of elements in the string.
- Sender and receiver must both know the encoding chosen.
- Creates a tree structure that assigns the longest representations to the most rarely used symbols.
6 Huffman example

- First, start with statistics about the occurrence of symbols in the text to be compressed.
- That assumption might not be right for every message.
- Sometimes expressed as percentages, sometimes as relative frequencies.
- A(5) E(7) S(5) I(4) D(4) G(3) N(2) P(2) R(1) W(1)
- We want shorter codes for A and E, and longer codes for R and W, to minimize the overall message length.
- We are saying that, in an analysis of a large body of typical text, we find that the occurrence of E is 7 times more common than the occurrence of W, for example.
7 Constructing the Code

First, combine the least frequently used symbols. The weight (frequency) of the pair (R, W) is 2, and of the pair (N, P) is 4.

[Diagram: two subtrees, (R(1), W(1)) with combined weight 2 and (N(2), P(2)) with combined weight 4.]

Insert the next least frequently used symbol (G, with weight 3). Choose the place to insert the new symbol so as to minimize the total weights produced. The choices: add the 3 to the 2 (giving 5), add the 3 to the 4 (giving 7), or combine the existing 2 and 4 (giving 6) and make a new subtree for the 3.
8 The completed tree

Combining continues until all ten symbols hang from a single tree. The root has weight 34; below it are internal nodes of weight 19 and 15, then 10, 9, and 8, then 5 and 4, then 2, above the leaves A(5) E(7) S(5) I(4) D(4) G(3) N(2) P(2) R(1) W(1).
9 Assigning the code bits

Each branch is labeled 0 or 1; a symbol's code is the sequence of labels on the path from the root to its leaf. The labeled tree (internal nodes shown by weight):

34
  0: 19
    0: 10
      0: 5
        0: G(3)
        1: 2
          0: R(1)
          1: W(1)
      1: A(5)
    1: 9
      0: 4
        0: N(2)
        1: P(2)
      1: S(5)
  1: 15
    0: 8
      0: D(4)
      1: I(4)
    1: E(7)
10 Completed code

- E  11
- A  001
- S  011
- I  101
- D  100
- G  0000
- N  0100
- P  0101
- R  00010
- W  00011
- Average code length: weight each symbol's code length by its frequency (A has weight 5 and length 3, etc.):
  (7)(2) + (5)(3) + (5)(3) + (4)(3) + (4)(3) + (3)(4) + (2)(4) + (2)(4) + (1)(5) + (1)(5) = 106
  106/34 ≈ 3.118 bits per symbol
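For comparison with the hand-built tree, here is a short Python sketch of the greedy construction (repeatedly merge the two lowest-weight subtrees) using a heap. Tie-breaking among equal weights may give different individual codewords than the table above, but any Huffman code for these frequencies has the same average length, about 3.118 bits per symbol. The helper name huffman_codes is illustrative.

    import heapq

    def huffman_codes(freq: dict[str, int]) -> dict[str, str]:
        """Greedy Huffman construction: repeatedly merge the two lowest weights."""
        # Heap entries: (weight, tie_breaker, {symbol: code_within_subtree})
        heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            w1, _, left = heapq.heappop(heap)
            w2, _, right = heapq.heappop(heap)
            # A new node is placed above the two subtrees, so prepend one bit.
            merged = {s: "0" + c for s, c in left.items()}
            merged.update({s: "1" + c for s, c in right.items()})
            heapq.heappush(heap, (w1 + w2, counter, merged))
            counter += 1
        return heap[0][2]

    freq = {'A': 5, 'E': 7, 'S': 5, 'I': 4, 'D': 4,
            'G': 3, 'N': 2, 'P': 2, 'R': 1, 'W': 1}
    codes = huffman_codes(freq)
    for sym in sorted(codes, key=lambda s: (len(codes[s]), s)):
        print(sym, codes[sym])
    avg = sum(freq[s] * len(codes[s]) for s in freq) / sum(freq.values())
    print(f"average code length: {avg:.3f} bits/symbol")   # ~3.118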
11 In-class exercise

- Working in pairs, encode a message of at least 15 letters using the code we just generated.
- Do not leave any spaces between the letters in your message.
- Pass the message to some other team.
- Make sure you give and get a message.
- Decode the message you received.
12 Entropy per symbol

- Entropy, E, is information content.
- Entropy is inversely proportional to the probability of occurrence.
- E = -Σ p_i log2(p_i), summed over i = 1, ..., n, where n is the number of symbols and p_i is the probability of occurrence of the ith symbol.
- This is the lower bound on weighted compression -- the goal to shoot for.
- How well did we do in our code? The entropy is 3.098 bits per symbol, compared to our average of 3.118.
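A quick Python check of both numbers, using the frequencies from the example and the code lengths from the completed-code slide:

    import math

    # Frequencies from the example; 34 symbols in total.
    freq = {'A': 5, 'E': 7, 'S': 5, 'I': 4, 'D': 4,
            'G': 3, 'N': 2, 'P': 2, 'R': 1, 'W': 1}
    total = sum(freq.values())

    # Entropy: E = -sum of p_i * log2(p_i) over all symbols.
    entropy = -sum((f / total) * math.log2(f / total) for f in freq.values())
    print(f"entropy:             {entropy:.3f} bits/symbol")   # ~3.098

    # Weighted average length of the completed code.
    length = {'E': 2, 'A': 3, 'S': 3, 'I': 3, 'D': 3,
              'G': 4, 'N': 4, 'P': 4, 'R': 5, 'W': 5}
    average = sum(freq[s] * length[s] for s in freq) / total
    print(f"average code length: {average:.3f} bits/symbol")   # ~3.118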
13 Properties of the Huffman code

- Variable-length code.
- Prefix property: no codeword is a prefix of another, so a bit stream can be decoded unambiguously (a short decoding sketch follows this list).
- Average bits per symbol approaches the entropy.
- Huffman codes approach the theoretical limit for the amount of information per symbol.
- Static coding: the code must be known by both sender and receiver and used consistently.
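As a small illustration of the prefix property, the sketch below decodes a bit stream with the completed code by scanning left to right; because no codeword is a prefix of another, the first match is always a complete symbol. The message SAID is an arbitrary example.

    # Code table from the completed-code slide.
    code = {'E': '11', 'A': '001', 'S': '011', 'I': '101', 'D': '100',
            'G': '0000', 'N': '0100', 'P': '0101', 'R': '00010', 'W': '00011'}
    decode_table = {bits: sym for sym, bits in code.items()}

    def decode(bitstream: str) -> str:
        out, buf = [], ""
        for bit in bitstream:
            buf += bit
            if buf in decode_table:      # prefix property: a match is a whole symbol
                out.append(decode_table[buf])
                buf = ""
        assert buf == "", "bit stream ended in the middle of a codeword"
        return "".join(out)

    message = "SAID"
    encoded = "".join(code[c] for c in message)
    print(encoded)           # 011001101100
    print(decode(encoded))   # SAID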
14 Dynamic Huffman Code

- Build the code as the message is transmitted.
- The code will be the best for this particular message.
- Sender and receiver use the same rules for building the code.
15 Constructing the tree

- Sender and receiver begin with an initial tree consisting of a root node and a left child holding a null character with weight 0.
- The first character is sent uncompressed and is added to the tree as the right branch from the root. The new node is labeled with the character, its weight is 1, and the tree branch is labeled 1 as well.
- A list shows the tree entries in order.
16 Example

- Initial tree: a root r with a single left child, the null node (0).
- Transmit b: b is added as the right child of the root as b(1). The weight (1) is the number of times that character has occurred so far.
- List version of the tree: (0) b(1)
17 A new character seen

- Whenever a new character appears in the message, it is sent as follows:
  - send the path to the empty node
  - send the uncompressed representation of the new character
- Place the new character into the tree and update the list representation.
- For the a in "ba", the null node moves down to make room for the new node as its sibling: the null node and a(1) become children of a new internal node of weight 1 that takes the null node's former place, while b(1) remains a child of the root.
- The list is formed by reading the tree left to right, bottom level to top level: (0) a(1) 1 b(1)
- Message so far: ba
18 Another character

- Transmitting the n of "ban" the same way gives the list: (0) n(1) 1 a(1) 2 b(1)
- The list entries are not in non-decreasing order. Adjust the list and show the corresponding tree.
- The b(1) leaf and the weight-2 subtree (holding the null node, n, and a) trade places, giving the list: (0) n(1) 1 a(1) b(1) 2
- (Note: all left branches are coded as 1, all right branches as 0.)
- Message so far: ban
19 Our first repeated character

- The second a raises a's weight to 2, giving the list: (0) n(1) 1 a(2) b(1) 3
- Again there is a problem: the numbers in the list do not obey the requirement of non-decreasing order.
- Adjust the list and make the tree match: a(2) moves up to the root level and b(1) moves down into the subtree, giving the list: (0) n(1) 1 b(1) a(2) 2
- Note that the 3 changed to a 2 as a result of the tree restructuring.
- Message so far: bana
20 Another repeat

- The code sent for this n will be 101, corresponding to the original position of n. Then the restructuring is done.
- Raising n's weight to 2 gives the list: (0) n(2) 2 b(1) a(2) 3
- Another misfit: b and n must trade places.
- The adjusted list: (0) b(1) 1 n(2) a(2) 3
- Message so far: banan
21 One more letter

- This a is encoded as 0. No restructuring of the tree is needed.
- The list: (0) b(1) 1 n(2) a(3) 3
- Message so far: banana
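The slides update the tree incrementally, swapping nodes whenever the list order is violated. The sketch below is not that incremental algorithm: it simply rebuilds an ordinary static Huffman code from the counts seen so far after each character of "banana", which only illustrates how the codeword lengths adapt toward the more frequent letters. The helper name static_codes is illustrative.

    import heapq
    from collections import Counter

    def static_codes(freq: dict[str, int]) -> dict[str, str]:
        """Ordinary (static) Huffman construction, rebuilt from scratch each time."""
        if len(freq) == 1:                  # only one symbol seen so far
            return {next(iter(freq)): "0"}
        heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
        heapq.heapify(heap)
        nxt = len(heap)
        while len(heap) > 1:
            w1, _, a = heapq.heappop(heap)
            w2, _, b = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in a.items()}
            merged.update({s: "1" + c for s, c in b.items()})
            heapq.heappush(heap, (w1 + w2, nxt, merged))
            nxt += 1
        return heap[0][2]

    # Recompute the code after every symbol to watch it adapt.
    counts = Counter()
    for ch in "banana":
        counts[ch] += 1
        print(ch, dict(counts), static_codes(counts))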
22 In-class exercise

- Create the dynamic Huffman code for the message "Tennessee".
23 Summary

- Compression seeks to minimize the amount of transmission by making efficient representations for the data.
- Static compression keeps the same codes and depends on consistency in the distribution of the characters to be coded.
- Dynamic compression adjusts as it works to allow the most efficient compression for the current message.
24 Some extra resources

- Huffman coding resources
  - http://www.dogma.net/DataCompression/Huffman.shtml
- Final note: David Huffman died October 7, 1999, at age 74.

Huffman is probably best known for the development of the Huffman Coding Procedure, the result of a term paper he wrote while a graduate student at the Massachusetts Institute of Technology (MIT). "Huffman Codes" are used in nearly every application that involves the compression and transmission of digital data, such as fax machines, modems, computer networks, and high-definition television.
http://www.ucsc.edu/currents/99-00/10-11/huffman.html