Representation of Strings - PowerPoint PPT Presentation


Slides: 11
Provided by: mun113

Transcript and Presenter's Notes


1
Huffman Encodings (Section 9.4)
2
Data Compression: Array Representation
  • S denotes an alphabet used for all strings
  • Each element in S is called a character
  • Typical representation: contiguous memory
  • The bit sequence representing a character is
    called its encoding
  • Number of bit sequences of length n? 2^n
  • Number of bits needed to represent each character of S? ⌈log2 |S|⌉
Data compression problem: given a string w over
S, store it using as few bits as possible in such
a way that it can be recovered at will
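As a baseline for the compression problem, the fixed-length representation can be sketched as follows. This is a minimal illustration, not part of the original slides: the function names and the five-character example alphabet are assumptions chosen for the demo.

```python
import math

def fixed_length_encode(w, alphabet):
    """Encode w using the same number of bits for every character of the alphabet."""
    # ceil(log2 |S|) bits per character; at least 1 bit so a one-character alphabet still works
    bits = max(1, math.ceil(math.log2(len(alphabet))))
    index = {ch: i for i, ch in enumerate(alphabet)}
    encoded = "".join(format(index[ch], f"0{bits}b") for ch in w)
    return encoded, bits

def fixed_length_decode(encoded, alphabet, bits):
    """Recover the string by reading the bit string in chunks of `bits` bits."""
    return "".join(
        alphabet[int(encoded[i:i + bits], 2)]
        for i in range(0, len(encoded), bits)
    )

alphabet = "ABCDE"                       # hypothetical alphabet S with |S| = 5
encoded, bits = fixed_length_encode("BADE", alphabet)
print(bits, encoded)                     # 3 001000011100
print(fixed_length_decode(encoded, alphabet, bits))  # BADE
```

Every string of length m costs exactly m · ⌈log2 |S|⌉ bits here, regardless of which characters it contains; that is the figure a compression scheme tries to beat.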
3
Motivation for the Solution
For representing strings we want to take
advantage of the fact that not all characters
occur with the same frequency
Example: FitalyStamp. Do you have what it takes
to type 50 words per minute in your palm
organizer?
If only a subset S' of S is actually used in w, we
could represent the strings using ⌈log2 |S'|⌉ bits per character
Problems:
  • We need to know S' in advance
  • It doesn't account for ranking of occurrences
  • Improvement only if ⌈log2 |S'|⌉ < ⌈log2 |S|⌉
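The improvement condition can be checked with a few lines of code. The sketch below is illustrative only: the function name is hypothetical, and the alphabet sizes (256 and 8) are assumptions picked to show one case that helps and one that does not.

```python
import math

def subset_bits(w, alphabet_size):
    """Return (bits per char for the full alphabet,
               bits per char for the characters actually used in w,
               whether restricting to the used subset saves anything)."""
    used = set(w)                                   # the subset of S occurring in w
    full = math.ceil(math.log2(alphabet_size))
    sub = math.ceil(math.log2(len(used)))
    return full, sub, sub < full

# "AIDA FAN" uses 6 distinct characters (A, I, D, space, F, N).
print(subset_bits("AIDA FAN", 256))   # (8, 3, True)
# 5 used characters out of an 8-character alphabet: both need 3 bits, no gain.
print(subset_bits("ABCDE", 8))        # (3, 3, False)
```

The second call shows why the condition matters: dropping unused characters only helps when it crosses a power-of-two boundary.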

4
Encoding Trees
Idea: use different lengths to encode members of S
Potential problem: with E = 101, T = 110 and Q = 101110,
the bits 101110 can be read either as Q or as ET
Solution: no encoding of a character can be a
prefix of the encoding of another character
Suppose that I = 0000, V = 0001, M = 0010, U = 0011,
D = 010, H = 0110, N = 0111, A = 10, ␣ (space) = 110, F = 111
Question: how do we represent these codes in
a binary tree?
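The prefix condition is easy to test mechanically. The following sketch (function name hypothetical) checks both code tables from this slide; it assumes the unnamed symbol with code 110 is the space character.

```python
def is_prefix_free(codes):
    """True if no code word is a prefix of another code word.

    After sorting, a word that is a prefix of some other word is also a
    prefix of its immediate successor, so comparing neighbours suffices.
    """
    words = sorted(codes.values())
    return all(not nxt.startswith(cur) for cur, nxt in zip(words, words[1:]))

ambiguous = {"E": "101", "T": "110", "Q": "101110"}
slide_codes = {
    "I": "0000", "V": "0001", "M": "0010", "U": "0011",
    "D": "010", "H": "0110", "N": "0111",
    "A": "10", " ": "110", "F": "111",   # " " for the 110 code is an assumption
}

print(is_prefix_free(ambiguous))    # False: 101 (E) is a prefix of 101110 (Q)
print(is_prefix_free(slide_codes))  # True
```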
5
Encoding Trees
[Figure: binary encoding tree with leaves A, D, F, H, N, V, M, U, I]
Encoding trees can always be assumed to be full!
6
Decoding with Encoding Trees
AIDA FAN = 10000001010110111100111
Procedure TreeDecode(pointer T, bitstream b)
  P ← T
  while not Empty(b) do
    if NextBit(b) = 0 then
      P ← LC(P)
    else
      P ← RC(P)
    if isLeaf(P) then
      print(value(P))
      P ← T
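A runnable version of the decoding procedure might look like the sketch below. The nested-dictionary tree and the helper names are assumptions made for illustration, not the slides' data structure; the code table is the one from slide 4, with the 110 code taken to be the space character. The bit string is produced by concatenating the codes for A, I, D, A, space, F, A, N.

```python
def build_tree(codes):
    """Build the encoding tree as nested dicts: '0' is the left child, '1' the right.
    Leaves are the characters themselves."""
    root = {}
    for ch, code in codes.items():
        node = root
        for bit in code[:-1]:
            node = node.setdefault(bit, {})
        node[code[-1]] = ch
    return root

def tree_decode(root, bits):
    """Walk down from the root one bit at a time; emit a character at each leaf
    and restart from the root, as in TreeDecode."""
    out = []
    node = root
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):   # reached a leaf
            out.append(node)
            node = root
    return "".join(out)

CODES = {
    "I": "0000", "V": "0001", "M": "0010", "U": "0011",
    "D": "010", "H": "0110", "N": "0111",
    "A": "10", " ": "110", "F": "111",
}

bits = "".join(CODES[ch] for ch in "AIDA FAN")
print(bits)                                   # 10000001010110111100111
print(tree_decode(build_tree(CODES), bits))   # AIDA FAN
```

Because the code is prefix-free, the decoder never needs to look ahead: the first leaf it reaches is always the right one.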
How to generate encoding trees?
7
Constructing Encoding Trees
Example: f(A) = 0.35, f(B) = 0.1, f(C) = 0.2, f(D) = 0.2,
f(E) = 0.15
Many trees are possible (a combinatorial number). We
would like the one that has minimum cost
Notation: L(T) is the set of all leaves in T;
c(n) is the cost or weight of node n
Idea 0: use exhaustive search to find the tree
with minimum cost
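To make Idea 0 concrete, here is a small sketch that considers every way of splitting the leaves into a left and a right subtree. It assumes the cost of a tree is its weighted path length, the sum over all leaves n in L(T) of c(n) times the depth of n; the function name is hypothetical, the weights are the example frequencies scaled to integers (35, 10, 20, 20, 15) to avoid rounding, and results are memoized per leaf set so repeated subproblems are not recomputed.

```python
from functools import lru_cache
from itertools import combinations

def exhaustive_min_wpl(weights):
    """Minimum weighted path length over all full binary trees on the given leaves."""

    @lru_cache(maxsize=None)
    def best(leaves):
        # `leaves` is a frozenset of indices into `weights`
        if len(leaves) == 1:
            return 0                      # a lone leaf sits at depth 0 of its subtree
        total = sum(weights[i] for i in leaves)
        first, *rest = sorted(leaves)
        candidates = []
        # Keep `first` on the left so mirrored splits are not counted twice;
        # k < len(rest) guarantees the right side is never empty.
        for k in range(len(rest)):
            for extra in combinations(rest, k):
                left = frozenset((first, *extra))
                candidates.append(best(left) + best(leaves - left))
        # Putting both subtrees under a new parent pushes every leaf one level deeper.
        return total + min(candidates)

    return best(frozenset(range(len(weights))))

print(exhaustive_min_wpl((35, 10, 20, 20, 15)))   # 225, i.e. 2.25 bits per character
```

The number of leaf subsets grows exponentially with the alphabet size, which is why this approach only works for very small alphabets.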
8
Idea 1: Huffman Encoding Tree
For each character c we know the frequency f(c) with
which c occurs in w
Construction method:
  • Create one node for each character c in S with
    weight f(c) (each of these nodes will be a leaf in
    the tree)
  • Repeat the following steps while more than one node
    has no parent:
    • Pick two nodes n1 and n2 with smallest weight and
      without parent
    • Create a new parent node for n1 and n2 with
      weight weight(n1) + weight(n2)
Eventually only two nodes without a parent remain; their parent node
(the root) is created and the loop ends
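The construction method can be sketched with a priority queue holding the parentless nodes. This is one possible implementation under stated assumptions, not the slides' own code: the function name, the tuple-based tree and the tie-breaking counter are illustrative choices, the input is assumed to contain at least two characters, and the frequencies from slide 7 are scaled to integers.

```python
import heapq
from itertools import count

def huffman_codes(freq):
    """Return {character: bit string} for a Huffman tree built from `freq`."""
    tie = count()   # unique counter so the heap never has to compare tree nodes
    # A node is either a character (leaf) or a (left, right) tuple (internal node).
    heap = [(w, next(tie), ch) for ch, w in freq.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        # Pick the two parentless nodes with the smallest weight ...
        w1, _, n1 = heapq.heappop(heap)
        w2, _, n2 = heapq.heappop(heap)
        # ... and create their parent with the combined weight.
        heapq.heappush(heap, (w1 + w2, next(tie), (n1, n2)))

    _, _, root = heap[0]
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix

    walk(root, "")
    return codes

freq = {"A": 35, "B": 10, "C": 20, "D": 20, "E": 15}
codes = huffman_codes(freq)
print({ch: len(code) for ch, code in codes.items()})
print(sum(freq[ch] * len(codes[ch]) for ch in freq))   # 225
```

For these frequencies B and E end up three levels deep and A, C and D two levels deep, so the average cost is 2.25 bits per character instead of the 3 bits a fixed-length code would need.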
9
Properties of Huffman Encoding Trees
Characters with higher frequency are placed
nearer the root; thus
they have shorter encodings!
Theorem. Let N be a set of nodes and c(n) the
weight of each node n in N. Let T be a Huffman
encoding tree for N. If X is any other encoding
tree for N, then WPL(T) ≤ WPL(X), where WPL
denotes the weighted path length
Is the Huffman method for generating the encoding
trees greedy?
Yes!
10
Compression Ratio
Compression ratio (CR): ⌈log2 |S|⌉ is to 100
as (⌈log2 |S|⌉ − WPL(T)) is to the CR, that is,
CR = 100 · (⌈log2 |S|⌉ − WPL(T)) / ⌈log2 |S|⌉
Huffman compression ratio falls between 20% and
80%
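As a worked example of the proportion above, the sketch below computes the compression ratio for the five-character example from slide 7. The function name is hypothetical and the alphabet is assumed to have at least two characters. For those frequencies a Huffman tree places B and E at depth 3 and A, C, D at depth 2, giving WPL(T) = 0.35·2 + 0.1·3 + 0.2·2 + 0.2·2 + 0.15·3 = 2.25 bits per character.

```python
import math

def compression_ratio(alphabet_size, wpl):
    """Percentage of bits saved relative to a fixed-length code.

    alphabet_size: |S|, assumed to be at least 2
    wpl: weighted path length of the encoding tree, in bits per character
    """
    fixed_bits = math.ceil(math.log2(alphabet_size))
    return 100 * (fixed_bits - wpl) / fixed_bits

print(compression_ratio(5, 2.25))   # 25.0
```

A fixed-length code needs 3 bits per character for this alphabet, so saving 0.75 bits per character is a 25% reduction.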