Title: Compression: objectives, methods
1. Compression: objectives, methods
- Objective is throughput maximization
- Compression ratio, information transfer rate measure efficiency
  - CR = length of original string / length of compressed string
- Two primary approaches
  - Redundancy reduction (lossless techniques)
    - Compression codes based on fixed bit size
    - Compression codes based on probability of occurrence
  - Entropy reduction (potentially lossy techniques)
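As a small illustration of the compression ratio defined above, the sketch below divides the original length by the compressed length. The function name `compression_ratio` and the choice to measure both lengths in bits are assumptions made for this example only.

```python
def compression_ratio(original_bits: int, compressed_bits: int) -> float:
    """CR = length of original string / length of compressed string."""
    if compressed_bits <= 0:
        raise ValueError("compressed length must be positive")
    return original_bits / compressed_bits


# A 1,000-bit message that compresses to 400 bits has CR = 2.5
print(compression_ratio(1000, 400))
```

A ratio above 1 means the compressed string is shorter than the original.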
2. Fixed bit size coding (1)
- Null suppression
  - If > 3 blanks, sends special character and blank count
- Run length encoding
  - If > 4 of any character, sends special character, character, and character count
- Diatomic encoding
  - Most commonly encountered pairs encoded as single characters
- Pattern substitution
  - Character patterns of any length are substituted by codes
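The run length encoding rule above can be sketched as follows. This is only an illustration: the marker character `#`, the function name `rle_encode`, and writing the count as decimal digits are assumptions, and the sketch presumes the marker and digits do not occur in the data itself.

```python
from itertools import groupby


def rle_encode(text: str, threshold: int = 4, marker: str = "#") -> str:
    """Replace each run longer than `threshold` with marker + character + count."""
    out = []
    for ch, group in groupby(text):
        run = len(list(group))
        if run > threshold:
            # special character, the repeated character, then the count
            out.append(f"{marker}{ch}{run}")
        else:
            # short runs are sent unchanged
            out.append(ch * run)
    return "".join(out)


print(rle_encode("AAAAAABCCDDDDD"))  # #A6BCC#D5
```

Null suppression is the same idea restricted to blanks, with a threshold of 3 and no need to send the repeated character.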
3. Fixed bit size coding (2)
- Bit mapping
  - Special bitmap character shows position of repeating characters; sends other characters after bitmap
- Half-byte packing
  - Common strings in bit structure eliminated (packed decimal)
- Relative encoding
  - Sends only changed part of data string, as in telemetry data
- Forms-mode operation
  - Only variable portions on screen sent
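Half-byte packing can be illustrated with decimal digits: each digit needs only 4 bits, so two digits fit in one byte. The sketch below is one possible packing, not a specific standard; the names `pack_digits` and `unpack_digits` and the use of 0xF as a filler nibble for odd-length input are assumptions.

```python
def pack_digits(digits: str) -> bytes:
    """Pack a string of decimal digits two per byte (packed decimal)."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected a non-empty string of decimal digits")
    nibbles = [int(d) for d in digits]
    if len(nibbles) % 2:
        nibbles.append(0xF)  # filler nibble (assumption for this sketch)
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def unpack_digits(packed: bytes) -> str:
    """Reverse pack_digits, skipping the filler nibble."""
    out = []
    for byte in packed:
        for nibble in (byte >> 4, byte & 0xF):
            if nibble != 0xF:
                out.append(str(nibble))
    return "".join(out)


packed = pack_digits("12345")
print(packed.hex())           # 12345f
print(unpack_digits(packed))  # 12345
```

Five digits sent as 8-bit characters take 5 bytes; packed, they take 3.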
4. Fixed bit size coding: LZW
- Based on work of J. Ziv and A. Lempel
- Most widely used fixed bit size compression algorithm
  - Used in standards such as V.42bis
  - Most major compression algorithms use some variation of Lempel-Ziv approach
- Dynamic pattern substitution algorithm
  - Objective is to substitute variable length patterns with fixed-length codes
  - Uses a dictionary of substitutions that is continuously updated
5. How LZW works: definitions
- Terminology
  - Input
    - Character: an encoded character (e.g., ASCII character)
    - String/pattern: a series of characters
    - Charstream: stream of characters to be compressed
  - Output
    - Codeword: bit stream representing a string of characters
    - Dictionary/string table: stores strings, corresponding codewords
    - Codestream: stream of codewords produced after compression
- Values stored by LZ program
  - Sizes: character size, codeword size, max dictionary size
  - Positions: next available unused codeword; first codeword used to represent a string of > 1 character
6. How LZW works: dictionary
- If codeword size = 10 bits, then 2^10 = 1,024 codewords can be stored in the dictionary/string table
- If character size = 8 bits, then 2^8 = 256 character-based codewords are initial dictionary entries
- As slots 0-255 are occupied by single characters, next available codeword not yet used is 256
  - This is the same as the first codeword used to represent a string of more than one character
- Each character-based codeword (i.e., one-character string) is the root of a codeword 'tree'
  - Children are strings that begin with the root character
  - If root is B (codeword = 66), child may be BA with codeword = 256
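The dictionary sizes above follow directly from the character and codeword sizes. A minimal sketch, assuming a Python dict as the string table; the function name `initial_dictionary` and its return shape are illustrative only.

```python
def initial_dictionary(char_bits: int = 8, code_bits: int = 10):
    """Build the initial string table and the two stored positions/sizes."""
    # one single-character root entry per possible character value
    table = {chr(i): i for i in range(2 ** char_bits)}
    next_code = 2 ** char_bits   # first codeword for a multi-character string
    max_codes = 2 ** code_bits   # total codewords the table can hold
    return table, next_code, max_codes


table, next_code, max_codes = initial_dictionary()
print(len(table), table["B"], next_code, max_codes)  # 256 66 256 1024
```

With these sizes, 1,024 - 256 = 768 slots remain for multi-character strings such as BA.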
7. How LZW works: encoding
Initial string table: 0 A, 1 B, 2 C, 3 D, 4-15 (empty)
- 1. Initialize string table
- 2. X ← empty
- 3. Y ← next character in charstream
- 4. Is XY in string table?
  - if yes: X ← XY
  - if no: add XY to string table
    - output code for X to codestream
    - X ← Y
- Go to 3
8. How LZW works: encoding
String table: 0 A, 1 B, 2 C, 3 D, 4-15 (empty)
Charstream: ABACABADABAD
- X = <empty>, Y = A, XY = A
- Is XY in the string table? Yes
  - X ← A
Codestream: (empty)
9. How LZW works: encoding
String table: 0 A, 1 B, 2 C, 3 D, 4 AB, 5-15 (empty)
Charstream: ABACABADABAD
- X = A, Y = B, XY = AB
- Is XY in the string table? No
  - Add XY to the string table (4 = AB)
  - Output X to codestream (A = 0 in string table; output 0)
  - X ← B
Codestream: 0
10. How LZW works: encoding
String table: 0 A, 1 B, 2 C, 3 D, 4 AB, 5 BA, 6-15 (empty)
Charstream: ABACABADABAD
- X = B, Y = A, XY = BA
- Is XY in the string table? No
  - Add XY to the string table (5 = BA)
  - Output X to codestream (B = 1 in string table; output 1)
  - X ← A
Codestream: 0 1
11. How LZW works: encoding
String table: 0 A, 1 B, 2 C, 3 D, 4 AB, 5 BA, 6 AC, 7-15 (empty)
Charstream: ABACABADABAD
- X = A, Y = C, XY = AC
- Is XY in the string table? No
  - Add XY to the string table (6 = AC)
  - Output X to codestream (A = 0 in string table; output 0)
  - X ← C
Codestream: 0 1 0
12. How LZW works: encoding
String table: 0 A, 1 B, 2 C, 3 D, 4 AB, 5 BA, 6 AC, 7 CA, 8-15 (empty)
Charstream: ABACABADABAD
- X = C, Y = A, XY = CA
- Is XY in the string table? No
  - Add XY to the string table (7 = CA)
  - Output X to codestream (C = 2 in string table; output 2)
  - X ← A
Codestream: 0 1 0 2
13. How LZW works: encoding
String table: 0 A, 1 B, 2 C, 3 D, 4 AB, 5 BA, 6 AC, 7 CA, 8-15 (empty)
Charstream: ABACABADABAD
- X = A, Y = B, XY = AB
- Is XY in the string table? Yes
  - X ← AB
Codestream: 0 1 0 2
14. How LZW works: encoding
String table: 0 A, 1 B, 2 C, 3 D, 4 AB, 5 BA, 6 AC, 7 CA, 8 ABA, 9-12 (empty)
Charstream: ABACABADABAD
- X = AB, Y = A, XY = ABA
- Is XY in the string table? No
  - Add XY to the string table (8 = ABA)
  - Output X to codestream (AB = 4 in string table; output 4)
  - X ← A
Codestream: 0 1 0 2 4
15. How LZW works: encoding
String table: 0 A, 1 B, 2 C, 3 D, 4 AB, 5 BA, 6 AC, 7 CA, 8 ABA, 9 AD, 10-12 (empty)
Charstream: ABACABADABAD
- X = A, Y = D, XY = AD
- Is XY in the string table? No
  - Add XY to the string table (9 = AD)
  - Output X to codestream (A = 0 in string table; output 0)
  - X ← D
Codestream: 0 1 0 2 4 0
16. How LZW works: encoding
String table: 0 A, 1 B, 2 C, 3 D, 4 AB, 5 BA, 6 AC, 7 CA, 8 ABA, 9 AD, 10 DA, 11-12 (empty)
Charstream: ABACABADABAD
- X = D, Y = A, XY = DA
- Is XY in the string table? No
  - Add XY to the string table (10 = DA)
  - Output X to codestream (D = 3 in string table; output 3)
  - X ← A
Codestream: 0 1 0 2 4 0 3
17. How LZW works: encoding
String table: 0 A, 1 B, 2 C, 3 D, 4 AB, 5 BA, 6 AC, 7 CA, 8 ABA, 9 AD, 10 DA, 11-12 (empty)
Charstream: ABACABADABAD
- X = A, Y = B, XY = AB
- Is XY in the string table? Yes
  - X ← AB
Codestream: 0 1 0 2 4 0 3
18. How LZW works: encoding
String table: 0 A, 1 B, 2 C, 3 D, 4 AB, 5 BA, 6 AC, 7 CA, 8 ABA, 9 AD, 10 DA, 11-12 (empty)
Charstream: ABACABADABAD
- X = AB, Y = A, XY = ABA
- Is XY in the string table? Yes
  - X ← ABA
Codestream: 0 1 0 2 4 0 3
19. How LZW works: encoding
String table: 0 A, 1 B, 2 C, 3 D, 4 AB, 5 BA, 6 AC, 7 CA, 8 ABA, 9 AD, 10 DA, 11 ABAD, 12 (empty)
Charstream: ABACABADABAD
- X = ABA, Y = D, XY = ABAD
- Is XY in the string table? No
  - Add XY to the string table (11 = ABAD)
  - Output X to codestream (ABA = 8 in string table; output 8)
  - X ← D
Codestream: 0 1 0 2 4 0 3 8
20. How LZW works: encoding
String table: 0 A, 1 B, 2 C, 3 D, 4 AB, 5 BA, 6 AC, 7 CA, 8 ABA, 9 AD, 10 DA, 11 ABAD, 12 (empty)
Charstream: ABACABADABAD
- X = D, Y = <empty>, XY = D
- Is XY in the string table? Yes
  - As Y is empty, output code for XY to codestream (3)
  - End
Codestream: 0 1 0 2 4 0 3 8 3
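The encoding walkthrough above can be reproduced with a short sketch. It follows the X/Y steps as stated on the slides, but several details are assumptions for illustration: the function name `lzw_encode`, passing the initial alphabet as a string, returning the table alongside the codestream, and the absence of any maximum dictionary size. Every input character is assumed to appear in the alphabet.

```python
def lzw_encode(charstream: str, alphabet: str):
    """Encode charstream with LZW, starting from one table entry per alphabet symbol."""
    table = {ch: code for code, ch in enumerate(alphabet)}  # initialize string table
    next_code = len(table)
    codestream = []
    x = ""                                # X <- empty
    for y in charstream:                  # Y <- next character in charstream
        if x + y in table:                # is XY in the string table?
            x = x + y                     # yes: X <- XY
        else:
            table[x + y] = next_code      # no: add XY to the string table
            next_code += 1
            codestream.append(table[x])   # output code for X
            x = y                         # X <- Y
    if x:
        codestream.append(table[x])       # charstream exhausted: flush X
    return codestream, table


codes, table = lzw_encode("ABACABADABAD", "ABCD")
print(codes)          # [0, 1, 0, 2, 4, 0, 3, 8, 3]
print(table["ABAD"])  # 11
```

The 12-character charstream becomes 9 codewords, and the table ends with the same entries 4 to 11 shown on the slides.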
21. How LZW works: decoding
Initial string table: 0 A, 1 B, 2 C, 3 D, 4-15 (empty)
- Initialize string table
- Get first code: P
- Output string for P to charstream
- Q ← P
- P ← next code in codestream
- Does P exist in string table?
  - if yes: Output string for P to charstream
    - R ← translation for Q
    - S ← first character of translation for P
    - add RS to string table
    - Q ← P
  - if no: R ← translation for Q
    - S ← first character of R
    - Output RS to charstream and add to string table
    - Q ← P
- Go to "P ← next code in codestream"
22. How LZW works: decoding
Initial string table: 0 A, 1 B, 2 C, 3 D, 4-15 (empty)
Codestream: 0 1 0 2 4 0 3 8 3
- Initial steps
  - P = 0; output A (string for 0) to charstream
  - Q = 0
  - P = 1
Charstream: A
23. How LZW works: decoding
String table: 0 A, 1 B, 2 C, 3 D, 4 AB, 5-15 (empty)
Codestream: 0 1 0 2 4 0 3 8 3
- P = 1, Q = 0
- 1 exists in the string table
  - Output B to charstream
  - R = A (translation for Q)
  - S = B (first character of translation for P)
  - Add AB to the string table
  - Q = 1
Charstream: AB
24. How LZW works: decoding
String table: 0 A, 1 B, 2 C, 3 D, 4 AB, 5 BA, 6-15 (empty)
Codestream: 0 1 0 2 4 0 3 8 3
- P = 0, Q = 1
- 0 exists in the string table
  - Output A to charstream
  - R = B (translation for Q)
  - S = A (first character of translation for P)
  - Add BA to the string table
  - Q = 0
Charstream: ABA
25. How LZW works: decoding
String table: 0 A, 1 B, 2 C, 3 D, 4 AB, 5 BA, 6 AC, 7-15 (empty)
Codestream: 0 1 0 2 4 0 3 8 3
- P = 2, Q = 0
- 2 exists in the string table
  - Output C to charstream
  - R = A (translation for Q)
  - S = C (first character of translation for P)
  - Add AC to the string table
  - Q = 2
Charstream: ABAC
26. How LZW works: decoding
String table: 0 A, 1 B, 2 C, 3 D, 4 AB, 5 BA, 6 AC, 7 CA, 8-15 (empty)
Codestream: 0 1 0 2 4 0 3 8 3
- P = 4, Q = 2
- 4 exists in the string table
  - Output AB to charstream
  - R = C (translation for Q)
  - S = A (first character of translation for P)
  - Add CA to the string table
  - Q = 4
Charstream: ABACAB
27. How LZW works: decoding
String table: 0 A, 1 B, 2 C, 3 D, 4 AB, 5 BA, 6 AC, 7 CA, 8 ABA, 9-15 (empty)
Codestream: 0 1 0 2 4 0 3 8 3
- P = 0, Q = 4
- 0 exists in the string table
  - Output A to charstream
  - R = AB (translation for Q)
  - S = A (first character of translation for P)
  - Add ABA to the string table
  - Q = 0
Charstream: ABACABA
28. How LZW works: decoding
String table: 0 A, 1 B, 2 C, 3 D, 4 AB, 5 BA, 6 AC, 7 CA, 8 ABA, 9 AD, 10-12 (empty)
Codestream: 0 1 0 2 4 0 3 8 3
- P = 3, Q = 0
- 3 exists in the string table
  - Output D to charstream
  - R = A (translation for Q)
  - S = D (first character of translation for P)
  - Add AD to the string table
  - Q = 3
Charstream: ABACABAD
29. How LZW works: decoding
String table: 0 A, 1 B, 2 C, 3 D, 4 AB, 5 BA, 6 AC, 7 CA, 8 ABA, 9 AD, 10 DA, 11-12 (empty)
Codestream: 0 1 0 2 4 0 3 8 3
- P = 8, Q = 3
- 8 exists in the string table
  - Output ABA to charstream
  - R = D (translation for Q)
  - S = A (first character of translation for P)
  - Add DA to the string table
  - Q = 8
Charstream: ABACABADABA
30. How LZW works: decoding
String table: 0 A, 1 B, 2 C, 3 D, 4 AB, 5 BA, 6 AC, 7 CA, 8 ABA, 9 AD, 10 DA, 11 ABAD, 12 (empty)
Codestream: 0 1 0 2 4 0 3 8 3
- P = 3, Q = 8
- 3 exists in the string table
  - Output D to charstream
  - R = ABA (translation for Q)
  - S = D (first character of translation for P)
  - Add ABAD to the string table
  - Q = 3
- No more codes available for P: end of procedure
Charstream: ABACABADABAD
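The decoding walkthrough above can be sketched in the same way, using the P, Q, R, S names from the slides. The function name `lzw_decode` and passing the initial alphabet as a string are assumptions for illustration; the sketch trusts the codestream to be valid and does not limit the dictionary size.

```python
def lzw_decode(codestream, alphabet: str) -> str:
    """Rebuild the charstream, regrowing the string table as codes arrive."""
    table = {code: ch for code, ch in enumerate(alphabet)}  # initialize string table
    next_code = len(table)
    codes = iter(codestream)
    try:
        q = next(codes)                  # first code
    except StopIteration:
        return ""
    out = [table[q]]                     # output string for the first code
    for p in codes:                      # P <- next code in codestream
        r = table[q]                     # R <- translation for Q
        if p in table:
            s = table[p][0]              # S <- first character of translation for P
            out.append(table[p])
        else:
            s = r[0]                     # S <- first character of R
            out.append(r + s)            # code not in table yet: output RS
        table[next_code] = r + s         # add RS to the string table
        next_code += 1
        q = p                            # Q <- P
    return "".join(out)


print(lzw_decode([0, 1, 0, 2, 4, 0, 3, 8, 3], "ABCD"))  # ABACABADABAD
print(lzw_decode([0, 1, 0], "A"))                        # AAAA
```

The second call exercises the "if no" branch, which the slide example never reaches: code 1 arrives before the decoder has added it, so the decoder builds it from R and the first character of R.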
31. Probability coding: Huffman
- Fewer bits assigned to more frequent characters
- Similar method is Shannon-Fano
- No code can prefix another: instantly decodable
- Technique
  - Identify probabilities of occurrence of all characters
  - Arrange characters in descending order of probability
  - Draw branch between lowest two probabilities, write sum
  - Continue until top (most probable) character is reached
  - For each branch, assign 0 to the top, 1 to the bottom
  - Follow branches right to left; note string of 0s and 1s created
  - Resultant strings assigned to characters
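32. Huffman example (1)
Characters and probabilities:
- A p = 0.34
- B p = 0.08
- C p = 0.04
- D p = 0.29
- E p = 0.11
- F p = 0.03
- G p = 0.05
- H p = 0.06
Arranged in descending order of probability:
- A p = 0.34
- D p = 0.29
- E p = 0.11
- B p = 0.08
- H p = 0.06
- G p = 0.05
- C p = 0.04
- F p = 0.03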
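33. Huffman example (2)
Huffman tree (figure): probabilities, branch sums and resulting codes
- A p = 0.34, code 00
- D p = 0.29, code 01
- E p = 0.11, code 100
- B p = 0.08, code 110
- H p = 0.06, code 1010
- G p = 0.05, code 1011
- C p = 0.04, code 1110
- F p = 0.03, code 1111
- Branch sums: C + F = 0.07; H + G = 0.11; B + 0.07 = 0.15; E + 0.11 = 0.22; 0.22 + 0.15 = 0.37; A + D = 0.63; 0.63 + 0.37 = 1.00
- String: DADABEADGAAF
  - No compression: 3 × 12 = 36 bits
  - Huffman compression: 30 bits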
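The Huffman example can be checked with a small sketch that repeatedly merges the two lowest weights. The function name `huffman_codes`, the use of integer percentages instead of probabilities, and the tie-breaking counter are assumptions for illustration. Where two weights are equal, the 0/1 patterns may differ from the codes in the figure, but the code lengths, and therefore the 30-bit total, are the same.

```python
import heapq
from itertools import count


def huffman_codes(weights: dict) -> dict:
    """Return a prefix code built by merging the two lowest weights at each step."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w_low, _, low = heapq.heappop(heap)     # lowest weight: bottom branch, bit 1
        w_high, _, high = heapq.heappop(heap)   # next lowest: top branch, bit 0
        merged = {sym: "1" + code for sym, code in low.items()}
        merged.update({sym: "0" + code for sym, code in high.items()})
        heapq.heappush(heap, (w_low + w_high, next(tiebreak), merged))
    return heap[0][2]


# probabilities from the example, expressed as percentages
weights = {"A": 34, "B": 8, "C": 4, "D": 29, "E": 11, "F": 3, "G": 5, "H": 6}
codes = huffman_codes(weights)
message = "DADABEADGAAF"
print(sorted((sym, len(code)) for sym, code in codes.items()))
print(sum(len(codes[ch]) for ch in message))  # 30
```

A fixed-length code for 8 characters needs 3 bits each, which gives the 36 bits quoted for the uncompressed string.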
34. Modified Huffman as fax standard
- Used in ITU-T Group 3 standard (T.4)
  - Most widely used fax standard
- Fax composed of runs of black and white pixels
  - e.g., 59 white followed by 67 black followed by 3 white, etc.
- Number of black and white run combinations is large
  - Uncompressed approach would be to send, for example, a 0 for every white and a 1 for every black
- The T.4 standard
  - 1728 pixels specified per line
    - Provision exists for longer lines
  - Pre-calculated probabilities of run length occurrence
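To make the idea of runs concrete, the sketch below turns a scan line written as a string of 0s (white) and 1s (black) into alternating run lengths. The function name `line_to_runs` and the string representation of a line are assumptions for illustration; a leading white run of length 0 is inserted when the line begins with black, so that the list always starts with white.

```python
from itertools import groupby


def line_to_runs(pixels: str) -> list:
    """Return run lengths alternating white, black, white, ... starting with white."""
    runs = [len(list(group)) for _, group in groupby(pixels)]
    if pixels and pixels[0] == "1":
        runs.insert(0, 0)  # line starts with black: prepend a zero-length white run
    return runs


line = "0" * 59 + "1" * 67 + "0" * 3
print(line_to_runs(line))    # [59, 67, 3]
print(line_to_runs("1100"))  # [0, 2, 2]
```

Sending the three run lengths instead of 129 individual pixel bits is where the compression comes from.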
35. Modified Huffman technique
- For each line, start with white (could be W0)
- If string run length ≤ 63, look up terminating run length (n), send corresponding terminating codeword
- If string run length > 63
  - Look up highest makeup run length (m) that is less than or equal to string run length, send makeup codeword for m
  - Calculate n = (string run length - m), send terminating codeword for n
- End each line with EOL string (000000000001)
  - Also precedes first data line on a document
  - Six EOLs indicate end of document
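The makeup/terminating split described above can be sketched as below. The sketch assumes makeup run lengths are the multiples of 64 up to the 1728-pixel line, with terminating run lengths covering 0 to 63; the function name `split_run` is illustrative, and the actual codewords are not reproduced here.

```python
def split_run(run_length: int):
    """Split a run into (makeup length m or None, terminating length n)."""
    if not 0 <= run_length <= 1728:
        raise ValueError("run length outside the standard 1728-pixel line")
    if run_length <= 63:
        return None, run_length          # terminating codeword only
    m = (run_length // 64) * 64          # highest makeup length <= run length
    return m, run_length - m             # n = run length - m


print(split_run(59))    # (None, 59)
print(split_run(67))    # (64, 3)
print(split_run(1728))  # (1728, 0)
```

Each returned length would then be looked up in the white or black codeword table, depending on the colour of the run.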
36. Modified Huffman table