Title: Lossless Compression
1 Chapter 20
2 Introduction
- Redundant elements exist in virtually all forms of data
- Compression can be used to reduce communications capacity requirements
- Certain types of data require that no information be lost due to compression
- Several widely used lossless compression techniques are critical to modern network performance
  - Run-length encoding
  - Arithmetic coding
  - Ziv-Lempel (LZ) dictionary
3 Coding Techniques
- Run-length
- Modified Huffman (MH)
- Modified Relative Element Address Designate (READ) (MR)
- Modified Modified READ (MMR)
- Arithmetic: pure and interval
- String matching: LZ77, LZ78, LZW
4 Run Length Encoding
Compression Format
Examples
5 Run-length Encoding Efficiency
From Table 20.1
6 Run Length Coding Image Compression Example
0000000000
0000000000
0001111000
0001001000
0001111000
0000001000
0000001000
0001111000
0000000000
0000000000
Image: 100 pixels
Binary code: 100 bits
Run lengths: 23W 4B 6W 1B 2W 1B 6W 4B 9W 1B 9W 1B 6W 4B 23W
or, simply: 23 4 6 1 2 1 6 4 9 1 9 1 6 4 23
Simple run-length coding: 15 characters = 120 bits (8 bits per character), which is longer than the 100-bit original
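The run lengths above can be checked with a short Python sketch. The names `ROWS` and `run_lengths` are illustrative and not from the slides, and the 8 bits per run is the figure implied by "15 characters, 120 bits".

```python
from itertools import groupby

# The 10 x 10 bitmap from the slide, one string per scan line (0 = white, 1 = black).
ROWS = [
    "0000000000",
    "0000000000",
    "0001111000",
    "0001001000",
    "0001111000",
    "0000001000",
    "0000001000",
    "0001111000",
    "0000000000",
    "0000000000",
]

def run_lengths(bits):
    """Return (symbol, length) pairs for each run of identical symbols."""
    return [(symbol, len(list(group))) for symbol, group in groupby(bits)]

pixels = "".join(ROWS)              # scan the image row by row
runs = run_lengths(pixels)
lengths = [length for _, length in runs]

raw_bits = len(pixels)              # one bit per pixel
coded_bits = 8 * len(runs)          # one 8-bit character per run

print(" ".join(f"{n}{'W' if s == '0' else 'B'}" for s, n in runs))
print(raw_bits, coded_bits)
```

For this small image the simple scheme expands the data, which is one motivation for the variable-length codes used in facsimile compression.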
7 Facsimile Compression
- Characterized by black and white points (pels) on a page
- Group 3
  - 200 ppi (H) x 100 or 200 ppi (V)
  - modem via analog phone line
- Group 4
  - 200 to 400 ppi (H) x 200 to 400 ppi (V)
  - digital networks up to 64 kbps
- ITU-T defines two lossless compression standards
  - MH/MR for Group 3
  - MMR for Group 4
8 MH - Modified Huffman Code
- Due to the inherent characteristics of text documents, variable-length coding can be used effectively
- Huffman encoding is applied one line at a time
- count runs of white and black pels, e.g. w7, b5, w3, b9, ...
- run length N = 64m + n
- m = 0, 1, 2, ..., 27; n = 0, 1, 2, ..., 63
- represent each run of black or white pels as a multiple of 64 plus a remainder and assign a Huffman code
- terminating codes (n) used for N < 64
- make-up codes (m) needed for N ≥ 64
- See Table 20.2
Efficiency can be improved using MR.
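The decomposition N = 64m + n can be sketched in Python as follows. The function names are hypothetical, and the actual Huffman bit patterns from Table 20.2 are deliberately not reproduced; the sketch only shows which code words a run would need.

```python
def split_run(n):
    """Split a run length N into (m, n) with N = 64*m + n, 0 <= m <= 27, 0 <= n <= 63."""
    if not 0 <= n <= 64 * 27 + 63:
        raise ValueError("run length outside the range covered by m <= 27, n <= 63")
    return divmod(n, 64)

def describe_run(n):
    """List the code words a run needs: an optional make-up code, then a terminating code."""
    m, remainder = split_run(n)
    if m == 0:
        return [("terminating", remainder)]
    return [("make-up", 64 * m), ("terminating", remainder)]

print(describe_run(7))      # short run: terminating code only
print(describe_run(200))    # long run: make-up code for 192, terminating code for 8
```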
9 MR Technique - Changing Picture Elements
- Basis
  - 75% of all transitions are at most +/- 1 pel away from a transition in the line above
  - changing elements (i.e. w to b, or b to w) can be identified based on what happened in the line before
  - Encoding, then, is based on vertical as well as horizontal relationships between pels
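A rough Python illustration of the idea: find the changing elements in two adjacent lines and measure how far each transition in the current line is from the nearest transition in the line above. This is only a sketch of the underlying observation, not the MR code itself (which also distinguishes pass, vertical, and horizontal modes); the function names and sample lines are made up for illustration.

```python
def changing_elements(line):
    """Indices where the pel colour differs from the previous pel.
    An imaginary white pel (0) is assumed to precede the line."""
    previous = 0
    positions = []
    for index, pel in enumerate(line):
        if pel != previous:
            positions.append(index)
        previous = pel
    return positions

def vertical_offsets(reference_line, coding_line):
    """Offset of each transition in coding_line from the nearest transition above it."""
    reference = changing_elements(reference_line)
    offsets = []
    for position in changing_elements(coding_line):
        if reference:
            nearest = min(reference, key=lambda r: abs(r - position))
            offsets.append(position - nearest)
        else:
            offsets.append(None)  # nothing above to refer to
    return offsets

line_above = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
line_below = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
print(vertical_offsets(line_above, line_below))
```

When most offsets are -1, 0, or +1, as in this sample, coding the offset instead of the full run length takes very few bits.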
10 MR Code Table
- Notes
  - MR code is more error-sensitive than MH
  - ITU-T recommends using MH for every Kth scanning line: K = 2 for 3.85 lines/mm and K = 4 for 7.7 lines/mm
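The K rule can be pictured as a per-line schedule. The sketch below assumes the first line of a page is MH-coded and that every Kth line after it is MH again; the function name is hypothetical.

```python
def coding_schedule(num_lines, k):
    """Return the coding used for each scan line: MH on every Kth line, MR otherwise."""
    if k < 1:
        raise ValueError("K must be at least 1")
    return ["MH" if index % k == 0 else "MR" for index in range(num_lines)]

print(coding_schedule(6, 2))   # K = 2, as recommended for 3.85 lines/mm
print(coding_schedule(6, 4))   # K = 4, as recommended for 7.7 lines/mm
```

Because each MR line depends on the line above it, the periodic MH line limits how far a transmission error can propagate down the page.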
11 Facsimile Compression Techniques
Note 1: Joint Bi-level Image Experts Group (JBIG) coding, based on the arithmetic coding technique.
12 Huffman Coding Revisited
- Huffman achieves maximum efficiency when all probabilities involved are negative powers of 2
- Example
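The slide's own example is not reproduced here, so the following Python sketch uses made-up probabilities to illustrate the claim. It computes Huffman code lengths and compares the average code length with the entropy, once for probabilities that are negative powers of 2 and once for a skewed two-symbol source.

```python
import heapq
from math import log2

def huffman_code_lengths(probs):
    """Return the Huffman code length (in bits) for each probability in probs."""
    # Heap items: (probability, tie-breaker, indices of the symbols in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie_breaker = len(probs)
    while len(heap) > 1:
        p1, _, symbols1 = heapq.heappop(heap)
        p2, _, symbols2 = heapq.heappop(heap)
        for symbol in symbols1 + symbols2:
            lengths[symbol] += 1          # merged symbols move one level deeper
        heapq.heappush(heap, (p1 + p2, tie_breaker, symbols1 + symbols2))
        tie_breaker += 1
    return lengths

def average_length(probs):
    return sum(p * n for p, n in zip(probs, huffman_code_lengths(probs)))

def entropy(probs):
    return -sum(p * log2(p) for p in probs)

dyadic = [0.5, 0.25, 0.125, 0.125]   # all negative powers of 2 (illustrative values)
skewed = [0.9, 0.1]                  # not powers of 2 (illustrative values)

print(average_length(dyadic), entropy(dyadic))
print(average_length(skewed), entropy(skewed))
```

In the skewed case Huffman must still spend a whole bit per symbol, which is the gap that arithmetic coding is designed to close.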
13 Arithmetic Coding Techniques
- Designed to provide efficient compression by approximating probabilities as negative powers of 2
- Used in JPEG and MPEG standards for lossless encoding
- Basic Method (in brief)
  - arrange outcomes on the half-open unit interval [0,1)
  - approximate lower bound of outcome probabilities as a negative power of 2
  - encode symbol string using pure arithmetic or interval arithmetic algorithm
14 Arithmetic Coding - Unit Interval Arrangement
Three Symbol (A,B,C), Independent Sequence (from page 557)
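The figure from page 557 is not reproduced here. As a stand-in, this Python sketch lays three symbols out on the half-open interval [0,1) and then does the same for all two-symbol sequences, using the independence assumption P(XY) = P(X)P(Y). The probabilities are hypothetical, chosen only to keep the arithmetic exact.

```python
from fractions import Fraction
from itertools import product

# Hypothetical probabilities; the values used in the textbook are not shown on the slide.
SYMBOL_PROBS = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 4)}

def arrange(probs):
    """Assign each outcome a half-open sub-interval [low, high) of [0,1)."""
    intervals = {}
    low = Fraction(0)
    for outcome, p in probs.items():
        intervals[outcome] = (low, low + p)
        low += p
    return intervals

symbol_intervals = arrange(SYMBOL_PROBS)

# Independent symbols: the probability of a pair is the product of its parts.
pair_probs = {
    a + b: SYMBOL_PROBS[a] * SYMBOL_PROBS[b]
    for a, b in product(SYMBOL_PROBS, repeat=2)
}
pair_intervals = arrange(pair_probs)

for outcome, (low, high) in pair_intervals.items():
    print(outcome, low, high)
```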
15 Arithmetic Coding Probability Intervals - Example
Three Symbol (A,B,C) Sequence
Drawbacks?
[Figure: probability intervals on the unit interval from 0 to 1]
16 Pure Arithmetic Coding Technique - Example
- Algorithm
  - Begin with the half-open interval [0,1)
  - 1. Subdivide current interval into sub-intervals, one for each symbol.
  - 2. Select sub-interval corresponding to the symbol that actually occurs, and make it the new current interval
  - Repeat steps 1 and 2 until the entire message is processed
  - 3. Output enough bits to distinguish the final interval from two adjacent intervals
  - 4. Output a special end-of-message symbol.
Drawbacks?
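A minimal Python sketch of steps 1 to 3, using exact fractions so that no rounding occurs. The probabilities are hypothetical, and the bit-output step is simplified: it emits the shortest binary fraction that falls inside the final interval, and it omits the end-of-message symbol of step 4.

```python
from fractions import Fraction

# Hypothetical symbol probabilities (not taken from the slides).
PROBS = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 4)}

def narrow(message, probs):
    """Repeat steps 1 and 2: subdivide the current interval and keep the symbol's part."""
    low, width = Fraction(0), Fraction(1)
    for symbol in message:
        cumulative = Fraction(0)
        for candidate, p in probs.items():
            if candidate == symbol:
                low += width * cumulative
                width *= p
                break
            cumulative += p
        else:
            raise KeyError(f"symbol {symbol!r} has no probability")
    return low, low + width

def shortest_codeword(low, high):
    """Simplified step 3: shortest bit string b such that 0.b (binary) lies in [low, high)."""
    bits = 1
    while True:
        step = Fraction(1, 2 ** bits)
        k = -(-low // step)               # smallest multiple of step that is >= low
        if k * step < high:
            return format(k, f"0{bits}b")
        bits += 1

low, high = narrow("ABC", PROBS)
print(low, high, shortest_codeword(low, high))
```

The interval width equals the probability of the whole message, so the precision needed grows with message length; this is one of the drawbacks of the pure technique.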
17 Interval Arithmetic Coding Technique - Example
18 String-Matching Algorithms
- Algorithms stem from work by Ziv and Lempel in 1977 and 1978, and improvements by Welch in 1984
- LZ77 version used in PKZIP, gzip, zipit, etc.
- LZ78 adds improvements based on tree-structured dictionary
- LZW adds performance enhancements
- LZ78/LZW used in V.42bis, GIF and Unix compress
- Algorithms all use pattern matching to identify repeated symbol sequences
- Basic Method (in brief)
  - scan input symbols
  - create codes for them on the fly
  - make dictionary entries to record the symbol-code pairs
19 LZ77 Scheme - Example
- Store symbols in fixed-size sliding history buffer
- New input kept in fixed-size look-ahead buffer
- Attempt to match two or more characters from the beginning of the look-ahead buffer with characters in the sliding history buffer
- No match: output first look-ahead symbol as 9-bit character and shift it into window
- Match: continue to scan for longest match, then output triplet (indicator, pointer, length) and shift sliding window
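The scheme can be sketched in Python as below. The buffer sizes are arbitrary illustrative defaults, tokens are kept as tuples rather than packed into bits, and matches are restricted to text already inside the history buffer; all names are hypothetical.

```python
def lz77_encode(data, history_size=32, lookahead_size=8, min_match=2):
    """Encode data as literal tokens (0, char) and match tokens (1, pointer, length)."""
    tokens = []
    pos = 0
    while pos < len(data):
        start = max(0, pos - history_size)           # left edge of the history buffer
        max_len = min(lookahead_size, len(data) - pos)
        best_len, best_pointer = 0, 0
        for candidate in range(start, pos):
            length = 0
            while (length < max_len and candidate + length < pos
                   and data[candidate + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_pointer = length, pos - candidate
        if best_len >= min_match:
            tokens.append((1, best_pointer, best_len))   # indicator 1: a match
            pos += best_len
        else:
            tokens.append((0, data[pos]))    # indicator 0 plus an 8-bit character = 9 bits
            pos += 1
    return tokens

def lz77_decode(tokens):
    """Rebuild the text by copying literals and history-buffer matches."""
    out = []
    for token in tokens:
        if token[0] == 1:
            _, pointer, length = token
            start = len(out) - pointer
            out.extend(out[start:start + length])
        else:
            out.append(token[1])
    return "".join(out)

tokens = lz77_encode("abcabcabc")
print(tokens)
print(lz77_decode(tokens))
```

The exhaustive search over the history buffer is what makes naive LZ77 encoding slow; practical implementations typically index the buffer instead.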
20 LZ77 Scheme
Drawbacks?
21 LZ78/LZW Example
22 LZW Dictionary - Example
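The slide's dictionary example is not reproduced here. As a stand-in, the following Python sketch shows how an LZW dictionary grows while encoding. It assumes a dictionary preloaded with the 256 single-byte characters and unbounded code width, which are common textbook simplifications rather than details taken from the slides.

```python
def lzw_encode(text):
    """Return the list of LZW codes for text, building the dictionary on the fly."""
    dictionary = {chr(i): i for i in range(256)}    # codes 0..255: single characters
    next_code = 256
    current = ""
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                      # keep extending the match
        else:
            codes.append(dictionary[current])        # emit code for the longest match
            dictionary[candidate] = next_code        # record the new string
            next_code += 1
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes

def lzw_decode(codes):
    """Rebuild the text; the decoder reconstructs the same dictionary as it goes."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            entry = previous + previous[0]           # code defined by the step in progress
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        dictionary[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(out)

codes = lzw_encode("ABABABA")
print(codes)
print(lzw_decode(codes))
```

No dictionary is transmitted: the decoder derives every entry from the codes it has already received.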
23 Tree-Based LZ Dictionary
Root Nodes
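One way to picture a tree-structured dictionary is as a trie: each root node is the first symbol of a phrase, and each child extends its parent's phrase by one symbol. The Python sketch below is an LZ78-style illustration under that assumption, with hypothetical names; it is not a reconstruction of the slide's figure.

```python
def lz78_encode(text):
    """Encode text as (phrase_index, next_symbol) pairs using a trie dictionary."""
    root = {}               # symbol -> (phrase index, children); top-level keys are root nodes
    next_index = 1          # index 0 is reserved for the empty phrase
    children = root
    prefix_index = 0
    pairs = []
    for ch in text:
        if ch in children:
            prefix_index, children = children[ch]    # walk one level down the tree
        else:
            children[ch] = (next_index, {})          # grow the tree by one node
            next_index += 1
            pairs.append((prefix_index, ch))
            children, prefix_index = root, 0         # restart at the root
    if prefix_index:
        pairs.append((prefix_index, ""))             # flush a match that reached end of input
    return pairs

def lz78_decode(pairs):
    """Rebuild the text from (phrase_index, next_symbol) pairs."""
    phrases = [""]
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)

pairs = lz78_encode("ABABABA")
print(pairs)
print(lz78_decode(pairs))
```

Walking the trie costs one lookup per input symbol, so finding the longest dictionary match does not require searching the whole dictionary.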