Title: DATA COMPRESSION
1DATA COMPRESSION
2Outline
- Introduction
- Lossless Compression techniques
- Lossy Image coding
4Introduction
- Why compression?
- To reduce redundancy in stored or communicated data
- Applications in file storage and distributed systems
- Reduces storage and transmission costs
5Introduction
- Prime target
- Redundant and irrelevant information
- Compression techniques exploit inherent redundancy and irrelevancy by transforming a data file into a smaller file
6Introduction
- Performance Efficiency
- Compression efficiency
- Complexity
- Distortion measurement (lossy algorithms only)
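As a rough illustration of two of these measures, the sketch below computes a compression ratio and a mean squared error. The function names are hypothetical. The definitions used here (original size divided by compressed size, and the average squared difference between samples) are common conventions assumed for the example; the slides do not state which definitions they use.

```python
def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Ratio of original size to compressed size (assumed convention)."""
    return original_size / compressed_size


def mean_squared_error(original, reconstructed) -> float:
    """Average squared difference between two equal-length sample sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("sequences must have the same length")
    total = sum((a - b) ** 2 for a, b in zip(original, reconstructed))
    return total / len(original)


# 9 units compressed to 6 units gives a ratio of 1.5
print(compression_ratio(9, 6))
# A lossless scheme reconstructs the input exactly, so its distortion is 0
print(mean_squared_error([1, 2, 3], [1, 2, 3]))
```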
7Introduction
- Classification
- Lossless compression
- Used for legal and medical documents and computer programs
- Exploits only data redundancy
- Lossy compression
- Used for digital audio, image, and video, where some errors or loss can be tolerated
- Exploits both data redundancy and irrelevant data
9Lossless Compression Techniques
- The original data (for example, an image) can be recovered exactly upon decompression
- These algorithms fall into two broad categories
- Dictionary-based techniques
- Statistical methods
10Lossless Compression Techniques
- Dictionary-based techniques
- Generate a compressed file containing fixed-length codes, each representing a particular sequence of bytes in the original file
- Run-length encoding (RLE)
- The simplest scheme: consecutive identical elements, or runs, are replaced by a single element and a count of how many are in the run
- "aaaabbaaa" -> "4a2b3a"
- 9 characters have become 6
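The run-length example above can be reproduced with a few lines of Python. This is a minimal sketch, not a production encoder: the function name `rle_encode` is hypothetical, and the input is assumed to contain no digits, since digits would be ambiguous in the count-then-symbol output.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of identical characters with '<count><char>'."""
    # groupby yields one (char, run) pair per maximal run of equal characters
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


print(rle_encode("aaaabbaaa"))  # 4a2b3a
```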
11Lossless Compression Techniques
- Run-length encoding cont..
- The code stores each value together with the length of its run, rather than storing the value many times over
- WWWWWWWWWWWWBWWWWWWWWWWWWBBB
- 12WB12W3B (a run of length one is written without a count)
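Decoding reverses the process. The sketch below assumes the format used in the example: an optional decimal count followed by a single non-digit symbol, with a missing count meaning a run of one. The name `rle_decode` is hypothetical.

```python
import re


def rle_decode(encoded: str) -> str:
    """Expand '<count><char>' tokens; a missing count means a run of one."""
    # Each match is (count digits, possibly empty) followed by one non-digit symbol
    tokens = re.findall(r"(\d*)(\D)", encoded)
    return "".join(symbol * int(count or 1) for count, symbol in tokens)


print(rle_decode("12WB12W3B"))
```

Because the count is optional, the same decoder also accepts fully counted strings such as "4a2b3a".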
12Lossless Compression Techniques
- LZW (Lempel-Ziv-Welch) encoding
- Like RLE, it achieves compression by encoding strings of characters
- Unlike RLE, it builds up a table of strings and their corresponding codes as it encodes the file
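A minimal sketch of the table-building idea follows. It assumes the common textbook variant in which the table starts with the 256 single-character strings and codes are plain integers of unbounded width; real file formats add details such as variable code widths and table resets, which are omitted here. The function names are hypothetical.

```python
def lzw_encode(text: str) -> list[int]:
    """Encode text as a list of integer codes, growing the table on the fly."""
    table = {chr(i): i for i in range(256)}  # initial single-character entries
    current = ""
    codes = []
    for char in text:
        candidate = current + char
        if candidate in table:
            current = candidate              # keep extending the match
        else:
            codes.append(table[current])     # emit code for the longest match
            table[candidate] = len(table)    # learn the new string
            current = char
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes: list[int]) -> str:
    """Rebuild the text; the decoder reconstructs the same table as it goes."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = previous + previous[0]   # code refers to the entry being built
        else:
            raise ValueError(f"invalid LZW code: {code}")
        output.append(entry)
        table[len(table)] = previous + entry[0]
        previous = entry
    return "".join(output)


quote = "ask not what your country can do for you - ask what you can do for your country"
codes = lzw_encode(quote)
print(len(quote), "characters ->", len(codes), "codes")
print(lzw_decode(codes) == quote)
```

Note that the table itself is never transmitted: the decoder derives it from the code stream, which is what distinguishes this adaptive scheme from a fixed dictionary.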
13Lossless Compression Techniques
- LZW Encoding cont..
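- A dictionary is maintained to catalog pieces of data
- Ask not what your country can do for you - ask what you can do for your country
- Replacing each repeated word with its dictionary index (1 = ask, 2 = what, 3 = your, 4 = country, 5 = can, 6 = do, 7 = for, 8 = you) gives:
- 1 not 2 3 4 5 6 7 8 - 1 2 8 5 6 7 3 4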
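The word-level substitution in this example can be sketched as follows. This is a simplified illustration of the dictionary idea, not full LZW: it assumes whitespace-separated tokens, ignores case, and numbers only the words that occur more than once, in order of first appearance. The function name is hypothetical.

```python
from collections import Counter


def word_dictionary_encode(text: str):
    """Replace every repeated word with its 1-based dictionary index."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    dictionary = {}
    for token in tokens:
        # Only words that repeat are worth a dictionary entry
        if counts[token] > 1 and token not in dictionary:
            dictionary[token] = len(dictionary) + 1
    encoded = " ".join(str(dictionary.get(token, token)) for token in tokens)
    return dictionary, encoded


quote = "Ask not what your country can do for you - ask what you can do for your country"
dictionary, encoded = word_dictionary_encode(quote)
print(dictionary)
print(encoded)  # 1 not 2 3 4 5 6 7 8 - 1 2 8 5 6 7 3 4
```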
14Lossless Compression Techniques
- LZW Encoding cont..
- Ask not what your country can do for you -- ask what you can do for your country
- The quotation has 17 words: 61 letters, 16 spaces, 1 dash, and 1 period. If each letter, space, or punctuation mark takes up one unit of memory, the total file size is 79 units
- The redundancies that can be noticed are:
- "ask" appears two times, "what" appears two times
- "your" appears two times, "country" appears two times
- "can" appears two times, "do" appears two times
- "for" appears two times, "you" appears two times
15Lossless Compression Techniques
- LZW Encoding cont..
- A compression program has no concept of separate words; it simply looks for patterns
- Ask not what your country can do for you -- ask what you can do for your country
16Lossless Compression Techniques
- LZW adaptive dictionary-based encoding, e.g.
- Here an underscore represents a space
- "1not__2345__-__12354"
17Lossless Compression Techniques
- Statistical encoding methods
- They compress data by representing frequently occurring characters in the file with fewer bits than less commonly occurring ones
- Huffman coding
- It uses a binary encoding tree that represents commonly occurring values in few bits and less commonly occurring values in more bits
18Lossless Compression Techniques
- Algorithm to generate Huffman codes
- (1) Find the probability of each symbol and sort the probabilities
- (2) Generate a new node by combining the two smallest probabilities, then sort the probability of the new node with the remaining probabilities
- (3) Assign 1 to one branch of the new node and 0 to the other
- (4) Repeat (2) and (3) until the final probability is 1.0
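The steps above can be sketched in Python using a priority queue in place of repeated sorting. This is an illustrative sketch under a few assumptions: probabilities are estimated from character frequencies in the input, ties are broken by insertion order, and the name `huffman_codes` is hypothetical. Different tie-breaking can produce different but equally short codes.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text: str) -> dict[str, str]:
    """Return a prefix-free bit string for every distinct character in text."""
    if not text:
        return {}
    freq = Counter(text)
    if len(freq) == 1:
        return {symbol: "0" for symbol in freq}  # a lone symbol still needs one bit
    tiebreak = count()  # unique counter so heap entries never compare the dicts
    # Step 1: one leaf per symbol, keyed by its probability
    heap = [(n / len(text), next(tiebreak), {symbol: ""}) for symbol, n in freq.items()]
    heapq.heapify(heap)
    # Step 4: repeat until a single node with probability ~1.0 remains
    while len(heap) > 1:
        # Step 2: combine the two smallest probabilities into a new node
        p0, _, left = heapq.heappop(heap)
        p1, _, right = heapq.heappop(heap)
        # Step 3: prefix 0 on one branch and 1 on the other
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


codes = huffman_codes("aaaabbc")
print(codes)
print("".join(codes[ch] for ch in "aaaabbc"))
```

In this input the most frequent character receives a 1-bit code and the two rarer ones receive 2-bit codes, so 7 characters need 10 bits in total.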
19Lossless Compression Techniques
20Lossless Compression Technique
Courtesy: www.cs.utexas.edu/scottm/cs307/handouts/Slides
22Lossy Compression algorithm
- Eliminates redundant as well as irrelevant information
- Only an approximate reconstruction of the original image is possible
- Achieves higher compression ratios
23Lossy Compression Techniques
- Vector Quantization
- Composed of two operations
- Encoder
- Takes an input vector and outputs the index of the codeword that offers the lowest distortion
- Once the closest codeword is found, the index of that codeword is sent
- Decoder
- Receives the index of the codeword and outputs the codeword
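The encoder and decoder described above can be sketched as two small functions. The sketch assumes that distortion is measured by squared Euclidean distance and that the codebook is a plain list of tuples shared by both sides; the function names and the toy codebook are hypothetical.

```python
def squared_distance(u, v) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def vq_encode(vector, codebook) -> int:
    """Return the index of the codeword with the lowest distortion."""
    return min(range(len(codebook)), key=lambda i: squared_distance(vector, codebook[i]))


def vq_decode(index: int, codebook):
    """Look the index up in the shared codebook."""
    return codebook[index]


codebook = [(0.0, 0.0), (10.0, 10.0)]  # toy codebook for illustration only
index = vq_encode((9.0, 8.0), codebook)
print(index, vq_decode(index, codebook))  # 1 (10.0, 10.0)
```

Only the index is transmitted, so the reconstruction `(10.0, 10.0)` approximates the input `(9.0, 8.0)` rather than matching it, which is where the loss comes from.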
24Lossy Compression Technique Vector Quantization cont..
Courtesy: http://www.geocities.com/mohamedqasem/vectorquantization/vq.html
25Lossy Compression Techniques
- The algorithm
- (1) Determine the number of code words, N, or the size of the codebook
- (2) Select N code words at random, and let that be the initial codebook (randomly chosen from the set of input vectors)
- (3) Using the Euclidean distance measure, cluster the vectors around each codeword: take each input vector and find the Euclidean distance between it and each codeword. The input vector belongs to the cluster of the codeword that yields the minimum distance
26Lossy Compression Technique
- Algorithm cont..
- (4) Compute the new set of code words by obtaining the average of each cluster: add the components of each vector and divide by the number of vectors in the cluster
- (5) Repeat steps 3 and 4 until either the code words don't change or the change in the code words is small
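The codebook training procedure can be sketched as below. Several details are assumptions made for the example rather than part of the slides: the function name `train_codebook`, the `initial`, `tol`, `max_iter`, and `seed` parameters, keeping the old codeword when a cluster ends up empty, and comparing squared distances (which selects the same nearest codeword as the Euclidean distance).

```python
import random


def squared_distance(u, v) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def train_codebook(vectors, n_codewords, initial=None, tol=1e-9, max_iter=100, seed=0):
    """Iteratively refine a codebook of n_codewords vectors."""
    # Steps 1-2: pick the initial codewords at random from the input vectors
    if initial is None:
        initial = random.Random(seed).sample(vectors, n_codewords)
    codebook = [tuple(float(x) for x in codeword) for codeword in initial]
    for _ in range(max_iter):
        # Step 3: assign each input vector to its nearest codeword
        clusters = [[] for _ in codebook]
        for v in vectors:
            nearest = min(range(len(codebook)),
                          key=lambda i: squared_distance(v, codebook[i]))
            clusters[nearest].append(v)
        # Step 4: each new codeword is the component-wise average of its cluster
        new_codebook = []
        for old, members in zip(codebook, clusters):
            if members:
                new_codebook.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_codebook.append(old)  # assumption: keep an empty cluster's codeword
        # Step 5: stop once the codewords no longer move noticeably
        change = max(squared_distance(o, n) for o, n in zip(codebook, new_codebook))
        codebook = new_codebook
        if change <= tol:
            break
    return codebook


vectors = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
print(train_codebook(vectors, 2, initial=[(0, 0), (11, 11)]))
```

The result depends on the initial codebook: the procedure converges to a local optimum, so a poor random start can yield a worse codebook than the one printed here.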
27Questions?