DATA COMPRESSION - PowerPoint PPT Presentation

Slides: 28

Provided by: csU62

Transcript and Presenter's Notes

Title: DATA COMPRESSION

1
DATA COMPRESSION

By ARCHANA MEKA

2
Outline

Introduction
Lossless Compression techniques
Lossy Image coding

4
Introduction

Why Compression?
To reduce redundancy in stored or communicated
data
Applications in file storage and distributed
systems
Reduces storage and transmission costs

5
Introduction

Prime Target
Redundant and Irrelevant Information
Compression techniques exploit inherent
redundancy and irrelevancy by transforming a data
file into a smaller file

6
Introduction

Performance Efficiency
Compression efficiency
Complexity
Distortion measurement (lossy algorithms)

7
Introduction

Classification
Lossless compression
Used for legal and medical documents and
computer programs
Exploits only data redundancy
Lossy compression
Used for digital audio, image, and video, where
some errors or loss can be tolerated
Exploits both data redundancy and irrelevant data

8
Outline

Introduction
Lossless Compression techniques
Lossy Image coding

9
Lossless Compression Techniques

The original data (for example, an image) can be
recovered exactly upon decompression
These algorithms fall into two broad categories
Dictionary-based techniques
Statistical methods

10
Lossless Compression Techniques

Dictionary-Based Techniques
Generate a compressed file containing
fixed-length codes, each representing a particular
sequence of bytes in the original file
Run-Length Encoding
The simplest technique: consecutive identical
elements, or runs, are replaced by a single
element together with a count of how many are in
the run
"aaaabbaaa" -> "4a2b3a"
9 characters have become 6

11
Lossless Compression Techniques

Run-Length Encoding cont..
The code stores the length of each run followed
by its value, rather than simply storing the
value many times over (in the example, a run of
length 1 is written without a count)
WWWWWWWWWWWWBWWWWWWWWWWWWBBB
12WB12W3B
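
The run-length idea on the last two slides can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides. It assumes the input contains no digit characters, and it always writes the count, so a single B becomes "1B" rather than the bare "B" shown above. The function names rle_encode and rle_decode are made up for this example.

```python
import re
from itertools import groupby


def rle_encode(text):
    """Replace each run of identical characters with '<count><char>'."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


def rle_decode(encoded):
    """Expand '<count><char>' pairs back into the original runs.

    Assumes the original data contained no digits, so every digit
    sequence in the encoded string is a run length.
    """
    pairs = re.findall(r"(\d+)(\D)", encoded)
    return "".join(char * int(count) for count, char in pairs)


if __name__ == "__main__":
    print(rle_encode("aaaabbaaa"))  # 4a2b3a
    sample = "W" * 12 + "B" + "W" * 12 + "BBB"
    print(rle_encode(sample))       # 12W1B12W3B
    print(rle_decode("4a2b3a"))     # aaaabbaaa
```

Always writing the count keeps the decoder trivial. Omitting counts of 1, as the slide does, saves a character per isolated symbol but needs a slightly smarter decoder.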

12
Lossless Compression Techniques

LZW (Lempel-Ziv-Welch) Encoding
Like RLE, it achieves compression by encoding
strings of characters
Unlike RLE, it builds up a table of strings and
their corresponding codes as it encodes the file
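
As a rough sketch of how the table is built during encoding, here is a textbook-style LZW encoder and decoder in Python. It is not taken from the slides. It assumes every input character has a code point below 256 (the initial table), and the names lzw_encode and lzw_decode are illustrative only. Codes are returned as a list of integers rather than packed into fixed-length bit fields.

```python
def lzw_encode(text):
    """Return a list of integer codes for text using LZW."""
    # Start with one table entry per single character (codes 0..255).
    table = {chr(i): i for i in range(256)}
    next_code = 256
    current = ""
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in table:
            # Keep extending the match while the string is already known.
            current = candidate
        else:
            # Emit the longest known string, then learn the new one.
            codes.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = ch
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes):
    """Rebuild the text; the decoder reconstructs the same table."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Code refers to the entry that is being defined right now.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        table[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(out)


if __name__ == "__main__":
    codes = lzw_encode("TOBEORNOTTOBEORTOBEORNOT")
    print(len(codes), "codes for 24 characters")
    print(lzw_decode(codes))
```

The table never has to be stored in the compressed file: the decoder rebuilds it from the code stream as it goes.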

13
Lossless Compression Techniques

LZW Encoding cont..
A dictionary is maintained to catalog pieces of
data
Ask not what your country can do for you - ask
what you can do for your country
1 not 2 3 4 5 6 7 8 - 1 2 8 5 6 7 3 4
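
The numbered sentence above can be reproduced with a simple word-level substitution. The sketch below is an illustration of the dictionary idea only, not real LZW. It assumes words are separated by spaces, that matching ignores letter case, and that repeated words are numbered in order of first appearance (which is what the slide's numbering implies: ask=1, what=2, your=3, country=4, can=5, do=6, for=7, you=8). The function name is hypothetical.

```python
from collections import Counter


def word_dictionary_encode(text):
    """Replace every word that occurs more than once with a dictionary index."""
    tokens = text.split()
    counts = Counter(token.lower() for token in tokens)
    dictionary = {}
    encoded = []
    for token in tokens:
        key = token.lower()
        if counts[key] > 1:
            # Repeated word: assign the next index on first sight.
            if key not in dictionary:
                dictionary[key] = len(dictionary) + 1
            encoded.append(str(dictionary[key]))
        else:
            # Words (or punctuation) seen only once are kept as they are.
            encoded.append(token)
    return dictionary, " ".join(encoded)


if __name__ == "__main__":
    quote = ("Ask not what your country can do for you - "
             "ask what you can do for your country")
    dictionary, encoded = word_dictionary_encode(quote)
    print(dictionary)
    print(encoded)  # 1 not 2 3 4 5 6 7 8 - 1 2 8 5 6 7 3 4
```

Because matching ignores case, the capital "A" of "Ask" would not survive decoding, so this toy version is not strictly lossless.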

14
Lossless Compression Techniques

LZW Encoding cont..
Ask not what your country can do for you --
ask what you can do for your country
The quotation has 17 words: 61 letters, 16 spaces,
1 dash, and 1 period. If each letter, space, or
punctuation mark takes up one unit of memory, the
total file size is 61 + 16 + 1 + 1 = 79 units
The redundancies that can be noticed are:
"ask" appears two times, "what" appears two times,
"your" appears two times, "country" appears two
times,
"can" appears two times, "do" appears two times,
"for" appears two times, "you" appears two times

15
Lossless Compression Techniques

LZW Encoding cont..
The compression program has no concept of
separate words; it only looks for patterns
Ask not what your country can do for you -- ask
what you can do for your country

16
Lossless Compression Techniques

LZW adaptive dictionary-based encoding example
Here an underscore represents a space

"1not__2345__-__12354"

17
Lossless Compression Techniques

Statistical Encoding Methods
They implement data compression by representing
frequently occurring characters in the file with
fewer bits than less commonly occurring ones
Huffman Coding
It uses a binary encoding tree that represents
commonly occurring values in few bits and less
commonly occurring values in more bits

18
Lossless Compression Techniques

Algorithm to generate Huffman codes
1. Find the probability of each symbol and sort
the symbols by probability
2. Generate a new node by combining the two
smallest probabilities, then sort the probability
of the new node with the remaining probabilities
3. Assign 1 to one branch of the new node and 0 to
the other
4. Repeat (2) and (3) until the final probability
is 1.0
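
The steps above can be sketched in Python with a priority queue standing in for the repeated sorting. This is a minimal illustration under a few assumptions: symbols are the characters of a string, probabilities are relative frequencies, the first node popped gets branch 0 and the second gets branch 1, and a one-symbol input is given the code "0". The name huffman_codes is made up for this example.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text):
    """Return a dict mapping each symbol in text to its Huffman bit string."""
    if not text:
        return {}
    total = len(text)
    tie_breaker = count()  # keeps heap entries comparable when probabilities tie
    # Step 1: probability of each symbol; the heap keeps them ordered.
    heap = [(n / total, next(tie_breaker), symbol)
            for symbol, n in Counter(text).items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    # Steps 2 and 4: merge the two smallest probabilities until one node is left.
    while len(heap) > 1:
        p_low, _, low = heapq.heappop(heap)
        p_high, _, high = heapq.heappop(heap)
        heapq.heappush(heap, (p_low + p_high, next(tie_breaker), (low, high)))
    # Step 3: label one branch 0 and the other 1 while walking the tree.
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


if __name__ == "__main__":
    codes = huffman_codes("aaaabbc")
    print(codes)
    print("".join(codes[ch] for ch in "aaaabbc"))
```

For "aaaabbc" the most frequent symbol, a, receives a 1-bit code while b and c receive 2-bit codes, which matches the idea of giving common values fewer bits.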

19
Lossless Compression Techniques

Huffman encoding example

20
Lossless Compression Technique

Huffman encoding example

Courtesy: www.cs.utexas.edu/scottm/cs307/handouts/Slides

21
Outline

Introduction
Lossless Compression techniques
Lossy Image coding

22
Lossy Compression algorithm

Eliminates redundant as well as irrelevant
information
Only an approximate reconstruction of the
original image is possible
Achieves higher compression ratios than lossless
methods

23
Lossy Compression Techniques

Vector Quantization
Composed of two operations
Encoder
Takes an input vector and outputs the index of
the codeword that offers the lowest distortion
Once the closest codeword is found, the index of
that codeword is sent
Decoder
The decoder receives the index of the codeword,
and outputs the codeword
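
A small Python sketch of the encoder and decoder roles described above. It assumes distortion is measured by Euclidean distance, that vectors are tuples of numbers, and that the codebook is already known to both sides. The function names and the sample codebook are invented for illustration.

```python
import math


def vq_encode(vectors, codebook):
    """Encoder: for each input vector, output the index of the nearest codeword."""
    indices = []
    for vector in vectors:
        nearest = min(range(len(codebook)),
                      key=lambda i: math.dist(vector, codebook[i]))
        indices.append(nearest)
    return indices


def vq_decode(indices, codebook):
    """Decoder: look each received index up in the codebook."""
    return [codebook[i] for i in indices]


if __name__ == "__main__":
    codebook = [(0.0, 0.0), (10.0, 10.0)]           # hypothetical 2-entry codebook
    vectors = [(1.0, 2.0), (9.0, 8.0), (0.0, 1.0)]
    indices = vq_encode(vectors, codebook)
    print(indices)                       # [0, 1, 0]
    print(vq_decode(indices, codebook))  # approximate reconstruction
```

Only the indices are transmitted, so the reconstruction is the codeword itself, not the original vector. That difference is the loss.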

24
Lossy Compression Techniques

Vector Quantization cont..
Courtesy: http://www.geocities.com/mohamedqasem/vectorquantization/vq.html

25
Lossy Compression Techniques

The algorithm
1. Determine the number of code words, N, or the
size of the codebook
2. Select N code words at random, and let that be
the initial codebook (randomly chosen from the
set of input vectors)
3. Using the Euclidean distance measure, cluster the
vectors around each codeword: for each input
vector, find the Euclidean distance between it and
each codeword. The input vector belongs to the
cluster of the codeword that yields the minimum
distance

26
Lossy Compression Technique

Algorithm cont..
4. Compute the new set of code words. This is done
by obtaining the average of each cluster: add
the components of each vector and divide by the
number of vectors in the cluster
5. Repeat steps 3 and 4 until either the code words
don't change or the change in the code words is
small
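
The codebook design algorithm from the last two slides might look like this in Python. It is a sketch under stated assumptions, not the presenter's implementation. The parameters tolerance, max_iters, and seed are additions for this example, "small change" is taken to mean that no codeword moved farther than tolerance, and a codeword whose cluster ends up empty is simply left where it is.

```python
import math
import random


def train_codebook(vectors, n_codewords, tolerance=1e-6, max_iters=100, seed=0):
    """Build a codebook of n_codewords from a list of equal-length tuples."""
    rng = random.Random(seed)
    # Steps 1-2: pick N input vectors at random as the initial codebook.
    codebook = [tuple(v) for v in rng.sample(vectors, n_codewords)]
    for _ in range(max_iters):
        # Step 3: assign every vector to its nearest codeword (Euclidean distance).
        clusters = [[] for _ in codebook]
        for vector in vectors:
            nearest = min(range(len(codebook)),
                          key=lambda i: math.dist(vector, codebook[i]))
            clusters[nearest].append(vector)
        # Step 4: each new codeword is the component-wise average of its cluster.
        new_codebook = []
        for old, members in zip(codebook, clusters):
            if members:
                new_codebook.append(tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(len(old))))
            else:
                new_codebook.append(old)  # assumption: keep an unused codeword
        # Step 5: stop when the codewords have (almost) stopped moving.
        shift = max(math.dist(a, b) for a, b in zip(codebook, new_codebook))
        codebook = new_codebook
        if shift <= tolerance:
            break
    return codebook


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(sorted(train_codebook(data, 2)))
```

With the two well-separated groups in the usage example, the codewords settle on the two group averages. On real data the result depends on the random initial codebook.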

27
Questions?
