DATA COMPRESSION - PowerPoint PPT Presentation

About This Presentation
Title:

DATA COMPRESSION

Slides: 28
Provided by: csU62
Tags: compression | data

Transcript and Presenter's Notes


1
DATA COMPRESSION
  • By ARCHANA MEKA

2
Outline
  • Introduction
  • Lossless Compression techniques
  • Lossy Image coding

4
Introduction
  • Why compression?
  • To reduce redundancy in stored or communicated
    data
  • Applications in file storage and Distributed
    Systems
  • Reduces storage and transmission costs

5
Introduction
  • Prime Target
  • Redundant and Irrelevant Information
  • Compression techniques exploit inherent
    redundancy and irrelevancy by transforming a data
    file into a smaller file

6
Introduction
  • Performance Efficiency
  • Compression efficiency
  • Complexity
  • Distortion measurement (for lossy algorithms)

7
Introduction
  • Classification
  • Lossless compression
  • Used for legal and medical documents and
    computer programs
  • Exploits only data redundancy
  • Lossy compression
  • Used for digital audio, image, and video, where
    some errors or loss can be tolerated
  • Exploits both data redundancy and irrelevant data

8
Outline
  • Introduction
  • Lossless Compression techniques
  • Lossy Image coding

9
Lossless Compression Techniques
  • The original data (for example, an image) can be
    recovered exactly upon decompression
  • These algorithms fall into two broad categories
  • Dictionary-based techniques
  • Statistical methods

10
Lossless Compression Techniques
  • Dictionary-Based Techniques
  • Generate a compressed file containing
    fixed-length codes, each representing a particular
    sequence of bytes in the original file
  • Run-Length Encoding
  • The simplest technique: consecutive identical
    elements, or runs, are replaced by a single
    element together with a count of how many are in
    the run
  • "aaaabbaaa" -> "4a2b3a"
  • 9 characters have become 6

11
Lossless Compression Techniques
  • Run-Length Encoding cont..
  • A code stores the length of the run followed by
    the value, rather than storing the value many
    times over
  • WWWWWWWWWWWWBWWWWWWWWWWWWBBB
  • 12WB12W3B (the single B is written without a count)
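
The run-length idea above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the function names rle_encode and rle_decode are made up for this example, the decoder assumes the original text contains no digits, and the encoder always writes the count, so a single B comes out as "1B" rather than a bare "B".

```python
import re
from itertools import groupby


def rle_encode(text):
    """Replace each run of identical characters with '<count><char>'."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


def rle_decode(encoded):
    """Expand '<count><char>' pairs (assumes the original text had no digits)."""
    pairs = re.findall(r"(\d+)(\D)", encoded)
    return "".join(char * int(count) for count, char in pairs)


print(rle_encode("aaaabbaaa"))  # 4a2b3a
print(rle_decode("4a2b3a"))     # aaaabbaaa
```

Note that text with few repeated characters can grow under this scheme, because every run of length one still costs two characters.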

12
Lossless Compression Techniques
  • LZW (Lempel-Ziv-Welch) Encoding
  • Like RLE, it effects compression by encoding
    strings of characters
  • Unlike RLE, it builds up a table of strings and
    their corresponding codes as it encodes the file
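
As a rough sketch of how such a table can be built while encoding, the following Python follows the commonly described LZW scheme. The assumptions are mine, not the slides': the table starts with the 256 single-character codes, the input only contains characters with code points below 256, and the output is a plain list of integers rather than packed fixed-length codes. The names lzw_encode and lzw_decode are illustrative.

```python
def lzw_encode(data):
    """Return a list of integer codes for the input string."""
    # Assumption: every character has a code point below 256.
    table = {chr(i): i for i in range(256)}
    next_code = 256
    current = ""
    output = []
    for ch in data:
        candidate = current + ch
        if candidate in table:
            current = candidate              # keep extending the match
        else:
            output.append(table[current])    # emit code for longest match
            table[candidate] = next_code     # learn the new string
            next_code += 1
            current = ch
    if current:
        output.append(table[current])
    return output


def lzw_decode(codes):
    """Rebuild the string; the table is reconstructed while decoding."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    result = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Code refers to the string that is being defined right now.
            entry = previous + previous[0]
        else:
            raise ValueError("invalid LZW code")
        result.append(entry)
        table[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(result)


print(lzw_encode("ABABABA"))  # [65, 66, 256, 258]
```

The decoder never receives the table; it rebuilds the same entries in the same order, which is why only the codes need to be stored or transmitted.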

13
Lossless Compression Techniques
  • LZW Encoding cont..
  • A dictionary is maintained to catalog pieces of
    data
  • "Ask not what your country can do for you - ask
    what you can do for your country"
  • With each repeated word replaced by its dictionary
    index: 1 not 2 3 4 5 6 7 8 - 1 2 8 5 6 7 3 4
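
The word-level substitution in this example can be reproduced with a short Python sketch. It is a simplification for illustration and not LZW itself: the function name substitute_repeated_words is hypothetical, words are split on spaces, and, as in the slide, "Ask" and "ask" share one dictionary entry, so the original capitalization is not recoverable from the output alone.

```python
from collections import Counter


def substitute_repeated_words(text):
    """Replace every word that occurs more than once with a dictionary index."""
    words = text.split()
    counts = Counter(word.lower() for word in words)
    dictionary = {}   # word -> index, numbered in order of first appearance
    encoded = []
    for word in words:
        key = word.lower()
        if counts[key] > 1:
            if key not in dictionary:
                dictionary[key] = len(dictionary) + 1
            encoded.append(str(dictionary[key]))
        else:
            encoded.append(word)   # words seen only once are kept as-is
    return " ".join(encoded), dictionary


quote = ("Ask not what your country can do for you - "
         "ask what you can do for your country")
encoded, dictionary = substitute_repeated_words(quote)
print(encoded)  # 1 not 2 3 4 5 6 7 8 - 1 2 8 5 6 7 3 4
```

The dictionary itself also has to be stored with the compressed text, so the saving only pays off when the repeated pieces are long or frequent enough.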

14
Lossless Compression Techniques
  • LZW Encoding cont..
  • "Ask not what your country can do for you --
    ask what you can do for your country"
  • The quotation has 17 words: 61 letters, 16 spaces,
    1 dash, and 1 period. If each letter, space, or
    punctuation mark takes up one unit of memory,
    the total file size is 79 units
  • The redundancies that can be noticed are:
  • "ask" appears two times, "what" appears two times
  • "your" appears two times, "country" appears two
    times
  • "can" appears two times, "do" appears two times
  • "for" appears two times, "you" appears two times

15
Lossless Compression Techniques
  • LZW Encoding cont..
  • The compression program has no concept of
    separate words; it only looks for patterns
  • "Ask not what your country can do for you -- ask
    what you can do for your country"

16
Lossless Compression Techniques
  • Adaptive LZW-based encoding example
  • Here an underscore represents a space
  • "1not__2345__-__12354"

17
Lossless Compression Techniques
  • Statistical Encoding Methods
  • They implement data compression by representing
    frequently occurring characters in the file with
    fewer bits than less commonly occurring ones
  • Huffman Coding
  • Uses a binary encoding tree that represents
    commonly occurring values with few bits and less
    commonly occurring values with more bits

18
Lossless Compression Techniques
  • Algorithm to generate Huffman codes
  • Step 1: Find the probability of each symbol and
    sort the probabilities
  • Step 2: Generate a new node by combining the two
    smallest probabilities, then sort the probability
    of the new node with the remaining probabilities
  • Step 3: Assign 1 to one branch of the new node
    and 0 to the other
  • Step 4: Repeat steps 2 and 3 until the final
    probability is 1.0
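
The steps above can be sketched in Python with a priority queue standing in for the repeated sorting. This is one possible illustration under stated assumptions, not the presenter's implementation: the function name huffman_codes is made up, symbols are assumed to be strings (tuples are used internally to mark merged nodes), and the choice of which branch gets 0 and which gets 1 is arbitrary.

```python
import heapq
from itertools import count


def huffman_codes(probabilities):
    """Map each symbol to a bit string, given a dict symbol -> probability."""
    tiebreak = count()   # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), sym) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    # Repeatedly merge the two least probable nodes into a new node.
    while len(heap) > 1:
        p0, _, low = heapq.heappop(heap)
        p1, _, high = heapq.heappop(heap)
        heapq.heappush(heap, (p0 + p1, next(tiebreak), (low, high)))
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: (branch 0, branch 1)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: an original symbol
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


print(huffman_codes({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}))
# {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```

With these probabilities the average code length is 1.75 bits per symbol, compared with 2 bits for a fixed-length code over four symbols.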

19
Lossless Compression Techniques
  • Huffman encoding example

20
Lossless Compression Techniques
  • Huffman encoding example
  • Courtesy www.cs.utexas.edu/scottm/cs307/handouts/Slides

21
Outline
  • Introduction
  • Lossless Compression techniques
  • Lossy Image coding

22
Lossy Compression Algorithms
  • Eliminate redundant as well as irrelevant
    information
  • Only an approximate reconstruction of the
    original image is possible
  • Achieve higher compression ratios

23
Lossy Compression Techniques
  • Vector Quantization
  • Composed of two operations
  • Encoder
  • Takes an input vector and outputs the index of
    the codeword that offers the lowest distortion
  • Once the closest codeword is found, the index of
    that codeword is sent
  • Decoder
  • The decoder receives the index of the codeword,
    and outputs the codeword
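
The encoder and decoder described above can be sketched as follows. This is a minimal illustration that assumes Euclidean distance as the distortion measure and a codebook that is already known to both sides; the names vq_encode and vq_decode and the sample codebook are hypothetical.

```python
import math


def vq_encode(vector, codebook):
    """Return the index of the codeword closest to the input vector."""
    return min(range(len(codebook)),
               key=lambda i: math.dist(vector, codebook[i]))


def vq_decode(index, codebook):
    """Look the codeword up again; this is only an approximation of the input."""
    return codebook[index]


codebook = [(0, 0), (10, 10), (0, 10)]   # example codebook, chosen by hand
index = vq_encode((9, 8), codebook)
print(index, vq_decode(index, codebook))  # 1 (10, 10)
```

Only the index is transmitted, so the decoded vector (10, 10) differs from the input (9, 8); that difference is the distortion a lossy scheme accepts.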

24
Lossy Compression Techniques
  • Vector Quantization cont..
  • Courtesy http://www.geocities.com/mohamedqasem/vectorquantization/vq.html

25
Lossy Compression Techniques
  • The algorithm
  • Step 1: Determine the number of code words, N,
    that is, the size of the codebook
  • Step 2: Select N code words at random from the
    set of input vectors, and let them be the initial
    codebook
  • Step 3: Using the Euclidean distance measure,
    cluster the input vectors around the code words:
    compute the distance between each input vector
    and every codeword, and assign the vector to the
    cluster of the codeword that yields the minimum
    distance

26
Lossy Compression Techniques
  • Algorithm cont..
  • Step 4: Compute the new set of code words by
    taking the average of each cluster: add the
    vectors in the cluster component by component and
    divide by the number of vectors in the cluster
  • Step 5: Repeat steps 3 and 4 until either the code
    words don't change or the change in the code words
    is small
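
The codebook training loop described on the last two slides can be sketched in Python as below. It is a hedged illustration rather than a reference implementation: the function name train_codebook and the parameters max_iters, tolerance, and seed are assumptions added for this example, and an empty cluster simply keeps its previous codeword, a case the slides do not address.

```python
import math
import random


def train_codebook(vectors, n_codewords, max_iters=100, tolerance=1e-9, seed=0):
    """Cluster input vectors and return n_codewords averaged code words."""
    rng = random.Random(seed)
    # Initial codebook: N vectors picked at random from the input set.
    codebook = [tuple(v) for v in rng.sample(vectors, n_codewords)]
    for _ in range(max_iters):
        # Assign each vector to the cluster of its nearest codeword.
        clusters = [[] for _ in codebook]
        for v in vectors:
            nearest = min(range(len(codebook)),
                          key=lambda i: math.dist(v, codebook[i]))
            clusters[nearest].append(v)
        # Replace each codeword with the component-wise mean of its cluster.
        new_codebook = []
        for old, members in zip(codebook, clusters):
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_codebook.append(mean)
            else:
                new_codebook.append(old)   # assumption: keep an unused codeword
        # Stop once the code words have (almost) stopped moving.
        shift = max(math.dist(o, n) for o, n in zip(codebook, new_codebook))
        codebook = new_codebook
        if shift <= tolerance:
            break
    return codebook


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(train_codebook(points, 2))
```

Because the starting code words are random, different seeds can lead to different final codebooks; the procedure only guarantees that the code words settle, not that the clustering is the best possible one.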

27
Questions?