An introduction to Data Compression - PowerPoint PPT Presentation



Created date: 5/5/2005 1:53:41 PM. Document presentation format: On-screen Show (PowerPoint PPT presentation)

Slides: 31
Provided by: 6383156



An introduction to Data Compression
General information
  • Requirements
  • some programming skills (not too much...)
  • knowledge of data structures
  • ... some work!
  • Office hours ...
  • ... please send me an email: monfardini_at_dii.unis

What is compression?
  • Intuitively compression is a method to press
    something into a smaller space.
  • In our domain, a better definition is: making
    information shorter

Some basic questions
  • What is information?
  • How can we measure the amount of information?
  • Why is compression useful?
  • How do we compress?
  • How much can we compress?

What is information? - I
  • Commonly the term information refers to the
    knowledge of some fact, circumstance or thought.
  • For example, when we read a newspaper, the news
    is the information.
  • syntax: letters, punctuation marks, white spaces,
    grammar rules ...
  • semantics: the meaning of the words and of the
    sentences

What is information? - II
  • In our domain, information is merely the syntax,
    i.e. we are interested in the symbols of the
    alphabet used to express the information.
  • In order to give a mathematical definition of
    information we need some principles of Information
    Theory

The fundamental concept
  • A key concept in Information Theory is that
    information is conveyed by randomness
  • How much information does a biased coin whose
    outcome is always heads give us?
  • What about another biased coin, whose outcome is
    heads with 90% probability?
  • We need a way to measure the amount of
    information quantitatively, in a precise
    mathematical sense

The Uncertainty - I
  • Suppose we have a discrete random variable $X$ and
    $x$ is a particular outcome with probability $p(x)$
  • uncertainty of $x$: $I(x) = \log \frac{1}{p(x)} = -\log p(x)$
  • The units are given by the base of the logarithm
  • base 2 → bits
  • base e → nats (base 10 → hartleys)

The Uncertainty - II
  • Suppose the random variable $X$ outputs either
    0 or 1
  • $p(0) = p(1) = 1/2$ → each outcome has 1 bit of
    information
  • $p(0) = 1$, $p(1) = 0$ → 0 gives no information at
    all, while if the outcome is 1 the information is
    infinite ($-\log_2 0 = \infty$)
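
A minimal Python sketch of the uncertainty $I(x) = -\log p(x)$ of a single outcome, in the units listed above. The function name `self_information` and its `base` parameter are chosen here for illustration; they are not taken from any library. An outcome of probability 0 is mapped to infinity, as in the second case above.

```python
import math


def self_information(p: float, base: float = 2.0) -> float:
    """Uncertainty -log_base p(x) of an outcome that has probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if p == 0.0:
        return math.inf  # an impossible outcome: unbounded information
    return math.log(1.0 / p, base)  # log(1/p) == -log(p), written to avoid -0.0


print(self_information(0.5))          # 1.0 (bits)
print(self_information(1.0))          # 0.0: a certain outcome tells us nothing
print(self_information(0.0))          # inf
print(self_information(0.5, math.e))  # ~0.693 nats
```

Passing `base=10` gives the uncertainty in hartleys; the choice of base only rescales the result by a constant factor.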

The Entropy
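  • More useful is the entropy of a random variable
    $X$ with values in a space $\mathcal{X}$:
    $H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)$
  • The entropy is a measure of the average
    uncertainty of the random variable, i.e. the
    expected value of $-\log p(X)$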

The entropy - examples
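  • Consider again a r.v. with only two possible
    outcomes, 0 and 1, and let $p = p(1)$
  • In this case
    $H(X) = -p \log_2 p - (1 - p) \log_2 (1 - p)$,
    which is 1 bit for $p = 1/2$ and 0 for $p = 0$
    or $p = 1$ (with the convention $0 \log 0 = 0$)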
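
The entropy formula translates directly into code. This is a sketch with helper names of our own (`entropy`, `binary_entropy`); terms with zero probability are skipped, which implements the convention $0 \log 0 = 0$. The last line answers the earlier question about the coin that lands heads 90% of the time.

```python
import math
from typing import Iterable


def entropy(probs: Iterable[float], base: float = 2.0) -> float:
    """H(X) = -sum_x p(x) log p(x); outcomes with p(x) = 0 contribute 0 by convention."""
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)


def binary_entropy(p: float) -> float:
    """Entropy in bits of a r.v. that outputs 1 with probability p and 0 otherwise."""
    return entropy([p, 1.0 - p])


print(binary_entropy(0.5))            # 1.0: a fair coin, maximum uncertainty
print(binary_entropy(1.0))            # 0.0: the coin that always lands heads
print(round(binary_entropy(0.9), 3))  # 0.469: the 90% biased coin
```

So each toss of the 90% coin carries on average a bit less than half a bit of information.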

Compression and loss
  • lossless: the decompressed message (file) is an
    exact copy of the original. Useful for text
    compression
  • lossy: some information is lost in the
    decompressed message (file). Useful for image and
    sound compression
  • For now, we ignore lossy compression

Definitions - I
  • A source code $C$ for a r.v. $X$ is a mapping
    $C: \mathcal{X} \to \mathcal{D}^*$, where $\mathcal{D}^*$
    is the set of finite-length strings from a D-ary
    alphabet.
  • $C(x)$: codeword for $x$
  • $l(x)$: length of $C(x)$

Definitions - II
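  • non-singular code (... trivial ...)
  • every element of $\mathcal{X}$ is mapped into a
    different string of $\mathcal{D}^*$:
    $x \neq x' \Rightarrow C(x) \neq C(x')$
  • extension of a code:
    $C^*(x_1 x_2 \cdots x_n) = C(x_1) C(x_2) \cdots C(x_n)$
  • uniquely decodable code
  • its extension is non-singular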
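
To make the extension concrete, here is a small Python sketch (the names `extend` and `all_parses` are hypothetical, chosen for this example) that concatenates codewords and then lists, by brute force, every way a string splits back into codewords. It uses the code labelled Code 2 in the table below: the code is non-singular, yet the string 010 has three different parses, so its extension is singular and the code is not uniquely decodable.

```python
from functools import lru_cache
from typing import Dict, List, Sequence, Tuple

Code = Dict[int, str]  # source symbol -> codeword


def extend(code: Code, message: Sequence[int]) -> str:
    """Extension C*(x1 x2 ... xn) = C(x1) C(x2) ... C(xn)."""
    return "".join(code[x] for x in message)


def all_parses(code: Code, s: str) -> List[Tuple[int, ...]]:
    """All ways of splitting s into codewords (brute force, for tiny examples only)."""
    if any(w == "" for w in code.values()):
        raise ValueError("codewords must be non-empty")

    @lru_cache(maxsize=None)
    def parse_from(i: int) -> Tuple[Tuple[int, ...], ...]:
        if i == len(s):
            return ((),)  # exactly one way to parse the empty remainder
        found = []
        for x, w in code.items():
            if s.startswith(w, i):
                found.extend((x,) + rest for rest in parse_from(i + len(w)))
        return tuple(found)

    return list(parse_from(0))


code2 = {1: "0", 2: "010", 3: "10", 4: "01"}  # Code 2 from the table
print(all_parses(code2, extend(code2, [2])))  # [(1, 3), (2,), (4, 1)]
```

Enumerating parses is exponential in general; it is only meant to show the definition at work, not as a practical test of unique decodability.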

Definitions - III
  • prefix (better, prefix-free) or instantaneous code
  • no codeword is a prefix of any other codeword
  • the advantage is that decoding needs no
    lookahead: each codeword is recognized as soon
    as its last symbol arrives

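Example: with a → 11 and b → 110, after reading 11
the decoder cannot yet tell whether it has received
a or the beginning of b (11? ...)

X   Code 1   Code 2   Code 3   Code 4
1   01       0        10       0
2   110      010      00       10
3   010      10       11       110
4   110      01       110      111

Code 1: singular (2 and 4 share the codeword 110)
Code 2: non-singular, but not uniquely decodable
Code 3: uniquely decodable, but not instantaneous
Code 4: instantaneous
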
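
The two properties that are easy to test mechanically, non-singularity and the prefix condition, can be checked on the four codes of the table with a short Python sketch (all function names are our own). Unique decodability itself needs a more involved test and is not checked here. The decoder at the end works only for prefix-free codes, and illustrates why they are called instantaneous.

```python
from typing import Dict, List

Code = Dict[int, str]

CODES: Dict[str, Code] = {  # the four codes of the table above
    "Code 1": {1: "01", 2: "110", 3: "010", 4: "110"},
    "Code 2": {1: "0", 2: "010", 3: "10", 4: "01"},
    "Code 3": {1: "10", 2: "00", 3: "11", 4: "110"},
    "Code 4": {1: "0", 2: "10", 3: "110", 4: "111"},
}


def is_nonsingular(code: Code) -> bool:
    """Distinct source symbols receive distinct codewords."""
    return len(set(code.values())) == len(code)


def is_prefix_free(code: Code) -> bool:
    """No codeword is a prefix of another codeword (equal codewords count as prefixes)."""
    words = list(code.values())
    return not any(
        words[j].startswith(words[i])
        for i in range(len(words))
        for j in range(len(words))
        if i != j
    )


def decode_instantaneous(code: Code, bits: str) -> List[int]:
    """Decode a prefix-free code: emit a symbol as soon as a codeword is complete."""
    inverse = {w: x for x, w in code.items()}
    out: List[int] = []
    current = ""
    for b in bits:
        current += b
        if current in inverse:  # no lookahead: no other codeword starts with this one
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("input ends in the middle of a codeword")
    return out


for name, code in CODES.items():
    print(name, is_nonsingular(code), is_prefix_free(code))
print(decode_instantaneous(CODES["Code 4"], "110010111"))  # [3, 1, 2, 4]
```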
Kraft Inequality - I
  • Theorem (Kraft Inequality)
  • For any instantaneous code over an alphabet of
    size D, the codeword lengths $l_1, l_2, \dots, l_m$
    must satisfy $\sum_{i=1}^{m} D^{-l_i} \le 1$
  • Conversely, given a set of codeword lengths that
    satisfy this inequality, there exists an
    instantaneous code with these word lengths
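
The inequality is easy to check mechanically. A sketch in Python, using exact rational arithmetic from the standard `fractions` module to avoid rounding; `kraft_sum` is a name made up for this example. With the lengths of Code 4 in the table above the sum is exactly 1, while the lengths of Code 2 give 9/8 > 1, so by the theorem no instantaneous code can have Code 2's lengths.

```python
from fractions import Fraction
from typing import Iterable


def kraft_sum(lengths: Iterable[int], D: int = 2) -> Fraction:
    """Exact value of sum_i D^(-l_i) for codeword lengths l_i over a D-ary alphabet."""
    return sum((Fraction(1, D ** l) for l in lengths), Fraction(0))


print(kraft_sum([1, 2, 3, 3]))  # 1   (lengths of Code 4)
print(kraft_sum([1, 3, 2, 2]))  # 9/8 (lengths of Code 2)
```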

Kraft Inequality - II
  • Consider a complete D-ary tree
  • at level k, there are $D^k$ nodes
  • a node at level $l \le k$ has $D^{k-l}$
    descendants that are nodes at level k

[Figure: a complete D-ary tree, levels 0 to 3]

Kraft Inequality - III
  • Proof (first part)
  • Consider a D-ary tree (not necessarily complete)
    representing the codewords: each path down the
    tree is a sequence of symbols, and each leaf
    (with its unique path) is a codeword. Let
    $l_{\max}$ be the length of the longest codeword.
  • A codeword of length $l_i$, being a leaf,
    implies that at level $l_{\max}$ there are
    $D^{l_{\max} - l_i}$ missing nodes

Kraft Inequality - IV
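  • The total number of possible nodes at level
    $l_{\max}$ is $D^{l_{\max}}$
  • Summing over all codewords (their sets of missing
    nodes are disjoint, since no codeword is a prefix
    of another): $\sum_i D^{l_{\max} - l_i} \le D^{l_{\max}}$
  • Dividing by $D^{l_{\max}}$: $\sum_i D^{-l_i} \le 1$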

Kraft Inequality - V
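  • Proof (converse)
  • Suppose (without loss of generality) that
    codewords are ordered by length, i.e.
    $l_1 \le l_2 \le \cdots \le l_m$
  • Consider a D-ary tree and start assigning each
    codeword to a node, starting from $l_1$.
  • For a generic codeword $i$ with length $l_i$,
    consider the set $K$ of codewords with length
    $l_k \le l_i$, except $i$.
  • Suppose there is no available node at level $l_i$.
    Since every codeword already placed (all of them
    belong to $K$) removes $D^{l_i - l_k}$ nodes at that
    level, this means $\sum_{k \in K} D^{l_i - l_k} \ge D^{l_i}$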

Kraft Inequality - VI
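  • but this means that $\sum_{k \in K} D^{-l_k} \ge 1$
  • Then
    $\sum_{j=1}^{m} D^{-l_j} \ge \sum_{k \in K} D^{-l_k} + D^{-l_i} > 1$
  • which is absurd, since the lengths satisfy the
    Kraft inequality. Hence the obtained tree
    represents an instantaneous code with the desired
    codeword lengths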
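
The converse proof is constructive, so it can be turned into code. This Python sketch (names `to_base` and `prefix_code_from_lengths` are hypothetical) places the codewords in order of length, each on the leftmost free node of its level, exactly as in the proof; the check `node >= D ** level` is the "no available node" case, which the Kraft inequality rules out.

```python
from typing import List


def to_base(value: int, D: int, width: int) -> str:
    """Write value with exactly `width` base-D digits (single-character digits, so D <= 10)."""
    digits = []
    for _ in range(width):
        value, r = divmod(value, D)
        digits.append(str(r))
    return "".join(reversed(digits))


def prefix_code_from_lengths(lengths: List[int], D: int = 2) -> List[str]:
    """Build an instantaneous code with the given lengths, following the converse proof:
    place codewords in order of length, each on the leftmost free node of its level."""
    if not 2 <= D <= 10:
        raise ValueError("this sketch supports 2 <= D <= 10")
    codewords = [""] * len(lengths)
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])  # l_1 <= l_2 <= ...
    node, level = 0, 0  # `node` = index of the leftmost free node at `level`
    for i in order:
        node *= D ** (lengths[i] - level)  # go down to level l_i (first descendant)
        level = lengths[i]
        if node >= D ** level:  # no free node left: the Kraft inequality is violated
            raise ValueError("lengths do not satisfy the Kraft inequality")
        codewords[i] = to_base(node, D, level)
        node += 1  # this node is now a leaf; the next free one is to its right
    return codewords


print(prefix_code_from_lengths([1, 2, 3, 3]))  # ['0', '10', '110', '111']
```

With the lengths of Code 4 it rebuilds Code 4 itself; with the lengths of Code 2 (Kraft sum 9/8) it raises `ValueError`.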

Models and coders
[Figure: model and coder diagram; label: compressed text]
  • The model supplies the probabilities of the
    symbols (or of the group of symbols, as we will
    see later)
  • The coder encodes and decodes starting from these
    probabilities

Good modeling is crucial
  • What happens if the true probabilities of the
    symbols to be coded are $p(x)$ but we use $q(x)$?
  • Simply, the compressed text will be longer, i.e.
    the average number of bits/symbol will be greater
  • The difference in bits/symbol between the two
    probability mass functions p and q can be
    calculated; it is known as relative entropy:
    $D(p \| q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}$
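
A small numeric check of this statement, as a Python sketch. It assumes ideal code lengths of $-\log_2 q(x)$ bits per symbol (real coders only approximate these), and the two distributions are made-up numbers. The average cost under the wrong model is $H(p) + D(p \| q)$.

```python
import math
from typing import Sequence


def avg_bits(p: Sequence[float], q: Sequence[float]) -> float:
    """Average bits/symbol when symbols follow p but each x gets the ideal length -log2 q(x)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf  # a symbol that occurs but was given probability 0
            total += pi * math.log2(1.0 / qi)
    return total


def relative_entropy(p: Sequence[float], q: Sequence[float]) -> float:
    """D(p || q) = sum_x p(x) log2(p(x) / q(x)), in bits/symbol."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log2(pi / qi)
    return total


p = [0.5, 0.25, 0.25]  # true distribution (made-up numbers)
q = [0.25, 0.25, 0.5]  # distribution assumed by the model
print(avg_bits(p, p), avg_bits(p, q), relative_entropy(p, q))  # 1.5 1.75 0.25
```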

Finite-context models
  • In English text ...
  • ... but
  • A finite-context model of order m uses the
    previous m symbols to make the prediction
  • Better modeling, but many more probabilities
    must be estimated
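
A sketch of how an order-m model could gather its statistics, in Python. The helper names and the toy text are ours; probabilities are plain maximum-likelihood estimates (count / total for that context), with no handling of unseen symbols. Comparing order 0 with order 1 shows both the gain in prediction and the growth in the number of contexts.

```python
from collections import Counter, defaultdict
from typing import Dict


def context_counts(text: str, m: int) -> Dict[str, Counter]:
    """Order-m statistics: for each context (the previous m symbols), count what follows."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for i in range(m, len(text)):
        counts[text[i - m:i]][text[i]] += 1
    return counts


def predict(counts: Dict[str, Counter], context: str, symbol: str) -> float:
    """Maximum-likelihood estimate of P(symbol | context); 0.0 if the context was never seen."""
    c = counts.get(context)
    if not c:
        return 0.0
    return c[symbol] / sum(c.values())


text = "quaint quiet quays"  # toy text, made up for this example
order0 = context_counts(text, 0)
order1 = context_counts(text, 1)
print(round(predict(order0, "", "u"), 3))  # 0.167: P(u) ignoring context
print(predict(order1, "q", "u"))           # 1.0: P(u | previous symbol is q)
print(len(order0), len(order1))            # number of contexts to keep statistics for
```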

Finite-state models
[Figure: two-state finite-state model over {a, b}; transitions labelled a 0.5, b 0.5, a 0.99, b 0.01]
  • Although potentially more powerful (e.g. they can
    model whether an odd or even number of a's has
    occurred consecutively), they are not so popular.
  • Obviously the decoder uses the same model, so
    encoder and decoder are always in the same state
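
A Python sketch of a finite-state model for the parity example mentioned above. The state records whether the current run of consecutive a's is even or odd; this model and its probabilities are hypothetical (they are not the model of the figure) and describe a source in which a's tend to come in pairs. The code length uses ideal lengths $-\log_2 P(\text{symbol} \mid \text{state})$.

```python
import math
from typing import Dict

# Hypothetical two-state model over {a, b}; the probabilities are invented for illustration.
PROBS: Dict[str, Dict[str, float]] = {
    "even": {"a": 0.5, "b": 0.5},
    "odd": {"a": 0.9, "b": 0.1},
}


def next_state(state: str, symbol: str) -> str:
    if symbol == "b":
        return "even"  # b ends the run: zero consecutive a's, an even number
    return "odd" if state == "even" else "even"


def code_length_bits(text: str, state: str = "even") -> float:
    """Ideal bits sum(-log2 P(symbol | state)) to code text with this finite-state model."""
    bits = 0.0
    for s in text:
        bits -= math.log2(PROBS[state][s])
        state = next_state(state, s)  # the decoder, having decoded s, makes the same move
    return bits


print(code_length_bits("aabaab"), code_length_bits("ababab"))
```

Text made of a-pairs is cheap under this model, while isolated a's are expensive; an order-0 model could not tell the two strings apart, since both contain three a's and three b's.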

Static models
  • A model is static if we set up a reasonable
    probability distribution and use it for all the
    texts to be coded.
  • Performance is poor when the sources are of
    different kinds (English text, financial data...)
  • One solution is to have K different models and to
    send the index of the model used
  • ... but cf. the book Gadsby by E. V. Wright, a
    novel written without the letter e
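
The "K models plus an index" idea can be sketched in Python as follows. The two static models are made up, code lengths are the ideal $-\log_2 p(x)$, and the index is assumed to cost $\lceil \log_2 K \rceil$ bits; the helper names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Model = Dict[str, float]  # symbol -> probability


def ideal_bits(text: str, model: Model) -> float:
    """Bits to code text with ideal lengths -log2 p(x) under a static model."""
    total = 0.0
    for ch in text:
        p = model.get(ch, 0.0)
        if p == 0.0:
            return math.inf  # this model cannot code the symbol at all
        total -= math.log2(p)
    return total


def choose_model(text: str, models: List[Model]) -> Tuple[int, float]:
    """Return (index of the best model, total bits including the index itself)."""
    costs = [ideal_bits(text, m) for m in models]
    best = min(range(len(models)), key=lambda k: costs[k])
    index_bits = math.ceil(math.log2(len(models))) if len(models) > 1 else 0
    return best, costs[best] + index_bits


# Two made-up static models: one for letter-heavy text, one for digit-heavy data.
models = [
    {"a": 0.5, "b": 0.25, "1": 0.25},
    {"a": 0.125, "b": 0.125, "1": 0.75},
]
print(choose_model("111a", models))
```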

Adaptive models
  • In order to solve the problems of static
    modeling, adaptive (or dynamic) models begin with
    a bland probability distribution, which is refined
    as more symbols of the text become known
  • The encoder and the decoder have the same initial
    distribution, and the same rules to alter it
  • Adaptive models of order m > 0 are also possible
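
A minimal adaptive order-0 model in Python, as an illustration under our own assumptions: the "bland" initial distribution is uniform (every count starts at 1, one of the zero-frequency fixes listed on the next slide), and the code length is computed with ideal lengths. The key point is the order of operations: each symbol is coded with the current estimate and only then counted, so the decoder can repeat the same updates.

```python
import math
from typing import Dict, Iterable


class AdaptiveOrder0Model:
    """Adaptive order-0 model: starts uniform ("bland") and learns symbol counts."""

    def __init__(self, alphabet: Iterable[str]):
        # Every count starts at 1, so no symbol ever has probability zero.
        self.counts: Dict[str, int] = {s: 1 for s in alphabet}
        self.total = len(self.counts)

    def prob(self, symbol: str) -> float:
        return self.counts[symbol] / self.total

    def update(self, symbol: str) -> None:
        self.counts[symbol] += 1
        self.total += 1


def adaptive_code_length(text: str, alphabet: str) -> float:
    """Ideal bits to code text; the decoder can mirror this because it sees the same past."""
    model = AdaptiveOrder0Model(alphabet)
    bits = 0.0
    for s in text:
        bits -= math.log2(model.prob(s))  # code s with the current estimate...
        model.update(s)                   # ...then update, exactly as the decoder will
    return bits


print(adaptive_code_length("aaaa", "ab"))  # log2(5) ~ 2.32 bits, versus 4 bits with a fixed 1/2
```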

The zero-frequency problem
  • The situation in which a symbol is predicted with
    probability zero should be avoided, as it cannot
    be coded
  • One solution: the total number of symbols in the
    text is increased by 1, and the resulting 1/total
    probability is divided among all unseen symbols
  • Another solution: add 1 to the count of every
    symbol
  • Many more solutions...
  • Which is best? If the text is sufficiently long,
    the compression obtained is similar
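
The two solutions described above, sketched in Python (`method_a` and `method_b` are our own labels, and the alphabet and text are toy examples). Both give the unseen symbol z a non-zero probability; they differ in how much mass they take away from the symbols already seen.

```python
from collections import Counter
from typing import Dict


def method_a(counts: Counter, alphabet: str) -> Dict[str, float]:
    """Total increased by 1; the extra 1/total is shared equally by the unseen symbols."""
    total = sum(counts.values()) + 1
    unseen = [s for s in alphabet if counts[s] == 0]
    probs = {s: counts[s] / total for s in alphabet if counts[s] > 0}
    for s in unseen:
        probs[s] = 1 / (total * len(unseen))
    # Note: if every symbol has been seen, the extra 1/total is left unused in this sketch.
    return probs


def method_b(counts: Counter, alphabet: str) -> Dict[str, float]:
    """Add 1 to the count of every symbol of the alphabet."""
    total = sum(counts.values()) + len(alphabet)
    return {s: (counts[s] + 1) / total for s in alphabet}


alphabet = "abcdrz"              # toy alphabet, made up; 'z' never occurs below
counts = Counter("abracadabra")  # a:5 b:2 r:2 c:1 d:1, 11 symbols in total
for method in (method_a, method_b):
    probs = method(counts, alphabet)
    print(method.__name__, round(probs["a"], 3), round(probs["z"], 3))
```

As the counts grow, the extra mass given to unseen symbols becomes negligible under both methods, which is why their compression converges on long texts.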

Symbolwise and dictionary models
  • The set of all possible symbols of a source is
    called the alphabet
  • Symbolwise models provide an estimated
    probability for each symbol in the alphabet
  • Dictionary models instead replace substrings in a
    text with codewords that identify each substring
    in a collection, called dictionary or codebook
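
A toy dictionary model in Python, to contrast with the symbolwise sketches above. The codebook is made up (all single letters and the space, plus two phrases), and parsing is greedy longest-match, a simple common choice that is not guaranteed to give the fewest codewords.

```python
from typing import Dict, List


def dictionary_encode(text: str, codebook: Dict[str, int]) -> List[int]:
    """Replace substrings by their codebook index, greedily taking the longest match."""
    max_len = max(len(entry) for entry in codebook)
    out: List[int] = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in codebook:
                out.append(codebook[piece])
                i += length
                break
        else:  # no entry matched, not even a single symbol
            raise ValueError(f"no codebook entry matches at position {i}")
    return out


def dictionary_decode(indices: List[int], codebook: Dict[str, int]) -> str:
    inverse = {index: entry for entry, index in codebook.items()}
    return "".join(inverse[k] for k in indices)


# Made-up codebook: every single symbol (so any text over it is codable) plus two phrases.
codebook = {c: k for k, c in enumerate("abcdefghijklmnopqrstuvwxyz ")}
codebook.update({"the ": 27, "and ": 28})

text = "the cat and the dog"
encoded = dictionary_encode(text, codebook)
print(encoded)  # [27, 2, 0, 19, 26, 28, 27, 3, 14, 6]
```

The 19 symbols of the text become 10 codewords; how many bits each codeword costs is a separate question for the coder.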