1
An introduction to Data Compression
2
General information
  • Requirements
  • some programming skills (not that much...)
  • knowledge of data structures
  • ... some work!
  • Office hours ...
  • ... please write me an email: monfardini@dii.unisi.it

3
What is compression?
  • Intuitively, compression is a method to press
    something into a smaller space.
  • In our domain, a better definition is "to make
    information shorter"

4
Some basic questions
  • What is information?
  • How can we measure the amount of information?
  • Why is compression useful?
  • How do we compress?
  • How much can we compress?

5
What is information? - I
  • Commonly, the term information refers to the
    knowledge of some fact, circumstance or thought.
  • For example, when reading a newspaper, the news is
    the information.
  • syntax
  • letters, punctuation marks, white spaces, grammar
    rules ...
  • semantics
  • meaning of the words and of the sentences

6
What is information? - II
  • In our domain, information is merely the syntax,
    i.e. we are interested in the symbols of the
    alphabet used to express the information.
  • In order to give a mathematical definition of
    information, we need some principles of Information
    Theory

7
The fundamental concept
  • A key concept in Information Theory is that
    information is conveyed by randomness
  • What information does a biased coin give us, if its
    outcome is always heads?
  • What about another biased coin, whose outcome is
    heads with probability 0.9?
  • We need a way to measure quantitatively the
    amount of information in some mathematical sense

8
The Uncertainty - I
  • Suppose we have a discrete random variable X and
    x is a particular outcome with probability p(x)
  • The uncertainty (self-information) of x is
    h(x) = -log p(x) = log(1/p(x))  (a small numeric
    sketch follows)
  • The units are given by the base of the logarithm
  • base 2 → bits
  • base e → nats
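As an illustrative sketch, the uncertainty of a single outcome could be computed as below (Python; the function name and the example probabilities are hypothetical):

    import math

    def uncertainty(p, base=2):
        # Self-information of an outcome with probability p,
        # in units determined by the base of the logarithm.
        return -math.log(p, base)

    print(uncertainty(0.5))   # 1.0 bit: a fair coin flip
    print(uncertainty(0.9))   # ~0.15 bits: a very likely outcome says little
    print(uncertainty(0.1))   # ~3.32 bits: a rare outcome says a lot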

9
The Uncertainty - II
  • Suppose the random variable outputs 0 and 1 with
    equal probability 1/2
  • → each outcome carries 1 bit of information
  • Suppose instead that p(0) is close to 1
  • → the outcome 0 gives almost no information, while if
    the outcome is 1 the information carried is large,
    log(1/p(1)) bits

10
The Entropy
  • More useful is the entropy of a random variable X
    with values in a space Ω:
    H(X) = -Σ_x p(x) log p(x)
  • The entropy is a measure of the average
    uncertainty of the random variable (a short sketch
    follows)
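A minimal sketch of the definition (Python; the function name and the example distributions are hypothetical):

    import math

    def entropy(probs, base=2):
        # H(X) = -sum p(x) log p(x): the average uncertainty of X.
        return -sum(p * math.log(p, base) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))    # 1.0 bit
    print(entropy([0.9, 0.1]))    # ~0.47 bits
    print(entropy([0.25] * 4))    # 2.0 bits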

11
The entropy - examples
  • Consider again a r.v. X with only two possible
    outcomes, 0 and 1, with probabilities p and 1-p
  • In this case
    H(X) = -p log p - (1-p) log(1-p)
  • for p = 1/2, H(X) = 1 bit; for p = 0.9, H(X) ≈ 0.47
    bits; for p = 0 or p = 1, H(X) = 0

12
Compression and loss
  • lossless
  • decompressed message (file) is an exact copy of
    the original. Useful for text compression
  • lossy
  • some information is lost in the decompressed
    message (file). Useful for image and sound
    compression
  • We ignore lossy compression for a while

13
Definitions - I
  • A source code C for a r.v. X is a mapping from the
    range of X to D*, the set of finite-length strings
    over a D-ary alphabet
  • C(x) is the codeword for x
  • l(x) is the length of C(x)

14
Definitions - II
  • non-singular code (... trivial ...)
  • every element of the range of X is mapped to a
    different string of D*
  • extension of a code
  • the concatenation C(x1)C(x2)...C(xn) obtained by
    coding a sequence of symbols
  • uniquely decodable code
  • its extension is non-singular

15
Definitions - III
  • prefix (better, prefix-free) or instantaneous code
  • no codeword is a prefix of any other codeword
  • the advantage is that decoding needs no
    look-ahead (see the sketch below)

codewords
a   11
b   110
(after reading 11, the decoder cannot yet tell whether
the symbol is a or the start of b)
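A small sketch of the prefix-free property (Python; the helper name is hypothetical): the code {11, 110} above fails the check, while Code 4 of the next slide passes it.

    def is_prefix_free(codewords):
        # True if no codeword is a prefix of any other codeword.
        for c1 in codewords:
            for c2 in codewords:
                if c1 != c2 and c2.startswith(c1):
                    return False
        return True

    print(is_prefix_free(["11", "110"]))              # False: 11 is a prefix of 110
    print(is_prefix_free(["0", "10", "110", "111"]))  # True: instantaneous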
16
Examples
  Symbol  Code 1  Code 2  Code 3  Code 4
  1       01      0       10      0
  2       110     010     00      10
  3       010     10      11      110
  4       110     01      110     111

  Code 1: singular
  Code 2: non-singular, but not uniquely decodable
  Code 3: uniquely decodable, but not instantaneous
  Code 4: instantaneous
17
Kraft Inequality - I
  • Theorem (Kraft Inequality)
  • For any instantaneous code over an alphabet of
    size D, the codeword lengths l_1, l_2, ..., l_m must
    satisfy  Σ_i D^(-l_i) ≤ 1
  • Conversely, given a set of codeword lengths that
    satisfy this inequality, there exists an
    instantaneous code with these word lengths
    (a numeric check is sketched below)
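The inequality can be checked numerically, as in this sketch (Python; the function name and the sample lengths are hypothetical):

    def kraft_sum(lengths, D=2):
        # Sum of D^(-l) over the codeword lengths; it must be <= 1
        # for an instantaneous code with these lengths to exist.
        return sum(D ** (-l) for l in lengths)

    print(kraft_sum([1, 2, 3, 3]))  # 1.0  -> an instantaneous binary code exists
    print(kraft_sum([1, 1, 2]))     # 1.25 -> no instantaneous binary code exists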

18
Kraft Inequality - II
  • Consider a complete D-ary tree
  • at level k, there are D^k nodes
  • a node at level l < k has D^(k-l)
    descendants that are nodes at level k

[figure: a complete D-ary tree, levels 0 through 3]
19
Kraft Inequality - III
  • Proof (direct part)
  • Consider a D-ary tree (not necessarily complete)
    representing the codewords: each path down the
    tree is a sequence of symbols, and each leaf
    (with its unique path) is a codeword. Let l_max be
    the length of the longest codeword.
  • A codeword of length l_i, being a leaf,
    implies that at level l_max there are D^(l_max - l_i)
    missing nodes (the descendants it cuts off)

20
Kraft Inequality - IV
  • The total number of possible nodes at level l_max
    is D^l_max
  • Summing over all codewords:
    Σ_i D^(l_max - l_i) ≤ D^l_max
  • Dividing by D^l_max:
    Σ_i D^(-l_i) ≤ 1

21
Kraft Inequality - V
  • Proof (converse part)
  • Suppose (without loss of generality) that
    codewords are ordered by length, i.e.
    l_1 ≤ l_2 ≤ ... ≤ l_m
  • Consider a D-ary tree and start assigning each
    codeword to a node, starting from l_1
  • For a generic codeword with length l_i, consider
    the set K of codewords with length ≤ l_i, except i
  • Suppose there is no available node at level l_i,
    that is, every node at level l_i is used by or
    descends from a codeword in K

22
Kraft Inequality - VI
  • but this means that
    Σ_(j in K) D^(l_i - l_j) ≥ D^l_i
  • Then, dividing by D^l_i and adding the term
    D^(-l_i) for i itself,
    Σ_j D^(-l_j) > 1
  • which is absurd, since the lengths satisfy the
    Kraft inequality. Hence a free node always exists,
    and the tree obtained represents an instantaneous
    code with the desired codeword lengths (a
    constructive sketch follows)
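The constructive half of the proof can be sketched as code: sort the lengths and assign each codeword the next free node at the required depth, as in the argument above (Python, binary alphabet; the function name is hypothetical and the lengths are assumed to satisfy the inequality):

    def kraft_code(lengths):
        # Build a binary prefix code with the given codeword lengths.
        lengths = sorted(lengths)
        codewords, node = [], 0
        for i, l in enumerate(lengths):
            if i > 0:
                # Take the next free node, then descend to depth l.
                node = (node + 1) << (l - lengths[i - 1])
            codewords.append(format(node, "0{}b".format(l)))
        return codewords

    print(kraft_code([1, 2, 3, 3]))   # ['0', '10', '110', '111']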

23
Models and coders
[diagram: text → (model + encoder) → compressed text → (model + decoder) → text]
  • The model supplies the probabilities of the
    symbols (or of groups of symbols, as we will
    see later)
  • The coder encodes and decodes using these
    probabilities

24
Good modeling is crucial
  • What happens if the true probabilities of the
    symbols to be coded are p, but we use q?
  • Simply, the compressed text will be longer, i.e. the
    average number of bits/symbol will be greater
  • The difference in bits/symbol between the two
    probability mass functions p and q is known as the
    relative entropy:
    D(p||q) = Σ_x p(x) log(p(x)/q(x))  (sketched below)
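A numeric sketch of this extra cost (Python; the function name and the example distributions are hypothetical):

    import math

    def relative_entropy(p, q, base=2):
        # D(p || q): extra bits per symbol paid when the true
        # distribution is p but the coder uses q.
        return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)

    p = [0.5, 0.5]   # true probabilities
    q = [0.9, 0.1]   # probabilities assumed by the model
    print(relative_entropy(p, q))   # ~0.74 extra bits per symbol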

25
Finite-context models
  • in English text, the probability of a symbol
    considered in isolation ...
  • ... can be very different from its probability
    conditioned on the preceding symbols
  • A finite-context model of order m uses the
    previous m symbols to make the prediction
    (a counting sketch follows)
  • Better modeling, but we need to estimate many more
    probabilities
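An order-m model can be sketched by simple counting (Python; the names and the toy text are hypothetical):

    from collections import defaultdict

    def train_context_model(text, m):
        # For every context of m symbols, count how often each
        # symbol follows it; relative counts give the predictions.
        counts = defaultdict(lambda: defaultdict(int))
        for i in range(m, len(text)):
            context, symbol = text[i - m:i], text[i]
            counts[context][symbol] += 1
        return counts

    counts = train_context_model("the theory of the code", 2)
    following = counts["th"]
    total = sum(following.values())
    print({s: c / total for s, c in following.items()})   # {'e': 1.0}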

26
Finite-state models
[diagram: a two-state model (states 1 and 2) with transitions labeled a 0.5, b 0.5 and a 0.99, b 0.01]
  • Although potentially more powerful (e.g. they can
    model whether an odd or even number of a's has
    occurred consecutively), they are not so popular
  • Obviously the decoder uses the same model, so
    encoder and decoder are always in the same state
    (a toy example follows)
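A toy version of such a model (Python; the states, transition rule and probabilities are hypothetical, chosen to track whether an even or odd number of a's has occurred):

    # Each state carries its own symbol distribution; the next state
    # depends on the symbol just seen. Encoder and decoder update it
    # with the same rule, so they stay synchronized.
    model = {
        "even": {"a": 0.5, "b": 0.5},
        "odd":  {"a": 0.99, "b": 0.01},
    }

    def next_state(state, symbol):
        if symbol == "a":
            return "odd" if state == "even" else "even"
        return state

    state = "even"
    for symbol in "aab":
        p = model[state][symbol]   # probability the coder would use here
        state = next_state(state, symbol)
        print(symbol, p, state)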

27
Static models
  • A model is static if we set up a reasonable
    probability distribution and use it for all the
    texts to be coded
  • Poor performance in case of different kinds of
    sources (English text, financial data, ...)
  • One solution is to have K different models and to
    send the index of the model used
  • ... but cf. the book Gadsby by E. V. Wright (a novel
    written entirely without the letter 'e')

28
Adaptive models
  • In order to solve the problems of static
    modeling, adaptive (or dynamic) models begin with
    a bland probability distribution that is refined
    as more symbols of the text are seen
  • The encoder and the decoder have the same initial
    distribution, and the same rules to alter it
  • There can also be adaptive models of order m > 0

29
The zero-frequency problem
  • The situation in which a symbol is predicted with
    probability zero should be avoided, as such a
    symbol cannot be coded
  • One solution: the total count of symbols in the
    text is increased by 1, and this 1/total probability
    is divided among all unseen symbols
  • Another solution: augment by 1 the count of
    every symbol (sketched below)
  • Many more solutions...
  • Which is the best? If the text is sufficiently long,
    the compression is similar
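The second solution (incrementing every count by 1) can be sketched as follows (Python; the names and the toy counts are hypothetical):

    def adaptive_probability(counts, alphabet, symbol):
        # Add-one estimate: every symbol behaves as if it had been
        # seen once already, so no probability is ever zero.
        total = sum(counts.get(s, 0) + 1 for s in alphabet)
        return (counts.get(symbol, 0) + 1) / total

    counts = {"a": 3, "b": 1}          # symbols seen so far
    alphabet = ["a", "b", "c"]
    print(adaptive_probability(counts, alphabet, "c"))   # 1/7, although 'c' was never seen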

30
Symbolwise and dictionary models
  • The set of all possible symbols of a source is
    called the alphabet
  • Symbolwise models provide an estimated
    probability for each symbol in the alphabet
  • Dictionary models instead replace substrings in a
    text with codewords that identify each substring
    in a collection, called dictionary or codebook
    (a toy sketch follows)
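A toy sketch of the dictionary idea (Python; the codebook, its indices and the function name are hypothetical):

    def dictionary_encode(text, codebook):
        # Greedy substitution: emit the codeword (here, an index) for
        # substrings found in the codebook, raw characters otherwise.
        out, i = [], 0
        while i < len(text):
            for phrase, index in codebook.items():
                if text.startswith(phrase, i):
                    out.append(index)
                    i += len(phrase)
                    break
            else:
                out.append(text[i])
                i += 1
        return out

    codebook = {"the ": 0, "code ": 1}
    print(dictionary_encode("the code the book", codebook))
    # [0, 1, 0, 'b', 'o', 'o', 'k']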