1
010.141 Engineering Mathematics II
Lecture 15: Entropy
  • Bob McKay
  • School of Computer Science and Engineering
  • College of Engineering
  • Seoul National University

2
Outline
  • Axiomatising uncertainty
  • Entropy
  • Coding

3
Surprise!!
4
Axiomatising Surprise
  • Can we do mathematics about surprise?
  • Clearly, it has something to do with probability
  • We are surprised when something improbable
    happens
  • We are not surprised by probable events
  • To do mathematics, we have to be able to
    formalise and axiomatise the property we are
    interested in
  • Can we write axioms for surprise?
  • Can we relate it to probability?

5
Surprise and Certainty
  • When we know that something is certain to happen,
    we won't be surprised by it
  • Axiom 1
  • S(1) = 0

6
Increasing Surprise
  • The more improbable an event, the more surprised
    we are by it
  • Axiom 2
  • Surprise strictly decreases with probability
  • If p < q then S(p) > S(q)

7
Continuity of Surprise
  • If probability changes by a small amount, we
    expect to only change our level of surprise by a
    small amount
  • Axiom 3
  • S(p) is a continuous function of p

8
Additivity of Surprise
  • Suppose E and F are independent events with
    probabilities p and q
  • We would expect the additional surprise on
    learning F to be the same whether or not we
    already know E
  • (of course, it might change if E and F were
    dependent)
  • Axiom 4
  • S(pq) = S(p) + S(q) for p and q between 0 and 1

9
Defining Surprise
  • Surprisingly, only a very few mathematical
    functions satisfy all these axioms
  • Theorem
  • If S(·) satisfies Axioms 1 - 4, then
  • S(p) = -C log2 p
  • where C is a positive constant
  • Usually, we set C = 1, and speak of a value in
    bits (see the sketch below)
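A minimal Python sketch (not from the original slides) of the surprise function with C = 1, checking the axioms on a few values:

    import math

    def surprise(p, C=1.0):
        # S(p) = -C * log2(p), written here as C * log2(1/p); with C = 1 the unit is bits
        return C * math.log2(1.0 / p)

    print(surprise(1.0))                  # 0.0  (Axiom 1: a certain event is no surprise)
    print(surprise(0.5), surprise(0.25))  # 1.0 2.0  (Axiom 2: smaller p, bigger surprise)
    print(surprise(0.5 * 0.25))           # 3.0  (Axiom 4: S(pq) = S(p) + S(q))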

10
Surprise and Entropy
  • Suppose X is a random variable with values x1 to
    xn, and probabilities p1 to pn
  • Our surprise on learning xi is thus -log2 pi
  • Hence our expected surprise on learning the value
    of X is
  • H(X) = -∑i=1..n pi log2 pi
  • H(X) is known as the entropy of X
  • We can also treat it as the information given by X
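A short Python sketch of this definition (the example distribution is the one used on the coding slides below):

    import math

    def entropy(probs):
        # H(X) = -sum_i p_i * log2(p_i), in bits; terms with p_i = 0 contribute nothing
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75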

11
Relative Entropy
  • Given two random variables, X and Y, we can
    consider the relative information (uncertainty)
    remaining in X given that we know Y
  • Theorem
  • H(X, Y) = H(Y) + HY(X)
  • Corollary
  • HY(X) ≤ H(X) (equal only if X and Y are
    independent)
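A small Python check of the theorem (the joint distribution here is a made-up example, not from the lecture):

    import math

    def H(probs):
        # entropy in bits
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # Hypothetical joint distribution p(x, y) for two binary variables (rows: x, columns: y)
    joint = [[0.3, 0.2],
             [0.1, 0.4]]

    p_y = [joint[0][j] + joint[1][j] for j in range(2)]        # marginal of Y
    H_XY = H([p for row in joint for p in row])                # H(X, Y)
    # HY(X) = sum over y of p(y) * H(X given Y = y)
    HY_X = sum(p_y[j] * H([joint[0][j] / p_y[j], joint[1][j] / p_y[j]]) for j in range(2))

    print(round(H_XY, 6), round(H(p_y) + HY_X, 6))   # both 1.846439: H(X,Y) = H(Y) + HY(X)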

12
Coding and Entropy
  • Suppose we want to send a message between two
    places
  • For example, we might want to send a DNA sequence
    in binary code as compactly as possible
  • DNA is composed of A,C,T,G
  • Possible codings

A → 00, C → 01, T → 10, G → 11
A → 0, C → 10, T → 110, G → 111
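As a rough sketch of how the second (variable-length) coding can be used, and why reading from the left is unambiguous (the helper functions are illustrative, not from the slides):

    CODE = {'A': '0', 'C': '10', 'T': '110', 'G': '111'}   # the second coding above
    DECODE = {bits: base for base, bits in CODE.items()}

    def encode(seq):
        return ''.join(CODE[base] for base in seq)

    def decode(bits):
        decoded, current = [], ''
        for b in bits:
            current += b
            if current in DECODE:                # no codeword extends another, so the
                decoded.append(DECODE[current])  # first match is always the right one
                current = ''
        return ''.join(decoded)

    print(encode('ACTGA'))        # 0101101110
    print(decode('0101101110'))   # ACTGA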
13
Coding and Entropy
  • These two codings have the necessary property
    that (reading from the left) no codeword is a
    prefix of another
  • Necessary to avoid ambiguity
  • Forbids codes such as

A → 0, C → 1, T → 00, G → 01
(ambiguous: the string 00 could mean AA or T)
14
Noiseless Coding
  • Theorem
  • Suppose X is a random variable with values x1 to
    xn, and probabilities p1 to pn
  • For any coding (with the prefix property above)
    that assigns ni bits to xi
  • ∑i=1..n ni p(xi) ≥ H(X) = -∑i=1..n p(xi) log2 p(xi)

15
Noiseless Coding
  • Can we achieve this bound?
  • Let p(A) = 0.5, p(C) = 0.25, p(T) = p(G) = 0.125
  • Given
  • A ? 0
  • C ? 10
  • T ? 110
  • G ? 111
  • We achieve the bound: the expected code length
    and H(X) are both 1.75 bits (see the check below)
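A quick Python check of this claim:

    import math

    p      = {'A': 0.5, 'C': 0.25, 'T': 0.125, 'G': 0.125}
    length = {'A': 1,   'C': 2,    'T': 3,     'G': 3}      # code lengths of the coding above

    H = -sum(q * math.log2(q) for q in p.values())          # entropy of the source
    L = sum(p[s] * length[s] for s in p)                    # expected bits per symbol

    print(H, L)   # 1.75 1.75 -- the coding meets the entropy bound exactly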

16
Noiseless Coding
  • Can we always achieve this bound?
  • Let p(A) = 0.45, p(C) = 0.3, p(T) = p(G) = 0.125
  • No coding can achieve the bound (the ideal lengths
    -log2 p(xi) are no longer whole numbers of bits)
  • However we can always achieve a coding that comes
    within 1 bit of the bound
  • That is, H(X) ≤ L < H(X) + 1, where L is the
    expected number of bits per symbol
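The same check for this distribution (the lengths 1, 2, 3, 3 are those of the earlier coding, which happen to be the best integer lengths here):

    import math

    p      = {'A': 0.45, 'C': 0.3, 'T': 0.125, 'G': 0.125}
    length = {'A': 1,    'C': 2,   'T': 3,     'G': 3}

    H = -sum(q * math.log2(q) for q in p.values())
    L = sum(p[s] * length[s] for s in p)

    print(round(H, 4), round(L, 4))   # 1.7895 1.8 -- within 1 bit of the bound, but above it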

17
Noisy Coding
  • What if the channel we are transmitting over adds
    noise to the signal?
  • Now, of course, we want redundancy in the coding
    to guarantee receipt of the message
  • For example, the coding
  • A → 000000
  • C → 000111
  • T → 111000
  • G → 111111
  • Guarantees correct receipt so long as there is no
    more than one error per 3 bits (use majority
    decoding)
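A minimal Python sketch of majority decoding for this repetition code (the decoding routine illustrates the idea; it is not code from the lecture):

    CODE   = {'A': '000000', 'C': '000111', 'T': '111000', 'G': '111111'}
    DECODE = {'00': 'A', '01': 'C', '10': 'T', '11': 'G'}   # the underlying 2-bit code

    def majority(block3):
        # decode one group of 3 received bits to whichever bit occurs at least twice
        return '1' if block3.count('1') >= 2 else '0'

    def decode(received):
        symbols = []
        for i in range(0, len(received), 6):                # 6 received bits per symbol
            block = received[i:i + 6]
            symbols.append(DECODE[majority(block[:3]) + majority(block[3:])])
        return ''.join(symbols)

    sent      = CODE['C']       # 000111
    corrupted = '010101'        # sent with one bit flipped in each group of 3
    print(decode(corrupted))    # C -- still decoded correctly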

18
Noisy Coding
  • Is this guarantee useful?
  • Not completely, because with any given error
    rate, there is a finite probability that more
    than 1 out of 3 bits will change
  • (so long as errors are independent)
  • However this coding does decrease the probability
    of error
  • In fact, by transmitting more bits per symbol, we
    can reduce the error probability as much as we
    want
  • But it seems that decreasing the error
    probability also reduces the effective rate of
    transmission, presumably to zero

19
Surprise!!! Noisy Coding Theorem
  • Theorem
  • There is a number C such that for any R < C, and
    any ε > 0, there is a coding-decoding scheme that
    transmits with an average rate of R bits per
    signal, and an error rate (per bit) < ε
  • Definition
  • The largest such value of C is known as the
    channel capacity
  • For a binary symmetric channel with bit-error
    probability p
  • C = 1 + p log2 p + (1 - p) log2 (1 - p)
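A small Python sketch of this capacity formula (the example error rates are arbitrary):

    import math

    def bsc_capacity(p):
        # C = 1 + p log2 p + (1 - p) log2 (1 - p), with 0 log 0 taken as 0
        if p in (0.0, 1.0):
            return 1.0
        return 1 + p * math.log2(p) + (1 - p) * math.log2(1 - p)

    print(bsc_capacity(0.0))            # 1.0  -- a noiseless channel carries 1 bit per bit
    print(round(bsc_capacity(0.1), 3))  # 0.531
    print(bsc_capacity(0.5))            # 0.0  -- pure noise: nothing gets through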

20
Summary
  • Axiomatising uncertainty
  • Entropy
  • Coding
