Information Theory and Security - PowerPoint PPT Presentation

Transcript and Presenter's Notes



1
Information Theory and Security
2
Lecture Motivation
  • Up to this point we have seen:
  • Classical Crypto
  • Symmetric Crypto
  • Asymmetric Crypto
  • These systems have focused on issues of
    confidentiality: ensuring that an adversary
    cannot infer the original plaintext message, or
    cannot learn any information about the original
    plaintext from the ciphertext.
  • In today's lecture we will put a more formal
    framework around the notion of what information
    is, and use this to provide a definition of
    security from an information-theoretic point of
    view.

3
Lecture Outline
  • Probability Review: Conditional Probability and
    Bayes
  • Entropy
  • Desired properties and definition
  • Chain Rule and conditioning
  • Coding and Information Theory
  • Huffman codes
  • General source coding results
  • Secrecy and Information Theory
  • Probabilistic definitions of a cryptosystem
  • Perfect Secrecy

4
The Basic Idea
  • Suppose we roll a six-sided die.
  • Let A be the event that the number of dots is
    odd.
  • Let B be the event that the number of dots is at
    least 3.
  • A = {1, 3, 5}
  • B = {3, 4, 5, 6}
  • If I tell you the roll belongs to both A and B,
    then you know there are only two possibilities:
    {3, 5}.
  • In this sense, A ∩ B tells you more than
    just A or just B.
  • That is, there is less uncertainty in A ∩ B
    than in A or B.
  • Information is closely linked with this idea of
    uncertainty: information increases when
    uncertainty decreases.

5
Probability Review, pg. 1
  • A random variable (event) is a mapping from the
    outcomes of an experiment to real numbers.
  • For our discussion we will deal with
    discrete-valued random variables.
  • Probability: We denote pX(x) = Pr(X = x).
  • For a subset A, Pr(X ∈ A) = Σ_{x ∈ A} pX(x).
  • Joint Probability: Sometimes we want to consider
    more than one event at the same time, in which
    case we lump them together into a joint random
    variable, e.g. Z = (X, Y), with joint probability
    pXY(x, y) = Pr(X = x, Y = y).
  • Independence: We say that two events are
    independent if pXY(x, y) = pX(x) pY(y) for all x
    and y.

6
Probability Review, pg. 2
  • Conditional Probability: We will often ask
    questions about the probability of events Y = y
    given that we have observed X = x. In particular,
    we define the conditional probability of Y = y
    given X = x by
    pY|X(y|x) = pXY(x, y) / pX(x).
  • Independence: We immediately get that if X and Y
    are independent, then pY|X(y|x) = pY(y).
  • Bayes's Theorem: If pX(x) > 0 and pY(y) > 0, then
    pX|Y(x|y) = pX(x) pY|X(y|x) / pY(y).
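As a concrete illustration (my own sketch, not from the slides), here is a small Python
check of the conditional-probability and Bayes's-theorem identities; the joint table used
here is made up for the example.

# Hypothetical joint distribution pXY[(x, y)] = Pr(X = x, Y = y); values are made up.
pXY = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}

pX = {x: sum(p for (xx, _), p in pXY.items() if xx == x) for x in {x for x, _ in pXY}}
pY = {y: sum(p for (_, yy), p in pXY.items() if yy == y) for y in {y for _, y in pXY}}

def p_y_given_x(y, x):
    # Conditional probability: pY|X(y|x) = pXY(x, y) / pX(x)
    return pXY[(x, y)] / pX[x]

def p_x_given_y(x, y):
    # Bayes's theorem: pX|Y(x|y) = pX(x) * pY|X(y|x) / pY(y)
    return pX[x] * p_y_given_x(y, x) / pY[y]

# Both routes to Pr(X = 1 | Y = 1) agree.
print(p_x_given_y(1, 1), pXY[(1, 1)] / pY[1])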

7
Entropy and Uncertainty
  • We are concerned with how much uncertainty a
    random event has, but how do we define or measure
    uncertainty?
  • We want our measure to have the following
    properties
  • To each set of nonnegative numbers (p1, ..., pn)
    with p1 + ... + pn = 1, we associate an
    uncertainty H(p1, ..., pn).
  • H should be a continuous function: a slight
    change in p should not drastically change
    H(p1, ..., pn).
  • H(1/n, ..., 1/n) ≤ H(1/(n+1), ..., 1/(n+1)) for
    all n > 0. Uncertainty increases when there are
    more equally likely outcomes.
  • If 0 < q < 1, then (grouping property)
    H(p1, ..., p(n-1), q·pn, (1-q)·pn)
      = H(p1, ..., pn) + pn·H(q, 1-q).

8
Entropy, pg. 2
  • We define the entropy of a random variable X by
    H(X) = -Σ_x pX(x) log2 pX(x).
  • Example: Consider a fair coin toss. There are two
    outcomes, with probability 1/2 each. The entropy
    is H = -(1/2)log2(1/2) - (1/2)log2(1/2) = 1 bit.
  • Example: Consider a non-fair coin toss X with
    probability p of getting heads and 1-p of getting
    tails. The entropy is
    H(X) = -p log2 p - (1-p) log2(1-p).
  • The entropy is maximized when p = 1/2.
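A minimal Python check of these examples (my own illustration, not part of the slides):
compute H for the fair coin and sweep p to see that the binary entropy peaks at p = 1/2.

import math

def entropy(probs):
    # H(X) = -sum_x p(x) * log2 p(x), with the convention 0 * log2 0 = 0
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))          # fair coin: 1.0 bit
for p in (0.1, 0.3, 0.5, 0.7, 0.9): # biased coin: maximized at p = 0.5
    print(p, entropy([p, 1 - p]))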

9
Entropy, pg. 3
  • Entropy may be thought of as the (average) number
    of yes-no questions needed to accurately
    determine the outcome of a random event.
  • Example: Flip two coins, and let X be the number
    of heads. The possibilities are 0, 1, 2 and the
    probabilities are 1/4, 1/2, 1/4. The entropy is
    H(X) = (1/4)log2 4 + (1/2)log2 2 + (1/4)log2 4
         = 1.5 bits.
  • So how can we relate this to questions?
  • First, ask "Is there exactly one head?" Half the
    time the answer is yes, and you are done.
  • Otherwise, ask "Are there two heads?" to decide
    between zero and two heads.
  • Half the time you needed one question, half the
    time you needed two, so on average you need 1.5
    questions, matching the entropy.
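To make the questions-vs-entropy connection concrete, here is a short simulation (my own
sketch, not from the slides) of the strategy just described; the expected number of
questions comes out near 1.5, matching H(X).

import random

def questions_needed(num_heads):
    # Strategy from the slide: first ask "exactly one head?", then "two heads?"
    if num_heads == 1:
        return 1        # answered by the first question
    return 2            # need the second question to split 0 vs 2

trials = 100_000
total = 0
for _ in range(trials):
    num_heads = sum(random.randint(0, 1) for _ in range(2))  # flip two fair coins
    total += questions_needed(num_heads)

print(total / trials)   # about 1.5, which equals H(X)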

10
Entropy, pg. 4
  • Suppose we have two random variables X and Y. The
    joint entropy H(X,Y) is given by
    H(X,Y) = -Σ_{x,y} pXY(x, y) log2 pXY(x, y).
  • Conditional Entropy: In security, we ask whether
    an observation reduces the uncertainty in
    something else. In particular, we want a notion
    of conditional entropy: given that we observe
    event X, how much uncertainty is left in Y?
    H(Y|X) = Σ_x pX(x) H(Y|X = x)
           = -Σ_{x,y} pXY(x, y) log2 pY|X(y|x).

11
Entropy, pg. 5
  • Chain Rule: The chain rule allows us to relate
    joint entropy to conditional entropy via
    H(X,Y) = H(Y|X) + H(X).
  • (Remaining details will be provided on the
    whiteboard.)
  • Meaning: Uncertainty in (X,Y) is the uncertainty
    of X plus whatever uncertainty remains in Y given
    we observe X.
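Here is a small numeric check of the chain rule (an illustration of my own, using a
made-up joint table): it computes H(X,Y), H(X), and H(Y|X) from the same joint
distribution and confirms H(X,Y) = H(Y|X) + H(X).

import math

# Made-up joint distribution pXY[(x, y)] = Pr(X = x, Y = y)
pXY = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

pX = {x: sum(p for (xx, _), p in pXY.items() if xx == x) for x in (0, 1)}

H_XY = H(pXY.values())
H_X = H(pX.values())
# H(Y|X) = sum_x pX(x) * H(Y | X = x)
H_Y_given_X = sum(
    pX[x] * H([pXY[(x, y)] / pX[x] for y in (0, 1)]) for x in (0, 1)
)

print(H_XY, H_Y_given_X + H_X)  # the two values agree (chain rule)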

12
Entropy, pg. 6
  • Main Theorem:
  • H(X) ≥ 0: entropy is non-negative.
  • H(X) ≤ log2 |X|, where |X| denotes the number of
    elements in the sample space of X.
  • H(X|Y) ≤ H(X) (conditioning reduces entropy).
  • H(X,Y) ≤ H(X) + H(Y), with equality if and only
    if X and Y are independent.

13
Entropy and Source Coding Theory
  • There is a close relationship between entropy and
    representing information.
  • Entropy captures the notion of how many yes-no
    questions are needed to accurately identify a
    piece of information; that is, how many bits are
    needed!
  • One of the main focus areas in the field of
    information theory is the issue of source coding:
  • How to efficiently compress information into as
    few bits as possible.
  • We will talk about one such technique, Huffman
    coding.
  • Huffman coding is for a simple scenario, where
    the source is a stationary stochastic process
    with independence between successive source
    symbols.

14
Huffman Coding, pg. 1
  • Suppose we have an alphabet with four letters A,
    B, C, D with the frequencies shown below.
  • We could represent this with A=00, B=01, C=10,
    D=11. This would mean we use an average of 2 bits
    per letter.
  • On the other hand, we could use the following
    representation: A=1, B=01, C=001, D=000. Then the
    average number of bits per letter becomes
    (0.5)(1) + (0.3)(2) + (0.1)(3) + (0.1)(3) = 1.7.
  • Hence, this representation is, on average, more
    efficient.

Letter frequencies: A = 0.5, B = 0.3, C = 0.1, D = 0.1
15
Huffman Coding, pg. 2
  • Huffman coding is an algorithm that produces an
    efficient binary representation for a source.
  • The Algorithm:
  • List all outputs and their probabilities.
  • Assign a 1 and a 0 to the two smallest, and
    combine them to form an output with probability
    equal to the sum.
  • Sort the list according to the probabilities and
    repeat the process.
  • The binary strings are then obtained by reading
    backwards through the procedure (a Python sketch
    is given after the tree below).

[Huffman tree construction: C (0.1) and D (0.1) are combined into a node of
probability 0.2; that node and B (0.3) are combined into a node of probability
0.5; that node and A (0.5) are combined into the root (1.0). Reading the 1/0
branch labels back from the root gives the codewords.]
Symbol representations: A = 1, B = 01, C = 001, D = 000
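The following is a compact Python sketch of the algorithm just described (my own
implementation for illustration, using heapq); tie-breaking may differ from the hand
construction, but on the A/B/C/D example it reproduces codeword lengths 1, 2, 3, 3,
equivalent to the slide's code up to swapping 0s and 1s at branches.

import heapq

def huffman_code(freqs):
    """Build a Huffman code for a dict {symbol: probability or count}."""
    # Each heap entry: (weight, tie_breaker, {symbol: partial_codeword})
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, code1 = heapq.heappop(heap)   # smallest weight
        w2, _, code2 = heapq.heappop(heap)   # second smallest
        # Prefix the two groups with 0 and 1 and merge them
        merged = {s: "0" + c for s, c in code1.items()}
        merged.update({s: "1" + c for s, c in code2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

freqs = {"A": 0.5, "B": 0.3, "C": 0.1, "D": 0.1}
code = huffman_code(freqs)
print(code)                                             # codeword lengths 1, 2, 3, 3
print(sum(freqs[s] * len(c) for s, c in code.items()))  # 1.7 bits per letter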
16
Huffman Coding, pg. 3
  • In the previous example we used probabilities; we
    may instead work directly with event counts.
  • Example: Consider 8 symbols, and suppose we have
    counted how many times each has occurred in an
    output sample (see the counts below).
  • We may derive the Huffman tree (the exercise will
    be done on the whiteboard).
  • The corresponding length vector is
    (2, 2, 3, 3, 3, 4, 5, 5).
  • The average codelength is 2.83 bits. If we had
    used a fully balanced tree (i.e. the
    straightforward 3-bits-per-symbol representation)
    we would have had an average codelength of 3.

Symbol counts: S1 = 28, S2 = 25, S3 = 20, S4 = 16, S5 = 15, S6 = 8, S7 = 7, S8 = 5
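A quick check of the 2.83 figure (my own arithmetic sketch; the length vector is the one
stated above).

counts  = [28, 25, 20, 16, 15, 8, 7, 5]        # occurrences of S1..S8
lengths = [2, 2, 3, 3, 3, 4, 5, 5]             # Huffman codeword lengths from the tree

avg_len = sum(c * l for c, l in zip(counts, lengths)) / sum(counts)
print(round(avg_len, 2))   # 2.83 bits per symbol, versus 3 for a balanced tree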
17
Huffman Coding, pg. 4
  • We would like to quantify the average number of
    bits needed in terms of entropy.
  • Theorem: Let L = Σ_x pX(x)·lx be the average
    number of bits per output for a Huffman encoding
    of a random variable X. Then
    H(X) ≤ L < H(X) + 1.
  • Here, lx = length of the codeword assigned to
    symbol x.
  • Example: Let's look back at the four-symbol
    example. The entropy is H(X) ≈ 1.685 bits, and
    our average codelength was 1.7 bits, consistent
    with the bound.
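A numeric check of the theorem on the four-symbol example (my own sketch): the entropy is
about 1.685 bits, and the Huffman average of 1.7 bits indeed falls in [H(X), H(X) + 1).

import math

p = {"A": 0.5, "B": 0.3, "C": 0.1, "D": 0.1}
l = {"A": 1, "B": 2, "C": 3, "D": 3}           # Huffman codeword lengths from before

H = -sum(px * math.log2(px) for px in p.values())
L = sum(p[s] * l[s] for s in p)
print(H, L, H <= L < H + 1)   # about 1.685, 1.7, True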

18
Huffman Coding, pg. 5
  • An interesting and useful question is: what if I
    use the wrong distribution when calculating the
    code? How badly will my code perform?
  • Suppose the true distribution is px, and you used
    another distribution to find the lengths lx.
    Define the auxiliary distribution
    qx = 2^(-lx) / Σ_y 2^(-ly).
  • Theorem: If we code the source X with the lengths
    lx instead of the correct Huffman code, then the
    resulting average codelength satisfies
    Σ_x px·lx ≥ H(X) + D(p||q),
  • where the Kullback-Leibler divergence D(p||q) is
    D(p||q) = Σ_x px log2(px / qx).
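To see the penalty concretely, here is a small sketch (my own illustration with a made-up
wrong distribution q): it computes D(p||q) and the average length of a code built for q,
using ceil(log2 1/qx) codeword lengths as a stand-in for the Huffman lengths.

import math

p = {"A": 0.5, "B": 0.3, "C": 0.1, "D": 0.1}      # true distribution
q = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}  # assumed (wrong) distribution

def D(p, q):
    # Kullback-Leibler divergence D(p||q) = sum_x p(x) log2(p(x)/q(x))
    return sum(p[x] * math.log2(p[x] / q[x]) for x in p if p[x] > 0)

H = -sum(px * math.log2(px) for px in p.values())
lengths = {x: math.ceil(math.log2(1 / q[x])) for x in q}   # Shannon-style lengths for q
L_wrong = sum(p[x] * lengths[x] for x in p)

print(D(p, q))            # about 0.315 bits of penalty per symbol
print(H, L_wrong)         # L_wrong >= H + D(p||q)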

19
Another way to look at cryptography, pg. 1
  • So far in class, we have looked at the security
    problem from an algorithmic point of view (DES,
    RC4, RSA, ...).
  • But why build these algorithms? How can we say we
    are doing a good job?
  • Enter information theory and its relationship to
    ciphers:
  • Suppose we have a cipher with possible plaintexts
    P, ciphertexts C, and keys K.
  • Suppose that a plaintext P is chosen according to
    a probability law.
  • Suppose the key K is chosen independently of P.
  • The resulting ciphertexts have various
    probabilities depending on the probabilities for
    P and K.

20
Another way to look at cryptography, pg. 2
  • Now, enter Eve: she sees the ciphertext C, and
    several security questions arise.
  • Does she learn anything about P from seeing C?
  • Does she learn anything about the key K from
    seeing C?
  • Thus, our questions are associated with H(P|C)
    and H(K|C).
  • Ideally, we would like for the uncertainty not to
    decrease, i.e.
  • H(P|C) = H(P)
  • H(K|C) = H(K)

21
Another way to look at cryptography, pg. 3
  • Example: Suppose we have three plaintexts a, b, c
    with probabilities 0.5, 0.3, 0.2. Suppose we
    have two keys k1 and k2 with probabilities 0.5
    and 0.5. Suppose there are three ciphertexts
    U, V, W, produced according to the encryption
    table below.
  • We may calculate the probabilities of the
    ciphertexts, e.g.
    pC(U) = pK(k1)pP(a) + pK(k2)pP(a)
          = (0.5)(0.5) + (0.5)(0.5) = 0.5.
  • Similarly we get
    pC(V) = (0.5)(0.3) + (0.5)(0.2) = 0.25 and
    pC(W) = (0.5)(0.2) + (0.5)(0.3) = 0.25.

Encryption table: Ek1(a) = U, Ek1(b) = V, Ek1(c) = W; Ek2(a) = U, Ek2(b) = W, Ek2(c) = V
22
Another way to look at cryptography, pg. 4
  • If Eve observes the ciphertext U, then she knows
    the plaintext was a.
  • We may calculate the conditional probabilities,
    e.g.
    pP(b|V) = pP(b)pK(k1) / pC(V) = 0.15/0.25 = 0.6.
  • Similarly we get pP(c|V) = 0.4 and pP(a|V) = 0.
    Also pP(a|W) = 0, pP(b|W) = 0.6, pP(c|W) = 0.4.
  • What does this tell us? Remember, the original
    plaintexts' probabilities were 0.5, 0.3, and 0.2.
    So, if we see a ciphertext, then we may revise
    the probabilities: something is learned!
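These numbers can be checked mechanically; below is a small Python sketch (my own
illustration, encoding the slide's encryption table directly) that reproduces both the
ciphertext probabilities and the conditional probabilities above.

# Slide's example: plaintext and key distributions, and encryption table E[key][plaintext]
pP = {"a": 0.5, "b": 0.3, "c": 0.2}
pK = {"k1": 0.5, "k2": 0.5}
E = {"k1": {"a": "U", "b": "V", "c": "W"},
     "k2": {"a": "U", "b": "W", "c": "V"}}

# pC(c) = sum over (plaintext, key) pairs that encrypt to c of pP(plaintext) * pK(key)
pC = {}
for k, pk in pK.items():
    for m, pm in pP.items():
        c = E[k][m]
        pC[c] = pC.get(c, 0.0) + pk * pm

print(pC)   # {'U': 0.5, 'V': 0.25, 'W': 0.25}

# Posterior pP(m|c) = Pr(plaintext = m and ciphertext = c) / pC(c)
post = {(m, c): sum(pk for k, pk in pK.items() if E[k][m] == c) * pP[m] / pC[c]
        for m in pP for c in pC}
print(post[("a", "U")], post[("b", "V")], post[("c", "V")])   # 1.0, 0.6, 0.4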

23
Another way to look at cryptography, pg. 5
  • We use entropy to quantify the amount of
    information that is learned about the plaintext
    when the ciphertext is observed.
  • The conditional entropy of P given C is
    H(P|C) = Σ_c pC(c) H(P|C = c)
           = (0.5)(0) + (0.25)H(0.6, 0.4)
             + (0.25)H(0.6, 0.4)
           ≈ 0.485 bits,
    whereas H(P) ≈ 1.485 bits.
  • Thus an entire bit of information is revealed
    just by observing the ciphertext!
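A short computation of these entropies (my own sketch, building on the same example):
H(P) is about 1.485 bits while H(P|C) is about 0.485 bits, a difference of one full bit.

import math

pP = {"a": 0.5, "b": 0.3, "c": 0.2}
pK = {"k1": 0.5, "k2": 0.5}
E = {"k1": {"a": "U", "b": "V", "c": "W"},
     "k2": {"a": "U", "b": "W", "c": "V"}}

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Joint distribution of (plaintext, ciphertext) and marginal of the ciphertext
joint, pC = {}, {}
for k, pk in pK.items():
    for m, pm in pP.items():
        c = E[k][m]
        joint[(m, c)] = joint.get((m, c), 0.0) + pk * pm
        pC[c] = pC.get(c, 0.0) + pk * pm

# H(P|C) = sum_c pC(c) * H(P | C = c)
H_P_given_C = sum(pc * H([joint.get((m, c), 0.0) / pc for m in pP]) for c, pc in pC.items())
print(H(pP.values()), H_P_given_C)   # about 1.485 and 0.485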

24
Perfect Secrecy and Entropy
  • The previous example gives us the motivation for
    the information-theoretic definition of security
    (or secrecy).
  • Definition: A cryptosystem has perfect secrecy if
    H(P|C) = H(P).
  • Theorem: The one-time pad has perfect secrecy.
  • Proof: See the book for the details. The basic
    idea is to show that each ciphertext results with
    equal likelihood. We then use manipulations like
    H(P, C) = H(P, K, C) = H(P, K) = H(P) + H(K)
    and
    H(P, C) = H(C) + H(P|C).
  • Equating these two and using H(K) = H(C) gives
    the result.

Why? (Exercise: convince yourself that H(K) = H(C) for the one-time pad.)
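As a sanity check of the "each ciphertext is equally likely" claim, here is a small
sketch (my own illustration) of a 3-bit one-time pad: for any fixed plaintext
distribution, the ciphertext distribution comes out uniform, so observing C tells Eve
nothing about P.

import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

n = 3                                     # 3-bit one-time pad (illustrative size)
pP = {0b101: 0.5, 0b110: 0.3, 0b011: 0.2} # made-up plaintext distribution
pK = {k: 1 / 2**n for k in range(2**n)}   # key chosen uniformly, independent of P

# Joint distribution of (plaintext, ciphertext) under C = P XOR K
joint, pC = {}, {}
for m, pm in pP.items():
    for k, pk in pK.items():
        c = m ^ k
        joint[(m, c)] = joint.get((m, c), 0.0) + pm * pk
        pC[c] = pC.get(c, 0.0) + pm * pk

print(pC)  # every ciphertext has probability 1/8: C is uniform, so H(C) = H(K)

H_P = H(pP.values())
H_P_given_C = sum(pc * H([joint.get((m, c), 0.0) / pc for m in pP]) for c, pc in pC.items())
print(H_P, H_P_given_C)  # equal: perfect secrecy, H(P|C) = H(P)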