EEL 5930 sec. 5 / 4930 sec. 7, Spring 05: Physical Limits of Computing

Slides for a course taught by Michael P. Frank (http://www.eng.fsu.edu/~mpf)

1
EEL 5930 sec. 5 / 4930 sec. 7, Spring 05:
Physical Limits of Computing
http://www.eng.fsu.edu/~mpf
  • Slides for a course taught by Michael P. Frank in
    the Department of Electrical & Computer
    Engineering

2
Module 2: Review of Basic Theory of Information &
Computation
  • Probability
  • Information Theory
  • Computation Theory

3
Outline of this Module
  • Topics covered in this module:
  • Probability and statistics: Some basic concepts
  • Some basic elements of information theory
  • Various usages of the word information
  • Measuring information
  • Entropy and physical information
  • Some basic elements of the theory of computation
  • Universality
  • Computational complexity
  • Models of computation

4
Review of Basic Probability and Statistics
Background
  • Events, Probabilities, Product Rule, Conditional &
    Mutual Probabilities, Expectation, Variance,
    Standard Deviation

5
Probability
  • In statistics, an event E is any possible
    situation (occurrence, state of affairs) that
    might or might not be the actual situation.
  • The proposition P = "the event E occurred (or
    will occur)" could turn out to be either true or
    false.
  • The probability of an event E is a real number p
    in the range [0,1] which gives our degree of
    belief in the proposition P, i.e., the
    proposition that E will/did occur, where
  • The value p = 0 means that P is false with
    complete certainty, and
  • The value p = 1 means that P is true with
    complete certainty,
  • The value p = ½ means that the truth value of P
    is completely unknown.
  • That is, as far as we know, it is equally likely
    to be either true or false.
  • The probability p(E) is also the fraction of
    times that we would expect the event E to occur
    in a repeated experiment.
  • That is, on average, if the experiment could be
    repeated infinitely often, and if each repetition
    was independent of the others.
  • If the probability of E is p, then we would
    expect E to occur once for every 1/p independent
    repetitions of the experiment, on average.
  • We'll call 1/p the improbability i of E.

6
Joint Probability
  • Let X and Y be events, and let XY denote the
    event that events X and Y both occur together
    (that is, jointly).
  • Then p(XY) is called the joint probability of X
    and Y.
  • Product rule: If X and Y are independent events,
    then p(XY) = p(X) p(Y).
  • This follows from basic combinatorics.
  • It can also be considered a definition of what it
    means for X and Y to be independent.
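
As an illustration of the product rule, here is a minimal sketch that enumerates a small sample space. The two-dice setup and the events chosen are assumptions made for this example, not part of the slides.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: all 36 equally likely outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate over outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def x(o):
    return o[0] % 2 == 0      # event X: the first die shows an even number

def y(o):
    return o[1] >= 5          # event Y: the second die shows 5 or 6

def xy(o):
    return x(o) and y(o)      # joint event XY: both occur together

p_x, p_y, p_xy = prob(x), prob(y), prob(xy)
print(p_x, p_y, p_xy)         # 1/2 1/3 1/6
print(p_xy == p_x * p_y)      # True: X and Y are independent
```

X and Y concern different dice, so counting outcomes gives p(XY) = p(X) p(Y) exactly.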

7
Event Complements, Mutual Exclusivity,
Exhaustiveness
  • For any event E, its complement ~E is the event
    that event E does not occur.
  • Complement rule: p(E) + p(~E) = 1.
  • Two events E and F are called mutually exclusive
    if it is impossible for E and F to occur
    together.
  • That is, p(EF) = 0.
  • Note that E and ~E are always mutually exclusive.
  • A set S = {E1, E2, …} of events is exhaustive if
    the event that some event in S occurs has
    probability 1.
  • Note that S = {E, ~E} is an exhaustive set of
    events.
  • Theorem: The sum of the probabilities of any
    exhaustive set S of mutually exclusive events is
    1.

8
Conditional Probability
  • Let XY be the event that X and Y occur jointly.
  • Then the conditional probability of X given Y is
    defined by p(X|Y) = p(XY) / p(Y).
  • It is the probability that X would also occur,
    given that Y occurs.
  • Bayes' rule: p(X|Y) = p(X) p(Y|X) / p(Y).

[Figure: Venn diagram of the space of possible outcomes, showing event X, event Y, and their overlap XY; labeled r(XY).]
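
A small sketch of conditional probability and Bayes' rule, computed by counting outcomes. The dice and the particular events X and Y are hypothetical choices for illustration only.

```python
from fractions import Fraction
from itertools import product

space = set(product(range(1, 7), repeat=2))   # assumed: two fair dice
X = {o for o in space if sum(o) == 8}         # event X: the sum is 8
Y = {o for o in space if o[0] % 2 == 0}       # event Y: the first die is even

def p(event):
    return Fraction(len(event), len(space))

def p_given(a, b):
    """Conditional probability p(a|b) = p(ab) / p(b)."""
    return p(a & b) / p(b)

p_x_given_y = p_given(X, Y)
bayes = p(X) * p_given(Y, X) / p(Y)   # Bayes' rule applied to the same quantity
print(p_x_given_y, bayes)             # 1/6 1/6
```

Both routes give the same value, as Bayes' rule requires.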
9
Mutual Probability Ratio
  • The mutual probability ratio of X and Y is
    defined as r(XY) = p(XY)/[p(X)p(Y)].
  • Note that r(XY) = p(X|Y)/p(X) = p(Y|X)/p(Y).
  • I.e., r is the factor by which the probability of
    either X or Y gets boosted upon learning that the
    other event occurs.
  • WARNING: Some authors define the term "mutual
    probability" to be the reciprocal of our quantity
    r.
  • Don't get confused! I call that the "mutual
    improbability ratio."
  • Note that for independent events, r = 1.
  • Whereas for dependent, positively correlated
    events, r > 1.
  • And for dependent, anti-correlated events, r < 1.
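
The three regimes of r can be checked on a toy sample space. This is only a sketch: the two dice and the event names below are assumptions introduced for the example.

```python
from fractions import Fraction
from itertools import product

space = set(product(range(1, 7), repeat=2))   # assumed: two fair dice

def p(event):
    return Fraction(len(event), len(space))

def r(a, b):
    """Mutual probability ratio r(XY) = p(XY) / (p(X) p(Y))."""
    return p(a & b) / (p(a) * p(b))

total_is_8 = {o for o in space if sum(o) == 8}
first_even = {o for o in space if o[0] % 2 == 0}
first_odd = space - first_even
second_high = {o for o in space if o[1] >= 5}

print(r(first_even, second_high))   # 1: independent events
print(r(total_is_8, first_even))    # 6/5: positively correlated
print(r(total_is_8, first_odd))     # 4/5: anti-correlated
```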

10
Expectation Values
  • Let S be an exhaustive set of mutually exclusive
    events Ei.
  • This is sometimes known as a sample space.
  • Let f(Ei) be any function of the events in S.
  • This is sometimes called a random variable.
  • The expectation value or expected value or norm
    of f, written Ex[f] or ⟨f⟩, is just the mean or
    average value of f(Ei), as weighted by the
    probabilities of the events Ei.
  • WARNING: The "expected" value may actually be
    quite unexpected, or even impossible to occur!
  • It's not the ordinary English meaning of the word
    "expected."
  • Expected values combine linearly:
    Ex[af+g] = a·Ex[f] + Ex[g].
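
A minimal sketch of expectation values, assuming a fair six-sided die as the sample space. The functions f and g and the constant a are arbitrary illustrative choices.

```python
from fractions import Fraction

# Assumed sample space: faces of a fair die, each with probability 1/6.
p = {face: Fraction(1, 6) for face in range(1, 7)}

def expect(f):
    """Ex[f]: probability-weighted mean of f over the sample space."""
    return sum(p[e] * f(e) for e in p)

def f(e):
    return e          # the face value itself

def g(e):
    return e * e      # its square

a = 3
print(expect(f))      # 7/2, a value no single throw can produce
lhs = expect(lambda e: a * f(e) + g(e))
rhs = a * expect(f) + expect(g)
print(lhs == rhs)     # True: expectations combine linearly
```

The mean of 7/2 illustrates the warning above: the "expected" value is not a possible outcome.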

11
Variance & Standard Deviation
  • The variance of a random variable f is σ²(f) =
    Ex[(f − Ex[f])²].
  • The expected value of the squared deviation of f
    from the norm. (The squaring makes it
    non-negative.)
  • The standard deviation or root-mean-square (RMS)
    difference of f from its mean is σ(f) =
    [σ²(f)]^(1/2).
  • This is usually comparable, in absolute
    magnitude, to a typical value of f − Ex[f].
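
The definitions can be applied directly, again assuming a fair six-sided die. The helper names here are hypothetical.

```python
import math
from fractions import Fraction

p = {face: Fraction(1, 6) for face in range(1, 7)}   # assumed fair die

def expect(f):
    return sum(p[e] * f(e) for e in p)

def variance(f):
    """Expected squared deviation of f from its mean."""
    mean = expect(f)
    return expect(lambda e: (f(e) - mean) ** 2)

def std_dev(f):
    return math.sqrt(variance(f))

def face(e):
    return e

print(variance(face))   # 35/12
print(std_dev(face))    # roughly 1.71
```

The typical deviations of a throw from the mean of 3.5 are 0.5, 1.5, and 2.5, so a standard deviation near 1.7 is indeed comparable in magnitude.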

12
The Theory of Information: Some Basic Concepts
  • Basic Information Concepts
  • Quantifying Information
  • Information and Entropy

13
Etymology of Information
  • Earliest historical usage in English (from Oxford
    English Dictionary):
  • The act of informing,
  • As in education, instruction, training.
  • "Five books come down from Heaven for information
    of mankind." (1387)
  • Or a particular item of training, i.e., a
    particular instruction.
  • "Melibee had heard the great skills and reasons
    of Dame Prudence, and her wise informations and
    techniques." (1386)
  • Derived by adding the action noun ending "-ation"
    (descended from Latin's "-tio") to the
    pre-existing verb "to inform,"
  • Meaning to give form (shape) to the mind;
  • to discipline, instruct, teach:
  • "Men so wise should go and inform their kings."
    (1330)
  • And "inform" comes from Latin informare, derived
    from noun forma (form),
  • Informare means to give form to, or to form an
    idea of.
  • Latin even already contained the derived
    word informatio,
  • meaning concept or idea.
  • Note: The Greek words εἶδος (eídos) and μορφή
    (morphé),
  • Meaning form, or shape,
  • were famously used by Plato (& later Aristotle)
    in a technical philosophical sense, to denote the
    true identity or ideal essence of something.
  • We'll see that our modern concept of physical
    information is not too dissimilar!

14
Information: Our Definition
  • Information is that which distinguishes one thing
    (entity) from another.
  • It is all or part of an identification or
    description of the thing.
  • A specification of some or all of its properties
    or characteristics.
  • We can say that every thing carries or embodies a
    complete description of itself.
  • Simply in virtue of its own being; this is called
    the entity's form or constitutive essence.
  • But, let us also take care to distinguish between
    the following:
  • A nugget of information (for lack of a better
    phrase)
  • A specific instantiation (i.e., as found in a
    specific entity) of some general form.
  • A cloud or stream of information
  • A physical state or set of states, dynamically
    changing over time.
  • A form or pattern of information
  • An abstract pattern of information, as opposed to
    a specific instantiation.
  • Many separate nuggets of information contained in
    separate objects may have identical patterns, or
    content.
  • We may say that those nuggets are copies of each
    other.
  • An amount or quantity of information
  • A quantification of how large a given nugget,
    cloud, or pattern of information is.
  • Measured in logarithmic units, applied to the
    number of possible patterns.

15
Information-related concepts
  • It will also be convenient to discuss the
    following:
  • An embodiment of information
  • The physical system that contains some particular
    nugget or cloud of information.
  • A symbol or message
  • A nugget of information or its embodiment
    produced with the intent that it should convey
    some specific meaning, or semantic content.
  • A message is typically a compound object
    containing a number of symbols.
  • An interpretation of information
  • A particular semantic interpretation of a form
    (pattern of information), tying it to potentially
    useful facts of interest.
  • May or may not be the intended meaning!
  • A representation of information
  • An encoding of one pattern of information within
    some other (frequently larger) pattern.
  • According to some particular language or code.
  • A subject of information
  • An entity that is identified or described by a
    given pattern of information.
  • May be abstract or concrete, mathematical or
    physical

16
Information Concept Map
[Figure: concept map relating Meaning (interpretation of information), Quantity of information, Thing (subject or embodiment), Form (pattern of information), Cloud (dynamic body of information), Physical entity, and Nugget (instance of a form). Edge labels include: describes/identifies, interpreted to get, represented by, measures size of, may be a, instantiated by/in, instantiates/has, forms/composes, contains/carries/embodies, has a changing.]
17
Quantifying Information
  • One way to quantify forms is to try to count how
    many distinct ones there are.
  • The number of all conceivable forms is not
    finite.
  • However
  • Consider a situation defined in such a way that a
    given nugget (in the context of that situation)
    can only take on some definite number N of
    possible distinct forms.
  • One way to try to characterize the size of the
    nugget is then to specify the value of N.
  • This describes the amount of variability of its
    form.
  • However, N by itself does not seem to have the
    right mathematical properties to be used to
    describe the informational size of the nugget

18
Compound Nuggets
  • Consider a nugget of information C formed by
    taking two separate and independent nuggets of
    information A, B, and considering them together
    as constituting a single compound nugget of
    information.
  • Suppose now also that A has N possible forms, and
    that B has M possible forms.
  • Clearly then, due to the product rule of
    combinatorics, C has N×M possible distinct
    forms.
  • Each is obtained by assigning a form to A and a
    form to B independently.
  • Would the size of the nugget C then be the
    product of the sizes of A and B?
  • It would seem more natural to say sum, so that
    the whole is the sum of the parts.

[Figure: Nugget C (N×M forms) is composed of Nugget A (N possible forms) and Nugget B (M possible forms).]
19
Information & Logarithmic Units
  • We can convert the product to a sum by using
    logarithmic units.
  • Let us then define the informational size I of
    (or amount of information contained in) a nugget
    of information that has N possible forms as being
    the indefinite logarithm of N, that is, as I =
    log N.
  • With an unspecified base for the logarithm.
  • We can interpret indefinite-logarithm values as
    being inherently dimensional (not dimensionless
    pure-number) quantities.
  • Any numeric result is always (explicitly or
    implicitly) paired with a unit log b which is
    associated with the base b of the logarithm that
    is used.
  • The unit log 2 is called the bit, the unit log
    10 is the decade or bel, log 16 is sometimes
    called a nybble, and log 256 is the byte.
  • Whereas, the unit log e (most widely used in
    physics) is called the nat.
  • The nat is also expressed as Boltzmann's constant
    kB (e.g. in Joules/K)
  • A.k.a. the ideal gas constant R (frequently
    expressed in kcal/mol/K)

[Figure: log a = (log_b a) · log b = (log_c a) · log c, where each parenthesized factor is a pure number and log b, log c are log units.]
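
The unit conversion and the additivity of compound nuggets can be sketched numerically. The function name `info_size` and the sizes N = 8, M = 32 are illustrative assumptions.

```python
import math

def info_size(n_forms, base=2):
    """Informational size I = log N, expressed as a multiple of the unit log(base)."""
    return math.log(n_forms) / math.log(base)

N, M = 8, 32
bits_a = info_size(N)                    # about 3 bits
bits_b = info_size(M)                    # about 5 bits
bits_c = info_size(N * M)                # compound nugget with N*M forms

bytes_c = info_size(N * M, base=256)     # same quantity, in units of log 256
nats_c = info_size(N * M, base=math.e)   # same quantity, in units of log e

print(bits_a, bits_b, bits_c)
print(bytes_c, nats_c)
```

The size of the compound nugget is the sum of the sizes of its parts, and changing the base only rescales the numeric value paired with the unit.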
20
The Size of a Form
  • Suppose that in some situation, a given nugget
    has N possible forms.
  • Then the size of the nugget is I = log N.
  • Can we also say that this is the size of each of
    the nugget's possible forms?
  • In a way, but we have to be a little bit careful.
  • We distinguish between two concepts:
  • The actual size I = log N of each form.
  • That is, given how the situation is described.
  • The entropy or compressed size S of each form.
  • Which we are about to define.

21
The Entropy of a Form
  • How can we measure the compressed size of an
    abstract form?
  • For this, we need a language that we can use to
    represent forms using concrete nuggets of
    linguistic information whose size we can measure.
  • We then say that the compressed size or entropy S
    of a form is the size of the smallest nugget of
    information representing it in our language.
    (Its most compressed description.)
  • At first, this seems pretty ambiguous, but:
  • In their algorithmic information theory,
    Kolmogorov and Chaitin showed that this quantity
    is almost language-independent.
  • It is invariant up to a language-dependent
    additive constant.
  • That is, among computationally universal
    (Turing-complete) languages.
  • Also, whenever we have a probability distribution
    over forms, Shannon shows us how to choose an
    encoding that minimizes the expected size of the
    codeword nugget that is needed.
  • If a probability distribution is available, we
    assume a language chosen to minimize the expected
    size of the nugget representing the form.
  • We define the compressed size or entropy of the
    form to be the size of its description in this
    optimal language.

22
The Optimal Encoding
  • Suppose a specific form F has probability p.
  • Thus, improbability i = 1/p.
  • Note that this is the same probability that F
    would have if it were one of i equally-likely
    forms.
  • We saw earlier that a nugget of information
    having i possible forms is characterized as
    containing a quantity of information I = log i.
  • And the actual size of each form in that
    situation is the same, I.
  • If all forms are equally likely, their average
    compressed size can't be any less.
  • So, it seems reasonable to declare that the
    compressed size S of a form F with probability p
    is the same as its actual size in this situation,
    that is, S(F) = log i = log 1/p = −log p.
  • This suggests that in the optimal encoding
    language, the description of the form F would be
    represented in a nugget of that size.
  • In his Mathematical Theory of Communication
    (1949) Claude Shannon showed that in fact this is
    exactly correct,
  • So long as we permit ourselves to consider
    encodings in which many similar systems (whose
    forms are chosen from the same distribution) are
    described together.
  • Modern block-coding schemes in fact closely
    approach Shannon's ideal encoding efficiency.

23
Optimal Encoding Example
  • Suppose a system has four forms A, B, C, D with
    the following probabilities:
  • p(A)=½, p(B)=¼, p(C)=p(D)=1/8.
  • Note that the probabilities sum to 1, as they
    must.
  • Then the corresponding improbabilities are:
  • i(A)=2, i(B)=4, i(C)=i(D)=8.
  • And the form sizes (log-improbabilities) are:
  • S(A) = log 2 = 1 bit, S(B) = log 4 = 2 log 2 =
    2 bits, S(C) = S(D) = log 8 = 3 log 2 = 3 bits.
  • Indeed, in this example, we can encode the forms
    using bit-strings of exactly these lengths, as
    follows:
  • A=0, B=10, C=110, D=111.
  • Note that this code is self-delimiting;
  • the codewords can be concatenated together
    without ambiguity.

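[Figure: binary code tree. From the root, branch 0 ends at A; branch 1 leads to a node whose branch 0 ends at B; its branch 1 leads to a node whose two branches end at C and D.]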
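
The code from this example can be exercised directly. The sketch below uses the codewords and probabilities given on the slide; the `encode` and `decode` helpers and the sample message are hypothetical.

```python
import math

probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
code = {"A": "0", "B": "10", "C": "110", "D": "111"}

def encode(forms):
    return "".join(code[f] for f in forms)

def decode(bits):
    """Split a concatenation of codewords; this works because no codeword is a prefix of another."""
    inverse = {word: form for form, word in code.items()}
    out, word = [], ""
    for b in bits:
        word += b
        if word in inverse:
            out.append(inverse[word])
            word = ""
    return "".join(out)

message = "ABACADAB"
bits = encode(message)
expected_length = sum(probs[f] * len(code[f]) for f in probs)

print(bits)               # 01001100111010
print(decode(bits))       # ABACADAB
print(expected_length)    # 1.75
```

Each codeword length equals the log-improbability of its form in bits, so the expected codeword length is 1.75 bits.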
24
Entropy Content of a Nugget
  • Naturally, if we have a probability distribution
    over the possible forms F of a nugget,
  • We can easily calculate the expected entropy ⟨S⟩
    (expected compressed size) of the nugget's form.
  • This is possible since S itself is a random
    variable,
  • a function of the event that the system has a
    specific form F.
  • The expected entropy ⟨S⟩ of the nugget's form is
    then
    $\langle S \rangle = \sum_F p(F)\, S(F) = \sum_F p(F) \log \frac{1}{p(F)} = -\sum_F p(F) \log p(F)$
    (Note the minus sign!)
  • We usually drop the "expected," and just call
    this the amount of entropy S contained in the
    nugget.
  • It is really the expected compressed size of the
    nugget.

25
Visualizing Boltzmann-Gibbs-Shannon Statistical
Entropy
26
Known vs. Unknown Information
  • We can consider the informational size I = log N
    of a nugget that has N forms as telling us the
    total amount of information that the nugget
    contains.
  • Meanwhile, we can consider its entropy S = ⟨log
    i(f)⟩ as telling us how much of the total
    information that it contains is unknown to us.
  • In the perspective specified by the distribution
    p().
  • Since S ≤ I, we can also define the amount of
    known information (or extropy) in the nugget as X
    = I − S.
  • Note that our probability distribution p() over
    the nugget's form could change (if we gain or
    lose knowledge about it),
  • Thus, the nugget's entropy S and extropy X may
    also change.
  • However, note that the total informational size
    of a given nugget, I = log N = X + S, always
    still remains a constant.
  • Entropy and extropy can be viewed as two forms of
    information, which can be converted to each
    other, but whose total amount is conserved.

27
Information/Entropy Example
  • Consider a tetrahedral die which may lie on any
    of its 4 faces, labeled 1,2,3,4.
  • We say that the answer to the question "Which
    side is up?" is a nugget of information having 4
    possible forms.
  • Thus, the total amount of information contained
    in this nugget, and in the orientation of the
    physical die itself, is log 4 = 2 bits.
  • Now, suppose the die is weighted so that p(1)=½,
    p(2)=¼, and p(3)=p(4)=1/8 for its post-throw
    state.
  • Then S(1)=1b, S(2)=2b, and S(3)=S(4)=3b.
  • The expected entropy is then S = 1.75 bits.
  • This much information remains unknown before the
    die is thrown.
  • The extropy (known information) is then X = 0.25
    bits.
  • Exactly one-fourth of a bit's worth of knowledge
    about the outcome is already expressed by this
    specific probability distribution p().
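
The numbers in this example can be reproduced with a few lines. This is only a sketch; `entropy_bits` is a hypothetical helper implementing the expected log-improbability in bits.

```python
import math

def entropy_bits(dist):
    """Expected log-improbability, in bits, of a distribution given as {form: probability}."""
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

die = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.125}   # the weighted tetrahedral die

I = math.log2(len(die))   # total information: log 4 = 2 bits
S = entropy_bits(die)     # entropy (unknown information)
X = I - S                 # extropy (known information)

print(I, S, X)            # 2.0 1.75 0.25
```

For a fair die the same calculation would give S = 2 bits and X = 0, since a uniform distribution expresses no knowledge about the outcome.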

28
Nugget=Variable, Form=Value, and Types of Events
  • A nugget basically means a variable V.
  • Also associated with a set of possible values
    {v1, v2, …}.
  • Meanwhile, a form is basically a value v.
  • A primitive event is a proposition that assigns a
    specific form v to a specific nugget, V=v.
  • I.e., a specific value to a specific variable.
  • A compound event is a conjunctive proposition
    that assigns forms to multiple nuggets,
  • E.g., V=v, U=u, W=w.
  • A general event is a disjunctive set of primitive
    and/or compound events.
  • Essentially equivalent to a Boolean combination
    of assignment propositions.

29
Entropy of a Binary Variable
Below, little s of an individual form or
probability denotes the contribution to the total
entropy of a form with that probability.
Maximum s(p) = (1/e) nat = (lg e)/e bits ≈ 0.531
bits at p = 1/e ≈ 0.368
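
The stated maximum can be checked numerically. The sketch assumes s(p) = p log(1/p), which is the contribution of one form to the expected log-improbability; the grid search is a crude illustrative method, not an analytic proof.

```python
import math

def s(p):
    """Contribution s(p) = p * log2(1/p), in bits, of a form with probability p."""
    return 0.0 if p == 0 else p * math.log2(1 / p)

def binary_entropy(p):
    """Total entropy of a binary variable with probabilities p and 1 - p."""
    return s(p) + s(1 - p)

# Scan a fine grid of probabilities and locate the maximum of s numerically.
grid = [k / 100000 for k in range(100001)]
p_star = max(grid, key=s)

print(p_star, s(p_star))     # close to 1/e and (lg e)/e
print(binary_entropy(0.5))   # 1.0 bit, the maximum for a binary variable
```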
30
Joint Distributions over Two Nuggets
  • Let X, Y be two nuggets, each with many forms
    {x1, x2, …} and {y1, y2, …}.
  • Let xy represent the compound event X=x, Y=y.
  • Note all xy's are mutually exclusive and
    exhaustive.
  • Suppose we have available a joint probability
    distribution p(xy) over the nuggets X and Y.
  • This then implies the reduced or marginal
    distributions p(x) = Σ_y p(xy) and p(y) = Σ_x p(xy).
  • We also thus have conditional probabilities
    p(x|y) and p(y|x), according to the usual
    definitions.
  • And we have mutual probability ratios r(xy).

31
Joint, Marginal, Conditional Entropy and Mutual
Information
  • The joint entropy S(XY) = ⟨log i(xy)⟩.
  • The (prior, marginal or reduced) entropy S(X) =
    S(p(x)) = ⟨log i(x)⟩. Likewise for S(Y).
  • The entropy of each nugget, taken by itself.
  • Entropy is subadditive: S(XY) ≤ S(X) + S(Y).
  • The conditional entropy S(X|Y) = Ex_y[S(p(x|y))].
  • The expected entropy after Y is observed.
  • Theorem: S(X|Y) = S(XY) − S(Y). Joint entropy
    minus that of Y.
  • The mutual information I(X:Y) = Ex[log r(xy)].
  • We will prove Theorem: I(X:Y) = S(X) − S(X|Y).
  • Thus the mutual information is the expected
    reduction of entropy in either variable as a
    result of observing the other.
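
These quantities can be computed from any joint distribution, and the two theorems checked numerically. This is a sketch under stated assumptions: the 2×2 joint distribution and all function names below are hypothetical.

```python
import math
from collections import defaultdict

def entropy(dist):
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

def marginals(joint):
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return px, py

def conditional_entropy(joint):
    """S(X|Y): expectation over y of the entropy of p(x|y), straight from the definition."""
    _, py = marginals(joint)
    total = 0.0
    for y0, p_y in py.items():
        cond = {x: p / p_y for (x, y), p in joint.items() if y == y0}
        total += p_y * entropy(cond)
    return total

def mutual_information(joint):
    """I(X:Y): expectation of log r(xy), with r(xy) = p(xy) / (p(x) p(y))."""
    px, py = marginals(joint)
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Hypothetical joint distribution over two correlated binary nuggets.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
px, py = marginals(joint)

print(conditional_entropy(joint), entropy(joint) - entropy(py))
print(mutual_information(joint), entropy(px) - conditional_entropy(joint))
```

Each printed pair agrees up to floating-point rounding, matching the conditional-entropy and mutual-information theorems.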

32
Conditional Entropy Theorem
The conditional entropy of X given Y is the joint
entropy of XY minus the entropy of Y.
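
One way to see this, using only the definitions of conditional entropy and conditional probability from earlier in the module (a sketch, not necessarily the derivation shown on the original slide):

```latex
\begin{align*}
S(X|Y) &= \sum_y p(y) \sum_x p(x|y) \log \frac{1}{p(x|y)} \\
       &= \sum_{x,y} p(xy) \log \frac{p(y)}{p(xy)} \\
       &= \sum_{x,y} p(xy) \log \frac{1}{p(xy)} - \sum_y p(y) \log \frac{1}{p(y)} \\
       &= S(XY) - S(Y).
\end{align*}
```

The second line uses p(y) p(x|y) = p(xy), and the third uses the marginal sum of p(xy) over x, which is p(y).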
33
Mutual Information is Mutual Reduction in Entropy
And likewise, we also have I(X:Y) = S(Y) −
S(Y|X), since the definition is symmetric.
Also, I(X:Y) = S(X) + S(Y) − S(XY).
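
A sketch of why mutual information equals the reduction in entropy, starting from its definition as the expected log of the mutual probability ratio (reconstructed from the definitions in this module, not copied from the original slide):

```latex
\begin{align*}
I(X{:}Y) &= \sum_{x,y} p(xy) \log r(xy)
          = \sum_{x,y} p(xy) \log \frac{p(xy)}{p(x)\,p(y)} \\
         &= \sum_{x,y} p(xy) \log \frac{p(x|y)}{p(x)} \\
         &= \sum_x p(x) \log \frac{1}{p(x)} - \sum_{x,y} p(xy) \log \frac{1}{p(x|y)} \\
         &= S(X) - S(X|Y).
\end{align*}
```

Substituting S(X|Y) = S(XY) − S(Y) then gives the symmetric form S(X) + S(Y) − S(XY).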
34
Visualization of Mutual Information
  • Let the total length of the bar below represent
    the total amount of entropy in the system XY.

S(Y|X) = conditional entropy of Y given X
S(X) = entropy of X
S(XY) = joint entropy of X and Y
S(X|Y) = conditional entropy of X given Y
S(Y) = entropy of Y
35
Example 1
  • Suppose the sample space of primitive events
    consists of 5-bit strings B=b1b2b3b4b5.
  • Chosen at random with equal probability (1/32).
  • Let variable X=b1b2b3b4, and Y=b3b4b5.
  • Then S(X) = 4 bits, and S(Y) = 3 b.
  • Meanwhile S(XY) = 5 b.
  • Thus S(X|Y) = 2 b, and S(Y|X) = 1 b.
  • And so I(X:Y) = 2 b.

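
The values in this example can be verified by brute-force enumeration of the 32 strings. The helper `entropy_of` is a hypothetical name; it treats each variable as a function of the string B.

```python
import math
from collections import Counter
from itertools import product

strings = list(product("01", repeat=5))   # 32 equally likely 5-bit strings

def entropy_of(var):
    """Entropy in bits of a variable defined as a function of the string B."""
    counts = Counter(var(b) for b in strings)
    n = len(strings)
    return sum(c / n * math.log2(n / c) for c in counts.values())

def X(b):
    return b[0:4]            # X = b1 b2 b3 b4

def Y(b):
    return b[2:5]            # Y = b3 b4 b5

def XY(b):
    return (X(b), Y(b))      # the joint variable

S_X, S_Y, S_XY = entropy_of(X), entropy_of(Y), entropy_of(XY)
S_X_given_Y = S_XY - S_Y
S_Y_given_X = S_XY - S_X
I_XY = S_X + S_Y - S_XY

print(S_X, S_Y, S_XY, S_X_given_Y, S_Y_given_X, I_XY)
```

The mutual information of 2 bits corresponds to the two shared bits b3 and b4.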
36
Example 2
  • Let the sample space A consist of the 8 letters
    {a,b,c,d,e,f,g,h}. (All equally likely.)
  • Let X partition A into x1={a,b,c,d} and
    x2={e,f,g,h}.
  • Y partitions A into y1={a,b,e}, y2={c,f},
    y3={d,g,h}.
  • Then we have
  • S(X) = 1 bit.
  • S(Y) = 2(3/8 log 8/3) + (1/4 log 4) = 1.561278
    bits
  • S(Y|X) = (1/2 log 2) + 2(1/4 log 4) = 1.5 bits.
  • I(X:Y) = 1.561278b − 1.5b = .061278 b.
  • S(XY) = 1b + 1.5b = 2.5 b.
  • S(X|Y) = 1b − .061278b = .938722 b.

[Figure: the letters a through h arranged in a grid, with the partitions X and Y marked.]
(Meanwhile, the total information content of the
sample space = log 8 = 3 bits)
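
A short enumeration reproduces the figures in this example. The dictionaries X and Y encode the two partitions from the slide; `entropy_of` is a hypothetical helper.

```python
import math
from collections import Counter

letters = "abcdefgh"   # 8 equally likely letters
X = {c: "x1" if c in "abcd" else "x2" for c in letters}
Y = {c: "y1" if c in "abe" else "y2" if c in "cf" else "y3" for c in letters}

def entropy_of(label):
    """Entropy in bits of the partition induced by a labeling of the letters."""
    counts = Counter(label(c) for c in letters)
    n = len(letters)
    return sum(k / n * math.log2(n / k) for k in counts.values())

S_X = entropy_of(lambda c: X[c])
S_Y = entropy_of(lambda c: Y[c])
S_XY = entropy_of(lambda c: (X[c], Y[c]))
S_Y_given_X = S_XY - S_X
S_X_given_Y = S_XY - S_Y
I_XY = S_X + S_Y - S_XY

print(round(S_Y, 6), S_Y_given_X, round(I_XY, 6), S_XY, round(S_X_given_Y, 6))
```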
37
Physical Information
  • Now, physical information is simply information
    that is contained in the state of a physical
    system or subsystem.
  • We may speak of a holder, pattern, amount,
    subject, embodiment, meaning, cloud or
    representation of physical information, as with
    information in general.
  • Note that all information that we can manipulate
    ultimately must be (or be represented by)
    physical information!
  • So long as we are stuck in the physical universe!
  • In our quantum-mechanical universe, there are two
    very different categories of physical
    information:
  • Quantum information is all the information that
    is embodied in the quantum state of a physical
    system.
  • Unfortunately, it can't all be measured or
    copied!
  • Classical information is just a piece of
    information that picks out a particular measured
    state, once a basis for measurement is already
    given.
  • It's the kind of information that we're used to
    thinking about.

38
Objective Entropy?
  • In all of this, we have defined entropy as a
    somewhat subjective or relative quantity:
  • Entropy of a subsystem depends on an observer's
    state of knowledge about that subsystem, such as
    a probability distribution.
  • Wait a minute! Doesn't physics have a more
    objective, observer-independent definition of
    entropy?
  • Only insofar as there are preferred states of
    knowledge that are most readily achieved in the
    lab.
  • E.g., knowing of a gas only its chemical
    composition, temperature, pressure, volume, and
    number of molecules.
  • Since such knowledge is practically difficult to
    improve upon using present-day macroscale tools,
    it serves as a uniform standard.
  • However, in nanoscale systems, a significant
    fraction of the physical information that is
    present in one subsystem is subject to being
    known, or not, by another subsystem (depending on
    design).
  • ⇒ How a nanosystem is designed & how we deal with
    information recorded at the nanoscale may vastly
    affect how much of the nanosystem's internal
    physical information effectively is or is not
    entropy (for practical purposes).

39
Entropy in Compound Systems
  • When modeling a compound system C having at least
    two subsystems A and B, we can adopt either of
    (at least) two different perspectives:
  • The external perspective, where we treat AB as a
    single system, and we (as modelers) have some
    probability distribution over its states.
  • This allows us to derive an entropy for the whole
    system.
  • The internal perspective, in which we imagine
    putting ourselves in the shoes of one of the
    subsystems (say A), and considering its state of
    knowledge about B.
  • A may have more knowledge about B than we do.
  • We'll see how to make the total expected entropy
    come out the same in both perspectives!

40
Beyond Statistical Entropy
41
Entropy as Information
  • A bit of history
  • Most of the credit for originating this concept
    really should go to Ludwig Boltzmann.
  • He (not Shannon) first characterized the entropy
    of a system as the expected log-improbability of
    its state, H = −Σ(p_i log p_i).
  • He also discussed combinatorial reasons for its
    increase in his famous H-theorem.
  • Shannon brought Boltzmann's entropy to the
    attention of communication engineers.
  • And he taught us how to interpret Boltzmann's
    entropy as unknown information, in a
    communication-theory context.
  • von Neumann generalized Boltzmann entropy to
    quantum mixed states.
  • That is, the S = −Tr ρ ln ρ expression that we
    all know and love.
  • Jaynes clarified how the von Neumann entropy of a
    system can increase over time
  • Either when the Hamiltonian itself is unknown, or
    when we trace out entangled subsystems
  • Zurek suggested adding algorithmically
    incompressible information to the part of
    physical information that we consider to be
    entropy
  • I will discuss a variation on this theme.

42
Why go beyond the statistical definition of
entropy?
  • We may argue the statistical concept of entropy
    is incomplete,
  • because it doesn't even begin to break down the
    ontology-epistemology barrier.
  • In the statistical view, a knower (such as
    ourselves) must always be invoked to supply a
    state of knowledge (probability distribution)
  • But we typically treat the knower as being
    fundamentally separate from the physical system
    itself.
  • However, in reality, we ourselves are part of the
    physical system that is our universe
  • Thus, a complete understanding of entropy must
    also address what knowledge means, physically

43
Small Physical Knowers
  • Of course, humans are extremely large complex
    physical systems, and to physically characterize
    our states of knowledge is a very long way off
  • However, we can hope to characterize the
    knowledge of simpler systems.
  • Computer engineers find that in practice, it can
    be very meaningful and useful to ascribe
    epistemological states even to extremely simple
    systems.
  • E.g., digital systems and their component
    subsystems.
  • When analyzing complex digital systems,
  • we constantly say things like, "At such-and-such
    time, component A knows such-and-such information
    about the state of component B."
  • This means, essentially, that there is a specific
    correlation between the states of A and B.
  • For nano-scale digital devices, we can strive to
    exactly characterize their logical states in
    mathematical physics terms
  • Thus we ought to be able to say exactly what it
    means, physically, for one component to know some
    information about another.

44
What we'd like to say
  • We want to formalize arguments such as the
    following:
  • "Component A doesn't know the state of component
    B, so the physical information in B is entropy to
    component A. Component A can't destroy the
    entropy in B, due to the 2nd law of
    thermodynamics, and therefore A can't reset B to
    a standard state without expelling B's entropy to
    the environment."
  • We want all of these to be mathematically
    well-defined and physically meaningful
    statements, and we want the argument itself to be
    formally provable!
  • One motivation: A lot of head-in-the-sand
    technologists are still in a state of denial
    about Landauer's principle!
  • Oblivious erasure of non-entropy information
    turns it into entropy.
  • We need to be able to prove it to them with
    simple, undeniable, clear and correct arguments!
  • To get reversible/quantum computing more traction
    in industry.

45
Insufficiency of Statistical Entropy for Physical
Knowers
  • Unfortunately for this kind of program:
  • If the ordinary statistical definition of entropy
    is used,
  • together with a knower that is fully defined as
    an actual physical system, then
  • The 2nd law of thermodynamics no longer holds!
  • Note: the unknown information in a system can be
    reduced.
  • Simply let the knower system perform a
    (coherent, reversible) measurement of the target
    system, to gain knowledge about the state of the
    target system!
  • The entropy of the target system (from the
    knower's perspective) is then reduced.
  • The 2nd law says there must be a corresponding
    increase in entropy somewhere, but where?
  • This is the essence of the Maxwell's Demon
    paradox.
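
The bookkeeping behind this puzzle can be sketched with a deliberately simplified classical toy model. It is an assumption of this example, not the slides' formalism: one uniformly random target bit, one demon memory bit starting at 0, and a "measurement" that reversibly copies the target into the memory. All names are hypothetical.

```python
import math
from collections import defaultdict

def entropy(dist):
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

def marginal(joint, index):
    out = defaultdict(float)
    for state, p in joint.items():
        out[state[index]] += p
    return out

def measure(joint):
    """Reversible toy measurement: XOR the target bit t into the demon bit d."""
    out = defaultdict(float)
    for (t, d), p in joint.items():
        out[(t, d ^ t)] += p
    return out

def target_given_demon(joint):
    """Conditional entropy S(target | demon) = S(joint) - S(demon)."""
    return entropy(joint) - entropy(marginal(joint, 1))

before = {(0, 0): 0.5, (1, 0): 0.5}   # states are (target, demon)
after = measure(before)

print(entropy(before), entropy(after))                          # 1.0 1.0
print(target_given_demon(before), target_given_demon(after))    # 1.0 0.0
print(entropy(marginal(after, 1)))                              # 1.0
```

From the outside, the joint entropy is unchanged at 1 bit. From the demon's side, the target's entropy drops from 1 bit to 0, while the demon's own record now carries 1 bit as seen from outside.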

46
Entropy in knowledge?
  • Resolution suggested by Bennett:
  • The demon's knowledge of the result of his
    measurement can itself be considered to
    constitute one form of entropy!
  • It must be expelled into the environment in order
    to reset his state.
  • But, what if we imagine ourselves in the demon's
    shoes?
  • Clearly, the demon's knowledge of the measurement
    result itself constitutes known information,
    from his own perspective!
  • I.e., the demon's own subjective posterior
    probability distribution that he would (or
    should) assess over the possible values of his
    knowledge of the result, after he has already
    obtained this knowledge, will be entirely
    concentrated on the actual outcome.
  • The statistical entropy of this distribution is
    zero!
  • So, here we have a type of entropy that is
    present in someone's (the demon's) own knowledge
    itself, and is not unknown information!
  • Needed: A way to make sense of this, and to
    mathematically quantify this "entropy of
    knowledge."

47
Quantifying the Entropy of Knowledge, Approach 1
  • The traditional position says: In order to
    properly define the entropy in the demon's state
    of knowledge, we must always pop up to the
    meta-perspective from which we are describing the
    whole physical situation.
  • We ourselves always implicitly possess some
    probability distribution over the states of the
    joint demon-target system.
  • We should just take the statistical entropy of
    that distribution.
  • Problem: This approach doesn't face up to the
    fact that we are physical systems too!
  • It doesn't offer any self-consistent way that
    physical systems themselves can ever play the
    role of a knower!
  • I.e., describe other systems, assess subjective
    probability distributions over their state,
    modify those distributions via measurements, etc.
  • This contradicts our own personal physical
    experience,
  • as well as what we expect that quantum computers
    performing coherent measurements of other systems
    ought to be able to do

48
Approach 2
  • The entropy inherent in some known information is
    the smallest size to which this information can
    be compressed.
  • But of course, this depends on the coding system.
  • Zurek suggests using Kolmogorov complexity (the
    size of the shortest generating program).
  • But there are two problems with doing that:
  • It's only well-defined up to an additive
    constant.
  • That is, modulo a choice of universal programming
    language.
  • It's uncomputable!
  • What else might we try?
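
To illustrate the dependence on the coding system, here is a minimal Python sketch. It uses three general-purpose compressors from the standard library as stand-ins for different description systems. These only give upper bounds on compressed size; they are not Kolmogorov complexity, which is uncomputable. The sample data and the helper name compressed_sizes are assumptions made for illustration.

```python
import bz2
import lzma
import os
import zlib


def compressed_sizes(data: bytes) -> dict:
    """Return the compressed size in bytes under three different coding systems."""
    return {
        "zlib": len(zlib.compress(data, 9)),
        "bz2": len(bz2.compress(data, 9)),
        "lzma": len(lzma.compress(data)),
    }


# A highly regular pattern versus random bytes of the same length (8192 bytes).
regular = b"01" * 4096
random_like = os.urandom(8192)

print("regular:", compressed_sizes(regular))
print("random :", compressed_sizes(random_like))
```

The three sizes for the same input generally differ, which is the coding-system dependence in miniature; the random input should not shrink under any of them.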

49
Approach 3 (We Suggest)
  • We propose: The entropy content of some known
    piece of information is its compressed size
    according to whatever encoding would have yielded
    the smallest expected compressed size, a priori.
  • That is, taking the expectation value over all
    the possible patterns of information before the
    actual one was obtained.
  • This is nice, because the expected value of
    posterior entropy then closely matches the
    ordinary statistical entropy of the prior
    distribution.
  • Even exactly, in special cases, or in the limit
    of many repetitions
  • Due to a simple application of Shannon's
    channel-capacity theorem.
  • We can then show that the 2nd law gets obeyed on
    average.
  • But, from whose a priori probability distribution
    is this expectation value of compressed size to
    be obtained?

Expected length of the codeword c_i encoding information pattern i
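
The claim that the expected compressed size closely matches the statistical entropy of the prior can be checked numerically. The sketch below uses a binary Huffman code as the minimum-expected-length encoding; that choice, the function names, and the two example priors are assumptions for illustration, not necessarily the construction intended in the slides.

```python
import heapq
import math


def shannon_entropy(probs):
    """Statistical entropy in bits of a prior distribution over patterns."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def huffman_code_lengths(probs):
    """Codeword lengths |c_i| of a binary Huffman code for the given prior."""
    if len(probs) == 1:
        return [1]
    # Heap entries: (subtree probability, unique tie-breaker, pattern indices).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for i in group1 + group2:
            lengths[i] += 1  # each merge adds one bit to every codeword below it
        heapq.heappush(heap, (p1 + p2, counter, group1 + group2))
        counter += 1
    return lengths


def expected_length(probs, lengths):
    """Expected compressed size: sum over i of p_i * |c_i|."""
    return sum(p * n for p, n in zip(probs, lengths))


dyadic = [0.5, 0.25, 0.125, 0.125]  # special case: exact match with entropy
skewed = [0.7, 0.2, 0.1]            # general case: within one bit of entropy

for prior in (dyadic, skewed):
    lengths = huffman_code_lengths(prior)
    print(prior, lengths, expected_length(prior, lengths), shannon_entropy(prior))
```

For the dyadic prior the expected length equals the entropy; for the skewed prior it exceeds the entropy by less than one bit, and that gap shrinks per symbol when many repetitions are encoded together.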
50
Who picks the compressor?
  • Two possible answers to this
  • Use our probability distribution when we
    originally describe and analyze the hypothetical
    situation from outside.
  • Although this is a bit distasteful, since here we
    are resorting to the meta-perspective again,
    which we were trying to avoid
  • However, at least we do manage to sidestep the
    paradox
  • Or, we can use the demon's own a priori
    assessment of the probabilities
  • That is, essentially, let him pick his own
    compression system, however he wants!
  • The entropy of knowledge is then defined in a
    relative way, as the smallest size that a given
    entity with that knowledge would or could
    compress that knowledge to,
  • given a specification of its capabilities,
    together with any of its previous decisions or
    commitments as to the compression strategy it
    would use.

51
A Simple Example
  • Suppose we have a separable two-qubit system ab,
  • Where qubit a initially contains 1 bit of
    entropy
  • I.e., described by density operator $\rho_a =
    \tfrac12|0\rangle\langle 0| + \tfrac12|1\rangle\langle 1|$.
  • while qubit b is in a pure state (say $|0\rangle$)
  • Its density operator (if we care) is $\rho_b =
    |0\rangle\langle 0|$.
  • Now, suppose we do a CNOT(a,b).
  • Can view this process as a measurement of qubit
    a by qubit b.
  • Qubit b could be considered a subsystem of some
    quantum knower
  • Assuming the observer knows that this process has
    occurred,
  • We can say that he now knows the state of a!
  • Since the state of a is now correlated with a
    part of b's own state.
  • I.e., from b's personal subjective point of
    view, bit a is no longer an unknown bit
  • But it is still entropy, because the expected
    compressed size of an encoding of this data is
    still 1 bit!
  • This becomes clearer in a larger example

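Diagram labels: initially $\rho_a = \tfrac12|0\rangle\langle 0| + \tfrac12|1\rangle\langle 1|$ and $|b\rangle = |0\rangle$; after the CNOT, $\rho_{ab} = \tfrac12|00\rangle\langle 00| + \tfrac12|11\rangle\langle 11|$.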
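
As a worked check of this example (a sketch, with entropies in bits and S denoting the von Neumann entropy; the mutual-information symbol I(a:b) and the before/after superscripts are notation introduced here for illustration):

```latex
\begin{align*}
\rho_{ab}^{\text{before}}
  &= \left(\tfrac12|0\rangle\langle 0| + \tfrac12|1\rangle\langle 1|\right)
     \otimes |0\rangle\langle 0|
   = \tfrac12|00\rangle\langle 00| + \tfrac12|10\rangle\langle 10|, \\
\rho_{ab}^{\text{after}}
  &= \mathrm{CNOT}\,\rho_{ab}^{\text{before}}\,\mathrm{CNOT}^{\dagger}
   = \tfrac12|00\rangle\langle 00| + \tfrac12|11\rangle\langle 11|, \\
S\!\left(\rho_{ab}^{\text{after}}\right) &= 1, \qquad
S(\rho_a) = S(\rho_b) = 1, \\
I(a{:}b) &= S(\rho_a) + S(\rho_b) - S\!\left(\rho_{ab}^{\text{after}}\right) = 1 .
\end{align*}
```

The joint entropy stays at 1 bit throughout, while the 1 bit of mutual information is the sense in which b now "knows" a.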
52
Slightly Larger Example
  • Suppose system A initially contains 8 random
    qubits a0…a7, with a uniform distribution over
    their values
  • A thus contains 8 bits of entropy.
  • And system B initially contains a large number
    b0, … of empty qubits.
  • B contains 0 entropy initially
  • Now, say we do CNOT(ai, bi) for i = 0 to 3
  • B now knows the values of a0,…,a3.
  • The information in A that is unknown by B is now
    only the 4 other bits a4…a7.
  • But, the AB system also contains an additional 4
    bits of information about A (shared between A and
    B) which (though known by B) is (we expect) still
    incompressible by B
  • I.e., the encoding that offers the minimum
    expected length (prior to learning a0…a3) still
    has an expected length of 4 bits!
  • A second CNOT(bi, ai) can allow B to reversibly
    clear the entropy from system A.
  • Note this is a Maxwell's Demon type of scenario.
  • Entropy isn't lost because the incompressible
    information in B is still entropy!
  • From an outside observer's perspective, the
    amount of unknown information remains the same in
    all these situations
  • But from an inside perspective, entropy can flow
    (reversibly) from known to unknown and back
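
The bookkeeping in this example can be checked with a small Python sketch. Because A starts in a uniform mixture of computational-basis states and CNOT only permutes basis states, the joint state stays diagonal, so a classical probability distribution over bit strings is enough to track the entropies. That simplification, the truncation of B to 4 qubits, and all function names are assumptions made for illustration.

```python
import itertools
import math
from collections import Counter


def entropy_bits(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(dist, pick):
    """Marginal distribution of pick(state) under dist."""
    out = Counter()
    for state, p in dist.items():
        out[pick(state)] += p
    return dict(out)


def push_forward(dist, gate):
    """Push a distribution through a reversible map on basis states."""
    return {gate(state): p for state, p in dist.items()}


def measure(state):
    """CNOT(a_i, b_i) for i = 0..3: B copies the first four bits of A."""
    a, b = state
    return a, tuple(b[i] ^ a[i] for i in range(4))


def control(state):
    """CNOT(b_i, a_i) for i = 0..3: B clears the first four bits of A."""
    a, b = state
    return tuple(a[i] ^ b[i] for i in range(4)) + a[4:], b


def report(dist):
    """Return (S(A), S(B), S(AB), mutual information) in bits."""
    s_a = entropy_bits(marginal(dist, lambda s: s[0]))
    s_b = entropy_bits(marginal(dist, lambda s: s[1]))
    s_ab = entropy_bits(dist)
    return s_a, s_b, s_ab, s_a + s_b - s_ab


# Initial state: A uniform over 8 bits, B all zeros.
initial = {(a, (0, 0, 0, 0)): 1 / 256 for a in itertools.product((0, 1), repeat=8)}
measured = push_forward(initial, measure)
cleared = push_forward(measured, control)

for name, dist in [("initial", initial), ("measured", measured), ("cleared", cleared)]:
    print(name, report(dist))
```

The joint entropy S(AB) is 8 bits at every stage. After the first CNOT, 4 bits of mutual information appear; after the second, A's marginal entropy drops to 4 bits and the other 4 bits reside in B alone.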

53
Entropy Conversion
Diagram: target system A (qubits a0…a7) shown above demon system B (qubits b0…b7).
Before: A = x0 x1 x2 x3 x4 x5 x6 x7 (8 bits unknown to B); B = 0 0 0 0 0 0 0 0.
After CNOT(a0-3 → b0-3), B (reversibly) measures A: A = x0 x1 x2 x3 x4 x5 x6 x7; B = x0 x1 x2 x3 0 0 0 0.
4 bits of A are known to B (correlation); 4 bits are unknown to B.
B holds 4 bits of knowledge; the 8 correlated bits all together are compressible to 4 bits.
  • In all stages, there remain 8 total bits of
    entropy.
  • All 8 are unknown to us in our
    meta-perspective.
  • But some may be known to subsystem B!
  • Still call them entropy for B if we don't
    expect B can compress them away

Diagram, continued: after CNOT(b0-3 → a0-3), B (reversibly) controls A.
A = 0 0 0 0 x4 x5 x6 x7 (4 bits unknown to B); B = x0 x1 x2 x3 0 0 0 0.
4 incompressible bits remain in B's internal state of knowledge.
54
Are we done?
  • I.e., have we arrived at a satisfactory
    generalization of the entropy concept?
  • Perhaps not quite, because
  • We've been vague about how to define the
    compression system that the knower would use.
  • Or in other words, the knower's prior
    distribution.
  • We haven't yet provided an operational definition
    (that can be replicably verified by a third
    party) of the meaning of:
  • "The entropy of a physical system A, as assessed
    by another physical system (the knower) B."
  • However, there might be no way to do better

55
One Possible Conclusion
  • Perhaps the entropy of a particular piece of
    known information can only be defined relative to
    a given description system.
  • Where by "description system" I mean a bijection
    between compressed and decompressed
    informational objects, $c_i \leftrightarrow d_i$.
  • Most usefully, the map should be computable.
  • This is not really any worse than the situation
    with standard statistical entropy, where it is
    only defined relative to a given state of
    knowledge, in the form of a probability
    distribution over states of the system.
  • The existence of optimal compression systems for
    given probability distributions strengthens the
    connection.
  • In fact, we can also infer a probability
    distribution from the description system, in
    cases of optimal description systems
  • We could consider a description system, rather
    than a probability distribution, to be the
    fundamental starting point for any discussion of
    entropy.
  • But, can we do better?
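
The step from a description system to a probability distribution can be sketched as follows. For an optimal prefix-free binary code, a codeword of length |c_i| corresponds to probability 2^(-|c_i|). The example codewords, the objects they decode to, and the function name are hypothetical; the prefix-free assumption is mine.

```python
from fractions import Fraction


def implied_distribution(description_system):
    """Infer p_i = 2^(-|c_i|) from a map of binary codewords c_i to objects d_i.

    Assumes the codewords form a prefix-free code; the probabilities sum to 1
    only when the code is complete (Kraft's inequality holds with equality).
    """
    return {d: Fraction(1, 2 ** len(c)) for c, d in description_system.items()}


# Hypothetical description system: compressed codeword -> decompressed object.
system = {"0": "calm", "10": "breeze", "110": "gale", "111": "storm"}

probs = implied_distribution(system)
kraft_sum = sum(probs.values())
print(probs, kraft_sum)
```

Here the Kraft sum is exactly 1, so the code lengths define a proper distribution, and the statistical entropy of that distribution equals the code's expected length.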

56
The Entropy Game
  • A game (or adversarial protocol) between two
    players (A and B) that can be used to
    operationally define the entropy content of a
    given target physical system X.
  • X should have a well-defined state space,
    with N states, so total information content
    $I_{tot} = \log N$.
  • Basic idea: A must use X (reversibly) as a
    storage medium for data provided by B.
  • The entropy of X is defined as its total
    info. content, minus the expected logarithm of
    the number of messages that A can reliably store
    and retrieve from it.
  • Rules of the game:
  • A and B start out unentangled with each other
    (and with X).
  • A publishes his own exact initial classical
    state A0 in a public record.
  • B can probe A to make sure he is telling the
    truth.
  • Meanwhile, B prepares in secret any string W = W0
    of any number n of bits.
  • B passes his string W to A. A may observe its
    length n.
  • A may then carry out any fixed quantum algorithm
    Q1 operating on the closed joint system (A,X,W),
    under the condition:
  • The final state must leave (A,X,W) unentangled,
    A = A0, and W = $0^n$.
  • B is allowed to probe A and W to verify that
    A = A0 and W = $0^n$.
  • Finally, A carries out another fixed quantum
    algorithm Q2, returning again to his initial
    state A0, and supposedly restoring W to its
    initial state.
  • A returns W to B; B is allowed to check W and A
    again to verify that these conditions are
    satisfied.

Iterate till convergence.
Definition: The entropy of system X is its total
information content $I_{tot}$ minus the maximum over A's
strategies (starting states A0, and algorithms Q1, Q2)
of the expectation value (over states of X) of the
minimum over B's strategies (sequences of strings) of
the average length of those strings that are exactly
returned by A (in step 8) with zero probability
of error.
57
Intuitions behind the Game
  • A wants to show that X has a low entropy (high
    available storage capacity or extropy).
  • He will choose an encoding of strings W in X's
    state that is as efficient as possible.
  • A chooses his strategy without knowledge of what
    strings B will provide
  • The coding scheme must thus be very general.
  • Meanwhile, B wants to show that X has a high
    entropy (low capacity).
  • B will

58
Explaining Entropy Increase
  • When the Hamiltonian of a closed system is
    exactly known,
  • The statistical (von Neumann) entropy of the
    system's density operator is exactly conserved.
  • I.e., there is no entropy increase.
  • In the traditional statistical view of entropy,
  • Entropy can only increase in one of the following
    situations:
  • (a) The Hamiltonian is not precisely known, or
  • (b) The system is not closed
  • Entropy can leak into the system from an unknown
    outside environment
  • (c) We estimate entropy by tracing over entangled
    subsystems
  • Take reduced density operators of individual
    subsystems
  • And pretend the entropy is additive
  • However, in the

59
Extra Slides
  • Omitted from talk for lack of time

60
Information Content of a Physical System
  • The (total amount of) information content I(A) of
    an abstract physical system A is the unknown
    information content of the mathematical object D
    used to define A.
  • If D is (or implies) only a set S of (assumed
    equiprobable) states, then we have $I(A) =
    U(S) = \log |S|$.
  • If D implies a probability distribution $P_S$ over
    a set S (of distinguishable states), then
    $I(A) = U(P_S) = -\sum_i P_i \log P_i$.
  • We would expect to gain I(A) information if we
    measured A (using basis set S) to find its exact
    actual state $s \in S$.
  • Therefore, we say that amount I(A) of information
    is contained in A.
  • Note that the information content depends on how
    broad (how abstract) the system's description D
    is!
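
A minimal Python sketch of the two cases, using base-2 logarithms so that results are in bits. The die example and the function names are hypothetical, chosen only to show that a more specific description D yields a smaller information content.

```python
import math


def info_content_states(states):
    """I(A) = U(S) = log2 |S| for a set S of equiprobable states (in bits)."""
    return math.log2(len(set(states)))


def info_content_distribution(dist):
    """I(A) = U(P_S) = -sum_i P_i log2 P_i for a distribution over states."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


# Hypothetical system: one six-sided die, described two ways.
# Broad description: only the set of six faces is given.
fair = info_content_states(range(1, 7))
# More specific description: face 6 is known to come up half the time.
loaded = info_content_distribution({6: 0.5, **{k: 0.1 for k in range(1, 6)}})
print(fair, loaded)
```

The broad description gives log2 6 bits, while the more specific distribution gives fewer, matching the note that information content depends on how abstract the description is.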

61
Information Capacity & Entropy
  • The information capacity of a system is also the
    amount of information about the actual state of
    the system that we do not know, given only the
    system's definition.
  • It is the amount of physical information that we
    can say is in the state of the system.
  • It is the amount of uncertainty we have about the
    state of the system, if we know only the system's
    definition.
  • It is also the quantity that is traditionally
    known as the (maximum) entropy S of the system.
  • Entropy was originally defined as the ratio of
    heat to temperature.
  • The importance of this quantity in thermodynamics
    (the observed fact that it never decreases) was
    first noticed by Rudolf Clausius in 1850.
  • Today we know that entropy is, physically, really
    nothing other than (unknown, incompressible)
    information!

62
Known vs. Unknown Information
  • We, as modelers, define what we mean by the
    system in question using some abstract
    description D.
  • This implies some information content I(A) for
    the abstract system A described by D.
  • But, we will often wish to model a scenario in
    which some entity E (perhaps ourselves) has more
    knowledge about the system A than is implied by
    its definition.
  • E.g., scenarios in which E has prepared A more
    specifically, or has measured some of its
    properties.
  • Such E will generally have a more specific
    description of A and thus would quote a lower
    resulting I(A) or entropy.
  • We can capture this by distinguishing the
    information in A that is known by E from that
    which is unknown.
  • Let us now see how to do this a little more
    formally.

63
Subsystems (More Generally)
  • For a system A defined by a state set S,
  • any partition P of S into subsets can be
    considered a subsystem B of A.
  • The subsets in the partition P can be considered
    the states of the subsystem B.

Diagram labels: "One subsystem of A"; "Another subsystem of A".
In this example, the product of the two partitions forms a partition of S into singleton sets.
We say that this is a complete set of subsystems of A.
In this example, the two subsystems are also independent.
64
Pieces of Information
  • For an abstract system A defined by a state set
    S, any subset $T \subseteq S$ is a possible piece of
    information about A.
  • Namely it is the information "The actual state of
    A is some member of this set T."
  • For an abstract system A defined by a probability
    distribution $P_S$, any probability distribution
    $P'_S$ such that $P = 0 \Rightarrow P' = 0$ and
    $U(P') < U(P)$ is another possible piece of
    information about A.
  • That is, any distribution that is consistent with
    and more informative than A's very definition.

65
Known Physical Information
  • Within any universe (closed physical system) W
    described by distribution P, we say entity E (a
    subsystem of W) knows a piece P of the physical
    information contained in system A (another
    subsystem of W) iff P implies a correlation
    between the state of E and the state of A, and
    this correlation is meaningfully accessible to E.
  • Let us now see how to make this definition more
    precise.

Diagram: the universe W contains the entity (knower) E and the physical system A, linked by a correlation.
66
What is a correlation, anyway?
  • A concept from statistics:
  • Two abstract systems A and B are correlated or
    interdependent when the entropy of the combined
    system, S(AB), is less than S(A) + S(B).
  • I.e., something is known about the combined state
    of AB that cannot be represented as knowledge
    about the state of either A or B by itself.
  • E.g. A, B each have 2 possible states 0, 1
  • They each have 1 bit of entropy.
  • But, we might also know that A = B, so the entropy
    of AB is 1 bit, not 2. (States 00 and 11.)
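
Worked out explicitly, assuming the two allowed joint states 00 and 11 are equally likely (an assumption the example leaves implicit):

```latex
\begin{align*}
P(A{=}0,B{=}0) &= P(A{=}1,B{=}1) = \tfrac12, \qquad
P(A{=}0,B{=}1) = P(A{=}1,B{=}0) = 0, \\
S(A) &= S(B) = -2\cdot\tfrac12\log_2\tfrac12 = 1\ \text{bit}, \\
S(AB) &= -2\cdot\tfrac12\log_2\tfrac12 = 1\ \text{bit}
  \;<\; S(A) + S(B) = 2\ \text{bits}.
\end{align*}
```

The 1-bit shortfall is the amount of correlation between A and B.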

67
Known Information, More Formally
  • For a system defined by probability distribution
    P that includes two subsystems A,B with
    respective state variables X,Y having mutual
    information $I_P(X{:}Y)$,
  • The total information content of B is $I(B) =
    U(P_Y)$.
  • The amount of information in B that is known by A
    is $K_A(B) = I_P(X{:}Y)$.
  • The amount of information in B that is unknown by
    A is $U_A(B) = U(P_Y) - K_A(B) = S(Y) - I(X{:}Y) =
    S(Y|X)$.
  • The amount of entropy in B from A's perspective
    is $S_A(B) = U_A(B) = S(Y|X)$.
  • These definitions are based on all the
    correlations that are present between A and B
    according to our global knowledge P.
  • However, a real entity A may not know,
    understand, or be able to utilize all the
    correlations that are actually present between
    him and B.
  • Therefore, generally more of B's physical
    information will be effectively entropy, from A's
    perspective, than is implied by this definition.
  • We will explore some corrections to this
    definition later.
  • Later, we will also see how to sensibly extend
    this definition to the quantum context.
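
These definitions can be sketched in a few lines of Python for the classical case. The joint distribution below (knower A holding a copy of B's bit that is wrong 10% of the time) and the function names are hypothetical, chosen to give a case where some but not all of B's information is known to A.

```python
import math
from collections import defaultdict


def H(dist):
    """Entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def knowledge_split(joint):
    """Given P(x, y), return (I(B), K_A(B), U_A(B)) in bits.

    x is the state of knower A, y is the state of system B.
    """
    p_x, p_y = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        p_x[x] += p
        p_y[y] += p
    total = H(p_y)                      # I(B) = U(P_Y) = S(Y)
    known = H(p_x) + H(p_y) - H(joint)  # K_A(B) = I_P(X:Y)
    unknown = total - known             # U_A(B) = S(Y|X)
    return total, known, unknown


# Hypothetical joint distribution: A's record of B's bit is wrong 10% of the time.
noisy_copy = {(0, 0): 0.45, (1, 1): 0.45, (0, 1): 0.05, (1, 0): 0.05}
print(knowledge_split(noisy_copy))
```

With a perfect copy, all 1 bit of B would count as known and none as entropy from A's perspective; the noise leaves part of that bit unknown.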

68
Maximum Entropy vs. Entropy
Total information content I = Maximum entropy
$S_{max}$ = logarithm of # states consistent with
system's definition
Unknown information $U_A$ = Entropy $S_A$ (as seen by
observer A)
Known information $K_A = I - U_A = S_{max} - S_A$ (as
seen by observer A)
Unknown information $U_B$ = Entropy $S_B$ (as seen by
observer B)
69
A Simple Example
  • A spin is a type of simple quantum system having
    only 2 distinguishable states.
  • In the z basis, the basis states are called up
    (↑) and down (↓).
  • In the example to the right, we have a compound
    system composed of 3 spins.
  • Therefore, it has 8 distinguishable states.
  • Suppose we know that the 4 crossed-out states
    have 0 amplitude (0 probability).
  • Due to prior preparation or measurement of the
    system.
  • Then the system contains
  • One bit of known information
  • in spin 2
  • and two bits of entropy
  • in spins 1 & 3
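
The counting in this example can be reproduced in a few lines of Python. The figure showing which four states are crossed out is not available here, so the sketch assumes they are the states with spin 2 down and that the remaining states are equiprobable; both are assumptions for illustration.

```python
import itertools
import math

# Basis states of three spins, each "up" or "down": 2**3 = 8 states.
all_states = list(itertools.product(("up", "down"), repeat=3))

# Assumption for illustration: the four excluded states are those with spin 2 "down".
allowed = [s for s in all_states if s[1] == "up"]

total_info = math.log2(len(all_states))  # I = 3 bits
entropy = math.log2(len(allowed))        # S = 2 bits (equiprobable allowed states)
known = total_info - entropy             # K = I - S = 1 bit
print(total_info, entropy, known)
```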

70
Entropy, as seen from the Inside
  • One problem with our previous definition of
    knowledge-dependent entropy based on mutual
    information is that it is only well-defined for
    an ensemble or probability distribution of
    observer states, not for a single observer state.
  • However, as observers, we always find ourselves
    in a particular state, not in an ensemble!
  • Can we obtain an alternative definition of
    entropy that works for (and can be used by)
    observers who are in individual states also?
  • While still obeying the 2nd law of
    thermodynamics?
  • Zurek proposed that entropy S should be defined
    to include not only unknown information U, but
    also incompressible information N.
  • By definition, incompressible information (even
    if it is known) cannot be reduced, therefore the
    validity of the 2nd law can be maintained.
  • Zurek proposed using a quantity called Kolmogorov
    complexity to measure the amount of
    incompressible information.
  • Size of shortest program that computes the