Introduction to information theory

1
Introduction to information theory
  • LING 572
  • Fei Xia, Dan Jinguji
  • Week 1: 1/10/06

2
Today
  • Information theory
  • Hw 1
  • Exam 1

3
Information theory
4
Information theory
  • Reading: M&S 2.2
  • It is the use of probability theory to quantify
    and measure information.
  • Basic concepts:
  • Entropy
  • Cross entropy and relative entropy
  • Joint entropy and conditional entropy
  • Entropy of the language and perplexity
  • Mutual information

5
Entropy
  • Entropy is a measure of the uncertainty
    associated with a distribution.
  • It is a lower bound on the average number of bits
    needed to transmit messages.
  • An example:
  • Display the results of horse races.
  • Goal: minimize the number of bits needed to encode
    the results.
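
The formula on this slide did not survive extraction. The standard definition, which the rest of the deck relies on, is given here as a reference; the base-2 logarithm is an assumption that matches the slide's use of bits.

```latex
H(X) = -\sum_{x} p(x) \log_2 p(x)
```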

6
An example
  • Uniform distribution: p_i = 1/8.
  • Non-uniform distribution: (1/2, 1/4, 1/8, 1/16,
    1/64, 1/64, 1/64, 1/64)
  • A matching code:
    (0, 10, 110, 1110, 111100, 111101, 111110, 111111)
  • The uniform distribution has higher entropy.
  • MaxEnt: make the distribution as uniform as
    possible.
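
The horse-race example can be checked numerically. The sketch below is not part of the original slides; the names `entropy`, `uniform`, `skewed`, and `codes` are chosen here for illustration. It computes the entropy of both distributions and the expected length of the variable-length code listed above.

```python
import math

def entropy(probs):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# The two distributions over eight horses from the slide.
uniform = [1 / 8] * 8
skewed = [1 / 2, 1 / 4, 1 / 8, 1 / 16, 1 / 64, 1 / 64, 1 / 64, 1 / 64]

# The variable-length code from the slide, one codeword per horse.
codes = ["0", "10", "110", "1110", "111100", "111101", "111110", "111111"]

# Average number of bits per race result when using that code.
expected_length = sum(p * len(c) for p, c in zip(skewed, codes))

print(entropy(uniform))   # 3.0 bits
print(entropy(skewed))    # 2.0 bits
print(expected_length)    # 2.0 bits
```

The code reaches the entropy lower bound for the non-uniform distribution because every codeword length equals -log2 of its probability.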

7
Cross Entropy
  • Entropy
  • Cross Entropy
  • Cross entropy is a distance measure between p(x)
    and q(x): p(x) is the true probability; q(x) is
    our estimate of p(x).
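
The cross-entropy formula is missing from the transcript. The standard definition, written in the H(X, q) notation the deck uses later, is given below as a reference; base 2 is assumed.

```latex
H(X, q) = -\sum_{x} p(x) \log_2 q(x), \qquad H(X, q) \ge H(X)
```

Equality holds only when q = p, so a worse estimate q costs more bits per message.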

8
Relative Entropy
  • Also called Kullback-Leibler divergence
  • Another distance measure between probability
    functions p and q.
  • KL divergence is asymmetric (not a true
    distance)
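
The formula for this slide was lost in extraction. The standard definition of relative entropy, together with its relation to cross entropy, is given below as a reference; base 2 is assumed.

```latex
D(p \,\|\, q) = \sum_{x} p(x) \log_2 \frac{p(x)}{q(x)} = H(X, q) - H(X)
```

D(p || q) is never negative, and it is zero only when p = q.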

9
Reading assignment 1
  • Read M&S 2.2: Essential Information Theory
  • Questions: For a random variable X, p(x) and q(x)
    are two distributions. Assume p is the true
    distribution.
  • p(X=a) = p(X=b) = 1/8, p(X=c) = 1/4, p(X=d) = 1/2
  • q(X=a) = q(X=b) = q(X=c) = q(X=d) = 1/4
  • (a) What is H(X)?
  • (b) What is H(X, q)?
  • (c) What is KL divergence D(p||q)?
  • (d) What is D(q||p)?
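
The four quantities can be checked with a few lines of code. This is a minimal sketch, not the worked solution from the original slides; the function names are chosen here for illustration, and base-2 logarithms are assumed.

```python
import math

def entropy(p):
    """H(X) in bits for a distribution given as {outcome: probability}."""
    return -sum(p[x] * math.log2(p[x]) for x in p if p[x] > 0)

def cross_entropy(p, q):
    """H(X, q) in bits: p is the true distribution, q is the estimate."""
    return -sum(p[x] * math.log2(q[x]) for x in p if p[x] > 0)

def kl_divergence(p, q):
    """D(p || q) in bits."""
    return sum(p[x] * math.log2(p[x] / q[x]) for x in p if p[x] > 0)

p = {"a": 1 / 8, "b": 1 / 8, "c": 1 / 4, "d": 1 / 2}
q = {"a": 1 / 4, "b": 1 / 4, "c": 1 / 4, "d": 1 / 4}

print(entropy(p))            # 1.75
print(cross_entropy(p, q))   # 2.0
print(kl_divergence(p, q))   # 0.25
print(kl_divergence(q, p))   # 0.25
```

The two divergences happen to coincide for this particular pair of distributions; in general D(p||q) and D(q||p) differ.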

10
H(X) and H(X, q)
11
D(p||q)
12
D(q||p)
13
Joint and conditional entropy
  • Joint entropy
  • Conditional entropy
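
Both formulas on this slide were lost in extraction. The standard definitions are given below as a reference, with base-2 logarithms assumed.

```latex
H(X, Y) = -\sum_{x} \sum_{y} p(x, y) \log_2 p(x, y)

H(Y \mid X) = -\sum_{x} \sum_{y} p(x, y) \log_2 p(y \mid x) = H(X, Y) - H(X)
```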

14
Entropy of a language (per-word entropy)
  • The entropy of a language L:
  • If we assume that the language is "nice", then
    the cross entropy can be calculated as:
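
The formulas for this slide are missing from the transcript. The block below gives the standard forms, following the usual presentation of this material; the model symbol m and the reading of "nice" as stationary and ergodic are assumptions, not recovered from the slide.

```latex
H(L) = -\lim_{n \to \infty} \frac{1}{n} \sum_{x_{1n}} p(x_{1n}) \log_2 p(x_{1n})

H(L, m) = -\lim_{n \to \infty} \frac{1}{n} \sum_{x_{1n}} p(x_{1n}) \log_2 m(x_{1n})

% If L is stationary and ergodic, one long enough sample suffices:
H(L, m) = -\lim_{n \to \infty} \frac{1}{n} \log_2 m(x_{1n})
```

In practice the limit is approximated by evaluating the last expression on a large test sample of n words.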

15
Per-word entropy (cont)
  • p(x_1n) can be calculated by n-gram models
  • Ex: unigram model

16
Perplexity
  • Perplexity is 2^H.
  • Perplexity is the weighted average number of
    choices a random variable has to make.
  • => We learned how to calculate perplexity in
    LING570.
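
Per-word cross entropy and perplexity under a unigram model can be sketched as follows. The toy corpus and the function names are hypothetical, and the model is an unsmoothed maximum-likelihood estimate, so any test word not seen in training raises a KeyError.

```python
import math
from collections import Counter

def train_unigram(tokens):
    """Maximum-likelihood unigram model: relative frequency of each word."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def per_word_cross_entropy(model, tokens):
    """-(1/n) * log2 m(x_1n), where m(x_1n) is the product of unigram probabilities."""
    log_prob = sum(math.log2(model[w]) for w in tokens)
    return -log_prob / len(tokens)

# Toy training data: a occurs 4 times, b twice, c and d once each.
train = "a b a c a b a d".split()
model = train_unigram(train)      # a: 1/2, b: 1/4, c: 1/8, d: 1/8

test = "a a b c".split()
H = per_word_cross_entropy(model, test)   # (1 + 1 + 2 + 3) / 4 = 1.75 bits
perplexity = 2 ** H

print(H)            # 1.75
print(perplexity)   # about 3.36
```

A perplexity near 3.36 means the model is, on average, about as uncertain as if it were choosing uniformly among three to four words at each position.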

17
Mutual information
  • It measures how much is in common between X and
    Y
  • I(X;Y) = KL(p(x,y) || p(x)p(y))
  • I(X;Y) = I(Y;X)
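
The definition above translates directly into code. The sketch below is illustrative only: the joint distributions are made-up examples and the function name is chosen here. It computes I(X;Y) as the KL divergence between the joint distribution and the product of its marginals.

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits; joint maps (x, y) pairs to probabilities."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p   # marginal p(x)
        py[y] = py.get(y, 0.0) + p   # marginal p(y)
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Two independent fair bits: nothing in common.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}

# Y is always equal to X: they share one full bit.
identical = {(0, 0): 0.5, (1, 1): 0.5}

# A dependent but asymmetric-looking joint, and the same joint with X and Y swapped.
skewed = {(0, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25}
swapped = {(y, x): p for (x, y), p in skewed.items()}

print(mutual_information(independent))   # 0.0
print(mutual_information(identical))     # 1.0
print(mutual_information(skewed) == mutual_information(swapped))
```

Swapping the roles of X and Y leaves the terms of the sum unchanged, which is why I(X;Y) = I(Y;X).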

18
Summary on Information theory
  • Reading: M&S 2.2
  • It is the use of probability theory to quantify
    and measure information.
  • Basic concepts:
  • Entropy
  • Cross entropy and relative entropy
  • Joint entropy and conditional entropy
  • Entropy of the language and perplexity
  • Mutual information

19
Hw1
20
Hw1
  • Q1-Q5: Information theory
  • Q6: Condor submit
  • Q7: Hw10 from LING570.
  • You are not required to turn in anything for Q7.
  • If you want feedback on this, you can choose to
    turn it in.
  • It won't be graded. You get 30 points for free.

21
Q6 condor submission
  • http://staff.washington.edu/brodbd/orientation.pdf
  • Especially slides 22-28.

22
  • For a command we can run as:
  • mycommand -a -n < mycommand.in > mycommand.out
  • The submit file might look like this (save it to
    a .cmd file):
  • Executable = mycommand   <- the command
  • Universe = vanilla
  • getenv = true
  • input = mycommand.in   <- STDIN
  • output = mycommand.out   <- STDOUT
  • error = mycommand.error   <- STDERR
  • Log = /tmp/brodbd/mycommand.log   <- a log file
    that stores the results of the condor submission
  • arguments = "-a -n"   <- the arguments for the
    command
  • transfer_executable = false
  • Queue
  • The "<-" notes are annotations, not part of the
    submit file.

23
Submission and monitoring jobs on condor
  • Submission:
  • condor_submit mycommand.cmd
  • => get a job number
  • List the job queue:
  • condor_q
  • Status changes from I (idle) to R (run)
    to
  • H (held) means the job failed. Look at the log
    file specified in the .cmd file.
  • Disappeared from the queue: you will receive an
    email.
  • Use man condor_q etc. to learn more about those
    commands.

24
The path names for files in .cmd
  • In the .cmd file:
  • Executable = aa194.exec
  • input = file1
  • The environment (e.g., ~/.bash_profile) might not
    be set properly.
  • It assumes that the files are in the current
    directory (the dir where the job is submitted).
  • => Use the full path names if needed.