Title: Probability: Overview, Definitions, Jargon & Blood Feuds
1. Probability: Overview, Definitions, Jargon & Blood Feuds
2. Probability vs. Statistics
- Statistics is a hard sell (on a good day)
- Most palatable approach that I know
- Concentrate on modeling rather than tests
- Bayesian vs Frequentist schools
- Emphasize connection to information theory
- Physical limits on information storage & transmission
- Where probability meets signals & systems engineering
3. Bayesians and Frequentists
- Believe it or not, statisticians fight
- Frequentists (old school)
- Emphasis on tests: t, χ², ANOVA
- Write down competing hypotheses, but only analyze the null hypothesis (!)
- Report significance (actually improbability)
- Bayesians (new school)
- Emphasis on modeling
- Build a model for all competing hypotheses
- Use probabilities to represent levels of belief
4. A word on notation
5. Distributions & densities
6. Discrete vs. continuous
Binomial
Gaussian
7. Cumulative distributions
- Density function
- Cumulative distribution function
8. More definitions
9. Normalization
Similarly for probability density functions, etc. (replace sums by integrals)
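A sketch of the standard normalization conditions, for a discrete distribution and for a density (sums replaced by integrals, as the slide notes):

```latex
\sum_x P(x) = 1
\qquad\text{and}\qquad
\int_{-\infty}^{\infty} p(x)\,dx = 1
```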
10. Independence
11. I.I.D.
12. Uniform
13. Let's get Bayesian
14. Example: Fall '05 admissions
P(C1)
P(A1 | C1)
P(A1, C1)
P(C1 | A1)
P(A1)
15. Example: Fall '05 admissions
P(C1)
P(A1 | C1) = 0.23 / 0.83
P(A1, C1)
P(C1 | A1) = 0.23 / 0.26
P(A1)
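A minimal numeric sketch of how these quantities fit together through Bayes' theorem, assuming (reading the ratios above) that P(C1) = 0.83, P(A1) = 0.26, and P(A1, C1) = 0.23; the variable names are just placeholders:

```python
# Hedged sketch: the three base numbers are read off the slide's ratios
# (0.23/0.83 and 0.23/0.26); the interpretation of C1 and A1 is assumed.
p_c1 = 0.83         # P(C1): marginal probability of C1
p_a1 = 0.26         # P(A1): marginal probability of A1
p_a1_and_c1 = 0.23  # P(A1, C1): joint probability

p_a1_given_c1 = p_a1_and_c1 / p_c1  # P(A1 | C1) = P(A1, C1) / P(C1) ~ 0.277
p_c1_given_a1 = p_a1_and_c1 / p_a1  # P(C1 | A1) = P(A1, C1) / P(A1) ~ 0.885

# Bayes' theorem ties the two conditionals together:
# P(C1 | A1) = P(A1 | C1) * P(C1) / P(A1)
assert abs(p_c1_given_a1 - p_a1_given_c1 * p_c1 / p_a1) < 1e-12

print(p_a1_given_c1, p_c1_given_a1)
```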
16. Bayesian inference
- Probabilities & frequencies are essentially alternative ways of looking at the same thing
- However... frequencies are sometimes more intuitive
- We will return to more examples of Bayes' Theorem and inference
17. Experimental error
18. Experimental error (cont.)
19. Approximate errors
20. Shannon information
Can also be interpreted as the number of bits that an ideal data compression algorithm needs to encode message x
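Slide 25 gives the defining formula, h = -lg(p); a minimal sketch of the Shannon information in bits:

```python
import math

def shannon_information(p: float) -> float:
    """Shannon information h = -log2(p), in bits, of an outcome with probability p."""
    return -math.log2(p)

# An outcome with probability 1/8 carries 3 bits of information:
print(shannon_information(1 / 8))  # 3.0
```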
21. Entropy of a bent coin
- X = 1 means heads
- X = 0 means tails
- P(X = 1) = q
- P(X = 0) = 1 - q
22. L'Hôpital's rule
23. Entropy of a bent coin
- X = 1 means heads
- X = 0 means tails
- P(X = 1) = q
- P(X = 0) = 1 - q
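A small sketch of the bent-coin entropy H(q) = -q lg q - (1 - q) lg(1 - q); the q lg q term is taken to be 0 at q = 0 and q = 1, which is the limit L'Hôpital's rule (previous slide) justifies:

```python
import math

def bent_coin_entropy(q: float) -> float:
    """Entropy, in bits, of a coin with P(heads) = q."""
    if q in (0.0, 1.0):
        return 0.0  # lim p*log2(p) = 0 as p -> 0 (L'Hopital)
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

for q in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(q, round(bent_coin_entropy(q), 3))
# Entropy peaks at 1 bit for a fair coin (q = 0.5) and falls to 0 for a certain outcome.
```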
24. Information & binary codes
- Illustration of Shannon information in the
special case of a uniform distribution
25. Information & uniform distributions
- Consider an alphabet of N symbols (each of which appears equally frequently)
- A simple binary code needs lg(N) bits per symbol (rounded up)
- "Each symbol occurs equally frequently" means that the probability of any symbol is p = 1/N, and the Shannon information is h = -lg(p) = lg(N)
- This is a special case of a more general theorem: h(x) represents the number of bits needed to encode x
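A quick numeric check of the claim for a few assumed alphabet sizes:

```python
import math

for n in (2, 4, 26, 100):
    p = 1 / n                       # each symbol equally likely
    h = -math.log2(p)               # Shannon information = lg(N)
    bits = math.ceil(math.log2(n))  # simple fixed-length binary code
    print(n, h, bits)
# h equals lg(N) exactly; a simple binary code needs ceil(lg N) bits per symbol.
```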
26. Shannon entropy
S = ⟨h(x)⟩
Can also be interpreted as a performance measure
for an ideal data compression algorithm
27. Relative entropy
A measure of difference between probabilistic
models (not a true distance, despite the name)
Can also be interpreted as a measure of relative
efficiency between two ideal data compression
algorithms
28. DNA example
- Consider a sequence of DNA that is all As and Ts. The distribution is p(A) = p(T) = 1/2, p(C) = p(G) = 0
- Consider another sequence that is uniformly distributed: q(A) = q(C) = q(G) = q(T) = 1/4
- If I tell you a nucleotide is from the second sequence, you need two bits to encode it
- If I tell you the nucleotide is from the first sequence, you only need one bit to encode it
- If I say it's from the second sequence, but it's actually from the first, you've wasted a bit
- This is what's meant by the relative entropy: here D(p||q) = 2 - 1 = 1 bit
- It's not the same as D(q||p): if I tell you the nucleotide is from the first sequence but it's really from the second, you might not be able to encode it at all (technically the relative entropy is infinite)
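A minimal sketch that reproduces both directions of the example (the function name is just illustrative):

```python
import math

def relative_entropy(p: dict, q: dict) -> float:
    """D(p||q) = sum_x p(x) * log2(p(x)/q(x)), in bits.
    Infinite if q(x) = 0 somewhere p(x) > 0."""
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue             # 0 * log(0/q) = 0 by convention
        if q.get(x, 0) == 0:
            return math.inf      # p puts mass where q has none
        total += px * math.log2(px / q[x])
    return total

p = {"A": 0.5, "T": 0.5, "C": 0.0, "G": 0.0}      # first sequence: only A and T
q = {"A": 0.25, "T": 0.25, "C": 0.25, "G": 0.25}  # second sequence: uniform

print(relative_entropy(p, q))  # 1.0 bit: coding p with q's 2-bit code wastes 1 bit/symbol
print(relative_entropy(q, p))  # inf: p's code has no codewords for C or G
```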
29. Mutual information
Measure of increased data compression efficiency
obtainable by compressing two related texts
together
30. Example: DNA double helix
- Consider strand-symmetric DNA
- Assume a uniform distribution over nucleotides
- Pick a random nucleotide (x) and its opposite-strand partner (y)
- P(x) = 1/4, P(y) = 1/4, so P(x)P(y) = 1/16
- P(x,y) = 1/4 if x is the complement of y, 0 otherwise
- Mutual information is 2 bits
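A short sketch that computes the mutual information directly from the joint distribution described above:

```python
import math

bases = "ACGT"
complement = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Joint distribution over (x, y): 1/4 if y is the complement of x, else 0.
def p_xy(x: str, y: str) -> float:
    return 0.25 if complement[x] == y else 0.0

mi = 0.0
for x in bases:
    for y in bases:
        pxy = p_xy(x, y)
        if pxy > 0:
            mi += pxy * math.log2(pxy / (0.25 * 0.25))  # P(x) = P(y) = 1/4

print(mi)  # 2.0 bits: knowing one strand's base determines the other completely
```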
31. Message length
32. Binary codes
33. Prefix codes
34. Unique decodability
35. Kraft-McMillan Inequality
36. Ideal codes
An ideal code, C, for a probability distribution,
P, is one in which the codeword lengths in C
match the Shannon information contents in P.
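A small sketch of what this means in practice, using an assumed dyadic distribution and a hand-built prefix code (not a code from the slides):

```python
import math

# A dyadic distribution: every probability is a power of 1/2, so the Shannon
# information contents are whole numbers of bits and an ideal prefix code exists.
P = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = {"a": "0", "b": "10", "c": "110", "d": "111"}  # illustrative prefix code

for symbol, prob in P.items():
    h = -math.log2(prob)                 # Shannon information content
    print(symbol, h, len(code[symbol]))  # codeword length matches h exactly

# For an ideal code the expected codeword length equals the entropy.
mean_len = sum(P[s] * len(code[s]) for s in P)
entropy = -sum(p * math.log2(p) for p in P.values())
print(mean_len, entropy)  # both 1.75 bits
```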
37. Why ideal?
38. Relative entropy & codes
39. Mutual information & codes
I(x;y) = S(x) + S(y) - S(x,y)
40. Conditional entropy
Measures the information content of a variable,
x, when another variable, y, is known
S(x|y) = S(x,y) - S(y)
I(x;y) = S(x) + S(y) - S(x,y) = S(x) - S(x|y) = S(y) - S(y|x)
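A quick check of these identities on the double-helix example from slide 30; the entropies are filled in by hand rather than computed from the joint distribution, so this is only a sketch:

```python
# Strand-symmetric DNA: x and y uniform over {A,C,G,T}, y always the complement of x.
S_x = 2.0   # entropy of one strand's base: lg(4)
S_y = 2.0
S_xy = 2.0  # joint entropy: only 4 of the 16 pairs are possible, each with p = 1/4

S_x_given_y = S_xy - S_y          # 0 bits: y determines x
I_xy = S_x + S_y - S_xy           # 2 bits
assert I_xy == S_x - S_x_given_y  # same answer via the conditional-entropy form
print(S_x_given_y, I_xy)
```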
41. Mutual information: example 2
42. Combinatorics
43. Multinomials
44. Rates of events
45. Gaussian distribution
Density function
Cumulative distribution function
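A sketch of the standard Gaussian density and cumulative distribution functions, with mean mu and standard deviation sigma:

```python
import math

def gaussian_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def gaussian_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Cumulative distribution, written in terms of the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

print(gaussian_pdf(0.0))   # ~0.3989
print(gaussian_cdf(1.96))  # ~0.975
```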
46. Extreme Value Distribution
47. Bayesian Inference Examples
Several of these examples are taken from http://yudkowsky.net/bayes/bayes.html. Others are from David MacKay's book.
48. Bayes' Theorem (reminder)
49. X = disease, Y = symptom (or test result)
- 1% of women at age forty who participate in routine screening have breast cancer.
- 80% of women with breast cancer will get positive mammographies.
- 9.6% of women without breast cancer will also get positive mammographies.
- A woman in this age group had a positive mammography in a routine screening.
- What is the probability that she actually has breast cancer?
Scary fact: 85% of doctors get this wrong (Casscells, Schoenberger, and Graboys 1978; Eddy 1982; Gigerenzer and Hoffrage 1995)
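A short sketch of the calculation the slide is asking for, using the three numbers given above:

```python
# Bayes' theorem with the numbers from the slide.
p_cancer = 0.01              # P(cancer): 1% prevalence
p_pos_given_cancer = 0.80    # P(positive | cancer)
p_pos_given_healthy = 0.096  # P(positive | no cancer)

p_pos = (p_pos_given_cancer * p_cancer
         + p_pos_given_healthy * (1 - p_cancer))  # total probability of a positive
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos

print(round(p_cancer_given_pos, 3))  # ~0.078: only about 7.8% despite the positive test
```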
50. Alternative presentation
- 100 out of 10,000 women at age forty who participate in routine screening have breast cancer.
- 80 of every 100 women with breast cancer will get positive mammographies.
- 950 out of 9,900 women without breast cancer will also get positive mammographies.
- If 10,000 women in this age group undergo a routine screening, about what fraction of those with positive mammographies actually have breast cancer?
Equally scary fact: 54% of doctors still get it wrong
51. All relevant probabilities
52. Similar example
- A drug test is 99% accurate.
- Suppose 0.5% of people actually use the drug.
- What is the probability that a person who tests positive is actually a user?
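A sketch of the same Bayes calculation, assuming "99% accurate" means 99% sensitivity and 99% specificity (the slide does not spell this out):

```python
p_user = 0.005              # 0.5% of people use the drug
p_pos_given_user = 0.99     # assumed sensitivity
p_pos_given_nonuser = 0.01  # assumed 1 - specificity

p_pos = p_pos_given_user * p_user + p_pos_given_nonuser * (1 - p_user)
p_user_given_pos = p_pos_given_user * p_user / p_pos

print(round(p_user_given_pos, 3))  # ~0.332: most positives are false positives
```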
53. X = risk factor, Y = disease
- Medical researchers know that the probability of getting lung cancer if a person smokes is .34.
- The probability that a nonsmoker will get lung cancer is .03.
- It is also known that 11% of the population smokes.
- What is the probability that a person with lung cancer was a smoker?
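The same Bayes pattern again, with the rates given on this slide:

```python
p_smoker = 0.11
p_cancer_given_smoker = 0.34
p_cancer_given_nonsmoker = 0.03

p_cancer = (p_cancer_given_smoker * p_smoker
            + p_cancer_given_nonsmoker * (1 - p_smoker))
p_smoker_given_cancer = p_cancer_given_smoker * p_smoker / p_cancer

print(round(p_smoker_given_cancer, 3))  # ~0.583
```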
54. Sometimes the problem is stated less clearly
- Suppose you have a large barrel containing a number of plastic eggs.
- Some eggs contain pearls, the rest contain nothing.
- Some eggs are painted blue, the rest are painted red.
- Suppose that:
  - 40% of the eggs are painted blue
  - 5/13 of the eggs containing pearls are painted blue
  - 20% of the eggs are both empty and painted red
- What is the probability that an egg painted blue contains a pearl?
55. Pearls and eggs
- X is egg color (0 for blue, 1 for red)
- Y is the pearl (0 if absent, 1 if present)
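A hedged sketch of one way to untangle the statement: convert everything to joint probabilities, then apply the definition of conditional probability:

```python
from fractions import Fraction

# The three facts given on the slide.
p_blue = Fraction(2, 5)               # 40% of eggs are blue
p_blue_given_pearl = Fraction(5, 13)  # 5/13 of pearl eggs are blue
p_red_and_empty = Fraction(1, 5)      # 20% are both empty and red

p_red = 1 - p_blue                          # 3/5
p_red_and_pearl = p_red - p_red_and_empty   # 2/5: red eggs that do contain pearls
# P(red, pearl) = P(red | pearl) * P(pearl) = (1 - 5/13) * P(pearl)
p_pearl = p_red_and_pearl / (1 - p_blue_given_pearl)  # 13/20
p_blue_and_pearl = p_blue_given_pearl * p_pearl       # 1/4

p_pearl_given_blue = p_blue_and_pearl / p_blue
print(p_pearl_given_blue)  # 5/8 = 0.625
```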
56. The Monty Hall problem
- We are presented with three doors - red, green, and blue - one of which has a prize.
- We choose the red door, but this door is not opened (yet), according to the rules.
- The rules are that the presenter knows what door the prize is behind, and must open a door, but is not permitted to open the door we have picked or the door with the prize.
- The presenter opens the green door, revealing that there is no prize behind it, and subsequently asks if we wish to change our mind about our initial selection of red.
- What are the probabilities that the prize is behind (respectively) the blue and red doors?
- X = prize door, Y = presenter door
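A small simulation sketch that conditions on exactly the situation described (we picked red, the presenter opened green):

```python
import random

doors = ["red", "green", "blue"]
stay_wins = switch_wins = trials = 0

for _ in range(100_000):
    prize = random.choice(doors)
    # Presenter must open a door that is neither our pick (red) nor the prize door.
    openable = [d for d in doors if d != "red" and d != prize]
    opened = random.choice(openable)
    if opened != "green":
        continue  # condition on the observation we were actually given
    trials += 1
    stay_wins += (prize == "red")
    switch_wins += (prize == "blue")

print(stay_wins / trials, switch_wins / trials)  # ~1/3 for red, ~2/3 for blue
```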