Title: Probability: Overview, Definitions, Jargon & Blood Feuds
1. Probability: Overview, Definitions, Jargon & Blood Feuds
2. Probability vs. Statistics
- Statistics is a hard sell (on a good day)
- Most palatable approach that I know
- Concentrate on modeling rather than tests
- Bayesian vs Frequentist schools
- Emphasize connection to information theory
- Physical limits on information storage & transmission
- Where probability meets signals & systems engineering
3. Bayesians and Frequentists
- Believe it or not, statisticians fight
- Frequentists (old school)
- Emphasis on tests: t, χ², ANOVA
- Write down competing hypotheses, but only analyze the null hypothesis (!)
- Report significance (actually improbability)
- Bayesians (new school)
- Emphasis on modeling
- Build a model for all competing hypotheses
- Use probabilities to represent levels of belief
4. A word on notation
5. Distributions & densities
6. Discrete vs. continuous
Binomial
Gaussian
7. Cumulative distributions
- Density function
- Cumulative distribution function
8. More definitions
9. Normalization
Similarly for probability density functions, etc. (replace sums by integrals)
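A sketch of the standard normalization conditions, for a discrete distribution and for a density (sums replaced by integrals, as the slide notes):

```latex
\sum_x P(x) = 1
\qquad\text{and}\qquad
\int_{-\infty}^{\infty} p(x)\,dx = 1
```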
10. Independence
11. I.I.D.
12. Uniform
13. Let's get Bayesian
14. Example: Fall '05 admissions
P(C1)
P(A1 | C1)
P(A1, C1)
P(C1 | A1)
P(A1)
15. Example: Fall '05 admissions
P(C1)
P(A1 | C1) = 0.23 / 0.83
P(A1, C1)
P(C1 | A1) = 0.23 / 0.26
P(A1)
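A minimal numeric sketch of how these quantities fit together through Bayes' theorem, assuming (reading the ratios above) that P(C1) = 0.83, P(A1) = 0.26, and P(A1, C1) = 0.23; the variable names are just placeholders:

```python
# Hedged sketch: the three base numbers are read off the slide's ratios
# (0.23/0.83 and 0.23/0.26); the interpretation of C1 and A1 is assumed.
p_c1 = 0.83         # P(C1): marginal probability of C1
p_a1 = 0.26         # P(A1): marginal probability of A1
p_a1_and_c1 = 0.23  # P(A1, C1): joint probability

p_a1_given_c1 = p_a1_and_c1 / p_c1  # P(A1 | C1) = P(A1, C1) / P(C1) ~ 0.277
p_c1_given_a1 = p_a1_and_c1 / p_a1  # P(C1 | A1) = P(A1, C1) / P(A1) ~ 0.885

# Bayes' theorem ties the two conditionals together:
# P(C1 | A1) = P(A1 | C1) * P(C1) / P(A1)
assert abs(p_c1_given_a1 - p_a1_given_c1 * p_c1 / p_a1) < 1e-12

print(p_a1_given_c1, p_c1_given_a1)
```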
16. Bayesian inference
- Probabilities & frequencies are essentially alternative ways of looking at the same thing
- However... frequencies are sometimes more intuitive
- We will return to more examples of Bayes' Theorem and inference
17. Experimental error
18. Experimental error (cont.)
19. Approximate errors
20. Shannon information
Can also be interpreted as the number of bits that an ideal data compression algorithm needs to encode message x
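Slide 25 gives the defining formula, h = -lg(p); a minimal sketch of the Shannon information in bits:

```python
import math

def shannon_information(p: float) -> float:
    """Shannon information h = -log2(p), in bits, of an outcome with probability p."""
    return -math.log2(p)

# An outcome with probability 1/8 carries 3 bits of information:
print(shannon_information(1 / 8))  # 3.0
```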
21. Entropy of a bent coin
- X = 1 means heads
- X = 0 means tails
- P(X = 1) = q
- P(X = 0) = 1 - q
22. L'Hôpital's rule
23. Entropy of a bent coin
- X = 1 means heads
- X = 0 means tails
- P(X = 1) = q
- P(X = 0) = 1 - q
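A small sketch of the bent-coin entropy H(q) = -q lg q - (1 - q) lg(1 - q); the q lg q term is taken to be 0 at q = 0 and q = 1, which is the limit L'Hôpital's rule (previous slide) justifies:

```python
import math

def bent_coin_entropy(q: float) -> float:
    """Entropy, in bits, of a coin with P(heads) = q."""
    if q in (0.0, 1.0):
        return 0.0  # lim p*log2(p) = 0 as p -> 0 (L'Hopital)
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

for q in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(q, round(bent_coin_entropy(q), 3))
# Entropy peaks at 1 bit for a fair coin (q = 0.5) and falls to 0 for a certain outcome.
```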
24. Information & binary codes
- Illustration of Shannon information in the
special case of a uniform distribution
25. Information & uniform distributions
- Consider an alphabet of N symbols (each of which appears equally frequently)
- A simple binary code needs lg(N) bits per symbol (rounded up)
- "Each symbol occurs equally frequently" means that the probability of any symbol is p = 1/N, and the Shannon information is h = -lg(p) = lg(N)
- This is a special case of a more general theorem: h(x) represents the number of bits needed to encode x
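A quick numeric check of the claim for a few assumed alphabet sizes:

```python
import math

for n in (2, 4, 26, 100):
    p = 1 / n                       # each symbol equally likely
    h = -math.log2(p)               # Shannon information = lg(N)
    bits = math.ceil(math.log2(n))  # simple fixed-length binary code
    print(n, h, bits)
# h equals lg(N) exactly; a simple binary code needs ceil(lg N) bits per symbol.
```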
26. Shannon entropy
S = ⟨h(x)⟩
Can also be interpreted as a performance measure
for an ideal data compression algorithm
27. Relative entropy
A measure of difference between probabilistic
models (not a true distance, despite the name)
Can also be interpreted as a measure of relative
efficiency between two ideal data compression
algorithms
28. DNA example
- Consider a sequence of DNA that is all As and Ts. The distribution is p(A) = p(T) = 1/2, p(C) = p(G) = 0
- Consider another sequence that is uniformly distributed: q(A) = q(C) = q(G) = q(T) = 1/4
- If I tell you a nucleotide is from the second sequence, you need two bits to encode it
- If I tell you the nucleotide is from the first sequence, you only need one bit to encode it
- If I say it's from the second sequence, but it's actually from the first, you've wasted a bit
- This is what's meant by the relative entropy: here D(p||q) = 2 - 1 = 1 bit
- It's not the same as D(q||p): if I tell you the nucleotide is from the first sequence but it's really from the second, you might not be able to encode it at all (technically the relative entropy is infinite)
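A minimal sketch that reproduces both directions of the example (the function name is just illustrative):

```python
import math

def relative_entropy(p: dict, q: dict) -> float:
    """D(p||q) = sum_x p(x) * log2(p(x)/q(x)), in bits.
    Infinite if q(x) = 0 somewhere p(x) > 0."""
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue             # 0 * log(0/q) = 0 by convention
        if q.get(x, 0) == 0:
            return math.inf      # p puts mass where q has none
        total += px * math.log2(px / q[x])
    return total

p = {"A": 0.5, "T": 0.5, "C": 0.0, "G": 0.0}      # first sequence: only A and T
q = {"A": 0.25, "T": 0.25, "C": 0.25, "G": 0.25}  # second sequence: uniform

print(relative_entropy(p, q))  # 1.0 bit: coding p with q's 2-bit code wastes 1 bit/symbol
print(relative_entropy(q, p))  # inf: p's code has no codewords for C or G
```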
29. Mutual information
Measure of increased data compression efficiency
obtainable by compressing two related texts
together
30. Example: DNA double helix
- Consider strand-symmetric DNA
- Assume a uniform distribution over nucleotides
- Pick a random nucleotide (x) and its opposite-strand partner (y)
- P(x) = 1/4, P(y) = 1/4, so P(x)P(y) = 1/16
- P(x,y) = 1/4 if x is the complement of y, 0 otherwise
- Mutual information is 2 bits
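A short sketch that computes the mutual information directly from the joint distribution described above:

```python
import math

bases = "ACGT"
complement = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Joint distribution over (x, y): 1/4 if y is the complement of x, else 0.
def p_xy(x: str, y: str) -> float:
    return 0.25 if complement[x] == y else 0.0

mi = 0.0
for x in bases:
    for y in bases:
        pxy = p_xy(x, y)
        if pxy > 0:
            mi += pxy * math.log2(pxy / (0.25 * 0.25))  # P(x) = P(y) = 1/4

print(mi)  # 2.0 bits: knowing one strand's base determines the other completely
```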
31. Message length
32. Binary codes
33. Prefix codes
34. Unique decodability
35. Kraft-McMillan Inequality
36. Ideal codes
An ideal code, C, for a probability distribution,
P, is one in which the codeword lengths in C
match the Shannon information contents in P.
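A small sketch of what this means in practice, using an assumed dyadic distribution and a hand-built prefix code (not a code from the slides):

```python
import math

# A dyadic distribution: every probability is a power of 1/2, so the Shannon
# information contents are whole numbers of bits and an ideal prefix code exists.
P = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = {"a": "0", "b": "10", "c": "110", "d": "111"}  # illustrative prefix code

for symbol, prob in P.items():
    h = -math.log2(prob)                 # Shannon information content
    print(symbol, h, len(code[symbol]))  # codeword length matches h exactly

# For an ideal code the expected codeword length equals the entropy.
mean_len = sum(P[s] * len(code[s]) for s in P)
entropy = -sum(p * math.log2(p) for p in P.values())
print(mean_len, entropy)  # both 1.75 bits
```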
37. Why ideal?
38. Relative entropy & codes
39. Mutual information & codes
I(x;y) = S(x) + S(y) - S(x,y)
40. Conditional entropy
Measures the information content of a variable,
x, when another variable, y, is known
S(x|y) = S(x,y) - S(y)
I(x;y) = S(x) + S(y) - S(x,y) = S(x) - S(x|y) = S(y) - S(y|x)
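A quick check of these identities on the double-helix example from slide 30; the entropies are filled in by hand rather than computed from the joint distribution, so this is only a sketch:

```python
# Strand-symmetric DNA: x and y uniform over {A,C,G,T}, y always the complement of x.
S_x = 2.0   # entropy of one strand's base: lg(4)
S_y = 2.0
S_xy = 2.0  # joint entropy: only 4 of the 16 pairs are possible, each with p = 1/4

S_x_given_y = S_xy - S_y          # 0 bits: y determines x
I_xy = S_x + S_y - S_xy           # 2 bits
assert I_xy == S_x - S_x_given_y  # same answer via the conditional-entropy form
print(S_x_given_y, I_xy)
```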
41. Mutual information: example 2
42. Combinatorics
43. Multinomials
44. Rates of events
45. Gaussian distribution
Density function
Cumulative distribution function
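A sketch of the standard Gaussian density and cumulative distribution functions, with mean mu and standard deviation sigma:

```python
import math

def gaussian_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def gaussian_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Cumulative distribution, written in terms of the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

print(gaussian_pdf(0.0))   # ~0.3989
print(gaussian_cdf(1.96))  # ~0.975
```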
46. Extreme Value Distribution
47. Bayesian Inference Examples
Several of these examples are taken from http://yudkowsky.net/bayes/bayes.html. Others are from David MacKay's book.
48. Bayes' Theorem (reminder)
49. X = disease, Y = symptom (or test result)
- 1% of women at age forty who participate in routine screening have breast cancer.
- 80% of women with breast cancer will get positive mammographies.
- 9.6% of women without breast cancer will also get positive mammographies.
- A woman in this age group had a positive mammography in a routine screening.
- What is the probability that she actually has breast cancer?
Scary fact: 85% of doctors get this wrong (Casscells, Schoenberger, and Graboys 1978; Eddy 1982; Gigerenzer and Hoffrage 1995)
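A short sketch of the calculation the slide is asking for, using the three numbers given above:

```python
# Bayes' theorem with the numbers from the slide.
p_cancer = 0.01              # P(cancer): 1% prevalence
p_pos_given_cancer = 0.80    # P(positive | cancer)
p_pos_given_healthy = 0.096  # P(positive | no cancer)

p_pos = (p_pos_given_cancer * p_cancer
         + p_pos_given_healthy * (1 - p_cancer))  # total probability of a positive
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos

print(round(p_cancer_given_pos, 3))  # ~0.078: only about 7.8% despite the positive test
```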
50. Alternative presentation
- 100 out of 10,000 women at age forty who participate in routine screening have breast cancer.
- 80 of every 100 women with breast cancer will get positive mammographies.
- 950 out of 9,900 women without breast cancer will also get positive mammographies.
- If 10,000 women in this age group undergo a routine screening, about what fraction of those with positive mammographies actually have breast cancer?
Equally scary fact: 54% of doctors still get it wrong
51. All relevant probabilities
52. Similar example
- A drug test is 99% accurate.
- Suppose 0.5% of people actually use the drug.
- What is the probability that a person who tests positive is actually a user?
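A sketch of the same Bayes calculation, assuming "99% accurate" means 99% sensitivity and 99% specificity (the slide does not spell this out):

```python
p_user = 0.005              # 0.5% of people use the drug
p_pos_given_user = 0.99     # assumed sensitivity
p_pos_given_nonuser = 0.01  # assumed 1 - specificity

p_pos = p_pos_given_user * p_user + p_pos_given_nonuser * (1 - p_user)
p_user_given_pos = p_pos_given_user * p_user / p_pos

print(round(p_user_given_pos, 3))  # ~0.332: most positives are false positives
```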
53. X = risk factor, Y = disease
- Medical researchers know that the probability of getting lung cancer if a person smokes is .34.
- The probability that a nonsmoker will get lung cancer is .03.
- It is also known that 11% of the population smokes.
- What is the probability that a person with lung cancer was a smoker?
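The same Bayes pattern again, with the rates given on this slide:

```python
p_smoker = 0.11
p_cancer_given_smoker = 0.34
p_cancer_given_nonsmoker = 0.03

p_cancer = (p_cancer_given_smoker * p_smoker
            + p_cancer_given_nonsmoker * (1 - p_smoker))
p_smoker_given_cancer = p_cancer_given_smoker * p_smoker / p_cancer

print(round(p_smoker_given_cancer, 3))  # ~0.583
```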
54. Sometimes the problem is stated less clearly
- Suppose you have a large barrel containing a number of plastic eggs.
- Some eggs contain pearls, the rest contain nothing.
- Some eggs are painted blue, the rest are painted red.
- Suppose that:
  - 40% of the eggs are painted blue
  - 5/13 of the eggs containing pearls are painted blue
  - 20% of the eggs are both empty and painted red
- What is the probability that an egg painted blue contains a pearl?
55. Pearls and eggs
- X is egg color (0 for blue, 1 for red)
- Y is the pearl (0 if absent, 1 if present)
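A hedged sketch of one way to untangle the statement: convert everything to joint probabilities, then apply the definition of conditional probability:

```python
from fractions import Fraction

# The three facts given on the slide.
p_blue = Fraction(2, 5)               # 40% of eggs are blue
p_blue_given_pearl = Fraction(5, 13)  # 5/13 of pearl eggs are blue
p_red_and_empty = Fraction(1, 5)      # 20% are both empty and red

p_red = 1 - p_blue                          # 3/5
p_red_and_pearl = p_red - p_red_and_empty   # 2/5: red eggs that do contain pearls
# P(red, pearl) = P(red | pearl) * P(pearl) = (1 - 5/13) * P(pearl)
p_pearl = p_red_and_pearl / (1 - p_blue_given_pearl)  # 13/20
p_blue_and_pearl = p_blue_given_pearl * p_pearl       # 1/4

p_pearl_given_blue = p_blue_and_pearl / p_blue
print(p_pearl_given_blue)  # 5/8 = 0.625
```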
56. The Monty Hall problem
- We are presented with three doors - red, green, and blue - one of which has a prize.
- We choose the red door, but this door is not opened (yet), according to the rules.
- The rules are that the presenter knows what door the prize is behind, and must open a door, but is not permitted to open the door we have picked or the door with the prize.
- The presenter opens the green door, revealing that there is no prize behind it, and subsequently asks if we wish to change our mind about our initial selection of red.
- What are the probabilities that the prize is behind (respectively) the blue and red doors?
- X = prize door, Y = presenter door
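A small simulation sketch that conditions on exactly the situation described (we picked red, the presenter opened green):

```python
import random

doors = ["red", "green", "blue"]
stay_wins = switch_wins = trials = 0

for _ in range(100_000):
    prize = random.choice(doors)
    # Presenter must open a door that is neither our pick (red) nor the prize door.
    openable = [d for d in doors if d != "red" and d != prize]
    opened = random.choice(openable)
    if opened != "green":
        continue  # condition on the observation we were actually given
    trials += 1
    stay_wins += (prize == "red")
    switch_wins += (prize == "blue")

print(stay_wins / trials, switch_wins / trials)  # ~1/3 for red, ~2/3 for blue
```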