Title: Probability

Slide 1: Lecture 2
- Probability
- and what it has to do with data analysis
Slide 2: Abstraction
- Random variable, x
- it has no set value until you realize it
- its properties are described by a probability distribution, P
Slide 3: One way to think about it
A pot of an infinite number of x's, distributed according to p(x).
[Figure: a pot of x's, with distribution p(x)]
Drawing one x from the pot realizes x.
Slide 4: Describing P
- If x can take on only discrete values, say (1, 2, 3, 4, or 5), then a table would work:

x  1    2    3    4    5
P  10%  30%  40%  15%  5%

(e.g., a 40% probability that x = 3, a 15% probability that x = 4)

Probabilities should sum to 100%.
Slide 5:
- Sometimes you see probabilities written as fractions instead of percentages.

Probabilities should sum to 1:

x  1     2     3     4     5
P  0.10  0.30  0.40  0.15  0.05

(a 0.15 probability that x = 4)

And sometimes you see probabilities plotted as a histogram.
[Figure: histogram of P(x) versus x for x = 1, ..., 5, y-axis from 0.0 to 0.5; the bar at x = 4 has height 0.15]
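The "pot" picture from an earlier slide can be sketched in code: draw many realizations from the table above and check that the empirical frequencies approach P(x). This is an illustrative sketch, not part of the lecture.

```python
# Sample many realizations of x from the discrete table and compare
# empirical frequencies to the tabulated probabilities P(x).
import random

random.seed(0)
values = [1, 2, 3, 4, 5]
probs = [0.10, 0.30, 0.40, 0.15, 0.05]
assert abs(sum(probs) - 1.0) < 1e-12  # probabilities sum to 1

draws = random.choices(values, weights=probs, k=100_000)
freq = {v: draws.count(v) / len(draws) for v in values}
for v, p in zip(values, probs):
    print(v, round(freq[v], 3), p)  # empirical frequency is close to P(x)
```

With 100,000 draws the empirical frequencies match the table to within about a percent.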
Slide 6:
- If x can take on any value, then use a smooth function (or distribution), p(x), instead of a table.
[Figure: p(x) versus x, with the area under the curve between x1 and x2 shaded]
The probability that x is between x1 and x2 is proportional to this area; mathematically,

P(x1 < x < x2) = ∫_{x1}^{x2} p(x) dx
Slide 7:
[Figure: p(x) versus x, with the entire area under the curve shaded]
The probability that x is between -∞ and +∞ is 100%, so the total area is 1. Mathematically,

∫_{-∞}^{+∞} p(x) dx = 1
Slide 8: One reason why all this is relevant
- Any measurement of data that contains noise is treated as a random variable, d - and
Slide 9:
- The distribution p(d) embodies both the true value of the datum being measured and the measurement noise - and
Slide 10:
- All quantities derived from a random variable are themselves random variables - so

Slide 11:
- The algebra of random variables allows you to understand how measurement noise affects inferences made from the data
Slide 12: Basic description of distributions
We want two basic numbers:
1) something that describes what x's commonly occur
2) something that describes the variability of the x's
Slide 13:
1) something that describes what x's commonly occur - that is, where the distribution is centered
Slide 14: Mode
The x at which the distribution has its peak; the most-likely value of x.
[Figure: p(x) versus x, with the peak marked at x_mode]
Slide 15:
- The most popular car in the US is the Honda CR-V
- But the next car you see on the highway will probably not be a Honda CR-V
[Figure: highway photo - "Where's a CR-V?"]
Slide 16: But modes can be deceptive
100 realizations of x, binned by range:

range  0-1  1-2  2-3  3-4  4-5  5-6  6-7  7-8  8-9  9-10
count   3   18   11    8   11   14    8    7   11    9

Sure, the 1-2 range has the most counts, but most of the measurements are bigger than 2!
[Figure: p(x) versus x from 0 to 10, with the peak at x_mode in the 1-2 range]
Slide 17: Median
50% chance that x is smaller than x_median; 50% chance that x is bigger than x_median.
No special reason the median needs to coincide with the peak.
[Figure: p(x) versus x, with the area split into two 50% halves at x_median]
Slide 18: Expected value (or mean)
The value you would get if you took the mean of lots of realizations of x.
Let's examine a discrete distribution, for simplicity ...
[Figure: bar chart of P(x) versus x for x = 1, 2, 3]
Slide 19: Hypothetical table of 140 realizations of x
Suppose x = 1 occurs 20 times, x = 2 occurs 80 times, and x = 3 occurs 40 times. Then

mean = [20×1 + 80×2 + 40×3] / 140
     = (20/140)×1 + (80/140)×2 + (40/140)×3
     = P(1)×1 + P(2)×2 + P(3)×3
     = Σ_i P(x_i) x_i
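The identity above can be checked numerically: the mean computed directly from the 140 realizations equals the probability-weighted sum Σ_i P(x_i) x_i.

```python
# Check the slide's identity using its counts: 20 ones, 80 twos, 40 threes.
counts = {1: 20, 2: 80, 3: 40}
n = sum(counts.values())  # 140 realizations

# Mean computed directly from the realizations
mean_direct = sum(x * c for x, c in counts.items()) / n

# Mean computed as sum_i P(x_i) * x_i
p = {x: c / n for x, c in counts.items()}
mean_weighted = sum(p[x] * x for x in p)

print(round(mean_direct, 6), round(mean_weighted, 6))  # both 2.142857
assert abs(mean_direct - mean_weighted) < 1e-12
```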
Slide 20: By analogy, for a smooth distribution
- Expected (or mean) value of x:

E(x) = ∫_{-∞}^{+∞} x p(x) dx
Slide 21:
2) something that describes the variability of the x's - that is, the width of the distribution
Slide 22: Here's a perfectly sensible way to define the width of a distribution
[Figure: p(x) versus x; the central interval of width W50 contains 50% of the probability, with 25% in each tail]
It's not used much, though.
Slide 23: Width of a distribution - here's another way
Take the parabola (x - E(x))², then multiply by p(x) and integrate.
[Figure: the parabola (x - E(x))² and p(x) plotted versus x, both centered on E(x)]

Slide 24:
The idea is that if the distribution is narrow, then most of the probability lines up with the low spot of the parabola. But if it is wide, then some of the probability lines up with the high parts of the parabola.
[Figure: (x - E(x))² p(x) versus x - compute this total area]

Variance: σ² = ∫_{-∞}^{+∞} (x - E(x))² p(x) dx
Slide 25: σ = √variance, a measure of width
[Figure: p(x) versus x, with an interval of width σ marked around E(x)]
We don't immediately know its relationship to area, though.
Slide 26: The Gaussian (or normal) distribution

p(x) = [1 / (√(2π) σ)] exp( -(x - x̄)² / (2σ²) )

- σ² is the variance
- x̄ is the expected value
Memorize me!
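A quick numerical sanity check of this formula: the pdf integrates to 1, and (previewing a later slide) about 95% of the probability lies within 2σ of the mean. This sketch uses simple trapezoid sums rather than any special library.

```python
# Evaluate the normal pdf and integrate it numerically by the trapezoid rule.
import math

def normal_pdf(x, mean, sigma):
    return math.exp(-(x - mean)**2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

def integrate(f, a, b, n=100_000):
    h = (b - a) / n
    return h * (f(a) / 2 + f(b) / 2 + sum(f(a + i * h) for i in range(1, n)))

mean, sigma = 3.0, 0.5   # the second example on the next slide
total = integrate(lambda x: normal_pdf(x, mean, sigma), mean - 10 * sigma, mean + 10 * sigma)
within_2sigma = integrate(lambda x: normal_pdf(x, mean, sigma), mean - 2 * sigma, mean + 2 * sigma)
print(round(total, 4), round(within_2sigma, 4))  # ~1.0 and ~0.9545
```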
Slide 27: Examples of normal distributions
[Figure: p(x) versus x for x̄ = 1, σ = 1, and for x̄ = 3, σ = 0.5]
Slide 28: Properties of the normal distribution
- Expectation = Median = Mode = x̄
- 95% of the probability lies within 2σ of the expected value
[Figure: p(x) versus x, with the central 95% of the area shaded]
Slide 29: Again, why all this is relevant
- Inference depends on data
- You use a measurement, d, to deduce the value of some underlying parameter of interest, m
- e.g., use measurements of travel time, d, to deduce the seismic velocity, m, of the earth
Slide 30:
- The model parameter, m, depends on the measurement, d
- so m is a function of d: m(d)
- so ...
Slide 31:
- If the data, d, is a random variable
- then so is the model parameter, m
- All inferences made from uncertain data are themselves uncertain
- Model parameters are described by a distribution, p(m)
Slide 32: Functions of a random variable
Any function of a random variable is itself a random variable.
Slide 33: Special case of a linear relationship and a normal distribution
- Normal p(d) with mean d̄ and variance σ²_d
- Linear relationship: m = a d + b
- Then p(m) is normal with mean a d̄ + b and variance a² σ²_d
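This rule can be verified by Monte Carlo: draw many realizations of d, map each through m = a d + b, and compare the sample mean and variance of m to the predicted values. The numbers (d̄ = 5, σ_d = 2, a = 3, b = 1) are made up for illustration.

```python
# Monte Carlo check: if d ~ Normal(dbar, sigma_d^2) and m = a*d + b,
# then m has mean a*dbar + b and variance a^2 * sigma_d^2.
import random

random.seed(1)
dbar, sigma_d = 5.0, 2.0
a, b = 3.0, 1.0

m = [a * random.gauss(dbar, sigma_d) + b for _ in range(200_000)]
mean_m = sum(m) / len(m)
var_m = sum((v - mean_m)**2 for v in m) / len(m)

print(round(mean_m, 2), round(var_m, 1))  # ~ a*dbar + b = 16, a^2*sigma_d^2 = 36
```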
Slide 34: Multivariate distributions
Slide 35: Example
- Liberty Island is inhabited by both pigeons and seagulls
- 40% of the birds are pigeons, and 60% of the birds are gulls
- 50% of pigeons are white and 50% are grey
- 100% of gulls are white
Slide 36: Two variables
- species, s, takes two values: pigeon (p) and gull (g)
- color, c, takes two values: white (w) and tan (t)

Of 100 birds: 20 are white pigeons, 20 are grey pigeons, 60 are white gulls, 0 are grey gulls.
Slide 37: What is the probability that a bird (a random bird, that is) has species s and color c?

         c = w   c = t
s = p     20%     20%
s = g     60%      0%

Note: the sum of all boxes is 100%.
Slide 38: This is called the Joint Probability, and is written P(s,c)
Slide 39: Two continuous variables, say x1 and x2, have a joint probability distribution, written p(x1, x2), with

∫∫ p(x1, x2) dx1 dx2 = 1
Slide 40: You would contour a joint probability distribution
[Figure: contour plot of p(x1, x2) in the (x1, x2) plane]
Slide 41: What is the probability that a bird has color c?
Of 100 birds: 20 are white pigeons, 20 are grey pigeons, 60 are white gulls, 0 are grey gulls.

Start with P(s,c):

         c = w   c = t
s = p     20%     20%
s = g     60%      0%

and sum the columns to get P(c):

P(w) = 80%,  P(t) = 20%
Slide 42: What is the probability that a bird has species s?
Of 100 birds: 20 are white pigeons, 20 are grey pigeons, 60 are white gulls, 0 are grey gulls.

Start with P(s,c):

         c = w   c = t
s = p     20%     20%
s = g     60%      0%

and sum the rows to get P(s):

P(p) = 40%,  P(g) = 60%
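The two marginalizations on these slides can be sketched in a few lines: summing the columns of the joint table P(s,c) gives P(c), and summing the rows gives P(s).

```python
# Marginalize the joint table P(s,c) from the bird example (as fractions).
joint = {
    ("p", "w"): 0.20, ("p", "t"): 0.20,
    ("g", "w"): 0.60, ("g", "t"): 0.00,
}
species = ["p", "g"]
colors = ["w", "t"]

P_c = {c: sum(joint[(s, c)] for s in species) for c in colors}  # sum columns
P_s = {s: sum(joint[(s, c)] for c in colors) for s in species}  # sum rows

print({c: round(v, 3) for c, v in P_c.items()})  # {'w': 0.8, 't': 0.2}
print({s: round(v, 3) for s, v in P_s.items()})  # {'p': 0.4, 'g': 0.6}
```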
Slide 43: These operations make sense with distributions, too
[Figure: contour plot of p(x1, x2), with its two marginal curves p(x1) and p(x2)]

p(x1) = ∫ p(x1, x2) dx2   (distribution of x1, irrespective of x2)
p(x2) = ∫ p(x1, x2) dx1   (distribution of x2, irrespective of x1)
Slide 44: Given that a bird is species s, what is the probability that it has color c?
Of 100 birds: 20 are white pigeons, 20 are grey pigeons, 60 are white gulls, 0 are grey gulls.

         c = w   c = t
s = p     50%     50%
s = g    100%      0%

Note: all rows sum to 100%.
Slide 45: This is called the Conditional Probability of c given s, and is written P(c|s). Similarly ...
Slide 46: Given that a bird is color c, what is the probability that it has species s?
Of 100 birds: 20 are white pigeons, 20 are grey pigeons, 60 are white gulls, 0 are grey gulls. So 25% of white birds are pigeons.

         c = w   c = t
s = p     25%    100%
s = g     75%      0%

Note: all columns sum to 100%.
Slide 47: This is called the Conditional Probability of s given c, and is written P(s|c)
Slide 48: Beware! P(c|s) ≠ P(s|c)

P(c|s):
         c = w   c = t
s = p     50%     50%
s = g    100%      0%

P(s|c):
         c = w   c = t
s = p     25%    100%
s = g     75%      0%
Slide 49: Actor Patrick Swayze, pancreatic cancer victim
A lot of errors occur from confusing the two.
Probability that, if you have pancreatic cancer, you will die from it: 90%
Probability that, if you die, you will have died of pancreatic cancer: 1.4%
Slide 50: Note
25% of 80% is 20%: P(s=p|c=w) × P(c=w) = P(s=p, c=w)

Slide 51: and
50% of 40% is 20%: P(c=w|s=p) × P(s=p) = P(s=p, c=w)
Slide 52: and if

P(s,c) = P(s|c) P(c) = P(c|s) P(s)

then

P(s|c) = P(c|s) P(s) / P(c)   and   P(c|s) = P(s|c) P(c) / P(s)

which is called Bayes' Theorem.
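Bayes' Theorem can be exercised on the bird example: starting from P(c|s) and P(s), recover P(s|c). Given a white bird, the probability it is a pigeon should come out to 25%, matching the earlier table.

```python
# Bayes' Theorem on the bird example: P(s|c) = P(c|s) P(s) / P(c).
P_s = {"p": 0.40, "g": 0.60}                     # species probabilities
P_c_given_s = {("w", "p"): 0.50, ("t", "p"): 0.50,
               ("w", "g"): 1.00, ("t", "g"): 0.00}

# P(c) by total probability: P(c) = sum_s P(c|s) P(s)
P_c = {c: sum(P_c_given_s[(c, s)] * P_s[s] for s in P_s) for c in ["w", "t"]}

def bayes(s, c):
    return P_c_given_s[(c, s)] * P_s[s] / P_c[c]

print(round(bayes("p", "w"), 3))  # 0.25: 25% of white birds are pigeons
```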
Slide 53: In this example
- Bird color is the observable, the "data"; bird species is the "model parameter"
- P(c|s), color given species, or P(d|m), is making a prediction based on the model: given a pigeon, what is the probability that it's grey?
- P(s|c), species given color, or P(m|d), is making an inference from the data: given a grey bird, what is the probability that it's a pigeon?
Slide 54: Why Bayes' Theorem is important
It provides a framework for relating making a prediction from the model, P(d|m), to making an inference from the data, P(m|d).
Slide 55: Bayes' Theorem also implies that the joint distribution of data and model parameters, p(d, m), is the fundamental quantity. If you know p(d, m), you know everything there is to know.
Slide 56:
- Expectation
- Variance
- and
- Covariance
- of a multivariate distribution
Slide 57: The expectation is computed by first reducing the distribution to one dimension
[Figure: contour plot of p(x1, x2) with its marginals; take the expectation of p(x1) to get x̄1, and the expectation of p(x2) to get x̄2]
Slide 58: The variance is also computed by first reducing the distribution to one dimension
[Figure: contour plot of p(x1, x2) with its marginals; take the variance of p(x1) to get σ1², and the variance of p(x2) to get σ2²]
Slide 59: Note that in this distribution, if x1 is bigger than x̄1, then x2 tends to be bigger than x̄2; and if x1 is smaller than x̄1, then x2 tends to be smaller than x̄2.
This is a positive correlation.
[Figure: elongated contours of p(x1, x2) tilted upward, with the expected value (x̄1, x̄2) marked]
Slide 60: Conversely, in this distribution, if x1 is bigger than x̄1, then x2 tends to be smaller than x̄2; and if x1 is smaller than x̄1, then x2 tends to be bigger than x̄2.
This is a negative correlation.
[Figure: elongated contours of p(x1, x2) tilted downward, with the expected value (x̄1, x̄2) marked]
Slide 61: This correlation can be quantified by multiplying the distribution by a four-quadrant function
[Figure: the (x1, x2) plane divided into four quadrants about (x̄1, x̄2), alternating in sign]
and then integrating. The function (x1 - x̄1)(x2 - x̄2) works fine:

C = ∫∫ (x1 - x̄1)(x2 - x̄2) p(x1, x2) dx1 dx2

This is called the covariance.
Slide 62: Note that the matrix C with elements

C_ij = ∫∫ (x_i - x̄_i)(x_j - x̄_j) p(x_i, x_j) dx_i dx_j

has diagonal elements σ_xi², the variance of x_i, and off-diagonal elements cov(x_i, x_j), the covariance of x_i and x_j:

      [ σ1²         cov(x1,x2)  cov(x1,x3) ]
C  =  [ cov(x1,x2)  σ2²         cov(x2,x3) ]
      [ cov(x1,x3)  cov(x2,x3)  σ3²        ]
Slide 63: The vector of means of a multivariate distribution, x̄, and the covariance matrix of a multivariate distribution, Cx, summarize a lot - but not everything - about a multivariate distribution.
Slide 64: Functions of a set of random variables
A set of N random variables collected in a vector, x.
Slide 65: Special case
- Linear function: y = Mx
- The expectation of y is

ȳ = M x̄

Memorize!
Slide 66: So

Cy = M Cx M^T

Memorize!
Slide 67: Note that these rules work regardless of the distribution of x.
If y is linearly related to x, y = Mx, then

ȳ = M x̄         (rule for means)
Cy = M Cx M^T    (rule for propagating error)

Memorize!
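Both rules can be verified by Monte Carlo for a simple special case: a 2-vector x with independent normal components (so Cx is diagonal) mapped through y = Mx. The particular numbers (x̄ = [1, 2], σ = [0.5, 1.5], and M below) are made up for illustration.

```python
# Monte Carlo check of ybar = M xbar and Cy = M Cx M^T for y = Mx.
import random

random.seed(2)
xbar = [1.0, 2.0]
sigma = [0.5, 1.5]        # independent components, so Cx = diag(0.25, 2.25)
M = [[2.0, 1.0],
     [0.0, 3.0]]

def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(2)) for i in range(2)]

# Draw realizations of x and map each through y = Mx
ys = []
for _ in range(200_000):
    x = [random.gauss(xbar[i], sigma[i]) for i in range(2)]
    ys.append(matvec(M, x))

ybar = [sum(y[i] for y in ys) / len(ys) for i in range(2)]
Cy = [[sum((y[i] - ybar[i]) * (y[j] - ybar[j]) for y in ys) / len(ys)
       for j in range(2)] for i in range(2)]

print([round(v, 2) for v in ybar])               # ~ M xbar = [4.0, 6.0]
print([[round(c, 2) for c in row] for row in Cy])
# ~ M Cx M^T = [[3.25, 6.75], [6.75, 20.25]]
```

Working out M Cx M^T by hand gives the same numbers: for example Var(y1) = Var(2 x1 + x2) = 4(0.25) + 2.25 = 3.25.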