Hypothesis Testing - PowerPoint PPT Presentation

About This Presentation

Title:

Hypothesis Testing

Slides: 61

Provided by: billm7

Learn more at: https://www.ldeo.columbia.edu

Transcript and Presenter's Notes

1
Lecture 10

Hypothesis Testing

2
From a previous lecture: Functions of a Random Variable
Any function of a random variable is itself a random variable.
3
If x has distribution p(x), then y(x) has distribution
$p(y) = p_x[x(y)] \, |dx/dy|$
4
example

Let x have a uniform (white) distribution on the interval [0,1].

[Figure: p(x) versus x, with p(x) = 1 for 0 ≤ x ≤ 1. Uniform probability that x is anywhere between 0 and 1.]
5

Let $y = x^2$, then $x = y^{1/2}$,
$p_x[x(y)] = 1$ and $dx/dy = \tfrac{1}{2} y^{-1/2}$.
So $p(y) = \tfrac{1}{2} y^{-1/2}$ on the interval [0,1].

[Figure: p(y) versus y.]
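The result for y = x^2 can be checked numerically. The sketch below is an illustration, not part of the lecture: it draws uniform samples, squares them, and compares the fraction falling below a cutoff c with the integral of ½ y^-½ from 0 to c, which is sqrt(c). The function name `simulate_y_cdf` and the sample size are assumptions made for the example.

```python
import random

def simulate_y_cdf(c, n=100_000, seed=0):
    """Estimate P(y <= c) for y = x**2 with x uniform on [0, 1]."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.random() ** 2 <= c)
    return hits / n

# Integrating p(y) = 0.5 * y**-0.5 from 0 to c gives sqrt(c).
for c in (0.04, 0.25, 0.81):
    print(c, simulate_y_cdf(c), c ** 0.5)
```

The simulated and analytic columns should agree to within sampling error (roughly 1/sqrt(n)).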
6
Another example

Let x have a normal distribution with zero expectation and unit variance. To avoid complication, assume x > 0, so that the distribution is twice the usual amplitude:

$p(x) = 2 (2\pi)^{-1/2} \exp(-\tfrac{1}{2} x^2)$
The distribution of $y = x^2$ is
$p(y) = p(x(y)) \, |dx/dy| = (2\pi y)^{-1/2} \exp(-\tfrac{1}{2} y)$
Note that we used, as before, $dx/dy = \tfrac{1}{2} y^{-1/2}$. You can check that $\int_0^\infty p(y) \, dy = 1$ by looking up $\int_0^\infty y^{-1/2} \exp(-ay) \, dy = \sqrt{\pi/a}$ in a math book.
7
[Figure: p(x) versus x and p(y) versus y; p(y) has a singularity at the origin.]
The results are not so different from those for the uniform distribution.
8
From a previous lecture: Functions of Two Random Variables
Any function of several random variables is itself a random variable.
9
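If (x,y) has joint distribution p(x,y), then given u(x,y) and v(x,y),
$p(u,v) = p[x(u,v), y(u,v)] \, \left| \partial(x,y)/\partial(u,v) \right|$
where $\left| \partial(x,y)/\partial(u,v) \right|$ is the Jacobian determinant.
Note then that $p(u) = \int p(u,v) \, dv$ and $p(v) = \int p(u,v) \, du$.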
10
example

$p(x,y) = \frac{1}{2\pi\sigma^2} \exp(-x^2/2\sigma^2) \exp(-y^2/2\sigma^2) = \frac{1}{2\pi\sigma^2} \exp(-(x^2+y^2)/2\sigma^2)$
An uncorrelated normal distribution of two variables with zero expectation and equal variance, $\sigma^2$.

[Figure: p(x,y) for σ = 1.]
11
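What's the distribution of $u = x^2 + y^2$?

We need to choose a function v(x,y). A reasonable choice is motivated by polar coordinates, $v = \tan^{-1}(x/y)$. (We usually call u and v $r^2$ and $\theta$ in polar coordinates.)
Then $x = u^{1/2} \sin(v)$ and $y = u^{1/2} \cos(v)$.
And the Jacobian determinant is

$\left| \frac{\partial(x,y)}{\partial(u,v)} \right| = \left| \det \begin{pmatrix} \partial x/\partial u & \partial y/\partial u \\ \partial x/\partial v & \partial y/\partial v \end{pmatrix} \right| = \left| \det \begin{pmatrix} \tfrac{1}{2} u^{-1/2} \sin(v) & \tfrac{1}{2} u^{-1/2} \cos(v) \\ u^{1/2} \cos(v) & -u^{1/2} \sin(v) \end{pmatrix} \right| = \tfrac{1}{2} \sin^2(v) + \tfrac{1}{2} \cos^2(v) = \tfrac{1}{2}$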
12

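So $p(x,y) = \frac{1}{2\pi\sigma^2} \exp(-(x^2+y^2)/2\sigma^2)$ transforms to
$p(u,v) = \frac{1}{4\pi\sigma^2} \exp(-u/2\sigma^2)$
$p(u) = \int_0^{2\pi} \frac{1}{4\pi\sigma^2} \exp(-u/2\sigma^2) \, dv = \frac{1}{2\sigma^2} \exp(-u/2\sigma^2)$

Note $\int_0^\infty p(u) \, du = \left[ -\exp(-u/2\sigma^2) \right]_0^\infty = 1$, as expected.

[Figure: p(u) versus u.]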
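A quick simulation can confirm that the sum of two squared normal variables is exponentially distributed. This is a sketch under stated assumptions: the helper name `simulate_u`, the cutoff of 3 and the sample size are all chosen for illustration. It uses two consequences of p(u) = (1/2σ²) exp(-u/2σ²): the mean of u is 2σ², and P(u > t) = exp(-t/2σ²).

```python
import math
import random

def simulate_u(sigma=1.0, n=100_000, seed=1):
    """Draw u = x**2 + y**2 with x, y independent N(0, sigma**2)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) ** 2 + rng.gauss(0.0, sigma) ** 2
            for _ in range(n)]

sigma = 1.0
u = simulate_u(sigma)
mean_u = sum(u) / len(u)                        # theory: 2 * sigma**2
tail = sum(1 for v in u if v > 3.0) / len(u)    # theory: exp(-3 / (2 * sigma**2))
print(mean_u, 2 * sigma ** 2)
print(tail, math.exp(-3.0 / (2 * sigma ** 2)))
```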
13
The point of my showing you this is to give you the sense that computing the probability distributions associated with functions of random variables is not particularly mysterious, but instead is rather routine (though possibly algebraically tedious).
14
Four (and only four) Important Distributions

Start with a bunch of random variables, $x_i$, that are uncorrelated, normally distributed, with zero expectation and unit variance.
The four important distributions are the distribution of $x_i$ itself, and the distributions of three possible choices of $u(x_0, x_1, \ldots)$:
$u = \sum_{i=1}^{N} x_i^2$
$u = x_0 / \sqrt{N^{-1} \sum_{i=1}^{N} x_i^2}$
$u = \left[ N^{-1} \sum_{i=1}^{N} x_i^2 \right] / \left[ M^{-1} \sum_{i=1}^{M} x_{N+i}^2 \right]$

15
Important Distribution 1

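Distribution of $x_i$ itself
(normal distribution with zero mean and unit variance)
$p(x_i) = (2\pi)^{-1/2} \exp(-\tfrac{1}{2} x_i^2)$
Suppose that a normally distributed random variable y has expectation $\bar{y}$ and variance $\sigma_y^2$.
Then note that the variable
$Z = (y - \bar{y}) / \sigma_y$
is normally distributed with zero mean and unit variance.
We show this by noting $p(Z) = p(y(Z)) \, |dy/dZ|$ with $dy/dZ = \sigma_y$, so that $p(y) = (2\pi)^{-1/2} \sigma_y^{-1} \exp(-\tfrac{1}{2} (y - \bar{y})^2 / \sigma_y^2)$ transforms to $p(Z) = (2\pi)^{-1/2} \exp(-\tfrac{1}{2} Z^2)$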

16
[Figure: p(x) versus x for the normal distribution with zero mean and unit variance.]
17
Properties of the normal distribution (with zero expectation and unit variance)

$p(x_i) = (2\pi)^{-1/2} \exp(-\tfrac{1}{2} x_i^2)$

Mean = 0
Mode = 0
Variance = 1
18
Important Distribution 2

Distribution of $u = \sum_{i=1}^{N} x_i^2$,
the sum of squares of N normally-distributed random variables with zero expectation and unit variance.
This is called the chi-squared distribution with N degrees of freedom, and u is given the special symbol $u = \chi_N^2$.
We have already computed the N = 1 and N = 2 cases!

19
[Figure: $p(\chi_N^2)$ versus $\chi_N^2$ for several values of N, including the cases we worked out.]
20
Properties of the chi-squared distribution

$p(\chi_N^2) = \frac{1}{2^{N/2} \, (\tfrac{1}{2}N - 1)!} \, (\chi_N^2)^{N/2 - 1} \exp(-\tfrac{1}{2} \chi_N^2)$

Mean = N
Mode = 0 if N < 2, N-2 otherwise
Variance = 2N
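The chi-squared formula and its stated mean and variance can be checked by direct numerical integration. The sketch below is illustrative only: it reads the factorial (½N-1)! as the gamma function Γ(½N), so that odd N is covered, and uses a simple midpoint rule. The names `chi2_pdf` and `moment`, the choice N = 5 and the integration limits are assumptions made for the example.

```python
import math

def chi2_pdf(u, n):
    """Chi-squared density with n degrees of freedom; (n/2 - 1)! is gamma(n/2)."""
    return (u ** (0.5 * n - 1) * math.exp(-0.5 * u)
            / (2 ** (0.5 * n) * math.gamma(0.5 * n)))

def moment(n, power, upper=100.0, steps=100_000):
    """Midpoint-rule integral of u**power * p(u) over (0, upper)."""
    h = upper / steps
    return h * sum(((k + 0.5) * h) ** power * chi2_pdf((k + 0.5) * h, n)
                   for k in range(steps))

n = 5
area = moment(n, 0)                    # theory: 1
mean = moment(n, 1)                    # theory: n
variance = moment(n, 2) - mean ** 2    # theory: 2 * n
print(area, mean, variance)
```

For N = 2 the same function reduces to ½ exp(-u/2), the exponential distribution derived earlier with σ = 1.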
21
Important Distribution 3

Distribution of $u = x_0 / \sqrt{N^{-1} \sum_{i=1}^{N} x_i^2}$,
the ratio of
a normally-distributed random variable with zero expectation and unit variance
and
the square root of the mean square of N normally-distributed random variables with zero expectation and unit variance (that is, the sum of their squares divided by N, then square-rooted).
This is called Student's t distribution with N degrees of freedom, and u is given the special symbol $u = t_N$.

22
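[Figure: $p(t_N)$ versus $t_N$ for several values of N.]
Note that the N = 1 case is very long-tailed.
For large N the distribution looks pretty much like a Gaussian; in fact, it is a Gaussian in the limiting case $N \to \infty$.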
23
Properties of Student's $t_N$ distribution

$p(t_N) = \frac{(\tfrac{1}{2}N - \tfrac{1}{2})!}{\sqrt{N\pi} \, (\tfrac{1}{2}N - 1)!} \left[ 1 + N^{-1} t_N^2 \right]^{-(N+1)/2}$

Mean = 0
Mode = 0
Variance = ∞ if N < 3, N/(N-2) otherwise
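The two remarks about the shape of the t distribution (long tails for N = 1, Gaussian for large N) can be seen by evaluating the density directly. This is a minimal sketch, not the lecture's own code: the factorials are read as gamma functions, (½N-½)! = Γ(½N+½) and (½N-1)! = Γ(½N), and the names `t_pdf` and `gauss_pdf` are assumptions made for the example.

```python
import math

def t_pdf(t, n):
    """Student's t density with n degrees of freedom."""
    norm = math.gamma(0.5 * (n + 1)) / (math.sqrt(n * math.pi) * math.gamma(0.5 * n))
    return norm * (1.0 + t * t / n) ** (-0.5 * (n + 1))

def gauss_pdf(t):
    """Normal density with zero mean and unit variance."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

# Compare the density three units from the mean with the Gaussian value.
for n in (1, 5, 25, 100):
    print(n, t_pdf(3.0, n), gauss_pdf(3.0))
```

For N = 1 the density is 1/(π(1 + t²)), whose tails fall off only as t^-2, which is why that case is so long-tailed.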
24
Important Distribution 4

Distribution of $u = \left[ N^{-1} \sum_{i=1}^{N} x_i^2 \right] / \left[ M^{-1} \sum_{i=1}^{M} x_{N+i}^2 \right]$,
the ratio of
the sum of squares of N normally-distributed random variables with zero expectation and unit variance, divided by N
and
the sum of squares of M normally-distributed random variables with zero expectation and unit variance, divided by M.
This is called the F distribution with N and M degrees of freedom, and u is given the special symbol $u = F_{N,M}$.

25
[Figure: $p(F_{N,M})$ versus $F_{N,M}$ for several values of N and M.]
26
Properties of the $F_{N,M}$ distribution

$p(F_{N,M})$ = too complicated for me to type in

Mean = $M/(M-2)$ if M > 2
Mode = $\frac{N-2}{N} \cdot \frac{M}{M+2}$ if N > 2
Variance = $\frac{2M^2 (N+M-2)}{N (M-2)^2 (M-4)}$ if M > 4
27

Hypothesis Testing

The Null Hypothesis
is always a variant of this theme:
the results of an experiment differ from the expected value only because of random variation.

Test of Significance of Results
say to 95% significance:
the Null Hypothesis would generate the observed result less than 5% of the time.

Example: You buy an automated pipe-cutting machine that cuts long pipes into many segments of equal length.
Specifications:
calibration (mean, $\mu_m$): exact
repeatability (variance, $\sigma_m^2$): 100 mm²
Now you test the machine by having it cut 25 pipe segments of 10000 mm length. You then measure and tabulate the length of each pipe segment, $L_i$.

Question 1: Is the machine's calibration correct?
Null Hypothesis: any difference between the mean length of the test pipe segments and the specified 10000 mm can be ascribed to random variation.
You estimate the mean of the 25 samples: $\mu_{obs} = 9990$ mm.
The mean length deviates $(\mu_m - \mu_{obs}) = 10$ mm from the setting of 10000. Is this significant?
Note that, from a prior lecture, the variance of the mean is
$\sigma_{mean}^2 = \sigma_{data}^2 / N$.

So the quantity
$Z = (\mu_m - \mu_{obs}) / (\sigma_m / \sqrt{N})$, where $\mu_{obs} = N^{-1} \sum_i L_i$,
is normally distributed with zero expectation and unit variance.
In our case Z = 10 / (10/5) = 5.
Z = 5 means that $(\mu_m - \mu_{obs})$ is 5 standard deviations from its expected value of zero.

Scaling a quantity so it has zero mean and unit
variance is an important trick
33

The amount of area under the normal distribution that is 5 or more standard deviations away from the mean is very small. We can calculate it using the Excel function
NORMDIST(x,mean,standard_dev,cumulative)
x is the value for which you want the distribution.
mean is the arithmetic mean of the distribution.
standard_dev is the standard deviation of the distribution.
cumulative is a logical value that determines the form of the function. If cumulative is TRUE, NORMDIST returns the cumulative distribution function; if FALSE, it returns the probability density function.
2*NORMDIST(-5,0,1,TRUE) = 5.7421E-07 = 0.00006%

The factor of two accounts for both tails.
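The same two-tailed probability can be computed without a spreadsheet. The sketch below is an illustration using only Python's standard library; the function name `two_tailed_normal` and the variable names are assumptions, while the numbers are those of the pipe example. It relies on the identity P(|Z| ≥ z) = erfc(z/√2) for a standard normal Z.

```python
import math

def two_tailed_normal(z):
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2.0))

mu_spec, mu_obs = 10000.0, 9990.0    # mm, specified and observed mean length
sigma_spec, n = 10.0, 25             # sqrt(100 mm^2) and number of segments
z = (mu_spec - mu_obs) / (sigma_spec / math.sqrt(n))
p = two_tailed_normal(z)
print(z, p)
```

The result should be about 5.7e-07, in line with the spreadsheet value to two significant figures.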
34
Thus the Null Hypothesis, that the machine is well-calibrated, can be excluded to very high probability.
35

Question 2: Is the machine's repeatability within specs?
Null Hypothesis: any difference between the repeatability (variance) of the test pipe segments and the specified $\sigma_m^2 = 100$ mm² can be ascribed to random variation.
The quantity $x_i = (L_i - \mu_m) / \sigma_m$ is normally distributed with mean = 0 and variance = 1, so
the quantity $\chi_N^2 = \sum_i (L_i - \mu_m)^2 / \sigma_m^2$ is chi-squared distributed with 25 degrees of freedom.

Suppose that the root mean squared variation of pipe lengths was $[N^{-1} \sum_i (L_i - \mu_m)^2]^{1/2} = 12$ mm.
Then $\chi_{25}^2 = \sum_i (L_i - \mu_m)^2 / \sigma_m^2 = 25 \times 144 / 100 = 36$
36
CHIDIST(x,degrees_freedom)
x is the value at which you want to evaluate the distribution.
degrees_freedom is the number of degrees of freedom.
CHIDIST = P(X > x), where X is a $\chi^2$ random variable.
The probability that $\chi_{25}^2 \geq 36$ is
CHIDIST(36,25) = 0.07, or 7%
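For readers without Excel, the upper-tail chi-squared probability can be sketched with the standard library alone. The code below is one possible approach, not what CHIDIST does internally: it sums the power series for the regularized lower incomplete gamma function and subtracts it from one. The name `chi2_sf` and the number of series terms are assumptions made for the example.

```python
import math

def chi2_sf(x, dof, terms=500):
    """P(X > x) for a chi-squared variable X with dof degrees of freedom (x > 0).

    Uses the series P(a, h) = h**a * exp(-h) / gamma(a + 1)
    * sum_k h**k / ((a + 1) ... (a + k)), with a = dof / 2 and h = x / 2.
    """
    a, h = 0.5 * dof, 0.5 * x
    term = total = 1.0
    for k in range(1, terms):
        term *= h / (a + k)
        total += term
    lower = total * math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
    return 1.0 - lower

n, rms, sigma_spec = 25, 12.0, 10.0       # values from the pipe example
chi2 = n * rms ** 2 / sigma_spec ** 2     # 25 * 144 / 100
p_tail = chi2_sf(chi2, n)
print(chi2, p_tail)                       # the slides quote CHIDIST(36,25) = 0.07
```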

37
Thus the Null Hypothesis, that the difference from the expected result of 10 mm is random variation, cannot be excluded (not to greater than 95% probability).
38
Question 3: But suppose the manufacturer had not stated a repeatability spec, just a calibration spec. You can't test the calibration using the quantity $Z = (\mu_m - \mu_{obs}) / (\sigma_m / \sqrt{N})$, because $\sigma_m$ is not known.
39
Since the manufacturer has not supplied a variance, we must estimate it from the data,
$\sigma_{obs}^2 = N^{-1} \sum_i (L_i - \mu_m)^2 = 144$ mm²,
and use it in the formula $(\mu_m - \mu_{obs}) / (\sigma_{obs} / \sqrt{N})$
40
But the quantity $(\mu_m - \mu_{obs}) / (\sigma_{obs} / \sqrt{N})$ is not normally distributed, because $\sigma_{obs}$ is itself a random variable. It's t-distributed. Remember $t_N = x_0 / \sqrt{N^{-1} \sum_{i=1}^{N} x_i^2}$
41

In our case $t_N$ = 10 / (12/5) = 4.1667, truncated here to 4.16.
TDIST(x,degrees_freedom,tails)
x is the numeric value at which to evaluate the distribution.
Degrees_freedom is an integer indicating the number of degrees of freedom.
Tails specifies the number of distribution tails to return. If tails = 1, TDIST returns the one-tailed distribution. If tails = 2, TDIST returns the two-tailed distribution.
TDIST is calculated as TDIST = P(X > x), where X is a random variable that follows the t-distribution.
P($t_N$ > 4.16) = TDIST(4.16,25,1) = 0.00016 = 0.016%
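The one-tailed t probability can also be approximated by integrating the density numerically. The sketch below is an assumption-laden illustration rather than Excel's algorithm: it uses Simpson's rule on [0, t] and the symmetry of the density about zero, with 25 degrees of freedom as in the slides. The names `t_pdf` and `t_sf` are hypothetical helpers.

```python
import math

def t_pdf(t, dof):
    """Student's t density with dof degrees of freedom."""
    c = math.exp(math.lgamma(0.5 * (dof + 1)) - math.lgamma(0.5 * dof))
    return c / math.sqrt(dof * math.pi) * (1.0 + t * t / dof) ** (-0.5 * (dof + 1))

def t_sf(t, dof, steps=20_000):
    """One-tailed P(T > t) for t >= 0: one half minus the Simpson-rule
    integral of the density over [0, t]. steps must be even."""
    h = t / steps
    total = t_pdf(0.0, dof) + t_pdf(t, dof)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, dof)
    return 0.5 - total * h / 3.0

t_obs = 10.0 / (12.0 / math.sqrt(25))     # (mu_m - mu_obs) / (sigma_obs / sqrt(N))
p_one_tail = t_sf(t_obs, 25)
print(t_obs, p_one_tail)                  # the slides quote TDIST(4.16,25,1) = 0.00016
```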

42
Thus the Null Hypothesis, that the difference from the expected result of 10000 is due to random variation, can be excluded to high probability, but not nearly as high as when the manufacturer told us the repeatability.
43
Question 4: Suppose you performed the test twice, a year apart, and wanted to know: has the repeatability changed?
This year: $[N^{-1} \sum_i (L^{yr1}_i - \mu_m)^2]^{1/2} = 12$ mm
Last year: $[M^{-1} \sum_i (L^{yr2}_i - \mu_m)^2]^{1/2} = 14$ mm
(let's say N = M = 25 in both cases)
44

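Null Hypothesis: any difference between the repeatability (variance) of the test pipe segments between years can be ascribed to random variation.
The ratio of mean-squared error is F-distributed:
$F_{N,M} = \left[ N^{-1} \sum_{i=1}^{N} x_i^2 \right] / \left[ M^{-1} \sum_{i=1}^{M} x_{N+i}^2 \right]$ = 12/14 = 0.857
Note: 12 mm and 14 mm are root-mean-square values, so strictly the ratio of mean squares is (12/14)² ≈ 0.735; the value 0.857 is the one used in the calculation that follows.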

45
Note that since F is of the form F = a/b, with both a and b fluctuating around a mean value, we really want the cumulative probability that F < 12/14 or F > 14/12.

[Figure: $p(F_{N,M})$ versus $F_{N,M}$, with the points 12/14, 1 and 14/12 marked.]
46

FDIST(x,degrees_freedom1,degrees_freedom2)
x is the value at which to evaluate the function.
Degrees_freedom1 is the numerator degrees of freedom.
Degrees_freedom2 is the denominator degrees of freedom.
FDIST returns the right-hand tail, FDIST = P(F > x), where F is a random variable that has an F distribution.
Left hand tail: P(F < 0.857) = 1-FDIST(0.857,25,25) = 1-0.648 = 0.352 = 35.2%
Right hand tail: P(F > 1/0.857) = FDIST(1/0.857,25,25) = 0.352 = 35.2%
Both tails: 70.4%

The left-hand tail uses P(F < x) = 1-P(F > x); the two tails are equal because N = M.
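The two F tails can be reproduced by integrating the F density numerically. This sketch is illustrative only: the density is the standard textbook form (it is not given in the slides), the midpoint rule is a simple choice that behaves well when the numerator degrees of freedom exceed 2, and the names `f_pdf` and `f_cdf` are assumptions made for the example.

```python
import math

def f_pdf(x, n, m):
    """F density with n (numerator) and m (denominator) degrees of freedom, x > 0."""
    log_c = (math.lgamma(0.5 * (n + m)) - math.lgamma(0.5 * n)
             - math.lgamma(0.5 * m) + 0.5 * n * math.log(n / m))
    return math.exp(log_c + (0.5 * n - 1) * math.log(x)
                    - 0.5 * (n + m) * math.log1p(n * x / m))

def f_cdf(x, n, m, steps=20_000):
    """P(F < x) by the midpoint rule on (0, x)."""
    h = x / steps
    return h * sum(f_pdf((k + 0.5) * h, n, m) for k in range(steps))

ratio = 12.0 / 14.0                          # value used in the slides
left = f_cdf(ratio, 25, 25)                  # P(F < 12/14)
right = 1.0 - f_cdf(1.0 / ratio, 25, 25)     # P(F > 14/12)
print(left, right, left + right)
```

Because N = M, the two tails should come out equal, each close to the 0.352 quoted from Excel.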
47
Thus the Null Hypothesis, that the year-to-year difference in variance is due to random variation, cannot be excluded. There is no strong reason to believe that the repeatability of the machine has changed between the years.
48
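Question 5: Suppose you performed the test twice, a year apart, and wanted to know if the calibration changed.
This year: $\mu^{yr1}_{obs} = N^{-1} \sum_i L^{yr1}_i = 9990$ mm, $\sigma^{yr1}_{obs} = [N^{-1} \sum_i (L^{yr1}_i - \mu_m)^2]^{1/2} = 12$ mm
Last year: $\mu^{yr2}_{obs} = M^{-1} \sum_i L^{yr2}_i = 9993$ mm, $\sigma^{yr2}_{obs} = [M^{-1} \sum_i (L^{yr2}_i - \mu_m)^2]^{1/2} = 14$ mm
(let's say N = M = 25 in both cases)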
49

The problem that we face is that while
$t^{yr1}_N = (\mu^{yr1}_{obs} - \mu_m) / (\sigma^{yr1}_{obs} / \sqrt{N})$
and
$t^{yr2}_N = (\mu^{yr2}_{obs} - \mu_m) / (\sigma^{yr2}_{obs} / \sqrt{N})$
are individually t-distributed, their difference,
$t^{yr1}_N - t^{yr2}_N$,
is not t-distributed. Statisticians have circumvented this problem by cooking up a function of ($\mu^{yr1}_{obs}$, $\mu^{yr2}_{obs}$, $\sigma^{yr1}_{obs}$, $\sigma^{yr2}_{obs}$) that is approximately t-distributed.
But it's messy.
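The slides do not say which function is meant. One widely used choice is Welch's statistic with the Welch-Satterthwaite degrees of freedom, and the sketch below assumes that choice purely for illustration. It also treats the quoted 12 mm and 14 mm as sample standard deviations, which is a further assumption. The function name `welch` is hypothetical.

```python
import math

def welch(mean1, s1, n1, mean2, s2, n2):
    """Return Welch's statistic and the Welch-Satterthwaite degrees of freedom."""
    a, b = s1 ** 2 / n1, s2 ** 2 / n2          # variances of the two means
    t = (mean1 - mean2) / math.sqrt(a + b)
    dof = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, dof

# This year: 9990 mm, 12 mm; last year: 9993 mm, 14 mm; N = M = 25.
t_stat, dof = welch(9990.0, 12.0, 25, 9993.0, 14.0, 25)
print(t_stat, dof)
```

The statistic is then compared against a t distribution with the (generally non-integer) number of degrees of freedom returned.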

50
In our case
Note: Excel's function TTEST() allows you to perform the test on columns of data, without typing in the formulas. Very handy!
19
51
Thus the Null Hypothesis, that the difference in means is due to random variation, cannot be excluded.
52
5 tests

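$\mu_{obs} = \mu_{prior}$ when $\mu_{prior}$ and $\sigma_{prior}$ are known: normal distribution
$\sigma_{obs} = \sigma_{prior}$ when $\mu_{prior}$ and $\sigma_{prior}$ are known: chi-squared distribution
$\mu_{obs} = \mu_{prior}$ when $\mu_{prior}$ is known but $\sigma_{prior}$ is unknown: t distribution
$\sigma_{1,obs} = \sigma_{2,obs}$ when $\mu_{1,prior}$ and $\mu_{2,prior}$ are known: F distribution
$\mu_{1,obs} = \mu_{2,obs}$ when $\sigma_{1,prior}$ and $\sigma_{2,prior}$ are unknown: modified t distribution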

53
Example 1: LaGuardia Airport Mean Daily Temperature
Was the 5-year period 1950-1954 significantly warmer or cooler than the 5-year period 2000-2004?

[Figure: mean daily temperature for 1950-1954 and for 2000-2004.]

Null Hypothesis: any differences between the mean temperatures of these two time periods can be ascribed to random variation.
Type of Test: t-test modified to test two means
54
Results
1950-1954 Mean Temperature: 55.8658 ± 0.77
2000-2004 Mean Temperature: 55.8792 ± 0.80
T-test Significance Probability: 49%
The Null Hypothesis, that the difference in means is due to random variation, cannot be rejected.
55
Issue about noise

Note that we are estimating σ by treating the short-term (days-to-months) temperature fluctuations as noise.
Is this correct?
Certainly such fluctuations are not measurement noise in the normal sense.
They might be considered model noise, in the sense that they are caused by weather systems that are unmodeled (by us).
However, such noise probably does not meet all the requirements for use in the statistical test. In particular, it probably has some day-to-day correlation (hot today, hot tomorrow, too) that violates our implicit assumption of uncorrelated noise.

56
Example 2: Does a parabola fit better than a straight line?
First 7 days of data on the Neuse River Hydrograph shown in an early lecture

[Figure: discharge (cfs) versus day, N = 7.]
57
A parabola will always fit better than a straight line, because it has an extra parameter. But does it fit significantly better?
Null Hypothesis: any difference in fit is due to random variation.
58
[Figure: Linear Fit and Quadratic Fit panels.]
Approximation: the ratio of prediction errors follows an F-distribution with the number of degrees of freedom given by the number of data minus the number of parameters in the fit.
Quadratic fit: $(N-3)^{-1} \sum_{i=1}^{N} (d_i^{obs} - d_i^{pre})^2 = 6985$
Linear fit: $(N-2)^{-1} \sum_{i=1}^{N} (d_i^{obs} - d_i^{pre})^2 = 153431$
F = 153431 / 6985 = 21.96
P(F < 21.96) = 1-FDIST(21.96,5,4) = 0.995 = 99.5%
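The probability P(F < 21.96) can be estimated directly from the definition of the F distribution given earlier: a ratio of two mean sums of squares of unit-variance normal variables. The Monte Carlo sketch below is an illustration, not the lecture's method. The names `draw_f` and `prob_below`, the number of trials and the seed are assumptions made for the example.

```python
import random

def draw_f(rng, n, m):
    """One sample of [sum of n squared N(0,1) / n] / [sum of m squared N(0,1) / m]."""
    num = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) / n
    den = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(m)) / m
    return num / den

def prob_below(f_obs, n, m, trials=50_000, seed=2):
    """Monte Carlo estimate of P(F < f_obs) for n and m degrees of freedom."""
    rng = random.Random(seed)
    return sum(1 for _ in range(trials) if draw_f(rng, n, m) < f_obs) / trials

f_obs = 153431.0 / 6985.0             # linear-fit error over quadratic-fit error
p_below = prob_below(f_obs, 5, 4)     # N-2 = 5 and N-3 = 4 degrees of freedom
print(f_obs, p_below)                 # the slides quote 0.995
```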
59

The Null Hypothesis can be rejected with 99.5% confidence.

60
Another Issue about noise

Note that we are again basing estimates upon model noise, in the sense that the prediction error is being controlled, at least partly, by the misfit of the curve as well as by measurement error.
As before, such noise probably does not meet all the requirements for use in the statistical test. So the test needs to be used with some caution.
