FPP 26-27


Slides: 45

Provided by: Garrit2

Learn more at: http://www2.stat.duke.edu


1
Significance Tests

FPP 26-27

2
Significance tests

Question
Given the collected data, is there evidence
against a specified hypothesis about the
corresponding parameter?
In other words, are the data consistent or not
with a specified hypothesis?

3
Logic of significance tests

Proof by contradiction
1. Assume some hypothesis is true.
2. Find a statistic (a quantity that depends on the data) that takes on extreme values when the assumed hypothesis is false.
3. Calculate the value of this statistic in the collected data.
4. Calculate the probability of observing a value of the statistic as or more extreme than the observed value, under the assumed hypothesis.
5. When this probability is small, one of two things happened:
A. the assumed hypothesis is correct and a rare event occurred, or
B. the assumed hypothesis is incorrect.
Since rare events are by definition rare, we interpret small probabilities as evidence that the assumed hypothesis is false.
When the probability is not small, the data provide insufficient evidence to claim that the assumed hypothesis is false.
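The five steps can be mimicked by brute force: assume H0, simulate many data sets under it, and count how often the simulated statistic is as or more extreme than the observed one. The sketch below uses a hypothetical coin example (60 heads in 100 tosses, H0: chance of heads = 0.5, alternative: chance > 0.5) that is not from the slides; the function name `simulated_p_value` is also just for illustration.

```python
import random

def simulated_p_value(observed: int, n: int, p0: float = 0.5,
                      reps: int = 20_000, seed: int = 1) -> float:
    """Estimate a one-sided (right-tail) p-value by simulating many data sets under H0."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    as_or_more_extreme = 0
    for _ in range(reps):
        # Step 1: assume H0 is true, so each toss lands heads with probability p0.
        heads = sum(rng.random() < p0 for _ in range(n))
        # Step 4: count simulated statistics at least as extreme as the observed one.
        if heads >= observed:
            as_or_more_extreme += 1
    return as_or_more_extreme / reps

# Hypothetical data: 60 heads in 100 tosses. Statistic (Steps 2-3): the number of heads,
# which tends to be large when H0 is false in the direction "chance of heads > 0.5".
p = simulated_p_value(observed=60, n=100)
print(f"estimated p-value: {p:.3f}")
```

A small estimate corresponds to Step 5: either a rare event happened under H0, or H0 is false.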

4
Significance test for a population percentage

Civil rights and the 1960s
In the court case Swain vs. Alabama (1965), it was alleged that there was discrimination against black people in grand jury selection.
Census data from the time indicate that 25% of people eligible for grand jury service were black. A random sample of 1050 people called to appear for possible jury duty contained 177 black people. Is there evidence of discrimination?
Reference: Devore, J. Probability and Statistics for Engineering and the Sciences. Pacific Grove, CA: Duxbury, 2000, p. 339.

5
Step 1: Formulate hypotheses

Claim: There is discrimination.
The opposite of this claim is called the null hypothesis. It usually can be translated as "there is nothing unusual going on."
The claim is called the alternative hypothesis. It usually can be translated as "there is some unusual pattern in the data."
H0: p = 0.25 vs. Ha: p < 0.25

6
Step 2: Find a relevant statistic

Values of the sample proportion of black jurors much smaller than 0.25 suggest the null hypothesis is not true.
Sample proportion: 177/1050 = 0.1686.
Is this much smaller than 0.25?
A good way to determine this is by converting the difference between 0.1686 and 0.25 to standard units.

7
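Step 3: Calculate z in the data

We get
$$ z = \frac{177/1050 - 0.25}{\sqrt{0.25 \times 0.75 / 1050}} \approx \frac{-0.0814}{0.0134} \approx -6.1 $$
The sample proportion of black jurors is about six SEs below the hypothesized value of 0.25.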

8
Step 4: Calculate the p-value

When n (the sample size) is large enough, we can use the standard normal curve to calculate the probability of seeing a value of z less than or equal to (i.e., as or more extreme than) the observed value of about -6.1.
To find this probability we need the distribution of z under H0. Do we know it?
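Steps 2-4 for the Swain data can be checked with a few lines of Python. This is a sketch that relies on the normal approximation and, as in Step 3, computes the SE from the hypothesized value 0.25; `statistics.NormalDist` (Python 3.8+) supplies the standard normal curve. It gives z of about -6.09; hand calculations may differ slightly in the second decimal because of rounding.

```python
from math import sqrt
from statistics import NormalDist

n = 1050        # people called for possible jury duty
black = 177     # black people in the sample
p0 = 0.25       # hypothesized proportion under H0

p_hat = black / n                  # sample proportion
se = sqrt(p0 * (1 - p0) / n)       # SE computed under H0
z = (p_hat - p0) / se              # standardized distance from 0.25
p_value = NormalDist().cdf(z)      # Ha: p < 0.25, so use the area to the left of z
print(f"p_hat = {p_hat:.4f}, z = {z:.2f}, p-value = {p_value:.1e}")
```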

9
Conclusion in Swain case

Because the p-value is approximately 0, we reject the null hypothesis. It is very unlikely that we would observe a sample percentage of 16.86% or smaller if the true percentage were 25%. The data suggest that black jurors were indeed selected less frequently than would have been expected. The data provide some evidence of discrimination.

10
Stating hypotheses

Null hypothesis (H0)
The statement being tested in a test of significance is called the null hypothesis.
Usually the null hypothesis
is a statement of "no effect" or "no difference",
is a statement about a population,
is expressed in terms of one or more parameters.
Example: H0: µ = 0

11
Stating hypotheses

Alternative hypothesis (Ha)
The name given to the statement we hope or suspect to be true instead of H0.
Example: Ha: µ ≠ 0
Hypotheses always refer to some population or model, not to a particular outcome.
We must decide whether the alternative hypothesis (Ha) should be one-sided or two-sided.

12
Stating hypotheses

One-sided alternative hypotheses
Examples: Ha: µ < 0 or Ha: µ > 0
Two-sided alternative hypothesis
Example: Ha: µ ≠ 0

13
Stating hypotheses

Choosing a one-sided or two-sided alternative hypothesis
The alternative hypothesis should express the hopes or suspicions we had in mind when we decided to collect the data.
It is cheating to first look at the data and then frame Ha to fit what the data show.
If you do not have a specific direction in advance, use a two-sided alternative.

14
Stating hypotheses

Example: Your company hopes to reduce the mean time (µ) required to process customer orders. At present, this mean is 3.8 days. You study the process and eliminate some unnecessary steps.
Q: Did you succeed in decreasing the average processing time?
Target: show that the mean is now less than 3.8 days.
So the alternative hypothesis is one-sided.
The null hypothesis is the "no change" value.
H0: µ = 3.8 vs. Ha: µ < 3.8

15
Stating hypotheses

The mean area of several thousand apartments in a new development is advertised to be 1250 sq ft. A tenant group thinks that the apartments are smaller than advertised. They hire an engineer to measure a sample of apartments to test their suspicion.
H0: µ = 1250 vs. Ha: µ < 1250

16
Stating hypotheses

Experimenters studying learning in animals sometimes measure how long it takes a mouse to find its way through a maze. The mean time is 18 seconds for one particular maze. A researcher thinks that a loud noise will cause the mice to complete the maze more slowly. She measures how long each of 10 mice takes with a noise as a stimulus.
H0: µ = 18 vs. Ha: µ > 18

17
Stating hypotheses

Last year, your company's service technicians took an average of 2.6 hours to respond to trouble calls from business customers who purchased service contracts. Do this year's data show a different average response time?
H0: µ = 2.6 vs. Ha: µ ≠ 2.6

18
Test Statistic

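After correctly formulating the null and alternative hypotheses, we compare the hypothesized value with the data by using a test statistic.
Many test statistics can be thought of as a standardized distance between a sample estimate of a parameter and the value of the parameter specified by the null hypothesis.
Most test statistics have the generic form
$$ \text{test statistic} = \frac{\text{estimate} - \text{hypothesized value}}{\text{SE of the estimate}} $$
Test statistic for a proportion
$$ z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}} $$
Test statistic for a mean
$$ z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} $$
where $\hat{p}$ is the sample proportion, $\bar{x}$ and $s$ are the sample mean and SD, $p_0$ and $\mu_0$ are the values specified by H0, and $n$ is the sample size.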
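The two standardized distances translate directly into code. This is a minimal sketch of the large-sample versions (the SE for a proportion is computed from the null value, the SE for a mean from the sample SD); the function names and the input numbers are hypothetical, chosen only to show the calls.

```python
from math import sqrt

def z_for_proportion(p_hat: float, p0: float, n: int) -> float:
    """(sample proportion - hypothesized proportion) / SE computed under H0."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

def z_for_mean(xbar: float, mu0: float, sd: float, n: int) -> float:
    """(sample mean - hypothesized mean) / (SD / sqrt(n)); large-sample version."""
    return (xbar - mu0) / (sd / sqrt(n))

# Hypothetical inputs: both differences are exactly 2 SEs.
print(round(z_for_proportion(0.55, 0.5, 400), 2))   # 0.05 / 0.025
print(round(z_for_mean(10.5, 10.0, 2.0, 64), 2))    # 0.5 / 0.25
```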

19
P-values

A test of significance assesses the evidence
against the null hypothesis and provides a
numerical summary of this evidence in terms of a
probability
The idea is that surprising outcomes are evidence against H0.
A surprising outcome is one that is far from what we would expect if H0 were true.

20
P-values

A test of significance finds the probability of
getting an outcome as extreme or more extreme
than the actually observed outcome
The direction or directions that count as far
from what we would expect are determined by the
alternative hypothesis
Definition: The probability, assuming that H0 is true, that the test statistic would take a value as extreme as or more extreme than the value actually observed is called the P-value of the test.
The smaller the P-value, the stronger the evidence against H0 provided by the data.

21
P-values

What does "as or more extreme" really mean?
When the alternative has a > sign, "as or more extreme" means using the area to the right of the test statistic in the p-value calculation.
When the alternative has a < sign, "as or more extreme" means using the area to the left of the test statistic in the p-value calculation.
When the alternative uses a ≠ sign, "as or more extreme" means values of the test statistic far from zero in both the positive and negative directions.
For this type of alternative hypothesis, add the area to the left of -|test statistic| and the area to the right of +|test statistic|.
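The three rules can be written as one small function for a z statistic. This is a sketch assuming the standard normal curve is the right reference distribution; the function name and the string labels for the alternatives are illustrative choices. The right tail is computed as `cdf(-z)` rather than `1 - cdf(z)`, which gives the same number but stays accurate for very large z.

```python
from statistics import NormalDist

def p_value(z: float, alternative: str) -> float:
    """P-value for a z statistic; alternative is 'greater', 'less' or 'two-sided'."""
    std_normal = NormalDist()  # mean 0, SD 1
    if alternative == "greater":      # Ha uses >: area to the right of z
        return std_normal.cdf(-z)
    if alternative == "less":         # Ha uses <: area to the left of z
        return std_normal.cdf(z)
    if alternative == "two-sided":    # Ha uses ≠: area left of -|z| plus area right of +|z|
        return 2 * std_normal.cdf(-abs(z))
    raise ValueError(f"unknown alternative: {alternative!r}")

for alt in ("greater", "less", "two-sided"):
    print(alt, round(p_value(1.96, alt), 4))
```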

22
P-values
23
Interpretation of a p-value

Common misinterpretations of p-values
The p-value is not the probability that the null
hypothesis is true. (the null is either true or
not)
Also, (1-p-value) is not the probability that the
alternative hypothesis is true. (the alternative
is either true or not true)
Correct interpretation
The p-value is the probability of getting a value
of a test statistic as or more extreme than the
value of the statistic computed from the
collected data, under the assumption that the
null hypothesis is true

24
Enough evidence?

Below are some guidelines for judging p-values.
(Don't treat these as gold standards.)
p-value                          Evidence against H0
< 0.01-ish                       very strong
between 0.01-ish and 0.05-ish    moderate
between 0.05-ish and 0.10-ish    weak
> 0.10-ish                       practically none

25
Etruscan example

In the eighth century B.C., the Etruscan
civilization was the most advanced in all of
Italy. Its art forms and political innovations
were destined to leave indelible marks on the
entire Western world. Originally located in the
region now known as Tuscany, it spread rapidly
across the Apennines and eventually overran much
of Italy. But as quickly as it came, it faded.
Militarily it was no match for the burgeoning
Roman legions, and by the dawn of Christianity it
was all but gone.
No chronicles of the Etruscan empire have ever
been found, and to this day its origin remains
shrouded in mystery. Were the Etruscans native
Italians or were they immigrants? And if they
were immigrants, where did they come from? Much
of our knowledge of the Etruscans derives from archaeological investigations and anthropometric studies (for example, body measurements) to determine origins. (Source: Larsen and Marx, Statistics, 2001, p. 513.)
A team of archaeologists collected 84 skulls of Etruscan men and measured their head breadth (in mm). Let's assume that these 84 men are a random sample of Etruscan men. If the Etruscan men were native, it makes sense to think that the population average head breadth of Etruscans is comparable to the average head breadth of modern Italians, 132.44 mm. This assumes evolution has not shifted average head size substantially over the last 2800 years, an assumption that is reasonably close to true.

26
Exploratory data analysis for Etruscans
27
Significance test

Step 1: Specify the null and alternative hypotheses.
Claim: the true average breadth of Etruscan heads differs from 132.44 mm.
H0: µ = 132.44 vs. Ha: µ ≠ 132.44
Step 2: Compute a test statistic.
The sample average is over 17 SEs away from the hypothesized average of 132.44.
Step 3: Calculate the p-value.
For all intents and purposes this p-value is zero. Why?
Step 4: Make a conclusion.
There is enough evidence in the data to conclude that modern Italians and the Etruscans have different average head sizes.

28
A more wordy conclusion

It's practically impossible to observe a difference of 17 SEs by chance alone. Our initial assumption in the null hypothesis is very unlikely to be true. The data overwhelmingly suggest that modern Italians and the Etruscans have different average head sizes, indicating that Etruscans were not native to Italy.
For those interested, current theory is that Etruscans came from Asia. But it remains a mystery how they got to Italy.

29
Significance test using JMP
30
Example 1

A sample of 40 recovering alcoholics was given the State-Trait Inventory Test. The mean score of the 40 recovering alcoholics was 38, with a sample SD of 7. A psychologist suspected that recovering alcoholics in general had a higher mean score than the norm of 35. Do the sample data justify the suspicion?
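A sketch of one way to work Example 1, assuming the large-sample z test for a mean (SE = SD / sqrt(n)) and the one-sided alternative Ha: µ > 35 implied by "higher mean score". A t test, which the standard library does not provide, would give a similar but slightly larger p-value.

```python
from math import sqrt
from statistics import NormalDist

n, xbar, sd = 40, 38, 7   # sample size, sample mean, sample SD (from the example)
mu0 = 35                  # the norm, used as the H0 value
# Ha: mu > 35, so the p-value is the area to the right of z.
se = sd / sqrt(n)
z = (xbar - mu0) / se
p_value = NormalDist().cdf(-z)
print(f"SE = {se:.3f}, z = {z:.2f}, p-value = {p_value:.4f}")
```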

31
Example 2

There was concern among health officials in a community that an unusually large percentage of babies with abnormally low birth weight were being born. Abnormally low birth weight here is defined as less than 88 ounces. A sample of 180 births showed 14 babies with abnormally low birth weight. The proportion of births that the officials expect to be abnormally low is 5%. Do the data support the health officials' claim?
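A sketch of one way to work Example 2, assuming the one-sided alternative Ha: p > 0.05 and the z statistic for a proportion with SE computed from the null value. Because the expected count under H0 is only 180 × 0.05 = 9, the normal approximation is rough here, so the sketch also computes the exact binomial tail P(X ≥ 14) with `math.comb` as a cross-check; the two can differ noticeably.

```python
from math import comb, sqrt
from statistics import NormalDist

n, low = 180, 14   # births sampled, births under 88 ounces
p0 = 0.05          # proportion expected under H0
# Ha: p > 0.05, so use the right tail.
p_hat = low / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_normal = NormalDist().cdf(-z)

# Cross-check: exact binomial probability of 14 or more low-weight births under H0.
p_exact = sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(low, n + 1))
print(f"p_hat = {p_hat:.4f}, z = {z:.2f}, normal p = {p_normal:.4f}, exact p = {p_exact:.4f}")
```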

32
Statistical significance

To formalize testing further, some researchers
advocate strict p-value cutoffs when deciding
whether or not to reject null hypotheses.
Example: reject the null hypothesis when the p-value is less than 0.05; otherwise, do not reject it.

33
Statistical significance

These cut-offs are called significance levels.
They are typically labeled with the Greek letter α (alpha).
Example: for a significance level of 0.05, we write α = 0.05.
When the null hypothesis is rejected, the outcome of the test is described as statistically significant.
Made-up example with typical language:
"We got a p-value of 0.036 and used α = 0.05. The results are statistically significant at the 0.05 level."

34
My opinion about statistical significance

DO NOT RELY BLINDLY ON A FIXED CUT-OFF
Consider two p-values 0.050001 and 0.049999.
These two p-values provide the same amount of
evidence against the null hypothesis.
But if we judge strictly by the 0.05 cut-off, we don't reject the null for 0.050001 and we do for 0.049999.
Ridiculous, no? Consider p-values on their own merits.

35
Type I and Type II errors

Possible errors from the decision to reject or not to reject the null hypothesis:
Type I error: rejecting H0 when H0 is true.
Type II error: failing to reject H0 when Ha is true.
Hypothesis testing is not perfect. You never know if you are making one of these errors!
It is important to replicate studies whenever possible to reduce the impact of these errors.

36
The role of sample size

The chance of making a Type I error does not depend on sample size: it is set by the significance level, because sample size is already incorporated into the test statistic.
The chance of making a Type II error decreases as sample size increases. (Be wary of tests based on small samples.)
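Both claims can be checked by simulation. The sketch below is a hypothetical setting, not from the slides: a two-sided z test of H0: p = 0.5 at the 0.05 level, applied to many simulated samples of size 20 and 200. When the data come from p = 0.5 (H0 true), the rejection rate is the Type I error rate; when they come from p = 0.6 (Ha true), one minus the rejection rate is the Type II error rate. The Type I rate stays near 0.05 for both sizes (it wobbles a little because counts are discrete), while the Type II rate drops sharply as n grows.

```python
import random
from math import sqrt

def rejection_rate(true_p: float, n: int, p0: float = 0.5,
                   reps: int = 4000, seed: int = 2) -> float:
    """Fraction of simulated studies in which a two-sided z test rejects H0: p = p0."""
    rng = random.Random(seed)
    se = sqrt(p0 * (1 - p0) / n)
    rejections = 0
    for _ in range(reps):
        p_hat = sum(rng.random() < true_p for _ in range(n)) / n
        if abs((p_hat - p0) / se) >= 1.96:  # |z| >= 1.96 roughly means p-value <= 0.05
            rejections += 1
    return rejections / reps

for n in (20, 200):
    type1 = rejection_rate(true_p=0.5, n=n)      # H0 true: any rejection is a Type I error
    type2 = 1 - rejection_rate(true_p=0.6, n=n)  # Ha true: not rejecting is a Type II error
    print(f"n = {n:3d}: Type I rate ~ {type1:.3f}, Type II rate ~ {type2:.3f}")
```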

37
The role of sample size

When the hypothesized value is NOT very different
from the actual value of the parameter, you need
a large sample size to reduce the chance of a
Type II error.
In many grant proposals, you have to justify the
study size by methods that attempt to minimize
the chance of Type II errors.
These methods are called power analyses.
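A common textbook approximation for a power analysis of a two-sided one-sample z test is n = ((z_{1-α/2} + z_{power}) × SD / δ)², where δ is the true distance between the parameter and the hypothesized value. The sketch below assumes that formula and uses hypothetical planning values (SD = 7, δ = 2 or 1); it shows the point of the slide: halving δ roughly quadruples the required sample size.

```python
from math import ceil
from statistics import NormalDist

def n_for_power(delta: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n for a two-sided one-sample z test to detect a true shift of size delta."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    # n = ((z_{1 - alpha/2} + z_{power}) * sd / delta)^2, rounded up to a whole subject
    return ceil(((z(1 - alpha / 2) + z(power)) * sd / delta) ** 2)

# Hypothetical planning values: SD of 7; true mean 2 units vs. 1 unit from the null value.
print(n_for_power(delta=2, sd=7))
print(n_for_power(delta=1, sd=7))
```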

38
The role of sample size

Inferences are always improved by obtaining as
much (accurate and relevant) data as possible.
With large enough sample size, you can reject any
false null hypothesis
However,

39
Practical vs. statistical significance

When you get a statistically significant result,
consider whether it is practically significant.
If your sample size is large enough, you'll be able to detect a difference between the hypothesized value of a parameter and its true value if H0 is wrong.
But is this difference of practical significance?
Example: weight-lifting study
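To see how sample size alone can turn a trivial difference into a "significant" one, the sketch below fixes a hypothetical difference of 0.1 units between the sample mean and µ0 (with SD 7, which is small in practical terms) and computes the two-sided z-test p-value for increasing n. The numbers are invented for illustration and are not from the weight-lifting study.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(diff: float, sd: float, n: int) -> float:
    """Two-sided z-test p-value for an observed difference between a sample mean and mu0."""
    z = diff / (sd / sqrt(n))
    return 2 * NormalDist().cdf(-abs(z))

# Hypothetical: the same tiny difference of 0.1 units, SD 7, at three sample sizes.
for n in (100, 10_000, 100_000):
    print(n, round(two_sided_p(0.1, 7, n), 4))
```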

40
Dangers of excessive fishing

With enough hypothesis tests, you'll find something statistically significant.
Some of these statistically significant results may really be Type I errors.
Try to avoid excessive fishing for statistical significance. If you perform many tests, be sure to report how many you did. And see whether the results are replicated in separate studies.
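The arithmetic behind "with enough tests, you'll find something" is short. Assuming (hypothetically) 20 independent tests, each at α = 0.05, and all null hypotheses true, the chance of at least one "significant" result is 1 - 0.95^20, about 0.64. The sketch computes this and checks it by simulation, using the fact that for a continuous test statistic a p-value under a true null is uniform on (0, 1).

```python
import random

alpha, m = 0.05, 20   # significance level and number of tests (hypothetical)
# Independent tests, all nulls true: each has chance alpha of a Type I error.
p_at_least_one = 1 - (1 - alpha) ** m
print(f"P(at least one 'significant' result) = {p_at_least_one:.3f}")

# Simulation check: draw m uniform p-values per "study" and see if any falls below alpha.
rng = random.Random(3)
trials = 10_000
hits = sum(any(rng.random() < alpha for _ in range(m)) for _ in range(trials))
print(f"simulated: {hits / trials:.3f}")
```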

41
Non-significant results

Failing to reject a null hypothesis is not a
failed study
It is just as important to learn that a null
hypothesis explains data well as it is to learn
that it does not

42
Relationship between CI and hypothesis tests

You can use CIs like a hypothesis test.
Example: say your null hypothesis is H0: p = 0.5.
If the 95% CI does not contain the null hypothesis value, e.g. (0.64, 0.70), then the two-sided test has p-value < 0.05.
If the 95% CI contains the null hypothesis value, e.g. (0.47, 0.87), then the two-sided test has p-value > 0.05.
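The correspondence can be checked numerically when the interval and the test use the same SE. The sketch below assumes an estimate ± 1.96 SE interval and a two-sided z test, with a hypothetical SE of 0.03 and hypothetical estimates. For a proportion, the test's SE is usually computed from p0 while the CI's SE uses the sample proportion, so in that case the match is only approximate near the boundary.

```python
from statistics import NormalDist

def ci95(estimate, se):
    """Approximate 95% CI: estimate +/- 1.96 SE."""
    return estimate - 1.96 * se, estimate + 1.96 * se

def two_sided_p(estimate, null_value, se):
    """Two-sided z-test p-value using the same SE as the interval."""
    z = (estimate - null_value) / se
    return 2 * NormalDist().cdf(-abs(z))

null_value, se = 0.5, 0.03  # H0: p = 0.5 as in the slide; the SE of 0.03 is hypothetical
for estimate in (0.67, 0.55):
    lo, hi = ci95(estimate, se)
    contains = lo <= null_value <= hi
    p = two_sided_p(estimate, null_value, se)
    print(f"CI = ({lo:.2f}, {hi:.2f}), contains 0.5: {contains}, p-value = {p:.4f}")
```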

43
CIs vs Hypothesis tests

Hypothesis tests can identify parameter values that are inconsistent with the data.
They do not specify parameter values that plausibly could have produced the data.
Confidence intervals do this. Hence, when given a choice, use CIs over hypothesis tests.