Title: 11 Hypothesis Testing
1 STA 291 Lecture 24
- 11 Hypothesis Testing
- 11.1 Concepts of Hypothesis Testing
- 11.2 Testing the population mean
- 12.3 Testing the population proportion
2
- Bonus Homework, due in the lab April 16-18
- Essay: How do you prove or disprove the hot hand theory? (400-600 words, approximately one typed page)
3 Essay should (at least) include
- Background: the conflicting theories of a hot hand versus no hot hand.
- Hypothesis to be tested.
- What data to collect? (I suggest considering only free throws.) How much data?
- If the data were available, what calculation would you perform? (Be as specific as possible.)
- How does this calculation lead to your affirmation or rejection of the hot hand theory? (Be as specific as possible.)
4
- Instead of estimating how much the new drug improves survival (which is harder to answer), we ask: Does it help?
- Null hypothesis: no difference
- Alternative hypothesis: some improvement
- Leave the "how much improvement" question for later.
5 Significance Test
- A significance test is a way of statistically testing a hypothesis by comparing the data to values predicted by the hypothesis.
- Data that fall far from the predicted values provide evidence against the hypothesis.
- "Significantly different"
6 Statistically Significant
- A significant result is usually called statistically significant.
- You may want to follow up by estimating how large the difference is. (Caution: the difference may be small.)
- For example, SAT scores of 720 and 710 will sometimes be statistically significantly different, but for all practical purposes they are just as good.
7 Logical Procedure
1. State a hypothesis that you would like to find evidence against (null hypothesis, H0).
2. Get data and calculate a statistic (for example, the sample mean).
3. The hypothesis often determines the sampling distribution of our statistic.
4. If the calculated value in 2. is very unreasonable given 3., then we conclude that the hypothesis was wrong (the sampling result is significantly different from what we expect under H0).
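The four steps above can be sketched as a small simulation. This is only an illustrative sketch, not the lecture's method: the coin-toss data (62 heads in 100 tosses), the function name `simulated_p_value`, the seed, and the number of simulated samples are all assumptions made for the example.

```python
import random

def simulated_p_value(n, observed, p0=0.5, n_sims=10_000, seed=0):
    """Estimate how unusual `observed` successes are if H0 (proportion = p0) is true."""
    rng = random.Random(seed)           # fixed seed so the sketch is repeatable
    expected = n * p0
    observed_distance = abs(observed - expected)
    at_least_as_extreme = 0
    for _ in range(n_sims):
        # Step 3: draw one sample of size n as if H0 were true
        count = sum(rng.random() < p0 for _ in range(n))
        # Step 4: is the simulated count at least as far from n*p0 as the observed one?
        if abs(count - expected) >= observed_distance:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_sims

# Step 1: H0 says the coin is fair (p = 0.5).
# Step 2: hypothetical data, 62 heads in 100 tosses.
p_value = simulated_p_value(n=100, observed=62)
print(f"share of simulated samples at least as extreme: {p_value:.3f}")
```

A small share means the observed statistic would be very unreasonable under H0, which is the situation described in step 4.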
8 Elements of a Significance Test
- Assumptions
  - Type of data, type of population distribution
- Hypotheses
  - Null and alternative hypotheses, H0 and Ha (usually about the parameter(s) of the population distribution)
- Test statistic
  - Usually compares the point estimate to the parameter value under the null hypothesis
- P-value
  - Uses the sampling distribution to quantify evidence against the null hypothesis
  - A small P-value is more contradictory
- Conclusion
  - Report the P-value
  - Make a formal rejection decision (optional)
9 p-Value
- How likely is the observed test statistic value when the null hypothesis is assumed true?
- The p-value is the probability, assuming that H0 is true, that the test statistic takes values at least as contradictory to H0 as the value actually observed.
- The smaller the p-value, the more strongly the data contradict H0.
10 Example: Study Design
- In a study comparing two painkillers (Tylenol vs. Advil, etc.), 215 volunteers are given both, one kind each week (disguised as just brand A and brand B).
- After they have used both, they state a preference: either A is better or B is better.
- Hypothesis: if there were no difference, then the preference for A should be 50%.
11 Example, cont.
- Let p = population proportion that prefers A over B
- H0: p = 0.5
- Ha: p ≠ 0.5, since the preference can go either way
- Computation of the P-value (after the study was done)
- Conclusion
12 Example, cont.
- Suppose that among the 215 there were 130 who prefer brand A. How strong is the evidence?
- P-value = 0.002611 (by web)
- Conclusion: since the P-value is so small (smaller than 1%, smaller than 5%), we reject the null hypothesis of p = 0.5.
13
- We also say the result is statistically significant at the 1% level, etc. (this just means the P-value is less than 1%).
14 Alternative and p-value computation
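15
- We may also try to compute the P-value by hand (table, calculator, paper/pencil):
- $\hat{p} = 130/215 = 0.6046$
- $\hat{p} - 0.5 = 0.6046 - 0.5 = 0.1046$
- Standard error under H0: $\sqrt{0.5(1-0.5)/215} = 0.0341$
- $z_{\text{obs}} = 0.1046 / 0.0341 = 3.067$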
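The hand calculation above can be checked with a few lines of Python. This is a sketch using the normal approximation from the standard library's `NormalDist`; the variable names are illustrative. Because it keeps all digits instead of rounding at each step, the z value comes out slightly above the rounded 3.067, and the normal-approximation P-value need not match the web-computed 0.002611 exactly, since that figure may come from a different calculation.

```python
from math import sqrt
from statistics import NormalDist

n = 215          # volunteers in the study
prefer_a = 130   # volunteers who preferred brand A
p0 = 0.5         # proportion preferring A under H0 (no difference)

p_hat = prefer_a / n                    # sample proportion
std_error = sqrt(p0 * (1 - p0) / n)     # standard error of p_hat under H0
z_obs = (p_hat - p0) / std_error        # observed z statistic

# Two-sided P-value: probability of a z at least this far from 0 in either direction
p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))

print(f"p_hat = {p_hat:.4f}, SE = {std_error:.4f}, z = {z_obs:.3f}, P-value = {p_value:.5f}")
```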
16 Example
- Somebody makes the claim that 50% of all UK students wear sandals to class in the month of September.
- You don't believe it, so on one of those days you take a random sample of 10 students and find that only 2 of these 10 students actually wear sandals.
- How (un)likely is this under the hypothesis?
- The sampling distribution helps us quantify the (un)likeliness in terms of a probability (p-value).
17 Assumptions in the Example
- What type of data do we have?
  - Qualitative with two categories
  - Either wearing sandals or not wearing sandals
- What is the population distribution?
  - It is Bernoulli type. It is definitely not normal, since it can only take two values.
- Which sampling method has been used?
  - We assume simple random sampling
- What is the sample size?
  - n = 10
18 Hypotheses in the Example
- Null hypothesis (H0)
  - 50% of all UK students wear sandals to class
  - H0: population proportion = 0.5
- Alternative hypothesis (H1)
  - The proportion of UK students wearing sandals is different from 0.5 (two-sided)
19 Conclusion
- Sometimes, in addition to reporting the p-value, a formal decision is made about rejecting or not rejecting the null hypothesis.
- Most studies require small p-values like p < .05 or p < .01 as significant evidence against the null hypothesis.
- Decision: the results are significant/not significant at the 5% level.
20 Example, cont.
- The calculation of the P-value for this particular example is a topic our book does not cover (it only covers sample sizes > 30).
- But let's suppose we had used software and it reported a P-value of 0.109.
- (Look at the bottom of the syllabus page.)
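For readers curious where a value like 0.109 can come from, here is a minimal sketch of an exact binomial calculation. It assumes one common convention for the two-sided P-value (add up the probabilities of every outcome at least as far from the expected count as the observed one); the software mentioned in the slide may use a different method, and the function names are illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def exact_two_sided_p_value(successes, n, p0=0.5):
    """Sum the probabilities of all counts at least as far from n*p0 as the observed count."""
    expected = n * p0
    observed_distance = abs(successes - expected)
    return sum(
        binomial_pmf(k, n, p0)
        for k in range(n + 1)
        if abs(k - expected) >= observed_distance
    )

# Sandals example: 2 of 10 sampled students wear sandals, H0: proportion = 0.5
p_value = exact_two_sided_p_value(successes=2, n=10, p0=0.5)
print(f"exact two-sided P-value: {p_value:.3f}")  # prints 0.109
```

With n = 10 and p0 = 0.5, the outcomes counted are 0, 1, 2, 8, 9 and 10 successes, which together have probability 112/1024.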
21 Conclusion in the Example
- We have calculated a P-value of 0.109.
- This is not significant at the 5% level.
- So, we cannot reject the null hypothesis (at the 5% level).
- So, do we have enough evidence to refute the claim that the proportion of UK students wearing sandals is truly 50%?
- (Not yet.)
22 p-Values and Their Significance
- p-value < 0.01
  - Highly significant / overwhelming evidence
- 0.01 < p-value < 0.05
  - Significant / strong evidence
- 0.05 < p-value < 0.1
  - Not significant / weak evidence
- p-value > 0.1
  - Not significant / no evidence
23
- Failing to reject H0 can be due to one of two reasons (sometimes both):
- (1) The sample size is too small, so you can hardly reject anything (not enough information). This is the case in the example.
- (2) There is truly no difference, even when the sample size is big enough.
24 Decisions and Types of Errors in Tests of Hypotheses
- Terminology
  - The alpha-level (significance level) is a number such that one rejects the null hypothesis if the p-value is less than or equal to it. The most common alpha-levels are .05 and .01.
  - The choice of the alpha-level reflects how cautious the researcher wants to be.
25 Type I and Type II Errors
- Type I error: the null hypothesis is rejected, even though it is true.
- Type II error: the null hypothesis is not rejected, even though it is false.
26 Type I and Type II Errors
27 Type I and Type II Errors
- Terminology
  - Alpha = probability of a Type I error
  - Beta = probability of a Type II error
  - Power = 1 - probability of a Type II error
- For given data, the smaller the probability of a Type I error, the larger the probability of a Type II error and the smaller the power.
- If you need very strong evidence to reject the null hypothesis (set alpha small), it is more likely that you fail to detect a real difference (larger beta).
28
- When the sample size increases, both error probabilities can be made to decrease.
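The trade-off between alpha, beta, and sample size can be made concrete with a small calculation. The sketch below assumes a two-sided z test with a known population standard deviation; the numbers (H0 mean 100, true mean 103, sigma 10) and the function name `power_two_sided_z` are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sided_z(mu0, mu_true, sigma, n, alpha):
    """Power of a two-sided z test of H0: mu = mu0 when the true mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)     # reject H0 when |z| >= z_crit
    shift = (mu_true - mu0) / (sigma / sqrt(n))    # center of z when the true mean is mu_true
    # P(reject H0) = P(z >= z_crit) + P(z <= -z_crit), with z normal around `shift`
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

# Hypothetical setting: H0 says mu = 100, the truth is mu = 103, sigma = 10
for n in (30, 120):
    for alpha in (0.05, 0.01):
        power = power_two_sided_z(mu0=100, mu_true=103, sigma=10, n=n, alpha=alpha)
        print(f"n = {n:3d}, alpha = {alpha:.2f}: power = {power:.3f}, beta = {1 - power:.3f}")
```

For a fixed n, lowering alpha from 0.05 to 0.01 lowers the power (raises beta); raising n raises the power at either alpha.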
29 Type I and Type II Errors
- In practice, alpha is specified, and the probability of a Type II error could be calculated, but the calculations are usually difficult.
- How to choose alpha?
- If the consequences of a Type I error are very serious, then choose a small alpha, like 0.01.
- For example, you want to find evidence that someone is guilty of a crime.
- In exploratory research, often a larger probability of a Type I error is acceptable (like 0.05 or even 0.1).
30 11.2 Significance Test for a Mean
- Example
  - The mean age at first marriage for married men in a New England community was 28 years in 1790.
  - For a random sample of 40 married men in that community in 1990, the sample mean and standard deviation of age at first marriage were 26 and 9, respectively.
  - Q: Has the mean changed significantly?
31 Significance Test for a Mean
- Assumptions
  - What type of data?
    - Quantitative, continuous
  - What is the population distribution?
    - No special assumptions. The hypothesis refers to the population mean of the quantitative variable.
  - Which sampling method has been used?
    - Simple random sampling
  - What is the sample size?
    - Minimum sample size of n = 30 to use the Central Limit Theorem for the sample mean
32
- Because the hypothesis is about the (population) mean, we should study the sample mean, or a test statistic constructed from it.
- Also, the Central Limit Theorem says the sample mean will be approximately normally distributed for large sample sizes.
33 Significance Test for a Mean
- Hypotheses
  - The null hypothesis has the form $H_0: \mu = \mu_0$, where $\mu_0$ is an a priori (before taking the sample) specified number like 28 (years), or 0, or 5.3, etc.
  - The most common alternative hypothesis is $H_a: \mu \neq \mu_0$.
  - This is called a two-sided hypothesis, since it includes values falling above and below the null hypothesis.
34 Significance Test for a Mean
- Test Statistic
  - The hypothesis is about the population mean.
  - So, a natural test statistic would be the sample mean.
  - The sample mean has, for a sample size of at least n = 30, an approximately normal sampling distribution.
  - The parameters of the sampling distribution are, under the null hypothesis,
    - Mean = $\mu_0$ (that is, the sampling distribution is centered around the hypothesized mean)
    - Standard error = $\sigma/\sqrt{n}$, estimated by $s/\sqrt{n}$
35 Significance Test for a Mean
- Test Statistic
  - Then, the z-score $z = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ has approximately a standard normal distribution.
  - The z-score measures how many estimated standard errors the sample mean falls from the hypothesized population mean.
  - The farther the sample mean falls from $\mu_0$, the larger the absolute value of the z test statistic, and the stronger the evidence against the null hypothesis.
36 Significance Test for a Mean
- p-Value
  - The p-value has the advantage that results from different tests can be compared: the p-value is always a number between 0 and 1.
  - The p-value can be obtained from Table B3: it is the probability that a standard normal distribution takes values more extreme than the observed z-score.
  - The smaller the p-value is, the stronger is the evidence against the null hypothesis and in favor of the alternative hypothesis.
37 Significance Test for a Mean
- Example again
  - The mean age at first marriage for married men in a New England community was 28 years in 1790.
  - For a random sample of 40 married men in that community in 1990, the sample mean and standard deviation of age at first marriage were 26 and 9, respectively.
  - State the hypotheses, find the test statistic and P-value for testing whether the mean has changed. Interpret.
  - Make a decision, using a significance level of 5%.
38
- (2-sided) P-value = 2 × 0.08 = 0.16
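The marriage-age example can be worked through numerically as follows. This is a sketch using the large-sample z statistic (sample mean minus hypothesized mean, divided by s over the square root of n) and Python's `NormalDist` in place of the normal table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 28     # hypothesized mean age (the 1790 value), H0: mu = 28
x_bar = 26   # sample mean in 1990
s = 9        # sample standard deviation
n = 40       # sample size
alpha = 0.05 # significance level

std_error = s / sqrt(n)                     # estimated standard error of the sample mean
z_obs = (x_bar - mu0) / std_error           # observed z statistic
one_tail = NormalDist().cdf(-abs(z_obs))    # area beyond |z| in one tail
p_value = 2 * one_tail                      # two-sided: "has the mean changed?"

decision = "reject H0" if p_value <= alpha else "do not reject H0"
print(f"z = {z_obs:.2f}, one tail = {one_tail:.2f}, P-value = {p_value:.2f}: {decision}")
```

The one-tail area of about 0.08 doubles to the two-sided P-value of about 0.16, which is larger than 0.05, so the change is not significant at the 5% level.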
39 One-Sided Versus Two-Sided Test
- Two-sided tests are more common
- Look for formulations like
  - test whether the mean has changed
  - test whether the mean has increased
  - test whether the mean is the same
  - test whether the mean has decreased
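How the wording of the question changes the P-value can be sketched in code. The mapping used here is the usual reading, stated as an assumption: "has changed" or "is the same" points to a two-sided test, while "has increased" or "has decreased" points to a one-sided test. The function name `z_test_p_value` and the example z value are hypothetical.

```python
from statistics import NormalDist

def z_test_p_value(z_obs, alternative="two-sided"):
    """P-value of a z test for the chosen form of the alternative hypothesis."""
    std_normal = NormalDist()
    if alternative == "two-sided":   # "has changed" / "is the same"
        return 2 * std_normal.cdf(-abs(z_obs))
    if alternative == "greater":     # "has increased"
        return 1 - std_normal.cdf(z_obs)
    if alternative == "less":        # "has decreased"
        return std_normal.cdf(z_obs)
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

z = -1.41  # hypothetical observed z statistic
for alt in ("two-sided", "greater", "less"):
    print(f"{alt:>9}: P-value = {z_test_p_value(z, alt):.3f}")
```

For a negative z, the "less" P-value is half of the two-sided one, and the "greater" P-value is large because the data point in the opposite direction.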
40 Summary: Large Sample Significance Test for a Mean
41 Attendance Survey Question 24
- On a 4x6 index card
- Please write down your name and section number
- Today's Question