Title: Review
1Review
- Central Limit Theorem
- Confidence interval (CI) for p
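2- The sampling distribution of $\bar{x}$ is narrower than the
population distribution, by a factor of $\sqrt{n}$.
[Figure: distribution of sample means $\bar{x}$ (n subjects) and population distribution of $x$ (individual subjects), both centered at $\mu$]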
3Review 95% Confidence Interval for a Proportion
We are 95% confident that p (the true population
proportion) lies in the interval.
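As a rough sketch of the interval recalled on this slide, the snippet below assumes the usual large-sample formula, sample proportion ± z × sqrt(p-hat(1 − p-hat)/n), with z = 1.96. The slide's own formula did not survive in this transcript, and some textbooks round the multiplier to 2. The function name `proportion_ci` is made up for illustration.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Approximate large-sample confidence interval for a proportion.

    z = 1.96 gives roughly 95% confidence (assumed; some texts use 2).
    """
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of p-hat
    return p_hat - z * se, p_hat + z * se

# Illustrative numbers taken from the polling example later in these slides.
low, high = proportion_ci(175, 265)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```

The approximation is only reasonable when the sample is large enough for the sample proportion to be roughly normal.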
4The best reason ever to study math
5Chapters 22 and 23
- What Is a Test of Significance?
- Use and Abuse of Statistical Inference
6What is a hypothesis?
7What is a hypothesis?
8Scientific method
1. Define the question
2. Gather information and resources
3. Form hypothesis
4. Plan experiment
5. Do experiment and collect data
6. Analyze data
7. Interpret data and draw conclusions that serve as a starting point for new hypotheses
8. Communicate results
9Using Data to Make Decisions
- Examining Confidence Intervals.
- Hypothesis Tests
- Is the sample data statistically significant, or
could it have happened by chance?
10Steps for Testing Hypotheses
- Determine the null hypothesis and the alternative hypothesis.
- Collect data and summarize with a single number called a test statistic.
- Determine how unlikely the test statistic would be if the null hypothesis were true.
- Make a decision.
11Determine the hypotheses.
- Null hypothesis: the hypothesis that says nothing is happening (status quo, no relationship, chance only).
- Alternative (research) hypothesis: the reason the data are being collected. The researcher suspects the status quo belief is incorrect, or that there is a relationship between two variables that has not been established before.
12Collect data and summarize with a test statistic.
- The decision in a hypothesis test is based on a single summary of the data: the test statistic.
- For example, the standardized z-score.
- Also will see t, F, r, and others.
13Determine how unlikely test statistic would be if
null hypothesis true
- p-value: If the null hypothesis is true, how likely are we to observe sample results of this magnitude or larger (in the direction of the alternative) just by chance?
- The p-value is often misinterpreted in the news.
14Make a Decision.
- p-value not small enough to convincingly rule out chance.
- We cannot reject the null hypothesis as an explanation for the results.
- There is no statistically significant difference or relationship evidenced by the data.
- p-value small enough to convincingly rule out chance.
- We reject the null hypothesis and accept the alternative hypothesis.
- There is a statistically significant difference or relationship evidenced by the data.
15Hypothesis test for p
Suppose 60% (0.60) of the population are in favor
of new tax legislation. A pollster takes a
sample of 265 people, of whom 175, or
0.66, are in favor. How likely is it that
we would see a sample proportion as large as 0.66
given a true population proportion of 0.60? Is this
highly unusual?
16Hypothesis test for p
The true population proportion is $p = 0.60$. Our
sample data yield $\hat{p} = 175/265 \approx 0.66$.
From the Rule for Sample Proportions, we know
the potential sample proportions in this
situation follow an approximately normal
distribution, with a mean of 0.60 and a standard
deviation of 0.03.
17Hypothesis test for p
If the sampling distribution of p-hat is normal with a
mean of 0.60 and a SD of 0.03, how many SDs
above the mean is 0.66? Recall from the chapter
on the normal distribution, what is the z-score
or standardized score of 0.66?
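The arithmetic behind this question can be sketched as follows. The snippet recomputes the standard deviation from the Rule for Sample Proportions, sqrt(p(1 − p)/n), rather than using the rounded 0.03, so the result is only approximately the round number one gets by hand. The variable names are illustrative.

```python
import math

p0 = 0.60          # claimed population proportion
n = 265            # sample size
p_hat = 175 / n    # observed sample proportion, about 0.66

# Standard deviation of the sample proportion under the Rule for Sample Proportions.
sd = math.sqrt(p0 * (1 - p0) / n)

# Standardized score: how many SDs the observed proportion lies above the mean.
z = (p_hat - p0) / sd
print(f"SD = {sd:.3f}, z = {z:.2f}")
```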
18Test for p Bell-Shaped Curve of Sample
Proportions (n = 265)
mean = 0.60, S.D. = 0.03
[Figure: bell-shaped curve with axis marks at 0.54, 0.57, 0.60, 0.63, and 0.66; about 2.5% of the area lies above 0.66]
19Test for p continued
Suppose that in the previous question we do not
know for sure that the proportion of the
population who favor the new tax legislation is
60%. Instead, this is just the claim of a
politician. From the data collected, we have
discovered that if the claim is true, then the
sample proportion observed falls at about the
98th percentile of possible sample proportions
for that sample size. Should we believe the
claim and conclude that we just observed strange
data, or should we reject the claim? What if the
result fell at the 85th percentile? At the
99.99th percentile?
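One way to put numbers on these percentiles, offered as a sketch rather than the lecture's own calculation: a sample proportion about 2 standard deviations above the claimed value sits near the 98th percentile of a normal curve, and the upper-tail probability for any percentile is simply what is left above it.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

# Percentile of a sample proportion that sits 2 SDs above the claimed mean.
percentile_of_z2 = standard_normal.cdf(2.0) * 100
print(f"z = 2 falls at about the {percentile_of_z2:.1f}th percentile")

# Chance of a result at least this extreme if the claim is true,
# for each percentile mentioned in the question.
upper_tail = {pct: 1 - pct / 100 for pct in (85, 98, 99.99)}
for pct, tail in upper_tail.items():
    print(f"{pct}th percentile -> upper-tail probability {tail:.4f}")
```

The smaller the upper-tail probability, the harder it is to believe the claim and blame the data on chance.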
20Bell-Shaped Curve of Sample Proportions (n = 265)
21The Null Hypothesis H0
- population parameter equals some value
- status quo
- no relationship
- no change
- no difference in two groups
- etc.
- When performing a hypothesis test, we assume that
the null hypothesis is true until we have
sufficient evidence against it
22The Alternative Hypothesis Ha
- population parameter differs from some value
- not status quo
- relationship exists
- a change occurred
- two groups are different
- etc.
23The Hypotheses for Proportions
- Null: $H_0: p = p_0$
- One-sided alternatives
- $H_a: p > p_0$
- $H_a: p < p_0$
- Two-sided alternative
- $H_a: p \neq p_0$
24Example from class data. Our hypotheses: do we
have a majority?
- Null: $H_0: p = 0.5$
- Alt: $H_a: p > 0.5$
25Sampling Distribution for Proportions
If numerous simple random samples of size n are
taken, the sample proportions ($\hat{p}$) from the
various samples will have an approximately normal
distribution with mean equal to $p$ (the population
proportion) and standard deviation equal to $\sqrt{p(1-p)/n}$.
Since we assume the null hypothesis is true, we
replace $p$ with $p_0$ to complete the test.
26Test Statistic for Proportions
To determine if the observed proportion is
unlikely to have occurred under the assumption
that H0 is true, we must first convert the
observed value to a standardized score
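The formula on this slide did not survive extraction. A minimal sketch, assuming the standard form $z = (\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n}$ implied by the sampling distribution described above, is shown below with the tax-legislation poll as input. The function name `z_statistic` is hypothetical.

```python
import math

def z_statistic(p_hat, p0, n):
    """Standardized score for a sample proportion, assuming H0: p = p0."""
    standard_error = math.sqrt(p0 * (1 - p0) / n)  # uses p0, not p_hat
    return (p_hat - p0) / standard_error

# Tax-legislation poll from the earlier slides: 175 of 265 in favor, p0 = 0.60.
z = z_statistic(175 / 265, 0.60, 265)
print(f"z = {z:.2f}")
```

The standard error uses the null value because the test asks how unusual the data would be if the null hypothesis were true.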
27Test Statistic
- Based on the sample
- n = ___ (large, so proportions follow normal distribution)
- Sample proportion: $\hat{p}$ = ___
- Standard error of $\hat{p}$: ___ (where 0.50 is $p_0$ from the null hypothesis)
- Standardized score (test statistic)
- z = (___ - 0.50) / SE = ____
28P-value
- The P-value is the probability of observing data this extreme or more so in a sample of this size, assuming that the null hypothesis is true.
- A small P-value indicates that the observed data (or relationship) is unlikely to have occurred if the null hypothesis were actually true.
- The P-value tends to be small when there is evidence in the data against the null hypothesis.
- The P-value is NOT the probability that the null hypothesis is true.
29P-value for Testing Proportions
- $H_a: p > p_0$
- When the alternative hypothesis includes a greater than (>) symbol, the P-value is the probability of getting a value as large or larger than the observed test statistic (z) value.
- The area in the right tail of the bell curve
30P-value for Testing Proportions
- $H_a: p < p_0$
- When the alternative hypothesis includes a less than (<) symbol, the P-value is the probability of getting a value as small or smaller than the observed test statistic (z) value.
- The area in the left tail of the bell curve (the same as the percentile value)
31P-value for Testing Proportions
- $H_a: p \neq p_0$
- When the alternative hypothesis includes a not equal to (≠) symbol, the P-value is twice as large as for a one-sided test (the sign of z could go either way, we just believe there is a difference).
- The area in both tails of the bell curve
- Double the area in one tail (symmetry)
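The three cases on these slides can be sketched as tail areas under a standard normal curve. This is an illustration under the normal approximation used throughout the lecture. The function name `p_value` and the labels `"greater"`, `"less"`, and `"two-sided"` are made up for this example.

```python
from statistics import NormalDist

def p_value(z, alternative):
    """P-value for a z test statistic under a standard normal curve.

    alternative: "greater" (Ha: p > p0), "less" (Ha: p < p0),
    or "two-sided" (Ha: p != p0). These labels are illustrative.
    """
    cdf = NormalDist().cdf
    if alternative == "greater":
        return 1 - cdf(z)              # area in the right tail
    if alternative == "less":
        return cdf(z)                  # area in the left tail
    if alternative == "two-sided":
        return 2 * (1 - cdf(abs(z)))   # both tails, by symmetry
    raise ValueError(f"unknown alternative: {alternative!r}")

for alt in ("greater", "less", "two-sided"):
    print(f"z = 2.0, {alt}: P-value = {p_value(2.0, alt):.4f}")
```

Note how the same z gives a very different P-value when the alternative points the other way.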
32P-value
$H_a: p > 0.50$
33Decision
- If we think the P-value is too low to believe the observed test statistic is obtained by chance only, then we would reject chance (reject the null hypothesis) and conclude that a statistically significant relationship exists (accept the alternative hypothesis).
- Otherwise, we fail to reject chance and do not reject the null hypothesis of no relationship (result not statistically significant).
34Typical Cut-off for the P-value
- Commonly, P-values less than 0.05 are considered to be small enough to reject chance (reject the null hypothesis).
- Some researchers use 0.10 or 0.01 as the cut-off instead of 0.05.
- This cut-off value is typically referred to as the significance level (alpha) of the test.
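The decision rule amounts to one comparison. The sketch below uses a hypothetical P-value of 0.023 (roughly the right-tail area for z = 2) to show how the conclusion depends on the chosen cut-off. The function name `decide` is illustrative.

```python
def decide(p_value, alpha=0.05):
    """Compare a P-value with the significance level alpha."""
    if p_value < alpha:
        return "reject the null hypothesis (statistically significant)"
    return "fail to reject the null hypothesis (not statistically significant)"

# The same P-value can lead to different decisions at different cut-offs.
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha}: {decide(0.023, alpha)}")
```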
35Decision
- We do not find the result to be statistically significant.
- We fail to reject the null hypothesis. It is plausible that there was not a majority.
- or
- We do find the result to be statistically significant.
- We reject the null hypothesis. The data support the hypothesized majority.
36- What numerical value gives you the answer to the question of how unlikely the test statistic would be if the null hypothesis were true?
- The p-value
- The confidence interval
- The sample standard deviation
37Types of errors in decision making
In the courtroom, juries must make a decision about the guilt or innocence of a defendant. Which mistake is more serious?
A. The jury claims the suspect is guilty when in fact he or she is innocent.
B. The jury claims the suspect is not guilty when in fact he or she is guilty.
38Example A Jury Trial
If on a jury, you must presume the defendant is innocent unless there is enough evidence to conclude he or she is guilty.
Null hypothesis: Defendant is innocent.
Alternative hypothesis: Defendant is guilty.
- Trial held because prosecution believes status quo of innocence is incorrect.
- Prosecution collects evidence, like researchers collect data, in hope that jurors will be convinced that such evidence is extremely unlikely if the assumption of innocence were true.
39The Two Types of Errors
- Courtroom analogy: potential choices and errors
- I. We believe there is enough evidence to conclude the defendant is guilty.
- Potential error: an innocent person is falsely convicted and the guilty party remains free.
- Usually seen as more serious.
- II. We cannot rule out that the defendant is innocent, so he or she is set free without penalty.
- Potential error: a criminal has been erroneously freed.
40The Two Types of Errors in Testing
- Type I error can only be made if the null hypothesis is actually true.
- Type II error can only be made if the alternative hypothesis is actually true.
41Decision Errors Type I
- If we decide there is a relationship in the population (reject null hypothesis):
- This is an incorrect decision only if the null hypothesis is true.
- The probability of this incorrect decision is equal to the cut-off ($\alpha$) for the P-value.
- If the null hypothesis is true and the cut-off is 0.05:
- There really is no relationship, and the extremity of the test statistic is due to chance.
- About 5% of all samples from this population will lead us to wrongly reject chance.
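The "about 5% of all samples" claim can be checked with a small simulation. This is a sketch under assumed settings that are not from the lecture: a true proportion of 0.5, samples of 265, a one-sided z test, and a fixed random seed. Because counts are whole numbers, the simulated rate should be close to, but not exactly, 5%.

```python
import math
import random
from statistics import NormalDist

def rejects_null(successes, n, p0, alpha=0.05):
    """One-sided z test of H0: p = p0 against Ha: p > p0."""
    z = (successes / n - p0) / math.sqrt(p0 * (1 - p0) / n)
    return 1 - NormalDist().cdf(z) < alpha

def type_one_error_rate(p0=0.5, n=265, trials=2000, seed=1):
    """Fraction of simulated samples that reject H0 when H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        successes = sum(rng.random() < p0 for _ in range(n))
        rejections += rejects_null(successes, n, p0)
    return rejections / trials

rate = type_one_error_rate()
print(f"Rejected a true null in {rate:.1%} of samples")
```

Every rejection counted here is a Type I error, since the samples were generated with the null hypothesis true.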
42Decision Errors Type II
- If we decide not to reject chance and thus allow for the plausibility of the null hypothesis:
- This is an incorrect decision only if the alternative hypothesis is true.
- The probability of this incorrect decision depends on
- the magnitude of the true relationship,
- the sample size,
- the cut-off for the P-value.
43Power of a Test
- This is the probability that the sample we collect will lead us to reject the null hypothesis when the alternative hypothesis is true.
- The power is larger for larger departures of the alternative hypothesis from the null hypothesis (magnitude of difference).
- The power may be increased by increasing the sample size.
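Both statements about power can be illustrated with a short calculation. The sketch assumes a one-sided z test for a proportion with the normal approximation, and the null value 0.5, true proportions, and sample sizes are hypothetical. The function name `power_one_sided` is made up for this example.

```python
import math
from statistics import NormalDist

def power_one_sided(p0, p_true, n, alpha=0.05):
    """Approximate power of the z test of H0: p = p0 vs Ha: p > p0.

    Uses the normal approximation for the sample proportion.
    """
    normal = NormalDist()
    z_cut = normal.inv_cdf(1 - alpha)                   # rejection cut-off for z
    p_cut = p0 + z_cut * math.sqrt(p0 * (1 - p0) / n)   # same cut-off on the p-hat scale
    sd_true = math.sqrt(p_true * (1 - p_true) / n)      # SD of p-hat when p = p_true
    return 1 - normal.cdf((p_cut - p_true) / sd_true)

for p_true, n in [(0.55, 100), (0.60, 100), (0.55, 400)]:
    print(f"true p = {p_true}, n = {n}: power = {power_one_sided(0.5, p_true, n):.2f}")
```

Comparing the first row with the second shows the effect of a larger departure from the null, and comparing it with the third shows the effect of a larger sample.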
44Inference for Population Means: Hypothesis Testing
- The last part of this chapter discusses the situation in which the interest is in carrying out hypothesis tests about population means rather than population proportions.
- DO NOT worry about the details for inference on means; just try to get the main idea of hypothesis testing.
45Test for means
One of the conclusions made by researchers from a
study comparing the amount of bacteria in
carpeted and uncarpeted rooms was, "The average
difference in mean bacteria colonies per cubic
foot was 3.48 colonies (95% CI: -2.72, 9.68;
P-value = 0.29)." What are the null and
alternative hypotheses being tested here? Is
there a statistically significant difference
between the means of the two groups?
46Answer
- Null: The mean number of bacteria for carpeted rooms is equal to the mean number of bacteria for uncarpeted rooms.
- Alt: The mean number of bacteria for carpeted rooms is different from the mean number of bacteria for uncarpeted rooms.
- The P-value is large (> 0.05), so there is not a significant difference (fail to reject the null hypothesis).
- Also, the confidence interval for the difference contains 0.
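The link between the interval and the P-value can be sketched numerically. The snippet assumes the reported interval is a normal-based one (half-width = 1.96 × standard error), which is an assumption and not something the study states. Under that assumption the two-sided P-value should land in the neighborhood of the reported 0.29, and any gap would be expected if the study used a different reference distribution.

```python
from statistics import NormalDist

estimate = 3.48               # reported mean difference (colonies per cubic foot)
lower, upper = -2.72, 9.68    # reported 95% confidence interval

# A 95% CI that contains 0 means 0 is a plausible value for the true difference.
contains_zero = lower <= 0 <= upper

# Back out an approximate standard error, assuming a normal-based interval
# (half-width = 1.96 * SE). This is an assumption, not the study's method.
standard_error = (upper - lower) / 2 / 1.96
z = estimate / standard_error
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"CI contains 0: {contains_zero}")
print(f"approximate z = {z:.2f}, two-sided P-value = {p_two_sided:.2f}")
```

Both routes point the same way: the interval covers 0 and the P-value is well above 0.05.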
47The Hypotheses for a Single Mean
- Null: $H_0: \mu = \mu_0$
- One-sided alternatives
- $H_a: \mu > \mu_0$
- $H_a: \mu < \mu_0$
- Two-sided alternative
- $H_a: \mu \neq \mu_0$
48The Hypotheses for a Difference in Two Means
- Null: $H_0: \mu_{diff} = \mu_{diff,0}$ (usually 0)
- One-sided alternatives
- $H_a: \mu_{diff} > \mu_{diff,0}$
- $H_a: \mu_{diff} < \mu_{diff,0}$
- Two-sided alternative
- $H_a: \mu_{diff} \neq \mu_{diff,0}$
49P-value in one-sided and two-sided tests
One-sided (one-tailed) test
Two-sided (two-tailed) test
50Chapter 23
- Use and Abuse of Statistical Inference
51Question
When presenting the results of a study, would it
be sufficient to only report the P-value? Why
would it be a good idea to also give a confidence
interval based on the results?
52Warnings about Reports on Hypothesis Tests: Data Origins
- For any statistical analysis to be valid, the
data must come from proper samples. Complex
formulas and techniques cannot fix bad (biased)
data. In addition, be sure to use an analysis
that is appropriate for the type of data
collected.
53Warnings about Reports on Hypothesis Tests: P-value or C.I.?
- P-values provide information as to whether
findings are more than just good luck, but
P-values alone may be misleading or leave out
valuable information (as seen later in this
chapter). Confidence intervals provide both the
estimated values of important parameters and how
uncertain the estimates are.
54Warnings about Reports on Hypothesis Tests: Significance
- If the word "significant" is used to try to
convince you that there is an important effect or
relationship, determine if the word is being used
in the usual sense or in the statistical sense
only.
55Warnings about Reports on Hypothesis Tests: Large Sample
- If a study is based on a very large sample size,
relationships found to be statistically
significant may not have much practical
importance.
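A sketch of why this happens, using hypothetical numbers rather than data from any study: the observed proportion stays at 0.51 against a null value of 0.50, a one-point difference that may not matter in practice, while only the sample size changes. The function name `one_sided_p_value` is illustrative.

```python
import math
from statistics import NormalDist

def one_sided_p_value(p_hat, p0, n):
    """P-value for H0: p = p0 vs Ha: p > p0 using the z test for a proportion."""
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return 1 - NormalDist().cdf(z)

# Hypothetical numbers: the observed proportion is 0.51 in every case,
# only one percentage point above the null value of 0.50.
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: P-value = {one_sided_p_value(0.51, 0.50, n):.4f}")
```

The size of the effect never changes, yet the P-value shrinks as n grows, which is why a confidence interval for the effect is worth reporting alongside it.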
56Warnings about Reports on Hypothesis Tests: Small Sample
- If you read that "no difference" or "no relationship"
has been found in a study, try to determine the
sample size used. Unless the sample size was
large, remember that it could be that there is
indeed an important relationship in the
population, but that not enough data were
collected to detect it. In other words, the test
could have had very low power.
57Warnings about Reports on Hypothesis Tests: 1- or 2-Sided
- Try to determine whether the test was one-sided
or two-sided. If a test is one-sided, and
details are not reported, you could be misled
into thinking there was no difference, when in
fact there was one in the direction opposite to
that hypothesized.
58Warnings about Reports on Hypothesis Tests: Only Significant Results Are Reported?
- Sometimes researchers will perform a multitude of
tests, and the reports will focus on those that
achieved statistical significance. Remember that
if nothing interesting is happening and all of
the null hypotheses tested are true, then about
1 in 20 (.05) tests should achieve statistical
significance just by chance. Beware of reports
where it is evident that many tests were
conducted, but where results of only one or two
are presented as significant.
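The "1 in 20" warning can be made concrete with a short calculation. The sketch assumes the tests are independent and every null hypothesis is true, which is a simplification. The count of 10 variables echoes the quiz question that follows, and the function name `chance_of_false_positive` is made up for illustration.

```python
import math

def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests true nulls is rejected.

    Assumes the tests are independent, which real studies often violate.
    """
    return 1 - (1 - alpha) ** num_tests

# Example: every pairwise correlation among 10 variables.
num_tests = math.comb(10, 2)
expected_false_positives = 0.05 * num_tests
print(f"{num_tests} tests, expect about {expected_false_positives:.2f} significant by chance")
print(f"P(at least one) = {chance_of_false_positive(num_tests):.2f}")
```

With this many tests, a single significant result is close to what chance alone would produce.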
59- Which of the following conclusions should make you suspicious as an educated consumer of statistical information?
- a. Based on our sample results we know there is no relationship between these two variables in the population.
- b. We looked at all possible correlations between these 10 variables, and this was the only one that was significant, signifying its tremendous importance.
- c. All of the above.
60Drug Use in American High Schools
Alcohol Use
Bogert, Carroll. Good news on drugs from the
inner city, Newsweek, Feb. 1995, pp. 28-29.
61Drug Use in American High Schools
- Alternative Hypothesis: The percentage of high school students who used alcohol in 1993 is less than the percentage who used alcohol in 1992.
- Null Hypothesis: There is no difference in the percentage of high school students who used alcohol in 1993 and in 1992.
62Drug Use in American High Schools
1993 survey was based on 17,000 seniors, 15,500
10th graders and 18,500 8th graders.
63Drug Use in American High Schools
- The article suggests that the survey reveals good news since the differences are all negative.
- The differences are significant.
- statistically?
- practically?
64Quitting Smoking with Nicotine Patches
Compared the smoking cessation rates for smokers
randomly assigned to use a nicotine patch versus
a placebo patch.
Null hypothesis: The proportions of smokers in the population who would quit smoking using a nicotine patch and using a placebo patch are the same.
Alternative hypothesis: The proportion of smokers in the population who would quit smoking using a nicotine patch is higher than the proportion who would quit using a placebo patch.
65Quitting Smoking with Nicotine Patches
Higher smoking cessation rates were observed in
the active nicotine patch group at 8 weeks (46.7%
vs 20%) (P < .001) and at 1 year (27.5% vs 14.2%)
(P = .011). (Hurt et al., 1994, p. 595)
Conclusion: the p-values are quite small (less than
0.001 for the difference after 8 weeks and equal to
0.011 for the difference after a year). Therefore,
rates of quitting are significantly higher using
a nicotine patch than using a placebo patch after
8 weeks and after 1 year.
66Study Proves New Bicycle Seat More Healthy?
FT. LAUDERDALE, FL - A traditional bicycle seat is
more likely to cause male sexual dysfunction and
a variety of other health problems than a new
seat with a revolutionary design, according to
Medicine and Science In Sports and Exercise, the
official journal of the American College of
Sports Medicine. Results of the 2004 study
concluded that the typical sport/racing saddle
with a narrow protruding nose causes twice the
pressure in the perineal region as the Solution
Bicycle Seat, a unique new model without the
nose. For the study, 33 bicycle police patrol
officers pedaled a stationary bicycle at a
controlled cadence and workload, sitting on a
variety of bicycle seats. Previous evolutions of
the traditional seats were tested, including the
split-horn design with gel-strips said to reduce
shock resistance and make the horn softer. Yet
even these seats did not eliminate the pressure
to the perineal region. The study concluded that
it is the seat design, not the padding that makes
the difference.
Source of News: Solution Bicycle Seat
67Effect of bicycle saddle designs on the pressure
to the perineum of the bicyclist. Med Sci Sports
Exerc 2004 Jun; Vol. 36 (6), pp. 1055-62.
METHODS: Saddle, pedal, and handlebar
contact pressure were measured from 33 bicycle
police patrol officers pedaling a stationary
bicycle at a controlled cadence and workload.
Pressure was characterized over the saddle as a
whole and over a region of the saddle assumed to
represent pressure on the cyclist's perineum
located anteriorly to the ischial tuberosities.
RESULTS: The traditional sport/racing saddle
was associated with more than two times the
pressure in the perineal region than the saddles
without a protruding nose (P < 0.01). There were
no significant differences in perineal pressure
among the nontraditional saddles. Measures of
load on the pedals and handlebars indicated no
differences between the traditional saddle and
those without protruding noses. This finding is
contradictory to those studies suggesting a shift
toward greater weight distribution on the
handlebars and pedals when using a saddle without
a nose.
68For the study
- The explanatory variable is pressure and the response variable is seat type.
- The explanatory variable is seat type and the response variable is pressure.
- The explanatory variable is bicycle police and the response variable is erectile dysfunction.
69Given what we know so far, this study was likely
to be
- A sample survey
- An observational study
- An experiment where the same experimental units
were used for all treatments.
70The news report on the research is suspect
because
- The source of the news is a bicycle seat company.
- The extent or the size of the claimed difference is not reported.
- The results are extended to a population that may not be represented by the sample.
- All of the above.
71The traditional sport/racing saddle was
associated with more than two times the pressure
in the perineal region than the saddles without a
protruding nose (P < 0.01).
- Given the p-value listed, we know that the null hypothesis was
- Rejected.
- Not rejected.
The p-value tells us
A) the probability of getting a sample result as extreme or more extreme than the one we saw if H0 were true.
B) the probability that Ha is true.
72This study proved beyond a doubt that
- The seat design, not the padding, makes the difference.
- Bicycle police patrol officers have a high rate of erectile dysfunction.
- Nothing was definitively proven by this one study, but the researchers did find a relationship between saddle type and perineal pressure that is newsworthy for consumers and merits further research.
73Key concepts
- Steps of Hypothesis Testing
- P-values and Statistical Significance
- Decision Errors
- Statistically significant vs practically important
- Large/Small Samples and Statistical Significance
- Multiple Tests and Statistical Significance
74April 29, in class: final exam (cumulative)
- Next week in class: comprehensive review, extra
credit!!