Title: Use and Abuse of Statistical Inference
Chapter 23
- Use and Abuse of Statistical Inference
Thought Question 1
Which do you think is more informative when you
give the results of a study, a confidence
interval or a P-value? Explain.
Thought Question 2
Suppose you were to read that a new study had
found that there was no difference in heart
attack rates for men who exercised regularly and
men who did not. What would you suspect was the
reason for that finding? Do you think the study
found exactly the same rate of heart attacks for
the two groups of men?
Thought Question 3
The results of a public opinion poll led to the
conclusion that a majority of Americans did not
think Bill Clinton had the honesty and integrity
they expected in a president. Would it be fair
reporting to claim that significantly fewer than
50% of Americans think Bill Clinton has the
honesty and integrity they expect in a
president? Explain.
Thought Question 3 Answer
- n = 518
- 233 think Clinton has the honesty and integrity
they expect in a president.
- Sample proportion: 233/518 = 0.45
- 95% C.I.: 0.406 to 0.494
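The interval on this slide can be checked with a short sketch. It assumes the common approximation of sample proportion plus or minus 2 standard errors (the slide's endpoints are reproduced with a multiplier of 2 rather than 1.96); the function name `proportion_ci` is hypothetical.

```python
import math

def proportion_ci(successes, n, multiplier=2.0):
    """Approximate C.I. for a proportion: p_hat +/- multiplier * standard error.

    multiplier=2.0 is an assumption chosen to match the slide's rounding.
    """
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - multiplier * se, p_hat + multiplier * se

# Poll numbers from the slide: 233 of 518 respondents.
low, high = proportion_ci(233, 518)
print(f"95% C.I.: {low:.3f} to {high:.3f}")
```

Because the whole interval lies below 0.50, the data are consistent with the claim that fewer than 50% hold this opinion, in the statistical sense of "significantly."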
Warnings about Reports on Hypothesis Tests: Data
Origins
- For any statistical analysis to be valid, the
data must come from proper samples. Complex
formulas and techniques cannot fix bad (biased)
data. In addition, be sure to use an analysis
that is appropriate for the type of data
collected.
Warnings about Reports on Hypothesis Tests:
P-value or C.I.?
- P-values provide information as to whether
findings are more than just good luck, but
P-values alone may be misleading or leave out
valuable information (as seen later in this
chapter). Confidence intervals provide both the
estimated values of important parameters and how
uncertain the estimates are.
Warnings about Reports on Hypothesis Tests:
Significance
- If the word "significant" is used to try to
convince you that there is an important effect or
relationship, determine if the word is being used
in the usual sense or in the statistical sense
only.
Case Study: Patient Satisfaction
"Women Doctors Fare Better in Patient Survey,"
reported in the Sacramento Bee, April 26, 1995
Bertakis, Klea D., et al., "The influence of
gender on physician practice style," Medical
Care, Vol. 33, No. 4, 1995, pp. 407-416.
Case Study: Patient Satisfaction
- Alternative (Research) Hypothesis: The mean
satisfaction rating by patients who first saw a
female physician is different from the mean
satisfaction rating by patients who first saw a
male physician.
- Null Hypothesis: There is no difference in the
mean satisfaction rating by patients who first
saw a female physician and the mean satisfaction
rating by patients who first saw a male physician.
Case Study: Patient Satisfaction
- The alternative hypothesis is two-sided.
- Study was double blinded (neither patients nor
physicians were told the purpose of the survey).
- Survey was completed by 250 patients at the
University of California at Davis Medical Center
who rated medical residents on a scale of 1 to 5
(very dissatisfied to very satisfied).
Case Study: Patient Satisfaction
- Bee: The female physicians received an average
score of 4.27. The men -- a respectable, yet
significantly lower score of 4.05.
- The average difference was 0.22.
- Medical Care: the difference was small but
statistically significant (P-value = 0.02).
- Medical Care: This difference is both
statistically and clinically significant.
Warnings about Reports on Hypothesis Tests: Large
Sample
- If a study is based on a very large sample size,
relationships found to be statistically
significant may not have much practical
importance.
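A minimal sketch of this warning, using a pooled two-proportion z-test. The proportions (51% vs. 50%) and the sample sizes are hypothetical numbers chosen for illustration, not data from any study in this chapter; `two_proportion_z_test` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z-test for a difference in proportions (pooled standard error)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical: the same 1-percentage-point difference at two sample sizes.
z_small, p_small = two_proportion_z_test(0.51, 1_000, 0.50, 1_000)
z_large, p_large = two_proportion_z_test(0.51, 100_000, 0.50, 100_000)
print(f"n = 1,000 per group:   z = {z_small:.2f}, P-value = {p_small:.3f}")
print(f"n = 100,000 per group: z = {z_large:.2f}, P-value = {p_large:.6f}")
```

The size of the difference is identical in both runs; only the sample size changes whether it reaches statistical significance. Whether a 1-point difference matters in practice is a separate question.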
Case Study: Drug Use in American High Schools
Alcohol Use
Bogert, Carroll. "Good news on drugs from the
inner city," Newsweek, Feb. 1995, pp. 28-29.
Case Study: Drug Use in American High Schools
- Alternative Hypothesis: The percentage of high
school students who used alcohol in 1993 is less
than the percentage who used alcohol in 1992.
- Null Hypothesis: There is no difference in the
percentage of high school students who used
alcohol in 1993 and in 1992.
Case Study: Drug Use in American High Schools
The 1993 survey was based on 17,000 seniors,
15,500 10th graders, and 18,500 8th graders.
Case Study: Drug Use in American High Schools
- The article suggests that the survey reveals
good news since the differences are all
negative.
- The differences are significant.
- statistically?
- practically?
Warnings about Reports on Hypothesis Tests: Small
Sample
- If you read that "no difference" or "no relationship"
has been found in a study, try to determine the
sample size used. Unless the sample size was
large, remember that it could be that there is
indeed an important relationship in the
population, but that not enough data was
collected to detect it. In other words, the test
could have had very low power.
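Low power can be made concrete with a rough calculation. The sketch below approximates the power of a two-sided, two-sample z-test at the 0.05 level, assuming equal group sizes and a known common standard deviation. The effect size (0.3 standard deviations) and the group sizes are hypothetical, and `approx_power` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def approx_power(effect_size_sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    effect_size_sd: true difference in means, in standard-deviation units
    (an assumed value, since the true effect is never known in practice).
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Standard error of the difference in means, in standard-deviation units.
    se = math.sqrt(2 / n_per_group)
    shift = effect_size_sd / se
    # Probability the test statistic falls in either rejection region.
    return (1 - norm.cdf(z_crit - shift)) + norm.cdf(-z_crit - shift)

power_small = approx_power(0.3, 30)
power_large = approx_power(0.3, 200)
print(f"n = 30 per group:  power = {power_small:.2f}")
print(f"n = 200 per group: power = {power_large:.2f}")
```

Under these assumptions, a real but modest difference would usually go undetected with 30 subjects per group, so a report of "no difference" says little on its own.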
Case Study: Memory Loss
Memory Loss in American Hearing, American Deaf,
and Chinese Adults
Levy, B. and E. Langer. "Aging free from negative
stereotypes: Successful memory in China and among
the American deaf," Journal of Personality and
Social Psychology, Vol. 66, pp. 989-997.
Case Study: Memory Loss
- Average Memory Test Scores (higher is better)
- 30 subjects were sampled from each population
Case Study: Memory Loss
- Young Americans (hearing and deaf) have
significantly higher mean scores.
- Science News (July 2, 1994, p. 13):
"Surprisingly, ...memory scores for older and
younger Chinese did not statistically differ."
Case Study: Memory Loss
- Since the sample sizes are very small, there is
an increased chance that the test will result in
a Type II error if indeed there is a difference
between young and old subjects' mean memory
scores.
- The surprising result may just be a Type II
error.
- The test could have very low power.
Warnings about Reports on Hypothesis Tests: 1 or
2 Sided
- Try to determine whether the test was one-sided
or two-sided. If a test is one-sided, and
details aren't reported, you could be misled into
thinking there was no difference, when in fact
there was one in the direction opposite to that
hypothesized.
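A small sketch shows how this can happen. It takes a hypothetical z statistic of -2.5, meaning the observed effect runs opposite to a one-sided "greater than" alternative, and compares the one-sided and two-sided P-values. The value -2.5 and the helper name `p_values` are illustrative assumptions.

```python
from statistics import NormalDist

def p_values(z):
    """Return (one-sided upper-tail P-value, two-sided P-value) for a z statistic."""
    norm = NormalDist()
    one_sided_upper = 1 - norm.cdf(z)
    two_sided = 2 * (1 - norm.cdf(abs(z)))
    return one_sided_upper, two_sided

# Hypothetical result: a sizable effect in the direction opposite to the alternative.
one_sided, two_sided = p_values(-2.5)
print(f"one-sided P-value: {one_sided:.4f}")
print(f"two-sided P-value: {two_sided:.4f}")
```

The one-sided test reports nothing close to significance, while the two-sided test on the same data would flag a difference. A report that says only "not significant" hides that.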
Case Study: Seen a UFO?
"Seen a UFO? You May Be Healthier Than Your
Friends"
Roper Organization. Unusual Personal Experiences:
An Analysis of the Data from Three National
Surveys, Las Vegas: Bigelow Holding Corp., 1992.
Case Study: Seen a UFO?
- Research Hypothesis (Alternative): People who
claim to have seen a UFO are on average more
psychologically disturbed than those who make no
such claim.
- Null Hypothesis: People who claim to have seen a
UFO are on average no more or less
psychologically disturbed than those who make no
such claim.
Case Study: Seen a UFO?
- 49 subjects were recruited through a newspaper.
- 18 were UFO nonintense
- 31 were UFO intense
- 127 control subjects were recruited
- 74 students of a psychology class (receiving
credit for participation)
- 53 community members recruited through a newspaper
Case Study: Seen a UFO?
- New York Times (1993): "Study Finds No
Abnormality in Those Reporting UFOs."
- Results: UFO groups actually scored
significantly better (statistically) on many of
the psychological measures.
- The stated one-sided alternative hypothesis was
not supported. Does this mean the null
hypothesis is true?
Warnings about Reports on Hypothesis Tests: Only
Significant are Reported?
- Sometimes researchers will perform a multitude of
tests, and the reports will focus on those that
achieved statistical significance. Remember that
if nothing interesting is happening and all of
the null hypotheses tested are true, then about
1 in 20 (0.05) tests should achieve statistical
significance just by chance. Beware of reports
where it is evident that many tests were
conducted, but where results of only one or two
are presented as significant.
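The "1 in 20" figure can be extended to ask how likely at least one false alarm is. The sketch below assumes every null hypothesis is true and the tests are independent, which real studies rarely satisfy exactly; the function names are hypothetical.

```python
def expected_false_positives(num_tests, alpha=0.05):
    """Expected number of significant results when every null hypothesis is true."""
    return num_tests * alpha

def prob_at_least_one_false_positive(num_tests, alpha=0.05):
    """Chance of one or more significant results, assuming independent tests."""
    return 1 - (1 - alpha) ** num_tests

for k in (1, 5, 20, 100):
    print(
        f"{k:>3} tests: expect {expected_false_positives(k):.2f} false positives, "
        f"P(at least one) = {prob_at_least_one_false_positive(k):.2f}"
    )
```

With 20 independent tests, the chance of at least one spurious "significant" result is already well over one half, which is why one or two significant findings out of many deserve caution.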
Case Study: Spinach is Good?
"So You Thought Spinach Was Good for You?"
Nowak, R. "Beta-carotene: Helpful or harmful?"
Science, Vol. 264, April 22, 1994, pp. 500-501.
Case Study: Spinach is Good?
- Startling finding: Supplements of the
antioxidant beta-carotene markedly increased the
incidence of lung cancer among heavy smokers in
Finland.
- This is the result of a large, randomized
clinical trial: 29,000 cases
- But there were multiple tests conducted.
Key Concepts
- Difference between a statistically significant
effect and a practically important one
- Large Samples and Statistical Significance
- Small Samples and Statistical Significance
- Multiple Tests and Statistical Significance