Title: Inference about the difference of statistical analysis
1Chapter 9
- Inference about the difference of statistical
analysis
2Sec. 9.1 Introduction
- Experiment design: the procedure used to choose and assign subjects to the two groups.
- Two types of design for comparative experiments:
- Independent samples design: the sample assigned to group 1 is selected independently of the sample assigned to group 2.
- Matched design
3Samples from population 1 and samples from population 2
- 1a, 1b. Within each group, all observations are independent of each other. This is ok.
- 2. An observation in one group must be independent of an observation in the opposing group.
- We also need the measurements in each group to come from a normal distribution. This is probably ok.
4Example 9.1
- A survey of 436 workers showed that 192 of them
said that it was seriously unethical to monitor
employee e-mail. When 121 senior-level bosses
were surveyed, 40 said that it was seriously
unethical to monitor employee e-mail.
- Let $\pi_W$ and $\pi_B$ be the population proportions of workers and bosses, respectively, who feel it is unethical to monitor e-mail.
5Example 9.2
- A sample of 5 students was selected to take an SAT preparatory course. They took the SAT exam before they took the course and then they took it again after the course.
- Student      A    B    C    D    E
- SAT Before  700  840  830  860  840
- SAT After   720  840  820  900  870
- Let $\mu_B$ denote the mean SAT score before the course and $\mu_A$ the mean SAT score after the course.
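Because each student is measured twice, the natural quantity in this matched design is the per-student change in score. The following is a minimal Python sketch using the five scores from the table above; the variable names are illustrative and do not come from the slides.

```python
from statistics import mean

# SAT scores for students A-E, copied from the table above.
before = [700, 840, 830, 860, 840]
after = [720, 840, 820, 900, 870]

# In a matched design each "after" score is paired with the
# same student's "before" score, so we look at the differences.
differences = [a - b for a, b in zip(after, before)]
mean_difference = mean(differences)

print(differences)      # [20, 0, -10, 40, 30]
print(mean_difference)  # 16
```

The two columns are not independent samples, which is why the differences, rather than the two separate group means, carry the information here.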
6Sec. 9.2 Inference about difference between two
population proportions
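7Example 9.1 (continued): Find an 80% CI for $\pi_W - \pi_B$
- 1. First find a point estimate of this difference: $p_W - p_B$.
8- 2. The standard error of $p_W - p_B$ is estimated by
$$\widehat{SE} = \sqrt{\frac{p_W(1-p_W)}{n_W} + \frac{p_B(1-p_B)}{n_B}}$$
- $p_W = 192/436 = 0.4404$ and $p_B = 40/121 = 0.3306$. This gives a point estimate of $0.1098$ and a standard error of $p_W - p_B$ of
$$\sqrt{\frac{(0.4404)(0.5596)}{436} + \frac{(0.3306)(0.6694)}{121}} \approx 0.0489$$
9- 3. An 80% CI for $\pi_W - \pi_B$ is
$$(p_W - p_B) \pm z^* \, \widehat{SE}$$
- where $z^* = 1.282$ is the upper 0.10 critical value of the standard normal distribution.
- Hence the desired CI is $0.1098 \pm (1.282)(0.0489)$, i.e. approximately $(0.0471, 0.1725)$.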
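The interval for Example 9.1 can be checked numerically. The sketch below assumes the usual large-sample interval (difference of sample proportions plus or minus a normal critical value times the unpooled standard error); the function name `two_proportion_ci` is hypothetical, and only the Python standard library is used.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ci(x1, n1, x2, n2, confidence):
    """Large-sample CI for pi_1 - pi_2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Critical value z* with area (1 - confidence) / 2 in the upper tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# Example 9.1: 192 of 436 workers, 40 of 121 bosses, 80% confidence.
lower, upper = two_proportion_ci(192, 436, 40, 121, 0.80)
print(round(lower, 4), round(upper, 4))  # 0.0471 0.1725
```

Since the whole interval lies above 0, the data suggest that the proportion is higher among workers than among bosses at this confidence level.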
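10Confidence Interval for $\pi_1 - \pi_2$
- Application: 2 Bernoulli populations
- Assumption: independent samples, $n_1 > 30$ and $n_2 > 30$.
- A $(1-\alpha)100\%$ confidence interval for $\pi_1 - \pi_2$ is given by
$$(p_1 - p_2) \pm z_{\alpha/2} \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$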
11Exercise 9.1
- A study suggests that nicotine-laced gum helps
smokers to stop smoking. The study shows that 29
of 106 smokers who chewed nicotine-laced gum
remained smoke free for 1 year and 16 out of 100
smokers who chewed regular gum remained smoke
free for 1 year. Use this information to find a
98% confidence interval for the difference
between the proportions of smokers who
successfully use nicotine-laced gum and those who
successfully use regular gum.
12Large-Sample Hypothesis Test of $\pi_1 - \pi_2$
13Example 9.1 (continued)
- Given those data, is the evidence sufficient to suggest that a larger percentage of workers than bosses feel that it is unethical to monitor e-mail?
- Solution: 1. That is, test
$$H_0: \pi_W = \pi_B \quad \text{vs.} \quad H_a: \pi_W > \pi_B$$
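14- 2. Under $H_0$ the standardized test statistic is
$$z = \frac{p_W - p_B}{\sqrt{p(1-p)\left(\frac{1}{n_W} + \frac{1}{n_B}\right)}}$$
- where $p = (192+40)/(436+121) = 0.4165$ is the pooled estimate of the common proportion $\pi$.
- Plugging in $p_W = 192/436 = 0.4404$ and $p_B = 40/121 = 0.3306$ yields the observed value of the test statistic $z_{obs} \approx 2.17$.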
15- 3. Similar to the one-sample tests, we can make a decision by comparing the p-value to $\alpha$.
- Since p-value $= P(Z > 2.17) \approx 0.015 < 0.05$, we reject $H_0$ based on the data, i.e. there is significant evidence that a larger percentage of workers than bosses feel that it is unethical to monitor e-mail.
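The test for Example 9.1 can be reproduced in a few lines. This is a sketch under the assumption that the test is one-sided (workers greater than bosses) and uses the pooled proportion in the standard error, as is usual for this large-sample test; the function name `two_proportion_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """One-sided large-sample test of H0: pi_1 = pi_2 vs. Ha: pi_1 > pi_2."""
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 both samples share one proportion, estimated by pooling.
    pooled = (x1 + x2) / (n1 + n2)
    se0 = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z_obs = (p1 - p2) / se0
    p_value = 1 - NormalDist().cdf(z_obs)  # upper-tail area
    return pooled, z_obs, p_value

# Example 9.1: 192 of 436 workers, 40 of 121 bosses.
pooled, z_obs, p_value = two_proportion_z_test(192, 436, 40, 121)
print(round(pooled, 4), round(z_obs, 2), round(p_value, 3))  # 0.4165 2.17 0.015
```

Working from unrounded proportions, the statistic comes out close to 2.17 and the p-value close to 0.015, so the decision at the 0.05 level is to reject the null hypothesis.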
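16Large-Sample Hypothesis Test of $\pi_1 - \pi_2$
- Assumption: the two samples are independent of each other
- Observe $p_1$, $p_2$
- Construct hypotheses
- Test statistic and sampling distribution under the null hypothesis:
- $p_1 - p_2 \sim N\left(0,\ \pi(1-\pi)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right)$ approximately, where $\pi$ is the common proportion
- $z = \frac{p_1 - p_2}{\sqrt{p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, where $p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$ is the pooled sample proportion
- p-value of $z_{obs}$ (use z-table)
- Make decision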
17Exercise 9.2
- The campaign manager for a presidential candidate
wishes to test the claim that the proportion of
Ohio voters who favor the candidate is at least
as large as the proportion of California voters
who favor the candidate. Given these data, test
the manager's claim at a 5% level of significance.
18Sec. 9.3 Inferences about difference between
two population means
19Example 9.3
- 1.) What is the 90% confidence interval for the difference between the mean salary of statisticians in New York and that of statisticians in Massachusetts?
- 2.) Test whether the mean salary of statisticians in New York is significantly different from that in Massachusetts ($\alpha = 0.05$).
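20- 1. Point estimate of $\mu_N - \mu_M$ is $\bar{x}_N - \bar{x}_M$
- 2. The standard error is
- When the population variances are known: $\sqrt{\frac{\sigma_N^2}{n_N} + \frac{\sigma_M^2}{n_M}}$
- When the population variances are unknown: $\sqrt{\frac{s_N^2}{n_N} + \frac{s_M^2}{n_M}}$
- Here $n_N$ and $n_M$ are the sample sizes for NY and Mass.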
21Recall
- Think about the one-sample case first.
- When we tested something about a single mean, there were 2 cases to consider:
- $\sigma$ known, which means we use the standard normal (Z) to make inferences
- $\sigma$ unknown, which means we use the t distribution to make inferences
22A little more complicated
- $\sigma_N$ and $\sigma_M$ are both known: use the standard normal to obtain p-values and confidence intervals.
- $\sigma_N$ and $\sigma_M$ are both unknown, with $\sigma_N \neq \sigma_M$: this is a 2-sample t-test. We use a t distribution, but the df has to be approximated.
23Two sample t-test
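- 3. So if $\sigma_N$ and $\sigma_M$ are both unknown, the standardized test statistic
$$t = \frac{(\bar{x}_N - \bar{x}_M) - (\mu_N - \mu_M)}{\sqrt{\frac{s_N^2}{n_N} + \frac{s_M^2}{n_M}}}$$
- has approximately a t distribution with degrees of freedom
$$df = \frac{\left(\frac{s_N^2}{n_N} + \frac{s_M^2}{n_M}\right)^2}{\frac{1}{n_N - 1}\left(\frac{s_N^2}{n_N}\right)^2 + \frac{1}{n_M - 1}\left(\frac{s_M^2}{n_M}\right)^2}$$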
24Note
- For a conservative approach to the two-sample t-procedures, the degrees of freedom are given by
- $df = \min(n_N - 1, n_M - 1)$.
- For the example concerning New York and Mass. salaries, the degrees of freedom to use are $\min(45-1, 37-1) = 36$.
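To compare the two choices of degrees of freedom, here is a small Python sketch. It assumes the approximate df is the usual Welch-Satterthwaite formula; the function names are hypothetical, and the standard deviations are made-up illustration values, because the sample standard deviations for this example are not reproduced in these notes. Only the sample sizes 45 and 37 come from the example.

```python
def welch_df(s1, n1, s2, n2):
    """Approximate (Welch-Satterthwaite) degrees of freedom, before rounding down."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

def conservative_df(n1, n2):
    """Conservative degrees of freedom: the smaller sample size minus one."""
    return min(n1 - 1, n2 - 1)

# Sample sizes from the New York / Massachusetts example.
n_ny, n_mass = 45, 37
# HYPOTHETICAL standard deviations, for illustration only (not from the slides).
s_ny, s_mass = 9000.0, 9500.0

print(conservative_df(n_ny, n_mass))  # 36
# Approximate df rounded down; it always lies between the
# conservative value and n1 + n2 - 2.
print(int(welch_df(s_ny, n_ny, s_mass, n_mass)))
```

The conservative value never exceeds the approximate one, so it gives a critical value that is at least as large and an interval that is at least as wide.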
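25- 4. For 1): the 90% confidence interval for $\mu_N - \mu_M$ is of the form
$$(\bar{x}_N - \bar{x}_M) \pm t^* \sqrt{\frac{s_N^2}{n_N} + \frac{s_M^2}{n_M}}$$
- where $t^*$ is the upper critical value of $t(36)$ for confidence level 0.90:
- $t^* = 1.684$
- Remark: I used 40 degrees of freedom since 36 is not in the book.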
26- A 90% confidence interval for $\mu_N - \mu_M$ is (-1690, 5090).
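27- 5. For 2): testing
$$H_0: \mu_N = \mu_M \quad \text{vs.} \quad H_a: \mu_N \neq \mu_M$$
- Under $H_0$,
- the standardized test statistic is
$$t = \frac{\bar{x}_N - \bar{x}_M}{\sqrt{\frac{s_N^2}{n_N} + \frac{s_M^2}{n_M}}}$$
- Conservatively, $t \sim t(36)$.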
28- P-value $= 2P(t > t_{obs}) = 2P(t(36) > 0.8473)$.
- Since $0.8473 < 0.851$, $P(t > 0.8473) > P(t > 0.851) = 0.2$.
- Then P-value $> 0.4 > 0.05$.
- Based on the data, we do not reject $H_0$, i.e. there is insufficient evidence to reject the null hypothesis, and the difference between the mean salaries of statisticians in New York and Massachusetts is not significant.
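The table argument above only bounds the p-value. As a rough check, the tail area of a t distribution can be computed directly. The sketch below is one possible way to do this with the Python standard library alone, integrating the t density with Simpson's rule; the function names `t_pdf` and `t_upper_tail` are hypothetical, and a statistics package would normally be used instead.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t_value, df, steps=2000):
    """P(T > t_value) for t_value >= 0, via Simpson's rule on [0, t_value]."""
    h = t_value / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t_value, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The density is symmetric, so half the area lies above 0.
    return 0.5 - total * h / 3

# Two-sided p-value for t_obs = 0.8473 with the conservative df = 36.
p_value = 2 * t_upper_tail(0.8473, 36)
print(p_value > 0.4)  # True
```

The computed value is only slightly above 0.4, in agreement with the bound obtained from the table, so the conclusion of not rejecting the null hypothesis is unchanged.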
29Remark
- Actually, the df of the t statistic in this example is approximately 75.
- The test could be carried out using t(75), but the test result is the same.
30Inferences about difference between two
population means
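- Assumption: the two samples are independent of each other
- The estimator: $\bar{x}_1 - \bar{x}_2$
- $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
- CI for $\mu_1 - \mu_2$ based on $t$:
- Estimator $\pm\ t^*$ (standard error)
- where $t^*$ is based on confidence level $(1-\alpha)$ and
31- degrees of freedom (df)
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$
- (round it down to the nearest integer)
- A conservative approach to df:
- $df = \min(n_1 - 1, n_2 - 1)$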
32Exercise 9.3
- Wind speed data were gathered during January and July at the site proposed for a wind generator, to see whether wind speeds will be different in the two months. From the summary data, construct a 99% confidence interval for the difference between the mean wind speeds in January and July.
33- (3.934,10.466)
- By the conservative approach,
- $7.2 \pm (2.75)(1.228)$, i.e. $(3.823, 10.577)$
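The arithmetic behind the two intervals can be checked with a short Python sketch. It uses only the numbers quoted on this slide (estimate 7.2, critical value 2.75, standard error 1.228, and the first interval's endpoints); the variable names are illustrative, and the summary data themselves are not reproduced here.

```python
# Numbers quoted on the slide for the conservative interval.
estimate = 7.2          # difference in sample mean wind speeds
t_star = 2.75           # critical value used for the conservative approach
standard_error = 1.228

margin = t_star * standard_error
lower, upper = estimate - margin, estimate + margin
print(round(lower, 3), round(upper, 3))  # 3.823 10.577

# Critical value implied by the first interval (3.934, 10.466),
# assuming it was built with the same standard error.
implied_t_star = (10.466 - 3.934) / 2 / standard_error
print(round(implied_t_star, 2))  # 2.66
```

The conservative approach uses the larger critical value, which is why its interval is slightly wider than the one based on the approximate degrees of freedom.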
34Exercise 9.4
- Plastic grocery bags have almost replaced the standard brown paper bags at the supermarket. One particular company was trying to increase the tensile strength of the bags. These summary data are from two independent random samples and give the tensile strengths of plastic bags from two different production runs.
- Sample 1
- Sample 2
- Determine whether there is a significant difference between the mean tensile strengths from the two production runs.
35- $df = 64$
- P-value $= 2P(t > 3.45) \approx 0.001$
- Reject $H_0$; this small p-value indicates that the difference between the mean tensile strengths of the plastic bags from the two different production runs is highly significant.
- By the conservative approach,
- $df = 31$, P-value $= 2P(t > 3.45) < 2P(t > 3.385) = 0.002 < 0.05$
- Reject $H_0$