Inference about Comparing Two Populations

About This Presentation

Slides: 82

Provided by: sbae61

Transcript and Presenter's Notes

1
Inference about Comparing Two Populations

Chapter 13

2
13.1 Introduction

A variety of techniques are presented whose
objective is to compare two populations.
We are interested in:
The difference between two means.
The ratio of two variances.
The difference between two proportions.

3
13.2 Inference about the Difference between Two
Means: Independent Samples

Two random samples are drawn from the two
populations of interest.
Because we compare two population means, we use
the statistic $\bar{x}_1 - \bar{x}_2$.

4
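The Sampling Distribution of $\bar{x}_1 - \bar{x}_2$

$\bar{x}_1 - \bar{x}_2$ is normally distributed if the
(original) population distributions are normal.
$\bar{x}_1 - \bar{x}_2$ is approximately normally
distributed if the (original) populations are not
normal, but the sample sizes are sufficiently
large (greater than 30).
The expected value of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$.
The variance of $\bar{x}_1 - \bar{x}_2$ is $\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$.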
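The three properties above can be checked by simulation. The following is a minimal sketch that uses only Python's standard library. The population means, standard deviations and sample sizes are hypothetical values chosen for illustration, not data from this chapter. For these values the theory predicts that $\bar{x}_1 - \bar{x}_2$ has mean 50 - 45 = 5 and variance $4^2/10 + 6^2/15 = 4.0$.

```python
import random
import statistics


def simulate_mean_difference(mu1, sigma1, n1, mu2, sigma2, n2, reps=20_000, seed=1):
    """Draw `reps` pairs of independent normal samples and return the mean and
    variance of the simulated values of xbar1 - xbar2."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    diffs = []
    for _ in range(reps):
        xbar1 = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
        xbar2 = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
        diffs.append(xbar1 - xbar2)
    return statistics.fmean(diffs), statistics.variance(diffs)


# Hypothetical populations: N(50, 4^2) with n1 = 10 and N(45, 6^2) with n2 = 15.
mean_diff, var_diff = simulate_mean_difference(50, 4, 10, 45, 6, 15)
theory_var = 4**2 / 10 + 6**2 / 15  # sigma1^2/n1 + sigma2^2/n2 = 4.0
print(f"simulated mean {mean_diff:.3f} (theory 5), "
      f"simulated variance {var_diff:.3f} (theory {theory_var:.1f})")
```

Both hypothetical populations are normal, so the simulated differences are also normal, as the first property states. With non-normal populations, the approximation improves as the sample sizes grow.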

5
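Making an inference about $\mu_1 - \mu_2$

If the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is
normal or approximately normal, we can write
$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
Z can be used to build a test statistic or a
confidence interval for $\mu_1 - \mu_2$.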

6
Making an inference about $\mu_1 - \mu_2$

In practice, the Z statistic is rarely used,
because the population variances are not known.

Instead, we construct a t statistic by replacing
$\sigma_1^2$ and $\sigma_2^2$ with the sample variances ($s_1^2$ and $s_2^2$).

7
Making an inference about $\mu_1 - \mu_2$

Two cases are considered when producing the
t-statistic.
The two unknown population variances are equal.
The two unknown population variances are not
equal.

8
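Inference about $\mu_1 - \mu_2$: Equal variances

Calculate the pooled variance estimate by
$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

The pooled variance estimator
[Figure: pooled variance illustration with n1 = 10 and n2 = 15]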
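Here is a minimal Python sketch of the pooled estimator. The function name `pooled_variance` is hypothetical. The usage lines plug in the sample variances reported for Example 13.2 ($n_1 = n_2 = 25$, $s_1^2 = 0.8478$, $s_2^2 = 1.3031$), plus one set of made-up values with unequal sample sizes.

```python
def pooled_variance(s1_sq: float, n1: int, s2_sq: float, n2: int) -> float:
    """Pooled variance estimate: each sample variance is weighted by its
    degrees of freedom (n - 1)."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


# Example 13.2 summary statistics: n1 = n2 = 25, s1^2 = 0.8478, s2^2 = 1.3031.
sp2 = pooled_variance(0.8478, 25, 1.3031, 25)
print(f"pooled variance = {sp2:.5f}")

# Hypothetical unequal sample sizes: the weights are (10 - 1)/23 and (15 - 1)/23.
print(pooled_variance(4.0, 10, 9.0, 15))
```

With equal sample sizes, the pooled variance is simply the average of the two sample variances (here 1.07545). With unequal sizes, the larger sample receives more weight: for example, 14/23 versus 9/23 when $n_1 = 10$ and $n_2 = 15$.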
10
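Inference about $\mu_1 - \mu_2$: Equal variances

Construct the t-statistic as follows:
$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad \nu = n_1 + n_2 - 2$

Perform a hypothesis test:
$H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 > 0$ (or $< 0$)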
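The statistic can be computed from summary statistics alone. The sketch below is hypothetical helper code, not the Data Analysis Plus procedure used in the slides. The difference of sample means for Example 13.2 is not given in this transcript. The value 0.272 used here is an assumption, taken as the midpoint of that example's 95% confidence interval, $(-0.3176 + 0.8616)/2 = 0.272$.

```python
import math


def pooled_t(xbar_diff, s1_sq, n1, s2_sq, n2, delta0=0.0):
    """Equal-variances t statistic for H0: mu1 - mu2 = delta0.

    Returns the t value and its degrees of freedom, n1 + n2 - 2."""
    sp2 = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))  # estimated standard error of xbar1 - xbar2
    return (xbar_diff - delta0) / se, n1 + n2 - 2


# Example 13.2; xbar_diff = 0.272 is inferred, not read from the data file.
t, df = pooled_t(0.272, 0.8478, 25, 1.3031, 25)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

This prints `t = 0.93 with 48 degrees of freedom`, the values used in the Example 13.2 test.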
11
Inference about $\mu_1 - \mu_2$: Unequal variances
12
Inference about $\mu_1 - \mu_2$: Unequal variances
Run a hypothesis test as needed, or build a
confidence interval.
13
Which case to use: equal variances or unequal
variances?

Whenever there is insufficient evidence that the
variances are unequal, it is preferable to run
the equal-variances t-test.
This is so because, for any two given samples,

the number of degrees of freedom for the equal-variances case
$\geq$
the number of degrees of freedom for the unequal-variances case,
and more degrees of freedom give smaller critical values of t (a more powerful test).
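The formulas for the unequal-variances case are not reproduced in this transcript. A common choice, assumed here, is the statistic $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$ with the Welch-Satterthwaite approximation $\nu = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$. The sketch below compares this $\nu$ with $n_1 + n_2 - 2$ for a few hypothetical sets of summary statistics. The function names are illustrative only.

```python
import math


def equal_variances_df(n1: int, n2: int) -> int:
    """Degrees of freedom of the pooled (equal-variances) t-test."""
    return n1 + n2 - 2


def unequal_variances_df(s1_sq: float, n1: int, s2_sq: float, n2: int) -> float:
    """Welch-Satterthwaite degrees of freedom for the unequal-variances t-test."""
    a = s1_sq / n1
    b = s2_sq / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


def unequal_variances_t(xbar_diff, s1_sq, n1, s2_sq, n2, delta0=0.0):
    """t statistic for H0: mu1 - mu2 = delta0 without pooling the variances."""
    return (xbar_diff - delta0) / math.sqrt(s1_sq / n1 + s2_sq / n2)


# Hypothetical summary statistics: (s1^2, n1, s2^2, n2).
cases = [(4.0, 10, 9.0, 15), (2.0, 30, 2.0, 30), (1.0, 8, 25.0, 40)]
for s1_sq, n1, s2_sq, n2 in cases:
    print(f"n1={n1}, n2={n2}: equal-variances df = {equal_variances_df(n1, n2)}, "
          f"unequal-variances df = {unequal_variances_df(s1_sq, n1, s2_sq, n2):.1f}")
```

In each case the unequal-variances value is at most $n_1 + n_2 - 2$. It reaches that bound only in special cases, such as equal sample sizes with equal sample variances (the middle row).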
14
(No Transcript)
15
Example: Making an inference about $\mu_1 - \mu_2$

Example 13.1
Do people who eat high-fiber cereal for breakfast
consume, on average, fewer calories for lunch
than people who do not eat high-fiber cereal for
breakfast?
A sample of 150 people was randomly drawn. Each
person was identified as a consumer or a
non-consumer of high-fiber cereal.
For each person the number of calories consumed
at lunch was recorded.

16
Example: Making an inference about $\mu_1 - \mu_2$

Solution
The data are quantitative.
The parameter to be tested is
the difference between two means.
The claim to be tested is:
The mean caloric intake of consumers ($\mu_1$)
is less than that of non-consumers ($\mu_2$).

17
Example: Making an inference about $\mu_1 - \mu_2$

The hypotheses are:
$H_0: \mu_1 - \mu_2 = 0$
$H_1: \mu_1 - \mu_2 < 0$
To check the relationship between the
variances, we use computer output to find the
sample variances (Xm13-1.xls). We have $s_1^2 = 4103$
and $s_2^2 = 10{,}670$.
It appears that the variances are unequal.

18
Example: Making an inference about $\mu_1 - \mu_2$

Solving by hand
From the data we have

19
Example: Making an inference about $\mu_1 - \mu_2$

Solving by hand
The rejection region is $t < -t_{\alpha,\nu} = -t_{.05,123} \approx -1.658$.

20
Example: Making an inference about $\mu_1 - \mu_2$
Xm13-1.xls
At the 5% significance level, there is sufficient
evidence to reject the null hypothesis.
21
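Example: Making an inference about $\mu_1 - \mu_2$

Solving by hand
The confidence interval estimator for the
difference between two means (unequal variances) is
$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$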
22
(No Transcript)
23
Example: Making an inference about $\mu_1 - \mu_2$

Example 13.2
An ergonomic chair can be assembled using two
different sets of operations (Method A and Method
B).
The operations manager would like to know whether
the assembly times under the two methods differ.

24
Example: Making an inference about $\mu_1 - \mu_2$

Example 13.2
Two samples are randomly and independently
selected:
A sample of 25 workers assembled the chair using
design A.
A sample of 25 workers assembled the chair using
design B.
The assembly times were recorded.
Do the assembly times of the two methods differ?

25
Example: Making an inference about $\mu_1 - \mu_2$
Assembly times in minutes

Solution
The data are quantitative.
The parameter of interest is the difference
between two population means.
The claim to be tested is whether a difference
between the two designs exists.

26
Example: Making an inference about $\mu_1 - \mu_2$

Solving by hand
The hypotheses are:
$H_0: \mu_1 - \mu_2 = 0$   $H_1: \mu_1 - \mu_2 \neq 0$

To check the relationship between the two
variances, we calculate the values of $s_1^2$ and $s_2^2$
(Xm13-02.xls).
We have $s_1^2 = 0.8478$ and $s_2^2 = 1.3031$.
The two variances appear to be equal.

27
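Example: Making an inference about $\mu_1 - \mu_2$

Solving by hand

To calculate the t-statistic, we first pool the variances:
$s_p^2 = \frac{(25 - 1)(0.8478) + (25 - 1)(1.3031)}{25 + 25 - 2} = 1.07545$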

28
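Example: Making an inference about $\mu_1 - \mu_2$

The rejection region is $t < -t_{\alpha/2,\nu} = -t_{.025,48} = -2.009$ or $t > t_{\alpha/2,\nu} = t_{.025,48} = 2.009$.
The test: since $-2.009 < t = 0.93 < 2.009$, there
is insufficient evidence to reject the null
hypothesis.

[Figure: rejection regions for alpha = 0.05, with critical values -2.009 and 2.009 and the test statistic t = 0.93]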
29
Example: Making an inference about $\mu_1 - \mu_2$
Xm13-02.xls
30
Example: Making an inference about $\mu_1 - \mu_2$

Conclusion: From this experiment, there is insufficient evidence
at the 5% significance level to infer that the two assembly
methods differ in terms of assembly time.

31
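Example: Making an inference about $\mu_1 - \mu_2$
A 95% confidence interval for $\mu_1 - \mu_2$ is
calculated as follows:
$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\nu}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
Thus, at the 95% confidence level, $-0.3176 < \mu_1 - \mu_2 < 0.8616$.
Notice: zero is included in the
confidence interval.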
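The interval can be reproduced from the summary statistics. As before, the helper name is hypothetical. The difference of sample means (0.272) is an assumption inferred as the midpoint of the interval above. The critical value 2.009 is the table value $t_{.025,48}$ quoted on the slides. What the check really confirms is the margin of error, $t_{\alpha/2,\nu}\sqrt{s_p^2(1/n_1 + 1/n_2)} \approx 0.589$.

```python
import math


def pooled_ci(xbar_diff, s1_sq, n1, s2_sq, n2, t_crit):
    """Equal-variances confidence interval for mu1 - mu2, given a critical t value."""
    sp2 = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)  # pooled variance
    margin = t_crit * math.sqrt(sp2 * (1 / n1 + 1 / n2))     # margin of error
    return xbar_diff - margin, xbar_diff + margin


# Example 13.2 summary statistics; 0.272 is assumed, 2.009 is t_.025,48 from the slides.
lcl, ucl = pooled_ci(0.272, 0.8478, 25, 1.3031, 25, t_crit=2.009)
print(f"{lcl:.4f} < mu1 - mu2 < {ucl:.4f}")
```

The bounds agree with the slide's to about three decimal places; the small gap reflects rounding in the inputs. Zero again falls inside the interval.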
32
Checking the required conditions for the equal-variances
case (Example 13.2)
Testing normality
Testing $\mu_1 - \mu_2$, non-normal populations
The data appear to be approximately normal.
33
13.4 Matched Pairs Experiment

What is a matched pairs experiment?
Why are matched pairs experiments needed?
How do we deal with data produced in this way?

The following example demonstrates a
situation where a matched pairs experiment is the
correct approach to testing the difference
between two population means.
34
(No Transcript)
35
13.4 Matched Pairs Experiment
Additional example

Example 13.3
To investigate the job offers obtained by MBA
graduates, a study focusing on salaries was
conducted.
Particularly, the salaries offered to finance
majors were compared to those offered to
marketing majors.
Two random samples of 25 graduates in each
discipline were selected, and the highest salary
offer was recorded for each one.
From the data, can we infer that finance majors
obtain higher salary offers than do marketing
majors among MBAs?

36
13.4 Matched Pairs Experiment

Solution
Compare two populations of quantitative data.
The parameter tested is $\mu_1 - \mu_2$, where

$\mu_1$ = the mean of the highest salary offered to Finance
MBAs
$\mu_2$ = the mean of the highest salary offered to
Marketing MBAs

$H_0: \mu_1 - \mu_2 = 0$   $H_1: \mu_1 - \mu_2 > 0$
37
13.4 Matched Pairs Experiment

Solution continued

Let us assume equal variances

38
The effect of large sample variability

Question
The difference between the sample means is 65,624 -
60,423 = 5,201.
So why couldn't we reject $H_0$ in favor of $H_1$
($\mu_1 - \mu_2 > 0$)?

39
The effect of large sample variability

Answer
$s_p^2$ is large (because the sample variances are
large): $s_p^2 = 311{,}330{,}926$.
A large variance reduces the value of the t
statistic, and it becomes more difficult to reject
$H_0$.
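A quick computation with the numbers on these slides shows the size of the effect. This sketch uses $n_1 = n_2 = 25$, as stated in Example 13.3. The second statistic uses a hypothetical pooled variance one quarter as large, to show how the same mean difference would look with less variability.

```python
import math

# Example 13.3 summary values from the slides.
diff = 65_624 - 60_423          # difference between the sample means = 5,201
sp2 = 311_330_926               # pooled variance
n1 = n2 = 25

t = diff / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Hypothetical comparison: the same mean difference with one quarter of the pooled variance.
t_less_variable = diff / math.sqrt((sp2 / 4) * (1 / n1 + 1 / n2))

print(f"t = {t:.2f}; with sp^2/4 it would be {t_less_variable:.2f}")
```

The first value is about 1.04. That is below the one-tailed 5% critical value for 48 degrees of freedom (about 1.68). Quartering the variance halves the standard error and therefore doubles t.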

40
Reducing the variability
The values each sample consists of might vary
markedly...
[Figure: the ranges of observations in sample A and sample B]
41
Reducing the variability
...but the differences between pairs of
observations might be quite close to one
another, resulting in small variability of the
differences.
[Figure: the range of the differences, shown relative to 0]
42
The matched pairs experiment

Since the difference of the means is equal to the
mean of the differences, we can rewrite the
hypotheses in terms of $\mu_D$ (the mean of the
differences) rather than in terms of $\mu_1 - \mu_2$.
This formulation has the benefit of smaller
variability.
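Both claims can be illustrated with a few lines of Python. The paired values below are hypothetical. They were chosen so that the two samples vary a lot from pair to pair while each pair's difference stays small.

```python
import statistics

# Hypothetical matched pairs: large spread within each sample, small spread in the differences.
sample1 = [52, 61, 45, 70, 58]
sample2 = [50, 60, 42, 68, 55]
diffs = [a - b for a, b in zip(sample1, sample2)]

mean_of_diffs = statistics.fmean(diffs)
diff_of_means = statistics.fmean(sample1) - statistics.fmean(sample2)
print(f"mean of differences = {mean_of_diffs:.2f}, difference of means = {diff_of_means:.2f}")

for name, data in [("sample 1", sample1), ("sample 2", sample2), ("differences", diffs)]:
    print(f"standard deviation of {name}: {statistics.stdev(data):.3f}")
```

The two means agree (2.2). The standard deviation of the differences is far smaller than that of either sample.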

43
The matched pairs experiment

Example 13.4 (Example 13.3, part II)
It was suspected that salary offers were affected
by students' GPAs (which caused $s_1^2$ and $s_2^2$ to
increase).
To reduce this variability, the following
procedure was used:
25 ranges of GPAs were predetermined.
Students from each major were randomly selected,
one from each GPA range.
The highest salary offer for each student was
recorded.
From the data presented can we conclude that
Finance majors are offered higher salaries?

44
The matched pairs hypothesis test

Solution (by hand)
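The parameter tested is $\mu_D$ ($= \mu_1 - \mu_2$)
The hypotheses: $H_0: \mu_D = 0$   $H_1: \mu_D > 0$
The t statistic: $t = \frac{\bar{x}_D - \mu_D}{s_D/\sqrt{n_D}}$

The rejection region is $t > t_{.05,25-1} = 1.711$
Degrees of freedom: $\nu = n_D - 1$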
45
The matched pairs hypothesis test

Solution (by hand), continued
From the data (Xm13-4.xls) calculate $\bar{x}_D$ and $s_D$.

46
The matched pairs hypothesis test

Solution (by hand), continued
Calculate t.
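Here is a minimal sketch of the matched pairs calculations. It uses hypothetical data rather than the Xm13-4.xls file, which is not reproduced here. The code computes $\bar{x}_D$, $s_D$ and $t = \bar{x}_D/(s_D/\sqrt{n_D})$ for $H_0: \mu_D = 0$. It also computes the interval estimate $\bar{x}_D \pm t_{\alpha/2,\nu}\, s_D/\sqrt{n_D}$. The critical value is passed in from a t table (here $t_{.025,4} = 2.776$ for $n_D = 5$). The function names are illustrative.

```python
import math
import statistics


def paired_t(sample1, sample2, mu_d0=0.0):
    """Matched pairs t statistic for H0: mu_D = mu_d0.

    Returns (t, degrees of freedom, mean difference, standard deviation of differences)."""
    if len(sample1) != len(sample2):
        raise ValueError("matched pairs need samples of equal length")
    diffs = [a - b for a, b in zip(sample1, sample2)]
    n_d = len(diffs)
    xbar_d = statistics.fmean(diffs)
    s_d = statistics.stdev(diffs)
    t = (xbar_d - mu_d0) / (s_d / math.sqrt(n_d))
    return t, n_d - 1, xbar_d, s_d


def paired_ci(xbar_d, s_d, n_d, t_crit):
    """Interval estimate of mu_D: xbar_D +/- t_crit * s_D / sqrt(n_D)."""
    margin = t_crit * s_d / math.sqrt(n_d)
    return xbar_d - margin, xbar_d + margin


# Hypothetical matched pairs (one pair per block, e.g. per GPA range).
sample1 = [52, 61, 45, 70, 58]
sample2 = [50, 60, 42, 68, 55]
t, df, xbar_d, s_d = paired_t(sample1, sample2)
lcl, ucl = paired_ci(xbar_d, s_d, len(sample1), t_crit=2.776)  # t_.025,4 from a t table
print(f"t = {t:.2f} with {df} df; 95% interval for mu_D: ({lcl:.2f}, {ucl:.2f})")
```

As the slides note for Data Analysis Plus, the differences are formed first. Everything after that is a one-sample calculation on the differences.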

47
The matched pairs hypothesis test
Xm13-4.xls
48
The matched pairs hypothesis test
Conclusion: There is sufficient evidence to
infer at the 5% significance level that the Finance
MBAs' highest salary offer is, on average,
higher than that of the Marketing MBAs.
49
The matched pairs mean difference estimation: $\bar{x}_D \pm t_{\alpha/2,\nu}\frac{s_D}{\sqrt{n_D}}$, with $\nu = n_D - 1$
50
The matched pairs mean difference estimation
Using Data Analysis Plus
Xm13-4.xls
First calculate the differences, then run the
confidence interval procedure in Data Analysis
Plus.
51
Checking the required conditions for the paired
observations case
Testing normality
Testing $\mu_D$, non-normal populations

The validity of the results depends on the
normality of the differences.

52
13.5 Inferences about the ratio of two variances

In this section we draw inference about the ratio
of two population variances.
This question is interesting because
Variances can be used to evaluate the consistency
of processes.
The relationships between variances determine the
technique used to test relationships between mean
values

53
Parameter tested and statistic

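The parameter tested is $\sigma_1^2/\sigma_2^2$.
The statistic used is $F = s_1^2/s_2^2$.

The sampling distribution of $s_1^2/s_2^2$:
When both populations are normal, the statistic $F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$ follows the
F distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.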

54
Parameter tested and statistic

Our null hypothesis is always
$H_0: \sigma_1^2/\sigma_2^2 = 1$

55
(No Transcript)
56
Testing the ratio of two population variances
Example 13.6 (revisiting 13.1)

(see Example 13.1)
In order to perform a test regarding average
consumption of calories at people's lunch in
relation to the inclusion of high-fiber cereal in
their breakfast, the variance ratio of the two
samples has to be tested first.

Caloric intake at lunch
57
Testing the ratio of two population variances

Solving by hand
The rejection region is $F > F_{\alpha/2,\nu_1,\nu_2}$
or $F < 1/F_{\alpha/2,\nu_2,\nu_1}$.

The F statistic value is $F = s_1^2/s_2^2 = .3845$.
Conclusion: Because .3845 < .63 (the lower critical value), we can reject the
null hypothesis in favor of the alternative
hypothesis and conclude that there is sufficient
evidence in the data to argue at the 5% significance
level that the variances of the two groups differ.
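The arithmetic of this test can be sketched in a few lines. Only the lower critical value (.63) is taken from the slide. The observed ratio is below 1, so the upper tail of the rejection region cannot be reached and the upper critical value (not given in this transcript) is not needed.

```python
# Example 13.6: sample variances of Example 13.1 (Xm13-1.xls).
s1_sq, s2_sq = 4103, 10_670
f_stat = s1_sq / s2_sq

# Lower critical value 1/F_{alpha/2, nu2, nu1} as quoted on the slide.
lower_crit = 0.63

# Two-tailed test; with F < 1 only the lower tail can lead to rejection.
reject_h0 = f_stat < lower_crit
print(f"F = {f_stat:.4f}; reject H0: {reject_h0}")
```

This prints `F = 0.3845; reject H0: True`, matching the conclusion above.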

58
Testing the ratio of two population variances
Example 13.6 (revisiting 13.1)

(see Xm13.1)

59
Estimating the Ratio of Two Population Variances

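From the statistic $F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$ we
can isolate $\sigma_1^2/\sigma_2^2$ and build the following
confidence interval:
$LCL = \frac{s_1^2}{s_2^2}\cdot\frac{1}{F_{\alpha/2,\nu_1,\nu_2}}, \qquad UCL = \frac{s_1^2}{s_2^2}\cdot F_{\alpha/2,\nu_2,\nu_1}$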

60
Estimating the Ratio of Two Population Variances

Example 13.7
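Determine the 95% confidence interval estimate of
the ratio of the two population variances in
Example 13.1.
Solution
We find $F_{\alpha/2,\nu_1,\nu_2} = F_{.025,40,120} \approx 1.61$
and $F_{\alpha/2,\nu_2,\nu_1} = F_{.025,120,40} \approx 1.72$ (approximately).
$LCL = \frac{s_1^2}{s_2^2}\cdot\frac{1}{F_{\alpha/2,\nu_1,\nu_2}} = \frac{4102.98}{10{,}669.770}\cdot\frac{1}{1.61} = .2388$
$UCL = \frac{s_1^2}{s_2^2}\cdot F_{\alpha/2,\nu_2,\nu_1} = \frac{4102.98}{10{,}669.770}\cdot 1.72 = .6614$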
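Here is the interval arithmetic as a short sketch, using the approximate F table values quoted above. They are inputs here, not computed.

```python
# Example 13.7: sample variances from Example 13.1.
s1_sq, s2_sq = 4102.98, 10_669.770
f_v1_v2 = 1.61   # F_.025,40,120 (approximate table value from the slide)
f_v2_v1 = 1.72   # F_.025,120,40 (approximate table value from the slide)

ratio = s1_sq / s2_sq
lcl = ratio / f_v1_v2   # (s1^2/s2^2) * 1/F_{alpha/2, nu1, nu2}
ucl = ratio * f_v2_v1   # (s1^2/s2^2) * F_{alpha/2, nu2, nu1}
print(f"{lcl:.4f} < sigma1^2/sigma2^2 < {ucl:.4f}")
```

This prints `0.2388 < sigma1^2/sigma2^2 < 0.6614`. The interval lies entirely below 1, which agrees with the conclusion of the F test in Example 13.6.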

61
13.6 Inference about the difference between two
population proportions

In this section we deal with two populations
whose data are nominal.
For nominal data we compare the population
proportions of the occurrence of a certain event.
Examples
Comparing the effectiveness of a new drug vs. an old
one
Comparing market share before and after an
advertising campaign
Comparing defective rates between two machines

62
Parameter tested and statistic

Parameter
When the data are nominal, we can only count the
occurrences of a certain event in the two
populations and calculate proportions.
The parameter tested is therefore $p_1 - p_2$.
Statistic
An unbiased estimator of $p_1 - p_2$ is $\hat{p}_1 - \hat{p}_2$
(the difference between the sample proportions).

63
Sampling distribution of $\hat{p}_1 - \hat{p}_2$

Two random samples are drawn from two
populations.
The number of successes in each sample is
recorded.
The sample proportions are computed.

Sample 1: sample size $n_1$; number of successes $x_1$; sample proportion $\hat{p}_1 = x_1/n_1$
Sample 2: sample size $n_2$; number of successes $x_2$; sample proportion $\hat{p}_2 = x_2/n_2$
64
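Sampling distribution of $\hat{p}_1 - \hat{p}_2$

The statistic $\hat{p}_1 - \hat{p}_2$ is approximately
normally distributed if $n_1p_1$, $n_1(1 - p_1)$, $n_2p_2$,
$n_2(1 - p_2)$ are all greater than or equal to 5.
The mean of $\hat{p}_1 - \hat{p}_2$ is $p_1 - p_2$.
The variance of $\hat{p}_1 - \hat{p}_2$ is $\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}$.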

65
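The z-statistic
$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}}$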
66
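Testing $p_1 - p_2$

There are two cases to consider:

Case 1: $H_0: p_1 - p_2 = 0$. Calculate the pooled
proportion $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$.
Then $Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
Case 2: $H_0: p_1 - p_2 = D$ ($D \neq 0$). Do
not pool the data.
Then $Z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$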
67
Testing $p_1 - p_2$ (Case I)

Example 13.8
Management needs to decide which of two new
packaging designs to adopt to help improve sales
of a certain soap.
A study is performed in two communities:
Design A is distributed in Community 1.
Design B is distributed in Community 2.
The old design package is still offered in both
communities.
Design A is more expensive; therefore, to be
financially viable, it has to outsell design B.

68
Testing $p_1 - p_2$ (Case I)

Summary of the experiment results:
Community 1: 580 packages with new design A
sold; 324 packages with old design sold
Community 2: 604 packages with new design B sold;
442 packages with old design sold
Use a 5% significance level and perform a test to
find which type of packaging to use.

69
Testing $p_1 - p_2$ (Case I)

Solution
The problem objective is to compare the
populations of sales of the two packaging designs.
The data are qualitative (yes/no for the purchase
of the new design per customer).
The hypotheses to test are $H_0: p_1 - p_2 = 0$   $H_1: p_1 - p_2 > 0$
We identify here Case 1.

Population 1: purchases of Design A
Population 2: purchases of Design B
70
Testing $p_1 - p_2$ (Case I)

Solving by hand
For a 5% significance level, the rejection region
is $z > z_\alpha = z_{.05} = 1.645$.

71
Testing $p_1 - p_2$ (Case I)
Additional example

Excel (Data Analysis Plus)

Conclusion
Since 2.89 > 1.645, there is sufficient evidence
in the data to conclude at the 5% significance level
that design A will outsell design B.
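The Case 1 statistic can be recomputed from the counts in the summary of results. This sketch assumes a "success" is a sale of the new design. Each community's sample size is therefore the total number of packages sold there (904 and 1,046). The function name is hypothetical.

```python
import math


def z_pooled(x1, n1, x2, n2):
    """Case 1 (H0: p1 - p2 = 0): z statistic using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


# Example 13.8: a "success" is a sale of the new design.
x1, n1 = 580, 580 + 324    # Community 1: design A sales out of all packages sold
x2, n2 = 604, 604 + 442    # Community 2: design B sales out of all packages sold
z = z_pooled(x1, n1, x2, n2)
print(f"z = {z:.2f}; reject H0 at 5%: {z > 1.645}")
```

The statistic comes out at about 2.89, matching the value in the conclusion.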

72
Testing $p_1 - p_2$ (Case II)

Example 13.9 (revisiting Example 13.8)
Management needs to decide which of two new
packaging designs to adopt to help improve sales
of a certain soap.
A study is performed in two communities:
Design A is distributed in Community 1.
Design B is distributed in Community 2.
The old design package is still offered in both
communities.
For design A to be financially viable, it has to
outsell design B by at least 3%.

73
Testing $p_1 - p_2$ (Case II)

Summary of the experiment results:
Community 1: 580 packages with new design A
sold; 324 packages with old design sold
Community 2: 604 packages with new design B sold;
442 packages with old design sold
Use a 5% significance level and perform a test to
find which type of packaging to use.

74
Testing $p_1 - p_2$ (Case II)

Solution
The hypotheses to test are $H_0: p_1 - p_2 = .03$   $H_1: p_1 - p_2 > .03$
We identify Case 2 of the test for the difference in
proportions (the hypothesized difference is not equal to zero).

75
Testing $p_1 - p_2$ (Case II)

Solving by hand

The rejection region is $z > z_\alpha = z_{.05} = 1.645$.
Conclusion: Since 1.55 < 1.645, do not
reject the null hypothesis. There is insufficient
evidence to infer that packaging with Design A
will outsell that of Design B by 3% or more.
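Here is a sketch of the Case 2 calculation with the same counts ($n_1 = 904$, $n_2 = 1046$); the helper name is hypothetical. The hypothesized difference is not zero, so the standard error uses each sample proportion separately instead of a pooled proportion.

```python
import math


def z_unpooled(x1, n1, x2, n2, d):
    """Case 2 (H0: p1 - p2 = D, D != 0): z statistic without pooling."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return ((p1_hat - p2_hat) - d) / se


# Example 13.9: same counts as Example 13.8, hypothesized difference D = .03.
z = z_unpooled(580, 904, 604, 1046, d=0.03)
print(f"z = {z:.2f}; reject H0 at 5%: {z > 1.645}")
```

With these counts the statistic works out to about 1.55. That is below the critical value 1.645, so $H_0$ is not rejected.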
76
Testing $p_1 - p_2$ (Case II)

Using Excel (Data Analysis Plus)

Xm13-08.xls
77
Estimating $p_1 - p_2$

Example (estimating the cost per life saved)
Two drugs are used to treat heart attack victims:
Streptokinase (available since 1959, costs \$460)
t-PA (genetically engineered, costs \$2900).
The maker of t-PA claims that its drug
outperforms Streptokinase.
An experiment was conducted in 15 countries.
20,500 patients were given t-PA.
20,500 patients were given Streptokinase.
The number of deaths by heart attacks was
recorded.

78
Estimating $p_1 - p_2$

Experiment results
A total of 1497 patients treated with
Streptokinase died.
A total of 1292 patients treated with t-PA died.
Estimate the cost per life saved by using t-PA
instead of Streptokinase.

79
Estimating $p_1 - p_2$

Solution
The problem objective: compare the outcomes of
two treatments.
The data are nominal (a patient lived/died).
The parameter estimated is $p_1 - p_2$.
$p_1$ = death rate with t-PA
$p_2$ = death rate with Streptokinase

80
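Estimating $p_1 - p_2$

Solving by hand
Sample proportions: $\hat{p}_1 = 1292/20{,}500 = .0630$ and $\hat{p}_2 = 1497/20{,}500 = .0730$
The 95% confidence interval is
$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = -.0100 \pm .0049$, i.e., $-.0149 < p_1 - p_2 < -.0051$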

81
Estimating $p_1 - p_2$

Interpretation
We estimate that between .51 and 1.49 more
heart attack victims out of every 100 treated will survive because of the
use of t-PA.
The difference in cost per patient is
\$2900 - \$460 = \$2440.
The cost per life saved by switching to t-PA is
estimated to be between \$2440/.0149 = \$163,758 and
\$2440/.0051 = \$478,431.
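The interval and the cost figures can be reproduced as follows. The sketch assumes $z_{.025} = 1.96$. As on the slide, it rounds the interval bounds to four decimals before dividing.

```python
import math

# Counts from the slides; p1 = death rate with t-PA, p2 = death rate with Streptokinase.
deaths_tpa, n_tpa = 1292, 20_500
deaths_sk, n_sk = 1497, 20_500
z_crit = 1.96  # z_.025 for a 95% interval

p1_hat = deaths_tpa / n_tpa
p2_hat = deaths_sk / n_sk
se = math.sqrt(p1_hat * (1 - p1_hat) / n_tpa + p2_hat * (1 - p2_hat) / n_sk)
lcl = round((p1_hat - p2_hat) - z_crit * se, 4)
ucl = round((p1_hat - p2_hat) + z_crit * se, 4)

# Extra cost per patient treated with t-PA instead of Streptokinase.
extra_cost = 2900 - 460
# Preventing |d| deaths per patient costs extra_cost / |d| per life saved.
low_cost, high_cost = extra_cost / abs(lcl), extra_cost / abs(ucl)
print(f"{lcl} < p1 - p2 < {ucl}")
print(f"cost per life saved between ${low_cost:,.0f} and ${high_cost:,.0f}")
```

The larger reduction in the death rate (.0149) gives the lower cost per life saved, and the smaller reduction (.0051) gives the higher one.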
