Inference about Comparing Two Populations - PowerPoint PPT Presentation

Slides: 85
Provided by: sba42

1
Inference about Comparing Two Populations
  • Chapter 13

2
13.1 Introduction
  • In previous discussions we presented methods
    designed to make inferences about
    characteristics of a single population. We
    estimated, for example, the population mean, or
    tested hypotheses about the value of the standard
    deviation.
  • However, in the real world we often need to
    study the relationship between two populations.
  • For example, to compare the effects of a new
    drug on blood pressure, we can test the
    relationship between the mean blood pressure of
    two groups of individuals: those who take the
    drug and those who don't.
  • Or, we may be interested in the effect a certain
    ad has on voters' preferences as part of an
    election campaign. In this case we can estimate
    the difference in the proportion of voters who
    prefer one candidate before and after the ad is
    televised.

3
13.1 Introduction
  • A variety of techniques are presented whose
    objective is to compare two populations.
  • These techniques are designed to study the
    • difference between two means.
    • ratio of two variances.
    • difference between two proportions.

4
13.2 Inference about the Difference between Two
Means: Independent Samples
  • We'll look at the relationship between the two
    population means by analyzing the value of
    μ1 − μ2.

5
The Sampling Distribution of x̄1 − x̄2
  • x̄1 − x̄2 is normally distributed if the
    (original) population distributions are normal.
  • x̄1 − x̄2 is approximately normally
    distributed if the (original) populations are not
    normal, but the sample sizes are sufficiently
    large (greater than 30).
  • The expected value of x̄1 − x̄2 is μ1 − μ2.
  • The variance of x̄1 − x̄2 is
    $\sigma_1^2/n_1 + \sigma_2^2/n_2$.
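A quick Monte Carlo check can make these facts concrete. The sketch below is illustrative only: the two normal populations, their parameters, the sample sizes, the number of repetitions, and the seed are all hypothetical choices, not values from the slides. It draws many pairs of samples and compares the simulated mean and variance of x̄1 − x̄2 with μ1 − μ2 and σ1²/n1 + σ2²/n2.

```python
import random
from statistics import fmean, pvariance

# Hypothetical population parameters, chosen only for illustration
MU1, SIGMA1, N1 = 50.0, 10.0, 40
MU2, SIGMA2, N2 = 45.0, 15.0, 60
REPS = 5_000

rng = random.Random(13)  # fixed seed so the run is reproducible


def sample_mean(mu, sigma, n):
    """Mean of one random sample of size n drawn from a normal population."""
    return fmean(rng.gauss(mu, sigma) for _ in range(n))


# Approximate the sampling distribution of x̄1 - x̄2 by repeated sampling
diffs = [sample_mean(MU1, SIGMA1, N1) - sample_mean(MU2, SIGMA2, N2)
         for _ in range(REPS)]

theory_mean = MU1 - MU2                        # E(x̄1 - x̄2) = mu1 - mu2
theory_var = SIGMA1**2 / N1 + SIGMA2**2 / N2   # sigma1²/n1 + sigma2²/n2

print(f"mean: simulated {fmean(diffs):.3f}, theory {theory_mean:.3f}")
print(f"variance: simulated {pvariance(diffs):.3f}, theory {theory_var:.3f}")
```

With these made-up parameters the theoretical variance is 100/40 + 225/60 = 6.25, and the simulated values should land close to 5 and 6.25.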

6
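Making an inference about μ1 − μ2
  • If the sampling distribution of x̄1 − x̄2 is
    normal or approximately normal, we can write
    $Z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
  • Z can be used to build a test statistic or a
    confidence interval for μ1 − μ2.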

7
Making an inference about μ1 − μ2
  • In practice, the Z statistic is rarely used,
    because the population variances are not known.
  • Instead, we construct a t statistic using the
    sample variances (S1² and S2²):
    $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$

8
Making an inference about μ1 − μ2
  • Two cases are considered when producing the
    t-statistic:
    • The two unknown population variances are equal.
    • The two unknown population variances are not
      equal.

9
Inference about μ1 − μ2: Equal variances
  • If the two variances σ1² and σ2² are equal to
    one another, then their estimates S1² and S2²
    estimate the same value.
  • Therefore, we can pool the two sample variances
    and provide a better estimate of the common
    population variance, based on a larger amount
    of information.
  • This is done by forming the pooled variance
    estimate (see next slide).

10
Inference about μ1 − μ2: Equal variances
  • Calculate the pooled variance estimate by
    $S_p^2 = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$

12
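Inference about μ1 − μ2: Equal variances
  • Construct the t-statistic as follows:
    $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
    with ν = n1 + n2 − 2 degrees of freedom.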
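The equal-variances procedure can be sketched from summary statistics as below. The function name `pooled_t_statistic` is hypothetical. The inputs are Example 13.2's sample variances and sample sizes; the mean difference 0.272 is an assumption, inferred as the midpoint of the 95% confidence interval (−0.3176, 0.8616) reported for that example, because the sample means themselves do not survive in this transcript.

```python
import math


def pooled_t_statistic(mean_diff, var1, var2, n1, n2, hypothesized_diff=0.0):
    """Equal-variances t statistic for mu1 - mu2, computed from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance: degrees-of-freedom-weighted average of the sample variances
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    # Standard error of x̄1 - x̄2 under the equal-variances assumption
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean_diff - hypothesized_diff) / se, df


# Example 13.2: S1² = 0.8478, S2² = 1.3031, n1 = n2 = 25.
# ASSUMPTION: x̄1 - x̄2 = 0.272, the midpoint of the reported interval (-0.3176, 0.8616).
t, df = pooled_t_statistic(0.272, 0.8478, 1.3031, 25, 25)
print(f"t = {t:.2f} with {df} degrees of freedom")  # t = 0.93 with 48 degrees of freedom
```

Under that assumption the result, t ≈ 0.93 on 48 degrees of freedom, lines up with the test statistic used for Example 13.2.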

13
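Inference about μ1 − μ2: Unequal variances
  • Use the t-statistic built from the separate sample variances:
    $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$
  • with degrees of freedom
    $\nu = \dfrac{\left(S_1^2/n_1 + S_2^2/n_2\right)^2}{\dfrac{(S_1^2/n_1)^2}{n_1 - 1} + \dfrac{(S_2^2/n_2)^2}{n_2 - 1}}$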
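A minimal sketch of the unequal-variances statistic, with degrees of freedom from the Welch-Satterthwaite approximation, is shown below. The function name `welch_t` is hypothetical. It uses Example 13.1's sample variances (S1² = 4103, S2² = 10,670); the split n1 = 43, n2 = 107 of the 150 sampled people and the mean difference −29.21 (the midpoint of that example's reported interval (−56.86, −1.56)) are assumptions, since neither is stated in this transcript.

```python
import math


def welch_t(mean_diff, var1, var2, n1, n2, hypothesized_diff=0.0):
    """Unequal-variances t statistic and Welch-Satterthwaite degrees of freedom."""
    a = var1 / n1  # estimated variance of x̄1
    b = var2 / n2  # estimated variance of x̄2
    t = (mean_diff - hypothesized_diff) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Example 13.1 sample variances.
# ASSUMPTIONS (not stated in the slides): n1 = 43, n2 = 107, and x̄1 - x̄2 = -29.21.
t, df = welch_t(-29.21, 4103, 10_670, 43, 107)
print(f"t = {t:.2f}, df = {df:.1f}")  # t = -2.09, df = 122.6
```

Under the assumed sample sizes the degrees of freedom come out near 122.6, which rounds to the 123 used for Example 13.1's rejection region, and both values stay below the n1 + n2 − 2 = 148 of the pooled test.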
14
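Which case to use: equal variances or unequal
variances?
  • Whenever there is insufficient evidence that the
    variances are unequal, it is preferable to run
    the equal-variances t-test.
  • This is so because, for any two given samples,
    the number of degrees of freedom for the
    equal-variances case is greater than or equal to
    the number of degrees of freedom for the
    unequal-variances case.
  • More degrees of freedom give smaller critical
    values, so when the variances really are equal
    the equal-variances test is more likely to detect
    a true difference between the means.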
16
Example: Making an inference about μ1 − μ2
  • Example 13.1
  • Do people who eat high-fiber cereal for breakfast
    consume, on average, fewer calories for lunch
    than people who do not eat high-fiber cereal for
    breakfast?
  • A sample of 150 people was randomly drawn. Each
    person was identified as a consumer or a
    non-consumer of high-fiber cereal.
  • For each person the number of calories consumed
    at lunch was recorded.

17
Example: Making an inference about μ1 − μ2
  • Solution
  • The data are quantitative.
  • The parameter to be tested is the difference
    between two means.
  • The claim to be tested is: the mean caloric
    intake of consumers (μ1) is less than that of
    non-consumers (μ2).

18
Example: Making an inference about μ1 − μ2
  • The hypotheses are
    H0: μ1 − μ2 = 0
    H1: μ1 − μ2 < 0
  • To check the relationship between the
    variances, we use a computer output to find the
    sample variances (Xm13-1.xls). From the data
    we have S1² = 4103 and S2² = 10,670.
  • It appears that the variances are unequal.

μ1 = mean caloric intake for fiber consumers
μ2 = mean caloric intake for fiber non-consumers
19
Example: Making an inference about μ1 − μ2
  • Solving by hand
  • From the data we have

20
Example: Making an inference about μ1 − μ2
  • Solving by hand
  • H1: μ1 − μ2 < 0. The rejection region is
    t < −tα,ν = −t.05,123 ≈ −1.658

21
Example: Making an inference about μ1 − μ2
Xm13-1.xls
At the 5% significance level there is sufficient
evidence to reject the null hypothesis.
22
Example: Making an inference about μ1 − μ2
  • Solving by hand
  • The confidence interval estimator for the
    difference between two means when the variances
    are unequal is
    $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\nu}\sqrt{S_1^2/n_1 + S_2^2/n_2}$

23
Example: Making an inference about μ1 − μ2
Note that the confidence interval for the
difference between the two means falls entirely
in the negative region, (−56.86, −1.56). Even at
its upper limit, μ1 − μ2 = −1.56, so we can be 95%
confident that μ1 is smaller than μ2! This
conclusion agrees with the results of the test
performed before.
24
Example: Making an inference about μ1 − μ2
  • Example 13.2
  • An ergonomic chair can be assembled using two
    different sets of operations (Method A and Method
    B).
  • The operations manager would like to know whether
    the assembly times under the two methods differ.

25
Example: Making an inference about μ1 − μ2
  • Example 13.2
  • Two samples are randomly and independently
    selected:
    • A sample of 25 workers assembled the chair using
      design A.
    • A sample of 25 workers assembled the chair using
      design B.
  • The assembly times were recorded.
  • Do the assembly times of the two methods differ?

26
Example: Making an inference about μ1 − μ2
Assembly times in minutes
  • Solution
  • The data are quantitative.
  • The parameter of interest is the difference
    between two population means.
  • The claim to be tested is whether a difference
    between the two designs exists.

27
Example: Making an inference about μ1 − μ2
  • Solving by hand
  • The hypotheses to test are
    H0: μ1 − μ2 = 0   H1: μ1 − μ2 ≠ 0
  • To check the relationship between the two
    variances we calculate the values of S1² and S2²
    (Xm13-02.xls).
  • From the data we have S1² = 0.8478 and
    S2² = 1.3031, so σ1² and σ2² appear to be equal.

28
Example: Making an inference about μ1 − μ2
  • Solving by hand
  • To calculate the t-statistic we have

29
Example: Making an inference about μ1 − μ2
  • The two-tail rejection region is
    t < −tα/2,ν = −t.025,48 = −2.009 or
    t > tα/2,ν = t.025,48 = 2.009
  • The test: since −2.009 < t = 0.93 < 2.009, there
    is insufficient evidence to reject the null
    hypothesis.

[Figure: t distribution for α = 0.05, with rejection regions below −2.009 and above 2.009; the test statistic 0.93 lies between them.]
30
Example: Making an inference about μ1 − μ2
Xm13-02.xls
31
Example: Making an inference about μ1 − μ2
  • Conclusion: From this experiment, it is unclear
    at the 5% significance level whether the two
    assembly methods differ in terms of assembly time.

32
Example: Making an inference about μ1 − μ2
Constructing a Confidence Interval
A 95% confidence interval for μ1 − μ2 when the
two variances are equal is calculated as follows:
$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\nu}\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
Thus, at the 95% confidence level,
−0.3176 < μ1 − μ2 < 0.8616. Notice that zero is
included in the confidence interval, and therefore
the two mean values could be equal.
33
Checking the required conditions for the equal-variances
case (Example 13.2)
The data appear to be approximately normal.
34
13.4 Matched Pairs Experiment: Dependent Samples
  • What is a matched pairs experiment?
  • A matched pairs experiment is a sampling design
    in which every two observations share some
    characteristic. For example, suppose we are
    interested in increasing workers' productivity. We
    establish a compensation program and want to
    study its efficiency. We could select two groups
    of workers, measure productivity before and after
    the program is established, and run a test as we
    did before.
  • But if we believe workers' age is a factor that
    may affect changes in productivity, we can divide
    the workers into different age groups, select a
    worker from each age group, and measure his or
    her productivity twice: once before and once
    after the program is established. Each two such
    observations constitute a matched pair, and
    because they come from the same worker in the
    same age group, they are not independent.

35
13.4 Matched Pairs Experiment: Dependent Samples
Why are matched pairs experiments needed?
The following example demonstrates a
situation where a matched pair experiment is the
correct approach to testing the difference
between two population means.
36
13.4 Matched Pairs Experiment
Additional example
  • Example 13.3
  • To investigate the job offers obtained by MBA
    graduates, a study focusing on salaries was
    conducted.
  • In particular, the salaries offered to finance
    majors were compared to those offered to
    marketing majors.
  • Two random samples of 25 graduates in each
    discipline were selected, and the highest salary
    offer was recorded for each one.
  • From the data, can we infer that finance majors
    obtain higher salary offers than marketing majors
    among MBAs?

37
13.4 Matched Pairs Experiment
  • Solution
  • Compare two populations of quantitative data.
  • The parameter tested is μ1 − μ2.
  • H0: μ1 − μ2 = 0   H1: μ1 − μ2 > 0

μ1 = the mean of the highest salary offered to Finance MBAs
μ2 = the mean of the highest salary offered to Marketing MBAs
38
13.4 Matched Pairs Experiment
  • Solution continued
  • Let us assume equal variances

39
The effect of large sample variability
  • Question
  • The difference between the sample means is
    65,624 − 60,423 = 5,201.
  • So, why couldn't we reject H0 and favor H1?

40
The effect of large sample variability
  • Answer
  • Sp² is large (because the sample variances are
    large): Sp² = 311,330,926.
  • A large variance reduces the value of the t
    statistic, and it becomes more difficult to reject
    H0.

Recall that rejection of the null hypothesis
occurs when t is sufficiently large (t > tα). A
large Sp² reduces t, and therefore it does not
fall in the rejection region.
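The arithmetic behind this answer can be checked directly from the slide's summary values (a 5,201 difference, Sp² = 311,330,926, and 25 graduates per major). The sketch simply evaluates the equal-variances t statistic; nothing here is assumed beyond those figures.

```python
import math

# Example 13.3 summary values (n1 = n2 = 25 graduates per major)
mean_diff = 65_624 - 60_423   # x̄1 - x̄2 = 5,201
sp2 = 311_330_926             # pooled variance estimate Sp²
n1 = n2 = 25

# Equal-variances t statistic for H0: mu1 - mu2 = 0
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
t = mean_diff / se
print(f"standard error = {se:,.0f}, t = {t:.2f}")  # standard error = 4,991, t = 1.04
```

The 5,201 difference is only about one standard error. Every one-tail 5% critical value in the t table exceeds z.05 = 1.645, so t ≈ 1.04 cannot reach the rejection region.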
41
The matched pairs experiment
  • We are looking for a hypothesis formulation in
    which the variability of the two samples has been
    reduced.
  • By taking matched pair observations and testing
    the differences per pair, we achieve two goals:
    • We still test μ1 − μ2 (see explanation next).
    • The variability used to calculate the t-statistic
      is usually smaller (see explanation next).

42
The matched pairs experiment: Are we still
testing μ1 − μ2?
  • Note that the difference between the two means is
    equal to the mean of the differences of pairs of
    observations: x̄D = x̄1 − x̄2.
  • A short example
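A tiny numeric check of this identity, using hypothetical paired observations chosen only for illustration (they are not data from the slides):

```python
from statistics import fmean

# Hypothetical paired observations, for illustration only
sample_1 = [48, 52, 61, 57, 45]
sample_2 = [44, 50, 55, 56, 40]

# One difference per pair: D = x1 - x2
differences = [x1 - x2 for x1, x2 in zip(sample_1, sample_2)]

mean_of_differences = fmean(differences)
difference_of_means = fmean(sample_1) - fmean(sample_2)
print(mean_of_differences, difference_of_means)
```

Both quantities equal 3.6 (up to floating-point rounding), because averaging is linear: the mean of x1 − x2 is the mean of x1 minus the mean of x2.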

43
The matched pairs experiment: Reducing the
variability
[Figure: ranges of the observations in sample A and in sample B.]
Observations might markedly differ...
44
The matched pairs experiment: Reducing the
variability
[Figure: range of the differences between paired observations, plotted around 0.]
...but the differences between pairs of
observations might have much smaller variability.
45
The matched pairs experiment
  • Example 13.4 (Example 13.3, part II)
  • It was suspected that salary offers were affected
    by students' GPA (which caused S1² and S2² to
    increase).
  • To reduce this variability, the following
    procedure was used:
    • 25 ranges of GPAs were predetermined.
    • Students from each major were randomly selected,
      one from each GPA range.
    • The highest salary offer for each student was
      recorded.
  • From the data presented, can we conclude that
    Finance majors are offered higher salaries?

46
The matched pairs hypothesis test
  • Solution (by hand)
  • The parameter tested is μD (= μ1 − μ2)
  • The hypotheses: H0: μD = 0   H1: μD > 0
  • The t statistic:
    $t = \dfrac{\bar{x}_D - \mu_D}{s_D/\sqrt{n_D}}$

The rejection region is t > t.05,25−1 = t.05,24 = 1.711
Degrees of freedom: nD − 1
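A minimal sketch of the matched pairs t statistic, computed from the per-pair differences, is shown below. The function name `matched_pairs_t` and the five pairs of values are hypothetical (think of them as offers in $1,000s for five GPA ranges); the slide's Xm13-4.xls data are not reproduced here.

```python
import math
from statistics import fmean, stdev


def matched_pairs_t(sample_1, sample_2, hypothesized_mean_diff=0.0):
    """Paired t statistic computed from the per-pair differences D = x1 - x2."""
    if len(sample_1) != len(sample_2):
        raise ValueError("matched pairs need samples of equal length")
    d = [x1 - x2 for x1, x2 in zip(sample_1, sample_2)]
    n_d = len(d)
    # t = (x̄D - muD) / (sD / sqrt(nD)), with nD - 1 degrees of freedom
    t = (fmean(d) - hypothesized_mean_diff) / (stdev(d) / math.sqrt(n_d))
    return t, n_d - 1


# Hypothetical paired data, for illustration only
finance = [10, 12, 9, 11, 14]
marketing = [8, 11, 9, 9, 11]
t, df = matched_pairs_t(finance, marketing)
print(f"t = {t:.3f}, df = {df}")
```

Only the differences enter the statistic, so variation that the two members of a pair share (here, the GPA range) cancels out before sD is computed.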
47
The matched pairs hypothesis test
  • Solution (by hand), continued
  • From the data (Xm13-4.xls) calculate

48
The matched pairs hypothesis test
  • Solution (by hand), continued
  • Calculate t

See conclusion later
49
The matched pairs hypothesis test
Using Data Analysis in Excel
Xm13-4.xls
50
The matched pairs hypothesis test
Conclusion: There is sufficient evidence to
infer at the 5% significance level that the Finance
MBAs' highest salary offer is, on average,
higher than that of the Marketing MBAs.
51
The matched pairs mean difference estimation
52
The matched pairs mean difference estimation
Using Data Analysis Plus
Xm13-4.xls
First calculate the differences for each pair,
then run the confidence interval procedure in
Data Analysis Plus.
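The same two steps (differences first, then the interval x̄D ± tα/2 sD/√nD) can be sketched outside Excel. The function name `matched_pairs_ci` and the data are hypothetical illustrations, not the Xm13-4.xls values; t.025,4 = 2.776 is the standard table value for 4 degrees of freedom.

```python
import math
from statistics import fmean, stdev


def matched_pairs_ci(differences, t_crit):
    """Confidence interval for muD: x̄D ± t_crit * sD / sqrt(nD)."""
    n_d = len(differences)
    x_bar_d = fmean(differences)
    half_width = t_crit * stdev(differences) / math.sqrt(n_d)
    return x_bar_d - half_width, x_bar_d + half_width


# Step 1: compute the difference for each pair (hypothetical data, illustration only)
sample_1 = [10, 12, 9, 11, 14]
sample_2 = [8, 11, 9, 9, 11]
differences = [x1 - x2 for x1, x2 in zip(sample_1, sample_2)]

# Step 2: 95% interval with t.025,4 = 2.776 (nD - 1 = 4 degrees of freedom)
lcl, ucl = matched_pairs_ci(differences, t_crit=2.776)
print(f"({lcl:.4f}, {ucl:.4f})")
```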
53
Checking the required conditions for the paired
observations case
  • The validity of the results depends on the
    normality of the differences.

54
13.5 Inferences about the ratio of two variances
  • In this section we draw inferences about the
    relationship between two population variances.
  • This question is interesting because:
    • Variances can be used to evaluate the consistency
      of processes.
    • The relationship between the variances determines
      the technique used to test relationships between
      mean values.

55
Parameter tested and statistic
  • The parameter tested is σ1²/σ2².
  • The statistic used is S1²/S2².
  • The sampling distribution of S1²/S2²:
    • The statistic $F = \dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$ follows the
      F distribution with numerator d.f. ν1 = n1 − 1 and
      denominator d.f. ν2 = n2 − 1, provided both
      populations are normal.

56
Parameter tested and statistic
  • Our null hypothesis is always
    H0: σ1²/σ2² = 1

58
Testing the ratio of two population variances
Example 13.6 (revisiting 13.1)
  • (see Example 13.1)
  • In order to test whether having a rich-in-fiber
    breakfast reduces the amount of caloric intake at
    lunch, we need to decide whether the variances
    are equal or not.

Caloric intake at lunch
59
Testing the ratio of two population variances
  • Solving by hand
  • The rejection region is F < F.975,ν1,ν2 or
    F > F.025,ν1,ν2
  • The F statistic value is F = S1²/S2² = .3845
  • Conclusion: Because .3845 < .63, we can reject the
    null hypothesis in favor of the alternative
    hypothesis and conclude that there is sufficient
    evidence in the data to argue, at the 5% significance
    level, that the variances of the two groups differ.
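The test itself reduces to one division and two comparisons, sketched below with a hypothetical helper `f_test_variance_ratio`. The critical values must be supplied from an F table: the lower value .63 is the one used on this slide, and the upper value is approximated by F.025,40,120 = 1.61, the table value used in Example 13.7. Both are inputs, not computed here.

```python
def f_test_variance_ratio(s1_sq, s2_sq, lower_crit, upper_crit):
    """Two-tail F test of H0: sigma1²/sigma2² = 1 using supplied critical values."""
    f = s1_sq / s2_sq
    return f, (f < lower_crit or f > upper_crit)


# Example 13.6: sample variances for consumers (S1²) and non-consumers (S2²)
# Critical values: lower .63 (from the slide); upper approximated by F.025,40,120 = 1.61
f, reject = f_test_variance_ratio(4103, 10_670, lower_crit=0.63, upper_crit=1.61)
print(f"F = {f:.4f}, reject H0: {reject}")  # F = 0.3845, reject H0: True
```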

60
Testing the ratio of two population variances
Example 13.6 (revisiting 13.1)
  • (see Xm13.1)

From Data Analysis
61
Estimating the Ratio of Two Population Variances
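  • From the statistic F = (S1²/σ1²)/(S2²/σ2²) we
    can isolate σ1²/σ2² and build the following
    confidence interval:
    $\dfrac{S_1^2}{S_2^2}\cdot\dfrac{1}{F_{\alpha/2,\nu_1,\nu_2}} < \dfrac{\sigma_1^2}{\sigma_2^2} < \dfrac{S_1^2}{S_2^2}\cdot F_{\alpha/2,\nu_2,\nu_1}$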

62
Estimating the Ratio of Two Population Variances
  • Example 13.7
  • Determine the 95% confidence interval estimate of
    the ratio of the two population variances in
    Example 13.1.
  • Solution
  • We find Fα/2,ν1,ν2 = F.025,40,120 = 1.61
    (approximately) and Fα/2,ν2,ν1 = F.025,120,40 = 1.72
    (approximately).
  • LCL = (S1²/S2²) · 1/Fα/2,ν1,ν2 =
    (4102.98/10,669.770) · 1/1.61 = .2388
  • UCL = (S1²/S2²) · Fα/2,ν2,ν1 =
    (4102.98/10,669.770) · 1.72 = .6614
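The interval arithmetic can be reproduced as below. The helper name `variance_ratio_ci` is hypothetical, and the two F values are the approximate table values quoted on the slide, passed in rather than computed.

```python
def variance_ratio_ci(s1_sq, s2_sq, f_v1_v2, f_v2_v1):
    """CI for sigma1²/sigma2²: LCL = (S1²/S2²) / F(a/2,v1,v2), UCL = (S1²/S2²) * F(a/2,v2,v1)."""
    ratio = s1_sq / s2_sq
    return ratio / f_v1_v2, ratio * f_v2_v1


# Example 13.7 inputs, using the approximate table values F.025,40,120 and F.025,120,40
lcl, ucl = variance_ratio_ci(4102.98, 10_669.770, f_v1_v2=1.61, f_v2_v1=1.72)
print(f"({lcl:.4f}, {ucl:.4f})")  # (0.2388, 0.6614)
```

Because the whole interval lies below 1, it points the same way as the F test: σ1² appears to be smaller than σ2².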

63
13.6 Inference about the difference between two
population proportions
  • In this section we deal with two populations
    whose data are nominal.
  • For nominal data we compare the population
    proportions of the occurrence of a certain event.
  • Examples:
    • Comparing the effectiveness of a new drug vs. an
      old one
    • Comparing market share before and after an
      advertising campaign
    • Comparing defective rates between two machines

64
Parameter tested and statistic
  • Parameter
    • When the data are nominal, we can only count the
      occurrences of a certain event in the two
      populations, and calculate proportions.
    • The parameter tested is therefore p1 − p2.
  • Statistic
    • An unbiased estimator of p1 − p2 is p̂1 − p̂2
      (the difference between the sample proportions).

65
Sampling distribution of p̂1 − p̂2
  • Two random samples are drawn from two
    populations.
  • The number of successes in each sample is
    recorded.
  • The sample proportions are computed.

Sample 1: sample size n1, number of successes x1,
sample proportion p̂1 = x1/n1
Sample 2: sample size n2, number of successes x2,
sample proportion p̂2 = x2/n2
66
Sampling distribution of p̂1 − p̂2
  • The statistic p̂1 − p̂2 is approximately
    normally distributed if n1p1, n1(1 − p1), n2p2,
    n2(1 − p2) are all equal to or greater than 5.
  • The mean of p̂1 − p̂2 is p1 − p2.
  • The variance of p̂1 − p̂2 is
    p1(1 − p1)/n1 + p2(1 − p2)/n2.

67
The z-statistic
  • $Z = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}}$
68
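Testing p1 − p2
  • There are two cases to consider:

Case 1: H0: p1 − p2 = 0. Calculate the pooled
proportion
$\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$
Then
$z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
Case 2: H0: p1 − p2 = D (D is not equal to 0). Do
not pool the data.
Then
$z = \dfrac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}}$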
69
Testing p1 − p2 (Case I)
  • Example 13.8
  • Management needs to decide which of two new
    packaging designs to adopt, to help improve sales
    of a certain soap.
  • A study is performed in two communities:
    • Design A is distributed in Community 1.
    • Design B is distributed in Community 2.
    • The old design package is still offered in both
      communities.
  • Design A is more expensive; therefore, to be
    financially viable it has to outsell design B.

70
Testing p1 − p2 (Case I)
  • Summary of the experiment results
    • Community 1: 580 packages with new design A
      sold; 324 packages with old design sold
    • Community 2: 604 packages with new design B
      sold; 442 packages with old design sold
  • Use a 5% significance level and perform a test to
    find which type of packaging to use.

71
Testing p1 − p2 (Case I)
  • Solution
  • The problem objective is to compare the
    populations of sales of the two packaging designs.
  • The data are qualitative (yes/no for the purchase
    of the new design per customer).
  • The hypotheses to test are
    H0: p1 − p2 = 0   H1: p1 − p2 > 0
  • We identify here case 1.

Population 1: purchases of Design A
Population 2: purchases of Design B
72
Testing p1 − p2 (Case I)
  • Solving by hand
  • For a 5% significance level the rejection region
    is z > zα = z.05 = 1.645

73
Testing p1 − p2 (Case I)
  • Conclusion: At the 5% significance level there is
    sufficient evidence to infer that the proportion
    of sales with design A is greater than the
    proportion of sales with design B (since
    2.89 > 1.645).
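The Case 1 computation can be sketched with a hypothetical helper, `two_proportion_z_pooled`. It treats each community's sample as all packages sold there (new design plus old design), so n1 = 580 + 324 = 904 and n2 = 604 + 442 = 1,046.

```python
import math


def two_proportion_z_pooled(x1, n1, x2, n2):
    """Case 1 (H0: p1 - p2 = 0): z statistic using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


# Example 13.8: new-design sales out of all packages sold in each community
x1, n1 = 580, 580 + 324   # design A: 580 of 904
x2, n2 = 604, 604 + 442   # design B: 604 of 1,046
z = two_proportion_z_pooled(x1, n1, x2, n2)
print(round(z, 2), z > 1.645)  # 2.89 True
```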

74
Testing p1 − p2 (Case I)
Additional example
  • Excel (Data Analysis Plus)
  • Conclusion
  • Since 2.89 > 1.645, there is sufficient evidence
    in the data to conclude, at the 5% significance
    level, that design A will outsell design B.

75
Testing p1 − p2 (Case II)
  • Example 13.9 (revisiting Example 13.8)
  • Management needs to decide which of two new
    packaging designs to adopt, to help improve sales
    of a certain soap.
  • A study is performed in two communities:
    • Design A is distributed in Community 1.
    • Design B is distributed in Community 2.
    • The old design package is still offered in both
      communities.
  • For design A to be financially viable, it has to
    outsell design B by at least 3%.

76
Testing p1 − p2 (Case II)
  • Summary of the experiment results
    • Community 1: 580 packages with new design A
      sold; 324 packages with old design sold
    • Community 2: 604 packages with new design B
      sold; 442 packages with old design sold
  • Use a 5% significance level and perform a test to
    find which type of packaging to use.

77
Testing p1 − p2 (Case II)
  • Solution
  • The hypotheses to test are
    H0: p1 − p2 = .03   H1: p1 − p2 > .03
  • We identify case 2 of the test for the difference
    in proportions (the hypothesized difference is not
    equal to zero).

78
Testing p1 − p2 (Case II)
  • Solving by hand

The rejection region is z > zα = z.05 = 1.645.
Conclusion: Since 1.55 < 1.645, do not reject the
null hypothesis. There is insufficient evidence to
infer that packaging with Design A will outsell
that of Design B by 3% or more.
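A sketch of the Case 2 statistic, using a hypothetical helper `two_proportion_z_unpooled` and the same counts as Example 13.8 with D = .03. Evaluated with these counts, the unpooled formula gives z ≈ 1.55, which is still below 1.645, so the decision not to reject H0 stands.

```python
import math


def two_proportion_z_unpooled(x1, n1, x2, n2, d):
    """Case 2 (H0: p1 - p2 = D, D != 0): z statistic without pooling."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return (p1_hat - p2_hat - d) / se


# Example 13.9: same counts as Example 13.8, hypothesized difference D = .03
z = two_proportion_z_unpooled(580, 904, 604, 1046, d=0.03)
print(f"z = {z:.2f}, reject H0 at 5%: {z > 1.645}")  # z = 1.55, reject H0 at 5%: False
```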
79
Testing p1 p2 (Case II)
  • Using Excel (Data Analysis Plus)

Xm13-08.xls
80
Estimating p1 − p2
  • Example (estimating the cost per life saved)
  • Two drugs are used to treat heart attack victims:
    • Streptokinase (available since 1959, costs $460)
    • t-PA (genetically engineered, costs $2,900)
  • The maker of t-PA claims that its drug
    outperforms Streptokinase.
  • An experiment was conducted in 15 countries:
    • 20,500 patients were given t-PA.
    • 20,500 patients were given Streptokinase.
  • The number of deaths by heart attacks was
    recorded.

81
Estimating p1 − p2
  • Experiment results
  • A total of 1497 patients treated with
    Streptokinase died.
  • A total of 1292 patients treated with t-PA died.
  • Estimate the cost per life saved by using t-PA
    instead of Streptokinase.

82
Estimating p1 − p2
  • Solution
  • The problem objective: compare the outcomes of
    two treatments.
  • The data are nominal (a patient lived/died).
  • The parameter estimated is p1 − p2:
    • p1 = death rate with t-PA
    • p2 = death rate with Streptokinase

83
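Estimating p1 − p2
  • Solving by hand
  • Sample proportions: p̂1 = 1292/20,500 = .0630 and
    p̂2 = 1497/20,500 = .0730
  • The 95% confidence interval is
    $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$
    = −.01 ± .0049, that is, (−.0149, −.0051)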
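The interval can be computed from the recorded death counts with a hypothetical helper, `two_proportion_ci`, using z.025 = 1.96 and the unpooled standard error (no hypothesized difference is involved when estimating).

```python
import math


def two_proportion_ci(x1, n1, x2, n2, z_crit=1.96):
    """Confidence interval for p1 - p2 with the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_crit * se, diff + z_crit * se


# p1: death rate with t-PA (1292 of 20,500); p2: death rate with Streptokinase (1497 of 20,500)
lcl, ucl = two_proportion_ci(1292, 20_500, 1497, 20_500)
print(f"({lcl:.4f}, {ucl:.4f})")  # (-0.0149, -0.0051)
```

Both limits are negative, so the death rate with t-PA is estimated to be lower than with Streptokinase.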

84
Estimating p1 − p2
  • Interpretation
  • We estimate that between .51% and 1.49% more
    heart attack victims will survive because of the
    use of t-PA.
  • The difference in cost per patient is
    $2900 − $460 = $2440.
  • The cost per life saved by switching to t-PA is
    estimated to be between $2440/.0149 = $163,758
    and $2440/.0051 = $478,431.
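The cost-per-life arithmetic divides the extra cost per patient by the number of lives saved per patient treated. This sketch uses the slide's prices and the rounded interval limits (.0051 and .0149) exactly as written above.

```python
# Extra cost per patient of t-PA over Streptokinase (prices from the slides)
extra_cost_per_patient = 2900 - 460   # $2,440

# Rounded limits of the 95% interval for the reduction in the death rate
reduction_low, reduction_high = 0.0051, 0.0149

# Cost per life saved = extra cost per patient / lives saved per patient
cost_best_case = extra_cost_per_patient / reduction_high   # larger reduction, lower cost
cost_worst_case = extra_cost_per_patient / reduction_low

print(f"${cost_best_case:,.0f} to ${cost_worst_case:,.0f} per life saved")
# $163,758 to $478,431 per life saved
```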