CHI-SQUARE TESTS

Slides: 93

Provided by: itd976

1
CHAPTER 11

CHI-SQUARE TESTS

2
THE CHI-SQUARE DISTRIBUTION

Definition
The chi-square distribution has only one
parameter, called the degrees of freedom (df). The
shape of a chi-square distribution curve is
skewed to the right for small df and becomes
approximately symmetric for large df. The entire
chi-square distribution curve lies to the right of the
vertical axis. The chi-square distribution
assumes nonnegative values only, and these are
denoted by the symbol χ² (read as chi-square).

3
Figure 11.1 Three chi-square distribution curves.
4
Example 11-1

Find the value of χ² for 7 degrees of freedom and
an area of .10 in the right tail of the
chi-square distribution curve.

5
Table 11.1 χ² for df = 7 and .10 Area in the
Right Tail
Area in the Right Tail Under the Chi-Square Distribution Curve
df    .995     ...   .100      ...   .005
1     .000     ...   2.706     ...   7.879
2     .010     ...   4.605     ...   10.597
...
7     .989     ...   12.017    ...   20.278
...
100   67.328   ...   118.498   ...   140.169
Required value of χ²: 12.017
6
Figure 11.2 (chi-square curve, df = 7: area .10 in the right tail, to the right of χ² = 12.017)
7
Example 11-2

Find the value of χ² for 12 degrees of freedom
and area of .05 in the left tail of the
chi-square distribution curve.

8
Solution 11-2

Area in the right tail
= 1 − Area in the left tail
= 1 − .05 = .95

9
Table 11.2 χ² for df = 12 and .95 Area in the
Right Tail
Area in the Right Tail Under the Chi-Square Distribution Curve
df    .995     ...   .950     ...   .005
1     .000     ...   .004     ...   7.879
2     .010     ...   .103     ...   10.597
...
12    3.074    ...   5.226    ...   28.300
...
100   67.328   ...   77.929   ...   140.169
Required value of χ²: 5.226
10
Figure 11.3 (chi-square curve, df = 12: shaded area .95 to the right of χ² = 5.226, area .05 in the left tail)
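
The table lookups in Examples 11-1 and 11-2 can be cross-checked numerically. The following is a minimal sketch using only Python's standard library. The function names chi2_sf and chi2_critical are hypothetical, and the right-tail area is computed with a standard recurrence for the incomplete gamma function plus bisection, which is an assumption of this sketch and not a method shown on the slides.

```python
import math


def chi2_sf(x, df):
    """Right-tail area P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Closed-form starting values: df = 2 (even case) or df = 1 (odd case).
    if df % 2 == 0:
        a, area = 1.0, math.exp(-z)
    else:
        a, area = 0.5, math.erfc(math.sqrt(z))
    # Each pass raises the degrees of freedom by 2 (a = df / 2 grows by 1).
    while a < df / 2.0:
        area += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return area


def chi2_critical(df, right_tail_area):
    """Chi-square value with the given area to its right, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > right_tail_area:  # widen until the value is bracketed
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > right_tail_area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Example 11-1: df = 7, area .10 in the right tail (Table 11.1 gives 12.017).
print(round(chi2_critical(7, 0.10), 3))
# Example 11-2: area .05 in the left tail means .95 in the right tail
# (Table 11.2 gives 5.226).
print(round(chi2_critical(12, 1 - 0.05), 3))
```

The left-tail case needs no separate routine: converting to a right-tail area first, as in Solution 11-2, lets one function serve both examples.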
11
A GOODNESS-OF-FIT TEST

Definition
An experiment with the following characteristics
is called a multinomial experiment.

12
Multinomial Experiment cont.

1. It consists of n identical trials (repetitions).
2. Each trial results in one of k possible outcomes
(or categories), where k > 2.
3. The trials are independent.
4. The probabilities of the various outcomes remain
constant for each trial.

13
A GOODNESS-OF-FIT TEST cont.

Definition
The frequencies obtained from the performance of
an experiment are called the observed frequencies
and are denoted by O. The expected frequencies,
denoted by E, are the frequencies that we expect
to obtain if the null hypothesis is true. The
expected frequency for a category is obtained as
E = np
where n is the sample size and p is the
probability that an element belongs to that
category if the null hypothesis is true.

14
A GOODNESS-OF-FIT TEST cont.

Degrees of Freedom for a Goodness-of-Fit Test
In a goodness-of-fit test, the degrees of freedom
are
df = k − 1
where k denotes the number of possible outcomes
(or categories) for the experiment.

15
Test Statistic for a Goodness-of-Fit Test

The test statistic for a goodness-of-fit test is
χ², and its value is calculated as
χ² = Σ (O − E)² / E
where
O = observed frequency for a category
E = expected frequency for a category = np
Remember that a chi-square goodness-of-fit test
is always right-tailed.

16
Example 11-3

A bank has an ATM installed inside the bank, and
it is available to its customers only from 7 AM
to 6 PM Monday through Friday. The manager of the
bank wanted to investigate if the percentage of
transactions made on this ATM is the same for
each of the five days (Monday through Friday) of
the week. She randomly selected one week and
counted the number of transactions made on this
ATM on each of the five days during this week.
The information she obtained is given in the
following table, where the number of users
represents the number of transactions on this ATM
on these days. For convenience, we will refer to
these transactions as people or users.

17
Example 11-3

At the 1% level of significance, can we reject
the null hypothesis that the proportion of people
who use this ATM each of the five days of the
week is the same? Assume that this week is
typical of all weeks in regard to the use of this
ATM.

Day Monday Tuesday Wednesday Thursday Friday
Number of users 253 197 204 279 267
18
Solution 11-3

H0: p1 = p2 = p3 = p4 = p5 = .20
H1: At least two of the five proportions are
not equal to .20

19
Solution 11-3

There are five categories
Five days on which the ATM is used
Multinomial experiment
We use the chi-square distribution to make this
test.

20
Solution 11-3

Area in the right tail = α = .01
k = number of categories = 5
df = k − 1 = 5 − 1 = 4
The critical value of χ² = 13.277

21
Figure 11.4 (chi-square curve: reject H0 to the right of the critical value χ² = 13.277, where the right-tail area is α = .01; do not reject H0 to the left)
22
Table 11.3
Category (Day)   Observed Frequency O   p     Expected Frequency E = np   (O − E)   (O − E)²   (O − E)²/E
Monday           253                    .20   1200(.20) = 240             13        169        .704
Tuesday          197                    .20   1200(.20) = 240             −43       1849       7.704
Wednesday        204                    .20   1200(.20) = 240             −36       1296       5.400
Thursday         279                    .20   1200(.20) = 240             39        1521       6.338
Friday           267                    .20   1200(.20) = 240             27        729        3.038
n = 1200   Sum = 23.184
23
Solution 11-3

All the required calculations to find the value
of the test statistic χ² are shown in Table 11.3.

24
Solution 11-3

The value of the test statistic χ² = 23.184 is
larger than the critical value of χ² = 13.277.
It falls in the rejection region.
Hence, we reject the null hypothesis.
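
The columns of Table 11.3 can be reproduced with a few lines of Python. This is only an illustrative sketch: the variable names are hypothetical, the counts are those of Table 11.3 (Thursday = 279, which gives n = 1200), and the critical value 13.277 is copied from the slides, not computed.

```python
# Observed ATM transactions per day, as listed in Table 11.3.
observed = {
    "Monday": 253,
    "Tuesday": 197,
    "Wednesday": 204,
    "Thursday": 279,
    "Friday": 267,
}
p_null = 0.20                  # H0: each day has the same proportion
n = sum(observed.values())     # sample size
expected = n * p_null          # E = np, identical for every day under H0

# Per-category contributions (O - E)^2 / E, the last column of the table.
contributions = {
    day: (o - expected) ** 2 / expected for day, o in observed.items()
}
chi_square = sum(contributions.values())

critical_value = 13.277        # df = 4, alpha = .01, taken from the slides
reject_h0 = chi_square > critical_value  # the test is right-tailed

for day, value in contributions.items():
    print(f"{day:<10} {value:.3f}")
print(f"chi-square = {chi_square:.3f}, reject H0: {reject_h0}")
```

Summing unrounded contributions can differ from the table's 23.184 in the last decimal place, because the table adds values already rounded to three decimals.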

25
Example 11-4

In a National Public Transportation survey
conducted in 1995 on the modes of transportation
used to commute to work, 79.6% of the respondents
said that they drive alone, 11.1% carpool, 5.1%
use public transit, and 4.2% depend on other
modes of transportation (USA TODAY, April 14,
1999). Assume that these percentages hold true
for the 1995 population of all commuting workers.
Recently 1000 randomly selected workers were
asked what mode of transportation they use to
commute to work. The following table lists the
results of this survey.

26
Example 11-4
Mode of transportation Drive alone Carpool Public transit Other
Number of workers 812 102 57 29

Test at the 2.5% significance level whether the
current pattern of use of transportation modes is
different from that for 1995.
27
Solution 11-4

H0: The current percentage distribution of
the use of transportation modes is the same
as that for 1995.
H1: The current percentage distribution of
the use of transportation modes is different
from that for 1995.

28
Solution 11-4

There are four categories
Drive alone, carpool, public transit, and other
Multinomial experiment
We use the chi-square distribution to make the
test.

29
Solution 11-4

Area in the right tail = α = .025
k = number of categories = 4
df = k − 1 = 4 − 1 = 3
The critical value of χ² = 9.348

30
Figure 11.5 (chi-square curve: reject H0 to the right of the critical value χ² = 9.348, where the right-tail area is α = .025; do not reject H0 to the left)
31
Table 11.4
Category         Observed Frequency O   p      Expected Frequency E = np   (O − E)   (O − E)²   (O − E)²/E
Drive alone      812                    .796   1000(.796) = 796            16        256        .322
Carpool          102                    .111   1000(.111) = 111            −9        81         .730
Public transit   57                     .051   1000(.051) = 51             6         36         .706
Other            29                     .042   1000(.042) = 42             −13       169        4.024
n = 1000   Sum = 5.782
32
Solution 11-4

All the required calculations to find the value
of the test statistic χ² are shown in Table 11.4.

33
Solution 11-4

The value of the test statistic χ² = 5.782 is
less than the critical value of χ² = 9.348.
It falls in the nonrejection region.
Hence, we fail to reject the null hypothesis.
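
When the null hypothesis assigns a different probability to each category, as in Example 11-4, the same steps can be wrapped in a small helper. The function name goodness_of_fit is hypothetical and this is only a sketch; the critical value 9.348 is copied from the slides.

```python
def goodness_of_fit(observed, null_probs):
    """Return (chi-square statistic, degrees of freedom, expected frequencies)."""
    n = sum(observed)
    expected = [n * p for p in null_probs]          # E = np for each category
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1, expected   # df = k - 1


# Example 11-4: drive alone, carpool, public transit, other.
observed = [812, 102, 57, 29]
null_probs = [0.796, 0.111, 0.051, 0.042]           # 1995 percentages under H0

statistic, df, expected = goodness_of_fit(observed, null_probs)
critical_value = 9.348                              # df = 3, alpha = .025 (slides)
reject_h0 = statistic > critical_value

print(f"df = {df}, chi-square = {statistic:.3f}, reject H0: {reject_h0}")
```

Returning the expected frequencies alongside the statistic makes it easy to compare them with the E column of Table 11.4.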

34
CONTINGENCY TABLES
Table 11.5 Total 2002 Enrollment at a University
Full-Time Part-Time
Male 6768 2615
Female 7658 3717
Students who are male and enrolled part-time
35
A TEST OF INDEPENDENCE OR HOMOGENEITY

A Test of Independence
A Test of Homogeneity

36
A Test of Independence

Definition
A test of independence involves a test of the
null hypothesis that two attributes of a
population are not related. The degrees of
freedom for a test of independence are
df = (R − 1)(C − 1)
where R and C are the number of rows and the
number of columns, respectively, in the given
contingency table.

37
A Test of Independence cont.

Test Statistic for a Test of Independence
The value of the test statistic χ² for a test of
independence is calculated as
χ² = Σ (O − E)² / E
where O and E are the observed and expected
frequencies, respectively, for a cell.

38
Example 11-5

Violence and lack of discipline have become major
problems in schools in the United States. A
random sample of 300 adults was selected, and
they were asked if they favor giving more freedom
to schoolteachers to punish students for violence
and lack of discipline. The two-way
classification of the responses of these adults
is represented in the following table.

39
Example 11-5

Calculate the expected frequencies for this table
assuming that the two attributes, gender and
opinions on the issue, are independent.

            In Favor (F)   Against (A)   No Opinion (N)
Men (M)     93             70            12
Women (W)   87             32            6
40
Table 11.6

Solution 11-5

In Favor (F) Against (A) No Opinion (N) Row Totals
Men (M) 93 70 12 175
Women (W) 87 32 6 125
Column Totals 180 102 18 300
41
Expected Frequencies for a Test of Independence

The expected frequency E for a cell is calculated
as
E = (Row total)(Column total) / Sample size

42
Table 11.7

Solution 11-5

In Favor (F) Against (A) No Opinion (N) Row Totals
Men (M) 93 (105.00) 70 (59.50) 12 (10.50) 175
Women (W) 87 (75.00) 32 (42.50) 6 (7.50) 125
Column Totals 180 102 18 300
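
The expected frequencies shown in parentheses in Table 11.7 follow from the row and column totals alone. A minimal sketch, with hypothetical variable names, that rebuilds them from the observed counts of Example 11-5:

```python
# Observed counts from Example 11-5: rows are Men, Women;
# columns are In Favor, Against, No Opinion.
observed = [
    [93, 70, 12],
    [87, 32, 6],
]

row_totals = [sum(row) for row in observed]         # 175, 125
col_totals = [sum(col) for col in zip(*observed)]   # 180, 102, 18
n = sum(row_totals)                                 # sample size, 300

# E = (row total)(column total) / sample size for every cell.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, row in zip(["Men", "Women"], expected):
    print(label, [f"{e:.2f}" for e in row])
```

Each row of expected frequencies sums to its row total, which is a quick sanity check on the arithmetic.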
43
Example 11-6

Reconsider the two-way classification table given
in Example 11-5. In that example, a random sample
of 300 adults was selected, and they were asked
if they favor giving more freedom to
schoolteachers to punish students for violence
and lack of discipline. Based on the results of
the survey, a two-way classification table was
prepared and presented in Example 11-5. Does the
sample provide sufficient information to conclude
that the two attributes, gender and opinions of
adults, are dependent? Use a 1% significance
level.

44
Solution 11-6

H0: Gender and opinions of adults are
independent
H1: Gender and opinions of adults are
dependent

45
Solution 11-6

α = .01
df = (R − 1)(C − 1) = (2 − 1)(3 − 1) = 2
The critical value of χ² = 9.210

46
Figure 11.6 (chi-square curve: reject H0 to the right of the critical value χ² = 9.210, where the right-tail area is α = .01; do not reject H0 to the left)
47
Table 11.8
In Favor (F) Against (A) No Opinion (N) Row Totals
Men (M) 93 (105.00) 70 (59.50) 12 (10.50) 175
Women (W) 87 (75.00) 32 (42.50) 6 (7.50) 125
Column Totals 180 102 18 300
48
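Solution 11-6

χ² = Σ (O − E)² / E
= (93 − 105.00)²/105.00 + (70 − 59.50)²/59.50 + (12 − 10.50)²/10.50
+ (87 − 75.00)²/75.00 + (32 − 42.50)²/42.50 + (6 − 7.50)²/7.50
= 1.371 + 1.853 + .214 + 1.920 + 2.594 + .300 = 8.252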
49
Solution 11-6

The value of the test statistic χ² = 8.252.
It is less than the critical value of χ² = 9.210.
It falls in the nonrejection region.
Hence, we fail to reject the null hypothesis.
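
The test of independence in Example 11-6 can be sketched as one function that works for any R × C table. The name chi_square_independence is hypothetical, and the critical value 9.210 is copied from the slides.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected frequency
            statistic += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)     # (R - 1)(C - 1)
    return statistic, df


# Table 11.8: rows are Men, Women; columns are In Favor, Against, No Opinion.
table = [[93, 70, 12], [87, 32, 6]]
statistic, df = chi_square_independence(table)

critical_value = 9.210                              # df = 2, alpha = .01 (slides)
reject_h0 = statistic > critical_value
print(f"df = {df}, chi-square = {statistic:.3f}, reject H0: {reject_h0}")
```

The unrounded statistic may differ from the slide value of 8.252 in the third decimal place, since the slides add terms that were rounded first.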

50
Example 11-7

A researcher wanted to study the relationship
between gender and owning cell phones. She took a
sample of 2000 adults and obtained the
information given in the following table.

51
Example 11-7

At the 5% level of significance, can you conclude
that gender and owning cell phones are related
for all adults?

        Own Cell Phones   Do Not Own Cell Phones
Men     640               450
Women   440               470
52
Solution 11-7

H0: Gender and owning a cell phone are not
related
H1: Gender and owning a cell phone are related

53
Solution 11-7

We are performing a test of independence
We use the chi-square distribution
α = .05
df = (R − 1)(C − 1) = (2 − 1)(2 − 1) = 1
The critical value of χ² = 3.841

54
Figure 11.7 (chi-square curve: reject H0 to the right of the critical value χ² = 3.841, where the right-tail area is α = .05; do not reject H0 to the left)
55
Table 11.9
Own Cell Phones (Y) Do Not Own Cell Phones (N) Row Totals
Men (M) 640 (588.60) 450 (501.40) 1090
Women (W) 440 (491.40) 470 (418.60) 910
Column Totals 1080 920 2000
56
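Solution 11-7

χ² = Σ (O − E)² / E
= (640 − 588.60)²/588.60 + (450 − 501.40)²/501.40
+ (440 − 491.40)²/491.40 + (470 − 418.60)²/418.60
= 4.489 + 5.269 + 5.376 + 6.311 = 21.445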
57
Solution 11-7

The value of the test statistic χ² = 21.445.
It is larger than the critical value of χ² = 3.841.
It falls in the rejection region.
Hence, we reject the null hypothesis.
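
For a 2 × 2 table such as Table 11.9, the same statistic can also be obtained from an algebraically equivalent shortcut, χ² = n(ad − bc)² / [(a + b)(c + d)(a + c)(b + d)]. The slides do not use this shortcut; it is swapped in here only as a cross-check, and the function name is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2 x 2 table [[a, b], [c, d]].

    Uses the shortcut n(ad - bc)^2 / (row1 * row2 * col1 * col2), which
    equals the sum of (O - E)^2 / E over the four cells.
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Table 11.9: men 640 own / 450 do not; women 440 own / 470 do not.
statistic = chi_square_2x2(640, 450, 440, 470)

critical_value = 3.841          # df = 1, alpha = .05, taken from the slides
reject_h0 = statistic > critical_value
print(f"chi-square = {statistic:.3f}, reject H0: {reject_h0}")
```

The result agrees with the slide value of 21.445 up to rounding in the last digit.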

58
A Test of Homogeneity

Definition
A test of homogeneity involves testing the null
hypothesis that the proportions of elements with
certain characteristics in two or more different
populations are the same against the alternative
hypothesis that these proportions are not the
same.

59
Example 11-8

Consider the data on income distributions for
households in California and Wisconsin given in
the following table.

California Wisconsin Row Totals
High Income 70 34 104
Medium Income 80 40 120
Low Income 100 76 176
Column Totals 250 150 400
60
Example 11-8

Using the 2.5% significance level, test the null
hypothesis that the distribution of households
with regard to income levels is similar
(homogeneous) for the two states.

61
Solution 11-8

H0: The proportions of households that belong
to different income groups are the same in
both states
H1: The proportions of households that belong
to different income groups are not the same
in both states

62
Solution 11-8

α = .025
df = (R − 1)(C − 1) = (3 − 1)(2 − 1) = 2
The critical value of χ² = 7.378

63
Figure 11.8 (chi-square curve: reject H0 to the right of the critical value χ² = 7.378, where the right-tail area is α = .025; do not reject H0 to the left)
64
Table 11.11
California Wisconsin Row Totals
High income 70 (65) 34 (39) 104
Medium income 80 (75) 40 (45) 120
Low income 100 (110) 76 (66) 176
Column Totals 250 150 400
65
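Solution 11-8

χ² = Σ (O − E)² / E
= (70 − 65)²/65 + (34 − 39)²/39 + (80 − 75)²/75
+ (40 − 45)²/45 + (100 − 110)²/110 + (76 − 66)²/66
= .385 + .641 + .333 + .556 + .909 + 1.515 = 4.339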
66
Solution 11-8

The value of the test statistic χ² = 4.339.
It is less than the critical value of χ² = 7.378.
It falls in the nonrejection region.
Hence, we fail to reject the null hypothesis.
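
In a test of homogeneity the expected frequencies have a direct reading: under H0 every population shares the pooled proportion for each category. The sketch below, with hypothetical variable names, rebuilds the expected counts of Table 11.11 that way and recomputes the statistic; the critical value 7.378 is copied from the slides.

```python
# Observed households (California, Wisconsin) by income level, Example 11-8.
counts = {
    "High income": (70, 34),
    "Medium income": (80, 40),
    "Low income": (100, 76),
}

col_totals = [sum(col) for col in zip(*counts.values())]   # households per state
n = sum(col_totals)

statistic = 0.0
expected_table = {}
for level, row in counts.items():
    pooled_share = sum(row) / n            # common proportion under H0
    expected_row = [pooled_share * total for total in col_totals]
    expected_table[level] = expected_row
    statistic += sum((o - e) ** 2 / e for o, e in zip(row, expected_row))

critical_value = 7.378                     # df = 2, alpha = .025 (slides)
reject_h0 = statistic > critical_value

for level, row in expected_table.items():
    print(level, [round(e, 2) for e in row])
print(f"chi-square = {statistic:.3f}, reject H0: {reject_h0}")
```

Multiplying the pooled share by a column total is the same arithmetic as (row total)(column total) / sample size, so the test statistic is computed exactly as in a test of independence.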

67
INFERENCES ABOUT THE POPULATION VARIANCE

Estimation of the Population Variance
Hypothesis Tests About the Population Variance

68
INFERENCES ABOUT THE POPULATION VARIANCE cont.

Sampling Distribution of (n − 1)s² / σ²
If the population from which the sample is
selected is (approximately) normally distributed,
then
(n − 1)s² / σ²
has a chi-square distribution with n − 1 degrees
of freedom.

69
Estimation of the Population Variance

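Assuming that the population from which the
sample is selected is (approximately) normally
distributed, the (1 − α)100% confidence interval
for the population variance σ² is
(n − 1)s² / χ²(α/2)  to  (n − 1)s² / χ²(1 − α/2)
where χ²(α/2) and χ²(1 − α/2) are the chi-square
values for n − 1 degrees of freedom with areas
α/2 and 1 − α/2, respectively, in the right tail.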

70
Example 11-9

One type of cookie manufactured by Haddad Food
Company is Cocoa Cookies. The machine that fills
packages of these cookies is set up in such a way
that the average net weight of these packages is
32 ounces with a variance of .015 square ounce.

71
Example 11-9

From time to time the quality control inspector
at the company selects a sample of a few such
packages, calculates the variance of the net
weights of these packages, and constructs a 95%
confidence interval for the population variance.
If either one or both of the two limits of this
confidence interval is not in the interval .008 to
.030, the machine is stopped and adjusted.

72
Example 11-9

A recently taken random sample of 25 packages
from the production line gave a sample variance
of .029 square ounce. Based on this sample
information, do you think the machine needs an
adjustment? Assume that the net weights of
cookies in all packages are normally distributed.

73
Solution 11-9

n = 25, s² = .029
α = 1 − .95 = .05
α / 2 = .05 / 2 = .025
1 − α / 2 = 1 − .025 = .975
df = n − 1 = 25 − 1 = 24
χ² for 24 df and .025 area in the right tail
= 39.364
χ² for 24 df and .975 area in the right tail
= 12.401

74
Figure 11.9 (chi-square curve, df = 24: area .025 in the right tail, to the right of χ² = 39.364)
75
Figure 11.9 (chi-square curve, df = 24: area .025 in the left tail, to the left of χ² = 12.401)
76
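Solution 11-9

(n − 1)s² / 39.364 = (25 − 1)(.029) / 39.364 = .0177
(n − 1)s² / 12.401 = (25 − 1)(.029) / 12.401 = .0561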
77
Solution 11-9

Thus, with 95% confidence, we can state that the
variance for all packages of Cocoa Cookies lies
between .0177 and .0561 square ounce.
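
The interval in Solution 11-9 and the inspector's stopping rule can be checked with a short sketch. The variable names are hypothetical, and the two chi-square values are copied from the slides, not computed.

```python
n = 25
sample_variance = 0.029

# Chi-square values for df = 24, taken from the slides.
chi2_right_025 = 39.364    # .025 area in the right tail
chi2_right_975 = 12.401    # .975 area in the right tail

# The larger chi-square value gives the lower limit, and vice versa.
lower_limit = (n - 1) * sample_variance / chi2_right_025
upper_limit = (n - 1) * sample_variance / chi2_right_975

# Stopping rule from Example 11-9: both limits must lie within .008 to .030.
acceptable_low, acceptable_high = 0.008, 0.030
needs_adjustment = not (
    acceptable_low <= lower_limit and upper_limit <= acceptable_high
)

print(f"95% CI for the variance: {lower_limit:.4f} to {upper_limit:.4f}")
print(f"machine needs adjustment: {needs_adjustment}")
```

Under the stated rule the upper limit lies above .030, so this sketch flags the machine for adjustment.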

78
Hypothesis Tests About the Population Variance

The value of the test statistic χ² is calculated
as
χ² = (n − 1)s² / σ²
where s² is the sample variance, σ² is the
hypothesized value of the population variance,
and n − 1 represents the degrees of freedom. The
population from which the sample is selected is
assumed to be (approximately) normally
distributed.

79
Example 11-10

One type of cookie manufactured by Haddad Food
Company is Cocoa Cookies. The machine that fills
packages of these cookies is set up in such a way
that the average net weight of these packages is
32 ounces with a variance of .015 square ounce.
From time to time the quality control inspector
at the company selects a sample of a few such
packages, calculates the variance of the net
weights of these packages, and makes a test of
hypothesis about the population variance.

80
Example 11-10

She always uses α = .01. The acceptable value of
the population variance is .015 square ounce or
less. If the conclusion from the test of
hypothesis is that the population variance is not
within the acceptable limit, the machine is
stopped and adjusted.

81
Example 11-10

A recently taken random sample of 25 packages
from the production line gave a sample variance
of .029 square ounce. Based on this sample
information, do you think the machine needs an
adjustment? Assume that the net weights of
cookies in all packages are normally distributed.

82
Solution 11-10

H0: σ² ≤ .015
The population variance is within the acceptable
limit
H1: σ² > .015
The population variance exceeds the acceptable
limit

83
Solution 11-10

α = .01
df = n − 1 = 25 − 1 = 24
The critical value of χ² = 42.980

84
Figure 11.10 (chi-square curve: reject H0 to the right of the critical value χ² = 42.980, where the right-tail area is α = .01; do not reject H0 to the left)
85
Solution 11-10

χ² = (n − 1)s² / σ² = (25 − 1)(.029) / .015 = 46.400
where σ² = .015 is the value from H0
86
Solution 11-10

The value of the test statistic χ² = 46.400.
It is greater than the critical value of χ² = 42.980.
It falls in the rejection region.
Hence, we reject the null hypothesis H0.
We conclude that the population variance is not
within the acceptable limit.
The machine should be stopped and adjusted.
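
The right-tailed variance test of Example 11-10 reduces to one line of arithmetic plus a comparison. A minimal sketch with hypothetical variable names; the critical value 42.980 is copied from the slides.

```python
n = 25
sample_variance = 0.029
hypothesized_variance = 0.015      # boundary value from H0

# Test statistic: chi-square = (n - 1) s^2 / sigma^2.
statistic = (n - 1) * sample_variance / hypothesized_variance

critical_value = 42.980            # df = 24, alpha = .01, from the slides
reject_h0 = statistic > critical_value   # right-tailed: H1 is sigma^2 > .015

print(f"chi-square = {statistic:.3f}, reject H0: {reject_h0}")
```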

87
Example 11-11

The variance of scores on a standardized
mathematics test for all high school seniors was
150 in 2002. A sample of scores for 20 high
school seniors who took this test this year gave
a variance of 170. Test at the 5% significance
level if the variance of current scores of all
high school seniors on this test is different
from 150. Assume that the scores of all high
school seniors on this test are (approximately)
normally distributed.

88
Solution 11-11

H0: σ² = 150
The population variance is not different from 150
H1: σ² ≠ 150
The population variance is different from 150

89
Solution 11-11

α = .05
Area in each tail = α / 2 = .025
df = n − 1 = 20 − 1 = 19
The critical values of χ² are 32.852 and 8.907

90
Figure 11.11 (chi-square curve with two critical values of χ²: reject H0 to the left of 8.907 and to the right of 32.852, each tail with area α/2 = .025; do not reject H0 between them)
91
Solution 11-11

χ² = (n − 1)s² / σ² = (20 − 1)(170) / 150 = 21.533
where σ² = 150 is the value from H0
92
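Solution 11-11

The value of the test statistic χ² = 21.533.
It lies between the two critical values of χ², 8.907 and 32.852.
It falls in the nonrejection region.
Hence, we fail to reject the null hypothesis.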
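
The two-tailed test of Example 11-11 differs from the right-tailed case only in its decision rule, which uses both critical values. A small sketch follows; the function name variance_test_two_tailed is hypothetical and the critical values 8.907 and 32.852 are copied from the slides.

```python
def variance_test_two_tailed(n, sample_var, hypothesized_var,
                             lower_critical, upper_critical):
    """Return (chi-square statistic, reject_h0) for H0: sigma^2 = hypothesized_var."""
    statistic = (n - 1) * sample_var / hypothesized_var
    # Reject when the statistic falls in either tail.
    reject_h0 = statistic < lower_critical or statistic > upper_critical
    return statistic, reject_h0


# Example 11-11: n = 20, s^2 = 170, H0: sigma^2 = 150, alpha = .05, df = 19.
statistic, reject_h0 = variance_test_two_tailed(
    n=20,
    sample_var=170,
    hypothesized_var=150,
    lower_critical=8.907,
    upper_critical=32.852,
)
print(f"chi-square = {statistic:.3f}, reject H0: {reject_h0}")
```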