CHI-SQUARE TESTS - PowerPoint PPT Presentation

Slides: 93
Provided by: itd976
Transcript and Presenter's Notes

Title: CHI-SQUARE TESTS


1
CHAPTER 11
  • CHI-SQUARE TESTS

2
THE CHI-SQUARE DISTRIBUTION
  • Definition
  • The chi-square distribution has only one
    parameter, called the degrees of freedom (df).
    The shape of a chi-square distribution curve is
    skewed to the right for small df and becomes
    nearly symmetric for large df. The entire
    chi-square distribution curve lies to the right
    of the vertical axis. The chi-square distribution
    assumes nonnegative values only, and these are
    denoted by the symbol χ² (read as chi-square).

3
Figure 11.1 Three chi-square distribution curves.
4
Example 11-1
  • Find the value of χ² for 7 degrees of freedom and
    an area of .10 in the right tail of the
    chi-square distribution curve.

5
Table 11.1 χ² for df = 7 and .10 Area in the Right Tail
Area in the Right Tail Under the Chi-Square Distribution Curve
df     .995     .100      .005
1      .000     2.706     7.879
2      .010     4.605     10.597
...
7      .989     12.017    20.278
...
100    67.328   118.498   140.169
Required value of χ²: 12.017 (row df = 7, column .100)
6
Figure 11.2 Chi-square distribution curve for df = 7; the area in the right tail beyond χ² = 12.017 is .10.

7
Example 11-2
  • Find the value of χ² for 12 degrees of freedom
    and an area of .05 in the left tail of the
    chi-square distribution curve.

8
Solution 11-2
  • Area in the right tail
    = 1 − Area in the left tail
    = 1 − .05 = .95

9
Table 11.2 χ² for df = 12 and .95 Area in the Right Tail
Area in the Right Tail Under the Chi-Square Distribution Curve
df     .995     .950      .005
1      .000     .004      7.879
2      .010     .103      10.597
...
12     3.074    5.226     28.300
...
100    67.328   77.929    140.169
Required value of χ²: 5.226 (row df = 12, column .950)
10
Figure 11.3 Chi-square distribution curve for df = 12; the area to the left of χ² = 5.226 is .05 and the shaded area to its right is .95.
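
The table lookups in Examples 11-1 and 11-2 can be cross-checked numerically. The following is a minimal sketch in plain Python, assuming no statistics library is available. The helper names `chi2_right_tail` and `chi2_critical` are illustrative and do not come from the slides. The right-tail area is evaluated with a series for the incomplete gamma function and then inverted by bisection.

```python
import math

def chi2_right_tail(x, df):
    """Area to the right of x under the chi-square curve with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a = df / 2.0
    z = x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z)
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower

def chi2_critical(right_area, df):
    """Value of chi-square that leaves right_area in the right tail (found by bisection)."""
    lo, hi = 0.0, 1.0
    while chi2_right_tail(hi, df) > right_area:   # widen the bracket until it contains the answer
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_right_tail(mid, df) > right_area:
            lo = mid                               # tail area still too large, move right
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(chi2_critical(0.10, 7), 3))    # Example 11-1: df = 7, right-tail area .10
print(round(chi2_critical(0.95, 12), 3))   # Example 11-2: df = 12, right-tail area .95
```

The two printed values should agree with the table entries 12.017 and 5.226 to three decimal places.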

11
A GOODNESS-OF-FIT TEST
  • Definition
  • An experiment with the following characteristics
    is called a multinomial experiment.

12
Multinomial Experiment cont.
  1. It consists of n identical trials (repetitions).
  2. Each trial results in one of k possible outcomes
    (or categories), where k > 2.
  3. The trials are independent.
  4. The probabilities of the various outcomes remain
    constant for each trial.

13
A GOODNESS-OF-FIT TEST cont.
  • Definition
  • The frequencies obtained from the performance of
    an experiment are called the observed frequencies
    and are denoted by O. The expected frequencies,
    denoted by E, are the frequencies that we expect
    to obtain if the null hypothesis is true. The
    expected frequency for a category is obtained as
  • E = np
  • where n is the sample size and p is the
    probability that an element belongs to that
    category if the null hypothesis is true.

14
A GOODNESS-OF-FIT TEST cont.
  • Degrees of Freedom for a Goodness-of-Fit Test
  • In a goodness-of-fit test, the degrees of freedom
    are
  • df = k − 1
  • where k denotes the number of possible outcomes
    (or categories) for the experiment.

15
Test Statistic for a Goodness-of-Fit Test
  • The test statistic for a goodness-of-fit test is
    χ² and its value is calculated as
  • $\chi^2 = \sum \frac{(O - E)^2}{E}$
  • where
  • O = observed frequency for a category
  • E = expected frequency for a category = np
  • Remember that a chi-square goodness-of-fit test
    is always right-tailed.

16
Example 11-3
  • A bank has an ATM installed inside the bank, and
    it is available to its customers only from 7 AM
    to 6 PM Monday through Friday. The manager of the
    bank wanted to investigate if the percentage of
    transactions made on this ATM is the same for
    each of the five days (Monday through Friday) of
    the week. She randomly selected one week and
    counted the number of transactions made on this
    ATM on each of the five days during this week.
    The information she obtained is given in the
    following table, where the number of users
    represents the number of transactions on this ATM
    on these days. For convenience, we will refer to
    these transactions as people or users.

17
Example 11-3
  • At the 1% level of significance, can we reject
    the null hypothesis that the proportion of people
    who use this ATM each of the five days of the
    week is the same? Assume that this week is
    typical of all weeks in regard to the use of this
    ATM.

Day Monday Tuesday Wednesday Thursday Friday
Number of users 253 197 204 279 267
18
Solution 11-3
  • H0: p1 = p2 = p3 = p4 = p5 = .20
  • H1: At least two of the five proportions are
    not equal to .20

19
Solution 11-3
  • There are five categories
  • Five days on which the ATM is used
  • Multinomial experiment
  • We use the chi-square distribution to make this
    test.

20
Solution 11-3
  • Area in the right tail = α = .01
  • k = number of categories = 5
  • df = k − 1 = 5 − 1 = 4
  • The critical value of χ² = 13.277

21
Figure 11.4 Rejection region (right tail, α = .01) and nonrejection region; critical value of χ² = 13.277.
22
Table 11.3
Category (Day)  Observed Frequency O  p    Expected Frequency E = np  (O − E)  (O − E)²  (O − E)²/E
Monday          253                   .20  1200(.20) = 240            13       169       .704
Tuesday         197                   .20  1200(.20) = 240            −43      1849      7.704
Wednesday       204                   .20  1200(.20) = 240            −36      1296      5.400
Thursday        279                   .20  1200(.20) = 240            39       1521      6.338
Friday          267                   .20  1200(.20) = 240            27       729       3.038
n = 1200                                                                                 Sum = 23.184
23
Solution 11-3
  • All the required calculations to find the value
    of the test statistic χ² are shown in Table 11.3.

24
Solution 11-3
  • The value of the test statistic χ² = 23.184 is
    larger than the critical value of χ² = 13.277
  • It falls in the rejection region
  • Hence, we reject the null hypothesis
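
The arithmetic of Solution 11-3 can be reproduced in a few lines. This is a sketch that takes the observed counts from Table 11.3 and the critical value 13.277 from the slides; the variable names are illustrative. The unrounded statistic comes out near 23.183, while the slide's 23.184 results from adding the per-day contributions after rounding each to three decimals.

```python
# Observed transactions from Table 11.3 (Monday through Friday)
observed = [253, 197, 204, 279, 267]
p_null = [0.20] * 5                      # H0: every day has proportion .20

n = sum(observed)                        # sample size, 1200
expected = [n * p for p in p_null]       # E = np for each category

# Chi-square statistic: sum of (O - E)^2 / E over all categories
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

critical_value = 13.277                  # from the slides: df = 4, alpha = .01
reject_h0 = chi2_stat > critical_value   # a goodness-of-fit test is right-tailed

print(f"chi-square = {chi2_stat:.3f}, reject H0: {reject_h0}")
```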

25
Example 11-4
  • In a National Public Transportation survey
    conducted in 1995 on the modes of transportation
    used to commute to work, 79.6% of the respondents
    said that they drive alone, 11.1% carpool, 5.1%
    use public transit, and 4.2% depend on other
    modes of transportation (USA TODAY, April 14,
    1999). Assume that these percentages hold true
    for the 1995 population of all commuting workers.
    Recently 1000 randomly selected workers were
    asked what mode of transportation they use to
    commute to work. The following table lists the
    results of this survey.

26
Example 11-4
Mode of transportation Drive alone Carpool Public transit Other
Number of workers 812 102 57 29

Test at the 2.5% significance level whether the
current pattern of use of transportation modes is
different from that for 1995.
27
Solution 11-4
  • H0: The current percentage distribution of
    the use of transportation modes is the same
    as that for 1995.
  • H1: The current percentage distribution of
    the use of transportation modes is different
    from that for 1995.

28
Solution 11-4
  • There are four categories
  • Drive alone, carpool, public transit, and other
  • Multinomial experiment
  • We use the chi-square distribution to make the
    test.

29
Solution 11-4
  • Area in the right tail = α = .025
  • k = number of categories = 4
  • df = k − 1 = 4 − 1 = 3
  • The critical value of χ² = 9.348

30
Figure 11.5 Rejection region (right tail, α = .025) and nonrejection region; critical value of χ² = 9.348.
31
Table 11.4
Category        Observed Frequency O  p     Expected Frequency E = np  (O − E)  (O − E)²  (O − E)²/E
Drive alone     812                   .796  1000(.796) = 796           16       256       .322
Carpool         102                   .111  1000(.111) = 111           −9       81        .730
Public transit  57                    .051  1000(.051) = 51            6        36        .706
Other           29                    .042  1000(.042) = 42            −13      169       4.024
n = 1000                                                                                  Sum = 5.782
32
Solution 11-4
  • All the required calculations to find the value
    of the test statistic χ² are shown in Table 11.4.

33
Solution 11-4
  • The value of the test statistic χ² = 5.782 is
    less than the critical value of χ² = 9.348
  • It falls in the nonrejection region
  • Hence, we fail to reject the null hypothesis.
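
When the hypothesized proportions differ by category, as in Example 11-4, the same calculation can be wrapped in a small helper. The function name `goodness_of_fit` is a hypothetical choice for this sketch, and the critical value 9.348 is taken from the slides rather than computed. The unrounded statistic is about 5.781; the slide's 5.782 comes from summing rounded contributions.

```python
def goodness_of_fit(observed, probs):
    """Return (chi-square statistic, degrees of freedom) for a goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in probs]                 # E = np
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                            # df = k - 1
    return stat, df

# Example 11-4: drive alone, carpool, public transit, other
observed = [812, 102, 57, 29]
probs_1995 = [0.796, 0.111, 0.051, 0.042]

stat, df = goodness_of_fit(observed, probs_1995)
critical_value = 9.348                                # from the slides: df = 3, alpha = .025
reject_h0 = stat > critical_value

print(f"chi-square = {stat:.3f}, df = {df}, reject H0: {reject_h0}")
```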

34
CONTINGENCY TABLES
Table 11.5 Total 2002 Enrollment at a University
        Full-Time  Part-Time
Male    6768       2615
Female  7658       3717
The cell value 2615 is the number of students who are male and enrolled part-time.
35
A TEST OF INDEPENDENCE OR HOMOGENEITY
  • A Test of Independence
  • A Test of Homogeneity

36
A Test of Independence
  • Definition
  • A test of independence involves a test of the
    null hypothesis that two attributes of a
    population are not related. The degrees of
    freedom for a test of independence are
  • df = (R − 1)(C − 1)
  • where R and C are the number of rows and the
    number of columns, respectively, in the given
    contingency table.

37
A Test of Independence cont.
  • Test Statistic for a Test of Independence
  • The value of the test statistic χ² for a test of
    independence is calculated as
  • $\chi^2 = \sum \frac{(O - E)^2}{E}$
  • where O and E are the observed and expected
    frequencies, respectively, for a cell.

38
Example 11-5
  • Violence and lack of discipline have become major
    problems in schools in the United States. A
    random sample of 300 adults was selected, and
    they were asked if they favor giving more freedom
    to schoolteachers to punish students for violence
    and lack of discipline. The two-way
    classification of the responses of these adults
    is represented in the following table.

39
Example 11-5
  • Calculate the expected frequencies for this table
    assuming that the two attributes, gender and
    opinions on the issue, are independent.

           In Favor (F)  Against (A)  No Opinion (N)
Men (M)    93            70           12
Women (W)  87            32           6
40
Table 11.6
  • Solution 11-5

In Favor (F) Against (A) No Opinion (N) Row Totals
Men (M) 93 70 12 175
Women (W) 87 32 6 125
Column Totals 180 102 18 300
41
Expected Frequencies for a Test of Independence
  • The expected frequency E for a cell is calculated
    as
  • $E = \frac{(\text{Row total})(\text{Column total})}{\text{Sample size}}$

42
Table 11.7
  • Solution 11-5

In Favor (F) Against (A) No Opinion (N) Row Totals
Men (M) 93 (105.00) 70 (59.50) 12 (10.50) 175
Women (W) 87 (75.00) 32 (42.50) 6 (7.50) 125
Column Totals 180 102 18 300
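
The expected frequencies shown in parentheses in Table 11.7 follow from the row totals, column totals, and sample size. A minimal sketch, with the observed counts copied from Table 11.6 and illustrative variable names:

```python
# Observed counts from Table 11.6: rows are Men, Women;
# columns are In Favor, Against, No Opinion.
observed = [
    [93, 70, 12],
    [87, 32, 6],
]

row_totals = [sum(row) for row in observed]          # [175, 125]
col_totals = [sum(col) for col in zip(*observed)]    # [180, 102, 18]
n = sum(row_totals)                                  # 300

# Expected frequency for each cell under independence:
# (row total * column total) / sample size
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, row in zip(["Men", "Women"], expected):
    print(label, row)
```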
43
Example 11-6
  • Reconsider the two-way classification table given
    in Example 11-5. In that example, a random sample
    of 300 adults was selected, and they were asked
    if they favor giving more freedom to
    schoolteachers to punish students for violence
    and lack of discipline. Based on the results of
    the survey, a two-way classification table was
    prepared and presented in Example 11-5. Does the
    sample provide sufficient information to conclude
    that the two attributes, gender and opinions of
    adults, are dependent? Use a 1% significance
    level.

44
Solution 11-6
  • H0: Gender and opinions of adults are
    independent
  • H1: Gender and opinions of adults are
    dependent

45
Solution 11-6
  • α = .01
  • df = (R − 1)(C − 1) = (2 − 1)(3 − 1) = 2
  • The critical value of χ² = 9.210

46
Figure 11.6 Rejection region (right tail, α = .01) and nonrejection region; critical value of χ² = 9.210.
47
Table 11.8
In Favor (F) Against (A) No Opinion (N) Row Totals
Men (M) 93 (105.00) 70 (59.50) 12 (10.50) 175
Women (W) 87 (75.00) 32 (42.50) 6 (7.50) 125
Column Totals 180 102 18 300
48
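Solution 11-6
  • $\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(93 - 105.00)^2}{105.00} + \frac{(70 - 59.50)^2}{59.50} + \frac{(12 - 10.50)^2}{10.50} + \frac{(87 - 75.00)^2}{75.00} + \frac{(32 - 42.50)^2}{42.50} + \frac{(6 - 7.50)^2}{7.50}$
  • $= 1.371 + 1.853 + .214 + 1.920 + 2.594 + .300 = 8.252$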
49
Solution 11-6
  • The value of the test statistic χ² = 8.252
  • It is less than the critical value of χ²
  • It falls in the nonrejection region
  • Hence, we fail to reject the null hypothesis
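
The whole test of independence for Example 11-6 can be expressed as one function over a two-way table. This is a sketch, not the slides' own procedure: the name `chi2_independence` is hypothetical, and the critical value 9.210 is taken from the slides. The unrounded statistic is about 8.253, versus 8.252 when the six contributions are rounded before adding.

```python
def chi2_independence(table):
    """Return (chi-square statistic, df) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n    # expected frequency for the cell
            stat += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)      # (R - 1)(C - 1)
    return stat, df

# Table 11.8: Men and Women by In Favor, Against, No Opinion
stat, df = chi2_independence([[93, 70, 12], [87, 32, 6]])
critical_value = 9.210                               # from the slides: df = 2, alpha = .01
reject_h0 = stat > critical_value

print(f"chi-square = {stat:.3f}, df = {df}, reject H0: {reject_h0}")
```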

50
Example 11-7
  • A researcher wanted to study the relationship
    between gender and owning cell phones. She took a
    sample of 2000 adults and obtained the
    information given in the following table.

51
Example 11-7
  • At the 5% level of significance, can you conclude
    that gender and owning cell phones are related
    for all adults?

       Own Cell Phones  Do Not Own Cell Phones
Men    640              450
Women  440              470
52
Solution 11-7
  • H0: Gender and owning a cell phone are not
    related
  • H1: Gender and owning a cell phone are related

53
Solution 11-7
  • We are performing a test of independence
  • We use the chi-square distribution
  • α = .05
  • df = (R − 1)(C − 1) = (2 − 1)(2 − 1) = 1
  • The critical value of χ² = 3.841

54
Figure 11.7 Rejection region (right tail, α = .05) and nonrejection region; critical value of χ² = 3.841.
55
Table 11.9
Own Cell Phones (Y) Do Not Own Cell Phones (N) Row Totals
Men (M) 640 (588.60) 450 (501.40) 1090
Women (W) 440 (491.40) 470 (418.60) 910
Column Totals 1080 920 2000
56
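Solution 11-7
  • $\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(640 - 588.60)^2}{588.60} + \frac{(450 - 501.40)^2}{501.40} + \frac{(440 - 491.40)^2}{491.40} + \frac{(470 - 418.60)^2}{418.60}$
  • $= 4.489 + 5.269 + 5.376 + 6.311 = 21.445$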
57
Solution 11-7
  • The value of the test statistic χ² = 21.445
  • It is larger than the critical value of χ²
  • It falls in the rejection region
  • Hence, we reject the null hypothesis
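
For a 2 × 2 table such as Table 11.9, it can be instructive to see how much each cell contributes to the statistic. The sketch below lists the per-cell contributions (O − E)²/E; the labels and variable names are illustrative, and the critical value 3.841 is the one quoted in the slides.

```python
rows = ["Men", "Women"]
cols = ["Own", "Do not own"]
observed = [[640, 450], [440, 470]]                  # Table 11.9

row_totals = [sum(row) for row in observed]          # [1090, 910]
col_totals = [sum(col) for col in zip(*observed)]    # [1080, 920]
n = sum(row_totals)                                  # 2000

# Contribution of each cell to the chi-square statistic
contributions = {}
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        contributions[(rows[i], cols[j])] = (o - e) ** 2 / e

chi2_stat = sum(contributions.values())
reject_h0 = chi2_stat > 3.841                        # critical value from the slides (df = 1, alpha = .05)

for cell, value in contributions.items():
    print(cell, round(value, 3))
print(f"chi-square = {chi2_stat:.3f}, reject H0: {reject_h0}")
```

Every cell deviates from its expected count by the same amount (51.4), so the cells with the smallest expected frequencies contribute the most.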

58
A Test of Homogeneity
  • Definition
  • A test of homogeneity involves testing the null
    hypothesis that the proportions of elements with
    certain characteristics in two or more different
    populations are the same against the alternative
    hypothesis that these proportions are not the
    same.

59
Example 11-8
  • Consider the data on income distributions for
    households in California and Wisconsin given in
    the following table.

California Wisconsin Row Totals
High Income 70 34 104
Medium Income 80 40 120
Low Income 100 76 176
Column Totals 250 150 400
60
Example 11-8
  • Using the 2.5% significance level, test the null
    hypothesis that the distribution of households
    with regard to income levels is similar
    (homogeneous) for the two states.

61
Solution 11-8
  • H0: The proportions of households that belong
    to different income groups are the same in
    both states
  • H1: The proportions of households that belong
    to different income groups are not the same
    in both states

62
Solution 11-8
  • α = .025
  • df = (R − 1)(C − 1) = (3 − 1)(2 − 1) = 2
  • The critical value of χ² = 7.378

63
Figure 11.8 Rejection region (right tail, α = .025) and nonrejection region; critical value of χ² = 7.378.
64
Table 11.11
California Wisconsin Row Totals
High income 70 (65) 34 (39) 104
Medium income 80 (75) 40 (45) 120
Low income 100 (110) 76 (66) 176
Column Totals 250 150 400
65
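Solution 11-8
  • $\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(70 - 65)^2}{65} + \frac{(34 - 39)^2}{39} + \frac{(80 - 75)^2}{75} + \frac{(40 - 45)^2}{45} + \frac{(100 - 110)^2}{110} + \frac{(76 - 66)^2}{66}$
  • $= .385 + .641 + .333 + .556 + .909 + 1.515 = 4.339$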
66
Solution 11-8
  • The value of the test statistic χ² = 4.339
  • It is less than the critical value of χ²
  • It falls in the nonrejection region
  • Hence, we fail to reject the null hypothesis
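
A test of homogeneity uses the same arithmetic as a test of independence, but it can be read as comparing each state with a pooled proportion. The sketch below works through Example 11-8 that way; the dictionary layout and names are assumptions of this example, and the critical value 7.378 is taken from the slides.

```python
# Observed counts from Table 11.11: rows are income levels,
# columns are California and Wisconsin.
observed = {
    "High": [70, 34],
    "Medium": [80, 40],
    "Low": [100, 76],
}
col_totals = [sum(col) for col in zip(*observed.values())]   # [250, 150]
n = sum(col_totals)                                          # 400

# Under H0 both states share the pooled proportion for each income level.
chi2_stat = 0.0
for level, counts in observed.items():
    pooled = sum(counts) / n                 # pooled proportion for this income level
    for o, col_total in zip(counts, col_totals):
        e = pooled * col_total               # same as row total * column total / n
        chi2_stat += (o - e) ** 2 / e

critical_value = 7.378                       # from the slides: df = 2, alpha = .025
reject_h0 = chi2_stat > critical_value

print(f"chi-square = {chi2_stat:.3f}, reject H0: {reject_h0}")
```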

67
INFERENCES ABOUT THE POPULATION VARIANCE
  • Estimation of the Population Variance
  • Hypothesis Tests About the Population Variance

68
INFERENCES ABOUT THE POPULATION VARIANCE cont.
  • Sampling Distribution of (n − 1)s² / σ²
  • If the population from which the sample is
    selected is (approximately) normally distributed,
    then
  • $\frac{(n - 1)s^2}{\sigma^2}$
  • has a chi-square distribution with n − 1 degrees
    of freedom.

69
Estimation of the Population Variance
  • Assuming that the population from which the
    sample is selected is (approximately) normally
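    distributed, the (1 − α)100% confidence interval
    for the population variance σ² is
  • $\frac{(n - 1)s^2}{\chi^2_{\alpha/2}}$ to $\frac{(n - 1)s^2}{\chi^2_{1 - \alpha/2}}$
  • where the two χ² values are read from the
    chi-square table for α/2 and 1 − α/2 areas in the
    right tail with n − 1 degrees of freedom.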

70
Example 11-9
  • One type of cookie manufactured by Haddad Food
    Company is Cocoa Cookies. The machine that fills
    packages of these cookies is set up in such a way
    that the average net weight of these packages is
    32 ounces with a variance of .015 square ounce.

71
Example 11-9
  • From time to time the quality control inspector
    at the company selects a sample of a few such
    packages, calculates the variance of the net
    weights of these packages, and constructs a 95%
    confidence interval for the population variance.
    If either or both of the two limits of this
    confidence interval fall outside the interval
    .008 to .030, the machine is stopped and adjusted.

72
Example 11-9
  • A recently taken random sample of 25 packages
    from the production line gave a sample variance
    of .029 square ounce. Based on this sample
    information, do you think the machine needs an
    adjustment? Assume that the net weights of
    cookies in all packages are normally distributed.

73
Solution 11-9
  • n = 25 and s² = .029
  • α = 1 − .95 = .05
  • α / 2 = .05 / 2 = .025
  • 1 − α / 2 = 1 − .025 = .975
  • df = n − 1 = 25 − 1 = 24
  • χ² for 24 df and .025 area in the right tail
    = 39.364
  • χ² for 24 df and .975 area in the right tail
    = 12.401

74
Figure 11.9 Chi-square distribution curve for df = 24; the value of χ² with .025 area in the right tail is 39.364.
75
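Figure 11.9 (continued) Chi-square distribution curve for df = 24; the value of χ² with .025 area in the left tail (.975 area in the right tail) is 12.401.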
76
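Solution 11-9
  • $\frac{(n - 1)s^2}{\chi^2_{\alpha/2}}$ to $\frac{(n - 1)s^2}{\chi^2_{1 - \alpha/2}}$
  • $\frac{(25 - 1)(.029)}{39.364}$ to $\frac{(25 - 1)(.029)}{12.401}$
  • .0177 to .0561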
77
Solution 11-9
  • Thus, with 95% confidence, we can state that the
    variance for all packages of Cocoa Cookies lies
    between .0177 and .0561 square ounce.
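
The interval in Solution 11-9 can be recomputed directly, and the inspector's rule from the problem statement can be applied to it. This is a sketch: the two χ² values are the table values quoted in the slides, the variable names are illustrative, and the final check is an inference from the stated rule (stop the machine when a limit falls outside .008 to .030), since the slides stop at the interval itself.

```python
n = 25                   # sample size
s2 = 0.029               # sample variance (square ounces)

chi2_right_025 = 39.364  # table value: df = 24, .025 area in the right tail
chi2_right_975 = 12.401  # table value: df = 24, .975 area in the right tail

# 95% confidence interval for the population variance
lower = (n - 1) * s2 / chi2_right_025
upper = (n - 1) * s2 / chi2_right_975

# Inspector's rule: adjust if either limit lies outside .008 to .030
needs_adjustment = lower < 0.008 or upper > 0.030

print(f"{lower:.4f} to {upper:.4f}, adjust machine: {needs_adjustment}")
```

Under this reading the upper limit .0561 exceeds .030, so the rule points to stopping and adjusting the machine.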

78
Hypothesis Tests About the Population Variance
  • The value of the test statistic χ² is calculated
    as
  • $\chi^2 = \frac{(n - 1)s^2}{\sigma^2}$
  • where s² is the sample variance, σ² is the
    hypothesized value of the population variance,
    and n − 1 represents the degrees of freedom. The
    population from which the sample is selected is
    assumed to be (approximately) normally
    distributed.

79
Example 11-10
  • One type of cookie manufactured by Haddad Food
    Company is Cocoa Cookies. The machine that fills
    packages of these cookies is set up in such a way
    that the average net weight of these packages is
    32 ounces with a variance of .015 square ounce.
    From time to time the quality control inspector
    at the company selects a sample of a few such
    packages, calculates the variance of the net
    weights of these packages, and makes a test of
    hypothesis about the population variance.

80
Example 11-10
  • She always uses α = .01. The acceptable value of
    the population variance is .015 square ounce or
    less. If the conclusion from the test of
    hypothesis is that the population variance is not
    within the acceptable limit, the machine is
    stopped and adjusted.

81
Example 11-10
  • A recently taken random sample of 25 packages
    from the production line gave a sample variance
    of .029 square ounce. Based on this sample
    information, do you think the machine needs an
    adjustment? Assume that the net weights of
    cookies in all packages are normally distributed.

82
Solution 11-10
  • H0: σ² ≤ .015
  • The population variance is within the acceptable
    limit
  • H1: σ² > .015
  • The population variance exceeds the acceptable
    limit

83
Solution 11-10
  • α = .01
  • df = n − 1 = 25 − 1 = 24
  • The critical value of χ² = 42.980

84
Figure 11.10 Rejection region (right tail, α = .01) and nonrejection region; critical value of χ² = 42.980.
85
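Solution 11-10
  • $\chi^2 = \frac{(n - 1)s^2}{\sigma^2} = \frac{(25 - 1)(.029)}{.015} = 46.400$, where σ² = .015 is taken from H0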
86
Solution 11-10
  • The value of the test statistic χ² = 46.400
  • It is greater than the critical value of χ²
  • It falls in the rejection region
  • Hence, we reject the null hypothesis H0
  • We conclude that the population variance is not
    within the acceptable limit
  • The machine should be stopped and adjusted
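
The right-tailed variance test of Example 11-10 reduces to one formula and one comparison. A minimal sketch, with a hypothetical helper name and the critical value 42.980 taken from the slides:

```python
def variance_test_statistic(n, s2, sigma2_0):
    """Chi-square statistic (n - 1) * s^2 / sigma_0^2 for a test about a variance."""
    return (n - 1) * s2 / sigma2_0

# Example 11-10: n = 25 packages, sample variance .029, hypothesized variance .015
stat = variance_test_statistic(n=25, s2=0.029, sigma2_0=0.015)

critical_value = 42.980            # from the slides: df = 24, alpha = .01
reject_h0 = stat > critical_value  # right-tailed because H1 is "variance > .015"

print(f"chi-square = {stat:.3f}, reject H0: {reject_h0}")
```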

87
Example 11-11
  • The variance of scores on a standardized
    mathematics test for all high school seniors was
    150 in 2002. A sample of scores for 20 high
    school seniors who took this test this year gave
    a variance of 170. Test at the 5% significance
    level if the variance of current scores of all
    high school seniors on this test is different
    from 150. Assume that the scores of all high
    school seniors on this test are (approximately)
    normally distributed.

88
Solution 11-11
  • H0: σ² = 150
  • The population variance is not different from 150
  • H1: σ² ≠ 150
  • The population variance is different from 150

89
Solution 11-11
  • α = .05
  • Area in each tail = α / 2 = .025
  • df = n − 1 = 20 − 1 = 19
  • The critical values of χ² are 32.852 and 8.907

90
Figure 11.11 Two rejection regions (α/2 = .025 in each tail) with the nonrejection region between them; the two critical values of χ² are 8.907 and 32.852.
91
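Solution 11-11
  • $\chi^2 = \frac{(n - 1)s^2}{\sigma^2} = \frac{(20 - 1)(170)}{150} = 21.533$, where σ² = 150 is taken from H0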
92
Solution 11-11
  • The value of the test statistic χ² = 21.533
  • It is between the two critical values of χ²
  • It falls in the nonrejection region
  • Consequently, we fail to reject H0.
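
For the two-tailed test in Example 11-11, the statistic is compared against both critical values. The sketch below uses the table values quoted in the slides (8.907 and 32.852 for df = 19); the variable names are illustrative.

```python
n = 20              # sample size
s2 = 170            # sample variance of this year's scores
sigma2_0 = 150      # hypothesized population variance from H0

stat = (n - 1) * s2 / sigma2_0          # (n - 1) * s^2 / sigma_0^2

lower_critical = 8.907                  # df = 19, .975 area in the right tail
upper_critical = 32.852                 # df = 19, .025 area in the right tail

# Two-tailed test: reject H0 if the statistic falls in either tail
reject_h0 = stat < lower_critical or stat > upper_critical

print(f"chi-square = {stat:.3f}, reject H0: {reject_h0}")
```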