Title: Chapter 12 Tests of Goodness of Fit and Independence
- Goodness of Fit Test: A Multinomial Population
- Goodness of Fit Test: Poisson and Normal Distributions
Hypothesis (Goodness of Fit) Test for Proportions of a Multinomial Population
1. Set up the null and alternative hypotheses.
2. Select a random sample and record the observed frequency, $f_i$, for each of the $k$ categories.
3. Assuming $H_0$ is true, compute the expected frequency, $e_i$, in each category by multiplying the category probability by the sample size.
4. Compute the value of the test statistic:
   $\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$
   where
   $f_i$ = observed frequency for category $i$
   $e_i$ = expected frequency for category $i$
   $k$ = number of categories
   Note: The test statistic has a chi-square distribution with $k - 1$ degrees of freedom, provided that the expected frequencies are 5 or more for all categories.
5. Rejection rule:
   p-value approach: Reject $H_0$ if p-value $\le \alpha$
   Critical value approach: Reject $H_0$ if $\chi^2 \ge \chi^2_\alpha$
   where $\alpha$ is the significance level and there are $k - 1$ degrees of freedom.
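The two rejection rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `chi2_upper_tail`, `chi2_critical_value`, and `reject_h0` are hypothetical, the upper-tail area is computed with a standard recurrence that holds for integer degrees of freedom, and the critical value is found by bisection rather than read from a table.

```python
import math

def chi2_upper_tail(x, df):
    """Upper-tail area P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting values: df = 2 (even case) or df = 1 (odd case).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))
    # Recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1).
    while a < df / 2.0:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q

def chi2_critical_value(alpha, df, tol=1e-9):
    """Value c whose upper-tail area equals alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_upper_tail(hi, df) > alpha:
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def reject_h0(chi2_stat, alpha, df):
    """Return the decisions of the p-value rule and the critical value rule."""
    by_p_value = chi2_upper_tail(chi2_stat, df) <= alpha
    by_critical_value = chi2_stat >= chi2_critical_value(alpha, df)
    return by_p_value, by_critical_value

print(round(chi2_critical_value(0.05, 3), 3))
```

With a significance level of .05 and 3 degrees of freedom, the bisection should land on the tabled critical value used later in the chapter.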
Multinomial Distribution Goodness of Fit Test
- Example: Finger Lakes Homes (A)
Finger Lakes Homes manufactures four models of prefabricated homes: a two-story colonial, a log cabin, a split-level, and an A-frame. To help in production planning, management would like to determine if previous customer purchases indicate that there is a preference in the style selected.
The number of homes sold of each model for 100 sales over the past two years is shown below.
Model     Colonial   Log   Split-Level   A-Frame
# Sold       30       20       35           15
- Hypotheses
$H_0$: $p_C = p_L = p_S = p_A = .25$
$H_a$: The population proportions are not $p_C = .25$, $p_L = .25$, $p_S = .25$, and $p_A = .25$
where
$p_C$ = population proportion that purchase a colonial
$p_L$ = population proportion that purchase a log cabin
$p_S$ = population proportion that purchase a split-level
$p_A$ = population proportion that purchase an A-frame
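- Rejection Rule
With $\alpha = .05$ and $k - 1 = 4 - 1 = 3$ degrees of freedom, reject $H_0$ if p-value $\le .05$ or $\chi^2 \ge 7.815$.
[Figure: chi-square distribution with 3 degrees of freedom; "Do Not Reject $H_0$" region to the left of 7.815, "Reject $H_0$" region to the right]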
- Expected Frequencies
$e_1 = .25(100) = 25$, $e_2 = .25(100) = 25$, $e_3 = .25(100) = 25$, $e_4 = .25(100) = 25$
- Test Statistic
$\chi^2 = \frac{(30 - 25)^2}{25} + \frac{(20 - 25)^2}{25} + \frac{(35 - 25)^2}{25} + \frac{(15 - 25)^2}{25} = 1 + 1 + 4 + 4 = 10$
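The arithmetic for the Finger Lakes Homes (A) example can be checked with a short Python sketch. The variable names are illustrative choices, not part of the slides; the sales counts and the hypothesized proportion of .25 come from the example.

```python
# Observed sales by model (from the example: 100 sales over two years).
observed = {"Colonial": 30, "Log": 20, "Split-Level": 35, "A-Frame": 15}
hypothesized_p = 0.25  # H0: every model has the same population proportion

n = sum(observed.values())

# Expected frequency = category probability times sample size.
expected = {model: hypothesized_p * n for model in observed}

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi2_stat = sum(
    (observed[model] - expected[model]) ** 2 / expected[model]
    for model in observed
)
df = len(observed) - 1  # k - 1 degrees of freedom

print(expected)
print(chi2_stat, df)
```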
- Conclusion Using the p-Value Approach
Area in Upper Tail         .10     .05     .025     .01      .005
$\chi^2$ Value (df = 3)    6.251   7.815   9.348   11.345   12.838
Because $\chi^2 = 10$ is between 9.348 and 11.345, the area in the upper tail of the distribution is between .025 and .01.
The p-value $\le \alpha$. We can reject the null hypothesis.
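The table only brackets the p-value. As a rough check, the upper-tail area can be computed directly: for exactly 3 degrees of freedom the chi-square upper-tail area has the closed form $\mathrm{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\, e^{-x/2}$. The function name below is a hypothetical helper, and the formula applies only to df = 3.

```python
import math

def chi2_upper_tail_df3(x):
    """Upper-tail area for a chi-square variable with exactly 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

# Test statistic from the Finger Lakes Homes (A) example.
p_value = chi2_upper_tail_df3(10.0)
print(p_value)

# The p-value should fall between the tabled areas .01 and .025.
print(0.01 < p_value < 0.025)
```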
- Conclusion Using the Critical Value Approach
$\chi^2 = 10 \ge 7.815$
We reject, at the .05 level of significance, the assumption that there is no home style preference.
Test of Independence: Contingency Tables
1. Set up the null and alternative hypotheses.
2. Select a random sample and record the observed frequency, $f_{ij}$, for each cell of the contingency table.
3. Compute the expected frequency, $e_{ij}$, for each cell:
   $e_{ij} = \frac{(\text{Row } i \text{ Total})(\text{Column } j \text{ Total})}{\text{Sample Size}}$
4. Compute the test statistic:
   $\chi^2 = \sum_i \sum_j \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$
5. Determine the rejection rule:
   Reject $H_0$ if p-value $\le \alpha$ or $\chi^2 \ge \chi^2_\alpha$
   where $\alpha$ is the significance level and, with $n$ rows and $m$ columns, there are $(n - 1)(m - 1)$ degrees of freedom.
Contingency Table (Independence) Test
- Example: Finger Lakes Homes (B)
Each home sold by Finger Lakes Homes can be classified according to price and to style. The Finger Lakes manager would like to determine if the price of the home and the style of the home are independent variables.
The number of homes sold for each model and price for the past two years is shown below. For convenience, the price of the home is listed as either \$99,000 or less or more than \$99,000.
Price             Colonial   Log   Split-Level   A-Frame
$\le$ \$99,000       18        6       19           12
$>$ \$99,000         12       14       16            3
- Hypotheses
$H_0$: Price of the home is independent of the style of the home that is purchased
$H_a$: Price of the home is not independent of the style of the home that is purchased
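- Expected Frequencies
Observed frequencies with row and column totals:
Price          Colonial   Log   Split-Level   A-Frame   Total
$\le$ \$99K       18        6       19           12        55
$>$ \$99K         12       14       16            3        45
Total             30       20       35           15       100
Expected frequencies, computed as (row total)(column total)/100, for example $e_{11} = (55)(30)/100 = 16.5$:
Price          Colonial   Log   Split-Level   A-Frame   Total
$\le$ \$99K     16.50    11.00    19.25         8.25       55
$>$ \$99K       13.50     9.00    15.75         6.75       45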
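The expected frequency for each cell is (row total)(column total)/(sample size). A minimal Python sketch of that calculation for the Finger Lakes Homes (B) table follows; the variable names are illustrative only.

```python
# Observed homes sold; columns are Colonial, Log, Split-Level, A-Frame.
observed = [
    [18, 6, 19, 12],   # price of 99,000 or less
    [12, 14, 16, 3],   # price of more than 99,000
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Under independence, e_ij = (row i total) * (column j total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print(row)
```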
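With $\alpha = .05$ and $(2 - 1)(4 - 1) = 3$ degrees of freedom, reject $H_0$ if p-value $\le .05$ or $\chi^2 \ge 7.815$.
$\chi^2 = \frac{(18 - 16.5)^2}{16.5} + \frac{(6 - 11)^2}{11} + \cdots + \frac{(3 - 6.75)^2}{6.75} = .1364 + 2.2727 + \cdots + 2.0833 = 9.149$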
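To see all eight terms of the sum rather than the abbreviated version, the statistic can be computed cell by cell. The function name `chi_square_independence` is a hypothetical helper written for this sketch, not a library call.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, f_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n  # expected under H0
            stat += (f_ij - e_ij) ** 2 / e_ij
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

# Finger Lakes Homes (B): rows are the two price classes, columns the four styles.
homes = [
    [18, 6, 19, 12],
    [12, 14, 16, 3],
]
stat, df = chi_square_independence(homes)
print(round(stat, 3), df)
```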
- Conclusion Using the p-Value Approach
Area in Upper Tail         .10     .05     .025     .01      .005
$\chi^2$ Value (df = 3)    6.251   7.815   9.348   11.345   12.838
Because $\chi^2 = 9.149$ is between 7.815 and 9.348, the area in the upper tail of the distribution is between .05 and .025.
The p-value $\le \alpha$. We can reject the null hypothesis.
- Conclusion Using the Critical Value Approach
$\chi^2 = 9.149 \ge 7.815$
We reject, at the .05 level of significance, the assumption that the price of the home is independent of the style of home that is purchased.
Goodness of Fit Test: Poisson Distribution
1. Set up the null and alternative hypotheses.
   $H_0$: Population has a Poisson probability distribution
   $H_a$: Population does not have a Poisson distribution
2. Select a random sample and
   a. Record the observed frequency $f_i$ for each value of the Poisson random variable.
   b. Compute the mean number of occurrences $\mu$.
3. Compute the expected frequency of occurrences $e_i$ for each value of the Poisson random variable.
4. Compute the value of the test statistic:
   $\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$
   where
   $f_i$ = observed frequency for category $i$
   $e_i$ = expected frequency for category $i$
   $k$ = number of categories
5. Rejection rule:
   p-value approach: Reject $H_0$ if p-value $\le \alpha$
   Critical value approach: Reject $H_0$ if $\chi^2 \ge \chi^2_\alpha$
   where $\alpha$ is the significance level and there are $k - 2$ degrees of freedom.
- Example: Troy Parking Garage
In studying the need for an additional
entrance to a city parking garage, a consultant
has recommended an analysis approach that is
applicable only in situations where the number
of cars entering during a specified time period
follows a Poisson distribution.
A random sample of 100 one-minute time intervals resulted in the customer arrivals listed below. A statistical test must be conducted to see if the assumption of a Poisson distribution is reasonable.
# Arrivals   0   1   2   3    4    5    6    7   8   9   10   11   12
Frequency    0   1   4   10   14   20   12   12  9   8   6    3    1
- Hypotheses
$H_0$: Number of cars entering the garage during a one-minute interval is Poisson distributed
$H_a$: Number of cars entering the garage during a one-minute interval is not Poisson distributed
- Estimate of Poisson Probability Function
Total Arrivals = 0(0) + 1(1) + 2(4) + . . . + 12(1) = 600
Total Time Periods = 100
Estimate of $\mu$ = 600/100 = 6
Hence,
$f(x) = \frac{6^x e^{-6}}{x!}$
x             f(x)     nf(x)
0            .0025       .25
1            .0149      1.49
2            .0446      4.46
3            .0892      8.92
4            .1339     13.39
5            .1606     16.06
6            .1606     16.06
7            .1377     13.77
8            .1033     10.33
9            .0688      6.88
10           .0413      4.13
11           .0225      2.25
12 or more   .0201      2.01
Total       1.0000    100.00
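The probabilities in this table follow from the Poisson probability function with a mean of 6. A brief Python sketch, assuming the last row collects everything at 12 or above (which is what makes the column total 1.0000); `poisson_pmf` is a hypothetical helper name.

```python
import math

def poisson_pmf(x, mu):
    """Poisson probability f(x) = mu^x * e^(-mu) / x!."""
    return mu ** x * math.exp(-mu) / math.factorial(x)

n = 100   # one-minute intervals sampled
mu = 6    # estimated mean arrivals per interval (600 / 100)

# Probabilities for x = 0, 1, ..., 11, then the remaining tail for 12 or more.
probs = [poisson_pmf(x, mu) for x in range(12)]
probs.append(1.0 - sum(probs))

# Expected frequency = n * f(x).
expected = [n * p for p in probs]

for x, (p, e) in enumerate(zip(probs, expected)):
    label = "12 or more" if x == 12 else str(x)
    print(f"{label:>10}  {p:.4f}  {e:6.2f}")
```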
- Observed and Expected Frequencies
i              $f_i$    $e_i$    $f_i - e_i$
0 or 1 or 2      5      6.20      -1.20
3               10      8.92       1.08
4               14     13.39       0.61
5               20     16.06       3.94
6               12     16.06      -4.06
7               12     13.77      -1.77
8                9     10.33      -1.33
9                8      6.88       1.12
10 or more      10      8.39       1.61
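Categories are combined at both ends so that every expected frequency is at least 5. The following Python sketch reproduces that grouping from the raw arrival counts and then computes the test statistic. It uses unrounded Poisson probabilities, so the result may differ in the third decimal from a hand calculation based on the rounded table entries; all names are illustrative.

```python
import math

def poisson_pmf(x, mu):
    """Poisson probability f(x) = mu^x * e^(-mu) / x!."""
    return mu ** x * math.exp(-mu) / math.factorial(x)

# Observed frequencies for 0, 1, ..., 12 arrivals per one-minute interval.
frequencies = [0, 1, 4, 10, 14, 20, 12, 12, 9, 8, 6, 3, 1]
n = sum(frequencies)
mu = sum(x * f for x, f in enumerate(frequencies)) / n  # sample mean

# Grouping used in the table: {0, 1, 2}, then 3 through 9 singly, then 10 or more.
observed = [sum(frequencies[:3])] + frequencies[3:10] + [sum(frequencies[10:])]

probs = [sum(poisson_pmf(x, mu) for x in range(3))]
probs += [poisson_pmf(x, mu) for x in range(3, 10)]
probs.append(1.0 - sum(probs))  # 10 or more
expected = [n * p for p in probs]

chi2_stat = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
df = len(observed) - 1 - 1  # k - p - 1, with p = 1 estimated parameter

print(round(chi2_stat, 3), df)
```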
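- Rejection Rule
With $\alpha = .05$ and $k - p - 1 = 9 - 1 - 1 = 7$ degrees of freedom (where $k$ = number of categories and $p$ = number of population parameters estimated), reject $H_0$ if p-value $\le .05$ or $\chi^2 \ge 14.067$.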
- Conclusion Using the p-Value Approach
Area in Upper Tail         .90     .10      .05      .025     .01
$\chi^2$ Value (df = 7)    2.833   12.017   14.067   16.013   18.475
Because $\chi^2 = 3.268$ is between 2.833 and 12.017 in the Chi-Square Distribution Table, the area in the upper tail of the distribution is between .90 and .10.
The p-value $> \alpha$. We cannot reject the null hypothesis. There is no reason to doubt the assumption of a Poisson distribution.
Goodness of Fit Test: Normal Distribution
1. Set up the null and alternative hypotheses.
2. Select a random sample and
   a. Compute the mean and standard deviation.
   b. Define intervals of values so that the expected frequency is at least 5 for each interval.
   c. For each interval, record the observed frequencies.
3. Compute the expected frequency, $e_i$, for each interval.
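4. Compute the value of the test statistic:
   $\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$
5. Reject $H_0$ if p-value $\le \alpha$ or $\chi^2 \ge \chi^2_\alpha$, where $\alpha$ is the significance level and there are $k - 3$ degrees of freedom.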
Normal Distribution Goodness of Fit Test
IQ Computers (one better than HP?) manufactures
and sells a general purpose microcomputer. As
part of a study to evaluate sales personnel,
management wants to determine, at a .05
significance level, if the annual sales volume
(number of units sold by a salesperson) follows a
normal probability distribution.
A simple random sample of 30 of the salespeople was taken and their numbers of units sold are below.
33   43   44   45   52   52   56   58   63   64
64   65   66   68   70   72   73   73   74   75
83   84   85   86   91   92   94   98   102  105
(mean = 71, standard deviation = 18.54)
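The reported mean and standard deviation can be verified from the 30 observations. This sketch uses Python's standard `statistics` module; `stdev` is the sample standard deviation (divisor n - 1), which is assumed here to be the definition behind the reported 18.54.

```python
import statistics

# Units sold by each of the 30 sampled salespeople.
sales = [
    33, 43, 44, 45, 52, 52, 56, 58, 63, 64,
    64, 65, 66, 68, 70, 72, 73, 73, 74, 75,
    83, 84, 85, 86, 91, 92, 94, 98, 102, 105,
]

sample_mean = statistics.mean(sales)
sample_sd = statistics.stdev(sales)  # sample standard deviation (n - 1 divisor)

print(sample_mean, round(sample_sd, 2))
```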
- Hypotheses
$H_0$: The population of number of units sold has a normal distribution with mean 71 and standard deviation 18.54.
$H_a$: The population of number of units sold does not have a normal distribution with mean 71 and standard deviation 18.54.
- Interval Definition
To satisfy the requirement of an expected frequency of at least 5 in each interval, we will divide the normal distribution into 30/5 = 6 equal probability intervals.
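Areas = 1.00/6 = .1667
Interval boundaries:
71 - .97(18.54) = 53.02
71 - .43(18.54) = 63.03
71
71 + .43(18.54) = 78.97
71 + .97(18.54) = 88.98
[Figure: normal curve centered at 71, divided into six equal-area intervals at the boundaries above]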
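The boundaries of the six equal-probability intervals are the 1/6, 2/6, ..., 5/6 quantiles of the hypothesized normal distribution. A sketch using `statistics.NormalDist` from the Python standard library is shown below. Because the slide rounds the z-values to .97 and .43, the computed boundaries are expected to differ slightly from the slide's values in the first or second decimal place.

```python
from statistics import NormalDist

# Hypothesized distribution: mean 71, standard deviation 18.54.
dist = NormalDist(mu=71, sigma=18.54)
k = 6  # number of equal-probability intervals (30 / 5)

# Quantiles at cumulative probabilities 1/6, 2/6, ..., 5/6.
boundaries = [dist.inv_cdf(i / k) for i in range(1, k)]

# Each interval should carry probability 1/6 (about .1667).
cdf_values = [0.0] + [dist.cdf(b) for b in boundaries] + [1.0]
areas = [hi - lo for lo, hi in zip(cdf_values, cdf_values[1:])]

print([round(b, 2) for b in boundaries])
print([round(a, 4) for a in areas])
```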
- Observed and Expected Frequencies
i                  $f_i$   $e_i$   $f_i - e_i$
Less than 53.02      6       5        1
53.02 to 63.03       3       5       -2
63.03 to 71.00       6       5        1
71.00 to 78.97       5       5        0
78.97 to 88.98       4       5       -1
More than 88.98      6       5        1
Total               30      30
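The observed counts and the resulting test statistic can be reproduced by sorting each sales figure into its interval. This is a sketch only: the boundaries are the rounded values from the slide, and `bisect.bisect_right` is used simply as a convenient way to find the interval index.

```python
import bisect

sales = [
    33, 43, 44, 45, 52, 52, 56, 58, 63, 64,
    64, 65, 66, 68, 70, 72, 73, 73, 74, 75,
    83, 84, 85, 86, 91, 92, 94, 98, 102, 105,
]
boundaries = [53.02, 63.03, 71.00, 78.97, 88.98]  # interval limits from the slide

# Count how many observations fall in each of the six intervals.
observed = [0] * (len(boundaries) + 1)
for units in sales:
    observed[bisect.bisect_right(boundaries, units)] += 1

# Equal-probability intervals give the same expected frequency everywhere.
expected = [len(sales) / len(observed)] * len(observed)

chi2_stat = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
df = len(observed) - 2 - 1  # k - p - 1, with p = 2 estimated parameters

print(observed, round(chi2_stat, 3), df)
```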
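- Rejection Rule
With $\alpha = .05$ and $k - p - 1 = 6 - 2 - 1 = 3$ degrees of freedom (two parameters, the mean and the standard deviation, are estimated from the sample), reject $H_0$ if p-value $\le .05$ or $\chi^2 \ge 7.815$.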
- Conclusion Using the p-Value Approach
Area in Upper Tail         .90    .10     .05     .025    .01
$\chi^2$ Value (df = 3)    .584   6.251   7.815   9.348   11.345
Because $\chi^2 = 1.600$ is between .584 and 6.251 in the Chi-Square Distribution Table, the area in the upper tail of the distribution is between .90 and .10.
The p-value $> \alpha$. We cannot reject the null hypothesis. There is little evidence to support rejecting the assumption that the population is normally distributed with $\mu = 71$ and $\sigma = 18.54$.