Title: Goodness of Fit Test
1. Goodness of Fit Test
We have seen hypothesis tests for one proportion and for two proportions; we will now look at tests for more than two proportions. Consider a survey asking respondents whether they will use a credit card to pay their taxes. The categorical variable is their response to the question.
One-Way Frequency Table
There are four proportions here, one for each response category.
2. Goodness of Fit Test
Our hypothesis of interest (H0) is that the responses are uniformly distributed: the relative frequency, or proportion, is .25 for each category.
There were 100 observations. If the observed data fit these proportions, then we would expect each category in the sample to have a count of $.25 \times 100 = 25$.
Outcome          Observed   Expected (if H0 is true)
Definitely will      14          25
Probably will        12          25
Probably not         24          25
Definitely not       50          25
Total               100         100
Is the difference Observed − Expected due to sampling variation, with H0 true, or is the difference because the true proportions are not uniform?
3. Births and the Lunar Cycle 1
(Table of births by lunar phase not reproduced. Totals: 699 days, 222,784 births.)
The question we address is: are there disproportionately more births in certain lunar cycles? The categorical variable, Lunar Phase, has 8 categories observed over a two-year, 699-day period, so we have 8 proportions, $p_1, \ldots, p_8$, one for each category.
H0: The observed data fit the proportions $p_1, \ldots, p_8$.
HA: The observed data do not fit the proportions.
4. Births and the Lunar Cycle 2
If H0 is true, then the true proportions for each of the cycles are as follows. (Table not reproduced.)
Next we need to calculate the expected number of births for each lunar cycle if H0 is true. The expected count for each cycle is the H0 proportion times the total births, 222,784. For the first cycle, the expected count is $p_1 n = .0343 \times 222{,}784 = 7641.49$. For the second cycle, the expected count is $p_2 n = .2175 \times 222{,}784 = 48{,}455.52$. For the eighth cycle, the expected count is $p_8 n = .2175 \times 222{,}784 = 48{,}455.52$.
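The expected-count rule above (H0 proportion times total births) can be sketched in a few lines of Python. Only $p_1$, $p_2$, and $p_8$ are quoted in the text, so the example uses just those three; the function name `expected_counts` is a hypothetical helper, not part of any calculator program.

```python
def expected_counts(proportions, n):
    """Expected count for each category under H0: proportion times total sample size."""
    return [p * n for p in proportions]


total_births = 222_784
# Only the proportions quoted in the text (p1, p2, p8); the other five are not reproduced here.
quoted = {"p1": 0.0343, "p2": 0.2175, "p8": 0.2175}
for name, count in zip(quoted, expected_counts(quoted.values(), total_births)):
    print(f"{name} * n = {count:,.2f}")
```

This prints 7,641.49 for $p_1 n$ and 48,455.52 for $p_2 n$ and $p_8 n$, matching the hand calculations.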
5. Births and the Lunar Cycle 3
(Table of observed and expected births not reproduced; each column totals 222,784.)
Next, we need a measure of how far, on average, the observed counts are from the expected counts.
6. Births and the Lunar Cycle 4
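For example, the contribution for the First Quarter is $\frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(7579 - 7641.49)^2}{7641.49} \approx .511$. The chi-square statistic is the sum of these contributions over all cells. The number of degrees of freedom (df) for the chi-square statistic is the number of cells (proportions) minus 1; here $8 - 1 = 7$.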
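A minimal Python sketch of the statistic and its degrees of freedom, assuming the observed and expected counts are given as parallel lists. The helper names are hypothetical; the example reuses the credit-card survey counts from slide 2, where every expected count is 25.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells.

    Returns the total and the list of per-cell contributions.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return sum(contributions), contributions


def goodness_of_fit_df(num_cells):
    """Degrees of freedom for a goodness-of-fit test: number of cells minus 1."""
    return num_cells - 1


# Credit-card survey: uniform H0, so each expected count is .25 * 100 = 25.
observed = [14, 12, 24, 50]
expected = [0.25 * 100] * 4
stat, parts = chi_square_statistic(observed, expected)
print(parts)  # per-cell contributions
print(stat, goodness_of_fit_df(len(observed)))
```

For the survey data the contributions are 4.84, 6.76, 0.04, and 25.0, so the statistic is 36.64 with df = 3.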
7. Births and the Lunar Cycle 5
Next, we need a sampling distribution for the chi-square statistic so that we can compute a p-value. The sampling distribution is the distribution of the chi-square statistic over all possible samples of size n, assuming H0 is true, for a particular number of degrees of freedom.
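The idea of a sampling distribution can be made concrete with a small Monte Carlo sketch: draw many samples of size n under H0, compute the statistic for each, and see how often it is at least as large as the observed value. This is a simulation approximation, not the calculator method used in these slides; the function name `simulate_chi_square`, the seed, and the 10,000 repetitions are assumptions chosen for illustration.

```python
import random


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulate_chi_square(proportions, n, reps=10_000, seed=1):
    """Draw `reps` samples of size n under H0 and return each sample's chi-square statistic."""
    rng = random.Random(seed)
    k = len(proportions)
    expected = [p * n for p in proportions]
    stats = []
    for _ in range(reps):
        counts = [0] * k
        # Each observation falls in category i with probability proportions[i].
        for i in rng.choices(range(k), weights=proportions, k=n):
            counts[i] += 1
        stats.append(chi_square_statistic(counts, expected))
    return stats


# Credit-card survey: uniform H0 over 4 categories, n = 100, observed statistic 36.64.
stats = simulate_chi_square([0.25] * 4, 100)
approx_p = sum(s >= 36.64 for s in stats) / len(stats)
print(f"mean statistic: {sum(stats) / len(stats):.2f}")  # should be near df = 3
print(f"simulated p-value: {approx_p}")
```

The simulated statistics average close to 3, the degrees of freedom, and essentially none reach 36.64, so the simulated p-value is near 0.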
8. Births and the Lunar Cycle 6
- Conditions for the chi-square sampling distribution:
  - The observed cell counts are based on a random sample.
  - The observed cell counts are independent.
  - The sample size is large: each expected cell count is at least 5.
- Solving for the p-value using the TI-83 add-in program:
  - Enter the observed data in L1. Calculate the expected list and enter it in L2 (from slide 6).
  - PRGM 2 (GOODFIT); OBSERVED LST: 2ND L1; EXPECTED LST: 2ND L2.
  - Output: P-VALUE = .4764, CH SQ STAT = 6.557, df = 7.
- The p-value means that, if H0 is true, the probability of getting a chi-square statistic as high as or higher than 6.557 is .4764, or 47.64%. Because the p-value is greater than .05 (5%), we fail to reject H0, and we have not shown that there are disproportionate numbers of births in any moon cycle.
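As a cross-check on the add-in program's output, the upper-tail p-value can be computed directly from the chi-square distribution. The sketch below uses the standard series for the regularized lower incomplete gamma function, since $P(\chi^2_{df} \ge x) = 1 - P(df/2,\, x/2)$. The function name is hypothetical; in practice a statistics library (for example SciPy's `scipy.stats.chi2.sf`) would normally be used instead.

```python
import math


def chi_square_p_value(stat, df):
    """Upper-tail probability P(X >= stat) for X ~ chi-square(df).

    Uses the series expansion of the regularized lower incomplete gamma
    function P(a, x) with a = df/2 and x = stat/2, then returns 1 - P.
    """
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    term = total = 1.0 / a  # n = 0 term of the series
    n = 1
    while term > total * 1e-15:
        term *= x / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(x) - x - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


# Lunar-cycle result from the slide: chi-square statistic 6.557 with df = 7.
print(round(chi_square_p_value(6.557, 7), 4))
```

This prints 0.4764, agreeing with the calculator's P-VALUE.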
9. Problems
10. Problems
11. Test for Homogeneity: Concussions Problem
The table above is a frequency table giving the observed counts of the variable concussions for three different groups of individuals. The question we want to test is whether the concussion distribution is the same, or homogeneous, for each of the three groups.
H0: The category proportions are the same for each of the three groups.
HA: The category proportions are not the same, and the distributions are not homogeneous.
We will use chi-square methods as we did for goodness-of-fit tests.
12. Concussions Problem 2
The numbers in parentheses are the expected counts if the null hypothesis is true. For example, we assume the proportion with 0 concussions is the same for each group, and we estimate it by the overall proportion for the three groups combined. For 0 concussions, the overall proportion is $158/240 = .6583$. The expected count for soccer players is $.6583 \times 91 = 59.9$, for non-soccer athletes $.6583 \times 96 = 63.2$, and for non-athletes $.6583 \times 53 = 34.9$.
The conditions for a chi-square test for homogeneity are that the counts come from a random sample and that each expected cell count is at least 5. The data above do not meet this condition, so we combine the last two columns into a new table.
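The rule "overall proportion times group size" is the same as row total × column total / grand total, since $.6583 \times 91 = 158 \times 91 / 240$. A minimal sketch of that rule, plus the "every expected count is at least 5" check, is below. The full concussions table is not reproduced in these notes, so the example uses only the margins quoted above; the helper names are hypothetical.

```python
def expected_counts(observed):
    """Expected cell counts under H0 for a two-way table:
    (row total * column total) / grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def meets_expected_count_condition(expected, minimum=5):
    """Check that every expected cell count is at least `minimum`."""
    return all(e >= minimum for row in expected for e in row)


# The slide's 0-concussions cells: column total 158, group totals 91, 96, 53, grand total 240.
for group, n_group in [("Soccer players", 91), ("Non-soccer athletes", 96), ("Non-athletes", 53)]:
    print(group, round(158 * n_group / 240, 1))
```

This prints 59.9, 63.2, and 34.9, the same expected counts as above. Applied to a full table, `meets_expected_count_condition(expected_counts(table))` returning `False` is the signal to combine sparse columns.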
13. Concussions Problem 3
We recalculate the expected counts, and the new table meets the condition that each expected cell count is at least 5. Then, for each cell, we calculate the chi-square contribution. For example, for Soccer Players with 0 concussions, $\frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(45 - 59.9)^2}{59.9} \approx 3.706$. We add up the contributions for the 9 cells to get the chi-square statistic. The number of degrees of freedom is $(\text{rows} - 1) \times (\text{columns} - 1) = (3 - 1) \times (3 - 1) = 4$.
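A sketch of the same bookkeeping for a whole two-way table: compute each expected count, add up the per-cell contributions, and set df = (rows − 1) × (columns − 1). The function name is hypothetical, and the 2 × 2 table in the example is made up for illustration; it is not data from these slides.

```python
def chi_square_two_way(observed):
    """Chi-square statistic and degrees of freedom for a test of homogeneity
    (or independence) on a two-way table of observed counts.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - exp) ** 2 / exp  # this cell's contribution
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Single-cell check from the slide: Soccer Players, 0 concussions.
print(round((45 - 59.9) ** 2 / 59.9, 3))

# Hypothetical 2 x 2 table (not data from these slides).
print(chi_square_two_way([[10, 20], [30, 40]]))
```

The first line prints 3.706; for the hypothetical table the expected counts are 12, 18, 28, and 42, and df = 1.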
14. Concussions Problem 4
Using the TI-83 Black Box Program for a chi-square test for homogeneity:
- Enter the table of observed counts in a matrix. Do not enter row or column totals.
  - 2ND MATRIX, EDIT, 1:[A], ENTER
  - 3 ENTER 3 ENTER (enters the matrix size, rows × columns)
  - 45 ENTER 25 ENTER 3 ENTER, and so on (enter all the data)
- STAT, TESTS, C:χ²-Test, ENTER; Observed: [A]; Expected: [B]; Calculate, ENTER.
  - Output: χ² = 20.6036, P-VALUE = 3.7942E-4
- 2ND MATRIX, EDIT, 2:[B], ENTER displays [B], the expected counts. Check that all are at least 5.
Since the p-value (about .00038) is less than .05, we reject H0 and conclude that the distributions are not homogeneous.
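The calculator's χ²-Test reports the statistic and p-value and stores the expected counts in [B]. A rough Python analogue for checking work away from the calculator might look like this; it mirrors those outputs but is not the TI-83's internal algorithm, and the function names are hypothetical. The p-value uses the regularized incomplete gamma series.

```python
import math


def chi_square_upper_tail(stat, df):
    """P(X >= stat) for X ~ chi-square(df), via the regularized incomplete gamma series."""
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= x / (a + n)
        total += term
        n += 1
    return max(0.0, 1.0 - math.exp(a * math.log(x) - x - math.lgamma(a)) * total)


def chi_square_test(observed):
    """Return (statistic, p-value, expected matrix), loosely mirroring the
    calculator's chi-square output and its expected-count matrix [B]."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, chi_square_upper_tail(stat, df), expected


# The slide's reported statistic: chi-square = 20.6036 with 4 df.
print(f"{chi_square_upper_tail(20.6036, 4):.4E}")
```

The printed p-value should closely match the calculator's P-VALUE of 3.7942E-4; small differences come from the statistic being rounded to four decimals.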
15. Homogeneity Problems
16. Homogeneity Problems
17. Tests for Independence
In tests for independence, we have one sample and two variables. Suppose we have a sample of 100 people and measure the following two categorical variables: gender (male or female) and estrogen (yes or no). We define the two variables to be independent if knowing the category of one variable does not change the probability of the category of the other variable. If there are 50 men and 50 women in the sample, the probability of picking an "estrogen: yes" person is 50%. However, if I know that the person is female, the probability of "estrogen: yes" is 100%. Clearly these two variables are not independent. There is a relationship between the two variables; we say they are associated.
Now consider the two categorical variables gender (male or female) and eye color (brown, blue, green, hazel, other). Suppose the probability of brown is 23%. If I know the gender, the probability of brown is still 23%. We would say the variables gender and eye color are independent. There is no relationship between the two variables; they are not associated.
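The definition above compares a marginal probability, such as P(estrogen: yes), with a conditional one, such as P(estrogen: yes | female). A small sketch computes both from a two-way table of counts, assuming the split implied by the example (50 women, all "yes"; 50 men, all "no"). The helper names and the dictionary layout are hypothetical choices for illustration.

```python
def marginal_probability(table, col):
    """P(column category), ignoring the row variable."""
    total = sum(sum(row.values()) for row in table.values())
    return sum(row[col] for row in table.values()) / total


def conditional_probability(table, row_key, col):
    """P(column category | row category)."""
    row = table[row_key]
    return row[col] / sum(row.values())


# Counts implied by the estrogen example in the text.
counts = {
    "female": {"yes": 50, "no": 0},
    "male": {"yes": 0, "no": 50},
}
p_yes = marginal_probability(counts, "yes")
p_yes_given_female = conditional_probability(counts, "female", "yes")
print(p_yes, p_yes_given_female)  # 0.5 1.0
# Knowing the gender changes the probability, so the variables are associated.
print("independent" if abs(p_yes - p_yes_given_female) < 1e-12 else "associated")
```

One category whose conditional probability differs from its marginal probability is enough to show dependence; independence requires equality for every combination of categories.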
18. Stroke Mortality and Education Problem
Test for independence of the two variables:
H0: The variables are independent (not related).
HA: The variables are not independent (related).
An alternate definition of independence: two categorical variables are independent if their category distributions are homogeneous. This means that the procedures for testing for independence are identical to the procedures for tests of homogeneity covered previously. On the TI-83, enter the table into a matrix (2ND MATRIX, EDIT). Then run the chi-square test (STAT, TESTS, C:χ²-Test), as described in detail on slide 14. We get a p-value of .016. Using a significance level of α = .01, we cannot reject H0, because .016 > .01. Therefore we cannot conclude that the variables are not independent; the data do not show that they are related.
19. Independence Problems
20. Independence Problems