Title: Goodness of Fit Tests
1. Goodness of Fit Tests
- QSCI 381 Lecture 40
- (Larson and Farber, Sect 10.1)
2. Multinomial Experiments
- A multinomial experiment is a probability experiment consisting of a fixed number of trials in which there are more than two possible outcomes for each independent trial. The probability for each outcome is fixed and each outcome is classified into categories.
- Examples of multinomial experiments include:
  - You sample 100 animals from a population. The categories could be age, length, maturity state.
  - You sample 1000 poppies in a field. The categories could be colour.
  - You sample 20 animals and calculate the frequency that each has a particular genetic haplotype.
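A multinomial experiment can be simulated to get a feel for how category counts vary from sample to sample. The sketch below is illustrative only: the function name `simulate_multinomial` and the colour probabilities are hypothetical and do not come from the lecture.

```python
import random
from collections import Counter


def simulate_multinomial(n_trials, probabilities, seed=None):
    """Draw n_trials independent outcomes and count how many fall in each category.

    probabilities maps each category name to its fixed probability.
    """
    rng = random.Random(seed)
    categories = list(probabilities)
    weights = [probabilities[c] for c in categories]
    draws = rng.choices(categories, weights=weights, k=n_trials)
    counts = Counter(draws)
    # Report every category, including any that were never drawn.
    return {c: counts.get(c, 0) for c in categories}


# Hypothetical colour probabilities for the poppy example (assumed values).
colour_probs = {"red": 0.6, "orange": 0.3, "white": 0.1}
counts = simulate_multinomial(1000, colour_probs, seed=1)
print(counts)
```

The counts always sum to the fixed number of trials; only their split between categories changes with the seed.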
3. Goodness-of-fit Tests
- A chi-square goodness-of-fit test is used to test whether an observed frequency distribution fits an expected distribution.
- We need to specify a null and an alternative hypothesis. Generally the null hypothesis is that the observed frequency distribution (the data) fits the expected distribution. The alternative hypothesis is that this is not the case.
4. Example-I
- We expect that a healthy marine mammal population should consist of an equal number of males and females, and that 60% of the population should be mature. We sample 150 animals and count the number in each of four categories:

Mature Female    Mature Male      Immature Female  Immature Male
30               40               32               48
5. Observed and Expected Frequencies
- The observed frequency O of a category is the frequency for the category observed in the data.
- The expected frequency E of a category is the calculated frequency for the category. Expected frequencies are obtained by assuming the specified (or hypothesized) distribution is correct. The expected frequency for the i-th category is
  $E_i = n p_i$
  where n is the number of trials and p_i is the assumed probability for the i-th category.
6. Observed and Expected Frequencies (Example)

                     Mature Female    Mature Male      Immature Female  Immature Male
Observed frequency   30               40               32               48
Assumed probability  0.3              0.3              0.2              0.2
Expected frequency   45 (150 x 0.3)   45 (150 x 0.3)   30 (150 x 0.2)   30 (150 x 0.2)
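The expected frequencies in this example can be reproduced in a few lines. This is a minimal sketch: the helper name `expected_frequencies` is hypothetical, and the category probabilities are built as P(sex) x P(maturity), which assumes that sex and maturity are independent (the lecture gives only the resulting values 0.3, 0.3, 0.2, 0.2).

```python
def expected_frequencies(n, probabilities):
    """Return the expected frequency n * p_i for each category."""
    return [n * p for p in probabilities]


# Assumption: sex and maturity are independent, so the category
# probability is the product of the two marginal probabilities.
p_female = 0.5
p_mature = 0.6
assumed_p = [
    p_female * p_mature,              # mature female
    (1 - p_female) * p_mature,        # mature male
    p_female * (1 - p_mature),        # immature female
    (1 - p_female) * (1 - p_mature),  # immature male
]

observed = [30, 40, 32, 48]
n = sum(observed)  # 150 animals sampled
expected = expected_frequencies(n, assumed_p)
print([round(e, 2) for e in expected])
```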
7. The Chi-square goodness-of-fit Test-I
- IF
  - the observed frequencies are obtained from a random sample, and
  - each expected frequency is greater than or equal to 5 (pool categories if this is not the case),
- THEN the sampling distribution for the goodness-of-fit test is a chi-square distribution with k-1 degrees of freedom, where k is the number of categories. The test statistic is
  $\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$
  where O_i and E_i are the observed and expected frequencies for the i-th category.
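The statistic sums (O - E)^2 / E over the k categories. The sketch below computes it and also checks the condition that every expected frequency is at least 5. The function name `chi_square_statistic` and the choice to raise an error (rather than pool categories automatically) are illustrative assumptions.

```python
def chi_square_statistic(observed, expected, min_expected=5):
    """Sum of (O - E)^2 / E over all categories.

    Raises ValueError if any expected frequency is below min_expected,
    in which case categories should be pooled before testing.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e < min_expected for e in expected):
        raise ValueError("expected frequency below %s: pool categories" % min_expected)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Observed and expected frequencies from the marine mammal example.
observed = [30, 40, 32, 48]
expected = [45, 45, 30, 30]

statistic = chi_square_statistic(observed, expected)
df = len(observed) - 1  # k - 1 degrees of freedom
print(round(statistic, 2), df)
```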
8. The Chi-square goodness-of-fit Test-II
- Identify the claim and state the null and alternative hypotheses.
- Specify the level of significance, α.
- Determine the degrees of freedom, d.f. = k-1.
- Find the critical value of the chi-square distribution and hence define the rejection region for the test.
- Calculate the test statistic.
- Check whether or not the value of the test statistic is in the rejection region.
9. Example (Test using α = 0.01)
- H0: the distribution of animals between sex and maturity classes equals that expected for a healthy population.
- The degrees of freedom = k-1 = 3.
- The critical value of the chi-square distribution is 11.34 (CHIINV(0.01,3)).
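Outside a spreadsheet, the critical value can be found by inverting the chi-square cumulative distribution numerically. The sketch below uses only the Python standard library; the function names are hypothetical, and the approach (a series expansion of the regularized incomplete gamma function followed by bisection) is one illustrative option, not necessarily how CHIINV is implemented. As with CHIINV, the first argument is the right-tail probability.

```python
import math


def chi_square_cdf(x, df):
    """P(X <= x) for a chi-square variable with df degrees of freedom.

    Uses the series expansion of the regularized lower incomplete gamma
    function P(a, x/2) with a = df / 2.
    """
    if x <= 0:
        return 0.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= half_x / (a + n)
        total += term
        n += 1
    return total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))


def chi_square_critical_value(alpha, df):
    """Value with right-tail probability alpha, found by bisection."""
    target = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while chi_square_cdf(hi, df) < target:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi_square_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


critical = chi_square_critical_value(0.01, 3)
print(round(critical, 2))
```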
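10. Example (Test using α = 0.01)

                     Mature Female    Mature Male      Immature Female  Immature Male
Observed frequency   30               40               32               48
Expected frequency   45               45               30               30
(O - E)^2 / E        5.00             0.56             0.13             10.80

- The test statistic is 5.00 + 0.56 + 0.13 + 10.80 = 16.49, which exceeds the critical value of 11.34.
- We reject the null hypothesis at the 1% level of significance.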
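The whole procedure for this example can be sketched as a single function. The name `goodness_of_fit_test` and its return format are illustrative assumptions; the critical value 11.34 is taken from the example rather than computed.

```python
def goodness_of_fit_test(observed, probabilities, critical_value):
    """Chi-square goodness-of-fit test against a fixed critical value.

    Returns the degrees of freedom, the expected frequencies, the test
    statistic, and whether the statistic lies in the rejection region.
    """
    n = sum(observed)
    expected = [n * p for p in probabilities]
    df = len(observed) - 1
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return {
        "df": df,
        "expected": expected,
        "statistic": statistic,
        "reject_h0": statistic > critical_value,
    }


result = goodness_of_fit_test(
    observed=[30, 40, 32, 48],
    probabilities=[0.3, 0.3, 0.2, 0.2],
    critical_value=11.34,  # alpha = 0.01, 3 degrees of freedom
)
print(result["df"], round(result["statistic"], 2), result["reject_h0"])
```

Passing the critical value in as an argument keeps the sketch independent of any particular statistics library.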
11. Example-A-1 (α = 0.05)
- The probability of a particular bird species utilizing each of five habitats is known. We collect data for a different species (n = 137) and wish to assess whether the two species differ in their habitat requirements.

Habitat type   1       2       3       4       5
Expected p     0.2     0.1     0.05    0.5     0.15
Observed       30      17      0       72      18
12. Example-A-2 (α = 0.05)

Habitat type         1       2       3       4       5
Observed frequency   30      17      0       72      18
Expected frequency   27.4    13.7    6.85    68.5    20.55
(O - E)^2 / E        0.25    0.79    6.85    0.18    0.32

The test statistic is 8.39. The critical value is 9.49 (d.f. = 4), so we fail to reject the null hypothesis.
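Looking at each category's contribution shows where the lack of fit comes from. The sketch below recomputes the habitat example row by row; the variable names are illustrative, and the critical value 9.49 is taken from the example.

```python
habitats = [1, 2, 3, 4, 5]
expected_p = [0.2, 0.1, 0.05, 0.5, 0.15]
observed = [30, 17, 0, 72, 18]

n = sum(observed)  # 137 birds observed

rows = []
for habitat, o, p in zip(habitats, observed, expected_p):
    e = n * p                      # expected frequency for this habitat
    contribution = (o - e) ** 2 / e
    rows.append((habitat, o, e, contribution))

statistic = sum(row[3] for row in rows)
largest = max(rows, key=lambda row: row[3])  # habitat with the worst fit

critical_value = 9.49  # alpha = 0.05, 4 degrees of freedom
reject_h0 = statistic > critical_value

for habitat, o, e, contribution in rows:
    print(habitat, o, round(e, 2), round(contribution, 2))
print(round(statistic, 2), reject_h0)
```

Habitat 3, which was never used by the second species, accounts for most of the statistic, yet the total still falls short of the critical value.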
13. Testing for Normality
- We can use the chi-square test in some cases to assess whether a variable is normally distributed.
- The null and alternative hypotheses are:
  - H0: The variable has a normal distribution.
  - Ha: The variable does not have a normal distribution.
14. Example

Class boundaries   Frequency
5-15               6
15-25              23
25-35              53
35-45              45
45-55              22

Can we assume that these data are normal (assume α = 0.05)?
15. Calculating the Test Statistic

Class        Observed      Cumulative normal   Expected p     Expected      (O - E)^2 / E
boundaries   frequency O   Lower     Upper     (difference)   frequency E
5-15         6             0.0030    0.0368    0.0338         5.0           0.1822
15-25        23            0.0368    0.2037    0.1669         24.9          0.1407
25-35        53            0.2037    0.5526    0.3488         52.0          0.0202
35-45        45            0.5526    0.8627    0.3102         46.2          0.0318
45-55        22            0.8627    0.9800    0.1172         17.5          1.1746
Σ            149                               0.977          145.57        1.5497

$E_i = p_i \times 149$
x_i is the mid-point of each class
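The table can be reproduced approximately with the standard library. This sketch assumes that the normal distribution's mean and standard deviation are estimated from the class mid-points x_i weighted by the observed frequencies, using the n-1 divisor for the variance; the lecture does not state this explicitly, so treat it as an assumption.

```python
import math
from statistics import NormalDist

boundaries = [5, 15, 25, 35, 45, 55]
observed = [6, 23, 53, 45, 22]
n = sum(observed)  # 149 observations

# Mid-point of each class, used as a stand-in for the raw data.
midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

# Assumption: grouped-data estimates of the mean and standard deviation.
mean = sum(f * x for f, x in zip(observed, midpoints)) / n
variance = sum(f * (x - mean) ** 2 for f, x in zip(observed, midpoints)) / (n - 1)
sd = math.sqrt(variance)

# Cumulative normal probability at each class boundary.
fitted = NormalDist(mean, sd)
cumulative = [fitted.cdf(b) for b in boundaries]

# Expected p is the difference between upper and lower cumulative values.
expected_p = [upper - lower for lower, upper in zip(cumulative, cumulative[1:])]
expected = [p * n for p in expected_p]

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(mean, 2), round(sd, 2), round(statistic, 2))
```

The expected probabilities sum to less than 1 because the tails below 5 and above 55 are not assigned to any class.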