Title: Using Statistics To Make Inferences 8
- Summary
- Contingency tables.
- Goodness of fit test.
Goals
- To assess contingency tables for independence.
- To perform and interpret a goodness of fit test.
- Practical
- Construct and analyse contingency tables.
Recall
- To compare a population and sample variance, which distribution did we employ?
- The χ² distribution.
Today
- The χ² distribution from last week is used to test whether observed data conform to the pattern expected under a given model.
Categorical Data - Example
- Assessed intelligence of athletic and non-athletic schoolboys.
K. Pearson, Biometrika, 1906, 5, 105-146, data on page 144.
Procedure
- Formulate a null hypothesis. Typically the null hypothesis is that there is no association between the factors.
- Calculate expected frequencies for the cells in the table on the assumption that the null hypothesis is true.
- Calculate the chi-squared statistic. For an r x c table with observed frequency $O_{ij}$ and expected frequency $E_{ij}$ in row i and column j,
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
Procedure
- Compare the calculated statistic with tabulated values of the chi-squared distribution with ν degrees of freedom, where
$$\nu = (\text{rows} - 1)(\text{columns} - 1) = (r - 1)(c - 1)$$
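The procedure above can be sketched in Python with the standard library only. The function name `chi_square_independence` is a hypothetical helper introduced for illustration: it takes the observed counts as a list of rows and returns the expected frequencies, the χ² statistic and ν. Comparing the statistic with the tabulated value is still done with the tables.

```python
def chi_square_independence(observed):
    """Pearson chi-squared test of independence for an r x c table of counts.

    Returns (expected, chi2, df). Hypothetical helper, for illustration only.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected frequencies under the null hypothesis of independence:
    # row total x column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum of (O - E)^2 / E over every cell of the table.
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

    # Degrees of freedom: (rows - 1)(columns - 1).
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return expected, chi2, df
```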
Example
- Assessed intelligence of athletic and non-athletic schoolboys.
- [Table: observed frequencies]
Probabilities
The probability a random boy is athletic is 1148/1708.
The probability a random boy is bright is 790/1708.
Assuming independence, the probability a random boy is both athletic and bright is (1148/1708) × (790/1708).
For 1708 respondents the expected number of athletic bright boys is 1708 × (1148/1708) × (790/1708) = 530.98.
Expected
The expected number of athletic bright boys is row total × column total / grand total = 1148 × 790 / 1708 = 530.98.
Expected
The expected number of athletic stupid boys is 1148 − 530.98 = 617.02.
Expected
The expected number of lazy bright boys is 790 − 530.98 = 259.02.
Expected
The expected number of lazy stupid boys is 918 − 617.02 = 300.98.
Expected
            Bright   Stupid   Total
Athletic    530.98   617.02    1148
Lazy        259.02   300.98     560
Total          790      918    1708
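The same four expected frequencies can be computed directly from the margins. This is a minimal sketch; the row and column totals (1148, 560, 790, 918) are the margins implied by the figures on the slides (1148 − 530.98 = 617.02 and 918 − 617.02 = 300.98), and the dictionary layout is just an illustrative choice.

```python
# Margins implied by the slides: athletic 1148, lazy 1708 - 1148 = 560;
# stupid 918, bright 1708 - 918 = 790.
row_totals = {"athletic": 1148, "lazy": 560}
col_totals = {"bright": 790, "stupid": 918}
grand_total = 1708

# Expected frequency of each cell under independence:
# grand total x P(row) x P(column) = row total x column total / grand total.
expected = {
    (r, c): row_totals[r] * col_totals[c] / grand_total
    for r in row_totals
    for c in col_totals
}

for (r, c), e in expected.items():
    print(f"{r:8} {c:6} {e:7.2f}")
```

It prints 530.98, 617.02, 259.02 and 300.98. Computing every cell from the margins and finding three of them by subtraction give the same answer; subtraction is simply quicker by hand.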
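χ²
$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$
summed over the four cells of the table; for these data χ² = 26.73.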
χ²
As a general rule, to employ this statistic all expected frequencies should exceed 5. If this is not the case, categories are pooled (merged) to achieve this goal; see the Prussian data later.
Conclusion
With ν = (2 − 1)(2 − 1) = 1 degree of freedom, the result is significant at the 5% level (26.73 > 3.84). So we reject the hypothesis of independence between athletic prowess and intelligence.
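As a cross-check on the tabulated value 3.84, here is a minimal standard-library sketch (it assumes Python 3.8 or later for `statistics.NormalDist`). It relies on the fact that a χ² variable with 1 degree of freedom is the square of a standard normal variable.

```python
from math import erfc, sqrt
from statistics import NormalDist

chi2_stat = 26.73  # statistic from the athletic/intelligence example
alpha = 0.05

# With 1 degree of freedom, chi-squared = Z^2, so the critical value
# is the squared two-sided normal point z_{1 - alpha/2}.
critical = NormalDist().inv_cdf(1 - alpha / 2) ** 2

# Upper-tail probability: P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi2_stat / 2))

print(f"critical value = {critical:.2f}")
print(f"p value = {p_value:.2g}")
print("reject independence" if chi2_stat > critical else "do not reject")
```

The critical value printed is 3.84, matching the table, and the p value is far below 0.05.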
SPSS
- Raw data.
- Data > Weight Cases.
- Analyze > Descriptive Statistics > Crosstabs.
- Expected cell frequencies.
- Pearson Chi-Square is the required statistic.
Aside
Two dials were compared. A subject was asked to read each dial many times, and the experimenter recorded his errors. Altogether 7 subjects were tested. The data show how many errors each subject produced. Do the two conditions differ at the 0.05 significance level (give the appropriate p value)?
Observed data
Subject        1    2    3    4    5    6    7
First dial    36   31   31   29   32   25   26
Second dial   29   35   34   35   34   35   30
What key word describes these data?
Aside
- What tests are available for paired data? What assumptions are made?
  - One sample t test (on the paired differences): assumes normality.
  - Sign test: no assumption of normality.
  - Wilcoxon Signed Ranks Test: no assumption of normality. It resembles the sign test in scope, but it is much more sensitive; in fact, for large samples it is almost as sensitive as Student's t test.
- The sign test answers the question "How often?", whereas the other tests answer the question "How much?"
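To make the "How often?" idea concrete, here is a minimal Python sketch of a two-sided sign test on the dial data from the Aside. It assumes the flattened observed data read as subjects 1 to 7, with the first seven error counts from one dial and the last seven from the other; the variable names are illustrative. `math.comb` needs Python 3.8 or later.

```python
from math import comb

# Assumption: flattened "Observed data" read as two rows of seven subjects.
first_dial = [36, 31, 31, 29, 32, 25, 26]
second_dial = [29, 35, 34, 35, 34, 35, 30]

# Paired differences; zero differences are conventionally dropped.
diffs = [a - b for a, b in zip(first_dial, second_dial) if a != b]
n = len(diffs)
n_plus = sum(d > 0 for d in diffs)
k = min(n_plus, n - n_plus)

# Under H0 the number of positive signs is Binomial(n, 1/2).
# Two-sided p value: double the smaller tail, capped at 1.
tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
p_value = min(1.0, 2 * tail)

print(n, n_plus, p_value)
```

With these values one difference is positive and six are negative, so the sign test gives p = 2 × 8/128 = 0.125. The sign test only counts directions; the t test and Wilcoxon test also use the sizes of the differences.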
Example
- The table is based on case records of women employees in Royal Ordnance factories during 1943-6. The same test was carried out on the left eye (columns) and the right eye (rows).
- Stuart, Biometrika, 1953, 40, 105-110.
Observed
Is there any obvious structure?
Expected
In general, to find the expected frequency in a particular cell the equation is
$$E_{ij} = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$$
So for the highest grade in both the right and left eyes the equation becomes 1976 × 1907 / 7477 = 503.98.
Expected
The missing values are simply found by subtraction. For example, the last expected frequency in the first row is
1976 − 503.98 − 587.22 − 662.54 = 222.26.
Similarly for the remaining cells.
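Both rules used for the eye data (the general formula for one cell, then subtraction from a row total) can be checked in a few lines. This is a minimal sketch using only the figures on the slides; the variable names are illustrative.

```python
row1_total = 1976   # first row total (right eye, highest grade)
col1_total = 1907   # first column total (left eye, highest grade)
grand_total = 7477

# Row total x column total / grand total for the top-left cell.
top_left = row1_total * col1_total / grand_total
print(f"{top_left:.2f}")  # 503.98

# With the first three expected values of row 1 known,
# the last one follows by subtraction from the row total.
first_three = [503.98, 587.22, 662.54]
last = row1_total - sum(first_three)
print(f"{last:.2f}")  # 222.26
```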
Short Cut
- Tabulate the contributions to the χ² statistic cell by cell. For the top left cell the contribution is
$$\frac{(O_{11} - E_{11})^2}{E_{11}}, \qquad E_{11} = 503.98$$
Conclusion
The total χ² statistic makes it very clear that there is some relationship between the quality of the right and left eyes.
SPSS
- Raw data.
- Expected cell frequencies.
- Pearson Chi-Square is the required statistic.
Alternative applications
- A similar approach may be employed to test whether simple models are plausible.
χ² Goodness of Fit Test
The degrees of freedom are $\nu = m - n - 1$, where m is the number of frequencies left in the problem after pooling and n is the number of parameters fitted from the raw data. For example:
Example
- The number of Prussian army corps in which soldiers died from the kicks of a horse in a year.
- Typical industrial injury data.
Which distribution is appropriate?
- Are the data discrete or continuous?
- Discrete, since each value is a simple count.
Check list of distributions
Check list of distribution parameters
- n, p (binomial)
- µ, σ² (normal)
- λ (Poisson)
Discrete, with no n, implies Poisson.
Poisson Distribution
- Discrete events which are independent.
- Events occur at a fixed rate λ per unit continuum.
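Poisson Distribution
$$P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \qquad x = 0, 1, 2, \ldots$$
x is the number of successes.
e is approximately equal to 2.718.
λ is the rate per unit continuum.
The mean is λ and the variance is λ.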
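A minimal Python sketch of the probability function above, using only the standard library; `poisson_pmf` is a hypothetical helper name. It also checks numerically that the mean and variance both equal λ, truncating the infinite sum at x = 49, which is harmless for λ = 0.7.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Poisson probability P(X = x) = e^(-lam) * lam^x / x!."""
    return exp(-lam) * lam ** x / factorial(x)

lam = 0.7        # rate per unit continuum (value from the Prussian example)
xs = range(50)   # truncate the infinite sum; the neglected tail is negligible

mean = sum(x * poisson_pmf(x, lam) for x in xs)
variance = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in xs)

print(round(poisson_pmf(0, lam), 4))       # 0.4966
print(round(mean, 6), round(variance, 6))  # 0.7 0.7
```

Here `math.exp` plays the role of the calculator's exp key: `exp(1)` is 2.718281828...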
Casio 83ES
- Use the exp or e key.
- exp(1) = 2.7182818, exp(2) = 7.389056.
Observed Data
We need to estimate the Poisson parameter λ, which is the mean of the distribution.
Mean
$$\bar{x} = \frac{\sum x f}{\sum f} = 0.7$$
where f is the observed frequency of x deaths, giving the estimate λ = 0.7.
Expected
λ = 0.7 and e is a constant on your calculator, so $P(X = 0) = e^{-0.7} = 0.4966$.
Expected Frequency
- Expected frequency for no deaths = 280 × 0.4966 = 139.04.
Expected Frequency
- Expected frequency for the remaining rows = 280 × probability.
Note the two expected frequencies less than 5!
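The expected frequencies and the pooling step can be sketched as follows, using λ = 0.7 and a total of 280 from the slides. The class boundaries (0 to 4 deaths plus a "5 or more" tail, so the expected values add up to 280) are an assumption, as is the choice to pool the classes from 3 upwards; the helper `poisson_pmf` is hypothetical and is redefined so the block stands alone.

```python
from math import exp, factorial

N = 280     # total frequency in the Prussian data
lam = 0.7   # estimated Poisson mean

def poisson_pmf(x, lam):
    """Poisson probability P(X = x) = e^(-lam) * lam^x / x!."""
    return exp(-lam) * lam ** x / factorial(x)

# Expected frequencies for 0..4 deaths, plus a tail class for 5 or more
# (assumed class boundaries) so that the expected frequencies add up to N.
probs = [poisson_pmf(x, lam) for x in range(5)]
probs.append(1 - sum(probs))
expected = [N * p for p in probs]
labels = ["0", "1", "2", "3", "4", "5+"]

for label, e in zip(labels, expected):
    flag = "  < 5" if e < 5 else ""
    print(f"{label:>3} {e:8.2f}{flag}")

# Pool the classes 3, 4 and 5+ so that every expected frequency exceeds 5.
pooled = expected[:3] + [sum(expected[3:])]
print([round(e, 2) for e in pooled])
```

This gives 139.04, 97.33, 34.07, 7.95, 1.39 and 0.22, with two values below 5. Pooling the last three classes leaves m = 4 classes with expected frequencies 139.04, 97.33, 34.07 and 9.56.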
χ² Calculation
Pool categories to ensure all expected frequencies exceed 5.
Conclusion
- Here m (frequencies) = 4,
- n (fitted parameters) = 1,
- then ν = m − n − 1 = 4 − 1 − 1 = 2.
The hypothesis that the data come from a Poisson distribution would be accepted at the 5% level (1.95 < 5.991).
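The degrees of freedom, the 5% critical value and the p value for this test can all be obtained without tables, because for exactly 2 degrees of freedom the χ² upper tail has the closed form $P(\chi^2_2 > x) = e^{-x/2}$. A minimal standard-library sketch, using m, n and χ² = 1.95 from the slide:

```python
from math import exp, log

m = 4             # frequencies left after pooling
n = 1             # parameters fitted from the data (lambda)
chi2_stat = 1.95  # statistic from the slide
alpha = 0.05

df = m - n - 1

# The closed form below holds only for exactly 2 degrees of freedom:
# P(chi2_2 > x) = exp(-x / 2).
assert df == 2
critical = -2 * log(alpha)
p_value = exp(-chi2_stat / 2)

print(df, round(critical, 3), round(p_value, 3))
```

It prints 2, 5.991 and 0.377: the statistic is well below the critical value, so the Poisson model is not rejected.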
Next Week
- Bring your calculators next week.
Read
- Read Howitt and Cramer pages 134-152
- Read Howitt and Cramer (e-text) pages 125-134
- Read Russo (e-text) pages 100-119
- Read Davis and Smith pages 434-448
Practical 8
- This material is available from the module web page: http://www.staff.ncl.ac.uk/mike.cox
Practical 8
- The following material for the practical is available:
  - Instructions for the practical: Practical 8
  - Material for the practical: Practical 8
Assignment 2
- You will find submission details on the module web site.
Note: the dialog boxes lower down the page give access to your individual assignment. You must enter your student number exactly as it appears on your smart card.
Assignment 2
- As a general rule, make sure you can perform the calculations manually.
It does no harm to check your calculations using a software package.
Some software packages employ non-standard definitions, so use them with caution.
Assignment 2
- All submissions must be typed.