Title: Mobile Computing Group
1 Mobile Computing Group
A quick-and-dirty tutorial on the chi2 test for
goodness-of-fit testing
2 Outline
The presentation follows the pyramid schema
Chi2 tests for GoF
Goodness-of-fit (GoF)
Background concepts
3 Background
- Descriptive vs. inferential statistics
  - Descriptive: data are used only for descriptive purposes (tables, graphs, measures of variability, etc.)
  - Inferential: data are used to draw inferences, make predictions, etc.
- Sample vs. population
  - A sample is drawn from a population, which is assumed to have some characteristics
  - The sample is often used to make inferences about the population (inferential statistics)
    - Hypothesis testing
    - Estimation of population parameters
4 Background
- Statistic vs. parameter
  - A statistic is related to (estimated from) a sample. It can be used for both descriptive and inferential purposes
  - A parameter refers to the whole population. A sample statistic is often used to infer a population parameter
  - Example: the sample mean may be used to infer the population mean (expected value)
- Hypothesis testing
  - A procedure where sample data are used to evaluate a hypothesis regarding the population
  - A hypothesis may refer to several things: properties of a single population, the relation between two populations, etc.
  - Two statistical hypotheses are defined: a null hypothesis H0 and an alternative hypothesis H1
  - H0 is often a statement of no effect or no difference. It is the hypothesis the researcher seeks to reject
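As a minimal sketch of the statistic vs. parameter distinction, the snippet below draws a sample from a hypothetical population (a fair six-sided die, whose population mean is 3.5) and uses the sample mean as an estimate of it. The population, the seed and the sample size are illustrative assumptions, not part of the slides.

```python
import random
import statistics

# Hypothetical population: outcomes of a fair six-sided die.
# Its parameter of interest, the population mean, is known here: 3.5.
POPULATION_MEAN = sum(range(1, 7)) / 6


def sample_mean_of_rolls(n, seed=0):
    """Draw n rolls and return the sample mean (a statistic)."""
    rng = random.Random(seed)
    rolls = [rng.randint(1, 6) for _ in range(n)]
    return statistics.mean(rolls)


estimate = sample_mean_of_rolls(10_000)
print(f"sample mean = {estimate:.3f}, population mean = {POPULATION_MEAN}")
```

The sample mean varies from sample to sample, whereas the population mean is fixed; inferential statistics quantifies how far apart the two can plausibly be.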
5 Background
- Inferential statistical test
  - Hypothesis testing is carried out via an inferential statistical test
  - Sample data are manipulated to yield a test statistic
  - The obtained value of the test statistic is evaluated with respect to a sampling distribution, i.e., a theoretical probability distribution for the possible values of the test statistic
  - The theoretical values of the statistic are usually tabulated, which lets one assess the statistical significance of the test result
- Goodness-of-fit testing is a type of hypothesis testing
  - Devise inferential statistical tests, apply them to the sample, and infer how well a theoretical distribution matches the population distribution
6 GoF as hypothesis testing
- Hypothesis H0
  - The sample is drawn from a theoretical distribution F(θ)
- The sample data are manipulated to derive a test statistic
  - In the case of the chi2 statistic, this includes aggregating the data into bins and some computations
- The statistic, as computed from the data, is checked against the sampling distribution
  - For the chi2 test, the sampling distribution is the chi2 distribution, hence the name
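The aggregation step can be sketched as follows. This is only an illustration: the function name, the sample values and the bin borders are hypothetical, and the convention that each bin includes its left border is an assumption.

```python
from bisect import bisect_right


def observed_frequencies(sample, edges):
    """Count how many sample values fall into each bin.

    `edges` holds the M-1 interior bin borders in increasing order, so the
    bins are (-inf, e1), [e1, e2), ..., [e_{M-1}, +inf).
    """
    counts = [0] * (len(edges) + 1)
    for x in sample:
        counts[bisect_right(edges, x)] += 1
    return counts


# Hypothetical sample and interior borders, for illustration only.
sample = [0.1, 0.4, 0.5, 1.2, 1.9, 2.5, 3.3, 0.7]
edges = [0.5, 1.0, 2.0]
print(observed_frequencies(sample, edges))
```

The resulting counts are the observed frequencies that enter the chi2 statistic.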
7 Goodness-of-fit
- Statistical tests and statistics: the big picture
  - Chi2 type tests
    - Classical chi2 statistics
      - Pearson chi2 statistic
      - Modified chi2 statistic
      - Log-likelihood ratio statistic
    - Generalized chi2 statistics
  - EDF-based tests, e.g., KS test, Anderson-Darling test
  - Specialized tests, e.g., Shapiro-Wilk test for normality
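8 Pearson chi2 statistic
If X1, X2, X3, ..., Xn is the random sample and F(θ) the theoretical distribution under test, the Pearson chi2 statistic is computed as
$$X^2 = \sum_{i=1}^{M} \frac{(O_i - E_i)^2}{E_i} = \sum_{i=1}^{M} \frac{(N_i - n p_i)^2}{n p_i}$$
- M: number of bins
- Oi (Ni): observed frequency in bin i
- n: sample size
- Ei (npi): expected frequency in bin i according to the theoretical distribution F(θ), where pi is the probability of bin i under F(θ)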
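A minimal Python sketch of the statistic, computed as the sum over bins of (Oi - Ei)^2 / Ei with Ei = n pi. The function name and the example counts are hypothetical; the bin probabilities pi are assumed to be supplied by the caller.

```python
def pearson_chi2(observed, probs):
    """Pearson statistic: sum over bins of (O_i - E_i)^2 / E_i, E_i = n * p_i."""
    if len(observed) != len(probs):
        raise ValueError("observed and probs must have one entry per bin")
    n = sum(observed)  # sample size
    stat = 0.0
    for o_i, p_i in zip(observed, probs):
        e_i = n * p_i  # expected frequency in this bin
        stat += (o_i - e_i) ** 2 / e_i
    return stat


# Hypothetical counts in M = 4 equiprobable bins (n = 40).
observed = [8, 12, 9, 11]
probs = [0.25] * 4
print(pearson_chi2(observed, probs))
```

Every bin must have a positive expected frequency, otherwise the division is undefined.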
9 Interpretation of chi2 statistic
- Theory says that the Pearson chi2 statistic follows (approximately, for large samples) a chi2 distribution, whose df are
  - M-1, when the parameters of the fitted distribution are given a priori (case 0 test)
  - Somewhere between M-1 and M-1-q, when the q parameters of the distribution are estimated from the sample data
  - Usually, the df for this case are taken to be M-1-q
- Having computed the value X2 of the chi2 statistic, I check the chi2 distribution with M-1 (or M-1-q) df to find
  - the probability of getting a value equal to or greater than the computed value X2, called the p-value
- If p < α, where α is the significance level of my test, the hypothesis is rejected; otherwise it is retained
- Standard values for α are 0.1, 0.05 and 0.01; the lower α is, the more conservative I am in rejecting the hypothesis H0
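The p-value lookup and the decision rule (reject H0 when the p-value falls below the significance level) can be sketched in plain Python. The upper-tail probability is evaluated here with a series expansion of the regularized incomplete gamma function, which is an implementation choice made for this sketch; in practice a statistics library or a printed table would normally be used. All names and the example numbers are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi2 distribution with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z):
    # P(a, z) = z^a e^(-z) / Gamma(a) * sum_{k>=0} z^k / (a (a+1) ... (a+k))
    term = 1.0 / a
    total = term
    k = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + k)
        total += term
        k += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def gof_decision(stat, df, alpha):
    """Return (p_value, reject_h0) for an upper-tail chi2 test."""
    p_value = chi2_sf(stat, df)
    return p_value, p_value < alpha


# Hypothetical result: X2 = 12.3 from M = 6 bins, no estimated parameters.
p, reject = gof_decision(12.3, df=5, alpha=0.05)
print(f"p-value = {p:.3f}, reject H0: {reject}")
```

The series converges for any positive argument but slowly for very large statistics, so it is meant for illustration rather than production use.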
10 Example
- A die is rolled 120 times
  - 1 comes up 20 times, 2 comes up 14, 3 comes up 18, 4 comes up 17, 5 comes up 22 and 6 comes up 29 times
- The question is "Is the die biased?" or, better, "Do these data suggest that the die is biased?"
- Hypothesis H0: the die is not biased
  - Therefore, according to the null hypothesis, these numbers should be distributed uniformly
  - F(θ): the discrete uniform distribution
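With n = 120 rolls and six equally likely faces, the expected frequency under H0 is Ei = 120/6 = 20 for every face. As a worked sketch, using the Pearson statistic defined on the earlier slide:

```latex
X^2 = \sum_{i=1}^{6} \frac{(O_i - E_i)^2}{E_i}
    = \frac{(20-20)^2 + (14-20)^2 + (18-20)^2 + (17-20)^2 + (22-20)^2 + (29-20)^2}{20}
    = \frac{0 + 36 + 4 + 9 + 4 + 81}{20}
    = \frac{134}{20} = 6.7
```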
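11 Example cont.
- Interpretation
  - The distribution of the test statistic has 5 df (M-1 = 6-1, no estimated parameters)
  - The computed statistic is X2 = 6.7. The probability of getting a value smaller than or equal to 6.7 under a chi2 distribution with 5 df is about 0.75, so the p-value (the probability of a value equal to or greater than 6.7) is about 0.25, which is greater than α for all α in 0.01..0.1
  - Therefore the hypothesis that the die is not biased cannot be rejected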
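The upper-tail probability for the die example can be cross-checked by simulation: roll a fair die 120 times, compute the Pearson statistic, repeat many times, and count how often the simulated statistic is at least as large as the observed one. This is only a sketch; the number of replications, the seed and the function names are arbitrary assumptions.

```python
import random


def pearson_uniform(counts):
    """Pearson chi2 statistic against equal expected frequencies.

    Uses the identity (c - n/m)^2 / (n/m) = (m*c - n)^2 / (m*n) so that
    the numerator stays an exact integer.
    """
    n, m = sum(counts), len(counts)
    return sum((m * c - n) ** 2 for c in counts) / (m * n)


def simulated_tail_probability(observed_stat, n_rolls=120, faces=6,
                               replications=2000, seed=1):
    """Fraction of simulated fair-die experiments whose statistic is at
    least as large as observed_stat."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replications):
        counts = [0] * faces
        for _ in range(n_rolls):
            counts[rng.randrange(faces)] += 1
        if pearson_uniform(counts) >= observed_stat:
            hits += 1
    return hits / replications


observed = [20, 14, 18, 17, 22, 29]
stat = pearson_uniform(observed)
print(f"X2 = {stat:.1f}, simulated upper-tail probability = "
      f"{simulated_tail_probability(stat):.2f}")
```

The simulated fraction should land near the upper-tail probability of the chi2 distribution with 5 df at 6.7, up to Monte Carlo noise.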
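12 Interpretation of Pearson chi2
- At the 10% significance level, I would reject the hypothesis if the computed X2 > 9.24
[Figure: density of the chi2 distribution with 5 df. The computed value 6.7 has p-value 0.25; the critical values 9.24, 11.07 and 15.09 leave 10%, 5% and 1% of the area under the curve to their right (α = 0.1, 0.05, 0.01).]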
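Equivalently to comparing the p-value with the significance level, the computed statistic can be compared with a tabulated critical value. The sketch below hard-codes the three critical values quoted on the slide for 5 df; the function name and the table layout are hypothetical.

```python
# Upper-tail critical values of the chi2 distribution with 5 df,
# as quoted on the slide (significance level -> critical value).
CRITICAL_VALUES_5DF = {0.10: 9.24, 0.05: 11.07, 0.01: 15.09}


def reject_h0(stat, alpha, table=CRITICAL_VALUES_5DF):
    """Reject H0 when the computed statistic exceeds the critical value."""
    return stat > table[alpha]


# Die example: X2 = 6.7 is checked at each tabulated significance level.
for alpha in sorted(CRITICAL_VALUES_5DF, reverse=True):
    print(alpha, reject_h0(6.7, alpha))
```

Since 6.7 lies below all three critical values, H0 is retained at every listed level.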
13 Properties of Pearson chi2 statistic
- It can be computed for both discrete and continuous variables
  - This holds for all chi2 statistics: maximum flexibility, but they fail to make use of all available information for continuous variables
- It is perhaps the simplest one from a computational point of view
- As with all chi2 statistics, one needs to define the number and borders of the bins
  - These are generally a function of the sample size and the theoretical distribution under test
14 Bin selection
- How many and which?
  - Different opinions in the literature, no rigorous proof of optimality
  - There seems to be convergence on the following aspects
- Probability of bins
  - The bins should be chosen equiprobable with respect to the theoretical distribution under test
- Minimum expected frequencies npi
  - (Cramer, 46): npi > 10 for all bins
  - (Cochran, 54): npi > 1 for all bins, npi > 5 for 80% of bins
  - (Roscoe and Byars, 71)
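The two minimum-expected-frequency rules quoted above can be checked mechanically. In this sketch the function names and the example probabilities are hypothetical, and reading the Cochran rule as "at least 80% of the bins" is an assumption about the intended meaning.

```python
def meets_cramer(n, probs):
    """Rule quoted from (Cramer, 46): n * p_i > 10 for every bin."""
    return all(n * p > 10 for p in probs)


def meets_cochran(n, probs):
    """Rule quoted from (Cochran, 54): n * p_i > 1 for every bin and
    n * p_i > 5 for at least 80% of the bins (assumed reading)."""
    expected = [n * p for p in probs]
    share_above_5 = sum(e > 5 for e in expected) / len(expected)
    return all(e > 1 for e in expected) and share_above_5 >= 0.8


# Hypothetical case: n = 60 observations, M = 5 non-equiprobable bins.
probs = [0.4, 0.3, 0.2, 0.05, 0.05]
print(meets_cramer(60, probs), meets_cochran(60, probs))
```

When a rule fails, the usual remedy is to merge adjacent low-probability bins or to redraw the borders so that the bins are closer to equiprobable.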
15 Bin selection
- Relation of the number of bins M to the sample size N (also written n)
  - (Mann and Wald, 42), (Schorr, 74): for large sample sizes
    - $1.88\,n^{2/5} < M < 3.76\,n^{2/5}$
  - (Koehler and Larntz, 80): for small sample sizes
    - $M > 3$, $n > 10$ and $n^2/M > 10$
  - (Roscoe and Byars, 71)
    - Equiprobable bins hypothesis: N > M when α = 0.01 and α = 0.05
    - Non-equiprobable bins: N > 2M (α = 0.05) and N > 4M (α = 0.01)
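The large-sample range 1.88 n^(2/5) < M < 3.76 n^(2/5) and the small-sample conditions can be evaluated directly. The function names are hypothetical, and returning the smallest and largest integer M strictly inside the range is a choice made for this sketch.

```python
import math


def mann_wald_bin_range(n):
    """Smallest and largest integer M with 1.88 * n**0.4 < M < 3.76 * n**0.4."""
    low = 1.88 * n ** 0.4
    high = 3.76 * n ** 0.4
    return math.floor(low) + 1, math.ceil(high) - 1


def koehler_larntz_ok(n, m):
    """Small-sample conditions as quoted: M > 3, n > 10 and n^2 / M > 10."""
    return m > 3 and n > 10 and n * n / m > 10


print(mann_wald_bin_range(1000))
print(koehler_larntz_ok(n=20, m=5))
```

For n = 1000 the bounds evaluate to roughly 29.8 and 59.6, so any M from 30 to 59 falls inside the range.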
16 Bin selection
- Bins vs. sample size according to Mann and Wald
17 Bin selection: continuous vs. discrete
[Figure: two panels. Continuous case (vertical axis from 0.1 to 1.0): equi-probable bins are easy to select. Discrete case (values 1 to 7, bin i marked): it is less straightforward to define equi-probable bins.]
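For a continuous distribution, equi-probable bins follow directly from the inverse CDF: the interior borders are the quantiles at 1/M, 2/M, ..., (M-1)/M. The sketch below uses an exponential distribution with a hypothetical rate purely as an illustration; the function names are assumptions as well.

```python
import math


def equiprobable_edges(inverse_cdf, m):
    """Interior borders that split a continuous distribution into m bins,
    each with probability 1/m under the theoretical distribution."""
    return [inverse_cdf(i / m) for i in range(1, m)]


def exponential_inverse_cdf(rate):
    """Quantile function of an exponential distribution (illustrative choice)."""
    return lambda p: -math.log(1.0 - p) / rate


edges = equiprobable_edges(exponential_inverse_cdf(rate=2.0), m=4)
print([round(e, 3) for e in edges])
```

For a discrete distribution the borders can only fall between support points, so exactly equal bin probabilities are generally not attainable.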
18 References
Textbooks
- D.J. Sheskin, Handbook of parametric and nonparametric statistical procedures
  - Introduction (descriptive vs. inferential statistics, hypothesis testing, concepts and terminology)
  - Test 8 (chap. 8): The Chi-Square Goodness-of-Fit Test (high-level description with examples and discussion of several aspects)
- R. D'Agostino, M. Stephens, Goodness-of-fit techniques
  - Chapter 3: Tests of Chi-square type
  - Reviews the theoretical background and looks more generally at chi2 tests, not only the Pearson test
19 References
Papers
- S. Horn, Goodness-of-Fit tests for discrete data: A review and an Application to a Health Impairment scale
  - Good discussion of the properties and pros/cons of most goodness-of-fit tests for discrete data
  - Accessible, tutorial-like