Title: Additional Tests for Qualitative Data
15.1 Introduction
- Two statistical techniques are presented for analyzing qualitative data:
  - A goodness-of-fit test for the multinomial experiment.
  - A contingency table test of independence.
- Both tests use the $\chi^2$ distribution as the sampling distribution of the test statistic.
15.2 Chi-squared Goodness-of-Fit Test
- This test describes a single population of qualitative data.
- The multinomial experiment studied is an extension of the binomial experiment:
  - There are $n$ independent trials.
  - The outcome of each trial can be classified into one of $k$ categories, called cells.
  - The probability $p_i$ of cell $i$ remains constant for each trial. Moreover, $p_1 + p_2 + \dots + p_k = 1$.
- The hypothesis tested involves the values of the $p_i$.
Example 15.1
- Two competing companies, A and B, have conducted aggressive advertising campaigns.
- Market shares before the campaigns were:
  - Company A: 45%
  - Company B: 40%
  - Other competitors: 15%
- To study the effects of the campaigns on the market shares, 200 customers were asked to indicate their preference regarding the product advertised.
- Survey results:
  - 102 customers preferred company A's product,
  - 82 customers preferred company B's product,
  - 16 customers preferred another competitor's product.
- Solution
  - The population investigated is the brand preferences.
  - The data are qualitative (A, B, or other).
  - This is a multinomial experiment (three categories).
  - The question of interest: are $p_1$, $p_2$, and $p_3$ different after the campaign from their values before the campaign?
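- The hypotheses are
  - H0: $p_1 = .45$, $p_2 = .40$, $p_3 = .15$
  - H1: At least one $p_i$ is not equal to its specified value.
- Test statistic: What sample frequency would you expect for each category if the null hypothesis is true? What actual frequencies did the sample return?
  - Company A: expected $e_1 = 200(.45) = 90$, observed $f_1 = 102$
  - Company B: expected $e_2 = 200(.40) = 80$, observed $f_2 = 82$
  - Other competitors: expected $e_3 = 200(.15) = 30$, observed $f_3 = 16$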
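- The statistic is
  $$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$
- The rejection region is
  $$\chi^2 > \chi^2_{\alpha,\,k-1}$$
- In our example
  $$\chi^2 = \frac{(102-90)^2}{90} + \frac{(82-80)^2}{80} + \frac{(16-30)^2}{30} = 8.18 > \chi^2_{.05,\,2} = 5.99$$
Conclusion: At the 5% significance level there is sufficient evidence to reject the null hypothesis. At least one of the probabilities $p_i$ differs from its specified value. Because the probabilities sum to 1, at least two market shares have changed.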
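The arithmetic of Example 15.1 can be checked with a short Python sketch. It uses only the counts and market shares quoted above; the variable names are illustrative, and the closed-form tail probability it relies on holds only for 2 degrees of freedom.

```python
import math

observed = [102, 82, 16]          # survey counts for A, B, other
null_probs = [0.45, 0.40, 0.15]   # market shares under H0
n = sum(observed)                 # 200 customers

# Expected frequency for each cell if H0 is true: e_i = n * p_i
expected = [n * p for p in null_probs]

# Chi-squared statistic: sum of (f_i - e_i)^2 / e_i over the cells
chi2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))

# With k - 1 = 2 degrees of freedom the upper-tail probability of the
# chi-squared distribution is exp(-x / 2), so no tables are needed here.
alpha = 0.05
critical = -2 * math.log(alpha)   # about 5.99
p_value = math.exp(-chi2 / 2)
reject = chi2 > critical

print(round(chi2, 2), round(critical, 2), round(p_value, 4), reject)
```

For other numbers of cells the critical value would have to come from a chi-squared table or a statistics library.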
Rule of five
- The test statistic used to perform the test is only approximately chi-squared distributed.
- For the approximation to apply, the expected cell frequency has to be at least 5.
- If the expected frequency in a cell is less than 5, combine it with other cells.
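One way to apply the rule of five mechanically is sketched below. The helper `combine_small_cells` and the counts are hypothetical, and the merging policy (fold the smallest cell into the next smallest) is only one possible choice; in practice the cells to combine are usually chosen so that the merged category still makes sense.

```python
def combine_small_cells(observed, expected, minimum=5):
    """Merge the cell with the smallest expected count into the
    next-smallest one until every expected count is >= minimum.
    Cells are returned sorted by expected count, smallest first."""
    cells = sorted(zip(expected, observed))  # (expected, observed) pairs
    while len(cells) > 1 and cells[0][0] < minimum:
        (e1, f1), (e2, f2) = cells[0], cells[1]
        cells = sorted([(e1 + e2, f1 + f2)] + cells[2:])
    observed_out = [f for _, f in cells]
    expected_out = [e for e, _ in cells]
    return observed_out, expected_out

# Hypothetical counts: two cells have expected frequencies below 5.
obs_c, exp_c = combine_small_cells([30, 12, 5, 3], [28.0, 14.0, 4.5, 3.5])
print(obs_c, exp_c)  # [8, 12, 30] [8.0, 14.0, 28.0]
```

Note that combining cells reduces k, and with it the degrees of freedom of the test.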
15.3 Chi-squared Test of a Contingency Table
- This test satisfies two objectives:
  - Are two qualitative variables related?
  - Are there differences among two or more populations of qualitative variables?
- To accomplish the test objectives, we need to classify the data according to two different criteria.
Example 15.2
- In an effort to better predict the demand for courses offered by a certain MBA program, it was hypothesized that students' academic backgrounds affect their choice of MBA major and, thus, their course selection.
- A random sample of last year's MBA students was selected. The following contingency table summarizes the relevant data.
There are two ways to address the problem:
- If each classification is considered a qualitative variable, are these two variables dependent?
- If each undergraduate degree is considered a population, do these populations differ?
- Solution
- The hypotheses are
  - H0: The two variables are independent.
  - H1: The two variables are dependent.
- The test statistic is $\chi^2 = \sum_{i=1}^{k} (f_i - e_i)^2 / e_i$, where $k$ is the number of cells in the contingency table.
- Since $e_i = n p_i$, we need to estimate the unknown probability $p_i$ from the data, assuming H0 is true.
Marginal totals used below: BA 60, BBA 39, Marketing 61, Finance 44; sample size $n = 152$.
- Under the null hypothesis the two variables are independent, so
  $P(\text{Marketing and BA}) = P(\text{Marketing})\,P(\text{BA}) = (61/152)(60/152)$.
- The number of students expected to fall in the cell Marketing-BA is
  $e_{\text{Marketing-BA}} = n\,p_{\text{Marketing-BA}} = 152\,(61/152)(60/152) = 61 \cdot 60/152 = 24.08$.
- The number of students expected to fall in the cell Finance-BBA is
  $e_{\text{Finance-BBA}} = n\,p_{\text{Finance-BBA}} = 152\,(44/152)(39/152) = 44 \cdot 39/152 = 11.29$.
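The two expected frequencies above follow the same pattern, (row total)(column total)/n. A minimal Python sketch, using only the marginal totals quoted in the text (the function name `expected_count` is illustrative):

```python
def expected_count(row_total, col_total, n):
    """Expected cell frequency under independence:
    n * (row_total / n) * (col_total / n) = row_total * col_total / n."""
    return row_total * col_total / n

n = 152  # students in the sample

# Marginal totals quoted in the text.
e_marketing_ba = expected_count(60, 61, n)   # BA total 60, Marketing total 61
e_finance_bba = expected_count(39, 44, n)    # BBA total 39, Finance total 44

print(round(e_marketing_ba, 2), round(e_finance_bba, 2))  # 24.08 11.29
```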
The expected frequency
Observed frequencies, with expected frequencies in parentheses, for three of the cells: 31 (24.08), 7 (6.80), 5 (6.39).
Calculation of the $\chi^2$ statistic
$$\chi^2 = \frac{(31 - 24.08)^2}{24.08} + \dots + \frac{(5 - 6.39)^2}{6.39} + \dots + \frac{(7 - 6.80)^2}{6.80} = 14.70$$
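The full calculation can be sketched in Python for any table of observed counts. The full MBA table is not reproduced here, so the usage example below runs on a hypothetical 2 x 2 table instead; the function name `chi2_independence` is illustrative. The degrees of freedom, (rows - 1)(columns - 1), are the standard value for this test and are not stated on the slides.

```python
def chi2_independence(table):
    """Return (chi2, df, expected) for a contingency table given as a
    list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: (row total)(column total) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (f - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for f, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

# Hypothetical 2 x 2 table, NOT the MBA data from Example 15.2.
chi2_stat, df, expected = chi2_independence([[20, 30], [30, 20]])
print(chi2_stat, df)  # 4.0 1
```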
Excel solution
Select the Chi squared / raw data option from Data Analysis Plus under Tools.
Define a code to specify each qualitative value. Input the data in columns, one column for each category.
Codes:
Undergraduate degree: 1 = BA, 2 = BENG, 3 = BBA, 4 = OTHERS
MBA major: 1 = MARKETING, 2 = FINANCE, 3 = ACCOUNTING
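Rule of five
- The $\chi^2$ distribution provides an adequate approximation to the sampling distribution under the condition that $e_{ij} \ge 5$ for all the cells.
- When $e_{ij} < 5$, rows or columns must be combined such that the condition is met.
Example (observed frequencies, with expected frequencies in parentheses)
Row with a small cell: 4 (5.1), 7 (6.3), 4 (3.6)
Row it is combined with: 14 (12.8), 16 (16), 8 (9.2)
Combined row: 18 (17.9), 23 (22.3), 12 (12.8), since 4 + 14 = 18 and 5.1 + 12.8 = 17.9; 7 + 16 = 23 and 6.3 + 16 = 22.3; 4 + 8 = 12 and 3.6 + 9.2 = 12.8.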
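Reading the example as two rows being added cell by cell, the combination can be reproduced with a few lines of Python. The row labels are assumptions; the numbers are the ones shown in the example.

```python
# Each cell is (observed, expected). The first row contains a cell
# whose expected frequency, 3.6, is below 5.
row_small = [(4, 5.1), (7, 6.3), (4, 3.6)]
row_other = [(14, 12.8), (16, 16.0), (8, 9.2)]

# Combine the rows by adding observed and expected counts cell by cell.
combined = [
    (f1 + f2, round(e1 + e2, 1))
    for (f1, e1), (f2, e2) in zip(row_small, row_other)
]
print(combined)  # [(18, 17.9), (23, 22.3), (12, 12.8)]

# Every expected frequency in the combined row now meets the rule of five.
rule_of_five_ok = all(e >= 5 for _, e in combined)
```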
15.5 Chi-Squared Test for Normality
- The goodness-of-fit chi-squared test can be used to determine whether data were drawn from a specified distribution.
- The multinomial experiment produces the test statistic.
Testing goodness of fit for the normal distribution
- Select values of $z_i$ such that the expected frequency in each interval $(z_i, z_{i+1})$ is at least 5. For example, the interval with $P(z_1 < z < z_2) = p_2$ requires $n p_2 \ge 5$.
- Test the hypotheses H0: $P_1 = p_1, \dots, P_k = p_k$; H1: At least one proportion differs from its specified value.
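The cell probabilities for a set of z cut points can be computed from the standard normal distribution function. The sketch below uses only the Python standard library; the helper names are illustrative, and the cut points -1, 0, 1 with n = 50 are the ones used in the example that follows. The exact probabilities differ slightly in the fourth decimal from rounded table values.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def cell_probabilities(cuts):
    """Probabilities of the cells (-inf, z1), (z1, z2), ..., (zm, +inf)
    for increasing cut points z1 < z2 < ... < zm."""
    cdf = [0.0] + [normal_cdf(z) for z in cuts] + [1.0]
    return [upper - lower for lower, upper in zip(cdf, cdf[1:])]

n = 50
probs = cell_probabilities([-1, 0, 1])
expected = [n * p for p in probs]

# Check the rule of five before using these cells in the test.
cells_ok = all(e >= 5 for e in expected)
print([round(p, 4) for p in probs], cells_ok)
```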
Example: For a sample of size $n = 50$ (see Example 11.1), the sample mean was 460.38 with a standard deviation of 38.83. Can we infer from the data provided that this sample was drawn from a normal distribution with $\mu = 460.38$ and $\sigma = 38.83$? Use a 5% significance level.
Solution: First let us select z values that define each cell (expected frequency at least 5 for each cell).
- $z_1 = -1$: $P(z < -1) = p_1 = .1587$; $e_1 = n p_1 = 50(.1587) = 7.94$
- $z_2 = 0$: $P(-1 < z < 0) = p_2 = .3413$; $e_2 = n p_2 = 50(.3413) = 17.07$
- $z_3 = 1$: $P(0 < z < 1) = p_3 = .3413$; $e_3 = 17.07$
- $P(z > 1) = p_4 = .1587$; $e_4 = 7.94$
The cell boundaries are calculated from the corresponding z values determined above. For example, $z_1 = (x_1 - 460.38)/38.83 = -1$ gives $x_1 = 421.55$. The boundaries are 421.55, 460.38, and 499.21.
The frequencies per cell can now be determined:
- Below 421.55: expected frequency $e_1 = 7.94$, sample frequency $f_1 = 10$
- 421.55 to 460.38: expected frequency $e_2 = 17.07$, sample frequency $f_2 = 13$
- 460.38 to 499.21: expected frequency $e_3 = 17.07$, sample frequency $f_3 = 19$
- Above 499.21: expected frequency $e_4 = 7.94$, sample frequency $f_4 = 8$
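Converting z values to cell boundaries and counting observations per cell can be sketched as follows. The mean and standard deviation are those of the example; the raw observations are not given in the text, so the `sample` list is hypothetical and serves only to show the counting step.

```python
from bisect import bisect_right

mean, sd = 460.38, 38.83
z_cuts = [-1, 0, 1]

# x = mean + z * sd converts each z value to a boundary on the data scale.
boundaries = [round(mean + z * sd, 2) for z in z_cuts]
print(boundaries)  # [421.55, 460.38, 499.21]

def cell_counts(values, boundaries):
    """Count how many values fall in each cell defined by the sorted
    boundaries. A value equal to a boundary goes to the upper cell."""
    counts = [0] * (len(boundaries) + 1)
    for x in values:
        counts[bisect_right(boundaries, x)] += 1
    return counts

# Hypothetical observations, NOT the data of Example 11.1.
sample = [400.0, 430.5, 455.2, 470.1, 480.9, 510.3]
counts = cell_counts(sample, boundaries)
print(counts)  # [1, 2, 2, 1]
```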
$$\chi^2 = \frac{(10 - 7.94)^2}{7.94} + \frac{(13 - 17.07)^2}{17.07} + \frac{(19 - 17.07)^2}{17.07} + \frac{(8 - 7.94)^2}{7.94} = 1.72$$
- Conclusion: There is insufficient evidence to conclude at the 5% significance level that the data are not normally distributed.
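The statistic for the normality example can be verified with the sketch below, which uses the sample and expected frequencies exactly as quoted. The slides do not state the degrees of freedom; the code assumes the usual convention of k - 3 = 1 when the mean and standard deviation are both estimated from the sample, and the tail-probability formula used is valid only for 1 degree of freedom.

```python
import math

observed = [10, 13, 19, 8]              # sample frequencies f1..f4
expected = [7.94, 17.07, 17.07, 7.94]   # expected frequencies e1..e4

chi2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))

# Assumption: df = k - 3 = 1 (two parameters estimated from the data).
df = len(observed) - 3
critical = 3.841                        # chi-squared table value, alpha = .05, df = 1

# For df = 1 the upper-tail probability is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
reject = chi2 > critical

print(round(chi2, 2), df, round(p_value, 2), reject)
```

Under that assumption the statistic falls well below the critical value, which agrees with the conclusion above.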