Additional Tests for Qualitative Data - PowerPoint PPT Presentation


About This Presentation

Title:

Additional Tests for Qualitative Data

Slides: 21

Provided by: zgo

Transcript and Presenter's Notes

Title: Additional Tests for Qualitative Data

1
Additional Tests for Qualitative Data

Chapter 15

2
15.1 Introduction

Two statistical techniques for analyzing qualitative data are presented:
A goodness-of-fit test for the multinomial experiment.
A contingency table test of independence.
Both tests use the $\chi^2$ distribution as the sampling distribution of the test statistic.

3
15.2 Chi-squared Goodness-of-Fit Test

This test describes a single population of
qualitative data.
The multinomial experiment studied is an
extension of the binomial experiment.
There are n independent trials.
The outcome of each trial can be classified into
one of k categories, called cells.
The probability $p_i$ of cell $i$ remains constant for
each trial. Moreover, $p_1 + p_2 + \dots + p_k = 1$.
The hypothesis tested involves the values of $p_i$.

4
Example 15.1

Two competing companies A and B have conducted
aggressive advertising campaigns.
Market shares before the campaigns were:
Company A: 45%
Company B: 40%
Other competitors: 15%
To study the effects of the campaigns on the
market shares, 200 customers were asked to
indicate their preference regarding the product
advertised.

Survey results
102 customers preferred company A's product,
82 customers preferred company B's product,
16 customers preferred the other competitors'
products.
Solution
The population investigated is the brand
preferences.
The data are qualitative (A, B, or other)
This is a multinomial experiment (three
categories).
The question of interest: Are $p_1$, $p_2$, and $p_3$
different after the campaign from their values
before the campaign?

The hypotheses are
$H_0$: $p_1 = .45$, $p_2 = .40$, $p_3 = .15$
$H_1$: At least one $p_i$ is not equal to its specified
value

Test statistic: What sample frequency would you
expect for each category if the null hypothesis
is true?
What actual frequencies did the sample return?

Category             Expected frequency   Actual frequency
Company A            200(.45) = 90        102
Company B            200(.40) = 80        82
Other competitors    200(.15) = 30        16
7

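The statistic is
$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$
where $f_i$ is the actual frequency and $e_i$ the expected frequency of cell $i$.
The rejection region is
$\chi^2 > \chi^2_{\alpha, k-1}$
In our example
$\chi^2 = \frac{(102 - 90)^2}{90} + \frac{(82 - 80)^2}{80} + \frac{(16 - 30)^2}{30} = 8.18$, which exceeds $\chi^2_{.05, 2} = 5.99$.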

Conclusion: At the 5% significance level there is
sufficient evidence to reject the null
hypothesis. At least one of the probabilities $p_i$
differs from its specified value. Because the shares must sum to 1,
at least two market shares have changed.
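
The calculation in Example 15.1 can be reproduced with a short Python sketch. The function name `chi_squared_statistic` is chosen here for illustration and does not come from the slides. The critical value and p-value rely on the closed form P(chi-squared > x) = exp(-x/2), which holds only for 2 degrees of freedom; other cell counts would need a statistics library or a table.

```python
import math

def chi_squared_statistic(observed, expected):
    """Sum of (f_i - e_i)^2 / e_i over all cells."""
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))

n = 200
null_probabilities = [0.45, 0.40, 0.15]   # H0 market shares: A, B, others
observed = [102, 82, 16]                  # survey results

# Expected frequency of each cell if H0 is true: e_i = n * p_i
expected = [n * p for p in null_probabilities]
statistic = chi_squared_statistic(observed, expected)

# k = 3 cells give k - 1 = 2 degrees of freedom.
# Valid for 2 degrees of freedom only: P(chi2 > x) = exp(-x / 2).
alpha = 0.05
critical_value = -2 * math.log(alpha)
p_value = math.exp(-statistic / 2)

print(f"chi-squared = {statistic:.2f}, critical value = {critical_value:.2f}")
print("reject H0" if statistic > critical_value else "do not reject H0")
```

Most of the statistic comes from the "other competitors" cell, where 16 customers were observed against 30 expected.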
8
Rule of five

The test statistic used to perform the test is
only approximately Chi-squared distributed.
For the approximation to apply, the expected cell
frequency has to be at least 5.
If the expected frequency in a cell is less than
5, combine it with other cells.
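
One possible way to apply the rule is to merge any cell whose expected frequency is below 5 into a neighbouring cell. The sketch below is an illustration only: the function name `combine_small_cells`, the left-to-right merging strategy, and the frequencies in the usage line are assumptions, not taken from the slides. In practice the cells to combine should be chosen so that the merged category still has a sensible interpretation.

```python
def combine_small_cells(observed, expected, minimum=5.0):
    """Merge cells left to right until every expected frequency >= minimum.

    A cell that is too small is carried forward and added to the next
    cell. Anything still pending at the end is folded into the last
    cell that was kept.
    """
    merged_observed, merged_expected = [], []
    pending_observed, pending_expected = 0, 0.0
    for f, e in zip(observed, expected):
        pending_observed += f
        pending_expected += e
        if pending_expected >= minimum:
            merged_observed.append(pending_observed)
            merged_expected.append(pending_expected)
            pending_observed, pending_expected = 0, 0.0
    if pending_expected > 0:
        if merged_expected:
            merged_observed[-1] += pending_observed
            merged_expected[-1] += pending_expected
        else:
            # The whole table is too small; return it as a single cell.
            merged_observed.append(pending_observed)
            merged_expected.append(pending_expected)
    return merged_observed, merged_expected

# Hypothetical frequencies: the second and third cells are merged.
print(combine_small_cells([14, 2, 3, 11], [12.0, 3.0, 2.0, 13.0]))
```

Each merge removes one cell, so the degrees of freedom of the test drop accordingly.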

9
15.3 Chi-squared Test of a Contingency Table

This test satisfies two objectives:
Are two qualitative variables related?
Are there differences among two or more
populations of qualitative variables?
To accomplish the test objectives, we need to
classify the data according to two different
criteria.

10
Example 15.2

In an effort to better predict the demand for
courses offered by a certain MBA program, it was
hypothesized that students' academic background
affects their choice of MBA major and, thus, their
course selection.
A random sample of last year's MBA students was
selected. The following contingency table
summarizes the relevant data.

11
There are two ways to address the problem:
If each classification is considered a
qualitative variable, are these two variables
dependent?
If each undergraduate degree is considered a
population, do these populations differ?
12

Solution
The hypotheses are
$H_0$: The two variables are independent
$H_1$: The two variables are dependent

The test statistic is
$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$
where k is the number of cells in the contingency
table.
Since $e_i = n p_i$, we need to estimate the unknown
probability $p_i$ from the data, assuming $H_0$ is true.
13
[Contingency table with marginal totals: BA = 60, BBA = 39, Marketing = 61, Finance = 44, sample size n = 152]

Under the null hypothesis the two variables are
independent, so
$P(\text{Marketing and BA}) = P(\text{Marketing}) \cdot P(\text{BA}) = (61/152)(60/152)$.
The number of students expected to fall in the
cell Marketing - BA is
$e_{\text{Marketing-BA}} = n p_{\text{Marketing-BA}} = 152(61/152)(60/152) = 61 \cdot 60/152 = 24.08$
The number of students expected to fall in the
cell Finance - BBA is
$e_{\text{Finance-BBA}} = n p_{\text{Finance-BBA}} = 152(44/152)(39/152) = 44 \cdot 39/152 = 11.29$
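
The two expected frequencies above both reduce to (row total × column total) / n. A minimal Python sketch using the marginal totals quoted in the text; the function name `expected_frequency` is hypothetical.

```python
def expected_frequency(row_total, column_total, n):
    """Expected cell count under independence: n * (row/n) * (column/n)."""
    return row_total * column_total / n

n = 152
e_marketing_ba = expected_frequency(61, 60, n)   # Marketing total 61, BA total 60
e_finance_bba = expected_frequency(44, 39, n)    # Finance total 44, BBA total 39

print(f"{e_marketing_ba:.2f}  {e_finance_bba:.2f}")
```

The sample size cancels once, which is why only the two marginal totals and n are needed for each cell.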
14
(No Transcript)
15
[Contingency table showing, in each cell, the observed frequency followed by the expected frequency, for example 31 (24.08), 7 (6.80), 5 (6.39)]
Calculation of the $\chi^2$ statistic
$\chi^2 = \frac{(31 - 24.08)^2}{24.08} + \dots + \frac{(5 - 6.39)^2}{6.39} + \frac{(7 - 6.80)^2}{6.80} = 14.70$
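
The whole calculation for a contingency table can be sketched in Python as follows. The full table of Example 15.2 is not reproduced in this transcript, so the counts below are made up purely to exercise the function; the name `contingency_chi_squared` is also hypothetical. The degrees of freedom use the standard rule (rows - 1)(columns - 1), which the transcript does not state explicitly.

```python
def contingency_chi_squared(table):
    """Return (chi-squared statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            # Expected frequency of cell (i, j) if the variables are independent
            e = row_totals[i] * column_totals[j] / n
            statistic += (f - e) ** 2 / e

    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom

# Made-up counts, not the data of Example 15.2.
illustrative_table = [[10, 20],
                      [30, 40]]
statistic, df = contingency_chi_squared(illustrative_table)
print(f"chi-squared = {statistic:.4f}, degrees of freedom = {df}")
```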
16
Excel solution
Select the Chi squared / raw data option from
Data Analysis Plus under Tools.
Define a code to specify each qualitative
value. Input the data in columns, one column for
each category.
Codes
Undergraduate degree: 1 = BA, 2 = BENG, 3 = BBA, 4 = OTHERS
MBA major: 1 = MARKETING, 2 = FINANCE, 3 = ACCOUNTING
17
Rule of five

The $\chi^2$ distribution provides an adequate
approximation to the sampling distribution under
the condition that $e_{ij} \ge 5$ for all the cells.
When $e_{ij} < 5$, rows or columns must be added together so
that the condition is met.

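Example (observed frequency, with expected frequency in parentheses)
Row with a small cell:   4 (5.1)     7 (6.3)     4 (3.6)
Row added to it:         14 (12.8)   16 (16)     8 (9.2)
Combined row:            18 (17.9)   23 (22.3)   12 (12.8)
Observed: 4 + 14 = 18, 7 + 16 = 23, 4 + 8 = 12.
Expected: 5.1 + 12.8 = 17.9, 6.3 + 16 = 22.3, 3.6 + 9.2 = 12.8.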
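
Reading the example as two rows that are added cell by cell (4 + 14 = 18 observed, 5.1 + 12.8 = 17.9 expected, and so on), the combination step can be sketched in Python. The helper name `add_rows` is hypothetical, and the split into a "small" row and an "other" row is an interpretation of the slide.

```python
def add_rows(first, second):
    """Combine two rows of a table by adding them cell by cell."""
    return [a + b for a, b in zip(first, second)]

# Row containing a cell with expected frequency below 5 (3.6)
observed_small, expected_small = [4, 7, 4], [5.1, 6.3, 3.6]
# Row it is combined with
observed_other, expected_other = [14, 16, 8], [12.8, 16.0, 9.2]

combined_observed = add_rows(observed_small, observed_other)
combined_expected = add_rows(expected_small, expected_other)

# Every expected frequency in the combined row satisfies the rule of five.
rule_satisfied = all(e >= 5 for e in combined_expected)
print(combined_observed, [round(e, 1) for e in combined_expected], rule_satisfied)
```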
18
15.5 Chi-Squared test for Normality

The goodness-of-fit Chi-squared test can be used
to determine whether data were drawn from any
specified distribution.
The multinomial experiment produces the test
statistic.

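Testing goodness of fit for the normal
distribution
Divide the z axis into intervals; the probability of each interval under the
standard normal distribution is the cell probability.
For example, $P(z_1 < z < z_2) = p_2$.
Select values of $z_i$ such that the expected
frequency in each interval $(z_i, z_{i+1})$ is at
least 5, for example $n p_2 \ge 5$.
Test the hypotheses
$H_0$: $P_1 = p_1, \dots, P_k = p_k$
$H_1$: At least one proportion differs from its
specified value.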
19
Example: For a sample of size n = 50 (see Example
11.1), the sample mean was 460.38 with a standard
deviation of 38.83. Can we infer from the data
provided that this sample was drawn from a normal
distribution with $\mu = 460.38$ and $\sigma = 38.83$? Use
a 5% significance level.
Solution: First let us select z values that define
each cell (expected frequency $\ge 5$ for each
cell).
$z_1 = -1$: $P(z < -1) = p_1 = .1587$, $e_1 = n p_1 = 50(.1587) = 7.94$
$z_2 = 0$: $P(-1 < z < 0) = p_2 = .3413$, $e_2 = n p_2 = 50(.3413) = 17.07$
$z_3 = 1$: $P(0 < z < 1) = p_3 = .3413$, $e_3 = 17.07$
$P(z > 1) = p_4 = .1587$, $e_4 = 7.94$
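
The cell probabilities and expected frequencies can be checked with the standard library alone. This is a sketch, with `standard_normal_cdf` as a helper name introduced here. It uses exact normal probabilities rather than four-decimal table values, so the expected frequencies can differ from the slide's figures in the second decimal place (for example about 7.93 rather than 7.94).

```python
import math

def standard_normal_cdf(z):
    """P(Z < z) for a standard normal variable, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n = 50
z_boundaries = [-1.0, 0.0, 1.0]

# Cumulative probability at minus infinity, at each boundary, and at plus infinity
cumulative = [0.0] + [standard_normal_cdf(z) for z in z_boundaries] + [1.0]

# Cell probability = difference between consecutive cumulative probabilities
probabilities = [upper - lower for lower, upper in zip(cumulative, cumulative[1:])]
expected = [n * p for p in probabilities]

for i, (p, e) in enumerate(zip(probabilities, expected), start=1):
    print(f"cell {i}: p = {p:.4f}, e = {e:.2f}")
```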
The cell boundaries are calculated from the
corresponding z values determined above:
$z_1 = (x_1 - 460.38)/38.83 = -1$, so $x_1 = 421.55$. Likewise $x_2 = 460.38$ and $x_3 = 499.21$.
The frequencies per cell can now be determined:

Cell   Interval              Expected frequency   Sample frequency
1      below 421.55          e1 = 7.94            f1 = 10
2      421.55 to 460.38      e2 = 17.07           f2 = 13
3      460.38 to 499.21      e3 = 17.07           f3 = 19
4      above 499.21          e4 = 7.94            f4 = 8
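
The transcript stops before the test statistic for this example is computed. The sketch below carries the arithmetic one step further using the expected and sample frequencies given above. It assumes the usual k - 3 = 1 degree of freedom for a normality test in which both the mean and the standard deviation are estimated from the sample; that rule is not stated in the transcript. The p-value uses the identity P(chi-squared > x) = erfc(sqrt(x/2)), which holds only for 1 degree of freedom.

```python
import math

mean, std_dev = 460.38, 38.83
z_boundaries = [-1.0, 0.0, 1.0]

# Cell boundaries on the original scale: x = mean + z * std_dev
boundaries = [mean + z * std_dev for z in z_boundaries]

sample = [10, 13, 19, 8]                 # f1..f4 from the slide
expected = [7.94, 17.07, 17.07, 7.94]    # e1..e4 from the slide

statistic = sum((f - e) ** 2 / e for f, e in zip(sample, expected))

# Assumption: k - 3 = 1 degree of freedom (4 cells, 2 estimated parameters).
# Valid for 1 degree of freedom only: P(chi2 > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(statistic / 2))

print([round(x, 2) for x in boundaries])
print(f"chi-squared = {statistic:.2f}, p-value = {p_value:.2f}")
```

Under these assumptions the p-value is well above .05, so the data would give no reason to reject normality at the 5% significance level.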
20