Additional Tests for Qualitative Data - PowerPoint PPT Presentation

About This Presentation
Title:

Additional Tests for Qualitative Data

Description:

The multinomial experiment studied is an extension of the binomial experiment. ... This is a multinomial experiment (three categories) ... – PowerPoint PPT presentation

Slides: 21
Provided by: zgo

Transcript and Presenter's Notes


1
Additional Tests for Qualitative Data
  • Chapter 15

2
15.1 Introduction
  • Two statistical techniques for analyzing
    qualitative data are presented:
  • A goodness-of-fit test for the multinomial
    experiment.
  • A contingency table test of independence.
  • Both tests use the $\chi^2$ distribution as the
    sampling distribution of the test statistic.

3
15.2 Chi-squared Goodness-of-Fit Test
  • This test describes a single population of
    qualitative data.
  • The multinomial experiment studied is an
    extension of the binomial experiment.
  • There are n independent trials.
  • The outcome of each trial can be classified into
    one of k categories, called cells.
  • The probability $p_i$ of cell i remains constant
    for each trial. Moreover, $p_1 + p_2 + \dots + p_k = 1$.
  • The hypothesis tested involves the values of $p_i$.

4
Example 15.1
  • Two competing companies A and B have conducted
    aggressive advertising campaigns.
  • Market shares before the campaigns were:
  • Company A: 45%
  • Company B: 40%
  • Other competitors: 15%
  • To study the effects of the campaigns on the
    market shares, 200 customers were asked to
    indicate their preference regarding the product
    advertised.

5
  • Survey results:
  • 102 customers preferred Company A's product,
  • 82 customers preferred Company B's product,
  • 16 customers preferred the other competitors'
    products.
  • Solution
  • The population investigated is the brand
    preferences.
  • The data are qualitative (A, B, or other)
  • This is a multinomial experiment (three
    categories).
  • The question of interest: Are $p_1$, $p_2$, and $p_3$
    different after the campaign from their values
    before the campaign?

6
  • The hypotheses are:
  • H0: $p_1 = .45$, $p_2 = .40$, $p_3 = .15$
  • H1: At least one $p_i$ is not equal to its
    specified value.

Test statistic: What sample frequency would you
expect for each category if the null hypothesis
is true? What actual frequencies did the sample
return?

Category   Expected frequency $e_i$   Observed frequency $f_i$
A          90 = 200(.45)              102
B          80 = 200(.40)              82
Other      30 = 200(.15)              16
7
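  • The statistic is
    $$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$
    where $f_i$ and $e_i$ are the observed and expected
    frequencies of cell i.
  • The rejection region is
    $$\chi^2 > \chi^2_{\alpha, k-1}$$
  • In our example, with $\alpha = .05$ and $k = 3$,
    $$\chi^2 = \frac{(102 - 90)^2}{90} + \frac{(82 - 80)^2}{80} + \frac{(16 - 30)^2}{30} = 8.18 > \chi^2_{.05, 2} = 5.99$$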

Conclusion: At the 5% significance level there is
sufficient evidence to reject the null
hypothesis. At least one of the probabilities $p_i$
differs from its specified value. Because the
shares sum to 1, at least two market shares
have changed.
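The arithmetic of Example 15.1 can be sketched in Python. This is a minimal illustration using only the standard library, not the presentation's own method; the function name `goodness_of_fit` is hypothetical, and the closed-form tail probability `exp(-x / 2)` applies only because this example has k - 1 = 2 degrees of freedom.

```python
import math

def goodness_of_fit(observed, probabilities):
    """Return (statistic, expected) for a chi-squared goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in probabilities]
    statistic = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    return statistic, expected

observed = [102, 82, 16]          # survey results for A, B, other
h0_shares = [0.45, 0.40, 0.15]    # market shares specified by H0

statistic, expected = goodness_of_fit(observed, h0_shares)

# With 2 degrees of freedom the chi-squared upper-tail probability
# has the closed form exp(-x / 2); this shortcut is specific to 2 d.f.
p_value = math.exp(-statistic / 2)
critical_value = -2 * math.log(0.05)   # upper 5% point for 2 d.f.

print(round(statistic, 2), round(p_value, 4), statistic > critical_value)
```

The statistic comes out at about 8.18, above the 5% critical value of about 5.99, which agrees with the conclusion on the slide.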
8
Rule of five
  • The test statistic used to perform the test is
    only approximately Chi-squared distributed.
  • For the approximation to apply, the expected cell
    frequency has to be at least 5.
  • If the expected frequency in a cell is less than
    5, combine it with other cells.

9
15.3 Chi-squared Test of a Contingency Table
  • This test satisfies two objectives:
  • Are two qualitative variables related?
  • Are there differences among two or more
    populations of qualitative variables?
  • To accomplish the test objectives, we need to
    classify the data according to two different
    criteria.

10
Example 15.2
  • In an effort to better predict the demand for
    courses offered by a certain MBA program, it was
    hypothesized that students' academic backgrounds
    affect their choice of MBA major and, thus, their
    course selection.
  • A random sample of last year's MBA students was
    selected. The following contingency table
    summarizes the relevant data.

11
There are two ways to address the problem:
  • If each classification is considered a
    qualitative variable, are these two variables
    dependent?
  • If each undergraduate degree is considered a
    population, do these populations differ?
12
  • Solution
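  • The hypotheses are:
  • H0: The two variables are independent
  • H1: The two variables are dependent
  • The test statistic is
    $$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

where $f_i$ and $e_i$ are the observed and expected
frequencies of cell i, and k is the number of cells
in the contingency table.
Since $e_i = n p_i$, we need to estimate the unknown
probabilities $p_i$ from the data, assuming H0 is true.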
13
Marginal totals from the contingency table: BA 60,
BBA 39; Marketing 61, Finance 44; sample size n = 152.
  • Under the null hypothesis the two variables are
    independent, so
  • P(Marketing and BA) = P(Marketing) P(BA)
    = (61/152)(60/152).

The number of students expected to fall in the
cell Marketing-BA is
$$e_{\text{Marketing-BA}} = n\,p_{\text{Marketing-BA}} = 152 \cdot \frac{61}{152} \cdot \frac{60}{152} = \frac{61 \cdot 60}{152} = 24.08$$
The number of students expected to fall in the
cell Finance-BBA is
$$e_{\text{Finance-BBA}} = n\,p_{\text{Finance-BBA}} = 152 \cdot \frac{44}{152} \cdot \frac{39}{152} = \frac{44 \cdot 39}{152} = 11.29$$
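The expected-frequency rule above reduces to (row total × column total) / n. A minimal Python sketch, using only the marginal totals quoted in the calculation; the function name `expected_frequency` is hypothetical.

```python
def expected_frequency(row_total, column_total, n):
    """Expected cell count under independence: n * (row/n) * (col/n)."""
    return row_total * column_total / n

n = 152  # sample size

# Marketing total 61 and BA total 60
e_marketing_ba = expected_frequency(61, 60, n)
# Finance total 44 and BBA total 39
e_finance_bba = expected_frequency(44, 39, n)

print(round(e_marketing_ba, 2), round(e_finance_bba, 2))
```

The same function applies to every cell of the table once the row and column totals are known.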
15
The expected frequency of each cell is shown next
to its observed frequency, for example 31 (24.08),
7 (6.80), and 5 (6.39).
Calculation of the $\chi^2$ statistic:
$$\chi^2 = \frac{(31 - 24.08)^2}{24.08} + \dots + \frac{(5 - 6.39)^2}{6.39} + \dots + \frac{(7 - 6.80)^2}{6.80} = 14.70$$
16
Excel solution
Select the Chi squared / raw data option from
Data Analysis Plus under Tools.
Define a code to specify each qualitative
value. Input the data in columns, one column for
each category.

Code   Undergraduate degree   MBA major
1      BA                     MARKETING
2      BENG                   FINANCE
3      BBA                    ACCOUNTING
4      OTHERS
17
Rule of five
  • The $\chi^2$ distribution provides an adequate
    approximation to the sampling distribution under
    the condition that $e_{ij} \ge 5$ for all the cells.
  • When $e_{ij} < 5$, rows or columns must be combined
    so that the condition is met.

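Example (observed frequency, expected frequency in
parentheses)
Before combining:
4 (5.1)     7 (6.3)     4 (3.6)
14 (12.8)   16 (16)     8 (9.2)
After combining the two categories:
18 (17.9)   23 (22.3)   12 (12.8)
since 4 + 14 = 18 and 5.1 + 12.8 = 17.9; 7 + 16 = 23
and 6.3 + 16 = 22.3; 4 + 8 = 12 and 3.6 + 9.2 = 12.8.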
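The example combines two categories because one expected frequency (3.6) is below 5. A minimal sketch of that step, assuming the two original categories are 4 (5.1), 7 (6.3), 4 (3.6) and 14 (12.8), 16 (16), 8 (9.2); the helper name `combine` is hypothetical.

```python
def combine(cells_a, cells_b):
    """Add two rows (or columns) of frequencies cell by cell."""
    # Rounding to one decimal matches the precision of the example
    # and hides floating-point noise such as 17.900000000000002.
    return [round(a + b, 1) for a, b in zip(cells_a, cells_b)]

observed_1, expected_1 = [4, 7, 4], [5.1, 6.3, 3.6]
observed_2, expected_2 = [14, 16, 8], [12.8, 16.0, 9.2]

observed = combine(observed_1, observed_2)
expected = combine(expected_1, expected_2)

# The rule of five requires every expected frequency to be at least 5.
rule_of_five_ok = all(e >= 5 for e in expected)
print(observed, expected, rule_of_five_ok)
```

After combining, every expected frequency is at least 5, so the chi-squared approximation can be applied to the reduced table.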
18
15.5 Chi-Squared test for Normality
  • The goodness-of-fit Chi-squared test can be used
    to determine whether data were drawn from any
    specified distribution.
  • The multinomial experiment produces the test
    statistic.

Testing goodness of fit for the normal
distribution
For example, $P(z_1 < z < z_2) = p_2$.
Select values of $z_i$ such that the expected
frequency in each interval $(z_i, z_{i+1})$ is at
least 5, for example $n p_2 \ge 5$.
Test the hypotheses
H0: $P_1 = p_1, \dots, P_k = p_k$
H1: At least one proportion differs from its
specified value.
19
Example: For a sample of size n = 50 (see Example
11.1), the sample mean was 460.38 with a standard
deviation of 38.83. Can we infer from the data
provided that this sample was drawn from a normal
distribution with $\mu = 460.38$ and $\sigma = 38.83$? Use
a 5% significance level.
Solution: First let us select z values that define
each cell (expected frequency at least 5 for each
cell).
$z_1 = -1$: $P(z < -1) = p_1 = .1587$, $e_1 = n p_1 = 50(.1587) = 7.94$
$z_2 = 0$: $P(-1 < z < 0) = p_2 = .3413$, $e_2 = n p_2 = 50(.3413) = 17.07$
$z_3 = 1$: $P(0 < z < 1) = p_3 = .3413$, $e_3 = 17.07$
$P(z > 1) = p_4 = .1587$, $e_4 = 7.94$
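The cell probabilities and expected frequencies can be reproduced without a z table. The sketch below builds the standard normal cumulative probability from `math.erf`; the helper name `normal_cdf` is hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z < z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n = 50
z_values = [-1.0, 0.0, 1.0]   # cell boundaries on the z scale

# Cumulative probabilities at -infinity, z1, z2, z3, +infinity
cumulative = [0.0] + [normal_cdf(z) for z in z_values] + [1.0]
probabilities = [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]
expected = [n * p for p in probabilities]

print([round(p, 4) for p in probabilities])
print([round(e, 2) for e in expected])
```

Because the probabilities are not rounded to four decimals first, the two outer expected frequencies come out as 7.93 rather than the 7.94 obtained from the table value .1587. The difference does not affect the rule-of-five check.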
The cell boundaries are calculated from the
corresponding z values determined above, for example
$z_1 = (x_1 - 460.38)/38.83 = -1$, so $x_1 = 421.55$.
The boundaries are 421.55, 460.38, and 499.21.
The frequencies per cell can now be determined:

Cell                  Expected frequency   Sample frequency
x < 421.55            $e_1 = 7.94$         $f_1 = 10$
421.55 < x < 460.38   $e_2 = 17.07$        $f_2 = 13$
460.38 < x < 499.21   $e_3 = 17.07$        $f_3 = 19$
x > 499.21            $e_4 = 7.94$         $f_4 = 8$
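Converting the z values to cell boundaries and tallying observations can be sketched as follows. The helper `cell_counts` is hypothetical, and the raw sample from Example 11.1 is not reproduced here, so only the boundaries are computed.

```python
from bisect import bisect_right

mean, std_dev = 460.38, 38.83
z_values = [-1.0, 0.0, 1.0]

# x = mean + z * std_dev inverts z = (x - mean) / std_dev
boundaries = [round(mean + z * std_dev, 2) for z in z_values]

def cell_counts(sample, boundaries):
    """Count how many observations fall in each of len(boundaries) + 1 cells."""
    counts = [0] * (len(boundaries) + 1)
    for x in sample:
        # An observation equal to a boundary is placed in the upper cell.
        counts[bisect_right(boundaries, x)] += 1
    return counts

print(boundaries)
```

Calling `cell_counts` on the 50 observations would yield the four sample frequencies used in the test statistic.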
20
  • The test statistic
    $$\chi^2 = \frac{(10 - 7.94)^2}{7.94} + \frac{(13 - 17.07)^2}{17.07} + \frac{(19 - 17.07)^2}{17.07} + \frac{(8 - 7.94)^2}{7.94} = 1.72$$
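  • The rejection region: $\chi^2 > \chi^2_{\alpha, k-3} = \chi^2_{.05, 1} = 3.84$,
    where two additional degrees of freedom are lost
    because the mean and standard deviation were
    estimated from the sample.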
  • Conclusion: There is insufficient evidence to
    conclude at the 5% significance level that the
    data are not normally distributed.
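The normality test can be checked numerically with the frequencies from this example. The sketch assumes 1 degree of freedom (k - 3 with k = 4 cells, treating the mean and standard deviation as estimated from the sample); that choice is an assumption here, and the closed form `erfc(sqrt(x / 2))` for the tail probability holds only for 1 degree of freedom.

```python
import math

observed = [10, 13, 19, 8]                # sample frequencies f1..f4
expected = [7.94, 17.07, 17.07, 7.94]     # expected frequencies e1..e4

statistic = sum((f - e) ** 2 / e for f, e in zip(observed, expected))

# For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
# Using 1 d.f. is an assumption (k - 3 with k = 4 cells).
p_value = math.erfc(math.sqrt(statistic / 2))

reject_h0 = p_value < 0.05
print(round(statistic, 2), reject_h0)
```

The statistic is about 1.72 and the p-value is well above 0.05, so the null hypothesis of normality is not rejected, in line with the conclusion above.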