Title: Categorical Data Analysis
1Chapter 13
- Categorical Data Analysis
2Learning Objectives
- 1. Explain χ² Test for Proportions
- 2. Explain χ² Test of Independence
- 3. Solve Hypothesis Testing Problems
- Two or More Population Proportions
- Independence
3Data Types
4Qualitative Data
- 1. Qualitative Random Variables Yield Responses That Classify
- Example: Gender (Male, Female)
- 2. Measurement Reflects Number in Category
- 3. Examples
- Do You Own Savings Bonds?
- Do You Live On-Campus or Off-Campus?
5Hypothesis Tests Qualitative Data
8Chi-Square (χ²) Test for k Proportions
- 1. Tests Hypothesis About Proportions Only
- Example: p1 = .2, p2 = .3, p3 = .5
- 2. One Variable With Several Levels
- 3. Assumptions
- Multinomial Experiment
- Large Sample Size
- All Expected Counts ≥ 5
- 4. Uses One-Way Contingency Table
9Multinomial Experiment
- 1. n Identical Trials
- 2. k Outcomes to Each Trial
- 3. Constant Outcome Probability, p_i
- 4. Independent Trials
- 5. Random Variable is Count, n_i
- 6. Example: Ask 100 People (n) Which of 3 Candidates (k) They Will Vote For
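The properties listed above can be illustrated with a small simulation. This is a minimal sketch in Python, not part of the original slides; the candidate labels and their support probabilities are hypothetical values chosen only for illustration.

```python
import random
from collections import Counter

def multinomial_counts(n, probs, rng):
    """Run n independent, identical trials and return the count n_i per outcome."""
    outcomes = list(probs)
    weights = [probs[o] for o in outcomes]
    # Each draw is one trial; the outcome probabilities p_i stay constant.
    draws = rng.choices(outcomes, weights=weights, k=n)
    tally = Counter(draws)
    return {o: tally[o] for o in outcomes}

# Hypothetical support for k = 3 candidates (assumed, not from the slides)
probs = {"A": 0.5, "B": 0.3, "C": 0.2}
counts = multinomial_counts(100, probs, random.Random(13))
print(counts)
print(sum(counts.values()))  # the counts always add up to n
```

Because the counts must add up to n, a larger count for one outcome leaves fewer responses for the others.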
11One-Way Contingency Table
- 1. Shows Observations in k Independent Groups (Outcomes or Variable Levels)
[Table: number of responses for each outcome (k = 3)]
12Generating in Stata
. tab displaymode

displaymode |      Freq.     Percent        Cum.
------------+-----------------------------------
     archiv |          1        0.00        0.00
       flat |      5,425        3.14        3.14
     nested |     28,625       16.59       19.73
     nocomm |        366        0.21       19.94
     thread |    138,164       80.06      100.00
------------+-----------------------------------
      Total |    172,581      100.00
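The Percent and Cum. columns of the one-way table can be recomputed from the frequencies alone. The sketch below is an illustration in Python (not Stata), using the counts from the output above; the variable names are assumptions.

```python
# Frequencies copied from the Stata one-way table
freq = {"archiv": 1, "flat": 5425, "nested": 28625, "nocomm": 366, "thread": 138164}

total = sum(freq.values())
rows = []
cum = 0.0
for level, count in freq.items():
    pct = 100 * count / total      # share of all observations
    cum += pct                     # running total of the shares
    rows.append((level, count, round(pct, 2), round(cum, 2)))

for level, count, pct, cum_pct in rows:
    print(f"{level:>8} {count:>9,} {pct:>8.2f} {cum_pct:>8.2f}")
print(f"{'Total':>8} {total:>9,} {100:>8.2f}")
```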
14Why not ANOVA for this?
- Comparing multiple means, aren't we?
- Yes, but
- Outcomes are dependent
- If higher count for outcome 1, lower for outcome 2
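18χ² Test for k Proportions: Hypotheses & Statistic
- 1. Hypotheses
- H0: p1 = p1,0, p2 = p2,0, ..., pk = pk,0
- Ha: Not all p_i equal their hypothesized values
- 2. Test Statistic
$$\chi^2 = \sum_{i=1}^{k} \frac{[n_i - E(n_i)]^2}{E(n_i)}, \qquad E(n_i) = n\,p_{i,0}$$
where n_i is the observed count, E(n_i) is the expected count, and p_{i,0} is the hypothesized probability
- 3. Degrees of Freedom: k - 1, where k is the number of outcomes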
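The statistic is a sum over the k outcomes of the squared gap between observed and expected count, scaled by the expected count. A minimal Python sketch follows; the function name and the observed counts (100 voters choosing among 3 candidates) are hypothetical and only serve to show the arithmetic.

```python
def chi_square_statistic(observed, hypothesized_probs):
    """Sum of (observed - expected)^2 / expected over the k outcomes."""
    n = sum(observed)
    expected = [n * p for p in hypothesized_probs]   # E(n_i) = n * p_i0
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [40, 35, 25]            # hypothetical counts, n = 100
null_probs = [1 / 3, 1 / 3, 1 / 3] # H0: all three candidates equally popular
stat = chi_square_statistic(observed, null_probs)
df = len(observed) - 1             # degrees of freedom, k - 1
print(f"chi-square = {stat:.2f} on {df} degrees of freedom")
```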
19χ² Test: Basic Idea
- 1. Compares Observed Count to Expected Count If Null Hypothesis Is True
- 2. Closer Observed Count to Expected Count, the More Likely the H0 Is True
- Measured by Squared Difference Relative to Expected Count
- Reject H0 for Large Values
20Sampling Distribution for χ² Statistic
- Run the experiment thousands of times
- Each time draw a sample of size n and get counts for each of the k outcomes
- Each time compute a single value, the χ² statistic
- χ² distribution gives the frequency with which you'd get different values for that statistic
- The χ² distribution is different for different degrees of freedom; depends only on k
- Actually, χ² distribution is only an approximation of the true distribution for the χ² statistic you'd get
- Better approximation as n gets large
- Then compute p-values or do hypothesis tests
- Rule of thumb: only conduct tests based on this sampling distribution when expected count for each of the possible outcomes is ≥ 5
- Confidence intervals don't really make sense here, as there's no meaningful point estimate that we're trying to draw an interval around
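The thought experiment above can be run directly. The following Python sketch is an illustration under assumed settings (n = 180, k = 3 equally likely outcomes, 5,000 repetitions, a fixed seed); none of these values come from the slides. It uses the fact that for 2 degrees of freedom the upper-tail area of the χ² distribution is exp(-x/2), so the .05 critical value is -2 ln(.05).

```python
import math
import random

def chi_square_statistic(counts, probs):
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))

def simulate_statistics(n, probs, reps, rng):
    """Repeat the multinomial experiment and collect one statistic per run."""
    k = len(probs)
    stats = []
    for _ in range(reps):
        counts = [0] * k
        for outcome in rng.choices(range(k), weights=probs, k=n):
            counts[outcome] += 1
        stats.append(chi_square_statistic(counts, probs))
    return stats

rng = random.Random(2024)
probs = [1 / 3, 1 / 3, 1 / 3]      # assumed null probabilities, k = 3
stats = simulate_statistics(n=180, probs=probs, reps=5000, rng=rng)

critical = -2 * math.log(0.05)     # closed form, valid for df = 2 only
share_above = sum(s > critical for s in stats) / len(stats)
mean_stat = sum(stats) / len(stats)
print(f"share above {critical:.3f}: {share_above:.3f}; mean: {mean_stat:.2f}")
```

If the approximation is working, the share of simulated statistics above the critical value should be close to .05.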
32Finding Critical Value Example
What is the critical χ² value if k = 3, α = .05?
If n_i = E(n_i), χ² = 0. Do not reject H0
α = .05
df = k - 1 = 2
[χ² Table (Portion): upper-tail area .05 with df = 2 gives a critical value of 5.991]
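Without a printed table, the critical value can be found numerically. The sketch below is one possible approach in standard-library Python, not the method used in the slides: it evaluates the upper-tail area for integer degrees of freedom with a known recurrence for the incomplete gamma function, then bisects for the value whose tail area equals α. The function names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail area P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)              # starting point: df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))   # starting point: df = 1
    while 2 * a < df:                            # step up two df at a time
        q += half ** a * math.exp(-half) / math.gamma(a + 1)
        a += 1
    return q

def chi2_critical(alpha, df):
    """Value whose upper-tail area is alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:               # widen until the tail is small enough
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (1, 2, 3, 4):
    print(df, round(chi2_critical(0.05, df), 3))
```

For df = 2 this should agree with the table value of 5.991, and the values for other df show the critical value rising as df increases.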
33Check Your Understanding
- Will a higher value for χ² statistic yield a higher or lower p-value?
- Will a higher value of χ² statistic make you more or less likely to reject null hypothesis?
- If alpha is smaller, will the critical χ² value be smaller or larger?
34χ² Test for k Proportions: Example
- As personnel director, you want to test the
perception of fairness of three methods of
performance evaluation. Of 180 employees, 63
rated Method 1 as fair. 45 rated Method 2 as
fair. 72 rated Method 3 as fair. At the .05
level, is there a difference in perceptions?
43χ² Test for k Proportions: Solution
- H0: p1 = p2 = p3 = 1/3
- Ha: At least 1 is different
- α = .05
- n1 = 63, n2 = 45, n3 = 72
- Critical Value(s): χ² = 5.991 (df = 2, α = .05)
- Test Statistic: χ² = (63 - 60)²/60 + (45 - 60)²/60 + (72 - 60)²/60 = 6.3
- Decision: Reject at α = .05
- Conclusion: There is evidence of a difference in proportions
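The solution can be checked in a few lines. This Python sketch is an illustration, not part of the original deck; it relies on the closed form exp(-x/2) for the upper-tail area, which holds only when df = 2.

```python
import math

observed = [63, 45, 72]            # employees rating each method as fair
n, k = sum(observed), len(observed)
expected = [n / k] * k             # H0: p1 = p2 = p3 = 1/3, so 60 per method

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
alpha = 0.05
critical = -2 * math.log(alpha)    # closed form, valid for df = 2 only
p_value = math.exp(-stat / 2)      # upper-tail area, again df = 2 only

print(f"chi2 = {stat:.2f}, critical = {critical:.3f}, p = {p_value:.4f}")
print("reject H0" if stat > critical else "do not reject H0")
```

The p-value lands just under .05, which matches the decision to reject at that level.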
44Intuitions: Why doesn't critical value depend on n?
- What happens to χ² statistic as n gets bigger?
- More terms (cells) to add into the sum
- Every term is positive
- Distribution of numerator values is same for each additional cell
- but denominator increases, too
- It's a weighted sum, with total weight 1
45Intuitions: Why does critical value change with k?
- With more possible outcomes
- Greater chance that one of outcomes will have an unusual count
- Higher values of χ² are expected (more likely to get larger values)
- Therefore critical value for χ² goes up
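One way to make both intuitions concrete is to look at the expected value of the statistic under H0. The short derivation below is a sketch added for illustration, not taken from the slides; it assumes each count n_i is binomial with parameters n and p_{i,0} when H0 holds, which follows from the multinomial experiment.

```latex
\begin{aligned}
E\big[(n_i - n\,p_{i,0})^2\big] &= \operatorname{Var}(n_i) = n\,p_{i,0}\,(1 - p_{i,0}) \\
E[\chi^2] &= \sum_{i=1}^{k} \frac{n\,p_{i,0}\,(1 - p_{i,0})}{n\,p_{i,0}}
           = \sum_{i=1}^{k} (1 - p_{i,0}) = k - 1
\end{aligned}
```

The sample size n cancels, so the typical size of the statistic does not grow with n, while it does grow with the number of outcomes k.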
46Running tests in Stata
- No built-in Stata code for this
- Type findit csgof, then click through to install the csgof package
- csgof: chi-square goodness of fit
47Stata output
. keep if displaymode!=1
(1 observation deleted)
. csgof displaymode, expperc(25, 25, 25, 25)
-----------------------------------------
displa~e    expperc    expfreq    obsfreq
-----------------------------------------
flat             25      43145      5,425
nested           25      43145     28,625
nocomm           25      43145        366
thread           25      43145    138,164
-----------------------------------------
chisq(3) is 289541.81, p = 0
48A very sensitive test
. csgof displaymode, expperc(3, 16.8, .2, 80)
-----------------------------------------
displa~e    expperc    expfreq    obsfreq
-----------------------------------------
flat              3     5177.4      5,425
nested         16.8   28993.44     28,625
nocomm           .2     345.16        366
thread           80     138064    138,164
-----------------------------------------
chisq(3) is 17.85, p = .0005
49Finally a non-rejection
. csgof displaymode, expperc(3.2, 16.6, .2, 80)
-----------------------------------------
displa~e    expperc    expfreq    obsfreq
-----------------------------------------
flat            3.2    5522.56      5,425
nested         16.6   28648.28     28,625
nocomm           .2     345.16        366
thread           80     138064    138,164
-----------------------------------------
chisq(3) is 3.07, p = .3805
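The three goodness-of-fit runs above can be recomputed from the observed frequencies and the expected percentages. The following is a Python sketch for cross-checking, not a reimplementation of csgof; the function names are hypothetical, and the tail-area formula used is a closed form that holds only for 3 degrees of freedom.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail area of the chi-square distribution, df = 3 only."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Observed frequencies after dropping the single "archiv" observation
observed = {"flat": 5425, "nested": 28625, "nocomm": 366, "thread": 138164}

def goodness_of_fit(observed, expected_percent):
    n = sum(observed.values())
    stat = 0.0
    for level, count in observed.items():
        expected = n * expected_percent[level] / 100   # expfreq column
        stat += (count - expected) ** 2 / expected
    return stat, chi2_sf_df3(stat)

results = []
for percents in ((25, 25, 25, 25), (3, 16.8, 0.2, 80), (3.2, 16.6, 0.2, 80)):
    stat, p = goodness_of_fit(observed, dict(zip(observed, percents)))
    results.append((stat, p))
    print(f"chisq(3) is {stat:.2f}, p = {p:.4f}")
```

The recomputed values should agree with the Stata output up to rounding in the last displayed digit. With more than 170,000 observations, even a shift of a fraction of a percentage point in the hypothesized shares moves the test from rejection to non-rejection.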
52χ² Test of Independence
- 1. Shows If a Relationship Exists Between 2 Qualitative Variables
- One Sample Is Drawn
- Does Not Show Causality
- 2. Assumptions
- Multinomial Experiment
- All Expected Counts ≥ 5
- 3. Uses Two-Way Contingency Table
54χ² Test of Independence: Contingency Table
- 1. Shows Observations From 1 Sample Jointly in 2 Qualitative Variables
[Two-way table: levels of variable 2 by levels of variable 1]
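57χ² Test of Independence: Hypotheses & Statistic
- 1. Hypotheses
- H0: Variables Are Independent
- Ha: Variables Are Related (Dependent)
- 2. Test Statistic
$$\chi^2 = \sum_{\text{all cells}} \frac{[n_{ij} - E(n_{ij})]^2}{E(n_{ij})}$$
where n_ij is the observed count and E(n_ij) is the expected count in row i, column j
- Degrees of Freedom: (r - 1)(c - 1), where r is the number of rows and c is the number of columns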
58χ² Test of Independence: Expected Counts
- 1. Statistical Independence Means Joint Probability Equals Product of Marginal Probabilities
- 2. Compute Marginal Probabilities & Multiply for Joint Probability
- 3. Expected Count Is Sample Size Times Joint Probability
64Expected Count Example
Marginal probability = 112 / 160
Marginal probability = 78 / 160
Joint probability = (112 / 160)(78 / 160)
Expected count = 160 · (112 / 160)(78 / 160) = 54.6
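67Expected Count Calculation
Expected count = (row total · column total) / sample size
112 · 82 / 160 = 57.4
112 · 78 / 160 = 54.6
48 · 78 / 160 = 23.4
48 · 82 / 160 = 24.6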
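The four expected counts follow from the marginal totals alone. Below is a minimal Python sketch; the function name is hypothetical, and since the slides do not say which totals belong to rows and which to columns, the assignment here is an assumption that does not affect the arithmetic.

```python
def expected_counts(row_totals, col_totals):
    """Expected cell counts under independence: row total * column total / n."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must describe the same sample")
    return [[r * c / n for c in col_totals] for r in row_totals]

# Marginal totals from the example (row/column assignment assumed)
table = expected_counts([112, 48], [78, 82])
for row in table:
    print([round(cell, 1) for cell in row])
```

Each row of expected counts adds up to its marginal total, as do the columns.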
68χ² Test of Independence: Example
- You're a marketing research analyst. You ask a
random sample of 286 consumers if they purchase
Diet Pepsi or Diet Coke. At the .05 level, is
there evidence of a relationship?
74χ² Test of Independence: Solution
E(n_ij) ≥ 5 in all cells
116 · 132 / 286 = 53.5
116 · 154 / 286 = 62.5
170 · 132 / 286 = 78.5
170 · 154 / 286 = 91.5
78χ² Test of Independence: Solution
- H0: No Relationship
- Ha: Relationship
- α = .05
- df = (2 - 1)(2 - 1) = 1
- Critical Value(s): χ² = 3.841 (df = 1, α = .05)
- Test Statistic: χ² = 54.29
- Decision: Reject at α = .05
- Conclusion: There is evidence of a relationship
79Siskel and Ebert (13.49)
. tab siskel ebert

           |              Ebert
    Siskel |       Con        Mix        Pro |     Total
-----------+---------------------------------+----------
       Con |        24          8         13 |        45
       Mix |         8         13         11 |        32
       Pro |        10          9         64 |        83
-----------+---------------------------------+----------
     Total |        42         30         88 |       160
80Siskel and Ebert
. tab siskel ebert, expected chi2

           |              Ebert
    Siskel |       Con        Mix        Pro |     Total
-----------+---------------------------------+----------
       Con |        24          8         13 |        45
           |      11.8        8.4       24.8 |      45.0
-----------+---------------------------------+----------
       Mix |         8         13         11 |        32
           |       8.4        6.0       17.6 |      32.0
-----------+---------------------------------+----------
       Pro |        10          9         64 |        83
           |      21.8       15.6       45.6 |      83.0
-----------+---------------------------------+----------
     Total |        42         30         88 |       160
           |      42.0       30.0       88.0 |     160.0

          Pearson chi2(4) =  45.3569   Pr = 0.000
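The expected counts and the Pearson statistic in the Stata output can be reproduced from the observed table. The sketch below is an illustrative Python cross-check, not the Stata implementation; the closed form used for the p-value holds only for 4 degrees of freedom.

```python
import math

# Observed counts: rows are Siskel (Con, Mix, Pro), columns are Ebert (Con, Mix, Pro)
observed = [
    [24, 8, 13],
    [8, 13, 11],
    [10, 9, 64],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

half = stat / 2
p_value = math.exp(-half) * (1 + half)   # upper-tail area, valid for df = 4 only

print(f"Pearson chi2({df}) = {stat:.4f}, Pr = {p_value:.3f}")
```

The largest contributions come from the diagonal cells, where the two reviewers agree more often than independence would predict.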
81Conclusion
- 1. Explained χ² Test for Proportions
- 2. Explained χ² Test of Independence
- 3. Solved Hypothesis Testing Problems
- Two or More Population Proportions
- Independence
82End of Chapter