Title: For testing significance of patterns in qualitative data
1. Chi-squared Tests
- For testing significance of patterns in qualitative data
- Test statistic is based on counts that represent the number of items that fall in each category
- Test statistic measures the agreement between actual counts and expected counts, assuming the null hypothesis
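2. Chi-squared Distribution
The chi-squared distribution can be used to see whether or not observed counts agree with expected counts. Let O = observed count and E = expected count. The test statistic sums the squared differences, each scaled by its expected count, over all categories:
$\chi^2 = \sum \frac{(O - E)^2}{E}$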
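As a minimal sketch of how observed and expected counts are compared, the function below sums (O - E)^2 / E over all categories. The function name and the sample counts are illustrative assumptions, not part of the original slides.

```python
def chi_squared_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts, chosen only to illustrate the call.
example = chi_squared_statistic([18, 22, 20], [20, 20, 20])
print(round(example, 3))
```

The statistic is 0 when every observed count equals its expected count, and it grows as the two sets of counts move apart.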
3. Testing if Observed Counts are in Agreement with Known Percentages
Consider items of a population distributed over k categories in proportions $p_1, p_2, \ldots, p_k$. If $H_0$ is true, then for a sample of n items we expect $E_i = n p_i$, the expected frequency for the ith category, as opposed to $O_i$, the observed frequency.
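The expected frequencies can be sketched in a few lines: each one is the sample size n multiplied by the proportion that the null hypothesis assigns to that category. The function name and the coin-toss inputs below are assumptions made for illustration.

```python
def expected_counts(n, proportions):
    """Expected frequency n * p_i for each category under H0."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return [n * p for p in proportions]


# A fair coin tossed 100 times: H0 assigns proportion 0.5 to each side.
coin_expected = expected_counts(100, [0.5, 0.5])
print(coin_expected)
```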
4. An Example: Biased Coin?
        Observed     Expected
        Frequency    Frequency
H       40           50
T       60           50
sum     100          100
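Worked through for this table, assuming the usual statistic that sums (O - E)^2 / E over the two outcomes, the arithmetic is:

```latex
\chi^2 = \frac{(40 - 50)^2}{50} + \frac{(60 - 50)^2}{50} = 2 + 2 = 4
```

With two outcomes there is 1 degree of freedom. The tabulated 5% critical value for 1 degree of freedom is 3.841, so a value of 4 would be judged significant at that level.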
6. Degrees of freedom = (R - 1)(C - 1), where R = number of rows and C = number of columns
7. Is our chi-squared value an extreme outcome just by chance, while in fact the null hypothesis is true and the sample frequencies are not significantly different from the ideal frequencies? Note that the chi-squared statistic is never negative.
8.
- only the right-hand side of the table is used
- nondirectional test
- the statistic has no sign
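Because the statistic cannot be negative, only the right-hand tail of the chi-squared distribution matters when judging whether a value is extreme. The sketch below computes that tail probability with the standard library only. The helper name `chi2_right_tail` is hypothetical, and the implementation assumes a positive integer number of degrees of freedom, building the answer up from the closed forms for 1 and 2 degrees of freedom.

```python
import math


def chi2_right_tail(x, df):
    """P(X >= x) for a chi-squared variable with df degrees of freedom.

    Starts from the closed form for df = 1 (odd) or df = 2 (even) and
    adds one term of the incomplete-gamma recurrence per extra 2 df.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x < 0:
        raise ValueError("the chi-squared statistic is never negative")
    z = x / 2.0
    if df % 2 == 0:
        tail, a = math.exp(-z), 1.0               # df = 2
    else:
        tail, a = math.erfc(math.sqrt(z)), 0.5    # df = 1
    while a < df / 2.0:
        tail += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return tail


# Illustrative value: a statistic of 4.0 with 1 degree of freedom.
print(round(chi2_right_tail(4.0, 1), 4))
```

A small tail probability means the observed value would rarely arise by chance if the null hypothesis were true. In practice a statistics library or a printed table serves the same purpose.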
9.
Die     Observed     Expected
        Frequency    Frequency
1       4            10
2       6            10
3       17           10
4       16           10
5       8            10
6       9            10
sum     60           60
11. Degrees of freedom = number of terms - 1
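The die example can be worked through as a short sketch. The counts come from the die table (60 rolls, 10 expected per face). The variable names and the 5% critical value of 11.070 for 5 degrees of freedom, taken from a standard chi-squared table, are additions for illustration.

```python
# Observed frequencies for faces 1 to 6, from the die table.
observed = [4, 6, 17, 16, 8, 9]
# A fair die rolled 60 times gives 10 expected rolls per face.
expected = [10] * 6

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1       # number of terms - 1
critical_5pct = 11.070       # tabulated 5% critical value for df = 5

print(round(chi2, 2), df, chi2 > critical_5pct)
```

The six terms are 3.6, 1.6, 4.9, 3.6, 0.4 and 0.1, which add up to 14.2. That exceeds 11.070, so at the 5% level these counts would be taken as evidence against a fair die.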
12. 2 x 2 contingency tables: Chi-squared test for independence
                    Var B
                    b1       b2       total
Var A     a1
          a2
          total
H0: The two variables are independent
Ha: The two variables are associated
13.
                Result
Operator    def      notdef.    total
A           100      900        1000
B           60       440        500
total       160      1340       1500
14. Total number of items = 1500
Total number of defective items = 160
Overall defective rate = 160/1500 = 0.1067
Now, apply this rate to the number of items produced by each operator.
15. Expected defective from Operator A = 1000 × 0.1067 = 106.7 (expected not defective = 1000 - 106.7 = 893.3)
Expected defective from Operator B = 500 × 0.1067 = 53.3 (expected not defective = 500 - 53.3 = 446.7)
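The operator calculation can be sketched end to end: pool the defective rate, apply it to each operator's output to get expected counts, then sum (O - E)^2 / E over all four cells. The variable names are illustrative, and the final comparison against 3.841 (the tabulated 5% critical value for 1 degree of freedom) is an added step rather than something shown in the slides.

```python
# Observed (defective, not defective) counts per operator.
observed = {"A": (100, 900), "B": (60, 440)}

total_items = sum(d + nd for d, nd in observed.values())
total_defective = sum(d for d, _ in observed.values())
rate = total_defective / total_items          # overall defective rate

chi2 = 0.0
for operator, (defective, not_defective) in observed.items():
    produced = defective + not_defective
    exp_defective = produced * rate
    exp_not_defective = produced - exp_defective
    chi2 += (defective - exp_defective) ** 2 / exp_defective
    chi2 += (not_defective - exp_not_defective) ** 2 / exp_not_defective
    print(operator, round(exp_defective, 1), round(exp_not_defective, 1))

df = (2 - 1) * (2 - 1)                        # (rows - 1)(columns - 1)
print(round(rate, 4), round(chi2, 2), df, chi2 > 3.841)
```

Under these assumptions the statistic comes to about 1.40 with 1 degree of freedom, which is below 3.841, so the data would not show a significant association between operator and defect rate at the 5% level.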
16. Result
17. r x c contingency tables
         SA    A     NO    D     SD
Gr 1     12    18    4     8     12
Gr 2     48    22    10    8     10
Gr 3     10    4     12    10    12
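For a general r x c table the same idea applies cell by cell: under independence, each expected count is (row total × column total) / grand total. The sketch below applies this to the 3 x 5 table above. The function name `independence_test` and the variable `survey` are hypothetical names introduced for illustration.

```python
def independence_test(table):
    """Return (chi2, df, expected) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Rows are Gr 1 to Gr 3; columns are SA, A, NO, D, SD.
survey = [
    [12, 18, 4, 8, 12],
    [48, 22, 10, 8, 10],
    [10, 4, 12, 10, 12],
]
chi2, df, expected = independence_test(survey)
print(round(chi2, 2), df)
```

For this table df = (3 - 1)(5 - 1) = 8, and the smallest expected count is 48 × 26 / 200 = 6.24, so every cell has an expected count of at least 5.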
18.
- use when you have categorical data
- measure the difference between actual counts and expected counts
- test the independence of two variables
- Assumptions: the data set is a random sample; each category has an expected count of at least 5
- degrees of freedom = (categories var1 - 1)(categories var2 - 1)