Title: Chi Square
1. Chi Square and Correlation
2. Nonparametric Test: Chi-Square (χ²)
- Used when too many assumptions of the t-test are violated:
  - Sample size is too small to reflect the population
  - Data are not continuous and thus not appropriate for parametric tests based on normal distributions
- χ² is another way of showing that some pattern in the data is not created randomly by chance.
- χ² can be one- or two-dimensional.
- χ² deals with the question of whether what we observed is different from what is expected.
3. Calculating χ²
- What would a contingency table look like if no relationship exists between gender and voting for Bush? (i.e., statistical independence)

                     Male   Female   Total
  Voted for Bush      25      25       50
  Voted for Kerry     25      25       50
  Total               50      50      100

NOTE: INDEPENDENT VARIABLE ON COLUMNS AND DEPENDENT VARIABLE ON ROWS
4. Calculating χ²
- What would a contingency table look like if a perfect relationship exists between gender and voting for Bush?

                     Male   Female
  Voted for Bush      50       0
  Voted for Kerry      0      50
5. Calculating the Expected Value

$$E_{ij} = \frac{F_i \times F_j}{N}$$

where
- $E_{ij}$ = the expected frequency of the cell in the ith row and jth column
- $F_i$ = the total in the ith row marginal
- $F_j$ = the total in the jth column marginal
- $N$ = the grand total, or sample size for the entire table

Expected (Voted for Bush, Male) = (50 × 50) / 100 = 25
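The expected-value formula can be applied to every cell at once. The sketch below is a minimal Python illustration (the function name `expected_frequencies` is hypothetical, not from SPSS or any library); it takes a table as a list of rows and returns $E_{ij} = F_i F_j / N$ for each cell.

```python
def expected_frequencies(observed):
    """Return E_ij = (row total i * column total j) / N for a 2-D table."""
    row_totals = [sum(row) for row in observed]           # F_i
    col_totals = [sum(col) for col in zip(*observed)]     # F_j
    n = sum(row_totals)                                   # N, the grand total
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: Voted for Bush, Voted for Kerry; columns: Male, Female
perfect = [[50, 0], [0, 50]]
print(expected_frequencies(perfect))  # [[25.0, 25.0], [25.0, 25.0]]
```

Because every marginal is 50 and N is 100, each expected cell is 25, matching the hand calculation above.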
6. Nonparametric Test: Chi-Square (χ²)
- Again, the basic question is: were the data you are observing created by chance or through some systematic process?

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O = observed frequency and E = expected frequency
7. Nonparametric Test: Chi-Square (χ²)
- The null hypothesis we are testing here is that the proportions of occurrences in each category are equal to each other (H₀: B = K). Our research hypothesis is that they are not equal (Hₐ: B ≠ K).
- Given the sample size, how many cases could we expect in each category (n / number of categories)? Comparing the obtained χ² with the critical value gives a test statistic and a probability (p-value) of obtaining results at least this extreme if they were produced by chance alone.
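The one-dimensional case (equal expected counts, n / categories) can be sketched in Python. As an assumption for illustration, the counts below reuse the marginal opinion totals from the larger table on slide 12 (45 favorable, 30 indifferent, 70 unfavorable); the function name `goodness_of_fit` is hypothetical.

```python
def goodness_of_fit(observed):
    """One-dimensional chi-square against equal expected counts (n / categories)."""
    n = sum(observed)
    expected = n / len(observed)          # same expected count in every category
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    df = len(observed) - 1                # categories - 1
    return chi2, df

# Marginal opinion totals: favorable, indifferent, unfavorable (n = 145)
chi2, df = goodness_of_fit([45, 30, 70])
print(round(chi2, 2), df)  # 16.9 2
```

Each category is expected to hold 145 / 3 ≈ 48.3 cases under the null hypothesis; the obtained χ² is then compared with the critical value for 2 degrees of freedom.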
8. Let's Do a χ²
- (50 − 25)² / 25 = 25
- (0 − 25)² / 25 = 25
- (0 − 25)² / 25 = 25
- (50 − 25)² / 25 = 25
- χ² = 100

                     Male   Female
  Voted for Bush      50       0
  Voted for Kerry      0      50

What would χ² be when there is statistical independence?
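The four-cell sum above can be reproduced with a short Python sketch of the Pearson χ² statistic (the function name `chi_square` is hypothetical). It recomputes each expected frequency from the marginals and sums (O − E)² / E, so it can be run on both the perfect-relationship table and the independence table from slide 3.

```python
def chi_square(observed):
    """Pearson chi-square statistic: sum over cells of (O - E)^2 / E."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (o - r * c / n) ** 2 / (r * c / n)   # r * c / n is the expected count E
        for row, r in zip(observed, row_totals)
        for o, c in zip(row, col_totals)
    )

perfect = [[50, 0], [0, 50]]        # slide 8: perfect relationship
independent = [[25, 25], [25, 25]]  # slide 3: statistical independence
print(chi_square(perfect))      # 100.0
print(chi_square(independent))  # 0.0
```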
9. Let's Corroborate with SPSS
10. Testing for Significance
- How do we know if the relationship is statistically significant?
- We need to know the degrees of freedom: df = (R − 1)(C − 1)
  - (2 − 1)(2 − 1) = 1
- We go to the χ² distribution to look for the critical value (CV = 3.84 for df = 1 at α = .05).
- Because the obtained χ² = 4 exceeds 3.84, we conclude that the relationship between gender and voting is statistically significant.

                     Male   Female
  Voted for Bush      20      30
  Voted for Kerry     30      20

  χ² = 4
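The significance test can be sketched in Python without a statistics package. For 1 degree of freedom the upper-tail probability of χ² has the closed form P(X > x) = erfc(√(x / 2)), which `math.erfc` provides; the helper names are hypothetical, and 3.84 is the tabled critical value quoted above.

```python
import math

def chi_square(observed):
    """Pearson chi-square statistic: sum over cells of (O - E)^2 / E."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(observed, row_totals)
        for o, c in zip(row, col_totals)
    )

def p_value_df1(chi2):
    """Upper-tail probability for 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))

observed = [[20, 30], [30, 20]]      # rows: Bush, Kerry; columns: Male, Female
df = (len(observed) - 1) * (len(observed[0]) - 1)   # (2 - 1)(2 - 1) = 1
chi2 = chi_square(observed)
critical = 3.84                      # tabled critical value for df = 1, alpha = .05
print(df, chi2, chi2 > critical, round(p_value_df1(chi2), 4))  # 1 4.0 True 0.0455
```

A p-value of about .046 is just under .05, which agrees with the comparison against the critical value.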
11. When Is χ² Appropriate to Use?
- χ² is perhaps the most widely used statistical technique to analyze nominal and ordinal data:
  - Nominal × nominal (gender and voting preference)
  - Nominal × ordinal (gender and opinion of W)
12. χ² Can Also Be Used with Larger Tables
(Each cell's contribution (O − E)² / E is shown in parentheses.)

  Opinion of Bush    MALE          FEMALE        Total
  Favorable          40 (19.5)      5 (15.8)       45
  Indifferent        10 (0.88)     20 (0.72)       30
  Unfavorable        15 (8.5)      55 (6.9)        70
  Total              65            80             145

χ² ≈ 52.4. Do we reject the null hypothesis?
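The same statistic extends to the 3 × 2 table. A minimal Python sketch (helper name hypothetical) is below; it uses the fact that with 2 degrees of freedom the χ² upper tail has the exact closed form P(X > x) = e^(−x/2), so the .05 critical value is −2 ln(.05) ≈ 5.99.

```python
import math

def chi_square(observed):
    """Pearson chi-square statistic: sum over cells of (O - E)^2 / E."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(observed, row_totals)
        for o, c in zip(row, col_totals)
    )

opinion = [[40, 5],    # Favorable: male, female
           [10, 20],   # Indifferent
           [15, 55]]   # Unfavorable
chi2 = chi_square(opinion)
df = (len(opinion) - 1) * (len(opinion[0]) - 1)   # (3 - 1)(2 - 1) = 2
p = math.exp(-chi2 / 2)                          # exact upper tail for df = 2 only
critical = -2 * math.log(0.05)                   # about 5.99 for alpha = .05
print(round(chi2, 2), df, round(critical, 2), p < 0.05)  # 52.42 2 5.99 True
```

The closed form is a shortcut specific to 2 degrees of freedom; other df values need a χ² table or a statistics library.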
13. Correlation (Does Not Mean Causation)
- We want to know how two variables are related to each other:
  - Does eating doughnuts affect weight?
  - Does spending more hours studying increase test scores?
- Correlation means how much two variables overlap with each other.
14. Types of Correlations

  X (cause)                Y (effect)        Correlation    Values
  Increases                Increases         Positive       0 to 1
  Decreases                Decreases         Positive       0 to 1
  Increases                Decreases         Negative       -1 to 0
  Decreases                Increases         Negative       -1 to 0
  Increases or decreases   Does not change   Independent    0
15. Conceptualizing Correlation
[Figure: Measuring Development. Overlap diagrams labeled Strong and Weak, with variables GDP, POP WEIGHT, and EDUCATION.]
Correlation will be associated with what type of validity?
16. Correlation Coefficient
17. Home Value and Square Footage

  Log value   Log sqft   (Log value)²   (Log sqft)²   Log value × Log sqft
  5.13        4.02       26.3169        16.1604       20.6226
  5.20        4.54       27.0400        20.6116       23.6080
  4.53        3.53       20.5209        12.4609       15.9909
  4.79        3.80       22.9441        14.4400       18.2020
  4.78        3.86       22.8484        14.8996       18.4508
  4.72        4.17       22.2784        17.3889       19.6824
  Sum: 29.15  23.92      141.95         95.96         116.56
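The coefficient formula from the neighboring slides did not survive extraction. Assuming it is Pearson's r in its usual computational form, r = (nΣxy − ΣxΣy) / √((nΣx² − (Σx)²)(nΣy² − (Σy)²)), the column sums in the table are exactly its inputs. A minimal Python sketch (function name hypothetical):

```python
import math

def pearson_r(x, y):
    """Computational formula for Pearson's r from raw sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)                       # Σx, Σy
    sxx = sum(xi * xi for xi in x)                # Σx²
    syy = sum(yi * yi for yi in y)                # Σy²
    sxy = sum(xi * yi for xi, yi in zip(x, y))    # Σxy
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return num / den

log_value = [5.13, 5.2, 4.53, 4.79, 4.78, 4.72]
log_sqft = [4.02, 4.54, 3.53, 3.8, 3.86, 4.17]
print(round(pearson_r(log_value, log_sqft), 2))  # 0.78
```

With n = 6: numerator = 6(116.56) − (29.15)(23.92) ≈ 2.07, and the result r ≈ .78 falls in the "strong" band of the rules of thumb.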
18. Correlation Coefficient
19. Rules of Thumb

  Size of correlation coefficient (|r|)   General interpretation
  .8 to 1.0                               Very strong
  .6 to .8                                Strong
  .4 to .6                                Moderate
  .2 to .4                                Weak
  .0 to .2                                Very weak or no relationship
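The table can be encoded as a small lookup. This Python sketch (function name hypothetical) uses the absolute value so negative correlations are rated by size; the table's bands share endpoints, so assigning a boundary value such as .6 to the higher band is an assumption, not something the slide specifies.

```python
def interpret(r):
    """Map |r| onto the rules-of-thumb labels; boundaries go to the higher band (assumption)."""
    size = abs(r)
    if size >= 0.8:
        return "Very strong"
    if size >= 0.6:
        return "Strong"
    if size >= 0.4:
        return "Moderate"
    if size >= 0.2:
        return "Weak"
    return "Very weak or no relationship"

print(interpret(0.78))   # Strong
print(interpret(-0.45))  # Moderate
```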
20. Multiple Correlation Coefficients
21. Limitations of Correlation Coefficients
- They tell us how strongly two variables are related.
- However, r coefficients are limited because they cannot tell us anything about:
  - Causation between X and Y
  - The marginal impact of X on Y
  - What percentage of the variation in Y is explained by X
  - Forecasting
- Because of the above, Ordinary Least Squares (OLS) regression is most useful.
22. Do You Have the BLUES?
- B for Best (minimum variance, i.e., minimum error)
- L for Linear (the form of the relationship)
- U for Unbiased (does the parameter truly reflect the effect?)
- E for Estimator
23. Home Value and Square Feet
Does the above line meet the BLUE criteria?
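The fitted line itself did not survive extraction. Assuming it is an OLS fit of log home value on log square footage using the six observations from slide 17, the slope and intercept follow from the same column sums: b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and a = ȳ − b·x̄. A minimal Python sketch (function name hypothetical):

```python
def ols_fit(x, y):
    """Least-squares intercept a and slope b for y = a + b * x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope
    a = sy / n - b * sx / n                          # intercept: mean(y) - b * mean(x)
    return a, b

log_sqft = [4.02, 4.54, 3.53, 3.8, 3.86, 4.17]     # X: square footage (logged)
log_value = [5.13, 5.2, 4.53, 4.79, 4.78, 4.72]    # Y: home value (logged)
a, b = ols_fit(log_sqft, log_value)
print(round(a, 2), round(b, 2))  # 2.56 0.58
```

Because both variables are logged, the slope can be read as an elasticity: under this fit, a 1% larger home is associated with roughly a 0.58% higher value. Whether the estimator is BLUE depends on the Gauss-Markov assumptions holding, which six points cannot establish on their own.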