Title: Measures of Association
1. Measures of Association
- Political Science 102
- Introduction to Political Inquiry
- Lecture 20
2. Why Use Measures of Association?
- Cross-tabs and scatter plots are flexible tools for exploring relationships between variables
- The chi-squared test evaluates statistical significance
- Neither method provides a summary measure of the relationship
- What is the direction?
- How strong is the relationship?
- Measures of association seek to provide this information
3. Ordinal Linear Measures
- The coefficient compares pairs of cases and records them as concordant, discordant, or tied
- Concordant: case 1 is higher (or lower) than case 2 on both X and Y
- Discordant: case 1 is lower than case 2 on X, but higher than case 2 on Y (or vice versa)
- Tied: case 1 and case 2 are equal on X, on Y, or on both
- A positive coefficient indicates more concordant than discordant pairs; a negative coefficient indicates more discordant than concordant pairs
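The pair-by-pair comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure any particular statistics package uses; the function name classify_pairs and the five (X, Y) cases are hypothetical and exist only for this example.

```python
from itertools import combinations

def classify_pairs(cases):
    """Count concordant, discordant, and tied pairs among (x, y) cases."""
    concordant = discordant = tied = 0
    for (x1, y1), (x2, y2) in combinations(cases, 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 or dy == 0:
            tied += 1          # equal on X, on Y, or on both
        elif dx * dy > 0:
            concordant += 1    # same ordering on X and on Y
        else:
            discordant += 1    # opposite ordering on X and on Y
    return concordant, discordant, tied

# Hypothetical ordinal scores (1 = low, 3 = high) for five cases.
cases = [(1, 1), (2, 2), (2, 3), (3, 2), (3, 3)]
concordant, discordant, tied = classify_pairs(cases)
print(concordant, discordant, tied)  # 5 1 4
```

With five cases there are ten pairs. Five are concordant and one is discordant, so a coefficient built on these counts would be positive.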
4. Ordinal Linear Measures
- Coefficients vary in how they weight and account for ties
- Gamma ignores ties (and so may ignore much of the data)
- Tau-b uses a weighted average of ties on X and Y
- All of these coefficients focus on linear (or at least monotonic) relationships
- Curvilinear and contingent relationships may be masked by these procedures
5. Goodman and Kruskal's Gamma
C = Concordant pairs; D = Discordant pairs
C = 23 x 68 = 1,564
D = 5 x 3 = 15
Tx = (23 x 5) + (3 x 68)
Ty = (23 x 3) + (5 x 68)
6. Goodman and Kruskal's Gamma
C = 23 x 68 = 1,564
D = 5 x 3 = 15
Tx = (23 x 5) + (3 x 68) = 319
Ty = (23 x 3) + (5 x 68) = 409
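The slide text does not preserve the gamma formula itself, so the sketch below assumes the standard definition, gamma = (C - D) / (C + D). The 2x2 cell counts are inferred from the products shown above, and the variable names are illustrative.

```python
# Cell counts of the 2x2 table implied by the products on the slide.
a, b = 23, 5    # first row
c, d = 3, 68    # second row
n = a + b + c + d

C = a * d                     # concordant pairs: 23 x 68
D = b * c                     # discordant pairs: 5 x 3
gamma = (C - D) / (C + D)     # assumed standard formula; ties are left out

# Share of all possible pairs that gamma actually uses.
total_pairs = n * (n - 1) // 2
share_used = (C + D) / total_pairs

print(C, D, round(gamma, 3), round(share_used, 3))
```

Under these assumptions gamma comes out at about 0.98, yet it is based on only about a third of the possible pairs. That is one way to see the earlier point that ignoring ties may ignore much of the data.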
7. Kendall's Tau-b
C = 23 x 68 = 1,564
D = 5 x 3 = 15
Tx = (23 x 5) + (3 x 68) = 319
Ty = (23 x 3) + (5 x 68) = 409
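The tau-b formula is also missing from the slide text. The sketch below assumes the usual form, tau-b = (C - D) / sqrt((C + D + Tx)(C + D + Ty)), and plugs in the four counts listed above.

```python
from math import sqrt

C, D = 1564, 15      # concordant and discordant pairs, from the slide
Tx, Ty = 319, 409    # pairs tied on one variable only, from the slide

# Assumed standard formula: ties enter the denominator, not the numerator.
tau_b = (C - D) / sqrt((C + D + Tx) * (C + D + Ty))
print(round(tau_b, 3))
```

Under this assumption tau-b is about 0.80. It is smaller than gamma for the same table because the tied pairs enlarge the denominator.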
8. Linearity and the Limits of Gamma and Tau-b
Level of Interest in Politics/Current Events, by Ideology
(each cell: frequency, then column percentage)

                      |                        Ideology
Level of Interest     |  Very Libe    Liberal   Moderate  Conservat  Very Cons |      Total
----------------------+--------------------------------------------------------+-----------
Not much interested   |         15         46        100         62         21 |        244
                      |       4.69       7.01       8.25       6.78       4.36 |       6.81
----------------------+--------------------------------------------------------+-----------
Somewhat interested   |         62        232        461        272         96 |      1,123
                      |      19.38      35.37      38.04      29.73      19.92 |      31.32
----------------------+--------------------------------------------------------+-----------
Very much interested  |        243        378        651        581        365 |      2,218
                      |      75.94      57.62      53.71      63.50      75.73 |      61.87
----------------------+--------------------------------------------------------+-----------
Total                 |        320        656      1,212        915        482 |      3,585
                      |     100.00     100.00     100.00     100.00     100.00 |     100.00

Pearson chi2(8) = 106.8563   Pr = 0.000
          gamma =   0.0748  ASE = 0.023
Kendall's tau-b =   0.0466  ASE = 0.014
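A short check of the table above shows why the near-zero gamma is misleading here. The sketch recomputes gamma from the cell frequencies, again assuming gamma = (C - D) / (C + D), and then prints the share of "very much interested" respondents in each ideology column. The helper name pair_counts is hypothetical.

```python
def pair_counts(table):
    """Return (C, D) for a table whose rows and columns are both ordered."""
    n_rows, n_cols = len(table), len(table[0])
    C = D = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # cells in lower rows only
                for l in range(n_cols):
                    if l > j:                   # lower and to the right
                        C += table[i][j] * table[k][l]
                    elif l < j:                 # lower and to the left
                        D += table[i][j] * table[k][l]
    return C, D

# Frequencies from the table: rows are interest levels (low to high),
# columns are ideology (very liberal to very conservative).
table = [
    [15, 46, 100, 62, 21],
    [62, 232, 461, 272, 96],
    [243, 378, 651, 581, 365],
]

C, D = pair_counts(table)
gamma = (C - D) / (C + D)

col_totals = [sum(col) for col in zip(*table)]
pct_very = [100 * table[2][j] / col_totals[j] for j in range(len(col_totals))]

print(round(gamma, 4))                  # 0.0748
print([round(p, 2) for p in pct_very])  # [75.94, 57.62, 53.71, 63.5, 75.73]
```

Interest is highest at both ideological extremes and lowest among moderates. The concordant and discordant pairs nearly cancel, so a monotonic measure reports almost no association even though the chi-squared test is highly significant.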
9. Correlation Coefficients
- For ratio data we can construct measures of association that use more information about the distance between categories
- Gamma and tau-b make only ordinal comparisons
- Analysts sought to construct a summary statistic that would allow comparison of the strength of the relationship despite different units of measure
- The correlation coefficient!
10. Origins of the Correlation Coefficient
- Analysts wanted to summarize how much changes in X and Y are associated with one another
- But X and Y are on different scales with different levels of variation
- Step 1: Measure the association of variation in X and Y by subtracting out the mean level of each variable
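Step 1 can be illustrated with the sample covariance, the average product of deviations from the means. The slide's own formula is not preserved in the text, so treating it as the sample covariance is an assumption. The height and weight figures are hypothetical.

```python
def covariance(xs, ys):
    """Sample covariance: average product of deviations from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: the same five heights in metres and in centimetres.
height_m = [1.5, 1.6, 1.7, 1.8, 1.9]
height_cm = [150, 160, 170, 180, 190]
weight_kg = [50, 58, 63, 72, 77]

cov_m = covariance(height_m, weight_kg)
cov_cm = covariance(height_cm, weight_kg)
print(round(cov_m, 2), round(cov_cm, 2))  # 1.7 170.0
```

The relationship is identical in both runs, but the number changes by a factor of 100 with the unit of measure. That is the problem the next step tries to solve.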
11. The Origins of the Correlation Coefficient
- This formula focuses on deviations in X and Y, but X and Y are still measured in different units
- Solution: Divide deviations in X and Y by their respective standard deviations
- Puts deviations in units of standard deviations
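The standardization step can be sketched as follows, assuming the usual Pearson correlation is what the slide builds toward. Each deviation is divided by its variable's standard deviation before the products are averaged. The data are hypothetical.

```python
from math import sqrt

def std_dev(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def correlation(xs, ys):
    """Average product of deviations measured in standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sx, sy = std_dev(xs), std_dev(ys)
    return sum(((x - mean_x) / sx) * ((y - mean_y) / sy)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: the same heights in two different units.
height_m = [1.5, 1.6, 1.7, 1.8, 1.9]
height_cm = [150, 160, 170, 180, 190]
weight_kg = [50, 58, 63, 72, 77]

r_m = correlation(height_m, weight_kg)
r_cm = correlation(height_cm, weight_kg)
print(round(r_m, 3), round(r_cm, 3))
```

Changing the unit of X no longer changes the result. The price is that the result now depends on the standard deviations of the particular sample.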
12. Aspirations of the Correlation Coefficient
- Aims to be a unit-free measure of association that allows comparison of degrees of association across variables measured in different units
- It FAILS on all counts!
- Cannot compare the correlation of X and Y to that of Z and Y because X and Z have different standard deviations
- The goal of unit-free comparisons is wrong-headed
- Cannot generalize correlations between the same variables across different samples because the standard deviations of the samples differ
- Instead of being universally comparable, correlations are universally incommensurable!
13. What is the Solution?
- DON'T use correlation coefficients to make generalizable claims about the association between variables
- Assess the strength of relationships by looking at:
- Variation across categories in cross-tabs
- Difference of means or proportions tests
- Scatter plots
- Rely on chi-squared and t-tests for statistical significance
- Rely on regression analysis to summarize the strength of relationships between variables
14. The Sample Dependence of Correlation Coefficients
- I created a dataset with these characteristics
- X1 varies from -20 to 20, where sx = 10
- Y is defined as Y1 = 3 + 2X1 + e
- e is a random error term such that e ~ N(0, 20)
- Thus we KNOW the true relationship between X and Y
- We can change the sample to see if correlations are generalizable
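A comparable experiment can be sketched in Python. It is not the original dataset. The sketch assumes X1 is drawn from a normal distribution with standard deviation 10, reads N(0, 20) as an error standard deviation of 20, and uses a much larger sample than the slides so the pattern is stable. The seed, sample size, and helper name slope_and_r are arbitrary choices for this example.

```python
import random
from math import sqrt

def slope_and_r(xs, ys):
    """Return the OLS slope of y on x and the Pearson correlation."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy / sqrt(sxx * syy)

random.seed(102)   # arbitrary seed, for reproducibility only
n = 20000          # assumption: far larger than the 100 cases on the slides

xs = [random.gauss(0, 10) for _ in range(n)]
ys = [3 + 2 * x + random.gauss(0, 20) for x in xs]   # Y1 = 3 + 2*X1 + e

slope_full, r_full = slope_and_r(xs, ys)

# Restrict the variation in X, as in "if x1 > -10 & x1 < 10".
kept = [(x, y) for x, y in zip(xs, ys) if -10 < x < 10]
slope_restricted, r_restricted = slope_and_r(
    [x for x, _ in kept], [y for _, y in kept]
)

print("full:      ", round(slope_full, 2), round(r_full, 2))
print("restricted:", round(slope_restricted, 2), round(r_restricted, 2))
```

Under these assumptions the slope should stay close to the true value of 2 in both samples, while the correlation falls noticeably once the range of X is narrowed.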
15. Correlation Coefficients Depend on the Sample

. corr y1 x1
(obs=100)

        |       y1       x1
--------+------------------
     y1 |   1.0000
     x1 |   0.6401   1.0000

Analysis of Full Sample

. corr y1 x1 if x1>-10 & x1<10
(obs=70)

        |       y1       x1
--------+------------------
     y1 |   1.0000
     x1 |   0.4655   1.0000

Analysis of Restricted Variation in X
Correlation coefficient drops by more than a quarter (from 0.64 to 0.47) due to arbitrary changes in the sample
16. Regression Coefficients Are Generalizable

. reg y1 x1

  Source |       SS       df       MS                  Number of obs =     100
---------+------------------------------               F(  1,    98) =   68.03
   Model |  30717.8966     1  30717.8966               Prob > F      =  0.0000
Residual |  44248.6995    98  451.517342               R-squared     =  0.4098
---------+------------------------------               Adj R-squared =  0.4037
   Total |  74966.5961    99  757.238344               Root MSE      =  21.249

------------------------------------------------------------------------------
      y1 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
      x1 |    1.92611   .2335192      8.248   0.000       1.462699    2.389522
   _cons |    4.27945   2.130969      2.008   0.047       .0506116    8.508289
------------------------------------------------------------------------------

Analyzing the Full Sample
17. Regression Coefficients Are Generalizable

. reg y1 x1 if x1>-10 & x1<10

  Source |       SS       df       MS                  Number of obs =      70
---------+------------------------------               F(  1,    68) =   18.81
   Model |  8038.90856     1  8038.90856               Prob > F      =  0.0000
Residual |  29063.7331    68  427.407839               R-squared     =  0.2167
---------+------------------------------               Adj R-squared =  0.2051
   Total |  37102.6416    69  537.719444               Root MSE      =  20.674

------------------------------------------------------------------------------
      y1 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
      x1 |   1.971043   .4544842      4.337   0.000       1.064134    2.877952
   _cons |   5.884096   2.517394      2.337   0.022       .8607142    10.90748
------------------------------------------------------------------------------

Analyzing the Restricted Sample
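The two sets of output can be tied together with the identity that holds in a bivariate regression, slope = r * (sy / sx). The sketch below uses only numbers reported on slides 15 to 17 to back out the standard deviation of x1 that each sample implies. The dictionary layout and variable names are illustrative.

```python
from math import sqrt

# Values copied from the Stata output on slides 15 to 17.
samples = {
    "full":       {"r": 0.6401, "slope": 1.92611,  "total_ms": 757.238344},
    "restricted": {"r": 0.4655, "slope": 1.971043, "total_ms": 537.719444},
}

implied_sx = {}
for name, s in samples.items():
    sy = sqrt(s["total_ms"])                      # Total MS is the sample variance of y1
    implied_sx[name] = s["r"] * sy / s["slope"]   # rearranged from slope = r * sy / sx
    print(name, round(sy, 2), round(implied_sx[name], 2))
```

The implied standard deviation of x1 falls from about 9.1 in the full sample to about 5.5 in the restricted one. The slope barely moves (1.93 versus 1.97), so the drop in the correlation reflects the reduced spread in X rather than any change in the underlying relationship.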