Describing Association for Discrete Variables - PowerPoint PPT Presentation

Slides: 59
Provided by: robertas5

Transcript and Presenter's Notes



1
Describing Association for Discrete Variables
2
Discrete variables can have one of two different qualities:
1. ordered categories
2. non-ordered categories
3
1. Ordered categories, e.g., High, Medium, and Low (both variables must be ordered)
2. Non-ordered categories, e.g., Yes and No
4
Relationships between two variables may be either:
1. symmetrical or
2. asymmetrical
5
Symmetrical means that we are only interested in describing the extent to which two variables "hang around together" (non-directional). Symbolically, X ↔ Y
6
Asymmetrical means that we want a measure of association that yields a different description of X's influence on Y from Y's influence on X (directional). Symbolically, X → Y versus Y → X
7
                              Asymmetrical Relationship
                              No                            Yes
Ordered Categories     No
                       Yes
8
For symmetrical relationships between two non-ordered variables, there are two choices:
1. Yule's Q (for 2 × 2 tables)
2. Cramer's V (for larger tables)
9
Respondents in the 1997 General Social Survey (GSS 1997) were asked:
Were they strong supporters of any political party (yes or no)? and
Did they vote in the 1996 presidential election (yes or no)?

                         Party Identification
                         Not Strong    Strong     Total
Voting      Voted        a             b          a + b
Turnout     Not Voted    c             d          c + d
            Total        a + c         b + d      a + b + c + d
11
                         Party Identification
                         Not Strong    Strong     Total
Voting      Voted        615           339        954
Turnout     Not Voted    318           59         377
            Total        933           398        1,331
12
Q = (bc − ad) / (bc + ad)
  = [(339)(318) − (615)(59)] / [(339)(318) + (615)(59)]
  = (107,802 − 36,285) / (107,802 + 36,285)
  = 71,517 / 144,087
  = 0.496
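As a cross-check on the arithmetic, here is a minimal Python sketch. The function name `yules_q` is our own; it assumes the cell labels of the 2 × 2 layout above (a = Voted/Not Strong, b = Voted/Strong, c = Not Voted/Not Strong, d = Not Voted/Strong) and computes Q = (bc − ad) / (bc + ad), the ordering used in the worked example.

```python
def yules_q(a: int, b: int, c: int, d: int) -> float:
    """Yule's Q for a 2 x 2 table laid out as

                   Not Strong   Strong
        Voted          a           b
        Not Voted      c           d

    Positive values mean "Strong" goes with "Voted" and
    "Not Strong" goes with "Not Voted".
    """
    same_direction = b * c      # (Strong, Voted) x (Not Strong, Not Voted)
    opposite_direction = a * d  # (Not Strong, Voted) x (Strong, Not Voted)
    return (same_direction - opposite_direction) / (same_direction + opposite_direction)


# Voting turnout by strength of party identification
print(round(yules_q(615, 339, 318, 59), 3))  # 0.496
```

Swapping the order of the two columns (or the two rows) exchanges the roles of bc and ad, which flips the sign of Q, so the sign only has meaning once the category order is fixed.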
13
What does this mean? Yule's Q varies from 0.00 (statistical independence; no association) to +1.00 (perfect direct association) and −1.00 (perfect inverse association).
14
Use the following rule of thumb (for now):
0.00 to 0.24  "No relationship"
0.25 to 0.49  "Weak relationship"
0.50 to 0.74  "Moderate relationship"
0.75 to 1.00  "Strong relationship"
Yule's Q = 0.496 (which rounds to 0.50) ". . . represents a moderate positive association between party identification strength and voting turnout."
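The rule of thumb can be written as a small lookup. This is only a sketch: it assumes the bands apply to the magnitude of Q (the slide lists only positive values) and that Q is first rounded to two decimals, which is how 0.496 lands in the "moderate" band. The name `describe_strength` is hypothetical.

```python
def describe_strength(q: float) -> str:
    """Label a Yule's Q value with the rule-of-thumb bands.

    Assumption: the bands apply to |Q| after rounding to two decimals.
    """
    size = round(abs(q), 2)
    if size < 0.25:
        return "No relationship"
    if size < 0.50:
        return "Weak relationship"
    if size < 0.75:
        return "Moderate relationship"
    return "Strong relationship"


print(describe_strength(0.496))  # Moderate relationship
print(describe_strength(-0.30))  # Weak relationship
```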
15
                         Party Identification
                         Not Strong    Strong     Total
Voting      Voted        0             954        954
Turnout     Not Voted    377           0          377
            Total        377           954        1,331
16
What would be the value of Yule's Q?
Q = [(954)(377) − (0)(0)] / [(954)(377) + (0)(0)]
  = (359,658 − 0) / (359,658 + 0)
  = 359,658 / 359,658
  = 1.000
17
                         Party Identification
                         Not Strong    Strong     Total
Voting      Voted        477           477        954
Turnout     Not Voted    189           188        377
            Total        666           665        1,331
18
In this case, Yule's Q would be:
Q = [(477)(189) − (477)(188)] / [(477)(189) + (477)(188)]
  = (90,153 − 89,676) / (90,153 + 89,676)
  = 477 / 179,829
  = 0.003
19
Yule's Q can be calculated only for 2 × 2 tables. For larger tables (e.g., 3 × 4 tables having three rows and four columns), most statistical programs such as SAS report the Cramer's V statistic. Cramer's V has properties similar to Yule's Q, but since it is computed from χ², it cannot take negative values (it ranges from 0 to 1):

V = √[χ² / (N × min(R − 1, C − 1))]

where min(R − 1, C − 1) means either the number of rows less one or the number of columns less one, whichever is smaller, and N is the sample size.
20
In the voting example above, χ² ≈ 50.97 and Cramer's V is
V = √[50.97 / (1,331 × 1)] = 0.196
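Both numbers can be reproduced with the Pearson formula: sum (observed − expected)² / expected over all cells, where expected = row total × column total / N. The Python sketch below uses hypothetical helper names and assumes every row and column total is nonzero (otherwise an expected count would be zero).

```python
from math import sqrt


def chi_square(table):
    """Pearson chi-square: sum of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def cramers_v(table):
    """V = sqrt(chi2 / (N * min(R - 1, C - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table) - 1, len(table[0]) - 1)
    return sqrt(chi_square(table) / (n * k))


voting = [[615, 339],   # Voted:     Not Strong, Strong
          [318, 59]]    # Not Voted: Not Strong, Strong
print(round(chi_square(voting), 2), round(cramers_v(voting), 3))  # 50.97 0.196
```

Because V is a square root it is never negative, so it reports the strength of an association but not its direction.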
21
For asymmetrical relationships between two non-ordered variables, the statistic of choice is Lambda (λ).
22
Lambda is calculated as follows:
λ = [(non-modal responses on Y) − (sum of non-modal responses for each category of X)] / (non-modal responses on Y)
23
                         Party Identification
                         Not Strong    Strong     Total
Voting      Voted        615           339        954
Turnout     Not Voted    318           59         377
            Total        933           398        1,331
24
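In this example, the modal response on Y (voting turnout) is "Voted," so 377 responses are non-modal. Within each category of X (party identification) the modal response is also "Voted," leaving 318 (Not Strong) + 59 (Strong) non-modal responses:
λ = [377 − (318 + 59)] / 377 = (377 − 377) / 377 = 0 / 377 = 0.00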
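A minimal Python sketch of the same recipe follows. It treats the row variable as Y and the column variable as X (an assumption that matches the voting example, where turnout is predicted from party identification). The function name is hypothetical; in the literature this statistic is usually called Goodman and Kruskal's lambda.

```python
def lambda_y_given_x(table):
    """Asymmetric lambda with rows as Y (dependent) and columns as X."""
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    # Non-modal responses on Y: everyone outside the largest row.
    errors_ignoring_x = n - max(row_totals)
    # Non-modal responses within each X category (column), summed.
    errors_using_x = sum(sum(col) - max(col) for col in zip(*table))
    return (errors_ignoring_x - errors_using_x) / errors_ignoring_x


voting = [[615, 339],   # Voted:     Not Strong, Strong
          [318, 59]]    # Not Voted: Not Strong, Strong
print(lambda_y_given_x(voting))  # 0.0
```

Lambda is 0 here even though Yule's Q is 0.496: in both party groups the best single guess is still "Voted," so knowing X never changes the prediction of Y.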
25
For symmetrical relationships between two
variables having ordered categories, the
statistic of choice is Gamma (G)
26
G = (ns − nd) / (ns + nd)
where ns is the number of concordant pairs and nd is the number of discordant pairs
27
The concepts of concordant and discordant pairs are simple and are based on a generalization of the diagonal and off-diagonal cells in the Yule's Q statistic.
29
To construct concordant pairs: "Starting with
the upper right cell (i.e., the first row, last
column in the table), add together all
frequencies in cells below AND to the left of
this cell, then multiply that sum by the cell
frequency. Move to the next cell (i.e., still
row one, but now one column to the left) and do
the same thing. Repeat until there are NO cells
to the left AND below the target cell. Then sum
up all these products to form the value for the
concordant pairs."
30
To illustrate, take the crosstabulation below, which shows the relationship between a measure of social class and respondents' satisfaction with their current financial situation:

                      Social Class
Financially Satisfied    Lower Working  Middle   Upper   Total
Very well                   10     131     251      36     428
More or less                19     309     343      19     690
Not at all                  43     190      84       7     324
Total                       72     630     678      62   1,442
35
For this table, the calculations are:
36 × (343 + 309 + 19 + 84 + 190 + 43) = 35,568
251 × (309 + 19 + 190 + 43) = 140,811
131 × (19 + 43) = 8,122
19 × (84 + 190 + 43) = 6,023
343 × (190 + 43) = 79,919
309 × (43) = 13,287
These are NOT the value of the concordant pairs; they are the values that must be added together to determine the value of concordant pairs.
ns = 35,568 + 140,811 + 8,122 + 6,023 + 79,919 + 13,287 = 283,730
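The rule can be coded directly. In this Python sketch (hypothetical helper name) the loop visits every cell rather than starting at the upper right; cells with nothing below and to the left contribute zero, so the total is unchanged. It assumes the slide layout: the top row is the highest category of Y and the leftmost column is the lowest category of X.

```python
def concordant_pairs(table):
    """ns: each cell frequency times the sum of all cells below AND to its left."""
    ns = 0
    for i, row in enumerate(table):
        rows_below = table[i + 1:]
        for j, cell in enumerate(row):
            below_left = sum(sum(r[:j]) for r in rows_below)
            ns += cell * below_left
    return ns


social_class = [
    [10, 131, 251, 36],  # Very well:    Lower, Working, Middle, Upper
    [19, 309, 343, 19],  # More or less
    [43, 190, 84, 7],    # Not at all
]
print(concordant_pairs(social_class))  # 283730
```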
36
To construct discordant pairs: "Starting with the upper left cell (i.e., the first row, first column in the table), add together all frequencies in cells below AND to the right of this cell, then multiply that sum by the cell frequency. Move to the next cell (i.e., still row one, but now one column to the right) and do the same thing. Repeat until there are NO cells to the right AND below the target cell. Then sum up all these products to form the value for the discordant pairs."
41
For the discordant pairs in this table, the calculations are:
10 × (309 + 343 + 19 + 190 + 84 + 7) = 9,520
131 × (343 + 19 + 84 + 7) = 59,343
251 × (19 + 7) = 6,526
19 × (190 + 84 + 7) = 5,339
309 × (84 + 7) = 28,119
343 × (7) = 2,401
Again, these are NOT the value of the discordant pairs; they are the values that must be added together to determine the value of discordant pairs.
nd = 9,520 + 59,343 + 6,526 + 5,339 + 28,119 + 2,401 = 111,248
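The discordant rule differs from the concordant one only in looking below and to the right instead of below and to the left. A minimal Python sketch, with a hypothetical helper name and the same table layout as the slides:

```python
def discordant_pairs(table):
    """nd: each cell frequency times the sum of all cells below AND to its right."""
    nd = 0
    for i, row in enumerate(table):
        rows_below = table[i + 1:]
        for j, cell in enumerate(row):
            below_right = sum(sum(r[j + 1:]) for r in rows_below)
            nd += cell * below_right
    return nd


social_class = [
    [10, 131, 251, 36],  # Very well:    Lower, Working, Middle, Upper
    [19, 309, 343, 19],  # More or less
    [43, 190, 84, 7],    # Not at all
]
print(discordant_pairs(social_class))  # 111248
```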
42
G = (283,730 − 111,248) / (283,730 + 111,248)
  = 172,482 / 394,978
  = 0.437
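Putting the two counts together gives Gamma. This Python sketch (hypothetical names, same table layout as the slides) computes both pair counts in a single pass over the cells; note that Gamma ignores tied pairs entirely.

```python
def pair_counts(table):
    """Return (ns, nd): below-left sums for concordant, below-right for discordant."""
    ns = nd = 0
    for i, row in enumerate(table):
        rows_below = table[i + 1:]
        for j, cell in enumerate(row):
            ns += cell * sum(sum(r[:j]) for r in rows_below)
            nd += cell * sum(sum(r[j + 1:]) for r in rows_below)
    return ns, nd


def gamma(table):
    """G = (ns - nd) / (ns + nd)."""
    ns, nd = pair_counts(table)
    return (ns - nd) / (ns + nd)


social_class = [
    [10, 131, 251, 36],  # Very well:    Lower, Working, Middle, Upper
    [19, 309, 343, 19],  # More or less
    [43, 190, 84, 7],    # Not at all
]
print(pair_counts(social_class))      # (283730, 111248)
print(round(gamma(social_class), 3))  # 0.437
```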
43
For asymmetrical relationships between two variables having ordered categories, the statistic of choice is Somers' dyx.
44
For this crosstabulation, we specify Social Class (the column variable) as the independent variable (X) and Financial Satisfaction (the row variable) as the dependent variable (Y).

                          Social Class (X)
Financially Satisfied (Y)    Lower Working  Middle   Upper   Total
Very well                       10     131     251      36     428
More or less                    19     309     343      19     690
Not at all                      43     190      84       7     324
Total                           72     630     678      62   1,442
45
Somers' dyx is created by adjusting concordant and discordant pairs for tied pairs on the dependent variable (Y). In the example we have been using, the only asymmetrical relationship that makes sense is one with the dependent variable (Y) as the row variable. Therefore Somers' dyx will be shown only for this situation, that is, for tied pairs on the row variable. (Tied pairs for the column variable follow the identical logic.)

A tied pair is a pair of respondents who are identical with respect to the category of the dependent variable but who differ on the category of the independent variable to which they belong. In the case of financial satisfaction, tied pairs are formed by respondents who express the same satisfaction level but who identify themselves with different social classes. In other words, for a dependent row variable, each observation is tied with all the observations in the other cells of the same row.
46
The computational rule is: Target the upper left-hand cell (first row, first column) and multiply its value by the sum of the cell frequencies to its right in the same row. Move to the cell to the right and multiply its value by the sum of the cell frequencies to its right in the same row. Repeat until there are no more cells to the right in the same row. Then move to the first cell in the next row (first column) and repeat, until there are no more cells in the table that have cells to their right. Add up these products.
51
Here, the products are:
10 × (131 + 251 + 36) = 4,180
131 × (251 + 36) = 37,597
251 × (36) = 9,036
19 × (309 + 343 + 19) = 12,749
309 × (343 + 19) = 111,858
343 × (19) = 6,517
43 × (190 + 84 + 7) = 12,083
190 × (84 + 7) = 17,290
84 × (7) = 588
Thus, tied pairs (Tr) for rows equals:
Tr = 4,180 + 37,597 + 9,036 + 12,749 + 111,858 + 6,517 + 12,083 + 17,290 + 588 = 211,898
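The tie count can be coded the same way. The Python sketch below (hypothetical helper names) follows the slide's rule and, as a cross-check, uses the fact that a row with total n_r and cell counts n_rj contains (n_r² − Σ n_rj²) / 2 pairs that fall in different cells of that row.

```python
def tied_on_rows(table):
    """Tr by the slide rule: each cell times the sum of the cells to its right in the same row."""
    return sum(cell * sum(row[j + 1:])
               for row in table
               for j, cell in enumerate(row))


def tied_on_rows_shortcut(table):
    """Same count per row: (row total squared - sum of squared cells) / 2."""
    return sum((sum(row) ** 2 - sum(cell * cell for cell in row)) // 2
               for row in table)


social_class = [
    [10, 131, 251, 36],  # Very well:    Lower, Working, Middle, Upper
    [19, 309, 343, 19],  # More or less
    [43, 190, 84, 7],    # Not at all
]
print(tied_on_rows(social_class), tied_on_rows_shortcut(social_class))  # 211898 211898
```

The shortcut is handy for large tables because it needs only row totals and squared cell counts, not a nested sum for every cell.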
52
In this example,
Somers' dyx = (ns − nd) / (ns + nd + Tr)
            = (283,730 − 111,248) / (283,730 + 111,248 + 211,898)
            = 172,482 / 606,876
            = 0.284
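All three counts come from one pass over the table. A minimal Python sketch of Somers' dyx with Y as the row variable, assuming (as on the slides) that rows run from the highest Y category down and columns from the lowest X category across; the function name is hypothetical.

```python
def somers_dyx(table):
    """(ns - nd) / (ns + nd + Tr), with Y as the row variable."""
    ns = nd = tr = 0
    for i, row in enumerate(table):
        rows_below = table[i + 1:]
        for j, cell in enumerate(row):
            ns += cell * sum(sum(r[:j]) for r in rows_below)      # below and left
            nd += cell * sum(sum(r[j + 1:]) for r in rows_below)  # below and right
            tr += cell * sum(row[j + 1:])                         # same row, to the right
    return (ns - nd) / (ns + nd + tr)


social_class = [
    [10, 131, 251, 36],  # Very well:    Lower, Working, Middle, Upper
    [19, 309, 343, 19],  # More or less
    [43, 190, 84, 7],    # Not at all
]
print(round(somers_dyx(social_class), 3))  # 0.284
```

Because Tr is added only to the denominator, the magnitude of dyx can never exceed that of Gamma for the same table (here 0.284 versus 0.437).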
53
                              Asymmetrical Relationship
                              No                            Yes
Ordered Categories     No     Yule's Q (2 × 2 tables) or    Lambda (λ)
                              Cramer's V (larger tables)
                       Yes    Gamma (G)                     Somers' dyx
54
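Using SAS to Produce Two-Way Frequency Distributions and Statistics

```sas
/* Point both librefs at the diskette drive holding the data set and formats */
libname mystuff 'a:\';
libname library 'a:\';
/* Plain ASCII table-drawing characters, 66-line pages, no date or page numbers */
options formchar='|----|+|---+=|-/\<>*' ps=66 nodate nonumber;

/* Two-way table of CHURCH by MARRIED with expected counts and all statistics */
proc freq data=mystuff.marriage;
  tables church*married / expected all;
  title1 'Crosstabulation for Discrete Variables';
run;
```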
55
```
                 Crosstabulation for Discrete Variables

                       TABLE OF CHURCH BY MARRIED

CHURCH     MARRIED

Frequency|
Expected |
Percent  |
Row Pct  |
Col Pct  |Divorced|Married |Never   |Separate|Widowed |  Total
---------+--------+--------+--------+--------+--------+
Annually |     74 |    269 |    129 |     18 |     43 |    533
         | 62.318 | 290.33 | 101.17 | 18.695 | 60.485 |
         |   5.09 |  18.50 |   8.87 |   1.24 |   2.96 |  36.66
         |  13.88 |  50.47 |  24.20 |   3.38 |   8.07 |
         |  43.53 |  33.96 |  46.74 |  35.29 |  26.06 |
---------+--------+--------+--------+--------+--------+
Monthly  |     30 |    149 |     50 |     10 |     26 |    265
         | 30.983 | 144.35 | 50.303 |  9.295 | 30.072 |
         |   2.06 |  10.25 |   3.44 |   0.69 |   1.79 |  18.23
         |  11.32 |  56.23 |  18.87 |   3.77 |   9.81 |
         |  17.65 |  18.81 |  18.12 |  19.61 |  15.76 |
---------+--------+--------+--------+--------+--------+
Never    |     32 |     85 |     34 |      6 |     16 |    173
         | 20.227 | 94.234 | 32.839 | 6.0681 | 19.632 |
         |   2.20 |   5.85 |   2.34 |   0.41 |   1.10 |  11.90
         |  18.50 |  49.13 |  19.65 |   3.47 |   9.25 |
         |  18.82 |  10.73 |  12.32 |  11.76 |   9.70 |
---------+--------+--------+--------+--------+--------+
Weekly   |     34 |    289 |     63 |     17 |     80 |    483
         | 56.472 | 263.09 | 91.684 | 16.942 | 54.811 |
         |   2.34 |  19.88 |   4.33 |   1.17 |   5.50 |  33.22
         |   7.04 |  59.83 |  13.04 |   3.52 |  16.56 |
         |  20.00 |  36.49 |  22.83 |  33.33 |  48.48 |
---------+--------+--------+--------+--------+--------+
Total         170      792      276       51      165     1454
            11.69    54.47    18.98     3.51    11.35   100.00
```
56
```
                 Crosstabulation for Discrete Variables

               STATISTICS FOR TABLE OF CHURCH BY MARRIED

Statistic                         DF       Value        Prob
------------------------------------------------------------
Chi-Square                        12      57.792       0.000
Likelihood Ratio Chi-Square       12      57.806       0.000
Mantel-Haenszel Chi-Square         1       8.152       0.004
Phi Coefficient                            0.199
Contingency Coefficient                    0.196
Cramer's V                                 0.115

Statistic                              Value       ASE
------------------------------------------------------
Gamma                                  0.052     0.033
Kendall's Tau-b                        0.035     0.022
Stuart's Tau-c                         0.031     0.020

Somers' D C|R                          0.033     0.021
Somers' D R|C                          0.037     0.024

Pearson Correlation                    0.075     0.026
Spearman Correlation                   0.041     0.026

Lambda Asymmetric C|R                  0.000     0.000
Lambda Asymmetric R|C                  0.062     0.027
Lambda Symmetric                       0.036     0.016

Uncertainty Coefficient C|R            0.016     0.004
Uncertainty Coefficient R|C            0.015     0.004
Uncertainty Coefficient Symmetric      0.016     0.004

Sample Size = 1454
```
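The three χ²-based coefficients in this output follow directly from χ² = 57.792 and N = 1454. A short Python check, assuming the usual definitions φ = √(χ²/N), contingency coefficient C = √(χ²/(χ² + N)), and V = √(χ² / (N × min(R − 1, C − 1))), reproduces the reported values.

```python
from math import sqrt

chi2, n = 57.792, 1454   # values reported by PROC FREQ above
rows, cols = 4, 5        # CHURCH has 4 categories, MARRIED has 5

phi = sqrt(chi2 / n)
contingency = sqrt(chi2 / (chi2 + n))
cramers_v = sqrt(chi2 / (n * min(rows - 1, cols - 1)))
print(f"{phi:.3f} {contingency:.3f} {cramers_v:.3f}")  # 0.199 0.196 0.115
```

For tables larger than 2 × 2, φ can exceed 1 in principle, which is one reason Cramer's V (rescaled by min(R − 1, C − 1)) is the usual choice.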
57
Exercise: Compute values for Lambda (λ), Gamma (G) and Somers' dyx for the following two-way frequency distribution. Assume that the row variable, self-described health, is the dependent (Y) variable.

                        Education Degree Level
Self-Described    Less than
Health                 H.S.   H.S.  Jr.Co.   Col.  Grad.Sch.   Total
Excellent                69    227      20     82         37     435
Good                    156    403      26     77         34     696
Fair                    122    111       8     12          5     258
Poor                     50     16       0      3          0      69
Total                   397    757      54    174         76   1,458
58
Answers
1. The modal category on Y (self-described health) is "Good," with 696 responses. Therefore, the non-modal responses are 435 + 258 + 69 = 762. Summed over the categories of education (X), the non-modal responses total 241 + 354 + 28 + 92 + 39 = 754. Therefore,
λ = (762 − 754) / 762 = 0.010
2. Concordant pairs (ns) = 320,060 and discordant pairs (nd) = 130,272.
G = (320,060 − 130,272) / (320,060 + 130,272) = 189,788 / 450,332 = 0.421
3. Tied pairs (Tr) = 227,737. Therefore,
Somers' dyx = (320,060 − 130,272) / (320,060 + 130,272 + 227,737) = 189,788 / 678,069 = 0.280
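The three answers can be checked in one pass. This Python sketch re-declares small helpers (hypothetical names) so it runs on its own, and assumes, as the exercise does, that rows are Y ordered from the best health (top) to the worst and columns are X ordered from the least education (left) to the most.

```python
# Self-described health (rows, Y) by education (columns, X), from the exercise.
health = [
    [69, 227, 20, 82, 37],   # Excellent: <H.S., H.S., Jr.Co., Col., Grad.Sch.
    [156, 403, 26, 77, 34],  # Good
    [122, 111, 8, 12, 5],    # Fair
    [50, 16, 0, 3, 0],       # Poor
]


def lambda_y_given_x(table):
    """(non-modal on Y - summed non-modal within X categories) / non-modal on Y."""
    n = sum(map(sum, table))
    errors_ignoring_x = n - max(map(sum, table))
    errors_using_x = sum(sum(col) - max(col) for col in zip(*table))
    return (errors_ignoring_x - errors_using_x) / errors_ignoring_x


def pair_counts(table):
    """Return (ns, nd, Tr) from the below-left, below-right and same-row rules."""
    ns = nd = tr = 0
    for i, row in enumerate(table):
        rows_below = table[i + 1:]
        for j, cell in enumerate(row):
            ns += cell * sum(sum(r[:j]) for r in rows_below)
            nd += cell * sum(sum(r[j + 1:]) for r in rows_below)
            tr += cell * sum(row[j + 1:])
    return ns, nd, tr


ns, nd, tr = pair_counts(health)
print(f"lambda = {lambda_y_given_x(health):.3f}")  # lambda = 0.010
print(f"ns = {ns}, nd = {nd}, Tr = {tr}")           # ns = 320060, nd = 130272, Tr = 227737
print(f"G = {(ns - nd) / (ns + nd):.3f}")           # G = 0.421
print(f"dyx = {(ns - nd) / (ns + nd + tr):.3f}")    # dyx = 0.280
```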