Title: Lecture 2- Alternate Correlation Procedures
1. Lecture 2 - Alternate Correlation Procedures
- EPSY 640
- Texas A&M University
2. CORRELATION MEASURES FOR VARIOUS SCALES OF MEASUREMENT
(X = row variable, Y = column variable)
- X nominal (dichotomous), Y nominal (dichotomous): phi coefficient, Yule's Q, Pearson Association, Goodman's lambda, Tschuprow's T, tetrachoric
- X nominal (dichotomous), Y ordinal: rank-biserial
- X nominal (dichotomous), Y interval or ratio: point-biserial, biserial, Nagelkerke R-square (logistic)
- X nominal (polychotomous), Y nominal (polychotomous): Pearson Association, Tschuprow's T, Cramer's C
- X nominal (polychotomous), Y ordinal: reduce to dichotomous, or Kruskal-Wallis based statistic
- X nominal (polychotomous), Y interval or ratio: R-square
- X ordinal, Y ordinal: Spearman, Kendall's tau
- X ordinal, Y interval or ratio: R-squared
- X interval/ratio, Y interval or ratio: Pearson r
3. Dichotomous-Dichotomous Case: PHI COEFFICIENT
- The phi coefficient can be written as
- $r_{\phi} = (ad - bc) / [(a+c)(b+d)(a+b)(c+d)]^{1/2}$
- where a, b, c, d are the cell counts of the 2 x 2 table:

                 MINORITY STATUS
GENDER             a      b
                   c      d
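4. Dichotomous-Dichotomous Case: PHI COEFFICIENT

                 Political affiliation     Row total
Gender             7      2                  9
                   2     10                 12
Column Total       9     12                 21

- $r_{\phi} = (7 \times 10 - 2 \times 2) / [(7+2)(2+10)(7+2)(2+10)]^{1/2}$
- $= (70 - 4) / [9 \times 12 \times 9 \times 12]^{1/2}$
- $= 66/108 = .611$
- Pearson $r = .157 / (.5071 \times .5071) = .611$ (covariance divided by the product of the two standard deviations)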
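The phi computation can be checked with a few lines of Python. This is a minimal sketch; the function name `phi_coefficient` is illustrative and not part of the lecture, and the cell layout (rows `a, b` and `c, d`) is assumed to match the 2 x 2 table above.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table whose rows are (a, b) and (c, d)."""
    numerator = a * d - b * c
    # Product of the two column totals and the two row totals
    denominator = math.sqrt((a + c) * (b + d) * (a + b) * (c + d))
    return numerator / denominator

# Cell counts from the gender by political affiliation example
print(round(phi_coefficient(7, 2, 2, 10), 3))  # 0.611
```

Because phi is the Pearson correlation of two 0/1 variables, the same value results from applying the ordinary Pearson formula to the coded scores.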
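5. CHI-SQUARE
- $\chi^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} (n_{ij} - n_{i.} n_{.j}/n)^2 / (n_{i.} n_{.j}/n)$
- $= (7 - 81/21)^2/(81/21) + (2 - 108/21)^2/(108/21) + (2 - 108/21)^2/(108/21) + (10 - 144/21)^2/(144/21)$
- $= 2.561 + 1.921 + 1.921 + 1.440 = 7.843$
PEARSON ASSOCIATION
- $P = [\chi^2 / (\chi^2 + n)]^{1/2} = [7.843 / 28.843]^{1/2} = .521$
TSCHUPROW'S T
- $T = \{\chi^2 / (n [(r-1)(c-1)]^{1/2})\}^{1/2} = [7.843 / (21 \times (1 \times 1)^{1/2})]^{1/2} = .611$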
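The chi-square statistic and the two association measures built on it can be sketched as below. This is an illustration under stated assumptions rather than the lecture's own procedure: it uses the standard observed-versus-expected form of the Pearson chi-square, and the function names are hypothetical.

```python
import math

def chi_square(table):
    """Pearson chi-square for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            total += (observed - expected) ** 2 / expected
    return total, n

def pearson_association(chi2, n):
    """P = sqrt(chi2 / (chi2 + n))."""
    return math.sqrt(chi2 / (chi2 + n))

def tschuprow_t(chi2, n, r, c):
    """T = sqrt(chi2 / (n * sqrt((r - 1)(c - 1)))) for an r x c table."""
    return math.sqrt(chi2 / (n * math.sqrt((r - 1) * (c - 1))))

table = [[7, 2], [2, 10]]  # the 2 x 2 example table
chi2, n = chi_square(table)
print(round(chi2, 3))
print(round(pearson_association(chi2, n), 3))
print(round(tschuprow_t(chi2, n, 2, 2), 3))
```

For a 2 x 2 table, chi-square equals n times phi squared, so Tschuprow's T reduces to the absolute value of phi; that identity is a convenient check on hand calculations.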
6. SPSS Crosstabs procedure
- Select Analyze/ Descriptive Statistics/ Crosstabs
- Select Row and Column variables for the two nominal variables
- Select under Statistics the options you want, such as Chi Square and various Nominal association measures
9. TETRACHORIC ASSUMPTIONS: underlying normality of observed dichotomies
[Figure: two dichotomized variables, each scored 0 or 1, defining the four cells (1,1), (0,1), (1,0), (0,0)]
10. Table 3.4 Computation of tetrachoric correlation

        70     20
        20    100

- $n_{11} = 70$, $n_{12} = 20$, $n_{21} = 20$, $n_{22} = 100$
- $u_x$ = height of normal curve for proportion 90/210 = U(.4290)
- The z-score for .429 = -.18
- The $u_x$ for U(.4290) (requires stat table or SPSS procedure Pdf.Norm, evaluated at the z-score) = .3925
- $u_y$ = height of normal curve for proportion 90/210 = U(.4290) = .3925
- $r_{tet} = (70 \times 100 - 20 \times 20) / (.3925 \times .3925 \times 210^2)$
- $= 6600 / (.1541 \times 210^2) = .971$
- not a good estimate!
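The approximation in Table 3.4 can be sketched in Python using only the standard library. The sketch assumes the normal-curve height is evaluated at the z-score that cuts off the marginal proportion (here about -.18); the function name `tetrachoric_approx` is hypothetical.

```python
from statistics import NormalDist

def tetrachoric_approx(n11, n12, n21, n22):
    """First-order approximation (n11*n22 - n12*n21) / (u_x * u_y * n^2)."""
    n = n11 + n12 + n21 + n22
    std_normal = NormalDist()  # mean 0, standard deviation 1
    p_x = (n11 + n12) / n      # marginal proportion for the row variable
    p_y = (n11 + n21) / n      # marginal proportion for the column variable
    # Height of the normal curve at the z-score for each proportion
    u_x = std_normal.pdf(std_normal.inv_cdf(p_x))
    u_y = std_normal.pdf(std_normal.inv_cdf(p_y))
    return (n11 * n22 - n12 * n21) / (u_x * u_y * n ** 2)

print(round(tetrachoric_approx(70, 20, 20, 100), 3))
```

The formula is only a rough approximation and can return values far above the phi coefficient for the same table, which is why iterative estimates are preferred in practice.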
11. [Figure: NORMAL DENSITY HEIGHT]
12. [Figure: NORMAL DENSITY HEIGHT]
13. POINT-BISERIAL CORRELATION
- Y = score on interval measure (e.g., test score)
- x = 0 or 1 (grouping, e.g., gender)
- $r_{pb} = [(\bar{Y}_{1.} - \bar{Y}_{0.}) / s_y] \sqrt{n_1 n_0 / [n(n-1)]}$
14. Descriptive Statistics

                         Mean     SD      N    Covariance
GENDER                   .4667    .51     15   1.846
READING COMPREHENSION
  0 (boys)               68.26    18.39    8
  1 (girls)              75.19    11.11    7
  Total                  71.49    15.32   15

- $r_{pb} = [(75.19 - 68.26) / 15.32] \sqrt{8 \times 7 / (15 \times 14)} = .233$
Table 3.5 Calculation of point-biserial correlation coefficient for First Grade reading comprehension of boys and girls
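The calculation in Table 3.5 can be reproduced from the summary statistics alone. The following is a small sketch; the function name and argument order are illustrative choices, not part of the lecture.

```python
import math

def point_biserial(mean1, mean0, s_y, n1, n0):
    """Point-biserial r from group means, the total SD of Y, and group sizes."""
    n = n1 + n0
    return (mean1 - mean0) / s_y * math.sqrt(n1 * n0 / (n * (n - 1)))

# Girls (x = 1): mean 75.19, n = 7; boys (x = 0): mean 68.26, n = 8
r_pb = point_biserial(75.19, 68.26, 15.32, 7, 8)
print(f"{r_pb:.2f}")  # 0.23
```

The same value follows from the covariance route: the gender by reading covariance divided by the product of the two standard deviations.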
15. POINT BISERIAL CORRELATION
[Figure: Y scores (each marked X) plotted for the F and M groups along the X axis, with m marking the means]
16. Dichotomous (Normal)-Interval Case
- biserial correlation
- $r_{bis} = [(\bar{Y}_{1.} - \bar{Y}_{0.}) / s_y] \cdot n_1 n_0 / (u_x n^2)$,
- where $u_x$ = height of normal curve for proportion $n_1/(n_0 + n_1)$
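17. Biserial correlation for the reading comprehension example
- $r_{bis} = [(\bar{Y}_{1.} - \bar{Y}_{0.}) / s_y] \cdot n_1 n_0 / (u_x n^2)$
- $= [(75.19 - 68.26) / 15.32] \times (8 \times 7) / (.3975 \times 15^2)$
- $= .283$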
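A sketch of the biserial formula in Python follows. It assumes the normal-curve height is taken at the z-score that cuts off the proportion n1/(n0 + n1); the function name `biserial` is hypothetical.

```python
from statistics import NormalDist

def biserial(mean1, mean0, s_y, n1, n0):
    """Biserial r: mean difference over s_y, times n1*n0 / (u * n^2)."""
    n = n1 + n0
    std_normal = NormalDist()
    # Height of the standard normal curve at the cut point for proportion n1/n
    u = std_normal.pdf(std_normal.inv_cdf(n1 / n))
    return (mean1 - mean0) / s_y * n1 * n0 / (u * n ** 2)

# Same reading comprehension summary statistics as the point-biserial example
print(round(biserial(75.19, 68.26, 15.32, 7, 8), 3))
```

The biserial value is larger than the point-biserial for the same data because it estimates the correlation with the normal variable assumed to underlie the dichotomy.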
18. BISERIAL CORRELATION
[Figure: Y scores (each marked X) plotted for the F and M groups along the X axis, with m marking the means]
19. RANK-RANK DATA
- 1. DATA ARE INTERVAL OR RATIO
- Transformed to ranks because of odd distribution
- 2. DATA ARE ORDINAL, NO INTERVAL INFORMATION AVAILABLE
- USE SPEARMAN CORRELATION (Pearson formula used on the ranks; no ties assumed)
20. Rank distribution of real estate price per square foot in Manhattan
[Figure: rank of price per foot (vertical axis) plotted against location from Battery to Central Park (horizontal axis); the relative positions of the ranks in the plot are only approximate, but all are ordered correctly]
This results in the following ranks:
rank location: 17 20 23 18 7 1 8 21 23 14 24 6 11 5 3 12 9 19 4 16 10 22 2 15 27 26 13
rank price:    8 4 1 6 12 21 22 2 5 9 3 17 24.5 27 19.5 11 19.5 10 13 16 15 7 26 24.5 18 14 23
Computation of rank correlation coefficient:
- $r_{rank} = s_{xy} / (s_x s_y) = -.647$
- $r_{Spearman} = -.640$ (from SPSS, Version 13)
Table 3.7 Computation of rank correlation for Real Estate location in Manhattan with Price Per Square Foot
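The idea of a rank correlation as the Pearson formula applied to ranks can be sketched as follows. The data in the example are hypothetical (they are not the Manhattan values), and tied scores are given the average of their positions, as with the 19.5 and 24.5 ranks in the table.

```python
import math

def average_ranks(values):
    """Rank from 1 (smallest); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson r = s_xy / (s_x * s_y), written with sums of squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

def rank_correlation(x, y):
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: distance from the Battery and price per square foot
distance = [1, 5, 9, 14, 20, 27]
price = [900, 950, 700, 700, 520, 480]
print(round(rank_correlation(distance, price), 3))
```

With no ties this equals the usual Spearman shortcut formula; with ties, applying the Pearson formula to average ranks is the safer route.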
21. Least squares estimation
- The best estimate is the one for which the sum of squared differences between each score and the estimate is the smallest among all possible linear unbiased estimates (BLUE, or best linear unbiased estimate).
22. Least squares estimation
- The $e_i$ are errors or disturbances. They represent in this case the part of the y score not predictable from x:
- $e_i = y_i - b_1 x_i - b_0$
- The sum of squares for errors follows:
- $SS_e = \sum_{i=1}^{n} e_i^2$
23. [Figure: scatterplot of y against x with the error e marked for each point; $SS_e = \sum e_i^2$]
24. Matrix representation of least squares estimation
- We can represent the regression model in matrix form:
- $y = X\beta + e$
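25. Matrix representation of least squares estimation
- $y = X\beta + e$
$$
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ \vdots \end{bmatrix}
=
\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \\ 1 & x_4 \\ \vdots & \vdots \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}
+
\begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \\ \vdots \end{bmatrix}
$$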
26. Matrix representation of least squares estimation
- $y = Xb + e$
- The least squares criterion is satisfied by the following matrix equation:
- $b = (X'X)^{-1} X'y$
- The term $X'$ is called the transpose of the X matrix. It is the matrix turned on its side.
- When $X'X$ is multiplied together, the result is a 2 x 2 matrix:
$$
X'X = \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}
$$
- Note all information here: sample size, mean (sum of scores), variance (squared scores)
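For one predictor, the matrix equation can be worked out directly from the entries of the 2 x 2 matrix. The sketch below is illustrative only: it inverts X'X by hand rather than using a matrix library, and the function name and data are hypothetical.

```python
def least_squares_fit(x, y):
    """Solve b = (X'X)^-1 X'y for a single predictor plus intercept."""
    n = len(x)
    sum_x = sum(x)
    sum_x2 = sum(v * v for v in x)
    sum_y = sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    # X'X = [[n, sum_x], [sum_x, sum_x2]] and X'y = [sum_y, sum_xy]
    det = n * sum_x2 - sum_x ** 2
    # Inverse of a 2 x 2 matrix: swap the diagonal, negate the off-diagonal, divide by det
    b0 = (sum_x2 * sum_y - sum_x * sum_xy) / det
    b1 = (n * sum_xy - sum_x * sum_y) / det
    return b0, b1

# Hypothetical scores lying exactly on y = 2x + 1
b0, b1 = least_squares_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # 1.0 2.0
```

The determinant is zero only when every x value is identical, in which case no slope can be estimated.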
27. SUMS OF SQUARES: computational equivalents
- $SS_e = (n - 2) s_e^2$
- $SS_{reg} = \sum (b_1 x_i + b_0 - \bar{y}_{.})^2$
- $SS_y = SS_{reg} + SS_e$
28. SUMS OF SQUARES: Venn Diagram
[Figure: overlapping circles SSy and SSx; the overlap is SSreg and the remainder of SSy is SSe]
Fig. 8.3 Venn diagram for linear regression with one predictor and one outcome measure
29. SUMS OF SQUARES: ANOVA Table

SOURCE   df     Sum of Squares   Mean Square      F
x        1      SSreg            SSreg / 1        (SSreg / 1) / (SSe / (n-2))
e        n-2    SSe              SSe / (n-2)
Total    n-1    SSy              SSy / (n-1)

Table 8.1 Regression table for Sums of Squares
30. Rupley and Willson (1997) studied the relationship between word recognition and reading comprehension for 200 six- and seven-year-olds using a national sample of students that mirrored the U.S. census of 1980. The mean for Word Recognition was 100, SD = 15, and the mean for Reading Comprehension was 23.16, SD = 14.74. The regression analysis is reported in the table below.

Dep. Var.: Reading Comprehension

SOURCE             df     Sum of Squares   Mean Square   F        Prob.   R2
Word recognition   1      34316.55         34316.55      763.17   .001    .794
error              198    8903.23          44.97
total              199    43219.78

se = 6.71
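The derived entries of the regression table follow from the two sums of squares and the sample size. A minimal sketch, with a hypothetical function name and dictionary keys:

```python
import math

def regression_anova(ss_reg, ss_e, n):
    """Mean squares, F, R-squared, and standard error for one predictor."""
    df_reg, df_e = 1, n - 2
    ms_reg = ss_reg / df_reg
    ms_e = ss_e / df_e
    return {
        "MS_reg": ms_reg,
        "MS_e": ms_e,
        "F": ms_reg / ms_e,
        "R2": ss_reg / (ss_reg + ss_e),  # SSy = SSreg + SSe
        "se": math.sqrt(ms_e),           # standard error of estimate
    }

# Sums of squares from the Rupley and Willson (1997) table, n = 200
summary = regression_anova(34316.55, 8903.23, 200)
for name, value in summary.items():
    print(name, round(value, 3))
```

Rounded to the precision of the table, these reproduce the reported mean square error of 44.97, F of 763.17, R2 of .794, and se of 6.71.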
31. Two variable linear regression: Which direction?
- Regression equations
- $y = {}_{x}b_1 x + {}_{x}b_0$
- $x = {}_{y}b_1 y + {}_{y}b_0$
- Regression coefficients
- ${}_{x}b_1 = r_{xy} s_y / s_x$
- ${}_{y}b_1 = r_{xy} s_x / s_y$
32. Two variable linear regression
- $y = b_1 x + b_0$
- If the correlation coefficient is calculated, then $b_1$ can be calculated from the equation above:
- $b_1 = r_{xy} s_y / s_x$
- The intercept, $b_0$, follows by placing the means for x and y into the equation above and solving:
- $b_0 = \bar{y}_{.} - (r_{xy} s_y / s_x) \bar{x}_{.}$
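The slope and intercept formulas can be applied in either direction from summary statistics alone. The sketch below uses the Rupley and Willson means and standard deviations; the correlation is an assumption, taken as the positive square root of the reported R2 of .794, and the function name is hypothetical.

```python
import math

def slope_and_intercept(r, mean_pred, sd_pred, mean_out, sd_out):
    """Regression of the outcome on the predictor from summary statistics."""
    b1 = r * sd_out / sd_pred
    b0 = mean_out - b1 * mean_pred
    return b1, b0

# Assumption: r is the positive square root of R2 = .794
r = math.sqrt(0.794)

# y = reading comprehension (mean 23.16, SD 14.74), x = word recognition (mean 100, SD 15)
b1_y_on_x, b0_y_on_x = slope_and_intercept(r, 100, 15, 23.16, 14.74)
b1_x_on_y, b0_x_on_y = slope_and_intercept(r, 23.16, 14.74, 100, 15)

print(round(b1_y_on_x, 3), round(b0_y_on_x, 2))
print(round(b1_x_on_y, 3), round(b0_x_on_y, 2))
```

The two slopes differ unless the standard deviations are equal, but their product is always the squared correlation, which is one way to see that the direction of prediction does not change r.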
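33. Two variable linear regression
[Figure: two regression lines on x-y axes, one with slope ${}_{y}b_1 = r_{xy} s_x / s_y$ and one with slope ${}_{x}b_1 = r_{xy} s_y / s_x$]
Fig. 8.1 Slopes for two regression representations of Pearson correlation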
34. Three variable linear regression
- $y = b_1 x_1 + b_2 x_2 + b_0$
- Two predictors: all variables may be correlated with each other
- Exact equations exist to compute $b_1$, $b_2$, but not for more than two predictors, where the matrix form of the normal equations must be used
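The exact two-predictor equations are not written out on the slide. The sketch below uses the standard closed-form expressions in terms of the three correlations, which is an assumption about what is meant; the function name and the example values are hypothetical.

```python
def two_predictor_coefficients(r_y1, r_y2, r_12, s_y, s_1, s_2,
                               mean_y, mean_1, mean_2):
    """Closed-form b1, b2, b0 for y = b1*x1 + b2*x2 + b0."""
    denom = 1 - r_12 ** 2  # zero only if x1 and x2 are perfectly correlated
    # Standardized weights
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    # Rescale to raw-score units
    b1 = beta_1 * s_y / s_1
    b2 = beta_2 * s_y / s_2
    b0 = mean_y - b1 * mean_1 - b2 * mean_2
    return b1, b2, b0

# Hypothetical correlations, standard deviations, and means
b1, b2, b0 = two_predictor_coefficients(0.5, 0.4, 0.3, 10, 5, 4, 50, 20, 30)
print(round(b1, 3), round(b2, 3), round(b0, 3))
```

When the two predictors are uncorrelated (r12 = 0), each weight reduces to the single-predictor slope, r times the ratio of standard deviations.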
35. Three variable linear regression
- Path model representation: unstandardized
[Path diagram: x1 to y with coefficient b1, x2 to y with coefficient b2, e to y, and a path linking x1 and x2]
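36. Three variable linear regression
- Path model representation: standardized
[Path diagram: x1 to y with coefficient $\beta_1$, x2 to y with coefficient $\beta_2$, e to y, and correlation $r_{12}$ between x1 and x2]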
37. SUMS OF SQUARES: Venn Diagram
[Figure: circles SSx1, SSx2, and SSy; the part of SSy overlapped by the predictors is SSreg and the remainder is SSe]
Fig. 8.3 Venn diagram for linear regression with two predictors and one outcome measure