Title: Correlation
1Correlation
2Two variables: Which test?
                 X categorical          X numerical
Y categorical    Contingency analysis   Logistic regression
Y numerical      t-test                 Correlation, Regression
4Relationship Between Two Numerical Variables
8Correlation
- What is the tendency of two numerical variables
to co-vary (change together)?
- Correlation coefficient r measures the strength
and direction of the linear association between
two numerical variables
- Population parameter ρ (rho)
- Sample estimate r
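10Sum of squares X and Y
$\sum_i (X_i - \bar{X})^2$ and $\sum_i (Y_i - \bar{Y})^2$
11Sum of products
$\sum_i (X_i - \bar{X})(Y_i - \bar{Y})$
12Shortcuts
$\sum_i (X_i - \bar{X})^2 = \sum_i X_i^2 - \frac{(\sum_i X_i)^2}{n}$
$\sum_i (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_i X_i Y_i - \frac{(\sum_i X_i)(\sum_i Y_i)}{n}$
13r
$r = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_i (X_i - \bar{X})^2 \, \sum_i (Y_i - \bar{Y})^2}}$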
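As a minimal sketch (not part of the original slides), the sums of squares, the sum of products, the shortcut form, and r could be computed as follows. The function names and the five-point data set are hypothetical, chosen only for illustration.

```python
import math

def sums_of_squares_and_products(x, y):
    """Return (SS_X, SS_Y, SP_XY) using deviations from the sample means."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return ss_x, ss_y, sp_xy

def sum_of_squares_shortcut(values):
    """Shortcut form: sum of X^2 minus (sum of X)^2 / n."""
    n = len(values)
    return sum(v * v for v in values) - sum(values) ** 2 / n

def correlation_coefficient(sp_xy, ss_x, ss_y):
    """r = sum of products / sqrt(SS_X * SS_Y)."""
    return sp_xy / math.sqrt(ss_x * ss_y)

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

ss_x, ss_y, sp_xy = sums_of_squares_and_products(x, y)
r = correlation_coefficient(sp_xy, ss_x, ss_y)
print(f"SS_X = {ss_x}, SS_Y = {ss_y}, SP = {sp_xy}, r = {r:.3f}")
print(f"SS_X via shortcut = {sum_of_squares_shortcut(x)}")
```

The shortcut and the deviation form give the same sum of squares; the shortcut is mainly convenient for hand calculation.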
15Correlation assumes...
- Random sample
- X is normally distributed with equal variance for
all values of Y
- Y is normally distributed with equal variance for
all values of X
Bivariate normal distribution
18Correlation coefficient facts
- -1 ≤ ρ ≤ 1 and -1 ≤ r ≤ 1
- Positive r: variables increase together
- Negative r: when one variable increases, the
other decreases, and vice-versa
[Figure: three scatter plots labelled uncorrelated (r = 0), positive (r = 1), and negative (r = -1)]
19Correlation coefficient facts
- Coefficient of determination: r²
- Describes the proportion of variation in one
variable that can be predicted from the other
20Standard error of r
$SE_r = \sqrt{\frac{1 - r^2}{n - 2}}$
21Confidence Limits for r
22Example
- Are the effects of new mutations on mating
success and productivity correlated?
- Data from Drosophila melanogaster
- n = 31 individuals
23X is productivity, Y is the mating success
- Sum of products: 2.796
- Sum of squares for X: 16.245
- Sum of squares for Y: 1.6289
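A short sketch of how r, the coefficient of determination, and the standard error of r might be obtained from these summary values. The variable names are illustrative, and the standard-error expression sqrt((1 - r^2) / (n - 2)) is the usual textbook form, assumed here rather than copied from the slide.

```python
import math

# Summary values quoted for the Drosophila melanogaster example.
sum_of_products = 2.796
ss_x = 16.245   # sum of squares for X (productivity)
ss_y = 1.6289   # sum of squares for Y (mating success)
n = 31

# r = sum of products / sqrt(SS_X * SS_Y)
r = sum_of_products / math.sqrt(ss_x * ss_y)

# Coefficient of determination and standard error of r (assumed textbook form).
r_squared = r ** 2
se_r = math.sqrt((1 - r_squared) / (n - 2))

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}, SE_r = {se_r:.3f}")
```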
27Confidence Limits for r
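A common route to approximate confidence limits for ρ, assumed here rather than taken from the slides, is Fisher's z-transformation. The symbol ζ for the transformed population parameter and the 1.96 multiplier (for 95% limits) are conventions of this sketch.

```latex
% Step 1: transform r and compute the standard error of z
z = \frac{1}{2}\ln\!\left(\frac{1 + r}{1 - r}\right),
\qquad
\sigma_z = \sqrt{\frac{1}{n - 3}}

% Step 2: approximate 95% confidence limits on the transformed scale
z - 1.96\,\sigma_z \;<\; \zeta \;<\; z + 1.96\,\sigma_z

% Step 3: back-transform each limit to the correlation scale
\rho = \frac{e^{2\zeta} - 1}{e^{2\zeta} + 1}
```

Because the back-transformation is nonlinear, the resulting interval is generally not symmetric around r.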
33Example: Why Sleep?
- 10 experimental subjects
- Measured increase in slow-wave activity during
sleep
- Measured improvement in task after sleep
(a hand-eye coordination activity)
36Why sleep?
- Sum of products: 1127.4
- Sum of squares X: 2052.4
- Sum of squares Y: 830.9
- Calculate a 95% C.I. for ρ
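One possible way to work this exercise is sketched below. It assumes the Fisher z-transformation approach to confidence limits and n = 10 subjects; the function name `fisher_confidence_limits` is hypothetical.

```python
import math

def fisher_confidence_limits(r, n, z_crit=1.96):
    """Approximate confidence limits for rho via Fisher's z-transformation."""
    z = math.atanh(r)                  # equals 0.5 * ln((1 + r) / (1 - r))
    sigma_z = math.sqrt(1 / (n - 3))   # standard error on the z scale
    lower = math.tanh(z - z_crit * sigma_z)   # tanh back-transforms to r scale
    upper = math.tanh(z + z_crit * sigma_z)
    return lower, upper

# Summary values quoted for the sleep example.
sum_of_products = 1127.4
ss_x = 2052.4
ss_y = 830.9
n = 10  # experimental subjects

r = sum_of_products / math.sqrt(ss_x * ss_y)
lower, upper = fisher_confidence_limits(r, n)
print(f"r = {r:.3f}, approximate 95% limits for rho: ({lower:.3f}, {upper:.3f})")
```

With only 10 subjects the interval is wide, which is worth keeping in mind when interpreting a large sample r.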
37Hypothesis Testing for Correlations
- Can test hypotheses relating to correlations
among variables
- Closely related to regression, the topic for
next Tuesday's lecture
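39If ρ = 0,...
r is approximately normally distributed with mean 0, and
$t = r / SE_r$, where $SE_r = \sqrt{(1 - r^2)/(n - 2)}$,
follows a t distribution with df = n - 2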
40Example
- Are the effects of new mutations on mating
success and productivity correlated?
- Data from Drosophila melanogaster
41Hypotheses
- H0: Mating success and productivity are not
related (ρ = 0)
- HA: Mating success and productivity are
correlated (ρ ≠ 0)
42X is productivity, Y is the mating success
- Sum of products: 2.796
- Sum of squares for X: 16.245
- Sum of squares for Y: 1.6289
45df = n - 2 = 31 - 2 = 29
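The test for this example could be sketched as follows, assuming the usual statistic t = r / SE_r with SE_r = sqrt((1 - r^2) / (n - 2)). The critical value 2.045 is the two-tailed 5% point of the t distribution with 29 d.f. from a standard table, not a number given on the slides.

```python
import math

# Summary values quoted for the Drosophila melanogaster example.
sum_of_products = 2.796
ss_x = 16.245   # productivity
ss_y = 1.6289   # mating success
n = 31

r = sum_of_products / math.sqrt(ss_x * ss_y)
se_r = math.sqrt((1 - r ** 2) / (n - 2))   # assumed textbook form of SE_r
t = r / se_r
df = n - 2

# Two-tailed critical value for alpha = 0.05 and 29 d.f. (standard t table).
t_critical = 2.045

reject_h0 = abs(t) > t_critical
print(f"r = {r:.3f}, t = {t:.2f}, df = {df}, reject H0: {reject_h0}")
```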
47Why sleep?
- Sum of products: 1127.4
- Sum of squares X: 2052.4
- Sum of squares Y: 830.9
- Test for a correlation different from zero in
these data.
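A sketch of one way to carry out this test with only the Python standard library. It assumes n = 10 subjects and the statistic t = r / SE_r, and it approximates the two-tailed P-value by integrating the t density numerically with Simpson's rule; the helper names `t_pdf` and `two_tailed_p` are hypothetical.

```python
import math

def t_pdf(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed P-value: 1 minus twice the area under the density from 0 to |t|.

    The area is approximated with Simpson's rule; steps must be even.
    """
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    area = total * h / 3
    return 1 - 2 * area

# Summary values quoted for the sleep example.
sum_of_products = 1127.4
ss_x = 2052.4
ss_y = 830.9
n = 10

r = sum_of_products / math.sqrt(ss_x * ss_y)
se_r = math.sqrt((1 - r ** 2) / (n - 2))
t = r / se_r
df = n - 2
p_value = two_tailed_p(t, df)

print(f"r = {r:.3f}, t = {t:.2f}, df = {df}, two-tailed P = {p_value:.4f}")
```

In practice a statistics library would normally supply the P-value; the hand-rolled integration is used here only to keep the sketch self-contained.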
48Checking Assumptions for Correlation
- Bivariate normal distribution
- Relationship is linear (straight line)
- Cloud of points in scatter plot is circular or
elliptical
- Frequency distributions of X and Y are normal
49Linear Relationship?
53Maximum correlation possible
Correlation of zero
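To illustrate why linearity matters, the sketch below uses hypothetical data (not the data behind the slide figures): a perfectly linear relationship gives the maximum correlation, while a perfectly predictable but curved relationship can give a correlation of zero. The function name `pearson_r` is illustrative.

```python
import math

def pearson_r(x, y):
    """Correlation coefficient from deviations about the sample means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical X values, symmetric about zero.
x = [-3, -2, -1, 0, 1, 2, 3]

y_linear = [2 * xi + 1 for xi in x]   # exact straight line
y_curved = [xi ** 2 for xi in x]      # exact parabola, no linear trend

r_linear = pearson_r(x, y_linear)
r_curved = pearson_r(x, y_curved)
print(f"linear: r = {r_linear:.3f}, curved: r = {r_curved:.3f}")
```

In the curved case Y is completely determined by X, yet r measures only the linear association, so a scatter plot should be inspected before interpreting r.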
54Cloud of points elliptical?
55X and Y normal?
- Use usual techniques for both X and Y separately
- Be wary of outliers
56Quick Reference Guide - Correlation Coefficient
- What is it for? Measuring the strength of a
linear association between two numerical
variables
- What does it assume? Bivariate normality and
random sampling
- Parameter: ρ
- Estimate: r
- Formulae: $r = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_i (X_i - \bar{X})^2 \, \sum_i (Y_i - \bar{Y})^2}}$
57Quick Reference Guide - t-test for zero linear
correlation
- What is it for? To test the null hypothesis that
the population parameter, ρ, is zero
- What does it assume? Bivariate normality and
random sampling
- Test statistic: t
- Null distribution: t with n-2 degrees of freedom
- Formulae: $t = r / SE_r$, where $SE_r = \sqrt{(1 - r^2)/(n - 2)}$
58T-test for correlation
Null hypothesis: ρ = 0
Sample
Test statistic
Null distribution: t with n-2 d.f.
Compare: How unusual is this test statistic?
P < 0.05: Reject H0
P > 0.05: Fail to reject H0