Title: Lab 8
1Lab 8
- Bivariate Correlation and Regression
2Overview
- What is a Correlation?
- Bivariate Correlation in SPSS
- Comparing Correlations
- What is a Regression?
- Bivariate Regression in SPSS
- Assignment 8
3What is a Correlation?
4What is a Correlation?
- There are two general approaches to research in psychology
- Mean (group) differences
- Typically, experiments (with manipulations) analyzed using ANOVA
- How do the IVs cause changes on the DV?
- Individual differences
- Typically, multiple measures taken from a set of individuals and analyzed using correlations or regression
- How do these variables covary?
5What is a Correlation?
[Figure: 2 x 2 grid crossing "Correlation?" (Yes / No) with "Mean Difference?" (Yes / No)]
6What is a Correlation?
- The math underlying both approaches is the same
- The General Linear Model (GLM) can be used for both ANOVA and regression
- Regression is much better for continuous predictors, and it can also use categorical IVs
- Any analysis using ANOVA can be conducted equivalently using regression (but not vice versa)
- SPSS actually runs regressions when you ask it to perform ANOVA
7What is a Correlation?
- In the individual differences approach, the basic data of interest are pairs of observations from all the individuals in a sample
- For example, you might measure self-esteem and socio-economic status for all people in a given sample
- Note that there are no manipulations involved here; you are only collecting a set of measurements (what you would call DVs in the mean differences approach)
8What is a Correlation?
- The heart of the GLM is variance (s²)
- Variance is (roughly) the average squared departure of observations from the group mean
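As a rough illustration of this definition, the sketch below computes a sample variance by hand in Python. The scores are made-up values, and dividing by N - 1 rather than N is an assumption about which variant the lab uses.

```python
from statistics import mean, variance

def sample_variance(scores):
    """Average squared departure from the mean, using N - 1 in the denominator."""
    m = mean(scores)
    squared_departures = [(x - m) ** 2 for x in scores]
    return sum(squared_departures) / (len(scores) - 1)

scores = [2, 4, 4, 5, 7, 8]  # hypothetical data
s2 = sample_variance(scores)

# Cross-check against the standard library's sample variance
assert abs(s2 - variance(scores)) < 1e-9
print(f"s^2 = {s2:.2f}")
```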
9What is a Correlation?
- A closely related measure is covariance (cov)
- Covariance is (roughly) the shared variation between two variables about their respective means
- The higher the covariance, the more closely the two variables track each other
10What is a Correlation?
- Correlation (r) is basically standardized covariance
- That is, a measure of how well two variables track each other (or covary) in standard units
$$r = \frac{cov_{XY}}{s_X \, s_Y}$$
- Numerator: covariance (roughly)
- Denominator: standardized by dividing by the standard deviations of the two variables
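The idea of correlation as standardized covariance can be sketched in a few lines of Python. The data and the helper names (`covariance`, `correlation`) are hypothetical, chosen only to show that dividing by the two standard deviations removes the units of measurement.

```python
from math import sqrt
from statistics import mean

def covariance(xs, ys):
    """Sum of cross-products of deviations from the means, divided by N - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Covariance standardized by the two standard deviations."""
    sx = sqrt(covariance(xs, xs))  # covariance of a variable with itself is its variance
    sy = sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sx * sy)

x = [1, 2, 3, 4, 5]  # hypothetical scores on one measure
y = [2, 1, 4, 3, 5]  # hypothetical scores on a second measure

r = correlation(x, y)

# Rescaling x changes the covariance but leaves r untouched
x_rescaled = [10 * v for v in x]
cov_raw = covariance(x, y)
cov_rescaled = covariance(x_rescaled, y)
r_rescaled = correlation(x_rescaled, y)
print(f"cov = {cov_raw:.2f}, rescaled cov = {cov_rescaled:.2f}, r = {r:.2f}")
```

Because the covariance depends on the measurement scale while r does not, r is the more convenient index for comparing relationships across different pairs of variables.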
11What is a Correlation?
- Another way to think about this ratio is as a part out of the whole
- In other words, the correlation is the proportion of observed covariance (tracking) between two variables out of the maximum possible covariance (determined by the variances of the individual variables)
- Thus, the size (or magnitude) of a correlation can only vary from 0 to 1
- That is, from no tracking at all (r = 0) to perfect tracking (r = 1)
12What is a Correlation?
- In addition to the magnitude, the correlation coefficient r also provides information about the direction of covariation
- Negative values (between -1 and 0) indicate an inverse relationship: As one goes up, the other goes down
- A value at or close to 0 indicates no relationship: The two variables don't seem to track each other
- Positive values (between 0 and 1) indicate a direct relationship: As one goes up, the other goes up
13What is a Correlation?
- As you can see, the more variance two variables share in common (i.e., the higher the covariance), the stronger the correlation
- When one variable starts to vary without the other, it makes the pattern noisier, or cloudier, in the scatterplot
14What is a Correlation?
- The square of the correlation coefficient (r²) gives the proportion of variance accounted for by the linear relationship (covariation) between the two variables
- In other words, r² is the ratio of predicted variance to observed variance, which is the proportion of variance accounted for by the prediction
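A small numerical check may make the "predicted variance over observed variance" reading concrete. The sketch below uses made-up data and a least-squares line (regression is covered later in this lab) to show that the two quantities coincide.

```python
from statistics import mean

x = [1, 2, 3, 4, 5]  # hypothetical predictor scores
y = [2, 1, 4, 3, 5]  # hypothetical criterion scores
mx, my = mean(x), mean(y)

# Sums of squares and cross-products about the means
ss_x = sum((xi - mx) ** 2 for xi in x)
ss_y = sum((yi - my) ** 2 for yi in y)
sp_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

r = sp_xy / (ss_x * ss_y) ** 0.5

# Predictions from the least-squares line through the data
b = sp_xy / ss_x
a = my - b * mx
predicted = [a + b * xi for xi in x]

# Ratio of predicted variation to observed variation
ss_predicted = sum((p - my) ** 2 for p in predicted)
proportion = ss_predicted / ss_y
print(f"r^2 = {r ** 2:.2f}, predicted / observed = {proportion:.2f}")
```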
15What is a Correlation?
- The null hypothesis for testing a correlation is that the correlation in the population (ρ, or rho) is zero
- Like an F-test in ANOVA, the significance test is a ratio of the proportion of variance accounted for to the proportion of variance not accounted for (error)
- Note that significance depends on the size of both r and N
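To see how the test statistic depends on both r and N, here is a minimal sketch using the standard t and F forms of the test for a single correlation. The function name `correlation_test` and the example values are hypothetical; looking up the p value from the t or F distribution is left to SPSS or a table.

```python
from math import sqrt

def correlation_test(r, n):
    """Return (t, F, df) for testing H0: rho = 0 with a sample of size n."""
    df = n - 2
    explained = r ** 2        # proportion of variance accounted for
    unexplained = 1 - r ** 2  # proportion of variance not accounted for
    f_ratio = (explained / 1) / (unexplained / df)
    t = r * sqrt(df) / sqrt(unexplained)
    return t, f_ratio, df

t_small, f_small, df_small = correlation_test(0.30, 20)   # hypothetical r and N
t_large, f_large, df_large = correlation_test(0.30, 200)  # same r, larger N
print(f"N = 20:  t({df_small}) = {t_small:.2f}, F(1, {df_small}) = {f_small:.2f}")
print(f"N = 200: t({df_large}) = {t_large:.2f}, F(1, {df_large}) = {f_large:.2f}")
```

The same r produces a larger test statistic with the larger sample, which is why a modest correlation can be significant in a big sample and nonsignificant in a small one.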
16Bivariate Correlation in SPSS
17Bivariate Correlation in SPSS
- Let's say you were interested in studying depression in Chinese immigrants to Canada
- Collect data for several individual difference measures from a sample of Chinese immigrants
- Length of time in Canada
- English proficiency
- Fit with Canadian culture
- Amount of stress
- Depression
18Bivariate Correlation in SPSS
[Figure labels: Length of time in Canada; English proficiency; Fit with Canadian culture; Amount of stress; Depression]
19Bivariate Correlation in SPSS
20Bivariate Correlation in SPSS
- The correlation matrix is symmetrical; the information below the diagonal is redundant
- SPSS reports two-tailed significance tests
- For a one-tailed test, divide the sig. value in half
- For correlations, df = N - 2
21Bivariate Correlation in SPSS
- To report the correlation between fit with Canadian culture and depression
- r(99) = -.34, p < .001
- To calculate the proportion of variance in depression accounted for by fit with Canadian culture
- r² = (-.34)² = .12
- In other words, fit with Canadian culture explains 12% of the variance in depression (or vice versa)
- Note that direction of causality is open to question
22Comparing Correlations
23Comparing Correlations
- There are a number of cases in which you might want to compare different correlations for significant differences
- r_xy from sample 1 to r_xy from sample 2?
- r_xy to r_xz from one sample?
- See pp. 209-213
24Comparing Correlations
- For example, is the correlation between length of time in Canada (V1) and English proficiency (V2) significantly different from the correlation between length of time in Canada (V1) and fit with Canadian culture (V3)?
- r12 = .46
- r13 = .42
- r23 = .79
25Comparing Correlations
- Enter these three correlations into the (very long) Dunn and Clark equation on p. 209 to calculate a z-score
- z = .50
- Critical z at the .05 level is 1.96, which we did not exceed, so these two correlations (r12 and r13) are not significantly different
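For readers without the textbook at hand, the sketch below implements one commonly cited form of Dunn and Clark's z for two dependent correlations that share a variable; the version on p. 209 may be arranged differently. The function name `dunn_clark_z` is hypothetical, and the sample size is an assumed argument because the slides do not state N for this comparison, so the value returned will generally not match the z reported above.

```python
from math import atanh, sqrt

def dunn_clark_z(r12, r13, r23, n):
    """z for H0: rho12 = rho13, two correlations sharing variable 1 in one sample."""
    # Covariance between the two Fisher-transformed correlations
    psi = (r23 * (1 - r12 ** 2 - r13 ** 2)
           - 0.5 * r12 * r13 * (1 - r12 ** 2 - r13 ** 2 - r23 ** 2))
    c = psi / ((1 - r12 ** 2) * (1 - r13 ** 2))
    # atanh is Fisher's r-to-z transformation
    return (atanh(r12) - atanh(r13)) * sqrt(n - 3) / sqrt(2 - 2 * c)

n_assumed = 101  # assumption: N is not stated on the slides for these variables
z = dunn_clark_z(0.46, 0.42, 0.79, n_assumed)
significant = abs(z) > 1.96  # two-tailed test at the .05 level
print(f"z = {z:.2f}, significant at .05: {significant}")
```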
26What is a Regression?
27What is a Regression?
- Bivariate regression is essentially correlation
- The only difference is semantic: You consider one variable to be the predictor (X) and the other to be the criterion (Y)
$$Y = a + bX$$
- If you remember intro algebra, you can see that this is an equation for a straight line with intercept a and slope b
28What is a Regression?
- The point of regression is to figure out the values of a and b that define the line of best fit relative to the observed X and Y values
- But unless the correlation between X and Y is exactly 1, this line of best fit will not be a perfect fit to the data
- Thus, you actually end up with predicted Y values (Ŷ) from your regression equation
29What is a Regression?
- The important conceptual point is that the slope b is equivalent to the correlation coefficient r (and numerically equal to it when X and Y are both standardized)
- In algebra, the slope tells you how much Y changes for a unit change in X
- Equivalently, in regression, the correlation coefficient is a measure of the covariation between X and Y (in standardized form)
30What is a Regression?
- The simplest type of regression is linear regression
- Linear regression assumes that the relationship between two variables can be modeled by a straight line
- Thus, the correlation coefficient tells you how closely the observations fall along a straight line
- When the correlation is weak, the observations vary a lot from the value predicted by the straight line
- When the correlation is strong, the observations fall very close to the values predicted by the straight line
31What is a Regression?
- For example, participants were asked how good, happy, uneasy, and bothered they felt at the current moment
- Rated on 7-point Likert scales, with 1 being low and 7 being high
32What is a Regression?
As you can see, the higher the correlation
coefficient, the more the variation in the
predictor translates into variation in the
criterion
33What is a Regression?
- Let's use bivariate regression to analyze some data
- Suppose we're wondering how well self-reported intent to take fliers and hand them out predicts the actual number of fliers taken for distribution
- The predictor (X) is how many fliers a person says they would be willing to distribute
- The criterion (Y) is the actual number taken
34What is a Regression?
Y = number of fliers actually taken
X = number of fliers intended to be taken
35What is a Regression?
- First we need to calculate the regression equation, that is, the equation for our straight line model
- The formula for the slope b is
$$b = r \, \frac{s_Y}{s_X} = \frac{cov_{XY}}{s_X^2}$$
- Plugging in the numbers, we get b = .60
- In other words, for every one flier people say they will take, they end up taking .60 fliers
36What is a Regression?
- We also need to calculate the intercept a, which is the value of the criterion Y when the predictor X = 0
- The formula for the intercept a is
$$a = \bar{Y} - b\bar{X}$$
- Plugging in the numbers, we get a = .50
- In other words, when someone predicts they will take 0 fliers, they end up taking .50 fliers
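The slope and intercept calculations can be sketched as follows. The raw flier data are not reproduced in these slides, so the scores below are made up and will not give b = .60 and a = .50; the point is only the order of the steps.

```python
from statistics import mean, stdev

x = [1, 2, 3, 4, 5]  # hypothetical intended counts
y = [2, 1, 4, 3, 5]  # hypothetical actual counts
mx, my = mean(x), mean(y)

# Covariance and correlation from deviations about the means
cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)
r = cov_xy / (stdev(x) * stdev(y))

b = r * stdev(y) / stdev(x)  # slope: change in Y per unit change in X
a = my - b * mx              # intercept: predicted Y when X = 0
print(f"b = {b:.2f}, a = {a:.2f}")
```

Computing b from r and the two standard deviations gives the same value as dividing the covariance by the variance of X, so either route can be used.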
37What is a Regression?
- Now we have all the information we need to construct our linear regression equation
$$\hat{Y} = .50 + .60X$$
- Using this model, which was based on observed data, we can predict the number of fliers that will actually be taken (Ŷ) for any number a person intends to take (X)
38What is a Regression?
- For example, when X = 0, the predicted number of fliers taken = .50 + .60(0) = .50
- When X = 14, the predicted number of fliers taken = .50 + .60(14) = 8.90
- In this way, we can plot our line of best fit through our observed data
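The same arithmetic can be written as a small Python function, using the intercept and slope reported in the slides. The function name `predict_taken` is hypothetical.

```python
A = 0.50  # intercept reported on the slides
B = 0.60  # slope reported on the slides

def predict_taken(intended):
    """Predicted number of fliers actually taken for a given intended number."""
    return A + B * intended

# Two points are enough to draw the line of best fit
line_points = [(x, predict_taken(x)) for x in (0, 14)]
for x, y_hat in line_points:
    print(f"X = {x:>2}: predicted Y = {y_hat:.2f}")
```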
39What is a Regression?
X = 14, Ŷ = 8.90
X = 0, Ŷ = .50
40What is a Regression?
- Clearly, our regression line does not predict the observations perfectly
- The differences between each observation (Y) and its predicted value (Ŷ) are called residuals, or simply error
- Using these residuals, we can compute the standard error of the estimate (SE), which is a quick way to tell how well our line fits the observed data
41What is a Regression?
- The formula for standard error is
$$SE = \sqrt{\frac{\sum (Y - \hat{Y})^2}{N - 2}}$$
- In essence, SE is (roughly) the average residual, that is, the average difference between observed and predicted scores
- In the current example, SE = 2.15
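A minimal sketch of the standard error calculation follows, assuming the N - 2 denominator that matches the residual degrees of freedom SPSS uses. The data are made up, so the result will not equal 2.15.

```python
from math import sqrt
from statistics import mean

x = [1, 2, 3, 4, 5]  # hypothetical predictor scores
y = [2, 1, 4, 3, 5]  # hypothetical criterion scores
mx, my = mean(x), mean(y)

# Least-squares slope and intercept
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

# Residuals: observed Y minus predicted Y
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
ss_residual = sum(e ** 2 for e in residuals)

se = sqrt(ss_residual / (len(x) - 2))                   # standard error of the estimate
mean_abs_residual = mean([abs(e) for e in residuals])   # literal average residual size
print(f"SE = {se:.2f}, mean |residual| = {mean_abs_residual:.2f}")
```

The two numbers are close but not identical, which is why SE is best described as only roughly the average residual.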
42What is a Regression?
- We can also use SE to construct confidence intervals around our predicted values
- More specifically, the confidence interval specifies a range of Y values in which the population Y value will fall with a given level of confidence
- For example, the 95% confidence interval specifies the range in which the population Y value will fall 95 times out of 100 given a certain value of X
43What is a Regression?
- The confidence interval is found by defining the minimum and maximum predicted value for a given confidence level
- For 95% confidence, the critical z = 1.96
- So if a person intends to take 10 fliers, we can predict with 95% confidence that 6.50 ± 4.21 fliers will actually be taken (the margin is 1.96 × SE = 1.96 × 2.15 = 4.21)
44What is a Regression?
When X = 10, the 95% confidence interval for Y varies between 2.29 and 10.71
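The interval for X = 10 can be reproduced from the values given in the slides (a = .50, b = .60, SE = 2.15, critical z = 1.96). The function name `interval` is hypothetical, and the sketch follows the simple predicted value plus or minus z times SE approach described here.

```python
A, B = 0.50, 0.60  # intercept and slope from the slides
SE = 2.15          # standard error of the estimate from the slides
Z_CRIT = 1.96      # critical z for 95% confidence

def interval(intended):
    """Return (predicted, margin, lower, upper) for a given intended number of fliers."""
    predicted = A + B * intended
    margin = Z_CRIT * SE
    return predicted, margin, predicted - margin, predicted + margin

predicted, margin, lower, upper = interval(10)
print(f"{predicted:.2f} +/- {margin:.2f} -> [{lower:.2f}, {upper:.2f}]")
```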
45What is a Regression?
- Because regression is, like ANOVA, an attempt to explain variance, the overall regression model can be evaluated for significance with an F-ratio
- To begin with, the total sum of squares is
$$SS_{Total} = \sum (Y - \bar{Y})^2$$
- That is, the sum of the squared differences between the observed Y values and their mean
46What is a Regression?
- This total variance can be partitioned into two sources
- 1) Regression sum of squares
- The sum of the squared differences between the predicted Y values and the mean of the predicted Y values
- 2) Residual sum of squares
- The sum of the squared differences between the observed Y values and the predicted Y values
47What is a Regression?
- If you think through these equations, you can see that the regression sum of squares is a measure of the variance that the model accounts for
- Likewise, the residual sum of squares is a measure of the variance that the model does not explain, or error
- From there, it's a simple step to construct your mean squares and compute an F-ratio based on these numbers
48What is a Regression?
- MS_Regression = SS_Regression / df_Regression
- MS_Residual = SS_Residual / df_Residual
- F = MS_Regression / MS_Residual
- In bivariate regression, this test is equivalent to the significance test for the single regression coefficient b
- In multiple regression, however, things are more complex
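The partition of the sums of squares and the resulting F-ratio can be sketched as below. The data are made up, and the degrees of freedom (1 for the single predictor, N - 2 for the residual) are the usual ones for bivariate regression.

```python
from statistics import mean

x = [1, 2, 3, 4, 5]  # hypothetical predictor scores
y = [2, 1, 4, 3, 5]  # hypothetical criterion scores
n = len(x)
mx, my = mean(x), mean(y)

# Least-squares line and its predictions
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx
predicted = [a + b * xi for xi in x]

ss_total = sum((yi - my) ** 2 for yi in y)
ss_regression = sum((p - mean(predicted)) ** 2 for p in predicted)
ss_residual = sum((yi - p) ** 2 for yi, p in zip(y, predicted))

df_regression = 1    # one predictor
df_residual = n - 2  # N minus the two estimated parameters (a and b)
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_ratio = ms_regression / ms_residual
print(f"F({df_regression}, {df_residual}) = {f_ratio:.2f}")
```

Checking that the regression and residual sums of squares add up to the total is a quick way to catch arithmetic slips when doing this by hand.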
49Bivariate Regression in SPSS
50Bivariate Regression in SPSS
51Bivariate Regression in SPSS
This is the multiple correlation; in the bivariate case, it is equivalent to the correlation coefficient r
The R² value for the multiple correlation, i.e., the proportion of variance that the model accounts for
Standard error
52Bivariate Regression in SPSS
The ANOVA table tests the significance of the overall regression model with an F-ratio, as we saw earlier. In the bivariate case, this significance test is equivalent to the significance test for the single regression coefficient b. To report this test: F(1, 58) = 80.01, p < .001
53Bivariate Regression in SPSS
In regression, each individual predictor can be
tested for significance separately. In the
bivariate case, this test is the same as the
significance test for the correlation between X
and Y.
54Bivariate Regression in SPSS
Intercept a
Slope b (unstandardized)
Significance tests for null hypotheses that a and
b are equal to zero
55Bivariate Regression in SPSS
- We can reconstruct our regression equation based on the SPSS output
- a = .50
- b = .60
- So
$$\hat{Y} = .50 + .60X$$
56Bivariate Regression in SPSS
- Note the difference between b and β
- The unstandardized coefficient b is the covariation between X and Y in the original units of measurement
- So if b = .60, for every one flier a participant intends to take, .60 fliers will actually be taken
- The standardized coefficient β simply standardizes this covariation so that it varies between -1 and 1
- This is the correlation coefficient r
- If both X and Y were z-scored (standardized with M = 0 and SD = 1) before running the regression, b and β would be equal
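The relationship between b and β can be checked numerically. In the sketch below the data and the helper names (`slope`, `z_scores`) are hypothetical; Y is deliberately put on a larger scale than X so that the unstandardized and standardized coefficients differ.

```python
from statistics import mean, stdev

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

def z_scores(values):
    """Standardize to M = 0 and SD = 1."""
    m, sd = mean(values), stdev(values)
    return [(v - m) / sd for v in values]

x = [1, 2, 3, 4, 5]       # hypothetical predictor scores
y = [20, 10, 40, 30, 50]  # hypothetical criterion scores on a larger scale

b = slope(x, y)                            # unstandardized: original units
beta = b * stdev(x) / stdev(y)             # standardized coefficient
b_on_z = slope(z_scores(x), z_scores(y))   # slope after z-scoring both variables
print(f"b = {b:.2f}, beta = {beta:.2f}, slope on z-scores = {b_on_z:.2f}")
```

Here b reflects the units of X and Y, while β and the slope on z-scores agree with each other and with r.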
57Bivariate Regression in SPSS
- If we were to add another predictor in addition to X, we'd be using multiple regression
- In multiple regression, things get more complicated
- The regression coefficients (the b values) are no longer directly equal to the correlation coefficient (r values)
- Interpreting the regression coefficients and their significance tests becomes more difficult
- Interaction terms have to be computed
- We'll talk more about this next week
58Assignment 8
59Assignment 8
- Question 1
- Just look at the correlation matrix
- Question 2
- Use linear regression to see if chemistry scores (X) predict biology scores (Y)
- Reconstruct the regression equation and calculate the predicted biology scores (Ŷ) for participants 50 and 60
- Give the 95% confidence interval for both of the predictions
- Explain why the unstandardized and standardized regression coefficients are different
60Assignment 8
- Question 3
- Pick whichever pair of variables you want, as long as they are significantly correlated
- For the significance test, report the F-test of the overall model (from the ANOVA output table)
- There are a lot of questions. Make sure to answer all of them!
- Two pages max