Lab 8 - PowerPoint PPT Presentation

About This Presentation
Title:

Lab 8

Description:

Comparing Correlations. What is a Regression? Bivariate Regression in SPSS. Assignment 8 ... Comparing Correlations ... want to compare different correlations ... – PowerPoint PPT presentation

Provided by: asz3


Transcript and Presenter's Notes

Title: Lab 8


1
Lab 8
  • Bivariate Correlation and Regression

2
Overview
  • What is a Correlation?
  • Bivariate Correlation in SPSS
  • Comparing Correlations
  • What is a Regression?
  • Bivariate Regression in SPSS
  • Assignment 8

3
What is a Correlation?
4
What is a Correlation?
  • There are two general approaches to research in
    psychology
  • Mean (group) differences
  • Typically, experiments (with manipulations)
    analyzed using ANOVA
  • How do the IVs cause changes on the DV?
  • Individual differences
  • Typically, multiple measures taken from a set of
    individuals and analyzed using correlations or
    regression
  • How do these variables covary?

5
What is a Correlation?
[Figure: 2 × 2 grid crossing "Correlation?" (Yes / No) with "Mean Difference?" (Yes / No)]
6
What is a Correlation?
  • The math underlying both approaches is the same
  • The General Linear Model (GLM) can be used for
    both ANOVA and regression
  • Regression is much better for continuous
    predictors, and it can also use categorical IVs
  • Any analysis using ANOVA can be conducted
    equivalently using regression (but not vice
    versa)
  • SPSS actually runs regressions when you ask it to
    perform ANOVA

7
What is a Correlation?
  • In the individual differences approach, the basic
    data of interest are pairs of observations from
    all the individuals in a sample
  • For example, you might measure self-esteem and
    socio-economic status for all people in a given
    sample
  • Note that there are no manipulations involved
    here: you are only collecting a set of
    measurements (what you would call DVs in the mean
    differences approach)

8
What is a Correlation?
  • The heart of the GLM is variance (s²)
  • Variance is (roughly) the average squared departure of
    observations from the group mean
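The usual sample formula behind this rough description is shown below. It assumes the N - 1 denominator commonly used for sample estimates. Each departure from the mean is squared before averaging, so departures above and below the mean do not cancel out.

```latex
s^2 = \frac{\sum_{i=1}^{N} \left(X_i - \bar{X}\right)^2}{N - 1}
```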

9
What is a Correlation?
  • A closely related measure is covariance (cov)
  • Covariance is (roughly) the shared variation
    between two variables about their respective
    means
  • The higher the covariance, the more closely the
    two variables track each other

10
What is a Correlation?
  • Correlation (r) is basically standardized
    covariance
  • That is, a measure of how well two variables
    track each other (or covary) in standard units

r = cov_XY / (s_X · s_Y)
Numerator: the covariance (roughly)
Denominator: standardizes it by dividing by the standard
deviations of the two variables
11
What is a Correlation?
  • Another way to think about this ratio is as a
    part out of the whole
  • In other words, the correlation is the proportion
    of observed covariance (tracking) between two
    variables out of the maximum possible covariance
    (determined by the variances of the individual
    variables)
  • Thus, the size (or magnitude) of a correlation
    can only vary from 0 to 1
  • That is, from no tracking at all (r = 0) to
    perfect tracking (|r| = 1)

12
What is a Correlation?
  • In addition to the magnitude, the correlation
    coefficient r also provides information about the
    direction of covariation
  • Negative values (between -1 and 0) indicate an
    inverse relationship: as one goes up, the other
    goes down
  • A value at or close to 0 indicates no (linear)
    relationship: the two variables don't seem to
    track each other
  • Positive values (between 0 and 1) indicate a
    direct relationship: as one goes up, the other
    goes up

13
What is a Correlation?
  • As you can see, the more variance two variables
    share in common (i.e., the higher the
    covariance), the stronger the correlation
  • When one variable starts to vary without the
    other, it makes the pattern noisier, or cloudier,
    in the scatterplot

14
What is a Correlation?
  • The square of the correlation coefficient (r²)
    gives the proportion of variance accounted for by
    the linear relationship (covariation) between the
    two variables
  • In other words, r² is the ratio of predicted
    variance to observed variance, which is the
    proportion of variance accounted for by the
    prediction

15
What is a Correlation?
  • The null hypothesis for testing a correlation is
    that the correlation in the population (ρ, or
    rho) is zero
  • Like an F-test in ANOVA, the test statistic is a
    ratio of the proportion of variance accounted for
    to the proportion of variance not accounted for
    (error): F = r²(N - 2) / (1 - r²), with df = 1 and N - 2
  • Note that significance depends on the size of r
    and N
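To see the ratio at work, the sketch below plugs in the correlation reported later in this lab, r(99) = -.34. Because df = N - 2, that implies N = 101. The sketch uses only the Python standard library, which has no t distribution, so it reports the test statistics but no p-value. Compare |t| against a t table with df = N - 2. The helper name `r_test_statistics` is hypothetical.

```python
import math


def r_test_statistics(r: float, n: int) -> tuple[float, float, int]:
    """Return (t, F, df) for H0: rho = 0, where df = n - 2."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    # Variance accounted for (r^2) over variance not accounted for (1 - r^2), scaled by df
    f = r ** 2 * df / (1 - r ** 2)  # equals t squared
    return t, f, df


t, f, df = r_test_statistics(-0.34, 101)
print(f"t({df}) = {t:.2f}, F(1, {df}) = {f:.2f}")
```

With one numerator df, F is simply t². The two forms therefore always agree on significance.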

16
Bivariate Correlation in SPSS
17
Bivariate Correlation in SPSS
  • Let's say you were interested in studying
    depression in Chinese immigrants to Canada
  • Collect data for several individual difference
    measures from a sample of Chinese immigrants
  • Length of time in Canada
  • English proficiency
  • Fit with Canadian culture
  • Amount of stress
  • Depression

18
Bivariate Correlation in SPSS
[Figure: the five measures (length of time in Canada, English proficiency, fit with Canadian culture, amount of stress, depression)]
19
Bivariate Correlation in SPSS
20
Bivariate Correlation in SPSS
  • The correlation matrix is symmetrical: the
    information below the diagonal is redundant
  • SPSS reports two-tailed significance tests
  • For a one-tailed test, divide the sig. value in
    half (provided the correlation is in the predicted
    direction)
  • For correlations, df = N - 2

21
Bivariate Correlation in SPSS
  • To report the correlation between fit with
    Canadian culture and depression:
  • r(99) = -.34, p < .001
  • To calculate the proportion of variance in
    depression accounted for by fit with Canadian
    culture:
  • r² = (-.34)² = .12
  • In other words, fit with Canadian culture
    explains 12% of the variance in depression (or
    vice versa)
  • Note that direction of causality is open to
    question

22
Comparing Correlations
23
Comparing Correlations
  • There are a number of cases in which you might
    want to compare different correlations for
    significant differences
  • rxy from sample 1 to rxy from sample 2?
  • rxy to rxz from one sample?
  • See pp. 209-213

24
Comparing Correlations
  • For example, is the correlation between length of
    time in Canada (V1) and English proficiency (V2)
    significantly different from the correlation
    between length of time in Canada (V1) and fit
    with Canadian culture(V3)?
  • r12 = .46
  • r13 = .42
  • r23 = .79

25
Comparing Correlations
  • Enter these three correlations (and the sample
    size N) into the (very long) Dunn and Clark
    equation on p. 209 to calculate a z-score
  • z = .50
  • The critical z at the .05 level (two-tailed) is
    1.96, which we did not exceed, so these two
    correlations (r12 and r13) are not significantly
    different
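The sketch below shows one commonly cited form of the Dunn and Clark test for two correlations that share a variable. Both correlations are Fisher z-transformed. Their difference is then divided by a standard error that accounts for r23. The equation on p. 209 may be arranged differently, so this sketch need not reproduce z = .50 exactly.

N = 101 is an assumption taken from the r(99) reported for the same data set (no missing data). The function name `dunn_clark_z` is hypothetical.

```python
import math


def dunn_clark_z(r12: float, r13: float, r23: float, n: int) -> float:
    """z for H0: rho12 == rho13, where both correlations share variable 1."""
    z12, z13 = math.atanh(r12), math.atanh(r13)  # Fisher z-transform
    # Asymptotic covariance term between r12 and r13 (they overlap on variable 1)
    psi = (r23 * (1 - r12 ** 2 - r13 ** 2)
           - 0.5 * r12 * r13 * (1 - r12 ** 2 - r13 ** 2 - r23 ** 2))
    # Rescale to the covariance between the two Fisher z values
    c = psi / ((1 - r12 ** 2) * (1 - r13 ** 2))
    return (z12 - z13) * math.sqrt(n - 3) / math.sqrt(2 - 2 * c)


# r12 = .46, r13 = .42, r23 = .79 from the slide; N = 101 is an assumption
z = dunn_clark_z(0.46, 0.42, 0.79, 101)
print(f"z = {z:.2f}, significant at .05 (two-tailed): {abs(z) > 1.96}")
```

The larger r23 is, the more the two correlations move together, and the smaller the standard error of their difference becomes.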

26
What is a Regression?
27
What is a Regression?
  • Bivariate regression is essentially correlation
  • The only difference is semantic: you consider one
    variable to be the predictor (X) and the other to
    be the criterion (Y)
  • The regression equation is Ŷ = a + bX
  • If you remember intro algebra, you can see that
    this is an equation for a straight line with
    intercept a and slope b

28
What is a Regression?
  • The point of regression is to figure out the
    values of a and b that define the line of best
    fit relative to the observed X and Y values
  • But unless the correlation between X and Y is
    exactly 1, this line of best fit will not be a
    perfect fit to the data
  • Thus, you actually end up with predicted Y values
    (Ŷ) from your regression equation

29
What is a Regression?
  • The important conceptual point is that the slope
    b is closely tied to the correlation coefficient r:
    b = r(s_Y / s_X), so in standardized units the
    slope equals r
  • In algebra, the slope tells you how much Y
    changes for a unit change in X
  • Similarly, in regression, the correlation
    coefficient is a measure of the covariation
    between X and Y (in standardized form)

30
What is a Regression?
  • The simplest type of regression is linear
    regression
  • Linear regression assumes that the relationship
    between two variables can be modeled by a
    straight line
  • Thus, the correlation coefficient tells you how
    closely the observations fall along a straight
    line
  • When the correlation is weak, the observations
    vary a lot from the value predicted by the
    straight line
  • When the correlation is strong, the observations
    fall very close to the values predicted by the
    straight line

31
What is a Regression?
  • For example, participants were asked how good,
    happy, uneasy, and bothered they felt at the
    current moment
  • Rated on 7-point Likert scales, with 1 being low
    and 7 being high

32
What is a Regression?
As you can see, the higher the correlation
coefficient, the more the variation in the
predictor translates into variation in the
criterion
33
What is a Regression?
  • Let's use bivariate regression to analyze some
    data
  • Suppose we're wondering how well self-reported
    intent to take fliers and hand them out predicts
    the actual number of fliers taken for
    distribution
  • The predictor (X) is how many fliers a person
    says they would be willing to distribute
  • The criterion (Y) is the actual number taken

34
What is a Regression?
Y = number of fliers actually taken
X = number of fliers intended to be taken
35
What is a Regression?
  • First we need to calculate the regression
    equation, the equation for our straight-line model
  • The formula for the slope b is
    b = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)²
  • Plugging in the numbers, we get b = .60
  • In other words, for every additional flier people
    say they will take, they end up taking .60 more fliers

36
What is a Regression?
  • We also need to calculate the intercept a, which
    is the predicted value of the criterion (Ŷ) when
    the predictor X = 0
  • The formula for the intercept a is
    a = Ȳ - bX̄
  • Plugging in the numbers, we get a = .50
  • In other words, when someone predicts they will
    take 0 fliers, the model predicts they will
    take .50 fliers

37
What is a Regression?
  • Now we have all the information we need to
    construct our linear regression equation:
    Ŷ = .50 + .60X
  • Using this model, which was based on observed
    data, we can predict the number of fliers that
    will actually be taken (Ŷ) for any number a
    person intends to take (X)

38
What is a Regression?
  • For example, when X = 0, the predicted number of
    fliers taken is Ŷ = .50 + .60(0) = .50
  • When X = 14, the predicted number of fliers taken
    is Ŷ = .50 + .60(14) = 8.90
  • In this way, we can plot our line of best fit
    through our observed data
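The two predictions above can be checked with a tiny function. The coefficients a = .50 and b = .60 are the values from the slides; the function name `predict` is hypothetical.

```python
def predict(x: float, a: float = 0.50, b: float = 0.60) -> float:
    """Predicted fliers taken (Y-hat) for x fliers intended: Y-hat = a + bX."""
    return a + b * x


for x in (0, 14):
    print(f"X = {x:>2}: predicted Y = {predict(x):.2f}")
```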

39
What is a Regression?
[Figure: line of best fit through the observed data, marked at X = 0, Ŷ = .50 and X = 14, Ŷ = 8.90]
40
What is a Regression?
  • Clearly, our regression line does not predict the
    observations perfectly
  • The differences between each observation (Y) and
    its predicted value (Ŷ) are called residuals, or
    simply error
  • Using these residuals, we can compute the
    standard error of the estimate (SE), which is a
    quick way to tell how well our line fits the
    observed data

41
What is a Regression?
  • The formula for standard error is
    SE = √[Σ(Y - Ŷ)² / (N - 2)]
  • In essence, SE is (roughly) the average residual,
    that is, the typical difference between observed
    and predicted scores
  • In the current example, SE = 2.15

42
What is a Regression?
  • We can also use SE to construct confidence
    intervals around our predicted values
  • More specifically, the confidence interval
    specifies a range of Y values in which the
    population Y value will fall with a given level
    of confidence
  • For example, the 95% confidence interval
    specifies the range in which the population Y
    value will fall 95 times out of 100 given a
    certain value of X

43
What is a Regression?
  • The confidence interval is found by defining the
    minimum and maximum predicted value for a given
    confidence level: Ŷ ± (critical z)(SE)
  • For 95% confidence, the critical z = 1.96
  • So if a person intends to take 10 fliers
    (Ŷ = .50 + .60(10) = 6.50), we can predict with 95%
    confidence that 6.50 ± 4.21 (that is, 1.96 × 2.15)
    fliers will actually be taken
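Assuming the slide's values (a = .50, b = .60, SE = 2.15, critical z = 1.96), the interval arithmetic for X = 10 can be sketched as follows. This simple Ŷ ± z·SE form ignores the extra uncertainty in the estimated a and b, so it is an approximation. The function name `approximate_interval` is hypothetical.

```python
def approximate_interval(x: float, a: float = 0.50, b: float = 0.60,
                         se: float = 2.15, z_crit: float = 1.96) -> tuple[float, float]:
    """Return (low, high) = Y-hat -/+ z_crit * SE for a given X."""
    y_hat = a + b * x
    half_width = z_crit * se
    return y_hat - half_width, y_hat + half_width


low, high = approximate_interval(10)
print(f"X = 10: {low:.2f} to {high:.2f}")
```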

44
What is a Regression?
When X = 10, the 95% confidence interval for Y
runs from 2.29 to 10.71
45
What is a Regression?
  • Because regression is, like ANOVA, an attempt to
    explain variance, the overall regression model
    can be evaluated for significance with an F-ratio
  • To begin with, the total sum of squares is
    SS_Total = Σ(Y - Ȳ)²
  • That is, the sum of the squared differences
    between the observed Y values and their mean

46
What is a Regression?
  • This total sum of squares can be partitioned into
    two sources: SS_Total = SS_Regression + SS_Residual
  • 1) Regression sum of squares, SS_Regression = Σ(Ŷ - Ȳ)²
  • The sum of the squared differences between the
    predicted Y values and the mean of the predicted
    Y values (which equals Ȳ, the mean of the observed
    Y values)
  • 2) Residual sum of squares, SS_Residual = Σ(Y - Ŷ)²
  • The sum of the squared differences between the
    observed Y values and the predicted Y values

47
What is a Regression?
  • If you think through these equations, you can see
    that the regression sum of squares is a measure
    of the variance that the model accounts for
  • Likewise, the residual sum of squares is a
    measure of the variance that the model does not
    explain, or error
  • From there, it's a simple step to construct your
    mean squares and compute an F-ratio based on
    these numbers

48
What is a Regression?
  • MS_Regression = SS_Regression / df_Regression,
    with df_Regression = 1 (one predictor)
  • MS_Residual = SS_Residual / df_Residual, with
    df_Residual = N - 2
  • F = MS_Regression / MS_Residual
  • In bivariate regression, this test is equivalent
    to the significance test for the single
    regression coefficient b
  • In multiple regression, however, things are more
    complex

49
Bivariate Regression in SPSS
50
Bivariate Regression in SPSS
51
Bivariate Regression in SPSS
This is the multiple correlation (R); in the
bivariate case, it is equivalent to the correlation
coefficient r (R is never negative, so it equals |r|)
The R² value for the multiple correlation, i.e.,
the proportion of variance that the model
accounts for
Standard error of the estimate
52
Bivariate Regression in SPSS
The ANOVA table tests the significance of the
overall regression model with an F-ratio, as we
saw earlier. In the bivariate case, this
significance test is equivalent to the
significance test for the single regression
coefficient b. To report this test: F(1, 58) =
80.01, p < .001
53
Bivariate Regression in SPSS
In regression, each individual predictor can be
tested for significance separately. In the
bivariate case, this test is the same as the
significance test for the correlation between X
and Y.
54
Bivariate Regression in SPSS
Intercept a
Slope b (unstandardized)
Significance tests for null hypotheses that a and
b are equal to zero
55
Bivariate Regression in SPSS
  • We can reconstruct our regression equation based
    on the SPSS output
  • a = .50
  • b = .60
  • So Ŷ = .50 + .60X

56
Bivariate Regression in SPSS
  • Note the difference between b and β
  • The unstandardized coefficient b expresses the
    covariation between X and Y in the original units
    of measurement
  • So if b = .60, for each additional flier a
    participant intends to take, .60 more fliers will
    actually be taken
  • The standardized coefficient β simply
    standardizes this covariation, β = b(s_X / s_Y), so
    that in bivariate regression it varies between -1 and 1
  • In bivariate regression, this is the correlation
    coefficient r
  • If both X and Y were z-scored (standardized with
    M = 0 and SD = 1) before running the regression,
    b and β would be equal

57
Bivariate Regression in SPSS
  • If we were to add another predictor in addition
    to X, we'd be using multiple regression
  • In multiple regression, things get more
    complicated
  • The standardized regression coefficients (the β
    values) are no longer directly equal to the
    correlation coefficients (r values)
  • Interpreting the regression coefficients and
    their significance tests becomes more difficult
  • Interaction terms have to be computed
  • We'll talk more about this next week

58
Assignment 8
59
Assignment 8
  • Question 1
  • Just look at the correlation matrix
  • Question 2
  • Use linear regression to see if chemistry scores
    (X) predict biology scores (Y)
  • Reconstruct the regression equation and calculate
    the predicted biology scores (Ŷ) for
    participants 50 and 60
  • Give the 95% confidence interval for both of the
    predictions
  • Explain why the unstandardized and standardized
    regression coefficients are different

60
Assignment 8
  • Question 3
  • Pick whichever pair of variables you want, as
    long as they are significantly correlated
  • For the significance test, report the F-test of
    the overall model (from the ANOVA output table)
  • There are a lot of questions: make sure to answer
    all of them!
  • Two pages max