Title: Correlation and Regression
1. Correlation and Regression
2. Correlation Coefficients
- Continuous IV and DV
- or dichotomous variables (coded 0-1)
- the mean of a 0-1 variable is interpreted as a proportion
- Pearson product-moment correlation coefficient
- ranges from -1.0 to +1.0
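A minimal check of the 0-1 coding point, using a hypothetical pass/fail variable (the data and the name `passed` are made up for illustration): the mean of a 0-1 variable equals the proportion of cases coded 1.

```python
# Hypothetical dichotomous variable: 1 = passed, 0 = failed (made-up data)
passed = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

mean_passed = sum(passed) / len(passed)            # ordinary arithmetic mean
proportion_passed = passed.count(1) / len(passed)  # share of cases coded 1

print(mean_passed, proportion_passed)  # 0.7 0.7
```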
3. Interpreting Correlations
- +1.0 or -1.0 indicates a perfect relationship
- 0 indicates no association between the variables
- values in between indicate varying degrees of relatedness
- r² is the proportion of variance shared by the two variables
- which variable is X and which is Y doesn't matter
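A small sketch of both points, on hypothetical paired scores: swapping which variable is called X and which Y leaves r unchanged, and squaring r gives the proportion of shared variance. The helper `pearson_r` is written out here for illustration rather than taken from any library.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired scores
x = [2, 4, 5, 7, 9]
y = [1, 3, 6, 6, 10]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)       # swap the roles of X and Y
print(r_xy == r_yx)          # True: r is symmetric
print(round(r_xy ** 2, 3))   # r squared: proportion of shared variance
```

For these made-up pairs r is about .96, so roughly 93% of the variance is shared.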
4. Positive Correlation
- the regression line is the line of best fit
- with a +1.0 correlation, all points fall exactly on the line
- a +1.0 correlation does not mean the X and Y values are identical
- it means each pair's z scores are identical, so Y is an exact linear function of X
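To see that a perfect correlation does not require identical values, here is a sketch with hypothetical data where Y = 2X + 10 (an arbitrary line chosen for illustration): no Y equals its X, yet r is +1.0.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [2 * v + 10 for v in x]   # 12, 14, 16, 18, 20: an exact linear function of X

print(x == y)                       # False: the values are not identical
print(round(pearson_r(x, y), 10))   # 1.0: every point falls on one line
```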
5. Negative Correlation
- if r = -1.0, all points fall directly on the regression line
- the line slopes downward from left to right
- the sign of the correlation tells us the direction of the relationship
- the number tells us its size or magnitude
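A sketch separating sign from magnitude, on made-up data: mirroring Y flips the sign of r but leaves its size unchanged, and a line that falls from left to right (here Y = 100 - 3X, chosen arbitrarily) gives r = -1.0.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y_up = [3, 4, 6, 7, 9]             # hypothetical scores rising with X
y_down = [-v for v in y_up]        # mirror image: same spread, opposite direction
y_line = [100 - 3 * v for v in x]  # exact downward-sloping line

r_up, r_down = pearson_r(x, y_up), pearson_r(x, y_down)
print(r_up == -r_down)              # True: same magnitude, opposite sign
print(round(pearson_r(x, y_line), 10))  # -1.0: perfect negative relationship
```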
6. Zero correlation
- no relationship between the variables
- a positive or negative correlation, by contrast, gives us predictive power
7. Direction and degree
8. Direction and degree (cont.)
9. Direction and degree (cont.)
10. Correlation Coefficient
- r = Pearson product-moment correlation coefficient
- zx = z score for variable x
- zy = z score for variable y
- N = number of paired X-Y values
- Definitional formula: r = Σ(zx · zy) / N
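The definitional formula r = Σ(zx · zy) / N can be computed directly. This sketch assumes the z scores use the population standard deviation (dividing by N), which is what makes dividing the sum of products by N give r; with sample SDs (N - 1) you would divide by N - 1 instead. The data and function names are hypothetical.

```python
import math

def z_scores(values):
    """z scores using the population SD (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def r_definitional(x, y):
    """r as the mean of the cross-products of paired z scores."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

# Hypothetical paired scores
x = [2, 4, 5, 7, 9]
y = [1, 3, 6, 6, 10]
print(round(r_definitional(x, y), 3))  # 0.963
```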
11. Raw score formula
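The formula for this slide did not survive extraction. Assuming it showed the standard raw score (computational) form, which works from the same ΣX, ΣY, ΣX², ΣY², and ΣXY column sums used later for the slope, it reads:

```latex
r = \frac{N\sum XY - \sum X \sum Y}
         {\sqrt{\left[N\sum X^{2} - \left(\sum X\right)^{2}\right]
                \left[N\sum Y^{2} - \left(\sum Y\right)^{2}\right]}}
```

It gives the same value as the definitional z score formula, but avoids computing every z score by hand.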
12. Interpreting correlation coefficients
- r gives a comprehensive description of the relationship
- both its direction and its strength
- need an adequate number of pairs
- more than 30 or so
- the same interpretation holds for a sample or a population
- the population parameter is ρ (rho)
- scatterplots and r
- the more tightly the points cluster around the line, the higher the correlation
13. Examples of correlations
- -1.0: negative limit
- -.80: juvenile street crime and socioeconomic level
- .43: manual dexterity and assembly line performance
- .60: height and weight
- 1.0: positive limit
14. Describing r values
- Effect size index: Cohen's guidelines
- Small r = .10, Medium r = .30, Large r = .50
- Very high: .80 or more
- Strong: .60 to .80
- Moderate: .40 to .60
- Low: .20 to .40
- Very low: .20 or less
- small correlations can be very important
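A sketch of applying Cohen's benchmarks to an r value. Two assumptions here are mine, not the slide's: the benchmarks are treated as lower cutoffs (Cohen offered them as reference points), and anything below .10 is labeled "negligible". The function name `cohen_label` is hypothetical.

```python
def cohen_label(r):
    """Label |r| using Cohen's .10 / .30 / .50 benchmarks as lower cutoffs (assumed)."""
    size = abs(r)  # benchmarks describe magnitude, not direction
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"  # hypothetical label for values under .10

for r in (0.43, 0.60, -0.80, 0.05):
    print(r, cohen_label(r))
```

Note that the -.80 example is "large" because only the magnitude is classified.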
15. Correlation as causation??
16. Nonlinearity and range restriction
- if the relationship doesn't follow a linear pattern, Pearson r is of little use
- r is based on a straight-line function
- if the variability of one or both variables is restricted, the maximum value of r decreases
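Both problems can be shown with small made-up data sets. In the first, Y = X² is perfectly determined by X, yet r is 0 because the pattern is U-shaped. In the second, the hypothetical data follow Y = X plus a repeating error pattern (chosen so errors are unrelated to X); keeping only cases with X between 5 and 8 noticeably lowers r.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# 1) Nonlinearity: a perfect but curvilinear relationship
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]
print(pearson_r(x, y))  # 0.0: r misses the U-shaped relationship

# 2) Range restriction: hypothetical linear data with a repeating error pattern
x_full = list(range(1, 13))
errors = [2, -2, -2, 2] * 3
y_full = [a + e for a, e in zip(x_full, errors)]
r_full = pearson_r(x_full, y_full)

# Keep only the cases whose X falls between 5 and 8
x_cut = [a for a in x_full if 5 <= a <= 8]
y_cut = [b for a, b in zip(x_full, y_full) if 5 <= a <= 8]
r_cut = pearson_r(x_cut, y_cut)

print(round(r_full, 2), round(r_cut, 2))  # 0.87 0.49
```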
17. Linear vs. curvilinear relationships
18. Linear vs. curvilinear (cont.)
19. Range restriction
20. Range restriction (cont.)
21. Understanding r
22. Simple linear regression
- enables us to make a best prediction of the value of one variable given our knowledge of its relationship with another variable
- generates a line that minimizes the squared vertical distances (Y - Y') of the points from the line
- no other line will produce smaller residuals, or errors of estimation
- this is the least squares property
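A sketch of the least squares property on hypothetical data: fit the line, then nudge its intercept or slope and confirm that every altered line leaves a larger sum of squared residuals. The function names `fit_line` and `sse` are illustrative, not from any library.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for Y' = a + bX."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((p - mx) * (q - my) for p, q in zip(x, y)) / sum((p - mx) ** 2 for p in x)
    a = my - b * mx
    return a, b

def sse(x, y, a, b):
    """Sum of squared residuals (Y - Y')^2 for the line Y' = a + bX."""
    return sum((q - (a + b * p)) ** 2 for p, q in zip(x, y))

# Hypothetical paired scores
x = [2, 4, 5, 7, 9]
y = [1, 3, 6, 6, 10]

a, b = fit_line(x, y)
best = sse(x, y, a, b)

# Any shift in intercept or slope increases the squared errors
for da, db in [(0.5, 0), (-0.5, 0), (0, 0.1), (0, -0.1)]:
    assert sse(x, y, a + da, b + db) > best

print(round(a, 3), round(b, 3), round(best, 3))
```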
23. Regression line
- The line will have the form Y' = A + BX
- where Y' = predicted value of Y
- A = Y intercept of the line
- B = slope of the line
- X = score on X we are using to predict Y
24. Ordering of variables
- which variable is designated X and which Y makes a difference
- different coefficients result if we flip them
- generally, if one variable can be designated as dependent on logical grounds, that one is Y
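A sketch of why the ordering matters, on hypothetical data: the slope for predicting Y from X differs from the slope for predicting X from Y, and one is not simply the reciprocal of the other. Their product equals r², a standard identity for least-squares slopes.

```python
def slope(pred, crit):
    """Least-squares slope for predicting `crit` from `pred`."""
    n = len(pred)
    mp, mc = sum(pred) / n, sum(crit) / n
    spc = sum((p - mp) * (c - mc) for p, c in zip(pred, crit))
    spp = sum((p - mp) ** 2 for p in pred)
    return spc / spp

# Hypothetical paired scores
x = [2, 4, 5, 7, 9]
y = [1, 3, 6, 6, 10]

b_y_on_x = slope(x, y)  # Y treated as the dependent variable
b_x_on_y = slope(y, x)  # roles flipped
print(round(b_y_on_x, 3), round(b_x_on_y, 3))
print(round(b_y_on_x * b_x_on_y, 3))  # equals r squared
```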
25. Moving to prediction
- statistically significant relationship between college entrance exam scores and GPA
- how can we use entrance scores to predict GPA?
26. Best-fitting line (cont.)
27. Best-fitting line (cont.)
28. Calculating the slope (b)
- N = number of pairs of scores; the rest of the terms are the sums of the X, Y, X², Y², and XY columns we're already familiar with
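A sketch of the slope calculation from column sums, assuming the slide used the standard computational formula b = (NΣXY - ΣXΣY) / (NΣX² - (ΣX)²). The data are hypothetical. ΣY² is not needed for b; it only enters the calculation of r.

```python
# Hypothetical paired scores
x = [2, 4, 5, 7, 9]
y = [1, 3, 6, 6, 10]
n = len(x)

# Column sums
sum_x = sum(x)                               # ΣX
sum_y = sum(y)                               # ΣY
sum_x2 = sum(v * v for v in x)               # ΣX²
sum_xy = sum(p * q for p, q in zip(x, y))    # ΣXY

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
print(sum_x, sum_y, sum_x2, sum_xy)  # 27 26 175 176
print(round(b, 3))                   # 1.219
```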
29. Calculating the Y-intercept (a)
- a = Ȳ - bX̄
- b = slope of the regression line
- Ȳ = the mean of the Y values
- X̄ = the mean of the X values
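A sketch of computing the intercept as a = Ȳ - bX̄ and using the resulting line to predict, on hypothetical data. The helper `predict` is illustrative. A useful check is that the fitted line always passes through the point (X̄, Ȳ).

```python
# Hypothetical paired scores
x = [2, 4, 5, 7, 9]
y = [1, 3, 6, 6, 10]
n = len(x)

# Slope from the column sums (standard computational form)
b = (n * sum(p * q for p, q in zip(x, y)) - sum(x) * sum(y)) / (
    n * sum(p * p for p in x) - sum(x) ** 2
)

mean_x, mean_y = sum(x) / n, sum(y) / n
a = mean_y - b * mean_x  # a = Ybar - b * Xbar

def predict(x_value):
    """Predicted Y' = a + bX for a single X score."""
    return a + b * x_value

print(round(a, 3), round(b, 3))  # -1.384 1.219
print(round(predict(6), 3))
```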
30. Let's make up a small example
- SAT-GPA correlation
- How high is it generally?
- Start with a scatterplot
- Enter points that reflect the relationship we think exists
- Translate them into values
- Calculate r and the regression coefficients
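The steps above can be sketched end to end. The SAT and GPA pairs below are invented for illustration only (they are not real data) and were picked to reflect an assumed positive relationship; the code computes r with the raw score formula and then the slope and intercept of the regression line.

```python
import math

# Made-up SAT/GPA pairs (hypothetical, for illustration only)
sat = [1000, 1100, 1150, 1200, 1300, 1350, 1400]
gpa = [2.4, 2.9, 2.7, 3.1, 3.3, 3.2, 3.7]
n = len(sat)

# Column sums
sum_x, sum_y = sum(sat), sum(gpa)
sum_x2 = sum(v * v for v in sat)
sum_y2 = sum(v * v for v in gpa)
sum_xy = sum(p * q for p, q in zip(sat, gpa))

# Raw score formula for r, then slope b and intercept a
numerator = n * sum_xy - sum_x * sum_y
r = numerator / math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
b = numerator / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

print(f"r = {r:.3f}, Y' = {a:.3f} + {b:.5f}X")
print(f"Predicted GPA for SAT 1250: {a + b * 1250:.2f}")
```

With real data the correlation would likely be considerably lower than these invented points suggest.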