Title: Topics: Regression
- Simple linear regression: one dependent variable and one independent variable
- Multiple regression: one dependent variable and two or more independent variables
Correlation
- A correlation describes a relationship between two variables.
- Correlation tries to answer the following questions:
  - What is the relationship between variable X and variable Y?
  - How are the scores on one measure associated with scores on another measure?
  - To what extent do the high scores on one variable go with the high scores on the second variable?
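The Pearson correlation ($r_{xy}$) puts a number on these questions: it is positive when high scores on X go with high scores on Y, negative when they go with low scores, and near 0 when there is no linear pattern. The sketch below computes it from raw scores using only the Python standard library; the function name `pearson_r` and the score lists are hypothetical toy values, not the department's GPA data.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation: sum of cross-products of deviations, scaled by both spreads."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mx, my = mean(x), mean(y)
    # Positive terms when a score above its mean on X pairs with a score above its mean on Y
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy scores (not data from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))  # 0.7746
```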
Simple Linear Regression
- Understanding relationships between variables
- Prediction
- Explanation
Design Requirements and Assumptions
- Two continuous variables
- Variables are linearly related
- Random Sampling
- Independence
- Bivariate Normality
- N > 30
Example
- You are on the admissions committee of the Sociology department at a large West Coast university. You are trying to decide whom to admit to the Master's program, and you would like to be able to predict how well the applicants you are considering will do at your school.
- Your department has been analyzing the performance of its graduate students over the years. One thing it has been looking at is the relationship between undergraduate GPA and graduate GPA.
- From regression analyses done over the years, you are able to make some educated guesses about how applicants will perform once admitted.
How Regression Is Used in Making Predictions
The Regression Coefficient: What Slope? What Altitude?
Fitting the Regression Line: The Best Fit (Least Squares)
- $Y' = a + b_y X$
- The predicted value of Y ($Y'$) for a value of X is computed by:
  - multiplying the score (X) by the regression coefficient ($b_y$)
  - adding the regression constant ($a$) to this product
- The prediction of Y from X is based on the linear relationship of X and Y, with the line chosen so that errors of prediction are minimized.
Least Squares Fit: Visual
The regression line is the line for which the average squared vertical distance of the points from the line is minimized.
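To see "minimized" concretely, the sketch below fits the least-squares line to a small set of hypothetical toy scores and then checks that nudging either the constant or the slope away from the fitted values makes the sum of squared vertical distances larger. The helper names `sse` and `least_squares_line` are illustrative, not from the slides.

```python
from statistics import mean


def sse(x, y, a, b):
    """Sum of squared vertical distances from the points to the line Y' = a + bX."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


def least_squares_line(x, y):
    """Closed-form intercept and slope that minimize sse()."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical toy scores (not data from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares_line(x, y)
best = sse(x, y, a, b)
print(round(a, 2), round(b, 2), round(best, 2))  # 2.2 0.6 2.4

# Moving either constant away from the least-squares values only increases the error
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05)]:
    print(sse(x, y, a + da, b + db) > best)  # True each time
```

Minimizing the sum and minimizing the average (sum divided by N) pick out the same line, since N is fixed.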
Minimizing Prediction Error: What That Means (For Math Types)
The Regression Coefficient (Close Your Eyes If You Don't Want the Derivation)
- $b_y = r_{xy}\,(s_y / s_x)$
- $b_y$ = regression coefficient
- $r_{xy}$ = correlation between X and Y
- $s_y$ = standard deviation of Y
- $s_x$ = standard deviation of X
- To compute $b_y$, divide the standard deviation of Y ($s_y$) by the standard deviation of X ($s_x$), then multiply by the Pearson correlation ($r_{xy}$) between X and Y.
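The recipe for $b_y$ translates directly into code. This is a minimal sketch on hypothetical toy scores; it uses sample standard deviations from Python's `statistics.stdev`, but because only their ratio enters the formula, population standard deviations would give the same $b_y$.

```python
from math import sqrt
from statistics import mean, stdev


def regression_coefficient(x, y):
    """b_y = r_xy * (s_y / s_x)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    r = sxy / sqrt(sxx * syy)          # Pearson correlation r_xy
    return r * (stdev(y) / stdev(x))   # divide s_y by s_x, then multiply by r_xy


# Hypothetical toy scores (not data from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(regression_coefficient(x, y), 4))  # 0.6
```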
The Constant (a): More Math
- Regression constant ($a$): the altitude of the regression line, i.e., the value of Y where the regression line intercepts the Y axis at X = 0 (the Y intercept)
- $a = \bar{Y} - b_y \bar{X}$
- $a$ = the regression constant
- $\bar{Y}$ = mean of Y
- $b_y$ = regression coefficient
- $\bar{X}$ = mean of X
- To compute $a$, multiply $\bar{X}$ (the mean of X) by the regression coefficient ($b_y$), then subtract that product from $\bar{Y}$ (the mean of Y).
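A short sketch of the constant, again on hypothetical toy scores where the slope works out to $b_y = 0.6$. It also checks a consequence of $a = \bar{Y} - b_y \bar{X}$: the fitted line always passes through the point of means $(\bar{X}, \bar{Y})$.

```python
from statistics import mean


def regression_constant(x, y, b_y):
    """a = mean(Y) - b_y * mean(X): the Y intercept of the regression line."""
    return mean(y) - b_y * mean(x)


# Hypothetical toy scores (not data from the slides); b_y = 0.6 for these scores
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a = regression_constant(x, y, 0.6)
print(round(a, 2))  # 2.2

# Predicting at mean(X) returns mean(Y), so the line goes through (mean X, mean Y)
print(round(a + 0.6 * mean(x), 2) == round(mean(y), 2))  # True
```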
Plotting the Regression Line
- Compute two predicted scores
- For X (undergrad GPA) = 2.75
- $Y' = a + b_y X = 2.93 + .24(2.75) = 3.59$
- For X (undergrad GPA) = 3.60
- $Y' = a + b_y X = 2.93 + .24(3.60) = 3.79$
- Draw the regression line through the scatter plot using these two points.
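The two plotting points can be reproduced with the example's constants, $a = 2.93$ and $b_y = .24$. This sketch only computes the coordinates (the helper name `predicted` is illustrative); any plotting tool can then draw a straight line through them.

```python
def predicted(x, a=2.93, b_y=0.24):
    """Y' = a + b_y * X, with the constants from the GPA example."""
    return a + b_y * x


# Two undergraduate GPAs near the ends of the observed range
points = [(x, round(predicted(x), 2)) for x in (2.75, 3.60)]
print(points)  # [(2.75, 3.59), (3.6, 3.79)]
```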
Plotting the Regression Line: Visual
Errors of Prediction
Standard Error of Estimate
- The magnitude of the error made in estimating Y from X: a measure of the dispersion of the points around the regression line
- The average size of the error of prediction
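A sketch of the standard error of estimate, assuming the common formula $s_{est} = \sqrt{\sum (Y - Y')^2 / (N - 2)}$; the slides do not give the formula, and some texts divide by N instead of N − 2. The scores are hypothetical toy values.

```python
from math import sqrt
from statistics import mean


def standard_error_of_estimate(x, y):
    """Spread of the points around the least-squares line, in units of Y."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # Divide by N - 2 because two constants (a and b) were estimated from the data
    return sqrt(sse / (len(x) - 2))


# Hypothetical toy scores (not data from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(standard_error_of_estimate(x, y), 4))  # 0.8944
```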
The Standard Error of Estimate: A Visual Representation
[Figure: scatter plot of Graduate GPA (Y axis, 3.00 to 4.00) against Undergraduate GPA (X axis, 3.00 to 4.00)]
Standard Error of Estimate: Another Visual Representation
Is the Prediction Worth Pursuing?
- Standard error
- Amount of variance explained by X
- Testing the regression coefficient (b) for
significance
Explaining Variance: How Much?
[Figure: the total variance of Y divided into predicted variance and unpredicted variance]
Assessing Prediction Accuracy: Explaining Variance
- Total variance = predicted variance + residual (unexplained) variance
- Coefficient of determination ($r^2$): the proportion of the total variance in Y that is predicted by variable X ($r^2 = s^2_{Y'} / s^2_Y$)
- Our example: $r = .56$, so $r^2 = .3136$
- Coefficient of non-determination ($1 - r^2$): the proportion of the total variance in Y that is not predicted by X
- Our example: $1 - r^2 = 1 - .31 = .69$
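A sketch of the arithmetic and of the variance split. The first part uses the example's $r = .56$ (the slide rounds $r^2$ to .31 before subtracting; the unrounded $1 - .3136 = .6864$ also rounds to .69). The second part uses hypothetical toy scores to check that predicted plus residual sums of squares equal the total, with the predicted share equal to $r^2$. The helper name `variance_partition` is illustrative.

```python
from statistics import mean

# The example's correlation
r = 0.56
r2 = r ** 2
print(round(r2, 4), round(1 - r2, 4))  # 0.3136 0.6864


def variance_partition(x, y):
    """Return (total, predicted, residual) sums of squares for the least-squares line."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    pred = [a + b * xi for xi in x]
    total = sum((yi - my) ** 2 for yi in y)                     # spread of Y around its mean
    predicted = sum((p - my) ** 2 for p in pred)                 # spread of Y' around the mean
    residual = sum((yi - p) ** 2 for yi, p in zip(y, pred))      # errors of prediction
    return total, predicted, residual


# Hypothetical toy scores (not data from the slides); here r = .7746, so r^2 = .6
total, predicted, residual = variance_partition([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(predicted / total, 4), round(residual / total, 4))  # 0.6 0.4
```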
Proportion of Explained (Predicted) and Unexplained (Residual) Variance
[Figure: overlapping circles for X and Y with $r_{xy} = .56$; explained variance $r^2 = .31$ (31%); unexplained variance $1 - r^2 = .69$ (69%)]
t-Test for Individual Regression Coefficients ($b_y$)
- $H_0: \beta = 0$ (where $\beta$ is the population regression coefficient)
- $H_1: \beta \neq 0$
- Compute a t statistic:
- $t = (b - \beta)/s_b = b/s_b$, where $s_b$ is the standard error of $b$ (how many standard errors $b$ is from the hypothesized population parameter under the null hypothesis, $\beta = 0$)
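A sketch of the t statistic for the slope on hypothetical toy scores. The slides do not say how $s_b$ is obtained; this assumes the standard formula $s_b = s_{est} / \sqrt{\sum (X - \bar{X})^2}$, with $s_{est}$ computed using $N - 2$ degrees of freedom. The function name and the `beta0` parameter are illustrative.

```python
from math import sqrt
from statistics import mean


def slope_t_statistic(x, y, beta0=0.0):
    """Return (t, df) for H0: beta = beta0, using t = (b - beta0) / s_b."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_est = sqrt(sse / (n - 2))  # standard error of estimate
    s_b = s_est / sqrt(sxx)      # standard error of the slope b (assumed formula)
    return (b - beta0) / s_b, n - 2


# Hypothetical toy scores (not data from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
t, df = slope_t_statistic(x, y)
print(round(t, 4), df)  # 2.1213 3
```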
t-Test of b: Our Example
- $t = .24/.12 = 2.00$
- Set alpha at .05 (two-tailed)
- Figure out df: $N - 2 = 8$
- $t_{critical}(.05/2,\ 8) = 2.306$
- Decision: $t_{observed} = 2.00 < t_{critical} = 2.306$, so do not reject the null hypothesis.
- Conclusion: we cannot conclude that the slope is significantly different from 0 in the population.
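The decision rule for the example, as a minimal sketch. The critical value 2.306 is taken from a t table (df = 8, two-tailed alpha = .05), as on the slide; if SciPy is available, `scipy.stats.t.ppf(0.975, 8)` returns the same value to three decimals. The helper name `slope_test_decision` is illustrative.

```python
def slope_test_decision(b, s_b, t_critical):
    """Two-tailed test of H0: beta = 0; reject only if |t| exceeds the critical value."""
    t_observed = b / s_b
    return t_observed, abs(t_observed) > t_critical


# Values from the example: b = .24, s_b = .12, df = 8, alpha = .05 (two-tailed)
t_obs, reject = slope_test_decision(0.24, 0.12, t_critical=2.306)
print(round(t_obs, 2), "reject H0" if reject else "do not reject H0")  # 2.0 do not reject H0
```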
Our Conclusion: Do Not Reject the Null Hypothesis
Warnings
- Simple regression assumes a straight-line relationship.
- Outliers can control regression results.
- Regression assumes random samples for making proper generalizations.
- Regression is correlational: it does not show that X causes Y.