Title: Regression
1. Regression
2. What is Regression?
- A way of predicting the value of one variable from another.
- It is a hypothetical model of the relationship between two variables.
- The model used is a linear one.
- Therefore, we describe the relationship using the equation of a straight line.
3. Describing a Straight Line
- b1
  - Regression coefficient for the predictor
  - Gradient (slope) of the regression line
  - Direction/strength of the relationship
- a
  - Intercept (value of Y when X = 0)
  - Point at which the regression line crosses the Y-axis (ordinate)
- εᵢ
  - Unexplained error.
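Putting these terms together, the straight line for a single predictor can be written as follows (a standard formulation using the symbols defined above; the slide itself lists the components without showing the equation):

```latex
Y_i = a + b_1 X_i + \varepsilon_i
```

Here a is the intercept, b_1 the gradient, and \varepsilon_i the unexplained error for case i.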
4. Same Intercept, Different Gradient
5. Same Gradient, Different Intercept
6. The Method of Least Squares
Why is this line a better summary of the data than a line which is marginally steeper or shallower, or which is a millimetre or two further up the page? In fact, the line has been chosen in such a way that the sum of the squares of the vertical distances between the points and the line is minimised. As we have seen earlier in the module, squaring differences has the advantage of making positive and negative differences equivalent.
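As an illustration of this idea, the sketch below computes the intercept and gradient that minimise the sum of squared vertical distances, using the usual closed-form least-squares formulas. The data and variable names are made up purely for illustration; they are not taken from the slides.

```python
# Least-squares fit of a straight line Y = a + b*X (illustrative data only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Gradient: covariance of X and Y divided by the variance of X.
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
    / sum((xi - mean_x) ** 2 for xi in x)
# Intercept: the least-squares line passes through (mean_x, mean_y).
a = mean_y - b * mean_x

# The quantity being minimised: the sum of squared vertical distances.
ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

print(f"a = {a:.3f}, b = {b:.3f}, sum of squared residuals = {ss_residual:.3f}")
```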
7. How Good is the Model?
- The regression line is only a model based on the data.
- This model might not reflect reality.
- We need some way of testing how well the model fits the observed data.
- How?
8. Sum of Squares
- SST
  - Total variability (variability between scores and the mean).
- SSR
  - Residual/error variability (variability between the regression model and the actual data).
- SSM
  - Model variability (difference in variability between the model and the mean).
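In symbols, writing \hat{Y}_i for the model's prediction and \bar{Y} for the mean of the outcome, these three quantities are conventionally defined as follows (standard definitions matching the descriptions above):

```latex
SS_T = \sum_i (Y_i - \bar{Y})^2, \qquad
SS_R = \sum_i (Y_i - \hat{Y}_i)^2, \qquad
SS_M = \sum_i (\hat{Y}_i - \bar{Y})^2, \qquad
SS_T = SS_M + SS_R
```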
9. Testing the Model: ANOVA
- If the model results in better prediction than using the mean, then we expect SSM to be much greater than SSR.
10. Testing the Model: ANOVA
- Mean Squared Error
  - Sums of Squares are total values.
  - They can be expressed as averages, obtained by dividing each sum of squares by its degrees of freedom.
  - These are called Mean Squares, MS.
- If you know F, you can check whether the model is significantly better at predicting the dependent variable than chance alone.
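The note on the later SPSS slide gives these formulas in words; written out, the mean squares and the F ratio are:

```latex
MS_M = \frac{SS_M}{df_M}, \qquad
MS_R = \frac{SS_R}{df_R}, \qquad
F = \frac{MS_M}{MS_R}
```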
11. Testing the Model: R²
- R²
  - The proportion of variance accounted for by the regression model (you can transform R² into a percentage).
  - The Pearson correlation coefficient squared.
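Expressed with the sums of squares defined earlier, this proportion is (a standard identity implied rather than stated on the slide):

```latex
R^2 = \frac{SS_M}{SS_T}
```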
12. Regression: An Example
13. SPSS output showing the F ratio
If the improvement due to fitting the model is much greater than the inaccuracy within the model, then the value of F will be greater than 1. In this instance the value of F is 99.587. SPSS tells us that the probability of obtaining this value of F by chance is very low (p < .001). Note: Mean Square = Sum of Squares / df; F = MS regression / MS residual.
14. SPSS output showing R²
In this instance the model explains 33.5% of the variation in the dependent variable.
15. SPSS Output: Model Parameters
16. Produce your own regression equations at the following site
- http://people.hofstra.edu/faculty/Stefan_Waner/newgraph/regressionframes.html
- Linked from the statistical simulations page of the website.
17. Multiple Regression: when there is more than one independent variable
- b1
  - Regression coefficient for the first predictor
  - Direction/strength of the relationship
- b2
  - Regression coefficient for the second predictor
  - Direction/strength of the relationship
- bn
  - Regression coefficient for the nth predictor
  - Direction/strength of the relationship
- a
  - Intercept (value of Y when X1, X2, ..., Xn are all 0)
  - Point at which the regression line crosses the Y-axis (ordinate)
- εᵢ
  - Unexplained error.
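Combining these terms, the multiple regression model can be written as follows (a standard formulation using the symbols above):

```latex
Y_i = a + b_1 X_{1i} + b_2 X_{2i} + \dots + b_n X_{ni} + \varepsilon_i
```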
18. Multiple regression: an example
19. Checking Assumptions: Checking Residuals
Linearity: this assumption is that there is a straight-line relationship between the independent and dependent variables (n.b. if there is not, it may be possible to make it linear by transforming one or more variables).
Homoscedasticity: this assumption means that the variance around the regression line is the same for all values of the independent variable(s).
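A common way to check both assumptions is to plot the residuals against the fitted values. The sketch below illustrates the idea; the data are hypothetical, and numpy and matplotlib are assumed to be available.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data purely for illustration.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 + 1.5 * x + rng.normal(0, 1, 100)

# Fit a straight line by least squares.
b, a = np.polyfit(x, y, 1)          # gradient, intercept
fitted = a + b * x
residuals = y - fitted

# Residuals vs fitted values: look for curvature (non-linearity)
# and for a funnel shape (heteroscedasticity).
plt.scatter(fitted, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.title("Residuals vs fitted values")
plt.show()
```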
20. The effect of outliers
Because the regression line minimises the squared difference between the points and the line, outliers can have a very large effect (their squared distance to the line makes a big difference). For example, a point lying 10 units from the line contributes 100 to the sum of squares, whereas ten points each lying 1 unit from the line together contribute only 10. This is why it is sometimes advisable to run the regression analysis omitting outliers.