Title: Regression


1
Regression
  • Monday 30th January 2006

2
What is Regression?
  • A way of predicting the value of one variable
    from another.
  • It is a hypothetical model of the relationship
    between two variables.
  • The model used is a linear one.
  • Therefore, we describe the relationship using the
    equation of a straight line.

3
Describing a Straight Line
  • bi
  • Regression coefficient for the predictor
  • Gradient (slope) of the regression line
  • Direction/Strength of Relationship
  • a
  • Intercept (value of Y when X = 0)
  • Point at which the regression line crosses the
    Y-axis (ordinate)
  • εi
  • Unexplained error (illustrated in the sketch
    below).
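Taken together, the terms on this slide describe the straight line Yi = a + bi Xi + εi. A minimal sketch of evaluating such a line follows; Python/numpy is used purely for illustration (the lecture itself uses SPSS), and the intercept and slope values are made up, not taken from the module.

  import numpy as np

  a = 2.0   # intercept: value of Y when X = 0 (illustrative)
  b = 0.5   # regression coefficient: gradient of the line (illustrative)

  X = np.array([0.0, 1.0, 2.0, 3.0])
  Y_hat = a + b * X        # predicted values lying on the regression line
  print(Y_hat)             # [2.  2.5 3.  3.5]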

4
Same Intercept, Different Gradient
5
Same Gradient, Different Intercept
6
The Method of Least Squares
Why is this line a better summary of the data
than a line which is marginally more steep or
marginally more shallow or which is a millimetre
or two further up the page? In fact the line has
been chosen in such a way that the sum of the
squares of the vertical distances between the
points and the line is minimised. As we have
seen earlier in the module, squaring differences
has the advantage of making positive and negative
differences equivalent.
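A minimal sketch of the same idea on illustrative data: numpy.polyfit returns the slope and intercept that minimise the sum of squared vertical distances between the points and the line.

  import numpy as np

  # Illustrative data, not taken from the lecture
  X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
  Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

  # Fit a straight line by least squares: the returned slope and
  # intercept minimise the sum of squared vertical distances
  b, a = np.polyfit(X, Y, 1)

  residuals = Y - (a + b * X)              # vertical distances
  print(a, b, np.sum(residuals ** 2))      # intercept, slope, minimised SS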
7
How Good is the Model?
  • The regression line is only a model based on the
    data.
  • This model might not reflect reality.
  • We need some way of testing how well the model
    fits the observed data.
  • How?

8
Sum of Squares
  • SST
  • Total variability (variability between scores
    and the mean).
  • SSR
  • Residual/Error variability (variability between
    the regression model and the actual data).
  • SSM
  • Model variability (difference in variability
    between the model and the mean).
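A minimal sketch of these three quantities on the same illustrative data as above, checking that the total variability splits into model plus residual variability (SST = SSM + SSR):

  import numpy as np

  X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
  Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

  b, a = np.polyfit(X, Y, 1)          # least-squares fit
  Y_hat = a + b * X                   # model predictions
  Y_bar = Y.mean()

  SST = np.sum((Y - Y_bar) ** 2)      # total variability around the mean
  SSR = np.sum((Y - Y_hat) ** 2)      # residual/error variability
  SSM = np.sum((Y_hat - Y_bar) ** 2)  # model variability

  print(SST, SSM + SSR)               # the two totals agree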

9
Testing the Model: ANOVA
  • If the model results in better prediction than
    using the mean, then we expect SSM to be much
    greater than SSR

10
Testing the Model: ANOVA
  • Mean Squared Error
  • Sums of Squares are total values.
  • They can be expressed as averages. The averages
    are obtained by dividing the sum of squares by
    the degrees of freedom for each model.
  • These are called Mean Squares (MS).
  • The F-ratio compares them: F = MS for the model
    divided by MS for the residuals.
  • If you know F you can check whether the model is
    significantly better at predicting the dependent
    variable than chance alone.
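A minimal sketch of the F-ratio for a simple regression (one predictor), continuing the illustrative data above; scipy's F distribution gives the probability of obtaining an F at least this large by chance.

  import numpy as np
  from scipy import stats

  X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
  Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

  b, a = np.polyfit(X, Y, 1)
  Y_hat = a + b * X

  SSM = np.sum((Y_hat - Y.mean()) ** 2)
  SSR = np.sum((Y - Y_hat) ** 2)

  df_model = 1                 # one predictor
  df_resid = len(Y) - 2        # n minus number of estimated parameters

  MSM = SSM / df_model         # model mean square
  MSR = SSR / df_resid         # residual mean square
  F = MSM / MSR

  p = stats.f.sf(F, df_model, df_resid)   # P(F >= observed) under the null
  print(F, p)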

11
Testing the Model: R²
  • R²
  • The proportion of variance accounted for by the
    regression model (you can transform R² into a
    percentage).
  • The Pearson Correlation Coefficient squared.
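A minimal sketch using the illustrative sums of squares from above: R² is the model sum of squares as a proportion of the total sum of squares, and for a single predictor it equals the squared Pearson correlation.

  import numpy as np

  X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
  Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

  b, a = np.polyfit(X, Y, 1)
  Y_hat = a + b * X

  SST = np.sum((Y - Y.mean()) ** 2)
  SSM = np.sum((Y_hat - Y.mean()) ** 2)

  R2 = SSM / SST                  # proportion of variance explained
  r = np.corrcoef(X, Y)[0, 1]     # Pearson correlation coefficient
  print(R2, r ** 2)               # the same value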

12
Regression: An Example
13
SPSS output showing the F ratio
If the improvement due to fitting the model is
much greater than the inaccuracy within the model,
then the value of F will be greater than 1. In
this instance the value of F is 99.587. SPSS
tells us that the probability of obtaining this
value of F by chance is very low (p < .001).
Note: Mean Square = Sum of Squares / df, and
F = MS regression / MS residual.
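The degrees of freedom are not reproduced in this transcript, so the values below are illustrative only; the sketch simply shows how a p-value corresponds to an F of 99.587.

  from scipy import stats

  F = 99.587
  df_model, df_resid = 1, 198       # illustrative df, not from the SPSS output
  p = stats.f.sf(F, df_model, df_resid)
  print(p)                          # far below .001, i.e. p < .001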
14
SPSS output showing R²
In this instance the model explains 33.5% of the
variation in the dependent variable.
15
SPSS Output: Model Parameters
16
Produce your own regression equations at the
following site
  • http://people.hofstra.edu/faculty/Stefan_Waner/newgraph/regressionframes.html
  • Linked from the statistical simulations page
    of the website.

17
Multiple Regression: when there is more than one
independent variable
  • b1
  • Regression coefficient for the first predictor
  • Direction/Strength of Relationship
  • b2
  • Regression coefficient for the second predictor
  • Direction/Strength of Relationship
  • bn
  • Regression coefficient for the nth predictor
  • Direction/Strength of Relationship
  • a
  • Intercept (value of Y when X1, X2, …, Xn are
    all 0)
  • Point at which the regression line crosses the
    Y-axis (ordinate)
  • εi
  • Unexplained error.
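A minimal sketch of fitting a model with two predictors by ordinary least squares; the data and variable names below are illustrative, not taken from the lecture.

  import numpy as np

  # Illustrative data: two predictors and one outcome
  X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
  X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
  Y  = np.array([3.1, 3.9, 6.2, 6.8, 9.1, 9.8])

  # Design matrix: a column of 1s for the intercept a, then X1 and X2
  design = np.column_stack([np.ones_like(X1), X1, X2])

  # Least-squares estimates of a, b1 and b2
  coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
  a, b1, b2 = coef
  print(a, b1, b2)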

18
Multiple Regression: An Example
19
Checking Assumptions: Checking Residuals
Linearity: this assumption is that there is a
straight-line relationship between the independent
and dependent variables (N.B. if there is not, it
may be possible to make it linear by transforming
one or more variables).
Homoscedasticity: this assumption means that the
variance around the regression line is the same
for all values of the independent variable(s).
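A minimal sketch of checking these assumptions by plotting residuals against fitted values (matplotlib is assumed to be available; the data are illustrative). A flat, evenly spread band of points is consistent with linearity and homoscedasticity.

  import numpy as np
  import matplotlib.pyplot as plt

  X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
  Y = np.array([2.2, 2.8, 4.1, 4.9, 6.2, 6.8, 8.1, 8.9])

  b, a = np.polyfit(X, Y, 1)
  fitted = a + b * X
  residuals = Y - fitted

  # Curvature in this plot suggests non-linearity; a funnel shape
  # (spread growing with the fitted values) suggests heteroscedasticity.
  plt.scatter(fitted, residuals)
  plt.axhline(0)
  plt.xlabel("Fitted values")
  plt.ylabel("Residuals")
  plt.show()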
20
The effect of outliers
Because the regression line minimises the squared
differences between the points and the line,
outliers can have a very large effect (as their
squared distance to the line will make a big
difference). This is why it is sometimes advisable
to run the regression analysis omitting outliers.
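A minimal sketch of this effect on illustrative data: fitting the same points with and without a single extreme value shows how far the slope can move.

  import numpy as np

  X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
  Y = np.array([1.1, 2.0, 2.9, 4.1, 5.0, 15.0])   # last point is an outlier

  slope_with, _ = np.polyfit(X, Y, 1)
  slope_without, _ = np.polyfit(X[:-1], Y[:-1], 1)

  # The outlier's squared distance dominates the fit, dragging the
  # slope well away from the value obtained without it.
  print(slope_with, slope_without)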