Lecture 5: Simple Linear Regression - PowerPoint PPT Presentation

1
Lecture 5: Simple Linear Regression
  • Laura McAvinue
  • School of Psychology
  • Trinity College Dublin

2
Previous Lecture
  • Regression Line
  • Offers a model of the relationship between two
    variables
  • A straight line that represents the best fit
  • Enables us to predict Variable Y on the basis of
    Variable X

3
Today
  • Calculation of the regression line
  • Measuring the accuracy of prediction
  • Some practice!

4
How is the regression line calculated?
  • The Method of Least Squares
  • Computes a line that minimises the difference
    between the predicted values of Y (Ŷ) and the
    actual values of Y (Y)
  • Minimises
  • (Y − Ŷ)s
  • Errors of prediction
  • Residuals

5
[Figure: scatterplot of Y against X with the regression line. The lines joining each data point to the regression line are the errors of prediction, (Y − Ŷ)s, also called residuals.]
6
[Figure: scatterplot of Y against X, annotated with the values Y = 6 and Y = 5.]
7
Method of Least Squares
  • When fitting a line to the data, the regression
    procedure attempts to fit a line that minimises
    these errors of prediction, the total of the (Y − Ŷ)s
  • But! You can't try to minimise Σ(Y − Ŷ), as the (Y − Ŷ)s
    will have positive and negative values, which
    will cancel each other out
  • So, you square the residuals and then add them
    and try to minimise Σ(Y − Ŷ)²
  • Hence, the name, Method of Least Squares
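
The idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five data points are invented, and the helper names `fit_line` and `sum_of_squared_residuals` are hypothetical. It uses the standard closed-form least-squares solution, then shows that the raw residuals sum to zero while the sum of squared residuals is smaller for the fitted line than for another candidate line.

```python
# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line for Y on X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_of_squared_residuals(xs, ys, slope, intercept):
    """Sum of (Y - predicted Y) squared for the line slope * X + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# The raw residuals cancel out, so their sum says nothing about the fit.
sum_residuals = sum(residuals)

# The squared residuals do not cancel. The fitted line gives a smaller
# total than another candidate line, here Y = 0.5X + 2.5.
ss_fitted = sum_of_squared_residuals(xs, ys, slope, intercept)
ss_other = sum_of_squared_residuals(xs, ys, 0.5, 2.5)

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"sum of squared residuals: fitted = {ss_fitted:.2f}, other = {ss_other:.2f}")
```

Comparing `ss_fitted` with `ss_other` for any other slope and intercept should give the same ordering, which is what "least squares" means.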

8
How do we measure the accuracy of prediction?
  • The regression line is fitted in such a way that
    the errors of prediction are kept as small as
    possible
  • You can fit a regression line to any dataset,
    but that doesn't mean it's a good fit!
  • How do we measure how good this fit is?
  • How do we measure the accuracy of the prediction
    that our regression equation makes?
  • Three methods
  • Standard Error of the Estimate
  • r²
  • Statistical Significance

9
Standard Error of the Estimate
  • A measure of the size of the errors of prediction
  • We've seen that
  • The regression line is computed in such a way as
    to minimise the difference between the predicted
    values (Ŷ) and the actual values (Y)
  • The differences between these values are known as
    errors of prediction or residuals, (Y − Ŷ)s
  • For any set of data, the errors of prediction
    will vary
  • Some data points will be close to the line, so (Y
    − Ŷ) will be small
  • Some data points will be far from the line, so (Y
    − Ŷ) will be big

10
Standard Error of the Estimate
  • One way to assess the fit of the regression line
    is to take the standard deviation of all of these
    errors
  • On average, how much do the data points vary from
    the regression line?
  • Standard error of the estimate

11
Standard Error of the Estimate
  • One point to note
  • Standard error is a measure of the standard
    deviation of data points around the regression
    line
  • (Standard error)² expresses the variance of the
    data points around the regression line
  • Residual or error variance
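
A minimal sketch of the standard error of the estimate, assuming the usual formula: the square root of the sum of squared residuals divided by n − 2. The n − 2 divisor is an assumption not stated on the slides; it is the conventional choice because two values, the slope and the intercept, are estimated from the data. The data are invented for illustration.

```python
import math

# Hypothetical data; 0.6X + 2.2 is the least-squares line for these points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = 0.6, 2.2

# Errors of prediction (residuals): actual Y minus predicted Y.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
ss_error = sum(e ** 2 for e in residuals)

# Assumed divisor: n - 2, since slope and intercept were both estimated.
n = len(xs)
error_variance = ss_error / (n - 2)        # (standard error) squared
standard_error = math.sqrt(error_variance)

print(f"error variance = {error_variance:.3f}")
print(f"standard error of the estimate = {standard_error:.3f}")
```

The standard error is in the units of Y, so it can be read as the typical distance of a data point from the regression line.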

12
r²
  • Interested in the relationship between two
    variables
  • Variable X
  • A set of scores that vary around a mean, X̄
  • Variable Y
  • A set of scores that vary around a mean, Ȳ
  • If these two variables are correlated, they will
    share some variance

13
[Figure: overlapping variance of X and Y, with three labelled regions: variance in X that is not related to Y; shared variance between X and Y; variance in Y that is not related to X.]
14
  • In regression, we are trying to explain Variable
    Y as a function of Variable X
  • Would be useful if we could find out what
    percentage of variance in Variable Y can be
    explained by variance in Variable X

15
Total Variance in Variable Y (SStotal)
  • Variance due to Variable X: Regression / Model Variance,
    SSm = SStotal − SSerror
  • Variance due to other factors: Error Variance, SSerror
16
r²
  • To calculate the percentage of variance in
    Variable Y that can be explained by variance in
    Variable X
  • r² = SSm / SStotal
  • SSm = Variance due to X / regression
  • SStotal = Total variance in Y
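
The ratio of SSm to SStotal can be worked through on a small example. This is a sketch on invented data, not the lecture's dataset; it also computes Pearson's r directly to confirm that squaring it gives the same value.

```python
import math

# Hypothetical data; 0.6X + 2.2 is the least-squares line for these points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = 0.6, 2.2

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Total variation in Y around its mean.
ss_total = sum((y - mean_y) ** 2 for y in ys)
# Variation left over after using the regression line.
ss_error = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
# Variation accounted for by the regression model.
ss_model = ss_total - ss_error

r_squared = ss_model / ss_total

# Cross-check: Pearson correlation between X and Y, then squared.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
pearson_r = sxy / math.sqrt(sxx * ss_total)

print(f"SStotal = {ss_total:.1f}, SSerror = {ss_error:.1f}, SSm = {ss_model:.1f}")
print(f"r squared = {r_squared:.2f}, (Pearson r) squared = {pearson_r ** 2:.2f}")
```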


17
r²
  • (Pearson Correlation)²
  • Shared variance between two variables
  • Used in simple linear regression to show what
    percentage of the variance in Variable Y can be
    explained by Variable X
  • For example
  • If rxy = .8, r²xy = .64, then 64% of the
    variability in Y is directly predictable from
    variable X
  • If rxy = .2, r²xy = .04, then 4% of the
    variability in Y is due to / can be explained by X

18
Statistical Significance
  • Does the regression model predict Variable Y
    better than chance?
  • Simple linear regression
  • Does X significantly predict Y?
  • If the correlation between X and Y is statistically
    significant, the regression model will be
    statistically significant
  • Not so for multiple regression, next lecture
  • F Ratio

19
Statistical Significance
  • F-Ratio = Average variance due to the regression /
    Average variance due to error = MSm / MSerror
  • MSm = SSm / dfm
  • MSerror = SSerror / dferror
  • It uses the mean square rather than the sum of
    squares in order to compare the average variance
  • You want the F-Ratio to be large and
    statistically significant
  • If large, then more variance is explained by the
    regression than by the error in the model

20
An example
  • Linear regression data-set
  • I want to predict a person's verbal coherency
    based on the number of units of alcohol they
    consume
  • Record how much alcohol is consumed and
    administer a test of verbal coherency
  • SPSS
  • Analyse, Regression, Linear
  • Dependent variable: Verbal Coherency
  • Independent variable: Alcohol
  • Method: Enter

21
Three parts to the output
  • Model Summary
  • r²
  • Standard error
  • Anova
  • F Ratio
  • Coefficients
  • Regression Equation

22
  • Table: how well our regression model explains the
    variation in verbal coherency
  • Pearson r between alcohol and verbal coherency
  • Proportion of variation in verbal coherency
    that is related to alcohol
  • Statistical estimate of the population
    proportion of variation in verbal coherency
    that is related to alcohol
  • Statistical estimate of the error in
    the regression model
23
  • Total variation in data due to regression model
  • Average variation in data due to regression
    model
  • Total variation in data NOT due to regression
    model
  • Average variation in data NOT due to
    regression model
  • Ratio of variation in data due to regression
    model to variation not due to model
  • Probability of observing this F-ratio if H0 is
    true
24
  • T-statistic: tells us whether using the
    predictor variable gives us a better than chance
    prediction of the DV
  • Alcohol is a sig. predictor of verbal coherency
  • Values that we use in the regression equation
    (Y = BX + a)
  • Verbal Coherency = B(alcohol) + constant
  • Verbal coherency = 4.7(alcohol) + 21.5
  • As alcohol changes by 1 unit, predicted verbal
    coherency changes by 4.7 units
25
Second Example
  • Can we predict how many months a person survives
    after being diagnosed with cancer, based on their
    level of optimism?
  • Linear Regression dataset
  • Analyse, Regression, Linear
  • Dependent variable: Survival
  • Independent variable: Optimism

26
Aspects of Regression analysis
  • Write the regression equation
  • Explain what this equation tells us about the
    relationship between Variables X and Y
  • Make a prediction of Y when given a value of X
  • State the standard error of your prediction
  • Ascertain if the regression model significantly
    predicts the dependent variable Y
  • State what percentage of Variable Y is explained
    by Variable X

27
State the following
  • Describe the relationship between survival (Y)
    and optimism (X) in terms of a regression
    equation.
  • In your own words, explain what this equation
    tells us about the relationship between survival
    and optimism.
  • Using this equation, predict how many months a
    person will survive for if their optimism score
    is 10.

28
State the following
  • What is the standard error of your prediction?
  • Does the regression model significantly predict
    the dependent variable?
  • What percentage of variance in survival is
    explained by optimism level?

29
Answers
  • Describe the relationship between survival (Y)
    and optimism (X) in terms of a regression
    equation.
  • Y = .69X + 18.4
  • In your own words, explain what this equation
    tells us about the relationship between survival
    and optimism.
  • As optimism level increases by one unit, predicted
    survival increases by .69 months
  • When a person's optimism score is 0, his/her
    predicted length of survival is 18.4 months
  • Using this equation, predict how many months a
    person will survive for if their optimism score
    is 10.
  • Y = .69(10) + 18.4 = 25.3 months
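
The prediction step can be written as a one-line function. The slope (.69) and constant (18.4) are the values given in the answers above; the function name `predict_survival` is hypothetical.

```python
def predict_survival(optimism, slope=0.69, intercept=18.4):
    """Predicted survival in months for a given optimism score (Y = BX + a)."""
    return slope * optimism + intercept


# Prediction for an optimism score of 10, and for a score of 0 (the constant).
months_at_10 = predict_survival(10)
months_at_0 = predict_survival(0)

print(f"optimism 10: {months_at_10:.1f} months")
print(f"optimism 0: {months_at_0:.1f} months")
```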

30
State the following
  • What is the standard error of your prediction?
  • 4.5 months
  • Does the regression model significantly predict
    the dependent variable?
  • Yes, F(1, 432) = 202, p < .001
  • What percentage of variance in survival is
    explained by optimism level?
  • 32%
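
As a consistency check on these answers, the F-ratio in simple linear regression can be recovered from r² and the degrees of freedom, because SSm = r² × SStotal and SSerror = (1 − r²) × SStotal, so SStotal cancels. The sketch below uses the rounded figures reported on the slide (32%, df = 1 and 432), so the result should land close to, but not exactly on, the reported F of 202. The function name `f_from_r_squared` is hypothetical.

```python
def f_from_r_squared(r_squared, df_model, df_error):
    """F-ratio implied by r squared: (r2 / dfm) / ((1 - r2) / dferror)."""
    ms_model = r_squared / df_model          # explained share per model df
    ms_error = (1 - r_squared) / df_error    # unexplained share per error df
    return ms_model / ms_error


# Rounded values from the slide: 32% of variance explained, F(1, 432).
f_ratio = f_from_r_squared(0.32, 1, 432)
print(f"F implied by r squared = .32: {f_ratio:.1f}")
```

The small gap between this value and 202 is what one would expect from rounding r² to two decimal places.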

31
Summary
  • Simple linear regression
  • Provides a model of the relationship between two
    variables
  • Creates a straight line that best represents the
    relationship between two variables
  • Enables us to estimate the percentage of variance
    in one variable that can be explained by another
  • Enables us to predict one variable on the basis
    of another
  • Remember that a regression line can be fitted to
    any dataset. It's necessary to assess the
    accuracy of the fit.