Lecture 10: Regression with one X variable


Slides: 26

Provided by: pc208

1
Lecture 10: Regression with one X variable

Straight line (linear model) for predicting one
variable from another
Predicted Y = constant + slope × X
y = c + bx
Uses method of least squares to choose best line
R squared measures accuracy of prediction
Slope tells you

2
Method of least squares

Take any values of c and b (constant and slope)
Work out prediction for each x in data
Work out error of prediction
Work out mean square error (MSE)
Choose c and b (constant and slope) to make MSE
as low as possible.
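
The steps above can be sketched in Python. This is a minimal illustration, not the course spreadsheet: the six data points and the function names mse and least_squares_line are made up for the example, and the best line is found with the standard closed-form result (slope = covariance / variance of x) rather than by trial and error.

```python
# Hypothetical data: six (x, y) pairs invented for illustration.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.0]


def mse(c, b, xs, ys):
    """Mean square error of the line y = c + b*x on the data."""
    errors = [y - (c + b * x) for x, y in zip(xs, ys)]
    return sum(e * e for e in errors) / len(errors)


def least_squares_line(xs, ys):
    """Return (c, b) that minimise the MSE, using the standard formulae."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    b = cov_xy / var_x
    c = mean_y - b * mean_x
    return c, b


c, b = least_squares_line(xs, ys)
best = mse(c, b, xs, ys)

# Nudging the constant or the slope away from the fitted values
# can only make the MSE larger.
print(best, mse(c + 0.5, b, xs, ys), mse(c, b + 0.1, xs, ys))
```

Changing c or b by hand and re-running mse is the same experiment as changing the cells in the spreadsheet and watching the MSE move.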

3
How to apply the method of least squares

Use Excel Solver as in spreadsheet pred1var.xls
Advantage is it shows you what's going on (and
it's more flexible)
Use formulae derived from calculus: the traditional
method.
The formulae are not helpful for understanding
what's going on, so I will not be covering them
Easiest to use software with formulae built in -
eg Excel
Tools > Data Analysis > Regression
If this isn't on the menu use Tools > Add-Ins
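
A Solver-style search can be imitated in a few lines of Python. This sketch is an assumption-laden stand-in: it uses plain gradient descent, which is not necessarily what Excel Solver does internally, and the data, step size and number of steps are all made up for illustration. It starts from c = 0 and b = 0 and repeatedly nudges both in the direction that lowers the MSE.

```python
# Hypothetical data: six (x, y) pairs invented for illustration.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.0]


def mse(c, b):
    """Mean square error of the line y = c + b*x on the data above."""
    return sum((y - (c + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def search_for_line(steps=20000, rate=0.01):
    """Lower the MSE step by step, starting from c = 0 and b = 0."""
    c, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [y - (c + b * x) for x, y in zip(xs, ys)]
        # Slopes of the MSE with respect to c and b.
        grad_c = -2 * sum(errors) / n
        grad_b = -2 * sum(e * x for e, x in zip(errors, xs)) / n
        c -= rate * grad_c
        b -= rate * grad_b
    return c, b


c, b = search_for_line()
print(round(c, 3), round(b, 3), round(mse(c, b), 4))
```

The search should settle on the same constant and slope that the calculus formulae give, which is the point of the slide: the two routes are different ways of reaching the same least squares line.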

4
An example: think of a story

With six people / organisations / etc
And two numerical variables which may be related
in an interesting way
Get the data, or make it up.
Do a regression analysis to predict one variable
from the other using pred1var.xls, and then the
Regression Tool in Excel
Use the model to make a prediction
Now try doing it the other way round
Make sure you understand

5
Regression terminology

See table in the Word handout

6
Slope / regression coefficient / x coefficient

Interpretation obvious and important
The slope tells you
A negative slope means

7
R squared: easy version

R squared is the square of the correlation
coefficient
Often used as a measure of how good the model is
R squared = 1 if correl = 1 or -1: model very
good
R squared = 0 if correl = 0: model very bad
R squared = 0.5 means the model is half way between
good and bad
R squared = 0.9 means it's good but not perfect
Etc

8
R squared: more detail

Model based on a variable with zero correlation
with the dependent variable would be completely
useless
Best prediction here is the mean.
MSE = variance = square of sd. (See Pred1var.xls)
Model based on a perfect straight line relationship
is the best possible
correlation = 1 or -1
MSE = 0
A reasonable measure of the model is the
reduction in MSE from the worst model (with MSE =
variance)
Ie the proportional reduction in MSE
This turns out to be the same as R squared
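
The claim that the proportional reduction in MSE equals the square of the correlation can be checked numerically. The sketch below uses six made-up data points and hypothetical variable names, and works out R squared both ways.

```python
import math

# Hypothetical data invented for illustration.
xs = [1, 2, 3, 4, 5, 6]
ys = [3.0, 2.0, 6.0, 5.0, 9.0, 7.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
var_x = sum((x - mean_x) ** 2 for x in xs) / n
var_y = sum((y - mean_y) ** 2 for y in ys) / n
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Worst model: always predict the mean of y, so its MSE is the variance of y.
mse_worst = var_y

# Least squares line and its MSE.
b = cov_xy / var_x
c = mean_y - b * mean_x
mse_line = sum((y - (c + b * x)) ** 2 for x, y in zip(xs, ys)) / n

# R squared as the proportional reduction in MSE ...
r2_from_mse = (mse_worst - mse_line) / mse_worst

# ... and as the square of the correlation coefficient.
correl = cov_xy / (math.sqrt(var_x) * math.sqrt(var_y))
r2_from_correl = correl ** 2

print(r2_from_mse, r2_from_correl)
```

The two printed values should agree apart from rounding error in the arithmetic.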

9
But don't forget the sample

Even a model with R squared = 0.9 or 1 may not be
as good as it seems if the sample size is small
This is a separate issue which is not assessed by
R squared
See work on hypothesis (significance) tests and
confidence intervals
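
An extreme case shows why sample size is a separate issue. With only two data points (with different x values and different y values) a straight line always passes through both, so R squared is 1 whatever the data are. The two points below are made up for illustration.

```python
import math

# Two hypothetical data points: the smallest possible sample.
xs = [1.0, 4.0]
ys = [7.0, 3.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
var_x = sum((x - mean_x) ** 2 for x in xs) / n
var_y = sum((y - mean_y) ** 2 for y in ys) / n
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

correl = cov_xy / (math.sqrt(var_x) * math.sqrt(var_y))
r_squared = correl ** 2

# R squared looks perfect, but two points tell us almost nothing
# about the relationship in general.
print(correl, r_squared)
```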

10
Edited output from Excel Tool Regression for job
satisfaction data
11
Note that

This output is edited: either read a book on
mathematical statistics, or ignore the rest of
the output
Eg we have ignored the t stat. This is just used
to calculate p values and confidence intervals
(I have left the term standard error, although
I won't be explaining it in detail. It's a term
used for the standard deviation of something when
you are using it as a measure of error.)
In practice you would always want a larger
sample! But this illustrates the principle.

12
Multiple regression

Prediction model (linear) using several variables
Pred Y = const + slope1 × X1 + ... + slopen × Xn
y = c + b1x1 + ... + bnxn
Uses method of least squares to choose best line
R squared (coefficient of determination) measures
goodness of fit of model to data
Slopes tell you impact of each variable on
dependent variable

13
Mostly same as with single variable regression

Least squares
Predmvar.xls or Excel Tool (need independent X
variables in a block so you can select them all)
R squared
Slope for each variable:
the predicted increase in the dependent variable if
that variable is increased by one without changing
the other variables
Category variables represented by 1/0, eg sexn
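
The interpretation of the slopes can be seen by using a fitted model to make predictions. In this sketch the constant, the slopes and the variable age are hypothetical numbers chosen for illustration, not results from Predmvar.xls; sexn stands for a 1/0 category variable as on the slide.

```python
def predict(constant, slopes, values):
    """Prediction from a linear model: constant + sum of slope * value."""
    return constant + sum(slopes[name] * values[name] for name in slopes)


# Hypothetical fitted model (made-up numbers).
constant = 10.0
slopes = {"age": 0.5, "sexn": -2.0}

person = {"age": 30, "sexn": 1}
one_year_older = {"age": 31, "sexn": 1}   # age up by one, sexn unchanged
other_category = {"age": 30, "sexn": 0}   # sexn switched, age unchanged

base = predict(constant, slopes, person)
change_for_age = predict(constant, slopes, one_year_older) - base
change_for_sexn = base - predict(constant, slopes, other_category)

# Each change equals the slope of the variable that was altered.
print(base, change_for_age, change_for_sexn)
```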

14
Problems with regression

Model may not be reasonable (eg infant mortality
and GNP)
Sample too small: coefficients unreliable (check
confidence intervals)
Have you got the right variables?
Highly correlated variables can give misleading
results
Too many variables
See reading for more detail

15
Uses of regression

Very widely used in research (over-used?)
Examples

16
Predicting returns from shares

Dissanaike (1999) produced a regression model to
predict the return which investors would receive
from investing in a particular security for a
period of four years, from the return they would
have received if they had invested in the same
security in the previous four years. The data on
which the model was based were the returns for a
sample of large companies over consecutive
periods of four years.
The regression coefficient cited was -0.112, and
the value of R squared was 0.0413.
Suppose you were considering investing in two
shares A or B. A has produced a return over the
last four years of -5, and B has produced 5.
Use the regression model to predict which share
is likely to produce the better returns over the
next four years, and by how much. How sure would
you be?
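
One way to approach this exercise is sketched below. The constant of the model is not quoted above, so the individual predictions cannot be worked out, but the constant cancels when the two predictions are subtracted. The returns are treated as being in whatever units the figures -5 and 5 are given in, and the variable names are made up.

```python
import math

slope = -0.112        # regression coefficient quoted above
r_squared = 0.0413    # R squared quoted above

past_return_a = -5.0
past_return_b = 5.0

# Predicted return of A minus predicted return of B:
# (c + slope * a) - (c + slope * b) = slope * (a - b), so c drops out.
predicted_gap = slope * (past_return_a - past_return_b)

# The correlation has the same sign as the slope.
correl = -math.sqrt(r_squared)

print(predicted_gap, correl)
```

On these figures A is predicted to do better than B by about 1.12, but with R squared this low the model removes only about 4% of the MSE of simply predicting the mean, so the prediction deserves very little confidence.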

17
(No Transcript)
18
(No Transcript)
19
Further statistics
20
Mathematical notation

You may need to be familiar with some
mathematical notation for more advanced work
(this will not be required in the exam)
Sigma (summation) notation
Pi (product) notation
Use of a bar above a symbol for mean (average)
Subscripts RJK etc
Standard symbols n for sample size, t for time,
etc
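
For readers who find code easier to read than symbols, the notation above corresponds to simple operations. A minimal sketch with made-up numbers:

```python
import math

values = [2, 4, 6]   # x1, x2, x3 (hypothetical data)
n = len(values)      # n is the usual symbol for sample size

total = sum(values)           # Sigma notation: add up all the x values
product = math.prod(values)   # Pi notation: multiply all the x values
x_bar = total / n             # x with a bar above it: the mean

# A subscript picks out one value: x2 is the second value.
# (Python counts positions from 0, so it is values[1].)
x_2 = values[1]

print(total, product, x_bar, x_2)
```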

21
Covariances and variances

The attached handout explains what these are and
the relationships between them
You may need this to follow some mathematical
work in finance
It will not be directly assessed in the exam
(although it may improve your answers)
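
The handout itself is not reproduced in this transcript. As an illustration of the kind of relationship involved, the sketch below checks one standard identity that is widely used in finance: the variance of a sum equals the two variances plus twice the covariance. The return figures and helper names are made up.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Average product of the deviations of x and y from their means."""
    mean_x, mean_y = mean(xs), mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)


def variance(xs):
    """The variance is the covariance of a variable with itself."""
    return covariance(xs, xs)


# Hypothetical returns on two shares over five periods.
returns_a = [4.0, -1.0, 6.0, 2.0, 3.0]
returns_b = [1.0, 3.0, -2.0, 5.0, 0.0]
combined = [a + b for a, b in zip(returns_a, returns_b)]

lhs = variance(combined)
rhs = (variance(returns_a) + variance(returns_b)
       + 2 * covariance(returns_a, returns_b))

print(lhs, rhs)
```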

22
Formulae, computers and understanding (1)

You can usually get the answer (eg sd, correl,
regression coefficient)
with a computer, or
using the formula / method
The computer is quicker and more accurate, but
you may not understand what the answer means or
how to use it. This can be serious!

23
Formulae, computers and understanding (2)

Sometimes the formula / method will help you
understand what the answer means
Eg percentiles, Kendall correlation coefficients
Then it's a good idea to do simple examples with
formula/method to help you understand, then use a
computer
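
The Kendall correlation coefficient is a case where working through the method shows what the answer means. A minimal sketch, assuming the simplest version with no tied values and using five made-up pairs: look at every pair of data points and count how often the two variables move in the same direction.

```python
from itertools import combinations

# Hypothetical data with no tied values.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

concordant = 0   # pairs where x and y move in the same direction
discordant = 0   # pairs where they move in opposite directions

for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
    if (x2 - x1) * (y2 - y1) > 0:
        concordant += 1
    else:
        discordant += 1

# Kendall's coefficient: the balance of agreeing over disagreeing pairs,
# as a proportion of all pairs.
tau = (concordant - discordant) / (concordant + discordant)

print(concordant, discordant, tau)
```

A value of 1 would mean every pair agrees, -1 that every pair disagrees, and 0 that agreements and disagreements balance out.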

24
Formulae, computers and understanding (3)

Sometimes the formula / method will not help you
understand what the answer means
Eg formulae for a regression coefficient, and
normal distribution
Here you need much more mathematical background
to understand properly (especially the normal
distribution)
Then it's a good idea to
try to find an alternative approach which is
easier to follow (regression), or
Concentrate on understanding the formula/method
in intuitive terms (normal distribution)

25
What do I need to understand?

What the answer means, how it relates to the
inputs, assumptions made, and how it can be used
How to work it out with a computer (although in
an exam you will not have a computer and will not
be expected to remember details of computer
menus, etc)
In some cases, how to estimate a rough answer
For easy methods only, how to work it out without
a computer
