Lecture 10: Regression with one X variable

Provided by: pc208

1
Lecture 10: Regression with one X variable
  • Straight line (linear model) for predicting one
    variable from another
  • Predicted Y = constant + slope × X
  • y = c + bx
  • Uses method of least squares to choose best line
  • R squared measures accuracy of prediction
  • Slope tells you

2
Method of least squares
  • Take any values of c and b (constant and slope)
  • Work out prediction for each x in data
  • Work out error of prediction
  • Work out mean square error (MSE)
  • Choose c and b (constant and slope) to make MSE
    as low as possible.
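
The steps on this slide can be sketched in a few lines of Python. This is only an illustration: the six data points are made up, and the helper names predict and mse are hypothetical, not taken from pred1var.xls.

```python
# Hypothetical data for six people (made up for illustration only).
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 3, 5, 4, 6, 7]

def predict(c, b, x):
    """Prediction from the straight line y = c + b*x."""
    return c + b * x

def mse(c, b, xs, ys):
    """Mean square error of the line (c, b) over the data."""
    errors = [y - predict(c, b, x) for x, y in zip(xs, ys)]
    return sum(e * e for e in errors) / len(errors)

# Try two candidate lines; least squares prefers the one with the lower MSE.
first_guess = mse(0.0, 1.0, xs, ys)   # constant 0, slope 1
second_guess = mse(1.0, 1.0, xs, ys)  # constant 1, slope 1
print(first_guess, second_guess)
```

Here the second line has the lower MSE, so it is the better of the two guesses. Least squares carries on adjusting c and b until no further reduction is possible.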

3
How to apply the method of least squares
  • Use Excel Solver, as in spreadsheet pred1var.xls
  • Advantage is it shows you what's going on (and
    it's more flexible)
  • Use formulae derived from calculus: the
    traditional method.
  • The formulae are not helpful for understanding
    what's going on, so I will not be covering them
  • Easiest to use software with formulae built in -
    e.g. Excel
  • Tools > Data Analysis > Regression
  • If this isn't on the menu, use Tools > Add-Ins
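
The two routes above (a numerical search, and formulae from calculus) can be compared in a short Python sketch. Assumptions: the data are made up, the function names are hypothetical, and a simple gradient-descent loop stands in for the search. It is not the algorithm Excel Solver actually uses; it only illustrates adjusting c and b step by step to lower the MSE.

```python
# Hypothetical data for six people (made up for illustration only).
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 3, 5, 4, 6, 7]

def fit_by_search(xs, ys, steps=20000, rate=0.01):
    """Lower the MSE step by step (gradient descent, a stand-in for Solver)."""
    c, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errs = [c + b * x - y for x, y in zip(xs, ys)]
        grad_c = 2 * sum(errs) / n                             # d(MSE)/dc
        grad_b = 2 * sum(e * x for e, x in zip(errs, xs)) / n  # d(MSE)/db
        c -= rate * grad_c
        b -= rate * grad_b
    return c, b

def fit_by_formula(xs, ys):
    """Standard closed-form least squares formulae for one X variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    c = mean_y - b * mean_x
    return c, b

c_search, b_search = fit_by_search(xs, ys)
c_formula, b_formula = fit_by_formula(xs, ys)
print(c_search, b_search)
print(c_formula, b_formula)
```

Both routes should agree to several decimal places. The search shows what is going on; the formula is what built-in regression tools typically rely on.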

4
An example: think of a story
  • With six people / organisations / etc
  • And two numerical variables which may be related
    in an interesting way
  • Get the data, or make it up ...
  • Do a regression analysis to predict one variable
    from the other using pred1var.xls, and then the
    Regression Tool in Excel
  • Use the model to make a prediction
  • Now try doing it the other way round
  • Make sure you understand

5
Regression terminology
  • See table in the Word handout

6
Slope / regression coefficient / x coefficient
  • Interpretation obvious and important
  • The slope tells you
  • A negative slope means

7
R squared: easy version
  • R squared is the square of the correlation
    coefficient
  • Often used as a measure of how good the model is
  • R squared = 1 if correlation = 1 or -1: model
    very good
  • R squared = 0 if correlation = 0: model very bad
  • R squared = 0.5 means the model is half way
    between good and bad
  • R squared = 0.9 means it's good but not perfect
  • Etc.

8
R squared: more detail
  • Model based on a variable with zero correlation
    with the dependent variable would be completely
    useless
  • Best prediction here is the mean.
  • MSE = variance = square of sd. (See Pred1var.xls)
  • Model based on an exact straight line
    relationship is the best possible
  • correlation = 1 or -1
  • MSE = 0
  • A reasonable measure of the model is the
    reduction in MSE from the worst model (with MSE =
    variance)
  • I.e. the proportional reduction in MSE
  • This turns out to be the same as R squared
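
The claim that the proportional reduction in MSE equals the square of the correlation can be checked numerically. The sketch below uses made-up data and hypothetical variable names. Note that the variance here divides by n (not n - 1), so that it matches the MSE of predicting the mean.

```python
import math

# Hypothetical data for six people (made up for illustration only).
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 3, 5, 4, 6, 7]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Least squares line.
b = sxy / sxx
c = mean_y - b * mean_x

# Worst sensible model: always predict the mean of y, so MSE = variance of y.
mse_mean = syy / n
# MSE of the fitted straight line.
mse_line = sum((y - (c + b * x)) ** 2 for x, y in zip(xs, ys)) / n

r_squared = (mse_mean - mse_line) / mse_mean   # proportional reduction in MSE
correl = sxy / math.sqrt(sxx * syy)            # correlation coefficient

print(r_squared, correl ** 2)
```

The two printed numbers should agree up to rounding error.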

9
But don't forget the sample
  • Even a model with R squared = 0.9 or 1 may not be
    as good as it seems if the sample size is small
  • This is a separate issue which is not assessed by
    R squared
  • See work on hypothesis (significance) tests and
    confidence intervals

10
Edited output from the Excel Regression Tool for job
satisfaction data

11
Note that
  • This output is edited: either read a book on
    mathematical statistics, or ignore the rest of
    the output
  • E.g. we have ignored the t stat. This is just
    used to calculate p values and confidence
    intervals
  • (I have left the term standard error, although
    I won't be explaining it in detail. It's a term
    used for the standard deviation of something when
    you are using it as a measure of error.)
  • In practice you would always want a larger
    sample! But this illustrates the principle.

12
Multiple regression
  • Prediction model (linear) using several variables
  • Pred Y = const + slope1 × X1 + ... + slopen × Xn
  • y = c + b1 x1 + ... + bn xn
  • Uses method of least squares to choose best line
  • R squared (coefficient of determination) measures
    goodness of fit of model to data
  • Slopes tell you impact of each variable on
    dependent variable

13
Mostly same as with single variable regression
  • Least squares
  • Predmvar.xls or Excel Tool (need independent X
    variables in a block so you can select them all)
  • R squared
  • Slope for each variable: the predicted increase
    in the dependent variable if that variable is
    increased by one without changing the other
    variables
  • Category variables represented by 1/0, e.g. sexn
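
A minimal sketch of least squares with two X variables, one of them a 1/0 category variable. Assumptions: the data are artificial and constructed to lie exactly on y = 1 + 2 x1 + 3 x2 so the result can be checked; the function names are hypothetical; and the fit solves the standard normal equations directly, which is not necessarily what Predmvar.xls or the Excel tool does internally.

```python
def solve(a, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = m[i][n] - sum(m[i][k] * sol[k] for k in range(i + 1, n))
        sol[i] = s / m[i][i]
    return sol

def fit(rows, ys):
    """Return [constant, slope1, slope2, ...] minimising the MSE."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 gives the constant
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve(xtx, xty)

# Artificial data: x1 is numerical, x2 is a 1/0 category variable.
rows = [(1, 0), (2, 1), (3, 0), (4, 1), (5, 1), (6, 0)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

constant, slope1, slope2 = fit(rows, ys)
print(constant, slope1, slope2)
```

With this data slope2 comes out at about 3: the predicted difference between the group coded 1 and the group coded 0 when x1 is held fixed.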

14
Problems with regression
  • Model may not be reasonable (e.g. infant
    mortality and GNP)
  • Sample too small: coefficients unreliable (check
    confidence intervals)
  • Have you got the right variables?
  • Highly correlated variables can give misleading
    results
  • Too many variables
  • See reading for more detail

15
Uses of regression
  • Very widely used in research (over-used?)
  • Examples

16
Predicting returns from shares
  • Dissanaike (1999) produced a regression model to
    predict the return which investors would receive
    from investing in a particular security for a
    period of four years, from the return they would
    have received if they had invested in the same
    security in the previous four years. The data on
    which the model was based were the returns for a
    sample of large companies over consecutive
    periods of four years.
  • The regression coefficient cited was -0.112, and
    the value of R squared was 0.0413.
  • Suppose you were considering investing in two
    shares A or B. A has produced a return over the
    last four years of -5, and B has produced 5.
    Use the regression model to predict which share
    is likely to produce the better returns over the
    next four years, and by how much. How sure would
    you be?
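
One way to set up the arithmetic for this question, as a sketch only. The constant of the model is not quoted, so only the difference between the two predictions can be worked out (the constant cancels). The variable names are hypothetical, and the returns are taken in whatever units the figures -5 and 5 are quoted in.

```python
import math

slope = -0.112       # regression coefficient quoted on the slide
r_squared = 0.0413   # R squared quoted on the slide

past_a = -5          # past four-year return of share A
past_b = 5           # past four-year return of share B

# Predicted future return is c + slope * past return; c cancels in the gap.
predicted_gap = slope * (past_a - past_b)   # predicted A minus predicted B

# Size of the correlation implied by R squared (its sign follows the slope).
correl = -math.sqrt(r_squared)

print(round(predicted_gap, 3), round(correl, 3))
```

The gap is positive, so the model predicts that A, the past loser, will do better than B by about 1.12. With R squared as low as 0.0413 the model explains only around 4% of the variation, so the prediction deserves very little confidence.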

19
Further statistics

20
Mathematical notation
  • You may need to be familiar with some
    mathematical notation for more advanced work
    (this will not be required in the exam)
  • Sigma (summation) notation
  • Pi (product) notation
  • Use of a bar above a symbol for mean (average)
  • Subscripts RJK etc
  • Standard symbols: n for sample size, t for time,
    etc.

21
Covariances and variances
  • The attached handout explains what these are and
    the relationships between them
  • You may need this to follow some mathematical
    work in finance
  • It will not be directly assessed in the exam
    (although it may improve your answers)
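
For readers without the handout, the sketch below shows some standard identities linking covariance, variance, correlation and the regression slope. These are general textbook relationships and are not necessarily the ones the handout covers. The data and function names are made up, and variances divide by n.

```python
import math

# Hypothetical data for six people (made up for illustration only).
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 3, 5, 4, 6, 7]

def mean(values):
    return sum(values) / len(values)

def covariance(a, b):
    """Average product of deviations from the two means (divides by n)."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / len(a)

def variance(a):
    """The variance is the covariance of a variable with itself."""
    return covariance(a, a)

sd_x = math.sqrt(variance(xs))
sd_y = math.sqrt(variance(ys))

correl = covariance(xs, ys) / (sd_x * sd_y)   # correlation coefficient
slope = covariance(xs, ys) / variance(xs)     # least squares slope

print(variance(xs), covariance(xs, ys), correl, slope)
```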

22
Formulae, computers and understanding (1)
  • You can usually get the answer (e.g. sd,
    correlation, regression coefficient) either
    with a computer or using the formula / method
  • The computer is quicker and more accurate, but
    you may not understand what the answer means or
    how to use it. This can be serious!

23
Formulae, computers and understanding (2)
  • Sometimes the formula / method will help you
    understand what the answer means
  • E.g. percentiles, Kendall correlation
    coefficients
  • Then it's a good idea to do simple examples with
    the formula/method to help you understand, then
    use a computer

24
Formulae, computers and understanding (3)
  • Sometimes the formula / method will not help you
    understand what the answer means
  • E.g. formulae for a regression coefficient, and
    the normal distribution
  • Here you need much more mathematical background
    to understand properly (especially the normal
    distribution)
  • Then it's a good idea to either
  • try to find an alternative approach which is
    easier to follow (regression), or
  • concentrate on understanding the formula/method
    in intuitive terms (normal distribution)

25
What do I need to understand?
  • What the answer means, how it relates to the
    inputs, assumptions made, and how it can be used
  • How to work it out with a computer (although in
    an exam you will not have a computer and will not
    be expected to remember details of computer
    menus, etc)
  • In some cases, how to estimate a rough answer
  • For easy methods only, how to work it out without
    a computer