Regression - PowerPoint PPT Presentation

Slides: 17
Provided by: halva

Transcript and Presenter's Notes

1
Regression
  • Hal Varian
  • 10 April 2006

2
What is regression?
  • History
  • Curve fitting v statistics
  • Correlation and causation
  • Statistical models
  • Gauss-Markov theorem
  • Maximum likelihood
  • Conditional mean
  • What can go wrong
  • Examples

3
Francis Galton, 1877
  • Plotted first regression line
  • Diameter of sweetpeas v diameter of parents
  • Heights of fathers v heights of sons
  • Sons of unusually tall fathers tend to be tall,
    but shorter than their fathers. Galton called
    this regression to mediocrity.
  • But this is also true the other way around!
    Regression to the mean fallacy.
  • Pick the lowest scoring 10% on the midterm and
    give them extra tutoring
  • If they do better on the final, what can you
    conclude? Did the tutoring help?
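
The tutoring question can be explored with a small simulation. This is only a sketch under made-up assumptions: each score is a persistent skill plus independent exam-day luck, and there is no tutoring effect at all. The names ability, midterm and final, and all the numbers, are illustrative.

```r
# Hypothetical setup: no tutoring effect exists by construction.
set.seed(1)
n <- 1000
ability <- rnorm(n, mean = 70, sd = 10)   # persistent skill (assumed)
midterm <- ability + rnorm(n, sd = 10)    # skill plus exam-day luck
final   <- ability + rnorm(n, sd = 10)    # same skill, fresh luck

# Pick the lowest scoring 10% on the midterm
low <- midterm <= quantile(midterm, 0.10)

mean_midterm_low <- mean(midterm[low])
mean_final_low   <- mean(final[low])
cat("bottom 10%: midterm mean", round(mean_midterm_low, 1),
    "final mean", round(mean_final_low, 1), "\n")
```

The selected group scores higher on the final even though nothing was done for them, so an improvement after tutoring does not by itself show that the tutoring helped.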

4
Regression analysis
  • Assume a linear relation between two variables
    and estimate unknown parameters
  • y_t = a + b x_t + e_t, for t = 1, ..., T
  • y_t: observed; a + b x_t: fitted; e_t: error or
    residual
  • y_t: dependent variable; x_t: independent
    variables/predictors/correlates

5
Curve fitting v regression
  • Often choose (a,b) to minimize the sum of squared
    residuals (least squares)
  • Why not absolute value of residuals?
  • Why not fit x_t = a + b y_t?
  • How much can you trust the estimated values?
  • Need a statistical model to answer these
    questions!
  • Linear regression: linear in parameters
  • Nonlinear regression, local regression, general
    linear model, general additive model: same
    principles apply
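
As a rough illustration of the least squares choice of (a,b), and of why fitting x on y is a different problem, here is a minimal sketch on made-up data. The true values (3, 2), the noise level and the variable names are assumptions for the example only.

```r
set.seed(2)
x <- 1:50
y <- 3 + 2 * x + rnorm(50, sd = 5)   # made-up data, true (a, b) = (3, 2)

# Least squares: choose (a, b) to minimize sum((y - a - b * x)^2).
# For one regressor the minimizer has a closed form.
b_hat <- cov(x, y) / var(x)
a_hat <- mean(y) - b_hat * mean(x)

# lm() solves the same problem
fit <- lm(y ~ x)
print(c(a_hat, b_hat))
print(coef(fit))

# Fitting x on y minimizes horizontal rather than vertical distances,
# so the slope it implies for y against x is not b_hat
b_rev <- coef(lm(x ~ y))[["y"]]
print(c(y_on_x = b_hat, implied_by_x_on_y = 1 / b_rev))
```

The two slopes coincide only when the points lie exactly on a line, which is one reason a statistical model is needed to say which fit answers the question being asked.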

6
Possible goals
  • Estimate parameters (a, b, and error variance)
  • Test hypotheses (such as "x has no influence on
    y")
  • Make predictions about y conditional on observing
    a new x-value
  • Summarize data (most common unstated goal!)

7
Summarizing relationships
  • Would like to be able to interpret regression as
    causal
  • If x changes by Δx, then y will on average
    change by Δx · b.
  • Correlation v causation
  • Compare the time on my wristwatch with the time
    on your wristwatch
  • Even ideally, best you can say is:
  • "When x changes by Δx in the sample, then on
    average y changes by Δx · b in the sample."

8
Problem with causality
  • There may be a third cause
  • my watch time and your watch time both depend
    on NIST time
  • Economics example
  • income = b · education + (unobserved: IQ + other)
  • education depends on IQ
  • Higher income is associated with higher education
    in sample, but b is a biased estimate of partial
    effect of education on income
  • Need a controlled experiment or more elaborate
    estimation technique to resolve this
    simultaneous equations bias
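
The income and education example can be sketched as a simulation. Everything numeric here is an assumption made for illustration: the true partial effect of education is set to 1, and the variable iq plays the role of the unobserved third cause.

```r
set.seed(3)
n <- 5000
iq        <- rnorm(n)                               # unobserved third cause
education <- 0.8 * iq + rnorm(n)                    # education depends on IQ
income    <- 1.0 * education + 2.0 * iq + rnorm(n)  # true b = 1 (made up)

# Regression that omits IQ v regression that includes it
b_short <- coef(lm(income ~ education))[["education"]]
b_long  <- coef(lm(income ~ education + iq))[["education"]]
print(c(iq_omitted = b_short, iq_included = b_long))
```

With IQ left out, the education coefficient also picks up the effect of IQ on income and lands well above 1; with IQ included it is close to 1. In real data IQ is not observed, which is why the slide calls for an experiment or a more elaborate technique.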

9
Statistical regression model
  • y_t = a + b x_t + e_t, for t = 1, ..., T
  • Think of random variable et as the sum of the
    other omitted effects
  • What are attractive properties for error term?
  • E[e_t] = 0
  • Var(e_t) = constant
  • E[e_t e_s] = 0 for t ≠ s (errors are uncorrelated
    across observations)
  • E[x_t e_t] = 0 (errors are uncorrelated with
    explanatory variables; often problematic for
    reasons on last slide! Exogenous v endogenous.)
  • Have to ask: how do the variables you don't
    observe affect the variables you do observe?

10
Optimality properties
  • Gauss-Markov theorem: if the error term has these
    properties, then the linear regression estimates
    of (a,b) are BLUE (best linear unbiased
    estimates): out of all unbiased estimates that
    are linear in y_t, the least squares estimates
    have minimum variance.
  • If the e_t are IID Normal, then the OLSQ
    (ordinary least squares) estimates are maximum
    likelihood estimates
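
The maximum likelihood claim can be checked with a short derivation. It is a sketch in notation the slides do not use: σ² stands for the constant error variance and L for the likelihood of the sample under IID Normal errors.

```latex
\begin{align*}
\log L(a, b, \sigma^2)
  &= \sum_{t=1}^{T} \log\!\left[ \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left( -\frac{(y_t - a - b x_t)^2}{2\sigma^2} \right) \right] \\
  &= -\frac{T}{2} \log(2\pi\sigma^2)
     - \frac{1}{2\sigma^2} \sum_{t=1}^{T} (y_t - a - b x_t)^2 .
\end{align*}
```

For any fixed σ², only the last term depends on (a,b), and it enters with a negative sign, so maximizing the likelihood over (a,b) is the same as minimizing the sum of squared residuals.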

11
Conditional means
  • In the regression model, note that the expected
    value of y_t is a + b x_t. So the conditional
    mean is linear in x_t, which is another
    interpretation of regression.
  • More generally, can think of regression model as
    being E[y_t | x_t] = f(x_t, b)

12
Regression output
  • Estimates of parameters
  • Standard errors of estimates and error term
  • t-statistics (estimate/se) and p-values
  • R^2: goodness of fit measure
  • Total SS = Fitted SS + Residual SS
  • R^2 = Fitted SS / Total SS
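
The quantities in the regression output can be recomputed by hand as a check. A minimal sketch on simulated data (the seed and the data are assumptions, not the deck's own run), using only standard R accessors:

```r
set.seed(4)
x <- 1:100
y <- x + 10 * rnorm(100)     # simulated data, for illustration only
fit <- lm(y ~ x)

# Sums of squares, all measured around the mean of y
total_ss    <- sum((y - mean(y))^2)
fitted_ss   <- sum((fitted(fit) - mean(y))^2)
residual_ss <- sum(residuals(fit)^2)

r2      <- fitted_ss / total_ss
t_stats <- coef(fit) / sqrt(diag(vcov(fit)))   # estimate / se

print(c(total = total_ss, fitted_plus_residual = fitted_ss + residual_ss))
print(c(by_hand = r2, from_summary = summary(fit)$r.squared))
print(t_stats)
```

The decomposition Total SS = Fitted SS + Residual SS relies on the model including an intercept.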

13
Example from R
> x <- 1:100
> y <- x + 10*rnorm(100)
> summary(lm(y~x))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.81944    1.86779   0.974    0.332
x            0.97354    0.03211  30.319   <2e-16
Residual standard error: 9.269 on 98 degrees of freedom
Multiple R-Squared: 0.9037, Adjusted R-squared: 0.9027

14
What can go wrong?
  • Nonlinear relationship
  • Try quadratic, interaction term, logs, etc.
  • Var(e_t) is not constant
  • Heteroskedasticity: affects testing, not
    estimates
  • Take logs or use weighted least squares
  • Serial correlation: affects testing and
    prediction accuracy
  • Use time series methods
  • Multiple regression: collinearity
  • socks = right shoes + left shoes + shoes + error
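
The socks example can be tried directly. This is a sketch with invented counts: shoes is constructed as exactly right shoes plus left shoes, and the relationship between socks and shoes is made up.

```r
set.seed(5)
n <- 200
right_shoes <- rpois(n, 5)
left_shoes  <- rpois(n, 5)
shoes       <- right_shoes + left_shoes   # exact linear combination
socks       <- 2 * shoes + rnorm(n)       # made-up relationship

fit <- lm(socks ~ right_shoes + left_shoes + shoes)
# The three regressors carry only two columns of information, so
# lm() cannot estimate all three coefficients and reports NA for one
print(coef(fit))
```

With regressors that are nearly, rather than exactly, collinear, lm() returns estimates for all of them, but the standard errors become very large.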

15
What can go wrong, cont
  • Errors in variables
  • Underestimate magnitude of true effect
  • Omitted variable bias
  • Bias depending on correlation of omitted with
    included variables
  • Simultaneous equations bias
  • Third cause alluded to earlier, need to estimate
    full model or use controlled experiment
  • Outliers
  • Non-normality of errors and influential
    observations: remove them or use robust
    estimation
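
The errors-in-variables point can be illustrated with a simulation. It is a sketch only: the true effect of 2 and the amount of measurement noise are assumptions chosen for the example, and x_true would not be observable in practice.

```r
set.seed(6)
n <- 5000
x_true <- rnorm(n)
y      <- 2 * x_true + rnorm(n)   # true effect b = 2 (made up)
x_obs  <- x_true + rnorm(n)       # x recorded with measurement error

b_clean <- coef(lm(y ~ x_true))[["x_true"]]
b_noisy <- coef(lm(y ~ x_obs))[["x_obs"]]
print(c(true_x = b_clean, mismeasured_x = b_noisy))
```

The coefficient on the mismeasured x is pulled toward zero, and the pull grows with the variance of the measurement error relative to the variance of the true x.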

16
Diagnostics
  • Look at residuals!!
  • R allows you to plot various regression
    diagnostics
  • reg <- lm(y ~ x)
  • plot(reg)
  • Examples to follow
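
A minimal sketch of the diagnostic workflow, on simulated data rather than the deck's own examples. The pdf(NULL) line is an assumption added so the script also runs without a display; remove it and the dev.off() line to see the plots in an interactive session.

```r
set.seed(7)
x <- 1:100
y <- x + 10 * rnorm(100)   # simulated data, for illustration only
reg <- lm(y ~ x)

pdf(NULL)                  # assumption: discard the plots when run as a script
par(mfrow = c(2, 2))       # put the default diagnostic plots on one page
plot(reg)                  # residuals v fitted, normal Q-Q, scale-location, leverage
invisible(dev.off())

# Least squares residuals (with an intercept) sum to zero and are
# uncorrelated with x by construction, so look for patterns, not averages
print(sum(residuals(reg)))
print(cor(residuals(reg), x))
```

Curvature in the residuals v fitted plot suggests a nonlinear relationship, and a funnel shape suggests non-constant variance.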