Regression Assumptions

Slides: 28

Provided by: AkosRo4

Learn more at: https://pages.ucsd.edu

1
Regression Assumptions
2
Best Linear Unbiased Estimate (BLUE)

If the following assumptions are met:
The model is
complete
linear
additive
Variables are
measured at an interval or ratio scale
measured without error
The regression error term
is unrelated to the predictors
is normally distributed
has an expected value of 0
is independent across cases
is homoscedastic
In a system of interrelated equations, the errors are unrelated to each other
Characteristics of OLS if the sample is a probability sample
Unbiased
Efficient

3
The Three Desirable Characteristics

Unbiased
$E(b) = \beta$, where $b$ is the sample coefficient and $\beta$ is the true population coefficient
On average we are on target
Efficient
The standard error is the smallest possible
Consistent
As N increases, the standard error decreases and the estimate closes in on the population value
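
The first and third characteristics can be illustrated with a small Monte Carlo sketch. Everything here is an assumption made for illustration (the population intercept 1.0 and slope 2.0, normally distributed X and errors, the sample sizes 50 and 1000); none of it comes from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
TRUE_A, TRUE_B = 1.0, 2.0  # assumed population intercept and slope


def ols_slope(n):
    """Draw one random sample of size n and return the OLS slope b."""
    x = rng.normal(size=n)
    y = TRUE_A + TRUE_B * x + rng.normal(size=n)
    # Simple-regression slope: cov(x, y) / var(x)
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)


def simulate(n, reps=2000):
    """Mean and standard deviation of b across many independent samples."""
    slopes = np.array([ols_slope(n) for _ in range(reps)])
    return slopes.mean(), slopes.std(ddof=1)


mean_small, sd_small = simulate(50)
mean_large, sd_large = simulate(1000)

print("N=50:   mean b =", round(mean_small, 3), " sd of b =", round(sd_small, 3))
print("N=1000: mean b =", round(mean_large, 3), " sd of b =", round(sd_large, 3))
```

In both runs the average of b should sit close to the assumed slope (unbiasedness), while the spread of b should be much smaller at N=1000 than at N=50 (consistency).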

4
Completeness
. regress API13 MEALS AVG_ED P_EL P_GATE EMER DMOB if AVG_ED>0 & AVG_ED<6, beta

      Source |       SS         df       MS          Number of obs =   10082
-------------+--------------------------------       F(  6, 10075) = 2947.08
       Model |  65503313.6       6  10917218.9       Prob > F      =  0.0000
    Residual |  37321960.3   10075  3704.41293       R-squared     =  0.6370
-------------+--------------------------------       Adj R-squared =  0.6368
       Total |   102825274   10081  10199.9081       Root MSE      =  60.864

------------------------------------------------------------------------
       API13 |      Coef.   Std. Err.       t    P>|t|             Beta
-------------+----------------------------------------------------------
       MEALS |   .1843877   .0394747     4.67   0.000         .0508435
      AVG_ED |   92.81476   1.575453    58.91   0.000         .6976283
        P_EL |   .6984374   .0469403    14.88   0.000         .1225343
      P_GATE |   .8179836   .0666113    12.28   0.000         .0769699
        EMER |  -1.095043   .1424199    -7.69   0.000         -.046344
        DMOB |   4.715438   .0817277    57.70   0.000         .3746754
       _cons |   52.79082   8.491632     6.22   0.000                .
------------------------------------------------------------------------
Meals
. regress API13 MEALS AVG_ED P_EL P_GATE EMER DMOB PCT_AA PCT_AI PCT_AS PCT_FI PCT_HI PCT_PI PCT_MR if AVG_ED>0 & AVG_ED<6, beta

      Source |       SS         df       MS          Number of obs =   10082
-------------+--------------------------------       F( 13, 10068) = 1488.01
       Model |    67627352      13     5202104       Prob > F      =  0.0000
    Residual |  35197921.9   10068  3496.01926       R-squared     =  0.6577
-------------+--------------------------------       Adj R-squared =  0.6572
       Total |   102825274   10081  10199.9081       Root MSE      =  59.127

------------------------------------------------------------------------
       API13 |      Coef.   Std. Err.       t    P>|t|             Beta
-------------+----------------------------------------------------------
       MEALS |    .370891   .0395857     9.37   0.000         .1022703
      AVG_ED |   89.51041   1.851184    48.35   0.000         .6727917
        P_EL |   .2773577   .0526058     5.27   0.000         .0486598
      P_GATE |   .7084009   .0664352    10.66   0.000         .0666584
        EMER |  -.7563048   .1396315    -5.42   0.000         -.032008
        DMOB |   4.398746   .0817144    53.83   0.000          .349512
      PCT_AA |  -1.096513   .0651923   -16.82   0.000        -.1112841
      PCT_AI |  -1.731408   .1560803   -11.09   0.000        -.0718944
      PCT_AS |   .5951273   .0585275    10.17   0.000         .0715228
      PCT_FI |   .2598189   .1650952     1.57   0.116         .0099543
      PCT_HI |   .0231088   .0445723     0.52   0.604         .0066676
      PCT_PI |  -2.745531   .6295791    -4.36   0.000        -.0274142
      PCT_MR |  -.8061266   .1838885    -4.38   0.000        -.0295927
       _cons |   96.52733   9.305661    10.37   0.000                .
------------------------------------------------------------------------
Parents education
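
One way to ask whether the seven added PCT_ variables improve the model as a block is a nested-model F-test. The slides do not report this test; the sketch below is an added illustration that only reuses the residual sums of squares and degrees of freedom printed in the two Stata outputs.

```python
# Residual sums of squares and degrees of freedom copied from the two outputs
rss_reduced = 37321960.3   # 6-predictor model
rss_full = 35197921.9      # 13-predictor model
df_full = 10068            # residual df of the 13-predictor model
q = 13 - 6                 # number of variables added

# F = [(RSS_reduced - RSS_full) / q] / [RSS_full / df_full]
f_stat = ((rss_reduced - rss_full) / q) / (rss_full / df_full)
print("F(%d, %d) = %.2f" % (q, df_full, f_stat))
```

A large F relative to the critical value for (7, 10068) degrees of freedom would indicate that the added variables jointly explain additional variance.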
5
Diagnosis and Remedy

Diagnosis
Theoretical
Remedy
Including new variables

6
Linearity

Violation of linearity
An almost perfect relationship will appear as a
weak one
Almost all linear relations stop being linear at
a certain point

7
Diagnosis & Remedy

Diagnosis
Visual: scatter plots
Comparing regressions with a continuous and a dummied independent variable
Remedy
Use dummies
$Y = a + bX + e$ becomes
$Y = a + b_1 D_1 + \dots + b_{k-1} D_{k-1} + e$, where X is broken up into k dummies ($D_i$) and k-1 of them are included. If the R-square of this equation is significantly higher than the R-square of the original, that is a sign of non-linearity. The pattern of the slopes ($b_i$) will indicate the shape of the non-linearity.
Transform the variables through a non-linear transformation, so that
$Y = a + bX + e$ becomes
Quadratic: $Y = a + b_1 X + b_2 X^2 + e$
Cubic: $Y = a + b_1 X + b_2 X^2 + b_3 X^3 + e$
Kth degree polynomial: $Y = a + b_1 X + \dots + b_k X^k + e$
Logarithmic: $Y = a + b \log(X) + e$, or
Exponential: $\log(Y) = a + bX + e$, or $Y = \exp(a + bX + e)$
Inverse: $Y = a + b/X + e$, etc.
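
The dummy-variable diagnosis can be sketched on simulated data. The data-generating process (a U-shaped relation on a 5-point X), the sample size and the helper `r_squared` are all assumptions made for this illustration, not part of the original example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.integers(1, 6, size=n)  # X takes the values 1..5, so k = 5
# Assumed truth: a U-shaped (non-linear) relation plus random error
y = 10 + 3 * (x - 3) ** 2 + rng.normal(scale=2, size=n)


def r_squared(design, y):
    """R-square of an OLS fit of y on the given design matrix."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)


ones = np.ones(n)
linear = np.column_stack([ones, x])
# k - 1 = 4 dummies; X == 1 is the omitted reference category
dummies = np.column_stack([ones] + [(x == v).astype(float) for v in range(2, 6)])

r2_linear = r_squared(linear, y)
r2_dummy = r_squared(dummies, y)
print("R-square, linear X:", round(r2_linear, 3))
print("R-square, dummies: ", round(r2_dummy, 3))
```

Here the linear model sees almost no relationship even though Y depends strongly on X, while the dummy model recovers it; the gap between the two R-squares is the signal of non-linearity.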

8
Example
9
Meaningless!
Turning point: $-b_1 / (2 b_2) = -(-3.666183) / (2 \times .0181756) \approx 100.85$
As you approach 100 the negative effect disappears
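
The turning point of the fitted quadratic can be checked directly from the two coefficients quoted on the slide. The function name `marginal_effect` is introduced here only for illustration.

```python
b1 = -3.666183   # coefficient of X, as quoted on the slide
b2 = 0.0181756   # coefficient of X squared, as quoted on the slide


def marginal_effect(x):
    """Slope of a + b1*X + b2*X^2 at a given X (the first derivative)."""
    return b1 + 2 * b2 * x


# The slope is zero where b1 + 2*b2*X = 0, i.e. at X = -b1 / (2*b2)
turning_point = -b1 / (2 * b2)
print("turning point:", round(turning_point, 2))
print("slope at X=0:  ", marginal_effect(0))
print("slope at X=100:", marginal_effect(100))
```

The slope is negative at low values of X and shrinks toward zero as X approaches the turning point.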
10
Additivity

$Y = a + b_1 X_1 + b_2 X_2 + e$
The assumption is that X1 and X2 each add to Y separately, regardless of the value of the other.
E.g. $\text{Inc} = a + b_1\,\text{Education} + b_2\,\text{Citizenship} + e$
Imagine that the effect of X1 depends on X2.
If Citizen: $\text{Inc} = a + b_1^{C}\,\text{Education} + e$
If Not Citizen: $\text{Inc} = a + b_1^{N}\,\text{Education} + e$
where $b_1^{C} > b_1^{N}$
You cannot simply add the two. If Citizenship takes only two values, their effect is multiplicative:
$\text{Inc} = a + b_1\,\text{Education} \times b_2\,\text{Citizenship} + e$
There are many examples of the violation of additivity:
E.g., the effect of previous knowledge (X1) and effort (X2) on grades (Y)
The effect of race and skills on income (discrimination)
The effect of paternal and maternal education on academic achievement

11
Diagnosis & Remedy

Diagnosis
Try other functional forms and compare R-squares
Remedy
Introducing the multiplicative term as a new variable, so
$Y_i = a + b_1 X_1 + b_2 X_2 + e$ becomes
$Y_i = a + b_1 X_1 + b_2 X_2 + b_3 Z + e$, where $Z = X_1 X_2$
Or transforming the equation into additive form:
If $Y = a X_1^{b_1} X_2^{b_2} e$ then
$\log Y = \log(a) + b_1 \log(X_1) + b_2 \log(X_2) + \log(e)$
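
The multiplicative-term remedy can be sketched on simulated data. The coefficients, the 0/1 coding of X2 and the sample size are assumptions chosen for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
x1 = rng.normal(size=n)            # e.g. a standardized education score
x2 = rng.integers(0, 2, size=n)    # e.g. a 0/1 group indicator
# Assumed truth: the slope of x1 is 1.0 when x2 = 0 and 1.0 + 0.8 when x2 = 1
y = 2.0 + 1.0 * x1 + 0.5 * x2 + 0.8 * x1 * x2 + rng.normal(size=n)

z = x1 * x2                        # the multiplicative term Z = X1 * X2
design = np.column_stack([np.ones(n), x1, x2, z])
a, b1, b2, b3 = np.linalg.lstsq(design, y, rcond=None)[0]

slope_group0 = b1        # effect of x1 when x2 = 0
slope_group1 = b1 + b3   # effect of x1 when x2 = 1
print("slope of x1 when x2 = 0:", round(slope_group0, 3))
print("slope of x1 when x2 = 1:", round(slope_group1, 3))
```

The coefficient b3 on Z estimates how much the effect of X1 differs between the two values of X2.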

12
Example with one dummy variable
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .720(a)   .519       .519                70.918
a. Predictors: (Constant), ESCHOOL, AVG_ED

Does parents' education matter more in elementary school or later?

Coefficients(a)
                  Unstandardized Coefficients   Standardized Coefficients
Model             B          Std. Error         Beta        t          Sig.
1 (Constant)      510.030    2.738                          186.250    .000
  AVG_ED          87.476     .930               .649        94.085     .000
  ESCHOOL         54.352     1.424              .264        38.179     .000
a. Dependent Variable: API13

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .730(a)   .533       .533                69.867
a. Predictors: (Constant), INTESXED, AVG_ED, ESCHOOL

Coefficients(a)
                                Unstandardized Coefficients   Standardized Coefficients
Model                           B          Std. Error         Beta        t          Sig.
1 (Constant)                    454.542    4.151                          109.497    .000
  AVG_ED                        107.938    1.481              .801        72.896     .000
  ESCHOOL                       145.801    5.386              .707        27.073     .000
  AVG_ED*ESCHOOL (interaction)  -33.145    1.885              -.495       -17.587    .000
a. Dependent Variable: API13
13
Equations

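$\text{Pred}(\text{API13}_i) = 454.542 + 107.938\,\text{AVG\_ED}_i + 145.801\,\text{ESCHOOL}_i + (-33.145)\,\text{AVG\_ED}_i \times \text{ESCHOOL}_i$
IF ESCHOOL = 1, i.e. the school is an elementary school:
$\text{Pred}(\text{API13}_i) = 454.542 + 107.938\,\text{AVG\_ED}_i + 145.801 \times 1 + (-33.145)\,\text{AVG\_ED}_i \times 1$
$= 454.542 + 107.938\,\text{AVG\_ED}_i + 145.801 + (-33.145)\,\text{AVG\_ED}_i$
$= (454.542 + 145.801) + (107.938 - 33.145)\,\text{AVG\_ED}_i$
$= 600.343 + 74.793\,\text{AVG\_ED}_i$
IF ESCHOOL = 0, i.e. the school is not an elementary but a middle or high school:
$\text{Pred}(\text{API13}_i) = 454.542 + 107.938\,\text{AVG\_ED}_i + 145.801 \times 0 + (-33.145)\,\text{AVG\_ED}_i \times 0$
$= 454.542 + 107.938\,\text{AVG\_ED}_i$
The effect of parental education is larger after elementary school!
Is this difference statistically significant?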

Coefficients(a)
                                Unstandardized Coefficients   Standardized Coefficients
Model                           B          Std. Error         Beta        t          Sig.
1 (Constant)                    454.542    4.151                          109.497    .000
  AVG_ED                        107.938    1.481              .801        72.896     .000
  ESCHOOL                       145.801    5.386              .707        27.073     .000
  AVG_ED*ESCHOOL (interaction)  -33.145    1.885              -.495       -17.587    .000
a. Dependent Variable: API13
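
The two group-specific equations and the t-value of the interaction can be re-derived from the coefficient table. The dictionary keys and the helper `group_equation` are names made up for this sketch; the numbers are the ones in the table.

```python
# Coefficients and standard error copied from the interaction model
coef = {
    "const": 454.542,
    "AVG_ED": 107.938,
    "ESCHOOL": 145.801,
    "INTERACTION": -33.145,
}
se_interaction = 1.885


def group_equation(eschool):
    """Intercept and AVG_ED slope implied for ESCHOOL = 0 or 1."""
    intercept = coef["const"] + coef["ESCHOOL"] * eschool
    slope = coef["AVG_ED"] + coef["INTERACTION"] * eschool
    return intercept, slope


elementary = group_equation(1)
other = group_equation(0)
# t is the coefficient divided by its standard error; small differences
# from the printed t come from rounding in the table
t_interaction = coef["INTERACTION"] / se_interaction

print("elementary:  intercept %.3f, slope %.3f" % elementary)
print("middle/high: intercept %.3f, slope %.3f" % other)
print("t for the interaction: %.2f" % t_interaction)
```

The interaction coefficient is exactly the difference between the two slopes, so its t-test is the test of whether the slopes differ.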
14
Example with continuous variables
15
Proper Level of Measurement
16
Measurement Error

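Take $Y = a + bX + e$
Suppose $X' = X + u$, where $X$ is the real value and $u$ is a random measurement error
Then $Y = a + b'X' + e$ → $Y = a + b'(X + u) + e = a + b'X + b'u + e$ → $Y = a + b'X + E$, where $E = b'u + e$ and $b' = b$
The slope ($b$) will not change but the error will increase as a result
(Strictly, when $Y$ is regressed on the observed $X'$, $E$ contains $u$ and is therefore correlated with $X'$, so the estimated slope is also pulled toward zero.)
Our R-square will be smaller
Our standard errors will be larger → t-values smaller → significance smaller
Suppose $X' = X + cW + u$, where $W$ is a systematic measurement error and $c$ is a weight
Then $Y = a + b'X' + e$ → $Y = a + b'(X + cW + u) + e = a + b'X + b'cW + E$
$b' = b$ iff $r_{WX} = 0$ or $r_{WY} = 0$; otherwise $b' \neq b$, which means that the slope will change together with the increase in the error. Apart from the problems stated above, that means that
Our slope will be wrong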
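
A small simulation can make the consequences of random measurement error in X concrete. The setup is assumed for illustration: a true slope of 2.0, and a measurement error with the same variance as the true X. In this setup the R-square drops, and the slope estimated from the mismeasured predictor also shrinks toward zero.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20000
x_true = rng.normal(size=n)
y = 1.0 + 2.0 * x_true + rng.normal(size=n)
# Observed X = true X + random measurement error (assumed variance 1)
x_noisy = x_true + rng.normal(size=n)


def fit(x, y):
    """Return the OLS slope and R-square of y regressed on x."""
    design = np.column_stack([np.ones(len(x)), x])
    coef = np.linalg.lstsq(design, y, rcond=None)[0]
    resid = y - design @ coef
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return coef[1], r2


slope_true, r2_true = fit(x_true, y)
slope_noisy, r2_noisy = fit(x_noisy, y)
print("true X:  slope %.3f, R-square %.3f" % (slope_true, r2_true))
print("noisy X: slope %.3f, R-square %.3f" % (slope_noisy, r2_noisy))
```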

17
Diagnosis & Remedy

Diagnosis
Look at the correlation of the measure with other
measures of the same variable
Remedy
Use multiple indicators and structural equation
models (AMOS)
Confirmatory factor analysis
Better measures

18
Normally Distributed Error
19
Non-Normal Error

Our calculations of statistical significance depend on this assumption
Statistical inference can be robust even when the error is non-normal
Diagnosis
You can look at the distribution of the error.
Because of the homoscedasticity assumption (see
later) the error when summed up for each
prediction should be also normal. (In principle,
we have multiple observations for each
prediction.)
Remember! Our measured variables (Y and X) do not
have to have a normal distribution! Only the
error for each prediction.
Remedy
Any non-linear transformation will change the
shape of the distribution of the error

20
Example Count Data
Variable                    N      Minimum   Maximum   Mean   Std. Deviation
childs NUMBER OF CHILDREN   1751   0         8         1.89   1.665
DEPENDENT VARIABLE
Underdispersion: Mean/Variance > 1
Overdispersion: Mean/Variance < 1
Here Mean = 1.89 and Variance = 1.665² ≈ 2.77, so Mean < Variance and we have a case of (mild) overdispersion
21
Poisson and Negative Binomial Regressions
Poisson assumes Mean = Variance (no over- or underdispersion)
Negative Binomial does not make this assumption
Log of expected counts is now the unit of the dependent variable
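
The dispersion check can be done from the descriptive statistics alone. Note that the Poisson distribution has its mean equal to its variance, so the comparison below uses the squared standard deviation. This is only a rough check on the raw counts, not on the counts conditional on predictors.

```python
mean = 1.89       # mean number of children, from the descriptives
std_dev = 1.665   # standard deviation, from the descriptives

variance = std_dev ** 2
dispersion_ratio = variance / mean   # equals 1 under a Poisson distribution

print("variance:        ", round(variance, 3))
print("variance / mean: ", round(dispersion_ratio, 3))
if dispersion_ratio > 1:
    print("variance exceeds the mean: overdispersion")
elif dispersion_ratio < 1:
    print("variance is below the mean: underdispersion")
```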
22
Error Has a Non-Zero Mean

The solid line gives a negative mean, the dotted line a positive mean
This can happen when we have some selection
problem
Diagnosis
Visual scatter plot will not help unless we know
in advance somehow the true regression line
Remedy
If it is a selection problem try to address it.

23
Non-independent errors

Example 1: Suppose you take a survey of 10 people but you interview everyone 10 times.
Now your N = 100 but your errors are not independent. For the same person you will have similar errors.
Example 2: Suppose you take 10 countries and you observe them in 10 different time periods.
Now your N = 100 but your errors are not independent. For the same country you will have similar errors.
Example 3: Suppose you take 100 countries and you observe them only once. Now your N = 100. But countries that are next to each other are often similar (same geography and climate, similar history, cooperation etc.). If your model underpredicts Denmark, it is likely to underpredict Sweden as well.
Example 4: Suppose you take 100 people but they are all couples, so what you really have is 50 couples. Husband and wife tend to be similar. If your model underestimates one, chances are it does the same for the other. Spouses have similar errors.
Statistical inference assumes that each case is independent of the others, and in the examples above this is not the case. In fact, your effective N < 100.
This biases your standard errors downward because the formula is tricked into believing that you have a larger sample than you actually have, and larger samples give smaller standard errors and better statistical significance.
This may also bias your estimates of the intercept and the slope. Non-linearity is a special case of correlated errors.
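
How far the effective N falls below the nominal N can be roughed out with the Kish design-effect approximation, n_eff = n / (1 + (m - 1) * rho), where m is the cluster size and rho the within-cluster correlation of the errors. This formula is not from the slides; it assumes equal-sized clusters, and the value rho = 0.5 is made up for the illustration.

```python
def effective_n(n_total, cluster_size, rho):
    """Kish approximation of the effective sample size for clustered cases."""
    design_effect = 1 + (cluster_size - 1) * rho
    return n_total / design_effect


# Example 1: 10 people interviewed 10 times each (rho is an assumed value)
n_eff_survey = effective_n(100, 10, 0.5)
# Example 4: 50 couples, i.e. clusters of size 2 (rho is an assumed value)
n_eff_couples = effective_n(100, 2, 0.5)
# With no within-cluster correlation nothing is lost
n_eff_independent = effective_n(100, 10, 0.0)

print("repeated interviews:", round(n_eff_survey, 1))
print("couples:            ", round(n_eff_couples, 1))
print("independent cases:  ", round(n_eff_independent, 1))
```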

24
Diagnosis & Remedy

It is called autocorrelation because the
correlation is between cases and not variables,
although autocorrelations often can be traced to
certain variables such as common geographic
location or same country or person or family.
Diagnosis
Visual, scatterplot
Checking groups of cases that are theoretically
suspect
Certain forms of serial or spatial
autocorrelations can be diagnosed by calculating
certain statistics (e.g., Durbin-Watson test)
Remedy
You can include new variables in the equation
E.g. for serial (temporal) correlation you can
include the value of Y in t-1 as an independent
variable
For spatial correlation we can often model the relationships by introducing a weight matrix
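
The Durbin-Watson statistic mentioned above is the sum of squared differences between successive residuals divided by the sum of squared residuals. A minimal sketch follows; the two simulated series stand in for regression residuals ordered in time, and the autocorrelation of 0.8 is an assumed value.

```python
import numpy as np


def durbin_watson(resid):
    """Durbin-Watson statistic for residuals ordered in time."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)


rng = np.random.default_rng(4)
n = 2000
white = rng.normal(size=n)          # independent errors

# Serially correlated errors: each error carries over 0.8 of the previous one
ar1 = np.empty(n)
ar1[0] = white[0]
for t in range(1, n):
    ar1[t] = 0.8 * ar1[t - 1] + white[t]

dw_white = durbin_watson(white)
dw_ar1 = durbin_watson(ar1)
print("independent errors:      ", round(dw_white, 2))
print("positively autocorrelated:", round(dw_ar1, 2))
```

Values near 2 are consistent with no first-order serial correlation; values well below 2 point to positive autocorrelation.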

25
Heteroscedasticity

Homoscedasticity means equal variance
Heteroscedasticity means unequal variance
We assume that each prediction is not just on
target on average but also that we make the same
amount of error
Heteroscedasticity results in biased standard
errors and statistical significance
Diagnosis
Visual, scatter plot
Remedy
Introducing a weight matrix (e.g. using 1/X)
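
The 1/X weighting can be sketched as weighted least squares on simulated data. The example assumes that the error variance is proportional to X, which is the case in which weights of 1/X are appropriate; the coefficients and sample size are made up.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000
x = rng.uniform(1, 10, size=n)
# Assumed heteroscedasticity: error variance grows in proportion to x
y = 3.0 + 2.0 * x + rng.normal(size=n) * np.sqrt(x)

design = np.column_stack([np.ones(n), x])
ols = np.linalg.lstsq(design, y, rcond=None)[0]

# Weighted least squares with weights 1/x: multiply each row by sqrt(weight)
w = 1.0 / x
sw = np.sqrt(w)
wls = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)[0]

# Diagnosis: the spread of the OLS residuals differs across the range of x
resid = y - design @ ols
spread_low = resid[x < 4].std()
spread_high = resid[x > 7].std()

print("OLS slope:", round(ols[1], 3), " WLS slope:", round(wls[1], 3))
print("residual sd for x < 4:", round(spread_low, 2))
print("residual sd for x > 7:", round(spread_high, 2))
```

Both slopes should land near the assumed 2.0; the point of the weighting is to restore equal error variance so that the standard errors can be trusted.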

26
Predictor Related to Error

Error represents all factors influencing Y that
are not included in the regression equation
If an omitted variable is related to X the
assumption is violated. This is the same as the
Completeness or Omitted Variable Problem
Diagnosis
The estimated error (the residual) will ALWAYS be uncorrelated with X by construction; there is no way to establish the TRUE error
Theoretical
Remedy
Adding new variables to the model
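
Why the diagnosis has to be theoretical can be seen in a simulated example. All numbers are assumptions for illustration: Z is an omitted variable related to X, and the true effect of X is 2.0.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 10000
x = rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n)       # omitted variable, related to x
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)


def ols(design, y):
    """OLS coefficients for the given design matrix."""
    return np.linalg.lstsq(design, y, rcond=None)[0]


short = np.column_stack([np.ones(n), x])      # model that omits z
full = np.column_stack([np.ones(n), x, z])    # complete model

b_short = ols(short, y)
b_full = ols(full, y)

# Residuals of the incomplete model are still uncorrelated with x
resid_short = y - short @ b_short
corr_resid_x = np.corrcoef(resid_short, x)[0, 1]

print("slope of x, z omitted: ", round(b_short[1], 3))
print("slope of x, z included:", round(b_full[1], 3))
print("corr(residual, x), z omitted:", round(corr_resid_x, 10))
```

The incomplete model gives a clearly wrong slope for X, yet its residuals show no correlation with X, so the data alone cannot reveal the problem.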

27
Correlated errors across interrelated equations

We sometimes estimate more than one regression.
Suppose $Y_t = a + b_1 X_{t-1} + b_2 Z_{t-1} + e$ but
$X_t = a' + b_1' Y_{t-1} + b_2' Z_{t-1} + e'$
$e$ and $e'$ will be correlated
(whatever is omitted from both equations will show up in both $e$ and $e'$, making them correlated)
This is also the case in sample selection models
$S = a + b_1 X + b_2 Z + e$, where $S$ is whether one is selected into the sample
$Y = a' + b_1' X + b_2' Z + b_3' W + b_4' V + e'$, where $Y$ is the outcome of interest
$e$ and $e'$ will be correlated
(whatever is omitted from both equations will show up in both $e$ and $e'$, making them correlated)