Title: Issues Regarding Regression Models
1. Issues Regarding Regression Models
2. Collinearity
- A perfect linear relationship between two (or more) independent variables is called collinearity (multicollinearity).
- Under this condition, the least-squares regression coefficients cannot be uniquely defined.
3. Collinearity
- A strong but less-than-perfect linear relationship between the independent variables can cause:
- regression coefficients to be unstable,
- standard errors of the coefficients to become large; hence, confidence intervals for the coefficients become wide and the estimates imprecise.
4. Collinearity Measurement
- One measure of the impact of collinearity on the precision of the estimates is the Variance Inflation Factor (VIF).
- In SPSS: Regression / Linear / Statistics / (check) Collinearity
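The VIF for predictor j is 1/(1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. A minimal NumPy sketch (the `vif` helper and the simulated data are my own illustration, not part of the lecture):

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor for each column of predictor matrix X.
    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j on the remaining predictors (with an intercept)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - (resid @ resid) / (((y - y.mean()) ** 2).sum())
        out.append(1.0 / (1.0 - r2))
    return out

# Two nearly collinear predictors plus one unrelated predictor
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.05, size=50)   # almost a copy of x1
x3 = rng.normal(size=50)
vifs = vif(np.column_stack([x1, x2, x3]))
```

With two nearly collinear predictors, their VIFs blow up far past the rule-of-thumb threshold of 5 (from the next slide), while the unrelated predictor stays near 1.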
5. Collinearity Effects
- Wrong signs for the coefficients.
- Drastic changes in the size and/or sign of the coefficients as a new variable is added to the equation.
- High VIF (VIF > 5) or low tolerance (< 0.1) are indicators of collinearity.
6. Collinearity Remedies
- There is no quick fix for collinearity. Some strategies:
- 1. Variable selection for the model: based on the correlation matrix, some of the highly correlated variables could be excluded from the model.
- 2. Ridge regression instead of Ordinary Least Squares (OLS) regression.
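To illustrate remedy 2: ridge regression adds a penalty lam*||b||^2 to the least-squares criterion, which has the closed form b = (X'X + lam*I)^(-1) X'y and keeps the coefficients stable under collinearity. A sketch under my own assumptions (function name, centering convention, and simulated data are illustrative):

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimate: minimize ||y - Xb||^2 + lam * ||b||^2.
    Centering X and y first leaves the intercept unpenalized."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    k = X.shape[1]
    b = np.linalg.solve(Xc.T @ Xc + lam * np.eye(k), Xc.T @ yc)
    b0 = y.mean() - X.mean(axis=0) @ b
    return b0, b

# Two almost identical predictors: OLS coefficients are unstable
rng = np.random.default_rng(1)
x1 = rng.normal(size=40)
x2 = x1 + rng.normal(scale=0.01, size=40)
X = np.column_stack([x1, x2])
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(scale=0.5, size=40)

b0_ols, b_ols = ridge(X, y, 0.0)   # lam = 0 reduces to OLS
b0_r, b_r = ridge(X, y, 1.0)       # small penalty shrinks the estimates
```

The penalized coefficients have a smaller norm than the OLS ones, while their sum still tracks the combined effect of the two collinear predictors.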
7. Unusual Data
- A single observation that is substantially different from all other observations can make a large difference in the results of your regression analysis.
- If a single observation (or a small group of observations) substantially changes your results, you would want to know about it and investigate further.
- There are three ways that an observation can be unusual.
8. Unusual Data
- Outliers: In linear regression, an outlier is an observation with a large residual. In other words, it is an observation whose dependent-variable value is unusual given its values on the predictor variables. An outlier may indicate a sample peculiarity, a data entry error, or some other problem.
9. Unusual Data
- Leverage: An observation with an extreme value on a predictor variable is called a point with high leverage. Leverage is a measure of how far an independent variable deviates from its mean. Such points can have an unusually large effect on the estimates of the regression coefficients.
10. Unusual Data
- Influence: An observation is said to be influential if removing it substantially changes the estimates of the coefficients. Influence can be thought of as the product of leverage and outlierness.
11. Influential Data Diagnosis
- Cook's D
- If Cook's distance for a particular observation is greater than a cutoff point, that observation could be considered influential.
- One such cutoff point is D_i > 4 / (n - k - 1), where k is the number of independent variables.
- D > 1 is a strong indication of a problem.
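Cook's D combines the residual and the leverage of each case: D_i = (e_i^2 / (p * s^2)) * h_i / (1 - h_i)^2, with p fitted parameters. A self-contained sketch (helper name and toy data are my own; the cutoff is the slide's 4/(n-k-1) rule):

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's D for a linear regression of y on X (intercept added)."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])
    p = Z.shape[1]
    H = Z @ np.linalg.inv(Z.T @ Z) @ Z.T    # hat matrix
    h = np.diag(H)                          # leverages
    e = y - H @ y                           # residuals
    s2 = (e @ e) / (n - p)                  # residual variance
    return (e**2 / (p * s2)) * h / (1 - h)**2

# A clean line, except the last point is pushed far off it
x = np.arange(10, dtype=float)
y = 2.0 + 3.0 * x
y[-1] += 25.0                               # single influential point
D = cooks_distance(x.reshape(-1, 1), y)
k = 1
cutoff = 4 / (len(y) - k - 1)               # slide's rule of thumb
```

The contaminated case produces by far the largest D, above both the 4/(n-k-1) cutoff and the stricter D > 1 flag.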
12. Influential Data Diagnostics in SPSS
- Standardized DfBETA(s)
- The change in a regression coefficient that results from the deletion of the i-th case. A standardized DfBETA value is computed for each case for each regression coefficient generated by the model.
- Cut-off points:
- DfBETA > 0 means case i increases the slope,
- DfBETA < 0 means case i decreases the slope,
- |DfBETA| > 2 is a strong indication of influence,
- |DfBETA| > 2/sqrt(n) might be a problem.
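Standardized DfBETAs can be computed by brute force: refit with each case deleted and scale the coefficient change by the deleted-case standard error. A sketch (the `dfbetas` helper and toy data are illustrative, not SPSS output):

```python
import numpy as np

def dfbetas(X, y):
    """Standardized DfBETA for each case and coefficient:
    (b_j - b_j(i)) / (s_(i) * sqrt((Z'Z)^-1_jj))."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])
    p = Z.shape[1]
    se_scale = np.sqrt(np.diag(np.linalg.inv(Z.T @ Z)))
    b_full, *_ = np.linalg.lstsq(Z, y, rcond=None)
    out = np.empty((n, p))
    for i in range(n):
        Zi, yi = np.delete(Z, i, axis=0), np.delete(y, i)
        b_i, *_ = np.linalg.lstsq(Zi, yi, rcond=None)
        e_i = yi - Zi @ b_i
        s_i = np.sqrt((e_i @ e_i) / (n - 1 - p))   # deleted-case sigma
        out[i] = (b_full - b_i) / (s_i * se_scale)
    return out

rng = np.random.default_rng(2)
x = np.arange(10, dtype=float)
y = 2.0 + 3.0 * x + rng.normal(scale=0.3, size=10)
y[-1] += 25.0                                      # influential case
D = dfbetas(x.reshape(-1, 1), y)
flag = 2 / np.sqrt(10)                             # slide's 2/sqrt(n) rule
```

Column 0 is the intercept, column 1 the slope; the contaminated case dominates the slope DfBETAs and clears both cutoffs.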
13. Influential Data Diagnostics in SPSS
- Leverage h
- max(h) < 0.2: OK, no problem,
- 0.2 < max(h) < 0.5: might be a problem,
- max(h) > 0.5: usually a problem of too much leverage for one case,
- h > 2k/n flags the top few cases.
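Leverage values are the diagonal of the hat matrix H = Z (Z'Z)^-1 Z'. A small sketch (helper name and data are my own), showing one far-out predictor value triggering the max(h) > 0.5 flag:

```python
import numpy as np

def leverage(X):
    """Diagonal of the hat matrix for a regression with intercept;
    h_i measures how far case i's predictor values sit from the bulk.
    The leverages always sum to the number of fitted parameters."""
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])
    H = Z @ np.linalg.inv(Z.T @ Z) @ Z.T
    return np.diag(H)

# Ten ordinary x values plus one extreme value
x = np.append(np.linspace(0.0, 1.0, 10), 8.0)
h = leverage(x.reshape(-1, 1))
```

The ordinary cases all sit below the 0.2 comfort level, while the extreme case takes almost all the leverage for itself.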
14. Influential Data Diagnostics in SPSS
- Standardized DfFIT
- The change in the predicted value when the i-th case is deleted.
- Cut-off point: |DfFIT| > 2*sqrt(k/n) indicates a problem.
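Like DfBETA, standardized DfFIT can be computed by deletion: the change in the i-th fitted value, scaled by s_(i)*sqrt(h_i). An illustrative sketch (names and data are mine; the cutoff is the slide's 2*sqrt(k/n) rule):

```python
import numpy as np

def dffits(X, y):
    """Standardized DfFIT: (yhat_i - yhat_i(i)) / (s_(i) * sqrt(h_i))."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])
    p = Z.shape[1]
    h = np.diag(Z @ np.linalg.inv(Z.T @ Z) @ Z.T)
    yhat = Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    out = np.empty(n)
    for i in range(n):
        Zi, yi = np.delete(Z, i, axis=0), np.delete(y, i)
        b_i = np.linalg.lstsq(Zi, yi, rcond=None)[0]
        e_i = yi - Zi @ b_i
        s_i = np.sqrt((e_i @ e_i) / (n - 1 - p))   # deleted-case sigma
        out[i] = (yhat[i] - Z[i] @ b_i) / (s_i * np.sqrt(h[i]))
    return out

rng = np.random.default_rng(3)
x = np.arange(10, dtype=float)
y = 2.0 + 3.0 * x + rng.normal(scale=0.3, size=10)
y[-1] += 25.0                                      # influential case
d = dffits(x.reshape(-1, 1), y)
cutoff = 2 * np.sqrt(1 / 10)                       # k = 1 predictor, n = 10
```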
15. Influential Data Remedies
- Unusual data need to be investigated; they may, for example, stem from an error in data entry.
- The model could be re-specified, or robust estimation methods could be used.
- An influential observation should be discarded only if it is truly bad data and cannot be corrected.
16. Checking the Assumptions
- Several assumptions need to be met to accept the results of a regression analysis and use the model for future decision making:
- Linearity,
- Independence of errors (no autocorrelation),
- Normality of errors,
- Constant variance of errors (homoscedasticity).
17. Tests for Linearity
- Linearity
- Plot the dependent variable against each of the independent variables separately.
- Decide whether linear regression is a reasonable description of the tendency in the data:
- consider curvilinear patterns,
- consider undue influence of one data point on the regression line, etc.
18Nonlinear Relationships
Diminishing Returns Relationship of Advertising
versus Sales
Sales
Advertising
19Analysis of Residuals
3
2
1
Residuals
0
-1
-2
(a) Nonlinear Pattern
-3
3
2
1
Residuals
0
-1
-2
(b) Linear Pattern
-3
20. Tests for Independence
- Independence of Errors
- Ljung-Box test
- In SPSS: Graphs / Time Series / Autocorrelations
- Plot residuals against time (residual-time plot): residuals form the y-axis, time forms the x-axis.
- If the residuals group alternately into positive and negative clusters, that indicates autocorrelation.
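The Ljung-Box statistic formalizes the visual check: Q = n(n+2) * sum over lags k=1..m of r_k^2/(n-k), compared against a chi-square with m df. A sketch (the helper and simulated series are my own; 11.07 is the standard chi-square critical value for 5 df at alpha = .05):

```python
import numpy as np

def ljung_box_q(resid, m):
    """Ljung-Box Q statistic on residuals for lags 1..m."""
    e = np.asarray(resid, dtype=float) - np.mean(resid)
    n = len(e)
    denom = e @ e
    q = 0.0
    for k in range(1, m + 1):
        r_k = (e[k:] @ e[:-k]) / denom     # lag-k autocorrelation
        q += r_k**2 / (n - k)
    return n * (n + 2) * q

rng = np.random.default_rng(4)
white = rng.normal(size=200)               # independent residuals
ar = np.empty(200)                         # AR(1) residuals: clustered signs
ar[0] = white[0]
for t in range(1, 200):
    ar[t] = 0.8 * ar[t - 1] + white[t]

chi2_5df_05 = 11.07                        # chi-square critical value, 5 df
q_white = ljung_box_q(white, 5)
q_ar = ljung_box_q(ar, 5)
```

The clustered AR(1) residuals give a Q far beyond the critical value, while independent residuals stay small.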
21Residuals-Time Plot
- Notice the tendency of the residuals to group
alternately into positive and negative clusters. - That is an indication that the residuals are not
independent but auto-correlated.
22Analysis of Residuals
3
2
1
Residuals
0
-1
-2
(a) Independent Residuals
Time
-3
3
2
1
Residuals
0
-1
-2
(b) Residuals Not Independent
-3
Time
23Non-Independence Remedies
- EGLS (Estimated Generalized Least Squares)
Methods - Prais-Winsten
- Cochrane-Orcutt
- (Note that these are effective only for
first-order autocorrelation.)
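The Cochrane-Orcutt idea can be sketched for one predictor: estimate rho from the lag-1 autocorrelation of the residuals, refit OLS on the quasi-differenced data y_t - rho*y_{t-1}, and iterate. This is my own simplified illustration (function name, iteration count, and simulated data are assumptions), not a full EGLS implementation:

```python
import numpy as np

def cochrane_orcutt(x, y, iters=10):
    """Cochrane-Orcutt for y = b0 + b1*x with AR(1) errors."""
    rho = 0.0
    for _ in range(iters):
        ys = y[1:] - rho * y[:-1]                  # quasi-differenced data
        xs = x[1:] - rho * x[:-1]
        Z = np.column_stack([np.ones(len(xs)), xs])
        a0, b1 = np.linalg.lstsq(Z, ys, rcond=None)[0]
        b0 = a0 / (1 - rho)                        # undo intercept scaling
        e = y - (b0 + b1 * x)
        rho = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1]) # lag-1 autocorrelation
    return b0, b1, rho

rng = np.random.default_rng(5)
n = 300
x = np.linspace(0.0, 10.0, n)
e = np.empty(n)
e[0] = rng.normal()
for t in range(1, n):                              # AR(1) errors, rho = 0.7
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + e
b0, b1, rho = cochrane_orcutt(x, y)
```

The iteration recovers both the slope and the first-order autocorrelation; for higher-order autocorrelation this procedure, as the slide notes, is not appropriate.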
24. Tests for Normality
- Normality of Errors
- Kolmogorov-Smirnov test on the residuals,
- compute skewness,
- compute kurtosis,
- Jarque-Bera test.
25. Jarque-Bera Test
- Compute the JB test statistic:
- JB = (n/6)*Skew^2 + (n/24)*(Kurt - 3)^2
- where Skew and Kurt are the skewness and kurtosis of the residuals (e.g., SKEW and KURT of the data range).
- Compare with the chi-square critical value with 2 df: CHIINV(alpha / tails, 2).
- Fail to reject H0 (normality) when JB < chi-square_alpha.
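A direct sketch of the JB statistic using moment-based skewness and kurtosis (helper and data are illustrative; 5.99 is the standard chi-square critical value for 2 df at alpha = .05):

```python
import numpy as np

def jarque_bera(data):
    """JB = (n/6)*Skew^2 + (n/24)*(Kurt - 3)^2.
    Under normality, JB is approximately chi-square with 2 df."""
    x = np.asarray(data, dtype=float)
    n = len(x)
    d = x - x.mean()
    m2 = np.mean(d**2)
    skew = np.mean(d**3) / m2**1.5          # moment-based skewness
    kurt = np.mean(d**4) / m2**2            # raw (non-excess) kurtosis
    return (n / 6) * skew**2 + (n / 24) * (kurt - 3)**2

rng = np.random.default_rng(6)
jb_normal = jarque_bera(rng.normal(size=500))       # should be small
jb_skewed = jarque_bera(rng.exponential(size=500))  # clearly non-normal
chi2_2df_05 = 5.99
```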
26. Non-Normality Remedies
- To stabilize the error variance and bring the errors closer to normality, one of the most frequently used techniques is data transformation.
- The X and/or Y values can be transformed by raising those variables to a power:
- y (or x) -> y^p (or x^p), where p = -2, -1, -1/2, 1/2, 2, 3.
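As a quick illustration of the power ladder, a square-root transform (p = 1/2) pulls in the long right tail of skewed data. The data here are simulated for demonstration only:

```python
import numpy as np

def skewness(x):
    """Moment-based sample skewness."""
    d = x - x.mean()
    m2 = np.mean(d**2)
    return np.mean(d**3) / m2**1.5

rng = np.random.default_rng(7)
y = rng.lognormal(mean=0.0, sigma=0.8, size=1000)  # strongly right-skewed
y_half = y ** 0.5                                  # p = 1/2 from the ladder
```

The transformed values remain right-skewed but much less so, which is typically enough to move a normality test from rejection toward acceptance.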
27. Tests for Constant Variance
- Constant Variance of Errors
- Divide the residuals into two halves and run an F-test.
- Plot residuals against the estimated y-values: residuals form the y-axis, estimated y-values form the x-axis.
- If the errors get larger (or smaller) as the y-values increase, that indicates non-constant variance.
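The two-halves F-test above can be sketched by splitting the residuals at the median fitted value and taking the ratio of the half variances (a Goldfeld-Quandt-style check; the helper and simulated residuals are my own illustration):

```python
import numpy as np

def variance_ratio_f(resid, yhat):
    """Order residuals by fitted value, split in half, and return
    F = larger half variance / smaller half variance.
    F near 1 suggests constant variance."""
    e = np.asarray(resid)[np.argsort(yhat)]
    half = len(e) // 2
    v1, v2 = e[:half].var(ddof=1), e[half:].var(ddof=1)
    return max(v1, v2) / min(v1, v2)

rng = np.random.default_rng(8)
x = np.linspace(1.0, 10.0, 100)                # stands in for y-estimates
e_const = rng.normal(scale=1.0, size=100)      # homoscedastic residuals
e_fan = rng.normal(size=100) * x               # spread grows with x
f_const = variance_ratio_f(e_const, x)
f_fan = variance_ratio_f(e_fan, x)
```

The fanning residuals produce a much larger F ratio than the constant-variance ones; in practice the ratio would be compared against an F critical value with the two half sample sizes as degrees of freedom.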
28Analysis of Residuals
3
2
1
Residuals
0
-1
-2
-3
x1
(a) Variance Decreases as x Increases
29Analysis of Residuals
3
2
1
Residuals
0
-1
-2
-3
x1
(b) Variance Increases as x Increases
30Analysis of Residuals
3
2
1
Residuals
0
-1
-2
-3
x1
(c) Constant Variance
31. Non-Constant Variance Remedies
- Transform the dependent variable (y): y -> y^p, where p = -2, -1, -1/2, 1/2, 2, 3.
- Use the Weighted Least Squares (WLS) regression method.
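Weighted least squares downweights the noisy observations: b = (Z'WZ)^-1 Z'Wy with weights w_i proportional to 1/Var(e_i). A minimal sketch, assuming the error standard deviation grows with x (function name and data are illustrative):

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares with intercept:
    minimize sum_i w_i * (y_i - z_i'b)^2, i.e. b = (Z'WZ)^-1 Z'Wy."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])
    W = np.diag(w)
    return np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ y)

rng = np.random.default_rng(9)
x = np.linspace(1.0, 10.0, 200)
y = 1.0 + 2.0 * x + rng.normal(size=200) * x   # error sd grows with x
b = wls(x.reshape(-1, 1), y, w=1.0 / x**2)     # weights = 1 / variance
```

With the correct weights, the intercept and slope are recovered efficiently despite the heteroscedastic errors.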
32. Next Lesson
- (Lesson 07/A)
- Qualitative Judgmental Forecasting Methods