Issues Regarding Regression Models - PowerPoint PPT Presentation
Provided by: zeli1

Transcript and Presenter's Notes

1
Issues Regarding Regression Models
  • (Lesson - 06/C)

2
Collinearity
  • A perfect linear relationship between two (or
    more) independent variables is called
    collinearity (multicollinearity).
  • Under this condition, the least-squares regression
    coefficients cannot be uniquely determined.

3
Collinearity
  • A strong but less than perfect linear
    relationship between the independent variables
    can cause
  • Regression coefficients to be unstable,
  • Standard errors of the coefficients to become
    large; hence, confidence intervals for the
    coefficients become wide and the coefficient
    estimates become imprecise.

4
Collinearity Measurement
  • One of the measures used to determine the impact of
    collinearity on the precision of the estimates is
    the Variance Inflation Factor (VIF).
  • SPSS: Regression / Linear / Statistics / (check)
    Collinearity diagnostics
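As a rough illustration of the idea (not SPSS output), the VIF for a predictor can be computed by hand as 1 / (1 - R²), where R² comes from regressing that predictor on the remaining predictors. A minimal sketch with two hypothetical, strongly correlated predictors:

```python
# Hypothetical predictor data: x2 is roughly 2 * x1, so each predictor
# is nearly a linear function of the other and both should show a
# VIF far above the common cutoff of 5.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

def r_squared(x, y):
    """R^2 of a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def vif(x, other):
    """VIF for predictor x given the other predictor: 1 / (1 - R^2)."""
    return 1.0 / (1.0 - r_squared(other, x))
```

With more than two predictors, R² would come from a multiple regression of each predictor on all the others; the one-on-one version here is the smallest case that shows the mechanics.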

5
Collinearity Effect
  • Wrong signs for the coefficients
  • Drastic changes in the coefficients in terms of
    size and/or sign as a new variable is added to
    the equation.
  • High VIF (VIF > 5) or low tolerance (< 0.1)
    are indicators of collinearity.

6
Collinearity Remedies
  • There is no quick fix for collinearity.
  • Some strategies
  • 1. Variable selection for the model
  • Based on the correlation matrix, some of the highly
    correlated variables could be excluded from the
    model,
  • 2. Ridge regression instead of Ordinary Least
    Squares (OLS) regression.
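To make the ridge idea concrete, here is a minimal one-predictor sketch (hypothetical data, not from the lecture): the ridge penalty lambda is added to the denominator of the least-squares slope, shrinking it toward zero, which is what stabilizes estimates when predictors are collinear.

```python
# Hypothetical data with a slope near 2.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

def slopes(x, y, lam):
    """Return (OLS slope, ridge slope) for a one-predictor model.

    Ridge adds the penalty lam to the denominator, shrinking the
    slope toward zero.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx, sxy / (sxx + lam)

ols, ridge = slopes(x, y, lam=2.0)
```

In the full multivariate case the same shrinkage comes from solving (XᵀX + λI)b = Xᵀy instead of the ordinary normal equations.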

7
Unusual Data
  • A single observation that is substantially
    different from all other observations can make a
    large difference in the results of your
    regression analysis. 
  • If a single observation (or small group of
    observations) substantially changes your results,
    you would want to know about this and investigate
    further. 
  • There are three ways that an observation can be
    unusual.

8
Unusual Data
  • Outliers: In linear regression, an outlier is an
    observation with a large residual. In other words,
    it is an observation whose dependent-variable
    value is unusual given its values on the
    predictor variables. An outlier may indicate a
    sample peculiarity or may indicate a data entry
    error or other problem.

9
Unusual Data
  • Leverage: An observation with an extreme value on
    a predictor variable is called a point with high
    leverage. Leverage is a measure of how far an
    independent variable deviates from its mean.
    These leverage points can have an unusually large
    effect on the estimate of regression
    coefficients.

10
Unusual Data
  • Influence: An observation is said to be
    influential if removing the observation
    substantially changes the estimate of
    coefficients. Influence can be thought of as the
    product of leverage and outlierness.

11
Influential Data Diagnosis
  • Cook's D
  • If Cook's distance for a particular observation
    is greater than a cutoff point, then that
    observation could be considered influential.
  • One such cutoff point is
  • Di > 4 / (n - k - 1)
  • where k = number of independent variables
  • D > 1 is a strong indication of a problem
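The cutoff rule above can be checked by hand. A minimal sketch for a one-predictor regression, using the standard formula D_i = e_i² / (p·MSE) · h_i / (1 - h_i)² with p = k + 1 estimated coefficients; the data are hypothetical, with one planted influential point:

```python
# Hypothetical data: the last point is a planted influential case.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 10.0]
y = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 7.1, 20.0]

n, p = len(x), 2                      # p = k + 1 (slope and intercept)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((a - mx) ** 2 for a in x)
b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx   # slope
b0 = my - b1 * mx                                           # intercept
resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
mse = sum(e ** 2 for e in resid) / (n - p)
lev = [1 / n + (a - mx) ** 2 / sxx for a in x]              # leverage h_i
cooks = [e ** 2 / (p * mse) * h / (1 - h) ** 2
         for e, h in zip(resid, lev)]

cutoff = 4 / (n - p)                  # the 4 / (n - k - 1) rule above
flagged = [i for i, d in enumerate(cooks) if d > cutoff]
```

Here only the planted point exceeds the cutoff, and its D is above 1 as well, matching both rules on the slide.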

12
Influential Data Diagnostics on SPSS
  • Standardized DfBETA(s)
  • Change in the regression coefficient that results
    from the deletion of the ith case. A standardized
    DfBETA value is computed for each case for each
    regression coefficient generated by a model.
  • Cut-off Points
  • DfBETA > 0 means case i increases the slope
  • DfBETA < 0 means case i decreases the slope
  • |DfBETA(s)| > 2 strong indication of influence
  • |DfBETA(s)| > 2/sqrt(n) might be a problem
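A brute-force way to see what DfBETA measures: refit the regression with each case deleted and record how much the slope moves. The sketch below (hypothetical data) computes the raw change in the slope; SPSS reports the standardized version, which is what the cutoffs above apply to.

```python
def slope(x, y):
    """OLS slope of a simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

# Hypothetical data with one influential case at index 7.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 10.0]
y = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 7.1, 20.0]

b_full = slope(x, y)
# Raw (unstandardized) change in the slope when each case is deleted.
dfbeta = [b_full - slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
          for i in range(len(x))]
```

The influential case produces by far the largest slope change, and its positive sign shows that this case pulls the slope upward.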

13
Influential Data Diagnostics on SPSS
  • Leverage h
  • max(h) < 0.2 OK, no problem
  • 0.2 < max(h) < 0.5, might be a problem
  • max(h) > 0.5, usually a problem of too much
    leverage for one case
  • h > 2k/n flags the top few cases

14
Influential Data Diagnostics on SPSS
  • Standardized DfFIT
  • Change in the predicted value when the ith case
    is deleted.
  • Cut-off Point
  • |DfFIT| > 2 sqrt(k/n) indicates a problem

15
Influential Data Remedies
  • The unusual data need to be investigated
  • For example, they may stem from an error in data
    entry
  • The model could be re-specified, or robust
    estimation methods could be used,
  • An influential observation should be discarded only
    if it is truly bad data and cannot be corrected.

16
Checking the Assumptions
  • There are assumptions that need to be met to
    accept the results of Regression analysis and use
    the model for future decision making
  • Linearity
  • Independence of errors (No autocorrelation),
  • Normality of errors,
  • Constant Variance of errors (Homoscedasticity).

17
Tests for Linearity
  • Linearity
  • Plot dependent variable against each of the
    independent variables separately.
  • Decide whether linear regression is a
    reasonable description of the tendency in the
    data.
  • Consider curvilinear patterns,
  • Consider undue influence of one data point on the
    regression line, etc.

18
Nonlinear Relationships
Diminishing Returns Relationship of Advertising versus Sales
[Figure: sales (y-axis) versus advertising (x-axis), showing diminishing returns]
19
Analysis of Residuals
[Figure: residual plots. (a) Nonlinear pattern; (b) linear pattern. Residuals on the y-axis.]
20
Tests for Independence
  • Independence of Errors
  • Ljung-Box Test
  • Graphs / Time Series / Autocorrelations
  • Plot residuals against time (Residual-Time Plot)
  • Residuals on the y-axis, time on the x-axis
  • If the residuals group alternately into positive
    and negative clusters, that indicates
    autocorrelation
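The Ljung-Box statistic itself is simple enough to sketch by hand: Q = n(n+2) Σ ρ_k² / (n - k) over the first m lags, compared against a chi-square critical value with m degrees of freedom. A minimal example with hypothetical residuals whose signs follow a strong systematic pattern:

```python
def acf(x, k):
    """Lag-k sample autocorrelation."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t + k] - m) for t in range(n - k))
    den = sum((v - m) ** 2 for v in x)
    return num / den

def ljung_box(x, lags):
    """Q = n (n + 2) * sum_{k=1}^{lags} acf_k^2 / (n - k)."""
    n = len(x)
    return n * (n + 2) * sum(acf(x, k) ** 2 / (n - k)
                             for k in range(1, lags + 1))

# Hypothetical residuals with a strong systematic sign pattern;
# Q should far exceed the 5% chi-square critical value with 3 df (7.81).
resid = [1.0, -0.9, 1.1, -1.0, 0.8, -1.2, 0.9, -1.1,
         1.0, -0.8, 1.1, -1.0, 0.9, -1.1, 1.2, -0.9]
q = ljung_box(resid, lags=3)
```

A large Q leads to rejecting the hypothesis that the residuals are independent.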

21
Residuals-Time Plot
  • Notice the tendency of the residuals to group
    alternately into positive and negative clusters.
  • That is an indication that the residuals are not
    independent but auto-correlated.

22
Analysis of Residuals
[Figure: residuals plotted against time. (a) Independent residuals; (b) residuals not independent. Residuals on the y-axis, time on the x-axis.]
23
Non-Independence Remedies
  • EGLS (Estimated Generalized Least Squares)
    Methods
  • Prais-Winsten
  • Cochrane-Orcutt
  • (Note that these are effective only for
    first-order autocorrelation.)

24
Tests for Normality
  • Normality of Errors
  • Kolmogorov-Smirnov test on Residuals
  • Compute Skewness
  • Compute Kurtosis
  • Jarque-Bera Test

25
Jarque-Bera Test
  • Compute the JB test statistic
  • JB = (n/6) Skew(Data_Range)^2 +
    (n/24) (Kurt(Data_Range) - 3)^2
  • Compare with Chi-square_alpha with 2 df
  • CHIINV(alpha / tails, 2)
  • Ho is not rejected if JB < Chi-square_alpha
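The statistic above can be computed by hand as well. A minimal sketch using the population moment estimators and two sets of hypothetical residuals, one roughly symmetric and one heavily skewed:

```python
def jarque_bera(x):
    """JB = (n/6) * skew^2 + (n/24) * (kurt - 3)^2, using the
    population (biased) moment estimators."""
    n = len(x)
    m = sum(x) / n
    s2 = sum((v - m) ** 2 for v in x) / n
    skew = sum((v - m) ** 3 for v in x) / n / s2 ** 1.5
    kurt = sum((v - m) ** 4 for v in x) / n / s2 ** 2
    return (n / 6) * skew ** 2 + (n / 24) * (kurt - 3) ** 2

# Roughly symmetric residuals: JB stays below the 5% chi-square
# critical value with 2 df (5.99), so normality is not rejected.
symmetric = [-2.0, -1.0, -1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 2.0]
# Heavily skewed residuals: JB exceeds the critical value.
skewed = [0.0] * 9 + [10.0]
```

With samples this small the chi-square approximation is very rough; the example is only meant to show the mechanics of the formula.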

26
Non-Normality Remedies
  • To stabilize the error variance, one of the most
    frequently used techniques is data transformation.
  • X and/or Y values could be transformed by
    raising those variables to a power,
  • y (or x) -> y^p (or x^p)
  • where p = -2, -1, -½, ½, 2, 3
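A sketch of one such transformation (hypothetical values; p = ½, i.e. a square root, which pulls in a long right tail):

```python
# Square-root transformation, one of the candidate powers above.
y = [1.0, 4.0, 9.0, 16.0, 25.0]
p = 0.5
y_transformed = [v ** p for v in y]
```

Negative powers invert the ordering of the values, which is worth keeping in mind when interpreting coefficients of the transformed model.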

27
Tests for Constant Variance
  • Constant Variance of Errors
  • Divide the residuals into two halves and run an
    F-test
  • Plot residuals against y-estimates
  • Residuals on the y-axis and estimated y-values on
    the x-axis.
  • When errors get larger (or smaller) as the y-values
    increase, that indicates non-constant
    variance.
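The split-sample check described above can be sketched as follows (hypothetical residuals, already ordered by fitted value, whose spread grows toward the right):

```python
def variance(v):
    """Sample variance."""
    m = sum(v) / len(v)
    return sum((e - m) ** 2 for e in v) / (len(v) - 1)

# Hypothetical residuals ordered by fitted value; the spread grows
# toward the right, so the variance ratio should be large.
resid = [0.1, -0.2, 0.15, -0.1, 0.3, -0.4, 1.0, -1.2, 1.5, -1.8]
half = len(resid) // 2
f_ratio = variance(resid[half:]) / variance(resid[:half])
# Compare with the F critical value for (4, 4) df at 5%, about 6.39.
```

By convention the larger variance goes in the numerator, so the test is one-sided against the upper tail of the F distribution.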

28
Analysis of Residuals
[Figure: residuals versus x1. (a) Variance decreases as x increases.]
29
Analysis of Residuals
[Figure: residuals versus x1. (b) Variance increases as x increases.]
30
Analysis of Residuals
[Figure: residuals versus x1. (c) Constant variance.]
31
Non-Constant Variance Remedies
  • Transform the dependent variable (y)
  • y -> y^p
  • where p = -2, -1, -½, ½, 2, 3
  • Weighted Least Squares Regression Method
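A minimal weighted-least-squares sketch for one predictor (hypothetical data and weights; w_i = 1/x_i² is a common choice when the error spread grows in proportion to x):

```python
# Hypothetical data with slope near 1; weights down-weight the
# noisier observations at large x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.3, 3.8, 5.4]
w = [1.0 / a ** 2 for a in x]

# Weighted means, then the weighted least-squares slope and intercept.
sw = sum(w)
mx = sum(wi * a for wi, a in zip(w, x)) / sw
my = sum(wi * b for wi, b in zip(w, y)) / sw
b1 = (sum(wi * (a - mx) * (b - my) for wi, a, b in zip(w, x, y))
      / sum(wi * (a - mx) ** 2 for wi, a in zip(w, x)))
b0 = my - b1 * mx
```

This is ordinary least squares applied after each squared residual is multiplied by its weight, which restores constant variance when the weights are chosen well.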

32
Next Lesson
  • (Lesson - 07/A)
  • Qualitative Judgmental Forecasting Methods