Diagnostics in Regression Analysis

Provided by: hrcla
1
Diagnostics in Regression Analysis
Topics:
  • Regression Model Assumptions
  • Graphical Checks for Assumption Violations
  • Numerical Indicators
  • Spin-off Benefits
  • Detecting Outliers
  • Model Building Strategy
2
Model Assumptions
  • Linearity: each term in the regression model must
    be linearly related to the dependent variable
  • Errors must:
      • Be normally distributed with zero mean
      • Have constant variance
      • Be independent

3
Graphical Checks for Linearity Violation
  • Plot Y vs. each X variable and look for linear
    trends (a good sign)
  • Plot residuals vs. fitted Y (Y-hat) and look for
    non-linear trends (a bad sign)
  • To remedy, consider transformations on X such as
    log, square, square root, or reciprocal

4
Graphical Checks for Normality Violation
  • Plot a histogram of the residuals and look for an
    approximate mound shape
  • Run a Best Fit analysis on the residuals, using
    only the normal distribution as input
  • To remedy, consider transformations on Y such as
    log or reciprocal
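
For readers without the Best Fit tool, a numerical normality check on the residuals can be sketched as follows. Note that this swaps in a different technique, the Jarque-Bera test, which is not mentioned in the slides. It compares the skewness and kurtosis of the residuals with those of a normal distribution, and its p-value is asymptotic, so it is rough for small samples. The data are made up and the function name `jarque_bera` is an assumption.

```python
from math import exp
from statistics import NormalDist, fmean

def jarque_bera(residuals):
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(residuals)
    m = fmean(residuals)
    m2 = sum((r - m) ** 2 for r in residuals) / n
    m3 = sum((r - m) ** 3 for r in residuals) / n
    m4 = sum((r - m) ** 4 for r in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    # Upper tail of a chi-square distribution with 2 degrees of freedom.
    p_value = exp(-jb / 2)
    return jb, p_value

# Made-up "residuals": evenly spaced normal quantiles (mound shaped)
# and their exponentials (strongly right skewed).
n = 50
normal_like = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
skewed = [exp(z) for z in normal_like]

print(jarque_bera(normal_like))
print(jarque_bera(skewed))
```

A large p-value gives no evidence against normality; a small one suggests trying a transformation on Y.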

5
Graphical Checks for Constant Variance Violation
  • Plot residuals vs. fitted Y (Y-hat) and look for
    randomness (a shot-gun blast pattern)
  • Typical departures are a fan or egg shape
  • Remedy: consider a log transform on Y

6
Graphical Checks for Independence Violation
  • If data are collected over time, plot residuals
    vs. time or order of observation. Look for
    randomness
  • Typical departures: positive trend, sinusoidal
    wave, or zig-zag pattern
  • Remedy: consider including time as an explanatory
    variable in the model

7
Numerical Indicators
  • Normality
      • Look for a p-value > 0.1 in the Best Fit
        analysis
  • Independence
      • The Durbin-Watson (DW) statistic is generated
        automatically in StatTools when a residual plot
        is requested. Look for DW close to 2
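
The Durbin-Watson statistic is simple enough to compute by hand: it is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below is an illustration with made-up residuals, assumed to be listed in time order; it is not the StatTools implementation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den

# Residuals that stay on one side for a while (positive autocorrelation)
# push DW toward 0.
print(round(durbin_watson([1, 1, 1, -1, -1, -1]), 3))  # 0.667

# Residuals that flip sign every step (zig-zag) push DW toward 4.
print(durbin_watson([1, -1, 1, -1]))  # 3.0
```

Values near 2 are consistent with independent errors, which is why the slide recommends looking for DW close to 2.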

8
Spin-Off Benefits from Residual Plot
  • The residual vs. fitted Y plot should look random
  • Any pattern indicates the model needs tweaking
  • If a pattern is detected, look at plots of
    residuals vs. each X variable to locate which
    variable needs transforming

9
Detecting Outliers from Residual Plots
  • Points in the residual vs. fitted Y plot should be
    confined to ±2Se boundaries (Se is the standard
    error of estimate)
  • Any observation outside the boundaries is a
    potential Y outlier
  • Investigate the origin of the outlier and correct
    it if possible
  • Consider deleting outlying observations

10
Detecting Influential Outliers
  • These are observations for which the regression
    equation will change significantly depending on
    whether they are left out or included
  • Outliers in X variables are potential influential
    observations
  • Consider deleting X outliers before running
    regression

11
Strategy for Building Regression Models
  • An art as well as a science
  • Use parsimony as the overriding principle
  • Do box-and-whisker plots of all variables before
    conducting regression to help detect outliers
  • Obtain the correlation matrix of the quantitative
    variables to detect possible multicollinearity
    (multiC) and potentially good predictors

12
Strategy for Building Regression Models
  • Before conducting regression, inspect
    scatter-plots of Y vs. potential Xs to check on
    linearity and the possible need for
    transformations
  • Conduct a general stepwise regression to reduce
    multicollinearity, and check the final model for
    glaring omissions
  • Analyze the various types of residual plots and
    take remedial action if necessary

13
Strategy for Building Regression Models
  • Use the model to predict Ynew and Mean Y only
    after any assumption violations have been
    remedied and influential outliers have been
    deleted or corrected
  • Interpret slope coefficients only in the absence
    of multicollinearity
  • If prediction intervals are too wide, look for
    other predictors to reduce Se