Title: Diagnostics in Regression Analysis
1. Diagnostics in Regression Analysis
Topics:
- Regression Model Assumptions
- Graphical Checks for Assumption Violations
- Numerical Indicators
- Spin-off Benefits
- Detecting Outliers
- Model Building Strategy
2. Model Assumptions
- Linearity: each term in the regression model must be linearly related to the dependent variable
- Errors must:
  - Be normally distributed with zero mean
  - Have constant variance
  - Be independent
3. Graphical Checks for Linearity Violation
- Plot Y vs. each X variable and look for linear trends (good).
- Plot residuals vs. fitted Y (Y-hat) and look for non-linear trends (bad).
- To remedy, consider transformations on X such as log, square, square root, or reciprocal.
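As a numerical companion to the residual plot, the sketch below fits a straight line to made-up data that follow Y = 3 + 2 log(X) exactly, then refits after a log transform on X. The data and the helper names (fit_line, residuals) are illustrative assumptions, not part of the slides, and the example handles only one X variable.

```python
import math

def fit_line(x, y):
    """Least-squares fit with one predictor; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def residuals(x, y):
    """Observed Y minus fitted Y for the least-squares line."""
    b0, b1 = fit_line(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Made-up data: Y is linear in log(X), not in X.
x = [1, 2, 4, 8, 16, 32]
y = [3 + 2 * math.log(v) for v in x]

raw_resid = residuals(x, y)                          # curved pattern
log_resid = residuals([math.log(v) for v in x], y)   # pattern removed

print("residuals, Y on X:     ", [round(r, 2) for r in raw_resid])
print("residuals, Y on log(X):", [round(r, 2) for r in log_resid])
```

The residuals from the untransformed fit are negative at both ends and positive in the middle, which is the kind of non-linear trend the residual plot is meant to expose. After the log transform on X they are zero up to rounding.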
4. Graphical Checks for Normality Violation
- Plot a histogram of the residuals and look for an approximate mound shape.
- Run a Best Fit analysis on the residuals using only the normal distribution as input.
- To remedy, consider transformations on Y such as log or reciprocal.
5. Graphical Checks for Constant Variance Violation
- Plot residuals vs. fitted Y (Y-hat) and look for randomness (a shotgun-blast pattern).
- Typical departures are a fan or egg shape.
- Remedy: consider a log transform on Y.
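A fan shape can also be roughly quantified. The sketch below is a crude stand-in for eyeballing the plot, not a formal test: it sorts the residuals by fitted value and compares the average residual size in the upper half with that in the lower half. The function name spread_ratio and both data sets are made up for illustration.

```python
def spread_ratio(fitted, resid):
    """Mean |residual| in the upper half of fitted values divided by
    the mean |residual| in the lower half. Values far above 1 suggest
    a fan opening to the right."""
    pairs = sorted(zip(fitted, resid))          # order by fitted Y
    half = len(pairs) // 2
    low = [abs(r) for _, r in pairs[:half]]
    high = [abs(r) for _, r in pairs[half:]]
    return (sum(high) / len(high)) / (sum(low) / len(low))

# Hypothetical fitted values and two hypothetical residual patterns.
fitted = [3, 5, 7, 9, 11, 13, 15, 17, 19, 21]
fan_resid = [0.0, -0.3, 0.3, -0.4, 0.5, -0.5, 0.8, -0.7, 1.1, -0.8]
even_resid = [0.5, -0.4, 0.3, -0.5, 0.4, -0.3, 0.5, -0.4, 0.4, -0.5]

fan_ratio = spread_ratio(fitted, fan_resid)
even_ratio = spread_ratio(fitted, even_resid)
print(round(fan_ratio, 2), round(even_ratio, 2))
```

For the fan-shaped residuals the ratio is well above 1, while the evenly spread residuals give a ratio near 1. An egg shape would not be caught by this two-way split, so the plot remains the primary check.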
6. Graphical Checks for Independence Violation
- If data are collected over time, plot residuals vs. time or order of observation. Look for randomness.
- Typical departures: positive trend, sinusoidal wave, or zig-zag pattern.
- Remedy: consider including time as an explanatory variable in the model.
7. Numerical Indicators
- Normality
  - Look for a p-value > 0.1 in the Best Fit analysis.
- Independence
  - The Durbin-Watson (DW) statistic is generated automatically in StatTools when a residual plot is requested. Look for DW close to 2.
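For readers without StatTools, the Durbin-Watson statistic is simple to compute by hand: it is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below assumes the residuals are already in time order; the function name and the two residual series are made up for illustration.

```python
def durbin_watson(resid):
    """DW = sum (e_t - e_{t-1})^2 / sum e_t^2, residuals in time order."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

trend_resid = [-2, -1, 0, 1, 2]     # steady positive trend over time
zigzag_resid = [1, -1, 1, -1]       # zig-zag pattern

dw_trend = durbin_watson(trend_resid)
dw_zigzag = durbin_watson(zigzag_resid)
print(dw_trend, dw_zigzag)
```

The statistic ranges from 0 to 4. Values well below 2, as with the trending residuals, point to positive autocorrelation; values well above 2, as with the zig-zag residuals, point to negative autocorrelation.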
8. Spin-Off Benefits from Residual Plot
- The residual vs. fitted Y plot should look random.
- Any pattern indicates the model needs tweaking.
- If a pattern is detected, look at plots of residuals vs. each X variable to locate which variable needs transforming.
9. Detecting Outliers from Residual Plots
- Points on the residual vs. fitted Y plot should be confined within the ±2Se boundaries, where Se is the standard error of estimate.
- Any observation outside the boundaries is a potential Y outlier.
- Investigate the origin of the outlier and correct it if possible.
- Consider deleting outlying observations.
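The ±2Se screen can be sketched numerically as follows. The example assumes a regression with a single X variable, so Se is computed with n - 2 degrees of freedom; the data and the function name flag_y_outliers are made up for illustration.

```python
import math

def flag_y_outliers(x, y):
    """Fit Y on one X by least squares and return (Se, indices of
    observations whose residual lies outside +/- 2*Se)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Standard error of estimate; n - 2 because two coefficients are fitted.
    se = math.sqrt(sum(r ** 2 for r in resid) / (n - 2))
    flagged = [i for i, r in enumerate(resid) if abs(r) > 2 * se]
    return se, flagged

# Made-up data: Y = 1 + 2X, except the fifth observation is 6 units too high.
x = list(range(1, 11))
y = [1 + 2 * xi for xi in x]
y[4] += 6

se, flagged = flag_y_outliers(x, y)
print("Se =", round(se, 3), "potential Y outliers at index:", flagged)
```

Only the fifth observation (index 4) falls outside the band. A flagged point is a candidate for investigation, not automatically an error.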
10. Detecting Influential Outliers
- These are observations for which the regression equation changes significantly depending on whether they are left out or included.
- Outliers in the X variables are potential influential observations.
- Consider deleting X outliers before running the regression.
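The left-out-versus-included comparison can be carried out directly by fitting the model twice. The sketch below uses made-up data in which the last observation is an outlier in X (20, against 1 to 5 for the rest); the helper name slope is hypothetical, and only one X variable is handled.

```python
def slope(x, y):
    """Least-squares slope of Y on a single X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx

# Made-up data: the first five points lie on Y = X; the sixth is an X outlier.
x = [1, 2, 3, 4, 5, 20]
y = [1, 2, 3, 4, 5, 5]

slope_with = slope(x, y)
slope_without = slope(x[:-1], y[:-1])
print("slope with the point:   ", round(slope_with, 3))
print("slope without the point:", round(slope_without, 3))
```

Dropping the single X outlier moves the slope from roughly 0.15 to 1, which is the kind of change that marks an observation as influential.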
11. Strategy for Building Regression Models
- An art as well as a science
- Use parsimony as the overriding principle
- Do box-and-whisker plots of all variables before conducting the regression to help detect outliers
- Obtain the correlation matrix of the quantitative variables to detect possible multicollinearity (multiC) and potentially good predictors
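A correlation screen of the X variables might look like the sketch below. The variable names, the data, and the 0.8 cutoff are all assumptions chosen for illustration; the slides do not specify a threshold.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

# Hypothetical candidate predictors.
data = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 11],
    "x3": [5, 1, 4, 2, 3],
}

CUTOFF = 0.8  # assumed threshold for "highly correlated"
corr = {(p, q): pearson(data[p], data[q]) for p, q in combinations(data, 2)}
high_pairs = [pair for pair, r in corr.items() if abs(r) > CUTOFF]

for pair, r in corr.items():
    print(pair, round(r, 3))
print("possible multicollinearity:", high_pairs)
```

Here x1 and x2 are nearly collinear, so keeping both in the model would make their slope coefficients hard to interpret. Applying the same function to Y against each X gives a first look at potentially good predictors.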
12. Strategy for Building Regression Models
- Before conducting the regression, inspect scatter plots of Y vs. potential Xs to check on linearity and the possible need for transformations
- Conduct general stepwise regression to reduce multicollinearity, and check the final model for glaring omissions
- Analyze the various types of residual plots; take remedial action if necessary
13. Strategy for Building Regression Models
- Use the model to predict Ynew and Mean Y only after any assumption violations have been remedied and influential outliers have been deleted or corrected
- Interpret slope coefficients only in the absence of multicollinearity
- If prediction intervals are too wide, look for other predictors to reduce Se