Title: Topic 22: Diagnostics and Remedies
1Topic 22 Diagnostics and Remedies
2Outline
- Diagnostics
- residual checks
- ANOVA remedial measures
3Diagnostics Overview
- We will take the diagnostics and remedial measures that we learned for regression and adapt them to the ANOVA setting
- Many things are essentially the same
- Some things require modification
4Residuals
- Predicted values are cell means, $\hat{Y}_{ij} = \bar{Y}_{i\cdot}$
- Residuals are the differences between the observed values and the cell means, $e_{ij} = Y_{ij} - \bar{Y}_{i\cdot}$
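In SAS, the cell means and residuals can be saved while fitting the one-way model. The following is a minimal sketch, not the course program: the variable names eff and abrand and the output data set a2 are borrowed from the plotting code on the slides, while the input data set a1 and the name pred are assumptions made for illustration.

```sas
/* Sketch: fit the one-way ANOVA and save predicted values and residuals. */
/* Assumes a data set a1 with response eff and factor abrand.             */
proc glm data=a1;
  class abrand;
  model eff=abrand;
  /* pred holds the cell mean for each case, resid holds eff minus pred */
  output out=a2 p=pred r=resid;
run;
quit;
```

The data set a2 then contains the original variables plus pred and resid, which is the form the residual plots need.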
5Basic plots
- Plot the data vs the factor levels (the values of the explanatory variables)
- Plot the residuals vs the factor levels
- Construct a normal quantile plot of the residuals
6NKNW Example
- NKNW p 712
- Compare 4 brands of rust inhibitor (X has r = 4 levels)
- Response variable is a measure of the effectiveness of the inhibitor
- There are 10 units per brand (n = 10)
7Plots
- Data versus the factor
- Residuals versus the factor
- Normal quantile plot of the residuals
8Plots vs the factor
symbol1 v=circle i=none;
proc gplot data=a2;
  plot (eff resid)*abrand;
run;
9Data vs the factor
10Residuals vs the factor
11QQ-plot
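One way to produce a normal quantile plot of the residuals in SAS is the QQPLOT statement of PROC UNIVARIATE. This is a sketch under the assumption that the data set a2 contains a variable named resid, as in the plotting code on the earlier slide; it is not necessarily how the plot on this slide was made.

```sas
/* Sketch: normal quantile plot of the ANOVA residuals.            */
/* Assumes data set a2 contains the residuals in variable resid.   */
proc univariate data=a2 noprint;
  var resid;
  /* reference line uses the sample mean and standard deviation */
  qqplot resid / normal(mu=est sigma=est);
run;
```

Points that fall close to the reference line are consistent with normally distributed errors.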
12Summary
- Look for
- Outliers
- Variance that depends on level
- Non-normal errors
- Plot residuals vs time and other variables if available
13Homogeneity tests
- Homogeneity of variance (homoscedasticity)
- H0: $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_r^2$
- H1: not all $\sigma_i^2$ are equal
- Several significance tests are available
14Homogeneity tests
- Text discusses Hartley, modified Levene
- SAS has several, including Bartlett's (essentially the likelihood ratio test) and several versions of Levene
15Homogeneity tests
- There is a problem with assumptions
- ANOVA is robust with respect to moderate deviations from normality
- ANOVA results can be sensitive to the homogeneity of variance assumption
- Some homogeneity tests are sensitive to the normality assumption
16Levene's Test
- Do ANOVA on the squared residuals
- Modified Levene's test uses absolute values of the residuals
- Modified Levene's test is recommended
17NKNW Example
- NKNW p 765
- Compare the strengths of 5 types of solder flux (X has r = 5 levels)
- Response variable is the pull strength, force in pounds required to break the joint
- There are 8 solder joints per flux (n = 8)
18Levene's Test
proc glm data=a1;
  class type;
  model strength=type;
  means type / hovtest=levene(type=abs);
run;
19Output
Levene's Test ANOVA of Absolute Deviations
Source   DF   F Value   Pr > F
type      4      3.07   0.0288
Error    35
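The test reported here can also be carried out by hand, which makes the definition concrete: save the residuals, take absolute values, and run a one-way ANOVA on them. The sketch below assumes the data set a1 with variables strength and type from the slides; the names res, resid and absdev are made up for illustration.

```sas
/* Sketch: Levene-type test done by hand on the solder data. */
/* Step 1: fit the one-way model and save the residuals.     */
proc glm data=a1 noprint;
  class type;
  model strength=type;
  output out=res r=resid;
run;
quit;

/* Step 2: absolute deviations from the cell means. */
data res;
  set res;
  absdev=abs(resid);
run;

/* Step 3: one-way ANOVA on the absolute deviations. */
proc glm data=res;
  class type;
  model absdev=type;
run;
quit;
```

The F test for type in the last step is expected to agree with the hovtest output. PROC GLM also accepts hovtest=bartlett and hovtest=bf (Brown-Forsythe, based on deviations from the group medians) in the MEANS statement.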
20Means and SDs
Level of        ----- strength -----
type       N     Mean     Std Dev
1          8    15.42        1.23
2          8    18.52        1.25
3          8    15.00        2.48
4          8     9.74        0.81
5          8    12.34        0.76
21Remedies
- Delete outliers
- Is their removal important?
- Use weights (weighted regression)
- Transformations
- Nonparametric procedures
22Weighted least squares
- We used this with regression
- Obtain a model for how the sd depends on the explanatory variable (plotted absolute value of residual vs x)
- Then used weights inversely proportional to the estimated variance
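As a reminder of the regression version, the steps can be sketched as follows. Everything named here is hypothetical: the data set reg1 and its variables y and x, and the intermediate names reg2, reg3, resid, absr, shat and wt. It is a sketch of the general idea, not the program used earlier in the course.

```sas
/* Sketch: weighted least squares in simple regression.        */
/* Assumes a hypothetical data set reg1 with variables y, x.   */

/* Step 1: ordinary least squares fit, save the residuals. */
proc reg data=reg1;
  model y=x;
  output out=reg2 r=resid;
run;
quit;

/* Step 2: model the sd by regressing |residual| on x. */
data reg2;
  set reg2;
  absr=abs(resid);
run;
proc reg data=reg2;
  model absr=x;
  output out=reg3 p=shat;   /* shat = estimated sd at each x */
run;
quit;

/* Step 3: weights inversely proportional to the estimated variance. */
data reg3;
  set reg3;
  wt=1/(shat*shat);
run;
proc reg data=reg3;
  model y=x;
  weight wt;
run;
quit;
```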
23Weighted Least Squares
- Here we can compute the variance for each level
- Use the reciprocals of these variances as weights in PROC GLM
- We will illustrate with the soldering example
from NKNW
24Obtain the variances and weights
proc means data=a1;
  var strength; by type;
  output out=a2 var=s2;
data a2; set a2;
  wt=1/s2;
run;
NOTE: Data set a2 has 5 cases
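To see what these weights look like, they can be approximated from the standard deviations on the Means and SDs slide. The values below use the rounded SDs shown there, so they are approximate and will differ slightly from what SAS computes from the unrounded variances.

```latex
\[
w_i = \frac{1}{s_i^2}, \qquad
w_1 \approx \frac{1}{1.23^2} \approx 0.66, \quad
w_2 \approx \frac{1}{1.25^2} = 0.64, \quad
w_3 \approx \frac{1}{2.48^2} \approx 0.16,
\]
\[
w_4 \approx \frac{1}{0.81^2} \approx 1.52, \quad
w_5 \approx \frac{1}{0.76^2} \approx 1.73 .
\]
```

Flux type 3, with the largest variance, gets the smallest weight; its estimated variance is more than ten times that of type 5.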
25Merge and then use the weights in PROC GLM
data a3; merge a1 a2; by type;
proc glm data=a3;
  class type;
  model strength=type;
  weight wt;
run;
26Output
Source   DF   F Value   Pr > F
Model     4     81.05   <.0001
Error    35
Total    39
27Transformation Guides
- When $\sigma_i^2$ is proportional to $\mu_i$, use $\sqrt{y}$
- When $\sigma_i$ is proportional to $\mu_i$, use $\log(y)$
- When $\sigma_i$ is proportional to $\mu_i^2$, use $1/y$
- For proportions, use $\arcsin(\sqrt{y})$
- arsin(sqrt(y)) in a SAS data step
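The transformations in this guide can all be computed in one data step. This is a minimal sketch assuming a hypothetical data set a1 with a response variable named y; the new variable names and the output data set a1t are made up for illustration. The guards leave a transformed value missing when the transformation is not defined for that y.

```sas
/* Sketch: candidate variance-stabilizing transformations of y. */
/* Assumes a hypothetical data set a1 with response y.          */
data a1t;
  set a1;
  if y >= 0 then sqrty=sqrt(y);               /* variance proportional to mean    */
  if y > 0 then logy=log(y);                  /* sd proportional to mean          */
  if y ne 0 then recipy=1/y;                  /* sd proportional to mean squared  */
  if 0 <= y <= 1 then asiny=arsin(sqrt(y));   /* y is a proportion                */
run;
```

In practice only the one transformation suggested by the mean and SD pattern would be carried forward into PROC GLM.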
28Nonparametric approach
- Based on ranks
- See NKNW section 18.7, p 777
- See the SAS procedure NPAR1WAY
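A minimal sketch of a rank-based analysis with PROC NPAR1WAY, assuming the solder data set a1 with variables strength and type from the earlier slides. The choice of the WILCOXON option is an assumption made for illustration; the slides do not say which options were used.

```sas
/* Sketch: rank-based one-way analysis of the solder data.     */
/* Assumes data set a1 with response strength and factor type. */
proc npar1way data=a1 wilcoxon;
  class type;
  var strength;
run;
```

With more than two factor levels, the WILCOXON option produces the Kruskal-Wallis test, which compares the groups using the ranks of the observations.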
29Last slide
- We finished Chapter 18.
- We used programs NKNW758.sas and NKNW768.sas.