Residuals, Residual Plots, and Influential Points
Residuals (error)
- The vertical deviation between the observations and the LSRL
- The sum of the residuals is always zero
- error = observed - expected (residual = y - y-hat)
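As a quick check of the residual definition, here is a minimal Python sketch using the knee surgery data (age, range of motion) that appears later in these notes. The helper `lsrl` is a hypothetical name; it fits the line with the standard least-squares formulas (slope = Sxy/Sxx, intercept = y-bar - slope * x-bar), which the sketch assumes is an acceptable stand-in for a calculator's regression.

```python
# Knee surgery data from these notes: (age, range of motion)
ages = [35, 24, 40, 31, 28, 25, 26, 16, 14, 20, 21, 30]
motion = [154, 142, 137, 133, 122, 126, 135, 135, 108, 120, 127, 122]


def lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares regression line (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


a, b = lsrl(ages, motion)
# residual = observed - expected, where "expected" is y-hat on the LSRL
residuals = [y - (a + b * x) for x, y in zip(ages, motion)]
print([round(r, 2) for r in residuals])
# The residuals sum to zero, up to floating-point rounding
print(round(sum(residuals), 10))
```

The rounded sum may print as 0.0 or -0.0; either way it is zero up to rounding.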
Residual plot
- A scatterplot of the (x, residual) pairs
- Residuals can be graphed against other statistics besides x
- Purpose is to tell whether a linear association exists between the x and y variables
- If no pattern exists among the points in the residual plot, then the association is linear
[Figure: two example residual plots, labeled "Linear" and "Not linear"]
Age    Range of Motion
35     154
24     142
40     137
31     133
28     122
25     126
26     135
16     135
14     108
20     120
21     127
30     122
One measure of the success of knee surgery is post-surgical range of motion for the knee joint following a knee dislocation. Is there a linear relationship between age and range of motion? Sketch a residual plot.
Since there is no pattern in the residual plot, the relationship between age and range of motion appears to be linear.
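A residual plot for the knee data can be sketched in Python as below. This is an illustration under stated assumptions: to stay within the standard library it draws a crude text plot with the axes turned sideways (x runs down the rows, residuals run left to right), and the scale of one character per 2 units is an arbitrary display choice. `lsrl` is a hypothetical least-squares helper, not a calculator routine.

```python
# Knee surgery data from these notes: (age, range of motion)
ages = [35, 24, 40, 31, 28, 25, 26, 16, 14, 20, 21, 30]
motion = [154, 142, 137, 133, 122, 126, 135, 135, 108, 120, 127, 122]


def lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares regression line (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


a, b = lsrl(ages, motion)
# (x, residual) pairs, sorted by x
points = sorted((x, y - (a + b * x)) for x, y in zip(ages, motion))

# Sideways text sketch: one row per x-value, '|' marks residual = 0 and
# '*' marks the residual (one character per 2 units, clipped to +/-10 characters).
for x, r in points:
    offset = max(-10, min(10, round(r / 2)))
    row = [" "] * 21
    row[10] = "|"
    row[10 + offset] = "*"
    print(f"{x:>3} {''.join(row)} {r:+7.2f}")
```

The stars should scatter on both sides of the zero line with no curve or fan shape, which is what "no pattern" means here.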
Plot the residuals against the y-hats. How does
this residual plot compare to the previous one?
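The residual plots have the same shape whether the residuals are plotted against x or against y-hat. Since y-hat is a linear function of x, only the scale of the horizontal axis changes (with a negative slope, the plot is also mirrored left to right).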
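One way to see why the two plots match, sketched in Python under the same assumptions (knee data, hypothetical `lsrl` helper): list the residuals in the order they appear along each horizontal axis and compare the two orders.

```python
# Knee surgery data from these notes: (age, range of motion)
ages = [35, 24, 40, 31, 28, 25, 26, 16, 14, 20, 21, 30]
motion = [154, 142, 137, 133, 122, 126, 135, 135, 108, 120, 127, 122]


def lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares regression line (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


a, b = lsrl(ages, motion)
y_hats = [a + b * x for x in ages]
residuals = [y - y_hat for y, y_hat in zip(motion, y_hats)]

# Residuals listed in left-to-right order along each horizontal axis
by_x = [r for _, r in sorted(zip(ages, residuals))]
by_y_hat = [r for _, r in sorted(zip(y_hats, residuals))]

# y-hat = a + b*x with b > 0 here, so ordering by y-hat matches ordering by x;
# the two plots differ only in the scale of the horizontal axis.
print(b > 0, by_x == by_y_hat)
```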
Coefficient of determination
- r²
- Gives the proportion of variation in y that can be attributed to an approximate linear relationship between x and y
- Remains the same no matter which variable is labeled x
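A small Python sketch of the last property, assuming the usual formula r² = Sxy² / (Sxx · Syy); the function name `r_squared` is hypothetical. Swapping the roles of age and range of motion leaves the value unchanged because the formula is symmetric in x and y.

```python
# Knee surgery data from these notes: (age, range of motion)
ages = [35, 24, 40, 31, 28, 25, 26, 16, 14, 20, 21, 30]
motion = [154, 142, 137, 133, 122, 126, 135, 135, 108, 120, 127, 122]


def r_squared(xs, ys):
    """Coefficient of determination: r^2 = Sxy^2 / (Sxx * Syy) (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Swap which variable is labeled x; r^2 does not change
r2_age_as_x = r_squared(ages, motion)
r2_motion_as_x = r_squared(motion, ages)
print(round(r2_age_as_x, 4), round(r2_motion_as_x, 4))
```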
Let's examine r². Suppose you were going to predict a future y but you didn't know the x-value. Your best guess would be the overall mean of the existing y-values. Now find the sum of the squared residuals (errors): in L3, enter (L2 - 130.0833)², where 130.0833 is the mean range of motion. Do 1-Var Stats on L3 to find the sum.
Sum of the squared residuals (errors) using the
mean of y.
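The calculator steps above can be mirrored in Python as a sketch: the list `squared_errors` plays the role of L3, and summing it stands in for the Σx value that 1-Var Stats reports.

```python
# Range of motion values from the knee surgery data
motion = [154, 142, 137, 133, 122, 126, 135, 135, 108, 120, 127, 122]

# Mean-only model: predict every y with y-bar (about 130.0833)
y_bar = sum(motion) / len(motion)

# Equivalent of L3 = (L2 - y_bar)^2, then summing L3 as 1-Var Stats does
squared_errors = [(y - y_bar) ** 2 for y in motion]
sse_mean = sum(squared_errors)
print(round(y_bar, 4), round(sse_mean, 2))
```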
Now suppose you were going to predict a future y but you DO know the x-value. Your best guess would be the point on the LSRL for that x-value (y-hat). Find the LSRL and store it in Y1. In L3, enter Y1(L1) to calculate the predicted y for each x-value. Now find the sum of the squared residuals (errors): in L4, enter (L2 - L3)². Do 1-Var Stats on L4 to find the sum.
Sum of the squared residuals (errors) using the
LSRL.
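The same steps with the LSRL as the prediction, again as a Python sketch. The hypothetical `lsrl` helper stands in for the calculator's regression; `y_hats` plays the role of L3 and the squared differences play the role of L4.

```python
# Knee surgery data from these notes: (age, range of motion)
ages = [35, 24, 40, 31, 28, 25, 26, 16, 14, 20, 21, 30]
motion = [154, 142, 137, 133, 122, 126, 135, 135, 108, 120, 127, 122]


def lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares regression line (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


a, b = lsrl(ages, motion)
# Equivalent of L3 = Y1(L1): predicted y (y-hat) for each x
y_hats = [a + b * x for x in ages]
# Equivalent of L4 = (L2 - L3)^2, then summing L4
sse_lsrl = sum((y - y_hat) ** 2 for y, y_hat in zip(motion, y_hats))
print(round(sse_lsrl, 2))
```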
By what percent did the sum of the squared errors go down when you went from the overall-mean model to the regression-on-x model? This is r²: the proportion of the variation in the y-values that is explained by the x-values.
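The percent drop can be computed directly from the two sums. This self-contained Python sketch recomputes both (with the hypothetical `lsrl` helper); the result should agree with the R-sq value in the computer printout for these data.

```python
# Knee surgery data from these notes: (age, range of motion)
ages = [35, 24, 40, 31, 28, 25, 26, 16, 14, 20, 21, 30]
motion = [154, 142, 137, 133, 122, 126, 135, 135, 108, 120, 127, 122]


def lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares regression line (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


a, b = lsrl(ages, motion)
y_bar = sum(motion) / len(motion)
sse_mean = sum((y - y_bar) ** 2 for y in motion)                     # mean-only model
sse_lsrl = sum((y - (a + b * x)) ** 2 for x, y in zip(ages, motion))  # LSRL model

# Proportional drop in squared error when moving from the mean to the LSRL
r_sq = (sse_mean - sse_lsrl) / sse_mean
print(f"{r_sq:.1%}")
```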
How well does age predict the range of motion
after knee surgery?
Approximately 30.6% of the variation in range of motion after knee surgery can be explained by the linear regression of range of motion on age.
Interpretation of r²: Approximately r² (expressed as a percent) of the variation in y can be explained by the LSRL of x and y.
Computer-generated regression analysis of knee surgery data

Predictor    Coef     Stdev    T      P
Constant     107.58   11.12    9.67   0.000
Age          0.8710   0.4146   2.10   0.062

s = 10.42    R-sq = 30.6%    R-sq(adj) = 23.7%
Be sure to convert r² to a decimal before taking the square root, and give r the same sign as the slope!
NEVER use adjusted r²!
What is the equation of the LSRL? Find the slope and y-intercept.
What are the correlation coefficient and the
coefficient of determination?
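A sketch of the printout arithmetic in Python, using only the numbers shown in the printout above. The sign step relies on the fact that r always has the same sign as the slope, which the square root alone cannot tell you.

```python
import math

# Values read from the computer printout for the knee surgery data
constant = 107.58    # y-intercept (Constant row)
slope = 0.8710       # slope (Age row)
r_sq_percent = 30.6  # "R-sq", reported as a percent (NOT R-sq(adj))

r_sq = r_sq_percent / 100                  # convert to a decimal first
r = math.copysign(math.sqrt(r_sq), slope)  # r takes the sign of the slope
print(f"predicted range of motion = {constant} + {slope:.4f}(age)")
print(f"r^2 = {r_sq:.3f}, r = {r:.3f}")
```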
Outlier
- In a regression setting, an outlier is a data
point with a large residual
Influential point
- A point that influences where the LSRL is located
- If removed, it will significantly change the
slope of the LSRL
Racket   Resonance (Hz)   Acceleration (m/sec/sec)
1        105              36.0
2        106              35.0
3        110              34.5
4        111              36.8
5        112              37.0
6        113              34.0
7        113              34.2
8        114              33.8
9        114              35.0
10       119              35.0
11       120              33.6
12       121              34.2
13       126              36.2
14       189              30.0
One factor in the development of tennis elbow is the impact-induced vibration of the racket and arm at ball contact. Sketch a scatterplot of these data. Calculate the LSRL and the correlation coefficient.
Does there appear to be an influential point? If so, remove it and then calculate the new LSRL and correlation coefficient.
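A Python sketch of the racket calculations. It assumes racket 14 (189 Hz, far from the other resonance values) is the candidate influential point, which the scatterplot should make easy to check; the helper name `fit` is hypothetical and uses the standard least-squares and correlation formulas rather than any calculator routine.

```python
import math

# Tennis racket data: resonance (Hz) and acceleration (m/sec/sec), rackets 1-14
resonance = [105, 106, 110, 111, 112, 113, 113, 114, 114, 119, 120, 121, 126, 189]
accel = [36.0, 35.0, 34.5, 36.8, 37.0, 34.0, 34.2, 33.8, 35.0, 35.0, 33.6, 34.2, 36.2, 30.0]


def fit(xs, ys):
    """Return (intercept, slope, r) for the LSRL of y on x (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy / math.sqrt(sxx * syy)


a_all, b_all, r_all = fit(resonance, accel)
# Drop racket 14 (the last entry) and refit
a_wo, b_wo, r_wo = fit(resonance[:-1], accel[:-1])
# Racket 14's own residual on the full-data line: an influential point
# need not have a large residual
res_14 = accel[-1] - (a_all + b_all * resonance[-1])

print(f"all 14:     y-hat = {a_all:.2f} + ({b_all:.4f})x, r = {r_all:.3f}")
print(f"without 14: y-hat = {a_wo:.2f} + ({b_wo:.4f})x, r = {r_wo:.3f}")
print(f"residual of racket 14: {res_14:.2f}")
```

Comparing the two lines and the two values of r shows how much a single point can move the LSRL and the correlation.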
Which of these measures are resistant?
- LSRL
- Correlation coefficient
- Coefficient of determination
NONE: all are affected by outliers.