Title: PHIL ROWE Statistics Lecture 1 Data Presentation
1 2Correlation v Regression
Correlation: Is there a relationship and, if so,
how strong is it?
Regression: What mathematical formula relates the
two parameters?
3Regression equation
Applicable where one variable depends upon
another. The regression equation allows us to
predict the value of the dependent variable from
that of the independent variable.
Need to know which variable is dependent and which is independent.
4Example regression problem - Heights of leaves
above ground and their drug content
- (Use same data as for correlation)
- Height above ground on tree
- Drug content in each leaf.
5Identifying dependency
Dependencies that we might test for:
- Drug content depends on where the leaf was growing. Conceivable
- Leaf position depends on drug content. Obvious nonsense
Dependent variable - Drug content
Independent variable - Height of leaf
6Regression equation
An equation by which we can predict drug content
from the leaf's position
Dependent variable - Drug content
Independent variable - Leaf's position
7Heights of leaves above ground and their drug
content
Height (M)   Drug (mg/g dry leaf)
1.3          81
1.9          65
2.4          61
2.6          69
3.0          77
3.7          44
4.1          45
4.3          46
4.9          39
5.6          49
6.2          31
6.8          28
7.0          46
7.4          31
8.6          38
8Initial graphical assessment
As with correlation, draw a graph to make sure
that we are not dealing with some obviously
non-linear relationship.
9Drug concentration versus height of leaf
Reasonable to treat as linear
[Graph: Drug (mg/g) plotted against Height (M)]
10Choosing the line of best fit
1) Draw a possible line and determine the vertical
deviation of each point from the proposed line
[Graph label: Deviation of point from line]
11Choosing the line of best fit continued ...
2) Square all deviations
3) Add all squared deviations (Sum of squares) - measures how
well the line fits the points. (The bigger the
number, the worse the fit)
4) Adjust line - minimise Sum of Squares - Best fitting line
(Least squares fit)
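The four steps above can be sketched in Python. This is a minimal illustration, not the lecture's own method: the data are the leaf heights and drug contents from slide 7, the first-guess line (a = 85, b = -8) is a hypothetical choice, and step 4 uses the standard closed-form least-squares formulas instead of adjusting the line by hand.

```python
# Leaf data from slide 7: height (M) and drug content (mg/g dry leaf).
heights = [1.3, 1.9, 2.4, 2.6, 3.0, 3.7, 4.1, 4.3, 4.9, 5.6, 6.2, 6.8, 7.0, 7.4, 8.6]
drug = [81, 65, 61, 69, 77, 44, 45, 46, 39, 49, 31, 28, 46, 31, 38]


def sum_of_squares(a, b, xs, ys):
    """Steps 1-3: square each vertical deviation from y = a + b*x and add them up."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def least_squares(xs, ys):
    """Step 4: intercept a and gradient b that minimise the sum of squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical first guess at a line, chosen only for illustration.
guess_ss = sum_of_squares(85.0, -8.0, heights, drug)

a, b = least_squares(heights, drug)
best_ss = sum_of_squares(a, b, heights, drug)

print(f"guessed line: sum of squares = {guess_ss:.1f}")
print(f"best line:    a = {a:.3f}, b = {b:.3f}, sum of squares = {best_ss:.1f}")
```

Any other line, including the guessed one, gives a larger sum of squares than the least-squares line.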
12Relationship between regression line and
regression equation
Regression equation: Y = a + b.X
Y = Dependent variable
X = Independent variable
a = Intercept
b = Gradient
13Using Minitab to obtain regression equation
Use menus Stat / Regression / Regression ...
14Dependent variable
Independent variable (Minitab labels as
Predictor)
Make sure the right parameter is entered in the
right box!
15Minitab output
Regression Analysis: Drug_conc versus Height

The regression equation is
Drug_conc = 79.3 - 6.30 Height

Predictor     Coef  SE Coef      T      P
Constant    79.296    6.023  13.17  0.000
Height      -6.296    1.176  -5.35  0.000

S = 9.751   R-Sq = 68.8%   R-Sq(adj) = 66.4%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       1  2726.0  2726.0  28.67  0.000
Residual Error  13  1236.0    95.1
Total           14  3962.0
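As a cross-check, the main figures in this output can be recomputed from the slide 7 data using the standard formulas for simple linear regression. The sketch below is plain Python with illustrative variable names; it is not Minitab's own procedure.

```python
import math

# Leaf data from slide 7: height (M) and drug content (mg/g dry leaf).
heights = [1.3, 1.9, 2.4, 2.6, 3.0, 3.7, 4.1, 4.3, 4.9, 5.6, 6.2, 6.8, 7.0, 7.4, 8.6]
drug = [81, 65, 61, 69, 77, 44, 45, 46, 39, 49, 31, 28, 46, 31, 38]

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(drug) / n
sxx = sum((x - mean_x) ** 2 for x in heights)
syy = sum((y - mean_y) ** 2 for y in drug)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, drug))

gradient = sxy / sxx                      # "Height" coefficient
intercept = mean_y - gradient * mean_x    # "Constant" coefficient

ss_total = syy                            # Total SS, n - 1 degrees of freedom
ss_regression = sxy ** 2 / sxx            # Regression SS, 1 degree of freedom
ss_residual = ss_total - ss_regression    # Residual Error SS, n - 2 degrees of freedom

ms_residual = ss_residual / (n - 2)
s = math.sqrt(ms_residual)                # "S" in the output
r_sq = ss_regression / ss_total           # "R-Sq" as a fraction
r_sq_adj = 1 - (1 - r_sq) * (n - 1) / (n - 2)
f_ratio = ss_regression / ms_residual     # "F" in the Analysis of Variance table

se_gradient = s / math.sqrt(sxx)          # "SE Coef" for Height
t_gradient = gradient / se_gradient       # "T" for Height

print(f"Drug_conc = {intercept:.1f} + ({gradient:.2f}) Height")
print(f"S = {s:.3f}  R-Sq = {100 * r_sq:.1f}%  R-Sq(adj) = {100 * r_sq_adj:.1f}%")
print(f"F = {f_ratio:.2f}  T (Height) = {t_gradient:.2f}")
```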
16Regression equation
Drug conc (mg/g) = 79.3 - 6.30 x Height (M)
The minus sign indicates a negative relationship
between drug concentration and height: the graph has
a negative gradient.
17Significance testing
- As with correlation analysis, test whether the
evidence of a relationship is strong enough to be
convincing.
- If you subject the same set of data to both
correlation and regression analysis, either:
  - Both will declare it significant, or
  - Both will declare it non-significant.
- (Not surprising. Both are testing essentially the same
question.)
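One way to see why the two analyses agree is to compare their test statistics. The sketch below is an illustration using the slide 7 leaf data and standard textbook formulas (not taken from the lecture). It computes the t statistic for the correlation coefficient r and the t statistic for the regression gradient, and shows that they are the same number.

```python
import math

# Leaf data from slide 7: height (M) and drug content (mg/g dry leaf).
heights = [1.3, 1.9, 2.4, 2.6, 3.0, 3.7, 4.1, 4.3, 4.9, 5.6, 6.2, 6.8, 7.0, 7.4, 8.6]
drug = [81, 65, 61, 69, 77, 44, 45, 46, 39, 49, 31, 28, 46, 31, 38]

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(drug) / n
sxx = sum((x - mean_x) ** 2 for x in heights)
syy = sum((y - mean_y) ** 2 for y in drug)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, drug))

# Correlation analysis: r and its t statistic on n - 2 degrees of freedom.
r = sxy / math.sqrt(sxx * syy)
t_from_correlation = r * math.sqrt((n - 2) / (1 - r ** 2))

# Regression analysis: gradient divided by its standard error.
gradient = sxy / sxx
residual_ms = (syy - sxy ** 2 / sxx) / (n - 2)
t_from_regression = gradient / math.sqrt(residual_ms / sxx)

print(f"r = {r:.3f}")
print(f"t from correlation = {t_from_correlation:.2f}")
print(f"t from regression  = {t_from_regression:.2f}")
```

Because the two t statistics are identical and share the same degrees of freedom, they lead to the same P value.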
18Minitab output
Regression is significant (P for Height is shown as
0.000, well below 0.05). Appropriate to use the
equation to make predictions.
19Predicting drug content
Predict concentration of drug in a leaf gathered
from a height of 5 Metres up a tree
Conc = 79.3 - 6.30 x Height
     = 79.3 - 6.30 x 5
     = 79.3 - 31.5
     = 47.8
Predicted drug concentration = 47.8 mg/g
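The same calculation can be written as a small function. This is a minimal sketch: the function and parameter names are illustrative, and the defaults are the rounded coefficients from the regression equation.

```python
def predict_drug_conc(height_m, intercept=79.3, gradient=-6.30):
    """Predicted drug concentration (mg/g) for a leaf at the given height (M)."""
    return intercept + gradient * height_m


predicted = predict_drug_conc(5)
print(f"Predicted drug concentration = {predicted:.1f} mg/g")
```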
20Reverse prediction
- To estimate the height at which a leaf grew from
its drug content:
- Do NOT re-run the regression analysis with
height as dependent and drug content as
independent variable.
- Correct approach: re-arrange the existing
equation
21Reverse prediction
Drug = 79.3 - 6.30 x Height
6.30 x Height = 79.3 - Drug
Height = (79.3 - Drug) / 6.30
22Reverse prediction
If drug conc in leaf = 41.1 mg/g
Height = (79.3 - Drug) / 6.30
       = (79.3 - 41.1) / 6.30
       = 38.2 / 6.30
       = 6.06 Metres (Height of leaf)
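The sketch below illustrates why the choice of approach matters. It rearranges the existing equation, as recommended, and, purely for comparison, also fits the line the other way round (height as dependent variable) on the slide 7 data. The two approaches give different answers for the same leaf. Function names are illustrative.

```python
# Leaf data from slide 7: height (M) and drug content (mg/g dry leaf).
heights = [1.3, 1.9, 2.4, 2.6, 3.0, 3.7, 4.1, 4.3, 4.9, 5.6, 6.2, 6.8, 7.0, 7.4, 8.6]
drug = [81, 65, 61, 69, 77, 44, 45, 46, 39, 49, 31, 28, 46, 31, 38]


def height_by_rearrangement(drug_conc, intercept=79.3, gradient=-6.30):
    """Recommended approach: solve Drug = intercept + gradient x Height for Height."""
    return (drug_conc - intercept) / gradient


def fit_line(xs, ys):
    """Least-squares intercept and gradient for predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


rearranged = height_by_rearrangement(41.1)

# For comparison only: regression re-run with the variables swapped.
a_swapped, b_swapped = fit_line(drug, heights)
refitted = a_swapped + b_swapped * 41.1

print(f"Rearranged equation:      {rearranged:.2f} M")
print(f"Re-run regression (avoid): {refitted:.2f} M")
```

The re-run regression minimises deviations in height instead of deviations in drug content, so it describes a different line.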
23Extrapolation
Predict drug conc in a leaf gathered 15 M up a
tree
Drug conc = 79.3 - 6.30 x Height
          = 79.3 - 6.30 x 15
          = 79.3 - 94.5
          = -15.2 mg/g
Obviously ridiculous!
24Interpolation versus Extrapolation
[Graph: regression line with regions labelled Extrapolation / Interpolation / Extrapolation from left to right]
25Interpolation v Extrapolation
Interpolation is usually quite safe.
Extrapolation will only work if you can guarantee that the
linear relationship continues beyond the region
that has been observed. In this case, it cannot.
Concs would be negative in any leaf from above about
12.6 Metres (79.3 / 6.30).
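A prediction routine can flag extrapolation by comparing the requested height with the range of heights actually observed. The sketch below assumes the rounded regression coefficients and the slide 7 heights; the function name and return format are illustrative.

```python
# Heights (M) actually observed in the leaf data on slide 7.
OBSERVED_HEIGHTS = [1.3, 1.9, 2.4, 2.6, 3.0, 3.7, 4.1, 4.3, 4.9,
                    5.6, 6.2, 6.8, 7.0, 7.4, 8.6]
INTERCEPT = 79.3
GRADIENT = -6.30


def predict_with_range_check(height_m):
    """Return (predicted conc in mg/g, True if the height is inside the observed range)."""
    lowest, highest = min(OBSERVED_HEIGHTS), max(OBSERVED_HEIGHTS)
    predicted = INTERCEPT + GRADIENT * height_m
    is_interpolation = lowest <= height_m <= highest
    return predicted, is_interpolation


# Height at which the fitted line crosses zero concentration.
zero_conc_height = -INTERCEPT / GRADIENT

for h in (5, 15):
    conc, safe = predict_with_range_check(h)
    label = "interpolation" if safe else "EXTRAPOLATION"
    print(f"{h} M: {conc:.1f} mg/g ({label})")
print(f"Line reaches zero concentration at {zero_conc_height:.2f} M")
```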
26Regression Example 2: Non-significant case
Serum creatinine concentrations reflect renal
function. Is it possible to predict the
clearance of a novel antibiotic using serum
creatinine values?
27Plasma creatinine concentrations and clearance
values for a novel antibiotic in 14 volunteers

Creatinine (mm/l)   Clearance (l/h)
91                  4.7
103                 5.6
118                 5.1
101                 8.3
98                  6.0
111                 4.2
73                  5.4
103                 2.6
110                 3.1
82                  7.2
86                  8.0
89                  4.2
88                  8.3
102                 3.8
28No obvious evidence of a non-linear relationship.
OK to proceed with regression.
29Regression equation should not be used to predict
clearance
Regression equation is generated ...
but it may have no predictive value (P > 0.05)
Regression Analysis: Clearance versus Creatinine

The regression equation is
Clearance = 12.0 - 0.0671 Creatinine

Predictor       Coef  SE Coef      T      P
Constant      11.962    3.838   3.12  0.009
Creatini    -0.06711  0.03935  -1.71  0.114

S = 1.770   R-Sq = 19.5%   R-Sq(adj) = 12.8%

Analysis of Variance

Source          DF      SS     MS     F      P
Regression       1   9.109  9.109  2.91  0.114
Residual Error  12  37.584  3.132
Total           13  46.693
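The significance decision can be reproduced from the reported coefficient and its standard error. The sketch below divides one by the other to get T and compares it with the two-sided 5% critical value of the t distribution. The critical values (2.179 for 12 degrees of freedom, 2.160 for 13) come from standard t tables, not from the lecture, and the function name is illustrative.

```python
# Two-sided 5% critical values of t, from standard tables (not from the slides).
T_CRITICAL_5_PERCENT = {12: 2.179, 13: 2.160}


def gradient_test(coef, se_coef, residual_df):
    """Return (T, True if the gradient is significant at P < 0.05)."""
    t_value = coef / se_coef
    significant = abs(t_value) > T_CRITICAL_5_PERCENT[residual_df]
    return t_value, significant


# Example 2: clearance versus creatinine (Residual Error DF = 12).
t_creatinine, sig_creatinine = gradient_test(-0.06711, 0.03935, 12)

# Example 1: drug concentration versus height (Residual Error DF = 13).
t_height, sig_height = gradient_test(-6.296, 1.176, 13)

print(f"Creatinine: T = {t_creatinine:.2f}, significant = {sig_creatinine}")
print(f"Height:     T = {t_height:.2f}, significant = {sig_height}")
```

With these inputs the creatinine gradient falls short of the critical value, matching P = 0.114, while the height gradient exceeds it comfortably.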
30Terms with which you should be familiar
- Regression line
- Regression equation
- Least squares fit
- Predictor
- Reverse prediction
- Interpolation
- Extrapolation
31What you should be able to do
- Describe the distinction between correlation and
regression analysis.
- Describe the purpose of a regression equation.
- Use a graph to avoid the use of regression
analysis with non-linearly related data.
- Describe the criterion by which we select the
line of best fit.
- Use Minitab (etc) to produce and test the
significance of regression equations.
Continued on next slide ...
32What you should be able to do
(Continued)
- Use a regression equation to predict the value of
a dependent variable from that of an independent
variable.
- Use reverse prediction to predict the value of
an independent variable from that of a dependent
variable.
- Recognise the dangers of extrapolation.