PHIL ROWE Statistics Lecture 1 Data Presentation - PowerPoint PPT Presentation



1
  • Lecture 17
  • Regression

2
Correlation v Regression
Correlation: Is there a relationship and, if so, how strong is it?
Regression: What mathematical formula relates the two parameters?
3
Regression equation
Applicable where one variable depends upon
another. Regression equation will allow us to
predict the value of the dependent variable from
that of the independent variable.
Need to know which is dependent/independent
4
Example regression problem: heights of leaves above ground and their drug content
  • (Use same data as for correlation)
  • Height above ground on tree
  • Drug content in each leaf.

5
Identifying dependency
Dependencies that we might test for:
Drug content depends on where the leaf was growing. Conceivable.
Leaf position depends on drug content. Obvious nonsense.
Dependent variable: Drug content
Independent variable: Height of leaf
6
Regression equation
An equation by which we can predict drug content
from the leaf's position
Dependent variable
Independent variable
7
Heights of leaves above ground and their drug
content
Height (M)   Drug (mg/g dry leaf)
1.3          81
1.9          65
2.4          61
2.6          69
3.0          77
3.7          44
4.1          45
4.3          46
4.9          39
5.6          49
6.2          31
6.8          28
7.0          46
7.4          31
8.6          38
8
Initial graphical assessment
As with correlation, draw a graph to make sure
that we are not dealing with some obviously
non-linear relationship.
9
Drug concentration versus height of leaf
[Figure: scatter plot of Drug (mg/g) against Height (M)]
Reasonable to treat as linear
10
Choosing the line of best fit
1) Draw a possible line and determine the vertical deviation of each
point from the proposed line
[Figure: deviation of point from line]
11
Choosing the line of best fit (continued)
2) Square all deviations
3) Add all squared deviations (Sum of Squares). This measures how
well the line fits the points (the bigger the number, the worse the fit)
4) Adjust the line to minimise the Sum of Squares. This gives the
best fitting line (Least squares fit)
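The steps above can be sketched in a few lines of Python. This is only an illustration: the function name `sum_of_squares` and the trial line (intercept 85, gradient -8) are assumptions made for the example. The data are the leaf heights and drug contents given earlier, and the second line uses the coefficients that Minitab reports for these data.

```python
# Leaf data from the lecture: height above ground (M) and drug content (mg/g dry leaf)
heights = [1.3, 1.9, 2.4, 2.6, 3.0, 3.7, 4.1, 4.3, 4.9, 5.6, 6.2, 6.8, 7.0, 7.4, 8.6]
drug = [81, 65, 61, 69, 77, 44, 45, 46, 39, 49, 31, 28, 46, 31, 38]


def sum_of_squares(intercept, gradient, xs, ys):
    """Vertical deviation of each point from the line, squared, then summed."""
    return sum((y - (intercept + gradient * x)) ** 2 for x, y in zip(xs, ys))


# A trial line drawn "by eye" (hypothetical values, for illustration only)
trial_ss = sum_of_squares(85.0, -8.0, heights, drug)

# The line reported by Minitab for these data
fitted_ss = sum_of_squares(79.296, -6.296, heights, drug)

print(f"Trial line  Sum of Squares = {trial_ss:.1f}")
print(f"Fitted line Sum of Squares = {fitted_ss:.1f}")  # the smaller value: a better fit
```

Any other trial line gives a larger Sum of Squares than the fitted line, which is what "least squares fit" means.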
12
Relationship between regression line and
regression equation
Regression equation: Y = a + b.X
Dependent variable (Y)
Gradient = b
Intercept = a
Independent variable (X)
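For a straight line, the least squares values of a and b need not be found by trial and error. The standard textbook result is b = Sxy / Sxx and a = mean(Y) - b x mean(X), where Sxy and Sxx are sums of products and squares of deviations from the means. The sketch below applies it to the leaf data; the function name `fit_line` is an assumption for the example, and nothing here is meant to describe how Minitab computes its results internally.

```python
# Leaf data from the lecture: height above ground (M) and drug content (mg/g dry leaf)
heights = [1.3, 1.9, 2.4, 2.6, 3.0, 3.7, 4.1, 4.3, 4.9, 5.6, 6.2, 6.8, 7.0, 7.4, 8.6]
drug = [81, 65, 61, 69, 77, 44, 45, 46, 39, 49, 31, 28, 46, 31, 38]


def fit_line(xs, ys):
    """Return (a, b) for the least squares line Y = a + b.X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products and sum of squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # gradient
    a = mean_y - b * mean_x    # intercept: the line passes through (mean_x, mean_y)
    return a, b


a, b = fit_line(heights, drug)
print(f"Intercept a = {a:.3f}")
print(f"Gradient  b = {b:.3f}")
```

The values obtained agree, to the precision shown, with the Constant and Height coefficients in the Minitab output.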
13
Using Minitab to obtain regression equation
Use menus Stat / Regression / Regression ...
14
Dependent variable
Independent variable (Minitab labels as
Predictor)
Make sure the right parameter is entered in the
right box!
15
Minitab output
Regression Analysis: Drug_conc versus Height

The regression equation is
Drug_conc = 79.3 - 6.30 Height

Predictor     Coef  SE Coef      T      P
Constant    79.296    6.023  13.17  0.000
Height      -6.296    1.176  -5.35  0.000

S = 9.751   R-Sq = 68.8%   R-Sq(adj) = 66.4%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       1  2726.0  2726.0  28.67  0.000
Residual Error  13  1236.0    95.1
Total           14  3962.0
Regression equation
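Several figures in the output are simple functions of the others, which is a useful check when reading it. The sketch below recomputes T, MS, F, S, R-Sq and R-Sq(adj) from the coefficient, its standard error and the sums of squares printed above. The variable names are assumptions for the example; the relationships are the standard ones for simple linear regression.

```python
import math

# Figures copied from the Minitab output
coef, se_coef = -6.296, 1.176            # Height row
ss_regression, df_regression = 2726.0, 1
ss_residual, df_residual = 1236.0, 13
ss_total, df_total = 3962.0, 14

t_value = coef / se_coef                          # T = Coef / SE Coef
ms_regression = ss_regression / df_regression     # MS = SS / DF
ms_residual = ss_residual / df_residual
f_value = ms_regression / ms_residual             # F = MS(regression) / MS(residual)
s = math.sqrt(ms_residual)                        # S = square root of residual MS
r_sq = 100 * ss_regression / ss_total             # R-Sq as a percentage
r_sq_adj = 100 * (1 - ms_residual / (ss_total / df_total))

print(f"T = {t_value:.2f}, F = {f_value:.2f}, S = {s:.3f}")
print(f"R-Sq = {r_sq:.1f}%, R-Sq(adj) = {r_sq_adj:.1f}%")
```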
16
Regression equation
Drug conc (mg/g) = 79.3 - 6.30 x Height (M)
The minus sign indicates a negative relationship
between drug concentration and height. The graph has
a negative gradient.
17
Significance testing
  • As with correlation analysis - Test whether the
    evidence of a relationship is strong enough to be
    convincing.
  • If you subject the same set of data to both
    correlation and regression analysis, either:
  • Both will declare it significant or
  • Both will declare it non-significant.
  • (Not surprising. Testing essentially the same
    question.)
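The agreement between the two analyses can be checked numerically. The sketch below assumes the usual textbook tests: for correlation, t = r x sqrt(n - 2) / sqrt(1 - r^2); for regression, t = gradient / standard error of the gradient. The function name `correlation_and_regression_t` is an assumption for the example. For the leaf data the two t values come out identical, so the two tests must reach the same verdict.

```python
import math

# Leaf data from the lecture: height above ground (M) and drug content (mg/g dry leaf)
heights = [1.3, 1.9, 2.4, 2.6, 3.0, 3.7, 4.1, 4.3, 4.9, 5.6, 6.2, 6.8, 7.0, 7.4, 8.6]
drug = [81, 65, 61, 69, 77, 44, 45, 46, 39, 49, 31, 28, 46, 31, 38]


def correlation_and_regression_t(xs, ys):
    """Return (r, t for the correlation test, t for the regression gradient)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # Correlation coefficient and its t statistic
    r = sxy / math.sqrt(sxx * syy)
    t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

    # Regression gradient and its t statistic
    b = sxy / sxx
    residual_ms = (syy - b * sxy) / (n - 2)
    se_b = math.sqrt(residual_ms / sxx)
    t_slope = b / se_b
    return r, t_corr, t_slope


r, t_corr, t_slope = correlation_and_regression_t(heights, drug)
print(f"r = {r:.3f}, r squared = {r * r:.3f}")
print(f"t (correlation) = {t_corr:.2f}, t (regression) = {t_slope:.2f}")
```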

18
Minitab output
Regression is significant. Appropriate to use
equation to make predictions.
Regression Analysis: Drug_conc versus Height

The regression equation is
Drug_conc = 79.3 - 6.30 Height

Predictor     Coef  SE Coef      T      P
Constant    79.296    6.023  13.17  0.000
Height      -6.296    1.176  -5.35  0.000

S = 9.751   R-Sq = 68.8%   R-Sq(adj) = 66.4%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       1  2726.0  2726.0  28.67  0.000
Residual Error  13  1236.0    95.1
Total           14  3962.0
19
Predicting drug content
Predict concentration of drug in a leaf gathered
from a height of 5 Metres up a tree
Conc = 79.3 - 6.30 x Height
     = 79.3 - 6.30 x 5
     = 79.3 - 31.5
     = 47.8
Predicted drug concentration = 47.8 mg/g
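The same calculation as a small Python function. The function name `predict_drug_conc` is an assumption for the example; the coefficients are those of the regression equation.

```python
def predict_drug_conc(height_m):
    """Predicted drug concentration (mg/g) for a leaf at the given height (M)."""
    return 79.3 - 6.30 * height_m


print(f"{predict_drug_conc(5):.1f} mg/g")  # prints 47.8 mg/g
```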
20
Reverse prediction
  • To estimate height at which leaf grew by knowing
    its drug content
  • Do NOT re-run the regression analysis with
    height as dependent and drug content as
    independent variable.
  • Correct approach: re-arrange the existing
    equation.

21
Reverse prediction
Drug = 79.3 - 6.30 x Height
6.30 x Height = 79.3 - Drug
Height = (79.3 - Drug) / 6.30
22
Reverse prediction
If drug conc in leaf = 41.1 mg/g
Height = (79.3 - Drug) / 6.30
       = (79.3 - 41.1) / 6.30
       = 38.2 / 6.30
       = 6.06 Metres (Height of leaf)
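As a sketch in Python, reverse prediction is just the original equation solved for height; no second regression is fitted. The function and constant names are assumptions for the example.

```python
# Coefficients of the existing regression equation: Drug = 79.3 - 6.30 x Height
INTERCEPT = 79.3
GRADIENT = -6.30


def predict_drug_conc(height_m):
    """Forward prediction: drug concentration (mg/g) from height (M)."""
    return INTERCEPT + GRADIENT * height_m


def height_from_drug_conc(drug_mg_per_g):
    """Reverse prediction: the same equation re-arranged for height (M)."""
    return (drug_mg_per_g - INTERCEPT) / GRADIENT


height = height_from_drug_conc(41.1)
print(f"{height:.2f} Metres")  # prints 6.06 Metres
```

Feeding the result back into `predict_drug_conc` returns 41.1 mg/g, which confirms that both functions describe the same line.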
23
Extrapolation
Predict drug conc in a leaf gathered 15 M up a tree
Drug conc = 79.3 - 6.30 x Height
          = 79.3 - 6.30 x 15
          = 79.3 - 94.5
          = -15.2 mg/g
Obviously ridiculous!
24
Interpolation versus Extrapolation
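[Figure: regression line over the data; the observed range of heights is labelled Interpolation and the regions on either side are labelled Extrapolation]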
25
Interpolation v Extrapolation
Interpolation is usually quite safe.
Extrapolation will only work if you can guarantee that the
linear relationship continues beyond the region
that has been observed. In this case, it cannot:
predicted concs would be negative in any leaf from above
about 12.6 metres (79.3 / 6.30).
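One way to guard against accidental extrapolation is to have the prediction function refuse heights outside the observed range (1.3 to 8.6 M in the leaf data). The sketch below is only one possible design; the function name, the `allow_extrapolation` flag and the choice to raise an error instead of issuing a warning are all assumptions for the example.

```python
# Range of heights actually observed in the leaf data (M)
OBSERVED_MIN_M = 1.3
OBSERVED_MAX_M = 8.6


def predict_drug_conc(height_m, allow_extrapolation=False):
    """Predict drug conc (mg/g); refuse to extrapolate unless explicitly asked."""
    inside = OBSERVED_MIN_M <= height_m <= OBSERVED_MAX_M
    if not inside and not allow_extrapolation:
        raise ValueError(
            f"{height_m} M is outside the observed range "
            f"{OBSERVED_MIN_M} to {OBSERVED_MAX_M} M: this would be extrapolation"
        )
    return 79.3 - 6.30 * height_m


print(f"{predict_drug_conc(5.0):.1f} mg/g")  # interpolation: accepted

try:
    predict_drug_conc(15.0)
    refused = False
except ValueError as err:
    refused = True
    print(f"Refused: {err}")

# Height at which the fitted line crosses zero concentration
zero_height = 79.3 / 6.30
print(f"Line predicts negative concs above {zero_height:.1f} M")
```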
26
Regression Example 2: Non-significant case
Serum creatinine concentrations reflect renal
function. Is it possible to predict the
clearance of a novel antibiotic using serum
creatinine values?
27
Creatinine (mm/l)   Clearance (l/h)
91                  4.7
103                 5.6
118                 5.1
101                 8.3
98                  6.0
111                 4.2
73                  5.4
103                 2.6
110                 3.1
82                  7.2
86                  8.0
89                  4.2
88                  8.3
102                 3.8
Plasma creatinine concentrations and clearance
values for a novel antibiotic in 14 volunteers
28
No obvious evidence of a non-linear relationship.
OK to proceed with regression.
29
Regression equation should not be used to predict
clearance
Regression equation is generated ...
but it may have no predictive value (P > 0.05)
Regression Analysis: Clearance versus Creatinine

The regression equation is
Clearance = 12.0 - 0.0671 Creatinine

Predictor      Coef  SE Coef      T      P
Constant     11.962    3.838   3.12  0.009
Creatini   -0.06711  0.03935  -1.71  0.114

S = 1.770   R-Sq = 19.5%   R-Sq(adj) = 12.8%

Analysis of Variance

Source          DF      SS     MS     F      P
Regression       1   9.109  9.109  2.91  0.114
Residual Error  12  37.584  3.132
Total           13  46.693
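The contrast between the two examples can be sketched as a single decision rule. Minitab reports an exact P value; the sketch below instead compares the size of T for the gradient with the two-sided 5% critical value of Student's t for the residual degrees of freedom, which leads to the same verdict at the 5% level. The critical values (2.179 for 12 DF, 2.160 for 13 DF) are taken from standard t tables and are not part of the lecture; the function name is an assumption for the example.

```python
# Two-sided 5% critical values of Student's t, from standard tables (assumed here)
T_CRITICAL_5_PERCENT = {12: 2.179, 13: 2.160}


def gradient_is_significant(coef, se_coef, residual_df):
    """True if the gradient differs significantly from zero at the 5% level."""
    t_value = coef / se_coef
    return abs(t_value) > T_CRITICAL_5_PERCENT[residual_df]


# Coef, SE Coef and residual DF copied from the two Minitab outputs
leaf_significant = gradient_is_significant(-6.296, 1.176, 13)
creatinine_significant = gradient_is_significant(-0.06711, 0.03935, 12)

print(f"Drug conc versus height:      significant = {leaf_significant}")
print(f"Clearance versus creatinine:  significant = {creatinine_significant}")
```

Only the first equation passes the test, so only the first should be used to make predictions.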
30
Terms with which you should be familiar
  • Regression line
  • Regression equation
  • Least squares fit
  • Predictor
  • Reverse prediction
  • Interpolation
  • Extrapolation

31
What you should be able to do
  • Describe the distinction between correlation and
    regression analysis.
  • Describe the purpose of a regression equation.
  • Use a graph to avoid the use of regression
    analysis with non-linearly related data.
  • Describe the criterion by which we select the
    line of best fit.
  • Use Minitab (etc) to produce and test the
    significance of regression equations

Continued on next slide ...
32
What you should be able to do
(Continued)
  • Use a regression equation to predict the value of
    a dependent variable from that of an independent
    variable.
  • Use reverse prediction to predict the value of
    an independent variable from that of a dependent
    variable.
  • Recognise the dangers of extrapolation.