Linear Regression - PowerPoint PPT Presentation

Provided by: CONCO6

Transcript and Presenter's Notes

Title: Linear Regression


1
Linear Regression
  • Didier Concordet

2
An example
3
About the straight line
$Y = a + b x$
$a$ = intercept
$b$ = slope
4
Questions
  • How to obtain the best straight line ?
  • Is this straight line the best curve to use ?
  • How to use this straight line ?

5
How to obtain the best straight line ?
Proceed in three main steps
  • write a (statistical) model
  • estimate the parameters
  • graphical inspection of data

6
Write a model
A statistical model
Mean model: functional relationship
Variance model: assumptions on the residuals
7
Write a model
$Y_i = a + b x_i + \varepsilon_i$
Mean model: $a + b x_i$
Residual (error term): $\varepsilon_i$
8
Assumptions on the residuals
  • the $x_i$'s are not random variables: they are known with high precision
  • the $\varepsilon_i$'s have a constant variance (homoscedasticity)
  • the $\varepsilon_i$'s are independent
  • the $\varepsilon_i$'s are normally distributed (normality)
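To make these assumptions concrete, here is a minimal simulation sketch in which the x values are fixed and the residuals are independent draws from a single normal distribution. The design (six levels, three replicates), the parameter values and the function name `simulate` are illustrative assumptions, not taken from the slides.

```python
import random

def simulate(a, b, xs, sigma, seed=0):
    """Draw Y_i = a + b*x_i + eps_i with independent N(0, sigma^2) residuals."""
    rng = random.Random(seed)
    return [a + b * x + rng.gauss(0.0, sigma) for x in xs]

# Hypothetical design: the x_i are chosen by the experimenter, not random.
xs = [10, 20, 30, 40, 50, 60] * 3
ys = simulate(a=20.0, b=3.0, xs=xs, sigma=8.0)
```

Using the same `sigma` for every observation is what homoscedasticity means here; letting `sigma` depend on x would give heteroscedastic data.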

9
Homoscedasticity
[Figure: two panels, labelled homoscedasticity and heteroscedasticity]
10
Normality
[Figure: Y plotted against x]
11
Estimate the parameters
A criterion is needed to estimate parameters
A statistical model
A criterion
12
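How to estimate the "best" a and b ?
Intuitive criterion: make the sum of the deviations $Y_i - (a + b x_i)$ minimum
Problem: positive and negative deviations compensate
Reasonable criterion: make the sum of the squared deviations $\sum_i (Y_i - (a + b x_i))^2$ minimum
Linear model + homoscedasticity + normality lead to the least squares criterion (L.S.)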
13
The least squares criterion
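The formula on this slide did not survive extraction. The standard form of the criterion, written with the symbols used for the model ($a$, $b$, $x_i$, $Y_i$), is sketched below; the name $SS$ is an assumed notation.

```latex
SS(a, b) = \sum_{i=1}^{n} \bigl( Y_i - (a + b x_i) \bigr)^2 ,
\qquad
(\hat a, \hat b) = \arg\min_{a,\,b} \; SS(a, b)
```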
14
Result of optimisation
The estimates $\hat a$ and $\hat b$ change with samples
$\hat a$ and $\hat b$ are random variables
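The closed-form result of the optimisation is not readable in this transcript. A minimal sketch of the usual least squares solution is given below; the function name `fit_line` is an assumption made for illustration.

```python
def fit_line(xs, ys):
    """Return (a_hat, b_hat) minimising the sum of squared deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = sxy / sxx                 # slope
    a_hat = y_bar - b_hat * x_bar     # intercept
    return a_hat, b_hat

a_hat, b_hat = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

Calling `fit_line` on a new sample from the same experiment gives different values, which is why the estimates are treated as random variables.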
15
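Summary
True mean straight line: $E(Y) = a + b x$
Estimated straight line: $\hat Y = \hat a + \hat b x$
Mean predicted value for the ith observation: $\hat Y_i = \hat a + \hat b x_i$
ith residual: $\hat\varepsilon_i = Y_i - \hat Y_i$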
16
Example
Dep Var: HPLC   N: 18
Effect     Coefficient   Std Error   t        P(2 Tail)
CONSTANT   20.046        3.682       5.444    0.000
CONCENT    2.916         0.069       42.030   0.000
Intercept: 20.046
Slope: 2.916
Estimated straight line: HPLC = 20.046 + 2.916 × CONCENT
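As a quick check on how to read this output, the t column is the coefficient divided by its standard error. The sketch below recomputes it from the printed values; because those values are rounded to three decimals, the ratio for CONCENT only approximately matches the printed 42.030.

```python
# (coefficient, standard error) as printed in the regression output
coefficients = {"CONSTANT": (20.046, 3.682), "CONCENT": (2.916, 0.069)}

# t ratio = coefficient / standard error
t_ratios = {name: coef / se for name, (coef, se) in coefficients.items()}
```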
17
Example
18
Example
19
Residual variance
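By construction, the residuals $\hat\varepsilon_i = Y_i - \hat Y_i$ sum to zero: $\sum_i \hat\varepsilon_i = 0$
but $\sum_i \hat\varepsilon_i^2 \neq 0$ in general
The residual variance is defined by $s^2 = \frac{1}{n-2} \sum_i \hat\varepsilon_i^2$
$s$ = standard error of estimate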
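A minimal sketch of this computation, assuming the usual divisor n - 2 for a straight line with two estimated parameters; the function name `standard_error_of_estimate` is hypothetical.

```python
import math

def standard_error_of_estimate(xs, ys, a_hat, b_hat):
    """Square root of the residual variance, with n - 2 degrees of freedom."""
    residuals = [y - (a_hat + b_hat * x) for x, y in zip(xs, ys)]
    sse = sum(r ** 2 for r in residuals)   # residual sum of squares
    return math.sqrt(sse / (len(xs) - 2))

# For x = 0, 1, 2 and Y = 0, 1, 0 the least squares line is Y = 1/3 + 0*x.
s = standard_error_of_estimate([0, 1, 2], [0, 1, 0], a_hat=1 / 3, b_hat=0.0)
```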
20
Example
Dep Var: HPLC   N: 18   Multiple R: 0.996   Squared multiple R: 0.991
Adjusted squared multiple R: 0.991   Standard error of estimate: 8.282
Effect     Coefficient   Std Error   t        P(2 Tail)
CONSTANT   20.046        3.682       5.444    0.000
CONCENT    2.916         0.069       42.030   0.000
21
Questions
  • How to obtain the best straight line ?
  • Is this straight line the best curve to use ?
  • How to use this straight line ?

22
Is this model the best one to use ?
  • Tools to check the mean model
    • scatterplot of residuals vs fitted values
    • test(s)
  • Tools to check the variance model
    • scatterplot of residuals vs fitted values
    • probability plot (Pplot)

23
Checking the mean model
Scatterplot of residuals vs fitted values
[Figure: two scatterplots of residuals around 0]
Structure in the residuals: change the mean model
No structure in the residuals: OK
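To show what "structure in the residuals" can look like, the sketch below fits a straight line to deliberately curved, made-up data ($Y = x^2$) and prints the sign of each residual in order of x. The data and the helper name `fit_line` are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = sxy / sxx
    return y_bar - b_hat * x_bar, b_hat

xs = list(range(1, 11))
ys = [x ** 2 for x in xs]                 # curved on purpose
a_hat, b_hat = fit_line(xs, ys)
fitted = [a_hat + b_hat * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

# Runs of identical signs reveal a systematic pattern.
signs = "".join("+" if r > 0 else "-" for r in residuals)
print(signs)  # prints ++------++
```

Residuals that are positive at both ends and negative in the middle indicate that the mean model misses a curvature; patternless residuals would show signs in no particular order.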
24
Checking the mean model: tests
Two cases
  • replications
  • no replication
25
Without replication
Try another mean model and test the improvement
Example: $Y_i = a + b x_i + c x_i^2 + \varepsilon_i$
If the test on c is significant ($c \neq 0$), then keep this model
Dep Var: HPLC   N: 18   Multiple R: 0.996   Squared multiple R: 0.991
Adjusted squared multiple R: 0.991   Standard error of estimate: 8.539
Effect            Coefficient   Std Error   t       P(2 Tail)
CONSTANT          21.284        6.649       3.201   0.006
CONCENT           2.842         0.335       8.486   0.000
CONCENT*CONCENT   0.001         0.003       0.227   0.824
Here the test on c is not significant (P = 0.824).
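A sketch of how the t ratio for the quadratic coefficient can be obtained without a statistics package. It assumes the standard "partialling out" route: remove the straight-line part from both Y and x², then regress one set of residuals on the other. The function names and the data are hypothetical; the resulting t is to be compared with a Student t table with n - 3 degrees of freedom.

```python
import math

def fit_line(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = sxy / sxx
    return y_bar - b_hat * x_bar, b_hat

def quadratic_term_test(xs, ys):
    """Return (c_hat, t) for the x^2 term in Y = a + b x + c x^2 + error."""
    n = len(xs)
    a_y, b_y = fit_line(xs, ys)
    ry = [y - (a_y + b_y * x) for x, y in zip(xs, ys)]     # Y with the line removed
    x2 = [x * x for x in xs]
    a_z, b_z = fit_line(xs, x2)
    rz = [z - (a_z + b_z * x) for x, z in zip(xs, x2)]     # x^2 with the line removed
    szz = sum(z * z for z in rz)
    c_hat = sum(z * r for z, r in zip(rz, ry)) / szz
    sse_quad = sum(r * r for r in ry) - c_hat ** 2 * szz   # residual SS of the quadratic fit
    s2 = sse_quad / (n - 3)                                # three estimated parameters
    return c_hat, c_hat / math.sqrt(s2 / szz)

# Made-up data with a real curvature (c = 0.5) and small alternating errors.
xs = list(range(8))
ys = [1 + 2 * x + 0.5 * x * x + (0.1 if x % 2 == 0 else -0.1) for x in xs]
c_hat, t_c = quadratic_term_test(xs, ys)
```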
26
With replications
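Perform a test of lack of fit
Principle: compare the lack-of-fit variance (scatter of the group means around the straight line)
to the pure-error variance (scatter of the replicates around their group mean);
if lack-of-fit variance > pure-error variance (significantly),
then change the model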
27
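Test of lack of fit: how to do it ?
Three steps
1) Linear regression (gives the residual sum of squares)
2) One-way ANOVA (gives the pure-error sum of squares)
3) Compute the F ratio of the lack-of-fit mean square to the pure-error mean square;
if F exceeds the critical value,
then change the model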
28
Test of lack of fit: example
Three steps
1) Linear regression
2) One-way ANOVA
Dep Var: HPLC   N: 18
Analysis of Variance
Source    Sum-of-Squares   df   Mean-Square   F-ratio   P
CONCENT   121251.776       5    24250.355     289.434   0.000
Error     1005.427         12   83.786
3) The lack-of-fit F ratio does not exceed the critical value:
we keep the straight line
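The three steps can be sketched as follows, assuming the standard lack-of-fit decomposition: residual sum of squares from the regression minus pure-error sum of squares from the one-way ANOVA. The function names are hypothetical. For the HPLC example, the regression residual sum of squares is not printed in the transcript; the sketch reconstructs it as 16 × 8.282² from the standard error of estimate, which is an assumption.

```python
from collections import defaultdict

def fit_line(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = sxy / sxx
    return y_bar - b_hat * x_bar, b_hat

def lack_of_fit(xs, ys):
    """Return (F, df_lack_of_fit, df_pure_error) for replicated data."""
    n = len(xs)
    # Step 1: linear regression, residual sum of squares (n - 2 df)
    a_hat, b_hat = fit_line(xs, ys)
    ss_res = sum((y - (a_hat + b_hat * x)) ** 2 for x, y in zip(xs, ys))
    # Step 2: one-way ANOVA on the x levels, pure-error sum of squares (n - k df)
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    ss_pe = sum(sum((y - sum(g) / len(g)) ** 2 for y in g) for g in groups.values())
    k = len(groups)
    # Step 3: F ratio of the two mean squares
    df_lof, df_pe = k - 2, n - k
    return ((ss_res - ss_pe) / df_lof) / (ss_pe / df_pe), df_lof, df_pe

# Same arithmetic with the numbers of the HPLC example (n = 18, 6 levels).
ss_res_slides = 16 * 8.282 ** 2            # assumption: (n - 2) * s^2
f_slides = ((ss_res_slides - 1005.427) / 4) / 83.786
```

With these numbers the ratio is well below 1, so it cannot exceed a critical value of the F distribution with 4 and 12 degrees of freedom at usual risk levels, which agrees with keeping the straight line.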
29
Checking the variance model: homoscedasticity
Scatterplot of residuals vs fitted values
[Figure: two scatterplots of residuals around 0]
No structure in the residuals but heteroscedasticity: change the model (criterion)
Homoscedasticity: OK
30
What to do with heteroscedasticity ?
Use the scatterplot of residuals vs fitted values to model the dispersion.
[Figure: scatterplot of residuals around 0]
The standard deviation of the residuals increases with the fitted value $\hat Y$: it increases with x
31
What to do with heteroscedasticity ?
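Estimate the slope and the intercept again, but with weights inversely
proportional to the variance: minimise $\sum_i w_i (Y_i - (a + b x_i))^2$
with $w_i \propto 1 / \mathrm{Var}(\varepsilon_i)$,
and check that the weighted residuals $\sqrt{w_i}\,(Y_i - \hat a - \hat b x_i)$
are homoscedastic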
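A minimal weighted least squares sketch. The variance model used in the usage lines (standard deviation proportional to x, hence weights 1/x²) is only an assumed example, and the function names are hypothetical.

```python
import math

def fit_line_weighted(xs, ys, ws):
    """Return (a_hat, b_hat) minimising sum_i w_i * (y_i - a - b*x_i)^2."""
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw    # weighted means
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    b_hat = sxy / sxx
    return y_bar - b_hat * x_bar, b_hat

def weighted_residuals(xs, ys, ws, a_hat, b_hat):
    """Residuals rescaled by sqrt(w_i); these should look homoscedastic."""
    return [math.sqrt(w) * (y - (a_hat + b_hat * x)) for w, x, y in zip(ws, xs, ys)]

xs = [1, 2, 3, 4]
ys = [3.1, 4.8, 7.3, 8.6]
ws = [1 / x ** 2 for x in xs]          # assumed: SD of the residuals proportional to x
a_w, b_w = fit_line_weighted(xs, ys, ws)
```

With all weights equal, the function reduces to ordinary least squares.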
32
Checking the variance model normality
[Figure: two probability plots of the residuals; axis: expected value for normal distribution]
No curvature: normality
Curvature: non-normality. Is it so important ?
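The coordinates of such a probability plot can be sketched as follows: each ordered residual is paired with the value expected at that rank for a standard normal distribution. The plotting position (i - 0.375) / (n + 0.25) is one common convention and is an assumption here; the software used for the slides may use another. The residual values are made up.

```python
from statistics import NormalDist

def normal_scores(residuals):
    """Pair each ordered residual with its expected standard normal value."""
    n = len(residuals)
    ordered = sorted(residuals)
    expected = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(expected, ordered))

points = normal_scores([0.4, -1.2, 2.5, -0.3, 0.1])
```

Plotting the pairs and looking for a straight line is the graphical check described above; a curved pattern suggests non-normal residuals.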
33
What to do with non normality ?
Try to model the distribution of the residuals
In general, this is difficult with few observations
If enough observations are available, non-normality
does not affect the result too much.
34
An interesting index: R²
R² = squared correlation coefficient
   = % of dispersion of the Yi's explained by the
straight line (the model)
0 ≤ R² ≤ 1
If R² = 1, all the residuals are 0: the straight line
explains all the variation of the Yi's
If R² = 0, the slope is 0: the straight line
does not explain any variation of the Yi's
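A minimal sketch of R² as the share of the dispersion of the Yi's explained by the fitted line, computed as 1 minus the residual sum of squares over the total sum of squares. The function names and the small data sets are illustrative assumptions.

```python
def fit_line(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = sxy / sxx
    return y_bar - b_hat * x_bar, b_hat

def r_squared(xs, ys):
    """1 - (residual sum of squares) / (total sum of squares)."""
    a_hat, b_hat = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    sse = sum((y - (a_hat + b_hat * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst

r2_perfect = r_squared([1, 2, 3, 4], [3, 5, 7, 9])     # all residuals are 0
r2_flat = r_squared([1, 2, 3, 4], [1, -1, -1, 1])      # fitted slope is 0
```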
35
An interesting index: R²
R² and R (correlation coefficient) are not
designed to measure linearity !
Example
Multiple R: 0.990   Squared multiple R: 0.980
Adjusted squared multiple R: 0.980
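The data behind this example are not in the transcript. As a stand-in, the sketch below uses made-up, clearly curved data ($Y = x^2$ for x = 1 to 10) and shows that the correlation coefficient is still high, so a high R says little about linearity. The function name `correlation` is hypothetical.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient R."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

xs = list(range(1, 11))
ys = [x ** 2 for x in xs]      # a parabola, not a straight line
r = correlation(xs, ys)
r2 = r ** 2                    # about 0.95 despite the obvious curvature
```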
36
Questions
  • How to obtain the best straight line ?
  • Is this straight line the best curve to use ?
  • How to use this straight line ?

37
How to use this straight line ?
  • Direct use: for a given x
    • predict the mean Y
    • construct a confidence interval of the mean Y
    • construct a prediction interval of Y
  • Reverse use (calibration, approximate results): for a given Y
    • predict the mean x
    • construct a confidence interval of the mean x
    • construct a prediction interval of X

38
For a given x: predict the mean Y
Example
39
Confidence interval of the mean Y
There is a probability $1-\alpha$ that $a + b x$ belongs to
this interval
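The interval formula itself is missing from the transcript. The standard expression for a straight line fitted by least squares is sketched below, where $s$ is the standard error of estimate, $\bar x$ the mean of the $x_i$, and $t_{1-\alpha/2,\,n-2}$ the Student quantile with $n-2$ degrees of freedom (notation assumed here).

```latex
\hat a + \hat b x \;\pm\; t_{1-\alpha/2,\,n-2}\; s\,
\sqrt{\frac{1}{n} + \frac{(x - \bar x)^2}{\sum_{i=1}^{n} (x_i - \bar x)^2}}
```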
40
Confidence interval of the mean Y
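[Figure: confidence interval of the mean Y with lower limit L and upper limit U; the value 30 is marked on the axis]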
41
Example
42
Prediction interval of Y
$100(1-\alpha)$ % of the measurements carried out for
this x belong to this interval
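The sketch below computes the half-widths of both intervals with the standard formulas, to show that the prediction interval only differs by an extra "1 +" under the square root. The Student quantile is passed in as an argument because the Python standard library has no t distribution; the design values are hypothetical, while s = 8.282 is the standard error of estimate of the example and 2.120 is the 97.5 % Student quantile for 16 degrees of freedom.

```python
import math

def interval_half_widths(x0, xs, s, t_quantile):
    """Return (confidence half-width for the mean Y, prediction half-width for Y) at x0."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    ci = t_quantile * s * math.sqrt(leverage)         # uncertainty on the mean line
    pi = t_quantile * s * math.sqrt(1 + leverage)     # plus the scatter of one measurement
    return ci, pi

xs = [10, 20, 30, 40, 50, 60] * 3                     # hypothetical design, n = 18
ci_30, pi_30 = interval_half_widths(30, xs, s=8.282, t_quantile=2.120)
```

Both intervals are centred on the predicted mean $\hat a + \hat b x$; the prediction interval is always the wider of the two.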
43
Prediction interval of Y
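[Figure: prediction interval of Y with lower limit L and upper limit U; the value 30 is marked on the axis]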
44
Example
45
Reverse use: for a given $Y = y_0$, predict the mean X
Example
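The worked example is not readable in the transcript. A minimal sketch of the reverse use is to solve the estimated straight line for x; the coefficients below are those of the HPLC example, while the response value 150 is a made-up illustration.

```python
a_hat, b_hat = 20.046, 2.916      # intercept and slope from the regression output

def predict_x(y0):
    """Invert the estimated line y0 = a_hat + b_hat * x."""
    return (y0 - a_hat) / b_hat

x0_hat = predict_x(150.0)         # hypothetical measured response
```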
46
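For a given $Y = y_0$: a confidence interval of the
mean X
[Figure: fitted line with its confidence band; the level $y_0$ on the Y axis is projected onto the X axis to give the limits L and U]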
47
Confidence interval of the mean X
There is a probability $1-\alpha$ that the mean X
belongs to [L, U]
L and U are the values of x at which the limits of the confidence interval of the mean Y equal $y_0$
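One way to obtain L and U numerically is sketched below: find the x at which the upper limit of the confidence band equals y0 (this gives L when the slope is positive) and the x at which the lower limit equals y0 (this gives U). Bisection is used as a simple root finder. The design, the response y0 = 150 and the search range are hypothetical; the coefficients and s come from the HPLC example, and 2.120 is the 97.5 % Student quantile for 16 degrees of freedom.

```python
import math

def calibration_interval(y0, a_hat, b_hat, s, xs, t_quantile, lo, hi):
    """Return (L, U); assumes a positive slope and bands increasing on [lo, hi]."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)

    def band_minus_y0(x, sign):
        half = t_quantile * s * math.sqrt(1 / n + (x - x_bar) ** 2 / sxx)
        return a_hat + b_hat * x + sign * half - y0

    def bisect(f, left, right, tol=1e-9):
        # requires f(left) < 0 < f(right)
        while right - left > tol:
            mid = (left + right) / 2
            if f(mid) < 0:
                left = mid
            else:
                right = mid
        return (left + right) / 2

    lower = bisect(lambda x: band_minus_y0(x, +1), lo, hi)   # upper band reaches y0
    upper = bisect(lambda x: band_minus_y0(x, -1), lo, hi)   # lower band reaches y0
    return lower, upper

xs = [10, 20, 30, 40, 50, 60] * 3                            # hypothetical design
L, U = calibration_interval(150.0, 20.046, 2.916, 8.282, xs, 2.120, lo=0.0, hi=100.0)
```

The point estimate (y0 - 20.046) / 2.916 falls between L and U, and the interval is not exactly symmetric around it because the band widens away from the mean of the x values.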
48
Example
49
What you should no longer believe
  • One can fit the straight line by inverting x and Y
  • If the correlation coefficient is high, the straight line is the best model
  • Normality of the xi's is required to perform a regression
  • Normality of the ei's is essential to perform a good regression
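On the first point, a small sketch with made-up data shows why x and Y cannot simply be swapped: regressing x on Y and then inverting that line does not give back the line obtained by regressing Y on x, unless the fit is perfect.

```python
def slope(us, vs):
    """Least squares slope of the regression of v on u."""
    n = len(us)
    u_bar, v_bar = sum(us) / n, sum(vs) / n
    suu = sum((u - u_bar) ** 2 for u in us)
    suv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    return suv / suu

xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]                     # hypothetical measurements

b_y_on_x = slope(xs, ys)                 # Y regressed on x
b_x_on_y = slope(ys, xs)                 # x regressed on Y
b_inverted = 1 / b_x_on_y                # slope in the (x, Y) plane after swapping roles
```

Here the direct slope is 0.8 while the swapped-and-inverted slope is 1.25; the product of the two regression slopes equals R², so the two lines coincide only when R² = 1.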