Title: Nonlinear Regression
1. Nonlinear Regression
2. An example
3. Questions
- What does nonlinear mean?
- What is nonlinear kinetics?
- What is a nonlinear statistical model?
- For a given model, how to fit the data?
- Is this model relevant?
4. What does nonlinear mean?
Definition: an operator $P$ is linear if
- for all objects $x$, $y$ on which it operates: $P(x + y) = P(x) + P(y)$
- for all numbers $a$ and all objects $x$: $P(a x) = a\,P(x)$
When an operator is not linear, it is nonlinear.
5. Examples
Among the operators below, which ones are nonlinear?
- $P(a, b) = a\,t + b\,t^2$
- $P(A, a) = A \exp(-a t)$
- $P(A) = A \exp(-0.1\, t)$
- $P(t) = A \exp(-a t)$
- $P(t) = a\,t$
- $P(t) = a$
- $P(t) = a + b\,t$
- $P(t) = a\,t + b\,t^2$
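Linearity is easy to check numerically. A minimal sketch (assuming Python with NumPy; the operators and test values are taken from the list above) verifying that $P(a,b) = a\,t + b\,t^2$ is linear in $(a, b)$ while $P(A, a) = A\exp(-a t)$ is not:

```python
import numpy as np

t = np.linspace(0.0, 10.0, 50)

def P_poly(a, b):
    # Operator acting on the parameters (a, b): P(a, b) = a*t + b*t^2
    return a * t + b * t**2

def P_exp(A, alpha):
    # Operator acting on the parameters (A, alpha): P(A, alpha) = A*exp(-alpha*t)
    return A * np.exp(-alpha * t)

# Linearity requires P(x + y) == P(x) + P(y) for parameter vectors x and y.
x, y = (1.0, 2.0), (0.5, -1.0)
print(np.allclose(P_poly(x[0] + y[0], x[1] + y[1]), P_poly(*x) + P_poly(*y)))  # True: linear
print(np.allclose(P_exp(x[0] + y[0], x[1] + y[1]), P_exp(*x) + P_exp(*y)))    # False: nonlinear
```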
6. What is nonlinear kinetics?
Let $C(t, D)$ denote the concentration at time $t$ for a given dose $D$.
The kinetics is linear when the operator $P : D \mapsto C(t, D)$ is linear.
When $P(D)$ is not linear, the kinetics is nonlinear.
7. What is nonlinear kinetics?
Examples
8. What is a nonlinear statistical model?
A statistical model: $Y_i = f(x_i, \theta) + \varepsilon_i$, where
- $Y_i$ is the observation (dependent variable)
- $\theta$ are the parameters
- $x_i$ are the covariates (independent variables)
- $\varepsilon_i$ is the error (residual)
- $f$ is the (regression) function
9. What is a nonlinear statistical model?
A statistical model is linear when the operator $\theta \mapsto f(x, \theta)$ is linear.
When $\theta \mapsto f(x, \theta)$ is not linear, the model is nonlinear.
10. What is a nonlinear statistical model?
Example: $Y$ = concentration, $t$ = time.
A model whose mean function is linear in the parameters, for instance $Y_i = a\,t_i + b\,t_i^2 + \varepsilon_i$, is linear.
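Because such a model is linear in $(a, b)$, fitting it reduces to solving a linear system. A minimal sketch (assuming Python with NumPy; the data are simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.5, 10.0, 30)
y = 2.0 * t + 0.3 * t**2 + rng.normal(0.0, 1.0, t.size)  # simulated observations

# The model is linear in (a, b), so OLS reduces to a linear system:
X = np.column_stack([t, t**2])          # design matrix
(a_hat, b_hat), *_ = np.linalg.lstsq(X, y, rcond=None)
print(a_hat, b_hat)                     # close to the simulating values (2.0, 0.3)
```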
11. Examples
Among the statistical models below, which ones are nonlinear?
12. Questions
- What does nonlinear mean?
- What is nonlinear kinetics?
- What is a nonlinear statistical model?
- For a given model, how to fit the data?
- Is this model relevant?
13. How to fit the data?
Proceed in three main steps:
- Write a (statistical) model
- Choose a criterion
- Minimize the criterion
14. Write a (statistical) model
- Find a function of covariate(s) to describe the mean variation of the dependent variable (mean model).
- Find a function of covariate(s) to describe the dispersion of the dependent variable about the mean (variance model).
15. Example
$Y_i = f(x_i, \theta) + \varepsilon_i$, where $\varepsilon_i$ is assumed Gaussian with a constant variance $\sigma^2$: a homoscedastic model.
16. How to choose the criterion to optimize?
- Homoscedasticity: Ordinary Least Squares (OLS). Under normality, OLS is equivalent to maximum likelihood.
- Heteroscedasticity: Weighted Least Squares (WLS) or Extended Least Squares (ELS).
17. Homoscedastic models
The Ordinary Least Squares criterion. Define
$SS(\theta) = \sum_{i=1}^{n} \left( Y_i - f(x_i, \theta) \right)^2$
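A minimal OLS fit of a nonlinear mean model (assuming Python with NumPy/SciPy; the one-exponential model and all numerical values are illustrative, not from the slides):

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)
t = np.linspace(0.1, 12.0, 25)
y = 10.0 * np.exp(-0.4 * t) + rng.normal(0.0, 0.3, t.size)  # simulated data

def residuals(theta):
    A, alpha = theta
    return y - A * np.exp(-alpha * t)   # SS(theta) = sum(residuals(theta)**2)

fit = least_squares(residuals, x0=[5.0, 1.0])   # minimizes the OLS criterion
print("theta_hat =", fit.x, " SS =", np.sum(fit.fun**2))
```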
18. Heteroscedastic models: the Weighted Least Squares criterion
Define
$WSS(\theta) = \sum_{i=1}^{n} w_i \left( Y_i - f(x_i, \theta) \right)^2$
19. How to choose the weights?
When the model $Y_i = f(x_i, \theta) + \varepsilon_i$ is heteroscedastic (i.e. $\operatorname{Var}(\varepsilon_i)$ is not constant with $i$), it is possible to rewrite the variance as
$\operatorname{Var}(\varepsilon_i) = \sigma^2\, g(x_i)$
where $\sigma^2$ does not depend on $i$.
The weights are chosen as $w_i = 1 / g(x_i)$.
20. Example
$Y_i = f(x_i, \theta) + \varepsilon_i$ with, for instance, a proportional error: $\operatorname{Var}(\varepsilon_i) \propto f(x_i, \theta)^2$.
The model can be rewritten with $g(x_i) = f(x_i, \theta)^2$, so that $\operatorname{Var}(\varepsilon_i) = \sigma^2\, g(x_i)$ with $\sigma^2$ constant.
The weights are chosen as $w_i = 1 / f(x_i, \theta)^2$.
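A minimal WLS sketch for this proportional-error case (Python with NumPy/SciPy assumed; model and numbers illustrative):

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(2)
t = np.linspace(0.1, 12.0, 25)
f_true = 10.0 * np.exp(-0.4 * t)
y = f_true * (1.0 + 0.1 * rng.normal(size=t.size))  # proportional (heteroscedastic) error

def weighted_residuals(theta):
    pred = theta[0] * np.exp(-theta[1] * t)
    # g(x_i) = f(x_i, theta)^2, so w_i = 1/pred^2 and sqrt(w_i)*residual = residual/pred
    return (y - pred) / pred

# bounds keep the predictions positive so the weights stay well defined
fit = least_squares(weighted_residuals, x0=[5.0, 1.0],
                    bounds=([1e-6, 1e-6], [np.inf, np.inf]))
print("theta_hat =", fit.x)
```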
21. Extended (Weighted) Least Squares
Define (up to an additive constant, this is $-2$ times the Gaussian log-likelihood)
$ELS(\theta, \beta) = \sum_{i=1}^{n} \left[ \frac{\left( Y_i - f(x_i, \theta) \right)^2}{g(x_i, \beta)} + \ln g(x_i, \beta) \right]$
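A minimal ELS sketch, estimating the mean parameters and the variance parameters jointly (Python with NumPy/SciPy assumed; the power variance model $g = \sigma^2 f^{2\gamma}$ and all numbers are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
t = np.linspace(0.1, 12.0, 40)
f_true = 10.0 * np.exp(-0.4 * t)
y = f_true + np.sqrt(0.05) * f_true * rng.normal(size=t.size)

def els(params):
    A, alpha, log_s2, gamma = params
    pred = A * np.exp(-alpha * t)
    g = np.exp(log_s2) * np.abs(pred) ** (2.0 * gamma)   # variance model g(x_i, beta)
    return np.sum((y - pred) ** 2 / g + np.log(g))       # the ELS criterion

# mean parameters (A, alpha) and variance parameters (sigma^2, gamma) fitted jointly
fit = minimize(els, x0=[5.0, 1.0, np.log(0.1), 1.0], method="Nelder-Mead")
print(fit.x)
```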
22. Summary
23. The criterion's properties
- It converges
- It leads to consistent (asymptotically unbiased) estimates
- It leads to efficient estimates
- It may have several minima
24. It converges
When the sample size increases, the criterion concentrates about a value of the parameter.
Example: consider the homoscedastic model $Y_i = f(x_i, \theta) + \varepsilon_i$; the criterion to use is the Least Squares criterion $SS(\theta)$.
25. It converges
[Figure: the criterion plotted for a small sample size vs a large sample size]
26. It leads to consistent estimates
The criterion concentrates about the true value of the parameter.
27. It leads to efficient estimates
For a fixed $n$, the variance of a consistent estimator is always greater than a limit (the Cramér-Rao lower bound).
In other words, for a fixed $n$, the "precision" of a consistent estimator is bounded.
An estimator is efficient when its variance equals this lower bound.
28. Geometric interpretation
[Figure: ellipsoid] This ellipsoid is a confidence region of the parameter.
29. It leads to efficient estimates
For a given large $n$, there exists no criterion giving consistent estimates that is more "convex" than $-2 \ln(\text{likelihood})$.
[Figure: the $-2 \ln(\text{likelihood})$ curve compared with another criterion]
30. It has several minima
[Figure: a criterion with several local minima]
31. Minimize the criterion
Suppose that the criterion to optimize has been chosen.
We are looking for the value of $\theta$, denoted $\hat{\theta}$, which achieves the minimum of the criterion.
We need an algorithm to minimize such a criterion.
32. Example
Consider the homoscedastic model $Y_i = f(x_i, \theta) + \varepsilon_i$.
We are looking for the value of $\theta$, denoted $\hat{\theta} = \arg\min_{\theta} SS(\theta)$, which achieves the minimum of the criterion.
33. Isocontours
[Figure: isocontours of the criterion in the parameter plane]
34. Different families of algorithms
- Zero order algorithms: computation of the criterion only
- First order algorithms: computation of the first derivative of the criterion
- Second order algorithms: computation of the second derivative of the criterion
35. Zero order algorithms
- Simplex algorithm
- Grid search and Monte-Carlo methods
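A zero-order method only needs criterion evaluations. A minimal sketch using the Nelder-Mead simplex (Python with NumPy/SciPy assumed; model and numbers illustrative):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
t = np.linspace(0.1, 12.0, 25)
y = 10.0 * np.exp(-0.4 * t) + rng.normal(0.0, 0.3, t.size)

def ss(theta):
    A, alpha = theta
    return np.sum((y - A * np.exp(-alpha * t)) ** 2)

# Nelder-Mead only evaluates the criterion itself: no derivatives are required
fit = minimize(ss, x0=[5.0, 1.0], method="Nelder-Mead")
print(fit.x)
```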
36. Simplex algorithm
37. Monte-Carlo algorithm
38. First order algorithms
- Line search algorithm
- Conjugate gradient
39. First order algorithms
The derivatives of the criterion vanish at its optima.
Suppose that there is only one parameter $\theta$ to estimate; the criterion (e.g. $SS$) then depends only on $\theta$.
How to find the value(s) of $\theta$ where the derivative of the criterion vanishes?
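First-order methods use the gradient to locate such zeros. A minimal conjugate-gradient sketch with an analytic gradient (Python with NumPy/SciPy assumed; the model, gradient expressions, and numbers are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
t = np.linspace(0.1, 12.0, 25)
y = 10.0 * np.exp(-0.4 * t) + rng.normal(0.0, 0.3, t.size)

def ss(theta):
    A, alpha = theta
    return np.sum((y - A * np.exp(-alpha * t)) ** 2)

def grad_ss(theta):
    A, alpha = theta
    e = np.exp(-alpha * t)
    r = y - A * e
    # dSS/dA = -2*sum(r*e); dSS/dalpha = 2*A*sum(r*t*e)
    return np.array([-2.0 * np.sum(r * e), 2.0 * A * np.sum(r * t * e)])

# Conjugate gradient follows the supplied first derivative to a zero of the gradient
fit = minimize(ss, x0=[5.0, 1.0], jac=grad_ss, method="CG")
print(fit.x)
```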
40. Line search algorithm
[Figure: successive iterates $\theta_0$, $\theta_1$, $\theta_2$ located on the curve of the derivative of the criterion]
41. Second order algorithms
- Gauss-Newton
- Marquardt
42. Second order algorithms
The derivatives of the criterion vanish at its optima. When the criterion is (locally) convex, there is a path to reach the minimum: the steepest direction.
43. Gauss-Newton (one dimension)
[Figure: iterates $\theta_1$, $\theta_2$, $\theta_3$ converging on the derivative of the criterion] The criterion is convex.
44. Gauss-Newton (one dimension)
[Figure: iterates $\theta_0$, $\theta_1$, $\theta_2$ failing to converge on the derivative of the criterion] The criterion is not convex.
45. Gauss-Newton
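A minimal sketch of the classical Gauss-Newton iteration for a least-squares criterion (Python with NumPy assumed; the exponential model, starting point, and data are illustrative): each step solves the normal equations of the linearized model.

```python
import numpy as np

rng = np.random.default_rng(6)
t = np.linspace(0.1, 12.0, 25)
y = 10.0 * np.exp(-0.4 * t) + rng.normal(0.0, 0.3, t.size)

theta = np.array([5.0, 1.0])            # starting point (A, alpha)
for _ in range(50):
    A, alpha = theta
    e = np.exp(-alpha * t)
    r = y - A * e                         # residuals
    J = np.column_stack([e, -A * t * e])  # Jacobian of f w.r.t. (A, alpha)
    # Gauss-Newton step: solve (J^T J) d = J^T r, then theta <- theta + d
    d = np.linalg.solve(J.T @ J, J.T @ r)
    theta = theta + d
    if np.linalg.norm(d) < 1e-10:
        break
print(theta)
```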
46. Marquardt
Handles the case where the criterion is not convex: when the second derivative is $< 0$ (the first derivative decreases), it is replaced by a positive value.
[Figure: iterates $\theta_0$, $\theta_1$, $\theta_2$, $\theta_3$ on the derivative of the criterion]
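A minimal sketch of Marquardt-style damping, which adds a positive multiple of the identity to $J^\top J$ so the step matrix stays positive definite (Python with NumPy assumed; the model, damping schedule, and numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
t = np.linspace(0.1, 12.0, 25)
y = 10.0 * np.exp(-0.4 * t) + rng.normal(0.0, 0.3, t.size)

def resid_jac(theta):
    A, alpha = theta
    e = np.exp(-alpha * t)
    return y - A * e, np.column_stack([e, -A * t * e])

theta, lam = np.array([1.0, 2.0]), 1e-2   # deliberately poor starting point
r, J = resid_jac(theta)
for _ in range(100):
    # Damping: lam * I added to J^T J keeps the step matrix positive definite
    d = np.linalg.solve(J.T @ J + lam * np.eye(2), J.T @ r)
    r_new, J_new = resid_jac(theta + d)
    if np.sum(r_new**2) < np.sum(r**2):
        theta, r, J = theta + d, r_new, J_new
        lam *= 0.5          # step accepted: trust the quadratic model more
    else:
        lam *= 10.0         # step rejected: damp harder (closer to gradient descent)
print(theta)
```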
47. Summary
48. Questions
- What does nonlinear mean?
- What is nonlinear kinetics?
- What is a nonlinear statistical model?
- For a given model, how to fit the data?
- Is this model relevant?
49. Is this model relevant?
- Graphical inspection of the residuals:
  - mean model ($f$)
  - variance model ($g$)
- Inspection of numerical results:
  - variance-covariance and correlation matrix of the estimator
  - Akaike criterion (AIC)
50. Graphical inspection of the residuals
For the model $Y_i = f(x_i, \theta) + \varepsilon_i$ with $\operatorname{Var}(\varepsilon_i) = \sigma^2 g(x_i)$, calculate the weighted residuals
$e_i = \dfrac{Y_i - f(x_i, \hat{\theta})}{\sqrt{g(x_i)}}$
and draw $e_i$ vs the fitted values $f(x_i, \hat{\theta})$.
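A minimal residual-plot sketch (Python with NumPy/Matplotlib assumed; the fitted values stand in for a previous fit, and the homoscedastic case $g = 1$ is used for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(8)
t = np.linspace(0.1, 12.0, 25)
y = 10.0 * np.exp(-0.4 * t) + rng.normal(0.0, 0.3, t.size)

theta_hat = (10.0, 0.4)                       # stand-in for a previously fitted theta
fitted = theta_hat[0] * np.exp(-theta_hat[1] * t)
g = np.ones_like(fitted)                      # homoscedastic case: g(x_i) = 1
weighted_resid = (y - fitted) / np.sqrt(g)

plt.scatter(fitted, weighted_resid)
plt.axhline(0.0, color="grey")
plt.xlabel("fitted values")
plt.ylabel("weighted residuals")
plt.show()
```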
51. Check the mean model
Scatterplot of weighted residuals vs fitted values:
- structure in the residuals: change the mean model (the $f$ function);
- no structure in the residuals: OK.
[Figure: two residual scatterplots, with and without structure]
52. Check the variance model: homoscedasticity
Scatterplot of weighted residuals vs fitted values:
- no structure in the residuals but heteroscedasticity: change the variance model (the $g$ function);
- homoscedasticity: OK.
[Figure: two residual scatterplots, heteroscedastic and homoscedastic]
53. Example
Homoscedastic model. Criterion: OLS.
54. Example
Structure in the residuals: change the mean model.
New model: still homoscedastic.
55. Example
Heteroscedasticity: change the variance model.
New model: WLS is needed.
56. Example
57. Inspection of numerical results
Correlation matrix of the estimator: strong correlations between estimates suggest that
- the model is over-parametrized, or
- the parametrization is not good, or
- the model is not identifiable.
58. The model is over-parametrized
Change the mean and/or variance model ($f$ and/or $g$).
Example: the appropriate model is a simple one, and you fitted a model with redundant parameters.
Perform a test or check the AIC.
59. The parametrization is not good
Change the parametrization of your model. Example: try an equivalent form of the same model (for instance $A e^{-\alpha t}$ vs $A e^{-t/\tau}$).
Two useful indices: the parametric (parameter-effects) curvature and the intrinsic curvature.
60. The model is not identifiable
The model has too many parameters compared to the number of data: there are many solutions to the optimisation.
Look at the eigenvalues $\lambda_1, \dots, \lambda_p$ of the correlation matrix: if the ratio $\lambda_{\max}/\lambda_{\min}$ is too large and/or $\lambda_{\min}$ is too small, simplify the model.
61. The Akaike criterion (AIC)
The Akaike criterion allows one to select a model among several models in "competition".
It is nothing else but a penalized log-likelihood: it chooses the model which is the most likely, penalizing the number of parameters.
The penalty is chosen such that the criterion converges: when the sample size increases, it selects the "true" model.
$AIC = n \ln(SS / n) + 2p$
with $n$ the sample size, $SS$ the (weighted or ordinary) sum of squares, and $p$ the number of parameters that have been estimated.
The model with the smallest AIC is the best among the compared models.
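A minimal AIC comparison under this least-squares form (Python with NumPy assumed; the sample size, SS values, and parameter counts are illustrative):

```python
import numpy as np

def aic(n, ss, p):
    # AIC in its least-squares form: n*ln(SS/n) + 2p (additive constants dropped)
    return n * np.log(ss / n) + 2 * p

# compare a 2-parameter fit with a 4-parameter fit on the same n = 25 points
print(aic(n=25, ss=3.1, p=2))   # SS values are illustrative
print(aic(n=25, ss=2.9, p=4))   # smaller SS, but penalized for the extra parameters
```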
62. Example
[Figure: value of the loss (criterion) at each iteration of the optimization]
63. Example
Essentially intrinsic curvature.
[Figure]
64. About the ellipsoid
It is linked to the convexity of the criterion. It is linked to the variance of the estimator.
Hence the convexity of the criterion is linked to the variance of the estimator.
65. Different degrees of convexity
[Figure: criterion surfaces illustrating, from panel to panel: a flat (weakly convex) criterion; a convex criterion; a locally convex criterion; a criterion convex in some directions only]
66. How to measure convexity?
One parameter: when the second derivative is positive, the criterion is convex at the point where the second derivative is evaluated.
Several parameters: calculate the Hessian matrix, i.e. the matrix of second partial derivatives of the criterion.
67. How to measure convexity?
It is possible to find a linear transformation of the parameters such that the Hessian matrix is diagonal, $\operatorname{diag}(\lambda_1, \dots, \lambda_p)$, where $\lambda_1, \dots, \lambda_p$ are the eigenvalues of the Hessian matrix.
When all the $\lambda_i > 0$, the criterion is convex.
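A minimal numeric check of this convexity criterion (Python with NumPy assumed; the finite-difference Hessian, the model, and all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)
t = np.linspace(0.1, 12.0, 25)
y = 10.0 * np.exp(-0.4 * t) + rng.normal(0.0, 0.3, t.size)

def ss(theta):
    A, alpha = theta
    return np.sum((y - A * np.exp(-alpha * t)) ** 2)

def hessian(fun, theta, h=1e-4):
    # Finite-difference matrix of second partial derivatives
    p = len(theta)
    I, H = np.eye(p), np.empty((p, p))
    for i in range(p):
        for j in range(p):
            di, dj = h * I[i], h * I[j]
            H[i, j] = (fun(theta + di + dj) - fun(theta + di - dj)
                       - fun(theta - di + dj) + fun(theta - di - dj)) / (4.0 * h * h)
    return H

lam = np.linalg.eigvalsh(hessian(ss, np.array([10.0, 0.4])))
print("eigenvalues:", lam, " convex here:", bool(np.all(lam > 0)))
```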
68. How to measure convexity?
When $\lambda_i > 0$ for some $i$ only, and $\lambda_j \le 0$ for the others, the criterion is locally convex, or convex in some directions only.
When the $\lambda_i$ are low (but $> 0$), the criterion is flat.
69. The variance-covariance matrix
The variance-covariance matrix of the estimator (denoted $V$) is proportional to the inverse of the Hessian of the criterion.
It is possible to find a linear transformation of the parameters such that $V$ is diagonal.
70. The variance-covariance matrix
In that parametrization, $V = \operatorname{diag}(\mu_1, \dots, \mu_p)$, where $\mu_1, \dots, \mu_p$ are the eigenvalues of the variance-covariance matrix $V$.
71. The correlation matrix
The correlation matrix of the estimator (denoted $C$) is obtained from $V$:
$C_{jk} = \dfrac{V_{jk}}{\sqrt{V_{jj}\, V_{kk}}}$
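A minimal sketch of this conversion (Python with NumPy assumed; the matrix values are illustrative):

```python
import numpy as np

# V: variance-covariance matrix of the estimator (illustrative values)
V = np.array([[0.30, 0.04],
              [0.04, 0.01]])

sd = np.sqrt(np.diag(V))
C = V / np.outer(sd, sd)    # C_jk = V_jk / sqrt(V_jj * V_kk)
print(C)                    # unit diagonal; off-diagonal entries are the correlations
```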
72. Geometric interpretation
[Figure: isocontours of the criterion] When the correlation $r = 0$, the axes of the ellipsoid are parallel to the parameter axes.