Title: Non-linear relationships
1 Non-linear relationships
- What to do when the relationship is not linear
2 Possibilities
- Transformation: in practice it helps only for monotonic relationships. One has to be careful: transforming the predictor changes only the shape of the relationship, whereas transforming the response also changes the probability characteristics (the distribution of residuals).
3 Possibilities
- Polynomial regression: any function can be approximated (over a limited range of predictor values) by a polynomial function, i.e. $b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \dots$
- Assumption: the residuals are normally distributed with homogeneous variance, i.e. $Y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \dots + e$
- Traditional names: quadratic regression, cubic regression
4 Polynomial regression
In fact, it is an application of multiple linear regression, where the predictors are $X$, $X^2$, $X^3$, etc. The computation is the same (i.e. again the least-squares criterion, the minimum sum of squared residuals, which under normal conditions again has a single minimum). $R^2$ has a similar meaning, and the significance tests are computed similarly (i.e. the overall ANOVA of the model, and tests for the individual terms of the polynomial).
So I again assume that e is additive and independent of the predicted value (homogeneity of variance).
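The point that polynomial regression is just multiple linear regression on powers of X can be sketched in a few lines. This is a minimal illustration, assuming NumPy is available; the function name `fit_polynomial` and the data are hypothetical and not part of the original slides.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Least-squares fit of y = b0 + b1*x + ... + b_degree*x**degree."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: the predictors are 1, x, x^2, ..., one column per
    # power of x, exactly as in multiple linear regression.
    X = np.column_stack([x ** k for k in range(degree + 1)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    rss = float(np.sum((y - fitted) ** 2))      # residual sum of squares
    tss = float(np.sum((y - y.mean()) ** 2))    # total sum of squares
    r2 = 1.0 - rss / tss
    return coef, r2

# Hypothetical data that follow y = 1 + 2x - 0.5x^2 exactly (no noise).
x = np.arange(0.0, 6.0)
y = 1.0 + 2.0 * x - 0.5 * x ** 2
coef, r2 = fit_polynomial(x, y, degree=2)
print(coef, r2)
```

Because the data here contain no noise, the fit should recover the generating coefficients up to rounding error; with real data the residuals would carry the assumptions stated above.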
5 Flexibility increases with the degree of the polynomial
[Figure: five panels numbered 1 to 5; x-axis: Time, in hours; y-axis: Weight, in kilograms]
Attention! Increasing complexity does not always mean better prediction ability.
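The warning above can be checked on simulated data. The sketch below is only an illustration under stated assumptions: the quadratic "truth", the noise level, the seed, and the train/test split are all hypothetical choices, and NumPy is assumed to be available. The residual sum of squares on the data used for fitting can only go down as the degree grows, while the error on new data need not.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed, arbitrary choice

# Hypothetical data: a quadratic trend plus normal noise.
x_train = np.linspace(0.0, 10.0, 12)
x_test = np.linspace(0.5, 9.5, 10)

def truth(x):
    return 2.0 + 1.5 * x - 0.1 * x ** 2

y_train = truth(x_train) + rng.normal(0.0, 1.0, x_train.size)
y_test = truth(x_test) + rng.normal(0.0, 1.0, x_test.size)

train_rss, test_rss = [], []
for degree in range(1, 6):
    # Polynomial.fit rescales x internally, which keeps the fit stable.
    p = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    train_rss.append(float(np.sum((y_train - p(x_train)) ** 2)))
    test_rss.append(float(np.sum((y_test - p(x_test)) ** 2)))

for degree, (tr, te) in enumerate(zip(train_rss, test_rss), start=1):
    print(f"degree {degree}: train RSS = {tr:.2f}, test RSS = {te:.2f}")
```

Comparing the two columns of the printout shows whether the extra flexibility actually helped on data the model has not seen.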
7 Stepwise regression: it makes the model more complex gradually
- Quadratic regression can be highly significant even if linear regression is not.
- Significance of the quadratic term can be understood as proof that the relationship is non-linear.
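A small sketch of that situation, assuming NumPy and a hypothetical data set (a symmetric parabola with a small deterministic wobble): the linear fit explains essentially nothing, while the F statistic for adding the quadratic term is very large. The helper `rss_of_fit` is an illustrative name, and the p-value would come from the F distribution with 1 and n - 3 degrees of freedom, which is not computed here.

```python
import numpy as np

def rss_of_fit(X, y):
    """Return coefficients and residual sum of squares of a least-squares fit."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, float(resid @ resid)

# Hypothetical data: a symmetric parabola plus a small deterministic wobble.
x = np.arange(0.0, 11.0)
wobble = 0.5 * (-1.0) ** np.arange(x.size)
y = (x - 5.0) ** 2 + wobble
n = x.size

ones = np.ones(n)
coef_lin, rss_lin = rss_of_fit(np.column_stack([ones, x]), y)
coef_quad, rss_quad = rss_of_fit(np.column_stack([ones, x, x ** 2]), y)
tss = float(np.sum((y - y.mean()) ** 2))

r2_lin = 1.0 - rss_lin / tss
# F statistic for adding the quadratic term: 1 and n - 3 degrees of freedom.
f_quad = (rss_lin - rss_quad) / (rss_quad / (n - 3))

print(f"linear slope = {coef_lin[1]:.3g}, R^2 = {r2_lin:.3g}")
print(f"F for the quadratic term = {f_quad:.1f} on 1 and {n - 3} df")
```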
8 We usually use polynomial regression when
- we see that the relationship is not linear, but we have no idea what shape the function has
- I do not remember seeing a wise use of a polynomial of higher than third order.
9 Other possibilities
- I have an idea (e.g. from some theory) of what the dependence should look like, and I believe the residuals will be randomly spread around the predicted value, i.e. the model is $Y = f(X) + e$
- X here is a vector, so there can be more than one independent variable
- We again estimate using the least-squares method (minimum sum of squared residuals)
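Written out, the least-squares criterion for such a model could look as follows. The parameter vector $\theta$, the index $i$, and the sample size $n$ are notation introduced here for illustration only; they do not appear in the original slides.

$$S(\theta) = \sum_{i=1}^{n} \bigl(Y_i - f(X_i; \theta)\bigr)^2, \qquad \hat{\theta} = \arg\min_{\theta} S(\theta)$$

When $f$ is a polynomial in $X$, this is the same criterion as in polynomial regression.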
11 In contrast to linear regression methods (including polynomial regression), the minimum has to be found by methods of numerical mathematics: an analytical solution need not exist, nor is there any certainty that the minimum found is the global one.
Numerical procedure:
1. Differentiate with respect to all estimated parameters.
2. Set all the derivatives equal to zero.
3. Solve the resulting system.
Numerical solution of the equation f(x) = 0: Newton's method
[Figure: curve f(x) plotted against x, with successive estimates x1, x2, x3 marked on the x-axis; label: My estimation of x]
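Newton's method for f(x) = 0 can be sketched as below. This is a minimal one-dimensional illustration using only the Python standard library; the function name `newton`, the tolerance, and the example equation are assumptions made for the example, not taken from the slides.

```python
import math

def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Newton's method for f(x) = 0, starting from the estimate x0."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)      # where the tangent at x crosses zero
        x = x - step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")

# Hypothetical example: solve x^2 - 2 = 0, starting from x1 = 3.
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=3.0)
print(root, math.sqrt(2.0))
```

Each pass replaces the current estimate by the point where the tangent to f crosses zero, which corresponds to moving from x1 to x2 to x3 in the figure.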
12 Disadvantages of numerical solution
- It does not always converge.
- It sometimes finds just a local minimum (the derivatives are equal to zero there as well), and we have few ways of checking which kind of minimum it is.
- We need initial values of the parameters.
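The dependence on initial values can be seen even in a one-dimensional root-finding problem. The sketch below is an illustration only: the cubic is a standard textbook example chosen here, not one from the slides, and `newton_status` is a hypothetical helper name. From one starting value Newton's method cycles forever; from another it converges.

```python
def newton_status(f, df, x0, tol=1e-10, max_iter=50):
    """Run Newton's method; return (last estimate, True if it converged)."""
    x = x0
    for _ in range(max_iter):
        slope = df(x)
        if slope == 0.0:             # horizontal tangent: no next estimate
            return x, False
        x_new = x - f(x) / slope
        if abs(x_new - x) < tol:
            return x_new, True
        x = x_new
    return x, False

def f(x):
    return x ** 3 - 2.0 * x + 2.0

def df(x):
    return 3.0 * x ** 2 - 2.0

print(newton_status(f, df, x0=0.0))    # estimates alternate between 0 and 1
print(newton_status(f, df, x0=-2.0))   # converges to the only real root
```

In a regression setting the same issue appears when the derivative equations are solved numerically: a poor set of initial parameter values can prevent convergence or lead to a different stationary point.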
13 Analogy: a marble (taw) rolling down
14 Various local regressions: I do not get a single function; the fit is different for different local parts of the curve
15 I know the distribution of the response variable
- Generalized Linear Models
- They are able to reflect the type of distribution, and thus also which values the response can take (e.g. the probability of survival must be between zero and one)
- Link function
16 Typical example: logistic regression
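The role of the link function in logistic regression can be illustrated with the logit link and its inverse. This is a minimal sketch using only the standard library; the coefficients `b0` and `b1` and the predictor (called "dose" in the comment) are hypothetical values chosen for illustration, not estimates from any data.

```python
import math

def inv_logit(eta):
    """Inverse logit link: map a linear predictor to a probability."""
    if eta >= 0.0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)               # avoids overflow for very negative eta
    return e / (1.0 + e)

def logit(p):
    """Logit link: log odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients of a logistic regression of survival on dose x.
b0, b1 = 2.0, -0.8
for x in (0.0, 2.5, 5.0, 10.0):
    eta = b0 + b1 * x               # linear predictor, any real number
    print(f"x = {x:4.1f}  eta = {eta:5.1f}  P(survival) = {inv_logit(eta):.3f}")
```

The linear predictor can take any real value, but the predicted probability always stays between zero and one, which is the constraint mentioned for the probability of survival.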