A Comparison of Mixed Model Splines

About This Presentation

Title:

A Comparison of Mixed Model Splines

Description:

... as knots, we get an exact fit and the fitted curve is the interpolation spline ... Note: fitting both terms as fixed gives interpolation spline. ... – PowerPoint PPT presentation

Number of Views:151

Avg rating:3.0/5.0

Slides: 64

Provided by: wwwmath

Category:

more less

Transcript and Presenter's Notes

Title: A Comparison of Mixed Model Splines

1
A Comparison of Mixed Model Splines

Sue Welham
Rothamsted Research
with acknowledgements to
Robin Thompson, Brian Cullis,
Mike Kenward, GRDC

2
Motivation

Smoothing splines provide flexible data-driven
functions useful for modelling real data
Several types of spline have been proposed within
the mixed model context
cubic smoothing splines
P-splines
penalised splines
Which should you use?
Does it make any difference?
This talk will review these models, show the
connections between them, and look at the impact
of different penalties on the fitted spline.

3
Outline

Splines and basis functions
Smoothing splines and penalties
Mixed model splines
Cubic smoothing spline P-spline Penalised
splines
Connections between models
General model (with constraints)
Choice of model
example yield-response data
simulation study to assess behaviour of models
diagnostics to assess model fit
Conclusions

4
Simple spline model

Suppose we have n data values
y ( y1 yn )T
and explanatory variable on range a, b
x ( x1 xn )T a lt x1 lt x2 lt lt xn lt b
We assume the data follows some functional form
y(x) g(x) e(x)
for some unknown but smooth function g, and
error process e.
Evaluated at the data points, x,
y g(x) e
with e N( 0, s2R(f) ), in the simplest case
RIn.

5
Polynomial splines

A polynomial spline g(x) of degree k (k3 for
cubic splines) with r knots t1tr has the
properties
on each interval tj, tj1 , g(x) is a
polynomial of degree k
g(x) and derivatives to order k-1 are continuous
on a,b
Natural splines of degree k 2m -1 obey an
additional constraint
g( j )(t1) g( j )(tr) 0 for j mk
ie. natural splines have higher order
derivatives constrained to zero outside the range
of the knots.
Natural cubic splines are linear outside the
range of the knots.

6
Spline basis functions

A set of cubic spline basis functions for knot
points t(t1 tr)T is a set of functions si
i1r4 such that
each basis function si is itself a cubic spline
the set si spans the set of cubic splines for
the knot points t
Then any cubic spline g(x) can be represented as
for suitably chosen coefficients ci.
For natural splines, the basis functions should
also be natural splines, with only r basis
functions required.

7
Regression splines

Regression splines are constructed as follows
a set of r knots is chosen, r n
a set of rk1 basis functions si is constructed
the spline is fitted via regression using the
model
y g(x) e ? ci si(x) e
If the n data points are used as knots, we get an
exact fit and the fitted curve is the
interpolation spline
The smoothness of the curve is determined by the
number and position of the knots fewer knots ?
smoother curve
Choice of knots subjective

8
Regression spline

black line is function
blue dots are data (32 points)
green line is regression spline with 20 knots
(equal spacing)
red line is regression spline with 9 knots (equal
spacing)

9
Smoothing splines

Smoothing splines are fitted to the model
y g(x) e ? ci si(x) e
by minimising a penalised sum of squares (PSS)
y-g(x) T R-1 y-g(x) ? penalty
where
the first term (residual sum of squares) measures
fidelity of the fitted curve to the data
the penalty quantifies undesirable behaviour
(roughness) of the fitted curve
the PSS balances the fit of the curve to the data
against the roughness of the fitted curve
? is a smoothing parameter that controls the
balance between the two terms, and hence
smoothness of fitted curve

10
Cubic smoothing spline

The cubic smoothing spline is the natural cubic
spline with knots at the n covariate values that,
for a given smoothing parameter value, minimises
PSS
y-g(x) T R-1 y-g(x) ? ? g''(x)2
the penalty is the integrated squared second
derivative of the fitted spline
can be regarded as measuring accumulated
deviations from straight line
Penalty can be rewritten as
? g''(x)2 ?TGs-1 ?
where
? ( ?2 ?n-1 ) holds values of second
derivatives of fitted spline at internal knots x2
xn-1
Gs-1 is a tri-diagonal matrix with n-2 rows
defined in terms of distances between successive
knots

11
Cubic smoothing splines

Cubic spline can be written as
g(x) t1 t2x ? ?i Pi(x)
where Pi(x), i2n-1are basis functions for
natural cubic splines representing deviations
about linear trend
or in matrix form
g(x) Xt P?
where X 1 x , t (t1 t2)T and P P2(x)
Pn-1(x)
with penalised sum of squares
y- Xt - P? T R-1 y- Xt - P? ? ?TGs-1 ?
this looks like the mixed model equations from a
linear mixed model ? mixed model smoothing
splines
described by Verbyla et al 1999, Brumback Rice
(1998), Wang (1998), Zhang et al (1998).

12
Basis functions for natural cubic spline

Basis functions for natural cubic spline
with knots at x 1, 2 10
(excluding constant and linear)

13
Mixed model cubic smoothing splines

Given the smoothing parameter, the cubic
smoothing spline is the BLUP
from the mixed model
y Xt P? e
with var(?)ss2Gs , var(e) s2R and ? s2/
ss2
The smoothing parameter can also be estimated by
REML as a variance parameter.
Note fitting both terms as fixed gives
interpolation spline. Fitting deviations as
random introduces smoothing via shrinkage of
random effects. Amount of shrinkage depends on ?.

14
Mixed model cubic smoothing splines

REML estimation selects smoothing parameter value
that gives best fit to implicit variance model
defined by P and Gs
Practical experience (and various reported
simulation studies) suggests this generally works
reasonably well
Advantage we can build spline terms into complex
mixed models to account for all other sources of
variation in the data, including correlated
errors
But
are we using this model just as a tool, or do we
really believe it as a model for the data?
if a tool, need to bear this in mind when making
inferences from fitted model
might get better results when model is compatible
with data

15
Reduced knot mixed model spline

Advantage of cubic smoothing spline no need to
use subjective choice of knots
Design matrix for spline basis functions is
dense, with columns number of distinct
covariate values - 2
For large data sets, models may be slow to fit
Use reduced knot version of cubic smoothing
spline
for specified knots t ( t1 tr )T form basis
functions P2 Pr-1
fit model with same fixed effects, but basis
functions, random effects ? and variance matrix
Gs defined with respect to knots
Number of columns in random design matrix r-2
Choice of knots subjective (and affects fitted
spline)
Fitted spline no longer minimises PSS
May be better methods to fit low-rank
approximation

16
P-splines

Introduced by Eilers Marx (1996) as simple and
computationally efficient alternative to
smoothing splines
P-splines
use a B-spline basis
reduced knot set, knots equally spaced
discrete penalty defined using differencing
operators
Model g(x) ? ai Bi(x) or g(x) Ba
Penalised sum of squares
y-Ba Ty-Ba ?aT?dT ?d a
where ?d is dth order differencing matrix
User has choice of
order of B-splines (k)
number and position of knots t1 tr
order of differencing in the penalty (d)

17
Cubic B-spline basis

Cubic B-spline basis for knots 1, 2 10
Note compact support for individual basis
functions

18
P-splines as mixed models

For k3, d2, P-spline approximates cubic
smoothing spline
In the general case, let
Ir ?dT ( ?d?dT )-1?d LLT
for full column-rank matrix L
Mixed model form of P-spline uses
y BLtb B ?dT ( ?d?dT )-1ub e
with tb LTa and ub ?da with var(ub)ss2I
BL is a polynomial of degree d
For d2, fixed part of model represents linear
trend, random part represents deviations about
linear trend.
Sparse representation lost within mixed model
context

19
Cubic P-spline mixed models

Comparison with cubic smoothing spline
P-splines
use equally spaced knots
has choice of differencing order in penalty
differencing order determines fixed terms in
model
non-natural basis (this can be modified)
Cubic smoothing spline
optimisation property for full-knot basis
any knots can be used for reduced-knot set
fixed terms determined by order of spline basis

20
Penalised splines

described in Semi-parametric regression by
Ruppert, Wand Carroll (2003)
Penalised splines use
truncated power function basis
reduced knot set
ad-hoc penalty to induce smoothing
Model
where
XT 1 x xk tT (tT0 tTk)T,
T T1(x) Tr(x) ß (ß1 ßr)T

21
Truncated power function (TPF) basis

Cubic TPF basis for knots 1, 2 10

22
Penalised splines

Penalised sum of squares
y XTtT - Tß Ty XTtT - Tß ?ßTß
ad-hoc penalty used to introduce
shrinkage/smoothing
User choice
order of basis
knot positions
Mixed model formulation
y XTtT Tß e
with var(ß) ss2I, var(e) s2R
TPF basis has the advantage that it is simple,
can add/remove knots without disturbing whole
basis

23
Comparison of mixed model splines

Polynomial smoothing P-spline Penalised spline
spline
--------------------------------------------------
---------------------------------------------
Choice of k Choice of k Choice of k
(order of polynomial)
- Choice of d -
(differencing)
k ? fixed terms d ? fixed terms k ? fixed terms
k ? penalty d ? penalty Identity penalty
matrix
Choice of knots Choice of knots Choice of knots
(equal spaced)
--------------------------------------------------
---------------------------------------------

24
Connections between spline bases

Blurring over details
k-degree B-spline basis can be constructed as
(k1)th differenced TPF basis
cubic B-spline basis is 4th difference of cubic
TPF basis
natural cubic spline (NCS) basis functions can be
constructed as 2nd differences of cubic TPF basis
Hence for equally-spaced knots, spacing h, we
have equivalent representations of cubic spline
from different bases with
ß ?2? / 6h ? ?2a / h2 ß ?4a / 6h3

25
Connections between spline bases

We can compare penalties
For cubic smoothing spline and cubic P-spline
with 2nd order differencing (recommended default)
?TGs-1? ? ??j2 0.5 ??j ?j1
aT?2T?2 a ? ?T?
so second difference penalty can be regarded as
approximation to cubic smoothing spline penalty
For cubic penalised spline
ßTß ? aT?4T?4a ? ?T?2T?2?
so cubic penalised spline is equivalent to cubic
P-spline with d4
Linear penalised spline (recommended default) is
equivalent to linear P-spline with d2.

26
General mixed model spline

For spline basis Bi(x) i1q
Model is
y ? aiBi(x) e Ba e
with penalised sum of squares
y-Ba T y-Ba ?aTG-1a
where G is a semi-positive definite symmetric
penalty matrix (eg. ?dT?d or evaluated integral)
Extend model to impose c constraints CTa0
(eg. natural, periodic) via aS? where CTS0
so PSS becomes
y-BS? T y-BS? ? ?TSTG-1S?

27
General mixed model spline

Fit model via eigen-value decomposition
STG-1S UDUT
for some q-c ? q-c-s matrix U with UTUI and
Ddiagd1dq-c-s
Then
BS? BSLt BSUD0.5u
uTu ?TSTG-1S?
where
LLTI-UUT
tLT? is an s?1 vector fit as fixed effects
uD0.5UT? is a q-c-s vector, fit as random
effects with var(u)ss2I

28
General mixed model spline

General model allows easy transformation between
bases
if Tß Ba with TBA for some full rank matrix A
then a Aß
So PSS becomes
y-BAß T y-BAß ? ßTATG-1Aß
Immediately gives relationship between penalties
Calculation proceeds as before

29
Example cubic smoothing spline with additional
periodic constraints

Suppose data periodic on range 0,?
Use full TPF basis with penalty ?g''2 ? ßTG-1ß
Impose constraints g(j)(0)g(j)(?) for j0,1,2

Derive basis
fixed term constant
subset of random basis fns shown

30
User choices that affect fitted spline

Order of spline basis
for given penalty and reasonable number of knots,
order of basis makes little difference
basis should be appropriate to aims of analysis
eg. to estimate derivatives of curve, require
differentiable function
linear splines are continuous but not
differentiable

31
User choices that affect fitted spline

Number of knots
choice of both number and position of knots
subjective
reducing knots reduces flexibility of fitted
spline
usually aim to have sufficient knots to allow
flexible spline without incurring large
computational load
large regression spline literature on knot
selection
Ruppert et al (2003) advocate
with knots at quantiles of covariate
can use REML logL to compare knot sets

32
User choices that affect fitted spline

Penalty matrix
consider in P-spline context penalty depends on
order of differencing
increased order of differencing ? more
constrained/ smoother spline
in practice, selection of smoothing parameter by
REML to optimise fit to data often (but not
always) leads to similar fitted splines
difficulties occur in comparing models with
different penalties, as this also implies
different fixed effects
first step understand implications of different
penalties
Example
experiment to look at the influence of soil
phosphate on sugar content of beet
16 samples taken in 4 years

Soil
phosphate
data
Sugar yield increases with soil phosphate
Differences between years
No trend across years

34
Analysis of soil phosphate data

Model for data separate cubic splines for each
year
fixed Year lin(t) Year.lin(t)
random spl(t) Year.spl(t)
residual independent error
Model allows for separate linear slopes for each
year, and separate smooth trends for each year
Assess whether separate spline terms need for
each year using change in REML log-likelihood for
?
Note new results on these tests from Crainiceanu
Ruppert (2004)
Assess whether separate linear trends required
for each year using Wald tests (better with
Kenward-Roger adjustments)

35
Analysis of soil phosphate data

Can write similar models for data using
cubic P-spline with differencing order 1, 2, 3 or
4
linear penalised spline
Note fixed model changes according to order of
differencing.
Use 14 equally-spaced knots over range of data in
all cases.
Fitted models
all models require separate fixed effects for
each year
P-spline with order 1 and 3 require only common
spline
other splines require separate splines for each
year

Fitted curves (restricted to range of data) for
cubic P-splines with d1, 2, 3, 4
Estimated residual variances
d1 s20.94
d2 s20.44
d3 s20.51
d4 s20.41

Fitted curves (restricted to range of data) for
NCS, cubic P-spline with d2, linear penalised
spline
Estimated residual variances
NCS s20.39
Pspl s20.44
LPS s20.41

38
Does order of penalty make any real difference?

Simulation study
four known functions on 0,1
data generated as function error with normal
errors at 32 equally-spaced points
four different error variances, defined with as
0.5, 0.25, 0.1 or 0.05 ? range of function
Compare splines, all with full knot set
NCS natural cubic spline
cubic P-spline with d1,2,3,4
linear penalised spline
Evaluate models in terms of average mean squared
error of prediction (MSEP) across 500 data sets

Known functions used for simulation study
1 smooth symmetric curve
2 regular cycles
3 changing curvature
4 not diffnble at x1/3

40
Average fitted spline for function fA

-- function
-- d1
-- d2
-- d3
-- d4

41
Average absolute error in fitted spline for
function fA

-- d1
-- d2
-- d3
-- d4

42
Results for function fA (500 simulations)

Average MSEP as of minimum (se)
s d1 d2 d3 d4
0.5 127 (3.6) 100 (2.9) 104 (2.7) 117 (3.1)
0.25 127 (2.9) 100 (2.2) 126 (2.2) 112 (2.5)
0.1 142 (2.6) 100 (2.0) 112 (2.7) 111 (2.0)
0.05 203 (3.1) 111 (2.2) 100 (2.1) 110 (2.2)
100 x Average ratio estimated s actual s ( zero
ss2)
s d1 d2 d3 d4
0.5 91 (1) 97 99 (64) 98 (49)
0.25 85 99 104 (46) 101 (17)
0.1 72 97 104 (1) 106
0.05 53 91 97 104

43
Summary for function fA

d1 fits sharp change at ends well, but does less
well elsewhere
d2 gives best MSEP for large s2
d3 gives best MSEP for small s2
d1 over-fits (allocates noise as signal)
d4 slightly over-smooths (allocates signal as
noise)
For large s2, d3,4 often fit fixed terms only,
ie. all other variation is allocated as noise
As noise decreases, variation around fixed terms
split between smooth trend and noise
As d increases, more likely to allocate high
frequency variation as noise

44
Average fitted spline for function fB

-- function
-- d1
-- d2
-- d3
-- d4

45
Average absolute error in fitted spline for
function fB

-- d1
-- d2
-- d3
-- d4

46
Results for function fB

Average MSEP as of minimum (se)
s d1 d2 d3 d4
0.5 100 (1.7) 119 (1.5) 120 (1.5) 112 (1.9)
0.25 119 (2.9) 110 (4.3) 105 (4.5) 100 (3.0)
0.1 183 (2.7) 116 (2.0) 100 (1.9) 101 (1.9)
0.05 250 (2.7) 134 (2.1) 106 (1.8) 100 (1.8)
Average ratio estimated s actual s ( zero ss2)
s d1 d2 d3 d4
0.5 110 (39) 126 (38) 126 (62) 117 (47)
0.25 85 (1) 97 (1) 101 (4) 102 (1)
0.1 53 83 93 97
0.05 13 73 90 95

47
Summary for function fB

For large s2, pattern lost in noise for all
splines (many zero spline variance components)
d3,4 give best MSEP
d1 over-fits (allocates noise as signal)
d2 slightly over-fits
optimal value of d increases as signalnoise
ratio increases

48
Average fitted spline for function fc

-- function
-- d1
-- d2
-- d3
-- d4

49
Average absolute error in fitted spline for
function fC

-- d1
-- d2
-- d3
-- d4

50
Results for function fC

Average MSEP as of minimum (se)
s d1 d2 d3 d4
0.5 100 (1.3) 111 (1.3) 116 (1.4) 116 (1.6)
0.25 100 (2.7) 132 (2.9) 153 (2.5) 150 (2.1)
0.1 101 (1.5) 100 (1.7) 155 (3.5) 255 (4.0)
0.05 140 (1.6) 100 (1.4) 110 (1.8) 181 (7.7)
Average ratio estimated s actual s ( zero ss2)
s d1 d2 d3 d4
0.5 109 (49) 115 (51) 115 (50) 113 (27)
0.25 104 (10) 133 (14) 147 (15) 146 (3)
0.1 68 109 158 234
0.05 25 83 117 185

51
Fitted splines for function fC for s 0.5, 0.25,
0.1

x data
-- d1
-- d4

52
Second difference of fitted splines for function
fC with s 0.5, 0.25, 0.1

-- d1
-- d4

53
P-spline vs linear penalised spline for function
fC

For s0.05 linear penalised spline over-fits
Fitted spline becomes noticeably spiky
Errors deviations of fitted splines from
function

-- Pspline k3, d2
-- Linear penalised spline
(k1, d2)

54
Summary for function fC

sharp peak shrunk, shrinkage decreases as noise
decreases
small d fits peaks better, but does slightly less
well elsewhere
d1,2 gives best MSEP
d3,4 over-smooths (allocates signal as noise)
some over-fitting for linear penalised spline for
small s

55
Average fitted spline for function fD

-- function
-- d1
-- d2
-- d3
-- d4

56
Average absolute error in fitted spline for
function fD

-- d1
-- d2
-- d3
-- d4

57
Results for function fD

Average MSEP as of minimum (se)
s d1 d2 d3 d4
0.5 101 (3.1) 100 (2.6) 101 (2.3) 111 (3.2)
0.25 103 (2.7) 100 (2.7) 102 (2.4) 103 (2.5)
0.1 130 (2.7) 100 (1.7) 123 (2.1) 139 (2.4)
0.05 146 (2.5) 100 (1.8) 120 (2.2) 136 (2.5)
Average ratio estimated s actual s ( zero ss2)
s d1 d2 d3 d4
0.5 93 (1) 98 (30) 98 (42) 97 (66)
0.25 90 99 (2) 100 (11) 99 (61)
0.1 81 98 106 108 (37)
0.05 71 98 110 116 (2)

58
Summary for function fD

For large s2, all splines similar as
discontinuity masked by noise
As noise decreases, discontinuity becomes
apparent and fitted better by small d
d2 fits uniformly better balances good fit of
smooth trend with adaptation to discontinuity

59
Summary of simulation results

d1 fits detail well, but smooth trend badly
d4 fits detail badly, smooth trend well
As signalnoise ratio increases, smooth trend
becomes more apparent, optimal order of d tends
to increase
Optimal d depends on characteristics of curve
and signalnoise ratio
As d increases, splines are more likely to
allocate high frequency pattern as noise
For small d, there is a tendency to over-fit.
For large d, there is a tendency to over-smooth.
Both ameliorated by replication enabling
independent estimate of error

60
Which penalty to use?

For exploratory analysis
purpose is to explore shape of underlying
function
use spline with small d, eg. cubic spline, cubic
P-spline with d2
For modelling curves
purpose is to get reliable description of
underlying function
more problematic no objective method to choose
between penalties
requires further work
wider class of L-splines that uses different
penalties, eg. useful for periodic data, should
also be considered
we are looking at diagnostics to identify which
spline is most consistent with data

61
Conclusions

There are close connections between common
polynomial mixed model splines with approximate
equivalences between
cubic smoothing spline cubic P-spline with d2
linear penalised spline
linear penalised spline linear P-spline with d2
cubic penalised spline cubic P-spline with d4
Writing a general spline model makes these
connections clear and enables easy transformation
between bases as well as inclusion of constraints

62
Conclusions