Title: A Comparison of Mixed Model Splines
1A Comparison of Mixed Model Splines
- Sue Welham
- Rothamsted Research
- with acknowledgements to
- Robin Thompson, Brian Cullis,
- Mike Kenward, GRDC
2Motivation
- Smoothing splines provide flexible data-driven
functions useful for modelling real data - Several types of spline have been proposed within
the mixed model context - cubic smoothing splines
- P-splines
- penalised splines
- Which should you use?
- Does it make any difference?
- This talk will review these models, show the
connections between them, and look at the impact
of different penalties on the fitted spline.
3Outline
- Splines and basis functions
- Smoothing splines and penalties
- Mixed model splines
- Cubic smoothing spline P-spline Penalised
splines - Connections between models
- General model (with constraints)
- Choice of model
- example yield-response data
- simulation study to assess behaviour of models
- diagnostics to assess model fit
- Conclusions
4Simple spline model
- Suppose we have n data values
- y ( y1 yn )T
- and explanatory variable on range a, b
- x ( x1 xn )T a lt x1 lt x2 lt lt xn lt b
- We assume the data follows some functional form
- y(x) g(x) e(x)
- for some unknown but smooth function g, and
error process e. - Evaluated at the data points, x,
- y g(x) e
- with e N( 0, s2R(f) ), in the simplest case
RIn.
5Polynomial splines
- A polynomial spline g(x) of degree k (k3 for
cubic splines) with r knots t1tr has the
properties - on each interval tj, tj1 , g(x) is a
polynomial of degree k - g(x) and derivatives to order k-1 are continuous
on a,b - Natural splines of degree k 2m -1 obey an
additional constraint - g( j )(t1) g( j )(tr) 0 for j mk
- ie. natural splines have higher order
derivatives constrained to zero outside the range
of the knots. - Natural cubic splines are linear outside the
range of the knots.
6Spline basis functions
- A set of cubic spline basis functions for knot
points t(t1 tr)T is a set of functions si
i1r4 such that - each basis function si is itself a cubic spline
- the set si spans the set of cubic splines for
the knot points t - Then any cubic spline g(x) can be represented as
- for suitably chosen coefficients ci.
- For natural splines, the basis functions should
also be natural splines, with only r basis
functions required.
7Regression splines
- Regression splines are constructed as follows
- a set of r knots is chosen, r n
- a set of rk1 basis functions si is constructed
- the spline is fitted via regression using the
model - y g(x) e ? ci si(x) e
- If the n data points are used as knots, we get an
exact fit and the fitted curve is the
interpolation spline - The smoothness of the curve is determined by the
number and position of the knots fewer knots ?
smoother curve - Choice of knots subjective
8Regression spline
- black line is function
- blue dots are data (32 points)
- green line is regression spline with 20 knots
(equal spacing) - red line is regression spline with 9 knots (equal
spacing)
9Smoothing splines
- Smoothing splines are fitted to the model
- y g(x) e ? ci si(x) e
- by minimising a penalised sum of squares (PSS)
- y-g(x) T R-1 y-g(x) ? penalty
- where
- the first term (residual sum of squares) measures
fidelity of the fitted curve to the data - the penalty quantifies undesirable behaviour
(roughness) of the fitted curve - the PSS balances the fit of the curve to the data
against the roughness of the fitted curve - ? is a smoothing parameter that controls the
balance between the two terms, and hence
smoothness of fitted curve
10Cubic smoothing spline
- The cubic smoothing spline is the natural cubic
spline with knots at the n covariate values that,
for a given smoothing parameter value, minimises
PSS - y-g(x) T R-1 y-g(x) ? ? g''(x)2
- the penalty is the integrated squared second
derivative of the fitted spline - can be regarded as measuring accumulated
deviations from straight line - Penalty can be rewritten as
- ? g''(x)2 ?TGs-1 ?
- where
- ? ( ?2 ?n-1 ) holds values of second
derivatives of fitted spline at internal knots x2
xn-1 - Gs-1 is a tri-diagonal matrix with n-2 rows
defined in terms of distances between successive
knots
11Cubic smoothing splines
- Cubic spline can be written as
- g(x) t1 t2x ? ?i Pi(x)
- where Pi(x), i2n-1are basis functions for
natural cubic splines representing deviations
about linear trend - or in matrix form
- g(x) Xt P?
- where X 1 x , t (t1 t2)T and P P2(x)
Pn-1(x) - with penalised sum of squares
- y- Xt - P? T R-1 y- Xt - P? ? ?TGs-1 ?
- this looks like the mixed model equations from a
linear mixed model ? mixed model smoothing
splines - described by Verbyla et al 1999, Brumback Rice
(1998), Wang (1998), Zhang et al (1998).
12Basis functions for natural cubic spline
- Basis functions for natural cubic spline
- with knots at x 1, 2 10
- (excluding constant and linear)
13Mixed model cubic smoothing splines
- Given the smoothing parameter, the cubic
smoothing spline is the BLUP - from the mixed model
- y Xt P? e
- with var(?)ss2Gs , var(e) s2R and ? s2/
ss2 - The smoothing parameter can also be estimated by
REML as a variance parameter. - Note fitting both terms as fixed gives
interpolation spline. Fitting deviations as
random introduces smoothing via shrinkage of
random effects. Amount of shrinkage depends on ?.
14Mixed model cubic smoothing splines
- REML estimation selects smoothing parameter value
that gives best fit to implicit variance model
defined by P and Gs - Practical experience (and various reported
simulation studies) suggests this generally works
reasonably well - Advantage we can build spline terms into complex
mixed models to account for all other sources of
variation in the data, including correlated
errors - But
- are we using this model just as a tool, or do we
really believe it as a model for the data? - if a tool, need to bear this in mind when making
inferences from fitted model - might get better results when model is compatible
with data
15Reduced knot mixed model spline
- Advantage of cubic smoothing spline no need to
use subjective choice of knots - Design matrix for spline basis functions is
dense, with columns number of distinct
covariate values - 2 - For large data sets, models may be slow to fit
- Use reduced knot version of cubic smoothing
spline - for specified knots t ( t1 tr )T form basis
functions P2 Pr-1 - fit model with same fixed effects, but basis
functions, random effects ? and variance matrix
Gs defined with respect to knots - Number of columns in random design matrix r-2
- Choice of knots subjective (and affects fitted
spline) - Fitted spline no longer minimises PSS
- May be better methods to fit low-rank
approximation
16P-splines
- Introduced by Eilers Marx (1996) as simple and
computationally efficient alternative to
smoothing splines - P-splines
- use a B-spline basis
- reduced knot set, knots equally spaced
- discrete penalty defined using differencing
operators - Model g(x) ? ai Bi(x) or g(x) Ba
- Penalised sum of squares
- y-Ba Ty-Ba ?aT?dT ?d a
- where ?d is dth order differencing matrix
- User has choice of
- order of B-splines (k)
- number and position of knots t1 tr
- order of differencing in the penalty (d)
17Cubic B-spline basis
- Cubic B-spline basis for knots 1, 2 10
- Note compact support for individual basis
functions
18P-splines as mixed models
- For k3, d2, P-spline approximates cubic
smoothing spline - In the general case, let
- Ir ?dT ( ?d?dT )-1?d LLT
- for full column-rank matrix L
- Mixed model form of P-spline uses
- y BLtb B ?dT ( ?d?dT )-1ub e
- with tb LTa and ub ?da with var(ub)ss2I
- BL is a polynomial of degree d
- For d2, fixed part of model represents linear
trend, random part represents deviations about
linear trend. - Sparse representation lost within mixed model
context
19Cubic P-spline mixed models
- Comparison with cubic smoothing spline
- P-splines
- use equally spaced knots
- has choice of differencing order in penalty
- differencing order determines fixed terms in
model - non-natural basis (this can be modified)
- Cubic smoothing spline
- optimisation property for full-knot basis
- any knots can be used for reduced-knot set
- fixed terms determined by order of spline basis
20Penalised splines
- described in Semi-parametric regression by
Ruppert, Wand Carroll (2003) - Penalised splines use
- truncated power function basis
- reduced knot set
- ad-hoc penalty to induce smoothing
- Model
- where
- XT 1 x xk tT (tT0 tTk)T,
- T T1(x) Tr(x) ß (ß1 ßr)T
21Truncated power function (TPF) basis
- Cubic TPF basis for knots 1, 2 10
22Penalised splines
- Penalised sum of squares
- y XTtT - Tß Ty XTtT - Tß ?ßTß
- ad-hoc penalty used to introduce
shrinkage/smoothing - User choice
- order of basis
- knot positions
- Mixed model formulation
- y XTtT Tß e
- with var(ß) ss2I, var(e) s2R
- TPF basis has the advantage that it is simple,
can add/remove knots without disturbing whole
basis
23Comparison of mixed model splines
- Polynomial smoothing P-spline Penalised spline
- spline
- --------------------------------------------------
--------------------------------------------- - Choice of k Choice of k Choice of k
- (order of polynomial)
- - Choice of d -
- (differencing)
- k ? fixed terms d ? fixed terms k ? fixed terms
- k ? penalty d ? penalty Identity penalty
matrix - Choice of knots Choice of knots Choice of knots
- (equal spaced)
- --------------------------------------------------
---------------------------------------------
24Connections between spline bases
- Blurring over details
- k-degree B-spline basis can be constructed as
(k1)th differenced TPF basis - cubic B-spline basis is 4th difference of cubic
TPF basis - natural cubic spline (NCS) basis functions can be
constructed as 2nd differences of cubic TPF basis - Hence for equally-spaced knots, spacing h, we
have equivalent representations of cubic spline
from different bases with - ß ?2? / 6h ? ?2a / h2 ß ?4a / 6h3
25Connections between spline bases
- We can compare penalties
- For cubic smoothing spline and cubic P-spline
with 2nd order differencing (recommended default) - ?TGs-1? ? ??j2 0.5 ??j ?j1
- aT?2T?2 a ? ?T?
- so second difference penalty can be regarded as
approximation to cubic smoothing spline penalty - For cubic penalised spline
- ßTß ? aT?4T?4a ? ?T?2T?2?
- so cubic penalised spline is equivalent to cubic
P-spline with d4 - Linear penalised spline (recommended default) is
equivalent to linear P-spline with d2.
26General mixed model spline
- For spline basis Bi(x) i1q
- Model is
- y ? aiBi(x) e Ba e
- with penalised sum of squares
- y-Ba T y-Ba ?aTG-1a
- where G is a semi-positive definite symmetric
penalty matrix (eg. ?dT?d or evaluated integral) - Extend model to impose c constraints CTa0
- (eg. natural, periodic) via aS? where CTS0
- so PSS becomes
- y-BS? T y-BS? ? ?TSTG-1S?
27General mixed model spline
- Fit model via eigen-value decomposition
- STG-1S UDUT
- for some q-c ? q-c-s matrix U with UTUI and
Ddiagd1dq-c-s - Then
- BS? BSLt BSUD0.5u
- uTu ?TSTG-1S?
- where
- LLTI-UUT
- tLT? is an s?1 vector fit as fixed effects
- uD0.5UT? is a q-c-s vector, fit as random
effects with var(u)ss2I
28General mixed model spline
- General model allows easy transformation between
bases - if Tß Ba with TBA for some full rank matrix A
- then a Aß
- So PSS becomes
- y-BAß T y-BAß ? ßTATG-1Aß
- Immediately gives relationship between penalties
- Calculation proceeds as before
29Example cubic smoothing spline with additional
periodic constraints
- Suppose data periodic on range 0,?
- Use full TPF basis with penalty ?g''2 ? ßTG-1ß
- Impose constraints g(j)(0)g(j)(?) for j0,1,2
- Derive basis
- fixed term constant
-
- subset of random basis fns shown
30User choices that affect fitted spline
- Order of spline basis
- for given penalty and reasonable number of knots,
order of basis makes little difference - basis should be appropriate to aims of analysis
- eg. to estimate derivatives of curve, require
differentiable function - linear splines are continuous but not
differentiable
31User choices that affect fitted spline
- Number of knots
- choice of both number and position of knots
subjective - reducing knots reduces flexibility of fitted
spline - usually aim to have sufficient knots to allow
flexible spline without incurring large
computational load - large regression spline literature on knot
selection - Ruppert et al (2003) advocate
-
-
- with knots at quantiles of covariate
- can use REML logL to compare knot sets
32User choices that affect fitted spline
- Penalty matrix
- consider in P-spline context penalty depends on
order of differencing - increased order of differencing ? more
constrained/ smoother spline - in practice, selection of smoothing parameter by
REML to optimise fit to data often (but not
always) leads to similar fitted splines - difficulties occur in comparing models with
different penalties, as this also implies
different fixed effects - first step understand implications of different
penalties - Example
- experiment to look at the influence of soil
phosphate on sugar content of beet - 16 samples taken in 4 years
33- Soil
- phosphate
- data
- Sugar yield increases with soil phosphate
- Differences between years
- No trend across years
34Analysis of soil phosphate data
- Model for data separate cubic splines for each
year - fixed Year lin(t) Year.lin(t)
- random spl(t) Year.spl(t)
- residual independent error
- Model allows for separate linear slopes for each
year, and separate smooth trends for each year - Assess whether separate spline terms need for
each year using change in REML log-likelihood for
? - Note new results on these tests from Crainiceanu
Ruppert (2004) - Assess whether separate linear trends required
for each year using Wald tests (better with
Kenward-Roger adjustments)
35Analysis of soil phosphate data
- Can write similar models for data using
- cubic P-spline with differencing order 1, 2, 3 or
4 - linear penalised spline
- Note fixed model changes according to order of
differencing. - Use 14 equally-spaced knots over range of data in
all cases. - Fitted models
- all models require separate fixed effects for
each year - P-spline with order 1 and 3 require only common
spline - other splines require separate splines for each
year
36- Fitted curves (restricted to range of data) for
- cubic P-splines with d1, 2, 3, 4
- Estimated residual variances
- d1 s20.94
- d2 s20.44
- d3 s20.51
- d4 s20.41
37- Fitted curves (restricted to range of data) for
NCS, cubic P-spline with d2, linear penalised
spline - Estimated residual variances
- NCS s20.39
- Pspl s20.44
- LPS s20.41
38Does order of penalty make any real difference?
- Simulation study
- four known functions on 0,1
- data generated as function error with normal
errors at 32 equally-spaced points - four different error variances, defined with as
0.5, 0.25, 0.1 or 0.05 ? range of function - Compare splines, all with full knot set
- NCS natural cubic spline
- cubic P-spline with d1,2,3,4
- linear penalised spline
- Evaluate models in terms of average mean squared
error of prediction (MSEP) across 500 data sets
39- Known functions used for simulation study
- 1 smooth symmetric curve
- 2 regular cycles
- 3 changing curvature
- 4 not diffnble at x1/3
40Average fitted spline for function fA
- -- function
- -- d1
- -- d2
- -- d3
- -- d4
41Average absolute error in fitted spline for
function fA
42Results for function fA (500 simulations)
- Average MSEP as of minimum (se)
- s d1 d2 d3 d4
- 0.5 127 (3.6) 100 (2.9) 104 (2.7) 117 (3.1)
- 0.25 127 (2.9) 100 (2.2) 126 (2.2) 112 (2.5)
- 0.1 142 (2.6) 100 (2.0) 112 (2.7) 111 (2.0)
- 0.05 203 (3.1) 111 (2.2) 100 (2.1) 110 (2.2)
- 100 x Average ratio estimated s actual s ( zero
ss2) - s d1 d2 d3 d4
- 0.5 91 (1) 97 99 (64) 98 (49)
- 0.25 85 99 104 (46) 101 (17)
- 0.1 72 97 104 (1) 106
- 0.05 53 91 97 104
43Summary for function fA
- d1 fits sharp change at ends well, but does less
well elsewhere - d2 gives best MSEP for large s2
- d3 gives best MSEP for small s2
- d1 over-fits (allocates noise as signal)
- d4 slightly over-smooths (allocates signal as
noise) - For large s2, d3,4 often fit fixed terms only,
ie. all other variation is allocated as noise - As noise decreases, variation around fixed terms
split between smooth trend and noise - As d increases, more likely to allocate high
frequency variation as noise
44Average fitted spline for function fB
- -- function
- -- d1
- -- d2
- -- d3
- -- d4
45Average absolute error in fitted spline for
function fB
46Results for function fB
- Average MSEP as of minimum (se)
- s d1 d2 d3 d4
- 0.5 100 (1.7) 119 (1.5) 120 (1.5) 112 (1.9)
- 0.25 119 (2.9) 110 (4.3) 105 (4.5) 100 (3.0)
- 0.1 183 (2.7) 116 (2.0) 100 (1.9) 101 (1.9)
- 0.05 250 (2.7) 134 (2.1) 106 (1.8) 100 (1.8)
- Average ratio estimated s actual s ( zero ss2)
- s d1 d2 d3 d4
- 0.5 110 (39) 126 (38) 126 (62) 117 (47)
- 0.25 85 (1) 97 (1) 101 (4) 102 (1)
- 0.1 53 83 93 97
- 0.05 13 73 90 95
47Summary for function fB
- For large s2, pattern lost in noise for all
splines (many zero spline variance components) - d3,4 give best MSEP
- d1 over-fits (allocates noise as signal)
- d2 slightly over-fits
- optimal value of d increases as signalnoise
ratio increases
48Average fitted spline for function fc
- -- function
- -- d1
- -- d2
- -- d3
- -- d4
49Average absolute error in fitted spline for
function fC
50Results for function fC
- Average MSEP as of minimum (se)
- s d1 d2 d3 d4
- 0.5 100 (1.3) 111 (1.3) 116 (1.4) 116 (1.6)
- 0.25 100 (2.7) 132 (2.9) 153 (2.5) 150 (2.1)
- 0.1 101 (1.5) 100 (1.7) 155 (3.5) 255 (4.0)
- 0.05 140 (1.6) 100 (1.4) 110 (1.8) 181 (7.7)
- Average ratio estimated s actual s ( zero ss2)
- s d1 d2 d3 d4
- 0.5 109 (49) 115 (51) 115 (50) 113 (27)
- 0.25 104 (10) 133 (14) 147 (15) 146 (3)
- 0.1 68 109 158 234
- 0.05 25 83 117 185
51Fitted splines for function fC for s 0.5, 0.25,
0.1
52Second difference of fitted splines for function
fC with s 0.5, 0.25, 0.1
53P-spline vs linear penalised spline for function
fC
- For s0.05 linear penalised spline over-fits
- Fitted spline becomes noticeably spiky
- Errors deviations of fitted splines from
function
- -- Pspline k3, d2
- -- Linear penalised spline
- (k1, d2)
54Summary for function fC
- sharp peak shrunk, shrinkage decreases as noise
decreases - small d fits peaks better, but does slightly less
well elsewhere - d1,2 gives best MSEP
- d3,4 over-smooths (allocates signal as noise)
- some over-fitting for linear penalised spline for
small s
55Average fitted spline for function fD
- -- function
- -- d1
- -- d2
- -- d3
- -- d4
56Average absolute error in fitted spline for
function fD
57Results for function fD
- Average MSEP as of minimum (se)
- s d1 d2 d3 d4
- 0.5 101 (3.1) 100 (2.6) 101 (2.3) 111 (3.2)
- 0.25 103 (2.7) 100 (2.7) 102 (2.4) 103 (2.5)
- 0.1 130 (2.7) 100 (1.7) 123 (2.1) 139 (2.4)
- 0.05 146 (2.5) 100 (1.8) 120 (2.2) 136 (2.5)
- Average ratio estimated s actual s ( zero ss2)
- s d1 d2 d3 d4
- 0.5 93 (1) 98 (30) 98 (42) 97 (66)
- 0.25 90 99 (2) 100 (11) 99 (61)
- 0.1 81 98 106 108 (37)
- 0.05 71 98 110 116 (2)
58Summary for function fD
- For large s2, all splines similar as
discontinuity masked by noise - As noise decreases, discontinuity becomes
apparent and fitted better by small d - d2 fits uniformly better balances good fit of
smooth trend with adaptation to discontinuity
59Summary of simulation results
- d1 fits detail well, but smooth trend badly
- d4 fits detail badly, smooth trend well
- As signalnoise ratio increases, smooth trend
becomes more apparent, optimal order of d tends
to increase - Optimal d depends on characteristics of curve
and signalnoise ratio - As d increases, splines are more likely to
allocate high frequency pattern as noise - For small d, there is a tendency to over-fit.
- For large d, there is a tendency to over-smooth.
- Both ameliorated by replication enabling
independent estimate of error
60Which penalty to use?
- For exploratory analysis
- purpose is to explore shape of underlying
function - use spline with small d, eg. cubic spline, cubic
P-spline with d2 - For modelling curves
- purpose is to get reliable description of
underlying function - more problematic no objective method to choose
between penalties - requires further work
- wider class of L-splines that uses different
penalties, eg. useful for periodic data, should
also be considered - we are looking at diagnostics to identify which
spline is most consistent with data
61Conclusions
- There are close connections between common
polynomial mixed model splines with approximate
equivalences between - cubic smoothing spline cubic P-spline with d2
linear penalised spline - linear penalised spline linear P-spline with d2
- cubic penalised spline cubic P-spline with d4
- Writing a general spline model makes these
connections clear and enables easy transformation
between bases as well as inclusion of constraints
62Conclusions
- Choice of basis depends on context
- Choice of knots subjective fewer knots gives
smoother fitted spline - can use REML logL to compare different knot
allocations - usually use as many knots as feasible
- Choice of penalty affects smoothness of fitted
spline - no good method for choosing optimal penalty
- further work required
63Acknowledgements
- Funding by the Australian GRDC for this work is
gratefully acknowledged