A Comparison of Mixed Model Splines - PowerPoint PPT Presentation

1 / 63
About This Presentation
Title:

A Comparison of Mixed Model Splines

Description:

... as knots, we get an exact fit and the fitted curve is the interpolation spline ... Note: fitting both terms as fixed gives interpolation spline. ... – PowerPoint PPT presentation

Number of Views:151
Avg rating:3.0/5.0
Slides: 64
Provided by: wwwmath
Category:

less

Transcript and Presenter's Notes

Title: A Comparison of Mixed Model Splines


1
A Comparison of Mixed Model Splines
  • Sue Welham
  • Rothamsted Research
  • with acknowledgements to
  • Robin Thompson, Brian Cullis,
  • Mike Kenward, GRDC

2
Motivation
  • Smoothing splines provide flexible data-driven
    functions useful for modelling real data
  • Several types of spline have been proposed within
    the mixed model context
  • cubic smoothing splines
  • P-splines
  • penalised splines
  • Which should you use?
  • Does it make any difference?
  • This talk will review these models, show the
    connections between them, and look at the impact
    of different penalties on the fitted spline.

3
Outline
  • Splines and basis functions
  • Smoothing splines and penalties
  • Mixed model splines
  • Cubic smoothing spline P-spline Penalised
    splines
  • Connections between models
  • General model (with constraints)
  • Choice of model
  • example yield-response data
  • simulation study to assess behaviour of models
  • diagnostics to assess model fit
  • Conclusions

4
Simple spline model
  • Suppose we have n data values
  • y ( y1 yn )T
  • and explanatory variable on range a, b
  • x ( x1 xn )T a lt x1 lt x2 lt lt xn lt b
  • We assume the data follows some functional form
  • y(x) g(x) e(x)
  • for some unknown but smooth function g, and
    error process e.
  • Evaluated at the data points, x,
  • y g(x) e
  • with e N( 0, s2R(f) ), in the simplest case
    RIn.

5
Polynomial splines
  • A polynomial spline g(x) of degree k (k3 for
    cubic splines) with r knots t1tr has the
    properties
  • on each interval tj, tj1 , g(x) is a
    polynomial of degree k
  • g(x) and derivatives to order k-1 are continuous
    on a,b
  • Natural splines of degree k 2m -1 obey an
    additional constraint
  • g( j )(t1) g( j )(tr) 0 for j mk
  • ie. natural splines have higher order
    derivatives constrained to zero outside the range
    of the knots.
  • Natural cubic splines are linear outside the
    range of the knots.

6
Spline basis functions
  • A set of cubic spline basis functions for knot
    points t(t1 tr)T is a set of functions si
    i1r4 such that
  • each basis function si is itself a cubic spline
  • the set si spans the set of cubic splines for
    the knot points t
  • Then any cubic spline g(x) can be represented as
  • for suitably chosen coefficients ci.
  • For natural splines, the basis functions should
    also be natural splines, with only r basis
    functions required.

7
Regression splines
  • Regression splines are constructed as follows
  • a set of r knots is chosen, r n
  • a set of rk1 basis functions si is constructed
  • the spline is fitted via regression using the
    model
  • y g(x) e ? ci si(x) e
  • If the n data points are used as knots, we get an
    exact fit and the fitted curve is the
    interpolation spline
  • The smoothness of the curve is determined by the
    number and position of the knots fewer knots ?
    smoother curve
  • Choice of knots subjective

8
Regression spline
  • black line is function
  • blue dots are data (32 points)
  • green line is regression spline with 20 knots
    (equal spacing)
  • red line is regression spline with 9 knots (equal
    spacing)

9
Smoothing splines
  • Smoothing splines are fitted to the model
  • y g(x) e ? ci si(x) e
  • by minimising a penalised sum of squares (PSS)
  • y-g(x) T R-1 y-g(x) ? penalty
  • where
  • the first term (residual sum of squares) measures
    fidelity of the fitted curve to the data
  • the penalty quantifies undesirable behaviour
    (roughness) of the fitted curve
  • the PSS balances the fit of the curve to the data
    against the roughness of the fitted curve
  • ? is a smoothing parameter that controls the
    balance between the two terms, and hence
    smoothness of fitted curve

10
Cubic smoothing spline
  • The cubic smoothing spline is the natural cubic
    spline with knots at the n covariate values that,
    for a given smoothing parameter value, minimises
    PSS
  • y-g(x) T R-1 y-g(x) ? ? g''(x)2
  • the penalty is the integrated squared second
    derivative of the fitted spline
  • can be regarded as measuring accumulated
    deviations from straight line
  • Penalty can be rewritten as
  • ? g''(x)2 ?TGs-1 ?
  • where
  • ? ( ?2 ?n-1 ) holds values of second
    derivatives of fitted spline at internal knots x2
    xn-1
  • Gs-1 is a tri-diagonal matrix with n-2 rows
    defined in terms of distances between successive
    knots

11
Cubic smoothing splines
  • Cubic spline can be written as
  • g(x) t1 t2x ? ?i Pi(x)
  • where Pi(x), i2n-1are basis functions for
    natural cubic splines representing deviations
    about linear trend
  • or in matrix form
  • g(x) Xt P?
  • where X 1 x , t (t1 t2)T and P P2(x)
    Pn-1(x)
  • with penalised sum of squares
  • y- Xt - P? T R-1 y- Xt - P? ? ?TGs-1 ?
  • this looks like the mixed model equations from a
    linear mixed model ? mixed model smoothing
    splines
  • described by Verbyla et al 1999, Brumback Rice
    (1998), Wang (1998), Zhang et al (1998).

12
Basis functions for natural cubic spline
  • Basis functions for natural cubic spline
  • with knots at x 1, 2 10
  • (excluding constant and linear)

13
Mixed model cubic smoothing splines
  • Given the smoothing parameter, the cubic
    smoothing spline is the BLUP
  • from the mixed model
  • y Xt P? e
  • with var(?)ss2Gs , var(e) s2R and ? s2/
    ss2
  • The smoothing parameter can also be estimated by
    REML as a variance parameter.
  • Note fitting both terms as fixed gives
    interpolation spline. Fitting deviations as
    random introduces smoothing via shrinkage of
    random effects. Amount of shrinkage depends on ?.

14
Mixed model cubic smoothing splines
  • REML estimation selects smoothing parameter value
    that gives best fit to implicit variance model
    defined by P and Gs
  • Practical experience (and various reported
    simulation studies) suggests this generally works
    reasonably well
  • Advantage we can build spline terms into complex
    mixed models to account for all other sources of
    variation in the data, including correlated
    errors
  • But
  • are we using this model just as a tool, or do we
    really believe it as a model for the data?
  • if a tool, need to bear this in mind when making
    inferences from fitted model
  • might get better results when model is compatible
    with data

15
Reduced knot mixed model spline
  • Advantage of cubic smoothing spline no need to
    use subjective choice of knots
  • Design matrix for spline basis functions is
    dense, with columns number of distinct
    covariate values - 2
  • For large data sets, models may be slow to fit
  • Use reduced knot version of cubic smoothing
    spline
  • for specified knots t ( t1 tr )T form basis
    functions P2 Pr-1
  • fit model with same fixed effects, but basis
    functions, random effects ? and variance matrix
    Gs defined with respect to knots
  • Number of columns in random design matrix r-2
  • Choice of knots subjective (and affects fitted
    spline)
  • Fitted spline no longer minimises PSS
  • May be better methods to fit low-rank
    approximation

16
P-splines
  • Introduced by Eilers Marx (1996) as simple and
    computationally efficient alternative to
    smoothing splines
  • P-splines
  • use a B-spline basis
  • reduced knot set, knots equally spaced
  • discrete penalty defined using differencing
    operators
  • Model g(x) ? ai Bi(x) or g(x) Ba
  • Penalised sum of squares
  • y-Ba Ty-Ba ?aT?dT ?d a
  • where ?d is dth order differencing matrix
  • User has choice of
  • order of B-splines (k)
  • number and position of knots t1 tr
  • order of differencing in the penalty (d)

17
Cubic B-spline basis
  • Cubic B-spline basis for knots 1, 2 10
  • Note compact support for individual basis
    functions

18
P-splines as mixed models
  • For k3, d2, P-spline approximates cubic
    smoothing spline
  • In the general case, let
  • Ir ?dT ( ?d?dT )-1?d LLT
  • for full column-rank matrix L
  • Mixed model form of P-spline uses
  • y BLtb B ?dT ( ?d?dT )-1ub e
  • with tb LTa and ub ?da with var(ub)ss2I
  • BL is a polynomial of degree d
  • For d2, fixed part of model represents linear
    trend, random part represents deviations about
    linear trend.
  • Sparse representation lost within mixed model
    context

19
Cubic P-spline mixed models
  • Comparison with cubic smoothing spline
  • P-splines
  • use equally spaced knots
  • has choice of differencing order in penalty
  • differencing order determines fixed terms in
    model
  • non-natural basis (this can be modified)
  • Cubic smoothing spline
  • optimisation property for full-knot basis
  • any knots can be used for reduced-knot set
  • fixed terms determined by order of spline basis

20
Penalised splines
  • described in Semi-parametric regression by
    Ruppert, Wand Carroll (2003)
  • Penalised splines use
  • truncated power function basis
  • reduced knot set
  • ad-hoc penalty to induce smoothing
  • Model
  • where
  • XT 1 x xk tT (tT0 tTk)T,
  • T T1(x) Tr(x) ß (ß1 ßr)T

21
Truncated power function (TPF) basis
  • Cubic TPF basis for knots 1, 2 10

22
Penalised splines
  • Penalised sum of squares
  • y XTtT - Tß Ty XTtT - Tß ?ßTß
  • ad-hoc penalty used to introduce
    shrinkage/smoothing
  • User choice
  • order of basis
  • knot positions
  • Mixed model formulation
  • y XTtT Tß e
  • with var(ß) ss2I, var(e) s2R
  • TPF basis has the advantage that it is simple,
    can add/remove knots without disturbing whole
    basis

23
Comparison of mixed model splines
  • Polynomial smoothing P-spline Penalised spline
  • spline
  • --------------------------------------------------
    ---------------------------------------------
  • Choice of k Choice of k Choice of k
  • (order of polynomial)
  • - Choice of d -
  • (differencing)
  • k ? fixed terms d ? fixed terms k ? fixed terms
  • k ? penalty d ? penalty Identity penalty
    matrix
  • Choice of knots Choice of knots Choice of knots
  • (equal spaced)
  • --------------------------------------------------
    ---------------------------------------------

24
Connections between spline bases
  • Blurring over details
  • k-degree B-spline basis can be constructed as
    (k1)th differenced TPF basis
  • cubic B-spline basis is 4th difference of cubic
    TPF basis
  • natural cubic spline (NCS) basis functions can be
    constructed as 2nd differences of cubic TPF basis
  • Hence for equally-spaced knots, spacing h, we
    have equivalent representations of cubic spline
    from different bases with
  • ß ?2? / 6h ? ?2a / h2 ß ?4a / 6h3

25
Connections between spline bases
  • We can compare penalties
  • For cubic smoothing spline and cubic P-spline
    with 2nd order differencing (recommended default)
  • ?TGs-1? ? ??j2 0.5 ??j ?j1
  • aT?2T?2 a ? ?T?
  • so second difference penalty can be regarded as
    approximation to cubic smoothing spline penalty
  • For cubic penalised spline
  • ßTß ? aT?4T?4a ? ?T?2T?2?
  • so cubic penalised spline is equivalent to cubic
    P-spline with d4
  • Linear penalised spline (recommended default) is
    equivalent to linear P-spline with d2.

26
General mixed model spline
  • For spline basis Bi(x) i1q
  • Model is
  • y ? aiBi(x) e Ba e
  • with penalised sum of squares
  • y-Ba T y-Ba ?aTG-1a
  • where G is a semi-positive definite symmetric
    penalty matrix (eg. ?dT?d or evaluated integral)
  • Extend model to impose c constraints CTa0
  • (eg. natural, periodic) via aS? where CTS0
  • so PSS becomes
  • y-BS? T y-BS? ? ?TSTG-1S?

27
General mixed model spline
  • Fit model via eigen-value decomposition
  • STG-1S UDUT
  • for some q-c ? q-c-s matrix U with UTUI and
    Ddiagd1dq-c-s
  • Then
  • BS? BSLt BSUD0.5u
  • uTu ?TSTG-1S?
  • where
  • LLTI-UUT
  • tLT? is an s?1 vector fit as fixed effects
  • uD0.5UT? is a q-c-s vector, fit as random
    effects with var(u)ss2I

28
General mixed model spline
  • General model allows easy transformation between
    bases
  • if Tß Ba with TBA for some full rank matrix A
  • then a Aß
  • So PSS becomes
  • y-BAß T y-BAß ? ßTATG-1Aß
  • Immediately gives relationship between penalties
  • Calculation proceeds as before

29
Example cubic smoothing spline with additional
periodic constraints
  • Suppose data periodic on range 0,?
  • Use full TPF basis with penalty ?g''2 ? ßTG-1ß
  • Impose constraints g(j)(0)g(j)(?) for j0,1,2
  • Derive basis
  • fixed term constant
  • subset of random basis fns shown

30
User choices that affect fitted spline
  • Order of spline basis
  • for given penalty and reasonable number of knots,
    order of basis makes little difference
  • basis should be appropriate to aims of analysis
  • eg. to estimate derivatives of curve, require
    differentiable function
  • linear splines are continuous but not
    differentiable

31
User choices that affect fitted spline
  • Number of knots
  • choice of both number and position of knots
    subjective
  • reducing knots reduces flexibility of fitted
    spline
  • usually aim to have sufficient knots to allow
    flexible spline without incurring large
    computational load
  • large regression spline literature on knot
    selection
  • Ruppert et al (2003) advocate
  • with knots at quantiles of covariate
  • can use REML logL to compare knot sets

32
User choices that affect fitted spline
  • Penalty matrix
  • consider in P-spline context penalty depends on
    order of differencing
  • increased order of differencing ? more
    constrained/ smoother spline
  • in practice, selection of smoothing parameter by
    REML to optimise fit to data often (but not
    always) leads to similar fitted splines
  • difficulties occur in comparing models with
    different penalties, as this also implies
    different fixed effects
  • first step understand implications of different
    penalties
  • Example
  • experiment to look at the influence of soil
    phosphate on sugar content of beet
  • 16 samples taken in 4 years

33
  • Soil
  • phosphate
  • data
  • Sugar yield increases with soil phosphate
  • Differences between years
  • No trend across years

34
Analysis of soil phosphate data
  • Model for data separate cubic splines for each
    year
  • fixed Year lin(t) Year.lin(t)
  • random spl(t) Year.spl(t)
  • residual independent error
  • Model allows for separate linear slopes for each
    year, and separate smooth trends for each year
  • Assess whether separate spline terms need for
    each year using change in REML log-likelihood for
    ?
  • Note new results on these tests from Crainiceanu
    Ruppert (2004)
  • Assess whether separate linear trends required
    for each year using Wald tests (better with
    Kenward-Roger adjustments)

35
Analysis of soil phosphate data
  • Can write similar models for data using
  • cubic P-spline with differencing order 1, 2, 3 or
    4
  • linear penalised spline
  • Note fixed model changes according to order of
    differencing.
  • Use 14 equally-spaced knots over range of data in
    all cases.
  • Fitted models
  • all models require separate fixed effects for
    each year
  • P-spline with order 1 and 3 require only common
    spline
  • other splines require separate splines for each
    year

36
  • Fitted curves (restricted to range of data) for
  • cubic P-splines with d1, 2, 3, 4
  • Estimated residual variances
  • d1 s20.94
  • d2 s20.44
  • d3 s20.51
  • d4 s20.41

37
  • Fitted curves (restricted to range of data) for
    NCS, cubic P-spline with d2, linear penalised
    spline
  • Estimated residual variances
  • NCS s20.39
  • Pspl s20.44
  • LPS s20.41

38
Does order of penalty make any real difference?
  • Simulation study
  • four known functions on 0,1
  • data generated as function error with normal
    errors at 32 equally-spaced points
  • four different error variances, defined with as
    0.5, 0.25, 0.1 or 0.05 ? range of function
  • Compare splines, all with full knot set
  • NCS natural cubic spline
  • cubic P-spline with d1,2,3,4
  • linear penalised spline
  • Evaluate models in terms of average mean squared
    error of prediction (MSEP) across 500 data sets

39
  • Known functions used for simulation study
  • 1 smooth symmetric curve
  • 2 regular cycles
  • 3 changing curvature
  • 4 not diffnble at x1/3

40
Average fitted spline for function fA
  • -- function
  • -- d1
  • -- d2
  • -- d3
  • -- d4

41
Average absolute error in fitted spline for
function fA
  • -- d1
  • -- d2
  • -- d3
  • -- d4

42
Results for function fA (500 simulations)
  • Average MSEP as of minimum (se)
  • s d1 d2 d3 d4
  • 0.5 127 (3.6) 100 (2.9) 104 (2.7) 117 (3.1)
  • 0.25 127 (2.9) 100 (2.2) 126 (2.2) 112 (2.5)
  • 0.1 142 (2.6) 100 (2.0) 112 (2.7) 111 (2.0)
  • 0.05 203 (3.1) 111 (2.2) 100 (2.1) 110 (2.2)
  • 100 x Average ratio estimated s actual s ( zero
    ss2)
  • s d1 d2 d3 d4
  • 0.5 91 (1) 97 99 (64) 98 (49)
  • 0.25 85 99 104 (46) 101 (17)
  • 0.1 72 97 104 (1) 106
  • 0.05 53 91 97 104

43
Summary for function fA
  • d1 fits sharp change at ends well, but does less
    well elsewhere
  • d2 gives best MSEP for large s2
  • d3 gives best MSEP for small s2
  • d1 over-fits (allocates noise as signal)
  • d4 slightly over-smooths (allocates signal as
    noise)
  • For large s2, d3,4 often fit fixed terms only,
    ie. all other variation is allocated as noise
  • As noise decreases, variation around fixed terms
    split between smooth trend and noise
  • As d increases, more likely to allocate high
    frequency variation as noise

44
Average fitted spline for function fB
  • -- function
  • -- d1
  • -- d2
  • -- d3
  • -- d4

45
Average absolute error in fitted spline for
function fB
  • -- d1
  • -- d2
  • -- d3
  • -- d4

46
Results for function fB
  • Average MSEP as of minimum (se)
  • s d1 d2 d3 d4
  • 0.5 100 (1.7) 119 (1.5) 120 (1.5) 112 (1.9)
  • 0.25 119 (2.9) 110 (4.3) 105 (4.5) 100 (3.0)
  • 0.1 183 (2.7) 116 (2.0) 100 (1.9) 101 (1.9)
  • 0.05 250 (2.7) 134 (2.1) 106 (1.8) 100 (1.8)
  • Average ratio estimated s actual s ( zero ss2)
  • s d1 d2 d3 d4
  • 0.5 110 (39) 126 (38) 126 (62) 117 (47)
  • 0.25 85 (1) 97 (1) 101 (4) 102 (1)
  • 0.1 53 83 93 97
  • 0.05 13 73 90 95

47
Summary for function fB
  • For large s2, pattern lost in noise for all
    splines (many zero spline variance components)
  • d3,4 give best MSEP
  • d1 over-fits (allocates noise as signal)
  • d2 slightly over-fits
  • optimal value of d increases as signalnoise
    ratio increases

48
Average fitted spline for function fc
  • -- function
  • -- d1
  • -- d2
  • -- d3
  • -- d4

49
Average absolute error in fitted spline for
function fC
  • -- d1
  • -- d2
  • -- d3
  • -- d4

50
Results for function fC
  • Average MSEP as of minimum (se)
  • s d1 d2 d3 d4
  • 0.5 100 (1.3) 111 (1.3) 116 (1.4) 116 (1.6)
  • 0.25 100 (2.7) 132 (2.9) 153 (2.5) 150 (2.1)
  • 0.1 101 (1.5) 100 (1.7) 155 (3.5) 255 (4.0)
  • 0.05 140 (1.6) 100 (1.4) 110 (1.8) 181 (7.7)
  • Average ratio estimated s actual s ( zero ss2)
  • s d1 d2 d3 d4
  • 0.5 109 (49) 115 (51) 115 (50) 113 (27)
  • 0.25 104 (10) 133 (14) 147 (15) 146 (3)
  • 0.1 68 109 158 234
  • 0.05 25 83 117 185

51
Fitted splines for function fC for s 0.5, 0.25,
0.1
  • x data
  • -- d1
  • -- d4

52
Second difference of fitted splines for function
fC with s 0.5, 0.25, 0.1
  • -- d1
  • -- d4

53
P-spline vs linear penalised spline for function
fC
  • For s0.05 linear penalised spline over-fits
  • Fitted spline becomes noticeably spiky
  • Errors deviations of fitted splines from
    function
  • -- Pspline k3, d2
  • -- Linear penalised spline
  • (k1, d2)

54
Summary for function fC
  • sharp peak shrunk, shrinkage decreases as noise
    decreases
  • small d fits peaks better, but does slightly less
    well elsewhere
  • d1,2 gives best MSEP
  • d3,4 over-smooths (allocates signal as noise)
  • some over-fitting for linear penalised spline for
    small s

55
Average fitted spline for function fD
  • -- function
  • -- d1
  • -- d2
  • -- d3
  • -- d4

56
Average absolute error in fitted spline for
function fD
  • -- d1
  • -- d2
  • -- d3
  • -- d4

57
Results for function fD
  • Average MSEP as of minimum (se)
  • s d1 d2 d3 d4
  • 0.5 101 (3.1) 100 (2.6) 101 (2.3) 111 (3.2)
  • 0.25 103 (2.7) 100 (2.7) 102 (2.4) 103 (2.5)
  • 0.1 130 (2.7) 100 (1.7) 123 (2.1) 139 (2.4)
  • 0.05 146 (2.5) 100 (1.8) 120 (2.2) 136 (2.5)
  • Average ratio estimated s actual s ( zero ss2)
  • s d1 d2 d3 d4
  • 0.5 93 (1) 98 (30) 98 (42) 97 (66)
  • 0.25 90 99 (2) 100 (11) 99 (61)
  • 0.1 81 98 106 108 (37)
  • 0.05 71 98 110 116 (2)

58
Summary for function fD
  • For large s2, all splines similar as
    discontinuity masked by noise
  • As noise decreases, discontinuity becomes
    apparent and fitted better by small d
  • d2 fits uniformly better balances good fit of
    smooth trend with adaptation to discontinuity

59
Summary of simulation results
  • d1 fits detail well, but smooth trend badly
  • d4 fits detail badly, smooth trend well
  • As signalnoise ratio increases, smooth trend
    becomes more apparent, optimal order of d tends
    to increase
  • Optimal d depends on characteristics of curve
    and signalnoise ratio
  • As d increases, splines are more likely to
    allocate high frequency pattern as noise
  • For small d, there is a tendency to over-fit.
  • For large d, there is a tendency to over-smooth.
  • Both ameliorated by replication enabling
    independent estimate of error

60
Which penalty to use?
  • For exploratory analysis
  • purpose is to explore shape of underlying
    function
  • use spline with small d, eg. cubic spline, cubic
    P-spline with d2
  • For modelling curves
  • purpose is to get reliable description of
    underlying function
  • more problematic no objective method to choose
    between penalties
  • requires further work
  • wider class of L-splines that uses different
    penalties, eg. useful for periodic data, should
    also be considered
  • we are looking at diagnostics to identify which
    spline is most consistent with data

61
Conclusions
  • There are close connections between common
    polynomial mixed model splines with approximate
    equivalences between
  • cubic smoothing spline cubic P-spline with d2
    linear penalised spline
  • linear penalised spline linear P-spline with d2
  • cubic penalised spline cubic P-spline with d4
  • Writing a general spline model makes these
    connections clear and enables easy transformation
    between bases as well as inclusion of constraints

62
Conclusions
  • Choice of basis depends on context
  • Choice of knots subjective fewer knots gives
    smoother fitted spline
  • can use REML logL to compare different knot
    allocations
  • usually use as many knots as feasible
  • Choice of penalty affects smoothness of fitted
    spline
  • no good method for choosing optimal penalty
  • further work required

63
Acknowledgements
  • Funding by the Australian GRDC for this work is
    gratefully acknowledged
Write a Comment
User Comments (0)
About PowerShow.com