Title: The Elements of Statistical Learning Chapter 5: Basis Expansions and Regularization
1. The Elements of Statistical Learning, Chapter 5: Basis Expansions and Regularization
Speaker: N. Delannay
2. Introduction
- Regression problem: linear models
- Linear basis expansion models
- h_m(X): M transformations of X (see the model below)
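For reference, the model behind these bullets is the linear basis expansion of ESL eq. (5.1); once the transformations h_m are chosen, the model is linear in the derived variables and is fit like any linear model:

  f(X) = \sum_{m=1}^{M} \beta_m h_m(X), \qquad h_m : \mathbb{R}^p \to \mathbb{R}, \quad m = 1, \ldots, M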
3. Choice of h_m?
- Examples: X_j, X_j², X_j·X_k, log(X_j); Ch. 5: splines, wavelets
- Dictionary of basis functions; control of complexity with:
  - Restriction
  - Selection
  - Regularization
4. (5.2) Piecewise polynomials and splines
- Cubic spline: continuity of the function up to the second derivative at the knots
- The basis functions incorporate the continuity constraints
5. Notation
- Order-M polynomials: degree M-1
- K knots
- Order-M spline -> continuity up to derivatives of order M-2 at the knots
- Truncated-power basis h_j(X) (see p. 120)
- These splines form a vector space
- Other bases are numerically more convenient in practice
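In ESL's notation, the truncated-power basis referred to above, for an order-M spline with knots \xi_1, ..., \xi_K, is

  h_j(X) = X^{j-1}, \quad j = 1, \ldots, M
  h_{M+\ell}(X) = (X - \xi_\ell)_{+}^{M-1}, \quad \ell = 1, \ldots, K

It spans the (M + K)-dimensional vector space of order-M splines; it is simple but numerically ill-conditioned, hence the preference for the B-splines of the next slide.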
6. B-splines
- Recursive definition: B_{i,1}, then B_{i,m} (eq. 5.77 ff.)
- Each basis function is non-zero over at most M intervals
- Knot duplication -> reduced continuity
- Local support -> computation in O(N)
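A minimal NumPy sketch of the recursive definition above (the Cox-de Boor recursion); the knot sequence, spline order and evaluation grid below are illustrative choices, not taken from the slides.

import numpy as np

def bspline_basis(i, m, knots, x):
    # Order-1 B-spline B_{i,1}: indicator of the i-th knot interval.
    if m == 1:
        return np.where((knots[i] <= x) & (x < knots[i + 1]), 1.0, 0.0)
    # Recursion for B_{i,m}; with duplicated knots, 0/0 terms are taken as 0.
    left_den = knots[i + m - 1] - knots[i]
    right_den = knots[i + m] - knots[i + 1]
    left = 0.0 if left_den == 0 else (x - knots[i]) / left_den * bspline_basis(i, m - 1, knots, x)
    right = 0.0 if right_den == 0 else (knots[i + m] - x) / right_den * bspline_basis(i + 1, m - 1, knots, x)
    return left + right

# Cubic (order-4) B-splines on [0, 1]: interior knots plus duplicated boundary knots.
knots = np.r_[[0.0] * 3, np.linspace(0.0, 1.0, 5), [1.0] * 3]
x = np.linspace(0.0, 1.0, 200)
B = np.column_stack([bspline_basis(i, 4, knots, x) for i in range(len(knots) - 4)])

For production use, scipy.interpolate.BSpline implements the same recursion efficiently.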
7. Natural cubic splines
- Polynomial fits -> bad behaviour near the boundaries
- Fit of a model with constant error variance: the pointwise variance of the fit is not constant at all near the boundaries!
- Idea:
  - linear fit on the boundary intervals
  - 4 additional constraints
  - -> 4 additional knots can be afforded for the same degrees of freedom
  - basis functions N_j(X) (see below)
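The basis functions N_j(X) mentioned above are, in ESL's notation (eqs. 5.4-5.5), obtained from the truncated-power basis by imposing linearity beyond the boundary knots:

  N_1(X) = 1, \quad N_2(X) = X, \quad N_{k+2}(X) = d_k(X) - d_{K-1}(X),
  d_k(X) = \frac{(X - \xi_k)_{+}^{3} - (X - \xi_K)_{+}^{3}}{\xi_K - \xi_k}

Each N_j has zero second and third derivative for X >= \xi_K, so the fitted function is linear beyond the boundary knots.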
8. Example 1: South African heart disease
- Model: logistic regression with natural-spline expansions of the inputs (see the sketch below)
- Fitting algorithm: see 4.4.1
- Selection methods: see 7.5
- Variance calculation
- Tests: Table 5.1 (deviance, AIC, LRT, p-value)
- Nonlinear effects are included in the model
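A rough, hypothetical Python sketch of this kind of model: synthetic data stand in for the heart-disease predictors, and scikit-learn's B-spline expansion is used in place of the book's natural cubic splines, so this shows only the structure (spline expansion of each input, then logistic regression), not the actual analysis of Table 5.1.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import LogisticRegression

# Synthetic stand-ins: five continuous predictors and a binary response (chd).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.integers(0, 2, size=200)

# Expand each predictor in a cubic spline basis, then fit a logistic regression
# (large C, i.e. essentially unpenalized, to mimic a plain maximum-likelihood fit).
model = make_pipeline(
    SplineTransformer(degree=3, n_knots=4),
    LogisticRegression(C=1e6, max_iter=1000),
)
model.fit(X, y)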
9. Example 2: Phoneme recognition
- Logistic model with the coefficient function β(f) forced to vary smoothly
- Equivalent to a logistic regression on filtered inputs -> better regularization, lower error
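Concretely, in ESL's phoneme example the smoothness constraint is imposed by expanding the coefficient function in a small natural cubic spline basis,

  \beta = H\theta, \qquad x^{\top}\beta = (H^{\top}x)^{\top}\theta = x^{*\top}\theta,

so the low-dimensional coefficients \theta are fit by ordinary logistic regression on the filtered inputs x^{*} = H^{\top}x.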
10. Smoothing splines
- N points -> N knots; regularization is needed
- The penalized criterion has a unique minimizer: a natural cubic spline with knots at the unique x_i (see below)
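The criterion and its finite-dimensional solution, as in ESL (eqs. 5.9-5.12):

  RSS(f, \lambda) = \sum_{i=1}^{N} \{y_i - f(x_i)\}^2 + \lambda \int \{f''(t)\}^2 \, dt
  f(x) = \sum_{j=1}^{N} N_j(x)\,\theta_j, \qquad \hat{\theta} = (N^{\top}N + \lambda \Omega_N)^{-1} N^{\top} y, \qquad \{\Omega_N\}_{jk} = \int N_j''(t)\, N_k''(t)\, dt

\lambda = 0 interpolates the data and \lambda -> \infty gives the least-squares line; for any 0 < \lambda < \infty the minimizer is the natural cubic spline above.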
11. Degrees of freedom and smoother matrix (1)
- Smoother matrix S_λ (N x N): the fit is linear in y
- Analogy with LS fitting on M basis functions: hat matrix H_ξ
- H_ξ and S_λ are symmetric and positive semidefinite
- H_ξ H_ξ = H_ξ (projection); S_λ S_λ ⪯ S_λ (shrinking)
- rank(H_ξ) = M; rank(S_λ) = N
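The two smoother matrices being compared, in ESL's notation (eqs. 5.14-5.15):

  \hat{f} = N (N^{\top}N + \lambda \Omega_N)^{-1} N^{\top} y = S_{\lambda}\, y
  \qquad \text{vs.} \qquad
  \hat{f} = B_{\xi} (B_{\xi}^{\top} B_{\xi})^{-1} B_{\xi}^{\top} y = H_{\xi}\, y

Both operators depend on the x_i (and on ξ or λ) but not on y, so both fits are linear smoothers.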
12. Degrees of freedom and smoother matrix (2)
- Dimension of the projection space = degrees of freedom: M = trace(H_ξ)
- Analogy: df_λ = trace(S_λ), the effective degrees of freedom
- Monotonic relation between df_λ and λ
- Eigendecomposition of S_λ: the eigenvectors are independent of λ (see below)
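The eigendecomposition referred to above (cf. ESL eq. 5.19):

  S_{\lambda} = \sum_{k=1}^{N} \rho_k(\lambda)\, u_k u_k^{\top}, \qquad \rho_k(\lambda) = \frac{1}{1 + \lambda d_k}, \qquad df_{\lambda} = trace(S_{\lambda}) = \sum_{k=1}^{N} \rho_k(\lambda)

The u_k and d_k come from the Reinsch form S_{\lambda} = (I + \lambda K)^{-1} and do not depend on λ; increasing λ only shrinks the ρ_k, which is why the relation between df_λ and λ is monotone.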
13. Degrees of freedom and smoother matrix (3)
- The eigenvectors are the same for all λ; only the eigenvalues ρ_k(λ) change
- The first two eigenvalues are exactly 1: the linear part of the fit is never shrunk
- The remaining eigenvalues decrease from 1 towards 0
- df_λ = Σ_k ρ_k(λ)
[Figure: eigenvalues of the smoother matrix, decreasing from 1 to 0]
14. Degrees of freedom and smoother matrix (4)
- S_λ y: y is decomposed with respect to the eigenvectors u_k and each component is shrunk by the factor ρ_k(λ) -> shrinking smoother
- In contrast, the projection smoother H_ξ y either keeps or drops each component entirely (eigenvalues 0 or 1)
- The smoothing-spline matrix S_λ acts as a local approximator (numerical illustration below)
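A small numerical illustration of this contrast (not ESL's exact construction): the hat matrix of a straight-line fit plays the projection smoother, and a second-difference (Whittaker-type) penalty serves as a discrete stand-in for S_λ.

import numpy as np

n = 50
x = np.linspace(0.0, 1.0, n)
rng = np.random.default_rng(0)
y = np.sin(4 * np.pi * x) + 0.3 * rng.normal(size=n)

# Projection smoother: hat matrix of a linear fit in (1, x); eigenvalues are 0 or 1.
B = np.column_stack([np.ones(n), x])
H = B @ np.linalg.solve(B.T @ B, B.T)

# Shrinking smoother: S = (I + lam * D'D)^{-1}, D = second-difference operator;
# a discrete stand-in for the smoothing-spline matrix S_lambda.
D = np.diff(np.eye(n), n=2, axis=0)
lam = 10.0
S = np.linalg.inv(np.eye(n) + lam * D.T @ D)

f_proj, f_smooth = H @ y, S @ y                               # both fits are linear in y
print(np.round(np.sort(np.linalg.eigvalsh(H))[::-1][:5], 3))  # 1, 1, 0, 0, 0: keep or drop
print(np.round(np.sort(np.linalg.eigvalsh(S))[::-1][:5], 3))  # 1, 1, then values shrinking toward 0
print(np.trace(H), round(np.trace(S), 2))                     # df: rank M = 2 vs. effective df_lambda

The projection smoother keeps or drops each eigen-component entirely, while the shrinking smoother leaves the linear part untouched and shrinks the remaining components progressively, as S_λ does.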
15. (5.5) Automatic selection of smoothing parameters
- df_λ controls the bias-variance tradeoff: more smoothing -> more bias, less variance, and vice versa
- The integrated squared prediction error (EPE) combines bias and variance
- Leave-one-out cross-validation (CV) is an estimate of EPE and can be used to pick the best λ (or df_λ)
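A short sketch of the leave-one-out shortcut for any linear smoother f_hat = S_λ y (cf. ESL eq. 5.26), reusing the same discrete stand-in smoother as in the previous sketch; the λ grid is arbitrary.

import numpy as np

def loocv_score(S, y):
    # CV(lambda) = (1/N) * sum_i ((y_i - f_hat(x_i)) / (1 - S_ii))^2
    resid = y - S @ y
    return np.mean((resid / (1.0 - np.diag(S))) ** 2)

n = 50
x = np.linspace(0.0, 1.0, n)
rng = np.random.default_rng(1)
y = np.sin(4 * np.pi * x) + 0.3 * rng.normal(size=n)
D = np.diff(np.eye(n), n=2, axis=0)                    # second-difference penalty
scores = {lam: loocv_score(np.linalg.inv(np.eye(n) + lam * D.T @ D), y)
          for lam in (0.01, 0.1, 1.0, 10.0, 100.0)}
best_lam = min(scores, key=scores.get)                 # lambda with the smallest CV estimate of EPE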