Title: Basis Expansions and Regularization
1. Basis Expansions and Regularization
- Selection of Smoothing Parameters
- Non-parametric Logistic Regression
- Multi-dimensional Splines
- Nagaraj Prasanth
2. Smoothing Parameters
- Regression Splines
  - Degree
  - Number of knots
  - Placement of knots
- Smoothing Splines
  - Only the penalty parameter $\lambda$
  - (since the knots are placed at all unique values of $x$ and cubic degree is almost always used)
3. Smoothing Spline
- The cubic smoothing spline is the minimizer of
  $\mathrm{RSS}(f,\lambda) = \sum_{i=1}^N \{y_i - f(x_i)\}^2 + \lambda \int \{f''(t)\}^2\,dt$
- $\lambda \ge 0$ is the smoothing parameter (a positive constant)
- $\lambda$ controls the trade-off between the bias and variance of the estimate $\hat{f}_\lambda$ (see the sketch below)
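A minimal sketch of this fit, assuming SciPy >= 1.10: `scipy.interpolate.make_smoothing_spline` minimizes exactly the penalized criterion above for a given $\lambda$. The simulated data are illustrative, not part of the original slides.

```python
# Cubic smoothing splines at a few lambda values; make_smoothing_spline
# minimizes RSS(f, lambda) as defined on this slide.
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))                 # distinct, sorted inputs
y = np.sin(12 * (x + 0.2)) / (x + 0.2) + rng.normal(size=100)

for lam in (1e-5, 1e-3, 1e-1):                      # small lambda -> wiggly fit
    spline = make_smoothing_spline(x, y, lam=lam)
    rss = np.sum((y - spline(x)) ** 2)
    print(f"lambda={lam:g}  RSS={rss:.1f}")
```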
4. Selection of Smoothing Parameters
- Fixing Degrees of Freedom
- Bias-Variance Trade-off
- Cross-Validation
- Mallows' $C_p$ Criterion
- Improved AIC
- Risk Estimation Methods
  - Risk Estimation using Classical Pilots (RECP)
  - Exact Double Smoothing (EDS)
5. Fixing Degrees of Freedom
- $df_\lambda = \mathrm{trace}(S_\lambda)$
- ($df_\lambda$ is monotone in $\lambda$ for smoothing splines)
- Fix $df_\lambda$, then invert numerically to get the value of $\lambda$ (see the sketch below)
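A sketch of backing $\lambda$ out of a target $df$, reusing the `x` from the previous sketch. Because the smoothing spline fit is linear in $y$, the smoother matrix $S_\lambda$ can be assembled column by column by smoothing unit vectors; a root-finder then solves $\mathrm{trace}(S_\lambda) = df$ on a log scale.

```python
# df(lambda) = trace(S_lambda); S_lambda built by smoothing unit vectors.
import numpy as np
from scipy.interpolate import make_smoothing_spline
from scipy.optimize import brentq

def smoother_matrix(x, lam):
    n = len(x)
    S = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = 1.0
        S[:, j] = make_smoothing_spline(x, e, lam=lam)(x)
    return S

def df(lam, x):
    return np.trace(smoother_matrix(x, lam))

# df is monotone decreasing in lambda, so a root-finder recovers lambda
target_df = 9.0
t = brentq(lambda t: df(10.0 ** t, x) - target_df, -8.0, 2.0)
print(f"df = {target_df} at lambda ~ {10.0 ** t:.2e}")
```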
6. Bias-Variance Tradeoff
- Sample Data Generation
- $N = 100$ pairs $(x_i, y_i)$ drawn independently from the model
  $Y = f(X) + \varepsilon$, with $f(X) = \frac{\sin(12(X+0.2))}{X+0.2}$, $X \sim U[0,1]$, $\varepsilon \sim N(0,1)$
- Standard error bands at a given point $x$: $\hat{f}_\lambda(x) \pm 2 \cdot \widehat{\mathrm{se}}(\hat{f}_\lambda(x))$
- A simulation of this experiment is sketched below.
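A hedged sketch of the experiment: the three $\lambda$ values below are rough stand-ins for $df$ = 5, 9, 15 (the exact correspondence depends on the design; the $df$-to-$\lambda$ inversion above could be used to match them precisely).

```python
# Simulate repeatedly from the model above, fit smoothing splines at three
# lambda values, and estimate pointwise bias^2 and variance of f_hat.
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(1)
f_true = lambda x: np.sin(12 * (x + 0.2)) / (x + 0.2)
grid = np.linspace(0.05, 0.95, 50)

for lam in (1e-1, 1e-3, 1e-5):           # large lambda = smooth, small = wiggly
    fits = []
    for _ in range(200):                  # 200 Monte Carlo replications
        x = np.sort(rng.uniform(0, 1, 100))
        y = f_true(x) + rng.normal(size=100)
        fits.append(make_smoothing_spline(x, y, lam=lam)(grid))
    fits = np.asarray(fits)
    bias2 = np.mean((fits.mean(axis=0) - f_true(grid)) ** 2)
    var = np.mean(fits.var(axis=0))
    print(f"lambda={lam:g}  avg bias^2={bias2:.3f}  avg var={var:.3f}")
```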
7. Bias-Variance Tradeoff
- (Figure: fitted smoothing splines with standard error bands at $df_\lambda$ = 5, 9, and 15)
9. Observations (Bias-Variance Tradeoff)
- $df_\lambda = 5$
  - The spline underfits
  - Bias is most dramatic in regions of high curvature
  - Narrow se band $\Rightarrow$ a bad estimate reported with great reliability
- $df_\lambda = 9$
  - Close to the true function
  - Slight bias
  - Small variance
- $df_\lambda = 15$
  - Wiggly, but close to the true function
  - Increased width of the se bands
10. Integrated Squared Prediction Error (EPE)
- Combines bias and variance:
  $\mathrm{EPE}(\hat{f}_\lambda) = \mathrm{E}(Y - \hat{f}_\lambda(X))^2 = \sigma^2 + \mathrm{E}\left[\mathrm{Bias}^2(\hat{f}_\lambda(X)) + \mathrm{Var}(\hat{f}_\lambda(X))\right]$
11. Smoothing parameter selection methods
- The EPE calculation needs the true function $f$, so in practice other methods are used:
- Cross-Validation
- Generalized Cross-Validation
- Mallows' $C_p$ Criterion
- Improved AIC
12. Cross-Validation
- Form $N$ training sets, each consisting of $N-1$ points (leave one observation out at a time)
- Compute the squared error $(y_i - \hat{f}_\lambda^{(-i)}(x_i))^2$ over all examples and take its average:
  $\mathrm{CV}(\lambda) = \frac{1}{N} \sum_{i=1}^N (y_i - \hat{f}_\lambda^{(-i)}(x_i))^2$
- $\hat{\lambda}$ is chosen as the minimizer of $\mathrm{CV}(\lambda)$ (see the sketch below)
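A sketch of leave-one-out CV using the standard shortcut for linear smoothers, $\mathrm{CV}(\lambda) = \frac{1}{N}\sum_i \left[\frac{y_i - \hat{f}_\lambda(x_i)}{1 - \{S_\lambda\}_{ii}}\right]^2$, which avoids refitting $N$ times. It reuses `smoother_matrix()` from the degrees-of-freedom sketch and the `(x, y)` sample from the first sketch.

```python
# Leave-one-out CV via the diagonal-of-S shortcut, minimized over a grid.
import numpy as np

def loocv(x, y, lam):
    S = smoother_matrix(x, lam)
    resid = y - S @ y
    return np.mean((resid / (1.0 - np.diag(S))) ** 2)

lams = 10.0 ** np.linspace(-6, 0, 25)
scores = [loocv(x, y, lam) for lam in lams]
print(f"CV-chosen lambda ~ {lams[int(np.argmin(scores))]:.2e}")
```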
13. Cross-Validation Estimate of EPE
- The EPE and CV curves have a similar shape
- The CV curve is approximately unbiased as an estimate of EPE
14. Other Classical Methods
- Generalized Cross-Validation:
  $\mathrm{GCV}(\lambda) = \frac{1}{N} \sum_{i=1}^N \left[ \frac{y_i - \hat{f}_\lambda(x_i)}{1 - \mathrm{trace}(S_\lambda)/N} \right]^2$
- Mallows' $C_p$ Criterion:
  $C_p(\lambda) = \frac{1}{N}\|y - \hat{f}_\lambda\|^2 + \frac{2\hat{\sigma}^2}{N}\,\mathrm{trace}(S_\lambda)$
- $\hat{\sigma}^2$ is pre-chosen by CV
15. Classical Methods (contd)
- Improved AIC:
  $\mathrm{AIC}_C(\lambda) = \log(\hat{\sigma}_\lambda^2) + 1 + \frac{2(\mathrm{trace}(S_\lambda)+1)}{N - \mathrm{trace}(S_\lambda) - 2}$, with $\hat{\sigma}_\lambda^2 = \frac{1}{N}\|y - \hat{f}_\lambda\|^2$
- The finite-sample bias of classical AIC is corrected
- $\hat{\lambda}$ is chosen as the minimizer
- Classical methods:
  - Tend to be highly variable
  - Have a tendency to undersmooth
- (The three criteria are sketched in code below.)
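A small sketch of the three closed-form classical criteria above, computed from the residual vector and $\mathrm{trace}(S_\lambda)$; `sigma2_cv` is the variance estimate pre-chosen by CV, as on the previous slide.

```python
# GCV, Mallows' C_p, and improved AIC for a linear smoother f_hat = S y.
import numpy as np

def classical_criteria(y, S, sigma2_cv):
    n = len(y)
    rss = np.sum((y - S @ y) ** 2)
    tr = np.trace(S)
    gcv = (rss / n) / (1.0 - tr / n) ** 2               # GCV(lambda)
    cp = rss / n + 2.0 * sigma2_cv * tr / n             # Mallows' C_p
    aicc = np.log(rss / n) + 1.0 + 2.0 * (tr + 1.0) / (n - tr - 2.0)
    return gcv, cp, aicc
```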
16. Risk Estimation Methods
- Require choosing pilot estimates
- Risk Estimation using Classical Pilots (RECP)
- For a linear smoother $\hat{f}_\lambda = S_\lambda y$, the risk is
  $R(\lambda) = \frac{1}{N}\|(I - S_\lambda)f\|^2 + \frac{\sigma^2}{N}\,\mathrm{trace}(S_\lambda S_\lambda^T)$,
  so pilot estimates are needed for $f$ and $\sigma^2$
- Method 1: Blocking Method
- Method 2:
  - Get a pilot $\tilde{f}$ using a classical method
  - Use it to compute $\hat{R}(\lambda)$ and $\hat{\sigma}^2$
  - The final $\hat{\lambda}$ is the minimizer of $\hat{R}(\lambda)$, hoping that it is a good estimator of the optimal $\lambda$ (see the sketch below)
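A sketch of RECP's Method 2: plug a classical pilot fit and variance estimate into the exact risk formula for a linear smoother, then minimize over a grid of $\lambda$ values. The pilot inputs `f_pilot` and `sigma2_pilot` are assumed to come from a classical fit as described above.

```python
# Estimated risk R_hat(lambda) with classical pilots for f and sigma^2.
import numpy as np

def recp_risk(S, f_pilot, sigma2_pilot):
    n = len(f_pilot)
    bias_part = np.sum(((np.eye(n) - S) @ f_pilot) ** 2) / n
    var_part = sigma2_pilot * np.trace(S @ S.T) / n
    return bias_part + var_part

# Usage sketch: lam_hat = argmin over lams of
#   recp_risk(smoother_matrix(x, lam), f_pilot, sigma2_pilot)
```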
17. Risk Estimation Methods (contd)
- Exact Double Smoothing (EDS)
- Another approach to choosing the pilot estimates
- Uses two levels of pilot estimates
- Assume $\lambda_0$ is the optimal parameter that minimizes the true risk $R(\lambda)$
- Derive a closed-form expression for the risk estimate as a function of both $\lambda$ and the pilot
- Replace the unknown $\lambda_0$ with a pilot estimate $\hat{\lambda}_0$
- Choose $\hat{\lambda}$ as the minimizer of the resulting criterion, where the pilot $\hat{\lambda}_0$ itself minimizes a first-level (classical) criterion
18. Speed Comparisons
- CV, GCV, AIC: roughly the same computational time
- $C_p$, RECP: longer computational time (2 numerical minimizations)
- EDS: even longer (3 minimizations)
19. Conclusions from simulations
- No method performed uniformly best
- The three classical methods (CV, GCV and $C_p$) gave very similar results
- The AIC method never performed worse than CV, GCV or $C_p$
- For a simple regression function with a high noise level, the two risk estimation methods (RECP and EDS) seem to be superior
20. Nonparametric Logistic Regression
- Goal: approximate
  $\log \frac{\Pr(Y=1 \mid X=x)}{1 - \Pr(Y=1 \mid X=x)} = f(x)$
- Penalized log-likelihood:
  $\ell(f; \lambda) = \sum_{i=1}^N \left[ y_i f(x_i) - \log(1 + e^{f(x_i)}) \right] - \frac{\lambda}{2} \int \{f''(t)\}^2\,dt$
21. Nonparametric Logistic Regression (contd)
- The parameters of $f(x)$ are set so as to maximize the log-likelihood
- Parametric: $f(x) = x^T\beta$ (standard logistic regression)
- Non-parametric: $f(x) = \sum_{j=1}^N N_j(x)\,\theta_j$, an expansion in natural spline basis functions with knots at the unique values of $x$
22. Nonparametric Logistic Regression (contd)
- Derivatives of the penalized log-likelihood:
  $\frac{\partial \ell}{\partial \theta} = \mathbf{N}^T(y - p) - \lambda\Omega\theta$
  $\frac{\partial^2 \ell}{\partial \theta\,\partial \theta^T} = -\mathbf{N}^T W \mathbf{N} - \lambda\Omega$
- $p$: $N$-vector with elements $p(x_i)$
- $W$: diagonal matrix of weights $p(x_i)(1 - p(x_i))$
- The first derivative is non-linear in $\theta$ $\Rightarrow$ an iterative algorithm (Newton-Raphson) is needed
23. Nonparametric Logistic Regression (contd)
- Newton-Raphson update:
  $\theta^{new} = (\mathbf{N}^T W \mathbf{N} + \lambda\Omega)^{-1}\mathbf{N}^T W z$,
  where $z = \mathbf{N}\theta^{old} + W^{-1}(y - p)$ is the working response
  (equivalently, $f^{new} = S_{\lambda,w}\, z$)
- The update fits a weighted smoothing spline to the working response $z$ (see the sketch below)
- Although here $x$ is one-dimensional, this generalizes to higher-dimensional $x$
24. Multidimensional Splines
- Suppose $X \in \mathbb{R}^2$
- We have a separate basis of functions for representing functions of $X_1$ and $X_2$:
  $h_{1j}(X_1)$, $j = 1, \ldots, M_1$, and $h_{2k}(X_2)$, $k = 1, \ldots, M_2$
- The $M_1 \times M_2$ dimensional tensor product basis is
  $g_{jk}(X) = h_{1j}(X_1)\,h_{2k}(X_2)$
- $f(X) = \sum_{j=1}^{M_1}\sum_{k=1}^{M_2} \theta_{jk}\,g_{jk}(X)$ (see the sketch below)
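A sketch of building the tensor product design matrix from two univariate B-spline bases, assuming SciPy >= 1.8 for `BSpline.design_matrix`; the knot vector is an illustrative clamped cubic choice.

```python
# Tensor product basis: each column of G is g_jk(X) = h_1j(X1) * h_2k(X2).
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(2)
x1, x2 = rng.uniform(0, 1, 50), rng.uniform(0, 1, 50)
knots = np.r_[[0.0] * 4, 0.25, 0.5, 0.75, [1.0] * 4]      # clamped, cubic (k=3)

H1 = BSpline.design_matrix(x1, knots, 3).toarray()        # N x M1
H2 = BSpline.design_matrix(x2, knots, 3).toarray()        # N x M2
G = np.einsum('ij,ik->ijk', H1, H2).reshape(len(x1), -1)  # N x (M1*M2)
print(G.shape)
```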
26. 2D Function Basis
- Can be generalized to higher dimensions
- The dimension of the basis grows exponentially in the number of coordinates
- MARS: a greedy algorithm for including only the basis functions deemed necessary by least squares
27. Higher Dimensional Smoothing Splines
- Suppose we have pairs $(x_i, y_i)$ with $x_i \in \mathbb{R}^d$
- Want to find a $d$-dimensional regression function $f(x)$
- Solve
  $\min_f \sum_{i=1}^N (y_i - f(x_i))^2 + \lambda J[f]$
- (in 1-D, $J[f] = \int \{f''(t)\}^2\,dt$)
- $J$ is a penalty functional for stabilizing $f$ in $\mathbb{R}^d$
28. 2D Roughness Penalty
- Generalization of the 1D penalty:
  $J[f] = \iint_{\mathbb{R}^2} \left[ \left(\frac{\partial^2 f}{\partial x_1^2}\right)^2 + 2\left(\frac{\partial^2 f}{\partial x_1 \partial x_2}\right)^2 + \left(\frac{\partial^2 f}{\partial x_2^2}\right)^2 \right] dx_1\,dx_2$
- Yields the thin plate spline (a smooth 2D surface)
29. Thin Plate Spline
- Properties in common with the 1D cubic smoothing spline:
- As $\lambda \to 0$, the solution approaches an interpolating function
- As $\lambda \to \infty$, the solution approaches the least squares plane
- For intermediate values of $\lambda$, the solution is a linear expansion of basis functions
30. Thin Plate Spline
- The solution has the form
  $f(x) = \beta_0 + \beta^T x + \sum_{j=1}^N \alpha_j h_j(x)$, where $h_j(x) = \|x - x_j\|^2 \log \|x - x_j\|$
- Can be generalized to arbitrary dimension (see the sketch below)
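A sketch of a smoothed thin plate spline surface using SciPy's `RBFInterpolator`, whose `'thin_plate_spline'` kernel is $r^2 \log r$ and whose `smoothing` argument plays the role of $\lambda$; the data here are illustrative.

```python
# Smoothed thin plate spline fit in 2-D.
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(4)
X = rng.uniform(0, 1, size=(100, 2))
y = np.sin(2 * np.pi * X[:, 0]) * X[:, 1] + 0.1 * rng.normal(size=100)

tps = RBFInterpolator(X, y, kernel='thin_plate_spline', smoothing=1.0)
g = np.linspace(0, 1, 20)
grid = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
surface = tps(grid)                       # fitted smooth 2-D surface
print(surface.shape)
```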
32. Computation Speeds
- 1D splines: $O(N)$
- Thin plate splines: $O(N^3)$
- Can use fewer than $N$ knots
- Using $K$ knots reduces the order to $O(NK^2 + K^3)$
33. Additive Splines
- Solution of the form $f(X) = \alpha + f_1(X_1) + \cdots + f_d(X_d)$
- Each $f_j$ is a univariate spline
- Assume $f$ is additive and impose a penalty on each of the component functions:
  $J[f] = \sum_{j=1}^d \int \{f_j''(t_j)\}^2\,dt_j$
- ANOVA spline decomposition: $f(X) = \alpha + \sum_j f_j(X_j) + \sum_{j<k} f_{jk}(X_j, X_k) + \cdots$
- (A backfitting sketch for the additive fit follows.)
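A sketch of backfitting for the additive spline model: cycle through the coordinates, smoothing the partial residuals against each $X_j$ in turn. It assumes distinct values within each coordinate (required by the univariate smoother) and uses the same `make_smoothing_spline` as the earlier sketches.

```python
# Backfitting for an additive spline model f(X) = alpha + sum_j f_j(X_j).
import numpy as np
from scipy.interpolate import make_smoothing_spline

def backfit(X, y, lam=1e-3, n_pass=10):
    n, d = X.shape
    alpha = y.mean()
    fits = np.zeros((n, d))                       # current f_j(x_ij)
    order = [np.argsort(X[:, j]) for j in range(d)]
    for _ in range(n_pass):
        for j in range(d):
            partial = y - alpha - fits.sum(axis=1) + fits[:, j]
            idx = order[j]
            s = make_smoothing_spline(X[idx, j], partial[idx], lam=lam)
            fits[:, j] = s(X[:, j])
            fits[:, j] -= fits[:, j].mean()       # keep each f_j centered
    return alpha, fits
```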
34. Additive vs. Tensor Product
- A tensor product basis can achieve more flexibility, but introduces some spurious structure too.
35. Conclusion
- Smoothing parameter selection and the bias-variance trade-off
- Comparison of various classical and risk estimation methods for parameter selection
- Non-parametric logistic regression
- Extension to higher dimensions:
  - Tensor products
  - Thin plate splines
  - Additive splines
36. References
- Lee, T. (2002). Smoothing Parameter Selection for Smoothing Splines: A Simulation Study. (www.stat.colostate.edu/tlee/PSfiles/spline.ps.gz)
- Hastie, T., Tibshirani, R., Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.