Title: Basis Expansions and Regularization
1. Basis Expansions and Regularization
- Selection of Smoothing Parameters
- Non-parametric Logistic Regression
- Multi-dimensional Splines
- Nagaraj Prasanth
2. Smoothing Parameters
- Regression Splines
  - Degree
  - Number of knots
  - Placement of knots
- Smoothing Splines
  - Only the penalty parameter $\lambda$
  - (since the knots are placed at all unique values of $x$ and cubic degree is almost always used)
3. Smoothing Spline
- The cubic smoothing spline is the minimizer of
  $\mathrm{RSS}(f,\lambda) = \sum_{i=1}^N \{y_i - f(x_i)\}^2 + \lambda \int \{f''(t)\}^2\,dt$
- $\lambda \ge 0$ is the smoothing parameter (a positive constant)
- $\lambda$ controls the trade-off between the bias and variance of the estimate $\hat{f}_\lambda$ (see the sketch below)
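A minimal sketch of this fit, assuming SciPy >= 1.10: `scipy.interpolate.make_smoothing_spline` minimizes exactly the penalized criterion above for a given $\lambda$. The simulated data are illustrative, not part of the original slides.

```python
# Cubic smoothing splines at a few lambda values; make_smoothing_spline
# minimizes RSS(f, lambda) as defined on this slide.
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))                 # distinct, sorted inputs
y = np.sin(12 * (x + 0.2)) / (x + 0.2) + rng.normal(size=100)

for lam in (1e-5, 1e-3, 1e-1):                      # small lambda -> wiggly fit
    spline = make_smoothing_spline(x, y, lam=lam)
    rss = np.sum((y - spline(x)) ** 2)
    print(f"lambda={lam:g}  RSS={rss:.1f}")
```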
4. Selection of Smoothing Parameters
- Fixing Degrees of Freedom
- Bias-Variance Trade-off
- Cross-Validation
- Mallows' $C_p$ Criterion
- Improved AIC
- Risk Estimation Methods
  - Risk Estimation using Classical Pilots (RECP)
  - Exact Double Smoothing (EDS)
5. Fixing Degrees of Freedom
- $df_\lambda = \mathrm{trace}(S_\lambda)$
- ($df_\lambda$ is monotone in $\lambda$ for smoothing splines)
- Fix $df_\lambda$, then invert numerically to get the value of $\lambda$ (see the sketch below)
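A sketch of backing $\lambda$ out of a target $df$, reusing the `x` from the previous sketch. Because the smoothing spline fit is linear in $y$, the smoother matrix $S_\lambda$ can be assembled column by column by smoothing unit vectors; a root-finder then solves $\mathrm{trace}(S_\lambda) = df$ on a log scale.

```python
# df(lambda) = trace(S_lambda); S_lambda built by smoothing unit vectors.
import numpy as np
from scipy.interpolate import make_smoothing_spline
from scipy.optimize import brentq

def smoother_matrix(x, lam):
    n = len(x)
    S = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = 1.0
        S[:, j] = make_smoothing_spline(x, e, lam=lam)(x)
    return S

def df(lam, x):
    return np.trace(smoother_matrix(x, lam))

# df is monotone decreasing in lambda, so a root-finder recovers lambda
target_df = 9.0
t = brentq(lambda t: df(10.0 ** t, x) - target_df, -8.0, 2.0)
print(f"df = {target_df} at lambda ~ {10.0 ** t:.2e}")
```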
6. Bias-Variance Tradeoff
- Sample Data Generation
- $N = 100$ pairs $(x_i, y_i)$ drawn independently from the model
  $Y = f(X) + \varepsilon$, with $f(X) = \frac{\sin(12(X+0.2))}{X+0.2}$, $X \sim U[0,1]$, $\varepsilon \sim N(0,1)$
- Standard error bands at a given point $x$: $\hat{f}_\lambda(x) \pm 2 \cdot \widehat{\mathrm{se}}(\hat{f}_\lambda(x))$
- A simulation of this experiment is sketched below.
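A hedged sketch of the experiment: the three $\lambda$ values below are rough stand-ins for $df$ = 5, 9, 15 (the exact correspondence depends on the design; the $df$-to-$\lambda$ inversion above could be used to match them precisely).

```python
# Simulate repeatedly from the model above, fit smoothing splines at three
# lambda values, and estimate pointwise bias^2 and variance of f_hat.
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(1)
f_true = lambda x: np.sin(12 * (x + 0.2)) / (x + 0.2)
grid = np.linspace(0.05, 0.95, 50)

for lam in (1e-1, 1e-3, 1e-5):           # large lambda = smooth, small = wiggly
    fits = []
    for _ in range(200):                  # 200 Monte Carlo replications
        x = np.sort(rng.uniform(0, 1, 100))
        y = f_true(x) + rng.normal(size=100)
        fits.append(make_smoothing_spline(x, y, lam=lam)(grid))
    fits = np.asarray(fits)
    bias2 = np.mean((fits.mean(axis=0) - f_true(grid)) ** 2)
    var = np.mean(fits.var(axis=0))
    print(f"lambda={lam:g}  avg bias^2={bias2:.3f}  avg var={var:.3f}")
```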
7. Bias-Variance Tradeoff
- (Figure: fitted smoothing splines with standard error bands at $df_\lambda$ = 5, 9, and 15)
9. Observations (Bias-Variance Tradeoff)
- $df_\lambda = 5$
  - The spline underfits
  - Bias is most dramatic in regions of high curvature
  - Narrow se band $\Rightarrow$ a bad estimate reported with great reliability
- $df_\lambda = 9$
  - Close to the true function
  - Slight bias
  - Small variance
- $df_\lambda = 15$
  - Wiggly, but close to the true function
  - Increased width of the se bands
10. Integrated Squared Prediction Error (EPE)
- Combines bias and variance:
  $\mathrm{EPE}(\hat{f}_\lambda) = \mathrm{E}(Y - \hat{f}_\lambda(X))^2 = \sigma^2 + \mathrm{E}\left[\mathrm{Bias}^2(\hat{f}_\lambda(X)) + \mathrm{Var}(\hat{f}_\lambda(X))\right]$
11. Smoothing parameter selection methods
- The EPE calculation needs the true function $f$, so in practice other methods are used:
- Cross-Validation
- Generalized Cross-Validation
- Mallows' $C_p$ Criterion
- Improved AIC
12. Cross-Validation
- Form $N$ training sets, each consisting of $N-1$ points (leave one observation out at a time)
- Compute the squared error $(y_i - \hat{f}_\lambda^{(-i)}(x_i))^2$ over all examples and take its average:
  $\mathrm{CV}(\lambda) = \frac{1}{N} \sum_{i=1}^N (y_i - \hat{f}_\lambda^{(-i)}(x_i))^2$
- $\hat{\lambda}$ is chosen as the minimizer of $\mathrm{CV}(\lambda)$ (see the sketch below)
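A sketch of leave-one-out CV using the standard shortcut for linear smoothers, $\mathrm{CV}(\lambda) = \frac{1}{N}\sum_i \left[\frac{y_i - \hat{f}_\lambda(x_i)}{1 - \{S_\lambda\}_{ii}}\right]^2$, which avoids refitting $N$ times. It reuses `smoother_matrix()` from the degrees-of-freedom sketch and the `(x, y)` sample from the first sketch.

```python
# Leave-one-out CV via the diagonal-of-S shortcut, minimized over a grid.
import numpy as np

def loocv(x, y, lam):
    S = smoother_matrix(x, lam)
    resid = y - S @ y
    return np.mean((resid / (1.0 - np.diag(S))) ** 2)

lams = 10.0 ** np.linspace(-6, 0, 25)
scores = [loocv(x, y, lam) for lam in lams]
print(f"CV-chosen lambda ~ {lams[int(np.argmin(scores))]:.2e}")
```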
13. Cross-Validation Estimate of EPE
- The EPE and CV curves have a similar shape
- The CV curve is approximately unbiased as an estimate of EPE
14. Other Classical Methods
- Generalized Cross-Validation:
  $\mathrm{GCV}(\lambda) = \frac{1}{N} \sum_{i=1}^N \left[ \frac{y_i - \hat{f}_\lambda(x_i)}{1 - \mathrm{trace}(S_\lambda)/N} \right]^2$
- Mallows' $C_p$ Criterion:
  $C_p(\lambda) = \frac{1}{N}\|y - \hat{f}_\lambda\|^2 + \frac{2\hat{\sigma}^2}{N}\,\mathrm{trace}(S_\lambda)$
- $\hat{\sigma}^2$ is pre-chosen by CV
15. Classical Methods (contd)
- Improved AIC:
  $\mathrm{AIC}_C(\lambda) = \log(\hat{\sigma}_\lambda^2) + 1 + \frac{2(\mathrm{trace}(S_\lambda)+1)}{N - \mathrm{trace}(S_\lambda) - 2}$, with $\hat{\sigma}_\lambda^2 = \frac{1}{N}\|y - \hat{f}_\lambda\|^2$
- The finite-sample bias of classical AIC is corrected
- $\hat{\lambda}$ is chosen as the minimizer
- Classical methods:
  - Tend to be highly variable
  - Have a tendency to undersmooth
- (The three criteria are sketched in code below.)
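A small sketch of the three closed-form classical criteria above, computed from the residual vector and $\mathrm{trace}(S_\lambda)$; `sigma2_cv` is the variance estimate pre-chosen by CV, as on the previous slide.

```python
# GCV, Mallows' C_p, and improved AIC for a linear smoother f_hat = S y.
import numpy as np

def classical_criteria(y, S, sigma2_cv):
    n = len(y)
    rss = np.sum((y - S @ y) ** 2)
    tr = np.trace(S)
    gcv = (rss / n) / (1.0 - tr / n) ** 2               # GCV(lambda)
    cp = rss / n + 2.0 * sigma2_cv * tr / n             # Mallows' C_p
    aicc = np.log(rss / n) + 1.0 + 2.0 * (tr + 1.0) / (n - tr - 2.0)
    return gcv, cp, aicc
```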
16. Risk Estimation Methods
- Require choosing pilot estimates
- Risk Estimation using Classical Pilots (RECP)
- For a linear smoother $\hat{f}_\lambda = S_\lambda y$, the risk is
  $R(\lambda) = \frac{1}{N}\|(I - S_\lambda)f\|^2 + \frac{\sigma^2}{N}\,\mathrm{trace}(S_\lambda S_\lambda^T)$,
  so pilot estimates are needed for $f$ and $\sigma^2$
- Method 1: Blocking Method
- Method 2:
  - Get a pilot $\tilde{f}$ using a classical method
  - Use it to compute $\hat{R}(\lambda)$ and $\hat{\sigma}^2$
  - The final $\hat{\lambda}$ is the minimizer of $\hat{R}(\lambda)$, hoping that it is a good estimator of the optimal $\lambda$ (see the sketch below)
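A sketch of RECP's Method 2: plug a classical pilot fit and variance estimate into the exact risk formula for a linear smoother, then minimize over a grid of $\lambda$ values. The pilot inputs `f_pilot` and `sigma2_pilot` are assumed to come from a classical fit as described above.

```python
# Estimated risk R_hat(lambda) with classical pilots for f and sigma^2.
import numpy as np

def recp_risk(S, f_pilot, sigma2_pilot):
    n = len(f_pilot)
    bias_part = np.sum(((np.eye(n) - S) @ f_pilot) ** 2) / n
    var_part = sigma2_pilot * np.trace(S @ S.T) / n
    return bias_part + var_part

# Usage sketch: lam_hat = argmin over lams of
#   recp_risk(smoother_matrix(x, lam), f_pilot, sigma2_pilot)
```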
17. Risk Estimation Methods (contd)
- Exact Double Smoothing (EDS)
- Another approach to choosing the pilot estimates
- Uses two levels of pilot estimates
- Assume $\lambda_0$ is the optimal parameter that minimizes the true risk $R(\lambda)$
- Derive a closed-form expression for the risk estimate as a function of both $\lambda$ and the pilot
- Replace the unknown $\lambda_0$ with a pilot estimate $\hat{\lambda}_0$
- Choose $\hat{\lambda}$ as the minimizer of the resulting criterion, where the pilot $\hat{\lambda}_0$ itself minimizes a first-level (classical) criterion
18. Speed Comparisons
- CV, GCV, AIC: roughly the same computational time
- $C_p$, RECP: longer computational time (2 numerical minimizations)
- EDS: even longer (3 minimizations)
19. Conclusions from simulations
- No method performed uniformly best
- The three classical methods (CV, GCV and $C_p$) gave very similar results
- The AIC method never performed worse than CV, GCV or $C_p$
- For a simple regression function with a high noise level, the two risk estimation methods (RECP and EDS) seem to be superior
20. Nonparametric Logistic Regression
- Goal: approximate
  $\log \frac{\Pr(Y=1 \mid X=x)}{1 - \Pr(Y=1 \mid X=x)} = f(x)$
- Penalized log-likelihood:
  $\ell(f; \lambda) = \sum_{i=1}^N \left[ y_i f(x_i) - \log(1 + e^{f(x_i)}) \right] - \frac{\lambda}{2} \int \{f''(t)\}^2\,dt$
21. Nonparametric Logistic Regression (contd)
- The parameters of $f(x)$ are set so as to maximize the log-likelihood
- Parametric: $f(x) = x^T\beta$ (standard logistic regression)
- Non-parametric: $f(x) = \sum_{j=1}^N N_j(x)\,\theta_j$, an expansion in natural spline basis functions with knots at the unique values of $x$
22. Nonparametric Logistic Regression (contd)
- Derivatives of the penalized log-likelihood:
  $\frac{\partial \ell}{\partial \theta} = \mathbf{N}^T(y - p) - \lambda\Omega\theta$
  $\frac{\partial^2 \ell}{\partial \theta\,\partial \theta^T} = -\mathbf{N}^T W \mathbf{N} - \lambda\Omega$
- $p$: $N$-vector with elements $p(x_i)$
- $W$: diagonal matrix of weights $p(x_i)(1 - p(x_i))$
- The first derivative is non-linear in $\theta$ $\Rightarrow$ an iterative algorithm (Newton-Raphson) is needed
23. Nonparametric Logistic Regression (contd)
- Newton-Raphson update:
  $\theta^{new} = (\mathbf{N}^T W \mathbf{N} + \lambda\Omega)^{-1}\mathbf{N}^T W z$,
  where $z = \mathbf{N}\theta^{old} + W^{-1}(y - p)$ is the working response
  (equivalently, $f^{new} = S_{\lambda,w}\, z$)
- The update fits a weighted smoothing spline to the working response $z$ (see the sketch below)
- Although here $x$ is one-dimensional, this generalizes to higher-dimensional $x$
24. Multidimensional Splines
- Suppose $X \in \mathbb{R}^2$
- We have a separate basis of functions for representing functions of $X_1$ and $X_2$:
  $h_{1j}(X_1)$, $j = 1, \ldots, M_1$, and $h_{2k}(X_2)$, $k = 1, \ldots, M_2$
- The $M_1 \times M_2$ dimensional tensor product basis is
  $g_{jk}(X) = h_{1j}(X_1)\,h_{2k}(X_2)$
- $f(X) = \sum_{j=1}^{M_1}\sum_{k=1}^{M_2} \theta_{jk}\,g_{jk}(X)$ (see the sketch below)
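A sketch of building the tensor product design matrix from two univariate B-spline bases, assuming SciPy >= 1.8 for `BSpline.design_matrix`; the knot vector is an illustrative clamped cubic choice.

```python
# Tensor product basis: each column of G is g_jk(X) = h_1j(X1) * h_2k(X2).
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(2)
x1, x2 = rng.uniform(0, 1, 50), rng.uniform(0, 1, 50)
knots = np.r_[[0.0] * 4, 0.25, 0.5, 0.75, [1.0] * 4]      # clamped, cubic (k=3)

H1 = BSpline.design_matrix(x1, knots, 3).toarray()        # N x M1
H2 = BSpline.design_matrix(x2, knots, 3).toarray()        # N x M2
G = np.einsum('ij,ik->ijk', H1, H2).reshape(len(x1), -1)  # N x (M1*M2)
print(G.shape)
```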
26. 2D Function Basis
- Can be generalized to higher dimensions
- The dimension of the basis grows exponentially in the number of coordinates
- MARS: a greedy algorithm for including only the basis functions deemed necessary by least squares
27. Higher Dimensional Smoothing Splines
- Suppose we have pairs $(x_i, y_i)$ with $x_i \in \mathbb{R}^d$
- Want to find a $d$-dimensional regression function $f(x)$
- Solve
  $\min_f \sum_{i=1}^N (y_i - f(x_i))^2 + \lambda J[f]$
- (in 1-D, $J[f] = \int \{f''(t)\}^2\,dt$)
- $J$ is a penalty functional for stabilizing $f$ in $\mathbb{R}^d$
28. 2D Roughness Penalty
- Generalization of the 1D penalty:
  $J[f] = \iint_{\mathbb{R}^2} \left[ \left(\frac{\partial^2 f}{\partial x_1^2}\right)^2 + 2\left(\frac{\partial^2 f}{\partial x_1 \partial x_2}\right)^2 + \left(\frac{\partial^2 f}{\partial x_2^2}\right)^2 \right] dx_1\,dx_2$
- Yields the thin plate spline (a smooth 2D surface)
29. Thin Plate Spline
- Properties in common with the 1D cubic smoothing spline:
- As $\lambda \to 0$, the solution approaches an interpolating function
- As $\lambda \to \infty$, the solution approaches the least squares plane
- For intermediate values of $\lambda$, the solution is a linear expansion of basis functions
30. Thin Plate Spline
- The solution has the form
  $f(x) = \beta_0 + \beta^T x + \sum_{j=1}^N \alpha_j h_j(x)$, where $h_j(x) = \|x - x_j\|^2 \log \|x - x_j\|$
- Can be generalized to arbitrary dimension (see the sketch below)
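A sketch of a smoothed thin plate spline surface using SciPy's `RBFInterpolator`, whose `'thin_plate_spline'` kernel is $r^2 \log r$ and whose `smoothing` argument plays the role of $\lambda$; the data here are illustrative.

```python
# Smoothed thin plate spline fit in 2-D.
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(4)
X = rng.uniform(0, 1, size=(100, 2))
y = np.sin(2 * np.pi * X[:, 0]) * X[:, 1] + 0.1 * rng.normal(size=100)

tps = RBFInterpolator(X, y, kernel='thin_plate_spline', smoothing=1.0)
g = np.linspace(0, 1, 20)
grid = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
surface = tps(grid)                       # fitted smooth 2-D surface
print(surface.shape)
```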
32. Computation Speeds
- 1D splines: $O(N)$
- Thin plate splines: $O(N^3)$
- Can use fewer than $N$ knots
- Using $K$ knots reduces the order to $O(NK^2 + K^3)$
33. Additive Splines
- Solution of the form $f(X) = \alpha + f_1(X_1) + \cdots + f_d(X_d)$
- Each $f_j$ is a univariate spline
- Assume $f$ is additive and impose a penalty on each of the component functions:
  $J[f] = \sum_{j=1}^d \int \{f_j''(t_j)\}^2\,dt_j$
- ANOVA spline decomposition: $f(X) = \alpha + \sum_j f_j(X_j) + \sum_{j<k} f_{jk}(X_j, X_k) + \cdots$
- (A backfitting sketch for the additive fit follows.)
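A sketch of backfitting for the additive spline model: cycle through the coordinates, smoothing the partial residuals against each $X_j$ in turn. It assumes distinct values within each coordinate (required by the univariate smoother) and uses the same `make_smoothing_spline` as the earlier sketches.

```python
# Backfitting for an additive spline model f(X) = alpha + sum_j f_j(X_j).
import numpy as np
from scipy.interpolate import make_smoothing_spline

def backfit(X, y, lam=1e-3, n_pass=10):
    n, d = X.shape
    alpha = y.mean()
    fits = np.zeros((n, d))                       # current f_j(x_ij)
    order = [np.argsort(X[:, j]) for j in range(d)]
    for _ in range(n_pass):
        for j in range(d):
            partial = y - alpha - fits.sum(axis=1) + fits[:, j]
            idx = order[j]
            s = make_smoothing_spline(X[idx, j], partial[idx], lam=lam)
            fits[:, j] = s(X[:, j])
            fits[:, j] -= fits[:, j].mean()       # keep each f_j centered
    return alpha, fits
```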
34. Additive vs. Tensor Product
- A tensor product basis can achieve more flexibility, but introduces some spurious structure too.
35. Conclusion
- Smoothing parameter selection and the bias-variance trade-off
- Comparison of various classical and risk estimation methods for parameter selection
- Non-parametric logistic regression
- Extension to higher dimensions:
  - Tensor products
  - Thin plate splines
  - Additive splines
36. References
- Lee, T. (2002). Smoothing Parameter Selection for Smoothing Splines: A Simulation Study. (www.stat.colostate.edu/tlee/PSfiles/spline.ps.gz)
- Hastie, T., Tibshirani, R., Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.