1
Basis Expansions and Regularization
  • Selection of Smoothing Parameters
  • Non-parametric Logistic Regression
  • Multi-dimensional Splines
  • - Nagaraj Prasanth

2
Smoothing Parameters
  • Regression Splines
  • Degree
  • Number of knots
  • Placement of knots
  • Smoothing Splines
  • Only the penalty parameter $\lambda$
  • (since the knots are placed at all the unique $x_i$ and
    a cubic degree is almost always used)

3
Smoothing Spline
  • The cubic smoothing spline is the minimizer of
    $\mathrm{RSS}(f, \lambda) = \sum_{i=1}^{N} \{y_i - f(x_i)\}^2 + \lambda \int \{f''(t)\}^2\,dt$
  • $\lambda$: smoothing parameter (a non-negative constant)
  • Controls the trade-off between bias and variance of the fit $\hat f_\lambda$
    (see the fitting sketch below)
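A minimal sketch of fitting a cubic smoothing spline at a fixed $\lambda$, using SciPy's make_smoothing_spline (SciPy 1.10+). The toy data and the value lam=1e-3 are illustrative assumptions, not taken from the slides.

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(0)

# Toy data (illustrative): noisy observations of a smooth curve.
x = np.sort(rng.uniform(0.0, 1.0, size=100))
y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

# lam is the smoothing parameter lambda in the penalized RSS above:
# small lam -> wiggly fit (low bias, high variance),
# large lam -> nearly linear fit (high bias, low variance).
spline = make_smoothing_spline(x, y, lam=1e-3)

grid = np.linspace(0.0, 1.0, 200)
f_hat = spline(grid)   # evaluate the fitted function
print(f_hat[:5])
```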

4
Selection of Smoothing Parameters
  • Fixing Degrees of Freedom
  • Bias-Variance Trade-off
  • Cross-Validation
  • Mallows' Cp Criterion
  • Improved AIC
  • Risk Estimation Methods
  • Risk Estimation using Classical Pilots (RECP)
  • Exact Double Smoothing (EDS)

5
Fixing Degrees of Freedom
  • $df_\lambda = \mathrm{trace}(S_\lambda)$, where $S_\lambda$ is the smoother matrix
    (monotone in $\lambda$ for smoothing splines)
  • Fix $df_\lambda$ ⇒ solve for the corresponding value of $\lambda$ (see the sketch below)
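To make the df-to-$\lambda$ mapping concrete, the sketch below builds a penalized cubic B-spline smoother explicitly (an illustrative stand-in for the natural-spline basis of a true smoothing spline), computes $df(\lambda) = \mathrm{trace}(S_\lambda)$, and bisects on $\log\lambda$ to hit a target df. The knot layout, grid sizes, and quadrature are assumptions.

```python
import numpy as np
from scipy.interpolate import BSpline


def bspline_design(x, t, k=3):
    """Design matrix B with one column per B-spline basis function."""
    n_basis = len(t) - k - 1
    B = np.zeros((len(x), n_basis))
    for j in range(n_basis):
        c = np.zeros(n_basis)
        c[j] = 1.0
        B[:, j] = BSpline(t, c, k)(x)
    return B


def second_derivative_penalty(t, k=3, ngrid=2000):
    """Omega[j, l] ~ integral of B_j''(s) B_l''(s) ds (rectangle-rule quadrature)."""
    n_basis = len(t) - k - 1
    s = np.linspace(t[k], t[-k - 1], ngrid)
    D2 = np.zeros((ngrid, n_basis))
    for j in range(n_basis):
        c = np.zeros(n_basis)
        c[j] = 1.0
        D2[:, j] = BSpline(t, c, k).derivative(2)(s)
    return np.einsum("sj,sl->jl", D2, D2) * (s[1] - s[0])


def effective_df(x, t, lam, k=3):
    """df(lambda) = trace(S_lambda) for S = B (B'B + lam*Omega)^{-1} B'."""
    B = bspline_design(x, t, k)
    Omega = second_derivative_penalty(t, k)
    return np.trace(np.linalg.solve(B.T @ B + lam * Omega, B.T @ B))


def lam_for_df(x, t, target_df, lo=-8.0, hi=8.0):
    """df(lambda) decreases monotonically in lambda, so bisect on log10(lambda)."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if effective_df(x, t, 10.0 ** mid) > target_df:
            lo = mid          # fit still too flexible: increase lambda
        else:
            hi = mid
    return 10.0 ** (0.5 * (lo + hi))


x = np.linspace(0.0, 1.0, 100)
t = np.r_[[0.0] * 4, np.linspace(0.1, 0.9, 9), [1.0] * 4]   # clamped cubic knots
print(lam_for_df(x, t, target_df=9.0))
```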

6
Bias-Variance Tradeoff
  • Sample data generation
  • N = 100 pairs $(x_i, y_i)$ drawn independently from the model below
    (the example of Hastie et al., 2001):
    $X \sim U[0, 1]$, $Y = f(X) + \varepsilon$, $\varepsilon \sim N(0, 1)$,
    $f(X) = \sin(12(X + 0.2)) / (X + 0.2)$
  • Pointwise standard error bands at a given point $x$:
    $\hat f_\lambda(x) \pm 2 \cdot \mathrm{se}(\hat f_\lambda(x))$
    (a simulation sketch follows)
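A sketch of the data-generating step, assuming the slide follows the Hastie et al. (2001) example quoted above; the exact model is an assumption recovered from that reference.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100

# Assumed model (as in Hastie et al., 2001): X ~ U[0, 1],
# f(X) = sin(12 (X + 0.2)) / (X + 0.2),  Y = f(X) + eps,  eps ~ N(0, 1).
x = np.sort(rng.uniform(0.0, 1.0, size=N))
f_true = np.sin(12.0 * (x + 0.2)) / (x + 0.2)
y = f_true + rng.normal(size=N)
```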

7
Bias-Variance Tradeoff
8
(No Transcript)
9
Observations (Bias-Variance Tradeoff)
  • df = 5
  • Spline underfits
  • Bias is most dramatic in regions of high curvature
  • Narrow se bands ⇒ a bad estimate delivered with great (apparent) reliability
  • df = 9
  • Close to the true function
  • Slight bias
  • Small variance
  • df = 15
  • Wiggly, but close to the true function
  • Increased width of the se bands

10
Integrated Squared Prediction Error (EPE)
  • Combines both bias and variance:
    $\mathrm{EPE}(\hat f_\lambda) = \mathrm{E}\,(Y - \hat f_\lambda(X))^2
      = \sigma^2 + \mathrm{E}\big[\mathrm{Bias}^2(\hat f_\lambda(X)) + \mathrm{Var}(\hat f_\lambda(X))\big]$
    (a Monte Carlo sketch follows)
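Since the EPE needs the true $f$, it can only be computed in a simulation. The sketch below estimates the bias-variance decomposition by Monte Carlo under the assumed model from the previous slide, using make_smoothing_spline at a fixed, illustrative $\lambda$; the grid, repetition count, and lam are assumptions.

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(2)
grid = np.linspace(0.1, 0.9, 150)
f_grid = np.sin(12.0 * (grid + 0.2)) / (grid + 0.2)   # assumed true f


def one_fit(lam, N=100):
    """Draw one sample from the assumed model and return f_hat on the grid."""
    x = np.sort(rng.uniform(0.0, 1.0, size=N))
    y = np.sin(12.0 * (x + 0.2)) / (x + 0.2) + rng.normal(size=N)
    return make_smoothing_spline(x, y, lam=lam)(grid)


# Monte Carlo estimate of pointwise Bias^2 and Var of f_hat, then
# EPE ~ sigma^2 + average(Bias^2 + Var) over the grid (sigma^2 = 1 here).
fits = np.array([one_fit(lam=1e-4) for _ in range(200)])
bias2 = (fits.mean(axis=0) - f_grid) ** 2
var = fits.var(axis=0)
print("estimated EPE:", 1.0 + np.mean(bias2 + var))
```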

11
Smoothing parameter selection methods
  • The EPE calculation needs the true function, which is unknown in practice
  • Other methods:
  • Cross-Validation
  • Generalized Cross-Validation
  • Mallows' Cp criterion
  • Improved AIC ($\mathrm{AIC}_C$)

12
Cross Validation
  • Leave-one-out: the data are divided into N blocks of N − 1 points
  • Compute the squared error over all examples
    and take its average:
    $\mathrm{CV}(\lambda) = \frac{1}{N} \sum_{i=1}^{N} \big(y_i - \hat f_\lambda^{(-i)}(x_i)\big)^2$
  • $\lambda$ is chosen as the minimizer of $\mathrm{CV}(\lambda)$ (see the sketch below)
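A brute-force leave-one-out CV sketch, refitting N times with make_smoothing_spline; the data and the lambda grid are illustrative assumptions. For a linear smoother the refits can be avoided via the shortcut $\mathrm{CV}(\lambda) = \frac{1}{N}\sum_i \big[(y_i - \hat f_\lambda(x_i)) / (1 - S_\lambda(i,i))\big]^2$, but the direct form below matches the slide's description.

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline


def loo_cv(x, y, lam):
    """CV(lambda) = (1/N) * sum_i (y_i - f_lambda^(-i)(x_i))^2, by direct refitting."""
    errs = []
    for i in range(len(x)):
        keep = np.ones(len(x), dtype=bool)
        keep[i] = False
        fit = make_smoothing_spline(x[keep], y[keep], lam=lam)
        errs.append((y[i] - fit(x[i])) ** 2)
    return float(np.mean(errs))


rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0.0, 1.0, size=100))
y = np.sin(12.0 * (x + 0.2)) / (x + 0.2) + rng.normal(size=100)

lambdas = 10.0 ** np.linspace(-7.0, 0.0, 15)
scores = [loo_cv(x, y, lam) for lam in lambdas]
print("chosen lambda:", lambdas[int(np.argmin(scores))])
```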

13
Cross-Validation Estimate of EPE
  • The EPE and CV curves have a similar shape
  • The CV curve is approximately unbiased as an estimate of the EPE curve

14
Other Classical Methods
  • Generalized Cross-Validation:
    $\mathrm{GCV}(\lambda) = \frac{1}{N} \sum_{i=1}^{N}
      \left[\frac{y_i - \hat f_\lambda(x_i)}{1 - \mathrm{trace}(S_\lambda)/N}\right]^2$
  • Mallows' Cp criterion: RSS$(\lambda)$ plus a penalty proportional to
    $\hat\sigma^2\,\mathrm{trace}(S_\lambda)$
  • $\hat\sigma^2$: pre-chosen by CV

15
Classical Methods (contd)
  • Improved AIC ($\mathrm{AIC}_C$)
  • The finite-sample bias of the classical AIC is corrected
  • $\lambda$ is chosen as the minimizer of $\mathrm{AIC}_C(\lambda)$
    (helper functions for these criteria are sketched below)
  • Classical methods:
  • Tend to be highly variable
  • Have a tendency to undersmooth
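Small helper functions for the three criteria, given the residuals and $\mathrm{trace}(S_\lambda)$ of a linear smoother. The Cp variant and the $\mathrm{AIC}_C$ expression (the Hurvich-Simonoff-Tsai form) are common statements of these criteria and are assumptions here, not copied from the slides.

```python
import numpy as np


def gcv(y, y_hat, trace_S):
    """GCV(lambda) = (1/N) * RSS / (1 - trace(S_lambda)/N)^2."""
    N = len(y)
    rss = float(np.sum((y - y_hat) ** 2))
    return (rss / N) / (1.0 - trace_S / N) ** 2


def mallows_cp(y, y_hat, trace_S, sigma2):
    """One common Cp variant: RSS/N + 2 * sigma2 * trace(S_lambda) / N,
    with sigma2 pre-chosen (e.g. from a CV-selected pilot fit)."""
    N = len(y)
    return float(np.sum((y - y_hat) ** 2)) / N + 2.0 * sigma2 * trace_S / N


def aic_c(y, y_hat, trace_S):
    """Improved AIC (Hurvich, Simonoff & Tsai, 1998 form):
    log(RSS/N) + 1 + 2*(trace(S)+1) / (N - trace(S) - 2)."""
    N = len(y)
    sigma2_hat = float(np.sum((y - y_hat) ** 2)) / N
    return np.log(sigma2_hat) + 1.0 + 2.0 * (trace_S + 1.0) / (N - trace_S - 2.0)
```

Each criterion is evaluated on a grid of $\lambda$ values and the minimizer is chosen, exactly as with CV.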

16
Risk Estimation Methods
  • Require choosing pilot estimates
  • Risk Estimation using Classical Pilots (RECP)
  • Needs pilot estimates for $f$ and $\sigma^2$
  • Method 1: the blocking method
  • Method 2:
  • Get a pilot fit using a classical method
  • Use it to compute the pilot estimates of $f$ and $\sigma^2$
  • The final $\lambda$ is the minimizer of the resulting risk estimate,
    hoping that it is a good estimator of the true risk

17
Risk Estimation Methods (contd)
  • Exact Double Smoothing (EDS)
  • Another approach to choosing the pilot estimates
  • Uses two levels of pilot estimates
  • Start from the optimal parameter that minimizes the true risk
    (assumed known for the derivation)
  • and from the minimizer of a classical criterion as the second-level pilot
  • Derive a closed-form expression for the risk of the pilot estimate
  • Replace the unknown quantities in it with their pilot estimates
  • Choose $\lambda$ as the minimizer of the resulting risk estimate,
    where the pilot smoothing parameter minimizes the second-level criterion

18
Speed Comparisons
  • CV, GCV, and the improved AIC: roughly the same computational time
  • Cp, RECP: longer computational time
  • (2 numerical minimizations)
  • EDS: even longer (3 numerical minimizations)

19
Conclusions from simulations
  • No method performed uniformly best
  • The three classical methods - CV, GCV and Cp -
    gave very similar results
  • The improved AIC method never performed worse than CV, GCV or Cp
  • For a simple regression function with a high
    noise level, the two risk estimation methods (RECP
    and EDS) seem to be superior

20
Nonparametric Logistic Regression
  • Goal: approximate the conditional probability
    $\Pr(Y = 1 \mid X = x) = \dfrac{e^{f(x)}}{1 + e^{f(x)}}$,
    i.e. $\log \dfrac{\Pr(Y = 1 \mid X = x)}{\Pr(Y = 0 \mid X = x)} = f(x)$
  • Penalized log-likelihood:
    $\ell(f; \lambda) = \sum_{i=1}^{N} \big[y_i f(x_i) - \log(1 + e^{f(x_i)})\big]
      - \tfrac{\lambda}{2} \int \{f''(t)\}^2\,dt$

21
Nonparametric Logistic Regression (contd)
  • The parameters of $f(x)$ are set so as to maximize the log-likelihood
  • Parametric: $f(x) = \beta_0 + \beta^T x$ (ordinary logistic regression)
  • Non-parametric: $f(x) = \sum_{j=1}^{N} N_j(x)\,\theta_j$
    (natural spline basis with knots at the unique $x_i$)

22
Nonparametric Logistic Regression (contd)
  • $\mathbf{p}$: N-vector with elements $p(x_i) = \Pr(Y = 1 \mid x_i)$
  • $\mathbf{W}$: diagonal matrix of weights $p(x_i)(1 - p(x_i))$
  • The first derivative is non-linear in $f$ ⇒
  • Iterative algorithm (Newton-Raphson)

23
Nonparametric Logistic Regression (contd)
  • Newton-Raphson update:
    $\theta^{\mathrm{new}} = (\mathbf{N}^T \mathbf{W} \mathbf{N} + \lambda\Omega)^{-1}
      \mathbf{N}^T \mathbf{W} \mathbf{z}$,
    with working response
    $\mathbf{z} = \mathbf{N}\theta^{\mathrm{old}} + \mathbf{W}^{-1}(\mathbf{y} - \mathbf{p})$
  • The update fits a weighted smoothing spline to the working response $\mathbf{z}$
  • Although here $x$ is one-dimensional, this generalizes to higher-dimensional $x$
    (a sketch of the iteration follows)
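A sketch of the Newton-Raphson / IRLS loop for the penalized log-likelihood, using a cubic B-spline basis and a second-derivative penalty as stand-ins for the natural-spline quantities $\mathbf{N}$ and $\Omega$ on the slide; the knots, $\lambda$, the simulated data, and the fixed iteration count are assumptions.

```python
import numpy as np
from scipy.interpolate import BSpline


def bspline_design(x, t, k=3):
    """Design matrix N with one column per B-spline basis function."""
    n_basis = len(t) - k - 1
    B = np.zeros((len(x), n_basis))
    for j in range(n_basis):
        c = np.zeros(n_basis)
        c[j] = 1.0
        B[:, j] = BSpline(t, c, k)(x)
    return B


def second_derivative_penalty(t, k=3, ngrid=2000):
    """Omega[j, l] ~ integral of B_j''(s) B_l''(s) ds (rectangle-rule quadrature)."""
    n_basis = len(t) - k - 1
    s = np.linspace(t[k], t[-k - 1], ngrid)
    D2 = np.zeros((ngrid, n_basis))
    for j in range(n_basis):
        c = np.zeros(n_basis)
        c[j] = 1.0
        D2[:, j] = BSpline(t, c, k).derivative(2)(s)
    return np.einsum("sj,sl->jl", D2, D2) * (s[1] - s[0])


def penalized_irls(x, y, t, lam, k=3, n_iter=25):
    """Newton-Raphson for the penalized Bernoulli log-likelihood: each step
    fits a weighted, penalized spline to the working response z."""
    B = bspline_design(x, t, k)
    Omega = second_derivative_penalty(t, k)
    theta = np.zeros(B.shape[1])
    for _ in range(n_iter):
        f = B @ theta
        p = 1.0 / (1.0 + np.exp(-f))            # fitted probabilities
        w = np.maximum(p * (1.0 - p), 1e-8)     # diagonal of W
        z = f + (y - p) / w                     # working response
        BW = B.T * w                            # = N^T W
        # theta_new = (N^T W N + lam * Omega)^{-1} N^T W z
        theta = np.linalg.solve(BW @ B + lam * Omega, BW @ z)
    return theta


# Illustrative binary data with a smooth log-odds function.
rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0.0, 1.0, size=200))
p_true = 1.0 / (1.0 + np.exp(-3.0 * np.sin(2.0 * np.pi * x)))
y = rng.binomial(1, p_true)
t = np.r_[[0.0] * 4, np.linspace(0.1, 0.9, 9), [1.0] * 4]   # clamped cubic knots
theta_hat = penalized_irls(x, y, t, lam=1e-3)
print(theta_hat[:3])
```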

24
Multidimensional Splines
  • Suppose $X \in \mathbb{R}^2$
  • We have separate bases $h_{1k}(X_1)$, $k = 1, \dots, M_1$, and
    $h_{2k}(X_2)$, $k = 1, \dots, M_2$, for representing functions of $X_1$ and $X_2$
  • The $M_1 \times M_2$ dimensional tensor product basis is
    $g_{jk}(X) = h_{1j}(X_1)\,h_{2k}(X_2)$, so that
    $f(X) = \sum_{j=1}^{M_1} \sum_{k=1}^{M_2} \theta_{jk}\, g_{jk}(X)$
    (see the sketch below)
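A small NumPy sketch of forming the tensor product design matrix from two one-dimensional bases; simple polynomial bases stand in for whatever $h_{1k}$, $h_{2k}$ are used (an assumption made for brevity).

```python
import numpy as np


def tensor_product_design(B1, B2):
    """Columns are g_{jk}(x) = h_{1j}(x1) * h_{2k}(x2): every pairwise product
    of the two 1-D bases.  B1 is N x M1, B2 is N x M2, result is N x (M1*M2)."""
    return np.einsum("nj,nk->njk", B1, B2).reshape(B1.shape[0], -1)


x1 = np.linspace(0.0, 1.0, 50)
x2 = np.linspace(0.0, 1.0, 50)
B1 = np.vander(x1, 4)   # M1 = 4 basis functions of X1 (polynomials, for illustration)
B2 = np.vander(x2, 3)   # M2 = 3 basis functions of X2
G = tensor_product_design(B1, B2)
print(G.shape)          # (50, 12) -> M1 * M2 columns, as on the slide
```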

25
(No Transcript)
26
2D Function Basis
  • Can be generalized to higher dimensions
  • Dimension of the basis grows exponentially in the
    number of coordinates
  • MARS: a greedy algorithm that includes only those
    basis functions deemed necessary by least squares


27
Higher Dimensional Smoothing Splines
  • Suppose we have pairs $(x_i, y_i)$ with $x_i \in \mathbb{R}^d$
  • Want to find a d-dimensional regression function $f(x)$
  • Solve $\min_f \sum_{i=1}^{N} \{y_i - f(x_i)\}^2 + \lambda J[f]$
  • (in 1-D, $J[f] = \int \{f''(t)\}^2\,dt$)
  • $J$ is a penalty functional for stabilizing $f$ in $\mathbb{R}^d$

28
2D Roughness Penalty
  • Generalization of the 1-D penalty:
    $J[f] = \iint \Big[\big(\tfrac{\partial^2 f}{\partial x_1^2}\big)^2
      + 2\big(\tfrac{\partial^2 f}{\partial x_1 \partial x_2}\big)^2
      + \big(\tfrac{\partial^2 f}{\partial x_2^2}\big)^2\Big]\,dx_1\,dx_2$
  • Yields the thin plate spline (a smooth 2-D surface)

29
Thin Plate Spline
  • Properties in common with 1D cubic smoothing
    spline
  • As $\lambda \to 0$, the solution approaches an interpolating function
  • As $\lambda \to \infty$, the solution approaches the least squares plane
  • For intermediate values of $\lambda$, the solution is a
    linear expansion of basis functions

30
Thin Plate Spline
  • The solution has the form
    $f(x) = \beta_0 + \beta^T x + \sum_{j=1}^{N} \alpha_j\, h_j(x)$,
    where $h_j(x) = \eta(\lVert x - x_j \rVert)$ and $\eta(z) = z^2 \log z^2$
  • Can be generalized to arbitrary dimension (see the sketch below)
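SciPy's RBFInterpolator with kernel='thin_plate_spline' (SciPy 1.7+) fits a low-degree polynomial plus radial basis functions of this kind; its smoothing argument plays roughly the role of $\lambda$ (0 interpolates, large values give an increasingly flat, plane-like fit). The data below are an illustrative assumption.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(5)

# Scattered 2-D data (illustrative).
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = (np.sin(2 * np.pi * X[:, 0]) * np.cos(2 * np.pi * X[:, 1])
     + rng.normal(scale=0.2, size=200))

# kernel='thin_plate_spline' uses the thin-plate radial function;
# smoothing > 0 relaxes exact interpolation toward a smoother surface.
tps = RBFInterpolator(X, y, kernel="thin_plate_spline", smoothing=1.0)

query = rng.uniform(0.0, 1.0, size=(5, 2))
print(tps(query))
```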

31
(No Transcript)
32
Computation Speeds
  • 1-D splines: O(N)
  • Thin plate splines: O(N³)
  • Can use fewer than N knots
  • Using K knots reduces the cost to O(NK² + K³)

33
Additive Splines
  • Solution of the form $f(X) = \alpha + f_1(X_1) + \dots + f_d(X_d)$
  • Each $f_j$ is a univariate spline
  • Assume $f$ is additive and impose a penalty on each
    of the component functions:
    $J[f] = \sum_{j=1}^{d} \lambda_j \int \{f_j''(t_j)\}^2\,dt_j$
  • ANOVA spline decomposition:
    $f(X) = \alpha + \sum_j f_j(X_j) + \sum_{j<k} f_{jk}(X_j, X_k) + \cdots$
    (a backfitting sketch follows)
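One standard way to fit the additive form is backfitting: cycle over coordinates, smoothing the partial residuals with a univariate smoothing spline. The slide itself does not spell out the algorithm, so this is a sketch under assumed data, a shared $\lambda$, and a fixed iteration count.

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline


def backfit_additive(X, y, lam=1e-3, n_iter=20):
    """Backfitting for f(X) = alpha + sum_j f_j(X_j), each f_j a cubic
    smoothing spline fitted to the partial residuals."""
    N, d = X.shape
    alpha = y.mean()
    f = np.zeros((N, d))                 # current component fits f_j(x_ij)
    order = [np.argsort(X[:, j]) for j in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            r = y - alpha - f.sum(axis=1) + f[:, j]    # partial residual for f_j
            idx = order[j]
            fit = make_smoothing_spline(X[idx, j], r[idx], lam=lam)
            f[:, j] = fit(X[:, j])
            f[:, j] -= f[:, j].mean()                  # keep components centred
    return alpha, f


rng = np.random.default_rng(6)
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = np.sin(2 * np.pi * X[:, 0]) + (X[:, 1] - 0.5) ** 2 + rng.normal(scale=0.2, size=200)
alpha, f = backfit_additive(X, y)
print(alpha, f.mean(axis=0))
```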

34
Additive Vs Tensor Product
  • Tensor product basis can achieve more flexibility
    but introduces some spurious structure too.

35
Conclusion
  • Smoothing parameter selection and the bias-variance trade-off
  • Comparison of various classical and risk
    estimation methods for parameter selection
  • Non-parametric logistic regression
  • Extension to higher dimensions
  • Tensor Product
  • Thin plate splines
  • Additive Splines

36
References
  • Lee, T. (2002). Smoothing parameter selection for
    smoothing splines: a simulation study
  • (www.stat.colostate.edu/tlee/PSfiles/spline.ps.gz)
  • Hastie, T., Tibshirani, R., Friedman, J. (2001).
    The Elements of Statistical Learning: Data
    Mining, Inference, and Prediction