Transcript and Presenter's Notes

Title: Discussion of Least Angle Regression

1
Discussion of Least Angle Regression by Weisberg
  • Mike Salwan
  • November 2, 2006
  • Stat 882

2
Introduction
  • The "notorious" problem of automatic model-building
    algorithms for linear regression
  • The implicit assumption
  • Replacing Y by a projection without loss of
    information
  • Selecting variables
  • Summary

3
Implicit Assumption
  • We have an n × m matrix X and an n-vector Y
  • P is the projection onto the column space of (1, X)
  • LARS assumes we can replace Y with PY; in large
    samples F(y|x) = F(y|x'β) (see the sketch after
    this slide)
  • We estimate the residual variance σ² by
    ||Y - PY||² / (n - m - 1)
  • If this assumption does not hold, then LARS is
    unlikely to produce useful results
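
The following is a minimal sketch of the projection step, using simulated data and invented coefficients (it is not code from the discussion): it computes PY and the full-model residual variance with ordinary least squares.

    # Minimal sketch (simulated data): replace Y by its projection PY onto
    # the column space of (1, X) and estimate the residual variance.
    set.seed(1)
    n <- 100; m <- 5
    X <- matrix(rnorm(n * m), n, m)
    Y <- drop(X %*% c(2, -1, 0, 0, 1)) + rnorm(n)

    fit    <- lm(Y ~ X)                         # ordinary least squares on (1, X)
    PY     <- fitted(fit)                       # PY, the projection of Y
    sigma2 <- sum((Y - PY)^2) / (n - m - 1)     # full-model residual variance
    h      <- hatvalues(fit)                    # leverages h_i, the diagonal of P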

4
Implicit Assumption (cont)
  • Alternative: let F(y|x) = F(y|x'B), where B is an
    m × d matrix of rank d. The smallest such d is
    called the structural dimension of the regression
    problem
  • The R package dr can be used to estimate d using
    methods such as sliced inverse regression (see the
    sketch after this slide)
  • One can then model y with a smooth function of this
    small set of projections of x
  • In the paper the predictors were expanded from 10
    to 64 so that F(y|x) = F(y|x'β) holds, at least
    approximately
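
As a concrete illustration of estimating d, here is a hedged sketch using the dr package on simulated data with one true direction. The data, formula, and the nslices value are invented for illustration and are not taken from the discussion.

    # Sketch: estimate the structural dimension d with sliced inverse
    # regression via the dr package (simulated data, one true direction).
    library(dr)
    set.seed(2)
    n  <- 500
    x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n); x4 <- rnorm(n)
    y  <- (x1 + x2) + (x1 + x2)^2 + 0.5 * rnorm(n)   # d = 1, direction (1, 1, 0, 0)

    fit <- dr(y ~ x1 + x2 + x3 + x4, method = "sir", nslices = 8)
    summary(fit)   # marginal dimension tests give an estimate of d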

5
Implicit Assumption (cont)
  • LARS relies too much on correlations
  • Correlation measures the degree of linear
    association
  • It requires linearity in the conditional
    distributions of y given a'x and of a'x given b'x,
    for all a and b; otherwise bizarre results can
    occur (see the small example after this slide)
  • Any method replacing Y by PY cannot be sensitive
    to nonlinearity
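
A tiny illustration of the linearity point, on simulated data (not an example from the discussion): when the dependence is strongly nonlinear, the correlation that drives each LARS step can be near zero even though y is essentially a function of x.

    # Sketch: correlation can miss a strong but nonlinear relationship.
    set.seed(3)
    x <- rnorm(1000)
    y <- x^2 + 0.1 * rnorm(1000)   # y is almost entirely determined by x
    cor(x, y)                      # approximately 0
    cor(x^2, y)                    # approximately 1, once the right transform is known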

6
Implicit Assumption (cont)
  • Methods based on PY alone can be strongly
    influenced by outliers and high leverage cases
  • Consider the Cp statistic from the paper,
    Cp(µ) = ||Y - µ||²/σ² - n + 2 Σi cov(µi, yi)/σ²,
    for a fitted vector µ
  • Estimate σ² by ||Y - PY||²/(n - m - 1), the
    residual mean square from the full model
  • For the full-model fit µ = PY, cov(ŷi, yi) = σ²hi,
    so the ith term is given by (yi - ŷi)²/σ² - 1 + 2hi
  • ŷi is the ith element of PY and hi is the ith
    leverage, which is a diagonal element of P
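
The per-case terms defined above can be computed directly from an ordinary least squares fit. The sketch below uses simulated data with one artificially high-leverage case (an illustration only, not the article's simulation); its contribution to Cp comes almost entirely from its leverage rather than from lack of fit.

    # Sketch: per-case terms (yi - yhat_i)^2 / sigma2 - 1 + 2 * h_i of Cp
    # for the full OLS fit, with one high-leverage case.
    set.seed(7)
    n <- 50; m <- 3
    X <- matrix(rnorm(n * m), n, m)
    X[1, ] <- c(8, 8, 8)                       # a single high-leverage case
    y <- drop(X %*% c(1, 1, 0)) + rnorm(n)

    fit    <- lm(y ~ X)
    h      <- hatvalues(fit)                   # leverages h_i (diagonal of P)
    sigma2 <- sum(residuals(fit)^2) / (n - m - 1)
    cp_i   <- residuals(fit)^2 / sigma2 - 1 + 2 * h
    cp_i[1]        # driven by 2 * h_1, not by a large residual
    sum(cp_i)      # equals m + 1 when sigma2 is the full-model estimate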

7
Implicit Assumption (cont)
  • From the simulation in the article, we can
    approximate the covariance term cov(µi, yi) by
    σ²ui, where ui is the ith diagonal of the
    projection matrix onto the columns of (1, X) used
    at the current step of the algorithm
  • Thus, Cp ≈ Σi [(yi - µi)²/σ² - 1 + 2ui]
  • This is the same formula as in an earlier paper by
    Weisberg, where the fitted value is computed from
    LARS instead of from a projection

8
Implicit Assumption (cont)
  • The value of the ith term depends on the agreement
    between the fitted value µi and yi, on the leverage
    in the subset model, and on the difference in
    leverage between the full and subset models (one
    way to write this out is given after this slide)
  • Neither of the latter two terms has much to do
    with the problem of interest (the study of the
    conditional distribution of y given x); they are
    determined by the predictors only
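
One way to make the three-way dependence explicit is to rearrange the ith term algebraically. This is a reconstruction from the definitions on the preceding slides, not necessarily the exact form used in the discussion:

    (yi - µi)²/σ² - 1 + 2ui
        = [(yi - µi)²/σ² - (1 - hi)]    (agreement between µi and yi)
          + (ui - hi)                   (difference in leverage, subset vs. full)
          + ui                          (leverage in the subset model)

Only the first bracket involves the response; the remaining two terms are determined by the predictors alone.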

9
Selecting Variables
  • We want to decompose x into two parts, xu and xa,
    where xa represents the active predictors
  • We want the smallest xa such that F(y|x) = F(y|xa),
    usually chosen using some model selection criterion
  • Standard methods are too greedy
  • LARS permits highly correlated predictors to be
    used

10
Selecting Variables (cont)
  • An example illustrating a weakness of LARS
  • Nine new variables were added by multiplying the
    original variables by 2.2 and rounding to the
    nearest integer
  • The LARS method was applied with both sets of
    variables available (a sketch of this kind of
    experiment, on simulated data, follows this slide)
  • LARS selects two of the rounded variables and, for
    one variable (BP), selects both the original and
    its rounded copy
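
A sketch of this kind of experiment on simulated data (the data from the paper are not used here; the variables and coefficients are invented): coarsened copies of the predictors are added and the entry order of LARS is inspected.

    # Sketch: add rounded near-copies of the predictors and see which
    # copies LARS brings into the active set first.
    library(lars)
    set.seed(4)
    n <- 200; m <- 5
    X <- matrix(rnorm(n * m, mean = 10, sd = 3), n, m)
    colnames(X) <- paste0("x", 1:m)
    y <- drop(X %*% c(1, -1, 0.5, 0, 0)) + rnorm(n)

    Xr <- round(2.2 * X)                       # rounded near-duplicates
    colnames(Xr) <- paste0("r", 1:m)
    fit <- lars(cbind(X, Xr), y, type = "lar")

    # First step at which each variable enters (NA = never enters)
    entry <- apply(coef(fit) != 0, 2, function(z) which(z)[1])
    names(entry) <- c(colnames(X), colnames(Xr))
    sort(entry)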

11
Selecting Variables (cont)
  • Whether a variable is included or excluded depends
    on the marginal distribution of x as much as on
    the conditional distribution of y given x
  • Example: two variables have a high correlation
  • LARS selects one of them for its active set
  • Modify the other so that it is now uncorrelated
    with the first
  • This does not change the distribution of y given
    x, only the marginal distribution of x
  • Yet it could change the set of active predictors
    selected by LARS, or by any method that uses
    correlation (a simulated illustration follows this
    slide)
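
The thought experiment above can be mimicked with simulated data (this sketch is illustrative, not from the discussion): the coefficients and error law, and hence y given x, are held fixed, while the marginal distribution of the predictors changes.

    # Sketch: same conditional law y | x, two different marginal
    # distributions for x; the LARS coefficient paths differ.
    library(lars)
    set.seed(5)
    n <- 200
    beta <- c(1, 1)                              # same y | x in both settings

    # Setting A: x1 and x2 are highly correlated
    x1 <- rnorm(n); x2 <- x1 + 0.05 * rnorm(n)
    XA <- cbind(x1 = x1, x2 = x2)
    yA <- drop(XA %*% beta) + rnorm(n)
    coef(lars(XA, yA, type = "lar"))             # at early steps one predictor does all the work

    # Setting B: x2 is replaced so that it is uncorrelated with x1
    XB <- cbind(x1 = x1, x2 = rnorm(n))
    yB <- drop(XB %*% beta) + rnorm(n)           # generated from the same beta
    coef(lars(XB, yB, type = "lar"))             # both predictors enter with similar coefficients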

12
Selecting Variables (cont)
  • LARS results are invariant under rescaling of
    individual predictors, but not under
    reparameterization of related predictors
  • Scaling the predictors first and then adding all
    cross-products and quadratics gives a different
    model than forming the cross-products and
    quadratics first and then scaling (see the sketch
    after this slide)
  • This could be addressed by considering related
    terms simultaneously, but that is self-defeating
    for subset selection
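
A sketch of the ordering issue on simulated data (illustrative only): forming quadratics and cross-products before versus after centering and scaling the predictors yields different columns, and in general different LARS paths, even though the two expanded models span equivalent spaces.

    # Sketch: quadratics built before vs. after scaling give different columns.
    library(lars)
    set.seed(6)
    n  <- 200
    x1 <- rnorm(n, mean = 5); x2 <- rnorm(n, mean = -3)
    y  <- x1 + x2 + 0.5 * x1 * x2 + rnorm(n)

    # (a) scale first, then form quadratics and the cross-product
    z1 <- scale(x1); z2 <- scale(x2)
    Xa <- cbind(z1, z2, z1^2, z2^2, z1 * z2)

    # (b) form quadratics and the cross-product first, then scale
    Xb <- scale(cbind(x1, x2, x1^2, x2^2, x1 * x2))

    entry <- function(f) apply(coef(f) != 0, 2, function(z) which(z)[1])
    entry(lars(Xa, y, type = "lar"))   # entry order under parameterization (a)
    entry(lars(Xb, y, type = "lar"))   # entry order under (b); typically not the same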

13
Summary
  • Problems gain notoriety because their solution is
    elusive yet of wide interest
  • Neither LARS nor any other automatic model
    selection method considers the context of the
    problem
  • There seems to be no foreseeable solution to this
    problem