Transcript and Presenter's Notes

Title: Discussion of Least Angle Regression

1
Discussion of Least Angle Regression by Weisberg
  • Mike Salwan
  • November 2, 2006
  • Stat 882

2
Introduction
  • The "notorious" problem of automatic model-building
    algorithms for linear regression
  • The implicit assumption
  • Replacing Y by a projection without loss of
    information
  • Selecting variables
  • Summary

3
Implicit Assumption
  • We have an n × m matrix X and an n-vector Y
  • P is the projection onto the column space of (1, X)
  • LARS assumes we can replace Y with PY; in large
    samples F(y|x) = F(y|x'β) (see the sketch after
    this slide)
  • We estimate the residual variance σ² by
    ||Y - PY||² / (n - m - 1)
  • If this assumption does not hold, then LARS is
    unlikely to produce useful results
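
The following is a minimal sketch of the projection step, using simulated data and invented coefficients (it is not code from the discussion): it computes PY and the full-model residual variance with ordinary least squares.

    # Minimal sketch (simulated data): replace Y by its projection PY onto
    # the column space of (1, X) and estimate the residual variance.
    set.seed(1)
    n <- 100; m <- 5
    X <- matrix(rnorm(n * m), n, m)
    Y <- drop(X %*% c(2, -1, 0, 0, 1)) + rnorm(n)

    fit    <- lm(Y ~ X)                         # ordinary least squares on (1, X)
    PY     <- fitted(fit)                       # PY, the projection of Y
    sigma2 <- sum((Y - PY)^2) / (n - m - 1)     # full-model residual variance
    h      <- hatvalues(fit)                    # leverages h_i, the diagonal of P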

4
Implicit Assumption (cont)
  • Alternative: let F(y|x) = F(y|x'B), where B is an
    m × d matrix of rank d. The smallest such d is
    called the structural dimension of the regression
    problem
  • The R package dr can be used to estimate d using
    methods such as sliced inverse regression (see the
    sketch after this slide)
  • One can then model y with a smooth function of this
    small set of projections of x
  • In the paper the predictors were expanded from 10
    to 64 so that F(y|x) = F(y|x'β) holds, at least
    approximately
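
As a concrete illustration of estimating d, here is a hedged sketch using the dr package on simulated data with one true direction. The data, formula, and the nslices value are invented for illustration and are not taken from the discussion.

    # Sketch: estimate the structural dimension d with sliced inverse
    # regression via the dr package (simulated data, one true direction).
    library(dr)
    set.seed(2)
    n  <- 500
    x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n); x4 <- rnorm(n)
    y  <- (x1 + x2) + (x1 + x2)^2 + 0.5 * rnorm(n)   # d = 1, direction (1, 1, 0, 0)

    fit <- dr(y ~ x1 + x2 + x3 + x4, method = "sir", nslices = 8)
    summary(fit)   # marginal dimension tests give an estimate of d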

5
Implicit Assumption (cont)
  • LARS relies too much on correlations
  • Correlation measures the degree of linear
    association
  • It requires linearity in the conditional
    distributions of y given a'x and of a'x given b'x,
    for all a and b; otherwise bizarre results can
    occur (see the small example after this slide)
  • Any method replacing Y by PY cannot be sensitive
    to nonlinearity
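
A tiny illustration of the linearity point, on simulated data (not an example from the discussion): when the dependence is strongly nonlinear, the correlation that drives each LARS step can be near zero even though y is essentially a function of x.

    # Sketch: correlation can miss a strong but nonlinear relationship.
    set.seed(3)
    x <- rnorm(1000)
    y <- x^2 + 0.1 * rnorm(1000)   # y is almost entirely determined by x
    cor(x, y)                      # approximately 0
    cor(x^2, y)                    # approximately 1, once the right transform is known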

6
Implicit Assumption (cont)
  • Methods based on PY alone can be strongly
    influenced by outliers and high leverage cases
  • Consider the Cp statistic from the paper,
    Cp(µ) = ||Y - µ||²/σ² - n + 2 Σi cov(µi, yi)/σ²,
    for a fitted vector µ
  • Estimate σ² by ||Y - PY||²/(n - m - 1), the
    residual mean square from the full model
  • For the full-model fit µ = PY, cov(ŷi, yi) = σ²hi,
    so the ith term is given by (yi - ŷi)²/σ² - 1 + 2hi
  • ŷi is the ith element of PY and hi is the ith
    leverage, which is a diagonal element of P
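
The per-case terms defined above can be computed directly from an ordinary least squares fit. The sketch below uses simulated data with one artificially high-leverage case (an illustration only, not the article's simulation); its contribution to Cp comes almost entirely from its leverage rather than from lack of fit.

    # Sketch: per-case terms (yi - yhat_i)^2 / sigma2 - 1 + 2 * h_i of Cp
    # for the full OLS fit, with one high-leverage case.
    set.seed(7)
    n <- 50; m <- 3
    X <- matrix(rnorm(n * m), n, m)
    X[1, ] <- c(8, 8, 8)                       # a single high-leverage case
    y <- drop(X %*% c(1, 1, 0)) + rnorm(n)

    fit    <- lm(y ~ X)
    h      <- hatvalues(fit)                   # leverages h_i (diagonal of P)
    sigma2 <- sum(residuals(fit)^2) / (n - m - 1)
    cp_i   <- residuals(fit)^2 / sigma2 - 1 + 2 * h
    cp_i[1]        # driven by 2 * h_1, not by a large residual
    sum(cp_i)      # equals m + 1 when sigma2 is the full-model estimate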

7
Implicit Assumption (cont)
  • From the simulation in the article, we can
    approximate the covariance term cov(µi, yi) by
    σ²ui, where ui is the ith diagonal of the
    projection matrix onto the columns of (1, X) used
    at the current step of the algorithm
  • Thus, Cp ≈ Σi [(yi - µi)²/σ² - 1 + 2ui]
  • This is the same formula as in an earlier paper by
    Weisberg, where the fitted value is computed from
    LARS instead of from a projection

8
Implicit Assumption (cont)
  • The value of the ith term depends on the agreement
    between the fitted value µi and yi, on the leverage
    in the subset model, and on the difference in
    leverage between the full and subset models (one
    way to write this out is given after this slide)
  • Neither of the latter two terms has much to do
    with the problem of interest (the study of the
    conditional distribution of y given x); they are
    determined by the predictors only
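
One way to make the three-way dependence explicit is to rearrange the ith term algebraically. This is a reconstruction from the definitions on the preceding slides, not necessarily the exact form used in the discussion:

    (yi - µi)²/σ² - 1 + 2ui
        = [(yi - µi)²/σ² - (1 - hi)]    (agreement between µi and yi)
          + (ui - hi)                   (difference in leverage, subset vs. full)
          + ui                          (leverage in the subset model)

Only the first bracket involves the response; the remaining two terms are determined by the predictors alone.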

9
Selecting Variables
  • We want to decompose x into two parts, xu and xa,
    where xa represents the active predictors
  • We want the smallest xa such that F(y|x) = F(y|xa),
    usually chosen using some model selection criterion
  • Standard methods are too greedy
  • LARS permits highly correlated predictors to be
    used

10
Selecting Variables (cont)
  • An example illustrating a weakness of LARS
  • Nine new variables were added by multiplying the
    original variables by 2.2 and rounding to the
    nearest integer
  • The LARS method was applied with both sets of
    variables available (a sketch of this kind of
    experiment, on simulated data, follows this slide)
  • LARS selects two of the rounded variables and, for
    one variable (BP), selects both the original and
    its rounded copy
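
A sketch of this kind of experiment on simulated data (the data from the paper are not used here; the variables and coefficients are invented): coarsened copies of the predictors are added and the entry order of LARS is inspected.

    # Sketch: add rounded near-copies of the predictors and see which
    # copies LARS brings into the active set first.
    library(lars)
    set.seed(4)
    n <- 200; m <- 5
    X <- matrix(rnorm(n * m, mean = 10, sd = 3), n, m)
    colnames(X) <- paste0("x", 1:m)
    y <- drop(X %*% c(1, -1, 0.5, 0, 0)) + rnorm(n)

    Xr <- round(2.2 * X)                       # rounded near-duplicates
    colnames(Xr) <- paste0("r", 1:m)
    fit <- lars(cbind(X, Xr), y, type = "lar")

    # First step at which each variable enters (NA = never enters)
    entry <- apply(coef(fit) != 0, 2, function(z) which(z)[1])
    names(entry) <- c(colnames(X), colnames(Xr))
    sort(entry)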

11
Selecting Variables (cont)
  • Whether a variable is included or excluded depends
    on the marginal distribution of x as much as on
    the conditional distribution of y given x
  • Example: two variables have a high correlation
  • LARS selects one of them for its active set
  • Modify the other so that it is now uncorrelated
    with the first
  • This does not change the distribution of y given
    x, only the marginal distribution of x
  • Yet it could change the set of active predictors
    selected by LARS, or by any method that uses
    correlation (a simulated illustration follows this
    slide)
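
The thought experiment above can be mimicked with simulated data (this sketch is illustrative, not from the discussion): the coefficients and error law, and hence y given x, are held fixed, while the marginal distribution of the predictors changes.

    # Sketch: same conditional law y | x, two different marginal
    # distributions for x; the LARS coefficient paths differ.
    library(lars)
    set.seed(5)
    n <- 200
    beta <- c(1, 1)                              # same y | x in both settings

    # Setting A: x1 and x2 are highly correlated
    x1 <- rnorm(n); x2 <- x1 + 0.05 * rnorm(n)
    XA <- cbind(x1 = x1, x2 = x2)
    yA <- drop(XA %*% beta) + rnorm(n)
    coef(lars(XA, yA, type = "lar"))             # at early steps one predictor does all the work

    # Setting B: x2 is replaced so that it is uncorrelated with x1
    XB <- cbind(x1 = x1, x2 = rnorm(n))
    yB <- drop(XB %*% beta) + rnorm(n)           # generated from the same beta
    coef(lars(XB, yB, type = "lar"))             # both predictors enter with similar coefficients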

12
Selecting Variables (cont)
  • LARS results are invariant under rescaling of
    individual predictors, but not under
    reparameterization of related predictors
  • Scaling the predictors first and then adding all
    cross-products and quadratics gives a different
    model than forming the cross-products and
    quadratics first and then scaling (see the sketch
    after this slide)
  • This could be addressed by considering related
    terms simultaneously, but that is self-defeating
    for subset selection
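
A sketch of the ordering issue on simulated data (illustrative only): forming quadratics and cross-products before versus after centering and scaling the predictors yields different columns, and in general different LARS paths, even though the two expanded models span equivalent spaces.

    # Sketch: quadratics built before vs. after scaling give different columns.
    library(lars)
    set.seed(6)
    n  <- 200
    x1 <- rnorm(n, mean = 5); x2 <- rnorm(n, mean = -3)
    y  <- x1 + x2 + 0.5 * x1 * x2 + rnorm(n)

    # (a) scale first, then form quadratics and the cross-product
    z1 <- scale(x1); z2 <- scale(x2)
    Xa <- cbind(z1, z2, z1^2, z2^2, z1 * z2)

    # (b) form quadratics and the cross-product first, then scale
    Xb <- scale(cbind(x1, x2, x1^2, x2^2, x1 * x2))

    entry <- function(f) apply(coef(f) != 0, 2, function(z) which(z)[1])
    entry(lars(Xa, y, type = "lar"))   # entry order under parameterization (a)
    entry(lars(Xb, y, type = "lar"))   # entry order under (b); typically not the same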

13
Summary
  • Problems gain notoriety because their solution is
    elusive yet of wide interest
  • Neither LARS nor any other automatic model
    selection method considers the context of the
    problem
  • There seems to be no foreseeable solution to this
    problem