The Group Lasso for Logistic Regression
1
The Group Lasso for Logistic Regression
Lukas Meier, Sara van de Geer and Peter Bühlmann
Presenter: Lu Ren, ECE Dept., Duke
University, Sept. 19, 2008
2
Outline
  • From lasso to group lasso
  • Logistic group lasso
  • Algorithms for the logistic group lasso
  • Logistic group lasso-ridge hybrid
  • Simulation and application to splice site
    detection
  • Discussion

3
Lasso
A popular model selection and shrinkage
estimation method. In a linear regression set-up:
  • continuous response $y \in \mathbb{R}^n$
  • design matrix $X \in \mathbb{R}^{n \times p}$
  • parameter vector $\beta \in \mathbb{R}^p$
  • The lasso estimator is then defined as

$$\hat{\beta}_\lambda = \arg\min_{\beta} \Big\{ \| y - X\beta \|_2^2 + \lambda \sum_{j=1}^{p} |\beta_j| \Big\},$$

where $\lambda \ge 0$ is a tuning parameter, and larger $\lambda$ sets
some coefficients $\hat{\beta}_j$ exactly to 0.
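As a quick illustration (not part of the original slides), the shrinkage effect of $\lambda$ can be seen with scikit-learn's `Lasso` on synthetic data; scikit-learn's `alpha` plays the role of $\lambda$ up to a 1/(2n) scaling of the squared-error term.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))                    # design matrix
beta_true = np.array([3.0, -2.0] + [0.0] * 8)         # sparse parameter vector
y = X @ beta_true + 0.5 * rng.standard_normal(100)    # continuous response

# Larger alpha (the lambda of the slide) sets more coefficients exactly to 0.
for alpha in (0.01, 0.1, 1.0):
    fit = Lasso(alpha=alpha).fit(X, y)
    print(f"alpha={alpha}: {np.count_nonzero(fit.coef_)} non-zero coefficients")
```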
4
Group Lasso
In some cases not only continuous but also
categorical predictors (factors) are present. The
lasso solution is then not satisfactory, since it
selects individual dummy variables rather than
whole factors.
Extending the lasso penalty, the group lasso
estimator is

$$\hat{\beta}_\lambda = \arg\min_{\beta} \Big\{ \| y - X\beta \|_2^2 + \lambda \sum_{g=1}^{G} \| \beta_{I_g} \|_2 \Big\},$$

where $I_g$ is the index set belonging to the
$g$-th group of variables. The penalty does
variable selection at the group level,
being intermediate between the $\ell_1$- and
$\ell_2$-type penalties. It encourages
that either $\hat{\beta}_j = 0$ for all $j \in I_g$ or $\hat{\beta}_j \ne 0$ for all $j \in I_g$.
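A minimal sketch (mine, not from the slides) of the grouped penalty term; `groups` is an assumed list of index sets $I_g$:

```python
import numpy as np

def group_lasso_penalty(beta, groups, lam):
    """lam * sum over groups of the Euclidean norm of the coefficient block."""
    return lam * sum(np.linalg.norm(beta[idx]) for idx in groups)

beta = np.array([0.0, 0.0, 1.5, -0.3, 0.2])
groups = [np.array([0, 1]), np.array([2, 3, 4])]      # index sets I_1, I_2
print(group_lasso_penalty(beta, groups, lam=0.5))     # only group 2 contributes
```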

5
Connection
Consider a case with two factors, one represented by a
single coefficient $\beta_1 \in \mathbb{R}$
and the other by two coefficients $\beta_2 = (\beta_{21}, \beta_{22})^\top \in \mathbb{R}^2$.
Observe the contours of the penalty functions in the
resulting three-dimensional coefficient space:
the $\ell_1$-penalty treats the three co-ordinate
directions differently and encourages sparsity in
individual coefficients, while the $\ell_2$-penalty treats
all directions equally and does not encourage
sparsity.
Ref: Ming Yuan and Yi Lin, Model selection and
estimation in regression with grouped variables,
J. R. Statist. Soc. B, 68, 49-67, 2006.
6
Logistic Group Lasso
Independent and identically distributed
observations $(x_i, y_i)$, $i = 1, \ldots, n$
$x_i$: p-dimensional vector of predictors, partitioned
into $G$ groups $x_i = (x_{i,1}^\top, \ldots, x_{i,G}^\top)^\top$
$y_i \in \{0, 1\}$: a binary response variable
$df_g$: degrees of freedom (number of parameters) of the $g$-th group
The conditional probability is
$$P(y_i = 1 \mid x_i) = p_\beta(x_i) = \frac{\exp\{\eta_\beta(x_i)\}}{1 + \exp\{\eta_\beta(x_i)\}},
\qquad \text{with} \quad \eta_\beta(x_i) = \beta_0 + \sum_{g=1}^{G} x_{i,g}^\top \beta_g .$$
The estimator $\hat{\beta}_\lambda$ is given by the minimizer of
the convex function
$$S_\lambda(\beta) = -\ell(\beta) + \lambda \sum_{g=1}^{G} s(df_g)\, \|\beta_g\|_2 ,$$
where $\ell(\beta) = \sum_{i=1}^{n} \big[\, y_i\, \eta_\beta(x_i) - \log\{1 + \exp(\eta_\beta(x_i))\} \,\big]$
is the log-likelihood.
7
Logistic Group Lasso
$\lambda \ge 0$ controls the amount of penalization.
$s(df_g) = \sqrt{df_g}$ rescales the penalty with respect to the
dimensionality of $\beta_g$.
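The criterion $S_\lambda(\beta)$ above can be written directly in NumPy; this is a sketch under the slide's choice $s(df_g) = \sqrt{df_g}$, with an unpenalized intercept and variable names of my own.

```python
import numpy as np

def log_lik(beta0, beta, X, y):
    """Bernoulli log-likelihood with logit link: sum_i [y_i*eta_i - log(1+exp(eta_i))]."""
    eta = beta0 + X @ beta
    return np.sum(y * eta - np.logaddexp(0.0, eta))   # logaddexp(0, eta) = log(1 + e^eta)

def S_lambda(beta0, beta, X, y, groups, lam):
    """Penalized criterion: -l(beta) + lam * sum_g sqrt(df_g) * ||beta_g||_2."""
    penalty = sum(np.sqrt(len(idx)) * np.linalg.norm(beta[idx]) for idx in groups)
    return -log_lik(beta0, beta, X, y) + lam * penalty
```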
8
Optimization Algorithms
  • 1. Block co-ordinate descent
  • Cycle through the parameter groups and
    minimize the objective function $S_\lambda(\cdot)$, keeping
    all except the current group fixed.

9
Optimization Algorithms
If the groupwise optimality condition already holds at
$\beta_g = 0$, the group $\hat{\beta}_g$ is set to 0 while all other components
remain unchanged; otherwise the $g$-th block is minimized
numerically.

Let $\beta^{(t)}$ denote the parameter vector after $t$ block updates; it can be
shown that every limit point of the sequence $\{\beta^{(t)}\}_{t \ge 0}$ is a
minimum point of $S_\lambda(\cdot)$.
The blockwise minimizations of the active groups
must be performed numerically, which is sufficiently
fast for small group sizes and dimensions (a schematic
sketch is given below).
2. Block co-ordinate gradient descent: combine a
quadratic approximation of the log-likelihood
with an additional line search.
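A schematic rendering of the blockwise cycling of method 1 above (my own simplification, not the paper's exact algorithm): a group is set to 0 when the groupwise check passes, and otherwise updated by a generic inner optimizer; SciPy's `minimize` stands in for the numerical blockwise minimization, and the intercept is omitted.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def block_coordinate_descent(X, y, groups, lam, beta_init=None, n_cycles=50):
    """Very simplified blockwise minimization of the logistic group-lasso criterion."""
    beta = np.zeros(X.shape[1]) if beta_init is None else beta_init.copy()
    for _ in range(n_cycles):
        for idx in groups:
            w = lam * np.sqrt(len(idx))              # weight lam * s(df_g), s = sqrt
            old = beta[idx].copy()
            beta[idx] = 0.0
            # Gradient of -l at beta_g = 0, all other groups held fixed.
            grad_g = X[:, idx].T @ (expit(X @ beta) - y)
            if np.linalg.norm(grad_g) <= w:          # groupwise check: block stays at 0
                continue

            def block_crit(bg, idx=idx, w=w):
                b = beta.copy()
                b[idx] = bg
                eta = X @ b
                return -np.sum(y * eta - np.logaddexp(0.0, eta)) + w * np.linalg.norm(bg)

            # Numerical minimization over the current block only.
            beta[idx] = minimize(block_crit, old, method="Nelder-Mead").x
    return beta
```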
10
Optimization Algorithms
Armijo rule: an inexact line search. Let $\alpha^{(t)}$ be
the largest value in $\{\alpha_0 \delta^l\}_{l \ge 0}$ so that
$$S_\lambda\big(\beta^{(t)} + \alpha^{(t)} d^{(t)}\big) - S_\lambda\big(\beta^{(t)}\big) \le \alpha^{(t)} \sigma \Delta^{(t)},$$
where $d^{(t)}$ is the current search direction and $\Delta^{(t)}$ is the
improvement predicted by the quadratic model.
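A generic backtracking sketch of this rule; the constants `alpha0`, `delta`, `sigma` and the improvement term `Delta` are assumptions standing in for the paper's exact quantities.

```python
def armijo_step(S, beta, d, Delta, alpha0=1.0, delta=0.5, sigma=0.1, max_backtracks=30):
    """Return the largest alpha in {alpha0 * delta**l, l = 0, 1, ...} with
    S(beta + alpha*d) - S(beta) <= alpha * sigma * Delta (Delta < 0)."""
    S0 = S(beta)
    alpha = alpha0
    for _ in range(max_backtracks):
        if S(beta + alpha * d) - S0 <= alpha * sigma * Delta:
            return alpha
        alpha *= delta
    return alpha    # fall back to the smallest step tried
```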
11
Optimization Algorithms
  • Minimization with respect to the $g$-th parameter
    group depends on the quadratic approximation
  • only through its $g$-th block; here define
    $H_g^{(t)} = h_g^{(t)} I_{df_g}$.
  • A proper choice is to take $h_g^{(t)}$ from the diagonal of the
    corresponding Hessian block of $-\ell$, where
  • a lower bound $c_* > 0$ is imposed to ensure convergence.
  • To calculate the solutions $\hat{\beta}_\lambda$ on a grid of the penalty
    parameter, $\lambda_{\max} = \lambda_1 > \lambda_2 > \cdots > \lambda_K$, we can start at
    $\lambda_{\max}$, the smallest $\lambda$ for which all penalized groups are zero.

We use $\hat{\beta}_{\lambda_k}$ as a starting value for $\hat{\beta}_{\lambda_{k+1}}$ and
proceed iteratively until $\lambda_K$, with $\lambda_K$
equal or close to 0.
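A sketch of the warm-started path over a $\lambda$ grid, reusing the hypothetical `block_coordinate_descent` from the earlier sketch; $\lambda_{\max}$ is computed here under the same no-intercept simplification as that sketch, and the grid shape (`n_lambda`, `ratio`) is an assumption.

```python
import numpy as np

def solution_path(X, y, groups, n_lambda=20, ratio=0.01):
    """Group-lasso fits on a decreasing lambda grid, each warm-started from the last."""
    # Smallest penalty at which every group passes the zero check of the sketch
    # above, so the first fit is all zeros.
    score0 = X.T @ (y - 0.5)
    lam_max = max(np.linalg.norm(score0[idx]) / np.sqrt(len(idx)) for idx in groups)
    lambdas = lam_max * np.logspace(0.0, np.log10(ratio), n_lambda)
    beta = np.zeros(X.shape[1])
    path = []
    for lam in lambdas:                    # use beta_{lambda_k} to start fit k+1
        beta = block_coordinate_descent(X, y, groups, lam, beta_init=beta)
        path.append(beta.copy())
    return lambdas, path
```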
12
Hybrid Methods
  • Logistic group lasso-ridge hybrid
  • The models selected by the group lasso are large
    compared with the underlying true models
  • For the ordinary lasso, good prediction with
    smaller models can be obtained by using the
    relaxed lasso.

Define $\hat{I}_\lambda$ as the index set of the groups of
predictors selected by the group lasso with penalty parameter $\lambda$,
and $\Theta_\lambda = \{\beta : \beta_g = 0 \ \text{for all} \ g \notin \hat{I}_\lambda\}$
is the set of possible parameter vectors
of the corresponding submodel.
The group lasso-ridge hybrid estimator is
$$\hat{\beta}_{\lambda,\kappa} = \arg\min_{\beta \in \Theta_\lambda} \Big\{ -\ell(\beta) + \kappa \sum_{g \in \hat{I}_\lambda} \|\beta_g\|_2^2 \Big\};$$
$\kappa = 0$ is a special case called the group lasso-MLE
hybrid.
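A sketch of the two-stage refit (my rendering of the idea, not the paper's code): keep only the groups selected by the group lasso and refit them with an $\ell_2$ penalty via scikit-learn's logistic regression; a very large `C` essentially removes the ridge penalty and so mimics the group lasso-MLE hybrid.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def group_lasso_ridge_hybrid(X, y, beta_grouplasso, groups, C_ridge=1.0):
    """Refit the submodel of groups selected by the group lasso, with a ridge penalty."""
    selected = [idx for idx in groups if np.linalg.norm(beta_grouplasso[idx]) > 0]
    cols = np.concatenate(selected)        # assumes at least one group was selected
    refit = LogisticRegression(penalty="l2", C=C_ridge).fit(X[:, cols], y)
    beta = np.zeros(X.shape[1])
    beta[cols] = refit.coef_.ravel()
    return refit.intercept_[0], beta
```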
13
Simulation
First, sample instances of a nine-dimensional
multivariate normal distribution
with mean 0 and covariance matrix $\Sigma$.
Each component is transformed into a four-valued
categorical variable by using the quartiles of
the standard normal distribution, so that each of the
four levels has probability 1/4. In addition,
independent standard normal variables are simulated.
Four different cases are studied.
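A sketch of the covariate construction described here; the sample size and covariance matrix are placeholders of mine, since the slide does not reproduce them.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, d, rho = 500, 9, 0.2                                # assumed size and correlation
Sigma = rho * np.ones((d, d)) + (1 - rho) * np.eye(d)  # assumed covariance matrix
Z = rng.multivariate_normal(np.zeros(d), Sigma, size=n)

# Cut each coordinate at the N(0,1) quartiles -> four-valued categorical variables.
quartiles = norm.ppf([0.25, 0.5, 0.75])
W = np.digitize(Z, quartiles)                          # levels 0, 1, 2, 3

# Dummy-code each factor (dropping one reference level): 3 columns per factor.
X = np.column_stack([(W[:, j][:, None] == np.arange(1, 4)).astype(float)
                     for j in range(d)])
groups = [np.arange(3 * j, 3 * j + 3) for j in range(d)]
```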
14
Observations:
  • The group lasso seems to select unnecessarily
    large models with many noise variables.
  • The group lasso-MLE hybrid is very conservative
    in selecting terms.
  • The group lasso-ridge hybrid seems to be the best
    compromise and has the best prediction
    performance in terms of the log-likelihood score.
15
(No Transcript)
16
Application: Experiment
Splice sites are the regions between coding (exons)
and non-coding (introns) DNA segments.
Two training data sets: 5610 true and 5610 false
donor sites for model fitting, and
2805 true and 59804 false donor sites for validation. Test set:
4208 true and 89717 false donor sites. For a
threshold $\tau$ we assign an observation $x$ to
class 1 if $p_{\hat{\beta}}(x) > \tau$, and to class 0 otherwise. The Pearson
correlation between the true class membership and
the predicted class membership is used as the performance measure.
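The classification rule and evaluation measure can be sketched as follows; the threshold name `tau` and the function name are mine.

```python
import numpy as np

def pearson_class_correlation(y_true, prob, tau=0.5):
    """Assign class 1 when the fitted probability exceeds tau, class 0 otherwise,
    and return the Pearson correlation between true and predicted membership."""
    y_pred = (np.asarray(prob) > tau).astype(float)
    return np.corrcoef(np.asarray(y_true, dtype=float), y_pred)[0, 1]
```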
17
The best model with respect to the log-likelihood
score on the validation set is the group lasso
estimator.
The corresponding values of the Pearson correlation on the test set
are and ,
respectively. Whereas the group lasso solution
has some active three-way interactions, the group
lasso-ridge hybrid and the group lasso-MLE hybrid
contain only two-way interactions. The three-way
interactions of the group lasso solution seem to
be very weak.
18
(No Transcript)
19
Conclusions
  • Study the group lasso for logistic regression
  • Present efficient algorithm (automatic and much
    faster)
  • Propose the group lasso-ridge hybrid method
  • Apply to short DNA motif modelling and splice
    site detection