Bayesian Semi-Parametric Multiple Shrinkage - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Bayesian Semi-Parametric Multiple Shrinkage
Paper by Richard F. MacLehose, David B. Dunson
Duke University Machine Learning Group
Presented by Lu Ren
2
Outline
  • Motivation
  • Model and Lasso Prior
  • Semi-Parametric Multiple Shrinkage Priors
  • Posterior Computation
  • Experiment Results
  • Conclusions

3
Motivation
  1. Non-identified effects are commonplace with
    high-dimensional or correlated data, such as gene
    microarrays.
  2. Standard techniques use independent normal
    priors centered at zero, with the degree of
    shrinkage controlled by the prior variance.
  3. When sufficient prior knowledge is available,
    coefficients can be assumed exchangeable within
    specific groups and allowed to shrink toward
    different means.
  4. When such prior knowledge is lacking, a Bayesian
    semiparametric hierarchical model is proposed in
    this paper, by placing a DP prior on the unknown
    mean and scale parameters.

4
Model and Lasso Prior
Suppose we collect data $(y_i, x_i)$, $i = 1, \dots, n$, where
$x_i$ is a $p \times 1$ vector of predictors and $y_i$ is a
binary outcome. A standard approach is to estimate
the coefficients $\beta$ in a regression model
$\mathrm{logit}\,\Pr(y_i = 1 \mid x_i) = x_i' \beta$. For
large $p$, maximum likelihood estimates will
tend to have high variance and may not be
unique. However, we could incorporate a penalty
by using a lasso prior
$\beta_j \sim \mathrm{DE}(0, \lambda)$. The DE denotes a double exponential
distribution, equivalent to the scale mixture of normals
$\beta_j \mid \tau_j \sim N(0, \tau_j)$ with $\tau_j \sim \mathrm{Exp}(\lambda^2/2)$.
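The scale-mixture equivalence can be checked by simulation; a minimal Python sketch, where the function name and the choice $\lambda = 1$ are illustrative rather than from the paper:

```python
import math
import random

def sample_lasso_prior(lam, n, seed=0):
    """Draw beta ~ DE(0, lam) via the scale mixture of normals:
    tau ~ Exp(rate = lam^2 / 2), then beta | tau ~ N(0, tau)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        tau = rng.expovariate(lam ** 2 / 2.0)  # variance of the conditional normal
        draws.append(rng.gauss(0.0, math.sqrt(tau)))
    return draws

# DE(0, lam) has variance 2 / lam^2, so with lam = 1 the sample
# variance of many draws should be close to 2.
betas = sample_lasso_prior(lam=1.0, n=200_000)
sample_var = sum(b * b for b in betas) / len(betas)
```

Marginalizing the exponential mixing variable out of the conditional normal recovers the double exponential density, which is why the Gibbs sampler later updates the $\tau_j$ as ordinary mixing parameters.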
5
Multiple Shrinkage Prior
In many situations shrinkage toward non-null
values will be beneficial. Instead of inducing
shrinkage toward zero, the lasso model is
extended by introducing a mixture prior with
separate prior location and scale parameters:
$\beta_j \sim \mathrm{DE}(\mu_j, \lambda_j)$, $(\mu_j, \lambda_j) \sim P$, $P \sim \mathrm{DP}(\alpha P_0)$.
The data play more of a role in the choice of
the hyperparameters, while sparsity is favored
through a carefully tailored hyperprior. The DP prior
is nonparametric and allows clustering of
parameters to help reduce dimensionality.
6
Multiple Shrinkage Prior
The proposed prior structure:
$\beta_j \sim \mathrm{DE}(\mu_j, \lambda_j)$, $(\mu_j, \lambda_j) \sim P$, $P \sim \mathrm{DP}(\alpha P_0)$.
The amount of shrinkage a coefficient exhibits
toward its prior mean $\mu_j$ is determined by $\lambda_j$,
with larger values resulting in greater
shrinkage. Therefore, the hyperpriors on $(\mu_j, \lambda_j)$ are
specified to make the prior as sparse as possible.
7
Multiple Shrinkage Prior
Assume the coefficient-specific
hyperparameter values $(\mu_j, \lambda_j)$, $j = 1, \dots, p$, group into
$k \le p$ clusters $(\mu_h^*, \lambda_h^*)$, $h = 1, \dots, k$. The number of
clusters is controlled by $\alpha$, and the
coefficients are adaptively shrunk toward
non-zero locations. The prior's equivalent stick-breaking
form is
$P = \sum_{h=1}^{\infty} \pi_h \delta_{(\mu_h, \lambda_h)}$, $\pi_h = V_h \prod_{l < h} (1 - V_l)$,
with $(\mu_h, \lambda_h)$ fixed at the null component if $h = 1$
and $(\mu_h, \lambda_h) \sim P_0$ if $h > 1$.
The random variables
$V_h \sim \mathrm{Beta}(1, \alpha)$.
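The stick-breaking weights can be generated directly; a minimal Python sketch, where the truncation tolerance and $\alpha = 1$ are illustrative choices, not values from the paper:

```python
import random

def stick_breaking_weights(alpha, tol=1e-8, seed=1):
    """DP stick-breaking: pi_h = V_h * prod_{l<h} (1 - V_l),
    with V_h ~ Beta(1, alpha); truncate once the leftover
    stick length falls below tol."""
    rng = random.Random(seed)
    weights, stick = [], 1.0
    while stick > tol:
        v = rng.betavariate(1.0, alpha)
        weights.append(v * stick)
        stick *= 1.0 - v
    return weights

# The weights telescope: sum(pi_h) = 1 - leftover stick,
# so the truncated weights sum to (nearly) one.
w = stick_breaking_weights(alpha=1.0)
```

Each atom of the resulting discrete distribution would carry a location/scale pair $(\mu_h, \lambda_h)$; the discreteness is what makes distinct coefficients share hyperparameters with positive probability.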
8
Multiple Shrinkage Prior
  1. Small values of $\alpha$ make the number of clusters
    increase more slowly than the number of coefficients.
  2. Choosing a relatively large base-measure variance can
    give support to a wide range of possible prior means.
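The effect of $\alpha$ on cluster growth can be illustrated with the DP's Chinese-restaurant urn scheme; a sketch in Python, where the specific values of $\alpha$ and $p$ are illustrative:

```python
import random

def crp_num_clusters(alpha, n, seed=1):
    """One Chinese-restaurant-process draw: the number of clusters
    formed among n coefficients under a DP(alpha) prior."""
    rng = random.Random(seed)
    counts = []  # cluster sizes
    for i in range(n):
        # Open a new cluster with probability alpha / (alpha + i),
        # otherwise join an existing one proportionally to its size.
        if rng.random() < alpha / (alpha + i):
            counts.append(1)
        else:
            r = rng.random() * i
            acc = 0
            for k in range(len(counts)):
                acc += counts[k]
                if r < acc:
                    counts[k] += 1
                    break
    return len(counts)

# Average cluster counts for 200 coefficients: small alpha yields
# far fewer clusters than large alpha.
small = sum(crp_num_clusters(0.5, 200, s) for s in range(50)) / 50
large = sum(crp_num_clusters(10.0, 200, s) for s in range(50)) / 50
```

The expected number of clusters grows roughly like $\alpha \log(1 + p/\alpha)$, which is the sense in which small $\alpha$ makes clusters accumulate slowly.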

9
Multiple Shrinkage Prior
  • Treat coefficients falling within a small range around
    zero as having no meaningful biologic effect.
  • Default prior specification:
  1. Choose values for the scale hyperparameters that are
    large enough to encourage shrinkage
    but not so large as to overwhelm the data.
  2. Specify the null component's scale so the DE prior
    has prior credible intervals of unit width.
  3. Set the base-measure hyperparameters to assign
    95% probability to a very wide range of
    reasonable prior effects.

10
Multiple Shrinkage Prior
Some testing methods:
Assuming $\beta_j \in [-\delta, \delta]$ is null, let $H_j = 1$
indicate that predictor $j$ has some effect,
with probability
$\Pr(H_j = 1 \mid \text{data}) = \Pr(|\beta_j| > \delta \mid \text{data})$.
From MCMC, we estimate this probability as the proportion of
posterior draws with $|\beta_j| > \delta$. Or we can estimate the
posterior expected false discovery rate (FDR) for
a threshold $\kappa$ by averaging $1 - \Pr(H_j = 1 \mid \text{data})$
over the predictors flagged at that threshold. Or simply list the predictors
ordered by their posterior probabilities of
$H_j = 1$.
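The FDR estimate amounts to averaging null probabilities among flagged predictors; a minimal Python sketch, where the function names and the example probabilities are illustrative:

```python
def prob_effect_from_draws(draws_j, delta):
    """Estimate Pr(|beta_j| > delta | data) as the fraction of
    MCMC draws of beta_j falling outside the null region."""
    return sum(1 for b in draws_j if abs(b) > delta) / len(draws_j)

def posterior_fdr(prob_effect, kappa):
    """Posterior expected FDR at threshold kappa: the average of
    1 - Pr(effect) over predictors flagged with Pr(effect) > kappa."""
    flagged = [p for p in prob_effect if p > kappa]
    if not flagged:
        return 0.0
    return sum(1.0 - p for p in flagged) / len(flagged)

# Two predictors clear the kappa = 0.9 threshold; their expected
# proportion of false discoveries is (0.01 + 0.05) / 2 = 0.03.
probs = [0.99, 0.95, 0.60, 0.20]
fdr = posterior_fdr(probs, kappa=0.9)
```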
11
Posterior Computation
Assume the outcome $y_i = 1$ occurs when a
latent variable $z_i > 0$,
where $z_i = x_i' \beta + \epsilon_i$ and $\epsilon_i \sim N(0, \phi_i)$.
  • 1a. Augment the data with latent variables $z_i$
    sampled from their truncated normal full conditionals.
  • 1b. Update $\phi_i$ by sampling from its full conditional.
  • 2. Update the regression coefficients using the
    current estimates of $z$, $\phi$, $\mu$, and $T$
    by sampling from the following multivariate normal.

12
Posterior Computation
$\beta \sim N(E, V)$, where $V = (X' \Phi^{-1} X + T^{-1})^{-1}$ and
$E = V (X' \Phi^{-1} z + T^{-1} \mu)$. The
matrix $\Phi$ is an $n \times n$ diagonal
matrix with elements $\phi_i$, and $T$ is a $p \times p$
diagonal matrix with elements $\tau_j$.
  • 3. Update the mixing parameters $\tau_j$.
  • 4a. Update the prior location and scale
    parameters using a modified version of the
    retrospective stick-breaking algorithm.
  • Sample $V_h \sim \mathrm{Beta}(1 + n_h, \alpha + \sum_{l > h} n_l)$,
    where $n_h$ is the number of coefficients
    currently allocated to cluster $h$.
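Step 2's multivariate normal draw can be sketched with NumPy. This is a minimal sketch assuming diagonal $\Phi$ and $T$ as on this slide; the simulated data and the fixed $\phi_i = 1$ are illustrative stand-ins, not the logistic mixing weights the paper actually samples:

```python
import numpy as np

def update_beta(X, z, phi, tau, mu, rng):
    """Gibbs draw of beta | z ~ N(E, V) with
    V = (X' Phi^{-1} X + T^{-1})^{-1} and
    E = V (X' Phi^{-1} z + T^{-1} mu), where Phi = diag(phi)
    and T = diag(tau), mu the prior locations."""
    Xw = X / phi[:, None]                      # Phi^{-1} X
    V = np.linalg.inv(Xw.T @ X + np.diag(1.0 / tau))
    E = V @ (Xw.T @ z + mu / tau)
    return rng.multivariate_normal(E, V)

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, 0.0, -1.0])
z = X @ beta_true + rng.normal(size=n)         # pretend latent variables
phi = np.ones(n)                               # illustrative unit variances
tau = np.full(p, 10.0)                         # diffuse prior variances
mu = np.zeros(p)                               # prior locations
draw = update_beta(X, z, phi, tau, mu, rng)    # should land near beta_true
```

With informative data the draw concentrates near the least-squares solution, pulled toward $\mu$ by exactly the amount $T^{-1}$ dictates.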

13
Posterior Computation
The conditional distribution of the cluster location $\mu_h$ is
normal, with mean and variance computed from the
coefficients currently allocated to cluster $h$.
  • 4b. Sample the cluster scale $\lambda_h$ from its
    full conditional.
  • 4c. Update the vector of coefficient
    configurations using a Metropolis step.

14
Posterior Computation
where the allocation probability of coefficient $j$ to
cluster $h$ is proportional to $\pi_h \, \mathrm{DE}(\beta_j; \mu_h, \lambda_h)$,
and the normalizing
constant for coefficient $j$ sums these terms over the
instantiated clusters. To determine the proposal
configuration for the prior, sample
$u_j \sim \mathrm{Uniform}(0, 1)$. If $u_j$ falls beyond the cumulative
probability of the instantiated clusters, draw new values of
$(\mu_h, \lambda_h)$ from their prior until the cumulative probability
exceeds $u_j$. The new proposed configuration
moves coefficient $j$ to bin $h$. The acceptance
probability of moving from configuration $c$ to $c'$
is the usual Metropolis ratio.
16
Experiment Results
  • Simulation: 50 data sets for each of two settings.
  1. 400 observations and 20 parameters, with 10 of
    the parameters having a true effect of 2 and the
    remaining 10 having a true effect of 0.
  2. 100 observations and 200 parameters, 10 of which
    have a true effect of 2 while the remaining have
    a true effect of 0.
  • The results show that the multiple shrinkage
    prior offers improvement over the standard
    Bayesian lasso, and the reduction in MSE is
    largely a result of decreased bias.
  • (a) For the first 10 coefficients, the multiple
    shrinkage prior attains MSE = 0.03 compared to the
    standard lasso's MSE = 1.08; for the remaining
    10, MSE = 0.01 versus MSE = 0.04 for the standard lasso.

17
Experiment Results
(b) The MSE of the 10 coefficients with a true
effect of 2 is much lower in the multiple
shrinkage model (1.5 vs. 3.2); the remaining 190
coefficients are estimated with slightly higher
MSE in the multiple shrinkage prior than in the
standard lasso (0.08 vs. 0.01). 2. Experiments on
diabetes (Pima).
18
Experiment Results
The multiple shrinkage prior offered improvement,
with a lower misclassification rate than the
standard lasso and the SVM (21% vs. 22% and 23%,
respectively). 3. Multiple myeloma: analyze data from 80
individuals diagnosed with multiple myeloma to
determine whether any polymorphisms are related
to early age at onset. The predictor dimension is 135.
19
Experiment Results
20
Conclusions
  1. The multiple shrinkage prior provides greater
    flexibility in both the amount of shrinkage and
    the value toward which coefficients are shrunk.
  2. The new method can greatly decrease MSE (largely
    as a result of decreased bias), as
    demonstrated in the experimental results.