Bayesian Semi-Parametric Multiple Shrinkage - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Bayesian Semi-Parametric Multiple Shrinkage
Paper by Richard F. MacLehose, David B. Dunson
Duke University Machine Learning Group
Presented by Lu Ren
2
Outline
  • Motivation
  • Model and Lasso Prior
  • Semi-Parametric Multiple Shrinkage Priors
  • Posterior Computation
  • Experiment Results
  • Conclusions

3
Motivation
  1. Non-identified effects are commonplace with
    high-dimensional or correlated data, such as gene
    microarrays.
  2. Standard techniques use independent normal
    priors centered at zero, with the degree of
    shrinkage controlled by the prior variance.
  3. When sufficient prior knowledge is available,
    coefficients can be assumed exchangeable within
    specific groups and allowed to shrink toward
    different means.
  4. When such prior knowledge is lacking, a Bayesian
    semiparametric hierarchical model is proposed in
    this paper, by placing a DP prior on the unknown
    mean and scale parameters.

4
Model and Lasso Prior
Suppose we collect data $(y_i, x_i)$, $i = 1, \dots, n$, where
$x_i$ is a $p \times 1$ vector of predictors and $y_i$ is a
binary outcome. A standard approach is to estimate
the coefficients $\beta$ in a regression model
$\mathrm{logit}\,\Pr(y_i = 1 \mid x_i) = x_i' \beta$. For
large $p$, maximum likelihood estimates will
tend to have high variance and may not be
unique. However, we could incorporate a penalty
by using a lasso prior
$\beta_j \sim \mathrm{DE}(0, \lambda)$. The DE denotes a double exponential
distribution, equivalent to the scale mixture of normals
$\beta_j \mid \tau_j \sim N(0, \tau_j)$ with $\tau_j \sim \mathrm{Exp}(\lambda^2/2)$.
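The scale-mixture equivalence can be checked by simulation; a minimal Python sketch, where the function name and the choice $\lambda = 1$ are illustrative rather than from the paper:

```python
import math
import random

def sample_lasso_prior(lam, n, seed=0):
    """Draw beta ~ DE(0, lam) via the scale mixture of normals:
    tau ~ Exp(rate = lam^2 / 2), then beta | tau ~ N(0, tau)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        tau = rng.expovariate(lam ** 2 / 2.0)  # variance of the conditional normal
        draws.append(rng.gauss(0.0, math.sqrt(tau)))
    return draws

# DE(0, lam) has variance 2 / lam^2, so with lam = 1 the sample
# variance of many draws should be close to 2.
betas = sample_lasso_prior(lam=1.0, n=200_000)
sample_var = sum(b * b for b in betas) / len(betas)
```

Marginalizing the exponential mixing variable out of the conditional normal recovers the double exponential density, which is why the Gibbs sampler later updates the $\tau_j$ as ordinary mixing parameters.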
5
Multiple Shrinkage Prior
In many situations shrinkage toward non-null
values will be beneficial. Instead of inducing
shrinkage toward zero, the lasso model is
extended by introducing a mixture prior with
separate prior location and scale parameters:
$\beta_j \sim \mathrm{DE}(\mu_j, \lambda_j)$, $(\mu_j, \lambda_j) \sim P$, $P \sim \mathrm{DP}(\alpha P_0)$.
The data play more of a role in the choice of
the hyperparameters, while sparsity is favored
through a carefully tailored hyperprior. The DP prior
is nonparametric and allows clustering of
parameters to help reduce dimensionality.
6
Multiple Shrinkage Prior
The proposed prior structure:
$\beta_j \sim \mathrm{DE}(\mu_j, \lambda_j)$, $(\mu_j, \lambda_j) \sim P$, $P \sim \mathrm{DP}(\alpha P_0)$.
The amount of shrinkage a coefficient exhibits
toward its prior mean $\mu_j$ is determined by $\lambda_j$,
with larger values resulting in greater
shrinkage. Therefore, the hyperpriors on $(\mu_j, \lambda_j)$ are
specified to make the prior as sparse as possible.
7
Multiple Shrinkage Prior
Assume the coefficient-specific
hyperparameter values $(\mu_j, \lambda_j)$, $j = 1, \dots, p$, group into
$k \le p$ clusters $(\mu_h^*, \lambda_h^*)$, $h = 1, \dots, k$. The number of
clusters is controlled by $\alpha$, and the
coefficients are adaptively shrunk toward
non-zero locations. The prior's equivalent stick-breaking
form is
$P = \sum_{h=1}^{\infty} \pi_h \delta_{(\mu_h, \lambda_h)}$, $\pi_h = V_h \prod_{l < h} (1 - V_l)$,
with $(\mu_h, \lambda_h)$ fixed at the null component if $h = 1$
and $(\mu_h, \lambda_h) \sim P_0$ if $h > 1$.
The random variables
$V_h \sim \mathrm{Beta}(1, \alpha)$.
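The stick-breaking weights can be generated directly; a minimal Python sketch, where the truncation tolerance and $\alpha = 1$ are illustrative choices, not values from the paper:

```python
import random

def stick_breaking_weights(alpha, tol=1e-8, seed=1):
    """DP stick-breaking: pi_h = V_h * prod_{l<h} (1 - V_l),
    with V_h ~ Beta(1, alpha); truncate once the leftover
    stick length falls below tol."""
    rng = random.Random(seed)
    weights, stick = [], 1.0
    while stick > tol:
        v = rng.betavariate(1.0, alpha)
        weights.append(v * stick)
        stick *= 1.0 - v
    return weights

# The weights telescope: sum(pi_h) = 1 - leftover stick,
# so the truncated weights sum to (nearly) one.
w = stick_breaking_weights(alpha=1.0)
```

Each atom of the resulting discrete distribution would carry a location/scale pair $(\mu_h, \lambda_h)$; the discreteness is what makes distinct coefficients share hyperparameters with positive probability.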
8
Multiple Shrinkage Prior
  1. Small values of $\alpha$ make the number of clusters
    increase more slowly than the number of coefficients.
  2. Choosing a relatively large base-measure variance can
    give support to a wide range of possible prior means.
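The effect of $\alpha$ on cluster growth can be illustrated with the DP's Chinese-restaurant urn scheme; a sketch in Python, where the specific values of $\alpha$ and $p$ are illustrative:

```python
import random

def crp_num_clusters(alpha, n, seed=1):
    """One Chinese-restaurant-process draw: the number of clusters
    formed among n coefficients under a DP(alpha) prior."""
    rng = random.Random(seed)
    counts = []  # cluster sizes
    for i in range(n):
        # Open a new cluster with probability alpha / (alpha + i),
        # otherwise join an existing one proportionally to its size.
        if rng.random() < alpha / (alpha + i):
            counts.append(1)
        else:
            r = rng.random() * i
            acc = 0
            for k in range(len(counts)):
                acc += counts[k]
                if r < acc:
                    counts[k] += 1
                    break
    return len(counts)

# Average cluster counts for 200 coefficients: small alpha yields
# far fewer clusters than large alpha.
small = sum(crp_num_clusters(0.5, 200, s) for s in range(50)) / 50
large = sum(crp_num_clusters(10.0, 200, s) for s in range(50)) / 50
```

The expected number of clusters grows roughly like $\alpha \log(1 + p/\alpha)$, which is the sense in which small $\alpha$ makes clusters accumulate slowly.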

9
Multiple Shrinkage Prior
  • Treat coefficients falling within a small range around
    zero as having no meaningful biologic effect.
  • Default prior specification:
  1. Choose values for the scale hyperparameters that are
    large enough to encourage shrinkage
    but not so large as to overwhelm the data.
  2. Specify the null component's scale so the DE prior
    has prior credible intervals of unit width.
  3. Set the base-measure hyperparameters to assign
    95% probability to a very wide range of
    reasonable prior effects.

10
Multiple Shrinkage Prior
Some testing methods:
Assuming $\beta_j \in [-\delta, \delta]$ is null, let $H_j = 1$
indicate that predictor $j$ has some effect,
with probability
$\Pr(H_j = 1 \mid \text{data}) = \Pr(|\beta_j| > \delta \mid \text{data})$.
From MCMC, we estimate this probability as the proportion of
posterior draws with $|\beta_j| > \delta$. Or we can estimate the
posterior expected false discovery rate (FDR) for
a threshold $\kappa$ by averaging $1 - \Pr(H_j = 1 \mid \text{data})$
over the predictors flagged at that threshold. Or simply list the predictors
ordered by their posterior probabilities of
$H_j = 1$.
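The FDR estimate amounts to averaging null probabilities among flagged predictors; a minimal Python sketch, where the function names and the example probabilities are illustrative:

```python
def prob_effect_from_draws(draws_j, delta):
    """Estimate Pr(|beta_j| > delta | data) as the fraction of
    MCMC draws of beta_j falling outside the null region."""
    return sum(1 for b in draws_j if abs(b) > delta) / len(draws_j)

def posterior_fdr(prob_effect, kappa):
    """Posterior expected FDR at threshold kappa: the average of
    1 - Pr(effect) over predictors flagged with Pr(effect) > kappa."""
    flagged = [p for p in prob_effect if p > kappa]
    if not flagged:
        return 0.0
    return sum(1.0 - p for p in flagged) / len(flagged)

# Two predictors clear the kappa = 0.9 threshold; their expected
# proportion of false discoveries is (0.01 + 0.05) / 2 = 0.03.
probs = [0.99, 0.95, 0.60, 0.20]
fdr = posterior_fdr(probs, kappa=0.9)
```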
11
Posterior Computation
Assume the outcome $y_i = 1$ occurs when a
latent variable $z_i > 0$,
where $z_i = x_i' \beta + \epsilon_i$ and $\epsilon_i \sim N(0, \phi_i)$.
  • 1a. Augment the data with latent variables $z_i$
    sampled from their truncated normal full conditionals.
  • 1b. Update $\phi_i$ by sampling from its full conditional.
  • 2. Update the regression coefficients using the
    current estimates of $z$, $\phi$, $\mu$, and $T$
    by sampling from the following multivariate normal.

12
Posterior Computation
$\beta \sim N(E, V)$, where $V = (X' \Phi^{-1} X + T^{-1})^{-1}$ and
$E = V (X' \Phi^{-1} z + T^{-1} \mu)$. The
matrix $\Phi$ is an $n \times n$ diagonal
matrix with elements $\phi_i$, and $T$ is a $p \times p$
diagonal matrix with elements $\tau_j$.
  • 3. Update the mixing parameters $\tau_j$.
  • 4a. Update the prior location and scale
    parameters using a modified version of the
    retrospective stick-breaking algorithm.
  • Sample $V_h \sim \mathrm{Beta}(1 + n_h, \alpha + \sum_{l > h} n_l)$,
    where $n_h$ is the number of coefficients
    currently allocated to cluster $h$.
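Step 2's multivariate normal draw can be sketched with NumPy. This is a minimal sketch assuming diagonal $\Phi$ and $T$ as on this slide; the simulated data and the fixed $\phi_i = 1$ are illustrative stand-ins, not the logistic mixing weights the paper actually samples:

```python
import numpy as np

def update_beta(X, z, phi, tau, mu, rng):
    """Gibbs draw of beta | z ~ N(E, V) with
    V = (X' Phi^{-1} X + T^{-1})^{-1} and
    E = V (X' Phi^{-1} z + T^{-1} mu), where Phi = diag(phi)
    and T = diag(tau), mu the prior locations."""
    Xw = X / phi[:, None]                      # Phi^{-1} X
    V = np.linalg.inv(Xw.T @ X + np.diag(1.0 / tau))
    E = V @ (Xw.T @ z + mu / tau)
    return rng.multivariate_normal(E, V)

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, 0.0, -1.0])
z = X @ beta_true + rng.normal(size=n)         # pretend latent variables
phi = np.ones(n)                               # illustrative unit variances
tau = np.full(p, 10.0)                         # diffuse prior variances
mu = np.zeros(p)                               # prior locations
draw = update_beta(X, z, phi, tau, mu, rng)    # should land near beta_true
```

With informative data the draw concentrates near the least-squares solution, pulled toward $\mu$ by exactly the amount $T^{-1}$ dictates.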

13
Posterior Computation
The conditional distribution of the cluster location $\mu_h$ is
normal, with mean and variance computed from the
coefficients currently allocated to cluster $h$.
  • 4b. Sample the cluster scale $\lambda_h$ from its
    full conditional.
  • 4c. Update the vector of coefficient
    configurations using a Metropolis step.

14
Posterior Computation
where the allocation probability of coefficient $j$ to
cluster $h$ is proportional to $\pi_h \, \mathrm{DE}(\beta_j; \mu_h, \lambda_h)$,
and the normalizing
constant for coefficient $j$ sums these terms over the
instantiated clusters. To determine the proposal
configuration for the prior, sample
$u_j \sim \mathrm{Uniform}(0, 1)$. If $u_j$ falls beyond the cumulative
probability of the instantiated clusters, draw new values of
$(\mu_h, \lambda_h)$ from their prior until the cumulative probability
exceeds $u_j$. The new proposed configuration
moves coefficient $j$ to bin $h$. The acceptance
probability of moving from configuration $c$ to $c'$
is the usual Metropolis ratio.
16
Experiment Results
  • Simulation: 50 data sets for each of two settings.
  1. 400 observations and 20 parameters, with 10 of
    the parameters having a true effect of 2 and the
    remaining 10 having a true effect of 0.
  2. 100 observations and 200 parameters, 10 of which
    have a true effect of 2 while the remaining have
    a true effect of 0.
  • The results show that the multiple shrinkage
    prior offers improvement over the standard
    Bayesian lasso, and the reduction in MSE is
    largely a result of decreased bias.
  • (a) For the first 10 coefficients, the multiple
    shrinkage prior attains MSE = 0.03 compared to the
    standard lasso's MSE = 1.08; for the remaining
    10, MSE = 0.01 versus MSE = 0.04 for the standard lasso.

17
Experiment Results
(b) The MSE of the 10 coefficients with a true
effect of 2 is much lower in the multiple
shrinkage model (1.5 vs. 3.2); the remaining 190
coefficients are estimated with slightly higher
MSE in the multiple shrinkage prior than in the
standard lasso (0.08 vs. 0.01). 2. Experiments on
diabetes (Pima).
18
Experiment Results
The multiple shrinkage prior offered improvement,
with a lower misclassification rate than the
standard lasso and the SVM (21% vs. 22% and 23%,
respectively). 3. Multiple myeloma: analyze data from 80
individuals diagnosed with multiple myeloma to
determine whether any polymorphisms are related
to early age at onset. The predictor dimension is 135.
19
Experiment Results
20
Conclusions
  1. The multiple shrinkage prior provides greater
    flexibility in both the amount of shrinkage and
    the value toward which coefficients are shrunk.
  2. The new method can greatly decrease MSE (largely
    as a result of decreased bias), as
    demonstrated in the experimental results.