Title: Bayesian Semi-Parametric Multiple Shrinkage
Slide 1: Bayesian Semi-Parametric Multiple Shrinkage
Paper by Richard F. MacLehose, David B. Dunson
Duke University Machine Learning Group
Presented by Lu Ren
Slide 2: Outline
- Motivation
- Model and Lasso Prior
- Semi-Parametric Multiple Shrinkage Priors
- Posterior Computation
- Experimental Results
- Conclusions
Slide 3: Motivation
1. Non-identified effects are commonplace with high-dimensional or correlated data, such as gene microarrays.
2. Standard techniques use independent normal priors centered at zero, with the degree of shrinkage controlled by the prior variance.
3. When sufficient prior knowledge is available, coefficients can be assumed exchangeable within specific groups and allowed to shrink toward different group means.
4. When such prior knowledge is lacking, this paper proposes a Bayesian semiparametric hierarchical model, placing a DP (Dirichlet process) prior on the unknown mean and scale parameters.
Slide 4: Model and Lasso Prior
Suppose we collect data (y_i, x_i), i = 1, ..., n, where x_i is a p × 1 vector of predictors and y_i is a binary outcome. A standard approach is to estimate the coefficients β in a regression model such as logit{Pr(y_i = 1 | x_i)} = x_i'β. For large p, maximum likelihood estimates will tend to have high variance and may not be unique. However, we can incorporate a penalty by using a lasso prior, β_j ~ DE(0, λ). Here DE denotes a double exponential (Laplace) distribution, equivalent to a scale mixture of normals with exponential mixing:
DE(β_j | 0, λ) = ∫ N(β_j | 0, τ_j) Exp(τ_j | λ²/2) dτ_j.
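The scale-mixture equivalence can be checked numerically: drawing a variance from an exponential distribution and then a normal deviate with that variance produces double-exponential draws. A minimal sketch in stdlib Python (function and variable names are illustrative, not from the paper):

```python
import math
import random

def sample_lasso_prior(lam, n, seed=0):
    """Draw n samples from DE(0, lam) via its scale-mixture form:
    tau ~ Exp(rate = lam^2 / 2) gives a variance, then beta ~ N(0, tau)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        tau = rng.expovariate(lam ** 2 / 2.0)      # mixing variance
        samples.append(rng.gauss(0.0, math.sqrt(tau)))
    return samples

draws = sample_lasso_prior(lam=1.0, n=200_000)
sample_var = sum(b * b for b in draws) / len(draws)
# The DE(0, lam) density (lam/2) exp(-lam |beta|) has variance 2 / lam^2,
# so with lam = 1 the Monte Carlo variance estimate should be close to 2.
```

With λ = 1 the Laplace variance is 2/λ² = 2, which the Monte Carlo estimate recovers, confirming the mixture representation.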
Slide 5: Multiple Shrinkage Prior
In many situations shrinkage toward non-null values will be beneficial. Instead of inducing shrinkage toward zero, the lasso model is extended by introducing a mixture prior with separate prior location and scale parameters for each component. The data play more of a role in the choice of the hyperparameters, while sparsity is still favored through a carefully tailored hyperprior. The DP prior is nonparametric and allows clustering of parameters, which helps reduce dimensionality.
Slide 6: Multiple Shrinkage Prior
The proposed prior structure gives each coefficient β_j its own prior location μ_j and scale λ_j. The amount of shrinkage a coefficient exhibits toward its prior mean is determined by its scale λ_j, with larger values resulting in greater shrinkage. Therefore, the hyperpriors are specified to make the prior as sparse as possible.
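To see concretely how a larger DE scale parameter produces greater shrinkage, one can look at the classical lasso connection: under an orthonormal design, the DE-prior MAP estimate is soft thresholding of the least-squares estimate. This is a standard fact about the lasso, sketched here for illustration rather than taken from the paper:

```python
def soft_threshold(z, penalty):
    """Lasso MAP estimate under an orthonormal design: shrink the
    least-squares estimate z toward 0 by `penalty`, truncating at 0."""
    if z > penalty:
        return z - penalty
    if z < -penalty:
        return z + penalty
    return 0.0

# A larger penalty (larger DE rate) shrinks estimates harder toward 0,
# setting small estimates exactly to 0.
```

The multiple shrinkage prior generalizes this behavior by shrinking toward a cluster-specific mean rather than zero (equivalently, soft thresholding the deviation of the estimate from that mean).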
Slide 7: Multiple Shrinkage Prior
Assume the coefficient-specific hyperparameter values (μ_j, λ_j) cluster into k ≤ p groups (μ*_h, λ*_h), h = 1, ..., k. The number of clusters is controlled by the DP precision α, and the coefficients are adaptively shrunk toward non-zero locations. The prior's equivalent stick-breaking form:
(μ_j, λ_j) ~ Σ_h π_h δ_(μ*_h, λ*_h),
with (μ*_h, λ*_h) fixed at the null location μ*_1 = 0 if h = 1,
and (μ*_h, λ*_h) drawn from the base distribution P_0 if h > 1.
The random variables V_h ~ Beta(1, α) and π_h = V_h Π_{l<h} (1 − V_l).
Slide 8: Multiple Shrinkage Prior
- Small values of α make the number of clusters increase more slowly than the number of coefficients.
- Choosing a relatively large base-distribution variance can give support to a wide range of possible prior means.
Slide 9: Multiple Shrinkage Prior
- Treat coefficients falling within a small range around zero as having no meaningful biologic effect.
- Default prior specification:
1. Recommend choosing values for the scale hyperparameters that are large enough to encourage shrinkage but not so large as to overwhelm the data.
2. Specify the null scale λ*_1 so the DE prior has prior credible intervals of unit width.
3. Set the base-distribution hyperparameters to assign 95% probability to a very wide range of reasonable prior effects.
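The unit-width rule can be made concrete: a DE(0, λ) central interval of width w has probability 1 − exp(−λw/2), so the rate achieving a chosen coverage at unit width is available in closed form. A sketch, with the 95% level assumed for illustration:

```python
import math

def de_rate_for_interval(coverage, width):
    """Rate lam of a DE(0, lam) prior whose central credible interval of
    the given width has the given coverage.  For DE(0, lam),
    P(|beta| <= a) = 1 - exp(-lam * a), so with a = width / 2:
    coverage = 1 - exp(-lam * width / 2)."""
    return -2.0 * math.log(1.0 - coverage) / width

lam = de_rate_for_interval(coverage=0.95, width=1.0)
# Sanity check: the attained central coverage at half-width 0.5.
attained = 1.0 - math.exp(-lam * 0.5)
```

Solving 0.95 = 1 − exp(−λ/2) gives λ = −2 ln(0.05) ≈ 5.99, i.e. a fairly sharp null prior.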
Slide 10: Multiple Shrinkage Prior
Some testing methods:
Assume component (μ*_1, λ*_1) is the null, and let H_j = 1 indicate that predictor j has some effect, with probability Pr(H_j = 1 | data). From the MCMC output, we estimate this as the proportion of samples in which coefficient j is allocated to a non-null cluster. Alternatively, we can estimate the posterior expected false discovery rate (FDR) for a given threshold, or simply list the predictors ordered by their posterior probabilities of a non-null effect.
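The posterior expected FDR can be computed directly from the per-predictor posterior probabilities of a non-null effect; the numbers below are made up for illustration:

```python
def posterior_expected_fdr(post_probs, threshold):
    """Expected FDR when flagging predictors whose posterior probability of
    a non-null effect exceeds `threshold`: the average posterior null
    probability (1 - p) among the flagged predictors."""
    flagged = [p for p in post_probs if p > threshold]
    if not flagged:
        return 0.0
    return sum(1.0 - p for p in flagged) / len(flagged)

probs = [0.99, 0.95, 0.60, 0.10]   # hypothetical MCMC estimates
fdr = posterior_expected_fdr(probs, threshold=0.9)
# Flags the first two predictors; expected FDR = (0.01 + 0.05) / 2 = 0.03.
```

Lowering the threshold flags more predictors at the cost of a higher expected FDR, which is how a threshold is chosen in practice.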
Slide 11: Posterior Computation
Assume the outcome y_i = 1 occurs when a latent variable z_i > 0, where z_i = x_i'β + ε_i and the error ε_i has a normal scale-mixture representation.
- 1a. Augment the data with latent variables z_i sampled from their truncated full conditionals.
- 1b. Update the latent error scales by sampling from their full conditionals.
- 2. Update the regression coefficients β, using the current values of the latent variables, by sampling from the following multivariate normal full conditional:
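Step 1a can be sketched with the standard truncation device: each latent z_i is drawn from a normal truncated to be positive when y_i = 1 and negative otherwise, via the inverse-CDF method. This is a generic sketch of truncated-normal data augmentation, not the paper's exact update:

```python
import random
from statistics import NormalDist

def sample_latent(mean, sd, y, rng):
    """Draw z ~ N(mean, sd^2) truncated to (0, inf) if y == 1,
    or to (-inf, 0) if y == 0, by inverting the normal CDF."""
    std = NormalDist()
    p0 = std.cdf((0.0 - mean) / sd)    # P(z <= 0) before truncation
    if y == 1:
        u = rng.uniform(p0, 1.0)       # restrict to the upper tail
    else:
        u = rng.uniform(0.0, p0)       # restrict to the lower tail
    return mean + sd * std.inv_cdf(u)

rng = random.Random(2)
pos = [sample_latent(0.3, 1.0, 1, rng) for _ in range(100)]
neg = [sample_latent(0.3, 1.0, 0, rng) for _ in range(100)]
# Every draw respects the sign constraint implied by its outcome y_i.
```

Conditioning on these signed latent variables turns the binary-outcome update for β into a conjugate normal linear-model update.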
Slide 12: Posterior Computation
where V_β = (X'Φ⁻¹X + T⁻¹)⁻¹ and m_β = V_β (X'Φ⁻¹z + T⁻¹μ). The matrix Φ is an n × n diagonal matrix whose elements are the latent error variances, and T is a p × p diagonal matrix whose elements are the prior variances τ_j.
- 3. Update the mixing parameter of the latent error distribution.
- 4a. Update the prior location and scale parameters (μ*_h, λ*_h) using a modified version of the retrospective stick-breaking algorithm. Sample each stick-breaking variable V_h from its full conditional,
Slide 13: Posterior Computation
The conditional distribution of V_h is Beta(a_h, b_h), where a_h = 1 + n_h (with n_h the number of coefficients currently allocated to cluster h) and b_h = α + Σ_{l>h} n_l.
- 4b. Sample (μ*_h, λ*_h) from their full conditionals given the coefficients allocated to cluster h.
- 4c. Update the vector of coefficient cluster configurations using a Metropolis step.
Slide 14: Posterior Computation
The proposal allocates coefficient j to cluster h with probability proportional to the cluster weight π_h times the prior density of β_j under that cluster, with the normalizing constant summing over the currently represented clusters. To determine the proposal configuration, sample u_j ~ Uniform(0, 1). If u_j exceeds the total probability of the represented clusters, increment the number of clusters and draw new values of (μ*_h, λ*_h) from their prior until the cumulative probability exceeds u_j. The new proposed configuration moves coefficient j to bin h. The acceptance probability of moving from the current configuration to the proposed one is the usual Metropolis ratio.
Slide 15: (no transcript; figure-only slide)
Slide 16: Experimental Results
- Simulation: 50 data sets.
  (a) 400 observations and 20 parameters, with 10 of the parameters having a true effect of 2 and the remaining 10 having a true effect of 0.
  (b) 100 observations and 200 parameters, 10 of which have a true effect of 2 while the remaining have a true effect of 0.
- The results show that the multiple shrinkage prior offers improvement over the standard Bayesian lasso, and the reduction in MSE is largely a result of decreased bias.
- (a) For the first 10 coefficients, MSE = 0.03 compared to the standard lasso (MSE = 1.08); for the remaining 10, MSE = 0.01 versus MSE = 0.04 under the standard lasso.
Slide 17: Experimental Results
- (b) The MSE of the 10 coefficients with a true effect of 2 is much lower under the multiple shrinkage model (1.5 vs 3.2); the remaining 190 coefficients are estimated with slightly higher MSE under the multiple shrinkage prior than under the standard lasso (0.08 vs 0.01).
- 2. Experiments on the Pima diabetes data.
Slide 18: Experimental Results
The multiple shrinkage prior offered improvement, with a lower misclassification rate than both the standard lasso and the SVM (21, 22, and 23, respectively).
- 3. Multiple myeloma: data from 80 individuals diagnosed with multiple myeloma are analyzed to determine whether any polymorphisms are related to early age at onset. The predictor dimension is 135.
Slide 19: Experimental Results (figure-only slide)
Slide 20: Conclusions
- The multiple shrinkage prior provides greater flexibility in both the amount of shrinkage and the value toward which coefficients are shrunk.
- The new method can greatly decrease MSE (largely as a result of decreasing bias), as demonstrated in the experimental results.