Title: Special Topic: Logistic Regression for Binary outcomes
1. Special Topic: Logistic Regression for Binary outcomes
- The dependent variable is often binary, such as whether a person litters or not, uses drugs or not, is dead or alive, is diseased or not, or is divorced or not.
- In this case, logistic or probit regression is the method of choice, because the assumptions of ordinary least squares regression are violated when the outcome is binary.
- Estimates of the mediated effect from logistic and probit regression can be distorted when conventional procedures are used.
- Here we examine binary or continuous X, continuous M, and binary Y.
2. Logistic Regression Model for Equations 1 and 2
- Standard logistic regression model, where Y depends on X, $\beta_{01}$ is the intercept, and $\tau$ codes the relation between X and Y.
- $\operatorname{logit} \Pr\{Y = 1 \mid X\} = \beta_{01} + \tau X$ (1)
- Standard logistic regression model, where Y depends on X and M, $\beta_{02}$ is the intercept, $\tau'$ codes the relation between X and Y adjusted for M, and $\beta$ codes the relation between M and Y adjusted for X.
- $\operatorname{logit} \Pr\{Y = 1 \mid X, M\} = \beta_{02} + \tau' X + \beta M$ (2)
3. Logistic Regression Model for latent variable Y*
- $Y^* = \beta_{01} + \tau X + \varepsilon_1$ (1)
- $Y^* = \beta_{02} + \tau' X + \beta M + \varepsilon_2$ (2)
- The unobserved latent variable $Y^*$ is linearly related to X (Equation 1) and to both X and M (Equation 2). $\varepsilon_1$ and $\varepsilon_2$ represent residual variability and have a standard logistic distribution. The dichotomous Y is derived from $Y^*$ through the relation $Y = 1$ if and only if $Y^* > 0$. The same model applies for probit regression, with the errors having a standard normal distribution rather than a standard logistic distribution.
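The threshold relation between the latent and the observed outcome can be checked numerically. The following is a minimal sketch, not part of the original slides: the intercept, slope, and X value are made-up illustration values, and the function names are hypothetical. It draws standard logistic errors, applies the rule that Y is 1 exactly when the latent value exceeds 0, and compares the share of ones with the probability given by the logistic model.

```python
import math
import random

def logistic_cdf(eta):
    """Pr(Y = 1) implied by a logistic model with linear predictor eta."""
    return 1.0 / (1.0 + math.exp(-eta))

def simulate_binary_y(intercept, slope, x, n, seed=0):
    """Share of Y = 1 when Y = 1 iff the latent value intercept + slope*x + e > 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        # Standard logistic draw via the inverse CDF; clamp u away from 0 and 1.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        e = math.log(u / (1.0 - u))
        y_star = intercept + slope * x + e
        hits += y_star > 0
    return hits / n

# Hypothetical values chosen only for illustration.
share = simulate_binary_y(intercept=-0.5, slope=0.8, x=1.0, n=50_000)
expected = logistic_cdf(-0.5 + 0.8 * 1.0)
print(round(share, 3), round(expected, 3))
```

With a large number of draws the simulated share should sit close to the model probability. Replacing the logistic draw with `rng.gauss(0.0, 1.0)` would give the probit version of the same check.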
4. Equation 3
- $M = \beta_{03} + \alpha X + \varepsilon_3$ (3)
- M is a continuous variable, so ordinary least squares regression is used to estimate this model, where $\beta_{03}$ is the intercept, $\alpha$ represents the relation between X and M, and $\varepsilon_3$ is residual variability.
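As an illustration of Equation 3, the sketch below fits a one-predictor ordinary least squares regression of M on X in plain Python. The data are invented and the function name `ols_simple` is hypothetical. It also returns the residual variance, which is the quantity the later rescaling of Equation 2 relies on.

```python
def ols_simple(x, m):
    """Return (intercept, slope, residual variance) for M regressed on X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_m = sum(m) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxm = sum((xi - mean_x) * (mi - mean_m) for xi, mi in zip(x, m))
    slope = sxm / sxx
    intercept = mean_m - slope * mean_x
    residuals = [mi - (intercept + slope * xi) for xi, mi in zip(x, m)]
    # Unbiased residual variance: divide by n - 2 (two estimated parameters).
    resid_var = sum(r ** 2 for r in residuals) / (n - 2)
    return intercept, slope, resid_var

# Made-up data for illustration only.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
m = [1.1, 2.9, 5.2, 6.8, 9.0]
intercept_3, a_hat, resid_var_m = ols_simple(x, m)
print(intercept_3, a_hat, resid_var_m)
```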
5. Logistic Regression Model for latent variable Y
- $\tau - \tau'$: Difference in coefficients. The coefficients are from separate logistic regression equations.
- $\alpha\beta$: Product of coefficients. The $\beta$ coefficient is from a logistic regression model and $\alpha$ is from an ordinary least squares regression model.
- As will be shown, the difference in coefficients method can give distorted values for the mediated effect because of differences in the scale of the separate logistic regression models. For both Equations 1 and 2, residual variability is fixed at $\pi^2/3$ for logistic regression and at 1 for probit regression.
6. What is in the next plot?
- Expected logistic regression coefficients based on Haggstrom (1983) are used to compute $\tau - \tau'$ and $\alpha\beta$.
- All possible combinations of $\alpha$, $\beta$, and $\tau'$ values for small (2% variance explained), medium (13%), large (26%), and very large (40%) effects ($4 \times 4 \times 4 = 64$).
- The Y-axis is the expected value of $\tau - \tau'$ and $\alpha\beta$.
- The X-axis is the true value of the $\beta$ coefficient in the continuous variable mediation model, denoted $\beta_C$.
- Later plots will show the same information for expected values after standardization.
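The count of 64 conditions comes from crossing four effect sizes for each of three coefficients. A minimal sketch of that enumeration, using only the effect-size labels (the numeric coefficient values behind each label are not given here and are not assumed):

```python
from itertools import product

effect_sizes = ["small", "medium", "large", "very large"]

# One tuple per condition: effect-size labels for the X-to-M path,
# the M-to-Y path, and the adjusted X-to-Y path, in that order.
conditions = list(product(effect_sizes, repeat=3))
print(len(conditions))  # 64
```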
7. Plot of true values of $\alpha\beta$ and $\tau - \tau'$ as a function of the true mediated effect and the true value of $\beta_C$.
8. Plot of true proportion mediated as a function of the true value of $\beta_C$.
9. $\alpha\beta$ and $\tau - \tau'$ are not equal in Logistic and Probit Regression
- The two estimators, $\alpha\beta$ and $\tau - \tau'$, are not identical in logistic or probit regression because, unlike ordinary least squares regression, where the residual variance varies across equations, in logistic regression the residual variance is fixed to equal $\pi^2/3$ (MacKinnon & Dwyer, 1993). So the logistic regression coefficients are a function of both the relations among variables and the fixed value of the residual variance.
- There are solutions.
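The effect of the fixed residual variance can be made concrete in the probit case, where the algebra is exact. The sketch below is an illustration under stated assumptions, not the Haggstrom-based computation used for the plots: it assumes probit regression, a normally distributed residual in the M equation, intercepts of zero, and made-up coefficient values. Substituting the M equation into the equation for Y with both predictors shows that the X coefficient in the X-only model is shrunk, so the difference in coefficients no longer matches the product of coefficients.

```python
import math

def probit_eq1_slope(alpha, beta, tau_prime, resid_var_m):
    """X coefficient of the X-only probit model implied by the other two equations.

    Substituting M = alpha*X + e3 into Y* = tau_prime*X + beta*M + e2 gives
    Y* = (tau_prime + alpha*beta)*X + (beta*e3 + e2), whose residual variance
    is beta**2 * resid_var_m + 1. Probit fixes the residual variance at 1,
    so the coefficient is divided by the square root of that variance.
    """
    scale = math.sqrt(1.0 + beta ** 2 * resid_var_m)
    return (tau_prime + alpha * beta) / scale

# Hypothetical true values chosen only for illustration.
alpha, beta, tau_prime, resid_var_m = 0.4, 0.6, 0.3, 1.0
tau = probit_eq1_slope(alpha, beta, tau_prime, resid_var_m)

difference = tau - tau_prime   # difference in coefficients
prod_coef = alpha * beta       # product of coefficients
print(round(difference, 4), round(prod_coef, 4))

# The two proportion-mediated measures disagree for the same reason.
prop_from_difference = 1.0 - tau_prime / tau
prop_from_product = prod_coef / (prod_coef + tau_prime)
print(round(prop_from_difference, 4), round(prop_from_product, 4))
```

In this setup the difference in coefficients is smaller than the product of coefficients whenever the M-to-Y coefficient is nonzero, and the gap grows with that coefficient.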
10. Solutions to mediation estimation in Logistic and Probit Regression
- Standardize the values of the coefficients.
- One standardization method computes the variance of the latent Y in both equations and uses it to standardize values (MacKinnon & Dwyer, 1993; Winship & Mare, 1983).
- Another standardization method standardizes coefficients in Equation 2 to be in the same metric as Equation 1. To the best of our knowledge, this is a new method that is described below.
- Use a computer program such as Mplus that appropriately handles categorical variables in covariance structure models. I believe that this approach is similar to the first approach to standardization, i.e., the scale of the latent Y is the same for all equations in a model.
11. Standardizing across logistic regression equations
- Standardize the values of the coefficients in Equations 1 and 2 (see MacKinnon & Dwyer, 1993, and Winship & Mare, 1983).
- $\sigma^2_{Y^*} = \tau^2 \sigma^2_X + \pi^2/3$; divide the $\tau$ coefficient and its standard error by $\sigma_{Y^*}$ from this equation.
- $\sigma^2_{Y^*} = (\tau')^2 \sigma^2_X + \beta^2 \sigma^2_M + 2 \tau' \beta \sigma_{XM} + \pi^2/3$; divide the $\tau'$ and $\beta$ coefficients and their standard errors by $\sigma_{Y^*}$ from this equation.
- Here $\sigma^2_{Y^*}$ is the variance of the latent outcome, $\sigma^2_X$ is the variance of the X variable, $\sigma^2_M$ is the variance of the M variable, and $\sigma_{XM}$ is the covariance of the X and M variables.
- The $\alpha$ parameter does not require rescaling if M is continuous. Note that if probit regression is used, the last term of the equations for $\sigma^2_{Y^*}$ should be 1 rather than $\pi^2/3$.
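The two variance formulas above translate directly into code. The following is a minimal sketch with hypothetical function names and made-up estimates. It computes the latent standard deviation for each equation and divides a coefficient and its standard error by it; the `resid_var` argument switches between the logistic value and the probit value of 1.

```python
import math

LOGISTIC_RESID_VAR = math.pi ** 2 / 3
PROBIT_RESID_VAR = 1.0

def latent_sd_eq1(tau, var_x, resid_var=LOGISTIC_RESID_VAR):
    """Standard deviation of the latent outcome implied by the X-only model."""
    return math.sqrt(tau ** 2 * var_x + resid_var)

def latent_sd_eq2(tau_prime, beta, var_x, var_m, cov_xm,
                  resid_var=LOGISTIC_RESID_VAR):
    """Standard deviation of the latent outcome implied by the X-and-M model."""
    var_y = (tau_prime ** 2 * var_x + beta ** 2 * var_m
             + 2 * tau_prime * beta * cov_xm + resid_var)
    return math.sqrt(var_y)

def standardize(coef, se, latent_sd):
    """Divide a coefficient and its standard error by the latent SD."""
    return coef / latent_sd, se / latent_sd

# Made-up estimates for illustration only.
sd1 = latent_sd_eq1(tau=0.80, var_x=0.25)
sd2 = latent_sd_eq2(tau_prime=0.50, beta=0.60, var_x=0.25, var_m=1.0, cov_xm=0.10)
tau_std, tau_se_std = standardize(0.80, 0.20, sd1)
beta_std, beta_se_std = standardize(0.60, 0.15, sd2)
print(tau_std, tau_se_std, beta_std, beta_se_std)
```

Because the coefficient and its standard error are divided by the same constant, their ratio is unchanged by this step.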
12. Standardizing Equation 2 to the metric of Equation 1
- The coefficients from Equation 2 are divided by the following quantity: [expression not reproduced in this transcript]
- The variance term in that quantity is the residual variance in the regression model for M predicted by X, i.e., Equation 3. The first term is replaced with 1 for probit regression.
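The expression from the slide is not available here, so the sketch below should be read as one rescaling factor that follows from the latent variable model, not as the authors' exact formula. It assumes a normally distributed residual in the M equation and zero intercepts, and it uses made-up coefficient values. For probit regression the factor is exact under those assumptions; passing `math.pi ** 2 / 3` as `resid_var_y` gives a variance-matching approximation for logistic regression, since the sum of a normal and a logistic error is not itself logistic.

```python
import math

def eq2_to_eq1_scale(beta, resid_var_m, resid_var_y=1.0):
    """Factor by which the X-and-M model coefficients are divided.

    resid_var_y is the fixed residual variance of the latent outcome:
    1.0 for probit, math.pi ** 2 / 3 for the logistic approximation.
    """
    return math.sqrt((resid_var_y + beta ** 2 * resid_var_m) / resid_var_y)

# Hypothetical true values chosen only for illustration (probit case).
alpha, beta, tau_prime, resid_var_m = 0.4, 0.6, 0.3, 1.0
k = eq2_to_eq1_scale(beta, resid_var_m)

# X coefficient of the X-only probit model implied by these values.
tau = (tau_prime + alpha * beta) / k

diff_rescaled = tau - tau_prime / k   # difference in coefficients after rescaling
prod_rescaled = alpha * (beta / k)    # product of coefficients after rescaling
print(diff_rescaled, prod_rescaled)
```

After rescaling, the difference in coefficients and the product of coefficients agree in this probit setup, which is the behavior the standardization is meant to restore.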
13. Plot of true values of $\alpha\beta$ and $\tau - \tau'$ as a function of $\beta_C$, after standardization.
14. Plot of true values of proportion mediated as a function of $\beta_C$, after standardization.
15. The estimated mediated effect, $\tau - \tau'$, as a function of $\beta_C$ for $\alpha = .14$.
16. Summary and Future Directions
- Unlike the linear OLS model case, the difference in coefficients and product of coefficients estimators of the mediated effect are not equal. The difference in coefficients estimator is distorted, as shown with expected values and in the simulation study. The same problem occurs for the proportion mediated measures.
- Standardization of coefficients across equations solves the problem and removes the distortion. Two approaches to standardization were mentioned, but only the results for rescaling coefficients in Equation 2 to be in the same metric as those in Equation 1 were described. The other standardization method works in a similar manner.
- The simplest approach is the product of coefficients estimator of the mediated effect, which does not require standardization. Researchers who prefer the logic of the difference in coefficients method should standardize coefficients prior to computing the mediated effect.
- The standardization approaches should apply to other examples of the generalized linear model, such as Poisson and survival analysis models.