Title: Sociology 680
1 Sociology 680
- Multivariate Analysis
- Logistic Regression
2 The Analysis of Categories
                 IV: Quantity                          IV: Category
DV: Quantity     2) Structural Equation Models (SEM)   1) Analysis of Variance Models (ANOVA)   Linear Models
DV: Category     4) Logistic Regression Models (LRM)   3) Log Linear Models (LLM)               Category Models
3 What is Logistic Regression
- Logistic regression is typically used as an
extension of multiple regression, particularly
adapted to situations where the DV is categorical
and the IVs are continuous.
- It is not, however, restricted to quantitative
IVs. In fact, to the extent the IVs are
categorical themselves, logistic regression can
be thought of as an extension of log linear
modeling, where we are interested in
differentiating the IVs and DV.
- If the categorical DV is dichotomous
(two outcomes), it is called Binomial Logistic
Regression. If the DV has more than two
attributes, it is called Multinomial Logistic
Regression. If the DV categories can be ranked,
it is called Ordinal Logistic Regression.
4 The Premise of Logistic Regression
- Logistic regression is similar to OLS regression,
except that it uses the IVs to predict
probabilities, odds, and the logarithm of the
odds for a categorical DV, rather than specific
values of a quantitative DV.
- For example, age and income become predictors of
the likelihood of a dichotomous DV such as
union membership, rather than of a quantitative
variable such as occupational prestige, which
would be predicted in a multiple regression or
path analysis.
5 Probability and Odds
- Consider the following distribution of union
membership for 650 respondents: members = 212,
non-members = 438.
- The probability of being a member is
simply the number of members (the outcome of
interest) expressed as a proportion of the total
possible: P(M) = 212 / 650 = .326.
- The odds are the ratio of the probability
P(x) to its complement, 1 - P(x). Using the
example above, the odds of being a member are
P(M) / (1 - P(M)) = .326 / .674 = .484.
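The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch; the variable names are illustrative and the counts are the ones given in the union membership example.

```python
# Counts from the union membership example.
members = 212
non_members = 438
total = members + non_members  # 650 respondents

# Probability: the outcome of interest as a proportion of the total.
p_member = members / total

# Odds: the probability divided by its complement.
odds_member = p_member / (1 - p_member)

print(f"P(M) = {p_member:.3f}")    # P(M) = 0.326
print(f"odds = {odds_member:.3f}")  # odds = 0.484
```

Note that the odds can also be read directly from the counts (212 / 438), which gives the same value.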
6 The Logit of Logistic Regression
- The index analyzed by logistic regression is the
log of the odds. In our example, the odds were
.484 and the log of the odds is ln(.484) =
-.726. This is called a logit and is simply
the natural logarithm of the odds of being in
that category.
- In our union membership example, we might want to
know the effect that a one-unit change in the value
of age or income has on the odds of union
membership. This effect is expressed as an odds
ratio (reported as Exp(B) in SPSS), defined as
the ratio of the odds of being classified in one
category of the DV for two different values of
the IV.
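As a rough illustration, the logit and the odds ratio can be computed as below. The coefficient `b_age` and the ages 40 and 41 are hypothetical values chosen for the sketch, not results from the union membership data; the point is only that the ratio of the odds at two values of the IV one unit apart equals Exp(B).

```python
import math

def logit(p):
    """Natural logarithm of the odds p / (1 - p)."""
    return math.log(p / (1 - p))

def odds_at(x, b, constant=0.0):
    """Odds implied by a logit of constant + b * x."""
    return math.exp(constant + b * x)

# Logit of union membership, using the counts from the example.
print(round(logit(212 / 650), 3))  # -0.726

# Hypothetical coefficient for age (an assumption, not from the slides).
b_age = 0.05

# Odds ratio: odds at age 41 divided by odds at age 40.
ratio = odds_at(41, b_age) / odds_at(40, b_age)
print(round(ratio, 3))            # 1.051
print(round(math.exp(b_age), 3))  # 1.051, the same value as Exp(B)
```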
7 The Formulae of Logistic Regression
- Taking the definitions of probability, odds and
logits into account, we produce a formula that is
equivalent to a regression equation and is
characterized by the value of the logit, where
logit = B1X1 + B2X2 + ... + BkXk = ln(P / (1 - P)).
- The logit is a somewhat involved calculation based on
the expected values of the odds ratios, but for
us, let's look at it as a number that gets us to
where we can comment on the probability of
observing one outcome or another on the DV,
given the best linear combination of IV
predictors.
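To see how a linear combination of IVs "gets us to" a probability, the sketch below inverts the log of the odds. The function names, coefficients and IV values are hypothetical, chosen only to illustrate the step; they are not estimates from any data set in these slides.

```python
import math

def predicted_logit(coefficients, values):
    """Linear combination B1*X1 + B2*X2 + ... + Bk*Xk."""
    return sum(b * x for b, x in zip(coefficients, values))

def logit_to_probability(z):
    """Solve ln(P / (1 - P)) = z for P."""
    return 1 / (1 + math.exp(-z))

# Hypothetical coefficients and IV values (assumptions for illustration).
coefficients = [0.02, -0.5]
values = [40, 1]

z = predicted_logit(coefficients, values)
p = logit_to_probability(z)
print(round(z, 3), round(p, 3))  # 0.3 0.574

# Round trip with the union example: a logit of ln(212 / 438) returns P(M).
print(round(logit_to_probability(math.log(212 / 438)), 3))  # 0.326
```

A logit of 0 corresponds to a probability of .5; positive logits give probabilities above .5 and negative logits give probabilities below it.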
8 Example of Logistic Regression
- Suppose we had a dichotomous dependent variable
such as job satisfaction (satisfied vs. not
satisfied), and wanted to know the ability of age
and hours worked (as IVs) to predict the
likelihood of being satisfied with one's job or
not (i.e. to predict the likelihood of being in
one category or the other).
- This would be equivalent to a multiple regression
analysis if job satisfaction were a continuous
dependent variable. But it is not. Therefore,
we use the binary logistic regression procedure
to identify the equivalents of the beta weights, the
multiple R and the residuals.
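For readers who want to see what such a procedure does in miniature, the sketch below fits a binary logistic model with a single IV. It uses plain gradient ascent on the log-likelihood, which is a simple stand-in and should not be taken as the routine SPSS uses. The data are made up for illustration (hours worked in tens of hours, 1 = satisfied) and the function name `fit_logistic` is hypothetical.

```python
import math

def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Estimate a constant and one slope by gradient ascent on the log-likelihood."""
    constant, slope = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_constant = 0.0
        grad_slope = 0.0
        for x, y in zip(xs, ys):
            # Predicted probability of the outcome coded 1.
            p = 1 / (1 + math.exp(-(constant + slope * x)))
            # Each case contributes (observed - predicted) to the gradient.
            grad_constant += y - p
            grad_slope += (y - p) * x
        constant += learning_rate * grad_constant / n
        slope += learning_rate * grad_slope / n
    return constant, slope

# Made-up data, not from the slides: hours worked (tens of hours) and satisfaction.
hours = [1, 2, 3, 4, 5, 6]
satisfied = [0, 0, 1, 0, 1, 1]

constant, slope = fit_logistic(hours, satisfied)
print(f"constant = {constant:.2f}, B = {slope:.2f}, Exp(B) = {math.exp(slope):.2f}")
```

With these made-up data the slope comes out positive, so Exp(B) is above 1: each additional ten hours worked raises the odds of being satisfied.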
9 SPSS Input for Logistic Regression
- In SPSS, this procedure is accessed through the
menus ANALYZE, REGRESSION, BINARY LOGISTIC.
10 Output 1 for Logistic Regression
There are two important pieces of output to
review in assessing the effect of the IVs. The
first is a classification table that uses the
values of the logit to generate predicted frequencies
for each category of the DV. When these are compared to
the observed frequencies, we can determine the
percentage of cases correctly classified when
using our IVs to predict DV outcomes.
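A classification table of this kind can be sketched as follows. The observed outcomes and predicted probabilities are invented for illustration, the function name is hypothetical, and a cutoff of 0.5 is assumed for assigning each case to a predicted category.

```python
def classification_table(observed, probabilities, cutoff=0.5):
    """Cross-tabulate observed vs. predicted categories and the percentage correct."""
    # Keys are (observed, predicted) pairs, with 1 = outcome of interest.
    table = {(o, p): 0 for o in (0, 1) for p in (0, 1)}
    for obs, prob in zip(observed, probabilities):
        predicted = 1 if prob >= cutoff else 0
        table[(obs, predicted)] += 1
    correct = table[(0, 0)] + table[(1, 1)]
    return table, 100 * correct / len(observed)

# Invented cases: observed outcome and model-predicted probability.
observed = [1, 0, 1, 1, 0, 0, 1, 0]
probabilities = [0.81, 0.35, 0.42, 0.66, 0.58, 0.12, 0.73, 0.27]

table, percent_correct = classification_table(observed, probabilities)
for obs in (0, 1):
    print(f"observed {obs}: predicted 0 = {table[(obs, 0)]}, predicted 1 = {table[(obs, 1)]}")
print(f"percentage correct = {percent_correct:.1f}")  # percentage correct = 75.0
```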
11 Output 2 for Logistic Regression
The second output to look at is the table of
coefficients. Here, it would show the equivalent
of beta weights for each variable and demonstrate
that the likelihood of being satisfied is
marginally lower for each unit increase in age and
marginally higher for each unit increase in hours
worked the previous week. However, because its
coefficient is not significant, age is a weak
predictor.