Title: Intermediate Social Statistics Lecture 4
1. Intermediate Social Statistics, Lecture 4
- Hilary Term 2006
- Dr David Rueda
2. Today
- Multinomial Logit and Probit Models.
- Ordered Logit and Probit Models.
- Some practical things.
- Multinomial Logit: the set-up, probabilities, interpretation and the IIA assumption.
- Multinomial Probit: brief explanation.
- Ordered Logit: set-up and interpretation.
- Ordered Probit: set-up and interpretation.
3. Practical Things
- All lectures available from course website
- http://users.ox.ac.uk/polf0050/page7.html
- Deadline for take-home exam
- 21st April (Friday, Noughth Week, Trinity Term).
4. Multinomial Logit Models (1)
- Also known as polytomous logit (or logistic) models (because of the polytomous dependent variable).
- This is the unordered dependent variable case.
- A simple extension of the Logit model for when the dependent variable can take more than 2 categorical values.
- Examples?
- Multinomial logit models are multi-equation models.
- A response variable with k categories will generate k-1 equations.
- Each of these k-1 equations is a binary logistic regression comparing one group with the reference group.
- Example: if our dependent variable Y is 0, 1, or 2 and our reference category is 0, then we have two logit functions, one for Y=1 versus Y=0 and another for Y=2 versus Y=0.
- Multinomial logistic regression estimates the k-1 logits simultaneously (done through MLE); see the sketch below.
- Same properties as those of the Logit model (see notes from last week).
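- The sketch below is not part of the lecture's Stata material: it is a minimal Python illustration (statsmodels, with made-up data and variable names) of fitting a multinomial logit and recovering the k-1 sets of coefficients.

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical data: two predictors and a three-category outcome (0 is the base).
    rng = np.random.default_rng(0)
    n = 1000
    x = rng.normal(size=(n, 2))
    xb1 = 0.8 * x[:, 0] - 0.5 * x[:, 1]               # linear index for Y=1 vs Y=0
    xb2 = -0.3 * x[:, 0] + 1.0 * x[:, 1]              # linear index for Y=2 vs Y=0
    denom = 1 + np.exp(xb1) + np.exp(xb2)
    p = np.column_stack([1 / denom, np.exp(xb1) / denom, np.exp(xb2) / denom])
    y = np.array([rng.choice(3, p=row) for row in p])

    # MNLogit estimates the k-1 logit equations jointly by maximum likelihood;
    # statsmodels treats the lowest-coded category as the reference.
    res = sm.MNLogit(y, sm.add_constant(x)).fit(disp=False)
    print(res.params)                                  # k-1 = 2 columns of coefficients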
5. Multinomial Logit Models (2): Setup.
- y = 1, ..., J with probabilities P1, P2, ..., PJ.
- Our dependent variable has several categories and each category has a probability associated with it.
- P1 + P2 + ... + PJ = 1
- The probabilities for all the categories of Y (all the possible outcomes for our dependent variable) add up to 100%.
- There is no order within the categories of Y (any of them can be the baseline for comparison).
- Stata's default is to choose the most common category as the baseline.
- The choice is arbitrary but should be theoretically motivated.
- The multinomial logit is equivalent to running a series of binomial logits.
- Imagine the outcome is voting and the options are Labour, Conservative, LibDem, etc.: the multinomial logit is the equivalent of running a series of binomial logits (Con vs. Lab, LibDem vs. Lab, etc.).
- For details about the likelihood function and how to maximize it, see Borooah, Logit and Probit (Sage, 2002).
6. Multinomial Logit Models (3): Setup.
- The multinomial logit model is analogous to a logistic regression model, except that:
- The probability distribution of the response is multinomial instead of binomial, and we have K-1 equations instead of one.
- The K-1 multinomial logit equations contrast each of categories 1, 2, ..., K-1 with category K, whereas the single logistic regression equation is a contrast between successes and failures.
- If K = 2 the multinomial logit model reduces to the usual logistic regression model (see the sketch below).
- Theoretically, we could simply fit a series of separate binary logit models to find the coefficients, but these models would not give us a single overall measure of the deviance.
- For details about the likelihood function and how to maximize it, see Borooah, Logit and Probit (Sage, 2002).
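- A quick check of the K = 2 claim, again a hypothetical Python sketch rather than lecture code: with a binary outcome, the multinomial and binary logit fits coincide.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 500
    x = rng.normal(size=(n, 1))
    y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.2 * x[:, 0]))))   # binary outcome

    X = sm.add_constant(x)
    print(sm.Logit(y, X).fit(disp=False).params)      # binary logit coefficients
    print(sm.MNLogit(y, X).fit(disp=False).params)    # same values: one equation, y=1 vs y=0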
7. Multinomial Logit Models (4): Probabilities.
- McFadden ("Conditional Logit Analysis of Qualitative Choice Behavior", 1973) shows (in the case of three categories):
- Pr(Y=1) = exp(β1'x) / (exp(β1'x) + exp(β2'x) + exp(β3'x))
- Pr(Y=2) = exp(β2'x) / (exp(β1'x) + exp(β2'x) + exp(β3'x))
- Pr(Y=3) = exp(β3'x) / (exp(β1'x) + exp(β2'x) + exp(β3'x))
- Since the probabilities of all categories add to 1, only J-1 of the category probabilities can be estimated independently.
- We set one of the logits to 0 (it doesn't matter which one).
- If we set β1 = 0, the coefficients represent the change relative to the Y=1 category. They become risk ratios (assessments of the probability change relative to the probability of Y=1).
- The equations for the probabilities become (see the sketch below):
- Pr(Y=1) = 1 / (1 + exp(β2'x) + exp(β3'x))
- Pr(Y=2) = exp(β2'x) / (1 + exp(β2'x) + exp(β3'x))
- Pr(Y=3) = exp(β3'x) / (1 + exp(β2'x) + exp(β3'x))
- The relative probability of Y=2 to the base category Y=1:
- Pr(Y=2) / Pr(Y=1) = exp(β2'x)
8. Multinomial Logit Models (5): Interpretation.
- We will get k-1 sets of estimates: one set of estimates for the effects of the independent variables for each comparison with the reference level.
- The sign of a coefficient estimate reflects the direction of change in the risk ratio (the ratio P(Y=k) / P(Y=1)) in response to a ceteris paribus change in the variable to which the coefficient is attached.
- It does not necessarily reflect the direction of change in the individual probability P(Y=k).
- Odds ratios: the odds ratios are simply the exponentiated coefficients (see the sketch below).
- They mean that each one-unit change in the independent variable of interest multiplies the odds of Y=k (taking Y=1 as our reference) by a factor equal to the odds ratio, holding the other independent variables constant.
- Significance of coefficients: similar to the binomial logit model (see notes from last week).
- Goodness of fit: similar to the binomial logit model (see notes from last week).
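- A small numerical illustration of the odds-ratio reading (the coefficient value is invented):

    import numpy as np

    b = 0.8                     # hypothetical coefficient on x in the Y=k equation
    print(np.exp(b))            # about 2.23: a one-unit rise in x multiplies the
                                # odds of Y=k (vs. the reference category) by ~2.23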
9. Multinomial Logit Models (6): IIA.
- The IIA assumption:
- IIA = Independence of Irrelevant Alternatives.
- Assumption: the odds between any two categories of the dependent variable are independent of the other alternatives available.
- Back to the voting example: this assumption means that multinomial logit models assume that the odds of voting for, say, Conservative vs. Labour do not depend on the addition or deletion of other categories in the model.
- Testing the IIA assumption:
- Hausman and Small-Hsiao tests.
- H0: the difference in coefficients is not systematic.
- Estimate the full model with k outcomes.
- Estimate a restricted model by eliminating one or more of the k outcome categories.
- Test the difference between the two; this is a chi-square test with df = number of parameters in the restricted model (see the sketch below).
- Significant values indicate a violation of the assumption (i.e. the difference between the two models is systematic).
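- A generic Python sketch of the Hausman-type statistic described above (all numbers are placeholders; with real output you would plug in the estimated coefficient vectors and covariance matrices for the equations shared by the full and restricted models):

    import numpy as np
    from scipy import stats

    def hausman_iia(b_restr, V_restr, b_full, V_full):
        d = b_restr - b_full
        V = V_restr - V_full                       # restricted minus full covariance
        stat = float(d @ np.linalg.pinv(V) @ d)    # pinv guards against a singular difference
        df = len(b_restr)                          # parameters in the restricted model
        return stat, df, stats.chi2.sf(stat, df)

    # Placeholder inputs, only to show the mechanics of the test:
    b_full, b_restr = np.array([0.50, -0.20, 0.10]), np.array([0.52, -0.18, 0.12])
    V_full, V_restr = np.diag([0.010, 0.008, 0.009]), np.diag([0.012, 0.009, 0.011])
    print(hausman_iia(b_restr, V_restr, b_full, V_full))   # significant => IIA violated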
10. Multinomial Logit Models (6): IIA.
- We can do this in Stata; more in the computer session.
- The results may differ (the two tests do not always agree).
- We don't have an IIA assumption in the multinomial probit model (this model allows the response errors to correlate).
- However, there are some other complications regarding the multinomial probit model.
11. Multinomial Probit Models.
- Multinomial probit models are rarely used because of estimation difficulties.
- In principle, we can do something similar to what we did in the multinomial logit example: we have an equation for the binomial probit case, and we could add more equations for additional categories of Y. See notes from last week.
- The practical problem has to do with the evaluation of higher-order multivariate normal integrals (see equations from last week).
- The model is sometimes used for dependent variables with 3 categories, but estimation is computationally complicated beyond that.
- Stata 8 does not estimate multinomial probit. For what you can do with Stata 9, go to the computer session.
- See Greene, Econometric Analysis (third edition, p. 911) for details.
12. Ordered Logit Models (1): Setup.
- Also known as Proportional Odds models (for reasons you'll see below).
- Some multinomial variables are inherently ordered. Examples? Skill levels, "agree" questions, etc.
- Although the outcome is discrete, a multinomial logit model would not account for the ordinal nature of the outcome.
- The distance between categories is unknown (so OLS would be inappropriate).
- Ordered logit models are built around the same set of assumptions as binomial logit models.
- In ordered logit, the probability of an outcome is calculated from a linear function of the independent variables plus a set of cut points.
- The probability of observing Y = j for case i equals the probability that the estimated linear function is within the cut points estimated for that outcome:
- P(Yi = j) = P(κ(j-1) < (β1x1 + β2x2 + ... + βkxk + u) < κj)
- u is the logistically distributed error term; we estimate β1, ..., βk together with the cut points (κ) for each category of Y.
- The probability of observing Y = j for case i can be rewritten as (see the sketch below):
- P(Yi = j) = 1 / (1 + exp(-(κj - (β1x1 + ... + βkxk)))) - 1 / (1 + exp(-(κ(j-1) - (β1x1 + ... + βkxk))))
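- The sketch below (Python/statsmodels rather than the Stata commands used in the computer session; data are simulated and purely illustrative) fits this model and reports the slopes and cut-point terms.

    import numpy as np
    from statsmodels.miscmodels.ordinal_model import OrderedModel   # statsmodels >= 0.13

    rng = np.random.default_rng(2)
    n = 1000
    x = rng.normal(size=(n, 2))
    u = rng.logistic(size=n)                           # logistically distributed error
    ystar = 1.0 * x[:, 0] - 0.5 * x[:, 1] + u          # latent linear index
    y = np.digitize(ystar, [-1.0, 1.0])                # cut at -1 and 1 -> categories 0, 1, 2

    res = OrderedModel(y, x, distr='logit').fit(method='bfgs', disp=False)
    print(res.summary())                               # slope estimates and cut-point terms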
13. Ordered Logit Models (2): Interpretation
- We will obtain estimates of our cut points. These are the estimated values dividing the categories of our dependent variable.
- The coefficients on the independent variables are constrained to be the same across categories of the dependent variable, while the cut points are free to vary.
- Interpretation of the coefficients:
- The logits are cumulative logits that contrast, for example, the categories above category K with category K and below.
- A positive coefficient means a one-unit increase in the independent variable increases the odds of being in a higher category of the dependent variable.
- In other words, we estimate the effect of X in raising or lowering the odds of a response in category K or below.
14. Ordered Logit Models (2): Interpretation
- The Parallel Slopes Assumption (also known as the proportional odds assumption) requires that the separate equations for each category differ only in their intercepts.
- Which means that the slopes are assumed to be the same when going from each category to the next.
- In other words, the effect is a proportionate change in the odds of Yi ≤ K for all response categories. If a certain explanatory variable doubles the odds of being in category 1, it also doubles the odds of being in category 2 or below, or in category 3 or below (hence the alternative name of Proportional Odds model).
- We can test this assumption by comparing the fit of the ordered logit model with that of a multinomial logit model (see the sketch below).
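- A rough Python sketch of that comparison on simulated data (the lecture does this in Stata): treat the ordered logit as the constrained model and compare log-likelihoods, with df equal to the number of extra free parameters in the multinomial logit. The models are only approximately nested, so this is an approximate test.

    import numpy as np
    import statsmodels.api as sm
    from scipy import stats
    from statsmodels.miscmodels.ordinal_model import OrderedModel

    rng = np.random.default_rng(3)
    n = 1000
    x = rng.normal(size=(n, 2))
    y = np.digitize(1.0 * x[:, 0] - 0.5 * x[:, 1] + rng.logistic(size=n), [-1.0, 1.0])

    olog = OrderedModel(y, x, distr='logit').fit(method='bfgs', disp=False)
    mlog = sm.MNLogit(y, sm.add_constant(x)).fit(disp=False)

    lr = 2 * (mlog.llf - olog.llf)                     # improvement from dropping the constraint
    df = mlog.params.size - olog.params.size           # extra free parameters in the mlogit
    print(lr, df, stats.chi2.sf(lr, df))               # a large statistic casts doubt on parallel slopes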
15. Ordered Logit Models (3): Interpretation
- A more intuitive interpretation of the coefficients:
- We can also calculate the probability of a particular outcome, P(Yi = j), associated with a particular set of independent variable values. For example: the probability of Y = 1 when x1 = 33, x2 = 0, x3 = 0 and x4 = 0.
- We use the estimates for the coefficients and cut points:
- P(Yi = j) = 1 / (1 + exp(-(κj - (β1x1 + ... + βkxk)))) - 1 / (1 + exp(-(κ(j-1) - (β1x1 + ... + βkxk))))
- Stata can do it for us; more in the computer session. A worked sketch follows below.
- This is also a good way of thinking about interactions.
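- A worked Python sketch of this calculation (every coefficient and cut-point value below is invented; with real estimates you would substitute the fitted values):

    import numpy as np

    def ordered_logit_prob(j, cuts, xb):
        # P(Y = j) = Lambda(k_j - xb) - Lambda(k_(j-1) - xb), with k_0 = -inf, k_J = +inf
        k = np.concatenate(([-np.inf], cuts, [np.inf]))
        logistic_cdf = lambda z: 1 / (1 + np.exp(-z))
        return logistic_cdf(k[j + 1] - xb) - logistic_cdf(k[j] - xb)

    b = np.array([0.04, 0.7, -0.3, 0.5])       # hypothetical slopes for x1..x4
    cuts = np.array([-0.5, 1.2])               # hypothetical cut points (three categories)
    x = np.array([33.0, 0.0, 0.0, 0.0])        # x1 = 33, x2 = x3 = x4 = 0
    print(ordered_logit_prob(0, cuts, b @ x))  # probability of the lowest category at this profile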
16. Ordered Logit Models (3): Interpretation
- Significance of coefficients is measured the same way as in the binomial logit model (see notes from last week).
- Goodness of fit is measured the same way as in the binomial logit model, with chi-square tests and pseudo-R2 (see notes from last week).
17. Ordered Probit Models (1): Setup.
- Ordered probit models are built around the same set of assumptions as binomial probit models.
- In ordered probit, the probability of an outcome is calculated from a linear function of the independent variables plus a set of cut points.
- The probability of observing Y = j for case i equals the probability that the estimated linear function is within the cut points estimated for that outcome:
- P(Yi = j) = P(κ(j-1) < (β1x1 + β2x2 + ... + βkxk + u) < κj)
- u is the normally distributed error term; we estimate β1, ..., βk together with the cut points (κ) for each category of Y.
- The probability of observing Y = j for case i can be rewritten as (see the sketch below):
- P(Yi = j) = F(κj - (β1x1 + ... + βkxk)) - F(κ(j-1) - (β1x1 + ... + βkxk))
- F(·) is the standard normal cumulative distribution function.
- Note the similarity with the logit case.
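- The same calculation with the normal CDF (scipy's norm.cdf standing in for F; all coefficient and cut-point values below are hypothetical):

    import numpy as np
    from scipy.stats import norm

    b = np.array([0.02, 0.4, -0.2, 0.3])       # hypothetical slopes
    cuts = np.array([-0.3, 0.7])               # hypothetical cut points (three categories)
    xb = b @ np.array([33.0, 0.0, 0.0, 0.0])   # x1 = 33, x2 = x3 = x4 = 0

    k = np.concatenate(([-np.inf], cuts, [np.inf]))
    probs = norm.cdf(k[1:] - xb) - norm.cdf(k[:-1] - xb)   # P(Y = j) for j = 0, 1, 2
    print(probs, probs.sum())                              # the probabilities sum to 1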
18. Ordered Probit Models (2): Interpretation
- Estimates from the ordered probit model are usually very similar to estimates from the ordered logit model (especially if the coefficients have been standardized to correct for the different variances of the normal and logistic distributions).
- We will obtain estimates of our cut points. These are the estimated values dividing the categories of our dependent variable.
- Interpretation of the coefficients:
- A positive coefficient means a one-unit increase in the independent variable increases the odds of being in a higher category of the dependent variable.
- A more intuitive interpretation of the coefficients:
- We can also calculate the probability of a particular outcome, P(Yi = j), associated with a particular set of independent variable values. For example: the probability of Y = 1 when x1 = 33, x2 = 0, x3 = 0 and x4 = 0.
- We use the estimates for the coefficients and cut points:
- P(Yi = j) = F(κj - (β1x1 + ... + βkxk)) - F(κ(j-1) - (β1x1 + ... + βkxk))
- Stata can do it for us; more in the computer session.