Intermediate Social Statistics Lecture 4 - PowerPoint PPT Presentation

1
Intermediate Social Statistics: Lecture 4
  • Hilary Term 2006
  • Dr David Rueda

2
Today: Multinomial Logit and Probit Models;
Ordered Logit and Probit Models.
  • Some practical things.
  • Multinomial Logit: the set-up, probabilities,
    interpretation and the IIA assumption.
  • Multinomial Probit: brief explanation.
  • Ordered Logit: set-up and interpretation.
  • Ordered Probit: set-up and interpretation.

3
Practical Things
  • All lectures available from the course website:
  • http://users.ox.ac.uk/~polf0050/page7.html
  • Deadline for take-home exam:
  • 21st April (Friday, Noughth Week, Trinity Term).

4
Multinomial Logit Models (1)
  • Also known as polytomous logit (or logistic)
    models (because of the polytomous dependent
    variable).
  • This is the unordered dependent variable case.
  • Simple extension to the Logit model when the
    dependent variable can take more than 2
    categorical values.
  • Examples?
  • Multinomial logit models are multi-equation
    models.
  • A response variable with k categories will
    generate k-1 equations.
  • Each of these k-1 equations is a binary logistic
    regression comparing a group with the reference
    group.
  • Example: if our dependent variable Y = 0, 1, or 2
    and our reference category is 0, then we would
    have two logit functions, one for Y = 1 versus
    Y = 0 and another for Y = 2 versus Y = 0.
  • Multinomial logistic regression simultaneously
    estimates the k-1 logits (done through MLE).
  • Same properties as those of the Logit model (see
    notes from last week).

5
Multinomial Logit Models (2) Setup.
  • y = 1, …, J → P1, P2, …, PJ
  • Our dependent variable has several categories and
    each category has a probability associated with
    it.
  • P1 + P2 + … + PJ = 1
  • The probabilities for all the categories of Y
    (all the possible outcomes for our dependent
    variable) sum to 1.
  • There is no order within the categories of Y (any
    of them can be the baseline for comparison).
  • Stata's default is to choose the most common
    category as the baseline.
  • The choice is arbitrary but should be
    theoretically motivated.
  • The multinomial logit is equivalent to running a
    series of binomial logits.
  • Imagine the outcome is voting and the options are
    Labour, Conservative, LibDem, etc.
  • The multinomial logit is then the equivalent of
    running a series of binomial logits: Con vs. Lab,
    LibDem vs. Lab, etc.
  • For details about the likelihood function and how
    to maximize it, see Borooah, Logit and Probit
    (Sage, 2002).

6
Multinomial Logit Models (3) Setup.
  • The multinomial logit model is analogous to a
    logistic regression model, except that:
  • The probability distribution of the response is
    multinomial instead of binomial, and we have K - 1
    equations instead of one.
  • The K - 1 multinomial logit equations contrast
    each of categories 1, 2, …, K - 1 with category
    K, whereas the single logistic regression
    equation is a contrast between successes and
    failures.
  • If K = 2 the multinomial logit model reduces to
    the usual logistic regression model.
  • Theoretically, we could simply fit a series of
    separate binary logit models to find the
    coefficients, but these models would not give us
    a single overall measure of the deviance.
  • For details about the likelihood function and how
    to maximize it, see Borooah, Logit and Probit
    (Sage, 2002).

7
Multinomial Logit Models (4) Probabilities.
  • McFadden ("Conditional Logit Analysis of
    Qualitative Choice Behavior", 1973) shows (in the
    case of three categories):
  • Pr(y = 1) = exp(β1x) / (exp(β1x) + exp(β2x) + exp(β3x))
  • Pr(y = 2) = exp(β2x) / (exp(β1x) + exp(β2x) + exp(β3x))
  • Pr(y = 3) = exp(β3x) / (exp(β1x) + exp(β2x) + exp(β3x))
  • Since the probabilities of all categories add to
    1, only J - 1 of them can be estimated
    independently.
  • We set one of the logits to 0 (it doesn't matter
    which one).
  • If we set β1 = 0, the coefficients represent the
    change relative to the Y = 1 category. They
    become relative risk ratios (assessments of the
    probability change relative to the probability of
    Y = 1).
  • The equations for the probabilities become:
  • Pr(y = 1) = 1 / (1 + exp(β2x) + exp(β3x))
  • Pr(y = 2) = exp(β2x) / (1 + exp(β2x) + exp(β3x))
  • Pr(y = 3) = exp(β3x) / (1 + exp(β2x) + exp(β3x))
  • The relative probability of Y = 2 to the base
    category Y = 1:
  • Pr(Y = 2) / Pr(Y = 1) = exp(β2x)
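The probability formulas above can be checked numerically. A minimal sketch in Python; all linear-predictor values are invented for illustration:

```python
import math

# Hypothetical linear predictors bj*x for three categories,
# with category 1 as the base (its logit is fixed at 0).
xb = {1: 0.0, 2: 0.4, 3: -0.7}  # b2x and b3x are invented values

# Pr(y = j) = exp(bj*x) / sum over k of exp(bk*x)
denom = sum(math.exp(v) for v in xb.values())
p = {j: math.exp(v) / denom for j, v in xb.items()}

print(p)                             # the three category probabilities
print(sum(p.values()))               # probabilities sum to 1
print(p[2] / p[1], math.exp(xb[2]))  # Pr(Y=2)/Pr(Y=1) equals exp(b2x)
```

Setting the base category's logit to 0 is exactly the normalisation described on the slide: only the two remaining logits carry free parameters.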

8
Multinomial Logit Models (5) Interpretation.
  • We will get k-1 sets of estimates. One set of
    estimates for the effects of the independent
    variables for each comparison with the reference
    level.
  • The sign of a coefficient estimate reflects the
    direction of change in the risk ratio (the ratio
    P(Y = k) / P(Y = 1)) in response to a ceteris
    paribus change in the variable to which the
    coefficient is attached.
  • It does not reflect the direction of change in
    the individual probabilities P(Y = k).
  • Odds ratios: the odds ratios are simply the
    exponentiated coefficients.
  • They mean that each one-unit change in the
    independent variable of interest multiplies the
    odds of Y = k versus Y = 1 (taking Y = 1 as our
    reference) by a factor of the odds ratio, holding
    the other independent variables constant.
  • Significance of coefficients: similar to the
    binomial logit model (see notes from last week).
  • Goodness of fit: similar to the binomial logit
    model (see notes from last week).
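The odds-ratio interpretation above can be verified directly from the probability formulas. A sketch with invented intercepts and slopes for a single covariate x:

```python
import math

# Hypothetical model: one covariate x, base category 1 (beta1 = 0).
b2, b3 = 0.5, -0.3   # invented slope coefficients for categories 2 and 3
a2, a3 = 0.1, 0.2    # invented intercepts

def probs(x):
    # category probabilities [P(Y=1), P(Y=2), P(Y=3)] at covariate value x
    e = [1.0, math.exp(a2 + b2 * x), math.exp(a3 + b3 * x)]
    s = sum(e)
    return [v / s for v in e]

p_lo, p_hi = probs(2.0), probs(3.0)
# A one-unit increase in x multiplies the odds P(Y=2)/P(Y=1)
# by exp(b2), holding everything else constant.
rr_change = (p_hi[1] / p_hi[0]) / (p_lo[1] / p_lo[0])
print(rr_change, math.exp(b2))  # the two numbers coincide
```

The individual probabilities P(Y = 2) at x = 2 and x = 3 move by a different amount; only the ratio against the base category changes by exactly exp(b2), which is the point of the slide.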

9
Multinomial Logit Models (6) IIA.
  • The IIA assumption:
  • IIA = Independence of Irrelevant Alternatives.
  • Assumption: the odds between any two categories
    of the dependent variable are independent of the
    other alternatives.
  • Back to the voting example: this assumption means
    that multinomial logit models assume that the
    odds of voting for, say, Conservative vs. Labour
    are not dependent on the addition or deletion of
    other categories in the model.
  • Testing the IIA assumption:
  • Hausman and Small-Hsiao tests.
  • H0: the difference in coefficients is not
    systematic.
  • Estimate the full model with k outcomes.
  • Estimate a restricted model by eliminating one or
    more of the k categories.
  • Test the difference between the two, which is a
    chi-square test with df = number of parameters in
    the restricted model.
  • Significant values indicate violation of the
    assumption (i.e. the difference between the two
    models is systematic).
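The Hausman logic above can be sketched for the simplest case of a single coefficient (df = 1). Every number below is invented; in practice Stata computes the full matrix version across all coefficients:

```python
import math

# Invented estimates: b_full/V_full from the full model, b_restr/V_restr
# from a restricted model that drops one outcome category.
b_full, V_full = 0.42, 0.010
b_restr, V_restr = 0.45, 0.012

# Hausman statistic: (b_r - b_f)' (V_r - V_f)^(-1) (b_r - b_f);
# with one parameter this is a scalar and df = 1.
chi2 = (b_restr - b_full) ** 2 / (V_restr - V_full)

# Chi-square survival function for df = 1: P(X > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi2 / 2.0))
print(chi2, p_value)  # a small p-value would indicate an IIA violation
```

Here the coefficients barely move when a category is dropped, so the p-value is large and IIA is not rejected; a coefficient that shifted sharply would drive the statistic up.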

10
Multinomial Logit Models (6) IIA.
  • We can do this in Stata, more in computer
    session.
  • The two tests may give different results.
  • We don't have an IIA assumption in the
    multinomial probit model (this model allows the
    response errors to correlate).
  • However, there are some other complications
    regarding the multinomial probit model.

11
Multinomial Probit Models.
  • Multinomial probit models are rarely used because
    of estimation difficulties.
  • In principle, we can do something similar to what
    we have done in the multinomial logit example.
    We have an equation for the binomial probit case,
    and we could add more equations for additional
    categories of Y. See notes from last week.
  • The practical problem has to do with the
    evaluation of higher-order multivariate normal
    integrals (see equations from last week).
  • It is sometimes used for dependent variables with
    3 categories, but estimation is computationally
    complicated after that.
  • Stata 8 does not estimate multinomial probit.
    For what you can do with Stata 9, see the
    computer session.
  • See Greene, Econometric Analysis (third edition,
    p. 911) for details.

12
Ordered Logit Models (1) Setup.
  • Also known as Proportional Odds models (for
    reasons you'll see below).
  • Some multinomial variables are inherently
    ordered. Examples? Skill levels, agree-disagree
    questions, etc.
  • Although the outcome is discrete, a multinomial
    logit model could not account for the ordinal
    nature of the outcome.
  • The distance between categories is unknown (so
    OLS would be inappropriate).
  • Ordered logit models are built around the same
    set of assumptions as binomial logit models.
  • In ordered logit, the probability of an outcome
    is calculated as a linear function of the
    independent variables plus a set of cut points.
  • The probability of observing Y = j for
    observation i equals the probability that the
    estimated linear function is within the cut
    points estimated for that outcome:
  • P(Yi = j) = P(Kj-1 < (b1x1 + b2x2 + … + bkxk + u) < Kj)
  • u is the logistically distributed error term; we
    estimate b1, …, bk together with the cut points
    (K) for each category of Y.
  • The probability of observing Yi = j can be
    rewritten as:
  • P(Yi = j) = 1/(1 + exp(-Kj + b1x1 + … + bkxk)) -
    1/(1 + exp(-Kj-1 + b1x1 + … + bkxk))
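The two-term formula above can be computed directly. A sketch for a three-category outcome; the cut points and the linear predictor are invented for illustration:

```python
import math

# Ordered logit with 3 outcome categories, so two cut points K1 < K2.
K = [-0.8, 1.1]  # hypothetical cut points
xb = 0.3         # hypothetical value of b1x1 + ... + bkxk for one observation

def cdf_logistic(z):
    # logistic CDF, with guards for the infinite end points
    if z == math.inf:
        return 1.0
    if z == -math.inf:
        return 0.0
    return 1.0 / (1.0 + math.exp(-z))

# P(Y = j) = Lambda(Kj - xb) - Lambda(Kj-1 - xb),
# taking K0 = -infinity and K3 = +infinity for the end categories.
cuts = [-math.inf] + K + [math.inf]
p = [cdf_logistic(cuts[j] - xb) - cdf_logistic(cuts[j - 1] - xb)
     for j in range(1, 4)]
print(p, sum(p))  # the three category probabilities sum to 1
```

The bottom and top categories use only one finite cut point each, which is why the end points are padded with infinities.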

13
Ordered Logit Models (2) Interpretation
  • We will obtain estimates of our cut points.
    These are the estimated values dividing the
    categories in our dependent variable.
  • We constrain the coefficients to be the same
    across the categories of the dependent variable,
    while the cut points are free to vary.
  • Interpretation of the coefficients:
  • The logits are cumulative logits that contrast
    the categories above category k, for example,
    with category k and below.
  • A positive coefficient means a one-unit increase
    in the independent variable has the effect of
    increasing the odds of being in a higher category
    of the dependent variable.
  • In other words, we estimate the effects of X in
    raising or lowering the odds of a response in
    category k or below.

14
Ordered Logit Models (2) Interpretation
  • The Parallel Slopes Assumption (also known as the
    proportional odds assumption) requires that the
    separate equations for each category differ only
    in their intercepts.
  • This means that the slopes are assumed to be the
    same when going from each category to the next.
  • In other words, the effect is a proportionate
    change in the odds of Yi ≤ k for all response
    categories. If a certain explanatory variable
    doubles the odds of being in category 1, it also
    doubles the odds of being in category 2 or below,
    or in category 3 or below (hence the alternative
    name of Proportional Odds model).
  • We can test this assumption by comparing the fit
    of the ordered logit model with that of a
    multinomial logit model.
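The parallel slopes property can be seen directly from the model's cumulative odds. A sketch with an invented four-category model:

```python
import math

# Under the proportional-odds model, odds(Y <= j | x) = exp(Kj - b*x),
# so a one-unit change in x multiplies every cumulative odds by exp(-b).
K = [-0.5, 0.9, 2.0]  # hypothetical cut points for a 4-category outcome
b = 0.7               # hypothetical slope

def cum_odds(j, x):
    # odds of (Y <= j) given x, i.e. P(Y <= j) / P(Y > j)
    return math.exp(K[j] - b * x)

# Ratio of cumulative odds at x + 1 versus x, at every cut point:
ratios = [cum_odds(j, 1.0) / cum_odds(j, 0.0) for j in range(3)]
print(ratios)  # all three ratios equal exp(-0.7): the "parallel slopes"
```

If the fitted data violated this assumption, the three ratios would differ, which is what a comparison against the (unconstrained) multinomial logit fit picks up.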

15
Ordered Logit Models (3) Interpretation
  • A more intuitive interpretation of the
    coefficients:
  • We can also calculate the probability of a
    particular outcome, P(Yi = j), associated with a
    particular set of independent variable values.
    For example: the probability of Y = 1 when
    x1 = 33, x2 = 0, x3 = 0 and x4 = 0.
  • We use the estimates for the coefficients and cut
    points:
  • P(Yi = j) = 1/(1 + exp(-Kj + b1x1 + … + bkxk)) -
    1/(1 + exp(-Kj-1 + b1x1 + … + bkxk))
  • Stata can do it for us; more in the computer
    session.
  • This is also a good way of thinking about
    interactions.

16
Ordered Logit Models (3) Interpretation
  • Significance of coefficients is measured the same
    way as with the binomial logit model (see notes
    from last week).
  • Goodness of fit is measured the same way as with
    the binomial logit model with chi-square tests
    and pseudo R2 (see notes from last week).

17
Ordered Probit Models (1) Setup.
  • Ordered probit models are built around the same
    set of assumptions as binomial probit models.
  • In ordered probit, the probability of an outcome
    is calculated as a linear function of the
    independent variables plus a set of cut points.
  • The probability of observing Y = j for
    observation i equals the probability that the
    estimated linear function is within the cut
    points estimated for that outcome:
  • P(Yi = j) = P(Kj-1 < (b1x1 + b2x2 + … + bkxk + u) < Kj)
  • u is the normally distributed error term; we
    estimate b1, …, bk together with the cut points
    (K) for each category of Y.
  • The probability of observing Yi = j can be
    rewritten as:
  • P(Yi = j) = F(Kj - (b1x1 + … + bkxk)) -
    F(Kj-1 - (b1x1 + … + bkxk))
  • F() is the standard normal cumulative
    distribution function.
  • Note the similarity with the logit case.
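The similarity with the logit case shows up when the same calculation is done with the normal CDF F in place of the logistic one. The cut points and linear predictor below are invented:

```python
import math

# Ordered probit with 3 outcome categories: two cut points, same
# structure as the ordered logit sketch but with the normal CDF F.
K = [-0.8, 1.1]  # hypothetical cut points
xb = 0.3         # hypothetical b1x1 + ... + bkxk for one observation

def norm_cdf(z):
    # standard normal CDF via the complementary error function;
    # erfc handles the +/- infinity end points directly
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# P(Y = j) = F(Kj - xb) - F(Kj-1 - xb), with K0 = -inf and K3 = +inf
cuts = [-math.inf] + K + [math.inf]
p = [norm_cdf(cuts[j] - xb) - norm_cdf(cuts[j - 1] - xb)
     for j in range(1, 4)]
print(p, sum(p))  # category probabilities; they sum to 1
```

Only the link function changes between the two models, which is why the coefficient estimates are usually so similar once rescaled.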

18
Ordered Probit Models (2) Interpretation
  • Estimates from the ordered probit model are
    usually very similar to estimates from the
    ordered logit model (especially if the
    coefficients have been standardized to correct
    for the different variances in the normal and
    logistic distributions).
  • We will obtain estimates of our cut points.
    These are the estimated values dividing the
    categories in our dependent variable.
  • Interpretation of the coefficients
  • A positive coefficient means a one-unit increase
    in the independent variable has the effect of
    increasing the odds of being in a higher category
    of the dependent variable.
  • A more intuitive interpretation of the
    coefficients:
  • We can also calculate the probability of a
    particular outcome, P(Yi = j), associated with a
    particular set of independent variable values.
    For example: the probability of Y = 1 when
    x1 = 33, x2 = 0, x3 = 0 and x4 = 0.
  • We use the estimates for the coefficients and cut
    points:
  • P(Yi = j) = F(Kj - (b1x1 + … + bkxk)) -
    F(Kj-1 - (b1x1 + … + bkxk))
  • Stata can do it for us; more in the computer
    session.