Logit - PowerPoint PPT Presentation

Provided by: davidl135

1
Logit and Probit Models
  • Theory and Estimation

2
Linear Probability Model
  • The linear probability model is the OLS model
    applied to a dichotomous dependent variable
  • Recall the OLS model: $Y_i = X_i\beta + e_i$
  • Recall also that $E(Y_i \mid X_i) = X_i\beta$
  • Since Y is dichotomous, its expectation is a
    proportion, $E(Y_i \mid X_i) = P(Y_i = 1 \mid X_i)$,
    so the fitted value is the probability
  • Interpretation of the coefficients is
    straightforward
  • A one-unit increase in X is associated with a β
    increase in the probability of an event occurring
  • The relationship is linear, so the impact of X on
    the probability of Y is constant

3
Example: Swedish EURO Referenda
(sweden_class.dta)
  • . reg yesno age   /* regress euro vote on age */

          Source |       SS       df       MS              Number of obs =    9936
    -------------+------------------------------           F(  1,  9934) =   15.72
           Model |   3.9140087     1   3.9140087           Prob > F      =  0.0001
        Residual |  2473.23001  9934  .248966178           R-squared     =  0.0016
    -------------+------------------------------           Adj R-squared =  0.0015
           Total |  2477.14402  9935   .24933508           Root MSE      =  .49897

    ------------------------------------------------------------------------------
           yesno |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   -.001222   .0003082    -3.96   0.000    -.0018262   -.0006179
           _cons |   2.868081   .6038954     4.75   0.000     1.684324    4.051839
    ------------------------------------------------------------------------------

  • Interpretation: as age increases by one year, the
    expected probability of voting for the referendum
    decreases by about .001.
  • predict yhat, xb   /* generate predicted values */
  • (484 missing values generated)

5
Problems with the Linear Probability Model
  • Non-normal errors
  • Since Y takes only two possible values, the
    residual (e) can also take on only two values
  • If Y = 1 then e = 1 - p(x), with probability p(x)
  • If Y = 0 then e = -p(x), with probability 1 - p(x)
  • The distribution of e will have mean 0 and
    variance equal to p(x)[1 - p(x)]
  • Note: normality is not required for the estimates
    to be unbiased, but it is necessary for efficiency
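
The mean and variance stated above follow from the two-point distribution of e. A short derivation, writing p for p(x) and taking e = 1 - p when Y = 1 and e = -p when Y = 0:

```latex
\begin{aligned}
E(e) &= p\,(1-p) + (1-p)\,(-p) = 0 \\
\operatorname{Var}(e) &= E(e^2) = p\,(1-p)^2 + (1-p)\,p^2
                       = p\,(1-p)\,\bigl[(1-p) + p\bigr] = p\,(1-p)
\end{aligned}
```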

6
  • Non-Constant Error Variance
  • Since the variance of e is p(x)[1 - p(x)], we have
    non-constant variance: the variance is a
    function of the value of X
  • This means that the OLS estimator for the linear
    probability model is inefficient and the standard
    errors are biased, resulting in incorrect
    hypothesis tests
  • Non-Linearity
  • Because the model is linear, the predicted
    probability increases by the same amount each time
    X goes up one unit, regardless of the value of X
  • This assumption is often met over a limited range
    of values of X
  • We often expect the impact of an X variable on
    the probability of Y to diminish as X increases
    (or decreases)
  • For example, the likelihood of owning a house
    increases as income increases, but at a decreasing
    rate
  • Nonsensical Predictions
  • The linear model can create predicted values that
    are not bounded by zero and one. This clearly
    does not make sense.
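
A minimal sketch of the nonsensical-prediction problem, using hypothetical toy data (the values of `xs` and `ys` are invented for illustration and are not from the exit poll). It fits the linear probability model by closed-form OLS and lists the fitted values that fall outside the unit interval:

```python
# Hypothetical toy data (not from the slides): x could be income in
# arbitrary units, y = 1 if the household owns a house.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

def ols_fit(xs, ys):
    """Closed-form OLS with one regressor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

intercept, slope = ols_fit(xs, ys)
fitted = [intercept + slope * x for x in xs]

# Fitted "probabilities" below 0 or above 1
out_of_bounds = [(x, round(p, 3)) for x, p in zip(xs, fitted) if p < 0 or p > 1]
print(out_of_bounds)
```

On these toy data the smallest and largest values of x receive fitted values below zero and above one, respectively.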

7
Logit and Probit Models: A Latent Variable
Approach
  • A latent variable approach treats the use of a
    dichotomous variable essentially as a measurement
    problem
  • There is a continuous underlying, latent,
    variable (denoted Y*), but we cannot observe, and
    therefore cannot measure, it.
  • Rather, we observe a dichotomous indicator of
    that latent variable
  • For example, there is an underlying propensity
    for an individual to vote, for a nation to go to
    war, for a student to cheat. However, we only
    observe the outcome (the action), not the
    underlying propensity
  • The underlying model is $Y_i^* = X_i\beta + e_i$

8
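  • But we only observe the following realizations of
    Y, which depend on the latent variable
    $Y_i^* = X_i\beta + e_i$:
    $Y_i = 1$ if $Y_i^* > 0$, and $Y_i = 0$ if $Y_i^* \le 0$
  • We can write
    $P(Y_i = 1) = P(Y_i^* > 0) = P(X_i\beta + e_i > 0) = P(e_i > -X_i\beta) = P(e_i \le X_i\beta)$
  • The last equality holds because the $e_i$ are
    distributed symmetrically.
  • In words, we can say that Y = 1 if the random part
    is less than or equal to the systematic part
  • The problem is figuring out the probability. This
    requires the use of some distribution.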
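
The symmetry step can be checked by simulation. The sketch below assumes, purely for illustration, standard normal errors and a hypothetical systematic part `xb = 0.7`; any distribution symmetric around zero would serve. It compares the share of draws with a positive latent variable against the share of draws with e at or below the systematic part:

```python
import random

random.seed(12345)   # fixed seed so the run is reproducible
xb = 0.7             # hypothetical value of the systematic part
n = 200_000

# Symmetric errors; the normal distribution is an assumption made here
draws = [random.gauss(0.0, 1.0) for _ in range(n)]

share_latent_positive = sum(1 for e in draws if xb + e > 0) / n   # P(Y* > 0)
share_e_below_xb = sum(1 for e in draws if e <= xb) / n           # P(e <= xb)

print(round(share_latent_positive, 3), round(share_e_below_xb, 3))
```

The two shares agree up to simulation noise, which is what the symmetry argument predicts.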

9
Logit
  • If we assume that e follows a standard logistic
    distribution, then we get the logit model
  • Standard logistic distribution, the pdf:
    $\lambda(e) = \frac{\exp(e)}{[1 + \exp(e)]^2}$
  • Cumulative distribution function for the standard
    logistic distribution:
    $\Lambda(e) = \frac{\exp(e)}{1 + \exp(e)}$
  • The standard logistic distribution is symmetrical
    around zero.
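
As a small illustration, the standard logistic density and cumulative distribution function can be written out directly and the symmetry around zero checked numerically. The function names are chosen here for the sketch:

```python
import math

def logistic_pdf(e):
    """Density of the standard logistic distribution."""
    return math.exp(e) / (1.0 + math.exp(e)) ** 2

def logistic_cdf(e):
    """Cumulative distribution function of the standard logistic distribution."""
    return math.exp(e) / (1.0 + math.exp(e))

for e in (0.5, 1.0, 2.0):
    # Symmetry around zero: pdf(-e) equals pdf(e), and cdf(-e) equals 1 - cdf(e)
    assert math.isclose(logistic_pdf(-e), logistic_pdf(e))
    assert math.isclose(logistic_cdf(-e), 1.0 - logistic_cdf(e))

print(logistic_cdf(0.0))  # half of the probability mass lies below zero: 0.5
```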

10
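  • Recall that $P(Y_i = 1 \mid X_i) = P(e_i \le X_i\beta)$
  • Assuming a standard logistic distribution for e,
    we can write this as
    $P(Y_i = 1 \mid X_i) = \frac{\exp(X_i\beta)}{1 + \exp(X_i\beta)}$
  • We can write this out for every observation in
    our sample in terms of the conditional
    expectation of Y given the value(s) of X. The
    likelihood for a given observation is
    $L_i = P(Y_i = 1 \mid X_i)^{Y_i}\,[1 - P(Y_i = 1 \mid X_i)]^{1 - Y_i}$
  • Observations with Y = 1 contribute P(Y = 1 | X) to
    the likelihood, while those with Y = 0 contribute
    P(Y = 0 | X).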

11
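  • Assuming independent observations, we can take the
    product over all N observations to get the
    overall likelihood, writing $P_i = P(Y_i = 1 \mid X_i)$:
    $L = \prod_{i=1}^{N} P_i^{Y_i} (1 - P_i)^{1 - Y_i}$
  • Taking the natural logarithm results in
    $\ln L = \sum_{i=1}^{N} \left[ Y_i \ln P_i + (1 - Y_i) \ln(1 - P_i) \right]$
  • Now maximize this log-likelihood with respect to
    the βs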
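
The maximization can be sketched in a few lines. This is one possible approach (Newton-Raphson with an intercept and a single regressor), run on hypothetical toy data invented for the illustration; it is not a description of how any particular package fits the model:

```python
import math

# Hypothetical toy data, not from the slides.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

def prob(b0, b1, x):
    """P(Y = 1 | x) under the logit model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def log_likelihood(b0, b1):
    """Sum of y*ln(p) + (1 - y)*ln(1 - p) over all observations."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = prob(b0, b1, x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_logit(max_iter=25, tol=1e-10):
    """Newton-Raphson on the log-likelihood; returns (b0, b1, history)."""
    b0, b1 = 0.0, 0.0
    history = [log_likelihood(b0, b1)]
    for _ in range(max_iter):
        # Gradient (score) and information matrix X'WX for two parameters
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = prob(b0, b1, x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (X'WX) * step = gradient
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        history.append(log_likelihood(b0, b1))
        if abs(history[-1] - history[-2]) < tol:
            break
    return b0, b1, history

b0, b1, history = fit_logit()
for i, ll in enumerate(history):
    print(f"Iteration {i}: log likelihood = {ll:.4f}")
```

The printed iteration log starts from the log likelihood with both coefficients at zero and stops once the log likelihood no longer changes.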

12
Probit Models
  • The standard normal distribution has mean zero and
    unit variance. Its density is
    $\phi(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right)$
  • The cumulative distribution function is
    $\Phi(z) = \int_{-\infty}^{z} \phi(u)\,du$

13
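  • The probability for a probit, with Φ the standard
    normal cumulative distribution function, is
    $P(Y_i = 1 \mid X_i) = \Phi(X_i\beta)$
  • With a log likelihood of
    $\ln L = \sum_{i=1}^{N} \left[ Y_i \ln \Phi(X_i\beta) + (1 - Y_i) \ln(1 - \Phi(X_i\beta)) \right]$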
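
A minimal sketch of the probit probability and log likelihood, with the standard normal CDF written in terms of the error function. The toy data and the trial coefficient values are hypothetical and chosen only to show that a sensible slope raises the log likelihood above its value at zero:

```python
import math

def normal_cdf(z):
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_log_likelihood(b0, b1, xs, ys):
    """Sum of y*ln(Phi(xb)) + (1 - y)*ln(1 - Phi(xb)) over all observations."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = normal_cdf(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

# Hypothetical toy data, not from the slides.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

ll_at_zero = probit_log_likelihood(0.0, 0.0, xs, ys)     # every p equals 0.5
ll_at_trial = probit_log_likelihood(-3.0, 0.55, xs, ys)  # hypothetical trial values
print(ll_at_zero, ll_at_trial)
```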

14
What do these distributions look like?
  • . set obs 600
  • obs was 0, now 600
  • . egen x = fill(-300 -299)
  • . replace x = x/100
  • (599 real changes made)
  • . gen probit = 1/sqrt(2*3.1415)*exp(-((x^2)/2))
  • . gen logit = (exp(x))/(1+exp(x))^2
  • . twoway (connected probit x) (connected logit x)
  • Logit has fatter tails; that is the major
    difference between the two
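
The same comparison can be made numerically. The sketch below evaluates both densities at a few points (the points are chosen arbitrarily for illustration); near zero the normal density is higher, while further out the logistic density is higher, which is the fatter-tails pattern:

```python
import math

def normal_pdf(x):
    """Standard normal density (the 'probit' curve)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def logistic_pdf(x):
    """Standard logistic density (the 'logit' curve)."""
    return math.exp(x) / (1.0 + math.exp(x)) ** 2

for x in (0.0, 1.0, 2.0, 3.0):
    print(f"x = {x:.1f}  probit density = {normal_pdf(x):.4f}  "
          f"logit density = {logistic_pdf(x):.4f}")
```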

15
Cumulative distribution function
  • gen cumul_logit = sum(logit)    /* running sum of the density; divide by 100 (grid step 0.01) for a 0-1 scale */
  • gen cumul_probit = sum(probit)
  • twoway (connected cumul_probit x) (connected
    cumul_logit x)

16
Which is Better? Logit or Probit?
  • From an empirical standpoint, logits and probits
    typically yield similar estimates of the relevant
    derivatives
  • This is because the cumulative distribution
    functions for the two models differ only slightly,
    and only in the tails of their respective
    distributions
  • The derivatives are different only if there are
    enough observations in the tails of the
    distribution
  • While the derivatives are usually similar, the
    parameter estimates associated with the two
    models are not
  • Multiplying the logit estimates by 0.625 makes
    them comparable to the probit estimates
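
The rescaling rule can be tried on the coefficients reported in the Simple Logit Model and Simple Probit Model slides of this deck. Only the two coefficients visible in the transcript (eu and gender) are used:

```python
# Coefficients as reported in the logit and probit output slides
logit_coefs = {"eu": 3.661967, "gender": 0.3624785}
probit_coefs = {"eu": 2.110938, "gender": 0.2146035}

rescaled = {name: 0.625 * b for name, b in logit_coefs.items()}

for name in logit_coefs:
    print(f"{name}: 0.625 * logit = {rescaled[name]:.4f}, probit = {probit_coefs[name]:.4f}")
```

The rescaled logit coefficients land close to, though not exactly on, the probit estimates.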

17
Hypothesis Testing
  • Logit and probit models are part of the
    'binomial' family in the generalized linear model
    (GLM) framework.
  • All GLMs are fit using MLE and provide a
    framework that we will use later when we add
    panel and time-series considerations.
  • A key component in hypothesis testing of GLM models
    is the likelihood: both the initial likelihood
    and the final likelihood.
  • The likelihood also provides information
    regarding goodness of fit. In GLM models we can
    construct a measure called the deviance ($G^2$),
    which is computed as $G^2 = -2\log_e L$
  • The deviance is similar to the residual sum of
    squares from OLS.

18
  • Hypothesis tests and confidence intervals are
    standard across all MLE models.
  • Tests for individual slopes are based on the Wald
    statistic
  • Tests that several slopes are jointly equal to
    zero are based on the generalized
    likelihood-ratio test (based on the deviance) and
    have a $\chi^2$ distribution. This is similar to the
    F-test from OLS, where the difference in ESS from
    a nested model is compared to the ESS from the
    comparison model, with degrees of freedom
    dependent on the number of parameters being
    tested
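
As a worked check, the test statistics can be recomputed from the numbers printed in the Simple Logit Model slide of this deck: the log likelihood at iteration 0, the final log likelihood, and the coefficient and standard error for eu. The pseudo R2 line assumes the McFadden definition, 1 minus the ratio of final to initial log likelihood:

```python
ll_initial = -5672.4066   # log likelihood at iteration 0
ll_final = -4048.3928     # log likelihood at convergence

deviance_initial = -2.0 * ll_initial
deviance_final = -2.0 * ll_final

# Likelihood-ratio statistic: the drop in deviance between the two models
lr_chi2 = deviance_initial - deviance_final

# Pseudo R2 (McFadden definition, assumed here)
pseudo_r2 = 1.0 - ll_final / ll_initial

# Wald z statistic for a single slope: coefficient divided by its standard error
wald_z_eu = 3.661967 / 0.0945649

print(round(lr_chi2, 2), round(pseudo_r2, 4), round(wald_z_eu, 2))
```

The three results reproduce the LR chi2(4), Pseudo R2, and z values shown in the logit output.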

19
Example: EURO referendum, Sweden, September 2003
VALU 2003 / exit polls from 80 polling places.
  • Dataset is a subset of the exit poll (44
    questions in total).
  • N = 10,732.
  • sweden_class.dta
  • Question of interest: how did you vote in the
    referendum today? Yes means that Sweden should
    adopt the Euro; No means that
    Sweden will maintain the status quo.
  • Outcome: the referendum was defeated.
  • Substantively interesting for lots of reasons;
    useful for this class because there are lots of
    questions that are coded on a nominal, ordinal
    and ratio scale.

20
    Contains data from C:\Documents and Settings\Administrator\Desktop\class_sweden.dta
      obs:        10,732                          Extract from Swedish Exit Poll Data
     vars:            14                          8 Sep 2004 11:39
     size:       203,908 (99.7% of memory free)
    -------------------------------------------------------------------------------
                  storage  display     value
    variable name   type   format      label         variable label
    -------------------------------------------------------------------------------
    eu              byte   %40.0g      eu            Do you think Sweden should
                                                       resign from the EU or stay in
                                                       the Union
    party           byte   %39.0g      party         What political party would you
                                                       vote for in a parliamentary
                                                       election today
    gender          byte   %14.0g      gender        Gender
    birthyear       int    %14.0g      birth_year    What year were you born
    citizen         byte   %14.0g      citizen       Are you a Swedish citizen

21
    trust           byte   %14.0g      trust         Generally speaking, how much
                                                       trust do you have for
                                                       politicians
    employed        byte   %67.0g      employment    What is your employment
                                                       situation
    immigration     byte   %33.0g      imm_vote      How important was the issue of
                                                       immigration for how you decided
                                                       to vote
    democracy       byte   %33.0g      dem_vote      How important was democracy for
                                                       how you decided to vote
    interestrate    byte   %33.0g      intrate_vote  How important was the
                                                       possibility for Sweden to
                                                       decided its interest rate for
                                                       ho
    ownecon         byte   %33.0g      ownecon_vote  How important was the question
                                                       of your own economy for how you

22
Variable and Value Labels
  • Variable labels allow you to add a label that
    contains a description of the variable in the
    dataset
  • label var yesno "Yes vote for Euro"
  • Value labels allow you to label the values that
    an ordinal or nominal variable takes
  • label define eu 1 "Sweden should resign from the
    EU" 2 "Sweden should remain a member of the EU" 3
    "No opinion on the matter" 9 "No information"
  • label values v12 eu
  • . tab eu

     Do you think Sweden should resign from |
                the EU or stay in the Union |      Freq.     Percent        Cum.
    ----------------------------------------+-----------------------------------
           Sweden should resign from the EU |      2,499       28.16       28.16
     Sweden should remain a member of the E |      6,375       71.84      100.00
    ----------------------------------------+-----------------------------------
                                      Total |      8,874      100.00

23
Simple Logit Model
  • . logit yesno eu gender birthyear citizen

    Iteration 0:   log likelihood = -5672.4066
    Iteration 1:   log likelihood = -4144.9001
    Iteration 2:   log likelihood = -4055.7904
    Iteration 3:   log likelihood = -4048.5039
    Iteration 4:   log likelihood = -4048.3928
    Iteration 5:   log likelihood = -4048.3928

    Logit estimates                                   Number of obs   =       8196
                                                      LR chi2(4)      =    3248.03
                                                      Prob > chi2     =     0.0000
    Log likelihood = -4048.3928                       Pseudo R2       =     0.2863

    ------------------------------------------------------------------------------
           yesno |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              eu |   3.661967   .0945649    38.72   0.000     3.476623    3.847311
          gender |   .3624785   .0547282     6.62   0.000     .2552131    .4697439

24
Simple Probit Model
  • . probit yesno eu gender birthyear citizen

    Iteration 0:   log likelihood = -5672.4066
    Iteration 1:   log likelihood = -4131.6314
    Iteration 2:   log likelihood = -4049.9967
    Iteration 3:   log likelihood = -4047.6447
    Iteration 4:   log likelihood = -4047.6407

    Probit estimates                                  Number of obs   =       8196
                                                      LR chi2(4)      =    3249.53
                                                      Prob > chi2     =     0.0000
    Log likelihood = -4047.6407                       Pseudo R2       =     0.2864

    ------------------------------------------------------------------------------
           yesno |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              eu |   2.110938   .0458988    45.99   0.000     2.020978    2.200898
          gender |   .2146035   .0320565     6.69   0.000     .1517739    .2774331
