Probit and Logit Models

Slides: 76

Provided by: mprc8

Learn more at: https://www3.nd.edu

1
Section 3

Probit and Logit Models

2
Dichotomous Data

Suppose data is discrete but there are only 2
outcomes
Examples
Graduate high school or not
Patient dies or not
Working or not
Smoker or not
In data, $y_i = 1$ if yes, $y_i = 0$ if no

3
How to model the data generating process?

There are only two outcomes
Research question: What factors impact whether
the event occurs?
To answer, we will model the probability that the
outcome occurs:
$\Pr(y_i = 1)$ when $y_i = 1$, or
$\Pr(y_i = 0) = 1 - \Pr(y_i = 1)$ when $y_i = 0$

Think of the problem from an MLE perspective
Likelihood for the ith observation:
$L_i = \Pr(y_i = 1)^{y_i} \, [1 - \Pr(y_i = 1)]^{(1 - y_i)}$
When $y_i = 1$, the only relevant part is $\Pr(y_i = 1)$
When $y_i = 0$, the only relevant part is $1 - \Pr(y_i = 1)$

$L = \sum_i \ln L_i$
$= \sum_i \left[ y_i \ln \Pr(y_i = 1) + (1 - y_i) \ln \Pr(y_i = 0) \right]$
Notice that up to this point, the model is
generic. The log likelihood function will be
determined by the assumptions concerning how we
determine $\Pr(y_i = 1)$
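
The generic log likelihood above can be sketched in a few lines. This is only an illustration: the outcomes and probabilities below are made-up toy values, and the function name is hypothetical. Any model (probit, logit, or otherwise) would simply supply its own probabilities.

```python
import math

def bernoulli_log_likelihood(y, p):
    """Sum over observations of y*ln(p) + (1 - y)*ln(1 - p),
    where p is the modeled Pr(y = 1) for each observation."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

# Hypothetical toy data: four observations and their modeled Pr(y = 1)
y = [1, 0, 0, 1]
p = [0.8, 0.3, 0.1, 0.6]

ll = bernoulli_log_likelihood(y, p)
print(ll)  # always negative, since every probability is below 1
```

When y = 1 only the ln(p) term survives, and when y = 0 only the ln(1 - p) term survives, which mirrors the two cases listed above.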

6
Modeling the probability

There is some process (biological, social,
decision theoretic, etc) that determines the
outcome y
Some of the variables impacting are observed,
some are not
Requires that we model how these factors impact
the probabilities
Model from a latent variable perspective

Consider a woman's decision to work
$y_i^*$ = the person's net benefit to work
Two components of $y_i^*$
Characteristics that we can measure
Education, age, income of spouse, prices of child
care
Some we cannot measure
How much you like spending time with your kids,
how much you like/hate your job

We aggregate these two components into one
equation
$y_i^* = \beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k + \varepsilon_i$
$= x_i\beta + \varepsilon_i$
$x_i\beta$ = measurable characteristics (but with
uncertain weights)
$\varepsilon_i$ = random unmeasured characteristics
Decision rule: the person will work if $y_i^* > 0$
(if net benefits are positive)
$y_i = 1$ if $y_i^* > 0$
$y_i = 0$ if $y_i^* \le 0$

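$y_i = 1$ if $y_i^* > 0$
$y_i^* = x_i\beta + \varepsilon_i > 0$ only if
$\varepsilon_i > -x_i\beta$
$y_i = 0$ if $y_i^* \le 0$
$y_i^* = x_i\beta + \varepsilon_i \le 0$ only if
$\varepsilon_i \le -x_i\beta$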

Suppose $x_i\beta$ is big.
High wages
Low husband's income
Low cost of child care
We would expect this person to work, UNLESS,
there is some unmeasured variable that
counteracts this

Suppose a mom really likes spending time with her
kids, or she hates her job.
The unmeasured benefit of working, $\varepsilon_i$, is a big
negative number
If we observe her working, $\varepsilon_i$ must not have been
too negative, since
$y_i = 1$ if $\varepsilon_i > -x_i\beta$

Consider the opposite. Suppose we observe
someone NOT working.
Then $\varepsilon_i$ must not have been big, since
$y_i = 0$ if $\varepsilon_i \le -x_i\beta$

13
Logit

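Recall $y_i = 1$ if $\varepsilon_i > -x_i\beta$
Suppose $\varepsilon_i$ follows a logistic distribution with CDF $F$
$\Pr(\varepsilon_i > -x_i\beta) = 1 - F(-x_i\beta)$
The logistic is also a symmetric distribution, so
$1 - F(-x_i\beta)$
$= F(x_i\beta)$
$= \exp(x_i\beta)/(1 + \exp(x_i\beta))$

When $\varepsilon_i$ follows a logistic distribution
$\Pr(y_i = 1) = \exp(x_i\beta)/(1 + \exp(x_i\beta))$
$\Pr(y_i = 0) = 1/(1 + \exp(x_i\beta))$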
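
As a quick numerical check of the logit probabilities, the sketch below evaluates both expressions for one index value and confirms the symmetry step. The index value 0.4 is an arbitrary assumption chosen for illustration, not an estimate from any data.

```python
import math

def logistic_cdf(z):
    """Logistic CDF: exp(z) / (1 + exp(z))."""
    return math.exp(z) / (1.0 + math.exp(z))

xb = 0.4  # hypothetical value of the index x*beta

p1 = logistic_cdf(xb)              # Pr(y = 1)
p0 = 1.0 / (1.0 + math.exp(xb))    # Pr(y = 0)
symmetric = 1.0 - logistic_cdf(-xb)  # 1 - F(-xb), which should equal F(xb)

print(p1, p0, symmetric)
```

The two probabilities sum to one, and `symmetric` matches `p1`, which is the symmetry property used in the derivation.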

15
Example: Workplace smoking bans

Smoking supplements to the 1991 and 1993 National
Health Interview Survey
Asked all respondents whether they currently
smoke
Asked workers about workplace tobacco policies
Sample: workers
Key variables: current smoking and whether they
faced a workplace ban

Data: workplace1.dta
Sample program: workplace1.doc
Results: workplace1.log

17
Description of variables in data

. desc
              storage  display    value
variable name   type   format     label      variable label
-------------------------------------------------------------------------
smoker          byte   %9.0g                 is current smoking
worka           byte   %9.0g                 has workplace smoking bans
age             byte   %9.0g                 age in years
male            byte   %9.0g                 male
black           byte   %9.0g                 black
hispanic        byte   %9.0g                 hispanic
incomel         float  %9.0g                 log income
hsgrad          byte   %9.0g                 is hs graduate
somecol         byte   %9.0g                 has some college
college         float  %9.0g
-------------------------------------------------------------------------

18
Summary statistics

. sum
    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+------------------------------------------------------
      smoker |   16258      .25163    .433963          0          1
       worka |   16258    .6851396   .4644745          0          1
         age |   16258    38.54742   11.96189         18         87
        male |   16258    .3947595    .488814          0          1
       black |   16258    .1119449   .3153083          0          1
-------------+------------------------------------------------------
    hispanic |   16258    .0607086   .2388023          0          1
     incomel |   16258    10.42097   .7624525   6.214608   11.22524
      hsgrad |   16258    .3355271   .4721889          0          1
     somecol |   16258    .2685447   .4432161          0          1
     college |   16258    .3293763   .4700012          0          1

19
Running a probit

probit smoker age incomel male black hispanic
hsgrad somecol college worka
The first variable after probit is the discrete
outcome, the rest of the variables are the
independent variables
Includes a constant as a default

20
Running a logit

logit smoker age incomel male black hispanic
hsgrad somecol college worka
Same as probit, just change the first word

21
Running linear probability

reg smoker age incomel male black hispanic hsgrad
somecol college worka, robust
Simple regression.
Default standard errors are incorrect (heteroskedasticity)
The robust option produces standard errors that are
valid under an arbitrary form of heteroskedasticity

22
Probit Results

Probit estimates                          Number of obs   =      16258
                                          LR chi2(9)      =     819.44
                                          Prob > chi2     =     0.0000
Log likelihood = -8761.7208               Pseudo R2       =     0.0447
------------------------------------------------------------------------------
      smoker |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0012684   .0009316    -1.36   0.173    -.0030943    .0005574
     incomel |   -.092812   .0151496    -6.13   0.000    -.1225047   -.0631193
        male |   .0533213   .0229297     2.33   0.020     .0083799    .0982627
       black |  -.1060518    .034918    -3.04   0.002      -.17449   -.0376137
    hispanic |  -.2281468   .0475128    -4.80   0.000    -.3212701   -.1350235
      hsgrad |  -.1748765   .0436392    -4.01   0.000    -.2604078   -.0893453
     somecol |   -.363869   .0451757    -8.05   0.000    -.4524118   -.2753262
     college |  -.7689528   .0466418   -16.49   0.000     -.860369   -.6775366
       worka |  -.2093287   .0231425    -9.05   0.000    -.2546873   -.1639702
       _cons |    .870543    .154056     5.65   0.000     .5685989    1.172487
------------------------------------------------------------------------------

23
How to measure fit?

Regression (OLS)
minimize sum of squared errors
Or, maximize R2
The model is designed to maximize predictive
capacity
Not the case with Probit/Logit
MLE models pick distribution parameters so as to
best describe the data generating process
May or may not predict the outcome well

24
Pseudo R2

$LL_k$ = log likelihood with all variables
$LL_1$ = log likelihood with only a constant
$0 > LL_k > LL_1$, so $|LL_k| < |LL_1|$
Pseudo $R^2 = 1 - LL_k/LL_1$
Bounded between 0 and 1
Not anything like an $R^2$ from a regression
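
The pseudo R2 in the probit output can be reproduced from two reported numbers. The sketch below uses the usual McFadden form (one minus the ratio of the full-model to the constant-only log likelihood). It assumes that the reported LR chi2 equals twice the gap between the two log likelihoods, which is the standard definition of that statistic; the constant-only log likelihood is backed out from it rather than read from the output.

```python
# Values copied from the probit output
ll_full = -8761.7208   # log likelihood with all variables
lr_chi2 = 819.44       # LR chi2(9), assumed to be 2 * (ll_full - ll_const)

# Back out the constant-only log likelihood
ll_const = ll_full - lr_chi2 / 2.0

pseudo_r2 = 1.0 - ll_full / ll_const
print(round(ll_const, 2), round(pseudo_r2, 4))
```

Because both log likelihoods are negative and the full model's is closer to zero, the ratio lies between 0 and 1 and so does the pseudo R2.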

25
Predicting Y

Let $b$ be the estimated value of $\beta$
For any candidate vector $x_i$, we can predict
probabilities, $P_i$
$P_i = \Phi(x_i b)$
Once you have $P_i$, pick a threshold value, $T$, so
that you predict
$Y_p = 1$ if $P_i > T$
$Y_p = 0$ if $P_i \le T$
Then compare predictions with outcomes: fraction correctly predicted

Question: what value to pick for T?
Can pick .5
Intuitive. More likely to engage in the activity
than to not engage in it
However, when $\bar{y}$ (the sample mean of the outcome) is small, this criterion does
a poor job of predicting $Y_i = 1$
And when $\bar{y}$ is close to 1, this criterion
does a poor job of predicting $Y_i = 0$

* predict probability of smoking
predict pred_prob_smoke
* get detailed descriptive data about predicted prob
sum pred_prob, detail
* predict binary outcome with 50% cutoff
gen pred_smoke1=pred_prob_smoke>.5
label variable pred_smoke1 "predicted smoking, 50% cutoff"
* compare actual values
tab smoker pred_smoke1, row col cell

. sum pred_prob, detail
                         Pr(smoker)
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0959301       .0615221
 5%     .1155022       .0622963
10%     .1237434       .0633929       Obs               16258
25%     .1620851       .0733495       Sum of Wgt.       16258

50%     .2569962                      Mean           .2516653
                        Largest       Std. Dev.      .0960007
75%     .3187975       .5619798
90%     .3795704       .5655878       Variance       .0092161
95%     .4039573       .5684112       Skewness       .1520254
99%     .4672697       .6203823       Kurtosis       2.149247

Notice two things
The sample mean of the predicted probabilities is
close to the sample mean outcome
99% of the probabilities are less than .5
Should predict few smokers if we use a 50% cutoff

           |  predicted smoking,
is current |      50% cutoff
   smoking |         0          1 |     Total
-----------+----------------------+----------
         0 |    12,153         14 |    12,167
           |     99.88       0.12 |    100.00
           |     74.93      35.90 |     74.84
           |     74.75       0.09 |     74.84
-----------+----------------------+----------
         1 |     4,066         25 |     4,091
           |     99.39       0.61 |    100.00
           |     25.07      64.10 |     25.16
           |     25.01       0.15 |     25.16
-----------+----------------------+----------
     Total |    16,219         39 |    16,258
           |     99.76       0.24 |    100.00
           |    100.00     100.00 |    100.00
           |     99.76       0.24 |    100.00

Check on-diagonal elements.
The last number in each cell of the 2x2 table is the
fraction of the sample in that cell
The model correctly predicts 74.75% + 0.15% =
74.90% of the obs
It only predicts a small fraction of smokers

Do not be amazed by the 75 percent correct
prediction
If you said everyone has a $\bar{y}$ chance of smoking (a
case of no covariates), you would be correct
$\max(\bar{y}, 1 - \bar{y})$ percent of the time

In this case, 25.16% smoke.
If everyone had the same chance of smoking, we
would assign everyone $\Pr(y = 1) = .2516$
We would be correct for the $1 - .2516 = 0.7484$ share of
people who do not smoke
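
The comparison between the model and the no-covariate baseline can be redone directly from the four cell counts in the cross-tabulation. The sketch below is plain arithmetic on those counts; the variable names are illustrative.

```python
# Cell counts from the tabulation of actual vs. predicted smoking (50% cutoff)
true_neg, false_pos = 12153, 14   # actual non-smokers: predicted 0, predicted 1
false_neg, true_pos = 4066, 25    # actual smokers:     predicted 0, predicted 1

n = true_neg + false_pos + false_neg + true_pos

accuracy = (true_neg + true_pos) / n          # share correctly predicted
smoke_rate = (false_neg + true_pos) / n       # sample smoking rate
baseline = max(smoke_rate, 1.0 - smoke_rate)  # predict "non-smoker" for everyone
smokers_caught = true_pos / (false_neg + true_pos)

print(n, round(accuracy, 4), round(baseline, 4), round(smokers_caught, 4))
```

The model is right for about 74.9 percent of observations, barely above the 74.8 percent obtained by predicting that nobody smokes, and it identifies well under 1 percent of actual smokers.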

34
Key points about prediction

MLE models are not designed to maximize
prediction
Should not be surprised they do not predict well
In this case, not particularly good measures of
predictive capacity

35
Translating coefficients in probit: Continuous
Covariates

$\Pr(y_i = 1) = \Phi[\beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k]$
Suppose that $x_{1i}$ is a continuous variable
$\partial \Pr(y_i = 1) / \partial x_{1i} = ?$
What is the change in the probability of an event
given a change in $x_{1i}$?

36
Marginal Effect

$\partial \Pr(y_i = 1) / \partial x_{1i}$
$= \beta_1 \, \phi[\beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k]$
Notice two things: the marginal effect is a function
of the other parameters and of the values of x.
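
To make the formula concrete, the sketch below evaluates the probit marginal effect at the sample means, using the coefficients and means reported in the probit and summary output earlier. Evaluating at the means is one common choice and is an assumption here; the dictionary layout and names are illustrative.

```python
from statistics import NormalDist

# Probit coefficients and sample means copied from the earlier output
coef = {
    "age": -0.0012684, "incomel": -0.092812, "male": 0.0533213,
    "black": -0.1060518, "hispanic": -0.2281468, "hsgrad": -0.1748765,
    "somecol": -0.363869, "college": -0.7689528, "worka": -0.2093287,
}
const = 0.870543
means = {
    "age": 38.54742, "incomel": 10.42097, "male": 0.3947595,
    "black": 0.1119449, "hispanic": 0.0607086, "hsgrad": 0.3355271,
    "somecol": 0.2685447, "college": 0.3293763, "worka": 0.6851396,
}

std_normal = NormalDist()  # standard normal: mean 0, sd 1
xb = const + sum(coef[name] * means[name] for name in coef)

p_at_means = std_normal.cdf(xb)   # predicted Pr(smoker) at the means
density = std_normal.pdf(xb)      # phi(x*b), the scaling factor

me_age = coef["age"] * density          # effect of one more year of age
me_incomel = coef["incomel"] * density  # effect of one more log point of income

print(round(p_at_means, 4), round(me_age, 7), round(me_incomel, 5))
```

Every continuous covariate's coefficient is scaled by the same density term, so the marginal effects keep the sign and relative size of the coefficients but change with the point at which they are evaluated.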

37
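Translating Coefficients: Discrete Covariates

$\Pr(y_i = 1) = \Phi[\beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k]$
Suppose that $x_{2i}$ is a dummy variable (1 if yes, 0
if no)
A marginal effect makes no sense: we cannot change $x_{2i}$
by a little amount. It is either 1 or 0.
Redefine the quantity of interest. Compare
outcomes with and without $x_{2i}$

$y_1 = \Pr(y_i = 1 \mid x_{2i} = 1)$
$= \Phi[\beta_0 + x_{1i}\beta_1 + \beta_2 + x_{3i}\beta_3 + \dots]$
$y_0 = \Pr(y_i = 1 \mid x_{2i} = 0)$
$= \Phi[\beta_0 + x_{1i}\beta_1 + x_{3i}\beta_3 + \dots]$
Marginal effect $= y_1 - y_0$:
the difference in probabilities with and without $x_{2i}$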
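
The with-and-without comparison can be sketched for the workplace ban dummy (worka) using the probit coefficients and sample means reported earlier. Holding the other covariates at their sample means is an assumption made for illustration; names are illustrative.

```python
from statistics import NormalDist

# Probit coefficients copied from the earlier output
coef = {
    "age": -0.0012684, "incomel": -0.092812, "male": 0.0533213,
    "black": -0.1060518, "hispanic": -0.2281468, "hsgrad": -0.1748765,
    "somecol": -0.363869, "college": -0.7689528, "worka": -0.2093287,
}
const = 0.870543

# Sample means of every covariate except the dummy being switched
other_means = {
    "age": 38.54742, "incomel": 10.42097, "male": 0.3947595,
    "black": 0.1119449, "hispanic": 0.0607086, "hsgrad": 0.3355271,
    "somecol": 0.2685447, "college": 0.3293763,
}

normal_cdf = NormalDist().cdf
base_index = const + sum(coef[k] * other_means[k] for k in other_means)

p_ban = normal_cdf(base_index + coef["worka"])  # worka = 1
p_no_ban = normal_cdf(base_index)               # worka = 0
effect = p_ban - p_no_ban

print(round(p_no_ban, 4), round(p_ban, 4), round(effect, 4))
```

The result is a drop of roughly 6.7 percentage points, which differs slightly from what the derivative formula would give for the same coefficient because the change from 0 to 1 is not infinitesimal.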

39
In STATA

Marginal effects for continuous variables: STATA
evaluates them at the sample means of the Xs
Change in probabilities for dichotomous covariates:
STATA also evaluates them at the sample means of the Xs

40
STATA command for Marginal Effects

mfx compute
Must be run after the estimation command, while the
estimates are still active in the program.

Marginal effects after probit
      y  = Pr(smoker) (predict)
         =  .24093439
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.      z    P>|z|  [    95% C.I.    ]       X
---------+--------------------------------------------------------------------
     age |  -.0003951      .00029    -1.36   0.173  -.000964   .000174  38.5474
 incomel |  -.0289139      .00472    -6.13   0.000   -.03816  -.019668   10.421
   male* |   .0166757       .0072     2.32   0.021   .002568   .030783   .39476
  black* |  -.0320621      .01023    -3.13   0.002  -.052111  -.012013  .111945
hispanic*|  -.0658551      .01259    -5.23   0.000  -.090536  -.041174  .060709
 hsgrad* |   -.053335      .01302    -4.10   0.000   -.07885   -.02782  .335527
somecol* |  -.1062358      .01228    -8.65   0.000  -.130308  -.082164  .268545
college* |  -.2149199      .01146   -18.76   0.000  -.237378  -.192462  .329376
  worka* |  -.0668959      .00756    -8.84   0.000   -.08172  -.052072   .68514
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

42
Interpret results

A 10% increase in income will reduce smoking by about 0.29
percentage points (incomel is log income, so a 10% increase
raises incomel by about 0.10, and 0.10 x -.0289 is about -.0029)
A 10 year increase in age will decrease smoking
rates by about .4 percentage points
Those with a college degree are 21.5 percentage
points less likely to smoke
Those that face a workplace smoking ban have a 6.7
percentage point lower probability of smoking

Do not confuse percentage point and percent
differences
A 6.7 percentage point drop is about 28% of the sample
mean of 24 percent.
Blacks have smoking rates that are 3.2 percentage
points lower than others, which is 13 percent of
the sample mean
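
The distinction is easy to check with arithmetic. The sketch below converts the two percentage point effects into percent changes relative to the 24 percent base rate quoted above; variable names are illustrative.

```python
base_rate_pct = 24.0      # base smoking rate, in percent

ban_effect_pp = 6.7       # workplace ban effect, in percentage points
black_effect_pp = 3.2     # black indicator effect, in percentage points

# Percent change = percentage point change relative to the base rate
ban_relative_pct = 100.0 * ban_effect_pp / base_rate_pct
black_relative_pct = 100.0 * black_effect_pp / base_rate_pct

print(round(ban_relative_pct, 1), round(black_relative_pct, 1))
```

A 6.7 percentage point drop from a 24 percent base is a fall of roughly 28 percent, and a 3.2 point drop is roughly 13 percent.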

44
Comparing Marginal Effects
Variable LP Probit Logit
age -0.00040 -0.00048 -0.00048
incomel -0.0289 -0.0287 -0.0276
male 0.0167 0.0168 0.0172
Black -0.0321 -0.0357 -0.0342
hispanic -0.0658 -0.0706 -0.0602
hsgrad -0.0533 -0.0661 -0.0514
college -0.2149 -0.2406 -0.2121
worka -0.0669 -0.0661 -0.0658
45
When will results differ?

Normal and logistic CDFs look
Similar around the midpoint of the distribution
Different in the tails
You obtain more observations in the tails of the
distribution when
Sample sizes are large
$\bar{y}$ (the sample mean of the outcome) approaches 1 or 0
These situations will produce more differences in
estimates
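
A rough numerical comparison illustrates the point. The sketch below compares the standard normal CDF with a logistic CDF rescaled to unit variance; that rescaling (multiplying the argument by pi/sqrt(3)) is an assumption made here so the two are on a comparable footing, and the evaluation points 0.5 and -3 are arbitrary.

```python
import math
from statistics import NormalDist

def unit_variance_logistic_cdf(z):
    """Logistic CDF rescaled so the distribution has variance 1."""
    return 1.0 / (1.0 + math.exp(-z * math.pi / math.sqrt(3)))

normal_cdf = NormalDist().cdf

# Near the middle: absolute gap between the two CDFs
mid_gap = abs(normal_cdf(0.5) - unit_variance_logistic_cdf(0.5))

# In the tail: ratio of the probabilities below -3
tail_ratio = unit_variance_logistic_cdf(-3.0) / normal_cdf(-3.0)

print(round(mid_gap, 3), round(tail_ratio, 2))
```

Near the center the two CDFs differ by a couple of percentage points, while three standard deviations out the logistic puts several times more mass in the tail than the normal does.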

46
Some nice properties of the Logit

Outcome, y = 1 or 0
Treatment, x = 1 or 0
Other covariates, z
Context,
y = whether a baby is born with a low birth
weight
x = whether the mom smoked or not during pregnancy

Risk ratio
$RR = \Pr(y = 1 \mid x = 1) / \Pr(y = 1 \mid x = 0)$
Compares the probability of an event
when x is and is not observed
How much does smoking elevate the chance your
child will be a low weight birth?

Let $Y_{yx}$ be the probability that y = 1 or 0 given x = 1 or
0
Think of the risk ratio the following way
$Y_{11}$ is the probability that Y = 1 when X = 1
$Y_{10}$ is the probability that Y = 1 when X = 0
$Y_{11} = RR \cdot Y_{10}$

Odds Ratio
$OR = A/B = [Y_{11}/Y_{01}] / [Y_{10}/Y_{00}]$
$A = \Pr(Y = 1 \mid X = 1) / \Pr(Y = 0 \mid X = 1)$
= odds of Y occurring if you are a smoker
$B = \Pr(Y = 1 \mid X = 0) / \Pr(Y = 0 \mid X = 0)$
= odds of Y occurring if you are not a
smoker
What are the relative odds of Y happening if you
do or do not experience X?

Suppose $\Pr(Y_i = 1) = F(\beta_0 + \beta_1 X_i + \beta_2 Z)$ and $F$ is
the logistic function
Can show that
$OR = \exp(\beta_1) = e^{\beta_1}$
This number is typically reported by most
statistical packages

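Details
$Y_{11} = \exp(\beta_0 + \beta_1 + \beta_2 Z) / (1 + \exp(\beta_0 + \beta_1 + \beta_2 Z))$
$Y_{10} = \exp(\beta_0 + \beta_2 Z) / (1 + \exp(\beta_0 + \beta_2 Z))$
$Y_{01} = 1 / (1 + \exp(\beta_0 + \beta_1 + \beta_2 Z))$
$Y_{00} = 1 / (1 + \exp(\beta_0 + \beta_2 Z))$
$Y_{11}/Y_{01} = \exp(\beta_0 + \beta_1 + \beta_2 Z)$
$Y_{10}/Y_{00} = \exp(\beta_0 + \beta_2 Z)$
$OR = A/B = [Y_{11}/Y_{01}] / [Y_{10}/Y_{00}]$
$= \exp(\beta_0 + \beta_1 + \beta_2 Z) / \exp(\beta_0 + \beta_2 Z)$
$= \exp(\beta_1)$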
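
The cancellation can also be checked numerically. In the sketch below the intercept, the treatment coefficient, and the three values for the covariate term are all hypothetical numbers chosen for illustration; the point is only that the odds ratio comes out as exp(b1) no matter what the other covariates contribute.

```python
import math

def logistic(v):
    """Logistic function: exp(v) / (1 + exp(v))."""
    return 1.0 / (1.0 + math.exp(-v))

# Hypothetical parameter values, not estimates from any data
b0, b1 = -2.0, 0.7

odds_ratios = []
for b2z in (-1.0, 0.0, 2.0):          # hypothetical values of b2 * Z
    y11 = logistic(b0 + b1 + b2z)     # Pr(Y = 1 | X = 1)
    y10 = logistic(b0 + b2z)          # Pr(Y = 1 | X = 0)
    odds_treated = y11 / (1.0 - y11)
    odds_untreated = y10 / (1.0 - y10)
    odds_ratios.append(odds_treated / odds_untreated)

print(odds_ratios, math.exp(b1))
```

All three odds ratios equal exp(0.7), even though the underlying probabilities differ widely across the three covariate values.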

Suppose Y is rare, $\bar{y}$ close to 0
$\Pr(Y = 0 \mid X = 1)$ and $\Pr(Y = 0 \mid X = 0)$ are both close to 1,
so they cancel
Therefore, when $\bar{y}$ is close to 0
Odds Ratio $\approx$ Risk Ratio
Why is this nice?

53
Population attributable risk

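Average outcome in the population
$\bar{y} = (1 - \bar{x}) Y_{10} + \bar{x} Y_{11} = (1 - \bar{x}) Y_{10} + \bar{x} (RR) Y_{10}$
where $\bar{x}$ is the fraction of the population with X = 1
Average outcomes are a weighted average of
outcomes for X = 0 and X = 1
What would the average outcome be in the absence
of X (e.g., reduce smoking rates to 0)?
$Y_a = Y_{10}$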

54
Population Attributable Risk

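PAR
Fraction of outcome attributed to X
The difference between the current rate and the
rate that would exist without X, divided by the
current rate
$PAR = (\bar{y} - Y_a)/\bar{y}$
$= (RR - 1)\bar{x} / [(1 - \bar{x}) + RR\,\bar{x}]$
where $\bar{y}$ is the average outcome and $\bar{x}$ is the fraction with X = 1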

55
Example: Maternal Smoking and Low Weight Births

6% of births are low weight
< 2500 grams (about 5.5 lbs)
Average birth is 3300 grams (about 7.3 lbs)
Maternal smoking during pregnancy has been
identified as a key cofactor
13% of mothers smoke
This number was falling about 1 percentage point
per year during the 1980s/90s
Doubles the chance of a low weight birth

56
Natality detail data

Census of all births (4 million/year)
Annual files starting in the 60s
Information about
Baby (birth weight, length, date, sex,
plurality, birth injuries)
Demographics (age, race, marital, educ of mom)
Birth (who delivered, method of delivery)
Health of mom (smoke/drank during preg, weight
gain)

Smoking not available from CA or NY
3 million usable observations
I pulled a .5% random sample from 1995
About 12,500 obs
Variables: birthweight (grams), smoked, married,
4-level race, 5-level education, mother's age at
birth

-------------------------------------------------------------------------------
              storage  display    value
variable name   type   format     label      variable label
-------------------------------------------------------------------------------
birthw          int    %9.0g                 birth weight in grams
smoked          byte   %9.0g                 =1 if mom smoked during pregnancy
age             byte   %9.0g                 moms age at birth
married         byte   %9.0g                 =1 if married
race4           byte   %9.0g                 1=white,2=black,3=asian,4=other
educ5           byte   %9.0g                 1=0-8, 2=9-11, 3=12, 4=13-15, 5=16+
visits          byte   %9.0g                 prenatal visits
-------------------------------------------------------------------------------

     dummy |
 variable, |
      1 if |   1 if mom smoked
   BW<2500 |   during pregnancy
     grams |         0          1 |     Total
-----------+----------------------+----------
         0 |    11,626      1,745 |    13,371
           |     86.95      13.05 |    100.00
           |     94.64      89.72 |     93.96
           |     81.70      12.26 |     93.96
-----------+----------------------+----------
         1 |       659        200 |       859
           |     76.72      23.28 |    100.00
           |      5.36      10.28 |      6.04
           |      4.63       1.41 |      6.04
-----------+----------------------+----------
     Total |    12,285      1,945 |    14,230
           |     86.33      13.67 |    100.00
           |    100.00     100.00 |    100.00

Notice a few things
13.7% of women smoke
6% have a low weight birth
Pr(LBW | Smoke) = 10.28%
Pr(LBW | No smoke) = 5.36%
RR
= Pr(LBW | Smoke) / Pr(LBW | No smoke)
= 0.1028/0.0536 = 1.92
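
Both the risk ratio and the unadjusted odds ratio can be computed directly from the four cell counts in the cross-tabulation. The sketch below is plain arithmetic on those counts; variable names are illustrative.

```python
# Cell counts from the tabulation of low birth weight by maternal smoking
normal_nonsmoker, normal_smoker = 11626, 1745   # birth weight >= 2500 grams
low_nonsmoker, low_smoker = 659, 200            # birth weight < 2500 grams

p_low_smoker = low_smoker / (low_smoker + normal_smoker)
p_low_nonsmoker = low_nonsmoker / (low_nonsmoker + normal_nonsmoker)

risk_ratio = p_low_smoker / p_low_nonsmoker

odds_smoker = low_smoker / normal_smoker
odds_nonsmoker = low_nonsmoker / normal_nonsmoker
odds_ratio = odds_smoker / odds_nonsmoker

print(round(p_low_smoker, 4), round(p_low_nonsmoker, 4),
      round(risk_ratio, 2), round(odds_ratio, 2))
```

The risk ratio is about 1.92 and the odds ratio about 2.02. They are close because low weight births are rare in both groups, so the odds and the probabilities are nearly the same.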

61
Logit results

Log likelihood = -3136.9912                       Pseudo R2       =     0.0330
------------------------------------------------------------------------------
       lowbw |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
      smoked |   .6740651   .0897869     7.51   0.000     .4980861    .8500441
         age |   .0080537    .006791     1.19   0.236    -.0052564    .0213638
     married |  -.3954044   .0882471    -4.48   0.000    -.5683654   -.2224433
   _Ieduc5_2 |  -.1949335   .1626502    -1.20   0.231    -.5137221    .1238551
   _Ieduc5_3 |  -.1925099   .1543239    -1.25   0.212    -.4949791    .1099594
   _Ieduc5_4 |  -.4057382   .1676759    -2.42   0.016    -.7343769   -.0770994
   _Ieduc5_5 |  -.3569715   .1780322    -2.01   0.045    -.7059081   -.0080349
   _Irace4_2 |   .7072894   .0875125     8.08   0.000     .5357681    .8788107
   _Irace4_3 |    .386623    .307062     1.26   0.208    -.2152075    .9884535
   _Irace4_4 |   .3095536   .2047899     1.51   0.131    -.0918271    .7109344
       _cons |  -2.755971   .2104916   -13.09   0.000    -3.168527   -2.343415
------------------------------------------------------------------------------

62
Odds Ratios

Smoked
exp(0.674) = 1.96
Smokers have about twice the odds of a low weight
birth
_Irace4_2 (Blacks)
exp(0.707) = 2.02
Blacks have about twice the odds of a low weight
birth

63
Asking for odds ratios

logistic y x1 x2
In this case
xi: logistic lowbw smoked age married i.educ5
i.race4

Log likelihood = -3136.9912                       Pseudo R2       =     0.0330
------------------------------------------------------------------------------
       lowbw | Odds Ratio   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
      smoked |   1.962198   .1761796     7.51   0.000     1.645569     2.33975
         age |   1.008086   .0068459     1.19   0.236     .9947574    1.021594
     married |   .6734077   .0594262    -4.48   0.000     .5664506    .8005604
   _Ieduc5_2 |   .8228894   .1338431    -1.20   0.231     .5982646    1.131852
   _Ieduc5_3 |   .8248862   .1272996    -1.25   0.212     .6095837    1.116233
   _Ieduc5_4 |   .6664847   .1117534    -2.42   0.016     .4798043    .9257979
   _Ieduc5_5 |   .6997924   .1245856    -2.01   0.045     .4936601    .9919973
   _Irace4_2 |   2.028485   .1775178     8.08   0.000      1.70876    2.408034
   _Irace4_3 |   1.472001   .4519957     1.26   0.208     .8063741    2.687076
   _Irace4_4 |   1.362817   .2790911     1.51   0.131     .9122628    2.035893
------------------------------------------------------------------------------
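
The odds ratio table is a transformation of the logit coefficient table, which can be verified for the smoked row. The sketch below exponentiates the coefficient and its confidence limits; the standard error line uses the delta method approximation (odds ratio times the coefficient's standard error), which is an assumption about how the reported figure is obtained, though it matches the output.

```python
import math

# "smoked" row from the logit coefficient output
coef, se = 0.6740651, 0.0897869
ci_low_coef, ci_high_coef = 0.4980861, 0.8500441

odds_ratio = math.exp(coef)

# Delta method approximation for the standard error of exp(coef)
se_odds_ratio = odds_ratio * se

# Confidence limits are the exponentiated limits of the coefficient interval
ci_odds_ratio = (math.exp(ci_low_coef), math.exp(ci_high_coef))

print(round(odds_ratio, 6), round(se_odds_ratio, 6),
      tuple(round(v, 6) for v in ci_odds_ratio))
```

Note that the z statistic and p-value are identical in both tables: the test is still of whether the coefficient is zero, equivalently whether the odds ratio is one.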

65
PAR

$PAR = (RR - 1)\bar{x} / [(1 - \bar{x}) + RR\,\bar{x}]$
$\bar{x} = 0.137$ (fraction of mothers who smoke)
$RR = 1.96$
$PAR = 0.116$
11.6% of low weight births are attributed to maternal
smoking
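
The population attributable risk calculation can be reproduced with the two inputs given above. The sketch below computes it two ways: from the closed-form expression, and from the definition as the gap between the current rate and the no-smoking rate. The baseline rate for non-smokers is taken from the earlier cross-tabulation (5.36 percent) and cancels out of the final answer.

```python
smoke_share = 0.137   # fraction of mothers who smoke
rr = 1.96             # risk ratio (approximated by the odds ratio)

# Closed form: (RR - 1) * x / [(1 - x) + RR * x]
par = (rr - 1.0) * smoke_share / ((1.0 - smoke_share) + rr * smoke_share)

# Same quantity from the definition, using the non-smoker rate as baseline
rate_nonsmoker = 0.0536
rate_current = (1.0 - smoke_share) * rate_nonsmoker + smoke_share * rr * rate_nonsmoker
par_from_definition = (rate_current - rate_nonsmoker) / rate_current

print(round(par, 3), round(par_from_definition, 3))
```

Both routes give about 0.116, that is, roughly 11.6 percent of low weight births.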

66
Hypothesis Testing in MLE models

MLE are asymptotically normally distributed, one
of the properties of MLE
Therefore, standard t-tests of hypotheses will
work as long as samples are large
What "large" means is open to question
What to do when samples are small: table that for a
moment

67
Testing a linear combination of parameters

Suppose you have a probit model
$\Phi[\beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + x_{3i}\beta_3 + \dots]$
Test a linear combination of parameters
Simplest example: test that a subset are zero
$\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$
To fix the discussion
N observations
K parameters
J restrictions (count the equals signs, J = 4)

68
Wald Test

Based on the fact that the parameters are
distributed asymptotically normal
Probability theory review
Suppose you have m draws from a standard normal
distribution ($z_i$)
$M = z_1^2 + z_2^2 + \dots + z_m^2$
M is distributed as a Chi-square with m degrees
of freedom

Wald test constructs a quadratic form suggested
by the test you want to perform
This combination, because it contains squares of
the true parameters, should, if the hypothesis is
true, be distributed as a Chi square with j
degrees of freedom.
If the test statistic is large, relative to the
degrees of freedom of the test, we reject,
because there is a low probability we would have
drawn that value at random from the distribution
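
For the simplest case of a single restriction, the Wald statistic is just the squared z statistic, compared against a Chi-square with 1 degree of freedom. The sketch below applies this to the coefficient on male from the probit output. The 5% critical value of about 3.841 is the standard Chi-square(1) value, and the p-value uses the fact that a Chi-square(1) upper tail equals a two-sided normal tail.

```python
import math

# "male" row from the probit output
coef, se = 0.0533213, 0.0229297

z = coef / se
wald = z ** 2   # one restriction: Wald statistic is the squared z

# Upper tail of a Chi-square with 1 df = two-sided standard normal tail
p_value = math.erfc(abs(z) / math.sqrt(2.0))

critical_5pct = 3.841   # approximate 5% critical value, Chi-square(1)
reject_5pct = wald > critical_5pct

print(round(z, 2), round(wald, 2), round(p_value, 3), reject_5pct)
```

The statistic is about 5.4, which exceeds the critical value, so the hypothesis that the coefficient is zero is rejected at the 5% level. With more than one restriction the same logic applies, but the quadratic form also has to account for the covariances between the estimates.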

70
Reading values from a Table