Title: Probit and Logit Models
Section 3
Dichotomous Data
- Suppose data is discrete but there are only 2 outcomes
- Examples
- Graduate high school or not
- Patient dies or not
- Working or not
- Smoker or not
- In data, $y_i = 1$ if yes, $y_i = 0$ if no
How to model the data generating process?
- There are only two outcomes
- Research question: What factors impact whether the event occurs?
- To answer, we will model the probability that the outcome occurs
- $\Pr(y_i = 1)$ when $y_i = 1$, or
- $\Pr(y_i = 0) = 1 - \Pr(y_i = 1)$ when $y_i = 0$
- Think of the problem from an MLE perspective
- Likelihood for the ith observation
- $L_i = \Pr(y_i = 1)^{y_i} \, [1 - \Pr(y_i = 1)]^{(1 - y_i)}$
- When $y_i = 1$, the only relevant part is $\Pr(y_i = 1)$
- When $y_i = 0$, the only relevant part is $1 - \Pr(y_i = 1)$
- $\mathcal{L} = \sum_i \ln L_i$
- $= \sum_i \{ y_i \ln \Pr(y_i = 1) + (1 - y_i) \ln \Pr(y_i = 0) \}$
- Notice that up to this point, the model is generic. The log likelihood function will be determined by the assumptions concerning how we determine $\Pr(y_i = 1)$
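A minimal sketch of the generic log likelihood, before any assumption is made about how the probabilities are produced. The toy outcomes and candidate probabilities are hypothetical and chosen only to show that probabilities closer to what actually happened give a higher log likelihood.

```python
import math

def bernoulli_log_likelihood(y, p):
    """Sum over observations of y*ln(p) + (1 - y)*ln(1 - p)."""
    return sum(
        yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
        for yi, pi in zip(y, p)
    )

# Hypothetical toy data: outcomes and two candidate sets of Pr(y_i = 1)
y = [1, 0, 1, 1]
p_good = [0.9, 0.2, 0.8, 0.7]   # puts high probability on what happened
p_flat = [0.5, 0.5, 0.5, 0.5]   # ignores all information

ll_good = bernoulli_log_likelihood(y, p_good)
ll_flat = bernoulli_log_likelihood(y, p_flat)
print(ll_good, ll_flat)
```

Maximum likelihood searches over the parameters that generate the probabilities for the set with the largest value of this sum.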
Modeling the probability
- There is some process (biological, social, decision theoretic, etc.) that determines the outcome y
- Some of the variables impacting the outcome are observed, some are not
- Requires that we model how these factors impact the probabilities
- Model from a latent variable perspective
- Consider a woman's decision to work
- $y_i^*$ = the person's net benefit to work
- Two components of $y_i^*$
- Characteristics that we can measure
- Education, age, income of spouse, prices of child care
- Some we cannot measure
- How much you like spending time with your kids
- How much you like/hate your job
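- We aggregate these two components into one equation
- $y_i^* = \beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k + \varepsilon_i$
- $= x_i\beta + \varepsilon_i$
- $x_i\beta$ = measurable characteristics (but with uncertain weights)
- $\varepsilon_i$ = random unmeasured characteristics
- Decision rule: the person will work if $y_i^* > 0$
- (if net benefits are positive)
- $y_i = 1$ if $y_i^* > 0$
- $y_i = 0$ if $y_i^* \le 0$
- $y_i = 1$ if $y_i^* > 0$
- $y_i^* = x_i\beta + \varepsilon_i > 0$ only if
- $\varepsilon_i > -x_i\beta$
- $y_i = 0$ if $y_i^* \le 0$
- $y_i^* = x_i\beta + \varepsilon_i \le 0$ only if
- $\varepsilon_i \le -x_i\beta$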
- Suppose $x_i\beta$ is big.
- High wages
- Low husband's income
- Low cost of child care
- We would expect this person to work, UNLESS there is some unmeasured variable that counteracts this
- Suppose a mom really likes spending time with her kids, or she hates her job.
- The unmeasured benefit of working, $\varepsilon_i$, has a big negative value
- If we observe her working, $\varepsilon_i$ must not have been too negative, since
- $y_i = 1$ if $\varepsilon_i > -x_i\beta$
- Consider the opposite. Suppose we observe someone NOT working.
- Then $\varepsilon_i$ must not have been big, since
- $y_i = 0$ if $\varepsilon_i \le -x_i\beta$
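Logit
- Recall $y_i = 1$ if $\varepsilon_i > -x_i\beta$
- Suppose $\varepsilon_i$ follows a logistic distribution with CDF $F$
- $\Pr(\varepsilon_i > -x_i\beta) = 1 - F(-x_i\beta)$
- The logistic is also a symmetric distribution, so
- $1 - F(-x_i\beta)$
- $= F(x_i\beta)$
- $= \exp(x_i\beta) / (1 + \exp(x_i\beta))$
- When $\varepsilon_i$ follows a logistic distribution
- $\Pr(y_i = 1) = \exp(x_i\beta) / (1 + \exp(x_i\beta))$
- $\Pr(y_i = 0) = 1 / (1 + \exp(x_i\beta))$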
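The two logit probabilities and the symmetry step can be checked numerically. This is a small sketch; the index value `xb` is a hypothetical number standing in for the product of one person's covariates and the coefficients.

```python
import math

def logistic_cdf(z):
    """Standard logistic CDF: F(z) = exp(z) / (1 + exp(z))."""
    return math.exp(z) / (1.0 + math.exp(z))

xb = 0.4  # hypothetical index value for one observation

pr_y1 = logistic_cdf(xb)                # Pr(y = 1)
pr_y0 = 1.0 / (1.0 + math.exp(xb))      # Pr(y = 0)

# Symmetry: 1 - F(-z) should equal F(z)
symmetry_gap = (1.0 - logistic_cdf(-xb)) - logistic_cdf(xb)
print(pr_y1, pr_y0, symmetry_gap)
```

The two probabilities sum to one, and the symmetry gap is zero up to floating-point error.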
Example: Workplace smoking bans
- Smoking supplements to 1991 and 1993 National Health Interview Survey
- Asked all respondents whether they currently smoke
- Asked workers about workplace tobacco policies
- Sample: workers
- Key variables: current smoking and whether they faced a workplace ban
- Data: workplace1.dta
- Sample program: workplace1.doc
- Results: workplace1.log
Description of variables in data
. desc
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------
smoker          byte   %9.0g                  is current smoking
worka           byte   %9.0g                  has workplace smoking bans
age             byte   %9.0g                  age in years
male            byte   %9.0g                  male
black           byte   %9.0g                  black
hispanic        byte   %9.0g                  hispanic
incomel         float  %9.0g                  log income
hsgrad          byte   %9.0g                  is hs graduate
somecol         byte   %9.0g                  has some college
college         float  %9.0g
------------------------------------------------------------------------
Summary statistics
. sum
    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+------------------------------------------------------
      smoker |   16258      .25163    .433963          0          1
       worka |   16258    .6851396   .4644745          0          1
         age |   16258    38.54742   11.96189         18         87
        male |   16258    .3947595    .488814          0          1
       black |   16258    .1119449   .3153083          0          1
-------------+------------------------------------------------------
    hispanic |   16258    .0607086   .2388023          0          1
     incomel |   16258    10.42097   .7624525   6.214608   11.22524
      hsgrad |   16258    .3355271   .4721889          0          1
     somecol |   16258    .2685447   .4432161          0          1
     college |   16258    .3293763   .4700012          0          1
Running a probit
- probit smoker age incomel male black hispanic hsgrad somecol college worka
- The first variable after probit is the discrete outcome, the rest of the variables are the independent variables
- Includes a constant as a default
Running a logit
- logit smoker age incomel male black hispanic hsgrad somecol college worka
- Same as probit, just change the first word
Running linear probability
- reg smoker age incomel male black hispanic hsgrad somecol college worka, robust
- Simple regression.
- Default standard errors are incorrect (heteroskedasticity)
- robust option produces standard errors that allow for an arbitrary form of heteroskedasticity
Probit Results
Probit estimates                                  Number of obs   =      16258
                                                  LR chi2(9)      =     819.44
                                                  Prob > chi2     =     0.0000
Log likelihood = -8761.7208                       Pseudo R2       =     0.0447
------------------------------------------------------------------------------
      smoker |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0012684    .0009316   -1.36    0.173   -.0030943    .0005574
     incomel |   -.092812    .0151496   -6.13    0.000   -.1225047   -.0631193
        male |   .0533213    .0229297    2.33    0.020    .0083799    .0982627
       black |  -.1060518     .034918   -3.04    0.002     -.17449   -.0376137
    hispanic |  -.2281468    .0475128   -4.80    0.000   -.3212701   -.1350235
      hsgrad |  -.1748765    .0436392   -4.01    0.000   -.2604078   -.0893453
     somecol |   -.363869    .0451757   -8.05    0.000   -.4524118   -.2753262
     college |  -.7689528    .0466418  -16.49    0.000    -.860369   -.6775366
       worka |  -.2093287    .0231425   -9.05    0.000   -.2546873   -.1639702
       _cons |    .870543     .154056    5.65    0.000    .5685989    1.172487
------------------------------------------------------------------------------
How to measure fit?
- Regression (OLS)
- Minimize sum of squared errors
- Or, maximize R2
- The model is designed to maximize predictive capacity
- Not the case with Probit/Logit
- MLE models pick distribution parameters so as to best describe the data generating process
- May or may not predict the outcome well
Pseudo R2
- $LL_k$ = log likelihood with all variables
- $LL_1$ = log likelihood with only a constant
- $0 > LL_k > LL_1$, so $|LL_k| < |LL_1|$
- Pseudo $R^2 = 1 - LL_k / LL_1$
- Bounded between 0 and 1
- Not anything like an R2 from a regression
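As a rough check, the pseudo R2 in the probit output can be rebuilt from two numbers printed there. The sketch assumes that the reported LR chi2 equals twice the gap between the full and constant-only log likelihoods, which is how the constant-only value is backed out; the variable names are illustrative.

```python
# Values copied from the probit output for the smoking example
ll_k = -8761.7208      # log likelihood with all variables
lr_chi2 = 819.44       # LR chi2(9), assumed equal to 2 * (ll_k - ll_1)

# Back out the constant-only log likelihood from the LR statistic
ll_1 = ll_k - lr_chi2 / 2.0

# Pseudo R2 as one minus the ratio of the two log likelihoods
pseudo_r2 = 1.0 - ll_k / ll_1
print(round(ll_1, 4), round(pseudo_r2, 4))
```

Both log likelihoods are negative and the full model's is closer to zero, so the ratio lies between 0 and 1.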
Predicting Y
- Let b be the estimated value of $\beta$
- For any candidate vector of $x_i$, we can predict probabilities, $P_i$
- $P_i = \Phi(x_i b)$
- Once you have $P_i$, pick a threshold value, T, so that you predict
- $Y_p = 1$ if $P_i > T$
- $Y_p = 0$ if $P_i \le T$
- Then compare: fraction correctly predicted
- Question: what value to pick for T?
- Can pick .5
- Intuitive. More likely to engage in the activity than to not engage in it
- However, when $\bar{y}$ (the sample share with $y_i = 1$) is small, this criterion does a poor job of predicting $y_i = 1$
- However, when $\bar{y}$ is close to 1, this criterion does a poor job of predicting $y_i = 0$
- * predict probability of smoking
- predict pred_prob_smoke
- * get detailed descriptive data about predicted prob
- sum pred_prob, detail
- * predict binary outcome with 50% cutoff
- gen pred_smoke1=pred_prob_smoke>.5
- label variable pred_smoke1 "predicted smoking, 50% cutoff"
- * compare actual values
- tab smoker pred_smoke1, row col cell
. sum pred_prob, detail
                         Pr(smoker)
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0959301       .0615221
 5%     .1155022       .0622963
10%     .1237434       .0633929       Obs               16258
25%     .1620851       .0733495       Sum of Wgt.       16258
50%     .2569962                      Mean           .2516653
                        Largest       Std. Dev.      .0960007
75%     .3187975       .5619798
90%     .3795704       .5655878       Variance       .0092161
95%     .4039573       .5684112       Skewness       .1520254
99%     .4672697       .6203823       Kurtosis       2.149247
- Notice two things
- Sample mean of the predicted probabilities is close to the sample mean outcome
- 99% of the probabilities are less than .5
- Should predict few smokers if we use a 50% cutoff
           |  predicted smoking,
is current |      50% cutoff
   smoking |         0          1 |     Total
-----------+----------------------+----------
         0 |    12,153         14 |    12,167
           |     99.88       0.12 |    100.00
           |     74.93      35.90 |     74.84
           |     74.75       0.09 |     74.84
-----------+----------------------+----------
         1 |     4,066         25 |     4,091
           |     99.39       0.61 |    100.00
           |     25.07      64.10 |     25.16
           |     25.01       0.15 |     25.16
-----------+----------------------+----------
     Total |    16,219         39 |    16,258
           |     99.76       0.24 |    100.00
           |    100.00     100.00 |    100.00
           |     99.76       0.24 |    100.00
- Check on-diagonal elements.
- The last number in each 2x2 element is the fraction in the cell
- The model correctly predicts 74.75% + 0.15% = 74.90% of the obs
- It only predicts a small fraction of smokers
- Do not be amazed by the 75 percent correct prediction
- If you said everyone has a $\bar{y}$ chance of smoking, where $\bar{y}$ is the sample share of smokers (a case of no covariates), you would be correct $\max(\bar{y}, 1 - \bar{y})$ of the time
- In this case, 25.16% smoke.
- If everyone had the same chance of smoking, we would assign everyone $\Pr(y = 1) = .2516$
- We would be correct for the 1 - .2516 = 0.7484 share of people who do not smoke
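The comparison between the model and the no-covariate benchmark can be reproduced from the four cell counts in the cross-tabulation. This is a sketch; the variable names are illustrative.

```python
# Counts copied from the cross-tabulation of actual vs. predicted smoking
true_neg, false_pos = 12153, 14   # actual non-smokers: predicted 0, predicted 1
false_neg, true_pos = 4066, 25    # actual smokers:     predicted 0, predicted 1
n = true_neg + false_pos + false_neg + true_pos

# Fraction correctly predicted: the on-diagonal cells
share_correct = (true_neg + true_pos) / n

# Benchmark with no covariates: predict the more common outcome for everyone
share_smokers = (false_neg + true_pos) / n
naive_correct = max(share_smokers, 1.0 - share_smokers)

# Share of actual smokers that the 50% cutoff flags as smokers
smokers_caught = true_pos / (false_neg + true_pos)
print(share_correct, naive_correct, smokers_caught)
```

The model beats the benchmark by well under a tenth of a percentage point and flags fewer than 1 in 100 actual smokers.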
Key points about prediction
- MLE models are not designed to maximize prediction
- Should not be surprised they do not predict well
- In this case, not particularly good measures of predictive capacity
Translating coefficients in probit: Continuous Covariates
- $\Pr(y_i = 1) = \Phi[\beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k]$
- Suppose that $x_{1i}$ is a continuous variable
- $\partial \Pr(y_i = 1) / \partial x_{1i} = \,?$
- What is the change in the probability of an event given a change in $x_{1i}$?
Marginal Effect
- $\partial \Pr(y_i = 1) / \partial x_{1i}$
- $= \beta_1 \, \phi[\beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k]$
- Notice two things. The marginal effect is a function of the other parameters and the values of x.
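To make the formula concrete, the sketch below evaluates it at the sample means using the probit coefficients and summary statistics reported for the smoking example. Evaluating at the means is an assumption about the evaluation point; the helper names are illustrative.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# (coefficient, sample mean) pairs copied from the probit and summary output
estimates = {
    "age": (-0.0012684, 38.54742),
    "incomel": (-0.092812, 10.42097),
    "male": (0.0533213, 0.3947595),
    "black": (-0.1060518, 0.1119449),
    "hispanic": (-0.2281468, 0.0607086),
    "hsgrad": (-0.1748765, 0.3355271),
    "somecol": (-0.363869, 0.2685447),
    "college": (-0.7689528, 0.3293763),
    "worka": (-0.2093287, 0.6851396),
}
constant = 0.870543

# Index x*b evaluated at the sample means
xb = constant + sum(b * mean for b, mean in estimates.values())

prob_at_means = normal_cdf(xb)
# Marginal effect = coefficient times the normal density at the index
me_age = estimates["age"][0] * normal_pdf(xb)
me_incomel = estimates["incomel"][0] * normal_pdf(xb)
print(prob_at_means, me_age, me_incomel)
```

The same coefficient gives a different marginal effect at a different point, because the density term changes with the index.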
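Translating Coefficients: Discrete Covariates
- $\Pr(y_i = 1) = \Phi[\beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k]$
- Suppose that $x_{2i}$ is a dummy variable (1 if yes, 0 if no)
- Marginal effect makes no sense, cannot change $x_{2i}$ by a little amount. It is either 1 or 0.
- Redefine the variable of interest. Compare outcomes with and without $x_{2i}$
- $y_1 = \Pr(y_i = 1 \mid x_{2i} = 1)$
- $= \Phi[\beta_0 + x_{1i}\beta_1 + \beta_2 + x_{3i}\beta_3 + \dots + x_{ki}\beta_k]$
- $y_0 = \Pr(y_i = 1 \mid x_{2i} = 0)$
- $= \Phi[\beta_0 + x_{1i}\beta_1 + x_{3i}\beta_3 + \dots + x_{ki}\beta_k]$
- Marginal effect = $y_1 - y_0$.
- Difference in probabilities with and without $x_{2i}$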
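A sketch of the with-and-without comparison for the workplace ban dummy (worka), using the probit coefficients and sample means reported for the smoking example. Holding the other covariates at their sample means is an assumption about the evaluation point; the names are illustrative.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# (coefficient, sample mean) pairs for every covariate except worka,
# copied from the probit and summary output
others = {
    "age": (-0.0012684, 38.54742),
    "incomel": (-0.092812, 10.42097),
    "male": (0.0533213, 0.3947595),
    "black": (-0.1060518, 0.1119449),
    "hispanic": (-0.2281468, 0.0607086),
    "hsgrad": (-0.1748765, 0.3355271),
    "somecol": (-0.363869, 0.2685447),
    "college": (-0.7689528, 0.3293763),
}
constant = 0.870543
b_worka = -0.2093287

# Index with worka = 0 and all other covariates at their sample means
xb_without_ban = constant + sum(b * mean for b, mean in others.values())

p_without_ban = normal_cdf(xb_without_ban)
p_with_ban = normal_cdf(xb_without_ban + b_worka)   # switch worka to 1
discrete_change = p_with_ban - p_without_ban
print(p_with_ban, p_without_ban, discrete_change)
```

The difference is a change in probability, so multiplying by 100 gives percentage points.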
In STATA
- Marginal effects for continuous variables: STATA picks sample means for Xs
- Change in probabilities for dichotomous covariates: STATA picks sample means for Xs
STATA command for Marginal Effects
- mfx compute
- Must be run after the estimation command, while the estimates are still active in the program.
Marginal effects after probit
      y  = Pr(smoker) (predict)
         =  .24093439
------------------------------------------------------------------------------
variable |      dy/dx   Std. Err.       z    P>|z|  [    95% C.I.    ]       X
---------+--------------------------------------------------------------------
     age |  -.0003951      .00029   -1.36    0.173  -.000964   .000174  38.5474
 incomel |  -.0289139      .00472   -6.13    0.000   -.03816  -.019668   10.421
   male* |   .0166757       .0072    2.32    0.021   .002568   .030783   .39476
  black* |  -.0320621      .01023   -3.13    0.002  -.052111  -.012013  .111945
hispanic*|  -.0658551      .01259   -5.23    0.000  -.090536  -.041174  .060709
 hsgrad* |   -.053335      .01302   -4.10    0.000   -.07885   -.02782  .335527
somecol* |  -.1062358      .01228   -8.65    0.000  -.130308  -.082164  .268545
college* |  -.2149199      .01146  -18.76    0.000  -.237378  -.192462  .329376
  worka* |  -.0668959      .00756   -8.84    0.000   -.08172  -.052072   .68514
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
Interpret results
- 10% increase in income will reduce smoking by 0.29 percentage points (income is in logs, so the effect is about 0.10 x .0289)
- 10 year increase in age will decrease smoking rates .4 percentage points
- Those with a college degree are 21.5 percentage points less likely to smoke
- Those that face a workplace smoking ban have a 6.7 percentage point lower probability of smoking
- Do not confuse percentage point and percent differences
- A 6.7 percentage point drop is 28% of the sample mean of 24 percent.
- Blacks have smoking rates that are 3.2 percentage points lower than others, which is 13 percent of the sample mean
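The arithmetic behind the percentage point versus percent distinction, worked from the marginal effects output. This sketch takes the predicted probability at the means as the base rate and treats a 10% rise in income as a 0.10 rise in log income, which is an approximation.

```python
# Values copied from the marginal effects output
prob_at_means = 0.24093439   # predicted Pr(smoker) at the sample means
me_worka = -0.0668959        # change in probability from a workplace ban
me_black = -0.0320621
me_incomel = -0.0289139      # effect of a one-unit change in log income

def as_percent_of_base(change, base):
    """Express a change in probability relative to the base probability."""
    return 100.0 * change / base

pp_worka = 100.0 * me_worka                              # percentage points
pct_worka = as_percent_of_base(me_worka, prob_at_means)  # percent of base rate
pct_black = as_percent_of_base(me_black, prob_at_means)

# A 10% rise in income raises log income by roughly 0.10
pp_income_10pct = 100.0 * me_incomel * 0.10
print(pp_worka, pct_worka, pct_black, pp_income_10pct)
```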
Comparing Marginal Effects
Variable          LP     Probit      Logit
age         -0.00040   -0.00048   -0.00048
incomel      -0.0289    -0.0287    -0.0276
male          0.0167     0.0168     0.0172
black        -0.0321    -0.0357    -0.0342
hispanic     -0.0658    -0.0706    -0.0602
hsgrad       -0.0533    -0.0661    -0.0514
college      -0.2149    -0.2406    -0.2121
worka        -0.0669    -0.0661    -0.0658
When will results differ?
- Normal and logistic CDFs look
- Similar in the mid point of the distribution
- Different in the tails
- You obtain more observations in the tails of the distribution when
- Sample sizes are large
- $\bar{y}$ (the sample mean of y) approaches 1 or 0
- These situations will produce more differences in estimates
Some nice properties of the Logit
- Outcome, y = 1 or 0
- Treatment, x = 1 or 0
- Other covariates, Z
- Context,
- y = whether a baby is born with a low birth weight
- x = whether the mom smoked or not during pregnancy
- Risk ratio
- $RR = \Pr(y = 1 \mid x = 1) / \Pr(y = 1 \mid x = 0)$
- Differences in the probability of an event when x is and is not observed
- How much does smoking elevate the chance your child will be a low weight birth?
- Let $Y_{yx}$ be the probability that y = 1 or 0 given x = 1 or 0
- Think of the risk ratio the following way
- $Y_{11}$ is the probability Y = 1 when X = 1
- $Y_{10}$ is the probability Y = 1 when X = 0
- $Y_{11} = RR \cdot Y_{10}$
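- Odds Ratio
- $OR = A/B = [Y_{11}/Y_{01}] / [Y_{10}/Y_{00}]$
- $A = \Pr(Y = 1 \mid X = 1) / \Pr(Y = 0 \mid X = 1)$
- = odds of Y occurring if you are a smoker
- $B = \Pr(Y = 1 \mid X = 0) / \Pr(Y = 0 \mid X = 0)$
- = odds of Y happening if you are not a smoker
- What are the relative odds of Y happening if you do or do not experience X
- Suppose $\Pr(Y_i = 1) = F(\beta_0 + \beta_1 X_i + \beta_2 Z)$ and F is the logistic function
- Can show that
- $OR = \exp(\beta_1) = e^{\beta_1}$
- This number is typically reported by most statistical packages
- Details
- $Y_{11} = \exp(\beta_0 + \beta_1 + \beta_2 Z) / (1 + \exp(\beta_0 + \beta_1 + \beta_2 Z))$
- $Y_{10} = \exp(\beta_0 + \beta_2 Z) / (1 + \exp(\beta_0 + \beta_2 Z))$
- $Y_{01} = 1 / (1 + \exp(\beta_0 + \beta_1 + \beta_2 Z))$
- $Y_{00} = 1 / (1 + \exp(\beta_0 + \beta_2 Z))$
- $Y_{11}/Y_{01} = \exp(\beta_0 + \beta_1 + \beta_2 Z)$
- $Y_{10}/Y_{00} = \exp(\beta_0 + \beta_2 Z)$
- $OR = A/B = [Y_{11}/Y_{01}] / [Y_{10}/Y_{00}]$
- $= \exp(\beta_0 + \beta_1 + \beta_2 Z) / \exp(\beta_0 + \beta_2 Z)$
- $= \exp(\beta_1)$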
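The cancellation in the derivation can be confirmed numerically: under a logistic model the odds ratio for X does not depend on the value of Z. The coefficients below are hypothetical and chosen only for illustration.

```python
import math

def logistic(z):
    return math.exp(z) / (1.0 + math.exp(z))

def odds_ratio(b0, b1, b2, z):
    """Odds ratio for X = 1 versus X = 0, holding the covariate z fixed."""
    p_x1 = logistic(b0 + b1 + b2 * z)   # Pr(Y = 1 | X = 1)
    p_x0 = logistic(b0 + b2 * z)        # Pr(Y = 1 | X = 0)
    return (p_x1 / (1.0 - p_x1)) / (p_x0 / (1.0 - p_x0))

# Hypothetical coefficients, not estimates from any data set
b0, b1, b2 = -2.5, 0.7, 0.3
ratios = [odds_ratio(b0, b1, b2, z) for z in (-1.0, 0.0, 2.0)]
print(ratios, math.exp(b1))
```

Every entry of `ratios` equals `exp(b1)` up to floating-point error, whatever value of z is used.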
- Suppose Y is rare, so $\bar{y}$ (the mean of Y) is close to 0
- $\Pr(Y = 0 \mid X = 1)$ and $\Pr(Y = 0 \mid X = 0)$ are both close to 1, so they cancel
- Therefore, when $\bar{y}$ is close to 0
- Odds Ratio $\approx$ Risk Ratio
- Why is this nice?
Population attributable risk
- Average outcome in the population
- $\bar{y} = (1 - \bar{x}) Y_{10} + \bar{x} Y_{11} = (1 - \bar{x}) Y_{10} + \bar{x} (RR) Y_{10}$, where $\bar{x}$ is the fraction of the population with X = 1
- Average outcomes are a weighted average of outcomes for X = 0 and X = 1
- What would the average outcome be in the absence of X (e.g., reduce smoking rates to 0)?
- $Y_a = Y_{10}$
Population Attributable Risk
- PAR
- Fraction of outcome attributed to X
- The difference between the current rate and the rate that would exist without X, divided by the current rate
- $PAR = (\bar{y} - Y_a) / \bar{y}$
- $= (RR - 1)\bar{x} / [(1 - \bar{x}) + RR \, \bar{x}]$
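The step from the definition of PAR to the closed form can be written out as follows. The symbols $\bar{y}$ (average outcome) and $\bar{x}$ (fraction of the population with X = 1) are notation assumed for this sketch; $Y_{10}$, $Y_a$ and $RR$ are as defined on the slides.

```latex
\begin{align*}
\bar{y} &= (1-\bar{x})\,Y_{10} + \bar{x}\,RR\,Y_{10}
         = Y_{10}\,\bigl[(1-\bar{x}) + RR\,\bar{x}\bigr], \\
\bar{y} - Y_a &= Y_{10}\,\bigl[(1-\bar{x}) + RR\,\bar{x} - 1\bigr]
         = Y_{10}\,(RR-1)\,\bar{x}, \\
PAR &= \frac{\bar{y} - Y_a}{\bar{y}}
     = \frac{(RR-1)\,\bar{x}}{(1-\bar{x}) + RR\,\bar{x}} .
\end{align*}
```

The baseline rate $Y_{10}$ cancels, so PAR depends only on the risk ratio and the share exposed.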
Example: Maternal Smoking and Low Weight Births
- 6% of births are low weight
- < 2500 grams (5.5 lbs)
- Average birth is 3300 grams (7.3 lbs)
- Maternal smoking during pregnancy has been identified as a key cofactor
- 13% of mothers smoke
- This number was falling about 1 percentage point per year during 1980s/90s
- Doubles chance of low weight birth
Natality detail data
- Census of all births (4 million/year)
- Annual files starting in the 60s
- Information about
- Baby (birth weight, length, date, sex, plurality, birth injuries)
- Demographics (age, race, marital status, educ of mom)
- Birth (who delivered, method of delivery)
- Health of mom (smoked/drank during preg, weight gain)
- Smoking not available from CA or NY
- 3 million usable observations
- I pulled a .5% random sample from 1995
- About 12,500 obs
- Variables: birthweight (grams), smoked, married, 4-level race, 5-level education, mother's age at birth
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
birthw          int    %9.0g                  birth weight in grams
smoked          byte   %9.0g                  =1 if mom smoked during pregnancy
age             byte   %9.0g                  moms age at birth
married         byte   %9.0g                  =1 if married
race4           byte   %9.0g                  1=white,2=black,3=asian,4=other
educ5           byte   %9.0g                  1=0-8, 2=9-11, 3=12, 4=13-15, 5=16+
visits          byte   %9.0g                  prenatal visits
------------------------------------------------------------------------------
     dummy |
 variable, |
        =1 |   =1 if mom smoked
 ifBW<2500 |   during pregnancy
     grams |         0          1 |     Total
-----------+----------------------+----------
         0 |    11,626      1,745 |    13,371
           |     86.95      13.05 |    100.00
           |     94.64      89.72 |     93.96
           |     81.70      12.26 |     93.96
-----------+----------------------+----------
         1 |       659        200 |       859
           |     76.72      23.28 |    100.00
           |      5.36      10.28 |      6.04
           |      4.63       1.41 |      6.04
-----------+----------------------+----------
     Total |    12,285      1,945 |    14,230
           |     86.33      13.67 |    100.00
           |    100.00     100.00 |    100.00
- Notice a few things
- 13.7% of women smoke
- 6% have low weight birth
- Pr(LBW | Smoke) = 10.28%
- Pr(LBW | No smoke) = 5.36%
- RR
- = Pr(LBW | Smoke) / Pr(LBW | No smoke)
- = 0.1028/0.0536 = 1.92
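The risk ratio, and the unadjusted odds ratio alongside it, can be recomputed directly from the cell counts in the cross-tabulation. This is a sketch with illustrative variable names; no covariates are controlled for here.

```python
# Counts copied from the cross-tabulation of low birth weight by smoking
lbw_smoker, all_smokers = 200, 1945
lbw_nonsmoker, all_nonsmokers = 659, 12285

p_smoke = lbw_smoker / all_smokers            # Pr(LBW | smoke)
p_nosmoke = lbw_nonsmoker / all_nonsmokers    # Pr(LBW | no smoke)

risk_ratio = p_smoke / p_nosmoke
odds_ratio = (p_smoke / (1 - p_smoke)) / (p_nosmoke / (1 - p_nosmoke))
share_smokers = all_smokers / (all_smokers + all_nonsmokers)
print(risk_ratio, odds_ratio, share_smokers)
```

Because low birth weight is fairly rare, the odds ratio (about 2.02) is close to the risk ratio (about 1.92), though not identical.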
Logit results
Log likelihood = -3136.9912                       Pseudo R2       =     0.0330
------------------------------------------------------------------------------
       lowbw |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
      smoked |   .6740651    .0897869    7.51    0.000    .4980861    .8500441
         age |   .0080537     .006791    1.19    0.236   -.0052564    .0213638
     married |  -.3954044    .0882471   -4.48    0.000   -.5683654   -.2224433
   _Ieduc5_2 |  -.1949335    .1626502   -1.20    0.231   -.5137221    .1238551
   _Ieduc5_3 |  -.1925099    .1543239   -1.25    0.212   -.4949791    .1099594
   _Ieduc5_4 |  -.4057382    .1676759   -2.42    0.016   -.7343769   -.0770994
   _Ieduc5_5 |  -.3569715    .1780322   -2.01    0.045   -.7059081   -.0080349
   _Irace4_2 |   .7072894    .0875125    8.08    0.000    .5357681    .8788107
   _Irace4_3 |    .386623     .307062    1.26    0.208   -.2152075    .9884535
   _Irace4_4 |   .3095536    .2047899    1.51    0.131   -.0918271    .7109344
       _cons |  -2.755971    .2104916  -13.09    0.000   -3.168527   -2.343415
------------------------------------------------------------------------------
Odds Ratios
- Smoked
- exp(0.674) = 1.96
- Smokers are roughly twice as likely to have a low weight birth
- _Irace4_2 (Blacks)
- exp(0.707) = 2.02
- Blacks are roughly twice as likely to have a low weight birth
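The conversion from logit coefficients to odds ratios is a single exponentiation, and the same transformation applied to the interval endpoints gives an interval for the odds ratio. A short sketch using the reported estimates:

```python
import math

# Logit coefficients and 95% interval bounds copied from the output
coef_smoked, lo_smoked, hi_smoked = 0.6740651, 0.4980861, 0.8500441
coef_black = 0.7072894

or_smoked = math.exp(coef_smoked)
or_black = math.exp(coef_black)

# Exponentiating the interval endpoints gives an interval for the odds ratio
or_smoked_ci = (math.exp(lo_smoked), math.exp(hi_smoked))
print(or_smoked, or_black, or_smoked_ci)
```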
Asking for odds ratios
- logistic y x1 x2
- In this case
- xi: logistic lowbw smoked age married i.educ5 i.race4
Log likelihood = -3136.9912                       Pseudo R2       =     0.0330
------------------------------------------------------------------------------
       lowbw | Odds Ratio   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
      smoked |   1.962198    .1761796    7.51    0.000    1.645569     2.33975
         age |   1.008086    .0068459    1.19    0.236    .9947574    1.021594
     married |   .6734077    .0594262   -4.48    0.000    .5664506    .8005604
   _Ieduc5_2 |   .8228894    .1338431   -1.20    0.231    .5982646    1.131852
   _Ieduc5_3 |   .8248862    .1272996   -1.25    0.212    .6095837    1.116233
   _Ieduc5_4 |   .6664847    .1117534   -2.42    0.016    .4798043    .9257979
   _Ieduc5_5 |   .6997924    .1245856   -2.01    0.045    .4936601    .9919973
   _Irace4_2 |   2.028485    .1775178    8.08    0.000     1.70876    2.408034
   _Irace4_3 |   1.472001    .4519957    1.26    0.208    .8063741    2.687076
   _Irace4_4 |   1.362817    .2790911    1.51    0.131    .9122628    2.035893
------------------------------------------------------------------------------
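PAR
- $PAR = (RR - 1)\bar{x} / [(1 - \bar{x}) + RR \, \bar{x}]$, where $\bar{x}$ is the fraction of mothers who smoke
- $\bar{x} = 0.137$
- RR = 1.96
- PAR = 0.116
- 11.6% of low weight births attributed to maternal smoking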
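The PAR figure follows from plugging the two inputs into the formula. A minimal sketch; the function name and argument names are illustrative.

```python
def population_attributable_risk(rr, share_exposed):
    """PAR = (RR - 1) * x / ((1 - x) + RR * x), where x is the share exposed."""
    return (rr - 1.0) * share_exposed / ((1.0 - share_exposed) + rr * share_exposed)

# Inputs from the example: RR = 1.96 and 13.7% of mothers smoke
par = population_attributable_risk(rr=1.96, share_exposed=0.137)

# Sanity check: if exposure does not change risk (RR = 1), nothing is attributable
par_no_effect = population_attributable_risk(rr=1.0, share_exposed=0.137)
print(round(par, 3), par_no_effect)
```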
Hypothesis Testing in MLE models
- MLE estimates are asymptotically normally distributed, one of the properties of MLE
- Therefore, standard t-tests of hypotheses will work as long as samples are large
- What large means is open to question
- What to do when samples are small: table for a moment
Testing a linear combination of parameters
- Suppose you have a probit model
- $\Phi[\beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + x_{3i}\beta_3 + \dots]$
- Test a linear combination of parameters
- Simplest example, test that a subset are zero
- $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$
- To fix the discussion
- N observations
- K parameters
- J restrictions (count the equals signs, J = 4)
Wald Test
- Based on the fact that the parameters are distributed asymptotically normal
- Probability theory review
- Suppose you have m draws from a standard normal distribution ($z_i$)
- $M = z_1^2 + z_2^2 + \dots + z_m^2$
- M is distributed as a Chi-square with m degrees of freedom
- Wald test constructs a quadratic form suggested by the test you want to perform
- This combination, because it contains squares of the estimated parameters, should, if the hypothesis is true, be distributed as a Chi-square with J degrees of freedom.
- If the test statistic is large, relative to the degrees of freedom of the test, we reject, because there is a low probability we would have drawn that value at random from the distribution
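One common textbook way to write the quadratic form is shown below. The notation is assumed for this sketch rather than taken from the slides: $R$ is a $J \times K$ matrix that picks out the restrictions, $r$ is a $J$-vector of hypothesized values, $b$ is the estimate of $\beta$, and $\hat{V}$ is its estimated covariance matrix.

```latex
H_0:\; R\beta = r, \qquad
W = (Rb - r)'\,\bigl[R\,\hat{V}\,R'\bigr]^{-1}(Rb - r) \;\sim\; \chi^2_J
\quad \text{(asymptotically, under } H_0\text{)}
```

For the test that a subset of coefficients is zero, $r = 0$ and each row of $R$ selects one of the tested coefficients.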
Reading values from a Table
- All stats books will report the percentiles of a chi-square
- Vertical axis (degrees of freedom)
- Horizontal axis (percentiles)
- Entry is the value below which that percentile of the distribution falls
- Example: Suppose 4 restrictions
- 95% of a chi-square distribution with 4 degrees of freedom falls below 9.488.
- So there is only a 5% chance a number drawn at random will exceed 9.488
- If your test statistic is below, cannot reject null
- If your test statistic is above, reject null
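The table entry for 4 restrictions can be checked without a table, since the chi-square CDF has a simple closed form when the degrees of freedom are even. This sketch covers only the 4-degrees-of-freedom case; the two test statistics passed to the decision rule are hypothetical.

```python
import math

def chi2_cdf_4df(x):
    """CDF of a chi-square with 4 degrees of freedom (closed form)."""
    return 1.0 - math.exp(-x / 2.0) * (1.0 + x / 2.0)

def reject_null(test_statistic, critical_value=9.488):
    """Reject when the statistic exceeds the 95th percentile for 4 df."""
    return test_statistic > critical_value

share_below = chi2_cdf_4df(9.488)
print(share_below, reject_null(12.3), reject_null(4.1))
```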
Chi-square
Wald test in STATA
- Default test in MLE models
- Easy to do. Look at program
- test hsgrad somecol college
- Does not estimate the restricted model
- Lower power than other tests, i.e., high chance of false negative
-2 Log likelihood test
- * how to run the same tests with a -2 log like test
- * estimate the unrestricted model and save the estimates in urmodel
- probit smoker age incomel male black hispanic hsgrad somecol college worka
- estimates store urmodel
- * estimate the restricted model. save results in rmodel
- probit smoker age incomel male black hispanic worka
- estimates store rmodel
- lrtest urmodel rmodel
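The statistic behind the name of the test is usually written as below. The symbols are assumed for this sketch: $\ln L_U$ and $\ln L_R$ are the maximized log likelihoods of the unrestricted and restricted models, and $J$ is the number of restrictions (3 here, one each for hsgrad, somecol and college).

```latex
LR = -2\,\bigl[\ln L_R - \ln L_U\bigr] \;\sim\; \chi^2_J
\quad \text{(asymptotically, under the null)}
```

Dropping variables cannot raise the maximized log likelihood, so $\ln L_R \le \ln L_U$ and the statistic is never negative.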
- I prefer -2 log likelihood test
- Estimates the restricted and unrestricted model
- Therefore, has more power than a Wald test
- In most cases, they give the same decision (reject/not reject)