Title: Econometric Analysis of Panel Data
1Econometric Analysis of Panel Data
- William Greene
- Department of Economics
- Stern School of Business
2Econometric Analysis of Panel Data
- 16. Nonlinear Effects Models and Models for
Binary Choice
3Panel Data and Binary Choice Models
- $U_{it} = \alpha + \beta' x_{it} + \varepsilon_{it}$ + Person i specific effect
- Fixed effects using dummy variables
- $U_{it} = \alpha_i + \beta' x_{it} + \varepsilon_{it}$
- Random effects using omitted heterogeneity
- $U_{it} = \alpha + \beta' x_{it} + (\varepsilon_{it} + v_i)$
- Same outcome mechanism: $Y_{it} = 1[U_{it} > 0]$
- Effects are not removed by differencing the data. Need a direct estimation approach.
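To make the outcome mechanism concrete, here is a minimal sketch that simulates a random effects panel from a latent index with one regressor and a person-specific effect, then records only the sign of the index. The function name `simulate_panel` and every numeric value (sample sizes, coefficients, the scale `sigma_v` on the effect, the seed) are illustrative assumptions, not values from the slides.

```python
import random

def simulate_panel(n_persons=200, n_periods=5, alpha=-0.2, beta=0.5,
                   sigma_v=0.9, seed=12345):
    """Simulate y_it = 1[U_it > 0] with
    U_it = alpha + beta * x_it + eps_it + sigma_v * v_i (hypothetical values)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_persons):
        v_i = rng.gauss(0.0, 1.0)          # person-specific effect, fixed over t
        for t in range(n_periods):
            x_it = rng.gauss(0.0, 1.0)     # observed regressor
            eps_it = rng.gauss(0.0, 1.0)   # idiosyncratic probit disturbance
            u_it = alpha + beta * x_it + eps_it + sigma_v * v_i
            rows.append((i, t, x_it, 1 if u_it > 0 else 0))
    return rows

panel = simulate_panel()
share_ones = sum(row[3] for row in panel) / len(panel)
print(f"share of y_it = 1: {share_ones:.3f}")
```

Only the binary outcome is kept, so the person effect cannot be subtracted out of the observed data the way it can in a linear model.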
4Application Doctor Visits
German Health Care Usage Data, 7,293 Individuals, Varying Numbers of Periods
Data downloaded from the Journal of Applied Econometrics Archive. This is an unbalanced panel with 7,293 individuals. The data can be used for regression, count models, binary choice, ordered choice, and bivariate binary choice. This is a large data set. There are altogether 27,326 observations. The number of observations per individual ranges from 1 to 7. (Frequencies are 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987.) Note, the variable NUMOBS below tells how many observations there are for each person. This variable is repeated in each row of the data for the person.
Variables in the file are:
- DOCTOR = 1(Number of doctor visits > 0)
- HHNINC = household nominal monthly net income in German marks / 10000 (4 observations with income = 0 were dropped)
- HHKIDS = 1 if children under age 16 in the household; 0 otherwise
- EDUC = years of schooling
- AGE = age in years
- MARRIED = marital status
5Application Innovations
6Pooled Estimation
7The Panel Probit Model
8FIML
See Greene, W., "Convenient Estimators for the
Panel Probit Model: Further Results," Empirical
Economics, 29, 1, Jan. 2004, pp. 21-48.
9GMM
10GMM Estimation-1
11GMM Estimation-2
12Unobserved Heterogeneity
13Fixed and Random Effects Models
- Random Effects
- Inconsistent if correlated with X
- Small number of parameters
- Easier to compute
- Fixed Effects
- Robust to both specifications? No. Actually always inconsistent.
- Inconvenient to compute (many parameters)
- Incidental parameters problem
- Computation: available estimators
- Not possible to test RE vs. FE based on MLE (because MLE/FE is inconsistent).
14Random Effects
- $U_{it} = \alpha + \beta' x_{it} + (\varepsilon_{it} + \sigma_v v_i)$
- Joint probability for individual i, given $v_i$
- Unobserved component $v_i$ must be eliminated
- Maximize wrt $\alpha$, $\beta$ and $\sigma_v$
- How to do the integration?
- Analytic integration: quadrature; most familiar software
- Simulation
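For reference, the conditional joint probability and the integrated likelihood that this slide refers to can be written as follows for the probit case. This is a standard reconstruction, not the slide's own formula (which did not survive extraction). It assumes a standard normal effect $v_i$ with density $\phi$, an intercept $\alpha$, slopes $\beta$, scale $\sigma_v$, the standard normal CDF $\Phi$, and the sign variable $q_{it} = 2y_{it} - 1$.

```latex
\Pr(y_{i1},\dots,y_{iT_i} \mid v_i)
  = \prod_{t=1}^{T_i} \Phi\!\left[ q_{it}\,(\alpha + \beta' x_{it} + \sigma_v v_i) \right],
  \qquad q_{it} = 2y_{it} - 1

L_i = \int_{-\infty}^{\infty}
      \prod_{t=1}^{T_i} \Phi\!\left[ q_{it}\,(\alpha + \beta' x_{it} + \sigma_v v_i) \right]
      \phi(v_i)\, dv_i

\log L = \sum_{i=1}^{N} \log L_i
```

The one-dimensional integral over $v_i$ has no closed form, which is why the choice is between quadrature and simulation.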
15Quadrature Butler and Moffitt
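A minimal sketch of the quadrature idea, assuming a single regressor and a hard-coded 5-point Gauss-Hermite rule (production software typically uses more points). The integral over a standard normal effect v is rewritten with the change of variable v = sqrt(2) z so that it matches the Gauss-Hermite weight exp(-z^2). The function names and the example numbers are illustrative assumptions, not the slides' implementation.

```python
import math

# 5-point Gauss-Hermite rule for integrals of exp(-z^2) * f(z).
# Standard tabulated nodes and weights; the weights sum to sqrt(pi).
GH_NODES = [-2.020182870456086, -0.958572464613819, 0.0,
            0.958572464613819, 2.020182870456086]
GH_WEIGHTS = [0.019953242059046, 0.393619323152241, 0.945308720482942,
              0.393619323152241, 0.019953242059046]

def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def re_probit_prob(y, x, alpha, beta, sigma_v):
    """Approximate the integral over v of
    prod_t Phi[q_t (alpha + beta x_t + sigma_v v)] phi(v)."""
    total = 0.0
    for z, w in zip(GH_NODES, GH_WEIGHTS):
        v = math.sqrt(2.0) * z                      # change of variable
        prod = 1.0
        for y_t, x_t in zip(y, x):
            q = 2 * y_t - 1                         # +1 if y_t = 1, -1 if y_t = 0
            prod *= norm_cdf(q * (alpha + beta * x_t + sigma_v * v))
        total += w * prod
    return total / math.sqrt(math.pi)

def re_probit_loglik(groups, alpha, beta, sigma_v):
    """Sum of log integrated probabilities over individuals (y_i, x_i)."""
    return sum(math.log(re_probit_prob(y, x, alpha, beta, sigma_v))
               for y, x in groups)

# Hypothetical individual with three periods
p = re_probit_prob([1, 0, 1], [0.3, -1.1, 0.8], alpha=-0.2, beta=0.5, sigma_v=0.9)
print(p)
```

An estimator would pass `re_probit_loglik` to a numerical optimizer over the intercept, slope and scale; that step is omitted here.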
16Estimation by Simulation
The log likelihood is the sum of the logs of $E_{v_i}[\Pr(y_{i1}, y_{i2}, \dots \mid v_i)]$.
The expectation can be estimated by sampling $v_i$ and averaging. (Use
random numbers.)
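A minimal sketch of the simulation approach, assuming a single regressor and pseudo-random standard normal draws (the estimates reported later in these slides are based on Halton draws, which this sketch does not implement). The function names, the number of draws and the seed are illustrative assumptions.

```python
import math
import random

def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simulated_prob(y, x, alpha, beta, sigma_v, draws):
    """Average of prod_t Phi[q_t (alpha + beta x_t + sigma_v v)] over draws of v."""
    total = 0.0
    for v in draws:
        prod = 1.0
        for y_t, x_t in zip(y, x):
            q = 2 * y_t - 1
            prod *= norm_cdf(q * (alpha + beta * x_t + sigma_v * v))
        total += prod
    return total / len(draws)

def simulated_loglik(groups, alpha, beta, sigma_v, n_draws=200, seed=2004):
    """Simulated log likelihood. Re-seeding on every call keeps the draws
    fixed across parameter values, which an optimizer would need."""
    rng = random.Random(seed)
    loglik = 0.0
    for y, x in groups:
        draws = [rng.gauss(0.0, 1.0) for _ in range(n_draws)]
        loglik += math.log(simulated_prob(y, x, alpha, beta, sigma_v, draws))
    return loglik

# Two hypothetical individuals with 3 and 2 observed periods
groups = [([1, 0, 1], [0.3, -1.1, 0.8]), ([0, 0], [1.0, 0.2])]
print(simulated_loglik(groups, alpha=-0.2, beta=0.5, sigma_v=0.9))
```

Holding the draws fixed matters in practice: if they changed between function evaluations, the simulated log likelihood would be a noisy function of the parameters.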
17Random Effects is Equivalent to a Random Constant
Term
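- $U_{it} = \alpha + \beta' x_{it} + (\varepsilon_{it} + \sigma_v v_i)$
- $= (\alpha + \sigma_v v_i) + \beta' x_{it} + \varepsilon_{it}$
- $= \alpha_i + \beta' x_{it} + \varepsilon_{it}$
- $\alpha_i$ is random with mean $\alpha$ and variance $\sigma_v^2$
- View the simulation as sampling over $\alpha_i$
Why not make all the coefficients random?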
18No Effects Pooled Model
---------------------------------------------
Binomial Probit Model
Number of observations 27322
Log likelihood function -17408.37
Restricted log likelihood -18016.64
Chi squared 1216.548
Degrees of freedom 7
Prob[ChiSqd > value] = .0000000
---------------------------------------------
Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]    Mean of X
--------------------------------------------------------------------------
Index function for probability
Constant    -.03345234      .06125319       -.546      .5850
HHNINC      -.08611064      .04766266      -1.807      .0708     .35213516
HHKIDS      -.15550242      .01834218      -8.478      .0000     .40271576
EDUC        -.01433684      .00357801      -4.007      .0001    11.3201838
MARRIED      .07154569      .02064966       3.465      .0005     .75869263
AGE          .01113647      .00081150      13.723      .0000    43.5271942
FEMALE       .32490404      .01727948      18.803      .0000     .47880829
WORKING     -.09339954      .01938730      -4.818      .0000     .67714662
19Butler and Moffitt
---------------------------------------------
Random Effects Binary Probit Model
Log likelihood function -16139.52
Number of parameters 9
Restricted log likelihood -17408.37
Chi squared 2537.696
Degrees of freedom 1
Prob[ChiSqd > value] = .0000000
Unbalanced panel has 7289 individuals.
---------------------------------------------
Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]    Mean of X
--------------------------------------------------------------------------
Constant    -.23473434      .10082007      -2.328      .0199
HHNINC       .03021003      .06710041        .450      .6526     .35213516
HHKIDS      -.16974699      .02676266      -6.343      .0000     .40271576
EDUC        -.01700231      .00629522      -2.701      .0069    11.3201838
MARRIED      .03004334      .03099063        .969      .3323     .75869263
AGE          .01803997      .00130433      13.831      .0000    43.5271942
FEMALE       .43851164      .03097895      14.155      .0000     .47880829
WORKING     -.09188866      .02726688      -3.370      .0008     .67714662
Rho          .42933127      .01022112      42.004      .0000
RHO is $\sigma_v^2 / (1 + \sigma_v^2)$
20Simulation
---------------------------------------------
Random Coefficients Probit Model
Log likelihood function -16154.13
Number of parameters 9
Restricted log likelihood -17408.37
Chi squared 2508.469
Degrees of freedom 1
Prob[ChiSqd > value] = .0000000
Unbalanced panel has 7289 individuals.
PROBIT (normal) probability model
Simulation based on 20 Halton draws
---------------------------------------------
Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]    Mean of X
--------------------------------------------------------------------------
Nonrandom parameters
HHNINC       .03584846      .03558953       1.007      .3138     .35213516
HHKIDS      -.16946521      .01364814     -12.417      .0000     .40271576
EDUC        -.01676742      .00272734      -6.148      .0000    11.3201838
MARRIED      .02965951      .01535582       1.931      .0534     .75869263
AGE          .01765054      .00061319      28.785      .0000    43.5271942
FEMALE       .43479769      .01295456      33.563      .0000     .47880829
WORKING     -.09023285      .01416939      -6.368      .0000     .67714662
Means for random parameters
Constant    -.22761867      .04667520      -4.877      .0000
Scale parameters for dists. of random parameters
Constant     .87528074      .00790787     110.685      .0000
RHO = .40456
21Fixed Effects
- Dummy variable coefficients
- $U_{it} = \alpha_i + \beta' x_{it} + \varepsilon_{it}$
- Can be done by brute force for 10,000s of individuals
- F(.) = appropriate probability for the observed outcome
- Compute $\beta$ and $\alpha_i$ for i = 1,...,N (may be large)
- Group mean deviations do not work here. This must be done the hard way. (Infeasible? Generally viewed as such.)
- See "Estimating Econometric Models with Fixed Effects" at www.stern.nyu.edu/wgreene/fixedeffects.doc
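For reference, the unconditional (dummy variable) log likelihood that F(.) belongs to can be written as below. This is a standard reconstruction rather than the slide's own formula, assuming slopes $\beta$, individual constants $\alpha_i$, and $T_i$ observations on individual i; the probit and logit forms of F are given as comments.

```latex
\log L = \sum_{i=1}^{N} \sum_{t=1}^{T_i}
         \log F\!\left(y_{it},\; \alpha_i + \beta' x_{it}\right)

% probit: F = \Phi\left[(2y_{it} - 1)(\alpha_i + \beta' x_{it})\right]
% logit:  F = \Lambda\left[(2y_{it} - 1)(\alpha_i + \beta' x_{it})\right],
%         \Lambda(u) = e^{u} / (1 + e^{u})
```

The maximization is over the slope vector and all N constants jointly, which is what makes the parameter count large.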
22Estimating Models with Fixed Individual Effects
- Additive Effects
- Log Likelihood Function
- Approach
- Conditional estimation based on sufficient statistics
- Unconditional, brute force with all dummy variables
23Application Probit in GSOEP Data: $y_{it} = 1[\text{Health Satisfaction} > 6]$
24Conditional Estimation
- Principle: $f(y_{i1}, y_{i2}, \dots \mid \text{some statistic})$ is free of the fixed effects for some models.
- Maximize the conditional log likelihood, given the statistic.
25Example Two Period Binary Logit
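The slide's derivation did not survive extraction; the standard two-period result is sketched here, writing $\alpha_i$ for the fixed effect, $\beta$ for the slopes, and $a_{it} = \alpha_i + \beta' x_{it}$ (notation assumed for this sketch). Conditioning on $y_{i1} + y_{i2} = 1$ removes $\alpha_i$:

```latex
\Pr(y_{i1}=0, y_{i2}=1) = \frac{1}{1+e^{a_{i1}}} \cdot \frac{e^{a_{i2}}}{1+e^{a_{i2}}},
\qquad
\Pr(y_{i1}=1, y_{i2}=0) = \frac{e^{a_{i1}}}{1+e^{a_{i1}}} \cdot \frac{1}{1+e^{a_{i2}}}

\Pr(y_{i1}=0, y_{i2}=1 \mid y_{i1}+y_{i2}=1)
  = \frac{e^{a_{i2}}}{e^{a_{i1}} + e^{a_{i2}}}
  = \frac{e^{\beta' x_{i2}}}{e^{\beta' x_{i1}} + e^{\beta' x_{i2}}}
  = \Lambda\!\left[\beta'(x_{i2} - x_{i1})\right]
```

Groups with $y_{i1} + y_{i2}$ equal to 0 or 2 have conditional probability one and contribute nothing to the conditional log likelihood.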
26Conditional Logit Model General
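A minimal sketch of the general conditional probability for the fixed effects logit, assuming a single regressor and enumerating all outcome sequences with the same sum (practical only for small T). The second pair of functions rebuilds the same quantity from the unconditional probabilities for a given fixed effect, to show numerically that the fixed effect cancels. All function names and example numbers are illustrative assumptions.

```python
import itertools
import math

def conditional_logit_prob(y, x, beta):
    """Pr(y_1..y_T | sum_t y_t): exp(beta * sum y_t x_t) over the same
    expression summed across all sequences d with sum(d) == sum(y)."""
    s = sum(y)
    numer = math.exp(beta * sum(y_t * x_t for y_t, x_t in zip(y, x)))
    denom = 0.0
    for d in itertools.product((0, 1), repeat=len(y)):
        if sum(d) == s:
            denom += math.exp(beta * sum(d_t * x_t for d_t, x_t in zip(d, x)))
    return numer / denom

def joint_logit_prob(y, x, alpha_i, beta):
    """Unconditional Pr(y_1..y_T) given the fixed effect alpha_i."""
    p = 1.0
    for y_t, x_t in zip(y, x):
        lam = 1.0 / (1.0 + math.exp(-(alpha_i + beta * x_t)))
        p *= lam if y_t == 1 else 1.0 - lam
    return p

def conditional_from_joint(y, x, alpha_i, beta):
    """Same conditional probability, built from the joint probabilities."""
    s = sum(y)
    denom = sum(joint_logit_prob(d, x, alpha_i, beta)
                for d in itertools.product((0, 1), repeat=len(y))
                if sum(d) == s)
    return joint_logit_prob(y, x, alpha_i, beta) / denom

# Hypothetical individual with four periods
y, x, beta = [1, 0, 0, 1], [0.4, -0.3, 1.2, 0.1], 0.7
print(conditional_logit_prob(y, x, beta))
print(conditional_from_joint(y, x, alpha_i=-3.0, beta=beta))
print(conditional_from_joint(y, x, alpha_i=2.0, beta=beta))
```

The three printed values should agree up to rounding, whatever value of the fixed effect is supplied.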
27Binary Probabilities, cont.
- Estimate $\beta$ by maximizing conditional logL or some other means
- Estimate $\alpha_i$ by using the known $\beta$ in the FOC for the unconditional logL. E.g., for the logit model, $\sum_t [y_{it} - \Lambda(\alpha_i + \beta' x_{it})] = 0$
- Solve for the N constants, one at a time treating $\beta$ as known.
- No solution when $y_{it}$ sums to 0 or $T_i$
- Iterating back and forth does not solve the overall problem.
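A minimal sketch of the one-at-a-time step for the logit model, assuming a single regressor and a slope that is treated as known. The constant for one individual solves the first order condition that the sum over t of y_t minus Lambda(alpha + beta x_t) equals zero, found here by bisection. The function names, the bracket of plus or minus 30, and the example numbers are illustrative assumptions; the bracket presumes beta times x is moderate in size.

```python
import math

def logistic(u):
    """Lambda(u) = exp(u) / (1 + exp(u)), written to avoid overflow."""
    return 0.5 * (1.0 + math.tanh(0.5 * u))

def solve_alpha_i(y, x, beta, lo=-30.0, hi=30.0, tol=1e-10):
    """Solve sum_t [y_t - Lambda(alpha + beta * x_t)] = 0 for alpha.
    Returns None when y sums to 0 or T, where no finite solution exists."""
    s = sum(y)
    if s == 0 or s == len(y):
        return None

    def foc(a):
        # Strictly decreasing in a, so bisection applies
        return s - sum(logistic(a + beta * x_t) for x_t in x)

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if foc(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical individual with four periods and a given slope
alpha_hat = solve_alpha_i([1, 0, 1, 1], [0.4, -0.3, 1.2, 0.1], beta=0.7)
print(alpha_hat)
print(solve_alpha_i([0, 0, 0], [0.4, -0.3, 1.2], beta=0.7))  # None
```

This recovers a constant for each individual with variation in the outcome, but it does not repair the estimator of the constants themselves, which is based on only the few observations for that individual.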
28Logit Constant Terms
29Estimating Partial Effects
- The fixed effects logit estimator of $\beta$ immediately gives us the effect of each element of $x_i$ on the log-odds ratio. Unfortunately, we cannot estimate the partial effects unless we plug in a value for $a_i$. Because the distribution of $a_i$ is unrestricted (in particular, $E[a_i]$ is not necessarily zero), it is hard to know what to plug in for $a_i$. In addition, we cannot estimate average partial effects, as doing so would require finding $E[\Lambda(x_{it}\beta + a_i)]$, a task that apparently requires specifying a distribution for $a_i$.
30Average Partial Effects
31Conditional Estimation
- Other Distributions?
- Poisson: the leading nonbinary case
- Loglinear exponential: we looked at this when we derived a concentrated log likelihood function
- Almost no others
- Estimating constants is still a problem if marginal effects or probabilities are desired
32Unconditional Estimation
- Maximize the whole log likelihood
- Difficult!
- Possibly many (thousands) of parameters.
- No way to condition them out of the likelihood
- Feasible: Special structure of logL enables full estimation. See http://www.stern.nyu.edu/wgreene/fixedeffects.doc
- The record: 150,000 dummy variable coefficients in an education model (2006).
- The estimation issue is the incidental parameters problem, not the practical computation.
33Conditional vs. Unconditional, Dep. Var. = Healthy
Note, this estimator is not consistent
Incidental Parameters Problem
34Escaping the FE Assumptions
35Modeling a Binary Outcome
- Did firm i produce a product or process innovation in year t? $y_{it}$: 1 = Yes / 0 = No
- Observed N = 1270 firms for T = 5 years, 1984-1988
- Observed covariates: $x_{it}$ = industry, competitive pressures, size, productivity, etc.
- How to model?
- Binary outcome
- Correlation across time
- Heterogeneity across firms
36Application
38Estimates of a Fixed Effects Probit Model
39Pooled, Fixed Effects and Random Effects Probit
---------------------------------------------
Probit Regression Start Values for IP
---------------------------------------------
Variable   Coefficient   Standard Error   b/St.Er.   P[|Z|>z]    Mean of X
--------------------------------------------------------------------------
EMPLP        .00013619     .154784D-04      8.798      .0000    580.944724
LOGSALES     .13668535      .01792267       7.626      .0000    10.5400961
IMUM         .89572747      .14180988       6.316      .0000     .25275054
FDIUM       3.24193536      .38236984       8.479      .0000     .04580618
Constant   -1.63354524      .20737277      -7.877      .0000
---------------------------------------------
Random Effects Binary Probit Model
---------------------------------------------
EMPLP        .00017616     .117150D-04     15.037      .0000    580.944724
LOGSALES     .21174534      .04309101       4.914      .0000    10.5400961
IMUM        1.41657383      .34121909       4.152      .0000     .25275054
FDIUM       4.41817066      .83712165       5.278      .0000     .04580618
Constant   -2.51015928      .49459030      -5.075      .0000
Rho          .58588783      .01864491      31.423      .0000
---------------------------------------------
FIXED EFFECTS Probit Model
---------------------------------------------
EMPLP       .121081D-05     .00014700        .008      .9934    419.786630
LOGSALES    -.53108315      .34473601      -1.541      .1234    10.5368540
IMUM        4.26652343     2.87418573       1.484      .1377     .25359436
FDIUM      -7.34808205     3.31155361      -2.219      .0265     .04444097
40Fixed Effects
- Advantages
- Allows correlation of effect and regressors
- Fairly straightforward to estimate
- Simple to interpret
- Disadvantages
- Not necessarily simple to estimate if very large samples (Stata just creates the thousands of dummy variables)
- The incidental parameters problem: Small T bias.
41Incidental Parameters Problems Conventional
Wisdom
- General: Biased in samples with fixed T except in special cases such as linear or Poisson regression
- Specific: Upward bias (experience with probit and logit) in estimators of $\beta$
42What We KNOW - Analytic
- Newey and Hahn: MLE converges in probability to a vector of constants. (Variance diminishes with increase in N.)
- Abrevaya and Hsiao: Logit estimator converges to $2\beta$ when T = 2.
- Han, Schmidt, Greene: Probit estimator converges to $2\beta$ when T = 2.
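The logit result for T = 2 can be checked numerically. The sketch below is an illustration under assumed values (one regressor, true slope 1, 400 simulated individuals, a fixed seed), not the analysis in the cited papers. It computes the conditional estimator and the brute-force unconditional estimator on the same simulated data; groups whose outcome never changes are dropped, since they contribute nothing to either estimator of the slope. All function names are hypothetical.

```python
import math
import random

def logistic(u):
    return 0.5 * (1.0 + math.tanh(0.5 * u))

def log_logistic(u):
    # log Lambda(u), arranged to avoid overflow for large |u|
    if u > 0:
        return -math.log1p(math.exp(-u))
    return u - math.log1p(math.exp(u))

def simulate_switchers(n=400, beta=1.0, seed=7):
    """Two-period logit panel; keep only groups with y_1 + y_2 = 1."""
    rng = random.Random(seed)
    groups = []
    for _ in range(n):
        a_i = rng.gauss(0.0, 1.0)
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        y = [1 if rng.random() < logistic(a_i + beta * x_t) else 0 for x_t in x]
        if sum(y) == 1:
            groups.append((y, x))
    return groups

def bisect_root(f, lo, hi, tol=1e-10):
    """Root of a decreasing function f on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def conditional_estimate(groups):
    # Maximizes sum_i log Lambda(b * z_i), z_i = (y_2 - y_1)(x_2 - x_1),
    # by finding the root of its (decreasing) score
    z = [(y[1] - y[0]) * (x[1] - x[0]) for y, x in groups]
    return bisect_root(
        lambda b: sum(z_i * (1.0 - logistic(b * z_i)) for z_i in z), -20.0, 20.0)

def profile_loglik(b, groups):
    """Unconditional logL at slope b, each constant solved from its own FOC."""
    total = 0.0
    for y, x in groups:
        a = bisect_root(
            lambda a: 1.0 - logistic(a + b * x[0]) - logistic(a + b * x[1]),
            -80.0, 80.0)
        for y_t, x_t in zip(y, x):
            total += log_logistic((2 * y_t - 1) * (a + b * x_t))
    return total

def unconditional_estimate(groups, lo=-10.0, hi=10.0, tol=1e-6):
    """Golden-section search for the maximizer of the profile log likelihood."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = profile_loglik(c, groups), profile_loglik(d, groups)
    while hi - lo > tol:
        if fc > fd:
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = profile_loglik(c, groups)
        else:
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = profile_loglik(d, groups)
    return 0.5 * (lo + hi)

groups = simulate_switchers()
b_cond = conditional_estimate(groups)
b_unc = unconditional_estimate(groups)
print(b_cond, b_unc, b_unc / b_cond)
```

In this two-period logit case the ratio of the two estimates should be 2 up to optimizer tolerance in any sample, not just approximately in large samples, so the unconditional estimator inherits twice whatever the consistent conditional estimator converges to.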
43What We THINK We Know Monte Carlo
- Heckman
- Bias in probit estimator is small if T ≥ 8
- Bias in probit estimator is toward 0 in some cases
- Katz (et al., numerous others), Greene
- Bias in probit and logit estimators is large
- Upward bias persists even as T → 20
44Heckman's Monte Carlo Study
45Some Familiar Territory: A Monte Carlo Study of
the FE Estimator, Probit vs. Logit (Greene, The
Econometrics Journal, 7, 2004, pp. 98-119)
Estimates of Coefficients and Marginal Effects at
the Implied Data Means
Results are scaled so the desired quantities being
estimated ($\beta$, $\delta$, marginal effects) all equal 1.0
in the population.
46A Monte Carlo Study of the FE Probit Estimator
Percentage Biases in Estimates of Coefficients
and Marginal Effects at the Implied Data Means
47Fixed Effects Models
- Incidental parameters problem if T < 10 (roughly)
- Inconvenience of computation
- Appealing specification
- Alternative semiparametric estimators?
- Theory not well developed for T > 2
- Not informative for anything but slopes (e.g., predictions and marginal effects)
- Ignoring the heterogeneity definitely produces an inconsistent estimator (even with cluster correction!)
- A Hobson's choice
- Dynamics make it worse
- Ongoing research