Categorical Data Analysis

Transcript and Presenter's Notes
1
Categorical Data Analysis
  • Week 2

2
Binary Response Models
  • binary and binomial responses
  • binary y assumes values of 0 or 1
  • binomial y is number of successes in n
    trials
  • distributions
  • Bernoulli
  • Binomial

3
Transformational Approach
  • linear probability model
  • use grouped data (events/trials)
  • identity link
  • linear predictor
  • problem: predictions can fall outside [0, 1]

4
The Logit Model
  • logit transformation
  • inverse logit
  • ensures that p is in (0, 1) for all values of x
    and β
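The slide's displayed equations are not reproduced in this transcript; in standard notation, the logit transformation and its inverse are:

    logit(p) = log[p / (1 − p)] = xβ
    p = exp(xβ) / [1 + exp(xβ)] = 1 / [1 + exp(−xβ)]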

5
The Logit Model
  • odds and odds ratios are the key to understanding
    and interpreting this model
  • the log odds transformation is a stretching
    transformation to map probabilities to the real
    line

6
Odds and Probabilities
7
Probabilities and Log Odds
8
The Logit Transformation
  • properties of logit

linear in x on the log-odds scale
9
Odds, Odds Ratios, and Relative Risk
  • odds of success is the ratio
  • consider two groups with success probabilities
  • odds ratio (OR) is a measure of the odds of
    success in group 1 relative to group 2
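In standard notation (the slide's displayed formulas), the odds and the odds ratio are:

    odds = p / (1 − p)
    OR = [p1 / (1 − p1)] / [p2 / (1 − p2)]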

10
Odds Ratio
  • 2 x 2 table of X (rows, coded 0/1) by Y (columns,
    coded 0/1); the slide's cell counts are not
    reproduced in this transcript
  • OR is the cross-product ratio (compare x = 1
    group to x = 0 group)
  • odds of y = 1 are 4 times higher when x = 1 than
    when x = 0
11
Odds Ratio
  • equivalent interpretation
  • odds of y = 1 are 0.225 times as high when x = 0
    as when x = 1
  • odds of y = 1 are 1 − 0.225 = 0.775 times lower
    when x = 0 than when x = 1
  • odds of y = 1 are 77.5% lower when x = 0 than
    when x = 1

12
Log Odds Ratios
  • consider the model logit(p) = β0 + β1 D
  • D is a dummy variable coded 1 if group 1 and 0
    otherwise
  • group 1: logit(p) = β0 + β1
  • group 2: logit(p) = β0
  • LOR = β1; OR = exp(β1)

13
Relative Risk
  • similar to OR, but works with rates
  • relative risk or rate ratio (RR) is the rate in
    group 1 relative to group 2
  • OR ≈ RR as the success probabilities approach 0
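In standard notation, with rates p1 and p2:

    RR = p1 / p2
    OR = RR × (1 − p2) / (1 − p1)

so OR ≈ RR when both rates are small.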

14
Tutorial odds and odds ratios
  • consider the following data

15
Tutorial odds and odds ratios
  • read table

clear
input educ psex f
0 0 873
0 1 1190
1 0 533
1 1 1208
end
label define edlev 0 "HS or less" 1 "Col or more"
label val educ edlev
label var educ "education"
16
Tutorial odds and odds ratios
  • compute odds
  • verify by hand

tabodds psex educ [fw=f]
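Hand check, using the counts entered above: the odds of psex = 1 are 1190/873 ≈ 1.363 for the "HS or less" group and 1208/533 ≈ 2.266 for the "Col or more" group.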
17
Tutorial odds and odds ratios
  • compute odds ratios
  • verify by hand

tabodds psex educ [fw=f], or
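Hand check: OR = (1208 × 873) / (533 × 1190) ≈ 2.266 / 1.363 ≈ 1.66, comparing "Col or more" to "HS or less".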
18
Tutorial odds and odds ratios
  • stat facts
  • variances of functions
  • use in statistical significance tests and forming
    confidence intervals
  • basic rule for variances of linear
    transformations
  • if g(x) = a + bx is a linear function of x, then
    Var[g(x)] = b² Var(x)
  • this is a trivial case of the delta method
    applied to a single variable
  • the delta method for the variance of a nonlinear
    function g(x) of a single variable is
    Var[g(x)] ≈ [g′(x)]² Var(x), with g′ evaluated at
    the mean of x

19
Tutorial odds and odds ratios
  • stat facts
  • variances of odds and odds ratios
  • we can use the delta method to find the variance
    in the odds and the odds ratios
  • from the asymptotic (large sample theory)
    perspective it is best to work with log odds and
    log odds ratios
  • the log odds ratio converges to normality at a
    faster rate than the odds ratio, so statistical
    tests may be more appropriate on log odds ratios
    (nonlinear functions of p)

20
Tutorial odds and odds ratios
  • stat facts
  • the log odds ratio is the difference in the log
    odds for two groups
  • groups are independent
  • variance of a difference is the sum of the
    variances
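For a 2 x 2 table with cell counts n11, n12, n21, n22, the standard delta-method result is:

    Var(log OR) = 1/n11 + 1/n12 + 1/n21 + 1/n22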

21
Tutorial odds and odds ratios
  • data structures: grouped or individual level
  • note
  • use frequency weights to handle grouped data
  • or we could expand this data by the frequency
    weights, resulting in individual-level data
  • model results from either data structure are the
    same
  • expand the data and verify the following results

expand f
22
Tutorial odds and odds ratios
  • statistical modeling
  • logit model (glm)
  • logit model (logit)

glm psex educ [fw=f], f(b) eform
logit psex educ [fw=f], or
23
Tutorial odds and odds ratios
  • statistical modeling (1)
  • logit model (glm)

24
Tutorial odds and odds ratios
  • statistical modeling (2)
  • some ideas from alternative normalizations
  • what parameters will this model produce?
  • what is the interpretation of the constant?

gen cons = 1
glm psex cons educ [fw=f], nocons f(b) eform
25
Tutorial odds and odds ratios
  • statistical modeling (2)

26
Tutorial odds and odds ratios
  • statistical modeling (3)
  • what parameters does this model produce?
  • how do you interpret them?

gen lowed = educ==0
gen hied = educ==1
glm psex lowed hied [fw=f], nocons f(b) eform
27
Tutorial odds and odds ratios
  • statistical modeling (3)

are these odds ratios?
28
Tutorial prediction
  • fitted probabilities (after most recent model)

predict p, mu
tab educ [fw=f], sum(p) nostandard nofreq
29
Probit Model
  • inverse probit is the CDF for a standard normal
    variable
  • link function
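In standard notation, where Φ is the standard normal CDF:

    p = Φ(xβ)          (inverse link)
    Φ⁻¹(p) = xβ        (probit link)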

30
Probit Transformation
31
Interpretation
  • probit coefficients
  • interpreted as shifts in a standard normal index
    (no log odds-ratio interpretation)
  • scaled versions of logit coefficients
  • probit models
  • more common in certain disciplines (economics)
  • analogy with linear regression (normal latent
    variable)
  • more easily extended to multivariate
    distributions

32
Example Grouped Data
  • Swedish mortality data revisited

logit model
probit model
33
Swedish Historical Mortality Data
  • predictions

34
Programming
  • Stata generalized linear model (glm)
  • glm y A2 A3 P2, family(b n) link(probit)
  • glm y A2 A3 P2, family(b n) link(logit)
  • idea of glm is to make the model linear in the link
  • old days: iteratively reweighted least squares (IRLS)
  • now: Fisher scoring, Newton-Raphson
  • both approaches yield MLEs

35
Generalized Linear Models
  • applies to a broad class of models
  • iterative fitting (repeated updating) except for
    linear model
  • update parameters, weights W, and predicted
    values m
  • models differ in terms of W and m and assumptions
    about the distribution of y
  • common distributions for y include normal,
    binomial, and Poisson
  • common links include identity, logit, probit,
    and log

36
Latent Variable Approach
  • example insect mortality
  • suppose a researcher exposes insects to dosage
    levels (u) of an insecticide and observes whether
    the subject lives or dies at that dosage.
  • the response is expected to depend on the
    insect's tolerance (c) to that dosage level
  • the insect dies if u > c and survives if u < c
  • tolerance is not observed (survival is observed)

37
Latent Variables
  • u and c are continuous latent variables
  • examples
  • women's employment: u is the market wage and c is
    the reservation wage
  • migration: u is the benefit of moving and c is
    the cost of moving
  • the observed outcome y = 1 or y = 0 reveals the
    individual's preference, which is assumed to
    maximize a rational individual's utility function

38
Latent Variables
  • Assume linear utility and criterion functions
  • over-parameterization: identification problem
  • we can identify differences in components but not
    the separate components

39
Latent Variables
  • constraints
  • Then
  • where F(.) is the CDF of e
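A standard statement of this step, assuming a symmetric distribution for e: with y* = xβ + e and y = 1 when y* > 0,

    Pr(y = 1 | x) = Pr(e > −xβ) = F(xβ)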

40
Latent Variables and Standardization
  • Need to standardize the mean and variance of e
  • binary dependent variables lack inherent scales
  • the magnitude of β is meaningful only relative to
    the mean and variance of e, which are unknown
  • redefine e to a common standard, e* = (e − a) / b,
    where a and b are two chosen constants

41
Standardization for Logit and Probit Models
  • standardization implies Pr(y = 1 | x) = F[(xβ − a) / b]
  • F(.) is the CDF of e
  • location a and scale b need to be fixed
  • setting a = 0 and b = 1 standardizes e

42
Standardization for Logit and Probit Models
  • distribution of e is standardized
  • standard normal → probit
  • standard logistic → logit
  • both distributions have a mean of 0
  • variances differ
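Specifically: the standard normal has variance 1, while the standard logistic has variance π²/3 ≈ 3.29. This is why logit coefficients are roughly 1.6 to 1.8 times larger than the corresponding probit coefficients for the same data.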

43
Extending the Latent Variable Approach
  • observed y is a dichotomous (binary) 0/1 variable
  • continuous latent variable y* = xβ + e
    (linear predictor plus residual)
  • observed outcome: y = 1 if y* > 0, y = 0 otherwise

44
Notation
  • conditional means of latent variables obtained
    from index function
  • obtain probabilities from inverse link functions
  • logit model
  • probit model

45
ML
  • likelihood function
  • where ni = 1 if data are binary
  • log-likelihood function
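In standard binomial notation (ni trials, yi successes, success probability pi):

    L = ∏i pi^yi (1 − pi)^(ni − yi)
    log L = Σi [yi log pi + (ni − yi) log(1 − pi)]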

46
Assessing Models
  • definitions
  • L0: null model (intercept only)
  • Lf: saturated model (a parameter for each cell)
  • Lc: current model
  • grouped data (events/trials)
  • deviance (likelihood ratio statistic)
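With these definitions, the deviance is the likelihood-ratio statistic comparing the current model to the saturated model:

    D = 2(log Lf − log Lc)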

47
Deviance
  • grouped data
  • if cell sizes are reasonably large deviance is
    distributed as chi-square
  • individual-level data: Lf = 1 and log Lf = 0
  • in that case the deviance is not a fit statistic

48
Deviance
  • deviance is like a residual sum of squares
  • larger values indicate poorer models
  • larger models have smaller deviance
  • deviance for the more constrained model (Model
    1)
  • deviance for the less constrained model (Model 2)
  • assume that Model 1 is a constrained version of
    Model 2.

49
Difference in Deviance
  • evaluate competing nested models using a
    likelihood ratio statistic
  • model chi-square is a special case
  • SAS, Stata, R, etc. report different statistics
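In standard form, for nested models where Model 1 is the constrained version of Model 2:

    G² = D1 − D2 = 2(log L2 − log L1) ~ χ²(df = difference in number of parameters)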

50
Other Fit Statistics
  • BIC, AIC (useful for non-nested models)
  • basic idea of IC: penalize log L for the number
    of parameters (AIC/BIC) and/or the size of the
    sample (BIC); generic form IC = −2 log L + 2s × dfm
  • AIC: s = 1
  • BIC: s = ½ log n (n = sample size)
  • dfm is the number of model parameters

51
Hypothesis Tests/Inference
  • single parameter
  • MLEs are asymptotically normal → Z-test
  • multi-parameter
  • likelihood ratio tests (after fitting)
  • Wald tests (test constraints from current model)

52
Hypothesis Tests/Inference
  • Wald test (tests a vector of restrictions)
  • a set of r parameters are all equal to 0
  • a set of r parameters are linearly restricted

R: restriction matrix
r: constraint vector
β: parameter subset
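In this notation, the Wald statistic for testing Rb = r is:

    W = (Rb − r)′ [R V(b) R′]⁻¹ (Rb − r)

where b is the vector of estimates and V(b) its estimated covariance matrix; under H0, W ~ χ² with r degrees of freedom (one per restriction).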
53
Interpreting Parameters
  • odds ratios consider the model where x is a
    continuous predictor and d is a dummy variable
  • suppose that d denotes sex and x denotes income
    and the problem concerns voting, where y is the
    propensity to vote
  • results: logit(pi) = −1.92 + 0.012xi + 0.67di

54
Interpreting Parameters
  • for d (dummy variable coded 1 for female) the
    odds ratio is straightforward
  • holding income constant, women's odds of voting
    are nearly twice those of men
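Worked out from the fitted model above: OR = exp(0.67) ≈ 1.95.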

55
Interpreting Parameters
  • for x (continuous variable for income in
    thousands of dollars) the odds ratio is a
    multiplicative effect
  • suppose we increase income by 1 unit ($1,000)
  • suppose we increase income by c units (c × $1,000)
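Worked out from the fitted model: a 1-unit increase multiplies the odds by exp(0.012) ≈ 1.012; a c-unit increase multiplies the odds by exp(0.012c).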

56
Interpreting Parameters
  • if income is increased by $10,000, this increases
    the odds of voting by about 13%, since
    exp(10 × 0.012) = exp(0.12) ≈ 1.13
  • a note on percent change in odds
  • if the estimate of β > 0, then the percent increase
    in odds for a unit change in x is 100[exp(β) − 1]
  • if the estimate of β < 0, then the percent decrease
    in odds for a unit change in x is 100[1 − exp(β)]

57
Marginal Effects
  • marginal effect
  • effect of a change in x on the change in probability
  • ∂p/∂x = f(xβ)β, where f(.) is the pdf corresponding
    to the CDF F(.)
  • often we evaluate f(.) at the mean of x
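For the logit model this takes a familiar closed form, since the logistic pdf satisfies f(xβ) = p(1 − p):

    ∂p/∂xk = p(1 − p)βk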

58
Marginal Effect for a Change in a Continuous
Variable
59
Marginal Effect of a Change in a Dummy Variable
  • if x is a continuous variable and z is a dummy
    variable
  • marginal effect of change in z from 0 to 1 is the
    difference
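A standard way to write this, using hypothetical coefficients β0 and β1 for x and δ for the dummy z, with x held at its mean x̄:

    ME(z) = F(β0 + β1 x̄ + δ) − F(β0 + β1 x̄)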

60
Example
  • logit models for high school graduation
  • odds ratios (constant is baseline odds)

61
LR Test
  • Model 3 vs. 2

62
Wald Test
  • Test equality of parental education effects

logit hsg blk hsp female nonint inc nsibs mhs mcol fhs fcol wtest
test mhs = fhs
test mcol = fcol

cannot reject H0 of equal parental education
effects on HS graduation
63
Basic Estimation Commands (Stata)
estimation commands
model tests
* model 0 - null model
qui logit hsg
est store m0
* model 1 - race, sex, family structure
qui logit hsg blk hsp female nonint
est store m1
* model 1a - race X family structure interactions
qui xi: logit hsg blk hsp female nonint i.nonint*i.blk i.nonint*i.hsp
est store m1a
lrtest m1 m1a
* model 2 - SES
qui xi: logit hsg blk hsp female nonint inc nsibs mhs mcol fhs fcol
est store m2
* model 3 - individual
qui xi: logit hsg blk hsp female nonint inc nsibs mhs mcol fhs fcol wtest
est store m3
lrtest m2 m3
64
  • Fit Statistics etc.

* some 'hand' calculations with saved results
scalar ll = e(ll)
scalar npar = e(df_m) + 1
scalar nobs = e(N)
scalar AIC = -2*ll + 2*npar
scalar BIC = -2*ll + log(nobs)*npar
scalar list AIC
scalar list BIC
* or use the automated fitstat routine
fitstat
* output as a table
estout m0 m1 m2 m3 using modF07, replace star stfmt(9.2f 9.0f 9.0f) ///
    stats(ll N df_m) eform
65
Analysis of Deviance
66
BIC and AIC (using fitstat)
67
Marginal Effects
68
Marginal Effects
69
Generate Income Quartiles
qui sum adjinc, det
* quartiles for income distribution
gen incQ1 = adjinc < r(p25)
gen incQ2 = adjinc >= r(p25) & adjinc < r(p50)
gen incQ3 = adjinc >= r(p50) & adjinc < r(p75)
gen incQ4 = adjinc >= r(p75)
gen incQ = 1 if incQ1==1
replace incQ = 2 if incQ2==1
replace incQ = 3 if incQ3==1
replace incQ = 4 if incQ4==1
tab incQ
70
Fit Model for Each Quartile
* calculate predictions
* look at marginal effects of test score on graduation
* by selected groups: (1) model by income quartiles
local i = 1
while `i' < 5 {
    logit hsg blk female mhs nonint nsibs urban so wtest if incQ==`i'
    margeff
    cap drop wm*
    cap drop bm*
    prgen wtest, x(blk=0 female=0 mhs=1 nonint=0) gen(wmi) from(-3) to(3)
    prgen wtest, x(blk=0 female=0 mhs=1 nonint=1) gen(wmn) from(-3) to(3)
    label var wmip1 "white/intact"
    label var wmnp1 "white/nonintact"
    prgen wtest, x(blk=1 female=0 mhs=1 nonint=0) gen(bmi) from(-3) to(3)
    prgen wtest, x(blk=1 female=0 mhs=1 nonint=1) gen(bmn) from(-3) to(3)
    label var bmip1 "black/intact"
    label var bmnp1 "black/nonintact"
71
Graph
    set scheme s2mono
    twoway (line wmip1 wmix, sort xtitle("Test Score") ytitle("Pr(y=1)")) ///
        (line wmnp1 wmix, sort) (line bmip1 wmix, sort) ///
        (line bmnp1 wmix, sort), ///
        subtitle("Marginal Effect of Test Score on High School Graduation" ///
        "Income Quartile `i'") saving(wtgrph`i', replace)
    graph export wtgrph`i'.eps, as(eps) replace
    local i = `i' + 1
}
72
Fitted Probabilities
logit hsg blk female mhs nonint inc nsibs urban so wtest
prtab nonint blk female
73
Fitted Probabilities
  • predicted values
  • evaluate fitted probabilities at the sample mean
    values of x (or other fixed quantities)
  • averaging fitted probabilities over
    subgroup-specific models will produce marginal
    probabilities

74
Observed Fitted Probabilities
75
Alternative Probability Model
  • complementary log log (cloglog or CLL)
  • standard extreme-value distribution for u
  • cloglog model
  • cloglog link function
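In standard notation, the cloglog link and its inverse are:

    log[−log(1 − p)] = xβ
    p = 1 − exp[−exp(xβ)]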

76
Extreme-Value Distribution
  • properties
  • mean of u is Euler's constant, γ ≈ 0.5772
  • variance of u is π²/6
  • the difference of two independent extreme-value
    variables yields a logistic variable

77
CLL Transformation
78
CLL Model
  • no practical differences from logit and probit
    models
  • often suited for survival data and other
    applications
  • interpretation of coefficients
  • exp(β) is a relative risk or hazard ratio, not an OR
  • glm: binomial distribution for y with a cloglog
    link
  • cloglog: use the cloglog command directly
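A minimal sketch of the two equivalent commands, with hypothetical outcome y and covariates x1 x2:

* GLM with binomial family and cloglog link
glm y x1 x2, family(binomial) link(cloglog) eform
* or the dedicated command
cloglog y x1 x2, eform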

79
CLL and Logit Model Compared
80
Cloglog and Logit Model Compared
logit
cloglog
more agreement when modeling rare events
81
Extensions Multilevel Data
  • what is multilevel data?
  • individuals are nested in a larger context
  • children in families, kids in schools etc.

context 1
context 2
context 3
82
Multilevel Data
  • i.i.d. assumptions?
  • the outcomes for units in a given context could
    be associated
  • standard model would treat all outcomes
    (regardless of context) as independent
  • multilevel methods account for the within-cluster
    dependence
  • a general problem with binomial responses
  • we assume that trials are independent
  • this might not be realistic
  • non-independence will inflate the variance
    (overdispersion)

83
Multilevel Data
  • example (in book)
  • 40 universities as units of analysis
  • for each university we observe the number of
    graduates (n) and the number receiving
    post-doctoral fellowships (y)
  • we could compute proportions (MLEs)
  • some proportions would be better estimates as
    they would have higher precision or lower
    variance
  • example: the data y1/n1 = 2/5 and y2/n2 = 20/50
    give identical estimates of p but variances of
    0.048 and 0.0048, respectively
  • the 2nd estimate is more precise than the 1st
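Hand check, using Var(p̂) = p̂(1 − p̂)/n with p̂ = 0.4: 0.4 × 0.6 / 5 = 0.048 and 0.4 × 0.6 / 50 = 0.0048.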

84
Multilevel Data
  • multilevel models allow for improved predictions
    of individual probabilities
  • MLE estimate is unaltered if it is precise
  • MLE estimate moved toward average if it is
    imprecise (shrinkage)
  • multilevel estimate of p would be a weighted
    average of the MLE and the average over all MLEs
    (weight (w) is based on the variance of each
    MLE and the variance over all the MLEs)
  • we are generally less interested in the ps and
    more interested in the model parameters and
    variance components

85
Shrinkage Estimation
  • primitive approach
  • assume we have a set of estimates (MLEs)
  • our best estimate of the variance of each MLE is
    Var(p̂i) = p̂i(1 − p̂i)/ni
  • this is the within variance (no pooling)
  • if this is large, then the MLE is a poor estimate
  • a better estimate might be the average of the
    MLEs in this case (pooling the estimates)
  • we can average the MLEs and estimate the between
    variance as the variance of the p̂i around that
    average

86
Shrinkage Estimation
  • primitive approach
  • we can then estimate a weight wi
  • a revised estimate of pi would take account of
    the precision to form a precision-weighted average
  • precision is a function of ni
  • more weight is given to more precise MLEs
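One common form of this weight, writing B for the between variance and Vi for the within variance of the ith MLE:

    wi = B / (B + Vi)
    p̃i = wi p̂i + (1 − wi) p̄

so precise MLEs (small Vi, wi near 1) are left nearly unaltered, while imprecise ones are shrunk toward the overall average p̄.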

87
Shrinkage a primitive approach
88
Shrinkage
results from full Bayesian (multilevel) Analysis
89
Extension Multilevel Models
  • assumptions
  • within-context and between-context variation in
    outcomes
  • individuals within the same context share the
    same random error specific to that context
  • models are hierarchical
  • individuals (level-1)
  • contexts (level-2)

90
Multilevel Models Background
  • linear mixed model for continuous y
  • (multilevel, random coefficients, etc.)
  • level-1 model and level-2 sub-models
    (hierarchical)

91
Multilevel Models Background
  • linear mixed model assumptions
  • level-1 and level-2 residuals

92
Multilevel Models Background
  • composite form

composite residual
fixed effects
cross-level interaction
random effects (level-2)
93
Multilevel Models Background
  • variance components

94
Multilevel Models Background
  • general form (linear mixed model)

variables associated with fixed coefficients
variables associated with random coefficients
95
Multilevel Models Logit Models
  • binomial model (random effect)
  • assumptions
  • u increases or decreases the expected response
    for individual j in context i independently of x
  • all individuals in context i share the same value
    of u
  • also called a random intercept model
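A standard way to write the random-intercept logit, with context i and individual j:

    logit(pij) = xij′β + ui,   ui ~ N(0, σu²)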

96
Multilevel Models
  • a hierarchical model
  • z is a level-1 variable; x is a level-2 variable
  • random intercept varies among level-2 units
  • note level-1 residual variance is fixed (why?)

97
Multilevel Models
  • a general expression
  • x are variables associated with fixed
    coefficients
  • z are variables associated with random
    coefficients
  • u is multivariate normal vector of level-2
    residuals
  • mean of u is 0; the covariance matrix of u is
    estimated

98
Multilevel Models
  • random effects vs. random coefficients
  • random effects: u
  • random coefficients: β + u
  • variance components
  • interested in level-2 variation in u
  • prediction
  • E(y) is not equal to E(y|u)
  • model-based predictions need to consider random
    effects

99
Multilevel Models Generalized Linear Mixed
Models (GLMM)
Conditional Expectation
Marginal Expectation
requires numerical integration or simulation
100
Data Structure
  • multilevel data structure
  • requires a context id to identify individuals
    belonging to the same context
  • NLSY sibling data contains a family id
    (constructed by researcher)
  • data are unbalanced (we do not require clusters
    to be the same size)
  • small clusters will contribute less information
    to the estimation of variance components than
    larger clusters
  • it is OK to have clusters of size 1
  • (i.e., an individual is a context unto
    themselves)
  • clusters of size 1 contribute to the estimation
    of fixed effects but not to the estimation of
    variance components

101
Example clustered data
  • siblings nested in families
  • y is 1st premarital birth for NLSY women
  • select sib-ships of size > 2
  • null model (random intercept)
  • xtlogit fpmbir, i(famid)
  • or
  • xtmelogit fpmbir || famid:

102
Example clustered data
random intercept xtlogit
103
Example clustered data
random intercept xtmelogit
104
Variance Component
  • add predictors (mostly level-2)

105
Variance Component
  • conditional variance in u is 2.107
  • proportionate reduction in error (PRE)
  • a 31% reduction in level-2 variance when level-2
    predictors are accounted for
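The PRE compares the level-2 variance before and after adding predictors:

    PRE = [σu²(null) − σu²(conditional)] / σu²(null)

With the conditional variance of 2.107 and PRE = 0.31, the implied null-model variance is about 2.107 / (1 − 0.31) ≈ 3.05.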

106
Random Effects
  • we can examine the distribution of random effects

107
Random Effects
  • we can examine the distribution of random effects

108
Random Effects Distribution
  • 90th percentile: u90 = 1.338
  • 10th percentile: u10 = 0.388
  • the risk for a family at the 90th percentile is
  • exp(1.338 − 0.388) = 2.586
  • times higher than for a family at the 10th
    percentile
  • even if families are compositionally identical on
    covariates, we can assess the hypothetical
    differential in risks

109
Growth Curve Models
  • growth models
  • individuals are level-2 units
  • repeated measures over time on individuals
    (level-1)
  • models imply that logits vary across individuals
  • intercept (conditional average logit) varies
  • slope (conditional average effect of time) varies
  • change is usually assumed to be linear
  • use GLMM
  • complications due to dimensionality
  • intercept and slope may co-vary (necessitating a
    more complex model) and more

110
Growth Curve Models
  • multilevel logit model for change over time
  • T is time (strictly increasing)
  • fixed and random coefficients (with covariates)
  • assume that u0 and u1 are bivariate normal
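A standard way to write the unconditional growth model, with individual i and occasion t:

    logit(pit) = (β0 + u0i) + (β1 + u1i)Tit

where (u0i, u1i) is bivariate normal with mean 0 and covariance matrix Σ.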

111
Multilevel Logit Models for Change
  • Example: log odds of employment of black men in
    the U.S., 1982-1988 (NLSY)
  • (consider 5 years in this period)
  • time is coded 0, 1, 3, 4, 6
  • dependent variable is not-working, not-in-school
  • unconditional growth (no covariates except T)
  • conditional growth (add covariates)
  • note cross-level interactions implied by
    composite model

112
Fitting Multilevel Model for Change
  • programming
  • Stata (unconditional growth)
  • Stata (conditional growth)

xtmelogit y year || id: year, var cov(un)
xtmelogit y year south unem unemyr inc hs || id: year, var cov(un)
113
Fitting Multilevel Model for Change
114
Fitting Multilevel Logit Model for Change
115
Logits Observed, Conditional, and Marginal
the log odds of idleness decreases with time and
shows variation in level and change
116
Composite Residuals in a Growth Model
  • composite residual
  • composite residual variance
  • covariance of composite residual
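For the random intercept-and-slope model above, the composite residual at time T is u0i + u1i T (plus the level-1 term), so its variance is:

    Var = σ0² + 2σ01 T + σ1² T²

and the covariance between occasions T and T′ within the same individual is σ0² + σ01(T + T′) + σ1² T T′.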

117
Model
  • covariance term is 0 (from either model)
  • results in simplified interpretation
  • easier estimation via variance components
    (default option)
  • significant variation in slopes and initial
    levels
  • other results
  • log odds of idleness decrease over time (negative
    slope)
  • other covariates except county unemployment have
    significant effects on the odds of idleness
  • the main effects are interpreted as effects on
    initial logits at time t = 0 (the 1982 baseline)
  • the interaction of time and unemployment rate
    captures the effect of the 1982 county
    unemployment rate on the change in log odds of
    idleness
  • the positive effect implies that higher county
    unemployment tends to dampen the change in odds

118
IRT Models
  • IRT models
  • Item Response Theory
  • models account for an individual-level random
    effect on a set of items (i.e., ability)
  • items are assumed to tap a single latent
    construct (aptitude on a specific subject)
  • item difficulty
  • test items are assumed to be ordered on a
    difficulty scale: easier → harder
  • expected patterns emerge whereby if a more
    difficult item is answered correctly, the easier
    items are likely to have been answered correctly

119
IRT Models
  • IRT models
  • 1-parameter logistic (Rasch) model
  • pij = individual i's probability of a correct
    response on the jth item
  • θi = individual i's ability
  • bj = item j's difficulty
  • properties
  • an individual's ability parameter is invariant
    with respect to the item
  • the difficulty parameter is invariant with
    respect to individuals' abilities
  • higher ability or lower item difficulty leads to a
    higher probability of a correct response
  • both ability and difficulty are measured on the
    same scale
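The Rasch model in its standard form:

    logit(pij) = θi − bj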

120
ICC
  • item characteristic curve (item response curve)
  • depicts the probability of a correct response as
    a function of an examinee's ability or trait
    level
  • curves are shifted rightward with increasing item
    difficulty
  • assume that item 3 is more difficult than item 2
    and item 2 is more difficult than item 1
  • the probability of a correct response decreases as
    the threshold θ = bj is crossed, reflecting
    increasing item difficulty

121
IRT Models ICC (3 Items)
slopes of the item characteristic curves are equal
when ability = item difficulty
122
Estimation as GLMM
  • specification
  • set up a person-item data structure
  • define x as a set of dummy variables
  • change signs on ß to reflect difficulty
  • fit model without intercept to estimate all item
    difficulties
  • normalization is common

123
PL1 Estimation
  • Stata (data set up )

clear
set memory 128m
infile junk y1-y5 f using LSAT.dat
drop if junk==11 | junk==13
expand f
drop f junk
gen cons = 1
collapse (sum) wt2=cons, by(y1-y5)
gen id = _n
sort id
reshape long y, i(id) j(item)
124
PL1 Estimation
  • Stata (model set up )

gen i1 = 0
gen i2 = 0
gen i3 = 0
gen i4 = 0
gen i5 = 0
replace i1 = 1 if item==1
replace i2 = 1 if item==2
replace i3 = 1 if item==3
replace i4 = 1 if item==4
replace i5 = 1 if item==5
* 1PL: constrain sd = 1
constraint define 1 [id1_1]_cons = 1
gllamm y i1-i5, i(id) weight(wt) nocons family(binom) cons(1) link(logit) adapt
125
PL1 Estimation
  • Stata (output )

126
PL1 Estimation
  • Stata (parameter normalization)

* normalized solution: 1 -- standard 1PL; 2 -- coefs sum to 0, var = 1
mata:
bALL = st_matrix("e(b)")
b = -bALL[1,1..5]
mb = mean(b')
bs = b :- mb
("MML Estimates", "IRT parameters", "B-A Normalization")
(-b', b', bs')
end
127
PL1 Estimation
  • Stata (normalized solution)

128
IRT Extensions
  • 2-parameter logistic (2PL) model

item discrimination parameters
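The 2PL model in its standard form, where aj is the discrimination parameter for item j:

    logit(pij) = aj(θi − bj)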
129
IRT Extensions
  • 2-parameter logistic (2PL) model
  • item discrimination parameters
  • reveal differences in items' utility for
    distinguishing different ability levels among
    examinees
  • high values denote items that are more useful in
    terms of separating examinees into different
    ability levels
  • low values denote items that are less useful in
    distinguishing examinees in terms of ability
  • ICCs corresponding to this model can intersect as
    they differ in location and slope
  • steeper slope of the ICC is associated with a
    better discriminating item

130
IRT Extensions
  • 2-parameter logistic (2PL) model

131
IRT Extensions
  • 2-parameter logistic (2PL) model
  • Stata (estimation)

eq id: i1 i2 i3 i4 i5
constraint define 1 [id1_1]i1 = 1
gllamm y i1-i5, i(id) weight(wt) nocons family(binom) link(logit) ///
    frload(1) eqs(id) cons(1) adapt
matrix list e(b)
* normalized solutions: 1 -- standard 2PL
mata:
bALL = st_matrix("e(b)")
b = bALL[1,1..5]
c = bALL[1,6..10]
a = -b :/ c
("MML Estimates-Dif", "IRT Parameters")
(b', a')
("MML Discrimination Parameters")
(c')
end
132
IRT Extensions
  • 2-parameter logistic (2PL) model
  • Stata (estimation)

* Bock and Aitkin normalization (p. 164, corrected)
mata:
bALL = st_matrix("e(b)")
b = -bALL[1,1..5]
c = bALL[1,6..10]
lc = ln(c)
mb = mean(b')
mc = mean(lc')
bs = b :- mb
cs = exp(lc :- mc)
("B-A Normalization DIFFICULTY", "B-A Normalization DISCRIMINATION")
(bs', cs')
end
133
IRT 2PL (1)
134
IRT 2PL (2) Bock-Aitkin Normalization
item 3 has highest difficulty and greatest
discrimination
135
1PL and 2PL
136
1PL and 2PL
137
Binary Response Models for Event Occurrence
  • discrete-time event-history models
  • purpose
  • model the probability of an event occurring at
    some point in time
  • Pr(event at t | event has not yet occurred by t)
  • life table
  • events / trials
  • observe the number of events occurring to those
    who remain at risk as time passes
  • takes account of the changing composition of the
    sample as time passes

138
Life Table
139
Life Table
  • observe
  • Rj = number at risk in time interval j (R0 = n),
    where the number at risk in interval j is
    adjusted over time
  • Dj = events in time interval j (D0 = 0)
  • Wj = number removed from risk (censored) in time
    interval j (W0 = 0)
  • (removed from risk due to other unrelated
    causes)

140
Life Table
  • other key quantities
  • discrete-time hazard (event probability in
    interval j)
  • surviving fraction (survivor function in interval
    j)
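In standard life-table notation (ignoring any within-interval censoring adjustment):

    hj = Dj / Rj                  (discrete-time hazard)
    Sj = ∏(k ≤ j) (1 − hk)        (surviving fraction)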

141
Discrete-Time Hazard Models
  • statistical concepts
  • discrete random variable Ti (individual i's event
    or censoring time)
  • pdf of T (probability that individual i
    experiences event in period j)
  • cdf of T (probability that individual i
    experiences event in period j or earlier)
  • survivor function (probability that individual i
    survives past period j)

142
Discrete-Time Hazard Models
  • statistical concepts
  • discrete hazard
  • the conditional probability of event occurrence
    in interval j for individual i given that an
    event has not already occurred to that individual
    by interval j
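In symbols:

    hij = Pr(Ti = j | Ti ≥ j)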

143
Discrete-Time Hazard Models
  • equivalent expression using binary data
  • binary data: dij = 1 if individual i experiences
    an event in interval j, 0 otherwise
  • use the sequence of binary values at each
    interval to form a history of the process for
    individual i up to the time the event occurs
  • discrete hazard

144
Discrete-Time Hazard Models
  • modeling (logit link)
  • modeling (complementary log log link)
  • non-proportional effects
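In standard form, with time-interval intercepts αj (the baseline hazard) and covariates xij:

    logit(hij) = αj + xij′β
    log[−log(1 − hij)] = αj + xij′β      (cloglog link)

Non-proportional effects can be allowed by interacting covariates with the time dummies.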

145
Data Structure
  • person-level data → person-period form

146
Data Structure
  • binary sequences

147
Estimation
  • contributions to likelihood
  • contribution to log L for individual with event
    in period j
  • contribution to log L for individual censored in
    period j
  • combine
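In standard form, using the binary indicators dij defined earlier:

    event in period j:     log Li = log hij + Σ(k < j) log(1 − hik)
    censored in period j:  log Li = Σ(k ≤ j) log(1 − hik)
    combined:              log L = Σi Σk [dik log hik + (1 − dik) log(1 − hik)]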

148
Example
  • dropping out of Ph.D. programs (large US
    university)
  • data: 6,964 individual histories spanning 20
    years
  • dropout cannot be distinguished from other types
    of leaving (transfer to another program, etc.)
  • model the logit hazard of leaving the
    originally-entered program as a function of the
    following
  • time in program (the time-dependent baseline
    hazard)
  • female and percent female in program
  • race/ethnicity (black, Hispanic, Asian)
  • marital status
  • GRE score
  • also add a program-specific random effect
    (multilevel)

149
Example
150
Example
151
Example
clear
set memory 512m
infile CID devnt I1-I5 female pctfem black hisp asian married gre using DT28432.dat
logit devnt I1-I5, nocons or
est store m1
logit devnt I1-I5 female pctfem, nocons or
est store m2
logit devnt I1-I5 female pctfem black hisp asian, nocons or
est store m3
logit devnt I1-I5 female pctfem black hisp asian married, nocons or
est store m4
logit devnt I1-I5 female pctfem black hisp asian married gre, nocons or