Title: Categorical Data Analysis
1. Categorical Data Analysis
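2. Binary Response Models
- binary and binomial responses
  - binary y takes the values 0 or 1
  - binomial y is the number of successes in n trials
- distributions
  - Bernoulli: $\Pr(y) = p^y (1-p)^{1-y}$, $y \in \{0, 1\}$
  - Binomial: $\Pr(y) = \binom{n}{y} p^y (1-p)^{n-y}$, $y = 0, 1, \dots, n$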
3. Transformational Approach
- linear probability model
  - use grouped data (events/trials)
  - identity link: $p_i = \eta_i$
  - linear predictor: $\eta_i = \alpha + \beta x_i$
  - problem: predicted probabilities can fall outside [0, 1]
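To make the out-of-range problem concrete, here is a minimal Python sketch that fits a linear probability model by ordinary least squares to five hypothetical grouped proportions (they are not from the slides) and then predicts just outside the observed range of x.

```python
# Linear probability model: p = a + b*x fitted by OLS (identity link).
# The grouped proportions below are hypothetical, for illustration only.
x = [0, 1, 2, 3, 4]
prop = [0.10, 0.30, 0.50, 0.70, 0.90]

n = len(x)
x_bar = sum(x) / n
p_bar = sum(prop) / n
# OLS slope and intercept
b = sum((xi - x_bar) * (pi - p_bar) for xi, pi in zip(x, prop)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = p_bar - b * x_bar

def lpm_predict(xv):
    """Use the linear predictor directly as a probability."""
    return a + b * xv

for xv in (-1, 2, 5):
    print(xv, round(lpm_predict(xv), 3))
```

With these data the fitted line is p = 0.1 + 0.2x, so the "probability" is negative at x = -1 and exceeds 1 at x = 5.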
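4. The Logit Model
- logit transformation: $\operatorname{logit}(p_i) = \log\left(\frac{p_i}{1-p_i}\right) = \alpha + \beta x_i$
- inverse logit: $p_i = \frac{\exp(\alpha + \beta x_i)}{1 + \exp(\alpha + \beta x_i)}$
- ensures that p lies in (0, 1) for all values of x and β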
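A minimal Python sketch of the two transformations, assuming hypothetical coefficients α = -1 and β = 0.5. It shows that the inverse logit keeps p strictly inside (0, 1) even for extreme x, and that the two functions undo each other.

```python
import math

def logit(p):
    """Log odds: maps a probability in (0, 1) onto the real line."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse logit: maps any real linear predictor back into (0, 1)."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)  # this branch avoids overflow for large negative eta
    return z / (1.0 + z)

alpha, beta = -1.0, 0.5  # hypothetical coefficients
for x in (-50, 0, 2, 50):
    print(x, inv_logit(alpha + beta * x))
```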
5. The Logit Model
- odds and odds ratios are the key to understanding and interpreting this model
- the log-odds transformation is a stretching transformation that maps probabilities in (0, 1) onto the real line
6. Odds and Probabilities
7. Probabilities and Log Odds
8. The Logit Transformation
[Figure: the logit transformation; the log odds are linear in x]
9. Odds, Odds Ratios, and Relative Risk
- the odds of success is the ratio $p/(1-p)$
- consider two groups with success probabilities $p_1$ and $p_2$
- the odds ratio (OR) measures the odds of success in group 1 relative to group 2: $\mathrm{OR} = \frac{p_1/(1-p_1)}{p_2/(1-p_2)}$
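10. Odds Ratio
[Table: 2 × 2 cross-classification of X (rows 0, 1) by Y (columns 0, 1)]
- 2 × 2 table
- the OR is the cross-product ratio (comparing the x = 1 group with the x = 0 group): $\mathrm{OR} = \frac{n_{11} n_{00}}{n_{10} n_{01}}$, where $n_{xy}$ is the count in row x, column y
- the odds of y = 1 are about 4.4 times (1/0.225) higher when x = 1 than when x = 0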
11. Odds Ratio
- equivalent interpretation (reversing the comparison)
- the odds of y = 1 when x = 0 are 0.225 times the odds when x = 1
- the odds of y = 1 are lower by a proportion of 1 - 0.225 = 0.775 when x = 0 than when x = 1
- that is, the odds of y = 1 are 77.5% lower when x = 0 than when x = 1
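The cross-product ratio and its reciprocal can be checked with a short Python sketch. The 2 × 2 counts are hypothetical (the slide's table is not reproduced here); they were chosen so that the reversed odds ratio equals the 0.225 quoted above.

```python
# Hypothetical 2 x 2 counts, chosen so the reversed OR is 0.225.
n00, n01 = 40, 10   # x = 0: y = 0, y = 1
n10, n11 = 9, 10    # x = 1: y = 0, y = 1

odds_x0 = n01 / n00                  # odds of y = 1 when x = 0
odds_x1 = n11 / n10                  # odds of y = 1 when x = 1
or_1v0 = (n11 * n00) / (n10 * n01)   # cross-product ratio, x = 1 vs x = 0
or_0v1 = 1 / or_1v0                  # the same comparison, reversed
pct_lower = 100 * (1 - or_0v1)       # percent lower odds when x = 0

print(round(or_1v0, 3), round(or_0v1, 3), round(pct_lower, 1))
```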
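12. Log Odds Ratios
- consider the model $\operatorname{logit}(p_i) = \alpha + \beta D_i$
- D is a dummy variable coded 1 for group 1 and 0 otherwise
  - group 1: $\operatorname{logit}(p_1) = \alpha + \beta$
  - group 2: $\operatorname{logit}(p_2) = \alpha$
- LOR $= \operatorname{logit}(p_1) - \operatorname{logit}(p_2) = \beta$, so OR $= \exp(\beta)$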
13. Relative Risk
- similar to the OR, but works with rates (probabilities) rather than odds
- the relative risk or rate ratio (RR) is the rate in group 1 relative to group 2: $\mathrm{RR} = p_1/p_2$
- OR → RR as $p_1, p_2 \to 0$ (rare events)
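A quick Python illustration of OR → RR for rare events. The probabilities are hypothetical; in every row the true rate ratio is 2, and only the baseline probability shrinks.

```python
def odds(p):
    return p / (1 - p)

results = []
for p2 in (0.25, 0.10, 0.01, 0.001):   # hypothetical baseline probabilities
    p1 = 2 * p2                        # group 1 has twice the rate
    rr = p1 / p2
    or_ = odds(p1) / odds(p2)
    results.append((p2, rr, or_))
    print(f"p2={p2:<6} RR={rr:.3f} OR={or_:.3f}")
```

The OR is 3 when p2 = 0.25 but is within about 0.1% of the RR once p2 = 0.001.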
14. Tutorial: odds and odds ratios
- consider the following data
15. Tutorial: odds and odds ratios
clear
input educ psex f
0 0 873
0 1 1190
1 0 533
1 1 1208
end
label define edlev 0 "HS or less" 1 "Col or more"
label val educ edlev
label var educ "education"
16. Tutorial: odds and odds ratios
- compute odds
- verify by hand
tabodds psex educ [fw=f]
17. Tutorial: odds and odds ratios
- compute odds ratios
- verify by hand
tabodds psex educ [fw=f], or
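The "verify by hand" step can be sketched in Python with the tutorial counts. The numbers should agree with what tabodds reports, up to rounding.

```python
# Tutorial counts: (educ, psex) -> frequency f
f = {(0, 0): 873, (0, 1): 1190, (1, 0): 533, (1, 1): 1208}

odds_hs = f[(0, 1)] / f[(0, 0)]     # odds of psex = 1, HS or less
odds_col = f[(1, 1)] / f[(1, 0)]    # odds of psex = 1, college or more
or_col_vs_hs = odds_col / odds_hs   # odds ratio, college vs HS or less

print(round(odds_hs, 4), round(odds_col, 4), round(or_col_vs_hs, 4))
```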
18. Tutorial: odds and odds ratios
- stat facts
- variances of functions
  - used in statistical significance tests and in forming confidence intervals
- basic rule for variances of linear transformations
  - if $g(x) = a + bx$ is a linear function of x, then $\operatorname{Var}[g(x)] = b^2 \operatorname{Var}(x)$
  - this is a trivial case of the delta method applied to a single variable
- the delta method for the variance of a nonlinear function g(x) of a single variable is $\operatorname{Var}[g(x)] \approx [g'(\mu)]^2 \operatorname{Var}(x)$, where $\mu = E(x)$
19. Tutorial: odds and odds ratios
- stat facts
- variances of odds and odds ratios
  - we can use the delta method to find the variance of the odds and the odds ratios
  - from the asymptotic (large-sample) perspective it is best to work with log odds and log odds ratios
  - the log odds ratio converges to normality faster than the odds ratio, so statistical tests may be more appropriate on log odds ratios (nonlinear functions of p)
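Applied to the log odds of a single group, the delta method gives a simple variance. The derivation below assumes $\hat p = y/n$ from n independent Bernoulli trials.

```latex
\hat p = \frac{y}{n}, \qquad
\operatorname{Var}(\hat p) = \frac{p(1-p)}{n}, \qquad
g(p) = \log\frac{p}{1-p}, \qquad
g'(p) = \frac{1}{p(1-p)}

\operatorname{Var}[g(\hat p)]
  \approx [g'(p)]^2 \operatorname{Var}(\hat p)
  = \frac{1}{p^2(1-p)^2}\cdot\frac{p(1-p)}{n}
  = \frac{1}{np(1-p)}
  = \frac{1}{np} + \frac{1}{n(1-p)}
```

Replacing p by $\hat p$ gives the estimate $1/y + 1/(n-y)$: one over the number of successes plus one over the number of failures.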
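20. Tutorial: odds and odds ratios
- stat facts
  - the log odds ratio is the difference in the log odds for two groups
  - the groups are independent
  - the variance of a difference of independent quantities is the sum of their variances
- $\operatorname{Var}(\widehat{\mathrm{LOR}}) \approx \frac{1}{n_1 p_1 (1-p_1)} + \frac{1}{n_2 p_2 (1-p_2)}$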
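Applying these facts to the tutorial data: each group's log odds has estimated variance 1/(successes) + 1/(failures), so the variance of the log odds ratio is the sum of the reciprocals of the four cell counts. This Python sketch computes the OR, its standard error on the log scale, and a 95% Wald interval (z = 1.96) back-transformed to the OR scale. It is an illustration, not a reproduction of any specific Stata output.

```python
import math

# Tutorial counts: a, b = HS or less (psex = 0, 1); c, d = college (psex = 0, 1)
a, b, c, d = 873, 1190, 533, 1208

# Log odds ratio: college log odds minus HS log odds
log_or = math.log((d / c) / (b / a))
# Independent groups, so the two log-odds variances add
var_log_or = 1 / a + 1 / b + 1 / c + 1 / d
se = math.sqrt(var_log_or)

z = 1.96  # standard normal quantile for a 95% interval
lo, hi = math.exp(log_or - z * se), math.exp(log_or + z * se)
print(round(math.exp(log_or), 4), round(se, 4), (round(lo, 3), round(hi, 3)))
```

Building the interval on the log scale and exponentiating keeps it asymmetric around the OR, in line with the advice to work with log odds ratios.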
21. Tutorial: odds and odds ratios
- data structures: grouped or individual level
- note
  - use frequency weights to handle grouped data
  - or expand the data by the frequency weights, resulting in individual-level data
  - model results from either data structure are the same
- expand the data and verify the following results
expand f
22. Tutorial: odds and odds ratios
- statistical modeling
  - logit model (glm)
  - logit model (logit)
glm psex educ [fw=f], f(b) eform
logit psex educ [fw=f], or
23. Tutorial: odds and odds ratios
- statistical modeling (1)
  - logit model (glm)
24. Tutorial: odds and odds ratios
- statistical modeling (2)
  - some ideas from alternative normalizations
  - what parameters will this model produce?
  - what is the interpretation of the constant?
gen cons = 1
glm psex cons educ [fw=f], nocons f(b) eform
25. Tutorial: odds and odds ratios
26. Tutorial: odds and odds ratios
- statistical modeling (3)
  - what parameters does this model produce?
  - how do you interpret them?
gen lowed = (educ == 0)
gen hied = (educ == 1)
glm psex lowed hied [fw=f], nocons f(b) eform
27. Tutorial: odds and odds ratios
Are these odds ratios?
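A Python sketch that answers the question for the no-constant, one-dummy-per-group model. In this saturated (cell-means) parameterization, each ML coefficient equals that group's sample log odds, so its exponential is an odds, not an odds ratio. The OR is the ratio of the two exponentiated coefficients.

```python
import math

# Tutorial counts by education group: (psex = 1, psex = 0)
groups = {"lowed": (1190, 873), "hied": (1208, 533)}

# Each coefficient is the group's sample log odds
coef = {g: math.log(s / f) for g, (s, f) in groups.items()}
odds = {g: math.exp(v) for g, v in coef.items()}

# The odds ratio is recovered as a ratio of exponentiated coefficients
or_hied_vs_lowed = odds["hied"] / odds["lowed"]
print({g: round(v, 4) for g, v in coef.items()}, round(or_hied_vs_lowed, 4))
```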
28. Tutorial: prediction
- fitted probabilities (after the most recent model)
predict p, mu
tab educ [fw=f], sum(p) nostandard nofreq
29. Probit Model
- the inverse probit is the CDF of a standard normal variable: $p_i = \Phi(\eta_i)$
- link function: $\Phi^{-1}(p_i) = \eta_i = \alpha + \beta x_i$
30. Probit Transformation
31. Interpretation
- probit coefficients
  - interpreted on the standard normal (z) scale (no log-odds-ratio interpretation)
  - scaled versions of logit coefficients
- probit models
  - more common in certain disciplines (economics)
  - analogy with linear regression (normal latent variable)
  - more easily extended to multivariate distributions
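One way to see that probit coefficients are rescaled logit coefficients is to compare the two CDFs directly. This Python sketch measures the largest gap between Φ(z) and a logistic CDF evaluated at k·z over a grid. k = 1.702 is a commonly quoted constant, and π/√3 ≈ 1.81 matches the two variances. Both are rules of thumb, not estimates from the slides' data.

```python
import math
from statistics import NormalDist

phi = NormalDist().cdf   # standard normal CDF (the inverse probit link)

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

grid = [i / 100 for i in range(-600, 601)]   # z from -6 to 6
scales = {"1.702": 1.702, "pi/sqrt(3)": math.pi / math.sqrt(3)}
gaps = {}
for name, k in scales.items():
    # Largest absolute difference between Phi(z) and the rescaled logistic CDF
    gaps[name] = max(abs(phi(z) - inv_logit(k * z)) for z in grid)
    print(name, round(k, 3), round(gaps[name], 4))
```

Because Λ(k·z) ≈ Φ(z), a logit coefficient is roughly k times the corresponding probit coefficient.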
32. Example: Grouped Data
- Swedish mortality data revisited
  - logit model
  - probit model
33. Swedish Historical Mortality Data
34. Programming
- Stata generalized linear model (glm)
glm y A2 A3 P2, family(b n) link(probit)
glm y A2 A3 P2, family(b n) link(logit)
- the idea of GLM is to make the model linear in the link (on the link scale)
- in the old days: iteratively reweighted least squares (IRLS)
- now: Fisher scoring, Newton-Raphson
- both approaches yield MLEs
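A minimal IRLS sketch in Python for a two-parameter logit model fitted to grouped (events/trials) data. For the logit, which is the canonical link, IRLS coincides with Fisher scoring and Newton-Raphson. This sketch runs a fixed number of iterations with no convergence check or standard errors. Applied to the tutorial data, it should return the HS log odds as the intercept and the log odds ratio as the slope.

```python
import math

def irls_logit(x, events, trials, n_iter=25):
    """Fit logit(p) = b0 + b1*x to grouped binomial data by IRLS."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        # Accumulate X'WX (s00, s01, s11) and X'Wz (t0, t1)
        s00 = s01 = s11 = t0 = t1 = 0.0
        for xi, yi, ni in zip(x, events, trials):
            eta = b0 + b1 * xi
            p = 1.0 / (1.0 + math.exp(-eta))
            w = ni * p * (1.0 - p)          # working weight
            z = eta + (yi - ni * p) / w     # working response
            s00 += w
            s01 += w * xi
            s11 += w * xi * xi
            t0 += w * z
            t1 += w * xi * z
        # Weighted least squares step: solve the 2 x 2 normal equations
        det = s00 * s11 - s01 * s01
        b0 = (s11 * t0 - s01 * t1) / det
        b1 = (s00 * t1 - s01 * t0) / det
    return b0, b1

# Tutorial data: educ = 0, 1; events = count with psex = 1; trials = group totals
b0, b1 = irls_logit([0, 1], [1190, 1208], [2063, 1741])
print(round(b0, 4), round(b1, 4), round(math.exp(b1), 4))
```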
35. Generalized Linear Models
- apply to a broad class of models
- iterative fitting (repeated updating), except for the linear model
  - update the parameters, the weights W, and the predicted values μ
  - in IRLS form, each update is $\hat\beta^{(t+1)} = (X'WX)^{-1}X'Wz$, where z is the working response
- models differ in terms of W and μ and in their assumptions about the distribution of y
  - common distributions for y include normal, binomial, and Poisson
  - common links include identity, logit, probit, and log
36. Latent Variable Approach
- example: insect mortality
  - suppose a researcher exposes insects to dosage levels (u) of an insecticide and observes whether each subject lives or dies at that dosage
  - the response is expected to depend on the insect's tolerance (c) to that dosage level
  - the insect dies if u > c and survives if u < c
  - tolerance is not observed (survival is observed)
37. Latent Variables
- u and c are continuous latent variables
- examples
  - women's employment: u is the market wage and c is the reservation wage
  - migration: u is the benefit of moving and c is the cost of moving
- the observed outcome, y = 1 or y = 0, reveals the individual's preference, which is assumed to maximize a rational individual's utility function
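38. Latent Variables
- assume linear utility and criterion functions: $u_i = \alpha_u + x_i\beta_u + \varepsilon_{ui}$ and $c_i = \alpha_c + x_i\beta_c + \varepsilon_{ci}$, with $y_i = 1$ if $u_i > c_i$
- over-parameterization: identification problem
  - we can identify differences in components ($\alpha_u - \alpha_c$, $\beta_u - \beta_c$) but not the separate components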
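39. Latent Variables
- constraints: work only with the differences $\alpha = \alpha_u - \alpha_c$, $\beta = \beta_u - \beta_c$, and $e_i = \varepsilon_{ui} - \varepsilon_{ci}$
- then $\Pr(y_i = 1) = \Pr(u_i > c_i) = \Pr(e_i > -(\alpha + x_i\beta)) = F(\alpha + x_i\beta)$ when e is symmetric about 0
- where F(·) is the CDF of e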
40. Latent Variables and Standardization
- need to standardize the mean and variance of e
  - binary dependent variables lack inherent scales
  - the magnitude of β is meaningful only in reference to the mean and variance of e, which are unknown
- redefine e to a common standard: $e^* = (e - a)/b$
  - where a and b are two chosen constants
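41. Standardization for Logit and Probit Models
- standardization implies a standardized distribution for e
- F(·) is the CDF of e
- the location a and the scale b need to be fixed
- setting a and b determines which standardized distribution e follows (next slide)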
42. Standardization for Logit and Probit Models
- the distribution of e is standardized
  - standard normal → probit
  - standard logistic → logit
- both distributions have a mean of 0
- variances differ: 1 for the standard normal and $\pi^2/3 \approx 3.29$ for the standard logistic
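The logistic variance can be checked numerically. This Python sketch integrates $x^2$ times the standard logistic density with the trapezoidal rule on [-40, 40], where the tails are negligible, and compares the result with π²/3. The square root of the variance is the ratio of the logistic and standard normal standard deviations.

```python
import math

def logistic_pdf(x):
    # Standard logistic density e^{-x}/(1+e^{-x})^2, written as 1/(4 cosh^2(x/2))
    return 1.0 / (4.0 * math.cosh(x / 2.0) ** 2)

# Trapezoidal rule for E(e^2) on [-40, 40]; the mean is 0 by symmetry
n = 80_000
h = 80.0 / n
vals = [(-40 + i * h) ** 2 * logistic_pdf(-40 + i * h) for i in range(n + 1)]
var_logistic = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

sd_ratio = math.sqrt(var_logistic)   # logistic sd relative to the standard normal sd of 1
print(round(var_logistic, 6), round(math.pi ** 2 / 3, 6), round(sd_ratio, 4))
```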
43. Extending the Latent Variable Approach
- observed y is a dichotomous (binary) 0/1 variable
- continuous latent variable: $y_i^* = x_i\beta + e_i$ (linear predictor + residual)
- observed outcome: $y_i = 1$ if $y_i^* > 0$ and $y_i = 0$ otherwise
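44. Notation
- conditional means of the latent variables are obtained from the index function: $E(y_i^* \mid x_i) = x_i\beta$
- probabilities are obtained from the inverse link functions
  - logit model: $p_i = \frac{\exp(x_i\beta)}{1 + \exp(x_i\beta)}$
  - probit model: $p_i = \Phi(x_i\beta)$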
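45. ML
- likelihood function: $L = \prod_i \binom{n_i}{y_i} p_i^{y_i}(1-p_i)^{n_i - y_i}$
- where $n_i = 1$ if the data are binary
- log-likelihood function: $\log L = \sum_i \left[y_i \log p_i + (n_i - y_i)\log(1 - p_i)\right] + \text{constant}$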
46. Assessing Models
- definitions
  - $L_0$: likelihood of the null model (intercept only)
  - $L_f$: likelihood of the saturated model (a parameter for each cell)
  - $L_c$: likelihood of the current model
- grouped data (events/trials)
  - deviance (likelihood ratio statistic): $D = -2(\log L_c - \log L_f)$
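47. Deviance
- grouped data
  - if cell sizes are reasonably large, the deviance is distributed as chi-square (df = number of cells minus number of parameters)
- individual-level data: $L_f = 1$ and $\log L_f = 0$, so the deviance reduces to $-2\log L_c$
  - the deviance is then not a fit statistic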
48. Deviance
- the deviance is like a residual sum of squares
  - larger values indicate poorer models
  - larger models have smaller deviance
- $D_1$: deviance for the more constrained model (Model 1)
- $D_2$: deviance for the less constrained model (Model 2)
- assume that Model 1 is a constrained version of Model 2
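49. Difference in Deviance
- evaluate competing nested models using a likelihood ratio statistic: the deviance of Model 1 minus the deviance of Model 2, $D_1 - D_2$, is chi-square with df equal to the difference in the number of parameters
- the model chi-square is a special case (Model 1 is the null model)
- SAS, Stata, R, etc. report different statistics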
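A Python sketch of a deviance difference using the tutorial data. Model 1 is the null model (one common probability), and Model 2 adds educ. With one parameter per group, Model 2 reproduces the observed proportions, so here it coincides with the saturated model. The binomial constant is dropped because it cancels in the difference. 3.84 is the 5% critical value of a chi-square with 1 df.

```python
import math

def binom_loglik(events, trials, probs):
    """Binomial log-likelihood without the constant log C(n, y) terms."""
    return sum(
        y * math.log(p) + (n - y) * math.log(1 - p)
        for y, n, p in zip(events, trials, probs)
    )

events = [1190, 1208]   # psex = 1 in educ = 0 and educ = 1
trials = [2063, 1741]

# Model 1 (constrained): null model with one common probability
p_null = sum(events) / sum(trials)
ll_1 = binom_loglik(events, trials, [p_null, p_null])

# Model 2 (less constrained): educ model, which fits each group's proportion
ll_2 = binom_loglik(events, trials, [y / n for y, n in zip(events, trials)])

# D1 - D2 = 2(log L2 - log L1), with df = 2 - 1 = 1
lr_chi2 = 2 * (ll_2 - ll_1)
print(round(lr_chi2, 2), lr_chi2 > 3.84)
```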
50. Other Fit Statistics
- BIC and AIC (useful for non-nested models)
- basic idea of information criteria: penalize log L for the number of parameters (AIC/BIC) and/or the size of the sample (BIC): $\mathrm{IC} = -2\log L + 2s\,\mathrm{df}_m$
  - AIC: s = 1
  - BIC: s = ½ log n (n is the sample size)
  - $\mathrm{df}_m$ is the number of model parameters
51. Hypothesis Tests/Inference
- single parameter
  - MLEs are asymptotically normal → Z-test
- multi-parameter
  - likelihood ratio tests (after fitting both models)
  - Wald tests (test constraints using the current model)
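52. Hypothesis Tests/Inference
- Wald test (tests a vector of restrictions)
  - a set of r parameters are all equal to 0
  - a set of r parameters are linearly restricted: $H_0: R\beta = q$, tested with $W = (R\hat\beta - q)'\left[R\,\widehat{\operatorname{Var}}(\hat\beta)\,R'\right]^{-1}(R\hat\beta - q) \sim \chi^2_r$
  - R is the restriction matrix, q is the constraint vector, and β is the parameter subset being tested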
53. Interpreting Parameters
- odds ratios: consider the model $\operatorname{logit}(p_i) = \alpha + \beta x_i + \gamma d_i$, where x is a continuous predictor and d is a dummy variable
- suppose that d denotes sex and x denotes income, and the problem concerns voting, where y is the propensity to vote
- results: $\operatorname{logit}(\hat p_i) = -1.92 + 0.012 x_i + 0.67 d_i$
54. Interpreting Parameters
- for d (dummy variable coded 1 for female) the odds ratio is straightforward: $\exp(0.67) \approx 1.95$
- holding income constant, women's odds of voting are nearly twice those of men
55. Interpreting Parameters
- for x (continuous variable for income in thousands of dollars) the odds ratio is a multiplicative effect
- suppose we increase income by 1 unit (1,000 dollars): $\exp(0.012) \approx 1.012$
- suppose we increase income by c units (c × 1,000 dollars): $\exp(0.012c)$
56. Interpreting Parameters
- if income is increased by 10,000 dollars, the odds of voting increase by about 13%: $\exp(10 \times 0.012) = \exp(0.12) \approx 1.13$
- a note on percent change in odds
  - if $\hat\beta > 0$, the percent increase in odds for a unit change in x is $100[\exp(\hat\beta) - 1]$
  - if $\hat\beta < 0$, the percent decrease in odds for a unit change in x is $100[1 - \exp(\hat\beta)]$
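The voting-example odds ratios and percent changes can be reproduced with a short Python sketch. The helper returns a signed percent change, so a negative value corresponds to the "percent decrease" case.

```python
import math

b_income, b_female = 0.012, 0.67   # logit coefficients from the voting example

or_female = math.exp(b_female)     # women vs men, income held constant
or_1k = math.exp(b_income)         # +1 unit (1,000 dollars) of income
or_10k = math.exp(10 * b_income)   # +10 units: exp(c * beta) with c = 10

def pct_change_in_odds(beta, c=1):
    """Signed percent change in odds for a c-unit increase in x."""
    return 100 * (math.exp(c * beta) - 1)

print(round(or_female, 3), round(or_1k, 4), round(or_10k, 4))
print(round(pct_change_in_odds(b_income, 10), 1))
```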
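57. Marginal Effects
- marginal effect
  - effect of a change in x on the change in probability: $\frac{\partial p_i}{\partial x_{ik}} = f(x_i\beta)\,\beta_k$, where f is the pdf corresponding to the cdf F
  - for the logit, $f(x_i\beta) = p_i(1 - p_i)$; for the probit, $f(x_i\beta) = \phi(x_i\beta)$
- often we evaluate f(·) at the mean of x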
58. Marginal Effect for a Change in a Continuous Variable
59. Marginal Effect of a Change in a Dummy Variable
- if x is a continuous variable and z is a dummy variable, $\Pr(y_i = 1) = F(x_i\beta + \gamma z_i)$
- the marginal effect of a change in z from 0 to 1 is the difference $F(x_i\beta + \gamma) - F(x_i\beta)$
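A Python sketch of both kinds of marginal effect for the voting model above. The evaluation point, income = 50 (50,000 dollars) for a man, is hypothetical and chosen only for illustration; in practice one would often use the sample mean of x.

```python
import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

a, b_x, b_d = -1.92, 0.012, 0.67   # voting example: intercept, income, female
x0 = 50.0                          # hypothetical income (thousands of dollars)

# Continuous x: dp/dx = f(eta) * beta, with f(eta) = p(1 - p) for the logit
p_men = inv_logit(a + b_x * x0)
me_income = p_men * (1 - p_men) * b_x

# Dummy d: discrete difference F(eta + gamma) - F(eta)
p_women = inv_logit(a + b_x * x0 + b_d)
me_female = p_women - p_men

print(round(p_men, 4), round(me_income, 5), round(me_female, 4))
```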
60. Example
- logit models for high school graduation
  - odds ratios (constant is baseline odds)
61. LR Test
62. Wald Test
- test equality of parental education effects
logit hsg blk hsp female nonint inc nsibs mhs mcol fhs fcol wtest
test mhs = fhs
test mcol = fcol
- cannot reject H0 of equal parental education effects on HS graduation
63. Basic Estimation Commands (Stata)
* estimation commands (est store) and model tests (lrtest)
* model 0 - null model
qui logit hsg
est store m0
* model 1 - race, sex, family structure
qui logit hsg blk hsp female nonint
est store m1
* model 1a - race X family structure interactions
qui xi: logit hsg blk hsp female nonint i.nonint*i.blk i.nonint*i.hsp
est store m1a
lrtest m1 m1a
* model 2 - SES
qui xi: logit hsg blk hsp female nonint inc nsibs mhs mcol fhs fcol
est store m2
* model 3 - Indiv
qui xi: logit hsg blk hsp female nonint inc nsibs mhs mcol fhs fcol wtest
est store m3
lrtest m2 m3
64. Some 'hand' calculations with saved results
scalar ll = e(ll)
scalar npar = e(df_m) + 1
scalar nobs = e(N)
scalar AIC = -2*ll + 2*npar
scalar BIC = -2*ll + log(nobs)*npar
scalar list AIC
scalar list BIC
* or use the automated fitstat routine
fitstat
* output as a table
estout1 m0 m1 m2 m3 using modF07, replace star stfmt(%9.2f %9.0f %9.0f) ///
    stats(ll N df_m) eform
65. Analysis of Deviance
66. BIC and AIC (using fitstat)
67. Marginal Effects
68. Marginal Effects
69. Generate Income Quartiles
qui sum adjinc, det
* quartiles of the income distribution
gen incQ1 = adjinc <= r(p25)
gen incQ2 = adjinc > r(p25) & adjinc <= r(p50)
gen incQ3 = adjinc > r(p50) & adjinc <= r(p75)
* exclude missing adjinc, which Stata treats as larger than any number
gen incQ4 = adjinc > r(p75) & !missing(adjinc)
gen incQ = 1 if incQ1==1
replace incQ = 2 if incQ2==1
replace incQ = 3 if incQ3==1
replace incQ = 4 if incQ4==1
tab incQ
70. Fit Model for Each Quartile
* calculate predictions
* look at marginal effects of test score on graduation by selected groups
* (1) model (income quartiles)
local i = 1
while `i' < 5 {
    logit hsg blk female mhs nonint nsibs urban so wtest if incQ==`i'
    margeff
    cap drop wm*
    cap drop bm*
    prgen wtest, x(blk=0 female=0 mhs=1 nonint=0) gen(wmi) from(-3) to(3)
    prgen wtest, x(blk=0 female=0 mhs=1 nonint=1) gen(wmn) from(-3) to(3)
    label var wmip1 "white/intact"
    label var wmnp1 "white/nonintact"
    prgen wtest, x(blk=1 female=0 mhs=1 nonint=0) gen(bmi) from(-3) to(3)
    prgen wtest, x(blk=1 female=0 mhs=1 nonint=1) gen(bmn) from(-3) to(3)
    label var bmip1 "black/intact"
    label var bmnp1 "black/nonintact"
71. Graph
    set scheme s2mono
    twoway (line wmip1 wmix, sort xtitle("Test Score") ytitle("Pr(y=1)")) ///
        (line wmnp1 wmix, sort) (line bmip1 wmix, sort) (line bmnp1 wmix, sort), ///
        subtitle("Marginal Effect of Test Score on High School Graduation" ///
        "Income Quartile `i'") saving(wtgrph`i', replace)
    graph export wtgrph`i'.eps, as(eps) replace
    local i = `i' + 1
}
72. Fitted Probabilities
logit hsg blk female mhs nonint inc nsibs urban so wtest
prtab nonint blk female
73. Fitted Probabilities
- predicted values
  - evaluate fitted probabilities at the sample mean values of x (or other fixed quantities)
  - averaging fitted probabilities over subgroup-specific models will produce marginal probabilities
74. Observed and Fitted Probabilities
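75. Alternative Probability Model
- complementary log-log (cloglog or CLL)
  - standard extreme-value distribution for u: $F(u) = 1 - \exp[-\exp(u)]$
  - cloglog model: $p_i = 1 - \exp[-\exp(x_i\beta)]$
  - cloglog link function: $\log[-\log(1 - p_i)] = x_i\beta$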
76. Extreme-Value Distribution
- properties
  - mean of u: $-\gamma \approx -0.5772$, where γ is Euler's constant
  - variance of u: $\pi^2/6$
  - the difference of two independent extreme-value variables is a logistic variable
77. CLL Transformation
78. CLL Model
- often little practical difference from logit and probit models
- often suited to survival data and other applications
- interpretation of coefficients
  - exp(β) is a relative risk or hazard ratio, not an OR
- glm: binomial distribution for y with a cloglog link
- cloglog: use the cloglog command directly
79. CLL and Logit Model Compared
80. Cloglog and Logit Model Compared
- logit
- cloglog
- more agreement when modeling rare events
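Why the two models agree for rare events: for very negative η, both inverse links are approximately exp(η). This Python sketch evaluates both at the same linear predictor (for illustration only, since fitted coefficients differ between the models) and prints their ratio.

```python
import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def inv_cloglog(eta):
    """Inverse cloglog link: p = 1 - exp(-exp(eta))."""
    return -math.expm1(-math.exp(eta))   # expm1 keeps precision for small p

rows = []
for eta in (-6.0, -4.0, -2.0, 0.0, 1.0):
    p_logit, p_cll = inv_logit(eta), inv_cloglog(eta)
    rows.append((eta, p_logit, p_cll))
    print(f"eta={eta:5.1f}  logit={p_logit:.4f}  cloglog={p_cll:.4f}  ratio={p_cll / p_logit:.3f}")
```

The ratio is close to 1 when p is small and grows as p approaches 0.5 and beyond, where the asymmetric cloglog curve departs from the symmetric logit.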
81. Extensions: Multilevel Data
- what is multilevel data?
  - individuals are nested in a larger context
  - children in families, kids in schools, etc.
[Figure: individuals nested in context 1, context 2, and context 3]
82. Multilevel Data
- i.i.d. assumptions?
  - the outcomes for units in a given context could be associated
  - a standard model would treat all outcomes (regardless of context) as independent
  - multilevel methods account for the within-cluster dependence
- a general problem with binomial responses
  - we assume that trials are independent
  - this might not be realistic
  - non-independence will inflate the variance (overdispersion)
83. Multilevel Data
- example (in book)
  - 40 universities as units of analysis
  - for each university we observe the number of graduates (n) and the number receiving post-doctoral fellowships (y)
  - we could compute proportions (MLEs)
  - some proportions would be better estimates because they have higher precision (lower variance)
  - example: $y_1/n_1 = 2/5$ and $y_2/n_2 = 20/50$ give identical estimates of p (0.4) but variances, $\hat p(1 - \hat p)/n$, of 0.048 and 0.0048, respectively
  - the 2nd estimate is more precise than the 1st
84. Multilevel Data
- multilevel models allow for improved predictions of individual probabilities
  - the MLE is essentially unaltered if it is precise
  - the MLE is moved toward the average if it is imprecise (shrinkage)
  - the multilevel estimate of p is a weighted average of the MLE and the average over all MLEs (the weight w is based on the variance of each MLE and the variance over all the MLEs)
- we are generally less interested in the p's and more interested in the model parameters and variance components
85. Shrinkage Estimation
- primitive approach
  - assume we have a set of estimates (MLEs) $\hat p_i$, $i = 1, \dots, N$
  - our best estimate of the variance of each MLE is $v_i = \hat p_i(1 - \hat p_i)/n_i$
  - this is the within variance (no pooling)
  - if this is large, then the MLE is a poor estimate
  - a better estimate might be the average of the MLEs in this case (pooling the estimates)
  - we can average the MLEs, $\bar p = \frac{1}{N}\sum_i \hat p_i$, and estimate the between variance as $\hat\tau^2 = \frac{1}{N-1}\sum_i (\hat p_i - \bar p)^2$
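86. Shrinkage Estimation
- primitive approach
  - we can then estimate a weight $w_i = \hat\tau^2/(\hat\tau^2 + v_i)$, where $v_i$ is the within variance of $\hat p_i$ and $\hat\tau^2$ is the between variance
  - a revised estimate of $p_i$ takes account of the precision to form a precision-weighted average: $\tilde p_i = w_i \hat p_i + (1 - w_i)\bar p$, where $\bar p$ is the average of the MLEs
  - precision is a function of $n_i$
  - more weight is given to more precise MLEs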
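The primitive approach can be sketched in a few lines of Python. Assumptions: each MLE's within variance is p̂(1 - p̂)/n; the between variance is the sample variance of the MLEs around their unweighted average; the weight is w_i = τ²/(τ² + v_i); and the revised estimate is w_i·p̂_i + (1 - w_i)·p̄. The first two (y, n) pairs reproduce the 2/5 and 20/50 example, and the other three are hypothetical.

```python
# (y_i, n_i): the first two reproduce 2/5 and 20/50; the rest are hypothetical
data = [(2, 5), (20, 50), (1, 10), (9, 30), (15, 40)]

p_hat = [y / n for y, n in data]                           # MLEs (no pooling)
v = [p * (1 - p) / n for p, (_, n) in zip(p_hat, data)]    # within variances
N = len(data)
p_bar = sum(p_hat) / N                                     # pooled average of the MLEs
tau2 = sum((p - p_bar) ** 2 for p in p_hat) / (N - 1)      # between variance

# Precision weights: close to 1 when v_i is small relative to tau2
w = [tau2 / (tau2 + vi) for vi in v]
p_shrunk = [wi * p + (1 - wi) * p_bar for wi, p in zip(w, p_hat)]

for (y, n), p, wi, ps in zip(data, p_hat, w, p_shrunk):
    print(f"{y:>2}/{n:<3} mle={p:.3f} w={wi:.3f} shrunk={ps:.3f}")
```

The 2/5 estimate gets a small weight and is pulled well toward the average. The 20/50 estimate keeps most of its weight and barely moves.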
87. Shrinkage: A Primitive Approach
88. Shrinkage
Results from a full Bayesian (multilevel) analysis
89. Extension: Multilevel Models
- assumptions
  - within-context and between-context variation in outcomes
  - individuals within the same context share the same random error specific to that context
- models are hierarchical
  - individuals (level-1)
  - contexts (level-2)
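90. Multilevel Models: Background
- linear mixed model for continuous y
  - (multilevel, random coefficients, etc.)
- level-1 model and level-2 sub-models (hierarchical)
  - level 1: $y_{ij} = \beta_{0i} + \beta_{1i} z_{ij} + e_{ij}$
  - level 2: $\beta_{0i} = \gamma_{00} + \gamma_{01} x_i + u_{0i}$ and $\beta_{1i} = \gamma_{10} + \gamma_{11} x_i + u_{1i}$
  - z is a level-1 variable and x is a level-2 variable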
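91. Multilevel Models: Background
- linear mixed model assumptions
  - level-1 and level-2 residuals: $e_{ij} \sim N(0, \sigma_e^2)$; $(u_{0i}, u_{1i})$ bivariate normal with mean 0, independent of $e_{ij}$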
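92. Multilevel Models: Background
- composite model (z level-1, x level-2): $y_{ij} = \gamma_{00} + \gamma_{01} x_i + \gamma_{10} z_{ij} + \gamma_{11} x_i z_{ij} + (u_{0i} + u_{1i} z_{ij} + e_{ij})$
  - composite residual: $u_{0i} + u_{1i} z_{ij} + e_{ij}$
  - fixed effects: the γ's
  - cross-level interaction: $\gamma_{11} x_i z_{ij}$
  - random effects (level-2): $u_{0i}$, $u_{1i}$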
93. Multilevel Models: Background
94. Multilevel Models: Background
- general form (linear mixed model): $y = X\beta + Zu + e$
  - X: variables associated with fixed coefficients
  - Z: variables associated with random coefficients
95. Multilevel Models: Logit Models
- binomial model (random effect): $\operatorname{logit}(p_{ij}) = x_{ij}\beta + u_i$, $u_i \sim N(0, \sigma_u^2)$
- assumptions
  - u increases or decreases the expected response for individual j in context i independently of x
  - all individuals in context i share the same value of u
- also called a random intercept model
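96. Multilevel Models
- a hierarchical model: $\operatorname{logit}(p_{ij}) = \beta_{0i} + \beta_1 z_{ij}$, with $\beta_{0i} = \gamma_{00} + \gamma_{01} x_i + u_i$
  - z is a level-1 variable; x is a level-2 variable
  - the random intercept varies among level-2 units
  - note: the level-1 residual variance is fixed (why?)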
97. Multilevel Models
- a general expression: $\operatorname{logit}(p_{ij}) = x_{ij}\beta + z_{ij}u_i$
  - x are variables associated with fixed coefficients
  - z are variables associated with random coefficients
  - u is a multivariate normal vector of level-2 residuals
  - the mean of u is 0 and the covariance of u is Σ
98. Multilevel Models
- random effects vs. random coefficients
  - random effects: u
  - random coefficients: β + u
- variance components
  - interested in level-2 variation in u
- prediction
  - E(y) is not equal to E(y | u)
  - model-based predictions need to consider the random effects
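99. Multilevel Models: Generalized Linear Mixed Models (GLMM)
- conditional expectation: $E(y_{ij} \mid u_i) = \operatorname{logit}^{-1}(x_{ij}\beta + z_{ij}u_i)$
- marginal expectation: $E(y_{ij}) = \int \operatorname{logit}^{-1}(x_{ij}\beta + z_{ij}u)\, g(u)\, du$, where g is the density of u
  - requires numerical integration or simulation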
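The gap between conditional and marginal expectations can be seen numerically. This Python sketch integrates the inverse logit over a normal random intercept with a crude midpoint rule. That choice is an assumption made for transparency; real software uses adaptive quadrature or simulation. The fixed part η = -2 is hypothetical, and the random-intercept variance 2.107 is borrowed from the sibling example later in these slides purely for illustration.

```python
import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def marginal_prob(eta, sigma_u, n_points=4001, width=8.0):
    """E[inv_logit(eta + u)] for u ~ N(0, sigma_u^2), by the midpoint rule
    over u in [-width*sigma_u, width*sigma_u]."""
    lo = -width * sigma_u
    h = 2 * width * sigma_u / n_points
    total = 0.0
    for i in range(n_points):
        u = lo + (i + 0.5) * h
        dens = math.exp(-0.5 * (u / sigma_u) ** 2) / (sigma_u * math.sqrt(2 * math.pi))
        total += inv_logit(eta + u) * dens * h
    return total

eta = -2.0                    # hypothetical fixed part x*beta
sigma_u = math.sqrt(2.107)    # random-intercept sd (illustrative value)
p_cond = inv_logit(eta)       # conditional probability at u = 0
p_marg = marginal_prob(eta, sigma_u)
print(round(p_cond, 4), round(p_marg, 4))
```

The marginal probability lies closer to 0.5 than the conditional one, which is why model-based predictions need to account for the random effects.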
100. Data Structure
- multilevel data structure
  - requires a context id to identify individuals belonging to the same context
  - the NLSY sibling data contain a family id (constructed by the researcher)
  - data are unbalanced (we do not require clusters to be the same size)
  - small clusters contribute less information to the estimation of variance components than larger clusters
  - it is OK to have clusters of size 1 (i.e., an individual is a context unto themselves)
  - clusters of size 1 contribute to the estimation of fixed effects but not to the estimation of variance components
101. Example: clustered data
- siblings nested in families
  - y is 1st premarital birth for NLSY women
  - select sib-ships of size > 2
  - null model (random intercept)
xtlogit fpmbir, i(famid)
or
xtmelogit fpmbir || famid:
102. Example: clustered data
- random intercept: xtlogit
103. Example: clustered data
- random intercept: xtmelogit
104. Variance Component
- add predictors (mostly level-2)
105. Variance Component
- the conditional variance in u is 2.107
- proportionate reduction in error (PRE): (null-model variance in u minus conditional variance in u) / null-model variance in u
  - a 31% reduction in level-2 variance when level-2 predictors are accounted for
106. Random Effects
- we can examine the distribution of random effects
107. Random Effects
- we can examine the distribution of random effects
108. Random Effects Distribution
- 90th percentile: $u_{90} = 1.338$
- 10th percentile: $u_{10} = 0.388$
- the odds for a family at the 90th percentile are $\exp(1.338 - 0.388) = \exp(0.95) \approx 2.586$ times higher than for a family at the 10th percentile
- even if families are compositionally identical on covariates, we can assess the hypothetical differential in risks
109. Growth Curve Models
- growth models
  - individuals are level-2 units
  - repeated measures over time on individuals (level-1)
- models imply that logits vary across individuals
  - the intercept (conditional average logit) varies
  - the slope (conditional average effect of time) varies
- change is usually assumed to be linear
- use GLMM
  - complications due to dimensionality
  - the intercept and slope may co-vary (necessitating a more complex model), and more
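110. Growth Curve Models
- multilevel logit model for change over time: $\operatorname{logit}(p_{it}) = \beta_{0i} + \beta_{1i} T_t$
  - T is time (strictly increasing)
  - fixed and random coefficients (with covariates): $\beta_{0i} = \gamma_{00} + \gamma_{01} x_i + u_{0i}$ and $\beta_{1i} = \gamma_{10} + \gamma_{11} x_i + u_{1i}$
  - assume that $u_0$ and $u_1$ are bivariate normal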
111. Multilevel Logit Models for Change
- example: log odds of employment of black men in the U.S., 1982-1988 (NLSY)
  - (consider 5 years in this period)
  - time is coded 0, 1, 3, 4, 6
  - the dependent variable is not-working, not-in-school
- unconditional growth (no covariates except T)
- conditional growth (add covariates)
  - note the cross-level interactions implied by the composite model
112. Fitting Multilevel Model for Change
- programming
  - Stata (unconditional growth)
  - Stata (conditional growth)
xtmelogit y year || id: year, var cov(un)
xtmelogit y year south unem unemyr inc hs || id: year, var cov(un)
113. Fitting Multilevel Model for Change
114. Fitting Multilevel Logit Model for Change
115. Logits: Observed, Conditional, and Marginal
- the log odds of idleness decrease with time and show variation in level and change
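116. Composite Residuals in a Growth Model
- composite residual: $u_{0i} + u_{1i} T_t + e_{it}$
- composite residual variance: $\sigma^2_{u_0} + 2\sigma_{u_0 u_1} T_t + \sigma^2_{u_1} T_t^2 + \sigma^2_e$, with $\sigma^2_e = \pi^2/3$ on the logit latent scale
- covariance of composite residuals at times t and t′: $\sigma^2_{u_0} + \sigma_{u_0 u_1}(T_t + T_{t'}) + \sigma^2_{u_1} T_t T_{t'}$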
117. Model
- the covariance term is 0 (from either model)
  - results in simplified interpretation
  - easier estimation via variance components (default option)
- significant variation in slopes and initial levels
- other results
  - the log odds of idleness decrease over time (negative slope)
  - all other covariates except county unemployment have significant effects on the odds of idleness
  - the main effects are interpreted as effects on initial logits (at time 1, i.e., t = 0, the 1982 baseline)
  - the interaction of time and unemployment rate captures the effect of the county unemployment rate in 1982 on the change in the log odds of idleness
  - the positive effect implies that higher county unemployment tends to dampen the change in odds
118. IRT Models
- IRT models
  - Item Response Theory
  - models account for an individual-level random effect on a set of items (i.e., ability)
  - items are assumed to tap a single latent construct (aptitude on a specific subject)
- item difficulty
  - test items are assumed to be ordered on a difficulty scale: easier → harder
  - expected patterns emerge whereby, if a more difficult item is answered correctly, the easier items are likely to have been answered correctly
119. IRT Models
- IRT models
  - 1-parameter logistic (Rasch) model: $\operatorname{logit}(p_{ij}) = \theta_i - b_j$
  - $p_{ij}$: individual i's probability of a correct response on the jth item
  - $\theta_i$: individual i's ability
  - $b_j$: item j's difficulty
- properties
  - an individual's ability parameter is invariant with respect to the item
  - the difficulty parameter is invariant with respect to the individual's ability
  - higher ability or lower item difficulty leads to a higher probability of a correct response
  - both ability and difficulty are measured on the same scale
120. ICC
- item characteristic curve (item response curve)
  - depicts the probability of a correct response as a function of an examinee's ability or trait level
  - curves are shifted rightward with increasing item difficulty
  - assume that item 3 is more difficult than item 2, and item 2 is more difficult than item 1
  - at a given ability, the probability of a correct response decreases as the threshold $\theta_i = b_j$ moves right, reflecting increasing item difficulty
121. IRT Models: ICC (3 Items)
- the slopes of the item characteristic curves are equal (0.25) when ability = item difficulty
122. Estimation as GLMM
- specification
  - set up a person-item data structure
  - define x as a set of dummy variables (one per item)
  - change signs on β to reflect difficulty
  - fit the model without an intercept to estimate all item difficulties
  - normalization is common
123. PL1 Estimation
clear
set memory 128m
infile junk y1-y5 f using LSAT.dat
drop if junk==11 | junk==13
expand f
drop f junk
gen cons = 1
collapse (sum) wt2=cons, by(y1-y5)
gen id = _n
sort id
reshape long y, i(id) j(item)
124. PL1 Estimation
gen i1 = 0
gen i2 = 0
gen i3 = 0
gen i4 = 0
gen i5 = 0
replace i1 = 1 if item==1
replace i2 = 1 if item==2
replace i3 = 1 if item==3
replace i4 = 1 if item==4
replace i5 = 1 if item==5
* 1PL: constrain the random-effect sd to 1
constraint 1 [id1]_cons = 1
gllamm y i1-i5, i(id) weight(wt) nocons family(binom) cons(1) link(logit) adapt
125. PL1 Estimation
126. PL1 Estimation
- Stata (parameter normalization)
* normalized solution
* 1 -- standard 1PL
* 2 -- coefs sum to 0, var = 1
mata
bALL = st_matrix("e(b)")
b = -bALL[1,1..5]
mb = mean(b')
bs = b :- mb
("MML Estimates", "IRT parameters", "B-A Normalization")
(-b', b', bs')
end
127. PL1 Estimation
- Stata (normalized solution)
128. IRT Extensions
- 2-parameter logistic (2PL) model: $\operatorname{logit}(p_{ij}) = a_j(\theta_i - b_j)$
  - $a_j$: item discrimination parameters
129. IRT Extensions
- 2-parameter logistic (2PL) model
- item discrimination parameters
  - reveal differences in items' utility to distinguish different ability levels among examinees
  - high values denote items that are more useful for separating examinees into different ability levels
  - low values denote items that are less useful for distinguishing examinees in terms of ability
  - ICCs corresponding to this model can intersect, as they differ in location and slope
  - a steeper ICC slope is associated with a better discriminating item
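A Python sketch of item characteristic curves under the 2PL, with the 1PL as the special case where every discrimination equals 1. The two items are hypothetical. With unequal discriminations the ICCs cross; with equal discriminations they never do. At θ = b the probability is 0.5 and the slope of the ICC is a/4.

```python
import math

def icc(theta, b, a=1.0):
    """2PL item characteristic curve: logit(p) = a * (theta - b).
    With a = 1 for every item this is the 1PL (Rasch) model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical items: (difficulty b, discrimination a)
item_flat = (0.0, 0.5)    # easier, weakly discriminating
item_steep = (0.5, 2.0)   # harder, strongly discriminating

thetas = [t / 10 for t in range(-30, 31)]
diff = [icc(t, *item_steep) - icc(t, *item_flat) for t in thetas]
# The curves cross if the difference changes sign somewhere on the grid
crosses = any(d1 < 0 < d2 for d1, d2 in zip(diff, diff[1:]))

print(icc(0.5, *item_steep), crosses)
```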
130. IRT Extensions
- 2-parameter logistic (2PL) model
131. IRT Extensions
- 2-parameter logistic (2PL) model
- Stata (estimation)
eq id: i1 i2 i3 i4 i5
constraint 1 [id1_1]i1 = 1
gllamm y i1-i5, i(id) weight(wt) nocons family(binom) link(logit) frload(1) eqs(id) cons(1) adapt
matrix list e(b)
* normalized solutions
* 1 -- standard 2PL
mata
bALL = st_matrix("e(b)")
b = bALL[1,1..5]
c = bALL[1,6..10]
a = -b :/ c
("MML Estimates-Dif", "IRT Parameters")
(b', a')
("MML Discrimination Parameters")
(c')
end
132. IRT Extensions
- 2-parameter logistic (2PL) model
- Stata (estimation)
* Bock and Aitkin normalization (p. 164, corrected)
mata
bALL = st_matrix("e(b)")
b = -bALL[1,1..5]
c = bALL[1,6..10]
lc = ln(c)
mb = mean(b')
mc = mean(lc')
bs = b :- mb
cs = exp(lc :- mc)
("B-A Normalization DIFFICULTY", "B-A Normalization DISCRIMINATION")
(bs', cs')
end
133. IRT 2PL (1)
134. IRT 2PL (2): Bock-Aitkin Normalization
- item 3 has the highest difficulty and the greatest discrimination
135. 1PL and 2PL
136. 1PL and 2PL
137. Binary Response Models for Event Occurrence
- discrete-time event-history models
- purpose
  - model the probability of an event occurring at some point in time
  - Pr(event at t | event has not yet occurred by t)
- life table
  - events / trials
  - observe the number of events occurring to those who remain at risk as time passes
  - takes account of the changing composition of the sample as time passes
138. Life Table
139. Life Table
- observe
  - $R_j$: number at risk in time interval j ($R_0 = n$), where the number at risk is adjusted over time: $R_j = R_{j-1} - D_{j-1} - W_{j-1}$
  - $D_j$: events in time interval j ($D_0 = 0$)
  - $W_j$: removed from risk (censored) in time interval j ($W_0 = 0$), i.e., removed from risk due to other, unrelated causes
140. Life Table
- other key quantities
  - discrete-time hazard (event probability in interval j): $h_j = D_j / R_j$
  - surviving fraction (survivor function in interval j): $S_j = \prod_{k \le j}(1 - h_k)$
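A Python sketch of the life-table quantities using hypothetical counts for four intervals (not from the slides). It assumes that cases censored in an interval count as at risk for that whole interval and leave the risk set afterward; actuarial life tables often subtract half of them instead.

```python
# Hypothetical counts: start with n = 100 at risk
n = 100
events = [10, 8, 6, 4]     # D_j
censored = [5, 5, 4, 3]    # W_j

at_risk, hazard, survivor = [], [], []
r, s = n, 1.0
for d, w in zip(events, censored):
    h = d / r          # discrete-time hazard: events / number at risk
    s *= 1 - h         # survivor: running product of (1 - h)
    at_risk.append(r)
    hazard.append(h)
    survivor.append(s)
    r -= d + w         # remove events and censored cases from the risk set

for j, (r_j, h_j, s_j) in enumerate(zip(at_risk, hazard, survivor), start=1):
    print(j, r_j, round(h_j, 4), round(s_j, 4))
```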
141. Discrete-Time Hazard Models
- statistical concepts
  - discrete random variable $T_i$ (individual i's event or censoring time)
  - pdf of T (probability that individual i experiences the event in period j): $f_{ij} = \Pr(T_i = j)$
  - cdf of T (probability that individual i experiences the event in period j or earlier): $F_{ij} = \Pr(T_i \le j)$
  - survivor function (probability that individual i survives past period j): $S_{ij} = \Pr(T_i > j) = 1 - F_{ij}$
142. Discrete-Time Hazard Models
- statistical concepts
  - discrete hazard: $h_{ij} = \Pr(T_i = j \mid T_i \ge j)$
  - the conditional probability of event occurrence in interval j for individual i, given that an event has not already occurred to that individual by interval j
143. Discrete-Time Hazard Models
- equivalent expression using binary data
  - binary data: $d_{ij} = 1$ if individual i experiences an event in interval j, 0 otherwise
  - use the sequence of binary values at each interval to form a history of the process for individual i up to the time the event occurs
  - discrete hazard: $h_{ij} = \Pr(d_{ij} = 1 \mid d_{i1} = \dots = d_{i,j-1} = 0)$
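144. Discrete-Time Hazard Models
- modeling (logit link): $\operatorname{logit}(h_{ij}) = \alpha_j + x_{ij}\beta$
- modeling (complementary log-log link): $\log[-\log(1 - h_{ij})] = \alpha_j + x_{ij}\beta$
- non-proportional effects: let β vary with j (interactions with time)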
145. Data Structure
- person-level data → person-period form
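A Python sketch of the person-level to person-period conversion. Each person contributes one row per period at risk, and the binary indicator d equals 1 only in the period in which the event occurs. A binary logit (or cloglog) fitted to such rows yields the discrete-time hazard likelihood. The function name and the two persons are hypothetical.

```python
def to_person_period(people):
    """Expand person-level records into person-period rows.

    people: list of (person_id, last_period, event), where event is 1 if the
    event occurred in last_period and 0 if the person was censored there.
    Returns (person_id, period, d) rows with d = 1 only in the event period.
    """
    rows = []
    for pid, last, event in people:
        for j in range(1, last + 1):
            d = 1 if (event == 1 and j == last) else 0
            rows.append((pid, j, d))
    return rows

# Hypothetical persons: A has the event in period 3; B is censored in period 2
rows = to_person_period([("A", 3, 1), ("B", 2, 0)])
for r in rows:
    print(r)
```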
146. Data Structure
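147. Estimation
- contributions to the likelihood
  - contribution to log L for an individual with an event in period j: $\log h_{ij} + \sum_{k<j}\log(1 - h_{ik})$
  - contribution to log L for an individual censored in period j: $\sum_{k \le j}\log(1 - h_{ik})$
  - combine: $\log L = \sum_i \sum_k \left[d_{ik}\log h_{ik} + (1 - d_{ik})\log(1 - h_{ik})\right]$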
148. Example
- dropping out of Ph.D. programs (large US university)
  - data: 6,964 individual histories spanning 20 years
  - dropout cannot be distinguished from other types of leaving (transfer to another program, etc.)
- model the logit hazard of leaving the originally entered program as a function of the following
  - time in program (the time-dependent baseline hazard)
  - female and percent female in program
  - race/ethnicity (black, Hispanic, Asian)
  - marital status
  - GRE score
- also add a program-specific random effect (multilevel)
149. Example
150. Example
151. Example
clear
set memory 512m
infile CID devnt I1-I5 female pctfem black hisp asian married gre using DT28432.dat
logit devnt I1-I5, nocons or
est store m1
logit devnt I1-I5 female pctfem, nocons or
est store m2
logit devnt I1-I5 female pctfem black hisp asian, nocons or
est store m3
logit devnt I1-I5 female pctfem black hisp asian married, nocons or
est store m4
logit devnt I1-I5 female pctfem black hisp asian married gre, nocons or