Categorical Data Analysis

Transcript and Presenter's Notes
1
Categorical Data Analysis
  • Week 2

2
Binary Response Models
  • binary and binomial responses
  • binary y assumes values of 0 or 1
  • binomial y is number of successes in n
    trials
  • distributions
  • Bernoulli
  • Binomial

3
Transformational Approach
  • linear probability model
  • use grouped data (events/trials)
  • identity link
  • linear predictor
  • problem: predictions can fall outside [0, 1]

4
The Logit Model
  • logit transformation
  • inverse logit
  • ensures that p is in (0, 1) for all values of x
    and β
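The slide's displayed equations are not reproduced in this transcript; in standard notation, the logit transformation and its inverse are:

    logit(p) = log[p / (1 − p)] = xβ
    p = exp(xβ) / [1 + exp(xβ)] = 1 / [1 + exp(−xβ)]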

5
The Logit Model
  • odds and odds ratios are the key to understanding
    and interpreting this model
  • the log odds transformation is a stretching
    transformation to map probabilities to the real
    line

6
Odds and Probabilities
7
Probabilities and Log Odds
8
The Logit Transformation
  • properties of logit

linear in x on the log-odds scale
9
Odds, Odds Ratios, and Relative Risk
  • odds of success is the ratio
  • consider two groups with success probabilities
  • odds ratio (OR) is a measure of the odds of
    success in group 1 relative to group 2
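In standard notation (the slide's displayed formulas), the odds and the odds ratio are:

    odds = p / (1 − p)
    OR = [p1 / (1 − p1)] / [p2 / (1 − p2)]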

10
Odds Ratio
  • 2 x 2 table of X (rows, coded 0/1) by Y (columns,
    coded 0/1); the slide's cell counts are not
    reproduced in this transcript
  • OR is the cross-product ratio (compare x = 1
    group to x = 0 group)
  • odds of y = 1 are 4 times higher when x = 1 than
    when x = 0
11
Odds Ratio
  • equivalent interpretation
  • odds of y = 1 are 0.225 times as high when x = 0
    as when x = 1
  • odds of y = 1 are 1 − 0.225 = 0.775 times lower
    when x = 0 than when x = 1
  • odds of y = 1 are 77.5% lower when x = 0 than
    when x = 1

12
Log Odds Ratios
  • consider the model logit(p) = β0 + β1 D
  • D is a dummy variable coded 1 if group 1 and 0
    otherwise
  • group 1: logit(p) = β0 + β1
  • group 2: logit(p) = β0
  • LOR = β1; OR = exp(β1)

13
Relative Risk
  • similar to OR, but works with rates
  • relative risk or rate ratio (RR) is the rate in
    group 1 relative to group 2
  • OR ≈ RR as the success probabilities approach 0
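In standard notation, with rates p1 and p2:

    RR = p1 / p2
    OR = RR × (1 − p2) / (1 − p1)

so OR ≈ RR when both rates are small.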

14
Tutorial odds and odds ratios
  • consider the following data

15
Tutorial odds and odds ratios
  • read table

clear
input educ psex f
0 0 873
0 1 1190
1 0 533
1 1 1208
end
label define edlev 0 "HS or less" 1 "Col or more"
label val educ edlev
label var educ "education"
16
Tutorial odds and odds ratios
  • compute odds
  • verify by hand

tabodds psex educ [fw=f]
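Hand check, using the counts entered above: the odds of psex = 1 are 1190/873 ≈ 1.363 for the "HS or less" group and 1208/533 ≈ 2.266 for the "Col or more" group.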
17
Tutorial odds and odds ratios
  • compute odds ratios
  • verify by hand

tabodds psex educ [fw=f], or
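Hand check: OR = (1208 × 873) / (533 × 1190) ≈ 2.266 / 1.363 ≈ 1.66, comparing "Col or more" to "HS or less".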
18
Tutorial odds and odds ratios
  • stat facts
  • variances of functions
  • use in statistical significance tests and forming
    confidence intervals
  • basic rule for variances of linear
    transformations
  • if g(x) = a + bx is a linear function of x, then
    Var[g(x)] = b² Var(x)
  • this is a trivial case of the delta method
    applied to a single variable
  • the delta method for the variance of a nonlinear
    function g(x) of a single variable is
    Var[g(x)] ≈ [g′(x)]² Var(x), with g′ evaluated at
    the mean of x

19
Tutorial odds and odds ratios
  • stat facts
  • variances of odds and odds ratios
  • we can use the delta method to find the variance
    in the odds and the odds ratios
  • from the asymptotic (large sample theory)
    perspective it is best to work with log odds and
    log odds ratios
  • the log odds ratio converges to normality at a
    faster rate than the odds ratio, so statistical
    tests may be more appropriate on log odds ratios
    (nonlinear functions of p)

20
Tutorial odds and odds ratios
  • stat facts
  • the log odds ratio is the difference in the log
    odds for two groups
  • groups are independent
  • variance of a difference is the sum of the
    variances
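For a 2 x 2 table with cell counts n11, n12, n21, n22, the standard delta-method result is:

    Var(log OR) = 1/n11 + 1/n12 + 1/n21 + 1/n22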

21
Tutorial odds and odds ratios
  • data structures: grouped or individual level
  • note
  • use frequency weights to handle grouped data
  • or we could expand this data by the frequency
    weights, resulting in individual-level data
  • model results from either data structure are the
    same
  • expand the data and verify the following results

expand f
22
Tutorial odds and odds ratios
  • statistical modeling
  • logit model (glm)
  • logit model (logit)

glm psex educ [fw=f], f(b) eform
logit psex educ [fw=f], or
23
Tutorial odds and odds ratios
  • statistical modeling (1)
  • logit model (glm)

24
Tutorial odds and odds ratios
  • statistical modeling (2)
  • some ideas from alternative normalizations
  • what parameters will this model produce?
  • what is the interpretation of the constant?

gen cons = 1
glm psex cons educ [fw=f], nocons f(b) eform
25
Tutorial odds and odds ratios
  • statistical modeling (2)

26
Tutorial odds and odds ratios
  • statistical modeling (3)
  • what parameters does this model produce?
  • how do you interpret them?

gen lowed = educ==0
gen hied = educ==1
glm psex lowed hied [fw=f], nocons f(b) eform
27
Tutorial odds and odds ratios
  • statistical modeling (3)

are these odds ratios?
28
Tutorial prediction
  • fitted probabilities (after most recent model)

predict p, mu
tab educ [fw=f], sum(p) nostandard nofreq
29
Probit Model
  • inverse probit is the CDF for a standard normal
    variable
  • link function
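In standard notation, where Φ is the standard normal CDF:

    p = Φ(xβ)          (inverse link)
    Φ⁻¹(p) = xβ        (probit link)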

30
Probit Transformation
31
Interpretation
  • probit coefficients
  • interpreted as shifts in a standard normal index
    (no log odds-ratio interpretation)
  • scaled versions of logit coefficients
  • probit models
  • more common in certain disciplines (economics)
  • analogy with linear regression (normal latent
    variable)
  • more easily extended to multivariate
    distributions

32
Example Grouped Data
  • Swedish mortality data revisited

logit model
probit model
33
Swedish Historical Mortality Data
  • predictions

34
Programming
  • Stata generalized linear model (glm)
  • glm y A2 A3 P2, family(b n) link(probit)
  • glm y A2 A3 P2, family(b n) link(logit)
  • idea of glm is to make the model linear in the link
  • old days: iteratively reweighted least squares (IRLS)
  • now: Fisher scoring, Newton-Raphson
  • both approaches yield MLEs

35
Generalized Linear Models
  • applies to a broad class of models
  • iterative fitting (repeated updating) except for
    linear model
  • update parameters, weights W, and predicted
    values m
  • models differ in terms of W and m and assumptions
    about the distribution of y
  • common distributions for y include normal,
    binomial, and Poisson
  • common links include identity, logit, probit,
    and log

36
Latent Variable Approach
  • example insect mortality
  • suppose a researcher exposes insects to dosage
    levels (u) of an insecticide and observes whether
    the subject lives or dies at that dosage.
  • the response is expected to depend on the
    insect's tolerance (c) to that dosage level
  • the insect dies if u > c and survives if u < c
  • tolerance is not observed (survival is observed)

37
Latent Variables
  • u and c are continuous latent variables
  • examples
  • women's employment: u is the market wage and c is
    the reservation wage
  • migration: u is the benefit of moving and c is
    the cost of moving
  • the observed outcome y = 1 or y = 0 reveals the
    individual's preference, which is assumed to
    maximize a rational individual's utility function

38
Latent Variables
  • Assume linear utility and criterion functions
  • over-parameterization: identification problem
  • we can identify differences in components but not
    the separate components

39
Latent Variables
  • constraints
  • Then
  • where F(.) is the CDF of e
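A standard statement of this step, assuming a symmetric distribution for e: with y* = xβ + e and y = 1 when y* > 0,

    Pr(y = 1 | x) = Pr(e > −xβ) = F(xβ)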

40
Latent Variables and Standardization
  • Need to standardize the mean and variance of e
  • binary dependent variables lack inherent scales
  • the magnitude of β is meaningful only relative to
    the mean and variance of e, which are unknown
  • redefine e to a common standard, e* = (e − a) / b,
    where a and b are two chosen constants

41
Standardization for Logit and Probit Models
  • standardization implies Pr(y = 1 | x) = F[(xβ − a) / b]
  • F(.) is the CDF of e
  • location a and scale b need to be fixed
  • setting a = 0 and b = 1 standardizes e

42
Standardization for Logit and Probit Models
  • distribution of e is standardized
  • standard normal → probit
  • standard logistic → logit
  • both distributions have a mean of 0
  • variances differ
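Specifically: the standard normal has variance 1, while the standard logistic has variance π²/3 ≈ 3.29. This is why logit coefficients are roughly 1.6 to 1.8 times larger than the corresponding probit coefficients for the same data.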

43
Extending the Latent Variable Approach
  • observed y is a dichotomous (binary) 0/1 variable
  • continuous latent variable y* = xβ + e
    (linear predictor plus residual)
  • observed outcome: y = 1 if y* > 0, y = 0 otherwise

44
Notation
  • conditional means of latent variables obtained
    from index function
  • obtain probabilities from inverse link functions
  • logit model
  • probit model

45
ML
  • likelihood function
  • where ni = 1 if data are binary
  • log-likelihood function
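In standard binomial notation (ni trials, yi successes, success probability pi):

    L = ∏i pi^yi (1 − pi)^(ni − yi)
    log L = Σi [yi log pi + (ni − yi) log(1 − pi)]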

46
Assessing Models
  • definitions
  • L0: null model (intercept only)
  • Lf: saturated model (a parameter for each cell)
  • Lc: current model
  • grouped data (events/trials)
  • deviance (likelihood ratio statistic)
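With these definitions, the deviance is the likelihood-ratio statistic comparing the current model to the saturated model:

    D = 2(log Lf − log Lc)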

47
Deviance
  • grouped data
  • if cell sizes are reasonably large deviance is
    distributed as chi-square
  • individual-level data: Lf = 1 and log Lf = 0
  • in that case the deviance is not a fit statistic

48
Deviance
  • deviance is like a residual sum of squares
  • larger values indicate poorer models
  • larger models have smaller deviance
  • deviance for the more constrained model (Model
    1)
  • deviance for the less constrained model (Model 2)
  • assume that Model 1 is a constrained version of
    Model 2.

49
Difference in Deviance
  • evaluate competing nested models using a
    likelihood ratio statistic
  • model chi-square is a special case
  • SAS, Stata, R, etc. report different statistics
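In standard form, for nested models where Model 1 is the constrained version of Model 2:

    G² = D1 − D2 = 2(log L2 − log L1) ~ χ²(df = difference in number of parameters)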

50
Other Fit Statistics
  • BIC, AIC (useful for non-nested models)
  • basic idea of IC: penalize log L for the number
    of parameters (AIC/BIC) and/or the size of the
    sample (BIC); generic form IC = −2 log L + 2s × dfm
  • AIC: s = 1
  • BIC: s = ½ log n (n = sample size)
  • dfm is the number of model parameters

51
Hypothesis Tests/Inference
  • single parameter
  • MLEs are asymptotically normal → Z-test
  • multi-parameter
  • likelihood ratio tests (after fitting)
  • Wald tests (test constraints from current model)

52
Hypothesis Tests/Inference
  • Wald test (tests a vector of restrictions)
  • a set of r parameters are all equal to 0
  • a set of r parameters are linearly restricted

R: restriction matrix
r: constraint vector
β: parameter subset
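In this notation, the Wald statistic for testing Rb = r is:

    W = (Rb − r)′ [R V(b) R′]⁻¹ (Rb − r)

where b is the vector of estimates and V(b) its estimated covariance matrix; under H0, W ~ χ² with r degrees of freedom (one per restriction).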
53
Interpreting Parameters
  • odds ratios consider the model where x is a
    continuous predictor and d is a dummy variable
  • suppose that d denotes sex and x denotes income
    and the problem concerns voting, where y is the
    propensity to vote
  • results: logit(pi) = −1.92 + 0.012xi + 0.67di

54
Interpreting Parameters
  • for d (dummy variable coded 1 for female) the
    odds ratio is straightforward
  • holding income constant, women's odds of voting
    are nearly twice those of men
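Worked out from the fitted model above: OR = exp(0.67) ≈ 1.95.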

55
Interpreting Parameters
  • for x (continuous variable for income in
    thousands of dollars) the odds ratio is a
    multiplicative effect
  • suppose we increase income by 1 unit ($1,000)
  • suppose we increase income by c units (c × $1,000)
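Worked out from the fitted model: a 1-unit increase multiplies the odds by exp(0.012) ≈ 1.012; a c-unit increase multiplies the odds by exp(0.012c).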

56
Interpreting Parameters
  • if income is increased by $10,000, this increases
    the odds of voting by about 13%, since
    exp(10 × 0.012) = exp(0.12) ≈ 1.13
  • a note on percent change in odds
  • if the estimate of β > 0, then the percent increase
    in odds for a unit change in x is 100[exp(β) − 1]
  • if the estimate of β < 0, then the percent decrease
    in odds for a unit change in x is 100[1 − exp(β)]

57
Marginal Effects
  • marginal effect
  • effect of a change in x on the change in probability
  • ∂p/∂x = f(xβ)β, where f(.) is the pdf corresponding
    to the CDF F(.)
  • often we evaluate f(.) at the mean of x
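For the logit model this takes a familiar closed form, since the logistic pdf satisfies f(xβ) = p(1 − p):

    ∂p/∂xk = p(1 − p)βk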

58
Marginal Effect for a Change in a Continuous
Variable
59
Marginal Effect of a Change in a Dummy Variable
  • if x is a continuous variable and z is a dummy
    variable
  • marginal effect of change in z from 0 to 1 is the
    difference
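A standard way to write this, using hypothetical coefficients β0 and β1 for x and δ for the dummy z, with x held at its mean x̄:

    ME(z) = F(β0 + β1 x̄ + δ) − F(β0 + β1 x̄)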

60
Example
  • logit models for high school graduation
  • odds ratios (constant is baseline odds)

61
LR Test
  • Model 3 vs. 2

62
Wald Test
  • Test equality of parental education effects

logit hsg blk hsp female nonint inc nsibs mhs mcol fhs fcol wtest
test mhs = fhs
test mcol = fcol

cannot reject H0 of equal parental education
effects on HS graduation
63
Basic Estimation Commands (Stata)
estimation commands
model tests
* model 0 - null model
qui logit hsg
est store m0
* model 1 - race, sex, family structure
qui logit hsg blk hsp female nonint
est store m1
* model 1a - race X family structure interactions
qui xi: logit hsg blk hsp female nonint i.nonint*i.blk i.nonint*i.hsp
est store m1a
lrtest m1 m1a
* model 2 - SES
qui xi: logit hsg blk hsp female nonint inc nsibs mhs mcol fhs fcol
est store m2
* model 3 - individual
qui xi: logit hsg blk hsp female nonint inc nsibs mhs mcol fhs fcol wtest
est store m3
lrtest m2 m3
64
  • Fit Statistics etc.

* some 'hand' calculations with saved results
scalar ll = e(ll)
scalar npar = e(df_m) + 1
scalar nobs = e(N)
scalar AIC = -2*ll + 2*npar
scalar BIC = -2*ll + log(nobs)*npar
scalar list AIC
scalar list BIC
* or use the automated fitstat routine
fitstat
* output as a table
estout m0 m1 m2 m3 using modF07, replace star stfmt(9.2f 9.0f 9.0f) ///
    stats(ll N df_m) eform
65
Analysis of Deviance
66
BIC and AIC (using fitstat)
67
Marginal Effects
68
Marginal Effects
69
Generate Income Quartiles
qui sum adjinc, det
* quartiles for income distribution
gen incQ1 = adjinc < r(p25)
gen incQ2 = adjinc >= r(p25) & adjinc < r(p50)
gen incQ3 = adjinc >= r(p50) & adjinc < r(p75)
gen incQ4 = adjinc >= r(p75)
gen incQ = 1 if incQ1==1
replace incQ = 2 if incQ2==1
replace incQ = 3 if incQ3==1
replace incQ = 4 if incQ4==1
tab incQ
70
Fit Model for Each Quartile
* calculate predictions
* look at marginal effects of test score on graduation
* by selected groups: (1) model by income quartiles
local i = 1
while `i' < 5 {
    logit hsg blk female mhs nonint nsibs urban so wtest if incQ==`i'
    margeff
    cap drop wm*
    cap drop bm*
    prgen wtest, x(blk=0 female=0 mhs=1 nonint=0) gen(wmi) from(-3) to(3)
    prgen wtest, x(blk=0 female=0 mhs=1 nonint=1) gen(wmn) from(-3) to(3)
    label var wmip1 "white/intact"
    label var wmnp1 "white/nonintact"
    prgen wtest, x(blk=1 female=0 mhs=1 nonint=0) gen(bmi) from(-3) to(3)
    prgen wtest, x(blk=1 female=0 mhs=1 nonint=1) gen(bmn) from(-3) to(3)
    label var bmip1 "black/intact"
    label var bmnp1 "black/nonintact"
71
Graph
    set scheme s2mono
    twoway (line wmip1 wmix, sort xtitle("Test Score") ytitle("Pr(y=1)")) ///
        (line wmnp1 wmix, sort) (line bmip1 wmix, sort) ///
        (line bmnp1 wmix, sort), ///
        subtitle("Marginal Effect of Test Score on High School Graduation" ///
        "Income Quartile `i'") saving(wtgrph`i', replace)
    graph export wtgrph`i'.eps, as(eps) replace
    local i = `i' + 1
}
72
Fitted Probabilities
logit hsg blk female mhs nonint inc nsibs urban so wtest
prtab nonint blk female
73
Fitted Probabilities
  • predicted values
  • evaluate fitted probabilities at the sample mean
    values of x (or other fixed quantities)
  • averaging fitted probabilities over
    subgroup-specific models will produce marginal
    probabilities

74
Observed Fitted Probabilities
75
Alternative Probability Model
  • complementary log log (cloglog or CLL)
  • standard extreme-value distribution for u
  • cloglog model
  • cloglog link function
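In standard notation, the cloglog link and its inverse are:

    log[−log(1 − p)] = xβ
    p = 1 − exp[−exp(xβ)]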

76
Extreme-Value Distribution
  • properties
  • mean of u is Euler's constant, γ ≈ 0.5772
  • variance of u is π²/6
  • the difference of two independent extreme-value
    variables yields a logistic variable

77
CLL Transformation
78
CLL Model
  • no practical differences from logit and probit
    models
  • often suited for survival data and other
    applications
  • interpretation of coefficients
  • exp(β) is a relative risk or hazard ratio, not an OR
  • glm: binomial distribution for y with a cloglog
    link
  • cloglog: use the cloglog command directly
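A minimal sketch of the two equivalent commands, with hypothetical outcome y and covariates x1 x2:

* GLM with binomial family and cloglog link
glm y x1 x2, family(binomial) link(cloglog) eform
* or the dedicated command
cloglog y x1 x2, eform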

79
CLL and Logit Model Compared
80
Cloglog and Logit Model Compared
logit
cloglog
more agreement when modeling rare events
81
Extensions Multilevel Data
  • what is multilevel data?
  • individuals are nested in a larger context
  • children in families, kids in schools etc.

context 1
context 2
context 3
82
Multilevel Data
  • i.i.d. assumptions?
  • the outcomes for units in a given context could
    be associated
  • standard model would treat all outcomes
    (regardless of context) as independent
  • multilevel methods account for the within-cluster
    dependence
  • a general problem with binomial responses
  • we assume that trials are independent
  • this might not be realistic
  • non-independence will inflate the variance
    (overdispersion)

83
Multilevel Data
  • example (in book)
  • 40 universities as units of analysis
  • for each university we observe the number of
    graduates (n) and the number receiving
    post-doctoral fellowships (y)
  • we could compute proportions (MLEs)
  • some proportions would be better estimates as
    they would have higher precision or lower
    variance
  • example: the data y1/n1 = 2/5 and y2/n2 = 20/50
    give identical estimates of p but variances of
    0.048 and 0.0048, respectively
  • the 2nd estimate is more precise than the 1st
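Hand check, using Var(p̂) = p̂(1 − p̂)/n with p̂ = 0.4: 0.4 × 0.6 / 5 = 0.048 and 0.4 × 0.6 / 50 = 0.0048.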

84
Multilevel Data
  • multilevel models allow for improved predictions
    of individual probabilities
  • MLE estimate is unaltered if it is precise
  • MLE estimate moved toward average if it is
    imprecise (shrinkage)
  • multilevel estimate of p would be a weighted
    average of the MLE and the average over all MLEs
    (weight (w) is based on the variance of each
    MLE and the variance over all the MLEs)
  • we are generally less interested in the ps and
    more interested in the model parameters and
    variance components

85
Shrinkage Estimation
  • primitive approach
  • assume we have a set of estimates (MLEs)
  • our best estimate of the variance of each MLE is
    Var(p̂i) = p̂i(1 − p̂i)/ni
  • this is the within variance (no pooling)
  • if this is large, then the MLE is a poor estimate
  • a better estimate might be the average of the
    MLEs in this case (pooling the estimates)
  • we can average the MLEs and estimate the between
    variance as the variance of the p̂i around that
    average

86
Shrinkage Estimation
  • primitive approach
  • we can then estimate a weight wi
  • a revised estimate of pi would take account of
    the precision to form a precision-weighted average
  • precision is a function of ni
  • more weight is given to more precise MLEs
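One common form of this weight, writing B for the between variance and Vi for the within variance of the ith MLE:

    wi = B / (B + Vi)
    p̃i = wi p̂i + (1 − wi) p̄

so precise MLEs (small Vi, wi near 1) are left nearly unaltered, while imprecise ones are shrunk toward the overall average p̄.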

87
Shrinkage a primitive approach
88
Shrinkage
results from full Bayesian (multilevel) Analysis
89
Extension Multilevel Models
  • assumptions
  • within-context and between-context variation in
    outcomes
  • individuals within the same context share the
    same random error specific to that context
  • models are hierarchical
  • individuals (level-1)
  • contexts (level-2)

90
Multilevel Models Background
  • linear mixed model for continuous y
  • (multilevel, random coefficients, etc.)
  • level-1 model and level-2 sub-models
    (hierarchical)

91
Multilevel Models Background
  • linear mixed model assumptions
  • level-1 and level-2 residuals

92
Multilevel Models Background
  • composite form

composite residual
fixed effects
cross-level interaction
random effects (level-2)
93
Multilevel Models Background
  • variance components

94
Multilevel Models Background
  • general form (linear mixed model)

variables associated with fixed coefficients
variables associated with random coefficients
95
Multilevel Models Logit Models
  • binomial model (random effect)
  • assumptions
  • u increases or decreases the expected response
    for individual j in context i independently of x
  • all individuals in context i share the same value
    of u
  • also called a random intercept model
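A standard way to write the random-intercept logit, with context i and individual j:

    logit(pij) = xij′β + ui,   ui ~ N(0, σu²)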

96
Multilevel Models
  • a hierarchical model
  • z is a level-1 variable; x is a level-2 variable
  • random intercept varies among level-2 units
  • note level-1 residual variance is fixed (why?)

97
Multilevel Models
  • a general expression
  • x are variables associated with fixed
    coefficients
  • z are variables associated with random
    coefficients
  • u is multivariate normal vector of level-2
    residuals
  • mean of u is 0; the covariance matrix of u is
    estimated

98
Multilevel Models
  • random effects vs. random coefficients
  • random effects: u
  • random coefficients: β + u
  • variance components
  • interested in level-2 variation in u
  • prediction
  • E(y) is not equal to E(y|u)
  • model-based predictions need to consider random
    effects

99
Multilevel Models Generalized Linear Mixed
Models (GLMM)
Conditional Expectation
Marginal Expectation
requires numerical integration or simulation
100
Data Structure
  • multilevel data structure
  • requires a context id to identify individuals
    belonging to the same context
  • NLSY sibling data contains a family id
    (constructed by researcher)
  • data are unbalanced (we do not require clusters
    to be the same size)
  • small clusters will contribute less information
    to the estimation of variance components than
    larger clusters
  • it is OK to have clusters of size 1
  • (i.e., an individual is a context unto
    themselves)
  • clusters of size 1 contribute to the estimation
    of fixed effects but not to the estimation of
    variance components

101
Example clustered data
  • siblings nested in families
  • y is 1st premarital birth for NLSY women
  • select sib-ships of size > 2
  • null model (random intercept)
  • xtlogit fpmbir, i(famid)
  • or
  • xtmelogit fpmbir || famid:

102
Example clustered data
random intercept xtlogit
103
Example clustered data
random intercept xtmelogit
104
Variance Component
  • add predictors (mostly level-2)

105
Variance Component
  • conditional variance in u is 2.107
  • proportionate reduction in error (PRE)
  • a 31% reduction in level-2 variance when level-2
    predictors are accounted for
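The PRE compares the level-2 variance before and after adding predictors:

    PRE = [σu²(null) − σu²(conditional)] / σu²(null)

With the conditional variance of 2.107 and PRE = 0.31, the implied null-model variance is about 2.107 / (1 − 0.31) ≈ 3.05.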

106
Random Effects
  • we can examine the distribution of random effects

107
Random Effects
  • we can examine the distribution of random effects

108
Random Effects Distribution
  • 90th percentile: u90 = 1.338
  • 10th percentile: u10 = 0.388
  • the risk for a family at the 90th percentile is
  • exp(1.338 − 0.388) = 2.586
  • times higher than for a family at the 10th
    percentile
  • even if families are compositionally identical on
    covariates, we can assess the hypothetical
    differential in risks

109
Growth Curve Models
  • growth models
  • individuals are level-2 units
  • repeated measures over time on individuals
    (level-1)
  • models imply that logits vary across individuals
  • intercept (conditional average logit) varies
  • slope (conditional average effect of time) varies
  • change is usually assumed to be linear
  • use GLMM
  • complications due to dimensionality
  • intercept and slope may co-vary (necessitating a
    more complex model) and more

110
Growth Curve Models
  • multilevel logit model for change over time
  • T is time (strictly increasing)
  • fixed and random coefficients (with covariates)
  • assume that u0 and u1 are bivariate normal
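A standard way to write the unconditional growth model, with individual i and occasion t:

    logit(pit) = (β0 + u0i) + (β1 + u1i)Tit

where (u0i, u1i) is bivariate normal with mean 0 and covariance matrix Σ.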

111
Multilevel Logit Models for Change
  • Example: log odds of employment of black men in
    the U.S., 1982-1988 (NLSY)
  • (consider 5 years in this period)
  • time is coded 0, 1, 3, 4, 6
  • dependent variable is not-working, not-in-school
  • unconditional growth (no covariates except T)
  • conditional growth (add covariates)
  • note cross-level interactions implied by
    composite model

112
Fitting Multilevel Model for Change
  • programming
  • Stata (unconditional growth)
  • Stata (conditional growth)

xtmelogit y year || id: year, var cov(un)
xtmelogit y year south unem unemyr inc hs || id: year, var cov(un)
113
Fitting Multilevel Model for Change
114
Fitting Multilevel Logit Model for Change
115
Logits Observed, Conditional, and Marginal
the log odds of idleness decreases with time and
shows variation in level and change
116
Composite Residuals in a Growth Model
  • composite residual
  • composite residual variance
  • covariance of composite residual
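For the random intercept-and-slope model above, the composite residual at time T is u0i + u1i T (plus the level-1 term), so its variance is:

    Var = σ0² + 2σ01 T + σ1² T²

and the covariance between occasions T and T′ within the same individual is σ0² + σ01(T + T′) + σ1² T T′.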

117
Model
  • covariance term is 0 (from either model)
  • results in simplified interpretation
  • easier estimation via variance components
    (default option)
  • significant variation in slopes and initial
    levels
  • other results
  • log odds of idleness decrease over time (negative
    slope)
  • other covariates except county unemployment have
    significant effects on the odds of idleness
  • the main effects are interpreted as effects on
    initial logits at time t = 0 (the 1982 baseline)
  • the interaction of time and unemployment rate
    captures the effect of the 1982 county
    unemployment rate on the change in log odds of
    idleness
  • the positive effect implies that higher county
    unemployment tends to dampen the change in odds

118
IRT Models
  • IRT models
  • Item Response Theory
  • models account for an individual-level random
    effect on a set of items (i.e., ability)
  • items are assumed to tap a single latent
    construct (aptitude on a specific subject)
  • item difficulty
  • test items are assumed to be ordered on a
    difficulty scale: easier → harder
  • expected patterns emerge whereby if a more
    difficult item is answered correctly, the easier
    items are likely to have been answered correctly

119
IRT Models
  • IRT models
  • 1-parameter logistic (Rasch) model
  • pij = individual i's probability of a correct
    response on the jth item
  • θi = individual i's ability
  • bj = item j's difficulty
  • properties
  • an individual's ability parameter is invariant
    with respect to the item
  • the difficulty parameter is invariant with
    respect to individuals' abilities
  • higher ability or lower item difficulty leads to a
    higher probability of a correct response
  • both ability and difficulty are measured on the
    same scale
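The Rasch model in its standard form:

    logit(pij) = θi − bj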

120
ICC
  • item characteristic curve (item response curve)
  • depicts the probability of a correct response as
    a function of an examinee's ability or trait
    level
  • curves are shifted rightward with increasing item
    difficulty
  • assume that item 3 is more difficult than item 2
    and item 2 is more difficult than item 1
  • the probability of a correct response decreases as
    the threshold θ = bj is crossed, reflecting
    increasing item difficulty

121
IRT Models ICC (3 Items)
slopes of the item characteristic curves are equal
when ability = item difficulty
122
Estimation as GLMM
  • specification
  • set up a person-item data structure
  • define x as a set of dummy variables
  • change signs on ß to reflect difficulty
  • fit model without intercept to estimate all item
    difficulties
  • normalization is common

123
PL1 Estimation
  • Stata (data set up )

clear
set memory 128m
infile junk y1-y5 f using LSAT.dat
drop if junk==11 | junk==13
expand f
drop f junk
gen cons = 1
collapse (sum) wt2=cons, by(y1-y5)
gen id = _n
sort id
reshape long y, i(id) j(item)
124
PL1 Estimation
  • Stata (model set up )

gen i1 = 0
gen i2 = 0
gen i3 = 0
gen i4 = 0
gen i5 = 0
replace i1 = 1 if item==1
replace i2 = 1 if item==2
replace i3 = 1 if item==3
replace i4 = 1 if item==4
replace i5 = 1 if item==5
* 1PL: constrain sd = 1
constraint define 1 [id1_1]_cons = 1
gllamm y i1-i5, i(id) weight(wt) nocons family(binom) cons(1) link(logit) adapt
125
PL1 Estimation
  • Stata (output )

126
PL1 Estimation
  • Stata (parameter normalization)

* normalized solution: 1 -- standard 1PL; 2 -- coefs sum to 0, var = 1
mata:
bALL = st_matrix("e(b)")
b = -bALL[1,1..5]
mb = mean(b')
bs = b :- mb
("MML Estimates", "IRT parameters", "B-A Normalization")
(-b', b', bs')
end
127
PL1 Estimation
  • Stata (normalized solution)

128
IRT Extensions
  • 2-parameter logistic (2PL) model

item discrimination parameters
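The 2PL model in its standard form, where aj is the discrimination parameter for item j:

    logit(pij) = aj(θi − bj)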
129
IRT Extensions
  • 2-parameter logistic (2PL) model
  • item discrimination parameters
  • reveal differences in items' utility for
    distinguishing different ability levels among
    examinees
  • high values denote items that are more useful in
    terms of separating examinees into different
    ability levels
  • low values denote items that are less useful in
    distinguishing examinees in terms of ability
  • ICCs corresponding to this model can intersect as
    they differ in location and slope
  • steeper slope of the ICC is associated with a
    better discriminating item

130
IRT Extensions
  • 2-parameter logistic (2PL) model

131
IRT Extensions
  • 2-parameter logistic (2PL) model
  • Stata (estimation)

eq id: i1 i2 i3 i4 i5
constraint define 1 [id1_1]i1 = 1
gllamm y i1-i5, i(id) weight(wt) nocons family(binom) link(logit) ///
    frload(1) eqs(id) cons(1) adapt
matrix list e(b)
* normalized solutions: 1 -- standard 2PL
mata:
bALL = st_matrix("e(b)")
b = bALL[1,1..5]
c = bALL[1,6..10]
a = -b :/ c
("MML Estimates-Dif", "IRT Parameters")
(b', a')
("MML Discrimination Parameters")
(c')
end
132
IRT Extensions
  • 2-parameter logistic (2PL) model
  • Stata (estimation)

* Bock and Aitkin normalization (p. 164, corrected)
mata:
bALL = st_matrix("e(b)")
b = -bALL[1,1..5]
c = bALL[1,6..10]
lc = ln(c)
mb = mean(b')
mc = mean(lc')
bs = b :- mb
cs = exp(lc :- mc)
("B-A Normalization DIFFICULTY", "B-A Normalization DISCRIMINATION")
(bs', cs')
end
133
IRT 2PL (1)
134
IRT 2PL (2) Bock-Aitkin Normalization
item 3 has highest difficulty and greatest
discrimination
135
1PL and 2PL
136
1PL and 2PL
137
Binary Response Models for Event Occurrence
  • discrete-time event-history models
  • purpose
  • model the probability of an event occurring at
    some point in time
  • Pr(event at t | event has not yet occurred by t)
  • life table
  • events / trials
  • observe the number of events occurring to those
    who remain at risk as time passes
  • takes account of the changing composition of the
    sample as time passes

138
Life Table
139
Life Table
  • observe
  • Rj = number at risk in time interval j (R0 = n),
    where the number at risk in interval j is
    adjusted over time
  • Dj = events in time interval j (D0 = 0)
  • Wj = number removed from risk (censored) in time
    interval j (W0 = 0)
  • (removed from risk due to other unrelated
    causes)

140
Life Table
  • other key quantities
  • discrete-time hazard (event probability in
    interval j)
  • surviving fraction (survivor function in interval
    j)
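In standard life-table notation (ignoring any within-interval censoring adjustment):

    hj = Dj / Rj                  (discrete-time hazard)
    Sj = ∏(k ≤ j) (1 − hk)        (surviving fraction)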

141
Discrete-Time Hazard Models
  • statistical concepts
  • discrete random variable Ti (individual i's event
    or censoring time)
  • pdf of T (probability that individual i
    experiences event in period j)
  • cdf of T (probability that individual i
    experiences event in period j or earlier)
  • survivor function (probability that individual i
    survives past period j)

142
Discrete-Time Hazard Models
  • statistical concepts
  • discrete hazard
  • the conditional probability of event occurrence
    in interval j for individual i given that an
    event has not already occurred to that individual
    by interval j
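In symbols:

    hij = Pr(Ti = j | Ti ≥ j)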

143
Discrete-Time Hazard Models
  • equivalent expression using binary data
  • binary data: dij = 1 if individual i experiences
    an event in interval j, 0 otherwise
  • use the sequence of binary values at each
    interval to form a history of the process for
    individual i up to the time the event occurs
  • discrete hazard

144
Discrete-Time Hazard Models
  • modeling (logit link)
  • modeling (complementary log log link)
  • non-proportional effects
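In standard form, with time-interval intercepts αj (the baseline hazard) and covariates xij:

    logit(hij) = αj + xij′β
    log[−log(1 − hij)] = αj + xij′β      (cloglog link)

Non-proportional effects can be allowed by interacting covariates with the time dummies.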

145
Data Structure
  • person-level data → person-period form

146
Data Structure
  • binary sequences

147
Estimation
  • contributions to likelihood
  • contribution to log L for individual with event
    in period j
  • contribution to log L for individual censored in
    period j
  • combine
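In standard form, using the binary indicators dij defined earlier:

    event in period j:     log Li = log hij + Σ(k < j) log(1 − hik)
    censored in period j:  log Li = Σ(k ≤ j) log(1 − hik)
    combined:              log L = Σi Σk [dik log hik + (1 − dik) log(1 − hik)]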

148
Example
  • dropping out of Ph.D. programs (large US
    university)
  • data: 6,964 individual histories spanning 20
    years
  • dropout cannot be distinguished from other types
    of leaving (transfer to another program, etc.)
  • model the logit hazard of leaving the
    originally-entered program as a function of the
    following
  • time in program (the time-dependent baseline
    hazard)
  • female and percent female in program
  • race/ethnicity (black, Hispanic, Asian)
  • marital status
  • GRE score
  • also add a program-specific random effect
    (multilevel)

149
Example
150
Example
151
Example
clear
set memory 512m
infile CID devnt I1-I5 female pctfem black hisp asian married gre using DT28432.dat
logit devnt I1-I5, nocons or
est store m1
logit devnt I1-I5 female pctfem, nocons or
est store m2
logit devnt I1-I5 female pctfem black hisp asian, nocons or
est store m3
logit devnt I1-I5 female pctfem black hisp asian married, nocons or
est store m4
logit devnt I1-I5 female pctfem black hisp asian married gre, nocons or