Title: EM?????????????
1?????????
2????
- ?????????
- ???????????
- EM?????????????
- ???????
- ???????????????
- ??????EM??????
- ??????????
3???????????????
- ??????????? Complete Case (CC) ?
- ?????????????, ?????
- ?????? Pairwise Deletion
- ????????????????????????,???????????
- ???????? Imputation Method
- ????????????(?????,????,)
- ???????????
- ???(EM???)
4????????????
and
?????? ??????
5
- x: response variable
- r: missing indicator variable
- f(x, r): the joint distribution of x and r
- the marginal distribution of the observed data
6?????? f (x,r)????2?????
- Selection models
- $f(x, r) = f(x)\,P(r \mid x)$
- ???? ??????,???? x ??????
- ?????????????
- Pattern mixture models
- $f(x, r) = f(x \mid r)\,P(r)$
- ????????,?????? x ???????
- ????, ????????,????????
- ?????????????????????????
7 Selection Model vs. Pattern Mixture Model
- Selection Model
- MAR???????,????????
- ???????????????????
- ???????????
- NMAR????,?????????????
- ???????
- Pattern Mixture Model
- NMAR????,????????
- ????????????????????????????????
- NMAR???,?????????????????
- ?????????????????,???????
- ??????????????
8?????????????(1)
- Missing Completely at Random (MCAR)
- $P(r \mid x) = P(r)$
- ?????????????????????? x ?????
- ????????????????,
- Ex. $P(r = (1, 1, \ldots, 1)) = 75\%$, $P(r = (1, 1, \ldots, 0)) = 10\%$, ...
9MCAR????,?????????????
No systematic difference between complete cases and incomplete cases
CC ?, ??????
unbiased estimates of underlying marginal means/profiles
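A small simulation can illustrate the claim that complete cases give unbiased means under MCAR. This is only a sketch: the data model (x2 depending linearly on x1, true mean 10), the 30% missing rate and the function name `simulate_mcar` are all assumptions made for illustration.

```python
import random
import statistics

def simulate_mcar(n=20000, miss_prob=0.3, seed=0):
    """Return (full-sample mean, complete-case mean) of x2 under MCAR."""
    rng = random.Random(seed)
    # Hypothetical data: x2 depends on x1; the true mean of x2 is 10.
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x2 = [10.0 + 2.0 * a + rng.gauss(0.0, 1.0) for a in x1]
    # MCAR: each x2 is dropped with the same probability, whatever x1 or x2 is.
    observed = [rng.random() >= miss_prob for _ in range(n)]
    complete_cases = [b for b, keep in zip(x2, observed) if keep]
    return statistics.fmean(x2), statistics.fmean(complete_cases)

full_mean, cc_mean = simulate_mcar()
print(f"full mean = {full_mean:.3f}, complete-case mean = {cc_mean:.3f}")
```

The two means should agree up to sampling noise, because the complete cases are a simple random subsample of all cases.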
10?????????????(2)
- Missing at Random (MAR)
- $P(r \mid x) = P(r \mid x_{obs})$
- ??????????? ? xobs????????,
- xmis?????
- the joint distribution of the observed data
-
- ????,MCAR???
11 Growth Curve Data (Potthoff & Roy, 1964)
x8
means the value was made missing through a MAR mechanism by Little (1987)
12Missing at Random (MAR)
-
- ????xobs?,????xmis???????r?????
- ??????,?????????????????????
- ????xmis???????r?????????????xobs???????????????????, ????????????????????
- ???????????xmis???????????????????, MAR?????????????????
13MAR ??????, non-response bias ??????
- CC(Complete-case)?
- ????????????
??????? Stratification Weighting
?????????,????????????????????????
14????MCAR????????????????????????
- Observed variables
- Response Propensity ????????
-
- Predicted Mean ?????
15Response Propensity ???
- Probability of being missing, based on the covariates.
- Missing at Random
Rosenbaum & Rubin (1983)
and
approximately
16Propensity ??????????
- ?????????????????????????????( Propensity???)??
- ???????Propensity????????????????
- Propensity???????????????????,?????
- Propensity????????,???????????????????????????,?
???????????????????????
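Weighting by response propensity can be sketched with a single binary covariate. The example below is an illustration under stated assumptions, not the deck's own procedure: the covariate `g`, the response rates 0.9 and 0.3, and the outcome model are hypothetical, and the propensity is estimated simply as the observed response rate within each stratum (a weighting-class adjustment).

```python
import random
import statistics

def weighting_class_demo(n=40000, seed=1):
    """Return (complete-case mean, propensity-weighted mean) of y under MAR."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        g = 1 if rng.random() < 0.5 else 0      # fully observed covariate
        y = rng.gauss(5.0 + 4.0 * g, 1.0)       # outcome; true mean is 7
        p_resp = 0.9 if g == 0 else 0.3         # MAR: response depends on g only
        rows.append((g, y, rng.random() < p_resp))

    cc_mean = statistics.fmean([y for _, y, r in rows if r])

    weighted_sum = 0.0
    weight_total = 0.0
    for stratum in (0, 1):
        members = [(y, r) for g, y, r in rows if g == stratum]
        # Estimated response propensity = response rate in the stratum.
        rate = sum(r for _, r in members) / len(members)
        for y, r in members:
            if r:
                weighted_sum += y / rate        # each respondent stands for 1/rate cases
                weight_total += 1.0 / rate
    return cc_mean, weighted_sum / weight_total

cc_mean, weighted_mean = weighting_class_demo()
print(f"complete-case mean = {cc_mean:.2f}, weighted mean = {weighted_mean:.2f}")
```

In this setup the complete-case mean is pulled toward the stratum that responds more often (about 6 rather than 7), while the weighted mean should land near the true value.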
17???????????????
- ??????????? Complete Case (CC) ?
- MCAR???, MAR????????
- ?????? Pairwise Deletion
- ????????????????????????,???????????
- ???????? Imputation Method
- ????????????(?????,????,)
- ???????????
- ???
- ????
18?????? Pairwise Deletion
- ????????????????,???????
- ?????????????????,????????
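As a minimal sketch of pairwise deletion (available-case analysis), each covariance below is computed from the cases where both variables of that pair are observed. The helper name `pairwise_cov` and the toy data are assumptions for illustration; missing values are coded as `None`.

```python
def pairwise_cov(a, b):
    """Sample covariance of a and b from the cases where both are observed.

    Returns (covariance, number of cases used).
    """
    pairs = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
    n = len(pairs)
    mean_a = sum(u for u, _ in pairs) / n
    mean_b = sum(v for _, v in pairs) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in pairs) / (n - 1)
    return cov, n

# Hypothetical data with different missing patterns per variable.
x1 = [1.0, 2.0, 3.0, 4.0, None]
x2 = [2.0, None, 6.0, 8.0, 10.0]
x3 = [None, 1.0, 0.0, 1.0, 3.0]

cov12, n12 = pairwise_cov(x1, x2)   # uses cases 1, 3, 4
cov23, n23 = pairwise_cov(x2, x3)   # uses cases 3, 4, 5
print(cov12, n12, cov23, n23)
```

Because each entry is based on a different subset of cases, a covariance matrix assembled this way is not guaranteed to be positive semi-definite.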
19??????
- ???????????????
- ????????????????????
- ??????????????,?????
- ???CC?(???)????????????
20Imputation(???)
- ??????????????
- ????????????
- Marginal or Conditional imputation
- Explicit or Implicit model imputation
- Deterministic or Stochastic imputation (using random numbers)
- Univariate or Multivariate imputation
- Single or Multiple imputation
21 2?????
- Full loglikelihood
- ??????????????
- Partial loglikelihood
- ????????????
-
????? partial likelihood ??????????? ?
22Ignorability Rubin(1977)
- ??????????,??????????????
- ?????? ?
- Sufficient conditions for ignorability
- MAR
- ???????????????? ($\phi$) ?????????? ($\theta$) ????
- ??? MAR ??????????,ML? $L_{partial}$ ?????????,??? efficient ??????
- MAR ? key condition
- The richer the observed data $x_{obs}$, the more plausible the MAR assumption
- NMAR ? more plausible, ???,???????????????????????????
23Missing at Random
Partial loglikelihood ????????????
has a much simpler form than
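The simplification under MAR can be written out as follows, assuming $\theta$ denotes the parameter of the data model and $\phi$ the parameter of the missingness mechanism (the likelihood is shown; taking logs turns the product into a sum):

```latex
\begin{align*}
L_{\text{full}}(\theta, \phi \mid x_{obs}, r)
  &= \int f(x_{obs}, x_{mis} \mid \theta)\,
          P(r \mid x_{obs}, x_{mis}, \phi)\, dx_{mis} \\
  &= P(r \mid x_{obs}, \phi) \int f(x_{obs}, x_{mis} \mid \theta)\, dx_{mis}
     \qquad \text{(MAR)} \\
  &= P(r \mid x_{obs}, \phi)\; L_{\text{partial}}(\theta \mid x_{obs})
\end{align*}
```

When $\theta$ and $\phi$ are distinct, the first factor does not involve $\theta$, so maximizing $L_{\text{partial}}$ gives the same estimate of $\theta$ as maximizing $L_{\text{full}}$.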
24Excel???
- ????????????
- ????
- EM???????????
25 EM algorithm
- A general algorithm for incomplete data problems that provides an interesting link with imputation methods
- $\theta^{(k)}$ converges to a maximum likelihood estimate of $\theta$ based on $L_{partial}$, if a unique finite MLE of $\theta$ exists.
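26 DLR (Dempster, Laird & Rubin, 1977)
- E-step: calculate the conditional expectation of $L_c(\theta)$, given the observed data and the current estimate $\theta^{(k)}$
- M-step: find the $\theta$ that maximizes the conditional expectation calculated in the previous E-step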
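The E-step and M-step can be sketched for a bivariate normal sample in which x1 is complete and x2 is missing for some cases. This is a minimal illustration under assumptions of my own (the function name `em_bivariate_normal`, the simulated data, the MAR mechanism with missing probabilities 0.8 and 0.2, and a fixed number of iterations instead of a convergence check), not a reconstruction of the deck's example.

```python
import random

def em_bivariate_normal(x1, x2, n_iter=100):
    """ML estimates (mu1, mu2, s11, s12, s22); entries of x2 may be None."""
    n = len(x1)
    obs = [b for b in x2 if b is not None]
    # x1 is complete, so its ML mean and variance need no iteration.
    mu1 = sum(x1) / n
    s11 = sum((a - mu1) ** 2 for a in x1) / n
    # Starting values taken from the available x2 values.
    mu2 = sum(obs) / len(obs)
    s22 = sum((b - mu2) ** 2 for b in obs) / len(obs)
    s12 = 0.0
    for _ in range(n_iter):
        # E-step: expected sufficient statistics given x_obs and current theta.
        beta = s12 / s11
        resid_var = s22 - s12 ** 2 / s11
        t2 = t22 = t12 = 0.0
        for a, b in zip(x1, x2):
            if b is None:
                e_b = mu2 + beta * (a - mu1)     # E[x2 | x1]
                e_bb = e_b ** 2 + resid_var      # E[x2^2 | x1]
            else:
                e_b, e_bb = b, b ** 2
            t2 += e_b
            t22 += e_bb
            t12 += a * e_b
        # M-step: complete-data ML formulas applied to the expected statistics.
        mu2 = t2 / n
        s22 = t22 / n - mu2 ** 2
        s12 = t12 / n - mu1 * mu2
    return mu1, mu2, s11, s12, s22

# Hypothetical data: true mean of x2 is 5, true covariance of (x1, x2) is 1.5.
rng = random.Random(42)
n = 4000
x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x2_full = [5.0 + 1.5 * a + rng.gauss(0.0, 1.0) for a in x1]
# MAR: the chance that x2 is missing depends only on the observed x1.
x2 = [None if rng.random() < (0.8 if a > 0 else 0.2) else b
      for a, b in zip(x1, x2_full)]

cc = [b for b in x2 if b is not None]
cc_mean = sum(cc) / len(cc)
mu1, mu2, s11, s12, s22 = em_bivariate_normal(x1, x2)
print(f"complete-case mean of x2 = {cc_mean:.2f}, EM estimate = {mu2:.2f}")
```

Note that the E-step fills in expected sufficient statistics (including the residual variance term in E[x2^2 | x1]), not just the missing values themselves; this is what distinguishes EM from plain conditional mean imputation.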
27EM ???(Ignorable case)
- ?????????????
- ?????????????
- ???????????????????
- Logistic ?? (missing covariates)
- Unbalanced repeated-measures models with structured covariance and with missing data
- ???????
28??????????????
29 Sufficient statistics
30 Sufficient statistics
31
32MAR????????
33?????(MAR?????)
[Figure; labels: x1, x2, μ2]
34?????(MAR???)
[Figure; labels: x1, x2, μ1]
35MAR???
36??(??)
37??(r = 0.8)
38??
39??
40????
41??????????????
42??
43???
- ?????????? unique solution ???
- ???????????,sensitivity check ??
- ML ??,MAR????OK
- MAR??????????????????????
-
44Imputation(???)
- ??????????????
- ????????????
- Marginal or Conditional imputation
- Explicit or Implicit model imputation
- Deterministic or Stochastic imputation (using random numbers)
- Univariate or Multivariate imputation
- Single or Multiple imputation
45Mean Imputation (Unconditional) ?????????
- Available cases for each mean
- MCAR???????????
?????????????????
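Unconditional mean imputation can be sketched in a few lines. The helper `mean_impute` and the toy data are assumptions for illustration; the point is that the filled-in data keep the available-case mean but have a smaller variance.

```python
import statistics

def mean_impute(values):
    """Replace None by the mean of the available (observed) values."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]

x = [2.0, 4.0, 6.0, 8.0, None, None]
filled = mean_impute(x)

var_observed = statistics.variance([v for v in x if v is not None])
var_filled = statistics.variance(filled)
print(filled)
print(var_observed, var_filled)
```

Here the sample variance drops from 20/3 for the four observed values to 4.0 for the six filled-in values, since the two imputed points sit exactly at the mean.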
46Mean Imputation (Conditional) ???????????
- Conditional on observed values in case
- Regress $X_p$ on $(X_1, X_2, \ldots, X_{p-1})$
- Impute predictions
??????,????,??, ?????????????????????????
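A minimal sketch of conditional mean imputation for the simplest case, one predictor (p = 2): regress x2 on x1 using the complete cases and impute the fitted values. The function name `regression_impute` and the data are hypothetical.

```python
def regression_impute(x1, x2):
    """Fill None entries of x2 with predictions from a least-squares line."""
    cc = [(a, b) for a, b in zip(x1, x2) if b is not None]  # complete cases
    n = len(cc)
    mean_a = sum(a for a, _ in cc) / n
    mean_b = sum(b for _, b in cc) / n
    sxx = sum((a - mean_a) ** 2 for a, _ in cc)
    sxy = sum((a - mean_a) * (b - mean_b) for a, b in cc)
    slope = sxy / sxx
    intercept = mean_b - slope * mean_a
    # Observed values are kept; missing ones get the predicted mean.
    return [intercept + slope * a if b is None else b for a, b in zip(x1, x2)]

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 3.9, 6.2, 7.8, None, None]
filled = regression_impute(x1, x2)
print(filled)
```

For these numbers the fitted line has slope 1.94 and intercept 0.15, so the two imputed values are about 9.85 and 11.79. They lie exactly on the line, which is why imputed data of this kind carry no residual variance.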
47Mean Imputation??(????)??????
- Marginal distributions and associations distorted (no residual variance)
- Conditional better than unconditional
- Standard errors from filled-in data too small
- no residual variance
- n actually smaller
- uncertainty of prediction
Stochastic Imputation
48Cold deck?? Hot deck?(??????)
- Cold deck ?
- ?????????????????????
- Hot deck ?
- ?????????????(???)????????
- ???????????????,?????????
- ??????????????
-
- Exact matching vs. Random matching
- ???????????
-
49Deterministic imputation(??????)
- Hot deck and Cold deck methods
- Overall (unconditional) mean
- Group (adjusted cell) mean
- Predictive mean by regression model
More accurate, but distorts the distribution: the distribution becomes too peaked and the variance is underestimated
50Stochastic imputation?????
- ????????????
- ?????????(????????????)
- Ex.
- Add a random residual from $N(\mu, \sigma^2)$
- Stochastic predictive mean imputation
- ???????????????
- Impute the value of a randomly selected case
- Random hot deck method
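The random hot deck idea, imputing the value of a randomly selected case, might look like the sketch below. Drawing donors only from the same adjustment cell is an assumption added here, as are the function name `random_hot_deck` and the data.

```python
import random

def random_hot_deck(values, cells, seed=0):
    """Fill None entries with the value of a random donor from the same cell."""
    rng = random.Random(seed)
    donors = {}
    for v, c in zip(values, cells):
        if v is not None:
            donors.setdefault(c, []).append(v)
    # Raises KeyError if a cell has no observed donor at all.
    return [v if v is not None else rng.choice(donors[c])
            for v, c in zip(values, cells)]

values = [1.0, 2.0, None, 10.0, None, 12.0]
cells = ["a", "a", "a", "b", "b", "b"]
filled = random_hot_deck(values, cells)
print(filled)
```

Every imputed value is an actually observed one, so no parametric model for the variable is needed; the choice of cells plays the role of the model.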
51 Stochastic Predictive Mean Imputation (Imputation from a Distribution)
- Add a random residual from $N(\mu, \sigma^2)$ to the predictive mean
- Impute
cf. Predictive Mean Matching (more robust to misspecification), Predictive Mean Stratification, Random Hot Deck
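The effect of adding a random residual to the predictive mean can be checked by simulation. In this sketch the residual is drawn with mean zero and with the standard deviation estimated from the complete-case regression; both choices, the helper names `fit_line` and `impute`, and the simulated data are assumptions for illustration.

```python
import random
import statistics

def fit_line(xs, ys):
    """Least-squares intercept, slope and residual standard deviation."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sigma = (sum(e * e for e in resid) / (len(xs) - 2)) ** 0.5
    return intercept, slope, sigma

def impute(x1, x2, rng=None):
    """Fill None entries of x2; add N(0, sigma^2) noise when rng is given."""
    cc = [(a, b) for a, b in zip(x1, x2) if b is not None]
    intercept, slope, sigma = fit_line([a for a, _ in cc], [b for _, b in cc])
    out = []
    for a, b in zip(x1, x2):
        if b is None:
            b = intercept + slope * a          # predictive mean
            if rng is not None:
                b += rng.gauss(0.0, sigma)     # random residual
        out.append(b)
    return out

# Hypothetical data: the true variance of x2 is 1 + 4 = 5; half is missing (MCAR).
rng = random.Random(7)
n = 10000
x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x2_full = [1.0 + a + rng.gauss(0.0, 2.0) for a in x1]
x2 = [None if rng.random() < 0.5 else b for b in x2_full]

var_det = statistics.variance(impute(x1, x2))        # deterministic
var_sto = statistics.variance(impute(x1, x2, rng))   # stochastic
print(f"deterministic: {var_det:.2f}, stochastic: {var_sto:.2f}")
```

With these settings the deterministic version should understate the variance of x2 noticeably (roughly 3 instead of 5), while the stochastic version should come out close to 5.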
52?????????
- ??(1??????)???????????????????????
- ?????????????????
- Imputation??????????????????,
- single value stochastic imputation???
- multiple imputation
53Imputation(???)
- ??????????????
- ????????????
- Marginal or Conditional imputation
- Explicit or Implicit model imputation
- Deterministic or Stochastic imputation (using random numbers)
- Univariate or Multivariate imputation
- Single or Multiple imputation
54Multiple Imputation
- ???(M)???????
- ????????? M ?? ????
- ???(M?????????????)?????1??????????????????????
55Multiple Imputation
Combined Estimator
Total variability
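The combined estimator and total variability are commonly computed with Rubin's combining rules: the point estimate is the average of the M completed-data estimates, and the total variance is the average within-imputation variance plus (1 + 1/M) times the between-imputation variance. The sketch below assumes these standard rules are the ones intended; the function name `combine` and the numbers are hypothetical.

```python
import statistics

def combine(estimates, variances):
    """Combine M completed-data estimates and their variances.

    Returns (combined estimate, total variance).
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)        # combined estimator
    w_bar = statistics.fmean(variances)        # within-imputation variance
    b = statistics.variance(estimates)         # between-imputation variance (divisor M - 1)
    total = w_bar + (1.0 + 1.0 / m) * b        # total variability
    return q_bar, total

# Hypothetical results from M = 3 imputed data sets.
q_bar, total_var = combine([10.0, 11.0, 12.0], [1.0, 1.0, 1.0])
print(q_bar, total_var)
```

Here the combined estimate is 11.0 and the total variance is 1 + (4/3) * 1 = 7/3, larger than any single completed-data variance because it also reflects the uncertainty due to the missing values.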
56MI???????????
?????????
Rubin & Schenker (1986), JASA
57????????????
58MI?????????
- ????????????????????
- ???????????????????????????
- MI???????????????????
- MI???????????
- ?????????SE????
- ???????????????????????????