Title: PROJECTS ARE DUE
1. PROJECTS ARE DUE
- By midnight, Friday, May 19th
- Electronic submission only to tlouis_at_jhsph.edu
- Please name the file
- myname-project.filetype
- or
- name1_name2-project.filetype
2. Efficiency-Robustness Trade-offs
- First, we consider alternatives to the Gaussian distribution for random effects
- Then, we move to issues of weighting, starting with some formalism
- Then, we move to an example of informative sample size
- And, finally, we give a basic example that has broad implications for choosing among weighting schemes
3. Alternatives to the Gaussian Distribution for Random Effects
4. The t-distribution
- Broader tails than the Gaussian
- So, shrinks less for deviant Y-values
- The t-prior allows outlying parameters, so a deviant Y is not so indicative of a large level-1 residual
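The "shrinks less for deviant Y-values" point can be checked numerically. The following is a minimal sketch, not taken from the slides: it assumes a single observation Y that is Gaussian around its parameter with standard deviation 1, a t prior with 3 degrees of freedom centered at 0 with scale 1, and brute-force grid integration. The function names and all numeric settings are illustrative assumptions.

```python
import math

def t_prior_unnormalized(theta, df=3.0, scale=1.0):
    """Student-t density kernel; the normalizing constant cancels in the ratio below."""
    return (1.0 + (theta / scale) ** 2 / df) ** (-(df + 1.0) / 2.0)

def posterior_mean(y, df=3.0, scale=1.0, sigma=1.0, half_width=60.0, step=0.01):
    """E[theta | Y = y] by grid integration (assumed setup: prior centered at 0)."""
    n_points = int(2 * half_width / step) + 1
    num = 0.0
    den = 0.0
    for i in range(n_points):
        theta = -half_width + i * step
        # prior kernel times Gaussian likelihood kernel
        weight = t_prior_unnormalized(theta, df, scale) * math.exp(
            -0.5 * ((y - theta) / sigma) ** 2
        )
        num += theta * weight
        den += weight
    return num / den

def retained_fraction(y, **kwargs):
    """Posterior mean divided by y: the share of the observed deviation that is kept."""
    return posterior_mean(y, **kwargs) / y

for y in (1.0, 3.0, 5.0):
    print(f"Y = {y:.0f}: fraction of Y retained = {retained_fraction(y):.3f}")
```

With a Gaussian prior the retained fraction would be the same for every Y; under these assumptions the t prior lets it grow as Y moves away from the center.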
5. Creating a t-distribution
- Assume a Gaussian sampling distribution
- Using the sample standard deviation produces the t-distribution
- Z is t with a large df
- $t_3$ is the most different from Z among t-distributions with a finite variance
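A small simulation can illustrate these bullets. This is a sketch under stated assumptions (standard Gaussian data, a cutoff of 3, 20000 replications, a fixed seed); the helper names are hypothetical. It standardizes a sample mean by the sample standard deviation and compares the tail frequency with the Gaussian tail.

```python
import math
import random

def t_statistic(n, rng, mu=0.0, sigma=1.0):
    """One draw of sqrt(n) * (ybar - mu) / s from n Gaussian observations."""
    y = [rng.gauss(mu, sigma) for _ in range(n)]
    ybar = sum(y) / n
    s = math.sqrt(sum((v - ybar) ** 2 for v in y) / (n - 1))  # sample SD
    return math.sqrt(n) * (ybar - mu) / s

def tail_fraction(n, cutoff=3.0, draws=20000, seed=1):
    """Monte Carlo estimate of P(|t| > cutoff) for df = n - 1."""
    rng = random.Random(seed)
    hits = sum(abs(t_statistic(n, rng)) > cutoff for _ in range(draws))
    return hits / draws

gaussian_tail = math.erfc(3.0 / math.sqrt(2.0))  # P(|Z| > 3), about 0.0027
print("P(|Z| > 3)       :", round(gaussian_tail, 4))
print("n = 4  (df = 3)  :", tail_fraction(4))
print("n = 31 (df = 30) :", tail_fraction(31))
```

The df = 3 case puts far more mass beyond the cutoff than Z does, while the df = 30 case is already much closer to the Gaussian value.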
7. With a t-prior, B is B(Y), increasing with Y - $\mu$
8. (1 - B) = 1/2 = 0.50
Z is distance from the center
9. (1 - B) = 2/3 ≈ 0.667
Z is distance from the center
10. Estimated Gaussian and Fully Non-parametric Priors for the USRDS data
11. USRDS estimated priors
17. Informative Sample Size (similar to informative censoring)
See Louis et al. SMMR 2006
25. Choosing among weighting schemes: Optimality versus goal achievement
26. Inferential Context
- Question
- What is the average length of in-hospital stay?
- A more specific question
- What is the average length of stay for
- Several hospitals of interest?
- Maryland hospitals?
- All hospitals?
- .......
27. Data Collection and Goal
- Data gathered from 5 hospitals
- Hospitals are selected by some method
- n_hosp patient records are sampled at random
- Length of stay (LOS) is recorded
- Goal is to estimate the population mean
28. Procedure
- Compute hospital-specific means
- Average them
- For simplicity, assume that the population variance is known and the same for all hospitals
- How should we compute the average?
- Need a goal and then a good/best way to combine information
29. DATA
Hospital   # sampled   Hospital   % of total size   Mean   Within-hospital
           (n_hosp)    size       (100 p_hosp)      LOS    variance
1          30          100        10                25     $\sigma^2/30$
2          60          150        15                35     $\sigma^2/60$
3          15          200        20                15     $\sigma^2/15$
4          30          250        25                40     $\sigma^2/30$
5          15          300        30                10     $\sigma^2/15$
Total      150         1000       100               ?      ?
30. Weighted averages and variances (variances are based on FE, not RE)
Each weighted average is $\sum_k w_k \times \text{mean}_k$
Weighting approach                     Weights x100      Mean   Variance ratio 100(Var/min)
Equal                                  20 20 20 20 20    25.0   130
Proportional to reciprocal variance    20 40 10 20 10    29.5   100
Population (p_hosp)                    10 15 20 25 30    23.8   172
Reciprocal variance weights minimize variance. Is that our goal?
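The means and variance ratios in this table can be reproduced from the five-hospital data. The sketch below assumes the fixed-effects variance of a weighted average is the common variance times the sum of squared weights divided by sample sizes, and reports variances in units of that common variance; function and variable names are illustrative.

```python
n = [30, 60, 15, 30, 15]              # records sampled per hospital
p = [0.10, 0.15, 0.20, 0.25, 0.30]    # hospital share of total size
mean_los = [25, 35, 15, 40, 10]       # hospital-specific mean LOS

def normalize(raw):
    """Rescale non-negative numbers so they sum to 1."""
    total = sum(raw)
    return [r / total for r in raw]

def weighted_mean(w, means):
    return sum(wk * mk for wk, mk in zip(w, means))

def variance_per_sigma2(w, sizes):
    """Variance of the weighted average divided by the common variance."""
    return sum(wk ** 2 / nk for wk, nk in zip(w, sizes))

schemes = {
    "equal": normalize([1] * len(n)),
    "reciprocal variance": normalize(n),  # 1 / (variance / n_k) is proportional to n_k
    "population": p,
}

min_var = min(variance_per_sigma2(w, n) for w in schemes.values())
for name, w in schemes.items():
    ratio = 100 * variance_per_sigma2(w, n) / min_var
    print(f"{name:20s} mean = {weighted_mean(w, mean_los):6.2f}  ratio = {ratio:5.0f}")
```

The population-weighted mean prints as 23.75, which the table rounds to 23.8.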
31. There are many weighting choices and weighting goals
- Minimize variance by using reciprocal variance weights
- Minimize bias for the population mean by using population weights (survey weights)
- Use policy weights (e.g., equal weighting)
- Use my weights, ...
32. General Setting
- When the model is correct
- All weighting schemes estimate the same quantities
- same value for slopes in a multiple regression
- So, it is clearly best to minimize variance by using reciprocal variance weights
- When the model is incorrect
- Must consider analysis goals and use appropriate weights
- Of course, it is generally true that our model is not correct!
33. Weights and their properties
- If $\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = \mu$, then all weighted averages estimate the population mean $\mu = \sum_k p_k \mu_k$
- So, it's best to minimize the variance
- But, if the hospital-specific $\mu_k$ are not all equal, then
- Each set of weights estimates a different target
- Minimizing variance might not be best
- For an unbiased estimate of the population mean, set $w_k = p_k$
34. The variance-bias tradeoff
- General idea
- Trade off variance and bias to produce low Mean Squared Error (MSE)
- $\mathrm{MSE} = E[(\text{Estimate} - \text{True})^2] = \text{Variance} + (\text{Bias})^2$
- Bias is unknown unless we know the $\mu_k$ (the true hospital-specific mean LOS)
- But, we can study $\mathrm{MSE}(\mu, w, p)$
- In practice, make some guesses and do sensitivity analyses
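The "make some guesses" step might look like the sketch below. It is illustrative only: the guessed hospital means (here set equal to the observed means in one scenario and to a common value in another) and the common within-hospital variance of 100 are assumptions, not values from the slides.

```python
n = [30, 60, 15, 30, 15]              # records sampled per hospital
p = [0.10, 0.15, 0.20, 0.25, 0.30]    # population shares

def bias_variance_mse(w, p, mu, n, sigma2):
    """Bias, variance and MSE of sum(w_k * ybar_k) as an estimate of sum(p_k * mu_k)."""
    target = sum(pk * mk for pk, mk in zip(p, mu))
    bias = sum(wk * mk for wk, mk in zip(w, mu)) - target
    variance = sigma2 * sum(wk ** 2 / nk for wk, nk in zip(w, n))
    return bias, variance, variance + bias ** 2

weights = {
    "equal": [0.2] * 5,
    "reciprocal variance": [nk / sum(n) for nk in n],
    "population": p,
}

scenarios = {
    "guess: means differ": [25, 35, 15, 40, 10],  # assumed true means (hypothetical)
    "guess: means equal": [25, 25, 25, 25, 25],
}

SIGMA2 = 100.0  # assumed common within-hospital variance (hypothetical)
for label, mu in scenarios.items():
    print(label)
    for name, w in weights.items():
        b, v, mse = bias_variance_mse(w, p, mu, n, SIGMA2)
        print(f"  {name:20s} bias = {b:6.2f}  var = {v:5.2f}  MSE = {mse:6.2f}")
```

Under the first guess the population weights are unbiased and win on MSE; under the second every scheme is unbiased and the reciprocal variance weights win. Repeating this over a range of guesses is one simple form of sensitivity analysis.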
35. Variance, Bias and MSE as a function of (the $\mu_k$, w, p)
- Consider a true value for the variation of the between-hospital means ($\mu$ is the overall mean)
- $T = \sum_k (\mu_k - \mu)^2$
- Study bias, variance and MSE for weights that optimize MSE for an assumed value (A) of the between-hospital variance
- So, when A = T, MSE is minimized by this optimizer
- In the following plot, A is converted to a fraction of the total variance, A/(A + within-hospital)
- Fraction = 0 $\Rightarrow$ minimize variance
- Fraction = 1 $\Rightarrow$ minimize bias
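The slides do not show the optimizer's formula, so the following is only one possible sketch of compromise weights. It assumes the hospital means vary independently around the overall mean with variance A, in which case the expected MSE for the population mean is $\sigma^2 \sum_k w_k^2/n_k + A \sum_k (w_k - p_k)^2$, and it minimizes that expression subject to the weights summing to 1. The function name, the within-hospital variance of 100, and the values of A tried are all assumptions.

```python
def compromise_weights(n, p, sigma2, a):
    """Weights minimizing sigma2 * sum(w^2 / n) + a * sum((w - p)^2) with sum(w) = 1.

    Assumed model (not from the slides): hospital means scatter around the
    overall mean with variance `a`. Closed form from a Lagrange multiplier:
    w_k = (a * p_k + c) / (sigma2 / n_k + a), with c set so the weights sum to 1.
    """
    inv = [1.0 / (sigma2 / nk + a) for nk in n]
    base = [a * pk * ik for pk, ik in zip(p, inv)]
    c = (1.0 - sum(base)) / sum(inv)
    return [bk + c * ik for bk, ik in zip(base, inv)]

n = [30, 60, 15, 30, 15]
p = [0.10, 0.15, 0.20, 0.25, 0.30]
SIGMA2 = 100.0  # assumed within-hospital variance (hypothetical)

for a in (0.0, 1.0, 10.0, 100.0):
    w = compromise_weights(n, p, SIGMA2, a)
    print(f"A = {a:6.1f}: weights x100 =", [round(100 * wk, 1) for wk in w])
```

At A = 0 the sketch returns the reciprocal variance weights, and as A grows the weights approach the population shares, matching the two endpoints named in the bullets.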
36. The bias-variance trade-off
[Figure: X-axis is the assumed variance fraction; Y-axis is performance computed under the true fraction]
37. Summary
- Much of statistics depends on weighted averages
- Weights should depend on assumptions and goals
- If you trust your (regression) model,
- Then, minimize the variance, using optimal weights
- This generalizes the equal $\mu$ case
- If you worry about model validity (bias for $\mu_p$),
- You can buy full insurance by using population weights
- But, you pay in variance (efficiency)
- So, consider purchasing only the insurance you need by using compromise weights