Statistical Guidelines for Psychosomatic Medicine: A brief commentary


Reporting Results
  • Lay out analytic plan
      • Explicitly tie analysis to hypothesis
      • Include the exact model
      • Discuss assumptions
      • Discuss power
      • Correction for multiplicity; if not, why not?
  • Tables
      • Report exact p-values
      • Round, round and round some more
      • Mention scale in regression tables
      • Model fit, if relevant
  • Graphics
      • Avoid ducks
      • No 3-d unless data are 3-d
      • Box or dot plots preferred to bar charts
One-sided (Directional) Hypothesis Tests
  • Controversial
  • Two-sided typically preferred because it covers unexpected results
  • Argument is that one-sided can be used if
    unexpected result or no difference would not lead
    to different action or suggest risk
  • Need to justify deviation
  • What's wrong with higher p-value for new ideas?

Artificial Categorization of Variables
  • Long literature outlining problems with this
  • In population, by definition reduces power
  • In samples, can get a lucky cut
  • Does NOT improve reliability
  • Doesn't make measurement sense
  • Hides non-linear relations
  • Can yield spurious results in multivariable models

Type I error rates for the relation between x2
and y after dichotomizing two continuous
predictors. Maxwell and Delaney (21) calculated
the effect of dichotomizing two continuous
predictors as a function of the correlation
between them. The true model is y = .5x1 + 0x2
where all variables are continuous. If x1 and x2
are dichotomized, the error rate for the relation
between x2 and y increases as the correlation
between x1 and x2 increases.
         Correlation between x1 and x2
N        0       .3      .5      .7
50       .05     .06     .08     .10
100      .05     .08     .12     .18
200      .05     .10     .19     .31
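
The pattern in the table can be explored with a small simulation. The following is a minimal sketch, not Maxwell and Delaney's own procedure: it assumes standard normal predictors, unit-variance noise, median splits, and an approximate critical value of 1.97 for the t-test, so its rates should not be expected to match the table exactly. The function name and its parameters are illustrative.

```python
import numpy as np

def type1_rate_after_median_split(rho, n=200, n_sims=1000, seed=0):
    """Share of simulations in which dichotomized x2 looks 'significant'
    even though its true coefficient is 0."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    t_crit = 1.97  # assumption: approximate two-sided 5% cutoff for ~197 df
    rejections = 0
    for _ in range(n_sims):
        x = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        # True model: y = .5*x1 + 0*x2 + noise (unit-variance noise is an assumption)
        y = 0.5 * x[:, 0] + rng.standard_normal(n)
        # Median split of both continuous predictors
        d1 = (x[:, 0] > np.median(x[:, 0])).astype(float)
        d2 = (x[:, 1] > np.median(x[:, 1])).astype(float)
        X = np.column_stack([np.ones(n), d1, d2])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - X.shape[1])
        se_d2 = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[2, 2])
        if abs(beta[2] / se_d2) > t_crit:
            rejections += 1
    return rejections / n_sims

for rho in (0.0, 0.3, 0.5, 0.7):
    print(rho, type1_rate_after_median_split(rho))
```

The mechanism: the split version of x1 discards part of x1, and the split version of x2 picks up some of what was discarded because the two predictors are correlated.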
Artificial Categorization of Variables
  • If true category, use something like clustering,
    not median splits
  • If expect nonlinearity, use polynomials or
    splines (splitting into quartiles, etc., is
    acceptable, but increases standard errors)
  • Clinical cutpoints should not figure into
    statistical modeling until the model is already
    developed with ALL the data
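
To illustrate how a median split can hide a non-linear relation that a polynomial recovers, here is a minimal sketch on simulated data. The U-shaped true relation, the sample size, and the helper `r_squared` are assumptions made for illustration only.

```python
import numpy as np

def r_squared(y, fitted):
    """Proportion of variance in y explained by the fitted values."""
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=300)
y = x ** 2 + rng.standard_normal(300)  # assumed U-shaped relation plus noise

# Median split: each half of x is predicted by the mean of y in that half
high = x > np.median(x)
fitted_split = np.where(high, y[high].mean(), y[~high].mean())

# Quadratic polynomial fitted to the continuous x
coefs = np.polyfit(x, y, deg=2)
fitted_poly = np.polyval(coefs, x)

r2_split = r_squared(y, fitted_split)
r2_poly = r_squared(y, fitted_poly)
print("median split R^2:", round(r2_split, 3))
print("quadratic R^2:   ", round(r2_poly, 3))
```

Because the simulated relation is symmetric around the median, the two halves have nearly the same mean of y, so the split model explains almost nothing while the quadratic fit captures the curve.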

Chatfield, C. (1995). Model uncertainty, data mining and statistical
inference (with discussion). JRSSA, 158, 419-466.
  • Bias by selecting model because it fits the data well; bias in
    standard errors
  • P. 420: ... need for a better balance in the literature and in
    statistical teaching between techniques and problem solving
    strategies.
  • P. 421: It is 'well known' to be 'logically unsound and
    practically misleading' (Zhang, 1992) to make inferences as if a
    model is known to be true when it has, in fact, been selected
    from the same data to be used for estimation purposes. However,
    although statisticians may admit this privately (Breiman (1992)
    calls it a 'quiet scandal'), they (we) continue to ignore the
    difficulties because it is not clear what else could or should
    be done.
  • P. 421: Estimation errors for regression coefficients are usually
    smaller than errors from failing to take into account model
    specification.
  • P. 422: Statisticians must stop pretending that model uncertainty
    does not exist and begin to find ways of coping with it.
  • P. 426: It is indeed strange that we often admit model
    uncertainty by searching for a best model but then ignore this
    uncertainty by making inferences and predictions as if certain
    that the best fitting model is actually true.
  • P. 427: The analyst needs to assess the model selection process
    and not just the best fitting model.
  • P. 432: The use of subset selection methods is well known to
    introduce alarming biases.
  • P. 433: ... the AIC can be highly biased in data-driven model
    selection situations.
  • P. 434: Prediction intervals will generally be too narrow.
  • In the discussion, Jamal R. M. Ameen states that a model should
    be (a) satisfactory in performance relative to the stated
    objective, (b) logically sound, (c) representative, (d)
    questionable and subject to on-line interrogation, (e) able to
    accommodate external or expert information and (f) able to convey

Automated Stepwise Selection Procedures
  • Can lead to wildly optimistic models
  • Doesn't deal well with correlated predictors
  • Extremely poor replication unless sample sizes
    are huge
  • Best subset has similar problems

SOME of the problems with stepwise variable selection:
  1. It yields R-squared values that are badly biased high
  2. The F and chi-squared tests quoted next to each variable on the
     printout do not have the claimed distribution
  3. The method yields confidence intervals for effects and predicted
     values that are falsely narrow
  4. It yields P-values that do not have the proper meaning, and the
     proper correction for them is a very difficult problem
  5. It gives biased regression coefficients that need shrinkage (the
     coefficients for remaining variables are too large)
  6. It has severe problems in the presence of collinearity
  7. It is based on methods (e.g. F tests for nested models) that
     were intended to be used to test pre-specified hypotheses
  8. Increasing the sample size doesn't help very much
  9. It allows us to not think about the problem
[Figure: Simulation results. Number of noise variables by sample size; 20 candidate predictors, 100 samples.]
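
How readily stepwise procedures admit noise can be checked with a short simulation. This is a minimal sketch under stated assumptions, not a reconstruction of the simulation summarized on the slide: every predictor is pure noise, selection is a simple forward search, and the entry threshold `f_to_enter=4.0` is a rough stand-in for a 5% F-test. The function names are illustrative.

```python
import numpy as np

def rss(design, y):
    """Residual sum of squares of the least-squares fit of y on design."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def forward_select(X, y, f_to_enter=4.0):
    """Greedy forward selection: add the predictor that most reduces RSS
    while its partial F statistic exceeds f_to_enter (assumed threshold)."""
    n, p = X.shape
    selected = []
    design = np.ones((n, 1))  # start from the intercept-only model
    current_rss = rss(design, y)
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        trial = {j: rss(np.column_stack([design, X[:, j]]), y) for j in candidates}
        best = min(trial, key=trial.get)
        df_resid = n - design.shape[1] - 1
        f_stat = (current_rss - trial[best]) / (trial[best] / df_resid)
        if f_stat < f_to_enter:
            break
        selected.append(best)
        design = np.column_stack([design, X[:, best]])
        current_rss = trial[best]
    return selected

rng = np.random.default_rng(2)
n, p, n_sims = 50, 20, 100
n_selected = []
for _ in range(n_sims):
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)  # y is unrelated to every candidate predictor
    n_selected.append(len(forward_select(X, y)))

mean_noise_selected = float(np.mean(n_selected))
share_with_any = float(np.mean(np.array(n_selected) > 0))
print("mean number of noise variables selected:", mean_noise_selected)
print("share of runs selecting at least one:", share_with_any)
```

Every variable the search keeps here is noise by construction, yet each would be printed with a nominally significant test statistic.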
Automated Stepwise Selection Procedures
  • If confronted with too many predictors
      • Use theory to delete
      • Combine predictors using clustering or tree
        methods before modeling without looking at Y
      • Use approaches that exploit correlated variables:
        MANOVA, SEM, PLS, Principal Components Regression
  • If you MUST use stepwise
      • Backward preferable
      • Set the p-value to remove high
      • MUST cross-validate
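
As one illustration of reducing correlated predictors without looking at Y, here is a minimal principal components regression sketch. The simulated one-factor data, the choice of `k = 1` component, and all variable names are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
latent = rng.standard_normal(n)
# Six correlated predictors driven by one latent factor (simulated assumption)
X = latent[:, None] + 0.5 * rng.standard_normal((n, 6))
y = latent + rng.standard_normal(n)

# Step 1: data reduction using X only; y is not consulted here
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s ** 2 / np.sum(s ** 2)  # share of predictor variance per component
k = 1                                # number of components, fixed in advance
scores = Xc @ Vt[:k].T

# Step 2: ordinary least squares of y on the retained component scores
design = np.column_stack([np.ones(n), scores])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)

print("variance explained by first component:", round(float(explained[0]), 3))
print("intercept and slope on component 1:", beta)
```

The model spends one degree of freedom on the predictors instead of six, and because the components were chosen without reference to y, the usual inference for the second step is not distorted by the selection.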

Variable Selection in Multivariable Models
  • Fit and p-values for regressions are based on
    assumption of pre-specified model
  • Univariate prescreening requires correction to
    adjust for process
  • P-values should not be sole guide; this is not a
    hypothesis test!
  • Raise model df to reflect all variables searched
  • Cross validation to show level of optimism
  • Use pre-shrinkage
  • Pay attention to effective sample size
  • Too many predictors lead to poor power and
    instability of estimates
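
The "level of optimism" can be estimated by comparing apparent fit with cross-validated fit. The following is a minimal sketch, assuming simulated data in which only one of 15 predictors matters, 5 folds, and R-squared as the fit measure; the helper names are illustrative.

```python
import numpy as np

def r2(y, pred):
    """R-squared of predictions against observed y."""
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

def fit_predict(X_train, y_train, X_test):
    """Fit OLS with an intercept on the training rows, predict the test rows."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ beta

rng = np.random.default_rng(4)
n, p, k_folds = 60, 15, 5
X = rng.standard_normal((n, p))
y = 0.5 * X[:, 0] + rng.standard_normal(n)  # only the first predictor matters

# Apparent fit: model evaluated on the same data used to estimate it
apparent_r2 = r2(y, fit_predict(X, y, X))

# Cross-validated fit: each observation is predicted by a model that never saw it
folds = np.array_split(rng.permutation(n), k_folds)
cv_pred = np.empty(n)
for test_idx in folds:
    train_idx = np.setdiff1d(np.arange(n), test_idx)
    cv_pred[test_idx] = fit_predict(X[train_idx], y[train_idx], X[test_idx])
cv_r2 = r2(y, cv_pred)

optimism = apparent_r2 - cv_r2
print("apparent R^2:", round(apparent_r2, 3))
print("cross-validated R^2:", round(cv_r2, 3))
print("optimism:", round(optimism, 3))
```

If variables were selected by searching, the search itself would have to be repeated inside each fold; otherwise the cross-validated figure is still optimistic.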

[Figure: Simulation results by number of events per predictor. From Peduzzi et al., J Clin Epidemiol, 1996.]
