Title: Biostatistics in Practice
1. Biostatistics in Practice
Session 6: Data and Analyses: Too Little or Too Much
Peter D. Christenson, Biostatistician
http://research.LABioMed.org/Biostat
2. Too Little or Too Much Data
- Too Little
  - Too few subjects: study not sufficiently powered (Session 4)
  - A biasing characteristic not measured: attributability of effects questionable (Session 5)
  - Subjects do not complete study, or do not comply, e.g., do not take all doses (This session)
- Too Much
  - All subjects, not a sample (This session)
  - Irrelevant detectability (This session)
3. Too Little or Too Much Analyses
- Too Few: Miss an Effect
- Too Many: Spurious Results
- Numerous analyses due to:
  - Multiple possible outcomes.
  - Ongoing analyses as more subjects accrue.
  - Many potential subgroups.
4. Non-Completing or Non-Complying Subjects
5. All Study Subjects or Appropriate Subset
What is the most relevant group of studied subjects: all randomized, or mostly compliant, or completed study, or ...?
6. Criteria for Appropriate Subset
Study Goal: Scientific effect? Societal impact? (Primarily compliance)
Potential Biased Conclusions: Why not completed? Study arms equivalent? (Primarily dropout)
7. Possible Study Populations
Per-Protocol Subjects: Had all measurements, visits, doses, etc. Modified: relaxations, e.g., 85% of doses. Emphasis on scientific effect.
Intention-to-Treat Subjects: Everyone who was randomized. Modified: slight relaxations, e.g., at least 1 dose. Emphasis on non-biased policy conclusion.
8. Possible Bias Using Only Completers
Comparison: % cured, placebo vs. treated. Many more placebo subjects are not being cured and go elsewhere, so they do not complete the study. The cure rate is therefore biased upward in placebo completers, and we conclude that the treatment is not as good as it really is. Other scenarios?
9. Intention-to-Treat (ITT)
ITT specifies the population: it includes non-completers. Still need to define outcomes for non-completers, i.e., impute values. Example from last slide: Typical to define non-completers as not cured.
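The completer bias and the ITT remedy described above can be illustrated with a little arithmetic. The sketch below uses made-up counts (100 randomized per arm, the cure counts, and the dropout counts are all hypothetical, not from any study in these slides) and assumes that only uncured subjects drop out.

```python
def cure_rate(cured, total):
    """Percent cured among `total` subjects."""
    return 100.0 * cured / total

# Hypothetical counts, chosen only for illustration.
randomized = 100                       # subjects randomized per arm
placebo_cured, treated_cured = 30, 60
placebo_dropouts = 35                  # uncured placebo subjects who go elsewhere
treated_dropouts = 5                   # uncured treated subjects who go elsewhere

# Completers-only analysis: dropouts vanish from the denominator.
completer_placebo = cure_rate(placebo_cured, randomized - placebo_dropouts)
completer_treated = cure_rate(treated_cured, randomized - treated_dropouts)
completer_diff = completer_treated - completer_placebo

# ITT analysis: everyone randomized is counted; non-completers are "not cured".
itt_placebo = cure_rate(placebo_cured, randomized)
itt_treated = cure_rate(treated_cured, randomized)
itt_diff = itt_treated - itt_placebo

print(f"Completers: placebo {completer_placebo:.1f}%, treated {completer_treated:.1f}%, "
      f"difference {completer_diff:.1f}")
print(f"ITT:        placebo {itt_placebo:.1f}%, treated {itt_treated:.1f}%, "
      f"difference {itt_diff:.1f}")
```

With these invented numbers the placebo cure rate among completers is inflated, so the completers-only difference understates the treatment effect relative to ITT.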
10. ITT: Two Ways to Impute Unknown Values
[Figure: two panels showing change from baseline (relative to 0) for individual subjects at baseline, intermediate visit, and final visit. Panel 1, Observations: LOCF, ignore presumed progression. Panel 2, Ranks: LRCF, maintain expected relative progression.]
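A minimal sketch of the two imputation ideas, assuming LOCF means "last observation carried forward" and LRCF means "last rank carried forward". The data, the function names, and the rank scaling (rank divided by number observed plus one) are illustrative assumptions, not the slide author's procedure, and ties are not handled.

```python
def locf(visits):
    """Carry the last observed value forward over missing (None) visits."""
    filled, last = [], None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled

def scaled_ranks(values):
    """Rank observed values (1 = smallest), scaled by n_observed + 1.
    Missing values stay None. Assumes no ties."""
    observed = sorted(v for v in values if v is not None)
    n = len(observed)
    return [None if v is None else (observed.index(v) + 1) / (n + 1)
            for v in values]

def lrcf(subjects):
    """Rank subjects within each visit, then carry each subject's last rank forward."""
    rank_columns = [scaled_ranks(col) for col in zip(*subjects)]
    rank_rows = [list(row) for row in zip(*rank_columns)]
    return [locf(row) for row in rank_rows]

# Hypothetical change from baseline at (intermediate visit, final visit).
# Subject B drops out after the intermediate visit.
changes = [
    [-2.0, -5.0],   # subject A
    [-1.0, None],   # subject B
    [-3.0, -4.0],   # subject C
]

print([locf(row) for row in changes])  # B keeps its value of -1.0 at the final visit
print(lrcf(changes))                   # B keeps its relative position instead
```

LOCF freezes the dropout's value, ignoring any progression the group is expected to show; carrying the rank forward instead keeps the dropout's standing relative to the other subjects.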
11. Too Much Data
12. All Possible Data, No Sample
Too much data to need probabilistic statements: already have the whole truth. Not always as obvious as it sounds. Examples: EMT records, some chart reviews (site-specific, not samples). Confidence intervals usually irrelevant. Reference ranges and some non-generalizable comparisons may be valid.
13. Irrelevant (?) Detectability with Large Study
Significant differences (p<0.05) in %s between placebo and treatment groups:

N/Group    Difference         Treated to Cure 1
100        50% vs. 63.7%      7
1000       50% vs. 54.4%      23
5000       50% vs. 52.0%      50
10000      50% vs. 51.4%      71
50000      50% vs. 50.6%      167

NNT = Number Needed to Treat = 100/(difference in percentage points)
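The "Treated to Cure 1" column can be reproduced from the percentages in the table. This is a minimal sketch: the function name `nnt` is illustrative, and rounding to the nearest whole subject is an assumption that happens to match the tabulated values.

```python
def nnt(placebo_pct, treated_pct):
    """Number needed to treat: 100 divided by the difference in percentage points."""
    return 100.0 / (treated_pct - placebo_pct)

# (N per group, % cured in treated group); placebo is 50% in every row of the table.
rows = [(100, 63.7), (1000, 54.4), (5000, 52.0), (10000, 51.4), (50000, 50.6)]

nnts = [round(nnt(50.0, treated_pct)) for _, treated_pct in rows]

for (n_per_group, treated_pct), value in zip(rows, nnts):
    print(f"N/group {n_per_group:>6}: 50% vs. {treated_pct}% -> NNT {value}")
```

As the study grows, ever smaller differences become statistically detectable, but the NNT shows how many subjects must be treated for one additional cure.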
14. Too Little or Too Much Analyses
15. Too Little or Too Much Analyses
- Multiple Outcomes
- Subgroups
- Ongoing effects
- Exploring vs. Proving
16. Multiple Outcomes
- Balance Between Missing an Effect and Spurious Results
- Food Additives and Hyperactivity Study
  - Uses composite score.
  - Many other indicators of hyperactivity.
17. Multiple Outcomes
GHA (Global Hyperactivity Aggregate) components:
- Parent ADHD: 10 items
- Teacher ADHD: 10 items
- Class ADHD: 12 items
- Conner: 4 items
Could perform 10 + 10 + 12 + 4 = 36 item analyses.
19. Multiple Subgroup Analyses Example
Editorial, pp. 1667-69
20. Multiple Subgroup Analyses Example
Comparing Two Treatments in 25 Subgroups and Overall
21. Multiple Subgroup Analyses
Lagakos, NEJM 354(16):1667-1669.
False Positive Conclusions: 72% chance of claiming at least one false effect with 25 comparisons (see next slide).
22. A Correction for Multiple Analyses
No Correction: If using $p < 0.05$, then $P(\text{correct neg conclusion}) = 0.95$. If 25 comparisons are independent, $P(\text{no false pos}) = P(\text{all correct neg}) = (1 - 0.05)^{25} = (0.95)^{25} = 0.28$. So, $P(\text{at least 1 false pos}) = 1 - 0.28 = 0.72$.
Bonferroni Correction: To maintain $P(\text{no false pos in } k \text{ tests}) = 0.95 = (1 - p)^k$, need to use $p = 1 - (0.95)^{1/k} \approx 0.05/k$. So, use $p < 0.05/k$ to maintain a <5% overall false positive rate.
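The arithmetic on this slide can be checked numerically. This is a minimal sketch that assumes independent comparisons, as the slide does; the function names are illustrative.

```python
def familywise_error(alpha, k):
    """P(at least one false positive) in k independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** k

def bonferroni_threshold(alpha, k):
    """Per-test threshold alpha/k (the Bonferroni approximation)."""
    return alpha / k

def exact_threshold(alpha, k):
    """Per-test threshold solving (1 - p)^k = 1 - alpha exactly."""
    return 1 - (1 - alpha) ** (1 / k)

k = 25
uncorrected = familywise_error(0.05, k)
per_test = bonferroni_threshold(0.05, k)
corrected = familywise_error(per_test, k)

print(f"No correction, {k} tests: P(at least 1 false pos) = {uncorrected:.2f}")
print(f"Bonferroni per-test threshold: {per_test:.4f}")
print(f"Exact per-test threshold:      {exact_threshold(0.05, k):.5f}")
print(f"Overall false positive rate with Bonferroni: {corrected:.3f}")
```

The Bonferroni threshold is slightly smaller than the exact one, so it keeps the overall false positive rate just under 5%.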
23. Accounting for Multiple Analyses
- Some formal corrections built-in to p-values:
  - Bonferroni: general purpose
  - Tukey: for pairs of group means, >2 groups
  - Dunnett: for means of 1 control group vs. each of 2 or more treatment groups
- Formal corrections not necessary:
  - Transparency of what was done is most important.
  - Should be aware yourself of number of analyses and report it with any conclusions.
24. Reporting Multiple Analyses
Clopidogrel paper (4 slides back): No p-values or probabilistic conclusions for 25 subgroups.
Another paper's transparency: Cohan, Crit Care Med 33(10):2358-2366.
25. Multiple Mid-Study Analyses
Should effects be monitored as more and more subjects complete?
- Some mid-study analyses:
  - Interim analyses
  - Study size re-evaluation
  - Feasibility analyses
26. Mid-Study Analyses
[Figure: estimated effect (relative to 0) plotted against time and number of subjects enrolled, showing how too many analyses can lead to a wrong early conclusion.]
Need to monitor, but also account for many analyses.
27. Mid-Study Analyses
- Mid-study comparisons should not be made before study completion unless planned for (interim analyses). Early comparisons are unstable, and can invalidate final comparisons.
- Interim analyses are planned comparisons at specific times, usually by an unmasked advisory board. They allow stopping the study early due to very dramatic effects, and final comparisons, if the study continues, are adjusted to validly account for peeking.
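Why unplanned peeking matters can be seen in a small simulation. The sketch below is illustrative only: it assumes no true effect, paired differences that are standard normal with known standard deviation 1, five looks at 20, 40, 60, 80, and 100 subjects, and an unadjusted |z| > 1.96 criterion at every look. None of these settings come from the slides.

```python
import math
import random

def simulate_peeking(n_sims=2000, looks=(20, 40, 60, 80, 100), z_crit=1.96, seed=1):
    """Return (false positive rate if any look may declare an effect,
               false positive rate if only the final look is tested)."""
    rng = random.Random(seed)
    any_look = final_only = 0
    for _ in range(n_sims):
        total, n, rejected_any = 0.0, 0, False
        z = 0.0
        for look in looks:
            while n < look:                 # enroll subjects up to this look
                total += rng.gauss(0.0, 1.0)  # no true effect: mean difference is 0
                n += 1
            z = total / math.sqrt(n)        # z statistic with known SD = 1
            if abs(z) > z_crit:
                rejected_any = True
        any_look += rejected_any
        final_only += abs(z) > z_crit       # z from the last planned look
    return any_look / n_sims, final_only / n_sims

peek_rate, final_rate = simulate_peeking()
print(f"False positive rate, testing at every look: {peek_rate:.3f}")
print(f"False positive rate, final look only:       {final_rate:.3f}")
```

Testing at every look without adjustment gives a false positive rate well above the nominal 5%, which is the reason planned interim analyses adjust the final comparison.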
28. Mid-Study Analyses
- Mid-study reassessment of study size is advised for long studies. Only standard deviations to date, not effects themselves, are used to assess original design assumptions.
- Feasibility analysis:
  - may use the assessment noted above to decide whether to continue the study.
  - may measure effects, like interim analyses, by unmasked advisors, to project ahead on the likelihood of finding effects at the planned end of study.
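A study-size reassessment that uses only the standard deviation observed so far might look like the sketch below. It assumes the common normal-approximation formula for comparing two group means, n per group = 2((z_alpha/2 + z_beta) * SD / delta)^2. The target difference, the planned and observed SDs, and the function name are hypothetical, not taken from the slides.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate subjects per group to detect a mean difference `delta`
    with a two-sided test at level `alpha` and the given power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

delta = 5.0          # hypothetical clinically relevant difference (fixed at design)
planned_sd = 10.0    # hypothetical SD assumed in the original design
observed_sd = 12.0   # hypothetical SD estimated from data to date

planned_n = n_per_group(delta, planned_sd)
revised_n = n_per_group(delta, observed_sd)

print(f"Planned N/group (SD {planned_sd}): {planned_n}")
print(f"Revised N/group (SD {observed_sd}): {revised_n}")
```

Only the variability estimate is updated; the observed treatment effect is never consulted, so the final comparison is not compromised by the reassessment.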
29. Mid-Study Analyses
Examples: Studies at Harbor. Randomized; not masked; data available to PI. Compared treatment groups repeatedly, as more subjects were enrolled.
Study 1: Groups do not differ; plan to add more subjects. Consequence: final p-value not valid; the probability requires no prior knowledge of effect.
Study 2: Groups differ significantly; plan to stop study. Consequence: use of this p-value not valid; the probability requires incorporating later comparison.
30. Conclusions: Bad Science That Seems So Good
- Re-examining data, or using many outcomes, seeming to be due diligence.
- Adding subjects to a study that is showing marginal effects; stopping early due to strong results.
- Looking for effects in many subgroups.
- Actually bad? Could be negligent NOT to do these, but need to account for doing them.
31. Course Over? Already?
Nils Simonson, in Furberg & Furberg, Evaluating Clinical Research