Title: Packer
Interpretation of Observed Differences in the Frequency of Events When the Number of Events Is Small
Milton Packer, M.D.
Gayle and Paul Stoffel Distinguished Chair in Cardiology
Professor and Director, Center for Biostatistics and Clinical Science
University of Texas Southwestern Medical School at Dallas
US Food and Drug Administration, Friday, February 18, 2005, 8:15 AM
Question
How should we interpret differences in the observed frequency of events in a clinical trial when the number of events is small? By small, I mean that the number of events would have provided < 70% power to detect a true treatment difference, assuming an effect size similar to that generally encountered in clinical research.
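To make the < 70% threshold concrete, here is a minimal Python sketch based on Schoenfeld's approximation for a two-arm trial with 1:1 allocation, in which a log hazard ratio estimated from D total events has a standard error of about 2/√D. The hazard ratio of 0.75, standing in for an effect size "generally encountered in clinical research", is a hypothetical choice and not a figure from the slides.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_from_events(events: int, hazard_ratio: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided log-rank test with 1:1 allocation (Schoenfeld)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    # The log hazard ratio estimated from `events` events has SE of about 2 / sqrt(events).
    signal = abs(math.log(hazard_ratio)) * math.sqrt(events) / 2
    return Z.cdf(signal - z_crit)


def events_for_power(target_power: float, hazard_ratio: float, alpha: float = 0.05) -> int:
    """Smallest total number of events that reaches target_power."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(target_power)
    return math.ceil(4 * (z_crit + z_power) ** 2 / math.log(hazard_ratio) ** 2)


ASSUMED_HR = 0.75  # hypothetical "typical" effect size, not taken from the slides
print(events_for_power(0.70, ASSUMED_HR))  # 299
print(events_for_power(0.80, ASSUMED_HR))  # 380
print(round(power_from_events(100, ASSUMED_HR), 2))
```

Under that assumption, about 300 events are needed for 70% power and about 380 for 80%, while 100 events give roughly 30% power.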
How Should Such a Difference Between Treatment Groups Be Interpreted?
Hypothetical trial of 3000 patients (1500 in each group)
Major adverse cardiovascular event: placebo 13, drug 33
RR = 2.63, 95% CI (1.39, 5.00), P = 0.002
Remember
P values are easiest to interpret when they refer to the reproducibility of observed differences in predefined primary endpoints in trials adequately powered (> 80-90% power) to detect differences between treatments.
Probability That a Second Trial Would Find a P < 0.05 Effect if It Were Identical to the First Trial
P value in first trial: probability of P < 0.05 in second trial
0.10: 37%
0.05: 57%
0.01: 73%
0.005: 80%
0.001: 91%
O'Neill. Cont Clin Trials 1997;18:550-6.
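The table can be approximated with a short calculation, assuming (as a simplification, not necessarily O'Neill's exact method) that the true effect equals the effect observed in the first trial, so that the second trial's Z statistic is normally distributed around the first trial's Z with unit variance. This Python sketch gives about 38%, 50%, 73%, 80%, and 91%; these agree with the table to within a point except at P = 0.05, where the table lists 57%.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def replication_probability(p_first: float, alpha: float = 0.05) -> float:
    """Chance that an identical second trial reaches two-sided P < alpha in the
    same direction, assuming the true effect equals the first trial's estimate."""
    z_observed = Z.inv_cdf(1 - p_first / 2)  # |Z| implied by the first P value
    z_crit = Z.inv_cdf(1 - alpha / 2)
    # The second trial's Z is centered on z_observed with unit variance.
    return Z.cdf(z_observed - z_crit)


for p in (0.10, 0.05, 0.01, 0.005, 0.001):
    print(f"P = {p} in first trial: {replication_probability(p):.0%}")
```

The calculation shows why a P value just under 0.05 is weak evidence of reproducibility: under this model it is a coin toss whether an identical trial would reach significance again.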
What If . . .
The event was not the primary endpoint of the study.
The event was not precisely defined before the start of the trial.
The trial was not adequately powered to detect a treatment difference.
This Frequently Happens . . .
With primary endpoints in pilot trials.
With secondary endpoints.
With subgroup analyses.
With other measures of efficacy.
With physiological measurements.
With assessments of safety.
Things to Worry About When Analyzing Incidence of Adverse Events
1. There are hundreds of adverse events (multiplicity of comparisons).
Multiplicity of Comparisons
A typical large-scale clinical trial may describe as many as 500 individual terms describing adverse events. If a P value were calculated for each pairwise comparison, then by chance alone one would expect about 25 events (5%) to have P < 0.05 and about 5 events (1%) to have P < 0.01.
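The expected counts are simply the number of comparisons multiplied by α. The Python sketch below also simulates 500 comparisons in which the drug truly has no effect; drawing each test statistic from a standard normal distribution is an idealizing assumption, since real adverse event counts are small and discrete.

```python
import random
from statistics import NormalDist

Z = NormalDist()
N_TERMS = 500  # individual adverse event terms, as on the slide


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of P < alpha results when no true differences exist."""
    return n_tests * alpha


def prob_at_least_one(n_tests: int, alpha: float) -> float:
    """Chance of at least one P < alpha among independent null comparisons."""
    return 1 - (1 - alpha) ** n_tests


def simulate_null_p_values(n_tests: int, seed: int = 1) -> list[float]:
    """Two-sided P values for comparisons in which the drug has no effect.

    Each test statistic is drawn from N(0, 1), the large-sample behavior of a
    z test under the null hypothesis (an idealization for sparse event counts).
    """
    rng = random.Random(seed)
    return [2 * (1 - Z.cdf(abs(rng.gauss(0.0, 1.0)))) for _ in range(n_tests)]


p_values = simulate_null_p_values(N_TERMS)
print(expected_false_positives(N_TERMS, 0.05), expected_false_positives(N_TERMS, 0.01))
print(sum(p < 0.05 for p in p_values), sum(p < 0.01 for p in p_values))
print(prob_at_least_one(N_TERMS, 0.05))
```

With 500 independent null comparisons, the chance of at least one P < 0.05 is essentially 1, so some "significant" adverse event differences are all but guaranteed.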
Things to Worry About When Analyzing Incidence of Adverse Events
1. There are hundreds of adverse events (multiplicity of comparisons).
2. Adverse events are spontaneous (nonadjudicated) reports.
Adverse Events Are Spontaneous (Nonadjudicated) Reports
Adverse events are reported at the discretion of the investigator and then translated into standardized terms. There is little uniformity in how an event is identified, defined, or reported. Uncertainty increases when the event lies in a field remote from the investigator's focus.
Can We Fix This Problem by Blinded Post Hoc Adjudication?
Rules guiding post hoc adjudication are inevitably influenced by the knowledge that a treatment effect has been seen. Any bar set by the post hoc process can magnify or dilute the effect. Adjudication is generally not applied to patients without a reported event.
Things to Worry About When Analyzing Incidence of Adverse Events
1. There are hundreds of adverse events (multiplicity of comparisons).
2. Adverse events are spontaneous (nonadjudicated) reports.
3. Analyses that depend on grouping of events are subject to bias.
Thrombotic Cardiovascular Events
Event: placebo, drug
Myocardial infarction: 5, 20
Stroke: 4, 4
Sudden death: 5, 6
Unstable angina: 6, 5
Pulmonary embolism: 3, 4
Arterial embolism: 0, 1
Transient ischemic attack: 5, 3
Venous thrombosis: 4, 6
Thrombotic Cardiovascular Events
Event: placebo, drug
Myocardial infarction: 5, 10
Stroke: 4, 8
Sudden death: 5, 6
Unstable angina: 6, 5
Pulmonary embolism: 3, 4
Arterial embolism: 0, 1
Transient ischemic attack: 5, 3
Venous thrombosis: 4, 6
Thrombotic Cardiovascular Events
Event: placebo, drug
Myocardial infarction: 5, 4
Stroke: 4, 4
Sudden death: 5, 9
Unstable angina: 6, 13
Pulmonary embolism: 3, 5
Arterial embolism: 0, 1
Transient ischemic attack: 5, 11
Venous thrombosis: 4, 6
Analyses That Depend on Grouping of Adverse Events
It is best to develop a uniform definition of a group before classifying events. When the process of developing a definition starts after a concern has been raised, those creating the definition have frequently already looked at the data and know (subconsciously) what kind of definition is needed to capture the events of interest.
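The effect of choosing a grouping after seeing the data can be shown with the counts from the thrombotic event table in which sudden death, unstable angina, and transient ischemic attack are more frequent on drug. In this Python sketch, taking all eight listed terms stands in for a grouping fixed in advance (a hypothetical choice), and the data-driven grouping keeps only the terms in which the drug count exceeds the placebo count.

```python
# (placebo, drug) counts from the thrombotic-event table in which unstable
# angina (6 vs 13) and transient ischemic attack (5 vs 11) favor placebo.
EVENTS = {
    "Myocardial infarction": (5, 4),
    "Stroke": (4, 4),
    "Sudden death": (5, 9),
    "Unstable angina": (6, 13),
    "Pulmonary embolism": (3, 5),
    "Arterial embolism": (0, 1),
    "Transient ischemic attack": (5, 11),
    "Venous thrombosis": (4, 6),
}


def group_totals(terms):
    """Sum placebo and drug counts over the chosen adverse event terms."""
    placebo = sum(EVENTS[t][0] for t in terms)
    drug = sum(EVENTS[t][1] for t in terms)
    return placebo, drug


# Hypothetical grouping fixed before looking at the data: every listed term.
prespecified = list(EVENTS)

# Grouping chosen after looking: keep only terms where drug exceeds placebo.
data_driven = [t for t, (p, d) in EVENTS.items() if d > p]

for name, terms in [("prespecified", prespecified), ("data-driven", data_driven)]:
    placebo, drug = group_totals(terms)
    print(f"{name}: placebo {placebo}, drug {drug}, ratio {drug / placebo:.2f}")
```

Dropping myocardial infarction and stroke, the two terms in which the drug count does not exceed placebo, raises the drug-to-placebo ratio from about 1.66 (53 vs 32) to about 1.96 (45 vs 23), although no individual count changed.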
Things to Worry About When Analyzing Incidence of Adverse Events
1. There are hundreds of adverse events (multiplicity of comparisons).
2. Adverse events are spontaneous (nonadjudicated) reports.
3. Analyses that depend on grouping of events are subject to bias.
4. A small number of events results in extremely imprecise estimates.
With a Small Number of Events, Lack of an Observed Difference Does Not Rule Out the Existence of a True Difference
Major adverse cardiovascular event: placebo 25, drug 26
RR = 1.04, 95% CI (0.60, 1.79), P = 0.89
With a Small Number of Events, the Finding of an Observed Difference Does Not Prove the Existence of a True Difference
Major adverse cardiovascular event: placebo 13, drug 33
RR = 2.63, 95% CI (1.39, 5.00), P = 0.002
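A minimal Python sketch of the conventional calculation, using a risk ratio with a Wald interval on the log scale and assuming the 1500 patients per group of the hypothetical trial. For 25 vs 26 events it returns RR 1.04 (0.60, 1.79), P ≈ 0.89, matching the slide. For 13 vs 33 events it gives about 2.54 (1.34, 4.80), P ≈ 0.004, slightly different from the slide's 2.63 (1.39, 5.00), P = 0.002; the slides do not say which model produced their figures.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def risk_ratio_ci(events_drug, n_drug, events_placebo, n_placebo, z_crit=1.96):
    """Risk ratio (drug vs placebo), Wald CI on the log scale, and two-sided P."""
    rr = (events_drug / n_drug) / (events_placebo / n_placebo)
    # Large-sample variance of log(RR) for two independent binomial proportions.
    se = math.sqrt(1 / events_drug - 1 / n_drug + 1 / events_placebo - 1 / n_placebo)
    log_rr = math.log(rr)
    lower = math.exp(log_rr - z_crit * se)
    upper = math.exp(log_rr + z_crit * se)
    p_value = 2 * (1 - Z.cdf(abs(log_rr) / se))
    return rr, lower, upper, p_value


# Assumed 1500 patients per group, as in the hypothetical trial.
for placebo, drug in [(25, 26), (13, 33)]:
    rr, lo, hi, p = risk_ratio_ci(drug, 1500, placebo, 1500)
    print(f"placebo {placebo} vs drug {drug}: RR {rr:.2f} (95% CI {lo:.2f}, {hi:.2f}), P = {p:.3f}")
```

Both intervals are wide: with only a few dozen events, a ratio anywhere from roughly 0.6 to 1.8, or from 1.3 to 4.8, is compatible with the data.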
Effect Size (With 95% CI) in Trials With 20-500 Events, Assuming a Lower Bound of 1.00
[Figure: hazard ratio (axis 1.0 to 8.0) with 95% CI for trials with 20, 50, 100, 200, and 500 events; each trial has 2 treatment groups, each with n = 1500.]
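The figure itself did not survive, but its premise can be sketched: for a given total number of events, how large must the observed ratio be for its 95% CI to have a lower limit of exactly 1.00? This Python sketch treats the hazard ratio as a ratio of event counts in two equal-sized groups with rare events, so that var(log ratio) ≈ 1/d_placebo + 1/d_drug and the group size drops out; that is an assumption, not necessarily how the slide was drawn.

```python
import math

Z_CRIT = 1.96  # two-sided 95% confidence


def ratio_with_lower_bound_one(total_events: int) -> tuple[float, float]:
    """Observed ratio (drug/placebo) whose 95% CI lower limit is exactly 1.00.

    Assumes equal-sized groups and rare events, so var(log ratio) is about
    1/d_placebo + 1/d_drug. Returns (ratio, upper 95% limit).
    """

    def log_lower_limit(r: float) -> float:
        d_placebo = total_events / (1 + r)
        d_drug = total_events * r / (1 + r)
        se = math.sqrt(1 / d_placebo + 1 / d_drug)
        return math.log(r) - Z_CRIT * se

    lo, hi = 1.0, 10.0  # log_lower_limit(1) < 0 < log_lower_limit(10) for >= 20 events
    for _ in range(100):  # bisection on the log lower limit
        mid = (lo + hi) / 2
        if log_lower_limit(mid) < 0:
            lo = mid
        else:
            hi = mid
    ratio = (lo + hi) / 2
    # The interval is symmetric on the log scale, so lower limit 1 means upper = ratio**2.
    return ratio, ratio ** 2


for d in (20, 50, 100, 200, 500):
    ratio, upper = ratio_with_lower_bound_one(d)
    print(f"{d:>3} events: ratio {ratio:.2f}, 95% CI (1.00, {upper:.2f})")
```

Under these assumptions, a "just significant" result needs a ratio of about 2.7 with an upper limit above 7 when there are 20 events, but only about 1.2 with an upper limit near 1.4 when there are 500.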
Trials Are Designed to Provide Precise Estimates for Primary Endpoints
[Figure: precision falls from a very precise estimate for the primary endpoint, through secondary endpoints, to a very imprecise estimate for a specific adverse event.]
What Is Wrong With Imprecise Estimates?
Imprecise estimates are fine if the intent is to withhold judgment until more data are collected to make the estimates more precise.
Imprecise estimates are problematic if the intent is to stop and reach a conclusion.
When calculated in the conventional manner, the 95% CIs (and the associated P value) of an estimate have meaning primarily in the context of a completed experiment.
What Is Wrong With Imprecise Estimates?
The adverse event data generated in a typical trial are not the result of a completed experiment.
Viewed in terms of the amount of data needed for a precise estimate, the adverse event data in a single study represent a snapshot in an ongoing experiment to characterize the safety of the drug.
Therefore, performing an analysis of adverse event data is akin to performing an interim analysis of primary endpoint data in an ongoing clinical trial.
Interim Monitoring in Group Sequential Trials
[Figure: treatment difference (Z score, -2.0 to 8.0) plotted against information time (% of expected events).]
Interim Monitoring in Group Sequential Trials
[Figure: Z score plotted against information time with the α = 0.05 boundary.]
Coronary Drug Project
[Figure: Z score for clofibrate vs placebo (positive values, clofibrate better; negative values, placebo better; -3.0 to 5.0) over 0 to 100 months, with the α = 0.05 boundary.]
Interim Monitoring in Group Sequential Trials
[Figure: Z score plotted against information time with boundaries for α = 0.001, 0.01, and 0.05.]
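Why a fixed critical value of 1.96 is inappropriate at interim looks can be shown by simulation. This Python sketch assumes five equally spaced looks and compares an unadjusted 1.96 at every look with an O'Brien-Fleming boundary (constant 2.04, the tabulated value for five looks at two-sided α = 0.05). The slides do not say which boundary family they plot; O'Brien-Fleming is used here only as a common example.

```python
import math
import random

LOOKS = 5            # equally spaced interim analyses (assumed)
NOMINAL = 1.96       # unadjusted two-sided 5% critical value, used at every look
OBF_CONSTANT = 2.04  # O'Brien-Fleming constant for 5 looks, two-sided alpha = 0.05


def obrien_fleming_boundary(k: int) -> float:
    """Critical |Z| at look k (1-based): about 4.56 at look 1, 2.04 at look 5."""
    return OBF_CONSTANT * math.sqrt(LOOKS / k)


def type_one_error(boundary, n_sim: int = 20_000, seed: int = 7) -> float:
    """Fraction of simulated null trials whose Z crosses the boundary at any look."""
    rng = random.Random(seed)
    crossed = 0
    for _ in range(n_sim):
        score = 0.0  # sum of independent N(0, 1) increments, one per look
        for k in range(1, LOOKS + 1):
            score += rng.gauss(0.0, 1.0)
            z = score / math.sqrt(k)  # standardized statistic after k of LOOKS looks
            if abs(z) > boundary(k):
                crossed += 1
                break  # the trial stops at the first crossing
    return crossed / n_sim


print(f"1.96 at every look: {type_one_error(lambda k: NOMINAL):.3f}")
print(f"O'Brien-Fleming:    {type_one_error(obrien_fleming_boundary):.3f}")
```

In a trial with no true effect, testing at 1.96 at every look declares significance roughly 14% of the time, whereas the O'Brien-Fleming boundary keeps the overall rate near 5% by demanding |Z| above 4.5 at the first look.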
An Important Note
This problem is not the classic concern about multiplicity of comparisons. It would exist even with only one endpoint if the trial were underpowered. It is related to the imprecision inherent in estimates based on small numbers of events, an imprecision not adequately quantified by conventional approaches to the calculation of confidence intervals.
Concept
Reaching conclusions from data derived in an
underpowered trial raises the same concerns as
reaching conclusions based on an underpowered
interim analysis in a definitive, adequately
powered trial.
IMPRESS (Omapatrilat in Heart Failure)
Death or Hospitalization for Heart Failure
39 events, RR = 0.53, 95% CI (0.27, 1.02), P = 0.053
[Figure: proportion with event (0.00 to 0.15) against days from randomization (0 to 240) for ACE inhibitor (n = 284) and omapatrilat (n = 289).]
Rouleau et al. Lancet 2000;356:615-20.
OVERTURE Trial
Death or Hospitalization for Heart Failure
1887 events, RR = 0.94, 95% CI (0.86, 1.03), P = 0.187
[Figure: event-free survival (0.0 to 1.0) against months (0 to 24) for omapatrilat (n = 2886) and ACE inhibitor (n = 2884).]
Packer et al. Circulation 2002;106:920-6.
Amlodipine in Heart Failure
All-Cause Mortality in Nonischemic Cardiomyopathy
PRAISE-1: placebo 74/212, amlodipine 45/209; hazard ratio 0.55 (0.37, 0.79); log-rank P = 0.001
PRAISE-2: placebo 262/826, amlodipine 278/826; hazard ratio 1.09 (0.92, 1.29); log-rank P = 0.32
Experience With Vesnarinone and Losartan
N Engl J Med 1993;329:149-55. N Engl J Med 1998;339:1810-6. Lancet 1997;349:747-57. Lancet 2000;355:582-7.
Definitive Trial Shows Reversal of Effect Seen in Earlier Pilot Trial
Magnesium in Myocardial Infarction
All-Cause Mortality
Meta-analysis: placebo 53/644, magnesium 25/644; hazard ratio 0.45 (0.28, 0.71); log-rank P < 0.001
LIMIT-2: placebo 118/1157, magnesium 90/1159; hazard ratio 0.74 (0.55, 1.00); log-rank P = 0.04
ISIS-4: placebo 2103/29039, magnesium 2216/29011; hazard ratio 1.06 (1.00, 1.12); log-rank P = 0.07
Metoprolol XL in Heart Failure
Effect on CHF Hospitalizations
RESOLVD: placebo 5/212, metoprolol 15/214; hazard ratio not given, CI (1.01, 5.63); log-rank P < 0.05
MERIT-HF: placebo 294/826, metoprolol 200/826; hazard ratio and CI not given; log-rank P < 0.001
What We Have Learned
To achieve statistical significance in an
underpowered analysis, the effect size must be
extreme and the estimate must be
imprecise. Yet, the more extreme the effects
and the more imprecise the estimates, the less
likely they will be reproduced in definitive
clinical trials.
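The point can be checked with a small simulation. This Python sketch assumes, hypothetically, a true hazard ratio of 0.80 and an analysis based on 40 events (standard error of the log hazard ratio about 2/√40), and looks only at the simulated trials that reach two-sided P < 0.05 in favor of treatment.

```python
import math
import random

TRUE_HR = 0.80              # hypothetical true hazard ratio
EVENTS = 40                 # hypothetical number of events in the analysis
SE = 2 / math.sqrt(EVENTS)  # approximate SE of the log hazard ratio (1:1 allocation)


def simulate(n_trials: int = 100_000, seed: int = 3) -> tuple[float, float]:
    """Return (power, geometric-mean HR among trials significant in favor of treatment)."""
    rng = random.Random(seed)
    significant = []
    for _ in range(n_trials):
        log_hr = rng.gauss(math.log(TRUE_HR), SE)  # one trial's estimate
        if log_hr / SE < -1.96:                     # two-sided P < 0.05, favoring treatment
            significant.append(log_hr)
    power = len(significant) / n_trials
    hr_when_significant = math.exp(sum(significant) / len(significant))
    return power, hr_when_significant


power, hr_sig = simulate()
print(f"power {power:.2f}; mean HR when significant {hr_sig:.2f}; true HR {TRUE_HR}")
```

Under these assumptions only about 10% of trials reach significance, and those that do report a hazard ratio around 0.46 on average, far more extreme than the true 0.80.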
What To Do?
The most important first step is to develop an approach to analyzing data in trials with small numbers of events that accurately reflects the true imprecision of the treatment effect estimate and its statistical significance.
How Should Such a Difference Between Treatment Groups Be Interpreted?
Major adverse cardiovascular event: placebo 13, drug 33
RR = 2.63, 95% CI (1.39, 5.00), P = 0.002
Assumes the critical value for α = 0.05 is Z = 1.96
Interim Monitoring in Group Sequential Trials
[Figure: Z score plotted against information time with the α = 0.05 boundary.]
How Should Such a Difference Between Treatment Groups Be Interpreted?
Major adverse cardiovascular event: placebo 13, drug 33
RR = 2.63, 95% CI (< 0.8, > 6.0), P > 0.10
Assumes the critical value for α = 0.05 is the boundary Z score (across a range of effect sizes)
Boundary-Adjusted Confidence Intervals
Appropriately describe the uncertainty inherent in the analysis of a small number of events, markedly reducing the false positive error rate.
Yet, even with a boundary-adjusted confidence interval, adverse effects that are known to be characteristic of specific drugs remain highly significant.
Do not provide a way to interpret trends observed in imprecise data.
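A minimal Python sketch of how a boundary-adjusted interval widens the reported one. It recovers the log-scale estimate and standard error from the reported 95% CI (1.39, 5.00), whose log-scale midpoint corresponds to about 2.64, and then swaps the critical value 1.96 for a larger one. The value Z = 4.56 (the first-look O'Brien-Fleming boundary for five looks) is an assumption; the slides do not state which boundary value they used.

```python
import math

# Reported 95% CI for the hypothetical trial (placebo 13, drug 33).
REPORTED_LOWER, REPORTED_UPPER = 1.39, 5.00


def recover_log_estimate(lower: float, upper: float, z_crit: float = 1.96):
    """Back out the log-scale estimate and its standard error from a Wald CI."""
    log_center = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    return log_center, se


def adjusted_ci(log_center: float, se: float, z_crit: float):
    """Confidence interval using an arbitrary critical value instead of 1.96."""
    return math.exp(log_center - z_crit * se), math.exp(log_center + z_crit * se)


center, se = recover_log_estimate(REPORTED_LOWER, REPORTED_UPPER)
for label, z in [("conventional, Z = 1.96", 1.96), ("boundary-adjusted, assumed Z = 4.56", 4.56)]:
    lo, hi = adjusted_ci(center, se, z)
    print(f"{label}: ({lo:.2f}, {hi:.2f})")
```

With Z = 4.56 the interval becomes roughly (0.59, 11.7), in line with the slide's (< 0.8, > 6.0).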
What Should We Do With Worrisome Trends in Imprecise Data?
Believe in observed differences that are biologically plausible.
An Indisputable Truth
Be wary of differences that are deemed real
based on biological plausibility, because
physicians can always be relied upon to propose a
biological mechanism to explain the validity of
an unexpected (and potentially preposterous)
finding that happens to have an interesting P
value.
What Should We Do With Worrisome Trends in Imprecise Data?
Believe in observed differences that are biologically plausible.
Look for confirmatory evidence in other studies (avoid being selective).
Experience With Drug A
Major adverse cardiovascular event: placebo, drug
Trial 1: 13, 33
Trial 2: 10, 15
Trial 3: 5, 8
Trial 4: 2, 0
Trial 5: 8, 14
The Cumulative Meta-Analyses
[Figure: cumulative Z score (-2.0 to 8.0) plotted against information time.]
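A cumulative Z score for the Drug A trials can be sketched from the event counts alone if one assumes 1:1 allocation, equal follow-up, and rare events, none of which the slide states. Under no treatment effect each event is then equally likely to fall in either group, so the drug events given the total follow a Binomial(total, 1/2) distribution, and pooling trials means summing both counts.

```python
import math

# (placebo events, drug events) for Drug A, in the order the trials are listed.
TRIALS = [(13, 33), (10, 15), (5, 8), (2, 0), (8, 14)]


def cumulative_z(trials):
    """Cumulative Z after each trial, assuming 1:1 allocation and equal follow-up.

    Under no treatment effect, drug events given total events ~ Binomial(total, 1/2),
    with mean total/2 and variance total/4.
    """
    drug_total = events_total = 0
    z_scores = []
    for placebo, drug in trials:
        drug_total += drug
        events_total += placebo + drug
        expected = events_total / 2
        z_scores.append((drug_total - expected) / math.sqrt(events_total / 4))
    return z_scores


for k, z in enumerate(cumulative_z(TRIALS), start=1):
    print(f"after trial {k}: Z = {z:.2f}")
```

The cumulative Z stays close to 3 after every trial (about 2.95, 2.97, 3.06, 2.80, and 3.08), so whether it is persuasive depends on the boundary it is compared against rather than on the nominal 1.96.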
Boundaries for Cumulative Meta-Analyses
[Figure: cumulative Z score plotted against information time with boundaries for α = 0.001, 0.01, and 0.05.]
Boundaries for Mg-MI Meta-Analyses
Pogue & Yusuf. Cont Clin Trials 1997;18:580-93.
[Figure: cumulative Z score for magnesium in myocardial infarction plotted against information time with boundaries for α = 0.001, 0.01, and 0.05.]
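A cumulative meta-analysis of the magnesium data can be sketched with fixed-effect, inverse-variance pooling of log risk ratios, a simple assumed method that does not reproduce Pogue and Yusuf's boundaries. The counts come from the magnesium table, with LIMIT-2 entered as 90/1159 deaths on magnesium and 118/1157 on placebo (the assignment consistent with its reported ratio of 0.74), and the earlier meta-analysis treated as a single study.

```python
import math

# (name, magnesium deaths, magnesium n, placebo deaths, placebo n), in order of publication.
STUDIES = [
    ("early meta-analysis", 25, 644, 53, 644),
    ("LIMIT-2", 90, 1159, 118, 1157),
    ("ISIS-4", 2216, 29011, 2103, 29039),
]


def log_rr_and_var(a, n1, c, n2):
    """Log risk ratio (magnesium vs placebo) and its large-sample variance."""
    log_rr = math.log((a / n1) / (c / n2))
    var = 1 / a - 1 / n1 + 1 / c - 1 / n2
    return log_rr, var


def cumulative_fixed_effect(studies):
    """Inverse-variance fixed-effect pooled RR and Z, updated after each study."""
    weight_sum = weighted_sum = 0.0
    results = []
    for name, a, n1, c, n2 in studies:
        log_rr, var = log_rr_and_var(a, n1, c, n2)
        weight_sum += 1 / var
        weighted_sum += log_rr / var
        pooled = weighted_sum / weight_sum
        se = math.sqrt(1 / weight_sum)
        results.append((name, math.exp(pooled), pooled / se))
    return results


for name, rr, z in cumulative_fixed_effect(STUDIES):
    print(f"through {name}: pooled RR {rr:.2f}, Z = {z:.2f}")
```

The pooled Z is about -3.2 after the early meta-analysis and -3.3 after LIMIT-2, then moves to about +0.95 once ISIS-4, which carries most of the weight, is added.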
What Should We Do With Worrisome Trends in Imprecise Data?
Believe in observed differences that are biologically plausible.
Look for confirmatory evidence in other studies (avoid being selective).
Carry out a definitive trial with the adverse event as the primary endpoint (powered to detect a meaningful treatment difference).
Are Definitive Trials the Answer?
Sponsors pursue encouraging trends for important endpoints. Most are non-confirmatory.
Sponsors should pursue discouraging trends for important endpoints. Most will be non-confirmatory.
A definitive trial can address ascertainment and classification biases as well as concerns about multiplicity of comparisons and imprecision of data.
Just Pause and Think
If you observed an increased frequency of a
serious adverse effect in a clinical trial, how
easy would you think it would be to carry out a
trial intended to definitively evaluate this
risk?
Do We Need to Be So Certain When Evaluating Safety Instead of Efficacy?
We are strict in reaching conclusions about
efficacy because saying that there is a
benefit when there is none means millions will
be treated unnecessarily and subject to side
effects and costs. Some might advocate being
less strict in reaching conclusions about
safety, but saying that there is an adverse
effect when there is none means millions will be
deprived of an effective treatment.
Conclusions
The findings of controlled clinical trials are most easily interpreted when they represent the principal intent of the study. A non-principal finding is subject to many interpretative difficulties, including ascertainment biases and inflated false positive rates due to the multiplicity of comparisons and the imprecision of estimates inherent in the analysis of small numbers of events. The FDA, industry, and academia remain in a quandary as to how to respond in a responsible fashion to observed differences in reported frequencies of adverse events.
A Final Personal Note
My presentation should not be construed as favoring any particular side in the current debate. It is my view that, regardless of one's position, it is critical to understand the limitations of what we know and to resist the temptation to reach conclusions before we are justified in doing so. Only by recognizing our ignorance will we be able to take the first step towards developing a rational approach that is in the interest of all patients.