Transcript and Presenter's Notes

Title: Packer


1
Interpretation of Observed Differences in the Frequency of Events When the Number of Events is Small
Milton Packer, M.D.
Gayle and Paul Stoffel Distinguished Chair in Cardiology
Professor and Director, Center for Biostatistics and Clinical Science
University of Texas Southwestern Medical School at Dallas
US Food and Drug Administration
Friday, February 18, 2005, 8:15 AM
2
Question
How should we interpret differences in the
observed frequency of events in a clinical
trial when the number of events is small?
3
Question
How should we interpret differences in the
observed frequency of events in a clinical
trial when the number of events is small? By small, I mean that the number of events would have provided < 70% power to detect a true treatment difference, assuming an effect size similar to that generally encountered in clinical research.
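For reference, a minimal Python sketch of the event-count threshold this definition implies, using the standard event-driven (Schoenfeld) approximation for a two-sided 1:1 log-rank comparison; the assumed relative risk of 2.0 is illustrative and not taken from the talk.

```python
# Sketch: how many events make a trial "small" by the < 70% power criterion.
# Event-driven (Schoenfeld) approximation, two-sided test, 1:1 randomization.
# The relative risk of 2.0 is an illustrative assumption, not from the talk.
import math
from scipy.stats import norm

def events_needed(rel_risk, power, alpha=0.05):
    """Approximate total events needed to detect rel_risk with given power."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    return 4 * (z_alpha + z_power) ** 2 / math.log(rel_risk) ** 2

print(round(events_needed(2.0, 0.70)))   # ~51 events for 70% power
print(round(events_needed(2.0, 0.90)))   # ~87 events for 90% power
```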
4
How Should Such a Difference Between Treatment
Groups Be Interpreted?
Major Adverse Cardiovascular Event
Placebo: 13    Drug: 33
Hypothetical trial of 3000 patients (1500 in each group)
5
How Should Such a Difference Between Treatment
Groups Be Interpreted?
Major Adverse Cardiovascular Event
Placebo: 13    Drug: 33
RR = 2.63, 95% CI (1.39, 5.00), P = 0.002
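As a rough check, here is a minimal Python sketch of a crude relative risk with a Wald confidence interval for the 2x2 table above; the slide's figures (RR = 2.63, P = 0.002) were presumably derived from a time-to-event or other model, so this simple calculation gives similar but not identical numbers.

```python
# Sketch: crude relative risk, Wald 95% CI, and two-sided P for a 2x2 table
# (13 vs 33 events, 1500 patients per group, as on the slide).
import math
from scipy.stats import norm

def risk_ratio(events_drug, n_drug, events_pbo, n_pbo):
    rr = (events_drug / n_drug) / (events_pbo / n_pbo)
    se = math.sqrt(1/events_drug - 1/n_drug + 1/events_pbo - 1/n_pbo)
    log_rr = math.log(rr)
    lo = math.exp(log_rr - 1.96 * se)
    hi = math.exp(log_rr + 1.96 * se)
    p = 2 * norm.sf(abs(log_rr) / se)
    return rr, (lo, hi), p

print(risk_ratio(33, 1500, 13, 1500))
# RR ~2.54, 95% CI ~(1.34, 4.80), P ~0.004 -- close to, not identical with, the slide
```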
6
Remember
P values are easiest to interpret when they refer to the reproducibility of observed differences in predefined primary endpoints in trials adequately powered (> 80-90% power) to detect differences between treatments.
7
Probability That a Second Trial Would Find a P < 0.05 Effect if the Second Trial Were Identical to the First Trial

P Value in First Trial    Probability of P < 0.05 in Second Trial
0.10                      37%
0.05                      57%
0.01                      73%
0.005                     80%
0.001                     91%

O'Neill. Cont Clin Trials 1997;18:550-6.
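A minimal sketch of the usual way such reproducibility probabilities are computed, treating the observed effect as if it were the true effect; it reproduces the table approximately, though the published values may rest on slightly different assumptions or rounding.

```python
# Sketch: probability of two-sided P < 0.05 in an identical second trial,
# assuming the observed effect equals the true effect: roughly Phi(z_obs - 1.96).
from scipy.stats import norm

def prob_replication(p_first, alpha=0.05):
    z_obs = norm.isf(p_first / 2)    # Z corresponding to the two-sided P observed
    z_crit = norm.isf(alpha / 2)     # 1.96 for alpha = 0.05
    return norm.cdf(z_obs - z_crit)

for p in (0.10, 0.05, 0.01, 0.005, 0.001):
    print(p, round(prob_replication(p) * 100))
# roughly 38, 50, 73, 80, 91 percent -- close to the table above
```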
8
What If . . .
The event was not the primary endpoint of the study. The event was not precisely defined before the start of the trial. The trial was not adequately powered to detect a treatment difference.
9
This Frequently Happens . . .
With primary endpoints in pilot trials. With
secondary endpoints. With subgroup
analyses. With other measures of efficacy.
With physiological measurements. With
assessments of safety.
10
Things to Worry About When Analyzing Incidence of
Adverse Events
11
Things to Worry About When Analyzing Incidence of
Adverse Events
1. There are hundreds of adverse events
(multiplicity of comparisons).
12
Multiplicity of Comparisons
A typical large-scale clinical trial may describe as many as 500 individual terms describing adverse events. If a P value were calculated for each pairwise comparison, then by chance alone one would expect ≈ 25 events (5%) to have P < 0.05 and ≈ 5 events (1%) to have P < 0.01.
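A short sketch of the arithmetic behind these expectations, assuming for simplicity that the roughly 500 comparisons are independent and all null.

```python
# Sketch: expected number of nominally "significant" adverse-event comparisons
# when ~500 terms are each tested under the null, assuming independent tests.
from scipy.stats import binom

n_terms = 500
for alpha in (0.05, 0.01):
    expected = n_terms * alpha                      # 25 at 0.05, 5 at 0.01
    p_at_least_one = 1 - binom.pmf(0, n_terms, alpha)
    print(alpha, expected, round(p_at_least_one, 4))
```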
13
Things to Worry About When Analyzing Incidence of
Adverse Events
1. There are hundreds of adverse events
(multiplicity of comparisons). 2. Adverse
events are spontaneous (nonadjudicated)
reports.
14
Adverse Events Are Spontaneous (Nonadjudicated)
Reports
Adverse events are reported at the discretion of the investigator and then translated into standardized terms. There is little uniformity as to how an event is identified, defined or reported. Uncertainty is increased when the event is in a field remote from the investigator's focus.
15
Can We Fix This Problem by Blinded Post Hoc
Adjudication?
16
Can We Fix This Problem by Blinded Post Hoc
Adjudication?
Rules guiding post hoc adjudication are
inevitably influenced by knowledge that a
treatment effect has been seen. Any bar set by
the post hoc process can magnify or dilute the
effect. Adjudication is generally not applied
to those without a reported event.
17
Things to Worry About When Analyzing Incidence of
Adverse Events
1. There are hundreds of adverse events
(multiplicity of comparisons). 2. Adverse
events are spontaneous (nonadjudicated)
reports. 3. Analyses that depend on grouping of
events are subject to bias.
18
Thrombotic Cardiovascular Events
                            Placebo   Drug
Myocardial infarction          5       20
Stroke                         4        4
Sudden death                   5        6
Unstable angina                6        5
Pulmonary embolism             3        4
Arterial embolism              0        1
Transient ischemic attack      5        3
Venous thrombosis              4        6
19
Thrombotic Cardiovascular Events
                            Placebo   Drug
Myocardial infarction          5       10
Stroke                         4        8
Sudden death                   5        6
Unstable angina                6        5
Pulmonary embolism             3        4
Arterial embolism              0        1
Transient ischemic attack      5        3
Venous thrombosis              4        6
20
Thrombotic Cardiovascular Events
                            Placebo   Drug
Myocardial infarction          5        4
Stroke                         4        4
Sudden death                   5        9
Unstable angina                6       13
Pulmonary embolism             3        5
Arterial embolism              0        1
Transient ischemic attack      5       11
Venous thrombosis              4        6
21
Thrombotic Cardiovascular Events
                            Placebo   Drug
Myocardial infarction          5        4
Stroke                         4        4
Sudden death                   5        9
Unstable angina                6       13
Pulmonary embolism             3        5
Arterial embolism              0        1
Transient ischemic attack      5       11
Venous thrombosis              4        6
22
Analyses That Depend on Grouping of Adverse Events
It is best to develop a uniform definition of a group before classifying events. When the process of developing a definition is started after a concern has been raised, those creating the definition have frequently already looked at the data and know (subconsciously) what kind of definition is needed to capture the events of interest.
23
Things to Worry About When Analyzing Incidence of
Adverse Events
1. There are hundreds of adverse events
(multiplicity of comparisons). 2. Adverse
events are spontaneous (nonadjudicated)
reports. 3. Analyses that depend on grouping of
events are subject to bias. 4. Small number of
events results in extremely imprecise estimates.
24
With Small Number of Events, Lack of an Observed
Difference Does Not Rule Out Existence of True
Difference
Major Adverse Cardiovascular Event
Placebo: 25    Drug: 26
RR = 1.04, 95% CI (0.60, 1.79), P = 0.89
25
With Small Number of Events, the Finding of an
Observed Difference Does Not Prove Existence of
True Difference
Major Adverse Cardiovascular Event
Placebo: 13    Drug: 33
RR = 2.63, 95% CI (1.39, 5.00), P = 0.002
26
Effect Size (with 95% CI) in Trials With 20-500 Events, Assuming Lower Bound = 1.00
Each trial has 2 treatment groups, each with n = 1500
[Figure: hazard ratios (x-axis 1.0 to 8.0) with 95% confidence intervals whose lower bound is fixed at 1.00, for trials with 20, 50, 100, 200, and 500 events; the fewer the events, the larger the point estimate and the wider the interval]
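A minimal sketch of the relationship this figure illustrates, using the common approximation that the standard error of a log hazard ratio based on E events split 1:1 is about 2/sqrt(E); the exact values plotted on the slide may differ slightly.

```python
# Sketch: the hazard ratio whose 95% CI lower bound just reaches 1.00 for a
# given total number of events E, assuming SE(log HR) ~ 2/sqrt(E) with 1:1
# allocation. Fewer events force a more extreme estimate to reach significance.
import math

for events in (20, 50, 100, 200, 500):
    hr = math.exp(1.96 * 2 / math.sqrt(events))
    print(events, round(hr, 2))
# 20 -> ~2.40, 50 -> ~1.74, 100 -> ~1.48, 200 -> ~1.32, 500 -> ~1.19
```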
27
Trials Are Designed to Provide Precise Estimates
For Primary Endpoints
[Diagram: starting from trial design, the primary endpoint yields a very precise estimate, while a secondary endpoint or a specific adverse event yields a very imprecise estimate]
28
What Is Wrong With Imprecise Estimates?
29
What Is Wrong With Imprecise Estimates?
Imprecise estimates are fine if the intent is
to withhold judgment until more data are
collected to make the estimates more precise.
30
What Is Wrong With Imprecise Estimates?
Imprecise estimates are fine if the intent is
to withhold judgment until more data are
collected to make the estimates more precise.
Imprecise estimates are problematic if the
intent is to stop and reach a conclusion.
31
What Is Wrong With Imprecise Estimates?
Imprecise estimates are fine if the intent is
to withhold judgment until more data are
collected to make the estimates more precise.
Imprecise estimates are problematic if the
intent is to stop and reach a conclusion. When
calculated in the conventional manner, the 95% CIs (and the associated P value) of an estimate
have meaning primarily in the context of a
completed experiment.
32
What Is Wrong With Imprecise Estimates?
The adverse event data generated in a typical
trial is not the result of a completed
experiment.
33
What Is Wrong With Imprecise Estimates?
The adverse event data generated in a typical
trial is not the result of a completed
experiment. Viewed from the amount of data
needed for a precise estimate, the adverse
event data in a single study represents a
snapshot in an ongoing experiment to
characterize the safety of the drug.
34
What Is Wrong With Imprecise Estimates?
The adverse event data generated in a typical
trial is not the result of a completed
experiment. Viewed from the amount of data
needed for a precise estimate, the adverse
event data in a single study represents a
snapshot in an ongoing experiment to
characterize the safety of the drug.
Therefore, performing an analysis of adverse event data is akin to an interim analysis of primary endpoint data in an ongoing clinical trial.
35-42
Interim Monitoring in Group Sequential Trials
[Series of figures: treatment difference (Z score, -2.0 to 8.0) plotted against information time (% of expected events); successive slides build up the accruing Z statistic, the nominal α = 0.05 critical value, and the group sequential monitoring boundary]
43
Coronary Drug Project
[Figure: Z score (-3.0 to 5.0) over months 0-100 of follow-up; positive values favor clofibrate, negative values favor placebo]
44
Coronary Drug Project
[Figure: the same Z score plot with the nominal α = 0.05 threshold added]
45
Interim Monitoring in Group Sequential Trials
[Figure: Z score (-2.0 to 8.0) versus information time, with group sequential boundaries corresponding to α = 0.001, α = 0.01, and α = 0.05]
46
An Important Note
This problem is not the classic concern about multiplicity of comparisons. This type of problem exists even when there is only one endpoint, if the trial is underpowered. This problem is related to the imprecision inherent in estimates based on small numbers of events, an imprecision not adequately quantified by conventional approaches to the calculation of confidence intervals.
47
Concept
Reaching conclusions from data derived in an
underpowered trial raises the same concerns as
reaching conclusions based on an underpowered
interim analysis in a definitive, adequately
powered trial.
48
IMPRESS (Omapatrilat in Heart Failure)
Death or Hospitalization for Heart Failure
39 events, RR = 0.53, 95% CI (0.27-1.02), P = 0.053
[Figure: proportion with event (0.00-0.15) over days 0-240 from randomization; curves for ACE inhibitor (n = 284) and omapatrilat (n = 289)]
Rouleau et al. Lancet 2000;356:615-20.
49
OVERTURE Trial
Death or Hospitalization for Heart Failure
1887 events, RR = 0.94, 95% CI (0.86-1.03), P = 0.187
[Figure: event-free survival (0.0-1.0) over months 0-24; curves for omapatrilat (n = 2886) and ACE inhibitor (n = 2884)]
Packer et al. Circulation 2002;106:920-6.
50
Amlodipine in Heart Failure
All-Cause Mortality in Nonischemic Cardiomyopathy
            Placebo    Amlodipine   Hazard Ratio        Log-rank P-Value
PRAISE-1    74/212     45/209       0.55 (0.37, 0.79)   0.001
51
Amlodipine in Heart Failure
All-Cause Mortality in Nonischemic Cardiomyopathy
            Placebo    Amlodipine   Hazard Ratio        Log-rank P-Value
PRAISE-1    74/212     45/209       0.55 (0.37, 0.79)   0.001
PRAISE-2    262/826    278/826      1.09 (0.92, 1.29)   0.32
52
Experience With Vesnarinone and Losartan
N Engl J Med 1993;329:149-55. N Engl J Med 1998;339:1810-6. Lancet 1997;349:747-57. Lancet 2000;355:582-7.
53
Definitive Trial Shows Reversal of Effect Seen in
Earlier Pilot Trial
N Engl J Med 1993;329:149-55. N Engl J Med 1998;339:1810-6. Lancet 1997;349:747-57. Lancet 2000;355:582-7.
54
Magnesium in Myocardial Infarction
All-Cause Mortality
                Placebo       Magnesium     Hazard Ratio        Log-rank P-Value
Meta-analysis   53/644        25/644        0.45 (0.28, 0.71)   < 0.001
LIMIT-2         118/1157      90/1159       0.74 (0.55, 1.00)   0.04
55
Magnesium in Myocardial Infarction
All-Cause Mortality
                Placebo       Magnesium     Hazard Ratio        Log-rank P-Value
Meta-analysis   53/644        25/644        0.45 (0.28, 0.71)   < 0.001
LIMIT-2         118/1157      90/1159       0.74 (0.55, 1.00)   0.04
ISIS-4          2103/29039    2216/29011    1.06 (1.00, 1.12)   0.07
56
Metoprolol XL in Heart Failure
Effect on CHF Hospitalizations
            Placebo    Metoprolol   Hazard Ratio           Log-rank P-Value
RESOLVD     5/212      15/214       ------ (1.01, 5.63)    < 0.05
57
Metoprolol XL in Heart Failure
Effect on CHF Hospitalizations
            Placebo    Metoprolol   Hazard Ratio            Log-rank P-Value
RESOLVD     5/212      15/214       ------ (1.01, 5.63)     < 0.05
MERIT-HF    294/826    200/826      ----- (-----, -----)    < 0.001
58
What We Have Learned
To achieve statistical significance in an
underpowered analysis, the effect size must be
extreme and the estimate must be
imprecise. Yet, the more extreme the effects
and the more imprecise the estimates, the less
likely they are to be reproduced in definitive clinical trials.
59
Things to Worry About When Analyzing Incidence of
Adverse Events
1. There are hundreds of adverse events
(multiplicity of comparisons). 2. Adverse
events are spontaneous (nonadjudicated)
reports. 3. Analyses that depend on grouping of
events are subject to bias. 4. Small number of
events results in extremely imprecise estimates.
60
What To Do?
61
What To Do?
The most important first step is to develop
an approach to analyzing data in trials with
small numbers of events which accurately
reflects the true imprecision of the treatment
effect estimate and its statistical
significance.
62
How Should Such a Difference Between Treatment
Groups Be Interpreted?
Major Adverse Cardiovascular Event
Placebo: 13    Drug: 33
RR = 2.63, 95% CI (1.39, 5.00), P = 0.002
Assumes the critical value for α = 0.05 is Z = 1.96
63-65
Interim Monitoring in Group Sequential Trials
[Series of figures: Z score (-2.0 to 8.0) versus information time, with the α = 0.05 group sequential boundary]
66
How Should Such a Difference Between Treatment
Groups Be Interpreted?
Major Adverse Cardiovascular Event
Placebo: 13    Drug: 33
RR = 2.63, 95% CI (< 0.8, > 6.0), P > 0.10
If one assumes the critical value for α = 0.05 is the boundary Z score (across the range of effect sizes)
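A minimal sketch of the kind of boundary adjustment being described. The talk does not state which boundary was used; an O'Brien-Fleming-type rule (critical Z of roughly 1.96 / sqrt(information fraction)) is one common choice and illustrates why a nominal Z near 2.9 is unimpressive when only a small fraction of the eventual information is available.

```python
# Sketch: an O'Brien-Fleming-type group sequential boundary, one common choice
# (the talk does not specify its boundary). At information fraction t the
# critical Z is roughly 1.96 / sqrt(t), so a nominal Z of ~2.9 (P ~0.004) falls
# well below the boundary when few of the needed events have accrued.
import math

def obf_boundary(information_fraction, z_final=1.96):
    return z_final / math.sqrt(information_fraction)

observed_z = 2.9                      # roughly the Z for 13 vs 33 events
for t in (0.1, 0.25, 0.5, 1.0):
    print(t, round(obf_boundary(t), 2), observed_z > obf_boundary(t))
# boundaries: ~6.20, ~3.92, ~2.77, 1.96 -- the observed Z crosses only late
```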
67
Boundary-Adjusted Confidence Intervals
Appropriately describe the uncertainty inherent in the analysis of a small number of events, markedly reducing the false positive error rate. Yet, despite using a boundary-adjusted confidence interval, adverse effects that are known to be characteristic of specific drugs remain highly significant. Do not provide a way to interpret trends observed in imprecise data.
68
What Should We Do With Worrisome Trends in
Imprecise Data?
69
What Should We Do With Worrisome Trends in
Imprecise Data?
Believe in observed differences that are
biologically plausible.
70
An Indisputable Truth
Be wary of differences that are deemed real
based on biological plausibility, because
physicians can always be relied upon to propose a
biological mechanism to explain the validity of
an unexpected (and potentially preposterous)
finding that happens to have an interesting P
value.
71
What Should We Do With Worrisome Trends in
Imprecise Data?
Believe in observed differences that are
biologically plausible. Look for confirmatory
evidence in other studies (avoid being
selective).
72
Experience with Drug A
Major Adverse Cardiovascular Event
           Placebo   Drug
Trial 1      13       33
Trial 2      10       15
Trial 3       5        8
Trial 4       2        0
Trial 5       8       14
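A minimal sketch of the cumulative (fixed-effect, inverse-variance) meta-analysis Z that the following slides plot against monitoring boundaries. The per-trial sample sizes are not given on the slide, so 1500 patients per arm is assumed here purely for illustration.

```python
# Sketch: cumulative fixed-effect (inverse-variance, log relative risk)
# meta-analysis Z over the hypothetical trials of Drug A. Per-arm sample sizes
# are NOT given on the slide; n = 1500 per arm is assumed only for illustration.
# Trial 4 (2 vs 0 events) gets a simple 0.5 continuity correction.
import math

trials = [(13, 33), (10, 15), (5, 8), (2, 0), (8, 14)]   # (placebo, drug) events
n = 1500                                                 # assumed per-arm size

cum_num, cum_den = 0.0, 0.0
for i, (pbo, drug) in enumerate(trials, start=1):
    a, b = (drug + 0.5, pbo + 0.5) if 0 in (drug, pbo) else (drug, pbo)
    log_rr = math.log((a / n) / (b / n))
    var = 1/a - 1/n + 1/b - 1/n          # variance of log RR for this trial
    cum_num += log_rr / var              # inverse-variance weighted sum
    cum_den += 1 / var                   # sum of weights
    z = (cum_num / cum_den) / math.sqrt(1 / cum_den)
    print(f"after trial {i}: cumulative Z = {z:.2f}")
```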
73
The Cumulative Meta-Analyses
[Figure: cumulative meta-analysis Z score (-2.0 to 8.0) plotted against information time]
74-75
Boundaries for Cumulative Meta-Analyses
[Figures: cumulative meta-analysis Z score (-2.0 to 8.0) versus information time, with group sequential boundaries for α = 0.001, α = 0.01, and α = 0.05]
76
Boundaries for Mg-MI Meta-Analyses
[Figure: cumulative meta-analysis Z score for the magnesium-in-myocardial-infarction trials versus information time, with boundaries for α = 0.001, α = 0.01, and α = 0.05]
Pogue & Yusuf. Cont Clin Trials 1997;18:580-93.
77
What Should We Do With Worrisome Trends in
Imprecise Data?
Believe in observed differences that are
biologically plausible. Look for confirmatory
evidence in other studies (avoid being
selective). Carry out definitive trial with
the adverse event as primary endpoint (powered
to detect meaningful treatment difference).
78
Are Definitive Trials the Answer?
Sponsors pursue encouraging trends for important endpoints; most are non-confirmatory. Sponsors should pursue discouraging trends for important endpoints; most will be non-confirmatory. A definitive trial can address ascertainment and classification biases as well as concerns about multiplicity of comparisons and imprecision of data.
79
Just Pause and Think
If you observed an increased frequency of a
serious adverse effect in a clinical trial, how
easy would you think it would be to carry out a
trial intended to definitively evaluate this
risk?
80
Do We Need to Be So Certain When Evaluating
Safety Instead of Efficacy?
81
Do We Need to Be So Certain When Evaluating
Safety Instead of Efficacy?
We are strict in reaching conclusions about efficacy because saying that there is a benefit when there is none means millions will be treated unnecessarily and subjected to side effects and costs. Some might advocate being less strict in reaching conclusions about safety, but saying that there is an adverse effect when there is none means millions will be deprived of an effective treatment.
82
Conclusions
The findings of controlled clinical trials are
most easily interpreted when they represent the
principal intent of the study. A
non-principal finding is subject to many
interpretative difficulties, including
ascertainment biases and inflated false
positive rates due to the multiplicity of
comparisons and imprecision of estimates
inherent in analysis of small numbers. The FDA,
industry and academia remain in a quandary as
to how to respond in a responsible fashion to
observed differences in reported frequencies of
adverse events.
83
A Final Personal Note
My presentation should not be construed as favoring any particular side in the current debate. It is my view that, regardless of one's position, it is critical to understand the limitations of what we know and to resist the temptation to reach conclusions before we are justified in doing so. Only by recognizing our ignorance will we be able to take the first step towards developing a rational approach that is in the interest of all patients.