Title: VII-1
1Part VII Philosophy of Interim Monitoring
Christopher S. Coffey, PhD
Department of Biostatistics
School of Public Health
University of Alabama at Birmingham
SCT Pre-Conference Workshop
Fundamentals of Clinical Trials
2OUTLINE
- Overview of Interim Monitoring
- Efficacy Monitoring
- Futility Monitoring
- Adaptive Designs
3INTERIM MONITORING
Data and Safety Monitoring Boards (DSMBs) are
often given the responsibility of monitoring the
accumulating data. The DSMB is responsible for
assuring that study participants are not exposed
to unnecessary or unreasonable risks. The DSMB is
also responsible for assuring that the study is
being conducted according to high scientific and
ethical standards.
4INTERIM MONITORING
- Why have DSMBs?
  - Protect safety of trial participants
  - Investigators are in a natural conflict of interest
    - Vested in the study
    - They, and their staff, are paid by the study
  - Having the DSMB externally review efficacy and safety data protects
    - The credibility of the study
    - The validity of study results
5INTERIM MONITORING
Principle 1 - Responsibilities. The primary
responsibilities of a DSMB are to safeguard the
interests of study patients and to preserve the
integrity and credibility of the trial.
Principle 2 - Composition. The DSMB should have
multidisciplinary representation, including topic
experts from relevant medical specialties and
biostatisticians.
6INTERIM MONITORING
Principle 3 - Conflicts. Individuals with
important conflicts of interest (financial,
intellectual, professional, or regulatory) should
not serve on a DSMB.
Principle 4 - Confidentiality Issues. Trial
integrity requires DSMB members not to discuss
details of meetings elsewhere.
7INTERIM MONITORING
DSMBs should periodically review study data. The
study protocol should include a section
describing the proposed plan for interim data
monitoring.
- This plan should detail
  - What data will be monitored
  - The timing of all interim analyses
  - The frequency of data reviews
  - Criteria that will guide early termination
8INTERIM MONITORING
Frequency of DSMB meetings depends on the disease
topic and the specific intervention; most meet 1-4
times per year. Early in the trial, DSMB review
will focus more on safety, quality of conduct,
and trial integrity rather than on efficacy
evaluation.
9INTERIM MONITORING
Later meetings may include formal efficacy or
futility analyses. Ethical principles mandate
that clinical trials begin with uncertainty as to
which treatment is better. This uncertainty
should be maintained during the study. If interim
data become sufficiently compelling, ethics would
demand that the trial stop and the results be made
public.
10INTERIM MONITORING
Hence, interim monitoring of safety and efficacy
data has become an integral part of modern
clinical trials. Any efficacy or safety data
analyzed by treatment arm will be discussed only
in a closed session. Only the DSMB members and
study statistician will attend the closed
session. It is critical not to reveal information
presented in closed session to the study
investigators, except as explicitly authorized by
the DSMB.
11INTERIM MONITORING
- A typical agenda for a DSMB meeting
  - Closed executive session
    - Review of agenda, additions to agenda
  - Open session with investigators
    - Review current status and conduct of study
    - Accrual update
  - Closed session with unblinded investigators
    - Review safety data
    - Review interim analysis (if appropriate)
  - Closed executive session
  - Open session with investigators
    - Discussion/Recommendations
12INTERIM MONITORING
- Early termination of a trial should be considered if
  - Interim data indicate intervention is harmful
  - Interim data demonstrate a clear benefit
  - Significant difference by end of study is probable
  - No significant difference by end of study is probable
  - Severe logistical or data quality problems exist
The DSMB may recommend that the study protocol be
terminated, temporarily suspended, or amended.
13INTERIM MONITORING
At the conclusion of each meeting, the DSMB makes
a recommendation to the sponsor
- Study should continue without modification
- Study should continue with the following modifications
- Study should be stopped for safety/efficacy/futility
The DSMB will also summarize any areas of concern
regarding performance and safety. Soon
thereafter, the DSMB chair will provide a written
summary of the board's recommendations.
14INTERIM MONITORING
The decision to stop a trial early is complex,
requiring a combination of statistical and
clinical judgment. Stopping a trial too late
means needlessly delaying some participants from
receiving the better treatment. Stopping a trial
too early may fail to persuade others to change
practice. Statistical methods have been developed
for interim monitoring of clinical trials to
minimize the role of subjective judgment.
15INTERIM MONITORING
- Safety vs. Efficacy
  - Efficacy is the assessment of whether there is a meaningful difference on the primary outcomes of the study
  - Safety involves all other aspects of differences between groups in adverse outcomes
    - Focus is on monitoring of adverse events
16EFFICACY MONITORING
In the clinical trial setting, practical
considerations dictate that interim looks at the
data occur after groups of patients have completed
the study. The appropriate interim efficacy
monitoring plan depends on the goals of the
trial. Flexibility, in number and timing of
analyses, can be built into the interim
monitoring plan.
17EFFICACY MONITORING
Consider a clinical trial to compare two normally
distributed groups with K interim analyses. The
objective of the trial is to test the null
hypothesis of no treatment effect at each interim
analysis: $H_0: \delta = 0$ vs. $H_A: \delta \neq 0$,
where $\delta$ equals the difference between
treatment means. At each interim analysis, the
null hypothesis is tested using the test
statistics $Z_1, \ldots, Z_K$ ($Z_k$ is the
Z-statistic for all data observed up to the time
of the kth interim analysis).
18EFFICACY MONITORING
Under $H_0$ (no difference between groups), repeated
testing at level $\alpha$ inflates the probability of
making at least one type I error. Even 5-10 tests
can lead to serious misinterpretation of trial
results.
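The inflation can be checked with a short simulation. The sketch below is an illustration only, under assumptions that are not in the slides: equally sized groups, a known variance, and an unadjusted two-sided critical value of 1.96 applied at every look. The function name and the simulation settings are hypothetical.

```python
import random


def overall_type1_error(n_looks, critical=1.96, n_sims=20000, seed=1):
    """Estimate P(|Z_k| > critical at some look k) when H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        total = 0.0  # running sum of standard-normal group summaries
        for k in range(1, n_looks + 1):
            total += rng.gauss(0.0, 1.0)  # data from one more group
            z_k = total / k ** 0.5        # Z-statistic on all data so far
            if abs(z_k) > critical:
                rejections += 1
                break                     # trial stops at first rejection
    return rejections / n_sims


for looks in (1, 2, 5, 10):
    print(looks, round(overall_type1_error(looks), 3))
```

With a single look the estimate should sit near 0.05. With five unadjusted looks it should land well above that, in the neighborhood of 0.14, which is the inflation the slide describes.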
19EFFICACY MONITORING
The solution is to adjust the stopping boundaries
so that the overall type I error is equal to $\alpha$.
- Pocock (1977) described stopping boundaries with the same critical value at each interim look
- O'Brien & Fleming (1979) proposed a sequential plan where the nominal significance levels needed to reject $H_0$ increase as the study progresses
- Haybittle & Peto (1976) suggested a simple form of sequential monitoring where $H_0$ is rejected at an interim test ($k < K$) only if $|Z_k| > 3$
20EFFICACY MONITORING
A comparison of the critical values for the
Pocock, O'Brien-Fleming, and Haybittle-Peto
methods for $K = 5$ looks and $\alpha = 0.05$ is given
below
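Critical values of this kind can be sanity-checked by simulation. The sketch below is not the slide's table. It uses commonly tabulated two-sided constants for five equally spaced looks (2.413 for Pocock, and 2.040 scaled by the square root of K/k for O'Brien-Fleming), which are assumptions brought in for illustration. The Haybittle-Peto row assumes an unadjusted final value of 1.96.

```python
import random

K = 5  # number of equally spaced looks

# Tabulated constants for K = 5, two-sided alpha = 0.05 (assumed, not from the slides).
POCOCK = [2.413] * K
OBRIEN_FLEMING = [2.040 * (K / k) ** 0.5 for k in range(1, K + 1)]
HAYBITTLE_PETO = [3.0] * (K - 1) + [1.96]  # final test left unadjusted


def overall_alpha(bounds, n_sims=40000, seed=7):
    """Monte Carlo estimate of P(|Z_k| > bounds[k-1] for some k) under H0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        total = 0.0
        for k, bound in enumerate(bounds, start=1):
            total += rng.gauss(0.0, 1.0)      # one more group of data
            if abs(total / k ** 0.5) > bound:  # boundary crossed at look k
                hits += 1
                break
    return hits / n_sims


for name, bounds in [
    ("Pocock", POCOCK),
    ("O'Brien-Fleming", OBRIEN_FLEMING),
    ("Haybittle-Peto", HAYBITTLE_PETO),
]:
    print(name, [round(b, 3) for b in bounds], round(overall_alpha(bounds), 3))
```

The Pocock and O'Brien-Fleming estimates should both come out close to 0.05. The Haybittle-Peto estimate should be only slightly higher, because the interim threshold of 3 spends very little type I error.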
21EFFICACY MONITORING
There is a slight loss of power with multiple
testing. To account for this, sample size
calculations must adjust the sample size upward.
- This is accomplished by the following process
  - Compute the required sample size under a fixed sample design.
  - Multiply this sample size by an appropriate ratio to account for the multiple testing.
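The two steps above can be sketched as follows. The fixed-design formula is the usual normal approximation for comparing two means with equal group sizes. The inflation ratio of 1.03 is a made-up illustrative value, not one taken from the slides; in practice the ratio depends on the boundary type, the number of looks, $\alpha$, and power, and is read from published tables or software.

```python
from math import ceil
from statistics import NormalDist


def fixed_n_per_group(delta, sigma, alpha=0.05, power=0.90):
    """Per-group n for a two-sided, two-sample z-test (fixed sample design)."""
    z = NormalDist().inv_cdf
    return 2 * (z(1 - alpha / 2) + z(power)) ** 2 * sigma ** 2 / delta ** 2


def group_sequential_n_per_group(delta, sigma, inflation, alpha=0.05, power=0.90):
    """Step 2: scale the fixed-design n by the inflation ratio, then round up."""
    return ceil(inflation * fixed_n_per_group(delta, sigma, alpha, power))


# Hypothetical planning values: difference of 5 units, SD of 10.
print(ceil(fixed_n_per_group(5, 10)))                        # fixed design
print(group_sequential_n_per_group(5, 10, inflation=1.03))   # illustrative ratio
```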
22EFFICACY MONITORING
The original methodology for group sequential
boundaries required that the number and timing of
interim analyses be specified in advance. DSMBs
sometimes may require more flexibility as
beneficial or harmful trends emerge. Similarly,
it may be more convenient to tie interim looks to
calendar time rather than information time
related to sample size. In this case, the number of
patients changes unpredictably between looks and
one needs to find a way to deal with random
increments of information.
23EFFICACY MONITORING
A simple, practical alternative is to spend type
I error as a function of the time elapsed from the
start of the trial to the kth analysis, expressed
as the fraction $t_k = n_k/N$. Lan and DeMets
(1983, 1989) proposed an alpha spending function
which provides more flexible group sequential
boundaries. With the $\alpha$-spending function,
neither the number nor the exact timing of interim
analyses needs to be specified in advance.
24EFFICACY MONITORING
The selected $\alpha$-spending function determines the
rate at which overall type I error is spent during
the trial. The big advantage of $\alpha$-spending
functions is their flexibility, since they do not
require pre-specifying the number or timing of
looks. The approach lends itself well to the
accommodation of irregular, unpredictable, and
unplanned interim analyses.
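As an illustration, the sketch below evaluates two spending functions commonly associated with Lan and DeMets: an O'Brien-Fleming-type function, $\alpha(t) = 2 - 2\Phi(z_{\alpha/2}/\sqrt{t})$, and a Pocock-type function, $\alpha(t) = \alpha \ln(1 + (e - 1)t)$. The slides do not state which functional forms were intended, so these are assumptions, and the irregular look times are hypothetical.

```python
from math import e, log, sqrt
from statistics import NormalDist

_NORMAL = NormalDist()


def obf_type_spending(t, alpha=0.05):
    """Cumulative alpha spent by information fraction t (0 < t <= 1)."""
    z = _NORMAL.inv_cdf(1 - alpha / 2)
    return 2 * (1 - _NORMAL.cdf(z / sqrt(t)))


def pocock_type_spending(t, alpha=0.05):
    """Cumulative alpha spent by information fraction t (0 <= t <= 1)."""
    return alpha * log(1 + (e - 1) * t)


# Irregular, unplanned look times (hypothetical).
look_times = [0.20, 0.45, 0.70, 1.00]
previous_obf = previous_pocock = 0.0
for t in look_times:
    obf, pocock = obf_type_spending(t), pocock_type_spending(t)
    # The increment is the alpha available at this look.
    print(t, round(obf - previous_obf, 5), round(pocock - previous_pocock, 5))
    previous_obf, previous_pocock = obf, pocock
```

Both functions spend exactly $\alpha$ by $t = 1$, whatever the number or spacing of looks. Converting the increments into critical values requires numerical integration over the joint distribution of the test statistics, which is not shown here.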
25FUTILITY MONITORING
Power is the probability that a clinical trial
will detect a pre-defined treatment effect of
interest. Very low power
implies that a trial is unlikely to reach
statistical significance even if there is a true
effect. One should never begin a trial with low
power. However, sometimes low power becomes
apparent only after a trial is well under way.
26FUTILITY MONITORING
Once a trial begins and data become available,
the probability that a significant treatment
effect will be detected can be recalculated. If
conclusions are known for certain, regardless
of future outcomes, early termination should be
considered (e.g., a sports team clinching a
pennant).
27FUTILITY MONITORING
This idea has been developed for clinical trials
and is referred to as stochastic curtailment.
- Simple Curtailment: A study is stopped as soon as the result is inevitable (i.e., it could not be reversed)
- Stochastic Curtailment: A study is stopped as soon as the result is highly probable given current data
In the stochastic curtailment framework, a decision
to continue or terminate the study at each interim
look is based on the likelihood of observing a
positive or negative treatment effect if continued
to the planned end.
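The difference between the two rules can be illustrated with a toy setting that is not from the slides: a study with a fixed number of binary outcomes that rejects $H_0$ if the final count of successes reaches a threshold. All names and numbers below are hypothetical, and the 0.9 cutoff for "highly probable" is an arbitrary choice.

```python
from math import comb


def prob_final_rejection(successes, observed, total, needed, p):
    """P(final successes >= needed | data so far), future outcomes ~ Bernoulli(p)."""
    remaining = total - observed
    still_needed = max(needed - successes, 0)
    return sum(
        comb(remaining, x) * p ** x * (1 - p) ** (remaining - x)
        for x in range(still_needed, remaining + 1)
    )


def curtailment_decision(successes, observed, total, needed, p, gamma=0.9):
    remaining = total - observed
    # Simple curtailment: the result can no longer be reversed.
    if successes >= needed:
        return "stop (simple): rejection is certain"
    if successes + remaining < needed:
        return "stop (simple): rejection is impossible"
    # Stochastic curtailment: the result is highly probable, not certain.
    prob = prob_final_rejection(successes, observed, total, needed, p)
    if prob >= gamma:
        return "stop (stochastic): rejection is highly probable"
    if prob <= 1 - gamma:
        return "stop (stochastic): rejection is highly improbable"
    return "continue"


# 20 planned outcomes, 15 successes needed to reject H0.
print(curtailment_decision(4, 10, 20, 15, p=0.5))  # at most 14 successes possible
print(curtailment_decision(8, 10, 20, 15, p=0.5))  # still undecided
```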
28FUTILITY MONITORING
Stochastic curtailment uses the concept of
conditional power: $P_k(\delta) = \Pr(\text{reject } H_0 \mid \delta, \text{data observed so far})$.
Initially, when $k = 0$, this is the usual power
function. At the planned termination of the study
(stage $K$), this probability is either 0 or 1. At
interim stage $k$, conditional power depends on
$\delta$.
29FUTILITY MONITORING
- If early results show
  - Intervention better than expected → conditional power high
  - Intervention worse than expected → conditional power low (unless sample size increased)
Group sequential methods focus on existing
data. Stochastic curtailment methods consider
future data.
30FUTILITY MONITORING
Clearly, the futility rule is heavily influenced
by the assumed value of the treatment difference,
$\delta$. Making an overly optimistic assumption
about $\delta$ delays the decision to terminate the
trial.
- Several options for the value of $\delta$ have been proposed
  - Lan, Simon, Halperin (1982): Evaluated at the value of $\delta$ corresponding to the alternative hypothesis
  - Evaluated under the null hypothesis
  - Evaluated at the observed treatment effect
31FUTILITY MONITORING
These calculations are most frequently made when
interim data are viewed to be unfavorable. Here,
conditional power represents probability that
current unfavorable trend would improve
sufficiently to yield evidence of benefit by
scheduled end of trial. If interim prediction
indicates trial is unlikely to be positive,
ethical and financial considerations suggest
early termination of the trial. The trial is
stopped for futility once the conditional power
drops below some specified level (e.g., 20%).
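A minimal sketch of a conditional power calculation is given below. It uses the Brownian-motion (B-value) approximation, which the slides do not name, and assumes a one-sided test at level 0.025 with the trial designed for 90% power. The three assumed effects correspond to the alternative hypothesis, the null hypothesis, and the observed trend. The interim values and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_NORMAL = NormalDist()


def design_drift(alpha=0.025, power=0.90):
    """Expected final Z-statistic under the alternative the trial was sized for."""
    return _NORMAL.inv_cdf(1 - alpha) + _NORMAL.inv_cdf(power)


def conditional_power(z_k, t, drift, alpha=0.025):
    """P(final Z > critical value | interim Z = z_k at information fraction t < 1)."""
    b = z_k * sqrt(t)                   # B-value: Z rescaled to information time
    critical = _NORMAL.inv_cdf(1 - alpha)
    return 1 - _NORMAL.cdf((critical - b - drift * (1 - t)) / sqrt(1 - t))


z_k, t = 0.5, 0.5                        # hypothetical interim result at halfway
cp_alternative = conditional_power(z_k, t, design_drift())
cp_null = conditional_power(z_k, t, 0.0)
cp_trend = conditional_power(z_k, t, z_k / sqrt(t))  # observed effect continues

FUTILITY_LEVEL = 0.20                    # example threshold
for label, cp in [("alternative", cp_alternative), ("null", cp_null), ("trend", cp_trend)]:
    print(label, round(cp, 3), "below threshold" if cp < FUTILITY_LEVEL else "above threshold")
```

At $t = 0$ the function returns the design power, matching the usual power function before any data are seen. For the same interim result, the assumed effect changes the answer substantially, which is why an optimistic assumption delays a futility decision.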
32FUTILITY MONITORING
One limitation of conditional power is that no
adjustment is made to account for the associated
prediction error if the observed treatment effect
is used. Interim futility monitoring may also be
conducted using other approaches
- Predictive Power: Mixed Bayesian-Frequentist approach
- Predictive Probability: Bayesian approach
33SOFTWARE
- Software packages for group sequential methods
- S+SeqTrial (Insightful Corporation)
- EaST (Cytel)
- PEST 4 (University of Reading)
- LanDeM (University of Wisconsin)
- SAS (through the use of Macros)
34EXAMPLE
The Secondary Prevention of Small Subcortical
Strokes (SPS3) study consists of two randomized,
multi-center clinical trials to simultaneously
assess the impact of two therapies
- Antiplatelet therapy: Aspirin 325 mg/day vs. Aspirin 325 mg/day plus Clopidogrel 75 mg/day (double blind, placebo-controlled)
- Two target levels of blood pressure control: usual (130-149 mmHg) vs. intensive (<130 mmHg)
35EXAMPLE
Hypothesis 1: Effect of Antiplatelet Therapy on
Recurrent Stroke
- Analysis
  - Kaplan-Meier/Log Rank
  - Cox Regression Models (to control for covariates)
- Assumptions
  - Assumed 7% annual rate in ASA/placebo group
  - Assumed 10% Loss to Follow-Up
  - Average follow-up time of 3 years
  - 90% power to detect 25% decrease using 5% significance level
→ Sample size of n = 2500 required
36EXAMPLE
- Survival Curves Based on SPS3 Power Calculations
- 7% rate/yr in ASA/Placebo Group (black)
- 25% reduction in ASA/Clopidogrel group (red)
37EXAMPLE
An interim efficacy analysis allows early
stopping if the effect is larger than hypothesized.
- 7% rate/yr in ASA/Placebo Group (black)
- 50% reduction in ASA/Clopidogrel group (red)
38EXAMPLE
An interim futility analysis allows early
stopping if the effect is smaller than hypothesized.
- 7% rate/yr in ASA/Placebo Group (black)
- 5% reduction in ASA/Clopidogrel group (red)
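The curves for the three scenarios can be approximated numerically. The sketch below assumes a constant hazard (exponential survival) and treats the 7% annual rate as a hazard of 0.07 per person-year. The slides specify neither, so both are assumptions made for illustration.

```python
from math import exp

CONTROL_HAZARD = 0.07  # events per person-year, assumed constant over time


def survival(t_years, hazard):
    """Event-free probability at time t under an exponential model."""
    return exp(-hazard * t_years)


def event_free_curve(relative_reduction, years=(0, 1, 2, 3)):
    """Event-free probabilities for a given relative reduction in hazard."""
    hazard = CONTROL_HAZARD * (1 - relative_reduction)
    return [round(survival(t, hazard), 3) for t in years]


print("ASA/placebo  ", event_free_curve(0.00))
print("25% reduction", event_free_curve(0.25))  # design assumption
print("50% reduction", event_free_curve(0.50))  # larger effect: efficacy stop
print(" 5% reduction", event_free_curve(0.05))  # smaller effect: futility stop
```

Under these assumptions the 5% reduction curve stays very close to the control curve over three years, while the 50% reduction curve separates early, which is the contrast the three slides draw.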
39EXAMPLE
- SPS3 Interim Analysis Plan
  - Recommend two interim efficacy analyses to take place after 1/3 and 2/3 of primary events have been observed
  - Should an interim analysis lead to stopping randomization to one of the interventions
    - The winning treatment will be given to all subjects
    - Randomization to alternative treatments will continue for the other intervention
40EXAMPLE
At the time of the planned interim analyses, a
series of conditional power calculations will be
performed to assess
- The interaction, using observed effect size
- The antiplatelet therapy arm, using both hypothesized and observed effect size
- The blood pressure control arm, using both hypothesized and observed effect size
41ADAPTIVE DESIGNS
There may be limited information to guide initial
choices for study planning. Since more knowledge
will accrue as the study progresses, adaptive
designs allow these elements to be reviewed
during the trial. An adaptive design allows for
changing or modifying the characteristics of a
trial based on cumulative information.
42ADAPTIVE DESIGNS
Greater flexibility within the adaptive design
framework can translate into
- Improved efficiency of trial design
- Use of fewer patients within trials
- Better use of available resources
- More efficient drug development
- Reduce time and cost of drug development
- Take the correct dose into phase III
- More winners to the market faster
43ADAPTIVE DESIGNS
Recently, there has been considerable research on
adaptive designs (also called flexible or
innovative designs). The rapid proliferation of
interest in adaptive designs and inconsistent use
of terminology has created confusion about
similarities and differences among the various
techniques. For example, the definition of an
adaptive design itself is a common source of
confusion.
44ADAPTIVE DESIGNS
PhRMA Adaptive Designs Working Group
- Co-Chairs
- Michael Krams
- Brenda Gaydos
- Member Authors
- Keaven Anderson
- Suman Bhattacharya
- Alun Bedding
- Don Berry
- Frank Bretz
- Christy Chuang-Stein
- Sylva Collins
- Vlad Dragalin
- Paul Gallo
- Brenda Gaydos
- Michael Krams
- Qing Liu
- Jeff Maca
- Inna Perevozskaya
- Members
- Zoran Antonijevic
- Roy Baranello
- Michael Branson
- Carl-Fredrik Burman
- Nancy Burnham
- Daniel Burns
- Bob Clay
- Chris Coffey
- David DeBrota
- Alex Dmitrienko
- Jennifer Dudinak
- Greg Enas
- Richard Entsuah
- Parvin Fordipour
- Sam Givens
- Ekkehard Glimm
- Andy Grieve
- Shu Han
- Members (cont.)
- David Henry
- Melissa Himstedt
- Tony Ho
- Roger Lewis
- Gary Littman
- Cyrus Mehta
- Wili Maurer
- Allan Pallay
- Michael Poole
- Bob Parker
- Yili Pritchett
- Jerry Schindler
- Jonathan Smith
- Don Stanski
- Joel Waksman
- Bill Wang
- Gernot Wassmer
45ADAPTIVE DESIGNS
- Vision of AD Working Group
  - To establish a dialogue with clinicians, regulators, and other lines within the pharmaceutical industry, health authorities, and academia
  - To turn adaptive designs into a respected approach across all phases of clinical drug development
  - To educate, set expectations for high quality standards, and share experiences on case studies
46ADAPTIVE DESIGNS
PhRMA Working Group on Adaptive Designs (2006):
"By adaptive design we refer to a clinical study
design that uses accumulating data to modify
aspects of the study as it continues, without
undermining the validity and integrity of the
trial." Changes are made by design, and not on an
ad hoc basis. Adaptation is not a remedy for
inadequate planning.
47ADAPTIVE DESIGNS
- Infinite number of adaptive design possibilities
- Many aspects of the study can be changed
  - dosing
  - sample size
  - final test statistic
  - treatment allocation ratio
  - primary endpoint
  - inclusion/exclusion criteria
  - number of treatment arms
  - randomization procedure
  - number of interim looks
  - goal (superiority to non-inferiority)
- In all cases, objectives should be clearly defined and the operating characteristics should be well understood (e.g., impact on type I error rate).
48ADAPTIVE DESIGNS
[Diagram: classification of flexible designs. Labels recovered from the figure:]
- Flexible Designs: Planned, Unplanned
- Adaptive Designs
- ???
- Sample Size Re-Estimation
- Estimated Treatment Effect (Known Variance)
- Internal Pilots (Estimated Nuisance Parameters)
- Estimated Effect Size
- Adaptive Randomization
- Seamless Phase II/III Designs
- Adaptive Dose-Response
- Change Other Aspects (Test Statistic, Primary Endpoint, Inclusion/Exclusion Criteria, Dose, etc.)
49ADAPTIVE DESIGNS
Historically, a great deal of controversy
surrounding adaptive designs has been focused
around a particular type of sample size
re-estimation design
50ADAPTIVE DESIGNS
Methods exist that allow the use of sample size
re-estimation methods based on a revised
treatment effect without inflating the type I
error rate. Then, why are such methods
controversial? The controversy arises due to the
argument as to whether there is any benefit above
and beyond that achieved with a standard GS
design. Mehta and Patel (2006) provided an
example which illustrates the controversy.
51ADAPTIVE DESIGNS
- A sample size re-estimation based on revised estimates of treatment effect is nearly always less efficient than a group sequential approach!
  - Tsiatis & Mehta (2003)
  - Jennison & Turnbull (2003, 2006)
- Researchers should carefully think about what constitutes an effect of interest.
- Researchers should avoid choosing power to detect whatever effects are observed in a given set of data.
  - Random treatment effect gives random power estimate
52ADAPTIVE DESIGNS
With internal pilot (IP) designs, modifications
are based only on re-estimated nuisance
parameters. All data are used in the final
analysis, and there is no interim testing, only
an interim power analysis. With small samples, the
risk of type I error inflation may offset the
benefits of the unadjusted test.
53ADAPTIVE DESIGNS
With moderate to large sample sizes, there is
minimal (if any) type I error rate inflation with
the unadjusted test. Thus, IP designs can be used
in large randomized clinical trials to assess key
nuisance parameters and make appropriate
modifications with little cost in terms of
inflated type I error rate!
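A minimal sketch of an internal pilot recalculation for a two-group comparison of means follows. The variance is the nuisance parameter, while the effect of interest, $\delta$, stays fixed at its planning value. The pilot data, the pooled-variance estimator, and the rule of never going below the pilot size are illustrative assumptions and do not describe a specific published procedure.

```python
from math import ceil
from statistics import NormalDist, variance


def n_per_group(delta, sigma2, alpha=0.05, power=0.90):
    """Per-group n for a two-sided, two-sample comparison (normal approximation)."""
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 * sigma2 / delta ** 2)


def reestimated_n_per_group(pilot_a, pilot_b, delta, alpha=0.05, power=0.90):
    """Recompute n from the pooled pilot variance; delta is NOT re-estimated."""
    n_a, n_b = len(pilot_a), len(pilot_b)
    pooled = ((n_a - 1) * variance(pilot_a) + (n_b - 1) * variance(pilot_b)) / (n_a + n_b - 2)
    # Never plan for fewer subjects than are already enrolled in the pilot.
    return max(n_per_group(delta, pooled, alpha, power), n_a, n_b)


planned = n_per_group(delta=5, sigma2=100)   # planning guess: SD = 10
pilot_a = [0.0, 10.0, 20.0, 30.0]            # hypothetical pilot outcomes
pilot_b = [5.0, 15.0, 25.0, 35.0]
print(planned, reestimated_n_per_group(pilot_a, pilot_b, delta=5))
```

Here the pilot variance is larger than the planning guess, so the recalculated sample size goes up even though the targeted treatment effect is unchanged.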
54ADAPTIVE DESIGNS
Why are IP designs not being used more often,
since there are clear scientific and statistical
benefits with very little penalty in large
clinical trials? For example, it is useful to
re-estimate sample size using actual accrual
rates with time-to-event data. IP designs may not
be used more frequently because they are confused
with more controversial techniques that use
revised estimates of treatment effect to adjust
sample size. Discussions such as these should
clarify this issue.
55ADAPTIVE DESIGNS
Adaptive designs are NOT new. The broad
definition includes topics such as group
sequential designs and covariate adaptive
randomization techniques. However, because this
is a rapidly expanding area of research, more
practical experience is needed. Both Bayesian and
Frequentist approaches should be considered.
56ADAPTIVE DESIGNS
Adaptive designs are NOT always better. You can
adapt too quickly. An adaptive design may not
answer the question faster (e.g., it may require
larger sample sizes). One always needs to evaluate
the benefits of an AD over standard fixed designs.
Specifically, one needs to understand the operating
characteristics of the AD by conducting simulations
to assess performance under realistic scenarios.
57ADAPTIVE DESIGNS
The AD Working Group has established an external
webpage: http://biopharmnet.com/doc/doc12004.html
This provides a central location for
publications, training courses, and other
documents created by the working group to
facilitate the sharing of knowledge. Full white
papers of various topics were published in a
special issue of the Drug Information Journal
(Vol. 40, No. 4, 2006)
58SUMMARY
There are a variety of approaches for interim
monitoring of clinical trial data. The
relationship between clinical trials and practice
is very complex, and this complexity is evident
in the data monitoring process. The appropriate
monitoring plan depends on the goals of the trial.
59SUMMARY
Because of the repercussions of stopping a trial
early, the decision to stop a trial is complex
and requires both statistical and clinical
judgment. Hence, these methods should not be used
as a sole basis in the decision to stop or
continue a trial. Other considerations that play
an important role in the decision-making process
cannot be fully addressed within the statistical
sequential testing framework.
60SUMMARY
There is much confusion as to the meaning of the
term adaptive design since it has been used to
refer to a variety of situations. As a result,
many perceive such designs as controversial. The
PhRMA working group on adaptive designs recently
proposed a general definition which states that
adaptive designs allow use of accumulating data
to modify aspects of an ongoing study. Adaptive
by DESIGN: thorough upfront planning required.