Title: Statistical Issues in Contraceptive Trials
- Daniel L. Gillen, PhD
- Department of Statistics
- University of California, Irvine
- FDA Reproductive Drugs Advisory Committee Meeting, Jan 23-24
Minimum requirements of a clinical trial
- Appropriate target population
- Use of appropriate comparison groups
- Use of appropriate outcome measure
- Ability to maintain statistical criteria for evidence
- Controlling type I and II errors in the frequentist setting
Outline
- Outcome measures
- Pearl Index vs. life-table methods
- Comparison populations
- Historical vs. active control trials
- Defining statistical evidence
- Testing for superiority vs. non-inferiority
Outcome Measures: Pearl Index vs. Life-Table Methods
The Pearl Index
- The Pearl Index (number of pregnancies per 100 woman-years of exposure) is a common measure used to summarize contraceptive effectiveness
- However, a drawback of the Pearl Index is that in most situations it depends on the duration of follow-up and must be interpreted accordingly
- This dependence occurs because the baseline risk of pregnancy within the study sample changes as follow-up proceeds
Ex: Sensitivity of the Pearl Index to duration of follow-up
- Suppose our study population consists of two groups
- Low risk group (90% of population)
- Constant risk of pregnancy
- 1-year probability of pregnancy is 5%
- High risk group (10% of population)
- Constant risk of pregnancy
- 1-year probability of pregnancy is 50%
Ex (cont'd): One-year Pearl Index
- Now consider the Pearl Index calculated over the first year
- Expected number of pregnancies
- $5000(0.90 \times 0.05 + 0.10 \times 0.50) = 475$
- Expected person-years at risk with censoring at pregnancy (pregnancies assumed to occur at mid-year on average)
- $4525 \times 1 + 475 \times 0.5 = 4762.5$
- Pearl Index
- $(475 / 4762.5) \times 100 = 9.97$ pregnancies per 100 woman-years
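The one-year arithmetic above can be reproduced in a few lines. This is a minimal sketch under the slide's assumptions (5000 women, constant risk within each group, censoring at pregnancy, pregnancies occurring at mid-year on average); the variable names are illustrative.

```python
# Two-group example: (share of population, one-year probability of pregnancy)
N = 5000
GROUPS = [(0.90, 0.05), (0.10, 0.50)]

# Expected pregnancies during the first year
pregnancies = sum(N * share * prob for share, prob in GROUPS)

# Women who do not become pregnant contribute a full year of exposure;
# women who do are censored at pregnancy, assumed to occur at mid-year on average
woman_years = (N - pregnancies) * 1.0 + pregnancies * 0.5

# Pearl Index: pregnancies per 100 woman-years
pearl_index = 100 * pregnancies / woman_years
print(f"{pregnancies:.1f} pregnancies, {woman_years:.1f} woman-years, PI = {pearl_index:.2f}")
```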
Ex (cont'd): Two-year Pearl Index
- For the Pearl Index calculated over 2 years, we need to consider the impact of censoring at pregnancy, which depletes the high risk group faster than the low risk group
- By the end of one year
- Number left in low risk group: $5000 \times 0.90 \times (1 - 0.05) = 4275$
- Number left in high risk group: $5000 \times 0.10 \times (1 - 0.50) = 250$
- Percent of the remaining risk set in the high risk group at one year is $250 / (4275 + 250) = 250 / 4525 \approx 5.5\%$
Ex (cont'd): Two-year Pearl Index
- Now consider the Pearl Index calculated between years 1 and 2
- Expected number of pregnancies occurring between 1 and 2 years of follow-up
- $4275 \times 0.05 + 250 \times 0.50 = 338.75$
- Expected person-years at risk between year 1 and year 2
- $(4525 - 338.75) \times 1 + 338.75 \times 0.5 \approx 4355.6$ person-years
- Pearl Index calculated between years 1 and 2
- $(338.75 / 4355.6) \times 100 \approx 7.78$ pregnancies per 100 woman-years
Ex (cont'd): Two-year Pearl Index
- Now consider the Pearl Index calculated over 2 years
- Expected number of pregnancies observed over 2 years
- $475 + 338.75 = 813.75$
- Expected person-years at risk over 2 years
- $4762.5 + 4355.6 = 9118.1$ person-years
- Pearl Index calculated over 2 years
- $(813.75 / 9118.1) \times 100 \approx 8.92$ pregnancies per 100 woman-years
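To see how the Pearl Index drifts with longer follow-up, the same calculation can be iterated year by year. A sketch under the same assumptions as the example (constant within-group risk, censoring at pregnancy, mid-year timing of pregnancies); `cumulative_pearl_index` is a hypothetical helper, not an established function.

```python
def cumulative_pearl_index(n, groups, years):
    """Expected Pearl Index (per 100 woman-years) over `years` years of follow-up.

    groups: list of (share of population, one-year probability of pregnancy),
    with a constant risk within each group. Women are censored at pregnancy,
    which is assumed to occur at mid-year on average.
    """
    at_risk = [n * share for share, _ in groups]  # women still at risk, by group
    total_pregnancies = 0.0
    total_woman_years = 0.0
    for _ in range(years):
        pregnancies = [a * prob for a, (_, prob) in zip(at_risk, groups)]
        survivors = [a - p for a, p in zip(at_risk, pregnancies)]
        total_pregnancies += sum(pregnancies)
        total_woman_years += sum(survivors) + 0.5 * sum(pregnancies)
        at_risk = survivors  # the high risk group is depleted faster
    return 100 * total_pregnancies / total_woman_years


GROUPS = [(0.90, 0.05), (0.10, 0.50)]
for years in (1, 2, 3, 5):
    print(years, round(cumulative_pearl_index(5000, GROUPS, years), 2))
```

With a single homogeneous group the result does not change with `years`, in line with the conditions for time-independence discussed next.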
When is the Pearl Index independent of study support?
- The Pearl Index will change with the length of follow-up unless
- The rate of pregnancy is homogeneous across all possible subgroups, and
- This rate remains constant over time
When is the Pearl Index independent of study support?
- In the previous example, it should be noted that even if we allow participants with failures to re-enter the risk set, the Pearl Index will still depend upon time
- This is because a failure results in less at-risk time, so total years of follow-up will be proportionately smaller in the high risk group as the duration of maximal follow-up increases
A further issue in quantifying the Pearl Index
- Most confidence intervals for the Pearl Index assume a Poisson distribution
- This distribution has variance equal to its mean (or rate)
- However, count or rate data are typically overdispersed relative to the Poisson distribution
- That is, the true variance in the rate that we observe is larger than the Poisson distribution assumes
- Overdispersion in Poisson rates typically arises from heterogeneity of patient populations
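The link between heterogeneity and overdispersion follows from the law of total variance: if a woman's count $Y$ over exposure $t$ is Poisson given her rate $\lambda$, then $\mathrm{Var}(Y) = t\,E[\lambda] + t^2\,\mathrm{Var}(\lambda)$, which exceeds the mean $t\,E[\lambda]$ whenever $\lambda$ varies. The sketch below computes this ratio for a discrete mixture. Converting the example's one-year probabilities to constant hazards, and allowing repeat events, are simplifying assumptions, so it illustrates the mechanism rather than reproducing any figure in these slides.

```python
import math


def mixture_dispersion(shares, rates, exposure=1.0):
    """Var(Y) / E(Y) when Y | rate ~ Poisson(rate * exposure) and the rate
    follows a discrete mixture (shares sum to 1)."""
    mean_rate = sum(s * r for s, r in zip(shares, rates))
    var_rate = sum(s * (r - mean_rate) ** 2 for s, r in zip(shares, rates))
    mean = exposure * mean_rate
    variance = exposure * mean_rate + exposure ** 2 * var_rate  # law of total variance
    return variance / mean


# Illustrative assumption: constant hazards matching 5% and 50% one-year probabilities
rates = [-math.log(1 - 0.05), -math.log(1 - 0.50)]
print(mixture_dispersion([1.0], [rates[0]]))    # homogeneous group: ratio 1
print(mixture_dispersion([0.90, 0.10], rates))  # heterogeneous mixture: ratio > 1
```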
Computation of confidence intervals for the Pearl Index
- Consider our previous example with a low risk and a high risk group
- Low risk group (90% of population)
- Constant risk of pregnancy
- 1-year probability of pregnancy is 5%
- High risk group (10% of population)
- Constant risk of pregnancy
- 1-year probability of pregnancy is 50%
Computation of confidence intervals for the Pearl Index
- We previously calculated the (true) 1-year Pearl Index to be 9.97 pregnancies per 100 woman-years
- Suppose that in reality we observed 457 pregnancies over 1 year with a total of 4763 years of follow-up, resulting in a Pearl Index of 9.60 per 100 woman-years
- Assuming a Poisson distribution, the corresponding 95% confidence interval for the 1-year Pearl Index would be (8.73, 10.51)
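The slides do not say which Poisson interval was used. One standard choice is the exact (Garwood) interval, which inverts the Poisson CDF; a minimal, dependency-free sketch is below, and for 457 pregnancies in 4763 woman-years it gives an interval close to the (8.73, 10.51) quoted above. The function names are illustrative.

```python
import math


def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean), with each term computed in log space."""
    if k < 0:
        return 0.0
    if mean <= 0:
        return 1.0
    log_mean = math.log(mean)
    return sum(math.exp(i * log_mean - mean - math.lgamma(i + 1)) for i in range(k + 1))


def solve_decreasing(f, target, lo, hi, iters=200):
    """Bisection for f(x) = target, assuming f is decreasing on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid  # still above target: the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exact_pearl_ci(pregnancies, woman_years, level=0.95):
    """Exact (Garwood) Poisson CI for the Pearl Index, per 100 woman-years."""
    alpha = 1.0 - level
    k = pregnancies
    # Lower limit: mean at which P(X >= k) = alpha/2, i.e. P(X <= k - 1) = 1 - alpha/2
    lower = 0.0 if k == 0 else solve_decreasing(
        lambda m: poisson_cdf(k - 1, m), 1.0 - alpha / 2, 0.0, float(k))
    # Upper limit: mean at which P(X <= k) = alpha/2
    search_max = k + 20.0 * math.sqrt(k + 1) + 20.0
    upper = solve_decreasing(lambda m: poisson_cdf(k, m), alpha / 2, float(k), search_max)
    return 100.0 * lower / woman_years, 100.0 * upper / woman_years


ci_lower, ci_upper = exact_pearl_ci(457, 4763)
print(f"Pearl Index 95% CI: ({ci_lower:.2f}, {ci_upper:.2f})")
```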
Computation of confidence intervals for the Pearl Index
- However, because the Pearl Index is really composed of a mixture of Poisson distributions (from the high and low risk groups), the true variance is actually 19.2% larger than assumed by the usual (single) Poisson model
- This means that we have underestimated the variance, i.e., our confidence interval is shorter than it should be!
- In this case, a 95% confidence interval accounting for the heterogeneity of groups is (8.63, 10.55)
- This is approximately 8% wider than the previous interval
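One simple way to build the heterogeneity into the interval is to inflate the Poisson variance by a dispersion factor (a quasi-Poisson style adjustment). The sketch below assumes a normal-approximation (Wald) interval and plugs in the 19.2% inflation stated above; under those assumptions the adjusted interval comes out close to (8.63, 10.55). How the inflation factor itself is estimated is not shown.

```python
import math
from statistics import NormalDist


def wald_pearl_ci(pregnancies, woman_years, dispersion=1.0, level=0.95):
    """Normal-approximation CI for the Pearl Index (per 100 woman-years).

    The count's variance is taken as dispersion * pregnancies: dispersion = 1 is the
    usual Poisson assumption, dispersion > 1 allows for overdispersion.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    rate = 100.0 * pregnancies / woman_years
    se = 100.0 * math.sqrt(dispersion * pregnancies) / woman_years
    return rate - z * se, rate + z * se


plain = wald_pearl_ci(457, 4763)
adjusted = wald_pearl_ci(457, 4763, dispersion=1.192)  # variance 19.2% larger
print(plain, adjusted)
```

Because both intervals in this sketch are Wald intervals, the adjusted one is wider by exactly $\sqrt{1.192} \approx 1.09$; the 8% figure above compares against the Poisson interval quoted earlier, which differs from the plain Wald interval.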
How to deal with the changing composition of the risk set?
- We illustrated one way in our example
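- Consider the probability of failure at specific time points by using conditional probability
- For example, if $T$ is the time of failure, we can compute the probability of failure within two years as
- $\Pr[T < 2] = 1 - \Pr[T > 2]$
- $= 1 - \Pr[T > 2 \mid T > 1]\,\Pr[T > 1]$
- $= 1 - (1 - 0.0749)(1 - 0.095) \approx 0.163$
- where $\Pr[T < 1] = 475/5000 = 0.095$ and $\Pr[T < 2 \mid T > 1] = (4275 \times 0.05 + 250 \times 0.50)/4525 \approx 0.0749$ are conditional probabilities of failure, not rates per woman-year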
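The product of conditional survival probabilities extends to any number of intervals. A minimal sketch, with the two-group model's year-1 and year-2 conditional failure probabilities computed directly from the group shares and one-year probabilities (the helper name is illustrative):

```python
def life_table_failure(conditional_probs):
    """Cumulative failure probability 1 - prod(1 - q_j), where q_j is the
    probability of failure in interval j given no failure before it."""
    survival = 1.0
    for q in conditional_probs:
        survival *= 1.0 - q
    return 1.0 - survival


# Two-group example: 90% with 5% and 10% with 50% one-year probability
q1 = 0.90 * 0.05 + 0.10 * 0.50                               # Pr[T < 1]
q2 = (0.90 * 0.95 * 0.05 + 0.10 * 0.50 * 0.50) / (1.0 - q1)  # Pr[T < 2 | T > 1]
two_year = life_table_failure([q1, q2])
print(round(q1, 4), round(q2, 4), round(two_year, 4))
```

The result agrees with the direct two-year calculation $0.90(1 - 0.95^2) + 0.10(1 - 0.50^2)$, which is a useful check that the conditioning is set up correctly.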
How to deal with the changing composition of the risk set?
- This is called a life-table estimate
- In the setting of contraceptive failure, these conditional probabilities are typically computed monthly to track changes in the risk set more accurately (see, e.g., Potter, 1966)
- When the life-table estimate is evaluated at all (distinct) failure times, it is called a Kaplan-Meier estimate
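For individual-level data, the Kaplan-Meier estimator takes the life-table product over every distinct failure time. A minimal sketch with Greenwood standard errors, assuming exact (e.g., monthly) failure and censoring times and the usual convention that women censored at a failure time are still at risk at that time; the toy data are hypothetical.

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate with Greenwood standard errors.

    times: follow-up time for each woman (e.g. months of use)
    events: 1 if follow-up ended in pregnancy, 0 if censored
    Returns (time, survival, standard_error) at each distinct failure time.
    """
    failures = Counter(t for t, e in zip(times, events) if e == 1)
    survival, greenwood_sum, result = 1.0, 0.0, []
    for t in sorted(failures):
        n_at_risk = sum(1 for u in times if u >= t)  # under observation just before t
        d = failures[t]
        survival *= 1.0 - d / n_at_risk
        if n_at_risk > d:
            greenwood_sum += d / (n_at_risk * (n_at_risk - d))
        result.append((t, survival, survival * math.sqrt(greenwood_sum)))
    return result


km = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
for t, s, se in km:
    print(t, round(s, 3), round(se, 3))
```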
Are there any benefits to using the Pearl Index?
- Clearly, the Pearl Index has been in wide use
- The reasons for this are
- Ease of interpretation
- Although the Kaplan-Meier estimator also has a clinically relevant interpretation (probability of failure over T years of use)
- For historically controlled trials, a great deal of data has been summarized in terms of the Pearl Index
- This will, of course, change as Kaplan-Meier estimates grow in popularity in the field
Can we incorporate changing treatment regimens?
- Patients may discontinue use or use additional contraceptives for some intervals of time
- Technically, the Kaplan-Meier estimator could incorporate such interruptions through right censoring and delayed re-entry (left truncation)
- However, it is not clear when patients should re-enter the risk set
Can we incorporate changing treatment regimens?
- For example, consider the case where a participant uses back-up contraception during the interval $(t_1, t_2)$
- This individual could be considered at risk for the interval $(0, t_1)$ and then re-entered into the risk set at time $t_2$
- However, by doing this we are implicitly assuming that this person's hazard (or risk of pregnancy) at time $t_2$ is the same as that of all others who have been at risk over $(0, t_2)$
- This does not seem a reasonable assumption to me, and I would advise against it
Can we incorporate changing treatment regimens?
- Another option for incorporating changing treatment regimens would come from post-hoc analyses
- Stratified Kaplan-Meier estimates
- Number of strata could become large
- Time-dependent covariates
- E.g., consider a proportional hazards framework
Regardless of the measure, what defines a failure and who is at risk?
- For all new interventions we must consider
- Safety: Are there adverse effects that clearly outweigh any potential benefit?
- Efficacy: Can the intervention reduce the probability of unintended pregnancy in a beneficial way?
- Effectiveness: Would adoption of the intervention as a standard reduce the probability of unintended pregnancy in the population?
Regardless of the measure, what defines a failure and who is at risk?
- One difference between evaluation of efficacy and effectiveness is in what defines a failure and who should be included in the risk set
- In a clinical trial setting we can truly evaluate only efficacy, because of possible selection bias of patients entering contraceptive trials
- However, even in the clinical trial setting it is useful to evaluate
- Intervention failure rates during actual use (including inconsistent or incorrect use)
- Intervention failure rates during perfect use
- (see, e.g., Trussell, Contraception, 2004)
Regardless of the measure, what defines a failure and who is at risk?
- To assess true method efficacy, counting only method failures during perfect use, we must include only perfect-use exposure in the risk set
- We also need to consider whether those who are lost to follow-up should be considered at risk all the way up to the time of drop-out
- One reasonable approach is to censor patients three months prior to the time at which they become lost to follow-up (Trussell, SIM, 1991)
Historical vs. Active Control Trials
Historical control trials vs. active control trials
- In the past, many methods have been assessed via a historical control trial
- E.g., a Pearl Index of 1.5 (or, more recently, 2) or less has been used as an efficacy criterion
- Such criteria stem from the experience of historical controls
- However, biases resulting from historical control studies can be numerous, particularly when study samples are not comparable with respect to baseline risk, evaluative measure of outcome, or duration of study
Criteria for superiority in historical control trials
- As noted, past studies have considered point estimates of the (one-year) Pearl Index of less than 1.5 or 2 unintended pregnancies per 100 woman-years
- However, we must also acknowledge the uncertainty of these estimates
- EMEA requires a sample size sufficient to guarantee that the width of the 95% CI for the Pearl Index is no larger than 1
- Better (in my opinion) to require that the upper bound of the CI is less than the chosen threshold
- In either case, if the Pearl Index is used, the previous notes on computation of the CI need to be considered
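The width requirement translates into a rough planning figure for total exposure. A sketch assuming Poisson counts and a normal-approximation interval, so that the full two-sided CI width is about $2z\sqrt{100\,\mathrm{PI}/W}$ for $W$ woman-years; overdispersion, as discussed earlier, raises the requirement. The helper name and its `dispersion` parameter are hypothetical.

```python
import math
from statistics import NormalDist


def woman_years_for_ci_width(expected_pi, max_width=1.0, level=0.95, dispersion=1.0):
    """Woman-years W needed so the two-sided CI for the Pearl Index is at most max_width wide.

    Uses Var(PI) ~= dispersion * 100 * PI / W, where PI is per 100 woman-years and the
    pregnancy count is Poisson (dispersion = 1) or overdispersed (dispersion > 1).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.ceil((2 * z / max_width) ** 2 * 100 * expected_pi * dispersion)


for pi in (1.0, 1.5, 2.0):
    print(pi, woman_years_for_ci_width(pi))
```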
Historical control trials vs. active control trials
- Because it is impossible to guarantee comparability between historical controls and current study samples, it is almost always advantageous to employ randomization when ethically feasible
- Given the wide use of standard contraceptives, it is not feasible to consider a placebo-controlled trial
- However, one can (and should) consider the use of an active control when comparable interventions are in use
- This also allows for comparison of the entire survival curve (log-rank test or proportional hazards model?)
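Comparing whole curves with the log-rank test needs only the risk-set counts at each failure time. A minimal stdlib sketch of the standard two-sample statistic, using the hypergeometric variance so that tied failure times are handled; a proportional hazards fit would normally come from a survival analysis package and is not sketched here.

```python
import math


def logrank(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test. Returns (chi-square statistic on 1 df, two-sided p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
           [(t, e, 1) for t, e in zip(times_b, events_b)]
    failure_times = sorted({t for t, e, _ in data if e == 1})
    o_minus_e = 0.0  # observed minus expected failures in group A
    variance = 0.0
    for t in failure_times:
        n = sum(1 for u, _, _ in data if u >= t)
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        d = sum(1 for u, e, _ in data if u == t and e == 1)
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance if variance > 0 else 0.0
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


print(logrank([1, 2, 3, 4], [1, 1, 1, 1], [5, 6, 7, 8], [1, 1, 1, 1]))
```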
Superiority vs. Non-Inferiority in Active Control Trials
Superiority vs. non-inferiority in active control trials
- Statistical criteria for evidence in a superiority trial
- Evidence to rule out equality of effect as measured by the chosen parameter (e.g., Pearl Index, 1-year survival estimate, or a hazard ratio)
- Example
- Contrast may be the difference in 1-year failure rates as measured by the Kaplan-Meier estimator: $KM_{Tx}(1) - KM_{AC}(1)$
- Test $H_0: KM_{Tx}(1) - KM_{AC}(1) \geq 0$
- vs. $H_1: KM_{Tx}(1) - KM_{AC}(1) < 0$
- Rejection of the null hypothesis corresponds to the upper bound of the CI for $KM_{Tx}(1) - KM_{AC}(1)$ being less than 0
Superiority vs. non-inferiority in active control trials
- Statistical criteria for evidence in a non-inferiority trial
- Evidence to rule out efficacy worse than that of the active control by more than some margin
- Example
- Contrast may be the difference in 1-year failure rates as measured by the Kaplan-Meier estimator: $KM_{Tx}(1) - KM_{AC}(1)$
- Test $H_0: KM_{Tx}(1) - KM_{AC}(1) \geq \delta$
- vs. $H_1: KM_{Tx}(1) - KM_{AC}(1) < \delta$ for some $\delta > 0$
- Rejection of the null hypothesis corresponds to the upper bound of the CI for $KM_{Tx}(1) - KM_{AC}(1)$ being less than $\delta$
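Both decision rules reduce to comparing the upper confidence bound for $KM_{Tx}(1) - KM_{AC}(1)$ with a cutoff: 0 for superiority, $\delta$ for non-inferiority. The sketch below assumes independent randomized arms, a normal approximation, and a standard error supplied for each Kaplan-Meier estimate (for example from Greenwood's formula); the input numbers are hypothetical.

```python
import math
from statistics import NormalDist


def compare_failure_probs(km_tx, se_tx, km_ac, se_ac, margin=0.0, level=0.95):
    """Wald CI for KM_Tx(1) - KM_AC(1) and whether H0: difference >= margin is rejected.

    margin = 0 gives the superiority test; margin = delta > 0 gives non-inferiority.
    Rejection means the upper bound of the two-sided CI lies below the margin.
    """
    diff = km_tx - km_ac
    se = math.sqrt(se_tx ** 2 + se_ac ** 2)  # independent randomized arms
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (diff - z * se, diff + z * se)
    return ci, ci[1] < margin


# Hypothetical 1-year failure probabilities and standard errors
ci, superior = compare_failure_probs(0.020, 0.004, 0.025, 0.004)
_, noninferior = compare_failure_probs(0.020, 0.004, 0.025, 0.004, margin=0.01)
print(ci, superior, noninferior)
```

With these inputs the new method would be declared non-inferior at $\delta = 0.01$ without being shown superior, which is the typical situation the non-inferiority framing is meant to address.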
Superiority vs. non-inferiority in active control trials
- When is it reasonable to consider non-inferiority instead of superiority?
- ICH E10 guidelines
- Active control treatment must truly be active in the study population
- If the active control is truly active in the study population
- Can a margin to define non-inferiority be established?
- If the active control is standard of care, is the new treatment also superior on secondary endpoints?
Superiority vs. non-inferiority in active control trials
- Issues in setting the non-inferiority margin
- What measure compares distributions?
- Is the treatment effect random?
- How much of a decrease in effect is acceptable?
- How to account for variability in the estimate(s) from historical trials?
Superiority vs. non-inferiority in active control trials
- Precedent for setting the non-inferiority margin
- Is the treatment effect random?
- Ideally use a meta-analysis of multiple trials
- Careful! Do the trials have the same duration of follow-up?
- How much of a decrease in effect is acceptable?
- 10%, 20%, 50% of the active control effect?
- How to account for variability in the estimate(s) from historical trials?
- Use the worst case from the historical 95% CI?
- Explicitly account for variability in the historical trials
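The fixed-margin reasoning on this slide can be sketched in a few lines: pool historical estimates of the active control's effect, take the worst case (lower bound) of the pooled 95% CI, and allow the new method to lose only a chosen fraction of it. This assumes a fixed-effect (inverse-variance) meta-analysis and an effect scale where larger values mean a more effective control; if the effect is considered random, a random-effects pooling would typically give a wider interval and hence a smaller margin. The function names and all inputs are hypothetical.

```python
import math
from statistics import NormalDist


def pooled_effect_ci(estimates, std_errors, level=0.95):
    """Fixed-effect (inverse-variance) pooled estimate and CI across historical trials."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    se_pooled = math.sqrt(1.0 / sum(weights))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return pooled - z * se_pooled, pooled + z * se_pooled


def ni_margin(estimates, std_errors, fraction_lost, level=0.95):
    """Margin = fraction of the active-control effect we are willing to lose,
    applied to the worst case (lower bound) of the pooled historical CI."""
    lower, _ = pooled_effect_ci(estimates, std_errors, level)
    if lower <= 0:
        raise ValueError("historical CI does not exclude zero effect; no margin can be justified")
    return fraction_lost * lower


# Hypothetical historical effect estimates and standard errors
for fraction in (0.1, 0.2, 0.5):
    print(fraction, ni_margin([0.10, 0.12], [0.02, 0.03], fraction))
```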
Summary
- Need to define an appropriate target population, comparison group, and outcome measure, and to maintain statistical criteria for evidence
- The Pearl Index is (usually) implicitly dependent on the length of follow-up, whereas Kaplan-Meier (life-table) estimates make this dependence explicit
- In either case, we need to obtain correct inference (CIs), and the definition of the risk set must correspond to the definition of failure
- When ethically and logistically possible, active controls should be used
- If historical controls are used, uncertainty should be accounted for in defining superiority criteria