Title: Statistical aspects of clinical research
1 Statistical aspects of clinical research
2 Outline
- Why is clinical research hard?
- Key statistical concerns
- Get the correct answer to the right question, using the appropriate number of subjects
- Key components of a clinical trial
- Clear, feasible, appropriate study objective(s)
- Target patient population
- Study design: visit and evaluation schedule
- Efficacy and safety endpoints
- Sample size
- Analysis methods
- Next week
- Interim analyses: early termination?
- Subgroup analyses
3 Clinical research is not for sissies
- Answering even relatively simple questions under the best conditions (a controlled clinical trial) can be tricky. Possible sources of bias abound and, if appropriate safeguards are not taken, may combine to give a false or misleading conclusion
- Some of the factors which make clinical research hard
- Formulating the right scientific question can be deceptively tricky
- Logistical complexity, especially the need to use multiple sites
- Trial conduct is highly interdisciplinary, requiring sustained, well-coordinated effort from many groups
- Staggered recruitment of subjects; uncertainty about accrual pattern is unavoidable
- Patient dropout, particularly in longer trials
- Potential for the goalpost to move mid-trial: unforeseen events can destroy, or severely reduce, the relevance of the study even before it ends
4 Laws governing clinical trial conduct ¹
- Lasagna's Law
- The prevalence of any disease under study drops dramatically once study enrolment opens up, and returns to previous levels only once enrolment closes
- Murphy's Law
- Anything that can go wrong, will go wrong
- In particular, the most egregious breach of protocol instructions will occur at the highest-enrolling site
- Giltinan's Law
- The quality of data obtained from any site is inversely proportional to the degree of exaltation of the thought leader or principal investigator at the site (in extreme cases, the role of thought leader is so all-consuming that delays in filing the necessary paperwork result in actual enrolment levels close to zero)
¹ Clearly, all just different manifestations of Murphy's Law
5 Strategy to tactics: protocol development
- A key concern is that each individual study protocol must achieve its goals, not just on its own terms; it must also make sense within the broader picture
- A major practical issue is the ever-changing nature of the landscape: the long duration of most trials, and the uncertainty about the results, mean that the original target may have shifted by completion of a given trial
- Nonetheless, a key requirement when designing any trial is that the proposed design should give the best chance possible of enabling the development plan to proceed to the next stage, once results from the trial become available
- The previous condition should be met even when results do not correspond to the desired answer. It is important to remember that a failed clinical trial is not one which fails to give the desired answer, but rather one which fails to give an unambiguous answer
6 Study objectives should be clear, specific, and relevant
- Phase III objectives are determined primarily by (i) target product profile (think desired label claim) and (ii) norms for the given disease
- Primary and secondary objectives should map readily to corresponding statistical hypotheses
- Safety objectives are given greater emphasis in Phases I and II; Phase III focuses on efficacy and safety
- Objectives should be specified as precisely as possible. At a minimum, include information on
- What measure of efficacy/safety will be used?
- Key features of the target patient population
- Dosing regimen, i.e. amount, frequency, and route of dosing
- Preferable to use neutral language when specifying objectives (personal opinion). Phrases like "to compare (investigate) the efficacy" or "to characterize the pharmacokinetics" are preferable to, e.g., "to demonstrate efficacy" or "to establish superiority"
7 Protocol Tip 1: Specify clear study objectives
- Examples
- "To investigate the effect of a single 5 mg dose of rhwonderprotein, administered by transgenic snakebite, on clotting ability in Irish clergymen, as measured by the change from baseline in prothrombin time", rather than "To demonstrate the efficacy of rhwonderprotein in improving clotting ability"
- "To investigate the effect of twice daily SC injection of 40 µg/kg of rhIGF-I for 12 weeks on glycemic control, in subjects with moderate to severe Type II diabetes, as measured by the average change from baseline in HbA1c, compared to subjects in the placebo group"
8 Bias: sources and precautions
Sources
- Selection bias
- Allocation bias
- Evaluation bias (observer/instrument)
- Recall bias
- Time (systematic change in patient population, treatment, or other aspect of study conduct as trial progresses)
- Withdrawal / dropout patterns
- Lack of compliance with study protocol
- Unblinding (of patient, physician, or study personnel)
Precautions
- Unambiguous eligibility criteria
- Randomization, stratification, blinding
- Blinding, standardization (training, or central evaluation)
- Appropriate data collection instruments
- Balanced treatment allocation; protocol should specify salient details of study conduct, avoiding room for differential interpretations
- Pre-specified analysis conventions, sensitivity analyses
- Training; engaged study coordinators at site
- Randomized allocation; suitable precautions surrounding treatment codes and drug inventory/supply
9 Bias: the statistician's arch-nemesis
- Loosely speaking, bias arises as a result of
- Groups differ at baseline w.r.t. an important prognostic factor
- Groups differ w.r.t. some aspect of study conduct that could affect response
- Key statistical tools against bias are
- Randomization (allocation of subjects to treatment groups is randomized)
- Blinding
- Stratification
- Uniform implementation of study procedures across study sites is also critical. Differences may complicate interpretation, or compromise generalizability of results. Of particular concern
- Different interpretation of eligibility criteria
- Systematic differences across sites in how key variables are measured
10 Bias, efficiency, and generalizability
- Trial design and execution should
- Avoid bias: a wrong, or misleading, result
- Generalize to the target population of interest: avoid an irrelevant result
- Be efficient: avoid using more subjects than necessary
- Studies which are inadequately powered, or otherwise deficiently designed, may be viewed as particularly inefficient (and ethically dubious)
11 Randomization
- Randomization is the basis for statistical inference
- A significance level represents the probability that differences in outcome at least as large as those observed could be the result of random fluctuations alone
- Without randomization, a statistically significant difference may be the result of non-random differences in the distribution of unknown prognostic factors
- Randomization does not ensure that groups are medically equivalent, but it distributes the unknown biasing factors randomly
- Randomization plays an important role in the generalization of the observed clinical trial data
12 Randomization: Practical Tips
- If prognostic factors are known, use randomization methods that can account for them
- Stratification / blocking
- Adaptive randomization
- If possible, randomize patients within a site
- Patients enrolled early may differ from patients enrolled later
- Watch out for staggered enrollment
- Temporary closing of study sites or arms can cause problems
- Protocol amendments that affect inclusion/exclusion criteria may be tricky
- Even in open label studies, randomization codes should be locked
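The stratification / blocking tip can be illustrated with a permuted-block schedule generated separately for each site. This is only a minimal sketch: the function name, the arm labels, the block size of 4 and the site names are all assumptions made for illustration, and a real trial would use a validated randomization system rather than a script like this.

```python
import random

def permuted_block_schedule(n_subjects, arms=("A", "B"), block_size=4, seed=None):
    """Return a list of treatment assignments built from randomly permuted blocks.

    Within every complete block each arm appears equally often, so the arms
    stay balanced even if enrolment stops early or the patient mix drifts.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_subjects:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)          # random order within the block
        schedule.extend(block)
    return schedule[:n_subjects]

# Stratify by site: one independent schedule per site (site names are hypothetical).
sites = ["site_01", "site_02"]
schedules = {site: permuted_block_schedule(12, seed=i) for i, site in enumerate(sites)}
for site, schedule in schedules.items():
    print(site, "".join(schedule))
```

Because balance is restored after every block, patients enrolled early and late at the same site are split evenly between arms, which is the point of randomizing within a site.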
13 Blinding
- Randomization does not guarantee that there will be no bias from subjective judgment in evaluating and reporting the treatment effect
- Such bias can be minimized by concealing the identity of treatment (blinding)
- Types of blinding
- Challenges
- Ethical considerations
- Unblinding procedures for safety reasons
- Unblinding procedures at final analysis
14 Protocol Tip 2: Avoid Ambiguity
- Protection against certain types of bias is through appropriate design precautions (stratification, randomization, blinding)
- Other types of bias are prevented only by giving unambiguous instructions to the sites on the intended patient population and how all aspects of the study should be conducted
- Sites will sniff out each ambiguity in the protocol, and interpret and execute the instructions more divergently than you can imagine
- There is vagueness regarding key aspects of study conduct, e.g. use of con meds, evaluation schedule, endpoint definition, handling of dropouts, how key evaluations will be carried out, etc.
- Major divergence in interpretation (e.g. in deciding eligibility, or how to measure a key response variable)
- has the potential to torpedo the protocol entirely
- may not become evident until it's too late
15 Protocol Tip 3: Accommodating multiple sites
- As a routine precaution, it is advisable to limit the contribution to enrolment of any single site to no more than 15% of the total. Note that this limit is generally not specified explicitly in protocol text, but is communicated to sites at study initiation nonetheless
- Non-standard evaluations may require intensive training of site personnel to reduce systematic differences in evaluation among sites
- Centralized (blinded) evaluation, when feasible, is often the best option
- It is a good idea to develop a prospective publication strategy, securing upfront buy-in from key stakeholders
- A plan and timetable for disseminating study results should be developed, following existing SOPs, and communicated to sites prospectively
16 Protocol Tip 3: Accommodating multiple sites
- Regular, frequent communication with sites is important
- Early monitoring of key variables is advisable, to allow problems to be detected and fixed early
- Appropriate mechanisms should be in place to allow evaluation of aggregated safety data in a timely fashion (remember that individual sites may not be able to discern adverse patterns, based only on their data)
- Each team member should try to attain at least a basic understanding of the role of every other team member
17 Endpoints (1)
- Discussion here will focus primarily on efficacy endpoints
- What about other kinds of endpoints?
- Pharmacokinetic endpoints are generally standard parameters derived from the observed concentration-time profiles
- Safety endpoints also tend to be fairly standard; most are common across protocols, with occasional disease/drug-specific markers
- Incidence of adverse events (general, protocol-specified, by body system, etc.)
- Changes in key laboratory parameters
- Incidence of antibodies (neutralizing or not)
- Pharmacodynamic endpoints, in contrast, are measures of activity, and will vary from study to study. Recommendations for efficacy endpoints apply.
18 Endpoints (2): General Remarks
- No problem in Phase I, where focus is primarily on safety and PK endpoints. Limited sample sizes preclude formal evaluation of efficacy; if it must be mentioned in the protocol, it is preferable to refer to "activity", rather than "efficacy"
- Drug approval requires establishing an acceptable risk-benefit profile. It is important to bear in mind that the regulatory expectation is that of clinical benefit to the patient
- Thus, in general, the primary efficacy endpoint should be a measure of clinical effect (as opposed to, e.g., a biochemical or physiological marker)
- Taking the primary efficacy endpoint in a pivotal trial to be a biomarker which is not a direct measure of clinical benefit is something which should be done only with prior buy-in from all relevant regulatory agencies
- In general, such buy-in can be attained only in the case of an established surrogate endpoint (more on this below)
19 Endpoints (3): relevance should be accepted
- Ideally, there is a well-established primary efficacy endpoint, accepted as a suitable measure of patient benefit.
- This can circumvent much tedious discussion, and has the added advantage that consensus on what constitutes a meaningful treatment effect is likely already to exist.
- When such consensus exists, to ignore it would be foolhardy
- Often there may be consensus on the choice of primary efficacy variable, but secondary aspects, such as definition of relapse or rebound, may still be under debate
- For diseases with no consensus on how best to measure efficacy, expect longer development times
- It is not recommended to launch Phase I without a reasonably clear vision of what the primary efficacy variable will be in pivotal studies; postponing difficult discussions won't necessarily make them any easier
- Agreement on conventions for handling dropouts/missing data is also important
20 Endpoints (4): Objective is better
- Generally speaking, endpoints which can be measured in a completely objective fashion are preferred
- This may not always be possible; some degree of subjectivity may be unavoidable (e.g. in endpoints such as physician's or patient's evaluation of improvement)
- The degree to which this kind of subjectivity may be acceptable is likely to depend on perceptions about the integrity of blinding in the study
- In evaluating quality of life, use of a validated instrument is preferable. In many cases, a disease-specific QOL questionnaire exists
- Consultation with the Health Economics group is highly recommended, to ensure that collection of QOL data supports the target product profile (don't wait until Phase III to do this)
21 Endpoints (5): measurement aspects
- In general, key efficacy endpoints should be straightforward to measure. Avoid measures which might still be considered experimental, which require highly complex instrumentation, or involve extremely specialized assays. Measurements which rely heavily on technician skill or judgement can also be problematic
- Centralized evaluation of key endpoints may help guard against inter-site variation
- If key variables do involve specialized assays, make sure that assay procedures are thoroughly understood, and consistently implemented
22 Endpoints (6): Multiple Endpoints
- Multiple secondary endpoints are common
- Multiple primary endpoints are sometimes used
- If consensus on a single primary endpoint is impossible
- Should be a course of last resort (personal view)
- Have an associated penalty, in terms of a higher bar to declare statistical significance at a given level α
- A common approach is to require significance at level α/k, where k is the number of co-primary endpoints (Bonferroni)
- Bonferroni works reasonably, provided k is not too large, and if the constituent endpoints are uncorrelated
- For highly correlated endpoints, Bonferroni is inefficient: true attained significance will be < α
- Especially problematic if there is interest in multiple subsets
- Try to show some discipline regarding the number of secondary endpoints
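A small numerical sketch of the Bonferroni penalty described above. The function names are made up for illustration, and the family-wise calculation assumes independent endpoints and that a win on any one endpoint counts as success; correlated endpoints would give a lower family-wise rate than the one printed here.

```python
def bonferroni_threshold(alpha, k):
    """Per-endpoint significance level when alpha is split over k endpoints."""
    return alpha / k

def familywise_error_independent(per_test_alpha, k):
    """Chance of at least one false positive among k independent tests."""
    return 1 - (1 - per_test_alpha) ** k

alpha, k = 0.05, 3
per_test = bonferroni_threshold(alpha, k)
print(f"per-endpoint level: {per_test:.4f}")
print(f"family-wise rate, unadjusted: {familywise_error_independent(alpha, k):.4f}")
print(f"family-wise rate, Bonferroni: {familywise_error_independent(per_test, k):.4f}")
```

With three independent endpoints each tested at 0.05, the chance of at least one false positive is about 14%; testing each at 0.05/3 brings it back to just under 5%.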
23 Endpoints (7): a statistical taxonomy
- (1) Continuous, e.g. reduction in cholesterol, HbA1c, visual acuity
- (2) Categorical
- Multiple categories with no natural ordering
- Ordered categorical, e.g. different degrees of improvement
- (3) Dichotomous, e.g. response/non-response, dead/alive at a specific time post-treatment
- (4) Time-to-event, e.g. survival, time to progression
- Different analysis methods are appropriate for each main endpoint type; sample size requirements differ as well
- (3) is obviously a special case of (2)
24 Endpoints (8): statistical properties
- Approximate ordering by information content (from highest to lowest) is
- Continuous > time-to-event, ordered categorical > categorical > binary
- As a result, demonstrating an effect when the primary efficacy measure is a response rate is typically most demanding, in terms of sample size
- Although continuous response variables may have preferable statistical properties, it is quite common for FDA to require the primary efficacy variable to be a response rate, defined as the proportion of subjects who reach a specified threshold of improvement on the continuous scale (Raptiva, Lucentis)
25 Endpoints in cancer trials
- Response rate (where response is based on change in tumor size, according to well-defined criteria; best post-treatment evaluation is counted, so response is not linked to a specific timepoint)
- Duration of response (note that the resolution with which this can be determined will depend on the frequency of scheduled evaluations)
- Survival time
- Time to disease progression, where criteria for progression are well-defined
- Progression-free survival
- One major question is the extent to which a treatment effect on response, in terms of reduction of tumor size, is predictive of treatment effect on survival. Unfortunately, this seems to vary by tumor and treatment class.
26 Sample Size Considerations
- In the standard hypothesis testing framework for efficacy
- Type I error: conclude an ineffective drug is effective (false positive)
- Type II error: conclude an effective drug is ineffective (false negative)
- Ideally, both error probabilities should be controlled
- Generally, sample size is chosen to give acceptable power (defined as 1 - Type II error rate, or 1 - β) for a prespecified false positive rate, α
- In phase III efficacy trials, α is 0.05, by regulatory fiat
- Acceptable power is generally taken to be 90% for pivotal studies
27 Phase III Trials: Sample Sizes
- This has implications for sample size, due to tension between both types of error
- Timeline implications, as study duration = treatment duration + accrual time
- Common pitfall: exaggerate extent of the possible treatment effect (power for the "home run"), leading to over-optimistic sample sizes
- General guideline: power study to detect treatment effect specified in the target product profile (regular, not optimistic, scenario)
- In some cases, sample size is dictated by safety, rather than efficacy, considerations (satisfy minimum regulatory requirements)
28 Sample Size Considerations
- For a given value of α, power depends on
- Magnitude of the treatment effect (power increases as the effect gets larger)
- Sample size (power increases with more subjects)
- Inter-subject variability for continuous measurements (power decreases as variability increases)
- Response rates for binary responses (power can go up or down, depending on the rates)
- For most pivotal efficacy trials, the standard approach is to calculate the sample size necessary to give adequate (90%) power to detect a clinically meaningful treatment effect, with a type I error rate of 5%
- Calculating the sample size needed for a given power requires some knowledge about variability of continuous responses (or response rates, for binary data)
- "Clinically meaningful" needs to be defined in terms of the target product profile, not as "the effect size which will give acceptable power for the sample size I'm willing/able to use"
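As a rough illustration of the standard calculation for a continuous endpoint, the sketch below uses the usual normal-approximation formula for comparing two means, n per group = 2σ²(z₁₋α/₂ + z₁₋β)²/δ². The function name and the example values of δ (treatment effect) and σ (inter-subject standard deviation) are assumptions chosen for illustration, not figures from the slides, and a real calculation would normally also allow for dropout.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_means(delta, sigma, alpha=0.05, power=0.90):
    """Subjects per group for a two-sided, two-sample comparison of means
    (normal approximation, equal allocation, common standard deviation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided type I error
    z_beta = NormalDist().inv_cdf(power)            # power = 1 - type II error
    return ceil(2 * (sigma * (z_alpha + z_beta) / delta) ** 2)

# Hypothetical scenarios: effect of 0.5 SD at 90% and 80% power, then half that effect.
print(n_per_group_means(delta=0.5, sigma=1.0))               # 85
print(n_per_group_means(delta=0.5, sigma=1.0, power=0.80))   # 63
print(n_per_group_means(delta=0.25, sigma=1.0))              # 337
```

Halving the assumed treatment effect roughly quadruples the required sample size, which is why an over-optimistic effect size leads to a badly underpowered study.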
29 Sample Size: other approaches
- Sample size is not always dictated by this kind of power analysis; in some cases, safety requirements may be the deciding factor (rheumatoid arthritis, psoriasis)
- In earlier phases, it may not be practical to run trials big enough to control both Type I and Type II error rates as well as we might like
- 80% power is generally considered adequate in Phase II; on occasion we may settle for less
- Similarly, requiring significance at the 5% level may be overly stringent in Phase II
- Personal view: it is foolish to allow the hegemony of hypothesis testing to control our thinking prior to Phase III
- Instead, view the issue as an estimation problem
- Precision analysis
- Choose sample size in such a way that there is a desired precision at fixed confidence level
- Small chance of detecting true treatment effect
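A precision analysis can be sketched for the simplest case, estimating a single response rate: choose n so that the confidence interval has a chosen half-width. This uses the large-sample normal approximation, n = z²p(1 - p)/w²; the function name and the example rates and half-width are illustrative assumptions only.

```python
from math import ceil
from statistics import NormalDist

def n_for_rate_precision(expected_rate, half_width, confidence=0.95):
    """Subjects needed so that a normal-approximation confidence interval
    for a single rate has (roughly) the requested half-width."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * expected_rate * (1 - expected_rate) / half_width ** 2)

# Hypothetical targets: estimate a response rate to within +/- 10 percentage points.
print(n_for_rate_precision(0.5, 0.10))   # 97  (worst case, rate near 50%)
print(n_for_rate_precision(0.3, 0.10))   # 81
```

Using an expected rate of 0.5 gives the most conservative answer, since p(1 - p) is largest there.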
30 Sample Size for Time to Event Endpoints
- Challenge
- Power for correctly detecting a clinically meaningful difference at a fixed type I error rate depends primarily on the number of events (deaths, progressions, etc.)
- Specifying the number of events doesn't uniquely determine the number of subjects
- For instance, suppose the required number of events is 280. If 300 subjects per group is sufficient to give the required number of events, then 250 per group must be as well; it will just take longer
- Thus, sample size calculations are a little more complex for time-to-event responses and will depend on
- calculating the number of events needed to give the desired power
- an assumption about the median time-to-event in the control group
- an assumption about the size of the difference between control and treated groups
- projected accrual patterns
- targeted study duration
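The first step in the list above, the number of events, can be sketched with a widely used approximation for the logrank test with 1:1 allocation (often attributed to Schoenfeld): events ≈ 4(z₁₋α/₂ + z₁₋β)²/(log HR)². The slides do not say which formula was used, so treat this as one common choice; the function name and the hazard ratio of 0.75 are assumptions for illustration.

```python
from math import ceil, log
from statistics import NormalDist

def required_events(hazard_ratio, alpha=0.05, power=0.90):
    """Approximate total number of events needed for a two-sided logrank test
    with equal allocation to detect the given hazard ratio."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(4 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2)

# Hypothetical target: a 25% reduction in hazard (HR = 0.75).
print(required_events(0.75))               # 508 events for 90% power
print(required_events(0.75, power=0.80))   # 380 events for 80% power
```

Note that the result is a count of events, not subjects. Turning it into an enrolment target still needs the assumptions about control-group median, accrual pattern and study duration listed above.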
31 Interim Analyses
- Interim analysis is a tool to protect the welfare of subjects
- By stopping enrollment/treatment as soon as a drug is determined to be harmful
- By stopping enrollment as soon as a drug is determined to be beneficial
- By stopping trials which will yield little additional useful information (or which have negligible chance of demonstrating efficacy if fully enrolled, given results to date)
- The associated statistical methods are generally referred to as group sequential methods
32 Interim analysis: Concerns
- Should preserve an overall false positive rate of α for the trial: cannot claim statistical significance at level α if the unadjusted p-value at one of the interim analyses happens to be less than α
- In general, the unadjusted p-value for testing treatment effect at any given interim analysis will be compared to a more stringent (lower) bound; to stop early (for efficacy) requires compelling evidence
- Regulatory agencies need to be convinced that interim analyses do not compromise the integrity of the blind
- Regulatory guidelines over the past 10 years have become stricter and stricter, ultimately requiring that interim analyses be conducted by an external, independent group, i.e. study team members are no longer privy to interim results
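Why the unadjusted p-value cannot simply be compared with α at every look can be seen in a small simulation under the null hypothesis. This is a sketch under simplifying assumptions (equally sized groups of subjects between looks, a normally distributed test statistic, no true treatment effect); the function name, number of looks and number of simulated trials are arbitrary choices.

```python
import random
from statistics import NormalDist

def any_look_rejection_rate(n_looks, alpha=0.05, n_sims=20000, seed=0):
    """Fraction of simulated null trials in which the unadjusted two-sided
    test is 'significant' at level alpha at one or more of n_looks analyses."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        cumulative = 0.0
        for look in range(1, n_looks + 1):
            # Standardized contribution of the newest group of subjects under H0.
            cumulative += rng.gauss(0.0, 1.0)
            z = cumulative / look ** 0.5      # test statistic on all data so far
            if abs(z) > z_crit:
                rejections += 1
                break                          # trial would have stopped here
    return rejections / n_sims

print(any_look_rejection_rate(n_looks=1))   # close to the nominal 0.05
print(any_look_rejection_rate(n_looks=5))   # noticeably inflated
```

With five unadjusted looks the overall false positive rate comes out at roughly 0.14 rather than 0.05, which is the inflation that group sequential stopping boundaries are designed to remove.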
33 Interim analysis: Concerns
- Basically, interim results should not be shared with anyone in the sponsor company, or at participating study centers
- The only feedback to the sponsor is in the form of the recommendations from the Data Monitoring Committee
- Details of any proposed interim analysis, including the sponsor's expectations of the DMC, should be laid out prospectively in a written charter
- SOPs and a charter template exist and should be followed
- Although team members do not conduct the actual analyses, scheduled interim analyses can be highly labor-intensive nonetheless. Genentech's biostatistician/statistical programmer will still need to work with the external data group to develop detailed specifications for the analyses and displays to be made available to the Data Monitoring Board
34 Interim analysis
- Early stopping for efficacy is not the only possibility (recent experience notwithstanding). Doing so is generally non-controversial, provided an appropriate group sequential stopping rule, and the role of the DMC, have been identified prospectively
- Early stopping for safety can range from scenarios which are very clear-cut to situations which are considerably more ambiguous. In the latter case, having an experienced DMC chair can be particularly important
- Early stopping for lack of efficacy (futility analysis) is not particularly common (with one exception, discussed on the next slide); the idea that incorporating this option can result in substantial reduction in the number of patients ("gating risk") seems slightly misleading (personal opinion)
- Stopping for futility in a controlled trial will typically happen only if the treatment appears considerably inferior to control at the interim analysis
- Enrolment continues during preparation for the interim analysis, which typically occurs at a point where accrual has gained momentum, so the number of subjects saved may not be that great
35 Early stopping for futility
- An exception is the case of uncontrolled oncology trials focusing on estimation of response rate
- Use of a two-stage (or multi-stage) design is common
- At a given analysis stage, if the observed response rate is so low that it essentially rules out the possibility that the true response rate is acceptable, may choose to stop
- Typically the argument is based on the upper 90% or 95% confidence limit for the true response rate: stop if this is lower than the minimum rate identified as interesting in the TPP
- Recall the "rule of 3", often invoked in the context of safety data. If a particular event (adverse reaction, response) occurs in 0 out of N subjects tested, then the 95% upper confidence limit for the true rate of occurrence is approximately 3/N.
- Thus, for instance, if no responses are observed in the first 20 subjects, this effectively rules out values of the true response rate greater than 3/20, or 15%. If the TPP requires a response rate of at least 20%, stopping for futility seems warranted
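The rule of 3 is a shortcut for an exact binomial calculation: with 0 events in N subjects, the one-sided 95% upper limit p solves (1 - p)^N = 0.05. The sketch below compares the two for the 20-subject example; the function names are illustrative.

```python
def rule_of_three_upper(n):
    """Quick approximation to the 95% upper limit when 0 events are seen in n subjects."""
    return 3 / n

def exact_upper_limit_zero_events(n, confidence=0.95):
    """Exact one-sided upper limit: the rate p at which observing 0 events
    in n subjects has probability 1 - confidence, i.e. (1 - p)**n = 1 - confidence."""
    return 1 - (1 - confidence) ** (1 / n)

n = 20
print(f"rule of 3: {rule_of_three_upper(n):.3f}")             # 0.150
print(f"exact:     {exact_upper_limit_zero_events(n):.3f}")   # 0.139
```

The approximation is slightly conservative for small N (0.150 against 0.139 here), and either value falls below a 20% target response rate in this example.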
36 Statistical analysis methods for rates
- A fairly detailed exposition can be found on our website at gwiz/projects/stathelp: introductory course notes, lecture 4
- Use of the binomial distribution
- Calculating standard errors: normal approximation for large samples
- Estimation and confidence intervals for a single rate
- Testing for difference between two rates (z-test, χ²-test, Fisher's exact test)
- Estimation and confidence intervals for the difference between two rates
- Testing for differences in rates among several groups (χ²-test, Fisher's exact test)
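As one concrete instance of the methods listed, here is a sketch of the large-sample z-test for the difference between two rates, with a normal-approximation confidence interval for the difference. The counts in the example (30/100 responders against 45/100) are invented, the function name is illustrative, and Fisher's exact test would be preferable for small samples.

```python
from math import sqrt
from statistics import NormalDist

def two_rate_z_test(x1, n1, x2, n2, confidence=0.95):
    """z statistic, two-sided p-value and confidence interval for p2 - p1."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p2 - p1
    # The test uses the pooled rate, i.e. the standard error under H0: p1 = p2.
    pooled = (x1 + x2) / (n1 + n2)
    se_null = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = diff / se_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # The interval uses the unpooled standard error of the difference.
    se_diff = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z, p_value, (diff - z_crit * se_diff, diff + z_crit * se_diff)

z, p, (low, high) = two_rate_z_test(30, 100, 45, 100)
print(f"z = {z:.2f}, p = {p:.3f}, 95% CI for difference = ({low:.3f}, {high:.3f})")
```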
37 Statistical methods for survival analysis
- If the response of interest is survival time, then specialized methods are needed, for two main reasons
- Frequency distribution of survival times is usually not well-behaved: not normal, not even symmetric
- In the context of clinical studies, cannot wait to observe all survival times; this means that, for some subjects, all we know is that their survival time exceeds the observation period
- In statistical jargon, such survival times are called (right)-censored observations
- Methods for survival times are also applicable to any response of type "time-to-event", e.g. time to disease progression, etc.
38 Overview of survival analysis methods
- Definitions: survivor function, hazard function
- Estimation of survival curve: Kaplan-Meier
- Comparison of two or more survival curves: logrank test, Wilcoxon test
- Comparing survival curves, allowing adjustment for other factors (e.g. baseline disease status): proportional hazard regression, aka the Cox model
39 Kaplan-Meier disease-free survival curves stratified by p53 mutation status (n = 542)
Solid/dotted: without/with a p53 tumor mutation
40 Graphing survival data: Kaplan-Meier estimation
- We wish to estimate the proportion remaining disease-free at any given time or, equivalently, the estimated probability that a member of the population from which the sample is drawn is alive without disease at that time
- Because of the censoring we use the Kaplan-Meier method. For each time interval we estimate the probability that those without disease at the beginning remain so throughout the interval. This is a conditional probability.
- The probability of being disease-free at any time point is calculated as the product of the conditional probabilities of surviving without disease through each interval prior to that time point.
- The calculations are simplified by ignoring times at which there were no recorded events (whether progressions or losses to censoring).
- Censoring is accommodated in the calculations by ensuring that all subjects previously lost to censoring are removed from the risk set when calculating the conditional probability for a given timepoint
- Because the overall probability of being disease-free at a particular timepoint is calculated as a product of the relevant conditional probabilities, this (Kaplan-Meier) method of estimating the survival curve is sometimes referred to as the product-limit estimate
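The product-limit calculation described above fits in a few lines. This is a bare-bones sketch with invented data (six subjects, 1 = event observed, 0 = censored) and an illustrative function name; in practice a statistical package would be used, and would also supply standard errors and confidence limits.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(time, estimated survival probability)] at each distinct event time.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    """
    n_events = Counter(t for t, e in zip(times, events) if e)
    n_leaving = Counter(times)        # events plus censorings at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(n_leaving):
        d = n_events.get(t, 0)
        if d:
            # Conditional probability of getting through time t, given at risk at t.
            # Subjects censored exactly at t are still counted as at risk here.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Everyone with an event or censoring at t leaves the risk set afterwards.
        at_risk -= n_leaving[t]
    return curve

times = [2, 3, 3, 5, 6, 8]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(f"t = {t}: S(t) = {s:.3f}")
```

Censored subjects never cause a step in the curve; they only shrink the risk set used for later steps, which is how censoring is accommodated.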
41 Describing survival pattern for a single group
- Survival probabilities are usually presented as a connected "curve". The curve takes the form of a step function, with changes in the estimated probability occurring (only) when an event (progression) was observed
- Observations censored during any interval affect the number still at risk at the start of the next interval. Censoring is thus accommodated when calculating the step sizes; its effect on the curve is relatively subtle, but becomes cumulatively more important over time. Some versions of the Kaplan-Meier curve display censoring times as superimposed short vertical lines (works best for relatively small sample sizes)
- In practice, a computer is used to do these calculations.
- Standard errors and confidence intervals for estimated survival probabilities can be found by using a formula due to Greenwood
- Reporting estimated median survival with associated confidence limits is usual; estimating other percentiles is also possible
42 Comparing survival patterns across groups
- Two most common tests are
- Logrank test
- Wilcoxon test
- If comparison needs to allow adjustment for other covariates besides group ID (e.g. baseline disease status), the most common approach is
- Cox (proportional hazards) regression
- As the name implies, this analysis frames the comparison in terms of the effect a treatment or covariate exerts on the hazard function, rather than directly on the survival function
43 Comparing survival patterns: testing
- Logrank test
- Basic idea: at each new event time, figure out the survival pattern that would be expected if the null hypothesis (no difference) were true
- Quantify the difference between the observed survival pattern and that expected under the null hypothesis. This is done at each new event time.
- Obtain a cumulative measure of discrepancy from H0 by adding up the contributions across all event times
- Compare the result to appropriate tables (chi-square) to obtain a p-value
- Wilcoxon test: variation of logrank test which gives greater weight to discrepancies occurring earlier
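The logrank steps above can be sketched directly for two groups: at each event time compare the observed events in one group with the number expected under H0, accumulate the discrepancies and their variance, and refer the result to a chi-square distribution with 1 degree of freedom. The data are invented, the function name is illustrative, and the sketch assumes at least one event occurs while both groups still have subjects at risk.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group logrank test; returns (chi-square statistic, p-value)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)      # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)      # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n                    # expected under H0
        if n > 1:                                    # hypergeometric variance
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))         # upper tail, chi-square with 1 df
    return chi2, p_value

# Invented data: every event in group A occurs before any event in group B.
chi2, p = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```

The Wilcoxon variant mentioned above would weight each event time's contribution, for example by the number at risk, so that early differences count for more.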
44 Comparing survival patterns: estimation
- Limitations of the logrank test
- Only addresses the question "is there a difference?" No direct quantification of the size of the difference
- Doesn't allow adjustment for other relevant prognostic factors (e.g. differences at baseline)
- These questions are usually addressed by Cox (proportional hazards) regression. Salient output is
- estimated coefficient with standard error and/or confidence interval
- Usually interested in whether or not coefficient is zero
- Quantifies effect on hazard, rather than the survival function
45 Definitions of survival and hazard functions
- For completeness, here are the definitions
- Survival function
- S(t) = Probability of surviving past time t
- Hazard function
- h(t) = Instantaneous risk (probability per unit time) of dying at time t, given one has survived until that time
- For calculus fans, the hazard function turns out to be h(t) = -d/dt log S(t)
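For readers who want the intermediate step, the relationship between the hazard and survival functions can be written out as follows. The symbols T (the survival time) and f (its density) are introduced here for illustration and are not used on the slides.

```latex
\begin{align*}
S(t) &= P(T > t), \qquad f(t) = -\frac{d}{dt} S(t) \\
h(t) &= \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
      = \frac{f(t)}{S(t)}
      = -\frac{d}{dt} \log S(t) \\
S(t) &= \exp\!\left( -\int_0^t h(u)\, du \right)
\end{align*}
```

The last line shows why a model for the hazard, as in Cox regression, also determines the survival curve.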
46 Safety analyses
- Safety and efficacy data differ in some key aspects
- Safety hypotheses are not specified a priori
- Failure to achieve statistical significance does not mean that a safety finding can be ignored
- With safety data the goal is to prove a negative
- Safety analyses are usually descriptive
- A few serious medical events can lead to the termination of a product's development; extreme value distributions are relevant to safety analyses
- Concurrent controls may not provide adequate context for interpretation
47 Safety Analysis - Challenges
- Phase III trials are typically sized based on efficacy: what type of safety statements are appropriate?
- Drug exposure: how to summarize, how to correlate with adverse events observed, etc.
- Dose response
- Open label trials
- Placebo-controlled trials
- Sources of bias (under-reporting, longer follow-up leads to more events)
- Adverse events: very many types, so what is an appropriate way to summarize/analyze?
- Multiplicity
48Safety Analyses - Challenges
- Number of subjects and duration of exposure during development are minimal relative to the number of patients that may receive drug post-approval
- Only the most common AEs (e.g., incidence of 1% or more) are identified
- Less common AEs (1 in 1000) cannot be reliably detected
- Rare events (1 in 10,000) will almost certainly not be observed at all
- Some patient groups may have been excluded from trials entirely, or represented too sparsely to identify any risks specific to them
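The detection limits above follow from simple binomial arithmetic. The sketch below computes the chance of seeing at least one case of an adverse event for a given true incidence and number of exposed subjects, plus the upper 95% bound on incidence when no cases are seen (the "rule of three" approximation, 3/n). The sample sizes are hypothetical and the function names are illustrative; the calculation assumes subjects are independent and ignores duration of exposure.

```python
def prob_at_least_one_event(n_subjects, incidence):
    """Chance of observing one or more events among n independent subjects."""
    return 1.0 - (1.0 - incidence) ** n_subjects


def upper_bound_zero_events(n_subjects, confidence=0.95):
    """Exact upper confidence bound on incidence when 0 events are observed."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_subjects)


# Hypothetical development programme sizes
for n in (300, 1000, 3000):
    for incidence in (0.01, 0.001, 0.0001):
        print(n, incidence, round(prob_at_least_one_event(n, incidence), 3))
    print("zero events in", n, "subjects: incidence could still be up to",
          round(upper_bound_zero_events(n), 5), "(rule of three:", 3 / n, ")")
```

Observing a single case is also far from reliably attributing it to the drug, so these probabilities are optimistic as a measure of detectability.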
49Regulatory Requirements
- Safety
- Applicant must demonstrate product safety (FDA has obligation to demand)
- Extent of data: There must be sufficient information to decide whether the drug is safe.
- Adequate analyses: Adequate tests by all methods reasonably applicable must be performed to evaluate safety for labeled use.
- Reasonable results: Tests should show that drug is safe as labeled
- Risks must be adequately defined.
- Extreme risks (even if rare) must be obvious.
50Regulatory Requirements
- Efficacy
- Applicant must demonstrate substantial evidence of effectiveness claimed.
- Substantial evidence: evidence consisting of adequate and well-controlled investigations, including clinical investigations, from which experts could conclude the drug will have the claimed effect.
- The plural "investigations" implies replication or corroboration.
- Typical: 2 Phase III trials with identical or similar designs
- In special circumstances 1 Phase III trial may be sufficient.
- E.g. life-threatening diseases with very limited therapeutic options (always a good idea to talk to regulatory agencies prior to trial initiation)
51Guidelines and Regulations
- Regulatory Agencies
- FDA
- EEC (European Economic Community)
- U.S. Code of Federal Regulations for Clinical Trials
- ICH (International Conference on Harmonization)
- Initiatives undertaken by regulatory authorities and industry associations to promote international harmonization of regulatory requirements
- Good Clinical Practice (GCP)
- Structure and content of clinical studies
- Clinical safety data management: Definitions and standards for expedited reporting
- Statistical principles for clinical trials
52Biomarker - working definition
- "... a laboratory measurement or physical sign used as a substitute for a clinical endpoint that measures how a patient feels, functions, or survives."
- from a definition of the term "surrogate endpoint" by Temple, cited in Fleming and DeMets (1996), "Surrogate endpoints in clinical trials: are we being misled?", Annals of Internal Medicine, 125, pages 605-613
53Appendix
- Some thoughts on biomarkers
54Biomarkers as surrogate endpoints
- Predict clinical efficacy of treatment based on its effect on biomarker (data may be available earlier; may provide answer with fewer subjects)
- Use in Phase II is common
- dose ranging based on biomarker
- Phase III go/no go decision based on observed treatment effect on biomarker
55Common biomarker types
- Biochemical (cholesterol, HIV viral load, cytokine concentration, hemoglobin A1c ...)
- Immunological (lymphocyte subpopulation counts, CD4+, CD11a+ T cells, CD20+ B cells ...)
- Saturation of target cell surface antigen or soluble ligand
- Physiological (e.g. blood pressure, pulmonary function testing, episodes of arrhythmia ...)
- Imaging (angiography, tumor size, bone density by DEXA scan ...)
56Biomarkers as surrogates - successes
- Lowering of cholesterol level by treatment with statins (survival benefit established)
- Reduction in viral RNA in peripheral blood through treatment with protease inhibitors delays HIV disease progression
- Improved glycemic control (HbA1c) predictive of delayed onset of microvascular complications (retino-, nephro-, neuropathy) in Type I diabetes
- 90-minute TIMI flow (angiography) predictive of 30-day survival following thrombolytic therapy
- Reduction in free IgE following treatment with an anti-IgE antibody correlates with symptom improvement scores in allergic rhinitis and asthma
57Biomarkers as surrogates - can't win 'em all
- Experience with biomarkers is not always positive
- CD4 counts as a surrogate in AIDS trials: mixed performance as a predictor of clinical benefit
- Tumor size in cancer trials: experience runs both ways; appears to depend both on tumor type and on class of treatments
- Experience in the CAST trial demonstrated that treatment with encainide/flecainide clearly reduced the incidence of arrhythmias, but increased mortality
- Similar results in context of treating atrial fibrillation
- Blood pressure as surrogate: effect translates to clinical benefit for some drug classes, but not others
58What can make biomarkers unreliable?
- Biomarker not on causal pathway of disease process
- Several pathways: intervention affects the one mediated through the biomarker, but not the others (redundancy)
- Biomarker not on the pathway affected by the intervention, or is insensitive to treatment effect
- Intervention has mechanisms of action unrelated to the disease process (aka "the law of unintended consequences")
- Failure of either type is possible: biomarker could falsely predict, or fail to predict, clinical benefit
59What can make biomarkers unreliable?
- Other potential contributing factors include
- Measurement difficulties due to rater effects
- GNE experience (?-interferon in renal cell carcinoma) strongly supports advisability of blinded tumor evaluation by a single central review board (avoid bias, minimize center differences)
- Measurement difficulties arising from sample preparation, transport, storage, and handling
- Time constraints in assaying fresh blood, possible effects of activation of T-cells, lack of standardization of FACS assay protocols and reporting methods, heterogeneity of tumor samples, center differences (use of local or central labs)
60What can make biomarkers unreliable?
- Other potential assay-related difficulties include
- Matrix effects
- Interference by other proteins can affect assay specificity and/or sensitivity
- Development of antibodies
- Can be hard to detect; harder to quantify reliably; extremely difficult to assess clinical significance, if any
- Inter-laboratory differences
- Can be large enough to make biomarker data uninterpretable
61Biomarkers editorial comments
- Avoid the "what we can measure is what we should measure" fallacy
- Experience with imaging-based biomarkers to date has been disappointing
- Non-targeted genomic assays (e.g. microarrays followed by data mining) have the potential for much wasted effort
- Avoid the "rearranging the deckchairs on the Titanic" fix, e.g. straining to improve assay precision from a CV of 20% to 15% when the within-subject CV for the marker is 40% and the inter-subject CV is 50%.
- Cytokines make particularly treacherous biomarkers
- Proteomics is not for sissies
- Distinguish between "must know" and "nice-to-know"
- An understanding of mechanism of action may be nice to know, but is not a requirement for drug approval
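The deckchairs point about assay precision can be put in numbers. The sketch below reads the CV figures quoted above as percentages and assumes the three sources of variability (assay, within-subject, between-subject) are independent and combine in quadrature, which is a common approximation rather than anything stated in the slides; the function name is hypothetical.

```python
import math


def total_cv(assay_cv, within_subject_cv, between_subject_cv):
    """Overall CV (%) when independent components add on the variance scale."""
    return math.sqrt(assay_cv ** 2 + within_subject_cv ** 2
                     + between_subject_cv ** 2)


before = total_cv(20, 40, 50)  # assay CV of 20%
after = total_cv(15, 40, 50)   # assay CV improved to 15%
print("overall CV before:", round(before, 1))
print("overall CV after:", round(after, 1))
print("relative reduction (%):", round(100 * (before - after) / before, 1))
```

Under these assumptions the overall variability barely moves, because biological variation dominates the total.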
62Personal opinions (tongue in cheek)
- If the word "cascade" appears in the description of the disease process, all bets are off
- The topic of biomarkers seems to drive otherwise thoughtful researchers to an irrational frenzy of wishful thinking
- The message so eloquently expounded by Jagger et al. remains as relevant today as it was in 1969
- Lasagna's Law already militates against rapid accrual of eligible subjects to clinical trials
- To slow recruitment from a trickle to a complete grinding halt, only two words are needed in the protocol: "serial biopsy"
63Biomarkers - general conclusions
- Utility of a particular biomarker depends not only on the disease, but also on the nature of the therapeutic intervention
- Validation of any candidate biomarker must necessarily be considered on a case-by-case basis
- Validity of a marker for a given drug class may not transfer to other drug classes for the same disease
- Success is most likely when intervention clearly affects the biomarker, whose role in the disease process is well-established and clearly understood
- Validation of a putative marker cannot happen without ultimately generating the required clinical outcome data
- Regulatory conservatism is to be expected, and seems appropriate