Design and Statistical Principles for Randomized Clinical Trials: An Overview

1
Design and Statistical Principles for Randomized
Clinical Trials: An Overview
  • P. Y. Liu, PhD

2
Randomized Phase III Trials
  • Comparing multiple treatments to definitively
    demonstrate:
  • Superiority: test better than control (with
    respect to the primary endpoint)
  • Non-inferiority (equivalence): test no worse than
    control by a pre-specified delta
  • Failing to detect a significant difference does
    NOT imply equivalence

3
Randomized Trials
  • Treatment assignment randomized, i.e. unbiased
  • Except for planned treatment differences, treatment
    arms are equal with respect to baseline patient
    factors and study conduct
  • Outcome differences attributable to different
    treatments

4
Randomized Trial Design Elements
  • Randomization ratio
  • Stratification
  • Degree of treatment identity blinding

5
Randomization Ratio
  • Equal N to each arm → smallest total N
  • Unequal N per arm, e.g. 2:1 experimental vs.
    control ratio
  • More safety data for experimental arm, more
    attractive to potential subjects if placebo
    controlled, etc.
  • Incrementally higher N if ratio not too drastic,
    e.g. 2-arm total N 12-14% higher for 2:1 vs.
    1:1 ratio
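The 12-14% figure can be checked with the standard variance argument for comparing two means; a minimal sketch (the (r + 1)²/4r inflation factor is the textbook result for equal variances, not a formula taken from these slides):

```python
# Var(mean difference) ∝ 1/n1 + 1/n2, so for fixed power the total N
# under r:1 allocation grows by (r + 1)^2 / (4r) relative to 1:1.
def total_n_inflation(r: float) -> float:
    """Total-N multiplier for r:1 vs. 1:1 allocation, equal variances."""
    return (r + 1) ** 2 / (4 * r)

print(total_n_inflation(2))  # 1.125 → 2:1 needs ~12.5% more subjects
print(total_n_inflation(3))  # 3:1 needs ~33% more
```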

6
Randomization Stratification
  • To ensure balanced randomization with respect to
    established important baseline prognostic factors
  • Example
  • Gender known to be highly related to the trial's
    primary endpoint, e.g. females do better
  • Randomize within gender to guard against chance
    imbalance, e.g. test same as control but
    disproportionally more females on test arm,
    making it look better than control
  • Not necessary if N really large

7
Randomized Trials Blinding
  • Open label: treatment assignment identity known
    to all
  • Single blind: subject
  • Double blind: entire study staff (except masked
    treatment/device dispenser and statistician) and
    subject; highest assurance of trial conduct
    uniformity across treatment arms
  • 1.5 blind: outcome scorer and subject

8
Randomized Trials: Intent-to-Treat/Effectiveness
Analysis Principle (ITT)
  • For treatment effectiveness, all randomized
    patients included and analyzed according to the
    randomization treatment assignment regardless of
    actual treatment deviations

9
Intent to Treat Principle
  • Validity of treatment comparison rests on
    equality of treatment arms at baseline
  • Between-arm patient balance achieved by
    randomization must be preserved for analysis

10
Intent to Treat Principle
  • Post randomization deviations often treatment
    related
  • Those randomized to test but could not be treated
    or received control instead could be the more
    difficult cases for the test treatment
  • Compliance: doing poorly → noncompliance
  • A better than B → more B patients not doing well
    / noncompliant → noncompliance exclusions reduce
    treatment differences
  • Double blinded study: exclusions OK if due to
    inclusion/exclusion violations or no treatment received

11
ITT Principle
  • Generally, once randomized, a patient counts towards
    the randomly assigned treatment regardless of
    actual treatment deviations (received little/no
    treatment, wrong treatment)
  • No randomization cancellations / post-hoc
    exclusions (except for lack of data due to
    consent withdrawal or loss to follow-up)

12
ITT Principle Implications
  • Minimize potential post-randomization protocol
    deviations by
  • Truly informed consent to reduce post-hoc
    refusals, dropouts, e.g. subjects favoring one
    treatment over another or having study
    requirement compliance issues should not be
    consented
  • Timing of randomization as close to treatment
    divergence as logistically feasible (e.g.
    intra-operative randomization)

13
Statistical Considerations
  • False positive error
  • False negative error
  • Sample size estimation

14
False Positive Error (type I error, α error, p
value, significance level)
  • False positive: chance of a comparison (consumer's
    risk) seeing a statistically significant
    difference when truth is null
  • Calculated from trial data
  • Threshold pre-specified; p < 0.05 commonly
    accepted as a small enough false positive risk
    for concluding an observed difference to reflect
    a real difference
  • Related to trial size

15
False Positive Error (type I error, α error, p
value, significance level)
  • Arm 1 success rate 10/100 (10%)
  • Arm 2 success rate 30/100 (30%)
  • p value = 0.0007; with a false positive chance
    of 7/10,000, infer the true underlying rates to
    be different
  • Arm 1 success rate 1/10 (10%)
  • Arm 2 success rate 3/10 (30%)
  • p value = 0.58; with a false positive chance of
    58/100, cannot infer the true underlying rates
    to be different
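The two p values above are consistent with a two-sided exact test; a self-contained sketch using only the standard library (assuming, for illustration, Fisher's exact test as the method behind the quoted values):

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p for the 2x2 table [[a, b], [c, d]]:
    sum of probabilities of all tables with the same margins that
    are no more likely than the observed one."""
    r1, r2, c1 = a + b, c + d, a + c
    denom = comb(r1 + r2, c1)
    def prob(k):  # hypergeometric P(X = k successes in arm 1)
        return comb(r1, k) * comb(r2, c1 - k) / denom
    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(c1, r1)
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= p_obs * (1 + 1e-9))

print(fisher_exact_two_sided(10, 90, 30, 70))  # large trial: well below 0.05
print(fisher_exact_two_sided(1, 9, 3, 7))      # small trial: ≈ 0.58
```

Same 10% vs. 30% observed rates, but with N = 20 instead of N = 200 the evidence evaporates.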

16
False Positive Error (type I error, α error, p
value, significance level)
  • Most quoted p values are 2-sided, i.e. testing
    whether two treatments are the same
  • Most trials NOT symmetrical, e.g. interested in
    whether experimental is better than control
  • A 2-sided p < 0.05 level is really a 1-sided
    p < 0.025 level true false positive rate
  • Non-inferiority inherently 1-sided

17
False Positive Error (type I error, α error, p
value, significance level)
  • 1-sided α threshold 0.025 → 1 out of every 40
    positive trials would be falsely positive
  • An error rate not necessarily acceptable from
    the FDA's perspective
  • Two independent, identically designed positive
    trials with 1-sided p < 0.025 each: chance of both
    trials falsely positive < 0.000625, or 6/10,000
  • Better off (smaller N) doing a single large trial
    with 1-sided α pre-set at 0.000625
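The arithmetic behind the two-trial argument is just multiplication of independent error rates:

```python
# Each trial tested at 1-sided alpha = 0.025; if both are truly null
# and independent, the chance both reach significance is the product.
alpha = 0.025
both_false_positive = alpha * alpha
print(both_false_positive)  # ≈ 0.000625, i.e. about 6 in 10,000
print(round(1 / alpha))     # 40 → 1 in 40 positive trials false at 0.025
```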

18
False Negative Error (β)
  • A trial's chance of missing a significant
    difference when a true difference exists
    (sponsor's risk)
  • For the same α, β decreases when N increases

19
Statistical Error Rates
  • α and β specified when designing the trial
  • Along with the primary endpoint's data type and
    effect size of interest, these determine the
    trial's sample size

20
Approximate Total Sample Size, 2 Arms: Continuous
(Bell Shape) Data, 1:1 randomization, α = 0.05
(table not transcribed)
21
Sample Size: Continuous Bell Shape Data
  • N determinant: mean Δ/SD ratio
  • The larger the signal over noise ratio, the
    smaller the N; twice the ratio → ¼ the N
  • N increases incrementally as power rises from
    0.80 to 0.90
  • 0.80 power should not be the norm: too high a
    sponsor's risk to miss 1/5 good new agents or
    devices
  • Power 0.85 or 0.90, i.e. β of 1/7 or 1/10, should
    be considered
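These relationships can be reproduced with the standard two-sample normal approximation; a sketch (the formula is the textbook one, not taken from the slides' table; `statistics.NormalDist` supplies the normal quantiles):

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(effect_over_sd, alpha=0.05, power=0.80):
    """Per-arm N for a two-arm, 1:1, two-sided z-test on means.
    effect_over_sd = (mean difference Δ) / SD, the signal-to-noise ratio."""
    z = NormalDist().inv_cdf
    z_a = z(1 - alpha / 2)   # two-sided critical value
    z_b = z(power)           # power quantile
    return ceil(2 * (z_a + z_b) ** 2 / effect_over_sd ** 2)

print(n_per_arm(0.5))               # 63 per arm at 0.80 power
print(n_per_arm(1.0))               # 16 — doubling Δ/SD cuts N ~4x
print(n_per_arm(0.5, power=0.90))   # 85 — 0.90 power costs moderately more
```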

22
Approximate Total Sample Size, 2 Arms: Binary
(Yes/No) Data, 1:1 randomization, α = 0.05
(table not transcribed)
23
Sample Size: Binary Data
  • Primary sample size determinant: absolute success
    rate difference (not relative difference or
    ratio)
  • Actual rates also matter, e.g. for the same Δ of
    30%, N is smaller for 10% vs. 40% compared to 35%
    vs. 65% (N largest for rates around 50%)
  • 0.85 or 0.90 power recommended
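The "actual rates matter" point can be checked with the standard two-proportion formula; a sketch under that textbook approximation (the slides' own table values are not reproduced here):

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_arm_binary(p1, p2, alpha=0.05, power=0.90):
    """Per-arm N for comparing two proportions (two-sided z-test,
    1:1 randomization), pooled-null / unpooled-alternative formula."""
    z = NormalDist().inv_cdf
    z_a, z_b = z(1 - alpha / 2), z(power)
    p_bar = (p1 + p2) / 2
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p2 - p1) ** 2)

# Same absolute difference (30 points) but different N:
print(n_per_arm_binary(0.10, 0.40))  # 42 per arm
print(n_per_arm_binary(0.35, 0.65))  # 57 per arm — rates near 50% need more
```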

24
Randomized Phase II Trials
  • Screening for promising new treatment
  • Relax α to 1-sided 0.10 to 0.20; results not
    definitive (1-sided α = 0.025 for phase IIIs)
  • Power 0.90
  • Single arm phase II more efficient, 1/4 the N, if
    historical controls well established and stable

25
2-arm Randomized Phase II, Time to Event Data
(e.g. disease progression): Total Events Required,
1:1 Randomization, 0.90 Power
(table not transcribed)
26
Interim Analysis
  • Can be done as often as desired if results not
    disclosed and no action of any kind will be taken
  • If results are for interim presentation/publication
    only, trial WILL continue to pre-planned accrual
    goal regardless of interim results (extreme + or −)
  • Early disclosure must have no potential of
    altering trial conduct, e.g. enrollment pattern
    (no difference → no more enrollment),
    randomization acceptance (test appears beneficial
    → no one wants control), etc.
  • Present patient characteristics and safety data
    only; maintain treatment masking if blinded
    trial

27
Formal Interim Analysis - Traditional
  • Early trial termination considered if extremely
    unfavorable or favorable findings detected
  • Analysis plan with early stopping rules must be
    pre-specified, i.e. not data driven
  • Statistical cost
  • The more often data are acted upon, the more
    chances for error
  • Total 1-sided false positive error rate needs to
    be < 0.025
  • Typical interim 1-sided α set at < 0.0025 for
    consideration of early trial closure

28
Formal Interim Analyses - Traditional
  • Reason for very small interim α threshold: early
    estimates are not stable → only extreme findings are
    unlikely to be false signals and warrant drastic
    actions
  • Interim analysis for futility is difficult if the
    primary endpoint requires substantial follow-up;
    enrollment may near completion by the time endpoint
    data are available for, e.g., the first 50% of subjects

29
Formal Interim Analyses Adaptive Design
  • Stop for extreme interim results
  • Otherwise calculate conditional power (CP) at the
    interim trend, i.e. chance of a positive trial at
    the planned N given interim data
  • Proceed as planned to original N if CP reasonable
  • Increase N if CP promising but less than ideal,
    e.g. > 0.50 but < 0.80
  • No α inflation under broad conditions
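One common way to compute conditional power under the current trend uses the Brownian-motion approximation to the z-statistic; a sketch under that assumption (the specific adaptive method behind the slides is not stated, so this is illustrative only):

```python
from math import sqrt
from statistics import NormalDist

def conditional_power(z_interim, frac, alpha_1sided=0.025):
    """Chance of crossing the final critical value given the interim
    z-score at information fraction `frac`, assuming the current
    trend (drift = B(t)/t) continues to the planned N."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha_1sided)
    b = z_interim * sqrt(frac)  # Brownian value B(t) = z_t * sqrt(t)
    return 1 - nd.cdf((z_crit - b / frac) / sqrt(1 - frac))

cp = conditional_power(z_interim=1.5, frac=0.5)
print(round(cp, 2))  # 0.59 → promising but < 0.80: candidate for raising N
```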

30
Common Mis-practices
  • Overlapping confidence intervals imply no
    statistically significant differences
  • Subset analyses
  • Survival by another time-related outcome
  • More treatment is better
  • Responders live longer

31
Overlapping Confidence Intervals
32
Overlapping Confidence Intervals
33
Overlapping Confidence Intervals
  • Do not imply no statistically significant
    differences
  • Has to be a highly significant difference in
    order for confidence intervals to be
    non-overlapping
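The claim is easy to demonstrate numerically. With illustrative rates (30/100 vs. 45/100, chosen for this sketch and not taken from the slides), Wald 95% intervals overlap even though a pooled two-proportion z-test is significant:

```python
from math import sqrt
from statistics import NormalDist

z95 = NormalDist().inv_cdf(0.975)

def wald_ci(p, n):
    """95% Wald confidence interval for a single proportion."""
    half = z95 * sqrt(p * (1 - p) / n)
    return p - half, p + half

def two_prop_p(p1, n1, p2, n2):
    """Two-sided z-test p value for a difference of proportions (pooled SE)."""
    pool = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pool * (1 - pool) * (1 / n1 + 1 / n2))
    return 2 * (1 - NormalDist().cdf(abs(p1 - p2) / se))

lo1, hi1 = wald_ci(0.30, 100)            # ≈ (0.21, 0.39)
lo2, hi2 = wald_ci(0.45, 100)            # ≈ (0.35, 0.55)
print(hi1 > lo2)                         # True — the 95% CIs overlap
print(two_prop_p(0.30, 100, 0.45, 100))  # ≈ 0.03 — yet p < 0.05
```

The direct test is more powerful than eyeballing interval overlap because the standard error of a difference is smaller than the sum of the two half-widths.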

34
Subset Analysis
  • No overall treatment difference:
  • Is treatment effective in some subset of
    patients? (males, extremity lesions, good
    performance, ...)
  • Overall treatment difference:
  • Is treatment more effective in some subsets than
    others?

35
Subset Analysis: 5-FU/Levamisole as Adjuvant
Therapy for Colon Cancer
  • NCCTG Trial: therapy most effective for
  • Females
  • Young
  • SWOG Trial: therapy most effective for
  • Males
  • Old

36
Subset Analysis NCI Melanoma Interferon Trials
  • E1684, HD IFN vs. Obs (N = 300)
  • E1690, HD IFN vs. LD IFN vs. Obs (N = 600)
  • E1694, HD IFN vs. GMK vaccine (N = 900)
  • Virtually identical patient populations:
  • Thick primary node-negative, or any positive nodes

37
Subset Analysis Melanoma HD IFN Studies
  • E1684, HD IFN vs. Obs
  • Benefited most from IFN: single node
  • E1690, HD IFN vs. LD IFN vs. Obs
  • Benefited most from HD IFN: 2-3 nodes
  • E1694, HD IFN vs. GMK vaccine
  • Benefited most from IFN: node negative

38
Post-Hoc Subset Analysis
  • Most trials underpowered to begin with → subset
    analysis has very little power
  • High false negative rate
  • Often, many subset analyses are done without
    correcting α for multiple comparisons
  • High false positive rate
  • Almost all post-hoc subset analyses are wrong
  • All post-hoc subset analyses must be confirmed

39
Subset Analysis: How Should One Proceed?
  • Definitive subset analysis
  • Pre-trial specified hypotheses by subset
  • Adequate sample size in subset of interest
  • Adjust α for multiple comparisons
  • Post-hoc subset analysis (exploratory)
  • Global test of subset-by-treatment interaction
  • Do subsets just for randomization stratification
    factors
  • Cannot get around small n's / high false
    negative rate

40
Survival by Another Time-Related Outcome
41
Survival by Another Time-Related Outcome:
NSABP Breast Cancer Trial of LPAM
  • Treatment / 5-Year DFS
  • LPAM: 51%
  • Placebo: 46%

42
DFS by Delivered Dose: NSABP Trial of LPAM
  • LPAM Dose Rec'd / 5-Year DFS
  • >85%: 69%
  • 65-84%: 67%
  • <65%: 26%
  • Placebo: 46%

43
Disease-Free Survival by Dose of Placebo
44
Survival by Delivered Dose
  • Time bias
  • The shortest living patients by default receive
    the least amount of treatment
  • Extreme example: comparing survival duration for
    heart transplant patients (treatment = yes) with
    those who die on the wait list (treatment = no)
  • Treatment received favors longer-living patients

45
Survival by Delivered Dose
  • Common patient factors related to both survival
    and delivered dose
  • Ice cream sales related to drowning deaths
  • Don't have ice cream before swimming?
  • Both are effects of a 3rd factor
  • Association ≠ cause and effect

46
Survival by Delivered Dose
  • The more placebo the better? Unlikely
  • More likely
  • More placebo delivered selects out longer
    survivors
  • Those with favorable baseline characteristics
    live longer and receive more treatment
  • Same for active treatment

47
DFS by Delivered Dose: NSABP Trial of LPAM
  • LPAM Dose Rec'd / 5-Year DFS
  • >85%: 69%
  • 65-84%: 67%
  • <65%: 26%
  • Placebo: 46%

48
Survival by Tumor Response
  • Same problems as survival by dose received: both
    are time-related outcomes
  • Time bias
  • The shortest living patients are by default in
    the no response group
  • Common patient factors related to both survival
    and response
  • Baseline prognosis (e.g. disease sites)
  • Age

49
Survival by a Time Related Factor
  • Dose intensity questions should be answered by
    randomized trials
  • Survival by response status:
  • Landmark analysis can remove time bias
  • Proper analysis techniques may remove time bias
    and demonstrate association, but cannot remove
    common-factor self selection, therefore still
    cannot infer cause and effect