Title: Statistical Principles for Clinical Research
1. Statistical Principles for Clinical Research
Conducting Clinical Trials 2007
Sponsored by the NIH General Clinical Research Center,
Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center
November 1, 2007
Peter D. Christenson
2. Speaker Disclosure Statement
The speaker has no financial relationships
relevant to this presentation.
3. Recommended Textbook
- Making inference
- Design issues
- Biases
- How to read papers
- Meta-analyses
- Dropouts
- Non-mathematical
- Many examples
4. Example: Harbor Study Protocol
18 pages of Background and Significance, Preliminary Studies, and Research Design and Methods. Then: "Pearson correlation, repeated measures general linear models, ANOVA analyses, and Student t-tests will be used where appropriate. The two main parameters of interest will be A and B. For A, using a t-test, 40 subjects provide 80% assurance that a XX% reduction will be detected, with p < 0.05. Similar comparisons as for A and B will be carried out..."
5. Example: Harbor Study Protocol
The Good: "The two main parameters of interest will be A and B. For A, using a t-test, 40 subjects provide 80% assurance that a XX% reduction will be detected, with p < 0.05."
- Because:
  - Explicit: specifies the primary outcomes of interest.
  - Explicit: justification for the number of subjects.
6. Example: Harbor Study Protocol
... the Bad: "Pearson correlation, repeated measures general linear models, ANOVA analyses, and Student t-tests will be used where appropriate."
- Because:
  - Boilerplate.
  - These methods are almost always used.
  - "Where appropriate"?
  - Tries to satisfy the reviewer, not the science.
7. Example: Harbor Study Protocol
... and the Ugly: "Similar comparisons as for A and B will be carried out..."
- Because:
  - The primary analysis is OK: the difference between 2 visits for 2 measures, A and B.
  - But 15 measures were taken at each of 19 visits.
  - Torture the data long enough, and it will confess to something.
8. Goals of this Presentation
More good. Less bad. Less ugly.
9. Biostatistical Involvement in Studies
Off-site statistical design and analysis:
- Multicenter studies: data coordinating center.
- In-house drug company statisticians.
- CRO, through NIH or a drug company.
- Local study contracted elsewhere, e.g., UCLA, USC, CRO.
Local protocol, and statistical design and analysis:
- Occasionally multicenter.
10. Studies with Off-Site Biostatistics
- Not responsible for statistical design and analysis.
- Are responsible for study conduct that may:
  - impact the analysis and the believability of results.
  - reduce the sensitivity (power) of the study to detect effects.
11. Review of the Basic Method of Inference from Clinical Studies
12. Typical Study Data Analysis
A large enough signal-to-noise ratio → proves an effect beyond a reasonable doubt. Often:

Ratio = Signal / Noise = Observed Effect / (Natural Variation / √N)

For a t-test comparing two groups:

t = Ratio = Difference in Means / (SD / √N)

Degree of allowable doubt → how large t needs to be.
5% (p < 0.05) → t > 2
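A minimal sketch of this ratio in Python (assuming NumPy and SciPy are available; the data below are simulated, not from any study): the two-group t statistic computed by hand and checked against scipy.stats.ttest_ind.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=10.0, scale=4.0, size=40)   # hypothetical control arm
group_b = rng.normal(loc=12.0, scale=4.0, size=40)   # hypothetical treatment arm

# Signal: difference in means.  Noise: pooled SD scaled for two equal-sized groups.
signal = group_b.mean() - group_a.mean()
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
noise = pooled_sd * np.sqrt(2 / len(group_a))
print("hand-computed t:", signal / noise)

t_stat, p_value = stats.ttest_ind(group_b, group_a)
print("scipy t, p:", t_stat, p_value)   # |t| > ~2 corresponds to p < 0.05 here
```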
13. Meaning of p-value
p-value: the probability of a test statistic (ratio) that is at least as deviant as was observed, if there is really no effect. Smaller p-values → more evidence of effect.
- Validity of the p-value interpretation typically requires:
  - Proper data generation, e.g., randomness.
  - Subjects providing independent information.
  - Data not being used in other statistical tests.
  - or an accounting for not satisfying these criteria.
→ p-values are earned by satisfying these requirements appropriately.
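A small simulation sketch of this definition (assuming Python with NumPy and SciPy; the "observed" t ratio of 2.3 is hypothetical): the p-value is approximated by the fraction of no-effect datasets whose t statistic is at least as deviant as the one observed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
observed_t = 2.3                       # hypothetical t ratio from a study
n_per_group, n_sims = 40, 20_000

null_ts = []
for _ in range(n_sims):
    a = rng.normal(size=n_per_group)   # both groups drawn with no true effect
    b = rng.normal(size=n_per_group)
    null_ts.append(stats.ttest_ind(a, b).statistic)

p_sim = np.mean(np.abs(null_ts) >= abs(observed_t))
print("simulated two-sided p-value:", p_sim)   # roughly 0.02-0.03
```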
14. Analogy with Diagnostic Testing
Analogy: True Effect ↔ Disease; Study Claim ↔ Diagnosis.

                           Truth: No Effect           Truth: Effect
Study claims no effect     Correct (Specificity)      Error
Study claims effect        Error                      Correct (Sensitivity)

Set p = 0.05 → Specificity = 95% (typical).
Power = Sensitivity: maximize; choose N for 80% (typical).
15. Study Conduct Impacting Analysis
Reduced effect detectability (a smaller signal-to-noise ratio) results from:
- Non-adherence of study personnel to the protocol in general: increases variation.
- Enrolling subjects who do not satisfy the inclusion or exclusion criteria: can decrease the observed effect. E.g., with no effect in the 10% wrongly included and a real effect of 50%, the observed effect is 0.9(50%) = 45%.
- Subjects not completing the entire study: may decrease N, or give potentially conflicting results.
16. Potentially Conflicting Results
Example: subjects not completing the entire study.
17. Tiagabine Study Results: How Believable?
[Figure: three analyses (1, 2, 3) of the same trial.]
Conclusions differ depending on how the non-completing subjects (24%) are handled in the analysis.
The primary analysis here is specified, but we would prefer robustness to the method of analysis (agreement), which is more likely with more completing subjects.
18. Study Conduct Impacting Analysis: Intention-to-Treat (ITT)
ITT typically specifies that all subjects are included in the analysis, regardless of treatment compliance or whether they are lost to follow-up.
Purposes:
- Avoid bias from subjective exclusions or differential exclusion between treatment groups.
- Sometimes argued to mimic non-compliance in a real-world setting, with more emphasis on the policy implications of societal effectiveness than on scientific efficacy.
Not appropriate for many studies.
Continued
19. Study Conduct Impacting Analysis: Intention-to-Treat (ITT)
Lost to follow-up: always minimize; there is no real-world analogy as there is for treatment compliance. Need to define outcomes for non-completing subjects.
A current Harbor study with N = 1200 would need N = 3000 if ITT were used, 20% were lost, and the lost subjects were counted as treatment failures.
20. ITT: Need to Impute Unknown Values
[Figure: change from baseline for individual subjects at baseline, an intermediate visit, and the final visit, shown two ways.
Observations — LOCF (last observation carried forward): ignores presumed progression.
Ranks — LRCF (last rank carried forward): maintains the expected relative progression.]
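A minimal sketch of LOCF imputation (assuming Python with pandas; the subjects, visits, and values are hypothetical): each non-completer's missing later visits are filled with their last observed value.

```python
import pandas as pd

visits = pd.DataFrame(
    {"baseline": [0.0, 0.0, 0.0],
     "intermediate": [1.2, 0.8, None],   # subject 3 missed the intermediate visit
     "final": [None, 1.5, None]},        # subjects 1 and 3 did not complete
    index=["subj1", "subj2", "subj3"],
)

# LOCF: carry each subject's last observation forward across the visit columns.
locf = visits.ffill(axis=1)
print(locf)
```

LRCF would instead carry forward each subject's rank among the subjects observed at each visit, preserving their expected relative progression rather than a fixed value.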
21. Study Conduct Impacting Feasibility: Potential Effects of Slow Enrollment
- Needed N may be impossible → study stopped.
- Competitive site enrollment → local financial loss.
- Insufficient person-years (PY) of observation for some studies, even if N is attained.
[Figure: number of subjects enrolled over years 0-2 under planned, slower, and slower-yet enrollment; the area under each curve gives the person-years of observation and determines whether effects of 1.1 or 1.7 can be detected.]
22. Biostatistical Involvement in Studies
Off-site statistical design and analysis:
- Multicenter studies: data coordinating center.
- In-house drug company statisticians.
- By CRO, through NIH or a drug company.
- Local study contracted elsewhere, e.g., UCLA, USC, CRO.
Local protocol, and statistical design and analysis:
- Occasionally multicenter.
23. Local Protocols and Data Analysis
- Develop the protocol and data analysis plan.
- Have a randomization and blinding strategy, if the study requires one.
- Data management.
- Perform data analyses.
24. Local Data Analysis Resources
Biostatistician: Peter Christenson, PChristenson@labiomed.org.
- Develop study design and analysis plan.
- Advise throughout for any study.
- Perform all non-basic analyses.
- Full responsibility for studies with a funded FTE.
- Review some protocols for committees.
Data management: database development for GCRC studies by the database manager.
25. Statistical Components of Protocols
- Target population / source of subjects.
- Quantification of aims, hypotheses.
- Case definitions, endpoints quantified.
- Randomization plan, if any.
- Masking, if used.
- Study size: screened, enrolled, completing.
- Use of data from non-completers.
- Justification of study size (power, precision, other).
- Methods of analysis.
- Mid-study analyses.
26. Selected Statistical Components and Issues
27. Case Definitions and Endpoints
- Primary case definitions and endpoints need careful thought.
- Results will need to be reported based on these.
Example: a study at Harbor where the definition of cure was very strict. The data were analyzed with this definition, and the cure rates were too low to be taken seriously. The scientific method → they still need to be reported; otherwise it is cherry-picking. Publication: use the primary definition and explain it; also report with the secondary definition. Less credible.
28. Randomization
- Helps assure attributability of treatment effects.
- Blocked randomization assures approximate chronologic equality of the numbers of subjects in each treatment group.
- Recruiters must not have access to the randomization list.
- The list can be created with a random number generator in software (a sketch follows), printed tables in statistics texts, or even shuffled slips of paper.
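A minimal sketch of a blocked randomization list (assuming Python; the block size of 4 and the arm labels A/B are hypothetical choices): within each block the arms appear equally often in random order, so group sizes stay approximately equal over time.

```python
import random

def blocked_randomization(n_subjects, block_size=4, arms=("A", "B"), seed=2007):
    rng = random.Random(seed)
    per_arm = block_size // len(arms)
    assignments = []
    while len(assignments) < n_subjects:
        block = list(arms) * per_arm   # e.g., ["A", "B", "A", "B"]
        rng.shuffle(block)             # randomize order within the block
        assignments.extend(block)
    return assignments[:n_subjects]

print(blocked_randomization(12))
```

The seed makes the list reproducible for the statistician; recruiters still must not see it.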
29. Non-completing Subjects
- Enrolled subjects are never "dropouts."
- The protocol should specify:
  - the primary analysis set (e.g., ITT or per-protocol).
  - how final values will be assigned to non-completers.
- Time-to-event (survival analysis) studies may not need final assignments; use the time followed.
- Study size estimates should incorporate the number of expected non-completers.
30. Study Size: Power
- Power: the probability of detecting real effects of a specified minimal (clinically relevant) magnitude.
- Power will be different for each outcome.
- Power depends on the statistical method.
- Five factors, including power, are inter-related; fixing four of them specifies the fifth:
  - Study size
  - Heterogeneity among subjects (SD)
  - Magnitude of treatment effect to be detected
  - Power to detect this magnitude of effect
  - Acceptable chance of a false positive conclusion, usually 0.05
31. Free Study Size Software
www.stat.uiowa.edu/rlenth/Power
32. Free Study Size Software: Example
Pilot data: SD = 8.19 in 36 subjects. We propose N = 40 subjects/group in order to provide 80% power to detect (p < 0.05) an effect Δ of 5.2.
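A minimal sketch checking the same calculation in Python (assuming statsmodels is available; this is not the Lenth software from the previous slide): SD 8.19, a detectable difference of 5.2, two-sided α = 0.05, and 80% power give roughly 40 subjects per group.

```python
from statsmodels.stats.power import TTestIndPower

sd, delta, alpha, power = 8.19, 5.2, 0.05, 0.80
n_per_group = TTestIndPower().solve_power(
    effect_size=delta / sd,        # standardized effect (Cohen's d)
    alpha=alpha,
    power=power,
    alternative="two-sided",
)
print(round(n_per_group))          # ~40, matching the proposed study size
```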
33. Study Size May Not Be Based on Power
Precision refers to how well a measure is estimated. Margin of error: the value (half-width) of the 95% confidence interval. Smaller margin of error ↔ greater precision. To achieve a specified margin of error, solve the CI formula for N. Polls: N ≈ 1000 → margin of error of about 1/√N ≈ 3%.
Pilot studies, Phase I, and some Phase II: power is not relevant; a goal may be obtaining an SD for future studies.
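A small sketch of the poll arithmetic (assuming Python; the 3% target is taken from the slide): for a proportion near 0.5, the 95% CI half-width is about 1/√N, and the CI formula can be solved for N.

```python
import math

N, p = 1000, 0.5                                # worst-case proportion
margin = 1.96 * math.sqrt(p * (1 - p) / N)      # roughly 1 / sqrt(N)
print(f"margin of error with N = {N}: {margin:.1%}")   # about 3%

# Solve the CI formula for N to achieve a 3% margin of error:
target = 0.03
n_needed = (1.96 / target) ** 2 * p * (1 - p)
print(f"N needed for a {target:.0%} margin: {round(n_needed)}")   # about 1067
```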
34. Mid-Study Analyses
- Mid-study comparisons should not be made before study completion unless they are planned for (interim analyses). Early comparisons are unstable, and can invalidate the final comparisons.
- Interim analyses are planned comparisons at specific times, usually by an unmasked advisory board. They allow stopping the study early due to very dramatic effects, and the final comparisons, if the study continues, are adjusted to validly account for the peeking.
Continued
35. Mid-Study Analyses
[Figure: estimated effect plotted against time and the number of subjects enrolled; with too many analyses, the unstable early estimates can lead to a wrong early conclusion.]
Need to monitor, but also account for the many analyses.
36. Mid-Study Analyses
- Mid-study reassessment of study size is advised for long studies. Only the standard deviations to date, not the effects themselves, are used to assess the original design assumptions.
- Feasibility analysis:
  - may use the assessment noted above to decide whether to continue the study.
  - may measure effects, as in interim analyses, by unmasked advisors, to project ahead the likelihood of finding effects at the planned end of the study.
Continued
37. Mid-Study Analyses
Examples: studies at Harbor that were randomized but not masked, with data available to the PI, who compared the treatment groups repeatedly as more subjects were enrolled.
Study 1: Groups do not differ; plan to add more subjects. Consequence → the final p-value is not valid; its probability assumes no prior knowledge of the effect.
Study 2: Groups differ significantly; plan to stop the study. Consequence → use of this p-value is not valid; the probability requires incorporating the later comparison.
38. Multiple Analyses at Study End
False positive conclusions: torturing the data.
Replacing "subgroup" with "analysis" gives a similar problem.
Lagakos, NEJM 354(16):1667-1669.
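A small sketch of the torturing-data arithmetic (assuming Python and, for simplicity, independent tests): with no real effects, the chance of at least one false positive grows quickly with the number of analyses performed at α = 0.05.

```python
alpha = 0.05
for k in (1, 5, 10, 20):
    fwer = 1 - (1 - alpha) ** k   # P(at least one false positive) with no real effects
    print(f"{k:2d} analyses -> {fwer:.2f}")
# 1 -> 0.05, 5 -> 0.23, 10 -> 0.40, 20 -> 0.64
```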
39. Multiple Analyses at Study End
- There are formal methods to incorporate the number of multiple analyses (a Bonferroni sketch follows this list):
  - Bonferroni
  - Tukey
  - Dunnett
- Transparency about what was done is most important.
- Be aware of the number of analyses and report it with any conclusions.
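A minimal sketch of the Bonferroni method via statsmodels (the five p-values are hypothetical results from five analyses, not from any study):

```python
from statsmodels.stats.multitest import multipletests

raw_p = [0.04, 0.008, 0.30, 0.049, 0.20]
reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method="bonferroni")

for p, padj, r in zip(raw_p, adj_p, reject):
    print(f"raw p = {p:.3f}   Bonferroni-adjusted p = {padj:.3f}   significant: {r}")
# Only the analysis with raw p = 0.008 survives; the nominal 0.04 and 0.049 do not.
```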
40. Summary: Bad Science That May Seem So Good
- Re-examining data, or using many outcomes, while seeming to perform due diligence.
- Adding subjects to a study that is showing marginal effects, or stopping early due to strong results.
- Examining effects in subgroups. See NEJM 2006;354(16):1667-1669.
- Actually bad? It could be negligent NOT to do these, but you need to account for doing them.
41. Statistical Software
42. Professional Statistics Software Package
[Screenshot with callouts: enter code (syntax); stored data accessible; output window.]
43. Microsoft Excel for Statistics
- Primarily for descriptive statistics.
- Limited output.
44. Almost Free On-Line Statistics Software
www.statcrunch.com
Runs from the browser, not locally. $5 for 6 months of usage. Potential HIPAA concerns.
Supported by NSF.
45. Typical Statistics Software Package
Select methods from menus; data in a spreadsheet; output appears after menu selection.
www.ncss.com, www.minitab.com, www.stata.com ($100 - $500)
46. http://gcrc.labiomed.org/biostat
This and other biostatistics talks are posted there.
47. Conclusions
- Don't put off dealing with slow enrollment: find the cause and solve it. I am available.
- Do put off analyses of efficacy, but not of design assumptions. I am available.
- P-values are earned by following the methods that are needed for them to be valid. I am available.
- You may have to pay for lack of attention to protocol decisions, to satisfy the scientific method. I am available.
- Software always takes more time than expected.
48. Thank You
Nils Simonson, in Furberg & Furberg, Evaluating Clinical Research