Title: Prognostic Model Building with Biomarkers in Pharmacogenomics Trials
1. Prognostic Model Building with Biomarkers in Pharmacogenomics Trials
- Li-an Xu and Douglas Robinson
- Statistical Genetics and Biomarkers
- Exploratory Development, Global Biometric Sciences
- Bristol-Myers Squibb
- 2006 FDA/Industry Statistics Workshop
- Theme: Statistics in the FDA and Industry: Past, Present, and Future
- Washington, DC, September 27-29, 2006
2. Outline
- Statistical Challenges in Prognostic Model Building
  - Data quantity and quality across multiple platforms
  - Dimension reduction in model building process
  - Model performance measures
  - Realistic assessment of model performance
  - Handling correlated predictors when p >> n
3. Data Quantity and Quality Across Platforms
- Tumor samples for mRNA
  - Trial A: sample size 161 subjects
    - 134 usable (sufficient quality and quantity) mRNA samples (85%)
  - Trial B: sample size 110 subjects
    - 83 usable mRNA samples (75%)
- Plasma protein profiling (Liquid Chromatography / Mass Spectrometry)
  - Trial B: sample size 110 subjects
    - 90 usable plasma samples (82%)
- Even if sample collection is mandatory, usable sample size < subject sample size
- Need to design studies based on expected usable sample size
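As a rough illustration of designing on expected usable sample size, the sketch below inflates enrollment by the observed usable fraction. The function name `required_enrollment` and the target of 100 usable samples are hypothetical; only the Trial B mRNA yield (83 of 110) comes from the slide.

```python
def required_enrollment(target_usable: int, usable: int, enrolled: int) -> int:
    """Subjects to enroll so that the expected usable-sample count reaches the target.

    Assumes a future study has the same usable fraction (usable / enrolled)
    as a past one, which is a planning assumption, not a guarantee.
    """
    if not 0 < usable <= enrolled:
        raise ValueError("need 0 < usable <= enrolled")
    # Integer ceiling of target_usable / (usable / enrolled).
    return (target_usable * enrolled + usable - 1) // usable


# Trial B mRNA yield from the slide: 83 usable samples out of 110 subjects.
# The target of 100 usable samples is a hypothetical planning number.
n_enroll = required_enrollment(target_usable=100, usable=83, enrolled=110)
print(n_enroll)  # 133
```

A more careful design would also allow for uncertainty in the usable fraction itself, since it was estimated from a single trial.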
4. Dimension Reduction in Prognostic Model Building
- Number of potential predictors is greater than number of subjects (p >> n) in high-throughput biomarker studies
  - No unique solutions in prognostic model fitting with traditional methods
  - Regularized methods can provide some possible solutions
    - Penalized logistic regression (PLR) with Recursive Feature Elimination (RFE)
    - Threshold gradient descent with RFE
- Further dimension reduction may still be needed
  - Incorporate prior information (e.g., results from preclinical studies as the starting point for p)
  - Intersection of single-biomarker results from multiple statistical methods
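The PLR with RFE idea can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the L2 (ridge) penalty, the plain gradient-descent fit, the halving schedule, and the synthetic data are all assumptions, and the names `fit_ridge_logistic` and `plr_rfe` are hypothetical. Threshold gradient descent is not shown.

```python
import math
import random


def fit_ridge_logistic(X, y, lam=0.1, lr=0.1, n_iter=200):
    """L2-penalized logistic regression fitted by plain gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # guard against overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
    return w, b


def plr_rfe(X, y, n_keep, drop_fraction=0.5, **fit_args):
    """Refit, rank features by |coefficient|, drop the weakest, repeat."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        Xa = [[row[j] for j in active] for row in X]
        w, _ = fit_ridge_logistic(Xa, y, **fit_args)
        ranked = sorted(range(len(active)), key=lambda k: abs(w[k]), reverse=True)
        n_next = max(n_keep, int(len(active) * (1 - drop_fraction)))
        n_next = min(n_next, len(active) - 1)  # always remove at least one
        active = sorted(active[k] for k in ranked[:n_next])
    return active


# Synthetic data (assumption): 30 subjects, 40 predictors, only 0 and 1 informative.
rng = random.Random(0)
n, p = 30, 40
y = [i % 2 for i in range(n)]
X = []
for label in y:
    row = [rng.gauss(0.0, 1.0) for _ in range(p)]
    shift = 2.0 if label == 1 else -2.0
    row[0] = shift + rng.gauss(0.0, 0.5)
    row[1] = shift + rng.gauss(0.0, 0.5)
    X.append(row)

selected = plr_rfe(X, y, n_keep=5, lam=0.1, lr=0.1, n_iter=200)
print(selected)
```

In practice predictors would be standardized before ranking by coefficient size, and the penalty strength would be tuned rather than fixed.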
5. Dimension Reduction Through Penalized Logistic Regression with Recursive Feature Elimination to Select Genes
[Figure: training set expression matrix; axes: Genes, Patients]
6. Dimension Reduction Through Preclinical Studies
- Predicting cell line sensitivity to a compound
  - 18 cancer cell lines (12 sensitive, 6 resistant)
  - Identified top 200 genes associated with in vitro sensitivity/resistance
[Figure: example of one gene; expression across the 18 cancer cell lines, sensitive vs. resistant]
7. Predicting Response in Trial A

                | All treated patients (N=161) | Patients included in the genomics analysis (N=134)
Response, n (%) | 29 (18%)                     | 23 (17%)

Models | PPV (95% CI) | NPV (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | Error
Starting with full gene list, resulting in 6-gene model | 0 (0-0.30) | 0.81 (0.69-0.89) | 0 (0-0.26) | 0.84 (0.72-0.91) | 0.580
Starting with preclinical top 200, resulting in 10-gene model | 0.45 (0.21-0.72) | 0.89 (0.79-0.95) | 0.45 (0.21-0.72) | 0.89 (0.79-0.95) | 0.326

- Dimension reduction by using prior preclinical results seemed to help in this trial
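The slides do not state how the confidence intervals or the Error column were computed, nor do they give the underlying confusion matrix. As a sketch under those caveats, the code below computes the four measures with Wilson score intervals and a balanced error rate (the mean of the false-negative and false-positive rates). Both choices are assumptions, and the counts are hypothetical, picked only because they reproduce the values reported for the 10-gene model.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = successes / total
    denom = 1.0 + z * z / total
    center = (p_hat + z * z / (2 * total)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / total
                         + z * z / (4 * total * total)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def summarize(tp: int, fp: int, fn: int, tn: int):
    """Point estimates with Wilson intervals, plus the balanced error rate."""
    sens, spec = tp / (tp + fn), tn / (tn + fp)
    return {
        "PPV": (tp / (tp + fp), wilson_interval(tp, tp + fp)),
        "NPV": (tn / (tn + fn), wilson_interval(tn, tn + fn)),
        "Sensitivity": (sens, wilson_interval(tp, tp + fn)),
        "Specificity": (spec, wilson_interval(tn, tn + fp)),
        "balanced_error": 1.0 - (sens + spec) / 2.0,
    }


# Hypothetical counts (not from the slides): 11 responders, 56 non-responders.
report = summarize(tp=5, fp=6, fn=6, tn=50)
for name in ("PPV", "NPV", "Sensitivity", "Specificity"):
    est, (lo, hi) = report[name]
    print(f"{name}: {est:.2f} ({lo:.2f}-{hi:.2f})")
print(f"balanced error: {report['balanced_error']:.3f}")
```

That these counts match the reported row is consistent with the assumptions but does not confirm them; other counts or interval methods could give similar rounded values.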
8. Dimension Reduction Through Intersection of Single-Biomarker Results from Multiple Statistical Methods

Method  | Resp1 | Resp2 | Resp3 | Resp4 | TTP
Log Reg | X     | X     | X     | X     |
t-Test  | X     | X     | X     | X     |
Cox     |       |       |       |       | X

[Venn diagram: Logistic Regression, 297 probesets; t-Test, 396 probesets; Cox Proportional Hazards, 446 probesets; region counts shown: 46, 97, 51]
- Intersection resulted in 51 potential candidates
- It may be more beneficial to start model building with this set than the complete set of potential predictors (work currently in progress)
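The intersection step itself is simple set arithmetic. In the sketch below the probe-set identifiers are made up for illustration, and `intersect_hits` is a hypothetical helper name; each method is assumed to have already produced its own list of significant probe sets.

```python
def intersect_hits(hits_by_method):
    """Return the probe sets flagged by every method."""
    sets = [set(hits) for hits in hits_by_method.values()]
    return set.intersection(*sets) if sets else set()


# Hypothetical single-biomarker results (identifiers are illustrative only).
hits = {
    "logistic_regression": {"ps_001", "ps_002", "ps_003", "ps_007"},
    "t_test": {"ps_002", "ps_003", "ps_004", "ps_007"},
    "cox_ph": {"ps_003", "ps_005", "ps_007", "ps_009"},
}

candidates = sorted(intersect_hits(hits))
print(candidates)  # ['ps_003', 'ps_007']
```

The surviving set then becomes the starting predictor list for multivariable model building.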
9. Model Performance Measures
- Sensitivity, Specificity, Positive and Negative Predictive Value are common measures of model performance
  - Dependent on the threshold
- Area under the ROC curve (AUC) may be a better measure for comparing models
- These figures are from simulated perfect predictors
  - All three models yield complete separation between responders and non-responders
  - Arbitrary threshold of 0.5 probability may lead one to believe that model 2 is superior
  - AUC correctly shows equivalence

        | Sensitivity | Specificity | PPV  | NPV  | AUC
Model 1 | 0.73        | 1           | 1    | 0.79 | 1
Model 2 | 1           | 1           | 1    | 1    | 1
Model 3 | 1           | 0.77        | 0.81 | 1    | 1
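The threshold dependence can be reproduced with a toy example. The predicted probabilities below are invented, not the simulated data behind the table: each model separates responders from non-responders completely, but only one of them happens to separate at 0.5. AUC is computed here with the Mann-Whitney formulation.

```python
def auc(scores_pos, scores_neg):
    """Mann-Whitney AUC: P(responder score > non-responder score), ties count 1/2."""
    wins = sum((sp > sn) + 0.5 * (sp == sn)
               for sp in scores_pos for sn in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))


def threshold_metrics(scores_pos, scores_neg, threshold=0.5):
    """Sensitivity, specificity, PPV and NPV when calling score >= threshold positive."""
    tp = sum(s >= threshold for s in scores_pos)
    fn = len(scores_pos) - tp
    fp = sum(s >= threshold for s in scores_neg)
    tn = len(scores_neg) - fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else None,
        "npv": tn / (tn + fn) if tn + fn else None,
    }


# Hypothetical predicted probabilities: (responders, non-responders) per model.
models = {
    "Model 1": ([0.35, 0.45, 0.60, 0.70], [0.05, 0.10, 0.20, 0.30]),
    "Model 2": ([0.60, 0.70, 0.80, 0.90], [0.10, 0.20, 0.30, 0.40]),
    "Model 3": ([0.75, 0.80, 0.90, 0.95], [0.30, 0.40, 0.55, 0.65]),
}

results = {name: (threshold_metrics(pos, neg), auc(pos, neg))
           for name, (pos, neg) in models.items()}
for name, (metrics, area) in results.items():
    print(name, metrics, "AUC =", area)
```

At the 0.5 cut-off only Model 2 looks perfect, yet all three have an AUC of 1, because a different threshold would give perfect classification for Models 1 and 3 as well.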
10. Realistic Assessment of Model Performance
- When sample size is reasonably large
  - Split sample into a training set and an independent test set
  - Build the model on the training set and test the model performance on the test set
  - Pro: one independent test of model performance for the model picked in the training set
  - Cons
    - When sample size is small, the estimate of performance may have a large variance
    - Reduced sample size for training may yield a sub-optimal model
- Christophe Ambroise and Geoffrey J. McLachlan, PNAS 99(10), 2002
  - Entire model building procedure should be cross-validated
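The point about cross-validating the entire procedure can be illustrated on pure-noise data, where no model should beat chance. The sketch compares selecting features once on all samples against re-selecting them inside every training fold. The mean-difference filter and nearest-centroid rule are simple stand-ins chosen for brevity, not the methods used in the trials, and all function names are hypothetical.

```python
import random


def top_features(X, y, k):
    """Rank features by absolute difference in class means; keep the top k."""
    def mean_diff(j):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        return abs(sum(pos) / len(pos) - sum(neg) / len(neg))
    return sorted(range(len(X[0])), key=mean_diff, reverse=True)[:k]


def centroid_predict(X_train, y_train, x_new, features):
    """Nearest-centroid rule restricted to the selected features."""
    def centroid(label):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        return [sum(r[j] for r in rows) / len(rows) for j in features]
    c1, c0 = centroid(1), centroid(0)
    d1 = sum((x_new[j] - c) ** 2 for j, c in zip(features, c1))
    d0 = sum((x_new[j] - c) ** 2 for j, c in zip(features, c0))
    return 1 if d1 < d0 else 0


def cv_accuracy(X, y, k_features, n_folds, select_inside):
    """k-fold accuracy; feature selection is redone per fold only if select_inside."""
    n = len(X)
    folds = [list(range(i, n, n_folds)) for i in range(n_folds)]
    fixed = None if select_inside else top_features(X, y, k_features)
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        feats = top_features(X_tr, y_tr, k_features) if select_inside else fixed
        correct += sum(centroid_predict(X_tr, y_tr, X[i], feats) == y[i]
                       for i in test_idx)
    return correct / n


# Pure noise (assumption): 40 subjects, 500 predictors, labels unrelated to X.
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in y]

outside = cv_accuracy(X, y, k_features=10, n_folds=5, select_inside=False)
inside = cv_accuracy(X, y, k_features=10, n_folds=5, select_inside=True)
print(f"selection outside CV: {outside:.2f}   selection inside CV: {inside:.2f}")
```

With selection done outside the loop the held-out samples have already influenced which predictors were chosen, so that estimate tends to be optimistic; selecting inside each fold keeps the test samples untouched.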
11. Realistic Assessment of Model Performance
- When sample size is small, one cannot split data into training / test set
  - Cross-validation alone is a reasonable alternative
  - Warning: initial performance estimate may be misleading
[Figure: cross-validated AUC vs. number of predictors]
- Cross-validation should be repeated multiple times
  - Allows one to observe effects of sampling variability
  - The average of replicate estimators gives a more accurate assessment of model performance
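Repeating cross-validation over different random partitions might look like the sketch below. It reports accuracy rather than the AUC shown on the slide, purely to keep the example short, and `nearest_mean` is a hypothetical one-predictor stand-in for the full model-building procedure. The data are simulated.

```python
import random
import statistics


def repeated_cv(X, y, fit_predict, n_folds=5, n_repeats=20, seed=0):
    """Return one cross-validated accuracy per random re-partition of the data."""
    rng = random.Random(seed)
    n = len(X)
    estimates = []
    for _ in range(n_repeats):
        order = list(range(n))
        rng.shuffle(order)  # a new random fold assignment for each repeat
        correct = 0
        for f in range(n_folds):
            test_idx = order[f::n_folds]
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            preds = fit_predict([X[i] for i in train_idx],
                                [y[i] for i in train_idx],
                                [X[i] for i in test_idx])
            correct += sum(p == y[i] for p, i in zip(preds, test_idx))
        estimates.append(correct / n)
    return estimates


def nearest_mean(X_train, y_train, X_test):
    """Hypothetical stand-in model: assign each value to the closer class mean."""
    m1 = statistics.mean(x for x, l in zip(X_train, y_train) if l == 1)
    m0 = statistics.mean(x for x, l in zip(X_train, y_train) if l == 0)
    return [1 if abs(x - m1) < abs(x - m0) else 0 for x in X_test]


# Simulated data (assumption): one predictor, overlapping classes, 40 subjects.
rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [rng.gauss(float(label), 1.0) for label in y]

estimates = repeated_cv(X, y, nearest_mean, n_folds=5, n_repeats=20, seed=0)
print(f"mean = {statistics.mean(estimates):.3f}, "
      f"sd across repeats = {statistics.stdev(estimates):.3f}")
```

The spread across repeats shows how much a single cross-validation run can move with the fold assignment alone, which is why the average is the more stable summary.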
12. Handling Correlated Predictors When p >> n
- Complex correlation structure (mRNA as example)
  - Multiple probe sets interrogate the same gene
  - Multiple genes function together in pathways
  - Not all pathways are known
  - Multiple response definitions that are interrelated
  - False positive genes may be correlated with true positives
- Most prognostic modeling techniques do not handle this well
  - Recursive feature elimination may remove important predictors because of correlations
- This is an open research problem
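The slides leave this problem open, and the sketch below does not solve it. It shows one possible diagnostic: after feature elimination, list the removed predictors that are strongly correlated with a retained one, so they can be reviewed rather than silently lost. The probe names, values, the 0.9 cut-off, and the function names are all hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (v - mean_b) for x, v in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def correlated_partners(columns, kept, threshold=0.9):
    """For each kept predictor, list removed predictors with |r| >= threshold."""
    return {
        k: [name for name in columns
            if name not in kept
            and abs(pearson(columns[k], columns[name])) >= threshold]
        for k in kept
    }


# Hypothetical expression values for three probe sets across five samples.
columns = {
    "probe_a1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "probe_a2": [1.1, 2.0, 2.9, 4.2, 5.1],  # nearly duplicates probe_a1
    "probe_b": [5.0, 1.0, 4.0, 2.0, 3.0],
}

partners = correlated_partners(columns, kept=["probe_a1"])
print(partners)  # {'probe_a1': ['probe_a2']}
```

A pairwise check like this scales quadratically with the number of predictors and ignores pathway-level structure, so it is only a starting point.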
13. Summary
- Need to design studies based on expected usable sample size
- Dimension reduction in the model building process
  - Overfitting problem can be mitigated by regularized methods
  - To further reduce the candidate set of predictors
    - Preclinical information can be useful
    - Intersection of single-biomarker results by different statistical methods may also be useful
- Model performance
  - Independent test set may be important for validation purposes. When sample size is small, cross-validation is a viable alternative.
  - Cross-validation should include biomarker selection procedures and needs to be performed appropriately
  - Cross-validation should be repeated multiple times
  - Performance measures should be carefully chosen when comparing multiple models. AUC often is a good choice.
- Handling correlated predictors is still an open research problem
14. Acknowledgments
Can Cai, Scott Chasalow, Ed Clark, Mark Curran, Ashok Dongre, Matt Farmer, Alexander Florczyk, Shirin Ford, Susan Galbraith, Ji Gao, Nancy Gustafson, Ben Huang, Tom Kelleher, Christiane Langer, Hyerim Lee, Haolan Lu, David Mauro, Shelley Mayfield, Oksana Mokliatchouk, Relekar Padmavathibai, Barry Paul, Lynn Ploughman, Amy Ronczka, Katy Simonsen, Eric Strittmatter, Dana Wheeler, Shujian Wu, Shuang Wu, Kim Zerba, Renping Zhang