Prognostic Model Building with Biomarkers in Pharmacogenomics Trials


1
Prognostic Model Building with Biomarkers in
Pharmacogenomics Trials
  • Li-an Xu (Statistical Genetics), Douglas Robinson
    (Biomarkers)
  • Exploratory Development, Global Biometric
    Sciences
  • Bristol-Myers Squibb
  • 2006 FDA/Industry Statistics Workshop
    Theme: Statistics in the FDA and Industry: Past,
    Present, and Future
    Washington, DC, September 27-29, 2006

2
Outline
  • Statistical Challenges in Prognostic Model
    Building
  • Data quantity and quality across multiple
    platforms
  • Dimension reduction in model building process
  • Model performance measures
  • Realistic assessment of model performance
  • Handling correlated predictors when p >> n

3
Data Quantity and Quality Across Platforms
  • Tumor samples for mRNA
  • Trial A: sample size 161 subjects
  • 134 usable (sufficient quality and quantity) mRNA
    samples (85%)
  • Trial B: sample size 110 subjects
  • 83 usable mRNA samples (75%)
  • Plasma protein profiling (liquid chromatography /
    mass spectrometry)
  • Trial B: sample size 110 subjects
  • 90 usable plasma samples (82%)
  • Even if sample collection is mandatory, usable
    sample size < subject sample size
  • Need to design studies based on expected usable
    sample size
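
A minimal sketch of the enrollment arithmetic this implies, assuming a planning usability rate in line with the 75-85% observed above (the target of 100 usable samples and the 80% rate are illustrative assumptions, not figures from these trials):

    import math

    # Illustrative planning assumptions (not from the trials above):
    target_usable = 100    # usable samples needed for the planned analysis
    usability_rate = 0.80  # expected fraction of samples that are usable

    # Enroll enough subjects so the expected usable count meets the target.
    required_enrollment = math.ceil(target_usable / usability_rate)
    print(required_enrollment)  # 125 subjects for ~100 expected usable samples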

4
Dimension Reduction in Prognostic Model Building
  • Number of potential predictors is greater than the
    number of subjects (p >> n) in high-throughput
    biomarker studies
  • No unique solutions in prognostic model fitting
    with traditional methods
  • Regularized methods can provide some possible
    solutions
  • Penalized logistic regression (PLR) with recursive
    feature elimination (RFE)
  • Threshold gradient descent with RFE
  • Further dimension reduction may still be needed
  • Incorporate prior information (e.g., results from
    preclinical studies as the starting set of predictors)
  • Intersection of single-biomarker results from
    multiple statistical methods

5
Dimension Reduction Through Penalized Logistic
Regression with Recursive Feature Elimination to
Select Genes
[Figure: training-set matrix of genes x patients used for PLR-RFE gene selection]
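
The slides do not give an implementation, but a minimal sketch of PLR with RFE on simulated data using scikit-learn (the penalty strength, elimination step, and final gene count are illustrative assumptions):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import RFE
    from sklearn.linear_model import LogisticRegression

    # Simulated training set: 100 patients x 1,000 genes (p >> n).
    X, y = make_classification(n_samples=100, n_features=1000,
                               n_informative=10, random_state=0)

    # L2-penalized logistic regression; smaller C means a stronger penalty.
    plr = LogisticRegression(penalty="l2", C=0.1, max_iter=1000)

    # Recursively refit and drop the 10% of genes with the smallest
    # |coefficient| until 10 genes remain.
    selector = RFE(plr, n_features_to_select=10, step=0.1)
    selector.fit(X, y)
    print(np.where(selector.support_)[0])  # indices of the selected genes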
6
Dimension Reduction Through Preclinical Studies
  • Predicting cell line sensitivity to a compound
  • 18 cancer cell lines (12 sensitive, 6 resistant)
  • Identified the top 200 genes associated with in
    vitro sensitivity/resistance

[Figure: expression of one example gene across the 18 cancer cell lines, grouped as sensitive vs. resistant]
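
As a sketch of this preclinical ranking step: the slide does not name the statistic, so a per-gene two-sample t-test between sensitive and resistant lines is assumed here, on simulated expression data:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    expr = rng.normal(size=(18, 5000))  # 18 cell lines x 5,000 genes (simulated)
    sensitive = np.array([True] * 12 + [False] * 6)  # 12 sensitive, 6 resistant

    # Two-sample t-test per gene, sensitive vs. resistant cell lines.
    t, p = stats.ttest_ind(expr[sensitive], expr[~sensitive], axis=0)

    # Keep the 200 genes most associated with in vitro sensitivity/resistance.
    top200 = np.argsort(p)[:200]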
7
Predicting Response in Trial A
                    All treated patients   Genomics analysis patients
Sample size         N=161                  N=134
Response            29 (18%)               23 (17%)

Model                                 PPV (95% CI)      NPV (95% CI)      Sensitivity (95% CI)  Specificity (95% CI)  Error
Full gene list -> 6-gene model        0 (0-0.30)        0.81 (0.69-0.89)  0 (0-0.26)            0.84 (0.72-0.91)      0.580
Preclinical top 200 -> 10-gene model  0.45 (0.21-0.72)  0.89 (0.79-0.95)  0.45 (0.21-0.72)      0.89 (0.79-0.95)      0.326
  • Dimension reduction by using prior preclinical
    results seemed to help in this trial

8
Dimension Reduction Through Intersection of
Single-Biomarker Results from Multiple
Statistical Methods
Method                     Resp1   Resp2   Resp3   Resp4   TTP
Logistic regression        X       X       X       X
t-test                     X       X       X       X
Cox proportional hazards                                   X

[Venn diagram: 297 probesets selected by logistic regression, 396 by the t-test, and 446 by Cox proportional hazards; pairwise overlaps of 46 and 97, with 51 probesets common to all three methods]
  • Intersection resulted in 51 potential candidates
  • It may be more beneficial to start model building
    with this set than the complete set of potential
    predictors (work currently in progress)
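
A sketch of the intersection step on simulated data; the slide does not specify the per-gene models or cutoff, so univariate fits and a p < 0.05 screen are assumptions here (statsmodels for logistic regression, lifelines for Cox):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from lifelines import CoxPHFitter
    from scipy import stats

    rng = np.random.default_rng(1)
    n_pat, n_genes = 100, 300
    expr = rng.normal(size=(n_pat, n_genes))   # expression matrix
    resp = rng.integers(0, 2, size=n_pat)      # binary response
    ttp = rng.exponential(10.0, size=n_pat)    # time to progression
    event = rng.integers(0, 2, size=n_pat)     # progression observed?

    def hits(pvals, alpha=0.05):
        return set(np.flatnonzero(np.asarray(pvals) < alpha))

    # Method 1: per-gene two-sample t-test, responders vs. non-responders.
    _, p_t = stats.ttest_ind(expr[resp == 1], expr[resp == 0], axis=0)

    # Method 2: per-gene univariate logistic regression on response.
    p_lr = [sm.Logit(resp, sm.add_constant(expr[:, j])).fit(disp=0).pvalues[1]
            for j in range(n_genes)]

    # Method 3: per-gene univariate Cox proportional hazards on TTP.
    p_cox = []
    for j in range(n_genes):
        df = pd.DataFrame({"gene": expr[:, j], "ttp": ttp, "event": event})
        fit = CoxPHFitter().fit(df, duration_col="ttp", event_col="event")
        p_cox.append(fit.summary.loc["gene", "p"])

    # Candidate predictors: genes flagged by all three methods.
    candidates = hits(p_t) & hits(p_lr) & hits(p_cox)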

9
Model Performance Measures
  • Sensitivity, Specificity, Positive and Negative
    Predictive Value are common measures of model
    performance
  • Dependent on the threshold
  • Area under the ROC curve (AUC) may be a better
    measure for comparing models
  • These results are based on simulated perfect
    predictors (summarized in the table below)
  • All three models yield complete separation
    between responders and non-responders
  • Arbitrary threshold of 0.5 probability may lead
    one to believe that model 2 is superior
  • AUC correctly shows equivalence

Model     Sensitivity   Specificity   PPV    NPV    AUC
Model 1   0.73          1             1      0.79   1
Model 2   1             1             1      1      1
Model 3   1             0.77          0.81   1      1
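
The same point can be reproduced with a short simulation: three models whose scores separate responders from non-responders perfectly but sit differently around the 0.5 threshold (the score distributions below are assumptions, not the presenters' simulation):

    import numpy as np
    from sklearn.metrics import confusion_matrix, roc_auc_score

    y = np.array([0] * 50 + [1] * 50)  # non-responders, then responders

    # Each model separates the classes perfectly; only calibration differs.
    models = {
        "scores shifted low":  np.r_[np.linspace(0.00, 0.20, 50),
                                     np.linspace(0.30, 0.60, 50)],
        "scores well placed":  np.r_[np.linspace(0.00, 0.40, 50),
                                     np.linspace(0.60, 1.00, 50)],
        "scores shifted high": np.r_[np.linspace(0.40, 0.70, 50),
                                     np.linspace(0.80, 1.00, 50)],
    }

    for name, scores in models.items():
        pred = (scores >= 0.5).astype(int)       # arbitrary 0.5 threshold
        tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
        print(name,
              "sens=%.2f" % (tp / (tp + fn)),
              "spec=%.2f" % (tn / (tn + fp)),
              "AUC=%.2f" % roc_auc_score(y, scores))
    # Sensitivity/specificity at the threshold differ; AUC is 1.0 for all.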
10
Realistic Assessment of Model Performance
  • When sample size is reasonably large
  • Split the sample into a training set and an
    independent test set
  • Build the model on the training set and test the
    model's performance on the test set
  • Pro: one independent test of model performance
    for the model picked in the training set
  • Cons:
  • When the sample size is small, the estimate of
    performance may have large variance
  • The reduced sample size for training may yield a
    sub-optimal model
  • Ambroise C. and McLachlan G.J., PNAS 99(10), 2002:
    the entire model building procedure should be
    cross-validated
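
Following Ambroise and McLachlan's point, a minimal sketch with scikit-learn in which gene selection sits inside the pipeline and is therefore redone on every training fold (the selector, k=10, and 5 folds are illustrative assumptions):

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import Pipeline

    X, y = make_classification(n_samples=80, n_features=2000,
                               n_informative=5, random_state=0)

    # Selecting genes on the full data and then cross-validating only the
    # classifier leaks information; putting selection inside the pipeline
    # repeats it on each training fold, so the estimate stays honest.
    model = Pipeline([("select", SelectKBest(f_classif, k=10)),
                      ("clf", LogisticRegression(max_iter=1000))])
    aucs = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(aucs.mean())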

11
Realistic Assessment of Model Performance
  • When sample size is small, one cannot split data
    into training / test set
  • Cross-validation alone is a reasonable
    alternative
  • Warning: the initial performance estimate may be
    misleading

[Figure: cross-validated AUC plotted against the number of predictors]
  • Cross-validation should be repeated multiple
    times
  • Allows one to observe effects of sampling
    variability
  • The average of the replicate estimates gives a more
    accurate assessment of model performance
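
A minimal sketch of repeated cross-validation, reusing the pipeline idea from the previous slide (the 5-fold x 10-repeat scheme is an illustrative assumption):

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
    from sklearn.pipeline import Pipeline

    X, y = make_classification(n_samples=80, n_features=2000,
                               n_informative=5, random_state=0)
    model = Pipeline([("select", SelectKBest(f_classif, k=10)),
                      ("clf", LogisticRegression(max_iter=1000))])

    # 5-fold cross-validation repeated 10 times with different random splits.
    rskf = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
    aucs = cross_val_score(model, X, y, cv=rskf, scoring="roc_auc")

    # The spread across repeats shows sampling variability; the average of
    # the replicate estimates is the more stable performance assessment.
    print(aucs.mean(), aucs.std())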

12
Handling Correlated Predictors When p >> n
  • Complex correlation structure (mRNA as example)
  • Multiple probe sets interrogate the same gene
  • Multiple genes function together in pathways
  • Not all pathways are known
  • Multiple response definitions that are
    interrelated
  • False positive genes may be correlated with true
    positives
  • Most prognostic modeling techniques do not handle
    this well
  • Recursive feature elimination may remove
    important predictors because of correlations
  • This is an open research problem

13
Summary
  • Need to design studies based on expected usable
    sample size
  • Dimension reduction in the model building process
  • Overfitting problem can be mitigated by
    regularized methods
  • To further reduce the candidate set of predictors
  • Preclinical information can be useful
  • Intersection of single-biomarker results by
    different statistical methods may also be useful
  • Model performance
  • Independent test set may be important for
    validation purposes. When sample size is small,
    cross-validation is a viable alternative.
  • Cross-validation should include biomarker
    selection procedures and needs to be performed
    appropriately
  • Cross-validation should be repeated multiple
    times
  • Performance measures should be carefully chosen
    when comparing multiple models. AUC often is a
    good choice.
  • Handling correlated predictors is still an open
    research problem

14
Acknowledgments
Haolan Lu, David Mauro, Shelley Mayfield, Oksana Mokliatchouk, Relekar Padmavathibai, Barry Paul, Lynn Ploughman, Amy Ronczka, Katy Simonsen, Eric Strittmatter, Dana Wheeler, Shujian Wu, Shuang Wu, Kim Zerba, Renping Zhang, Can Cai, Scott Chasalow, Ed Clark, Mark Curran, Ashok Dongre, Matt Farmer, Alexander Florczyk, Shirin Ford, Susan Galbraith, Ji Gao, Nancy Gustafson, Ben Huang, Tom Kelleher, Christiane Langer, Hyerim Lee