Title: Class prediction for experiments with microarrays
1. Class prediction for experiments with microarrays
- Lara Lusa
- Inštitut za biomedicinsko informatiko, Medicinska fakulteta (Institute for Biomedical Informatics, Faculty of Medicine)
- Lara.Lusa at mf.uni-lj.si
2. Outline
- Objectives of microarray experiments
- Class prediction
- What is a predictor?
- How to develop a predictor?
- Which are the available methods?
- Which features should be used in the predictor?
- How to evaluate a predictor?
- Internal vs. external validation
- Some examples of what can go wrong
- The molecular classification of breast cancer
3. Scheme of an experiment
- Study design
- Carrying out the experiment
- Sample preparation
- Hybridization
- Image analysis
- Quality control and normalization
- Data analysis
- Class comparison
- Class prediction
- Class discovery
- Interpretation of the results
4. Aims of high-throughput experiments
- Class comparison (supervised)
- establish differences in gene expression between predetermined classes (phenotypes)
- Tumor vs. normal tissue
- Recurrent vs. non-recurrent patients treated with a drug (Ma, 2004)
- ER+ vs. ER- patients (West, 2001)
- BRCA1, BRCA2 and sporadic breast cancers (Hedenfalk, 2001)
- Class prediction (supervised)
- prediction of phenotype using gene expression data
- morphology of a leukemia patient based on his gene expression (ALL vs. AML, Golub 1999)
- which patients with breast cancer will develop a distant metastasis within 5 years (van't Veer, 2002)
- Class discovery (unsupervised)
- discover groups of samples or genes with similar expression
- Luminal A, B, C(?), Basal, ERBB2, Normal in breast cancer (Perou 2001; Sørlie, 2003)
5. Data from microarray experiments
6. How to develop a predictor?
- On a training set of samples
- Select a subset of genes (feature selection)
- Use the gene expression measurements (X)
- Predict the class membership (Y) of new samples (test set)
Obtain a RULE, g(X), based on gene expression, for the classification of new samples
7. An example from Duda et al.
8. Rule: Nearest-neighbor classifier
- For each sample of the independent data set (test set), calculate the Pearson (centered) correlation of its gene expression with each sample from the training set
- Classification rule: assign the new sample to the class of the training-set sample that has the highest correlation with the new sample
[Figure: correlations between the new sample and the training-set samples]
Bishop, 2006
9. Rule: K-nearest-neighbor classifier
- For each sample of the independent data set (test set), calculate the Pearson (centered) correlation of its gene expression with each sample from the training set
- Classification rule: assign the new sample to the class to which the majority of the K training-set samples with the highest correlations belong
[Figure: correlations between the new sample and the training-set samples, K = 3]
Bishop, 2006
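The two rules above differ only in K (the nearest-neighbor classifier is the case K = 1). A minimal sketch in Python/NumPy, following the correlation-based distance of the slides; the function name and the toy data in the usage example are invented for illustration:

```python
import numpy as np

def knn_correlation(train_X, train_y, new_x, k=3):
    """Assign new_x to the majority class among the k training samples
    with the highest Pearson (centered) correlation."""
    def corr(a, b):
        # Center each profile, then compute the cosine of the centered vectors
        a, b = a - a.mean(), b - b.mean()
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    corrs = np.array([corr(x, new_x) for x in train_X])
    top_k = np.argsort(corrs)[::-1][:k]        # indices of the k nearest samples
    labels, counts = np.unique(train_y[top_k], return_counts=True)
    return labels[np.argmax(counts)]           # majority vote
```

With two "A"-like and two "B"-like expression profiles in the training set, a new sample resembling the "A" profiles is assigned to "A" for k = 3, and to the single most correlated sample's class for k = 1.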
10. Rule: Method of centroids (Sørlie et al. 2003)
- Method of centroids class prediction rule
- Define a centroid for each class on the original data set (training set)
- For each gene, average its expression over the samples assigned to that class
- For each sample of the independent data set (test set), calculate the Pearson (centered) correlation of its gene expression with each centroid
- Classification rule: assign the sample to the class whose centroid has the highest correlation with the sample (if below 0.1, do not assign)
[Figure: correlations between the new sample and the class centroids; the sample is assigned to the class whose centroid has the highest correlation with it]
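A sketch of the rule just described, in Python/NumPy (the function name and toy data are hypothetical; the 0.1 cut-off matches the slide):

```python
import numpy as np

def centroid_classifier(train_X, train_y, new_x, threshold=0.1):
    """Method-of-centroids sketch: per-class centroids (gene-wise means),
    Pearson (centered) correlation of the new sample with each centroid,
    assignment to the best-correlated class, or to no class if all
    correlations fall below the threshold."""
    def corr(a, b):
        a, b = a - a.mean(), b - b.mean()
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # For each gene, average its expression over the samples of each class
    centroids = {c: train_X[train_y == c].mean(axis=0) for c in np.unique(train_y)}
    corrs = {c: corr(m, new_x) for c, m in centroids.items()}
    best = max(corrs, key=corrs.get)
    return best if corrs[best] >= threshold else None  # None = not assigned
```

Note that, unlike k-NN, each new sample is compared with one averaged profile per class, and the arbitrary threshold creates an explicit "unclassifiable" outcome, a point the later slides return to.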
11. Rule: Diagonal Linear Discriminant Analysis (DLDA)
- Calculate the mean expression of the samples from Class 1 and Class 2 in the training set for each of the G genes (x̄_1j and x̄_2j, j = 1, ..., G), and the pooled within-class variance s_j²
- For each sample x of the test set, evaluate whether the variance-standardized distance to the Class 1 mean is smaller than to the Class 2 mean:
  Σ_j (x_j − x̄_1j)² / s_j²  <  Σ_j (x_j − x̄_2j)² / s_j²
  where x_j is the expression of the j-th gene for the new sample
- Classification rule: if the above inequality is satisfied, classify the sample to Class 1, otherwise to Class 2
12. Rule: Diagonal Linear Discriminant Analysis (DLDA)
- A particular case of discriminant analysis under the hypotheses that
- the features are not correlated
- the variances of the two classes are the same
- Other methods used in microarray studies are variants of discriminant analysis
- Compound covariate predictor
- Weighted vote method
Bishop, 2006
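As a concrete illustration of the diagonal rule (uncorrelated features, one pooled per-gene variance), here is a hedged two-class sketch in Python/NumPy; the function name and the data in the usage example are invented:

```python
import numpy as np

def dlda(train_X, train_y, new_x):
    """DLDA sketch for two classes: per-gene class means and a pooled
    within-class variance; the new sample goes to the class whose mean
    is closer in variance-standardized squared distance."""
    c1, c2 = np.unique(train_y)
    X1, X2 = train_X[train_y == c1], train_X[train_y == c2]
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    # Pooled within-class variance, gene by gene
    s2 = (((X1 - m1) ** 2).sum(axis=0) + ((X2 - m2) ** 2).sum(axis=0)) / (n1 + n2 - 2)
    d1 = (((new_x - m1) ** 2) / s2).sum()
    d2 = (((new_x - m2) ** 2) / s2).sum()
    return c1 if d1 < d2 else c2
```

Because the covariance matrix is taken to be diagonal, the rule needs only G means per class and G variances, which is why it remains usable when the number of genes far exceeds the number of samples.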
13. Other popular classification methods
- Classification and Regression Trees (CART)
- Prediction Analysis of Microarrays (PAM)
- Support Vector Machines (SVM)
- Logistic regression
- Neural networks
Bishop, 2006
14. How to choose a classification method?
- No single method is optimal in every situation
- No Free Lunch Theorem: in the absence of assumptions, we should not prefer any classification algorithm over another
- Ugly Duckling Theorem: in the absence of assumptions, there is no best set of features
15. The bias-variance tradeoff
Hastie et al., 2001
MSE = E_D[(g(x; D) − F(x))²]
    = (E_D[g(x; D)] − F(x))² + E_D[(g(x; D) − E_D[g(x; D)])²]
    = Bias² + Variance
Duda et al., 2001
16. Feature selection
- Can ALL the gene expression variables be included in the classifier?
- Which variables should be used to build the classifier?
- Filter methods
- performed prior to building the classifier
- one feature at a time, or joint-distribution approaches
- Wrapper methods
- performed implicitly by the classifier
- CART, PAM
From Fridlyand, CBMB Workshop
17. A comparison of classifier performance for microarray data
- Dudoit, Fridlyand and Speed (2002, JASA), on 3 data sets
- DA, DLDA, k-NN, SVM, CART
- Good performance of simple classifiers such as DLDA and NN
- Feature selection: a small number of features included in the classifier
18. How to evaluate the performance of a classifier
- Classification error
- a sample is classified in a class to which it does not belong: g(X) ≠ Y
- Predictive accuracy: % of correctly classified samples
- In a two-class problem, using the terminology of diagnostic tests (+ diseased, - healthy):
- Sensitivity = P(classified + | true +)
- Specificity = P(classified - | true -)
- Positive predictive value = P(true + | classified +)
- Negative predictive value = P(true - | classified -)
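These four quantities can be read directly off a 2×2 confusion matrix; a minimal sketch (the function name and the counts in the test are illustrative):

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Two-class performance measures in the diagnostic-test notation of
    the slide (+ = diseased, - = healthy); tp/fp/fn/tn are the counts of
    true/false positives and negatives."""
    return {
        "accuracy":    (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),  # P(classified + | true +)
        "specificity": tn / (tn + fp),  # P(classified - | true -)
        "ppv":         tp / (tp + fp),  # P(true + | classified +)
        "npv":         tn / (tn + fn),  # P(true - | classified -)
    }
```

Sensitivity and specificity describe the classifier itself; the predictive values additionally depend on how common each class is, which matters for the prevalence effects discussed later in the deck.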
19. Class prediction: how to assess the predictive accuracy?
- Use an independent data set
- What if one is not available?
- ABSOLUTELY WRONG: apply your predictor to the data you used to develop it and see how well it predicts
- OK: cross-validation, bootstrap
[Figure: the data set is repeatedly split into training and test parts; each part serves in turn as the test set]
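A sketch of cross-validation done correctly, with the gene selection repeated inside every fold rather than once on the full data (the data, the fold count, and the simple mean-difference filter with a nearest-centroid rule are illustrative assumptions, not the deck's exact procedure):

```python
import numpy as np

def cv_error(X, y, n_genes=10, n_folds=5, seed=0):
    """K-fold cross-validation in which the feature selection is redone
    inside every fold, so the held-out samples never influence it."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = 0
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        Xtr, ytr = X[train_idx], y[train_idx]
        # Filter-type selection on the TRAINING fold only:
        # keep the genes with the largest between-class mean difference
        diff = np.abs(Xtr[ytr == 0].mean(axis=0) - Xtr[ytr == 1].mean(axis=0))
        genes = np.argsort(diff)[-n_genes:]
        m0 = Xtr[ytr == 0][:, genes].mean(axis=0)
        m1 = Xtr[ytr == 1][:, genes].mean(axis=0)
        for i in test_idx:  # nearest-centroid prediction on the held-out fold
            x = X[i, genes]
            pred = 0 if np.abs(x - m0).sum() < np.abs(x - m1).sum() else 1
            errors += int(pred != y[i])
    return errors / len(y)
```

Moving the `diff`/`genes` lines outside the fold loop would reproduce the classic mistake criticized on the following slides: the test samples would then have helped choose the genes.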
20. How to develop a cross-validated class predictor
- Leave out a test set
- Predict its class using the class predictor developed on the training set
21. Dupuy and Simon, JNCI 2007
Supervised prediction: 12/28 studies reported a misleading estimate of prediction accuracy; 50% of studies contained one or more major flaws
23. Class prediction: a famous example
van't Veer et al. report results obtained with a wrong analysis in the paper, and the correct analysis (with less striking results) only in the supplementary material
24. What went wrong?
Applying the predictor to the same data used to develop it produces highly biased estimates of predictive accuracy.
Going beyond the quantification of predictive accuracy and attempting to make inference with the cross-validated class predictor: THE INFERENCE MADE IS NOT VALID
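The bias can be reproduced with pure noise: generate data with NO real class difference, select the seemingly most discriminating genes on all samples, and evaluate on those same samples. A hedged sketch (all parameter values and the nearest-centroid rule are arbitrary choices for illustration):

```python
import numpy as np

def resubstitution_accuracy(n=20, g=500, n_genes=10, seed=0):
    """Pure-noise illustration: random labels, genes selected on ALL
    samples, nearest-centroid rule evaluated on the SAME samples."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, g))              # no real class difference
    y = np.repeat([0, 1], n // 2)            # arbitrary labels
    diff = np.abs(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0))
    genes = np.argsort(diff)[-n_genes:]      # 'best' genes chosen on everything
    m0 = X[y == 0][:, genes].mean(axis=0)
    m1 = X[y == 1][:, genes].mean(axis=0)
    pred = np.array([0 if np.abs(x - m0).sum() < np.abs(x - m1).sum() else 1
                     for x in X[:, genes]])
    # Typically far above the 50% expected under the null
    return (pred == y).mean()
```

With many genes and few samples, some genes separate even random labels by chance; re-using those samples for evaluation then reports near-perfect accuracy where none exists.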
25. Observed proportion of rejected null hypotheses
Hypothesis: there is no difference between the classes.

Nominal level        0.01    0.05    0.10
LOO CV (n = 100)     0.268   0.414   0.483

Lusa, McShane, Radmacher, Shih, Wright, Simon, Statistics in Medicine, 2007

Microarray predictor:
                   <5 yrs   >5 yrs
Good prognosis       31       18
Bad prognosis         2       26
Odds ratio = 15.0, p-value = 4 × 10^-6

Parameter       Logistic coeff.   Std. error   Odds ratio   95% CI
------------------------------------------------------------------
Grade               -0.08            0.79          1.1      0.2 to 5.1
ER                   0.5             0.94          1.7      0.3 to 10.4
PR                  -0.75            0.93          2.1      0.3 to 13.1
Size (mm)           -1.26            0.66          3.5      1.0 to 12.8
Age                  1.4             0.79          4        0.9 to 19.1
Angioinvasion       -1.55            0.74          4.7      1.1 to 20.1
Microarray           2.87            0.851         7.6      3.3 to 93.7
------------------------------------------------------------------
26. Michiels et al., 2005, Lancet
27. Final remarks
- Simple classification methods such as DLDA have proved to work well for microarray studies and outperform fancier methods
- Many classification methods proposed in the field under new names are just slight modifications of already known techniques
28. Final remarks
- Report all the necessary information about your classifier so that others can apply it to their data
- Evaluate the predictive accuracy of the classifier correctly
- in the early microarray days, many papers presented analyses that were not correct, or drew wrong conclusions from their work
- even now, middle- and low-impact-factor journals keep publishing obviously wrong analyses
- Don't apply methods without understanding exactly
- what they are doing
- on which assumptions they rely
29. Other issues in classification
- Missing data
- Class representation
- Choice of distance function
- Standardization of observations and variables
- An example where all this matters
30. Class discovery
- Mostly performed through hierarchical clustering of genes and samples
- An often-abused method in microarray analysis, used instead of supervised methods
- In very few examples
- is the stability and reproducibility of the clustering assessed
- are the results validated or further used after discovery
- is a rule for the classification of new samples given
- Projecting the clustering onto new data sets still seems problematic: it becomes a class prediction problem
31. Molecular taxonomy of breast cancer
- Perou/Sørlie (Stanford/Norway)
- Class (sub-type) discovery (Perou, Nature 2001; Sørlie, PNAS 2001; Sørlie, PNAS 2003)
- Association of the discovered classes with survival and other clinical variables (Sørlie, PNAS 2001; Sørlie, PNAS 2003)
- Validation of the findings by assigning the class labels defined from class discovery to independent data sets (Sørlie, PNAS 2003)
32. Sørlie et al., PNAS 2003
Hierarchical clustering of the 122 samples from the paper using the intrinsic gene set (~500 genes); average linkage, distance = 1 − Pearson's (centered) correlation. The number of samples in each class (with the node correlation for the core samples of each subtype) and the percentage of ER+ samples are reported.
[Dendrogram figure; per subtype: 10 (>.31) 2/3; 28 (>.32) 89%; 11 (>.28) 82%; 11 (>.34) 64%; 19 (>.41) 22%; n = 79 (64%) (?); ER status bar]
33. Can we assign subtype membership to samples from independent data sets?
Sørlie et al. 2003
- Method of centroids class prediction rule
- Define a centroid for each class on the original data set (training set)
- For each gene, average its expression over the samples assigned to that class
- For each sample of the independent data set (test set), calculate the Pearson (centered) correlation of its gene expression with each centroid
- Classification rule: assign the sample to the class whose centroid has the highest correlation with the sample (if below 0.1, do not assign)
[Figure: the new sample is assigned to the class whose centroid has the highest correlation with it; West data set]
- Cited thousands of times
- Widely used in research papers and praised in editorials
- Recent concerns raised about their reproducibility and robustness
34. Predicted class membership: Sørlie's subtypes on our data
- Loris: I obtained the subtypes on our data! All the samples from Tam113 are Lum A; a bit strange... there are no Lum B in our data set
- Lara: Have you tried also on the BRCA60?
- Loris: No... Those are mostly Lum A, too. Some are Normal, very strange... there are no basal among the ER-!
- Lara: ... Have you mean-centered the genes?
- Loris: No... It looks better on BRCA60 now: the ER- are mostly basal... On Tam113 I get many Lum B... but 50% of the samples from Tam113 are NOT luminal anymore!
- Something is wrong!
BRCA60: hereditary breast cancers (42 ER+/16 ER-)
Tam113: tamoxifen-treated breast cancers (113 ER+/0 ER-)
35. How are the systematic differences between microarray platforms/batches taken into account?
- Sørlie et al. 2003 data set
- Genes were mean- (and possibly median-) centered
- "...the data file was adjusted for array batch differences as follows: on a gene-by-gene basis, we computed the mean of the nonmissing expression values separately in each batch. Then for each sample and each gene, we subtracted its batch mean for that gene. Hence, the adjusted array would have zero row-means within each batch. This ensures that any variance in a gene is not a result of a batch effect."
- "Rows (genes) were median-centered and both genes and experiments were clustered by using an average hierarchical clustering algorithm."
- West et al. data set (Affymetrix, single-channel data)
- Genes were centered
- "Data were transformed to a compatible format by normalizing to the median experiment: each absolute expression value in a given sample was converted to a ratio by dividing by its average expression value across all samples."
- van't Veer et al. data set
- Genes do not seem to have been mean-centered
- Other data sets where the method was applied
- Genes were always centered
[Figure: effect of mean-centering on ER+ and ER- samples]
36. Possible concerns about the application of the method of centroids
- How are the classification results influenced by...
- normalization of the data (mean-centering of the genes)?
- differences in subtype prevalence across data sets?
- the presence of study (or batch) effects?
- the choice of the method of centroids as a classification method?
- the use of an arbitrary cut-off for non-classifiable samples?
Lusa et al., "Challenges in projecting clustering results across gene expression-profiling datasets", JNCI 2007
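The first concern can be made concrete: with correlation-based assignment, gene-wise mean-centering ties every sample's profile to the other samples in the test set, so the same sample can correlate differently with the same fixed centroids depending on the test set's composition. A small sketch (the centroids and profiles are invented; the function name is hypothetical):

```python
import numpy as np

def centroid_correlations(test_X, centroids):
    """Pearson (centered) correlations of each test sample with each fixed
    centroid, computed on the raw profiles and again after mean-centering
    each gene on the TEST set itself."""
    def corr(a, b):
        a, b = a - a.mean(), b - b.mean()
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    raw = np.array([[corr(x, c) for c in centroids] for x in test_X])
    centered = test_X - test_X.mean(axis=0)  # gene-wise centering on the test set
    cen = np.array([[corr(x, c) for c in centroids] for x in centered])
    return raw, cen
```

Because the gene means depend on which samples happen to be in the test set, the centered correlations, and any threshold-based assignment derived from them, shift with subtype prevalence, as the slides that follow illustrate on real data.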
37. ER (ligand-binding assay): 34 ER-/65 ER+
7650 clones (6878 unique)
38. 1. Effects of mean-centering the genes
Method of centroids with Sørlie's centroids (derived from the centered data set), applied to Sotiriou's data set (336/552 common and unique clones), with genes centered (C) or not centered (N); results for the full data set (99 samples), the ER+ subset (65 samples) and the ER- subset (34 samples).

                 Full data           Full data           ER+ subset    ER+ subset
                 centered            not centered        centered      not centered
Class            N (ρ<.1)   ER+      N (ρ<.1)   ER+      N (ρ<.1)      N (ρ<.1)
Luminal A        43 (5)     41       59 (1)     55       19 (6)        55 (1)
Luminal B        13 (2)     11        1 (1)      1       13 (3)         1 (0)
ERBB2            13 (2)      6       10 (0)      2       11 (1)         2 (0)
Basal            21 (0)      0        5 (0)      0       11 (5)         0 (0)
Normal            9 (0)      7       24 (2)      7       11 (1)         7 (0)
(N = number of samples classified; in parentheses, the number with correlation below .1; ER+ = number of ER+ samples)
39. 2. Effects of the prevalence of subgroups in the (training and) test set
Predictive accuracy (ER+ / ER-); training set: 10 ER+/10 ER-.

Test set            Accuracy (ER+ / ER-)
55 ER+/24 ER-       95% / 79%
55 ER+/24 ER-       78% / 88%
24 ER+/24 ER-       88% / 83%
12 ER+/24 ER-       92% / 79%
55 ER+/ 0 ER-       53% / ND
 0 ER+/24 ER-       ND / 62%
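The pattern in the table has a simple arithmetic core: with the per-class accuracies held fixed, the overall proportion correctly classified is a prevalence-weighted average, so it moves with the test set's composition. A one-line sketch (hypothetical function name):

```python
def overall_accuracy(acc_pos, acc_neg, prevalence):
    """Overall proportion correctly classified when a fraction `prevalence`
    of the test set is positive (e.g. ER+) and the classifier is correct
    on acc_pos of the positives and acc_neg of the negatives."""
    return acc_pos * prevalence + acc_neg * (1 - prevalence)
```

For instance, with 88% accuracy on ER+ samples and 83% on ER- samples, the overall accuracy is 85.5% in a balanced test set but 87.5% when 90% of the samples are ER+, with no change at all in the classifier.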
40. 2b. What is the role played by the prevalence of subgroups in the training and test set?
ER status prediction on Sotiriou's data set: multiple (100) random splits into training and test sets; method of centroids; 751 variance-filtered unique clones; genes centered (C) or not centered (N).
[Figure: proportion of correctly classified samples (in the ER+ class, in the ER- class, and overall) as the proportion π_test of ER+ samples in the test set ranges from 0 (0 ER+/24 ER-) to 1 (24 ER+/0 ER-), with n_test = 24 and the training set fixed at 10 ER+/10 ER- (π_tr = 1/2, n_tr = 20)]
41. 3. (Possible) study effect on real data: Sotiriou → van't Veer
Predicted class membership

Genes centered:
Class            True ER+ (ρ<.1)   True ER- (ρ<.1)   Cor (min to max)
Predicted ER+    39 (1)            4 (2)             .42 (.03 to .62)
Predicted ER-    7 (4)             67 (4)            .26 (.01 to .55)

Genes not centered:
Class            True ER+ (ρ<.1)   True ER- (ρ<.1)   Cor (min to max)
Predicted ER+    43 (43)           8 (7)             .02 (-.24 to .13)
Predicted ER-    3 (3)             63 (53)           -.03 (-.23 to .16)

- The predictive accuracy is the same
- Most of the samples in the non-centered analysis would not be classifiable using the threshold
42. Conclusions I
- Musts for a clinically useful classifier:
- It classifies a new sample unambiguously, independently of any other samples being considered for classification at the same time
- The clinical meaning of the subtype assignment (survival probability, probability of response to treatment) must be stable across the populations to which the classifier might be applied
- The technology used to assay the samples must be stable and reproducible: a sample assayed on different occasions is assigned to the same subtype
- BUT we showed that the subgroup assignments of new samples can be substantially influenced by:
- normalization of the data (the appropriateness of gene-centering depends on the situation)
- the proportion of samples from each subtype in the test set
- the presence of systematic differences across data sets
- the use of arbitrary rules for identifying non-classifiable samples
- Most of our conclusions also apply to different classification methods
43. Conclusions II
- Most of the studies claiming to have validated the subtypes have focused only on comparing clinical outcome differences
- This shows consistency of results between studies
- BUT it does not provide a direct measure of the robustness of the classification, which is essential before using the subtypes in clinical practice
- Careful thought must be given to the comparability of patient populations and datasets
- Many difficulties remain in validating and extending class discovery results to new samples, and a robust classification rule remains elusive
- The subtyping of breast cancer seems promising
- BUT a standardized definition of the subtypes, based on a robust measurement method, is needed
44. Some useful resources and readings
- Books
- Simon et al., Design and Analysis of DNA Microarray Investigations, Ch. 8
- Speed (ed.), Statistical Analysis of Gene Expression Microarray Data, Ch. 3
- Bishop, Pattern Recognition and Machine Learning
- Hastie, Tibshirani and Friedman, The Elements of Statistical Learning
- Duda, Hart and Stork, Pattern Classification
- Software for data analysis
- R and Bioconductor (www.r-project.org, www.bioconductor.org)
- BRB ArrayTools (http://linus.nci.nih.gov)
- Web sites
- BRB/NCI web site (NIH)
- Tibshirani's web site (Stanford)
- Terry Speed's web site (Berkeley)