Class prediction for experiments with microarrays

PowerPoint presentation by Lara Lusa, 45 slides (file metadata title: "Assessment of the reproducibility of gene expression profile in a breast cancer cell line")

1
Class prediction for experiments with microarrays

Lara Lusa
Inštitut za biomedicinsko informatiko, Medicinska
fakulteta (Institute for Biomedical Informatics, Faculty of Medicine)
Lara.Lusa at mf.uni-lj.si

2
Outline

Objectives of microarray experiments
Class prediction
What is a predictor?
How to develop a predictor?
Which methods are available?
Which features should be used in the predictor?
How to evaluate a predictor?
Internal vs. external validation
Some examples of what can go wrong
The molecular classification of breast cancer

3
Scheme of an experiment

Study design
Performance of the experiment
Sample preparation
Hybridization
Image analysis
Quality control and normalization
Data analysis
Class comparison
Class prediction
Class discovery
Interpretation of the results

4
Aims of high-throughput experiments

Class comparison - supervised
establish differences in gene expression between
predetermined classes (phenotypes)
Tumor vs. Normal tissue
Recurrent vs. Non-recurrent patients treated with
a drug (Ma, 2004)
ER+ vs ER- patients (West, 2001)
BRCA1, BRCA2 and sporadics in breast cancer
(Hedenfalk, 2001)
Class prediction - supervised
prediction of phenotype using gene expression
data
type of leukemia of a patient based on the
patient's gene expression (ALL vs. AML, Golub 1999)
which patients with breast cancer will develop a
distant metastasis within 5 years (van 't Veer,
2002)
Class discovery - unsupervised
discover groups of samples or genes with similar
expression
Luminal A, B, C(?), Basal, ERBB2, Normal in
Breast Cancer (Perou 2001, Sørlie, 2003)

5
Data from microarray experiments
6
How to develop a predictor?

On a training set of samples
Select a subset of genes (feature selection)
Use their gene expression measurements (X) to
predict the class membership (Y) of new samples
(test set)

Obtain a RULE (g(X)) based on gene-expression for
the classification of new samples
7
An example from Duda et al.
8
Rule: Nearest-neighbor classifier

For each sample of the independent data set
(test set), calculate Pearson's (centered)
correlation of its gene expression with each
sample from the training set
Classification rule: assign the new sample to the
class of the training-set sample that has the
highest correlation with the new sample

(Figure: correlations between a new sample and the samples from the training set; Bishop, 2006)
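The nearest-neighbor rule above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: `pearson` and `nn_predict` are hypothetical helper names, profiles are plain lists of expression values, and constant profiles (zero variance) are assumed not to occur.

```python
import math
from typing import List, Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson (centered) correlation between two expression profiles.

    Assumes neither profile is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def nn_predict(train_X: List[Sequence[float]], train_y: List[str],
               new_x: Sequence[float]) -> str:
    """1-NN rule: class of the training sample most correlated with new_x."""
    best = max(range(len(train_X)), key=lambda i: pearson(train_X[i], new_x))
    return train_y[best]

# Toy training set: one increasing and one decreasing profile (hypothetical data).
train_X = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
train_y = ["A", "B"]
print(nn_predict(train_X, train_y, [1.0, 2.0, 3.0, 5.0]))  # A
```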
9
Rule: K-Nearest-neighbor classifier

For each sample of the independent data set
(test set), calculate Pearson's (centered)
correlation of its gene expression with each
sample from the training set
Classification rule: assign the new sample to the
class to which the majority of the K training-set
samples with the highest correlation with the
new sample belong

(Figure: correlations between a new sample and the samples from the training set, K = 3; Bishop, 2006)
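A minimal sketch of the K-nearest-neighbor rule, using the same Pearson correlation as the similarity measure. The function names and the tie-breaking behaviour are illustrative assumptions, not part of the original slides.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson (centered) correlation; assumes non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def knn_predict(train_X, train_y, new_x, k=3):
    """K-NN rule: majority class among the k training samples most correlated with new_x."""
    ranked = sorted(range(len(train_X)),
                    key=lambda i: pearson(train_X[i], new_x), reverse=True)
    votes = Counter(train_y[i] for i in ranked[:k])
    # most_common() lists equal counts in first-seen order, so a tie goes to
    # the class whose best neighbour is more highly correlated.
    return votes.most_common(1)[0][0]

# Hypothetical training data: three increasing ("A") and two decreasing ("B") profiles.
train_X = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0], [2.0, 3.0, 4.0, 6.0],
           [4.0, 3.0, 2.0, 1.0], [5.0, 3.0, 2.0, 1.0]]
train_y = ["A", "A", "A", "B", "B"]
print(knn_predict(train_X, train_y, [1.0, 2.0, 4.0, 4.0], k=3))  # A
```

With K = 1 this reduces to the nearest-neighbor rule; an odd K avoids ties in a two-class problem.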
10
Rule: Method of centroids (Sørlie et al. 2003)

Method of centroids class prediction rule
Define a centroid for each class on the original
data set (training set)
For each gene, average its expression over the
samples assigned to that class
For each sample of the independent data set
(test set), calculate Pearson's (centered)
correlation of its gene expression with each
centroid
Classification rule: assign the sample to the
class whose centroid has the highest
correlation with the sample (if this correlation
is below .1, do not assign)

(Figure: correlations between a new sample and the class centroids; the sample is assigned to the class whose centroid has the highest correlation with it)
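The method of centroids, including the .1 cut-off for non-classifiable samples, might be sketched as follows. This is an illustrative sketch, not the code used by Sørlie et al.; `centroids` and `centroid_predict` are hypothetical names, and `None` stands for "not assigned".

```python
import math

def pearson(x, y):
    """Pearson (centered) correlation; assumes non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def centroids(train_X, train_y):
    """For each class, average each gene's expression over the training samples of that class."""
    groups = {}
    for x, label in zip(train_X, train_y):
        groups.setdefault(label, []).append(x)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}

def centroid_predict(cents, new_x, threshold=0.1):
    """Assign new_x to the class whose centroid is most correlated with it.

    Returns None (sample not assigned) if the highest correlation is below threshold.
    """
    label, corr = max(((lab, pearson(c, new_x)) for lab, c in cents.items()),
                      key=lambda t: t[1])
    return label if corr >= threshold else None

# Hypothetical training data.
train_X = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0], [2.0, 3.0, 4.0, 6.0],
           [4.0, 3.0, 2.0, 1.0], [5.0, 3.0, 2.0, 1.0]]
train_y = ["A", "A", "A", "B", "B"]
cents = centroids(train_X, train_y)
print(centroid_predict(cents, [1.0, 2.0, 4.0, 4.0]))  # A
```

Note that the assignment of a sample depends only on that sample and the centroids, unless the data are normalized (e.g. gene-centered) together with other samples.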
11
Rule: Diagonal Linear Discriminant Analysis (DLDA)

Calculate the mean expression of the samples from Class 1
and Class 2 in the training set for each of the G
genes, \bar{x}_{1j} and \bar{x}_{2j} (j = 1, ..., G),
and the pooled within-class variance s_j^2
For each sample x of the test set, evaluate whether

$$\sum_{j=1}^{G} \frac{\bar{x}_{1j}-\bar{x}_{2j}}{s_j^{2}}\left(x_j-\frac{\bar{x}_{1j}+\bar{x}_{2j}}{2}\right) \ge 0$$

where x_j is the expression of the j-th gene for
the new sample
Classification rule: if the above inequality is
satisfied, classify the sample in Class 1,
otherwise in Class 2.
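A minimal DLDA sketch for two classes, assuming at least two training samples per class (needed for the variance) and equal class priors. `dlda_fit` and `dlda_predict` are hypothetical names; the score is the weighted sum in the inequality above, with ties (score exactly 0) assigned to Class 1.

```python
from statistics import mean, variance

def dlda_fit(X1, X2):
    """Per-gene class means and pooled within-class variances.

    X1, X2: lists of samples (each a list of G expression values) from Class 1 and Class 2.
    """
    n1, n2 = len(X1), len(X2)
    m1 = [mean(col) for col in zip(*X1)]
    m2 = [mean(col) for col in zip(*X2)]
    s2 = [((n1 - 1) * variance(c1) + (n2 - 1) * variance(c2)) / (n1 + n2 - 2)
          for c1, c2 in zip(zip(*X1), zip(*X2))]
    return m1, m2, s2

def dlda_predict(model, x):
    """Class 1 if sum_j (m1j - m2j) / s2j * (xj - (m1j + m2j) / 2) >= 0, else Class 2."""
    m1, m2, s2 = model
    score = sum((a - b) / v * (xj - (a + b) / 2)
                for a, b, v, xj in zip(m1, m2, s2, x))
    return 1 if score >= 0 else 2

# Hypothetical two-gene example.
X1 = [[1.0, 5.0], [2.0, 6.0], [3.0, 7.0]]
X2 = [[7.0, 1.0], [8.0, 2.0], [9.0, 3.0]]
model = dlda_fit(X1, X2)
print(dlda_predict(model, [2.0, 6.0]), dlda_predict(model, [8.0, 2.0]))  # 1 2
```

Because each gene is weighted only by its own variance, DLDA ignores correlations between genes, which is what makes it usable when there are far more genes than samples.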

12
Rule: Diagonal Linear Discriminant Analysis (DLDA)

Particular case of discriminant analysis under the
hypotheses that
the features are not correlated
the variances of the two classes are the same
Other methods used in microarray studies are
variants of discriminant analysis
Compound covariate predictor
Weighted vote method

Bishop, 2006
13
Other popular classification methods

Classification and Regression Trees (CART)
Prediction Analysis of Microarrays (PAM)
Support Vector Machines (SVM)
Logistic regression
Neural networks

Bishop, 2006
14
How to choose a classification method?

No single method is optimal in every situation
No Free Lunch Theorem: in the absence of assumptions,
we should not prefer any classification algorithm
over another
Ugly Duckling Theorem: in the absence of assumptions,
there is no best set of features

15
The bias-variance tradeoff
Hastie et al, 2001
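$$\mathrm{MSE} = \mathbb{E}_D\left[(g(x;D) - F(x))^2\right] = \underbrace{\left(\mathbb{E}_D[g(x;D)] - F(x)\right)^2}_{\mathrm{Bias}^2} + \underbrace{\mathbb{E}_D\left[\left(g(x;D) - \mathbb{E}_D[g(x;D)]\right)^2\right]}_{\mathrm{Variance}}$$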
Duda et al, 2001
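The decomposition can be checked numerically with a small Monte Carlo sketch. The toy estimator is an assumption chosen for illustration: g(x; D) is a "shrunken" sample mean, shrink × (mean of n noisy observations of F(x)), computed on many simulated data sets D. Shrinking reduces the variance but introduces bias, which is the tradeoff in miniature.

```python
import random

def decompose(estimates, truth):
    """Empirical MSE, squared bias and variance of an estimator across data sets D."""
    m = len(estimates)
    avg = sum(estimates) / m                          # E_D[g(x; D)]
    mse = sum((g - truth) ** 2 for g in estimates) / m
    bias2 = (avg - truth) ** 2
    var = sum((g - avg) ** 2 for g in estimates) / m
    return mse, bias2, var

def simulate(shrink, truth=2.0, n=10, reps=5000, seed=0):
    """Hypothetical estimator g(x; D) = shrink * mean of n observations truth + N(0, 1)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [truth + rng.gauss(0, 1) for _ in range(n)]
        estimates.append(shrink * sum(sample) / n)
    return decompose(estimates, truth)

for shrink in (1.0, 0.8, 0.5):
    mse, bias2, var = simulate(shrink)
    print(f"shrink={shrink}: MSE={mse:.3f}  Bias^2={bias2:.3f}  Variance={var:.3f}")
```

The identity MSE = Bias² + Variance holds exactly for the empirical averages (up to floating-point error), because the cross term averages to zero.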
16
Feature selection

Can ALL the gene expression variables be included
in the classifier?
Which variables should be used to build the
classifier?
Filter methods
Applied prior to building the classifier
One feature at a time, or approaches based on the
joint distribution of the features
Wrapper/embedded methods
Selection guided by, or performed implicitly within,
the classifier (e.g. CART, PAM)

From Fridlyand, CBMB Workshop
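A one-feature-at-a-time filter method can be sketched as ranking genes by the absolute two-sample t statistic (pooled variance) and keeping the top k. The choice of statistic and the names `t_statistic` and `filter_top_genes` are assumptions for illustration; at least two samples per class and non-zero pooled variance are assumed.

```python
import math
from statistics import mean, variance

def t_statistic(a, b):
    """Two-sample t statistic with pooled variance for one gene."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

def filter_top_genes(X1, X2, k):
    """Rank genes one at a time by |t| and return the indices of the top k."""
    genes1, genes2 = list(zip(*X1)), list(zip(*X2))
    scores = [abs(t_statistic(g1, g2)) for g1, g2 in zip(genes1, genes2)]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

# Hypothetical data: gene 0 differs between the classes, genes 1 and 2 barely do.
X1 = [[5.0, 1.0, 0.0], [6.0, 2.0, 1.0], [7.0, 1.5, 0.5]]
X2 = [[1.0, 1.2, 0.2], [2.0, 1.8, 0.8], [3.0, 1.6, 0.4]]
print(filter_top_genes(X1, X2, 1))  # [0]
```

When the accuracy of the resulting classifier is estimated by cross-validation, this selection step must be repeated within each training fold.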
17
A comparison of classifiers performance for
microarray data

Dudoit, Fridlyand and Speed (2002, JASA), on 3 data
sets
DA, DLDA, k-NN, SVM, CART
Good performance of simple classifiers such as DLDA
and NN
Feature selection: a small number of features
included in the classifier

18
How to evaluate the performance of a classifier

Classification error
A sample is classified in a class to which it
does not belong: g(X) ≠ Y
Predictive accuracy: % of correctly classified
samples
In a two-class problem, using the terminology
from diagnostic tests (+ diseased, - healthy)
Sensitivity = P(classified + | true +)
Specificity = P(classified - | true -)
Positive predictive value = P(true + | classified +)
Negative predictive value = P(true - | classified -)
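The four diagnostic-test measures and the predictive accuracy follow directly from the 2x2 table of true vs. predicted classes. A minimal sketch, with `diagnostic_metrics` a hypothetical helper and the counts in the example invented for illustration:

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Measures from a 2x2 table (+ = diseased, - = healthy).

    tp: true + classified +,  fn: true + classified -,
    fp: true - classified +,  tn: true - classified -.
    """
    return {
        "sensitivity": tp / (tp + fn),   # P(classified + | true +)
        "specificity": tn / (tn + fp),   # P(classified - | true -)
        "ppv": tp / (tp + fp),           # P(true + | classified +)
        "npv": tn / (tn + fn),           # P(true - | classified -)
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }

# Hypothetical counts: 50 diseased and 50 healthy samples.
print(diagnostic_metrics(tp=40, fn=10, fp=5, tn=45))
```

Sensitivity and specificity condition on the true class, whereas the predictive values condition on the predicted class and therefore also depend on the prevalence of the classes in the sample.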

19
Class prediction: how to assess the predictive
accuracy?

Use an independent data set
If one is not available?
ABSOLUTELY WRONG:
apply your predictor to the data you used to
develop it and see how well it predicts
OK:
cross-validation
bootstrap

(Figure: the data split repeatedly into training and test portions, as in cross-validation)
20
How to develop a cross-validated class predictor

Training set: develop the class predictor, including feature selection

Test set: predict the class of its samples using the predictor developed on the training set
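The key point of a cross-validated predictor is that every step, feature selection included, is repeated inside each training fold. The sketch below uses leave-one-out cross-validation with two deliberately simple, hypothetical stand-ins (ranking genes by the absolute difference of class means, and a Euclidean nearest-centroid rule); they are not the methods from the slides. Class labels are assumed to be 0/1 with at least two samples per class.

```python
from statistics import mean

def select_genes(X, y, k):
    """Hypothetical filter: rank genes by |difference of class means| (labels 0/1)."""
    diffs = []
    for j in range(len(X[0])):
        g0 = [x[j] for x, c in zip(X, y) if c == 0]
        g1 = [x[j] for x, c in zip(X, y) if c == 1]
        diffs.append(abs(mean(g0) - mean(g1)))
    return sorted(range(len(diffs)), key=lambda j: diffs[j], reverse=True)[:k]

def nearest_centroid(X, y, genes, x_new):
    """Euclidean nearest-centroid rule restricted to the selected genes."""
    best, best_d = None, float("inf")
    for c in (0, 1):
        rows = [x for x, lab in zip(X, y) if lab == c]
        cent = [mean(r[j] for r in rows) for j in genes]
        d = sum((x_new[j] - m) ** 2 for j, m in zip(genes, cent))
        if d < best_d:
            best, best_d = c, d
    return best

def loocv_accuracy(X, y, k):
    """Leave-one-out CV in which gene selection is REPEATED inside every fold."""
    correct = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = select_genes(X_tr, y_tr, k)        # uses the training fold only
        correct += int(nearest_centroid(X_tr, y_tr, genes, X[i]) == y[i])
    return correct / len(X)

# Hypothetical data: only gene 0 separates the two classes.
X = [[-5.0, 1.0, 2.0], [-4.0, 1.0, 2.0], [-6.0, 1.0, 2.0],
     [5.0, 1.0, 2.0], [4.0, 1.0, 2.0], [6.0, 1.0, 2.0]]
y = [0, 0, 0, 1, 1, 1]
print(loocv_accuracy(X, y, k=1))  # 1.0
```

The flawed variant calls `select_genes(X, y, k)` once on all samples before the loop. The left-out sample then influences which genes are used to classify it, and on pure-noise data the estimated accuracy can be far above the 50% expected by chance.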

21
Dupuy and Simon, JNCI 2007
Supervised prediction: 12/28 studies reported a
misleading estimate of prediction accuracy; 50% of
studies contained one or more major flaws
22
23
Class prediction: a famous example
van 't Veer et al. report the results obtained with
a wrong analysis in the paper, and the results of the
correct analysis (which are less striking) only in the
supplementary material
24
What went wrong?
Produces highly biased estimates of predictive
accuracy
Going beyond the quantification of predictive
accuracy and attempting to make inference with a
cross-validated class predictor: THE INFERENCE MADE
IS NOT VALID
25
Observed
Hypothesis: there is no difference between the classes
Proportion of rejected H0, LOO CV (n = 100):
  alpha = 0.01: 0.268
  alpha = 0.05: 0.414
  alpha = 0.10: 0.483
(Lusa, McShane, Radmacher, Shih, Wright, Simon,
Statistics in Medicine, 2007)

Microarray predictor
                  <5 yrs   >5 yrs
Good prognosis      31       18
Bad prognosis        2       26
Odds ratio = 15.0, p-value = 4 × 10^-6

Parameter       Logistic coeff.   Std. error   Odds ratio   95% CI
Grade               -0.08            0.79          1.1       0.2 to 5.1
ER                   0.5             0.94          1.7       0.3 to 10.4
PR                  -0.75            0.93          2.1       0.3 to 13.1
Size (mm)           -1.26            0.66          3.5       1.0 to 12.8
Age                  1.4             0.79          4         0.9 to 19.1
Angioinvasion       -1.55            0.74          4.7       1.1 to 20.1
Microarray           2.87            0.851        17.6       3.3 to 93.7
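The odds ratios and confidence intervals in the logistic regression table can be reproduced from the coefficients and standard errors with the usual Wald formula, OR = exp(β) and 95% CI = exp(β ± 1.96·SE). A minimal sketch (`wald_odds_ratio` is a hypothetical helper). For the microarray predictor, exp(2.87) ≈ 17.6, consistent with the reported CI of 3.3 to 93.7. For rows with a negative coefficient, the reported odds ratio appears to be exp(−β), e.g. exp(0.08) ≈ 1.1 for Grade, i.e. the odds ratio for the opposite coding; this is an inference from the numbers, not something stated on the slide.

```python
import math

def wald_odds_ratio(coef, se, z=1.96):
    """Odds ratio exp(beta) and Wald CI exp(beta -/+ z * SE) for a logistic coefficient."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)

or_, lo, hi = wald_odds_ratio(2.87, 0.851)          # microarray predictor
print(f"OR = {or_:.1f}, 95% CI {lo:.1f} to {hi:.1f}")

# Grade has coefficient -0.08; the table's 1.1 matches exp(+0.08).
print([round(v, 1) for v in wald_odds_ratio(0.08, 0.79)])  # [1.1, 0.2, 5.1]
```

The small difference in the upper limit (about 93.5 here vs. 93.7 on the slide) is what one would expect from the coefficient having been rounded to 2.87.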

26
Michiels et al, 2005 Lancet
27
Final remarks

Simple classification methods such as DLDA have
proved to work well for microarray studies and to
outperform fancier methods
Many classification methods that have been
proposed in the field under new names are just
slight modifications of already known techniques

28
Final remarks

Report all the information about your classifier
necessary for others to apply it to their
data
Evaluate the predictive accuracy of the
classifier correctly
In the early microarray days, many papers
presented incorrect analyses or drew
wrong conclusions from their work
Even now, middle- and low-impact-factor journals keep
publishing obviously wrong analyses
Don't apply methods without understanding exactly
what they do and
on which assumptions they rely

29
Other issues in classification

Missing data
Class representation
Choice of distance function
Standardization of observations and variables
An example where all this matters

30
Class discovery

Mostly performed through hierarchical clustering
of genes and samples
Often abused in microarray analysis, where it is used
instead of supervised methods
In very few examples
is the stability and reproducibility of the clustering
assessed
are the results validated or further used after
discovery
is a rule for the classification of new samples given
Projecting the clustering onto new data sets
still seems problematic

It becomes a class prediction problem
31
Molecular taxonomy of breast cancer

Perou/Sørlie (Stanford/Norway)
Class sub-type discovery (Perou, Nature 2001,
Sørlie, PNAS 2001, Sørlie, PNAS 2003)
Association of discovered classes with survival
and other clinical variables (Sørlie, PNAS 2001,
Sørlie, PNAS 2003)
Validation of findings assigning class labels
defined from class discovery to independent data
sets (Sørlie, PNAS 2003)

32
Sørlie et al, PNAS 2003
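(Figure: dendrogram; for each cluster, number of samples (node correlation) and % ER+:
10 (>.31), 2/3; 28 (>.32), 89%; 11 (>.28), 82%; 11 (>.34), 64%; 19 (>.41), 22%;
total n = 79 (64%) (?))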
Hierarchical clustering of the 122 samples from
the paper using the intrinsic gene set (500
genes), with average linkage and distance 1 - Pearson's
(centered) correlation. Number of samples in each
class (node correlation for the core samples
included in each subtype) and percentage of ER-positive
samples
33
Can we assign subtype membership to samples from
independent data sets?
Sørlie et al. 2003
centroids

Method of centroids class prediction rule
Define a centroid for each class on the original
data set (training set)
For each gene, average its expression over the
samples assigned to that class
For each sample of the independent data set
(test set), calculate Pearson's (centered)
correlation of its gene expression with each
centroid
Classification rule: assign the sample to the
class whose centroid has the highest
correlation with the sample (if this correlation
is below .1, do not assign)

(Figure: correlations between a new sample and the class centroids; the sample is assigned to the class whose centroid has the highest correlation with it)

Cited thousands of times
Widely used in research papers and praised in
editorials
Recent concerns raised about their
reproducibility and robustness

West data set
34
Predicted class membership (Sørlie centroids) on our data

Loris: I obtained the subtypes on our data! All
the samples from Tam113 are Lum A, a bit
strange... there are no Lum B in our data set.
Lara: Have you also tried it on BRCA60?
Loris: No... Those are mostly Lum A, too. Some
are Normal, very strange... there are no basal
among the ER-!
Lara: ... Have you mean-centered the genes?
Loris: No... Looks better on BRCA60: now the
ER- are mostly basal... On Tam113 I get many
Lum B... But 50 of the samples from Tam113 are
NOT luminal anymore!
Something is wrong!

BRCA60: hereditary breast cancer (42 ER+/16 ER-)
Tam113: tamoxifen-treated breast cancer, 113
ER+/0 ER-
35
How are the systematic differences between
microarray platforms/batches taken into account?

Sørlie et al. 2003 data set
Genes were mean (and possibly median) centered:
"the data file was adjusted for array batch
differences as follows: on a gene-by-gene basis,
we computed the mean of the nonmissing expression
values separately in each batch. Then for each
sample and each gene, we subtracted its batch
mean for that gene. Hence, the adjusted array
would have zero row-means within each batch. This
ensures that any variance in a gene is not a
result of a batch effect."
"Rows (genes) were median-centered and both genes
and experiments were clustered by using an
average hierarchical clustering algorithm."
West et al. data set (Affymetrix, single-channel
data)
Genes were centered:
"Data were transformed to a compatible format by
normalizing to the median experiment. Each
absolute expression value in a given sample was
converted to a ratio by dividing by its average
expression value across all samples."
van 't Veer et al. data set
Genes do not seem to have been mean-centered
Other data sets where the method was applied
Genes were always centered

(Figure: effect of mean-centering on ER- and ER+ samples)
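The two adjustments quoted above can be sketched as follows. This is an illustrative reading of the published descriptions, not the authors' code: data are assumed to be stored gene by gene (one list per gene, one value per sample), `None` marks a missing value, and `batch_mean_center` and `ratio_to_gene_mean` are hypothetical names.

```python
def batch_mean_center(data, batches):
    """Sørlie-style adjustment: for each gene, subtract from every value the mean of
    that gene's non-missing values within the sample's batch (None = missing).

    data: list of genes, each a list of per-sample values; batches: batch label per sample.
    """
    centered = []
    for row in data:
        means = {}
        for b in set(batches):
            vals = [v for v, bb in zip(row, batches) if bb == b and v is not None]
            means[b] = sum(vals) / len(vals) if vals else None
        centered.append([None if v is None else v - means[bb]
                         for v, bb in zip(row, batches)])
    return centered

def ratio_to_gene_mean(data):
    """West-style conversion: divide each value by the gene's average across all samples."""
    return [[v / (sum(row) / len(row)) for v in row] for row in data]

# Hypothetical gene measured on 5 samples from batches "a" and "b", one value missing.
print(batch_mean_center([[1.0, 3.0, None, 10.0, 14.0]], ["a", "a", "a", "b", "b"]))
# [[-1.0, 1.0, None, -2.0, 2.0]]
print(ratio_to_gene_mean([[2.0, 4.0, 6.0]]))  # [[0.5, 1.0, 1.5]]
```

After batch centering each gene has zero mean within every batch, so the centered values of a sample depend on which other samples were processed in the same batch.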
36
Possible concerns about the application of the
method of centroids

How are the classification results influenced
by...
normalization of the data (mean-centering of the
genes)?
differences in subtype prevalence across data
sets?
presence of study (or batch) effects?
the choice of the method of centroids as the
classification method?
the use of an arbitrary cut-off for non-classifiable
samples?

Lusa et al, Challenges in projecting clustering
results across gene expression-profiling datasets
JNCI 2007
37
ER status (ligand-binding assay): 34 ER-/65 ER+; 7650
clones (6878 unique)
38
1. Effects of mean-centering the genes
Method of centroids: Sørlie's centroids (derived from a centered data
set) applied to Sotiriou's data set
336/552 common and unique clones
Data centered (C) or non-centered (N): full data set (99 samples),
ER+ subset (65 samples), ER- subset (34 samples)

Number classified (number with correlation < .1) in each class, and number of ER+ samples among them:

Class       Full data (C)    Full data (N)    ER+ subset (C)   ER+ subset (N)
            n (<.1)   ER+    n (<.1)   ER+    n (<.1)          n (<.1)
Luminal A   43 (5)    41     59 (1)    55     19 (6)           55 (1)
Luminal B   13 (2)    11      1 (1)     1     13 (3)            1 (0)
ERBB2       13 (2)     6     10 (0)     2     11 (1)            2 (0)
Basal       21 (0)     0      5 (0)     0     11 (5)            0 (0)
Normal       9 (0)     7     24 (2)     7     11 (1)            7 (0)
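Why centering changes the assignments can be seen in a toy example: after gene-centering, a sample's profile depends on the other samples it was centered with. In the hypothetical sketch below, the same new profile `x` is centered once with one set of companion samples and once with another; its centered values change sign, so its correlation with the centroids, and hence its class, can change. `center_genes` and both companion sets are invented for illustration.

```python
def center_genes(samples):
    """Mean-center each gene across the samples that are classified together."""
    n = len(samples)
    means = [sum(col) / n for col in zip(*samples)]
    return [[v - m for v, m in zip(s, means)] for s in samples]

x = [2.0, 1.0, 0.5]                                   # the same new sample
with_a = center_genes([x, [3.0, 2.0, 0.0], [4.0, 3.0, 0.5]])
with_b = center_genes([x, [0.0, 0.0, 2.0], [1.0, 0.5, 3.0]])
print(with_a[0])   # centered profile of x in the first test set
print(with_b[0])   # centered profile of x in the second test set
```

Without centering, `x` would be classified identically in both test sets; this is consistent with the non-centered ER+ subset in the table above giving the same counts as the ER+ samples of the full non-centered data set.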
39
2. Effects of the prevalence of subgroups in the
(training and) testing set?
Training set: 10 ER+/10 ER-

Test set          Predictive accuracy (%), ER+ / ER-
55 ER+/24 ER-     95 / 79
55 ER+/24 ER-     78 / 88
24 ER+/24 ER-     88 / 83
12 ER+/24 ER-     92 / 79
55 ER+/0 ER-      53 / ND
0 ER+/24 ER-      ND / 62
40
2b. What is the role played by the prevalence of
subgroups in the training and testing sets?
ER status prediction, Sotiriou's data set: multiple (100) random
splits into training and testing sets; method of centroids;
751 variance-filtered unique clones; centered (C) and non-centered (N) data
(Figure: x-axis, proportion of ER+ samples in the testing set, from 0
(n_test = 24: 0 ER+/24 ER-, 1 ER+/23 ER-, ...) to 1 (24 ER+/0 ER-); training
set with proportion 1/2 (n_tr = 20: 10 ER+/10 ER-); y-axis, % of ER+ samples
correctly classified as ER+, % of ER- samples correctly classified as ER-, and
% correctly classified overall)
41
3. (Possible) study effect on real data: Sotiriou -> van 't Veer

Predicted class membership

van 't Veer (centered)
Class            True ER+ (<.1)   True ER- (<.1)   Cor (min-max)
Predicted ER+    39 (1)           4 (2)            .42 (.03 to .62)
Predicted ER-    7 (4)            67 (4)           .26 (.01 to .55)

van 't Veer (non-centered)
Class            True ER+ (<.1)   True ER- (<.1)   Cor (min-max)
Predicted ER+    43 (43)          8 (7)            .02 (-.24 to .13)
Predicted ER-    3 (3)            63 (53)          -.03 (-.23 to .16)

The predictive accuracy is the same
Most of the samples in the non-centered analysis
would not be classifiable using the threshold
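The claim that the predictive accuracy is the same can be checked directly from the two tables (C = centered, N = non-centered; both tables contain 117 samples):

```latex
\text{Accuracy}_{C} = \frac{39 + 67}{39 + 4 + 7 + 67} = \frac{106}{117} \approx 0.906,
\qquad
\text{Accuracy}_{N} = \frac{43 + 63}{43 + 8 + 3 + 63} = \frac{106}{117} \approx 0.906
```

The difference lies in the correlations: in the non-centered analysis 106 of the 117 samples (43 + 7 + 3 + 53) have a highest correlation below .1 and would not be assigned under the cut-off.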

42
Conclusions I

Musts for a clinically useful classifier
It classifies a new sample unambiguously,
independently of any other samples being
considered for classification at the same time
The clinical meaning of the subtype assignment
(survival probability, probability of response to
treatment) must be stable across the populations to
which the classifier might be applied
The technology used to assay the samples must be
stable and reproducible: a sample assayed on
different occasions must be assigned to the same subtype
BUT we showed that the subgroup assignments of new
samples can be substantially influenced by
Normalization of the data
(the appropriateness of gene-centering depends on the
situation)
Proportion of samples from each subtype in the
test set
Presence of systematic differences across data
sets
Use of arbitrary rules for identifying
non-classifiable samples
Most of our conclusions also apply to other
classification methods

43
Conclusions II

Most of the studies claiming to have validated
the subtypes have focused only on comparing
differences in clinical outcome
This shows consistency of results between studies
BUT does not provide a direct measure of the
robustness of the classification, which is essential before
using the subtypes in clinical practice
Careful thought must be given to the comparability of
patient populations and data sets
Many difficulties remain in validating and
extending class discovery results to new samples,
and a robust classification rule remains elusive
The subtyping of breast cancer seems promising
BUT
a standardized definition of the subtypes, based
on a robust measurement method, is needed