Title: Mauro Delorenzi
1The ISREC / NCCR Bioinformatics Core Facility (BCF)
DAFL OPEN HOUSE December 10, 2003
Mauro Delorenzi
2Overview
- What is the BCF? (Towards a Lemanic Centre for Data Analysis?)
- About spotted cDNA microarrays
- Detection of differential expression by different microarray platforms (spotted cDNA, Agilent and Affymetrix) and MPSS (case study, preliminary analysis)
- Tools development for analysis
3BCF What is it ?
- ISREC-based, supported by the NCCR for molecular oncology, member group of the SIB
- Created by the NCCR molecular oncology to assist its DAF (which is now absorbed into the DAFL) and its microarray users in their biomedical research
- A group devoted to the bioinformatics and statistical aspects of gene expression research, in particular to the analysis of data generated with microarray technology
4BCF and DAFL
- Shared operations with the DAFL
- array testing, annotation
- normalization and quality control services
- collaboration in future developments of the technology (platform evaluation, optimisation) ?
- collaboration in statistical data analysis of research projects with the DAFL bioinformatics team and with the biostatistics dept at EPFL
DAFL as "nucleation center" ?
5BCF Our Main Components
- 1. Technical Support
- advice in experimental design and data analysis
- production, control, development of spotted arrays
- processing of microarray data, data quality
- 2. Collaborations
- statistical data analysis of research projects
- 3. Education
- practical training through classes / workshops
- 4. Research Development
- development / testing of tools and methods
- > we get requests for assistance and collaborations
- > we get requests from young postdocs asking for jobs
- Started an "Open House Concept"
- post someone in the BCF
- pay for service / collaboration
6Spotted cDNA arrays
- Data processing service through web interface
- set up for NCCR, now shared with DAFL
- reachable through links from our webpage
- http://www.isrec.isb-sib.ch/BCF/index.html
- MAIN AIMS
- Proper normalization
- Diagnostic plots (quality control, error analysis)
- Updated information on the spotted clones / genes
7Spotted cDNA arrays
Human 10k Array 8x4 subarrays
8Diagnostic plot spatial-effect-visualization
- diagnosis / feedback available for: inhomogeneous hybridization, weak signals, saturation, spot detection problems, overall performance
- can be useful also for Agilent slides
- problem alleviated but not eliminated by stronger correction
9Experience with spotted cDNA arrays
- after different improvements we reached the stage where peak performance is excellent
- reproducibility is high ...
- ... when all steps worked well,
- ... which is not always the case
- robustness is relatively low,
- genomic coverage of our arrays is relatively poor
- quality control is important (the web interface provides a kind of minimal standard and immediate feedback)
- the BCF/DAF/DAFL has set up a system of controls
- failures can happen in the production and in the
RNA isolations and hybridisations
10Study Design
- Platforms
- Affymetrix GeneChips short oligo arrays
- Agilent long oligo arrays
- in-house spotted cDNA arrays
- MPSS (massively parallel signature sequencing),
- collaboration with the Ludwig Institute
- Basic Design
- replicate measurements for two mRNAs (human placenta and testis)
- dye swap for two-color systems (Agilent, cDNA)
- 2 to 3 million tags sequenced for MPSS
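A minimal sketch of how a dye-swap pair is commonly combined, not necessarily the procedure used in this study. It assumes the swapped array reports the log ratio with the opposite sign, so that half the difference cancels a dye bias that adds equally to both arrays. The function name and the example numbers are hypothetical.

```python
def dye_swap_average(m_forward, m_swapped):
    """Combine per-gene log ratios from a dye-swap pair.

    Assumption: m_swapped comes from the array with the dye labels
    exchanged, so the biological effect enters with the opposite sign
    while an additive dye bias keeps the same sign.
    """
    if len(m_forward) != len(m_swapped):
        raise ValueError("both arrays must report the same genes")
    return [(mf - ms) / 2.0 for mf, ms in zip(m_forward, m_swapped)]


# Illustrative gene: true M = 2.0, dye bias = +0.5 on both arrays.
combined = dye_swap_average([2.5], [-1.5])
print(combined)
```

In this made-up example the bias of 0.5 drops out and the combined value equals the assumed true log ratio.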
11Study Design II
- Experimental Method
- as recommended by "specialists"
- Affymetrix: Biozentrum Basel
- Agilent: Institut Gustave Roussy (Paris)
- Spotted cDNA arrays: Otto Hagenbuechle's (DAF, now DAFL) crew
- MPSS: Lynx (California), Victor Jongeneel's crew
- Data Handling
- as recommended by "specialists" (see above)
- but RMA with quantile normalization for Affymetrix
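A small sketch of the quantile normalization step only. Full RMA also includes background correction and probe-set summarization, which are not shown. This is plain Python for illustration, not the software used in the study; the function name and the toy data are hypothetical, and ties are broken by position rather than averaged.

```python
def quantile_normalize(columns):
    """Force every array (column) to share the same value distribution.

    columns: list of equal-length lists, one list of intensities per array.
    Returns new lists; the inputs are left unchanged.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all arrays must have the same number of probes")
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [sum(sc[k] for sc in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]  # replace the value by the reference quantile
        normalized.append(out)
    return normalized


arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 9.0]]
print(quantile_normalize(arrays))
```

After normalization each array contains exactly the same set of values, only assigned to different probes according to their original rank.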
12Platform Comparison Study
- Purpose
- to assess accuracy and reproducibility of different gene expression platforms
- to compare features of different measurement types
- to understand the system (important for normalization and downstream analysis)
13Comparison principle
- cross-platform matching is done through the Tromer database of transcripts, keeping only genes we classify as reliably mapped between platforms
- we have not yet looked at probe(set)s that could not be well mapped to known transcripts
- "peak technical performance": this is a case study, not a systematic study
- comparison based on M (log ratio) and A (log intensity) values
- accuracy cannot be assessed, as true M values are not known
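For reference, M and A can be computed as in the sketch below. It assumes the common base-2 convention, with M the log ratio of the two sample intensities and A their average log intensity. The slides do not state the log base or the exact per-platform formula, and the function name is hypothetical.

```python
import math


def ma_values(intensity_1, intensity_2):
    """Return (M, A) for one gene measured in two samples or channels.

    M = log2(intensity_1 / intensity_2)         (log ratio)
    A = 0.5 * log2(intensity_1 * intensity_2)   (average log intensity)
    """
    if intensity_1 <= 0 or intensity_2 <= 0:
        raise ValueError("intensities must be positive to take logarithms")
    m = math.log2(intensity_1 / intensity_2)
    a = 0.5 * math.log2(intensity_1 * intensity_2)
    return m, a


print(ma_values(8.0, 2.0))
```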
14DiffVAvrg plots: testing reproducibility
y: diff in M, x: avrg Int (A)
Affy U133A
Agilent h1A
DAF(L) h10k
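The coordinates of such a plot can be derived from two replicate measurements as sketched here: for each gene, x is the average of the two A values and y is the difference of the two M values. The function name and the input layout (parallel lists) are assumptions made for illustration.

```python
def diff_vs_average(m_rep1, a_rep1, m_rep2, a_rep2):
    """Return one (x, y) point per gene for a difference-vs-average plot.

    x = average intensity A over the two replicates
    y = difference in M between the two replicates
    """
    lengths = {len(m_rep1), len(a_rep1), len(m_rep2), len(a_rep2)}
    if len(lengths) != 1:
        raise ValueError("all inputs must list the same genes")
    points = []
    for m1, a1, m2, a2 in zip(m_rep1, a_rep1, m_rep2, a_rep2):
        points.append(((a1 + a2) / 2.0, m1 - m2))
    return points


print(diff_vs_average([1.0, -2.0], [8.0, 10.0], [0.5, -2.5], [9.0, 12.0]))
```

With perfectly reproducible replicates every y value would be zero, so the vertical spread at a given x shows how reproducibility depends on signal intensity.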
15correlations
first quartile (25% least frequent RNAs)
fourth quartile (25% most frequent RNAs)
16Agreement top up 200 (placenta)
M range: Affy 1.66 to 7.94, Agil 1.48 to 6.17, NCCR 1.83 to 7.12
17Agreement top down 200 (testis)
M range: Affy -8.27 to -1.65, Agil -6.07 to -1.47, NCCR -6.18 to -1.79
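One simple way to quantify agreement between top lists is the fraction of genes shared by the top-N lists of two platforms, restricted to genes measured on both. The sketch below assumes "up" means the largest M values and "down" the smallest. The function names and the toy gene identifiers are hypothetical, and the slides do not say which agreement measure was actually used.

```python
def top_genes(m_by_gene, n, direction="up"):
    """Return the n genes with the largest ('up') or smallest ('down') M."""
    ranked = sorted(m_by_gene, key=m_by_gene.get, reverse=(direction == "up"))
    return set(ranked[:n])


def top_list_agreement(m_platform_a, m_platform_b, n=200, direction="up"):
    """Fraction of the top-n list that two platforms have in common."""
    shared = set(m_platform_a) & set(m_platform_b)
    if not shared:
        raise ValueError("the platforms share no genes")
    size = min(n, len(shared))  # cannot pick more genes than are shared
    top_a = top_genes({g: m_platform_a[g] for g in shared}, size, direction)
    top_b = top_genes({g: m_platform_b[g] for g in shared}, size, direction)
    return len(top_a & top_b) / size


platform_a = {"g1": 3.0, "g2": 2.0, "g3": -1.0, "g4": 0.5}
platform_b = {"g1": 2.5, "g2": 0.1, "g3": -2.0, "g4": 1.9}
print(top_list_agreement(platform_a, platform_b, n=2, direction="up"))
```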
18Preliminary Conclusions I
- The three microarray platforms compared performed very similarly in terms of which genes are detected as differentially expressed ...
- ... and also similarly in terms of the distributions of M values and the deviation between replicated measurements
- ... so similarly that it is hard to find real intrinsic differences between the three platforms ...
- ... at least in this case study, which has strong effects at all signal intensities (strongest at average-to-high intensities)
- Sensitivity (LOD): detection of DE at low signal intensity is likely to be similar too
- One would need to know the real M values to see which platform is right when they disagree: use MPSS? plan qPCR
19Preliminary Conclusions II
- The Affy-RMA M values are better variance-stabilized, but reproducibility is good for all platforms except for weak signals in Agilent (likely due to background subtraction)
- The Affy-RMA M values are more strongly "compressed" towards zero at low intensity. Whether this is a good strategy cannot be said on the basis of our results: it reduces false positive calls but might make DE at low intensity undetectable (but is it detectable at all?)
20Preliminary Conclusions III
- Microarray vs MPSS
- M values, quantitative comparison
- the disagreement is considerable ...
- ... so large that it is hard to reconcile the values (by using confidence intervals)
- M values, qualitative comparison
- there is a good degree of agreement
- approximately the same for all three microarray platforms
21Interpretation tool: Isrec Ontologizer (Io)
Thierry Sengstag, coll. with Pascal Anderle
- Selection of hierarchical level
- Classification of probe sets
- Classification of UniGenes
- Classification of RefSeqs
- Flagging of ambiguous results
22Analysis Tool Can we detect joint action of
two genes?
Postdoc Asa Wirapati
- Look at each possible pair, try to separate tissue types by a straight line. Two aims:
- good discrimination (with a small number of genes)
- identify genes that are discriminative in combination, but not taken singly
- Several dozen pairs are statistically significant (w.r.t. randomly permuted labels),
- they could represent cases of bona fide biologically significant "joint action" (Godard 2003)
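The idea can be sketched for a single gene pair as follows. The slide does not say how the straight line is fitted, so Fisher's linear discriminant is swapped in here as one possible choice, and significance is estimated by re-fitting on randomly permuted labels. All function names, the number of permutations and the toy data are hypothetical.

```python
import random


def lda_accuracy(points, labels):
    """Training accuracy of a Fisher linear discriminant on 2-D points."""
    groups = {c: [p for p, y in zip(points, labels) if y == c] for c in (0, 1)}
    if not groups[0] or not groups[1]:
        raise ValueError("both tissue types must be present")
    means = {c: (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
             for c, g in groups.items()}
    # Pooled within-class scatter matrix [[sxx, sxy], [sxy, syy]].
    sxx = sxy = syy = 0.0
    for c, g in groups.items():
        mx, my = means[c]
        for x, y in g:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
    sxx += 1e-9  # small ridge keeps the matrix invertible
    syy += 1e-9
    det = sxx * syy - sxy * sxy
    dx = means[1][0] - means[0][0]
    dy = means[1][1] - means[0][1]
    # Direction of the discriminant: w = S^-1 (mean_1 - mean_0).
    w0 = (syy * dx - sxy * dy) / det
    w1 = (-sxy * dx + sxx * dy) / det
    # Decision threshold halfway between the projected class means.
    mid = 0.5 * (w0 * (means[0][0] + means[1][0]) + w1 * (means[0][1] + means[1][1]))
    correct = sum(1 for (x, y), lab in zip(points, labels)
                  if (w0 * x + w1 * y > mid) == (lab == 1))
    return correct / len(points)


def permutation_p_value(points, labels, n_perm=200, seed=0):
    """Observed accuracy and its p-value against randomly permuted labels."""
    rng = random.Random(seed)
    observed = lda_accuracy(points, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if lda_accuracy(points, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy pair: neither gene separates the tissues alone, the combination does.
expr = [(1, 1.2), (2, 1.8), (3, 3.1), (4, 3.9), (3, 1.1), (4, 2.2), (5, 2.9), (6, 4.1)]
tissue = [0, 0, 0, 0, 1, 1, 1, 1]
print(permutation_p_value(expr, tissue))
```

In a real screen this would be repeated for every gene pair, so the resulting p-values would need a correction for multiple testing. A pair is a candidate for "joint action" when the combined accuracy is clearly better than that of either gene alone.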
23END
questions