Title: From Probe .CEL to Expression Level
Christine Steinhoff
Max Planck Institut für Molekulare Genetik, Computational Molecular Biology, Berlin
Outline
- Choice of Technology
- Affymetrix Technology
- Low-Level Analysis Problems
  - Background
  - PM/MM
  - Summary Statistic
- Comparison of Different Low-Level Analysis Procedures
  - MAS
  - Li/Wong
  - RMA
- Comparison of Different Normalization Strategies
Data Processing
Choice of Technology
- Red/green (two-color) experiments
- Affymetrix experiments
Choice of Technology
Patient / Control
Levels of Replication
Hybridization
From probe .cel to expression
... TGTGATGGTGGGAATGGGTCAGAAGGACTCCTATGTGGGTGACGAG
GCC
TTACCCAGTCTTCCTGAGGATACACCCAC
TTACCCAGTCTTGCTGAGGATACACCCAC
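The two probe sequences above differ in a single base, which is how a perfect-match/mismatch pair is built. The following is a minimal sketch that locates the difference; it assumes the two lines are such a pair, and `mismatch_positions` is a hypothetical helper name, not part of any package.

```r
# The two probe sequences shown above (copied from the slide).
probe_a <- "TTACCCAGTCTTCCTGAGGATACACCCAC"
probe_b <- "TTACCCAGTCTTGCTGAGGATACACCCAC"

# Hypothetical helper: positions at which two equal-length sequences differ.
mismatch_positions <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  which(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

mismatch_positions(probe_a, probe_b)
```

For these two strings the only difference is at position 13 (C versus G).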
Probe .CEL: Problems
- Background subtraction
- PM / MM
- Wanted: one value per probe set. How to summarize?
- Variances across arrays and within a probe set
Example probe-level values: 1.5 2.4 10.4 0.1 ... 1.3 3.4
What has been done?
MAS 5.0: Background
- The array is split up into K rectangular zones (default K = 16).
- Control cells and masked cells are not used.
- Cells are ranked; $Zbg_k$ is the lowest 2% of intensities in zone k (average background of that zone).
- Smoothing:
  - $d_k(x,y)$ = distance from the center of zone k to some coordinate (x,y)
  - $w_k(x,y) = 1 / (d_k^2(x,y) + s)$ (default s = 100)
  - background: $b(x,y) = \sum_k w_k(x,y)\, Zbg_k \,/\, \sum_k w_k(x,y)$
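The zone-based background described above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the MAS 5.0 implementation: the function names, the `centers` matrix layout, and the toy values in the usage lines are all hypothetical.

```r
# Zone background: mean of the lowest 2% of cell intensities in one zone.
zone_background <- function(intensities, frac = 0.02) {
  n_low <- max(1, floor(frac * length(intensities)))
  mean(sort(intensities)[seq_len(n_low)])
}

# Smoothed background at coordinate (x, y).
# centers: K x 2 matrix of zone centers; zbg: length-K vector of zone backgrounds.
smooth_background <- function(x, y, centers, zbg, s = 100) {
  d2 <- (centers[, 1] - x)^2 + (centers[, 2] - y)^2  # squared distances d_k^2
  w <- 1 / (d2 + s)                                  # weights w_k(x, y)
  sum(w * zbg) / sum(w)                              # weighted average of Zbg_k
}

# Toy usage with two zones: halfway between the centers both zones weigh equally.
centers <- rbind(c(0, 0), c(100, 0))
zbg <- c(10, 30)
smooth_background(50, 0, centers, zbg)
```

Close to a zone center the estimate is dominated by that zone's background; the smoothing constant s keeps the weight finite at the center itself.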
What has been done?
MAS 5.0: PM - MM
Signal calculation:
1. Cell intensities are preprocessed for global background.
2. An ideal mismatch (IM) is calculated and subtracted to adjust PM.
3. A biweight estimator is used as a robust mean of the resulting values.
4. The signal is scaled using a trimmed mean.
$V_{i,j} = \max(PM_{i,j} - IM_{i,j},\, d)$, default $d = 2^{-20}$
IM: ideal mismatch, depending on whether MM > PM or MM < PM
$PV_{i,j} = \log_2(V_{i,j})$ for $j = 1, \dots, n_i$
What has been done?
MAS 5.0: Summary
$SignalLogValue_i = T_{bi}(PV_{i,1}, \dots, PV_{i,n_i})$ (one-step Tukey's biweight)
$u = (x - \mathrm{Median}(PV_{i,1}, \dots, PV_{i,n_i})) \,/\, (\mathrm{const} \cdot \mathrm{MAD} + \mathrm{eps})$
$w(u) = (1 - u^2)^2$ for $|u| < 1$, and 0 otherwise
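The probe values and the one-step biweight above can be sketched as follows. The tuning values `const = 5` and `eps = 1e-4` and the use of base-2 logs are assumptions taken from common descriptions of MAS 5.0, not from the slides, and the ideal mismatch `im` is simply taken as given here.

```r
# Probe values: clip PM - IM from below at d = 2^-20, then take logs
# (base 2 is assumed; `im` is the ideal mismatch, supplied by the caller).
probe_values <- function(pm, im, d = 2^-20) log2(pmax(pm - im, d))

# One-step Tukey biweight: weighted mean with weights w(u) = (1 - u^2)^2.
tukey_biweight <- function(x, const = 5, eps = 1e-4) {
  m <- median(x)
  mad_raw <- median(abs(x - m))           # unscaled median absolute deviation
  u <- (x - m) / (const * mad_raw + eps)  # standardized distance from the median
  w <- ifelse(abs(u) < 1, (1 - u^2)^2, 0) # values far from the median get weight 0
  sum(w * x) / sum(w)
}

# Toy usage: the outlier 100 receives zero weight.
tukey_biweight(c(1, 2, 3, 4, 100))
```

Unlike the arithmetic mean (22 for this toy vector), the biweight stays within the bulk of the data.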
What has been done?
Li/Wong (PNAS 2001, vol. 98 (1), pp. 31-36)
Model:
$MM_{ij} = \nu_j + \theta_i \alpha_j + \epsilon$
$PM_{ij} = \nu_j + \theta_i \alpha_j + \theta_i \phi_j + \epsilon$
- $\nu_j$: baseline
- $\theta_i$: expression for the gene in the i-th sample
- $\alpha_j$: rate of increase of the MM response of the j-th probe pair
- $\phi_j$: additional rate of increase in the corresponding PM response
- $\epsilon$: random error
What has been done?
Li/Wong: Summary Statistic
Least-squares fitting to
$PM_{ij} - MM_{ij} = \theta_i \phi_j + \epsilon_{ij}$, $\epsilon_{ij} \sim N(0, \sigma^2)$
gives the least-squares estimate for $\theta$.
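One way to obtain the least-squares estimates for the multiplicative model above is to alternate between solving for θ with φ fixed and for φ with θ fixed. The sketch below does only that; it omits the outlier handling of the published method, uses the constraint sum(φ²) = J for identifiability, and `fit_li_wong` is a hypothetical name.

```r
# Alternating least squares for D[i, j] = theta[i] * phi[j] + error.
# D: I x J matrix of PM - MM differences (rows = samples, columns = probes).
fit_li_wong <- function(D, n_iter = 50) {
  J <- ncol(D)
  phi <- rep(1, J)                                   # start: equal probe sensitivities
  for (k in seq_len(n_iter)) {
    theta <- as.vector(D %*% phi) / sum(phi^2)       # LS estimate of theta given phi
    phi <- as.vector(t(D) %*% theta) / sum(theta^2)  # LS estimate of phi given theta
    phi <- phi * sqrt(J / sum(phi^2))                # enforce sum(phi^2) = J
  }
  theta <- as.vector(D %*% phi) / sum(phi^2)         # final theta for the scaled phi
  list(theta = theta, phi = phi)
}

# Toy usage on noise-free data generated from the model.
theta_true <- c(100, 200, 400)
phi_true <- c(0.5, 1.0, 1.5, 2.0)
phi_true <- phi_true * sqrt(length(phi_true) / sum(phi_true^2))
fit <- fit_li_wong(outer(theta_true, phi_true))
```

On noise-free input the fit recovers `theta_true` and `phi_true`; with real data the fixed iteration count would normally be replaced by a convergence check.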
What has been done?
RMA: Irizarry/Bolstad/Speed (NAR 2003, 31(4), e15)
- Background correction on the raw intensity scale (subtraction)
- Signal model: PM = background + signal = bg + s
- Background correction: $B(PM) = E(s \mid PM)$
- s: exponential; bg: normal (optical noise, non-specific binding)
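Under the exponential-signal plus normal-background model above, the conditional expectation E(s | PM) has a closed form. The sketch below uses the commonly quoted simplified version of that formula and assumes the three model parameters are already known: `mu` and `sigma` for the normal background and `alpha` for the exponential rate. How they are estimated from the data is not shown, and `rma_bg_adjust` is a hypothetical name.

```r
# Simplified E(s | PM) for s ~ Exp(alpha), bg ~ N(mu, sigma^2).
# mu, sigma and alpha are assumed to be given (their estimation is not shown).
rma_bg_adjust <- function(pm, mu, sigma, alpha) {
  a <- pm - mu - alpha * sigma^2
  a + sigma * dnorm(a / sigma) / pnorm(a / sigma)
}

# Toy usage: adjusted values stay positive even when PM is below mu.
pm <- c(50, 100, 200, 1000, 10000)
adjusted <- rma_bg_adjust(pm, mu = 100, sigma = 20, alpha = 0.01)
```

For intensities far above the background the adjustment approaches a plain subtraction of `mu + alpha * sigma^2`, while low intensities are shrunk towards small positive values instead of becoming negative.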
What has been done?
RMA: PM, MM
- Forget about MM.
- Reason: mathematical subtraction does not translate into biological meaning.
- Future: improve BG correction by using MMs.
What has been done?
RMA: Summary Statistic
$Y_{ijn} = \mu_{in} + \alpha_{jn} + \epsilon_{ijn}$
- $i = 1, \dots, I$ (chips), $j = 1, \dots, J$ (probes), $n = 1, \dots, N$ (probe sets)
- $\alpha_{jn}$: probe affinity effect
- $\mu_{in}$: log-scale expression level
- $\epsilon_{ijn}$: error, iid $N(0, \sigma^2)$
- $\sum_j \alpha_{jn} = 0 \;\; \forall n$
- fitted by median polish
Note: Irizarry et al. (2003) recommend normalizing first, then estimating the parameters.
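For a single probe set, the additive model above can be fitted with base R's `medpolish()`. The sketch assumes `Y` holds background-corrected, normalized log2 PM values with probes in rows and chips in columns; `rma_summarize` is a hypothetical wrapper, and the affy package's own median polish may differ in detail.

```r
# Y: probes x chips matrix of log2 PM values for one probe set.
# Returns one log-scale expression estimate per chip.
rma_summarize <- function(Y) {
  fit <- medpolish(Y, trace.iter = FALSE)  # rows: probe effects, columns: chip effects
  fit$overall + fit$col                    # expression level = overall + chip effect
}

# Toy usage on a perfectly additive matrix.
alpha <- c(-1, 0, 1)          # probe affinity effects
mu <- c(8, 9, 10.5, 7)        # true log-scale expression per chip
Y <- outer(alpha, mu, "+")    # Y[j, i] = alpha[j] + mu[i]
rma_summarize(Y)
```

Because medians are used instead of means, a single aberrant probe on one chip has little influence on the chip's expression estimate.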
Does it matter at all?
[Figure, all spots. Panels: MAS 5.0; Li/Wong PM only; AvDiff PM only; Li/Wong PM - MM; RMA; bg MAS + AvDiff PM only; AvDiff PM - MM]
Does it matter at all?
The reference distribution is normal for the log fold change.
(From Terry Speed, "Summarizing and comparing GeneChip® data")
Definitions
For the rest of the talk:
(1) take background correction (RMA-like)
(2) only use PM (we don't know a better solution)
(3) summarize using the RMA model

Bioconductor:
    library(affy)
    # read all .CEL files from the spike-in directory
    x <- ReadAffy(celfile.path = "/project/gene_expression/spikein/")
    # RMA background, PM only, median polish summary, no normalization
    data.rma <- express(x,
                        subset = NULL,
                        bg.correct = bg.correct.rma,
                        pmcorrect.method = "pmonly",
                        summary.stat = medianpolish,
                        normalize = FALSE,
                        verbose = TRUE)
Normalization
Problem: which order?
- Normalization --> Summary Statistic (first normalization)
- Summary Statistic --> Normalization (first summary)
Problem: Normalization
- User-defined sets (housekeeping genes (?!), controls, etc.): useful for "most genes changed" settings
- Entire dataset: useful for "most genes unchanged" settings
Problem: Normalization
Local regression: determine regression lines locally
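Local regression normalization between two chips can be sketched with base R's `lowess()`. The sketch assumes that most genes are unchanged, works on the log ratio M against the average log intensity A, and subtracts the locally fitted trend; the function name, the span value, and the simulated data are illustrative assumptions.

```r
# Remove an intensity-dependent trend in the log ratios of two chips.
loess_normalize <- function(chip1, chip2, span = 2 / 3) {
  M <- log2(chip1) - log2(chip2)          # log ratio
  A <- (log2(chip1) + log2(chip2)) / 2    # average log intensity
  fit <- lowess(A, M, f = span)           # fit$y follows A in increasing order
  trend <- numeric(length(A))
  trend[order(A)] <- fit$y                # map fitted values back to input order
  list(A = A, M = M - trend)
}

# Toy usage: simulated chips with an intensity-dependent bias.
set.seed(1)
A_sim <- runif(500, 6, 14)
M_sim <- 0.5 + 0.1 * (A_sim - 10) + rnorm(500, sd = 0.1)
res <- loess_normalize(2^(A_sim + M_sim / 2), 2^(A_sim - M_sim / 2))
```

After normalization the log ratios scatter around zero across the whole intensity range, which is the behaviour expected when most genes are unchanged.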
Problem: Normalization
Goal: detection of differentially expressed genes
Methods compared: Var Stab, ANOVA, Lin Regr, Least Med, Local Regr, Mean, Median, Shorth, Zscore, Raw
Distance between two methods i and j, with genes ordered by abs(log ratio):
$d(i,j) = 1 - \frac{6}{N(N^2-1)} \sum_{k,l=1}^{N} d^{(i,j)}_{k,l}$
$d^{(i,j)}_{k,l} = \mathrm{rank}(gene_k) - \mathrm{rank}(gene_l)$ if it exists, $N+1$ otherwise
Problem: Normalization
Biological evaluation:
(a) Northern blotting
(b) quantitative RT-PCR
(c) SAGE library
(d) quantifiable controls
Dataset
Spike-in dataset: design
[Figure: design of the spike-in dataset, with flagged entries marked]
Dataset
[Figure: spike-in concentrations (conc) per experiment (Exp) for the spike-ins; experiment sets: expset 1, expset 2, expset 3]
Comparison of Normalization Strategies
[Figure: log-Int versus log-conc; panels: quantile-norm., q-spline-norm., loess-norm., vsn-norm.]
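Of the strategies compared here, quantile normalization is the easiest to write down: every chip is forced to share the same intensity distribution, namely the mean of the sorted columns. The sketch below is a base-R illustration that assumes a complete matrix without missing values; it is not the Bioconductor implementation, and `quantile_normalize` is a hypothetical name.

```r
# X: probes x chips matrix of intensities (no missing values assumed).
quantile_normalize <- function(X) {
  ranks <- apply(X, 2, rank, ties.method = "first")  # rank of each value within its chip
  target <- rowMeans(apply(X, 2, sort))              # reference: mean of sorted columns
  Xn <- apply(ranks, 2, function(r) target[r])       # replace values by reference quantiles
  dimnames(Xn) <- dimnames(X)
  Xn
}

# Toy usage with three chips and four probes.
X <- cbind(c(5, 2, 3, 4), c(4, 1, 6, 2), c(3, 4, 8, 9))
Xn <- quantile_normalize(X)
```

After the transformation all columns contain the same set of values, and the ordering of probes within each chip is unchanged.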
Comparison of Normalization Strategies
[Figure: log-Int Chip2 versus log-Int Chip1]
Comparison of Normalization Strategies
Biological evaluation:
(a) Northern blotting
(b) quantitative RT-PCR
(c) SAGE library
(d) quantifiable controls
Comparison of Normalization Strategies
(1) Raw data: 0.845
(2) Median: 0.845
(3) ZScore: 0.859
(4) Overall (linear) regression: 0.851
(5) Local regression: 0.851
(6) Variance stabilization: 0.853
(7) ANOVA: 0.854
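The two simplest strategies in this comparison, median centering and the z-score, rescale each chip by global statistics only. A minimal sketch, assuming `x` is a vector of log intensities from one chip; the function names are hypothetical.

```r
# Median normalization: shift the chip so that its median is zero.
median_center <- function(x) x - median(x)

# Z-score normalization: zero mean and unit standard deviation per chip.
zscore <- function(x) (x - mean(x)) / sd(x)

# Toy usage on one chip.
x <- c(1, 2, 3, 4, 10)
median_center(x)
zscore(x)
```

Both leave the ordering of genes within a chip untouched, so they cannot correct intensity-dependent effects the way local regression or variance stabilization can.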
Spoil the data
- Plate-specific effect
- Random effect
- Labeling effect
- Scanner effect
Summary
- Choice of Technology: crucial for the design of the experiment
- Low-Level Analysis Problems
  - Background: different results depending on the model!
  - PM/MM: forget about MM, because mathematical subtraction does not translate into biological meaning
  - Summary Statistic: decide whether to normalize first or summarize first!
- Comparison of Different Low-Level Analysis Procedures
  - MAS performs worst
  - Li/Wong performs well
  - RMA performs well
  - for good data it seems not to matter
- Comparison of Different Normalization Strategies
  - variance stabilization always seems to work quite well