Title: Max Planck Institute
1Max Planck Institute for Molecular Genetics
Microarray Data Analysis: Comparison of Various Normalization Methods
Christine Steinhoff, Max Planck Institute for Molecular Genetics
2Outline
- Types of Arrays
- Effects in the Data
- Procedure of Data Analysis
- Problems and Noise
- What is Normalization and why?
- Methods
- How to use them
- Does it make any difference which method to use?
- How to find out which one? One example
3Types of Arrays
Red/Green experiments
Affymetrix Chips
Radioactive filters
4Data Analysis Procedure
Information about: spot intensities (pixel, mean/median, etc.), local background, pins, PCR plates, localization, standard deviation, ...
5Effects in your Data
Hybridization
6Data Analysis Procedure
Starting with image processing
Scanner output: information about spot intensities, local background, pins, PCR plates, localization, standard deviation, ...
Quality check: are there any effects due to pins, PCR plates, local effects, ...?
7Problem 1 Background
8Problem 2 variability
9Problem 3 Saturation
10Problem 4 linearity
11Problem 5 variance
Logratio
Product intensity (logscale)
12Problem 6 Pin/Plate Effect?
Ratio of intensities of both channels
Huber/von Heydebreck
Product intensity of both channels
13Problem 7 Pin Effect
Ratio of intensities of both channels
Yang, YH et al, SPIE BiOS, San Jose 2001
Product intensity of both channels
14What is Normalization?
Systematic variation in microarray experiments:
- Saturation (Scanner Labeling)
- Nonlinearity of Cy5, Cy3 labeling
- Efficiencies of Cy5, Cy3 labeling
- Variation of low intensities
- Pins
- PCR plates
- Local effects
- ...
Normalization is the process of describing and removing such variation.
15Why?
Goal: reliable measurement of ratios, patient vs. control:
Patient(red)/Control(green)
Patient(green)/Control(red)
In a self-self hybridization we would expect green/red = 1 for all genes.
What is observed instead is a mixture of:
- Unequal labelling
- Noise
- Non-constant variance
- Differential expression (not in this example!)
- ...
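The plots in this deck show the log ratio of the two channels against the log product intensity. A minimal sketch of how these two quantities could be computed per spot is given below; the function name, the base-2 logarithm, and the example intensities are assumptions for illustration and are not taken from the slides.

```python
import math

def log_ratio_and_product(red, green):
    """Return (M, A) per spot: M = log2(red/green), A = log2(red*green)."""
    m_values, a_values = [], []
    for r, g in zip(red, green):
        if r <= 0 or g <= 0:
            raise ValueError("intensities must be positive to take logs")
        m_values.append(math.log2(r / g))   # log ratio
        a_values.append(math.log2(r * g))   # log product intensity
    return m_values, a_values

# Hypothetical background-corrected intensities for four spots
red = [1200.0, 800.0, 150.0, 4000.0]
green = [1000.0, 800.0, 300.0, 2000.0]
M, A = log_ratio_and_product(red, green)
```

In a self-self hybridization every M should be 0 (ratio 1); any systematic departure from 0, especially one that depends on A, is what normalization tries to describe and remove.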
16Methods
User-defined sets: housekeeping (?!) genes, controls, etc.; useful for "most genes changed" settings
Entire dataset: useful for "most genes unchanged" settings
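The choice between a user-defined reference set and the entire dataset can be illustrated with a simple median centering of log ratios. This is only a sketch under assumptions: the function name, the `reference_idx` parameter, and the numbers are hypothetical, and median centering stands in for whichever estimator is actually used.

```python
from statistics import median

def median_normalize(log_ratios, reference_idx=None):
    """Shift all log ratios so that the median of the reference spots is 0.

    reference_idx=None uses the entire dataset as reference
    (appropriate when most genes are unchanged); otherwise only the
    listed spots (e.g. housekeeping genes or controls) are used.
    """
    if reference_idx is None:
        reference = log_ratios
    else:
        reference = [log_ratios[i] for i in reference_idx]
    shift = median(reference)
    return [m - shift for m in log_ratios]

# Hypothetical log ratios; spots 0-2 play the role of controls
log_ratios = [0.5, 0.3, 0.4, 2.4, 0.6]
controls = [0, 1, 2]
by_all = median_normalize(log_ratios)
by_controls = median_normalize(log_ratios, controls)
```

The two results differ by a constant shift only; which one is appropriate depends on whether the assumption "most genes unchanged" holds for the experiment.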
17Methods
Local regression: determine regression lines locally
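A rough sketch of what "determining regression lines locally" can look like: for each spot, fit a weighted straight line through its nearest neighbours along the intensity axis and subtract the fitted value from the log ratio. This is a simplified lowess-style smoother without robustness iterations, written in plain Python; the function name, the `frac` parameter, and the tricube weighting are assumptions, not the method used in the talk.

```python
def local_linear_fit(x, y, frac=0.5):
    """Fit a tricube-weighted straight line around each x[i]; return fitted y."""
    n = len(x)
    k = max(2, int(round(frac * n)))  # number of neighbours per local fit
    fitted = []
    for x0 in x:
        # Bandwidth: distance from x0 to its k-th nearest point (self included)
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12
        # Tricube weights, falling to zero at the bandwidth
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

# Hypothetical data: log ratio M drifts linearly with log intensity A
A = [6.0 + i for i in range(10)]
M = [0.1 * a - 1.0 for a in A]
trend = local_linear_fit(A, M, frac=0.5)
M_normalized = [m - t for m, t in zip(M, trend)]
```

Here the intensity-dependent drift is removed entirely because it is exactly linear; on real data the local fit follows curved trends that a single overall regression line would miss.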
18Methods
19How to use them?
http://www.bioconductor.org
20Methods
1% maximal differential genes (red, 138 genes); the 5% lowest expressed genes (green, 691 genes) are discarded beforehand
Plots: log product vs. log ratio of normalized intensities
No Normalization
Linear Regression
Local Regression
Overall Median
Zscore
ANOVA
Variance Stabilization
21Comparison
Goal: detection of differentially expressed genes
Set of 30 maximal differential genes out of 13824
22Comparison
Goal: detection of differentially expressed genes
Methods compared: Var Stab, ANOVA, Lin Regr, Least Med, Local Regr, Mean, Median, Shorth, Zscore, Raw
Distance: d(i,j) = N - (number of shared genes)
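One way to read this distance: each normalization method yields its own list of the N most differential genes, and two methods i and j are closer the more genes their lists share. The sketch below follows that reading; it is an interpretation of the slide, and the function names, the ranking by absolute log ratio, and the toy data are assumptions.

```python
def top_n_genes(log_ratios, n):
    """Return the set of the n gene ids with the largest |log ratio|."""
    ranked = sorted(log_ratios, key=lambda g: abs(log_ratios[g]), reverse=True)
    return set(ranked[:n])

def overlap_distance(ratios_i, ratios_j, n):
    """d(i,j) = n minus the number of genes shared by both top-n lists."""
    shared = top_n_genes(ratios_i, n) & top_n_genes(ratios_j, n)
    return n - len(shared)

# Hypothetical log ratios of five genes after two normalization methods
method_a = {"g1": 2.0, "g2": -1.5, "g3": 0.2, "g4": 0.1, "g5": -0.9}
method_b = {"g1": 1.8, "g2": -0.3, "g3": 0.1, "g4": 1.2, "g5": -1.0}
d_ab = overlap_distance(method_a, method_b, 3)
```

The distance is 0 when both methods select the same N genes and N when the lists are disjoint; it ignores the order of genes within a list.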
23Comparison
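Goal: detection of differentially expressed genes
Methods compared: Var Stab, ANOVA, Lin Regr, Least Med, Local Regr, Mean, Median, Shorth, Zscore, Raw
Genes ordered by abs(logratio):
$$d(i,j) = 1 - \frac{6}{N(N^2-1)} \sum_{k,l=1}^{N} d(i,j)_{k,l}$$
$$d(i,j)_{k,l} = \begin{cases} \mathrm{rank}(\mathrm{gene}_k) - \mathrm{rank}(\mathrm{gene}_l) & \text{if it exists} \\ N+1 & \text{else} \end{cases}$$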
24Which One?
Dataset: three repetitions of one dye-swap experiment (6), followed by Northern blot verification
Normalization strategies
Biological evaluation:
(a) Northern blotting
(b) quant. RT-PCR
(c) SAGE library
(d) quantifiable controls
25Goal Biological Evaluation
Microarray Ratios
quantifiable method different from microarray
26Three Dye Swaps
experiment 1
experiment 2
experiment 3
swap 1
genes, empties, housekeeping, plant
swap 2
27Biological Evaluation
RAW DATA
LogRatio
product of intensity (logScale)
28Comparison
(1) Raw data: 0.845
(2) Median: 0.845
(3) ZScore: 0.859
(4) Overall (linear) Regression: 0.851
(5) Local Regression: 0.851
(6) Variance Stabilization: 0.853
(7) ANOVA: 0.854
29Conclusion
For good data it doesn't really matter.
But what about bad data?
30Spoil the data
- Plate-specific effect
- Random effect
- Labeling effect
- Scanner effect
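To make the idea of deliberately spoiling a dataset concrete, the sketch below applies one simple stand-in for each of the four effects to the red channel. How the effects were actually simulated is not stated on the slides; the function name, all parameters, and the chosen distortions (a per-plate factor, random scaling of a few spots, a global dye gain, a saturation cap) are assumptions.

```python
import random

def spoil(red, green, plates, seed=0, plate_bias=None, n_random=5,
          label_gain=1.3, saturation=65535.0):
    """Return distorted copies of both channels (all effects hypothetical)."""
    rng = random.Random(seed)          # seeded for reproducibility
    plate_bias = plate_bias or {}
    # Plate-specific effect: scale red for spots from the listed plates
    red = [r * plate_bias.get(p, 1.0) for r, p in zip(red, plates)]
    # Random effect: rescale a few randomly chosen spots
    for i in rng.sample(range(len(red)), min(n_random, len(red))):
        red[i] *= rng.uniform(0.5, 2.0)
    # Labeling effect: one dye is globally brighter than the other
    red = [r * label_gain for r in red]
    # Scanner effect: signals above the saturation level are clipped
    red = [min(r, saturation) for r in red]
    green = [min(g, saturation) for g in green]
    return red, green

# Hypothetical clean data: eight spots from two PCR plates
red = [500.0, 1000.0, 2000.0, 4000.0, 8000.0, 16000.0, 32000.0, 60000.0]
green = list(red)
plates = [0, 0, 0, 0, 1, 1, 1, 1]
bad_red, bad_green = spoil(red, green, plates, plate_bias={1: 1.5}, n_random=2)
```

Normalizing the spoiled data again and comparing the result with the original then shows how well each method recovers from each kind of distortion.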
31Normalize again
32Compare with original data
Methods: no/mean, ZScore, Local R, Lin R, ANOVA, VarStab
Data: raw, random effect on 5 spots, labeling, scatter in low intensity
Measure: correlation coefficient