Title: RNA sample and array quality checking and normalization
1RNA sample and array quality checking and
normalization
- Stephen Ohms BRF/CBiS
- Topics for this talk
- Some terminology
- RNA sample quality checking
- Selection of Affymetrix QC measurements
- Chip quality checking checking probe array
images for defects - Normalization how can we make different
arrays comparable
2Overview of Affymetrix GeneChip technology
An Affymetrix GeneChip microarray consists of a
1.28 cm2 glass square which is divided into a
grid of about 400,000 probe cells (each about 20
24 mm x 20 24 mm). Each probe cell contains
about 106 copies of one specific 25-mer
oligonucleotide. There is a set of 32 - 40
oligonucleotide probe cells for each gene or mRNA
transcript.
3The set of 32 - 40 probe cells consists of 16 -
20 perfect match (PM) probes and a paired set of
16 - 20 mismatch (MM) probes. Perfect match
(PM) A 25-mer complementary to a reference
sequence of interest (e.g., part of a
gene). Mismatch(MM) same sequence as PM but with
a single base change for the middle (13th) base
to its Watson-Crick complement . The purpose of
the MM probe is to detect non-specific binding
and background noise.
4MAS Data Architecture
Assay Steps
Output file
Prepare Define Targets
Exp.
.EXP
Process probe array Scan Chip Compute cell
intensities Analyze cell intensities
.DAT
.CEL
.RPT
.CHP
5Data file architecture in Affymetrix Microarray
Suite
The main software from Affymetrix is MAS 5.0
(Microarray Suite Version 5.0). Important files
include .DAT file image file produced by the
scanner, contains 10 x 10 pixels for each PM and
MM probe cell on the chip (107 pixels), 50MB
size .CEL file cell intensity file, contains PM
and MM probe values. The .CEL file contains one
value for every PM and MM cell on the chip. .CHP
file combines (or summarizes) the 20 PM and 20
MM values for each probe set to give a single
expression level for each probe set. The .CHP
file contains one expression value for every
probe set on the chip.
6RNA sample quality checking
- RNA quality and quantity is of great importance
to the success and - reproducibility of gene expression profiling
experiments. There are three - complementary methods to check RNA quality
- run the initial total RNA on an agarose gel and
examine the ribosomal RNA bands. Non-distinct
ribosomal RNA bands indicate degradation. - RNA purity and yield can be determined by optical
density (OD) absorbance measurements at
wavelengths of 260 and 280 nm. 260/280
absorbance readings should be carried out for
both total RNA and biotinylated cRNA. - Acceptable 260/280 ratios fall in the range of
1.8 to 2.1. - Ratios below 1.8 indicate possible protein
contamination. - Ratios above 2.1 indicate the presence of
degraded RNA, truncated cRNA transcripts, and/or
excess free nucleotides.
7(3) Assessing RNA sample quality with the Agilent
Bioanalyzer
Evaluation of RNA sample quality can also be
carried out with the Agilent Bioanalyzer which is
similar to a gel in concept. This has the
advantage of requiring less RNA than a gel (e.g.
200 pg of total RNA, 1 mL volume).
Electropherograms (fr electrophoresis) detect
and measure the ribosomal 5S, 18S, and 28S bands.
8Assessing RNA quality with the Bioanalyzer
When viewed as a virtual gel, the 28S, 18S, and
5S rRNA bands are easily identifiable. The
sample RNA in Lane 7 outlined in red
demonstrates how degraded RNA appears. Conversely
lane 8 contains good quality RNA outlined in
blue and the corresponding electropherogram can
be viewed in Figure 3 in the next slide. RNA
samples with visible degradation should not be
further processed.
9Assessing RNA quality with the Bioanalyzer
electropherograms
Ideally, the ratio of 28S/18S bands should be
close to 2, but samples that show clear 18S and
28S peaks are acceptable
Poor quality sample The 18S and 28S rRNA peaks
are jagged and there are many small peaks
(fragmented RNA) throughout the run demonstrating
degradation.
Good quality sample The 5S, 18S, and 28S rRNA
peaks are clean without additional erroneous
spikes.
10Array quality checking
Inspect the probe array image (.DAT file) for the
presence of image artifacts (i.e., high/low
intensity spots, scratches, high regional, or
overall background,etc.) on the array. There
should be no white speckling, holes, smudges,
areas of saturation or uneven hybridization on
the chip. Defects in probe array images may also
be seen in (1) Histograms of probe
intensities (2) q-q plots of probe intensities
11Histograms of raw PM or MM intensities are
usually skewed to the right. Thus they can be
transformed to a log2 scale for plotting. The
logarithm function tends to squeeze together the
larger values in a data set (values larger than
1) and stretches out the smaller values (values
less than 1). The same data has been plotted
using the two scales above.
12Why we do log transformation of probe intensities
The logarithm function tends to squeeze together
the larger values in a data set (values larger
than 1) and stretches out the smaller values
(values less than 1). This is illustrated in the
plot of a log function below e.g. y log(x)
2.0 2.2 2.6 2.8
The first two values on the x axis are 2.0 and
2.2. Their logarithms, 0.69 and 0.79 on the y
axis are much closer. The second two values, 2.6
and 2.8, are squeezed even more. Their logarithms
are 0.96 and 1.03.
1.03 0.96 0.79 0.69
13Log transformation of probe intensities
0.2 0.24 0.4 0.45
-0.8 -0.92 -1.39 -1.61
The values of 0.4 and 0.45 on the x axis have
logarithms (-0.92 and -0.80) that are further
apart. The values of 0.20 and 0.25 on the x axis
are stretched even further. Their logarithms are
-1.61 and -1.39, respectively.
14Checking array quality with images and histograms
The following set of arrays show one normal array
and arrays with different types of defect. The
probe array images and histograms of the log2
intensities clearly show the defects.
15(No Transcript)
16(No Transcript)
17q-q plots provide a visual comparison of
two populations A q-q plot is a plot of
the quantiles of one data set against the
quantiles of the second data set. The elements
of each population are sorted (or ranked), and
the quantiles of the first data set are plotted
against the quantiles of the second data set.
By a quantile, we mean the fraction (or percent)
of points below the given value. That is, the 0.3
(or 30) quantile is the point at which 30
percent of the data fall below and 70 fall above
that value.
18Example of a q-q plot
Suppose our data sets are Data set 1 6.0, 3, 1,
4.1, 5.2, 2 Data set 2 4.1, 1.2, 3.2, 2.1, 6.1,
5.0 We sort (or rank) the data Data set 1 -gt
1, 2, 3, 4.1, 5.2, 6.0 Data set 2 -gt 1.2 2.1,
3.2, 4.1, 5.0, 6.1 Then we plot the
corresponding quantiles against each other e.g 1
in data set 1 vs 1.2 in data set 2, 2 in data set
1 vs 2.1 in data set 2, etc
19q-q plot example
A 45-degree reference line is also plotted. If
the two sets come from a population with the same
distribution, the points should fall
approximately along this reference line. The
greater the departure from this reference line,
the greater the evidence for the conclusion that
the two data sets have come from populations with
different distributions.
20Many aspects of the distributions can be
simultaneously tested. For example, shifts in
location, shifts in scale, changes in symmetry,
and the presence of outliers can all be detected
from this plot. If the two data sets come from
populations whose distributions differ only by a
shift in location, the points should lie along a
straight line that is displaced either up or down
from the 45-degree reference line.
21This q-q plot shows that These 2 batches do
not appear to have come from populations with a
common distribution. The batch 1 values are
significantly higher than the corresponding batch
2 values. The differences are increasing from
values 525 to 625. Then the values for the 2
batches get closer again.
22Checking array quality with histograms and q-q
plots
The histograms of the log2 PM intensities and the
q-q plots of the log2 PM intensities and log2 MM
intensities clearly show the defects in the
following arrays. The corresponding histograms
and q-q plots of the log2 MM intensities show the
defects in a similar manner.
23(No Transcript)
24Histograms of .CEL intensities are a good
visualization tool for identifying saturation,
which is seen as an additional peak at the
highest log intensity in the plot. Saturated
probes should be excluded from further analysis.
25Box plots
Box plots are another good visualization tool for
analyzing the overall intensities of all probes
across the array. The box is drawn from the 25th
and 75th percentiles in the distribution of
intensities. The median, or 50th percentile, is
drawn inside the box. The whiskers (lines
extending from the box) describe the spread of
the data.
26Affymetrix QC
3'/5' ratio of housekeeping genes Reverse
transcriptase synthesizes cDNA starting from the
3'-end of an mRNA and finishing at the
5'-end. Thus most GeneChip expression arrays
contain probes for the regions corresponding to
the 3', middle and 5'-end of the house keeping
genes such as GAPDH and Actin. The ratio of
signal intensity for 3' probes to that from 5'
probes is a measure of the number of cDNA
synthesis reactions that went to completion (full
length cDNA is synthesized). An ideal ratio is
1 while a higher value indicates that many cDNAs
were started but did not go to completion. The
3'/5' ratio for the housekeeping genes should be
at most 3 or lt 4 for GAPDH (oligonucleotide array
Best Practices Workshops 2003). (http//jbpc.mb
l.edu/jbpc/GenomesMedia/HoffmannBestPractices.htm)
27Percent Present (P) values
The number of probe sets called Present
relative to the total number of probe sets on the
array is considered to be an important quality
control criterion by the oligonucleotide array
Best Practices Workshops (BPW) 2003. (http//jbpc
.mbl.edu/jbpc/GenomesMedia/HoffmannBestPractices.h
tm) Percent Present (P) values depend on
multiple factors including cell/tissue type,
biological or environmental stimuli, probe array
type, and overall quality of RNA. BPW 2003 for
PMT settings of 1,800 Volts, with SAPE
amplification present calls should be gt 25
(A mouse, rat, human arrays) Extremely low P
values are a possible indication of poor sample
quality. Replicate samples should have similar
P values (Affymetrix). present calls should
be consistent within a range of 10 (BPW 2003)
28Normalization/scaling
The overall brightness (or image intensity) of
a scanned array usually varies from array to
array. Sometimes it is possible to see
differences in the overall intensities of the
scanned images of arrays The upper probe array
image in the handout is abnormally bright
compared to the lower (normal) image.
29Normalization/scaling
- Differences in the overall intensities
- of arrays can also be seen by
- plotting histograms of the probe
- intensities on a single plot (or the
- log2 of the probe intensities)
- The histogram of the abnormal bright
- array is shifted to the right
- Another array is also shifted to the
- right, but to a lesser degree
30Normalization/scaling
We can also see the need for normalization from
boxplots Boxplots show the minimum, lower
quartile, upper quartile, median and maximum
value of the log 2 intensity levels for each
array The central box is drawn from the 25th to
the 75th percentiles with a line for the
median. Lines(whiskers) join the box to the
extreme observations, apart from suspected
outliers which are plotted as individual points.
The abnormal bright array is on the left
31Normalization/scaling
- There are many sources of experimental variation
that cause differences in the - overall intensities of arrays
- During RNA preparation e.g. differences in
number of cell between samples, mRNA extraction,
labeling - During manufacture of arrays e.g. amount of
oligos on cells - During hybridization e.g. amount of sample
applied, amount of target hybridized - After hybridization e.g. optical measurements,
label intensity, differences in signal
measurement sensitivity/differences in scanning
between samples
32Normalization/scaling
In order to compare gene expression levels
between two or more arrays, it is necessary that
the overall intensity of each array be brought to
the same level. There are many different ways
to achieve this, but the simplest method is
to multiply every expression value on one array
by a constant scaling factor to make the overall
level equivalent to that of another array. NB a
special case occurs when there are large overall
changes in gene expression between arrays e.g.
diauxic shift experiment. The method of
normalization described above should not be used
in this case.
33What we trying to do by normalizing note this
is quantile normalization
34Normalization to mRNA
The most widely used normalization method is to
normalize to mRNA. This method assumes that the
total amount of mRNA in a cell remains constant
under various experimental conditions. It is
assumed that the large majority of genes remain
relatively constant and those genes whose
expression levels are increased are countered by
genes whose expression levels decrease. Thus the
sum (or mean) of the expression values of each
array is simply scaled to a common number. This
gives the scaling factor. Then every expression
value on the array is multiplied by this scaling
factor. This is known as the global method of
scaling/normalization in MAS 5.0 and is used in
the majority of experiments.
35Normalization to housekeeping genes
This method is based on the assumption that
certain housekeeping genes such as GAPDH or
actin are constitutively expressed at a constant
level. Then we check that these genes are
expressed at the same level in the control and in
the sample. If they are not, we multiply all
expression levels in the sample array by a
scaling factor until the maintenance genes match
the expression levels in the control.
36Normalization to housekeeping genes
It is rare, however, that maintenance genes are
completely unaffected by an experiment and there
have been numerous reports that under certain
circumstances this is not the case Suzuki et al,
2000, Biotechniques 29 332 - 337. For
example, if one is comparing the expression
levels of a small number of genes where other
more global normalization methods are
inappropriate this method is acceptable.
However, even in these cases, it should be kept
in mind that small differences may be due to
differences in the standard. In general, this
is not considered an acceptable normalization
method.
37Normalization to a reference RNA
A special situation arises when a large
proportion of genes are affected in the
experiment due to global effects such as
starvation or heat shock. The global method of
scaling does not make sense in this case because
the overall intensities among arrays are no
longer comparable. Differences in overall
intensity are due to biological and/or
environmental conditions. In this case, spike
(hybridization) controls can be used for
normalization, addition of mRNA from a foreign
organism for which there are probes on the chip.
If spike controls are added in equal amounts to
mRNA preparations, it is possible to use the
ratio of their intensities to scale. For
example, Affymetrix adds bacterial and
bacteriophage probes to their eukaryotic
GeneChips. This allows the data from each
microarray to be scaled to the levels of these
known amounts of RNA standards. This is known as
the Selected Probe Sets method of
scaling/normalization in MAS 5.0.
38Normalization to a reference RNA
The scaling factor is calculated from the ratio
of the intensity levels of the reference probes
in the arrays. Then every expression level in
the array is multiplied by this scaling
factor. The disadvantages of this method of
normalization arise from the compounded errors
associated with the amounts of the standard
applied to each array and errors in the
measurements of these standards. But how we know
that the mRNA preparations are comparable at the
time that we add the spike controls to them? -
use spike control labeling only if there is no
other choice.
39Normalization in MAS 5.0
MAS calculates the mean intensity of the baseline
array by averaging the intensity values of every
probe set on the array with the exception of the
top and bottom 2 of the probe set intensities
(this is known as the trimmed mean). The trimmed
mean is used to make the measurement robust (i.e.
not sensitive to outliers). The mean intensity
of the comparison array is then multiplied by a
Normalization Factor (NF) to bring it to mean
intensity level of the baseline array.
40Scaling in MAS 5.0
As before, MAS calculates the trimmed mean
intensity of each array. The mean intensity of
the array is then multiplied by a Scaling Factor
(SF) to bring it to an arbitrary Target Intensity
value (usually 150) set by the user. Thus,
scaling allows a number of experiments to become
normalized to one Target Intensity, allowing
comparison between any two experiments.
In a particular set of experiments, the Scaling
Factor value for all the arrays should be close
to each other (within three-fold of each other).
41Scaling/Normalization in MAS
The scaling/normalization factors (calculated by
the global method) should be comparable among
arrays. Best Practices Workshop 2003 for PMT
settings of 1,800 Volts and using SAPE
amplification the scaling factor should be lt 5
Scaling/normalization factors calculated by the
Selected Probe Sets method should also be
equivalent for arrays being compared. Large
discrepancies between scaling/normalization
factors (e.g. three-fold or greater) may indicate
significant assay or biological variability or
degradation of the transcripts used for
scaling/normalization, which leads to noisier
data.
42Non-linear scaling
The linear scaling method described above is
suitable when the same degree of scaling can be
applied to all expression levels. However, there
are instances where the plot of intensities
appears like the middle plot below
After scaling
After scaling
After scaling
Gene intensities for chip 2
Before scaling
Before scaling
Before scaling
Gene intensities for chip 1
Gene intensities for chip 1
Gene intensities for chip 1
One scale factor, linear data
One scale factor, nonlinear data
Nonlinear scaling, nonlinear data
43Non-linear scaling
For non-linear, non-zero y-intercept data (centre
graph), the use of a single scaling factor fails
After scaling chip 1 with a single factor,
all genes are off the diagonal, implying
different expression on the two chips when it
should be the same for most genes. Such
nonlinear data are better scaled by scaling the
weakly and highly expressed genes separately.
Within a small range there is linearity and a
simple scaling factor can be used. (see Schadt
et al, 2000, J. Cell. BioChem 80 192-202, and
dChip Li and Wong, 2001 Genome Biology 2
1-11). Another example of a piece-wise linear
scaling technique for non-linear data is LOWESS
(LOcally WEighted Scatterplot Smoothing)
normalization (regression).
44Non-linear scaling
How many scaling factors should be used? If we
were to use as many scale factors as there are
genes we will scale out the signal, i.e. those
genes that are significantly different between
the two arrays. Thus, for example, Schadt et al,
J Cell Biochem (2000) 80 192-202, use two scale
factors. As for the case of linear scaling,
robustness is an issue. One way to make these
methods more robust is to perform the
normalization in multiple steps first fit a
curve, remove a certain fraction of points which
are furthest away from the curve and which
therefore are likely to be outliers, and finally
fit a curve again on the remaining points.
Multiple runs of these can be done to increase
robustness.
45- Quantile normalization is another method of
normalization that is used in RMA and other
packages from Bioconductor (www.bioconductor.org).
- Each array contains a certain distribution of
expression values and this method aims at making
the distributions across various arrays not just
similar but identical. - This is done as follows
- Imagine that the expression values from the
various arrays have been loaded into a
spreadsheet with genes along rows and arrays
along columns - Each column is sorted first. Next the value in
each row is replaced with the average of the
values in this row - Finally, the columns are unsorted (i.e. the
effect of the sorting step is reversed so items
in a column go back wherever they came from) - The distributions in all arrays become
identical in this process
46Original expression levels
Quantile normalization - Step 1 Sort each column
47Quantile normalization - Step 2 unsort the
columns back to their original positions
48Normalization in dChip
Note that MAS applies normalization after probe
set summarization, but it is also possible to do
normalization before the summarization
step. e.g. in dChip By default, in a set of
samples, the chip with median overall brightness
is chosen, and all other arrays are normalized to
it using invariant set normalization. In
invariant set normalization, a subset of PM
probes are obtained as follows. The probes on
each of the arrays are ranked by expression value
and probes with similar ranks comprise the
invariant set. The trimmed mean or curve fitting
is carried out on the invariant set only and then
the actual shift or shifts are performed on all
the probes.
49Conclusions
Normalization is important Hoffmann et al
Genome Biol. (2002) 3 Research 0033.1
0033.11 applied 12 algorithms by combining 4
normalization methods (invariant feature
normalization, model-based expression values,
invariant probe set and global scaling) and 3
methods of statistical analysis (parametric
ANOVA/non-parametric ANOVA and SAM) to two sample
groups. (1) The normalization method had more
impact on the resulting gene list than the method
of statistical analysis. (2) The underlying
assumptions of current normalization methods
generally fail for tissues of different origins
or sample subjected to extreme treatment
conditions. The user needs to apply existing
normalization methods with caution in such
situations.
50Which method of normalization is best?
There are no easy answers (1) MAS 5.0 is not
the final answer. (2) Bolstad et al (2003)
Bioinformatics 19 185-193, have compared
normalization methods. Their main conclusion
is that non-linear methods and their derivatives
that do not require a baseline and the quantile
method typically perform better than plain
unnormalized data or normalizing through mean
shifting. (3) Other references Irizarry et al
(2003) Biostat (in press) Zhou and Abagyan,
Current Opinion in Drug Discovery and Research
(2003), 6(3) 339-345
51Conclusions
The Association of Biomolecular Resource
Facilities (ABRF) conducted a multicenter study
in 2002 to identify factors that contribute to
variability in oligonucleotide microarray
results. This retrospective study used data from
835 MG-U74A and HG-U95A Affymetrix arrays that
were previously generated in the microarray core
facilities of the members of the Microarray
Research Group (MARG) (Knudtson et. al Factors
contributing to variability in DNA microarray
results the ABRF Microarray Research Group 2002
study. J. Biomol. Tech. 2002 108).
52Conclusions
The results of this study indicate that
Lab-to-lab variation accounted for the greatest
source of error. .CHP data generated by
different institutions may not be easily compared
without further normalization in comparative
analyses. Generating high quality microarray
data requires vigorous quality control measures
at each individual step of the process, starting
with the experimental design of the study, the
generation of samples, extraction of RNA,
labeling of the probe, and microarray
hybridization. In addition To minimize
experimental variability, the same personnel
should perform the hybridization, washing, and
scanning steps. The same Affymetrix protocols
should be used for hybridization, washing, and
scanning, and microarrays from the same
production lot should be used in comparative
studies.
53Acknowledgements
CBiS/CMA Sue Wilson Simon Easteal Yvonne
Pittelkow John Maindonald
JCSMR Frances Shannon
BRF/JCSMR Karen Edwards Kaiman Peng