Title: The Genome Access Course Microarray Informatics
1The Genome Access Course Microarray Informatics
Maple Leaves
2Yeast Microarray
3From Brown PO, Botstein D. Exploring the new
world of the genome with DNA microarrays. Nat
Genet. 1999;21:33-7.
4From Brown PO, Botstein D. Exploring the new
world of the genome with DNA microarrays. Nat
Genet. 1999;21:33-7.
5From Brown PO, Botstein D. Exploring the new
world of the genome with DNA microarrays. Nat
Genet. 1999;21:33-7.
8Two-Color Detection
- Cy3 and Cy5
- Forgiving of array imperfections
- Generates robust ratio data
- Scalable
9Systems
- Affymetrix
- Axon
- Home-grown
10Terminology
- Feature: an array element
- Probe: a feature corresponding to a defined
sequence (synthetic oligos or cDNAs)
- Target: nucleic acid pool of unknown sequence
11Issues
- Array Fabrication
- Probe Preparation
- Hybridization
- Image Analysis
- Data Visualization/Analysis
- Data Storage
12Image Analysis
- Feature Identification
- Background
- Median vs. mean
- GenePix
13Data Analysis
- Normalization
- Replicates
- Expression Analysis
- Clustering
- Hierarchical Clustering
- k-Means
- Self-Organizing Maps
14Data Storage
22Normalization
- Corrects for systematic bias in data
- Attempts to remove non-biological influences from
biological data
- Provides a baseline for comparison between
microarrays
- May allow comparison of data from one platform to another
23Sources of Variation
- Printing and/or tip problems
- Labeling and dye effects (differing amounts of
RNA labeled between the 2 channels)
- Differences in the power of the two lasers (or
other scanner problems)
- Difference in DNA concentration on arrays (plate
effects)
- Spatial biases in ratios across the surface of
the microarray due to uneven hybridization
24An Example
- In array 1, the Cy5 dye labels twice as
efficiently as the Cy3 dye
- Everything else is the same
- Ratios will be 2 instead of 1
- In array 2, the Cy5 dye labels only 1.5 times as
efficiently
- Ratios will be 1.5 instead of 1
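The example above can be worked through numerically. The sketch below is only an illustration: the intensities are made-up values, and `observed_ratios` is a hypothetical helper, not part of any microarray package.

```python
import math

# Hypothetical true signal for five genes that are equally expressed in
# both samples, so every true Cy5/Cy3 ratio is 1.
true_signal = [200.0, 500.0, 1000.0, 4000.0, 16000.0]

def observed_ratios(signal, cy5_efficiency):
    """Cy5/Cy3 ratios when Cy5 labels `cy5_efficiency` times as well as Cy3."""
    return [(s * cy5_efficiency) / s for s in signal]

array1 = observed_ratios(true_signal, 2.0)  # every ratio is 2 instead of 1
array2 = observed_ratios(true_signal, 1.5)  # every ratio is 1.5 instead of 1

# On a log scale the dye bias is a constant offset for each array.
offset1 = math.log10(array1[0])
offset2 = math.log10(array2[0])
```

Because each array carries its own constant offset, the two arrays cannot be compared directly until that offset is removed.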
25Methods
- Housekeeping genes
- Spiked-in controls
- Mean/median log ratio adjustment
- Loess curve fit
- Local versions of above two
- Use of reverse-labeled replicates (dye swap
experiments)
26Methods Normalization Factor/Function Calculation
- Global mean or median normalization: This method
calculates a normalization factor based on the
selected elements, as either the mean or median
log ratio of those selected elements. This value
is then subtracted from the log ratio of the
elements to which this normalization factor is to
be applied. If you plot a histogram of log
ratios, you get a roughly normal distribution.
What this method does is simply shift that
distribution along the x-axis, so that it is
centered around zero.
- Intensity-dependent normalization: A function is
generated, using the selected elements, that is
intensity-dependent. This is usually done as a
loess fit to a plot of log ratio vs log (mean
intensity). This function is then applied to the
data.
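Global median normalization as described above could be sketched as follows. The function name `global_median_normalize`, the `selected` parameter, and the sample log ratios are assumptions made for illustration.

```python
import statistics

def global_median_normalize(log_ratios, selected=None):
    """Subtract the median log ratio of the selected elements from all log ratios.

    `selected` is a list of indices of the elements used to compute the
    factor; by default every element is used.
    """
    if selected is None:
        selected = range(len(log_ratios))
    factor = statistics.median(log_ratios[i] for i in selected)
    return [lr - factor for lr in log_ratios], factor

# Made-up log10 ratios scattered around a dye bias of about 0.30.
log_ratios = [0.25, 0.30, 0.35, 0.31, 0.29]
normalized, factor = global_median_normalize(log_ratios)
```

After the shift the median of `normalized` is zero, which is the "centered around zero" histogram the slide describes.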
27Background Subtraction
- Element Intensity - Element Background Intensity =
Background Corrected Intensity
- Can be local or global
- Sensitive to spot morphologies
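A minimal sketch of local versus global background subtraction. Both function names are hypothetical, and using the median of the local backgrounds as the global value is an assumption; the slide does not say how a global background is estimated.

```python
import statistics

def subtract_local_background(intensity, local_background):
    """Subtract each element's own background estimate."""
    return [i - b for i, b in zip(intensity, local_background)]

def subtract_global_background(intensity, local_background):
    """Subtract one array-wide background value from every element.

    Assumption: the global value is the median of the local backgrounds.
    """
    global_bg = statistics.median(local_background)
    return [i - global_bg for i in intensity]

# Made-up element and background intensities.
intensity = [1200.0, 850.0, 400.0]
background = [100.0, 150.0, 120.0]

local_corrected = subtract_local_background(intensity, background)
global_corrected = subtract_global_background(intensity, background)
```

Corrected values can come out negative for weak spots; how to handle those is left open here.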
28Global Mean Normalization
- Element Intensity / Global Mean Element Intensity =
Global Mean Normalized Intensity
- Performed for both sets of intensities separately
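As an illustration, the division can be applied to each channel on its own. `global_mean_normalize` is a hypothetical name and the intensities are made up; the Cy5 values are exactly twice the Cy3 values to mimic a dye bias.

```python
import statistics

def global_mean_normalize(intensities):
    """Divide every element intensity by the mean intensity of its channel."""
    mean = statistics.mean(intensities)
    return [v / mean for v in intensities]

# Made-up channel intensities; Cy5 is uniformly twice as bright as Cy3.
cy5 = [400.0, 1000.0, 2000.0, 8000.0]
cy3 = [200.0, 500.0, 1000.0, 4000.0]

cy5_norm = global_mean_normalize(cy5)
cy3_norm = global_mean_normalize(cy3)
```

Each normalized channel has a mean of 1, so the uniform twofold difference between the channels disappears.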
29Local Mean Normalization across Microarray Surface
- Corrects spatial artifacts
- Requires x, y coordinates
- Element Intensity / Local Mean Intensity =
Local Mean Normalized Intensity
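One way to sketch the local version is to take, for each element, the mean intensity of all elements within a fixed distance on the array. The circular neighbourhood, the `radius` parameter, and the function name are assumptions; the slide does not specify how "local" is defined.

```python
import math
import statistics

def local_mean_normalize(x, y, intensity, radius):
    """Divide each element by the mean intensity of elements within `radius`.

    Every element is its own neighbour, so the local mean is always defined.
    This brute-force version is O(n^2) and only meant for illustration.
    """
    normalized = []
    for xi, yi, vi in zip(x, y, intensity):
        neighbours = [v for xj, yj, v in zip(x, y, intensity)
                      if math.hypot(xj - xi, yj - yi) <= radius]
        normalized.append(vi / statistics.mean(neighbours))
    return normalized

# Two made-up 2x2 blocks of spots; the second block is uniformly brighter,
# as might happen with uneven hybridization.
x = [0.0, 1.0, 0.0, 1.0, 10.0, 11.0, 10.0, 11.0]
y = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0]
intensity = [100.0] * 4 + [200.0] * 4

flat = local_mean_normalize(x, y, intensity, radius=2.0)
```

With this toy layout every normalized value is 1.0, because the brightness difference is purely regional.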
30Logarithmic Transformation
- Perform a logarithmic transformation of all
intensities
- Base 10 is common
31Calculate Mean Log(Intensities) and Log(Ratios)
- X axis is the mean gene expression level in the
two samples
- Mean (Log(Intensity)) = Log(Geometric Mean Intensity)
- Y axis is a measure of differential gene
expression between the two samples
- Log(ONE/TWO) = Log(ONE) - Log(TWO)
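The two axes can be computed directly from the channel intensities. This is a small sketch using base 10 logs; `log_mean_and_ratio` is a hypothetical name and the intensities are made up.

```python
import math

def log_mean_and_ratio(one, two):
    """Return (mean log10 intensity, log10 ratio) for each element."""
    log_one = [math.log10(v) for v in one]
    log_two = [math.log10(v) for v in two]
    mean_log = [(a + b) / 2 for a, b in zip(log_one, log_two)]   # X axis
    log_ratio = [a - b for a, b in zip(log_one, log_two)]        # Y axis
    return mean_log, log_ratio

# Made-up intensities for two elements in samples ONE and TWO.
one = [1000.0, 100.0]
two = [10.0, 100.0]

mean_log, log_ratio = log_mean_and_ratio(one, two)
```

For the first element the mean log intensity is 2, which is the log of the geometric mean sqrt(1000 x 10) = 100, and the log ratio is 2, a 100-fold difference.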
32Local Mean Normalization across Element Signal
Intensity
- Generates a loess fit from local mean intensity
levels
- Log (Ratio) - Local Mean Log (Ratio) from the fit =
Residual = Corrected Log (Ratio)
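A rough sketch of the intensity-dependent correction. It uses a simplified loess, a local linear fit with tricube weights and no robustness iterations, written in plain Python; real analyses would normally use a statistics package. The name `loess_fit`, the `frac` parameter, and the sample data are assumptions.

```python
import math

def loess_fit(x, y, frac=0.5):
    """Simplified loess: a tricube-weighted local linear fit at each x."""
    n = len(x)
    k = max(2, int(math.ceil(frac * n)))  # number of points in each window
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        w = [max(0.0, 1 - (abs(xi - x0) / h) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

# Made-up data: no gene is differentially expressed, but the log ratio
# drifts with intensity (an intensity-dependent dye bias).
mean_log = [2.0 + 0.2 * i for i in range(10)]
log_ratio = [0.1 * a + 0.2 for a in mean_log]

trend = loess_fit(mean_log, log_ratio, frac=0.5)
corrected = [lr - t for lr, t in zip(log_ratio, trend)]  # residuals
```

Here the residuals are all essentially zero, since the only structure in the toy data is the bias the fit removes.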
33Local Variance Correction across Element Signal
Intensity
- Loess fit from local standard deviations
- Corrected Log (Ratio) / Local Standard Deviation =
Local Z-Score
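The local Z-score could be sketched as below. Note the substitution: instead of a loess fit through local standard deviations, this uses the plain standard deviation of a sliding window of elements ordered by intensity. The function name, the `window` parameter, and the data are assumptions.

```python
import statistics

def local_z_scores(mean_log_intensity, corrected_log_ratio, window=5):
    """Divide each corrected log ratio by the SD of its intensity neighbours.

    Elements are ranked by mean log intensity; each element's neighbours are
    the elements within `window // 2` ranks on either side (itself included).
    """
    n = len(mean_log_intensity)
    order = sorted(range(n), key=lambda i: mean_log_intensity[i])
    half = window // 2
    z = [0.0] * n
    for rank, i in enumerate(order):
        lo = max(0, rank - half)
        hi = min(n, rank + half + 1)
        neighbours = [corrected_log_ratio[j] for j in order[lo:hi]]
        sd = statistics.pstdev(neighbours)
        z[i] = corrected_log_ratio[i] / sd if sd > 0 else 0.0
    return z

# Made-up values for three elements.
mean_log = [1.0, 2.0, 3.0]
corrected = [-0.2, 0.0, 0.2]

z = local_z_scores(mean_log, corrected, window=3)
```

Dividing by a local rather than a global standard deviation keeps the noisier low-intensity elements from dominating the list of apparent changes.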