Title: MIcroarray Data Analysis System MIDAS
1MIcroarray Data Analysis System (MIDAS)
Wei Liang
July 22, 2002
2MIcroarray Data Analysis System (MIDAS)
- Where does MIDAS fit in TIGR's microarray
software family?
SpotFinder
PCRSCORE
SLITRACK
MADAM
MeV
MIDAS
McCoder
MABCDS
ExpDesigner
3Microarray Data Analysis System (MIDAS)
- Where to get it? Answer: Software on Microarray
(o\MIDAS\Alpha-Version\)
4Inputs and Outputs
- Inputs
- Full size tav-format data file(s)
- 4800, 19200, 27648, 32448 spots
- 12-pen, 48-pen
- Outputs
- Normalized, trimmed or full-size tav-format data file(s)
5Input Selections
- Directory mode
- Batch-process multiple tav files under a
directory
- Pair mode
- Flip-dye consistency checking and normalization
- File mode
- Single tav file process
6Select Data Pane
7Single File Selection Mode
8File Pair(s) Selection Mode
9Directory Selection Mode
10Operation Method Options
11Operations
12Total Intensity Normalization
13Total Intensity Normalization Trimming
- Single file Total Intensity Normalization
parameters
- Cy3, Cy5 cutoff thresholds
- Set Cy3, Cy5 to 0 for non-B/C-flagged genes
- Calculate the total intensity factor: ΣCy3 / ΣCy5
- Scale Cy3 (to Cy3') or Cy5 (to Cy5') by the factor
- Trim genes with Cy3 < threshold or Cy5 < threshold
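The single-file steps above can be sketched in Python as follows. This is a minimal illustration, not MIDAS code: the spot layout (dicts with `gene`, `cy3`, `cy5`, `flag` keys), the function name, the choice to scale Cy5 rather than Cy3, and applying the cutoffs after scaling are all assumptions.

```python
def total_intensity_normalize(spots, cy3_cutoff, cy5_cutoff, good_flags=("B", "C")):
    """Return (factor, kept_spots) for one file.

    spots: list of dicts with keys 'gene', 'cy3', 'cy5', 'flag'
    (a hypothetical stand-in for the rows of a tav file).
    """
    # Step 1: zero both channels for spots whose flag is not in good_flags.
    cleaned = []
    for s in spots:
        ok = s["flag"] in good_flags
        cleaned.append({**s,
                        "cy3": s["cy3"] if ok else 0.0,
                        "cy5": s["cy5"] if ok else 0.0})

    # Step 2: total intensity factor = sum(Cy3) / sum(Cy5).
    total_cy3 = sum(s["cy3"] for s in cleaned)
    total_cy5 = sum(s["cy5"] for s in cleaned)
    if total_cy5 == 0:
        raise ValueError("no Cy5 signal left after flag filtering")
    factor = total_cy3 / total_cy5

    # Step 3: scale Cy5 so that both channels have the same total (assumed choice).
    scaled = [{**s, "cy5": s["cy5"] * factor} for s in cleaned]

    # Step 4: trim spots with Cy3 < cutoff or Cy5 < cutoff.
    kept = [s for s in scaled
            if s["cy3"] >= cy3_cutoff and s["cy5"] >= cy5_cutoff]
    return factor, kept


example_spots = [
    {"gene": "a", "cy3": 100.0, "cy5": 50.0, "flag": "B"},
    {"gene": "b", "cy3": 300.0, "cy5": 150.0, "flag": "C"},
    {"gene": "c", "cy3": 1000.0, "cy5": 1000.0, "flag": "X"},
    {"gene": "d", "cy3": 10.0, "cy5": 5.0, "flag": "B"},
]
factor, kept = total_intensity_normalize(example_spots, 50.0, 50.0)
```

In this made-up example the Cy3 total is twice the Cy5 total after flag filtering, so Cy5 is doubled; spot `c` is zeroed by its flag and spot `d` falls below the cutoffs, so both are trimmed.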
14Multiple File Total Intensity Normalization
Trimming
- Multiple file Total Intensity Normalization
parameters
- Cy3, Cy5 cutoff thresholds
- Cross-file cutoff percentage
- For each file, set Cy3, Cy5 to 0 for non-B/C-flagged genes
- For each file, calculate the total intensity factor: ΣCy3 / ΣCy5
- For each file, scale Cy3 (to Cy3') or Cy5 (to Cy5') by the factor
- For each file, flag good genes with Cy3 or Cy5 > threshold
- Across files, for the same gene, find genes with
(# of good genes / # of files) > cross-file cutoff
- Output evenly trimmed tav files for these genes
15Cross File Trimming
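Figure legend: flagged good gene, flagged bad gene
If cross-file cutoff = 60%, the share of files in which each gene is flagged good:
- 2/6 → 33%
- 6/6 → 100%
- 3/6 → 50%
- 4/6 → 66%
- 6/6 → 100%
- 5/6 → 83%
- 5/6 → 83%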
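The cross-file rule in this example can be sketched in Python as below. The gene names `g1` to `g7` and the per-file good/bad patterns are hypothetical (only the counts 2/6, 6/6, 3/6, 4/6, 6/6, 5/6, 5/6 come from the slide), and the function name is an assumption.

```python
def cross_file_trim(good_by_gene, cutoff_percent):
    """Keep genes flagged good in more than cutoff_percent of the files.

    good_by_gene: dict mapping gene -> list of booleans, one per file,
    True where the gene was flagged good in that file.
    """
    kept = []
    for gene, flags in good_by_gene.items():
        percent_good = 100.0 * sum(flags) / len(flags)
        if percent_good > cutoff_percent:
            kept.append(gene)
    return kept


def pattern(n_good, n_files=6):
    """Hypothetical per-file pattern with n_good good flags out of n_files."""
    return [True] * n_good + [False] * (n_files - n_good)


good_by_gene = {
    "g1": pattern(2), "g2": pattern(6), "g3": pattern(3), "g4": pattern(4),
    "g5": pattern(6), "g6": pattern(5), "g7": pattern(5),
}
kept_genes = cross_file_trim(good_by_gene, cutoff_percent=60.0)
```

With a 60% cutoff, the genes at 2/6 and 3/6 are trimmed from every file and the other five are kept in every file, which is what makes the output files evenly trimmed.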
16Low Intensity Filter
- Cy3, Cy5 threshold cutoff trimming; no normalization performed
- Cy3, Cy5 threshold cutoff cross-file trimming; no normalization performed
17LocFit Normalization Trimming
- Block mode: apply LOWESS to each block's data set
- Global mode: apply LOWESS to the full data set
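The difference between the two modes is only in how the spots are grouped before the fit. A minimal Python sketch of that dispatch follows; the `block` and `log_ratio` keys and both function names are hypothetical, and `center_log_ratio` is a deliberately trivial stand-in for the LOWESS correction, used here only so the grouping is easy to see.

```python
from collections import defaultdict


def apply_by_mode(spots, correct, mode="global"):
    """Apply `correct` to all spots at once (global) or block by block."""
    if mode == "global":
        return correct(spots)
    if mode == "block":
        groups = defaultdict(list)
        for s in spots:
            groups[s["block"]].append(s)
        corrected = []
        for block in sorted(groups):
            corrected.extend(correct(groups[block]))
        return corrected
    raise ValueError("mode must be 'global' or 'block'")


def center_log_ratio(spots):
    """Stand-in for LOWESS: subtract the group's mean log ratio."""
    mean = sum(s["log_ratio"] for s in spots) / len(spots)
    return [{**s, "log_ratio": s["log_ratio"] - mean} for s in spots]


spots = [
    {"block": 1, "log_ratio": 1.0}, {"block": 1, "log_ratio": 3.0},
    {"block": 2, "log_ratio": 0.0}, {"block": 2, "log_ratio": -2.0},
]
global_result = apply_by_mode(spots, center_log_ratio, mode="global")
block_result = apply_by_mode(spots, center_log_ratio, mode="block")
```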
18LocFit Normalization
1. Tilted tails at the low end and the high end
2. Mean not centered at 0
19LocFit Normalization (Cont'd)
Figure labels: Gene X, Exp factor, Bio factor
- If Cy3 and Cy5 are equally expressed, log2(Cy5/Cy3) = 0
- Two factors contribute to the apparent up-regulation of gene X:
1. Biological factors (which we are interested in)
2. Experimental factors, e.g. different sensitivity to the
red and green lasers (which we are NOT interested in and
want to get rid of)
20LocFit Normalization (Cont'd)
Figure labels: Gene X, Exp factor, Bio factor
We need to find a way to extract the experimental factors.
Approach: assume that similar experimental factors apply
to genes that lie close to each other in the
logProd-logRatio plot. Predict the Exp factor from a
group of locally neighboring data points, which is
equivalent to a curve-fitting problem.
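For reference, the two axes of the logProd-logRatio plot can be computed per spot as in the short Python sketch below, using the definitions given later in the deck (log2(Cy5/Cy3) against log10 of the Cy3 and Cy5 product). The function name and the rejection of non-positive intensities are assumptions.

```python
import math


def log_ratio_log_prod(cy3, cy5):
    """Return (logRatio, logProd) = (log2(Cy5/Cy3), log10(Cy3*Cy5))."""
    if cy3 <= 0 or cy5 <= 0:
        raise ValueError("both intensities must be positive")
    return math.log2(cy5 / cy3), math.log10(cy3 * cy5)


equal = log_ratio_log_prod(1000.0, 1000.0)   # equally expressed spot
up = log_ratio_log_prod(100.0, 400.0)        # Cy5 four times Cy3
```

An equally expressed spot lands at logRatio 0, and a spot with Cy5 four times Cy3 lands at logRatio 2.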
21LOWESS
- Stands for locally weighted scatterplot smoothing; also
called Locally-Weighted Regression (LWR)
- Historic work: local regression is an old data
smoothing method, proposed back in 1829 by a Danish
mathematician.
- Modern work: kernel functions and weight functions
were better studied in the 1950s.
- An approach to fitting curves and surfaces to noisy data
by a multivariate smoothing procedure: fitting a linear
or quadratic function of the predictor variables in a
moving fashion.
22LOWESS (Cont'd)
Figure panels: global linear regression vs. local linear regression
- Localness: smoothing parameter / bandwidth
- Low smoothing parameter: bumpy curve (under-smoothed, fits noise)
- High smoothing parameter: smooth curve (over-smoothed, information loss)
23LOWESS (Cont'd)
Once the local data set is determined, a weight function
is applied to all data in it. The greater the distance
between the query point and a neighboring data point, the
less that neighbor affects the prediction at the query point.
Tri-cube function, with u the distance scaled by the bandwidth:
\[ w(u) = \begin{cases} (1 - u^3)^3, & u < 1 \\ 0, & u \ge 1 \end{cases} \]
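As a small illustration, the tri-cube weight, w(u) = (1 - u^3)^3 for u < 1 and 0 otherwise, can be written in Python as below. Treating u as a non-negative distance already divided by the bandwidth is an assumption of this sketch, as is the function name.

```python
def tricube(u):
    """Tri-cube weight for a scaled distance u (the sign is ignored)."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


weights = [tricube(u) for u in (0.0, 0.5, 1.0, 2.0)]
```

The weight is 1 at the query point itself, falls off smoothly with distance, and reaches 0 at the edge of the local data set.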
24LOWESS (Cont'd)
- Local linear regression model
- Tri-cube weight function
- Least squares
Estimated values of log2(Cy5/Cy3) as a function of
log10(Cy3 × Cy5)
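The three ingredients listed above (local linear model, tri-cube weights, least squares) combine into the following simplified Python sketch. It is not the LocFit implementation that MIDAS uses: it has no robustness iterations, the bandwidth rule (distance to the k-th nearest point) is one common choice assumed here, and the names `lowess_fit` and `smooth` are hypothetical. In the MIDAS setting x would be log10(Cy3 × Cy5) and y would be log2(Cy5/Cy3).

```python
import math


def tricube(u):
    """Tri-cube weight for a scaled distance u >= 0."""
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def lowess_fit(x, y, smooth=0.5):
    """Return the locally fitted value y(x_i) at every x_i.

    smooth: fraction of the points used in each local fit
    (hypothetical name for the smoothing parameter).
    """
    n = len(x)
    k = min(n, max(2, math.ceil(smooth * n)))  # size of each local data set
    fitted = []
    for x0 in x:
        # Bandwidth: distance from x0 to its k-th nearest point.
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        w = [tricube(abs(xi - x0) / h) for xi in x]
        # Weighted least squares for y = a + b * (x - x0);
        # the intercept a is the fitted value at x0.
        dx = [xi - x0 for xi in x]
        sw = sum(w)
        swx = sum(wi * d for wi, d in zip(w, dx))
        swxx = sum(wi * d * d for wi, d in zip(w, dx))
        swy = sum(wi * yi for wi, yi in zip(w, y))
        swxy = sum(wi * d * yi for wi, d, yi in zip(w, dx, y))
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)  # degenerate case: weighted mean
        else:
            fitted.append((swxx * swy - swx * swxy) / denom)
    return fitted


xs = [float(i) for i in range(10)]
ys = [2.0 * xi + 1.0 for xi in xs]
fit = lowess_fit(xs, ys, smooth=0.5)
```

On exactly linear input, as in this example, a local linear fit returns the input values unchanged, which is a convenient sanity check before running it on noisy data.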
25LOWESS (Cont'd)
- Use the estimated curve y(xi) to correct raw data
\[ \log_2(R_i/G_i) \rightarrow \log_2(R_i/G_i) - y(x_i) = \log_2(R_i/G_i) - \log_2 2^{y(x_i)} = \log_2\!\left(\frac{R_i}{G_i} \cdot \frac{1}{2^{y(x_i)}}\right) \]
\[ R_i' = R_i, \qquad G_i' = G_i \cdot 2^{y(x_i)} \]
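In code, the correction amounts to leaving the red channel alone and multiplying the green channel by 2 raised to the fitted value, which subtracts y(xi) from the log ratio. A minimal Python sketch (the function name and the sample numbers are made up for illustration):

```python
import math


def lowess_correct(r, g, y_fit):
    """Return (R', G') with R' = R and G' = G * 2**y_fit."""
    return r, g * 2.0 ** y_fit


r, g = 400.0, 100.0            # raw log2(R/G) = 2
r_new, g_new = lowess_correct(r, g, y_fit=1.0)
corrected_log_ratio = math.log2(r_new / g_new)
```

Here the fitted curve value is 1, so the corrected log ratio is the raw log ratio minus 1.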
26LOWESS (Cont'd)
LOWESS-corrected logRatio-logProd plot
27Slice Analysis
- Use the user-specified SliceWindow to calculate SliceWidth:
\[ \text{SliceWidth} = \frac{\max(\text{logProd}) - \min(\text{logProd})}{\text{SliceWindow}} \]
- Slide the slice window along the logProd axis
- Find the logRatio distribution in each slice window and
calculate σ
- Generate tav files for those genes falling into a
user-specified range
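A rough Python sketch of these steps follows. Several details are assumptions because the slide does not spell them out: the slices are taken as non-overlapping bins of width SliceWidth rather than a continuously sliding window, σ is the sample standard deviation of logRatio within a slice, and the user-specified range is interpreted as a range of |logRatio - mean| / σ. The function name and tuple layout are hypothetical.

```python
import statistics
from collections import defaultdict


def slice_analysis(points, slice_window, z_min, z_max):
    """Select genes whose within-slice z-score magnitude lies in [z_min, z_max].

    points: list of (gene, log_prod, log_ratio) tuples.
    slice_window: divisor used to derive the slice width.
    """
    lo = min(p[1] for p in points)
    hi = max(p[1] for p in points)
    width = (hi - lo) / slice_window

    # Assign each point to a slice along the logProd axis.
    slices = defaultdict(list)
    for gene, log_prod, log_ratio in points:
        idx = min(int((log_prod - lo) / width), slice_window - 1) if width > 0 else 0
        slices[idx].append((gene, log_ratio))

    selected = []
    for idx in sorted(slices):
        ratios = [lr for _, lr in slices[idx]]
        if len(ratios) < 2:
            continue  # sigma is undefined for a single point
        mean = statistics.mean(ratios)
        sigma = statistics.stdev(ratios)
        if sigma == 0:
            continue
        for gene, lr in slices[idx]:
            if z_min <= abs(lr - mean) / sigma <= z_max:
                selected.append(gene)
    return selected


points = [
    ("a", 0.0, 0.1), ("b", 1.0, -0.1), ("c", 2.0, 0.1), ("d", 3.0, -0.1),
    ("e", 4.0, 3.0),
    ("f", 6.0, 0.2), ("g", 7.0, -0.2), ("h", 8.0, 0.2), ("i", 9.0, -0.2),
    ("j", 10.0, 0.0),
]
outliers = slice_analysis(points, slice_window=2, z_min=1.5, z_max=float("inf"))
```

In this made-up data set only gene `e` stands out from the other genes in its slice, so it is the only one selected.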
28Slice Analysis (Cont'd)
29Flip-Dye Normalization Trimming
Figure labels: File 1 (channels R1, G1), File 2 (channels R2, G2)
30Replicates Analysis
31Questions? Suggestions?
My email: wliang_at_tigr.org