1
MIcroarray Data Analysis System (MIDAS)
Wei Liang
July 22, 2002
2
MIcroarray Data Analysis System (MIDAS)
  • Where does MIDAS fit in TIGR's microarray
    software family?

SpotFinder
PCRSCORE
SLITRACK
MADAM
MeV
MIDAS
McCoder
MABCDS
ExpDesigner
3
Microarray Data Analysis System (MIDAS)
  • Where to get it? Answer: Software on Microarray
    (o\MIDAS\Alpha-Version\)

4
Inputs and Outputs
  • Inputs
  • Full size tav-format data file(s)
  • 4800, 19200, 27648, 32448 spots
  • 12-pen, 48-pen
  • Outputs
  • Normalized, trimmed or full-size tav-format data
    file(s)
5
Input Selections
  • Directory mode
  • Batch-process multiple tav files under a
    directory
  • Pair mode
  • Flip-dye consistency checking and normalization
  • File mode
  • Single tav file process

6
Select Data Pane
7
Single File Selection Mode
8
File Pair(s) Selection Mode
9
Directory Selection Mode
10
Operation Method Options
11
Operations
12
Total Intensity Normalization
13
Total Intensity Normalization and Trimming
  • Single file Total Intensity Normalization
    parameters
  • Cy3, Cy5 cutoff thresholds
  • Reference
  • Algorithm
  • Set Cy3, Cy5 to 0 for Non-B/C-flagged genes
  • Calculate Total Intensity factor = ΣCy3 / ΣCy5
  • Scale Cy3 (to Cy3') or Cy5 (to Cy5') by the
    factor
  • Trim genes with Cy3 < threshold or Cy5 <
    threshold
  • Output trimmed tav file
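
The single-file steps above can be sketched in Python. This is a minimal illustration, not the MIDAS implementation. It assumes each spot is a hypothetical (Cy3, Cy5, flag) tuple, that "B" and "C" are the flags to keep, and that the "Reference" parameter names the channel left unchanged while the other channel is rescaled. The function name and return shape are invented for the example.

```python
def total_intensity_normalize(spots, cy3_cutoff, cy5_cutoff,
                              reference="Cy3", good_flags=("B", "C")):
    """Sketch of single-file total intensity normalization and trimming.

    spots: list of (cy3, cy5, flag) tuples (hypothetical layout).
    Returns (factor, kept), where kept holds (index, cy3, cy5) tuples.
    """
    # Step 1: zero both channels for spots whose flag is not B or C.
    cleaned = [(cy3, cy5) if flag in good_flags else (0.0, 0.0)
               for cy3, cy5, flag in spots]

    # Step 2: total intensity factor = sum(Cy3) / sum(Cy5).
    total_cy3 = sum(c3 for c3, _ in cleaned)
    total_cy5 = sum(c5 for _, c5 in cleaned)
    if total_cy3 == 0 or total_cy5 == 0:
        raise ValueError("a channel has zero total intensity")
    factor = total_cy3 / total_cy5

    # Step 3 (assumption): keep the reference channel, rescale the other
    # so that both channels end up with the same total intensity.
    if reference == "Cy3":
        scaled = [(c3, c5 * factor) for c3, c5 in cleaned]
    else:
        scaled = [(c3 / factor, c5) for c3, c5 in cleaned]

    # Step 4: trim spots with Cy3 < threshold or Cy5 < threshold.
    kept = [(i, c3, c5) for i, (c3, c5) in enumerate(scaled)
            if c3 >= cy3_cutoff and c5 >= cy5_cutoff]
    return factor, kept


spots = [(100, 200, "B"), (300, 600, "C"), (50, 50, "X"), (10, 20, "B")]
factor, kept = total_intensity_normalize(spots, cy3_cutoff=50, cy5_cutoff=50)
```

With this toy input the Cy5 channel is twice as bright overall, so the factor halves it. The "X"-flagged spot and the dim fourth spot are both trimmed.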

14
Multiple File Total Intensity Normalization and Trimming
  • Multiple file Total Intensity Normalization
    parameters
  • Cy3, Cy5 cutoff thresholds
  • Cross file cutoff percentage
  • Reference
  • Algorithm
  • For each file, set Cy3, Cy5 to 0 for
    Non-B/C-flagged genes
  • For each file, calculate Total Intensity factor =
    ΣCy3 / ΣCy5
  • For each file, scale Cy3 (to Cy3') or Cy5 (to
    Cy5') by the factor
  • For each file, flag good genes with Cy3 or
    Cy5 > threshold
  • Across files, for the same gene, find genes with
    (# of good genes / # of files) > cross file
    cutoff
  • Output evenly-trimmed tav files for these genes

15
Cross File Trimming
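[Figure legend: flagged good gene, flagged bad gene]
If cross file cutoff = 60%:
  • 2/6 = 33% (trimmed)
  • 6/6 = 100% (kept)
  • 3/6 = 50% (trimmed)
  • 4/6 = 67% (kept)
  • 6/6 = 100% (kept)
  • 5/6 = 83% (kept)
  • 5/6 = 83% (kept)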
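
The cross-file rule in this example can be written as a short Python sketch. It assumes the per-file "good gene" flags are already available as booleans, one list per file with genes in the same order. The function name and data layout are hypothetical.

```python
def cross_file_keep(good_flags_per_file, cutoff_percent):
    """Return indices of genes whose share of 'good' files exceeds the cutoff.

    good_flags_per_file: list of per-file boolean lists (hypothetical layout),
    all of the same length and in the same gene order.
    """
    n_files = len(good_flags_per_file)
    n_genes = len(good_flags_per_file[0])
    keep = []
    for gene in range(n_genes):
        # Count the files in which this gene was flagged good.
        n_good = sum(1 for flags in good_flags_per_file if flags[gene])
        if 100.0 * n_good / n_files > cutoff_percent:
            keep.append(gene)
    return keep


# Reproduce the slide's example: seven genes flagged good in
# 2, 6, 3, 4, 6, 5 and 5 of six files respectively.
good_counts = [2, 6, 3, 4, 6, 5, 5]
flags = [[file_index < count for count in good_counts]
         for file_index in range(6)]
kept_genes = cross_file_keep(flags, cutoff_percent=60)
```

Only the genes at 33% and 50% fall at or below the 60% cutoff, so the other five are written to every output file. This is what makes the trimmed files line up gene for gene.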
16
Low Intensity Filter
  • File mode: Cy3, Cy5 threshold cutoff trimming;
    no normalization performed
  • Directory mode: Cy3, Cy5 threshold cutoff cross
    file trimming; no normalization performed
17
LocFit Normalization and Trimming
  • Lowess correction
  • Block mode: apply LOWESS on each block data set
  • Global mode: apply LOWESS on the full data set
18
LocFit Normalization
  • Why LOWESS?
  • Observations
    1. Tilted tails at the low end and the high end
    2. Mean not centered at 0
19
LocFit Normalization (Cont'd)
[Figure labels: Gene X, Exp factor, Bio factor]
  • If Cy3 and Cy5 are equally expressed,
    log2(Cy5/Cy3) = 0
  • Two factors contribute to the apparent
    up-regulation of gene X
    1. Biological factors (which we are interested in)
    2. Experimental factors, e.g. different sensitivity
       to the red and green lasers (which we are NOT
       interested in and want to remove)
20
LocFit Normalization (Cont'd)
[Figure labels: Gene X, Exp factor, Bio factor]
We need to find a way to extract the experimental
factors.
Approach: assume that similar experimental factors
apply to genes close to each other in the
logProd-logRatio plot. Predict the Exp factor from a
group of locally neighboring data, which is
equivalent to a curve fitting problem.
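
As a small illustration of the plot coordinates, the Python sketch below computes the two axes for one spot. It assumes logRatio = log2(Cy5/Cy3) and logProd = log10(Cy3*Cy5), which are the quantities named on the later LOWESS slides. The function name is hypothetical.

```python
import math


def log_ratio_and_product(cy3, cy5):
    """Return (logRatio, logProd) for one spot.

    logRatio = log2(Cy5 / Cy3); logProd = log10(Cy3 * Cy5).
    Both intensities must be positive for the logarithms to exist.
    """
    if cy3 <= 0 or cy5 <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(cy5 / cy3), math.log10(cy3 * cy5)


equal = log_ratio_and_product(100.0, 100.0)   # equally expressed spot
up = log_ratio_and_product(100.0, 400.0)      # Cy5 four times brighter
```

An equally expressed spot sits on the logRatio = 0 line. Normalization aims to restore that baseline once the experimental factor has been estimated and removed.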
21
LOWESS
  • Stands for LOcally WEighted Scatterplot Smoothing,
    also known as Locally-Weighted Regression (LWR)
  • Historic work: Local Regression is an old data
    smoothing method proposed back in 1829 by a Danish
    mathematician.
  • Modern work: kernel functions and weight functions
    were better studied in the 1950s.
  • An approach to fitting curves and surfaces to
    noisy data by a multivariate smoothing procedure:
    fitting a linear or quadratic function of the
    predictor variables in a moving fashion.
22
LOWESS (Cont'd)
  • Local Regression
[Figure: global linear regression vs. local linear regression]
  • Localness: Smooth Parameter / Bandwidth
  • Low Smooth Param: bumpy curve (undersmoothed,
    follows noise)
  • High Smooth Param: smooth curve (oversmoothed,
    info loss)
23
LOWESS (Cont'd)
  • Weight function

Once the local data set is determined, a weight
function is applied to all data in it. The greater
the distance between the query point and a
neighboring point, the less that neighbor affects the
prediction at the query point.
Tri-cube function:
w(u) = (1 - u^3)^3 if u < 1
w(u) = 0 otherwise
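
The tri-cube weight can be written directly in Python. This sketch assumes u is the distance between the query point and a neighbor divided by the local bandwidth, so that u = 0 at the query point and u = 1 at the edge of the local data set. The slide does not define u explicitly.

```python
def tricube(u):
    """Tri-cube weight: (1 - |u|^3)^3 inside the window, 0 outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


weights = [tricube(u) for u in (0.0, 0.5, 1.0, 2.0)]
```

The weight is 1 at the query point, decays smoothly, and reaches 0 at the window edge, so distant neighbors drop out of the fit without an abrupt cutoff.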
24
LOWESS (Cont'd)
  • Local linear regression model
  • Tri-cube weight function
  • Least Squares

Estimated values of log2(Cy5/Cy3) as a function of
log10(Cy3*Cy5)
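
The three ingredients on this slide combine into a compact fitting routine. The Python sketch below is a plain, non-robust local linear fit written for illustration, not the LocFit code used by MIDAS. It assumes the smooth parameter is the fraction of points in each local window, and the name `lowess_fit` is hypothetical.

```python
import math


def tricube(u):
    """Tri-cube weight: (1 - |u|^3)^3 inside the window, 0 outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def lowess_fit(x, y, smooth=0.33):
    """Fit a weighted least-squares line around each x[i]; return fitted y.

    smooth (assumption): fraction of the data used in each local window.
    """
    n = len(x)
    k = min(n, max(2, int(math.ceil(smooth * n))))
    fitted = []
    for x0 in x:
        # Bandwidth: distance from x0 to its k-th nearest point.
        h = sorted(abs(xi - x0) for xi in x)[k - 1]
        if h == 0:
            h = 1e-12  # all k nearest points coincide with x0
        w = [tricube((xi - x0) / h) for xi in x]
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swy = sum(wi * yi for wi, yi in zip(w, y))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            # Degenerate window: fall back to the weighted mean.
            fitted.append(swy / sw)
        else:
            slope = (sw * swxy - swx * swy) / denom
            intercept = (swy - slope * swx) / sw
            fitted.append(intercept + slope * x0)
    return fitted


xs = [float(i) for i in range(10)]
ys = [2.0 * xi + 1.0 for xi in xs]
curve = lowess_fit(xs, ys, smooth=0.5)
```

In the microarray setting x would be log10(Cy3*Cy5) and y would be log2(Cy5/Cy3). On data that are exactly linear, as in the toy example, the local fits reproduce the line. This version rescans all points for every query, which is fine for a sketch but slow for tens of thousands of spots.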
25
LOWESS (Cont'd)
  • Use the estimated curve y(xi) to correct raw data

log2(Ri/Gi)' = log2(Ri/Gi) - y(xi)
             = log2(Ri/Gi) - log2(2^y(xi))
             = log2(Ri/Gi * 1/2^y(xi))
Ri' = Ri, Gi' = Gi * 2^y(xi)
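
Applied to intensities, the correction leaves the red channel alone and rescales the green channel. The Python sketch below assumes R is Cy5, G is Cy3, and `fitted` holds the estimated curve value y(xi) for each spot. The function name is hypothetical.

```python
import math


def lowess_correct(red, green, fitted):
    """Return corrected (R', G') pairs with R' = R and G' = G * 2**y."""
    return [(r, g * 2.0 ** y) for r, g, y in zip(red, green, fitted)]


red, green, fitted = [400.0, 100.0], [100.0, 100.0], [1.0, 0.0]
corrected = lowess_correct(red, green, fitted)
new_ratios = [math.log2(r / g) for r, g in corrected]
```

For the first spot the raw log ratio is 2 and the fitted curve value is 1, so the corrected log ratio is 1. The second spot has a curve value of 0 and is unchanged.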
26
LOWESS (Cont'd)
LOWESS-corrected logRatio-logProd plot
27
Slice Analysis
  • Use user-specified SliceWindow to calculate
    SliceWidth

SliceWidth = (max(logProd) - min(logProd)) / SliceWindow
  • Slide the Slice Window along the logProd axis
  • Find the logRatio distribution in each slice
    window, calculate σ
  • Generate tav files for those genes falling into
    a user-specified range
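
One possible reading of these steps is sketched below in Python. It rests on several assumptions the slide does not spell out: SliceWindow is treated as the number of adjacent, non-overlapping slices, σ is the sample standard deviation of logRatio within a slice, and the "user-specified range" is taken to mean genes more than `z_cutoff` standard deviations from the slice mean. All names are hypothetical, and the function returns gene indices rather than writing tav files.

```python
import statistics


def slice_analysis(log_prod, log_ratio, slice_window, z_cutoff):
    """Return (slice_width, selected gene indices) under the stated assumptions."""
    lo, hi = min(log_prod), max(log_prod)
    slice_width = (hi - lo) / slice_window

    # Assign every gene to a slice along the logProd axis.
    slices = {}
    for i, value in enumerate(log_prod):
        if slice_width > 0:
            index = min(int((value - lo) / slice_width), slice_window - 1)
        else:
            index = 0
        slices.setdefault(index, []).append(i)

    selected = []
    for index in sorted(slices):
        members = slices[index]
        ratios = [log_ratio[i] for i in members]
        if len(ratios) < 2:
            continue  # sigma is undefined for a single point
        mean = statistics.mean(ratios)
        sigma = statistics.stdev(ratios)
        if sigma == 0:
            continue  # no spread, so nothing stands out
        selected.extend(i for i in members
                        if abs(log_ratio[i] - mean) / sigma > z_cutoff)
    return slice_width, sorted(selected)


log_prod = [1.0, 1.1, 1.2, 1.3, 1.4, 3.0, 3.1, 3.2, 3.3, 3.4]
log_ratio = [0.0, 0.0, 0.0, 0.0, 4.0, 0.1, -0.1, 0.1, -0.1, 0.0]
width, outliers = slice_analysis(log_prod, log_ratio,
                                 slice_window=2, z_cutoff=1.5)
```

Computing σ per slice lets the cutoff adapt to the larger logRatio scatter that is typical at low intensities. A truly sliding (overlapping) window would be a natural variation on this sketch.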
28
Slice Analysis (Cont'd)
29
Flip-Dye Normalization and Trimming
[Figure labels: File 1 (R1, G1), File 2 (R2, G2)]
30
Replicates Analysis
31
Questions? Suggestions?
My email: wliang_at_tigr.org