1
MIcroarray Data Analysis System (version 2.19)
Wei Liang, October 2004
2
Microarray Data Flow
Image Analysis
.tiff Image File
Raw Gene Expression Data
Gene Annotation
Normalization / Filtering
Normalized Data with Gene Annotation
Expression Analysis
Data Entry / Management
Interpretation of Analysis Results
3
MIDAS is a Normalization and Filtering tool
for microarray data analysis!
4
MIDAS also serves as a data pre-processor for clustering
analysis (MeV).
5
Why Normalization and Filtering?
.tiff Image Files
Raw Data File
Sample1 mRNA
Cy3 intensity
RT
RT
cDNA array
Sample2 mRNA
Cy5 intensity
6
Why Normalization and Filtering?
  • The hypothesis underlying microarray analysis is
    that the measured intensities for each arrayed
    gene represent its relative expression level.
  • We use these intensities to identify biologically
    relevant patterns of expression by comparing
    measured levels between states on a gene-by-gene
    basis.
  • However, before the levels can be appropriately
    compared, one generally performs a number of
    transformations on the data to eliminate
    questionable or low quality data, to adjust the
    measured intensities to facilitate comparisons,
    and to select those genes that are significantly
    differentially expressed.

7
MIDAS data analysis methods
  • 8 normalization/transformation methods

Total Intensity normalization
Ratio Statistics normalization
LOWESS (Locfit) normalization
Standard deviation regularization
Iterative linear regression normalization
In-slide replicates analysis
Iterative log mean centering normalization
MA-ANOVA
  • 10 quality control filtering methods

Flip-dye consistency checking
Low intensity filter
Spot QC flag checking
Ratio Statistics confidence interval checking
Signal/Noise checking
Invalid-intensity checking
Cross-file-trim
  • 3 significant genes identification methods

Slice analysis (non-statistical)
Cross-slide replicates t-test (statistical)
Cross-slide one-class SAM (statistical)
8
Graphical scripting language
9
Graphical scripting language
  • Read input files
  • Define analysis pipeline and set parameters for
    each analysis module
  • Write output files

11
Sample data
12
LOWESS (Locfit) normalization
R-I plot: logRatio vs. logIntensityProduct
Observations:
1. Tilted tails at the low-intensity end and the
high-intensity end
2. Mean not centered at 0; intensity dependent
13
LOWESS (Locfit) normalization
[Figure labels: Gene X, Exp factor, Bio factor]
  • If Cy3 and Cy5 are equally expressed, log2(Cy5/Cy3) = 0
  • Two factors contribute to the apparent up-regulation
    of gene X:

1. Biological factors (which we are interested in)
2. Experimental factors, e.g. different
sensitivity to red and green lasers (which we are
NOT interested in and want to remove)
15
LOWESS (Locfit) normalization
  • Local linear regression model
  • Tri-cube weight function
  • Least Squares

Estimated values of log2(Cy5/Cy3) as a function of
log10(Cy3 × Cy5)
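The three ingredients on this slide (local linear regression, tri-cube weights, least squares) can be sketched as follows. This is a minimal illustration, not the Locfit routine that MIDAS calls; the function names, the `frac` smoothing parameter, and the nearest-neighbour bandwidth are assumptions made for the example.

```python
import math

def tricube(u):
    """Tri-cube weight: (1 - |u|^3)^3 for |u| < 1, otherwise 0."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def lowess_fit(x, y, frac=0.33):
    """Return the locally fitted value y(x_i) at every x_i.

    For each point, a straight line is fitted by weighted least squares,
    with tri-cube weights based on the distance along x. The bandwidth is
    the distance to the k-th nearest point, k = ceil(frac * n) (assumed).
    """
    n = len(x)
    k = min(n, max(2, int(math.ceil(frac * n))))
    fitted = []
    for xi in x:
        dists = sorted(abs(xj - xi) for xj in x)
        h = dists[k - 1] or 1e-12          # avoid division by zero
        w = [tricube((xj - xi) / h) for xj in x]
        sw = sum(w)
        # Weighted means, then weighted slope of the local line.
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted
```

For microarray data, `x` would hold log10(Cy3 × Cy5) and `y` would hold log2(Cy5/Cy3) for each spot.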
16
LOWESS (Locfit) normalization
Use the estimated curve y(x_i) to correct the raw data:
\[
\log_2(R_i'/G_i') = \log_2(R_i/G_i) - y(x_i)
  = \log_2(R_i/G_i) - \log_2 2^{y(x_i)}
  = \log_2\!\left(\frac{R_i}{G_i}\cdot\frac{1}{2^{y(x_i)}}\right)
\]
\[
R_i' = R_i, \qquad G_i' = G_i \cdot 2^{y(x_i)}
\]
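The correction step amounts to rescaling the green channel by 2 raised to the fitted value. A small sketch, assuming the fitted values y(x_i) are already available as a list (the function name `apply_lowess_correction` is made up for this example):

```python
import math

def apply_lowess_correction(red, green, fitted):
    """Subtract the fitted curve from each log ratio.

    R_i is left unchanged and G_i is multiplied by 2**y(x_i), so that
    log2(R_i / G_i') = log2(R_i / G_i) - y(x_i).
    """
    new_green = [g * 2.0 ** y for g, y in zip(green, fitted)]
    return list(red), new_green

# Example: a spot with raw log ratio 2 and fitted bias 1 ends up at 1.
r, g = apply_lowess_correction([400.0], [100.0], [1.0])
corrected = math.log2(r[0] / g[0])
```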
17
LOWESS (Locfit) normalization
LOWESS-corrected RI plot
18
Standard deviation regularization
Assumption: within each block and each slide,
spots should have the same spread of
log2(Cy5/Cy3) values.
SD-Reg scales the (Cy3, Cy5) intensity pair of
each spot so that the spots within each block
or slide have the same standard
deviation as those in the other blocks or slides.
19
Standard deviation regularization
  • Let aij be the raw log ratio for the jth spot
    in the ith block (or slide)
  • Let a'ij be the scaled log ratio for the jth spot
    in the ith block (or slide)

where Ni denotes the number of genes in the ith block or
ith slide, M denotes the number of blocks or
slides, and āi denotes the log-ratio mean of the ith
block (or ith slide)
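One way to give every block the same spread is sketched below. The scaling rule used here (divide each block by its SD relative to the geometric mean of all block SDs, with the sample SD) is an assumption chosen for illustration and may differ from the exact MIDAS formula; it works on log ratios rather than on the (Cy3, Cy5) pairs themselves.

```python
import math
import statistics

def sd_regularize(blocks):
    """Rescale log ratios so that every block ends up with the same SD.

    blocks[i][j] is the raw log ratio a_ij. Each block is divided by
    (its SD / geometric mean of all block SDs), which is an assumed rule.
    """
    sds = [statistics.stdev(b) for b in blocks]
    geo_mean_sd = math.exp(sum(math.log(s) for s in sds) / len(sds))
    return [[a / (s / geo_mean_sd) for a in b] for b, s in zip(blocks, sds)]

# Two blocks with SD 1 and SD 2 are both brought to the same SD.
scaled = sd_regularize([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0]])
```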
20
Standard deviation regularization
21
Flip dye replicates consistency filter
  • Flip dye experiments help reduce random error
  • The intensities in the file pair are flipped,
    i.e. R1/G1 corresponds to G2/R2,
    or R1 corresponds to G2 and G1 to R2

22
Flip dye replicates consistency filter
  • Calculate expression levels for all genes in the
    flip-dye pair
  • Filter out genes with inconsistent expression levels
    between the flip-dye replicates
  • For genes that pass the consistency check,
    take the geometric mean of the corresponding
    intensities from the replicate pair

How is consistency measured between replicates?
23
Flip dye replicates consistency filter
100% consistency
24
Flip dye replicates consistency Filter
  • SD cut vs. Threshold cut

SD cut
Regardless of the dataset, always cuts the same
percentage for the same ?
Threshold cut
The percentage cut depends on the specified
log-ratio consistency range:
-1 < ... < 1 (log2 scale), equivalently
1/2 < ... < 2 (ratio scale)
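The filter and its two cut modes might look roughly like this. The consistency score used here, log2((R1/G1) × (R2/G2)), is an assumption (it is 0 when the flipped pair agrees perfectly); the function name and the `mode` and `limit` parameters are likewise hypothetical.

```python
import math
import statistics

def flip_dye_filter(r1, g1, r2, g2, mode="threshold", limit=1.0):
    """Return {spot index: (mean_a, mean_b)} for spots that pass the check.

    mode="threshold": keep spots with |score| < limit
    mode="sd":        keep spots within `limit` SDs of the mean score
    """
    scores = [math.log2((a / b) * (c / d))
              for a, b, c, d in zip(r1, g1, r2, g2)]
    if mode == "sd":
        mu, sd = statistics.mean(scores), statistics.stdev(scores)
        keep = [abs(s - mu) <= limit * sd for s in scores]
    else:
        keep = [abs(s) < limit for s in scores]
    merged = {}
    for i, ok in enumerate(keep):
        if ok:
            # One sample is read as R1 and G2, the other as G1 and R2,
            # so those are the pairs that get a geometric mean.
            merged[i] = (math.sqrt(r1[i] * g2[i]), math.sqrt(g1[i] * r2[i]))
    return merged

# Spot 0 is consistent (2-fold up, then 2-fold down); spot 1 is not.
result = flip_dye_filter([200.0, 200.0], [100.0, 100.0],
                         [100.0, 400.0], [200.0, 100.0])
```

The SD mode removes a fixed share of the distribution whatever the data look like, while the threshold mode removes whatever falls outside the fixed range, which matches the contrast drawn on this slide.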
26
Slice Analysis filter
  • Remove genes whose z-scores fall outside a range
    of interest

28
Slice Analysis filter
  • Define a slice window
  • Slide the window along the log(IntensityProduct)
    axis
  • Calculate logRatioMean and logRatioSD of the data
    points within each slice window
  • Calculate the Z-score of each data point:
    Z-score = (logRatio - logRatioMean) / logRatioSD
  • Trim data with Z-scores outside the range of interest
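The steps above can be sketched as follows. How MIDAS defines the window is not stated in the slides, so this example assumes a window of a fixed number of neighbouring spots along the log(IntensityProduct) axis; the function names and the `window` and `z_limit` parameters are illustrative.

```python
import statistics

def slice_z_scores(log_prod, log_ratio, window=50):
    """Z-score of every spot relative to its neighbours in intensity.

    Spots are ordered by log(IntensityProduct); for each spot, the mean and
    SD of logRatio are taken over about `window` spots centred on it.
    """
    order = sorted(range(len(log_prod)), key=lambda i: log_prod[i])
    z = [0.0] * len(order)
    half = window // 2
    for rank, i in enumerate(order):
        lo = max(0, rank - half)
        hi = min(len(order), rank + half + 1)
        vals = [log_ratio[j] for j in order[lo:hi]]
        mean = statistics.mean(vals)
        sd = statistics.stdev(vals) if len(vals) > 1 else 0.0
        z[i] = (log_ratio[i] - mean) / sd if sd > 0 else 0.0
    return z

def slice_trim(log_prod, log_ratio, z_limit=2.0, window=50):
    """Indices of spots whose |Z-score| stays within `z_limit`."""
    z = slice_z_scores(log_prod, log_ratio, window)
    return [i for i, zi in enumerate(z) if abs(zi) <= z_limit]

# Twenty spots with small log ratios and one clear outlier at index 10.
lp = [float(i) for i in range(20)]
lr = [0.1, -0.1] * 10
lr[10] = 5.0
kept = slice_trim(lp, lr, z_limit=2.0, window=20)
```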

29
Slice Analysis filter
30
Analysis packaging
myAnalysis.prj
31
MIDAS graphing
32
MIDAS graphing
R-I plot (.prc)
FlipDye Diagnostic plot (.rrc)
Intensity plot (.ity, .lty)
Z-score Distribution plot (.his)
SAM plot (.sam)
Box plot (.box)
33
MIDAS data viewer
34
Statistically significant gene identification
methods
Two methods are implemented in this release of MIDAS:
  • Cross-slide replicates one-class t-test
  • Cross-slide replicates one-class SAM

35
SAM (Significance Analysis of Microarrays)
A statistical technique for finding significant
genes in a set of microarray experiments.
Reference:
Tusher, V.G., R. Tibshirani and G. Chu. 2001.
Significance analysis of microarrays applied to
the ionizing radiation response. Proceedings of
the National Academy of Sciences USA 98:
5116-5121.
Designs:
  • two-class unpaired
  • two-class paired
  • multi-class unpaired
  • censored survival
  • one-class (available in this release)

36
SAM (Significance Analysis of Microarrays)
One-class SAM
Identifies genes whose mean expression across
experiments differs from a user-specified
mean.
  • Assign a score (d) to each gene based on its
    change in expression relative to the standard
    deviation of repeated measurements for the gene
  • Genes with scores greater than a threshold (Δ) are
    deemed potentially significant
  • For these potentially significant genes, the
    proportion likely to have been wrongly identified
    by chance, the False Discovery Rate (FDR), is
    estimated
  • The goal is to pick a set of differentially
    expressed genes with an FDR acceptable to the user
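A heavily simplified sketch of the one-class idea is given below. It is not the full SAM procedure of Tusher et al. and not the MIDAS code: the fudge factor `s0` is a fixed constant chosen for illustration, the threshold is applied directly to |d| in both directions, and the null distribution comes from random sign flips around the user-specified mean. All names and defaults are assumptions.

```python
import random
import statistics

def d_scores(data, mu0=0.0, s0=0.1):
    """SAM-style score per gene: (mean - mu0) / (standard error + s0)."""
    out = []
    for values in data:
        se = statistics.stdev(values) / len(values) ** 0.5
        out.append((statistics.mean(values) - mu0) / (se + s0))
    return out

def one_class_sam(data, threshold, mu0=0.0, s0=0.1, n_perm=200, seed=0):
    """Return (indices of called genes, estimated FDR).

    The FDR estimate is the median number of null scores beyond the
    threshold, divided by the number of genes actually called.
    """
    rng = random.Random(seed)
    observed = d_scores(data, mu0, s0)
    called = [i for i, d in enumerate(observed) if abs(d) > threshold]
    if not called:
        return called, 0.0
    false_counts = []
    for _ in range(n_perm):
        # Flip the sign of each measurement around mu0 at random.
        flipped = [[mu0 + rng.choice((-1, 1)) * (v - mu0) for v in values]
                   for values in data]
        null = d_scores(flipped, mu0, s0)
        false_counts.append(sum(abs(d) > threshold for d in null))
    fdr = min(1.0, statistics.median(false_counts) / len(called))
    return called, fdr

# One clearly shifted gene among four flat ones (log ratios per slide).
genes = [
    [2.0, 2.1, 1.9, 2.2],
    [0.1, -0.1, 0.05, -0.05],
    [-0.1, 0.1, 0.0, 0.05],
    [0.05, -0.05, 0.1, -0.1],
    [0.0, 0.1, -0.1, 0.05],
]
called, fdr = one_class_sam(genes, threshold=3.0)
```

Raising the threshold calls fewer genes and generally lowers the estimated FDR, which is the trade-off the user tunes.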
37
SAM (Significance Analysis of Microarrays)
[Figure labels: positively significant genes, FDR, Δ adjustment]
38
Automated report generation
39
Automated report generation
40
TM4 MIDAS web page
http://www.tigr.org/software/tm4/midas.html
http://www.tm4.org/midas.html