Statistical Analysis of Microarray Data

1
An Analysis of MicroArray Quality Control Data
James J. Chen, Ph.D.
Division of Biometry and Risk Assessment
National Center for Toxicological Research
U.S. Food and Drug Administration
2006 FDA and Industry Workshop, September 29, 2006
The views expressed in this presentation do not represent those of the U.S. Food and Drug Administration.
2
Outline
  • Background: MAQC experimental design and data
  • Microarray platform comparisons
  • Inter-platform analysis
  • Intra-platform analysis and platform performance
  • Concordance, site effects, consistency, and discriminability
  • Sensitivity, specificity, and accuracy in gene selection
  • Self-consistency of titration mixtures
  • Comparability of TaqMan and microarray platforms
  • Conclusion

3
MicroArray Quality Control Project
Objective: To compare expression data generated at multiple test sites (labs) using several microarray-based and alternative technology platforms.
Microarray platforms:
  • Applied Biosystems (ABI) (1)
  • Affymetrix (AFX) (1)
  • Agilent (AGI) (1, 2)
  • Eppendorf (EPP) (1)
  • GE Healthcare (GEH) (1)
  • Illumina (ILM) (1)
  • NCI_Operon (NCI) (2)
Alternative platforms:
  • Applied Biosystems (TAQ)
  • Panomics (QGN)
  • Gene Express (GEX)
Nature Biotechnology 24(9), September 2006
4
MAQC Experimental Design
  • Four RNA samples
  • Sample A: Universal Human Reference RNA (Stratagene)
  • Sample B: Human Brain Reference RNA (Ambion)
  • Sample C: 75% A + 25% B
  • Sample D: 25% A + 75% B
  • Three sites for each microarray platform (two sites for NCI)
  • One site each for TAQ, QGN, and GEX
  • Five technical replicates for each microarray platform
  • Four replicates for TAQ; three replicates for QGN and GEX
  • Number of target genes: EPP 294, QGN 245, GEX 205
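The balanced part of this design can be enumerated directly. The Python sketch below is illustrative only: `design_grid` and the constant names are hypothetical, it assumes a fully balanced layout, and it covers only the five three-site microarray platforms (NCI, with two sites, is left out). `FRACTION_A` encodes the share of sample A in each RNA sample as listed above.

```python
from itertools import product

# Fraction of sample A in each RNA sample (C and D are A/B mixtures).
FRACTION_A = {"A": 1.00, "B": 0.00, "C": 0.75, "D": 0.25}
PLATFORMS = ["ABI", "AFX", "AGI", "GEH", "ILM"]  # microarray platforms with 3 sites
N_SITES = 3
N_REPLICATES = 5


def design_grid(platform: str) -> list[tuple[str, int, str, int]]:
    """Hypothetical helper: every (platform, site, sample, replicate) cell of a balanced design."""
    return [
        (platform, site, sample, rep)
        for site, sample, rep in product(
            range(1, N_SITES + 1), FRACTION_A, range(1, N_REPLICATES + 1)
        )
    ]


nominal = {p: len(design_grid(p)) for p in PLATFORMS}
print(nominal["AFX"])         # 60 = 3 sites x 4 samples x 5 replicates
print(sum(nominal.values()))  # 300 nominal arrays across the five platforms
```

The nominal total of 300 is slightly above the 293 arrays actually used in the comparisons.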

5
MAQC Data Used for Comparisons

Platform      ABI      AFX      AGI      GEH      ILM      TAQ
Samples       4        4        4        4        4        4
Replicates¹   5        5        5        5        5        4
Sites         3        3        3        3        3        1
Probes        32,878   54,675   43,931   54,359   47,293   1,004
Arrays²       58       60       56       60       59       N/A

12,091 genes are common to the microarray platforms; 906 TAQ genes are among these 12,091 genes.
1. Technical replicates. 2. A total of 293 arrays.
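Finding the genes shared by all platforms is a set intersection once every probe has been mapped to a common gene identifier. In this sketch the gene sets are invented toy examples, and the probe-to-gene mapping step, which is the hard part in practice, is assumed to have been done already.

```python
from functools import reduce

# Hypothetical gene sets per platform, after mapping probes to gene symbols.
platform_genes = {
    "ABI": {"TP53", "GAPDH", "ACTB", "MYC", "EGFR"},
    "AFX": {"TP53", "GAPDH", "ACTB", "MYC", "BRCA1"},
    "AGI": {"TP53", "GAPDH", "ACTB", "MYC", "EGFR", "BRCA1"},
    "GEH": {"TP53", "GAPDH", "ACTB", "MYC"},
    "ILM": {"TP53", "GAPDH", "ACTB", "MYC", "KRAS"},
}
taqman_genes = {"TP53", "GAPDH", "KRAS", "ALB"}

# Genes measured on every microarray platform.
common = reduce(set.intersection, platform_genes.values())
# TaqMan genes that also belong to the common microarray set.
taq_in_common = taqman_genes & common

print(sorted(common))         # ['ACTB', 'GAPDH', 'MYC', 'TP53']
print(sorted(taq_in_common))  # ['GAPDH', 'TP53']
```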
6
Hierarchical clustering of the 293 arrays on the 12,091 common genes, based on all pairwise correlations between arrays.
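The slide does not state the linkage rule. As a minimal sketch, assuming a distance of 1 − Pearson correlation between arrays and average linkage, the clustering can be written in plain NumPy; in practice a library routine such as `scipy.cluster.hierarchy.linkage` would be used. The data below are synthetic.

```python
import numpy as np


def correlation_distance(X: np.ndarray) -> np.ndarray:
    """1 - Pearson correlation between the columns (arrays) of a genes x arrays matrix."""
    return 1.0 - np.corrcoef(X, rowvar=False)


def average_linkage(D: np.ndarray, n_clusters: int) -> list[list[int]]:
    """Naive agglomerative clustering with average linkage, stopped at n_clusters groups."""
    clusters = [[i] for i in range(D.shape[0])]
    while len(clusters) > n_clusters:
        best, pair = np.inf, (0, 1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster array pairs.
                d = D[np.ix_(clusters[i], clusters[j])].mean()
                if d < best:
                    best, pair = d, (i, j)
        i, j = pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


rng = np.random.default_rng(0)
n_genes = 200
profile_1, profile_2 = rng.normal(size=n_genes), rng.normal(size=n_genes)
# Six hypothetical arrays: three noisy copies of each of two expression profiles.
X = np.column_stack(
    [profile_1 + 0.3 * rng.normal(size=n_genes) for _ in range(3)]
    + [profile_2 + 0.3 * rng.normal(size=n_genes) for _ in range(3)]
)
groups = average_linkage(correlation_distance(X), n_clusters=2)
print(sorted(sorted(g) for g in groups))  # [[0, 1, 2], [3, 4, 5]]
```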
7
Concordance: all pairwise inter-platform sample correlation coefficients between two arrays from different platforms.
[Figure: box plots of the correlations; annotated values .82, .74, .71, .70, .68, .45]
Up to 2,250 (10 × 15 × 15) correlations computed for each sample.
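The count 10 × 15 × 15 comes from 10 platform pairs, each contributing every combination of its 15 arrays per sample (3 sites × 5 replicates). A minimal sketch on simulated data, assuming each platform's data for one sample is a genes × arrays matrix on the common genes; `inter_platform_correlations` is a hypothetical name.

```python
from itertools import combinations

import numpy as np


def inter_platform_correlations(data: dict[str, np.ndarray]) -> dict[tuple[str, str], np.ndarray]:
    """Pearson r between every array of one platform and every array of another.

    `data` maps platform -> (genes x arrays) matrix for ONE sample on the common genes.
    Each result is an (arrays of p) x (arrays of q) block of correlations.
    """
    out = {}
    for p, q in combinations(sorted(data), 2):
        n_p = data[p].shape[1]
        r = np.corrcoef(data[p], data[q], rowvar=False)  # all arrays of p and q together
        out[(p, q)] = r[:n_p, n_p:]                      # keep the cross-platform block
    return out


rng = np.random.default_rng(1)
truth = rng.normal(size=500)  # hypothetical expression profile of one sample
# Five platforms x 15 arrays (3 sites x 5 replicates): shared profile plus noise.
data = {
    p: truth[:, None] + rng.normal(scale=0.5, size=(500, 15))
    for p in ["ABI", "AFX", "AGI", "GEH", "ILM"]
}
corrs = inter_platform_correlations(data)
print(len(corrs), sum(block.size for block in corrs.values()))  # 10 2250
```

With missing arrays the blocks shrink, which is why the slide says "up to" 2,250.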
8
Concordance: all pairwise inter-platform fold-change correlation coefficients between site-level fold changes from different platforms.
[Figure: box plots of the correlations; annotated values .92, .85, .84, .82, .78, .78, .75, .53]
90 (10 × 3 × 3) correlations for each fold change.
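The 10 × 3 × 3 count suggests one fold-change vector per site, correlated across every pair of sites from two platforms. This sketch assumes the site-level log2 fold change is the difference of replicate means of log2 intensities; that choice and the function names are assumptions, and the data are simulated.

```python
from itertools import combinations, product

import numpy as np


def site_log2_fc(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """log2(B/A) per gene at one site: difference of replicate means of log2 intensities."""
    return b.mean(axis=1) - a.mean(axis=1)


def fc_correlations(fc: dict[str, list[np.ndarray]]) -> list[float]:
    """Correlate site-level fold changes for every platform pair and every site pair."""
    out = []
    for p, q in combinations(sorted(fc), 2):
        for u, v in product(fc[p], fc[q]):
            out.append(float(np.corrcoef(u, v)[0, 1]))
    return out


rng = np.random.default_rng(2)
n_genes = 300
log_a = rng.normal(8.0, 2.0, size=n_genes)     # hypothetical log2 intensities of sample A
true_lfc = rng.normal(0.0, 1.5, size=n_genes)  # hypothetical log2(B/A)
fc = {}
for platform in ["ABI", "AFX", "AGI", "GEH", "ILM"]:
    fc[platform] = []
    for _site in range(3):
        a = log_a[:, None] + rng.normal(scale=0.3, size=(n_genes, 5))
        b = (log_a + true_lfc)[:, None] + rng.normal(scale=0.3, size=(n_genes, 5))
        fc[platform].append(site_log2_fc(a, b))
r = fc_correlations(fc)
print(len(r))  # 90 = 10 platform pairs x 3 sites x 3 sites
```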
9
Cross-Platform Consistency
  • Proportion of genes showing a significant platform × sample interaction in the (gene-by-gene) ANOVA
  • $y = \mu + P + \text{Sample} + P \times \text{Sample} + \varepsilon$, where $P$ denotes platform
  • Significant interaction: the patterns of expression of the four samples are inconsistent across the platforms.
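For a single gene with a balanced layout, the interaction test in this model can be computed from cell means. The sketch below is an assumption-laden illustration (balanced design, ordinary F test, SciPy for the F tail probability); the two example genes are constructed with fixed replicate deviations so the result is reproducible.

```python
import numpy as np
from scipy import stats


def interaction_pvalue(y: np.ndarray) -> float:
    """Platform x sample interaction p-value for ONE gene (balanced two-way ANOVA).

    y has shape (n_platforms, n_samples, n_replicates) and is modelled as
    y = mu + P + Sample + P x Sample + e.
    """
    n_p, n_s, n_r = y.shape
    cell = y.mean(axis=2)                    # platform x sample means
    plat = cell.mean(axis=1, keepdims=True)  # platform means
    samp = cell.mean(axis=0, keepdims=True)  # sample means
    grand = cell.mean()
    ss_int = n_r * ((cell - plat - samp + grand) ** 2).sum()
    ss_err = ((y - cell[:, :, None]) ** 2).sum()
    df_int, df_err = (n_p - 1) * (n_s - 1), n_p * n_s * (n_r - 1)
    f_stat = (ss_int / df_int) / (ss_err / df_err)
    return float(stats.f.sf(f_stat, df_int, df_err))


# Hypothetical gene on 5 platforms x 4 samples x 5 replicates.
pattern = np.array([10.0, 8.0, 9.5, 8.5])               # sample means for A, B, C, D
offset = np.array([0.0, 1.0, -0.5, 0.3, 2.0])[:, None]  # platform-specific shifts
jitter = np.array([-0.2, -0.1, 0.0, 0.1, 0.2])          # fixed replicate deviations

consistent = (offset + pattern)[:, :, None] + jitter    # same pattern, only shifted
flipped = np.vstack([pattern] * 4 + [pattern[::-1]])    # last platform reverses A-D
inconsistent = (offset + flipped)[:, :, None] + jitter

p_cons, p_incons = interaction_pvalue(consistent), interaction_pvalue(inconsistent)
print(p_cons > 0.01, p_incons < 0.01)  # True True
```

Applying this test gene by gene and counting p < 0.01 gives the proportion of inconsistent genes.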

10
Plot of the p-values versus ranking proportions
[Figure: axes labelled Proportion and log10 p]
The proportion of significant genes is 30% at α = 0.01.
11
[Figure panels: Consistency (p > 0.01); Inconsistency (p < 0.01)]
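The proportion reported above, and the points of a p-value ranking plot, can be obtained by sorting the per-gene p-values. A minimal sketch with ten made-up p-values; `significance_summary` is a hypothetical name, and the y-axis follows the slide's "log10 p" label.

```python
import numpy as np


def significance_summary(pvals, alpha=0.01):
    """Proportion of genes with p < alpha, plus (ranking proportion, log10 p) points.

    Plotting log10 p of the sorted p-values against rank / n gives a curve whose
    crossing of log10(alpha) marks the proportion of significant genes.
    """
    p = np.sort(np.asarray(pvals, dtype=float))
    rank_prop = np.arange(1, p.size + 1) / p.size
    return float(np.mean(p < alpha)), rank_prop, np.log10(p)


# Ten hypothetical interaction p-values.
pvals = [0.001, 0.005, 0.02, 0.2, 0.5, 0.9, 0.0001, 0.3, 0.7, 0.04]
prop, x, y = significance_summary(pvals)
print(prop)  # 0.3
```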
12
Intra-Platform Analysis
  • Concordance: all pairwise correlations between two arrays from different sites for samples A, B, C, and D (3 × 5 × 5 correlations per sample).
  • Site effects: ANOVA $y = \mu + \text{Sample} + \text{Site} + \text{Sample} \times \text{Site} + \varepsilon$
  • Site effect: the variance ratio $F = \mathrm{MSE}_{\text{site}} / \mathrm{MSE}_{e}$
  • Consistency: proportion of genes shown to have a significant sample × site interaction (α = 0.01).
  • Discriminability: ANOVA $y = \mu + \text{Sample} + \varepsilon$
  • Variability: residual mean square (total variation other than sample differences).
  • Discriminability: the proportion of genes shown to have significant sample effects (α = 0.0001).
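The discriminability model is a one-way ANOVA per gene, with site left out so that site-to-site variation ends up in the residual mean square (the "variability"). A minimal sketch under a balanced-design assumption; the function names and the three example genes are hypothetical, with fixed within-sample deviations so the output is reproducible.

```python
import numpy as np
from scipy import stats


def sample_anova(y: np.ndarray) -> tuple[float, float]:
    """One-way ANOVA y = mu + Sample + e for ONE gene on ONE platform.

    y has shape (n_samples, n_arrays_per_sample). Site is not in the model, so
    between-site variation is absorbed into the residual mean square.
    Returns (residual mean square, p-value for the sample effect).
    """
    k, n = y.shape
    means = y.mean(axis=1, keepdims=True)
    ss_sample = n * ((means - y.mean()) ** 2).sum()
    ss_resid = ((y - means) ** 2).sum()
    ms_resid = ss_resid / (k * (n - 1))
    f_stat = (ss_sample / (k - 1)) / ms_resid
    return float(ms_resid), float(stats.f.sf(f_stat, k - 1, k * (n - 1)))


def discriminability(genes: np.ndarray, alpha: float = 1e-4) -> float:
    """Proportion of genes (first axis) with a significant sample effect at alpha."""
    pvals = np.array([sample_anova(g)[1] for g in genes])
    return float(np.mean(pvals < alpha))


# Three hypothetical genes, 4 samples x 15 arrays (3 sites x 5 replicates) each.
jitter = np.linspace(-0.3, 0.3, 15)                      # fixed within-sample deviations
flat = np.full((4, 1), 8.0) + jitter                      # no sample differences
de_1 = np.array([[10.0], [8.0], [9.5], [8.5]]) + jitter  # clear A-D differences
de_2 = np.array([[7.0], [9.0], [7.5], [8.5]]) + jitter
print(discriminability(np.stack([flat, de_1, de_2])))    # 0.6666666666666666
```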

13
Individual Platform Performance: Reproducibility and Consistency

           Median correlation           Site    Consistency¹   MSE    Discriminability²
Platform   rA     rB     rC     rD      F_m     h              s²     t
AFX        .988   .988   .991   .992    24      .012           .066   .618
ABI        .968   .964   .972   .969    15      .008           .107   .620
AG1        .978   .982   .982   .981    28      .063           .090   .633
ILM        .980   .979   .980   .981    242     .020           .266   .441
GEH        .925   .904   .872   .862    64      .097           .267   .453

1. α = 0.01  2. α = 0.0001
14
Gold Standard Set
  • A gene is differentially expressed if it was shown to be significant in at least 2 of the 5 platforms at α = 10⁻⁵.
  • $H_0: \mu_A - \mu_B = 0$ versus $H_1: \mu_A - \mu_B \neq 0$
  • (8,265 genes were selected)
  • A gene is non-differentially expressed if its fold change was shown to be between 0.90 and 1/0.90 in at least 2 of the 5 platforms at α = 10⁻³. Let $\delta = -\log_2(0.90)$.
  • Equivalence test: $H_0: |\mu_A - \mu_B| \geq \delta$ versus $H_1: |\mu_A - \mu_B| < \delta$
  • (498 genes were selected)
  • Gold standard: 8,607 genes (the 78 genes selected by both criteria were deleted from both sets: 8,265 + 498 − 2 × 78 = 8,607)
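The two selection rules can be sketched per gene and platform with pooled-variance t tests on log2 intensities. The equivalence test is implemented here as two one-sided tests (TOST), which is a standard choice but an assumption about how the slide's test was carried out; all function names and data are hypothetical.

```python
import numpy as np
from scipy import stats

DELTA = -np.log2(0.90)  # equivalence margin on the log2 scale (about 0.152)


def pooled_t(a, b):
    """Difference in means (a - b), its pooled-variance standard error, and df."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = a.size, b.size
    sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    se = np.sqrt(sp2 * (1.0 / na + 1.0 / nb))
    return a.mean() - b.mean(), se, na + nb - 2


def de_pvalue(a, b):
    """Two-sided test of H0: muA - muB = 0 versus H1: muA - muB != 0."""
    diff, se, df = pooled_t(a, b)
    return float(2 * stats.t.sf(abs(diff) / se, df))


def equivalence_pvalue(a, b, delta=DELTA):
    """TOST for H0: |muA - muB| >= delta versus H1: |muA - muB| < delta."""
    diff, se, df = pooled_t(a, b)
    p_low = stats.t.sf((diff + delta) / se, df)    # rejects muA - muB <= -delta
    p_high = stats.t.cdf((diff - delta) / se, df)  # rejects muA - muB >= delta
    return float(max(p_low, p_high))


def selected(pvals_by_platform, alpha, min_platforms=2):
    """True if the gene is significant on at least `min_platforms` platforms."""
    return sum(p < alpha for p in pvals_by_platform) >= min_platforms


# Hypothetical log2 intensities: 5 replicates of sample A on one platform.
a = 8.0 + np.array([-0.02, -0.01, 0.0, 0.01, 0.02])
print(de_pvalue(a, a + 2.0) < 1e-5)       # True: a 4-fold change
print(equivalence_pvalue(a, a) < 1e-3)    # True: no change
print(selected([1e-7, 0.2, 3e-6, 0.5, 0.04], alpha=1e-5))  # True: 2 of 5 platforms
```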

15
Accuracy (AC), sensitivity (SN), specificity (SP), and FDR using FWE = 0.05 and FDR = 0.05 as thresholds.

           FWE 0.05                     FDR 0.05
Platform   AC    SN    SP    FDR        AC    SN    SP    FDR
AFX        .77   .76   .95   .004       .92   .94   .55   .024
ABI        .74   .73   .95   .004       .89   .91   .59   .023
AG1        .81   .80   .80   .003       .92   .94   .55   .024
ILM        .55   .53   1.0   .000       .88   .88   .95   .023
GEH        .54   .52   .95   .005       .82   .82   .69   .019

FWE threshold: α = 0.05/8,607 ≈ 5.8 × 10⁻⁶
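The metrics in the table follow from comparing each platform's calls with the gold standard. This sketch pairs a Bonferroni (FWE) cutoff with the Benjamini-Hochberg step-up procedure for FDR; that BH is the FDR procedure used on the slide is an assumption, and the eight p-values are invented.

```python
import numpy as np


def bonferroni(pvals, alpha=0.05):
    """FWE control: call a gene when p <= alpha / m (e.g. 0.05 / 8607 ~ 5.8e-6)."""
    p = np.asarray(pvals, dtype=float)
    return p <= alpha / p.size


def benjamini_hochberg(pvals, q=0.05):
    """FDR control with the Benjamini-Hochberg step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, m + 1) / m
    calls = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()  # largest rank i with p_(i) <= q * i / m
        calls[order[: k + 1]] = True
    return calls


def performance(calls, truth):
    """AC, SN, SP and observed FDR of calls against a gold standard (True = DE)."""
    calls, truth = np.asarray(calls, dtype=bool), np.asarray(truth, dtype=bool)
    tp = int(np.sum(calls & truth))
    tn = int(np.sum(~calls & ~truth))
    fp = int(np.sum(calls & ~truth))
    fn = int(np.sum(~calls & truth))
    return {
        "AC": (tp + tn) / truth.size,
        "SN": tp / (tp + fn),
        "SP": tn / (tn + fp),
        "FDR": fp / max(tp + fp, 1),
    }


# Eight hypothetical genes: the first four are DE in the gold standard.
pvals = [1e-8, 2e-6, 0.001, 0.01, 0.03, 0.2, 0.6, 0.9]
truth = [True] * 4 + [False] * 4
print(performance(bonferroni(pvals), truth))
# {'AC': 0.875, 'SN': 0.75, 'SP': 1.0, 'FDR': 0.0}
print(performance(benjamini_hochberg(pvals), truth))
# {'AC': 0.875, 'SN': 1.0, 'SP': 0.75, 'FDR': 0.2}
```

Because BH at level q always calls at least the genes Bonferroni calls at the same level, the FDR threshold trades specificity for sensitivity, the same direction as in the table.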
16
Comment on MAQC Gene Selection
  • The MAQC project used technical replicates (small variance) of two distinct biological samples (large difference).
  • The number of differentially expressed genes is much larger than in typical microarray experiments.
  • Generating a gene list is not the problem; the problem is determining the number of genes in the list.
  • General principle: identify a list of differentially expressed genes as accurately as possible.

17
Reproducibility of lists of differentially expressed genes: Percentage of Overlapping Genes (POG)
For AFX, 6,319 genes have p < 10⁻⁵ and 4,370 genes have FC > 2. For ABI, 6,127 genes have p < 10⁻⁵ and 4,835 genes have FC > 2. More than 4,000 genes can be selected with an estimated FDR below 2/4,000.
Figure from MAQC supplementary Fig. S2.
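POG compares two ranked gene lists of the same length. This simplified sketch ranks genes by p-value and counts shared identifiers only; MAQC's own POG definition may add conditions (for example on the direction of change) that are not reproduced here. The function names and p-values are hypothetical.

```python
import numpy as np


def top_genes(genes, pvals, n):
    """The n genes with the smallest p-values (ties broken by input order)."""
    order = np.argsort(np.asarray(pvals, dtype=float), kind="stable")[:n]
    return [genes[i] for i in order]


def pog(list_1, list_2):
    """Percentage of genes in list_1 that also appear in list_2 (equal-length lists)."""
    if len(list_1) != len(list_2):
        raise ValueError("this sketch assumes lists of equal length")
    return 100.0 * len(set(list_1) & set(list_2)) / len(list_1)


genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
p_site_1 = [1e-9, 1e-7, 0.2, 1e-6, 0.5, 0.03]  # hypothetical p-values, site 1
p_site_2 = [1e-8, 0.1, 0.3, 1e-7, 1e-6, 0.4]   # hypothetical p-values, site 2
l1, l2 = top_genes(genes, p_site_1, 3), top_genes(genes, p_site_2, 3)
print(l1, l2)                 # ['g1', 'g2', 'g4'] ['g1', 'g4', 'g5']
print(round(pog(l1, l2), 1))  # 66.7
```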
18
Assessment of Titration Trend
  • Titration correlations: 0.75A + 0.25B versus C, and 0.25A + 0.75B versus D
  • Titration model (a two-step test)
  • The titration relationship can be modelled by M1t: $y = \mu + \beta \cdot \text{Conc} + \text{Site} + \varepsilon$
  • Full ANOVA model M1: $y = \mu + \text{Sample} + \text{Site} + \varepsilon$
  • S1: Test for sample differences under M1, $H_{0t1}: \mu_A = \mu_B = \mu_C = \mu_D$
  • S2: Test for goodness of fit, $H_{0t2}: \text{M1t} = \text{M1}$
  • Proportion of genes that reject $H_{0t1}$ and accept $H_{0t2}$
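Both steps are nested-model F tests, since M1t is a restriction of M1 (Conc is a linear combination of the sample indicators). The sketch below makes several assumptions not stated on the slide: Conc is coded as the fraction of sample A (1, 0, 0.75, 0.25), the intensities are on a linear scale, both levels default to 0.05, and all names and data are hypothetical.

```python
import numpy as np
from scipy import stats

# Assumed coding of "Conc": the fraction of sample A in each RNA sample.
CONC = {"A": 1.00, "B": 0.00, "C": 0.75, "D": 0.25}


def rss_and_rank(X, y):
    """Residual sum of squares and rank of a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), int(np.linalg.matrix_rank(X))


def nested_f_pvalue(X_small, X_big, y):
    """Extra-sum-of-squares F test of X_small (null) nested within X_big."""
    rss0, k0 = rss_and_rank(X_small, y)
    rss1, k1 = rss_and_rank(X_big, y)
    df1, df2 = k1 - k0, y.size - k1
    f_stat = max(rss0 - rss1, 0.0) / df1 / (rss1 / df2)
    return float(stats.f.sf(f_stat, df1, df2))


def follows_titration(y, sample, site, alpha1=0.05, alpha2=0.05):
    """Two-step test for ONE gene: reject H0t1 (S1) and accept H0t2 (S2)."""
    sample, site = np.asarray(sample), np.asarray(site)
    ones = np.ones((y.size, 1))
    site_dummies = (site[:, None] == np.unique(site)[1:]).astype(float)
    sample_dummies = (sample[:, None] == np.array(["B", "C", "D"])).astype(float)
    conc = np.array([CONC[s] for s in sample])[:, None]
    m0 = np.hstack([ones, site_dummies])                  # mu + Site
    m1t = np.hstack([ones, conc, site_dummies])           # M1t: mu + beta * Conc + Site
    m1 = np.hstack([ones, sample_dummies, site_dummies])  # M1:  mu + Sample + Site
    p_s1 = nested_f_pvalue(m0, m1, y)   # S1: H0t1 muA = muB = muC = muD
    p_s2 = nested_f_pvalue(m1t, m1, y)  # S2: H0t2 M1t fits as well as M1
    return p_s1 < alpha1 and p_s2 >= alpha2


def simulate(means):
    """Hypothetical linear-scale data: 4 samples x 3 sites x 5 replicates."""
    sample, site, y = [], [], []
    for s, mu in means.items():
        for k, shift in {1: 0.0, 2: 30.0, 3: -15.0}.items():  # site effects
            for dev in (-20.0, -10.0, 0.0, 10.0, 20.0):        # replicate deviations
                sample.append(s)
                site.append(k)
                y.append(mu + shift + dev)
    return np.array(y), sample, site


linear = simulate({"A": 1000.0, "B": 200.0, "C": 800.0, "D": 400.0})
bent = simulate({"A": 1000.0, "B": 200.0, "C": 500.0, "D": 400.0})
print(follows_titration(*linear), follows_titration(*bent))  # True False
```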

19
Linear Titration Model
[Figure panels: H0t1 accepted; H0t1 rejected, H0t2 accepted; H0t1 rejected, H0t2 rejected]

20
Titration correlations for samples C and D, and the proportions of genes that follow the titration relationship.
Columns: Correlation (Sample C, Sample D); Titration model (α1, α2) = (5, 5) and (1, 1)
.909 .911 .963
.976 .916 .928
.954 .967 .930
.939 .923 .944
.930 .936 .937
.954 .923 .934
.988 .988
AFX ABI AG1 ILM GEH
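A titration correlation compares an observed mixture with the value implied by its nominal composition. This sketch assumes per-gene mean intensities on the linear (not log) scale and the nominal fractions 0.75 and 0.25; `titration_correlation` is a hypothetical name and the intensities are simulated.

```python
import numpy as np


def titration_correlation(a, b, mixture, frac_a):
    """Pearson r between an observed mixture and its expected value frac_a*A + (1-frac_a)*B.

    Assumes per-gene mean intensities on the linear (not log) scale and nominal
    mixing fractions.
    """
    expected = frac_a * np.asarray(a, dtype=float) + (1 - frac_a) * np.asarray(b, dtype=float)
    return float(np.corrcoef(expected, np.asarray(mixture, dtype=float))[0, 1])


rng = np.random.default_rng(4)
a = np.exp(rng.normal(7.0, 1.5, size=1000))  # hypothetical sample A intensities
b = np.exp(rng.normal(7.0, 1.5, size=1000))  # hypothetical sample B intensities
c = (0.75 * a + 0.25 * b) * np.exp(rng.normal(0.0, 0.1, size=1000))  # noisy sample C
d = (0.25 * a + 0.75 * b) * np.exp(rng.normal(0.0, 0.1, size=1000))  # noisy sample D
print(titration_correlation(a, b, c, 0.75), titration_correlation(a, b, d, 0.25))
```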
21
TaqMan and microarray platform concordance
Box plots of all pairwise sample correlation coefficients.
[Figure: annotated values .80, .78, .77, .76, .75, .74, .74, .71, .71, .66, .62, .52]
60 (4 × 15) correlations computed in each sample.
22
TaqMan and microarray platform concordance
Box plots of fold-change (B/A) correlation coefficients.
[Figure: annotated values .90, .89, .89, .88, .86, .86, .82]
23
Consistency of TaqMan and Microarray Platforms
[Figure panels: TaqMan and microarray platforms; microarray platforms only]
  • Proportions of significant genes (TaqMan and microarray): 0.72, 0.57, 0.49, 0.65, 0.39. Proportion of significant genes among microarray platforms: 0.30.

24
Conclusion (1)
  • Inter-platform (microarray and TaqMan)
    • Concordance
      • Sample correlations: 0.45 (D) to 0.82 (A)
      • FC correlations: higher for B/A, lower for C/A
    • Inconsistency
      • Microarray platforms: thirty percent (30%) of genes show inconsistent expression patterns at α = 0.01.
      • TaqMan and microarray platforms: the proportions are between 0.34 and 0.74 for the five platforms.
    • Comparability
      • Intensities measured by different microarray platforms, and those measured by microarray and TaqMan platforms, are different.

25
Conclusion (2)
  • Titration trend
    • Titration correlation: the correlations between observed and expected intensity are above 0.90.
    • Titration trend: all five platforms follow the linear titration relationship well.
  • Intra-microarray-platform performance
    • Concordance: intra-platform correlations are high.
    • Site effect: all platforms show site effects.
    • Consistency: the patterns of expression are consistent across the three sites.